Video Calling Technology

Written By H T (Super Administrator)

Updated at April 23rd, 2020

  • TokBox (OpenTok)
  • WebRTC

What is OpenTok?

It adds the clarity and emotion of face-to-face communication to your brand whether you're developing for the web, iOS, or Android. We make the integration of high-quality live video a breeze so that you can focus on building a great product.

What is WebRTC?

It is a free, open project that enables web browsers with Real-Time Communications (RTC) capabilities via simple JavaScript APIs. The WebRTC components have been optimized to best serve this purpose.

Real-time communication without plugins

Imagine a world where your phone, TV and computer could all communicate on a common platform. Imagine it was easy to add video chat and peer-to-peer data sharing to your web application. That's the vision of WebRTC.

Want to try it out? WebRTC is available now in Google Chrome, Safari, Firefox and Opera, on desktop and mobile. A good place to start is the simple video chat application at

  1. Open in your browser.
  2. Click the Join button to join a chat room and let the app use your webcam.
  3. Open the URL displayed at the bottom of the page in a new tab or, better still, on a different computer.

Quick start

Haven't got time to read this article, or just want code?

  1. Get an overview of WebRTC from the Google I/O presentation (the slides are here):

  2. If you haven't used getUserMedia, take a look at the HTML5 Rocks article and view the source for the simple example at
  3. Get to grips with the RTCPeerConnection API by reading through the example below and the demo at, which implements WebRTC on a single web page.
  4. Learn more about how WebRTC uses servers for signaling, and firewall and NAT traversal, by reading through the code and console logs from
  5. Can’t wait and just want to try out WebRTC right now? Try out some of the 20+ demos that exercise the WebRTC JavaScript APIs.
  6. Having trouble with your machine and WebRTC? Try out our troubleshooting page

Alternatively, jump straight into our WebRTC codelab: a step-by-step guide that explains how to build a complete video chat app, including a simple signaling server.

A very short history of WebRTC

One of the last major challenges for the web is to enable human communication via voice and video: Real Time Communication, RTC for short. RTC should be as natural in a web application as entering text in a text input. Without it, we're limited in our ability to innovate and develop new ways for people to interact.

Historically, RTC has been corporate and complex, requiring expensive audio and video technologies to be licensed or developed in house. Integrating RTC technology with existing content, data and services has been difficult and time consuming, particularly on the web.

Gmail video chat became popular in 2008, and in 2011 Google introduced Hangouts, which use the Google Talk service (as did Gmail). Google bought GIPS, a company which had developed many components required for RTC, such as codecs and echo cancellation techniques. Google open sourced the technologies developed by GIPS and engaged with relevant standards bodies at the IETF and W3C to ensure industry consensus. In May 2011, Ericsson built the first implementation of WebRTC.

WebRTC implemented open standards for real-time, plugin-free video, audio and data communication. The need was real:

  • Many web services used RTC, but needed downloads, native apps or plugins. Those included Skype, Facebook and Google Hangouts.
  • Downloading, installing and updating plugins is complex, error prone and annoying.
  • Plugins are difficult to deploy, debug, troubleshoot, test and maintain—and may require licensing and integration with complex, expensive technology. It's often difficult to persuade people to install plugins in the first place!

The guiding principles of the WebRTC project are that its APIs should be open source, free, standardized, built into web browsers and more efficient than existing technologies.

Where are we now?

WebRTC is used in various apps like WhatsApp, Facebook Messenger, and platforms such as TokBox. WebRTC has also been integrated with WebKitGTK+ and Qt native apps.

WebRTC implements three APIs:

  • MediaStream (aka getUserMedia)
  • RTCPeerConnection
  • RTCDataChannel

The APIs are defined in two specs:

All three APIs are supported on mobile and desktop by Chrome, Safari, Firefox, Edge and Opera.

getUserMedia: View the demos and code at or try out Chris Wilson's amazing examples that use getUserMedia as input for Web Audio.

RTCPeerConnection: There's an ultra-simple demo at and a fully functional video chat application at This app uses adapter.js, a JavaScript shim maintained by Google with help from the WebRTC community, to abstract away browser differences and spec changes.

RTCDataChannel: Check out one of the data channel demos at to see this in action.

Our WebRTC codelab shows how to use all three APIs to build a simple application for video chat and file sharing.

My first WebRTC

WebRTC applications need to do several things:

  • Get streaming audio, video or other data.
  • Get network information such as IP addresses and ports, and exchange this with other WebRTC clients (known as peers) to enable connection, even through NATs and firewalls.
  • Coordinate signaling communication to report errors and initiate or close sessions.
  • Exchange information about media and client capability, such as resolution and codecs.
  • Communicate streaming audio, video or data.

To acquire and communicate streaming data, WebRTC implements the following APIs:

  • MediaStream: get access to data streams, such as from the user's camera and microphone.
  • RTCPeerConnection: audio or video calling, with facilities for encryption and bandwidth management.
  • RTCDataChannel: peer-to-peer communication of generic data.

(There is detailed discussion of the network and signaling aspects of WebRTC below.)
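
As a rough illustration of how the three APIs fit together, the sketch below wraps each one in a small function. The function names (getLocalStream, addStreamToPeer, openDataChannel) and the 'chat' label are assumptions made for illustration. The browser objects are passed in as parameters, so in a page you would supply navigator.mediaDevices and a real RTCPeerConnection.

```javascript
// MediaStream: ask for camera and microphone.
// In a browser, pass navigator.mediaDevices as mediaDevices.
async function getLocalStream(mediaDevices) {
  return mediaDevices.getUserMedia({audio: true, video: true});
}

// RTCPeerConnection: hand every track of the stream to the connection.
function addStreamToPeer(pc, stream) {
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));
}

// RTCDataChannel: open a channel for arbitrary data on the same connection.
function openDataChannel(pc, onMessage) {
  const channel = pc.createDataChannel('chat'); // 'chat' is an arbitrary label
  channel.onmessage = (event) => onMessage(event.data);
  return channel;
}
```

Keeping the browser objects as parameters is only a design choice for this sketch: it makes each step easy to try out in isolation.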

MediaStream (aka getUserMedia)

The MediaStream API represents synchronized streams of media. For example, a stream taken from camera and microphone input has synchronized video and audio tracks. (Don't confuse MediaStreamTrack with the <track> element, which is something entirely different.)

Probably the easiest way to understand MediaStream is to look at it in the wild:

  1. In your browser, open the demo at
  2. Open the console.
  3. Inspect the stream variable, which is in global scope.

Each MediaStream has an input, which might be a MediaStream generated by getUserMedia(), and an output, which might be passed to a video element or an RTCPeerConnection.

The getUserMedia() method takes a MediaStreamConstraints object parameter, and returns a Promise that resolves to a MediaStream object.

Each MediaStream has a label, such as 'Xk7EuLhsuHKbnjLWkW4yYGNJJ8ONsgwHBvLQ'. An array of MediaStreamTracks is returned by the getAudioTracks() and getVideoTracks() methods.

For the example, stream.getAudioTracks() returns an empty array (because there's no audio) and, assuming a working webcam is connected, stream.getVideoTracks() returns an array of one MediaStreamTrack representing the stream from the webcam. Each MediaStreamTrack has a kind ('video' or 'audio'), and a label (something like 'FaceTime HD Camera (Built-in)'), and represents one or more channels of either audio or video. In this case, there is only one video track and no audio, but it is easy to imagine use cases where there are more: for example, a chat application that gets streams from the front camera, rear camera, microphone, and a 'screenshared' application.

A MediaStream can be attached to a video element by setting the srcObject attribute. Previously this was done by setting the src attribute to an object URL created with URL.createObjectURL(), but this has been deprecated.

The MediaStreamTrack is actively using the camera, which takes resources and keeps the camera open (and camera light on). When you are no longer using a track make sure to call track.stop() so that the camera can be closed.
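
The track inspection, srcObject and track.stop() steps described above can be sketched as three small helpers. The helper names (describeTracks, attachStream, stopStream) are hypothetical, and the commented usage assumes the page contains a video element.

```javascript
// Summarize the tracks of a MediaStream as "kind: label" strings.
function describeTracks(stream) {
  return stream.getTracks().map((track) => `${track.kind}: ${track.label}`);
}

// Show a stream in a <video> element by setting srcObject.
function attachStream(videoElement, stream) {
  videoElement.srcObject = stream;
}

// Stop every track so the camera and microphone are released.
function stopStream(stream) {
  stream.getTracks().forEach((track) => track.stop());
}

// Browser usage (assumes a <video autoplay> element exists on the page):
// const stream = await navigator.mediaDevices.getUserMedia({video: true});
// attachStream(document.querySelector('video'), stream);
// console.log(describeTracks(stream));
// stopStream(stream); // when the stream is no longer needed
```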

getUserMedia can also be used as an input node for the Web Audio API:

// cope with browser differences
let audioContext;
if (typeof AudioContext === 'function') {
  audioContext = new AudioContext();
} else if (typeof webkitAudioContext === 'function') {
  audioContext = new webkitAudioContext(); // eslint-disable-line new-cap
} else {
  console.log('Sorry! Web Audio not supported.');
}

// create a filter node (a BiquadFilterNode)
const filterNode = audioContext.createBiquadFilter();
filterNode.type = 'highpass';
// cutoff frequency: for highpass, audio is attenuated below this frequency
filterNode.frequency.value = 10000;

// create a gain node (to change audio volume)
const gainNode = audioContext.createGain();
// default is 1 (no change); less than 1 means audio is attenuated
// and vice versa
gainNode.gain.value = 0.5;

// getUserMedia() returns a Promise, so use then() rather than a callback
navigator.mediaDevices.getUserMedia({audio: true}).then((stream) => {
  // create an AudioNode from the stream
  const mediaStreamSource = audioContext.createMediaStreamSource(stream);
  // route the microphone through the filter and then the gain node
  mediaStreamSource.connect(filterNode);
  filterNode.connect(gainNode);
  // connect the gain node to the destination (i.e. play the sound)
  gainNode.connect(audioContext.destination);
}).catch((err) => console.error(err));

Chromium-based apps and extensions can also incorporate getUserMedia. Adding audioCapture and/or videoCapture permissions to the manifest enables permission to be requested and granted only once, on installation. Thereafter the user is not asked for permission for camera or microphone access.

Permission only has to be granted once for getUserMedia(). First time around, an Allow button is displayed in the browser's infobar. HTTP access for getUserMedia() was deprecated by Chrome at the end of 2015 due to it being classified as a Powerful feature.

The intention is potentially to enable a MediaStream for any streaming data source, not just a camera or microphone. This would enable streaming from disc, or from arbitrary data sources such as sensors or other inputs.

getUserMedia() really comes to life in combination with other JavaScript APIs and libraries:

  • Webcam Toy is a photobooth app that uses WebGL to add weird and wonderful effects to photos which can be shared or saved locally.
  • FaceKat is a 'face tracking' game built with headtrackr.js.
  • ASCII Camera uses the Canvas API to generate ASCII images.
gUM ASCII art!


Constraints can be used to set values for video resolution for getUserMedia(). This also allows support for other constraints such as aspect ratio, facing mode (front or back camera), frame rate, height and width, along with an applyConstraints() method.

There's an example at

One gotcha: getUserMedia constraints may affect the available configurations of a shared resource. For example if a camera was opened in 640x480 mode by one tab, another tab will not be able to use constraints to open it in a higher-resolution mode since it can only be opened in one mode. Note that this is an implementation detail: it would be possible to let the second tab reopen the camera in a higher resolution mode and use video processing to downscale the video track to 640x480 for the first tab, but this has not been implemented.

Setting a disallowed constraint value gives a DOMException, or an OverconstrainedError if (for example) a resolution requested is not available. To see this in action, try the demo at
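
One way to work with resolution constraints and OverconstrainedError is to try a list of resolutions in order of preference. This is a minimal sketch: the helper names (exactResolution, openCamera) and the idea of a candidate list are assumptions for illustration, and mediaDevices stands for navigator.mediaDevices in a browser.

```javascript
// Build a constraints object asking for an exact video resolution.
function exactResolution(width, height) {
  return {video: {width: {exact: width}, height: {exact: height}}};
}

// Try each candidate resolution in turn until one is accepted.
// In a browser, pass navigator.mediaDevices as mediaDevices.
async function openCamera(mediaDevices, candidates) {
  for (const [width, height] of candidates) {
    try {
      return await mediaDevices.getUserMedia(exactResolution(width, height));
    } catch (err) {
      // OverconstrainedError: the camera cannot satisfy this request,
      // so move on to the next candidate. Anything else is rethrown.
      if (err.name !== 'OverconstrainedError') throw err;
    }
  }
  throw new Error('No candidate resolution is supported');
}

// Browser usage:
// const stream = await openCamera(navigator.mediaDevices,
//     [[1920, 1080], [1280, 720], [640, 480]]);
```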

Screen and tab capture

Chrome apps also make it possible to share a live 'video' of a single browser tab or the entire desktop via chrome.tabCapture and chrome.desktopCapture APIs. (There's a demo and more information in the HTML5 Rocks Update article Screensharing with WebRTC. A few years old, but still interesting.)

It's also possible to use screen capture as a MediaStream source in Chrome using the experimental chromeMediaSource constraint, as in this demo. Note that screen capture requires HTTPS and should only be used for development, because it is enabled via a command line flag, as explained in this discuss-webrtc post.

Signaling: session control, network and media information

WebRTC uses RTCPeerConnection to communicate streaming data between browsers (aka peers), but also needs a mechanism to coordinate communication and to send control messages, a process known as signaling. Signaling methods and protocols are not specified by WebRTC: signaling is not part of the RTCPeerConnection API.

Instead, WebRTC app developers can choose whatever messaging protocol they prefer, such as SIP or XMPP, and any appropriate duplex (two-way) communication channel. The example uses XHR and the Channel API as the signaling mechanism. The codelab we built runs its signaling on a Node server.

Signaling is used to exchange three types of information:

  • Session control messages: to initialize or close communication and report errors.
  • Network configuration: to the outside world, what's my computer's IP address and port?
  • Media capabilities: what codecs and resolutions can be handled by my browser and the browser it wants to communicate with?

The exchange of information via signaling must have completed successfully before peer-to-peer streaming can begin.
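
The W3C sample in this section relies on a SignalingChannel object without defining it. One possible shape for it is sketched below, assuming a WebSocket-like transport that offers send() and an onmessage callback. The class body is hypothetical and not taken from the spec, and unlike the sample it takes the socket as a constructor argument.

```javascript
// A thin wrapper that serializes signaling messages as JSON.
// socket is any object with send(string) and an onmessage callback,
// for example a browser WebSocket.
class SignalingChannel {
  constructor(socket) {
    this.socket = socket;
    this.onmessage = null; // set by the application
    socket.onmessage = (event) => {
      if (this.onmessage) this.onmessage(JSON.parse(event.data));
    };
  }

  // message is a plain object such as {desc: ...} or {candidate: ...}
  send(message) {
    this.socket.send(JSON.stringify(message));
  }
}

// Browser usage (the URL is a placeholder):
// const signaling = new SignalingChannel(new WebSocket('wss://example.org/signal'));
```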

For example, imagine Alice wants to communicate with Bob. Here's a code sample from the W3C WebRTC spec, which shows the signaling process in action. The code assumes the existence of some signaling mechanism, represented here by the SignalingChannel object. Also note that older versions of Chrome and Opera exposed RTCPeerConnection only under a vendor prefix.

// handles JSON.stringify/parse
const signaling = new SignalingChannel();
const constraints = {audio: true, video: true};
// the server URL is left empty here; supply a real STUN/TURN URL before use
const configuration = {iceServers: [{urls: ''}]};
const pc = new RTCPeerConnection(configuration);
// selfView and remoteView are assumed to be <video> elements on the page

// send any ice candidates to the other peer
pc.onicecandidate = ({candidate}) => signaling.send({candidate});

// let the "negotiationneeded" event trigger offer generation
pc.onnegotiationneeded = async () => {
  try {
    await pc.setLocalDescription(await pc.createOffer());
    // send the offer to the other peer
    signaling.send({desc: pc.localDescription});
  } catch (err) {
    console.error(err);
  }
};

// once remote track media arrives, show it in remote video element
pc.ontrack = (event) => {
  // don't set srcObject again if it is already set.
  if (remoteView.srcObject) return;
  remoteView.srcObject = event.streams[0];
};

// call start() to initiate
async function start() {
  try {
    // get local stream, show it in self-view and add it to be sent
    const stream =
      await navigator.mediaDevices.getUserMedia(constraints);
    stream.getTracks().forEach((track) =>
      pc.addTrack(track, stream));
    selfView.srcObject = stream;
  } catch (err) {
    console.error(err);
  }
}

signaling.onmessage = async ({desc, candidate}) => {
  try {
    if (desc) {
      // if we get an offer, we need to reply with an answer
      if (desc.type === 'offer') {
        await pc.setRemoteDescription(desc);
        const stream =
          await navigator.mediaDevices.getUserMedia(constraints);
        stream.getTracks().forEach((track) =>
          pc.addTrack(track, stream));
        await pc.setLocalDescription(await pc.createAnswer());
        signaling.send({desc: pc.localDescription});
      } else if (desc.type === 'answer') {
        await pc.setRemoteDescription(desc);
      } else {
        console.log('Unsupported SDP type.');
      }
    } else if (candidate) {
      await pc.addIceCandidate(candidate);
    }
  } catch (err) {
    console.error(err);
  }
};

First up, Alice and Bob exchange network information. (The expression 'finding candidates' refers to the process of finding network interfaces and ports using the ICE framework.)

  1. Alice creates an RTCPeerConnection object with an onicecandidate handler.
  2. The handler is run when network candidates become available.
  3. Alice sends serialized candidate data to Bob, via whatever signaling channel they are using: WebSocket or some other mechanism.
  4. When Bob gets a candidate message from Alice, he calls addIceCandidate, to add the candidate to the remote peer description.

WebRTC clients (known as peers, aka Alice and Bob) also need to ascertain and exchange local and remote audio and video media information, such as resolution and codec capabilities. Signaling to exchange media configuration information proceeds by exchanging an offer and an answer using the Session Description Protocol (SDP):

  1. Alice runs the RTCPeerConnection createOffer() method. The promise it returns resolves with an RTCSessionDescription: Alice's local session description.
  2. Alice sets the local description using setLocalDescription() and then sends this session description to Bob via their signaling channel. Note that RTCPeerConnection won't start gathering candidates until setLocalDescription() is called: this is codified in the JSEP IETF draft.
  3. Bob sets the description Alice sent him as the remote description using setRemoteDescription().
  4. Bob runs the RTCPeerConnection createAnswer() method, which takes the remote description he got from Alice into account, so a local session can be generated that is compatible with hers. The createAnswer() promise resolves with an RTCSessionDescription: Bob sets that as the local description and sends it to Alice.
  5. When Alice gets Bob's session description, she sets that as the remote description with setRemoteDescription.
  6. Ping!
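
The offer/answer steps above can be sketched as one function per step. The function names (makeOffer, acceptOffer, acceptAnswer) are hypothetical. Each one takes an RTCPeerConnection and leaves the delivery of the description to whatever signaling channel is in use.

```javascript
// Alice, steps 1-2: create an offer, set it as the local description,
// and return it so it can be sent over the signaling channel.
async function makeOffer(pc) {
  await pc.setLocalDescription(await pc.createOffer());
  return pc.localDescription;
}

// Bob, steps 3-4: set Alice's offer as the remote description, create
// an answer, set it as the local description, and return it for sending.
async function acceptOffer(pc, offer) {
  await pc.setRemoteDescription(offer);
  await pc.setLocalDescription(await pc.createAnswer());
  return pc.localDescription;
}

// Alice, step 5: set Bob's answer as the remote description.
async function acceptAnswer(pc, answer) {
  await pc.setRemoteDescription(answer);
}
```

With both peers on one page, the three calls can simply be chained: the result of makeOffer(alicePc) goes to acceptOffer(bobPc, offer), and its result goes to acceptAnswer(alicePc, answer).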

Make sure to allow the RTCPeerConnection to be garbage collected by calling close() when it's no longer needed. Otherwise threads and connections are kept alive. It's possible to leak heavy resources in WebRTC!
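
A minimal cleanup helper might look like the sketch below. The name closePeerConnection is hypothetical, and clearing the event handlers before calling close() is a defensive habit assumed here rather than something the API requires.

```javascript
// Release a peer connection that is no longer needed.
function closePeerConnection(pc) {
  // Drop handler references so they do not keep other objects alive.
  pc.onicecandidate = null;
  pc.ontrack = null;
  pc.onnegotiationneeded = null;
  // Stop network activity and free the underlying resources.
  pc.close();
}
```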

RTCSessionDescription objects are blobs that conform to the Session Description Protocol, SDP. Serialized, an SDP object looks like this:

o=- 3883943731 1 IN IP4
t=0 0
a=group:BUNDLE audio video
m=audio 1 RTP/SAVPF 103 104 0 8 106 105 13 126

// ...

a=ssrc:2223794119 label:H4fjnMzxy3dPIgQ7HxuCTLb4wLLLeRHnFxh810
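
Each SDP line has the form type=value, where type is a single letter. As a rough sketch of how such text can be inspected, the hypothetical helpers below split a serialized description into lines and list its media sections. They are for illustration only and are not a full SDP parser.

```javascript
// Split serialized SDP into {type, value} pairs, one per "<type>=<value>" line.
// Lines that do not match that form are skipped.
function parseSdpLines(sdp) {
  return sdp
      .split(/\r?\n/)
      .filter((line) => /^[a-z]=/.test(line))
      .map((line) => ({type: line[0], value: line.slice(2)}));
}

// List the media sections ("m=" lines) by kind, such as "audio" or "video".
function mediaKinds(sdp) {
  return parseSdpLines(sdp)
      .filter(({type}) => type === 'm')
      .map(({value}) => value.split(' ')[0]);
}

// Browser usage: mediaKinds(pc.localDescription.sdp)
```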

The acquisition and exchange of network and media information can be done simultaneously, but both processes must have completed before audio and video streaming between peers can begin.

The offer/answer architecture described above is called JSEP, JavaScript Session Establishment Protocol. (There's an excellent animation explaining the process of signaling and streaming in Ericsson's demo video for its first WebRTC implementation.)

JSEP architecture

Once the signaling process has completed successfully, data can be streamed directly peer to peer, between the caller and callee—or if that fails, via an intermediary relay server (more about that below). Streaming is the job of RTCPeerConnection.


RTCPeerConnection is the WebRTC component that handles stable and efficient communication of streaming data between peers.

Below is a WebRTC architecture diagram showing the role of RTCPeerConnection. As you will notice, the green parts are complex!

WebRTC architecture diagram

From a JavaScript perspective, the main thing to understand from this diagram is that RTCPeerConnection shields web developers from the myriad complexities that lurk beneath. The codecs and protocols used by WebRTC do a huge amount of work to make real-time communication possible, even over unreliable networks:

  • packet loss concealment
  • echo cancellation
  • bandwidth adaptivity
  • dynamic jitter buffering
  • automatic gain control
  • noise reduction and suppression
  • image 'cleaning'.

The W3C code above shows a simplified example of WebRTC from a signaling perspective. Below are walkthroughs of two working WebRTC applications: the first is a simple example to demonstrate RTCPeerConnection; the second is a fully operational video chat client.

RTCPeerConnection without servers

The code below is taken from the 'single page' WebRTC demo at, which has local and remote RTCPeerConnection (and local and remote video) on one web page. This doesn't constitute anything very useful—caller and callee are on the same page—but it does make the workings of the RTCPeerConnection API a little clearer, since the RTCPeerConnection objects on the page can exchange data and messages directly without having to use intermediary signaling mechanisms.

In this example, pc1 represents the local peer (caller) and pc2 represents the remote peer (callee).


  1. Create a new RTCPeerConnection and add the stream from getUserMedia():

    // servers is an optional configuration object (see TURN and STUN discussion below)
    pc1 = new RTCPeerConnection(servers);
    // ...
    localStream.getTracks().forEach((track) => {
      pc1.addTrack(track, localStream);
    });

  2. Create an offer and set it as the local description for pc1 and as the remote description for pc2. This can be done directly in the code without using signaling, because both caller and callee are on the same page:

    // desc is the offer produced by pc1.createOffer()
    pc1.setLocalDescription(desc).then(() => {
      trace('pc2 setRemoteDescription start');
      pc2.setRemoteDescription(desc).then(() => {
        // ...
      });
    });

  3. Create pc2 and, when the stream from pc1 is added, display it in a video element:

    pc2 = new RTCPeerConnection(servers);
    pc2.ontrack = gotRemoteStream;
    function gotRemoteStream(e) {
      vid2.srcObject = e.streams[0];
    }

RTCPeerConnection plus servers

In the real world, WebRTC needs servers, however simple, so the following can happen:

  • Users discover each other and exchange 'real world' details such as names.
  • WebRTC client applications (peers) exchange network information.
  • Peers exchange data about media such as video format and resolution.
  • WebRTC client applications traverse NAT gateways and firewalls.

In other words, WebRTC needs four types of server-side functionality:

  • User discovery and communication.
  • Signaling.
  • NAT/firewall traversal.
  • Relay servers in case peer-to-peer communication fails.

NAT traversal, peer-to-peer networking, and the requirements for building a server app for user discovery and signaling, are beyond the scope of this article. Suffice to say that the STUN protocol and its extension TURN are used by the ICE framework to enable RTCPeerConnection to cope with NAT traversal and other network vagaries.

ICE is a framework for connecting peers, such as two video chat clients. Initially, ICE tries to connect peers directly, with the lowest possible latency, via UDP. In this process, STUN servers have a single task: to enable a peer behind a NAT to find out its public address and port. (You can find out more about STUN and TURN from the HTML5 Rocks article WebRTC in the real world.)

Finding connection candidates

If UDP fails, ICE tries TCP. If direct connection fails—in particular, because of enterprise NAT traversal and firewalls—ICE uses an intermediary (relay) TURN server. In other words, ICE will first use STUN with UDP to directly connect peers and, if that fails, will fall back to a TURN relay server. The expression 'finding candidates' refers to the process of finding network interfaces and ports.

WebRTC data pathways

WebRTC engineer Justin Uberti provides more information about ICE, STUN and TURN in the 2013 Google I/O WebRTC presentation. (The presentation slides give examples of TURN and STUN server implementations.)
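
The STUN and TURN servers that ICE uses are supplied through the iceServers list in the RTCPeerConnection configuration. The sketch below builds such a configuration. The helper name buildIceConfiguration is hypothetical, and the host names and credentials are placeholders, not real servers.

```javascript
// Build an RTCPeerConnection configuration with one STUN and one TURN server.
function buildIceConfiguration({stunUrl, turnUrl, username, credential}) {
  return {
    iceServers: [
      // STUN: lets a peer behind a NAT discover its public address and port.
      {urls: stunUrl},
      // TURN: relays traffic when a direct connection cannot be made.
      {urls: turnUrl, username, credential},
    ],
  };
}

// Placeholder values; replace them with servers you actually operate.
const iceConfiguration = buildIceConfiguration({
  stunUrl: 'stun:stun.example.org',
  turnUrl: 'turn:turn.example.org',
  username: 'user',
  credential: 'secret',
});

// In a browser: const pc = new RTCPeerConnection(iceConfiguration);
```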

A simple video chat client

A good place to try out WebRTC, complete with signaling and NAT/firewall traversal using a STUN server, is the video chat demo at This app uses adapter.js, a shim to insulate apps from spec changes and prefix differences. For full interop information, see

The code is deliberately verbose in its logging: check the console to understand the order of events. Below we give a detailed walk-through of the code.

If you find this somewhat baffling, you may prefer our WebRTC codelab. This step-by-step guide explains how to build a complete video chat application, including a simple signaling server running on a Node server.

Network topologies

WebRTC as currently implemented only supports one-to-one communication, but could be used in more complex network scenarios: for example, with multiple peers each communicating with each other directly, peer-to-peer, or via a Multipoint Control Unit (MCU), a server that can handle large numbers of participants and do selective stream forwarding, and mixing or recording of audio and video:

Multipoint Control Unit topology example
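
To see why a central server helps as the number of participants grows, compare how many connections each topology needs. The helpers below are a back-of-the-envelope sketch with hypothetical names. They assume one connection per pair of peers in a full mesh and one connection per peer when a central server such as an MCU is used.

```javascript
// Full mesh: every pair of peers needs its own RTCPeerConnection.
function meshConnectionCount(peers) {
  return (peers * (peers - 1)) / 2;
}

// Full mesh: each participant sends its stream to every other peer.
function meshUploadsPerPeer(peers) {
  return Math.max(peers - 1, 0);
}

// Central server: each peer keeps a single connection to the server.
function serverConnectionCount(peers) {
  return peers;
}
```

Under these assumptions, four participants need six connections in a full mesh, with each participant uploading three copies of its stream, compared with four connections in total when a central server is used.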

Many existing WebRTC apps only demonstrate communication between web browsers, but gateway servers can enable a WebRTC app running on a browser to interact with devices such as telephones (aka PSTN) and with VOIP systems. In May 2012, Doubango Telecom open-sourced the sipml5 SIP client, built with WebRTC and WebSocket, which (among other potential uses) enables video calls between browsers and apps running on iOS or Android. At Google I/O, Tethr and Tropo demonstrated a framework for disaster communications 'in a briefcase', using an OpenBTS cell to enable communications between feature phones and computers via WebRTC. Telephone communication without a carrier!