In the world of web applications, everyone loves the scenario where one app requests an action and another app responds to that request afterward. This basic yet efficient pattern is, in most cases, everything that your web applications need to function properly. However, what happens if one app cannot wait for another app to finish its action and then send a response? In instant messaging and video conference calls, for example, both apps need to be perfectly synchronized with one another so that they can communicate in real-time. This is where real-time communication web technologies come into the spotlight.
Real-time communication (RTC) web technology is a term used to refer to any web technology that enables live communication between two or more peers over the Internet with minimal transmission delay. Lately, the leader in this field has been a modern, promising and powerful technology called WebRTC.
What is WebRTC
WebRTC is a free, open-source technology that provides browsers and mobile applications with real-time communication (RTC) capabilities through simple application programming interfaces (APIs). It allows direct peer-to-peer audio and video communication without the need to install any additional plugins or native apps. Google released it in May 2011 as an open-source project for browser-based real-time communication. Currently, WebRTC is fully stable, with its last release (v1.0) made in May 2018. It’s standardized through the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF) and is supported by Apple, Google, Microsoft, Mozilla, and Opera.
How does WebRTC work
WebRTC comes with a pre-defined messaging algorithm, which has to be configured in the software solution in order for WebRTC communication to start and work normally. The image below illustrates what happens when a WebRTC connection is established.
To sum up, there are four main steps in starting a WebRTC session:
- The peers need to find their public IP address using a STUN – Session Traversal Utilities for NAT – server.
- After the peers have found their public IP address, client A needs to send an ‘Offer SDP’ to client B through a signaling server, after which client B needs to validate the offer and send an ‘Answer SDP’ back to client A. SDP – Session Description Protocol – is a format for describing streaming media communication parameters; it is used to announce a session and to validate its parameters.
- After the session is announced and validated, client A needs to send ‘ICE Candidates’ to client B through a signaling server, after which client B needs to validate the ICE candidates and reply with the same type of message. ICE – Interactive Connectivity Establishment – is a method used to gather all available candidates (IP address and port pairs through which a peer may be reachable), which are then passed to the clients.
- When the session is fully validated and the ICE candidates are ready, the WebRTC audio and video communication can start, either directly or through a TURN – Traversal Using Relays around NAT – server.
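The four steps above imply a fixed message order on the signaling channel. The sketch below models that order as a small state machine. It is a toy model rather than the WebRTC API itself, and the names `SignalMessage`, `SessionState` and `advance` are illustrative assumptions. It follows the order described in the text; real applications often start exchanging candidates as soon as they are gathered.

```typescript
// Messages relayed by the signaling server (shapes are illustrative).
type SignalMessage =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string }
  | { kind: "candidate"; candidate: string };

// Where a session currently is in the handshake.
type SessionState = "new" | "offered" | "answered" | "connecting";

// Return the next state, or throw if the message arrives out of order.
function advance(state: SessionState, msg: SignalMessage): SessionState {
  if (msg.kind === "offer") {
    if (state !== "new") throw new Error("offer is only valid at the start");
    return "offered";
  }
  if (msg.kind === "answer") {
    if (state !== "offered") throw new Error("answer must follow an offer");
    return "answered";
  }
  // Remaining case: an ICE candidate, accepted once the answer is in.
  if (state === "new" || state === "offered") {
    throw new Error("candidate received before the session was answered");
  }
  return "connecting";
}
```

Feeding the messages of a normal call through `advance` in order (offer, answer, then any number of candidates) ends in the `"connecting"` state, while an answer without a preceding offer is rejected.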
Under the hood, WebRTC uses several different protocols in order to provide reliable audio and video communication.
As the image above shows, WebRTC uses UDP – User Datagram Protocol – at the transport layer, mainly because UDP favors low latency over guaranteed delivery, which is what real-time communication in modern browsers needs. Apart from that, WebRTC uses the aforementioned ICE, STUN and TURN protocols to establish and maintain a peer-to-peer connection over UDP. Finally, SCTP – Stream Control Transmission Protocol – and SRTP – Secure Real-time Transport Protocol – are used to multiplex the different streams, provide security and congestion control, and provide partially reliable delivery on top of UDP.
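To make these roles a little more concrete, the sketch below parses the text form of an ICE candidate (the `candidate:` attribute carried in SDP). The transport field typically reads `udp`, and the `typ` field tells how the address was obtained: `host` for a local interface, `srflx` for an address learned through STUN, and `relay` for one allocated on a TURN server. The function name, the returned shape and the sample addresses are assumptions made for illustration.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";

interface ParsedCandidate {
  foundation: string;
  component: number; // 1 = RTP, 2 = RTCP
  transport: string; // "udp" or "tcp", lower-cased
  priority: number;
  address: string;
  port: number;
  type: CandidateType;
}

// Parse "candidate:<foundation> <component> <transport> <priority>
// <address> <port> typ <type> ..."; returns null if the line does not match.
function parseCandidate(line: string): ParsedCandidate | null {
  const prefix = "candidate:";
  const start = line.indexOf(prefix);
  if (start < 0) return null;
  const fields = line.slice(start + prefix.length).trim().split(/\s+/);
  if (fields.length < 8 || fields[6] !== "typ") return null;
  const type = fields[7];
  if (type !== "host" && type !== "srflx" && type !== "prflx" && type !== "relay") {
    return null;
  }
  return {
    foundation: fields[0],
    component: Number(fields[1]),
    transport: fields[2].toLowerCase(),
    priority: Number(fields[3]),
    address: fields[4],
    port: Number(fields[5]),
    type,
  };
}

// Sample line with documentation-range addresses (not real endpoints).
const sample = parseCandidate(
  "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 10.0.0.5 rport 46154"
);
```

Here `sample` describes a UDP candidate of type `srflx`, that is, a public address discovered with the help of a STUN server.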
How to implement WebRTC
As seen in the explanation of the WebRTC messaging algorithm, in order to implement WebRTC in a software solution, three main things need to be configured:
- The signaling server
- The STUN and TURN servers
- The client-side apps (web or mobile)
The signaling server is a necessary server-side app that the client apps use to communicate with one another. It uses network sockets technology (WebSocket, socket.io or a similar solution) to organize the clients into specific groups (usually called rooms), in order to listen to their messages and redirect them to the right recipient. Implementation of the signaling server is usually pretty straightforward, but it has to be synchronized with the messaging algorithm implemented in the client apps.
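The room-and-forward idea can be sketched independently of any particular socket library. The class below keeps clients grouped by room and forwards each message to the other members; the class name, method names and the `Send` callback are assumptions for illustration, and wiring it to WebSocket or socket.io connections is left out.

```typescript
// Delivers a payload to one connected client, for example a thin
// wrapper around that client's socket send method.
type Send = (payload: string) => void;

class SignalingRooms {
  // room name -> (client id -> send function)
  private rooms = new Map<string, Map<string, Send>>();

  join(room: string, clientId: string, send: Send): void {
    let members = this.rooms.get(room);
    if (!members) {
      members = new Map<string, Send>();
      this.rooms.set(room, members);
    }
    members.set(clientId, send);
  }

  leave(room: string, clientId: string): void {
    const members = this.rooms.get(room);
    if (!members) return;
    members.delete(clientId);
    // Drop empty rooms so the map does not grow without bound.
    if (members.size === 0) this.rooms.delete(room);
  }

  // Forward a message from one client to every other client in the room.
  // Returns the number of peers the message was handed to.
  relay(room: string, fromId: string, payload: string): number {
    const members = this.rooms.get(room);
    if (!members || !members.has(fromId)) return 0;
    let delivered = 0;
    members.forEach((send, id) => {
      if (id !== fromId) {
        send(payload);
        delivered++;
      }
    });
    return delivered;
  }
}
```

The server does not need to understand the offers, answers or candidates it forwards; it only has to deliver them to the right room members, which is why it must agree with the clients on message order but not on message content.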
STUN and TURN servers
STUN servers are lightweight servers used to help clients find their public IP address. They’re easy to set up and are also commonly accessible for free (e.g. Google STUN servers are fully free to use). When peers learn their public IP address, they can continue with the messaging process, after which a connection is established and media transfer can start. And what about the TURN servers, you may ask? In a perfect world, STUN servers would always be enough to establish a WebRTC connection. However, in about 30 per cent of cases, finding the public IP address of a peer through a STUN server isn’t enough to establish connectivity. This mostly happens because of strict firewall rules or symmetric NAT (Network Address Translation) in the network configuration of the peers. In those cases, a TURN server is needed to establish the connection and to serve as a relay that forwards media data between peers.
TURN servers can be custom implemented (either fully or by using open-source solutions, such as coTURN) or can be bought as a service (e.g. Xirsys). It’s recommended that you include several STUN servers, as well as a few TURN servers, in a WebRTC configuration. Listing more servers gives the connection more fallback options and so tends to make it more stable. However, it’s also worth remembering that, beyond a certain point, listing too many can make the connection process slower.
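As a rough sketch, the server list could be assembled as follows. The resulting object has the same shape as the `iceServers` part of the configuration passed to `RTCPeerConnection` in a browser. The helper name `buildIceConfig`, the `maxPerKind` cap of two, and the TURN host and credentials are assumptions for illustration; only the Google STUN address is a commonly used public server.

```typescript
// Shape of one entry in the browser's `iceServers` list.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface IceConfig {
  iceServers: IceServer[];
}

interface TurnServer {
  url: string;
  username: string;
  credential: string;
}

// Combine STUN and TURN entries, capping each kind so the list stays
// short (the cap value is an assumption, not a documented limit).
function buildIceConfig(
  stunUrls: string[],
  turnServers: TurnServer[],
  maxPerKind: number = 2
): IceConfig {
  const stun: IceServer[] = stunUrls
    .slice(0, maxPerKind)
    .map((url) => ({ urls: url }));
  const turn: IceServer[] = turnServers.slice(0, maxPerKind).map((t) => ({
    urls: t.url,
    username: t.username,
    credential: t.credential,
  }));
  return { iceServers: stun.concat(turn) };
}

// Example: one public STUN server plus a placeholder TURN server.
// In a browser this would be passed as `new RTCPeerConnection(exampleConfig)`.
const exampleConfig = buildIceConfig(
  ["stun:stun.l.google.com:19302"],
  [{ url: "turn:turn.example.com:3478", username: "demo-user", credential: "demo-pass" }]
);
```

Capping each kind separately keeps at least one TURN entry in the list even when many STUN servers are supplied, so the relay fallback is not accidentally dropped.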
Client-side apps (web or mobile)
The final step in WebRTC implementation is the configuration of the client-side apps (web or mobile apps). In programming the client-side app, it’s necessary to configure two major things for WebRTC to fully function:
- Getting the client-specific user media – it consists of using a few pre-defined WebRTC APIs in program code. The developers of WebRTC made it really simple for programmers to implement this part, so a few lines of code per technology should do the trick.
- Defining the communication through sockets – it means creating a messaging algorithm in program code, which will allow each peer to go through each step of the previously described WebRTC messaging algorithm in the right order.
When the communication through sockets is correctly defined and synchronized with the signaling server implementation, the client only needs to join the socket group, get the user media from the device and start the messaging pattern through the signaling server. If everything is configured correctly, the WebRTC session should automatically initialize and the call can start.
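The caller's side of that sequence might look roughly like the sketch below. To keep it runnable outside a browser, the browser pieces are passed in as dependencies: in a real page, `getUserMedia` would typically wrap `navigator.mediaDevices.getUserMedia({ audio: true, video: true })` and `peer` would be an `RTCPeerConnection`. The `CallDeps` shape and the `joinRoom` and `sendOffer` helpers are hypothetical and stand in for whatever the signaling client provides.

```typescript
// Minimal structural types covering only what this sketch uses; the
// method names mirror the browser's MediaStream and RTCPeerConnection.
interface MediaLike {
  getTracks(): unknown[];
}

interface PeerLike {
  addTrack(track: unknown, stream: MediaLike): unknown;
  createOffer(): Promise<{ type: string; sdp?: string }>;
  setLocalDescription(desc: { type: string; sdp?: string }): Promise<void>;
}

interface CallDeps {
  joinRoom: (room: string) => void;       // hypothetical signaling helper
  getUserMedia: () => Promise<MediaLike>; // wraps the browser media API
  peer: PeerLike;
  sendOffer: (sdp: string) => void;       // hypothetical signaling helper
}

// Caller side: join the room, capture media, then send the offer.
async function startCall(room: string, deps: CallDeps): Promise<void> {
  deps.joinRoom(room);
  const stream = await deps.getUserMedia();
  // Attach every captured track so it is described in the offer.
  for (const track of stream.getTracks()) {
    deps.peer.addTrack(track, stream);
  }
  const offer = await deps.peer.createOffer();
  await deps.peer.setLocalDescription(offer);
  deps.sendOffer(offer.sdp || "");
}
```

The callee's side mirrors this: it applies the received offer as the remote description, creates an answer, and sends it back through the same signaling channel.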
Bearing in mind everything that’s been said so far, we can conclude that it is no accident that WebRTC is the leader in real-time communication technologies. The main advantages of WebRTC over similar technologies are its simplicity of implementation, predefined security on multiple layers, the possibility of implementing cross-platform solutions that connect both web browsers and mobile operating systems, and, last but not least, its ease of use for end users. Whether it’s a simple web chat between two clients on the same platform or a complicated real-time communication solution between multiple clients on different platforms, WebRTC is the first choice of a large number of IT experts for a reason, and it will be very interesting to see how the further development of this technology will improve our understanding of real-time communication as we know it today.