Demystifying WebRTC by Calling Beyoncé

A still of Beyoncé with mascara running down her cheeks as if she's been crying. She's holding a light blue handset to her ear.

You may not have heard of WebRTC, but with the rise of internet-based video applications this year, you’ve probably used it. On the Samsung Internet team we use Whereby for our daily stand-ups, and like many video-calling services it is built on WebRTC. Web Real-Time Communication (WebRTC) is a set of JavaScript web APIs that allow browsers and devices to exchange audio, video and data in real time.

Working with WebRTC can be intimidating and complicated. It looks like just a group of simple JavaScript APIs, but then you realise there’s a lot more to it. ICE candidates, STUN and TURN servers, NAT: there’s a lot here that goes beyond knowing JavaScript. So in this post, I’m going to give brief explainers of these acronyms, what they mean and the role they play within the WebRTC API.

The allure of WebRTC is that it’s peer-to-peer: two peers (devices) can open a connection with each other and send data back and forth without passing it through a central server.

A diagram of devices/peers connected to each other without a server.

It’s like having Beyoncé’s phone number and being able to call her directly, instead of having to call an operator or Jay first to get the number. However, calling Beyoncé isn’t that simple. You’ll have to go through a few authentication steps first to verify who you are and whether you have permission to speak to the Queen Bee 🐝. This is where all those acronyms come into play.


Network Address Translation (NAT)

NAT lets devices on a private network share a public IP address. Every device has its own IP address, but within a home or office network usually only the router has a public one; the other devices connected to the router all have private IP addresses. NAT handles the translation between the two: requests from a device’s private IP address are rewritten to use the router’s public IP address plus a port number, and replies arriving on that port are passed back to the right device. This allows devices to be reached over the internet without each needing a unique public IP address.

When we’re ready to call Beyoncé, we won’t be given her real mobile number; that’s too private and no one trusts us non-wealthy people. Instead, we’ll be given a public number that gets redirected to Bey when we call her, and when Bey calls us it’ll show up as this public number even though she’s using her private mobile number. NAT makes sure the numbers (i.e. device and router IP addresses) are properly translated.
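
To make the translation step concrete, here is a minimal sketch of a router's NAT table as a toy JavaScript class. It is not real networking code: the `ToyNat` name, the starting port of 40000 and the IP addresses (taken from ranges reserved for documentation and private networks) are all assumptions made for illustration.

```javascript
// Toy model of a router doing NAT with port translation.
// Everything here is hypothetical; real routers do this at the packet level.
class ToyNat {
  constructor(publicIp, firstPort = 40000) {
    this.publicIp = publicIp;
    this.nextPort = firstPort;
    this.outbound = new Map(); // "privateIp:port" -> public port
    this.inbound = new Map(); // public port -> { ip, port }
  }

  // Outgoing request: swap the private source address for the public one.
  translateOutbound(privateIp, privatePort) {
    const key = `${privateIp}:${privatePort}`;
    if (!this.outbound.has(key)) {
      const publicPort = this.nextPort++;
      this.outbound.set(key, publicPort);
      this.inbound.set(publicPort, { ip: privateIp, port: privatePort });
    }
    return { ip: this.publicIp, port: this.outbound.get(key) };
  }

  // Incoming reply: find the private device that owns this public port.
  translateInbound(publicPort) {
    return this.inbound.get(publicPort) ?? null;
  }
}

const nat = new ToyNat("203.0.113.7");
const laptop = nat.translateOutbound("192.168.0.10", 5000);
const phone = nat.translateOutbound("192.168.0.11", 5000);

console.log(laptop); // { ip: '203.0.113.7', port: 40000 }
console.log(phone); // { ip: '203.0.113.7', port: 40001 }
console.log(nat.translateInbound(40001)); // { ip: '192.168.0.11', port: 5000 }
```

Both devices appear to the outside world as the same public IP address, and only the port number tells the router which private device a reply belongs to.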

Interactive Connectivity Establishment (ICE)

Calling Beyoncé doesn’t just happen like that. You’ll need to get past security blocks, get her number and have another means of talking to her if the call doesn’t go through. Similarly, when connecting peers using the WebRTC APIs, you’ll need to get past firewalls and other security blocks, retrieve an address for the peer you want to connect to, and be able to fall back to a server if the router doesn’t allow direct connections between peers. ICE is a framework that uses STUN and/or TURN to address these issues and get peer-to-peer working. Let’s look at STUN and TURN in more detail.

Session Traversal Utilities for NAT (STUN)

STUN is the protocol that lets a device discover its public IP address and port, and find out whether anything on the router (e.g. a restrictive NAT or firewall) would prevent a direct connection to it. The STUN server is internet-connected and receives requests from the peer; in response it sends back the public IP address and port it saw the request come from. You only need the STUN server while the connection is being set up; once the connection is open, the data can flow freely between the peers.

In our Beyoncé example this would be the job of a middleman, maybe Bey’s hairdresser is your cousin’s fiancé. He can give you Bey’s public phone number and tell you the kind of security she’s got set up on her phone. If you’re lucky, the first time you call her you’ll get through! But then while you’re talking the connection drops or the next time you call her, you can’t get through. This is where TURN is helpful.
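
In the browser, the address a STUN server reports back shows up as an ICE candidate of type `srflx` (server reflexive). The sketch below parses one such candidate line to pull out the public address. The `parseCandidate` helper, the sample line and the `stun.example.org` URL are all made up for illustration; the browser-only part is guarded so the parsing can also be tried in Node.

```javascript
// Parse one ICE candidate line into its main fields (hypothetical helper).
function parseCandidate(line) {
  const parts = line.replace(/^candidate:/, "").trim().split(/\s+/);
  const [foundation, component, protocol, priority, address, port, typKeyword, type, ...rest] = parts;
  if (typKeyword !== "typ") throw new Error("not a candidate line");

  const parsed = {
    foundation,
    component: Number(component),
    protocol: protocol.toLowerCase(),
    priority: Number(priority),
    address, // for srflx candidates: the public address seen by the STUN server
    port: Number(port),
    type, // "host", "srflx", "prflx" or "relay"
  };

  // Optional extensions such as raddr/rport follow as key/value pairs.
  for (let i = 0; i + 1 < rest.length; i += 2) {
    if (rest[i] === "raddr") parsed.relatedAddress = rest[i + 1];
    if (rest[i] === "rport") parsed.relatedPort = Number(rest[i + 1]);
  }
  return parsed;
}

// A made-up sample line in the shape browsers produce.
const sample =
  "candidate:842163049 1 udp 1677729535 203.0.113.7 40000 typ srflx raddr 192.168.0.10 rport 5000";
const reflexive = parseCandidate(sample);
console.log(reflexive.type, reflexive.address, reflexive.port); // srflx 203.0.113.7 40000

// Browser only: ask a (placeholder) STUN server and log whatever comes back.
if (typeof RTCPeerConnection !== "undefined") {
  const pc = new RTCPeerConnection({ iceServers: [{ urls: "stun:stun.example.org:3478" }] });
  pc.onicecandidate = (event) => {
    if (event.candidate && event.candidate.candidate) {
      console.log(parseCandidate(event.candidate.candidate));
    }
  };
  pc.createDataChannel("probe"); // gives the connection something to negotiate
  pc.createOffer().then((offer) => pc.setLocalDescription(offer));
}
```

Here `raddr` and `rport` hold the private address behind the NAT, while the main address and port are the public ones a remote peer would try to reach.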

Traversal Using Relays around NAT (TURN)

The TURN protocol is usually used alongside STUN, for situations where you’re unable to open a connection with STUN alone. A TURN server only relays traffic for peers it trusts, which in practice usually means peers that present valid credentials. This is a more involved process than using STUN and so is technically (and financially) more expensive; however, it provides a good solution if you can’t open a direct connection through the router. Essentially, a TURN server is set up between the peers and a connection to the TURN server is opened; data packets are then sent to the TURN server, which relays them to the receiving peer.

So instead of the hairdresser giving you her number and phone security details, you’d call him, he’d call Bey, and he’d pass messages between the two of you. This is obviously not ideal but acts as a good fallback in the WebRTC world, and fallbacks make applications more robust, so it’s a good idea to implement both STUN and TURN.
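
As a rough sketch of what "implementing both" can look like, the snippet below builds the configuration object that `RTCPeerConnection` accepts, with one STUN entry and one TURN entry. The `buildIceConfig` helper, the server URLs and the credentials are placeholders invented for this example, not real servers.

```javascript
// Build an RTCPeerConnection configuration with both STUN and TURN.
// The helper name, URLs and credentials are placeholders for illustration.
function buildIceConfig({ stunUrl, turnUrl, username, credential, relayOnly = false }) {
  return {
    iceServers: [
      { urls: stunUrl }, // STUN needs no credentials
      { urls: turnUrl, username, credential }, // TURN relays only for known users
    ],
    // "all" allows direct, STUN-derived and relayed candidates;
    // "relay" restricts the connection to TURN relays only.
    iceTransportPolicy: relayOnly ? "relay" : "all",
  };
}

const config = buildIceConfig({
  stunUrl: "stun:stun.example.org:3478",
  turnUrl: "turn:turn.example.org:3478",
  username: "demo-user",
  credential: "demo-secret",
});
console.log(config.iceTransportPolicy); // all

// Browser only: hand the configuration to a new peer connection.
if (typeof RTCPeerConnection !== "undefined") {
  const pc = new RTCPeerConnection(config);
  console.log(pc.getConfiguration().iceServers.length);
}
```

Passing `relayOnly: true` is a handy way to test that the TURN fallback actually works, since it stops the browser from quietly succeeding over a direct path.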


How it All Fits Together

Essentially, we use the ICE framework to manage our two protocols, STUN and TURN, and with the framework we can say when to use STUN and when to use TURN. In my next post I’ll delve a little deeper into when to use STUN vs TURN and how to access these servers.
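
One way to picture how ICE chooses between the two is through candidate priorities. The sketch below uses the priority formula and the recommended type preferences from the ICE specification (RFC 8445) to rank three candidate types. The `candidatePriority` helper and the candidate list are made up for illustration, and this is a simplification: a real ICE agent ranks pairs of candidates and runs connectivity checks before settling on one.

```javascript
// Type preferences recommended by the ICE specification (RFC 8445); higher wins.
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
// Hypothetical helper; the defaults assume a single interface and component 1.
function candidatePriority(type, localPreference = 65535, componentId = 1) {
  return TYPE_PREFERENCE[type] * 2 ** 24 + localPreference * 2 ** 8 + (256 - componentId);
}

// Illustrative candidates, deliberately listed out of order.
const candidates = [
  { type: "relay", via: "TURN server" },
  { type: "host", via: "local network address" },
  { type: "srflx", via: "public address learned from STUN" },
];

const ranked = candidates
  .map((c) => ({ ...c, priority: candidatePriority(c.type) }))
  .sort((a, b) => b.priority - a.priority);

console.log(ranked.map((c) => c.type)); // [ 'host', 'srflx', 'relay' ]
```

The ordering matches the story so far: try the direct route first, then the public address found through STUN, and keep the TURN relay as the last resort.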

Further Reading