High call quality with a stable connection

30 July, 2019
7 min

How can we achieve real-time conference scalability with minimal delays?

Introduction

Students get an education abroad without leaving their homes. More and more specialists are sharing knowledge, providing services and even carrying out research activities using remote communication technologies. With the growing demand for accessible channels of communication and interaction, the number of technologies and products offered is progressively increasing. 
Remote communication technologies destroy borders, allowing industry professionals to effectively expand their area of activity around the world.
The demand for remote communication technologies is growing daily. Companies are creating distributed employee teams scattered throughout the world. Remote job offers with decent pay are no longer a surprise to anybody in 2019.
With the development of WEB technologies, the idea of software that runs directly in the browser is gaining popularity. The benefit for users is obvious: there is no need to install applications. And for developers, there is no longer a need to worry about cross-platform compatibility. WEB technologies are also successfully used in IoT, running on a variety of devices including TVs, cars, and even microwaves.
Creating an effective communication channel for the WEB is a relevant task, albeit a difficult one. All leading browser vendors have been working on solving it for the past decade. The terms “online conference”, “video call”, and “shared screen demo” are becoming more recognizable and common these days, and it is thanks to these advances that the anticipated “web-” prefix now appears in front of them.

One-To-Many – easy

Transferring audio/video through the World Wide Web is no longer a complex task. 

To transfer media via the Internet, it is necessary:
– to receive a signal from a capture device (camera/microphone),
– to convert the signal to a compact form (encode/compress),
– to transfer the received data over the network,
– to unpack the received data (decode).
Encoding and decoding are the most resource-intensive steps in this chain, consuming a considerable amount of CPU time. This processing is necessary to keep the data small while retaining a sufficient amount of detail, and to allow the packed format to be quickly converted back into a playable one.
Depending on many factors, different codecs can be used to carry out the process described above.
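
To get a feel for why encoding is indispensable, here is a back-of-the-envelope calculation (the bitrates are illustrative, typical values rather than measurements):

```javascript
// Back-of-the-envelope: why raw video cannot be sent as-is.
// Raw 1280x720 @ 30 fps in YUV 4:2:0 uses 12 bits per pixel.
const width = 1280, height = 720, fps = 30, bitsPerPixel = 12;
const rawMbps = (width * height * fps * bitsPerPixel) / 1e6; // megabits per second
// A typical H264 stream at this resolution fits in roughly 2.5 Mbps.
const encodedMbps = 2.5;
const compressionRatio = rawMbps / encodedMbps;
console.log(rawMbps.toFixed(0), compressionRatio.toFixed(0)); // 332 133
```

Even a modest 720p stream is two orders of magnitude too large to send uncompressed, which is why every real-time pipeline pays the CPU cost of a codec.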

The most common codec for video transmission is the patent-encumbered `H264` codec.

H264 achieves a very high compression ratio with relatively small quality loss. Most hardware manufacturers embed an H264 encoder into their devices.

H264 compression is a very complex process that takes up a substantial amount of resources, while unpacking is comparatively simple. H264 owes much of its spread on the WEB to Flash, which relied on it for video delivery. However, Flash was not designed for real-time communication, so building real-time communication on technologies derived from it is extremely difficult.


An open-source alternative is the VP8/VP9 codec family. Unlike H264, these codecs were designed with real-time delivery in mind and therefore use a simpler packaging process.


The scheme of transmitting a single media signal to multiple recipients is relatively simple and scales excellently. Having received a single stream on the server, you only need to provide access to it over any popular transport (for example, HTTP). The availability of such a signal can be expanded using a CDN, VDN, or any similar service. There is no time synchronization in this case: delays are acceptable, since they do not affect the perception of the content. This method is well suited for webinars and streaming.
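
The linear growth of server egress is what makes a CDN attractive here; a rough sketch (the numbers are illustrative):

```javascript
// Sketch: server egress for a one-to-many broadcast grows linearly
// with the audience, which is why a CDN is used to fan the stream out.
function originEgressMbps(bitrateMbps, viewers) {
  return bitrateMbps * viewers; // one copy of the stream per viewer
}
console.log(originEgressMbps(2.5, 1000)); // 2500
```

A single origin serving a 2.5 Mbps stream to 1000 viewers already needs 2.5 Gbps of upload, so the fan-out is delegated to edge servers.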

Many-To-Many – hell


Whenever the signal needs to be transmitted in both directions, the scalability and advantages of the previous method disappear. In two-way communication mode, time synchronization is a crucial component, and any delay is a critical indicator.

From a performance point of view, each user must simultaneously encode (compress) their own signal while sending it to the other party, and decode (unpack) the incoming signal. Given the need for low latency, the compression/decompression process must be extremely simplified, which forces a compromise between growing data size and preserving quality.

A larger stream is more difficult to transfer over the network, while reducing its size under rapid-encoding conditions causes considerable quality loss. Another problem is the transport itself: the bandwidth of the Internet connection varies for each participant of the call, and that means yet another source of delay.

Scaling multi-party communication is rather problematic: distributing the signals on the server side while maintaining time synchronization and not introducing delays requires near-alchemical effort.

Limitations

Each call participant has a different Internet connection bandwidth, and each participant simultaneously sends and receives data during the call. This means that, for a single user, the number of received signals and the number of recipients of their own signal are strictly limited by the “width” of their Internet channel.

How do we live with that? Special solutions are needed at every stage of video signal transmission to battle these issues:

1. The video codec must be optimized for fast signal compression. Unpacking should be able to process a damaged signal or a signal with lost packets.
2. At the transport level, it is advisable to abandon strict control of packet delivery. Packets must be delivered as quickly as possible, without waiting for the previous ones: they are simply sent off. The UDP protocol is very well suited for this purpose. To minimize delays at the transport level, you can use a P2P connection (a direct connection between two participants), bypassing the server.

All these challenges are well resolved by WebRTC.

WebRTC is a set of solutions and technologies. It includes APIs, technologies, and protocols for optimal transport, coding, error correction, delay control, and throughput management. In addition, WebRTC contains solutions for network discovery as well as NAT and firewall traversal.

3. Transmission topology. In order to connect a user to five other users, their signal must be delivered to each of them while a signal is simultaneously received from each of them.


This topology is called a “full mesh” and has very rigid limits: the number of participants in such a call is strictly limited by the channel width of each user. On the other hand, this solution uses direct connections between users and eliminates server processing delays. The approach is well suited for conferences of up to 5 people.
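
The per-client cost of the mesh can be sketched as follows (1 Mbps per stream is an illustrative figure):

```javascript
// Sketch: in a full mesh every participant uploads their stream to each
// of the other n-1 peers and downloads n-1 streams in return.
function meshLoad(participants, streamMbps) {
  const peers = participants - 1;
  return {
    uploadMbps: peers * streamMbps,
    downloadMbps: peers * streamMbps,
  };
}
// With 1 Mbps per stream, a 5-way call already needs 4 Mbps of upload:
console.log(meshLoad(5, 1)); // { uploadMbps: 4, downloadMbps: 4 }
```

Upload is usually the scarcer resource on consumer connections, which is what caps mesh calls at a handful of participants.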

To avoid sending the same signal to each of the participants, it is better to send the signal to one place and distribute it to the users from there. Each user then sends their signal once and receives one stream per participant from the server. Thus, the number of simultaneous conference participants increases several times.
This approach is called SFU (Selective Forwarding Unit).

If the video signals are unpacked on the server side, merged together, and sent back to the recipients, each user can send their signal to the server once and receive a single signal from it that includes the signals of all the other users.


This approach raises the number of users the client computer can display within the limits of its performance and network. However, it has major drawbacks. One of them is the need for media processing (unpacking/packing) on the server side, which creates delay, breaks synchronization, and requires significant resources to maintain this kind of infrastructure.
This approach is called MCU (Multipoint Control Unit).

Speaking of scaling, MCU is the best topology (the user always receives one stream and sends one, under any circumstances), while SFU is the optimal one (it does not require complex processing on the server).
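
The trade-off between the three topologies, from the client's point of view, can be summarized in a few lines (server-side cost is deliberately left out of this sketch):

```javascript
// Sketch: number of streams a single client sends and receives
// in an n-way call, per topology.
function clientStreams(topology, n) {
  switch (topology) {
    case 'mesh': return { up: n - 1, down: n - 1 }; // direct to every peer
    case 'sfu':  return { up: 1,     down: n - 1 }; // server forwards copies
    case 'mcu':  return { up: 1,     down: 1     }; // server mixes into one
  }
}
console.log(clientStreams('sfu', 10)); // { up: 1, down: 9 }
```

MCU looks ideal from the client side, but it hides the mixing cost and delay inside the server; SFU keeps the server cheap at the price of n-1 downloads per client.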

To remove the participant limit of an SFU, you can limit the number of video signals delivered to each participant. Suppose a user’s bandwidth allows them to send their own signal and receive 5 video signals from other participants. From the points of view of screen space and conversation stability, a larger number of video signals would be quite problematic to display anyway, and such a format would be difficult to perceive.

So far, so good. But there is another problem: if the number of incoming videos is limited, most of the participants in the call remain invisible to the user.
To solve this, you can analyze the audio activity of all conference participants in real time and deliver the video of whoever is speaking at the moment first.
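
A minimal sketch of such a selection, assuming each participant's audio level is already measured and normalized to [0, 1] (the names and levels below are made up):

```javascript
// Sketch: pick the N most audible participants so that only their
// video streams are delivered to this client.
function activeSpeakers(participants, maxVideos) {
  return participants
    .slice()                                    // do not mutate the input
    .sort((a, b) => b.audioLevel - a.audioLevel) // loudest first
    .slice(0, maxVideos)
    .map(p => p.id);
}
const levels = [
  { id: 'alice', audioLevel: 0.82 },
  { id: 'bob',   audioLevel: 0.05 },
  { id: 'carol', audioLevel: 0.41 },
];
console.log(activeSpeakers(levels, 2)); // [ 'alice', 'carol' ]
```

A real implementation would also smooth the levels over time to avoid the visible set of videos flickering on every syllable.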

Thus, the client-side limit on the number of participants disappears.

Optimization

Since Internet bandwidth is different for each participant and may fluctuate, the number of incoming videos may vary from user to user. WebRTC offers several mechanisms for determining a user’s bandwidth: send-side and receive-side bandwidth estimation, among others, make it possible to set the optimal number of delivered videos for each user dynamically, in real time. Assuming that normally only one user is actively speaking, it is logical to deliver that user’s video in higher quality.
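
A simplified model of such a budget, assuming one high-quality speaker stream plus low-bitrate thumbnails (all bitrates are illustrative; a real implementation would take its estimate from WebRTC statistics):

```javascript
// Sketch: derive how many remote videos to subscribe to from a
// downlink bandwidth estimate, reserving room for the active
// speaker's higher-quality stream first.
function videoBudget(downlinkMbps, speakerMbps, thumbnailMbps) {
  const remaining = downlinkMbps - speakerMbps;
  if (remaining < 0) return { speaker: false, thumbnails: 0 };
  return { speaker: true, thumbnails: Math.floor(remaining / thumbnailMbps) };
}
console.log(videoBudget(5, 2, 0.5)); // { speaker: true, thumbnails: 6 }
```

Re-running this whenever the estimate changes lets the client add or drop thumbnail subscriptions dynamically instead of degrading every stream at once.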

Simulcast 

WebRTC allows you to encode several versions of the same video signal simultaneously, and the receiving party can choose which one to receive. Thus, the speaker’s signal can be delivered in the highest quality, while a lower-quality signal is delivered from the other participants.
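
In browsers this is configured through the `sendEncodings` option of `RTCPeerConnection.addTransceiver`; here is a sketch of one possible three-layer configuration (the `rid` names, bitrates, and scaling factors are illustrative choices, not prescribed values):

```javascript
// Sketch: a simulcast configuration as it would be passed to
// RTCPeerConnection.addTransceiver(track, { sendEncodings }).
const sendEncodings = [
  { rid: 'hi',  maxBitrate: 1_500_000 },                         // full size, for the speaker view
  { rid: 'mid', maxBitrate: 500_000, scaleResolutionDownBy: 2 }, // quarter area
  { rid: 'lo',  maxBitrate: 150_000, scaleResolutionDownBy: 4 }, // thumbnail
];
```

On the receiving side, the SFU simply forwards whichever layer fits each subscriber’s bandwidth, so no re-encoding is needed on the server.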

Conclusion 
So how can we achieve real-time conference scalability with minimal delays?
Well, indeed, the creation of high-quality two-way communication is quite a challenge.
It is even harder to implement this for the WEB. In addition to all the issues mentioned above, you have to account for many browser peculiarities, cross-browser compatibility, all sorts of system resource limitations, and so on. WebRTC provides a very rich set of tools while leaving you the choice of whether or not to use them.
If you
– manage all available means of communication control properly,
– find a compromise between infrastructure costs and scalability,
– use the bandwidth of the user’s Internet channel to maximum benefit,
then you can achieve really high quality while maintaining stability and extensibility.

Tatiana Romanova
Content Architect at Proficonf