Data Synchronization in Real Time: An Evolution

Contributor - 31 March 2020
HyperText Transfer Protocol (HTTP) is the most widely used application-layer protocol in the Open Systems Interconnection (OSI) model. Traditionally, it was built to transfer text or media containing links to other similar resources, between a client that ordinary users could interact with and a server that provided those resources. Clicking on a link usually resulted in the current page being unloaded from the client and an entirely new page being loaded. As content across pages became largely repetitive with only minor differences, engineers started looking for a way to update only part of the page instead of reloading it entirely.
This was when XMLHttpRequest, and AJAX along with it, was born, supporting the transfer of data in formats like XML or JSON rather than full HTML pages. But throughout this evolution, HTTP remained a stateless protocol in which the onus lay on the client to initiate a request to the server for any data it required.
When exponential growth in the volume of data exchanged on the internet led to applications spanning multiple business use cases, the need arose to fetch this data in real time rather than waiting for the user to request a page refresh. That is the problem we address here. There are different protocols and solutions available for syncing data between client and server, and for keeping data updated between a third-party server and our own server. We are limiting the scope to real-time synchronization between a client application and a data server.
Without loss of generality, we assume that our server runs on a cloud platform, with several instances behind a load balancer. Without going into the details of how this distributed system maintains a single source of new data, we assume that whenever new data arrives, all server instances are aware of it and access it from the same source. We will now discuss four technologies that solve the real-time data problem, namely Polling, Long Polling, Server-Sent Events, and WebSockets, and compare them in terms of ease of implementation on both the client side and the server side.
Polling is a mechanism in which a client application, such as a web browser, repeatedly asks the server for new data. These are traditional HTTP requests that pull data from the server via XMLHttpRequest objects. The only difference is that we don't rely on the user to perform any action to trigger the request; instead, the client keeps sending requests to the server at a fixed interval. As soon as any new data is available on the server, the next incoming request is answered with that data.
Figure 1: Polling
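To make the flow concrete, here is a minimal client-side polling sketch in TypeScript, using the fetch API as a modern stand-in for XMLHttpRequest. The /api/updates endpoint, the five-second interval, and the JSON response shape are assumptions made purely for illustration.

```ts
// Minimal polling sketch: ask a hypothetical /api/updates endpoint every 5 seconds.
// Endpoint name and response shape are assumptions for illustration only.
const POLL_INTERVAL_MS = 5000;

async function poll(): Promise<void> {
  try {
    const response = await fetch("/api/updates");
    if (response.ok) {
      const data = await response.json();
      // Render or merge the new data into the page here.
      console.log("New data:", data);
    }
  } catch (err) {
    console.error("Polling request failed:", err);
  }
}

// Fire the request periodically, regardless of whether anything has changed.
setInterval(poll, POLL_INTERVAL_MS);
```

Note that requests go out on every tick even when the server has nothing new, which is exactly the waste the later techniques try to avoid.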
As the name suggests, long polling is largely equivalent to the basic polling described above: it is still a client pull of data, making an HTTP request to the server using the XMLHttpRequest object. The only difference is that the client now expects the server to keep the connection open until it either has new data to respond with or the TCP connection times out. The client does not initiate a new request until the previous one has been responded to.
Figure 2: Long Polling
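A minimal client-side long-polling sketch might look like the following, again using fetch as a stand-in for XMLHttpRequest. The /api/long-poll endpoint and the one-second back-off are assumptions for illustration; the key point is that a new request is issued only after the previous one completes.

```ts
// Minimal long-polling sketch: the hypothetical /api/long-poll endpoint holds the
// request open until new data exists (or the connection times out), and the client
// re-issues the request as soon as the previous one finishes.
async function longPoll(): Promise<void> {
  try {
    const response = await fetch("/api/long-poll");
    if (response.ok) {
      const data = await response.json();
      console.log("New data:", data);
    }
  } catch (err) {
    // Network error or timeout: back off briefly before reconnecting.
    await new Promise((resolve) => setTimeout(resolve, 1000));
  } finally {
    // Only after the previous request completes do we start the next one.
    void longPoll();
  }
}

void longPoll();
```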
Server-Sent Events (SSE) follow the principle of the server pushing data rather than the client polling for it. The communication still follows standard HTTP. The client initiates a request, and after the TCP handshake is done, the server informs the client that it will be providing a stream of text data. Both the browser and the server agree to keep the connection alive for as long as possible; in fact, the server never closes the connection on its own, while the client can close it once it no longer needs new data. Whenever new data occurs on the server, it is streamed in text format as a new event. If the SSE connection is ever interrupted because of network issues, the browser immediately initiates a new SSE request.
Figure 3: Server-Sent Events
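On the client, SSE is consumed through the browser's EventSource interface. The sketch below assumes a hypothetical /api/events endpoint that responds with Content-Type: text/event-stream and sends JSON in each event; those details are illustrative, not prescribed by the article.

```ts
// Minimal SSE sketch using the browser's EventSource interface.
// The /api/events endpoint is an assumption; the server must keep the
// connection open and stream "data: ..." lines as text/event-stream.
const source = new EventSource("/api/events");

// Fired for every event the server streams over the open connection.
source.onmessage = (event: MessageEvent) => {
  const data = JSON.parse(event.data);
  console.log("New data:", data);
};

// The browser reconnects automatically after network interruptions;
// call source.close() when updates are no longer needed.
source.onerror = () => console.warn("SSE connection interrupted, retrying...");
```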
Unlike the three technologies above, which follow the HTTP protocol, WebSockets can be described as a protocol built on top of HTTP. The client initiates a normal HTTP request to the server but includes a couple of special headers: Connection: Upgrade and Upgrade: websocket. These headers ask the server to switch the already-established TCP connection from HTTP to the WebSocket protocol. The handshake that now takes place over this active TCP connection follows the WebSocket protocol, and both sides can agree on a payload format such as JSON, XML or MQTT that the browser and server support, via the Sec-WebSocket-Protocol request and response headers respectively. Once the handshake is complete, the client can push data to the server and the server can push data to the client without waiting for the client to initiate any request. Thus a bi-directional flow of data is established over a single connection.
Figure 4: WebSockets
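On the client, this is exposed through the browser's WebSocket interface. The sketch below assumes a hypothetical wss://example.com/realtime endpoint and a "json" sub-protocol; the sub-protocol list passed to the constructor is what the browser advertises in the Sec-WebSocket-Protocol request header.

```ts
// Minimal WebSocket sketch. The URL, the "json" sub-protocol and the message
// shapes are assumptions for illustration only.
const socket = new WebSocket("wss://example.com/realtime", ["json"]);

socket.onopen = () => {
  // The client can push data without waiting for the server to ask.
  socket.send(JSON.stringify({ type: "subscribe", channel: "chat" }));
};

socket.onmessage = (event: MessageEvent) => {
  // The server can push data at any time over the same connection.
  console.log("New data:", JSON.parse(event.data));
};

socket.onclose = () => {
  // Unlike SSE, reconnection is not automatic and must be handled here.
  console.warn("WebSocket closed");
};
```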
Below is a table summarising all the parameters:
| | Polling | Long Polling | SSE | WebSockets |
| --- | --- | --- | --- | --- |
| Protocol | HTTP | HTTP | HTTP | HTTP upgraded to WebSocket |
| Mechanism | Client pull | Client pull | Server push | Server push |
| Bi-directional | No | No | No | Yes |
| Ease of Implementation on Client | Easy via XMLHttpRequest | Easy via XMLHttpRequest | Manageable via the EventSource interface | Manageable via the WebSocket interface |
| Browser Support | All | All | Not supported in IE; can be overcome with a polyfill library | All |
| Automatic Reconnection | Inherent | Inherent | Yes | No |
| Ease of Implementation on Server | Easy via the traditional HTTP request-response cycle | Needs logic for remembering the connection for a session | Standard HTTP endpoint with specific headers and a pool of client connections | Requires effort; usually needs a separate server |
| Secured Connection | HTTPS | HTTPS | HTTPS | WSS |
| Risk of Network Saturation | Yes | No | No | Browser multiplexing is not supported, so connections need to be optimised on both ends |
| Latency | Maximum | Acceptable | Minimal | Minimal |
| Issue of Caching | Yes; needs appropriate Cache-Control headers | Yes; needs appropriate Cache-Control headers | No | No |
Polling and Long Polling are client pull mechanisms that adhere to the standard HTTP request-response protocol. Both are relatively easy to implement on the client and the server, yet both carry the risk of request throttling on the client and the server respectively. Latency is also noticeable in both implementations, which runs counter to the very purpose of providing real-time data. Server-Sent Events and WebSockets are the better candidates for providing real-time data. If the data flow is unidirectional and only the server needs to provide updates, SSE, which follows the HTTP protocol, is the advisable choice. But if both client and server need to send real-time data to each other, as in scenarios like a chat application, WebSockets are the way to go.