Which is better, polling or real-time, for Google applications like Gmail or Google Drive?

In general, everyone says real-time is best for application performance, but is it good for all applications to be real-time?

There are some cases where polling might be better than real-time streaming. Essentially, it's when you have a massive event stream and the client cannot easily cope with that stream in real time. For example, you are pushing tons of events to a mobile device that dequeues the data more slowly than the producer emits it. In such a case, thanks to polling, the client can ask for a new batch of data, process it quietly, then ask for another batch. Of course, all this makes sense only if the data producer (the server) is able to resample the data flow, so that at each request it doesn't need to send all the same data it would have sent when streaming.
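To make the pull model concrete, here is a minimal Java sketch; the /events endpoint, its cursor/limit parameters, and the X-Next-Cursor header are hypothetical, but the point is that the client decides when it is ready for the next batch:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BatchPoller {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String cursor = "0"; // position of the last event we processed

        while (true) {
            // Ask only for what we can handle: the next batch after our
            // cursor, capped at a size this client can digest.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://example.com/events?cursor=" + cursor + "&limit=100"))
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());

            // Process at our own pace; only then ask for the next batch.
            processBatch(response.body());
            cursor = response.headers().firstValue("X-Next-Cursor").orElse(cursor);
        }
    }

    static void processBatch(String batch) {
        System.out.println("processing: " + batch);
    }
}
```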
So, to go back to your specific question: neither Gmail nor Google Drive produces so much real-time data as to need polling (I know this sounds counterintuitive!), so I would say that real-time streaming is the better choice there. But streaming is a bit more delicate than polling. You must monitor whether the connection is healthy: it could be half-closed or half-open, and you need bidirectional heartbeats to make sure it's fully alive. In case of disconnection, you must be able to reconnect automatically and restore the state from before the connection broke.
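As a rough illustration of that supervision work, here is a sketch using the JDK's built-in WebSocket client; the wss://example.com/stream endpoint and the 15s/45s heartbeat timings are assumptions, not a recipe:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.nio.ByteBuffer;
import java.time.Duration;
import java.util.concurrent.*;

public class ResilientStream implements WebSocket.Listener {
    private static final URI SERVER = URI.create("wss://example.com/stream"); // hypothetical
    private volatile long lastSeen = System.currentTimeMillis();

    public static void main(String[] args) throws Exception {
        new ResilientStream().connect();
        Thread.currentThread().join(); // keep the demo alive
    }

    void connect() {
        HttpClient.newHttpClient()
                .newWebSocketBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .buildAsync(SERVER, this)
                .thenAccept(ws -> {
                    // Heartbeat: ping every 15s; if nothing has been heard for
                    // 45s, assume a half-open connection and force a reconnect.
                    ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
                    timer.scheduleAtFixedRate(() -> {
                        if (System.currentTimeMillis() - lastSeen > 45_000) {
                            ws.abort();
                            timer.shutdown();
                            connect(); // a real client would also restore state/resubscribe here
                        } else {
                            ws.sendPing(ByteBuffer.allocate(0));
                        }
                    }, 15, 15, TimeUnit.SECONDS);
                });
    }

    @Override
    public CompletionStage<?> onText(WebSocket ws, CharSequence data, boolean last) {
        lastSeen = System.currentTimeMillis(); // any traffic counts as "alive"
        ws.request(1); // ask for the next message
        return null;
    }

    @Override
    public CompletionStage<?> onPong(WebSocket ws, ByteBuffer message) {
        lastSeen = System.currentTimeMillis();
        ws.request(1);
        return null;
    }
}
```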

Related

When to use polling and streaming in LaunchDarkly

I have started using LaunchDarkly (LD) recently, and I was exploring how LD updates its feature flags.
As mentioned here, there are two ways:
Streaming
Polling
I was wondering which implementation is better in which cases. After a little research on streaming vs. polling, I found that streaming has the following advantages over polling:
Faster than polling
Receives only the latest data, instead of repeatedly receiving data that hasn't changed
Avoids periodic requests
I am pretty sure all of the above advantages come at a cost. So:
Are there any downsides of using streaming over polling?
In what scenarios should polling be preferred, or the other way around?
On what factors should I decide whether to stream or to poll?
Streaming
Streaming requires your application to be always alive. This might not be the case in a serverless environment. Furthermore, a streaming solution usually relies on a connection that is kept open in the background. This can be costly, so feature flag providers tend to limit the number of concurrent connections you can keep open to their infrastructure. That might not be a problem if you use feature flags in only a few application instances, but you will easily hit the limit if you want to stream feature flag updates to mobile apps or to a ton of microservices.
Polling
Polling sounds less fancy, but it's a reliable, robust, old-school pattern that will work in almost all environments.
Webhooks
There is a third option too: webhooks. The basic idea is that you create an HTTP endpoint on your end, and the feature flag service calls that endpoint whenever a feature flag value changes. This way you get a "notification" about feature flag value changes. For example, ConfigCat supports this model: it can notify your infrastructure by calling your webhooks and (optionally) pushing new values to your end. Webhooks have the advantage over streaming that they are cheap to maintain, so feature flag service providers don't limit them as much (for example, ConfigCat gives you unlimited webhooks).
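As a sketch of what the receiving side can look like (the /flags-changed path and the payload handling are hypothetical; every provider documents its own payload shape):

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;

public class FlagWebhook {
    // Cached flag state, refreshed whenever the provider calls us back.
    static volatile String cachedFlags = "{}";

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/flags-changed", exchange -> {
            // The provider POSTs here on every flag change. The payload shape
            // varies by vendor; here we just store the new state as-is.
            byte[] body = exchange.getRequestBody().readAllBytes();
            cachedFlags = new String(body); // or trigger a fresh poll instead
            exchange.sendResponseHeaders(204, -1); // acknowledge quickly, no body
            exchange.close();
        });
        server.start();
    }
}
```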
How to decide
How I would use the above three options really depends on your use case. A general rule of thumb: use polling by default, and add quasi-real-time notifications (via streaming or webhooks) to the components where it's critical to know about feature flag updates promptly.
In addition to @Zoltan's answer, I found the following in LaunchDarkly's Effective Feature Management ebook (page 36):
In any networked system there are two methods to distribute information.
Polling is the method by which the endpoints (clients or servers) periodically ask for updates. Streaming, the second method, is when the central authority pushes the new values to all the endpoints as they change. Both options have pros and cons.
However, in a poll-based system, you are faced with an unattractive trade-off: either you poll infrequently and run the risk of different parts of your application having different flag states, or you poll very frequently and shoulder high costs in system load, network bandwidth, and the necessary infrastructure to support the high demands.
A streaming architecture, on the other hand, offers speed advantages and consistency guarantees. Streaming is a better fit for large-scale and distributed systems. In this design, each client maintains a long-running connection to the feature management system, which instantly sends down any changes as they occur to all clients.
Polling Pros:
Simple
Easily cached
Polling Cons:
Inefficient. All clients need to connect momentarily, regardless of whether there is a change.
Changes require roughly twice the polling interval to propagate to all clients.
Because of long polling intervals, the system could create a “split brain” situation, in which both new flag and old flag states exist at the same time.
Streaming Pros:
Efficient at scale. Each client receives messages only when necessary.
Fast Propagation. Changes can be pushed out to clients in real time.
Streaming Cons:
Requires the central service to maintain connections for every client
Assumes a reliable network
For my use case, I have decided to use polling in places where I don't need flag updates often (a long polling interval) and don't care about inconsistencies (split brain),
and streaming for applications that need immediate flag updates and where consistency is important.

Kafka messages over a REST API

We currently have a library which we use to interact with Kafka, but we are planning to develop this library into a separate application. Other applications would send Kafka messages through a REST endpoint. We plan to use Vert.x in this application to make it non-blocking and fast. Is this a good strategy? My concerns: 1) HTTP will make it slower compared to Kafka's native TCP protocol; 2) streaming may not be possible; 3) it becomes a single point of failure.
On the other hand, as a separate application, release management, control, and support will be a lot easier than they are now.
Is it a good strategy, and has someone done something like this before? Any suggestions?
Your choice between HTTP and native TCP will depend on the volume of messages your service will receive. Say there is an IoT device that sends lots of messages continuously; then using HTTP will be expensive and will increase latency, since HTTP connection establishment is a costly operation.
Now consider a transactional system that sends transaction events as they are committed to your database. The rate of messages will presumably be lower, so it makes sense to use HTTP there.
In short, the rate of messages your service will receive should decide which way you go.
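For illustration, here is a minimal sketch of such a REST-to-Kafka bridge with Vert.x 4 and its Kafka client; the topic-in-path route and the 202 Accepted response are my own design choices, not a prescription:

```java
import io.vertx.core.Vertx;
import io.vertx.ext.web.Router;
import io.vertx.ext.web.handler.BodyHandler;
import io.vertx.kafka.client.producer.KafkaProducer;
import io.vertx.kafka.client.producer.KafkaProducerRecord;

import java.util.HashMap;
import java.util.Map;

public class KafkaRestBridge {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();

        // Standard Kafka producer settings; tune acks/batching for your workload.
        Map<String, String> config = new HashMap<>();
        config.put("bootstrap.servers", "localhost:9092");
        config.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        config.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        config.put("acks", "1");
        KafkaProducer<String, String> producer = KafkaProducer.create(vertx, config);

        Router router = Router.router(vertx);
        router.route().handler(BodyHandler.create()); // buffer request bodies

        // POST /topics/<topic> with the message in the request body.
        router.post("/topics/:topic").handler(ctx -> {
            KafkaProducerRecord<String, String> record =
                    KafkaProducerRecord.create(ctx.pathParam("topic"), ctx.body().asString());
            producer.send(record)
                    .onSuccess(meta -> ctx.response().setStatusCode(202).end()) // accepted
                    .onFailure(err -> ctx.response().setStatusCode(500).end(err.getMessage()));
        });

        vertx.createHttpServer().requestHandler(router).listen(8080);
    }
}
```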
As for your current approach of maintaining a library: it is a good way to keep consistency across the organisation, as long as the library is maintained and its users update whenever you release changes. It also has the advantage of not requiring separate infrastructure/servers, since your code runs inside your users' applications.

Redis - Should I use Redis to store chat messages?

So I am currently working on a chat, and I wonder if I could use Redis to store the chat messages. The messages will only exist on the web, and I want a history of at least 20 messages for each private chat. The chat subscribers are already stored in MongoDB.
I mainly want to use Redis because it lets me bypass the MongoDB layer and gain speed.
I already use Pub/Sub, but what about storing a copy in Redis lists? Also, what about read statuses; how could I implement those?
Redis only loses data in the case of an abrupt power outage; if the system is shut down properly, Redis saves its data to disk, and in that case data won't be lost.
It is a good approach to dump data from Redis to MongoDB (or any other DB) when a size limit is reached, or on a date basis (weekly or monthly), so that your real-time chat database stays lightweight.
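For the "copy in Redis lists" part of the question, the usual pattern is LPUSH followed by LTRIM, which caps each chat at its last N messages. A minimal Jedis sketch (the key naming is mine):

```java
import redis.clients.jedis.Jedis;

import java.util.List;

public class ChatHistory {
    private static final int HISTORY_SIZE = 20;
    private final Jedis jedis = new Jedis("localhost", 6379);

    // Push the newest message, then trim so only the last
    // HISTORY_SIZE messages are kept per private chat.
    public void append(String chatId, String messageJson) {
        String key = "chat:" + chatId + ":messages";
        jedis.lpush(key, messageJson);
        jedis.ltrim(key, 0, HISTORY_SIZE - 1);
    }

    // Newest-first history for a chat.
    public List<String> history(String chatId) {
        return jedis.lrange("chat:" + chatId + ":messages", 0, HISTORY_SIZE - 1);
    }
}
```

Trimming after every push keeps each list bounded, so memory stays flat no matter how chatty a conversation gets.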
Many modern systems nowadays prepare for power outages: a UPS kicks in and the system shuts down properly.
See: https://hackernoon.com/how-to-shutdown-your-servers-in-case-of-power-failure-ups-nut-co-34d22a08e92
Also, what about read statuses, how could I implement those?
That depends on the protocol you are implementing; if you are using XMPP, see this.
Otherwise, you can add a property to the message model, e.g. "DeliveryStatus", and set it to an enum value (1. Sent, 2. Delivered, 3. Read). Mark a message as Sent as soon as it is received at the server. For Delivered and Read, your clients send back packets indicating that the respective action has occurred.
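A minimal sketch of that model (the names and the forward-only rule are my own interpretation):

```java
public class ChatMessage {
    enum DeliveryStatus { SENT, DELIVERED, READ }

    String id;
    String senderId;
    String body;
    DeliveryStatus status = DeliveryStatus.SENT; // set when the server receives it

    // Invoked when the recipient's client reports delivery or display.
    // Statuses only move forward: SENT -> DELIVERED -> READ.
    void acknowledge(DeliveryStatus ack) {
        if (ack.ordinal() > status.ordinal()) {
            status = ack;
        }
    }
}
```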
As pointed out in the comment above, the important thing to consider here is the persistence model. Redis offers some persistence (with snapshots and AOF files). The important thing is to first understand what you need:
Can you afford to lose all the data? Can you afford to lose some of the data? If the answer is no, then perhaps you should not bother with Redis.

WebSocket/REST: Client connections?

I understand the main principles behind both. However, I have a question I can't answer.
Benchmarks show that WebSockets can serve more messages, as this website shows: http://blog.arungupta.me/rest-vs-websocket-comparison-benchmarks/
This makes sense, as connections do not have to be closed and reopened, and the HTTP header overhead is avoided on each message.
My question is: what if the connections come from different clients all the time (with perhaps some repeat clients)? From what I understand, the benchmark has the same clients connecting repeatedly, which would make keeping a constant connection worthwhile.
If a user only makes a request every minute or so, wouldn't it be beneficial for the communication to run over REST instead of WebSockets, since the server frees up sockets and can handle a larger crowd, so to speak?
To address REST's limits you would scale vertically, and with WebSockets horizontally?
Does this make sense, or am I off base?
This is my experience so far; I am happy to discuss my conclusions about using WebSockets in big applications built with CQRS:
Real Time Apps
Are you creating a financial application, game, chat, or any other kind of application that needs low-latency, frequent, bidirectional communication? Go with WebSockets:
Well supported.
Standard.
You can use either a publisher/subscriber model or a request/response model (by attaching a correlationId to each request and subscribing once for the matching response; see the sketch after this list).
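A minimal sketch of that request/response-over-WebSocket pattern, assuming a server that echoes the correlationId back with each reply; the "id|payload" wire framing is invented here to keep the example free of a JSON library:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.UUID;
import java.util.concurrent.*;

public class WsRequestResponse implements WebSocket.Listener {
    // correlationId -> future completed when the matching reply arrives
    private final ConcurrentMap<String, CompletableFuture<String>> pending =
            new ConcurrentHashMap<>();
    private WebSocket ws;

    void connect() {
        ws = HttpClient.newHttpClient().newWebSocketBuilder()
                .buildAsync(URI.create("wss://example.com/api"), this) // hypothetical endpoint
                .join();
    }

    // Send a request tagged with a fresh correlationId; the caller awaits the future.
    CompletableFuture<String> request(String payload) {
        String id = UUID.randomUUID().toString();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(id, reply);
        ws.sendText(id + "|" + payload, true);
        return reply;
    }

    @Override
    public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
        String msg = data.toString();
        int sep = msg.indexOf('|');
        CompletableFuture<String> reply = pending.remove(msg.substring(0, sep));
        if (reply != null) {
            reply.complete(msg.substring(sep + 1)); // matched: request/response done
        } // else: an unsolicited push for the pub/sub side of the same socket
        webSocket.request(1);
        return null;
    }
}
```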
Small size apps
Do you need push communication and/or pub/sub in your client, and your application is not too big? Go with WebSockets; there is probably no point in complicating things further.
Regular Apps with some degree of high load expected
If you do not need to send commands very fast, and you expect to do far more reads than writes, you should expose a REST API to perform CRUD (create, read, update, delete), especially C_UD.
Not all devices prefer WebSockets. For example, mobile devices may prefer to use REST, since maintaining a WebSocket connection may prevent the device from saving battery.
You expect an outcome, even if it is a timeout. Even though you can do request/response over WebSockets using a correlationId, the response is still not guaranteed. When you send a command to the system, you need to know whether the system has accepted it. Yes, you can implement your own logic and achieve the same effect, but my point is that an HTTP request already has the semantics you need for sending a command.
Does your application send commands very often? You should strive for chunky communication rather than chatty, so you should probably batch those change requests.
You should then expose a WebSocket endpoint to subscribe to specific topics and to perform low-latency query/response, like filling autocomplete boxes, checking for unique items (e.g. usernames), or any kind of search in your read model. Also use it to get notified when a change request (write) has actually been processed and completed.
What I am doing in a pet project is to place the WebSocket endpoint in the read model. On connection, the server gives a connectionId to the client via WebSocket. When the client performs an operation via REST, it includes an optional parameter that means "when done, notify me through this connectionId". The REST server replies saying whether the command was sent correctly to a service bus. A queue consumer processes the command, and when it finishes (well or badly), if the command carried a notification request, another message is placed in a "web notification queue" indicating the outcome of the command and the connectionId to be notified. The read model is subscribed to this queue, gets the messages, and forwards them to the appropriate WebSocket connection.
However, if your REST API is going to be consumed by non-browser clients, you may want to offer a way to check for the completion of a command using the async REST approach: https://www.adayinthelifeof.nl/2011/06/02/asynchronous-operations-in-rest/
I know it is quite appealing to have a low-latency upstream channel available to send commands, but if you do, your overall architecture gets messy. For example, if you are using a CQRS architecture, where does your WebSocket endpoint live: in the read model or in the write model?
If you place it in the read model, then you have easy access to your read DB to answer fast search queries, but you also have to couple in the command-processing logic somehow, making the read model responsible for sending commands to the write model and for notifying when it is unable to do so.
If you place it in the write model, then placing commands is easy, but you need access to your read model and read DB if you want to answer search queries through the WebSocket.
By considering WebSockets part of your read model and leaving command processing to the REST interface, you keep the coupling between your read model and your write model loose.

Long polling vs streaming for about 1 update/second

Is streaming a viable option?
Will there be a performance difference on the server end depending on which I choose?
Is one better than the other for this case?
I am working on a GWT application with Tomcat running on the server end. To understand my needs, imagine updating the stock prices of several stocks concurrently.
Do you want the process to be client-driven or server-driven? In other words, do you want to push new data to the clients as soon as it's available, or would you rather have the clients request new data whenever they see fit, even though that might not be once per second? What is the likelihood that the client will be able to stick around and wait for an answer?
Even though you expect the events to occur once per second, how long does it take between a client's request and the server's response? If it's longer than a second, I'd expect you to lean towards pushing the events to the clients; otherwise, I'd expect polling to be okay. If the response takes longer than the interval, then you're essentially streaming anyway, since there's a new event ready by the time the client receives the last one, so the client could poll continually and always receive events. In this case, actually streaming the data would be more lightweight, since you're removing the connection/negotiation overhead from the process.
I would suspect that server load would be higher for a client-driven (pull) subscription than for a streaming configuration, since the client would have to re-negotiate the connection each time instead of leaving one open; but each open connection in a streaming model requires server resources as well. It depends on the trade-off between how expensive your negotiation process is and how much memory/processing each open connection requires. I'm no expert, though, so there may be other factors.
UPDATE: This guy talks about the trade-offs between long polling and streaming, and he seems to say that with HTTP/1.1 the connection re-negotiation process is trivial, so that's not as much of an issue.
It doesn't really matter. The connection re-negotiation overhead is so slim with HTTP/1.1 that you won't notice any significant performance differences one way or another.
The benefits of long-polling are compatibility and reliability - no issues with proxies, ports, detecting disconnects, etc.
The benefits of "true" streaming would potentially be reduced overhead, but as mentioned already, this benefit is much, much less than it's made out to be.
Personally, I find a well-designed comet server to be the best solution for large numbers of updates and/or server-push.
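For readers newer to the pattern: from the client side, long polling is just an ordinary request loop in which the server delays its answer until there is news. A minimal Java sketch, with a hypothetical endpoint and made-up timings:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class LongPollClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        while (true) {
            // The server holds this request open until an update is ready
            // (or its own timeout fires); then we immediately re-poll.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://example.com/updates")) // hypothetical endpoint
                    .timeout(Duration.ofSeconds(60)) // a bit above the server's hold time
                    .GET()
                    .build();
            try {
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() == 200) {
                    System.out.println("update: " + response.body());
                } // a 204 or similar would mean "no news, just re-poll"
            } catch (java.net.http.HttpTimeoutException e) {
                // No update within the window; loop and re-issue the request.
            }
        }
    }
}
```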
Certainly, if you're looking to push data, streaming would seem to provide better performance, if your server can handle the expected number of continuous connections. But there's another issue you don't address: are you on the internet or an intranet? Streaming has been reported to have problems across proxies, much as you'd expect. So for a general-purpose solution you would probably be better served by long polling; for an intranet, where you understand the network infrastructure, streaming is quite likely the simpler, better-performing solution for you.
The StreamHub GWT Comet Adapter was designed exactly for this scenario of streaming stock quotes. Example here: GWT Streaming Stock Quotes. It updates the stock prices of several stocks concurrently. I think the implementation underneath is Comet which is essentially streaming over HTTP.
Edit: It uses a different technique for each browser. To quote the website:
There are several different underlying techniques used to implement Comet including Hidden iFrame, XMLHttpRequest/Script Long Polling, and embedded plugins such as Flash. The introduction of HTML 5 WebSockets in to future browsers will provide an alternative mechanism for HTTP Streaming. StreamHub uses a "best-fit" approach utilizing the most performant and reliable technique for each browser.
Streaming will be faster because data only crosses the wire one way. With polling, each update costs a request plus a response, so the latency is at least doubled.
Polling is more resilient to network outages since it doesn't rely on a connection being kept open.
I'd go for polling just for the robustness.
For live stock prices I would absolutely keep the connection open, and ensure a user alert/reconnection on disconnect.