How to roll updates to an RTMP streaming server without downtime?

How do sites like Twitch (and other streaming sites) roll updates without any downtime?
As far as I understand, RTMP is a stateful protocol that uses a persistent connection. So when the streaming server is updated, there's no way for the client to simply switch to a different server without closing the connection first and then performing the handshake again.
It seems like draining the connections (waiting for all connections to close) would be the most appropriate way, but a stream could potentially run for months (if not years), making it impossible to update the server while the stream is running.
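For reference, the drain approach looks roughly like this on the server side (a hand-rolled Java sketch with illustrative names, not anything a real RTMP server ships with):

    import java.io.IOException;
    import java.net.ServerSocket;
    import java.util.concurrent.atomic.AtomicInteger;

    // Illustrative drain logic: stop accepting new connections, then wait
    // (with a deadline) for the remaining ones to finish before shutdown.
    public class DrainingServer {
        private final ServerSocket listener;
        private final AtomicInteger activeConnections = new AtomicInteger();

        public DrainingServer(int port) throws IOException {
            this.listener = new ServerSocket(port);
        }

        // Called by the connection handler when a client connects/disconnects.
        void connectionOpened() { activeConnections.incrementAndGet(); }
        void connectionClosed() { activeConnections.decrementAndGet(); }

        /** Drain: refuse new work, wait up to maxWaitMillis for old work. */
        public void drain(long maxWaitMillis) throws IOException, InterruptedException {
            listener.close(); // new clients go to the updated server via the LB
            long deadline = System.currentTimeMillis() + maxWaitMillis;
            while (activeConnections.get() > 0 && System.currentTimeMillis() < deadline) {
                Thread.sleep(1000);
            }
            // Connections still alive here (e.g. a months-long stream) must be
            // dropped and forced to reconnect -- exactly the problem in question.
        }
    }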

Related

Which is best, polling or real-time, for Google applications like Gmail or Google Drive?

In general, everyone says real-time is best for application performance, but is it good to make every application real-time?
There are some cases where polling might be better than real-time streaming. Essentially, it's when you have a massive event stream and the client cannot easily cope with that stream in real time. For example, you are pushing tons of events to a mobile device that dequeues the data more slowly than the producer generates it. In such a case, thanks to polling, the client can ask for a new batch of data, process it quietly, then ask for another batch. Of course, all this makes sense only if the data producer (the server) is able to resample the data flow so that on each request it doesn't need to send all the same data it would have sent when streaming.
So, to go back to your specific question, neither Gmail nor Google Drive produces so much real-time data as to need polling (I know this sounds counterintuitive!), so I would say that real-time streaming would always be better than polling there. But streaming is a bit more delicate than polling. You must monitor whether the connection is healthy. It could be half-closed or half-open, and you need bidirectional heartbeats to make sure it's fully alive. In case of disconnection, you must be able to automatically reconnect and restore the state from before the connection broke.
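To make the heartbeat-and-reconnect point concrete, here is a rough sketch using the JDK 11+ java.net.http.WebSocket client; the endpoint is made up, and real code would also cancel the ping task on close and restore session state after reconnecting:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.WebSocket;
    import java.nio.ByteBuffer;
    import java.util.concurrent.*;

    // Hypothetical endpoint; the point is the heartbeat + auto-reconnect pattern.
    public class ResilientStream {
        private static final URI ENDPOINT = URI.create("wss://example.com/stream");
        private final ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor();

        public void connect() {
            HttpClient.newHttpClient().newWebSocketBuilder()
                .buildAsync(ENDPOINT, new WebSocket.Listener() {
                    @Override public void onOpen(WebSocket ws) {
                        // Heartbeat: ping every 30s; the pong (or its absence)
                        // reveals a half-open connection long before a read would.
                        timer.scheduleAtFixedRate(
                            () -> ws.sendPing(ByteBuffer.allocate(0)),
                            30, 30, TimeUnit.SECONDS);
                        WebSocket.Listener.super.onOpen(ws);
                    }
                    @Override public void onError(WebSocket ws, Throwable error) {
                        reconnectLater();
                    }
                    @Override public CompletionStage<?> onClose(WebSocket ws,
                            int status, String reason) {
                        reconnectLater();
                        return null;
                    }
                });
        }

        private void reconnectLater() {
            // Back off a little, then reconnect and re-sync any missed state.
            timer.schedule(this::connect, 5, TimeUnit.SECONDS);
        }
    }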

websocket communication between clients in distributed system

I'm trying to build an instant messaging app. Clients will not only send text messages but also, quite often, audio files. And I've decided to use a websocket connection to communicate with clients. It is fast and allows sending binary data.
The main idea is to receive a message from client1 and notify client2 about it. But here's the thing: my app will be running on GAE. And what if client1's socket is open on server1 and client2's is open on server2? These servers don't know about each other's clients.
I have one idea for solving it, but I am sure it is a poor one. I could use some sort of communication between the servers (for example JMS, or another websocket connection between servers; the exact mechanism doesn't matter right now).
But it surely will lead to a disaster. I can't even imagine how often those servers would have to talk to each other. For each message, server1 must notify server2, and server2 must notify client2. Things become even worse when serverN comes into play.
Another way I see this working is Firebase. But it restricts message size to 4 KB, so I can't send audio through it. As a workaround, I could notify the client about a new audio file and have it fetch the file from my server.
I hope I explained the problem clearly. Does anyone know how to solve it? Or maybe there are other ways to build such apps?
If you are building a messaging cluster and expect communicating clients to connect to different instances of the server, then server-to-server communication is inevitable. Usually it's not a problem, though.
First, if you don't use any load balancing, your clients will connect to the same server 50% of the time on average (in the case of 2 servers).
Second, intra-datacenter links are fast and free in all major public clouds.
Third, you can often do something smart on the frontend to make sure two clients that are likely to communicate connect to the same server. For instance, direct all clients from the same country to the same server using DNS load balancing.
The second part of the question is about passing large media files. It's a common best practice to send them out of band: store the file on the server and pass only a reference to it. As someone suggested in the comments, save the audio on the server and just send a message like "audio is available, fetch it from here ...". You don't need to poll the server for that; just fetch it once, when the receiving client requests it.
In general, it seems like you are trying to reinvent the wheel. Just use something off the shelf.
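To illustrate the out-of-band pattern from the previous answer, a tiny sketch; the media URL scheme and field names are invented for the example:

    // Out-of-band media: the messaging channel carries only a small pointer,
    // and the receiver downloads the audio over plain HTTP when it wants it.
    public class MediaMessages {
        record ChatMessage(String from, String to, String type, String body) {}

        static ChatMessage audioNotification(String from, String to, String audioId) {
            // Hypothetical media host; any blob store fronted by HTTP works.
            return new ChatMessage(from, to, "audio",
                    "https://media.example.com/audio/" + audioId);
        }
    }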
Let clients connect to any of multiple servers, with each server keeping metadata about its own connected clients.
A centralized system like ZooKeeper stores the details of the active servers.
When a client c1 sends a message to client c2:
- the message is received by a server (say s1; we can add a load balancer to distribute incoming requests)
- s1 broadcasts to all other servers to find out which server client c2 is connected to; a better approach is consistent hashing, which deterministically decides which server a client connects to, so no broadcast is required (see the sketch after this answer)
- the corresponding server (say s2) responds to s1
- s1 now sends the message m to s2, and s2 delivers it to client c2
Cons of the above approach:
- Each server will have a connection to the other n-1 servers, creating a full mesh topology
- The centralized system (ZooKeeper) becomes a single point of failure (which is solvable)
Apps like WhatsApp and G-Talk use XMPP over TCP/IP.
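To make the consistent-hashing variant concrete, here is a minimal ring sketch; it assumes the list of active servers comes from the coordination service (e.g. ZooKeeper), and the virtual-node count and hash function are arbitrary choices:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.SortedMap;
    import java.util.TreeMap;

    // A minimal consistent-hash ring: given a client id, every server in the
    // cluster computes the same owning server, so no broadcast is needed.
    public class ConsistentHashRing {
        private final TreeMap<Long, String> ring = new TreeMap<>();
        private static final int VIRTUAL_NODES = 100; // smooths the distribution

        public void addServer(String server) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.put(hash(server + "#" + i), server);
            }
        }

        public void removeServer(String server) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                ring.remove(hash(server + "#" + i));
            }
        }

        /** The server responsible for this client id. */
        public String serverFor(String clientId) {
            // First node clockwise from the client's hash; wrap around if needed.
            SortedMap<Long, String> tail = ring.tailMap(hash(clientId));
            return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
        }

        private static long hash(String key) {
            try { // first 8 bytes of MD5, good enough for ring placement
                byte[] d = MessageDigest.getInstance("MD5")
                        .digest(key.getBytes(StandardCharsets.UTF_8));
                long h = 0;
                for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xff);
                return h;
            } catch (Exception e) { throw new IllegalStateException(e); }
        }
    }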

server push for millions of concurrent connections

I am building a distributed system that consists of potentially millions of clients which all need to keep an open (preferably HTTP) connection to wait for a command from the server (which is running somewhere else). The load of messages/commands will not be very high, maybe one message/sec/1000 clients, which means 1000 msg/sec @ 1 million clients. => it's basically about the concurrent connections.
The requirements are simple too. One way messaging (server->client), only 1 client per "channel".
I am pretty open in terms of technology (xmpp / websockets / comet / ...). I am using Google App Engine as server, but their "channels" won't work for me unfortunately (too low quotas and no Java client). XMPP was an option but is quite expensive. So far I was using URL Fetch & pubnub, but they just started charging for connections (big time).
So:
Does anyone know of a service out there that can do that for me in an affordable way? Most I have found restrict or heavily charge for connections.
Any experience with implementing such a server yourself? I have actually done that already and it works pretty well (based on Tomcat & NIO), but I haven't had the time yet to set up a large load-test environment (partially because this is still a fallback solution; I'd prefer a battle-hardened msg server). Any experience with how many users you get per GB? Any hard limits?
My architecture also allows me to partition the msg servers, but I'd like to maximize the number of concurrent connections per server because the msg-processing CPU overhead is minimal.
I have meanwhile implemented my own message server using netty.io. Netty makes use of Java NIO and scales extremely well. For idle connections I get a memory footprint of 500 bytes per connection. I am doing only very simple message forwarding (no caching, storage or other fancy stuff), but even so I am easily getting 1000 - 1500 msg/sec (each half a KB) on a small Amazon instance (1 ECU / 1.6 GB).
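For readers who want a starting point: this is not the poster's code, but a minimal Netty 4 skeleton of such a server that just accepts connections and keeps them in a group so another component can push to them (port and handler are placeholders):

    import io.netty.bootstrap.ServerBootstrap;
    import io.netty.channel.*;
    import io.netty.channel.group.*;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.channel.socket.nio.NioServerSocketChannel;
    import io.netty.util.concurrent.GlobalEventExecutor;

    // Minimal Netty (4.x) push server: accept connections, remember the
    // channels, and let some other component write messages to them.
    public class PushServer {
        // A ChannelGroup automatically drops channels when they close.
        static final ChannelGroup clients =
                new DefaultChannelGroup(GlobalEventExecutor.INSTANCE);

        public static void main(String[] args) throws InterruptedException {
            EventLoopGroup boss = new NioEventLoopGroup(1);   // accepts connections
            EventLoopGroup workers = new NioEventLoopGroup(); // handles I/O
            try {
                new ServerBootstrap()
                    .group(boss, workers)
                    .channel(NioServerSocketChannel.class)
                    .childHandler(new ChannelInitializer<SocketChannel>() {
                        @Override protected void initChannel(SocketChannel ch) {
                            ch.pipeline().addLast(new ChannelInboundHandlerAdapter() {
                                @Override public void channelActive(ChannelHandlerContext ctx) {
                                    clients.add(ctx.channel()); // track for server push
                                }
                            });
                        }
                    })
                    .bind(8080).sync()
                    .channel().closeFuture().sync();
            } finally {
                boss.shutdownGracefully();
                workers.shutdownGracefully();
            }
        }
    }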
Otherwise if you are looking for a (paid) service then I can recommend spire.io (they do not charge for connections but have a higher price per message) or pubnub (they do charge for connections but are cheaper per message).
You have to look more at the architecture of such an environment.
First of all, if you write the socket management yourself, don't use a thread per client socket; use asynchronous methods for receiving and sending data.
WebSockets might be too heavy if your messages are small, because the protocol adds framing that has to be applied to each message on each socket individually (caching can be used across different versions of the WebSocket protocol). That makes messages slower to process in both directions, for receive and for send, especially because of data masking.
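Concretely, the masking cost the answer refers to is this per-byte XOR that RFC 6455 requires on every client-to-server frame (Java sketch):

    // RFC 6455 masking: each payload byte is XORed with a rotating 4-byte key
    // carried in the frame header, so the server must touch every byte of
    // every incoming message before it can use it.
    public class FrameMasking {
        static void unmask(byte[] payload, byte[] maskingKey) {
            for (int i = 0; i < payload.length; i++) {
                payload[i] ^= maskingKey[i % 4]; // one XOR per payload byte
            }
        }
    }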
It is possible to create millions of sockets, but only the most advanced technologies are capable of doing so. Erlang can handle millions of connections and is quite scalable.
If you would like to have millions of connections using other, higher-level technologies, then you need to think about clustering.
For example, use a gateway server that keeps track of all the processing servers and holds data about them (IP, port, load). If it is all one internal network, firewalling and port forwarding might be handy here.
Client software connects to that gateway server; the gateway picks the least-loaded server and sends its IP and port to the client; the client then creates a connection directly to the working server using the provided address (a sketch follows below).
That way you have a gateway that can also handle authorization and won't hold connections for long, so one gateway might be enough, plus many workers that publish data and keep the long-lived connections.
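Sketched below (in Java, with invented names) is the gateway's bookkeeping: workers report their load periodically, and the gateway hands a connecting client the least-loaded worker's address:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch of the gateway's job: track worker load and hand out the
    // least-loaded worker's address. Field and method names are illustrative.
    public class Gateway {
        record Worker(String ip, int port) {}
        private final Map<Worker, Integer> load = new ConcurrentHashMap<>();

        // Called from the workers' periodic load reports.
        public void report(Worker worker, int openConnections) {
            load.put(worker, openConnections);
        }

        /** What the gateway returns to a connecting (and authorized) client. */
        public Worker assign() {
            return load.entrySet().stream()
                    .min(Map.Entry.comparingByValue())
                    .orElseThrow(() -> new IllegalStateException("no workers"))
                    .getKey();
        }
    }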
This all depends on your needs, though, and might not suit your solution.

Should I keep a socket open during a long running process?

I've got some programs that occasionally (anywhere from every few minutes to once an hour) need to send metrics to Graphite. Should I keep the socket to the graphite server open for the duration of my process or make a new connection every time I need to send some metrics? What are the considerations when doing one or the other?
Sounds like you need a TCP connection.
Whether you should keep the connection active depends on the answers to questions like:
- Would you like to monitor the "connected" clients at the server at any given time?
- Is there a limit on the server side in relation to the previous point?
- How many such clients will be "connected" to the server?
- Is it a problem if connection creation takes some time?
If you keep the connection open, just make sure to send keep-alive messages from time to time (application-level keep-alives preferred).
A large number of clients connected to the server, even when not active, may consume memory and other resources (for example, if there is one thread per connection).
On the other hand, keeping the connection open allows the client to detect a connection problem with the server much faster (if that even matters).
It all depends on what is needed.
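For the "keep it open" option, a rough Java sketch against Graphite's plaintext protocol ("<path> <value> <timestamp>\n", default port 2003); the host name is a placeholder, and the sketch simply reconnects once if the idle socket turns out to be dead:

    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    // Persistent connection to Graphite with reconnect-on-failure.
    public class GraphiteSender {
        private final String host = "graphite.example.com"; // placeholder
        private final int port = 2003; // Graphite's default plaintext port
        private Socket socket;
        private Writer out;

        public synchronized void send(String path, double value) throws IOException {
            long ts = System.currentTimeMillis() / 1000;
            String line = path + " " + value + " " + ts + "\n";
            try {
                ensureConnected();
                out.write(line);
                out.flush();
            } catch (IOException e) {
                close();           // the idle connection died since the last send:
                ensureConnected(); // reconnect once and retry
                out.write(line);
                out.flush();
            }
        }

        private void ensureConnected() throws IOException {
            if (socket == null || socket.isClosed()) {
                socket = new Socket(host, port);
                socket.setKeepAlive(true); // cheap liveness check over long idle gaps
                out = new OutputStreamWriter(socket.getOutputStream(),
                        StandardCharsets.UTF_8);
            }
        }

        private void close() {
            try { if (socket != null) socket.close(); } catch (IOException ignored) {}
            socket = null;
        }
    }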

Dropping a streaming HTTP connection as soon as possible after losing connection

So what we're trying to achieve is maintaining a vast number of concurrent connections from mobile devices to our Erlang HTTP server. Mobile devices, of course, can have pretty intermittent connections, so we're looking to drop dead connections as soon as possible to avoid their overhead.
Now, I'm not sure at what level we should be detecting dead connections. TCP has keepalive packets, which require an ACK. So ideally we'd send a keepalive packet every 15 seconds, and if we didn't receive the ACK within the next 15 seconds then we'd drop the connection. However, I've no idea if this is even possible in Erlang. Also, I think there's the possibility that some NATs, wi-fi routers and mobile networks ACK the keepalives for a certain amount of time; correct me if I'm wrong. Is that the case, and if so, is there any TCP-level alternative way of doing 'heartbeats'?
We've also tried an application-level heartbeat - sending a \n down the HTTP stream. However, even with all applicable Erlang options set, including send_timeout, we're not getting any error for about 5 minutes under certain circumstances, such as, say, the mobile device straying too far from its wi-fi router.
How best can we implement a streaming HTTP connection that the server will drop as soon as possible after losing contact? Any help'd be much appreciated!
You can add a specific watchdog for the HTTP connection. The watchdog has a configurable timeout that is reset after each operation (read or write) on the connection, and if there are no operations on the socket within the specified timeout, the connection is closed.
This approach eliminates the problem of stale connections (connections that look perfectly healthy but have no I/O activity). If a client is out of coverage, the connection will last only up to the specified timeout. Also, no keep-alive mechanism is needed when using the watchdog approach.
The only drawback is that the server will not detect broken connections immediately; instead it waits for the timeout specified in the connection watchdog.
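The watchdog is easy to sketch in any language; here it is in Java for illustration (Netty users get roughly this behavior from the bundled IdleStateHandler):

    import java.util.concurrent.*;

    // Generic connection watchdog: each read/write "kicks" the timer; if no
    // I/O happens within the timeout, the connection's close callback fires.
    public class ConnectionWatchdog {
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        private final long timeoutSeconds;
        private final Runnable closeConnection;
        private ScheduledFuture<?> pending;

        public ConnectionWatchdog(long timeoutSeconds, Runnable closeConnection) {
            this.timeoutSeconds = timeoutSeconds;
            this.closeConnection = closeConnection;
            kick();
        }

        /** Call on every read or write on the connection. */
        public synchronized void kick() {
            if (pending != null) pending.cancel(false);
            pending = scheduler.schedule(closeConnection, timeoutSeconds, TimeUnit.SECONDS);
        }
    }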
Isac's comment answered it for me: configuring the socket keep-alive timeout at the machine level.
See http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html
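The HOWTO covers the machine-wide sysctls (net.ipv4.tcp_keepalive_time, tcp_keepalive_intvl, tcp_keepalive_probes). The same knobs can often be set per socket; as an illustration, the Java 11+ form on Linux/macOS looks like this (an Erlang server would set the equivalent options through its own socket API):

    import java.io.IOException;
    import java.net.Socket;
    import java.net.StandardSocketOptions;
    import jdk.net.ExtendedSocketOptions;

    // Per-socket keepalive tuning: the per-connection equivalent of the
    // machine-wide sysctls described in the linked HOWTO.
    public class KeepaliveSocket {
        public static Socket open(String host, int port) throws IOException {
            Socket s = new Socket(host, port);
            s.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
            s.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, 15);     // probe after 15s idle
            s.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, 15); // re-probe every 15s
            s.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, 2);     // drop after 2 missed ACKs
            return s;
        }
    }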