I'm look recommendations on how to achieve low latency for the following network protocol:
Alice sends out a request for information to many peers selected at random from a very large pool.
Each peer responds with a small packet <20kb.
Alice aggregates the responses and selects a peer accordingly.
Alice and the selected peer then continue to the second phase of the protocol whereby a sequence of 2 requests and responses are performed.
Repeat from 1.
Given that steps 1 and 2 do not need to be reliable (as long as a percentage of responses arrive back we proceed to step 3) and that 1 is essentially a multicast, this part of the protocol seems to suit UDP - setting up a TCP connection to these peers would add an addition round trip.
However step 4 needs to be reliable - we can't tolerate packet loss during the subsequent requests/responses.
The conundrum I'm facing is that UDP suits 1 and 2 and TCP protocol suits 4. Connecting to every peer selected in 1 is slow especially since we aim to transmit just 20kb, however UDP cannot be tolerated for step 4. Handshaking the peer selected in 4. would require an additional round trip, which compared to the 3 round trips still is a considerable increase in total time.
Is there some hybrid scheme whereby you can do a TCP handshake while transmitting a small amount of data? (The handshake could be merged into 1 and 2 and hence doesn't add any additional round trip time.)
Is there a name for such protocols? What should I read to become more acquainted with such problems?
Additional info:
Participants are assumed to be randomly distributed around the globe and connected via the internet.
The pool selected from in step 1. is on the order of 1000 addresses and the the sample from the pool on the order of 10 to 100.
There's not enough detail here to do a well-informed criticism. If you were hiring me for advice, I'd want to know a lot more about the proposal, but since I'm doing this for free, I'll just answer the question as asked, and try to make it practical rather than ideal.
I'd argue that UDP is not suitable for the early part of your protocol. You can't just multicast a single packet to a large number of hosts on the Internet (although you can do it on typical LANs). A 20KB payload is not the sort of thing you can generally transmit in a single datagram in any case, and the moment messages fail to fit in a single datagram, UDP loses most of its attraction, because you start reinventing TCP (badly).
Probably the simplest thing you can do is base your system on HTTP, and work with implementations which incorporate all the various speed-ups that Google (mostly) has been putting into HTTP development. This includes TCP Fast Open, and things like it. Initiate connections out to your chosen servers; some will respond faster than others: use that to your advantage by going with the quickest ones. Don't underestimate the importance of efficient implementation relative to theoretical round-trip time, by the way.
For stage two, continue with HTTP as before. For efficiency, you could hold all the connections open at the end of phase one and then close all the ones except your chosen phase two partner. It's not clear from your description that the phase two exchange lends itself to the HTTP model, though, so I have to hand-wave this a bit.
It's also possible that you can simply hold TCP connections open to all available peers more or less permanently, thus dodging the cost of connection establishment nearly all the time. A thousand simultaneous open connections is large, but not outrageous in most contexts (although you may need to tweak OS settings to allow it). If you do that, you can just talk whatever protocol you like over TCP. If it's a truly peer-to-peer protocol, you only need one TCP connection per pair. Implementing this kind of thing is tricky, though: an average programmer will do a terrible job of it, in my experience.
Related
I'm curious about if there is some type of standard limit on when is better to use Ajax Polling instead of SSE, from a server side viewpoint.
1 request every second: I'm pretty sure is better SSE
1 request per minute: I'm pretty sure is better Ajax
But what about 1 request every 5 seconds? How can we calculate where is the limit frequency for Ajax or SSE?
No way is 1 request per minute always better for Ajax, so that assumption is flawed from the start. Any kind of frequent polling is nearly always a costly choice. It seems from our previous conversation in comments of another question that you start with a belief that an open TCP socket (whether SSE connection or webSocket connection) is somehow costly to server performance. An idle TCP connection takes zero CPU (maybe every once in a long while, a keep alive might be sent, but other than that, an idle socket does not use CPU). It does use a bit of server memory to handle the socket descriptor, but a highly tuned server can have 1,000,000 open sockets at once. So, your CPU usage is going to be more about how many connections are being established and what are they asking the server to do every time they are established than it is about how many open (and mostly idle) connections there are.
Remember, every http connection has to create a TCP socket (which is roundtrips between client/server), then send the http request, then get the http response, then close the socket. That's a lot of roundtrips of data to do every minute. If the connection is https, it's even more work and roundtrips to establish the connection because of the crypto layer and endpoint certification. So doing all that every minute for hundreds of thousands of clients seems like a massive waste of resources and bandwidth when you could create one SSE connection and the client just listen for data to stream from the server over that connection.
As I said in our earlier comment exchange on a different question, these types of questions are not really answerable in the abstract. You have to have specific requirements of both client and server and a specific understanding of the data being delivered and how urgent it is on the client and therefore a specific polling interval and a specific scale in order to begin to do some calculations or test harnesses to evaluate which might be the more desirable way to do things. There are simply too many variables to come up with a purely hypothetical answer. You have to define a scenario and then analyze different implementations for that specific scenario.
Number of requests per second is only one of many possible variables. For example, if most the time you poll there's actually nothing new, then that gives even more of an advantage to the SSE case because it would have nothing to do at all (zero load on the server other than a little bit of memory used for an open socket most of the time) whereas the polling creates continual load, even when nothing to do.
The #1 advantage to server push (whether implement with SSE or webSocket) is that the server only has to do anything with the client when there is actually pertinent data to send to that specific client. All the rest of the time, the socket is just sitting there idle (perhaps occasionally on a long interval, sending a keep-alive).
The #1 disadvantage to polling is that there may be lots of times that the client is polling the server and the server has to expend resources to deal with the polling request only to inform that client that it has nothing new.
How can we calculate where is the limit frequency for Ajax or SSE?
It's a pretty complicated process. Lots of variables in a specific scenario need to be defined. It's not as simple as just requests/sec. Then, you have to decide what you're attempting to measure or evaluate and at what scale? "Server performance" is the only thing you mention, but that has to be completely defined and different factors such as CPU usage and memory usage have to be weighted into whatever you're measuring or calculating. Then, you may even need to run some test harnesses if the calculations don't yield an obvious answer or if the decision is so critical that you want to verify your calculations with real metrics.
It sounds like you're looking for an answer like "at greater than x requests/min, you should use polling instead of SSE" and I don't think there is an answer that simple. It depends upon far more things than requests/min or requests/sec.
"Polling" incurs overhead on all parties. If you can avoid it, don't poll.
If SSE is an option, it might be a good choice. "It depends".
Q: What (if any) kind of "event(s)" will your app need to handle?
I was wondering,
1st question What are the pros and cons of using one socket (full duplex) vs. two socket (simplex) per peer: one for read and other write? Specially in terms of performance and resource utilization.
2nd question In case, if i choose to use more than 1 sockets per peer, on all i do read and write. Then will it helps me scale out in handling no of messages handled?
3rd question: what should help me determine the number of sockets per peer? Network Bandwidth? No. of message in and out?
All questions are different and do not have any inter-relation.
What are the pros and cons of using one socket (full duplex) vs. two socket one for read and other write? Specially in terms of performance and resource utilization.
Pro one socket: resource utilization. Contra one socket: nil. Performance: identical, except that you save on connect and close handshakes if you only use one socket.
In case, I choose to take two socket approach, then will not be useful to use both of them full duplex, that way it helps me scale out in terms of data flowing in and out?
Now you're comparing apples and oranges. You can't compare one full-duplex socket with two full-duplex sockets. I don't know why you think you might need two inbound and two outbound flows, but you don't. Every protocol I can think of except FTP uses only one.
what impact does network bandwidth has on it?
Nil.
or it has on network utilization?
Nil, apart from the connect and close handshakes. But it wastes resources at both ends.
We've added --full-duplex to iperf 2.0.14 which will test a full-duplex socket. One can dompare it to two sockets per the -d or --dualtest option. We've found "your mileage will vary" and there is no universal to answer of having equal performance or not. In theory, it seems they should be equal but, in practice, maybe not.
-d, --dualtest
Do a bidirectional test simultanous test using two unidirectional sockets
--fq-rate n[kmgKMG]
Set a rate to be used with fair-queueing based socket-level pacing, in bytes or bits per second. Only available on platforms supporting the SO_MAX_PACING_RATE socket option.
(Note: Here the suffixes indicate bytes/sec or bits/sec per use of uppercase or lowercase, respectively)
--full-duplex
run a full duplex test, i.e. traffic in both transmit and receive directions using the same socket
Bob
I'd like to know the general cost of creating a new connection, compared to UDP. I know TCP requires an initial exchange of packets (the 3 way handshake). What would be other costs? For instance is there some sort of magic in the kernel needed for setting up buffers etc?
The reason I'm asking is I can keep an existing connection open and reuse it as needed. However if there is little overhead reconnecting it would reduce complexity.
Once a UDP packet's been dumped onto the wire, the UDP protocol stack is free to completely forget about it. With TCP, there's at bare minimum the connection details (source/dest port and source/dest IP), the sequence number, the window size for the connection etc... It's not a huge amount of data, but adds up quickly on a busy server with many connections.
And then there's the 3-way handshake as well. Some braindead (and/or malicious systems) can abuse the process (look up 'syn flood'), or just drop the connection on their end, leaving your system waiting for a response or close notice that'll never come. The plus side is that with TCP the system will do its best to make sure the packet gets where it has to. With UDP, there's no guarantees at all.
Compared to the latency of the packet exchange, all other costs such as kernel setup times are insignificant.
OPTION 1: The general cost of creating a TCP connection are:
Create socket connection
Send data
Tear down socket connection
Step 1: Requires an exchange of packets, so it's delayed by to & from network latency plus the destination server's service time. No significant CPU usage on either box is involved.
Step 2: Depends on the size of the message.
Step 3: IIRC, just sends a 'closing now' packet, w/ no wait for destination ack, so no latency involved.
OPTION 2: Costs of UDP:*
Create UDP object
Send data
Close UDP object
Step 1: Requires minimal setup, no latency worries, very fast.
Step 2: BE CAREFUL OF SIZE, there is no retransmit in UDP since it doesn't care if the packet was received by anyone or not. I've heard that the larger the message, the greater probability of data being received corrupted, and that a rule of thumb is that you'll lose a certain percentage of messages over 20 MB.
Step 3: Minimal work, minimal time.
OPTION 3: Use ZeroMQ Instead
You're comparing TCP to UDP with a goal of reducing reconnection time. THERE IS A NICE COMPROMISE: ZeroMQ sockets.
ZMQ allows you to set up a publishing socket where you don't care if anyone is listening (like UDP), and have multiple listeners on that socket. This is NOT a UDP socket - it's an alternative to both of these protocols.
See: ZeroMQ.org for details.
It's very high speed and fault tolerant, and is in increasing use in the financial industry for those reasons.
What's the difference between sockets (stream) vs sockets (datagrams)? Why use one over the other?
A long time ago I read a great analogy for explaining the difference between the two. I don't remember where I read it so unfortunately I can't credit the author for the idea, but I've also added a lot of my own knowledge to the core analogy anyway. So here goes:
A stream socket is like a phone call -- one side places the call, the other answers, you say hello to each other (SYN/ACK in TCP), and then you exchange information. Once you are done, you say goodbye (FIN/ACK in TCP). If one side doesn't hear a goodbye, they will usually call the other back since this is an unexpected event; usually the client will reconnect to the server. There is a guarantee that data will not arrive in a different order than you sent it, and there is a reasonable guarantee that data will not be damaged.
A datagram socket is like passing a note in class. Consider the case where you are not directly next to the person you are passing the note to; the note will travel from person to person. It may not reach its destination, and it may be modified by the time it gets there. If you pass two notes to the same person, they may arrive in an order you didn't intend, since the route the notes take through the classroom may not be the same, one person might not pass a note as fast as another, etc.
So you use a stream socket when having information in order and intact is important. File transfer protocols are a good example here. You don't want to download some file with its contents randomly shuffled around and damaged!
You'd use a datagram socket when order is less important than timely delivery (think VoIP or game protocols), when you don't want the higher overhead of a stream (this is why DNS is primarily a datagram protocol, so that servers can respond to many, many requests at once very quickly), or when you don't care too much if the data ever reaches its destination.
To expand on the VoIP/game case, such protocols include their own data-ordering mechanism. But if one packet is damaged or lost, you don't want to wait on the stream protocol (usually TCP) to issue a re-send request -- you need to recover quickly. TCP can take up to some number of minutes to recover, and for realtime protocols like gaming or VoIP even three seconds may be unacceptable! Using a datagram protocol like UDP allows the software to recover from such an event extremely quickly, by simply ignoring the lost data or re-requesting it sooner than TCP would.
VoIP is a good candidate for simply ignoring the lost data -- one party would just hear a short gap, similar to what happens when talking to someone on a cell phone when they have poor reception. Gaming protocols are often a little more complex, but the actions taken will usually be to either ignore the missing data (if subsequently-received data supercedes the data that was lost), re-request the missing data, or request a complete state update to ensure that the client's state is in sync with the server's.
Stream Socket:
Dedicated & end-to-end channel between server and client.
Use TCP protocol for data transmission.
Reliable and Lossless.
Data sent/received in the similar order.
Long time for recovering lost/mistaken data
Datagram Socket:
Not dedicated & end-to-end channel between server and client.
Use UDP for data transmission.
Not 100% reliable and may lose data.
Data sent/received order might not be the same.
Don't care or rapid recovering lost/mistaken data.
If it is the network programming I think starting from sockets would be a good start.
socket = ip + port
there are three types of sockets
stream (TCP, order and delivery guaranteed,no duplication,no length or char boundaries for data,connection-oriented,reliable, concurrency)
datagram(UDP,packet-based, connectionless, datagram size limit, data can be lost or duplicated, order not guaranteed,not reliable)
raw (direct access to lower layer protocols IP,ICMP)
I do not see any strict rule for transport protocol type as to what socket has to use what transport protocol and reliability should not be mistaken because UDP is realiable in case both ends are active.
Reliability refers to more like reliability of delivery since there are sequence number checks by using TCP as transport protocol which do not exist in UDP.It is better using network protocol analyzer like wireshark tcpdump etc to see what your software is exactly doing; kind of verification or merging theory on the paper with your work in action.
I would be greatful for help, understanding how long it takes to establish a TCP connection when I have the Ping RoundTripTip:
According to Wikipedia a TCP Connection will be established in three steps:
1.SYN-SENT (=>CLIENT TO SERVER)
2.SYN/ACK-RECEIVED (=>SERVER TO CLIENT)
3.ACK-SENT (=>CLIENT TO SERVER)
My Questions:
Is it correct, that the third transmission (ACK-SENT) will not yet carry any payload (my data) but is only used for the connection establishement.(This leads to the conclusion, that the fourth packt will be the first packt to hold any payload....)
Is it correct to assume, that when my Ping RoundTripTime is 20 milliseconds, that in the example given above, the TCP Connection establishment would at least require 30 millisecons, before any data can be transmitted between the Client and Server?
Thank you very much
Tom
Those things are basically correct, though #2 assumes that the round-trip time is symmetric.
To measure this, called the "Time to Syn/ACK" (which is NOT the Time to Establish a connection - the connection is only half-open when in that state, you need the 3rd packet acknowledging the establishment to consider it established), you usually need professional tools that include their own TCP stack, enabling that kind of measurement. The most used one is called the Spirent Avalanche, but you also have Ixia's IxLoad or BreakingPoint Systems box (BPS has now been acquired by Ixia btw).
Note that, yes, 3rd packet won't have any data, and that is also true of the first two. They are only Syn and Syn+Ack flagged (those are TCP flags), and contain no application data. This initial exchange, called the Three-way Handshake therefore causes some overhead, which is why TCP is typically not used in real-time applications (voice, live video, etc..).
Also as stated, you can't assume that Latency = RTT/2. It is in fact very complicated to measure one-way latency above layer 3 (IP) - and you are already at layer 4 (TCP) here. This blog post covers in details the challenge of this: http://synsynack.wordpress.com/2012/04/09/realistic-latency-measurement-in-the-application-layers/