ZMQ performance in comparison to UDP multicast - sockets

How does ZMQ compare to "plain" UDP with multicast in terms of performance (I mean latency while sending many messages, and the maximum fan-out rate from one sender to many receivers)?
Assume I have one static 'sender' which has to send messages to many, many 'receivers'. The PUB/SUB pattern over plain TCP transport seems very comfortable for such a task: ZMQ does many things without any effort on our part, and one ZMQ socket is enough to handle even numerous connections.
What I am afraid of is that ZMQ may create many TCP sockets in the background, even if we don't "see" them, and that this could add latency. If I instead create an ordinary UDP socket and transmit all my messages with multicast, there would be only one socket (the multicast one), so I think the latency problem would be solved. To be honest, I would like to stay with ZMQ and PUB/SUB over TCP. Are my concerns valid?

I don't think you can really compare them that way; it depends on what is important to you.
TCP provides reliability, and as the sender you can choose whether loss or latency matters more by setting your block/retry options on send.
Multicast saves network bandwidth, especially if your network has multiple segments/routers.
Other options in ZeroMQ:
Use zmq_proxy instances to split/share the load of TCP connections.
Use PUB/SUB with pgm/epgm, which is just a layer over multicast (I use this).
Use the new radio/dish pattern (with this you have limited subscription options):
http://api.zeromq.org/4-2:zmq-udp

Behind the scenes, a TCP "socket" is identified (simplified) by both its "source" and "destination", so there will be one socket for each peer you're communicating with (for a fuller description of how a socket is set up and identified, see here and here). This has nothing to do with ZMQ: ZMQ will set up exactly as many sockets as TCP requires. You can optimize this yourself if you choose to use multicast, but ZMQ does not optimize it for you by using multicast behind the scenes, except in the PUB/SUB case (see here).
Per James Harvey's answer, ZMQ has added support for UDP, so you can use that if you do not need or want the overhead of TCP. But based on what you've said, you'd like to stay with TCP, which is likely the better choice for most applications: we often unconsciously expect the reliability of TCP when designing our applications, and should only choose UDP if we know what we're doing.
Given that, you should assume that ZMQ manages the low-level sockets as efficiently as TCP allows. In your case, with PUB/SUB, you're already getting the fan-out you need. I think you're good.
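For comparison, the "one multicast socket" setup the question describes can be sketched with plain BSD sockets. This is a minimal sketch in Python's stdlib, not something ZMQ does for you; the group address, port, and the loopback interface (used so the demo is self-contained) are arbitrary choices:

```python
import socket
import struct

# Arbitrary multicast group/port chosen for illustration.
GROUP, PORT = "224.1.1.5", 50007

def make_receiver():
    """One receiver socket, joined to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    # Join the group. In real use the interface would be a LAN interface;
    # here we join on loopback so the demo runs on one machine.
    mreq = struct.pack("4s4s", socket.inet_aton(GROUP),
                       socket.inet_aton("127.0.0.1"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    sock.settimeout(2.0)
    return sock

def make_sender():
    """A single sender socket; every joined receiver gets each datagram."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Route the datagrams via loopback and deliver a local copy too.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                    socket.inet_aton("127.0.0.1"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
    return sock

if __name__ == "__main__":
    rx, tx = make_receiver(), make_sender()
    tx.sendto(b"hello subscribers", (GROUP, PORT))
    print(rx.recvfrom(1024)[0])
```

Note that this buys you one socket and kernel-side fan-out, but none of ZMQ's framing, reconnection, or subscription filtering; that is the trade the answers above are describing.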

Related

TCP server vs HTTP server in vert.x

What is the difference between a TCP/Net server in Vert.x and an HTTP server?
What are the use cases for each?
I tried googling and went through the official website, but neither has a clear explanation.
First off, in general networking there are two common transports for connections: TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). The most important difference between the two is that UDP sends out streams/buffers of bytes without checking whether the network packets made it to the other side of the line. This is useful where occasional loss is acceptable and speed is important. Most VoIP services (Skype, Hangouts) and many media-streaming systems use UDP, as it brings large performance gains and it doesn't matter all that much if a frame is lost, as the person can simply repeat themselves.
TCP, on the other hand, is reliable by default. It performs a handshake to establish the connection and acknowledges packets continuously, so the sender knows that everything it sent was received on the other side of the line.
Now, there are a LOT of protocols out there in the Wild Wild West called Internet.
List of TCP and UDP port numbers
As you can see, many protocols run over either TCP or UDP. HTTP is a TCP-based protocol conventionally using port 80 (as you might know). An HTTPServer is therefore pretty much an extension of a TCPServer, but with HTTP-specific add-ons such as request parsing, routing, and conveniences for REST-style APIs. These add-ons are very welcome, as HTTP processing is a common use case; without HTTPServer you would need to write a lot of that yourself.
There are many articles online explaining the difference between HTTP and TCP; here is one: http://www.differencebetween.net/technology/internet/difference-between-tcp-and-http/
Vert.x naturally offers the capacity to do "raw" networking at the TCP level or at the HTTP level, the latter offering facilities to deal with the protocol: decoding TCP streams into HTTP requests, supporting the creation of HTTP responses, and so on.
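The point that an HTTP server is "a TCP server plus protocol handling" can be made concrete by talking to a stock HTTP server over a raw TCP socket and writing the request bytes by hand. A small sketch using Python's stdlib (the Hello handler and its body are made up for the demo):

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Hello(BaseHTTPRequestHandler):
    """Trivial demo handler: answers every GET with 'hello'."""
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass  # silence per-request logging

server = HTTPServer(("127.0.0.1", 0), Hello)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

# "HTTP server" = TCP server + protocol handling. Here we are the TCP side
# and do the HTTP protocol by hand: write request bytes, read response bytes.
with socket.create_connection((host, port)) as raw:
    raw.sendall(b"GET / HTTP/1.1\r\nHost: x\r\nConnection: close\r\n\r\n")
    reply = b""
    while chunk := raw.recv(4096):
        reply += chunk

print(reply.split(b"\r\n")[0])  # the status line of the HTTP response
server.shutdown()
```

Everything the raw socket had to spell out (request line, headers, parsing the status line back) is what an HTTP server class does for you on top of its TCP server.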

Can sending first a UDP message and then a TCP message be any good?

I have an application that communicates in real time with other clients over a LAN. The application requires packets to arrive in order and without loss. It also requires transfers to be as fast as possible, and I seem to have some problems with TCP in this regard.
So, as a non-experienced network programmer, I was thinking: what if I first send the message over UDP and then send the same data over TCP? If the UDP message arrives, I have it as fast as possible; if not, I still have the TCP message that makes sure I at least get the packet. Obviously I'll make sure I don't process the same data twice, by giving each message an ID or similar.
Is this a good approach? I was thinking that sending the TCP message simultaneously might just slow the UDP message down, so it wouldn't make a difference anyway.
No, this is not a good approach.
You are doubling your network bandwidth usage and significantly increasing the complexity of your networking code for very little gain.
TCP and UDP have very different characteristics. If you care about data arriving in a timely manner, where late data is of no use, then TCP is not helpful and you should use UDP, and only UDP. If you do not care about timeliness, then UDP is the wrong tool, as it is not reliable.
UDP has very specific use cases, e.g. an online game that sends a player's co-ordinates. You state that ordering and acknowledgment are needed, so TCP seems the most sensible approach.
Although, just to put a twist in the mix, TCP can sometimes surprise you and be better performance-wise under specific circumstances.
TCP will try to buffer data before it sends it across the network (Nagle's algorithm; a more efficient use of bandwidth), whereas UDP puts each packet on the network immediately.
Now imagine writing lots of small packets across a network: UDP may cause congestion, whereas TCP is better controlled.
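That buffering behaviour is per-socket and optional: latency-sensitive TCP code commonly disables Nagle's algorithm with the TCP_NODELAY option. A minimal sketch with Python's stdlib (loopback addresses are just for the demo):

```python
import socket

# TCP coalesces small writes (Nagle's algorithm) to use bandwidth
# efficiently; UDP sends exactly one datagram per send call. If small-write
# latency matters more than bandwidth, opt out with TCP_NODELAY.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(1)

client = socket.create_connection(listener.getsockname())
client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # Nagle off
server_side, _ = listener.accept()

nodelay = client.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("TCP_NODELAY =", nodelay)  # nonzero: small writes go out immediately

for s in (client, server_side, listener):
    s.close()
```

With TCP_NODELAY set you get much of UDP's "send it now" behaviour while keeping TCP's ordering and retransmission, which is often the right middle ground for the scenario in the question.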
No, it is not a good approach at all. You will now be sending every piece of data twice.
For real-time communication, UDP is the better fit. You should design your receiver algorithm to manage out-of-order arrival (sorting the data) and the non-arrival of some data.
The kind of data being sent can also be a deciding factor. If it is financial transactions, UDP is not a good idea (but then you should be on a different network anyway).
If it is video data, real time matters most and losses can be tolerated.
So see whether you can use the properties of your data to manage a UDP connection well.
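The "give each message an ID" idea from the question, combined with the advice above to handle reordering and duplicates at the receiver, can be sketched as follows. This is a toy illustration using Python's stdlib; the 4-byte sequence-number wire format and the Reorderer class are invented for the demo:

```python
import socket
import struct

def frame(seq: int, payload: bytes) -> bytes:
    """Prefix each datagram with a 4-byte big-endian sequence number."""
    return struct.pack("!I", seq) + payload

class Reorderer:
    """Delivers payloads in sequence order, dropping duplicates."""
    def __init__(self):
        self.next_seq = 0
        self.pending = {}
    def push(self, datagram: bytes):
        seq = struct.unpack("!I", datagram[:4])[0]
        if seq >= self.next_seq:                     # old seq = duplicate
            self.pending.setdefault(seq, datagram[4:])
        delivered = []
        while self.next_seq in self.pending:         # flush any run in order
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
rx.settimeout(2.0)
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
addr = rx.getsockname()

# Simulate misbehaving delivery: out of order, with a duplicate.
for seq, payload in [(1, b"b"), (0, b"a"), (1, b"b"), (2, b"c")]:
    tx.sendto(frame(seq, payload), addr)

r = Reorderer()
out = []
for _ in range(4):
    out += r.push(rx.recvfrom(64)[0])
print(out)  # duplicates dropped, order restored
```

This restores ordering and deduplicates, but it does not add retransmission; a real reliable-over-UDP scheme also needs acknowledgments and timers, which is exactly the complexity the answers warn about.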

Would we see any speedup using ZeroMQ instead of TCP Sockets if the two processes communicating are on the same machine?

I understand that 0MQ is supposed to be faster than TCP Sockets in a clustered environment and I can see where that would be the case (I think that's what they're referring to when they say "Faster than TCP, for clustered products and supercomputing" on the 0MQ website). However, will I see any kind of speedup using 0MQ instead of TCP sockets to communicate between two processes running on the same machine?
Well, the short version is give it a try.
The slightly longer version is that writing TCP socket code correctly can be hard; there are a lot of things that are easy to get wrong, whereas 0MQ guarantees the message will be delivered in its entirety. It is also written by experts in network sockets who, with the best will in the world, you probably aren't, and they use a few advanced tricks to speed things along.
Note that if a VM is involved, you are not actually running on one machine, because the VM is treated as a separate machine. TCP sockets then have to run through the whole network stack and cannot take the shortcuts they take when you communicate between processes on one machine.
However, you could try UDP multicast under ZeroMQ to see if that speeds up your application. UDP is less reliable on a wide area network, but in the closed environment of a VM talking to its host you can safely skip most of TCP's reliability machinery.
I would guess IPC is faster than TCP. If you are willing to move to a single process, INPROC is definitely going to be much faster.
I think (though I have not tested it) that the answer is no: ZMQ likely uses the same standard C socket calls underneath and adds some message headers of its own.
The same applies for UDP.
The same applies for IPC pipes.
So ZMQ could be just as fast, but since it adds headers, it is not likely to be faster.
It could be a different story if you really need some sort of header anyway and ZMQ has implemented it better than you would, e.g. for message size or type; but I digress.
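Following the "give it a try" advice, you can compare the raw transports on your own machine with a tiny ping-pong benchmark. This sketch (Python stdlib) compares TCP over loopback against a local Unix socket pair, which is the kind of transport zmq's ipc:// maps to on Unix; it measures the transports only, not ZMQ itself, and the message size and iteration count are arbitrary:

```python
import socket
import time

MSG = b"x" * 1024
N = 2000  # round-trips; small enough to finish quickly

def roundtrips(a, b):
    """Ping-pong MSG between two connected sockets N times; return seconds."""
    t0 = time.perf_counter()
    for _ in range(N):
        a.sendall(MSG)
        got = b.recv(len(MSG), socket.MSG_WAITALL)
        b.sendall(got)
        a.recv(len(MSG), socket.MSG_WAITALL)
    return time.perf_counter() - t0

# TCP over loopback: traverses the full network stack even on one machine.
lst = socket.socket()
lst.bind(("127.0.0.1", 0))
lst.listen(1)
c1 = socket.create_connection(lst.getsockname())
s1, _ = lst.accept()

# Local IPC transport: a connected Unix-domain socket pair.
c2, s2 = socket.socketpair()

tcp_t = roundtrips(c1, s1)
ipc_t = roundtrips(c2, s2)
print(f"tcp={tcp_t:.3f}s ipc={ipc_t:.3f}s")
```

On most systems the IPC pair comes out faster, but the honest answer remains the one above: measure on your machine with your message sizes.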

Single source pushing: how to send 5kb each 5 minutes to 50000 clients

I need to implement a client-server architecture where the server sends the same message to many clients over the internet.
I need to send a single message about every 5 minutes.
The message won't exceed 5KB.
I need the solution to scale to a large number of connected clients (50,000-100,000).
I considered a bunch of solutions:
TCP Sockets
UDP Multicast
WCF http duplex service (comet)
I think I have to discard the UDP solution because multicast only works well for clients on the same network, and it won't work over the internet.
I read somewhere that WCF multicast will become a bottleneck with many connected clients, but I can't find any documentation showing performance statistics.
TCP sockets seem to me the solution to choose.
What do you think? Am I correct?
I was certainly wrong when I said UDP doesn't work on the internet... I thought this because I read some articles pointing out that you need properly configured routers in the network to support multicasting. I read about the UDP multicast port range and thought it was meant for local use only.
In fact, the range 224.0.0.0 - 239.255.255.255 (the Class D address group) can be reached over the internet.
Considering that in my case reliability is not a crucial point, UDP multicast is a good choice.
The .NET framework offers really helpful classes for this.
I can easily start a UdpClient and begin sending data to a multicast address with two lines of code.
On the client side it is really easy too.
There is the UdpSingleSourceMulticastClient class, which does exactly what I need.
As for reliability and security, the .NET framework has a smart and simple way of handling DoS attacks, DNS rebinding attacks and reverse tunnel attacks, described here: http://msdn.microsoft.com/en-us/library/ee707325(v=vs.95).aspx
The main question is: Do you care if the updates get to the clients?
If you DO, then you will need to build something on top of UDP to add reliability. UDP datagrams are NOT reliable, so you should expect that some won't reach the destination; this is more likely if you push datagrams out quickly. Note that with UDP your clients might also receive multiple copies of the same datagram in some situations.
50-100k connections with this level of traffic shouldn't be that difficult to achieve with TCP if you have a decent architecture.
See here for some blog posts that I've done on the subject.
http://www.serverframework.com/asynchronousevents/2010/10/how-to-support-10000-concurrent-tcp-connections.html
http://www.serverframework.com/asynchronousevents/2010/10/how-to-support-10000-or-more-concurrent-tcp-connections---part-2---perf-tests-from-day-0.html
http://www.serverframework.com/asynchronousevents/2010/12/one-million-tcp-connections.html
And here's some example code that deals with sending data to many clients.
http://www.serverframework.com/ServerFramework/latest/Docs/examples-datadistributionservers.html
Unicast (TCP sockets) will work fine for a relatively small amount of traffic such as this, but keep on top of multicasting technology: the situation changes every year.
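The fan-out itself is simple over TCP; scaling to 50k+ connections is a question of server architecture (asynchronous accept/send rather than a thread per client, as the linked posts describe). A toy-scale sketch of the "same ~5KB to every connected client" shape, in Python's stdlib; the client count, payload contents, and thread-per-client structure are simplifications for the demo, not a production design:

```python
import socket
import threading

PAYLOAD = b"u" * 5 * 1024  # stand-in for the ~5KB update in the question

def serve(listener, n_clients):
    """Accept n clients, then push the same payload to every connection."""
    conns = [listener.accept()[0] for _ in range(n_clients)]
    for c in conns:
        c.sendall(PAYLOAD)
        c.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(64)
addr = listener.getsockname()
threading.Thread(target=serve, args=(listener, 20), daemon=True).start()

received = []  # bytes received per client
def client():
    with socket.create_connection(addr) as s:
        buf = b""
        while chunk := s.recv(4096):  # read until server closes
            buf += chunk
        received.append(len(buf))

threads = [threading.Thread(target=client) for _ in range(20)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(set(received)))
```

At 50,000-100,000 clients you would keep this exact shape (one payload, sendall to each connection) but drive the sockets from an event loop or IOCP-style framework instead of threads.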

UDP for interprocess communications

I have to implement an IPC mechanism (sending short messages) between Java/C++/Python processes running on the same system. One way to implement it is with sockets using the TCP protocol, but this requires maintaining connections and other associated activities.
Instead, I am thinking of using the UDP protocol, which does not require a connection: I can just send messages.
My question is: does UDP on the same machine (for IPC) still have the same disadvantages as when communicating across machines (unreliable packet delivery, out-of-order packets)?
Yes, it is still unreliable. For local communication, try named pipes or shared memory.
Edit:
I don't know the requirements of your application, but did you consider something like MPI (although Java is not well supported...) or Thrift? ( http://thrift.apache.org/ )
Local UDP is still unreliable, but the major advantage is UDP multicast. You can have one data publisher and many data subscribers. The kernel does the job of delivering a copy of the datagram to each subscriber for you.
Unix local datagram sockets, on the other hand, are required to be reliable but they do not support multicast.
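The reliable-but-no-multicast behaviour of Unix local datagram sockets can be seen directly. A small stdlib sketch (Unix-only; the socket path and message count are arbitrary):

```python
import os
import socket
import tempfile

# AF_UNIX SOCK_DGRAM: message-oriented like UDP, but local delivery is
# reliable and ordered. There is no multicast, though: one bound receiver.
path = os.path.join(tempfile.mkdtemp(), "ipc.sock")
rx = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
rx.bind(path)

tx = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
msgs = [b"msg-%d" % i for i in range(5)]
for m in msgs:
    tx.sendto(m, path)

got = [rx.recv(64) for _ in range(5)]
print(got == msgs)  # True: nothing dropped, nothing reordered
rx.close(); tx.close(); os.unlink(path)
```

If you want one-publisher/many-subscriber delivery locally, you either accept UDP multicast's unreliability or have a broker (or a library like ZeroMQ) fan messages out over reliable per-subscriber connections like this one.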
Local UDP is even more unreliable than UDP on a network: 50%+ packet drop under load is possible. It is a terrible choice; kernel developers have attributed the poor quality to lack of demand.
I would recommend investigating message-based middleware, preferably with a BSD-socket-compatible interface for an easy learning curve. One suggestion is ZeroMQ, which includes C++, Java and Python bindings.
Local UDP is both still unreliable and sometimes blocked by firewalls. We faced this in our MsgConnect product, which uses local UDP for inter-thread communication. BTW, MsgConnect could be an option for your task, so that you don't need to deal with sockets yourself. Unfortunately there's no Python binding, but "native" C++ and Java implementations exist.