Large-scale TCP/IP based pub/sub (publish-subscribe) systems

I am designing a TCP/IP based pub/sub system. It is expected to have a high message update rate and a large number of subscribers.
I looked at CometD before, but we realised that the Bayeux protocol it supports is just JSON over HTTP. We don't want HTTP overhead in this system.
Now I am looking at ZeroMQ as a possible solution. Are there any other systems out there that have been proven to handle large-scale pub/sub over TCP/IP?
Update: My publishers are plain TCP/IP clients, but my subscribers are browser-based widgets. As I understand it, ZeroMQ does not have HTTP support for browser-based subscribers. Are there any workarounds for such a case?

You seem to be stating contradictory requirements:
You don't want HTTP overhead
Your clients are browser-based widgets
If you can rewrite your clients, you might consider a 0MQ-to-WebSocket bridge. There are a few floating around, like https://gist.github.com/1051872.
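For the 0MQ side of such a bridge, the publisher itself is small. Here is a minimal sketch in C using libzmq; the endpoint, topic prefix and message contents are illustrative assumptions, not part of any particular bridge:

    /* Sketch: a ZeroMQ publisher over TCP (compile with: gcc pub.c -lzmq).
       Note ZeroMQ's slow-joiner behaviour: subscribers that connect late
       simply miss earlier messages. */
    #include <zmq.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        void *ctx = zmq_ctx_new();
        void *pub = zmq_socket(ctx, ZMQ_PUB);
        zmq_bind(pub, "tcp://*:5556");           /* subscribers connect here */

        for (int i = 0; i < 10; i++) {
            /* SUB sockets filter on this leading topic prefix. */
            const char *msg = "quotes AAPL 182.31";
            zmq_send(pub, msg, strlen(msg), 0);
            sleep(1);
        }

        zmq_close(pub);
        zmq_ctx_destroy(ctx);
        return 0;
    }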
Also, when you explain your requirements, please provide figures. "High message update rate" and "large number of subscribers" mean very little. 10 messages/sec or 1M/sec? 50 subscribers or 50,000? It is also worth noting the average message size, whether you have to work over the public Internet, and any other constraints.

Related

Kafka messages over REST API

We currently have a library which we use to interact with Kafka, but we are planning to develop this library into a separate application. Other applications will send Kafka messages using a REST endpoint. We plan to use Vert.x in this application to make it non-blocking and fast. Is this a good strategy? My concerns are: 1) HTTP will make it slower compared to Kafka's native TCP protocol; 2) streaming may not be possible; 3) it becomes a single point of failure.
On the other hand, as a separate application, release management, control and support will be a lot easier than they are currently.
Is this a good strategy, and has someone done something like this before? Any suggestions?
Your choice between HTTP and native TCP will depend on the number and nature of the applications that will be talking to your service. Say there is an IoT device that is sending lots of messages continuously: using HTTP there will be expensive and will increase latency, since HTTP connection establishment is an expensive operation.
Now consider a transactional system that sends transaction events as they commit to your database. The rate of messages will presumably be lower, so it makes sense to use HTTP there.
In short, the rate of messages your service will receive should decide the approach you take.
As for your current approach of maintaining a library, it is a good way to maintain consistency across the organisation, as long as the library is maintained and its users update promptly whenever you make changes. It also has the advantage of not requiring separate infrastructure/servers, since your code runs inside your users' applications.
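For comparison, the native-client path the shared library presumably wraps is quite direct. A hedged sketch using librdkafka (a C client for Kafka); the broker address and topic name are assumptions, and error handling is omitted:

    /* Sketch: producing one message with librdkafka (link with -lrdkafka).
       "localhost:9092" and the "events" topic are illustrative. */
    #include <librdkafka/rdkafka.h>
    #include <string.h>

    int main(void)
    {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();
        rd_kafka_conf_set(conf, "bootstrap.servers", "localhost:9092",
                          errstr, sizeof(errstr));

        rd_kafka_t *rk = rd_kafka_new(RD_KAFKA_PRODUCER, conf,
                                      errstr, sizeof(errstr));

        const char *payload = "{\"id\":42}";
        rd_kafka_producev(rk,
                          RD_KAFKA_V_TOPIC("events"),
                          RD_KAFKA_V_VALUE((void *)payload, strlen(payload)),
                          RD_KAFKA_V_END);

        rd_kafka_flush(rk, 5000);   /* wait up to 5 s for delivery */
        rd_kafka_destroy(rk);
        return 0;
    }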

IBM IoT Foundation: When to use MQTT and when to use REST for event submission?

The IBM IoT Foundation allows devices to submit events to the IBM cloud for consumption and recording. There appear to be two primary mechanisms for transmitting events: MQTT and REST (HTTP POST requests). Assuming that a project will have sensors with direct TCP connectivity to the IBM cloud over the Internet, what might we consider as the potential distinctions between the two technologies? What factors would cause us to choose MQTT or REST? Are there any substantial performance differences at the final mile at the IBM end that would make one technology preferable to the other?
MQTT is designed to be a fast and lightweight messaging protocol and is, as a result, faster and more efficient than HTTP when used for equivalent work. More efficient means not only less traffic data and more speed, but sometimes less electrical power as well. MQTT is particularly good where bandwidth is a concern.
MQTT does, however, need a client implementation (like Paho), which is possibly a rarer thing than an HTTP client implementation; HTTP clients are more ubiquitous and therefore more likely to be readily available on any given device.
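To give a sense of what that client code looks like, here is a minimal publish sketch using the Paho MQTT C client; the broker URI, client ID, topic and payload are illustrative assumptions:

    /* Sketch: publishing one event with the Paho MQTT C client
       (link with -lpaho-mqtt3c). All names below are illustrative. */
    #include "MQTTClient.h"

    int main(void)
    {
        MQTTClient client;
        MQTTClient_create(&client, "tcp://broker.example.com:1883",
                          "sensor-01", MQTTCLIENT_PERSISTENCE_NONE, NULL);

        MQTTClient_connectOptions opts = MQTTClient_connectOptions_initializer;
        MQTTClient_connect(client, &opts);

        MQTTClient_message msg = MQTTClient_message_initializer;
        msg.payload = (void *)"23.5";
        msg.payloadlen = 4;
        msg.qos = 1;                  /* at-least-once delivery */

        MQTTClient_deliveryToken token;
        MQTTClient_publishMessage(client, "sensors/temperature", &msg, &token);
        MQTTClient_waitForCompletion(client, token, 1000L);

        MQTTClient_disconnect(client, 1000);
        MQTTClient_destroy(&client);
        return 0;
    }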
There are also TCP/IP port considerations, where some network hardware may require HTTP ports 80 or 443 (although IoTF supports MQTT and MQTTWS on port 443).
There may also be an ideological or philosophical reason for choosing HTTP instead of MQTT (or CoAP, for that matter), but usually I would say the reasons for choosing HTTP over MQTT would be network-related or client-support-related.
There is no official paper on the performance differences yet, but it is safe to say MQTT will be more efficient and faster in just about any messaging scenario (long-lived connections, ad hoc, etc.).
I would summarize the considerations as:
MQTT will support higher throughput, and the API is much simpler compared to a REST API.
A REST API is likely much more readily available on IoT devices, BUT this could be changing, as MQTT is gaining in popularity and big players like Google Cloud Platform and IBM Bluemix support MQTT in their IoT services.

Differences between AMQP and ZeroMQ

I recently started looking into AMQP (RabbitMQ, ActiveMQ) and ZeroMQ, being interested in distributed systems/computation. I've been Googling and StackOverflow'ing around but couldn't find a definite comparison between the two.
The farthest I got is that the two aren't really comparable, but I want to know the differences. It seems to me ZeroMQ is more decentralized (no message broker playing middle-man, handling messages and guaranteeing delivery) and as such is faster, but is not meant to be a fully fledged system, rather something to be handled more programmatically, something like Actors.
AMQP, on the other hand, seems to be a more fully fledged system, with a central message broker ensuring reliable delivery, but slower than ZeroMQ because of this. However, the central broker creates a single point of failure.
Perhaps a metaphor would be client/server vs. P2P?
Are my findings true? Also, what would be the advantages, disadvantages, or use cases of using one over the other? A comparison of the uses of *MQ vs. something like Akka Actors would be nice as well.
EDIT: Did a bit more looking around. ZeroMQ seems to be the new contender to AMQP and seems to be much faster; the only issue would be adoption/implementations?
Here's a fairly detailed comparison of AMQP and 0MQ: http://www.zeromq.org/docs:welcome-from-amqp
Note that 0MQ is also a protocol (ZMTP) with several implementations, and a community.
AMQP is a protocol. ZeroMQ is a messaging library.
AMQP offers flow control and reliable delivery. It defines standard but extensible meta-data for messages (e.g. reply-to, time-to-live, plus any application defined headers). ZeroMQ simply provides message delimitation (i.e. breaking a byte stream up into atomic units), and assumes the properties of the underlying protocol (e.g. TCP) are sufficient or that the application will build extra functionality for flow control, reliability or whatever on top of ZeroMQ.
Although earlier versions of AMQP were defined along client/server lines and therefore required a broker, that is no longer true of AMQP 1.0, which at its core is a symmetric, peer-to-peer protocol. Rules for intermediaries (such as brokers) are layered on top of that. The link from Alexis comparing brokered and brokerless gives a good description of the benefits such intermediaries can offer. AMQP defines the rules for interoperability between different components - clients, 'smart clients', brokers, bridges, routers etc. - such that a system can be composed by selecting the parts that are useful.
In ZeroMQ there are NO MESSAGE QUEUES at all, thus the name. It merely provides a way to use messaging semantics over otherwise ordinary sockets.
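To illustrate those socket-level semantics, here is the subscriber counterpart to a plain PUB socket; a sketch with an assumed endpoint and topic prefix:

    /* Sketch: a ZeroMQ subscriber over TCP (compile with: gcc sub.c -lzmq).
       The endpoint and "quotes" prefix are illustrative. */
    #include <zmq.h>
    #include <stdio.h>

    int main(void)
    {
        void *ctx = zmq_ctx_new();
        void *sub = zmq_socket(ctx, ZMQ_SUB);
        zmq_connect(sub, "tcp://localhost:5556");
        zmq_setsockopt(sub, ZMQ_SUBSCRIBE, "quotes", 6);  /* prefix filter */

        char buf[256];
        int n = zmq_recv(sub, buf, sizeof(buf) - 1, 0);   /* one whole message */
        if (n >= 0) {
            if (n > (int)sizeof(buf) - 1)                 /* message truncated */
                n = sizeof(buf) - 1;
            buf[n] = '\0';
            printf("got: %s\n", buf);
        }

        zmq_close(sub);
        zmq_ctx_destroy(ctx);
        return 0;
    }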
AMQP is a standard protocol for message queueing, meant to be used with a message broker handling all message sends and receives. It has a lot of features, which are available because it funnels all message traffic through a broker. This may sound slow, but it is actually quite fast when used inside a data centre where host-to-host latencies are tiny.
I'm not really sure how to respond to your question, which compares a lot of different things, but see this post, which may help you begin to dig into these issues: http://www.rabbitmq.com/blog/2010/09/22/broker-vs-brokerless/
AMQP (Advanced Message Queuing Protocol) is a standard binary wire-level protocol that enables conforming client applications to communicate with conforming messaging middleware brokers. AMQP allows services/systems, across enterprises or within one, to exchange messages easily, regardless of message broker vendor and platform. Many brokers implement the AMQP protocol, such as RabbitMQ, Apache Qpid and Apache Apollo.
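As a concrete contrast with the ZeroMQ sketch above, publishing through an AMQP 0-9-1 broker with the rabbitmq-c client looks roughly like this; the host, credentials, exchange, routing key and body are all illustrative, and error checks are omitted:

    /* Sketch: publishing via RabbitMQ with rabbitmq-c (link with -lrabbitmq). */
    #include <amqp.h>
    #include <amqp_tcp_socket.h>

    int main(void)
    {
        amqp_connection_state_t conn = amqp_new_connection();
        amqp_socket_t *sock = amqp_tcp_socket_new(conn);
        amqp_socket_open(sock, "localhost", 5672);
        amqp_login(conn, "/", 0, 131072, 0, AMQP_SASL_METHOD_PLAIN,
                   "guest", "guest");
        amqp_channel_open(conn, 1);

        amqp_basic_publish(conn, 1,
                           amqp_cstring_bytes("amq.direct"),   /* exchange */
                           amqp_cstring_bytes("quotes"),       /* routing key */
                           0, 0, NULL,
                           amqp_cstring_bytes("AAPL 182.31")); /* body */

        amqp_channel_close(conn, 1, AMQP_REPLY_SUCCESS);
        amqp_connection_close(conn, AMQP_REPLY_SUCCESS);
        amqp_destroy_connection(conn);
        return 0;
    }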
ZeroMQ is a high-performance asynchronous messaging library aimed at use in scalable distributed or concurrent applications. It provides a message queue, but unlike message-oriented middleware, a ØMQ system can run without a dedicated message broker.
"Broker-less" is something of a misnomer: compared to message brokers like ActiveMQ, Qpid or Kafka, ZeroMQ simply moves the wiring into your application.
It is useful, and can be applied at hotspots to reduce network hops and hence latency. But as you add reliability, store-and-forward and high-availability requirements, you will probably need a distributed broker service along with a queue for sharing data, to support loose coupling (decoupled in time). That topology and architecture can also be implemented using ZeroMQ. You have to consider your use cases and see whether asynchronous messaging is required, and if so, where ZeroMQ would fit; it appears to have a good role in the solution. A reasonable knowledge of TCP/IP and socket programming will help you appreciate ZeroMQ, AMQP and all the others.

REST for low latency messaging

Why don't you see more people using REST architecture for client/server systems? You see people using sockets, TIBCO RV, EMS or MQ, but I haven't seen much basic REST architecture.
Does anyone know a reason why you would avoid this architecture for client/server communication that needs high throughput / low latency?
REST is not a good fit for every problem.
REST is best for resource management. If you are writing web services (as with a client-server system), you find you want things like language-agnostic data representation, argument validation, client/server code generation, error handling and access controls. REST basically requires you to code those things yourself.
On the other hand, it adds the HTTP layer. You get seamless integration of proxies, caching etc, but you do lose some speed due to HTTP headers, the webserver frontend, etc.
I don't know that I would necessarily avoid it, but I can think of a couple of reasons why I might not choose it for a high-throughput, low-latency service. First, you have to deal with the entire web stack to get your message to your service. This can introduce a number of unnecessary layers and services that delay messages. A custom service need only support the protocol layers required by the service itself.
Second, unless your service is the only service hosted on the web server, you'll be competing with other requests for your messages to be serviced. While having a custom endpoint for your service may not solve all resource contention problems, at least you don't have to compete for access from other services to your endpoint.
Third, a custom protocol need only carry the actual service-related protocol information, which may result in smaller packets because you don't have the additional HTTP protocol overhead. This particularly affects protocols that exchange small messages, where the header information becomes a large fraction of the message size.
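As an illustration of how small a custom protocol layer can be, here is a sketch of length-prefixed framing over a raw TCP socket; the 4-byte header is the entire per-message overhead. The function name and framing format are assumptions, not any particular standard:

    /* Sketch: length-prefixed framing over TCP. The only per-message
       overhead is a 4-byte big-endian length header. A production
       version would also loop on partial sends. */
    #include <arpa/inet.h>
    #include <stdint.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Send one message: 4-byte length, then the payload. 0 on success. */
    int send_frame(int fd, const void *payload, uint32_t len)
    {
        uint32_t hdr = htonl(len);
        if (send(fd, &hdr, sizeof(hdr), 0) != (ssize_t)sizeof(hdr))
            return -1;
        if (send(fd, payload, len, 0) != (ssize_t)len)
            return -1;
        return 0;
    }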

What do you use when you need reliable UDP?

If you have a situation where a TCP connection is potentially too slow and a UDP 'connection' is potentially too unreliable, what do you use? There are various standard reliable UDP protocols out there; what experiences do you have with them?
Please discuss one protocol per reply and if someone else has already mentioned the one you use then consider voting them up and using a comment to elaborate if required.
I'm interested in the various options here, of which TCP is at one end of the scale and UDP is at the other. Various reliable UDP options are available and each brings some elements of TCP to UDP.
I know that often TCP is the correct choice, but having a list of the alternatives is often useful in helping one come to that conclusion. Things like ENet, RUDP, etc. that are built on UDP have various pros and cons; have you used them, and what are your experiences?
For the avoidance of doubt, there is no more information; this is a hypothetical question and one that I hoped would elicit a list of responses detailing the various options and alternatives available to someone who needs to make a decision.
What about SCTP? It's a standard protocol by the IETF (RFC 4960).
It has chunking capability, which could help with speed.
Update: a comparison between TCP and SCTP shows that the performance is comparable unless two interfaces can be used.
Update: a nice introductory article.
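As a quick orientation, opening a one-to-one SCTP association on Linux looks almost exactly like TCP; this sketch assumes kernel SCTP support is available, and the address and port are illustrative:

    /* Sketch: a one-to-one SCTP socket on Linux (kernel SCTP module
       assumed; error checks omitted). */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(9899);
        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

        /* From here on the socket behaves much like TCP, but preserves
           message boundaries and can multiplex several streams. */
        connect(fd, (struct sockaddr *)&addr, sizeof(addr));
        send(fd, "hello", 5, 0);
        return 0;
    }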
It's difficult to answer this question without some additional information on the domain of the problem.
For example, what volume of data are you using? How often? What is the nature of the data? (eg. is it unique, one off data? Or is it a stream of sample data? etc.)
What platform are you developing for? (eg. desktop/server/embedded)
To determine what you mean by "too slow", what network medium are you using?
But in (very!) general terms, I think you're going to have to try really hard to beat TCP for speed unless you can make some hard assumptions about the data that you're trying to send.
For example, if the data is such that you can tolerate the loss of a single packet (e.g. regularly sampled data where the sampling rate is many times higher than the bandwidth of the signal), then you can probably sacrifice some transmission reliability by ensuring that you can detect data corruption (e.g. through the use of a good CRC).
But if you cannot tolerate the loss of a single packet, then you're going to have to start introducing the kinds of reliability techniques that TCP already has, and without putting in a reasonable amount of work, you may find that you're rebuilding those elements in a user-space solution, with all the inherent speed issues that go with it.
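For the corruption-detection part, a checksum is only a few lines; this sketch uses zlib's crc32(), one widely available implementation (the function name here is illustrative):

    /* Sketch: computing a CRC-32 for a datagram via zlib (link with -lz)
       so the receiver can detect corruption and drop bad packets. */
    #include <zlib.h>

    unsigned long frame_crc(const unsigned char *payload, unsigned int len)
    {
        uLong crc = crc32(0L, Z_NULL, 0);   /* standard initial CRC value */
        return crc32(crc, payload, len);    /* append to the packet, e.g.
                                               as 4 network-order bytes */
    }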
ENET - http://enet.bespin.org/
I've worked with ENet as a reliable UDP protocol and written an asynchronous-sockets-friendly version for a client of mine who is using it in their servers. It works quite nicely, but I don't like the overhead that the peer-to-peer ping adds to otherwise idle connections; when you have lots of connections, pinging all of them regularly is a lot of busy work.
ENet gives you the option to send multiple 'channels' of data and for the data sent to be unreliable, reliable or sequenced. It also includes the aforementioned peer-to-peer ping, which acts as a keep-alive.
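A minimal ENet client gives a feel for the channel and reliability options just mentioned; the host, port and channel counts here are illustrative:

    /* Sketch: connecting and sending one reliable packet with ENet
       (link with -lenet). Error handling is largely omitted. */
    #include <enet/enet.h>

    int main(void)
    {
        enet_initialize();

        /* Client host: up to 1 peer, 2 channels, unthrottled bandwidth. */
        ENetHost *client = enet_host_create(NULL, 1, 2, 0, 0);

        ENetAddress address;
        enet_address_set_host(&address, "localhost");
        address.port = 1234;

        ENetPeer *peer = enet_host_connect(client, &address, 2, 0);

        ENetEvent event;
        /* Wait up to 5 s for the connection to complete before sending. */
        if (enet_host_service(client, &event, 5000) > 0 &&
            event.type == ENET_EVENT_TYPE_CONNECT) {
            /* Reliable delivery on channel 0; drop ENET_PACKET_FLAG_RELIABLE
               for fire-and-forget data. */
            ENetPacket *packet = enet_packet_create("hello", 6,
                                                    ENET_PACKET_FLAG_RELIABLE);
            enet_peer_send(peer, 0, packet);
            enet_host_flush(client);
        }

        enet_host_destroy(client);
        enet_deinitialize();
        return 0;
    }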
We have some defense industry customers that use UDT (UDP-based Data Transfer) (see http://udt.sourceforge.net/) and are very happy with it. I see that it has a friendly BSD license as well.
Anyone who decides that the list above isn't enough and that they want to develop their OWN reliable UDP should definitely take a look at the Google QUIC spec as this covers lots of complicated corner cases and potential denial of service attacks. I haven't played with an implementation of this yet, and you may not want or need everything that it provides, but the document is well worth reading before embarking on a new "reliable" UDP design.
A good jumping off point for QUIC is here, over at the Chromium Blog.
The current QUIC design document can be found here.
RUDP - Reliable User Datagram Protocol
This provides:
Acknowledgment of received packets
Windowing and congestion control
Retransmission of lost packets
Overbuffering (Faster than real-time streaming)
It seems slightly more configurable with regard to keep-alives than ENet, but it doesn't give you as many options (i.e. all data is reliable and sequenced, not just the bits that you decide should be). It looks fairly straightforward to implement.
As others have pointed out, your question is very general, and whether or not something is 'faster' than TCP depends a lot on the type of application.
TCP is generally as fast as it gets for reliably streaming data from one host to another. However, if your application sends a lot of small bursts of traffic and waits for responses, UDP may be more appropriate for minimizing latency.
There is an easy middle ground. Nagle's algorithm is the part of TCP that coalesces small outgoing writes into fewer, larger segments, trading a little latency for less per-packet overhead on the network.
If you need the reliable, in-order delivery of TCP but also the fast response of UDP, and your traffic is small request/response messages rather than large streams, you can disable Nagle's algorithm:
    /* Needs <netinet/tcp.h> for TCP_NODELAY and <sys/socket.h> for
       setsockopt(); any nonzero option value enables TCP_NODELAY. */
    int opt = 1;
    if (setsockopt(sock_fd, IPPROTO_TCP, TCP_NODELAY, (char *)&opt, sizeof(opt)) != 0)
        printf("Error disabling Nagle's algorithm.\n");
If you have a situation where a TCP connection is potentially too slow and a UDP 'connection' is potentially too unreliable, what do you use? There are various standard reliable UDP protocols out there; what experiences do you have with them?
The key word in your sentence is 'potentially'. I think you really need to prove to yourself that TCP is, in fact, too slow for your needs if you need reliability in your protocol.
If you want to get reliability out of UDP, then you're basically going to be re-implementing some of TCP's features on top of UDP, which will probably make things slower than just using TCP in the first place.
Protocol DCCP, standardized in RFC 4340 ("Datagram Congestion Control Protocol"), may be what you are looking for.
It is implemented in Linux.
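Trying it out on Linux only needs the ordinary sockets API; this sketch assumes the dccp kernel module is loaded and defines the constants in case the libc headers lack them:

    /* Sketch: opening a DCCP socket on Linux. SOCK_DCCP/IPPROTO_DCCP are
       kernel constants; some libc headers omit them, hence the fallbacks. */
    #include <netinet/in.h>
    #include <sys/socket.h>

    #ifndef SOCK_DCCP
    #define SOCK_DCCP 6
    #endif
    #ifndef IPPROTO_DCCP
    #define IPPROTO_DCCP 33
    #endif

    int main(void)
    {
        /* Unreliable datagrams with TCP-like congestion control. */
        int fd = socket(AF_INET, SOCK_DCCP, IPPROTO_DCCP);
        return fd < 0;   /* nonzero exit if DCCP is unavailable */
    }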
Maybe RFC 5405, "Unicast UDP Usage Guidelines for Application Designers", will be useful for you as well.
RUDP. Many socket servers for games implement something similar.
Did you consider compressing your data?
As stated above, we lack information about the exact nature of your problem, but compressing the data before transport could help.
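If the payloads are compressible, even zlib's simplest one-shot call can cut datagram sizes noticeably; a sketch (function name and buffer handling are illustrative):

    /* Sketch: compressing a payload with zlib's compress() before
       sending it in a datagram (link with -lz). */
    #include <zlib.h>

    /* Returns the compressed size, or 0 on failure. */
    unsigned long shrink(const unsigned char *src, unsigned long src_len,
                         unsigned char *dst, unsigned long dst_cap)
    {
        uLongf out_len = dst_cap;
        if (compress(dst, &out_len, src, src_len) != Z_OK)
            return 0;
        return out_len;   /* send dst[0..out_len) instead of the raw payload */
    }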
It is hard to give a universal answer to the question but the best way is probably not to stay on the line "between TCP and UDP" but rather to go sideways :).
A bit more detailed explanation:
If an application needs a confirmation response for every piece of data it transmits, then TCP is pretty much as fast as it gets (especially if your messages are much smaller than the optimal MTU for your connection). If you need to send periodic data that expires the moment you send it, then raw UDP is the best choice, for many reasons but not particularly for speed.
Reliability is a more complex question; it is somewhat relative in both cases and always depends on the specific application. For a simple example: if you unplug the Internet cable from your router, good luck reliably delivering anything with TCP. And what is even worse, if you don't do something about it in your code, your OS will most likely just block your application for a couple of minutes before indicating an error, and in many cases that delay is not acceptable either.
So the question with conventional network protocols is generally not really about speed or reliability but rather about convenience: getting some features of TCP (automatic congestion control, automatic transmission unit size adjustment, automatic retransmission, basic connection management, ...) while also getting at least some of the important and useful features it misses (message boundaries - the most important one - connection quality monitoring, multiple streams within a connection, etc.) without having to implement them yourself.
From my point of view, SCTP now looks like the best universal choice, but it is not very popular, and the only realistic way to reliably pass it across today's Internet is still to wrap it inside UDP (probably using sctplib). It is also still a relatively basic and compact solution, and for some applications it may still not be sufficient by itself.
As for the more advanced options: in some of our projects we used ZeroMQ, and it worked just fine. It is much more of a complete solution, not just a network protocol (under the hood it supports TCP, UDP, a couple of higher-level protocols and some local IPC mechanisms to actually deliver messages). A few releases ago its initial developer switched his attention to his new nanomsg and then the newer NNG libraries; these are not as thoroughly developed, tested or popular, but someday that may change. If you don't mind the CPU overhead and some network bandwidth loss, some of these libraries might work for you. There are other network-oriented message-exchange libraries available as well.
You should check MoldUDP, which has been around for decades and is used by Nasdaq's ITCH market data feed. Our messaging system CoralSequencer uses it to implement a reliable multicast event-stream from a central process.
Disclaimer: I'm one of the developers of CoralSequencer
The best way to achieve reliability using UDP is to build the reliability into the application program itself (for example, by adding acknowledgment and retransmission mechanisms).
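As a concrete starting point, here is a stop-and-wait sketch of exactly that: each datagram carries a sequence number, and the sender retransmits until a matching ACK arrives or it gives up. The function name, header layout, retry count and 200 ms timeout are all illustrative assumptions:

    /* Sketch: stop-and-wait reliability over a connected UDP socket.
       A 4-byte sequence header is prepended; the receiver is expected
       to echo that sequence number back as the ACK. */
    #include <arpa/inet.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/time.h>

    #define MAX_TRIES 5

    int send_reliable(int fd, uint32_t seq, const void *data, size_t len)
    {
        unsigned char pkt[1504];                /* 4-byte header + payload */
        uint32_t net_seq = htonl(seq), ack;

        if (len > sizeof(pkt) - 4)
            return -1;
        memcpy(pkt, &net_seq, 4);
        memcpy(pkt + 4, data, len);

        struct timeval tv = { 0, 200000 };      /* 200 ms ACK timeout */
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

        for (int attempt = 0; attempt < MAX_TRIES; attempt++) {
            send(fd, pkt, len + 4, 0);
            if (recv(fd, &ack, sizeof(ack), 0) == sizeof(ack) &&
                ack == net_seq)
                return 0;                       /* acknowledged */
        }
        return -1;                              /* gave up */
    }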