Implementing a message bus using ZeroMQ - sockets

I have to develop a message bus for processes to send, receive messages from each other. Currently, we are running on Linux with the view of porting to other platforms later.
For this, I am using ZeroMQ over TCP. The pattern is PUB-SUB with a forwarder. My bus runs as a separate process and all clients connect to SUB port to receive messages and PUB to send messages. Each process subscribes to messages by a unique tag. A send call from a process sends messages to all. A receive call will fetch that process the messages marked with the tag of that process. This is working fine.
Now I need to wrap the ZeroMQ stuff. My clients only need to supply a unique tag. I need to maintain a global list of tags vs. ZeroMQ context and sockets details. When a client say,
initialize_comms("name"); the bus needs to check if this name is unique, create ZeroMQ contexts and sockets. Similarly, if a client say receive("name"); the bus needs to fetch messages with that tag.
To summarize the problems I am facing;
Is there anyway to achieve this using facilities provided by ZeroMQ?
Is ZeroMQ the right tool for this, or should I look for something like nanomsg?
Is PUB-SUB with forwarder the right pattern for this?
Or, am I missing something here?

Answers
Yes, ZeroMQ is capable of serving this need
Yes. ZeroMQ is a right tool ( rather a powerful tool-box of low-latency components ) for this. While nanomsg has a straight primitive for bus, the core distributed logic can be integrated in ZeroMQ framework
Yes & No. PUB-SUB as given above may serve for emulation of the "shout-cast"-to-bus and build on a SUB side-effect of using a subscription key(s). The WHOLE REST of the logic has to be re-thought and designed so as the whole scope of the fabrication meets your plans (ref. below). Also kindly bear in mind, that initial versions of ZeroMQ operated PUB/SUB primitive as "subscription filtering" of the incoming stream of messages being done on receiver side, so massive designs shall check against traffic-volumes / risk-of-flooding / process-inefficiency on the massive scale...
Yes. ZeroMQ is rather a well-tuned foundation of primitive elements ( as far as the architecture is discussed, not the power & performance thereof ) to build more clever, more robust & almost-linearly-scaleable Formal Communication Pattern(s). Do not get stuck to PUB/SUB or PAIR primitives once sketching Architecture. Any design will remain poor if one forgets where the True Powers comes from.
A good place to start a next step forward towards a scaleable & fault-resilient Bus
Thus a best next step one may do is IMHO to get a bit more global view, which may sound complicated for the first few things one tries to code with ZeroMQ, but if you at least jump to the page 265 of the Code Connected, Volume 1, if it were not the case of reading step-by-step thereto.
The fastest-ever learning-curve would be to have first an un-exposed view on the Fig.60 Republishing Updates and Fig.62 HA Clone Server pair for a possible High-availability approach and then go back to the roots, elements and details.

Here is what I ended up designing, if anyone is interested. Thanks everyone for the tips and pointers.
I have a message bus implemented using ZeroMQ (and CZMQ) running as a separate process.
The pattern is PUBLISHER-SUBSCRIBER with a LISTENER. They are connected using a PROXY.
In addition, there is a ROUTER invoked using a newly forked thread.
These three endpoints run on TCP and are bound to predefined ports which the clients know of.
PUBLISHER accepts all messages from clients.
SUBSCRIBER sends messages with a unique tag to the client who have subscribed to that tag.
LISTENER listens to all messages passing through. currently, this is for logging testing and purposes.
ROUTER provides a separate comms channel to clients. Messages such as control commands are directed here so that they will not get passed downstream.
Clients connect to,
PUBLISHER to send messages.
SUBSCRIBER to receive messages. Subscription is using unique tags.
ROUTER to send commands (check tag uniqueness etc.)
I am still doing implementation so there may be unseen problems, but right now it works fine. Also, there may be a more elegant way but I didn't want to throw away the PUB-SUB thing I had built.

Related

Is the OS always the server?

Given the server-client model, would the OS initiate messages to applications, or is message passing always initiated by programs that want to use resources and thus must communicate with the OS?
OS is an overloaded term, and application is a vague term.
A pure message passing OS might implement traditional (unix) system calls in applications. For example, you might have an application called FileSystem, which accepts messages like Read,Write,Open,Close.... In these, such an application would be considered a server, and the client would be an application which wanted to use the File Services.
Pure message passing systems typically have difficulty with asynchronous events. When you look at implementing a normal read system call in a message passing system, it is natural that it will be an RPC: the client sends a read request, then suspends until the server has satisfied the read and sent a reply.
When the client wants asynchronous notification, such as send me a message when there is new mouse events available; the RPC somewhat falls down. While purely asynchronous systems exist, they are cumbersome to use with plain old programming languages like C, C++, ... There is hope that message based languages like Golang can break the impass, but that is yet to be seen.
Higher level OS-like services may deploy a number of interaction methods, quite distinct from client serve. Publish-Subscribe, a more recent reimplmentation of the 1980s multi-catch, has been popular in the last decade. Clients subscribe to a set of channels that they are interested in, and every event delivered to that channel is copied to every client subscribed to the channel before it is retired. Normal clients can generate events as well, so the mechanism serves as a dynamic interconnect between modules.
Dbus + zeromq are P-S systems of differing scales. Note that both can be implemented outside of a message passing OS.

Pub/Sub and consumer aware publishing. Stop producing when nobody is subscribed

I'm trying to find a messaging system that supports the following use case.
Producer registers topic namespace
client subscribes to topic
first client triggers notification on producer to start producing
new client with subscription to the same topic receives data (potentially conflated, similar to hot/cold observables in RX world)
When the last client goes away, unsubscribe or crash, notify the producer to stop producing to said topic
I am aware that according to the pub/sub pattern A producer is defined to be blissfully unaware of the existence of consumers, meaning that my use-case simply does not fit the pub/sub paradigm.
So far I have looked into Kafka, Redis, NATS.io and Amazon SQS, but without much success. I've been thinking about a few possible ways to solve this, Haven't however found anything that would satisfy my needs yet.
One option that springs to mind, for bullet 2) is to model a request/reply pattern as amongs others detailed on the NATS page to have the producer listen to clients. A client would then publish a 'subscribe' message into the system that the producer would pick up on a wildcard subscription. This however leaves one big problem, which is unsubscribing. Assuming the consumer stops as it should, publishing an unsubscribe message just like the subscribe would work. But in the case of a crash or similar this won't work.
I'd be grateful for any ideas, references or architectural patterns/best practices that satisfy the above.
I've been doing quite a bit of research over the past week but haven't come across any satisfying Q&A or articles. Either I'm approaching it entirely wrong, or there just doesn't seem to be much out there which would surprise me as to me, this appears to be a fairly common scenario that applies to many domains.
thanks in advance
Chris
//edit
An actual simple use-case that I have at hand is stock quote distribution.
Quotes come from external source
subscribe to stock A quotes from external system when the first end-user looks at stock A
Stop receiving quotes for stock A from external system when no more end-users look at said stock
RabbitMQ has internal events you can use with the Event Exchange Plugin. Event such as consumer.created or consumer.deleted could be use to trigger some actions at your server level: for example, checking the remaining number of consumers using RabbitMQ Management API and takes action such as closing a topic, based on your use cases.
I don't have any messaging consumer present based publishing in mind. Got ever worst because you'll need kind of heartbeat mechanism to handle consumer crashes.
So here are my two cents, not sue if you're looking for an out of the box solution, but if not, you could wrap your application around a zookeeper cluster to handle all your use cases.
Simply use watchers on ephemeral nodes to check when you have no more consumers ( including crashes) and put some watcher around a 'consumers' path to be advertised when you get consumers.
Consumers side, you would have to register your zk node ID whenever you start it.
It's not so complicated to do, and zk is not the only solution for this, you might use other consensus techs as well.
A start for zookeeper :
https://zookeeper.apache.org/doc/r3.1.2/zookeeperStarted.html
( strongly advise to use curator api, which handle lot of recipes in a smooth way)
Yannick
Unfortunately you haven't specified your use business use case that you try to solve with such requirements. From the sound of it you want not the pub/sub system, but an orchestration one.
I would recommend checking out the Cadence Workflow that is capable of supporting your listed requirements and many more orchestration use cases.
Here is a strawman design that satisfies your requirements:
Any new subscriber sends an event to a workflow with a TopicName as a workflowID to subscribe. If workflow with given ID doesn't exist it is automatically started.
Any subscribe sends another signal to unsubscribe.
When no subscribers are left workflow exits.
Publisher sends an event to the workflow to deliver to subscribers.
Workflow delivers the event to the subscribers using an activity.
If workflow with given TopicName doesn't run the publish event to it is going to fail.
Cadence offers a lot of other advantages over using queues for task processing.
Built it exponential retries with unlimited expiration interval
Failure handling. For example it allows to execute a task that notifies another service if both updates couldn't succeed during a configured interval.
Support for long running heartbeating operations
Ability to implement complex task dependencies. For example to implement chaining of calls or compensation logic in case of unrecoverble failures (SAGA)
Gives complete visibility into current state of the update. For example when using queues all you know if there are some messages in a queue and you need additional DB to track the overall progress. With Cadence every event is recorded.
Ability to cancel an update in flight.
Distributed CRON support
See the presentation that goes over Cadence programming model.

What happens to messages that come to a server implements stream processing after the source reached its bound?

Im learning akka streams but obviously its relevant to any streaming framework :)
quoting akka documentation:
Reactive Streams is just to define a common mechanism of how to move
data across an asynchronous boundary without losses, buffering or
resource exhaustion
Now, from what I understand is that if up until before streams, lets take an http server for example, the request would come and when the receiver wasent finished with a request, so the new requests that are coming will be collected in a buffer that will hold the waiting requests, and then there is a problem that this buffer have an unknown size and at some point if the server is overloaded we can loose requests that were waiting.
So then stream processing came to play and they bounded this buffer to be controllable...so we can predefine the number of messages (requests in my example) we want to have in line and we can take care of each at a time.
my question, if we implement that a source in our server can have a 3 messages at most, so if the 4th id coming what happens with it?
I mean when another server will call us and we are already taking care of 3 requests...what will happened to he's request?
What you're describing is not actually the main problem that Reactive Streams implementations solve.
Backpressure in terms of the number of requests is solved with regular networking tools. For example, in Java you can configure a thread pool of a networking library (for example Netty) to some parallelism level, and the library will take care of accepting as much requests as possible. Or, if you use synchronous sockets API, it is even simpler - you can postpone calling accept() on the server socket until all of the currently connected clients are served. In either case, there is no "buffer" on either side, it's just until the server accepts a connection, the client will be blocked (either inside a system call for blocking APIs, or in an event loop for async APIs).
What Reactive Streams implementations solve is how to handle backpressure inside a higher-level data pipeline. Reactive streams implementations (e.g. akka-streams) provide a way to construct a pipeline of data in which, when the consumer of the data is slow, the producer will slow down automatically as well, and this would work across any kind of underlying transport, be it HTTP, WebSockets, raw TCP connections or even in-process messaging.
For example, consider a simple WebSocket connection, where the client sends a continuous stream of information (e.g. data from some sensor), and the server writes this data to some database. Now suppose that the database on the server side becomes slow for some reason (networking problems, disk overload, whatever). The server now can't keep up with the data the client sends, that is, it cannot save it to the database in time before the new piece of data arrives. If you're using a reactive streams implementation throughout this pipeline, the server will signal to the client automatically that it cannot process more data, and the client will automatically tweak its rate of producing in order not to overload the server.
Naturally, this can be done without any Reactive Streams implementation, e.g. by manually controlling acknowledgements. However, like with many other libraries, Reactive Streams implementations solve this problem for you. They also provide an easy way to define such pipelines, and usually they have interfaces for various external systems like databases. In particular, such libraries may implement backpressure on the lowest level, down to to the TCP connection, which may be hard to do manually.
As for Reactive Streams itself, it is just a description of an API which can be implemented by a library, which defines common terms and behavior and allows such libraries to be interchangeable or to interact easily, e.g. you can connect an akka-streams pipeline to a Monix pipeline using the interfaces from the specification, and the combined pipeline will work seamlessly and supporting all of the backpressure features of Reacive Streams.

What is a good ZeroMQ / nanomsg architecture for a server that sends data to clients that can connect / disconnect?

I am trying to create a network architecture which has a single server and multiple clients. The clients can connect and disconnect at any time so they need to announce their existence or shut-down to the server. The server must be able to send data to any particular client.
What is the best scalability protocols/architecture to use for this?
Currently I use a REQ/REP so that clients can 'login' and 'logout', and a SURVEY socket so that the server can send data to all clients. The message sent has an ID for the particular client it wants to process the message.
Is this good, or is there something better?
Sounds more like you need publisher subscriber. With both 0MQ and nanomsg you don't need to do anything in particular to manage connection / disconnection, the library does that for you.
However if you want more sophisticated message management (such as caching outgoing messages just in case another client chooses to connect) then you will have to manage that yourself. You might use a single push pull from the clients for them to announce their presence (they'd send a message saying who they were), followed by more push pulls from the server to each of the clients to send the messages from the cache that you have. Fiddly, but still a whole lot easier than programming up with raw sockets.
Using req rep can be problematic - if either end crashes or unexpectedly disconnects the other can be left in a stalled, unrecoverable state.
Sorry, in real world, there is no "One Size Fits All"
There are many further aspects, that influence architecture - The Bigger Picture.
While both Martin SUSTRIK's cool kids -- ZeroMQ & nanomsg -- have made a gread help in providing excellend bases + LEGO-type building blocks of Scaleable Formal Communication Patterns, they are only the beginning and saying that REQ/REP or SURVEY Behavioural Primitives ( great innovation, nevertheless still a building block ) are The Architecture would get upset almost all architects and evangelists.
The original question is important, however you already have gotten a first proposal to get it administratively closed as some people with "wider wingspan" feel your question is "too wide" or "opinion"-oriented instead an MCVE code-example driven ( ... yes, StackOverflow life is sometimes fast and cruel ).
So, without any further details available, my recommendation would be to check recent versions of PUB/SUB ( which can and do filter on PUB-side ( not on the SUB as was the design in the early ZeroMQ versions, once already xmited / delivered zillions of bytes round the world to just realise on the globally distributed peers level that no-one has yet SUB-ed to receive anything of that ) instead of the mentioned SURVEY.
Without any context it is nonsense to seriously judge, the less to improve what you try to design and implement.
I would do a poor service I were trying to do so.
The best next step?
What I can do for you right now is to direct you to see a bigger picture on this subject >>> with more arguments, a simple signalling-plane / messaging-plane illustration and a direct link to a must-read book from Pieter HINTJENS.

WebSocket/REST: Client connections?

I understand the main principles behind both. I have however a thought which I can't answer.
Benchmarks show that WebSockets can serve more messages as this website shows: http://blog.arungupta.me/rest-vs-websocket-comparison-benchmarks/
This makes sense as it states the connections do not have to be closed and reopened, also the http headers etc.
My question is, what if the connections are always from different clients all the time (and perhaps maybe some from the same client). The benchmark suggests it's the same clients connecting from what I understand, which would make sense keeping a constant connection.
If a user only does a request every minute or so, would it not be beneficial for the communication to run over REST instead of WebSockets as the server frees up sockets and can handle a larger crowd as to speak?
To fix the issue of REST you would go by vertical scaling, and WebSockets would be horizontal?
Doe this make sense or am I out of it?
This is my experience so far, I am happy to discuss my conclusions about using WebSockets in big applications approached with CQRS:
Real Time Apps
Are you creating a financial application, game, chat or whatever kind of application that needs low latency, frequent, bidirectional communication? Go with WebSockets:
Well supported.
Standard.
You can use either publisher/subscriber model or request/response model (by creating a correlationId with each request and subscribing once to it).
Small size apps
Do you need push communication and/or pub/sub in your client and your application is not too big? Go with WebSockets. Probably there is no point in complicating things further.
Regular Apps with some degree of high load expected
If you do not need to send commands very fast, and you expect to do far more reads than writes, you should expose a REST API to perform CRUD (create, read, update, delete), specially C_UD.
Not all devices prefer WebSockets. For example, mobile devices may prefer to use REST, since maintaining a WebSocket connection may prevent the device from saving battery.
You expect an outcome, even if it is a time out. Even when you can do request/response in WebSockets using a correlationId, still the response is not guaranteed. When you send a command to the system, you need to know if the system has accepted it. Yes you can implement your own logic and achieve the same effect, but what I mean, is that an HTTP request has the semantics you need to send a command.
Does your application send commands very often? You should strive for chunky communication rather than chatty, so you should probably batch those change request.
You should then expose a WebSocket endpoint to subscribe to specific topics, and to perform low latency query-response, like filling autocomplete boxes, checking for unique items (eg: usernames) or any kind of search in your read model. Also to get notification on when a change request (write) was actually processed and completed.
What I am doing in a pet project, is to place the WebSocket endpoint in the read model, then on connection the server gives a connectionID to the client via WebSocket. When the client performs an operation via REST, includes an optional parameter that indicates "when done, notify me through this connectionID". The REST server returns saying if the command was sent correctly to a service bus. A queue consumer processes the command, and when done (well or wrong), if the command had notification request, another message is placed in a "web notification queue" indicating the outcome of the command and the connectionID to be notified. The read model is subscribed to this queue, gets messessages and forward them to the appropriate WebSocket connection.
However, if your REST API is going to be consumed by non-browser clients, you may want to offer a way to check of the completion of a command using the async REST approach: https://www.adayinthelifeof.nl/2011/06/02/asynchronous-operations-in-rest/
I know, that is quite appealing to have an low latency UP channel available to send commands, but if you do, your overall architecture gets messed up. For example, if you are using a CQRS architecture, where is your WebSocket endpoint? in the read model or in the write model?
If you place it on the read model, then you can easy access to your read DB to answer fast search queries, but then you have to couple somehow the logic to process commands, being the read model the responsible of send the commands to the write model and notify if it is unable to do so.
If you place it on the write model, then you have it easy to place commands, but then you need access to your read model and read DB if you want to answer search queries through the WebSocket.
By considering WebSockets part of your read model and leaving command processing to the REST interface, you keep your loose coupling between your read model and your write model.