Questions on synchronous ZeroMQ pipeline architecture - queue

So, i built this small example of a ZeroMQ pipeline architecture because i'll end up having to do something similar very soon and i'm trying to grasp the pipeline concept the right way.
https://gist.github.com/2765708
Right now, this is completely asynchronous. The controller dispatches a batch of tasks to various workers, which in their turn, send a message to the sink. The controller and sink are fixed parts of my architecture, while workers are dynamic. That's perfect.
However, i would like to know when the workers have finished working on all their tasks. In that example, i do know the amount of messages, but that won't be true on real-life situations. I might have 100 messages or 10,000. So, how can the sink or the controller know when the workers have finished working on their tasks? I have to perform some actions that depend on the conclusion of the jobs sent to workers.

I wanted to expand on #bjlaub's answer. It started as a comment but I was typing too much. I agree with the concept of acknowledgment, but believe it can originate in multiple places.
There are multiple approaches to this communication and it all depends on the behavior you are after in the system.
First, you can either send out messages from the workers as they finish each task, or from the sink as it receives each task. Right now I am not addressing the type of socket, only the act of communicating. I believe it is much more efficient to send it from the sink as you would only need one connection back to the controller instead of one for each worker. The sink does not need to know how many total tasks there are. Only that it is firing off a message after each result it receives. The controller can determine how many to expect since it was the submission point and new when it had exhausted its submission (the count).
Now regardless of whether you have the message sent from the worker or the sink, you can use different socket types. If you want the controller to completely block until all work is done, then you can have it be a push/pull until it receives X messages (message content can be anything. Its just a trigger).
This may be limiting if the controller wants to be able to do other work while these tasks are happening. If so, you could maybe use pub/sub, and let the controller subscribe to being notified as tasks complete, and asynchronously maintain a count until the total has been satisfied.
And finally, maybe you have the situation where you want the controller to ask the sink for a status when you deem fit. You can have a req/rep pattern for the controller to ask the sink how many requests it has received on demand.
I'm sure one of these patterns will fit your specific needs.

One idea (disclaimer: I have very little experience w/ 0MQ!):
Setup an "acknowledgment" pipeline in the reverse direction. Since the controller presumably knows how many tasks it has dispatched to the workers (e.g. the number of times it called send), it can use a PULL socket to receive a small message (an integer for example) from each worker indicating the completion of the task. The worker process dispatches its completed result to the sink, and at the same time sends the acknowledgement back to the controller. Once the controller collects the right number of acknowledgements, it can do whatever post-processing is necessary before farming out the next set of work.
You could also push this downstream to the sink, but you would need to notify the sink of the total number of work units to expect before farming them out to the workers.

Related

High Scalability Question: How to sync data across multiple microservices

I have the following use cases:
Assume you have two micro-services one AccountManagement and ActivityReporting that processes event U.
When a user registers, event U containing the user information will published into a broker for the two micro-services to process.
AccountManagement, and ActivityReporting microservice are replicated across two instances each for performance and scalability reasons.
Each microservice instance has a consumer listening on the broker topic. The choice of topic is so that both AccountManagement, and ActivityReporting can process U concurrently.
However, I want only one instance of AccountManagement to process event U, and one instance of ActivityReporting to process event U.
Please share your experience implementing a Consume Once per Application Group, broker system.
As this would effectively solve this problem.
If all your consumer listeners even from different instances have the same group.id property then only one of them will receive the message. You need to set this property when you initialise the consumer. So in your case you will need one group.id for AccountManagement and another for ActivityReporting.
I would recommend Cadence Workflow which is much more powerful solution for microservice orchestration.
It offers a lot of advantages over using queues for your use case.
Built it exponential retries with unlimited expiration interval
Failure handling. For example it allows to execute a task that notifies another service if both updates couldn't succeed during a configured interval.
Support for long running heartbeating operations
Ability to implement complex task dependencies. For example to implement chaining of calls or compensation logic in case of unrecoverble failures (SAGA)
Gives complete visibility into current state of the update. For example when using queues all you know if there are some messages in a queue and you need additional DB to track the overall progress. With Cadence every event is recorded.
Ability to cancel an update in flight.
See the presentation that goes over Cadence programming model.

Why doesn't my Azure Function scale up?

For a test, I created a new function app. I added two functions, one was an http trigger that when invoked, pushed 500 messages to a queue. The other, a queue trigger to read the messages. The queue trigger function code, was setup to read a message and randomly sleep from 1 to 30 seconds. This was intended to simulate longer running tasks.
I invoked the http trigger to create the messages, then watched the que fill up (messages were processed by the other trigger). I also wired up app insights to this function app, but I did not see is scale beyond 1 server.
Do Azure functions scale up soley on the # of messages in the que?
Also, I implemented these functions in Powershell.
If you're running in the Azure Functions consumption plan, we monitor both the length and the throughput of your queue to determine whether additional VM resources are needed.
Note that a single function app instance can process multiple queue messages concurrently without needing to scale across multiple VMs. So if all 500 messages can be consumed relatively quickly (again, in the consumption plan), then it's possible that you won't scale at all.
The exact algorithm for scaling isn't published (it's subject to lots of tweaking), but generally speaking you can expect the system to automatically scale you out if messages are getting added to the queue faster than your functions can process them. Your app will also scale out if the latency of the first message in the queue is continuously increasing (meaning, messages are sitting idle and not getting processed). The time between VMs getting added is usually in the tens of seconds.
There are some thresholds based on queue count as well. For example, the system tries to ensure that there is at least 1 VM for every 1K queue messages, but usually the scale decisions are based on message throughput as I described earlier.
I think #Chris Gillum put it well, it's hard for us to push the limits of the server to the point that things will start to scale.
Some other options available are:
Use durable functions and scale with Threading:
https://learn.microsoft.com/en-us/azure/azure-functions/durable-functions-cloud-backup
Another method could be to use Event Hubs which are designed for massive scale. Instead of queues, have Function #1 trigger an Event, and your Function #2 subscribed to that Event Hub trigger. Adding Streaming Analytics, could also be an option to more fully expand on capabilities if needed.

Implementing sagas with Kafka

I am using Kafka for Event Sourcing and I am interested in implementing sagas using Kafka.
Any best practices on how to do this? The Commander pattern mentioned here seems close to the architecture I am trying to build but sagas are not mentioned anywhere in the presentation.
This talk from this year's DDD eXchange is the best resource I came across wrt Process Manager/Saga pattern in event-driven/CQRS systems:
https://skillsmatter.com/skillscasts/9853-long-running-processes-in-ddd
(requires registering for a free account to view)
The demo shown there lives on github: https://github.com/flowing/flowing-retail
I've given it a spin and I quite like it. I do recommend watching the video first to set the stage.
Although the approach shown is message-bus agnostic, the demo uses Kafka for the Process Manager to send commands to and listen to events from other bounded contexts. It does not use Kafka Streams but I don't see why it couldn't be plugged into a Kafka Streams topology and become part of the broader architecture like the one depicted in the Commander presentation you referenced.
I hope to investigate this further for our own needs, so please feel free to start a thread on the Kafka users mailing list, that's a good place to collaborate on such patterns.
Hope that helps :-)
I would like to add something here about sagas and Kafka.
In general
In general Kafka is a tad different than a normal queue. It's especially good in scaling. And this actually can cause some complications.
One of the means to accomplish scaling, Kafka uses partitioning of the data stream. Data is placed in partitions, which can be consumed at its own rate, independent of the other partitions of the same topic. Here is some info on it: how-choose-number-topics-partitions-kafka-cluster. I'll come back on why this is important.
The most common ways to ensure the order within Kafka are:
Use 1 partition for the topic
Use a partition message key to "assign" the message to a topic
In both scenarios your chronologically dependent messages need to stream through the same topic.
Also, as #pranjal thakur points out, make sure the delivery method is set to "exactly once", which has a performance impact but ensures you will not receive the messages multiple times.
The caveat
Now, here's the caveat: When changing the amount of partitions the message distribution over the partitions (when using a key) will be changed as well.
In normal conditions this can be handled easily. But if you have a high traffic situation, the migration toward a different number of partitions can result in a moment in time in which a saga-"flow" is handled over multiple partitions and the order is not guaranteed at that point.
It's up to you whether this will be an issue in your scenario.
Here are some questions you can ask to determine if this applies to your system:
What will happen if you somehow need to migrate/copy data to a new system, using Kafka?(high traffic scenario)
Can you send your data to 1 topic?
What will happen after a temporary outage of your saga service? (low availability scenario/high traffic scenario)
What will happen when you need to replay a bunch of messages?(high traffic scenario)
What will happen if we need to increase the partitions?(high traffic scenario/outage & recovery scenario)
The alternative
If you're thinking of setting up a saga, based on steps, like a state machine, I would challenge you to rethink your design a bit.
I'll give an example:
Lets consider a booking-a-hotel-room process:
Simplified, it might consist of the following steps:
Handle room reserved (incoming event)
Handle room payed (incoming event)
Send acknowledgement of the booking (after payed and some processing)
Now, if your saga is not able to handle the payment if the reservation hasn't come in yet, then you are relying on the order of events.
In this case you should ask yourself: when will this break?
If you conclude you want to avoid the chronological dependency; consider a system without a saga, or a saga which does not depend on the order of events - i.e.: accepting all messages, even when it's not their turn yet in the process.
Some examples:
aggregators
Modeled as business process: parallel gateways (parallel process flows)
Do note in such a setup it is even more crucial that every action has got an implemented compensating action (rollback action).
I know this is often hard to accomplish; but, if you start small, you might start to like it :-)

MSMQ as a job queue

I am trying to implement job queue with MSMQ to save up some time on me implementing it in SQL. After reading around I realized MSMQ might not offer what I am after. Could you please advice me if my plan is realistic using MSMQ or recommend an alternative ?
I have number of processes picking up jobs from a queue (I might need to scale out in the future), once job is picked up processing follows, during this time job is locked to other processes by status, if needed job is chucked back (status changes again) to the queue for further processing, but physically the job still sits in the queue until completed.
MSMQ doesn't let me to keep the message in the queue while working on it, eg I can peek or read. Read takes message out of queue and peek doesn't allow changing the message (status).
Thank you
Using MSMQ as a datastore is probably bad as it's not designed for storage at all. Unless the queues are transactional the messages may not even get written to disk.
Certainly updating queue items in-situ is not supported for the reasons you state.
If you don't want a full blown relational DB you could use an in-memory cache of some kind, like memcached, or a cheap object db like raven.
Take a look at RabbitMQ, or many of the other messages queues. Most offer this functionality out of the box.
For example. RabbitMQ calls what you are describing, Work Queues. Multiple consumers can pull from the same queue and not pull the same item. Furthermore, if you use acknowledgements and the processing fails, the item is not removed from the queue.
.net examples:
https://www.rabbitmq.com/tutorials/tutorial-two-dotnet.html
EDIT: After using MSMQ myself, it would probably work very well for what you are doing, as far as I can tell. The key is to use transactions and multiple queues. For example, each status should have it's own queue. It's fairly safe to "move" messages from one queue to another since it occurs within a transaction. This moving of messages is essentially your change of status.
We also use the Message Extension byte array for storing message metadata, like status. This way we don't have to alter the actual message when moving it to another queue.
MSMQ and queues in general, require a different set of patterns than what most programmers are use to. Keep that in mind.
Perhaps, if you can give more information on why you need to peek for messages that are currently in process, there would be a way to handle that scenario with MSMQ. You could always add a database for additional tracking.

Message bus integration and resync of Bounded Contexts after downtime - Service Bus 1.0

I have just downloaded joliver eventstore and looking to wire up a service bus with Windows Service Bus 1.0 for an application separated across more than one Bounded Context process.
If a bounded context has been offline whilst events in other bounded contexts have been created (or may even be a new context that has been deployed), I can see the following sequence of events.
For an example ContextA, ContextB and ContextC, all connected using Service Bus 1.0 and each context with their own event store, they all share the same bus messaging backplane.
ContextC goes offline.
When ContextC comes back-up, other bounded contexts need to be notified of the events that need to be resent to the context that has just come back online. These events are replayed from each of the event stores.
My questions are:
The above scenario would apply to any event sourcing libraries, so is there any infrastructure code on top of this I can use, or do I have to roll my own?
With Windows Service Bus 1.0, how do I marry sequence numbers in my event store to sequence numbers on the Service Bus?
What is the best practice to detect and handle events that have already been received in a safe manner (protecting against message handlers failing)?
The above scenario would apply to any event sourcing libraries, so is there any infrastructure code on top of this I can use, or do I have to roll my own?
The notion of a Projection mechanism tied to the events is certainly common. Unfortunately, there are many many ways of handling how that might be done, depending on your stack, performance requirements and scale and many other factors.
As a result I'm not aware of a commoditized facility of this nature.
The GetEventStore store has an integrated Projection facility which looks extremely powerful and takes the need to build all this off the table. Before its existence, I'd have argued that one shouldnt even consider looking past the the SRPness of the JOES.
You havent said much about your actual stack other than mentioning Azure.
With Windows Service Bus, how do I marry sequence numbers in my event store to sequence numbers on the Service Bus?
You can use stream id + the commit sequence number the MessageId (and use that to ensure duplicates are removed by the bus). You will probably also include properties in the Message metadata.
What is the best practice to detect and handle events that have already been received in a safe manner (protecting against message handlers failing)?
If you're on Azure and considering ServiceBus then the Topics can be used to ensure at least once delivery (and you'll use the sessioning facility). Go watch the two hour deep dive ClemensV Subscribe video plus a few other episodes or you'll spent the same amount of time making mistakes)
To keep broadcast traffic down, if ContextC requests replays from ContextA and ContextB, is there any way for these replay messages to be sent only to ContextC? Or should I not worry about this?
Mu. You started off asking whether this stuff was a good idea but now seem to have baked in an assumption that it's the way to go.
Firstly, this infrastructure is a massive wheel to reinvent. Have you considered simply setting up a topic per BC and having anyone that needs to listen listen?
A key thing here is that you need to bear in mind the fact that just because you can think of cases where BCs need to consume each others events, that this central magic bus that's everywhere will deliver everything everywhere.
EDIT: Answers to your edited versions of questions 2+
With Windows Service Bus 1.0, how do I marry sequence numbers in my event store to sequence numbers on the Service Bus?
Your event store doesnt have a sequence number. It has a commit sequence number per aggregate. You'd typically use a sessioned topic and subscription. Then you need to choose whether you want a global ordering (use a single session id) or per aggregate ordering (use the stream id as the session id).
Once events are on a topic, they have a MessageSequenceNumber and the subscription (when sessioned) delivers (actually the subscriber recieves them) them in sequence.
What is the best practice to detect and handle events that have already been received in a safe manner (protecting against message handlers failing)?
This is built into the Service Bus (or any queueing mechanism). You don't mark the Message completed until it has been successfully processed. Any failure leads to Abandonment (which puts it back on the queue for reprocessing).
The subscriber taking a break, becoming disconnected or work backing up is naturally dealt with by the Topic.