We want to build a workflow which contains the steps below, in that order:
1. Execute some synchronous activities.
2. Trigger an external operation via a Kafka event.
3. Listen to the Kafka events for the result of the operation.
4. Execute some other activities based on the result.
Kafka may contain events not related to this workflow, so we need a separate workflow to filter the events for that particular workflow.
Using Cadence, I'm planning to split it into two workflows:
Workflow1: 1 -> 2 -> wait for signal -> 4
Workflow2: 3 -> call workflow1.signal
Is it possible to wait for a signal in workflow1 without actually blocking the thread, so that the thread can process another workflow in the meantime?
I think there is some misunderstanding of how Temporal/Cadence works. There is no requirement to avoid blocking a thread for other workflows to be able to make progress; a worker instance has no problem dealing with such a situation.
So I would recommend blocking the thread in workflow1 to wait for the signal, as it is the simplest way to satisfy your business requirements.
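For illustration, here is a minimal sketch of that pattern with the Cadence Java client. The interface, workflow method, and signal names (OrderWorkflow, processOrder, reportResult) are made up for the example:

```java
import com.uber.cadence.workflow.SignalMethod;
import com.uber.cadence.workflow.Workflow;
import com.uber.cadence.workflow.WorkflowMethod;

public interface OrderWorkflow {
    @WorkflowMethod
    void processOrder();

    @SignalMethod
    void reportResult(String result); // called by the Kafka consumer (step 3)
}

public class OrderWorkflowImpl implements OrderWorkflow {
    private String result; // set by the signal handler

    @Override
    public void processOrder() {
        // Steps 1 and 2: synchronous activities and the activity that
        // produces the Kafka event (activity stubs omitted for brevity).

        // This "blocks" only this workflow instance; the worker thread is
        // released and can make progress on other workflows in the meantime.
        Workflow.await(() -> result != null);

        // Step 4: execute the remaining activities based on the result.
    }

    @Override
    public void reportResult(String result) {
        this.result = result;
    }
}
```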
As a side note, I don't understand why you need a second workflow. There is no need for a workflow to filter Kafka events; you can do that directly in a Kafka consumer that signals the first workflow.
I have some experience writing Kafka/Kinesis consumers (not with Cadence yet, but I plan to use it soon). My feeling is that you only need one consumer thread blocked and waiting for new events from the Kafka stream, and this consumer can live anywhere as long as it can talk to your Cadence system to send a signal to a workflow. If each Kafka message (after filtering out the unrelated ones) can be designed to contain all the information the consumer needs to decide which workflow to signal, it will be very simple. If you have no control over what is in the message (it sounds like you have an existing stream), it is a little tricky: your consumer may need to look up which workflow to signal based on some other identifier in the message.
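As a rough sketch (assuming the OrderWorkflow interface from the sketch above, that the message key carries the workflow id, and made-up topic and domain names; the exact WorkflowClient construction varies with the cadence-client version):

```java
import com.uber.cadence.client.WorkflowClient;
import com.uber.cadence.client.WorkflowClientOptions;
import com.uber.cadence.serviceclient.ClientOptions;
import com.uber.cadence.serviceclient.WorkflowServiceTChannel;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ResultSignaler {
    public static void main(String[] args) {
        WorkflowClient client = WorkflowClient.newInstance(
                new WorkflowServiceTChannel(ClientOptions.defaultInstance()),
                WorkflowClientOptions.newBuilder().setDomain("my-domain").build());

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "workflow-signaler");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("operation-results"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    if (record.key() == null) continue; // filter out unrelated events
                    // Signal the waiting workflow; the key is assumed to be the workflow id.
                    OrderWorkflow wf = client.newWorkflowStub(OrderWorkflow.class, record.key());
                    wf.reportResult(record.value());
                }
            }
        }
    }
}
```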
Related
I have a job processing system where each job contains thousands of individual tasks that require different strategies to complete; the individual tasks make up the whole job. If all tasks have been completed, the job is marked as successfully completed and further steps are taken. If any task fails, the job must be marked as failed and other steps are taken. If the job times out, it must likewise be marked as failed and other steps are taken.
Once all of the results for a job have been received, the next job can be fetched. The next job shouldn't be fetched while a job is currently being processed.
Here is what the flow looks like:
The Job Polling Verticle publishes a job to the event bus, and the Job Processing Verticle publishes each task to the event bus. When the job strategy completes, it publishes the task result to the event bus.
The issue is that I don't know the right way to determine when all tasks have been completed in this model. All verticles are stateless, the Job Processing Verticle doesn't await any futures, and even if the Job Results Verticle were stateful, it wouldn't know how many results to expect.
The only way I can think of to do this would be to have a global stateful object, but I don't think this is good design.
Additionally, I need to know when a job has timed out, that is, it has run longer than it should; I then need to consider it failed, log it, and move on.
I could do this with the global state, but again I don't think that's the right solution.
Does this verticle pattern make sense for what I'm trying to do?
First, let me try to address your questions. Then I'll try to explain what problems this design has.
The issue is that I don't know the right way to determine when all tasks have been completed in this model. All verticles are stateless, the Job Processing Verticle doesn't await any futures, and even if the Job Results Verticle were stateful, it wouldn't know how many results to expect.
The solution could be a reference-counting verticle. Each worker should emit a start message on the event bus with the jobId when it starts, and an end message with the jobId when it completes. Even if you have fan-out (the cases where you don't know how many workers there are), the counting verticle will know. In your diagram, the "Job Post Processing Verticle" is a good candidate for this: it can maintain a counter and start the next job only when the counter reaches zero. That also avoids actually sharing a memory reference (see the sketch below).
Additionally, I need to know when a job has timed out, that is, it has run longer than it should; I then need to consider it failed, log it, and move on.
In the same verticle you can start a timer every time you get a new start message. If you get the end message, cancel the timer; otherwise, when the timer fires, cancel the current job and move on.
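Here's a minimal Vert.x sketch of that counting-plus-timer verticle. The event bus addresses ("task.start", "task.end", "job.next", "job.failed") and the timeout value are made up for the example:

```java
import io.vertx.core.AbstractVerticle;

public class JobTrackerVerticle extends AbstractVerticle {
    private int inFlight = 0;
    private long timerId = -1;
    private static final long JOB_TIMEOUT_MS = 60_000; // illustrative timeout

    @Override
    public void start() {
        vertx.eventBus().consumer("task.start", msg -> {
            inFlight++;
            resetTimer();
        });
        vertx.eventBus().consumer("task.end", msg -> {
            inFlight--;
            if (inFlight == 0) {
                if (timerId != -1) vertx.cancelTimer(timerId);
                // All tasks are done: tell the polling verticle to fetch the next job.
                vertx.eventBus().send("job.next", "fetch");
            } else {
                resetTimer(); // still waiting on other tasks; restart the watchdog
            }
        });
    }

    private void resetTimer() {
        if (timerId != -1) vertx.cancelTimer(timerId);
        timerId = vertx.setTimer(JOB_TIMEOUT_MS, id -> {
            // No start/end activity within the window: treat the job as timed out.
            vertx.eventBus().send("job.failed", "timeout");
        });
    }
}
```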
Now, this solution will work, but the design has two main flaws. The first is that you seem to maintain all your flow in memory: if your application crashes, all progress is lost, and it's not clear how you record it. Polling a Jobs table in the DB might actually be better, since your job execution is sequential anyway.
The second is that all those timeouts and the reference counting are a homemade implementation of structured concurrency. Maybe you should take a look at something like Kotlin coroutines, as it will handle many of these problems for you.
I have the following use case:
Assume you have two microservices, AccountManagement and ActivityReporting, that process event U.
When a user registers, an event U containing the user information will be published into a broker for the two microservices to process.
The AccountManagement and ActivityReporting microservices are each replicated across two instances for performance and scalability reasons.
Each microservice instance has a consumer listening on the broker topic. A topic is used so that both AccountManagement and ActivityReporting can process U concurrently.
However, I want only one instance of AccountManagement to process event U, and one instance of ActivityReporting to process event U.
Please share your experience implementing a consume-once-per-application-group broker system, as this would effectively solve the problem.
If all your consumer listeners, even those from different instances, have the same group.id property, then only one of them will receive each message. You need to set this property when you initialise the consumer. So in your case you will need one group.id for AccountManagement and another for ActivityReporting.
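For example, a minimal consumer setup (topic name and servers are placeholders). Both AccountManagement instances share one group.id, so each event U is delivered to only one of them, while ActivityReporting's group still receives its own copy:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AccountManagementConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Same value in every AccountManagement instance; use a different
        // value (e.g. "activity-reporting") in the ActivityReporting service.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "account-management");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("user-registrations"));
        // poll(...) loop omitted; only one member of each group gets a given record.
    }
}
```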
I would recommend Cadence Workflow, which is a much more powerful solution for microservice orchestration. It offers a lot of advantages over using queues for your use case:
Built-in exponential retries with an unlimited expiration interval (see the sketch below).
Failure handling. For example, it allows executing a task that notifies another service if both updates couldn't succeed within a configured interval.
Support for long-running heartbeating operations.
Ability to implement complex task dependencies. For example, chaining of calls, or compensation logic in case of unrecoverable failures (SAGA).
Gives complete visibility into the current state of the update. For example, when using queues all you know is whether there are some messages in a queue, and you need an additional DB to track the overall progress. With Cadence every event is recorded.
Ability to cancel an update in flight.
See the presentation that goes over the Cadence programming model.
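As an example of the retry support, a sketch with the Cadence Java client; the activity interface name (AccountActivities) and the timeout/backoff values are made up:

```java
import com.uber.cadence.activity.ActivityOptions;
import com.uber.cadence.common.RetryOptions;
import com.uber.cadence.workflow.Workflow;
import java.time.Duration;

// Inside a workflow implementation: an activity stub whose invocations are
// retried with exponential backoff, managed durably by the Cadence service.
public class AccountWorkflowImpl /* implements some @WorkflowMethod interface */ {
    private final AccountActivities activities = Workflow.newActivityStub(
            AccountActivities.class,
            new ActivityOptions.Builder()
                    .setScheduleToCloseTimeout(Duration.ofHours(1))
                    .setRetryOptions(new RetryOptions.Builder()
                            .setInitialInterval(Duration.ofSeconds(1))
                            .setBackoffCoefficient(2.0)
                            .setMaximumInterval(Duration.ofMinutes(1))
                            .build())
                    .build());
}
```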
For a test, I created a new function app. I added two functions: one was an HTTP trigger that, when invoked, pushed 500 messages to a queue; the other was a queue trigger to read the messages. The queue trigger function code was set up to read a message and randomly sleep from 1 to 30 seconds. This was intended to simulate longer-running tasks.
I invoked the HTTP trigger to create the messages, then watched the queue fill up (messages were processed by the other trigger). I also wired up App Insights to this function app, but I did not see it scale beyond 1 server.
Do Azure Functions scale up solely on the number of messages in the queue?
Also, I implemented these functions in PowerShell.
If you're running in the Azure Functions consumption plan, we monitor both the length and the throughput of your queue to determine whether additional VM resources are needed.
Note that a single function app instance can process multiple queue messages concurrently without needing to scale across multiple VMs. So if all 500 messages can be consumed relatively quickly (again, in the consumption plan), then it's possible that you won't scale at all.
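For reference, per-instance queue concurrency is tunable in host.json. A minimal sketch using the v2 schema; the values shown are my understanding of the defaults, so treat them as illustrative:

```json
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 16,
      "newBatchThreshold": 8
    }
  }
}
```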
The exact algorithm for scaling isn't published (it's subject to lots of tweaking), but generally speaking you can expect the system to automatically scale you out if messages are getting added to the queue faster than your functions can process them. Your app will also scale out if the latency of the first message in the queue is continuously increasing (meaning, messages are sitting idle and not getting processed). The time between VMs getting added is usually in the tens of seconds.
There are some thresholds based on queue count as well. For example, the system tries to ensure that there is at least 1 VM for every 1K queue messages, but usually the scale decisions are based on message throughput as I described earlier.
I think @Chris Gillum put it well; it's hard for us to push the limits of the server to the point that things will start to scale.
Some other options available are:
Use Durable Functions and scale with threading:
https://learn.microsoft.com/en-us/azure/azure-functions/durable-functions-cloud-backup
Another method could be to use Event Hubs, which are designed for massive scale. Instead of queues, have Function #1 publish an event, and have Function #2 subscribe via an Event Hub trigger. Adding Stream Analytics could also be an option to expand on capabilities further if needed.
I am wondering if there is some way to delay an Akka message from processing?
My use case: for every request, I have a small amount of work that I need to do immediately, and then I need to do additional work two hours later.
Is there any easy way to delay the processing of a message in Akka? I know I could probably set up an external distributed queue such as ActiveMQ or RabbitMQ, which probably has this feature, but I'd rather not.
I know I would need to make the mailbox durable so it can survive restarts or crashes. We already have Mongo set up, so I'd probably be using the MongoBasedMailbox for durability.
Temporal Workflow is capable of supporting your use case with minimal effort. You can think of it as a Durable Actor platform, where actor state, including threads and local variables, is preserved across process restarts.
Temporal offers a lot of other features for task processing:
Built-in exponential retries with an unlimited expiration interval.
Failure handling. For example, it allows executing a task that notifies another service if both updates couldn't succeed within a configured interval.
Support for long-running heartbeating operations.
Ability to implement complex task dependencies. For example, chaining of calls, or compensation logic in case of unrecoverable failures (SAGA).
Gives complete visibility into the current state of the update. For example, when using queues all you know is whether there are some messages in a queue, and you need an additional DB to track the overall progress. With Temporal every event is recorded.
Ability to cancel an update in flight.
Throttling of requests
See the presentation that goes over the Temporal programming model. It covers Cadence, the predecessor of Temporal.
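For the two-hour delay specifically, a minimal sketch with the Temporal Java SDK; the interface and method names are made up. Workflow.sleep is a durable timer persisted by the Temporal service, so the wait survives worker restarts without a durable mailbox:

```java
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;
import java.time.Duration;

@WorkflowInterface
public interface RequestWorkflow {
    @WorkflowMethod
    void handleRequest(String requestId);
}

public class RequestWorkflowImpl implements RequestWorkflow {
    @Override
    public void handleRequest(String requestId) {
        // 1. Do the small amount of immediate work (activity stubs omitted).

        // 2. Durable timer: recorded by the service, not an in-memory sleep,
        //    and it holds no worker thread hostage for two hours.
        Workflow.sleep(Duration.ofHours(2));

        // 3. Do the additional work two hours later.
    }
}
```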
It's not ideal, but the Akka Camel Quartz scheduler would do the trick. It's more heavyweight than the built-in ActorSystem scheduler, and be aware that Quartz has its own issues.
You could still use the normal Akka scheduler; you will just have to keep state via actor persistence to avoid losing the job if the server restarts.
I have recently used PersistentFsmActor, which will keep the state of the actor persisted.
I'm not sure you have to use an FSM (finite state machine) in your case, so you could basically just use a PersistentActor to save the time the job was inserted, and start a scheduler for that time. This way, even if the server restarts, the actor will start up and create a new scheduled job, using the persisted data to calculate the time left before running it.
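A rough sketch of that idea with Akka classic persistence in Java. A recent Akka version is assumed (for the java.time.Duration scheduler overload), and the persistence id, message types, and two-hour delay are illustrative:

```java
import akka.actor.ActorRef;
import akka.persistence.AbstractPersistentActor;
import akka.persistence.RecoveryCompleted;
import java.io.Serializable;
import java.time.Duration;
import java.time.Instant;

public class DelayedJobActor extends AbstractPersistentActor {

    // Persisted event recording when the job was accepted.
    public static class JobAccepted implements Serializable {
        public final Instant acceptedAt;
        public JobAccepted(Instant acceptedAt) { this.acceptedAt = acceptedAt; }
    }

    private Instant acceptedAt;

    @Override
    public String persistenceId() { return "delayed-job-actor"; }

    @Override
    public Receive createReceiveRecover() {
        return receiveBuilder()
                .match(JobAccepted.class, evt -> acceptedAt = evt.acceptedAt)
                .match(RecoveryCompleted.class, r -> {
                    // After a restart, re-schedule using the remaining time.
                    if (acceptedAt != null) scheduleFollowUp();
                })
                .build();
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
                .matchEquals("newJob", msg ->
                        persist(new JobAccepted(Instant.now()), evt -> {
                            acceptedAt = evt.acceptedAt;
                            // ... do the small amount of immediate work here ...
                            scheduleFollowUp();
                        }))
                .matchEquals("followUp", msg -> {
                    // ... do the additional work two hours later ...
                })
                .build();
    }

    private void scheduleFollowUp() {
        Duration remaining = Duration.ofHours(2)
                .minus(Duration.between(acceptedAt, Instant.now()));
        if (remaining.isNegative()) remaining = Duration.ZERO;
        getContext().getSystem().scheduler().scheduleOnce(
                remaining, getSelf(), "followUp",
                getContext().getDispatcher(), ActorRef.noSender());
    }
}
```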
So, I built this small example of a ZeroMQ pipeline architecture because I'll end up having to do something similar very soon, and I'm trying to grasp the pipeline concept the right way.
https://gist.github.com/2765708
Right now, this is completely asynchronous. The controller dispatches a batch of tasks to various workers, which in turn send a message to the sink. The controller and sink are fixed parts of my architecture, while workers are dynamic. That's perfect.
However, I would like to know when the workers have finished working on all their tasks. In this example I know the number of messages, but that won't be true in real-life situations: I might have 100 messages or 10,000. So, how can the sink or the controller know when the workers have finished working on their tasks? I have to perform some actions that depend on the completion of the jobs sent to workers.
I wanted to expand on @bjlaub's answer. It started as a comment, but I was typing too much. I agree with the concept of acknowledgment, but believe it can originate in multiple places.
There are multiple approaches to this communication and it all depends on the behavior you are after in the system.
First, you can send the messages either from the workers as they finish each task, or from the sink as it receives each result. Right now I am not addressing the type of socket, only the act of communicating. I believe it is much more efficient to send from the sink, as you would only need one connection back to the controller instead of one per worker. The sink does not need to know how many total tasks there are; it simply fires off a message after each result it receives. The controller can determine how many to expect, since it was the submission point and knows when it has exhausted its submissions (the count).
Now, regardless of whether the message is sent from the worker or the sink, you can use different socket types. If you want the controller to block completely until all work is done, you can use PUSH/PULL until it receives X messages (the message content can be anything; it's just a trigger).
This may be limiting if the controller wants to do other work while these tasks are running. If so, you could use PUB/SUB and let the controller subscribe to task-completion notifications, asynchronously maintaining a count until the total has been satisfied.
And finally, maybe you want the controller to ask the sink for a status whenever you deem fit. You can use a REQ/REP pattern for the controller to ask the sink, on demand, how many results it has received.
I'm sure one of these patterns will fit your specific needs.
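For instance, a minimal JeroMQ sketch of the blocking PUSH/PULL variant with sink-originated acks; the ports, addresses, and task count are made up:

```java
import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class Controller {
    public static void main(String[] args) {
        try (ZContext ctx = new ZContext()) {
            ZMQ.Socket tasks = ctx.createSocket(SocketType.PUSH);
            tasks.bind("tcp://*:5557");   // workers connect here with PULL sockets

            ZMQ.Socket acks = ctx.createSocket(SocketType.PULL);
            acks.bind("tcp://*:5559");    // the sink connects here with a PUSH socket

            int total = 100;              // only the controller needs to know this
            for (int i = 0; i < total; i++) {
                tasks.send("task-" + i);
            }

            // The sink PUSHes one small message per result it receives;
            // block until one ack per dispatched task has arrived.
            for (int received = 0; received < total; received++) {
                acks.recvStr();
            }
            // All tasks are done: safe to run the dependent actions now.
        }
    }
}
```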
One idea (disclaimer: I have very little experience with 0MQ!):
Set up an "acknowledgment" pipeline in the reverse direction. Since the controller presumably knows how many tasks it has dispatched to the workers (e.g. the number of times it called send), it can use a PULL socket to receive a small message (an integer, for example) from each worker indicating completion of a task. The worker process dispatches its completed result to the sink and at the same time sends the acknowledgment back to the controller. Once the controller has collected the right number of acknowledgments, it can do whatever post-processing is necessary before farming out the next set of work.
You could also push this downstream to the sink, but you would need to notify the sink of the total number of work units to expect before farming them out to the workers.