Apache Kafka message communication between microservices

I have a problem that I want to solve using Kafka queues.
I need to process a request and return the result to the user.
As you can see in the picture, the REST service requests something from the Calculator service.
Both services have a Kafka consumer and a Kafka producer.
The REST service receives a request, produces a message on the toAdd queue, then keeps consuming the fromAdd queue until it receives a value.
The Calculator service keeps consuming the toAdd queue; when a message arrives, it sums the two values and produces a message on the fromAdd queue.
Sometimes the REST service receives old messages from the queue, or more than one message.
I found something about idempotent configuration, but I don't know how to implement it correctly.
Is this diagram the right way to communicate between two or more services using Kafka?
Can someone give an example?
Thanks.

Is this diagram the right way to communicate between two or more services using Kafka?
If you mean "Does it make sense to have two or more services communicate indirectly through Kafka?", then yes, it does.
Can someone give an example?
Here are some good pointers including examples:
Build Services on a Backbone of Events, Confluent blog, May 2017
Commander: Better Distributed Applications through CQRS, Event Sourcing, and Immutable Logs, by Bobby Calderwood, StrangeLoop, Sep 2016
Recorded talk
Reference implementation on GitHub

To answer your question: There is no problem with such communication.
Now referring back to other parts...
Keep in mind that this is asynchronous communication, so you should not keep the HTTP connection open and keep the user of that service waiting for the response. That is just not the way to go. You can solve this in many ways: for instance, you can use WebSockets, or you can send the reply to the user by email, SMS, or a Slack message.
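On the "old messages, or more than one message" symptom from the question: one common way to match replies to requests is to tag each request with a correlation ID and have the requester ignore any reply that carries a different ID. The sketch below is only an illustration of that idea. It uses in-memory queues as stand-ins for the toAdd and fromAdd topics rather than a Kafka client, and the field names (correlation_id, a, b, result) are assumptions, not something from the original post.

```python
import queue
import threading
import uuid

# In-memory stand-ins for the toAdd and fromAdd topics (not a Kafka client).
to_add = queue.Queue()
from_add = queue.Queue()

def calculator_service():
    """Consume one request from toAdd, add the values, and reply on fromAdd."""
    request = to_add.get()
    reply = {
        "correlation_id": request["correlation_id"],  # echo the ID back
        "result": request["a"] + request["b"],
    }
    from_add.put(reply)

def rest_request(a, b, timeout=1.0):
    """Send a request and return only the reply carrying our correlation ID."""
    correlation_id = str(uuid.uuid4())
    to_add.put({"correlation_id": correlation_id, "a": a, "b": b})
    while True:
        reply = from_add.get(timeout=timeout)  # raises queue.Empty on timeout
        if reply["correlation_id"] == correlation_id:
            return reply["result"]
        # Otherwise it is a stale or foreign reply: skip it.

# A leftover reply from an earlier request is already sitting in fromAdd.
from_add.put({"correlation_id": "stale-id", "result": 999})

worker = threading.Thread(target=calculator_service)
worker.start()
result = rest_request(2, 3)
worker.join()
print(result)
```

The stale reply is skipped and only the matching one is returned. With a real broker and several REST instances, you would also have to decide how each reply reaches the instance that sent the request; this sketch does not address that.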

Related

ExpressJS: expose Event Processing system as a REST Service API

I am looking for a way to expose an existing event processing system to the external world through a REST interface. The existing design uses RabbitMQ message queues: a publisher posts a message and then waits for the processed result on a separate queue. A message ID is used to match each result on the output queue to the original message.
Now I want to expose this to external consumers, but we don't want to expose our RabbitMQ endpoint, so I was wondering whether anyone has achieved something similar using ExpressJS. The diagram above shows the current thought process.
The main challenge is that some of this message processing can take more than a couple of minutes, so I am not sure how best to design such an API. Should I create a polling interface for the client, or is there a technology these days that removes the need for the client to poll to check whether the message has been processed and to fetch the result?
Can someone please suggest a good approach for this sort of requirement?
I finally ended up going the webhook way. Now, when the REST API service receives a request, the client must also provide a webhook URL; it is registered with the client's request, and the server calls it back when the results are available.
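A minimal sketch of the webhook bookkeeping described above, written in plain Python rather than ExpressJS. The class name JobRegistry, the post callable, and the example URL are all hypothetical; a real service would publish to RabbitMQ in submit and make an actual HTTP POST where post is called.

```python
import json
import uuid

class JobRegistry:
    """Tracks submitted jobs and the webhook URL to call when each finishes."""

    def __init__(self, post):
        self._post = post          # callable(url, body) that delivers the callback
        self._callbacks = {}       # job_id -> callback_url

    def submit(self, payload, callback_url):
        """Register a job and return its ID to the caller."""
        job_id = str(uuid.uuid4())
        self._callbacks[job_id] = callback_url
        # A real service would publish `payload` to the input queue here,
        # using job_id as the message ID.
        return job_id

    def on_result(self, job_id, result):
        """Called when a result with this message ID arrives on the output queue."""
        callback_url = self._callbacks.pop(job_id, None)
        if callback_url is None:
            return False           # unknown or already-delivered job
        self._post(callback_url, json.dumps({"job_id": job_id, "result": result}))
        return True

delivered = []
registry = JobRegistry(post=lambda url, body: delivered.append((url, body)))
job_id = registry.submit({"task": "resize"}, "https://client.example/hooks/done")
first = registry.on_result(job_id, "ok")
second = registry.on_result(job_id, "ok")   # a duplicate result is ignored
```

Removing the entry on first delivery means a redelivered result message does not trigger a second callback, which matters if the output queue can deliver a message more than once.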

What happens to subscribers when the Kafka service is down? Do they need to resubscribe to a specific topic when it restarts?

Currently I have to send events externally to a client that needs to subscribe to these events. I have an endpoint that the client calls (subscribe) that follows the Server-Sent Events specification. This opens an HTTP connection that is kept alive by the server, which sends "heartbeat" events.
The problem is that when this service needs to be redeployed, or it goes down, it is the client's responsibility to re-subscribe by calling this endpoint in order to receive the events in real time again.
I was wondering: if I switch to a technology like RabbitMQ or Kafka, can I solve this problem? In other words, I would like to remove the client's responsibility to re-subscribe if something goes wrong on the server side.
If you can attach articles/resources to your answer, that would be great.
With RabbitMQ, the reconnection feature depends on the client library. For example, the Java and .NET clients do provide this (check here).
With Kafka, I see there are configurations to support this. It's also worth reading the excellent recommendations from Kafka for surviving broker outages here.

Mixing communication methods for microservices

I am working on a project that will be a better version of an old project. We want it to be scalable so that it can handle high load, so we decided to go with microservices instead of a monolith. I then researched microservices: how they communicate, common design patterns, and so on. Since I want my services to be scalable, event-based communication made sense to me, so I decided to use Kafka for this purpose.
We have many more services in the system, but to simplify my question, let's say I have two types of services: work-node and master-node. I want both of them to be scalable. For now they communicate over Kafka.
My question: in one case I want to publish an event (produce a message on a topic) from master-node and receive that event (consume from the topic) on all work-nodes. In another case I need to send a message to a specific work-node. To cover the first case, all my work-nodes have different group IDs in Kafka, so when a message is published on a topic they all get it. I know that I cannot send a message to a specific consumer with Kafka. Since my nodes are scalable and their number can increase or decrease depending on the load, creating a topic for each node does not seem like a good idea. My first solution was to add the work-node ID to the message so that the other work-nodes can ignore it. It works, but I don't think it is a good solution. My second solution is to send an HTTP request when I need to reach a specific node, but I don't know whether mixing two communication methods is a good idea.
What do you think about this problem? Is there a better solution that I am missing, or is my whole design wrong?
Kafka is not an appropriate technology for the use case you describe. I would recommend using Cadence Workflow which natively supports routing tasks to specific nodes as well as dozens of other features that messaging systems lack.
Feel free to join the Cadence Workflow Slack channel if you have specific questions.
I think you should be able to. Consider the regular Kafka flow: you have some consumer groups subscribed to the topic, and the producer doesn't send a message to a specific partition unless you specify one.
Now think about a scenario in which you produce messages to specific partitions based on your own algorithm:
Message received from A
Some kind of algorithm, such as a hash code, always yields 0 for A
Message sent to partition 0
Consumer 1 is connected to partition 0
Only Consumer 1 gets the message coming from A
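The steps above could be sketched roughly as follows. This is plain Python, not a Kafka client: CRC32 is only a stand-in for whatever hash the producer uses, and the partition count and the fixed partition-to-consumer assignment are assumptions made for the example.

```python
import zlib

NUM_PARTITIONS = 3  # hypothetical partition count for the topic

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition deterministically (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Hypothetical static assignment: one consumer per partition.
assignment = {0: "consumer-1", 1: "consumer-2", 2: "consumer-3"}

def consumer_for(key: str) -> str:
    """Return the consumer that owns the partition this key maps to."""
    return assignment[partition_for(key)]

first = consumer_for("A")
again = consumer_for("A")   # the same key always lands on the same consumer
```

Note that this only routes to a specific consumer as long as the partition-to-consumer mapping stays fixed. If consumers share a group and the group rebalances, a partition can move to a different consumer.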

Pub/Sub and consumer aware publishing. Stop producing when nobody is subscribed

I'm trying to find a messaging system that supports the following use case.
Producer registers topic namespace
Client subscribes to topic
First client triggers notification on producer to start producing
New client with subscription to the same topic receives data (potentially conflated, similar to hot/cold observables in the RX world)
When the last client goes away (unsubscribes or crashes), notify the producer to stop producing to said topic
I am aware that, according to the pub/sub pattern, a producer is defined to be blissfully unaware of the existence of consumers, meaning that my use case simply does not fit the pub/sub paradigm.
So far I have looked into Kafka, Redis, NATS.io and Amazon SQS, but without much success. I've been thinking about a few possible ways to solve this, but haven't found anything that satisfies my needs yet.
One option that springs to mind for bullet 2) is to model a request/reply pattern, as detailed on the NATS page among others, so that the producer listens to clients. A client would then publish a 'subscribe' message into the system, which the producer would pick up on a wildcard subscription. This leaves one big problem, however: unsubscribing. Assuming the consumer stops as it should, publishing an unsubscribe message just like the subscribe one would work. But in the case of a crash or similar, this won't work.
I'd be grateful for any ideas, references or architectural patterns/best practices that satisfy the above.
I've been doing quite a bit of research over the past week but haven't come across any satisfying Q&A or articles. Either I'm approaching it entirely wrong, or there just doesn't seem to be much out there, which would surprise me because this appears to be a fairly common scenario that applies to many domains.
Thanks in advance
Chris
//edit
An actual simple use-case that I have at hand is stock quote distribution.
Quotes come from an external source
Subscribe to stock A quotes from the external system when the first end user looks at stock A
Stop receiving quotes for stock A from the external system when no more end users look at said stock
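To make the stock quote use case concrete, here is a small sketch of the bookkeeping it implies: start the upstream feed when the first end user subscribes to a symbol and stop it when the last one leaves. The names QuoteHub, start_feed and stop_feed are hypothetical, and the sketch covers only clean unsubscribes, not crashes.

```python
from collections import defaultdict

class QuoteHub:
    """Starts the upstream feed on the first subscriber, stops it on the last."""

    def __init__(self, start_feed, stop_feed):
        self._start_feed = start_feed      # callable(symbol)
        self._stop_feed = stop_feed        # callable(symbol)
        self._subscribers = defaultdict(set)

    def subscribe(self, symbol, client_id):
        clients = self._subscribers[symbol]
        if not clients:
            self._start_feed(symbol)       # first subscriber for this symbol
        clients.add(client_id)

    def unsubscribe(self, symbol, client_id):
        clients = self._subscribers.get(symbol)
        if not clients or client_id not in clients:
            return                         # nothing to do for unknown clients
        clients.remove(client_id)
        if not clients:
            del self._subscribers[symbol]
            self._stop_feed(symbol)        # last subscriber has left

events = []
hub = QuoteHub(
    start_feed=lambda symbol: events.append(("start", symbol)),
    stop_feed=lambda symbol: events.append(("stop", symbol)),
)
hub.subscribe("A", "user-1")
hub.subscribe("A", "user-2")
hub.unsubscribe("A", "user-1")
hub.unsubscribe("A", "user-2")
```

The feed for stock A is started once and stopped once, however many end users come and go in between.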
RabbitMQ has internal events that you can use with the Event Exchange Plugin. Events such as consumer.created or consumer.deleted could be used to trigger actions at your server level: for example, checking the remaining number of consumers using the RabbitMQ Management API and taking action, such as closing a topic, based on your use case.
I can't think of any messaging system with consumer-presence-based publishing. It gets even worse because you'll need some kind of heartbeat mechanism to handle consumer crashes.
So here are my two cents. I'm not sure whether you're looking for an out-of-the-box solution, but if not, you could wrap your application around a ZooKeeper cluster to handle all your use cases.
Simply use watchers on ephemeral nodes to detect when you have no more consumers (including crashes), and put a watcher on a 'consumers' path to be notified when consumers arrive.
On the consumer side, you would have to register your ZooKeeper node ID whenever a consumer starts.
It's not so complicated to do, and ZooKeeper is not the only solution for this; you might use other consensus technologies as well.
A starting point for ZooKeeper:
https://zookeeper.apache.org/doc/r3.1.2/zookeeperStarted.html
(I strongly advise using the Curator API, which handles a lot of recipes smoothly.)
Yannick
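To illustrate the heartbeat idea behind ephemeral nodes without depending on ZooKeeper itself, here is a rough sketch of a lease table: each consumer renews its lease by sending heartbeats, and a consumer that stops sending them (for example after a crash) is dropped once its lease expires. This is not the ZooKeeper or Curator API. The class name LeaseTable and the explicit now argument (used instead of a real clock so the example is deterministic) are assumptions.

```python
class LeaseTable:
    """Tracks consumer liveness via heartbeats with a time-to-live."""

    def __init__(self, ttl_seconds, on_empty):
        self._ttl = ttl_seconds
        self._on_empty = on_empty     # called when the last consumer disappears
        self._last_seen = {}          # consumer_id -> time of last heartbeat

    def heartbeat(self, consumer_id, now):
        """Register a consumer or renew its lease."""
        self._last_seen[consumer_id] = now

    def expire(self, now):
        """Drop consumers whose lease ran out; report when none remain."""
        had_consumers = bool(self._last_seen)
        self._last_seen = {
            cid: seen for cid, seen in self._last_seen.items()
            if now - seen <= self._ttl
        }
        if had_consumers and not self._last_seen:
            self._on_empty()
        return sorted(self._last_seen)

stopped = []
table = LeaseTable(ttl_seconds=10, on_empty=lambda: stopped.append(True))
table.heartbeat("c1", now=0)
table.heartbeat("c2", now=0)
table.heartbeat("c1", now=8)          # c2 never sends another heartbeat
alive_at_12 = table.expire(now=12)    # c2's lease has run out, c1's has not
alive_at_30 = table.expire(now=30)    # c1 has gone silent too
```

The on_empty callback plays the role of the watcher that tells the producer to stop producing once no consumers remain.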
Unfortunately, you haven't specified the business use case you are trying to solve with these requirements. From the sound of it, you want not a pub/sub system but an orchestration one.
I would recommend checking out Cadence Workflow, which is capable of supporting your listed requirements and many more orchestration use cases.
Here is a strawman design that satisfies your requirements:
Any new subscriber sends an event to a workflow, with the TopicName as the workflowID, to subscribe. If a workflow with the given ID doesn't exist, it is automatically started.
Any subscriber sends another signal to unsubscribe.
When no subscribers are left, the workflow exits.
The publisher sends an event to the workflow to deliver to subscribers.
The workflow delivers the event to the subscribers using an activity.
If a workflow with the given TopicName isn't running, the publish event sent to it is going to fail.
Cadence offers a lot of other advantages over using queues for task processing.
Built-in exponential retries with unlimited expiration interval
Failure handling. For example, it allows executing a task that notifies another service if both updates couldn't succeed during a configured interval.
Support for long-running heartbeating operations
Ability to implement complex task dependencies, for example chaining of calls or compensation logic in case of unrecoverable failures (SAGA)
Complete visibility into the current state of the update. For example, when using queues, all you know is whether there are some messages in a queue, and you need an additional DB to track the overall progress. With Cadence every event is recorded.
Ability to cancel an update in flight.
Distributed CRON support
See the presentation that goes over the Cadence programming model.

Is a message queue like RabbitMQ the ideal solution for this application?

I have been working on a project that is basically an e-commerce platform. It's a multi-tenant application in which every client has its own domain, and the website adjusts itself based on the client's configuration.
If the client already has software that manages their inventory, such as an ERP, I need a medium through which, when the e-commerce site generates an order, external applications like the ERP can be notified so that they can act in response. It would be like raising events across different applications.
I thought about storing these events in a database and having the client make requests at short intervals to fetch the data, but something about polling and using a REST API for this seems hackish.
Then I thought about using WebSockets, but if the client is offline for some reason when the event is generated, delivery cannot be assured.
Then I encountered message queues, RabbitMQ to be specific. With a message queue, modeling the problem in a simplistic manner, the e-commerce site would produce events on one end and push them to a queue that a client's worker would process as events arrive.
I don't know what the best approach is, to be honest, and would love for some of you experienced developers to give me a hand with this.
I do agree with Steve: using a message queue in your situation is ideal. Message queueing allows web servers to respond to requests quickly instead of being forced to perform resource-heavy procedures on the spot. You can put your events onto the queue and let the consumer/worker handle each request when it has time to do so.
I recommend CloudAMQP for RabbitMQ, it's easy to try out and you can get started quickly. CloudAMQP is a hosted RabbitMQ service in the cloud. I also recommend this RabbitMQ guide: https://www.cloudamqp.com/blog/2015-05-18-part1-rabbitmq-for-beginners-what-is-rabbitmq.html
Your idea of using a message queue is a good one, better than a database or WebSockets for the reasons you describe. With the message queue approach (RabbitMQ, or another server/broker-based system such as Apache Qpid), you should consider putting a broker in a "DMZ" sort of network location so that your internal e-commerce system can push events out to it, and your external clients can reach into it without risking direct access to your core business systems. You could also run a separate broker per client.
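As a rough illustration of why a queue copes with an offline client where a WebSocket push does not, the sketch below keeps events in a per-tenant queue until the client's worker acknowledges them. It is an in-memory toy, not RabbitMQ; the class TenantQueues and its methods are hypothetical names.

```python
from collections import defaultdict, deque

class TenantQueues:
    """Holds each tenant's events until that tenant's worker acknowledges them."""

    def __init__(self):
        self._queues = defaultdict(deque)   # tenant -> pending events, oldest first

    def publish(self, tenant, event):
        """Called by the e-commerce side; works even if the tenant is offline."""
        self._queues[tenant].append(event)

    def peek(self, tenant):
        """Return the oldest unacknowledged event, or None if there is none."""
        pending = self._queues.get(tenant)
        return pending[0] if pending else None

    def ack(self, tenant):
        """Remove the oldest event once the worker has processed it."""
        pending = self._queues.get(tenant)
        if pending:
            pending.popleft()

queues = TenantQueues()
queues.publish("shop-a", {"order": 1})      # the ERP worker is offline here
queues.publish("shop-a", {"order": 2})
first_seen = queues.peek("shop-a")          # the worker comes back and reads
queues.ack("shop-a")                        # processed, so it is removed
next_seen = queues.peek("shop-a")
other_tenant = queues.peek("shop-b")        # tenants do not see each other's events
```

Because an event is removed only on acknowledgement, a worker that crashes between reading and acknowledging sees the same event again when it restarts.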