SQS: How to forward message to subscriber based on a certain key - publish-subscribe

I have a validation service which takes in validation requests and publishes them to an SQS queue. Now, based on the type of validation request, I want to forward the message to the specific service that handles that type.
So basically, I have one producer and multiple consumers, but essentially, one message is to be consumed by only one consumer.
What approach should I use? Should I have a different SQS queue for each service, or can I do this using a single queue based on message type?

As I see it, you have three options:
The first option, as you say, is to have a separate queue and consumer for each message type. This is the approach we use, and we have thousands of queues and many different message types.
The second option would be to decorate the message being pushed onto SQS with something that indicates its desired consumer, then have a generic consumer in your application that forwards the message on to the right consumer. This approach is generally seen as an anti-pattern, and I would personally agree.
Thirdly, you could take advantage of SNS message filtering, but that only makes sense if you are already using SNS; otherwise you would have to invest some time to set it up and make it work (a sketch follows below).
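For the SNS filtering route, a minimal boto3 sketch might look like the following. Note this is only an illustration under assumed names: the topic/queue ARNs and the "validation_type" message attribute are invented, not taken from your setup. Each service's queue subscribes with a filter policy, and the publisher tags every message with its type so that SNS delivers it to exactly one queue.

    import json
    import boto3

    sns = boto3.client("sns")

    # Subscribe each service's queue with a filter policy on a message attribute
    # (ARNs below are placeholders).
    sns.subscribe(
        TopicArn="arn:aws:sns:us-east-1:123456789012:validation-requests",
        Protocol="sqs",
        Endpoint="arn:aws:sqs:us-east-1:123456789012:email-validation-queue",
        Attributes={"FilterPolicy": json.dumps({"validation_type": ["EMAIL"]})},
    )

    # The producer tags every message with its type; SNS routes it to the one
    # queue whose filter policy matches.
    sns.publish(
        TopicArn="arn:aws:sns:us-east-1:123456789012:validation-requests",
        Message=json.dumps({"request_id": "123", "payload": "..."}),
        MessageAttributes={
            "validation_type": {"DataType": "String", "StringValue": "EMAIL"}
        },
    )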
Hope that helps!

Related

kafka produce messages in consumer

I have an application with the need to pass messages into multiple layers of processing.
I need to do this because all new messages should first go into a generic topic, where they are processed to calculate a type; after that, they should be put into another topic for further processing, and from then on all messages with the same key go directly to that second topic automatically.
I'm planning to create a topic for each layer. Messages first go into the first layer's topic and get processed, then they are sent to the next layer (another topic), and this might happen again for the layer after that.
I was wondering what the best practice for this is. Is it OK to produce messages in the consumer? Or is there a better solution?
Producing within a consumer is perfectly acceptable. Python libraries such as Faust make this much simpler.
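For illustration, a minimal consume-then-produce loop with kafka-python might look like this; the topic names, broker address, and classify() helper are all assumptions made up for the sketch, and Faust would wrap the same pattern in an agent.

    from kafka import KafkaConsumer, KafkaProducer

    consumer = KafkaConsumer("raw-events",
                             bootstrap_servers="localhost:9092",
                             group_id="layer-1")
    producer = KafkaProducer(bootstrap_servers="localhost:9092")

    def classify(value: bytes) -> bytes:
        """Hypothetical type-calculation step for the first processing layer."""
        return b"default"

    # Consume from the first-layer topic and produce into the next layer's topic,
    # re-keyed by the calculated type so later layers can route on it.
    for record in consumer:
        event_type = classify(record.value)
        producer.send("typed-events", key=event_type, value=record.value)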

Kafka with multiple instances of microservices and end-users

This is more of a design/architecture question.
We have a microservice A (MSA) with multiple instances (say 2) running of it behind LB.
The purpose of this microservice is to get messages from a Kafka topic and send them to end users/clients. Both instances use the same consumer group id for a particular client/user so that messages are not duplicated, and the Kafka topic has 2 partitions (one per instance).
End users/clients connect to the LB to fetch messages from MSA. Long polling is used here.
A request from a client can land on any instance. If it lands on MSA1, it will pull data from Kafka partition 1, and if it lands on MSA2, it will pull from partition 2.
Now, a producer is producing messages, and we don't have a high message count. So let's say the producer produces msg1 and it goes to partition 1. The end user/client will not get this message unless its request lands on MSA1, which won't always happen since other requests are also arriving at the LB.
We want to solve this issue: the client should get the message in near real time.
One possible solution is a distributed persistent queue (e.g. ActiveMQ) where both MSA1 and MSA2 keep putting messages after reading from Kafka, and the client just fetches from that queue. But this would require a separate queue for every end user/client/group id.
Is this a good solution, and can we go ahead with it? Is there anything we should change? We are deploying our system on AWS, so could any AWS managed service help here, e.g. an SNS+SQS combination?
Some statistics:
~1000 users, one group id per user
2-4 instances of microservice
long polling every few seconds (~20s)
average message size ~10KB
Broadly you have three possible approaches:
You can dispense with Kafka's consumer group functionality and allow each instance to consume from all partitions (see the sketch after this list).
You can make the instances of each service aware of each other. For example, an instance that gets a request which can be fulfilled by another instance forwards the request there. This is most effective if the messages can be partitioned by client on the producer end (so that a request from a given client only needs to be routed to one instance). Even then, the consumer group functionality introduces some extra difficulty (rebalances mean that the consumer currently responsible for a given partition might not have seen all the messages in the partition). You may want to implement your own variant of the consumer group coordination protocol, except that on rebalance the instance starts from some suitably early point regardless of where the previous consumer got to.
If you can't reliably partition by client in the producer (e.g. the client is requesting a stream of all messages matching arbitrary criteria) then Kafka is really not going to be a fit and you probably want a database (with all the expense and complexity that implies).
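As a rough sketch of the first approach with kafka-python (the topic name "user-messages", the broker address, and keying messages by user id are assumptions for illustration): each instance assigns itself every partition instead of joining a consumer group, so whichever instance receives a client's long poll has already seen that client's messages.

    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(bootstrap_servers="localhost:9092",
                             enable_auto_commit=False)

    # Manual assignment: no consumer group, no rebalances; every instance reads
    # all partitions of the topic.
    partitions = [TopicPartition("user-messages", p)
                  for p in consumer.partitions_for_topic("user-messages")]
    consumer.assign(partitions)

    for record in consumer:
        user_id = record.key  # hand the message to this user's pending long poll
        ...                   # per-user delivery/offset tracking is up to the application

The cost is that every instance processes every message, which is acceptable here given the low message volume.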

Are Kafka message headers the right place to put the event type name?

In a scenario where multiple event types from a single domain are produced to a single topic and only a subset of those event types is consumed by a given consumer, I need a good way to read the event type before taking action.
I see 2 options:
Put the event type (for example "ORDER_PUBLISHED") into the message body (payload) itself. This would be a broker-agnostic approach and has other advantages, but it would involve parsing every message just to learn the event type.
Utilize Kafka message headers, which would allow consuming messages without extra payload parsing.
The context is event-sourcing. Small commands, small payloads. There are no huge bodies to parse. Golang. All messages are protobufs. gRPC.
What is the typical workflow in such a scenario?
I tried to google the topic, but didn't find much on header use cases and good practices.
It would be great to hear when and how to use Kafka message headers, and when not to.
Clearly the same topic should be used for different event types that apply to the same entity/aggregate (reference). Example: BookingCreated, BookingConfirmed, BookingCancelled, etc. should all go to the same topic in order to (excuse the pun) guarantee ordering of delivery (in this case the booking ID is the message key).
When the consumer gets one of these events, it needs to identify the event type, parse the payload, and route to the processing logic accordingly. The event type is the piece of message metadata that allows this identification.
Thus, I think a custom Kafka message header is the best place to indicate the type of event. I'm not alone:
Felipe Dutra: "Kafka allow you to put meta-data as header of your message. So use it to put information about the message, version, type, a correlationId. If you have chain of events, you can also add the correlationId of opentracing"
This GE ERP system has a header labeled "event-type" to show "The type of the event that is published" to a Kafka topic (e.g., "ProcessOrderEvent").
This other solution mentions that "A header 'event' with the event type is included in each message" in their Kafka integration.
Headers are relatively new in Kafka. Also, as far as I've seen, Kafka books focus on the 17 thousand Kafka configuration options and on Kafka topology. Unfortunately, we don't easily find much on how an event-driven architecture can be mapped with the proper semantics onto the elements of the Kafka message broker.
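To make the header option concrete, here is a hedged kafka-python sketch; the topic name "bookings", the header name "event-type", the consumer group, and the handler are invented for illustration, and the payload stands in for your protobuf bytes. The producer sets the event type as a header, and the consumer routes on the header before deciding whether to deserialize the payload.

    from kafka import KafkaConsumer, KafkaProducer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send(
        "bookings",
        key=b"booking-42",                            # aggregate id keeps per-booking ordering
        value=b"<serialized protobuf bytes>",
        headers=[("event-type", b"BookingCreated")],  # metadata readable without parsing the payload
    )
    producer.flush()

    def handle_booking_created(payload: bytes) -> None:
        """Hypothetical handler; only here is the protobuf payload actually parsed."""
        ...

    consumer = KafkaConsumer("bookings",
                             bootstrap_servers="localhost:9092",
                             group_id="billing")
    for record in consumer:
        headers = dict(record.headers)
        if headers.get("event-type") == b"BookingCreated":
            handle_booking_created(record.value)
        # other event types are skipped without touching the payload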

Kafka topic filtering vs. ephemeral topics for microservice request/reply pattern

I'm trying to implement a request/reply pattern with Kafka. I am working with named services and unnamed clients that send messages to those services, and clients may expect a reply. Many (10s-100s) of clients may interact with a single service, or consumer group of services.
Strategy one: filtering messages
The first thought was to have two topics per service - the "HelloWorld" service would consume the "HelloWorld" topic, and produce replies back to the "HelloWorld-Reply" topic. Clients would consume that reply topic and filter on unique message IDs to know what replies are relevant to them.
The drawback there is that it seems to create unnecessary work for clients, which would have to filter out a potentially large number of irrelevant messages when many clients are interacting with one service.
Strategy two: ephemeral topics
The second idea was to create a unique ID per client, and send that ID along with messages. Clients would consume their own unique topic "[ClientID]" and services would send to that topic when they have a reply. Clients would thus not have to filter irrelevant messages.
The drawback there is that clients may have a short lifespan, e.g. they may be single-use scripts, and they would have to create their topic beforehand and delete it afterward. There might also have to be some extra process to purge unused client topics if a client dies during processing.
Which of these seems like a better idea?
We are using Kafka in production as a handler for event-based messages and request/response messages. Our approach to implementing request/response is your first strategy, because with the second one, as the number of clients grows, you have to create many topics, some of which are completely useless. Another reason for choosing the first strategy was our topic naming guideline: each service should belong to only one topic, for tracking purposes. Kafka is not really made for request/response messages, but I recommend the first strategy because of:
a smaller number of topics
better service tracking
better topic naming
But you have to be careful about your consumer groups, as mistakes there may cause data loss.
A better approach is to use the first strategy with many partitions in one topic per service, where each client sends and receives its messages with a unique key. Kafka guarantees that all messages with the same key will go to a specific partition. This approach doesn't require filtering out irrelevant messages and is arguably a combination of your two strategies (see the sketch below).
Update:
As @ValBonn pointed out, in the suggested approach you always have to make sure that the number of partitions >= the number of clients.
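As a rough illustration of that keyed/partitioned variant (every name here, including the client's chosen partition, is an assumption): the client pins itself to one partition of the service's reply topic, tells the service where to reply, and then reads only that partition, so no filtering is needed. Pinning an explicit partition sidesteps having to reproduce the default partitioner's hash of the key; relying purely on the key would also work if the client computes the same partition the producer-side partitioner picks.

    from kafka import KafkaConsumer, KafkaProducer, TopicPartition

    client_id = "client-17"   # hypothetical unique client identity
    client_partition = 3      # must be < the number of partitions of "HelloWorld-Reply"

    # Send the request keyed by client id and tell the service where to reply.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send(
        "HelloWorld",
        key=client_id.encode(),
        value=b'{"reply_partition": 3, "payload": "..."}',
    )
    producer.flush()

    # The client reads only its own partition of the reply topic, so it never
    # sees (or filters) other clients' replies.
    reply_consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
    reply_consumer.assign([TopicPartition("HelloWorld-Reply", client_partition)])
    for reply in reply_consumer:
        print(reply.value)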

RabbitMQ AMQP queue design

Below is the desired design of the queue, with:
P producer. The application that inserts data
X exchange.
C1-C3 consumers. The applications that read from the queues
Queue details:
A. Is just like a log queue; if there is no client binding, the message will be discarded.
B. This is a work queue; it will do something if certain criteria match.
C. Also a work queue; it will transform the data.
A is optional, but messages for B and C should always stay queued until some client process connects and consumes them.
The problem is determining which type of exchange I should use.
Is it fanout, direct, or topic? I want queue A to discard the message if there is no client connected, but B & C should always keep the message.
And should the producer write once to the exchange, or write multiple times with different routing keys or topics?
Answer the question: Do I want all queues to receive all messages?
If the answer is yes then you should use fanout. If the answer is no then you should use direct or topic. The whole point of direct or topic is that the queues themselves will only receive messages based on matching the routing key to the binding key.
Queue A should be instantiated by the consumer C1, and set to auto-delete and non-durable. This way, when C1 disconnects, the queue will be deleted and its messages discarded.
Conversely, queues B and C should be instantiated when the exchange is, either separately or by the producer. They should be set to non-auto-delete and probably durable. If you are using durable queues you might also want persistent messages (don't worry: if queue A doesn't exist, even persistent messages won't be a problem here). This way, as soon as the producer starts sending messages, the queues will start queuing them up and no message will be missed, even if the consumers are not yet running.
Whether to use direct or topic exchanges is personal preference. I understand that direct exchanges should be faster while topic exchanges allow a lot of flexibility with routing/binding keys.
I am not 100% sure what you mean by your last question. Each message should only be written once to an exchange. If you're using fanout, the exchange will take care of routing the messages to the queues correctly, and that is it. If you are using direct or topic exchanges, then it's down to the binding keys to make sure that each queue receives the correct messages. You should not need to send a message with more than one routing key; if you wish to do something like that, you have something backwards in your understanding. But a single queue can have multiple binding keys to the exchange.
A simple example: X is a direct exchange. B has the binding key "black"; C has one binding key of "black" and one of "white". P sends messages with either the routing key "black" or "white". If it is "black", then both B and C will receive the message; if it is "white", then only C will receive it.
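A minimal pika sketch of this setup, purely for illustration (queue and exchange names, the local broker, and the single publish are assumptions): B and C are durable and declared up front so they buffer messages even with no consumer running, queue A would normally be declared auto-delete by its own consumer, and the "white" message below reaches only C.

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()

    channel.exchange_declare(exchange="X", exchange_type="direct")

    # Queue A: in practice declared by its consumer C1; auto-delete means it (and
    # its messages) disappear when that consumer disconnects.
    channel.queue_declare(queue="A", durable=False, auto_delete=True)
    channel.queue_bind(queue="A", exchange="X", routing_key="black")

    # Queues B and C: durable and declared up front, so they queue messages even
    # before any consumer is running.
    channel.queue_declare(queue="B", durable=True)
    channel.queue_bind(queue="B", exchange="X", routing_key="black")

    channel.queue_declare(queue="C", durable=True)
    channel.queue_bind(queue="C", exchange="X", routing_key="black")
    channel.queue_bind(queue="C", exchange="X", routing_key="white")

    # The producer publishes each message once, with a single routing key.
    channel.basic_publish(exchange="X", routing_key="white",
                          body=b"only C gets this",
                          properties=pika.BasicProperties(delivery_mode=2))  # persistent
    connection.close()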