Create a topic or insert a new message in Kafka? - apache-kafka

I have an Order topic where customers push order messages. When an order is completed by the seller, it should be changed to confirmed status. Should I create a new topic such as accepted_orders with information about the seller and the order, like in relational tables? Or insert a new message into the existing topic with extended information?

insert a new message to existing topic with extended information?
Into the orders topic? Then you'd consume it again and need to filter out every record whose status is still unfilled. That'll work, but it'll slow down processing, since you're now consuming two messages for every order.
Therefore, a new topic, or a dedicated partition selected by order status, is recommended.
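To make the separate-topic option concrete, here is a minimal in-memory sketch. Plain Python lists stand in for topics, and no real Kafka client is involved; the publish helper and the record fields are hypothetical, and only the topic name accepted_orders comes from the question.

```python
from collections import defaultdict

# In-memory stand-in for a broker: topic name -> list of (key, value) records.
topics = defaultdict(list)


def publish(topic, key, value):
    """Append one record to the named topic (hypothetical helper)."""
    topics[topic].append((key, value))


# Customers push orders to the orders topic.
publish("orders", "order-1", {"customer": "alice", "item": "book"})
publish("orders", "order-2", {"customer": "bob", "item": "pen"})

# The seller confirms order-1; the confirmation goes to its own topic,
# keyed by the same order id so it can be joined back to the order.
publish("accepted_orders", "order-1", {"seller": "shop-9", "status": "confirmed"})

# A consumer interested only in confirmations reads one small topic
# and never has to skip over unconfirmed orders.
confirmed_ids = [key for key, _ in topics["accepted_orders"]]
```

Keying both topics by the order id is a design choice of this sketch: it keeps the option open to join the two streams later.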

Related

consumers process all the related messages with a specific key

I need to make Kafka consumers process all the messages with the same ID in each partition at once. For example, consider one topic containing all orders of different types, with multiple consumer instances subscribing to this topic. How can I run consumers to process all the messages in each partition with the same ID? When orders are produced with that ID, Kafka guarantees that all records with the same ID go to the same partition, but each partition may still contain orders with different IDs. I need to process all the similar orders in each partition at once (not one by one) and once in a while (not as soon as a new message arrives).
As the comments say, you'll need to manually batch your data into "bins per ID", then process those on your own. For example, write each record to a database, group by ID, then iterate/process each batch.
As far as Kafka is concerned, you're required to look at each event "one by one", but this does not require you to "handle them" in that order, unless you care about sequential processing, at-least-once processing, and in-order offset commits.
There's also no way to get "all unique ids" in any partition without consuming the whole partition end-to-end. As another solution, you could use the Kafka Streams aggregate function to help with this, and punctuate to periodically handle all IDs gathered up to a certain point.
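The "bins per ID" idea can be sketched roughly as follows. This is plain Python with no Kafka client; BinnedProcessor, on_record and flush are hypothetical names, and in a real consumer on_record would be called from the poll loop while flush would be driven by a timer.

```python
from collections import defaultdict


class BinnedProcessor:
    """Buffers records per ID and hands each ID's records to a handler as one batch."""

    def __init__(self, handler):
        self.handler = handler         # called as handler(record_id, list_of_values)
        self.bins = defaultdict(list)  # record_id -> values seen since the last flush

    def on_record(self, record_id, value):
        # Records still arrive one by one; here they are only buffered.
        self.bins[record_id].append(value)

    def flush(self):
        # Called "once in a while", not once per record.
        for record_id, values in self.bins.items():
            self.handler(record_id, values)
        self.bins.clear()


processed = []
proc = BinnedProcessor(lambda record_id, values: processed.append((record_id, values)))
for record_id, value in [("a", 1), ("b", 2), ("a", 3)]:
    proc.on_record(record_id, value)
proc.flush()  # handler sees ("a", [1, 3]) and then ("b", [2])
```

Note that this sketch keeps the bins in memory, so anything buffered is lost on a crash; committing offsets only after a successful flush, or writing to a database as suggested above, would be needed for stronger guarantees.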

How to avoid Kafka consumer processing old messages in retry queue

Given we use Kafka to update product information in our system.
A new message to update the price of a product is not processed correctly, so it's sent to a retry topic that has a 10-minute delay.
In the next minute, a new message to update the price of the same product is sent and correctly consumed.
Then the old message from the retry topic is consumed, leaving the product with the old price instead of the current one.
How is it possible to avoid this scenario in Kafka?
You will need to track what has been consumed somewhere.
A KTable might be able to do this (lookup record by key, if the table has the key, then it has already been consumed and processed... meaning you'd have a simple "processed" topic next to your "retry" topic), but if you have an external DB, then that will work as well. Main downside is that you will be introducing an external dependency, and it will slow down your processing as every incoming event will need to query the database.
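One way to picture the tracking is the sketch below. It assumes each message carries a timestamp (or version) that can be compared, which is not stated in the question; LatestWinsStore and its fields are hypothetical, and a dict stands in for the KTable or external database.

```python
class LatestWinsStore:
    """Applies a price update only if it is newer than the last one applied for that product."""

    def __init__(self):
        self.prices = {}        # product_id -> current price
        self.last_applied = {}  # product_id -> timestamp of the last applied update

    def apply(self, product_id, price, timestamp):
        last = self.last_applied.get(product_id)
        if last is not None and timestamp <= last:
            return False  # stale (or duplicate) message, e.g. one coming back from the retry topic
        self.prices[product_id] = price
        self.last_applied[product_id] = timestamp
        return True


store = LatestWinsStore()
newer_applied = store.apply("p1", 12.0, timestamp=200)  # the later update is consumed first
retry_applied = store.apply("p1", 10.0, timestamp=100)  # the delayed retry arrives afterwards
```

With this check the delayed retry is dropped and the product keeps the current price. The cost is the one mentioned above: every incoming event needs a lookup.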

How to guarantee message ordering over multiple topics in kafka?

I am creating a system in which I use kafka as an event store. The problem I am having is not being able to guarantee the message ordering of all the events.
Let's say I have a User entity and an Order entity. Right now I have the topics configured as follows:
user-deleted
user-created
order-deleted
order-created
When consuming these topics from the start (when a new consumer group registers), first the user-deleted topic gets consumed, then the user-created topic, etc. The problem with this is that events across multiple topics do not get consumed chronologically, only within each topic.
Let's say 2 users get created and after this one gets deleted. The result would be one remaining user.
Events:
user-created
user-created
user-deleted
My system would consume these like:
user-deleted
user-created
user-created
Which means the result is 2 remaining users which is wrong.
I do set the partition key (with the user id) but this seems only to guarantee order within a topic. How does this problem normally get tackled?
I have seen people using a topic per entity, resulting in 2 topics for this example (user and order), but this can still cause issues with related entities.
What you've designed is "request/response topics", and you cannot order between multiple topics this way.
Instead, design "entity topics" or "event topics". This way, ordering will be guaranteed, and you only need one topic per entity. For example,
Topic users
For a key=userId, you can structure events this way.
Creates
userId, {userId: userId, name:X, ...}
Updates
userId, {userId: userId, name:Y, ...}
Deletes
userId, null
Use a compacted topic for an event-store such that all deletes will be tombstoned and dropped from any materialized view.
You could go a step further and create a wrapper record.
userId, {action:CREATE, data:{name:X, ...}} // full-record
userId, {action:UPDATE, data:{name:Y}} // partial record
userId, {action:DELETE} // no data needed
This topic acts as your "event entity topic", but then, you need a stream processor to parse and process these events consistently into the above format, such as null-ing any action:DELETE, and writing to a compacted topic (perhaps automatically using Kafka Streams KTable)
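As a rough illustration of what such a stream processor does, the sketch below folds wrapper records into a latest-state-per-key view, which is roughly what a compacted topic or a KTable would hold. It is plain Python, not Kafka Streams; the materialize function is hypothetical, and the record shapes follow the wrapper format above.

```python
def materialize(events):
    """Fold (key, wrapper_record) pairs, in log order, into a latest-state-per-key view."""
    view = {}
    for key, event in events:
        action = event["action"]
        if action == "CREATE":
            view[key] = dict(event["data"])  # full record replaces any state
        elif action == "UPDATE":
            # Partial record: merge the changed fields into the existing state.
            view[key] = {**view.get(key, {}), **event["data"]}
        elif action == "DELETE":
            view.pop(key, None)  # same effect as a null (tombstone) value
        else:
            raise ValueError(f"unknown action: {action!r}")
    return view


events = [
    ("u1", {"action": "CREATE", "data": {"name": "X"}}),
    ("u2", {"action": "CREATE", "data": {"name": "Z"}}),
    ("u1", {"action": "UPDATE", "data": {"name": "Y"}}),
    ("u2", {"action": "DELETE"}),
]
view = materialize(events)  # only u1 remains, with name Y
```

Because all events for one user share a key and a topic, they are folded in the order they were written, so a delete can never be applied before the matching create.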
Kafka cannot maintain ordering across multiple topics. Nor can it maintain ordering within one topic that has several partitions. The only ordering guarantee we have is within each partition of one topic.
What this means is that if the order of user-created and user-deleted as known by a Kafka producer must be the same as the order of those events as perceived by a Kafka consumer (which is understandable given what you explain), then those events must be sent to the same partition of the same topic.
Usually, you don't actually need the whole order to be exactly the same for the producer and the consumer (i.e. you don't need total ordering), but you need it to be the same at least for each entity id, i.e. for each user id the user-created and user-deleted events must be in the same order for the producer and the consumer, but it's often acceptable to have events mixed up across users (i.e. you need partial ordering).
In practice this means you must use the same topic for all those events, which means this topic will contain events with different schemas.
One strategy for achieving that is to use union types, i.e. you declare in your event schema that the type can either be a user-created or a user-deleted. Both Avro and Protobuf offer this feature.
Another strategy, if you're using Confluent Schema Registry, is to allow a topic to be associated with several types in the registry, using the RecordNameStrategy schema resolution strategy. The blog post Putting Several Event Types in the Same Topic – Revisited is probably a good source of information for that.
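The partial ordering described above can be illustrated with a small simulation. It is only a sketch: partition_for uses zlib.crc32 as a stand-in hash (Kafka's default partitioner uses a different hash, murmur2), and produce and the sample events are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Pick a partition from the key. Stand-in hash, not Kafka's actual partitioner."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(events, num_partitions):
    """Append each (key, event_type) to the partition chosen by its key, in send order."""
    partitions = [[] for _ in range(num_partitions)]
    for key, event_type in events:
        partitions[partition_for(key, num_partitions)].append((key, event_type))
    return partitions


events = [
    ("user-1", "user-created"),
    ("user-2", "user-created"),
    ("user-1", "user-deleted"),
]
partitions = produce(events, num_partitions=3)

# All events for user-1 sit in one partition, in the order they were sent.
user1_partition = partitions[partition_for("user-1", 3)]
user1_events = [event_type for key, event_type in user1_partition if key == "user-1"]
```

Events for different users may land in different partitions and be read in any relative order, which is exactly the "mixed up across users" case that partial ordering tolerates.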

ordering across partitions in Kafka

I am writing a Kafka producer and need help in creating partitions.
I have a group and a user table. Group contains different users and at a time a user can be a part of only one group.
There can be two types of events which I will receive as input and based on that I will add them to Kafka.
The events related to users.
The events related to groups.
Whenever an event related to a group happens, all the users in that group must be updated in bulk at consumer end.
Whenever an event related to a user happens, it must be executed as such at the consumer end.
Also, I want to maintain ordering on basis of time.
If I create user level partitioning, then the bulk update won't be possible at consumer end.
If I create group level partitioning, then the parallel update of user events won't happen.
I am trying to figure out the possibilities I can try here.
Also, I want to maintain ordering on basis of time.
This means that your topics, however many there are, cannot have more than one partition each; otherwise you could receive messages out of order.
That is, unless you implement something like sequence IDs in your messages (and can share that sequence across possibly multiple producers).
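If you do go the sequence-ID route, the consumer side needs a small reorder buffer. The sketch below assumes producers stamp every message with a gap-free, globally increasing sequence number starting at 0, which is a strong assumption you would have to arrange yourself; Resequencer and accept are hypothetical names.

```python
class Resequencer:
    """Releases payloads strictly in sequence order, buffering any that arrive early."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq  # the sequence number we are waiting for
        self.pending = {}          # seq -> payload, for messages that arrived early

    def accept(self, seq, payload):
        """Take one message; return the payloads that can now be delivered in order."""
        if seq >= self.next_seq:
            self.pending.setdefault(seq, payload)  # ignore already-delivered duplicates
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


r = Resequencer()
first = r.accept(1, "b")   # arrives early, nothing can be released yet
second = r.accept(0, "a")  # fills the gap, so "a" and "b" are released together
third = r.accept(2, "c")
dup = r.accept(1, "late duplicate")  # already delivered, dropped
```

A missing sequence number blocks everything behind it in this sketch, so a real implementation would also need a timeout or gap-handling policy.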
If I create user level partitioning, then the bulk update won't be possible at consumer end.
If I create group level partitioning, then the parallel update of user events won't happen.
It sounds like a very simple messaging design, where you have a single queue (actually backed by a single topic with a single partition) that's consumed by multiple consumers. In fact, any pub-sub messaging technology would be sufficient here (e.g. RabbitMQ's fanout exchanges).
The messages on the queue contain the information whether they are group updates or user updates - the consumers then filter the input depending on what they are interested in.
To discuss an alternative: a single queue for group updates and another for user updates. I understand that this would not be enough given the ordering demands: a group update could be received independently of a user update, breaking the ordering.
From the Kafka documentation:
https://kafka.apache.org/documentation/#intro_consumers
Kafka only provides a total order over records within a partition, not
between different partitions in a topic. Per-partition ordering
combined with the ability to partition data by key is sufficient for
most applications. However, if you require a total order over records
this can be achieved with a topic that has only one partition, though
this will mean only one consumer process per consumer group.
So the best you can do is to have a single topic with a single partition.