I have a use case where a message has to be pushed to a number of Kafka topics.
Currently at a high level, that method looks like this:
pushToTopics(String msg) {
    pushToTopicA(msg);
    pushToTopicB(msg);
    pushToTopicC(msg);
    ...
    pushToTopicN(msg);
}
Every pushToTopicX(msg) has a condition which, when fulfilled, leads to the message being published to the corresponding topic. Right now, all of this logic lives in the terminal bolt, and we use KafkaProducer to push the messages.
I was looking at ways to break this down into topic-specific bolts and, more importantly, to use KafkaBolts to push the messages.
Is this possible with Storm (v1.2.2)? I saw that a PR which lets one create custom callbacks was merged very recently, but we don't have that.
The KafkaBolt can decide which topic to send to based on the tuple. You could just use a splitter bolt to split your message into N messages, each with a different destination topic, and then send all of them to the KafkaBolt.
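For illustration, here is a rough sketch of that splitter + KafkaBolt approach using the storm-kafka-client KafkaBolt. The field names, topic names, and the placeholder routing conditions are mine, not from the question; FieldNameTopicSelector picks the destination topic from a tuple field and falls back to a default topic when the field is missing.

import java.util.Properties;
import org.apache.storm.kafka.bolt.KafkaBolt;
import org.apache.storm.kafka.bolt.mapper.FieldNameBasedTupleToKafkaMapper;
import org.apache.storm.kafka.bolt.selector.FieldNameTopicSelector;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class TopicSplitterExample {

    // Splitter bolt: emits one tuple per destination topic the message qualifies for.
    public static class SplitterBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            String msg = input.getStringByField("message");
            if (msg.contains("A")) {                          // placeholder for your topic-A condition
                collector.emit(new Values("topicA", null, msg));
            }
            if (msg.contains("B")) {                          // placeholder for your topic-B condition
                collector.emit(new Values("topicB", null, msg));
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("topic", "key", "message"));
        }
    }

    // Single KafkaBolt that reads the destination topic from the "topic" field of each tuple.
    public static KafkaBolt<String, String> buildKafkaBolt() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaBolt<String, String>()
                .withProducerProperties(props)
                .withTopicSelector(new FieldNameTopicSelector("topic", "unrouted-messages"))
                .withTupleToKafkaMapper(new FieldNameBasedTupleToKafkaMapper<>("key", "message"));
    }
}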
The way I eventually solved it was to create separate streams, each one bound to a destination topic. Then, via collector.emit on specific streams, I was able to fan the messages out across various bolts, which eventually push to Kafka using KafkaBolt.
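A rough sketch of that stream-based fan-out; the stream names, field names, and routing conditions are made up for illustration:

import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Emits each message on the stream(s) of the topics it should go to;
// each stream is wired to its own KafkaBolt in the topology.
public class FanOutBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String msg = input.getStringByField("message");
        if (msg.contains("A")) {                              // placeholder topic-A condition
            collector.emit("topicA-stream", input, new Values(msg));
        }
        if (msg.contains("B")) {                              // placeholder topic-B condition
            collector.emit("topicB-stream", input, new Values(msg));
        }
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declareStream("topicA-stream", new Fields("message"));
        declarer.declareStream("topicB-stream", new Fields("message"));
    }
}

// Wiring, e.g.:
// builder.setBolt("kafka-a", kafkaBoltForTopicA).shuffleGrouping("fanout", "topicA-stream");
// builder.setBolt("kafka-b", kafkaBoltForTopicB).shuffleGrouping("fanout", "topicB-stream");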
Related
I have an application with the need to pass messages into multiple layers of processing.
I need to do this because all new messages should be put into the first, generic topic so they can be processed to calculate a type; after that they should be put into the other topic (for further processing), and from then on all messages with the same key go directly to the second topic automatically.
I'm planning to create multiple topics, one for each layer. Messages first go into the first layer and get processed, then they should be sent to the next layer (another topic), and this might happen again for the layer after that.
I was wondering what is the best practice for this. Is it ok to produce messages in the consumer? Or is there any other better solution for this?
Producing within a consumer is perfectly acceptable. Python libraries such as Faust make this much simpler.
Can I have the consumer act as a producer (publisher) as well? I have a use case where a consumer (C1) polls a topic and pulls messages. After processing a message and performing a commit, it needs to notify another process to carry on the remaining work. Given this use case, is it a valid design for consumer C1 to publish a message to a different topic, i.e. for C1 to also act as a producer?
Yes, this is a valid use case. We have many production applications that do the same: they consume events from a source topic, perform data enrichment/transformation, and publish the output to another topic for further processing.
Again, the implementation pattern depends on which tech stack you are using, but if you are after a Spring Boot application, you can have a look at https://medium.com/geekculture/implementing-a-kafka-consumer-and-kafka-producer-with-spring-boot-60aca7ef7551
Totally valid scenario; for example, you can have a source connector or a producer which simply pushes raw data to a topic.
The receiver is loosely coupled to your publisher, so they cannot communicate with each other directly.
Then you need C1 (a mediator) to consume messages from the source, transform the data, and publish the new data format to a different topic.
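A bare-bones sketch of such a mediator using the plain Java consumer and producer clients; the topic names, group id, and the transformation are placeholders:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class MediatorC1 {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "c1-mediator");
        consumerProps.put("enable.auto.commit", "false");
        consumerProps.put("key.deserializer", StringDeserializer.class.getName());
        consumerProps.put("value.deserializer", StringDeserializer.class.getName());

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("source-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    String transformed = record.value().trim();   // placeholder transformation
                    producer.send(new ProducerRecord<>("destination-topic", record.key(), transformed));
                }
                producer.flush();      // make sure the forwarded records are out before committing offsets
                consumer.commitSync();
            }
        }
    }
}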
If you're using a JVM-based client, this is precisely the use case for Kafka Streams rather than the base Consumer/Producer API.
Kafka Streams applications must consume from an initial topic, and can then map (convert), filter, aggregate, split, etc. into other topics.
https://kafka.apache.org/documentation/streams/
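A minimal sketch of what such a Streams topology could look like; the topic names and the transformation are placeholders:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class EnrichmentApp {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        // Consume from the initial topic, transform each value, and publish to the next topic.
        builder.<String, String>stream("source-topic")
               .mapValues(value -> value.toUpperCase())   // placeholder enrichment/transformation
               .to("output-topic");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "enrichment-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        new KafkaStreams(builder.build(), props).start();
    }
}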
I want to implement a state machine integrated with Kafka topics. Whenever a message is produced to a topic, I want the state machine to react by changing state. I have two questions:
Is producing a message to a topic identical to publishing an event?
How to wire things up in a proper way? Some simple code example would be welcome.
Produce is not the same as Publish. You can use Produce to send messages to a topic in Kafka from a state machine:
Initially(
    When(Started)
        .Produce(x => x.Init<KafkaMessage>(new { Text = "text" }))
        .TransitionTo(Active));
There are unit tests that show how it works; I don't believe it is documented yet. It was added in this commit
My use-case is as follows:
I have a Kafka topic A with messages "logically" belonging to different "services"; I don't control the system that sends the messages to A.
I want to read such messages from A and dispatch them to a per-service set of topics on the same cluster (let's call them A_1, ..., A_n), based on one column describing the service (the format is CSV-style, but it doesn't matter).
The set of services is static, I don't have to handle addition/removal at the moment.
I was hoping to use Kafka Connect to perform this task but, surprisingly, there is no Kafka-to-Kafka source/sink connector (I cannot find the tickets, but they have been rejected).
I have seen MirrorMaker2, but it looks like overkill for my (simple) use case.
I also know Kafka Streams, but I'd rather not write and maintain code just for this.
My question is: is there a way to achieve this topic dispatching with Kafka-native tools, without writing a Kafka consumer/producer myself?
PS: if anybody thinks that MirrorMaker2 could be a good fit, I am interested too; I don't know the tool very well.
To my knowledge, there is no straightforward way to branch incoming messages to a list of topics based on the message contents; you need to write custom code to achieve this, roughly as sketched below:
Use the Processor API (refer here)
Register the destination topics as children (sinks) of the processor
Use your own logic to identify which topic a message should be branched to
Use context.forward to publish the message to the chosen topic:
context.forward(key, value, To.child("selected topic"))
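For illustration, a sketch using the classic (pre-2.7 style) Processor API; the sink/child names, topic names, and the assumption that the service is the first CSV column are mine, not from the question:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.To;

public class ProcessorDispatcher {

    // Forwards each record to the child sink named after its service column (assumed first CSV field).
    static class DispatchProcessor extends AbstractProcessor<String, String> {
        @Override
        public void process(String key, String value) {
            String service = value.split(",")[0];
            context().forward(key, value, To.child(service));
        }
    }

    public static void main(String[] args) {
        Topology topology = new Topology();
        topology.addSource("source", "A")
                .addProcessor("dispatch", DispatchProcessor::new, "source")
                // one sink (child) per destination topic; the child name must match what forward() selects
                .addSink("service1", "A_1", "dispatch")
                .addSink("service2", "A_2", "dispatch");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "processor-dispatcher");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        new KafkaStreams(topology, props).start();
    }
}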
Mirror Maker is for doing ... mirroring. It's useful when you want to mirror one cluster from one data center to another, with the same topics. Your use case is different.
Kafka Connect is for syncing different systems (data from databases, for example) through Kafka topics, but I don't see it fitting this use case either.
I would use a Kafka Streams application for that.
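If you do go the Kafka Streams route, the DSL can route records to a dynamically chosen topic via a TopicNameExtractor, which keeps the code tiny. A sketch, assuming (my assumption) the service name is the first CSV column and the destination topics are named A_<service>:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsDispatcher {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        // Send each record from A to the topic named after its service column, e.g. A_service1, A_service2, ...
        builder.<String, String>stream("A")
               .to((key, value, recordContext) -> "A_" + value.split(",")[0]);

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-dispatcher");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        new KafkaStreams(builder.build(), props).start();
    }
}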
All the other answers are right; at the time of writing I did not find any "config-only" solution in the Kafka toolset.
What finally did the trick was to use Logstash, as its kafka output plugin supports variable substitution (%{field}) in the topic_id parameter.
So once you have the "target topic name" available in a field (say service_name), it's as simple as this:
output {
  kafka {
    id => "sink"
    codec => [...]
    bootstrap_servers => [...]
    topic_id => "%{[service_name]}"
    [...]
  }
}
I have a validation service which takes in validation requests and publishes them to an SQS queue. Now, based on the type of validation request, I want to forward the message to the specific service that handles it.
So basically, I have one producer and multiple consumers, but each message is to be consumed by only one consumer.
What approach should I use? Should I have a different SQS queue for each service, or can I do this using a single queue based on message type?
As I see it, you have three options.
The first option, like you say, is to have a unique queue and consumer for each message type. This is the approach we use, and we have thousands of queues and many different message types.
The second option would be to decorate the message being pushed onto SQS with something that indicates its desired consumer, then have a generic consumer in your application that forwards the message on to the right one. This approach is generally seen as an anti-pattern, though, and I would personally agree.
Thirdly, you could take advantage of SNS filtering, but only if you are already using SNS; otherwise you'd have to invest some time to set it up and make it work.
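For reference, a rough sketch of the SNS filtering option with the AWS SDK for Java v2: each service's SQS queue subscribes to a single SNS topic, and a filter policy on the subscription matches on a message attribute. The attribute name, values, and ARNs below are made up:

import java.util.Map;
import software.amazon.awssdk.services.sns.SnsClient;
import software.amazon.awssdk.services.sns.model.MessageAttributeValue;
import software.amazon.awssdk.services.sns.model.PublishRequest;
import software.amazon.awssdk.services.sns.model.SetSubscriptionAttributesRequest;

public class SnsFilterSketch {
    public static void main(String[] args) {
        SnsClient sns = SnsClient.create();

        // Only deliver messages whose "validationType" attribute is "address" to this queue's subscription.
        sns.setSubscriptionAttributes(SetSubscriptionAttributesRequest.builder()
                .subscriptionArn("arn:aws:sns:eu-west-1:123456789012:validation-requests:subscription-id")
                .attributeName("FilterPolicy")
                .attributeValue("{\"validationType\": [\"address\"]}")
                .build());

        // The validation service publishes with the attribute set; SNS fans out to the matching queue(s).
        sns.publish(PublishRequest.builder()
                .topicArn("arn:aws:sns:eu-west-1:123456789012:validation-requests")
                .message("{\"payload\": \"example\"}")
                .messageAttributes(Map.of("validationType",
                        MessageAttributeValue.builder().dataType("String").stringValue("address").build()))
                .build());
    }
}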
Hope that helps!