What is the difference between KafkaBolt and BaseTickTupleAwareRichBolt? What exactly does each one do? What are their pros and cons, and can you give a quick example of each? Thanks!
They're not really alternatives, and one subsumes all the pros and cons of the other.
KafkaBolt is a subclass of BaseTickTupleAwareRichBolt, with the latter knowing nothing about Kafka. So, if you don't need to interact with Kafka outside of the spout, you wouldn't use KafkaBolt. You could also just define a Kafka producer within your own BaseTickTupleAwareRichBolt implementation.
What exactly does each one do?
One defines a bolt contract that can process tuples emitted by spouts. The other is a specific implementation that can be configured to interact with Kafka and send tuple data to a topic.
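To make that concrete, here is a rough sketch of each, assuming Storm 2.x and the storm-kafka-client module; the topic name, tuple field names, and the counting logic are made up for illustration:

import java.util.Map;
import java.util.Properties;
import org.apache.storm.kafka.bolt.KafkaBolt;
import org.apache.storm.kafka.bolt.mapper.FieldNameBasedTupleToKafkaMapper;
import org.apache.storm.kafka.bolt.selector.DefaultTopicSelector;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseTickTupleAwareRichBolt;
import org.apache.storm.tuple.Tuple;

public class BoltExamples {

    // KafkaBolt: ready-made bolt that writes each tuple's "key"/"message" fields to a Kafka topic.
    static KafkaBolt<String, String> kafkaBolt(Properties producerProps) {
        return new KafkaBolt<String, String>()
                .withProducerProperties(producerProps)
                .withTopicSelector(new DefaultTopicSelector("output-topic"))
                .withTupleToKafkaMapper(new FieldNameBasedTupleToKafkaMapper<>("key", "message"));
    }

    // BaseTickTupleAwareRichBolt: plain bolt base class, nothing Kafka-specific;
    // it only separates tick tuples from regular tuples for you.
    public static class CountingBolt extends BaseTickTupleAwareRichBolt {
        private transient OutputCollector collector;
        private long count;

        @Override
        public void prepare(Map<String, Object> topoConf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        protected void process(Tuple tuple) {      // called for every normal tuple
            count++;
            collector.ack(tuple);
        }

        @Override
        protected void onTickTuple(Tuple tuple) {  // called only for tick tuples
            System.out.println("tuples seen so far: " + count);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
        }
    }
}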
I'm struggling with Kafka and its multiple-event-types-per-topic concept. According to this article, there are cases when it's fine to keep events of different types in a single topic, and I believe I have all the prerequisites to use it in my case. Without going deep into the idea, I'll just say that I want to keep commands and events in the same topic under the same key to preserve the order of events.
In my case I'm using Avro and would like to use io.confluent.kafka.serializers.subject.RecordNameStrategy for serialisation of the events coming from the topic. And I would like to use the Kafka Streams API to avoid the low-level API. However, KStream is a Java class designed to make heavy use of generics and type parameters, and I'm not sure of the right way to express the nature of such a polymorphic topic with it, as I'm using Avro records and autogenerated classes, where I cannot build an inheritance tree of objects or use composition to encapsulate such a payload inside some wrapper class.
Using the Object class in the KStream definition, letting the schema registry convert the data, and then filtering by type does not look right to me...
I also thought about defining different consumers for the same topic which are supposed to read only events of the right type, but I also don't have a clue how to filter those before they reach my KStream...
And here is my question: what would be the right way of achieving this with KStream?
I will appreciate any help or ideas
Thanks!
My use-case is as follows:
I have a Kafka topic A with messages "logically" belonging to different "services"; I don't control the system sending the messages to A.
I want to read such messages from A and dispatch them to a per-service set of topics on the same cluster (let's call them A_1, ..., A_n), based on one column describing the service (the format is CSV-style, but it doesn't matter).
The set of services is static, I don't have to handle addition/removal at the moment.
I was hoping to use Kafka Connect to perform this task but, surprisingly, there are no Kafka-to-Kafka source/sink connectors (I cannot find the tickets, but they have been rejected).
I have seen MirrorMaker2 but it looks like overkill for my (simple) use case.
I also know about Kafka Streams but I'd rather not write and maintain code just for that.
My question is: is there a way to achieve this topic dispatching with kafka native tools without writing a kafka-consumer/producer myself?
PS: if anybody thinks that MirrorMaker2 could be a good fit I am interested too, I don't know the tool very well.
As far as I know, there is no straightforward way to branch messages from an incoming topic to a list of topics based on the message content. You need to write custom code to achieve this.
Use the Processor API (a sketch follows below).
Pass the list of target topics into the processor.
Use your own logic to identify which topic a message needs to be branched to.
Use context.forward to publish the message to the chosen topic:
context.forward(key, value, To.child("selected topic"))
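Here is a rough sketch of that approach for the use case above (topic A dispatched to per-service topics A_1, ..., A_n), using the classic org.apache.kafka.streams.processor.Processor API that To.child belongs to; the service names, sink names, and the assumption that the service is the first CSV column are illustrative only:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.To;

public class DispatchProcessor implements Processor<String, String> {
    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
    }

    @Override
    public void process(String key, String value) {
        // Assumption: the service name is the first CSV column of the value.
        String service = value.split(",")[0];
        // Forward only to the named child sink for that service.
        context.forward(key, value, To.child("sink-" + service));
    }

    @Override
    public void close() {
    }

    // Wire the processor between a source on topic A and one named sink per service topic.
    public static Topology buildTopology() {
        Topology topology = new Topology();
        topology.addSource("source", Serdes.String().deserializer(), Serdes.String().deserializer(), "A");
        topology.addProcessor("dispatch", DispatchProcessor::new, "source");
        topology.addSink("sink-billing", "A_billing", "dispatch"); // hypothetical service names
        topology.addSink("sink-orders", "A_orders", "dispatch");
        return topology;
    }
}

Each To.child name has to match the name of a sink added with addSink, and the output topics themselves must already exist on the cluster.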
MirrorMaker is for doing ... mirroring. It's useful when you want to mirror one cluster from one data center to another, with the same topics. Your use case is different.
Kafka Connect is for syncing different systems (data from databases, for example) through Kafka topics, but I don't see it fitting this use case either.
I would use a Kafka Streams application for that.
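For what it's worth, a minimal sketch of such a Streams application, assuming string key/value serdes and that the service name is the first CSV column of the value; the application id, broker address, and column position are illustrative:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class TopicDispatcher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "topic-dispatcher");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("A")
               // Route each record to "A_<service>", taking the service from the first CSV column.
               .to((key, value, recordContext) -> "A_" + value.split(",")[0]);

        new KafkaStreams(builder.build(), props).start();
    }
}

Note that Kafka Streams will not create dynamically named output topics, so A_1, ..., A_n have to exist beforehand.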
All the other answers are right; at the time of writing I did not find any "config-only" solution in the Kafka toolset.
What finally did the trick was to use Logstash, as its kafka output plugin supports field references in the topic_id parameter.
So once you have the target topic name available in a field (say service_name), it's as simple as this:
output {
  kafka {
    id => "sink"
    codec => [...]
    bootstrap_servers => [...]
    topic_id => "%{[service_name]}"
    [...]
  }
}
I'm trying to implement a Kafka producer/consumer model, and am deliberating whether creating a separate publisher thread per topic would be preferred over having a single publisher handle multiple topics. Any help would be appreciated
PS: I'm new to Kafka
By separate publisher thread, I think you mean separate producer objects. If so..
Since messages are stored as key-value pairs in Kafka, different topics can have different key-value types.
So if your Kafka topics have different key-value types like for example..
Topic1 - key:String, value:Student
Topic2 - key:Long, value:Teacher
and so on, then you should be using multiple producers. This is because the KafkaProducer class asks you for the key and value serializers at the time you construct the object.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.serialization.LongSerializer;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092"); // required; point at your brokers
props.put("key.serializer", StringSerializer.class);
props.put("value.serializer", LongSerializer.class);
KafkaProducer<String, Long> producer = new KafkaProducer<>(props);
Though you may also write a generic serializer for all the types! But it is better to know beforehand what we are doing with the producer.
I prefer the Keep It Simple, Stupid (KISS) approach for the sake of obvious reasons: one producer (or multiple producers) per topic.
From Wikipedia,
The KISS principle states that most systems work best if they are kept simple rather than made complicated; therefore, simplicity should be a key goal in design, and unnecessary complexity should be avoided.
As for the possibility of one producer supporting multiple topics, I would not go down that route either.
Starting with version 2.5, you can use a RoutingKafkaTemplate to select the producer at runtime, based on the destination topic name.
https://docs.spring.io/spring-kafka/reference/html/#routing-template
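For reference, a rough sketch along the lines of the Spring Kafka documentation; the "binary-.*" topic pattern and the ByteArraySerializer are just an illustrative example of routing some topics to a differently configured producer:

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.support.GenericApplicationContext;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.core.RoutingKafkaTemplate;

@Configuration
public class RoutingTemplateConfig {

    @Bean
    public RoutingKafkaTemplate routingTemplate(GenericApplicationContext context,
            ProducerFactory<Object, Object> pf) {
        // Clone the auto-configured factory with a different value serializer
        // and register it so Spring manages its lifecycle.
        Map<String, Object> configs = new HashMap<>(pf.getConfigurationProperties());
        configs.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
        DefaultKafkaProducerFactory<Object, Object> bytesPF = new DefaultKafkaProducerFactory<>(configs);
        context.registerBean("bytesPF", DefaultKafkaProducerFactory.class, () -> bytesPF);

        // The first matching pattern wins, so order entries from most to least specific.
        Map<Pattern, ProducerFactory<Object, Object>> map = new LinkedHashMap<>();
        map.put(Pattern.compile("binary-.*"), bytesPF); // topics needing the byte-array producer
        map.put(Pattern.compile(".+"), pf);             // default factory for everything else
        return new RoutingKafkaTemplate(map);
    }
}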
A single publisher can handle multiple topics, and you can customize the producer config as per each topic's needs.
I think a separate producer thread for each topic would be preferred because, if a particular producer goes down for some reason, only the respective topic is impacted and all the remaining topics keep working smoothly without any problem.
If we create one publisher for all topics and it goes down for some reason, all the topics are impacted.
Suppose I have a Kafka topic named account with several message types (each one with a different Avro schema), like account.created, account.deleted and so on.
I would like to understand if it is feasible (and if it makes sense) to publish/receive different types on the same topic with Spring Cloud Stream. In particular, it would be very useful to have several @StreamListener methods, each one dedicated to a particular type. According to this blog post, this is really useful when there is a need to order messages because they are related to the same entity. What is an example of the configuration in this case?
I think you are talking about content-based routing, which allows messages to be delivered to a specific @StreamListener when there are multiple.
You do so by using the condition attribute. Please refer to this section for more details, and let us know if it is still unclear or not what you're looking for.
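A rough sketch of what that could look like with the annotation-based model; the "type" header and the Avro-generated AccountCreated/AccountDeleted classes are hypothetical and depend on what your producer actually sets:

import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Sink;

// Assumes the producer puts an event-type discriminator in a "type" header.
@EnableBinding(Sink.class)
public class AccountListeners {

    @StreamListener(target = Sink.INPUT, condition = "headers['type']=='account.created'")
    public void onCreated(AccountCreated event) {   // hypothetical Avro-generated class
        // handle account creation
    }

    @StreamListener(target = Sink.INPUT, condition = "headers['type']=='account.deleted'")
    public void onDeleted(AccountDeleted event) {   // hypothetical Avro-generated class
        // handle account deletion
    }
}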
I started using Kafka recently and am evaluating it for a few use cases.
If we wanted to provide the capability to filter messages for consumers (subscribers) based on message content, what is the best approach for doing this?
Say a topic named "Trades" is exposed by a producer, with details of different trades such as market name, creation date, price, etc.
Some consumers are interested in trades for specific markets, others in trades after a certain date, etc. (content-based filtering).
As filtering is not possible on the broker side, what is the best possible approach for implementing the cases below:
If the filtering criteria are specific to a consumer, should we use a Consumer-Interceptor (though interceptors are suggested for logging purposes as per the documentation)?
If the filtering criteria (content-based filtering) are common among consumers, what should be the approach?
Listen to the topic, filter the messages locally, and write them to a new topic (using either an interceptor or Streams)?
If I understand your question correctly, you have one topic and different consumers which are interested in specific parts of the topic. At the same time, you do not own those consumers and want to avoid having them read the whole topic and do the filtering themselves?
For this, the only way to go is to build a new application that reads the whole topic, does the filtering (or actually splitting), and writes the data back into two (or multiple) different topics. The external consumers would consume from those new topics and only receive the data they are interested in.
Using Kafka Streams for this purpose would be a very good way to go. The DSL should offer everything you need.
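As a rough sketch of that idea, assuming string key/value serdes and a CSV-style value of "market,creationDate,price,..."; the output topic names, market, and date cutoff are illustrative:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class TradesSplitter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "trades-splitter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> trades = builder.stream("Trades");

        // One output topic per market the consumers care about.
        trades.filter((key, value) -> value.startsWith("NASDAQ,"))
              .to("Trades-NASDAQ");

        // Another output topic for trades after a certain date (second CSV column).
        trades.filter((key, value) -> value.split(",")[1].compareTo("2020-01-01") >= 0)
              .to("Trades-Recent");

        new KafkaStreams(builder.build(), props).start();
    }
}

The external consumers would then subscribe to Trades-NASDAQ or Trades-Recent instead of Trades.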
As an alternative, you can just write your own application using KafkaConsumer and KafkaProducer to do the filtering/splitting manually in your user code. This would not be much different from using Kafka Streams, as a Kafka Streams application would do the exact same thing internally. However, with Streams your effort to get it done would be way less.
I would not use interceptors for this. Even if this would work, it seems not to be a good software design for your use case.
Create your own interceptor class that implements org.apache.kafka.clients.consumer.ConsumerInterceptor, implement your logic in the onConsume method, and then set the interceptor.classes config for the consumer.
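A rough sketch of such an interceptor; the market-name check and the class name are made up, and keep in mind that records dropped here are still considered consumed as far as offsets are concerned:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.ConsumerInterceptor;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

// Register with: props.put("interceptor.classes", MarketFilterInterceptor.class.getName());
public class MarketFilterInterceptor implements ConsumerInterceptor<String, String> {

    @Override
    public ConsumerRecords<String, String> onConsume(ConsumerRecords<String, String> records) {
        Map<TopicPartition, List<ConsumerRecord<String, String>>> filtered = new HashMap<>();
        for (TopicPartition tp : records.partitions()) {
            List<ConsumerRecord<String, String>> kept = records.records(tp).stream()
                    .filter(r -> r.value().contains("NASDAQ")) // hypothetical content check
                    .collect(Collectors.toList());
            if (!kept.isEmpty()) {
                filtered.put(tp, kept);
            }
        }
        return new ConsumerRecords<>(filtered);
    }

    @Override
    public void onCommit(Map<TopicPartition, OffsetAndMetadata> offsets) {
    }

    @Override
    public void close() {
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }
}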