I have multiple producers writing to a single topic, which is the default as defined in policy. Is it possible to create a new topic without changing the default topic? In other words, can one producer send the same logs to multiple topics?
Yes, one producer can produce to multiple topics. The relation between the topic and a producer is not one-to-one.
Example:
producer.send(new ProducerRecord<String, String>("my-topic", "key", "val"));
The send() method takes a ProducerRecord, which contains the topic name, so we can pass a different topic name to each send() call.
However, the key.serializer and value.serializer settings matter: we specify only one key.serializer and one value.serializer per producer, not per topic.
This being the case, all messages sent by that producer must be serializable with those serializers.
If you want to support different objects, either write a custom serializer that is common to all of them (perhaps a JSON serializer) or convert your objects to a format your serializers can handle (for example, String for StringSerializer, byte[] for ByteArraySerializer, etc.).
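For illustration, here is a minimal sketch of one producer sending to two topics with the same String serializers; the broker address and topic names are hypothetical:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Minimal sketch: one producer, one serializer pair, several topics.
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

Producer<String, String> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("my-topic", "key", "val"));
producer.send(new ProducerRecord<>("my-other-topic", "key", "val")); // same producer, different topic
producer.close();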
Related
I have a larger number of producers, say around 200, and each producer has 4 different kinds of data. Is it more efficient to produce data from all producers to the same topic, or to configure a different topic for each producer?
I want to collect each producer's data separately at the consumer end.
What are the available ways to handle data at the consumer end?
There's no difference. The broker just stores bytes; it has no knowledge of "type".
The main problem with putting everything into one topic is that your consumers will need to know how to deserialize and process those bytes consistently. That is, setting value.deserializer might not give you a concrete class but rather some generic type, such as a JSON string or an Avro GenericRecord, which then needs extra parsing and large switch-case logic in your consumer, as in the hypothetical sketch below.
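A sketch of what that consumer-side dispatch could look like; the "type" field, the handle* methods, and the use of Jackson are illustrative assumptions, not part of the question:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

// Hypothetical sketch: dispatching mixed JSON messages from one topic.
// 'consumer' is an already-configured KafkaConsumer<String, String>.
ObjectMapper mapper = new ObjectMapper();
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
    JsonNode json = mapper.readTree(record.value()); // may throw JsonProcessingException
    switch (json.path("type").asText()) {
        case "order":   handleOrder(json);   break;
        case "payment": handlePayment(json); break;
        default:        handleUnknown(json); break;
    }
}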
I have a use case to broadcast a specific type of message to all partitions. I explored a custom partitioner, but it doesn't support broadcasting to all partitions. I am using a custom partitioner to forward other types of messages to specific partitions.
I want to know: is there any way on the Kafka side to support broadcasting to all partitions?
Ideas around custom solutions are also welcome:
One way is to have the Kafka producer send the message to each partition individually, but if the number of partitions is large and the number of broadcast messages is high, this can become a bottleneck or add latency overhead. This holds whether we use Kafka Streams or the plain Kafka producer.
Producer<String, String> producer = new KafkaProducer<>(props);
for (int partition = 0; partition < totalNoOfPartitions; partition++) {
    producer.send(new ProducerRecord<String, String>("Test", partition, "Hello", "World!"));
}
producer.close();
I understand that duplicating data can be a concern here, but let's ignore that factor; we are fine with duplicate data on the Kafka cluster.
Please help if there is better way than what is proposed in this post.
In older versions of Kafka, this is not easily possible. You would need to "replicate" the message manually inside your Kafka Streams application and send each copy to a different partition using a custom partitioner.
In the upcoming Kafka 3.4 release, there will be built-in support for multicasting/broadcasting a message to multiple partitions via KIP-837. The StreamPartitioner interface gains a new method Optional<Set<Integer>> partitions(String topic, K key, V value, int numPartitions) that lets you return the set of partitions a single record should be written into (instead of a single partition, as in the old interface).
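A minimal sketch of such a broadcast partitioner, assuming Kafka Streams 3.4+ with the KIP-837 interface:

import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import org.apache.kafka.streams.processor.StreamPartitioner;

// Sketch: write every record to all partitions of the output topic.
public class BroadcastPartitioner<K, V> implements StreamPartitioner<K, V> {

    @Override
    @Deprecated
    public Integer partition(String topic, K key, V value, int numPartitions) {
        return null; // unused: partitions() below takes precedence
    }

    @Override
    public Optional<Set<Integer>> partitions(String topic, K key, V value, int numPartitions) {
        // Return every partition id so each record is written to all of them.
        return Optional.of(IntStream.range(0, numPartitions)
                                    .boxed()
                                    .collect(Collectors.toSet()));
    }
}

An instance could then be supplied when writing to the output topic, for example via Produced.streamPartitioner(...).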
The use case is: I have a Kafka Streams app that consumes from an input topic and outputs to an intermediate topic; then, in the same Streams app, another topology consumes from this intermediate topic.
Whenever the application id is updated, both topics start to be consumed from the earliest offset. I want to change auto.offset.reset for the intermediate topic to latest while keeping it at earliest for the input topic.
Yes. You can set the reset strategy for each topic via:
// Processor API
topology.addSource(AutoOffsetReset offsetReset, String name, String... topics);
// DSL
builder.stream(String topic, Consumed.with(AutoOffsetReset offsetReset));
builder.table(String topic, Consumed.with(AutoOffsetReset offsetReset));
All those methods have overloads that allow you to set it.
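For example, a minimal DSL sketch for the use case above; the topic names are hypothetical:

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();

// Input topic: fall back to the earliest offset when no committed offset exists
KStream<String, String> input =
        builder.stream("input-topic", Consumed.with(Topology.AutoOffsetReset.EARLIEST));

// Intermediate topic: fall back to the latest offset instead
KStream<String, String> intermediate =
        builder.stream("intermediate-topic", Consumed.with(Topology.AutoOffsetReset.LATEST));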
A partition can be assigned by the producer using a number, for example:
kafkaTemplate.send(topic, 1, "[" + LocalDateTime.now() + "]" + "Message to partition 1");
The second parameter, 1, defines the partition id where I want my message to be sent, so the consumer can consume this message:
TopicPartition partition1 = new TopicPartition(topic, 1);
consumer1.assign(Arrays.asList(partition1));
But how do I achieve this when the producer chooses a partition based on the hash of the key, using the DefaultPartitioner? Example:
kafkaTemplate.send(topic, "forpartition1", "testkey");
Here the key is "forpartition1". How do I assign my consumer to consume from the partition derived from the hash of the key "forpartition1"? Do I compute the hash for that key again in the consumer, or are there other ways to achieve that? I am pretty new to this technology.
Based on the information that you are new to Kafka, I am tempted to guess you are unintentionally trying an advanced use case and that is probably not what you want.
The common use case is that you publish messages to a topic. Each message gets assigned a partition based on its key, and all messages with the same key end up in the same partition.
On the consumer, you subscribe to the whole topic (without explicitly asking for a partition) and Kafka will handle the distribution of partitions between all the consumers available.
This gives you the guarantee that all messages with a specific key will be processed by the same consumer (they all go to the same partition and only one consumer handles each partition) and in the same order they were sent.
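A minimal sketch of that common pattern; the topic name is hypothetical, and props is assumed to contain bootstrap.servers, deserializers, and a group.id:

import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-topic")); // no explicit partition
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        // All records with the same key arrive here in order.
        System.out.printf("partition=%d key=%s value=%s%n",
                record.partition(), record.key(), record.value());
    }
}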
If you really want to choose the partition yourself, you can write a partitioner class and configure your producer to use it by setting the partitioner.class configuration (a sketch follows the references below).
From the Kafka Documentation
NAME: partitioner.class
DESCRIPTION: Partitioner class that implements the org.apache.kafka.clients.producer.Partitioner interface.
TYPE: class
DEFAULT: org.apache.kafka.clients.producer.internals.DefaultPartitioner
VALID VALUES:
IMPORTANCE: medium
A few example tutorials on how to do it can be found online. Here's a sample for reference:
Write An Apache Kafka Custom Partitioner
Apache Kafka Foundation Course - Custom Partitioner
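For orientation, here is a minimal custom-partitioner sketch; the routing rule (a key prefix) is a hypothetical example, not something prescribed by Kafka:

import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class KeyPrefixPartitioner implements Partitioner {

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        // Hypothetical rule: route keys with a known prefix to partition 1.
        if (key != null && key.toString().startsWith("forpartition1")) {
            return 1;
        }
        // Otherwise spread records by key hash (mask keeps the value non-negative).
        return key == null ? 0 : (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    @Override
    public void close() { }
}

You would then register it on the producer with props.put("partitioner.class", KeyPrefixPartitioner.class.getName());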
Previously I have been using the 0.8 API. When you pass a topics list to it, it returns a map of streams (one entry per topic). This allows me to spawn a separate thread and assign each topic's stream to it. Since there is a lot of data in each topic, spawning a separate thread helps with multitasking.
//0.8 code sample
Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap =
consumer.createMessageStreams(topicCountMap);
I want to upgrade to 0.10. I checked the KafkaStreams and KafkaConsumer classes. The KafkaConsumer object takes config properties and provides a subscribe method that takes a topics list, but its return type is void. I cannot find a way to get a handle on each topic.
KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
consumer.subscribe(topicsList);
ConsumerRecords<byte[], byte[]> records = consumer.poll(100);
KafkaStreams on the other hand seems to have the same problem.
KStreamBuilder builder = new KStreamBuilder();
String [] topics = new String[] {"topic1", "topic2"};
KStream<String, String> source = builder.stream(stringSerde, stringSerde, topics);
KafkaStreams streams = new KafkaStreams(builder, props);
streams.start();
There is a source.foreach() method available, but it is a stream over all topics. Anyone, any ideas?
First, using a multi-threaded consumer is tricky, so the pattern you employed in 0.8 was hopefully well designed :)
Best practice is to use a single-threaded consumer, and thus there is "no need" to separate different topics if a single consumer subscribes to a list of topics at once. Nevertheless, when consuming a record, the record object provides information about which topic it originated from (it carries this metadata). Thus, you could theoretically dispatch each record to a different thread for the actual processing according to its topic (even if this is not recommended!), as sketched below.
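A minimal sketch of that dispatch idea; props, the topic names, and the worker queues are assumptions, and again, this is not the recommended pattern:

import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// One consumer thread polls; dedicated worker threads drain the per-topic queues.
BlockingQueue<ConsumerRecord<byte[], byte[]>> topic1Queue = new LinkedBlockingQueue<>();
BlockingQueue<ConsumerRecord<byte[], byte[]>> topic2Queue = new LinkedBlockingQueue<>();

KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("topic1", "topic2"));
while (true) {
    ConsumerRecords<byte[], byte[]> records = consumer.poll(100);
    for (ConsumerRecord<byte[], byte[]> record : records) {
        // The record itself tells us which topic it came from.
        if ("topic1".equals(record.topic())) {
            topic1Queue.offer(record); // handled by the topic1 worker thread
        } else {
            topic2Queue.offer(record); // handled by the topic2 worker thread
        }
    }
}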
Kafka scales out via partitions, thus, if a single-threaded consumer is not able to handle the load, you should start multiple consumers (as a consumer group) to scale out your consumer processing capacity.
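For reference, a minimal sketch of the consumer-group setup; all values are hypothetical:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

// Consumers that share a group.id form a consumer group, and the topic's
// partitions are split among them.
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group"); // same id => same group
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");
// Start N KafkaConsumer instances (in threads or separate processes) with
// these props; each one will receive a disjoint subset of the partitions.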
A more general question: if you want to process data per topic, why not use multiple consumers, each subscribing to a single topic?
Last but not least, in Apache Kafka 0.10+ the Kafka Streams API is a newly introduced stream processing library; it must not be confused with the 0.8 KafkaStream class (hint: there is no "s"). The two are completely unrelated to each other.