I am new to the Kafka ecosystem. In my case I am using a Java producer but have no need to send a key along with the record value, which is serialized Avro. Is there a way to build a Java producer that will not send keys, or are keys a requirement when sending messages in Kafka?
Like @gasparms said, there are built-in ways to produce without sending a key. Most people use Kafka this way, since they just want to send a stream of messages with no key. Using keys is only really required if you need log compaction.
Here's a really good explanation -
https://stackoverflow.com/a/29515696/236528
ProducerRecord has several constructors; one of them doesn't take a key, so you don't have to specify it.
Example:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("retries", 0);
props.put("batch.size", 16384);
props.put("linger.ms", 1);
props.put("buffer.memory", 33554432);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, String> producer = new KafkaProducer<>(props);
for(int i = 0; i < 100; i++)
producer.send(new ProducerRecord<>("my-topic", "myValue"));
producer.close();
Related
I use Kafka Streams with the exactly-once guarantee; however, when I debug it, producer.commitTransaction doesn't execute and the group gets rebalanced, and I always end up with duplicated outputs.
My application just filters and re-routes topic messages (a sketch of the topology follows the config below).
The config looks like this:
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, applicationId);
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootStrapServers);
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.Long().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.ByteArray().getClass());
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, "5");
props.put(StreamsConfig.producerPrefix(ProducerConfig.LINGER_MS_CONFIG), "5");
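For context, the topology itself is essentially a filter followed by a re-route. A minimal sketch, assuming the config above; the topic names and the predicate are placeholders, not my actual code:
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

// Filter-and-reroute topology; "input-topic", "output-topic" and the predicate are illustrative.
StreamsBuilder builder = new StreamsBuilder();
KStream<Long, byte[]> source = builder.stream("input-topic");
source.filter((key, value) -> value != null)  // placeholder predicate
      .to("output-topic");

KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();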
I am using Kafka with Spring Boot. I read many articles on Kafka tuning and made some configuration changes on both the producer and consumer sides.
But Kafka still takes 1.10 seconds to deliver one message from producer to consumer. Kafka is supposed to be very good at low latency, but I am not able to achieve that.
ProducerConfig:
@Bean
public Map<String, Object> producerConfigs() {
Map<String, Object> props = new HashMap<>();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 20000);
props.put(ProducerConfig.LINGER_MS_CONFIG, 1000);
return props;
}
ConsumerConfig:
@Bean
public Map<String, Object> consumerConfigs() {
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ConsumerConfig.RECEIVE_BUFFER_CONFIG, 502400);
return props;
}
Some points I want to highlight:
Producer: for low latency, batch.size and linger.ms are very crucial in the producer configuration. I tried to set them as high as possible.
Consumer: for high throughput, I increased the receive buffer size.
I made these two changes but still cannot achieve what I expected from Kafka.
How can I achieve low latency with Kafka? Is there any specific configuration for the producer and consumer?
Any suggestions would help.
There is a nice white paper on Optimizing Your Kafka Deployment by Confluent which explains in full detail how to optimize the configuration for:
latency
throughput
durability
availability
You are interested in reducing latency. Your current configuration can lead to high latency because linger.ms is set to 1000 (1 second), so the producer waits up to a second before sending a batch. In addition, batch.size is quite high, so the batch rarely fills up early and the full linger.ms delay is usually hit.
To reduce your latency, you can force your producer to send messages without any delay and irrespective of their size by setting linger.ms=0.
Overall, the white paper gives the following recommendations, which also match what I have experienced in practice:
Producer:
linger.ms=0
compression.type=none
acks=1
Consumer:
fetch.min.bytes=1
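Applied to the producer bean from the question, this would look roughly as follows. This is only a sketch with the latency-oriented values; acks=1 trades some durability for latency, so keep your original setting if you need stronger guarantees:
@Bean
public Map<String, Object> producerConfigs() {
    Map<String, Object> props = new HashMap<>();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    // Send immediately instead of waiting up to 1 second for a batch to fill.
    props.put(ProducerConfig.LINGER_MS_CONFIG, 0);
    // No compression, so no time is spent compressing batches.
    props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "none");
    // Wait only for the leader's acknowledgement.
    props.put(ProducerConfig.ACKS_CONFIG, "1");
    return props;
}
On the consumer side, fetch.min.bytes=1 is already the default, so no change is needed there.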
I am trying the latest Kafka version, 1.1.0.
There is a point about producer behavior that bothers me.
Below is a small piece of code:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("retries", 3);
props.put("max.in.flight.requests.per.connection", 1);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, String> producer = new KafkaProducer<>(props);
for (int i = 0; i < 100; i++)
    producer.send(new ProducerRecord<>("my-topic", Integer.toString(i), Integer.toString(i)),
            (metadata, exception) -> { /* completion callback */ });
Assumptions:
Each message is sent to the same partition of a topic.
Each message is large enough that it is submitted to the broker right away (not held in the buffer).
Now, if the send for index 0 fails but the subsequent sends succeed, will the messages reach the broker out of sequence? That is, the index 0 message would not be the first to reach the broker, even though retries are configured.
Will the behavior be the same if I add the configuration property below?
enable.idempotence=true
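For reference, in the snippet above that would just be one more property; this is only illustrative, and whether it changes the ordering behavior is exactly what I am asking:
// Added to the same Properties object as above; idempotence requires acks=all
// and retries > 0, which are already set in the snippet.
props.put("enable.idempotence", "true");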
Is there an elegant approach to handle this situation, i.e. to maintain the order of the messages?
Thanks in advance.
An MLlib model is trained somewhere and I want to send it somewhere else. When I try to send it through a Kafka topic like this:
val model = LogisticRegressionModel.load(sc, "/PATH/To/Model")
val producer = new Producer[String, LogisticRegressionModel](config)
val data = new KeyedMessage[String, LogisticRegressionModel](topic2, key, model)
producer.send(data)
producer.close()
I would encounter an error like this:
org.apache.spark.mllib.classification.LogisticRegressionModel cannot be cast to java.lang.String
So, is it possible to send non-string messages through a Kafka topic?
You can send non-string messages to a Kafka topic using the Kafka producer. From version 0.9.0 onwards it is better to use the Java client instead of the Scala client.
All you need to do is specify key and value serializers in the Properties that match your types; for a value that is not a String, such as a serialized model, use a byte-array serializer:
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
Kafka is confusing me. I am running it locally with standard values.
Only auto topic creation is turned on; 1 partition, 1 node, everything local and simple.
If I write
consumer.subscribe("test_topic");
consumer.poll(10);
It simply won't work and never finds any data.
If I instead assign a partition like
consumer.assign(new TopicPartition("test_topic",0));
and check the position, I am sitting at 995 and can now poll and receive all the data my producer put in.
What is it that I don't understand about subscriptions? I don't need multiple consumers each handling only part of the data; my consumer needs to get all the data of a certain topic. Why does the standard subscription approach shown in all the tutorials not work for me?
I do understand that partitions are for load balancing consumers. I just don't understand what I am doing wrong with the subscription.
consumer config properties
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "postproc-" + EnvUtils.getAppInst()); // jeder ist eine eigene gruppe -> kriegt alles
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.LongDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
KafkaConsumer<Long, byte[]> consumer = new KafkaConsumer<Long, byte[]>(props);
producer config
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("retries", 2);
props.put("batch.size", 16384);
props.put("linger.ms", 5000);
props.put("buffer.memory", 1024 * 1024 * 10); // 10mb
props.put("key.serializer", "org.apache.kafka.common.serialization.LongSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
return new KafkaProducer<>(props);
producer execution
try (ByteArrayOutputStream out = new ByteArrayOutputStream()){
event.writeDelimitedTo(out);
for (long a = 10; a<20;a++){
long rand=new Random(a).nextLong();
producer.send(new ProducerRecord<>("test_topic",rand ,out.toByteArray()));
}
producer.flush();
} catch (IOException e) {
    // exception handling omitted in the original snippet
}
consumer execution
consumer.subscribe(Arrays.asList("test_topic"));
ConsumerRecords<Long,byte[]> records = consumer.poll(10);
for (ConsumerRecord<Long,byte[]> r :records){ ...
I managed to solve the issue. The problem was timeouts: when polling, I didn't give it enough time to complete. I assume assigning a partition is simply much faster and therefore completed in time, while the standard subscription poll takes longer; it never actually finished and did not commit.
At least I think that was the problem. With longer timeouts it works.
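With a longer timeout the consumer loop looks like this; the 1000 ms value is only illustrative, just long enough for the initial group join and fetch:
consumer.subscribe(Arrays.asList("test_topic"));
while (true) {
    // 10 ms was too short for the initial join/rebalance and the first fetch.
    ConsumerRecords<Long, byte[]> records = consumer.poll(1000);
    for (ConsumerRecord<Long, byte[]> r : records) {
        // process r.key() / r.value()
    }
}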
You are missing this property, I think:
auto.offset.reset=earliest
What to do when there is no initial offset in Kafka or if the current
offset does not exist any more on the server (e.g. because that data
has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the consumer's group
anything else: throw exception to the consumer.
Reference: http://kafka.apache.org/documentation.html#highlevelconsumerapi
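Applied to the consumer properties from the question, that is a single extra line; earliest is what you want here, since you want to read everything already in the topic:
props.put("auto.offset.reset", "earliest"); // start from the beginning when the group has no committed offset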