If Kafka producer compression is set (e.g. to gzip), and the broker configuration is set to the same codec, will the broker re-compress messages from the producer, or recognise that it's the same codec and skip any broker-side re-compression?
I'm aware that the broker can be configured to keep the producer's codec via the 'producer' setting. However, in our scenario we may have producers (out of our control) that may not set any compression, so we'd like to configure the broker with default compression enabled; for those producers that are in our control we'd prefer to use producer compression to save on network bandwidth and also to reduce load on the broker.
Setting topic compression to producer is equivalent to setting it to the same value you use in your producers.
Thus, to achieve what you need, just set the topic compression to the same algorithm you use in your producers. External producers that use the same compression algorithm will work the same as your internal producers, and the rest will trigger a potential decompression/recompression on the broker.
This article sums it up nicely:
https://newbedev.com/if-i-set-compression-type-at-topic-level-and-producer-level-which-takes-precedence
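As a concrete sketch (the topic name, bootstrap address and codec below are just examples), the topic-level codec can be set to match what your own producers send:

    # topic-level codec
    bin/kafka-configs.sh --bootstrap-server localhost:9092 \
      --entity-type topics --entity-name events \
      --alter --add-config compression.type=gzip

    # codec in your own producers (producer config)
    compression.type=gzip

With both set to gzip, batches from your own producers can be stored as written, while uncompressed (or differently compressed) batches from the external producers are recompressed to gzip by the broker.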
I have a situation where my producers currently do not use compression. The topic is configured with compression.type=lz4.
Say that I wanted to, at some point in the future, switch this configuration to the producer side, so that compression.type=producer and my producers use e.g. lz4.
My questions are:
Are there any special considerations for this scenario?
What if I were to choose another compression algorithm on the producer side down the road, e.g. zstd? Does Kafka retain the required metadata for this to be possible, or would I need to reprocess my topics so that a single compression algorithm is used over their lifetime?
When configuring ksqlDB, I can set the option ksql.streams.producer.compression.type, which enables compression for ksqlDB's internal producers. Thus, when I create a ksqlDB stream, its output topic will be compressed with the selected compression type.
However, as far as I understand, compression performance is heavily affected by how much batching the producer does. Therefore, I would like to configure the batch.size and linger.ms parameters for ksqlDB's producers. Does anyone know if and how these parameters can be set for ksqlDB?
Thanks to Matthias J Sax for answering my question on the Confluent Community Slack channel: https://app.slack.com/client/T47H7EWH0/threads?cdn_fallback=1
There is an info box in the documentation that explains it pretty well:
The underlying producer and consumer clients in ksqlDB's server can be modified with any valid properties. Simply use the form ksql.streams.producer.xxx, ksql.streams.consumer.xxx to pass the property through. For example, ksql.streams.producer.compression.type sets the compression type on the producer.
Source: https://docs.ksqldb.io/en/latest/reference/server-configuration/
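Following that pattern, batching can be tuned in the ksqlDB server properties file; the values below are only illustrative and should be adjusted for your workload:

    # ksql-server.properties
    ksql.streams.producer.compression.type=lz4
    ksql.streams.producer.batch.size=131072
    ksql.streams.producer.linger.ms=100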
After enabling gzip compression, will messages stored earlier also get compressed? And when messages are sent to a consumer, is the message content changed, or does Kafka internally uncompress it?
If you turn on broker-side compression, existing messages are unchanged; compression applies only to new messages. When consumers fetch the data, it is automatically decompressed, so you don't have to handle it on the consumer side. Just remember, this kind of compression potentially carries a CPU and latency cost.
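If you want to verify this, one way (the segment path below is just an example) is to dump a log segment and look at the codec reported for each batch; segments written before the change will still show the old codec:

    # inspect a segment file on the broker
    bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
      --files /var/kafka-logs/my-topic-0/00000000000000000000.log \
      --print-data-log | grep compresscodec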
Very recently we started to get MessageSizeTooLargeException on the metadata, so we enabled offsets.topic.compression.codec=1 to turn on gzip compression, but the overall bytes-in rate / messages-in rate to the broker hasn't changed. Am I missing something? Is there some other property that needs to be changed?
How does this codec work?
Do we need to add some property on the consumers and producers as well? I have only enabled this on the broker.
offsets.topic.compression.codec only applies to the internal offsets topic (namely __consumer_offsets). It has no effect on user-level topics.
See Kafka: Sending a 15MB message on how to avoid MessageTooLargeException/RecordTooLargeException.
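For reference, the size limits discussed in that question (the 15 MB figure below is just a placeholder) are controlled by a handful of configs on the broker/topic, producer and consumer:

    # broker (server.properties); the topic-level override is max.message.bytes
    message.max.bytes=15728640
    replica.fetch.max.bytes=15728640

    # producer
    max.request.size=15728640

    # consumer
    max.partition.fetch.bytes=15728640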
We have a use case where data loss is acceptable (think 30-50% loss acceptable). In an effort to reduce costs, we want to know if it is possible to configure Kafka with a replication factor of 1 such that consumers and producers can recover from broker failures by simply consuming from and producing to the partitions that are still available.
If this is possible, what are the configurations that need to be set?
There are other broker technologies that inherently behave this way; however, we would like to avoid introducing another technology, as Kafka is already part of our ecosystem.
If you create a new topic via bin/kafka-topics.sh, you need to specify the parameter --replication-factor; just set it to 1 to disable replication.
For existing topics, you can change the replication factor with a partition reassignment via bin/kafka-reassign-partitions.sh, supplying a reassignment that lists a single replica per partition (kafka-topics.sh --alter cannot change the replication factor).
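For example (the topic name, partition count and broker id below are placeholders):

    # create an unreplicated topic
    bin/kafka-topics.sh --bootstrap-server localhost:9092 \
      --create --topic my-topic --partitions 6 --replication-factor 1

    # reassignment.json: shrink an existing partition to a single replica on broker 1
    {"version":1,"partitions":[{"topic":"my-topic","partition":0,"replicas":[1]}]}

    bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
      --reassignment-json-file reassignment.json --execute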
For producers and consumers you might need to do some extra exception handling. For example, if you specify a dedicated partition when you write a record and that broker is not reachable, you might need to take care of this yourself (maybe just skip the write, or whatever is appropriate). But there is no specific configuration you need to set for your clients.
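That said, if you want sends to fail fast so the application can simply drop the record, the usual producer timeout/acks settings are the ones to look at; the values below are only examples, not recommendations:

    # producer config for a "drop on failure" posture
    acks=1
    retries=0
    max.block.ms=5000
    request.timeout.ms=5000
    delivery.timeout.ms=10000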