Is there any option to disable Kafka headers being consumed by the consumer? In my case I wrote a consumer to consume messages from a Kafka topic published by an upstream system. My processing doesn't require any information from the headers, and the published headers are heavyweight (bigger than the message itself), so my consumer is taking longer than expected.
Is there any option to consume only the message content, leaving out the headers, so that the time to transfer the headers over the network and deserialize them at the consumer is saved? Your help is appreciated.
Every message is a Record with Headers (as of Kafka 0.11).
length: varint
attributes: int8
    bit 0~7: unused
timestampDelta: varlong
offsetDelta: varint
keyLength: varint
key: byte[]
valueLen: varint
value: byte[]
Headers => [Header]

Record Header
headerKeyLength: varint
headerKey: String
headerValueLength: varint
Value: byte[]
Even if you skip deserializing them, they will still be sent over the wire as part of each Record in the fetch response.
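For illustration (not from the thread): on the Java consumer the headers show up on every ConsumerRecord whether you use them or not. A minimal sketch, assuming an already-configured KafkaConsumer<String, String> named consumer (imports omitted):

// Headers arrive with every record; ignoring them does not stop them from being fetched.
for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
    for (Header header : record.headers()) {   // org.apache.kafka.common.header.Header
        System.out.println(header.key() + " -> " + header.value().length + " bytes");
    }
}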
You could try using a Kafka 0.10.2 client version, for example, which might drop the headers entirely because they simply weren't part of the API yet, but YMMV.
As mentioned in the comments, the most reliable option here would be to stop sending such heavy information from the upstream application. A middle ground would be to compress and/or binary-encode that data.
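Purely as an illustration of that middle ground (the class, method, and header names here are placeholders, not from the thread), the upstream producer could gzip individual header values before attaching them:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

import org.apache.kafka.clients.producer.ProducerRecord;

public class HeaderCompression {

    // Gzip a raw header value so less data crosses the wire per record.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    // Attach the compressed bytes as the header value; the consumer would gunzip on read.
    static ProducerRecord<String, String> withCompressedHeader(
            ProducerRecord<String, String> record, String headerKey, byte[] rawHeaderValue) throws IOException {
        record.headers().add(headerKey, gzip(rawHeaderValue));
        return record;
    }
}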
A Kafka message has:
key, value, compression type, headers (key-value pairs, optional), partition + offset, timestamp
The key is hashed to a partition to determine which partition the producer will write to.
Then why do we need the partition as part of the message?
Also, how does the producer know the offset, since the offset seems more like a property of the Kafka server? And doesn't that cause coupling between the server and the producer?
And how would it work if multiple producers are writing to a topic, since the offsets sent by them may clash?
Why do we need the partition as part of the message?
It's optional for the client to set the record partition. The partition is still needed in the protocol because the key is not hashed server-side and then rerouted; the producer's partitioner picks the partition before the request is sent.
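For example, with the Java producer both of these are valid (assuming an existing KafkaProducer<String, String> named producer; the topic name and partition number are placeholders):

// Partition chosen client-side by the partitioner (hash of the key by default):
producer.send(new ProducerRecord<>("my-topic", "some-key", "some-value"));

// Partition pinned explicitly by the caller (partition 3, just as an example):
producer.send(new ProducerRecord<>("my-topic", 3, "some-key", "some-value"));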
How does the producer know the offset, since the offset seems more like a property of the Kafka server?
The producer only learns the offset from the RecordMetadata returned via the send callback (or the returned Future); it is not known at the time the batch is sent.
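A minimal sketch of that callback (assuming an existing KafkaProducer<String, String> named producer; the topic name is a placeholder):

producer.send(new ProducerRecord<>("my-topic", "key", "value"), (metadata, exception) -> {
    if (exception == null) {
        // The broker-assigned offset is only known once the send has completed.
        System.out.println("partition=" + metadata.partition() + ", offset=" + metadata.offset());
    } else {
        exception.printStackTrace();
    }
});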
And doesn't that cause coupling between the server and the producer?
Yes? It's the Kafka protocol. The consumer is also "coupled" to the server in the sense that it must understand how to communicate with it.
How would it work if multiple producers are writing to a topic, since the offsets sent by them may clash?
If max.in.flight.requests.per.connection is more than 1 and retries are enabled, then yes, batches may get reordered, but send requests are initially ordered, and clients do not set the record offset; the broker does.
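If strict ordering under retries matters, the usual knobs look like this (the values shown are common choices, not from the thread, and props is assumed to be an existing Properties object):

// Either cap in-flight requests at 1...
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
// ...or enable idempotence, which preserves ordering with up to 5 in-flight requests.
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);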
It is a business requirement that all the messages I consume from the Kafka topic contain an integrity seal, so that I can tell whether any changes were introduced to the message payload.
I was thinking I could possibly do this in a Kafka Connect transform.
This would require converting the payload, inside the transform, to the resulting JSON message format prior to sealing it, so that the result can be verified when the message is consumed.
My current issues are:
1) I am not sure how to convert the payload, while inside the transform, to the same JSON that will be output to the Kafka topic.
2) I am not sure of the best way to add the seal to the message. It would need to be placed in a consistent location (for example, first) so that it can be stripped easily/completely prior to validating the seal in the consumer.
Any thoughts, suggestions, or different approaches would be appreciated.
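Not from the thread, but as an illustration of one possible approach: a minimal sketch of a custom Single Message Transform that serializes the value with JsonConverter, computes an HMAC-SHA256 seal over those bytes, and carries the seal in a record header rather than inside the payload, so it can be read and stripped consistently. The class name, header name, and hard-coded key are placeholders.

import java.util.Base64;
import java.util.Map;

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.json.JsonConverter;
import org.apache.kafka.connect.transforms.Transformation;

public class SealTransform<R extends ConnectRecord<R>> implements Transformation<R> {

    private final JsonConverter converter = new JsonConverter();
    private SecretKeySpec key;

    @Override
    public void configure(Map<String, ?> configs) {
        // schemas.enable=false so the bytes match a plain-JSON value converter on the connector.
        converter.configure(Map.of("schemas.enable", "false"), false);
        // Placeholder key; in real code this would come from the transform's configuration.
        key = new SecretKeySpec("change-me".getBytes(), "HmacSHA256");
    }

    @Override
    public R apply(R record) {
        try {
            // Serialize the value the same way the JSON converter would on the way out.
            byte[] json = converter.fromConnectData(record.topic(), record.valueSchema(), record.value());
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            String seal = Base64.getEncoder().encodeToString(mac.doFinal(json));
            // Carry the seal in a header so the payload itself stays untouched;
            // the consumer recomputes the HMAC over the value bytes and compares.
            record.headers().addString("seal", seal);
            return record;
        } catch (Exception e) {
            throw new RuntimeException("Failed to seal record", e);
        }
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public void close() {
    }
}

Note that the sealed bytes only match the on-topic JSON if the connector's value converter is configured the same way (JsonConverter with schemas.enable=false); otherwise the seal and the delivered bytes will disagree.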
The producer serializes the message and sends it to the broker as byte arrays, and consumers deserialize those byte arrays. The broker always stores and passes byte arrays. This is how I understood it.
But when you use the REST Proxy in Kafka, the producer encodes the message with base64, and the consumer decodes those base64 messages.
A Python example of a Producer and a Consumer:
# Producer using the REST Proxy
import base64

payload = {"records":
    [{
        "key": base64.b64encode(b"firstkey").decode("utf-8"),
        "value": base64.b64encode(b"firstvalue").decode("utf-8")
    }]}

# Consumer using the REST Proxy
print("Message Key: " + base64.b64decode(message["key"]).decode("utf-8"))
Why does it send the message in base64 to the broker instead of as byte arrays?
When using the REST Proxy, does a broker store messages in base64 format?
When a producer wants to send a message 'Man', it serializes it into bytes (bits). A broker will store it as 010011010110000101101110. When a consumer gets this message, it will deserialize it back to Man.
However, according to the Confluent documentation:
Data formats - The REST Proxy can read and write data using JSON, raw bytes encoded with base64 or using JSON-encoded Avro.
Therefore, a producer using the REST Proxy will change the message Man into TWFu (base64-encoded) and send this to the broker, and a consumer using the REST Proxy will base64-decode it back to Man.
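As a quick check of that round trip (Java here, just to verify the encoding):

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Check {
    public static void main(String[] args) {
        String encoded = Base64.getEncoder().encodeToString("Man".getBytes(StandardCharsets.UTF_8));
        System.out.println(encoded);  // TWFu
        byte[] decoded = Base64.getDecoder().decode(encoded);
        System.out.println(new String(decoded, StandardCharsets.UTF_8));  // Man
    }
}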
As you already answered, the broker always stores the data in a binary format.
As for why base64 is needed, I found this in the Confluent documentation (https://www.confluent.io/blog/a-comprehensive-rest-proxy-for-kafka/):
The necessity of base64 encoding becomes clearer when you have to send raw binary data through the REST Proxy:
If you opt to use raw binary data, it cannot be embedded directly in JSON, so the API uses a string containing the base64 encoded data.
After gzip compression is enabled, will messages stored earlier also get compressed? And when messages are sent to the consumer, is the message content changed, or does Kafka uncompress it internally?
If you turn on broker-side compression, existing messages are unchanged; compression applies only to new messages. When consumers fetch the data, it is automatically decompressed, so you don't have to handle it on the consumer side. Just remember that this type of compression potentially comes with a CPU and latency cost.
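For reference, a minimal sketch of turning on topic-level compression with the Java AdminClient (the bootstrap address and topic name are placeholders):

import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class EnableTopicCompression {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");  // placeholder
            AlterConfigOp setGzip = new AlterConfigOp(
                    new ConfigEntry("compression.type", "gzip"), AlterConfigOp.OpType.SET);
            // Only records written after this change are stored compressed;
            // existing log segments are left as-is.
            admin.incrementalAlterConfigs(Map.of(topic, List.of(setGzip))).all().get();
        }
    }
}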
Below is my producer configuration. As you can see, the compression type is set to gzip; even though I set the compression type, why is the message not publishing, and why is it failing?
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, edi856KafkaConfig.getBootstrapServersConfig());
props.put(ProducerConfig.RETRIES_CONFIG, edi856KafkaConfig.getRetriesConfig());
props.put(ProducerConfig.BATCH_SIZE_CONFIG, edi856KafkaConfig.getBatchSizeConfig());
props.put(ProducerConfig.LINGER_MS_CONFIG, edi856KafkaConfig.getIntegerMsConfig());
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, edi856KafkaConfig.getBufferMemoryConfig());
// key/value serializers
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.IntegerSerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
props.put(Edi856KafkaProducerConstants.SSL_PROTOCOL, edi856KafkaConfig.getSslProtocol());
props.put(Edi856KafkaProducerConstants.SECURITY_PROTOCOL, edi856KafkaConfig.getSecurityProtocol());
props.put(Edi856KafkaProducerConstants.SSL_KEYSTORE_LOCATION, edi856KafkaConfig.getSslKeystoreLocation());
props.put(Edi856KafkaProducerConstants.SSL_KEYSTORE_PASSWORD, edi856KafkaConfig.getSslKeystorePassword());
props.put(Edi856KafkaProducerConstants.SSL_TRUSTSTORE_LOCATION, edi856KafkaConfig.getSslTruststoreLocation());
props.put(Edi856KafkaProducerConstants.SSL_TRUSTSTORE_PASSWORD, edi856KafkaConfig.getSslTruststorePassword());
// compression enabled here
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");
And the error I am getting is below:
org.apache.kafka.common.errors.RecordTooLargeException: The message is 1170632 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration.
2017-12-07_12:34:10.037 [http-nio-8080-exec-1] ERROR c.tgt.trans.producer.Edi856Producer - Exception while writing mesage to topic= '{}'
org.springframework.kafka.core.KafkaProducerException: Failed to send; nested exception is org.apache.kafka.common.errors.RecordTooLargeException: The message is 1170632 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration.
And what consumer configuration do we need to use if I want a string representation of the Kafka message on the consumer side?
Unfortunately you're encountering a rather odd issue with the new Producer implementation in Kafka.
Although the message size limit applied by Kafka at the broker level applies to a single compressed record set (potentially containing multiple messages), the new producer currently applies the max.request.size limit to the record prior to any compression.
This has been captured in https://issues.apache.org/jira/browse/KAFKA-4169 (created 14/Sep/16 and unresolved at time of writing).
If you are certain that the compressed size of your message (plus any overhead of the record set) will be smaller than the broker's configured max.message.bytes, you may be able to get away with increasing the value of the max.request.size property on your producer without having to change any configuration on the broker. That would let the producer accept the pre-compression payload size; the record would then be compressed and sent to the broker.
However, it is important to note that should the producer send a request that is too large for the broker's configuration, the broker will reject the message, and it will be up to your application to handle that correctly.
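If you go that route, the producer-side change is one more property on the Properties object from the question (the 2 MB value below is an arbitrary example, not a recommendation):

// Raise the producer's pre-compression size check; the broker's own
// message.max.bytes / max.message.bytes limits still apply to what is actually sent.
props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 2 * 1024 * 1024);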
Just read the error message :)
The message is 1170632 bytes when serialized which is larger than the maximum request size you have configured with the max.request.size configuration
The message is larger than 1 MB, which is the default maximum allowed by Apache Kafka. To allow larger messages, check the answers to How can I send large messages with Kafka (over 15MB)?