This question already has answers here:
How can I send large messages with Kafka (over 15MB)?
(9 answers)
Closed 6 months ago.
I have researched various configs, including answers here on Stack Overflow, but I have been stuck on this for several days, so I created a separate question. I am trying to configure Kafka to send large messages (10-50 MB). I run Kafka in Docker (image confluentinc/cp-kafka:7.2.1). I also understand that Kafka is not the best tool for this. I configure Kafka from Java as shown below and have restarted my Kafka Docker instance, but I still see this error:
org.apache.kafka.common.errors.RecordTooLargeException: The request
included a message larger than the max message size the server will
accept.
Below is the configuration I use (collected from Google and Stack Overflow). Here are my producer, consumer, and KafkaAdmin Java classes:
KafkaAdminConfig.java:
@Bean
public KafkaAdmin kafkaAdmin() {
    Map<String, Object> configProps = new HashMap<>();
    configProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
    configProps.put("max.message.bytes", String.valueOf(maxFileSize));
    configProps.put("max.request.size", maxFileSize);
    configProps.put("replica.fetch.max.bytes", maxFileSize);
    configProps.put("message.max.bytes", maxFileSize);
    configProps.put("max.message.bytes", maxFileSize);
    configProps.put("max.message.max.bytes", maxFileSize);
    configProps.put("max.partition.fetch.bytes", maxFileSize);
    return new KafkaAdmin(configProps);
}
ProducerConfig.java
@Bean
public ProducerFactory<String, Byte[]> producerFactoryLargeFiles() {
    Map<String, Object> configProps = new HashMap<>();
    configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
    configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
    // required to allow Kafka to process files <= 20 MB
    configProps.put("buffer.memory", maxFileSize);
    configProps.put("max.request.size", maxFileSize);
    configProps.put("replica.fetch.max.bytes", maxFileSize);
    configProps.put("message.max.bytes", maxFileSize);
    configProps.put("max.message.bytes", maxFileSize);
    configProps.put("acks", "all");
    configProps.put("retries", 0);
    configProps.put("batch.size", 16384);
    configProps.put("linger.ms", 1);
    return new DefaultKafkaProducerFactory<>(configProps);
}
ConsumerConfig.java
@Bean
public ConsumerFactory<String, String> consumerFactoryLargeFiles() {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    // required to allow Kafka to process files <= 20 MB
    props.put("fetch.message.max.bytes", maxFileSize);
    return new DefaultKafkaConsumerFactory<>(props);
}
maxFileSize is 104857600 (about 100 MB), and I am trying to send a message of about 3 MB.
I also added the following environment variables to my docker-compose file:
KAFKA_MAX_REQUEST_SIZE: 104857600
KAFKA_PRODUCER_MAX_REQUEST_SIZE: 104857600
CONNECT_PRODUCER_MAX_REQUEST_SIZE: 104857600
I will be happy to provide additional information or logs if needed.
I suggest uploading the file by other means and sending only the file name via Kafka.
I resolved this myself; maybe it will be helpful for others. I followed this Baeldung guide: https://www.baeldung.com/java-kafka-send-large-message. In short, I set:
Kafka topic config via Java
Kafka broker config via an environment variable, since I am using Docker:
KAFKA_MESSAGE_MAX_BYTES: 20971520
Kafka producer config via Java
Kafka consumer config via Java
A sketch of what those Java pieces can look like follows.
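For reference, here is a minimal sketch of the three Java pieces with Spring Kafka (TopicBuilder needs Spring Kafka 2.3+; the topic name, the 20971520 limit, and the bootstrapAddress/groupId fields are placeholders reused from the snippets above). The broker-side KAFKA_MESSAGE_MAX_BYTES setting is still required; the NewTopic bean is applied by the KafkaAdmin you already define:

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.config.TopicConfig;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.config.TopicBuilder;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.ProducerFactory;

// 20 MB; must not exceed the broker's message.max.bytes (KAFKA_MESSAGE_MAX_BYTES above)
private static final int MAX_MESSAGE_SIZE = 20971520;

@Bean
public NewTopic largeMessageTopic() {
    // Topic-level override of max.message.bytes
    return TopicBuilder.name("large-message-topic")
            .config(TopicConfig.MAX_MESSAGE_BYTES_CONFIG, String.valueOf(MAX_MESSAGE_SIZE))
            .build();
}

@Bean
public ProducerFactory<String, byte[]> largeMessageProducerFactory() {
    Map<String, Object> props = new HashMap<>();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
    // Client-side limit on the size of a single produce request
    props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, MAX_MESSAGE_SIZE);
    return new DefaultKafkaProducerFactory<>(props);
}

@Bean
public ConsumerFactory<String, byte[]> largeMessageConsumerFactory() {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);
    // The consumer must be allowed to fetch at least one full message per partition
    props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, MAX_MESSAGE_SIZE);
    props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, MAX_MESSAGE_SIZE);
    return new DefaultKafkaConsumerFactory<>(props);
}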
Related
Problem statement:
I need to handle exceptions that occur while consuming messages from Kafka:
Commit the failed offset.
Seek to the next unprocessed offset, so that the next poll starts from that offset.
It seems all of this is handled by SeekToCurrentErrorHandler in Spring Kafka.
How can I leverage this functionality in Spring Integration Kafka?
Please help with this.
Versions used:
Spring Integration Kafka - 3.3.1
Spring for Apache Kafka - 2.5.x
@Bean(name = "kafkaConsumerFactory")
public ConsumerFactory<String, String> consumerFactory() {
    Map<String, Object> properties = new HashMap<>();
    properties.put("bootstrap.servers", "kafkaServer1");
    properties.put("key.deserializer", StringDeserializer.class);
    properties.put("value.deserializer", StringDeserializer.class);
    properties.put("auto.offset.reset", "earliest");
    return new DefaultKafkaConsumerFactory<>(properties);
}

@Bean("customKafkaListenerContainer")
public ConcurrentMessageListenerContainer<String, String> customKafkaListenerContainer(
        ConsumerFactory<String, String> kafkaConsumerFactory) {
    ContainerProperties containerProps = new ContainerProperties("Topic1");
    containerProps.setGroupId("GroupId1");
    return new ConcurrentMessageListenerContainer<>(kafkaConsumerFactory, containerProps);
}

IntegrationFlows.from(Kafka.messageDrivenChannelAdapter(customKafkaListenerContainer,
        KafkaMessageDrivenChannelAdapter.ListenerMode.record)
        .errorChannel(errorChannel()))
    .handle(transformationProcessor, "process")
    .channel("someChannel")
    .get();
spring-integration-kafka uses spring-kafka underneath, so you just need to configure the adapter's container with the error handler.
spring-integration-kafka was moved into spring-integration starting with 5.4 (it was previously an extension), so the current version of both jars is 5.4.2.
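For example, here is a minimal sketch (assuming Spring for Apache Kafka 2.5.x and Spring Integration 5.4.x; kafkaConsumerFactory, Topic1, GroupId1 and transformationProcessor are the placeholders from the question): build the container from a ConcurrentKafkaListenerContainerFactory that carries a SeekToCurrentErrorHandler, then hand that container to the message-driven channel adapter.

import org.springframework.context.annotation.Bean;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.kafka.dsl.Kafka;
import org.springframework.integration.kafka.inbound.KafkaMessageDrivenChannelAdapter;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ConcurrentMessageListenerContainer;
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> containerFactory(
        ConsumerFactory<String, String> kafkaConsumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(kafkaConsumerFactory);
    // Retry each failed record twice, one second apart, then seek past it so the
    // next poll starts from the next unprocessed offset.
    factory.setErrorHandler(new SeekToCurrentErrorHandler(new FixedBackOff(1000L, 2L)));
    return factory;
}

@Bean
public IntegrationFlow kafkaInboundFlow(
        ConcurrentKafkaListenerContainerFactory<String, String> containerFactory) {
    ConcurrentMessageListenerContainer<String, String> container =
            containerFactory.createContainer("Topic1");
    container.getContainerProperties().setGroupId("GroupId1");
    return IntegrationFlows
            .from(Kafka.messageDrivenChannelAdapter(container,
                    KafkaMessageDrivenChannelAdapter.ListenerMode.record))
            .handle(transformationProcessor, "process")
            .get();
}

If you don't want failed records skipped silently, SeekToCurrentErrorHandler also accepts a recoverer (for example a DeadLetterPublishingRecoverer) in front of the BackOff.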
For the native Java Kafka client, there is a configuration property called enable.idempotence that can be set to true to enable an idempotent producer.
However, for Spring Kafka, I can't find a similar idempotence property in the KafkaProperties class.
So I am wondering: if I manually set this property in my Spring Kafka configuration file, will it take effect, or will Spring ignore it?
There are two ways to specify this property.
Via application.properties: you can use the following prefix to pass any additional producer-specific properties to the client:
spring.kafka.producer.properties.*= # Additional producer-specific properties used to configure the client.
If you have additional config that is common to both producer and consumer, use:
spring.kafka.properties.*= # Additional properties, common to producers and consumers, used to configure the client.
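For enable.idempotence specifically, that is a single line in application.properties:
spring.kafka.producer.properties.enable.idempotence=true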
Via code: you can also override and customize the configs:
@Bean
public ProducerFactory<String, String> producerFactory() {
    Map<String, Object> configProps = new HashMap<>();
    configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
    configProps.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
    configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    return new DefaultKafkaProducerFactory<>(configProps);
}

@Bean
public KafkaTemplate<String, String> kafkaTemplate() {
    return new KafkaTemplate<>(producerFactory());
}
You're trying to set a property that is not handled directly by Spring's KafkaProperties. If you look at the documentation, you can do the following:
Only a subset of the properties supported by Kafka are available directly through the KafkaProperties class.
If you wish to configure the producer or consumer with additional properties that are not directly supported, use the following properties:
spring.kafka.properties.prop.one=first
spring.kafka.admin.properties.prop.two=second
spring.kafka.consumer.properties.prop.three=third
spring.kafka.producer.properties.prop.four=fourth
spring.kafka.streams.properties.prop.five=fifth
https://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-messaging.html#boot-features-kafka-extra-props
Yannick
You can find it in ProducerConfig, since it is a producer configuration. To enable it, add the line below to your producer configs:
Properties producerProperties = new Properties();
producerProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
producerProperties.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
I have a requirement where I need to consume from a Kafka topic and write the messages to an MQ topic. Can someone advise me on the best way to do this? I am new to Kafka.
I have read about the IBM MQ connector on Confluent, but I could not work out how to implement it.
The best way to move data from Kafka to MQ is to use the IBM MQ sink connector: https://github.com/ibm-messaging/kafka-connect-mq-sink
This is a Kafka Connect connector. The README contains details for building and running it.
Kafka has a component called Kafka Connect. It is used to read and write data between Kafka and other systems, such as a database or, in your case, MQ.
Kafka Connect has two kinds of connectors:
Source connectors - read data from an external system and write it to Kafka (for example, read inserted/modified rows from a database table and write them to a Kafka topic).
Sink connectors - read messages from Kafka and write them to an external system.
The link you have added is a source connector; it will read messages from MQ and write them to Kafka.
For a simple use case you do not need Kafka Connect. You can write a simple Kafka consumer that reads data from a Kafka topic and writes it to MQ:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        // Insert code to append to MQ here
        System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
    }
}
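The "append to MQ" part could then be a small JMS publish, along these lines (a rough sketch; it assumes an IBM MQ JMS ConnectionFactory is already configured via the MQ client libraries, and MY.MQ.TOPIC is a placeholder topic name):

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.Topic;

public static void publishToMq(ConnectionFactory connectionFactory, String payload) throws JMSException {
    Connection connection = connectionFactory.createConnection();
    try {
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("MY.MQ.TOPIC"); // placeholder topic name
        MessageProducer producer = session.createProducer(topic);
        producer.send(session.createTextMessage(payload)); // one Kafka record value -> one MQ message
    } finally {
        connection.close();
    }
}

In a real application you would reuse the connection and session across records rather than opening one per message; at that point the Kafka Connect sink mentioned above is usually the more robust choice.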
I have two EC2 instances, one for a Kafka broker and the other for a Kafka consumer. How do I connect the two EC2 instances so that they can communicate with each other? If I produce a message on my broker, I need to receive it in the consumer.
Basically, I am looking for the part of the configuration where I need to give the consumer information to the broker EC2 instance, or vice versa (whichever way it works). Do I need to use some API or something?
I have tried this on a single-node cluster and it worked.
It does not matter whether you host your broker on EC2 or elsewhere, as long as it is reachable from the consumer.
Here is a sample consumer in Java using StringDeserializer for both key and value. You need to use the KafkaConsumer API if you are accessing Kafka from a Java program.
Properties props = new Properties();
props.put("bootstrap.servers", "YOUR_KAFKA_BROKER_ADDRESS");
props.put("group.id", "test");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
    }
}
https://kafka.apache.org/10/javadoc/?org/apache/kafka/clients/consumer/KafkaConsumer.html
If you're using Kafka across machines, you need to configure the listeners correctly. This article explains how: https://rmoff.net/2018/08/02/kafka-listeners-explained/
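In short, the broker must advertise an address that the consumer's EC2 instance can actually reach. For example, in the broker's server.properties (the hostname below is a placeholder; use your broker instance's DNS name or IP, and make sure the security group allows the port):
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://ec2-xx-xx-xx-xx.compute.amazonaws.com:9092
The consumer's bootstrap.servers (YOUR_KAFKA_BROKER_ADDRESS in the snippet above) should then point at that advertised address.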
where I need to give the consumer information to the broker
Brokers don't push messages to consumers, so you wouldn't give any consumer information to a broker.
Any code that works against a single broker should work against more than one, assuming the network settings are configured properly.
I am using Kafka 0.10.1.1 and Storm 1.0.2. In the Storm documentation for Kafka integration, I can see that offsets are still maintained using ZooKeeper, as we initialize the Kafka spout with ZooKeeper servers.
How can I bootstrap the spout using the Kafka servers instead? Is there an example of this?
Example from the Storm docs:
BrokerHosts hosts = new ZkHosts(zkConnString);
SpoutConfig spoutConfig = new SpoutConfig(hosts, topicName, "/" + topicName, UUID.randomUUID().toString());
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);
This option using ZooKeeper works fine and consumes the messages, but I was not able to see the consumer group or the Storm nodes as consumers in the Kafka Manager UI.
The alternate approach I tried is this:
KafkaSpoutConfig<String, String> kafkaSpoutConfig = newKafkaSpoutConfig();
KafkaSpout<String, String> spout = new KafkaSpout<>(kafkaSpoutConfig);

private static KafkaSpoutConfig<String, String> newKafkaSpoutConfig() {
    Map<String, Object> props = new HashMap<>();
    props.put(KafkaSpoutConfig.Consumer.BOOTSTRAP_SERVERS, bootstrapServers);
    props.put(KafkaSpoutConfig.Consumer.GROUP_ID, GROUP_ID);
    props.put(KafkaSpoutConfig.Consumer.KEY_DESERIALIZER, "org.apache.kafka.common.serialization.StringDeserializer");
    props.put(KafkaSpoutConfig.Consumer.VALUE_DESERIALIZER, "org.apache.kafka.common.serialization.StringDeserializer");
    props.put(KafkaSpoutConfig.Consumer.ENABLE_AUTO_COMMIT, "true");

    String[] topics = new String[1];
    topics[0] = topicName;

    KafkaSpoutStreams kafkaSpoutStreams =
            new KafkaSpoutStreamsNamedTopics.Builder(new Fields("message"), topics).build();
    KafkaSpoutTuplesBuilder<String, String> tuplesBuilder =
            new KafkaSpoutTuplesBuilderNamedTopics.Builder<>(new TuplesBuilder(topicName)).build();
    KafkaSpoutConfig<String, String> spoutConf =
            new KafkaSpoutConfig.Builder<>(props, kafkaSpoutStreams, tuplesBuilder).build();
    return spoutConf;
}
But this solution throws a CommitFailedException after reading a few messages from Kafka.
storm-kafka writes consumer information to a different location and in a different format in ZooKeeper than the standard Kafka client, so you can't see it in the Kafka Manager UI.
You can find other monitoring tools, like
https://github.com/keenlabs/capillary.
On your alternate approach, you're likely getting CommitFailedException due to:
props.put(KafkaSpoutConfig.Consumer.ENABLE_AUTO_COMMIT, "true");
From Storm 1.0.6 up to 2.0.0-SNAPSHOT, KafkaConsumer autocommit is unsupported.
From the docs:
Note that KafkaConsumer autocommit is unsupported. The
KafkaSpoutConfig constructor will throw an exception if the
"enable.auto.commit" property is set, and the consumer used by the
spout will always have that property set to false. You can configure
similar behavior to autocommit through the setProcessingGuarantee
method on the KafkaSpoutConfig builder.
References:
http://storm.apache.org/releases/2.0.0-SNAPSHOT/storm-kafka-client.html
http://storm.apache.org/releases/1.0.6/storm-kafka-client.html
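Following the quoted documentation, here is a sketch of the newer builder-based configuration without autocommit (this assumes a storm-kafka-client version that provides KafkaSpoutConfig.builder and setProcessingGuarantee, as in the linked docs; bootstrapServers, topicName and GROUP_ID are the question's own placeholders):

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;

KafkaSpoutConfig<String, String> spoutConf = KafkaSpoutConfig.builder(bootstrapServers, topicName)
        // no enable.auto.commit here - the spout manages offsets itself
        .setProp(ConsumerConfig.GROUP_ID_CONFIG, GROUP_ID)
        .setProcessingGuarantee(KafkaSpoutConfig.ProcessingGuarantee.AT_LEAST_ONCE)
        .build();

KafkaSpout<String, String> spout = new KafkaSpout<>(spoutConf);

This builder variant defaults to String deserializers for keys and values, so the explicit deserializer properties from the question should not be needed.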