How to consume a Kafka topic and write to an MQ topic - apache-kafka

I have a requirement where I need to consume a Kafka topic and write the messages to an MQ topic. Can someone advise me on the best way to do this? I am new to Kafka.
I have read about the IBM MQ connector on Confluent but could not work out how to implement it.

The best way to move data from Kafka to MQ is to use the IBM MQ sink connector: https://github.com/ibm-messaging/kafka-connect-mq-sink
This is a Kafka Connect connector. The README contains details for building and running it.
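For reference, the sink connector is configured with a properties (or JSON) file passed to the Connect worker. The sketch below uses property names as I recall them from that README, with placeholder values for the Kafka topic and for the MQ queue manager, channel and destination, so verify everything against the README before using it:
name=mq-sink
connector.class=com.ibm.eventstreams.connect.mqsink.MQSinkConnector
tasks.max=1
topics=my.kafka.topic
mq.queue.manager=QM1
mq.connection.name.list=mq-host(1414)
mq.channel.name=DEV.APP.SVRCONN
mq.queue=MY.SINK.QUEUE
mq.message.builder=com.ibm.eventstreams.connect.mqsink.builders.DefaultMessageBuilder
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter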

Kafka has a component called Kafka Connect. It is used to read and write data between Kafka and other systems such as databases or, in your case, MQ.
Kafka Connect has two kinds of connectors:
Source connectors - read data from an external system and write it to Kafka (for example, read inserted/modified rows from a database table and write them to a Kafka topic).
Sink connectors - read messages from Kafka and write them to an external system.
The link you have added is to a source connector: it will read messages from MQ and write them to Kafka. For the Kafka-to-MQ direction you need the sink connector linked above.
For a simple use case you do not need Kafka Connect at all. You can write a simple Kafka consumer that reads data from a Kafka topic and writes it to MQ:
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        // Insert code to publish the record to MQ here
        System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
    }
}
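For the "publish the record to MQ" step, one option is IBM MQ's JMS classes. The following is only a minimal sketch, assuming the IBM MQ client/JMS (JMS 2.0) jars are on the classpath; the host, port, channel, queue manager and topic names are placeholders and checked JMSExceptions are not handled:
import javax.jms.JMSContext;
import javax.jms.Topic;

import com.ibm.msg.client.jms.JmsConnectionFactory;
import com.ibm.msg.client.jms.JmsFactoryFactory;
import com.ibm.msg.client.wmq.WMQConstants;

// One-time setup of the MQ connection (all values are placeholders)
JmsFactoryFactory ff = JmsFactoryFactory.getInstance(WMQConstants.WMQ_PROVIDER);
JmsConnectionFactory cf = ff.createConnectionFactory();
cf.setStringProperty(WMQConstants.WMQ_HOST_NAME, "mq-host");
cf.setIntProperty(WMQConstants.WMQ_PORT, 1414);
cf.setStringProperty(WMQConstants.WMQ_CHANNEL, "DEV.APP.SVRCONN");
cf.setIntProperty(WMQConstants.WMQ_CONNECTION_MODE, WMQConstants.WMQ_CM_CLIENT);
cf.setStringProperty(WMQConstants.WMQ_QUEUE_MANAGER, "QM1");

JMSContext context = cf.createContext();
Topic mqTopic = context.createTopic("topic://SOME/MQ/TOPIC");

// In place of the "publish the record to MQ" comment in the loop above:
context.createProducer().send(mqTopic, record.value());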

Related

Enable kafka client metrics in spring app prometheus

I have an application that consumes from Kafka topics and produces to Kafka topics. I am trying to set up a Grafana dashboard to display Kafka metrics, so I am exposing the Kafka metrics to Prometheus so that the Grafana dashboard can fetch and display them.
Below are the libraries and properties I am using.
org.apache.kafka:kafka-clients:jar:2.5.1:compile
io.micrometer:micrometer-registry-prometheus:jar:1.5.4:compile
io.micrometer:micrometer-jersey2:jar:1.5.4:compile
application properties:
spring.jmx.enabled=true
management.endpoints.web.base-path=/
management.endpoint.metrics.enabled=true
management.endpoints.web.exposure.include=prometheus
management.endpoint.prometheus.enabled=true
management.metrics.export.prometheus.enabled=true
Using Micrometer, Prometheus is showing JVM and Tomcat related metrics and also my custom metrics, but the Kafka metrics are not being exposed. I tried to debug and am left with no clue. Any suggestions would be a great help.
As a heads-up, I am not using Spring Kafka annotations. I am running this as a standalone multi-threaded application where I fetch records using the consumer.poll(1000L) method.
The Kafka consumer is created as new KafkaConsumer<>(getProps()) and the Kafka producer as new KafkaProducer<>(props):
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, ...);
props.put(ConsumerConfig.CLIENT_ID_CONFIG, config.getConsumerName());
props.put(ConsumerConfig.GROUP_ID_CONFIG, config.getGroupId());
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, config.getKeyDeserializer());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, config.getValueDeserializer());
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, config.getEnableAutoCommit());
props.put(ConsumerConfig.RECEIVE_BUFFER_CONFIG, config.getReceiveBufferBytes());
props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, config.getMaxPartitionFetchBytes());
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, config.getMaxPollRecords());
props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, RoundRobinAssignor.class.getName());

protected KafkaConsumer<byte[], Event> getConsumer(List<String> inputTopics) {
    KafkaConsumer<byte[], Event> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(inputTopics, new ConsumerRebalanceListener() {
        @Override
        public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
            logger.info("PARTITIONS revoked: " + partitions);
            consumer.commitAsync();
        }

        @Override
        public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
            logger.info("PARTITIONS assigned: " + partitions);
        }
    });
    return consumer;
}
I already answered you on Gitter - the code above is not using Spring for Apache Kafka, you are creating your own consumer and producer.
If you use Spring for Apache Kafka's DefaultKafkaConsumerFactory, you can add a MicrometerConsumerListener to it and Spring will register metrics for each consumer with the meter registry via the KafkaClientMetrics. If you create your own consumers, you have to do that yourself.
Same thing for producers.
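For a hand-rolled consumer and producer, a minimal sketch using Micrometer's KafkaClientMetrics binder (present in the Micrometer 1.5.x versions you list) looks roughly like this. Here meterRegistry is assumed to be the MeterRegistry backing your /prometheus endpoint (e.g. the PrometheusMeterRegistry from the Spring context), and getProps(), props and Event are from your own code:
import io.micrometer.core.instrument.binder.kafka.KafkaClientMetrics;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;

// Bind each Kafka client's own metrics to the registry right after creating the client
KafkaConsumer<byte[], Event> consumer = new KafkaConsumer<>(getProps());
KafkaClientMetrics consumerMetrics = new KafkaClientMetrics(consumer);
consumerMetrics.bindTo(meterRegistry);

KafkaProducer<byte[], Event> producer = new KafkaProducer<>(props);
KafkaClientMetrics producerMetrics = new KafkaClientMetrics(producer);
producerMetrics.bindTo(meterRegistry);
Once bound, meters with names starting with kafka.consumer and kafka.producer should show up on the Prometheus scrape endpoint alongside the JVM and Tomcat metrics.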

Too many UnknownProducerIdException in kafka broker for kafka internal topics created for kafka streams application

One of our Kafka Streams applications is generating a lot of UnknownProducerId errors on the Kafka brokers as well as on the consumer side.
The Streams configs are as below:
final Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, appName);
streamsConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG,appName + "-Client");
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, this.bootstrapServer);
streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.Long().getClass().getName());
streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
streamsConfiguration.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG,StreamsConfig.EXACTLY_ONCE);
streamsConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, offset);
streamsConfiguration.put(StreamsConfig.STATE_DIR_CONFIG,state_dir);
streamsConfiguration.put(StreamsConfig.REPLICATION_FACTOR_CONFIG,defaultReplication);
return streamsConfiguration;
Error on the broker side:
Error on the consumer side:
custom configuration for repartition internal topic:
prod.Prod-Job-Summary-v0.4-KTABLE-AGGREGATE-STATE-STORE-0000000049-repartition
What can be the reason behind these?
It's a known issue. See KAFKA-7190 ("Under low traffic conditions purging repartition topics cause WARN statements about UNKNOWN_PRODUCER_ID") and KIP-360 ("Improve handling of unknown producer").

Kafka producer and consumer in separate EC2 instances

I have 2 EC2 instances, one for the Kafka broker and the other for the Kafka consumer. How do I connect the two EC2 instances so that they can communicate with each other? If I produce a message on my broker, I need to receive it in the consumer.
Basically, I am looking for the part of the configuration where I need to give the consumer information to the broker EC2 instance, or vice versa (whichever way it works). Do I need to use some API or something?
I have tried this on a single-node cluster and it worked.
It does not matter whether you host your broker on EC2 or elsewhere, as long as it is accessible to the consumer.
Below is a sample consumer in Java using StringDeserializer for both key and value. You need to use the KafkaConsumer API if you are accessing Kafka from a Java program.
Properties props = new Properties();
props.put("bootstrap.servers", "YOUR_KAFKA_BROKER_ADDRESS");
props.put("group.id", "test");
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "1000");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records)
        System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
}
https://kafka.apache.org/10/javadoc/?org/apache/kafka/clients/consumer/KafkaConsumer.html
If you're using Kafka across machines, you need to configure the listeners correctly. This article explains how: https://rmoff.net/2018/08/02/kafka-listeners-explained/
the part of the configuration where I need to give the consumer information to the broker
Brokers don't push messages to consumers, so you wouldn't give the consumer's information to any broker.
Any code that works against a single broker should work against more than one, assuming the network settings are configured properly.
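As a rough sketch of what that article covers (host names below are placeholders), the broker's server.properties needs to advertise an address that the consumer's EC2 instance can actually resolve and reach:
# server.properties on the broker EC2 instance
listeners=PLAINTEXT://0.0.0.0:9092
# must be a hostname/IP that the consumer instance can reach
advertised.listeners=PLAINTEXT://ec2-broker-hostname:9092
The consumer then points bootstrap.servers at that same advertised address, and the EC2 security groups have to allow traffic on port 9092 between the two instances.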

Kafka spout integration

I am using Kafka 0.10.1.1 and Storm 1.0.2. In the Storm documentation for Kafka integration, I can see that offsets are still maintained using ZooKeeper, as we initialize the Kafka spout using the ZooKeeper servers.
How can I bootstrap the spout using the Kafka servers instead? Is there any example of this?
Example from the Storm docs:
BrokerHosts hosts = new ZkHosts(zkConnString);
SpoutConfig spoutConfig = new SpoutConfig(hosts, topicName, "/" + topicName, UUID.randomUUID().toString());
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);
This option using ZooKeeper works fine and consumes the messages, but I was not able to see the consumer group or the Storm nodes as consumers in the Kafka Manager UI.
The alternate approach I tried is this:
KafkaSpoutConfig<String, String> kafkaSpoutConfig = newKafkaSpoutConfig();
KafkaSpout<String, String> spout = new KafkaSpout<>(kafkaSpoutConfig);

private static KafkaSpoutConfig<String, String> newKafkaSpoutConfig() {
    Map<String, Object> props = new HashMap<>();
    props.put(KafkaSpoutConfig.Consumer.BOOTSTRAP_SERVERS, bootstrapServers);
    props.put(KafkaSpoutConfig.Consumer.GROUP_ID, GROUP_ID);
    props.put(KafkaSpoutConfig.Consumer.KEY_DESERIALIZER,
            "org.apache.kafka.common.serialization.StringDeserializer");
    props.put(KafkaSpoutConfig.Consumer.VALUE_DESERIALIZER,
            "org.apache.kafka.common.serialization.StringDeserializer");
    props.put(KafkaSpoutConfig.Consumer.ENABLE_AUTO_COMMIT, "true");

    String[] topics = new String[1];
    topics[0] = topicName;

    KafkaSpoutStreams kafkaSpoutStreams =
            new KafkaSpoutStreamsNamedTopics.Builder(new Fields("message"), topics).build();
    KafkaSpoutTuplesBuilder<String, String> tuplesBuilder =
            new KafkaSpoutTuplesBuilderNamedTopics.Builder<>(new TuplesBuilder(topicName)).build();
    KafkaSpoutConfig<String, String> spoutConf =
            new KafkaSpoutConfig.Builder<>(props, kafkaSpoutStreams, tuplesBuilder).build();
    return spoutConf;
}
But this solution throws a CommitFailedException after reading a few messages from Kafka.
storm-kafka writes consumer information to a different location in ZooKeeper, and in a different format, than the plain Kafka client does, so you can't see it in the Kafka Manager UI.
You can find other monitoring tools, such as:
https://github.com/keenlabs/capillary
On your alternate approach, you're likely getting the CommitFailedException because of:
props.put(KafkaSpoutConfig.Consumer.ENABLE_AUTO_COMMIT, "true");
From Storm 1.0.6 up to 2.0.0-SNAPSHOT, KafkaConsumer autocommit is unsupported.
From the docs:
Note that KafkaConsumer autocommit is unsupported. The
KafkaSpoutConfig constructor will throw an exception if the
"enable.auto.commit" property is set, and the consumer used by the
spout will always have that property set to false. You can configure
similar behavior to autocommit through the setProcessingGuarantee
method on the KafkaSpoutConfig builder.
References:
http://storm.apache.org/releases/2.0.0-SNAPSHOT/storm-kafka-client.html
http://storm.apache.org/releases/1.0.6/storm-kafka-client.html
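For reference, a sketch of that approach on a newer storm-kafka-client (roughly 1.2+/2.0); the builder API differs between releases, so treat this as an outline rather than a drop-in replacement, with bootstrapServers, topicName and GROUP_ID taken from your own code:
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;

// Do not set enable.auto.commit; pick a processing guarantee instead
KafkaSpoutConfig<String, String> spoutConf = KafkaSpoutConfig
        .builder(bootstrapServers, topicName)
        .setProp(ConsumerConfig.GROUP_ID_CONFIG, GROUP_ID)
        .setProcessingGuarantee(KafkaSpoutConfig.ProcessingGuarantee.AT_LEAST_ONCE)
        .build();

KafkaSpout<String, String> spout = new KafkaSpout<>(spoutConf);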

Consuming from the beginning of a kafka topic with Flink

How do I make sure I always consume from the beginning of a Kafka topic with Flink?
With the Kafka 0.9.x consumer that is part of Flink 1.0.2, it appears that it is no longer Kafka but Flink that controls the offsets:
Flink snapshots the offsets internally as part of its
distributed checkpoints. The offsets committed to Kafka / ZooKeeper
are only to bring the outside view of progress in sync with Flink's
view of the progress. That way, monitoring and other jobs can get a
view of how far the Flink Kafka consumer has consumed a topic.
This is how far I got, but my Flink program always starts where it left off, and doesn't return to the beginning as the configuration instructs it to:
val props = new Properties()
props.setProperty("bootstrap.servers", "localhost:9092")
props.setProperty("group.id", "myflinkservice")
props.setProperty("auto.offset.reset", "earliest")

val incomingData = env.addSource(
  new FlinkKafkaConsumer09[IncomingDataRecord](
    "my.topic.name",
    new IncomingDataSchema,
    props
  )
)
Use:
consumer.setStartFromEarliest();
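Note that setStartFromEarliest() was added to the Flink Kafka consumer in later releases (around Flink 1.3), so it is likely not available in 1.0.2. A sketch in the Java API, assuming such a release and your own IncomingDataRecord/IncomingDataSchema classes:
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("group.id", "myflinkservice");

FlinkKafkaConsumer09<IncomingDataRecord> consumer =
        new FlinkKafkaConsumer09<>("my.topic.name", new IncomingDataSchema(), props);
// ignore committed offsets and always read the topic from the beginning
consumer.setStartFromEarliest();

DataStream<IncomingDataRecord> incomingData = env.addSource(consumer);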
I think you can get around this by specifying a random group.id:
val props = new Properties()
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("group.id", s"myflinkservice_${UUID.randomUUID}")
props.setProperty("auto.offset.reset", "smallest") // "smallest", not "earliest"
auto.offset.reset only works when there's no initial offset available in ZooKeeper.