Flink Kafka connector to eventhub - apache-kafka

I am using Apache Flink and trying to connect to Azure Event Hubs through its Apache Kafka protocol endpoint to receive messages. I can connect and receive messages, but I can't use the Flink feature "setStartFromTimestamp(...)" described here (https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-start-position-configuration).
When I try to read messages starting from a timestamp, Kafka reports that the message format on the broker side is older than 0.10.0.
Has anybody else run into this?
Apache Kafka client version is 2.0.1
Apache Flink version is 1.7.2
UPDATED: I tried the Azure Event Hubs quickstart examples (https://github.com/Azure/azure-event-hubs-for-kafka/tree/master/quickstart/java) and added code to the consumer package to look up offsets by timestamp. It returns null, which is the expected result when the message format is older than Kafka 0.10.0.
List<PartitionInfo> partitionInfos = consumer.partitionsFor(TOPIC);
List<TopicPartition> topicPartitions = partitionInfos.stream().map(pi -> new TopicPartition(pi.topic(), pi.partition())).collect(Collectors.toList());
Map<TopicPartition, Long> topicPartitionToTimestampMap = topicPartitions.stream().collect(Collectors.toMap(tp -> tp, tp -> 0L));
Map<TopicPartition, OffsetAndTimestamp> offsetAndTimestamp = consumer.offsetsForTimes(topicPartitionToTimestampMap);
System.out.println(offsetAndTimestamp);

Sorry we missed this. Kafka offsetsForTimes() is now supported in Event Hubs (it was previously unsupported).
Feel free to open an issue on our GitHub in the future: https://github.com/Azure/azure-event-hubs-for-kafka
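For readers unsure what a timestamp lookup is expected to return, here is a minimal plain-Java model of the semantics: the earliest offset whose timestamp is greater than or equal to the target, or null when there is no such record. It does not call Kafka or Event Hubs; the class name, the method name and the array-based "log" are hypothetical and exist only for illustration.

```java
// Plain-Java model of a timestamp-to-offset lookup; it does not talk to a broker.
final class TimestampLookupSketch {

    // The array index stands for the record offset, the value for that record's timestamp.
    // Returns the earliest offset whose timestamp is >= targetTimestamp, or null if none exists.
    static Long offsetForTime(long[] timestampsByOffset, long targetTimestamp) {
        for (int offset = 0; offset < timestampsByOffset.length; offset++) {
            if (timestampsByOffset[offset] >= targetTimestamp) {
                return (long) offset;
            }
        }
        return null; // no record at or after the requested timestamp
    }

    public static void main(String[] args) {
        long[] log = {1000L, 1500L, 2000L}; // made-up timestamps for three records
        System.out.println(offsetForTime(log, 1200L));
        System.out.println(offsetForTime(log, 5000L));
    }
}
```

With a target of 0L, as in the snippet from the question, a lookup like this would be expected to return the first offset of a non-empty partition, so a null result there points at missing timestamp support rather than at missing data.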

Related

Messages are not getting consumed

I have added the configuration below to the application.properties file of a Spring Boot application that uses Camel, but the messages are not being consumed. Am I missing any configuration? Any pointers on implementing a consumer for Azure Event Hubs using the Kafka protocol and Camel?
bootstrap.servers=NAMESPACENAME.servicebus.windows.net:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="{YOUR.EVENTHUBS.CONNECTION.STRING}";
The route looks like this:
from("kafka:{{topicName}}?brokers=NAMESPACENAME.servicebus.windows.net:9093" )
.log("Message received from Kafka : ${body}");
I found the solution to this issue. Since I was using the Spring Boot auto-configuration (camel-kafka-starter), the entries in the application.properties file had to be changed as follows:
camel.component.kafka.brokers=NAMESPACENAME.servicebus.windows.net:9093
camel.component.kafka.security-protocol=SASL_SSL
camel.component.kafka.sasl-mechanism=PLAIN
camel.component.kafka.sasl-jaas-config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="{YOUR.EVENTHUBS.CONNECTION.STRING}";
The consumer route for Azure Event Hubs over the Kafka protocol then looks like this:
from("kafka:{{topicName}}")
.log("Message received from Kafka : ${body}");
I hope this helps others consume events from Azure Event Hubs in Camel using the Kafka protocol.
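The renaming in this answer can be written down as a small lookup table. The sketch below is plain Java with no Camel or Kafka dependency; it covers only the four keys that appear in this thread, and the class and method names are hypothetical. Any other Kafka property would need to be checked against the Camel Kafka component documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Maps the plain Kafka client keys from the question to the camel-kafka-starter keys from the answer.
final class CamelKafkaKeySketch {

    // Only the four keys shown in this thread are covered.
    private static final Map<String, String> KEY_MAPPING = new LinkedHashMap<>();
    static {
        KEY_MAPPING.put("bootstrap.servers", "camel.component.kafka.brokers");
        KEY_MAPPING.put("security.protocol", "camel.component.kafka.security-protocol");
        KEY_MAPPING.put("sasl.mechanism", "camel.component.kafka.sasl-mechanism");
        KEY_MAPPING.put("sasl.jaas.config", "camel.component.kafka.sasl-jaas-config");
    }

    // Returns a copy of the input with every key renamed; unknown keys are rejected rather than guessed.
    static Map<String, String> toCamelProperties(Map<String, String> kafkaProperties) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : kafkaProperties.entrySet()) {
            String camelKey = KEY_MAPPING.get(entry.getKey());
            if (camelKey == null) {
                throw new IllegalArgumentException("No known Camel key for: " + entry.getKey());
            }
            result.put(camelKey, entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> kafkaProperties = new LinkedHashMap<>();
        kafkaProperties.put("bootstrap.servers", "NAMESPACENAME.servicebus.windows.net:9093");
        kafkaProperties.put("security.protocol", "SASL_SSL");
        kafkaProperties.put("sasl.mechanism", "PLAIN");
        // Print the entries in application.properties form.
        toCamelProperties(kafkaProperties).forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```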

How to consume Kafka messages with a protobuf definition in Apache Beam?

I'm using the KafkaIO unbounded source in an Apache Beam pipeline running on Dataflow. The following configuration works for me:
Map<String, Object> kafkaConsumerConfig = new HashMap<String, Object>() {{
put("auto.offset.reset", "earliest");
put("group.id", "my.group.id");
}};
p.apply(KafkaIO.<String, String>read()
.withBootstrapServers("ip1:9092,ip2:9092,ip3:9092")
.withConsumerConfigUpdates(kafkaConsumerConfig)
.withTopic("my.topic")
.withKeyDeserializer(StringDeserializer.class)
.withValueDeserializer(StringDeserializer.class)
.withMaxNumRecords(10)
.withoutMetadata())
// do something
Now, since I have a protobuf definition for the messages in my topic, I would like to use it to convert the Kafka records into Java objects.
The following configuration doesn't work and requires a Coder:
p.apply(KafkaIO.<String, Bytes>read()
.withBootstrapServers("ip1:9092,ip2:9092,ip3:9092")
.withConsumerConfigUpdates(kafkaConsumerConfig)
.withTopic("my.topic")
.withKeyDeserializer(StringDeserializer.class)
.withValueDeserializer(BytesDeserializer.class)
.withMaxNumRecords(10)
.withoutMetadata())
Unfortunately, I can't work out the right value deserializer + Coder combination, and I can't find similar examples in the documentation. Do you have a working example of using Protobuf with the Kafka source in Apache Beam?

kafka producer api 0.8.2.1 is not compatible with 1.0.1 broker?

I was using a Kafka producer, version 0.8.2.1, to write asynchronously to a Kafka broker, version 1.0.1.
My code looks like this:
KafkaProducer<String, String> producer = new KafkaProducer<>(configs);
ProducerRecord<String, String> producerRecord = new ProducerRecord<>("topic", "key", "value");
producer.send(producerRecord, new Callback() {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        // exception is non-null if the send failed
        if (metadata != null) {
            System.out.println(metadata.partition() + "|" + metadata.offset());
        }
    }
});
I found that the partition offset printed by my producer's onCompletion callback was larger than the broker's offset as reported by the shell command "./kafka-run-class.sh kafka.tools.GetOffsetShell ".
My producer was configured with "acks=all".
For example, partition 0's offset is 30000 in the log, but 10000 when queried with the shell command.
Is this caused by a version compatibility problem?
The producer API was rewritten around Kafka 0.9 such that offsets are stored in Kafka, not ZooKeeper. It's not clear whether you used GetOffsetShell with the ZooKeeper option.
Newer brokers are mostly backwards compatible down to version 0.10.2, but you shouldn't expect clients older than that to work correctly with newer broker versions.
https://cwiki.apache.org/confluence/display/KAFKA/Compatibility+Matrix

How can Flink read the newest data from Kafka?

In my scenario, Flink should read only the newest data from Kafka every time.
For example,
Kafka produces:
log1
log2
log3
When reading, only log3 is needed.
The Kafka consumer API can do this with seekToEnd().
Does FlinkKafkaConsumer have an equivalent?
Flink 1.3 provides this through setStartFromLatest():
FlinkKafkaConsumer09 flinkKafkaConsumer09 = new FlinkKafkaConsumer09<>(properties.getProperty("topic"), new RowDeserializationSchema(properties.getProperty("separator"), resultType), properties);
flinkKafkaConsumer09.setStartFromLatest();

flink kafka consumer groupId not working

I am using Kafka with Flink.
In a simple program, I used Flink's FlinkKafkaConsumer09 and assigned a group id to it.
According to Kafka's behavior, when I run 2 consumers on the same topic with the same group.id, it should work like a message queue. I think it's supposed to work like this:
If 2 messages are sent to Kafka, the two Flink programs together should process each message once (say, 2 lines of output in total).
But the actual result is that each program receives both messages.
I tried the consumer client that comes with the Kafka server download. It worked in the documented way (2 messages processed).
I tried using 2 Kafka consumers in the same main function of a Flink program. 4 messages were processed in total.
I also tried running 2 instances of Flink, each with the same Kafka consumer program. Again, 4 messages.
Any ideas?
This is the output I expect:
1> Kafka and Flink2 says: element-65
2> Kafka and Flink1 says: element-66
Here's the wrong output I always get:
1> Kafka and Flink2 says: element-65
1> Kafka and Flink1 says: element-65
2> Kafka and Flink2 says: element-66
2> Kafka and Flink1 says: element-66
And here is the segment of code:
public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    ParameterTool parameterTool = ParameterTool.fromArgs(args);
    DataStream<String> messageStream = env.addSource(new FlinkKafkaConsumer09<>(parameterTool.getRequired("topic"), new SimpleStringSchema(), parameterTool.getProperties()));
    messageStream.rebalance().map(new MapFunction<String, String>() {
        private static final long serialVersionUID = -6867736771747690202L;
        @Override
        public String map(String value) throws Exception {
            return "Kafka and Flink1 says: " + value;
        }
    }).print();
    env.execute();
}
I have tried running it twice, and also another way:
creating 2 datastreams and calling env.execute() for each one in the main function.
There was a very similar question on the Flink user mailing list today, but I can't find the link to post here. Here is part of the answer:
"Internally, the Flink Kafka connectors don’t use the consumer group
management functionality because they are using lower-level APIs
(SimpleConsumer in 0.8, and KafkaConsumer#assign(…) in 0.9) on each
parallel instance for more control on individual partition
consumption. So, essentially, the “group.id” setting in the Flink
Kafka connector is only used for committing offsets back to ZK / Kafka
brokers."
Maybe that clarifies things for you.
Also, there is a blog post about working with Flink and Kafka that may help you (https://data-artisans.com/blog/kafka-flink-a-practical-how-to).
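To illustrate why two jobs with the same group.id both see every message, here is a small plain-Java model of static partition assignment. It does not use Flink or Kafka. The round-robin rule is a simplifying assumption for illustration (Flink's actual assignment logic may differ), and all names are hypothetical. The point is that the group id never enters the calculation, so each job covers all partitions on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Model of a connector that assigns partitions itself instead of using consumer group management.
final class StaticAssignmentSketch {

    // Simplified round-robin rule (an assumption, not Flink's real implementation).
    static List<Integer> partitionsForSubtask(int numPartitions, int parallelism, int subtaskIndex) {
        List<Integer> assigned = new ArrayList<>();
        for (int partition = 0; partition < numPartitions; partition++) {
            if (partition % parallelism == subtaskIndex) {
                assigned.add(partition);
            }
        }
        return assigned;
    }

    // Union of the partitions read by all subtasks of one job. Note there is no group id parameter.
    static List<Integer> partitionsForJob(int numPartitions, int parallelism) {
        List<Integer> all = new ArrayList<>();
        for (int subtask = 0; subtask < parallelism; subtask++) {
            all.addAll(partitionsForSubtask(numPartitions, parallelism, subtask));
        }
        Collections.sort(all);
        return all;
    }

    public static void main(String[] args) {
        // Two independent jobs on the same topic: each one reads every partition.
        System.out.println("job 1 reads partitions " + partitionsForJob(2, 1));
        System.out.println("job 2 reads partitions " + partitionsForJob(2, 1));
    }
}
```

Within a single job the partitions are split across the parallel subtasks, so raising the parallelism of one job is the way to share the work; starting a second job duplicates it.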
Since the group.id of the Flink Kafka consumer is of little use other than for committing offsets to ZooKeeper, is there any way to monitor offsets for a Flink Kafka consumer? I can see there is a way to do this for console consumers (with the help of consumer-groups/consumer-offset-checker), but not for Flink Kafka consumers.
We want to see how far our Flink Kafka consumer is lagging behind the Kafka topic size (the total number of messages in the topic at a given point in time); having this at partition level is fine.
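Whatever tool ends up supplying the numbers, the per-partition lag asked about here is just the log end offset minus the committed offset. A minimal plain-Java sketch of that arithmetic follows; it does not query Kafka or Flink, the names are hypothetical, and treating a partition with no committed offset as committed at 0 is an assumption that may not suit every setup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Computes consumer lag per partition from two offset snapshots supplied by the caller.
final class ConsumerLagSketch {

    // lag = end offset - committed offset, never negative.
    // Assumption: a partition without a committed offset counts as committed at 0.
    static Map<Integer, Long> lagPerPartition(Map<Integer, Long> endOffsets,
                                              Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new TreeMap<>();
        for (Map.Entry<Integer, Long> entry : endOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(entry.getKey(), 0L);
            lag.put(entry.getKey(), Math.max(0L, entry.getValue() - committed));
        }
        return lag;
    }

    public static void main(String[] args) {
        Map<Integer, Long> endOffsets = new HashMap<>();
        endOffsets.put(0, 30000L); // made-up numbers
        endOffsets.put(1, 500L);
        Map<Integer, Long> committedOffsets = new HashMap<>();
        committedOffsets.put(0, 10000L);
        System.out.println(lagPerPartition(endOffsets, committedOffsets));
    }
}
```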