I'm trying out some Kafka basics, following the examples at https://kafka.apache.org/quickstart. After starting ZooKeeper and Kafka, I tried producing and consuming with the included Kafka shell scripts and it all worked without issue.
When I try to produce a message from a simple Scala application, I get the following error: org.apache.kafka.common.errors.TimeoutException: Topic quickstart-events not present in metadata after 60000 ms.
I made sure the topic has been created, and I can telnet to localhost:9092 as well.
Here's the code I'm using for producer:
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

val props = new Properties()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
props.put(ProducerConfig.CLIENT_ID_CONFIG, "test")
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
val producer = new KafkaProducer[String, String](props)
producer.send(new ProducerRecord[String, String]("quickstart-events", "1", "some event")).get()
I'm running this on a Mac; the above code is part of a test case executed in IntelliJ.
Solved. I was using kafka-clients library version 2.6.0 against a Kafka server running version 3.2.0. Using a matching version of the library fixed the issue.
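If you manage dependencies with sbt, a minimal sketch of pinning kafka-clients to the broker version (3.2.0 here, taken from the answer above; adjust to whatever your broker runs):
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "3.2.0"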
I ran into this problem as well, but the version was correct in my case.
It turned out to be missing SASL authentication.
try:
// set SASL configuration here
props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
props.put(SaslConfigs.SASL_MECHANISM, "PLAIN");
props.put("sasl.jaas.config",
    "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"alice\" password=\"123456\";");
Related
I have a simple stream execution configured as:
import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.configuration.Configuration
import org.apache.flink.connector.kafka.source.KafkaSource
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer
import org.apache.flink.connector.kafka.source.reader.deserializer.KafkaRecordDeserializationSchema
import org.apache.flink.streaming.api.scala._
import org.apache.kafka.common.serialization.StringDeserializer

val config: Configuration = new Configuration()
config.setString("taskmanager.memory.managed.size", "4g")
config.setString("parallelism.default", "4")
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(config)
env
.fromSource(KafkaSource.builder[String]
.setBootstrapServers("node1:9093,node2:9093,node3:9093")
.setTopics("example-topic")
//.setProperties(kafkaProps) // didn't work
.setProperty("security.protocol", "SASL_SSL")
.setProperty("sasl.mechanism", "GSSAPI")
.setProperty("sasl.kerberos.service.name", "kafka")
.setProperty("group.id","groupid-test")
//.setGroupId("groupid-test") // didn't work
.setStartingOffsets(OffsetsInitializer.earliest)
.setProperty("partition.discovery.interval.ms", "60000") // discover part
.setDeserializer(KafkaRecordDeserializationSchema.valueOnly(classOf[StringDeserializer]))
.build,
WatermarkStrategy.noWatermarks[String],
"example-input-topic"
)
.print
env.execute("asdasd")
My Flink version is 1.14.2.
My Kafka is running on Cloudera. Kafka version: 2.2.1-cdh6.3.2.
I am able to consume records from Kafka, but it doesn't set the group.id for the topic. Does anyone have any ideas?
Since Flink 1.14.0, the group.id is an optional value. See https://issues.apache.org/jira/browse/FLINK-24051. You can set your own value if you want to specify one. You can see from the accompanying PR how this was previously set at https://github.com/apache/flink/pull/17052/files#diff-34b4ff8d43271eeac91ba17f29b13322f6e0ff3d15f71003a839aeb780fe30fbL56
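If you do want an explicit consumer group, a minimal sketch (reusing the builder from the question) of setting it via setGroupId, which is the KafkaSourceBuilder shorthand for the group.id property; the group name is just an example:
val source = KafkaSource.builder[String]
  .setBootstrapServers("node1:9093,node2:9093,node3:9093")
  .setTopics("example-topic")
  .setGroupId("groupid-test") // sets group.id on the underlying Kafka consumer
  .setStartingOffsets(OffsetsInitializer.earliest)
  .setDeserializer(KafkaRecordDeserializationSchema.valueOnly(classOf[StringDeserializer]))
  .build
Keep in mind that the Flink Kafka source tracks offsets in its own checkpoints, so offsets committed back to Kafka under this group id are mainly useful for monitoring.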
Flink Version 1.9.0
Scala Version 2.11.12
Kafka Cluster Version 2.3.0
I am trying to connect a Flink job I made to a Kafka cluster that has 3 partitions. I have tested my job against a Kafka topic running on my localhost that has one partition, and reading from and writing to the local Kafka works. When I attempt to connect to a topic that has multiple partitions, I get the following error (topicName is the name of the topic I am trying to consume). Strangely, I don't have any issues when producing to a multi-partition topic.
java.lang.RuntimeException: topicName
at org.apache.flink.streaming.connectors.kafka.internal.KafkaPartitionDiscoverer.getAllPartitionsForTopics(KafkaPartitionDiscoverer.java:80)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractPartitionDiscoverer.discoverPartitions(AbstractPartitionDiscoverer.java:131)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.open(FlinkKafkaConsumerBase.java:508)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:529)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:393)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
My consumer code looks like this:
import java.util.Properties
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer
import org.apache.flink.streaming.util.serialization.JSONKeyValueDeserializationSchema
// JsonConverter and its convertTo helper come from my own project

def defineKafkaDataStream[A: TypeInformation](topic: String,
env: StreamExecutionEnvironment,
SASL_username:String,
SASL_password:String,
kafkaBootstrapServer: String = "localhost:9092",
zookeeperHost: String = "localhost:2181",
groupId: String = "test"
)(implicit c: JsonConverter[A]): DataStream[A] = {
val properties = new Properties()
properties.setProperty("bootstrap.servers", kafkaBootstrapServer)
properties.setProperty("security.protocol" , "SASL_SSL")
properties.setProperty("sasl.mechanism" , "PLAIN")
val jaasTemplate = "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"%s\" password=\"%s\";"
val jaasConfig = String.format(jaasTemplate, SASL_username, SASL_password)
properties.setProperty("sasl.jaas.config", jaasConfig)
properties.setProperty("group.id", "MyConsumerGroup")
env
.addSource(new FlinkKafkaConsumer(topic, new JSONKeyValueDeserializationSchema(true), properties))
.map(x => x.convertTo[A](c))
}
Is there another property I should be setting to allow for a single job to consume from multiple partitions?
After digging around and questioning everything in my process, I found the issue.
I looked at the Java code of the KafkaPartitionDiscoverer function that threw the runtime exception.
One section I noticed handles the RuntimeException:
if (kafkaPartitions == null) {
    throw new RuntimeException(String.format("Could not fetch partitions for %s. Make sure that the topic exists.", topic));
}
I was working off of a Kafka cluster that I don't maintain, and I had been given a topic name that I did not verify first. When I did verify it using:
kafka-topics --describe --zookeeper serverIP:2181 --topic topicName
It returned the following response:
Error while executing topic command : Topics in [] does not exist
ERROR java.lang.IllegalArgumentException: Topics in [] does not exist
at kafka.admin.TopicCommand$.kafka$admin$TopicCommand$$ensureTopicExists(TopicCommand.scala:435)
at kafka.admin.TopicCommand$ZookeeperTopicService.describeTopic(TopicCommand.scala:350)
at kafka.admin.TopicCommand$.main(TopicCommand.scala:66)
at kafka.admin.TopicCommand.main(TopicCommand.scala)
After I got the correct topic name, everything worked.
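As an extra sanity check from code, a sketch (not from the original answer) that lists topics with Kafka's AdminClient before starting the job; the broker address and topic name are placeholders:
import java.util.Properties
import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig}

val adminProps = new Properties()
adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "serverIP:9092") // placeholder broker address
val admin = AdminClient.create(adminProps)
try {
  val topics = admin.listTopics().names().get() // java.util.Set of existing topic names
  val exists = topics.contains("topicName")     // placeholder topic name
  println(s"Topic exists: $exists")
} finally {
  admin.close()
}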
I am working with Kafka and am getting the error message in the description. I am setting the properties for the deserializer in my consumer class:
props.put("key.deserializer", "org.apache.kafka.abstracts.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.abstracts.serialization.StringDeserializer");
producer = new KafkaProducer<>(props);
Still, at runtime I'm getting an error that the deserializer could not be found. We recently upgraded from 10.0.1 to 10.1.1; is there a change in there that I am missing?
Kafka's String deserializer is
'org.apache.kafka.common.serialization.StringDeserializer'
( https://github.com/apache/kafka/blob/0.10.0/clients/src/main/java/org/apache/kafka/common/serialization/StringDeserializer.java )
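To avoid typos in the fully qualified name, one option (a sketch in Scala, not from the original answer) is to reference the class directly instead of spelling out the string:
import org.apache.kafka.common.serialization.StringDeserializer

props.put("key.deserializer", classOf[StringDeserializer].getName)
props.put("value.deserializer", classOf[StringDeserializer].getName)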
Kafka: 0.10.1.0 (Client & Server)
Java client.
Zookeeper: 3.4.6
Setup: a producer publishes messages. Messages sent to the topic are counted using ./kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9093 --topic TEST.TOPIC --time -1
Issue: the consumer receives nothing when it uses subscribe() and poll(), but if you manually assign() partitions, it works. There had been a separate thread on the same question but no answer. It may be a UUID issue, but we need more details as we are in the evaluation phase and details would help.
Consumer Settings:
props.put("bootstrap.servers", servers);
props.put("enable.auto.commit", ENABLE_AUTO_COMMIT);
props.put("auto.commit.interval.ms", AUTO_COMMIT_INTERVAL_MS);
props.put("session.timeout.ms", SESSION_TIMEOUT_MS);
props.put("group.id", CONSUMER_GROUP_ID);
props.put("key.deserializer", STRING_DESRIALIZER);
props.put("value.deserializer", STRING_DESRIALIZER);
props.put("auto.offset.reset", "earliest");
The issue was with the version of Kafka.
Switching to 0.10.2.1 (server and client) made subscribe() work flawlessly.
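For completeness, a minimal subscribe/poll loop in Scala, sketched against the settings above; the topic name and poll timeout are placeholders:
import java.time.Duration
import java.util.Collections
import org.apache.kafka.clients.consumer.KafkaConsumer

// props: the consumer Properties configured above
val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(Collections.singletonList("TEST.TOPIC"))
try {
  while (true) {
    // poll(Duration) needs kafka-clients >= 2.0; on 0.10.x clients use poll(timeoutMs: Long) instead
    val records = consumer.poll(Duration.ofMillis(500))
    val it = records.iterator()
    while (it.hasNext) {
      val r = it.next()
      println(s"${r.key}: ${r.value}")
    }
  }
} finally {
  consumer.close()
}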
I want to benchmark Spark vs. Flink; for this purpose I am running several tests. However, Flink doesn't work with Kafka, while with Spark it works perfectly.
The code is very simple:
import java.util.Properties
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
val properties = new Properties()
properties.setProperty("bootstrap.servers", "localhost:9092")
properties.setProperty("group.id", "myGroup")
println("topic: "+args(0))
val stream = env.addSource(new FlinkKafkaConsumer09[String](args(0), new SimpleStringSchema(), properties))
stream.print
env.execute()
I use Kafka 0.9.0.0 with the same topic in both the consumer (Flink) and the producer (Kafka console), but when I submit my jar to the cluster, nothing happens:
[screenshot: Cluster Flink]
What could be happening?
Your stream.print will not print to the console on Flink; it will write to flink0.9/logs/recentlog. Otherwise, you can add your own logger to confirm the output, for example along the lines of the sketch below.
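A sketch of such a logger (not from the original answer; it assumes SLF4J, which Flink already uses for its own logging):
import org.apache.flink.api.common.functions.RichMapFunction
import org.slf4j.LoggerFactory

// Pass-through map that logs every record, so output lands in the taskmanager log
// even when print() output is not visible.
class LogRecords extends RichMapFunction[String, String] {
  @transient private lazy val log = LoggerFactory.getLogger(classOf[LogRecords])
  override def map(value: String): String = {
    log.info("Received from Kafka: {}", value)
    value
  }
}

// usage: stream.map(new LogRecords).print()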
For this particular case (a source chained into a sink), the web interface will never report bytes/records sent/received. Note that this will change in the somewhat near future.
Please check whether the job-/taskmanager logs contain any output.