Apache Flink KafkaSource doesn't set group.id - apache-kafka

I have a simple stream execution configured as:
val config: Configuration = new Configuration()
config.setString("taskmanager.memory.managed.size", "4g")
config.setString("parallelism.default", "4")
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(config)
env
  .fromSource(
    KafkaSource.builder[String]
      .setBootstrapServers("node1:9093,node2:9093,node3:9093")
      .setTopics("example-topic")
      //.setProperties(kafkaProps) // didn't work
      .setProperty("security.protocol", "SASL_SSL")
      .setProperty("sasl.mechanism", "GSSAPI")
      .setProperty("sasl.kerberos.service.name", "kafka")
      .setProperty("group.id", "groupid-test")
      //.setGroupId("groupid-test") // didn't work
      .setStartingOffsets(OffsetsInitializer.earliest)
      .setProperty("partition.discovery.interval.ms", "60000") // discover partitions
      .setDeserializer(KafkaRecordDeserializationSchema.valueOnly(classOf[StringDeserializer]))
      .build,
    WatermarkStrategy.noWatermarks[String],
    "example-input-topic"
  )
  .print
env.execute("asdasd")
My Flink version is 1.14.2.
My Kafka is running on Cloudera. Kafka version: 2.2.1-cdh6.3.2.
I am able to consume records from Kafka, but it doesn't set the group.id for the topic. Does anyone have any ideas?

Since Flink 1.14.0, group.id is an optional value; see https://issues.apache.org/jira/browse/FLINK-24051. You can set your own value if you want to specify one. You can see from the accompanying PR how this was previously set: https://github.com/apache/flink/pull/17052/files#diff-34b4ff8d43271eeac91ba17f29b13322f6e0ff3d15f71003a839aeb780fe30fbL56
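A minimal sketch (based on the flink-connector-kafka 1.14 builder API; the topic and group names are placeholders) of setting the group explicitly:

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.connector.kafka.source.KafkaSource
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer

// setGroupId sets the "group.id" consumer property under the hood
val source = KafkaSource.builder[String]
  .setBootstrapServers("node1:9093")
  .setTopics("example-topic")
  .setGroupId("groupid-test")
  .setStartingOffsets(OffsetsInitializer.earliest)
  .setValueOnlyDeserializer(new SimpleStringSchema())
  .build

Note that KafkaSource assigns partitions itself rather than joining the group through Kafka's group coordinator, so the group typically only becomes visible to Kafka tooling once offsets are committed (for example, on checkpoints).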

Related

Kafka TimeoutException: Topic not present in metadata after 60000 ms

I'm trying out some Kafka basics, following the examples at https://kafka.apache.org/quickstart. After starting ZooKeeper and Kafka, I tried producing and consuming with the included Kafka shell scripts, and it all worked without issue.
When I try to produce a message from a simple Scala application, I get the following error: org.apache.kafka.common.errors.TimeoutException: Topic quickstart-events not present in metadata after 60000 ms.
I ensured the topic has been created and can telnet to localhost:9092 as well.
Here's the code I'm using for producer:
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

val props = new Properties()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
props.put(ProducerConfig.CLIENT_ID_CONFIG, "test")
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
val producer = new KafkaProducer[String, String](props)
producer.send(new ProducerRecord[String, String]("quickstart-events", "1", "some event")).get()
I'm running this on a Mac; the above code is part of a test case executed in IntelliJ.
Solved. I was using kafka-clients library version 2.6.0 against a Kafka server running version 3.2.0. Using a matching version of the library fixed the issue.
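For example, with an sbt build (the build-tool specifics are an assumption; only the version alignment matters), pinning the client to the 3.2.0 broker looks like:

// build.sbt: match kafka-clients to the broker version (3.2.0 here)
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "3.2.0"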
I ran into this problem as well, and the version was correct in my case.
It turned out to be missing SASL authentication.
Try:
// set SASL configuration here
props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_PLAINTEXT");
props.put(SaslConfigs.SASL_MECHANISM, "PLAIN");
props.put("sasl.jaas.config",
    "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"alice\" password=\"123456\";");

InvalidGroupIdException for Kafka spout in Storm

I have defined a basic Storm topology with a spout consuming from Kafka (the producer is created in a separate Kafka module). However, when I run the application I get this error:
java.lang.RuntimeException: org.apache.kafka.common.errors.InvalidGroupIdException: To use the group management or offset commit APIs, you must provide a valid group.id in the consumer configuration.
at org.apache.storm.utils.Utils$1.run(Utils.java:407) ~[storm-client-2.1.0.jar:2.1.0]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_221]
Caused by: org.apache.kafka.common.errors.InvalidGroupIdException: To use the group management or offset commit APIs, you must provide a valid group.id in the consumer configuration.
How can I set the group id? I am running Storm locally, version 2.1.0.
Here is the code for the topology:
val cluster = new LocalCluster()
val bootstrapServers = "localhost:9092"
val brokerHosts = new ZkHosts(bootstrapServers)
val topologyBuilder = new TopologyBuilder()
val spoutConfig = KafkaSpoutConfig.builder(bootstrapServers, "tweets").build()
topologyBuilder.setSpout("kafka_spout", new KafkaSpout(spoutConfig), 1)
val config = new Config()
cluster.submitTopology("kafkaTest", config, topologyBuilder.createTopology())
You should use setProp(java.lang.String, java.lang.Object) with ConsumerConfig.GROUP_ID_CONFIG to add the consumer group id to the KafkaSpoutConfig.
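A minimal sketch (assuming storm-kafka-client, with a placeholder group name) applied to the spout config above:

import org.apache.kafka.clients.consumer.ConsumerConfig

// Add the consumer group id to the spout's Kafka consumer configuration
val spoutConfig = KafkaSpoutConfig.builder(bootstrapServers, "tweets")
  .setProp(ConsumerConfig.GROUP_ID_CONFIG, "tweets-group")
  .build()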

Kafka 1.0.0 admin client cannot create topic with EOFException

Using the 1.0.0 Kafka admin client, I wish to programmatically create a topic on the broker. I happen to be using Scala. I've tried using the following code to either create a topic on the Kafka broker or simply to list the available topics:
import java.util.Properties
import org.apache.kafka.clients.admin.{AdminClient, ListTopicsOptions, NewTopic}
import scala.collection.JavaConverters._
val zkServer = "localhost:2181"
val topic = "test1"
val zookeeperConnect = zkServer
val sessionTimeoutMs = 10 * 1000
val connectionTimeoutMs = 8 * 1000
val partitions = 1
val replication:Short = 1
val topicConfig = new Properties() // add per-topic configurations settings here
import org.apache.kafka.clients.admin.AdminClientConfig
val config = new Properties
config.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, zkServer)
val admin = AdminClient.create(config)
val existing = admin.listTopics(new ListTopicsOptions().timeoutMs(500).listInternal(true))
val nms = existing.names()
nms.get().asScala.foreach(nm => println(nm)) // nms.get() fails
val newTopic = new NewTopic(topic, partitions, replication)
newTopic.configs(Map[String,String]().asJava)
val ret = admin.createTopics(List(newTopic).asJavaCollection)
ret.all().get() // Also fails
admin.close()
With either command, the ZooKeeper (3.4.10) side throws an EOFException and closes the connection. Debugging the ZooKeeper side itself, it seems it is unable to deserialize the message that the admin client is sending (it runs out of bytes while trying to read).
Is anyone able to make the 1.0.0 Kafka admin client work for creating or listing topics?
The AdminClient connects directly to Kafka and does not need access to ZooKeeper.
You need to set AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG to point to your Kafka brokers (for example, localhost:9092) instead of ZooKeeper.
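A minimal sketch of the fix (assuming a broker listening on localhost:9092):

import java.util.Properties
import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig}

val config = new Properties
// Point the admin client at the Kafka broker, not at ZooKeeper
config.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
val admin = AdminClient.create(config)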

Does pyspark support the spark-streaming-kafka-0-10 lib?

My Kafka cluster version is 0.10.0.0, and I want to use PySpark streaming to read Kafka data. However, in the Spark Streaming + Kafka Integration Guide, http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html, there is no Python code example.
So can PySpark use spark-streaming-kafka-0-10 to integrate with Kafka?
Thank you in advance for your help!
I also use Spark Streaming with a Kafka 0.10.0 cluster. After adding the following line to your configuration, you are good to go.
spark.jars.packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0
And here a sample in python:
# Imports (assumes pyspark and the spark-streaming-kafka package are available)
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# Initialize SparkContext
sc = SparkContext(appName="sampleKafka")
# Initialize spark stream context
batchInterval = 10
ssc = StreamingContext(sc, batchInterval)
# Set kafka topic
topic = {"myTopic": 1}
# Set application groupId
groupId = "myTopic"
# Set zookeeper parameter
zkQuorum = "zookeeperhostname:2181"
# Create Kafka stream
kafkaStream = KafkaUtils.createStream(ssc, zkQuorum, groupId, topic)
#Do as you wish with your stream
# Start stream
ssc.start()
ssc.awaitTermination()
You can use spark-streaming-kafka-0-8 when your brokers are 0.10 and later: spark-streaming-kafka-0-8 supports newer broker versions, while spark-streaming-kafka-0-10 does not support older broker versions. spark-streaming-kafka-0-10 is, as of now, still experimental and has no Python support.

Flink with Kafka Consumer doesn't work

I want to benchmark Spark vs Flink, and for this purpose I am running several tests. However, Flink doesn't work with Kafka, while Spark works perfectly.
The code is very simple:
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
val properties = new Properties()
properties.setProperty("bootstrap.servers", "localhost:9092")
properties.setProperty("group.id", "myGroup")
println("topic: "+args(0))
val stream = env.addSource(new FlinkKafkaConsumer09[String](args(0), new SimpleStringSchema(), properties))
stream.print
env.execute()
I use Kafka 0.9.0.0 with the same topics (in the consumer [Flink] and the producer [Kafka console]), but when I submit my jar to the cluster, nothing happens:
[Screenshot: Flink cluster web UI]
What could be happening?
Your stream.print will not print to the console on Flink; it will write to flink0.9/logs/recentlog. Otherwise, you can add your own logger to confirm the output.
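A minimal sketch (assuming slf4j, which ships with the Flink distribution) that logs each record before the print sink:

import org.apache.flink.streaming.api.scala._
import org.slf4j.LoggerFactory

// Log every record to the taskmanager log, then keep print as the sink
stream
  .map { record =>
    LoggerFactory.getLogger("kafka-check").info(s"Received: $record")
    record
  }
  .print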
For this particular case (a source chained into a sink), the web interface will never report bytes/records sent/received. Note that this will change in the somewhat near future.
Please check the jobmanager/taskmanager logs for any output.