I am using:
flink 1.1.2
Kafka 2.10-0.10.0.1
flink-connector-kafka-0.9.2.10-1.0.0
I am using the following very simple/basic app:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:33334");
properties.setProperty("partition.assignment.strategy", "org.apache.kafka.clients.consumer.RangeAssignor");
properties.setProperty("group.id", "test");

String topic = "mytopic";
FlinkKafkaConsumer09<String> fkc =
        new FlinkKafkaConsumer09<String>(topic, new SimpleStringSchema(), properties);
DataStream<String> stream = env.addSource(fkc);
env.execute();
After compiling it with Maven, when I try to run it using the following command:
bin/flink run -c com.mycompany.app.App fkaf/target/fkaf-1.0-SNAPSHOT.jar
I see the following runtime error:
Submitting job with JobID: f6e290ec7c28f66d527eaa5286c00f4d. Waiting for job completion.
Connected to JobManager at Actor[akka.tcp://flink@127.0.0.1:6123/user/jobmanager#-1679485245]
10/12/2016 15:10:06 Job execution switched to status RUNNING.
10/12/2016 15:10:06 Source: Custom Source(1/1) switched to SCHEDULED
10/12/2016 15:10:06 Source: Custom Source(1/1) switched to DEPLOYING
10/12/2016 15:10:06 Map -> Sink: Unnamed(1/1) switched to SCHEDULED
10/12/2016 15:10:06 Map -> Sink: Unnamed(1/1) switched to DEPLOYING
10/12/2016 15:10:06 Source: Custom Source(1/1) switched to RUNNING
10/12/2016 15:10:06 Map -> Sink: Unnamed(1/1) switched to RUNNING
10/12/2016 15:10:06 Map -> Sink: Unnamed(1/1) switched to CANCELED
10/12/2016 15:10:06 Source: Custom Source(1/1) switched to FAILED
java.lang.NoSuchMethodError: org.apache.kafka.clients.consumer.KafkaConsumer.assign(Ljava/util/List;)V
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09.open(FlinkKafkaConsumer09.java:282)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:38)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:91)
at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:376)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:256)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
at java.lang.Thread.run(Thread.java:722)
Any idea why the assign() method is not being found? The method is there in lib/kafka-clients-0.10.0.1.jar.
ParameterTool parameterTool = ParameterTool.fromArgs(args);
DataStream<String> messageStream = env.addSource(new FlinkKafkaConsumer09<String>(parameterTool.getRequired("topic"), new SimpleStringSchema(), parameterTool.getProperties()));
// print() will write the contents of the stream to the TaskManager's standard out stream
// the rebalance call causes a repartitioning of the data so that all machines
// see the messages (for example in cases when "num kafka partitions" < "num flink operators")
messageStream.rebalance().map(new MapFunction<String, String>() {
    private static final long serialVersionUID = -6867736771747690202L;

    @Override
    public String map(String value) throws Exception {
        return "Kafka and Flink says: " + value;
    }
}).print();
env.execute();
A NoSuchMethodError indicates a version mismatch.
I would guess the issue is that you are trying to connect a Kafka 0.9 consumer to a Kafka 0.10 instance. Flink 1.1.x does not provide a Kafka 0.10 consumer; however, a 0.10 consumer will be included in the upcoming 1.2.0 release.
You could try to build the Kafka 0.10 consumer yourself from the current master branch (1.2-SNAPSHOT) and use it with Flink 1.1.2. The corresponding Flink APIs should be stable and backwards compatible between 1.1 and 1.2.
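For reference, once a 0.10 connector built from that branch is on the classpath, usage mirrors the 0.9 consumer. A minimal, hedged sketch (FlinkKafkaConsumer010 is the class in the 1.2-SNAPSHOT connector; broker address and topic reused from the question):

Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:33334");
properties.setProperty("group.id", "test");
// the 0.10 connector bundles a matching kafka-clients version, so the assign() mismatch goes away
FlinkKafkaConsumer010<String> consumer =
        new FlinkKafkaConsumer010<String>("mytopic", new SimpleStringSchema(), properties);
DataStream<String> stream = env.addSource(consumer);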
I have a simple stream execution configured as:
val config: Configuration = new Configuration()
config.setString("taskmanager.memory.managed.size", "4g")
config.setString("parallelism.default", "4")
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(config)
env
.fromSource(KafkaSource.builder[String]
.setBootstrapServers("node1:9093,node2:9093,node3:9093")
.setTopics("example-topic")
//.setProperties(kafkaProps) // didn't work
.setProperty("security.protocol", "SASL_SSL")
.setProperty("sasl.mechanism", "GSSAPI")
.setProperty("sasl.kerberos.service.name", "kafka")
.setProperty("group.id","groupid-test")
//.setGroupId("groupid-test") // didn't work
.setStartingOffsets(OffsetsInitializer.earliest)
.setProperty("partition.discovery.interval.ms", "60000") // discover part
.setDeserializer(KafkaRecordDeserializationSchema.valueOnly(classOf[StringDeserializer]))
.build,
WatermarkStrategy.noWatermarks[String],
"example-input-topic"
)
.print
env.execute("asdasd")
My Flink version is: 1.14.2
My Kafka is running on Cloudera. Kafka version: 2.2.1-cdh6.3.2
I am able to consume records from Kafka, but it does not set the group id for the topic. Does anyone have any ideas?
Since Flink 1.14.0, group.id is an optional value. See https://issues.apache.org/jira/browse/FLINK-24051. You can set your own value if you want to specify one. The accompanying PR shows how this was previously set: https://github.com/apache/flink/pull/17052/files#diff-34b4ff8d43271eeac91ba17f29b13322f6e0ff3d15f71003a839aeb780fe30fbL56
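For illustration, a minimal Java sketch of setting the group id explicitly on the builder, as described above (values taken from the question); the same calls apply from Scala:

KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("node1:9093,node2:9093,node3:9093")
        .setTopics("example-topic")
        .setGroupId("groupid-test") // explicit group.id, optional since Flink 1.14.0
        .setStartingOffsets(OffsetsInitializer.earliest())
        .setDeserializer(KafkaRecordDeserializationSchema.valueOnly(StringDeserializer.class))
        .build();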
I have defined a basic Storm topology with a spout consuming from Kafka (the producer is created in a separate Kafka module). However, when I run the application I get this error:
java.lang.RuntimeException: org.apache.kafka.common.errors.InvalidGroupIdException: To use the group management or offset commit APIs, you must provide a valid group.id in the consumer configuration.
at org.apache.storm.utils.Utils$1.run(Utils.java:407) ~[storm-client-2.1.0.jar:2.1.0]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_221]
Caused by: org.apache.kafka.common.errors.InvalidGroupIdException: To use the group management or offset commit APIs, you must provide a valid group.id in the consumer configuration.
How can I set the group id? I am running Storm locally with version 2.1.0.
Here is the code for the topology:
val cluster = new LocalCluster()
val bootstrapServers = "localhost:9092"
val brokerHosts = new ZkHosts(bootstrapServers)
val topologyBuilder = new TopologyBuilder()
val spoutConfig = KafkaSpoutConfig.builder(bootstrapServers, "tweets").build()
topologyBuilder.setSpout("kafka_spout", new KafkaSpout(spoutConfig), 1)
val config = new Config()
cluster.submitTopology("kafkaTest", config, topologyBuilder.createTopology())
You should use setProp(java.lang.String, java.lang.Object) with ConsumerConfig.GROUP_ID_CONFIG to add the consumer group id on the KafkaSpoutConfig
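For example, a minimal Java sketch (the group name is an assumption); the same builder call applies from the Scala code in the question:

// KafkaSpoutConfig comes from storm-kafka-client, ConsumerConfig from kafka-clients
KafkaSpoutConfig<String, String> spoutConfig = KafkaSpoutConfig
        .builder("localhost:9092", "tweets")
        .setProp(ConsumerConfig.GROUP_ID_CONFIG, "tweets-group") // consumer group id
        .build();
topologyBuilder.setSpout("kafka_spout", new KafkaSpout<>(spoutConfig), 1);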
Flink Version 1.9.0
Scala Version 2.11.12
Kafka Cluster Version 2.3.0
I am trying to connect a Flink job I made to a Kafka cluster whose topic has 3 partitions. I have tested my job against a single-partition topic on a Kafka cluster running on my localhost, and reading and writing to the local Kafka works. When I attempt to connect to a topic that has multiple partitions I get the following error (topicName is the name of the topic I am trying to consume; oddly, I don't have any issues when producing to a multi-partition topic):
java.lang.RuntimeException: topicName
at org.apache.flink.streaming.connectors.kafka.internal.KafkaPartitionDiscoverer.getAllPartitionsForTopics(KafkaPartitionDiscoverer.java:80)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractPartitionDiscoverer.discoverPartitions(AbstractPartitionDiscoverer.java:131)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.open(FlinkKafkaConsumerBase.java:508)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:529)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:393)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
My consumer code looks like this:
def defineKafkaDataStream[A: TypeInformation](topic: String,
env: StreamExecutionEnvironment,
SASL_username:String,
SASL_password:String,
kafkaBootstrapServer: String = "localhost:9092",
zookeeperHost: String = "localhost:2181",
groupId: String = "test"
)(implicit c: JsonConverter[A]): DataStream[A] = {
val properties = new Properties()
properties.setProperty("bootstrap.servers", kafkaBootstrapServer)
properties.setProperty("security.protocol" , "SASL_SSL")
properties.setProperty("sasl.mechanism" , "PLAIN")
val jaasTemplate = "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"%s\" password=\"%s\";"
val jaasConfig = String.format(jaasTemplate, SASL_username, SASL_password)
properties.setProperty("sasl.jaas.config", jaasConfig)
properties.setProperty("group.id", "MyConsumerGroup")
env
.addSource(new FlinkKafkaConsumer(topic, new JSONKeyValueDeserializationSchema(true), properties))
.map(x => x.convertTo[A](c))
}
Is there another property I should be setting to allow for a single job to consume from multiple partitions?
After digging around and questioning everything in my process, I found the issue.
I looked at the Java code of the KafkaPartitionDiscoverer class where the runtime exception was thrown.
One section I noticed handles the RuntimeException:
if (kafkaPartitions == null) {
throw new RuntimeException("Could not fetch partitions for %s. Make sure that the topic exists.".format(topic));
}
I was working off of a Kafka cluster that I don't maintain, with a topic name that was given to me and that I had not verified first. When I did verify it using:
kafka-topics --describe --zookeeper serverIP:2181 --topic topicName
It returned the following response:
Error while executing topic command : Topics in [] does not exist
ERROR java.lang.IllegalArgumentException: Topics in [] does not exist
at kafka.admin.TopicCommand$.kafka$admin$TopicCommand$$ensureTopicExists(TopicCommand.scala:435)
at kafka.admin.TopicCommand$ZookeeperTopicService.describeTopic(TopicCommand.scala:350)
at kafka.admin.TopicCommand$.main(TopicCommand.scala:66)
at kafka.admin.TopicCommand.main(TopicCommand.scala)
After I got the correct topic name, everything worked.
I am trying to run a Flink job as below to read data from Apache Kafka and print it:
Java Program
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "test.net:9092");
properties.setProperty("group.id", "flink_consumer");
properties.setProperty("zookeeper.connect", "dev.com:2181,dev2.com:2181,dev.com:2181/dev2");
properties.setProperty("topic", "topic_name");
DataStream<String> messageStream = env.addSource(new FlinkKafkaConsumer082<>("topic_name", new SimpleStringSchema(), properties));
messageStream.rebalance().map(new MapFunction<String, String>() {
private static final long serialVersionUID = -6867736771747690202L;
public String map(String value) throws Exception {
return "Kafka and Flink says: " + value;
}
}).print();
env.execute();
Scala Code
var properties = new Properties();
properties.setProperty("bootstrap.servers", "msg01.staging.bigdata.sv2.247-inc.net:9092");
properties.setProperty("group.id", "flink_consumer");
properties.setProperty("zookeeper.connect", "host33.dev.swamp.sv2.tellme.com:2181,host37.dev.swamp.sv2.tellme.com:2181,host38.dev.swamp.sv2.tellme.com:2181/staging_sv2");
properties.setProperty("topic", "sv2.staging.rtdp.idm.events.omnichannel");
var env = StreamExecutionEnvironment.getExecutionEnvironment();
var stream:DataStream[(String)] = env
.addSource(new FlinkKafkaConsumer082[String]("sv2.staging.rtdp.idm.events.omnichannel", new SimpleStringSchema(), properties));
stream.print();
env.execute();
Whenever I run this app in Eclipse, I see the following output to start with:
03/27/2017 20:06:19 Job execution switched to status RUNNING.
03/27/2017 20:06:19 Source: Custom Source -> Sink: Unnamed(1/4) switched to SCHEDULED
03/27/2017 20:06:19 Source: Custom Source -> Sink: Unnamed(1/4) switched to DEPLOYING
03/27/2017 20:06:19 Source: Custom Source -> Sink: Unnamed(2/4) switched to SCHEDULED
03/27/2017 20:06:19 Source: Custom Source -> Sink: Unnamed(2/4) switched to DEPLOYING
03/27/2017 20:06:19 Source: Custom Source -> Sink: Unnamed(3/4) switched to SCHEDULED
03/27/2017 20:06:19 Source: Custom Source -> Sink: Unnamed(3/4) switched to DEPLOYING
03/27/2017 20:06:19 Source: Custom Source -> Sink: Unnamed(4/4) switched to SCHEDULED
03/27/2017 20:06:19 Source: Custom Source -> Sink: Unnamed(4/4) switched to DEPLOYING
03/27/2017 20:06:19 Source: Custom Source -> Sink: Unnamed(4/4) switched to RUNNING
03/27/2017 20:06:19 Source: Custom Source -> Sink: Unnamed(2/4) switched to RUNNING
03/27/2017 20:06:19 Source: Custom Source -> Sink: Unnamed(1/4) switched to RUNNING
03/27/2017 20:06:19 Source: Custom Source -> Sink: Unnamed(3/4) switched to RUNNING
The questions I have are:
1) Why am I seeing 4 instances of the sink in all the states (SCHEDULED, DEPLOYING, and RUNNING)?
2) For every line received from Apache Kafka, I see it printed here multiple times, mostly 4 times. What is the reason?
Ideally I want to read each line only once and do further processing with it. Any input/help will be appreciated!
If you run the program in the LocalStreamEnvironment (which you get when you call StreamExecutionEnvironment.getExecutionEnvironment() in an IDE), the default parallelism of all operators is equal to the number of CPU cores.
So in your example, each operator is parallelized into four subtasks. In the log you see a message for each of these four subtasks ("3/4" indicates that this is the third of four tasks in total).
You can control the number of subtasks by calling StreamExecutionEnvironment.setParallelism(int) or by calling setParallelism(int) on each individual operator.
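For instance, a minimal sketch (reusing the Java program from the question) that pins the whole job to a single subtask:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1); // every operator, including the print() sink, now runs as one subtask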
Given your program, the Kafka records should not be replicated; each record should only be printed once. However, since the records are written in parallel, each line of output is prefixed by "x>", where x indicates the id of the parallel subtask that emitted the line.
I am using the Confluent 3.0.1 platform on Windows. I followed the installation guide and developer guide to do the installation and develop my topology.
I started ZooKeeper, then the Kafka server, and tried to run my topology. But I get the error below on the Kafka server. Even if I create the topic manually and run the topology, I see the same error.
INFO Topic creation {"version":1,"partitions":{"0":[0]}} (kafka.admin.AdminUtils$)
[2016-09-21 17:20:08,807] INFO [KafkaApi-0] Auto creation of topic Text4 with 1 partitions and replication factor 1 is successful (kafka.server.KafkaApis)
[2016-09-21 17:20:09,436] ERROR [KafkaApi-0] Error when handling request {group_id=my-first-streams-application1} (kafka.server.KafkaApis)
kafka.admin.AdminOperationException: replication factor: 3 larger than available brokers: 1
at kafka.admin.AdminUtils$.assignReplicasToBrokers(AdminUtils.scala:117)
at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:403)
at kafka.server.KafkaApis.kafka$server$KafkaApis$$createTopic(KafkaApis.scala:629)
at kafka.server.KafkaApis.kafka$server$KafkaApis$$createGroupMetadataTopic(KafkaApis.scala:651)
at kafka.server.KafkaApis$$anonfun$getOrCreateGroupMetadataTopic$1.apply(KafkaApis.scala:657)
at kafka.server.KafkaApis$$anonfun$getOrCreateGroupMetadataTopic$1.apply(KafkaApis.scala:657)
at scala.Option.getOrElse(Option.scala:121)
at kafka.server.KafkaApis.getOrCreateGroupMetadataTopic(KafkaApis.scala:657)
at kafka.server.KafkaApis.handleGroupCoordinatorRequest(KafkaApis.scala:818)
at kafka.server.KafkaApis.handle(KafkaApis.scala:86)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
at java.lang.Thread.run(Thread.java:745)
And my topology code is below:
public class MetricTopology implements InitializingBean {
@Autowired
@Qualifier("getStreamsConfig")
private Properties properties;
// Method to build topology.
public void buildTopology() {
System.out.println("MetricTopology.buildTopology()");
TopologyBuilder builder = new TopologyBuilder();
// add the source processor node that takes Kafka topic "Text4" as input
builder.addSource("Source", "Text4")
// add the Metricsprocessor node which takes the source processor as its upstream processor
.addProcessor("Process", () -> new MetricsProcessor(), "Source");
// Building Stream.
KafkaStreams streams = new KafkaStreams(builder, properties);
streams.start();
}
// Called after all properties are set.
public void afterPropertiesSet() throws Exception {
buildTopology();
}
}
Below are the properties I am using, which are part of a different Java source file.
Properties settings = new Properties();
// Set a few key parameters. These properties will be picked up from a property file.
settings.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-first-streams-application1");
settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
settings.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "localhost:2181");
settings.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
settings.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
settings.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, "1");
kafka.admin.AdminOperationException: replication factor: 3 larger than available brokers: 1
In order to create a topic with replication factor 3, you need at least 3 running brokers.
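Note that the topic failing here is the broker's internal group metadata topic (see createGroupMetadataTopic in the stack trace), whose replication factor is taken from the broker setting offsets.topic.replication.factor (default 3), not from the REPLICATION_FACTOR_CONFIG set in the Streams application. For a single-broker development setup, one common workaround is to lower that setting; a sketch of the broker's server.properties, assuming a dev-only environment:

# server.properties of the single local broker
offsets.topic.replication.factor=1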