Storm - KafkaSpout fails on open() - apache-kafka

For the past two days, I’ve been tying to implement a KafkaSpout within our topology. Here
are some important information.
All three services are running on the same instance. Kafka’s brokers use as by default the 9092
port, with advertised.listeners set to PLAINTEXT://localhost:9092. Zookeeper, uses the default
client port 2181. Whereas the Storm Nimbus host name has been set to localhost as well.
A custom Kafka Producer creates log messages successfully, whereas by using the zkCli
Zookeeper script I’ve seen that when using the /brokers path, the partitions and other relevant
information are stored correctly.
However, I keep getting the error when activating, and afterwards monitoring the topology.
Here is the source code of the Storm topology I’ve implemented:
BrokerHosts hosts = new ZkHosts("127.0.0.1:2181");
SpoutConfig spoutConfig = new SpoutConfig(hosts, "bytes", "/kafkastorm/", "bytes" + UUID.randomUUID().toString());
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
spoutConfig.zkServers = Arrays.asList("127.0.0.1");
spoutConfig.zkPort = 2181;
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("bytes", kafkaSpout);
builder.setBolt("byteSize", new KafkaByteProcessingBolt()).shuffleGrouping("bytes");
StormTopology topology = builder.createTopology();
Config config = new Config();
StormSubmitter.submitTopology("topology", config, topology);
However, the error message I keep getting when executing the bin/storm monitor <topology_name> -m bytes is the following:
Exception in thread "main" java.lang.IllegalArgumentException: stream: default not found
at org.apache.storm.utils.Monitor.metrics(Monitor.java:223)
at org.apache.storm.utils.Monitor.metrics(Monitor.java:159)
at org.apache.storm.command.monitor$_main.doInvoke(monitor.clj:36)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at org.apache.storm.command.monitor.main(Unknown Source)
Whereas by inspecting the logs of the workers (the worker.log file), I’ve concluded that
the KafkaSpout fails on the open() method.
java.lang.NoClassDefFoundError: org/apache/curator/RetryPolicy
at org.apache.storm.kafka.KafkaSpout.open(KafkaSpout.java:75) ~[storm-kafka-1.0.2.jar:1.0.2]
at org.apache.storm.daemon.executor$fn__7990$fn__8005.invoke(executor.clj:604) ~[storm-core-1.0.2.jar:1.0.2]
at org.apache.storm.util$async_loop$fn__624.invoke(util.clj:482) [storm-core-1.0.2.jar:1.0.2]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
Caused by: java.lang.ClassNotFoundException: org.apache.curator.RetryPolicy
at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_101]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_101]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) ~[?:1.8.0_101]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_101]
... 5 more
Could someone explain what might be the reason for the KafkaSpout to fail on the
open() method?
I would really appreciate for your help!

From the error "java.lang.NoClassDefFoundError: org/apache/curator/RetryPolicy" it appears that curator-client.jar is missing.
Please check if below link helps you ?
https://github.com/abhinavg6/Kafka-Storm-Conscomp/issues/1

Related

Failed to construct kafka producer: No resolvable bootstrap urls given in bootstrap.servers (Intermittent issue)

I am getting intermittent issues while accessing the kafka service from the Kubernetes pod.
org.apache.kafka.common.KafkaException: Failed to construct kafka producer
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:432)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:298)
at oracle.fs.framework.core.transports.event.kafka.KafkaFactory.createProducer(KafkaFactory.java:29)
at oracle.fs.framework.core.transports.event.kafka.stream.KafkaStreamEventTransport.start(KafkaStreamEventTransport.java:165)
at oracle.fs.framework.core.service.TransportLifecycleHandler.start(TransportLifecycleHandler.java:57)
at oracle.fs.framework.core.service.AbstractService.start(AbstractService.java:400)
at oracle.fs.foundation.bootstrap.Bootstrap.startDomain(Bootstrap.java:555)
at oracle.fs.foundation.bootstrap.Bootstrap.startDomain(Bootstrap.java:188)
at oracle.fs.foundation.bootstrap.Bootstrap.startDomain(Bootstrap.java:147)
at oracle.fs.foundation.bootstrap.Bootstrap.startDomain(Bootstrap.java:102)
at oracle.fs.service.driver.DomainServiceDriver.startService(DomainServiceDriver.java:24)
at oracle.fs.service.driver.DomainServiceDriver.main(DomainServiceDriver.java:19)
Caused by: org.apache.kafka.common.config.ConfigException: No resolvable bootstrap urls given in bootstrap.servers
at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:88)
at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:47)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:407)
I am not sure which causes this issue intermittently as I've found the bootstrap urls are correct.
Its the Format of spring.kafka.bootstrap-servers=port:9092
sample: spring.kafka.bootstrap-servers=localhost:9092 or spring.kafka.bootstrap-servers=127.0.0.1:9092, etc

Kafka Sink: ERROR Stopping due to error (org.apache.kafka.connect.cli.ConnectStandalone:130)

Am trying to stream data from one stream file to another file. It was working earlier and suddenly it providing the error as ERROR Stopping due to error (org.apache.kafka.connect.cli.ConnectStandalone:130). Have restarted the zookeeper, kafka-server, schema-registry, source and sink connectors, but still am facing same issue and unable to resolve it. Any suggestion would be helpful.
Source connector:
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/home/jimmacaulay/Desktop/ETL/Kafka/confluent-5.5.1/data/data/Jim_Source.csv
topic=Jim
Sink connector:
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=/home/jimmacaulay/Desktop/ETL/Kafka/confluent-5.5.1/data/data/Jim_Sink.csv
topics=Jim
Error:
[2020-08-18 06:25:50,482] ERROR Stopping due to error (org.apache.kafka.connect.cli.ConnectStandalone:130)
org.apache.kafka.connect.errors.ConnectException: Unable to initialize REST server
at org.apache.kafka.connect.runtime.rest.RestServer.initializeServer(RestServer.java:217)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:87)
Caused by: java.io.IOException: Failed to bind to 0.0.0.0/0.0.0.0:8083
at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:346)
at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:307)
at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:231)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
at org.eclipse.jetty.server.Server.doStart(Server.java:385)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
at org.apache.kafka.connect.runtime.rest.RestServer.initializeServer(RestServer.java:215)
... 1 more
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:220)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:85)
at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:342)
... 8 more
Resolved the error by starting the connect-standalone using source and sink properties together.
sh connect-standalone ../config/connect-avro-standalone.properties ../config/connect-file-source_Topic_Jim.properties ../config/connect-file-sink_Topic_Jim.properties
Earlier i was starting it separately as below,
sh connect-standalone ../config/connect-avro-standalone.properties ../config/connect-file-source_Topic_Jim.properties
sh connect-standalone ../config/connect-avro-standalone.properties ../config/connect-file-sink_Topic_Jim.properties
Cause of the issue,
When am starting separately connect-standalone is getting started first for source properties using the port number 8083. Again when am starting the sink properties, it tries to use same port number and failes.
Solutions,
Both source and sink properties should be while starting connect-standalone which shares the same port.
Or define the different port numbers in the properties file and start it separately

Flink with kafka issue: Timeout expired while fetching topic metadata

I tried submitting the simple flink job to accept messages from kafka, but after submitting the job, within less than a minute, the job fails with the following kafka exception. I have kafka 2.12 running on my local machine and I have configured the topic which this job consumes from.
public static void main(String[] args) throws Exception {
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "127.0.0.1:9092");
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> kafkaData = env
.addSource(new FlinkKafkaConsumer<String>("test-topic",
new SimpleStringSchema(), properties));
kafkaData.print();
env.execute("Aggregation Job");
}
Here's the exception:
Job has been submitted with JobID 5cc30fe72f685406126e2f5a26f10341
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: 5cc30fe72f685406126e2f5a26f10341)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)
...
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
I saw another question in stackoverflow, but that does not resolve the problem. I have not configured any SSL on the kafka broker. Any suggestions would be appreciated.
I had this same issue today. In my case, the problem was that I failed to put my flink application in a VPC (my MSK cluster lives in a VPC). After editing the flink application and moving it into the appropriate VPC, the problem went away.
I realize this question is a few months old, but I figured I'd post my findings in case anyone else happens to come across this from a Google search like I did.

java.lang.RuntimeException for Flink consumer connecting to Kafka cluster with multiple partitions

Flink Version 1.9.0
Scala Version 2.11.12
Kafka Cluster Version 2.3.0
I am trying to connect a flink job I made to a kafka cluster that has 3 partitions. I have tested my job against a kafka cluster topic running on my localhost that has one partition and it works to read and write to the local kafka. When I attempt to connect to a topic that has multiple partitions I get the following error (topicName is the name of the topic I am trying to consume. Weirdly I dont have any issues when I am trying to produce to a multi-partition topic.
java.lang.RuntimeException: topicName
at org.apache.flink.streaming.connectors.kafka.internal.KafkaPartitionDiscoverer.getAllPartitionsForTopics(KafkaPartitionDiscoverer.java:80)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractPartitionDiscoverer.discoverPartitions(AbstractPartitionDiscoverer.java:131)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.open(FlinkKafkaConsumerBase.java:508)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:529)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:393)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
My consumer code looks like this:
def defineKafkaDataStream[A: TypeInformation](topic: String,
env: StreamExecutionEnvironment,
SASL_username:String,
SASL_password:String,
kafkaBootstrapServer: String = "localhost:9092",
zookeeperHost: String = "localhost:2181",
groupId: String = "test"
)(implicit c: JsonConverter[A]): DataStream[A] = {
val properties = new Properties()
properties.setProperty("bootstrap.servers", kafkaBootstrapServer)
properties.setProperty("security.protocol" , "SASL_SSL")
properties.setProperty("sasl.mechanism" , "PLAIN")
val jaasTemplate = "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"%s\" password=\"%s\";"
val jaasConfig = String.format(jaasTemplate, SASL_username, SASL_password)
properties.setProperty("sasl.jaas.config", jaasConfig)
properties.setProperty("group.id", "MyConsumerGroup")
env
.addSource(new FlinkKafkaConsumer(topic, new JSONKeyValueDeserializationSchema(true), properties))
.map(x => x.convertTo[A](c))
}
Is there another property I should be setting to allow for a single job to consume from multiple partitions?
After digging around and questioning everything in my process I found the issue.
I looked at the Java code of the KafkaPartitionDiscoverer function that had the runtime exception.
One section I noticed handled RuntimeException
if (kafkaPartitions == null) {
throw new RuntimeException("Could not fetch partitions for %s. Make sure that the topic exists.".format(topic));
}
I was working off of a kafka cluster that I dont maintain and had a topic name that was given to me that I did not verify first. When I did verify it using:
kafka-topics --describe --zookeeper serverIP:2181 --topic topicName
It returned a response of :
Error while executing topic command : Topics in [] does not exist
ERROR java.lang.IllegalArgumentException: Topics in [] does not exist
at kafka.admin.TopicCommand$.kafka$admin$TopicCommand$$ensureTopicExists(TopicCommand.scala:435)
at kafka.admin.TopicCommand$ZookeeperTopicService.describeTopic(TopicCommand.scala:350)
at kafka.admin.TopicCommand$.main(TopicCommand.scala:66)
at kafka.admin.TopicCommand.main(TopicCommand.scala)
After I got the correct topic name everything works.

Kafka Stream failed to rebalance on startup

I'm trying to run a Kafka stream application and I'm running into the following exception when kafka stream starts.
Here the configurations that I'm using on Kafka 0.10.1
final Map<String, String> properties = new HashMap<>();
properties.put(StreamsConfig.APPLICATION_ID_CONFIG, "some app id");
properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
properties.put(StreamsConfig.CLIENT_ID_CONFIG, "some app id");
properties.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "localhost:2181");
properties.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, "org.apache.kafka.common.serialization.Serdes$StringSerde");
properties.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, "org.apache.kafka.common.serialization.Serdes$StringSerde");
The exception that I'm getting.
Exception in thread "StreamThread-1" org.apache.kafka.streams.errors.StreamsException: stream-thread [StreamThread-1] Failed to rebalance
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:410)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:242)
Caused by: java.lang.NullPointerException
at java.io.File.<init>(File.java:360)
at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:157)
at org.apache.kafka.streams.state.internals.RocksDBStore.init(RocksDBStore.java:163)
at org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:85)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.init(CachingKeyValueStore.java:62)
at org.apache.kafka.streams.processor.internals.AbstractTask.initializeStateStores(AbstractTask.java:81)
at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:120)
at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:633)
at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:660)
at org.apache.kafka.streams.processor.internals.StreamThread.access$100(StreamThread.java:69)
at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:124)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:228)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:313)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:277)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:259)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:407)
... 1 more
Seems like I need to set state.dir but that doesn't seem to work. By default on Kafka 0.10.1, it already defaults to /tmp/kafka-streams. Anyone have any ideas?