NimbusLeaderNotFoundException in Apache Storm UI - scala

I am trying to launch the Storm UI for a streaming application, but I constantly get this error:
org.apache.storm.utils.NimbusLeaderNotFoundException: Could not find leader nimbus from seed hosts [localhost]. Did you specify a valid list of nimbus hosts for config nimbus.seeds?
at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:250)
at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:179)
at org.apache.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:138)
at org.apache.storm.daemon.ui.resources.StormApiResource.getClusterConfiguration(StormApiResource.java:116)
I have launched Storm locally, using the storm script to start Nimbus, submit the jar, and bring up the UI. What could be the reason for this?
Here is the code with the connection setup:
// props and tridentTopology are defined elsewhere in the application
val cluster = new LocalCluster() // runs an in-process cluster rather than connecting to a running one
val bootstrapServers = "localhost:9092"
val spoutConfig = KafkaTridentSpoutConfig.builder(bootstrapServers, "tweets")
  .setProp(props)
  .setFirstPollOffsetStrategy(FirstPollOffsetStrategy.LATEST)
  .build()
val config = new Config()
cluster.submitTopology("kafkaTest", config, tridentTopology.build())

When you submit to a real cluster using storm jar, you should not use LocalCluster. Instead use the StormSubmitter class.
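A minimal sketch of what that could look like, reusing the Config and tridentTopology from the question above (the worker count is just an illustrative value):
// Hypothetical sketch: submit to a real cluster via StormSubmitter instead of LocalCluster
import org.apache.storm.{Config, StormSubmitter}

val config = new Config()
config.setNumWorkers(2) // example value, not taken from the question
// tridentTopology is the TridentTopology built in the question's code
StormSubmitter.submitTopology("kafkaTest", config, tridentTopology.build())
You would then package the topology jar and submit it with storm jar against a running Nimbus.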
The error you're getting is saying that it can't find Nimbus at localhost. Are you sure Nimbus is running on the machine you're running storm jar from? If so, please post the commands you're running, and maybe also check the Nimbus log.

Related

MSK, IAM, and the Kafka Java API

For some reason I can't get my connection to MSK working via the Kafka Java API. I can get producers/consumers to work with MSK using Conduktor and the Kafka CLI tools, but when I try to hook up my Scala code I can't get it to work. I am using the following config to connect via Conduktor and the Kafka CLI tools:
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
and for my Scala application I am setting up producers/consumers using a similar pattern:
def props: Properties = {
  val p = new Properties()
  ....
  p.setProperty("security.protocol", "SASL_SSL")
  p.setProperty("sasl.mechanism", "AWS_MSK_IAM")
  p.setProperty("sasl.jaas.config", "software.amazon.msk.auth.iam.IAMLoginModule required;")
  p.setProperty("sasl.client.callback.handler.class", "software.amazon.msk.auth.iam.IAMClientCallbackHandler")
  p
}

val PRODUCER = new KafkaConsumer[AnyRef, AnyRef](props)
The code works when I omit the security config lines and run against a local instance of Kafka, but when I try to hit MSK it seems like it isn't constructing a consumer and I get the following error:
java.lang.IllegalStateException: You can only check the position for partitions assigned to this consumer.
The locally running instance works, however, so this makes me think I'm not setting something up correctly in the config to connect to MSK.
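For context, this IllegalStateException is what KafkaConsumer.position() raises when it is asked about a partition that is not currently assigned to the consumer (for example, before subscribe() and a first poll() have completed); whether that is what happens here depends on the surrounding code, which is not shown. A minimal sketch of the usual subscribe-then-poll flow, assuming the props above and a hypothetical topic name:
// Hedged sketch, not the asker's code: subscribe and poll before checking positions.
// "my-topic" is a placeholder; props is assumed to also carry the key/value
// deserializers in the elided "...." part above.
import java.time.Duration
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer

val consumer = new KafkaConsumer[AnyRef, AnyRef](props)
consumer.subscribe(List("my-topic").asJava)
consumer.poll(Duration.ofSeconds(5)) // the first poll joins the group and gets partitions assigned
consumer.assignment().asScala.foreach { tp =>
  println(s"$tp -> ${consumer.position(tp)}") // safe: tp is assigned at this point
}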
I am trying to follow this tutorial, and I am using Scala 2.11 and Kafka version 2.4.1. I also added aws-msk-iam-auth (1.1.0) to my build.sbt. Any thoughts or solutions?
This turned out not to be a problem with my AWS connection, which I confirmed by implementing some logging as explained here. My problem lies in the differences between my locally running version of Kafka and MSK. I am still trying to understand those differences.

Consume from Kafka 0.10.x topic using Storm 0.10.x (KafkaSpout)

I am not sure if this is the right question to ask in this forum. We were consuming from a Kafka topic in Storm using the Storm KafkaSpout connector, and it was working fine until now. Now we are supposed to connect to a new Kafka cluster running the upgraded version 0.10.x from the same Storm environment, which is running version 0.10.x.
From the Storm documentation (http://storm.apache.org/releases/1.1.0/storm-kafka-client.html) I can see that Storm 1.1.0 is compatible with Kafka 0.10.x onwards, supporting the new Kafka consumer API. But in that case I won't be able to run the topology on my end (please correct me if I am wrong).
Is there any workaround for this?
I have seen that even though the new Kafka consumer API has removed the ZooKeeper dependency, we can still consume messages using the old kafka-console-consumer.sh by passing the --zookeeper flag instead of the new --bootstrap-server flag (the recommended one). I ran this command using Kafka 0.9 and was able to consume from a topic hosted on Kafka 0.10.x.
When we try to connect, we get the below exception:
java.lang.RuntimeException: java.lang.RuntimeException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /brokers/topics/mytopic/partitions
at storm.kafka.DynamicBrokersReader.getBrokerInfo(DynamicBrokersReader.java:81) ~[stormjar.jar:?]
at storm.kafka.trident.ZkBrokerReader.<init>(ZkBrokerReader.java:42) ~[stormjar.jar:?]
But we are able to connect to the remote ZK server and validated that the path exists:
./zkCli.sh -server remoteZKServer:2181
[zk: remoteZKServer:2181(CONNECTED) 5] ls /brokers/topics/mytopic/partitions
[3, 2, 1, 0]
As we can see above, it gives us the expected output, as the topic has 4 partitions.
At this point we have the below questions:
1) Is it at all possible to connect to Kafka 0.10.x using Storm version 0.10.x? Has anyone tried this?
2) Even if we are able to consume, do we need to make any code changes in order to retrieve the message offset in case of a topology shutdown/restart? I am asking this because we will be passing the ZK cluster details instead of the broker info, as supported in the old KafkaSpout version.
We are running out of options here; any pointers would be highly appreciated.
UPDATE:
We are able to connect to and consume from the remote Kafka topic while running the topology locally in Eclipse. To make sure Storm does not use the in-memory ZK, we used the overloaded constructor LocalCluster("zkServer", port); it works fine and we can see the data coming in. This led us to conclude that version compatibility might not be the issue here.
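For reference, a minimal sketch of that local run, assuming Storm 0.10.x (backtype.storm packages); the ZK host, port, and topology name are placeholders, and the topology itself is taken as a parameter since it is not shown in the question:
// Hypothetical sketch: local mode, but pointed at an external ZooKeeper
// instead of the in-memory one LocalCluster starts by default.
import backtype.storm.{Config, LocalCluster}
import backtype.storm.generated.StormTopology

def runAgainstExternalZk(topology: StormTopology): Unit = {
  val cluster = new LocalCluster("remoteZKServer", 2181L) // the overloaded (zkHost, zkPort) constructor
  cluster.submitTopology("kafkaTopologyLocalTest", new Config(), topology)
}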
However, we still have no luck when deploying the topology to the cluster.
We have verified the connectivity from the Storm box to the ZK servers.
The znode seems fine as well.
At this point we really need some pointers: what could possibly be wrong, and how do we debug it? We have never worked with Kafka 0.10.x before, so we are not sure what exactly we are missing.
We would really appreciate some help and suggestions.
Storm 0.10.x is compatible with Kafka 0.10.x. You can still use the old KafkaSpout, which relies on the ZooKeeper-based offset storage mechanism.
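For illustration, a minimal sketch of wiring the old ZooKeeper-based storm-kafka spout under Storm 0.10.x; the ZK connect string, topic, zkRoot, and id below are placeholders:
// Hypothetical sketch of the old storm-kafka (ZooKeeper-based) KafkaSpout setup.
// Offsets are tracked in ZooKeeper under the given zkRoot, so no new consumer API is needed.
import backtype.storm.spout.SchemeAsMultiScheme
import backtype.storm.topology.TopologyBuilder
import storm.kafka.{KafkaSpout, SpoutConfig, StringScheme, ZkHosts}

val zkHosts = new ZkHosts("remoteZKServer:2181") // ZK used by the Kafka cluster
val spoutConfig = new SpoutConfig(zkHosts, "mytopic", "/kafka-offsets", "my-spout-id")
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme()) // read messages as strings

val builder = new TopologyBuilder()
builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1)
Bolts are then attached to the builder and the topology is submitted as usual.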
The connection loss exception occurred because we were trying to reach a remote Kafka cluster that did not allow/accept connections from our end. We needed to open the specific firewall port so that the connection could be established. It seems that when running a topology in cluster mode, all the supervisor nodes should be able to talk to ZooKeeper, so the firewall should be open for each one of them.

Deleting and re-creating Kafka topics through the Java KafkaServer API

I have a few integration tests for my application that connect to a local Kafka instance. I am using the Java KafkaServer API to create the local instance on demand when the test runs in a way similar to the accepted answer from this question:
How can I instanciate a Mock Kafka Topic for junit tests?
Each of my tests passes when run in isolation. The problem I am having is that my tests use the same Kafka topics, and I would like the topics to start each test containing no messages. However, when I run the tests in series, every test after the first gets this error when it runs and tries to recreate the topics it needs:
kafka.common.TopicExistsException: Topic "test_topic" already exists.
at kafka.admin.AdminUtils$.createOrUpdateTopicPartitionAssignmentPathInZK(AdminUtils.scala:187)
at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:172)
at kafka.admin.TopicCommand$.createTopic(TopicCommand.scala:93)
Each test creates and shuts down its own EmbeddedZookeeper and KafkaServer. I have also tried deleting the 'brokers/topics' path from ZK as well as the KafkaServer's logDirs at the end of each test. Somehow the topics from the first test are still surviving into the second.
What can I do at the end of each test to make sure that the topics it uses do not interfere with tests that run after it?
I was able to eventually get it to work.
Instead of cleaning up after each test, I changed the tests to clean up before they ran.
There were two cleanup steps that I needed to do.
The first was to delete the broker's data directory before starting KafkaServer.
String dataDirectory = 'tmp/kafka'
FileUtils.deleteDirectory(FileUtils.getFile(dataDirectory))
Properties props = TestUtils.createBrokerConfig(BROKER_ID, port, true)
props.put('log.dir', dataDirectory)
props.put('delete.topic.enable', 'true')
KafkaConfig config = new KafkaConfig(props)
Time mock = new MockTime()
kafkaServer = TestUtils.createServer(config, mock)
The second was to delete the topic path recursively in Zookeeper before sending the createTopic command.
zkClient.deleteRecursive(ZkUtils.getTopicPath(topicName))
List<String> arguments = ['--topic', topicName, '--partitions', '1', '--replication-factor', '1']
TopicCommand.createTopic(zkClient, new TopicCommand.TopicCommandOptions(arguments as String[]))
I tried a number of similar approaches but couldn't get it working with anything except exactly this.
Note that the code is Groovy and not Java.

Spark app works in local mode but is not able to connect to master?

I have a Scala 2.10 / Spark 1.5.0 sbt app that I am developing in Eclipse. In my main method, I have:
val sc = new SparkContext("local[2]", "My Spark App")
// followed by operations to do things with the spark context to count up lines in a file
When I run this application within the Eclipse IDE, it works and outputs the result I expect, but when I change the Spark context to connect to my cluster using:
val master = "spark://My-Hostname-From-The-Spark-Master-Page.local:7077"
val conf = new SparkConf().setAppName("My Spark App").setMaster(master)
val sc = new SparkContext(conf)
I get errors like:
WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster#My-Hostname-From-The-Spark-Master-Page.local:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
What gives? How can I get my job to run against the existing master and worker nodes I started up? I know spark-submit is recommended, but aren't applications like Zeppelin and notebooks designed to use Spark without having to go through spark-submit?

Storm - KeeperException

I'm running a Storm cluster. I have Nimbus, ZooKeeper, a Kafka server, and a supervisor on one node,
and another supervisor on a separate node.
When I deploy the topology, which has a simple Kafka spout, on the first node, the supervisor on the second node throws a runtime exception.
But it works fine with the supervisor on the first node. How do I solve this?
Let me ask some questions regarding your topology:
1. How are you ensuring that the spout executor runs only on the first supervisor node? It can run on any supervisor node.
2. Is the second supervisor node registered correctly in the cluster? That is, does the node show up in the UI? As per the exception, it seems ZooKeeper is not aware of this node.
3. If the spout only works on the first supervisor node, Kafka config parameters such as the hostname might have been specified as "localhost". When the spout runs on the first node, it contacts localhost for the Kafka queue and finds it; when the spout tries to run on the second supervisor node, it fails, because for that node "localhost" refers to itself and the Kafka queue is not there.
--Hariprasad
I have faced a similar issue with Storm versions before 0.6.2. We tried running ZooKeeper on a separate node and the issue was resolved, though we soon upgraded our version of Storm and no longer faced it. Try running ZooKeeper on a separate node and see if that helps. Also check whether you have correctly configured the supervisors in the cluster. See this Google Groups thread to check that you have the correct configuration options.