error while running the injector topolgy in strom crawler - apache-zookeeper

i am using follwing resources with their versions given
apache storm 1.2.3
storm crawler 1.16
zookeeper 3.6.1
elasticsearch 7.5.0
when i try to inject urls into my elasticsearch db through the given readme command-
"
storm jar target/crawler-1.16.jar org.commoncrawl.stormcrawler.news.CrawlTopology -conf $PWD/conf/es-conf.yaml -conf $PWD/conf/crawler-conf.yaml $PWD/seeds/ feeds.txt"
it shows the following errors->
[
error-1->
org.apache.storm.utils.NimbusLeaderNotFoundException: Could not find leader nimbus from seed hosts [localhost]. Did you specify a valid list of nimbus hosts for config nimbus.seeds?
error-2->
org.apache.storm.thrift.TApplicationException: Internal error processing getLeader
]
this is my storm.yaml file configs->
[
storm.zookeeper.servers:
- "localhost"
storm.local.dir: "/home/zathura/Desktop/apache-storm-1.2.3/data"
nimbus.seeds: ["localhost"]
supervisor.slots.ports:
6700
6701
6702
6703
nimbus.thrift.max_buffer_size: 20480000
]
and also my storm ui is not showing the required output page
my storm ui screenshot

Related

"SchemaRegistryException: Failed to get Kafka cluster ID" for LOCAL setup

I'm downloaded the .tz (I am on MAC) for confluent version 7.0.0 from the official confluent site and was following the setup for LOCAL (1 node) and Kafka/ZooKeeper are starting fine, but the Schema Registry keeps failing (Note, I am behind a corporate VPN)
The exception message in the SchemaRegistry logs is:
[2021-11-04 00:34:22,492] INFO Logging initialized #1403ms to org.eclipse.jetty.util.log.Slf4jLog (org.eclipse.jetty.util.log)
[2021-11-04 00:34:22,543] INFO Initial capacity 128, increased by 64, maximum capacity 2147483647. (io.confluent.rest.ApplicationServer)
[2021-11-04 00:34:22,614] INFO Adding listener: http://0.0.0.0:8081 (io.confluent.rest.ApplicationServer)
[2021-11-04 00:35:23,007] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryException: Failed to get Kafka cluster ID
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.kafkaClusterId(KafkaSchemaRegistry.java:1488)
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.<init>(KafkaSchemaRegistry.java:166)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.initSchemaRegistry(SchemaRegistryRestApplication.java:71)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.configureBaseApplication(SchemaRegistryRestApplication.java:90)
at io.confluent.rest.Application.configureHandler(Application.java:271)
at io.confluent.rest.ApplicationServer.doStart(ApplicationServer.java:245)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:44)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:180)
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.kafkaClusterId(KafkaSchemaRegistry.java:1486)
... 7 more
My schema-registry.properties file has bootstrap URL set to
kafkastore.bootstrap.servers=PLAINTEXT://localhost:9092
I saw some posts saying its the SchemaRegistry unable to connect to the KafkaCluster URL because of the localhost address potentially. I am fairly new to Kafka and basically just need this local setup to run a git repo that is utilizing some Topics/Kafka so my questions...
How can I fix this (I am behind a corporate VPN but I figured this shouldn't affect this)
Do I even need the SchemaRegistry?
I ended up just going with the Docker local setup inside, and the only change I had to make to the docker compose YAML was to change the schema-registry port (I changed it to 8082 or 8084, don't remember exactly but just an unused port that is not being used by some other Confluent service listed in the docker-compose.yaml) and my local setup is working fine now

Storm 1.1.0 Multi Node Cluster

I am using Storm v1.1.0, and I am building Storm on different machines, lets say I have 5 machines,
machine1: Zookeeper
machine2: Nimbus
machine3: Supervisor1
machine4: Supervisor2
machine5: UI
And the configuration of each machine is the following:
machine1
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
machine2
storm.local.dir: "/mnt/storm"
storm.zookeeper.servers: - "ZookeeperIP"
machines(3-4)
storm.local.dir: "/mnt/storm"
storm.zookeeper.servers:
- "ZookeeperIP"
nimbus.seeds: ["NimbusIP"]
machine5
storm.local.dir: "/mnt/storm"
storm.zookeeper.servers:
- "ZookeeperIP"
nimbus.seeds: ["NimbusIP"]
ui.port: 8080
All is running okay no errors, and I also checked logs, but no errors also all of them are running and starting fine!
The problem is UI is showing anything and gives error in the console
machine5
storm.local.dir: "/mnt/storm"
storm.zookeeper.servers:
- "ZookeeperIP"
nimbus.seeds: ["NimbusIP"]
ui.port: 8080
This configuration should be available in machine 2-5. From your description,
machine 2 is missing nimbus.seeds and ui.port. Machine 3-4 is missing ui.port.
Try keeping the same configuration throughout your storm cluster.

Apache Storm: Could not find leader nimbus from seed hosts

I installed Apache Storm 1.0 by following this tutorial but I am not able to access to the Storm UI from the Internet. Accessing localhost:8080 gives the following error:
org.apache.storm.utils.NimbusLeaderNotFoundException: Could not find leader nimbus from seed hosts [localhost]. Did you specify a valid list of nimbus hosts for config nimbus.seeds?
at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:90)
at org.apache.storm.ui.core$cluster_configuration.invoke(core.clj:343)
at org.apache.storm.ui.core$fn__12106.invoke(core.clj:929)
at org.apache.storm.shade.compojure.core$make_route$fn__2467.invoke(core.clj:93)
at org.apache.storm.shade.compojure.core$if_route$fn__2455.invoke(core.clj:39)
at org.apache.storm.shade.compojure.core$if_method$fn__2448.invoke(core.clj:24)
at org.apache.storm.shade.compojure.core$routing$fn__2473.invoke(core.clj:106)
at clojure.core$some.invoke(core.clj:2570)
at org.apache.storm.shade.compojure.core$routing.doInvoke(core.clj:106)
at clojure.lang.RestFn.applyTo(RestFn.java:139)
at clojure.core$apply.invoke(core.clj:632)
at org.apache.storm.shade.compojure.core$routes$fn__2477.invoke(core.clj:111)
at org.apache.storm.shade.ring.middleware.json$wrap_json_params$fn__11576.invoke(json.clj:56)
at org.apache.storm.shade.ring.middleware.multipart_params$wrap_multipart_params$fn__3543.invoke(multipart_params.clj:103)
at org.apache.storm.shade.ring.middleware.reload$wrap_reload$fn__4286.invoke(reload.clj:22)
at org.apache.storm.ui.helpers$requests_middleware$fn__3770.invoke(helpers.clj:46)
at org.apache.storm.ui.core$catch_errors$fn__12301.invoke(core.clj:1230)
at org.apache.storm.shade.ring.middleware.keyword_params$wrap_keyword_params$fn__3474.invoke(keyword_params.clj:27)
at org.apache.storm.shade.ring.middleware.nested_params$wrap_nested_params$fn__3514.invoke(nested_params.clj:65)
at org.apache.storm.shade.ring.middleware.params$wrap_params$fn__3445.invoke(params.clj:55)
at org.apache.storm.shade.ring.middleware.multipart_params$wrap_multipart_params$fn__3543.invoke(multipart_params.clj:103)
at org.apache.storm.shade.ring.middleware.flash$wrap_flash$fn__3729.invoke(flash.clj:14)
at org.apache.storm.shade.ring.middleware.session$wrap_session$fn__3717.invoke(session.clj:43)
at org.apache.storm.shade.ring.middleware.cookies$wrap_cookies$fn__3645.invoke(cookies.clj:160)
at org.apache.storm.shade.ring.util.servlet$make_service_method$fn__3351.invoke(servlet.clj:127)
at org.apache.storm.shade.ring.util.servlet$servlet$fn__3355.invoke(servlet.clj:136)
at org.apache.storm.shade.ring.util.servlet.proxy$javax.servlet.http.HttpServlet$ff19274a.service(Unknown Source)
at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:654)
at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1320)
at org.apache.storm.logging.filters.AccessLoggingFilter.handle(AccessLoggingFilter.java:47)
at org.apache.storm.logging.filters.AccessLoggingFilter.doFilter(AccessLoggingFilter.java:39)
at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1291)
at org.apache.storm.shade.org.eclipse.jetty.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:247)
at org.apache.storm.shade.org.eclipse.jetty.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:210)
at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1291)
at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:443)
at org.apache.storm.shade.org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1044)
at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:372)
at org.apache.storm.shade.org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:978)
at org.apache.storm.shade.org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.apache.storm.shade.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.apache.storm.shade.org.eclipse.jetty.server.Server.handle(Server.java:369)
at org.apache.storm.shade.org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:486)
at org.apache.storm.shade.org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:933)
at org.apache.storm.shade.org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:995)
at org.apache.storm.shade.org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
at org.apache.storm.shade.org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.apache.storm.shade.org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.apache.storm.shade.org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:668)
at org.apache.storm.shade.org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
at org.apache.storm.shade.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.apache.storm.shade.org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
Content of storm.yaml:
########### These MUST be filled in for a storm configuration
storm.zookeeper.servers:
- "localhost"
storm.local.dir: "/var/storm"
nimbus.host: "localhost"
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
I resolved the two problems by myself.
for the first problem:
I had to restart zookeeper after installing apache storm.
for the second problem:
the problem was not a problem of storm.
the cause of this problem is due to the platform of azure, the 8080 port was closed by default.
So, I thank myself for this effort.
If it were allowed, I will give you (myself) +1M points
I had the same error and the answer I needed is not here, so here I am.
Before going further, I am using Storm version 2.1.0. In this version nimbus.host has been replaced by the array option nimbus.seeds.
In the log is written Did you specify a valid list of nimbus hosts for config nimbus.seeds?. As nimbus.seeds is a mandatory option, to fix the problem I simply added the host IP address as the only element in this list:
nimbus.seeds: ["HOST IP"]

Cassandra Channel has been closed

We have a small test cluster with 3 nodes on Amazon. Everything seems working with cqlsh. But when I try to debug my app from my laptop (outside of Amazon of course), I'm getting 'Channel has been closed' errors, and it starts retrying forever. I know it's likely caused by the config in cassandra.ymal, as it shows some private IPs in my Eclipse console. Tried many different ways but still getting the same problem. Appreciate any input on this. How to get rid of the private IPs 10.251.x.x from the client?
Here are some context,
Versions:
[cqlsh 4.0.1 | Cassandra 2.0.4 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
cassandra-driver-core-2.0.0-rc1.jar
In cassandra.ymal:
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
parameters:
- seeds: "54.203.x.x,54.203.x.y"
listen_address: 10.251.a.b
broadcast_address: 54.203.x.x
native_transport_port: 9042
endpoint_snitch: Ec2MultiRegionSnitch
In Eclipse console:
DEBUG [main] (ControlConnection.java:145) - [Control connection] Successfully connected to /54.203.x.x
DEBUG [Cassandra Java Driver worker-0] (Session.java:379) - Adding /54.203.x.x to list of queried hosts
DEBUG [Cassandra Java Driver worker-1] (Session.java:379) - Adding /10.251.a.c to list of queried hosts
DEBUG [Cassandra Java Driver worker-1] (Connection.java:103) - [/10.251.a.c-1] Error connecting to /10.251.a.c (connection timed out: /10.251.a.c:9042)
DEBUG [Cassandra Java Driver worker-1] (Session.java:390) - Error creating pool to /10.251.a.c ([/10.251.a.c] Cannot connect)
DEBUG [Cassandra Java Driver worker-1] (Cluster.java:1064) - /10.251.a.c is down, scheduling connection retries
DEBUG [New I/O worker #4] (Connection.java:194) - Defuncting connection to /10.251.a.c
com.datastax.driver.core.TransportException: [/10.251.a.b] Channel has been closed
at com.datastax.driver.core.Connection$Dispatcher.channelClosed(Connection.java:548)
...
It seem that your Java driver is using auto discovery by calling "describe cluster" to get a list of all nodes in your cluster. In AWS using Ec2Snitch, that yields to private ips which obviously won't work from outside of AWS. There is a discussion on this topic here:
https://datastax-oss.atlassian.net/browse/JAVA-145
The last commend got my attention. It says you can do something with LoadBalancingPolicy of the driver to limit the nodes. Hope this includes specifying the specific IPs so it does not auto discover.

Hector test example not working on Cassandra 0.7.4

I have set up my single node Cassandra 0.7.4 and started the service with
bin/cassandra -f. Now I am trying to use the Hector API (v. 0.7.0) to manage the
DB.
The Cassandra CLI works fine and I can create keyspaces and so on.
I tried to run the test example and create a single keyspace:
Cluster cluster = HFactory.getOrCreateCluster("TestCluster",
new CassandraHostConfigurator("localhost:9160"));
Keyspace keyspace = HFactory.createKeyspace("Keyspace1", cluster);
But all I get is this:
2011-04-14 22:20:27,469 [main ] INFO
me.prettyprint.cassandra.connection.CassandraHostRetryService
- Downed Host
Retry service started with queue size -1 and retry delay 10s
2011-04-14 22:20:27,492 [main ] DEBUG
me.prettyprint.cassandra.connection.HThriftClient -
Transport open status false
for client CassandraClient<localhost:9160-1>
....this again about 20 times
me.prettyprint.cassandra.service.JmxMonitor - Registering JMX
me.prettyprint.cassandra.service_TestCluster:ServiceType=hector,
MonitorType=hector
2011-04-14 22:20:27,636 [Thread-0 ] INFO
me.prettyprint.cassandra.connection.CassandraHostRetryService -
Downed Host
retry shutdown hook called
2011-04-14 22:20:27,646 [Thread-0 ] INFO
me.prettyprint.cassandra.connection.CassandraHostRetryService -
Downed Host
retry shutdown complete
Can you please tell me what I'm doing wrong?
Thanks
When you connect via the CLI, do you specify "-h localhost -p 9160"?
Can you actually do stuff on the command line with the above?
The error from HThriftClient indicates it could not connect to the Cassandra Daemon.
FTR, you would get responses much faster via hector-users#googlegroups.com
If you are on a linux machine, try starting up your cassandra server by this command:
/bin$ ./cassandra start -f
Then for the cli, use this command:
./cassandra-cli -h {hostname}/9160.
Then make sure that the configures are ok.