Any way to know when a project has finished in Apache Storm? - apache-kafka

I have a question about Apache Storm. My current application has multiple projects that run continuously to pull the latest data from other services. When a project completes, it is started again after a configurable interval to refresh its data.
We're moving to Apache Storm and Kafka for the new version. Our design puts project metadata into Kafka; Storm then reads it from Kafka and starts the projects. However, because many bolts process the data, we haven't found a way to detect when a project has finished updating its data. We need that status so a scheduler can know the project is done and put it back into Kafka after a specific interval.
So my question is: with a design like this, is there any way to track project status and know when a project has finished in Storm? Thanks in advance.
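One possible approach, sketched here under stated assumptions rather than taken from an answer: Storm's guaranteed-processing mechanism calls a spout's ack(msgId) once a tuple and every tuple anchored to it downstream have been acked, and fail(msgId) if any part of that tuple tree fails or times out. If the spout knows how many root tuples it emits per project, and every bolt anchors its emits to its input, counting those callbacks per project tells you when the project is done. The class below is a hypothetical, Storm-independent helper; ProjectCompletionTracker, start, acked and failed are illustrative names, not Storm API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * Hypothetical helper (not part of the Storm API): tracks how many root tuples
 * of each project are still in flight and reports when a project is finished.
 * Meant to be called from a spout's nextTuple()/ack()/fail(), which Storm runs
 * on a single thread, so no synchronization is used.
 */
class ProjectCompletionTracker {
    private static final class Progress {
        int pending; // root tuples not yet acked or failed
        int failed;  // root tuples whose tuple tree failed or timed out
    }

    private final Map<String, Progress> inFlight = new HashMap<>();
    private final BiConsumer<String, Boolean> onProjectDone;

    /** onProjectDone receives (projectId, true if every root tuple was acked). */
    ProjectCompletionTracker(BiConsumer<String, Boolean> onProjectDone) {
        this.onProjectDone = onProjectDone;
    }

    /** Register a project before emitting its rootTuples root tuples. */
    void start(String projectId, int rootTuples) {
        if (rootTuples <= 0) {
            throw new IllegalArgumentException("rootTuples must be positive");
        }
        if (inFlight.containsKey(projectId)) {
            throw new IllegalStateException(projectId + " is already running");
        }
        Progress p = new Progress();
        p.pending = rootTuples;
        inFlight.put(projectId, p);
    }

    /** Call from the spout's ack(msgId) after extracting the project id. */
    void acked(String projectId) {
        settle(projectId, false);
    }

    /** Call from the spout's fail(msgId); this sketch does not replay failed tuples. */
    void failed(String projectId) {
        settle(projectId, true);
    }

    boolean isRunning(String projectId) {
        return inFlight.containsKey(projectId);
    }

    private void settle(String projectId, boolean tupleFailed) {
        Progress p = inFlight.get(projectId);
        if (p == null) {
            return; // unknown project, or one that already finished
        }
        if (tupleFailed) {
            p.failed++;
        }
        p.pending--;
        if (p.pending == 0) {
            inFlight.remove(projectId);
            onProjectDone.accept(projectId, p.failed == 0);
        }
    }
}
```

The onProjectDone callback is where the scheduler hook would go, for example a ScheduledExecutorService task that publishes the project metadata back to Kafka after the configured interval. One caveat: a tuple tree that is not fully acked within topology.message.timeout.secs (30 seconds by default) is failed, so long-running projects may need that timeout raised.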

Related

Integration tests for spring kafka producer and consumer

I want to write tests for a Spring Kafka producer and consumer. I have tried multiple approaches:
EmbeddedKafka annotation
EmbeddedKafkaRule
EmbeddedKafkaBroker
etc...
Every time I get one error or another, and none of the examples posted on GitHub seem to run at all. I checked the Spring Kafka versions for compatibility as well.
Can someone share a recently written example code base that they have seen run successfully?
There are hundreds of tests in the framework itself.
This is probably the most extensive one...
https://github.com/spring-projects/spring-kafka/blob/1b9a9451feea7cca16903f1c990c74c6be9b8ffb/spring-kafka/src/test/java/org/springframework/kafka/annotation/EnableKafkaIntegrationTests.java#L164-L176

Trying to run the Qubole s3-sqs connector for Spark Structured Streaming

I am trying to follow https://github.com/qubole/s3-sqs-connector and load the connector, but it seems the connector is not available on Maven, and when I build it manually the SQS classes are not loaded.
Can someone guide me on this?
Thanks,
Dipesh
Update:
The S3-SQS connector has now been released on Maven Central.
https://mvnrepository.com/artifact/com.qubole/spark-sql-streaming-sqs
You should be able to use it now. Get back to us if you face any problems.
Unfortunately, registration of the s3-sqs-connector in spark-packages is still pending. We'll try to release the package to Maven Central directly as soon as possible, but it will take a few days at the very least.
In the meantime, you can clone the repository and build the jar yourself if required.

Upgrading Kafka client from 0.8.2.0 to 0.11.0.0

At my company we are currently migrating from Kafka 0.8 to 0.11; the broker migration steps are clearly stated in the Kafka documentation.
Where I am stuck is upgrading the Kafka clients (producers, consumers, Spark Streaming). I can't find any documentation or articles that clearly list the required changes or steps for upgrading the clients; all I found is the Javadoc for the producer client.
So far I have changed the Kafka client version in my Gradle build to kafka-clients-0.11.0.0, and everything compiled fine with no code changes at all.
What I need help with: are there any known problems I should watch out for, or any client changes required other than the kafka-clients version?
I went through lots of experiments to get this done.
For the consumers and producers, I just used the Kafka 0.11.0 consumers and producers.
The tricky part was replacing Spark Streaming: the latest Spark Streaming version only supports up to Kafka 0.10.x, which doesn't contain any of the updates related to the new broker.
What I recommend: if you are about to write an application from scratch and your main goal is real-time streaming, go for the Kafka Streams API; it is just AWESOME! If you already have a Spark Streaming app (which was my case), you have to judge which matters more to you: staying on Kafka 0.10.x with Spark Streaming, whose Kafka integration was [experimental][1] by the way, or moving the streaming to Kafka Streams.
The benefits of running the streaming inside Kafka rather than Spark are the following:
Kafka Streams is a normal JAR that can be embedded in any Java application, so you don't have to worry much about deployment and environment.
Auto-scaling is very easy with Kafka Streams using any scale set offered by a cloud service provider, unlike scaling an HDP cluster.
Monitoring with something like Prometheus would be much easier.
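To illustrate the "normal JAR" point, here is a minimal Kafka Streams application. It is a sketch assuming the org.apache.kafka:kafka-streams dependency (the StreamsBuilder API available from Kafka 1.0 onward) and a broker at localhost:9092; the topic names and the application id are hypothetical placeholders. The whole stream processor is an ordinary main() method, so it can be packaged and deployed like any other Java process.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;

class UppercaseApp {
    // Hypothetical topic names used only for illustration.
    static final String INPUT_TOPIC = "input-topic";
    static final String OUTPUT_TOPIC = "output-topic";

    /** Reads strings from INPUT_TOPIC, upper-cases the values, writes them to OUTPUT_TOPIC. */
    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream(INPUT_TOPIC)
                .mapValues(value -> value.toUpperCase())
                .to(OUTPUT_TOPIC);
        return builder.build();
    }

    static Properties config(String bootstrapServers) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        return props;
    }

    public static void main(String[] args) {
        KafkaStreams streams = new KafkaStreams(buildTopology(), config("localhost:9092"));
        // Close the streams client cleanly when the JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```

Starting another copy of the same program with the same application.id makes Kafka Streams rebalance the input partitions across the running instances, which is what makes scale-set style auto-scaling straightforward.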

How to update running kafka connector

I have Kafka Connect running in a Marathon container. If I want to update the connector plugin (JAR), I have to upload the new one and then restart the Connect task.
Is it possible to do that without restarting/downtime?
The updated JAR for the connector plugin needs to be added to the classpath, and then the worker's classloader needs to pick it up. Currently, the best way to do this is to take an outage.
Depending on your connector, you might be able to do rolling upgrades, but the generic answer is that if you need to upgrade the connector plugin, you currently have to take an outage.

How to run kafka producer in eclipse?

I am new to kafka.
I have downloaded Kafka 2.9.2-0.8.1.1. I have started ZooKeeper, the broker, a producer and a consumer from the command prompt. I successfully created a topic and sent a message.
Now I want to run the producer from Eclipse, but I don't know how to do that. I found some links like http://vulab.com/blog/?p=611 but I am still not able to run it. Is the process described in the link correct? Do I really need to create a Maven project in Eclipse, or is there another way to do it?
You need to use the Kafka Java API to create a Kafka producer (if you intend to use Java). Using Maven is highly recommended, as it manages all the dependencies for you, but you are free to bypass Maven if you are ready to manage all the required JARs yourself.
The Kafka wiki is another good place to look.
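A minimal producer with the Java client might look like the sketch below. It assumes the org.apache.kafka:kafka-clients artifact on the classpath (for example as a Maven dependency in an Eclipse project), a broker on localhost:9092 and a hypothetical topic named test. Note that this KafkaProducer client shipped with Kafka 0.8.2 and flush() arrived in 0.9, so it is newer than the 0.8.1.1 release in the question. sendMessages takes the Producer interface so it can be exercised with MockProducer without a running broker.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

class SimpleProducer {
    /** Minimal producer settings: broker address plus String serializers for key and value. */
    static Properties config(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        return props;
    }

    /** Sends `count` keyed messages and blocks until they have been sent. */
    static void sendMessages(Producer<String, String> producer, String topic, int count) {
        for (int i = 0; i < count; i++) {
            producer.send(new ProducerRecord<>(topic, "key-" + i, "message " + i));
        }
        producer.flush();
    }

    public static void main(String[] args) {
        // try-with-resources closes the producer, releasing its network threads.
        try (Producer<String, String> producer = new KafkaProducer<>(config("localhost:9092"))) {
            sendMessages(producer, "test", 3);
        }
    }
}
```

Running main() from Eclipse is then just "Run As > Java Application"; the console consumer started from the command prompt should print the three messages.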