Spring Cloud Stream not throwing exception when ZooKeeper is down - apache-kafka

I'm using Kafka with the ZooKeeper that comes with the Kafka bundle. I'm also using Spring Cloud Stream with the Kafka binder.
I wanted to see what happens if ZooKeeper is down while the Kafka broker is still running. I sent some items to the Kafka topic via a Spring Cloud Stream source. To my surprise, instead of getting an exception, Spring Cloud Stream reported success, yet my items never arrived on the Kafka topic. Is this the expected behavior or a bug? Is there something I can configure to get an exception back if ZooKeeper is down?

Try the Kafka producer sync=true property.
I'm afraid the broker can still be reached for metadata, but the record send itself is done asynchronously by default.
So, to fully track down network problems, we have to switch to sync mode.
http://docs.spring.io/spring-cloud-stream/docs/Chelsea.SR2/reference/htmlsingle/index.html#_kafka_producer_properties
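With the Kafka binder this is a per-binding producer property; a minimal sketch, assuming your output binding is named output (substitute your own binding name):

spring.cloud.stream.kafka.bindings.output.producer.sync=true

With sync=true the binder waits for the broker's acknowledgement of each send, so a failed send surfaces as an exception in the producing code instead of disappearing into the asynchronous callback.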

Related

Why are there kafka and broker entries in the SCDF trace shown in Zipkin?

I have a Spring Cloud Data Flow environment created in Kubernetes, and a Zipkin environment as well. But when I look at the Dependencies view in Zipkin, I see that in addition to the applications that exist in the stream, there is also a broker and kafka.
Can anyone tell me why this is? And is there any way I can get broker and kafka to not show up?
That's because one of the brokers got resolved as being Kafka (for example, via special message headers) and the other didn't. It's either a bug in Sleuth or you're using an uninstrumented library.

How to call a Spring Boot service endpoint from Apache Kafka and pass all the messages from a topic?

I am new to Kafka and I have a requirement to fetch all the real-time data from a DB and pass it to a Spring Boot microservice for processing. In my analysis I found that Apache Kafka with a Kafka source connector can pull all the real-time data from the DB into Kafka topics.
Can someone tell me whether there is any way to pick up this data from the Kafka topics and pass it to the microservice by triggering a REST call from Kafka?
The idea is that whenever a new entry is added to the database table, Kafka Connect pulls it into a topic, and Kafka should then somehow call the microservice and hand over this new entry. Is that possible with Kafka?
Database --> Kafka Connect --> Kafka (topic) --> some service that calls the microservice --> microservice
pick up this data from the Kafka topics and pass it to the microservice
Kafka doesn't push. You would add a consumer in your service to pull from Kafka, perhaps using spring-kafka or spring-cloud-stream.
Alternatively, a Kafka Connect sink could be used with an HTTP POST connector, but then you'd need to somehow deal with not committing offsets for messages whose requests have failed.
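A minimal sketch of the pull-based approach suggested above, using spring-kafka with Spring Boot auto-configuration; the topic name, group id, endpoint URL, and class names are made up for illustration:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

@Component
public class DbChangeForwarder {

    private final RestTemplate restTemplate = new RestTemplate();

    // Pulls each record that Kafka Connect wrote to the topic and POSTs it
    // to the Spring Boot microservice for processing.
    @KafkaListener(topics = "db-changes", groupId = "db-change-forwarder")
    public void onMessage(String payload) {
        restTemplate.postForEntity("http://localhost:8080/api/items", payload, Void.class);
    }
}

This listener plays the role of the "some service that calls the microservice" box in the flow above; Kafka itself never initiates the REST call.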

Kafka 2.0 - Kafka Connect Sink - Creating a Kafka Producer

We are currently on HDF (Hortonworks DataFlow) 3.3.1, which bundles Kafka 2.0.0, and are trying to use Kafka Connect in distributed mode to launch a Google Cloud Pub/Sub sink connector.
We are planning on sending some metadata back into a Kafka topic, and need to integrate a Kafka producer into the flush() function of the sink task Java code.
Would this have a negative impact on the process where Kafka Connect commits the offsets back to Kafka (since we would be adding the overhead of running a Kafka producer before the flush)?
Also, how does Kafka Connect get the bootstrap servers list from the configuration when it is not specified in the connector properties for either the sink or the source? I need to use the same bootstrap server list to start the producer.
Currently I am changing the config for the sink connector, adding the bootstrap server list as a property and parsing it in the Java code of the connector. I would like to use the bootstrap server list from the Kafka Connect worker properties if that is possible.
Kindly help on this.
Thanks in advance.
need to integrate a Kafka producer into the flush() function of the sink task Java code
There is no producer instance exposed in the SinkTask API...
Would this have a negative impact on the process where Kafka Connect commits the offsets back to Kafka (since we would be adding the overhead of running a Kafka producer before the flush)?
I mean, you can add whatever code you want. As far as negative impacts go, that's up to you to benchmark on your own infrastructure. Obviously, adding more blocking code makes the other processes slower overall.
how does Kafka Connect get the bootstrap servers list from the configuration when it is not specified in the connector properties for either the sink or the source?
Sinks and sources are not workers. Look at connect-distributed.properties
I would like to use the bootstrap server list from the Kafka Connect worker properties if that is possible
It's not possible. Adding extra properties to the sink/source configs is the only way. (Feel free to open a Kafka JIRA requesting that the worker configs be exposed, though.)
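For what it's worth, here is a sketch of the workaround described in the question: the bootstrap list is passed as an extra connector property (the property name metadata.bootstrap.servers and the topic name are made up) and used to build a producer inside the sink task:

import java.util.Collection;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class PubSubMetadataSinkTask extends SinkTask {

    private KafkaProducer<String, String> producer;

    @Override
    public void start(Map<String, String> props) {
        // The connector config has to carry the bootstrap list explicitly,
        // since the worker's own config is not exposed to the task.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", props.get("metadata.bootstrap.servers"));
        producerProps.put("key.serializer", StringSerializer.class.getName());
        producerProps.put("value.serializer", StringSerializer.class.getName());
        producer = new KafkaProducer<>(producerProps);
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        // Deliver the records to Google Cloud Pub/Sub here (omitted), then
        // publish whatever metadata is needed back to a Kafka topic.
        for (SinkRecord record : records) {
            producer.send(new ProducerRecord<>("connector-metadata", String.valueOf(record.value())));
        }
    }

    @Override
    public void stop() {
        if (producer != null) {
            producer.close();
        }
    }

    @Override
    public String version() {
        return "0.1";
    }
}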

Spring Cloud Stream Kafka Binder autoCommitOnError=false gives unexpected behavior

I am using Spring Boot 2.1.1.RELEASE and Spring Cloud Greenwich.RC2, and the managed version of spring-cloud-stream-binder-kafka is 2.1.0.RC4. The Kafka version is 1.1.0. I have set the following properties, as the messages should not be consumed if there is an error.
spring.cloud.stream.bindings.input.group=consumer-gp-1
...
spring.cloud.stream.kafka.bindings.input.consumer.autoCommitOnError=false
spring.cloud.stream.kafka.bindings.input.consumer.enableDlq=false
spring.cloud.stream.bindings.input.consumer.max-attempts=3
spring.cloud.stream.bindings.input.consumer.back-off-initial-interval=1000
spring.cloud.stream.bindings.input.consumer.back-off-max-interval=3000
spring.cloud.stream.bindings.input.consumer.back-off-multiplier=2.0
....
There are 20 partitions in the Kafka topic and Kerberos is used for authentication (not sure if this is relevant).
The Kafka consumer calls a web service for every message it processes, and if the web service is unavailable I expect the consumer to try to process the message 3 times before moving on to the next message. So for my test I disabled the web service, and therefore none of the messages could be processed correctly. From the logs I can see that this is happening.
After a while I stopped and then restarted the Kafka consumer (the web service was still disabled). I was expecting that after the restart the consumer would attempt to process the messages that were not successfully processed the first time around. From the logs (I printed out each message with its fields), I couldn't see this happening after the restart. I thought the partitions might be influencing something, but I checked the logs and all 20 partitions were assigned to this single consumer.
Is there a property I have missed? I thought the expected behavior when I restarted the consumer the second time was that the Kafka broker would deliver the records that were not successfully processed to the consumer again.
Thanks
Parameters working as expected. See comment.
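For reference, a minimal listener matching the configuration above, using the annotation model of that Spring Cloud Stream generation; the class name and the web-service call are placeholders. An exception thrown from the handler is what triggers the configured retry and back-off:

import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Sink;

@EnableBinding(Sink.class)
public class InputConsumer {

    @StreamListener(Sink.INPUT)
    public void handle(String payload) {
        // If this call fails, the thrown exception makes the binder retry the
        // same message up to max-attempts (3 here) with the configured back-off.
        callWebService(payload);
    }

    private void callWebService(String payload) {
        // Placeholder for the real web-service call.
        throw new IllegalStateException("web service unavailable");
    }
}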

Kafka Streams application stops working after no messages have been read for a while

I have noticed that my Kafka Streams application stops working when it has not read new messages from the Kafka topic for a while. This is the third time I have seen this happen.
No messages have been produced to the topic for 5 days. My Kafka Streams application, which also hosts a spark-java web server, is still responsive. However, the messages I now produce to the Kafka topic are not being read by Kafka Streams anymore. When I restart the application, all messages are fetched from the broker.
How can I make my Kafka Streams application more resilient to this kind of scenario? It feels as if Kafka Streams has an internal "timeout" after which it closes the connection to the Kafka broker when no messages have been received. I could not find such a setting in the documentation.
I use Kafka 1.1.0 and Kafka Streams 1.0.0.
Kafka Streams does not have an internal timeout that controls when to permanently close a connection to the Kafka broker; the Kafka broker, on the other hand, does have a timeout for closing idle connections from clients. But Streams will keep trying to reconnect once it has some processed result data that is ready to be sent to the brokers, so I'd suspect your observed issue comes from some other cause.
Could you share a sketch of your application topology and the config properties you used, so I can better understand your issue?
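If it helps with the diagnosis: the client-side counterpart of the broker's idle-connection timeout is connections.max.idle.ms. Below is a sketch of raising it for the clients Kafka Streams creates internally; the application id, bootstrap address, and timeout values are arbitrary, and this is only a guess at the cause, not a confirmed fix:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsIdleTimeoutSketch {

    public static Properties streamsProperties() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Raise the idle-connection timeout on the embedded consumer and producer.
        props.put(StreamsConfig.consumerPrefix("connections.max.idle.ms"), "540000");
        props.put(StreamsConfig.producerPrefix("connections.max.idle.ms"), "540000");
        return props;
    }
}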