How to stop a sinkTask in Kafka? - apache-kafka

I am running a sink task using connect-standalone.sh and connect-standalone.properties. I am doing this from a shell script and I am not sure how to stop the sink task once the data has been consumed by the consumer.
I tried various settings in the properties file, such as connections.max.idle.ms=5000, but nothing stops the sink.
I don't want to try distributed mode as it requires REST API calls. Any suggestion on how to stop the sink task once there are no more messages from the producer?

When running in standalone mode, the only way to stop a connector is to stop the Connect process you started with connect-standalone.sh.
If you need to start and stop connectors often, I'd recommend reconsidering distributed mode, as it lets you manage the life cycle of connectors through the REST API.
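For example, assuming a Connect worker with its REST API on localhost:8083 and a connector named my-sink (both hypothetical names), pausing or deleting the connector is just an HTTP call; a rough Java 11 sketch:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectorLifecycle {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Pause the connector's tasks without removing its configuration.
        HttpRequest pause = HttpRequest.newBuilder(
                URI.create("http://localhost:8083/connectors/my-sink/pause"))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
        System.out.println(client.send(pause, HttpResponse.BodyHandlers.ofString()).statusCode());

        // Remove the connector entirely.
        HttpRequest delete = HttpRequest.newBuilder(
                URI.create("http://localhost:8083/connectors/my-sink"))
                .DELETE()
                .build();
        System.out.println(client.send(delete, HttpResponse.BodyHandlers.ofString()).statusCode());
    }
}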

Related

How to add health check for topics in KafkaStreams api

I have a critical Kafka application that needs to be up and running all the time. The source topics are created by a Debezium Kafka Connect connector for the MySQL binlog. Unfortunately, many things can go wrong with this setup. A lot of the time the Debezium connectors fail and need to be restarted, and then so do my apps (without throwing any exception they just hang and stop consuming). My manual way of testing and discovering the failure is to check the Kibana logs and then consume the suspicious topic through the terminal. I can mimic this in code, but that is obviously not best practice. I wonder if there is anything in the Kafka Streams API that allows me to do such a health check, and to check other parts of the Kafka cluster?
Another point that bothers me is whether I can keep the stream alive and rejoin the topics once the connectors are up again.
You can check the Kafka Streams state to see whether it is rebalancing/running, which would indicate healthy operation. However, if no data is getting into the topology, I would assume no errors would surface there, so you then need to look up the health of your upstream dependencies.
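For example, a minimal sketch of exposing that state for a health check (assuming a kafka-streams dependency; the topology and configuration are omitted here):

import org.apache.kafka.streams.KafkaStreams;

public class StreamsHealth {

    // Register before calling streams.start(); flags fatal transitions.
    public static void watch(KafkaStreams streams) {
        streams.setStateListener((newState, oldState) -> {
            if (newState == KafkaStreams.State.ERROR) {
                System.err.println("Streams instance entered ERROR state");
            }
        });
    }

    // Poll the current state, e.g. from an HTTP health endpoint.
    public static boolean isHealthy(KafkaStreams streams) {
        KafkaStreams.State state = streams.state();
        return state == KafkaStreams.State.RUNNING
                || state == KafkaStreams.State.REBALANCING;
    }
}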
Overall, it sounds like you might want to invest some time in monitoring tools like Consul or Sensu, which can run local service health checks and send out alerts when services go down, or at the very least Elasticsearch alerting.
As far as Kafka health checking goes, you can do that in several ways:
Are the broker and ZooKeeper processes running? (SSH to the node and check the processes)
Are the broker and ZooKeeper ports open? (use a socket connection)
Are there important JMX metrics you can track? (Metricbeat)
Can you find an active controller broker? (use AdminClient#describeCluster; see the sketch after this list)
Do the minimum number of brokers you require show up in the cluster metadata? (also obtainable from AdminClient)
Do the topics you use have the proper configuration (retention, min.insync.replicas, replication factor, partition count, etc.)? (again, use AdminClient)
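For the AdminClient checks, a rough sketch (the bootstrap address and the minimum broker count below are just placeholders) could look like this:

import java.util.Properties;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;

public class BrokerHealthCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, "5000");

        try (AdminClient admin = AdminClient.create(props)) {
            DescribeClusterResult cluster = admin.describeCluster();
            int brokerCount = cluster.nodes().get(5, TimeUnit.SECONDS).size();
            String controllerId = cluster.controller().get(5, TimeUnit.SECONDS).idString();
            // "3" is just an example of the minimum you might require
            boolean healthy = brokerCount >= 3;
            System.out.printf("brokers=%d controller=%s healthy=%b%n",
                    brokerCount, controllerId, healthy);
        }
    }
}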

Standalone Kafka Producer

I'm thinking about creating a standalone Kafka producer that runs as a daemon, takes messages via a socket, and sends them reliably to Kafka.
But I can't be the first one to think of this idea. The idea is to avoid writing a Kafka producer in, for example, PHP or Node, and instead deliver messages via a socket to a standalone daemon that takes care of delivery while the main application keeps doing its thing.
This daemon should take care of retrying delivery in case of outages and act as a single delivery point for all programs that run on the server.
Is this a good idea, or is writing producers in every language used the common approach? Surely that can't be the case, right?
You should have a look at Kafka connectors.
Here is one of them:
Kafka Connect Socket Source
Here you can find how to use it:
https://www.baeldung.com/kafka-connectors-guide
Sample Configuration connect-socket-source.properties:
name=socket-connector
connector.class=org.apache.kafka.connect.socket.SocketSourceConnector
tasks.max=1
topic=topic
schema.name=socketschema
port=12345
batch.size=100
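With the connector listening on port 12345, a client application only needs to open a socket and write its messages; for illustration, a hypothetical Java client might look like the following (the exact wire format expected depends on the connector implementation):

import java.io.PrintWriter;
import java.net.Socket;

public class SocketClient {
    public static void main(String[] args) throws Exception {
        // Connect to the port the connector listens on (12345 in the config above)
        // and write a message; the connector forwards it to the configured topic.
        try (Socket socket = new Socket("localhost", 12345);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            out.println("{\"event\":\"signup\",\"user\":42}");
        }
    }
}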

Kafka Connector - distributed - load balancing tasks

I am running a development environment for Confluent Kafka, Community edition, on Windows, version 3.0.1-2.11.
I am trying to achieve load balancing of tasks between 2 instances of a connector. I am running Kafka ZooKeeper, the broker, the REST services, and 2 instances of Connect distributed on the same machine.
The only difference between the properties files for the two Connect instances is the REST port, since they are running on the same machine.
I don't create the topics for connector offsets, config, and status. Should I?
I have custom code for a sink connector.
I create the connector by executing a POST request
POST http://localhost:8083/connectors
against either of the running Connect instances. Checking whether the connector is loaded is done at
GET http://localhost:8083/connectors
My sink connector has System.out.println() lines in the code, so I can follow its output in the console log.
When the connector is running I can see that only one Connect instance is executing the code. If I terminate that instance, the other one takes over and execution resumes. However, this is not what I want.
My goal is for both instances to run the connector code so that they can share the load between them.
I've gone over some open-source connectors to see if there are any specifics in how connector code should be written, but without success.
I've made several different attempts to tackle this problem, also without success.
I could rewrite my business code to work around this, but I'm pretty sure I'm missing something that isn't obvious to me.
Recently I commented on Robin Moffatt's answer to this question.
From the sounds of it, your custom code is not spawning the number of tasks you are expecting.
Make sure that you've set tasks.max > 1 in your config.
Make sure that your connector creates the appropriate number of task configurations in taskConfigs() (a sketch follows below).
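As a rough illustration (the class and property names here are hypothetical, not your actual code), a sink connector that lets the framework spread work over several tasks could look like this:

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.sink.SinkConnector;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class MySinkConnector extends SinkConnector {

    private Map<String, String> connectorProps;

    @Override
    public void start(Map<String, String> props) {
        connectorProps = props; // keep the connector-level config around
    }

    @Override
    public Class<? extends Task> taskClass() {
        return MySinkTask.class;
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // maxTasks is capped by tasks.max; returning one config per task lets
        // the Connect framework spread the tasks over the available workers.
        List<Map<String, String>> configs = new ArrayList<>();
        for (int i = 0; i < maxTasks; i++) {
            Map<String, String> taskConfig = new HashMap<>(connectorProps);
            taskConfig.put("task.id", Integer.toString(i)); // illustrative only
            configs.add(taskConfig);
        }
        return configs;
    }

    @Override
    public void stop() {
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public String version() {
        return "1.0";
    }

    // Minimal task so the example compiles; your real task does the sink work.
    public static class MySinkTask extends SinkTask {
        @Override public String version() { return "1.0"; }
        @Override public void start(Map<String, String> props) { }
        @Override public void put(Collection<SinkRecord> records) { }
        @Override public void stop() { }
    }
}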
References:
https://opencredo.com/blogs/kafka-connect-source-connectors-a-detailed-guide-to-connecting-to-what-you-love/
https://docs.confluent.io/current/connect/devguide.html
https://enfuse.io/a-diy-guide-to-kafka-connectors/

Automatically deploy Kafka Stream app

I'm just starting with Kafka and Kafka Streams applications. I wrote a Kafka Streams app that consumes from one topic, processes the messages, and sends them to another topic.
To the best of my knowledge, the only ways I have found to run this Kafka Streams app are:
Run the Java class from the IDE.
Generate a *.jar file and run it from the prompt.
I would like to know if there is any way to run Kafka Streams applications automatically on Kafka server startup. For example: copy the *.jar file to some folder of my Kafka installation, and have the stream app start automatically when I start my Kafka server.
Your Kafka broker (server) and your Kafka Streams application are independent of one another. You can start them however you manage processes on your server, whether that's something like init.d or systemd, or container-based solutions like Docker or Kubernetes.
In my experience, if your Streams application starts well before your broker or ZooKeeper, it may time out waiting for them to come online, so you may need to configure the Streams process to restart in that situation.

How to start kafka server programmatically

We are trying to start the Kafka server (ZooKeeper and broker) programmatically in our application. Is there any API or library available for this?
Yes, you can use embedded Kafka, which will run ZooKeeper and the Kafka server for you. It is generally used for testing Kafka producers/consumers when there is no need to run them explicitly.
For more detail, refer to the embedded Kafka documentation.
To run it, you call EmbeddedKafka.start() at the start and EmbeddedKafka.stop() at the end.
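The EmbeddedKafka.start()/EmbeddedKafka.stop() calls above appear to come from the Scala embedded-kafka library; as an alternative assumption, a comparable Java sketch using spring-kafka-test's EmbeddedKafkaBroker (one option among several) would be:

import org.springframework.kafka.test.EmbeddedKafkaBroker;

public class EmbeddedKafkaRunner {
    public static void main(String[] args) throws Exception {
        // Starts an embedded ZooKeeper plus one Kafka broker on port 9092.
        EmbeddedKafkaBroker broker = new EmbeddedKafkaBroker(1).kafkaPorts(9092);
        broker.afterPropertiesSet(); // start

        // ... run your producers/consumers against broker.getBrokersAsString() ...

        broker.destroy(); // stop
    }
}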