How to get dedicated Apache Kafka MirrorMaker 2.0 Rest API exposed - apache-kafka

I am trying to reach a dedicated MirrorMaker 2.0 cluster to see the status of connectors/tasks etc. On this README in their git Apache kafka people claims that when used with dedicated.mode.enable.internal.rest=true MirrorMaker nodes are starting with an internal listener port to communicate with each other.
My question is; Is there a way to advertise this port to outside so I can send curl requests to the dedicated MirrorMaker nodes as we do in general like curl http://localhost:8083/connectors to see the connectors running etc?
I have already tried multiple solutions I've found online they simply do not work. It seems to me this is impossible when you start mirrormaker 2.0 with ./bin/connect-mirror-maker. I know this is possible, If I add every single required connector manually to an existing Kafka Connect cluster, but thats not what I am looking for.
I am also curious if there is a way to add the dedicated MirrorMaker cluster connectors into a already running kafka connect cluster.
This is important because we would like to get curl responses to check tasks status for MirrorMaker.
Thanks.

You should be able to run connect-distributed like normal, have its REST API available, then configure and monitor MM2 without using its dedicated scripts. Similarly, this is how you'd add to other existing Connect cluster.
Ideally, you should monitor from JMX, instead, where you get count of the running tasks, not use curl. Or, add Jolokia or Prometheus JMX Exporter to run their own http server, then curl that, and grep for the tasks metric

Related

Accessing Kafka in Kubernetes from SvelteKit

I am building a SvelteKit application with Kafka Support. I have created a kafka.js file and tested the kafka support with a local kafka setup, which is successful. Now, When I replaced the topic name with a topic that is running in the kafka of our Kubernetes cluster, am not seeing any response.
How do I test this connection that is establishing between Kafka in Kubernetes cluster and the JS web application ? Any hints could be much helpful. Doing just console logs are not helpful so far because kafka itself not getting hit.
Any two pods deployed in the same namespace can communicate using local service names. So, do the brokers have a Service resource?
For example, assuming you are in namespace: default, and you have kafka-svc, then you'd setup bootstrap.servers: kafka-svc.svc.cluster.local.
You also need to configure Kafka's advertised.listeners. Related blog - https://strimzi.io/blog/2019/04/17/accessing-kafka-part-1/
But this requires NodeJS (or other language) backend, and not SvelteKit UI (which cannot connect to backend TCP server, only use some HTTP bridge). Your HTTP options would include some custom server, or Confluent REST Proxy, Strimzi Kafka Bridge, etc. But you were already told this.

Kafka - increase partition count of existing topic through Confluent REST Api

I need to add partitions to an existing Kafka topic.
I'm aware that it is possible to use the /bin/kafka-topics.sh script to achieve this, but I would prefer to do this through the Confluent REST api.
As far as I see there is no documented endpoint in the api reference, but I wonder if someone else here was able to make this work.
Edit: As it does seem to be impossible to use the REST api here, I wonder what the best practice is for adding partitions to an existing topic in a containerized setup. E.g. if there is a custom partioning scheme that maps customer ids to specific partitions. In this case the app container would need to adjust the partition count of the kafka container.
The Confluent REST Proxy has no such endpoint for topic update administration.
You would need to use the shell script or the corresponding AdminClient class that the shell script uses
This is the solution I ended up with:
Create a small http service that is is deployed within the kafka docker image
The http service accepts requests to increase the partition count and directs the requests to the kafka admin scripts (bin/kafka/kafka-topics.sh)
Something similar could have been achieved by using the AdminClient NewPartitions api in the Java Kafka lib. This solution has the advantage that the kafka docker image does not have to be changed, because the AdminClient can connect through network from another container.
For a production setup the AdminClient is preferrable, I decided for the integrated script approach because of the chosen language (rust).

Alternative of Confluent REST Proxy

We have some applications which want to communicate with Kafka using REST API calls to both consume and produce messages. If we do not want to use Confluent REST Proxy, what are the options ?
One possible alternative is the Strimzi Kafka Bridge (https://github.com/strimzi/strimzi-kafka-bridge).
It's part of the broader Strimzi project about running Kafka on Kubernetes but work even running as standalone (when your Kafka cluster is on bare metal).
Of course it's open source and Apache 2.0 licensed.
the reason [not to use it] is monetary
You can use the Confluent REST Proxy with no software/licensing costs.
We are thinking of not buying any additional hardware for this new request and use existing configuration to meet the requirement.I am mostly interested to know if consumer/producer can be created to meet this requirement
You don't need extra hardware.
Pick an existing server with at least 2GB available of memory, and run kafka-rest-start and see how well it works
if we can create Rest-API calls which will be used by other applications to consume data from Kafka and push data to Kafka
That's the main purpose of REST Proxy, yes.

Kafka Connector - distributed - load balancing tasks

I am running development environment for Confluent Kafka, Community edition on Windows, version 3.0.1-2.11.
I am trying to achieve load balancing of tasks between 2 instances of connector. I am running Kafka Zookepper, Server, REST services and 2 instance of Connect distributed on the same machine.
Only difference between properties file for connectors is rest port since they are running on the same machine.
I don't create topics for connector offsets, config, status. Should I?
I have custom code for sink connector.
When I create worker for my sink connector I do this by executing POST request
POST http://localhost:8083/connectors
toward any of the running connectors. Checking is there loaded worker is done at URL
GET http://localhost:8083/connectors
My sink connector has System.out.println() lines in code with which I can follow output of my code in the console log.
When my worker is running I can see that only one instance of connector is executing code. If I terminate one connector another instance will take over the worker and execution will resume. However this is not what I want.
My goal is that both connector instances are running worker code so that they can share the load between them.
I've tried to got over some open source connectors to see is there specifics in writing code of connectors but with no success.
I've made some different attempts to tackle this problem but with no success.
I could rewrite my business code to come around this but I'm pretty sure I'm missing on something not obvious for me.
Recently I commented on Robin Moffatt's answer of this question.
From the sounds of it your custom code is not correctly spawning the number of tasks that you are expecting.
Make sure that you've set tasks.max >1 in your config
Make sure that your connector is correctly creating the appropriate number of tasks to taskConfigs
References:
https://opencredo.com/blogs/kafka-connect-source-connectors-a-detailed-guide-to-connecting-to-what-you-love/
https://docs.confluent.io/current/connect/devguide.html
https://enfuse.io/a-diy-guide-to-kafka-connectors/

Kafka connect cluster setup or launching connect workers

I am going through kafka connect, and i am trying to get the concepts.
Let us say I have kafka cluster (nodes k1, k2 and k3) setup and it is running, now i want to run kafka connect workers in different nodes say c1 and c2 in distributed mode.
Few questions.
1) To run or launch kafka connect in distributed mode I need to use command ../bin/connect-distributed.sh, which is available in kakfa cluster nodes, so I need to launch kafka connect from any one of the kafka cluster nodes? or any node from where I launch kafka connect needs to have kafka binaries so that i will be able to use ../bin/connect-distributed.sh
2) I need to copy the my connector plugins to any kafka cluster node( or to all cluster nodes?) from where I do the step 1?
3) how does kafka copies these connector plugins to worker node before starting jvm process on the worker node? because the plugin is the one which has my task code and it needs to be copied to worker in order to start the process in worker.
4) Do i need to install anything in connect cluster nodes c1 and c2, like need to install java or any kafka connect related?
5) In some places it says use confluent platform but i would like to start it with apache kafka connect alone first.
can some one please through some light or even pointer to some resources would also help.
Thank you.
1) In order to have a highly available kafka-connect service you need to run at least two instances of connect-distributed.sh on two distinct machines that have the same group.id. You can find more details regarding the configuration of each worker here. For improved performance, Connect should be ran independently of the broker and Zookeeper machines.
2) Yes, you need to place all your connectors under plugin.path (normally under /usr/share/java/) on every machine that you are planning to run kafka-connect.
3) kafka-connect will load the connectors on startup. You don't need to handle this. Note that if your kafka-connect instance is running and a new connector is added, you need to restart the service.
4) You need to have Java installed on all your machines. For Confluent Platform particularly:
Java 1.7 and 1.8 are supported in this version of Confluent Platform
(Java 1.9 is currently not supported). You should run with the
Garbage-First (G1) garbage collector. For more information, see the
Supported Versions and Interoperability.
5) It depends. Confluent was founded by the original creators of Apache Kafka and it comes as a more complete distribution adding schema management, connectors and clients. It also comes with KSQL which is quite useful if you need to act on certain events. Confluent simply adds on top of the Apache Kafka distribution, it's not a modified version.
Answer given by Giorgos is correct. I ran few connectors and now I understand it better.
I am just trying to put it differently.
In Kafka connect there are two things involved one is Worker and second is connector.Below is on details about running distributed Kafka connect.
Kafka connect Worker is a Java process on which the connector/connect task will run. So first thing is we need to launch worker, to run/launch a worker we need java installed on that machine then we need Kafka connect related sh/bat files to launch worker and kafka libs which will be used by kafka connect worker, for this we will just simply copy/install Kafka in the worker machine, also we need to copy all the connector and connect-task related jars/dependencies in "plugin.path" as defined in the below worker properties file, now worker machine is ready, to start worker we need to invoke ./bin/connect-distributed.sh ./config/connect-distributed.properties, here connect-distributed.properties will have configuration for worker. The same thing has to be repeated in each machine where we need to run Kafka connect.
Now the worker java process is running in all machines, the woker config will have group.id property, the workers which have this same property value will be forming a group/cluster of workers.
Each worker process will expose rest endpoint (default http://localhost:8083/connectors), to launch/start a connector on the running workers, we need do http-post a connector config json, based on the given config the worker will start the connector and the number of tasks in the above group/cluster workers.
Example: Connect post,
curl -X POST -H "Content-Type: application/json" --data '{"name": "local-file-sink", "config": {"connector.class":"FileStreamSinkConnector", "tasks.max":"3", "file":"test.sink.txt", "topics":"connect-test" }}' http://localhost:8083/connectors