Kafka Connect: Getting connector configuration

I have been testing with Kafka Connect, but for every connector I have to go and read the connector's documentation to understand the configuration it needs. As far as I can tell from the Kafka Connect API documentation, there are two APIs that return connector-related data:
GET /connector-plugins - returns a list of connector plugins installed in the Kafka Connect cluster. Note that the API only checks for connectors on the worker that handles the request, which means you may see inconsistent results, especially during a rolling upgrade if you add new connector jars.
PUT /connector-plugins/{connector-type}/config/validate - validates the provided configuration values against the configuration definition. This API performs per-config validation, returning suggested values and error messages during validation.
The rest of the APIs relate to connectors that have already been created. Is there any way to get the configuration for the required connectors?

Is there any way to get the configuration for the required connectors?
The validate endpoint does exactly that, and is what the Landoop Kafka Connect UI uses to provide errors for missing/misconfigured properties.
The implementation details of how properties become required depend on the Importance level assigned to each entry in the connector's configuration definition; for any non-high-importance configs, referring to the documentation or source code (if available) would be best.
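As a minimal sketch of the validate call, assuming a worker on localhost:8083 and the built-in FileStreamSinkConnector (both illustrative): the request body is the candidate config itself and must include connector.class.

# Validate a (deliberately incomplete) config against the plugin's
# configuration definition; the response lists every config key with
# its importance, suggested values, and any validation errors.
curl -s -X PUT \
  -H "Content-Type: application/json" \
  --data '{"connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector", "topics": "my-topic"}' \
  http://localhost:8083/connector-plugins/FileStreamSinkConnector/config/validate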

Related

Kafka Connect - connectors stop after no data for some time

I am running Kafka Connect in distributed mode on Kubernetes with 3 sink connectors, Kafka -> S3.
When data flows into Kafka and at least one of the connectors has data to read, everything works fine.
But during periods when there is no data to read, for a few hours for example, and none of the connectors needs to read any data, all the connectors stop (the /connectors endpoint on the REST API shows an empty list). So when new data eventually comes in, it is not read unless the connectors are manually restarted.
Is this common behavior or am I missing something? I can add additional information about the setup if needed.
Based on the comments, your config.storage.topic was not created with cleanup.policy=compact, so Kafka's retention settings are deleting your stored connector configurations; it is not a matter of idle connector tasks being stopped. Once the configs are deleted from the topic, the REST API drops the corresponding entries from the /connectors response.
Refer to the documentation on the appropriate configurations for the internal Connect topics:
https://kafka.apache.org/documentation/#connect
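As a sketch, the config topic can be created up front with compaction enabled; the topic name must match config.storage.topic in the worker properties, and the broker address, topic name, and replication factor below are illustrative (this also assumes a Kafka version recent enough to support --bootstrap-server):

# Create the Connect config topic with log compaction so that Kafka
# never deletes the stored connector configs; this topic must have
# exactly one partition.
bin/kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic connect-configs \
  --partitions 1 \
  --replication-factor 3 \
  --config cleanup.policy=compact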

Is there a sample configuration for an open source Kafka Cassandra connector?

We are feeding events (logs) from Logstash into Apache Cassandra using the PerimeterX Cassandra Logstash output plugin. We have hit the plugin's maximum throughput of 8K, as it opens only 2 connections to Cassandra, whereas Cassandra can consume data at a much higher rate, and we expect the actual system to need a throughput of 30K or higher.
Here, throughput means the capacity to consume incoming events, measured in events per second.
Hence we planned to introduce Kafka in the middle, which achieves a 45K throughput with the Logstash output.
We are looking for help from this Stack Overflow post. We could configure the connector JAR as mentioned in the documentation, but there is no proper guide; the current documentation is confusing and circular about the configuration requirements. We don't see the plugin being invoked when Kafka is running with the target topic.
Some help on the correct configuration, or some documentation on Cassandra keyspaces, would be helpful.
After placing the JAR as mentioned in the documentation, we need to run Kafka Connect, which will show all the configured connectors.
To start Kafka Connect in distributed mode, run the command below:
bin/connect-distributed.sh config/connect-distributed.properties
Kafka Connect has a REST API service available at http://localhost:8083; using this REST API you can configure your connectors.
To register the connector, use the API below:
POST /connectors – creates a new connector; the request body should be a JSON object containing a string name field and an object config field with the connector configuration parameters
The JSON sample to register the connector is present in the kafka-connect-cassandra-sink-1.4.0.tar.gz file.
The official documentation provides a list of all endpoints.
More info is available here.
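As an illustrative sketch only: the connector class and mapping syntax below follow the DataStax Cassandra sink's documented format, but the connector name, topic, contact points, and keyspace/table are placeholders to adapt from the bundled JSON sample.

# Register a Cassandra sink connector through the Connect REST API.
curl -s -X POST \
  -H "Content-Type: application/json" \
  --data '{
    "name": "cassandra-sink",
    "config": {
      "connector.class": "com.datastax.oss.kafka.sink.CassandraSinkConnector",
      "tasks.max": "1",
      "topics": "logs",
      "contactPoints": "cassandra-host",
      "loadBalancing.localDc": "datacenter1",
      "topic.logs.my_keyspace.my_table.mapping": "event_time=value.timestamp, message=value.message"
    }
  }' \
  http://localhost:8083/connectors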

ArtemisMQ Connector

I'm new to ArtemisMQ and I really don't understand the point of connectors.
Why is a connector essential, when we already specify the broker's acceptor in broker.xml? We know which port to send a request to (the acceptor port) if we want to connect to this server. Even if the server is part of a cluster, what is the role of the connector? Another part of the documentation, about "Clusters", talks about cluster connections:
The cluster is formed by each node declaring cluster connections to other nodes in the core configuration file broker.xml. When a node forms a cluster connection to another node, internally it creates a core bridge (as described in Core Bridges) connection between it and the other node, this is done transparently behind the scenes - you don't have to declare an explicit bridge for each node. These cluster connections allow messages to flow between the nodes of the cluster to balance load.
From documentation "Understanding Connectors":
connectors are used by a client to define how it connects to a server.
What does "define how" mean?
I've already read another question about connectors, but it doesn't help me.
Additional questions:
Is the connector always the same as the acceptor? (I've downloaded some official examples, and all of them that I've seen have matching acceptor and connector entries.)
What information does a connector encapsulate if it only consists of host + port, which is the same as the acceptor's (ignoring that the acceptor host can be 0.0.0.0 or localhost)?
Why does a stand-alone broker have a connector, for example one created by default with ./artemis create?
What should we write in a connector?
Can you give a simple example where the acceptor and connector are different?
Two important points to note:
A connector is not essential depending on your use-case. You'll find that the default broker.xml doesn't have any connector elements defined. For example, if you just run ./artemis create the generated broker.xml will not have any connector elements.
The documentation you cited is quite old (from the very first release of Artemis). You may benefit from reading the latest documentation which has been updated for clarity in many places.
As noted in both the documentation and the other Stack Overflow answer you cited, certain components in the broker need to connect to other brokers (e.g. core bridges, cluster-connections, etc.). A connector encapsulates the information necessary for these other components to make the connections they need. It's really as simple as that.
Now regarding your individual questions...
Even if the server is part of a cluster, what is the role of the connector?
In the case of a cluster using a broadcast-group and a discovery-group each node in the cluster needs to broadcast to all the other nodes in the cluster how the other nodes can connect to itself. It does this by broadcasting a connector which is referenced in the cluster-connection configuration. When the other nodes in the cluster receive this broadcast they take the connector information and use it to connect back to the node which broadcast it originally. In this way nodes can dynamically discover and connect to each other. It's also worth noting that in this case the connector configuration will essentially mirror one of the broker's acceptor configurations (since the connector will be used by other nodes to connect to the broadcasting node's acceptor). This is discussed further in the cluster documentation.
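A sketch of that arrangement in broker.xml (the names, addresses, and ports are all illustrative): the node broadcasts the connector named below, and its cluster connection uses the discovery group to find the connectors broadcast by the other nodes.

<!-- This node's connector, as broadcast to the rest of the cluster. -->
<connectors>
   <connector name="netty-connector">tcp://10.0.0.1:61616</connector>
</connectors>
<broadcast-groups>
   <broadcast-group name="my-broadcast-group">
      <group-address>231.7.7.7</group-address>
      <group-port>9876</group-port>
      <connector-ref>netty-connector</connector-ref>
   </broadcast-group>
</broadcast-groups>
<discovery-groups>
   <discovery-group name="my-discovery-group">
      <group-address>231.7.7.7</group-address>
      <group-port>9876</group-port>
   </discovery-group>
</discovery-groups>
<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>netty-connector</connector-ref>
      <discovery-group-ref discovery-group-name="my-discovery-group"/>
   </cluster-connection>
</cluster-connections>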
...connectors are used by a client to define how it connects to a server...
This bit of documentation you quoted is accurate but may be a bit confusing. Keep in mind that a client can run anywhere, even within the broker itself. In the case of core bridges and cluster connections there is a client running in the broker which uses the connector to determine how to connect to another broker. For what it's worth, the updated documentation doesn't have this specific wording.
What does "define how" mean?
A connector is essentially the URL the client needs to connect to the broker. The URL can simply include the host and port, or it can carry detailed configuration for the connection, e.g. SSL settings (for example, tcp://broker.example.com:61616?sslEnabled=true).
Is the connector always the same as the acceptor..?
No, not always. In the case of a cluster they will be the same (or very close) for the reasons I already outlined, but in the case of a bridge they won't be the same.
What information does a connector encapsulate..?
See above.
Why does a stand-alone broker have a connector, for example one created by default with ./artemis create?
It doesn't. See above.
What should we write in a connector?
The URL needed to connect.
Can you give a simple example where the acceptor and connector are different?
As mentioned previously, bridging is an example where different acceptors and connectors are used. ActiveMQ Artemis ships with a "core-bridge" example in the examples/features/standard directory which demonstrates different acceptors and connectors. The example involves 2 different brokers, with one broker having a core bridge configured to send messages to the other broker. Here's the broker.xml with the bridge defined. You can see the acceptor listening on localhost:61616 and the connector for localhost:61617; the connector points to the other broker, which is listening on localhost:61617.
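A minimal sketch of that layout on the bridging broker (the ports follow the example; the connector, bridge, queue, and address names are illustrative):

<!-- Broker A accepts clients on 61616 and defines a connector pointing
     at Broker B on 61617; the core bridge references that connector. -->
<acceptors>
   <acceptor name="artemis">tcp://localhost:61616</acceptor>
</acceptors>
<connectors>
   <connector name="remote-broker">tcp://localhost:61617</connector>
</connectors>
<bridges>
   <bridge name="my-bridge">
      <queue-name>source-queue</queue-name>
      <forwarding-address>target-address</forwarding-address>
      <static-connectors>
         <connector-ref>remote-broker</connector-ref>
      </static-connectors>
   </bridge>
</bridges>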

Can I access Kafka Connect Worker config from connector or task?

I am developing a custom Kafka source connector. I would like to access the worker configuration, such as the key converter, value converter, schema registry URL, ZooKeeper URL, etc., but I haven't found a way to do that. Any ideas? Is that possible?
To be more specific: in my implementations of the connector and task, can I access the worker's configuration? I checked, and the only thing I can get is a ConnectorContext from the connector implementation, which has only one useful method, for requesting reconfiguration, and that is not what I want.

Publish messages to Kafka via HTTP

I'm new to Kafka and I'm trying to publish data from an external application via HTTP, but I cannot find a way to do this.
I already created a topic in Kafka and tested producing and consuming messages, but I don't know how to insert/publish messages via HTTP. I tried to invoke the following URL to retrieve the topics, but it does not return any data: http://servername:2181/topics/
I'm using Cloudera 5.12.1.
You can access your topics, if they have already been created, using the client APIs; that is the easy way (see the client list).
Or see the Connect configuration to manage connectors over REST (the rest.host.name and rest.port parameters), but that covers connectors only.
To consume or produce messages in a topic, use a middleware component; it is more feasible.
Check out the open source Kafka REST Proxy from Confluent. It does exactly what you want.
You can get it standalone, or as part of Confluent Platform.
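As a sketch, assuming the REST Proxy runs on its default port 8082 (the host and topic name below are illustrative), producing a JSON message looks like this; the v2 API wraps messages in a records array:

# Produce a JSON record to topic "my-topic" through the Kafka REST Proxy.
curl -s -X POST \
  -H "Content-Type: application/vnd.kafka.json.v2+json" \
  --data '{"records": [{"value": {"event": "test", "count": 1}}]}' \
  http://servername:8082/topics/my-topic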