How can I install connector config in kafka connect - apache-kafka

Is there any way to deploy a connector config other than POSTing it to the Kafka Connect REST API? https://docs.confluent.io/platform/current/connect/references/restapi.html#tasks
I am thinking of some persistent approach, such as a volume or S3, from which Connect would grab the configs during bootstrap. I don't know, and can't find, whether that is available anywhere.
Regards

The REST API is the only way.
You can use abstractions like Terraform or Kubernetes resources, however, which wrap an HTTP client.
If you use other storage, that'll require you to write extra code to download files and call the REST API.
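As a rough sketch of that "extra code", the script below reads connector configs from a mounted directory and sends each one to the REST API with PUT /connectors/{name}/config, which creates the connector or updates it if it already exists. The directory layout (one flat JSON config map per file, file name used as the connector name) and the function names are assumptions for illustration, not anything Kafka Connect prescribes.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path


def build_put_request(connect_url, name, config):
    """Build a PUT /connectors/{name}/config request (create or update)."""
    quoted = urllib.parse.quote(name, safe="")
    return urllib.request.Request(
        url=f"{connect_url.rstrip('/')}/connectors/{quoted}/config",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def deploy_directory(connect_url, config_dir):
    """PUT every *.json file in config_dir; the file stem is the connector name.

    Assumed layout: each file holds only the flat config map, e.g.
    {"connector.class": "...", "tasks.max": "1"}.
    """
    deployed = []
    for path in sorted(Path(config_dir).glob("*.json")):
        config = json.loads(path.read_text(encoding="utf-8"))
        request = build_put_request(connect_url, path.stem, config)
        # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
        with urllib.request.urlopen(request, timeout=10) as response:
            response.read()
        deployed.append(path.stem)
    return deployed
```

Something like deploy_directory("http://localhost:8083", "/etc/connectors") could then run as an init job or sidecar once the Connect worker is reachable; both the URL and the path here are placeholders.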

Related

Consume events from AWS EventBridge in a self-hosted Kafka cluster outside AWS

We have a SaaS that publishes its events on AWS EventBridge (a couple of million per day). We would love to consume those events and put them on our self-hosted Kafka cluster. What would be the best method to do this? We were thinking about Lambdas, but they seem expensive for this use case, and we don't want to manage too many other components. Does anyone have some good practices?
I was looking for a similar solution, but in my case it is from EventBridge to MSK within an AWS account. At this point, it looks like the only option is to use a Lambda to feed into Kafka.
As of today, AWS only supports the following targets: https://docs.amazonaws.cn/en_us/eventbridge/latest/userguide/eb-targets.html#eb-console-targets
I have a similar use case where I need to send a message to AWS RabbitMQ or even to AWS Kafka, as I need priority logic implemented.
Since AWS supports Lambda as a target, the message can be forwarded to a Lambda, from where it can be fed to any system.
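A minimal sketch of such a forwarding Lambda is below. It assumes the function is a direct target of an EventBridge rule, so each invocation receives one event envelope with an "id" field. The `send` callable is hypothetical: it stands in for whatever Kafka producer client you package with the function, which is not shown here.

```python
import json


def to_kafka_record(event):
    """Map an EventBridge event envelope to a (key, value) pair of bytes."""
    key = event["id"].encode("utf-8")  # EventBridge's unique event id
    value = json.dumps(event, separators=(",", ":")).encode("utf-8")
    return key, value


def make_handler(send):
    """Return a Lambda handler that forwards each event via send(key, value).

    `send` is a hypothetical callable wrapping a Kafka producer of your
    choice; creating and flushing that producer is left out of this sketch.
    """
    def handler(event, context):
        key, value = to_kafka_record(event)
        send(key, value)
        return {"forwarded": event["id"]}
    return handler
```

Keeping the producer behind a callable means the mapping logic can be tested locally without a broker, and the producer can be created once per container rather than per invocation.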

Dynamic creation of Kafka Connectors

I have deployed a Kafka cluster and a Kafka Connect cluster in Kubernetes, using Strimzi and AKS. I wanted to start reading from RSS resources to feed my Kafka cluster, so I created a connector instance of "org.kaliy.kafka.connect.rss.RssSourceConnector", which reads from a specific RSS feed, given a URL, and writes to a specific topic. My intention, though, is to eventually have a Kafka Connect cluster able to handle many external requests for new RSS feeds to read from, and this is where my doubts come in:
Should I create an instance of the Kaliy RSS connector for each RSS feed? Or would it be better to implement my own connector, so that I create only one instance of it and add a new task to the connector each time I want to read a new RSS feed?
Who should be responsible for ensuring that the Kafka Connect cluster is in the desired state? I mean, if a connector (in the case of 1 RSS feed : 1 connector instance) stopped working, who should try to start it again? An external client via the Kafka Connect REST API? Kubernetes itself?
Right now, I think my best option is to rely on the Kafka Connect REST API, making the external client responsible for managing the state of the set of connectors, but I don't know whether it was designed to receive as many requests as this would involve. Maybe it could be scaled by provisioning several listeners in the Kafka Connect REST API configuration, but I do not know.
Thanks a lot!
One of the main benefits in using Kafka Connect is to leverage a configuration-driven approach, so you will lose this by implementing your own Connector. In my opinion, the best strategy is to have one Connector instance for each RSS feed. Reducing the number of instances could make sense when having a single data source system, to avoid overloading it.
With the Strimzi Operator, the Kafka Connect cluster is monitored, and the operator will try to restore the desired cluster state when needed. This does not cover the individual Connector instances and their tasks, but you can use the K8s API to monitor the status of the Connector custom resource (CR) instead of the REST API.
Example:
$ kubectl get kafkaconnector amq-sink -o yaml
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
# ...
status:
  conditions:
  - lastTransitionTime: "2020-12-07T10:30:28.349Z"
    status: "True"
    type: Ready
  connectorStatus:
    connector:
      state: RUNNING
      worker_id: 10.116.0.66:8083
    name: amq-sink
    tasks:
    - id: 0
      state: RUNNING
      worker_id: 10.116.0.66:8083
    type: sink
  observedGeneration: 1
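Whoever ends up watching that status, the restart decision itself is small. The sketch below takes a status document shaped like the connectorStatus above (the same shape GET /connectors/{name}/status returns) and lists the Kafka Connect REST paths to POST to for any part in the FAILED state. The function name is made up for illustration; fetching the status and sending the POSTs are left to the caller.

```python
def failed_restart_paths(name, status):
    """Return the REST paths to POST to in order to restart failed parts.

    `status` is a dict with a "connector" entry and a "tasks" list, each
    carrying a "state" such as RUNNING or FAILED.
    """
    paths = []
    if status.get("connector", {}).get("state") == "FAILED":
        paths.append(f"/connectors/{name}/restart")
    for task in status.get("tasks", []):
        if task.get("state") == "FAILED":
            paths.append(f"/connectors/{name}/tasks/{task['id']}/restart")
    return paths
```

For the amq-sink status shown above, where everything is RUNNING, this returns an empty list, so nothing would be restarted.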
This may be late, but it could help anyone who passes by this question: it is worth having a look at the Kafka Connect CR (custom resource) that is part of Confluent for Kubernetes (CFK). It introduces a clear-cut declarative way to manage and monitor connectors, with health checks and auto-healing.
https://www.confluent.io/blog/declarative-connectors-with-confluent-for-kubernetes/

I want to deploy JanusGraph. Which storage backend should I use for Cassandra: cql or cassandrathrift?

Problem: I want to deploy JanusGraph as a separate service on Kubernetes. Which storage backend should I use for Cassandra: CQL or cassandrathrift? Cassandra is running as a stateful service on Kubernetes.
Detailed description: As per the JanusGraph docs, in Remote Server Mode the storage backend should be cql.
JanusGraph graph = JanusGraphFactory.build().
set("storage.backend", "cql").
set("storage.hostname", "77.77.77.77").
open();
They even mention that Thrift is deprecated from Cassandra 2.1 onwards, and I am using Cassandra 3.
But some blog mentions that REST API calls from JanusGraph to Cassandra are possible only through Thrift.
Is Thrift really required? Can't we use CQL as the storage backend for REST API calls as well?
Yes, you absolutely should use the cql storage backend.
Thrift is deprecated, disabled by default in the current version of Cassandra (version 3), and has been removed from Cassandra version 4.
I would also be interested in reading the blog post you referenced. Are you talking about IBM's Rest API mentioned in their JanusGraph-utils Git repo? That confuses me as well, because I see both Thrift and CQL config happening there. In any case, I would go with the cql settings and give it a shot.
tl;dr;
Avoid Thrift at all costs!

Safely give secret/token to Kafka Connector?

We are using Kafka Connectors (JDBC and others), and configuring them using the REST API (using curl in shell scripts). Right now, when testing/developing, we are including secrets (for the JDBC connect - database user/pw) directly in the request. This is obviously bad, as those are then readily available for everybody to see when reading them out using the GET request.
Is there a good way to give secrets to the connectors? We can bring them in safely using environment variables or config files (injected from OpenShift), but is there a syntax available for that when starting a connector via the REST API?
EDIT: This is for the distributed mode of connectors; i.e., configuration by REST API, not connector config files...
A pluggable interface for this was implemented in Apache Kafka 2.0 through KIP-297. You can see more details in the documented example here.
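To illustrate the idea: with KIP-297 the worker declares a config provider, for example config.providers=file together with config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider, and the connector config then refers to secrets with placeholders of the form ${file:/path/to/file:key}. The sketch below only builds such a payload for the REST API; the secrets file path, the key names, and the connection URL are hypothetical, and the config is deliberately incomplete.

```python
import json


def secret_ref(path, key, provider="file"):
    """Format a KIP-297 placeholder: ${provider:path:key}."""
    return "${" + f"{provider}:{path}:{key}" + "}"


def jdbc_connector_payload(name, secrets_file):
    """Build a POST /connectors body whose credentials are placeholders.

    Only connection settings are shown; a real JDBC source config needs
    further options. The file path and key names are assumptions.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url": "jdbc:postgresql://db:5432/app",
            "connection.user": secret_ref(secrets_file, "db.user"),
            "connection.password": secret_ref(secrets_file, "db.password"),
        },
    }


if __name__ == "__main__":
    body = jdbc_connector_payload("jdbc-source", "/opt/secrets/db.properties")
    print(json.dumps(body, indent=2))
```

The worker resolves the placeholders when it starts the connector, so the secrets file has to exist on every worker. As far as I know, reading the config back with GET shows the placeholder rather than the resolved value, which addresses the original concern.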

Flume Metrics through REST API

I'm running Hortonworks 2.3 and currently hooking into the REST API through Ambari to start/stop the Flume service and also to submit configurations.
This is all working fine. My issue is: how do I get the metrics?
Previously I used to run an agent with parameters that made it publish the metrics on an HTTP port, and then read them in from there, using this:
-Dflume.root.logger=INFO,console
-Dflume.monitoring.type=http
-Dflume.monitoring.port=XXXXX
However now that Ambari kicks off the agent I no longer have control over this.
Any assistance appreciated :-)
Using Ambari 2.6.2.0,
http://{ipadress}:8080/api/v1/clusters/{your_cluster_name}/components/?ServiceComponentInfo/component_name=FLUME_HANDLER&fields=host_components/metrics/flume/flume
gives flume metrics breakdown by components.
I found the answer by trying out (and trimming down) the API call provided in this JIRA issue, which complains about how slow fetching Flume metrics is: https://issues.apache.org/jira/browse/AMBARI-9914?attachmentOrder=asc
Hope this helps.
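For scripting this, here is a small sketch that builds the same URL and fetches it with HTTP basic auth. The host, cluster name, and credentials are placeholders you would supply, and the function names are made up for illustration. I have only assumed the response is JSON, as Ambari API responses generally are.

```python
import base64
import json
import urllib.request


def flume_metrics_url(host, cluster, port=8080):
    """Build the Ambari API URL for Flume metrics broken down by component."""
    return (
        f"http://{host}:{port}/api/v1/clusters/{cluster}/components/"
        "?ServiceComponentInfo/component_name=FLUME_HANDLER"
        "&fields=host_components/metrics/flume/flume"
    )


def fetch_flume_metrics(host, cluster, user, password):
    """GET the metrics with basic auth and return the parsed JSON body."""
    request = urllib.request.Request(flume_metrics_url(host, cluster))
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```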
I don't know if you still need the answer. That happens because Hortonworks disables JSON monitoring by default; they use their own metric class to send the metrics to Ambari Metrics. While you can't retrieve them from Flume directly, you can still retrieve them from the Ambari REST API: https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/index.md.
Good luck,