Safely give secret/token to Kafka Connector? - apache-kafka

We are using Kafka Connectors (JDBC and others) and configuring them using the REST API (with curl in shell scripts). Right now, when testing/developing, we are including secrets (for the JDBC connector: database user/password) directly in the request. This is obviously bad, as those secrets are then readily available for everybody to see when reading them back out with a GET request.
Is there a good way to give secrets to the connectors? We can bring them in safely using environment variables or config files (injected from OpenShift), but is there a syntax for referencing them when starting a connector via the REST API?
EDIT: This is for the distributed mode of connectors; i.e., configuration by REST API, not connector config files...

A pluggable interface for this was implemented in Apache Kafka 2.0 through KIP-297. You can see more details in the documented example here.
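KIP-297 added a ConfigProvider mechanism, and Kafka ships a FileConfigProvider out of the box, so the connector config you submit over REST can reference secrets indirectly instead of containing them. A minimal sketch, assuming the secrets are mounted as a properties file at /opt/kafka/secrets/jdbc.properties (a hypothetical path, e.g. an OpenShift secret mount) and a hypothetical JDBC source named my-jdbc-source. First enable the provider in the distributed worker config (connect-distributed.properties):

# Built-in provider shipped with Kafka 2.0+ (KIP-297)
config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider

Then submit the connector config with placeholders instead of the actual credentials:

# /opt/kafka/secrets/jdbc.properties is a plain properties file with lines like
# user=... and password=...
curl -X PUT -H "Content-Type: application/json" \
  http://connect:8083/connectors/my-jdbc-source/config \
  -d '{
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db:5432/mydb",
    "connection.user": "${file:/opt/kafka/secrets/jdbc.properties:user}",
    "connection.password": "${file:/opt/kafka/secrets/jdbc.properties:password}"
  }'

A GET on the connector afterwards returns the unresolved ${file:...} placeholders rather than the actual credentials; the workers resolve them only when the tasks start.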

Related

How can I install connector config in kafka connect

Is there any other way to deploy a connector config other than POSTing it to the Kafka Connect REST API? https://docs.confluent.io/platform/current/connect/references/restapi.html#tasks
I am thinking of some form of persistent approach, like a volume or S3, where Connect would grab those configs during bootstrap. I don't know (and can't find) whether that is available anywhere.
regards
The REST API is the only way.
You can use abstractions like Terraform or Kubernetes resources, however, which wrap an HTTP client.
If you use other storage, that'll require you to write extra code to download files and call the REST API.
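If you do go the volume route, that extra code can be as small as a startup script; a rough sketch, where /etc/connectors and the Connect URL are placeholders (one JSON file per connector, containing just the config map):

#!/usr/bin/env bash
# Push every connector definition found on a mounted volume to the Connect REST API.
# PUT .../config is idempotent: it creates the connector if missing, updates it otherwise.
set -euo pipefail
CONNECT_URL="${CONNECT_URL:-http://connect:8083}"

for file in /etc/connectors/*.json; do
  name="$(basename "$file" .json)"
  curl -sS -X PUT -H "Content-Type: application/json" \
    "$CONNECT_URL/connectors/$name/config" \
    --data @"$file"
done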

Submit jobs via Rest API and deploy Flink on a running Kubernetes cluster (Native way)

I am trying to implement a REST client for Flink to send jobs via Flink's RESTful services. I also want to integrate Flink and Kubernetes natively. I have decided to use "Application Mode" as the deployment mode, according to the Flink documentation.
I have already implemented a job and packaged it as a jar, and I have tested it on standalone Flink. But my aim is to move to Kubernetes and deploy my application in Application Mode via Flink's REST API.
I have already investigated the samples in the Flink documentation - Native Kubernetes, but I cannot find a sample that does the same via the RESTful services (especially how to set --target kubernetes-application/kubernetes-session or other parameters).
In addition to the samples, I checked out the Flink sources from GitHub and tried to find a sample implementation or get some clues.
I think the classes below are related to my case:
org.apache.flink.client.program.rest.RestClusterClient
org.apache.flink.kubernetes.KubernetesClusterDescriptorTest.testDeployApplicationCluster
But they are all too complicated for me to understand the points below.
For application mode, is there any need to initialize a container that serves the Flink REST services before submitting the job? If so, is it the JobManager?
For application mode, how can I set the same command line parameters via the REST services?
For session mode, in the command line samples, kubernetes-session.sh is executed before job submission to initialize a JobManager container. How should I do this step via the REST client?
For session mode, how can I set the same command line parameters via the REST services? The command line samples pass the job .jar as a parameter; should I upload the jar before submitting the job?
Could you please provide me some clue/sample to continue my implementation?
Best regards,
Burcu
I suspect that if you study the implementation of the Apache Flink Kubernetes Operator you'll find some clues.
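For the session-mode part of the question: once kubernetes-session.sh (or the operator) has created the session cluster, the JobManager exposes the standard Flink REST API, so you upload the jar first and pass the run parameters in the run request. A minimal sketch; the REST address, jar name, entry class, and the use of jq are all assumptions:

# Whatever Service/Ingress exposes the JobManager's port 8081
FLINK_REST=http://flink-jobmanager:8081

# 1. Upload the job jar (multipart upload, form field name "jarfile").
JAR_ID=$(curl -s -X POST -H "Expect:" \
  -F "jarfile=@./my-flink-job.jar" \
  "$FLINK_REST/jars/upload" | jq -r '.filename' | xargs basename)

# 2. Run it, passing what would otherwise be command line parameters.
curl -s -X POST "$FLINK_REST/jars/$JAR_ID/run" \
  -H "Content-Type: application/json" \
  -d '{"entryClass": "com.example.MyJob", "parallelism": 2, "programArgsList": ["--input", "s3://bucket/in"]}'

For application mode, as far as I know there is no REST call that creates the cluster: --target kubernetes-application is handled by the client (flink run-application), which brings up a JobManager for that one job, and the REST endpoint only exists once that JobManager is running.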

Debezium io with pulsar

I want to understand how Pulsar uses the Debezium IO connector for CDC.
While creating the source using pulsar-admin source create, how can I pass the broker URL and authentication params for the client, similar to what we did when using localrun?
The command I run:
bin/pulsar-admin source localrun --sourceConfigFile debezium-mysql-source-config.yaml --client-auth-plugin --client-auth-params --broker-service-url
Now I want to replace this with a connector that runs in cluster mode.
Localrun is a special mode that simplifies debugging; it runs outside of the normal cluster and needs extra parameters to create the client for the local runtime.
In cluster mode the connector gets its client from the Pulsar connectors runtime, through the function worker configuration. All you need to do is use "bin/pulsar-admin source create ...".
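For example, something along these lines; the source name is a placeholder and the exact flag spelling differs between Pulsar versions (--source-config-file in newer releases, --sourceConfigFile in older ones):

bin/pulsar-admin source create \
  --source-config-file debezium-mysql-source-config.yaml \
  --name debezium-mysql-source

The pulsar-admin tool itself takes its service URL and auth settings from conf/client.conf (or the corresponding command line options), while the running connector uses whatever is configured on the function worker.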

I want to deploy JanusGraph. Which storage backend should I use for Cassandra: cql or cassandrathrift?

Problem: I want to deploy JanusGraph as a separate service on Kubernetes. Which storage backend should I use for Cassandra: cql or cassandrathrift? Cassandra is running as a stateful service on Kubernetes.
Detailed description: As per the JanusGraph documentation, in the case of Remote Server Mode the storage backend should be cql.
JanusGraph graph = JanusGraphFactory.build().
set("storage.backend", "cql").
set("storage.hostname", "77.77.77.77").
open();
They even mention that Thrift is deprecated as of Cassandra 2.1, and I am using Cassandra 3.
But in a blog post it was mentioned that REST API calls from JanusGraph to Cassandra are possible only through Thrift.
Is Thrift really required? Can't we use CQL as the storage backend for REST API calls as well?
Yes, you absolutely should use the cql storage backend.
Thrift is deprecated, disabled by default in the current version of Cassandra (version 3), and has been removed from Cassandra version 4.
I would also be interested in reading the blog post you referenced. Are you talking about IBM's Rest API mentioned in their JanusGraph-utils Git repo? That confuses me as well, because I see both Thrift and CQL config happening there. In any case, I would go with the cql settings and give it a shot.
tl;dr: Avoid Thrift at all costs!
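For reference, the equivalent of the Java snippet above as a properties file (hostname and keyspace are placeholders); it can be opened with JanusGraphFactory.open("janusgraph-cql.properties") or referenced from Gremlin Server's graph configuration:

# janusgraph-cql.properties
storage.backend=cql
storage.hostname=cassandra.default.svc.cluster.local
storage.cql.keyspace=janusgraph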

Flume Metrics through REST API

I'm running Hortonworks 2.3 and currently hooking into the REST API through Ambari to start/stop the Flume service and also submit configurations.
This is all working fine; my issue is how do I get the metrics?
Previously I ran the agent with parameters that exposed the metrics on an HTTP port and then read them from there, using this:
-Dflume.root.logger=INFO,console
-Dflume.monitoring.type=http
-Dflume.monitoring.port=XXXXX
However, now that Ambari kicks off the agent, I no longer have control over this.
Any assistance appreciated :-)
Using Ambari 2.6.2.0,
http://{ipadress}:8080/api/v1/clusters/{your_cluster_name}/components/?ServiceComponentInfo/component_name=FLUME_HANDLER&fields=host_components/metrics/flume/flume
gives a breakdown of Flume metrics by component.
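For example, with curl (credentials, host, and cluster name are placeholders):

curl -u admin:admin \
  "http://AMBARI_HOST:8080/api/v1/clusters/CLUSTER_NAME/components/?ServiceComponentInfo/component_name=FLUME_HANDLER&fields=host_components/metrics/flume/flume"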
I found the answer by trying (and cropping) the API call provided in this JIRA issue (which complains about how slow fetching Flume metrics is): https://issues.apache.org/jira/browse/AMBARI-9914?attachmentOrder=asc
Hope this helps.
I don't know if you still need the answer. That happens because Hortonworks disables JSON monitoring by default; they use their own metrics class to send the metrics to Ambari Metrics. While you can't retrieve them from Flume directly, you can still retrieve them from the Ambari REST API: https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/index.md.
Good luck,