I have started Confluent Platform on Windows with the help of Docker. I am able to start the Kafka broker, ZooKeeper, and Control Center.
I have set up the confluent CLI in Docker as well. Now when I try to run commands with the confluent CLI, it expects a --url parameter. The Confluent docs say to contact your IT admin to get the HTTP address for the <service url> (for example, http://127.0.0.1:8080/) for each Confluent Platform component. Since I am running Docker locally, what would my service URL be?
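Since everything runs locally in Docker, one thing I can do is check which host ports the containers actually expose via Docker's port mappings (the container name in the second command is an assumption based on the standard Confluent quickstart setup):
docker ps --format 'table {{.Names}}\t{{.Ports}}'
# published ports for a single container; 'schema-registry' is an assumed container name
docker port schema-registry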
Now, I have started schema-registry as well and I can use the schema-registry port to check the cluster details.
confluent cluster describe --url http://<my_ip>:<schema_registry_port>
I got the output below when I ran the describe command:
Scope:
Type | ID
+-------------------------+------------------------+
kafka-cluster | d60cQ7BWQTSz5v9fNuvQRw
schema-registry-cluster | schema-registry
Reference : https://docs.confluent.io/current/cli/command-reference/confluent_cluster_describe.html#example
I installed Confluent using the CFK (Confluent for Kubernetes) way of deployment. The setup went fine, using the vanilla YAML file for all the components (zookeeper, kafka, connect, ksql, control-center, schema-registry).
I tried to use kind: Connector to configure my SQL Server source connector, and the connector was created successfully.
The problem came when I tried to list the connectors via the curl request below (after port-forwarding to the pod):
curl localhost:8083/connectors | jq .
I got nothing registered. However, when I ran the command below:
kubectl confluent connector list
it shows that I have registered connectors, as below. I assume both are two sides of the same coin.
NAME STATUS TASKS-READY TASKS-FAILED AGE
bq-sink-conn 0 9h
mssql-source-conn 0 9h
My question is: why is there this discrepancy, or am I missing something?
Also, after a week of searching the internet, I can't find enough resources with examples of how to use CFK, and specifically the Connector CR.
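For context, the Connector resources were applied roughly like the sketch below; the connector class, connection details, and namespace are placeholders rather than my real values, and the field names follow the CFK Connector CRD as I understand it:
kubectl apply -f - <<'EOF'
apiVersion: platform.confluent.io/v1beta1
kind: Connector
metadata:
  name: mssql-source-conn
  namespace: confluent
spec:
  class: io.debezium.connector.sqlserver.SqlServerConnector   # assumed connector class
  taskMax: 1
  connectClusterRef:
    name: connect
  configs:
    database.hostname: "mssql.internal.example.com"
    database.port: "1433"
    database.user: "connect_user"
    database.password: "********"
EOF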
Thanks,
To run Confluent Platform's S3 connector (kafka-connect-s3) to consume a Kafka topic on my local Mac, I did something like the following:
1. Installed the Confluent S3 connector and ran Kafka's connect-standalone.sh:
ML-C02Z605SLVDQ:kafka_2.12-2.5.0 e192270$ confluent-hub install confluentinc/kafka-connect-s3:latest --component-dir /usr/local/share/java --worker-configs config/connect-distributed.properties
ML-C02Z605SLVDQ:kafka_2.12-2.5.0 e192270$ cd kafka_2.12-2.5.0
ML-C02Z605SLVDQ:kafka_2.12-2.5.0 e192270$ bin/connect-standalone.sh config/connect-standalone.properties s3-sink.properties  # s3-sink.properties sets connector.class=io.confluent.connect.s3.S3SinkConnector
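For reference, s3-sink.properties is roughly the sketch below; the topic, bucket, and region values are placeholders:
cat > s3-sink.properties <<'EOF'
name=s3-sink
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=1
# topic, bucket, and region below are placeholders
topics=my-topic
s3.bucket.name=my-bucket
s3.region=us-east-1
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat
flush.size=1000
EOF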
Now, to run Kafka Connect S3 in Minikube, I have installed Kafka Connect (kafka-connect-s3) in Minikube using cp-helm-charts with the help of this tutorial: Using a connector with Helm-installed Kafka/Confluent.
How do I copy Kafka config and script files into the kafka-connect pod?
Do I need to log in to the kafka-connect pod to run the connect-standalone.sh command?
There is a from-scratch procedure here. The only requirement is Minikube.
The steps you need are the following:
Start Minikube
Deploy a Kafka cluster using the Strimzi Operator
Build your own custom image including required plugins and dependencies
Deploy Kafka Connect cluster in distributed mode using that image
Create a KafkaConnector instance passing a configuration YAML (see the sketch after this list)
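A minimal KafkaConnector sketch for the S3 sink, assuming the custom Connect image was built with the S3 plugin; the cluster, topic, and bucket names are placeholders:
kubectl apply -f - <<'EOF'
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: s3-sink-connector
  labels:
    strimzi.io/cluster: my-connect-cluster
spec:
  class: io.confluent.connect.s3.S3SinkConnector
  tasksMax: 1
  config:
    topics: my-topic
    s3.bucket.name: my-bucket
    s3.region: us-east-1
    flush.size: 1000
EOF
Note that this only works when the KafkaConnect resource carries the strimzi.io/use-connector-resources: "true" annotation, so the operator manages connectors from these custom resources.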
How do I copy Kafka config and script files into the kafka-connect pod?
You shouldn't copy anything. Everything is configured via environment variables. The Helm charts mostly document how those variables work.
The Docker image runs Connect in distributed mode, where connectors are created via the REST API, not a properties file. And confluentinc/cp-kafka-connect already contains the S3 connector.
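For example, once the distributed worker's REST endpoint is reachable (here assumed to be port-forwarded to localhost:8083), the same S3 sink is created by POSTing JSON instead of using a properties file; topic and bucket names are placeholders:
curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "1",
    "topics": "my-topic",
    "s3.bucket.name": "my-bucket",
    "s3.region": "us-east-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "1000"
  }
}'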
You can also take a look at https://strimzi.io/.
The project is aimed at making the installation and management of a Kafka and Kafka Connect cluster on Kubernetes very easy.
I am deploying Kafka Connect on Google Kubernetes Engine (GKE) using the cp-kafka-connect Helm chart in distributed mode.
A working Kafka cluster with a broker and ZooKeeper is already running on the same GKE cluster. I understand I can create connectors by sending POST requests to the http://localhost:8083/connectors endpoint once it is available.
However, the Kafka Connect container goes into the RUNNING state and then starts loading the JAR files, and until all the JARs are loaded the endpoint mentioned above is unreachable.
I am looking for a way to automate the steps of manually exec-ing into the pod, checking whether the endpoint is ready, and then sending the POST requests. I have a shell script with a bunch of curl -X POST requests to this endpoint to create the connectors, and I also have config files for these connectors that work fine in standalone mode (using the Confluent Platform, as shown in this Confluent blog).
Now there are only two ways to create the connector:
Somehow identify when the container is actually ready (when the endpoint has started listening) and then run the shell script containing the curl requests
OR use the configuration files as we do in standalone mode (Example: $ <path/to/CLI>/confluent local load connector_name -- -d /connector-config.json)
Which of the above approach is better?
Is the second approach (config files) even doable with distributed mode?
If YES: How to do that?
If NO: How to successfully do what is explained in the first approach?
EDIT:
With reference to this GitHub issue (thanks to @cricket_007's answer below), I added the following as the container command, and the connectors get created once the endpoint is ready:
...
command:
  - /bin/bash
  - -c
  - |
    /etc/confluent/docker/run &
    echo "Waiting for Kafka Connect to start listening on kafka-connect"
    while : ; do
      curl_status=`curl -s -o /dev/null -w %{http_code} http://localhost:8083/connectors`
      echo -e `date` " Kafka Connect listener HTTP state: " $curl_status " (waiting for 200)"
      if [ $curl_status -eq 200 ] ; then
        break
      fi
      sleep 5
    done
    echo -e "\n--\n+> Creating Kafka Connector(s)"
    /tmp/scripts/create-connectors.sh
    sleep infinity
...
/tmp/scripts/create-connectors.sh is an externally mounted script containing a bunch of POST requests made with curl against the Kafka Connect REST API.
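A minimal sketch of what such a script can look like, assuming the connector configs are mounted as JSON files next to it (the directory and file names are placeholders):
#!/bin/bash
# POST every mounted connector config file to the Kafka Connect REST API
for config in /tmp/scripts/configs/*.json; do
  echo "Creating connector from $config"
  curl -s -X POST -H "Content-Type: application/json" \
       --data @"$config" http://localhost:8083/connectors
  echo
done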
confluent local doesn't interact with a remote Connect cluster, such as one in Kubernetes.
Please refer to the Kafka Connect REST API
You'd connect to it like any other RESTful API running in the cluster (via a NodePort, or an Ingress/API gateway, for example).
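For example, a quick way to reach the Connect REST API from your workstation is a port-forward to the chart's service (the service name here is an assumption based on the Helm release name):
kubectl port-forward svc/my-release-cp-kafka-connect 8083:8083 &
curl -s http://localhost:8083/connectors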
the endpoint mentioned above is unreachable.
Localhost is the physical machine you're typing the commands into, not the remote GKE cluster
Somehow identify when the container is actually ready
Kubernetes health checks are responsible for that
kubectl get services
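For example, assuming the chart wires a readiness probe to the REST port, something like the following waits until the worker pods report Ready before you fire the curl requests (the deployment name and label are assumptions based on the release name):
kubectl rollout status deployment/my-release-cp-kafka-connect --timeout=300s
# label selector is an assumption; adjust to match the chart's pod labels
kubectl get pods -l app=cp-kafka-connect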
there are only two ways to create the connector
That's not true. You could additionally run Landoop's Kafka Connect UI or Confluent Control Center in your cluster to point and click.
But if you have local config files, you could also write code to interact with the API
Or try and see if you can make a PR for this issue
https://github.com/confluentinc/cp-docker-images/issues/467
I am running Apache Drill and Zookeeper on a Kubernetes cluster.
Drill connects to ZooKeeper through a zookeeper-service running on port 2181. I am trying to persist storage plugin configuration in ZooKeeper. The Apache Drill docs (https://drill.apache.org/docs/persistent-configuration-storage/) say that the sys.store.provider.zk.blobroot key needs to be added to drill-override.conf. But I am not able to figure out a value for this key if I want to connect it to the ZooKeeper service in Kubernetes.
The value should be:
<name-of-your-zk-service>.<namespace-where-zk-is-running>.svc.cluster.local:2181
That's how services get resolved internally in Kubernetes. You can always test it by creating a Pod, connecting to it using kubectl exec -it <pod-name> sh, and running:
ping <name-of-your-zk-service>.<namespace-where-zk-is-running>.svc.cluster.local
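Putting that together, the ZooKeeper-related part of drill-override.conf would look roughly like this sketch (the service name, namespace, and cluster-id are assumptions):
cat > drill-override.conf <<'EOF'
# Sketch of the ZooKeeper settings for Drill; place this in Drill's conf directory
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "zk-service.default.svc.cluster.local:2181"
}
EOF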
Hope it helps!
This is an optional config. You can specify it to modify where the ZooKeeper PStore provider offloads query profile data [1] or you can remove this property from your drill-override.conf and restart drillbits.
[1] http://doc.mapr.com/display/MapR/Persistent+Configuration+Storage
I have Docker and the OpenShift client installed on Ubuntu 16.04.3 LTS.
[vagrant@desktop:~]$ docker --version
Docker version 18.01.0-ce, build 03596f5
[vagrant@desktop:~]$ oc version
oc v3.7.1+ab0f056
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://127.0.0.1:8443
openshift v3.7.1+282e43f-42
kubernetes v1.7.6+a08f5eeb62
[vagrant@desktop:~]$
Notice the server URL https://127.0.0.1:8443.
I can start a cluster using oc cluster up:
[vagrant@desktop:~]$ oc cluster up --public-hostname='ocp.devops.ok' --host-data-dir='/var/lib/origin/etcd' --use-existing-config --routing-suffix='cloudapps.lab.example.com'
Starting OpenShift using openshift/origin:v3.7.1 ...
OpenShift server started.
The server is accessible via web console at:
https://ocp.devops.ok:8443
I can access the server using https://ocp.devops.ok:8443, but then OCP redirects to https://127.0.0.1:8443. So it redirects to the Kubernetes server URL, I think.
This raises a question about --public-hostname. What does it do? I think it is not actually used by OpenShift, because it redirects to the Kubernetes server URL.
How do I change this setting in Kubernetes?
I think that because --public-hostname does not specify the IP to bind to, and that IP is currently 127.0.0.1, some of the config gets set to that value, and hence the OAuth challenge redirects you there. I hope it might be solved in 3.10.
See this issue described in OpenShift's Origin GitHub repository.
The problem, as it turns out, is --use-existing-config. If I remove that flag from the command, there is no redirect.
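For reference, the invocation that avoids the redirect is the same oc cluster up command from above with that flag removed:
oc cluster up --public-hostname='ocp.devops.ok' --host-data-dir='/var/lib/origin/etcd' --routing-suffix='cloudapps.lab.example.com'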