Add MongoDB Sink Connector on docker?

Add MongoDB Sink Connector on docker? - mongodb

I have a Ubuntu machine, where I followed this steps in order to run Confluent Platform with docker.
https://docs.confluent.io/platform/current/quickstart/ce-docker-quickstart.html
I can produce and subscribe to messages just fine.
I'm trying to add a MongoDB Sink Connector, in order to sync data with a mongo database.
I've downloaded this zip file https://www.confluent.io/hub/hpgrahsl/kafka-connect-mongodb
I've edited the etc/MongoDbSinkConnector.properties file with the correct mongo endpoint
I've uploaded the zip to my Ubuntu machine
I've created a file Dockerfile with the following content
FROM confluentinc/cp-kafka-connect-base
COPY hpgrahsl-kafka-connect-mongodb-1.4.0.zip /tmp/hpgrahsl-kafka-connect-mongodb-1.4.0.zip
RUN confluent-hub install --no-prompt /tmp/hpgrahsl-kafka-connect-mongodb-1.4.0.zip
I've executed the following command docker build . -t my-custom-image:1.0.0
Sending build context to Docker daemon 15.03MB
Step 1/3 : FROM confluentinc/cp-kafka-connect-base
---> 8fe065fffe44
Step 2/3 : COPY hpgrahsl-kafka-connect-mongodb-1.4.0.zip /tmp/hpgrahsl-kafka-connect-mongodb-
1.4.0.zip
---> Using cache
---> ed2e4ec7ff97
Step 3/3 : RUN confluent-hub install --no-prompt /tmp/hpgrahsl-kafka-connect-mongodb-1.4.0.zip
---> Using cache
---> 034f82e2e136
Successfully built 034f82e2e136
Successfully tagged my-custom-image:1.0.0
Am I missing something? My mongo does not get updated
Do I have to edit docker-compose.yml also?
How do I debug this connector, does it have logs?

When you run Kafka Connect under Docker (including with the cp-kafka-connect-base) image it is usually in distributed mode. To create a connector configuration in this mode you use a REST call; it won't load the configuration from a flat file (per standalone mode).
You can either launch the container that you've created and then manually create the connector with a REST call, or you can automate that REST call - here's an example of doing it within a Docker Compose:
kafka-connect-01:
image: confluentinc/cp-kafka-connect-base:6.2.0
depends_on:
- kafka
ports:
- 8083:8083
environment:
CONNECT_BOOTSTRAP_SERVERS: "kafka:29092"
[…]
command:
- bash
- -c
- |
#
echo "Installing connector plugins"
confluent-hub install --no-prompt hpgrahsl/kafka-connect-mongodb:1.4.0
#
echo "Launching Kafka Connect worker"
/etc/confluent/docker/run &
#
echo "Waiting for Kafka Connect to start listening on localhost ⏳"
while : ; do
curl_status=$$(curl -s -o /dev/null -w %{http_code} http://localhost:8083/connectors)
echo -e $$(date) " Kafka Connect listener HTTP state: " $$curl_status " (waiting for 200)"
if [ $$curl_status -eq 200 ] ; then
break
fi
sleep 5
done
echo -e "\n--\n+> Creating connector"
curl -s -X PUT -H "Content-Type:application/json" http://localhost:8083/connectors/mongo-connector/config \
-d '{
[… connector JSON config goes here …]
}'
sleep infinity
References:
https://rmoff.net/2018/12/15/docker-tips-and-tricks-with-kafka-connect-ksqldb-and-kafka/
https://developer.confluent.io/learn-kafka/kafka-connect/docker/

Related

when you use kafka(in docker container), Where exactly is the plugin path?

----- i use kafka, kafka-connect(image: confluentinc/cp-kafka-connect)
when you use kafka in docker container if you wanna operate kafka, you have to go into the container(like 'docker exec -it kafka' or 'docker exec -it kafka-connect' ----> this is another question what i wanna ask) , right..??
i tried putting some connector (jdbc connector, mysql connector) into kafka-connect container, but it didn't work.
so.. my question is
after docker-compose up(put in container), if i wanna connect with some connectors('./bin/connect-distributed.sh ./etc/kafka/connect-distributed.properties'),
what container i have to go into???
if i type plugin path, where should i write?( kafka? kafka-connect?)
I wouldn't mind if it was difficult to read... sorry for that

No, you don't need to exec anywhere unless you cannot download Kafka on your host machine to get the CLI scripts. But you'd only exec for kafka-topics, console producer/consumer, kafka-consumer-groups, etc, not any of the Connect scripts.
The Connect container automatically runs the Distributed script and you simply provide CONNECT_PLUGIN_PATH as an environment variable to any directory in the container you want to use for the plugins (I like /opt/connectors if I mount volume, but that's not where confluent-hub installs to for that image). That variable doesn't do anything for the broker image, only Connect.
Related How to install connectors to the docker image of apache kafka connect

If your requirement is startup a Kafka Connect.
You can use the basic guide published by Confluent "Build Your Own Apache Kafka® Demos"
Basically you need execute the following instructions:
git clone https://github.com/confluentinc/cp-all-in-one.git
cd cp-all-in-one/cp-all-in-one
git checkout 7.1.1-post
docker-compose up -d
This has Control Center at http://localhost:8088
If you need install a Connector, you can go to the https://www.confluent.io/hub select your specific connector.
Then, you can create your DockerImage of specific Kafka Connect server.
1.- Write a Dockerfile.
vim Dockerfile
2.- Add connector "example JDBC" from Confluent Hub.
FROM confluentinc/cp-kafka-connect
ENV MYSQL_DRIVER_VERSION 5.1.39
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-jdbc:10.5.0
RUN curl -k -SL "https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-${MYSQL_DRIVER_VERSION}.tar.gz" \
| tar -xzf - -C /usr/share/confluent-hub-components/confluentinc-kafka-connect-jdbc/lib \
--strip-components=1 mysql-connector-java-5.1.39/mysql-connector-java-${MYSQL_DRIVER_VERSION}-bin.jar
3.- Build the docker image.
docker build . -t my-kafka-connect-jdbc:1.0.0
4.- Then, you can go to edit your docker-compose.yml, change the line 57
from:
image: cnfldemos/cp-server-connect-datagen:0.5.3-7.1.0
to:
image: my-kafka-connect-jdbc:1.0.0
5.- Finally, stop and start your Confluent Platform local environment:
docker-compose down
docker-compose up
Verify your docker
docker ps
Test your Connect server:
curl --location --request GET 'http://localhost:8083/connectors'

Creating 2 S3 buckets in LocalStack via a docker-compose file

Currently we’re creating a localstack container using a docker-compose file, specifically for the purpose of using the S3 service.
We’ve added this line to the environment which creates an S3 bucket
- AMAZONPROPERTIES.BUCKETNAME=bucketname
We’ve then created any additional buckets needed using a utility within our Java code.
However, it would be preferable to create all buckets needed automatically at the outset using our docker-compose file. Is it possible to do this?

Not sure if it's the best way, but it works.
We're now running the docker-compose.yml in a bash script, waiting a short while to ensure that the service is running and then call a curl command within the docker container to create another S3 bucket.
#!/bin/bash
docker-compose -f docker-compose.yml up -d --build
echo "Waiting for Services to be ready"
sleep 20
docker exec -it general-files_general-files_1 curl -X POST
https://localhost:7777/createBucket -F bucketName=bucketname2 --insecure
echo
echo "S3 buckets available are: "
docker exec -it general-files_general-files_1 curl -X GET
https://localhost:7777/listBuckets --insecure
echo
echo "Services are ready for use"

How to run Kafka Connect connectors automatically (e.g. in production)?

Is there a way to automatically load (multiple) Kafka Connect connectors upon the start of Kafka Connect (e.g. in Confluent Platform)?
What I've found out so far:
Confluent Docs state to use the bin/connect-standalone
the command for Standalone Mode with a properties file for the worker and for every single connector.
For Distributed Mode you have to run the connector via REST API.
https://docs.confluent.io/current/connect/userguide.html#standalone-mode, https://docs.confluent.io/current/connect/managing/configuring.html#standalone-example
Is there another method, e.g. to include all connectors that should be run in the 'connect-[standalone|distributed].properties' file (similar to providing KSQL queries file in ksql-server.properties) so that they are loaded automatically upon the start of Kafka Connect (e.g. in Confluent Platform)?
Or are the connectors loaded "manually" as described above even in production environments?

Normally, you'd have to use the REST API when running Kafka Connect in distributed mode. However, you can use docker compose to script the creation of connectors;
#Robin Moffatt has written a nice article about this:
kafka-connect:
image: confluentinc/cp-kafka-connect:5.1.2
environment:
CONNECT_REST_PORT: 18083
CONNECT_REST_ADVERTISED_HOST_NAME: "kafka-connect"
[…]
volumes:
- $PWD/scripts:/scripts
command:
- bash
- -c
- |
/etc/confluent/docker/run &
echo "Waiting for Kafka Connect to start listening on kafka-connect ⏳"
while [ $$(curl -s -o /dev/null -w %{http_code} http://kafka-connect:8083/connectors) -eq 000 ] ; do
echo -e $$(date) " Kafka Connect listener HTTP state: " $$(curl -s -o /dev/null -w %{http_code} http://kafka-connect:8083/connectors) " (waiting for 200)"
sleep 5
done
nc -vz kafka-connect 8083
echo -e "\n--\n+> Creating Kafka Connect Elasticsearch sink"
/scripts/create-es-sink.sh
sleep infinity
Notes:
In the command section, $ are replaced with $$ to avoid the error
Invalid interpolation format for "command" option
sleep infinity is
necessary, because we’ve sent the /etc/confluent/docker/run process to
a background thread (&) and so the container will exit if the main
command finishes.
The actual script to configure the connector is a
curl call in a separate file. You could build this into the Docker
Compose but it feels a bit yucky.
You could combine both this and the
technique above if you wanted to install a custom connector plugin
before launching Kafka Connect, e.g.
confluent-hub install --no-prompt confluentinc/kafka-connect-gcs:5.0.0 /etc/confluent/docker/run

kafka connect connector doesn't start automatically

I have a Kafka Connect source and sink connector for putting data into Kafka and taking it back out.
I am running Kafka and Kafka Connect using docker-compose which runs connect in distributed mode. see that it finds my plugin when connect starts up, but it doesn't actually do anything unless I do a POST to the /connectors API, including the configuration in JSON.
I have a properties file with the configuration in it and I've tried putting it under /etc where I find similar properties files for the other plugins that are installed.
Am I missing a step when installing my plugin, or is it required to register the connector via the REST API before it will be assigned to workers?

Yes, you have to configure Kafka Connect using the REST API when using distributed mode.
It's possible to script the creation of connectors though, using a Docker Compose like this:
command:
- bash
- -c
- |
/etc/confluent/docker/run &
echo "Waiting for Kafka Connect to start listening on kafka-connect ⏳"
while [ $$(curl -s -o /dev/null -w %{http_code} http://kafka-connect:8083/connectors) -eq 000 ] ; do
echo -e $$(date) " Kafka Connect listener HTTP state: " $$(curl -s -o /dev/null -w %{http_code} http://kafka-connect:8083/connectors) " (waiting for 200)"
sleep 5
done
nc -vz kafka-connect 8083
echo -e "\n--\n+> Creating Kafka Connect Elasticsearch sink"
/scripts/create-es-sink.sh
sleep infinity
where /scripts/create-es-sink.sh is the REST call from curl in a file mounted locally to the container.
(source)

You can install a Kafka connector before you start the distributed Connect worker using "confluent-hub install" as shown here: Install Kafka connector manually). However, I'm not sure what the magic is if you aren't using confluent-hub though.

How can I wait for a docker container to be up and running?

When running a service inside a container, let's say mongodb, the command
docker run -d myimage
will exit instantly, and return the container id.
In my CI script, I run a client to test mongodb connection, right after running the mongo container.
The problem is: the client can't connect because the service is not up yet.
Apart from adding a big sleep 10in my script, I don't see any option to wait for a container to be up and running.
Docker has a command wait which doesn't work in that case, because the container doesn't exist.
Is it a limitation of docker?

Found this simple solution, been looking for something better but no luck...
until [ "`docker inspect -f {{.State.Running}} CONTAINERNAME`"=="true" ]; do
sleep 0.1;
done;
or if you want to wait until the container is reporting as healthy (assuming you have a healthcheck)
until [ "`docker inspect -f {{.State.Health.Status}} CONTAINERNAME`"=="healthy" ]; do
sleep 0.1;
done;

As commented in a similar issue for docker 1.12
HEALTHCHECK support is merged upstream as per docker/docker#23218 - this can be considered to determine when a container is healthy prior to starting the next in the order
This is available since docker 1.12rc3 (2016-07-14)
docker-compose is in the process of supporting a functionality to wait for specific conditions.
It uses libcompose (so I don't have to rebuild the docker interaction) and adds a bunch of config commands for this. Check it out here: https://github.com/dansteen/controlled-compose
You can use it in Dockerfile like this:
HEALTHCHECK --interval=5m --timeout=3s \
CMD curl -f http://localhost/ || exit 1
Official docs: https://docs.docker.com/engine/reference/builder/#/healthcheck

If you don't want to expose the ports, as is the case if you plan to link the container and might be running multiple instances for testing, then I found this was a good way to do it in one line :) This example is based on waiting for ElasticSearch to be ready:
docker inspect --format '{{ .NetworkSettings.IPAddress }}:9200' elasticsearch | xargs wget --retry-connrefused --tries=5 -q --wait=3 --spider
This requires wget to be available, which is standard on Ubuntu. It will retry 5 times, 3 seconds between tries, even if the connection is refused, and also does not download anything.

If the containerized service you started doesn't necessarily respond well to curl or wget requests (which is quite likely for many services) then you could use nc instead.
Here's a snippet from a host script which starts a Postgres container and waits for it to be available before continuing:
POSTGRES_CONTAINER=`docker run -d --name postgres postgres:9.3`
# Wait for the postgres port to be available
until nc -z $(sudo docker inspect --format='{{.NetworkSettings.IPAddress}}' $POSTGRES_CONTAINER) 5432
do
echo "waiting for postgres container..."
sleep 0.5
done
Edit - This example does not require that you EXPOSE the port you are testing, since it accesses the Docker-assigned 'private' IP address for the container. However this only works if the docker host daemon is listening on the loopback (127.x.x.x). If (for example) you are on a Mac and running the boot2docker VM, you will be unable to use this method since you cannot route to the 'private' IP addresses of the containers from your Mac shell.

Assuming that you know the host+port of your MongoDB server (either because you used a -link, or because you injected them with -e), you can just use curl to check if the MongoDB server is running and accepting connections.
The following snippet will try to connect every second, until it succeeeds:
#!/bin/sh
while ! curl http://$DB_PORT_27017_TCP_ADDR:$DB_PORT_27017_TCP_PORT/
do
echo "$(date) - still trying"
sleep 1
done
echo "$(date) - connected successfully"

I've ended up with something like:
#!/bin/bash
attempt=0
while [ $attempt -le 59 ]; do
attempt=$(( $attempt + 1 ))
echo "Waiting for server to be up (attempt: $attempt)..."
result=$(docker logs mongo)
if grep -q 'waiting for connections on port 27017' <<< $result ; then
echo "Mongodb is up!"
break
fi
sleep 2
done

Throwing my own solution out there:
I'm using docker networks so Mark's netcat trick didn't work for me (no access from the host network), and Erik's idea doesn't work for a postgres container (the container is marked as running even though postgres isn't yet available to connect to). So I'm just attempting to connect to postgres via an ephemeral container in a loop:
#!/bin/bash
docker network create my-network
docker run -d \
--name postgres \
--net my-network \
-e POSTGRES_USER=myuser \
postgres
# wait for the database to come up
until docker run --rm --net my-network postgres psql -h postgres -U myuser; do
echo "Waiting for postgres container..."
sleep 0.5
done
# do stuff with the database...

If you want to wait for an opened port, you can use this simple script:
until </dev/tcp/localhost/32022; do sleep 1; done
For wait until port 32022 be able to connect.

I had to tackle this recetly and came up with an idea. When doing research for this task I got here, so I thought I'd share my solution with future visitors of this post.
Docker-compose-based solution
If you are using docker-compose you can check out my docker synchronization POC. I combined some of the ideas in other questions (thanks for that - upvoted).
The basic idea is that every container in the composite exposes a diagnostic service. Calling this service checks if the required set of ports is open in the container and returns the overall status of the container (WARMUP/RUNNING as per the POC). Each container also has an utility to check upon startup if the dependant services are up and running. Only then the container starts up.
In the example docker-compose environment there are two services server1 and server2 and the client service which waits for both servers to start then sends a request to both of them and exits.
Excerpt from the POC
wait_for_server.sh
#!/bin/bash
server_host=$1
sleep_seconds=5
while true; do
echo -n "Checking $server_host status... "
output=$(echo "" | nc $server_host 7070)
if [ "$output" == "RUNNING" ]
then
echo "$server_host is running and ready to process requests."
break
fi
echo "$server_host is warming up. Trying again in $sleep_seconds seconds..."
sleep $sleep_seconds
done
Waiting for multiple containers:
trap 'kill $(jobs -p)' EXIT
for server in $DEPENDS_ON
do
/assets/wait_for_server.sh $server &
wait $!
done
Diagnostic srervice basic implementation (checkports.sh):
#!/bin/bash
for port in $SERVER_PORT; do
nc -z localhost $port;
rc=$?
if [[ $rc != 0 ]]; then
echo "WARMUP";
exit;
fi
done
echo "RUNNING";
Wiring up the diagnostic service to a port:
nc -v -lk -p 7070 -e /assets/checkports.sh

test/test_runner
#!/usr/bin/env ruby
$stdout.sync = true
def wait_ready(port)
until (`netstat -ant | grep #{port}`; $?.success?) do
sleep 1
print '.'
end
end
print 'Running supervisord'
system '/usr/bin/supervisord'
wait_ready(3000)
puts "It's ready :)"
$ docker run -v /tmp/mnt:/mnt myimage ruby mnt/test/test_runner
I'm testing like this whether the port is listening or not.
In this case I have test running from inside container, but it's also possible from outside whether mongodb is ready or not.
$ docker run -p 37017:27017 -d myimage
And check whether the port 37017 is listening or not from host container.

You can use wait-for-it, "a pure bash script that will wait on the availability of a host and TCP port. It is useful for synchronizing the spin-up of interdependent services, such as linked docker containers. Since it is a pure bash script, it does not have any external dependencies".
However, you should try to design your services to avoid these kind of interdependencies between services. Can your service try to reconnect to the database? Can you let your container just die if it can't connect to the database and let a container orchestrator (e.g. Docker Swarm) do it for you?

Docker-compose solution
After docker-compose I dont know name of docker container, so I use
docker inspect -f {{.State.Running}} $(docker-compose ps -q <CONTAINER_NAME>)
and checking true like here https://stackoverflow.com/a/33520390/7438079

In order to verify if a PostgreSQL or MySQL (currently) Docker container is up and running (specially for migration tools like Flyway), you can use the wait-for binary: https://github.com/arcanjoaq/wait-for.

For mongoDB docker instance we did this and works like a charm:
#!/usr/bin/env bash
until docker exec -i ${MONGO_IMAGE_NAME} mongo -u ${MONGO_INITDB_ROOT_USERNAME} -p ${MONGO_INITDB_ROOT_PASSWORD}<<EOF
exit
EOF
do
echo "Waiting for Mongo to start..."
sleep 0.5
done

Here is what I ended up with which is similar to a previous answer just a little more concise,
until [[ $(docker logs $db_container_name) == *"waiting for connections on port 27017"* ]]
do
echo "waiting on mongo to boot..."
sleep 1
done

1 : A container attached to a service with docker-compose doesn't launch when a Synology NAS starts up.
I had a problem launching a docker container on a Synology NAS that was attached to another container via docker-compose like this:
...
---
version: "3"
services:
gluetun:
image: qmcgaw/gluetun
container_name: gluetun
...
qbittorrent:
image: lscr.io/linuxserver/qbittorrent:latest
container_name: qbittorrent
# Connect the service to gluetun
network_mode: "service:gluetun"
...
The docker used by Synology is different or not up to date and apparently does not appreciate that a container is attached to another container with network_mode, the Synology docker considers that the container is not attached to any network and therefore can not launch the container. However in command line it works very well so I wanted to make a script to launch it automatically at the startup of my NAS by a scheduled task.
note : I creat my docker compose with portainer
2 : The until loop does not work even with all the different ways of writing the condition.
If like me on your Synology NAS you did not manage to make the until loop work as described superhero : here you will have to go through the while loop.
however with the -x argument of bash to debug my code the String comparison was well done:
output line (same with all ways of describing the expression):
...
+ [' false = true ']'
...
No matter what the result, nothing worked, I checked every time and there was always a moment when it did not work as I wanted.
4: THE SOLUTION FOR SYNOLOGY
Environment
DSM : 7.1.1
bash : 4.4.23
docker : 20.10.3
After finding the right syntax, we had to solve another following problem:
The docker container status check can only work if the synology docker package is running.
so i used synopkg with is_onoff, is_active doesn't work and status was giving too much string. so my solution gave this :
#!/bin/bash
while [ "$(synopkg is_onoff Docker)" != "package Docker is turned on" ]; do
sleep 0.1;
done;
echo "Docker package is running..."
echo ""
while [ "$(docker inspect -f {{.State.Running}} gluetun)" = "false" ]; do
sleep 0.1;
done;
echo "gluetun is running..."
echo ""
if [ "$(docker ps -a -f status=exited -f name=qbittorrent --format '{{.Names}}')" ]; then
echo "Qbittorrent is not running I try to start this container"
docker start qbittorrent
else
echo "Qbittorrent docker is already started"
fi
So I was able to do a scheduled task with the root user at Boot-Up in the DSM configurations and it worked fine after a reboot, without checking the Synology Docker package launch status with synopkg it did not work.
NOTE
I think the version of Bash in DSM doesn't like the until loop or it is misinterpreted. Maybe this solution can work with systems where bash is in an older version and for X reasons you can't update it or you don't want to update the binaries of Bash to avoid breaking your system.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Add MongoDB Sink Connector on docker? - mongodb

Related

when you use kafka(in docker container), Where exactly is the plugin path?

Creating 2 S3 buckets in LocalStack via a docker-compose file

How to run Kafka Connect connectors automatically (e.g. in production)?

kafka connect connector doesn't start automatically

How can I wait for a docker container to be up and running?

Categories

Resources