Kafka source connector for postgres - Day0 load - postgresql

I am looking for a Kafka source connector to do a Day 0 (initial) load from Postgres into Kafka.
I came across the Debezium Postgres connector.
Docker image:
debezium/connect:1.4
docker run -it --rm --name postgres-connect -p 8083:8083 -e BOOTSTRAP_SERVERS=host1:8080 -e GROUP_ID=test-1 -e CONFIG_STORAGE_TOPIC=my_connect_configs -e OFFSET_STORAGE_TOPIC=my_connect_offsets -e STATUS_STORAGE_TOPIC=my_connect_statuses debezium/connect:1.4
How do I pass the Postgres host details and the Kafka SASL config?
Any help would be appreciated.

1. SASL configuration
1.1. In the general case, add the following properties to your connect-distributed.properties:
sasl.mechanism=PLAIN
security.protocol=SASL_PLAINTEXT
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="connect" \
password="connect-secret";
producer.sasl.mechanism=PLAIN
producer.security.protocol=SASL_PLAINTEXT
producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="connect" \
password="connect-secret";
consumer.sasl.mechanism=PLAIN
consumer.security.protocol=SASL_PLAINTEXT
consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
username="connect" \
password="connect-secret";
Source: StackOverflow answer "ACL configuration in Kafka connect is not working"
Reference: Kafka Connect Security docs
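The three blocks above differ only in their prefix (none for the worker itself, producer. and consumer. for the clients the worker creates), so they can be generated from a single set of credentials. A minimal sketch; the function name emit_sasl_properties is made up for illustration, and the credentials are the placeholder values from the listing above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print SASL/PLAIN settings for the Connect worker and
# for its producer./consumer. clients, one property per line.
emit_sasl_properties() {
  local user="$1" password="$2" prefix
  for prefix in "" "producer." "consumer."; do
    printf '%ssasl.mechanism=PLAIN\n' "$prefix"
    printf '%ssecurity.protocol=SASL_PLAINTEXT\n' "$prefix"
    # The JAAS entry is written on a single line, so no backslash continuations are needed.
    printf '%ssasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="%s" password="%s";\n' \
      "$prefix" "$user" "$password"
  done
}
```

Usage would be something like `emit_sasl_properties connect connect-secret >> connect-distributed.properties`.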
1.2. For debezium/connect docker image you can try to pass SASL config directly via environment variables (using these transformation steps):
docker run -it --rm --name postgres-connect -p 8083:8083 \
-e BOOTSTRAP_SERVERS=host1:8080 -e GROUP_ID=test-1 \
-e CONFIG_STORAGE_TOPIC=my_connect_configs \
-e OFFSET_STORAGE_TOPIC=my_connect_offsets \
-e STATUS_STORAGE_TOPIC=my_connect_statuses \
-e CONNECT_SASL_MECHANISM=PLAIN \
-e CONNECT_SECURITY_PROTOCOL=SASL_PLAINTEXT \
-e CONNECT_SASL_JAAS_CONFIG='org.apache.kafka.common.security.plain.PlainLoginModule required username="connect" password="connect-secret";' \
-e CONNECT_PRODUCER_SASL_MECHANISM=PLAIN \
-e CONNECT_PRODUCER_SECURITY_PROTOCOL=SASL_PLAINTEXT \
-e CONNECT_PRODUCER_SASL_JAAS_CONFIG='org.apache.kafka.common.security.plain.PlainLoginModule required username="connect" password="connect-secret";' \
-e CONNECT_CONSUMER_SASL_MECHANISM=PLAIN \
-e CONNECT_CONSUMER_SECURITY_PROTOCOL=SASL_PLAINTEXT \
-e CONNECT_CONSUMER_SASL_JAAS_CONFIG='org.apache.kafka.common.security.plain.PlainLoginModule required username="connect" password="connect-secret";' \
debezium/connect:1.4
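The variable names in the command above follow a simple naming rule. As far as I know, the debezium/connect image turns each CONNECT_* variable into a worker property by dropping the prefix, lower-casing the rest and replacing underscores with dots; treat that as an assumption and check the documentation for your image version. The reverse direction can be sketched like this (to_connect_env is a made-up helper name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a Kafka Connect worker property name to the
# environment variable name the image is assumed to expect.
to_connect_env() {
  local prop="$1"
  # Upper-case the letters, turn dots into underscores, then add the CONNECT_ prefix.
  printf 'CONNECT_%s\n' "$(printf '%s' "$prop" | tr 'a-z.' 'A-Z_')"
}
```

For example, `to_connect_env producer.sasl.jaas.config` prints CONNECT_PRODUCER_SASL_JAAS_CONFIG.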
2. PostgreSQL host configuration
Host details are passed through the Kafka Connect REST API as part of the connector config:
curl -i -X PUT -H "Content-Type:application/json" \
http://localhost:8083/connectors/debezium_postgres_source/config \
-d '{
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"tasks.max": "1",
"database.hostname": "source-db",
"database.port": "5432",
"database.user": "postgresusersource",
"database.password": "postgrespw",
"database.dbname" : "sourcedb",
"database.server.name": "dbserver1"
}'
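If the host details differ per environment, the same request can be built from parameters rather than a hard-coded JSON literal. A minimal sketch, assuming the connector name and server name from the request above; connector_config_json and register_connector are illustrative names, and the values are not JSON-escaped, so this only works for values without quotes or backslashes:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Debezium Postgres connector config as JSON.
connector_config_json() {
  local host="$1" port="$2" user="$3" password="$4" dbname="$5"
  cat <<EOF
{
  "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
  "tasks.max": "1",
  "database.hostname": "${host}",
  "database.port": "${port}",
  "database.user": "${user}",
  "database.password": "${password}",
  "database.dbname": "${dbname}",
  "database.server.name": "dbserver1"
}
EOF
}

# Hypothetical helper: send the config to the Connect REST API.
# $1 is the Connect base URL; the remaining arguments go to connector_config_json.
register_connector() {
  local connect_url="$1"
  shift
  connector_config_json "$@" |
    curl -i -X PUT -H "Content-Type: application/json" \
      "${connect_url}/connectors/debezium_postgres_source/config" -d @-
}
```

Usage: `register_connector http://localhost:8083 source-db 5432 postgresusersource postgrespw sourcedb`.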

Related

How to specify whether a connector is a source or a sink?

I am currently configuring Kafka Connect (with the debezium/connect Docker image). I successfully connected it to Kafka using environment variables:
docker run -it --rm --name AAAAAA-kafka-connect -p 8083:8083 \
-v aaaaa.jks:aaaaa.jks \
-v bbbbbb.jks:bbbbbb.jks \
-e LOG_LEVEL=INFO \
-e HOST_NAME="AAAAAA-kafka-connect" \
-e HEAP_OPTS="-Xms256m -Xmx2g" \
-e BOOTSTRAP_SERVERS="BBBBB:9092" \
-e CONNECT_CLIENT_ID="xxx-kafka-connect" \
-e CONNECT_SASL_JAAS_CONFIG="org.apache.kafka.common.security.scram.ScramLoginModule required username=\"...\" password=\"...\";" \
-e CONNECT_SECURITY_PROTOCOL="SASL_SSL" \
-e CONNECT_SASL_MECHANISM="PLAIN" \
-e CONNECT_SSL_TRUSTSTORE_LOCATION="bbbbbb.jks" \
-e CONNECT_SSL_TRUSTSTORE_PASSWORD="..." \
-e CONNECT_SSL_KEYSTORE_LOCATION="aaaaa.jks" \
-e CONNECT_SSL_KEYSTORE_PASSWORD="..." \
-e GROUP_ID="XXX.grp.kafka.connect" \
-e CONFIG_STORAGE_TOPIC="XXX.connect.configs.v1" \
-e OFFSET_STORAGE_TOPIC="XXX.connect.offsets.v1" \
-e STATUS_STORAGE_TOPIC="XXX.connect.statuses.v1" \
quay.io/debezium/connect:1.9
Now I have to create a source connector (PostgreSQL database), and I want the data that Kafka Connect reads from the source to be written to a Kafka topic.
Where do I set the Kafka configuration for the sink, since there is no such setting in the JSON config of the database connector?
Do I have to create a sink connector for the Kafka topic? If so, where do we specify whether a connector is a sink or a source?
PS: I have already created the Kafka topic I want to write the data to.
Feel free to ask questions.
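Environment variables only modify the Kafka client parameters of the Connect worker.
Whether a connector is a source or a sink is determined when you actually create the connector. You need a JSON config, and it will name a connector.class; that class is either a source or a sink implementation.
In the Kafka Connect API these correspond to SourceTask and SinkTask.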
Debezium is always a Source. Sources write to Kafka; that doesn't make Kafka a sink. You need to install a new connector plugin to get a sink for your database, such as the JDBC Connector from Confluent which has classes for both sources and sinks.
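To see which role an installed connector class has, the Connect REST API lists the installed plugins at /connector-plugins, with a "class" and a "type" ("source" or "sink") per entry. A rough sketch for pulling the type out of that listing with standard tools; plugin_type is a made-up name, and the parsing assumes compact JSON without spaces, so jq is the more robust choice if it is available:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the /connector-plugins JSON on stdin and print the
# "type" of the plugin whose "class" equals $1.
# Assumes compact JSON such as {"class":"...","type":"source","version":"..."}.
plugin_type() {
  local class="$1"
  # Put each JSON object on its own line, keep the one for the requested class,
  # then extract the value of its "type" field.
  tr '{' '\n' |
    grep -F "\"class\":\"${class}\"" |
    grep -o '"type":"[a-z]*"' |
    cut -d'"' -f4
}
```

Usage: `curl -s http://localhost:8083/connector-plugins | plugin_type io.debezium.connector.postgresql.PostgresConnector`.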
OK, you have to add the CONNECT_PRODUCER_* or CONNECT_CONSUMER_* environment variables to configure the Kafka clients that the worker uses for source connectors (producers) or sink connectors (consumers).
Like this:
docker run -it --rm --name AAAAAA-kafka-connect -p 8083:8083 \
-v aaaaa.jks:aaaaa.jks \
-v bbbbbb.jks:bbbbbb.jks \
-e LOG_LEVEL=INFO \
-e HOST_NAME="AAAAAA-kafka-connect" \
-e HEAP_OPTS="-Xms256m -Xmx2g" \
-e BOOTSTRAP_SERVERS="BBBBB:9092" \
-e CONNECT_CLIENT_ID="xxx-kafka-connect" \
-e CONNECT_SASL_JAAS_CONFIG="org.apache.kafka.common.security.scram.ScramLoginModule required username=\"...\" password=\"...\";" \
-e CONNECT_SECURITY_PROTOCOL="SASL_SSL" \
-e CONNECT_SASL_MECHANISM="PLAIN" \
-e CONNECT_SSL_TRUSTSTORE_LOCATION="bbbbbb.jks" \
-e CONNECT_SSL_TRUSTSTORE_PASSWORD="..." \
-e CONNECT_SSL_KEYSTORE_LOCATION="aaaaa.jks" \
-e CONNECT_SSL_KEYSTORE_PASSWORD="..." \
-e GROUP_ID="XXX.grp.kafka.connect" \
-e CONFIG_STORAGE_TOPIC="XXX.connect.configs.v1" \
-e OFFSET_STORAGE_TOPIC="XXX.connect.offsets.v1" \
-e STATUS_STORAGE_TOPIC="XXX.connect.statuses.v1" \
-e CONNECT_PRODUCER_TOPIC_CREATION_ENABLE=false \
-e CONNECT_PRODUCER_SASL_JAAS_CONFIG="org.apache.kafka.common.security.scram.ScramLoginModule required username=\"...\" password=\"...\";" \
-e CONNECT_PRODUCER_SECURITY_PROTOCOL="SASL_SSL" \
-e CONNECT_PRODUCER_SASL_MECHANISM="PLAIN" \
-e CONNECT_PRODUCER_SSL_TRUSTSTORE_LOCATION="bbbbbb.jks" \
-e CONNECT_PRODUCER_SSL_TRUSTSTORE_PASSWORD="..." \
-e CONNECT_PRODUCER_SSL_KEYSTORE_LOCATION="aaaaa.jks" \
-e CONNECT_PRODUCER_SSL_KEYSTORE_PASSWORD="..." \
-e CONNECT_PRODUCER_CLIENT_ID="xxx-kafka-connect" \
quay.io/debezium/connect:1.9
The sink or source role comes from the connector.class used in the JSON definition of the connector. However, Debezium's CDC connectors can only be used as source connectors that capture real-time change event records from external database systems (https://hevodata.com/learn/debezium-vs-kafka-connect/#:~:text=Debezium%20platform%20has%20a%20vast,records%20from%20external%20database%20systems.)

Install pgRouting in docker postgis-postgresql container

I created a PostGIS database with Docker, using the postgis image as usual:
docker run -d \
--name mypostgres \
-p 5555:5432 \
-e POSTGRES_PASSWORD=postgres \
-v /data/postgres/data:/var/lib/postgresql/data \
-v /data/postgres/lib:/usr/lib/postgresql/10/lib \
postgis/postgis:10-3.0
Now I can see all the extensions in the database. It has postgis, which is fine, but it does not have pgrouting.
So I pulled another image:
docker pull pgrouting/pgrouting:11-3.1-3.1.3
and ran the same kind of command:
docker run -d \
--name pgrouting \
-p 5556:5432 \
-e POSTGRES_PASSWORD=postgres \
-v /data/pgrouting/data/:/var/lib/postgresql/data/ \
-v /data/postgres/lib/:/usr/lib/postgresql/11/lib/ \
pgrouting/pgrouting:11-3.1-3.1.3
But when I execute this command:
CREATE EXTENSION pgrouting;
I get this error message:
could not load library "/usr/lib/postgresql/11/lib/plpgsql.so": /usr/lib/postgresql/11/lib/plpgsql.so: undefined symbol: AllocSetContextCreate
I can't solve this problem. Can anyone help me?
Thanks a lot.

Airflow - Switching to CeleryExecutor results in password authentication failed for user "airflow" exception

I run a Docker container with Apache Airflow.
If I set executor = LocalExecutor, everything works fine. However, if I set executor = CeleryExecutor and run a DAG, the following exception is printed:
[2020-07-13 04:17:41,065] {{celery_executor.py:266}} ERROR - Error fetching Celery task state, ignoring it:OperationalError('(psycopg2.OperationalError) FATAL: password authentication failed for user "airflow"\n')
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/airflow/executors/celery_executor.py", line 108, in fetch_celery_task_state
However, I do provide the following environment variables in the docker run call:
docker run --name test -it \
-p 8000:80 -p 5555:5555 -p 8080:8080 \
-v `pwd`:/app \
-e AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY \
-e AWS_DEFAULT_REGION \
-e PYTHONPATH=/app \
-e ENVIRONMENT=local \
-e XCOMMAND \
-e POSTGRES_PORT=5432 \
-e POSTGRES_HOST=postgres \
-e POSTGRES_USER=project_user \
-e POSTGRES_PASSWORD=password \
-e DJANGO_SETTINGS_MODULE=config.settings.local \
-e AIRFLOW_DB_NAME=project_airflow_dev \
-e AIRFLOW_ADMIN_USER=project_user \
-e AIRFLOW_ADMIN_EMAIL=admin@project.com \
-e AIRFLOW_ADMIN_PASSWORD=password \
-e AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://project_user:password@postgres:5432/project_airflow_dev \
-e AIRFLOW__CORE__EXECUTOR=CeleryExecutor \
-e AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/1 \
--network="project-network" \
--link project_cassandra_1:cassandra \
--link project_postgres_1:postgres \
--link project_redis_1:redis \
registry.dkr.ecr.us-east-2.amazonaws.com/airflow:v1.0
With LocalExecutor everything is fine: I can log in to the admin UI, trigger the DAG and get successful results. Only when I switch to CeleryExecutor do I get this odd error about the "airflow" user, as if the AIRFLOW__CORE__SQL_ALCHEMY_CONN env var were not visible or not used at all.
Any ideas?
Solution:
Adding the AIRFLOW__CELERY__RESULT_BACKEND env var fixed the issue.
...
-e AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql+psycopg2://project_user:password@postgres:5432/project_airflow_dev \
...
Or edit airflow.cfg:
[celery]
result_backend = db+postgresql://airflow:airflow@postgres/airflow
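Since the result backend in the fix is the metadata database URL with a db+ prefix, one way to keep the two settings from drifting apart is to derive both from the same parts. A minimal sketch; airflow_db_env is a made-up helper, and it assumes the user and password need no URL-encoding:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print both Airflow settings in VAR=value form,
# built from one set of Postgres connection details.
airflow_db_env() {
  local user="$1" password="$2" host="$3" port="$4" db="$5"
  local url="postgresql+psycopg2://${user}:${password}@${host}:${port}/${db}"
  printf 'AIRFLOW__CORE__SQL_ALCHEMY_CONN=%s\n' "$url"
  # The Celery result backend is the same database URL prefixed with "db+".
  printf 'AIRFLOW__CELERY__RESULT_BACKEND=db+%s\n' "$url"
}
```

The output can be written to a file, e.g. `airflow_db_env project_user password postgres 5432 project_airflow_dev > airflow-db.env`, and passed to the container with `docker run --env-file airflow-db.env ...`.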

kafka connect avro No suitable driver found for jdbc:mysql://127.0.0.1:3306/connect_test

I'm following the Confluent Kafka Connect Docker tutorial.
I start the kafka-connect Docker image and check in the Docker logs that it started OK.
docker run -d \
--name=kafka-connect-avro \
--net=host \
-e CONNECT_BOOTSTRAP_SERVERS=localhost:29092 \
-e CONNECT_REST_PORT=28083 \
-e CONNECT_GROUP_ID="quickstart-avro" \
-e CONNECT_CONFIG_STORAGE_TOPIC="quickstart-avro-config" \
-e CONNECT_OFFSET_STORAGE_TOPIC="quickstart-avro-offsets" \
-e CONNECT_STATUS_STORAGE_TOPIC="quickstart-avro-status" \
-e CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR=1 \
-e CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR=1 \
-e CONNECT_STATUS_STORAGE_REPLICATION_FACTOR=1 \
-e CONNECT_KEY_CONVERTER="io.confluent.connect.avro.AvroConverter" \
-e CONNECT_VALUE_CONVERTER="io.confluent.connect.avro.AvroConverter" \
-e CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL="http://localhost:8081" \
-e CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL="http://localhost:8081" \
-e CONNECT_INTERNAL_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_INTERNAL_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_REST_ADVERTISED_HOST_NAME="localhost" \
-e CONNECT_LOG4J_ROOT_LOGLEVEL=DEBUG \
-e CONNECT_PLUGIN_PATH=/usr/share/java,/etc/kafka-connect/jars \
-v /tmp/quickstart/file:/tmp/quickstart \
-v /tmp/quickstart/jars:/etc/kafka-connect/jars \
confluentinc/cp-kafka-connect:latest
When I try to create the JDBC source connector:
curl -X POST -H "Content-Type: application/json" --data '{ "name": "quickstart-jdbc-source", "config": { "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector", "tasks.max": 1, "connection.url": "jdbc:mysql://127.0.0.1:3306/connect_test?user=root&password=confluent", "mode": "incrementing", "incrementing.column.name": "id", "timestamp.column.name": "modified", "topic.prefix": "quickstart-jdbc-", "poll.interval.ms": 1000 } }' http://$CONNECT_HOST:28083/connectors
I get
"error_code":400,"message":"Connector configuration is invalid and contains the following 2 error(s):\nInvalid value java.sql.SQLException: No suitable driver found for jdbc:mysql://127.0.0.1:3306/connect_test?user=root&password=confluent for configuration Couldn't open connection to jdbc:mysql://127.0.0.1:3306/connect_test?user=root&password=confluent\nInvalid value java.sql.SQLException: No suitable driver found for jdbc:mysql://127.0.0.1:3306/connect_test?user=root&password=confluent for configuration Couldn't open connection to jdbc:mysql://127.0.0.1:3306/connect_test?user=root&password=confluent\nYou can also find the above list of errors at the endpoint `/{connectorType}/config/validate`"}
Is something wrong with the MySQL JDBC jar?
Further detail: it seems from the logs below that CONNECT_PLUGIN_PATH does nothing.
[2019-08-28 17:19:27,113] INFO Added alias 'BasicAuthSecurityRestExtension' to plugin 'org.apache.kafka.connect.rest.basic.auth.extension.BasicAuthSecurityRestExtension' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
plugin.path = [/usr/share/java, /etc/kafka-connect/jars]
[2019-08-28 17:19:27,231] WARN The configuration 'plugin.path' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
The output below shows that the mount is successful:
docker inspect kafka-connect-avro
{
"Type": "bind",
"Source": "/tmp/quickstart/jars",
"Destination": "/etc/kafka-connect/jars",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
},
I got this working by copying the jar into the running container:
docker cp /tmp/quickstart/jars/mysql-connector-java-8.0.17.jar kafka-connect-avro:/usr/share/java/kafka
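Before starting the container or running docker cp, it can help to confirm that the driver jar is really in the directory you expect. A small sketch that checks a directory for a MySQL driver jar; the function names and the default file name pattern are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list JDBC driver jars directly inside a directory.
# $1 = directory, $2 = optional file name pattern.
find_jdbc_driver() {
  local dir="$1" pattern="${2:-mysql-connector-*.jar}"
  find "$dir" -maxdepth 1 -type f -name "$pattern" | sort
}

# Hypothetical helper: succeed only if at least one matching jar is present.
require_jdbc_driver() {
  if [ -z "$(find_jdbc_driver "$@")" ]; then
    echo "no JDBC driver jar found in $1" >&2
    return 1
  fi
}
```

Usage on the host: `require_jdbc_driver /tmp/quickstart/jars`. To look inside the container after the copy, something like `docker exec kafka-connect-avro ls /usr/share/java/kafka` should show the jar.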

Hasura use SSL certificates for Postgres connection

I can run Hasura from the Docker image:
docker run -d -p 8080:8080 \
-e HASURA_GRAPHQL_DATABASE_URL=postgres://username:password@hostname:port/dbname \
-e HASURA_GRAPHQL_ENABLE_CONSOLE=true \
hasura/graphql-engine:latest
But I also have a Postgres instance that can only be accessed with three certificates:
psql "sslmode=verify-ca sslrootcert=server-ca.pem \
sslcert=client-cert.pem sslkey=client-key.pem \
hostaddr=$DB_HOST \
port=$DB_PORT \
user=$DB_USER dbname=$DB_NAME"
I don't see a configuration for Hasura that allows me to connect to a Postgres instance in such a way.
Is this something I'm supposed to pass in the database connection URL?
How should I do this?
You'll need to mount your certificates into the Docker container and then configure libpq (which is what Hasura uses underneath) to use the required certificates with these environment variables. It'll be something like this (I haven't tested this):
docker run -d -p 8080:8080 \
-v /absolute-path-of-certs-folder:/certs \
-e HASURA_GRAPHQL_DATABASE_URL=postgres://hostname:port/dbname \
-e HASURA_GRAPHQL_ENABLE_CONSOLE=true \
-e PGSSLMODE=verify-ca \
-e PGSSLCERT=/certs/client-cert.pem \
-e PGSSLKEY=/certs/client-key.pem \
-e PGSSLROOTCERT=/certs/server-ca.pem \
hasura/graphql-engine:latest
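Because the container only sees what is in the mounted folder, it is worth checking on the host that all three files are there before starting it. A minimal sketch, assuming the file names from the psql command in the question; check_pg_certs is a made-up name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the CA certificate, client certificate and
# client key exist and are readable in the folder that will be mounted at /certs.
check_pg_certs() {
  local dir="$1" f missing=0
  for f in server-ca.pem client-cert.pem client-key.pem; do
    if [ ! -r "${dir}/${f}" ]; then
      echo "missing or unreadable: ${dir}/${f}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_pg_certs /absolute-path-of-certs-folder && docker run ...`. libpq can also be strict about the permissions on the client key file, so a `chmod 0600` on client-key.pem may be needed.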