I am starting to play with CDC and Kafka Connect.
After countless hours of trying, I have come to understand the logic:
Set the Kafka Connect worker configuration (config/connect-standalone.properties) with your cluster information
Download your Kafka connector (in this case MySQL from Debezium)
Configure the connector properties in whatevername.properties
In order to run a worker with a Kafka connector, you run:
./bin/connect-standalone.sh config/connect-standalone.properties
which responds with:
INFO Usage: ConnectStandalone worker.properties connector1.properties [connector2.properties ...] (org.apache.kafka.connect.cli.ConnectStandalone:62)
I know we need to run:
./bin/connect-standalone.sh config/connect-standalone.properties myconfig.properties
My issue is that I cannot find any format description or example of that myconfig.properties file.
Extra Info:
Debezium configuration properties list:
https://docs.confluent.io/debezium-connect-mysql-source/current/mysql_source_connector_config.html#mysql-source-connector-config
https://debezium.io/documentation/reference/1.5/connectors/mysql.html
Question:
Where can I find an example of the connector properties?
Thanks!
I'm not sure if I understood your question, but here is an example of the properties for this connector:
connector.class=io.debezium.connector.mysql.MySqlConnector
name=someuniquename
database.hostname=192.168.99.100
database.port=3306
database.user=debezium-user
database.password=debezium-user-pw
database.server.id=184054
database.server.name=fullfillment
database.include.list=inventory
database.history.kafka.bootstrap.servers=kafka:9092
database.history.kafka.topic=dbhistory.fullfillment
include.schema.changes=true
The original config is the one from the documentation, which I converted from JSON to properties format: https://debezium.io/documentation/reference/1.5/connectors/mysql.html#mysql-example-configuration
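For comparison, the documentation's JSON form (the shape used when registering a connector through the REST API in distributed mode) wraps the same keys in a config object:
{
  "name": "someuniquename",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "192.168.99.100",
    "database.port": "3306",
    "database.user": "debezium-user",
    "database.password": "debezium-user-pw",
    "database.server.id": "184054",
    "database.server.name": "fullfillment",
    "database.include.list": "inventory",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "dbhistory.fullfillment",
    "include.schema.changes": "true"
  }
}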
I have successfully set up Kafka Connect in distributed mode locally with the Confluent BigQuery connector. The topics are being made available to me by another party; I am simply moving these topics into my Kafka Connect on my local machine, and then to the sink connector (and thus into BigQuery).
Because the topics are created by someone else, the schema registry is also being managed by them. So in my config I set "schema.registry.url": "https://url-to-schema-registry", but we have multiple topics which all use the same schema entry, which is located at, let's say, https://url-to-schema-registry/subjects/generic-entry-value/versions/1.
What is happening, however, is that Connect is looking for the schema entry based on the topic name. So let's say my topic is my-topic. Connect is looking for the entry at this URL: https://url-to-schema-registry/subjects/my-topic-value/versions/1. But instead, I want to use the entry located at https://url-to-schema-registry/subjects/generic-entry-value/versions/1, and I want to do so for any and all topics.
How can I make this change? I have tried looking at this doc: https://docs.confluent.io/platform/current/schema-registry/serdes-develop/index.html#configuration-details as well as this class: https://github.com/confluentinc/schema-registry/blob/master/schema-serializer/src/main/java/io/confluent/kafka/serializers/subject/TopicRecordNameStrategy.java
but this looks to be a config parameter for the schema registry itself (which I have no control over), not the sink connector. Unless I'm not configuring something correctly.
Is there a way for me to configure my sink connector to look for a specified schema entry like generic-entry-value/versions/..., instead of the default format topic-name-value/versions/...?
The strategy is configurable at the connector level.
e.g.
value.converter.value.subject.name.strategy=...
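For instance, in the sink connector's own properties (assuming the Avro converter; the strategy shown here is the built-in RecordNameStrategy, and the URL is your placeholder from above):
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=https://url-to-schema-registry
value.converter.value.subject.name.strategy=io.confluent.kafka.serializers.subject.RecordNameStrategy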
However, the only built-in strategies are for Topic and/or RecordName lookups. You'll need to write your own class for static lookups of "generic-entry" if you cannot otherwise copy this "generic-entry-value" schema into new subjects.
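A minimal sketch of such a class, assuming the SubjectNameStrategy interface from a recent Schema Registry client (the class name and the pinned subjects are illustrative):
import io.confluent.kafka.schemaregistry.ParsedSchema;
import io.confluent.kafka.serializers.subject.strategy.SubjectNameStrategy;
import java.util.Map;

// Ignores the topic name and pins every lookup to one static subject
public class StaticSubjectNameStrategy implements SubjectNameStrategy {

  @Override
  public void configure(Map<String, ?> config) {
    // nothing to configure for a static lookup
  }

  @Override
  public String subjectName(String topic, boolean isKey, ParsedSchema schema) {
    return isKey ? "generic-entry-key" : "generic-entry-value";
  }
}
Package it into a JAR on the Connect worker's classpath/plugin path and reference it by its fully-qualified name in value.converter.value.subject.name.strategy.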
To copy the schema instead, e.g.:
# save the output of this to a file, wrapped as {"schema": "..."} for re-posting
curl ... https://url-to-schema-registry/subjects/generic-entry-value/versions/1/schema
# upload it again, where "new-entry" is the name of the other topic
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d @schema.json https://url-to-schema-registry/subjects/new-entry-value/versions
I installed Neo4j and I can access the server. I can create nodes through Cypher.
Now I want to use it for data streams, but I'm not sure how to do so. I just started with Neo4j and I'm struggling with installing the Streams plugin.
Any help is highly appreciated.
You should copy the jar files for the Neo4j Streams plugin directly into your /plugins folder and configure the connections to Kafka and ZooKeeper, as well as other Neo4j property values, in the neo4j.conf file as described here. For example:
kafka.zookeeper.connect=zookeeper-host:2181
kafka.bootstrap.servers=kafka-host:9092
Alternatively, if you are looking only for a sink connection from Kafka (i.e. moving records from Kafka topics into Neo4j), you can also use Kafka Connect with the supported Kafka Connect Neo4j Sink. More at https://www.confluent.io/hub/neo4j/kafka-connect-neo4j
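A rough sketch of that sink's properties, assuming the documented Neo4jSinkConnector class (the topic, credentials, and Cypher template are placeholders):
name=neo4j-sink
connector.class=streams.kafka.connect.sink.Neo4jSinkConnector
topics=my-topic
neo4j.server.uri=bolt://localhost:7687
neo4j.authentication.basic.username=neo4j
neo4j.authentication.basic.password=neo4j-password
# Cypher template applied to each record consumed from the topic
neo4j.topic.cypher.my-topic=MERGE (p:Person {name: event.name})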
I am using MongoDB Source Connector to get the data from a MongoDB collection into Kafka. What this connector does is that it automatically creates a topic using the following naming convention:
[prefix_provided_in_the_connector_properties].[db_name].[collection_name]
In the MongoDB Source Connector's documentation, there is no mention of overriding the topic configuration such as number of partitions or replication factor. I have the following questions:
Is it possible to override the topic configs in the connector.properties file?
If not, is it then done on Kafka's end? If so, can we configure each topic's settings individually, or will it globally affect all the topics?
Thank you!
Sounds like you have auto.create.topics.enable=true on your brokers. It is recommended to disable this and enforce manual topic creation.
Connect only creates internal topics for itself. Source connectors should ideally have their topics created ahead of time; otherwise, you get the defaults set in the broker's server.properties. Changing those values will not change existing topics.
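For example, creating the topic yourself before starting the connector lets you set partitions and replication factor per topic (the names and sizing here are placeholders):
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic myprefix.mydb.mycollection \
  --partitions 6 \
  --replication-factor 3
Settings applied this way are per-topic, so each of the connector's output topics can be sized individually.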
I am trying to use Kafka Connect to read changes in a Postgres DB.
I have Kafka running on my local system and I want to use the Kafka Connect API in standalone mode to read the Postgres server's DB changes.
connect-standalone.sh connect-standalone.properties debezium.properties
I would appreciate it if someone could help me with setting up the configuration properties for the CDC Postgres Debezium connector:
https://www.confluent.io/connector/debezium-postgresql-cdc-connector/
I am following the documentation below to construct the properties:
https://debezium.io/docs/connectors/postgresql/#how-the-postgresql-connector-works
The name of the Kafka topics takes by default the form
serverName.schemaName.tableName, where serverName is the logical name
of the connector as specified with the database.server.name
configuration property
And here is what I have come up with for debezium.properties:
name=cdc_demo
connector.class=io.debezium.connector.postgresql.PostgresConnector
tasks.max=1
plugin.name=wal2json
slot.name=debezium
slot.drop_on_stop=false
database.hostname=localhost
database.port=5432
database.user=postgress
database.password=postgress
database.dbname=test
# logical name that prefixes topic names (serverName.schemaName.tableName)
database.server.name=test
time.precision.mode=adaptive
database.sslmode=disable
Let's say I create a PG schema named demo and a table named suppliers.
So I need to create a topic with the name test.demo.suppliers so that this plugin can push the data to it?
Also, can someone suggest a Docker image which has the Postgres server with a suitable replication plugin such as wal2json? I am having a hard time configuring Postgres and the CDC plugin myself.
Check out the tutorial with associated Docker Compose and sample config.
The topic you've come up with sounds correct, but if you have your Kafka broker configured to auto-create topics (which is the default behaviour IIRC) then it will get created for you and you don't need to pre-create it.
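On the Docker question: the debezium/postgres images ship with the logical decoding plugins (including wal2json) preinstalled, so something like this should be enough to experiment with (the tag and credentials are placeholders):
docker run -d --name postgres -p 5432:5432 \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  debezium/postgres:11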
According to the Kafka documentation:
Connector configurations are simple key-value mappings. For standalone
mode these are defined in a properties file and passed to the Connect
process on the command line.
Most configurations are connector dependent, so they can't be outlined
here. However, there are a few common options:
name - Unique name for the connector. Attempting to register again with the same name will fail.
I have 10 connectors running in standalone mode like this:
bin/connect-standalone.sh config/connect-standalone.properties connector1.properties connector2.properties ...
My question is: can a connector load its own name at runtime?
Thanks in advance.
Yes, you can get the name of the connector at runtime.
When the connector starts, all properties are passed to Connector::start(Map<String, String> props). The connector can read those properties, validate them, save them, and later pass them to its tasks. Whether they are used depends on the connector implementation.
The connector name property is name.
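A minimal sketch of reading it in a custom source connector (the class name is illustrative; the stubbed methods are only there to make it compile):
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.source.SourceConnector;

public class NameAwareSourceConnector extends SourceConnector {
    private String connectorName;

    @Override
    public void start(Map<String, String> props) {
        // The framework passes the connector's full config here, including "name"
        connectorName = props.get("name");
    }

    @Override
    public Class<? extends Task> taskClass() {
        return null; // stub for this sketch
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        return List.of(); // stub; real connectors return one config map per task
    }

    @Override
    public void stop() {
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public String version() {
        return "0.0.1";
    }
}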