Manipulate kafka stream from a topic - apache-kafka

I have a Postgres table and connected it to Kafka using the Debezium connector. Now I want to edit the message (adding one custom column) in the Kafka topic (the Postgres table data) and convert it into a stream to create a ksqlDB table.
I do not want to write code; I need to achieve this within ksqlDB. Any blogs or ideas would be appreciated.
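For reference, adding a constant column without leaving ksqlDB usually takes the shape of declaring a stream over the source topic and then deriving a new stream from it. The sketch below is only illustrative; the topic, stream, and column names are assumptions, not taken from an actual setup:

CREATE STREAM table1_raw (col1 INT, col2 VARCHAR, col3 VARCHAR, col4 VARCHAR)
  WITH (KAFKA_TOPIC='etlsource.prc1.dbo.table1', VALUE_FORMAT='JSON');

CREATE STREAM table1_enriched AS
  SELECT col1, col2, col3, col4, 1 AS practice_id
  FROM table1_raw;

The derived stream (and its backing topic) then carries the extra column, and a table can be declared on top of it if needed.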

In the end I added the custom column using the transform; I also needed this custom column to be present in the key of the Kafka topic, and that was achieved by the same transform. My source connector config is below, in case it is helpful for someone:
{
  "name": "ksqldb-connector-kafkaetl",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "ipaddress",
    "database.port": "5432",
    "database.user": "user",
    "database.password": "pwd",
    "database.dbname": "practice_1_kafkaetl",
    "database.server.name": "postgres",
    "topic.prefix": "etlsource.prc1",
    "table.include.list": "dbo.table1",
    "column.include.list": "dbo.table1.col1,dbo.table1.col2,dbo.table1.col3,dbo.table1.col4",
    "slot.name": "slot_batch_work_items",
    "transforms": "unwrap,InsertSource",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.unwrap.drop.tombstones": "false",
    "transforms.InsertSource.type": "org.apache.kafka.connect.transforms.InsertField$Key",
    "transforms.InsertSource.static.field": "pactice_id",
    "transforms.InsertSource.static.value": "1"
  }
}
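With the key now carrying the static field, the topic can be declared directly in ksqlDB. A minimal sketch, assuming JSON key/value converters without embedded schemas and a ksqlDB version that supports multiple key columns (column types are guesses, and the key column name matches the SMT's static.field as written above):

CREATE TABLE table1_tbl (
  col1 INT PRIMARY KEY,
  pactice_id VARCHAR PRIMARY KEY,
  col2 VARCHAR,
  col3 VARCHAR,
  col4 VARCHAR
) WITH (
  KAFKA_TOPIC='etlsource.prc1.dbo.table1',
  KEY_FORMAT='JSON',
  VALUE_FORMAT='JSON'
);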

Related

1 topic maps to two different db table Kafka Sink Connector

I am currently having trouble mapping my Kafka topic st_record to two separate database tables: 1) gt_school.strecord_1week and 2) gt_school.strecord_1semester. My Kafka sink configuration is:
"tasks.max": "1",
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"connection.url": " "'"$URL"'",
"topics":"st_record",
"table.name.format": "gt_school.strecord_1week, gt_school.strecord_1semester",
"table.whitelist": "gt_school.strecord_1week, gt_school.strecord_1semester",
"transforms":"route",
"transforms.route.type":"org.apache.kafka.connect.transforms.RegexRouter,
"transforms.route.regex":"st_record",
"transforms.route.replacement":"gt_school.strecord_1week, gt_school.strecord_1semester"
I tried table.name.format, table.whitelist, and the route transform, but every time I received the following error saying that both tables cannot be found:
io.confluent.connect.jdbc.sink.TableAlterOrCreateException: Table "gt_school"."strecord_1week, gt_school"."strecord_1semester" is missing and auto-creation is disabled"
Which is true for that mangled name; the names should instead resolve separately as "gt_school.strecord_1week" and "gt_school.strecord_1semester".
Does anyone know which field should map the two tables from the one topic name? Am I supposed to use table.name.format? I know that by default the topic and table name are supposed to be the same, but I route it and still get errors.
The error only says one table isn't found, not two; the comma ends up inside the quoted name... The JDBC sink only writes to one table per topic. Plus, table names cannot contain commas, as far as I know.
RegexRouter doesn't split your topic into two. It only renames the topic to a static string.
If you want to write to two distinct tables, create two separate connectors: one with

"topics":"st_record",
...
"transforms":"route",
"transforms.route.type":"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex":".*",
"transforms.route.replacement":"$0_1week"

and the other with

"topics":"st_record",
...
"transforms":"route",
"transforms.route.type":"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex":".*",
"transforms.route.replacement":"$0_1semester"
However, this will obviously duplicate data in the database, so I'd recommend creating one table with the data from the topic, and then two VIEWs on top of it to do the different week/semester queries.
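For example, with MySQL-style syntax (the base table name and the record_date column below are assumptions for illustration; adjust for your database):

CREATE VIEW gt_school.strecord_1week AS
  SELECT * FROM gt_school.strecord
  WHERE record_date >= CURRENT_DATE - INTERVAL 7 DAY;

CREATE VIEW gt_school.strecord_1semester AS
  SELECT * FROM gt_school.strecord
  WHERE record_date >= CURRENT_DATE - INTERVAL 6 MONTH;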

Debezium MongoDb - adding non metadata headers with ExtractNewDocumentState SMT not working

I'm having trouble adding a header from a document field (not metadata), which I'm able to do using the ExtractNewRecordState SMT on Postgres, but not using ExtractNewDocumentState on MongoDB.
This is the configuration that allows me to copy a non-metadata field to a header using ExtractNewRecordState (by reaching into the "after" object):
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.add.headers": "after.Id:Id",
"transforms.unwrap.add.headers.prefix": "",
If I try the same configuration using the ExtractNewDocumentState SMT for MongoDB, it doesn't work:
"transforms": "unwrap",
"transforms.unwrap.type":"io.debezium.connector.mongodb.transforms.ExtractNewDocumentState",
"transforms.unwrap.add.headers": "after.Id:Id",
"transforms.unwrap.add.headers.prefix": "",
I get the following error:
java.lang.IllegalArgumentException: Unexpected field name: after.Id
I suspect it has to do with the "after" object being a string, instead of an object (as in Postgres), so at this point the SMT is not able to reach into the object to get the fields.
From what I've seen there is no SMT available for copying a field into a header, so how can I overcome this issue?
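One possible workaround (an untested sketch, and it assumes Kafka Connect 3.0+ where the built-in HeaderFrom transform is available) is to let ExtractNewDocumentState unwrap the document first, so the field is top level, and then copy it into a header with a second transform:

"transforms": "unwrap,CopyIdHeader",
"transforms.unwrap.type": "io.debezium.connector.mongodb.transforms.ExtractNewDocumentState",
"transforms.CopyIdHeader.type": "org.apache.kafka.connect.transforms.HeaderFrom$Value",
"transforms.CopyIdHeader.fields": "Id",
"transforms.CopyIdHeader.headers": "Id",
"transforms.CopyIdHeader.operation": "copy"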

Kafka Connect - Transforms rename field only if it exists

I have an S3 sink connector for multiple topics (topic_a, topic_b, topic_c); topic_a has the field created_date while topic_b and topic_c have creation_date. I have used the transforms.RenameField.renames setting below to rename the field (created_date:creation_date), but since only topic_a has created_date and the others don't, the connector is failing.
I want to move all the messages (from all topics, with a single connector) into S3 with creation_date (renaming created_date to creation_date where it exists), but I am not able to figure out the regex or transform that renames the field only for the specific topic.
"config":{
"connector.class":"io.confluent.connect.s3.S3SinkConnector",
"errors.log.include.messages":"true",
"s3.region":"eu-west-1",
"topics.dir":"dir",
"flush.size":"5",
"tasks.max":"2",
"s3.part.size":"5242880",
"timezone":"UTC",
"locale":"en",
"format.class":"io.confluent.connect.s3.format.json.JsonFormat",
"errors.log.enable":"true",
"s3.bucket.name":"bucket",
"topics": "topic_a, topic_b, topic_c",
"s3.compression.type":"gzip",
"partitioner.class":"io.confluent.connect.storage.partitioner.DailyPartitioner",
"name":"NAME",
"storage.class":"io.confluent.connect.s3.storage.S3Storage",
"key.converter.schemas.enable":"true",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"value.converter.schemas.enable":"true",
"value.converter":"io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url":"https://schemaregistry.com",
"enhanced.avro.schema.support": "true",
"transforms": "RenameField",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "created_date:creation_date"
}
only topic_a has created_date and the others don't
Then you would use separate connectors: one with the transform and all the topics that have the field, and another without the transform.
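A sketch of that split, reusing the config from the question (only the keys that differ are shown; everything else stays the same in both connectors):

Connector for the topic that has created_date:
"topics": "topic_a",
"transforms": "RenameField",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.RenameField.renames": "created_date:creation_date"

Connector for the topics that already use creation_date (no transform needed):
"topics": "topic_b, topic_c"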
from all topics with a single connector
This doesn't scale very well. You're limiting yourself to a fixed number of consumer threads and a single consumer group reading from many topics at once. Multiple connectors would be better to distribute the load.

Kafka: creating stream from topic with values in separate columns

I just connected my Kafka to Postgres with a Postgres source connector. Now when I print the topic I get the following output:
rowtime: 4/1/20 4:16:12 PM UTC, key: <null>, value: {"userid": 4, "id": 5, "title": "lorem", "body": "dolor sit amet, consectetur"}
rowtime: 4/1/20 4:16:12 PM UTC, key: <null>, value: {"userid": 5, "id": 6, "title": "ipsum", "body": "cupidatat non proident"}
How do I make a stream from this topic so that the values are separated into their own columns, as they were in the database table originally?
Bonus question: is there any way to tell the JDBC connector to separate the columns in the topic when creating the source connector?
My connector looks like this:
curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d '{
"name": "jdbc_source_postgres_02",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:postgresql://postgres:5432/kafka",
"connection.user": "bob",
"connection.password": "builder",
"topic.prefix": "post_",
"mode":"bulk",
"table.whitelist" : "kafka_t.users_t",
"poll.interval.ms" : 500
}
}'
How do I make a stream from this topic so that the values are separated into their own columns, as they were in the database table originally?
Not 100% sure what you mean by this. If you use Kafka Streams, you can for example create a KStream<KeyType, Columns> with a custom Columns type (or just use JSON as the value type) to get a "column view" on your data.
Similarly, you could use ksqlDB with a CREATE STREAM command -- it can automatically parse the JSON value into corresponding columns.
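For example, something along these lines in ksqlDB, assuming the topic is named post_users_t (topic.prefix plus the table name) and guessing the column types from the sample values:

CREATE STREAM users_stream (
  userid INT,
  id INT,
  title VARCHAR,
  body VARCHAR
) WITH (
  KAFKA_TOPIC='post_users_t',
  VALUE_FORMAT='JSON'
);

A subsequent SELECT * FROM users_stream EMIT CHANGES; then shows userid, id, title, and body as separate columns.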
Bonus question: is there any way to tell the JDBC connector to separate the columns in the topic when creating the source connector?
What do you mean by that? Kafka topics have a key-value data model, so if you store any data in a topic, it must go either into the key or into the value. If you have a more structured type, like a DB tuple, there is no native support for it in the Kafka brokers; you need to fit it into the key-value model.

`delete.enabled=true` not deleting the record in MySQL through JDBC sink connector

My sink config file contains the following configuration:
...
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "DELETESRC",
"insert.mode": "upsert",
"batch.size": "50000",
"table.name.format": "DELETESRC",
"pk.mode": "record_key",
"pk.fields": "ID,C_NO",
"delete.enabled": "true",
"auto.create": "true",
"auto.evolve": "true",
"max.retries": "10",
"retry.backoff.ms": "3000",
"mode": "bulk",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schemas.enable": "true",
"value.converter.schema.registry.url": "http://localhost:8081",
"transforms": "ValueToKey",
"transforms.ValueToKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.ValueToKey.fields": "ID,C_NO"
...
I am able to upsert using the keys, but I am not able to get deletes working in the JDBC sink.
I configured the topic DELETESRC with cleanup.policy=compact and delete.retention.ms=0.
I created a KSQL stream with 4 columns (ID, CMP, SEG, C_NO) and I am using INSERT INTO statements in KSQL to push the data:
INSERT INTO DELETESRC VALUES ('null','11','D','1','3')
INSERT INTO DELETESRC VALUES ('null','11','C','1','4')
INSERT INTO DELETESRC VALUES ('null','12','F','1','3')
But when I am doing INSERT INTO DELETESRC VALUES ('null','11','null','null','3'), the sink is updating the table as 11,null,null,3.
I have looked at other answers on Stack Overflow, but those solutions did not work.
Am I making a mistake in creating the tombstone record?
I tried other variations of the INSERT statement in KSQL, but the delete never happens.
In order to generate a proper tombstone message you need to produce a keyed message with a null value. In your example you never produce a null value; inserting the string 'null' into the columns is not the same as a record whose value is actually null.
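For testing, a true tombstone can be produced outside ksqlDB, for example with kcat, where -Z sends an empty value as NULL. The key shown here ('1,3') is only a placeholder; it must match, byte for byte, the key of the row you want deleted:

echo '1,3:' | kcat -b localhost:9092 -P -t DELETESRC -K: -Z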
In order to not lose events due to high consumer lag, I guess you need to increase the delete.retention.ms:
The amount of time to retain delete tombstone markers for log compacted topics. This setting also gives a bound on the time in which a consumer must complete a read if they begin from offset 0 to ensure that they get a valid snapshot of the final stage (otherwise delete tombstones may be collected before they complete their scan).
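For reference, that topic-level setting can be changed with the kafka-configs tool; the value below (24 hours) is just an arbitrary example:

kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name DELETESRC \
  --add-config delete.retention.ms=86400000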