Kafka FileStreamSinkConnector recreated the file it writes to after I deleted it - apache-kafka

OK, this is an unusual one.
I created a FileStream sink connector as follows:
curl -X POST http://cpnode.local.lan:8083/connectors -H "Content-Type: application/json" --data '{
"name":"file-sink-connector-002",
"config": {
"tasks.max":"1",
"batch.size":"1000",
"batch.max.rows":"1000",
"poll.interval.ms":"500",
"connector.class":"org.apache.kafka.connect.file.FileStreamSinkConnector",
"file":"/kafka/names.txt",
"table.name.format":"tb_sensordata",
"topics":"names",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable":"false",
"value.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable":"false"
}
}'
While the connector was running, I deleted the file names.txt.
After an hour or so, it recreated the file.
I then started a console producer and produced some data to the topic, and the sink connector wrote that data to the file.
Can anyone explain this behavior, please?

According to this pull request (MINOR: Append or create file in FileStreamSinkTask #5406), the file configured for a FileStreamSinkConnector will be created by Kafka Connect if it does not exist.
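A quick way to reproduce this, assuming the connector from the question is still running. The Connect REST host is taken from the question; the broker address cpnode.local.lan:9092 and the console producer CLI are assumptions, and the task restart is just a convenient way to force the task to reopen (and therefore re-create) its output file:
# remove the sink file while the connector is running
rm /kafka/names.txt
# restart the task so it reopens its output file (opened with CREATE + APPEND)
curl -X POST http://cpnode.local.lan:8083/connectors/file-sink-connector-002/tasks/0/restart
# produce a record and confirm the file is back
echo 'alice' | kafka-console-producer --bootstrap-server cpnode.local.lan:9092 --topic names
cat /kafka/names.txt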

Related

Sink data from DLQ topic to another table as CLOB

I'm using a connector to sink records from a topic to a DB. There are also some values redirected to a DLQ (Dead Letter Queue). The records in the DLQ may contain wrong types, sizes, non-Avro values, etc. What I want to do is sink all of these records to an Oracle DB table. This table will only have 2 columns: a CLOB for the entire message and a record date.
To sink from Kafka, we need a schema. Since this topic will contain many types of records, I can't create a proper schema (or can I?). I just want to sink the messages as a whole; how can I achieve this?
I've tried it with this schema and connector:
curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{\"type\":\"record\",\"name\":\"DLQ_TEST\",\"namespace\":\"DLQ_TEST\",\"fields\":[
{\"name\":\"VALUE\",\"type\":[\"null\",\"string\"]},
{\"name\":\"RECORDDATE\",\"type\":[\"null\",\"long\"]}]}"}' http://server:8071/subjects/DLQ_INSERT-value/versions
curl -i -X PUT -H "Content-Type:application/json" http://server:8083/connectors/sink_DLQ_INSERT/config -d
'{
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"connection.url": "jdbc:oracle:thin:#oracleserver:1357/VITORCT",
"table.name.format": "GLOBAL.DLQ_TEST_DLQ",
"connection.password": "${file:/kafka/vty/pass.properties:vitweb_pwd}",
"connection.user": "${file:/kafka/vty/pass.properties:vitweb_user}",
"tasks.max": "1",
"topics": "DLQ_TEST_dlq",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "true",
"auto.create": "false",
"insert.mode": "insert",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.storage.StringConverter"
}'
I don't exactly understand how to make the connector use this schema.

Kafka: Source-Connector to Topic mapping is Flakey

I have the following Kafka connector configuration (below). I have created the "member" topic already (30 partitions). The problem is that I will install the connector and it will work; i.e.
curl -d "#mobiledb-member.json" -H "Content-Type: application/json" -X PUT https://TTGSSQA0VRHAP81.ttgtpmg.net:8085/connectors/mobiledb-member-connector/config
curl -s https://TTGSSQA0VRHAP81.ttgtpmg.net:8085/connectors/member-connector/topics
returns:
{"member-connector":{"topics":[member]}}
the status call returns no errors:
curl -s https://TTGSSQA0VRHAP81.ttgtpmg.net:8085/connectors/mobiledb-member-connector/status
{"name":"member-connector","connector":{"state":"RUNNING","worker_id":"ttgssqa0vrhap81.***.net:8085"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"ttgssqa0vrhap81.***.net:8085"}],"type":"source"}
... but at other times, I will install a similar connector config and it will return no topics.
{"member-connector":{"topics":[]}}
Yet the status shows no errors and the Connector logs show no clues as to why this "connector to topic" mapping isn't working. Why aren't the logs helping out?
Connector configuration.
{
"connector.class":"io.confluent.connect.jdbc.JdbcSourceConnector",
"transforms.createKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"connection.url":"jdbc:sqlserver:****;",
"connection.user":"***",
"connection.password":"***",
"transforms":"createKey",
"table.poll.interval.ms":"120000",
"key.converter.schemas.enable":"false",
"value.converter.schemas.enable":"false",
"poll.interval.ms":"5000",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"name":"member-connector",
"tasks.max":"4",
"query":"SELECT * FROM member_kafka_test",
"table.types":"TABLE",
"topic.prefix":"member",
"mode":"timestamp+incrementing",
"transforms.createKey.fields":"member_id",
"incrementing.column.name": "member_id",
"timestamp.column.name" : "update_datetime"
}

Kafka FileStreamSourceConnector: write an Avro file to a topic with a key field

I want to use the Kafka FileStreamSourceConnector to write a local Avro file into a topic.
My connector config looks like this:
curl -i -X PUT -H "Content-Type:application/json" http://localhost:8083/connectors/file_source_connector/config \
-d '{
"connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"topic": "my_topic",
"file": "/data/log.avsc",
"format.include.keys": "true",
"source.auto.offset.reset": "earliest",
"tasks.max": "1",
"value.converter.schemas.enable": "true",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter"
}'
Then when I print out the topic, the key fields are null.
Updated on 2021-03-29:
After watching the video 🎄Twelve Days of SMT 🎄 - Day 2: ValueToKey and ExtractField by Robin, I applied SMTs to my connector config:
curl -i -X PUT -H "Content-Type:application/json" http://localhost:8083/connectors/file_source_connector_02/config \
-d '{
"connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"topic": "my_topic",
"file": "/data/log.avsc",
"tasks.max": "1",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"transforms": "ValueToKey, ExtractField",
"transforms.ValueToKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.ValueToKey.fields":"id",
"transforms.ExtractField.type":"org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.ExtractField.field":"id"
}'
However, the connector failed:
Caused by: org.apache.kafka.connect.errors.DataException: Only Struct objects supported for [copying fields from value to key], found: java.lang.String
I would use the ValueToKey transformer.
In the worst case, you could ignore the values and set a random key.
For details, look at: ValueToKey
FileStreamSource assumes UTF-8-encoded, line-delimited files as input, not binary files such as Avro. Last I checked, format.include.keys is not a valid config for the connector either.
Therefore each consumed event will be a string, and consequently, transforms that require Structs with field names will not work.
You can use the Hoist transform to create a Struct from each "line", but this still will not parse your data to make the ID field accessible to move to the key.
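For illustration, a minimal fragment of what that Hoist transform would look like in the connector config; the field name line is arbitrary, and, as noted above, the resulting Struct only contains the raw line, not an id field:
"transforms": "HoistLine",
"transforms.HoistLine.type": "org.apache.kafka.connect.transforms.HoistField$Value",
"transforms.HoistLine.field": "line"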
Also, your file is AVSC, which is JSON-formatted, not Avro, so I'm not sure what the goal is of using the AvroConverter or of setting "schemas.enable": "true". Still, the lines read by the connector are not parsed by converters in a way that makes fields accessible; they are only serialized when sent to Kafka.
My suggestion would be to write some other CLI script using plain producer libraries to parse the file, extract the schema, register it with the Schema Registry, build a producer record for each entity in the file, and send them.
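If you'd rather not write that script yourself, a rough CLI approximation is the Confluent kafka-avro-console-producer (assuming a recent Confluent Platform install); it registers the value schema with the Schema Registry and produces one Avro record per JSON line read from stdin. The Schema Registry URL is the one from the question's config; the schema, topic, and /data/log.jsonl data file are illustrative:
kafka-avro-console-producer \
  --bootstrap-server localhost:9092 \
  --topic my_topic \
  --property schema.registry.url=http://schema-registry:8081 \
  --property value.schema='{"type":"record","name":"LogEntry","fields":[{"name":"id","type":"string"},{"name":"message","type":"string"}]}' \
  < /data/log.jsonl
# each input line must be a JSON object matching the schema, e.g.
# {"id":"1","message":"hello"}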

Kafka Connect: MongoDB Kafka connector using JsonConverter not working

I'm trying to configure the Kafka connector to use MongoDB as the source and send the records into Kafka topics.
I've successfully done so, but now I'm trying to do it with the JsonConverter in order to also save the schema along with the payload.
My problem is that the connector is saving the data as follows:
{ "schema": { "type": "string" } , "payload": "{....}" }
In other words, it's automatically assuming the actual JSON is a string and it's saving the schema as String.
This is how I'm setting up the connector:
curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d '{
"name": "newtopic",
"config": {
"tasks.max":1,
"connector.class":"com.mongodb.kafka.connect.MongoSourceConnector",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enabled": "true",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enabled": "true",
"connection.uri":"[MONGOURL]",
"database":"dbname",
"collection":"collname",
"pipeline":"[]",
"topic.prefix": "",
"publish.full.document.only": "true"
}}'
Am I missing something for the configuration? Is it simply not able to guess the schema of the document stored in MongoDB, so it goes with String?
There's nothing to guess. Mongo is schemaless, last I checked, so the schema is a string or bytes. I would suggest using the AvroConverter or setting schemas.enable to false.
You may also want to try using Debezium to see if you get different results.
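A minimal sketch of those two options, keeping everything else in the question's config unchanged; note the property name is schemas.enable (the question used schemas.enabled), and the Schema Registry URL in the second variant is a placeholder:
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false"
or, with Avro and a Schema Registry:
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://localhost:8081"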
The latest version 1.3 of the MongoDB connector should solve this issue.
It even provides an option for inferring the source schema.
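For reference, a sketch of what that looks like in the source connector config, using the output.format.value and output.schema.infer.value options documented for version 1.3+ of the MongoDB connector (check the docs for your exact version):
"connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
"output.format.value": "schema",
"output.schema.infer.value": "true",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true"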

How to override key.serializer in Kafka Connect JDBC

I am connecting MySQL to Kafka using the Kafka JDBC source connector. Everything is working fine. Now I need to pass key.serializer and value.serializer to encrypt data as shown at Macronova, but I don't see any change in the output.
POST request to start the source connector:
curl -X POST -H "Content-Type: application/json" --data '{
"name": "jdbc-source-connector-2",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
"value.serializer": "org.apache.kafka.common.serialization.StringSerializer",
"tasks.max": 10,
"connection.url": "jdbc:mysql://localhost:3306/connect_test?user=roo&password=roo",
"mode": "incrementing",
"table.whitelist" : "test",
"incrementing.column.name": "id",
"timestamp.column.name": "modified",
"topic.prefix": "table-",
"poll.interval.ms": 1000
}
}' http://localhost:8083/connectors
Connectors take converters only; serializers cannot be passed via the key and value properties (use key.converter and value.converter instead).
If you want to encrypt a whole string, you'd need to implement your own converter, or change the code that writes into the database to write into Kafka instead, then consume from Kafka and write to the database as well as to the other downstream systems.
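For comparison, a sketch of the same request with converter properties in place of the serializer properties. org.apache.kafka.connect.storage.StringConverter ships with Kafka; com.example.EncryptingConverter is a hypothetical class you would have to implement (the org.apache.kafka.connect.storage.Converter interface) and place on the Connect plugin path:
# com.example.EncryptingConverter below is hypothetical -- substitute your own Converter implementation
curl -X POST -H "Content-Type: application/json" --data '{
"name": "jdbc-source-connector-2",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:mysql://localhost:3306/connect_test?user=roo&password=roo",
"mode": "incrementing",
"table.whitelist": "test",
"incrementing.column.name": "id",
"timestamp.column.name": "modified",
"topic.prefix": "table-",
"poll.interval.ms": 1000,
"tasks.max": 10,
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "com.example.EncryptingConverter"
}
}' http://localhost:8083/connectors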