Kafka Connect Number type fields

When I use Kafka Connect with an Oracle RDBMS as the source, NUMBER type fields arrive as bytes, as shown below.
A column "ID" containing the number value 4 was sent, but at the consumer console the value appears as "ID":"BA==".
What can I do to solve this issue?
Kafka Connect is started with the command below:
connect-standalone ./etc/kafka/connect-standalone.properties /home/kafka/oracle.properties.test
######## connect-standalone.properties
# These are defaults. This file just demonstrates how to override some settings.
bootstrap.servers=kafkaserver01.localdomain:9092
# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
# it to
key.converter.schemas.enable=true
value.converter.schemas.enable=true
# The internal converter used for offsets and config data is configurable and must be specified, but most users will
# always want to use the built-in default. Offset and config data is never visible outside of Kafka Connect in this format.
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=10000
######## /home/kafka/oracle.properties.test Configuration File
name=oracle-connect-test1
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
topic.prefix=
connection.url=jdbc:oracle:thin:test/oracle#testsrv01:1521:testdb
table.whitelist=TEST1,TEST2
mode=timestamp
timestamp.column.name=CDC_TIMESTAMP
## Console Consumer
kafka-console-consumer --bootstrap-server kafkaserver01.localdomain:9092 --topic TEST1
Thanks.

I found the solution: add the configuration below to your source connector properties:
numeric.precision.mapping=true
It stops numeric values from being written to the topic in encoded form.
With the newer kafka-connect-jdbc-4.1.1 you can use the property
numeric.mapping=best_fit
for the best result.
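As a sketch, the full source connector properties with that setting applied would look like the following; everything except the last line is copied from the question's config, and the comment states the assumption that the base64 output came from Connect's Decimal logical type being serialized that way by JsonConverter.
name=oracle-connect-test1
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
topic.prefix=
connection.url=jdbc:oracle:thin:test/oracle#testsrv01:1521:testdb
table.whitelist=TEST1,TEST2
mode=timestamp
timestamp.column.name=CDC_TIMESTAMP
# Map NUMBER columns to the closest primitive type (int/long/double) instead of
# the Decimal logical type, which JsonConverter renders as base64-encoded bytes.
# Requires kafka-connect-jdbc 4.1.1 or newer.
numeric.mapping=best_fit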

Related

kafka connect: config file not found

I am new to Kafka and learning to use Kafka Connect to read from a text (.txt) file.
I created the source config and worker config files and saved them as .properties files under resources. I am getting a NoSuchFileException for both config files when I run the following from the terminal:
/bin/connect-standalone.sh my-standalone.properties my-file-source.properties
#my-file-source.properties config file
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/my-test.txt
topic=my-connect-test
# my-standalone.properties worker config file
#bootstrap kafka servers
bootstrap.servers=localhost:9092
# specify input data format
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
# The internal converter used for offsets, most will always want to use the built-in default
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
# local file storing offsets and config data
offset.storage.file.filename=/tmp/connect.offsets
Is adding config files to resources the right way? Where am I going wrong?
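A hedged sketch of one thing to check: connect-standalone.sh treats the .properties arguments as plain file paths resolved against the current working directory, not as classpath resources, so passing absolute paths rules out path-resolution problems. The paths below are hypothetical placeholders.
# hypothetical locations -- adjust to wherever the files were actually saved
bin/connect-standalone.sh /full/path/to/my-standalone.properties /full/path/to/my-file-source.properties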

Kafka RabbitMQ connector - can only get byte arrays

I've set up a connector to pull from a RabbitMQ queue and push into a Kafka topic. The connector runs and the queue empties out, but when I look at the topic with either kafka-console-consumer or kafkacat, every entry looks like a byte array: [B#xxxxxxxx.
The RabbitMQ message payloads are all JSON. What do I need to do to get JSON back out of Kafka? I've tried value.converter=org.apache.kafka.connect.storage.StringConverter as well as using ByteArrayDeserializer with the console consumer.
connect-standalone.properties:
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/home/robbie/kafka/plugins
RabbitMQSourceConnector.properties:
name=rabbitmq
tasks.max=1
connector.class=io.confluent.connect.rabbitmq.RabbitMQSourceConnector
rabbitmq.prefetch.count=500
rabbitmq.automatic.recovery.enabled=false
rabbitmq.network.recovery.interval.ms=10000
rabbitmq.topology.recovery.enabled=true
rabbitmq.queue=test1
rabbitmq.username=testuser1
rabbitmq.password=xxxxxxxxxxxxxxx
rabbitmq.host=rmqhost
rabbitmq.port=5672
kafka.topic=rabbitmq.test1
You need to set
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
I wrote up a blog on this just today :)
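A minimal sketch of the change in connect-standalone.properties, assuming the rest of the worker config from the question stays the same:
# The RabbitMQ payload is already JSON, so write the raw bytes through
# untouched rather than re-encoding them.
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
With the bytes passed through as-is, kafka-console-consumer should then display the original JSON payload.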

Snowflake Kafka connector config issue

I'm following the steps in this guide Snowflake Connector for Kafka
The error message I'm getting is
BadRequestException: Connector config {.....} contains no connector type
I am running the command as
sh kafka_2.12-2.3.0/bin/connect-standalone.sh connect-standalone.properties snowflake_kafka_config.json
my config files are
connect-standalone.properties
bootstrap.servers=localhost:9092
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/Users/kafka_test/kafka
The jar file snowflake-kafka-connector-0.5.1.jar is in the plugin.path.
snowflake_kafka_config.json
{
  "name": "Kafka_Test",
  "Config": {
    "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
    "tasks.max": "8",
    "topics": "test",
    "snowflake.topic2table.map": "",
    "buffer.count.records": "1",
    "buffer.flush.time": "60",
    "buffer.size.bytes": "65536",
    "snowflake.url.name": "<url>",
    "snowflake.user.name": "<user_name>",
    "snowflake.private.key": "<private_key>",
    "snowflake.private.key.passphrase": "<pass_phrase>",
    "snowflake.database.name": "<db>",
    "snowflake.schema.name": "<schema>",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "com.snowflake.kafka.connector.records.SnowflakeJsonConverter",
    "value.converter.schema.registry.url": "",
    "value.converter.basic.auth.credentials.source": "",
    "value.converter.basic.auth.user.info": ""
  }
}
Kafka is running locally; I have a producer and consumer up and can see the data flowing.
This is the same question I answered over on the Confluent community Slack, but I'll post it here for reference too :-)
The Connect worker log shows that the connector JAR itself is being loaded, so the `contains no connector type` error is because your config formatting is fubar.
You're running in standalone mode but passing in a JSON file, which won't work. My personal opinion is to always use distributed mode, even if it's just a single node. Check this out if you need a recap on standalone vs distributed: http://rmoff.dev/ksldn19-kafka-connect
If you must use standalone mode, then you need your connector config (snowflake_kafka_config.json) to be a properties file, like this:
param1=argument1
param2=argument2
You can see valid JSON examples (if you use distributed mode) here: https://github.com/confluentinc/demo-scene/blob/master/kafka-connect-zero-to-hero/demo_zero-to-hero-with-kafka-connect.adoc#stream-data-from-kafka-to-elasticsearch
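As a sketch under that assumption, snowflake_kafka_config.json rewritten as a properties file for standalone mode would look roughly like this (all values copied from the question's JSON, with the empty placeholder settings dropped and the name moved to a top-level property):
name=Kafka_Test
connector.class=com.snowflake.kafka.connector.SnowflakeSinkConnector
tasks.max=8
topics=test
buffer.count.records=1
buffer.flush.time=60
buffer.size.bytes=65536
snowflake.url.name=<url>
snowflake.user.name=<user_name>
snowflake.private.key=<private_key>
snowflake.private.key.passphrase=<pass_phrase>
snowflake.database.name=<db>
snowflake.schema.name=<schema>
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=com.snowflake.kafka.connector.records.SnowflakeJsonConverter
In distributed mode the original JSON shape is what you'd POST to the Connect REST API, with the connector settings nested under a lowercase "config" key.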

MongoDB Kafka Connector not generating the message key with the Mongo document id

I'm using the beta release of the MongoDB Kafka Connector to publish from MongoDB to a Kafka topic.
Messages are generated into Kafka but their key is null when it should be the document id:
This is my connect standalone config:
bootstrap.servers=xxx:9092
# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter you want to apply
# it to
key.converter.schemas.enable=false
value.converter.schemas.enable=false
# The internal converter used for offsets and config data is configurable and must be specified, but most users will
# always want to use the built-in default. Offset and config data is never visible outside of Kafka Connect in this format.
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
And the mongodb source properties:
name=mongo-source
connector.class=com.mongodb.kafka.connect.MongoSourceConnector
tasks.max=1
# Connection and source configuration
connection.uri=mongodb+srv://xxx
database=mydb
collection=mycollection
topic.prefix=someprefix
poll.max.batch.size=1000
poll.await.time.ms=5000
# Change stream options
pipeline=[]
batch.size=0
change.stream.full.document=updateLookup
collation=
Below there's an example of a message String value:
"{\"_id\": {\"_data\": \"xxx\"}, \"operationType\": \"replace\", \"clusterTime\": {\"$timestamp\": {\"t\": 1564140389, \"i\": 1}}, \"fullDocument\": {\"_id\": \"5\", \"name\": \"Some Client\", \"clientId\": \"someclient\", \"clientSecret\": \"1234\", \"whiteListedIps\": [], \"enabled\": true, \"_class\": \"myproject.Client\"}, \"ns\": {\"db\": \"mydb\", \"coll\": \"mycollection\"}, \"documentKey\": {\"_id\": \"5\"}}"
I tried using a transform to extract it from the value, specifically from the documentKey field:
transforms=InsertKey
transforms.InsertKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.InsertKey.fields=documentKey
But got an exception:
Caused by: org.apache.kafka.connect.errors.DataException: Only Struct objects supported for [copying fields from value to key], found: java.lang.String
at org.apache.kafka.connect.transforms.util.Requirements.requireStruct(Requirements.java:52)
at org.apache.kafka.connect.transforms.ValueToKey.applyWithSchema(ValueToKey.java:79)
at org.apache.kafka.connect.transforms.ValueToKey.apply(ValueToKey.java:65)
Any ideas to generate a key with the document id?
According to the exception that is thrown:
Caused by: org.apache.kafka.connect.errors.DataException: Only Struct objects supported for [copying fields from value to key], found: java.lang.String
at org.apache.kafka.connect.transforms.util.Requirements.requireStruct(Requirements.java:52)
at org.apache.kafka.connect.transforms.ValueToKey.applyWithSchema(ValueToKey.java:79)
at org.apache.kafka.connect.transforms.ValueToKey.apply(ValueToKey.java:65)
Unfortunately the MongoDB connector you are using doesn't create a proper schema.
The connector creates records whose key and value schemas are plain String.
Check this line: how the record is created by the connector. That is the reason why you can't apply the transformation to it.
This should be supported in release 1.3.0:
https://jira.mongodb.org/browse/KAFKA-40
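For reference, once the connector produces records with a proper Struct schema, as tracked in KAFKA-40, a transform chain like the sketch below could promote documentKey to the key and then unwrap its _id. Both transforms are standard Kafka Connect SMTs; the ExtractKeyId alias is just an illustrative name.
transforms=InsertKey,ExtractKeyId
transforms.InsertKey.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.InsertKey.fields=documentKey
# Replace the whole-struct key with just its _id field
transforms.ExtractKeyId.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.ExtractKeyId.field=_id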

kafka connect - How to filter schema metadata from payload

I'm trying to remove the schema from the payload. Here are the configurations:
connector.properties
name=test-source-mysql-jdbc-autoincrement
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
value.converter=org.apache.kafka.connect.json.JsonConverter
tasks.max=1
connection.url=jdbc:mysql://127.0.0.1:3306/employee_db?user=root&password=root
table.whitelist=testemp
mode=incrementing
incrementing.column.name=employee_id
topic.prefix=test-mysql-jdbc-
and below are my worker.properties
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=C:\Users\name\Desktop\kafka\libs
output:
{"schema":{"type":"struct","fields":[{"type":"int32","optional":false,"field":"employee_id"},{"type":"string","optional":false,"field":"first_name"}],"optional":false,"name":"testemp"},"payload":{"employee_id":2,"first_name":"test"}}
expected output:
{"payload":{"employee_id":2,"first_name":"test"}}
I tried disabling the schema with value.converter.schemas.enable=false in the worker config as suggested here, but it had no effect.
Am I missing something?
There are two options to fix it:
Remove the value.converter property from your connector configuration (you use the same value.converter as the worker anyway).
Set value.converter.schemas.enable=false in your connector configuration.
The schema is added to the message because you have overridden the value converter but didn't disable the schema (for JsonConverter the schema is enabled by default). From Kafka Connect's point of view you used a completely new converter, so it does not inherit properties from the global configuration.
If you disable the schema, your message will be as follows:
{
  "employee_id": 2,
  "first_name": "test"
}
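A minimal sketch of option 2 applied to the connector.properties from the question; only the converter lines change, everything else stays as posted:
name=test-source-mysql-jdbc-autoincrement
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:mysql://127.0.0.1:3306/employee_db?user=root&password=root
table.whitelist=testemp
mode=incrementing
incrementing.column.name=employee_id
topic.prefix=test-mysql-jdbc-
# Keep the converter override, but turn off the schema envelope for this connector
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false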