HTTP Sink Connector not Batching the messages - apache-kafka

I am using the HTTP Sink connector config below, and it is still sending records one by one. It is supposed to send data in batches of 50 messages.
{
"name": "HTTPSinkConnector_1",
"config": {
"topics": "topic_1",
"tasks.max": "1",
"connector.class": "io.confluent.connect.http.HttpSinkConnector",
"http.api.url": "http://localhost/messageHandler",
"request.method": "POST",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"confluent.topic.bootstrap.servers": "kafka:19092",
"confluent.topic.replication.factor": "1",
"batching.enabled": true,
"batch.max.size": 50,
"reporter.bootstrap.servers": "kafka:19092",
"reporter.result.topic.name": "success-responses",
"reporter.result.topic.replication.factor": "1",
"reporter.error.topic.name": "error-responses",
"reporter.error.topic.replication.factor": "1",
"request.body.format": "json"
}
}
Could someone please suggest if any other property is missing?

The HTTP Sink connector does not batch requests for messages containing Kafka header values that are different.
https://docs.confluent.io/kafka-connectors/http/current/overview.html#features
The workaround would be to either:
remove the headers (see the sketch below for one way to do this)
use Kafka Streams to manually window the data on your own into a new topic, which the connector then reads
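For the first option, here is a rough sketch (mine, not from the Confluent docs) of a plain consumer/producer copy job that republishes the records without their headers into a new topic the sink can then batch. The bootstrap server and source topic are taken from the config above; the target topic name and group id are assumptions.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class StripHeaders {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "kafka:19092");
        consumerProps.put("group.id", "strip-headers");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "kafka:19092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("topic_1"));
            while (true) {
                for (ConsumerRecord<byte[], byte[]> record : consumer.poll(Duration.ofMillis(500))) {
                    // build the ProducerRecord without the headers argument so no headers are forwarded
                    producer.send(new ProducerRecord<>("topic_1_no_headers", record.key(), record.value()));
                }
                producer.flush();
                consumer.commitSync();
            }
        }
    }
}
If your Kafka version includes it, the built-in DropHeaders SMT (org.apache.kafka.connect.transforms.DropHeaders) is a lighter way to achieve the same thing directly in the connector config.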

Related

How to replace a field value using Kafka Transform ReplaceField in Sink Connector

I want to rename the value of the key named "id". I have a use case where a producer publishes messages to a topic (product-topic) containing either "id": "test.product.mobile" or "id": "test.product.computer".
My HTTP sink connector consumes the messages from this topic, and I want to transform them (rename the field's value).
For example:
if the producer sends "id": "test.product.mobile", I want to replace it with "id": "test.product.iPhone"
if the producer sends "id": "test.product.computer", I want to replace it with "id": "test.product.mac"
I'm using the HTTP sink connector and the transforms package to replace the field value, but it's not working as expected. Please find the connector configuration below:
{
"connector.class": "io.confluent.connect.http.HttpSinkConnector",
"confluent.topic.bootstrap.servers": "localhost:9092",
"topics": "product-topic",
"tasks.max": "1",
"http.api.url": "http://localhost:8080/product/create",
"reporter.bootstrap.servers": "localhost:9092",
"reporter.error.topic.name": "error-responses",
"reporter.result.topic.name": "success-responses",
"reporter.error.topic.replication.factor": "1",
"confluent.topic.replication.factor": "1",
"errors.tolerance": "all",
"value.converter.schemas.enable": "false",
"batch.json.as.array": "true",
"name": "Product-Connector",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"reporter.result.topic.replication.factor": "1",
"transforms": "RenameField",
"transforms.RenameField.type": "org.apache.kafka.connect.transforms.ReplaceField$Key",
"transforms.RenameField.renames": "id:test.product.iPhone"
}
The producer sends messages like below:
{
"id": "test.product.mobile",
"price": "1232"
}
{
"id": "test.product.computer",
"price": "2032"
}
Expected Output:
{
"id": "test.product.iPhone",
"price": "1232"
}
{
"id": "test.product.mac",
"price": "2032"
}
I referred to the Confluent Kafka docs to rename a field, but that example works well if we want to replace the field name, not its value. Can someone please help me with this use case - what needs to change to rename the field value?
I'd appreciate your help in advance. Thanks!
No, it's not possible to replace field value text with any of the included SMTs, outside of masking.
You could write (or find) your own SMT, but otherwise the recommended pattern for this is a KStreams/ksqlDB process.
Or simply have your initial Kafka producer send the values that you want to sink to the HTTP server.
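To make the KStreams route concrete, here is a minimal sketch: it reads product-topic, rewrites the known id values, and writes to a new topic that the HTTP sink would consume instead. The source topic and bootstrap server come from the question; the application id, output topic name, and the naive string replacement are assumptions, and a real implementation would parse the JSON and rewrite only the "id" field.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Map;
import java.util.Properties;

public class RenameIdValues {
    private static final Map<String, String> RENAMES = Map.of(
            "test.product.mobile", "test.product.iPhone",
            "test.product.computer", "test.product.mac");

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("product-topic", Consumed.with(Serdes.String(), Serdes.String()))
               // naive textual replacement on the JSON value; parse the JSON instead
               // if the replaced strings could appear in other fields
               .mapValues(value -> {
                   String result = value;
                   for (Map.Entry<String, String> rename : RENAMES.entrySet()) {
                       result = result.replace(rename.getKey(), rename.getValue());
                   }
                   return result;
               })
               .to("product-topic-renamed", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "rename-id-values");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}
The HTTP sink's topics property would then point at product-topic-renamed rather than product-topic.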

Produce Avro messages in Confluent Control Center UI

To develop a data transfer application, I first need to define the key/value Avro schemas. The producer application can't be developed until the Avro schemas are defined.
I cloned a topic and its key/value Avro schemas that are already working, and also cloned the JDBC sink connector. I simply changed the topic and connector names.
Then I copied an existing message that had previously been sunk successfully and sent it using the Confluent Control Center topic message UI producer.
But it throws the error: "Unknown magic byte!"
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
at io.confluent.kafka.serializers.AbstractKafkaSchemaSerDe.getByteBuffer(AbstractKafkaSchemaSerDe.java:250)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer$DeserializationContext.<init>(AbstractKafkaAvroDeserializer.java:323)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserializeWithSchemaAndVersion(AbstractKafkaAvroDeserializer.java:164)
at io.confluent.connect.avro.AvroConverter$Deserializer.deserialize(AvroConverter.java:172)
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:107)
... 17 more
[2022-07-25 03:45:42,385] INFO Stopping task (io.confluent.connect.jdbc.sink.JdbcSinkTask)
Reading other questions (e.g. "Unknown magic byte with kafka-avro-console-consumer"), it seems the message has to be serialized using the schema.
Is it possible to send a message to a topic with Avro key/value schemas using the Confluent topic UI?
Any idea whether the Avro schemas need information that depends on the connector/source, or whether the namespace depends on the topic name?
This is my key schema. The topic's name is knov_03:
{
"connect.name": "dbserv1.MY_DB_SCHEMA.ps_sap_incoming.Key",
"fields": [
{
"name": "id_sap_incoming",
"type": "long"
}
],
"name": "Key",
"namespace": "dbserv1.MY_DB_SCHEMA.ps_sap_incoming",
"type": "record"
}
Connector:
{
"name": "knov_05",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"topics": "knov_03",
"connection.url": "jdbc:mysql://eXXXXX:3306/MY_DB_SCHEMA?useSSL=FALSE&nullCatalogMeansCurrent=true",
"connection.user": "USER",
"connection.password": "PASSWORD",
"insert.mode": "upsert",
"delete.enabled": "true",
"pk.mode": "record_key",
"pk.fields": "id_sap_incoming",
"auto.create": "true",
"auto.evolve": "true",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"key.converter.schema.registry.url": "http://schema-registry:8081"
}
}
Thanks.
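For context, "Unknown magic byte!" means the bytes in the topic do not start with the Confluent wire-format prefix (magic byte 0 plus a 4-byte schema id) that the AvroConverter expects; a message pasted as plain JSON without the Avro serializer will not carry that prefix. Below is a minimal sketch (assuming Confluent's kafka-avro-serializer on the classpath and the Schema Registry URL from the connector config) showing what serializing the key above with the Confluent serializer produces.
import io.confluent.kafka.serializers.KafkaAvroSerializer;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

import java.util.Map;

public class SerializeKey {
    public static void main(String[] args) {
        KafkaAvroSerializer keySerializer = new KafkaAvroSerializer();
        keySerializer.configure(Map.of("schema.registry.url", "http://schema-registry:8081"), true); // true = key serializer

        Schema keySchema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Key\","
                + "\"namespace\":\"dbserv1.MY_DB_SCHEMA.ps_sap_incoming\","
                + "\"fields\":[{\"name\":\"id_sap_incoming\",\"type\":\"long\"}]}");
        GenericRecord key = new GenericData.Record(keySchema);
        key.put("id_sap_incoming", 1L);

        byte[] bytes = keySerializer.serialize("knov_03", key);
        // bytes[0] is 0x00 (the "magic byte") and bytes[1..4] hold the registered schema id;
        // records produced without this prefix make the AvroConverter fail with "Unknown magic byte!"
        System.out.println("serialized key is " + bytes.length + " bytes");
    }
}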

Getting junk/control characters in Kafka topic from confluent MongoDb source connector

I am trying to get data into a Kafka topic from the Confluent MongoDB source connector. Below is the connector config:
{
"value.converter.schema.registry.url": <-SR url->,
"key.converter.schema.registry.url": <-SR url->,
"name": <-connector-name->,
"connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
"tasks.max": "2",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"connection.uri": <-mongo-source-url->,
"database": <-source-db->,
"collection": <-source-tablename->,
"publish.full.document.only": "true",
"output.format.key": "json",
"output.format.value": "json",
"output.json.formatter": "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson",
"copy.existing": "false"
}
I am getting data in the Kafka topic, but the key & value look like below:
"key":"\u0000\u0000\u0000\u0005Đ\u0002{\"_id\": {\"_data\": \"ABCD123\"}}",
"value":"\u0000\u0000\u0000\u0005��\t{\"_id\": \"ABCD123\", \"name\": abc, \"id\": 174}"
Has anyone come across a similar issue?
My topic schema is in Avro format, which is why I need to use the AvroConverter. I also have to use 'output.format' for the key and value as 'json', since the source schema is not constant.
Would really appreciate any help here.

Writing Debezium Kafka topic data to Hive through the HDFS sink connector does not work

I am trying to capture MySQL data changes into Kafka with the Debezium MySQL connector, then finally write the changes to Hive on Hadoop through the HDFS sink connector. The pipeline is: MySQL -> Kafka -> Hive.
The sink connector is configured as follows:
{
"name": "hdfs-sink",
"config": {
"connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
"tasks.max": "1",
"topics": "customers",
"hdfs.url": "hdfs://192.168.10.15:8020",
"flush.size": "3",
"hive.integration": "true",
"hive.database":"inventory",
"hive.metastore.uris":"thrift://192.168.10.14:9083",
"schema.compatibility":"BACKWARD",
"transforms": "unwrap,key",
"transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
"transforms.unwrap.drop.tombstones": "false",
"transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.key.field": "id"
}
}
This seems to work, but when querying the Hive table I see the changed data (wrapped in the after key) displayed in an after column instead of being split into the original table columns.
Here is the query result screenshot.
As you can see in the sink configuration, I already tried to use Debezium's "io.debezium.transforms.UnwrapFromEnvelope" transform to unwrap the event message, but obviously it is not working.
What are the minimal settings that let me write the DB change events from Kafka to Hive? Is the HDFS sink connector the right choice for this job?
Update:
I tested this with the sample "inventory" database from Debezium.
I got the test environment from the Debezium images, so they should be the latest ones. Some version info: Debezium 1.0, Kafka 2.0, confluent-kafka-connect-hdfs sink connector 5.4.1.
Update 2:
I moved on to using the following sink config, but still no luck:
{
"name": "hdfs-sink",
"config": {
"connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
"tasks.max": "1",
"topics": "dbserver1.inventory.customers",
"hdfs.url": "hdfs://172.17.0.8:8020",
"flush.size": "3",
"hive.integration": "true",
"hive.database":"inventory",
"hive.metastore.uris":"thrift://172.17.0.8:9083",
"schema.compatibility":"BACKWARD",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones": "false"
}
}

Error handling for invalid JSON in Kafka sink connector

I have a sink connector for MongoDB that takes JSON from a topic and puts it into a MongoDB collection. But when I send invalid JSON from a producer to that topic (e.g. with an invalid special character ") => {"id":1,"name":"\"}, the connector stops. I tried using errors.tolerance = all, but the same thing happens. What should happen is that the connector skips and logs the invalid JSON and keeps running. My distributed-mode connector config is as follows:
{
"name": "sink-mongonew_test1",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"topics": "error7",
"connection.uri": "mongodb://****:27017",
"database": "abcd",
"collection": "abc",
"type.name": "kafka-connect",
"key.ignore": "true",
"document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
"value.projection.list": "id",
"value.projection.type": "whitelist",
"writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.UpdateOneTimestampsStrategy",
"delete.on.null.values": "false",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter.schemas.enable": "false",
"errors.tolerance": "all",
"errors.log.enable": "true",
"errors.log.include.messages": "true",
"errors.deadletterqueue.topic.name": "crm_data_deadletterqueue",
"errors.deadletterqueue.topic.replication.factor": "1",
"errors.deadletterqueue.context.headers.enable": "true"
}
}
Since Apache Kafka 2.0, Kafka Connect has included error handling options, including the functionality to route messages to a dead letter queue, a common technique in building data pipelines.
https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues/
As commented, you're using connect-api-1.0.1.*.jar, i.e. version 1.0.1, which explains why those properties are not working.
Your alternatives, outside of running a newer version of Kafka Connect, include NiFi or Spark Structured Streaming.
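If in doubt, the Connect worker reports its version on the root of its REST API, so you can verify what you are actually running. A quick sketch (the worker URL is an assumption; adjust it to your REST listener):
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectVersionCheck {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8083/")).GET().build();
        // The root endpoint returns something like {"version":"1.0.1","commit":"..."};
        // errors.tolerance and the errors.deadletterqueue.* properties only take effect on 2.0+ workers.
        String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        System.out.println(body);
    }
}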