Kafka Connect with ibm-messaging connector not ingesting data into Postgres

I am trying to ingest data from a Kafka topic into Postgres. I have successfully connected Kafka and Postgres through the ibm-messaging/kafka-connect-jdbc-sink connector. Below is my jdbc-sink-connector properties file:
name=jdbc-sink-connector1
connector.class=com.ibm.eventstreams.connect.jdbcsink.JDBCSinkConnector
tasks.max=1
# Below is the postgresql driver
driver.class=org.postgresql.Driver
# The topics to consume from - required for sink connectors
topics=connector7
# Configuration specific to the JDBC sink connector.
# We want to connect to the PostgreSQL database below and auto-create tables.
connection.url=jdbc:postgresql://localhost:5432/abc
connection.user=xyz
connection.password=***
connection.ds.pool.size=5
connection.table=company
insert.mode.databaselevel=true
put.mode=insert
table.name.format=company
auto.create=true
Now, from the Kafka producer, I am sending data as follows:
{"schema":
{"type": "struct","fields": [
{ "field": "id", "type": "integer", "optional": false },
{ "field": "time", "type": "timestamp", "optional": false },
{ "field": "name", "type": "string", "optional": false },
{ "field": "company", "type": "string", "optional": false }]},
"payload":
{"id":"1",
"time":"2022-02-08 16:18:00-05",
"name":"ned",
"company":"advice"}}
It is giving the following error:
ERROR WorkerSinkTask{id=jdbc-sink-connector1-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. Error: 1 (org.apache.kafka.connect.runtime.WorkerSinkTask:607)
java.lang.ArrayIndexOutOfBoundsException: 1
ERROR WorkerSinkTask{id=jdbc-sink-connector1-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:184)
According to my understanding, there is some problem in the JSON I am giving to the producer, but I cannot figure out what it is.
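For comparison, here is a minimal sketch of the same record using schema type names that the Kafka Connect JsonConverter recognises ("integer" and "timestamp" are not Connect schema types), with payload values matching the declared types (id as a number, not a string) and the timestamp expressed as the org.apache.kafka.connect.data.Timestamp logical type carrying epoch milliseconds. Whether the IBM sink maps that logical type onto a Postgres timestamp column is an assumption to verify:
{"schema":
{"type": "struct","fields": [
{ "field": "id", "type": "int32", "optional": false },
{ "field": "time", "type": "int64", "name": "org.apache.kafka.connect.data.Timestamp", "optional": false },
{ "field": "name", "type": "string", "optional": false },
{ "field": "company", "type": "string", "optional": false }]},
"payload":
{"id": 1,
"time": 1644355080000,
"name": "ned",
"company": "advice"}}
Here 1644355080000 is 2022-02-08 16:18:00-05 expressed as epoch milliseconds.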

Related

For the Kafka MongoDB sink connector, I want to map to the same ObjectId that already exists

I am using the Kafka sink connector for MongoDB. I want to push JSON documents from a Kafka topic to MongoDB, but I am getting an error when using $oid in the document.
Below is the error:
{"name":"mongodb-sink-connector","connector":{"state":"RUNNING","worker_id":"localhost:8083"},"tasks":[{"id":0,"state":"FAILED","worker_id":"localhost:8083","trace":"org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:610)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:330)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:237)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.apache.kafka.connect.errors.DataException: Failed to write mongodb documents\n\tat com.mongodb.kafka.connect.sink.MongoSinkTask.bulkWriteBatch(MongoSinkTask.java:227)\n\tat java.base/java.util.ArrayList.forEach(ArrayList.java:1541)\n\tat com.mongodb.kafka.connect.sink.MongoSinkTask.put(MongoSinkTask.java:122)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:582)\n\t... 10 more\nCaused by: java.lang.IllegalArgumentException: Invalid BSON field name $oid\n\tat org.bson.AbstractBsonWriter.writeName(AbstractBsonWriter.java:534)\n\tat com.mongodb.internal.connection.BsonWriterDecorator.writeName(BsonWriterDecorator.java:193)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:117)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:42)\n\tat org.bson.codecs.EncoderContext.encodeWithChildContext(EncoderContext.java:91)\n\tat org.bson.codecs.BsonDocumentCodec.writeValue(BsonDocumentCodec.java:139)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:118)\n\tat org.bson.codecs.BsonDocumentCodec.encode(BsonDocumentCodec.java:42)\n\tat com.mongodb.internal.connection.SplittablePayload$WriteRequestEncoder.encode(SplittablePayload.java:221)\n\tat com.mongodb.internal.connection.SplittablePayload$WriteRequestEncoder.encode(SplittablePayload.java:187)\n\tat org.bson.codecs.BsonDocumentWrapperCodec.encode(BsonDocumentWrapperCodec.java:63)\n\tat org.bson.codecs.BsonDocumentWrapperCodec.encode(BsonDocumentWrapperCodec.java:29)\n\tat com.mongodb.internal.connection.BsonWriterHelper.writeDocument(BsonWriterHelper.java:77)\n\tat com.mongodb.internal.connection.BsonWriterHelper.writePayload(BsonWriterHelper.java:59)\n\tat com.mongodb.internal.connection.CommandMessage.encodeMessageBodyWithMetadata(CommandMessage.java:162)\n\tat com.mongodb.internal.connection.RequestMessage.encode(RequestMessage.java:138)\n\tat com.mongodb.internal.connection.CommandMessage.encode(CommandMessage.java:59)\n\tat com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:268)\n\tat com.mongodb.internal.connection.UsageTrackingInternalConnection.sendAndReceive(UsageTrackingInternalConnection.java:100)\n\tat 
com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.sendAndReceive(DefaultConnectionPool.java:490)\n\tat com.mongodb.internal.connection.CommandProtocolImpl.execute(CommandProtocolImpl.java:71)\n\tat com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:253)\n\tat com.mongodb.internal.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:202)\n\tat com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:118)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.executeCommand(MixedBulkWriteOperation.java:431)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.executeBulkWriteBatch(MixedBulkWriteOperation.java:251)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.access$700(MixedBulkWriteOperation.java:76)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:194)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:185)\n\tat com.mongodb.internal.operation.OperationHelper.withReleasableConnection(OperationHelper.java:621)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:185)\n\tat com.mongodb.internal.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:76)\n\tat com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:187)\n\tat com.mongodb.client.internal.MongoCollectionImpl.executeBulkWrite(MongoCollectionImpl.java:442)\n\tat com.mongodb.client.internal.MongoCollectionImpl.bulkWrite(MongoCollectionImpl.java:422)\n\tat com.mongodb.kafka.connect.sink.MongoSinkTask.bulkWriteBatch(MongoSinkTask.java:209)\n\t... 13 more\n"}],"type":"sink"}
Below is the document I produced to the Kafka topic:
{"_id": {"$oid": "634fd99b52281517a468f3a7"},"schema": {"type": "struct", "fields": [{"type": "int32","optional": true, "field": "id"}, {"type": "string", "optional": true, "field": "name"}, {"type": "string", "optional": true, "field": "middel_name"}, {"type": "string", "optional": true, "field": "surname"}],"optional": false, "name": "foobar"},"payload": {"id":45,"name":"mongo","middle_name": "mmp","surname": "kafka"}}
Below are the connector settings I have used:
{
"name": "mongodb-sink-connector",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"topics": "migration-mongo",
"connection.uri": "mongodb://abc:xyz#xx.xx.xx.01:27018,xx.xx.xx.02:27018,xx.xx.xx.03:27018/?authSource=admin&replicaSet=dev",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"document.id.strategy.overwrite.existing": "false",
"validate.non.null": false,
"database": "foo",
"collection": "product"
}
}
Kafka Connect JsonConverter payloads should only have schema and payload fields, not _id. And you need "value.converter.schemas.enable": "true". If you set that to false, then you can remove schema and payload and put _id directly in the payload...
The ID used by the Mongo client is more commonly associated with the Kafka record key itself, not with any values embedded in the value part you've shown, but this depends on the ID strategy.
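As a rough sketch of that suggestion, with "value.converter.schemas.enable": "false" the record value would just be the document itself, with no schema/payload wrapper and no extended-JSON $oid wrapper:
{"_id": "634fd99b52281517a468f3a7", "id": 45, "name": "mongo", "middle_name": "mmp", "surname": "kafka"}
Note that a plain string _id is stored as a string rather than an ObjectId; mapping it back onto an existing ObjectId would depend on an appropriate document.id.strategy setting, which is not shown here.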

MSK S3 sink not working with Field Partitioner

I am using AWS MSK and MSK Connect. The S3 sink connector stopped working properly when I added io.confluent.connect.storage.partitioner.FieldPartitioner. Note: without the FieldPartitioner, the S3 sink worked. Other than this Stack Overflow question link, I was not able to find any other resources.
Error
ERROR [FieldPart-sink|task-0] Value is not Struct type. (io.confluent.connect.storage.partitioner.FieldPartitioner:81)
Caused by: io.confluent.connect.storage.errors.PartitionException: Error encoding partition.
ERROR [Sink-FieldPartition|task-0] WorkerSinkTask{id=Sink-FieldPartition-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. Error: Error encoding partition. (org.apache.kafka.connect.runtime.WorkerSinkTask:612)
MSK Connect Config
connector.class=io.confluent.connect.s3.S3SinkConnector
format.class=io.confluent.connect.s3.format.avro.AvroFormat
flush.size=1
schema.compatibility=BACKWARD
tasks.max=2
topics=MSKTutorialTopic
storage.class=io.confluent.connect.s3.storage.S3Storage
topics.dir=mskTrials
s3.bucket.name=clickstream
s3.region=us-east-1
partitioner.class=io.confluent.connect.storage.partitioner.FieldPartitioner
partition.field.name=name
value.converter.schemaAutoRegistrationEnabled=true
value.converter.registry.name=datalake-schema-registry
value.convertor.schemaName=MSKTutorialTopic-value
value.converter.avroRecordType=GENERIC_RECORD
value.converter.region=us-east-1
value.converter.schemas.enable=true
value.converter=org.apache.kafka.connect.storage.StringConverter
key.converter=org.apache.kafka.connect.storage.StringConverter
Data schema stored in the Glue Schema Registry:
{
"namespace": "example.avro",
"type": "record",
"name": "UserData",
"fields": [
{
"name": "name",
"type": "string"
},
{
"name": "favorite_number",
"type": [
"int",
"null"
]
},
{
"name": "favourite_color",
"type": [
"string",
"null"
]
}
]
}
In order to partition by fields, your data needs actual fields.
StringConverter cannot parse the data it consumes in order to add those fields. Use AvroConverter if you have Avro data in the topic. Also, Avro always has a schema, so remove the schemas.enable configuration.
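As a sketch of that advice, and assuming the AWS Glue Schema Registry Kafka Connect converter (from the aws-glue-schema-registry library) is available on the connector's plugin path, since the schema lives in Glue rather than in a Confluent registry, the value converter settings might look like this, reusing the Glue properties already present in the config above:
value.converter=com.amazonaws.services.schemaregistry.kafkaconnect.AWSKafkaAvroConverter
value.converter.region=us-east-1
value.converter.registry.name=datalake-schema-registry
value.converter.avroRecordType=GENERIC_RECORD
value.converter.schemaAutoRegistrationEnabled=true
key.converter=org.apache.kafka.connect.storage.StringConverter
# value.converter.schemas.enable is removed, since Avro data always carries a schema
With the value deserialized into a real Struct, partition.field.name=name can then resolve the name field from the record.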

Debezium might produce an invalid schema

I am facing an issue with Avro and Schema Registry. After Debezium created a schema and a topic, I downloaded the schema from Schema Registry, put it into an .avsc file, and it looks like this:
{
  "type": "record",
  "name": "Envelope",
  "namespace": "my.namespace",
  "fields": [
    {
      "name": "before",
      "type": [
        "null",
        {
          "type": "record",
          "name": "MyValue",
          "fields": [
            {
              "name": "id",
              "type": "int"
            }
          ]
        }
      ],
      "default": null
    },
    {
      "name": "after",
      "type": [
        "null",
        "MyValue"
      ],
      "default": null
    }
  ]
}
I ran two experiments:
1. I tried to put it back into Schema Registry, but I get this error: MyValue is not correct. When I remove the "after" record, the schema seems to work well.
2. I used generate-sources from avro-maven-plugin to generate the Java classes. When I try to consume the topic above, I see this error:
Exception in thread "b2-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Exception caught in process. [...]: Error registering Avro schema: [...]
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema being registered is incompatible with an earlier schema; error code: 409
Has anyone faced the same problem? Is Debezium producing an invalid schema, or does Schema Registry have a bug?
MyValue is not correct.
That's not an Avro type. You would have to embed the actual record within the union, just like the before value.
In other words, you're not able to cross-reference the record types within a schema, AFAIK.
When I try to consume the topic above, I see this error:
A consumer does not register schemas, so it's not clear how you're getting that error, unless you are perhaps using Kafka Streams, which produces into intermediate topics.

Confluent Schema Registry timed out error

I'm using an Avro schema to write data to a Kafka topic. Initially, everything worked fine. After adding one more new field (scan_app_id) to the Avro file, I'm facing this error.
Avro file:
{
"type": "record", "name": "Initiate_Scan", "namespace": "avro",
"doc": "Avro schema registry for Initiate_Scan", "fields": [
{
"name": "app_id",
"type": "string",
"doc": "3 digit application id"
},
{
"name": "app_name",
"type": "string",
"doc": "application name"
},
{
"name": "dev_stage",
"type": "string",
"doc": "development stage"
},
{
"name": "scan_app_id",
"type": "string",
"doc": "unique scan id for an app in Veracode"
},
{
"name": "scan_name",
"type": "string",
"doc": "scan details"
},
{
"name": "seq_num",
"type": "int",
"doc": "unique number"
},
{
"name": "result_flg",
"type": "string",
"doc": "Y indicates results of scan available",
"default": "Y"
},
{
"name": "request_id",
"type": "int",
"doc": "unique id"
},
{
"name": "scan_number",
"type": "int",
"doc": "number of scans"
} ] }
Error:
Caused by: org.apache.kafka.common.errors.SerializationException: Error registering Avro schema:
{"type":"record","name":"Initiate_Scan","namespace":"avro","doc":"Avro schema registry for Initiate_Scan","fields":[{"name":"app_id","type":{"type":"string","avro.java.string":"String"},"doc":"3 digit application id"},{"name":"app_name","type":{"type":"string","avro.java.string":"String"},"doc":"application name"},{"name":"dev_stage","type":{"type":"string","avro.java.string":"String"},"doc":"development stage"},{"name":"scan_app_id","type":{"type":"string","avro.java.string":"String"},"doc":"unique scan id for an App"},{"name":"scan_name","type":{"type":"string","avro.java.string":"String"},"doc":"scan details"},{"name":"seq_num","type":"int","doc":"unique number"},{"name":"result_flg","type":{"type":"string","avro.java.string":"String"},"doc":"Y indicates results of scan available","default":"Y"},{"name":"request_id","type":"int","doc":"unique id"},{"name":"scan_number","type":"int","doc":"number of scans"}]}
INFO Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer:1017)
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Register operation timed out; error code: 50002
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:182)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:203)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:292)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:284)
at io.confluent.kafka.schemaregistry.client.rest.RestService.registerSchema(RestService.java:279)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.registerAndGetId(CachedSchemaRegistryClient.java:61)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.register(CachedSchemaRegistryClient.java:93)
at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:72)
at io.confluent.kafka.serializers.KafkaAvroSerializer.serialize(KafkaAvroSerializer.java:54)
at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:65)
at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:55)
at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:768)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:745)
at com.ssc.svc.svds.initiate.InitiateProducer.initiateScanData(InitiateProducer.java:146)
at com.ssc.svc.svds.initiate.InitiateProducer.topicsData(InitiateProducer.java:41)
at com.ssc.svc.svds.initiate.InputData.main(InputData.java:31)
I went through the Confluent documentation about the 50002 error, which says:
A schema should be compatible with the previously registered schema.
Does this mean I cannot make changes to or update an existing schema?
How do I fix this?
Actually, the link says 50002 -- Operation timed out. If the schema were indeed incompatible, the response would actually say so.
In any case, if you add a new field, you are required to define a default value.
This way, any consumers defined with a newer schema that are reading older messages know what value to set for that field.
A straightforward list of allowed Avro changes that I found is by Oracle.
Possible errors include:
A field is added without a default value
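For example (a sketch, not the asker's exact schema), the newly added field could be declared with a default so that older records remain readable; a null default requires the type to be a union that includes null:
{
"name": "scan_app_id",
"type": ["null", "string"],
"default": null,
"doc": "unique scan id for an app in Veracode"
}
Alternatively, keep "type": "string" and use an empty-string default such as "default": "".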

Getting an exception when I try to copy data from Azure Blob to SQL Data Warehouse using Data Factory

I have created a pipeline that uses the Copy activity to move data from Blob storage to SQL Data Warehouse.
Azure Blob Dataset:
"name": "TradeData",
"properties": {
"type": "AzureBlob",
"linkedServiceName": "HDInsightStorageLinkedService",
"structure": [],
"typeProperties": {
"folderPath": "hdinsight/hive/warehouse/tradesummary/",
"format": {
"type": "OrcFormat"
}
},
SQL DW Dataset:
"name": "TradeDataRepository",
"properties": {
"type": "AzureSqlDWTable",
"linkedServiceName": "AzureSQLDataWarehouseLinkedService",
"typeProperties": {
"tableName": "tradesummary"
},
Pipeline:
"activities": [
{
"name": "CopyActivityTemplate",
"type": "Copy",
"inputs": [
{
"name": "TradeData"
}
],
"outputs": [
{
"name": "TradeDataRepository"
}
],
"typeProperties": {
"source": {
"type": "BlobSource",
"skipHeaderLineCount": 0
},
"sink": {
"type": "SqlDWSink",
"allowPolyBase": false
}
When I execute the pipeline, I get the following error:
Database operation failed.
Error message from database execution : ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error happened when loading data into SQL Data Warehouse.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=110802;An internal DMS error occurred that caused this operation to fail. Details: Exception: Microsoft.SqlServer.DataWarehouse.DataMovement.Common.ExternalAccess.HdfsAccessException, Message: Java exception raised on call to HdfsBridge_CreateRecordReader: Error [HdfsBridge::CreateRecordReader - Unexpected error encountered creating the record reader.] occurred while accessing external file [/hive/warehouse/tradesummary/000000_0][0].,Source=.Net SqlClient Data Provider,SqlErrorNumber=110802,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=110802,State=1,Message=110802;An internal DMS error occurred that caused this operation to fail. Details: Exception: Microsoft.SqlServer.DataWarehouse.DataMovement.Common.ExternalAccess.HdfsAccessException, Message: Java exception raised on call to HdfsBridge_CreateRecordReader: Error [HdfsBridge::CreateRecordReader - Unexpected error encountered creating the record reader.] occurred while accessing external file [/hive/warehouse/tradesummary/000000_0][0].,},],'.
Any pointers would be appreciated.
The error message indicates that the problem is with your sink (SQL Data Warehouse), not with ADF. This can be a transient issue, for example when there is inadequate system memory in the resource pool.
To double-check, try enabling and disabling the "Skip incompatible rows" option in the ADF copy activity and doing a few test runs to see whether you hit the problem consistently.
As stated by wBob, and per the MSDN article, there are a few other possible causes, such as the source data containing a bit or a uniqueidentifier column. It appears the bit problem has been resolved, but the uniqueidentifier one has not. Check whether either of these datatypes exists in your source data; GUIDs can be converted to varchar as a workaround.