Kafka JDBC sink no handling null values - apache-kafka

I am trying to insert data with the Kafka JDBC Sink connector, but it is returning me this exception.
org.apache.kafka.connect.errors.DataException: Invalid null value for required INT64 field
The records have the following schema:
[
{
"schema": {
"type": "struct",
"fields": [
{
"type": "int64",
"field": "ID"
},
{
"type": "int64",
"field": "TENANT_ID"
},
{
"type": "string",
"field": "ITEM"
},
{
"type": "int64",
"field": "TIPO"
},
{
"type": "int64",
"field": "BUSINESS_CONCEPT"
},
{
"type": "string",
"field": "ETIQUETA"
},
{
"type": "string",
"field": "VALOR"
},
{
"type": "string",
"field": "GG_T_TYPE"
},
{
"type": "string",
"field": "GG_T_TIMESTAMP"
},
{
"type": "string",
"field": "TD_T_TIMESTAMP"
},
{
"type": "string",
"field": "POS"
}
]
},
"payload": {
"ID": 298457,
"TENANT_ID": 83,
"ITEM": "0-0-0",
"TIPO": 4,
"BUSINESS_CONCEPT": null,
"ETIQUETA": "¿Cuándo ha ocurrido?",
"VALOR": "2019-05-31T10:33:00Z",
"GG_T_TYPE": "I",
"GG_T_TIMESTAMP": "2019-05-31 14:35:19.002221",
"TD_T_TIMESTAMP": "2019-06-05T10:46:55.0106",
"POS": "00000000530096832544"
}
}
]
As you can see, the value BUSINESS_CONCEPT can be null. It is the only null value, so I suppose the exception is due to that field. How could I make the sink insert the value as null?

You need to change the definition of
{
"type": "int64",
"field": "BUSINESS_CONCEPT"
}
to
{
"type": ["null", "int64"],
"field": "BUSINESS_CONCEPT"
}
in order to treat BUSINESS_CONCEPT as optional field.

Related

org.apache.avro.AvroTypeException: Unknown union branch EventId

I am trying to convert a json to avro using 'kafka-avro-console-producer' and publish it to kafka topic.
I am able to do that flat json/schema's but for below given schema and json I am getting
"org.apache.avro.AvroTypeException: Unknown union branch EventId" error.
Any help would be appreciated.
Schema :
{
"type": "record",
"name": "Envelope",
"namespace": "CoreOLTPEvents.dbo.Event",
"fields": [{
"name": "before",
"type": ["null", {
"type": "record",
"name": "Value",
"fields": [{
"name": "EventId",
"type": "long"
}, {
"name": "CameraId",
"type": ["null", "long"],
"default": null
}, {
"name": "SiteId",
"type": ["null", "long"],
"default": null
}],
"connect.name": "CoreOLTPEvents.dbo.Event.Value"
}],
"default": null
}, {
"name": "after",
"type": ["null", "Value"],
"default": null
}, {
"name": "op",
"type": "string"
}, {
"name": "ts_ms",
"type": ["null", "long"],
"default": null
}],
"connect.name": "CoreOLTPEvents.dbo.Event.Envelope"
}
And Json input is like below :
{
"before": null,
"after": {
"EventId": 12,
"CameraId": 10,
"SiteId": 11974
},
"op": "C",
"ts_ms": null
}
And in my case I cant alter schema, I can alter only json such a way that it works
If you are using the Avro JSON format, the input you have is slightly off. For unions, non-null values need to be specified such that the type information is listed: https://avro.apache.org/docs/current/spec.html#json_encoding
See below for an example which I think should work.
{
"before": null,
"after": {
"CoreOLTPEvents.dbo.Event.Value": {
"EventId": 12,
"CameraId": {
"long": 10
},
"SiteId": {
"long": 11974
}
}
},
"op": "C",
"ts_ms": null
}
Removing "connect.name": "CoreOLTPEvents.dbo.Event.Value" and "connect.name": "CoreOLTPEvents.dbo.Event.Envelope" as The RecordType can only contains {'namespace', 'aliases', 'fields', 'name', 'type', 'doc'} keys.
Could you try with below schema and see if you are able to produce the msg?
{
"type": "record",
"name": "Envelope",
"namespace": "CoreOLTPEvents.dbo.Event",
"fields": [
{
"name": "before",
"type": [
"null",
{
"type": "record",
"name": "Value",
"fields": [
{
"name": "EventId",
"type": "long"
},
{
"name": "CameraId",
"type": [
"null",
"long"
],
"default": "null"
},
{
"name": "SiteId",
"type": [
"null",
"long"
],
"default": "null"
}
]
}
],
"default": null
},
{
"name": "after",
"type": [
"null",
"Value"
],
"default": null
},
{
"name": "op",
"type": "string"
},
{
"name": "ts_ms",
"type": [
"null",
"long"
],
"default": null
}
]
}

Need primary key information in Debezium connector for postgres insert events

I am using Debezium connector for postgres with Kafka connect.
For an insert row event written to Kafka by the connector, I need information about which columns are primary keys and which are not. Is there a way to achieve this ?
Pasting a sample insert event generated in Kafka:
"schema": {
"type": "struct",
"fields": [
{
"type": "struct",
"fields": [
{
"type": "int32",
"optional": false,
"field": "id"
},
{
"type": "int32",
"optional": false,
"field": "bucket_type"
}
],
"optional": true,
"name": "postgresconfigdb.config.alert_configs.Value",
"field": "before"
},
{
"type": "struct",
"fields": [
{
"type": "int32",
"optional": false,
"field": "id"
},
{
"type": "int32",
"optional": false,
"field": "bucket_type"
}
],
"optional": true,
"name": "postgresconfigdb.config.alert_configs.Value",
"field": "after"
},
{
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "version"
},
{
"type": "string",
"optional": false,
"field": "connector"
},
{
"type": "string",
"optional": false,
"field": "name"
},
{
"type": "int64",
"optional": false,
"field": "ts_ms"
},
{
"type": "string",
"optional": true,
"name": "io.debezium.data.Enum",
"version": 1,
"parameters": {
"allowed": "true,last,false"
},
"default": "false",
"field": "snapshot"
},
{
"type": "string",
"optional": false,
"field": "db"
},
{
"type": "string",
"optional": false,
"field": "schema"
},
{
"type": "string",
"optional": false,
"field": "table"
},
{
"type": "int64",
"optional": true,
"field": "txId"
},
{
"type": "int64",
"optional": true,
"field": "lsn"
},
{
"type": "int64",
"optional": true,
"field": "xmin"
}
],
"optional": false,
"name": "io.debezium.connector.postgresql.Source",
"field": "source"
},
{
"type": "string",
"optional": false,
"field": "op"
},
{
"type": "int64",
"optional": true,
"field": "ts_ms"
},
{
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "id"
},
{
"type": "int64",
"optional": false,
"field": "total_order"
},
{
"type": "int64",
"optional": false,
"field": "data_collection_order"
}
],
"optional": true,
"field": "transaction"
}
],
"optional": false,
"name": "postgresconfigdb.config.alert_configs.Envelope"
},
"payload": {
"before": null,
"after": {
"id": 1100,
"bucket_type": 10
},
"source": {
"version": "1.2.0.Final",
"connector": "postgresql",
"name": "postgresconfigdb",
"ts_ms": 1599830887858,
"snapshot": "true",
"db": "configdb",
"schema": "config",
"table": "alert_configs",
"txId": 2139888,
"lsn": 379356048,
"xmin": null
},
"op": "r",
"ts_ms": 1599830887859,
"transaction": null
}
}
Here the columns in the table are 'id' and 'bucket_type', the values of which are reported in the json-path payload->after.
There is information about columns which are not null in the column specific 'optional' boolean field, however no information about which columns are primary keys. (id in this case)
you find information about what fields are PK columns in Kafka key.

Getting error on null and empty string while copying a csv file from blob container to Azure SQL DB

I tried all combination on the datatype of my data but each time my data factory pipeline is giving me this error:
{
"errorCode": "2200",
"message": "ErrorCode=UserErrorColumnNameNotAllowNull,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Empty or Null string found in Column Name 2. Please make sure column name not null and try again.,Source=Microsoft.DataTransfer.Common,'",
"failureType": "UserError",
"target": "xxx",
"details": []
}
My Copy data source code is something like this:{
"name": "xxx",
"description": "uuu",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"recursive": true,
"wildcardFileName": "*"
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "AzureSqlSink"
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"mappings": [
{
"source": {
"name": "populationId",
"type": "Guid"
},
"sink": {
"name": "PopulationID",
"type": "String"
}
},
{
"source": {
"name": "inputTime",
"type": "DateTime"
},
"sink": {
"name": "inputTime",
"type": "DateTime"
}
},
{
"source": {
"name": "inputCount",
"type": "Decimal"
},
"sink": {
"name": "inputCount",
"type": "Decimal"
}
},
{
"source": {
"name": "inputBiomass",
"type": "Decimal"
},
"sink": {
"name": "inputBiomass",
"type": "Decimal"
}
},
{
"source": {
"name": "inputNumber",
"type": "Decimal"
},
"sink": {
"name": "inputNumber",
"type": "Decimal"
}
},
{
"source": {
"name": "utcOffset",
"type": "String"
},
"sink": {
"name": "utcOffset",
"type": "Int32"
}
},
{
"source": {
"name": "fishGroupName",
"type": "String"
},
"sink": {
"name": "fishgroupname",
"type": "String"
}
},
{
"source": {
"name": "yearClass",
"type": "String"
},
"sink": {
"name": "yearclass",
"type": "String"
}
}
]
}
},
"inputs": [
{
"referenceName": "DelimitedTextFTDimensions",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "AzureSqlTable1",
"type": "DatasetReference"
}
]
}
Can anyone please help me understand the issue. I see in some blogs they ask me use treatnullasempty but I am not allowed to modify the JSON. is there a way to do that??
I suggest to using Data Flow DerivedColumn, DerivedColumn can help you build expression to replace the null column.
For example:
Derived Column, if Column_2 is null =true, return 'dd' :
iifNull(Column_2,'dd')
Mapping the column
Reference: Data transformation expressions in mapping data flow
Hope this helps.
fixed it.it was a easy fix as one of my column in destination was marked as not null, i changed it as null and it worked.

Conversion of JSON to Avro failed: Failed to convert JSON to Avro: Unknown union branch

I am trying to send a JSON message to a Kafka topic using Kafka-rest service to serialize JSON as an Avro object, but the JSON message is failed to get accepted by Kafka-rest with the following error:
Conversion of JSON to Avro failed: Failed to convert JSON to Avro: Unknown union branch postId
I suspect that there is an issue with the Avro schema I am using as it is a nested record type with nullable fields.
Avro schema:
{
"type": "record",
"name": "ExportRequest",
"namespace": "com.example.avro.model",
"fields": [
{
"name": "context",
"type": {
"type": "map",
"values": {
"type": "string",
"avro.java.string": "String"
},
"avro.java.string": "String"
}
},
{
"name": "exportInfo",
"type": {
"type": "record",
"name": "ExportInfo",
"fields": [
{
"name": "exportId",
"type": {
"type": "string",
"avro.java.string": "String"
}
},
{
"name": "exportType",
"type": {
"type": "string",
"avro.java.string": "String"
}
},
{
"name": "exportQuery",
"type": {
"type": "record",
"name": "ExportQuery",
"fields": [
{
"name": "postExport",
"type": [
"null",
{
"type": "record",
"name": "PostExport",
"fields": [
{
"name": "postId",
"type": {
"type": "string",
"avro.java.string": "String"
}
},
{
"name": "isCommentIncluded",
"type": "boolean"
}
]
}
],
"default": null
},
{
"name": "feedExport",
"type": [
"null",
{
"type": "record",
"name": "FeedExport",
"fields": [
{
"name": "accounts",
"type": {
"type": "array",
"items": {
"type": "string",
"avro.java.string": "String"
}
}
},
{
"name": "recordTypes",
"type": {
"type": "array",
"items": {
"type": "string",
"avro.java.string": "String"
}
}
},
{
"name": "actions",
"type": {
"type": "array",
"items": {
"type": "string",
"avro.java.string": "String"
}
}
},
{
"name": "contentTypes",
"type": {
"type": "array",
"items": {
"type": "string",
"avro.java.string": "String"
}
}
},
{
"name": "startDate",
"type": "long"
},
{
"name": "endDate",
"type": "long"
},
{
"name": "advancedSearch",
"type": [
"null",
{
"type": "record",
"name": "AdvancedSearchExport",
"fields": [
{
"name": "allOfTheWords",
"type": {
"type": "array",
"items": {
"type": "string",
"avro.java.string": "String"
}
}
},
{
"name": "anyOfTheWords",
"type": {
"type": "array",
"items": {
"type": "string",
"avro.java.string": "String"
}
}
},
{
"name": "noneOfTheWords",
"type": {
"type": "array",
"items": {
"type": "string",
"avro.java.string": "String"
}
}
},
{
"name": "hashtags",
"type": {
"type": "array",
"items": {
"type": "string",
"avro.java.string": "String"
}
}
},
{
"name": "keyword",
"type": {
"type": "string",
"avro.java.string": "String"
}
},
{
"name": "exactPhrase",
"type": {
"type": "string",
"avro.java.string": "String"
}
}
]
}
],
"default": null
}
]
}
],
"default": null
}
]
}
}
]
}
}
]
}
Json message:
{
"context": {
"user_id": "1",
"group_id": "1",
"organization_id": "1"
},
"exportInfo": {
"exportId": "93874dd7-35d7-4f1f-8cf8-051c606d920b",
"exportType": "json",
"exportQuery": {
"postExport": {
"postId": "dd",
"isCommentIncluded": false
},
"feedExport": {
"accounts": [
"1677143852565319"
],
"recordTypes": [],
"actions": [],
"contentTypes": [],
"startDate": 0,
"endDate": 0,
"advancedSearch": {
"allOfTheWords": [
"string"
],
"anyOfTheWords": [
"string"
],
"noneOfTheWords": [
"string"
],
"hashtags": [
"string"
],
"keyword": "string",
"exactPhrase": "string"
}
}
}
}
}
I would appreciate it if someone could help me to understand what the issue is.
Both of your JSON and Avro looks good.
You are facing the issue because JSON doesn't conform to Avro's JSON encoding spec.
So, if you convert your JSON accordingly, it will somehow look like this
{
"context": {
"user_id": "1",
"group_id": "1",
"organization_id": "1"
},
"exportInfo": {
"exportId": "93874dd7-35d7-4f1f-8cf8-051c606d920b",
"exportType": "json",
"exportQuery": {
"postExport": {
"com.example.avro.model.PostExport": {
"postId": "dd",
"isCommentIncluded": false
}
},
"feedExport": {
"com.example.avro.model.FeedExport": {
"accounts": [
"1677143852565319"
],
"recordTypes": [],
"actions": [],
"contentTypes": [],
"startDate": 0,
"endDate": 0,
"advancedSearch": {
"com.example.avro.model.AdvancedSearchExport": {
"allOfTheWords": [
"string"
],
"anyOfTheWords": [
"string"
],
"noneOfTheWords": [
"string"
],
"hashtags": [
"string"
],
"keyword": "string",
"exactPhrase": "string"
}
}
}
}
}
}
}

avro.io.AvroTypeException: The datum [object] is not an example of the schema

I have been struggling through this issue quite for some time. I am working on AvroProducer(confluent kafka) and getting error related to schema defined.
Here is the complete stacktrace of the issue I am getting:
<!--language: lang-none-->
raise AvroTypeException(self.writer_schema, datum)
avro.io.AvroTypeException: The datum {'totalDifficulty': 2726165051, 'stateRoot': '0xf09bd6730b3ae7f5728836564837d7f776a8f7333628c8b84cb57d7c6d48ebba', 'sha3Uncles': '0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347', 'size': 538, 'logs': [], 'gasLimit': 8000000, 'mixHash': '0x410b2b19519be16496727c93515f399072ffecf06defe4913d00eb4d10bb7351', 'logsBloom': '0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000', 'nonce': '0x18dc6c0d30839c91', 'proofOfAuthorityData': '0xd883010817846765746888676f312e31302e34856c696e7578', 'number': 5414, 'timestamp': 1552577641, 'difficulty': 589091, 'gasUsed': 0, 'miner': '0x48FA5EBc2f0D82B5D52faAe624Fa2426998ab492', 'hash': '0x71259991acb407a85befa8b3c5df26a94a11a6c08f92f3e3b7c9c0e8e1f5916d', 'transactionsRoot': '0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421', 'receiptsRoot': '0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421', 'transactions': [], 'parentHash': '0x9f0c25eeab86fc144296cb034c94857beed331936016d60c0986a35ac07d9c68', 'uncles': []} is not an example of the schema {
"type": "record",
"name": "value",
"namespace": "exporter.value.opsnetBlock",
"fields": [
{
"type": "int",
"name": "difficulty"
},
{
"type": "string",
"name": "proofOfAuthorityData"
},
{
"type": "int",
"name": "gasLimit"
},
{
"type": "int",
"name": "gasUsed"
},
{
"type": "string",
"name": "hash"
},
{
"type": "string",
"name": "logsBloom"
},
{
"type": "int",
"name": "size"
},
{
"type": "string",
"name": "miner"
},
{
"type": "string",
"name": "mixHash"
},
{
"type": "string",
"name": "nonce"
},
{
"type": "int",
"name": "number"
},
{
"type": "string",
"name": "parentHash"
},
{
"type": "string",
"name": "receiptsRoot"
},
{
"type": "string",
"name": "sha3Uncles"
},
{
"type": "string",
"name": "stateRoot"
},
{
"type": "int",
"name": "timestamp"
},
{
"type": "int",
"name": "totalDifficulty"
},
{
"type": "string",
"name": "transactionsRoot"
},
{
"type": {
"type": "array",
"items": "string"
},
"name": "transactions"
},
{
"type": {
"type": "array",
"items": "string"
},
"name": "uncles"
},
{
"type": {
"type": "array",
"items": {
"type": "record",
"name": "Child",
"namespace": "exporter.value.opsnetBlock",
"fields": [
{
"type": "string",
"name": "address"
},
{
"type": "string",
"name": "blockHash"
},
{
"type": "int",
"name": "blockNumber"
},
{
"type": "string",
"name": "data"
},
{
"type": "int",
"name": "logIndex"
},
{
"type": "boolean",
"name": "removed"
},
{
"type": {
"type": "array",
"items": "string"
},
"name": "topics"
},
{
"type": "string",
"name": "transactionHash"
},
{
"type": "int",
"name": "transactionIndex"
}
]
}
},
"name": "logs"
}
]
}
Can anybody please tell me where am I going wrong in this?
Thanks in advance