Kafka JDBC Source connector Insert or Update

Kafka JDBC Source connector Insert or Update - apache-kafka

I configured a Kafka JDBC Source connector in order to push on a Kafka topic the record changed (insert or update) from a PostgreSQL database.
I use "timestamp+incrementing" mode. Seems to work fine.
I didnt't configure the JDBC Sink connector because I'm using a Kafka Consumer that listen on the topic.
The message on the topic is a JSON. This is an example:
{
"schema": {
"type": "struct",
"fields": [
{
"type": "int64",
"optional": false,
"field": "id"
},
{
"type": "int64",
"optional": true,
"name": "org.apache.kafka.connect.data.Timestamp",
"version": 1,
"field": "entity_create_date"
},
{
"type": "int64",
"optional": true,
"name": "org.apache.kafka.connect.data.Timestamp",
"version": 1,
"field": "entity_modify_date"
},
{
"type": "int32",
"optional": true,
"field": "entity_version"
},
{
"type": "string",
"optional": true,
"field": "firstname"
},
{
"type": "string",
"optional": true,
"field": "lastname"
}
],
"optional": false,
"name": "author"
},
"payload": {
"id": 1,
"entity_create_date": 1600287236682,
"entity_modify_date": 1600287236682,
"entity_version": 1,
"firstname": "George",
"lastname": "Orwell"
}
}
As you can see there is no info about if this change is captured by Source connector because of an insert or an update.
I need this information. How can solve?

You can't get that information using the JDBC Source connector, unless you do something bespoke in the source schema and triggers.
This is one of the reasons why log-based CDC is generally a better way to get events from the source database, and for other reasons including:
capturing deletes
capturing the type of operation
capturing all events, not just what's there at the time when the connector polls.
For more details on the nuances of this see this blog or a talk based on the same.

Using a CDC based approach as suggested by #Robin Moffatt may be the proper way to handle your requirement. Checkout https://debezium.io/
However, looking at your table data you could use "entity_create_date" and "entity_modify_date" in your consumer to determine if the message in an insert or update. If "entity_create_date" = "entity_modify_date" then it's an insert else it's an update.

Related

JDBC sink topic with multiple structs to postgres

I am trying to sink a few topics top a postgres database. However the topic schema defines a array at the top level and within it multiple structs. Automapping does not work and I cannot find any reference how to handle this. I need all structs because they are dependent types, the second struct references the first struct as a field.
Currently it breaks when hitting the 2nd struct stating statusChangeEvent (struct) has no mapping to sql column type. This because it is using auto.create to make a table (probably called ProcessStatus) then when hitting the second entry there is no column of course.
[
{
"type": "record",
"name": "processStatus",
"namespace": "company.some.process",
"fields": [
{
"name": "code",
"doc": "The code of the processStatus",
"type": "string"
},
{
"name": "name",
"doc": "The name of the processStatus",
"type": "string"
},
{
"name": "description",
"type": "string"
},
{
"name": "isCompleted",
"type": "boolean"
},
{
"name": "isSuccessfullyCompleted",
"type": "boolean"
}
]
},
{
"type": "record",
"name": "StatusChangeEvent",
"namespace": "company.some.process",
"fields": [
{
"name": "contNumber",
"type": "string"
},
{
"name": "processId",
"type": "string"
},
{
"name": "processVersion",
"type": "int"
},
{
"name": "extProcessId",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "fromStatus",
"type": "process.status"
},
{
"name": "toStatus",
"doc": "The new status of the process",
"type": "company.some.process.processStatus"
},
{
"name": "changeDateTime",
"type": "long",
"logicalType": "timestamp-millis"
},
{
"name": "isPublic",
"type": "boolean"
}
]
}
]
I am not using ksql atm. Which connector settings are suited for this task? If there is a ksql alternative it would be nice to know but the current requirement is to use the JDBC connector.
I tried using flatten but it does not support struct fields that have a schema. Which seems kind of weird. Aren't schema's the whole selling point of connect with kafka? Or is it more of a constraint you have to work around?

Aren't schema's the whole selling point of connect with kafka?
Yes, but Postgres (or the JDBC Sink, in general) doesn't really support nested objects within columns. For that, you're better off with a document database, such as using Mongo Sink Connector.
Which connector settings are suited for this task?
None, really, other than transforms. You could write your own if flatten doesn't work.
You could try pre-defining your table to use JSONB for the two status columns, however, that's more of a workaround.

What column type do I need for this nested data in BigQuery?

I have a JSON schema for a Kafka stream that I am integrating with BigQuery but I can't get the data type correct at the BigQuery end. This is the schema:
"my_meta_data": {
"type": "object",
"properties": {
"property_1": {
"type": "array",
"items": {
"type": "number"
}
},
"property_2": {
"type": "array",
"items": {
"type": "number"
}
},
"property_3": {
"type": "array",
"items": {
"type": "number"
}
}
}
}
I tried this in the JSON file defining the BigQuery table:
{
"name": "my_meta_data",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "property_1",
"type": "INT64",
"mode": "REPEATED"
},
{
"name": "property_2",
"type": "INT64",
"mode": "REPEATED"
},
{
"name": "property_3",
"type": "INT64",
"mode": "REPEATED"
}
]
}
I am using a hosted connector from Confluent, the Kafka provider, and the error message is
The connector is failing because it cannot write a non-array element to an array column. Please check the schemas of the data in Kafka and the BigQuery tables the connector is writing to, and ensure that all data from Kafka that will be written to an array column in BigQuery is contained in an array.
I haven't defined an array column though, I've defined a RECORD column that contains arrays. Any ideas how I can set up the BigQuery table to capture this data? Thanks in advance.

Does Kafka Connect provide data provenance?

Iam new to kafka connect. I have used tools like nifi for sometime now. Those tools provide data provenance for auditing and other purpose for understanding what happened to a piece of data. But I couldn't find any similar feature with kafka connect. Does that feature exist for kafka connect? Or is there some way of handling data provenance in kafka connect so as to understand what happened to the data?

A CDC tool may help with your auditing needs, otherwise you will have to build your custom logic using a single message transformation (SMT). For example, using Debezium connector, this is what you will get as message payload for every change event:
{
"payload": {
"before": null,
"after": {
"id": 1,
"first_name": "7b789a503dc96805dc9f3dabbc97073b",
"last_name": "8428d131d60d785175954712742994fa",
"email": "68d0a7ccbd412aa4c1304f335b0edee8#example.com"
},
"source": {
"version": "1.1.0.Final",
"connector": "postgresql",
"name": "localhost",
"ts_ms": 1587303655422,
"snapshot": "true",
"db": "cdcdb",
"schema": "cdc",
"table": "customers",
"txId": 2476,
"lsn": 40512632,
"xmin": null
},
"op": "c",
"ts_ms": 1587303655424,
"transaction": null
}
}

Orion - cygnus integration

We are trying to integrate Orion, Cygnus and Ckan together.
I have followed these steps in order to make this happen:
Install and configure Cygnus with the Fiware Ckan info(Cygnus up and running)
Login in Ckan and get the API key and configure this in the Cygnus settings
Orion steps:
queryUpdate = APPEND data
{
"contextElements": [{
"type": "Room",
"isPattern": "false",
"id": "26JanRoom",
"attributes": [{
"name": "temperature",
"type": "float",
"value": "888"
}]
}],
"updateAction": "APPEND"
}
subscribeContext = subscribe with the entity id created above(our Cygnus host is given as reference "reference": "CYGNUS HOST", )
{
"entities": [{
"type": "Room",
"isPattern": "false",
"id": "26JanRoom"
}],
"attributes": ["temperature"],
"reference": "CYGNUS HOST",
"duration": "P1M",
"notifyConditions": [{
"type": "ONCHANGE",
"condValues": ["temperature"]
}],
"throttling": "PT5S"
}
queryUpdate = UPDATE data
{
"contextElements": [{
"type": "Room",
"isPattern": "false",
"id": "26JanRoom",
"attributes": [{
"name": "temperature",
"type": "float",
"value": "111"
}]
}],
"updateAction": "UPDATE"
}
What we expect is to receive some notifications in the Cygnus side, but there is nothing sent from the Orion (orion.lab.fi-ware.org:1026/)
Could you please help us on this topic?
Thanks kr
Omer Ozdemir

Your are using
"condValues": ["pressure"]
which means that every time the attribute named pressure change, Orion will trigger a notification. However, your update is modifiygin temperature.
Please, have a look to the subscribe context operation section at Orion documentation.

Loopback - GET model using custom String ID from MongoDB

I'm developing an API with loopback, everything worked fine until I decided to change the ids of my documents in the database. Now I don't want them to be auto generated.
Now that I'm setting the Id myself. I get an "Unknown id" 404, whenever I hit this endpoint: GET properties/{id}
How can I use custom IDs with loopback and mongodb?
Whenever I hit this endpoint: http://localhost:5000/api/properties/20020705171616489678000000
I get this error:
{
"error": {
"name": "Error",
"status": 404,
"message": "Unknown \"Property\" id \"20020705171616489678000000\".",
"statusCode": 404,
"code": "MODEL_NOT_FOUND"
}
}
This is my model.json, just in case...
{
"name": "Property",
"plural": "properties",
"base": "PersistedModel",
"idInjection": false,
"options": {
"validateUpsert": true
},
"properties": {
"id": {"id": true, "type": "string", "generated": false},
"photos": {
"type": [
"string"
]
},
"propertyType": {
"type": "string",
"required": true
},
"internalId": {
"type": "string",
"required": true
},
"flexCode": {
"type": "string",
"required": true
}
},
"validations": [],
"relations": {},
"acls": [],
"methods": []
}

Your model setup (with with idInjection: true or false) did work when I tried it with a PostGreSQL DB setup with a text id field for smaller numbers.
Running a Loopback application with DEBUG=loopback:connector:* node . outputs the database queries being run in the terminal - I tried it with the id value you are trying and the parameter value was [2.002070517161649e+25], so the size of the number is the issue.
You could try raising it as a bug in Loopback, but JS is horrible at dealing with large numbers so you may be better off not using such large numbers as identifiers anyway.
It does work if the ID is an alphanumeric string over 16 characters so there might be a work around for you (use ObjectId?), depending on what you are trying to achieve.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Kafka JDBC Source connector Insert or Update - apache-kafka

Related

JDBC sink topic with multiple structs to postgres

What column type do I need for this nested data in BigQuery?

Does Kafka Connect provide data provenance?

Orion - cygnus integration

Loopback - GET model using custom String ID from MongoDB

Categories

Resources