Issue with Kafka source connector - apache-kafka

I would appreciate it if someone could help me figure out why my Kafka consumer displays the schema "fields" as shown below, instead of displaying the data types and column names as it should.
confirmuserreg | schema: {
confirmuserreg | type: 'struct',
confirmuserreg | fields: [ [Object], [Object], [Object], [Object], [Object], [Object] ],
confirmuserreg | optional: false,
confirmuserreg | name: 'smartdevdbserver1.signup_db.users.Envelope'
confirmuserreg | },
confirmuserreg | payload: {
confirmuserreg | before: null,
confirmuserreg | after: {
confirmuserreg | id: 44,
confirmuserreg | email: 'testing13#firstclicklimited.com',
confirmuserreg | password: '$2a$10$lJ5ILqdiJMXoJhHBOLmFeOAF3gppc9ZNgPrzTRnzDU18kX4lxu19C',
confirmuserreg | User_status: 'INACTIVE',
confirmuserreg | auth_token: null
confirmuserreg | },
confirmuserreg | source: {
confirmuserreg | version: '1.9.5.Final',
confirmuserreg | connector: 'mysql',
confirmuserreg | name: 'smartdevdbserver1',
confirmuserreg | ts_ms: 1666790831000,
confirmuserreg | snapshot: 'false',
confirmuserreg | db: 'signup_db',
confirmuserreg | sequence: null,
confirmuserreg | table: 'users',
confirmuserreg | server_id: 1,
confirmuserreg | gtid: '4e390d46-53b4-11ed-b7c4-0242ac140003:33',
confirmuserreg | file: 'binlog.000008',
confirmuserreg | pos: 487,
confirmuserreg | row: 0,
confirmuserreg | thread: 41,
confirmuserreg | query: null
confirmuserreg | },
confirmuserreg | op: 'c',
confirmuserreg | ts_ms: 1666790832054,
confirmuserreg | transaction: null
confirmuserreg | }
It should be something like this instead:
{"schema": {"type": "struct","fields": [{"type": "string","optional": false,"field": "Name"}, {"type": "string","optional": false,"field": "company"}],"optional": false,"name": "Person"},"payload": {"Name": "deepak","company": "BT"}}
This is my connector config:
{
"name": "smartdevsignupconnector112",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "true",
"value.converter.schemas.enable": "true",
"database.hostname": "mysql1",
"database.port": "3306",
"database.user": "clusterAdmin",
"database.password": "xxxxxxxxxx",
"database.server.id": "184055",
"database.server.name": "smartdevdbserver1",
"database.include.list": "signup_db",
"database.history.kafka.bootstrap.servers": "kafka1:9092",
"database.history.kafka.topic": "dbhistory.smartdevdbserver1",
"include.schema.changes": "true",
"table.whitelist": "signup_db.users",
"column.blacklist": "signup_db.users.fullName, signup_db.users.address, signup_db.users.phoneNo, signup_db.users.gender, signup_db.users.userRole, signup_db.users.reason_for_inactive, signup_db.users.firstvisit, signup_db.users.last_changed_PW, signup_db.users.regDate",
"snapshot.mode": "when_needed"
}
}
I expect records with 5 columns (email, password, User_status, auth_token, plus the primary key id) to be displayed; below is the table schema:
DROP TABLE IF EXISTS `users`;
CREATE TABLE IF NOT EXISTS `users` (
`id` int NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL,
`password` varchar(255) NOT NULL,
`fullName` varchar(66),
`address` varchar(77),
`phoneNo` varchar(16),
`gender` varchar(6),
`userRole` enum('visitor','student','Admin') NOT NULL DEFAULT 'visitor',
`User_status` enum('ACTIVE','INACTIVE') NOT NULL DEFAULT 'INACTIVE',
`reason_for_inactive` enum('visitor','TERMINATED','SUSPENDED_FOR_VIOLATION') NOT NULL DEFAULT 'visitor',
`firstvisit` varchar(3) DEFAULT NULL,
`last_changed_PW` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`regDate` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`auth_token` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY (`email`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

"It should be something like this instead"
Debezium has its own event format, so no, it shouldn't look like that.
It seems your confirmuserreg service is a JavaScript application, and [Object] is simply the default console.log() output for deeply nested JS objects; printing the event with JSON.stringify(event) (or util.inspect with depth: null) would show the full field definitions.
If you don't care about the Debezium metadata, then flatten the event. That way, KSQL will read the payload.after content and be able to extract the fields within; see the config sketch below.
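A minimal sketch of that flattening, using Debezium's ExtractNewRecordState single message transform (the transform ships with Debezium; the alias "unwrap" is just a name chosen here, not something from the question). These properties would be added to the source connector's "config" block:
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
With the transform in place, each record's value carries only the row image (id, email, password, User_status, auth_token), and the schema fields then describe those columns directly instead of the Envelope structure.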

Related

Sink Connector auto create tables with proper data type

I have a Debezium source connector for PostgreSQL with Avro as the value converter, and it uses the schema registry.
Source DDL:
Table "public.tbl1"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+-----------------------------+-----------+----------+----------------------------------+----------+-------------+--------------+-------------
id | integer | | not null | nextval('tbl1_id_seq'::regclass) | plain | | |
name | character varying(100) | | | | extended | | |
col4 | numeric | | | | main | | |
col5 | bigint | | | | plain | | |
col6 | timestamp without time zone | | | | plain | | |
col7 | timestamp with time zone | | | | plain | | |
col8 | boolean | | | | plain | | |
Indexes:
"tbl1_pkey" PRIMARY KEY, btree (id)
Publications:
"dbz_publication"
Access method: heap
In the schema registry:
{
"type": "record",
"name": "Value",
"namespace": "test.public.tbl1",
"fields": [
{
"name": "id",
"type": {
"type": "int",
"connect.parameters": {
"__debezium.source.column.type": "SERIAL",
"__debezium.source.column.length": "10",
"__debezium.source.column.scale": "0"
},
"connect.default": 0
},
"default": 0
},
{
"name": "name",
"type": [
"null",
{
"type": "string",
"connect.parameters": {
"__debezium.source.column.type": "VARCHAR",
"__debezium.source.column.length": "100",
"__debezium.source.column.scale": "0"
}
}
],
"default": null
},
{
"name": "col4",
"type": [
"null",
{
"type": "double",
"connect.parameters": {
"__debezium.source.column.type": "NUMERIC",
"__debezium.source.column.length": "0"
}
}
],
"default": null
},
{
"name": "col5",
"type": [
"null",
{
"type": "long",
"connect.parameters": {
"__debezium.source.column.type": "INT8",
"__debezium.source.column.length": "19",
"__debezium.source.column.scale": "0"
}
}
],
"default": null
},
{
"name": "col6",
"type": [
"null",
{
"type": "long",
"connect.version": 1,
"connect.parameters": {
"__debezium.source.column.type": "TIMESTAMP",
"__debezium.source.column.length": "29",
"__debezium.source.column.scale": "6"
},
"connect.name": "io.debezium.time.MicroTimestamp"
}
],
"default": null
},
{
"name": "col7",
"type": [
"null",
{
"type": "string",
"connect.version": 1,
"connect.parameters": {
"__debezium.source.column.type": "TIMESTAMPTZ",
"__debezium.source.column.length": "35",
"__debezium.source.column.scale": "6"
},
"connect.name": "io.debezium.time.ZonedTimestamp"
}
],
"default": null
},
{
"name": "col8",
"type": [
"null",
{
"type": "boolean",
"connect.parameters": {
"__debezium.source.column.type": "BOOL",
"__debezium.source.column.length": "1",
"__debezium.source.column.scale": "0"
}
}
],
"default": null
}
],
"connect.name": "test.public.tbl1.Value"
}
But in the target PostgreSQL database the data types are completely mismatched for the id and timestamp columns, and sometimes for decimal columns as well (that's due to this).
Target:
Table "public.tbl1"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+------------------+-----------+----------+---------+----------+-------------+--------------+-------------
id | text | | not null | | extended | | |
name | text | | | | extended | | |
col4 | double precision | | | | plain | | |
col5 | bigint | | | | plain | | |
col6 | bigint | | | | plain | | |
col7 | text | | | | extended | | |
col8 | boolean | | | | plain | | |
Indexes:
"tbl1_pkey" PRIMARY KEY, btree (id)
Access method: heap
I'm trying to understand why, even with the schema registry, it is not creating the target tables with the proper data types.
Sink config:
{
"name": "t1-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "1",
"topics": "test.public.tbl1",
"connection.url": "jdbc:postgresql://172.31.85.***:5432/test",
"connection.user": "postgres",
"connection.password": "***",
"dialect.name": "PostgreSqlDatabaseDialect",
"auto.create": "true",
"insert.mode": "upsert",
"delete.enabled": "true",
"pk.fields": "id",
"pk.mode": "record_key",
"table.name.format": "tbl1",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable": "false",
"internal.key.converter": "com.amazonaws.services.schemaregistry.kafkaconnect.AWSKafkaAvroConverter",
"internal.key.converter.schemas.enable": "true",
"internal.value.converter": "com.amazonaws.services.schemaregistry.kafkaconnect.AWSKafkaAvroConverter",
"internal.value.converter.schemas.enable": "true",
"value.converter": "com.amazonaws.services.schemaregistry.kafkaconnect.AWSKafkaAvroConverter",
"value.converter.schemas.enable": "true",
"value.converter.region": "us-east-1",
"key.converter.region": "us-east-1",
"key.converter.schemaAutoRegistrationEnabled": "true",
"value.converter.schemaAutoRegistrationEnabled": "true",
"key.converter.avroRecordType": "GENERIC_RECORD",
"value.converter.avroRecordType": "GENERIC_RECORD",
"key.converter.registry.name": "bhuvi-debezium",
"value.converter.registry.name": "bhuvi-debezium",
"value.converter.column.propagate.source.type": ".*",
"value.converter.datatype.propagate.source.type": ".*"
}
}

Postgres expand array of objects (jsonb)

I have a postgres table in which I want to expand a jsonb column.
The table (called runs) has each participant as a single row, with the jsonb column holding their experiment data.
id |data |
---|-------------|
id1|[{}, {}] |
id2|[{}, {}, {}] |
The jsonb column is always an array with an unknown number of objects, where each object has an unknown number of arbitrary keys.
[
{
"rt": 3698,
"phase": "questionnaire",
"question": 1
},
{
"rt": 3698,
"phase": "forced-choice",
"choice": 0,
"options": ["red", "blue"]
}
]
I would like to either (1) expand the jsonb column so each object is a row:
id |rt | phase | question | choice | options |
---|------| ------------- | -------- | ------ | -------------- |
id1| 3698 | questionnaire | 1 | | |
id1| 5467 | choice | | 0 | ["red", "blue"] |
OR (2) map the other columns in the row to the jsonb array (here the "id" key):
[
{
"id": "id1",
"rt": 3698,
"phase": "questionnaire",
"question": 1
},
{
"id": "id1",
"rt": 3698,
"phase": "forced-choice",
"choice": 0,
"options": ["red", "blue"]
}
]
The fact that the number of objects, the number of keys per object, and the keys themselves are unknown a priori is really stumping me on how to accomplish this. Maybe something like this, but this isn't right...
SELECT id, x.*
FROM
runs_table,
jsonb_populate_recordset(null::runs_table, data) x
PostgreSQL has many JSON functions. First you must extract the keys and values from the JSONB; then you can get the type of each value using the jsonb_typeof(jsonb) function. I wrote two samples for you:
-- sample 1
select *
from jsonb_array_elements(
'[
{
"id": "id1",
"rt": 3698,
"phase": "questionnaire",
"question": 1
},
{
"id": "id1",
"rt": 3698,
"phase": "forced-choice",
"choice": 0,
"options": ["red", "blue"]
}
]'::jsonb
) t1 (json_data)
cross join jsonb_each(t1.json_data) t2(js_key, js_value)
where jsonb_typeof(t2.js_value::jsonb) = 'array'
-- sample 2
select *
from jsonb_array_elements(
'[
{
"id": "id1",
"rt": 3698,
"phase": "questionnaire",
"question": 1
},
{
"id": "id1",
"rt": 3698,
"phase": "forced-choice",
"choice": 0,
"options": ["red", "blue"]
}
]'::jsonb
) t1 (json_data)
where jsonb_typeof((t1.json_data->'options')::jsonb) = 'array'
Sample 1: this query extracts all keys and values from the JSONB and then filters the result to show only values of array type.
Sample 2: use this query if you already know which keys can hold array values.
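Neither sample touches the runs table itself, so here is a minimal sketch of option (2) from the question, assuming the table and column names given there (runs, id, data); it merges each row's id into every element of its jsonb array:
select r.id,
       elem || jsonb_build_object('id', r.id) as obj
from runs r
cross join jsonb_array_elements(r.data) as elem;
Option (1), one column per key, cannot be produced by a plain query when the keys are unknown in advance, because a SQL statement's output columns must be fixed at parse time; it needs either dynamic SQL or a key/value shape like the samples above.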

Error while inserting a JSON array into PostgreSQL

I'm trying to run the following INSERT statement in PostgreSQL. I have some JSON array objects; when I set them to NULL it works, but when I execute the statement with the JSON arrays I get an error. The details are below:
My insert SQL code:
INSERT INTO vuln
(pde,
"pde_id",
pde_description,
pde_published,
pde_published_date,
pde_valid,
pde_cwe,
pde_references,
pde_vu_products,
pde_eval_has_ex,
pde_vu_products_has_inc_excl,
pde_impact)
VALUES ( '{"data_type": "PDE", "data_format": "MITRE", "data_version": "4.0", "PDE_data_meta": {"ID": "PDE-0001", "ASSIGNER": "PDE#xs2.org"}, "problemtype": {"problemtype_data": [{"description": [{"lang": "en", "value": "CWE-20"}]}]}, "references": {"reference_data": [{"url": "http://www.example.com ", "name": "http://www.example.com ", "refsource": "CONFIRM", "tags": []}, {"url": "http://www.example.com", "name": "5707", "refsource": "OSVDB", "tags": []}]}, "description": {"description_data": [{"lang": "en", "value": "asd"}]}}' , 'PDE-0001', 'asd', 'True', '1999-12-30T05:00Z', 'False', 'CWE-20', '[{"url": "http://www.example.com", "name": "http://www.example.com", "refsource": "CONFIRM", "tags": []}, {"url": "http://www.example.com", "name": "5707", "refsource": "OSVDB", "tags": []}]' , NULL, 'False', 'False', '{"baseMetricV2": {"cvssV2": {"version": "2.0", "vectorString": "AV:N/AC:L/Au:N/C:N/I:N/A:P", "accessVector": "NETWORK", "accessComplexity": "LOW", "authentication": "NONE", "confidentialityImpact": "NONE", "integrityImpact": "NONE", "availabilityImpact": "PARTIAL", "baseScore": 5.0}, "severity": "MEDIUM", "exploitabilityScore": 10.0, "impactScore": 2.9, "obtainAllPrivilege": false, "obtainUserPrivilege": false, "obtainOtherPrivilege": false, "userInteractionRequired": false}}' );
I'm getting the following error:
LINE 4: '[{"url": "http://www.example.com...
^
DETAIL: "[" must introduce explicitly-specified array dimensions.
My table schema:
CREATE TABLE public.vu
(
"ID" integer NOT NULL GENERATED ALWAYS AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
PDE json,
"PDE_ID" text COLLATE pg_catalog."default",
PDE_published boolean,
PDE_references json[],
PDE_impact json,
PDE_cwe text COLLATE pg_catalog."default",
PDE_valid boolean,
PDE_published_date date,
PDE_eval_has_ex boolean,
PDE_vu_products_has_inc_excl boolean,
PDE_vu_products json[],
PDE_description text COLLATE pg_catalog."default",
CONSTRAINT vu_pkey PRIMARY KEY ("ID")
)
What am I doing wrong? I tried to cast the array objects with json_array_elements(), but it didn't help.
You are trying to insert a single JSON value where the engine expects a PostgreSQL array of JSON (which is not the same thing as a JSON array!). Your column pde_references is defined as json[]. Try inserting the value as an array with a single element:
INSERT INTO public.vu
(...
pde_references,
...)
VALUES (...
ARRAY['[{"url": "http://www.example.com", "name": "http://www.example.com", "refsource": "CONFIRM", "tags": []}, {"url": "http://www.example.com", "name": "5707", "refsource": "OSVDB", "tags": []}]']::json[]
...);
The quotes around the Boolean values aren't necessary, by the way; you can just write true or false rather than 'true' or 'false'.
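If the column does not actually need to be a PostgreSQL array of JSON documents, an alternative worth considering (an assumption about the intent, not something from the question) is to declare it as plain jsonb and store the JSON array directly; a small self-contained sketch with a hypothetical demo table:
CREATE TABLE demo_refs (id int PRIMARY KEY, refs jsonb);
INSERT INTO demo_refs VALUES
(1, '[{"url": "http://www.example.com", "name": "http://www.example.com", "refsource": "CONFIRM", "tags": []}]'::jsonb);
-- element access then works with the ordinary JSON operators:
SELECT refs->0->>'url' AS first_url FROM demo_refs;  -- http://www.example.com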

recursively flatten a nested jsonb in postgres with unknown depth and unknown key fields

How can I recursively flatten a nested jsonb in Postgres when I don't know the depth or the fields at each depth? (see example below)
A PostgreSQL query to do the flattening would be much appreciated.
{
"xx": "",
"xx": "",
"form": "xxx",
"type": "",
"content_type": "xxx",
"reported_date": ,
"contact": {
"imported_date": "",
"name": "",
"phone": "",
"alternate_phone": "",
"specialization": "",
"type": "",
"reported_date": ,
"parent": {
"_id": "xxx",
"_rev": "xxx",
"parent": "",
"type": "xxx"
}
}
}
I have searched on Stack Overflow, but the existing answers only consider jsonb values with a single level of depth where the keys are already known in advance.
Example setup:
create table my_table(id int, data jsonb);
insert into my_table values
(1,
$${
"type": "a type",
"form": "a form",
"contact": {
"name": "a name",
"phone": "123-456-78",
"type": "contact type",
"parent": {
"id": "444",
"type": "parent type"
}
}
}$$);
The recursive query executes jsonb_each() for every JSON object found at any level. The new key names contain the full path from the root:
with recursive flat (id, key, value) as (
select id, key, value
from my_table,
jsonb_each(data)
union
select f.id, concat(f.key, '.', j.key), j.value
from flat f,
jsonb_each(f.value) j
where jsonb_typeof(f.value) = 'object'
)
select id, jsonb_pretty(jsonb_object_agg(key, value)) as data
from flat
where jsonb_typeof(value) <> 'object'
group by id;
id | data
----+------------------------------------------
1 | { +
| "form": "a form", +
| "type": "a type", +
| "contact.name": "a name", +
| "contact.type": "contact type", +
| "contact.phone": "123-456-78", +
| "contact.parent.id": "444", +
| "contact.parent.type": "parent type"+
| }
(1 row)
If you want to get a flat view of this data you can use the function create_jsonb_flat_view() described in the answer to "Flatten aggregated key/value pairs from a JSONB field?"
You need to create a table (or view) with flattened jsonb:
create table my_table_flat as
-- create view my_table_flat as
with recursive flat (id, key, value) as (
-- etc as above
-- but without jsonb_pretty()
Now you can use the function on the table:
select create_jsonb_flat_view('my_table_flat', 'id', 'data');
select * from my_table_flat_view;
id | contact.name | contact.parent.id | contact.parent.type | contact.phone | contact.type | form | type
----+--------------+-------------------+---------------------+---------------+--------------+--------+--------
1 | a name | 444 | parent type | 123-456-78 | contact type | a form | a type
(1 row)
The solution works in Postgres 9.5+, as it uses jsonb functions introduced in that version. If your server version is older, it is highly recommended to upgrade Postgres anyway in order to use jsonb efficiently.

Tilestache - Mismatch between the coordinates present in the PostgreSQL/PostGIS and those returned by PostGeoJSON provider

This is the relevant bit of my TileStache config:
"points-of-interest":
{
"provider":
{
"class": "TileStache.Goodies.Providers.PostGeoJSON.Provider",
"kwargs":
{
"dsn": "dbname=database user=username host=localhost",
"query": "SELECT loc_id AS __id__, loc_name, geo2 AS __geometry__ FROM location",
"id_column": "__id__", "geometry_column": "__geometry__"
}
}
},
When I access -
http://127.0.0.1:8080/points-of-interest/0/0/0.json
I get the Response -
{
"type": "FeatureCollection",
"features": [
{
"geometry": {
"type": "Point",
"coordinates": [
-0.0008691850758236021,
0.0002956334943026654
]
},
"type": "Feature",
"properties": {
"loc_name": "TI Blvd, TX"
},
"id": 9
}
]}
The coordinates in the above response are -
"coordinates": [-0.0008691850758236021,0.0002956334943026654]
Whereas the actual coordinates in the database table are:
database=# SELECT loc_id AS __id__, loc_name, ST_AsText(geo2) AS __geometry__ FROM location;
__id__ | loc_name | __geometry__
--------+-------------+---------------------------
9 | TI Blvd, TX | POINT(-96.75724 32.90977)
What am I missing here? Why are my GeoJSON response coordinates different?
The table description is
Table "public.location"
Column | Type | Modifiers
----------+------------------------+-----------
loc_id | integer | not null
loc_name | character varying(70) |
geo2 | geometry(Point,900913) |
Indexes:
"location_pkey" PRIMARY KEY, btree (loc_id)
Thank you all in advance for all the help.
Inserting the point with SRID 4326 and transforming it to 900913 fixed the issue: the column's SRID is 900913 (units of metres), so inserting raw lon/lat degree values directly is what produced the tiny coordinates in the GeoJSON response.
This is the insert -
INSERT INTO location(loc_id, loc_name, geo2) VALUES (3, 'Manchester, NH', ST_Transform(ST_GeomFromText('POINT(-71.46259 42.99019)',4326), 900913));
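A quick way to sanity-check the stored geometry after the fix (a verification sketch, not part of the original answer) is to transform it back to lon/lat and compare it with the coordinates you inserted:
-- each row should come back close to the lon/lat it was inserted with
SELECT loc_id, loc_name, ST_AsText(ST_Transform(geo2, 4326)) AS lonlat
FROM location;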