ksqldb stream created with EXTRACTJSONFIELD fields contains null values for these fields - apache-kafka

everyone. I am trying to create a stream from another stream (using CREATE FROM SELECT) that contains JSON data stored as VARCHAR. My first intention was to use EXTRACTJSONFIELD to extract specific JSON fields to separate columns. But on an attempt to query a new stream I am getting the null values for fields created using EXTRACTJSONFIELD.
Example:
My payload looks like this:
{
"data": {
"some_value": 3702,
},
"metadata": {
"timestamp": "2022-05-23T23:54:38.477934Z",
"table-name": "some_table"
}
}
I create stream using
CREATE STREAM SOME_STREAM (data VARCHAR, metadata VARCHAR) WITH (KAFKA_TOPIC='SOME_TOPIC', VALUE_FORMAT='JSON');
Data inside looks like this:
Then I am trying to create new stream using:
CREATE STREAM SOME_STREAM_QUERYABLE
AS SELECT
DATA,
METADATA,
EXTRACTJSONFIELD(METADATA, '$.timestamp') as timestamp
FROM SOME_STREAM
EMIT CHANGES
The attempt to query this new stream gives me null for timestamp field. Data and metadata fields have valid json.

Related

How to parse array of struct when creating stream in KSQL

I'm trying to access a key called session ID, here is what the JSON of my message looks like:
{
"events":[
{
"data":{
"session_id":"-8479590123628083695"}}]}
This is my KSQL code to create the stream,
CREATE STREAM stream_test(
events ARRAY< data STRUCT< session_id varchar> >
) WITH (KAFKA_TOPIC='my_topic',
VALUE_FORMAT='JSON')
But I get this error,
KSQLError: ("line 4:22: mismatched input 'STRUCT' expecting {'(', 'ARRAY', '>'}", 40001, None)
Does anyone know how to unpack this kind of structure? I'm struggling to find examples
My solution:
events ARRAY< STRUCT<data STRUCT< session_id varchar >> >

ksqlDB get message's value as string column from Json topic

There is a topic, containing plain JSON messages, trying to create a new stream by extracting few columns from JSON and another column as varchar with the message's value.
Here is a sample message in a topic_json
{
"db": "mydb",
"collection": "collection",
"op": "update"
}
Creating a stream like -
CREATE STREAM test1 (
db VARCHAR,
collection VARCHAR
VAL STRING
) WITH (
KAFKA_TOPIC = 'topic_json',
VALUE_FORMAT = 'JSON'
);
Output of this stream will contain only db and collection columns, How can I add another column as message's value - "{\"db\":\"mydb\",\"collection\":\"collection\",\"op\":\"update\"}"

How Create KSQLdb Stream fields from nested JSON Object

I have a topic over which i am sending json in the following format:
{
"schema": {
"type": "string",
"optional": true
},
"payload": “CustomerData{version='1', customerId=‘76813432’, phone=‘76813432’}”
}
and I would like to create a stream with customerId, and phone but I am not sure how to define the stream in terms of a nested json object. (edited)
CREATE STREAM customer (
payload.version VARCHAR,
payload.customerId VARCHAR,
payload.phone VARCHAR
) WITH (
KAFKA_TOPIC='customers',
VALUE_FORMAT='JSON'
);
Would it be something like that? How do I de-reference the nested object in defining the streams field?
Actually the above does not work for field definitions it says:
Caused by: line 2:12:
extraneous input '.' expecting {'EMIT', 'CHANGES',
'INTEGER', 'DATE', 'TIME', 'TIMESTAMP', 'INTERVAL', 'YEAR', 'MONTH', 'DAY',
Applying function extractjsonfield
There is an ksqlDB function called extractjsonfield that you can use.
First, you need to extract the schema and payload fields:
CREATE STREAM customer (
schema VARCHAR,
payload VARCHAR
) WITH (
KAFKA_TOPIC='customers',
VALUE_FORMAT='JSON'
);
Then you can select the nested fields within the json:
SELECT EXTRACTJSONFIELD(payload, '$.version') AS version FROM customer;
However, it looks like your payload data does not have a valid JSON format.
Applying a STRUCT schema
If your entire payload is encoded as JSON string, which means your data looks like:
{
"schema": {
"type": "string",
"optional": true
},
"payload": {
"version"="1",
"customerId"="76813432",
"phone"="76813432"
}
}
you can define STRUCT as below:
CREATE STREAM customer (
schema STRUCT<
type VARCHAR,
optional BOOLEAN>,
payload STRUCT<
version VARCHAR,
customerId VARCHAR,
phone VARCHAR>
)
WITH (
KAFKA_TOPIC='customers',
VALUE_FORMAT='JSON'
);
and finally referencing individual fields can be done like this:
CREATE STREAM customer_analysis AS
SELECT
payload->version as VERSION,
payload->customerId as CUSTOMER_ID,
payload->phone as PHONE
FROM customer
EMIT CHANGES;

how to convert map<anydata> to json

In my CRUD Rest Service I do an insert into a DB and want to respond to the caller with the created new record. I am looking for a nice way to convert the map to json.
I am running on ballerina 0.991.0 and using a postgreSQL.
The return of the Update ("INSERT ...") is a map.
I tried with convert and stamp but i did not work for me.
import ballerinax/jdbc;
...
jdbc:Client certificateDB = new({
url: "jdbc:postgresql://localhost:5432/certificatedb",
username: "USER",
password: "PASS",
poolOptions: { maximumPoolSize: 5 },
dbOptions: { useSSL: false }
}); ...
var ret = certificateDB->update("INSERT INTO certificates(certificate, typ, scope_) VALUES (?, ?, ?)", certificate, typ, scope_);
// here is the data, it is map<anydata>
ret.generatedKeys
map should know which data type it is, right?
then it should be easy to convert it to json like this:
{"certificate":"{certificate:
"-----BEGIN
CERTIFICATE-----\nMIIFJjCCA...tox36A7HFmlYDQ1ozh+tLI=\n-----END
CERTIFICATE-----", typ: "mqttCertificate", scope_: "QARC", id_:
223}"}
Right now i do a foreach an build the json manually. Quite ugly. Maybe somebody has some tips to do this in a nice way.
It cannot be excluded that it is due to my lack of programming skills :-)
The return value of JDBC update remote function is sql:UpdateResult|error.
The sql:UpdateResult is a record with two fields. (Refer https://ballerina.io/learn/api-docs/ballerina/sql.html#UpdateResult)
UpdatedRowCount of type int- The number of rows which got affected/updated due to the given statement execution
generatedKeys of type map - This contains a map of auto generated column values due to the update operation (only if the corresponding table has auto generated columns). The data is given as key value pairs of column name and column value. So this map contains only the auto generated column values.
But your requirement is to get the entire row which is inserted by the given update function. It can’t be returned with the update operation if self. To get that you have to execute the jdbc select operation with the matching criteria. The select operation will return a table or an error. That table can be converted to a json easily using convert() function.
For example: Lets say the certificates table has a auto generated primary key column name ‘cert_id’. Then you can retrieve that id value using below code.
int generatedID = <int>updateRet.generatedKeys.CERT_ID;
Then use that generated id to query the data.
var ret = certificateDB->select(“SELECT certificate, typ, scope_ FROM certificates where id = ?”, (), generatedID);
json convertedJson = {};
if (ret is table<record {}>) {
var jsonConversionResult = json.convert(ret);
if (jsonConversionResult is json) {
convertedJson = jsonConversionResult;
}
}
Refer the example https://ballerina.io/learn/by-example/jdbc-client-crud-operations.html for more details.?

How to return a plain value from a Knex / Postgresql query?

I'm trying to return a simple, scalar string value from a Postgres DB using Knex. So far, everything I do returns a JSON object with a key (the column name) and the value, so I have to reach into the object to get the value. If I return multiple rows, then I get multiple JSON objects, each one repeating the key.
I could be returning multiple columns, in which case each row would at least need to be an array. I'm not looking for a special case where specifying a single column returns the value without the array -- I'm OK reaching into the array. I want to avoid the JSON object with the repetitive listing of column names as keys.
I've scoured the Knex docs but don't see how to control the output.
My table is a simple mapping table with two string columns:
CREATE TABLE public._suite
(
piv_id character(18) NOT NULL,
sf_id character(18) NOT NULL,
CONSTRAINT _suite_pkey PRIMARY KEY (piv_id)
)
When I build a query using Knex methods like
let myId = 'foo', table = '_suite';
return db(table).where('piv_id', myId).first(['sf_id'])
.then( function(id) { return(id); });
I get {"sf_id":"a4T8A0000009PsfUAE"} ; what I want is just "a4T8A0000009PsfUAE"
If I use a raw query, like
return db.raw(`select sf_id from ${table} where piv_id = '${myId}'`);
I get a much larger JSON object describing the result:
{"command":"SELECT","rowCount":1,"oid":null,"rows":[{"sf_id":"a4T8A0000009Q9HUAU"}],"fields":[{"name":"sf_id","tableID":33799,"columnID":2,"dataTypeID":1042,"dataTypeSize":-1,"dataTypeModifier":22,"format":"text"}],"_parsers":[null],"RowCtor":null,"rowAsArray":false}
What do I have to do to just get the value itself? (Again, I'm OK if it's in an array -- I just don't want the column names.)
Take a look at the pluck method.
db(table).where('piv_id', myId).pluck('sf_id'); // => will return you ["a4T8A0000009PsfUAE"]