How to parse an array of structs when creating a stream in KSQL - apache-kafka

I'm trying to access a key called session_id. Here is what the JSON of my message looks like:
{
  "events": [
    {
      "data": {
        "session_id": "-8479590123628083695"
      }
    }
  ]
}
This is my KSQL code to create the stream:
CREATE STREAM stream_test(
events ARRAY< data STRUCT< session_id varchar> >
) WITH (KAFKA_TOPIC='my_topic',
VALUE_FORMAT='JSON')
But I get this error:
KSQLError: ("line 4:22: mismatched input 'STRUCT' expecting {'(', 'ARRAY', '>'}", 40001, None)
Does anyone know how to unpack this kind of structure? I'm struggling to find examples.

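My solution: ARRAY< ... > takes only an element type, not a named field, so each element of events has to be declared as a STRUCT that contains the data field:
events ARRAY< STRUCT<data STRUCT< session_id varchar >> >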
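To see why the element type has to be a STRUCT, it can help to derive the declaration from a sample message. The Python sketch below uses a hypothetical helper, ksql_type, that maps a parsed JSON value to a KSQL type string. It is only an illustration, not something KSQL provides, and it assumes every array element has the same shape as the first one and maps all integers to BIGINT.

```python
import json


def ksql_type(value):
    """Map a sample JSON value to a KSQL type string (hypothetical helper, rough sketch)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "VARCHAR"
    if isinstance(value, list):
        if not value:
            raise ValueError("cannot infer the element type of an empty array")
        # ARRAY<...> takes only an element type, never a named field,
        # so an array of objects becomes ARRAY<STRUCT<...>>.
        return f"ARRAY<{ksql_type(value[0])}>"
    if isinstance(value, dict):
        fields = ", ".join(f"{name} {ksql_type(v)}" for name, v in value.items())
        return f"STRUCT<{fields}>"
    raise TypeError(f"unsupported JSON value: {value!r}")


message = '{"events":[{"data":{"session_id":"-8479590123628083695"}}]}'
events = json.loads(message)["events"]
print("events", ksql_type(events))
# prints: events ARRAY<STRUCT<data STRUCT<session_id VARCHAR>>>
```

The printed type has the same nesting as the working declaration: the anonymous STRUCT is the array element, and data is a field inside it.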

Related

Inserting the result of a function in Cassandra with JSON

I am using Kafka Connect to sink data from a Kafka topic into a Cassandra table. I want to add a column holding the timestamp at which the insert/update happens in Cassandra. That is easy in Postgres with functions and triggers, but the same approach is not going to work in Cassandra, because I cannot add Java code to the Cassandra cluster.
So I am thinking about injecting a now() call at some point in the Kafka Connect pipeline, so that the result of executing the function is inserted into the Cassandra table.
I read that the Kafka Connect Cassandra sink uses Cassandra's JSON insert API:
https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useInsertJSON.html
I tried inserting with different formats, but nothing worked:
INSERT INTO keyspace1.table1 JSON '{ "lhlt" : "key 1", "last_updated_on_cassandra" : now() }';
InvalidRequest: Error from server: code=2200 [Invalid query] message="Could not decode JSON string as a map: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'now': was expecting 'null', 'true', 'false' or NaN
at [Source: { "lhlt" : "key 1", "last_updated_on_cassandra" : now() }; line: 1, column: 74]. (String was: { "lhlt" : "key 1", "last_updated_on_cassandra" : now() })"
The CQL grammar does not support the use of functions when inserting data in JSON format.
The INSERT INTO ... JSON command only accepts valid JSON. For example:
INSERT INTO tstamp_tbl JSON '{
"id":1,
"tstamp":"2022-10-29"
}';
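The same distinction can be checked with any JSON parser. This Python sketch uses the standard json module (not the Jackson parser Cassandra uses) to show that a bare now() is rejected as an invalid value, while a document containing only literal values parses fine:

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# A function call is not a JSON value, so this cannot be decoded as a map.
with_function = '{ "lhlt" : "key 1", "last_updated_on_cassandra" : now() }'
# Only literal values (strings, numbers, booleans, null, arrays, objects) are allowed.
with_literal = '{ "id": 1, "tstamp": "2022-10-29" }'

print(is_valid_json(with_function))  # False
print(is_valid_json(with_literal))   # True
```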
You will need to use the more generic INSERT syntax if you want to call CQL functions. For example:
INSERT INTO tstamp_tbl (id, tstamp)
VALUES ( 1, toTimestamp(now()) );
Cheers!

ksqlDB stream created with EXTRACTJSONFIELD contains null values for the extracted fields

Hi everyone. I am trying to create a stream from another stream (using CREATE STREAM ... AS SELECT) that contains JSON data stored as VARCHAR. My intention was to use EXTRACTJSONFIELD to extract specific JSON fields into separate columns. But when I query the new stream, the fields created with EXTRACTJSONFIELD are null.
Example:
My payload looks like this:
{
  "data": {
    "some_value": 3702
  },
  "metadata": {
    "timestamp": "2022-05-23T23:54:38.477934Z",
    "table-name": "some_table"
  }
}
I create the stream using:
CREATE STREAM SOME_STREAM (data VARCHAR, metadata VARCHAR) WITH (KAFKA_TOPIC='SOME_TOPIC', VALUE_FORMAT='JSON');
Then I try to create a new stream using:
CREATE STREAM SOME_STREAM_QUERYABLE
AS SELECT
DATA,
METADATA,
EXTRACTJSONFIELD(METADATA, '$.timestamp') as timestamp
FROM SOME_STREAM
EMIT CHANGES;
Querying this new stream returns null for the timestamp field, even though the DATA and METADATA fields contain valid JSON.
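One way to narrow this down is to check, outside ksqlDB, whether the exact string stored in METADATA parses as JSON and contains the key. The Python sketch below uses a hypothetical helper, extract_top_level, that handles only a simple '$.key' path. It is not ksqlDB's implementation of EXTRACTJSONFIELD, just a stand-in for debugging; it returns None when the string is not valid JSON or the key is missing.

```python
import json


def extract_top_level(json_text, path):
    """Rough stand-in for a '$.key' lookup (hypothetical helper, not ksqlDB code)."""
    if not path.startswith("$."):
        raise ValueError("only simple '$.key' paths are supported")
    key = path[2:]
    try:
        doc = json.loads(json_text)
    except (json.JSONDecodeError, TypeError):
        return None  # input is not parseable JSON text
    if not isinstance(doc, dict):
        return None
    return doc.get(key)  # None if the key is missing


metadata = '{"timestamp": "2022-05-23T23:54:38.477934Z", "table-name": "some_table"}'
print(extract_top_level(metadata, "$.timestamp"))
# A string that merely looks like a map is not JSON, so nothing can be extracted.
print(extract_top_level("{timestamp=2022-05-23T23:54:38.477934Z}", "$.timestamp"))
```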

ksqlDB: get the message's value as a string column from a JSON topic

There is a topic containing plain JSON messages. I'm trying to create a new stream that extracts a few columns from the JSON and adds another VARCHAR column holding the whole message value.
Here is a sample message in topic_json:
{
"db": "mydb",
"collection": "collection",
"op": "update"
}
Creating a stream like this:
CREATE STREAM test1 (
db VARCHAR,
collection VARCHAR,
VAL STRING
) WITH (
KAFKA_TOPIC = 'topic_json',
VALUE_FORMAT = 'JSON'
);
The output of this stream contains only the db and collection columns. How can I add another column holding the message's whole value, i.e. "{\"db\":\"mydb\",\"collection\":\"collection\",\"op\":\"update\"}"?
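For reference, the desired VAL value is the original message re-serialized compactly; the escaped form in the question is that same text written as a quoted string literal. This Python sketch (standard json module only, unrelated to ksqlDB's own serializer) reproduces both forms:

```python
import json

message = {"db": "mydb", "collection": "collection", "op": "update"}

# Compact serialization: no spaces after ',' and ':'; dicts keep insertion order.
val = json.dumps(message, separators=(",", ":"))
print(val)              # {"db":"mydb","collection":"collection","op":"update"}

# Encoding that text again as a JSON string gives the escaped form from the question.
print(json.dumps(val))  # "{\"db\":\"mydb\",\"collection\":\"collection\",\"op\":\"update\"}"
```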

How to create ksqlDB stream fields from a nested JSON object

I have a topic over which I am sending JSON in the following format:
{
  "schema": {
    "type": "string",
    "optional": true
  },
  "payload": "CustomerData{version='1', customerId='76813432', phone='76813432'}"
}
and I would like to create a stream with customerId and phone, but I am not sure how to define the stream in terms of a nested JSON object.
CREATE STREAM customer (
payload.version VARCHAR,
payload.customerId VARCHAR,
payload.phone VARCHAR
) WITH (
KAFKA_TOPIC='customers',
VALUE_FORMAT='JSON'
);
Would it be something like that? How do I dereference the nested object when defining the stream's fields?
Actually, the above does not work for field definitions; it says:
Caused by: line 2:12:
extraneous input '.' expecting {'EMIT', 'CHANGES',
'INTEGER', 'DATE', 'TIME', 'TIMESTAMP', 'INTERVAL', 'YEAR', 'MONTH', 'DAY',
Applying the EXTRACTJSONFIELD function
There is a ksqlDB function called EXTRACTJSONFIELD that you can use.
First, you need to extract the schema and payload fields:
CREATE STREAM customer (
schema VARCHAR,
payload VARCHAR
) WITH (
KAFKA_TOPIC='customers',
VALUE_FORMAT='JSON'
);
Then you can select the nested fields within the JSON:
SELECT EXTRACTJSONFIELD(payload, '$.version') AS version FROM customer EMIT CHANGES;
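However, your payload is not valid JSON: it looks like the toString() output of a CustomerData object (key='value' pairs), so EXTRACTJSONFIELD cannot extract fields from it as it stands.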
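To confirm this, the payload string from the question can be fed to a JSON parser. The Python sketch below uses the standard json module and assumes the curly quotes in the question were meant as straight quotes: the outer message parses, but the inner payload string does not.

```python
import json


def parses_as_json(text):
    """Return True if text is valid JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# The outer message (with straight quotes) is valid JSON ...
raw = """{"schema": {"type": "string", "optional": true},
 "payload": "CustomerData{version='1', customerId='76813432', phone='76813432'}"}"""
payload = json.loads(raw)["payload"]

# ... but the payload string itself is a toString()-style rendering, not JSON.
print(payload)
print(parses_as_json(payload))  # False
```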
Applying a STRUCT schema
If your entire payload is encoded as a JSON object, i.e. your data looks like:
{
  "schema": {
    "type": "string",
    "optional": true
  },
  "payload": {
    "version": "1",
    "customerId": "76813432",
    "phone": "76813432"
  }
}
you can define a STRUCT as below:
CREATE STREAM customer (
schema STRUCT<
type VARCHAR,
optional BOOLEAN>,
payload STRUCT<
version VARCHAR,
customerId VARCHAR,
phone VARCHAR>
)
WITH (
KAFKA_TOPIC='customers',
VALUE_FORMAT='JSON'
);
Finally, individual fields can be referenced like this:
CREATE STREAM customer_analysis AS
SELECT
payload->version as VERSION,
payload->customerId as CUSTOMER_ID,
payload->phone as PHONE
FROM customer
EMIT CHANGES;
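As a quick cross-check outside ksqlDB, the Python sketch below (standard json module only) parses a message shaped like the corrected example and walks the same nested path that each payload->field reference follows in the query:

```python
import json

message = """{
  "schema": {"type": "string", "optional": true},
  "payload": {"version": "1", "customerId": "76813432", "phone": "76813432"}
}"""

doc = json.loads(message)
# Each '->' in the ksqlDB query corresponds to one nested key lookup here.
row = {
    "VERSION": doc["payload"]["version"],
    "CUSTOMER_ID": doc["payload"]["customerId"],
    "PHONE": doc["payload"]["phone"],
}
print(row)
```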

Updating an array-of-objects field in Crate

I created a table with the following syntax:
create table poll(poll_id string primary key,
poll_type_id integer,
poll_rating array(object as (rating_id integer,fk_user_id string, israted_image1 integer, israted_image2 integer, updatedDate timestamp, createdDate timestamp )),
poll_question string,
poll_image1 string,
poll_image2 string
)
Then I inserted a record without the poll_rating field, which is an array of objects.
Now, when I try to update poll_rating with the following command:
update poll set poll_rating = [{"rating_id":1,"fk_user_id":-1,"israted_image1":1,"israted_image2":0,"createddate":1400067339.0496}] where poll_id = "f748771d7c2e4616b1865f37b7913707";
I'm getting an error message like this:
"SQLParseException[line 1:31: no viable alternative at input '[']; nested: ParsingException[line 1:31: no viable alternative at input '[']; nested: NoViableAltException;"
Can anyone tell me why I get this error when I try to update the array-of-objects field?
Defining arrays and objects directly in an SQL statement is currently not supported by our SQL parser; please use parameter substitution with placeholders instead, as described here:
https://crate.io/docs/current/sql/rest.html
Here is an example using curl:
curl -sSXPOST '127.0.0.1:4200/_sql?pretty' -d@- <<- EOF
{"stmt": "update poll set poll_rating = ? where poll_id = ?",
"args": [ [{"rating_id":1,"fk_user_id":-1,"israted_image1":1,"israted_image2":0,"createddate":1400067339.0496}], "f748771d7c2e4616b1865f37b7913707" ]
}
EOF
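The same parameterized request can be built from a program instead of curl. This Python sketch uses only the standard library; build_sql_body and post_sql are hypothetical helper names, and post_sql assumes a CrateDB node listening at 127.0.0.1:4200 as in the curl example. The point is that the array of objects travels as a JSON value in "args", bound to a ? placeholder, rather than as a literal inside the SQL text.

```python
import json
import urllib.request


def build_sql_body(stmt, args):
    """Build an HTTP _sql request body that passes values as '?' parameters."""
    return json.dumps({"stmt": stmt, "args": args})


def post_sql(body, url="http://127.0.0.1:4200/_sql"):
    """POST the body to the _sql endpoint (assumes a reachable CrateDB node)."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


rating = [{"rating_id": 1, "fk_user_id": -1, "israted_image1": 1,
           "israted_image2": 0, "createddate": 1400067339.0496}]
body = build_sql_body(
    "update poll set poll_rating = ? where poll_id = ?",
    [rating, "f748771d7c2e4616b1865f37b7913707"],
)
print(body)
# post_sql(body)  # uncomment when a CrateDB node is running
```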