Insert the evaluation of a function in Cassandra with JSON - apache-kafka

I am using Kafka Connect to sink data from a Kafka topic to a Cassandra table. I want to add a column with a timestamp of when the insert/update happens in Cassandra. That is easy in Postgres with functions and triggers, but that approach is not available in Cassandra: I cannot add Java code to the Cassandra cluster.
So I am thinking about how to inject a now() at some point in the Kafka Connect pipeline, so that the value written to the Cassandra table is the result of evaluating the function.
I read that the Kafka Connect Cassandra sink uses the Cassandra JSON INSERT API:
https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useInsertJSON.html
I tried inserting with different formats, but nothing worked:
INSERT INTO keyspace1.table1 JSON '{ "lhlt" : "key 1", "last_updated_on_cassandra" : now() }';
InvalidRequest: Error from server: code=2200 [Invalid query] message="Could not decode JSON string as a map: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'now': was expecting 'null', 'true', 'false' or NaN
at [Source: { "lhlt" : "key 1", "last_updated_on_cassandra" : now() }; line: 1, column: 74]. (String was: { "lhlt" : "key 1", "last_updated_on_cassandra" : now() })"

The CQL grammar does not support the use of functions when inserting data in JSON format.
The INSERT INTO ... JSON command only accepts valid JSON. For example:
INSERT INTO tstamp_tbl JSON '{
"id":1,
"tstamp":"2022-10-29"
}';
You will need to use the more generic INSERT syntax if you want to call CQL functions. For example:
INSERT INTO tstamp_tbl (id, tstamp)
VALUES ( 1, toTimestamp(now()) );
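Applied to the table from the question (using the column names from the original attempt), that would be:
INSERT INTO keyspace1.table1 (lhlt, last_updated_on_cassandra)
VALUES ( 'key 1', toTimestamp(now()) );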
Cheers!
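Following up on the Kafka Connect angle from the question, here is a minimal sketch of another option, assuming the sink connector supports single message transforms (SMTs): the built-in InsertField transform can add the Kafka record's timestamp to each value before it reaches Cassandra. The transform alias below is illustrative; add it to the connector configuration:
"transforms": "addUpdatedOn",
"transforms.addUpdatedOn.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.addUpdatedOn.timestamp.field": "last_updated_on_cassandra"
Note this stamps the record's Kafka timestamp rather than the moment of the Cassandra write, which may or may not be close enough for your use case.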

Related

ksqldb stream created with EXTRACTJSONFIELD fields contains null values for these fields

Hi everyone. I am trying to create a stream from another stream (using CREATE STREAM ... AS SELECT) that contains JSON data stored as VARCHAR. My first intention was to use EXTRACTJSONFIELD to extract specific JSON fields into separate columns. But when I query the new stream, I get null values for the fields created with EXTRACTJSONFIELD.
Example:
My payload looks like this:
{
  "data": {
    "some_value": 3702,
  },
  "metadata": {
    "timestamp": "2022-05-23T23:54:38.477934Z",
    "table-name": "some_table"
  }
}
I create stream using
CREATE STREAM SOME_STREAM (data VARCHAR, metadata VARCHAR) WITH (KAFKA_TOPIC='SOME_TOPIC', VALUE_FORMAT='JSON');
Then I try to create a new stream using:
CREATE STREAM SOME_STREAM_QUERYABLE
AS SELECT
DATA,
METADATA,
EXTRACTJSONFIELD(METADATA, '$.timestamp') as timestamp
FROM SOME_STREAM
EMIT CHANGES;
Querying this new stream gives me null for the timestamp field. The data and metadata fields contain valid JSON.
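Not part of the original post, but one way to isolate whether the problem is the function or the stored data is to run EXTRACTJSONFIELD against a literal copy of the payload in a transient query:
SELECT EXTRACTJSONFIELD('{"timestamp": "2022-05-23T23:54:38.477934Z"}', '$.timestamp') AS ts
FROM SOME_STREAM EMIT CHANGES LIMIT 1;
If that returns the value, the function is fine and the stored METADATA string is what needs inspecting (EXTRACTJSONFIELD returns NULL when the string it is given is not valid JSON).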

How to parse array of struct when creating stream in KSQL

I'm trying to access a key called session_id. Here is what the JSON of my message looks like:
{
  "events": [
    {
      "data": {
        "session_id": "-8479590123628083695"
      }
    }
  ]
}
This is my KSQL code to create the stream:
CREATE STREAM stream_test(
events ARRAY< data STRUCT< session_id varchar> >
) WITH (KAFKA_TOPIC='my_topic',
VALUE_FORMAT='JSON')
But I get this error:
KSQLError: ("line 4:22: mismatched input 'STRUCT' expecting {'(', 'ARRAY', '>'}", 40001, None)
Does anyone know how to unpack this kind of structure? I'm struggling to find examples.
My solution:
events ARRAY< STRUCT<data STRUCT< session_id varchar >> >
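Expanded into the full statement (a sketch using the topic and field names from the question; note that ksqlDB arrays are 1-indexed and struct fields are dereferenced with ->):
CREATE STREAM stream_test (
  events ARRAY<STRUCT<data STRUCT<session_id VARCHAR>>>
) WITH (KAFKA_TOPIC='my_topic', VALUE_FORMAT='JSON');

SELECT events[1]->data->session_id AS session_id FROM stream_test EMIT CHANGES;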

ksqlDB get message's value as string column from Json topic

There is a topic containing plain JSON messages. I am trying to create a new stream by extracting a few columns from the JSON, plus another VARCHAR column holding the whole message value.
Here is a sample message in a topic_json
{
"db": "mydb",
"collection": "collection",
"op": "update"
}
Creating a stream like -
CREATE STREAM test1 (
  db VARCHAR,
  collection VARCHAR,
  VAL STRING
) WITH (
  KAFKA_TOPIC = 'topic_json',
  VALUE_FORMAT = 'JSON'
);
The output of this stream will contain only the db and collection columns. How can I add another column containing the message's raw value, i.e. "{\"db\":\"mydb\",\"collection\":\"collection\",\"op\":\"update\"}"?
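Not from the original thread, but one known workaround (assuming you can define a second stream over the same topic): register the topic again with VALUE_FORMAT='KAFKA' and a single VARCHAR column. The KAFKA format deserializes the whole message value as a primitive string, so the column ends up holding the raw JSON text:
CREATE STREAM test1_raw (
  VAL VARCHAR
) WITH (
  KAFKA_TOPIC = 'topic_json',
  VALUE_FORMAT = 'KAFKA'
);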

Can't insert string to Delta Table using Update in Pyspark

I have encountered an issue where it will not allow me to insert a string using update, and it returns an error. I'm running Databricks Runtime 6.5 (includes Apache Spark 2.4.5, Scala 2.11), and it is not working on the 6.4 runtime either.
I have a delta table with the following columns, partitioned by the created date
ID string
, addressLineOne string
, addressLineTwo string
, addressLineThree string
, addressLineFour string
, matchName string
, createdDate
And I'm running a process that hits an API and updates the matchName column.
Using PySpark, if I do this just to test writing:
deltaTable.update(col("ID") == "ABC123", {"matchName ": "example text"})
I get the following error:
Py4JJavaError: An error occurred while calling o1285.update.
: org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'example
If I change the string to "123", it updates without an issue:
deltaTable.update(col("ID") == "ABC123", {"matchName ": "123"})
Yet if I use SQL and do
UPDATE myTable SET matchName = "Some text" WHERE ID = "ABC123"
It inserts fine. I've searched and can't find a similar issue. Any suggestions? Have I missed something obvious?
Looks like you have an extra space after matchName in your Python code.
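Worth noting as well (not from the original answer): in the Python API, plain string values in the set map are parsed as SQL expression strings, which is why "123" worked (it parses as a numeric literal) while "example text" did not (it parses as an unresolved identifier, matching the 'example in the error). Passing the value through lit(), or quoting it as a SQL string literal, sidesteps that:
from pyspark.sql.functions import col, lit

# Pass a Column literal instead of a bare string, so the text
# is treated as a value rather than parsed as a SQL expression
deltaTable.update(col("ID") == "ABC123", {"matchName": lit("example text")})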

Updating an array of objects fields in crate

I created a table with the following syntax:
create table poll(poll_id string primary key,
poll_type_id integer,
poll_rating array(object as (rating_id integer,fk_user_id string, israted_image1 integer, israted_image2 integer, updatedDate timestamp, createdDate timestamp )),
poll_question string,
poll_image1 string,
poll_image2 string
)
And I inserted a record without the "poll_rating" field, which is an array of objects.
Now when I try to update a poll_rating with the following commands:
update poll set poll_rating = [{"rating_id":1,"fk_user_id":-1,"israted_image1":1,"israted_image2":0,"createddate":1400067339.0496}] where poll_id = "f748771d7c2e4616b1865f37b7913707";
I'm getting an error message like this:
"SQLParseException[line 1:31: no viable alternative at input '[']; nested: ParsingException[line 1:31: no viable alternative at input '[']; nested: NoViableAltException;"
Can anyone tell me why I get this error when I try to update the array of object fields?
Defining arrays and objects directly in a SQL statement is currently not supported by our SQL parser; please use parameter substitution with placeholders instead, as described here:
https://crate.io/docs/current/sql/rest.html
An example using curl is below:
curl -sSXPOST '127.0.0.1:4200/_sql?pretty' -d@- <<- EOF
{"stmt": "update poll set poll_rating = ? where poll_id = ?",
"args": [ [{"rating_id":1,"fk_user_id":-1,"israted_image1":1,"israted_image2":0,"createddate":1400067339.0496}], "f748771d7c2e4616b1865f37b7913707" ]
}
EOF
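The same placeholder substitution works from client libraries too; for example, with the crate Python driver (an illustration, not part of the original answer):
from crate import client

# Connect to the CrateDB HTTP endpoint
connection = client.connect('http://127.0.0.1:4200')
cursor = connection.cursor()

# Placeholders keep the array/object out of the SQL text itself
cursor.execute(
    "update poll set poll_rating = ? where poll_id = ?",
    ([{"rating_id": 1, "fk_user_id": -1, "israted_image1": 1,
       "israted_image2": 0, "createddate": 1400067339.0496}],
     "f748771d7c2e4616b1865f37b7913707"),
)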