I want to reformat the records in my topic. The key is currently the plain string 1-2MDWE7JT, and I want to convert it into JSON: key: {"some_id" : "1-2MDWE7JT"}.
How can I wrap a field into JSON with ksqlDB functions? I didn't find a reverse of "EXTRACTJSONFIELD", something like an "INCLUDEJSONFIELD".
If the data is written as key: 1-2MDWE7JT in the input topic, you can only read it this way, i.e., the schema would be a plain VARCHAR type. You could wrap the key into a STRUCT dynamically at query time, though:
SELECT STRUCT(f1 := v1, f2 := v2) FROM s1;
Cf https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/select-pull-query/#struct-output
Thus, you would create an output STREAM that contains the key wrapped the way you want (if you use the JSON key format).
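A minimal sketch of what that could look like, assuming a recent ksqlDB version with JSON key support; the stream, topic, and column names here are placeholders, not from the question:

```sql
-- sketch (assumes a recent ksqlDB with KEY_FORMAT='JSON' support;
-- stream/column names are placeholders)
CREATE STREAM input (k VARCHAR KEY, v VARCHAR)
  WITH (kafka_topic='input_topic', value_format='JSON');

-- wrap the plain string key into a STRUCT, serialized as JSON
CREATE STREAM output
  WITH (kafka_topic='output_topic', key_format='JSON', value_format='JSON') AS
  SELECT *
  FROM input
  PARTITION BY STRUCT(some_id := k);
```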
I need to copy messages from one Kafka topic to another based on a specific JSON property: if the property value is "A", copy the message; otherwise do not. I'm trying to figure out the simplest way to do it with KSQL. My source messages all have my test property, but otherwise have very different and complex schemas. Is there a way to have a "schemaless" setup for this?
Source message (example):
{
"data": {
"propertyToCheck": "value",
... complex structure ...
}
}
If I define my "data" as VARCHAR in the stream I can examine the property further on with EXTRACTJSONFIELD.
CREATE OR REPLACE STREAM Test1 (
`data` VARCHAR
)
WITH (
kafka_topic = 'Source_Topic',
value_format = 'JSON'
);
In this case, however, my "select" stream will produce the data as a JSON string instead of raw JSON (which is what I want).
CREATE OR REPLACE STREAM Test2 WITH (
kafka_topic = 'Target_Topic',
value_format = 'JSON'
) AS
SELECT
`data` AS `data`
FROM Test1
EMIT CHANGES;
Any ideas how to make this work?
This is a bit of a workaround, but you can achieve your desired behavior as follows: instead of defining your message schema as VARCHAR, use the BYTES type instead. Then use FROM_BYTES in combination with EXTRACTJSONFIELD to read the property you'd like to filter on from the bytes representation.
Here's an example:
Here's a source stream, with nested JSON data, and one example row of data:
CREATE STREAM test (data STRUCT<FOO VARCHAR, BAR VARCHAR>) with (kafka_topic='test', value_format='json', partitions=1);
INSERT INTO test (data) VALUES (STRUCT(FOO := 'foo', BAR := 'bar'));
Now, represent the data as bytes (using the KAFKA format), instead of as JSON:
CREATE STREAM test_bytes (data BYTES) WITH (kafka_topic='test', value_format='kafka');
Next, perform the filter based on the nested JSON data:
CREATE STREAM test_filtered_bytes WITH (kafka_topic='test_filtered') AS SELECT * FROM test_bytes WHERE extractjsonfield(from_bytes(data, 'utf8'), '$.DATA.FOO') = 'foo';
The newly created topic "test_filtered" now has data in proper JSON format, analogous to the source stream "test". We can verify by representing the stream in the original format and reading it back to check:
CREATE STREAM test_filtered (data STRUCT<FOO VARCHAR, BAR VARCHAR>) WITH (kafka_topic='test_filtered', value_format='json');
SELECT * FROM test_filtered EMIT CHANGES;
I verified that these example statements work for me as of the latest ksqlDB version (0.27.2). They should work the same on all ksqlDB versions ever since the BYTES type and relevant built-in functions were introduced.
Using ksqlDB scalar functions such as EXTRACTJSONFIELD or JSON_RECORDS might help you.
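For instance, a minimal sketch of the filter using EXTRACTJSONFIELD directly on the VARCHAR column from the Test1 stream in the question (the property name and comparison value follow the question; adjust to your data):

```sql
-- sketch: filter on a nested property while keeping the payload opaque;
-- `data` is the VARCHAR column from Test1 above
SELECT `data`
FROM Test1
WHERE EXTRACTJSONFIELD(`data`, '$.propertyToCheck') = 'A'
EMIT CHANGES;
```

Note the output value is still a JSON string rather than raw JSON, which is why the BYTES-based workaround above may be preferable.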
I have an issue with extracting text from string data using T-SQL. I have the following data and would like to extract the value between the second and third underscore:
sss_ss_d_a -> d
aaa_dd_b -> b
aaa_aa -> NULL
I know that there are a lot of similar topics, but I have a problem especially with handling NULLs and the situation where there is no delimiter after the desired value.
Thank you very much for your help, regards
Try it like this:
Some sample data in a declared table
DECLARE @tbl TABLE(YourString VARCHAR(100));
INSERT INTO @tbl VALUES('sss_ss_d_a'),('aaa_dd_b'),('aaa_aa');
--The query
SELECT t.YourString
,CAST(CONCAT('<x>',REPLACE(t.YourString,'_','</x><x>'),'</x>') AS XML).value('/x[3]/text()[1]','nvarchar(max)')
FROM @tbl t;
The idea in short:
We replace the underscores with XML tags thus transforming the string to castable XML.
We use XQuery within .value to pick the third element.
Starting with SQL Server 2016, the recommended approach uses JSON instead
(hint: we use "2" instead of "3" as JSON arrays are zero-based):
,JSON_VALUE(CONCAT('["',REPLACE(t.YourString,'_','","'),'"]'),'$[2]')
The idea is roughly the same.
General hint: Both approaches might need escaping forbidden characters, such as < or & in XML or " in JSON. But this is easy...
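The JSON-based variant written out in full against the same sample data (SQL Server 2016+; the column alias Part3 is my own naming):

```sql
-- same sample data as above, queried via the JSON approach (SQL Server 2016+)
DECLARE @tbl TABLE(YourString VARCHAR(100));
INSERT INTO @tbl VALUES('sss_ss_d_a'),('aaa_dd_b'),('aaa_aa');

-- 'sss_ss_d_a' -> 'd', 'aaa_dd_b' -> 'b', 'aaa_aa' -> NULL
SELECT t.YourString,
       JSON_VALUE(CONCAT('["', REPLACE(t.YourString, '_', '","'), '"]'), '$[2]') AS Part3
FROM @tbl t;
```

JSON_VALUE returns NULL when the requested index does not exist, which handles the 'aaa_aa' case for free.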
I just started using ksql. When I run PRINT <topic> FROM BEGINNING, I get data in the below format.
rowtime: 4/12/20, 9:00:05 AM MDT, key: {"messageId":null}, value: {"WHS":[{"Character Set":"UTF-8","action":"finished","Update-Date-Time":"2020-04-11 09:00:02:25","Number":0,"Abbr":"","Name":"","Name2":"","Country-Code":"","Addr-1":"","Addr-2":"","Addr-3":"","Addr-4":"","City":"","State":""}]}
But all the examples in KSQL have the data in the below format:
{"ROWTIME":1537436551210,"ROWKEY":"3375","rating_id":3375,"user_id":2,"stars":3,"route_id":6972,"rating_time":1537436551210,"channel":"web","message":"airport refurb looks great, will fly outta here more!"}
so I'm not able to perform any operations; the format is showing as
Key format: JSON or SESSION(KAFKA_STRING) or HOPPING(KAFKA_STRING) or TUMBLING(KAFKA_STRING) or KAFKA_STRING
Value format: JSON or KAFKA_STRING
on my topic. How can I modify the data into the specific format?
Thanks
ksqlDB does not yet support JSON message keys (see the tracking GitHub issue).
However, you can still access the data, both in the key and the value. The JSON key is just a string after all!
The value, when reformatted, looks like this:
{
"WHS":[
{
"Character Set":"UTF-8",
"action":"finished",
"Update-Date-Time":"2020-04-11 09:00:02:25",
"Number":0,
"Abbr":"",
"Name":"",
"Name2":"",
"Country-Code":"",
"Addr-1":"",
"Addr-2":"",
"Addr-3":"",
"Addr-4":"",
"City":"",
"State":""
}
]
}
Which, assuming all rows share a common format, ksqlDB can easily handle.
To import your stream you should be able to run something like this:
-- assuming ksqlDB v0.9
create stream stuff
(
ROWKEY STRING KEY,
WHS ARRAY<
STRUCT<
`Character Set` STRING,
action STRING,
`Update-Date-Time` STRING,
Number STRING,
... etc
>
>
)
WITH (kafka_topic='?', value_format='JSON');
The value column WHS is an array of structs (where there will be only one element), and the struct defines all the fields you need to access. Note, some field names needed quoting as they contained invalid characters, e.g. spaces and dashes.
I have a custom UDF called getCityStats(string city, double distance) which takes 2 arguments and returns an array of JSON strings (objects) as follows:
{"zipCode":"90921","mode":3.54}
{"zipCode":"91029","mode":7.23}
{"zipCode":"96928","mode":4.56}
{"zipCode":"90921","mode":6.54}
{"zipCode":"91029","mode":4.43}
{"zipCode":"96928","mode":3.96}
I would like to process them in a KSQL table creation query as
create table city_stats
as
select
zipCode,
avg(mode) as mode
from
(select
getCityStats(city,distance) as (zipCode,mode)
from
city_data_stream
) t
group by zipCode;
In other words can KSQL handle tuple type where an array of Json strings can be processed to return as indicated above in a table creation query?
No, KSQL doesn't currently support the syntax you're suggesting. While KSQL can work with arrays, it doesn't support any kind of explode function, so you can only reference specific index points in the array.
Feel free to view and, if appropriate, upvote these issues: #527, #1830, or indeed raise your own if they don't cover what you want to do.
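To illustrate the limitation, a hypothetical sketch of what is possible today: each array element has to be referenced by a fixed index and its JSON parsed field by field (this assumes getCityStats returns ARRAY<VARCHAR>; the indexing base depends on the KSQL version):

```sql
-- hypothetical sketch: no explode, so only a fixed index can be referenced;
-- assumes getCityStats returns ARRAY<VARCHAR> of JSON strings
SELECT EXTRACTJSONFIELD(getCityStats(city, distance)[1], '$.zipCode') AS zipCode,
       CAST(EXTRACTJSONFIELD(getCityStats(city, distance)[1], '$.mode') AS DOUBLE) AS mode
FROM city_data_stream;
```

This cannot fan one input row out into multiple output rows, so the aggregation you describe isn't expressible this way.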
I have a column in a Postgres database which contains JSON:
{"predict":[{"method":"A","val":1.2},{"method":"B","val":1.7}]}
I would like to extract both val as a separate column. Is there a way I could do this from within Postgres?
Postgres introduced JSON types plus functions and operators in 9.2. If your column is a JSON type you can use them to do your extraction.
If the json always has the structure you indicate (i.e. a key "predict" that holds an array with two JSON objects each having a "method" and a "val" key) then the solution is simply:
SELECT my_json->'predict'->0->>'val' AS method_a,
       my_json->'predict'->1->>'val' AS method_b
FROM my_table;
If the structure can vary then you'd have to tell us more about it to provide you with a solution.
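For example, if only the length of the "predict" array varies, you can unnest it into rows instead of fixed columns. A sketch, assuming Postgres 9.4+ and that my_json is jsonb (cast with my_json::jsonb if it is stored as json or text):

```sql
-- sketch: one output row per element of the "predict" array
SELECT p.elem->>'method'        AS method,
       (p.elem->>'val')::numeric AS val
FROM my_table
CROSS JOIN LATERAL jsonb_array_elements(my_json->'predict') AS p(elem);
```

This shape is robust to arrays of any length, at the cost of producing rows rather than the two named columns.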