How to convert a response from KSQL - UDF returning JSON array to columns - apache-kafka

I have a custom UDF called getCityStats(string city, double distance) which takes two arguments and returns an array of JSON strings (objects) as follows:
{"zipCode":"90921","mode":3.54}
{"zipCode":"91029","mode":7.23}
{"zipCode":"96928","mode":4.56}
{"zipCode":"90921","mode":6.54}
{"zipCode":"91029","mode":4.43}
{"zipCode":"96928","mode":3.96}
I would like to process them in a KSQL table-creation query as follows:
create table city_stats as
  select
    zipCode,
    avg(mode) as mode
  from (
    select
      getCityStats(city, distance) as (zipCode, mode)
    from
      city_data_stream
  ) t
  group by zipCode;
In other words, can KSQL handle a tuple type, where an array of JSON strings can be processed and returned as indicated above in a table-creation query?

No, KSQL doesn't currently support the syntax you're suggesting. Whilst KSQL can work with arrays, it doesn't yet support any kind of explode function, so you can only reference specific index points in the array.
Feel free to view, and upvote if appropriate, these issues: #527 and #1830, or indeed raise your own if they don't cover what you want to do.
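As a workaround, if the UDF can be declared to return ARRAY<VARCHAR>, you can pick out known positions by index and parse each element with EXTRACTJSONFIELD. This is only a minimal sketch: the intermediate stream name is hypothetical, it builds on the question's city_data_stream, and you should check whether array indexing is zero- or one-based in your KSQL version.
CREATE STREAM city_stats_raw AS
  SELECT getCityStats(city, distance) AS stats
  FROM city_data_stream;

SELECT EXTRACTJSONFIELD(stats[0], '$.zipCode') AS zipCode,
       CAST(EXTRACTJSONFIELD(stats[0], '$.mode') AS DOUBLE) AS mode
FROM city_stats_raw;
This only works when the positions you need are known up front; there is still no way to explode the whole array into rows, which is what the linked issues ask for.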

Related

Iterating a JSON array in PostgreSQL and filtering based on data

[{"name"="abc","type"="charts"},{"name"="def","type"="transactions"}]
The attachment column gives me this data, but I need to iterate over it and check whether type is present and whether type is charts or transactions; mainly we need to filter on it. Can someone help me with this, as I am new to Postgres?
It's unclear to me what exactly you want to filter.
If you just want to see rows that contain a specific type, you can use:
select *
from the_table
where attachment @> '[{"type": "transactions"}]';
This will return all rows that include at least one element with {"type": "transactions"} in the array.
This assumes that your attachment column is defined as jsonb. If it's not, you will need to cast it.
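For example, if attachment is a text column, a minimal sketch of the cast (same table and column names as above):
select *
from the_table
where attachment::jsonb @> '[{"type": "transactions"}]';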

How do I query PostgreSQL with IDs from a parquet file in a Data Factory pipeline

I have an Azure pipeline that moves data from one point to another in parquet files. I need to join some data from a PostgreSQL database that is in an AWS tenancy, matching on a unique ID. I am using a data flow to create the unique ID I need from two separate columns using a concatenation. I am trying to create a WHERE clause, e.g.
select * from tablename where unique_id in ('id1','id2','id3'...)
I can do a lookup query against the database, but I can't figure out how to turn the data flow output into a list of IDs in a parameter that I can use in the select statement. I tried using a Set Variable activity and was going to feed that into a ForEach, but the Set Variable doesn't like the output of the data flow (object instead of array): "The variable 'xxx' of type 'Array' cannot be initialized or updated with value of type 'Object'. The variable 'xxx' only supports values of types 'Array'." I've used a flatten to try to transform it to an array, but I think the sink operation is putting it back into JSON?
What is a workable approach to getting the IDs into a string that I can put into a lookup query?
Some notes:
The parquet file has a small number of unique IDs compared to the total unique IDs in the database.
If this were an Azure PostgreSQL database I could just use a join in the data flow, but the generic PostgreSQL driver isn't available in data flows. I can't copy the entire database over to Azure just to do the join, and I need the data flow in Azure for non-technical reasons.
Edit:
For clarity's sake, I am trying to replace local Python code that does the following:
query = "select * from mytable where id_number in "
df = pd.read_parquet("input_file.parquet")
df['id_number'] = df.country_code + df.id
df_other_data = pd.read_sql(conn, query + str(tuple(df.id_number))
I'd like to replace this locally executing code with ADF. In the ADF process, I have to replace the transformation of the IDs, which seems easy enough in a couple of different ways. Once I have the IDs in the proper format in a column in a dataset, I can't figure out how to query a database that isn't supported by Data Flow and restrict it to only the IDs I need, so I don't bring down the entire database.
ADF variables can only store simple types, so instead we can define an Array-type parameter in ADF and set a default value. ADF parameters support elements of any type, including complex JSON structures.
For example:
Define a JSON array:
[{"name": "Steve","id": "001","tt_1": 0,"tt_2": 4,"tt3_": 1},{"name": "Tom","id": "002","tt_1": 10,"tt_2": 8,"tt3_": 1}]
Define an Array-type parameter and set its default value:
That way we will not get any error.
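For reference, the end result this is working towards is the lookup query from the question, with the IN list expanded from the parameter's values; the ID values below are purely illustrative:
select * from tablename where unique_id in ('AU001', 'US002', 'GB003')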

Is it possible to Group By a field extracted from JSON input in a Siddhi Query?

I currently have a stream with one string attribute that contains a JSON event.
This stream receives different events, to which I want to apply JSON path expressions so I can use those attributes in filters and functions.
JsonPath extractors work like a charm in filters and selectors; unfortunately, I am not able to use them in the 'Group By' part.
I am actually doing it in an embedded Siddhi App with the siddhi-execution-json extension added manually, but for the discussion, so everybody can easily check and test it, I will paste an example app that works on WSO2 Stream Processor.
The objective looks like the following App:
#App:name("Group_by_json_attribute")
define stream JsonStream(json string);
#sink(type='log')
define stream LogStream(myField string, count long);
#info(name='query1')
from JsonStream#window.time(10 sec)
select json:getString(json, '$.myField') as myField, count() as count
group by myField having count > 1
insert into LogStream;
and it can accept the following events:
{"myField": "my_value"}
However, this query will raise the error:
Cannot find attribute type as 'myField' does not exist in 'JsonStream'; define stream JsonStream(json string)
I have also tried to use the JSON extractor directly in the 'Group by':
group by json:getString(json, '$.myField') as myField having count > 1
However the error now is:
mismatched input ':' expecting {',', ORDER, LIMIT, OFFSET, HAVING, INSERT, DELETE, UPDATE, RETURN, OUTPUT}
which seems not to expect an extension to be used here.
I am just wondering whether it is possible to group by attributes not directly defined in the input stream. In this case it is a field extracted from a JSON object, but it could be any other function that generates another attribute.
I am also using versions from the Maven Central repository:
Siddhi: io.siddhi:siddhi-core:5.0.1
siddhi-execution-json: io.siddhi.extension.execution.json:siddhi-execution-json:2.0.1
(Edit) Clarification
The objective is to use attributes that are not directly defined in the Stream in the Group By.
The reason why is that I currently have an embedded app which defines the whole set of input streams coming from external sources formatted as JSON, and there is also a set of output streams to inform external components when a query matches.
This app allows users to create custom queries on this set of predefined Streams, but they are not able to create Streams of their own.
Many thanks!
It seems we expect the group-by fields to come from the query's input stream, in this case JsonStream. Use another query before this one to do the extraction, and do the aggregation and filtering in the following query; the first query materializes myField as a real stream attribute, so the second query can group by it:
#App:name("Group_by_json_attribute")
define stream JsonStream(json string);
#sink(type='log')
define stream LogStream(myField string, count long);
#info(name='extract_stream')
from JsonStream
select json:getString(json, '$.myField') as myField
insert into ExtractedStream;
#info(name='query1')
from ExtractedStream#window.time(10 sec)
select myField, count() as count
group by myField
having count > 1
insert into LogStream;

PostgreSQL Search JSON And Match Whole Term

I have a Postgres database using JSON storage. I have a table of cameras and lenses with a single property to search against called BrandAndModel. The relevant JSON portion looks like this and is stored in a column called "data":
"BrandAndModel": "nikon nikkor 50mm f/1.4 ai-s"
I have a LIKE query running against this brand and model string, but it only returns a result if the sequence of characters matches. For instance, the above does get results for "nikkor 50mm" but NOT "nikon 50mm".
I'm no SQL expert and I'm not sure what I need to use to match more possible combinations.
My query looks like this
SELECT * FROM listing where data -> 'Product' ->> 'BrandAndModel' like '%nikon 50mm%'
How could I get this query to match "nikon 50mm"?
You may use ANY with an array for multiple comparisons.
LIKE ANY(ARRAY['%nikon%ai-s%', 'nikon%50mm%', '%nikkor%50mm%'])
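In the context of the query from the question, that predicate might be used like this; the patterns are only illustrative, so adjust them to the combinations you want to match:
SELECT *
FROM listing
WHERE data -> 'Product' ->> 'BrandAndModel' LIKE ANY (ARRAY['%nikon%50mm%', '%nikkor%50mm%', '%nikon%ai-s%']);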

Convert a varchar parameter containing CSV into column values in Postgres

I have a Postgres query with one input parameter of type varchar.
The value of that parameter is used in the WHERE clause.
Until now only a single value was sent to the query, but now we need to send multiple values so that they can be used with an IN clause.
Earlier:
value = 'abc'
where data = value -- current usage
Now:
value = 'abc,def,ghk'
where data in (value) -- intended usage
I tried many ways i.e. providing value as
value='abc','def','ghk'
Or
value="abc","def","ghk" etc.
But none of them is working and the query is not returning any results, even though there is some matching data available. If I provide the values directly in the IN clause, I do see the data.
I think I should somehow split the parameter, which is a comma-separated string, into multiple values, but I am not sure how to do that.
Please note it's a Postgres DB.
You can try to split the input string into an array, something like this:
where data = ANY(string_to_array('abc,def,ghk',','))
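A minimal sketch of the full query, assuming the column is data as in the question and the parameter arrives as the single comma-separated string 'abc,def,ghk'; the table name is hypothetical:
select *
from my_table
where data = ANY(string_to_array('abc,def,ghk', ','));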