Querying List in Snowflake - snowflake-schema

I have a table like the one below in Snowflake:
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|attrs |options |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|acct |[{"key":"spencer","val":"spencer"},{"key":"mart","val":"mart"},{"key":"red-fin","val":"-"}] |
|fav_activity |[{"key":"movie","val":"movie"},{"key":"books","val":"books"},{"key":"music","val":"music"},{"key":"swimming","val":"swimming"},{"key":"games","val":"games"},{"key":"team","val":"team"},{"key":"food","val":"food"},{"key":"steam-room","val":"\"steam room\""},{"key":"hiking","val":"hiking"}] |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Now I need to compare the key and val of each dictionary in the list and return only the keys of the dictionaries whose key and val differ.
For example, for the attrs value acct there are 3 dictionaries in the list. The key and val of the first two dictionaries are identical, but the third has different key and val values, so I need to return the key of the third dictionary, which is red-fin.
Expected output :
+---------------+------------+
|attrs |result |
+---------------+------------+
|acct |red-fin |
|fav_activity |steam-room |
+---------------+------------+
Can someone help me do this in Snowflake?
Thanks in advance.

You just need to flatten and compare.
This solves it for the example test case:
with data as (
select $1 attrs, parse_json($2) options
from values ('acct', '[{"key":"spencer","val":"spencer"},{"key":"mart","val":"mart"},{"key":"red-fin","val":"-"}]')
, ('fav_activity', '[{"key":"movie","val":"movie"},{"key":"books","val":"books"},{"key":"music","val":"music"},{"key":"swimming","val":"swimming"},{"key":"games","val":"games"},{"key":"team","val":"team"},{"key":"food","val":"food"},{"key":"steam-room","val":"\\"steam room\\""},{"key":"hiking","val":"hiking"}]')
)
select attrs, k.value:key
from data, table(flatten(options)) k
where k.value:key!=k.value:val
;
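
To make the flatten-and-compare logic easier to follow, here is a minimal Python sketch of the same idea. It is only a model of what the query does, not Snowflake code: json.loads stands in for FLATTEN, the fav_activity list is abbreviated, and the function name mismatched_keys is made up for illustration.

```python
import json

# Sample rows taken from the question (fav_activity abbreviated): (attrs, options as JSON text).
rows = [
    ("acct", '[{"key":"spencer","val":"spencer"},{"key":"mart","val":"mart"},{"key":"red-fin","val":"-"}]'),
    ("fav_activity", '[{"key":"movie","val":"movie"},{"key":"steam-room","val":"\\"steam room\\""},{"key":"hiking","val":"hiking"}]'),
]

def mismatched_keys(rows):
    """Yield (attrs, key) for every array element whose key differs from its val."""
    for attrs, options in rows:
        for item in json.loads(options):    # one iteration per array element, like FLATTEN
            if item["key"] != item["val"]:  # same test as the WHERE clause
                yield attrs, item["key"]

result = list(mismatched_keys(rows))
print(result)
```

Note that the comparison is a plain string inequality, so "steam-room" and the quoted value "\"steam room\"" count as different, which is what the expected output asks for.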

Related

How to convert nested json to data frame with kdb+

I am trying to get the data from cryptostats like below, it gives me back a nested json. I want it to be in a table format. How do I do that?
query:"https://api.cryptostats.community/api/v1/fees/oneDayTotalFees/2023-02-07";
raw:.Q.hg query;
res:.j.k raw;
To get the JSON file, use https://api.cryptostats.community/api/v1/fees/oneDayTotalFees/2023-02-07
To view the JSON in a table format, use https://jsongrid.com/json-grid
The final result should be a kdb+ table that has all the columns from the nested JSON output.
They are all dictionaries
q)distinct type each res[`data]
,99h
But they do not collapse to a table because they do not all have matching keys
q)distinct key each res[`data]
`id`bundle`results`metadata`errors
`id`bundle`results`metadata
Looking at a row where errors is populated we can see it is a dictionary
q)res[`data;0;`errors]
oneDayTotalFees| "Error executing oneDayTotalFees on compound: Date incomplete"
You can create a prototype dictionary with a blank errors key in it and join (,) each piece of data onto it. This results in uniform dictionaries, which are promoted to a table (type 98h):
q)table:(enlist[`errors]!enlist (`$())!()),/:res`data
q)type table
98h
Row which already had errors is unaffected:
q)table 0
errors | (,`oneDayTotalFees)!,"Error executing oneDayTotalFees..
id | "compound"
bundle | 0n
results | (,`oneDayTotalFees)!,0n
metadata| `source`icon`name`category`description`feeDescription;..
Row which previously did not have errors now has a valid empty dictionary
q)table 1
errors | (`symbol$())!()
id | "swapr-ethereum"
bundle | "swapr"
results | (,`oneDayTotalFees)!,24.78725
metadata| `category`name`icon`bundle`blockchain`description`feeDescription..
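
The prototype-join idea can also be illustrated outside q. The Python sketch below is only a model of the technique, not the q code above: the records are trimmed stand-ins for res`data, and the names prototype and uniform are made up for illustration.

```python
# Trimmed stand-ins for two entries of res`data: only the first has an "errors" key.
records = [
    {"id": "compound", "bundle": None,
     "errors": {"oneDayTotalFees": "Error executing oneDayTotalFees on compound: Date incomplete"}},
    {"id": "swapr-ethereum", "bundle": "swapr"},
]

# Prototype with an empty "errors" value; keys in the record override the prototype.
prototype = {"errors": {}}
uniform = [{**prototype, **record} for record in records]

for row in uniform:
    print(list(row), row["errors"])
```

After the merge every record has the same keys in the same order, which is the property that lets q promote a list of dictionaries to a table.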
https://kx.com/blog/kdb-q-insights-parsing-json-files/
https://code.kx.com/q/ref/join/
https://code.kx.com/q/kb/faq/#construction
https://code.kx.com/q/basics/datatypes/
https://code.kx.com/q/ref/maps/#each-left-and-each-right
If you want to explore nested objects you can index at depth (see blog post linked above). If you have many sparse keys leaving it like this is efficient for storage:
q)select tokenSymbol:metadata[::;`tokenSymbol] from table where not ""~/:metadata[::;`tokenSymbol]
tokenSymbol
-----------
"HNY"
If you do wish to explode a nested field you can run similar to:
q)table:table,'{flip c!flip table[`metadata]@\:(c:distinct raze key each table[`metadata])}[]
q)meta table
c | t f a
----------------| -----
errors |
id | C
bundle | C
results |
metadata |
source | C
icon | C
name | C
category | C
description | C
feeDescription | C
blockchain | C
website | C
tokenTicker | C
tokenCoingecko | C
protocolLaunch | C
tokenLaunch | C
adapter | C
subtitle | C
events | C
shortName | C
protocolShutdown| C
tokenSymbol | C
subcategory | C
tokenticker | C
tokencoingecko | C
Care needs to be taken when filling in nulls and keeping the types of data in each column consistent. In this dataset the events tag inside metadata is tabular data:
q)select distinct type each events from table
events
------
10
98
0
This would need to be cleaned similar to:
q)table:update events:count[i]#enlist ([] date:();description:()) from table where not 98h=type each events
The data returned from the API contains dictionaries with two distinct sets of keys:
q)distinct key each res`data
`id`bundle`results`metadata`errors
`id`bundle`results`metadata
One simple way to convert this to a table is to enlist each dictionary first, converting them to tables, then joining with uj:
q)(uj/)enlist each res`data
id bundle results metadata ..
-----------------------------------------------------------------------------..
"compound" 0n (,`oneDayTotalFees)!,0n `source`i..
"swapr-ethereum" "swapr" (,`oneDayTotalFees)!,24.78725 `category..
...
This works because uj generalises the join operator (,), allowing different schemas with common elements to be combined.
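
As a rough model of what a union join does, the Python sketch below collects every key seen across the records and fills the gaps with a null. It is an illustration under assumptions, not a description of how uj is implemented: union_join is a made-up helper, the records are trimmed samples, and None stands in for q nulls.

```python
def union_join(records):
    """Give every record the union of all keys, filling missing values with None."""
    columns = []
    for record in records:
        for key in record:
            if key not in columns:  # keep first-seen column order
                columns.append(key)
    return [{column: record.get(column) for column in columns} for record in records]

# Trimmed sample: only the first record has an "errors" key.
records = [
    {"id": "compound", "bundle": None, "errors": {"oneDayTotalFees": "Date incomplete"}},
    {"id": "swapr-ethereum", "bundle": "swapr"},
]

table = union_join(records)
for row in table:
    print(row)
```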

How to use ts_query with ANY(anyarray)

I currently have a query in PostgreSQL like:
SELECT
name
FROM
ingredients
WHERE
name = ANY('{"string value",tomato,other}')
My ingredients table is simply a list of names:
name
----------
jalapeno
tomatoes
avocados
lime
My issue is that plural values in the array will not match single values in the query. To solve this, I created a tsvector column on the table:
name | tokens
---------------+--------------
jalapeno | 'jalapeno':1
tomatoes | 'tomato':1
avocados | 'avocado':1
lime | 'lime':1
I'm able to correctly query single values from the table like this:
SELECT
name,
ts_rank_cd(tokens, plainto_tsquery('tomato'), 16) AS rank
FROM
ingredients
WHERE
tokens @@ plainto_tsquery('tomato')
ORDER BY
rank DESC;
However, I need to query values from the entire array. The array is generated by another function, so I have control over the type of each item in the array.
How can I use the @@ operator with ANY(anyarray)?
That should be straightforward:
WHERE tokens @@ ANY
(ARRAY[
plainto_tsquery('tomato'),
plainto_tsquery('celery'),
plainto_tsquery('vodka')
])
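
The semantics of matching against ANY of several queries can be modelled in a few lines of Python. This is a sketch under assumptions, not PostgreSQL code: the lexemes are hard-coded from the tokens column shown in the question (no stemming is performed here), each query is represented as a set of lexemes that must all be present (plainto_tsquery combines its words with AND), and matches_any is a made-up helper name.

```python
# name -> lexemes, copied from the tokens column shown in the question.
ingredients = {
    "jalapeno": {"jalapeno"},
    "tomatoes": {"tomato"},
    "avocados": {"avocado"},
    "lime": {"lime"},
}

# One set of required lexemes per query in the array.
queries = [{"tomato"}, {"celery"}, {"vodka"}]

def matches_any(tokens, queries):
    """True if at least one query has all of its lexemes in tokens."""
    return any(query <= tokens for query in queries)

matched = [name for name, tokens in ingredients.items() if matches_any(tokens, queries)]
print(matched)
```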

KDB: How to assign string datatype to all columns

When I created the table Tab, I specified the columns as string,
Tab: ([Key1:string()] Col1:string();Col2:string();Col3:string())
But the column datatype (t) is empty. I suppose specifying the column as string has no effect.
meta Tab
c t f a
--------------------
Key1
Col1
Col2
Col3
After I do a bulk upsert in Java...
c.Dict dict = new c.Dict((Object[]) columns.toArray(new String[columns.size()]), data);
c.Flip flip = new c.Flip(dict);
conn.c.ks("upsert", table, flip);
The datatypes are all symbols:
meta Tab
c t f a
--------------------
Key1 s
Col1 s
Col2 s
Col3 s
How can I specify the datatype of the columns as string and have it remain as string?
You can't define a column of an empty table as strings, because strings are merely lists of lists of characters.
You can only set the columns as empty lists, which is what your code is doing.
The column will then take on the type of whatever data is inserted into it.
The real question is why your Java process is sending symbols when it should be sending strings. You need to make the change there before publishing to KDB.
Note that if you define the columns as chars, you still won't be able to upsert strings:
q)Tab: ([Key1:`char$()] Col1:`char$();Col2:`char$();Col3:`char$())
q)Tab upsert ([Key1:enlist"test"] Col1:enlist"test";Col2:enlist"test";Col3:enlist "test")
'rank
[0] Tab upsert ([Key1:enlist"test"] Col1:enlist"test";Col2:enlist"test";Col3:enlist "test")
^
q)Tab: ([Key1:()] Col1:();Col2:();Col3:())
q)Tab upsert ([Key1:enlist"test"] Col1:enlist"test";Col2:enlist"test";Col3:enlist "test")
Key1 | Col1 Col2 Col3
------| --------------------
"test"| "test" "test" "test"
KDB does not allow you to define a column's type as a list when creating a table. That means you cannot define a column's type as string, because a string is also a list.
The only way to do that is to define the column as an empty list, like:
q) t:([]id:`int$();val:())
Then, when you insert data into this table, the column automatically takes the type of that data.
q)`t insert (4;"row1")
q) meta t
c | t f a
---| -----
id | i
val| C
In your case, one option is to send string data from your Java process, as mentioned by user 'emc211'; the other option is to convert your data to strings in the KDB process before insertion.

Looking for a value in a jsonb list of keys/values

I have a PostgreSQL table of cities (1 row = 1 city) with a jsonb column containing the name of the city in different languages (as a list, not an array). For example, for Paris (France) I have:
id_city (integer) = 7444
name_city (text) = Paris
names_i18n (jsonb) = {"name:fr":"Paris","name:zh":"巴黎","name:it":"Parigi",...}
In reality, my table has around 20 different languages. I am trying to find a city by looking for any name:xx value that matches a parameter given by the user, but I can't work out how to query the jsonb column that way. I've tried something like the query below, but it doesn't seem to be the right syntax:
select * from jsonb_each_text(select names_i18n from CityTable)
where value ilike 'Parigi'
I have also tried the following
select * from CityTable where names_i18n ? 'Parigi';
But it seems to work only for the key part of the jsonb. Is there a similar operator for the value part? I also need a way to know which name:XX was found, not only the city name.
Does anyone have a clue?
with CityTable (id_city, name_city, names_i18n) as (values(
7444, 'Paris',
'{"name:fr":"Paris","name:zh":"巴黎","name:it":"Parigi"}'::jsonb
))
select *
from CityTable, jsonb_each_text(names_i18n) jbet (key, value)
where value ilike 'Parigi'
;
id_city | name_city | names_i18n | key | value
---------+-----------+--------------------------------------------------------------+---------+--------
7444 | Paris | {"name:fr": "Paris", "name:it": "Parigi", "name:zh": "巴黎"} | name:it | Parigi
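
For readers less familiar with set-returning functions, the Python sketch below models what the lateral jsonb_each_text call does: it expands each city's object into (key, value) pairs and keeps the pairs whose value matches. It is an illustration only; find_city_names is a made-up helper, and casefold equality stands in for ILIKE with a pattern that contains no wildcards.

```python
# One sample row, copied from the question.
cities = [
    {"id_city": 7444, "name_city": "Paris",
     "names_i18n": {"name:fr": "Paris", "name:zh": "巴黎", "name:it": "Parigi"}},
]

def find_city_names(cities, needle):
    """Yield (id_city, name_city, key, value) for each translated name equal to needle, ignoring case."""
    needle = needle.casefold()
    for city in cities:
        for key, value in city["names_i18n"].items():  # one pair per jsonb key
            if value.casefold() == needle:
                yield city["id_city"], city["name_city"], key, value

hits = list(find_city_names(cities, "parigi"))
print(hits)
```

Because the key is returned alongside the value, the caller can see which name:xx entry matched, which is the second thing the question asks for.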

Search inside full search column using certain letters

I want to search inside a fulltext column using only the first few letters. For example:
select "Name","Country","_score" from datatable where match("Country", 'China');
Returns many rows and is ok. My question is, how can I search for example:
select "Name","Country","_score" from datatable where match("Country", 'Ch');
I want to see, China, Chile, etc.
I think that the match_type phrase_prefix could be the answer, but I don't know the correct syntax for it.
The match predicate supports different match types through the clause: using match_type [with (match_parameter = [value])].
So in your example using the phrase_prefix match type:
select "Name","Country","_score" from datatable where match("Country", 'Ch') using phrase_prefix;
gives you your desired results.
See the match predicate documentation: https://crate.io/docs/en/latest/sql/fulltext.html?#match-predicate
If you just need to match the beginning of a string column, you don't need a fulltext analyzed column. You can use the LIKE operator instead, e.g.:
cr> create table names_table (name string, country string);
CREATE OK (0.840 sec)
cr> insert into names_table (name, country) values ('foo', 'China'), ('bar','Chile'), ('foobar', 'Austria');
INSERT OK, 3 rows affected (0.049 sec)
cr> select * from names_table where country like 'Ch%';
+---------+------+
| country | name |
+---------+------+
| Chile | bar |
| China | foo |
+---------+------+
SELECT 2 rows in set (0.037 sec)
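
The LIKE 'Ch%' filter is a plain prefix test. As a small sketch of the same check outside the database (not CrateDB code; country_starts_with is a made-up helper, and the sort is only there to make the output order deterministic):

```python
# Same sample rows as the insert statement above: (name, country).
rows = [("foo", "China"), ("bar", "Chile"), ("foobar", "Austria")]

def country_starts_with(rows, prefix):
    """Return (country, name) pairs whose country begins with prefix, sorted for stable output."""
    return sorted((country, name) for name, country in rows if country.startswith(prefix))

matches = country_starts_with(rows, "Ch")
print(matches)
```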