KDB+/Q: How to categorize data - kdb

My column has categorical data. For example, cat and dog are animals; ant, bee and wasp are insects.
t:([] creature:`cat`dog`ant`bee`wasp`crocodile; cnt:til 6)
And I need to add a column 'category' to show creature type.
I know how to do dict mapping, but this looks ugly:
update category:creature ^ ((`cat`dog`ant`bee`wasp)!(`animal`animal`insect`insect`insect)) creature from t
I could use it if I knew how to create such mapping dictionary from simple lists like:
mapping: (???) ((`cat`dog;`animal);(`ant`bee`wasp;`insect))

In the format provided you can use flip each to create all the pairs for the dictionary:
q)flip each ((`cat`dog;`animal);(`ant`bee`wasp;`insect))
(`cat`animal;`dog`animal)
(`ant`insect;`bee`insect;`wasp`insect)
Which can then be turned into a dictionary by razing that into a single list first:
q)(!). flip raze flip each ((`cat`dog;`animal);(`ant`bee`wasp;`insect))
cat | animal
dog | animal
ant | insect
bee | insect
wasp| insect
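To tie this back to the question, the dictionary can be stored and then applied to the table. This is a minimal sketch, assuming the table t from the question with cnt set to til 6 so that both columns have six rows; the names L and mapping are only illustrative.

```q
/ nested list of (creatures;category) pairs, as in the question
L:((`cat`dog;`animal);(`ant`bee`wasp;`insect))
/ build the creature -> category dictionary
mapping:(!). flip raze flip each L
t:([] creature:`cat`dog`ant`bee`wasp`crocodile; cnt:til 6)
/ look up each creature; ^ fills unmapped ones (crocodile) with the creature name itself
update category:creature^mapping creature from t
```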

Or you could make a table and run whatever kind of join suits the case.
q)L:((`cat`dog;`animal);(`ant`bee`wasp;`insect))
q)ungroup flip`creature`category!flip L
creature category
-----------------
cat      animal
dog      animal
ant      insect
bee      insect
wasp     insect
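If the table route is preferred, a left join is one option. This is a sketch assuming the same t and L as in the question; keying the lookup table on creature with 1! is a choice made here for illustration, not something the answer specifies.

```q
L:((`cat`dog;`animal);(`ant`bee`wasp;`insect))
t:([] creature:`cat`dog`ant`bee`wasp`crocodile; cnt:til 6)
/ lookup table keyed on its first column, creature
lookup:1!ungroup flip`creature`category!flip L
/ left join: rows with no match (crocodile) get a null category
t lj lookup
```

Unlike the dictionary version with ^, this leaves unmatched creatures with a null category rather than their own name.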

Related

How to use JSON_CONTAINS in spring data jpa+queryDsl?

For example,
row1 ["cat","dog","bird"]
row2 ["horse","dog","bird"]
row3 ["honeybee","cat"]
...
I want to query whether cat exists in the categories field of the animal table. What should I do?
I tried the Expressions.numberTemplate("JSON_CONTAINS...") method,
but I don't know how to use Expressions.xxxTemplate.

Pattern matching Postgres to replace misspelled words of a column

I have a table A :
name         renamed_name
-------------------------
HON/A
HONDA TRUCK
GMC
and I have a renaming rules table B:
rule         correct_name
-------------------------
HON/A        HONDA
HONDA TRUCK  HONDA
^GMC.+       GMC
I need to update table A and set the column A.renamed_name to the B.correct_name of Table B where A.name matched any of B.rule.
When I use the following update query:
Update A set A.renamed_name = B.correct_name from B where name ~* any(array[B.rule])
it gives me this result:
name         renamed_name
-------------------------
HON/A        HONDA
HONDA TRUCK  HONDA
GMC          NULL
The last row is not updated even though my condition includes a regular expression. Please let me know where I might be going wrong, or whether there is an alternative solution.

Function updating multiple columns in kdb

I have a table I want to update:
q)show table:([]letter:`a`b`c`a;fruit:`apple`banana`pear`strawberry;family:`mom`dad`brother`sister)
letter fruit      family
-------------------------
a      apple      mom
b      banana     dad
c      pear       brother
a      strawberry sister
I want to replace all entries with the name of their respective column.
This seems to work:
q){![table;();0b;(enlist x)!(enlist `x)]}`letter
letter fruit      family
-------------------------
letter apple      mom
letter banana     dad
letter pear       brother
letter strawberry sister
...but not this:
q){![table;();0b;(enlist x)!(enlist `x)]}`letter`fruit
'type
[1] {![table;();0b;(enlist x)!(enlist `x)]}
^
q))
The purpose is to create a function that creates dummy variables for categorical variables, so I need a general function. Any suggestions?
This can be achieved with the amend form of @ instead of a functional update, like so:
q){@[`table;x;:;x]}`letter`fruit
`table
q)table
letter fruit family
--------------------
letter fruit mom
letter fruit dad
letter fruit brother
letter fruit sister
Edit - For all cols:
{@[`table;x;:;x]} each cols table
or
{@[x;y;:;y]}/[table;cols table]
q)table
letter fruit family
-------------------
letter fruit family
letter fruit family
letter fruit family
letter fruit family
Your original functional update could work like this:
q){![table;();0b;x!enlist'[x:(),x]]}cols table
letter fruit family
-------------------
letter fruit family
letter fruit family
letter fruit family
letter fruit family
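For a reusable version that does not depend on a global, the same functional update can take the table as an argument. This is a sketch only: nameCols is a hypothetical name, and passing the table by value (so the original is left unchanged) is an assumption about what is wanted.

```q
/ hypothetical helper: returns a copy of tbl with each column in c overwritten by its own name
/ (),c makes a single symbol into a one-item list, so one column or many both work
/ enlist each c wraps every name so it is read as a symbol constant, not a column reference
nameCols:{[tbl;c] c:(),c; ![tbl;();0b;c!enlist each c]}

table:([]letter:`a`b`c`a;fruit:`apple`banana`pear`strawberry;family:`mom`dad`brother`sister)
nameCols[table;`letter]
nameCols[table;`letter`fruit]
nameCols[table;cols table]
```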

How to count the number of times a certain value occurs in a column postgresql

I have a table with two columns. The first is called 'animal' and lists different types of animals such as cat, dog, bird etc. The second column 'num' lists different numbers.
I want to count how many times the word 'cat' appears in the 'animal' column so that I can output 'cat' and then its frequency.
SELECT animal, count(*) as Total FROM mydata GROUP BY animal;
However, this outputs all animals and their frequencies, and I would like to limit my output to cat and its frequency only.
When I try to do:
SELECT count(animal) FROM mydata WHERE animal = 'cat';
I am given an output that says the total is 0.
This should satisfy your requirements.
SELECT 'cat' as animal, count(*) as Total FROM mydata where animal = 'cat';

How to pass dictionary into query constraint?

If this is the dictionary of constraint:
dictName:`region`Code;
dictValue:(`NJ`NY;`EEE213);
dict:dictName!dictValue;
I would like to pass dict to a function and have the query adapt to whichever keys are present. If the only key is region, the query should be:
select from table where region in dict`region;
The same goes for Code. But if I pass both keys, the query should become:
select from table where region in dict`region,Code in dict`Code;
Is there any way to do this?
I came up with this code:
funcForOne:{[constraint]?[`bce;enlist(in;constraint;(`dict;enlist constraint));0b;()]};
funcForAll:{[dict]$[(null dict)~1;select from bce;($[(count key dict)=1;($[`region in (key dict);funcForOne[`region];funcForOne[`Code]]);select from bce where region in dict`region,rxmCode in dict`Code])]};
It works for one and two constraints, but when I call funcForAll[] it gives a type error. How should I change it? I think the error comes from (null dict)~1.
I tried count too, but that didn't work well either.
Update
So I did this, but I get an error:
tab:([]code:`B90056`B90057`B90058`B90059;region:`CA`NY`NJ`CA);
dictKey:`region`Code;dictValue:(`NJ`NY;`B90057);
dict:dictKey!dictValue;
?[tab;f dict;0b;()];
and I got a 'NY error. Do you know why? Also, if I pass an empty dictionary it doesn't seem to work.
As I said, the functional form would be the better approach, but if your requirement is as limited as you said, you can consider another solution, as below:
Note: Assuming all dictionary keys will be in table columns list.
f:{[dict] if[0=count dict;:select from t];
  select from t where (#[key dict;t]) in {$[any 0<=type each value x;flip;enlist]x}[dict]}
Explanation:
1. Convert dict to a table, depending on the types of its values: flip if any value is a list, otherwise enlist.
$[any 0<=type each value dict;flip;enlist]dict
2. Get the subset of table t that has only the dictionary keys as columns.
#[key dict;t]
3. Get the rows where (2) is in (1).
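The first step, choosing between flip and enlist, can be seen in isolation. This is a small sketch; the keys id and s and the values are made up for illustration.

```q
/ one value is a list: flip builds one row per item, repeating the atom
flip `id`s!(1 3;`a)
/ every value is an atom: enlist turns the dictionary into a one-row table
enlist `id`s!(1;`a)
```

As far as I know, flip needs all list values to have the same length, so this approach assumes the dictionary's lists line up row by row.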
Basically we are using below form of querying and matching:
q)t1:([]id:1 2;s:`a`b);
q)t2:([]id:1 3 ;s:`a`b);
q)select from t1 where ([]id;s) in t2
If you're just using in, you can do something like:
f:{{[x;y](in),'key[y],'(),x}[;x]enlist each value[x]}
So that:
q)d
a| 10 1
b| ,`a
q)f d
in `a 10 1
in `b ,`a
q)t
a b c
------
1 a 10
2 b 20
3 c 30
q)?[t;f d;0b;()]
a b c
------
1 a 10
Note that because of the enlist each, every value in the resulting list is enlisted, so singletons work too:
q)d:enlist[`a]!enlist 1
q)d
a| 1
q)?[t;f d;0b;()]
a b c
------
1 a 10
Update to secondary question
This still works with an empty dict, i.e. ()!(). I'm passing in the dictionary variable.
In your second question the dictionary is not constructed correctly (also remember q is case sensitive), and your values need to be enlisted. Look up functional select in the reference pages on the kx site; you'll see that you need to enlist symbol lists to differentiate them from column names.
`region`code!(enlist `NY`NJ;enlist `B90057)
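Putting the pieces together for the second question, the following sketch reuses tab from the question and f from above with the corrected dictionary. Treating an empty dictionary as "no constraints" follows the answer's claim.

```q
tab:([]code:`B90056`B90057`B90058`B90059;region:`CA`NY`NJ`CA)
/ f from the answer: builds one (in;column;values) constraint per dictionary key
f:{{[x;y](in),'key[y],'(),x}[;x]enlist each value[x]}
/ keys match the column names exactly; symbol values are enlisted so they are not read as column names
dict:`region`code!(enlist `NY`NJ;enlist `B90057)
/ only the row with code B90057 and region NY satisfies both constraints
?[tab;f dict;0b;()]
/ an empty dictionary produces no constraints, so the whole table comes back
?[tab;f ()!();0b;()]
```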