kdb q - apply each-left for each atom in a list and reduce

I would like to apply each-left between a column of a table and each atom in a list. I cannot use each-both because the table column and the list are not the same length.
I have seen this done in one line somewhere, but I can't find it anymore.
Example:
t:([] name:("jim";"john";"john";"julia");c1: til 4);
searchNames:("jim";"john");
f:{[name;nameCol] nameCol like\:name}; / each-left between name (e.g. "jim") and column
g:f[;t[`name]];
r:g each searchNames; / result: (1000b;0110b)
filter:|/[r]; / result: 1110b
select from t where filter
How can I do that more q-like?

If you wish to use like with each-right (/:):
q)select from t where any name like/:searchNames
name c1
---------
"jim" 0
"john" 1
"john" 2
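It may help to look at the intermediate result. This q script reuses t and searchNames from the question. like/: returns one boolean vector per search string, which is the same r the question builds by hand. any then ORs those vectors element-wise into a single filter:

```q
t:([] name:("jim";"john";"john";"julia"); c1:til 4)
searchNames:("jim";"john")
r:t[`name] like/:searchNames    / one boolean vector per search string: (1000b;0110b)
filter:any r                     / element-wise OR across the vectors: 1110b
select from t where filter       / rows for "jim", "john", "john"
```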
In this case you can simply use in, since you are not using any wildcards:
q)select from t where name in searchNames
name c1
---------
"jim" 0
"john" 1
"john" 2
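The two approaches only differ when wildcards are involved. in compares strings literally, whereas like treats each search string as a pattern. The patterns in this short sketch are hypothetical and do not come from the question:

```q
t:([] name:("jim";"john";"john";"julia"); c1:til 4)
patterns:("j*n";"jul*")                        / hypothetical wildcard patterns
select from t where any name like/:patterns    / matches "john", "john", "julia"
select from t where name in patterns           / empty: in looks for the literal strings "j*n" and "jul*"
```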

Below is a generic function you could use with two lists of different lengths.
q)f:{(|) over x like/:y}
q)
q)select from t where f[name;searchNames]
name c1
---------
"jim" 0
"john" 1
"john" 2
Or, wrapping it all in a single function (assuming you are always searching a table column):
q)f2:{x where (|) over (0!x)[y] like/:z}
q)
q)f2[t;`name;searchNames]
name c1
---------
"jim" 0
"john" 1
"john" 2
But in the scenario you describe, Thomas' solution seems the most natural.

Related

KDB: How to update column values

I have a table with a column of symbol type, like below.
Name    Value
First   TP_RTD_FRV
Second  RF_QWE_FRV
Third   KF_FRV_POL
I need to update it as below: wherever FRV appears, it should be replaced with AB_FRV. How can I achieve this?
Name    Value
First   TP_RTD_AB_FRV
Second  RF_QWE_AB_FRV
Third   KF_AB_FRV_POL
q)t
name v
---------------
0 TP_RTD_FRV
1 RF_QWE_FRV
2 KF_FRV_POL
3 THIS
4 THAT
q)update `$ssr[;"FRV";"AB_FRV"]each string v from t
name v
------------------
0 TP_RTD_AB_FRV
1 RF_QWE_AB_FRV
2 KF_AB_FRV_POL
3 THIS
4 THAT
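ssr works on a single string. That is why the query converts the column with string, applies ssr with each, and then casts the results back to symbols with `$. Here is a minimal sketch of those pieces on a few sample values:

```q
ssr["TP_RTD_FRV";"FRV";"AB_FRV"]           / "TP_RTD_AB_FRV"
v:`TP_RTD_FRV`KF_FRV_POL`THIS
`$ssr[;"FRV";"AB_FRV"] each string v       / `TP_RTD_AB_FRV`KF_AB_FRV_POL`THIS
```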
Or, without using qSQL:
q)@[t;`v;]{`$ssr[;"FRV";"AB_FRV"]each string x}
name v
------------------
0 TP_RTD_AB_FRV
1 RF_QWE_AB_FRV
2 KF_AB_FRV_POL
3 THIS
4 THAT
Depending on how many distinct values the data contains, you might benefit from .Q.fu, which applies a function only to the distinct items of a list:
q)t:1000000#t
q)\t @[t;`v;]{`$ssr[;"FRV";"AB_FRV"]each string x}
2343
q)\t @[t;`v;].Q.fu {`$ssr[;"FRV";"AB_FRV"]each string x}
10
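.Q.fu[f;x] evaluates f only on the distinct items of x and maps the results back. That is why it pays off in the timing above, where a long column contains only five distinct symbols. The sketch below checks that it gives the same result as calling f directly. It also spells out the underlying idea by hand with distinct and ?. That hand-written line only illustrates the idea and is not necessarily how .Q.fu is implemented:

```q
f:{`$ssr[;"FRV";"AB_FRV"] each string x}
v:100000#`TP_RTD_FRV`RF_QWE_FRV`KF_FRV_POL`THIS`THAT
.Q.fu[f;v]~f v        / 1b: same result, but f only sees the 5 distinct symbols
u:distinct v
(f u)[u?v]~f v        / 1b: apply f to the distinct values, then index back with u?v
```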

KDB: how to compare strings?

I have a column of type C (strings). How do I compare each value with the previous value in the same column? I tried col1 like prev col1, but it returns a 'length error. I also created another column, newCol: prev col1, but I still cannot perform the comparison. I also tried = with no luck. How can I do this?
Sample data:
col1
Paris
London
London
New York
Singapore
Ha Noi
Could you use the prior keyword?
q)t
col1
-----------
"Paris"
"London"
"London"
"Ney York"
"Singapore"
"Ha Noi"
q)select (~) prior col1 from t
col1
----
0
0
1
0
0
0
When you compare two strings with =, q compares them character by character. If they are the same length, you get a list of booleans showing which positions match. If they are different lengths, you get a 'length error. To test whether two strings are exactly the same, use ~ (match), which works regardless of length and returns a single boolean.
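Here is a short sketch of the difference between = and ~ on strings. The length error is caught with protected evaluation (@[f;x;handler]) so that the script keeps running:

```q
"London"="Landon"                  / 101111b: one boolean per character
"London"~"London"                  / 1b
"Paris"~"London"                   / 0b: ~ works whatever the lengths
@[{"Paris"=x};"London";{x}]        / "length": = signals a length error when lengths differ
```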
Use each prior (https://code.kx.com/q/ref/maps/#each-prior) with match (https://code.kx.com/q/basics/comparison/#match):
q)tab:([]col1:("Paris";"London";"London";"New York"))
q)select col1,compare:(~':)col1 from tab
col1 compare
------------------
"Paris" 0
"London" 0
"London" 1
"New York" 0
You should use like' (like with each) instead of like, because you are comparing each string with the corresponding item of another list, not with a single value.
update comparison:col1 like' prev col1 from ([]col1:("Paris";"London";"London";"New York";"Singapore";"Ha Noi"))
Although this is essentially the same as Matthew's and jomahony's answers, the differ keyword arguably makes it easier to read and understand:
q)select not differ col1 from ([]col1:("Paris";"London";"London";"New York"))
col1
----
0
0
1
0
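Putting the answers side by side, this q script compares the three approaches on the same sample list. All of them flag only the third item, because only that item equals its predecessor:

```q
cities:("Paris";"London";"London";"New York")
(~':)cities                 / 0010b: each prior with match; the first item has no predecessor
not differ cities           / 0010b
cities like' prev cities    / 0010b: like pairs each string with the previous one
```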

Complex list of list of empty characters in a column in kdb

I have a joined table in which one column contains lists of lists of characters.
q)t:([] a:`c`d; b:("";"fill"));
q)s:([] a:`b`c`c; b:("";"";""))
q)select from t lj select b by a from s
Output:
a b
---------
c ("";"") / this is the culprit: I want to replace it with an empty string
d "fill"
The join produces a list of empty strings for c.
I want to replace that with a single empty string.
Expected output:
a b
---------
c ""
d "fill"
A few unsuccessful attempts:
q)update b:?[null in b;raze b;b]from select from t lj select b by a from s
q)update b:?["" in b;raze b;b]from select from t lj select b by a from s
To replace a list of empty strings with a single empty string, you can try the query below:
q) select from t lj select (b;"")all""~/:b by a from s
Output:
a b
--------
c ""
d "fill"
Explanation:
Basically, the list of empty strings comes from grouping the right-hand table (select b by a). So during the grouping step, we can check whether all the items in a grouped list (the b values for a particular a) are empty strings and, if they are, replace them with a single empty string.
q) select (b;"")all""~/:b by a from s
a| b
-| --
b| ""
c| ""
For a = c, the grouped b values are ("";""). Let's break down the command:
q) b:("";"")
q) ""~/:b / output 11b
q) all ""~/:b / output 1b
q)(b;"") all ""~/:b / output ""
The last command is list indexing. If the previous command returns 1b, meaning all items are empty strings, the index picks ""; otherwise (0b) it picks the original b.
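Running the same steps on a hypothetical group that mixes an empty and a non-empty string shows the other branch. The boolean 0b selects index 0, so b comes back unchanged:

```q
b:("";"str")            / hypothetical group mixing an empty and a non-empty string
""~/:b                  / 10b
all ""~/:b              / 0b
(b;"") all ""~/:b       / index 0b picks the original b: ("";"str")
```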
Edit:
Based on the discussion in the comment section of TerryLynch's answer, it looks like your requirement is:
If all values in the b list after grouping are empty strings, return a single empty string.
If the values of b are a mixture of empty and non-empty strings, remove all the empty strings.
For that, you could use the query below:
q) select from t lj select b:raze ("";b except enlist "") by a from s
But that would result in different types for different values in the b column: an empty string is type 10h, whereas a list of non-empty strings is type 0h.
For a consistent type, you can use the query below. It returns enlist "" instead of "", so the result is a list containing one empty string rather than an empty string:
q) select from t lj select b:{(c;enlist "")()~c:x except enlist ""}b by a from s
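To see the type difference concretely, this sketch runs both grouping queries on a hypothetical version of s in which the second c row holds "str". It then checks the type of each resulting b value:

```q
s:([] a:`b`c`c; b:("";"";"str"))      / hypothetical: the second c row now holds "str"
r1:select b:raze ("";b except enlist "") by a from s
type each value[r1]`b                  / 10 0h: "" for a=b, but enlist "str" for a=c
r2:select b:{(c;enlist "")()~c:x except enlist ""}b by a from s
type each value[r2]`b                  / 0 0h: enlist "" for a=b, enlist "str" for a=c
```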
Instead of trying to fix the outcome, I think you need to decide what to do with the duplicate c rows in the s table. You're grouping by column a, but it has duplicates, so how should the grouping behave? Should it take the first value, or the last value? Should it append the two strings together? If you solve that, you avoid this problem, for example:
q)t lj select last b by a from s
a b
--------
c ""
d "fill"
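The choices listed above correspond to different aggregations inside the grouped select. This sketch uses hypothetical data in which the two c rows hold different non-empty strings:

```q
s:([] a:`b`c`c; b:("";"ab";"cd"))    / hypothetical data where the two c rows differ
select first b by a from s           / c -> "ab"
select last b by a from s            / c -> "cd"
select raze b by a from s            / c -> "abcd": the strings appended together
```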
An alternative solution is simply to raze the results in b together. This uses fewer where clauses and fewer match (~) operations.
q)update raze'/[b] from (t lj select b by a from s)
a b
--------
c ""
d "fill"
Here I've used over (/), which, applied to the unary raze', repeats it until the result stops changing. This handles an unknown level of nesting as a precaution, and raze' applies it to each row from the lj. For your case, an even faster solution would be:
update raze each b from (t lj select b by a from s)
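Note that this gives different results from Rahul's answer when a group mixes empty and non-empty strings, for example if the second c row of s holds "str" instead of "":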
q)update raze each b from (t lj select b by a from s)
a b
--------
c "str"
d "fill"
q) select from t lj select (b;"")all""~/:b by a from s
a b
------------
c ("";"str")
d "fill"

q/kdb Selecting a variable in query

q)sym:`a`b`c
q)t:([] s:`g`v; p:2?10.)
Selecting the variable sym works fine in the following query:
q)select sym from t
However, it throws an error when I also select a table column, and I can't figure out why:
q)select sym, p from t
You get a 'length error because the lists sym and p (column from t) are different lengths.
q)sym:`a`b
q)select sym,p from t
sym p
------------
a 3.927524
b 5.170911
What is the output you are trying to get to with this?
Assuming you want to select as many elements of sym as there are rows in the table:
q)select p,(count i)#sym from t
p sym
------------
1.780839 a
3.017723 b
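(count i)#sym works because # takes items from the front of sym. If the table has more rows than sym has items, # cycles through sym. This short sketch shows both cases:

```q
sym:`a`b`c
2#sym    / `a`b
5#sym    / `a`b`c`a`b: take repeats the list when asked for more items than it has
```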

Get substring into a new column

I have a table containing a column with data in the following format. Let's call the column "title" and the table "s":
title
ab.123
ab.321
cde.456
cde.654
fghi.789
fghi.987
I am trying to get a unique list of the characters that come before the ".", so that I end up with this:
ab
cde
fghi
I have tried selecting the initial column into a table, then doing an update to create a new column holding the position of the dot, using "ss".
Something like this:
t: select title from s
update thedot: (title ss `.)[0] from t
I was then going to create a third column containing the first N characters of "title", where N is the value stored in the "thedot" column.
All I get when I try the update is a 'type error.
Any ideas? I am very new to kdb, so no doubt I am doing something simple in a very silly way.
The reason you get the type error is that ss only works on strings, not symbols. Also, ss is not a vector-based function, so you need to combine it with each (').
q)update thedot:string[title] ss' "." from t
title thedot
---------------
ab.123 2
ab.321 2
cde.456 3
cde.654 3
fghi.789 4
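The plan described in the question is to find the position of the dot and then take that many characters. That can be completed along these lines. This is a sketch that assumes each title contains at least one ".":

```q
t:([] title:`$("ab.123";"ab.321";"cde.456";"cde.654";"fghi.789";"fghi.987"))
strs:string t`title
pos:first each strs ss' "."     / 2 2 3 3 4 4: index of the first "." in each string
distinct `$pos#'strs            / `ab`cde`fghi: take pos characters from each string
```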
There are a few ways to solve your problem:
q)select distinct(`$"." vs' string title)[;0] from t
x
----
ab
cde
fghi
q)select distinct(` vs' title)[;0] from t
x
----
ab
cde
fghi
For more information, see http://code.kx.com/q/ref/casting/#vs
An alternative is to use the 0: operator to parse around the "." delimiter. This operator is especially useful when you have a fixed number of 'columns', as in a CSV file. Here there is a fixed number of columns and we only want the first, so a list of the distinct strings before the "." can be returned with:
exec distinct raze("S ";".")0:string title from t
`ab`cde`fghi
Or:
distinct raze("S ";".")0:string t`title
`ab`cde`fghi
Here "S " defines the type of each column (S parses the first field as a symbol, and the space skips the second field), and "." is the field delimiter. For records with a differing number of fields, it would be better to use the vs operator.
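The raze is needed because 0: with a character delimiter returns a list of parsed columns, even when only one column is kept. This is a minimal sketch on two sample strings:

```q
("S ";".") 0: ("ab.123";"cde.456")         / enlist `ab`cde: one parsed column, second field skipped
raze ("S ";".") 0: ("ab.123";"cde.456")    / `ab`cde
```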
A variation of WooiKent's answer using each-right (/:):
q)exec distinct (` vs/:title)[;0] from t
`ab`cde`fghi