Compare values of two string columns of a table in q/kdb+

We have a table:
q)t:([] a:("abc";"def";"ghi";"lmn"); b:("abc";"xyz";"ghi";"def"); c:1 2 3 4)
q)t
a     b     c
-------------
"abc" "abc" 1
"def" "xyz" 2
"ghi" "ghi" 3
"lmn" "def" 4
Expected output: compare column a with column b row-wise and update the mu column accordingly:
a     b     c mu
---------------------
"abc" "abc" 1 match
"def" "xyz" 2 unmatch
"ghi" "ghi" 3 match
"lmn" "def" 4 unmatch
When I run the query below, the output is wrong:
q)select a,b,c, mu:?[any a like/: b; `match; `unmatch] from t
a     b     c mu
---------------------
"abc" "abc" 1 match
"def" "xyz" 2 match
"ghi" "ghi" 3 match
"lmn" "def" 4 unmatch
Row 2 is the issue: "def" from column a is checked against every value of column b (not just "xyz" in the same row), so it matches the "def" in row 4 of b.

If you are looking to do a row-wise match, you can use the match operator (~) with the each-both iterator ('):
q)update mu:?[a~'b;`match;`unmatch] from t
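To see why the like/: version flags row 2, here is a plain-Python model (not q) of both expressions on the same data. It assumes the values in b contain no like wildcards (*, ?, [ ]), so like reduces to string equality; the function names are hypothetical.

```python
# Plain-Python model (not q) of the two mu expressions, on the rows of t.
a = ["abc", "def", "ghi", "lmn"]
b = ["abc", "xyz", "ghi", "def"]

def mu_rowwise(a, b):
    """Like ?[a~'b;`match;`unmatch]: a[i] is compared with b[i] only."""
    return ["match" if x == y else "unmatch" for x, y in zip(a, b)]

def mu_any_like(a, b):
    """Like ?[any a like/: b;`match;`unmatch] when b holds no wildcards:
    a[i] counts as a match if it equals ANY value in b."""
    return ["match" if any(x == y for y in b) else "unmatch" for x in a]

print(mu_rowwise(a, b))   # ['match', 'unmatch', 'match', 'unmatch']
print(mu_any_like(a, b))  # ['match', 'match', 'match', 'unmatch']
```

The second list reproduces the incorrect output from the question, while the first reproduces the expected mu column.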

Related

Get additional column using functional select

How can I get an additional column of type string using functional select (?)?
I tried this:
t:([]c1:`a`b`c;c2:1 2 3)
?[t;();0b;`c1`c2`c3!(`c1;`c2;10)] / ok
?[t;();0b;`c1`c2`c3!(`c1;`c2;enlist(`abc))] / ok
?[t;();0b;`c1`c2`c3!(`c1;`c2;"10")] / 'length
?[t;();0b;`c1`c2`c3!(`c1;`c2;enlist("10"))] / 'length
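but got a 'length error.
Your first two cases work because an atom automatically expands to the required length (in a parse tree, enlist`abc denotes the symbol atom `abc rather than a column name). A string, however, is a list, so for such a compound column you need to explicitly generate a list of the correct length, as follows: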
q)select c1,c2,c3:`abc,c4:10,c5:count[i]#enlist"abc" from t
c1 c2 c3  c4 c5
------------------
a  1  abc 10 "abc"
b  2  abc 10 "abc"
c  3  abc 10 "abc"
In functional form:
q)?[t;();0b;`c1`c2`c3!(`c1;`c2;(#;(count;`i);(enlist;"abc")))]
c1 c2 c3
-----------
a  1  "abc"
b  2  "abc"
c  3  "abc"
Jason
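The rule in the answer above (atoms extend, lists must already have one item per row) can be modelled in plain Python. build_column is a hypothetical helper, not a q function, and its 'length message only imitates q's error.

```python
# Plain-Python model (not q) of how a select fills a new column:
# an atom is repeated for every row, a list must already have one item per row.
def build_column(value, n_rows):
    """Hypothetical helper imitating q's column-building rule."""
    if isinstance(value, list):
        if len(value) != n_rows:
            raise ValueError("'length")   # imitates q's 'length error
        return value
    return [value] * n_rows               # atom: extended to every row

n = 3  # rows in t

print(build_column(10, n))        # [10, 10, 10]

# A q string is a list of chars, so "10" acts like a 2-item list here
try:
    build_column(list("10"), n)
except ValueError as err:
    print(err)                    # 'length

# count[i]#enlist"abc" repeats the whole string once per row
print(build_column(n * ["abc"], n))  # ['abc', 'abc', 'abc']
```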

KDB: String comparison with a table

I have a table bb:
bb:([]key1: 0 1 2 1 7; col1: 1 2 3 4 5; col2: 5 4 3 2 1; col3:("11";"22" ;"33" ;"44"; "55"))
How do I do a relational comparison of strings? Say I want to get the records where col3 is less than or equal to "33":
select from bb where col3 <= "33"
Expected result:
key1 col1 col2 col3
0    1    5    11
1    2    4    22
2    3    3    33
If you want col3 to remain of string type, just cast it temporarily within the qSQL query:
q)select from bb where ("J"$col3) <= 33
key1 col1 col2 col3
-------------------
0    1    5    "11"
1    2    4    "22"
2    3    3    "33"
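A plain-Python model of the cast approach: "J"$ parses each string to a long, so the comparison becomes numeric while col3 itself stays a string. The model assumes every col3 value is a valid integer string; int() raises on anything else, whereas "J"$ returns a null.

```python
bb = [  # (key1, col1, col2, col3) rows of table bb
    (0, 1, 5, "11"), (1, 2, 4, "22"), (2, 3, 3, "33"),
    (1, 4, 2, "44"), (7, 5, 1, "55"),
]

# Model of: select from bb where ("J"$col3) <= 33
# int() stands in for "J"$ and assumes every string is a valid integer
selected = [row for row in bb if int(row[3]) <= 33]

for row in selected:
    print(row)   # col3 is still a string in the result
```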
If you are looking for classical string comparison, regardless of whether the strings are numeric, I would propose the following approach:
a. Create functions that behave like common Java Comparators: they return 0 when the strings are equal, -1 when the first string is less than the second one, and 1 when the first is greater than the second.
.utils.compare: {$[x~y;0;$[x~first asc (x;y);-1;1]]};
.utils.less: {-1=.utils.compare[x;y]};
.utils.lessOrEq: {0>=.utils.compare[x;y]};
.utils.greater: {1=.utils.compare[x;y]};
.utils.greaterOrEq: {0<=.utils.compare[x;y]};
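b. Use them in the where clause. Because the projection fixes the first argument, .utils.greaterOrEq["33"] keeps the rows where "33" is greater than or equal to col3, i.e. col3 <= "33":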
bb:([]key1: 0 1 2 1 7;
col1: 1 2 3 4 5;
col2: 5 4 3 2 1;
col3:("11";"22" ;"33" ;"44"; "55"));
select from bb where .utils.greaterOrEq["33"]'[col3]
c. As you can see below, this works for arbitrary strings:
cc:([]key1: 0 1 2 1 7;
col1: 1 2 3 4 5;
col2: 5 4 3 2 1;
col3:("abc" ;"def" ;"tyu"; "55poi"; "gab"));
select from cc where .utils.greaterOrEq["ffff"]'[col3]
.utils.compare could also be written in vector form, though I'm not sure whether that would be more time- or memory-efficient:
.utils.compareVector: {
?[x~'y;0;?[x~'first each asc each(enlist each x),'enlist each y;-1;1]]
};
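A plain-Python model of .utils.compare, its vector form, and its use as a filter, assuming q's asc orders two strings by character code the way Python's sorted does for ASCII text. The Python names are hypothetical stand-ins for the .utils functions.

```python
def compare(x, y):
    """Model of .utils.compare: 0 if equal, -1 if x sorts first, else 1."""
    if x == y:
        return 0
    return -1 if x == sorted([x, y])[0] else 1

def greater_or_eq(x, y):
    """Model of .utils.greaterOrEq: x >= y in sort order."""
    return 0 <= compare(x, y)

def compare_vector(xs, ys):
    """Model of .utils.compareVector: compare() applied pairwise."""
    return [compare(x, y) for x, y in zip(xs, ys)]

col3_bb = ["11", "22", "33", "44", "55"]
col3_cc = ["abc", "def", "tyu", "55poi", "gab"]

# .utils.greaterOrEq["33"]'[col3] fixes x, so it keeps rows where "33" >= col3
print([s for s in col3_bb if greater_or_eq("33", s)])    # ['11', '22', '33']
print([s for s in col3_cc if greater_or_eq("ffff", s)])  # ['abc', 'def', '55poi']
print(compare_vector(["ffff"] * 5, col3_cc))             # [1, 1, -1, 1, -1]
```

Note that "55poi" passes the "ffff" filter because the character "5" sorts before "f".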
One way would be to evaluate the strings before the comparison:
q)bb:([]key1: 0 1 2 1 7; col1: 1 2 3 4 5; col2: 5 4 3 2 1; col3:("11";"22" ;"33" ;"44"; "55"))
q)bb
key1 col1 col2 col3
-------------------
0    1    5    "11"
1    2    4    "22"
2    3    3    "33"
1    4    2    "44"
7    5    1    "55"
q)select from bb where 33>=value each col3
key1 col1 col2 col3
-------------------
0    1    5    "11"
1    2    4    "22"
2    3    3    "33"
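In this case value each evaluates each string, returning its value as a number (a long), and the comparison is then numeric. Note that value executes each string as q code, so the "J"$ cast is safer for untrusted data.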

How can I count the null entries by column in a kdb q table?

Given a table that contains a number of null entries how can I create a summary table that describes the number of nulls per column? Can this be done on a general table where the number of columns and column names are not known beforehand?
q)t: ([] a: 1 2 3 4; b: (2018.10.08; 0Nd; 2018.10.08; 2018.10.08); c: (0N;0N;30;40); d: `abc`def``jkl)
q)t
a b          c  d
-------------------
1 2018.10.08    abc
2               def
3 2018.10.08 30
4 2018.10.08 40 jkl
Expected result:
columnName nullCount
--------------------
a          0
b          1
c          2
d          1
While sum null t is the simplest solution in this example, it doesn't handle string (or other nested) columns. To handle those, you would need something like:
q)t: ([] a: 1 2 3 4; b: (2018.10.08; 0Nd; 2018.10.08; 2018.10.08); c: (0N;0N;30;40); d: `abc`def``jkl;e:("aa";"bb";"";()," "))
q){sum$[0h=type x;0=count each x;null x]}each flip t
a| 0
b| 1
c| 2
d| 1
e| 1
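A plain-Python model of the per-column lambda. The representation is an assumption made for illustration: None stands in for typed nulls (0N, 0Nd), "" for the null symbol, and Python lists for the string column e (whose last item, ()," ", is a one-character string).

```python
from datetime import date

def is_null_atom(v):
    # None models typed nulls such as 0N and 0Nd; "" models the null symbol `
    return v is None or v == ""

def null_count(col):
    """Model of the per-column lambda: empty items for nested columns,
    otherwise the number of null atoms."""
    if all(isinstance(v, list) for v in col):   # nested column (type 0h)
        return sum(len(v) == 0 for v in col)    # count the empty items
    return sum(is_null_atom(v) for v in col)    # like sum null x

d = date(2018, 10, 8)
t = {
    "a": [1, 2, 3, 4],
    "b": [d, None, d, d],
    "c": [None, None, 30, 40],
    "d": ["abc", "def", "", "jkl"],
    "e": [list("aa"), list("bb"), [], [" "]],
}
print({name: null_count(col) for name, col in t.items()})
# {'a': 0, 'b': 1, 'c': 2, 'd': 1, 'e': 1}
```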
You can make such a table using
q)flip `columnName`nullCount!(key;value)@\:sum null t
columnName nullCount
--------------------
a          0
b          1
c          2
d          1
where sum null t gives a dictionary of the null counts in each column:
q)sum null t
a| 0
b| 1
c| 2
d| 1
and we apply key and value to that dictionary, pair the results with the new column names, and flip the resulting dictionary into a table.
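The dictionary-to-table step can be sketched in plain Python: applying key and value to the dictionary gives the pair (column names; null counts), naming that pair gives a column dictionary, and flipping turns it into one record per column of t. The variable names are illustrative only.

```python
counts = {"a": 0, "b": 1, "c": 2, "d": 1}   # what sum null t returns here

# Applying key and value to the dictionary gives (column names; null counts)
names, values = list(counts.keys()), list(counts.values())

# Naming the pair builds a column dictionary...
column_dict = {"columnName": names, "nullCount": values}

# ...and flipping it yields one record per column of t
table = [dict(zip(column_dict, row)) for row in zip(*column_dict.values())]

for record in table:
    print(record)
```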
To produce a table with the column names as headers and the number of nulls as the values, you can use:
q)tab:enlist sum null t
This enlists the dictionary (column names as keys, null counts as values), turning it into a one-row table:
a b c d
-------
0 1 2 1
If you then wanted this in your given format you could then use:
result:([]columnNames:cols tab; nullCount:raze value each tab)

Evaluate Values From Multiple Rows As Part of Aggregate or Window Function

I need to find a way to tell if a column has two specific values within a grouped/partitioned section. Easiest to describe by example. I have table "foo" with the following data:
ID | Indicator
1 | A
1 | B
1 | B
2 | C
2 | B
3 | A
3 | B
3 | B
3 | C
4 | A
4 | C
For my output I want a result of "A" if one of the rows in the group has Indicator "A". If not, then "C" if one of the rows Indicator is "C". But in the case where the group has an Indicator of "A" and an Indicator of "C" I want a result of "X" for the group. Given the data I want the following result:
ID | Result
1 | A
2 | C
3 | X
4 | X
The result of A or C (IDs 1 and 2 in the example) can be obtained with a window function partitioned by ID, this way:
SELECT DISTINCT ID,
priority_indicator
FROM (SELECT ID,
first_value(Indicator) OVER
(PARTITION BY ID
ORDER BY
CASE
WHEN Indicator = 'A' THEN
1
WHEN Indicator = 'C' THEN
2
ELSE
3
END
) priority_indicator
FROM foo) a
How would you look at the values in multiple rows at once to return an "X" when there's both an "A" and a "C" in the Indicator?
--test data
WITH foo(id,indicator) AS ( VALUES
(1,'A'),
(1,'B'),
(1,'B'),
(2,'C'),
(2,'B'),
(3,'A'),
(3,'B'),
(3,'B'),
(3,'C'),
(4,'A'),
(4,'C')
),
-- get all entries for each Id in indicator_set
agg AS (
SELECT id,array_agg(DISTINCT(indicator)) AS indicator_set FROM foo
GROUP BY id
)
-- actual query
SELECT id,
CASE
WHEN indicator_set @> '{A,C}' THEN 'X'
WHEN indicator_set @> '{A}' THEN 'A'
WHEN indicator_set @> '{C}' THEN 'C'
END result
FROM agg;
Output:
id | result
----+--------
1 | A
2 | C
3 | X
4 | X
(4 rows)
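The same decision rule can be checked outside the database with a plain-Python sketch: collect the distinct indicators per ID (what array_agg(DISTINCT ...) does) and test set containment in the same order as the CASE branches. The function name is hypothetical.

```python
from collections import defaultdict

foo = [(1, "A"), (1, "B"), (1, "B"), (2, "C"), (2, "B"),
       (3, "A"), (3, "B"), (3, "B"), (3, "C"), (4, "A"), (4, "C")]

def priority_result(rows):
    """Per ID: 'X' if both A and C occur, else 'A', else 'C', else None."""
    indicator_sets = defaultdict(set)           # like array_agg(DISTINCT ...)
    for id_, indicator in rows:
        indicator_sets[id_].add(indicator)
    result = {}
    for id_, s in sorted(indicator_sets.items()):
        if {"A", "C"} <= s:                      # containment, checked first
            result[id_] = "X"
        elif "A" in s:
            result[id_] = "A"
        elif "C" in s:
            result[id_] = "C"
        else:
            result[id_] = None                   # CASE without ELSE gives NULL
    return result

print(priority_result(foo))  # {1: 'A', 2: 'C', 3: 'X', 4: 'X'}
```

Checking the two-element set first matters: an ID containing both A and C would otherwise stop at the 'A' branch.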

KDB: select first n rows from each group

How can I extract the first n rows from each group? For example, for the table:
bb: ([]sym:(4#`a),(5#`b);val: til 9)
sym val
-------
a   0
a   1
a   2
a   3
b   4
b   5
b   6
b   7
b   8
How can I select the first 2 rows of each group by sym?
Thanks
You can use fby:
q)select from bb where ({x in 2#x};i) fby sym
sym val
-------
a   0
a   1
b   4
b   5
You can try this:
q)select from bb where i in raze exec 2#i by sym from bb
sym val
-------
a   0
a   1
b   4
b   5
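Both answers can be modelled in plain Python to confirm they select the same rows. fby and take below are hypothetical helpers imitating q's fby grouping and n#x (which repeats items cyclically when a group has fewer than n rows); they are not q code.

```python
from collections import defaultdict

sym = ["a"] * 4 + ["b"] * 5
val = list(range(9))
i = list(range(len(sym)))  # models q's virtual row-index column i

def take(n, xs):
    """Model of q's n#x for a non-empty list: repeats cyclically if short."""
    return [xs[k % len(xs)] for k in range(n)]

def fby(f, col, by):
    """Model of (f;col) fby by: apply f per group, scatter results back."""
    groups = defaultdict(list)
    for pos, key in enumerate(by):
        groups[key].append(pos)
    out = [None] * len(col)
    for positions in groups.values():
        results = f([col[p] for p in positions])
        for p, r in zip(positions, results):
            out[p] = r
    return out

# Approach 1: select from bb where ({x in 2#x};i) fby sym
mask = fby(lambda x: [v in take(2, x) for v in x], i, sym)
rows_fby = [(s, v) for s, v, m in zip(sym, val, mask) if m]

# Approach 2: select from bb where i in raze exec 2#i by sym from bb
first_two = set()
for key in dict.fromkeys(sym):          # distinct syms, first-seen order
    group_idx = [p for p in i if sym[p] == key]
    first_two.update(take(2, group_idx))
rows_idx = [(sym[p], val[p]) for p in i if p in first_two]

print(rows_fby)              # [('a', 0), ('a', 1), ('b', 4), ('b', 5)]
print(rows_idx == rows_fby)  # True
```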