Splitting a list of strings using cut

For the following list:
q)a:("ua#1100#1";"sba#2220#2";"r#4444#a")
I want the following output:
("1100#1";"2220#2";"4444#a")
? gives the first index of # in each string:
q)(a?\:"#")
2 3 1
but using cut with those indices does not give the desired result:
q)(a?\:"#")cut'a
(("ua";"#1";"10";"0#";"1");("sba";"#22";"20#";"2");("r";"#";"4";"4";"4";"4";"#";"a"))

You can also parse the data rather than drop chars from each string.
It'll be somewhat more efficient if your dataset is large.
q)("J#*"0:/:a)[;1]
"1100#1"
"2220#2"
"4444#a"
Notice I've set the 'key' to 'J' which will result in nulls in your example case, but you only care about the values anyway.
If you can join (sv) the strings together first, that will be better still:
q)last "J#;"0:";" sv a
"1100#1"
"2220#2"
"4444#a"
HTH,
Sean

When the left argument of cut is an atom, cut behaves differently from _.
q)2 cut 2 3 4 5 6
(2 3;4 5;,6)
q)2 _ 2 3 4 5 6
4 5 6
Use _ to cut the string
q)(1+a?\:"#")_'a
("1100#1";"2220#2";"4444#a")
or
q)"#"sv/:1_/:"#" vs/:a
("1100#1";"2220#2";"4444#a")
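The difference between the two forms can be checked directly. This is a minimal sketch, assuming the same list a as in the question; the helper name afterFirst is hypothetical and not part of q.

```q
a:("ua#1100#1";"sba#2220#2";"r#4444#a")

/ atom on the left of cut: split into chunks of that length
2 cut "ua#1100#1"
/ gives ("ua";"#1";"10";"0#";,"1")

/ atom on the left of _ : drop that many leading items
2 _ "ua#1100#1"
/ gives "#1100#1"

/ hypothetical helper: drop everything up to and including the first "#"
afterFirst:{(1+x?"#")_x}
afterFirst each a
/ gives ("1100#1";"2220#2";"4444#a")
```

If a string contains no # at all, x?"#" returns count x, so the helper returns an empty string rather than signalling an error.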

How do I convert a dictionary of dictionaries into a table?

I've got a dictionary of dictionaries:
`1`2!((`a`b`c!(1 2 3));(`a`b`c!(4 5 6)))
| a b c
-| -----
1| 1 2 3
2| 4 5 6
I'm trying to work out how to turn it into a table that looks like:
1 a 1
1 b 2
1 c 3
2 a 4
2 b 5
2 c 6
What's the easiest/'right' way to achieve this in KDB?
Not sure if this is the shortest or best way, but my solution is:
ungroup flip`c1`c2`c3!
{(key x;value key each x;value value each x)}
`1`2!((`a`b`c!(1 2 3));(`a`b`c!(4 5 6)))
Which gives expected table with column names c1, c2, c3
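One possible alternative, shown only as a sketch: build a small table for each outer key and join the pieces with raze. The name d and the column names k, p and v are arbitrary choices for illustration.

```q
d:`1`2!((`a`b`c!(1 2 3));(`a`b`c!(4 5 6)))

/ x is an outer key, y is the inner dictionary for that key;
/ the atom x is extended to the length of the other columns
t:raze {([]k:x;p:key y;v:value y)}'[key d;value d]
t
```

The result t should hold six rows, one per (outer key, inner key) pair, with the values in column v.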
What you're essentially trying to do is to "unpivot" - see the official pivot page here: https://code.kx.com/q/kb/pivoting-tables/
Unfortunately that page doesn't give a function for unpivoting as it isn't trivial and it's hard to have a general solution for it, but if you search the Kx/K4/community archives for "unpivot" you'll find some examples of unpivot functions, for example this one from Aaron Davies:
unpiv:{[t;k;p;v;f] ?[raze?[t;();0b;{x!x}k],'/:(f C){![z;();0b;x!enlist each (),y]}[p]'v xcol't{?[x;();0b;y!y,:()]}/:C:(cols t)except k;enlist(not;(.q.each;.q.all;(null;v)));0b;()]};
Using this, your problem (after a little tweak to the input) becomes:
q)t:([]k:`1`2)!((`a`b`c!(1 2 3));(`a`b`c!(4 5 6)));
q)`k xasc unpiv[t;1#`k;1#`p;`v;::]
k v p
-----
1 1 a
1 2 b
1 3 c
2 4 a
2 5 b
2 6 c
This solution is probably more complicated than it needs to be for your use case as it tries to solve for the general case of unpivoting.
Just an update to this: I solved this problem a different way to the selected answer.
In the end, I converted each row into a table with one row in it and all the columns I needed, then joined all the tables together.

Applying adverb to colon operator

Please help me with the colon (:) operator; I'm stuck on how it works. It works as an assignment, as assignment through x+:1, as global assignment/view ::, for I/O as 0:, 1:, to return a value from the middle of a function as :r, and to get a unary form of an operator as #:.
But what happens if one applies an adverb to it? I tried this:
$ q
KDB+ 3.6 2019.04.02 Copyright (C) 1993-2019 Kx Systems
q)(+')[100;2 3 4]
102 103 104
q)(:')[x;2 3 4]
'x
[0] (:')[x;2 3 4]
^
q)(:')[100;2 3 4]
2 3 4
I expected the evaluations to happen in order: x:2, then x:3, then x:4, giving x:4 as the result. But I got an error. Also, :' works with the number 100 for some unknown reason.
What is :' actually doing?
q)parse "(:')[100;2 3 4]"
(';:)
100
2 3 4
Parsing didn't shed much light on it for me, so I'm asking for your help.
When modified by an iterator (also known as an adverb in q speak), : behaves just like any other binary operator. In your example
q)(:')[100;2 3 4]
2 3 4
an atom 100 is extended to a conformant list 100 100 100 and then : is applied to elements of the two lists pairwise. The final result is returned. It might look confusing (: tries to modify a constant value, really?) but if you compare this to any other binary operator and notice that they never modify their operands but return a result of expression everything should click into place.
For example, compare
q)+'[100; 2 3 4]
102 103 104
and
q)(:')[100;2 3 4]
2 3 4
In both cases a temporary vector 100 100 100 is created implicitly and an operator is applied to it and 2 3 4. So the former is semantically equivalent to
(t[0]+2;t[1]+3;t[2]+4)
and the latter to
(t[0]:2;t[1]:3;t[2]:4)
where t is that temporary vector.
This explains why (:')[x;2 3 4] gives an error -- if x doesn't exist kdb can't extend it to a list.
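A short sketch of that behaviour, assuming a fresh session. The lambda {y} is only a stand-in to show that the pairwise form of : hands back its right argument; it is not how q implements it.

```q
/ applied pairwise, : simply returns its right argument
(:')[100;2 3 4]        / 2 3 4
{y}'[100;2 3 4]        / same result from an explicit lambda

/ a defined variable on the left is read, not assigned to
x:100
(:')[x;2 3 4]          / 2 3 4
x                      / still 100
```

Once x exists, the expression no longer fails, but x itself is left untouched because its value is looked up before : is applied.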

Understanding how to read each-right and each-left combined in kdb

From Q for Mortals, I'm struggling to understand how to read this, and understand it logically.
1 2 3,/:\:10 20
I understand the result is a cross product when in full form: raze 1 2 3,/:\:10 20.
But reading from left to right, I'm currently lost at understanding what this yields (in my head)
\:10 20
combined with 1 2 3,/: ??
Help in understanding how to read this clearly (in words or clear logic) would be appreciated.
I found myself saying the following in my head whilst I program the syntax in q. q works from right to left.
Internal Monologue -> Join the string on the right onto each of the strings on the left
code -> "ABC",\:"-D"
result -> "A-D"
"B-D"
"C-D"
I think that's an easy way to understand it. 'join' can be replaced with whatever...
Internal Monologue -> Does the string on the right match any of the strings on the left
code -> ("Cat";"Dog";"CAT";"dog")~\:"CAT"
result -> 0010b
Each-right is the same concept and combining them is straightforward also;
Internal Monologue -> Does each of the strings on the right match each of the strings on the left
code -> ("Cat";"Dog";"CAT";"dog")~\:/:("CAT";"Dog")
result -> 0010b
0100b
So in your example 1 2 3,/:\:10 20 - you're saying 'Join each of the elements on the right to each of the elements on the left'
Hope this helps!!
EDIT To add a real world example.... - consider the following table
q)show tab:([] upper syms:10?`2; names:10?("Robert";"John";"Peter";"Jenny"); amount:10?til 10)
syms names amount
--------------------
CF "Peter" 8
BP "Robert" 1
IC "John" 9
IN "John" 5
NM "Peter" 4
OJ "Jenny" 6
BJ "Robert" 6
KH "John" 1
HJ "Peter" 8
LH "John" 5
q)
If you want to get all records where the name is Robert, you can do: select from tab where names like "Robert"
But if you want to get the results where the name is either Robert or John, then it is a perfect scenario to use our each-left and each-right.
Consider the names column - it's a list of strings (a list where each element is a list of chars). What we want to ask is 'does any of the strings in the names column match any of the strings we want to find'... that translates to (namesList)~\:/:(list;of;names;to;find). Here are the steps:
q)(tab`names)~\:/:("Robert";"John")
0100001000b
0011000101b
From that result we want a compiled list of booleans where each element is true if it is true for Robert OR John - for example, if you look at index 1 of both lists, it's 1b for Robert and 0b for John - in our result, the value at index 1 should be 1b. Index 2 should be 1b, index 3 should be 1b, index 4 should be 0b etc... To do this, we can apply the any function (or max or sum!). The result is then;
q)any(tab`names)~\:/:("Robert";"John")
0111001101b
Putting it all together, we get;
q)select from tab where any names~\:/:("Robert";"John")
syms names amount
--------------------
BP "Robert" 1
IC "John" 9
IN "John" 5
BJ "Robert" 6
KH "John" 1
LH "John" 5
q)
Firstly, q is executed (and hence generally read) right to left. This means that it's interpreting the \: as a modifier to be applied to the previous function, which itself is a simple join modified by the /: adverb. So the way to read this is "Apply join each-right to each of the left-hand arguments."
In this case, you're applying the two adverbs to the join - \:10 20 on its own has no real meaning here.
I find it helpful to also look at the converse case 1 2 3,\:/:10 20. Running that code produces two rows of three pairs each (it prints like a 2x6 matrix), which I'd describe more like "apply join each-left to each of the right-hand arguments" ... I hope that makes sense.
An alternative syntax which also might help is ,/:\:[1 2 3;10 20] - this might be useful as it makes it very clear what the function you're applying is, and is equivalent to your in-place notation.
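The two orderings discussed above can be compared side by side. This is a small sketch; the names r1 and r2 are arbitrary.

```q
/ each-right inside, each-left outside: one row per LEFT item
r1:1 2 3,/:\:10 20
/ r1 is ((1 10;1 20);(2 10;2 20);(3 10;3 20))
count r1                         / 3

/ each-left inside, each-right outside: one row per RIGHT item
r2:1 2 3,\:/:10 20
/ r2 is ((1 10;2 10;3 10);(1 20;2 20;3 20))
count r2                         / 2

/ the bracket form applies the same derived function
r1~(,/:\:)[1 2 3;10 20]          / 1b

/ flattening r1 gives the cross product from the question
(raze r1)~1 2 3 cross 10 20      / 1b
```

Reading the outermost (rightmost) adverb first tells you which argument determines the number of rows in the result.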

create a new matrix from values obtained iterating through other matrices

In Matlab I have 4 matrices which are all 1 (row) by 4 (columns) (ABCD, EFGH, IJKL, MNOP).
Their names are also stored in a list
Stock_List2 = {'ABCD' 'EFGH' 'IJKL' 'MNOP'} and is a 1 by 4 cell.
I want to iterate through the list and create a new matrix called "display" which takes the values of the individual matrices and places them underneath each other.
I am trying something like
for e = 1:length(Stock_List2)
display(e) = eval(strcat(Stock_List2)(e))
end
However, I get the following error, which truthfully may well just mean that I'm way off the mark:
Error: ()-indexing must appear last in an index expression.
As an example, if the original matrices are as follows:
ABCD 1 2 3 4
EFGH 5 6 7 8
IJKL 9 8 7 6
MNOP 5 4 3 2
I would like the final output, i.e. the 'display' matrix, to be a 4 by 4 matrix looking like
display
1 2 3 4
5 6 7 8
9 8 7 6
5 4 3 2
If I understood right, you want to concatenate vertically the matrices ABCD, EFGH, IJKL and MNOP, saving them in the matrix "display".
You could do:
display = [ABCD; EFGH; IJKL; MNOP]
or:
for i=1:length(Stock_List2)
display(i,:) = eval(Stock_List2{i}); % look up the matrix named by the i-th string
end
Apologies if what I wanted wasn't clear. I've got the following from a colleague, which achieves the desired result:
for e=1:length(Stock_List2)
eval(strcat('display_mat(e,:) = ',Stock_List2{e}));
end

Reshape (#) doesn't work with a dynamic argument

To form a matrix consisting of identical rows, one could use
x:1 2 3
2 3#x,x
which produces (1 2 3i;1 2 3i) as expected. However, attempting to generalise this thus:
2 (count x)#x,x
produces a type error although the types are equal:
(type 3) ~ type count x
returns 1b. Why doesn't this work?
The following should work.
q)(2;count x)#x,x
1 2 3
1 2 3
If you look at the parse tree of both your statements you can see that the second is evaluated differently. In the second only the result of count is passed as an argument to #.
q)parse"2 3#x,x"
#
2 3
(,;`x;`x)
q)parse"2 (count x)#x,x"
2
(#;(#:;`x);(,;`x;`x))
If you're looking to build matrices with identical rows you might be better off using
rownum#enlist x
q)x:100000?100
q)\ts do[100;v1:5 100000#x,x]
157 5767696j
q)\ts do[100;v2:5#enlist x]
0 992j
q)v1~v2
1b
I for one find this more natural (and it's faster!)
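To tie the approaches together, here is a brief sketch that builds the same matrix three ways and checks that they agree. The names m1, m2 and m3 are arbitrary.

```q
x:1 2 3

/ writing 2 next to (count x) does not build a list;
/ use an explicit list (a;b) or join the two atoms with a comma
m1:(2;count x)#x,x
m2:(2,count x)#x,x

/ taking rows from a one-row list avoids the reshape entirely
m3:2#enlist x

(m1~m2) and m2~m3     / 1b
```

With the take-from-enlist form the row count is the only number that has to be supplied, since the row length comes from x itself.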