kdb: longList@dictionary behavior

q)(`a`b`c!101 0N 103)~100 101 102 103 104@`a`b`c!1 5 3
1b
I noticed that the two lines below are equivalent:
100 101 102 103 104@`a`b`c!1 5 3
`a`b`c!100 101 102 103 104@1 5 3
Is it true in general that a@b!c is equivalent to b!a@c?
On the overloaded glyphs page for @, is this usage Index At, Apply At, or something else? For x@y, is there documentation on the exact behavior when y is a dictionary?

An operation that works on a list will also work on a list that just so happens to be the value of a dictionary (it applies "through" the dictionary and keeps the keys untouched). E.g.:
q)1+`a`b!1 2
a| 2
b| 3
q)2*`a`b!1 2
a| 2
b| 4
Thus Index At applies the same way:
q)10 20 30@`a`b!1 2
a| 20
b| 30
since it's the same as:
q)10 20 30@1 2
20 30
a@b!c is equivalent to b!a@c only when a@c itself makes sense. If a@c fails, so does a@b!c. E.g.
q)10 20 30@1.1 2.2
'type
[0] 10 20 30@1.1 2.2
^
q)10 20 30@`a`b!1.1 2.2
'type
[0] 10 20 30@`a`b!1.1 2.2
^
It is only true if the underlying datatype of the indexes is boolean/short/int/long.
As mostly answered above, this is Index At. There's no explicit documentation for the case where y is a dictionary, as the behaviour is essentially the same as when y is a list.
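As a minimal sketch of the equivalence discussed above (the names a, d, r1 and r2 are placeholders, not from the question), Index At on a dictionary can be compared with indexing by the dictionary's values and re-attaching the keys:

```q
/ a, d, r1, r2 are illustrative names
a:100 101 102 103 104
d:`a`b`c!1 5 3
r1:a@d                  / index "through" the dictionary
r2:(key d)!a@value d    / index with the values, then re-attach the keys
show r1                 / b is null because index 5 is out of range
show r1~r2              / 1b
```

The out-of-range index gives a null rather than an error, which is why the b entry in the question's result is 0N.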

Related

Applying elements to object vs applying them to object's symbol name

I saw a technique for accessing directory elements via the directory's symbol rather than its name (see q.k):
`.q `svar`sdev`scov`med / instead of .q `svar`sdev`scov`med
Why and when is this approach useful?
Also, for some reason the behaviour is reversed compared to the @ apply:
q)l:til 5
q)`l[2 3]: 20 30 / 'assign, `l not changed
/ (upd: `l is not just a symbol here, it really does refer to the list: `l[2 3] gets the elements of l)
q)l[2 3]: 21 31; l / l changed
0 1 21 31 4
But when we use the @ apply syntax, the result is reversed:
q)@[l;2 3;:;22 32]; l / l not changed
0 1 21 31 4
q)@[`l;2 3;:;23 33]; l / `l changed
0 1 23 33 4
upd:
Applying indexes to a symbol does not work in Shakti; it looks like this idea hasn't withstood the test of time.
By directory, I think you mean dictionary as this is the data structure which is returned when you call .q.
You can access dictionary elements a number of ways:
q)d:`a`b`c!1 2 3
q)d
a| 1
b| 2
c| 3
q)d`a
1
q)`d `a
1
q)d[`a]
1
q)@[d;`a]
1
q)@[`d;`a]
1
q) / etc ...
These are all syntactic sugar for each other; it just depends on which you prefer (or which the situation dictates is better).
In the code below,
q)`l[2 3]:20 30
'assign
[0] `l[2 3]:20 30
`l is simply the symbol `l, not a reference to the list l, which is why you get an assign error.
The @ operator is slightly different:
q)@[l;0;:;20]
20 1 2 3 4
q)l
0 1 2 3 4
q) / -vs-
q)@[`l;0;:;20]
`l
q)l
20 1 2 3 4
Adding the backtick to l tells q that you want to update l in place, not just apply the operation to the list and return the result.
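The difference between amending by value and amending by name can be sketched in a short script. The variable names r and n are invented for the example:

```q
l:til 5
r:@[l;2 3;:;20 30]     / by value: returns an amended copy
show r                 / 0 1 20 30 4
show l                 / 0 1 2 3 4, the global is untouched
n:@[`l;2 3;:;21 31]    / by name: amends the global l in place
show n                 / `l, the name is returned rather than the data
show l                 / 0 1 21 31 4
```

The by-name form is handy when a function receives only the name of a table or list and has to update it, at the cost of the side effect on global state.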

kdb union join (with plus join)

I have been stuck on this for a while now and cannot come up with a solution. Any help would be appreciated.
I have two tables like these:
q)x
a b c  d
--------
1 x 10 1
2 y 20 1
3 z 30 1
q)y
a b| c d
---| ----
1 x| 1 10
3 h| 2 20
I would like to sum the common columns and append the new rows. The expected result is
a b c  d
---------
1 x 11 11
2 y 20 1
3 z 30 1
3 h 2  20
pj only seems to update the (1,x) row and doesn't insert the new (3,h) row. I am assuming there has to be a way to do some sort of union + plus join in kdb.
You can take advantage of the plus (+) operator here by simply keying x and adding the table y to get the desired table:
q)(2!x)+y
a b| c  d
---| -----
1 x| 11 11
2 y| 20 1
3 z| 30 1
3 h| 2  20
The same "plus if there's a matching key, insert if not" behaviour works for dictionaries too:
q)(`a`b!1 2)+`a`c!10 30
a| 11
b| 2
c| 30
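For anyone who wants to reproduce this, here is a small sketch that builds the two tables from the question and applies the keyed addition. The table literals are reconstructed from the printed output, so the column types are an assumption:

```q
/ column types are assumed from the question's printed tables
x:([] a:1 2 3; b:`x`y`z; c:10 20 30; d:1 1 1)
y:([a:1 3; b:`x`h] c:1 2; d:10 20)
r:(2!x)+y      / key x on its first two columns, then add the keyed table y
show r         / matching keys are summed, new keys are appended
show 0!r       / unkey to get back a plain table
```

The final 0! is only needed if the result should have the same unkeyed shape as x.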
got it :)
q) (x pj y), 0!select from y where not ([]a;b) in key 2!x
a b c  d
---------
1 x 11 11
2 y 20 1
3 z 30 1
3 h 2  20
Always open for a better implementation :D I am sure there is one.

Distribute elements of one list over elements of another list

I have two lists:
l1:`a`b`c;
l2: til 20;
I am trying to create a dictionary 'd' that contains the elements of 'l1' as key and the elements of 'l2' evenly distributed over it. So like this:
d:(`a`b`c)!(0j, 3j, 6j, 9j, 12j, 15j, 18j;1j, 4j, 7j, 10j, 13j, 16j, 19j;2j, 5j, 8j, 11j, 14j, 17j)
The order of the elements is not relevant, I just need them balanced. I was able to achieve that in an iterative way (happy to add the code, if that's considered helpful), but there must be a more elegant way (potentially with adverbs?).
This can be done using group:
q)group (count[l2]#l1)
(`a`b`c)!(0j, 3j, 6j, 9j, 12j, 15j, 18j;1j, 4j, 7j, 10j, 13j, 16j, 19j;2j, 5j, 8j, 11j, 14j, 17j)
If your l2 is something other than til 20, you have to look the items up again after grouping:
q)l2: 20#.Q.a
q)l2
"abcdefghijklmnopqrst"
q)l2 group (count[l2]#l1) // lookup the items back from l2 after grouping
(`a`b`c)!("adgjmps";"behknqt";"cfilor")
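The steps of this approach can be separated out as follows. The intermediate names k and g are made up for the sketch:

```q
l1:`a`b`c
l2:20#.Q.a
k:count[l2]#l1    / cycle the keys: `a`b`c`a`b`c... one per element of l2
g:group k         / each key mapped to the positions where it occurs
show g
show l2 g         / index l2 with those positions to get the actual items
```

Because group returns positions, the lookup step is a no-op when l2 happens to be til 20, which is why the first version works without it.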
You can use the reshape functionality of the take operator #. It takes two arguments: a left-hand side of two or more dimensions and the list to reshape.
For example, (3;4)#til 12 reshapes the list 0 1 ... 11 into a 3 by 4 matrix.
In our case, the number of elements in l1 will not necessarily divide exactly into the number of elements in l2 (we don't want a rectangular matrix). Instead, we can supply a null as the second dimension, which takes care of distributing the remainder.
q) l1!(count[l1];0N)#l2
a| 0 1 2 3 4 5
b| 6 7 8 9 10 11 12
c| 13 14 15 16 17 18 19
This method performs very well for larger input lists.
As a side note, when using .Q.fc to split a vector argument over n slaves for multi-threading, kdb uses the # operator to reshape the vector into n vectors, one for each slave.
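A short sketch of the reshape approach, with a check that nothing is lost or reordered. How the remainder is spread across the rows may depend on the kdb+ version, so the bucket sizes are printed rather than assumed:

```q
show 3 4#til 12           / exact fit: rows 0 1 2 3, 4 5 6 7 and 8 9 10 11
l1:`a`b`c
l2:til 20
d:l1!(count[l1];0N)#l2    / 0N leaves the row length for q to work out
show count each d         / sizes of the three buckets
show l2~raze value d      / flattening the buckets should give back l2
```

Unlike the group approach, this keeps neighbouring elements together in the same bucket instead of dealing them out round-robin.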
q)d:`a`b`c!{a where x = (a:til 20) mod y}'[til 3;3]
q)d
a| 0 3 6 9 12 15 18
b| 1 4 7 10 13 16 19
c| 2 5 8 11 14 17

RISC V manual confusion: instruction format VS immediate format

I have a question about the RISC-V manual.
It has different types of instruction encoding, such as R-type and I-type,
just like the MIPS encoding.
* R-type
 31        25 24     20 19     15 14  12 11      7 6           0
+------------+---------+---------+------+---------+-------------+
|   funct7   |   rs2   |   rs1   |funct3|   rd    |   opcode    |
+------------+---------+---------+------+---------+-------------+
* I-type
 31                  20 19     15 14  12 11      7 6           0
+----------------------+---------+------+---------+-------------+
|         imm          |   rs1   |funct3|   rd    |   opcode    |
+----------------------+---------+------+---------+-------------+
* S-type
 31        25 24     20 19     15 14  12 11      7 6           0
+------------+---------+---------+------+---------+-------------+
|    imm     |   rs2   |   rs1   |funct3|   imm   |   opcode    |
+------------+---------+---------+------+---------+-------------+
* U-type
 31                                   12 11      7 6           0
+---------------------------------------+---------+-------------+
|                  imm                  |   rd    |   opcode    |
+---------------------------------------+---------+-------------+
But it also has something called immediate formats,
such as I-immediate, S-immediate and so on:
* I-immediate
 31                                     11 10        5 4     1  0
+-----------------------------------------+-----------+-------+--+
|                 <-- 31                  |   30:25   | 24:21 |20|
+-----------------------------------------+-----------+-------+--+
* S-immediate
 31                                     11 10        5 4     1  0
+-----------------------------------------+-----------+-------+--+
|                 <-- 31                  |   30:25   | 11:8  |7 |
+-----------------------------------------+-----------+-------+--+
* B-immediate
 31                                  12 11 10        5 4     1  0
+--------------------------------------+--+-----------+-------+--+
|                <-- 31                |7 |   30:25   | 11:8  |z |
+--------------------------------------+--+-----------+-------+--+
* U-immediate
 31 30               20 19           12 11                      0
+--+-------------------+---------------+-------------------------+
|31|       30:20       |     19:12     |          <-- z          |
+--+-------------------+---------------+-------------------------+
* J-immediate
 31                  20 19           12 11 10        5 4     1  0
+----------------------+---------------+--+-----------+-------+--+
|        <-- 31        |     19:12     |20|   30:25   | 24:21 |z |
+----------------------+---------------+--+-----------+-------+--+
According to the manual, those immediates are produced by RISC-V instructions, but how are the two sets of formats related?
What is the point of having immediate formats?
The 2nd set of diagrams is showing you how the immediate bits are concatenated and sign-extended into a 32-bit integer (so they can work as a source operand for normal 32-bit ALU instructions like addi which need both their inputs to be the same size).
For I-type instructions it's trivial, just arithmetic right-shift the instruction word by 20 bits, because there's only one immediate field, and it's contiguous at the top of the instruction word.
For S-type immediate instructions, there are two separate fields in the instruction word: [31:25] and [11:7], and this shows you that they're in that order, not [11:7, 31:25] and not with any implicit zeros between them.
B-type immediate instructions apparently put bit 7 in front of [30:25], and the low bit is an implicit zero. (So the resulting number is always even). I assume B-type is for branches.
U-type is also interesting, padding the 20-bit immediate with trailing zeros. It's used for lui to create the upper bits of 32-bit constants (with addi supplying the rest). It's not a coincidence that U-type and I-type together have 32 total immediate bits.
To access static data, lui can create the high part of an address while lw can supply the low part directly, instead of using an addi to create the full address in a register. This is typical for RISC ISAs like MIPS and PowerPC as well (see an example on the Godbolt compiler explorer). But unlike most other RISC ISAs, RISC-V has auipc which adds the U-type immediate to the program counter, for efficient PIC without having to load addresses from a GOT (global offset table). (A recent MIPS revision also added an add-to-PC instruction, but for a long time MIPS was quite bad at PIC).
lui can encode any 4k-aligned address, i.e. a page-start address with 4k pages.
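To make the I-type and S-type cases concrete, here is a rough sketch in q that reassembles the immediate from a 32-bit instruction word. The helper names (bits, signext, immI, immS) are invented for this example, and the three instruction words were hand-assembled, so treat them as assumptions too:

```q
/ Sketch only: bits, signext, immI and immS are made-up helper names.
/ Bits are kept most-significant first, so instruction bit n sits at list index 31-n.
bits:{0b vs x}                              / 32-bit int -> list of 32 booleans
signext:{0b sv ((32-count x)#first x),x}    / copy the top bit leftwards up to 32 bits
immI:{signext 12#bits x}                    / inst[31:20]
immS:{b:bits x; signext (7#b),b 20+til 5}   / inst[31:25] followed by inst[11:7]

show immI 0x0 sv 0xfff00093   / addi x1,x0,-1 : encoded immediate is -1
show immI 0x0 sv 0x00500093   / addi x1,x0,5  : encoded immediate is 5
show immS 0x0 sv 0x0020a423   / sw x2,8(x1)   : encoded immediate is 8
```

The only difference between the two decoders is which instruction bits are gathered before sign extension, which is exactly what the immediate diagrams in the question describe.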

KDB '.' operator

The . operator in its simplest form is used to index a list. How would you explain its use in English in this code?
if[x~"last";upd:{[t;x].[t;();,;r::select by sym from x]}]
I also don't understand the empty list and the :: operator in this line, but maybe they will make sense once the . is cleared up.
In plain English I would explain it as:
modify the table t at all () indices by applying the append/comma function with the value r.
First consider a few simpler cases of @:
q)l:3 5 7 9
q)l:1.1 2.2 3.3
q)@[l; 0 2; +; 10]
11.1 2.2 13.3
q)d:`p`o`i!4.4 5.5 6.6
q)@[d; `p`i; -; 10]
p| -5.6
o| 5.5
i| -3.4
As you can see, the format is
@[dataStructure; indices; function; y-arg]
This means: to the dataStructure at indices, apply the function with the given y-argument. Notice that for the list l, the indices 0 2 meant index both 0 and 2 at the topmost level. There's no way to index at depth using @. E.g. given the matrix m:(1 2 3; 4 5 6; 7 8 9), how can we use this format to modify just the values 4 and 6?
q)/ @ indexes repeatedly at topmost level
q)/ definitely not what we want
q)@[m;(1;0 2);+;100]
101 102 103
104 105 106
107 108 109
q)/ . indexes into the data structure
q).[m;1 2;+;100]
1 2 3
4 5 106
7 8 9
q).[m;(1;0 2);+;100]
1 2 3
104 5 106
7 8 9
Lastly the empty list () is a short way of saying, apply to all indices:
q).[m;();+;100]
101 102 103
104 105 106
107 108 109
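Applied to the upd function in the question, the empty index list means the whole table is the target of the join. A small sketch with toy tables (t and r here are invented, not the tick tables):

```q
/ t and r are toy tables invented for this example
t:([] sym:`a`b; px:1 2f)
r:([] sym:`a`c; px:3 4f)
show .[t;();,;r]          / by value: a new table, the rows of t followed by those of r
show (.[t;();,;r])~t,r    / same result as a plain join
.[`t;();,;r];             / by name: the global t is amended in place
show count t              / 4
```

In the original upd, t is presumably passed as a table name, so the by-name form is the one in play and the global table grows on every call.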
In this case, . means apply , to t and r. r is a global that is updated on each call and contains the last values received by sym. :: is assignment to a global in most cases.
code.kx.com describes the . function in great detail.