kdb ticker plant: where to find documentation on .u.upd? - kdb

I am aware of this resource, but it does not spell out what parameters .u.upd takes or how to check whether it worked.
This statement executes without error, although it does not seem to do anything:
.u.upd[`t;(`$"abc";1;2;3)]
If I define the table beforehand, e.g.
t:([] name:"aaa";a:1;b:2;c:3)
then the above .u.upd still runs without error, and does not change t.

.u.upd has the same function signature as insert (see http://code.kx.com/q/ref/qsql/#insert) in prefix form. In the simplest case, .u.upd may be defined as insert itself.
So:
.u.upd[`table;<records>]
For example:
q).u.upd:insert
q)show tbl:([] a:`x`y;b:10 20)
a b
----
x 10
y 20
q).u.upd[`tbl;(`z;30)]
,2
q)show tbl
a b
----
x 10
y 20
z 30
q).u.upd[`tbl;(`a`b`c;1 2 3)]
3 4 5
q)show tbl
a b
----
x 10
y 20
z 30
a 1
b 2
c 3
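The question also asked how to check that the call worked. A minimal check, assuming .u.upd is just insert as in the example above: insert returns the indices of the new rows, and comparing the row count before and after confirms the change. In a standard tick.q setup the tickerplant typically logs and publishes rather than keeping the data itself, so there the equivalent check would usually be made on a subscriber such as the RDB.

```q
/ Assumes, as in the answer above, that .u.upd is simply insert
.u.upd:insert
tbl:([] a:`x`y; b:10 20)
n:count tbl                              / row count before the update
r:.u.upd[`tbl;([] a:`p`q; b:40 50)]      / several rows can also be passed as a table
r                                        / indices of the new rows: 2 3
(count tbl)-n                            / number of rows actually added: 2
```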

Documentation including the event sequence, connection diagram etc. for tickerplants can be found here:
http://www.timestored.com/kdb-guides/kdb-tick-data-store
.u.upd[tableName; tableData] accepts two arguments and inserts data
into a named table. It is normally called from a feedhandler. It takes
tableData, adds a time column if one is not already present, inserts
the data into the in-memory table, appends it to the log file and
finally increments the log file counter.
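A simplified sketch of the steps just listed, in q. This is not the real tick.q code. It assumes a single row per call, a hypothetical table trade whose first column is a timespan called time, and a hypothetical log path /tmp/sketch.log. Log messages use the (`upd;table;data) shape that tick.q writes, so the file should be replayable with -11! provided a root-level upd function is defined.

```q
/ Simplified illustration only, NOT tick.q: one row per call, hypothetical names/paths
trade:([] time:`timespan$(); sym:`symbol$(); px:`float$())
.u.L:`:/tmp/sketch.log
.[.u.L;();:;()]                          / create (or truncate) an empty log file
.u.l:hopen .u.L                          / handle used to append messages to the log
.u.i:0                                   / log message counter
.u.upd:{[t;x]
  if[not -16h=type first x; x:.z.n,x];   / prepend a timespan unless the row already starts with one
  t insert x;                            / insert into the in-memory table
  .u.l enlist(`upd;t;x);                 / append the message to the log file
  .u.i+:1;}                              / increment the log counter
.u.upd[`trade;(`abc;1.5)]
.u.upd[`trade;(`xyz;2.5)]
```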

Related

count t vs actual number in kdb select statement

I noticed the following
select (count t)#`test from t
Returns
flip (enlist `x)!enlist enlist `test`test`test
Vs
select 3#`test from t
Which returns
flip (enlist `x)!enlist `test`test`test
Similar with select (sum 1 2)#1 from t vs select(1 + 2)#1 from t etc
Does anyone know why keywords in the select seem to make the result a one-row table holding a nested list of x elements, rather than a table with x rows?
It's because kdb recognises count and sum as aggregations and has special treatment for them (it enlists the result).
For example if you were to slightly change the count and sum to lambdas (which kdb won't recognise) you get the other results you expect:
q)select ({count x}t)#`test from t
x
----
test
test
test
q)select ({sum x}1 2)#1 from t
x
-
1
1
1
The reason kdb "recognises" certain common aggregations and auto-enlists them is that otherwise simple selects such as select sum a from tab would give a rank error: sum returns an atom, but a table column must be a list, e.g.
q)select {sum x}a from t
'rank
[0] select {sum x}a from t
^
/versus
q)select sum a from t
a
-
6
There's also a deeper reason which is to do with map/reduce aggregations over database partitions but that's beyond scope for this problem. The list of recognised aggregations is stored in the variable .Q.a0. See also https://code.kx.com/q/basics/qsql/#special-functions
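A small sketch contrasting the cases above, assuming a three-row table t. It also shows one workaround when an aggregate is not recognised, such as a lambda: enlist its result yourself so that the column is a one-item list.

```q
t:([] a:1 2 3)
r1:select x:(count t)#`test from t      / count is recognised: result enlisted, 1 row
r2:select x:({count x}t)#`test from t   / lambda not recognised: 3 rows
r3:select s:enlist {sum x} a from t     / unrecognised aggregate, enlisted by hand: 1 row
(count r1;count r2;count r3)
```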

KDB - Is there a limit to the number of functions called at one time when updating tables?

I’ve been running a number of functions to update a table, and I keep adding more as I want to update and call other items. I have not run into any issues yet (currently at 7 functions), but I’m mindful that there may be a limit. I did find that a single function is limited to 8 parameters, but nothing about a limit on the pattern below. If there isn't one, great; I just want to be mindful as I scale up.
updateTable: FuncG FuncF FuncE FuncD FuncC FuncB FuncA ::; // max number of functions?
t: updateTable t;
I built a dummy update statement with 1000 chained function calls, and it seems you're fine:
q)t:([]a:1 2 3)
q)f:{x+1}
q)value "update ",(raze 1000#enlist"f "),"a from t"
a
----
1001
1002
1003
One thing you might want to do is make a single function composed from a list of your functions:
q)f:{x+1}
q)g:{2*x}
q)h:{x+1+2}
q)(('[;])/)(f;g;h)
{x+1}{2*x}{x+1+2}
q)composed:(('[;])/)(f;g;h)
q)t:([]a:1 2 3)
q)update composed a from t
a
--
9
11
13
so that you only have a single function in your update statement, and it should scale.
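If the list of functions keeps growing, another option is to keep them in a plain list and fold over it. In the sketch below, applyAll is a hypothetical helper, not a built-in. It feeds a value through each function in order, so the list (h;g;f) gives the same result as the composition of (f;g;h) above.

```q
f:{x+1}; g:{2*x}; h:{x+1+2}
fns:(h;g;f)                          / applied in this order: h first, then g, then f
applyAll:{[fs;v] {y x}/[v;fs]}       / hypothetical helper: fold v through each function
t:([]a:1 2 3)
update a:applyAll[fns] a from t      / a becomes 9 11 13, as with composed above
```

Keeping the functions as data means a new step is just another item appended to fns, and the update statement itself never changes.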

How serializing foreign keyed table works internally in kdb

I have a keyed table (the referenced table) linked via a foreign key to a referencing table, and I serialize both tables using the set operator.
q)kt:([sym:`GOOG`AMZN`FB]; px:20 30 40);
q)`:/Users/uts/db/kt set kt
q)t:([] sym:`kt$5?`GOOG`AMZN`FB; vol:5?10000)
q)`:/Users/uts/db/t set t
Then I remove these tables from the memory
q)delete kt,t from `.
Now I deserialize the table t in memory:
t:get `:/Users/uts/db/t
If I do meta t after this it fails, expecting kt as foreign key.
If I print t, as expected it shows index values in column sym of table t.
So, the question arises -
As kdb stores the meta of each table (i.e. c, t, f, a) and its corresponding values on disk, how does serialization of table t work internally?
In what binary form are these values stored in file t?
-rw-r--r-- 1 uts staff 100 Apr 13 23:09 t
tl;dr A foreign key is stored as a vector of 4-byte indices of a key column of a referenced table plus a name of a table a foreign key refers to.
As far as I know kx never documented their file formats, and yet I think some useful information relevant to your question can be deduced right from a q console session.
Let me modify your example a bit to make things simpler.
q)show kt:([sym:`GOOG`AMZN`FB]; px:20 30 40)
sym | px
----| --
GOOG| 20
AMZN| 30
FB  | 40
q)show t:([] sym:`kt$`GOOG`GOOG`AMZN`FB`FB)
sym
----
GOOG
GOOG
AMZN
FB
FB
I left only one column - sym - in t because vol is not relevant to the question. Let's save t without any data first:
q)`:/tmp/t set 0#t
`:/tmp/t
q)hcount `:/tmp/t
30
Now we know that it takes 30 bytes to represent t when it's empty. Let's see if there's a pattern when we start adding rows to t:
q){`:/tmp/t set x#t;`cnt`size!(x;hcount[`:/tmp/t] - 30)} each til[11], 100 1000 1000000
cnt size
---------------
0 0
1 4
2 8
3 12
4 16
5 20
6 24
7 28
8 32
9 36
10 40
100 400
1000 4000
1000000 4000000
We can see that adding one row increases the size of t by four bytes. What can these 4 bytes be? Can they be a representation of the symbol itself? No: if they were, renaming a sym value in kt would change the size of t on disk, and it doesn't:
q)update sym:`$50#.Q.a from `kt where sym=`GOOG
`kt
q)1#t
sym
--------------------------------------------------
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx
q)`:/tmp/t set 1#t
`:/tmp/t
q)hcount `:/tmp/t
34
Still 34 bytes. I think it should be obvious by now that the 4 bytes are an index, but an index of what? Is it an index of a column that must be called exactly sym? Apparently not.
q)kt:`foo xcol kt
q)t
sym
--------------------------------------------------
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx
AMZN
FB
FB
There's no column called sym in kt any longer, but t hasn't changed at all! We can go even further and change the type of foo (formerly sym) in kt:
q)update foo:-1 -2 -3.0 from `kt
`kt
q)t
sym
---
-1
-1
-2
-3
-3
Not only did it change t, it changed its meta too:
q)meta t
c | t f a
---| ------
sym| f kt
q)/ ^------- used to be s
I hope it's clear now that kdb stores a 4-byte index into the key column of the referenced table, plus the name of that table (but not the key column's name!). If the referenced table is missing, kdb can't reconstruct the original data and displays the bare index. If a referencing table needs to be sent over the wire, the indices are replaced with the actual values so that the receiving side can see the real data.
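Two built-ins make this visible from the console: fkeys reports which table each foreign-key column references, and casting the column to int should expose the stored row indices into kt. A sketch using the simplified tables from this answer; treat the int cast on an enumeration as an assumption to confirm on your kdb+ version.

```q
kt:([sym:`GOOG`AMZN`FB] px:20 30 40)
t:([] sym:`kt$`GOOG`GOOG`AMZN`FB`FB)
fkeys t              / foreign-key column -> referenced table name
`int$t`sym           / the stored row indices into kt
```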

How to select a column containing dot in column name in kdb

I have a table which consists of column named "a.b"
q)t:([]a.b:3?10.0; c:3?10; d:3?`3)
How can we select column a.b and c from table t?
How can we rename column a.b to b?
Is it possible to achieve above two cases without functional select?
Failed attempts:
q)select a.b, c from t
'type
q)?[`t;();0b;enlist (`b`c!`a.b`c)]
'type
q)select b:a.b from t
'type
As others have mentioned, .Q.id t will sanitise table column names if they aren't suitable for qSQL statements or performance in general.
`a.b`c#t
will only work for multiple column selects and
`a.b#t
will return a type error. However, you can get around this by enlisting the single item into the take operator, like so:
q)enlist[`a.b]#t
a.b
---------
4.931835
5.785203
0.8388858
q)(enlist`a.b)#t
a.b
---------
4.931835
5.785203
0.8388858
If you only need the values from a single column, another option is indexing: t[`a.b] (or t`a.b) returns all values from the a.b column.
You could also mix these selection styles like so, but the a.b column then comes back named x:
q)select c,t[`a.b] from t
c x
----------
8 4.707883
5 6.346716
4 9.672398
In a qSQL query the dot itself is used for foreign-key navigation, so q throws a type error: it cannot find a table behind the foreign key it believes you have passed.
As much as I hate answering an online forum question by refuting the premise, I really must here: do not use periods in column names, they will cause trouble. .Q.id exists to sanitise column names for a reason.
The primary reason errors are encountered is that dot notation in qSQL is reserved for resolving linked columns. We can see how this works by parsing the query itself:
q)parse "select a.b from tab"
?
`tab
()
0b
(,`b)!,`a.b // Here the linked column b is being referenced via a
// Compared to a normal select
q)parse "select b from tab"
?
`tab
()
0b
(,`b)!,`b
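To see what the dot is actually reserved for, here is foreign-key navigation working as intended. The tables kt and t2 are illustrative: sym in t2 is a foreign key into kt, so sym.px fetches px from the referenced row. A column literally named a.b is read the same way (column b reached through a foreign key a), which is why the query in the question fails.

```q
kt:([sym:`GOOG`AMZN`FB] px:20 30 40)
t2:([] sym:`kt$`GOOG`AMZN`FB`FB; vol:100 200 300 400)   / sym is a foreign key into kt
select sym, sym.px from t2     / sym.px: column px of the kt row that sym points to
```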
Other issues could crop up depending on future processing, such as q attempting to treat the column names as namespaces or operating on each part of the name with the dot operator.
Using dot notation in your column names will hamstring any further development and force all other kdb users into roundabout methods; development will be slow and prone to bugs.
I would advise that if periods must be included in the column, you create an API for external users to use to translate queries into the sanitised forms.
You can easily sanitise the whole table with .Q.id
q)tab:enlist `a.b`c`d!(1 2 3)
q)tab:.Q.id tab
q)sel:{[tab;cl] ?[tab;();0b;((),.Q.id each cl)!((),.Q.id each cl)]}
q)sel[tab;`a.b]
ab
--
1
How about the following, using take # :
q) `a.b`c#t
a.b c
-----------
4.931835 1
5.785203 9
0.8388858 5
To rename:
q) `b xcol t
b c d
---------------
4.931835 1 mil
5.785203 9 igf
0.8388858 5 kao
You can use .Q.id to rename any unselectable columns:
q).Q.id t
ab c d
---------------
4.931835 1 mil
5.785203 9 igf
0.8388858 5 kao
Best to avoid dots in column names and symbols in general; use an underscore if you must.
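If a dotted column does arrive and you only want to rename that one column by name (`b xcol t above renames by position), a small helper can build the new column list first. rename is a hypothetical name for illustration, not a built-in; the table mirrors the question's shape with fixed values.

```q
t:flip `a.b`c`d!(1.5 2.5 3.5;1 2 3;`x`y`z)
rename:{[tbl;old;new] @[cols tbl;where cols[tbl]=old;:;new] xcol tbl}   / hypothetical helper
t2:rename[t;`a.b;`b]
select b,c from t2             / plain qSQL works once the dot is gone
```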

SAS PROC SQL - Concatenate variable values into a single value by group

I have a data set which contains 'factor' values and corresponding 'response' values:
data inTable;
input fact $ val $;
datalines;
a 1
a 2
a 3
b 4
b 5
b 6
c 7
d 8
e 9
e 10
f 11
;
run;
I want to aggregate response options by factor, i.e. I need to get
I know perfectly well how to implement this in a data step running a loop through values and applying CATX (posted here). But can I do the same with PROC SQL, using a combination of GROUP BY and some character analog of SUM() or CATX()?
Thanks for help,
Dmitry
The data step is the appropriate tool to use in SAS if you want to apply any sort of logic that carries lots of values forward from previous rows.
Any SQL solution would be extremely unwieldy - you would need to join the input table to itself n times, where n is the maximum number of distinct values for any of your factors, and you would also need to define a sequential key preserving the row order to use for the join.
A list of aggregation functions you can use in proc sql is available here:
http://support.sas.com/kb/25/279.html
Although a few of these do work with character variables, there is no aggregation function for string concatenation.