kdb: splayed table sym file not loaded

I'm following 14.2.4 Basic Operations on Splayed Tables on kx.com.
In an empty directory, running below
`:db/t/ set .Q.en[`:db;] ([] s1:`a`b`c; v:10 20 30; s2:`x`y`z)
created a splayed table successfully:
[22:52:57] [~/tmp] tree
.
└── db
    ├── sym
    └── t
        ├── s1
        ├── s2
        └── v
Q1) After loading the table using $ q db/t as instructed, the enumerated symbols did not get displayed. What am I missing here?
[22:54:24] [~/tmp] q db/t
KDB+ 4.0 ...
q)select from t
s1 v s2
--------
0 10 3
1 20 4
2 30 5
Q2) Loading using $ q db worked: the symfile was loaded successfully, and symbols appeared correctly. However, is there a way to load only one table? (Here, if I had more tables, all of them would've been loaded)
[23:05:18] [~/tmp] q db
KDB+ 4.0 ...
q)t
s1 v s2
--------
a 10 x
b 20 y
c 30 z
q)sym
`a`b`c`x`y`z

Use get https://code.kx.com/q/ref/get/#get
q)sym:get `:db/sym
q)t:get `:db/t/

Q1 - When loading just the table using q db/t only the contents of that subdirectory are loaded. The sym file is not loaded as it is not located in the db/t subdirectory, which is why you don't see the enumerated symbols displayed. Best practice is to load the entire database using q db or \l db in your q session.
Q2 - If you are not using q db to load the entire database directory, get can be used, as shown in the other answer, to load both the table and the sym file. The function loadTable below loads both when passed the path to the table, as long as the sym file sits in the table's parent directory (here db).
q)loadTable:{p:` vs x;`sym set get .Q.dd[p 0;`sym];p[1]set get x}
q)loadTable`:db/t
`t
q)t
s1 v s2
--------
a 10 x
b 20 y
c 30 z

How serializing foreign keyed table works internally in kdb

I have a keyed table (the referenced table) linked by a foreign key to a referencing table, and I serialize both tables using the set operator.
q)kt:([sym:`GOOG`AMZN`FB]; px:20 30 40);
q)`:/Users/uts/db/kt set kt
q)t:([] sym:`kt$5?`GOOG`AMZN`FB; vol:5?10000)
q)`:/Users/uts/db/t set t
Then I remove these tables from the memory
q)delete kt,t from `.
Now I deserialize the table t in memory:
t:get `:/Users/uts/db/t
If I run meta t after this, it fails because it expects kt to resolve the foreign key.
If I print t, the sym column shows index values, as expected.
So my questions are:
Since kdb stores the meta of each table (i.e. c, t, f, a) and its corresponding values on disk, how does serialization of table t work internally?
How, and in what binary form, are these values stored in file t?
-rw-r--r-- 1 uts staff 100 Apr 13 23:09 t
tl;dr A foreign key is stored as a vector of 4-byte indices of a key column of a referenced table plus a name of a table a foreign key refers to.
As far as I know kx never documented their file formats, and yet I think some useful information relevant to your question can be deduced right from a q console session.
Let me modify your example a bit to make things simpler.
q)show kt:([sym:`GOOG`AMZN`FB]; px:20 30 40)
sym | px
----| --
GOOG| 20
AMZN| 30
FB | 40
q)show t:([] sym:`kt$`GOOG`GOOG`AMZN`FB`FB)
sym
----
GOOG
GOOG
AMZN
FB
FB
I left only one column - sym - in t because vol is not relevant to the question. Let's save t without any data first:
q)`:/tmp/t set 0#t
`:/tmp/t
q)hcount `:/tmp/t
30
Now we know that it takes 30 bytes to represent t when it's empty. Let's see if there's a pattern when we start adding rows to t:
q){`:/tmp/t set x#t;`cnt`size!(x;hcount[`:/tmp/t] - 30)} each til[11], 100 1000 1000000
cnt size
---------------
0 0
1 4
2 8
3 12
4 16
5 20
6 24
7 28
8 32
9 36
10 40
100 400
1000 4000
1000000 4000000
We can see that adding one row increases the size of t by four bytes. What can these 4 bytes be? Can they be a representation of a symbol itself? No, because if they were and we renamed a sym value in kt it would affect the size of t on disk but it doesn't:
q)update sym:`$50#.Q.a from `kt where sym=`GOOG
`kt
q)1#t
sym
--------------------------------------------------
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx
q)`:/tmp/t set 1#t
`:/tmp/t
q)hcount `:/tmp/t
34
Still 34 bytes. I think it should be obvious by now that the 4 bytes is an index, but an index of what? Is it an index of a column which must be called sym exactly? Apparently no, it isn't.
q)kt:`foo xcol kt
q)t
sym
--------------------------------------------------
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx
AMZN
FB
FB
There's no column called sym in kt any longer but t hasn't changed at all! We can go even further and change the type of foo (ex sym) in kt:
q)update foo:-1 -2 -3.0 from `kt
`kt
q)t
sym
---
-1
-1
-2
-3
-3
Not only did it change t, it changed its meta too:
q)meta t
c | t f a
---| ------
sym| f kt
q)/ ^------- used to be s
I hope it's clear now that kdb stores a 4-byte index into the key column of the referenced table, plus the name of that table (but not the key column name!). If the referenced table is missing, kdb can't reconstruct the original data and displays the bare index. If a referencing table needs to be sent over the wire, the indices are replaced with the actual values so that the receiving side can see the real data.

Check presence of a column in multiple tables

I want to check if a column is present in multiple tables.
When I try for one table, it works.
`tickerCol in cols tradeTable / (output is 1b) hence working perfectly
`tickerCol in cols table2 / (output is 1b) hence working perfectly
but when I run
`ticker in cols #' (tradeTable;table2) / (output is 0b, expected output 11b)
for above example ticker column is present in both tables(tradeTable;table2).
The following works using each-both ':
`ticker in ' cols each (tradeTable; table2)
This will find the columns that are present in each of the tables and then perform a check on each of the column lists to find if `ticker is present in these lists.
A solution is already provided in another answer. I'm just trying to explain why your attempt doesn't work.
Let's say we have 2 tables, t1 (columns id and v1) and t2 (columns id and v2).
Now when we run:
q) cols#'`t1`t2
the output will be a list of lists:
(`id`v1;`id`v2)
This list has 2 entries, and each entry is itself a list.
What you are doing is searching for the column name among the entries of this outer list.
q) `id in (`id`v1;`id`v2) /output 0b
Since the outer list doesn't have `id as an entry (its entries are whole column lists), it returns 0b.
If you search for `id`v1, which is a list, you will get 1b because it matches the first entry.
q) `id`v1 in (`id`v1;`id`v2) / output 1b
What you want is to search for your column name within each entry of that list. So the only thing missing from your expression is each-both. This will work:
q) `id in'cols#'`t1`t2 / output 11b
In your case it will be:
q) `ticker in ' cols#'`tradeTable`table2
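The same check can be sketched end to end with each-right (/:), which pairs the single column name with every column list. The tables t1 and t2 below are hypothetical stand-ins for tradeTable and table2, and the variable names are only for illustration.

```q
/ hypothetical sample tables, standing in for tradeTable and table2
t1:([] id:1 2 3; v1:10 20 30)
t2:([] id:4 5; v2:1.5 2.5)

/ cols applied to each table gives one column list per table
colLists:cols each (t1;t2)

/ each-right (/:) tests the single column name against every column list
hasId:`id in/: colLists

/ the same check by name, since cols also accepts a symbol naming a global table
hasV1:`v1 in/: cols each `t1`t2
```

Here hasId should come out as 11b (id is in both tables) and hasV1 as 10b (v1 is only in t1).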

Accumulated value

I am coming across an issue wherein I am trying to lookup a Cost to a file with multiple rows for a project, but it's not working out, as lookup is repeating the cost for all the rows and thereby not providing the correct cost associated with a project. Here is how the file looks in which I am trying to lookup the value:
Date Project
1/08/2017 XYZ
2/08/2017 XYZ
3/08/2017 XYZ
4/08/2017 XYZ
5/08/2017 XYZ
6/08/2017 XYZ
1/09/2017 ABC
2/09/2017 ABC
3/09/2017 ABC
4/09/2017 ABC
5/09/2017 ABC
6/09/2017 ABC
12/10/2017 DEF
13/10/2017 DEF
11/11/2017 IJK
And here is the file from which I am trying to look up the value:
Project Budget
XYZ 200000
ABC 300000
DEF 1000000
IJK 50000
Any help is highly appreciated. Also, how can I count how many times a project is repeated in the field? I am looking for something like this:
Date Project Count_Projects
1/08/2017 XYZ 6
2/08/2017 XYZ 6
3/08/2017 XYZ 6
4/08/2017 XYZ 6
5/08/2017 XYZ 6
6/08/2017 XYZ 6
1/09/2017 ABC 6
2/09/2017 ABC 6
3/09/2017 ABC 6
4/09/2017 ABC 6
5/09/2017 ABC 6
6/09/2017 ABC 6
12/10/2017 DEF 2
13/10/2017 DEF 2
11/11/2017 IJK 1
I really need to figure this out.
For your second question, you can create the Count_Projects calculated column as follows:
Count_Projects =
CALCULATE(DISTINCTCOUNT(Dates[Date]),
FILTER(Dates, Dates[Project] = EARLIER(Dates[Project])))
Or you can use a variable:
Count_Projects =
VAR Project = Dates[Project]
RETURN CALCULATE(DISTINCTCOUNT(Dates[Date]),
ALL(Dates), Dates[Project] = Project)
Like @Alexis Olson, I'm not clear as to exactly what output you expect; but, assuming that you want to see the same Budget number listed for each respective Project entry (e.g., 200000 for each instance of XYZ, 300000 for each instance of ABC, etc.), here's an answer.
If you've got both tables loaded into Power BI, you'll see them on the right side of the screen in the Data view (I named them Table and TableLookup):
If you click Home -> Manage Relationships, you'll see there is a relationship between the two tables:
If you then click Edit..., you'll see it's a Many to one relationship between the overall table (I called it Table) and the lookup table (I called it TableLookup):
Anyhow, the point is...there is a relationship between the two tables, and you're going to use it.
Click Cancel.
Click Close.
Click Modeling -> New Column; then, in the formula bar, type:
Budget = RELATED(TableLookup[Budget])
and enter. You'll get this:
Then you can do what Alexis said for counting:
Click Modeling -> New Column; then, in the formula bar, type:
Count_Projects =
CALCULATE(DISTINCTCOUNT('Table'[Date]),
FILTER('Table', 'Table'[Project] = EARLIER('Table'[Project])))
I replaced Alexis's "Dates" with "Table" because my table is named Table.
You'll see this:

KDB/KX appending table to a file without reading the entire file

I'm new to KDB (sorry if this question is dumb). I'm creating the following table:
q)dsPricing:([id:`int$(); date:`date$()] open:`float$();close:`float$();high:`float$();low:`float$();volume:`int$())
q)`dsPricing insert(123;2003.03.23;1.0;3.0;4.0;2.0;1000)
q)`dsPricing insert(123;2003.03.24;1.0;3.0;4.0;2.0;2000)
q)save `:dsPricing
Let's say I exit after saving. After restarting q, I'd like to add another pricing item without loading the entire file, because the file could be large:
q)`dsPricing insert(123;2003.03.25;1.0;3.0;4.0;2.0;1500)
I've been looking at .Q.dpft but I can't really figure it out. Also this table/file doesn't need to be partitioned.
Thanks
You can upsert to the file handle of a table to append to it on disk. Your example would look like this:
`:dsPricing upsert(123;2003.03.25;1.0;3.0;4.0;2.0;1500)
You can load the table into your q session using get, load or \l
q)get `:dsPricing
id date | open close high low volume
--------------| --------------------------
123 2003.03.23| 1 3 4 2 1000
123 2003.03.24| 1 3 4 2 2000
123 2003.03.25| 1 3 4 2 1500
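.Q.dpft saves a table splayed (one file for each column in the table, plus a .d file containing the column names) into a partition of a partitioned database, with the parted attribute (`p#) applied to the column you specify, typically a symbol column. Any symbol columns are also enumerated by .Q.en. Since your table doesn't need to be partitioned, you don't need it here.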
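If a splayed layout without partitions is what you are after, appending on disk can be sketched as below. This assumes an unkeyed variant of the table (a splayed table cannot be keyed) with fewer columns, and a hypothetical directory name splaydb; neither comes from the question. It has no symbol columns, so no .Q.en enumeration is involved; with symbol columns you would enumerate the new rows first.

```q
/ hypothetical unkeyed variant of dsPricing; a splayed table cannot be keyed
p:([] id:123 123i; date:2003.03.23 2003.03.24; close:3 3f; volume:1000 2000i)

/ the trailing slash in the handle saves it splayed: one file per column plus .d
`:splaydb/dsPricing/ set p

/ upsert on the same handle appends the new row to each column file on disk
`:splaydb/dsPricing/ upsert ([] id:enlist 123i; date:enlist 2003.03.25; close:enlist 3f; volume:enlist 1500i)

/ count the rows now on disk
n:count get `:splaydb/dsPricing/
```

Starting from an empty splaydb directory, n should be 3 after the append. The new row's column types have to match the types already on disk, which is why the int columns carry the i suffix.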

kdb ticker plant: where to find documentation on .u.upd?

I am aware of this resource. But it does not spell out what parameters .u.upd takes and how to check if it worked.
This statement executes without error, although it does not seem to do anything:
.u.upd[`t;(`$"abc";1;2;3)]
If I define the table beforehand, e.g.
t:([] name:"aaa";a:1;b:2;c:3)
then the above .u.upd still runs without error, and does not change t.
.u.upd has the same function signature as insert (see http://code.kx.com/q/ref/qsql/#insert) in prefix form. In the simplest case, .u.upd may be defined as insert.
so:
.u.upd[`table;<records>]
For example:
q).u.upd:insert
q)show tbl:([] a:`x`y;b:10 20)
a b
----
x 10
y 20
q).u.upd[`tbl;(`z;30)]
,2
q)show tbl
a b
----
x 10
y 20
z 30
q).u.upd[`tbl;(`a`b`c;1 2 3)]
3 4 5
q)show tbl
a b
----
x 10
y 20
z 30
a 1
b 2
c 3
Documentation including the event sequence, connection diagram etc. for tickerplants can be found here:
http://www.timestored.com/kdb-guides/kdb-tick-data-store
.u.upd[tableName; tableData] accepts two arguments, for inserting data
to a named table. This function will normally be called from a
feedhandler. It takes the tableData, adds a time column if one is
not present, inserts it into the in-memory table, appends to the log file
and finally increases the log file counter.
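To make that sequence concrete, here is a minimal sketch of the time-stamping and insert steps only. It is not the actual tick.q code: the table trade, its columns and the function name upd are hypothetical, and logging and publishing to subscribers are left out entirely.

```q
/ hypothetical in-memory table with a leading timespan column
trade:([] time:`timespan$(); sym:`symbol$(); px:`float$())

/ simplified stand-in for .u.upd: prepend the current time when the data lacks one
upd:{[t;x]
  if[not -16h=type first first x;
    x:$[0>type first x; .z.N,x; (enlist(count first x)#.z.N),x]];
  t insert x}

upd[`trade;(`abc;1.5)]            / single record, time is added
upd[`trade;(`x`y;2.5 3.5)]        / batch of two records, time is added
upd[`trade;(0D09:30:00;`z;4.5)]   / record that already carries a time
```

After these three calls, count trade should be 4, which is also a quick way to check that an update arrived: inspect the table (or its count) before and after the call.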