Need to explain the kdb/q script to save partitioned table

I'm trying to understand this code snippet from
https://code.kx.com/q/kb/loading-from-large-files/
so that I can customize it myself (e.g. partition by hours, minutes, number of ticks, ...):
$ cat fs.q
\d .Q
/ extension of .Q.dpft to separate table name & data
/ and allow append or overwrite
/ pass table data in t, table name in n, : or , in g
k)dpfgnt:{[d;p;f;g;n;t]if[~&/qm'r:+en[d]t;'`unmappable];
 {[d;g;t;i;x]@[d;x;g;t[x]i]}[d:par[d;p;n];g;r;<r f]'!r;
 @[;f;`p#]@[d;`.d;:;f,r@&~f=r:!r];n}
/ generalization of .Q.dpfnt to auto-partition and save a multi-partition table
/ pass table data in t, table name in n, name of column to partition on in c
k)dcfgnt:{[d;c;f;g;n;t]*p dpfgnt[d;;f;g;n]'?[t;;0b;()]',:'(=;c;)'p:?[;();();c]?[t;();1b;(,c)!,c]}
\d .
r:flip`date`open`high`low`close`volume`sym!("DFFFFIS";",")0:
w:.Q.dcfgnt[`:db;`date;`sym;,;`stats]
.Q.fs[w r@]`:file.csv
But I couldn't find any resources that explain it in detail. For example:
if[~&/qm'r:+en[d]t;'`unmappable];
what does it do with the parameter d?

(Promoting this to an answer as I believe it helps answer the question).
Following on from the comment chain: to translate the k code into q code (or simply to understand the k code) you have a few options. None of them is particularly well documented, because documenting k would defeat the purpose of the q language, which is to be the wrapper that hides the k language.
Option 1 is to inspect the built-in functions in the .q namespace
q).q
          | ::
neg       | -:
not       | ~:
null      | ^:
string    | $:
reciprocal| %:
floor     | _:
...
Option 2 is to inspect the q.k script which creates the above namespace (be careful not to edit/change this):
vi $QHOME/q.k
Option 3 is to look up some of the nuggets of documentation on the code.kx website, for example https://code.kx.com/q/wp/parse-trees/#k4-q-and-qk and https://code.kx.com/q/basics/exposed-infrastructure/#unary-forms
Option 4 is to search for reference material on other, similar versions of k, for example k2/k3. They tend to be similar-ish.
A final point to note: in most of these examples you'll see a colon (:) after the primitive. The colon is required in q/kdb to use the monadic form of the primitive (most are heavily overloaded), while in k it is not required to explicitly force the monadic form. This is why where will show as &: in the q reference but will usually just be & in actual k code.
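To make this concrete for the line asked about, here is a hedged q rendering of if[~&/qm'r:+en[d]t;'`unmappable]. It is a reading of the k, not the library source: check is a hypothetical name, and the description of .Q.qm is an assumption based on the error it guards. In this reading, d is the database root directory, and it is passed to .Q.en so that the symbol columns of t are enumerated against the sym file stored there.

```q
/ the k primitive behind a few q keywords
.q[`neg`not`where]

/ hedged q rendering of:  if[~&/qm'r:+en[d]t;'`unmappable]
/ + is flip, en is .Q.en, qm' is .Q.qm each, &/ is all, ~ is not
check:{[d;t]
  r:flip .Q.en[d;t];                      / enumerate t against the sym file in d, then flip to a column dictionary
  if[not all .Q.qm each r;'`unmappable];  / assumption: .Q.qm tests whether a column can be written to disk
  r}
```

Reading the k right to left gives the same order of operations as the q body reads top to bottom.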

Related

deleting all records for all tables in memory in Kdb+

I would like to delete all records for all tables in memory but still keep the schemas.
for example:
a:([]a:1 2;b:2 4);
b:([]c:2 3;d:3 5);
I wrote a function:
{[t] t::select from t where i = -1} each tables[]
This didn't work, so I tried
{[t] ![`t;enlist(=;`i;-1);0b;()]} each tables[]
which didn't work either.
Any ideas or better ways?

If you pass a global table name as a symbol, it removes all the rows, leaving an empty table
q)delete from `a
`a
q)a
a b
---
q)meta a
c| t f a
-| -----
a| j
b| j
To do it for all global tables in the root namespace:
{delete from x} each tables[]
Your second attempt, using the functional form, was close. You can achieve it via the following (the functional form of the above):
![;();0b;`symbol$()] each tables[]
The first argument should be the symbol of the table for the same reason I mentioned before
The second argument should be an empty list as we want to delete all records (we do not want to delete where i=-1, as that would delete nothing)
The final argument (list of columns to delete) should be an empty symbol list instead of an empty general list.
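As a quick check, the functional form can be run against the two tables from the question. This is a minimal sketch that assumes a and b are the only tables in the root namespace.

```q
a:([]a:1 2;b:2 4)
b:([]c:2 3;d:3 5)
![;();0b;`symbol$()] each tables[]   / empty every table in the root namespace
count each value each tables[]       / row counts after the delete
meta a                               / column names and types are still there
```

Because the table is passed by name (a symbol), the delete is applied in place rather than to a copy.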

Mark's solution is the best way to do what you want, better than the functional form. I'm just adding to your question about t failing, as putting kdb code in comments is awkward.
Your functional form fails not because of the t but because your last argument is not a symbol list `$(). Also, you would want to delete where i is > -1, not = -1.
q){[t] ![t;enlist(>;`i;-1);0b;`$()]} each `d`t`q
`d`t`q
q)d
date sym time bid1 bsize1 bid2 bsize2 bid3 bsize3 ask1 asize1 ask2 asize2 ask..
-----------------------------------------------------------------------------..
q)t
date sym time src price size
----------------------------
q)q
date sym time src bid ask bsize asize
-------------------------------------

KDB - Clearing out functions, variables, tables from server without exiting or restarting service

I'm currently setting up a server and successfully passing functions and test data to that server.
Is there an elegant way to clear out all the functions, variables, tables, etc.? Since we are running KDB in a Docker container and accessing it via IP address, I prefer not to have to restart the q session on the server, but rather assign many/all values and functions to :: or null.
At present I'm assuming I have to reassign each function/variable/table to :: or similar to achieve this. It isn't a big issue but I expect to have numerous functions.
\p 5042
h:hopen `:XXX.XXX.XX.XX:5042
h "sq:{x*x}" //send sample function sq to server
h "sq: ::" //assign function sq to nothing (repeat for all variables/functions/tables etc)
hclose h
//check the IP address list of functions to confirm deletion http://XXX.XXX.XX.XX:5042/?\f

Would you be able to give us a bit more information as to why you want to clear out everything (and in particular, why do you want to set functions to null)?
Otherwise, you can do a few things. You can do a delete statement on a namespace to delete everything in it. To delete all tables/variables/functions in the global namespace, you can do the following.
q)a: 1
q)b: 1 2 3
q)f: {1 + x}
q)value `.
a| 1
b| 1 2 3
f| {1 + x}
q)delete from `.
`.
q)value `.
q)f
'f
  [0]  f
       ^
q)
If you want to null them rather than delete them, you could use the system commands a, f and v to get lists of all the tables (a), functions (f) and variables (v) in the global (or other) namespace, and then use set to set them all to null.
q)f: {1+x}
q)g: {2*x}
q)(system"f")set'(::)
`f`g
q)f
q)g
q)
Is this roughly what you were looking for?
(One obvious problem with this is that you might end up deleting other people's variables.)

Few extra points to add to Matt's:
.Q.gc[] may be a good idea to run after doing this. This will return any memory to the OS that was being used by those variables in the root namespace e.g. if you had a large table defined before this.
Use an alternative namespace for functions you want to keep after this clearing e.g. .utils. You could even add a .utils.refresh function which clears the root namespace and runs .Q.gc[]
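A minimal sketch of such a refresh function, combining the delete from the previous answer with .Q.gc[]. The name .utils.refresh comes from the suggestion above; the body is an assumption about what it would contain.

```q
/ lives in .utils, so it survives the clear-out of the root namespace
.utils.refresh:{[]
  delete from `.;   / drop every variable, function and table in the root namespace
  .Q.gc[]}          / then ask q to hand freed memory back to the OS
```

From a client it could then be called over the existing handle, for example h ".utils.refresh[]".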

kdb/q question: How do I interpret this groupby in my functional selection?

I am new to kdb/q and am trying to figure out what this particular query means. The code is using functional select, which I am not overly comfortable with.
?[output;();b;a];
where output is some table which has columns size time symbol
the groupby filter dictionary b is defined as follows
key | value
---------------
ts | ("+";00:05:00v;("k){x*y div x:$[16h=abs[@x];"j"$x;x]}";00:05:00v;("%:";`time)))
sym | ("k){x'y}";"{`$(,/)("/" vs string x)}";`symbol)
For the sake of completeness, dictionary a is defined as
volume ("sum";`size)
In effect, the functional select seems to be bucketing the data into 5 minute buckets and doing some parsing of symbol. What baffles me is how to read the groupby dictionary, especially the "k)" part and the entire thing being in quotes. Can someone help me go through this or point me to resources that can help me understand? Any input will be appreciated.

The aggregation part of the functional form takes a dictionary, the keys being the output column names and the values being parse tree functions.
A parse tree is an expression that is not immediately evaluated. Its first element is a function and the subsequent elements are its arguments. The inner-most brackets are evaluated first and evaluation then moves up the hierarchy, evaluating each level in turn. More detailed information can be found here and in the whitepaper linked on that page
You can use the function parse with a string argument to get the parse tree of a function. For example, the parse tree for 1+2+3 is (+;1;(+;2;3)):
q)parse "1+2+3"
+
1
(+;2;3)
The inner-most bracket (+;2;3) is evaluated first, resulting in 5, before the result is propagated up to the outermost parse tree function (+;1;5), giving 6
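The same evaluation order can be checked directly with eval, which evaluates a parse tree. A short sketch:

```q
eval (+;2;3)          / inner tree on its own: 5
eval (+;1;(+;2;3))    / whole tree: 6
eval parse "1+2+3"    / parse then eval gives the same 6
```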
The groupby part of the clause will evaluate one or more parse tree functions and then will collect together records with the same output from the grouping function.
Making the function a bit clearer to read:
(+;00:05:00v;({x*y div x:$[16h=abs[@x];"j"$x;x]};00:05:00v;(%:;`time)))
Looking at the inner-most bracket (%:;`time), it returns the result of %: applied to the time column. We can see that %: is the k form of the function ltime:
q)ltime
%:
Moving up a level, the next function evaluated is the lambda {x*y div x:$[16h=abs[@x];"j"$x;x]} with arguments 00:05:00v and the result of the previously evaluated function. The lambda rounds it down to the nearest 5 minute interval:
({x*y div x:$[16h=abs[@x];"j"$x;x]};00:05:00v;(%:;`time))
Moving up once more to the whole expression, it is equivalent to 00:05:00v + ({x*y div x:$[16h=abs[@x];"j"$x;x]};00:05:00v;(%:;`time)), with 00:05:00 being added onto each result from the previous evaluation.
So essentially it first returns the local time of the timestamp, then rounds it down to the nearest 5 minute interval, and finally adds 5 minutes to the result.
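The rounding step is easier to see on plain numbers. The sketch below uses longs standing in for minutes (an assumption made for illustration; the real column holds times) and a simplified lambda without the timespan branch. The lambda in the question appears to match the definition of the built-in xbar, which is why the two agree here.

```q
m:1 4 5 9 12        / hypothetical minutes past the hour
f:{x*y div x}       / simplified form of the lambda: floor y to a multiple of x
f[5;m]              / 0 0 5 5 10
5 xbar m            / the built-in gives the same buckets
5+f[5;m]            / 5 5 10 10 15, each value labelled with the end of its bucket
```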
For the symbol grouping:
(k){x'y};{`$(,/)("/" vs string x)};`symbol)
The inner function {`$(,/)("/" vs string x)} strings a symbol, splits it at the "/" character and then joins it back together, effectively removing the slash.
The k) prefix means that what follows is evaluated by the k interpreter.
k){x'y} is a function which itself takes a function x and an argument y and modifies the function to use the each-both adverb '. This makes it so that the function x is applied to each symbol individually as opposed to the column as a whole.
This could be implemented in q instead of k like so:
({x@'y};{`$(,/)("/" vs string x)};`symbol)
The function {x@'y} takes the function argument {`$(,/)("/" vs string x)} and the symbol column as before, but we have to use @ with the each-both adverb in q to apply the function to the arguments.
The aggregation function will then be applied to each group. In your case the function is a simple parse tree, which will return the sum of the size columns in each group, with the output column called volume
a:enlist[`volume]!enlist (sum;`size)
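Putting the pieces together, here is a small self-contained sketch of the whole functional select. The table contents and the mm column (minutes as plain longs) are made up for illustration, xbar stands in for the k lambda, and {x@'y} uses @ to apply the function to each symbol.

```q
/ hypothetical sample data
output:([]mm:1 4 5 9 12;symbol:`$("EUR/USD";"EUR/USD";"GBP/USD";"EUR/USD";"GBP/USD");size:10 20 30 40 50)
/ group by: 5-minute bucket end, and symbol with the slash removed
b:`ts`sym!((+;5;(xbar;5;`mm));({x@'y};{`$(,/)("/" vs string x)};`symbol))
/ aggregate: total size per group
a:enlist[`volume]!enlist(sum;`size)
?[output;();b;a]
```

With this data the result should have one row per distinct (ts, sym) pair, with volume holding the summed size of that group.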

KDB Apply where phrase only if column exists

I'm looking for a way to write a functional select in KDB such that a where phrase is only applied if the column exists (in order to avoid an error). If the column doesn't exist, the condition should default to true.
I tried this but it didn't work:
enlist(|;enlist(in;`colname;key flip table);enlist(in;`colname;filteredValues[`colname]));
I tried to write a simple boolean expression and use parse to get my functional form
(table[`colname] in values)|(not `colname in key flip table)
But kdb doesn't short-circuit, so the left-hand expression is still evaluated despite the right-hand expression evaluating to true. This caused a weird output `boolean$(), which is a list of booleans all evaluating to false 0b
Any help is appreciated. Thanks!
EDIT 1: I have to join a series of conditions with parameters specified in the dictionary filters
cond,:(,/) {[l;k] enlist(in;k;enlist l[k])}[filters]'[a:(key filters)]
Then I pass this cond on and it gets executed in a few different selects on different tables. How can I make sure that whatever conditional expression I put in place of enlist(in;k;enlist l[k]) will only get evaluated as the select statement gets executed?

You can use the if-else conditional $ here to do what you want
For example:
q)$[`bid in cols`quotes;enlist (>;`bid;35);()]
> `bid 35
q)$[`bad in cols`quotes;enlist (>;`bad;35);()]
Note that in the second example, the return is an empty list, as this column isn't in the quotes table
So you can put this into the functional select like so:
?[`quotes;$[`bid in cols`quotes;enlist (>;`bid;35);()];0b;()]
and the where clause will be applied if the column is present, otherwise no where clause will be applied:
q)count ?[`quotes;$[`bid in cols`quotes;enlist (>;`bid;35);()];0b;()]
541 //where clause applied, table filtered
q)count ?[`quotes;$[`bad in cols`quotes;enlist (>;`bad;35);()];0b;()]
1000 //where clause not applied, full table returned
Hope this helps
Jonathon
AquaQ Analytics
EDIT: If I'm understanding your updated question correctly, you might be able to do something like the following. Firstly, let's define an example "filters" dictionary:
q)filters:`a`b`c!(1 2 3;"abc";`d`e`f)
q)filters
a| 1 2 3
b| a b c
c| d e f
So here we are assuming a few different columns of different types, for illustration purposes. You can build up your list of where clauses like so:
q)(in),'flip (key filters;value filters)
in `a 1 2 3
in `b "abc"
in `c `d`e`f
(this is equivalent to the code you had to generate cond, but it's a little neater & more efficient - you also have the values enlisted, which isn't necessary)
You could then use a vector conditional to generate your list of where clauses to apply to a given table e.g.
q)t:([] a:1 2 3 4 5 6;b:"adcghf")
q)?[key[filters] in cols[t];(in),'flip (key filters;value filters);count[filters]#()]
(in;`a;,1 2 3)
(in;`b;,"abc")
()
As you can see, in this example the table "t" has columns a and b, but not c. So using the vector conditional, you get the where clauses for a and b but not c.
Finally to actually apply this list of output where clauses to the table, you can make use of an over to apply each in turn:
q)l:?[key[filters] in cols[t];(in),'flip (key filters;value filters);count[filters]#()]
q){?[x;$[y~();y;enlist y];0b;()]}/[t;l]
a b
---
1 a
3 c
One thing to note here is that in the where clause of the functional select we need to check if y is an empty list - this is so we can enlist it if it is not an empty list
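An alternative sketch of the same idea is to drop the filters for missing columns first and then run a single select. applyFilters is a hypothetical helper name, not part of kdb. The values are enlisted inside the parse tree so that a symbol list such as `d`e`f is treated as data rather than as column names.

```q
/ hypothetical helper: keep only filters whose column exists in t, then select once
applyFilters:{[t;filters]
  f:(cols[t] inter key filters)#filters;   / sub-dictionary of filters that apply to t
  w:{(in;x;enlist y)}'[key f;value f];     / one (in;col;values) where clause per remaining filter
  ?[t;w;0b;()]}

filters:`a`b`c!(1 2 3;"abc";`d`e`f)
t:([] a:1 2 3 4 5 6;b:"adcghf")
applyFilters[t;filters]                    / c is not a column of t, so only a and b are filtered
```

If none of the filter columns exist, w is an empty list and the full table comes back.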
Hope this helps

PostgreSQL tuple format

Is there any document describing the tuple format that PostgreSQL server adheres to? The official documentation appears arcane about this.
A single tuple seems simple enough to figure out, but when it comes to arrays of tuples, arrays of composite tuples, and finally nested arrays of composite tuples, it is impossible to be certain about the format simply by looking at the output.
I am asking this following my initial attempt at implementing pg-tuple, a parser that's still missing today, to be able to parse PostgreSQL tuples within Node.js
Examples
create type type_A as (
a int,
b text
);
with a simple text: (1,hello)
with a complex text: (1,"hello world!")
create type type_B as (
c type_A,
d type_A[]
);
simple-value array: {"(2,two)","(3,three)"}
for type_B[] we can get:
{"(\"(7,inner)\",\"{\"\"(88,eight-1)\"\",\"\"(99,nine-2)\"\"}\")","(\"(77,inner)\",\"{\"\"(888,eight-3)\"\",\"\"(999,nine-4)\"\"}\")"}
It gets even more complex for multi-dimensional arrays of composite types.
UPDATE
Since it feels like there is no specification at all, I have started working on reversing it. Not sure if it can be done fully though, because from some initial examples it is often unclear what formatting rules are applied.

As Nick posted, according to docs:
the whitespace will be ignored if the field type is integer, but not
if it is text.
and
The composite output routine will put double quotes around field
values if they are empty strings or contain parentheses, commas,
double quotes, backslashes, or white space.
and
Double quotes and backslashes embedded in field values will be
doubled.
and now quoting Nick himself:
nested elements are converted to strings, and then quoted / escaped
like any other string
I give a shortened example below, conveniently shown next to its nested value:
a=# create table playground (t text, ta text[],f float,fa float[]);
CREATE TABLE
a=# insert into playground select 'space here',array['','bs\'],8.0,array[null,8.1];
INSERT 0 1
a=# insert into playground select 'no_space',array[null,'nospace'],9.0,array[9.1,8.0];
INSERT 0 1
a=# select playground,* from playground;
playground | t | ta | f | fa
---------------------------------------------------+------------+----------------+---+------------
("space here","{"""",""bs\\\\""}",8,"{NULL,8.1}") | space here | {"","bs\\"} | 8 | {NULL,8.1}
(no_space,"{NULL,nospace}",9,"{9.1,8}") | no_space | {NULL,nospace} | 9 | {9.1,8}
(2 rows)
If you go for deeper nested quoting, look at:
a=# select nested,* from (select playground,* from playground) nested;
nested | playground | t | ta | f | fa
-------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+------------+----------------+---+------------
("(""space here"",""{"""""""",""""bs\\\\\\\\""""}"",8,""{NULL,8.1}"")","space here","{"""",""bs\\\\""}",8,"{NULL,8.1}") | ("space here","{"""",""bs\\\\""}",8,"{NULL,8.1}") | space here | {"","bs\\"} | 8 | {NULL,8.1}
("(no_space,""{NULL,nospace}"",9,""{9.1,8}"")",no_space,"{NULL,nospace}",9,"{9.1,8}") | (no_space,"{NULL,nospace}",9,"{9.1,8}") | no_space | {NULL,nospace} | 9 | {9.1,8}
(2 rows)
As you can see, the output again follows the rules above.
So, in short, the answers to your questions would be:
why array is normally presented inside double-quotes, while an empty array is suddenly an open value? (the text representation of an empty array does not contain a comma, a space, etc.)
why a single " is suddenly presented as \""? (the text representation of 'one\ two', according to the rules above, is "one\\ two", and the text representation of that in turn is ""one\\\\two"", which is just what you get)
why unicode-formatted text is changing the escaping for \? How can we tell the difference then? (According to docs,
PostgreSQL also accepts "escape" string constants, which are an
extension to the SQL standard. An escape string constant is specified
by writing the letter E (upper or lower case) just before the opening
single quote
), so it is not unicode text, but the way you tell postgres that it should interpret escapes in text not as symbols, but as escapes. E.g. E'\'' will be interpreted as ', while '\'' will make it wait for a closing '. In your example E'\\ text', its text representation will be "\\ text": we add a backslash for the backslash and put the value in double quotes, all as described in the online docs.
the way that { and } are escaped is not always clear (I could not answer this question, because the question itself was not clear)