Deleting all records for all tables in memory in kdb+

I would like to delete all records for all tables in memory but still keep the schemas.
For example:
a:([]a:1 2;b:2 4);
b:([]c:2 3;d:3 5);
I wrote a function:
{[t] t::select from t where i = -1} each tables[]
This didn't work, so I tried
{[t] ![`t;enlist(=;`i;-1);0b;()]} each tables[]
That didn't work either.
Any ideas, or a better way?

If you pass a global table name as a symbol to delete, the rows are removed in place and an empty table with the same schema is left behind:
q)delete from `a
`a
q)a
a b
---
q)meta a
c| t f a
-| -----
a| j
b| j
To do it for all global tables in the root namespace:
{delete from x} each tables[]
Your second attempt, using the functional form, was close. You can achieve it with the following (the functional form of the above):
![;();0b;`symbol$()] each tables[]
The first argument should be the table name as a symbol, for the reason given above: passing the name updates the global table in place.
The second argument should be an empty list, as we want to delete all records (we do not want to delete where i=-1, as that would delete nothing).
The final argument (the list of columns to delete) should be an empty symbol list rather than an empty general list.
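The reasoning about the where clause can be illustrated outside q. The Python below is only an analogy under assumptions, not kdb+ code: the table is modelled as a dict of column lists, and delete_rows is a hypothetical helper whose where argument stands in for the second argument of the functional delete.

```python
def delete_rows(table, where=None):
    """Return a copy of a column-oriented table without the rows matched by `where`.

    `where` takes a row index and returns True for rows to delete.
    None plays the role of the empty where clause () and matches every row.
    """
    n = len(next(iter(table.values()), []))
    keep = [i for i in range(n) if where is not None and not where(i)]
    return {col: [vals[i] for i in keep] for col, vals in table.items()}


a = {"a": [1, 2], "b": [2, 4]}

unchanged = delete_rows(a, where=lambda i: i == -1)   # no row has index -1, nothing is deleted
emptied = delete_rows(a)                              # empty where clause, every row is deleted
also_emptied = delete_rows(a, where=lambda i: i > -1) # every index is > -1

print(unchanged)
print(emptied)
```

In every case the column names survive, which is the "keep the schema" part of the question.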

Mark's solution is the best way to do what you want, rather than the functional form. I am just adding to your question about why t fails, as putting kdb code in comments is awkward.
Your functional form fails not because of the t but because your last argument is not an empty symbol list, `$(). Also, you would want to delete where i > -1, not where i = -1.
q){[t] ![t;enlist(>;`i;-1);0b;`$()]} each `d`t`q
`d`t`q
q)d
date sym time bid1 bsize1 bid2 bsize2 bid3 bsize3 ask1 asize1 ask2 asize2 ask..
-----------------------------------------------------------------------------..
q)t
date sym time src price size
----------------------------
q)q
date sym time src bid ask bsize asize
-------------------------------------

KDB - Clearing out functions, variables, tables from server without exiting or restarting service

I'm currently setting up a server and successfully passing functions and test data to that server.
Is there an elegant way to clear out all the functions, variables, tables, etc.? Since we are running KDB in a Docker container and accessing it via IP address, I would prefer not to restart the q session on the server, but rather to assign many/all values and functions to :: or null.
At present I'm assuming I have to reassign each function/variable/table to :: or similar to achieve this. It isn't a big issue but I expect to have numerous functions.
\p 5042
h:hopen `:XXX.XXX.XX.XX:5042
h "sq:{x*x}" //send sample function sq to server
h "sq: ::" //assign function sq to nothing (repeat for all variables/functions/tables etc)
hclose h
//check the IP address list of functions to confirm deletion http://XXX.XXX.XX.XX:5042/?\f
Would you be able to give us a bit more information as to why you want to clear out everything (and in particular, why do you want to set functions to null)?
Otherwise, you can do a few things. You can run a delete statement on a namespace to delete everything in it. To delete all tables/variables/functions in the global namespace, you can do the following.
q)a: 1
q)b: 1 2 3
q)f: {1 + x}
q)value `.
a| 1
b| 1 2 3
f| {1 + x}
q)delete from `.
`.
q)value `.
q)f
'f
[0] f
^
q)
If you want to null them rather than delete them, you could use the system commands \a, \f and \v to get lists of all the tables (a), functions (f) and variables (v) in the global (or another) namespace, and then use set to set them all to null.
q)f: {1+x}
q)g: {2*x}
q)(system"f")set'(::)
`f`g
q)f
q)g
q)
Is this roughly what you were looking for?
(One obvious problem with this is that you might end up deleting other people's variables.)
A few extra points to add to Matt's:
.Q.gc[] may be a good idea to run after doing this. This will return any memory to the OS that was being used by those variables in the root namespace e.g. if you had a large table defined before this.
Use an alternative namespace for functions you want to keep after this clearing e.g. .utils. You could even add a .utils.refresh function which clears the root namespace and runs .Q.gc[]

Need an explanation of the kdb/q script to save a partitioned table

I'm trying to understand this snippet code from:
https://code.kx.com/q/kb/loading-from-large-files/
to customize it myself (e.g. partition by hours, minutes, number of ticks, ...):
$ cat fs.q
\d .Q
/ extension of .Q.dpft to separate table name & data
/ and allow append or overwrite
/ pass table data in t, table name in n, : or , in g
k)dpfgnt:{[d;p;f;g;n;t]if[~&/qm'r:+en[d]t;'`unmappable];
{[d;g;t;i;x]#[d;x;g;t[x]i]}[d:par[d;p;n];g;r;<r f]'!r;
#[;f;`p#]#[d;`.d;:;f,r#&~f=r:!r];n}
/ generalization of .Q.dpfnt to auto-partition and save a multi-partition table
/ pass table data in t, table name in n, name of column to partition on in c
k)dcfgnt:{[d;c;f;g;n;t]*p dpfgnt[d;;f;g;n]'?[t;;0b;()]',:'(=;c;)'p:?[;();();c]?[t;();1b;(,c)!,c]}
\d .
r:flip`date`open`high`low`close`volume`sym!("DFFFFIS";",")0:
w:.Q.dcfgnt[`:db;`date;`sym;,;`stats]
.Q.fs[w r#]`:file.csv
But I couldn't find any resources that explain it in detail. For example:
if[~&/qm'r:+en[d]t;'`unmappable];
what does it do with the parameter d?
(Promoting this to an answer as I believe it helps answer the question).
Following on from the comment chain: in order to translate the k code into q code (or simply to understand the k code) you have a few options, none of which are particularly well documented as it defeats the purpose of the q language - to be the wrapper which obscures the k language.
Option 1 is to inspect the built-in functions in the .q namespace
q).q
| ::
neg | -:
not | ~:
null | ^:
string | $:
reciprocal| %:
floor | _:
...
Option 2 is to inspect the q.k script which creates the above namespace (be careful not to edit/change this):
vi $QHOME/q.k
Option 3 is to look up some of the nuggets of documentation on the code.kx website, for example https://code.kx.com/q/wp/parse-trees/#k4-q-and-qk and https://code.kx.com/q/basics/exposed-infrastructure/#unary-forms
Option 4 is to search Google for reference material for other/similar versions of k, for example k2/k3. They tend to be similar-ish.
A final point to note is that in most of these examples you'll see a colon (:) after the primitives. This colon is required in q/kdb to use the monadic form of the primitive (most are heavily overloaded), while in k it is not required to explicitly force the monadic form. This is why where will show as &: in the q reference but will usually just be & in actual k code.

KDB Apply where phrase only if column exists

I'm looking for a way to write a functional select in KDB such that the where phrase is only applied if the column exists (in order to avoid an error). If the column doesn't exist, the condition defaults to true.
I tried this but it didn't work
enlist(|;enlist(in;`colname;key flip table);enlist(in;`colname;filteredValues[`colname]));
I tried to write a simple boolean expression and use parse to get my functional form
(table[`colname] in values)|(not `colname in key flip table)
But kdb doesn't have short-circuit evaluation, so the left-hand expression is still evaluated even when the right-hand expression evaluates to true. This caused a weird output, `boolean$(), which is a list of booleans all evaluating to false (0b).
Any help is appreciated. Thanks!
EDIT 1: I have to join a series of condition with parameter specified in the dictionary filters
cond,:(,/) {[l;k] enlist(in;k;enlist l[k])}[filters]'[a:(key filters)]
Then I pass this cond on and it gets executed in a few different selects on different tables. How can I make sure that whatever conditional expression I put in place of enlist(in;k;enlist l[k]) will only get evaluated when the select statement is executed?
You can use the if-else conditional $ here to do what you want
For example:
q)$[`bid in cols`quotes;enlist (>;`bid;35);()]
> `bid 35
q)$[`bad in cols`quotes;enlist (>;`bad;35);()]
Note that in the second example, the return is an empty list, as this column isn't in quotes table
So you can put this into the functional select like so:
?[`quotes;$[`bid in cols`quotes;enlist (>;`bid;35);()];0b;()]
and the where clause will be applied if the column is present; otherwise no where clause will be applied:
q)count ?[`quotes;$[`bid in cols`quotes;enlist (>;`bid;35);()];0b;()]
541 //where clause applied, table filtered
q)count ?[`quotes;$[`bad in cols`quotes;enlist (>;`bad;35);()];0b;()]
1000 //where clause not applied, full table returned
Hope this helps
Jonathon
AquaQ Analytics
EDIT: If I'm understanding your updated question correctly, you might be able to do something like the following. Firstly, let's define an example "filters" dictionary:
q)filters:`a`b`c!(1 2 3;"abc";`d`e`f)
q)filters
a| 1 2 3
b| a b c
c| d e f
So here we are assuming a few different columns of different types, for illustration purposes. You can build up your list of where clauses like so:
q)(in),'flip (key filters;value filters)
in `a 1 2 3
in `b "abc"
in `c `d`e`f
(this is equivalent to the code you had to generate cond, but it's a little neater & more efficient - you also have the values enlisted, which isn't necessary)
You could then use a vector conditional to generate your list of where clauses to apply to a given table e.g.
q)t:([] a:1 2 3 4 5 6;b:"adcghf")
q)?[key[filters] in cols[t];(in),'flip (key filters;value filters);count[filters]#()]
(in;`a;,1 2 3)
(in;`b;,"abc")
()
As you can see, in this example the table "t" has columns a and b, but not c. So using the vector conditional, you get the where clauses for a and b but not c.
Finally to actually apply this list of output where clauses to the table, you can make use of an over to apply each in turn:
q)l:?[key[filters] in cols[t];(in),'flip (key filters;value filters);count[filters]#()]
q){?[x;$[y~();y;enlist y];0b;()]}/[t;l]
a b
---
1 a
3 c
One thing to note here is that in the where clause of the functional select we need to check whether y is an empty list, so that we enlist it only when it is not empty.
Hope this helps
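The same idea, dropping any filter whose column is missing and then applying the remaining ones in turn, can be sketched outside q. The Python below is only an analogy, not kdb+ code: the table is modelled as a list of dicts, and the names applicable_filters and apply_filters are hypothetical.

```python
def applicable_filters(columns, filters):
    """Keep only the filters whose column exists in the table."""
    return {col: allowed for col, allowed in filters.items() if col in columns}


def apply_filters(rows, filters):
    """Apply each 'column in allowed values' filter in turn, like the over (/) above."""
    columns = set(rows[0]) if rows else set()
    for col, allowed in applicable_filters(columns, filters).items():
        rows = [row for row in rows if row[col] in allowed]
    return rows


# Same example data as the q session: t has columns a and b, but not c.
t = [{"a": a, "b": b} for a, b in zip([1, 2, 3, 4, 5, 6], "adcghf")]
filters = {"a": [1, 2, 3], "b": list("abc"), "c": ["d", "e", "f"]}

result = apply_filters(t, filters)
print(result)
```

The filter on c is skipped because t has no such column, so no lookup on a missing column is ever attempted.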

Syncsort Sum Fields=None not removing duplicates

I'm trying to run a SYNCSORT job that will remove duplicate entries and when I run it, I'm still getting duplicates. The following is the SYNCSORT code I'm using:
INCLUDE COND=(((61,1,CH,EQ,C'P'),OR,
(61,1,CH,EQ,C'V')),AND,
(8,2,CH,EQ,C'FL'))
OUTREC FIELDS=(1:12,20,
30:36,20,
55:61,1)
SORT FIELDS=(30,20,CH,A,
01,20,CH,A)
SUM FIELDS=NONE
The input is as follows:
----+----1----+----2----+----3----+----4----+----5----+----6
FL AMELIA CITY
32034 FL NASSAU FERNANDINA BEACH P
32034 FL NASSAU AMELIA CITY V
32034 FL NASSAU AMELIA ISLAND S
32034 FL NASSAU FERNANDINA S
I'm getting most of the expected output, except that I'm still getting duplicates. The output that I have is as follows:
----+----1----+----2----+----3----+----4----+----5----+
MANATEE BRADENTON P
MANATEE BRADENTON P
MANATEE BRADENTON P
MANATEE BRADENTON P
MANATEE BRADENTON P
MANATEE BRADINGTON V
POLK BRADLEY P
HILLSBOROUGH BRANDON P
SUWANNEE BRANFORD P
MIAMI-DADE BRICKELL V
Any help would be appreciated as I'm not able to find my error.
This is what you are sort summing on:
< ------------ Sort Field ----------------------->
----+----1----+----2----+----3----+----4----+----5----+----6
FL AMELIA CITY
32034 FL NASSAU FERNANDINA BEACH P
32034 FL NASSAU AMELIA CITY V
32034 FL NASSAU AMELIA ISLAND S
32034 FL NASSAU FERNANDINA S
The duplicate records differ in the first 11 bytes, which you cannot see in the output. Try removing the OUTREC to check.
Possible changes:
Change the OUTREC to an INREC.
Re-code the SORT with the fields associated with the output, as in the following:
INCLUDE COND=(((61,1,CH,EQ,C'P'),OR,
(61,1,CH,EQ,C'V')),AND,
(8,2,CH,EQ,C'FL'))
OUTREC FIELDS=(1:12,20,
30:36,20,
55:61,1)
SORT FIELDS=(36,20,CH,A,
12,20,CH,A)
SUM FIELDS=NONE
It does not matter what order you code the different stages of a "sort", they will be executed in the order that SORT wants.
In your case this will be INCLUDE, then SORT, then SUM, then OUTREC. You can check that this is the case by entirely inverting the control cards, you will get identical output.
If you want to do something before SORT you use INREC, not just try to locate OUTREC before the SORT statement. Here, since you are SORTing, you only want to include the data you need. You do not want to include the spacing for formatting. Why would you want to load up your file to SORT with extra identical data on each record?
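The effect of that processing order can be modelled outside the mainframe. The Python below is only a sketch under assumptions: the 61-byte record layout, the sample records (including the ZIP codes) and the helper names are made up for illustration, and SUM FIELDS=NONE is approximated as "keep one record per distinct sort key". It shows why sort keys at positions 30 and 1, which are applied to the input record, leave visible duplicates, while keys at the positions the OUTREC copies from (36 and 12) do not.

```python
def field(rec, start, length):
    """Return a 1-based fixed-width field, as in (start,length) on a control card."""
    return rec[start - 1:start - 1 + length]


def make_record(zip_code, state, county, city, kind):
    """Build a hypothetical 61-byte input record (layout assumed for illustration)."""
    rec = zip_code.ljust(7) + state.ljust(4) + county.ljust(24) + city.ljust(25) + kind
    assert len(rec) == 61
    return rec


def include(rec):
    """INCLUDE COND: type P or V in column 61, and state FL in columns 8-9."""
    return field(rec, 61, 1) in ("P", "V") and field(rec, 8, 2) == "FL"


def outrec(rec):
    """OUTREC FIELDS=(1:12,20,30:36,20,55:61,1), applied after SORT and SUM."""
    return field(rec, 12, 20) + " " * 9 + field(rec, 36, 20) + " " * 5 + field(rec, 61, 1)


def sort_and_sum_none(records, keys):
    """SORT on the (start,length) keys, then keep one record per distinct key."""
    def key_of(rec):
        return tuple(field(rec, start, length) for start, length in keys)

    kept = {}
    for rec in sorted(records, key=key_of):
        kept.setdefault(key_of(rec), rec)
    return list(kept.values())


records = [
    make_record("34201", "FL", "MANATEE", "BRADENTON", "P"),
    make_record("34202", "FL", "MANATEE", "BRADENTON", "P"),
    make_record("34203", "FL", "MANATEE", "BRADENTON", "P"),
    make_record("33511", "FL", "HILLSBOROUGH", "BRANDON", "P"),
    make_record("32034", "FL", "NASSAU", "AMELIA ISLAND", "S"),
]
selected = [r for r in records if include(r)]

# Keys as coded in the question: they address the INPUT record, so the ZIP code is part of the key.
as_coded = [outrec(r) for r in sort_and_sum_none(selected, [(30, 20), (1, 20)])]
# Keys on the input positions that the OUTREC copies from.
on_fields = [outrec(r) for r in sort_and_sum_none(selected, [(36, 20), (12, 20)])]

for line in on_fields:
    print(line)
```

With the keys from the question the three BRADENTON records all survive, because their keys differ in bytes that the OUTREC later throws away.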
On INREC and OUTREC please don't use FIELDS. On OUTFIL please don't use OUTREC. It should be obvious that FIELDS is "overloaded" (see how many times you used FIELDS, and see how many are "the same") and OUTREC is "overloaded". More than 10 years ago BUILD was introduced to allow things to be much clearer: it describes what it is doing, and every time you see BUILD it only means BUILD.
INCLUDE COND=(((61,1,CH,EQ,C'P'),
OR,
(61,1,CH,EQ,C'V')),
AND,
(8,2,CH,EQ,C'FL'))
INREC BUILD=(36,20,
12,20,
61,1)
SORT FIELDS=(1,40,CH,A)
OUTREC BUILD=(21,10,
10X,
1,20,
5X,
41,1)
The INREC selects only the data you want, and in an order where you need specify only one SORT key.
The OUTREC then formats the data how you want it. For each record in the SORT 15 bytes were saved (the blanks). 10X is 10 blanks, 5X is five blanks.
Note that it is much easier to code and understand, and therefore more maintainable, if you include "explicit" blanks rather than implicit ones using column numbers. Imagine 10 columns of a report, where the spacing between columns one and two is incorrect. Do you want to change all the column references, just to add one extra space, or would you prefer to change 7X to 8X and the rest works itself out? Even if you enjoy tedious changes, remember your colleagues :-)
If your data is already in order don't use SUM FIELDS=NONE. Use OUTFIL reporting features, REMOVECC, NODETAIL and SECTIONS with TRAILER3. NEVER SORT data just to allow you to remove duplicates with SUM FIELDS=NONE.

Stata mmerge update replace gives wrong output

I wanted to test what happened if I replace a variable with a different data type:
clear
input id x0
1 1
2 13
3 .
end
list
save tabA, replace
clear
input id str5 x0
1 "1"
2 "23"
3 "33"
end
list
save tabB, replace
use tabA, clear
mmerge id using tabB, type(1:1) update replace
list
The result is:
+--------------------------------------------------+
| id x0 _merge |
|--------------------------------------------------|
1. | 1 1 in both, master agrees with using data |
2. | 2 13 in both, master agrees with using data |
3. | 3 . in both, master agrees with using data |
+--------------------------------------------------+
This seems very strange to me. I expected an error or a reported disagreement. Is this a bug or am I missing something?
mmerge is user-written (Jeroen Weesie, SSC, 2002).
If you use the official merge in an up-to-date Stata, you will get what you expect.
. merge 1:1 id using tabB, update replace
x0 is str5 in using data
r(106);
I have not looked inside mmerge. My own guess is that what you see is a feature from the author's point of view, namely that it's not a problem if one variable is numeric and one variable is string so long as their contents agree. But why are you not using merge directly? There was a brief period several years ago when mmerge had some advantages over merge, but that's long past. BTW, I agree in wanting my merges to be very conservative and not indulgent on variable types.