KDB Apply where phrase only if column exists - kdb

I'm looking for a way to write functional select in KDB such that the where phrases is only apply if the column exists (on order to avoid error). If the column doesn't exist, it defaults to true.
I tried this but it didn't work
enlist(|;enlist(in;`colname;key flip table);enlist(in;`colname;filteredValues[`colname]));
I tried to write a simple boolean expression and use parse to get my functional form
(table[`colname] in values)|(not `colname in key flip table)
But kdb doesn't have short circuit so the left-hand expression is still evaluated despite the right-hand expression evaluating to true. This caused a weird output boolean$() which is a list of booleans all evaluating to false 0b
Any help is appreciated. Thanks!
EDIT 1: I have to join a series of condition with parameter specified in the dictionary filters
cond,:(,/) {[l;k] enlist(in;k;enlist l[k])}[filters]'[a:(key filters)]
Then I pass this cond on and it gets executed on a few different selects on different tables. How can I make sure that whatever conditional expression I put in place of enlist(in;k;enlist l[k] will only get evaluated as the select statement gets executed.

You can use the if-else conditional $ here to do what you want
For example:
q)$[`bid in cols`quotes;enlist (>;`bid;35);()]
> `bid 35
q)$[`bad in cols`quotes;enlist (>;`bad;35);()]
Note that in the second example, the return is an empty list, as this column isn't in quotes table
So you can put this into the functional select like so:
?[`quotes;$[`bid in cols`quotes;enlist (>;`bid;35);()];0b;()]
and the where clause will be applied the the column is present, otherwise no where clause will be applied:
q)count ?[`quotes;$[`bid in cols`quotes;enlist (>;`bid;35);()];0b;()]
541 //where clause applied, table filtered
q)count ?[`quotes;$[`bad in cols`quotes;enlist (>;`bad;35);()];0b;()]
1000 //where clause not applied, full table returned
Hope this helps
Jonathon
AquaQ Analytics
EDIT: If I'm understanding your updated question correctly, you might be able to do something a like the following. Firstly, let's define an example "filters" dictionary:
q)filters:`a`b`c!(1 2 3;"abc";`d`e`f)
q)filters
a| 1 2 3
b| a b c
c| d e f
So here we are assuming a few different columns of different types, for illustration purposes. You can build up your list of where clauses like so:
q)(in),'flip (key filters;value filters)
in `a 1 2 3
in `b "abc"
in `c `d`e`f
(this is equivalent to the code you had to generate cond, but it's a little neater & more efficient - you also have the values enlisted, which isn't necessary)
You could then use a vector conditional to generate your list of where clauses to apply to a given table e.g.
q)t:([] a:1 2 3 4 5 6;b:"adcghf")
q)?[key[filters] in cols[t];(in),'flip (key filters;value filters);count[filters]#()]
(in;`a;,1 2 3)
(in;`b;,"abc")
()
As you can see, in this example the table "t" has columns a and b, but not c. So using the vector conditional, you get the where clauses for a and b but not c.
Finally to actually apply this list of output where clauses to the table, you can make use of an over to apply each in turn:
q)l:?[key[filters] in cols[t];(in),'flip (key filters;value filters);count[filters]#()]
q){?[x;$[y~();y;enlist y];0b;()]}/[t;l]
a b
---
1 a
3 c
One thing to note here is that in the where clause of the functional select we need to check if y is an empty list - this is so we can enlist it if it is not an empty list
Hope this helps

Related

postgres regexp_matches strange behavior

Following the short docs on regexp_matches:
Return all captured substrings resulting from matching a POSIX regular expression against the string.
Example: regexp_matches('foobarbequebaz', '(bar)(beque)') returns {bar,beque}
With that in mind, I'd expect the result of regexp_matches('barbarbar', '(bar)') to be {bar,bar,bar}
However, only {bar} is returned.
Is this the expected behavior? Am I missing something?
Note:
calling regexp_matches('barbarbar', '(bar)', 'g') does return all 3 bars, but in table form:
regexp_matches text[]
{bar}
{bar}
{bar}
This behavior is described more in details in 9.7.3. POSIX Regular Expressions :
The regexp_matches function returns a set of text arrays of captured
substring(s) resulting from matching a POSIX regular expression
pattern to a string. It has the same syntax as regexp_match. This
function returns no rows if there is no match, one row if there is a
match and the g flag is not given, or N rows if there are N matches
and the g flag is given. Each returned row is a text array containing
the whole matched substring or the substrings matching parenthesized
subexpressions of the pattern, just as described above for
regexp_match. regexp_matches accepts all the flags shown in Table
9.24, plus the g flag which commands it to return all matches, not just the first one.
This is expected behavior. The function returns a set of text[] which means that multiple matches are presented in multiple rows. Why is it organized this way? The goal is to make it possible to find more than one token from a single match. In this case, they are presented in the form of an array. The documentation delivers a telling example:
SELECT regexp_matches('foobarbequebazilbarfbonk', '(b[^b]+)(b[^b]+)', 'g');
regexp_matches
----------------
{bar,beque}
{bazil,barf}
(2 rows)
The query returns two matches, each of them containing two tokens found.

deleting all records for all tables in memory in Kdb+

I would like to delete all records for all tables in memory but still keep the schemas.
for example:
a:([]a:1 2;b:2 4);
b:([]c:2 3;d:3 5);
I wrote a function:
{[t] t::select from t where i = -1} each tables[]
this didnt work, so i tried
{[t] ![`t;enlist(=;`i;-1);0b;()]} each tables[]
didnt work either.
Any idea or better ways?
If you pass a global table name as a symbol, it removes all the rows, leaving an empty table
q)delete from `a
`a
q)a
a b
---
q)meta a
c| t f a
-| -----
a| j
b| j
To do it for all global tables in root name space
{delete from x} each tables[]
Your second attempt using function was close. You can achieve it via the following (functional form of the above):
![;();0b;`symbol$()] each tables[]
The first argument should be the symbol of the table for the same reason I mentioned before
The second argument should be an empty list as we want to delete all records (we do not want to delete where i=-1, as that would delete nothing)
The final argument (list of columns to delete) should be an empty symbol list instead of an empty general list.
Mark's solution is best for doing what you want rather than functional form. Just adding to your question on t failing as putting kdb code in comments is awkward.
Your functional form fails not because of the t but because your last argument is not a symbol list `$(). Also you would want to delete where i is > -1, not =
q){[t] ![t;enlist(>;`i;-1);0b;`$()]} each `d`t`q
`d`t`q
q)d
date sym time bid1 bsize1 bid2 bsize2 bid3 bsize3 ask1 asize1 ask2 asize2 ask..
-----------------------------------------------------------------------------..
q)t
date sym time src price size
----------------------------
q)q
date sym time src bid ask bsize asize
-------------------------------------

How to put multiple where statements into function on kdb+

I'm trying to write a function using kdb+ which will look at the list, and find the values that simply meet two conditions.
Let's call the list DR (for data range). And I want a function that will combine these two conditions
"DR where (DR mod 7) in 2"
and
"DR where (DR.dd) in 1"
I'm able to apply them one at a time but I really need to combine them into one function. I was hoping I could do this
"DR was (DR.dd mod 7) in 2 and DR where (DR.dd) in 1"
but this obviously didn't work. Any advice?
You can utilize the and function to help with this, which is the same as &:
q)dr:.z.d+til 100
q)and
&
q)2=dr mod 7
10000001000000100000010000001000000100000010000001000000100000010000001000000..
q)1=dr.dd
00000000000000000000000001000000000000000000000000000000100000000000000000000..
q)(1=dr.dd)&2=dr mod 7
00000000000000000000000000000000000000000000000000000000100000000000000000000..
q)dr where(1=dr.dd)&2=dr mod 7
2021.02.01 2021.03.01
Its necessary wrap the first part in brackets due to how kdb reads code from right to left. This format changes slightly when doing this in a where clause, the brackets arent needed due to how each where clause is parsed, that is each clause between the commas are parsed seperately. However it is essentially doing the same thing as the code above.
q)t:([]date:dr)
q)select from t where 1=date.dd,2=date mod 7
date
----------
2021.02.01
2021.03.01
You could also do this using min to achieve similar results, like so:
DR where min(1=DR.dd;2=DR mod 7)

Find rows where string contains certain character at specific place

I have a field in my database, that contains 10 characters:
Fx: 1234567891
I want to look for the rows where the field has eg. the numbers 8 and 9 in places 5 and 6
So for example,
if the rows are
a) 1234567891
b) 1234897891
c) 1234877891
I only want b) returned in my select.
The type of the field is string/character varying.
I have tried using:
where field like '%89%'
but that won't work, because I need it to be 89 at a specific place in the string.
The fastest solution would be
WHERE substr(field, 8, 2) = '89'
If the positions are not adjacent, you end up with two conditions joined with AND.
You should be able to evaluate the single character using the underscore(_) character. So you should be able to use it as follows.
where field like '____89%'

kdb/q question: How do I interpret this groupby in my functional selection?

I am new to kdb/q and am trying to figure out what this particular query means. The code is using functional select, which I am not overly comfortable with.
?[output;();b;a];
where output is some table which has columns size time symbol
the groupby filter dictionary b is defined as follows
key | value
---------------
ts | ("+";00:05:00v;("k){x*y div x:$[16h=abs[#x];"j"$x;x]}";00:05:00v;("%:";`time)))
sym | ("k){x'y}";"{`$(,/)("/" vs string x)}";`symbol)
For the sake of completeness, dictionary a is defined as
volume ("sum";`size)
In effect, the functional select seems to be bucketing the data into 5 minute buckets and doing some parsing in symbol. What baffles me is how to read the groupby dictionary. Especially the k)" part and the entire thing being in quotes. Can someone help me go through this or point me to resources that can help me understand? Any input will be appreciated.
The aggregation part of the function form takes a dictionary, the key being the output key column names and the values being parse tree functions.
A parse tree is an expression that is not immediately evaluated. The first argument as a function and subsequent elements are its arguments. The inner-most brackets are evaluated first and then it moves up the heirarchy, evaluating each one in turn. More detailed information can be found here and in the whitepaper linked on that page
You can use the function parse with a string argument to get the parse tree of a function. For example, the parse tree for 1+2+3 is (+;1;(+;2;3)):
q)parse "1+2+3"
+
1
(+;2;3)
The inner-most bracket (+;2;3) is evaluated first resulting in 5, before the result is propogated up to the outmost parse tree function (+;1;5) giving 6
The groupby part of the clause will evaluate one or more parse tree functions and then will collect together records with the same output from the grouping function.
Making the function a bit clearer to read:
(+;00:05:00v;({x*y div x:$[16h=abs[#x];"j"$x;x]}";00:05:00v;(%:;`time)))
Looking at the inner most bracket (%:;`time), it returns the result of %: applied on the time column. We can see that %: is k for the function ltime
q)ltime
%:
Moving up a level, the next function evaluated is the lambda function {x*y div x:$[16h=abs[#x];"j"$x;x]} with arguments 00:05:00v and the result of our previous evaluated function. The lambda rounds it down the the nearest 5 minute interval
({x*y div x:$[16h=abs[#x];"j"$x;x]};00:05:00v;(%:;`time))
Moving up once more to the whole expression it is equivalent to 00:05:00v + {x*y div x:$[16h=abs[#x];"j"$x;x]};00:05:00v;(%:;`time)), with 00:05:00 being added onto each result from the previous evaluation.
So essentially it first returns the local time of the timestamp, then
For the symbol aggregation
("k""{x'y}";{`$(,/)("/" vs string x)};`symbol)
The inner function {`$(,/)("/" vs string x)} strings a symbol, splits it at "/" character and then joins it back together, effectively removing the slash
The "k" is a function that evaluates the string using the k interpreter.
"k""{x'y}"" returns a function which itself takes a function x and argument y and modifies the function to use the each-both adverb '. This makes it so that the function x is applied on each symbol individually as opposed to the column as a whole.
This could be implemented in q instead of k like so:
({x#'y};{`$(,/)("/" vs string x)};`symbol)
The function {x#'y} takes the function argument {`$(,/)("/" vs string x)} and the symbol column as before, but we have to use # with the each-both adverb in q to apply the function on the arguments.
The aggregation function will then be applied to each group. In your case the function is a simple parse tree, which will return the sum of the size columns in each group, with the output column called volume
a:enlist[`volume]!enlist (sum;`size)