kdb - how does the 'where' function work

kdb - how does the 'where' function work - kdb

I want to understand what is happening with the below statement:
sum(til 100) where 000011110000b
That line evaluates to 22, and I can't figure out why. sum(til 100) is 4950. where 000011110000b returns the list 4 5 6 7. The kdb reference page doesn't seem to explain this use case. Why does the above line evaluate to 22?
Also, why does the below line result in error
4950 where 000011110000b

Square brackets are used for function call arguments, rather than parentheses. So the above is being interpreted as:
sum[(til 100) where 000011110000b]
And you can probably see now why that would evaluate to 22, i.e. it is the sum of the 5th,6th,7th,8th values of the list til 100, i.e. (0...99), because where is indexing into til 100 using the boolean list 000011110000b
q)til[100] where 000011110000b
4 5 6 7
As you will have noticed, you can omit square brackets when calling functions; but when doing so you need to be sure it will be parsed/interpreted as intended. In general, code is evaluated right-to-left so in this case (til 100) where 000011110000b was evaluated first, and the result passed to the sum function.
Regarding 4950 where 000011110000b - again if we think right to left, the interpreter is trying to pass the result of where 000011110000b to the function 4590. Although 4590 isn't a named function in kdb - this is the format used for IPC, i.e. executing commands on other kdb processes over TCP. So you are telling kdb to use the OS file descriptor number 4590 (which almost certainly won't correspond to a valid TCP connection) to run the command where 000011110000b. See the "message formats" section here for more details:
http://code.kx.com/q/tutorials/startingq/ipc/

Related

kdb: differences between value and eval

From KX: https://code.kx.com/q/ref/value/ says, when x is a list, value[x] will be result of evaluating list as a parse tree.
Q1. In code below, I understand (A) is a parse tree, given below definition. However, why does (B) also work? Is ("+";3;4) a valid parse tree?
q)value(+;3;4) / A
7
q)value("+";3;4) / B
7
q)eval(+;3;4) / C
7
q)eval("+";3;4) / D
'length
[0] eval("+";3;4)
Any other parse tree takes a form of a list, of which the first item
is a function and the remaining items are its arguments. Any of these
items can be parse trees. https://code.kx.com/q/basics/parsetrees/
Q2. In below code, value failed to return the result of what I think is a valid parse tree, but eval works fine, recursively evaluating the tree. Does this mean the topmost description is wrong?
q)value(+;3;(+;4;5))
'type
[0] value(+;3;(+;4;5))
^
q)eval(+;3;(+;4;5))
12
Q3. In general then, how do we choose whether to use value or eval?

put simply the difference between eval and value is that eval is specifically designed to evaluate parse trees, whereas value works on parse trees among other operations it does. For example value can be used to see the non-keyed values of dictionaries, or value strings, such as:
q)value"3+4"
7
Putting this string instead into the eval, we simply get the string back:
q)eval"3+4"
"3+4"
1 Following this, the first part of your question isn't too bad to answer. The format ("+";3;4) is not technically the parsed form of 3+4, we can see this through:
q)parse"3+4"
+
3
4
The good thing about value in this case is that it is valuing the string "+" into a the operator + and then valuing executing the parse tree. eval cannot understand the string "+" as this it outside the scope of the function. Which is why A, B and C work but not D.
2 In part two, your parse tree is indeed correct and once again we can see this with the parse function:
q)parse"3+(4+5)"
+
3
(+;4;5)
eval can always be used if your parse tree represents a valid statement to get the result you want. value will not work on all parse tree's only "simple" ones. So the nested list statement you have here cannot be evaluated by value.
3 In general eval is probably the best function of choice for evaluating your parse trees if you know them to be the correct parse tree format, as it can properly evaluate your statements, even if they are nested.

How to put multiple where statements into function on kdb+

I'm trying to write a function using kdb+ which will look at the list, and find the values that simply meet two conditions.
Let's call the list DR (for data range). And I want a function that will combine these two conditions
"DR where (DR mod 7) in 2"
and
"DR where (DR.dd) in 1"
I'm able to apply them one at a time but I really need to combine them into one function. I was hoping I could do this
"DR was (DR.dd mod 7) in 2 and DR where (DR.dd) in 1"
but this obviously didn't work. Any advice?

You can utilize the and function to help with this, which is the same as &:
q)dr:.z.d+til 100
q)and
&
q)2=dr mod 7
10000001000000100000010000001000000100000010000001000000100000010000001000000..
q)1=dr.dd
00000000000000000000000001000000000000000000000000000000100000000000000000000..
q)(1=dr.dd)&2=dr mod 7
00000000000000000000000000000000000000000000000000000000100000000000000000000..
q)dr where(1=dr.dd)&2=dr mod 7
2021.02.01 2021.03.01
Its necessary wrap the first part in brackets due to how kdb reads code from right to left. This format changes slightly when doing this in a where clause, the brackets arent needed due to how each where clause is parsed, that is each clause between the commas are parsed seperately. However it is essentially doing the same thing as the code above.
q)t:([]date:dr)
q)select from t where 1=date.dd,2=date mod 7
date
----------
2021.02.01
2021.03.01

You could also do this using min to achieve similar results, like so:
DR where min(1=DR.dd;2=DR mod 7)

Expressions/operator precedence in Amend At and in function parameters

I always thought that in q and in k all expressions divided ; evaluated left-to-right and operator precedence inside is right-to-left.
But then I tried to apply this principle to Ament At operator parameters. Confusingly it seems working in the opposite direction:
$ q KDB+ 3.6 2019.04.02 Copyright (C) 1993-2019 Kx Systems
q)#[10 20 30;g;:;100+g:1]
10 101 30
The same precedence works inside a function parameters too:
q){x+y}[q;10+q:100]
210
So why does it happend - why does it first calculate the last one parameter and only then first? Is it a feature we should avoid?
Upd: evaluation vs parsing.
There could be another cases: https://code.kx.com/q/ref/apply/#when-e-is-not-a-function
q)#[2+;"42";{)}]
')
[0] #[2+;"42";{)}]
q)#[string;42;a:100] / expression not a function
"42"
q)a // but a was assigned anyway
100
q)#[string;42;{b::99}] / expression is a function
"42"
q)b // not evaluated
'b
[0] b
^

The semicolon is a multi-purpose separator in q. It can separate statements (e.g. a:10; b:20), in which case statements are evaluated from left-to-right similar to many other languages. But when it separates elements of a list it creates a list expression which (an expression) is evaluated from right-to-left as any other q expression would be.
Like in this example:
q)(q;10+q:100)
110 100
One of many overloads of the dot operator (.) evaluates its left operand on a list of values in its right operand:
q){x+y} . (q;10+q:100)
210
In order to do so a list expression itself needs to be evaluated first, and it will be, from right-to-left as any other list expression.
However, the latter is just another way of getting the result of
{x+y}[q;10+q:100]
which therefore should produce the same value. And it does. By evaluating function arguments from right-to-left, of course!
A side note. Please don't be confused by the conditional evaluation statement $[a;b;c]. Even though it looks like an expression it is in fact a statement which evaluates a first and only then either b or c. In other words a, b and c are not arguments of some function $ in this case.

kdb/q question: How do I interpret this groupby in my functional selection?

I am new to kdb/q and am trying to figure out what this particular query means. The code is using functional select, which I am not overly comfortable with.
?[output;();b;a];
where output is some table which has columns size time symbol
the groupby filter dictionary b is defined as follows
key | value
---------------
ts | ("+";00:05:00v;("k){x*y div x:$[16h=abs[#x];"j"$x;x]}";00:05:00v;("%:";`time)))
sym | ("k){x'y}";"{`$(,/)("/" vs string x)}";`symbol)
For the sake of completeness, dictionary a is defined as
volume ("sum";`size)
In effect, the functional select seems to be bucketing the data into 5 minute buckets and doing some parsing in symbol. What baffles me is how to read the groupby dictionary. Especially the k)" part and the entire thing being in quotes. Can someone help me go through this or point me to resources that can help me understand? Any input will be appreciated.

The aggregation part of the function form takes a dictionary, the key being the output key column names and the values being parse tree functions.
A parse tree is an expression that is not immediately evaluated. The first argument as a function and subsequent elements are its arguments. The inner-most brackets are evaluated first and then it moves up the heirarchy, evaluating each one in turn. More detailed information can be found here and in the whitepaper linked on that page
You can use the function parse with a string argument to get the parse tree of a function. For example, the parse tree for 1+2+3 is (+;1;(+;2;3)):
q)parse "1+2+3"
+
1
(+;2;3)
The inner-most bracket (+;2;3) is evaluated first resulting in 5, before the result is propogated up to the outmost parse tree function (+;1;5) giving 6
The groupby part of the clause will evaluate one or more parse tree functions and then will collect together records with the same output from the grouping function.
Making the function a bit clearer to read:
(+;00:05:00v;({x*y div x:$[16h=abs[#x];"j"$x;x]}";00:05:00v;(%:;`time)))
Looking at the inner most bracket (%:;`time), it returns the result of %: applied on the time column. We can see that %: is k for the function ltime
q)ltime
%:
Moving up a level, the next function evaluated is the lambda function {x*y div x:$[16h=abs[#x];"j"$x;x]} with arguments 00:05:00v and the result of our previous evaluated function. The lambda rounds it down the the nearest 5 minute interval
({x*y div x:$[16h=abs[#x];"j"$x;x]};00:05:00v;(%:;`time))
Moving up once more to the whole expression it is equivalent to 00:05:00v + {x*y div x:$[16h=abs[#x];"j"$x;x]};00:05:00v;(%:;`time)), with 00:05:00 being added onto each result from the previous evaluation.
So essentially it first returns the local time of the timestamp, then
For the symbol aggregation
("k""{x'y}";{`$(,/)("/" vs string x)};`symbol)
The inner function {`$(,/)("/" vs string x)} strings a symbol, splits it at "/" character and then joins it back together, effectively removing the slash
The "k" is a function that evaluates the string using the k interpreter.
"k""{x'y}"" returns a function which itself takes a function x and argument y and modifies the function to use the each-both adverb '. This makes it so that the function x is applied on each symbol individually as opposed to the column as a whole.
This could be implemented in q instead of k like so:
({x#'y};{`$(,/)("/" vs string x)};`symbol)
The function {x#'y} takes the function argument {`$(,/)("/" vs string x)} and the symbol column as before, but we have to use # with the each-both adverb in q to apply the function on the arguments.
The aggregation function will then be applied to each group. In your case the function is a simple parse tree, which will return the sum of the size columns in each group, with the output column called volume
a:enlist[`volume]!enlist (sum;`size)

Error: Can't take log of -9.4351e+0.007

I am creating a mini search engine using Perl.While doing so I am using a formula with log to the base 10. However for some value I am getting an error:
Can't take log of -9.4351e+0.007.
It is impossible to track where I am getting this error from. I just want to ignore this case. How can this be handled in Perl. Subroutine for finding log to the base 10 is like this:
sub log10 {
my $n=shift;
return log($n)/log(10);
}
So probably i am looking for a check which says if so and so value dont find log.

You cannot take the log of negative numbers.
See Wolfram MathWorld for more details.

Apart from the value being negative, the string -9.4351e+0.007 is not a valid number as the exponent part of a floating-point constant can be only an integer.
You must be passing strings to your log10 function as Perl would not complain about a number in this format.
You need to look at the source of these values as something is going wrong before your function is called, and it will probably give you incorrect results even for those values that can be passed to log without error.

"ln y" means "find the x where ex equals y".
e is a positive number (near 2.17828), so no matter how many times you multiply e with itself, you'll never get a negative number.
You cannot find the log of negative numbers.
As Borodin also points out that -9.4351e+0.007 isn't recognized as a number by Perl.
>perl -wE"say 0+'-9.4351e+0.007'"
Argument "-9.4351e+0.007" isn't numeric in addition (+) at -e line 1.
-9.4351

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

kdb - how does the 'where' function work - kdb

Related

kdb: differences between value and eval

How to put multiple where statements into function on kdb+

Expressions/operator precedence in Amend At and in function parameters

kdb/q question: How do I interpret this groupby in my functional selection?

Error: Can't take log of -9.4351e+0.007

Categories

Resources