How to apply a function to an entire column? - kdb

I have the following table from a JDBC connection in Q.
q)r
some_int this created_at updated_at ..
-----------------------------------------------------------------------------..
1231231 "ASD" 2016.02.11D14:16:29.743260000 2016.02.11D14:16:29...
13312 "TSM" 2016.02.11D14:16:29.743260000 2016.02.11D14:16:29...
I would like to apply the following function to the first column.
deviation:{a:avg x; sqrt avg (x*x)-a*a}
This works for lists.
q)l
1 2 3 4
q)deviation l
1.118034
How can I apply deviation on a column in a table? It seems my approach does not work:
q)select deviation(some_int) from r
'rank
UPDATE:
I cannot explain the following:
q)select avg(some_int) from r
some_int
---------
1005341
q)select min(some_int) from r
some_int
---------
812361
q)select max(some_int) from r
some_int
---------
1184014
q)select sum(some_int) from r
some_int
---------

You need to enlist the result if it is an atom since table columns must be lists, not atoms. Normally kdb can do this for you but often not when you're performing your own custom aggregations. For example, even if you define a function sum2 to be an exact copy of sum:
q)sum2:sum
kdb recognises only the built-in sum as an aggregation and enlists its result automatically; it does not do so for sum2:
q)select sum col1 from ([]col1:1 2 3 4)
col1
----
10
q)select sum2 col1 from ([]col1:1 2 3 4)
'rank
So you need to enlist in the second case:
q)select enlist sum2 col1 from ([]col1:1 2 3 4)
col1
----
10
UPDATE:
To answer your second question: it looks like the sum of your numbers has overflowed the range of an int. You'd need to convert them to long and then sum. For example, this sum still fits:
q)select sum col1 from ([]col1:2147483645 1i)
col1
----------
2147483646
The result above is the largest finite int. Adding one more gives the int infinity (0W):
q)select sum col1 from ([]col1:2147483645 1 1i)
col1
----
0W
Adding one more again wraps around to the int null, which displays as a blank:
q)select sum col1 from ([]col1:2147483645 1 1 1i)
col1
----
The solution is to cast to long before summing (or store the values as longs in the first place):
q)select sum `long$col1 from ([]col1:2147483645 1 1 1i)
col1
----------
2147483648

You get a 'rank error because the function returns an atom rather than a list. Since the function returns a single number, presumably you just want that number as the answer? In that case you can simply index into the table (or use exec) to get the column vector and apply the function to it:
deviation r`some_int
Otherwise, if you want to keep a table as the answer, enlist the result:
select enlist deviation some_int from r
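To make the difference between the two approaches concrete, here is a minimal sketch. The four-row table r is a hypothetical stand-in for the real one (only the some_int column is reproduced), and the output column name dev is my own choice.

```q
/ hypothetical stand-in for the real table; only some_int is reproduced
r:([]some_int:1 2 3 4)
deviation:{a:avg x; sqrt avg (x*x)-a*a}

/ exec pulls out the column vector, so the result is a bare atom
exec deviation some_int from r

/ select must build a column, so wrap the atom in a one-item list
select dev:enlist deviation some_int from r
```

The exec form should return the same 1.118034 as deviation 1 2 3 4 in the question; the select form returns a one-row table holding that value.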

Related

Replace first n entries in a column in kdb

How can I replace the first n values of a column in my table?
i.e. mycol:(1 2 3 4) to mycol:(a a 3 4)
Thank you in advance!
If it's the values within mycol that you want updated then they will need to be of the same type as the existing values. See below.
q)t:([]mycol:`$string 1+til 4;mycol2:til 4)
q)update mycol:`a from t where i<2
mycol mycol2
------------
a 0
a 1
3 2
4 3
One way around this, though, is to enlist each value of mycol. That turns the column into a general (mixed) list, so updates of any type can be made.
q)t:([]mycol:1+til 4;mycol2:til 4)
q)update mycol:`a from(update enlist each mycol from t)where i<2
mycol mycol2
------------
`a 0
`a 1
,3 2
,4 3
q)meta update mycol:`a from(update enlist each mycol from t)where i<2
c | t f a
------| -----
mycol |
mycol2| j
It's unclear from your question whether you want the column names or the column values changed. If it's the column names, you can use xcol.
q)(2#`a)xcol([]w:3#til 3;x:3#.Q.a;y:`;z:0N)
a a y z
-------
0 a
1 b
2 c

Fby q/kdb aggregation function

I want to apply an fby using count distinct, i.e.
select from t where 1=(count distinct; column) fby another_column
How can I do this?
To add to what's already up there, you could do this in fewer characters using #:
q)n:100
// Create a table where all entries for sym=`a are size 10
q)show t:update size:10 from ([]sym:n#`a`b`c`d;size:n?200) where sym = `a
sym size
--------
a 10
b 28
c 51
d 64
a 10
b 43
...
// Use count distinct# to select from the table as per your requirements
q)select from t where 1=(count distinct#;size) fby sym
sym size
--------
a 10
a 10
a 10
a 10
a 10
0N! is a great operator for checking the operation of these queries 'in-flight'.
Using it, we can see that count distinct fails on its own because it counts the function distinct itself, which returns 1:
q)select from t where 1=(0N!count distinct; size) fby sym
1
'type
[0] select from t where 1=(0N!count distinct; size) fby sym
However, with dyadic # we can create a handy projection:
q)select from t where 1=(0N!(count distinct#); size) fby sym
##[?:]
sym size
--------
a 10
a 10
a 10
a 10
...
Here I've had to wrap the expression in brackets to prevent 0N! getting sucked into the count distinct # projection. In k-speak this effectively translates to 'count the result of the distinct operator applied to whatever the second argument to # is'. Quite handy for code-golfing.
It'd be easier to know for sure if you provide a sample table and desired output, but one of the following will likely help.
The simplest solution would be to use an anonymous function:
select from t where 1=({count distinct x};c1) fby c2
Alternatively you can use this syntax which I first saw used in Nick Psaris' new book Fun Q:
select from t where 1=(count distinct ::;c1) fby c2
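Since no sample table was given, here is a small made-up one to try the anonymous-function version on. The table contents are my own assumption; the column names c1 and c2 follow the queries above.

```q
/ hypothetical data: group a repeats one value, group b has two, group c has one row
t:([]c2:`a`a`b`b`c;c1:10 10 1 2 5)

/ keep only the rows whose c2 group contains exactly one distinct c1 value
select from t where 1=({count distinct x};c1) fby c2
```

This should keep the two a rows and the single c row, and drop both b rows because that group has two distinct c1 values.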

kdb/q -- how to number the rows by certain groupings

I have tables with date;sym columns. But each date might have multiple syms. I want to number the occurrences of symbol in each date
For example:
date sym
-------------------
2019.06.04 ABC
2019.06.04 DEF
2019.06.04 ABC
2019.06.05 DEF
2019.06.05 ABC
will give me
date sym c
-------------------
2019.06.04 ABC 1
2019.06.04 DEF 1
2019.06.04 ABC 2 / here ABC appears for the second time on this date.
2019.06.05 DEF 1
2019.06.05 ABC 1
This may be a little cleaner. Here the c column is just a running sum over the rows in each group, where the groups are the combinations of date and sym.
q)t:([]date:2019.06.04+0 0 0 1 1;sym:`ABC`DEF`ABC`DEF`ABC)
q)update c:sums i=i by date,sym from t
date sym c
----------------
2019.06.04 ABC 1
2019.06.04 DEF 1
2019.06.04 ABC 2
2019.06.05 DEF 1
2019.06.05 ABC 1
To count the occurrences of syms by date across all of the tables in an HDB, we can run a count by date for each of the partitioned tables listed in .Q.pt and then combine the results with pj (plus join) using over, as every result is keyed on the same columns. As pj only keeps the keys of its left argument, rows whose key appears only in the right-hand table would be dropped, so we append those back after each join:
q)cntTabs:{2!0!update c:count each sym,sym:first each sym from select sym by date from x} each .Q.pt
q){t:pj[x;y];t,k!y k:key[y] except key[t]}/[cntTabs]

kdb how to calculate rolling count

Assume I have a table of events, with Timestamp and Type.
t1, 'b'
t2, 'x'
t3, 's'
t4, 'b'
How can I get a rolling count, i.e. a list of all timestamps with the cumulative number of events up to that timestamp, sort of like a count version of sums?
For example, for 'b' I'd like a table
't1', 1
't2', 1
't3', 1
't4', 2
Here is one way to do it using sums, although there may be a more clever way:
//table definition
tab:([]a:`t1`t2`t3`t4;b:"bxsb")
//rolling sum of 1 by column b
update sums count[i]#1 by b from tab
Results in:
a b x
------
t1 b 1
t2 x 1
t3 s 1
t4 b 2
If you wanted to replace b, you would simply put b: in front of the sums.
One way:
q)t:([]p:asc 4?.z.p+til 1000;t:`b`x`s`b)
q)asc `p xcols ungroup select p,til count i by t from t
p t x
---------------------------------
2017.05.16D09:42:48.259062090 b 0
2017.05.16D09:42:48.259062585 x 0
2017.05.16D09:42:48.259062683 s 0
2017.05.16D09:42:48.259062858 b 1
PS: Note that I have started the sequence at 0, as if to say "I've had 0 events prior to this row", instead of beginning at 1 as in your example. That matches your requirement of "number of events up to that ts". If you need it to start at 1, just add 1, i.e. use 1+til count i. Also make sure your times are sorted, otherwise the sequence will not make sense.
With table t as below:
q)show t: ([]ts:.z.t - desc "u"$(til 4);symb:`b`x`z`b)
ts symb
-----------------
09:46:56.384 b
09:47:56.384 x
09:48:56.384 z
09:49:56.384 b
using a vector conditional:
q)select ts, cum_count:sums ?[symb=`b;1;0] from t
ts cum_count
----------------------
09:46:56.384 1
09:47:56.384 1
09:48:56.384 1
09:49:56.384 2
The same, but with a function taking symb as a parameter:
q){select ts, cum_count:sums ?[symb=x;1;0] from t}[`b]
ts cum_count
----------------------
09:46:56.384 1
09:47:56.384 1
09:48:56.384 1
09:49:56.384 2
In fact you don't need a vector conditional because you can just sum the booleans directly:
q){select ts, cum_count:sums symb=x from t}[`b]
ts cum_count
----------------------
09:46:56.384 1
09:47:56.384 1
09:48:56.384 1
09:49:56.384 2
This also works
update x:1+til count i by b from tab

Why is it blank when I do summation in kdb?

I did a simple query:
select sum(sol) from data
and it returned blank for the result. I checked if there is any null and it doesn't have any null value. How can I know what was wrong?
Is it possible that your sum is too big to be stored as an int? That would produce a blank (null):
q)tab:([] col1:1 2i,0Wi-1i)
q)
q)tab
col1
----------
1
2
2147483646
q)
q)meta tab
c | t f a
----| -----
col1| i
q)
q)
q)select sum col1 from tab
col1
----
q)
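If overflow does turn out to be the cause, casting the column to long before summing should give the real total. A minimal sketch, redefining the same tab as above so it can be run on its own:

```q
/ three 32-bit ints whose total exceeds the int range
tab:([] col1:1 2i,0Wi-1i)

/ cast each value to a 64-bit long before summing
select sum `long$col1 from tab
```

With these values the cast version should return 2147483649 instead of a blank.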
Make sure you are not assigning the result to a variable or ending the line with a semi-colon, as both suppress the immediate output of the result.
q)data:([]sol:1 2 3 4 5)
q)
q) /Expected result.
q)select sum(sol) from data
sol
---
15
q)
q) /Output suppressed by semi-colon.
q)select sum(sol) from data;
q)
q) /Output suppressed by variable assignment.
q)example:select sum(sol) from data
q)
Also, even if data has no records in it (e.g. count data is 0), the query will still output the column name, but with no results underneath.
q)data:([]sol:())
q)
q)select sum(sol) from data
sol
---
q)
A null value in the data set shouldn't affect the output (of sum at least), but if you want to replace the null values (with zero, for example) you could do this:
q)update 0^sol from `data
This uses the ^ (fill) operator.
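A short sketch of both points, using a made-up three-row data table with one null in it (the values are my own assumption):

```q
/ hypothetical data with a null long in the middle
data:([]sol:1 0N 3)

/ sum skips the null, so the total is 1+3
select sum sol from data

/ fill nulls with 0; passing `data by name updates the table in place
update 0^sol from `data

/ the null has been replaced by 0
data
```

The sum here should come out as 4 both before and after the fill, which is why a null on its own is unlikely to explain a blank result.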