Equal not throwing error during list comparison in KDB - kdb

I have a table
t:([] s:`GOOG`APPL;d:2?.z.d ;t:2?.z.t)
I'd use = while selecting a single record :
q)select from t where s=`GOOG
s d t
----------------------------
GOOG 2017.01.25 04:30:47.898
However I am expecting an error while selecting multiple records using '=' because I should be using 'in' , but it is returning the correct data rather than throwing any error :
q)select from t where s=`GOOG`APPL
s d t
----------------------------
GOOG 2017.01.25 04:30:47.898
APPL 2007.12.27 04:07:38.923

In the first case you are comparing a vector to an atom. = is overloaded for this operation, so it will compare each element in the vector to that atom
q)t[`s]
`GOOG`APPL
q)t[`s]=`GOOG
10b
In the second case, where you expect the error, = is doing a vector to vector comparison. Since the length of both vectors are the same it works and does not throw a length error :
q)t[`s]=`GOOG`APPL
11b
Changing the order won't return anything due to the element by element comparison :
q)t[`s]=`APPL`GOOG
00b
e.g. for the following table with 3 elements
t:([] s:`GOOG`APPL`HP;d:3?.z.d ;t:3?.z.t)
q)t[`s]
`GOOG`APPL`HP
you'd get a length error
q)t[`s]=`GOOG`APPL
'length
and therefore using in would fix the error
q)t[`s] in `GOOG`APPL
110b

Related

KDB: select and round off each row

I created my own function of round off:
.q.rnd:{$[x < 0; -1; 1] * floor abs[x] + 0.5}
I have a table Test with a string column of COL
select "F"$(COL) from Test
24549.18741328
48939.50717263
-274853.33568872
-24549.18741328
298753.62574861
84822.70074144
-7468840.64371524
117944.21228603
-117944.21228603
7468840.64371524
-7468840.64371524
I want to derive a table that would round-off the records in Test
One would think that the statement below would work. But it does not.
select .q.rnd "F"$(COL) from Test
I get the error "type". So how do I round off the records?
The result if the if-else conditional must be an atomic boolean. When you run .q.rnd on a column, you are operating on a list and x<0 is going to return a list of booleans, not an atom. The vector conditional is ?
Nonetheless, it looks like you want a resulting integer/long anyway, so just use parse here
q)t:([]string (10?-1 1)*10?10000f)
q)select "F"$x from t
x
-------------------
4123.1701336801052
-9877.8444156050682
-3867.3530425876379
7267.8099689073861
4046.5459413826466
-8355.0649625249207
6427.3701561614871
-5830.2619284950197
1424.9352994374931
-9149.8820902779698
q)select "j"$"F"$x from t
x
-----
4123
-9878
-3867
7268
4047
-8355
6427
-5830
1425
-9150
To add to what Sean's said, if you wanted to use your function as well you could use each which will apply .q.rnd to each item in the list.
q)select .q.rnd each "F"$x from t
x
-----
-3928
5171
5160
-4067
-1781
3018
-7850
5347
-7112
-4116
but using select "F"$x from t is better as it is vectorised.
q)\t:1000 select "j"$"F"$x from t
22
q)\t:1000 select .q.rnd each "F"$x from t
33
Also it should be noted that the .q namespace isn't necessary and is "reserved for kx use". A lot of the default q functions are in the .q namespace and there's always a chance future kdb updates could add a .q.rnd that has different behaviour and will break any code where you have used your function in.

Inconsistent Results Using Loopback Count End-Point With Filter+Where vs. Just Where

We're getting unexpected and inconsistent results when using Loopback's count end-point. Using filter[where] always returns the total number of rows in the table, ignoring any filter. For example, the following call always returns a value of 9 which is not correct (given the data below):
/api/class/count?filter[where][companyrowid]=1
Class Table Data (MySQL)
rowid description companyrowid
3367 test1 0
3366 test2 0
3364 Asia Division 1
3365 Australia Division 1
3362 Canada Division 1
3363 Europe Division 1
3359 US East Division 1
3361 US Midwest Division 1
3360 US West Division 1
Strangely, using 'where[companyid]=1' returns the correct value of 7. Close inspection of the http request object shows that filter[where][...] and where[..] filters are actually stored separately and yield different results:
/api/class/count?filter[where][companyrowid]=1
req.query.filter.where = {companyid: "1"}
req.query.where = undefined
Result: 9 (incorrect)
/api/class/count?where[companyrowid]=1
req.query.filter.where = undefined
req.query.where = {companyid: "1"}
Result: 7 (correct)
The biggest issue is the filter[where] ignoring the filter.
count() method only takes the where clause instead of the filter.
So your REST API should be like this
/api/class/count?[where][companyrowid]=1

how to handle type F in a table in Q/KDB

I have started to learn q/KDB since a while, therefore forgive me in advance for trivial question but I am facing the following problem I don't know how to solve.
I have a table named "res" showing, side, summation of orders and average_price of some simbols
sym side | sum_order avg_price
----------| -------------------
ALPHA B | 95109 9849.73
ALPHA S | 91662 9849.964
BETA B | 47 9851.638
BETA S | 60 9853.383
with these types
c | t f a
---------| -----
sym | s p
side | s
sum_order| f
avg_price| f
I would like to calculate close and open positions, average point, made by close position, and average price of the open position.
I have used this query which I believe it is pretty bizarre (I am sure there will be a more professional way to do it) but it works as expected
position_summary:select
close_position:?[prev[sum_order]>sum_order;sum_order;prev[sum_order]],
average_price:avg_price-prev[avg_price],
open_pos:prev[sum_order]-sum_order,
open_wavgprice:?[sum_order>next[sum_order];avg_price;next[avg_price]][0]
by sym from res
giving me the following table
sym | close_position average_price open_pos open_wavgprice
----------| ----------------------------------------------------
ALPHA | 91662 0.2342456 3447 9849.73
BETA | 47 1.745035 -13 9853.38
and types are
c | t f a
--------------| -----
sym | s s
close_position| F
average_price | F
open_pos | F
open_wavgprice| f
Now my problem starts here, imagine I join position_summary table with another table appending another column "current_price" of type f
What I want to do is to determinate the points of the open positions.
I have tried this way:
select
?[open_pos>0;open_price-open_wavgprice;open_wavgprice-open]
from position_summary
but I got 'type error,
surely because sum_order is type F and open_wavgprice and current_price are f. I have search on internet by I did not find much about F type.
First: how can I handle this ? I have tried "cast" or use "raze" but no effects and moreover I am not sure if they are right on this particular occasion.
Second: is there a better way to use "if-then" during query tables (for example, in plain English :if this row of this column then take the previous / next of another column or the second or third of previous /next column)
Thank you for you help
Let me rephrase your question using a slightly simpler table:
q)show res:([sym:`A`A`B`B;side:`B`S`B`S]size:95 91 47 60;price:49.7 49.9 51.6 53.3)
sym side| size price
--------| ----------
A B | 95 49.7
A S | 91 49.9
B B | 47 51.6
B S | 60 53.3
You are trying to find the closing position for each symbol using a query like this:
q)show summary:select close:?[prev[size]>size;size;prev[size]] by sym from res
sym| close
---| -----
A | 91
B | 47
The result seems to have one number in each row of the "close" column, but in fact it has two. You may notice an extra space before each number in the display above or you can display the first row
q)first 0!summary
sym | `A
close| 0N 91
and see that the first row in the "close" column is 0N 91. Since the missing values such as 0N are displayed as a space, it was hard to see them in the earlier display.
It is not hard to understand how you've got these two values. Since you select by sym, each column gets grouped by symbol and for the symbol A, you have
q)show size:95 91
95 91
and
q)prev size
0N 95
that leads to
q)?[prev[size]>size;size;prev[size]]
0N 91
(Recall that 0N is smaller than any other integer.)
As a side note, ?[a>b;b;a] is element-wise minimum and can be written as a & b in q, so your conditional expression could be written as
q)size & prev size
0N 91
Now we can see why ? gave you the type error
q)close:exec close from summary
q)close
91
47
While the display is deceiving, "close" above is a list of two vectors:
q)first close
0N 91
and
q)last close
0N 47
The vector conditional does not support that:
q)?[close>0;10;20]
'type
[0] ?[close>0;10;20]
^
One can probably cure that by using each:
q)?[;10;20]each close>0
20 10
20 10
But I don't think this is what you want. Your problem started when you computed the summary table. I would expect the closing position to be the sum of "B" orders minus the sum of "S" orders that can be computed as
q)select close:sum ?[side=`B;size;neg size] by sym from res
sym| close
---| -----
A | 4
B | -13
Now you should be able to fix the rest of the columns in your summary query. Just make sure that you use an aggregation function such as sum in the expression for every column.
Type F means the "cell" in the column contains a vector of floats rather than an atom. So your column is actually a vector of vectors rather than a flat vector.
In your case you have a vector of size 1 in each cell, so in your case you could just do:
select first each close_position, first each average_price.....
which will give you a type f.
I'm not 100% on what you were trying to do in the first query, and I don't have a q terminal to hand to check but you could put this into your query:
select close_position:?[prev[sum_order]>sum_order;last sum_order; last prev[sum_order].....
i.e. get the last sum_order in the list.

kdb+/q: Apply iterative procedure with updated variable to a column

Consider the following procedure f:{[x] ..} with starting value a:0:
Do something with x and a. The output is saved as the new version of a, and the output is returned by the function
For the next input x, redo the procedure but now with the new a.
For a single value x, this procedure is easily constructed. For example:
a:0;
f:{[x] a::a+x; :a} / A simple example (actual function more complicated)
However, how do I make such a function such that it also works when applied on a table column?
I am clueless how to incorporate this step for 'intermediate saving of a variable' in a function that can be applied on a column at once. Is there a special technique for this? E.g. when I use a table column in the example above, it will simply calculate a+x with a:0 for all rows, opposed to also updating a at each iteration.
No need to use global vars for this - can use scan instead - see here.
Example --
Generate a table -
q)t:0N!([] time:5?.z.p; sym:5?`3; price:5?100f; size:5?10000)
time sym price size
-----------------------------------------------
2002.04.04D18:06:07.889113280 cmj 29.07093 3994
2007.05.21D04:26:13.021438816 llm 7.347808 496
2010.10.30D10:15:14.157553088 obp 31.59526 1728
2005.11.01D21:15:54.022395584 dhc 34.10485 5486
2005.03.06D21:05:07.403334368 mho 86.17972 2318
Example with a simple accumilator - note, the function has access to the other args if needed (see next example):
q)update someCol:{[a;x;y;z] (a+1)}\[0;time;price;size] from t
time sym price size someCol
-------------------------------------------------------
2002.04.04D18:06:07.889113280 cmj 29.07093 3994 1
2007.05.21D04:26:13.021438816 llm 7.347808 496 2
2010.10.30D10:15:14.157553088 obp 31.59526 1728 3
2005.11.01D21:15:54.022395584 dhc 34.10485 5486 4
2005.03.06D21:05:07.403334368 mho 86.17972 2318 5
Say you wanted to get cumilative size:
q)update cuSize:{[a;x;y;z] (a+z)}\[0;time;price;size] from t
time sym price size cuSize
------------------------------------------------------
2002.04.04D18:06:07.889113280 cmj 29.07093 3994 3994
2007.05.21D04:26:13.021438816 llm 7.347808 496 4490
2010.10.30D10:15:14.157553088 obp 31.59526 1728 6218
2005.11.01D21:15:54.022395584 dhc 34.10485 5486 11704
2005.03.06D21:05:07.403334368 mho 86.17972 2318 14022
If you wanted more than one var passed through the scan, can pack more values into the first var, by giving it a more complex structure:
q)update cuPriceAndSize:{[a;x;y;z] (a[0]+y;a[1]+z)}\[0 0;time;price;size] from t
time sym price size cuPriceAndSize
--------------------------------------------------------------
2002.04.04D18:06:07.889113280 cmj 29.07093 3994 29.07093 3994
2007.05.21D04:26:13.021438816 llm 7.347808 496 36.41874 4490
2010.10.30D10:15:14.157553088 obp 31.59526 1728 68.014 6218
2005.11.01D21:15:54.022395584 dhc 34.10485 5486 102.1188 11704
2005.03.06D21:05:07.403334368 mho 86.17972 2318 188.2986 14022
#MdSalih solution is correct, I am just explaining here what could be the possible reason with global variable in your case and solution for that.
q) t:([]id: 1 2)
q)a:1
I think you might have been using it like this:
q) select k:{x:x+a;a::a+1;:x} id from t
output:
k
--
1
2
And a value is 2 which means function executed only once. Reason is we passed full id column list to function and (+) is atomic which means it operates on full list at once. In following ex. 2 will get added to all items in list.
q) 2 + (1;3;5)
Correct way to use it is 'each':
q)select k:{x:x+a;a::a+1;:x} each id from t
output:
k
--
2
3

Rows without repetitions - MATLAB

I have a matrix (4096x4) containing all possible combinations of four values taken from a pool of 8 numbers.
...
3 63 39 3
3 63 39 19
3 63 39 23
3 63 39 39
...
I am only interested in the rows of the matrix that contain four unique values. In the above section, for example, the first and last row should be removed, giving us -
...
3 63 39 19
3 63 39 23
...
My current solution feels inelegant-- basically, I iterate across every row and add it to a result matrix if it contains four unique values:
result = [];
for row = 1:size(matrix,1)
if length(unique(matrix(row,:)))==4
result = cat(1,result,matrix(row,:));
end
end
Is there a better way ?
Approach #1
diff and sort based approach that must be pretty efficient -
sortedmatrix = sort(matrix,2)
result = matrix(all(diff(sortedmatrix,[],2)~=0,2),:)
Breaking it down to few steps for explanation
Sort along the columns, so that the duplicate values in each row end up next to each other. We used sort for this task.
Find the difference between consecutive elements, which will catch those duplicate after sorting. diff was the tool for this purpose.
For any row with at least one zero indicates rows with duplicate rows. To put it other way, any row with no zero would indicate rows with no duplicate rows, which we are looking to have in the output. all got us the job done here to get a logical array of such matches.
Finally, we have used matrix indexing to select those rows from matrix to get the expected output.
Approach #2
This could be an experimental bsxfun based approach as it won't be memory-efficient -
matches = bsxfun(#eq,matrix,permute(matrix,[1 3 2]))
result = matrix(all(all(sum(matches,2)==1,2),3),:)
Breaking it down to few steps for explanation
Find a logical array of matches for every element against all others in the same row with bsxfun.
Look for "non-duplicity" by summing those matches along dim-2 of matches and then finding all ones elements along dim-2 and dim-3 getting us the same indexing array as had with our previous diff + sort based approach.
Use the binary indexing array to select the appropriate rows from matrix for the final output.
Approach #3
Taking help from MATLAB File-exchange's post combinator
and assuming you have the pool of 8 values in an array named pool8, you can directly get result like so -
result = pool8(combinator(8,4,'p'))
combinator(8,4,'p') basically gets us the indices for 8 elements taken 4 at once and without repetitions. We use these indices to index into the pool and get the expected output.
For a pool of a finite number this will work. Create is unique array, go through each number in pool, count the number of times it comes up in the row, and only keep IsUnique to 1 if there are either one or zero numbers found. Next, find positions where the IsUnique is still 1, extract those rows and we finish.
matrix = [3,63,39,3;3,63,39,19;3,63,39,23;3,63,39,39;3,63,39,39;3,63,39,39];
IsUnique = ones(size(matrix,1),1);
pool = [3,63,39,19,23,6,7,8];
for NumberInPool = 1:8
Temp = sum((matrix == pool(NumberInPool))')';
IsUnique = IsUnique .* (Temp<2);
end
UniquePositions = find(IsUnique==1);
result = matrix(UniquePositions,:)