kdb/q - create 2D buckets for positive integers

I am trying to create 2D buckets in q.
Given a 2D grid:
5 o---o---o
  |   |   |
3 o---o---o
  |   |   |
0 o---o---o
  0   3   5
Each node on the grid defines the boundary of a 2D bucket for positive integers. For example, the center node would contain the tuples (x;y) where 3<=x<5 and 3<=y<5. The nine buckets are indexed from 0 to 8.
The way I tried to implement this in q is:
bucketidx:{((0 3 5i) cross (0 3 5i)) bin "i"$(first x;last x)}
To traverse through the buckets:
bucketidx each ((0 3 5i) cross (0 3 5i))
/0j, 1j, 2j, 3j, 4j, 5j, 6j, 7j, 8j
However, I get strange behavior for bucketidx 6 0. I expect this to be in the upper-left node
(5<=y) and (x=0)
but it returns index 8, which would be the upper-right node. I hope it is clear what I am trying to do.
Thanks for the help

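That's because of how bin behaves.
bin does a binary search and returns the index of the last item in x that is <= y. The items of your list are pairs, so each pair is compared element by element, starting with the first: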
https://code.kx.com/q/ref/search/#bin-binr
Your list is:
q) a:(0 3 5i) cross (0 3 5i)
q) a / (0 0;0 3;0 5;3 0;3 3;3 5;5 0;5 3;5 5)
You are searching for (6;0) in this list with bin. The last item in the list that is <= (6;0) is (5;5), and its index is 8.
q) a bin 6 0 / 8
That's why you get 8.
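To see why bin lands on 8, here is a small Python sketch that mimics it. The helper q_bin is hypothetical, not part of q; it assumes q orders the pairs lexicographically (first element, then second), which is what the observed result of 8 suggests, and uses bisect_right to find the last item <= y.

```python
from bisect import bisect_right
from itertools import product


def q_bin(xs, y):
    """Hypothetical stand-in for q's bin: index of the last item in the
    sorted list xs that is <= y, or -1 if there is none."""
    return bisect_right(xs, y) - 1


# Same order as (0 3 5i) cross (0 3 5i) in q
a = list(product([0, 3, 5], repeat=2))

print(a[8])              # (5, 5)
print(q_bin(a, (6, 0)))  # 8: every pair starts with <= 5, so all of them are <= (6, 0)
print(q_bin(a, (5, 0)))  # 6
```

Only the first element matters once it differs, which is why the second coordinate of (6;0) never gets a say.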
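I don't think the 'tuple with bin' approach is the right way to go for this problem.
You could use something like the idea below instead, which bins each coordinate separately and then combines the two bucket numbers. The first argument to the function is the X coordinate and the second is the Y coordinate.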
q) node:{b:0 3 5;(b bin x)+3*b bin y}
q) node[0;6] / 6
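The same per-axis idea, sketched in Python for comparison. BOUNDS and axis_bucket are illustrative names, not from the answer; bisect_right(...) - 1 plays the role of q's `b bin v` for values >= 0.

```python
from bisect import bisect_right

BOUNDS = [0, 3, 5]  # grid boundaries from the question


def axis_bucket(v):
    """Bucket number along one axis, like q's `b bin v`."""
    return bisect_right(BOUNDS, v) - 1


def node(x, y):
    """Bucket index 0..8: the x bucket picks the column, the y bucket the row."""
    return axis_bucket(x) + 3 * axis_bucket(y)


print(node(0, 6))  # 6: upper-left bucket
print(node(6, 0))  # 2: lower-right bucket
```

Because each axis is searched on its own, a large x can no longer push the result into a different row.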

Another approach is to use a dictionary with the sorted attribute, which makes it a step function: a lookup returns the value of the largest key that is <= the lookup value.
q)d:`s#0 3 5!0 1 2
q)3 sv' d#(0 3 5i) cross (0 3 5i)
0 1 2 3 4 5 6 7 8
q)3 sv' d#enlist 6 0
,6
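A rough Python model of the two pieces used here: the step-dictionary lookup and `3 sv`, which reads a (row;column) pair as a base-3 number. StepDict and sv are hypothetical helpers written for this sketch; unlike q, which would give a null for a key below the first one, this sketch raises KeyError.

```python
from bisect import bisect_right


class StepDict:
    """Hypothetical stand-in for a q step dictionary (`s#keys!values`):
    looking up k returns the value of the largest key <= k."""

    def __init__(self, keys, values):
        self.keys = list(keys)  # must be sorted ascending
        self.values = list(values)

    def __getitem__(self, k):
        i = bisect_right(self.keys, k) - 1
        if i < 0:
            raise KeyError(k)  # below the first key: no bucket
        return self.values[i]


def sv(base, digits):
    """Like q's `base sv digits`: read the digits as a number in that base."""
    n = 0
    for digit in digits:
        n = n * base + digit
    return n


d = StepDict([0, 3, 5], [0, 1, 2])
grid = [(a, b) for a in (0, 3, 5) for b in (0, 3, 5)]  # same order as cross

print([sv(3, [d[v] for v in pair]) for pair in grid])  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(sv(3, [d[v] for v in (6, 0)]))                   # 6
```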


matrix cells of the same row and column

Is there a more efficient/idiomatic way to retrieve cells of a matrix that are on the same row and column as the given cell?
q) f:{except[;y] x[y div n;],x[;y mod n:count first x]}
q) show A:s#til prd s:2 3
0 1 2
3 4 5
q) f[A;4]
3 5 1
q) f[A;2]
0 1 5
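Here is a version that returns the orthogonal neighbours of y (the adjacent cells above, below, left and right) rather than every cell in its row and column:
g:{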
s:count each 1 first\x; // shape
rc:s vs y; // y as row-column
on:rc+/:{x,reverse each x} -1 1,'0; // orthogonal neighbours of rc
nn:on where all flip[on]within'0,'s-1; // near neighbours: eliminate out of range
x ./:nn }
q)A:2 3#til prd 2 3
q)g[A;4]
1 3 5
q)g[A;2]
5 1
If A contains only the indices of its raze (that is, raze A is til count raze A), then we need only its shape, and g can be rewritten to return the indices of the orthogonal neighbours of y:
h:{[s;y]
rc:s vs y; // y as row-column
on:rc+/:{x,reverse each x} -1 1,'0; // orthogonal neighbours of rc
nn:on where all flip[on]within'0,'s-1; // near neighbours: eliminate out of range
s sv/:nn }
q)h[2 3;4]
1 3 5
q)h[2 3;2]
5 1
Note that this is easily adapted to diagonal neighbours (instead of, or as well as, orthogonal ones), and to a vector y.
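For readers less familiar with vs and sv, here is a Python sketch of what h computes, assuming a 2-D shape. orth_neighbours is an illustrative name; divmod stands in for `s vs y` and `nr * cols + nc` for `s sv`.

```python
def orth_neighbours(shape, y):
    """Flat indices of the orthogonal neighbours of flat index y in a
    rows x cols grid, in the same order as h: up, down, left, right."""
    rows, cols = shape
    r, c = divmod(y, cols)  # like `s vs y` for a 2-D shape
    result = []
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:  # like the `within` filter
            result.append(nr * cols + nc)     # like `s sv/:`
    return result


print(orth_neighbours((2, 3), 4))  # [1, 3, 5]
print(orth_neighbours((2, 3), 2))  # [5, 1]
```

Adding diagonal neighbours only means extending the offset list with the four (±1, ±1) pairs.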
Key concepts
Use sv and vs to encode/decode numbers in any arithmetic base
Use the map iterators Each and Each Right to control iteration
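I'm not sure your approach works in the general case. Because f uses y both as a flat index (y div n, y mod n) and as the value to remove, it only works when every cell holds its own index, as in your A. For example: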
q)A:3 cut neg[6]?20
q)A
12 13 4
7 9 17
q)f[A;9]
12 7
One alternative is to use in to find the rows and columns that contain y:
f2:{except[raze(x where y in'x),f where y in'f:flip x;y]}
q)f2[A;9]
7 17 13
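The membership-based idea behind f2, sketched in Python; same_row_col is an illustrative name. It looks for the value itself instead of treating it as an index, so it works for arbitrary matrices.

```python
def same_row_col(matrix, value):
    """Values sharing a row or column with `value`, found by membership
    (like f2) rather than by treating `value` as an index."""
    rows = [list(row) for row in matrix if value in row]
    cols = [list(col) for col in zip(*matrix) if value in col]
    cells = [v for line in rows + cols for v in line]
    return [v for v in cells if v != value]  # like `except`: drop every occurrence


A = [[12, 13, 4], [7, 9, 17]]
print(same_row_col(A, 9))  # [7, 17, 13]
```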

How to efficiently get the counts of values (from a list) less than some index number?

It is hard for me to describe in words what this function does, but I have some working code.
f:{[n;k] sum flip k </: til n}
i:3 4 6 7 13;
f[30;i]
0 0 0 0 1 2 2 3 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5i
I am concerned that the flip operation may be expensive for large inputs. Is there a more efficient way to do this without the flip?
Being just as concise would be a nice-to-have.
For your inputs you can achieve the same result with
{[n;k] k binr til n}
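This works as long as k is in ascending order: for each j in til n, binr returns the index of the first item of k that is >= j, which for sorted k is exactly the number of items less than j.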
Docs for binr are here: https://code.kx.com/q/ref/bin/
Sean's answer is probably more efficient (test it), but why not change your Each Right to Each Left to avoid the flip?
q)g:{[n;k] sum k<\:til n}
q)f[30;i]~g[30;i]
1b
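A Python sketch contrasting the two strategies; the function names are illustrative. The first does what f and g do, comparing every item of k against every j; the second does what binr does, using bisect_left on the sorted k, so it costs O(n log count k) instead of O(n × count k).

```python
from bisect import bisect_left


def counts_below(n, k):
    """Like f and g: for each j in range(n), how many items of k are < j."""
    return [sum(1 for v in k if v < j) for j in range(n)]


def counts_below_sorted(n, k):
    """Like `k binr til n`; k must be ascending. bisect_left gives the index of
    the first item >= j, which equals the count of items < j."""
    return [bisect_left(k, j) for j in range(n)]


i = [3, 4, 6, 7, 13]
print(counts_below(30, i) == counts_below_sorted(30, i))  # True
```

If k is not sorted, the second version silently gives wrong counts, which is the same caveat as for binr.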

How to filter out bad values in a data set regarding a matrix in matlab?

I wanted to ask how to "filter out" bad values in a very large data matrix in MATLAB.
For example, I have a MATLAB data file containing a 2x5000 double matrix whose rows represent x and y coordinates. How can I delete all values above or below a certain limit?
Or, more simply:
(matrix from data file)
1 2 4 134 2
3 5 5 4 2
or
1 2 4 9 2
3 5 5 234 2
Setting a limit and deleting the offending column should give:
1 2 4 2
3 5 5 2
Find the "bad" elements, e.g. A < 0 | A > 20
Find the "good" columns, e.g. ~max(A < 0 | A > 20)
Keep the "good" columns / Remove the "bad" columns, e.g. A(:, ~max(A < 0 | A > 20))

KDB/Q how do we calculate the moving median

kdb/q already has a moving average, mavg:
https://code.kx.com/q/ref/avg/#mavg
But how do I compute moving median?
Here is a naive approach. It starts with an empty list and a null median, then iterates over the input, feeding in one new value at a time.
sublist is used to fix the window size, and the window is passed along with the median as the state into the next iteration.
At the end, scan (\) outputs the state at every iteration, and we take the median (the first element) from each one.
mmed:{{(med l;l:neg[x] sublist last[y],z)}[x]\[(0n;());y][;0]}
q)mmed[5;til 10]
0 0.5 1 1.5 2 3 4 5 6 7
q)i:4 9 2 7 0 1 9 2 1 8
q)mmed[3;i]
4 6.5 4 7 2 1 1 2 2 2
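The same windowing logic, sketched in Python. moving_median is an illustrative name; a bounded deque takes the place of `neg[x] sublist`, and statistics.median takes the place of med.

```python
from collections import deque
from statistics import median


def moving_median(window, values):
    """Median of the last `window` values at each step; early steps use the
    shorter prefix, as mmed does."""
    buf = deque(maxlen=window)  # like neg[x] sublist: keep the newest `window` items
    out = []
    for v in values:
        buf.append(v)
        out.append(median(buf))
    return out


print(moving_median(3, [4, 9, 2, 7, 0, 1, 9, 2, 1, 8]))
# [4, 6.5, 4, 7, 2, 1, 1, 2, 2, 2]
```

Re-sorting each window costs O(w log w) per step, which is fine for small windows; very large windows would call for a heap-based or sorted-container approach instead.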
There's also a generic "sliding window" function here which you can pass your desired aggregator into: https://code.kx.com/q/kb/programming-idioms/#how-do-i-apply-a-function-to-a-sequence-sliding-window
q)swin:{[f;w;s] f each { 1_x,y }\[w#0;s]}
q)swin[avg; 3; til 10]
0 0.33333333 1 2 3 4 5 6 7 8
q)update newcol:swin[med;10;mycol] from tab
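A Python sketch of the swin idiom, to make the zero padding visible. swin mirrors the q function's arguments (aggregator, width, sequence); mean is a small helper defined here for the example.

```python
from collections import deque


def swin(f, w, s):
    """Apply f to a sliding window of width w over s. Like the q idiom, the
    window starts as w zeros, so early results include that padding."""
    win = deque([0] * w, maxlen=w)  # like w#0
    out = []
    for x in s:
        win.append(x)               # like 1_x,y: drop the oldest, add the newest
        out.append(f(list(win)))
    return out


def mean(xs):
    return sum(xs) / len(xs)


print(swin(mean, 3, range(10)))
```

The output starts 0.0, 0.333..., 1.0, 2.0, matching the q result; with med and a width of 10, the first nine medians are likewise pulled toward 0 by the padding.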

Vector-defined cross product application matrix and vectorization in Matlab

I ran into an operation I cannot seem to achieve via vectorization.
Let's say I want to find the matrix of the linear map defined by
h: X -> cross(V,X)
where V is a predetermined vector (both X and V are 3-by-1 vectors).
In Matlab, I would do something like
M= cross(repmat(V,1,3),eye(3,3))
to get this matrix. For instance, V=[1;2;3] yields
M =
0 -3 2
3 0 -1
-2 1 0
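For reference, this is the standard skew-symmetric cross-product matrix: its i'th column is V × e_i, where e_i is the i'th column of eye(3). Substituting v = (1, 2, 3) reproduces M above.

```latex
[\mathbf{v}]_\times =
\begin{pmatrix}
 0    & -v_3 &  v_2 \\
 v_3  &  0   & -v_1 \\
-v_2  &  v_1 &  0
\end{pmatrix},
\qquad
[\mathbf{v}]_\times \, \mathbf{x} = \mathbf{v} \times \mathbf{x}
```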
Let's now suppose that I have a 3-by-N matrix
V=[V_1,V_2,...,V_N]
with each column defining its own cross-product operation. For N=2, here's a naive attempt to find the two cross-product matrices that V's columns define:
V=[1,2,3;4,5,6]'
M=cross(repmat(V,1,3),repmat(eye(3,3),1,2))
results in
V =
1 4
2 5
3 6
M =
0 -6 2 0 -3 5
3 0 -1 6 0 -4
-2 4 0 -5 1 0
while I was expecting
M =
0 -3 2 0 -6 5
3 0 -1 6 0 -4
-2 1 0 -5 4 0
The second and fifth columns are swapped.
Is there a way to achieve this without for loops?
Thanks!
First, make sure you read the documentation of cross very carefully when dealing with matrices:
It says:
C = cross(A,B,DIM), where A and B are N-D arrays, returns the cross
product of vectors in the dimension DIM of A and B. A and B must
have the same size, and both SIZE(A,DIM) and SIZE(B,DIM) must be 3.
Bear in mind that if you don't specify DIM, cross operates along the first dimension whose size is 3, which for your 3-row matrices is DIM=1, so it works column by column. In your first case, both inputs A and B are 3 x 3 matrices, so the output is the cross product of each pair of columns independently: the i'th column of the output is the cross product of the i'th column of A and the i'th column of B. For this to work, A and B must both have 3 rows and the same number of columns.
In the first case you get what you expect because the first input A has [1;2;3] correctly repeated across all three columns. From your second piece of code, what you expect V, the first input (A), to look like is this:
V =
1 1 1 4 4 4
2 2 2 5 5 5
3 3 3 6 6 6
However, repmat tiles the whole matrix, so the columns alternate. You are actually getting this:
V =
1 4 1 4 1 4
2 5 2 5 2 5
3 6 3 6 3 6
repmat tiles matrices together, and you asked it to tile V horizontally three times, which is not what you want here. Columns 2, 4 and 6 of the tiled matrix hold [4;5;6], when those vectors should occupy the last three columns. Paired with repmat(eye(3),1,2), column 2 becomes the cross product of [4;5;6] with [0;1;0], and column 5 the cross product of [1;2;3] with [0;1;0], which is why those two output columns appear swapped. (Columns 1, 3, 4 and 6 happen to pair the right vectors anyway.)
You therefore need to reorder V so that the first three columns are [1;2;3], followed by three columns of [4;5;6]. Generate your original V matrix first, then index its columns so that column 1 is repeated three times, followed by column 2 repeated three times:
>> V = [1,2,3;4,5,6].';
>> V = V(:, [1 1 1 2 2 2])
V =
1 1 1 4 4 4
2 2 2 5 5 5
3 3 3 6 6 6
Now use this V with cross, keeping the same second input:
>> M = cross(V, repmat(eye(3), 1, 2))
M =
0 -3 2 0 -6 5
3 0 -1 6 0 -4
-2 1 0 -5 4 0
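As a cross-check of the column pairing (not a MATLAB solution), here is a plain-Python sketch that builds the same 3 x 3N result by crossing each vector with e1, e2 and e3 in turn. cross and cross_matrices are illustrative helpers written for this sketch.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def cross_matrices(vectors):
    """3 x 3N matrix (list of rows) of the cross-product matrices of each
    vector, side by side. Each vector is paired with e1, e2, e3 in turn, the
    same pairing V(:, [1 1 1 2 2 2]) with repmat(eye(3), 1, 2) produces."""
    eye = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    columns = [cross(v, e) for v in vectors for e in eye]
    return [list(row) for row in zip(*columns)]


for row in cross_matrices([[1, 2, 3], [4, 5, 6]]):
    print(row)
# [0, -3, 2, 0, -6, 5]
# [3, 0, -1, 6, 0, -4]
# [-2, 1, 0, -5, 4, 0]
```

On the MATLAB side, the column index [1 1 1 2 2 2] generalizes to N columns as kron(1:N, [1 1 1]), or repelem(1:N, 3) in releases that provide repelem, paired with repmat(eye(3), 1, N).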
Looks good to me!