kdb - how to create sum a list of dynamic columns using functional select - kdb

I want to be able to construct (+; (+; `a; `b); `c) given a list of `a`b`c
Similarly if I have a list of `a`b`c`d, I want to be able to construct another nest and so on and so fourth.
I've been trying to use scan but I cant get it right

q)fsum:(+;;)/
enlist[+;;]/
q)fsum `a`b`c`d
+
(+;(+;`a;`b);`c)
`d

If you only want the raw parse tree output, one way is to form the equivalent string and use parse. This isn't recommended for more complex examples, but in this case it is clear.
{parse "+" sv string x}[`a`b`c`d]
+
`d
(+;`c;(+;`b;`a))
If you are looking to use this in a functional select, we can use +/ instead of adding each column individually, like how you specified in your example
q)parse"+/[(a;b;c;d)]"
(/;+)
(enlist;`a;`b;`c;`d)
q)f:{[t;c] ?[t;();0b;enlist[`res]!enlist (+/;(enlist,c))]};
q)t:([]a:1 2 3;b:4 5 6;c:7 8 9;d:10 11 12)
q)f[t;`a`b`c]
res
---
12
15
18
q)f[t;`a`b]
res
---
5
7
9
q)f[t;`a`b`c]~?[t;();0b;enlist[`res]!enlist (+;(+;`a;`b);`c)]
1b
You can also get the sum by indexing directly to return a list of each column values and sum over these. We use (), to turn any input into a list, otherwise it will sum the values in that single column and return only a single value
q)f:{[t;c] sum t (),c}
q)f[t;`a`b`c]
12 15 18

Related

Extracting kdb list values based on some condition

Say we have a kdb list
L1:(1 2 3 4 5)
Apply condition
L1 < 3
And how can I retrieve result in another list (1 2)
You can use the where keyword for this:
q)l1 where l1<3
1 2
Applying l1<3 will return a list of booleans 11000b. Using where on this list will return the index of every 1b
q)where 11000b
0 1
Then indexing back into the original list will return the result in another list.

KDB: select and round off each row

I created my own function of round off:
.q.rnd:{$[x < 0; -1; 1] * floor abs[x] + 0.5}
I have a table Test with a string column of COL
select "F"$(COL) from Test
24549.18741328
48939.50717263
-274853.33568872
-24549.18741328
298753.62574861
84822.70074144
-7468840.64371524
117944.21228603
-117944.21228603
7468840.64371524
-7468840.64371524
I want to derive a table that would round-off the records in Test
One would think that the statement below would work. But it does not.
select .q.rnd "F"$(COL) from Test
I get the error "type". So how do I round off the records?
The result if the if-else conditional must be an atomic boolean. When you run .q.rnd on a column, you are operating on a list and x<0 is going to return a list of booleans, not an atom. The vector conditional is ?
Nonetheless, it looks like you want a resulting integer/long anyway, so just use parse here
q)t:([]string (10?-1 1)*10?10000f)
q)select "F"$x from t
x
-------------------
4123.1701336801052
-9877.8444156050682
-3867.3530425876379
7267.8099689073861
4046.5459413826466
-8355.0649625249207
6427.3701561614871
-5830.2619284950197
1424.9352994374931
-9149.8820902779698
q)select "j"$"F"$x from t
x
-----
4123
-9878
-3867
7268
4047
-8355
6427
-5830
1425
-9150
To add to what Sean's said, if you wanted to use your function as well you could use each which will apply .q.rnd to each item in the list.
q)select .q.rnd each "F"$x from t
x
-----
-3928
5171
5160
-4067
-1781
3018
-7850
5347
-7112
-4116
but using select "F"$x from t is better as it is vectorised.
q)\t:1000 select "j"$"F"$x from t
22
q)\t:1000 select .q.rnd each "F"$x from t
33
Also it should be noted that the .q namespace isn't necessary and is "reserved for kx use". A lot of the default q functions are in the .q namespace and there's always a chance future kdb updates could add a .q.rnd that has different behaviour and will break any code where you have used your function in.

Select a number of random rows based on one columan condition in matlab

I have a table 'X' like this:
name value score
joy 3 60
rony 8 50
macheis 20 20
joung 2 80
joy 8 3
joy 90 0
joung 4 78
machies 3 23
joy 7 99
I want to select 2 random rows(with name, value, score) where the name is 'joy'.
I applied something like this:
mnew = datasample(find(X.name=='joy'),2); but it does not work! and gives me the error: Undefined operator '==' for input arguments of type 'cell'.
The rows should be selected randomly (with all columns values) where the name is joy.
Does anyone any other solution of this problem? how can i do it in MATLAB?
You have the right idea, but in order to check for the presence of a string within a cell array of strings, you need to use strcmp, ismember, or another method for comparing a string to a cell array.
You probably also want to specify that you don't want to use replacement when calling datasample so you don't get the same row twice.
subx = X(datasample(find(strcmp(X.name, 'joy')), 2, 'Replace', false),:);

Selecting all columns in a cell array that contain a certain value in the first row?

I currently have a 4x3500 cell array. First row is a single number, 2 row is a single string, 3rd and 4th rows are also single numbers.
Ex:
1 1 2 3 3 4 5 5 5 6
hi no ya he ........ % you get the idea
28 34 18 0 3 ......
55 2 4 42 24 .....
I would like to be able to select all columns that have a certain value in the first row. ie if I wanted '1' as the first row value, it would return
1 1
hi no
28 34
55 2
Then I would like to sort based on the 2nd row's string. ie if I wanted to have'hi', it would return:
1
hi
28
55
I have attempted to do:
variable = cellArray{:,find(cellArray{1,:} == 1)}
However I keep getting:
Error using find
Too many input arguments.
or
Error using ==
Too many input arguments.
Any help would be much appreciated! :)
{} indexing will return a comma separated list which will provide multiple outputs. When you pass this to find, it's the same as passing each element of your cell array as a separate input. This is what leads to the error about to many input arguments.
You will want to surround the comma-separated list with [] to create an array or numbers. Also, you don't need find because you can just use logical indexing to grab the columns you want. Additionally, you will want to index using () to grab the relevant rows, again to avoid the comma-separated list.
variable = cellArray(:, [cellArray{1,:}] == 1)

Rows without repetitions - MATLAB

I have a matrix (4096x4) containing all possible combinations of four values taken from a pool of 8 numbers.
...
3 63 39 3
3 63 39 19
3 63 39 23
3 63 39 39
...
I am only interested in the rows of the matrix that contain four unique values. In the above section, for example, the first and last row should be removed, giving us -
...
3 63 39 19
3 63 39 23
...
My current solution feels inelegant-- basically, I iterate across every row and add it to a result matrix if it contains four unique values:
result = [];
for row = 1:size(matrix,1)
if length(unique(matrix(row,:)))==4
result = cat(1,result,matrix(row,:));
end
end
Is there a better way ?
Approach #1
diff and sort based approach that must be pretty efficient -
sortedmatrix = sort(matrix,2)
result = matrix(all(diff(sortedmatrix,[],2)~=0,2),:)
Breaking it down to few steps for explanation
Sort along the columns, so that the duplicate values in each row end up next to each other. We used sort for this task.
Find the difference between consecutive elements, which will catch those duplicate after sorting. diff was the tool for this purpose.
For any row with at least one zero indicates rows with duplicate rows. To put it other way, any row with no zero would indicate rows with no duplicate rows, which we are looking to have in the output. all got us the job done here to get a logical array of such matches.
Finally, we have used matrix indexing to select those rows from matrix to get the expected output.
Approach #2
This could be an experimental bsxfun based approach as it won't be memory-efficient -
matches = bsxfun(#eq,matrix,permute(matrix,[1 3 2]))
result = matrix(all(all(sum(matches,2)==1,2),3),:)
Breaking it down to few steps for explanation
Find a logical array of matches for every element against all others in the same row with bsxfun.
Look for "non-duplicity" by summing those matches along dim-2 of matches and then finding all ones elements along dim-2 and dim-3 getting us the same indexing array as had with our previous diff + sort based approach.
Use the binary indexing array to select the appropriate rows from matrix for the final output.
Approach #3
Taking help from MATLAB File-exchange's post combinator
and assuming you have the pool of 8 values in an array named pool8, you can directly get result like so -
result = pool8(combinator(8,4,'p'))
combinator(8,4,'p') basically gets us the indices for 8 elements taken 4 at once and without repetitions. We use these indices to index into the pool and get the expected output.
For a pool of a finite number this will work. Create is unique array, go through each number in pool, count the number of times it comes up in the row, and only keep IsUnique to 1 if there are either one or zero numbers found. Next, find positions where the IsUnique is still 1, extract those rows and we finish.
matrix = [3,63,39,3;3,63,39,19;3,63,39,23;3,63,39,39;3,63,39,39;3,63,39,39];
IsUnique = ones(size(matrix,1),1);
pool = [3,63,39,19,23,6,7,8];
for NumberInPool = 1:8
Temp = sum((matrix == pool(NumberInPool))')';
IsUnique = IsUnique .* (Temp<2);
end
UniquePositions = find(IsUnique==1);
result = matrix(UniquePositions,:)