How to sum across a row in KDB/Q

I have a table rCom which has various columns. I would like to sum across each row.
For example:
Date TypeA TypeB TypeC TypeD
date1 40.5 23.1 45.1 65.2
date2 23.3 32.2 56.1 30.1
How can I write a q query to add a new column, Total, that sums across each row?

Why not just the following?
update Total: TypeA+TypeB+TypeC+TypeD from rCom

Sum will work just fine:
q)flip`a`b`c!3 3#til 9
a b c
-----
0 3 6
1 4 7
2 5 8
q)update d:sum(a;b;c) from flip`a`b`c!3 3#til 9
a b c d
--------
0 3 6 9
1 4 7 12
2 5 8 15
sum also supports map-reduce (for example over partitioned tables), which may make it the better choice for a huge table.
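To make the row-wise idea concrete outside q, here is a minimal Python sketch, assuming a table is modelled as a dict of equal-length column lists (the names table and row_sum are hypothetical). Like sum(a;b;c), it adds the columns item by item, so each item of the result is one row's total.

```python
# Hypothetical Python stand-in for q's flip`a`b`c!3 3#til 9:
# a column-oriented table stored as a dict of equal-length lists.
table = {"a": [0, 1, 2], "b": [3, 4, 5], "c": [6, 7, 8]}


def row_sum(columns):
    """Add equal-length column vectors item by item, as q's sum(a;b;c) does."""
    return [sum(values) for values in zip(*columns)]


table["d"] = row_sum([table["a"], table["b"], table["c"]])
print(table["d"])  # [9, 12, 15]
```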

One quick point about summing across rows: be careful with nulls, since a null in any one column makes that row's sum null. Borrowing @WooiKent Lee's example,
we put a null into the first position of column a. Notice how the sum for that row now becomes null:
q)wn:.[flip`a`b`c!3 3#til 9;(0;`a);first 0#] //with null
q)update d:sum (a;b;c) from wn
a b c d
--------
  3 6
1 4 7 12
2 5 8 15
This is a direct consequence of how q treats nulls. If you sum a simple list, the nulls are ignored:
q)sum 1 2 3 0N
6
However, summing a general list does not behave this way: its items are added together with +, so a single null propagates:
q)sum (),/:1 2 3 0N
,0N
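The two null behaviours can be mimicked in Python as a rough sketch, assuming None stands in for q's 0N; the helpers add, sum_simple and sum_general are hypothetical, not part of q. A simple list skips nulls, while a general list is reduced with + so one null propagates.

```python
NULL = None  # stand-in for q's long null 0N


def add(x, y):
    """q-style + on two atoms: a null in either operand gives a null."""
    return NULL if x is NULL or y is NULL else x + y


def sum_simple(values):
    """Sum of a simple list of atoms: nulls are skipped."""
    return sum(v for v in values if v is not NULL)


def sum_general(items):
    """Sum of a general list of equal-length lists: items are combined with add()."""
    total = items[0]
    for item in items[1:]:
        total = [add(x, y) for x, y in zip(total, item)]
    return total


print(sum_simple([1, 2, 3, NULL]))           # 6
print(sum_general([[1], [2], [3], [NULL]]))  # [None]
```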
So, for your table, you might want to fill the nulls with zero beforehand:
q)update d:sum 0^(a;b;c) from wn
a b c d
--------
  3 6 9
1 4 7 12
2 5 8 15
Alternatively, arrange it so that you are actually summing simple lists rather than a general list:
q)update d:sum each flip (a;b;c) from wn
a b c d
--------
  3 6 9
1 4 7 12
2 5 8 15
For a more complete reference on how nulls are treated, please see the kdb+ reference website.
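Both fixes can be checked with a small Python sketch, again assuming None stands in for a q null and that wn is held as a dict of column lists (all names here are hypothetical). Zero-filling before adding and summing each row as a simple list give the same totals.

```python
NULL = None  # stand-in for q's 0N

# Column-oriented copy of wn: the first item of column a is null.
wn = {"a": [NULL, 1, 2], "b": [3, 4, 5], "c": [6, 7, 8]}
cols = [wn["a"], wn["b"], wn["c"]]


def fill(default, values):
    """Like q's default^values: replace each null with the default."""
    return [default if v is NULL else v for v in values]


# Fix 1, like sum 0^(a;b;c): zero-fill every column, then add columns item by item.
d_fill = [sum(row) for row in zip(*[fill(0, col) for col in cols])]

# Fix 2, like sum each flip(a;b;c): turn columns into rows, sum each row skipping nulls.
d_each = [sum(v for v in row if v is not NULL) for row in zip(*cols)]

print(d_fill, d_each)  # [9, 12, 15] [9, 12, 15]
```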

This is what worked:
select Answer:{[x;y;z;a] x+y+z+a }'[TypeA;TypeB;TypeC;TypeD] from ([] dt:2014.01.01 2014.01.02 2014.01.03; TypeA:4 5 6; TypeB:1 2 3; TypeC:8 9 10; TypeD:3 4 5)

Related

SPSS Modeler group by and select top n rows

I would like to know the proper way in SPSS to group data by a specific column and then find the top n max values.
For example, I have the columns below:
x<-c(3,2,1,8,7,11,10,9,7,5,4)
y<-c("a","a","a", "b","b","c","c","c","c","c","c")
z<-c(2,2,2,1,1,3,3,3,3,3,3)
I want to select the top n max values from column x for each group defined by column y:
    x y z
1   3 a 2
2   2 a 2
3   1 a 2
4   8 b 1
5   7 b 1
6  11 c 3
7  10 c 3
8   9 c 3
9   7 c 3
10  5 c 3
11  4 c 3
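No answer is shown here, but the intended result can be sketched in Python (this is not SPSS Modeler syntax, and top_n_per_group is a hypothetical helper). It groups x by y and keeps the n largest values of each group.

```python
from heapq import nlargest
from itertools import groupby

# Data from the question (the z column is not needed for the grouping).
x = [3, 2, 1, 8, 7, 11, 10, 9, 7, 5, 4]
y = ["a", "a", "a", "b", "b", "c", "c", "c", "c", "c", "c"]


def top_n_per_group(values, keys, n):
    """Return {key: the n largest values in that key's group}, largest first."""
    pairs = sorted(zip(keys, values))  # groupby only merges adjacent keys, so sort first
    return {
        key: nlargest(n, (v for _, v in group))
        for key, group in groupby(pairs, key=lambda pair: pair[0])
    }


print(top_n_per_group(x, y, 2))  # {'a': [3, 2], 'b': [8, 7], 'c': [11, 10]}
```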

Add column to table in kdb based on existing columns?

I want to add a new column to a kdb table, populated from the existing columns by taking whichever value is non-null, as below:
q)t:([]a:`a`b`c`d`e`f`g`h;b:1 0n 3 4 0n 6 0n 8;c:0n 2 0n 0n 5 0n 7 0n)
q)t
a b c
-----
a 1
b   2
c 3
d 4
e   5
f 6
g   7
h 8
I want to add a column d that takes the value from b or c, whichever isn't null,
to produce a table like this:
a b c d
-------
a 1   1
b   2 2
c 3   3
d 4   4
e   5 5
f 6   6
g   7 7
h 8   8
I tried concatenating, but then the result still has the nulls in it:
q)update d:(b,'c)from t
a b c d
----------
a 1   1 0n
b   2 0n 2
c 3   3 0n
d 4   4 0n
e   5 0n 5
f 6   6 0n
g   7 0n 7
h 8   8 0n
A vector conditional might be what you’re after, something like the below:
update d:?[null b;c;b] from t
You can read more about vector conditionals in the kdb+ reference documentation. It expects a boolean list as its first argument and, item by item, returns the value from the second argument where the boolean is true, or the value from the third argument where it is false.
For example:
q)?[10101b;"abcde";"ABCDE"]
"aBcDe"
When used in conjunction with a select/update statement, columns of the table can be specified as the arguments to the vector conditional as these are simply lists.
As an aside, the null keyword returns a Boolean true where a value is null and is useful as part of your solution.
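The item-by-item selection of the vector conditional can be sketched in Python, assuming None stands in for 0n; vector_cond is a hypothetical helper that mirrors ?[;;] on lists of equal length.

```python
NULL = None  # stand-in for q's float null 0n


def vector_cond(flags, if_true, if_false):
    """Like q's ?[flags;if_true;if_false]: choose item by item between two lists."""
    return [t if f else e for f, t, e in zip(flags, if_true, if_false)]


# q)?[10101b;"abcde";"ABCDE"]
print("".join(vector_cond([1, 0, 1, 0, 1], "abcde", "ABCDE")))  # aBcDe

# update d:?[null b;c;b] from t
b = [1, NULL, 3, 4, NULL, 6, NULL, 8]
c = [NULL, 2, NULL, NULL, 5, NULL, 7, NULL]
d = vector_cond([v is NULL for v in b], c, b)
print(d)  # [1, 2, 3, 4, 5, 6, 7, 8]
```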
You can use the ^ (fill) operator:
q)t:([]a:`a`b`c`d`e`f`g`h;b:1 0n 3 4 0n 6 0n 8;c:0n 2 0n 0n 5 0n 7 0n)
q)update d:b^c from t
a b c d
-------
a 1   1
b   2 2
c 3   3
d 4   4
e   5 5
f 6   6
g   7 7
h 8   8
It is worth noting that if you had a row with non-null values for both b and c, the query above would take the value in c. If you would prefer b to take precedence, switch the inputs:
q)t:([]a:`a`b`c`d`e`f`g`h;b:1 0n 3 4 0n 6 0n 8;c:0n 2 0n 0n 5 100 7 0n)
q)update d:b^c from t
a b c   d
-----------
a 1     1
b   2   2
c 3     3
d 4     4
e   5   5
f 6 100 100
g   7   7
h 8     8
q)update d:c^b from t
a b c   d
---------
a 1     1
b   2   2
c 3     3
d 4     4
e   5   5
f 6 100 6
g   7   7
h 8     8
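The precedence rule for fill can be checked with a rough Python sketch, assuming None stands in for 0n; fill is a hypothetical helper mimicking x^y on two lists (it keeps y's item where non-null, otherwise x's).

```python
NULL = None  # stand-in for q's 0n


def fill(x, y):
    """Like q's x^y on two lists: keep y's item where it is non-null, otherwise use x's."""
    return [xi if yi is NULL else yi for xi, yi in zip(x, y)]


b = [1, NULL, 3, 4, NULL, 6, NULL, 8]
c = [NULL, 2, NULL, NULL, 5, 100, 7, NULL]  # row f now has values in both b and c

print(fill(b, c))  # b^c: [1, 2, 3, 4, 5, 100, 7, 8]
print(fill(c, b))  # c^b: [1, 2, 3, 4, 5, 6, 7, 8]
```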
You could use the or (|) operator:
q)update d:b|c from t
Concatenation gives you a list with items from both the b and c columns; it does not remove nulls. The or operator compares each pair of b and c items and returns the maximum of the pair. Since a null is less than any number in q, you get the non-null value from either b or c. Note that if both are non-null you get the larger of the two, and if both are null the result is null.
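A rough Python sketch of | on lists containing nulls, assuming None stands in for 0n and ranks below every number (q_or is a hypothetical helper). The last line shows where it differs from fill.

```python
NULL = None  # stand-in for q's 0n


def q_or(x, y):
    """Like q's x|y on two numeric lists: the larger item, with null below every number."""
    out = []
    for xi, yi in zip(x, y):
        if xi is NULL:
            out.append(yi)  # null|y is y (still null if y is null too)
        elif yi is NULL:
            out.append(xi)
        else:
            out.append(max(xi, yi))
    return out


b = [1, NULL, 3, 4, NULL, 6, NULL, 8]
c = [NULL, 2, NULL, NULL, 5, NULL, 7, NULL]
print(q_or(b, c))      # [1, 2, 3, 4, 5, 6, 7, 8]
print(q_or([6], [3]))  # [6]; fill (b^c) would give 3 here
```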
You can use fill here (see https://code.kx.com/wiki/Reference/Caret):
q)t:([]a:`a`b`c`d`e`f`g`h;b:1 0n 3 4 0n 6 0n 8;c:0n 2 0n 0n 5 0n 7 0n)
q)update d:c^b from t
a b c d
-------
a 1   1
b   2 2
c 3   3
...

How to get the difference of matrices without repetitions removed

The function setdiff(A,B,'rows') is used to return the set of rows that are in A but not B, with repetitions removed.
Is there any way to do it without removing the repetitions?
Thanks a lot.
You can use ismember instead of setdiff to find which rows of A also appear in B.
Because you want only the rows of A that do NOT appear in B, negate the result with ~ and use it to select those rows of A:
A =
1 2 3
4 5 6
1 2 3
7 8 9
B =
4 5 6
C=A(~ismember(A,B,'rows'),:)
C =
1 2 3
1 2 3
7 8 9
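The same keep-duplicates difference can be sketched in Python, assuming rows are lists of numbers; row_diff_keep_duplicates is a hypothetical helper that, like A(~ismember(A,B,'rows'),:), keeps A's order and repeats.

```python
def row_diff_keep_duplicates(A, B):
    """Rows of A that are not rows of B, keeping A's order and any repeated rows."""
    rows_b = {tuple(row) for row in B}  # set of B's rows for fast membership tests
    return [row for row in A if tuple(row) not in rows_b]


A = [[1, 2, 3], [4, 5, 6], [1, 2, 3], [7, 8, 9]]
B = [[4, 5, 6]]
C = row_diff_keep_duplicates(A, B)
print(C)  # [[1, 2, 3], [1, 2, 3], [7, 8, 9]]
```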

How to look back on rows until a criterion is matched

Consider the following sheet example:
A1 A2
1 5 10
2 6 12
3 -3 9
4 1 10
5 5 15
6 -4 11
7 9 20
How do I look back from row 6 and sum all A2 values up to the previous row with a negative A1 value?
In this example: 15 + 10 = 25
Assuming -3 is in A3, enter this in C4 and copy it down as far as needed:
=IF(A3<0,0,C3+B3)
This creates a running total, starting immediately after the first negative in the left-hand column, that resets after each negative in the left-hand column. With the sample data, C6 shows 25.
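The formula's logic can be sketched in Python, assuming the sample data above with column B holding the A2 values; lookback_totals is a hypothetical helper in which None plays the role of the blank cells above C4.

```python
col_a1 = [5, 6, -3, 1, 5, -4, 9]      # column A, spreadsheet rows 1..7
col_a2 = [10, 12, 9, 10, 15, 11, 20]  # column B


def lookback_totals(flags, values):
    """totals[i] = sum of values strictly between the last negative flag above row i and row i.

    Mirrors =IF(A3<0,0,C3+B3) filled down from C4: None until the first negative.
    """
    totals = []
    running = None
    for flag, value in zip(flags, values):
        totals.append(running)
        if flag < 0:
            running = 0  # reset just after a negative
        elif running is not None:
            running += value
    return totals


totals = lookback_totals(col_a1, col_a2)
print(totals[5])  # row 6 -> 25 (15 + 10)
```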

Sorting data in MATLAB dependent on one column

How do I sort a column based on the values in another column in MATLAB?
Column A shows position data (it is in neither ascending nor descending order). Column B contains another column of position data. Finally, column C contains numerical values. Is it possible to link the first position value in B with its numerical value in the first cell of C? After that, I want to sort B so that it is in the same order as column A, with the C values following their B counterparts. Each column is 1558 values long.
Before:
A B C
1 4 10
4 1 20
3 5 30
5 2 40
2 3 50
After:
A B C
1 1 20
4 4 10
3 3 50
5 5 30
2 2 40
Basically, B is reordered to match A, and column C follows B.
Since you don't want things necessarily in ascending or descending order, I don't think any built-in sorting functions like sortrows() will help here. Instead you are matching elements in one column with elements in another column.
Using [~,idx]=ismember(A,B) tells you where each element of A is located in B. You can use that to reorder the desired columns.
M=[1 4 10
4 1 20
3 5 30
5 2 40
2 3 50];
A=M(:,1); B=M(:,2); C=M(:,3);
[~,idx]=ismember(A,B);
sorted_matrix = [A B(idx) C(idx)]
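The same reordering can be sketched in Python with a value-to-position lookup, assuming B's values are unique and every value in A occurs in B (indices are 0-based here, unlike MATLAB).

```python
A = [1, 4, 3, 5, 2]
B = [4, 1, 5, 2, 3]
C = [10, 20, 30, 40, 50]

# Where each element of A sits in B (0-based), like [~,idx]=ismember(A,B).
# Assumes B's values are unique and every element of A occurs in B.
position_in_b = {value: i for i, value in enumerate(B)}
idx = [position_in_b[a] for a in A]

# Like [A B(idx) C(idx)]: B is reordered to match A and C follows B.
sorted_rows = [[a, B[i], C[i]] for a, i in zip(A, idx)]
print(sorted_rows)  # [[1, 1, 20], [4, 4, 10], [3, 3, 50], [5, 5, 30], [2, 2, 40]]
```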
A combination of bsxfun and matrix multiplication also solves it, and it is handy for code golfing too. Here's the implementation, assuming M is the input matrix:
[M(:,1) bsxfun(@eq,M(:,1),M(:,2).')*M(:,2:3)]
Sample run:
>> M
M =
1 4 10
4 1 20
3 5 30
5 2 40
2 3 50
>> [M(:,1) bsxfun(@eq,M(:,1),M(:,2).')*M(:,2:3)]
ans =
1 1 20
4 4 10
3 3 50
5 5 30
2 2 40
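Why the one-liner works: bsxfun(@eq,...) builds a 0/1 matrix with a single 1 per row, and multiplying it by [B C] picks out the matching row. A plain-Python sketch of that idea, assuming each value of A matches exactly one value of B (otherwise a row comes out as zeros or as a sum of rows):

```python
M = [[1, 4, 10], [4, 1, 20], [3, 5, 30], [5, 2, 40], [2, 3, 50]]
A = [row[0] for row in M]
BC = [row[1:] for row in M]  # like M(:,2:3)

# E[i][j] = 1 when A[i] == B[j], the 0/1 matrix bsxfun(@eq, M(:,1), M(:,2).') builds.
E = [[1 if a == bc[0] else 0 for bc in BC] for a in A]

# E times [B C]: each row of E has a single 1, so it picks out the matching row.
product = [[sum(e * bc[k] for e, bc in zip(e_row, BC)) for k in range(2)] for e_row in E]
result = [[a] + p for a, p in zip(A, product)]
print(result)  # [[1, 1, 20], [4, 4, 10], [3, 3, 50], [5, 5, 30], [2, 2, 40]]
```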
Given M = [A B C]:
M =
1 4 10
4 1 20
3 5 30
5 2 40
2 3 50
You need to sort the rows of the matrix excluding the first column:
s = sortrows(M(:,2:3));
s =
1 20
2 40
3 50
4 10
5 30
Then use the first column as the indices to reorder the resulting submatrix:
s(M(:,1),:);
ans =
1 20
4 10
3 50
5 30
2 40
This would be used to build the output matrix:
N = [M(:,1) s(M(:,1),:)];
N =
1 1 20
4 4 10
3 3 50
5 5 30
2 2 40
The previous technique will obviously only work if A and B are permutations of the values (1..m). If this is not the case, then we need to find the ranking of each value in the array. Let's start with new values for our arrays:
A B C
1 5 60
6 1 80
9 6 60
-4 9 40
5 -4 30
We construct s as before:
s = sortrows([B C]);
s =
-4 30
1 80
5 60
6 60
9 40
We can generate the rankings in one of two ways. If the elements of A (and B) are unique, we can use the third output of unique:
[~, ~, r] = unique(A);
r =
2
4
5
1
3
If the values of A are not unique, we can use the second return value of sort, the indices in the original array of the elements in sorted order, to generate the rank of each element:
[~, r] = sort(A);
r =
4
1
5
2
3
[~, r] = sort(r);
r =
2
4
5
1
3
As you can see, the resulting r is the same; it just takes two calls to sort rather than one call to unique. We then use r as the list of row indices into s from above:
M = [A s(r, :)];
M =
1 1 80
6 6 60
9 9 40
-4 -4 30
5 5 60
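The rank trick can be sketched in Python, assuming Python's stable sorted behaves like MATLAB's sort for this purpose; argsort is a hypothetical helper returning 0-based indices, so argsort(argsort(A)) gives 0-based ranks.

```python
A = [1, 6, 9, -4, 5]
B = [5, 1, 6, 9, -4]
C = [60, 80, 60, 40, 30]


def argsort(values):
    """Indices that would sort values (0-based), like the second output of MATLAB's sort."""
    return sorted(range(len(values)), key=lambda i: values[i])


s = sorted(zip(B, C))      # like sortrows([B C])
r = argsort(argsort(A))    # 0-based rank of each element of A
M = [[a] + list(s[k]) for a, k in zip(A, r)]
print(M)  # [[1, 1, 80], [6, 6, 60], [9, 9, 40], [-4, -4, 30], [5, 5, 60]]
```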
If you must retain the order of A, then use something like this:
matrix = [1 4 10; 4 1 20; 3 5 30; 5 2 40; 2 3 50];
idx = arrayfun(@(x) find(matrix(:,2) == x), matrix(:,1));
sorted = [matrix(:,1), matrix(idx,2:3)];