SPSS Modeler group by and select top n rows - spss-modeler

I would like to know what is the proper way in SPSS to group data by specydic column and then find top n max values.
For example I have below columns:
x<-c(3,2,1,8,7,11,10,9,7,5,4)
y<-c("a","a","a", "b","b","c","c","c","c","c","c")
z<-c(2,2,2,1,1,3,3,3,3,3,3)
I want to select top max n values from column X for each group by column y
x y
1 3 a
2 2 a
3 1 a
4 8 b
5 7 b
6 11 c
7 10 c 3
8 9 c 3
9 7 c 3
10 5 c 3
11 4 c 3

Related

Total sum of a matrix

I have a 4x9 matrix, and I need to calculate the sum of all numbers in every other column of c starting with the first. Can anyone point me in the right direction? I know we have to use the function sum() but that's about it.
I used Octave rather than MATLAB, but this works for me:
A = randi(10,4,9)
B = A(:, 1:2:9)
C = sum(B)
Generate a 4x9 matrix with random numbers between 1 and 10, then create a sub-matrix with each row, and given columns 1:2:9 means starting from the first column and ending on the 9th, choose every second column, then sum up each column. Example output:
>> A = randi(10,4,9)
A =
1 3 6 8 2 8 4 8 10
3 6 10 4 6 4 6 2 8
4 3 9 2 7 10 6 9 6
8 5 3 9 3 8 4 6 10
>> B = A(:, 1:2:9)
B =
1 6 2 4 10
3 10 6 6 8
4 9 7 6 6
8 3 3 4 10
>> C = sum(B)
C =
16 28 18 20 34
You could also take the sum of matrix C using the sum() first and then select every other element from the result starting from the 1st element.
tmpC = sum(C);
result = tmpC(1:2:end)

Rows and columns rearrangement in Matlab

I would like to select a sub-matrix of dimension 1000rows X 10columns conditional on the value of a certain column and move this sub-matrix to a different (new) matrix.
The problem is that I would like to do that for millions of observations, with sub-matrix having dimensions approximately equal to 1000rows X 10 columns (the dimension might change as well from sub-matrix to sub-matrix, it can be for instance 950 X 10, then 1050 X 10). I would like to create as many new matrices as the number of my sub-matrices.
1 2 3
-----------
1 1 2 2
2 2 2 2
3 3 2 2
4 4 2 2
5 5 2 2
6 5.5 2 2
7 6 2 2
8 6.5 2 2
9 7 2 2
10 8.5 2 2
11 12 2 2
12 15 2 2
For example:
In the above matrix, I would like to select all the elements conditional on the fact that the elements of column 1 are between 5 and 10 (i.e. rows 5 to 10), and then create a new matrix. But I have to do that for millions of sub-matrices, so I would have to create millions of new matrices.

How to sum across a row in KDB/Q

I have a table rCom which has various columns. I would like to sum across each row..
for example:
Date TypeA TypeB TypeC TypeD
date1 40.5 23.1 45.1 65.2
date2 23.3 32.2 56.1 30.1
How can I write a q query to add a fourth column 'Total' that sums across each row?
why not just:
update Total: TypeA+TypeB+TypeC+TypeD from rCom
?
Sum will work just fine:
q)flip`a`b`c!3 3#til 9
a b c
-----
0 3 6
1 4 7
2 5 8
q)update d:sum(a;b;c) from flip`a`b`c!3 3#til 9
a b c d
--------
0 3 6 9
1 4 7 12
2 5 8 15
Sum has map reduce which will be better for a huge table.
One quick point regarding summing across rows. You should be careful about nulls in 1 column resulting in a null result for the sum. Borrowing #WooiKent Lee's example.
We put a null into the first position of the a column. Notice how our sum now becomes null
q)wn:.[flip`a`b`c!3 3#til 9;(0;`a);first 0#] //with null
q)update d:sum (a;b;c) from wn
a b c d
--------
3 6
1 4 7 12
2 5 8 15
This is a direct effect of the way nulls in q are treated. If you sum across a simple list, the nulls are ignored
q)sum 1 2 3 0N
6
However, a sum across a general list will not display this behavior
q)sum (),/:1 2 3 0N
,0N
So, for your table situation, you might want to fill in with a zero beforehand
q)update d:sum 0^(a;b;c) from wn
a b c d
--------
3 6 9
1 4 7 12
2 5 8 15
Or alternatively, make it s.t. you are actually summing across simple lists rather than general lists.
q)update d:sum each flip (a;b;c) from wn
a b c d
--------
3 6 9
1 4 7 12
2 5 8 15
For a more complete reference on null treatment please see the reference website
This is what worked:
select Answer:{[x;y;z;a] x+y+z+a }'[TypeA;TypeB;TypeC;TypeD] from
([] dt:2014.01.01 2014.01.02 2014.01.03; TypeA:4 5 6; TypeB:1 2 3; TypeC:8 9 10; TypeD:3 4 5)

matching two matrices in matlab

Suppose I have two matrices p
p =
1 3 6 7 3 6
8 5 10 10 10 4
5 4 8 9 1 7
5 5 5 3 8 9
9 3 5 4 3 1
3 3 9 10 4 1
then after sorting the columns of matrix p into ascending order
y =
1 3 5 3 1 1
3 3 5 4 3 1
5 3 6 7 3 4
5 4 8 9 4 6
8 5 9 10 8 7
9 5 10 10 10 9
I want to know, given a value from y, what its row was in p
ex: the value 3 which is in matrix p located in row 6 column 1
then after sorting it located in matrix y in row 2 column 1
So I want at the end the values after sorting in matrix y, where it was originally in matrix p
Just use second output of sort:
[y ind] = sort(p);
Your desired result (original row of each value) is in matrix ind.
The Matlab sort command returns a second value which can be used to index into the original array or matrix. From the sort documentation:
[Y,I] = sort(X,DIM,MODE) also returns an index matrix I.
If X is a vector, then Y = X(I).
If X is an m-by-n matrix and DIM=1, then
for j = 1:n, Y(:,j) = X(I(:,j),j); end
Ok i understand exactly what you want.
I will give you my code that i write now, it is not optimal but you can optimize it or i can work with you in order to get the better code..
P and y have the same size.
[n,m]=size(p);
for L=1:m
i=1;
temp=y(i,L);
while(i<=n)
if(temp==y(i,L))
% So it is present in case i of p
disp(['It is present in line' num2str(i) ' of p']);
end
i=i+1;
end
end
VoilĂ !!

How to duplicate all inner columns of a matrix and sum pairs of columns in Matlab

Suppose I have a matrix A
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
How do I duplicate the inner columns of A to get a new matrix B
1 2 2 3 3 4 4 5
1 2 2 3 3 4 4 5
1 2 2 3 3 4 4 5
1 2 2 3 3 4 4 5
1 2 2 3 3 4 4 5
Notice the first and last column of A were left alone. Then I need to sum pairs of rows together to get another matrix C:
3 5 7 9
3 5 7 9
3 5 7 9
3 5 7 9
3 5 7 9
The size of my matrices will not always be 5x5 and the elements will not always be so nice, but the matrix will always be square.
I do not need to generate or output matrix B. That was just simply how I initially thought of obtaining my final matrix C.
My goal is to be reasonably efficient, so I would like to accomplish this without a for loop.
How do I accomplish this for arbitrary matrix size nxn ?
Very simple . .
C = A(:,2:end) + A(:,1:end-1)