How to find out the rows with a repeating element in a particular column and add all the corresponding elements of other columns? - matlab

I have a large set of simulation data that needs to be post-processed in MATLAB.
Say my matrix is A and its columns are named ID, X, Y, Z, s1, s2 and s3. I want to find the rows with a repeated X (that is, I have many points for one value of the x-coordinate), add up the corresponding elements of columns s1 and s2, and divide each sum by the number of occurrences of that X. In the end I want s1, s2 and s3 averaged over their frequency of occurrence.
This may be a trivial question, but as a beginner I searched and tried a lot and could not get far. I know we can find the repeated rows and their frequency with commands like mode or unique, but I am not able to add the corresponding column elements and average them.
Finally, when I plot, say, X vs. s1, I should have only one value of s1 for each value of X (i.e. s1 needs to be averaged over all repeating X).
Is there a direct MATLAB command for this, or do we need a loop?
Please help me.

There is a function in MATLAB named grpstats (part of the Statistics Toolbox) that solves exactly this problem.
It computes groupwise summary statistics for data in a matrix or dataset array.
Example:
data = [1;2;3;4];
group = [1;1;1;3];
[name,avg] = grpstats(data, group, {'gname','mean'})
would output:
name =
'1'
'3'
avg =
2
4
You may type help grpstats in MATLAB for more information.
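If the toolbox is not available, the same per-X averaging can be sketched with unique and accumarray, both in base MATLAB. The sample matrix below is hypothetical; only the column order (ID, X, Y, Z, s1, s2, s3) is taken from the question.

```matlab
% Hypothetical sample data; columns are ID, X, Y, Z, s1, s2, s3
A = [1 0.0 0 0 10 1  5;
     2 0.0 1 0 20 3  7;
     3 0.5 0 0 30 5  9;
     4 0.5 1 0 50 7 11;
     5 1.0 0 0 70 9 13];

[Xu, ~, idx] = unique(A(:,2));   % distinct X values and a group index per row
idx = idx(:);                    % make sure the group index is a column
n = accumarray(idx, 1);          % number of occurrences of each X

% Per-group sum divided by the group count gives the per-X average
s1avg = accumarray(idx, A(:,5)) ./ n;
s2avg = accumarray(idx, A(:,6)) ./ n;
s3avg = accumarray(idx, A(:,7)) ./ n;

result = [Xu s1avg s2avg s3avg]; % one row per distinct X
```

Plotting X against s1 is then plot(Xu, s1avg). Note that unique compares X values exactly, so coordinates that differ only by rounding error end up in separate groups.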

Related

How to retrieve specific rows that the coordinates x,y have been saved in another matrix?

I have a data matrix Data (8765x138) whose first and second columns are x and y coordinates. I have sampled some specific points into another array, Points (2000x2), whose first and second columns are x and y, respectively. I want to extract the rows of Data that match the rows of Points (both x and y). The output should be 2000x138. I tried the following code, but the result is not correct.
newData = Data(ismember(Data(:,1),Points(:,1))& ismember(Data(:,2),Points(:,2)),:);
What should I do to select the rows of Data whose first and second columns match my Points matrix? Someone please help, I feel like I've tried everything!
You can use ismember with the 'rows' option:
newData = Data(ismember(Data(:,1:2),Points(:,1:2),'rows'),:);
But when you are sampling, it is better to save the indices of the sampled rows and use them to extract the data directly.
It's a badly worded question, so hard to know for sure, but the solution might be to use ismember() to find matching rows, e.g.:
DataXY = Data(:,[1,2]);
tf = ismember(DataXY, Points, 'rows');
newData = Data(tf,:);
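A small sketch of why the original attempt selects too many rows: testing x and y separately accepts any row whose x appears somewhere in Points and whose y appears somewhere in Points, even if that (x,y) pair was never sampled. The tiny Data and Points matrices here are made up for illustration.

```matlab
% Hypothetical data: columns are x, y, value
Data   = [1 1 10;
          1 2 20;
          2 1 30;
          2 2 40];
Points = [2 2;
          1 1];

% Column-wise test: x and y are matched independently
wrong = ismember(Data(:,1), Points(:,1)) & ismember(Data(:,2), Points(:,2));

% Row-wise test: the (x,y) pair must match as a whole
right = ismember(Data(:,1:2), Points, 'rows');
newData = Data(right, :);            % rows come out in the order of Data

% To get the rows in the order of Points instead, search the other way round
[found, loc] = ismember(Points, Data(:,1:2), 'rows');
newDataInPointsOrder = Data(loc(found), :);
```

Here wrong flags all four rows, while right flags only the first and last. The second form is the one to use if the 2000 output rows must line up with the rows of Points.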

(matlab matrix operation), Is it possible to get a group of value from matrix without loop?

I'm currently implementing a gradient check function that needs to read certain index values from the result matrix. Could someone tell me how to get a group of values from the matrix?
To be specific, for a result matrix res of size M x N, I need the elements res(3,1), res(4,2), res(1,3), res(2,4)...
In my case, M is the dimension and N is the batch size, and there is a label array of size 1 x batch_size, [3 4 1 2...]. So the desired values are res(label(k),k) for k = 1:batch_size. I'm trying to practice vectorized programming, so I would rather not use a loop. Could someone tell me how to get this group of values without iterating?
Cheers.
--------------------------UPDATE----------------------------------------------
The only idea I found is to first build a 'mask matrix' and then multiply it element-wise with the original result matrix (technically the 'Hadamard product', see Wikipedia). After that, just extract the non-zero elements and sum them; the MATLAB code looks like:
temp=Mask.*res;
desired_res=temp(temp~=0); %Note: the temp(temp~=0) extract non-zero elements in a 'column' fashion: it searches temp matrix column by column then put the non-zero number into container 'desired_res'.
In my case, all I want to do next is sum(desired_res), so I don't need to care about the order of the non-zero elements in 'desired_res'.
Based on this idea, creating the mask matrix is the key step. There are two ways to do it.
The code is shown below. The first method uses accumarray to put a 1 at each location stored in the matrix 'subs' and 0 everywhere else, which gives a mask matrix of size [row column]. The usage of full(sparse()) is similar. I compared the two methods (around 10 repetitions); full(sparse()) turned out to be faster, with both timings on the order of 10^-4 s. The difference is small, but it matters in large-scale experiments. One thing to watch is the matrix size: accumarray takes it as its third argument, whereas the three-argument call full(sparse(i,j,1)) creates a matrix of size [max(i), max(j)]; sparse also accepts an explicit size as sparse(i,j,1,m,n). The three-argument form is sufficient for my requirement, and I only know a few of their usages. If you find out more, please share. Thanks.
Detailed descriptions of these functions can be found in MATLAB's official documentation for accumarray, full and sparse.
% assume we have a label vector
test_labels=ones(10000,1);
% method one, accumarray(subs,1,[row column])
tic
subs=zeros(10000,2);
subs(:,1)=test_labels;
subs(:,2)=1:10000;
k1=accumarray(subs,1,[10, 10000]);
t1=toc % to compare with method two to check which one is faster
% method two: full(sparse(i,j,1))
tic
k2=full(sparse(test_labels,1:10000,1));
t2=toc
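As an alternative to the mask approach, the same elements can be picked directly with linear indexing via sub2ind, which avoids building an M x N mask at all. This is a minimal sketch; the 4x4 res matrix is hypothetical, and label matches the [3 4 1 2] example from the question.

```matlab
% Hypothetical result matrix: res(i,j) = i + 4*(j-1)
res   = reshape(1:16, 4, 4);
label = [3 4 1 2];                 % row to pick in each column

% Convert (row, column) pairs into linear indices, one per column
idx  = sub2ind(size(res), label, 1:numel(label));
vals = res(idx);                   % res(3,1), res(4,2), res(1,3), res(2,4)
total = sum(vals);
```

Unlike temp(temp~=0), this keeps the selected values in column order and does not drop a selected element that happens to be exactly zero (which is harmless for a sum but matters if the values themselves are needed).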

MATLAB ttest2 command gives different values when used on same data in a 2D and 3D matrix?

I'm using MATLAB to perform some statistics on some data. I have two 17x206x378 matrices where dimension 1 are subjects from the same group (so 17 subjects in matrix1, 17 in matrix 2). I want to perform ttests so I get 206 p-values. I then want to do this SEPARATELY for each of the 378 elements in the third dimension.
So say u is a 17x206x378 matrix and d is a different 17x206x378 matrix.
I basically started by doing:
[h,p,ci,s] = ttest2(u,d)
Which does in fact give me a p-matrix size 1x206x378 so everything looked great.
Then to do a quick check I just extracted the first of the third dimension elements from each matrix with:
u1=u(:,:,1); d1=d(:,:,1);
and ran ttest2 on this data in the way you would expect:
[h1,p1,ci1,s1] = ttest2(u1,d1);
I again got a 1x206 p1-matrix of results but the values are not the same as those in the 1x206x378 p-matrix. When I plot the values in both the p(:,:,1) and the p1 vectors the resulting plots look very similar but not exactly the same.
Obviously one of these gives significant results (below .05) in some instances where the other does not, and I do not want to report a fake result, so I have two questions:
1) I am under the impression I am doing the ttests on the same data so what exactly is going on here?
2) If I do ultimately want to get 206 p-values for each of the 378 third dimension elements, what is the correct way to do this?
Thanks for your help!
I ran the following code:
u = rand(17,206,378);
d = rand(17,206,378);
u1 = u(:,:,1);
d1 = d(:,:,1);
[h,p,ci,s] = ttest(u,d);
[h1,p1,ci1,s1] = ttest(u1,d1);
sum(abs(p1(1,:)- p(1,:,1)))
And the output was 0, indicating that the corresponding elements of p and p1 are the same (note that this check calls the paired test ttest rather than ttest2). Maybe it's an indexing issue.
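The same kind of check can be sketched without the toolbox by computing the two-sample statistic by hand. This assumes the default settings of ttest2 (equal variances, two-sided) and is not ttest2's actual implementation; the array sizes are shrunk and the data is random, purely for illustration.

```matlab
% Hypothetical small arrays: 17 subjects x 5 variables x 3 slices
u = rand(17,5,3);
d = rand(17,5,3) + 0.2;

nu = size(u,1);
nd = size(d,1);
df = nu + nd - 2;                                    % degrees of freedom

% All columns and slices at once, working along dimension 1
sp2 = ((nu-1)*var(u,0,1) + (nd-1)*var(d,0,1)) / df;  % pooled variance
t   = (mean(u,1) - mean(d,1)) ./ sqrt(sp2*(1/nu + 1/nd));
p   = betainc(df ./ (df + t.^2), df/2, 0.5);         % two-sided p, size 1x5x3

% The same computation on the first slice only
u1 = u(:,:,1);
d1 = d(:,:,1);
sp2_1 = ((nu-1)*var(u1,0,1) + (nd-1)*var(d1,0,1)) / df;
t1    = (mean(u1,1) - mean(d1,1)) ./ sqrt(sp2_1*(1/nu + 1/nd));
p1    = betainc(df ./ (df + t1.^2), df/2, 0.5);

maxdiff = max(abs(p1 - p(1,:,1)));   % expected to be zero up to rounding
```

Each p-value depends only on one column of one slice, so the 3-D and 2-D computations should agree. With the toolbox available, a loop calling ttest2(u(:,:,k), d(:,:,k)) for k = 1:size(u,3) gives an explicit per-slice result to compare against.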

Assigning the different row to another matrix after comparing two matrices

I have two matrices:
r=10,000x2
q=10,000x2
I have to find the rows of q in which one or both values (it is a two-column matrix) differ from the corresponding row of r, and put them in another matrix. Right now I am trying the code below. I cannot use isequal because I want to know which rows are not equal, and this code gives me the individual differing elements, not the complete rows.
Can anyone help, please?
if r(:,:)~=q(:,:)
IN= find(registeredPts(:,:)~=q(:,:))
end
You can probably do this using ismember. Is this what you want? Here you get the values from q in rows that are different from r.
q=[1,2;3,4;5,6]
r=[1,2;3,5;5,6]
x = q(sum(ismember(q,r),2) < 2,:)
x =
3 4
What this does:
ismember(q,r) creates an array with 1's wherever an element of q occurs anywhere in r (not necessarily in the same position), and 0's in the remaining positions. sum(.., 2) sums along each row. If the sum is less than 2, that row is included in the new array.
Update
If the values might differ some due to floating point arithmetic, check out ismemberf from the file exchange. I haven't tested it myself, but it looks good.
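If the rows should be compared position by position (row k of q against row k of r), a direct element-wise comparison is a possible alternative. The sketch below uses made-up matrices and a hypothetical tolerance tol; the last row shows a case where the membership test and the positional test disagree.

```matlab
q = [1 2; 3 4; 5 6; 5 3];
r = [1 2; 3 5; 5 6; 3 5];

% Positional comparison: a row differs if any of its elements differs
rowDiffers = any(q ~= r, 2);
x = q(rowDiffers, :);

% Membership-based test from the answer, for comparison
xMember = q(sum(ismember(q, r), 2) < 2, :);

% Tolerance-based variant for floating point data (tol is an assumed value)
tol  = 1e-9;
xTol = q(any(abs(q - r) > tol, 2), :);
```

Here x contains both [3 4] and [5 3], while xMember contains only [3 4], because 5 and 3 each occur somewhere in r.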

Preserving matrix columns using Matlab brush/select data tool

I'm working with matrices in Matlab which have five columns and several million rows. I'm interested in picking particular groups of this data. Currently I'm doing this using plot3() and the brush/select data tool.
I plot the first three columns of the matrix as X,Y, Z and highlight the matrix region I'm interested in. I then use the brush/select tool's "Create variable" tool to export that region as a new matrix.
The problem is that when I do that, the remaining two columns of the original, bigger matrix are dropped. I understand why- they weren't plotted and hence the figure tool doesn't know about them. I need all five columns of that subregion though in order to continue the processing pipeline.
I'm adding the appropriate 4th and 5th column values to the exported matrix using a horrible nested if loop approach- if columns 1, 2 and 3 match in both the original and exported matrix, attach columns 4/5 of the original matrix to the exported one. It's bad design and agonizingly slow. I know there has to be a Matlab function/trick for this- can anyone help?
Thanks!
This might help:
1. I start with matrix 1 with columns X,Y,Z,A,B
2. Using the brush/select tool, I create a new (subregion) matrix 2 with columns X,Y,Z
3. I then loop through all members of matrix 2 against all members of matrix 1. If X,Y,Z match for a pair of rows, I append A and B from that row in matrix 1 to the appropriate row in matrix 2.
4. I become very sad as this takes forever and shows my ignorance of Matlab.
If I understand your situation correctly here is a simple way to do it:
Assuming you have a matrix like so: M = [A B C D E] where each letter is a Nx1 vector.
You select a range, this part is not really clear to me, but suppose you can create the following:
idxA,idxB and idxC, that are 1 if they are in the region and 0 otherwise.
Then you can simply use:
M(idxA&idxB&idxC,:)
and you will get the additional two columns as well.
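If only the exported X,Y,Z matrix is available, the nested loop can be replaced by a row-wise ismember lookup. This is a sketch with hypothetical data: M stands for the original five-column matrix and brushed for the three-column matrix exported by the brush tool.

```matlab
% Hypothetical original matrix; columns are X, Y, Z, A, B
M = [0 0 0 10 11;
     1 0 0 20 21;
     1 1 0 30 31;
     2 2 2 40 41];

% Hypothetical brushed subregion (X, Y, Z only)
brushed = [1 1 0;
           1 0 0];

% For each brushed row, find the index of a matching X,Y,Z row in M
[found, loc] = ismember(brushed, M(:,1:3), 'rows');
sub = M(loc(found), :);            % all five columns, in the order of brushed
```

If several rows of M share the same X,Y,Z, ismember returns only one index per brushed row, so duplicates would need separate handling. The match is also exact, which should hold here as long as the exported values are unmodified copies of the originals.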