Extracting unique values - matlab

I have data in two columns that looks as follows:
A B
1,265848208 3
-0,608043611 0
-0,285735893 0
0,006895134 7
0 7
-0,004526196 7
0,176326617 10
-0,159688071 2
0,22439945 2
-0,991045044 1
0,178022324 1
-0,270967397 4
0,285849994 4
1,881705539 23
1,057184204 10
NaN 10
For all unique values in B I want to extract the corresponding values in column A and move them to a new matrix. I then want to compute the mean of the corresponding values in A for each unique value of B and use it as the dependent variable (weighted by the number of observations per value of B) in a regression, with the common value of B as the independent variable, to reduce noise. Any help on how to do this in MATLAB (except running the regression) would be great!
Thanks
Oscar

Here is an efficient solution:
X = [
1.265848208 3
-0.608043611 0
-0.285735893 0
0.006895134 7
0 7
-0.004526196 7
0.176326617 10
-0.159688071 2
0.22439945 2
-0.991045044 1
0.178022324 1
-0.270967397 4
0.285849994 4
1.881705539 23
1.057184204 10
NaN 10
];
%# unique values in B, and their indices
[valB,~,subs] = unique(X(:,2));
%# values of A for each unique number in B (cellarray)
valA = accumarray(subs, X(:,1), [], @(x) {x});
%# mean of each group
meanValA = cellfun(@nanmean, valA)
%# perform regression here...
The result:
%# B values, mean of corresponding values in A, number of A values
>> [valB meanValA cellfun(@numel,valA)]
ans =
0 -0.44689 2
1 -0.40651 2
2 0.032356 2
3 1.2658 1
4 0.0074413 2
7 0.00078965 3
10 0.61676 3
23 1.8817 1
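Since the group means are meant to be weighted by the number of observations, it may be worth counting only the non-NaN values of A. The sketch below is a variant of the answer above, not the answer's own code: it assumes the same X, avoids nanmean (which needs the Statistics Toolbox) and uses the hypothetical names ok, nObs, sumA and meanA.

```matlab
% Data from the question, with decimal commas written as points
X = [
     1.265848208   3
    -0.608043611   0
    -0.285735893   0
     0.006895134   7
     0             7
    -0.004526196   7
     0.176326617  10
    -0.159688071   2
     0.22439945    2
    -0.991045044   1
     0.178022324   1
    -0.270967397   4
     0.285849994   4
     1.881705539  23
     1.057184204  10
     NaN          10
];

[valB, ~, subs] = unique(X(:,2));   % group index (1..numel(valB)) for every row
ok   = ~isnan(X(:,1));              % rows whose A value is usable
nGrp = numel(valB);

nObs  = accumarray(subs(ok), 1,       [nGrp 1]);  % non-NaN observations per group
sumA  = accumarray(subs(ok), X(ok,1), [nGrp 1]);  % sum of valid A values per group
meanA = sumA ./ nObs;               % NaN for a group without any valid value

disp([valB meanA nObs])
```

With this data the group B = 10 gets a weight of 2 rather than 3, because its NaN entry is not counted.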

Related

How to align vectors with asynchronous time stamp in matlab?

I would like to align and count vectors with different time stamps to count the corresponding bins.
Let's assume I have 3 matrix from [N,edges] = histcounts in the following structure. The first row represents the edges, so the bins. The second row represents the values. I would like to sum all values with the same bin.
A = [0 1 2 3 4 5;
5 5 6 7 8 5]
B = [1 2 3 4 5 6;
2 5 7 8 5 4]
C = [2 3 4 5 6 7 8;
1 2 6 7 4 3 2]
Now I want to sum all the same bins. My final result should be:
result = [0 1 2 3 4 5 6 7 8;
5 7 12 16 ...]
I could loop over all numbers, but I would like to have it fast.
You can use accumarray:
H = [A B C].'; %//' Concatenate the histograms and make them column vectors
V = [unique(H(:,1)) accumarray(H(:,1)+1, H(:,2))].'; %//' Find unique values and accumulate
V =
0 1 2 3 4 5 6 7 8
5 7 12 16 22 17 8 3 2
Note: The H(:,1)+1 shifts the bin values so that they are positive integers, which accumarray requires for its subscripts; otherwise MATLAB will complain. We still use the actual bins in the output V. To avoid the shift, as @Daniel says in the comments, use the third output of unique (see: https://stackoverflow.com/a/27783568/2732801):
H = [A B C].'; %//' stupid syntax highlighting :/
[U, ~, IU] = unique(H(:,1));
V = [U accumarray(IU, H(:,2))].';
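For reference, here is the unique-based variant as a self-contained sketch using the three histograms from the question. Because IU always runs from 1 to numel(U), this form should also cope with negative or non-integer bin values, assuming equal bins compare exactly equal as floating-point numbers.

```matlab
% Histograms from the question: first row = bins, second row = counts
A = [0 1 2 3 4 5;
     5 5 6 7 8 5];
B = [1 2 3 4 5 6;
     2 5 7 8 5 4];
C = [2 3 4 5 6 7 8;
     1 2 6 7 4 3 2];

H = [A B C].';                      % one row per (bin, count) pair
[U, ~, IU] = unique(H(:,1));        % U: distinct bins, IU: position of each bin in U
V = [U accumarray(IU, H(:,2))].';   % sum the counts that share a bin

disp(V)
```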
If you're only doing it with 3 variables as you've shown then there likely aren't going to be any performance hits with looping it.
But if you are really averse to the looping idea, then you can do it using arrayfun.
rng = 0:8;
output = arrayfun(@(x)sum([A(2,A(1,:) == x), B(2,B(1,:) == x), C(2,C(1,:) == x)]), rng);
output = cat(1, rng, output);
output =
0 1 2 3 4 5 6 7 8
5 7 12 16 22 17 8 3 2
This can be beneficial for particularly large A, B, and C variables as there is no copying of data.

dot product of matrix columns

I have a 4x8 matrix. I want to select two different columns of it, compute their dot product, and divide it by the product of the norms of the two selected columns. I then want to repeat this for all possible pairs of different columns and save the results in a new matrix. Can anyone provide MATLAB code for this purpose?
The code that I expected to give me the output is:
A=[1 2 3 4 5 6 7 8;1 2 3 4 5 6 7 8;1 2 3 4 5 6 7 8;1 2 3 4 5 6 7 8;];
for i=1:8
for j=1:7
B(:,i)=(A(:,i).*A(:,j+1))/(norm(A(:,i))*norm(A(:,j+1)));
end
end
I would approach this a different way. First, create two matrices where the corresponding columns of each one correspond to a unique pair of columns from your matrix.
The easiest way I can think of is to create all possible combinations of pairs and eliminate the duplicates. You can do this by creating a meshgrid of values, where the outputs X and Y give you every pairing of column indices, and then selecting only the part of each matrix strictly below the main diagonal (tril with an offset of -1). So do this:
num_columns = size(A,2);
[X,Y] = meshgrid(1:num_columns);
X = X(tril(ones(num_columns),-1)==1); Y = Y(tril(ones(num_columns),-1)==1);
In your case, here's what the grid of coordinates looks like:
>> [X,Y] = meshgrid(1:num_columns)
X =
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
Y =
1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8
As you can see, if we select the lower triangular part of each matrix excluding the diagonal, we get all unique pairs, which is what I did in the last line of the code. Selecting the lower part is important because MATLAB extracts values column-wise, and traversing the columns of the lower-triangular part of each matrix gives you the pairs in the right order (i.e. 1-2, 1-3, ..., 1-8, 2-3, 2-4, ..., etc.)
The point of all of this is that we can then use X and Y to create two new matrices that contain the columns located at each pair of X and Y, then use dot to apply the dot product to each matrix column-wise. We also need to divide each dot product by the product of the magnitudes of the two vectors. You can't use MATLAB's built-in function norm for this because it computes the matrix norm when given a matrix. Instead, square every element, sum over the rows of each column for each of the two matrices, multiply the two results element-wise, and take the square root. This is the last step of the process:
matrix1 = A(:,X);
matrix2 = A(:,Y);
B = dot(matrix1, matrix2, 1) ./ sqrt(sum(matrix1.^2,1).*sum(matrix2.^2,1));
I get this for B:
>> B
B =
Columns 1 through 11
1 1 1 1 1 1 1 1 1 1 1
Columns 12 through 22
1 1 1 1 1 1 1 1 1 1 1
Columns 23 through 28
1 1 1 1 1 1
Well.. this isn't useful at all. Why is that? What you are actually computing is the cosine of the angle between two vectors, and since every column of this A is a scalar multiple of every other, the angle between each pair is 0, and the cosine of 0 is 1.
You should try this with different values of A so you can see for yourself that it works.
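As one such trial, here is a small sketch with a made-up 4x3 matrix (not from the question) whose pairwise cosines are easy to check by hand: columns 1 and 2 are orthogonal, and column 3 lies at 45 degrees to each of them. It also compares the meshgrid/tril pairing with nchoosek(1:n,2), which should list the same pairs in the same order.

```matlab
% Hypothetical test matrix: col 1 = e1, col 2 = e2, col 3 = e1 + e2
A = [1 0 1;
     0 1 1;
     0 0 0;
     0 0 0];

num_columns = size(A,2);
[X,Y] = meshgrid(1:num_columns);
mask = tril(ones(num_columns),-1) == 1;   % strictly below the main diagonal
X = X(mask);                              % first column index of each pair
Y = Y(mask);                              % second column index of each pair

matrix1 = A(:,X);
matrix2 = A(:,Y);
B = dot(matrix1, matrix2, 1) ./ sqrt(sum(matrix1.^2,1).*sum(matrix2.^2,1));

pairs = nchoosek(1:num_columns, 2);       % alternative way to list the pairs
disp([X Y B.'])                           % pair indices next to their cosine
```

For this matrix the pairs come out as 1-2, 1-3 and 2-3, with cosines 0, 1/sqrt(2) and 1/sqrt(2).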
To make this code compatible for copying and pasting, here it is:
%// Define A here:
A = repmat(1:8, 4, 1);
%// Code to produce dot products here
num_columns = size(A,2);
[X,Y] = meshgrid(1:num_columns);
X = X(tril(ones(num_columns),-1)==1); Y = Y(tril(ones(num_columns),-1)==1);
matrix1 = A(:,X);
matrix2 = A(:,Y);
B = dot(matrix1, matrix2, 1) ./ sqrt(sum(matrix1.^2,1).*sum(matrix2.^2,1));
Minor Note
If you have a lot of columns in A, this may be very memory intensive. You can get your original code to work with loops, but you need to change what you're doing at each column.
You can do something like this:
num_columns = nchoosek(size(A,2),2);
B = zeros(1, num_columns);
counter = 1;
for ii = 1 : size(A,2)
for jj = ii+1 : size(A,2)
B(counter) = dot(A(:,ii), A(:,jj), 1) / (norm(A(:,ii))*norm(A(:,jj)));
counter = counter + 1;
end
end
Note that we can use norm here because we pass a single column vector to each call. We first preallocate a vector B that will hold the results for all possible pairs. Then we go through each pair; note that the inner for loop starts at the outer loop index plus 1, so no pair is visited twice. We take the dot product of the columns at positions ii and jj, normalize it, and store the result in B. An external counter is needed so that each result is written into the right slot of B.

How can I go through the columns of a matrix in matlab and add them each to a specific column of a sum matrix in matlab?

Suppose there is a matrix
A =
1 3 2 4
4 2 5 8
6 1 4 9
and I have a Vector containing the "class" of each column of this matrix for example
v = [1 , 1 , 2 , 3]
How can I sum the columns of the matrix into a new matrix, so that each column is added to the column given by its class? In this example, columns 1 and 2 of A would be added to the first column of the new matrix, column 3 to the 2nd, and column 4 to the 3rd.
Like
SUM =
4 2 4
6 5 8
7 4 9
Is this possible without loops?
One of the perfect scenarios to combine the powers of accumarray and bsxfun -
%// Since we are to accumulate columns, first step would be to transpose A
At = A.' %//'
%// Create a vector of linear IDs for use with ACCUMARRAY later on
idx = bsxfun(@plus,v(:),[0:size(A,1)-1]*max(v))
%// Use ACCUMARRAY to accumulate rows from At, i.e. columns from A based on the IDs
out = reshape(accumarray(idx(:),At(:)),[],size(A,1)).'
Sample run -
A =
1 3 2 4 6 0
4 2 5 8 9 2
6 1 4 9 8 9
v =
1 1 2 3 3 2
out =
4 2 10
6 7 17
7 13 17
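To make the role of idx more concrete, here is the same code applied to the smaller A and v from the question, as a sketch. Each row of A gets its own block of max(v) linear IDs, so an element of class c in row r lands in slot c + (r-1)*max(v).

```matlab
A = [1 3 2 4;
     4 2 5 8;
     6 1 4 9];
v = [1 1 2 3];

At  = A.';                                           % columns of A become rows
idx = bsxfun(@plus, v(:), (0:size(A,1)-1)*max(v));   % one ID per element of At
% For this data idx is
%   1 4 7
%   1 4 7
%   2 5 8
%   3 6 9
% Column r of idx holds the IDs for row r of A; equal IDs are summed together.

out = reshape(accumarray(idx(:), At(:)), [], size(A,1)).';
disp(out)
```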
An alternative with accumarray in 2D. Generate a grid with the vector v and then apply accumarray:
A = A.';
v = [1 1 2 3];
[X, Y] = ndgrid(v,1:size(A,2));
Here X and Y look like this:
X =
1 1 1
1 1 1
2 2 2
3 3 3
Y =
1 2 3
1 2 3
1 2 3
1 2 3
Then apply accumarray:
B=accumarray([X(:) Y(:)],A(:)),
SUM = B.'
SUM =
4 2 4
6 5 8
7 4 9
As you can see, [X(:) Y(:)] creates the following array:
ans =
1 1
1 1
2 1
3 1
1 2
1 2
2 2
3 2
1 3
1 3
2 3
3 3
in which the vector v containing the "class" is replicated 3 times, once for each of the 3 columns of the transposed A (the rows of the original A), so that every element of A gets a (class, row) subscript pair.
EDIT:
As pointed out by knedlsepp you can get rid of the transpose to A and B like so:
[X2, Y2] = ndgrid(1:size(A,1),v);
B = accumarray([X2(:) Y2(:)],A(:))
which ends up doing the same thing, provided A here is the original, untransposed matrix. I find it a bit easier to visualize with the transposes, but both give the same result.
How about a one-liner?
result = full(sparse(repmat((1:size(A,1)).',1,size(A,2)), repmat(v,size(A,1),1), A));
Don't optimize prematurely!
The for loop performs fine for your problem:
out = zeros(size(A,1), max(v));
for i = 1:numel(v)
out(:,v(i)) = out(:,v(i)) + A(:,i);
end
BTW: With fine, I mean: fast, fast, fast!
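A quick way to gain confidence in any of these variants is to check them against the plain loop. The sketch below does that for the data from the question; the names outLoop, outAcc and outSp are made up for this illustration.

```matlab
A = [1 3 2 4;
     4 2 5 8;
     6 1 4 9];
v = [1 1 2 3];

% Reference: plain loop over the columns of A
outLoop = zeros(size(A,1), max(v));
for k = 1:numel(v)
    outLoop(:,v(k)) = outLoop(:,v(k)) + A(:,k);
end

% accumarray with (row, class) subscript pairs
[R, Cl] = ndgrid(1:size(A,1), v);          % R: row index, Cl: class of each element
outAcc = accumarray([R(:) Cl(:)], A(:));

% sparse sums entries that share the same (row, class) subscripts
outSp = full(sparse(R(:), Cl(:), A(:)));

disp(isequal(outLoop, outAcc, outSp))      % displays 1 if all three agree
```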

Matlab: How I can make this transformation on the matrix A? (part 2)

N.B: This question is more complex than my previous question: Matlab: How I can make this transformation on the matrix A?
I have a matrix A 4x10000, I want to use it to find another matrix C, based on a predefined vector U.
I'll simplify my problem with a simple example:
from a matrix A
20 4 4 74 20 20 4
36 1 1 11 36 36 1
77 1 1 15 77 77 1
3 4 2 6 7 8 15
and
U=[2 3 4 6 7 8 2&4&15 7&8 4|6].
& : AND
| : OR
I want, first, to find an intermediate entity B:
             2  3  4  6  7  8  2&4&15  7&8  4|6  sum
[20 36 77]   0  1  0  0  1  1  0       1    0    4
[4 1 1]      1  0  1  0  0  0  1       0    1    4
[74 11 15]   0  0  0  1  0  0  0       0    1    2
We put a 1 if the value in the header row and the vector on the left together form a column of the matrix A.
The last column of B is the number of 1s in each row.
In the end I want a matrix C consisting of the vectors on the left of B, but only those whose sum of 1s is greater than or equal to 3.
for my example:
    20  4
C = 36  1
    77  1
This was a complex one indeed and because of the many restrictions and labeling processes involved, it won't be as efficient as the solution to the previous problem. Here's the code to solve the posted problem -
find_labels1 = 2:8; %// Labels to be detected - main block
find_labels2 = {[2 4 15],[7 8],[4 6]}; %// ... side block
A1 = A(1:end-1,:); %// all of A except the last row
A2 = A(end,:); %// last row of A
%// Find unique columns and their labels for all of A except the last row
[unqmat_notinorder,row_ind,inv_labels] = unique(A1.','rows'); %//'
[tmp_sortedval,ordered_ind] = sort(row_ind);
unqcols = unqmat_notinorder(ordered_ind,:);
[tmp_matches,labels] = ismember(inv_labels,ordered_ind);
%// Assign labels to each group
ctl = numel(unique(labels));
labelgrp = arrayfun(@(x) find(labels==x),1:ctl,'un',0);
%// Work for the main comparisons
matches = bsxfun(@eq,A2,find_labels1'); %//'
maincols = zeros(ctl,numel(find_labels1));
for k = 1:ctl
maincols(k,:) = any(matches(:,labelgrp{k}),2);
end
%// Work for the extra comparisons added that made this problem extra-complex
lens = cellfun('length',find_labels2);
lens(end) = 1;
extcols = nan(ctl,numel(find_labels2));
for k = 1:numel(find_labels2)
idx = find(ismember(A2,find_labels2{k}));
extcols(:,k)=arrayfun(@(n) sum(ismember(labelgrp{n},idx))>=lens(k),1:ctl).'; %//'
end
C = unqcols(sum([maincols extcols],2)>=3,:).' %//'# Finally the output
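For readers who want to follow the table-building logic step by step, here is a slower but more explicit loop-based sketch for the example in the question. It is not the code above: the names keys, labels, singles, andSets, orSets and Btab are invented for this illustration, and it assumes a MATLAB version whose unique supports the 'stable' option.

```matlab
A = [20  4  4 74 20 20  4;
     36  1  1 11 36 36  1;
     77  1  1 15 77 77  1;
      3  4  2  6  7  8 15];

keys   = A(1:end-1,:).';                    % one row per column of A (without last row)
labels = A(end,:);                          % last row of A
[cols, ~, grp] = unique(keys, 'rows', 'stable');   % distinct vectors, first-seen order

singles = [2 3 4 6 7 8];                    % plain entries of U
andSets = {[2 4 15], [7 8]};                % entries of U joined by &
orSets  = {[4 6]};                          % entries of U joined by |

nS = numel(singles); nA = numel(andSets); nO = numel(orSets);
Btab = zeros(size(cols,1), nS + nA + nO);
for g = 1:size(cols,1)
    have = labels(grp == g);                % last-row values seen with this vector
    Btab(g, 1:nS) = ismember(singles, have);
    for k = 1:nA                            % AND: every value must occur
        Btab(g, nS + k) = all(ismember(andSets{k}, have));
    end
    for k = 1:nO                            % OR: at least one value must occur
        Btab(g, nS + nA + k) = any(ismember(orSets{k}, have));
    end
end

C = cols(sum(Btab,2) >= 3, :).';            % keep vectors with at least three 1s
disp(C)
```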
I will give you a partial answer; I think you can take it from here. The idea is to concatenate the first 3 rows of A with each element of U replicated as the last column. Once you have that 3D matrix, replicate your original A and compare the rows. Rows that are equal correspond to a 1 in your table.
B=A(1:3,:).';
B1=repmat(B,[1 1 length(U)]);
C=permute(U,[3 1 2]);
D=repmat(C,[size(B1,1),1,1]);
E=[B1 D];
F=repmat(A',[1 1 size(E,3)]);
Now compare F and E, row-wise. If the rows are equal, then you put 1 in your table. For replicating & and |, you can form some kind of indicator vector.
Say,
indU=[1 2 3 4 5 6 7 7 7 8 8 -9 -9];
The same positive value indicates &, the same negative value indicates |. Different values indicate different entries.
I hope you can take it from here.

MATLAB find mean of column in matrix using two different indices

I have a 22007x3 matrix with data in column 3 and two separate indices in columns 1 and 2.
eg.
x =
1 3 4
1 3 5
1 3 5
1 16 4
1 16 3
1 16 4
2 4 1
2 4 3
2 11 2
2 11 3
2 11 2
I need to find the mean of the values in column 3 when the values in column 1 are the same AND the values in column 2 are the same, to end up with something like:
ans =
1 3 4.6667
1 16 3.6667
2 4 2
2 11 2.3333
Please bear in mind that in my data, the number of times the values in column 1 and 2 occur can be different.
Two options I've tried already are the meshgrid/accumarray option, using two distinct unique functions and a 3D array:
[U, ix, iu] = unique(x(:, 1));
[U2,ix2,iu2] = unique(x(:,2));
[c, r, j] = meshgrid((1:size(x(:, 1), 2)), iu, iu2);
totals = accumarray([r(:), c(:), j(:)], x(:), [], @nanmean);
which gives me this:
??? Maximum variable size allowed by the program is exceeded.
Error in ==> meshgrid at 60
xx = xx(ones(ny,1),:,ones(nz,1));
and the loop option,
for i=1:size(x,1)
if x(i,2)== x(i+1,2);
totals(i,:)=accumarray(x(:,1),x(:,3),[],@nanmean);
end
end
which is obviously so very, very wrong, not least because of the x(i+1,2) bit.
I'm also considering creating separate matrices depending on how many times a value in column 1 occurs, but that would be long and inefficient, so I'm loath to go down that road.
Group on the first two columns with unique(...,'rows'), then accumulate only the third column. It is generally best to accumulate only the data that actually needs accumulating and to leave the index columns (the first two) out of it; you can reattach them afterwards from unX:
[unX,~,subs] = unique(x(:,1:2),'rows');
out = [unX accumarray(subs,x(:,3),[],@nanmean)];
out =
1 3 4.6667
1 16 3.6667
2 4 2
2 11 2.3333
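A self-contained sketch of this approach with the sample data from the question follows. It swaps nanmean for the built-in mean, which is fine as long as column 3 has no NaNs, and also returns the group sizes (the name grpN is made up here) since the groups differ in length.

```matlab
x = [1  3 4;
     1  3 5;
     1  3 5;
     1 16 4;
     1 16 3;
     1 16 4;
     2  4 1;
     2  4 3;
     2 11 2;
     2 11 3;
     2 11 2];

[unX, ~, subs] = unique(x(:,1:2), 'rows');      % one group per distinct index pair
grpMean = accumarray(subs, x(:,3), [], @mean);  % mean of column 3 per group
grpN    = accumarray(subs, 1);                  % number of rows per group

out = [unX grpMean grpN];
disp(out)
```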
This is an ideal opportunity to use sparse matrix math.
x = [ 1 2 5;
1 2 7;
2 4 6;
3 6 7;
1 4 8;
2 4 8;
1 1 10]; % for example
SM = sparse(x(:,1),x(:,2), x(:,3));
disp(SM)
Result:
(1,1) 10
(1,2) 12
(1,4) 8
(2,4) 14
(3,6) 7
As you can see, we did the "accumulate same indices into same container" in one fell swoop. Now you need to know how many elements you have:
NE = sparse(x(:,1), x(:,2), ones(size(x(:,1))));
disp(NE);
Result:
(1,1) 1
(1,2) 2
(1,4) 1
(2,4) 2
(3,6) 1
Finally, you divide one by the other to get the mean (only use elements that have a value):
matrixMean = SM;
nz = find(NE>0);
matrixMean(nz) = SM(nz) ./ NE(nz);
If you then disp(matrixMean), you get
(1,1) 10
(1,2) 6
(1,4) 8
(2,4) 7
(3,6) 7
If you want to access the individual elements differently, then after you have computed SM and NE you can do
[i, j, n] = find(NE);                                   % row, column and count of each occupied cell
groupMean = full(SM(sub2ind(size(SM), i, j))) ./ n;     % linear indexing picks the matching sums
disp([i j groupMean]);
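Applied to the sample data from the question, the sparse approach can be sketched as follows. It relies on both index columns being positive integers, and sortrows is only there to list the groups in the same order as the unique-based answer.

```matlab
x = [1  3 4;
     1  3 5;
     1  3 5;
     1 16 4;
     1 16 3;
     1 16 4;
     2  4 1;
     2  4 3;
     2 11 2;
     2 11 3;
     2 11 2];

SM = sparse(x(:,1), x(:,2), x(:,3));              % sums per (col 1, col 2) pair
NE = sparse(x(:,1), x(:,2), ones(size(x,1),1));   % counts per pair

[i, j, n] = find(NE);                             % occupied cells and their counts
s = full(SM(sub2ind(size(SM), i, j)));            % matching sums, element by element
out = sortrows([i j s./n]);                       % [index1 index2 mean]
disp(out)
```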