How do I split a very large matrix into submatrices based on the value in a column? - matlab

I have a 5 x 600,000 matrix. I had the idea of grouping the data, so I want to split this matrix into submatrices based on the values in column 4.
For values between 0 and 500, I want one matrix, for values between 501 and 1000 I want another, and for values between 1001 and 1500 I want another.
How can I do this?
I currently don't have any reliable material. I have seen some examples online, but they only seem to feature two values (i.e. a column containing 1 or 0, with the 1s and the 0s grouped into 2 submatrices).

I think in MATLAB-speak you mean you have an n-by-m matrix where n = 600000 and m = 5; if not, you can change this accordingly.
Is this what you were looking to do?
n = 600000;
m = 5;
thisCol = 4;
values_range = {[0,500];[501,1000];[1001,1500]}; % cell array of [low,high] vectors
myMatrix = zeros(n,m);
myMatrix(:,thisCol) = (1:n)'; % to prove it works
theseSubMatrices = cell(length(values_range),1); % cell array of matrices
for j = 1:length(values_range)
    thisLow = values_range{j}(1);
    thisHigh = values_range{j}(2);
    % keep the rows whose value in thisCol lies in [thisLow, thisHigh]
    theseSubMatrices{j} = myMatrix(myMatrix(:,thisCol)>=thisLow & myMatrix(:,thisCol)<=thisHigh,:);
end
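An alternative sketch computes a group number for every row first and then pulls out each group, which also makes it easy to add more ranges. It assumes the column-4 values are non-negative integers, so that the ranges 0-500, 501-1000 and 1001-1500 leave no gaps. The small data matrix and the names upperBound, grp and subMats are made up for illustration.

```matlab
% Small stand-in for the 600,000-by-5 matrix (illustrative values only)
data = [10  1  2    0  3;
        20  4  5  500  6;
        30  7  8  501  9;
        40 10 11 1500 12;
        50 13 14  999 15];
col = 4;                          % column that drives the grouping
upperBound = [500 1000 1500];     % inclusive upper bound of each group
v = data(:, col);                 % values are assumed to be >= 0
% Number of bounds each value exceeds, plus 1: 0..500 -> 1, 501..1000 -> 2,
% 1001..1500 -> 3 (anything above 1500 would get 4 and is ignored below)
grp = sum(bsxfun(@gt, v, upperBound), 2) + 1;
nGrp = numel(upperBound);
% One submatrix per group, each keeping the original row order
subMats = arrayfun(@(k) data(grp == k, :), 1:nGrp, 'UniformOutput', false);
```

Adding a range only means appending its upper bound to upperBound.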

If you have some data
arr = rand( 6e5, 5 ); % 5 columns / 600,000 rows
arr(:,5) = arr(:,5) .* 1500; % for this example, get column 5 into range [0,1500]
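Then you can use histcounts to "bin" the 5th column according to your edges. Each bin includes its left edge but not its right edge, except the last bin, which includes both, so a value of exactly 500 lands in the second bin here.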
edges = [0, 500, 1000, 1500]; % edges to split column 5 by
[~,~,iSubArr] = histcounts( arr(:,5), edges );
And generate a cell array with one element per subarray:
nSubArr = numel(edges)-1; % number of bins / subarrays
subArrs = arrayfun( @(x) arr( iSubArr == x, : ), 1:nSubArr, 'uni', 0 ); % Get a matrix per bin
Output:
subArrs =
1×3 cell array
{200521×5 double} {199924×5 double} {199555×5 double}
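The arrayfun call above compares iSubArr with every bin number in turn. A minimal sketch of another way to do the split is to sort the rows by bin and cut the sorted matrix with mat2cell. It assumes every bin number is between 1 and nBins (histcounts returns 0 for values outside the edges, so such rows would have to be removed first); the toy arr and binIdx values are made up for illustration.

```matlab
% Toy data: 4 rows, and the bin number of each row (e.g. iSubArr from histcounts)
arr = [7 0.2;
       8 0.9;
       9 0.4;
       10 0.7];
binIdx = [2; 1; 2; 3];
nBins = 3;
% sort is stable, so rows keep their original order within each bin
[sortedBin, order] = sort(binIdx);
% Number of rows in each bin
counts = accumarray(sortedBin, 1, [nBins 1]);
% Cut the re-ordered rows into one block per bin
subArrs = mat2cell(arr(order, :), counts, size(arr, 2));
```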

Related

Count rows of a matrix and give back an array

I would like to know how to count the rows in a matrix in such a way that gives an output for each column. For example:
X=[1 1 1;
5 5 5]
I would like to find a command that, when I input the matrix X, returns [2 2 2], i.e. it counts the number of rows per column.
I have already found numel(X), but it returns a scalar (numel(X) = 6), whereas I need one value per column.
size(X,1) gives you the number of rows in the matrix (a scalar). A matrix has only one number of rows, i.e. each column has the same number of rows.
However, if you still want the number of rows for each column, you can use:
X = [1 1 1;
5 5 5];
nrows = size(X,1);
ncols = size(X,2);
nrowsPerCol = repmat(nrows, [1 ncols]) % [2 2 2]
Every matrix in MATLAB has a height (number of rows) and a width (number of columns).
In other words: each column has the same number of rows.
To get this value, use MATLAB's size function:
[numOfRows, numOfCols] = size(X);
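If you want to "count rows per column" literally, a small sketch is to sum a matrix of ones of the same size down the first dimension. It necessarily returns size(X,1) for every column, which is why the answers above use size directly; the name nrowsPerCol is reused from the first answer.

```matlab
X = [1 1 1;
     5 5 5];
% ones(size(X)) has a 1 for every element; summing along dimension 1
% counts the elements (rows) in each column
nrowsPerCol = sum(ones(size(X)), 1);
% Every entry equals size(X, 1), because all columns have the same height
[numOfRows, numOfCols] = size(X);
```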

Index a vector by a matrix of conditions to obtain multiple selections of the target?

I have a vector T of length n and m other vectors of the same length with 0 or 1 used as condition to select elements of T. The condition vectors are combined into a matrix I of size n x m.
Is there a one-liner to extract a matrix M of values from T such that the i-th column of M contains those elements of T that are selected by the i-th column of I?
Example:
T = (1:10)'
I = mod(T,2) == 0
T(I)'
yields
2 4 6 8 10
However
I = mod(T,2:4) == 0
T(I)'
yields an error in the last statement. I see that the columns might select different numbers of elements, which would result in vectors of different lengths (as in the example). However, even this example, where both columns select 5 elements, doesn't work:
I = zeros(10,2)
I(:,1) = mod(T,2)==0
I(:,2) = mod(T,2)==1
Is there any way to achieve the solution in a one liner?
The easiest way I can think of to do something like this is to take advantage of the element-wise multiplication operator .* with your matrix I. Take this as an example:
% these lines are just setup of your problem
m = 10;
n = 10;
T = [1:m]';
I = randi([0 1], m, n);
% 1 liner to create M
M = repmat(T, 1, n) .* I;
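What this does is expand T to the same size as I using repmat and then multiply the two element-wise using .*. Note that the result keeps a 0 wherever I is 0, so every column of M has the same length as T rather than containing only the selected elements.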
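A minimal sketch of this idea on the question's own example, with bsxfun in place of repmat so that T is not copied explicitly. It assumes T contains no zeros, because the last step recovers each column's selection by dropping the zero entries with nonzeros; the name cols is introduced here for illustration.

```matlab
T = (1:10)';
% 10-by-3 logical condition matrix, as in the question: divisible by 2, 3 and 4
I = bsxfun(@mod, T, 2:4) == 0;
% Multiply T into every column of I; unselected positions become 0
M = bsxfun(@times, T, double(I));
% Recover the selected elements of each column (valid only if T has no zeros)
cols = arrayfun(@(k) nonzeros(M(:, k)), 1:size(M, 2), 'UniformOutput', false);
```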
Here is a one-line solution:
mat2cell(T(nonzeros(bsxfun(@times,I,(1:numel(T)).'))),sum(I))
First, the logical index has to be converted to a numeric index. To do this, we multiply the vector of indices (1:numel(T)).' by each column of I:
idx = bsxfun(@times,I,(1:numel(T)).');
That index still contains zeros, so we keep only the non-zero values, which correspond to the 1s in matrix I:
idx = nonzeros(idx);
Then we extract the (possibly repeated) elements of T:
T2 = T(idx);
Finally, we split T2 into 3 parts, one per column of I. The size of each part equals the sum of the corresponding column of I, and mat2cell is very helpful here:
result = mat2cell(T2,sum(I));
result
ans =
{
[1,1] =
2
4
6
8
10
[2,1] =
3
6
9
[3,1] =
4
8
}
One-line solution using cellfun and mat2cell:
nColumns = size(I,2); nRows = size(T,1); % Take the liberty of a line to write cleaner code
cellfun(@(i)T(i),mat2cell(I,nRows,ones(nColumns,1)),'uni',0)
What is going on:
@(i)T(i) % defines a function handle that takes a logical index and returns the elements of T at those indices
mat2cell(I,nRows,ones(nColumns,1)) % split I so that every column is a cell
'uni',0 % tell cellfun that the function returns non-uniform output
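Because each column can select a different number of elements, the result has to be a cell array. A minimal sketch that skips mat2cell and indexes T with one column of I at a time through arrayfun. It assumes I is logical: a double matrix of 0s and 1s, like the zeros(10,2) example in the question, is treated as numeric indices and fails on the zeros, so it is converted with logical first. The names sel, lens, Idouble and sel2 are illustrative.

```matlab
T = (1:10)';
I = bsxfun(@mod, T, 2:4) == 0;   % logical, one condition per column
% T(I(:, k)) picks the elements selected by column k; lengths differ per column
sel = arrayfun(@(k) T(I(:, k)), 1:size(I, 2), 'UniformOutput', false);
lens = cellfun(@numel, sel);
% The question's double-valued condition matrix, converted to logical per column
Idouble = zeros(10, 2);
Idouble(:, 1) = mod(T, 2) == 0;
Idouble(:, 2) = mod(T, 2) == 1;
sel2 = arrayfun(@(k) T(logical(Idouble(:, k))), 1:2, 'UniformOutput', false);
```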

How to compare columns of a binary matrix and compare elements in matlab?

I have a sentences-by-words matrix, as shown below:
out = [0 1 1 0 1;
       1 1 0 0 1;
       1 0 1 1 0;
       0 0 0 1 0];
I want to process this matrix in a way that tells me that W1 and W2 occur with the same value in sentence 2 and sentence 4 (i.e. 1 1 and 0 0, respectively). The output should be as follows:
output{1,2}= 2 4
output{1,2} says that words 1 and 2 occur with the same values in sentences 2 and 4.
After comparing W1 and W2, the next candidate should be W1 and W3, which occur with the same value in sentences 3 and 4:
output{1,3}= 3 4
and so on, until every word has been compared with every other word and the result saved.
This would be one vectorized approach:
%// Get number of columns in input array for later usage
N = size(out,2);
%// Get indices for pairwise combinations between columns of input array
[idx2,idx1] = find(bsxfun(@gt,[1:N]',[1:N])); %//'
%// Get indices for matches between out(:,idx1) and out(:,idx2). The row indices
%// represent the occurrence values for the final output and the column indices
%// identify the word pair (i.e. the cell) in the final output.
[R,C] = find(out(:,idx1) == out(:,idx2));
%// Form cells off each unique C (these will be final output values)
output_vals = accumarray(C(:),R(:),[],@(x) {x});
%// Set up output cell array
output = cell(N,N);
%// Indices for places in output cell array where occurrence values are to be put
all_idx = sub2ind(size(output),idx1,idx2);
%// Finally store the output values at appropriate indices
output(all_idx(1:max(C))) = output_vals;
You can easily get a logical array of size (number of words)-by-(number of words)-by-(number of sentences) using bsxfun:
coc = bsxfun( @eq, permute( out, [3 2 1]), permute( out, [2 3 1] ) );
In this logical array, coc( wi, wj, si ) is true iff words wi and wj occur in sentence si with the same value.
To get the output cell array from coc you need:
nw = size( out, 2 ); %// number of words
output = cell(nw,nw);
for wi = 1:(nw-1)
    for wj = (wi+1):nw
        output{wi,wj} = find( coc(wi,wj,:) );
        output{wj,wi} = output{wi,wj}; %// you can force it to be symmetric if you want
    end
end
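For a small number of words, a plainer sketch loops over the word pairs produced by nchoosek and compares the two columns directly. It fills only the upper triangle (wi < wj), matching the output{1,2} and output{1,3} layout in the question; the names pairs and p are introduced here.

```matlab
out = [0 1 1 0 1;
       1 1 0 0 1;
       1 0 1 1 0;
       0 0 0 1 0];
nw = size(out, 2);                 % number of words
pairs = nchoosek(1:nw, 2);         % every (wi, wj) with wi < wj, one per row
output = cell(nw, nw);
for p = 1:size(pairs, 1)
    wi = pairs(p, 1);
    wj = pairs(p, 2);
    % sentences where both words take the same value (both 1 or both 0)
    output{wi, wj} = find(out(:, wi) == out(:, wj)).';
end
```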

Get even/odd indices of a matrix - MATLAB

I have the following problem:
I have a given matrix of, let's say, size 4x4.
How can I get the indices of the following combinations:
row odd and column odd
row odd and column even
row even and column odd
row even and column even
For example if I have the matrix:
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
'row odd and column odd' would be the indices of 1, 3, 9, 11...
'row odd and column even' would be the indices of 2, 4, 10, 12...
'row even and column odd' would be the indices of 5, 7, 13, 15...
and 'row even and column even' would be indices of 6, 8, 14, 16...
Also, is it possible to combine those operations so e.g. I get the indices for 'row odd and column odd' and 'row even and column even'?
Thank you!
It's pretty easy to do with indexing:
Odd rows and odd columns
B = A(1:2:end, 1:2:end);
Odd rows and even columns
B = A(1:2:end, 2:2:end);
Even rows and odd columns
B = A(2:2:end, 1:2:end);
Even rows and even columns
B = A(2:2:end, 2:2:end);
The above assumes that you want the actual matrix values themselves. It's a bit confusing because your matrix elements look like index values themselves (although MATLAB's linear indices run down the columns, so they do not match these values). If you want to determine the actual column-major indices to access the matrix, you can generate a vector from 1 to N, where N is the total number of elements in your matrix, then reshape this vector into the size of your matrix. After that, use the same logic as above to get the actual linear indices:
N = numel(A);
B = reshape(1:N, size(A,1), size(A,2));
ind = B(1:2:end, 1:2:end); %// For odd rows, odd columns
%// Repeat for the other ones...
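The reshape trick is equivalent to converting row and column subscripts with sub2ind. A minimal sketch that checks this on a 4x4 matrix; magic(4) is only a stand-in for A, and ind2, r and c are illustrative names.

```matlab
A = magic(4);                        % stand-in for your matrix
N = numel(A);
B = reshape(1:N, size(A,1), size(A,2));
ind = B(1:2:end, 1:2:end);           % odd rows, odd columns (reshape trick)
% Same indices from explicit row/column subscripts
[r, c] = ndgrid(1:2:size(A,1), 1:2:size(A,2));
ind2 = sub2ind(size(A), r, c);
```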
Now, given your comment, you want to create a new matrix that stores only these extracted values while making all of the other elements zero. To do this, simply pre-allocate a matrix of zeros, then copy the values you want into the new matrix using the computed indices. In other words:
N = numel(A);
B = reshape(1:N, size(A,1), size(A,2));
ind = B(1:2:end, 1:2:end); %// For odd rows, odd columns - Change to suit your tastes
out = zeros(size(A));
out(ind(:)) = A(ind(:));
If you want to combine indices, such as odd row/odd column and even row/even column, just compute two sets of indices, concatenate them into a single vector and use the same syntax as before:
N = numel(A);
B = reshape(1:N, size(A,1), size(A,2));
ind = B(1:2:end, 1:2:end); %// For odd rows, odd columns
ind2 = B(2:2:end, 2:2:end); %// For even rows, even columns
ind = [ind(:); ind2(:)];
out = zeros(size(A));
out(ind) = A(ind);
Code
N = size(A,1); %// Get size of input matrix A
case1_ind = bsxfun(@plus,[1:2:N]',(0:N/2-1)*2*N)
case2_ind = case1_ind + N
case3_ind = case1_ind + 1
case4_ind = case3_ind + N
Note: these outputs are linear indices, and the code assumes that A is square with an even number of rows. To get the actual values, index A with them, e.g. A(case1_ind).
To combine the indices for case 1 and case 4, just concatenate:
case14comb_ind = [case1_ind ; case4_ind]
Edit:
%// To copy onto some other matrix of the same size as A, do this for case 1
new_matrix = zeros(size(A))
new_matrix(case1_ind(:)) = A(case1_ind(:))
Repeat this for the other cases too.
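The bsxfun version above relies on A being square with an even size. A minimal sketch that works for any size builds logical masks from each element's row and column number with ndgrid; the 4x4 matrix from the question is used as A, and the mask names are illustrative.

```matlab
% The 4x4 example from the question (values 1..16 running along the rows)
A = reshape(1:16, 4, 4).';
% Row and column number of every element of A
[r, c] = ndgrid(1:size(A, 1), 1:size(A, 2));
oddOdd   = find(mod(r, 2) == 1 & mod(c, 2) == 1);   % linear indices, column-major
oddEven  = find(mod(r, 2) == 1 & mod(c, 2) == 0);
evenOdd  = find(mod(r, 2) == 0 & mod(c, 2) == 1);
evenEven = find(mod(r, 2) == 0 & mod(c, 2) == 0);
% Odd/odd together with even/even: row and column have the same parity
both = find(mod(r, 2) == mod(c, 2));
% Copy only the selected elements into an otherwise zero matrix
out = zeros(size(A));
out(both) = A(both);
```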

Sum every n rows of matrix

Is there any way I can sum up the column values for each group of three rows in a matrix?
I can sum three rows up in a manual way.
For example
% matrix is the one I want to store the new data in.
% data is the original dataset.
matrix(1,1:end) = sum(data(1:3, 1:end))
matrix(2,1:end) = sum(data(4:6, 1:end))
...
But if the dataset is huge, this wouldn't work.
Is there any way to do this automatically without loops?
Here are four other ways:
The obligatory for-loop:
% for-loop over each three rows
matrix = zeros(size(data,1)/3, size(data,2));
counter = 1;
for i = 1:3:size(data,1)
    matrix(counter,:) = sum(data(i:i+3-1,:));
    counter = counter + 1;
end
Using mat2cell for tiling:
% divide each three rows into a cell
matrix = mat2cell(data, ones(1,size(data,1)/3)*3);
% compute the sum of rows in each cell
matrix = cell2mat(cellfun(@sum, matrix, 'UniformOutput',false));
Using the third dimension:
% put each three row into a separate 3rd dimension slice
matrix = permute(reshape(data', [], 3, size(data,1)/3), [2 1 3]);
% sum rows, and put back together
matrix = permute(sum(matrix), [3 2 1]);
Using accumarray:
% build array of group indices [1,1,1,2,2,2,3,3,3,...]
idx = floor(((1:size(data,1))' - 1)/3) + 1;
% use it to accumulate rows (applied to each column separately)
matrix = cell2mat(arrayfun(@(i)accumarray(idx,data(:,i)), 1:size(data,2), ...
'UniformOutput',false));
Of course, all the solutions so far assume that the number of rows is evenly divisible by 3.
This one-liner reshapes the data so that all the values needed for a particular output cell are in one column, does the sum, and then reshapes the result back to the expected shape.
reshape(sum(reshape(data, 3, [])), [], size(data, 2))
The naked 3 could be changed if you want to sum a different number of rows together. It's on you to make sure the number of rows in each group divides evenly.
Slice the matrix into three pieces and add them together:
matrix = data(1:3:end, :) + data(2:3:end, :) + data(3:3:end, :);
This will give an error if size(data,1) is not a multiple of three, since the three pieces wouldn't be the same size. If appropriate to your data, you might work around that by truncating data, or appending some zeros to the end.
You could also do something fancy with reshape and 3D arrays. But I would prefer the above (unless you need to replace 3 with a variable...)
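When the group size is a variable, or the number of rows is not a multiple of it, a sketch based on accumarray handles both: give each element the group number of its row and its column number, then accumulate. The last group simply contains the leftover rows. The 7-by-2 data and the names n, g, G, C and sums are made up for illustration.

```matlab
data = reshape(1:14, 7, 2);             % 7 rows: not a multiple of 3
n = 3;                                   % rows per group
g = ceil((1:size(data, 1)).' / n);      % group of each row: 1 1 1 2 2 2 3
[G, C] = ndgrid(g, 1:size(data, 2));    % group and column of every element
% Sum the elements that share a (group, column) pair
sums = accumarray([G(:) C(:)], data(:));
```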
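Prashant answered nicely above, but I have a simple amendment:
fl = filterLength; % placeholder: number of consecutive values to average
A = yourVector;    % placeholder: a vector with mod(numel(A),fl) == 0
sum(reshape(A,fl,[]),1).'/fl;
The explicit ",1" makes the line work even when fl == 1 (i.e. returning the original values): without it, reshape(A,1,[]) produces a row vector, and sum would add up the whole row instead of each column.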
I discovered this while running it in a for loop like so:
% ... read A ...
L = size(A,1); % number of samples in A
% Plot data
hold on;
averageFactors = [1 3 10 30 100 300 1000];
colors = hsv(length(averageFactors));
clear legendTxt;
for i = 1:length(averageFactors)
    % ------ FILTERING ----------
    clear Atrunc;
    clear ttrunc;
    clear B;
    fl = averageFactors(i); % filter length
    % drop the trailing samples that do not fill a whole block of fl
    Atrunc = A(1:L-mod(L,fl),:);
    ttrunc = t(1:L-mod(L,fl),:);
    B = sum(reshape(Atrunc,fl,[]),1).'/fl;
    tB = sum(reshape(ttrunc,fl,[]),1).'/fl;
    length(B)
    plot(tB,B,'color',colors(i,:) )
end
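To see why the explicit ,1 matters, a minimal sketch with fl = 1: reshape(A,1,[]) returns a single row, so sum without a dimension argument adds along that row. The toy vector A and the names withDim, withoutDim and blockMeans are illustrative.

```matlab
A = (1:6).';
fl = 1;
% With the dimension given, each column of the 1-by-6 reshape is summed on its own
withDim = sum(reshape(A, fl, []), 1).' / fl;
% Without it, sum sees a row vector and adds all of its elements
withoutDim = sum(reshape(A, fl, [])).' / fl;
% For fl = 3, each block of three consecutive values is averaged
fl = 3;
blockMeans = sum(reshape(A, fl, []), 1).' / fl;
```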