Get submatrix made from random rows and columns of large matrix - MATLAB

I would like to extract (square) sub-matrices from a matrix, randomly with a size between some min and max values.
I would also like to retain the row and column indices which were selected.
Are there any built-in functions for this purpose? I would appreciate any general algorithm to achieve the desired results.
Just for an example without considering the square matrix constraint:
    | 1 2 3 4 5 6 7 8   <- column indices
  1 | 4 3 1 4 0 1 0 1
  2 | 2 0 1 5 6 3 2 0
  3 | 5 5 0 6 7 8 9 0
  4 | 2 3 5 6 7 9 0 1
  ^ row indices
Output sample:
    | 1 4 5 7 8   <- column indices
  1 | 4 4 0 0 1
  2 | 2 5 6 2 0
  4 | 2 6 7 0 1
  ^ row indices
Edit: I would like to have a function like this
matrixsample(numberOfSamples, minSize, maxSize, satisfyingFunction)
The samples should vary in size between minSize and maxSize.
If numberOfSamples = 10, minSize = 2 and maxSize = 6, then the output should be randomly selected rows and columns, like:
sampleMatrix1 (2x2), sampleMatrix2 (3x3), sampleMatrix3 (5x5), ..., sampleMatrix7 (6x6), ..., sampleMatrix10 (4x4)
satisfyingFunction could test any attribute of the matrix, like being non-singular, rank > x, etc.

In MATLAB, you will find the randperm function useful for selecting random rows/columns.
To get a randomly sized sub-matrix, use randi([minVal, maxVal]) to get a random integer between minVal and maxVal.
Getting a submatrix from a random (ordered) combination of rows and columns:
M = randi([0,9],4,8); % 4x8 matrix of random 1 digit integers, your matrix here!
nRows = randi([2, 4]); % The number of rows to extract, random between 2 and 4
nCols = randi([5, 7]); % The number of columns to extract, random between 5 and 7
idxRows = sort(randperm(size(M,1), nRows)); % Random indices of rows
idxCols = sort(randperm(size(M,2), nCols)); % Random indices of columns
output = M(idxRows, idxCols); % Select the sub-matrix
If you want to make the sub-matrix square then simply use the same value for nRows and nCols.
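A minimal sketch of the square case, assuming an example 6x8 matrix M (substitute your own). A single random side length n is drawn once and used for both rows and columns; capping it at min(size(M)) keeps randperm from being asked for more indices than exist:

```matlab
% Example data (assumption): replace with your own matrix
M = randi([0, 9], 6, 8);
n = randi([2, min(size(M))]);              % random side length, at most the smaller dimension
idxRows = sort(randperm(size(M, 1), n));   % n distinct rows, in increasing order
idxCols = sort(randperm(size(M, 2), n));   % n distinct columns, in increasing order
S = M(idxRows, idxCols);                   % the n-by-n submatrix
```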
To show that this method works with your example input/output values:
M = [4 3 1 4 0 1 0 1; 2 0 1 5 6 3 2 0; 5 5 0 6 7 8 9 0; 2 3 5 6 7 9 0 1];
idxRows = [1 2 4]; idxCols = [1 4 5 7 8];
output = M(idxRows, idxCols)
% >> 4 4 0 0 1
% 2 5 6 2 0
% 2 6 7 0 1
Edit in response to extended question scope:
You can package the above into a short function that returns the row and column indices as well as the submatrix.
function output = getsubmatrix(M, nRows, nCols)
% Get a submatrix of random rows/columns
% OUTPUT is a cell array {submatrix, idxRows, idxCols}
idxRows = sort(randperm(size(M,1), nRows));
idxCols = sort(randperm(size(M,2), nCols));
submatrix = M(idxRows, idxCols);
output = {submatrix, idxRows, idxCols};
end
Then you can use this in some sampling function as you described:
function samples = matrixsample(matrix, numberOfSamples, minSize, maxSize, satisfyingFunction)
% output SAMPLES which contains all sample matrices, with row & column indices w.r.t MATRIX
% (maxSize must not exceed min(size(MATRIX)), otherwise randperm throws an error)
samples = cell(numberOfSamples,1);
% maximum iterations trying to satisfy SATISFYINGFUNCTION
maxiters = 100;
for ii = 1:numberOfSamples
    iters = 0; submatrixfound = false; % reset loop exiting conditions
    nRows = randi([minSize, maxSize]); % get random submatrix size
    nCols = nRows; % Square matrix
    while iters < maxiters && ~submatrixfound
        % Get random submatrix, and the indices
        submatrix = getsubmatrix(matrix, nRows, nCols);
        % satisfyingFunction MUST RETURN A LOGICAL SCALAR
        if satisfyingFunction(submatrix{1})
            samples{ii} = submatrix; % If satisfied, assign to output
            submatrixfound = true;   % ... and move on!
        end
        iters = iters + 1;
    end
    % samples{ii} stays empty if nothing satisfied the condition within maxiters tries
end
end
Test example:
s = matrixsample(magic(10), 5, 2, 8, @(M)max(M(:)) < 90)
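The same sampling loop can also be written inline as a script, without the two helper functions. This is a sketch under assumptions: the data is a hypothetical rand(10) matrix, isNonSingular is a name introduced here for one of the example conditions from the question (full rank), and break takes the place of the submatrixfound flag:

```matlab
matrix = rand(10);                                % example data (assumption)
numberOfSamples = 5; minSize = 2; maxSize = 8;
maxiters = 100;                                   % attempts per sample before giving up
isNonSingular = @(S) rank(S) == size(S, 1);       % hypothetical satisfyingFunction
samples = cell(numberOfSamples, 1);
for ii = 1:numberOfSamples
    n = randi([minSize, maxSize]);                % random side length for this sample
    for iters = 1:maxiters
        idxRows = sort(randperm(size(matrix, 1), n));
        idxCols = sort(randperm(size(matrix, 2), n));
        S = matrix(idxRows, idxCols);
        if isNonSingular(S)
            samples{ii} = {S, idxRows, idxCols};  % same layout as getsubmatrix output
            break                                 % found one, go to the next sample
        end
    end
end
```

Each non-empty samples{ii} holds {submatrix, idxRows, idxCols}, so for example samples{1}{2} gives the row indices of the first sample.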

If the data is given as a table, you can do it with randsample:
minRowVal = 3;
maxRowVal = 4;
minColVal = 2;
maxColVal = 4;
kRow = randi([minRowVal maxRowVal]) ;
kCol = randi([minColVal maxColVal]);
Then choose kRow rows and kCol columns at random with randsample and select that data from the table.
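A minimal sketch of that last step, assuming a small example table T built with array2table (replace it with your own table). randsample(n, k) draws k distinct integers from 1:n and is part of the Statistics and Machine Learning Toolbox; randperm(n, k) does the same without the toolbox:

```matlab
T = array2table(randi([0, 9], 5, 6));        % hypothetical example table
minRowVal = 3; maxRowVal = 4;
minColVal = 2; maxColVal = 4;
kRow = randi([minRowVal maxRowVal]);
kCol = randi([minColVal maxColVal]);
rowIdx = sort(randsample(height(T), kRow));  % kRow distinct row numbers
colIdx = sort(randsample(width(T), kCol));   % kCol distinct variable (column) numbers
subT = T(rowIdx, colIdx);                    % sub-table, keeps the variable names
```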


How can I calculate the relative frequency of a row in a data set using Matlab?

I am new to Matlab and I have a basic question.
I have this data set:
1 2 3
4 5 7
5 2 7
1 2 3
6 5 3
I am trying to calculate relative frequencies from the dataset above,
specifically the relative frequency of the row x=1, y=2, z=3.
My code is:
data = load('datasetReduced.txt')
X = data(:, 1)
Y = data(:, 2)
Z = data(:, 3)
f = 0;
for i=1:5
    if X == 1 & Y == 2 & Z == 3
        s = 1;
    else
        s = 0;
    end
    f = f + s;
end
r = f/5
It gives me a result of 0.
How can the code be corrected?
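Your issue is that inside the loop you compare the whole columns (X == 1) rather than the current elements (X(i) == 1). An if statement on a vector is true only when every element is true, so s is never 1. Comparing floating-point numbers with == can also be fragile because of rounding errors, although small integers read from a text file compare exactly.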
A faster way to do this would be to use ismember with the 'rows' option, which returns a logical array that you can sum to get the number of matching rows and then divide by the total number of rows.
tf = ismember(data, [1 2 3], 'rows');
relFreq = sum(tf) / numel(tf);
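A side-by-side sketch of a corrected loop and the ismember version, assuming the data is typed in directly rather than loaded from datasetReduced.txt. The loop indexes the current row (data(i, 1) instead of the whole column X), which is what makes the if statement work:

```matlab
% Dataset from the question, typed in instead of loaded from a file
data = [1 2 3; 4 5 7; 5 2 7; 1 2 3; 6 5 3];

% Corrected loop: compare the i-th elements, not the whole columns
f = 0;
for i = 1:size(data, 1)
    if data(i, 1) == 1 && data(i, 2) == 2 && data(i, 3) == 3
        f = f + 1;
    end
end
rLoop = f / size(data, 1);

% Vectorized version: logical vector of matching rows, then its share
tf = ismember(data, [1 2 3], 'rows');
relFreq = sum(tf) / numel(tf);
```

Two of the five rows equal [1 2 3], so both rLoop and relFreq come out as 0.4.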
I think you want to count the frequency of each value, so try this:
data = [1 2 3
4 5 7
5 2 7
1 2 3
6 5 3];
[counts,centers] = hist(data , unique(data))
Here centers contains the unique values and counts contains how often each value occurs in each column of data. The result should be as follows:
counts =
2 0 0
0 3 0
0 0 3
1 0 0
1 2 0
1 0 0
0 0 2
centers =
1 2 3 4 5 6 7
This means you have 7 unique values, from 1 to 7; for example, there are two 1s in the first column and none in the second and third columns, and so on.
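Dividing those counts by the number of rows turns them into relative frequencies per column. A sketch, reusing the same data and the same hist call:

```matlab
data = [1 2 3; 4 5 7; 5 2 7; 1 2 3; 6 5 3];
[counts, centers] = hist(data, unique(data));  % one column of counts per column of data
relFreq = counts / size(data, 1);              % each column of relFreq sums to 1
```

For example, relFreq(1,1) is 2/5 = 0.4, the share of rows whose first column is 1.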

How to find rows and columns in MATLAB

I have a matrix A:
A = [1 2 8 8 1
4 6 8 1 1
5 3 1 1 8];
and a vector B:
B=[2 3 1 8 8];
The question is how to find the row and column in A of each element of B, searching row by row.
For example, the first element of B is 2, so I want to find the value 2 in A and get its row and column, then repeat this up to the fifth element of B. If a position has already been used, the next occurrence should be taken instead (e.g. elements 4 and 5 of B have the same value, 8).
The result should be:
rows = 1 3 1 1 1
columns = 2 2 1 3 4
You can use find and ind2sub to achieve this,
but you have to transpose A first (so that linear indexing runs along the rows of A):
A = [1 2 8 8 1
     4 6 8 1 1
     5 3 1 1 8];
B = [2 3 1 8 8];
TMP = A.';   % transpose so linear indexing runs along the rows of A
for i = 1:length(B)
    indx = find(TMP == B(i), 1, 'first'); % first unused occurrence of B(i)
    if ~isempty(indx)                     % if B(i) is (still) present in A
        [column(i), row(i)] = ind2sub(size(TMP), indx); % store its row and column
        TMP(indx) = nan;                  % mark that element as used
    end
end
column
row
column =
2 2 1 3 4
row =
1 3 1 1 1
As Usama suggested in one of the comments,
you can preallocate memory by using:
row = zeros(1,sum(ismember(B,A)))
column= zeros(1,sum(ismember(B,A)))
The above code works even if some members of B are not present in A.
Use find. It can return either linear indices or row/column indices.
Using linear indices, a solution could be:
idx = zeros(size(B));
for i = 1:numel(B)
    % Find all indexes
    tmpIdx = find(A == B(i));
    % Remove those already used
    tmpIdx = setdiff(tmpIdx, idx);
    % Get the first new unique
    idx(i) = tmpIdx(1);
end
% Convert index to row and col
[rows, cols] = ind2sub(size(A),idx)
rows = 1 3 1 1 2
cols = 2 2 1 3 3
Note that because linear indexing goes down column by column, the result here differs from the one in your example (although the indices are still correct):
rows = 1 3 1 1 1
columns= 2 2 1 3 4
To get that result, you can transpose A (A.') and swap the rows and cols returned by ind2sub.
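Following that suggestion, a sketch of the same setdiff loop run on A.' (so that linear indexing walks along the rows of A), with the two ind2sub outputs swapped. Like the loop above, it assumes every element of B still has an unused match in A:

```matlab
A = [1 2 8 8 1
     4 6 8 1 1
     5 3 1 1 8];
B = [2 3 1 8 8];
At = A.';                                     % linear indexing now runs along rows of A
idx = zeros(size(B));
for i = 1:numel(B)
    tmpIdx = setdiff(find(At == B(i)), idx);  % occurrences not used yet
    idx(i) = tmpIdx(1);                       % take the first one
end
[cols, rows] = ind2sub(size(At), idx);        % outputs swapped because of the transpose
```

This gives rows = 1 3 1 1 1 and cols = 2 2 1 3 4, the result from the question.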
Here is one solution using a for loop; I tried to minimize the number of iterations and the computational cost. If a value of B has no corresponding value in A, the returned row/column index is NaN.
[Bu,~,ord] = unique(B,'stable');
% Positions (column, row) of each distinct value, searching A row by row via A'
[col,row] = arrayfun(@(x) find(A'==x),Bu,'UniformOutput',0);
% For each value in vector B we take the first "not already used" corresponding position in A.
for i = 1:length(B)
    if ~isempty(row{ord(i)})
        r(i) = row{ord(i)}(1);
        row{ord(i)}(1) = [];
        c(i) = col{ord(i)}(1);
        col{ord(i)}(1) = [];
    else
        r(i) = NaN;
        c(i) = NaN;
    end
end
c = [2 2 1 3 4]
r = [1 3 1 1 1]

Position of integers in vector

I think the question is pretty basic, but it has kept me busy for some time.
Let's assume we have a vector containing 4 integers, randomly repeated, like:
v = [ 1 3 3 3 4 2 1 2 3 4 3 2 1 4 3 3 4 2 2]
I am searching for the vector of all positions of each integer, e.g. for 1 it should be a vector like:
position_one = [1 7 13]
Since I want to search every row of a 100x10000 matrix, I was not able to work with linear indices.
Thanks in advance!
Rows and columns
Since your output for every integer changes, a cell array will fit the whole task. For the whole matrix, you can do something like:
A = randi(4,10,30); % some data
Row = repmat((1:size(A,1)).',1,size(A,2)); % index of the row
Col = repmat((1:size(A,2)),size(A,1),1); % index of the column
pos = @(n) [Row(A==n) Col(A==n)]; % Anonymous function to find the indices of 'n'
then for every n you can write:
>> pos(3)
ans =
1 1
2 1
5 1
6 1
9 1
8 2
3 3
. .
. .
. .
where the first column is the row, and the second is the column for every instance of n in A.
And for all n's you can use arrayfun:
positions = arrayfun(pos,1:max(A(:)),'UniformOutput',false) % a loop that goes over all n's
or a simple for loop (faster):
positions = cell(1,max(A(:)));
for n = 1:max(A(:))
    positions(n) = {pos(n)};
end
The output in both cases would be a cell array:
positions =
[70x2 double] [78x2 double] [76x2 double] [76x2 double]
and for every n you can write positions{n}, to get for example:
>> positions{1}
ans =
10 1
2 3
5 3
3 4
5 4
1 5
4 5
. .
. .
. .
Only rows
If all you want is the column index for a given row and n, you can write this:
A = randi(4,10,30);
row_pos = @(k,n) A(k,:)==n;
positions = false(size(A,1),max(A(:)),size(A,2));
for n = 1:max(A(:))
    positions(:,n,:) = row_pos(1:size(A,1),n);
end
Now positions is a logical 3-D array in which every row corresponds to a row of A, every column corresponds to a value of n, and the third dimension is the presence vector for that combination of row and n. This way, we can define R to be the column index:
R = 1:size(A,2);
and then find the relevant positions for a given row and n. For instance, the column indices of n=3 in row 9 are:
>> R(positions(9,3,:))
ans =
2 6 18 19 23 24 26 27
This is equivalent to calling find(A(9,:)==3), but if you need to do this many times, finding all indices once and storing them in positions (which is logical, so it is not very big) would be faster.
Find linear indexes in a matrix: I = find(A == 1).
Find two dimensional indexes in matrix A: [row, col] = find(A == 1).
% Create a sample matrix with some elements equal to one:
A = zeros(5, 4);
A([2 10 14]) = 1
A =
0 0 0 0
1 0 0 0
0 0 0 0
0 0 1 0
0 1 0 0
Find ones as linear indexes in A:
find(A == 1)
ans =
     2
    10
    14
% This is the same as reshaping A into a vector and finding the ones in that vector:
B = A(:);
find(B == 1);
B' =
0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
Find two dimensional indexes:
[row, col] = find(A == 1)
row =
     2
     5
     4
col =
     1
     2
     3
You can do that with accumarray using an anonymous function as follows:
v = [ 1 3 3 3 4 2 1 2 3 4 3 2 1 4 3 3 4 2 2];
positions = accumarray(v(:), 1:numel(v), [], @(x) {sort(x.')});
this gives
positions{1} =
1 7 13
positions{2} =
6 8 12 18 19
positions{3} =
2 3 4 9 11 15 16
positions{4} =
5 10 14 17
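Since the question asks about every row of a large matrix, the same lookup can be applied row by row. A simple sketch using a small hypothetical matrix A and assuming the values are the integers 1 to nVals; positions{k,n} holds the column indices of n in row k:

```matlab
A = [1 3 3 4 2 1;
     2 2 1 4 4 3];                          % small hypothetical example
nVals = max(A(:));                          % values assumed to be 1..nVals
positions = cell(size(A, 1), nVals);
for k = 1:size(A, 1)
    for n = 1:nVals
        positions{k, n} = find(A(k, :) == n);   % column indices of n in row k
    end
end
```

positions{k,n} is empty when n does not occur in row k.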

Average 3rd column when 1st and 2nd columns have the same numbers

Just to keep it simple, assume that I have a 10x3 matrix in MATLAB. The numbers in the first two columns of each row represent the x and y position, and the number in the 3rd column is the corresponding value. For instance, [1 4 12] shows that the value of the function at x=1 and y=4 is 12. The same x, y pair also appears in different rows, and I want to average the values with the same x, y and replace all of those rows with a single averaged one.
For example :
A = [1 4 12
1 4 14
1 4 10
1 5 5
1 5 7];
I want to have
B = [1 4 12
1 5 6]
I really appreciate your help
Like this?
A = [1 4 12;1 4 14;1 4 10; 1 5 5;1 5 7];
[x,y] = consolidator(A(:,1:2),A(:,3),@mean);
B = [x,y]
B =
1 4 12
1 5 6
Consolidator is on the File Exchange.
Using built-in functions:
sparsemean = accumarray(A(:,1:2), A(:,3), [], @mean, 0, true);
[i,j,v] = find(sparsemean); % note: find skips any group whose mean is exactly 0
B = [i j v];
A = [1 4 12;1 4 14;1 4 10; 1 5 5;1 5 7]; %your example data
B = unique(A(:, 1:2), 'rows'); %find the unique xy pairs
C = nan(size(B, 1), 1);
% calculate means
for ii = 1:size(B, 1)
    C(ii) = mean(A(A(:, 1) == B(ii, 1) & A(:, 2) == B(ii, 2), 3));
end
C =
    12
     6
The step inside the for loop uses logical indexing to find the mean of rows that match the current xy pair in the loop.
Use unique to get the unique rows and use the returned indexing array to find the ones that should be averaged and ask accumarray to do the averaging part:
[C,~,J]=unique(A(:,1:2), 'rows');
B=[C, accumarray(J,A(:,3),[],@mean)];
For your example:
>> [C,~,J]=unique(A(:,1:2), 'rows')
C =
1 4
1 5
J =
1
1
1
2
2
C contains the unique rows, and J shows which row of C each row of the original matrix corresponds to. Then
>> accumarray(J,A(:,3),[],@mean)
ans =
12
6
returns the desired averages and
>> B=[C, accumarray(J,A(:,3),[],@mean)]
B =
1 4 12
1 5 6
is the answer.
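As a quick check of the unique + accumarray approach, here is a sketch with a hypothetical input in which the x, y pairs are interleaved and not sorted. unique returns the pairs sorted, and J routes each value to its group regardless of the original row order:

```matlab
A = [2 1 3; 1 4 12; 2 1 5; 1 4 14; 1 5 5; 1 4 10; 1 5 7];  % hypothetical unsorted data
[C, ~, J] = unique(A(:, 1:2), 'rows');                     % sorted unique xy pairs
B = [C, accumarray(J, A(:, 3), [], @mean)];                % one averaged row per pair
```

Here B is [1 4 12; 1 5 6; 2 1 4].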

Randomly replace percentage of elements in matrix per existing values

Is there a sensible way to replace x% of each value in a matrix/vector with a new value, with the element(s) to be changed selected randomly? That is, in A, if I wanted to change 20% of the values (1 element per existing value) to the value 5, how do I make sure that each of the 5 elements per existing value in A has an equal probability of changing to the new value (e.g. 5)? I would appreciate some guidance on a method to complete the task described above.
Thank you kindly.
% Example Matrix
% M = 5;
% N = 5;
% A = zeros(M, N);
A = [0 0 0 0 0;
1 1 1 1 1;
2 2 2 2 2;
3 3 3 3 3;
4 4 4 4 4];
% Example Matrix with 20% of elements per value replaced with the value '5'
A = [0 0 5 0 0;
1 5 1 1 1;
2 5 2 2 2;
3 3 3 3 5;
4 4 5 4 4];
Try using logical arrays and randomly generated numbers.
Using information from here and here, I was able to achieve my objective. The code below replaces x% of each value in the matrix with a new value and then randomizes its location among the elements with that value.
M = 5;
N = 5;
A = zeros(M, N);
PC = 20; % percent to change
nCells = round(100/PC); % # of cells to replace with new value
A = [0 0 0 0 0;
1 1 1 1 1;
2 2 2 2 2;
3 3 3 3 3;
4 4 4 4 4];
A2 = A+1; % Pad the cell values for calculations (bc of zero)
newvalue = 6;
a=hist(A2(:),5);% determine qty of each value
for i=1:5
    % find 1st instance of each value and convert to newvalue
    A2(find(A2 == i, 1)) = newvalue;
end
out = A2-1; % remove padding (newvalue 6 becomes 5)
[~,idx] = sort(rand(M,N),2); % random column order within each row
idx = (idx-1)*M + ndgrid(1:M,1:N); % convert column indices into linear indices
A = out;
A(:) = A(idx); % shuffle each row so the new value lands in a random position
% (this relies on each value filling exactly one row of A)
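The code above relies on each value filling exactly one row. A more general sketch (not the poster's method; PC and newvalue are assumed inputs) picks the elements to change directly: for each distinct value it finds that value's linear indices and replaces a uniformly random subset of them, chosen with randperm:

```matlab
A = [0 0 0 0 0;
     1 1 1 1 1;
     2 2 2 2 2;
     3 3 3 3 3;
     4 4 4 4 4];
PC = 20;                                       % percent of each value to change (assumption)
newvalue = 5;                                  % replacement value (assumption)
B = A;                                         % work on a copy so lookups use the original A
vals = unique(A);
for k = 1:numel(vals)
    loc = find(A == vals(k));                  % linear indices holding this value
    nChange = round(numel(loc) * PC / 100);    % how many of them to replace
    pick = loc(randperm(numel(loc), nChange)); % uniformly random subset of those positions
    B(pick) = newvalue;
end
```

With PC = 20 and five copies of each value, exactly one element per value is replaced, and each of the five has probability 1/5 of being chosen.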