match IDs + find a number within a matrix in Matlab - matlab

I am facing a problem in matching elements in 2 matrices. The first element can be matched using ismember but the second element should be within a range. Please see the example below:
% col1 is integerID, col2 is a number. -->col1 is Countrycode, col2 is date
bigmat = [ 600 2
600 4
800 1
800 5
900 1] ;
% col1 is integerID, col2 is VALUE, col2 is a range -->value is Exchange rate
datamat = {...
50 0.1 [2 3 4 5 6] % 2:6
50 0.2 [9 10 11] % 9:11
600 0.01 [1 2 3 4] % 1:4
600 0.2 [8 9 10] % 8:10
800 0.12 [1] % 1:1
800 0.13 [3 4] % 3:4
900 0.15 [1 2] } ; % 1:2
I need the answer as:
ansmat = [ 600 2 0.01
600 4 0.01
800 1 0.12
800 5 nan % even deleting this row is fine
930 1 0.15 ] ;
For simplicity:
All intIDs from matrix_1 exist in matrix_2.
The numbers in range are dates! Within a range, these numbers are consecutive: [1 2...5]
For any ID, dates in the next row are not continuous. Eg, you can see [1 2 3 4] and then [8 9 10] in next row.
bigmat is a huge matrix! 300,000-500,000 rows and so a vectorized code would be appreciated. datamat is roughly 5000 rows or less. You can convert the cell to matrix. For each row, I have the minimum and maximum. The 3 column is minimum:maximum. Thanks!

Here is one possible implementation:
%# data matrices
q = [
600 2
600 4
800 1
800 5
900 1
];
M = {
50 0.1 [2 3 4 5 6]
50 0.2 [9 10 11]
600 0.01 [1 2 3 4]
600 0.2 [8 9 10]
800 0.12 [1]
800 0.13 [3 4]
900 0.15 [1 2]
};
%# build matrix: ID,value,minDate,maxDate
M = [cell2num(M(:,1:2)) cellfun(#min,M(:,3)) cellfun(#max,M(:,3))];
%# preallocate result
R = zeros(size(M,1),3);
%# find matching rows
c = 1; %# counter
for i=1:size(q,1)
%# rows indices matching ID
ind = find( ismember(M(:,1),q(i,:)) );
%# out of those, keep only those where date number is in range
ind = ind( M(ind,3) <= q(i,2) & q(i,2) <= M(ind,4) );
%# check if any
num = numel(ind);
if num==0, continue, end
%# extract matching rows
R(c:c+num-1,:) = [M(ind,1) repmat(q(i,2),[num 1]) M(ind,2)];
c = c + num;
end
%# remove excess
R(c:end,:) = [];
The result as expected:
>> R
R =
600 2 0.01
600 4 0.01
800 1 0.12
900 1 0.15

I'm not completely sure I understand .. should the second entry be '600 4 0.02'?
Anyways, you may be able to try something like:
% grab first column
col = bigmat(:, 1);
% find all entries in column that are equal to ID
rel = (col == id);
% retrieve just those rows
rows = bigmat(rel, :);
Then once you have the rows you need from your matrices, you can concatenate them together like so:
result = [rowsA(1:3) rowsB(2) rowsC(5:6)];

Related

Get submatrix made from random rows and columns of large matix

I would like to extract (square) sub-matrices from a matrix, randomly with a size between some min and max values.
I would also like to retain the row and column indices which were selected.
Are there any built-in functions for this purpose? I would appreciate any general algorithm to achieve the desired results.
Just for an example without considering the square matrix constraint:
Input:
| 1 2 3 4 5 6 7 8 <- column indices
--+----------------
1 | 4 3 1 4 0 1 0 1
2 | 2 0 1 5 6 3 2 0
3 | 5 5 0 6 7 8 9 0
4 | 2 3 5 6 7 9 0 1
^
Row indices
Output sample:
| 1 4 5 7 8 <- column indices
--+----------
1 | 4 4 0 0 1
2 | 2 5 6 2 0
4 | 2 6 7 0 1
^
Row indices
Edit: I would like to have a function like this
matrixsample(numberOfSamples, minSize, maxSize, satisfyingFunction)
the samples should vary their size between the minSize and maxSize.
If numberOfSamples = 10, minSize = 2 and maxSize = 6 then output should be randomly selected rows and columns like:
sampleMatrix1 2x2 sampleMatrix2 3x3 sampleMatrix3 5x5 ... sampleMatrix7 6x6 ... sampleMatrx10 4x4
satisfyingFunction could test any attribute of the matrix; like being non-singular, rank > x, etc.
In MATLAB, you will find the randperm function useful for selecting random rows/columns.
To get a randomly sized sub-matrix, use randi([minVal, maxVal]) to get a random integer between minVal and maxVal.
Getting a submatrix from random (ordered) combination of rows and columns:
M = randi([0,9],4,8); % 4x8 matrix of random 1 digit integers, your matrix here!
nRows = randi([2, 4]); % The number of rows to extract, random between 2 and 4
nCols = randi([5, 7]); % The number of columns to extract, random between 5 and 7
idxRows = sort(randperm(size(M,1), nRows)); % Random indices of rows
idxCols = sort(randperm(size(M,2), nCols)); % Random indices of columns
output = M(idxRows, idxCols); % Select the sub-matrix
If you want to make the sub-matrix square then simply use the same value for nRows and nCols.
Showing this method works with your example input/output values:
M = [4 3 1 4 0 1 0 1; 2 0 1 5 6 3 2 0; 5 5 0 6 7 8 9 0; 2 3 5 6 7 9 0 1];
idxRows = [1 2 4]; idxCols = [1 4 5 7 8];
output = M(idxRows, idxCols)
% >> 4 4 0 0 1
% 2 5 6 2 0
% 2 6 7 0 1
Edit in response to extended question scope:
You can package the above up into a short function, which returns the row and column indices, as well as the submatrix.
function output = getsubmatrix(M, nRows, nCols)
% Get a submatrix of random rows/columns
idxRows = sort(randperm(size(M,1), nRows));
idxCols = sort(randperm(size(M,2), nCols));
submatrix = M(idxRows, idxCols);
output = {submatrix, idxRows, idxCols};
end
Then you can use this in some sampling function as you described:
function samples = matrixsample(matrix, numberOfSamples, minSize, maxSize, satisfyingFunction)
% output SAMPLES which contains all sample matrices, with row & column indices w.r.t MATRIX
samples = cell(numberOfSamples,1);
% maximum iterations trying to satisfy SATISFYINGCONDITION
maxiters = 100;
for ii = 1:numberOfSamples
iters = 0; submatrixfound = false; % reset loop exiting conditions
nRows = randi([minSize, maxSize]); % get random submatrix size
nCols = nRows; % Square matrix
while iters < maxiters && submatrixfound == false
% Get random submatrix, and the indices
submatrix = getsubmatrix(matrix, nRows,nCols);
% satisfyingFunction MUST RETURN BOOLEAN
if satisfyingFunction(submatrix{1})
samples{ii} = submatrix; % If satisfied, assign to output
submatrixfound = true; % ... and move on!
end
iters = iters + 1;
end
end
end
Test example:
s = matrixsample(magic(10), 5, 2, 8, #(M)max(M(:)) < 90)
If the table is given, you will do it by randsample:
minRowVal = 3;
maxRowVal = 4;
minColVal = 2;
maxColVal = 4;
kRow = randi([minRowVal maxRowVal]) ;
kCol = randi([minColVal maxColVal]);
table(sort(randsample(size(table,1),kRow)),sort(randsample(size(table,2),kCol))
choose some sample by the given size of kRow and kCol, then select the information from the table.

How to creat a four dimentional (from a vector) matrix and reset it's 'lower triangle'

I am trying to get a 4 dimentional matrix out of a vector and then reset it's
'lower triangel'.
for example, if my original vector is two dimentional: A = [1 2]' then I would like my initial matrix to be:
C(:,:,1,1) = [1*1*1*1 1*1*1*2 ; 1*1*2*1 1*1*2*2] = [ 1 2 ; 2 4]
C(:,:,2,1) = [2*1*1*1 2*1*1*2 ; 2*1*2*1 2*1*2*2] = [ 2 4 ; 4 8]
C(:,:,1,2) = [1*2*1*1 1*2*1*2 ; 1*2*2*1 1*2*2*2] = [ 2 4 ; 4 8]
C(:,:,2,2) = [2*2*1*1 2*2*1*2 ; 2*2*2*1 2*2*2*2] = [ 4 8 ; 8 16]
So C is:
C(:,:,1,1) = [ 1 2 ; 2 4] C(:,:,2,1) = [ 2 4 ; 4 8]
C(:,:,1,2) = [ 2 4 ; 4 8] C(:,:,2,2) = [ 4 8 ; 8 16]
and after reset I would like it to be:
C(:,:,1,1) = [ 1 2 ; 2 4] C(:,:,2,1) = [ 0 0 ; 0 0]
C(:,:,1,2) = [ 0 0 ; 4 8] C(:,:,2,2) = [ 0 0 ; 8 16]
shotrly, I want no rows repetitions.
I tried the following code:
A = [1 2]';
C = bsxfun(#times, permute(C, [4 3 2 1]), C*C');
disp('C before reset is:');
disp(C);
for k = 2:size(C, 4)
C(1:k-1,:,k) = 0;
end
disp('C after reset is:');
disp(C);
disp('The size of C is:');
disp(size(C));
But the output is:
BB before reset is:
(:,:,1,1) =
1 2
2 4
(:,:,1,2) =
2 4
4 8
C after reset is:
(:,:,1,1) =
1 2
2 4
(:,:,1,2) =
0 0
4 8
The size of BB is:
2 2 1 2
What did I miss?
I think I don't understand what is behind the line:
C = bsxfun(#times, permute(C, [4 3 2 1]), C*C');
what is the meaning of each number in the row [4 3 2 1]?
Thanks!
edit note: The matrix represents correlations between neurons. I am trying to look at the correlation structure of groups of 4 neurons. So, each 4 neurons sould only be measuresd once. I think that he matrix before reset contains 4! times, every group of 4, because they apear in all orders. I could leave it like this but I am think it might slow the program..
Permute exchanges dimensions, so for example
C = [1:3;4:6];
permute(C, [2 1])
Computes a simple transpose by swapping rows and columns. The [2 1] argument means that the 2st and 1st dimension of C are mapped to the 1st and 2nd dimension in the result. Each 'new' dimension is specified in order. So [3 2 1] would take the 3rd, 2nd and 1st dimensions to be the new 1st, 2nd and 3rd dimensions.
permute(C, [3 2 1])
ans =
ans(:,:,1) =
1 2 3
ans(:,:,2) =
4 5 6
Elements of C with row = 1 are found in where the 3rd dimension = 1 in the result. Similarly, elements of C with row = 2 are found where the 3rd dimension = 2 in the result.
Elements of C with column = 1 are still found where column = 1 in the result (and so on) as the column dimension was mapped to itself.
The rows of the result is the interesting dimension, it is singleton (i.e. there is only one row) as a result of C having no 3rd dimension.
Addressing the first part of your problem, the correct output for C can be obtained by
A = [1 2]'*[1 2];
C = bsxfun(#times, permute(A, [4,3,1,2]), A);
I would need more information on what you want the final behaviour to be ('resetting the lower triangle') as it is unclear to me what you desire.
A function that might be useful to you is the triu function which extracts upper triangular components of a matrix.

MATLAB: After reshaping matrix to array, how can we know back where the value originally belongs to?

Let z = [1 3 5 6] and by getting all the difference between each elements:
we get:
bsxfun(#minus, z', z)
ans =
0 -2 -4 -5
2 0 -2 -3
4 2 0 -1
5 3 1 0
I now want to order these values in ascending order and remove the duplicates. So:
sort(reshape(bsxfun(#minus, z', z),1,16))
ans =
Columns 1 through 13
-5 -4 -3 -2 -2 -1 0 0 0 0 1 2 2
Columns 14 through 16
3 4 5
C = unique(sort(reshape(bsxfun(#minus, z', z),1,16)))
C =
-5 -4 -3 -2 -1 0 1 2 3 4 5
But by looking at -5 in [-5 -4 -3 -2 -1 0 1 2 3 4 5],
how can I tell where -5 comes from. By reading myself the matrix,
0 -2 -4 -5
2 0 -2 -3
4 2 0 -1
5 3 1 0
I know it comes from z(1) - z(4), i.e. row 1 column 4.
Also 2 comes from both z(3) - z(2) and z(2) - z(1), which comes from two cases. Without reading the originally matrix itself, how can we know that the 2 in [-5 -4 -3 -2 -1 0 1 2 3 4 5] is originally in row 3 column 2 and row 2 column 1 of the original matrix?
So by looking at each element in [-5 -4 -3 -2 -1 0 1 2 3 4 5], how do we know, for example, where -5 comes from in the original matrix index efficiently. I want to know as I need to do operation on ,e.g.,-5 and two indices that produce this: for example, for each difference, say -5, i do (-5)*1*6, as z(1)- z(6) = -5. But for 2, I need to do 2*(3*2+2*1) as z(3) - z(2) = 2, z(2) - z(1) = 2 which is not distinct.
Thinking hard, I think i should not reshape bsxfun(#minus, z', z) to array. I will also create two index array such that I can do operations like (-5)*1*6 stated above effectively. However, this is easier said than done and I also have to take care of nondistinct sources. Or should I do the desired operations first?
Use the third output from unique. And don't sort, unique will do that for you.
[sortedOutput,~,linearIndices] = unique(reshape(bsxfun(#minus, z', z),[1 16]))
You can reconstruct the result from bsxfun like so:
distances = reshape(sortedOutput(linearIndices),[4 4]);
If you want to know where a certain value appears, you write
targetValue = -5;
targetValueIdx = find(sortedOutput==targetValue);
linearIndexIntoDistances = find(targetValueIdx==linearIndices);
[row,col] = ind2sub([4 4],linearIndexIntoDistances);
Because linearIndices is 1 wherever the first value in sortedOutput appears in the original vector.
If you save the result of bsxfun in an intermediate variable:
distances=bsxfun(#minus, z', z)
Then you can look for the values of C in distances using find iteratively.
[rows,cols]=find(C(i)==distances)
This will give all rows and cols if the values are repeated. You just need to then use them for your equation.
You can use accumarray to collect all row and column indices that correspond to the same value in the matrix of differences:
z = [1 3 5 6]; % data vector
zd = bsxfun(#minus, z.', z); % matrix of differences
[C, ~, ind] = unique(zd); % unique values and indices
[rr, cc] = ndgrid(1:numel(z)); % template for row and col indices
f = #(x){x}; % anonymous function to collect row and col indices
row = accumarray(ind, rr(:), [], f); % group row indices according to ind
col = accumarray(ind, cc(:), [], f); % same for col indices
For example, C(6) is value 0, which appears four times in zd, at positions given by row{6} and col{6}:
>> row{6}.'
ans =
3 2 1 4
>> col{6}.'
ans =
3 2 1 4
As you see, the results are not guaranteed to be sorted. If you need to sort them in linear order:
rowcol = cellfun(#(r,c)sortrows([r c]), row, col, 'UniformOutput', false);
so now
>> rowcol{6}
ans =
1 1
2 2
3 3
4 4
I'm not sure I've followed exactly but some points to consider:
unique will sort the data for you by default so you don't need to call sort first
unique actually has three outputs and you can recover your original vector (i.e. with duplicates) using the third output so
[C,~,ic] = unique(reshape(bsxfun(#minus, z', z),1,16))
now you can get back to bsxfun(#minus, z', z),1,16) by calling
reshape(C(ic), numel(z), numel(z))
You might be more interested in the second output of unique which tells you what index each unique value was at in your 1-by-16 vector. It really depends on what you're trying to do though. But with this you could get a list of row column pairs to match your unique values:
[rows, cols] = ndgrid(1:4);
coords = [rows(:), cols(:)];
[C, ia] = unique(reshape(bsxfun(#minus, z', z),1,16));
coords_pairs = coords(ia,:)
which results in
coords_pairs =
1 4
1 3
2 4
2 3
3 4
4 4
4 3
3 2
4 2
3 1
4 1

average 3rd column when 1st and 2nd column have same numbers

just lets make it simple, assume that I have a 10x3 matrix in matlab. The numbers in the first two columns in each row represent the x and y (position) and the number in 3rd columns show the corresponding value. For instance, [1 4 12] shows that the value of function in x=1 and y=4 is equal to 12. I also have same x, and y in different rows, and I want to average the values with same x,y. and replace all of them with averaged one.
For example :
A = [1 4 12
1 4 14
1 4 10
1 5 5
1 5 7];
I want to have
B = [1 4 12
1 5 6]
I really appreciate your help
Thanks
Ali
Like this?
A = [1 4 12;1 4 14;1 4 10; 1 5 5;1 5 7];
[x,y] = consolidator(A(:,1:2),A(:,3),#mean);
B = [x,y]
B =
1 4 12
1 5 6
Consolidator is on the File Exchange.
Using built-in functions:
sparsemean = accumarray(A(:,1:2), A(:,3).', [], #mean, 0, true);
[i,j,v] = find(sparsemean);
B = [i.' j.' v.'];
A = [1 4 12;1 4 14;1 4 10; 1 5 5;1 5 7]; %your example data
B = unique(A(:, 1:2), 'rows'); %find the unique xy pairs
C = nan(length(B), 1);
% calculate means
for ii = 1:length(B)
C(ii) = mean(A(A(:, 1) == B(ii, 1) & A(:, 2) == B(ii, 2), 3));
end
C =
12
6
The step inside the for loop uses logical indexing to find the mean of rows that match the current xy pair in the loop.
Use unique to get the unique rows and use the returned indexing array to find the ones that should be averaged and ask accumarray to do the averaging part:
[C,~,J]=unique(A(:,1:2), 'rows');
B=[C, accumarray(J,A(:,3),[],#mean)];
For your example
>> [C,~,J]=unique(A(:,1:2), 'rows')
C =
1 4
1 5
J =
1
1
1
2
2
C contains the unique rows and J shows which rows in the original matrix correspond to the rows in C then
>> accumarray(J,A(:,3),[],#mean)
ans =
12
6
returns the desired averages and
>> B=[C, accumarray(J,A(:,3),[],#mean)]
B =
1 4 12
1 5 6
is the answer.

Ranking values with repeats. Matlab

My question is simple, but I can't find the answer...
I have a 100,000 rows x 30 colums matrix for a simulation, and I need to rank the 100k values of each column.
I'm looking for something similar to tiedrank but I need that repeats does count (and not the average).
Suppose: data = [-1 2 0 -2 0] what I need is rank= [2 5 3 1 4]
Any suggestion?
Thank you very much!
Juan
It seems like you need sort:
data = [-1 2 0 -2 0];
[ignore, idx] = sort( data );
rank( idx ) = 1:numel(idx)
rank =
2 5 3 1 4
To rank all columns of matrix as once you can use the following code
data = [ -1 2 0 -2 0; -1 -1 -2 2 2]' ; %'
[n m] = size( data ); % number of rows and columns
[ignore idx] = sort(data); % sort columns
rank = zeros( size(data) ); % allocate
rank( sub2ind( size(rank), idx, bsxfun(#times, 1:m, ones(n,1) ) ) ) = ...
repmat( (1:n)', 1, m )
rank =
2 2
5 3
3 1
1 4
4 5