I have a pretty large matrix M and I am only interested in a few of the columns. I have a boolean vector V where a value of 1 represents a column that is of interest. Example:
M = -1 -1 -1  7  7 -1 -1 -1  7  7  7
    -1 -1  7  7  7 -1 -1  7  7  7  7
    -1 -1  7  7  7 -1 -1 -1  7  7 -1
V = 0 0 1 1 1 0 0 1 1 1 1
If multiple adjacent values of V are all 1, then I want the corresponding columns of M to be extracted into another matrix. Here's an example, using the matrices from before.
M1 = -1  7  7
      7  7  7
      7  7  7
M2 = -1  7  7  7
      7  7  7  7
     -1  7  7 -1
How might I do this efficiently? I would like all these portions of the matrix M to be stored in a cell array, or at least have an efficient way to generate them one after the other. Currently I'm doing this in a while loop and it is not as efficient as I'd like it to be.
(Note that my examples only include the values -1 and 7 just for clarity; this isn't the actual data I use.)
You can use the diff function for this to break your V vector into blocks:
% find where block differences exist
diffs = diff(V);
% move start index one value forward, as first value in
% diff represents diff between first and second in original vector
startPoints = find(diffs == 1) + 1;
endPoints = find(diffs == -1);
% if the first block begins with the first element, diff won't have
% found its start
if V(1) == 1
    startPoints = [1 startPoints];
end
% if the last block lasts until the end of the array, diff won't have found its end
if length(startPoints) > length(endPoints)
    endPoints(end+1) = length(V);
end
% subset original matrix into cell array with indices
results = cell(size(startPoints));
for c = 1:length(results)
    results{c} = M(:,startPoints(c):endPoints(c));
end
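A slightly shorter variant of the same idea, offered as a sketch rather than a tested drop-in: padding V with a zero on each side before calling diff makes every block produce both a rising and a falling edge, so the two special cases for the first and last element are no longer needed. It assumes V is a vector of 0/1 values; the example reuses M and V from the question.

```matlab
V = [0 0 1 1 1 0 0 1 1 1 1];
M = [-1 -1 -1  7  7 -1 -1 -1  7  7  7
     -1 -1  7  7  7 -1 -1  7  7  7  7
     -1 -1  7  7  7 -1 -1 -1  7  7 -1];

% pad with zeros so blocks touching either end still yield both edges
d = diff([0, double(V(:).'), 0]);
startPoints = find(d == 1);        % first column of each block
endPoints   = find(d == -1) - 1;   % last column of each block

% collect each block of columns into a cell array
results = arrayfun(@(s, e) M(:, s:e), startPoints, endPoints, ...
                   'UniformOutput', false);
```

With the data above, startPoints and endPoints come out as [3 8] and [5 11], so results{1} and results{2} are the M1 and M2 from the question.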
The one thing I'm not sure of is whether there's a better way to find begin_indices and end_indices.
Code:
X = [1 2 3 4 5 1 2 3 4 5
6 7 8 9 10 6 7 8 9 10
11 12 13 14 15 11 12 13 14 15
16 17 18 19 20 16 17 18 19 20
1 2 3 4 5 1 2 3 4 5
6 7 8 9 10 6 7 8 9 10
11 12 13 14 15 11 12 13 14 15
16 17 18 19 20 16 17 18 19 20];
V = logical([ 1 1 0 0 1 1 1 0 1 1]);
find_indices = find(V);
begin_indices = [find_indices(1) find_indices(find(diff(find_indices) ~= 1)+1)];
end_indices = [find_indices(find(diff(find_indices) ~= 1)) find_indices(end)];
X_truncated = mat2cell(X(:,V),size(X,1),[end_indices-begin_indices]+1);
X_truncated{:}
Output:
ans =
1 2
6 7
11 12
16 17
1 2
6 7
11 12
16 17
ans =
5 1 2
10 6 7
15 11 12
20 16 17
5 1 2
10 6 7
15 11 12
20 16 17
ans =
4 5
9 10
14 15
19 20
4 5
9 10
14 15
19 20
I have a question about pdist and would be grateful for any advice. pdist(D) combines the differences over all dimensions into a single distance, but I want the distance in each dimension separately. For example, for a data set S that is a 10-by-2 matrix, I am using pdist(S(:,1)) and pdist(S(:,2)) to get the distances separately, but this seems very inefficient when the data has many dimensions. Is there a more efficient way to achieve this? Thanks in advance!
Assuming you just want the absolute difference between the individual dimensions of the points, pdist is overkill. You can use the following simple function:
function d = pdist_1d(S)
idx = nchoosek(1:size(S,1),2);
d = abs(S(idx(:,1),:) - S(idx(:,2),:));
end
which returns the absolute pairwise difference between all pairs of rows in S.
In this case
dist = pdist_1d(S)
gives the same result as
dist = cell2mat(arrayfun(@(dim)pdist(S(:,dim))',1:size(S,2),'UniformOutput',false));
Another option, since you're simply taking the absolute difference of the coordinates, is to use bsxfun:
>> D = randi(20, 10, 2) % generate sample data
D =
17 12
14 10
8 4
7 11
19 13
2 18
11 14
5 19
19 12
20 8
From here, we permute the data so that the coordinates (columns) extend into the 3rd dimension and the rows are in the 1st dimension for the 1st argument, and the 2nd dimension for the 2nd argument:
>> dist = bsxfun(@(x,y)abs(x-y), permute(D, [1 3 2]), permute(D, [3 1 2]))
dist =
ans(:,:,1) =
0 3 9 10 2 15 6 12 2 3
3 0 6 7 5 12 3 9 5 6
9 6 0 1 11 6 3 3 11 12
10 7 1 0 12 5 4 2 12 13
2 5 11 12 0 17 8 14 0 1
15 12 6 5 17 0 9 3 17 18
6 3 3 4 8 9 0 6 8 9
12 9 3 2 14 3 6 0 14 15
2 5 11 12 0 17 8 14 0 1
3 6 12 13 1 18 9 15 1 0
ans(:,:,2) =
0 2 8 1 1 6 2 7 0 4
2 0 6 1 3 8 4 9 2 2
8 6 0 7 9 14 10 15 8 4
1 1 7 0 2 7 3 8 1 3
1 3 9 2 0 5 1 6 1 5
6 8 14 7 5 0 4 1 6 10
2 4 10 3 1 4 0 5 2 6
7 9 15 8 6 1 5 0 7 11
0 2 8 1 1 6 2 7 0 4
4 2 4 3 5 10 6 11 4 0
This results in a 3-d symmetric matrix where
dist(p, q, d)
gives you the distance between points p and q in dimension d with
dist(p, q, d) == dist(q, p, d)
If you want the distances between p and q in all (or multiple) dimensions, you should use squeeze to put it in a vector:
>> squeeze(dist(3, 5, :))
ans =
11
9
Note that if you're using MATLAB R2016b or later (or Octave), you can create the same distance matrix without bsxfun:
dist = abs(permute(D, [1 3 2]) - permute(D, [3 1 2]))
The downside to this approach is that it creates the full symmetric matrix so you're generating each distance twice, which could potentially become a memory issue.
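If memory is the concern, one possible workaround (a sketch, not part of the answer above) is to keep only one copy of each pair. The names n, mask, flat and pairDist are made up for this example. Selecting the strict lower triangle in column-major order should list the pairs in the same order as pdist, i.e. (1,2), (1,3), ..., (2,3), and so on. Note that the full n-by-n-by-d array is still built as an intermediate here, so this only shrinks what you keep afterwards.

```matlab
D = [17 12; 14 10; 8 4; 7 11; 19 13];   % first five rows of the sample data above
n = size(D, 1);

% full symmetric array: dist(p, q, d) = |D(p,d) - D(q,d)|
dist = bsxfun(@(x, y) abs(x - y), permute(D, [1 3 2]), permute(D, [3 1 2]));

mask = tril(true(n), -1);               % strict lower triangle, one entry per pair
flat = reshape(dist, n*n, []);          % one column per dimension
pairDist = flat(mask(:), :);            % n*(n-1)/2 rows, one per pair of points
```

Each row of pairDist then holds the per-dimension distances for one pair, and each pair appears only once.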
I want to use randsample to sample values from a matrix, but I want the sampled values to be replaced by zero in the matrix. How can I do this? Is there a function for it?
I don't think you want randsample here, because you already have a given matrix (here M). You can use datasample instead to randomly sample existing data. You can then use the second output of datasample (here ind) to address the entries in the original matrix M and overwrite them easily.
The following example operates over the second dimension and takes a selection of columns. If you want a selection of rows, change the third argument of datasample to 1 (this is MATLAB's default behaviour when no third argument is given).
% create random data
M = randi(20,4,10)
% randomly sample data
[Y,ind] = datasample(M,4,2)
% write 0 for the sampled data in original matrix
M(:,ind) = 0
This is the result:
M =
20 14 6 18 1 9 4 15 11 11
11 5 9 20 8 19 2 9 3 13
20 20 16 4 1 16 13 11 4 9
15 1 4 12 20 18 4 19 11 13
Y =
18 4 15 14
20 2 9 5
4 13 11 20
12 4 19 1
ind =
4 7 8 2
M =
20 0 6 0 1 9 0 0 11 11
11 0 9 0 8 19 0 0 3 13
20 0 16 0 1 16 0 0 4 9
15 0 4 0 20 18 0 0 11 13
Initialized with rng(4).
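One caveat worth checking against the documentation for your release: as far as I know, datasample draws with replacement by default, so ind can contain the same column more than once (the 'Replace', false option turns that off). If you want k distinct columns and do not have the Statistics Toolbox, a sketch using randperm could look like this. The names k and M0 are only for illustration.

```matlab
M  = randi(20, 4, 10);           % random data, all entries are >= 1
M0 = M;                          % copy kept only to compare against afterwards
k  = 4;                          % number of columns to sample

ind = randperm(size(M, 2), k);   % k distinct column indices
Y = M(:, ind);                   % the sampled columns
M(:, ind) = 0;                   % overwrite them in the original matrix
```

To sample individual elements instead of whole columns, the same pattern should work with linear indices: ind = randperm(numel(M), k); Y = M(ind); M(ind) = 0;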
My data matrix is large: something like 180x3000 in size.
Each element value is between 0 and 255.
I have to find areas in this matrix where the average value is higher than some threshold (let's call it 'P'), and reset each element in these areas to '0'. In other words, I want to filter my matrix.
I have the width and height of the filter area.
So I need to loop over the data matrix to find the matching areas (as many as exist).
EDIT:
Please, see an example:
4 6 7 5 6 6 7
10 8 9 8 9 10 9
10 8 9 8 9 10 9
7 4 6 9 7 8 7
4 5 5 5 5 5 5
4 5 5 5 5 5 5
10 12 12 12 13 10 11
14 15 15 16 14 15 15
13 15 15 15 14 14 13
This is the given matrix. Let's try to find areas of size (2, 3) where the average value is > 15.
So the result will be:
4 6 7 5 6 6 7
10 8 9 8 9 10 9
10 8 9 8 9 10 9
7 4 6 9 7 8 7
4 5 5 5 5 5 5
4 5 5 5 5 5 5
10 12 12 12 13 10 11
14 0 0 0 14 15 15
13 0 0 0 14 14 13
Please look at the bottom of the matrix.
Please give me some tips on how to loop through it.
Thank you very much.
One way of doing this is as follows:
% example A with more areas of mean greater than 15
% there are four such areas as shown here: http://i.imgur.com/V6m0NfL.jpg
A = [16 16 16 5 16 16 16
16 16 16 8 16 16 16
10 8 9 8 9 10 9
7 4 6 9 7 8 7
4 5 15.1 15 15 5 5
4 5 15 15 15 5 5
10 12 12 12 13 10 11
14 15 15 16 14 15 15
13 15 15 15 14 14 13];
% filter size
[n,m] = deal(2,3);
% filter center
center = floor(([n,m]+1)/2);
% find where we have areas greater than 15
B = nlfilter(A, [n,m], @(b) mean(b(:)) > 15);
% get coordinates of areas with mean > 15
[rows,cols] = find(B);
% zero out elements in all found areas
for i = 1:size(rows,1)
    % calculate starting coordinates for the area to be set to 0
    row = rows(i) - center(1) + 1;
    col = cols(i) - center(2) + 1;
    A(row:row+n-1 , col:col+m-1) = 0;
end
Results in:
A =
0 0 0 5 0 0 0
0 0 0 8 0 0 0
10 8 9 8 9 10 9
7 4 6 9 7 8 7
4 5 0 0 0 5 5
4 5 0 0 0 5 5
10 12 12 12 13 10 11
14 0 0 0 14 15 15
13 0 0 0 14 14 13
Try this:
a = input_matrix;
ii = 2 ; jj = 3;
threshold = 15;
x = ones(ii,jj)/(ii*jj);
% create matrix temp2 with average value of block a(i:i+ii-1,j:j+jj-1) at temp2(i,j)
temp1 = conv2(a,x,'full');
temp2 = temp1(ii:end-ii+1,jj:end-jj+1);
% find row and column indices of temp2 with value > threshold
[row_, col_] = find(temp2>threshold);
out = a;
% assign zero value to the corresponding blocks
for iii = 1:length(row_)
    out(row_(iii):row_(iii)+ii-1,col_(iii):col_(iii)+jj-1) = 0;
end
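For what it is worth, the block sums can also be taken directly with the 'valid' option of conv2, which skips the manual cropping, and a second convolution can mark every element covered by a qualifying block, which replaces the loop. This is a sketch under the assumption that blocks are ii-by-jj and that overlapping hits should all be zeroed. Comparing sums against threshold*ii*jj instead of comparing averages avoids the division and its rounding. The names blockSum, hits and mask are made up here; the matrix is the one from the question.

```matlab
a = [ 4  6  7  5  6  6  7
     10  8  9  8  9 10  9
     10  8  9  8  9 10  9
      7  4  6  9  7  8  7
      4  5  5  5  5  5  5
      4  5  5  5  5  5  5
     10 12 12 12 13 10 11
     14 15 15 16 14 15 15
     13 15 15 15 14 14 13];
ii = 2; jj = 3;
threshold = 15;

% blockSum(i,j) is the sum of a(i:i+ii-1, j:j+jj-1)
blockSum = conv2(a, ones(ii, jj), 'valid');
hits = blockSum > threshold * ii * jj;            % same test as mean > threshold

% spread each hit over the ii-by-jj block it stands for
mask = conv2(double(hits), ones(ii, jj)) > 0.5;   % same size as a

out = a;
out(mask) = 0;
```

For this input only the block at rows 8 to 9, columns 2 to 4 qualifies, which matches the expected result in the question.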
I need a function that splits a vector in smaller frames with an overlap, like buffer, but instead of column-wise, it should be done row-wise.
This is how buffer works:
x = 1:20
x = buffer(x, 10, 5);
x = 0 1 6 11
0 2 7 12
0 3 8 13
0 4 9 14
0 5 10 15
1 6 11 16
2 7 12 17
3 8 13 18
4 9 14 19
5 10 15 20
What I want would be this though:
x = 0 0 1 2
1 2 3 4
3 4 5 6
5 6 7 8
7 8 9 10
9 10 11 12
11 12 13 14
13 14 15 16
15 16 17 18
17 18 19 20
Is there any function or way to achieve that? Maybe combination of buffer + some rearranging?
First figure out the answer in columns, then transpose the resulting matrix:
buffer(x, 4, 2).'
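If the Signal Processing Toolbox is not available, the same overlapping frames can be built with plain indexing. This is only a sketch: n (frame length), p (overlap), step, padded and frames are names chosen for the example, and the zero padding at the start and end is my attempt to mimic what buffer does by default, so check it against buffer on your own data.

```matlab
x = 1:20;
n = 4;                                   % frame length
p = 2;                                   % overlap between consecutive frames
step = n - p;                            % new samples per frame

nFrames = ceil(numel(x) / step);
% p leading zeros for the first frame, trailing zeros to fill the last one
padded = [zeros(1, p), x(:).', zeros(1, nFrames*step - numel(x))];

% row k holds the indices (k-1)*step + (1:n) into padded
idx = bsxfun(@plus, (0:nFrames-1).' * step, 1:n);
frames = padded(idx);                    % nFrames-by-n, one frame per row
```

For x = 1:20 this gives a 10-by-4 matrix whose first row is 0 0 1 2 and whose last row is 17 18 19 20, as in the desired output.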
We have the following case:
Q = [idxcell{:,1}];
Sort = sort(Q,'descend')
Sort =
Columns 1 through 13
23 23 22 22 20 19 18 18 18 18 17 17 17
Columns 14 through 26
15 15 14 14 13 13 13 12 12 12 11 10 9
Columns 27 through 39
9 9 8 8 8 8 8 7 7 7 7 7 7
Columns 40 through 52
7 6 6 6 5 4 4 3 3 3 3 2 2
Columns 53 through 64
2 2 2 2 2 2 2 1 1 1 1 1
How can we sort the values in Sort according to how many times each one is repeated?
The expected result would be:
repeatedSort = 2(9) 7(7) 1(5) 8(5) 3(4) 18(4) 6(3) 9(3) 12(3) 13(3) 17(3) 4(2) 14(2) 15(2) 22(2) 23(2) 5(1) 10(1) 11(1) 19(1) 20(1)
or
repeatedSort = 2 7 1 8 3 18 6 9 12 13 17 4 14 15 22 23 5 10 11 19 20
Thank you in advance.
You can use the TABULATE function from the Statistics Toolbox, then call SORTROWS to sort by the frequency.
Example:
x = randi(10, [20 1]); %# random values
t = tabulate(x); %# unique values and counts
t = t(find(t(:,2)),1:2); %# get rid of entries with zero count
t = sortrows(t, -2) %# sort according to frequency
The result, where the first column holds the unique values and the second their counts:
t =
2 4 %# value 2 appeared four times
5 4 %# etc...
1 3
8 3
7 2
9 2
4 1
6 1
Here's one way of doing it:
d = randi(10,1,30); %Some fake data
n = histc(d,1:10);
[y,ii] = sort(n,'descend');
disp(ii) % ii is now sorted according to frequency
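A toolbox-free variant, sketched here with made-up names (vals, counts, order), uses unique and accumarray. Unlike the fixed 1:10 bins above, it only lists values that actually occur, and sorting on [-counts, vals] orders ties by ascending value, which is the ordering shown in the question. Also note that recent MATLAB documentation marks histc as not recommended in favour of histcounts.

```matlab
Q = [3 1 2 2 3 3 7 1];                   % small stand-in for the data in the question

[vals, ~, idx] = unique(Q(:));           % distinct values, and which one each element is
counts = accumarray(idx, 1);             % number of occurrences of each distinct value

% most frequent first; equal counts are ordered by ascending value
[~, order] = sortrows([-counts, vals]);
repeatedSort  = vals(order).';
repeatedCount = counts(order).';
```

Here repeatedSort is 3 1 2 7 and repeatedCount is 3 2 2 1.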