Find series of the same value - matlab

Given a vector A that contains a sequence of numbers.
The objective is to find all series (longer than a given number "threshold") that contain the same value. The result should be the position of both first and last values of that series.
Example: given a vector A where:
A = [1 1 1 2 1 3 3 3 1 1 1 1 1 4 3 2 2 2 2 2 2 2 3 4];
and a threshold B = 5;
The results would be:
[9 13] % a series contain only the number 1 with length equal to 5
[16 22] % a series contain only the number 2 with length equal to 7

A=[1 1 1 2 1 3 3 3 1 1 1 1 1 4 3 2 2 2 2 2 2 2 3 4];
B = 5;
[l c]= size(A); % to know the size of 'A'
K=1; % to define the length of the series
W=1; % a value used to save the positions of the wanted series.
For i=1:c-1
If A(i)==A(i+1)
K=k+1;
Else
If k>= B % to test of the actual series is equal or longer than the given threshold
S(w,1)=i;
S(w,2)= S(w,1)-k+1; % saving the first position and the last position of the series in 'S'
w=w+1;
end
k=1;
end
S % the final result which is a table contain all wanted series.
the result is as follow:
S 13 9 % 13: the last position of the wanted series and 9 is the first position
16 22
This one work soo good. But still... it is soo slow when its come to a big table.

A faster, vectorized option is to modify the approach from this solution for finding islands of zeroes:
A = [1 1 1 2 1 3 3 3 1 1 1 1 1 4 3 2 2 2 2 2 2 2 3 4]; % Sample data
B = 5; % Threshold
tsig = (diff(A) ~= 0);
dsig = diff([1 tsig 1]);
startIndex = find(dsig < 0);
endIndex = find(dsig > 0)-1;
duration = endIndex-startIndex+1;
stringIndex = (duration >= (B-1));
result = [startIndex(stringIndex); endIndex(stringIndex)+1].';
And the results:
result =
9 13
16 22

Related

How to create all permutations of a 2-column cell-array?

I created a cell array of shape m x 2, each element of which is a matrix of shape d x d.
For example like this:
A = cell(8, 2);
for row = 1:8
for col = 1:2
A{row, col} = rand(3, 3);
end
end
More generally, I can represent A as follows:
where each A_{ij} is a matrix.
Now, I need to randomly pick a matrix from each row of A, because A has m rows in total, so eventually I will pick out m matrices, which we call a combination.
Obviously, since there are only two picks for each row, there are a total of 2^m possible combinations.
My question is, how to get these 2^m combinations quickly?
It can be seen that the above problem is actually finding the Cartesian product of the following sets:
2^m is actually a binary number, so we can use those to create linear indices. You'll get an array containing 1s and 0s, something like [1 1 0 0 1 0 1 0 1], which we can treat as column "indices", using a 0 to indicate the first column and a 1 to indicate the second.
m = size(A, 1);
% Build all binary numbers and create a logical matrix
bin_idx = dec2bin(0:(2^m -1)) == '1';
row = 3; % Loop here over size(bin_idx,1) for all possible permutations
linear_idx = [find(~bin_idx(row,:)) find(bin_idx(row,:))+m];
A{linear_idx} % the combination as specified by the permutation in out(row)
On my R2007b version this runs virtually instant for m = 20.
NB: this will take m * 2^m bytes of memory to store bin_idx. Where that's just 20 MB for m = 20, that's already 30 GB for m = 30, i.e. you'll be running out of memory fairly quickly, and that's for just storing permutations as booleans! If m is large in your case, you can't store all of your possibilities anyway, so I'd just select a random one:
bin_idx = rand(m, 1); % Generate m random numbers
bin_idx(bin_idx > 0.5) = 1; % Set half to 1
bin_idx(bin_idx < 0.5) = 0; % and half to 0
Old, slow answer for large m
perms()1 gives you all possible permutations of a given set. However, it does not take duplicate entries into account, so you'll need to call unique() to get the unique rows.
unique(perms([1,1,2,2]), 'rows')
ans =
1 1 2 2
1 2 1 2
1 2 2 1
2 1 1 2
2 1 2 1
2 2 1 1
The only thing left now is to somehow do this over all possible amounts of 1s and 2s. I suggest using a simple loop:
m = 5;
out = [];
for ii = 1:m
my_tmp = ones(m,1);
my_tmp(ii:end) = 2;
out = [out; unique(perms(my_tmp),'rows')];
end
out = [out; ones(1,m)]; % Tack on the missing all-ones row
out =
2 2 2 2 2
1 2 2 2 2
2 1 2 2 2
2 2 1 2 2
2 2 2 1 2
2 2 2 2 1
1 1 2 2 2
1 2 1 2 2
1 2 2 1 2
1 2 2 2 1
2 1 1 2 2
2 1 2 1 2
2 1 2 2 1
2 2 1 1 2
2 2 1 2 1
2 2 2 1 1
1 1 1 2 2
1 1 2 1 2
1 1 2 2 1
1 2 1 1 2
1 2 1 2 1
1 2 2 1 1
2 1 1 1 2
2 1 1 2 1
2 1 2 1 1
2 2 1 1 1
1 1 1 1 2
1 1 1 2 1
1 1 2 1 1
1 2 1 1 1
2 1 1 1 1
1 1 1 1 1
NB: I've not initialised out, which will be slow especially for large m. Of course out = zeros(2^m, m) will be its final size, but you'll need to juggle the indices within the for loop to account for the changing sizes of the unique permutations.
You can create linear indices from out using find()
linear_idx = [find(out(row,:)==1);find(out(row,:)==2)+size(A,1)];
A{linear_idx} % the combination as specified by the permutation in out(row)
Linear indices are row-major in MATLAB, thus whenever you need the matrix in column 1, simply use its row number and whenever you need the second column, use the row number + size(A,1), i.e. the total number of rows.
Combining everything together:
A = cell(8, 2);
for row = 1:8
for col = 1:2
A{row, col} = rand(3, 3);
end
end
m = size(A,1);
out = [];
for ii = 1:m
my_tmp = ones(m,1);
my_tmp(ii:end) = 2;
out = [out; unique(perms(my_tmp),'rows')];
end
out = [out; ones(1,m)];
row = 3; % Loop here over size(out,1) for all possible permutations
linear_idx = [find(out(row,:)==1).';find(out(row,:)==2).'+m];
A{linear_idx} % the combination as specified by the permutation in out(row)
1 There's a note in the documentation:
perms(v) is practical when length(v) is less than about 10.

How can I calculate the relative frequency of a row in a data set using Matlab?

I am new to Matlab and I have a basic question.
I have this data set:
1 2 3
4 5 7
5 2 7
1 2 3
6 5 3
I am trying to calculate the relative frequencies from the dataset above
specifically calculating the relative frequency of x=1, y=2 and z=3
my code is:
data = load('datasetReduced.txt')
X = data(:, 1)
Y = data(:, 2)
Z = data(:, 3)
f = 0;
for i=1:5
if X == 1 & Y == 2 & Z == 3
s = 1;
else
s = 0;
end
f = f + s;
end
f
r = f/5
it is giving me a 0 result.
How can the code be corrected??
thanks,
Shosho
Your issue is likely that you are comparing floating point numbers using the == operator which is likely to fail due to floating point errors.
A faster way to do this would be to use ismember with the 'rows' option which will result in a logical array that you can then sum to get the total number of rows that matched and divide by the total number of rows.
tf = ismember(data, [1 2 3], 'rows');
relFreq = sum(tf) / numel(tf);
I think you want to count frequency of each instance, So try this
data = [1 2 3
4 5 7
5 2 7
1 2 3
6 5 3];
[counts,centers] = hist(data , unique(data))
Where centers is your unique instances and counts is count of each of them. The result should be as follow:
counts =
2 0 0
0 3 0
0 0 3
1 0 0
1 2 0
1 0 0
0 0 2
centers =
1 2 3 4 5 6 7
That it means you have 7 unique instances, from 1 to 7 and there is two 1s in first column and there is not any 1s in second and third and etc.

Histogram of subblock matrix

Given some matrix, I want to divide it into blocks of size 2-by-2 and show a histogram for each of the blocks. The following is the code I wrote to solve the problem, but the sum of the histograms I'm generating is not the same as the histogram of the whole matrix. Actually the the sum of the blocks' histograms is double what I expected. What am I doing wrong?
im =[1 1 1 2 0 6 4 3; 1 1 0 4 2 9 1 2; 1 0 1 7 4 3 0 9; 2 3 4 7 8 1 1 4; 9 6 4 1 5 3 1 4; 1 3 5 7 9 0 2 5; 1 1 1 1 0 0 0 0; 1 1 2 2 3 3 4 4];
display(imhist(im));
[r c]=size(im);
bs = 2; % Block Size (8x8)
nob=[r c ]./ bs; % Total number of Blocks
% Dividing the image into 8x8 Blocks
kk=0;
for k=1:nob/2
for i=1:(r/bs)
for j=1:(c/bs)
Block(:,:,kk+j)=im((bs*(i-1)+1:bs*(i-1)+bs),(bs*(j-1)+1:bs*(j-1)+bs));
count(:,:,kk+j)=sum(sum(sum(hist(Block(:,:,kk+j)))));
p=sum(count(:,:,kk+j));
end
kk=kk+(r/bs);
end
end
The reason they aren't the same is because you use imhist for im and hist for the blocks. Hist separates data into 10 different bins based on your data range, imhist separates data based on the image type. Since your arrays are doubles, the imhist bins are from 0 to 1.0 Thats why your imhist has only values at 0, and 1. The hist produces bins based on your data range, so it will actually change slightly depending on what value you pass in. So you cant simply add bins together. Even though they are the same size vector 10x1 , the values in them can be very different. in one set bin(1) can be the range 1-5 but in another set of data bin(1) could be 1-500.
To fix all these issues I used imhist, and converted your data to uint8. At the very end I subtract the two histograms from one another and get zero, this shows that they are indeed the same
im =uint8([1 1 1 2 0 6 4 3 ;
1 1 0 4 2 9 1 2 ;
1 0 1 7 4 3 0 9 ;
2 3 4 7 8 1 1 4 ;
9 6 4 1 5 3 1 4 ;
1 3 5 7 9 0 2 5 ;
1 1 1 1 0 0 0 0 ;
1 1 2 2 3 3 4 4 ]);
orig_imhist = imhist(im);
%% next thing
[r c]=size(im);
bs=2; % Block Size (8x8)
nob=[r c ]./ bs; % Total number of Blocks
%creates arrays ahead of time
block = uint8(zeros(bs,bs,nob(1)*nob(2)));
%we use 256, because a uint8 has 256 values, or 256 'bins' for the
%histogram
block_imhist = zeros(256,nob(1)*nob(2));
sum_block_hist = zeros(256,1);
% Dividing the image into 2x2 Blocks
for i = 0:nob(1)-1
for j = 0:nob(2)-1
curr_block = i*nob(1)+(j+1);
%creates the 2x2 block
block(:,:,curr_block) = im(bs*i+1:bs*i+ bs,bs*j+1:bs*j+ bs);
%creates a histogram for the block
block_imhist(:,curr_block) = imhist(block(:,:,curr_block));
%adds the current histogram to the running sum
sum_block_hist = sum_block_hist + block_imhist(:,curr_block);
end
end
%shows that the two are the same
sum(sum(orig_imhist-sum_block_hist))
if my solution solves your problem please mark it as the answer

Sorting Coordinate Matrix in Matlab

In Matlab I have a big matrix containing the coordinates (x,y,z) of many points (over 200000). There is an extra column used as identification. I have written this code in order to sort all coordinate points. My final goal is to find duplicated points (rows with same x,y,z). After sorting the coordinate points I use the diff function, two consecutive rows of the matrix with the same coordinates will take value [0 0 0], and then with ismember I can find which rows of that matrix resulting from applying "diff" have the [0 0 0] row. With the indices returned from ismember I can find which points are repeated.
Back to my question...This is the code I wrote to sort properly my coordintes+id matrix. I guess It could be done better. Any suggestion?
%coordinates are always positive
a=[ 1 2 8 4; %sample matrix
1 0 5 6;
2 4 7 1;
3 2 1 0;
2 3 5 0;
3 1 2 8;
1 2 4 8];
b=a; %for checking purposes
%sorting first column
a=sortrows(a,1);
%sorting second column
for i=0:max(a(:,1));
k=find(a(:,1)==i);
if not(isempty(k))
a(k,:)=sortrows(a(k,:),2);
end
end
%Sorting third column
for i=0:max(a(:,2));
k=find(a(:,2)==i);
if not(isempty(k))
%identifying rows with same value on first column
for j=1:length(k)
[rows,~] = ismember(a(:,1:2), [ a(k(j),1),i],'rows');
a(rows,3:end)=sortrows(a(rows,3:end),1);
end
end
end
%Checking that rows remain the same
m=ismember(b,a,'rows');
if length(m)~=sum(m)
disp('Error while sorting!');
end
Why don't you just use unique?
[uniqueRows, ii, jj] = unique(a(:,1:3),'rows');
Example
a = [1 2 3 5
3 2 3 6
1 2 3 9
2 2 2 8];
gives
uniqueRows =
1 2 3
2 2 2
3 2 3
and
jj =
1
3
1
2
meaning third row equals first row.
If you need the full unique rows, including the fourth column: use ii to index a:
fullUniqueRows = a(ii,:);
which gives
fullUniqueRows =
1 2 3 9
2 2 2 8
3 2 3 6
Trying to sort a based on the fourth column? Do this -
a=[ 1 2 8 4; %sample matrix
1 0 5 6;
2 4 7 1;
3 2 1 0;
2 3 5 0;
3 2 1 8;
1 2 4 8];
[x,y] = sort(a(:,4))
sorted_a=a(y,:)
Trying to get the row indices having repeated x-y-z coordinates being represented by the first three columns? Do this -
out = sum(squeeze(all(bsxfun(#eq,a(:,1:3),permute(a(:,1:3),[3 2 1])),2)),2)>1
and use it similarly for sorted_a.

Matlab: sorting a vector by the number of time each unique value occurs

We have p.e. i = 1:25 iterations.
Each iteration result is a 1xlength(N) cell array, where 0<=N<=25.
iteration 1: 4 5 9 10 20
iteration 2: 3 8 9 13 14 6
...
iteration 25: 1 2 3
We evaluate the results of all iterations to one matrix sorted according to frequency each value is repeated in descending order like this example:
Matrix=
Columns 1 through 13
16 22 19 25 2 5 8 14 17 21 3 12 13
6 5 4 4 3 3 3 3 3 3 2 2 2
Columns 14 through 23
18 20 1 6 7 9 10 11 15 23
2 2 1 1 1 1 1 1 1 1
Result explanation: Column 1: N == 16 is present in 6 iterations, column 2: N == 22 is present in 5 iterations etc.
If a number N isn't displayed (in that paradigm N == 4, N == 24) in any iteration, is not listed with frequency index of zero either.
I want to associate each iteration (i) to the first N it is displayed p.e. N == 9 to be present only in first iteration i = 1 and not in i = 2 too, N == 3 only to i = 2 and not in i = 25 too etc until all i's to be unique associated to N's.
Thank you in advance.
Here's a way that uses a feature of unique (i.e. that it returns the index to the first value) that was introduced in R2012a
%# make some sample data
iteration{1} = [1 2 4 6];
iteration{2} = [1 3 6];
iteration{3} = [1 2 3 4 5 6];
nIter= length(iteration);
%# create an index vector so we can associate N's with iterations
nn = cellfun(#numel,iteration);
idx = zeros(1,sum(nn));
idx([1,cumsum(nn(1:end-1))+1]) = 1;
idx = cumsum(idx); %# has 4 ones, 3 twos, 6 threes
%# create a vector of the same length as idx with all the N's
nVec = cat(2,iteration{:});
%# run `unique` on the vector to identify the first occurrence of each N
[~,firstIdx] = unique(nVec,'first');
%# create a "cleanIteration" array, where each N only appears once
cleanIter = accumarray(idx(firstIdx)',firstIdx',[nIter,1],#(x){sort(nVec(x))},{});
cleanIter =
[1x4 double]
[ 3]
[ 5]
>> cleanIter{1}
ans =
1 2 4 6
Here is another solution using accumarray. Explanations in the comments
% example data (from your question)
iteration{1} = [4 5 9 10 20 ];
iteration{2} = [3 8 9 13 14 6];
iteration{3} = [1 2 3];
niterations = length(iteration);
% create iteration numbers
% same as Jonas did in the first part of his code, but using a short loop
for i=1:niterations
idx{i} = i*ones(size(iteration{i}));
end
% count occurences of values from all iterations
% sort them in descending order
occurences = accumarray([iteration{:}]', 1);
[occ val] = sort(occurences, 1, 'descend');
% remove zero occurences and create the Matrix
nonzero = find(occ);
Matrix = [val(nonzero) occ(nonzero)]'
Matrix =
3 9 1 2 4 5 6 8 10 13 14 20
2 2 1 1 1 1 1 1 1 1 1 1
% find minimum iteration number for all occurences
% again, using accumarray with #min function
assoc = accumarray([iteration{:}]', [idx{:}]', [], #min);
nonzero = find(assoc);
result = [nonzero assoc(nonzero)]'
result =
1 2 3 4 5 6 8 9 10 13 14 20
3 3 2 1 1 2 2 1 1 2 2 1