The best way to ask my question is via a clear example. Consider 2 timelines (e.g. time in seconds) A and B where the intervals for each timeline are:
intervals_a =
0 1
1 4
4 7
7 9
intervals_b =
0 2
2 3
3 5
5 8
Notice that the first a-interval overlaps the first b-interval. The second a-interval overlaps the first, second, and third b-intervals, and so on.
Ultimately, I need an output which shows the indices of a-intervals which overlap with b-intervals as below:
output =
1 1 \\ 1st a-interval overlaps 1st b-interval
2 1 \\ 2nd a-interval overlaps 1st b-interval
2 2 \\ 2nd a-interval overlaps 2nd b-interval
2 3 \\ 2nd a-interval overlaps 3rd b-interval
3 3 \\ etc...
3 4
4 4
The big challenge is: The solution cannot contain for/while loops ("why" is irrelevant). Can this be done efficiently using vector / matrix / array / sort or other tools? A MATLAB implementation would be perfect, but any other language is fine. Thanks in advance!
To find overlapping intervals, you need to check if the start-time or the end-time of one interval falls within the boundaries of another. To do that with for all intervals at once, you could use bsxfun:
ovlp = #(x, y)bsxfun(#ge, x(:, 1), y(:, 1)') & bsxfun(#le, x(:, 1), y(:, 2)');
idx = ovlp(intervals_a, intervals_b) | ovlp(intervals_b, intervals_a)';
[row, col] = ind2sub(size(idx), find(idx));
output = [row, col];
Example
Let's see how this works for your example:
intervals_a = [0 1; 1 4; 4 7; 7 9]
intervals_b = [0 2; 2 3; 3 5; 5 8]
The anonymous function ovlp checks if the start-times in x (that is, x(:, 1)) fall inside the intervals given in y. Therefore, ovlp(intervals_a, intervals_b) yields:
ans =
1 0 0 0
1 0 0 0
0 0 1 0
0 0 0 1
The '1's indicate where start-time of interval_a falls inside interval_b. The row number is the index of the interval in intervals_a, and the column number is the index of the interval in intervals_b.
We need to do the same process for the start-times of intervals_b to find all the overlapping intervals, and we do a logical OR between the two results:
idx = ovlp(intervals_a, intervals_b) | ovlp(intervals_b, intervals_a)'
Notice that the second result is transposed, to keep the rows corresponding with intervals_a and not intervals_b. The resulting matrix idx is:
idx =
1 0 0 0
1 1 1 0
0 0 1 1
0 0 0 1
The final step is to translate the matrix idx into indices in intervals_a and intervals_b, so we obtain the row and column numbers of the '1's and concatenate them:
[row, col] = ind2sub(size(idx), find(idx));
output = [row, col];
The final result is:
output =
1 1
2 1
2 2
2 3
3 3
3 4
4 4
You want the indices (ii,jj) of int_A and int_B such that int_A(ii,1) > int_B(jj,1) and int_A(ii,2)
NA = size(A_int,1);
NB = size(int_B,1);
ABlower= repmat(A_int(:,1),[1,NB]);
ABupper= repmat(A_int(:,2),[1,NB]);
BAlower= repmat(B_int(:,1),[1,NA])';
BAupper= repmat(B_int(:,2),[1,NA])';
inInt = find((ABlower>BAlower & ABlower < BAupper) | (ABupper>BAlower & ABupper<BAupper);
[ii,jj]=ind2sub([NA,NB], inInt);
I don't have acces to Matlab right now but I believe this is very close...
Related
I created a cell array of shape m x 2, each element of which is a matrix of shape d x d.
For example like this:
A = cell(8, 2);
for row = 1:8
for col = 1:2
A{row, col} = rand(3, 3);
end
end
More generally, I can represent A as follows:
where each A_{ij} is a matrix.
Now, I need to randomly pick a matrix from each row of A, because A has m rows in total, so eventually I will pick out m matrices, which we call a combination.
Obviously, since there are only two picks for each row, there are a total of 2^m possible combinations.
My question is, how to get these 2^m combinations quickly?
It can be seen that the above problem is actually finding the Cartesian product of the following sets:
2^m is actually a binary number, so we can use those to create linear indices. You'll get an array containing 1s and 0s, something like [1 1 0 0 1 0 1 0 1], which we can treat as column "indices", using a 0 to indicate the first column and a 1 to indicate the second.
m = size(A, 1);
% Build all binary numbers and create a logical matrix
bin_idx = dec2bin(0:(2^m -1)) == '1';
row = 3; % Loop here over size(bin_idx,1) for all possible permutations
linear_idx = [find(~bin_idx(row,:)) find(bin_idx(row,:))+m];
A{linear_idx} % the combination as specified by the permutation in out(row)
On my R2007b version this runs virtually instant for m = 20.
NB: this will take m * 2^m bytes of memory to store bin_idx. Where that's just 20 MB for m = 20, that's already 30 GB for m = 30, i.e. you'll be running out of memory fairly quickly, and that's for just storing permutations as booleans! If m is large in your case, you can't store all of your possibilities anyway, so I'd just select a random one:
bin_idx = rand(m, 1); % Generate m random numbers
bin_idx(bin_idx > 0.5) = 1; % Set half to 1
bin_idx(bin_idx < 0.5) = 0; % and half to 0
Old, slow answer for large m
perms()1 gives you all possible permutations of a given set. However, it does not take duplicate entries into account, so you'll need to call unique() to get the unique rows.
unique(perms([1,1,2,2]), 'rows')
ans =
1 1 2 2
1 2 1 2
1 2 2 1
2 1 1 2
2 1 2 1
2 2 1 1
The only thing left now is to somehow do this over all possible amounts of 1s and 2s. I suggest using a simple loop:
m = 5;
out = [];
for ii = 1:m
my_tmp = ones(m,1);
my_tmp(ii:end) = 2;
out = [out; unique(perms(my_tmp),'rows')];
end
out = [out; ones(1,m)]; % Tack on the missing all-ones row
out =
2 2 2 2 2
1 2 2 2 2
2 1 2 2 2
2 2 1 2 2
2 2 2 1 2
2 2 2 2 1
1 1 2 2 2
1 2 1 2 2
1 2 2 1 2
1 2 2 2 1
2 1 1 2 2
2 1 2 1 2
2 1 2 2 1
2 2 1 1 2
2 2 1 2 1
2 2 2 1 1
1 1 1 2 2
1 1 2 1 2
1 1 2 2 1
1 2 1 1 2
1 2 1 2 1
1 2 2 1 1
2 1 1 1 2
2 1 1 2 1
2 1 2 1 1
2 2 1 1 1
1 1 1 1 2
1 1 1 2 1
1 1 2 1 1
1 2 1 1 1
2 1 1 1 1
1 1 1 1 1
NB: I've not initialised out, which will be slow especially for large m. Of course out = zeros(2^m, m) will be its final size, but you'll need to juggle the indices within the for loop to account for the changing sizes of the unique permutations.
You can create linear indices from out using find()
linear_idx = [find(out(row,:)==1);find(out(row,:)==2)+size(A,1)];
A{linear_idx} % the combination as specified by the permutation in out(row)
Linear indices are row-major in MATLAB, thus whenever you need the matrix in column 1, simply use its row number and whenever you need the second column, use the row number + size(A,1), i.e. the total number of rows.
Combining everything together:
A = cell(8, 2);
for row = 1:8
for col = 1:2
A{row, col} = rand(3, 3);
end
end
m = size(A,1);
out = [];
for ii = 1:m
my_tmp = ones(m,1);
my_tmp(ii:end) = 2;
out = [out; unique(perms(my_tmp),'rows')];
end
out = [out; ones(1,m)];
row = 3; % Loop here over size(out,1) for all possible permutations
linear_idx = [find(out(row,:)==1).';find(out(row,:)==2).'+m];
A{linear_idx} % the combination as specified by the permutation in out(row)
1 There's a note in the documentation:
perms(v) is practical when length(v) is less than about 10.
I have a vector M containing single elements and repeats. I want to delete all the single elements. Turning something like [1 1 2 3 4 5 4 4 5] to [1 1 4 5 4 4 5].
I thought I'd try to get the count of each element then use the index to delete what I don't need, something like this:
uniq = unique(M);
list = [uniq histc(M,uniq)];
Though I'm stuck here and not sure how to go forward. Can anyone help?
Here is a solution using unique, histcounts and ismember:
tmp=unique(M) ; %finding unique elements of M
%Now keeping only those elements in tmp which appear only once in M
tmp = tmp(histcounts(M,[tmp tmp(end)])==1); %Thanks to rahnema for his insight on this
[~,ind] = ismember(tmp,M); %finding the indexes of these elements in M
M(ind)=[];
histcounts was introduced in R2014b. For earlier versions, hist can be used by replacing that line with this:
tmp=tmp(hist(M,tmp)==1);
You can get the result with the following code:
A = [a.', ones(length(a),1)];
[C,~,ic] = unique(A(:,1));
result = [C, accumarray(ic,A(:,2))];
a = A(~ismember(A(:,1),result(result(:,2) == 1))).';
The idea is, add ones to the second column of a', then accumarray base on the first column (elements of a). After that, found the elements in first column which have accum sum in the second column. Therefore, these elements repeated once in a. Finally, removing them from the first column of A.
Here is a cheaper alternative:
[s ii] = sort(a);
x = [false s(2:end)==s(1:end-1)]
y = [x(2:end)|x(1:end-1) x(end)]
z(ii) = y;
result = a(z);
Assuming the input is
a =
1 1 8 8 3 1 4 5 4 6 4 5
we sort the list s and get index of the sorted list ii
s=
1 1 1 3 4 4 4 5 5 6 8 8
we can find index of repeated elements and for it we check if an element is equal to the previous element
x =
0 1 1 0 0 1 1 0 1 0 0 1
however in x the first elements of each block is omitted to find it we can apply [or] between each element with the previous element
y =
1 1 1 0 1 1 1 1 1 0 1 1
we now have sorted logical index of repeated elements. It should be reordered to its original order. For it we use index of sorted elements ii :
z =
1 1 1 1 0 1 1 1 1 0 1 1
finally use z to extract only the repeated elements.
result =
1 1 8 8 1 4 5 4 4 5
Here is a result of a test in Octave* for the following input:
a = randi([1 100000],1,10000000);
-------HIST--------
Elapsed time is 5.38654 seconds.
----ACCUMARRAY------
Elapsed time is 2.62602 seconds.
-------SORT--------
Elapsed time is 1.83391 seconds.
-------LOOP--------
Doesn't complete in 15 seconds.
*Since in Octave histcounts hasn't been implemented so instead of histcounts I used hist.
You can test it Online
X = [1 1 2 3 4 5 4 4 5];
Y = X;
A = unique(X);
for i = 1:length(A)
idx = find(X==A(i));
if length(idx) == 1
Y(idx) = NaN;
end
end
Y(isnan(Y)) = [];
Then, Y would be [1 1 4 5 4 4 5]. It detects all single elements, and makes them as NaN, and then remove all NaN elements from the vector.
Given a vector A that contains a sequence of numbers.
The objective is to find all series (longer than a given number "threshold") that contain the same value. The result should be the position of both first and last values of that series.
Example: given a vector A where:
A = [1 1 1 2 1 3 3 3 1 1 1 1 1 4 3 2 2 2 2 2 2 2 3 4];
and a threshold B = 5;
The results would be:
[9 13] % a series contain only the number 1 with length equal to 5
[16 22] % a series contain only the number 2 with length equal to 7
A=[1 1 1 2 1 3 3 3 1 1 1 1 1 4 3 2 2 2 2 2 2 2 3 4];
B = 5;
[l c]= size(A); % to know the size of 'A'
K=1; % to define the length of the series
W=1; % a value used to save the positions of the wanted series.
For i=1:c-1
If A(i)==A(i+1)
K=k+1;
Else
If k>= B % to test of the actual series is equal or longer than the given threshold
S(w,1)=i;
S(w,2)= S(w,1)-k+1; % saving the first position and the last position of the series in 'S'
w=w+1;
end
k=1;
end
S % the final result which is a table contain all wanted series.
the result is as follow:
S 13 9 % 13: the last position of the wanted series and 9 is the first position
16 22
This one work soo good. But still... it is soo slow when its come to a big table.
A faster, vectorized option is to modify the approach from this solution for finding islands of zeroes:
A = [1 1 1 2 1 3 3 3 1 1 1 1 1 4 3 2 2 2 2 2 2 2 3 4]; % Sample data
B = 5; % Threshold
tsig = (diff(A) ~= 0);
dsig = diff([1 tsig 1]);
startIndex = find(dsig < 0);
endIndex = find(dsig > 0)-1;
duration = endIndex-startIndex+1;
stringIndex = (duration >= (B-1));
result = [startIndex(stringIndex); endIndex(stringIndex)+1].';
And the results:
result =
9 13
16 22
I am new to Matlab and I have a basic question.
I have this data set:
1 2 3
4 5 7
5 2 7
1 2 3
6 5 3
I am trying to calculate the relative frequencies from the dataset above
specifically calculating the relative frequency of x=1, y=2 and z=3
my code is:
data = load('datasetReduced.txt')
X = data(:, 1)
Y = data(:, 2)
Z = data(:, 3)
f = 0;
for i=1:5
if X == 1 & Y == 2 & Z == 3
s = 1;
else
s = 0;
end
f = f + s;
end
f
r = f/5
it is giving me a 0 result.
How can the code be corrected??
thanks,
Shosho
Your issue is likely that you are comparing floating point numbers using the == operator which is likely to fail due to floating point errors.
A faster way to do this would be to use ismember with the 'rows' option which will result in a logical array that you can then sum to get the total number of rows that matched and divide by the total number of rows.
tf = ismember(data, [1 2 3], 'rows');
relFreq = sum(tf) / numel(tf);
I think you want to count frequency of each instance, So try this
data = [1 2 3
4 5 7
5 2 7
1 2 3
6 5 3];
[counts,centers] = hist(data , unique(data))
Where centers is your unique instances and counts is count of each of them. The result should be as follow:
counts =
2 0 0
0 3 0
0 0 3
1 0 0
1 2 0
1 0 0
0 0 2
centers =
1 2 3 4 5 6 7
That it means you have 7 unique instances, from 1 to 7 and there is two 1s in first column and there is not any 1s in second and third and etc.
I think the question is pretty basic, but still keeps me busy since some time.
Lets assume we have a vector containing 4 integers randomly repetetive, like:
v = [ 1 3 3 3 4 2 1 2 3 4 3 2 1 4 3 3 4 2 2]
I am searching for the vector of all positions of each integer, e.g. for 1 it should be a vector like:
position_one = [1 7 13]
Since I want to search every row of a 100x10000 matrix I was not able to deal with linear indeces.
Thanks in advance!
Rows and columns
Since your output for every integer changes, a cell array will fit the whole task. For the whole matrix, you can do something like:
A = randi(4,10,30); % some data
Row = repmat((1:size(A,1)).',1,size(A,2)); % index of the row
Col = repmat((1:size(A,2)),size(A,1),1); % index of the column
pos = #(n) [Row(A==n) Col(A==n)]; % Anonymous function to find the indices of 'n'
than for every n you can write:
>> pos(3)
ans =
1 1
2 1
5 1
6 1
9 1
8 2
3 3
. .
. .
. .
where the first column is the row, and the second is the column for every instance of n in A.
And for all ns you can use an arrayfun:
positions = arrayfun(pos,1:max(A(:)),'UniformOutput',false) % a loop that goes over all n's
or a simple for loop (faster):
positions = cell(1,max(A(:)));
for n = 1:max(A(:))
positions(n) = {pos(n)};
end
The output in both cases would be a cell array:
positions =
[70x2 double] [78x2 double] [76x2 double] [76x2 double]
and for every n you can write positions{n}, to get for example:
>> positions{1}
ans =
10 1
2 3
5 3
3 4
5 4
1 5
4 5
. .
. .
. .
Only rows
If all you want in the column index per a given row and n, you can write this:
A = randi(4,10,30);
row_pos = #(k,n) A(k,:)==n;
positions = false(size(A,1),max(A(:)),size(A,2));
for n = 1:max(A(:))
positions(:,n,:) = row_pos(1:size(A,1),n);
end
now, positions is a logical 3-D array, that every row corresponds to a row in A, every column corresponds to a value of n, and the third dimension is the presence vector for the combination of row and n. this way, we can define R to be the column index:
R = 1:size(A,2);
and then find the relevant positions for a given row and n. For instance, the column indices of n=3 in row 9 is:
>> R(positions(9,3,:))
ans =
2 6 18 19 23 24 26 27
this would be just like calling find(A(9,:)==3), but if you need to perform this many times, the finding all indices and store them in positions (which is logical so it is not so big) would be faster.
Find linear indexes in a matrix: I = find(A == 1).
Find two dimensional indexes in matrix A: [row, col] = find(A == 1).
%Create sample matrix, with elements equal one:
A = zeros(5, 4);
A([2 10 14]) = 1
A =
0 0 0 0
1 0 0 0
0 0 0 0
0 0 1 0
0 1 0 0
Find ones as linear indexes in A:
find(A == 1)
ans =
2
10
14
%This is the same as reshaping A to a vector and find ones in the vector:
B = A(:);
find(B == 1);
B' =
0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
Find two dimensional indexes:
[row, col] = find(A == 1)
row =
2
5
4
col =
1
2
3
You can do that with accumarray using an anonymous function as follows:
positions = accumarray(v(:), 1:numel(v), [], #(x) {sort(x.')});
For
v = [ 1 3 3 3 4 2 1 2 3 4 3 2 1 4 3 3 4 2 2];
this gives
positions{1} =
1 7 13
positions{2} =
6 8 12 18 19
positions{3} =
2 3 4 9 11 15 16
positions{4} =
5 10 14 17