Remove single elements from a vector - matlab

I have a vector M containing single elements and repeats. I want to delete all the single elements, so that, for example, [1 1 2 3 4 5 4 4 5] becomes [1 1 4 5 4 4 5].
I thought I'd try to get the count of each element then use the index to delete what I don't need, something like this:
uniq = unique(M);
list = [uniq histc(M,uniq)];
Though I'm stuck here and not sure how to go forward. Can anyone help?

Here is a solution using unique, histcounts and ismember:
tmp=unique(M) ; %finding unique elements of M
%Now keeping only those elements in tmp which appear only once in M
tmp = tmp(histcounts(M,[tmp tmp(end)])==1); %Thanks to rahnema for his insight on this
[~,ind] = ismember(tmp,M); %finding the indexes of these elements in M
M(ind)=[];
histcounts was introduced in R2014b. For earlier versions, hist can be used by replacing that line with this:
tmp=tmp(hist(M,tmp)==1);

You can get the result with the following code:
A = [a.', ones(length(a),1)];
[C,~,ic] = unique(A(:,1));
result = [C, accumarray(ic,A(:,2))];
a = A(~ismember(A(:,1),result(result(:,2) == 1))).';
The idea is to put a column of ones next to a.', then use accumarray to sum those ones grouped by the first column (the elements of a). The elements whose accumulated sum is 1 occur only once in a, so they are finally removed from the first column of A.
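The same counting idea can be sketched more compactly by building a logical mask instead of the intermediate two-column matrix. This is only a minimal sketch on the example vector from the question; the names counts and keep are mine, not from the answer above.

```matlab
M = [1 1 2 3 4 5 4 4 5];

[~, ~, ic] = unique(M);           % ic(k) is the group number of M(k)
counts = accumarray(ic(:), 1);    % number of occurrences of each unique value
keep = counts(ic) > 1;            % true where the element occurs more than once
keep = reshape(keep, size(M));    % match the orientation of M
result = M(keep);                 % original order is preserved
```

Because the mask is applied to M directly, the remaining elements stay in their original order.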

Here is a cheaper alternative:
[s ii] = sort(a);
x = [false s(2:end)==s(1:end-1)]
y = [x(2:end)|x(1:end-1) x(end)]
z(ii) = y;
result = a(z);
Assuming the input is
a =
1 1 8 8 3 1 4 5 4 6 4 5
we sort the list to get s, along with the indices ii of the sorted elements:
s=
1 1 1 3 4 4 4 5 5 6 8 8
To find the repeated elements, we check whether each element is equal to the previous one:
x =
0 1 1 0 0 1 1 0 1 0 0 1
However, x omits the first element of each block of repeats. To include it, we apply a logical OR between each element and the next one:
y =
1 1 1 0 1 1 1 1 1 0 1 1
We now have a logical index of the repeated elements in sorted order. It has to be put back into the original order, using the sort indices ii:
z =
1 1 1 1 0 1 1 1 1 0 1 1
Finally, use z to extract only the repeated elements:
result =
1 1 8 8 1 4 5 4 4 5
Here is a result of a test in Octave* for the following input:
a = randi([1 100000],1,10000000);
-------HIST--------
Elapsed time is 5.38654 seconds.
----ACCUMARRAY------
Elapsed time is 2.62602 seconds.
-------SORT--------
Elapsed time is 1.83391 seconds.
-------LOOP--------
Doesn't complete in 15 seconds.
*histcounts is not implemented in Octave, so I used hist instead.

X = [1 1 2 3 4 5 4 4 5];
Y = X;
A = unique(X);
for i = 1:length(A)
idx = find(X==A(i));
if length(idx) == 1
Y(idx) = NaN;
end
end
Y(isnan(Y)) = [];
Then Y is [1 1 4 5 4 4 5]. The loop finds every element that occurs only once and marks it as NaN; all NaN elements are then removed from the vector.

Related

How can I calculate the relative frequency of a row in a data set using Matlab?

I am new to Matlab and I have a basic question.
I have this data set:
1 2 3
4 5 7
5 2 7
1 2 3
6 5 3
I am trying to calculate the relative frequencies from the dataset above
specifically calculating the relative frequency of x=1, y=2 and z=3
my code is:
data = load('datasetReduced.txt')
X = data(:, 1)
Y = data(:, 2)
Z = data(:, 3)
f = 0;
for i=1:5
if X == 1 & Y == 2 & Z == 3
s = 1;
else
s = 0;
end
f = f + s;
end
f
r = f/5
it is giving me a 0 result.
How can the code be corrected??
thanks,
Shosho
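The loop never uses i: X, Y and Z are whole columns, so the condition X == 1 & Y == 2 & Z == 3 compares entire vectors, and if treats a vector condition as true only when every element is true. That is why f stays 0. Indexing inside the loop (X(i), Y(i), Z(i)) fixes it. If the data were not exact integers, comparing with == could additionally fail because of floating point error.
A faster way to do this is to use ismember with the 'rows' option, which returns a logical array; sum it to count the matching rows and divide by the total number of rows.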
tf = ismember(data, [1 2 3], 'rows');
relFreq = sum(tf) / numel(tf);
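For comparison, here is a sketch of the loop from the question with the row index applied inside the condition, next to the ismember version. The data is typed in directly instead of being loaded from datasetReduced.txt, and n is a name I introduced for the row count.

```matlab
data = [1 2 3; 4 5 7; 5 2 7; 1 2 3; 6 5 3];
n = size(data, 1);

% Loop version: test one row at a time
f = 0;
for i = 1:n
    if data(i,1) == 1 && data(i,2) == 2 && data(i,3) == 3
        f = f + 1;
    end
end
r = f / n;

% Vectorized version from the answer above
tf = ismember(data, [1 2 3], 'rows');
relFreq = sum(tf) / numel(tf);
```

Both r and relFreq come out as 2/5 for this data set.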
I think you want to count the frequency of each value, so try this:
data = [1 2 3
4 5 7
5 2 7
1 2 3
6 5 3];
[counts,centers] = hist(data , unique(data))
Here centers holds the unique values and counts holds the number of occurrences of each value in each column. The result should be as follows:
counts =
2 0 0
0 3 0
0 0 3
1 0 0
1 2 0
1 0 0
0 0 2
centers =
1 2 3 4 5 6 7
This means there are 7 unique values, from 1 to 7: there are two 1s in the first column and none in the second and third columns, and so on.
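Note that hist counts individual values per column, not whole rows. If the relative frequency of every distinct row is wanted, one possible sketch combines unique with the 'rows' option and accumarray. The variable names rowsU, counts and relFreq are mine.

```matlab
data = [1 2 3; 4 5 7; 5 2 7; 1 2 3; 6 5 3];

[rowsU, ~, ic] = unique(data, 'rows');   % distinct rows, sorted; ic maps each row to its group
counts = accumarray(ic(:), 1);           % how many times each distinct row occurs
relFreq = counts / size(data, 1);        % relative frequency of each distinct row
```

Row k of rowsU then has relative frequency relFreq(k); for this data the row [1 2 3] comes first.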

How find rows and columns in matlab

I have variable matrix :
A = [1 2 8 8 1
4 6 8 1 1
5 3 1 1 8];
and I have variable B :
B=[2 3 1 8 8];
The question is how to find, for each element of B, its row and column in A (searching row by row).
For example, the first element of B is 2, so I want to find the value 2 in A and get its row and column, and so on up to the fifth element. If a position has already been used, the next matching position should be taken (e.g. elements 4 and 5 of B have the same value).
rows;
columns;
Result is:
rows = 1 3 1 1 1
columns = 2 2 1 3 4
You can use find and ind2sub to achieve what you want,
but for that you have to transpose A first:
A = [1 2 8 8 1
4 6 8 1 1
5 3 1 1 8];
B= [2 3 1 8 8];
TMP = A.';
for i = 1:length(B)
indx = find(TMP== B(i),1,'first') %Finding the element of B present in A
if(~isempty(indx )) % If B(i) is a member of A
[column(i),row(i)] = ind2sub(size(TMP),indx) % store it in row and column matrix
TMP(indx) = nan; % remove that element
end
end
column =
2 2 1 3 4
row =
1 3 1 1 1
As Usama suggested in a comment, you can preallocate the outputs
by using
row = zeros(1,sum(ismember(B,A)))
column= zeros(1,sum(ismember(B,A)))
The above code works even if there are some members of B not present in A
Use find. The function can return either linear indices or row/column indices.
Using linear index a solution could be
idx = zeros(size(B));
for i = 1:numel(B)
% Find all indexes
tmpIdx = find(A == B(i));
% Remove those already used
tmpIdx = setdiff(tmpIdx, idx);
% Get the first new unique
idx(i) = tmpIdx(1);
end
% Convert index to row and col
[rows, cols] = ind2sub(size(A),idx)
Giving:
rows = 1 3 1 1 2
cols = 2 2 1 3 3
Note that because linear indexing runs down column by column, the result here differs from the one in your example (although it is still a valid set of indices):
rows = 1 3 1 1 1
columns= 2 2 1 3 4
To get this result, you could transpose A (A.') and swap the rows and cols returned by ind2sub.
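A sketch of that transposed variant, assuming every element of B occurs in A often enough (otherwise tmpIdx is empty and tmpIdx(1) errors). The name At is mine.

```matlab
A = [1 2 8 8 1
     4 6 8 1 1
     5 3 1 1 8];
B = [2 3 1 8 8];

At = A.';                            % column-major order of At is row-major order of A
idx = zeros(size(B));
for i = 1:numel(B)
    tmpIdx = find(At == B(i));       % all linear indices of B(i) in At
    tmpIdx = setdiff(tmpIdx, idx);   % drop the ones already used
    idx(i) = tmpIdx(1);              % take the first unused one
end
% Rows of At are columns of A, so the two outputs are swapped
[cols, rows] = ind2sub(size(At), idx);
```

For this input, rows is [1 3 1 1 1] and cols is [2 2 1 3 4], which matches the result asked for in the question.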
Here is one solution that uses a for loop; I tried to keep the number of iterations and the computational cost low. If a value of B has no match in A, the row/col index is NaN.
[Bu,~,ord] = unique(B,'stable');
% Index of each different values
[col,row] = arrayfun(@(x) find(A'==x),Bu,'UniformOutput',0)
% For each value in vector B we search the first "non already used" corresponding value in A.
for i = 1:length(B)
if ~isempty(row{ord(i)})
r(i) = row{ord(i)}(1);
row{ord(i)}(1) = [];
c(i) = col{ord(i)}(1);
col{ord(i)}(1) = [];
else
r(i) = NaN;
c(i) = NaN;
end
end
RESULT:
c = [2 2 1 3 4]
r = [1 3 1 1 1]

Position of integers in vector

I think the question is pretty basic, but it has kept me busy for some time.
Let's assume we have a vector containing 4 integers that repeat randomly, like:
v = [ 1 3 3 3 4 2 1 2 3 4 3 2 1 4 3 3 4 2 2]
I am searching for the vector of all positions of each integer, e.g. for 1 it should be a vector like:
position_one = [1 7 13]
Since I want to search every row of a 100x10000 matrix, I was not able to work with linear indices.
Thanks in advance!
Rows and columns
Since the size of the output differs for every integer, a cell array fits the task. For the whole matrix, you can do something like:
A = randi(4,10,30); % some data
Row = repmat((1:size(A,1)).',1,size(A,2)); % index of the row
Col = repmat((1:size(A,2)),size(A,1),1); % index of the column
pos = @(n) [Row(A==n) Col(A==n)]; % Anonymous function to find the indices of 'n'
Then for every n you can write:
>> pos(3)
ans =
1 1
2 1
5 1
6 1
9 1
8 2
3 3
. .
. .
. .
where the first column is the row, and the second is the column for every instance of n in A.
And for all ns you can use an arrayfun:
positions = arrayfun(pos,1:max(A(:)),'UniformOutput',false) % a loop that goes over all n's
or a simple for loop (faster):
positions = cell(1,max(A(:)));
for n = 1:max(A(:))
positions(n) = {pos(n)};
end
The output in both cases would be a cell array:
positions =
[70x2 double] [78x2 double] [76x2 double] [76x2 double]
and for every n you can write positions{n}, to get for example:
>> positions{1}
ans =
10 1
2 3
5 3
3 4
5 4
1 5
4 5
. .
. .
. .
Only rows
If all you want is the column index for a given row and n, you can write this:
A = randi(4,10,30);
row_pos = @(k,n) A(k,:)==n;
positions = false(size(A,1),max(A(:)),size(A,2));
for n = 1:max(A(:))
positions(:,n,:) = row_pos(1:size(A,1),n);
end
Now positions is a logical 3-D array in which each row corresponds to a row of A, each column corresponds to a value of n, and the third dimension is the presence vector for that combination of row and n. We can then define R as the vector of column indices:
R = 1:size(A,2);
and then find the relevant positions for a given row and n. For instance, the column indices of n=3 in row 9 are:
>> R(positions(9,3,:))
ans =
2 6 18 19 23 24 26 27
This is equivalent to calling find(A(9,:)==3), but if you need to do it many times, finding all the indices once and storing them in positions (which is logical, so it is not very large) would be faster.
Find linear indexes in a matrix: I = find(A == 1).
Find two dimensional indexes in matrix A: [row, col] = find(A == 1).
%Create sample matrix, with elements equal one:
A = zeros(5, 4);
A([2 10 14]) = 1
A =
0 0 0 0
1 0 0 0
0 0 0 0
0 0 1 0
0 1 0 0
Find ones as linear indexes in A:
find(A == 1)
ans =
2
10
14
%This is the same as reshaping A to a vector and find ones in the vector:
B = A(:);
find(B == 1);
B' =
0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
Find two dimensional indexes:
[row, col] = find(A == 1)
row =
2
5
4
col =
1
2
3
You can do that with accumarray using an anonymous function as follows:
positions = accumarray(v(:), 1:numel(v), [], @(x) {sort(x.')});
For
v = [ 1 3 3 3 4 2 1 2 3 4 3 2 1 4 3 3 4 2 2];
this gives
positions{1} =
1 7 13
positions{2} =
6 8 12 18 19
positions{3} =
2 3 4 9 11 15 16
positions{4} =
5 10 14 17
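For readers less familiar with accumarray, the same cell array can be built with a plain loop over the values. This is just an equivalent sketch for a single vector, assuming v contains positive integers from 1 to max(v).

```matlab
v = [1 3 3 3 4 2 1 2 3 4 3 2 1 4 3 3 4 2 2];

positions = cell(1, max(v));       % one cell per integer value
for n = 1:max(v)
    positions{n} = find(v == n);   % positions of n, in increasing order
end
```

positions{1} is then [1 7 13], as in the question. For a matrix, the loop body could be applied to each row in turn.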

average 3rd column when 1st and 2nd column have same numbers

To keep it simple, assume that I have a 10x3 matrix in MATLAB. The first two columns of each row are the x and y position, and the third column is the corresponding value. For instance, [1 4 12] means that the value of the function at x=1 and y=4 is 12. Some rows share the same x and y, and I want to average the values of the rows with the same x, y and replace them with a single row holding the average.
For example :
A = [1 4 12
1 4 14
1 4 10
1 5 5
1 5 7];
I want to have
B = [1 4 12
1 5 6]
I really appreciate your help
Thanks
Ali
Like this?
A = [1 4 12;1 4 14;1 4 10; 1 5 5;1 5 7];
[x,y] = consolidator(A(:,1:2),A(:,3),@mean);
B = [x,y]
B =
1 4 12
1 5 6
Consolidator is on the File Exchange.
Using built-in functions:
sparsemean = accumarray(A(:,1:2), A(:,3).', [], @mean, 0, true);
[i,j,v] = find(sparsemean);
B = [i(:) j(:) v(:)]; % force columns: find returns rows only when sparsemean has a single row
A = [1 4 12;1 4 14;1 4 10; 1 5 5;1 5 7]; %your example data
B = unique(A(:, 1:2), 'rows'); %find the unique xy pairs
C = nan(size(B, 1), 1); % one mean per unique pair (size(B,1), not length(B))
% calculate means
for ii = 1:size(B, 1)
    C(ii) = mean(A(A(:, 1) == B(ii, 1) & A(:, 2) == B(ii, 2), 3));
end
C =
12
6
The step inside the for loop uses logical indexing to find the mean of rows that match the current xy pair in the loop.
Use unique to get the unique rows and use the returned indexing array to find the ones that should be averaged and ask accumarray to do the averaging part:
[C,~,J]=unique(A(:,1:2), 'rows');
B=[C, accumarray(J,A(:,3),[],@mean)];
For your example
>> [C,~,J]=unique(A(:,1:2), 'rows')
C =
1 4
1 5
J =
1
1
1
2
2
C contains the unique rows and J shows which row of C each row of the original matrix corresponds to. Then
>> accumarray(J,A(:,3),[],@mean)
ans =
12
6
returns the desired averages and
>> B=[C, accumarray(J,A(:,3),[],@mean)]
B =
1 4 12
1 5 6
is the answer.
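One property of the unique-based approach is worth noting: it does not require x and y to be positive integers, whereas passing the coordinates directly to accumarray as subscripts does. A small sketch with made-up non-integer coordinates (the data below is illustrative only, not from the question):

```matlab
% Hypothetical data with non-integer x values
A = [0.5 4 12
     0.5 4 14
     0.5 4 10
     1.5 5 5
     1.5 5 7];

[C, ~, J] = unique(A(:,1:2), 'rows');            % J maps each row of A to a row of C
B = [C, accumarray(J(:), A(:,3), [], @mean)];    % average the third column per group
```

Here B is [0.5 4 12; 1.5 5 6], because the group numbers in J, not the coordinates themselves, are used as subscripts.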

Overlapping time intervals WITHOUT for/while loops

The best way to ask my question is via a clear example. Consider 2 timelines (e.g. time in seconds) A and B where the intervals for each timeline are:
intervals_a =
0 1
1 4
4 7
7 9
intervals_b =
0 2
2 3
3 5
5 8
Notice that the first a-interval overlaps the first b-interval. The second a-interval overlaps the first, second, and third b-intervals, and so on.
Ultimately, I need an output which shows the indices of a-intervals which overlap with b-intervals as below:
output =
1 1 \\ 1st a-interval overlaps 1st b-interval
2 1 \\ 2nd a-interval overlaps 1st b-interval
2 2 \\ 2nd a-interval overlaps 2nd b-interval
2 3 \\ 2nd a-interval overlaps 3rd b-interval
3 3 \\ etc...
3 4
4 4
The big challenge is: The solution cannot contain for/while loops ("why" is irrelevant). Can this be done efficiently using vector / matrix / array / sort or other tools? A MATLAB implementation would be perfect, but any other language is fine. Thanks in advance!
To find overlapping intervals, you need to check if the start-time or the end-time of one interval falls within the boundaries of another. To do that for all intervals at once, you could use bsxfun:
ovlp = @(x, y)bsxfun(@ge, x(:, 1), y(:, 1)') & bsxfun(@le, x(:, 1), y(:, 2)');
idx = ovlp(intervals_a, intervals_b) | ovlp(intervals_b, intervals_a)';
[row, col] = ind2sub(size(idx), find(idx));
output = [row, col];
Example
Let's see how this works for your example:
intervals_a = [0 1; 1 4; 4 7; 7 9]
intervals_b = [0 2; 2 3; 3 5; 5 8]
The anonymous function ovlp checks if the start-times in x (that is, x(:, 1)) fall inside the intervals given in y. Therefore, ovlp(intervals_a, intervals_b) yields:
ans =
1 0 0 0
1 0 0 0
0 0 1 0
0 0 0 1
The '1's indicate where start-time of interval_a falls inside interval_b. The row number is the index of the interval in intervals_a, and the column number is the index of the interval in intervals_b.
We need to do the same process for the start-times of intervals_b to find all the overlapping intervals, and we do a logical OR between the two results:
idx = ovlp(intervals_a, intervals_b) | ovlp(intervals_b, intervals_a)'
Notice that the second result is transposed, to keep the rows corresponding with intervals_a and not intervals_b. The resulting matrix idx is:
idx =
1 0 0 0
1 1 1 0
0 0 1 1
0 0 0 1
The final step is to translate the matrix idx into indices in intervals_a and intervals_b, so we obtain the row and column numbers of the '1's and concatenate them:
[row, col] = ind2sub(size(idx), find(idx));
output = [row, col];
The final result is:
output =
1 1
2 1
2 2
2 3
3 3
3 4
4 4
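An equivalent way to express the test, shown here only as a sketch, uses the fact that two closed intervals overlap exactly when each one starts no later than the other ends. The names startsBeforeEnd and endsAfterStart are mine; like the version above, this counts intervals that merely touch at an endpoint as overlapping.

```matlab
intervals_a = [0 1; 1 4; 4 7; 7 9];
intervals_b = [0 2; 2 3; 3 5; 5 8];

% Entry (i,j) compares a-interval i with b-interval j
startsBeforeEnd = bsxfun(@le, intervals_a(:,1), intervals_b(:,2).');
endsAfterStart  = bsxfun(@ge, intervals_a(:,2), intervals_b(:,1).');
idx = startsBeforeEnd & endsAfterStart;

[row, col] = find(idx);   % find returns row and column subscripts directly
output = [row, col];
```

For the example intervals this produces the same seven index pairs as above, with only two bsxfun calls.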
You want the indices (ii,jj) of int_A and int_B such that int_A(ii,1) > int_B(jj,1) and int_A(ii,2)
NA = size(int_A,1);
NB = size(int_B,1);
% Expand both sets of bounds to NA-by-NB so every pair is compared at once
ABlower = repmat(int_A(:,1),[1,NB]);
ABupper = repmat(int_A(:,2),[1,NB]);
BAlower = repmat(int_B(:,1),[1,NA])';
BAupper = repmat(int_B(:,2),[1,NA])';
% True where an endpoint of an A interval lies strictly inside a B interval
inInt = find((ABlower>BAlower & ABlower<BAupper) | (ABupper>BAlower & ABupper<BAupper));
[ii,jj] = ind2sub([NA,NB], inInt);
I don't have access to MATLAB right now, but I believe this is very close...