Position of integers in vector - matlab

I think the question is pretty basic, but it has kept me busy for some time.
Let's assume we have a vector containing 4 integers, randomly repeated, like:
v = [ 1 3 3 3 4 2 1 2 3 4 3 2 1 4 3 3 4 2 2]
I am searching for the vector of all positions of each integer, e.g. for 1 it should be a vector like:
position_one = [1 7 13]
Since I want to search every row of a 100x10000 matrix, I was not able to deal with linear indices.
Thanks in advance!

Rows and columns
Since your output for every integer changes, a cell array will fit the whole task. For the whole matrix, you can do something like:
A = randi(4,10,30); % some data
Row = repmat((1:size(A,1)).',1,size(A,2)); % index of the row
Col = repmat((1:size(A,2)),size(A,1),1); % index of the column
pos = @(n) [Row(A==n) Col(A==n)]; % Anonymous function to find the indices of 'n'
then for every n you can write:
>> pos(3)
ans =
1 1
2 1
5 1
6 1
9 1
8 2
3 3
. .
. .
. .
where the first column is the row, and the second is the column for every instance of n in A.
And for all ns you can use an arrayfun:
positions = arrayfun(pos,1:max(A(:)),'UniformOutput',false) % a loop that goes over all n's
or a simple for loop (faster):
positions = cell(1,max(A(:)));
for n = 1:max(A(:))
positions(n) = {pos(n)};
end
The output in both cases would be a cell array:
positions =
[70x2 double] [78x2 double] [76x2 double] [76x2 double]
and for every n you can write positions{n}, to get for example:
>> positions{1}
ans =
10 1
2 3
5 3
3 4
5 4
1 5
4 5
. .
. .
. .
Only rows
If all you want is the column index for a given row and n, you can write this:
A = randi(4,10,30);
row_pos = @(k,n) A(k,:)==n;
positions = false(size(A,1),max(A(:)),size(A,2));
for n = 1:max(A(:))
positions(:,n,:) = row_pos(1:size(A,1),n);
end
Now positions is a logical 3-D array in which every row corresponds to a row in A, every column corresponds to a value of n, and the third dimension is the presence vector for that combination of row and n. This way, we can define R to be the column index:
R = 1:size(A,2);
and then find the relevant positions for a given row and n. For instance, the column indices of n=3 in row 9 are:
>> R(positions(9,3,:))
ans =
2 6 18 19 23 24 26 27
This is just like calling find(A(9,:)==3), but if you need to do this many times, finding all indices once and storing them in positions (which is logical, so it is not so big) would be faster.
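If explicit index vectors are preferred over a logical array, the per-row lookup can also be collected into a cell array. The following is only a sketch on a small hand-made matrix; the name colsPerRow is made up for illustration.

```matlab
A = [1 3 3; 3 2 1; 4 4 4];   % small example instead of random data
n = 3;                       % value to look for
% One cell per row of A, each holding the column indices where A equals n
colsPerRow = arrayfun(@(k) find(A(k,:) == n), (1:size(A,1)).', ...
                      'UniformOutput', false);
colsPerRow{1}   % [2 3]
colsPerRow{2}   % 1
colsPerRow{3}   % empty, because row 3 contains no 3
```

Rows of different length are the reason for the cell array: each row of A can contain a different number of matches.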

Find linear indexes in a matrix: I = find(A == 1).
Find two dimensional indexes in matrix A: [row, col] = find(A == 1).
%Create sample matrix, with elements equal one:
A = zeros(5, 4);
A([2 10 14]) = 1
A =
0 0 0 0
1 0 0 0
0 0 0 0
0 0 1 0
0 1 0 0
Find ones as linear indexes in A:
find(A == 1)
ans =
2
10
14
%This is the same as reshaping A to a vector and find ones in the vector:
B = A(:);
find(B == 1);
B' =
0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
Find two dimensional indexes:
[row, col] = find(A == 1)
row =
2
5
4
col =
1
2
3
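Applied to the vector from the question, the same find call gives the positions directly. A minimal sketch (the variable names vals and positions are illustrative):

```matlab
v = [1 3 3 3 4 2 1 2 3 4 3 2 1 4 3 3 4 2 2];
position_one = find(v == 1);   % [1 7 13]
% Positions of every distinct value, one cell per value
vals = unique(v);
positions = arrayfun(@(n) find(v == n), vals, 'UniformOutput', false);
positions{3}                   % [2 3 4 9 11 15 16]
```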

You can do that with accumarray using an anonymous function as follows:
positions = accumarray(v(:), 1:numel(v), [], @(x) {sort(x.')});
For
v = [ 1 3 3 3 4 2 1 2 3 4 3 2 1 4 3 3 4 2 2];
this gives
positions{1} =
1 7 13
positions{2} =
6 8 12 18 19
positions{3} =
2 3 4 9 11 15 16
positions{4} =
5 10 14 17
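For a matrix searched row by row, the same accumarray idea might be extended by using two subscripts, (row, value), and accumulating the column indices. This is a sketch that assumes A contains only positive integers; the names r, c and P are illustrative.

```matlab
A = [1 2 1; 2 2 1];                          % small positive-integer example
[r, c] = ndgrid(1:size(A,1), 1:size(A,2));   % row and column index of every entry
% P{i,n} collects the column indices where row i of A equals n
P = accumarray([r(:) A(:)], c(:), [], @(x) {sort(x.')});
P{1,1}   % [1 3]
P{2,2}   % [1 2]
```

Combinations of row and value that never occur are left as empty cells.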

Related

Remove single elements from a vector

I have a vector M containing single elements and repeats. I want to delete all the single elements. Turning something like [1 1 2 3 4 5 4 4 5] to [1 1 4 5 4 4 5].
I thought I'd try to get the count of each element then use the index to delete what I don't need, something like this:
uniq = unique(M);
list = [uniq histc(M,uniq)];
Though I'm stuck here and not sure how to go forward. Can anyone help?
Here is a solution using unique, histcounts and ismember:
tmp=unique(M) ; %finding unique elements of M
%Now keeping only those elements in tmp which appear only once in M
tmp = tmp(histcounts(M,[tmp tmp(end)])==1); %Thanks to rahnema for his insight on this
[~,ind] = ismember(tmp,M); %finding the indexes of these elements in M
M(ind)=[];
histcounts was introduced in R2014b. For earlier versions, hist can be used by replacing that line with this:
tmp=tmp(hist(M,tmp)==1);
You can get the result with the following code:
A = [a.', ones(length(a),1)];
[C,~,ic] = unique(A(:,1));
result = [C, accumarray(ic,A(:,2))];
a = A(~ismember(A(:,1),result(result(:,2) == 1))).';
The idea is to add a column of ones next to a.', then use accumarray to sum that column grouped by the first column (the elements of a). The elements whose accumulated sum is 1 appear only once in a, so they are finally removed from the first column of A.
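The counting idea can also be sketched more compactly with the third output of unique, which maps every element to its group. This is an illustrative variant, not the code of the answer above; counts and keep are made-up names.

```matlab
M = [1 1 2 3 4 5 4 4 5];
[~, ~, ic] = unique(M);          % ic(k) is the group number of M(k)
counts = accumarray(ic(:), 1);   % number of occurrences per group
% Keep only elements whose group occurs more than once
keep = reshape(counts(ic) > 1, size(M));
result = M(keep);                % [1 1 4 5 4 4 5]
```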
Here is a cheaper alternative:
[s ii] = sort(a);
x = [false s(2:end)==s(1:end-1)]
y = [x(2:end)|x(1:end-1) x(end)]
z(ii) = y;
result = a(z);
Assuming the input is
a =
1 1 8 8 3 1 4 5 4 6 4 5
we sort the list to get s, together with the index ii of the sorted elements
s=
1 1 1 3 4 4 4 5 5 6 8 8
We can find the indices of repeated elements by checking whether each element is equal to the previous element
x =
0 1 1 0 0 1 1 0 1 0 0 1
However, in x the first element of each block of repeats is missing. To recover it, we apply a logical OR between each element of x and the element that follows it
y =
1 1 1 0 1 1 1 1 1 0 1 1
We now have a logical index of the repeated elements in sorted order. It has to be put back into the original order, for which we use the index of the sorted elements ii:
z =
1 1 1 1 0 1 1 1 1 0 1 1
finally use z to extract only the repeated elements.
result =
1 1 8 8 1 4 5 4 4 5
Here is a result of a test in Octave* for the following input:
a = randi([1 100000],1,10000000);
-------HIST--------
Elapsed time is 5.38654 seconds.
----ACCUMARRAY------
Elapsed time is 2.62602 seconds.
-------SORT--------
Elapsed time is 1.83391 seconds.
-------LOOP--------
Doesn't complete in 15 seconds.
*Since histcounts hasn't been implemented in Octave, I used hist instead.
You can test it Online
X = [1 1 2 3 4 5 4 4 5];
Y = X;
A = unique(X);
for i = 1:length(A)
idx = find(X==A(i));
if length(idx) == 1
Y(idx) = NaN;
end
end
Y(isnan(Y)) = [];
Then Y would be [1 1 4 5 4 4 5]. The loop detects all single elements, marks them as NaN, and then removes all NaN elements from the vector.

How can I calculate the relative frequency of a row in a data set using Matlab?

I am new to Matlab and I have a basic question.
I have this data set:
1 2 3
4 5 7
5 2 7
1 2 3
6 5 3
I am trying to calculate the relative frequencies from the dataset above
specifically calculating the relative frequency of x=1, y=2 and z=3
my code is:
data = load('datasetReduced.txt')
X = data(:, 1)
Y = data(:, 2)
Z = data(:, 3)
f = 0;
for i=1:5
if X == 1 & Y == 2 & Z == 3
s = 1;
else
s = 0;
end
f = f + s;
end
f
r = f/5
it is giving me a 0 result.
How can the code be corrected??
thanks,
Shosho
Your issue is likely that you are comparing floating point numbers using the == operator which is likely to fail due to floating point errors.
A faster way to do this would be to use ismember with the 'rows' option which will result in a logical array that you can then sum to get the total number of rows that matched and divide by the total number of rows.
tf = ismember(data, [1 2 3], 'rows');
relFreq = sum(tf) / numel(tf);
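It may also help to note that the loop in the question never uses i: the condition compares the whole columns X, Y and Z at once, and an if on a vector is true only when every element is true, so f stays 0. A sketch of the loop with per-row indexing, using the data from the question typed in directly instead of loaded from the file:

```matlab
data = [1 2 3; 4 5 7; 5 2 7; 1 2 3; 6 5 3];
nRows = size(data, 1);
f = 0;
for i = 1:nRows
    % Compare the entries of row i only, not the whole columns
    if data(i,1) == 1 && data(i,2) == 2 && data(i,3) == 3
        f = f + 1;
    end
end
r = f / nRows                                    % 0.4 (2 of 5 rows match)
% Same result without a loop
r2 = mean(ismember(data, [1 2 3], 'rows'));
```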
I think you want to count the frequency of each value, so try this:
data = [1 2 3
4 5 7
5 2 7
1 2 3
6 5 3];
[counts,centers] = hist(data , unique(data))
Here centers holds the unique values and counts holds the count of each of them per column. The result should be as follows:
counts =
2 0 0
0 3 0
0 0 3
1 0 0
1 2 0
1 0 0
0 0 2
centers =
1 2 3 4 5 6 7
This means you have 7 unique values, from 1 to 7; there are two 1s in the first column and none in the second and third, and so on.

Shift rows in matrix with respect to vector values in Octave/MATLAB

Can I shift rows in matrix A with respect to values in vector v?
For instance A and v specified as follows:
A =
1 0 0
1 0 0
1 0 0
v =
0 1 2
In this case I want to get this matrix from A:
A =
1 0 0
0 1 0
0 0 1
Every i-th row in A has been shifted by i-th value in v
Can I do this operation with native functions?
Or should I write it by myself?
I've tried circshift function, but I couldn't figure out how to shift rows separately.
The function circshift does not work as you want and even if you use a vector for the amount of shift, that is interpreted as the amount of shift for each dimension. While it is possible to loop over the rows of your matrix, that will not be very efficient.
More efficient is if you compute the indexing for each row which is actually quite simple:
## First, prepare all your input
octave> A = randi (9, 4, 6)
A =
8 3 2 7 4 5
4 4 7 3 9 1
1 6 3 9 2 3
7 4 1 9 5 5
octave> v = [0 2 0 1];
octave> sz = size (A);
## Compute how much shift per row, the column index (this will not work in Matlab)
octave> c_idx = mod ((0:(sz(2) -1)) .- v(:), sz(2)) +1
c_idx =
1 2 3 4 5 6
5 6 1 2 3 4
1 2 3 4 5 6
6 1 2 3 4 5
## Convert it to linear index
octave> idx = sub2ind (sz, repmat ((1:sz(1))(:), 1, sz(2)) , c_idx);
## All you need is to index
octave> A = A(idx)
A =
8 3 2 7 4 5
9 1 4 4 7 3
1 6 3 9 2 3
5 7 4 1 9 5
% A and v as above. These could be function input arguments
A = [1 0 0; 1 0 0; 1 0 0];
v = [0 1 2];
assert (all (size (v) == [1, size(A, 1)]), ...
'v needs to be a horizontal vector with as many elements as rows of A');
% Calculate shifted indices
[r, c] = size (A);
tmp = mod (repmat (0 : c-1, r, 1) - repmat (v.', 1, c), c) + 1;
Out = A(sub2ind ([r, c], repmat ([1 : r].', 1, c), tmp))
Out =
1 0 0
0 1 0
0 0 1
If performance is an issue, you can replace repmat with an equivalent bsxfun call which is more efficient (I use repmat here for simplicity to demonstrate the approach).
With focus on performance, here's one approach using bsxfun/broadcasting -
[m,n] = size(A);
idx0 = mod(bsxfun(@plus,n-v(:),1:n)-1,n);
out = A(bsxfun(@plus,(idx0*m),(1:m)'))
Sample run -
A =
1 7 5 7 7
4 8 5 7 6
4 2 6 3 2
v =
3 1 2
out =
5 7 7 1 7
6 4 8 5 7
3 2 4 2 6
Equivalent Octave version to use automatic broadcasting would look something like this -
[m,n] = size(A);
idx0 = mod( ((n-v(:)) + (1:n)) -1 ,n);
out = A((idx0*m)+(1:m)')
Shift vector with circshift in loop, iterating row index.
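A minimal sketch of that loop, using A and v from the question (out is an illustrative name):

```matlab
A = [1 0 0; 1 0 0; 1 0 0];
v = [0 1 2];
out = zeros(size(A));
for k = 1:size(A, 1)
    % Shift row k to the right by v(k) positions (second dimension)
    out(k,:) = circshift(A(k,:), [0 v(k)]);
end
out   % 3x3 identity matrix for this input
```

This is simpler to read than the index-based versions, at the cost of one circshift call per row.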

Greatest values in a matrix, row by row - matlab

I have an m-by-n matrix. For each row, I want to find the position of the k greatest values, and set the others to 0.
Example, for k=2
I HAVE
[1 2 3 5
4 5 9 3
2 6 7 1]
I WANT
[0 0 3 5
0 5 9 0
0 6 7 0]
You can achieve it easily using the second output of sort:
data = [ 1 2 3 5
4 5 9 3
2 6 7 1 ];
k = 2;
[M N] = size(data);
[~, ind] = sort(data,2);
data(repmat((1:M).',1,N-k) + (ind(:,1:N-k)-1)*M) = 0;
In the example, this gives
>> data
data =
0 0 3 5
0 5 9 0
0 6 7 0
You can use prctile command to find the threshold per-line.
prctile returns percentiles of the values in the rows of data and thus can be easily tweaked to return the threshold value above which the k largest elements of each row lie:
T = prctile( data, 100*(1 - k/size(data,2)), 2 ); % find the threshold
out = bsxfun(@gt, data, T) .* data; % set lower than T to zero
For the data matrix posted in the question we get
>> out
out =
0 0 3 5
0 5 9 0
0 6 7 0
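A threshold comparison can keep fewer than k values in a row when several entries tie at the threshold. One possible alternative, sketched here under the assumption that exactly k entries per row should survive, builds a logical mask from the sort indices (keep and rows are illustrative names):

```matlab
data = [1 2 3 5; 4 5 9 3; 2 6 7 1];
k = 2;
[m, n] = size(data);
[~, ind] = sort(data, 2, 'descend');   % column indices, largest value first
keep = false(m, n);
rows = repmat((1:m).', 1, k);          % row number for each kept entry
keep(sub2ind([m n], rows, ind(:,1:k))) = true;
out = data .* keep                     % [0 0 3 5; 0 5 9 0; 0 6 7 0]
```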

Split matrix based on number in first column

I have a matrix which has the following form:
M =
[1 4 56 1;
1 3 5 1;
1 3 6 4;
2 3 5 0;
2 0 0 0;
3 1 2 3;
3 3 3 3]
I want to split this matrix based on the number given in the first column. So I want to split the matrix into this:
A =
[1 4 56 1;
1 3 5 1;
1 3 6 4]
B =
[2 3 5 0;
2 0 0 0]
C =
[3 1 2 3;
3 3 3 3]
I tried this by making the following loop, but this gave me the desired matrices with rows of zeros:
for i = 1:length(M)
if (M(i,1) == 1)
A(i,:) = M(i,:);
elseif (M(i,1) == 2)
B(i,:) = M(i,:);
elseif (M(i,1) == 3)
C(i,:) = M(i,:);
end
end
The result for matrix B is then for example:
B =
[0 0 0 0;
0 0 0 0;
0 0 0 0;
2 3 5 0;
2 0 0 0]
How should I solve this issue?
Additional information:
The actual data has a date in the first column in the form yyyymmdd. The data set spans several years and I want to split this dataset in matrices for each year and after that for each month.
You can use arrayfun to solve this task:
M = [
1 4 56 1;
1 3 5 1;
1 3 6 4;
2 3 5 0;
2 0 0 0;
3 1 2 3;
3 3 3 3]
A = arrayfun(@(x) M(M(:,1) == x, :), unique(M(:,1)), 'uniformoutput', false)
The result A is a cell array and its contents can be accessed as follows:
>> A{1}
ans =
1 4 56 1
1 3 5 1
1 3 6 4
>> A{2}
ans =
2 3 5 0
2 0 0 0
>> A{3}
ans =
3 1 2 3
3 3 3 3
To split the data based on a yyyymmdd format in the first column, you can use the following:
yearly = arrayfun(@(x) M(floor(M(:,1)/10000) == x, :), unique(floor(M(:,1)/10000)), 'uniformoutput', false)
monthly = arrayfun(@(x) M(floor(M(:,1)/100) == x, :), unique(floor(M(:,1)/100)), 'uniformoutput', false)
If you don't know how many outputs you'll have, it is most convenient to put the data into a cell array rather than into separate arrays. The command to do this is MAT2CELL. Note that this assumes your data is sorted. If it isn't, use sortrows before running the code.
%# count the repetitions
counts = hist(M(:,1),unique(M(:,1)));
%# split the array
yearly = mat2cell(M,counts,size(M,2))
%# if you'd like to split each cell further, but still keep
%# the data also grouped by year, you can do the following
%# assuming the month information is in column 2
yearByMonth = cellfun(@(x)...
mat2cell(x,hist(x(:,2),unique(x(:,2))),size(x,2)),...
yearly,'uniformOutput',false);
You'd then access the data for year 3, month 4 as yearByMonth{3}{4}
EDIT
If the first column of your data is yyyymmdd, I suggest splitting it into three columns yyyy,mm,dd, like below, to facilitate grouping afterward:
ymd = 20120918;
yymmdd = floor(ymd./[10000 100 1])
yymmdd(2:3) = yymmdd(2:3)-100*yymmdd(1:2)
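For a whole column of dates, the same split might be written with floor and mod, which avoids the element-wise division of a scalar by a vector. A sketch with made-up sample dates; the names dates, yyyy, mm, dd and byYear are illustrative.

```matlab
dates = [20120918; 20120101; 20131231];   % hypothetical yyyymmdd column
yyyy = floor(dates / 10000);              % leading four digits
mm   = floor(mod(dates, 10000) / 100);    % middle two digits
dd   = mod(dates, 100);                   % last two digits
[yyyy mm dd]                              % [2012 9 18; 2012 1 1; 2013 12 31]
% Group the rows by year using the third output of unique
[yrs, ~, g] = unique(yyyy);
byYear = arrayfun(@(k) dates(g == k), (1:numel(yrs)).', 'UniformOutput', false);
```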