I have a matrix distance = [d11,d12,d13,d14,d15;...;dn1,dn2,dn3,dn4,dn5]; and a vector index(n,1). The values of index are between 1 and 5.
I want to get the sum of the distances, grouped by index, in a vector R(1,5).
An example :
distance = [1,2,4,1,2 ; 4,5,6,1,6 ; 7,8,9,5,8] and index = [1;1;3]
So, I want R(1) = 1+4 = 5, R(2) = 0, R(3) = 9, and R(4) = R(5) = 0
The condition is to not use a loop over 1:5 with an if condition, in order to minimise the execution time when there are billions of points.
Maybe it is possible with arrayfun, but I haven't succeeded.
Best regards
To find out which elements you have to sum up, you can use bsxfun to create a matrix containing a 1 where the value is relevant and a 0 otherwise. This can be done with
bsxfun(@eq, index, 1:5)
which compares every element of index with every element of the row vector [1, 2, 3, 4, 5], giving an n x 5 logical matrix. The result of this function is
ans =
1 0 0 0 0
1 0 0 0 0
0 0 1 0 0
Now you can multiply this matrix with the distance matrix (element-wise!) and finally sum over each column:
>> R = sum(bsxfun(@eq, index, 1:5) .* distance, 1);
which results in
R =
5 0 9 0 0
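For readers working outside MATLAB, here is a rough NumPy translation of the same idea (a sketch, assuming NumPy is available; the variable names mirror the example above). Broadcasting `index[:, None] == np.arange(1, 6)` plays the role of `bsxfun(@eq, index, 1:5)`. The second variant swaps in a different technique, `np.bincount` with weights, which avoids building the n x 5 mask at all.

```python
import numpy as np

# Example data from the question (index values are 1-based, as in MATLAB)
distance = np.array([[1, 2, 4, 1, 2],
                     [4, 5, 6, 1, 6],
                     [7, 8, 9, 5, 8]])
index = np.array([1, 1, 3])

# Mask approach: mask[i, k] is True when index[i] == k + 1
mask = index[:, None] == np.arange(1, 6)
R = (mask * distance).sum(axis=0)

# Alternative: pick distance[i, index[i] - 1] from each row, then
# accumulate those values into 5 bins keyed by index
picked = distance[np.arange(len(index)), index - 1]
R_bincount = np.bincount(index - 1, weights=picked, minlength=5)

print(R)           # [5 0 9 0 0]
print(R_bincount)  # [5. 0. 9. 0. 0.]
```

The bincount variant only touches one value per row, which may matter when n is in the billions; note that it returns floats because of the weights.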
Related
I want to convert a one-hot array to an array of integer values in MATLAB. Given:
Y = 1 0 0
0 1 0
0 1 0
I want to return:
new_y = 1
2
2
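You could use find and return only the column indices. Because find scans the matrix in column-major order, transpose Y first so that the results come out in row order; the row indices of Y.' are the column indices of the 1s in Y:
Y = [1 0 0; 0 1 0; 0 1 0];
[new_y, ~] = find(Y.'); % output: [1; 2; 2], the column index of the 1 in each row of Y
Writing [~, new_y] = find(Y) also gives [1; 2; 2] here, but only because the 1s never move left as you go down the rows; in general it returns the column indices sorted by column rather than by row.
Similarly, if your input was the transpose (one-hot vectors stored as columns), you can return the row indices directly:
[new_y, ~] = find(Y); % for Y = [1 0 0; 0 1 1; 0 0 0], output: [1; 2; 2]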
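A NumPy analogue, sketched under the assumption that NumPy is available: `np.nonzero` returns positions in row-major (C) order, the opposite of MATLAB's column-major `find`, so on the untransposed matrix its column output already comes out in row order. NumPy indices are 0-based, hence the `+ 1`.

```python
import numpy as np

Y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 1, 0]])

# np.nonzero scans row by row, so cols[i] belongs to row i
# when every row contains exactly one 1
rows, cols = np.nonzero(Y)
new_y = cols + 1  # convert to MATLAB-style 1-based labels
print(new_y)  # [1 2 2]

# A case where scan order matters: the 1s move left going down
Y2 = np.array([[0, 1],
               [1, 0]])
print(np.nonzero(Y2)[1] + 1)  # [2 1]
```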
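The Neural Network Toolbox of MATLAB has built-in functions for converting between one-hot vectors and indices: ind2vec() to create a one-hot matrix, and vec2ind() to convert the one-hot matrix back to a vector of indices. Both treat each column as one vector, so for a matrix like the Y in the question, whose rows are one-hot, use vec2ind(Y.').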
Note: ind2vec returns a sparse matrix. To convert it to a full matrix, you have to use the full() function.
>> Y = full(ind2vec([1, 2, 3]))
Y =
1 0 0
0 1 0
0 0 1
>> new_y = vec2ind(Y)
new_y =
1 2 3
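Outside the toolbox, the same round trip can be sketched in NumPy (assuming NumPy; `n_classes` is an assumed parameter, set to 3 to match the question). Indexing rows of an identity matrix builds the one-hot matrix, and `argmax` recovers the positions. Unlike `ind2vec`, this sketch stores each one-hot vector as a row, matching the `Y` in the question.

```python
import numpy as np

labels = np.array([1, 2, 2])   # 1-based class labels
n_classes = 3                  # assumed number of classes

# Row i of Y is the one-hot encoding of labels[i]
Y = np.eye(n_classes, dtype=int)[labels - 1]

# argmax along each row finds the column holding the 1
recovered = Y.argmax(axis=1) + 1

print(Y)
print(recovered)  # [1 2 2]
```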
I have two matrices. One is of size 1,000,000 x 9 and the other is 500,000 x 9.
The columns have the same meaning in both matrices: the first 7 columns act as a key, and the last two columns hold data. There are many overlapping key values in the two matrices, and I would like to build one big matrix to compare the values. This big matrix should be of dimension 1,000,000 x 11.
For example:
A = [0 0 0 0 0 0 0 10 20; 0 0 0 0 0 0 1 30 40];
B = [0 0 0 0 0 0 0 50 60];
A merged matrix would look like this:
C = [0 0 0 0 0 0 0 10 20 50 60; 0 0 0 0 0 0 1 30 40 0 0];
As you can see, the first row of C has columns 8, 9 from matrix A and columns 10, 11 from matrix B. The second row uses columns 8, 9 from matrix A and 0, 0 for the last two columns, because there is no corresponding entry in matrix B.
I have accomplished this task theoretically, but it is very, very slow. I use loops a lot. In any other programming language, I would sort both tables, would iterate both of the tables in one big loop keeping two pointers.
Is there a more efficient algorithm available in Matlab using vectorization or at least a sufficiently efficient one that is idiomatic/short?
(Additional note: my largest issue seems to be the search function. Given my matrix, I would like to pass in a 7x1 column vector, let's call it key, and find the corresponding row. Right now, I use bsxfun for that:
targetRow = data( min(bsxfun(@eq, data(:, 1:7), key(:).'), [], 2) == 1, :); % key(:).' turns key into a 1x7 row so it expands against data(:, 1:7)
I use min because the result of bsxfun is a row of 7 match flags for each row of data, and I obviously want all of them to be true. It seems to me that this could be the bottleneck of a MATLAB algorithm.)
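The row search itself reduces to "all seven key columns are equal". A NumPy sketch of that test (assuming NumPy; `data` and `key` below are made-up values for illustration) uses broadcasting plus `all(axis=1)`, which computes the same thing as the `min(..., [], 2) == 1` trick; in MATLAB, `all(..., 2)` expresses it directly. Either way, each lookup scans all n rows, which is why doing it once per key inside a loop gets slow.

```python
import numpy as np

# Hypothetical data: 7 key columns followed by 2 data columns
data = np.array([[0, 0, 0, 0, 0, 0, 0, 10, 20],
                 [0, 0, 0, 0, 0, 0, 1, 30, 40],
                 [0, 0, 0, 0, 0, 0, 0, 50, 60]])
key = np.array([0, 0, 0, 0, 0, 0, 1])

# Compare every row's first 7 columns against the key, then require
# all 7 comparisons in that row to be True
match = np.all(data[:, :7] == key, axis=1)
target_rows = data[match]

print(match)
print(target_rows)
```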
Maybe with ismember and some indexing:
% Locate in B a matching row for each key in A (current MATLAB releases
% return the first occurrence). idxA has logicals for the keys found,
% and idxB tells us where in B.
[idxA, idxB] = ismember(A(:,1:7), B(:,1:7),'rows');
C = [ A zeros(size(A, 1), 2) ];
C(idxA, 10:11) = B(idxB(idxA), 8:9); % idxB(idxA) keeps only the nonzero entries of idxB
I think this does what you want, only tested with your simple example.
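NumPy has no direct `ismember(..., 'rows')`, so a sketch of the same join (assuming NumPy 1.13 or later for `np.unique(..., axis=0)`) maps every key row of A and B to a shared integer id with `np.unique`, records the first row of B for each id, and then fills columns 10 and 11 (0-based 9:11) in one vectorized assignment. Like the MATLAB version, it keeps only one match per key of A.

```python
import numpy as np

A = np.array([[0, 0, 0, 0, 0, 0, 0, 10, 20],
              [0, 0, 0, 0, 0, 0, 1, 30, 40]])
B = np.array([[0, 0, 0, 0, 0, 0, 0, 50, 60]])

nA, nB = len(A), len(B)

# Give every distinct 7-column key an integer id, shared between A and B
_, key_id = np.unique(np.vstack([A[:, :7], B[:, :7]]), axis=0,
                      return_inverse=True)
key_id = key_id.reshape(-1)          # keep it 1-D across NumPy versions
idA, idB = key_id[:nA], key_id[nA:]

# For each key id, the first row of B that has it (nB means "not in B")
first_in_B = np.full(key_id.max() + 1, nB)
np.minimum.at(first_in_B, idB, np.arange(nB))

locB = first_in_B[idA]               # plays the role of idxB
found = locB < nB                    # plays the role of idxA

C = np.hstack([A, np.zeros((nA, 2), dtype=A.dtype)])
C[found, 9:11] = B[locB[found], 7:9]

print(C)
```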
% Initial matrices
A = [0 0 0 0 0 0 0 10 20;
0 0 0 0 0 0 1 30 40];
B = [0 0 0 0 0 0 0 50 60];
% Stack matrices with common key columns, 8&9 or 10&11 for data columns
C = [[A, zeros(size(A,1),2)]; [B(:,1:7), zeros(size(B,1),2), B(:,8:9)]];
% Sort C so that matching key rows will be consecutive
C = sortrows(C,1:7);
% Loop through rows
curRow = 1;
lastRow = size(C,1); % number of rows still in C, so the last pair is compared too
while curRow < lastRow
if all(C(curRow,1:7) == C(curRow+1,1:7))
% If first 7 cols of 2 rows match, take max values (override 0s)
% It may be safer to initialise the 0 columns to NaNs, as max will
% choose a numeric value over NaN, and it allows your data to be
% negative values.
C(curRow,8:11) = max(C(curRow:curRow+1, 8:11));
% Remove merged row
C(curRow+1,:) = [];
% Decrease size counter for matrix
lastRow = lastRow - 1;
else
% Increase row counter
curRow = curRow + 1;
end
end
Answer:
C = [0 0 0 0 0 0 0 10 20 50 60
0 0 0 0 0 0 1 30 40 0 0]
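The question also mentions the classic sort-then-walk approach with two pointers, which the sorting answer above approximates. A pure-Python sketch of that merge join follows; it assumes each key occurs at most once per table (the question implies this but does not say so), and `merge_join` is a hypothetical helper name. In MATLAB a scalar loop like this tends to be slower than the vectorized `ismember` version, so it is mainly here to make the algorithm concrete.

```python
def merge_join(a_rows, b_rows, key_len=7):
    """Left-join b_rows onto a_rows using the first key_len columns as key.

    Assumes each key occurs at most once per table. Rows of a_rows with no
    match get zeros in place of B's data columns. Output is in key order.
    """
    def key_of(row):
        return tuple(row[:key_len])

    n_extra = len(b_rows[0]) - key_len if b_rows else 0
    a_sorted = sorted(a_rows, key=key_of)
    b_sorted = sorted(b_rows, key=key_of)

    out = []
    j = 0  # pointer into b_sorted; only ever moves forward
    for row in a_sorted:
        k = key_of(row)
        # Advance the B pointer past keys smaller than the current A key
        while j < len(b_sorted) and key_of(b_sorted[j]) < k:
            j += 1
        if j < len(b_sorted) and key_of(b_sorted[j]) == k:
            out.append(list(row) + list(b_sorted[j][key_len:]))
        else:
            out.append(list(row) + [0] * n_extra)
    return out


A = [[0, 0, 0, 0, 0, 0, 0, 10, 20],
     [0, 0, 0, 0, 0, 0, 1, 30, 40]]
B = [[0, 0, 0, 0, 0, 0, 0, 50, 60]]

C = merge_join(A, B)
for row in C:
    print(row)
```

After the two sorts, the walk itself is a single linear pass, which is what makes the approach attractive in languages where loops are cheap.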
I have the following matrix in Matlab:
M = [0 0 1
1 0 0
0 1 0
1 0 0
0 0 1];
Each row has exactly one 1. How can I (without looping) determine a column vector whose elements are the column indices of those 1s, i.e. an element is 2 if the corresponding row has its 1 in the second column, 3 if it is in the third column, etc.? The above example should turn into:
M = [ 3
1
2
1
3];
You can actually solve this with simple matrix multiplication.
result = M * (1:size(M, 2)).';
3
1
2
1
3
This works by multiplying your N x 3 matrix M with a 3 x 1 array whose elements are simply [1; 2; 3]. For each row of M, the matrix product multiplies that row element-wise with the 3 x 1 array and then sums the result. Only the 1 in the row contributes anything, so because there is exactly one "1" per row, the result is the column index where that 1 is located.
For example, for the first row of M:
element_wise_multiplication = [0 0 1] .* [1 2 3]
[0, 0, 3]
sum(element_wise_multiplication)
3
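The same trick carries over directly to NumPy (a sketch, assuming NumPy): `M @ weights` with `weights = [1, 2, 3]` is the matrix-vector product described above.

```python
import numpy as np

M = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])

# Multiply by the vector [1, 2, ..., ncols]; each row's single 1
# selects the weight of the column it sits in
weights = np.arange(1, M.shape[1] + 1)
result = M @ weights

print(result)  # [3 1 2 1 3]
```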
Update
Based on the solutions provided by @rayryeng and @Luis below, I decided to run a comparison to see how the performance of the various methods compared.
To set up the test matrix (M), I created a matrix of the form specified in the original question and varied the number of rows. The column holding the 1 was chosen randomly using randi([1 nCols], 1, nRows). Execution times were measured using timeit.
When run using M of type double (MATLAB's default) you get the following execution times.
If M is logical, the matrix multiplication takes a hit because M has to be converted to a numeric type before the multiplication, whereas the other two methods get slightly faster.
Here is the test code that I used.
sizes = round(linspace(100, 100000, 100));
times = zeros(numel(sizes), 3);
for k = 1:numel(sizes)
M = generateM(sizes(k));
times(k,1) = timeit(@()M * (1:size(M, 2)).');
M = generateM(sizes(k));
% The trailing 2 asks timeit to request two outputs, so the index output is computed
times(k,2) = timeit(@()max(M, [], 2), 2);
M = generateM(sizes(k));
times(k,3) = timeit(@()find(M.'), 2);
end
figure
plot(sizes, times * 1000); % timeit returns seconds; convert to ms
legend({'Multiplication', 'Max', 'Find'})
xlabel('Number of rows in M')
ylabel('Execution Time (ms)')
function M = generateM(nRows)
M = zeros(nRows, 3);
col = randi([1 size(M, 2)], 1, size(M, 1));
M(sub2ind(size(M), 1:numel(col), col)) = 1;
end
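Before timing anything it is worth checking that the methods agree. A NumPy sketch of `generateM` plus such a check (assuming NumPy; `generate_m` and the seed are illustrative) places one 1 per row with paired fancy indexing, the counterpart of the `sub2ind` line above.

```python
import numpy as np


def generate_m(n_rows, n_cols=3, rng=None):
    """Random one-hot matrix: exactly one 1 per row, in a random column."""
    rng = np.random.default_rng() if rng is None else rng
    M = np.zeros((n_rows, n_cols))
    col = rng.integers(0, n_cols, size=n_rows)   # 0-based column per row
    M[np.arange(n_rows), col] = 1                # like sub2ind + linear indexing
    return M, col


M, col = generate_m(1000, rng=np.random.default_rng(0))

by_mult = M @ np.arange(1, M.shape[1] + 1)       # matrix multiplication
by_max = M.argmax(axis=1) + 1                    # second output of max
by_find = np.nonzero(M)[1] + 1                   # find-style, row-major order

assert np.array_equal(by_mult, col + 1)
assert np.array_equal(by_max, col + 1)
assert np.array_equal(by_find, col + 1)
```

Timings could then be taken with the standard `timeit` module, for example `timeit.timeit(lambda: M.argmax(axis=1), number=100)`.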
You can also abuse find and observe the row positions of the transpose of M. You have to transpose the matrix first as find operates in column major order:
M = [0 0 1
1 0 0
0 1 0
1 0 0
0 0 1];
[out,~] = find(M.');
Not sure if this is faster than matrix multiplication though.
Yet another approach: use the second output of max:
[~, result] = max(M.', [], 1);
Or, as suggested by @rayryeng, use max along the second dimension instead of transposing M:
[~, result] = max(M, [], 2);
For
M = [0 0 1
1 0 0
0 1 0
1 0 0
0 0 1];
this gives
result =
3 1 2 1 3
If M contains more than one 1 in a given row, this will give the index of the first such 1.
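That caveat is where the three methods part ways. A NumPy sketch (assuming NumPy) with a hypothetical row containing two 1s shows it: `argmax` behaves like `max` and reports the first 1, the matrix-multiplication trick adds up the column numbers, and a find-style call returns two entries for that row.

```python
import numpy as np

# Hypothetical input: the second row has 1s in columns 1 and 3
M = np.array([[0, 0, 1],
              [1, 0, 1],
              [0, 1, 0]])

first_one = M.argmax(axis=1) + 1              # first 1 in each row
summed = M @ np.arange(1, M.shape[1] + 1)     # sum of column numbers
by_find = np.nonzero(M)[1] + 1                # one entry per 1, not per row

print(first_one)  # [3 1 2]
print(summed)     # [3 4 2]
print(by_find)    # [3 1 3 2]
```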
Is it possible to assign ranges to a matrix?
If you consider the below zeros matrix as a 'grid' for plotting:
R = zeros(5,8);
R =
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
So can you treat this matrix as a grid, so that each zero along the x-axis is considered a range? For example, R(5,1) is the range 0-0.1 seconds, R(5,2) is the range 0.1-0.2 seconds, etc.
Can the range idea also be applied to the columns?
The purpose for this is so I can read cell array data I have already organised into ranges into the zeros matrix to produce a 2d histogram.
Assume you have the times tt and the data values val, where val(i) contains the data value for time tt(i). In your example you would have
tt = [0.02, 0.22, 0.15, 0.08, 0.27, 0.09];
val = [0.5, 1.4, 2.5, 0.6 , 0.8, 0.3 ];
Now you need vectors that represent the time and data ranges that you want (increasing), for example
trange = [0, 0.1, 0.2, 0.3, Inf];
valrange = [0, 1, 2, 3, Inf];
Now you create a matrix of the right size
R = zeros(length(valrange), length(trange));
You can fill the matrix up easily just by looping over all times you have
for i=1:length(tt)
%// We consider the pair tt(i), val(i)
%// First find out, in which time range tt(i) lies:
tind = find(trange > tt(i), 1, 'first');
%// Now find out, in which value range val(i) lies:
valind = find(valrange > val(i), 1, 'first');
%// Now we increase the corresponding matrix entry
R(valind,tind) = R(valind,tind) + 1;
end
Note that the first column corresponds to the time range from -Inf to trange(1), and the last column to the range from trange(end-1) to trange(end) == Inf. Similarly for the first and last rows.
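The loop can also be vectorized. A NumPy sketch (assuming NumPy; indices are 0-based) computes every bin index at once with `np.searchsorted(..., side='right')`, which for increasing edges gives the 0-based counterpart of `find(trange > tt(i), 1, 'first')`. It then accumulates counts with `np.add.at`, which, unlike plain fancy-index assignment, counts every repeated (row, column) pair.

```python
import numpy as np

tt = np.array([0.02, 0.22, 0.15, 0.08, 0.27, 0.09])
val = np.array([0.5, 1.4, 2.5, 0.6, 0.8, 0.3])
trange = np.array([0, 0.1, 0.2, 0.3, np.inf])
valrange = np.array([0, 1, 2, 3, np.inf])

# 0-based index of the first edge strictly greater than each sample
tind = np.searchsorted(trange, tt, side='right')
valind = np.searchsorted(valrange, val, side='right')

R = np.zeros((len(valrange), len(trange)), dtype=int)
np.add.at(R, (valind, tind), 1)   # unbuffered, so duplicate pairs all count

print(R)
```

Row 0 and column 0 stay empty here, matching the note above that the first row and column hold values below the first edge.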
I'm not sure if I understand your question.
If you ask, whether it is possible to assign a vector, e.g. a = [1;2;3], to be a column in some matrix R = zeros(3, 5), then this can be achieved by
R(:, 1) = a;
R(:, 2) = [4;5;6];
Given a vector of integers such as:
X = [1 2 3 4 5 1 2]
I would like to find a really fast way to count how often each unique combination of two consecutive elements occurs.
In this case the two-number combinations are:
[1 2] (occurs twice)
[2 3] (occurs once)
[3 4] (occurs once)
[4 5] (occurs once)
[5 1] (occurs once)
As it stands, I am currently doing this in MATLAB as follows
X = [1 2 3 4 5 1 2];
N = length(X);
X_max = max(X);
COUNTS = nan(X_max); %store as a X_max x X_max matrix
for i = 1:X_max
first_number_indices = find(X==i);
second_number_indices = first_number_indices + 1;
second_number_indices(second_number_indices>N) = []; %just in case last entry = i
second_number_vals = X(second_number_indices);
for j = 1:X_max
COUNTS(i,j) = sum(second_number_vals==j);
end
end
Is there a faster/smarter way of doing this?
Here is a super fast way:
>> counts = sparse(X(1:end-1),X(2:end),1)
counts =
(5,1) 1
(1,2) 2
(2,3) 1
(3,4) 1
(4,5) 1
You could convert to a full matrix simply as: full(counts)
Here is an equivalent solution using accumarray:
>> counts = accumarray([X(1:end-1);X(2:end)]', 1)
counts =
0 2 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
1 0 0 0 0
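The same pair count can be sketched in NumPy (assuming NumPy and positive integer values, as `sparse` and `accumarray` also require). `np.add.at` takes the place of `accumarray`: it adds 1 at every (X(k), X(k+1)) position and counts duplicates. Indices are shifted to 0-based.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 1, 2])
n = X.max()

counts = np.zeros((n, n), dtype=int)
# counts[a-1, b-1] += 1 for every consecutive pair (a, b)
np.add.at(counts, (X[:-1] - 1, X[1:] - 1), 1)

print(counts)
```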
EDIT: @Amro has provided a much better solution (well, better in the vast majority of cases; I suspect my method would work better if MaxX is very large and X contains zeros - this is because the presence of zeros will rule out the use of sparse, while a large MaxX will slow down the accumarray approach as it creates a matrix of size MaxX by MaxX).
EDIT: Thanks to @EitanT for pointing out an improvement that can be made using accumarray.
Here is how I would solve it:
%Generate some random data
T = 20;
MaxX = 3;
X = randi(MaxX, T, 1);
%Get the unique combinations and an index. Note, I am assuming X is a column vector.
[UniqueComb, ~, Ind] = unique([X(1:end-1), X(2:end)], 'rows');
NumComb = size(UniqueComb, 1);
%Count the number of occurrences of each combination
Count = accumarray(Ind, 1);
All unique sequential two element combinations are now stored in UniqueComb, while the corresponding counts for each unique combination are stored in Count.
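In NumPy the `unique` + `accumarray` pair collapses into one call, since `np.unique` can return counts directly. A sketch assuming NumPy 1.13 or later (for `axis=0`), using the small fixed vector from the question instead of random data. Like the `unique`-based MATLAB answer, it does not need the values to be usable as indices, so zeros or negative values are fine.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5, 1, 2])

# Each row is one consecutive pair (X[k], X[k+1])
pairs = np.column_stack([X[:-1], X[1:]])

# Distinct pairs (sorted row-wise) and how often each occurs
unique_comb, count = np.unique(pairs, axis=0, return_counts=True)

for (a, b), c in zip(unique_comb, count):
    print(a, b, c)
```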