I am implementing a neural network and am trying to one hot encode a matrix of column vectors based on the max value in each column. Previously, I had been iterating through the matrix vector by vector, but I've been told that this is unnecessary and that I can actually one hot encode every column vector in the matrix at the same time. Unfortunately, after perusing SO, GitHub, and MathWorks, I haven't found anything that gets the job done. I've listed my previous code below. Please help! Thanks :)
UPDATE:
This is what I am trying to accomplish...except this only changed the max value in the entire matrix to 1. I want to change the max value in each COLUMN to 1.
one_hots = bsxfun(@eq, mini_batch_activations, max(mini_batch_activations(:)))
UPDATE 2:
This is what I am looking for, but it only works for rows. I need columns.
V = max(mini_batch_activations,[],2);
idx = mini_batch_activations == V;
Iterative code:
% This is the matrix I want to one hot encode
mini_batch_activations = activations{length(layers)};
% For each vector in the mini_batch:
for m = 1:size(mini_batch_activations, 2)
  % Isolate column vector for mini_batch
  vector = mini_batch_activations(:,m);
  % One hot encode vector to compare to target vector
  one_hot = zeros(size(mini_batch_activations, 1), 1);
  [max_val, ind] = max(vector);
  one_hot(ind) = 1;
  % Isolate corresponding column vector in targets
  mini_batch = mini_batch_y{k};
  target_vector = mini_batch(:,m);
  % Compare one_hot to target vector, and increment result if they match
  if isequal(one_hot, target_vector)
    num_correct = num_correct + 1;
  endif
  ...
endfor
You’ve got the maxima for each column:
V = max(mini_batch_activations,[],1); % note 1, not 2!
Now all you need to do is an equality comparison; the output is a logical array that readily converts to 0s and 1s. Note that MATLAB (since R2016b) and Octave do implicit singleton expansion:
one_hot = mini_batch_activations==V;
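For instance, a small sketch with a made-up 3-by-4 activations matrix (the numbers are illustrative only); on releases before R2016b you can write the comparison with bsxfun(@eq, ...) instead:
mini_batch_activations = [0.1 0.9  0.3 0.2;
                          0.7 0.05 0.4 0.6;
                          0.2 0.05 0.3 0.2];
V = max(mini_batch_activations, [], 1);    % 1-by-4 row vector of column maxima
one_hots = mini_batch_activations == V;    % logical matrix with a single 1 per column
% one_hots =
%   0  1  0  0
%   1  0  1  1
%   0  0  0  0

% If mini_batch holds the one-hot target columns (as in the loop above),
% the whole accuracy count collapses to one line:
% num_correct = sum(all(one_hots == mini_batch, 1));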
I have a vector of numbers (temperatures), and I am using the MATLAB function mink to extract the 5 smallest numbers from the vector to form a new variable. However, the numbers extracted using mink are automatically ordered from lowest to largest (of those 5 numbers). Ideally, I would like to retain the sequence of the numbers as they are arranged in the original vector. I hope my problem is easy to understand. I appreciate any advice.
The function mink that you use was introduced in MATLAB 2017b. It has (as Andras Deak mentioned) two output arguments:
[B,I] = mink(A,k);
The second output argument contains the indices, such that B == A(I).
To obtain the set B but sorted as they appear in A, simply sort the vector of indices I:
B = A(sort(I));
For example:
>> A = [5,7,3,1,9,4,6];
>> [~,I] = mink(A,3);
>> A(sort(I))
ans =
3 1 4
For older versions of MATLAB, it is possible to reproduce mink using sort:
function [B,I] = mink(A,k)
  [B,I] = sort(A);
  B = B(1:k);
  I = I(1:k);
Note that, in the above, you don't need the B output; your ordered_mink can be written as follows:
function B = ordered_mink(A,k)
  [~,I] = sort(A);
  B = A(sort(I(1:k)));
Note: This solution assumes A is a vector. For matrix A, see Andras' answer, which he wrote up at the same time as this one.
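For instance, re-using the example vector from above, the fallback gives the same result as the built-in mink plus sort(I):
>> A = [5,7,3,1,9,4,6];
>> ordered_mink(A,3)
ans =
3 1 4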
First you'll need the corresponding indices for the extracted values from mink using its two-output form:
[vals, inds] = mink(array, 5); % 5 smallest values, as in the question
Then you only need to order the items in vals according to increasing indices in inds. There are multiple ways to do this, but they all revolve around sorting inds and using the corresponding order on vals. The simplest way is to put these vectors into a matrix and sort the rows:
sorted_rows = sortrows([inds, vals]); % sort on indices
and then just extract the corresponding column
reordered_vals = sorted_rows(:,2); % items now ordered as they appear in "array"
A less straightforward possibility for doing the sorting after the above call to mink is to take the permutation that sorts inds and use its inverse to scatter vals back into their original order:
[~, order] = sort(inds);             % permutation that sorts the indices
inv_order = order;                   % just allocation, really
inv_order(order) = 1:numel(order);   % construct the inverse permutation
reordered_vals = vals;               % just allocation, really
reordered_vals(inv_order) = vals;    % should be the same as previously
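As a quick check, with the same example as in the other answer (A = [5,7,3,1,9,4,6] and k = 3), the intermediate values work out as:
A = [5,7,3,1,9,4,6];
[vals, inds] = mink(A, 3);           % vals = [1 3 4], inds = [4 3 6]
[~, order] = sort(inds);             % order = [2 1 3]
inv_order = order;
inv_order(order) = 1:numel(order);   % inv_order = [2 1 3]
reordered_vals = vals;
reordered_vals(inv_order) = vals     % reordered_vals = [3 1 4], the order they appear in A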
I'm trying to insert multiple values into an array using a 'values' array and a 'counter' array. For example, if:
a=[1,3,2,5]
b=[2,2,1,3]
I want the output of some function
c=somefunction(a,b)
to be
c=[1,1,3,3,2,5,5,5]
Where a(1) recurs b(1) number of times, a(2) recurs b(2) times, etc...
Is there a built-in function in MATLAB that does this? I'd like to avoid using a for loop if possible. I've tried variations of 'repmat()' and 'kron()' to no avail.
This is basically run-length decoding.
Problem Statement
We have an array of values, vals and runlengths, runlens:
vals = [1,3,2,5]
runlens = [2,2,1,3]
We need to repeat each element in vals as many times as the corresponding element in runlens. Thus, the final output would be:
output = [1,1,3,3,2,5,5,5]
Prospective Approach
One of the fastest tools in MATLAB is cumsum, and it is very useful when vectorizing problems that work on irregular patterns. In the stated problem, the irregularity comes from the different elements in runlens.
Now, to exploit cumsum, we need to do two things here: initialize an array of zeros, and place "appropriate" values at "key" positions over that zeros array, such that after cumsum is applied we end up with a final array in which each element of vals is repeated its runlens number of times.
Steps: Let's number the above mentioned steps to give the prospective approach an easier perspective:
1) Initialize the zeros array: What must its length be? Since each element is repeated its runlens number of times, the length of the zeros array must be the sum of all runlens.
2) Find key positions/indices: These key positions are the places along the zeros array where each element from vals starts to repeat.
Thus, for runlens = [2,2,1,3], the key positions mapped onto the zeros array would be:
[X 0 X 0 X X 0 0] % where X's are those key positions.
3) Find appropriate values: The final nail to be hammered before using cumsum is to put the "appropriate" values into those key positions. Since we apply cumsum right after, if you think closely, you need a differenced version of vals obtained with diff, so that cumsum over it brings back the original values. Because these differenced values are placed on the zeros array at positions separated by the runlens distances, after using cumsum we get each vals element repeated runlens times as the final output.
Solution Code
Here's the implementation stitching up all the above mentioned steps -
% Calculate cumulative sums of runlens.
% We need this to initialize the zeros array and to find the key positions later on.
clens = cumsum(runlens)
% Initialize zeros array
array = zeros(1,clens(end))
% Find key positions/indices
key_pos = [1 clens(1:end-1)+1]
% Find appropriate values
app_vals = diff([0 vals])
% Map app_vals onto array at key_pos
array(key_pos) = app_vals
% cumsum array for final output
output = cumsum(array)
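For the inputs from the question, the intermediate arrays work out as follows:
vals = [1,3,2,5];  runlens = [2,2,1,3];
% clens    = [2 4 5 8]
% key_pos  = [1 3 5 6]
% app_vals = diff([0 1 3 2 5]) = [1 2 -1 3]
% array    = [1 0 2 0 -1 3 0 0]
% output   = cumsum(array) = [1 1 3 3 2 5 5 5]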
Pre-allocation Hack
As can be seen, the above code uses pre-allocation with zeros. Now, according to this Undocumented MATLAB blog post on faster pre-allocation, one can achieve much faster pre-allocation with -
array(clens(end)) = 0; % instead of array = zeros(1,(clens(end)))
Wrapping up: Function Code
To wrap up everything, we would have a compact function code to achieve this run-length decoding like so -
function out = rle_cumsum_diff(vals,runlens)
    clens = cumsum(runlens);
    idx(clens(end)) = 0;
    idx([1 clens(1:end-1)+1]) = diff([0 vals]);
    out = cumsum(idx);
    return;
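Calling it with the inputs from the question reproduces the expected output:
>> rle_cumsum_diff([1,3,2,5], [2,2,1,3])
ans =
1 1 3 3 2 5 5 5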
Benchmarking
Benchmarking Code
Listed next is the benchmarking code to compare runtimes and speedups for the stated cumsum+diff approach in this post against the other cumsum-only based approach, run on MATLAB R2014b -
datasizes = [reshape(linspace(10,70,4).'*10.^(0:4),1,[]) 10^6 2*10^6];
fcns = {'rld_cumsum','rld_cumsum_diff'}; % approaches to be benchmarked

for k1 = 1:numel(datasizes)
    n = datasizes(k1); % Create random inputs
    vals = randi(200,1,n);
    runs = [5000 randi(200,1,n-1)]; % 5000 acts as an aberration
    for k2 = 1:numel(fcns) % Time approaches
        tsec(k2,k1) = timeit(@() feval(fcns{k2}, vals,runs), 1);
    end
end

figure, % Plot runtimes
loglog(datasizes,tsec(1,:),'-bo'), hold on
loglog(datasizes,tsec(2,:),'-k+')
set(gca,'xgrid','on'),set(gca,'ygrid','on'),
xlabel('Datasize ->'), ylabel('Runtimes (s)')
legend(upper(strrep(fcns,'_',' '))),title('Runtime Plot')

figure, % Plot speedups
semilogx(datasizes,tsec(1,:)./tsec(2,:),'-rx')
set(gca,'ygrid','on'), xlabel('Datasize ->')
legend('Speedup(x) with cumsum+diff over cumsum-only'),title('Speedup Plot')
Associated function code for rld_cumsum.m:
function out = rld_cumsum(vals,runlens)
    index = zeros(1,sum(runlens));
    index([1 cumsum(runlens(1:end-1))+1]) = 1;
    out = vals(cumsum(index));
    return;
Runtime and Speedup Plots
Conclusions
The proposed approach seems to be giving us a noticeable speedup, about 3x, over the cumsum-only approach!
Why is this new cumsum+diff based approach better than the previous cumsum-only approach?
Well, the essence of the reason lies in the final step of the cumsum-only approach, which has to map the cumsum-ed values back into vals. In the new cumsum+diff based approach, we instead take a diff over vals, for which MATLAB processes only n elements (where n is the number of run-lengths), compared with mapping sum(runlens) elements in the cumsum-only approach. Since sum(runlens) is typically many times larger than n, we get the noticeable speedup with this new approach!
Benchmarks
Updated for R2015b: repelem now fastest for all data sizes.
Tested functions:
MATLAB's built-in repelem function that was added in R2015a
gnovice's cumsum solution (rld_cumsum)
Divakar's cumsum+diff solution (rld_cumsum_diff)
knedlsepp's accumarray solution (knedlsepp5cumsumaccumarray) from this post
Naive loop-based implementation (naive_jit_test.m) to test the just-in-time compiler
Results of test_rld.m on R2015b:
Old timing plot using R2015a here.
Findings:
repelem is always the fastest by roughly a factor of 2.
rld_cumsum_diff is consistently faster than rld_cumsum.
Old findings (R2015a):
repelem is fastest for small data sizes (less than about 300-500 elements)
rld_cumsum_diff becomes significantly faster than repelem around 5 000 elements
repelem becomes slower than rld_cumsum somewhere between 30 000 and 300 000 elements
rld_cumsum has roughly the same performance as knedlsepp5cumsumaccumarray
naive_jit_test.m has nearly constant speed and on par with rld_cumsum and knedlsepp5cumsumaccumarray for smaller sizes, a little faster for large sizes
Old rate plot using R2015a here.
Conclusion
On R2015a, use repelem below about 5 000 elements and the cumsum+diff solution above; on R2015b, repelem is the fastest across all data sizes.
There's no built-in function I know of, but here's one solution:
index = zeros(1,sum(b));
index([1 cumsum(b(1:end-1))+1]) = 1;
c = a(cumsum(index));
Explanation:
A vector of zeros is first created with the same length as the output array (i.e. the sum of all the replications in b). Ones are then placed in the first element and in each subsequent element that marks where a new sequence of values starts in the output. The cumulative sum of the vector index can then be used to index into a, replicating each value the desired number of times.
For the sake of clarity, this is what the various vectors look like for the values of a and b given in the question:
index = [1 0 1 0 1 1 0 0]
cumsum(index) = [1 1 2 2 3 4 4 4]
c = [1 1 3 3 2 5 5 5]
EDIT: For the sake of completeness, there is another alternative using ARRAYFUN, but this seems to take anywhere from 20-100 times longer to run than the above solution with vectors up to 10,000 elements long:
c = arrayfun(@(x,y) x.*ones(1,y),a,b,'UniformOutput',false);
c = [c{:}];
There is finally (as of R2015a) a built-in and documented function to do this, repelem. The following syntax, where the second argument is a vector, is relevant here:
W = repelem(V,N), with vector V and vector N, creates a vector W where element V(i) is repeated N(i) times.
Or put another way, "Each element of N specifies the number of times to repeat the corresponding element of V."
Example:
>> a=[1,3,2,5]
a =
1 3 2 5
>> b=[2,2,1,3]
b =
2 2 1 3
>> repelem(a,b)
ans =
1 1 3 3 2 5 5 5
The performance problems in MATLAB's built-in repelem have been fixed as of R2015b. I have run the test_rld.m program from chappjc's post in R2015b, and repelem is now faster than the other algorithms by about a factor of 2.
Can I have something like
A=1:10;
A(1:2 && 5:6)=0;
meaning I want to zero out specific ranges within my vector index expression in one line
Is that possible?
And what if I wanted to zero out all the rest like
A(~[1:2]) = 0
What's the way of logical NOT within vector indexing?
Thanks
The following should work:
idx = [1:2,5:6];
A(idx) = 0
If you want to zero the complement of the vector of indices:
idx = [1:2,5:6];
A(~ismembc(1:length(A),idx)) = 0
Where ismembc is a faster, lightweight version of ismember that assumes the array is sorted and non-sparse with no NaN elements. (Credit goes to this question.)
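If you'd rather avoid the undocumented ismembc, a plain logical mask does the same job (a small sketch, not from the original answer):
idx = [1:2,5:6];
keep = false(size(A));
keep(idx) = true;    % true at the positions you want to keep
A(~keep) = 0         % zero out everything else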
Just do A([1:2 5:6]) = 0. I.e., just create a vector of the indices you want to zero out.
Let us say I have the following:
M = randn(10,20);
T = randn(1,20);
I would like to threshold each column of M by the corresponding entry of T. For example, find all indices of all elements of M(:,1) that are greater than T(1), find all indices of all elements in M(:,2) that are greater than T(2), and so on.
Of course, I would like to do this without a for-loop. Is this possible?
You can use bsxfun like this:
I = bsxfun(@gt, M, T);
Then I will be a logical matrix of size(M) with ones where M(:,i) > T(i).
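If you actually need the indices per column rather than the logical mask, one option (just a sketch; the cell-array layout is an arbitrary choice) is:
idx_per_col = arrayfun(@(c) find(I(:,c)), 1:size(M,2), 'UniformOutput', false);
% idx_per_col{i} holds the row indices of the elements of M(:,i) that exceed T(i)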
You can use bsxfun to do things like this, but it may not be faster than a for loop (more below on this).
result = bsxfun(@gt,M,T)
This will do an element-wise comparison and return a logical matrix indicating the relationship governed by the first argument. I have posted code below to show the direct comparison, indicating that it does return what you are looking for.
% variable declaration
M = randn(10,20);
T = randn(1,20);

% quick method
fastres = bsxfun(@gt,M,T);

% looping method
res = false(size(M));
for i = 1:length(T)
    res(:,i) = M(:,i) > T(i);
end

% check to see if the two matrices are identical
isMatch = all(all(fastres == res))
This function is very powerful and can be used to help speed up processes, but keep in mind that it will only speed things up if there is a lot of data. There is a bit of background work that bsxfun must do, which can actually cause it to be slower.
I would only recommend using it if you have several thousand data points. Otherwise, the traditional for-loop will actually be faster. Try it out for yourself by changing the size of the M and T variables.
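For example, a rough way to time the two approaches on bigger inputs (the sizes here are arbitrary; adjust them to your data):
M = randn(5000, 2000);
T = randn(1, 2000);

tic; I1 = bsxfun(@gt, M, T); t_bsxfun = toc;

tic;
I2 = false(size(M));
for i = 1:numel(T)
    I2(:,i) = M(:,i) > T(i);
end
t_loop = toc;

fprintf('bsxfun: %.4f s, loop: %.4f s, identical: %d\n', t_bsxfun, t_loop, isequal(I1, I2))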
You can replicate the threshold vector and use matrix comparison:
s=size(M);
T2=repmat(T, s(1), 1);
M(M<T2)=0;
Indexes=find(M);
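If you would rather not overwrite M, the same replication idea can return the indices directly (a small variation, not part of the original answer):
s = size(M);
T2 = repmat(T, s(1), 1);
Indexes = find(M > T2);                % linear indices of elements above their column's threshold
[r, c] = ind2sub(size(M), Indexes);    % optional: row/column subscripts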