How can I hot one encode in Matlab? [duplicate] - matlab

Often you are given a vector of integer values representing your labels (aka classes), for example
[2; 1; 3; 3; 2]
and you would like to one-hot encode this vector, so that each row contains a single 1 in the column given by that row's label, for example
[0 1 0;
1 0 0;
0 0 1;
0 0 1;
0 1 0]

For speed and memory savings, you can use bsxfun combined with eq to accomplish the same thing. While your eye solution may work, your memory usage grows quadratically with the number of unique values in X.
Y = bsxfun(@eq, X(:), 1:max(X));
Or as an anonymous function if you prefer:
hotone = @(X)bsxfun(@eq, X(:), 1:max(X));
Or if you're on Octave (or MATLAB R2016b or later), you can take advantage of automatic broadcasting and simply do the following, as suggested by @Tasos.
Y = X == 1:max(X);
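As a quick sanity check, the bsxfun one-liner can be applied to the label vector from the question. This is only a small sketch; the variable name `expected` is introduced here for illustration. Note that the result is a logical array, so wrap it in double() if a numeric matrix is needed.

```matlab
X = [2; 1; 3; 3; 2];                  % labels from the question
Y = bsxfun(@eq, X(:), 1:max(X));      % 5-by-3 logical matrix, one row per label

% Matrix given in the question as the desired encoding
expected = [0 1 0;
            1 0 0;
            0 0 1;
            0 0 1;
            0 1 0];

matches = isequal(double(Y), expected);   % true when the encoding agrees
```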
Benchmark
Here is a quick benchmark showing the performance of the various answers for a varying number of elements in X and a varying number of unique values in X.
function benchit()
nUnique = round(linspace(10, 1000, 10));
nElements = round(linspace(10, 1000, 12));
times1 = zeros(numel(nUnique), numel(nElements));
times2 = zeros(numel(nUnique), numel(nElements));
times3 = zeros(numel(nUnique), numel(nElements));
times4 = zeros(numel(nUnique), numel(nElements));
times5 = zeros(numel(nUnique), numel(nElements));
for m = 1:numel(nUnique)
for n = 1:numel(nElements)
X = randi(nUnique(m), nElements(n), 1);
times1(m,n) = timeit(@()bsxfunApproach(X));
X = randi(nUnique(m), nElements(n), 1);
times2(m,n) = timeit(@()eyeApproach(X));
X = randi(nUnique(m), nElements(n), 1);
times3(m,n) = timeit(@()sub2indApproach(X));
X = randi(nUnique(m), nElements(n), 1);
times4(m,n) = timeit(@()sparseApproach(X));
X = randi(nUnique(m), nElements(n), 1);
times5(m,n) = timeit(@()sparseFullApproach(X));
end
end
colors = get(0, 'defaultaxescolororder');
figure;
surf(nElements, nUnique, times1 * 1000, 'FaceColor', colors(1,:), 'FaceAlpha', 0.5);
hold on
surf(nElements, nUnique, times2 * 1000, 'FaceColor', colors(2,:), 'FaceAlpha', 0.5);
surf(nElements, nUnique, times3 * 1000, 'FaceColor', colors(3,:), 'FaceAlpha', 0.5);
surf(nElements, nUnique, times4 * 1000, 'FaceColor', colors(4,:), 'FaceAlpha', 0.5);
surf(nElements, nUnique, times5 * 1000, 'FaceColor', colors(5,:), 'FaceAlpha', 0.5);
view([46.1000 34.8000])
grid on
xlabel('Elements')
ylabel('Unique Values')
zlabel('Execution Time (ms)')
legend({'bsxfun', 'eye', 'sub2ind', 'sparse', 'full(sparse)'}, 'Location', 'Northwest')
end
function Y = bsxfunApproach(X)
Y = bsxfun(@eq, X(:), 1:max(X));
end
function Y = eyeApproach(X)
tmp = eye(max(X));
Y = tmp(X, :);
end
function Y = sub2indApproach(X)
LinearIndices = sub2ind([length(X),max(X)], [1:length(X)]', X);
Y = zeros(length(X), max(X));
Y(LinearIndices) = 1;
end
function Y = sparseApproach(X)
Y = sparse(1:numel(X), X,1);
end
function Y = sparseFullApproach(X)
Y = full(sparse(1:numel(X), X,1));
end
Results
If you need a non-sparse output, bsxfun performs the best, but if you can use a sparse matrix (without converting it to a full matrix), then that is the fastest and most memory-efficient option.

You can use the identity matrix and index into it using the input/labels vector, for example if the labels vector X is some random integer vector
X = randi(3,5,1)
X =
2
1
2
3
3
then the following will one-hot encode X
eye(max(X))(X,:)
which can be conveniently defined as a function using
hotone = @(v) eye(max(v))(v,:)
EDIT:
Although the solution above works in Octave, you have to modify it for MATLAB as follows
I = eye(max(X));
I(X,:)

I think this is fast, especially as the matrix dimensions grow:
Y = sparse(1:numel(X), X,1);
or
Y = full(sparse(1:numel(X), X,1));
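One thing to keep in mind with the sparse call: without explicit dimensions, the number of columns is inferred from max(X), so a class that never occurs at the end of the range is silently dropped. A minimal sketch, assuming the total number of classes `K` is known in advance (`K`, `S_auto` and `S_fixed` are names made up for this example):

```matlab
X = [2; 1; 2];        % example labels; class 3 never occurs
K = 3;                % assumed total number of classes
n = numel(X);

S_auto  = sparse((1:n).', X, 1);         % n-by-max(X), here 3-by-2
S_fixed = sparse((1:n).', X, 1, n, K);   % n-by-K, here 3-by-3
Y = full(S_fixed);                       % dense copy, only if really needed
```

Passing the size explicitly keeps the encoding consistent across batches that do not all contain every class.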

Just posting the sub2ind solution too to satisfy your curiosity :)
But I like your solution better :p
>> X = [2,1,2,3,3]'
>> LinearIndices = sub2ind([length(X),3], [1:length(X)]', X);
>> tmp = zeros(length(X), 3);
>> tmp(LinearIndices) = 1
tmp =
0 1 0
1 0 0
0 1 0
0 0 1
0 0 1

Just in case someone is looking for the 2D case (as I was):
X = [2 1; ...
3 3; ...
2 4]
Y = zeros(3,2,4);
for i = 1:4
Y(:,:,i) = (X == i); % slice i marks the positions where X equals i
end
gives a one-hot encoded matrix along the 3rd dimension.
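The same 2D case can probably also be written without a loop by comparing X against the class values laid out along the third dimension. This is a sketch rather than the original author's method; `K` is a name introduced here, taken as the largest label in X.

```matlab
X = [2 1;
     3 3;
     2 4];
K = max(X(:));                                  % number of classes, assumed to be max label

% Compare every entry of X with 1..K placed along dimension 3
Y = bsxfun(@eq, X, reshape(1:K, 1, 1, K));      % 3-by-2-by-4 logical array

% Each (row, column) position has exactly one 1 along the third dimension
onePerCell = all(all(sum(Y, 3) == 1));
```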

Related

Cumulative count of unique element in Matlab array

Working with Matlab 2019b.
x = [10 10 10 20 20 30]';
How do I get a cumulative count of unique elements in x, which should look like:
y = [1 2 3 1 2 1]';
EDIT:
My real array is actually much longer than the example given above. Below are the methods I tested:
x = randi([1 100], 100000, 1);
x = sort(x);
% method 1: check neighboring values in one loop
tic
y = ones(size(x));
for ii = 2:length(x)
if x(ii) == x(ii-1)
y(ii) = y(ii-1) + 1;
end
end
toc
% method 2 (Wolfie): count occurrence of unique values explicitly
tic
u = unique(x);
y = zeros(size(x));
for ii = 1:numel(u)
idx = (x == u(ii));
y(idx) = 1:nnz(idx);
end
toc
% method 3 (Luis Mendo): triangular matrix
tic
y = sum(triu(x==x'))';
toc
Results:
Method 1: Elapsed time is 0.016847 seconds.
Method 2: Elapsed time is 0.037124 seconds.
Method 3: Elapsed time is 10.350002 seconds.
EDIT:
Assuming that x is sorted:
x = [10 10 10 20 20 30].';
x = sort(x);
d = [1 ;diff(x)];
f = find(d);
d(f) = f;
ic = cummax(d);
y = (2 : numel(x) + 1).' - ic;
When x is unsorted use this:
[s, is] = sort(x);
d = [1 ;diff(s)];
f = find(d);
d(f) = f;
ic = cummax(d);
y(is) = (2 : numel(s) + 1).' - ic;
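To see the unsorted variant at work, here is a small worked example on a shuffled input. The input values are made up for illustration, and y is preallocated so that it comes out as a column vector.

```matlab
x = [20 10 10 30 20 10].';      % unsorted example input (illustrative values)

[s, is] = sort(x);              % s = [10 10 10 20 20 30].'
d = [1; diff(s)];               % nonzero where a new value starts
f = find(d);
d(f) = f;                       % store the start index of each run
ic = cummax(d);                 % start index of the run each element belongs to

y = zeros(size(x));
y(is) = (2 : numel(s) + 1).' - ic;   % running count, mapped back to input order
% y is [1 1 2 1 2 3].'
```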
Original Answer that only works on GNU Octave:
Assuming that x is sorted:
x = [10 10 10 20 20 30].';
x = sort(x);
[~, ic] = cummax(x);
y = (2 : numel(x) + 1).' - ic;
When x is unsorted use this:
[s, is] = sort(x);
[~, ic] = cummax(s);
y(is) = (2 : numel(s) + 1).' - ic;
You could loop over the unique elements, and set their indices to 1:n each time...
u = unique(x);
y = zeros(size(x));
for ii = 1:numel(u)
idx = (x == u(ii));
y(idx) = 1:nnz(idx);
end
This is a little inefficient because it generates an intermediate matrix, when actually only a triangular half is needed:
y = sum(triu(x==x.')).';
Here's a no-for-loop version. On my machine it's a bit faster than the previous working methods:
% if already sorted, can omit this first and last line
[s, is] = sort(x);
[u,~,iu] = unique(s);
c = accumarray(iu,1);
cs = cumsum([0;c]);
z = (1:numel(x))'-repelem(cs(1:end-1),c);
y(is) = z;

Multidimensional data storage and interpolation

I have a function (so to speak; I actually have data with this characteristic) with one variable x and several parameters a, b and c, so y = f(x, a, b, c).
Now I want to interpolate within families of parameters (for example, for variations of a).
I'm currently doing this for data with one parameter (here, y is the data matrix):
% generate variable and data
x = linspace(0, 1, 100);
a = [0, 1]; % parameter
for i = 1:length(a)
y(:, i) = x.^2 + a(i);
end
% interpolate:
yi = interp1(a, y.', 0.5);
This works fine, but how do I extend this to more dimensions?
My current data format is like this: Each column of my data matrix represents one specific set of parameters, so for example:
0 0 0 0
1 1 1 1
2 2 2 2
3 3 3 3
where the first column denotes a = 0, b = 0, the second a = 1, b = 0, the third a = 0, b = 1 and the last a = 1, b = 1 (the values are just for clarification and are not meant to be binary; also, the data columns are obviously not all the same).
This data format is just a consequence of my data acquisition scheme, but I'm happy to change it into something more useful. Whatever works.
Works well for me:
% generate variable and data
x = linspace(0, 1, 100);
a = [0, 1, 2]; % parameter
b = [3, 4, 5]; % parameter
c = [6, 7, 8]; % parameter
% Create grid
[X,A,B,C]=ndgrid(x,a,b,c);
% define function
foo = @(x,p1,p2,p3) p1.*x.^2 + p2.*x + p3;
% evaluate function
Y = foo(X,A,B,C);
% interpolate:
yi = interpn(X,A,B,C,Y,x,1,4,6);
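The same setup can also be queried between grid points. The sketch below interpolates at a = 0.5, b = 3.5, c = 6.5 (values picked only for illustration) and passes query arrays of equal size, which is one of the query forms interpn accepts. Because foo is linear in each parameter, the default linear interpolation should reproduce it up to rounding.

```matlab
x = linspace(0, 1, 100);
a = [0, 1, 2];  b = [3, 4, 5];  c = [6, 7, 8];
[X, A, B, C] = ndgrid(x, a, b, c);
foo = @(x, p1, p2, p3) p1.*x.^2 + p2.*x + p3;
Y = foo(X, A, B, C);

% Query points: every x on the grid, at off-grid parameter values
xq = x(:);
o  = ones(size(xq));
yi = interpn(X, A, B, C, Y, xq, 0.5*o, 3.5*o, 6.5*o);

% Compare with a direct evaluation of the function
err = max(abs(yi - foo(xq, 0.5, 3.5, 6.5)));
```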
@zlon's answer works fine for the interpolation part; here I want to show how to convert the data from the format I provided into the format needed for the interpolation.
The two-dimensional matrix must be transformed into an N-dimensional one. Since the columns are not necessarily in order, we need to find the right ones. This is what I did:
First, we need to know the parameter set of each column:
a = [ 2, 2, 1, 0, 0, 1 ];
b = [ 1, 0, 0, 1, 0, 1 ];
The length of these vectors matches the number of columns in the data matrix. For example, the first column now contains the data for a = 2 and b = 1.
Now we can generate the new table:
A = -Inf;
i = 1;
while true
A = min(a(a > A)); % find next a
if isempty(A)
break
end
idxa = find(a == A); % store possible indices
B = -Inf;
j = 1;
while true
B = min(b(b > B)); % find next b
if isempty(B)
break
end
idxb = find(b == B); % store possible indices
% combine both indices
idx = intersect(idxa, idxb);
% save column in new data table
data(:, i, j) = olddata(:, idx);
% advance
j = j + 1;
end
i = i + 1;
end
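A shorter way to get the same rearrangement might be to let unique do the sorting: its third output gives, for each column, the position of its parameter value in the sorted list of distinct values. The sketch below is an alternative to the nested while loops, not the method above; `olddata` is filled with made-up values (10*a + b in every row) purely so the result is easy to inspect.

```matlab
a = [2, 2, 1, 0, 0, 1];
b = [1, 0, 0, 1, 0, 1];
olddata = repmat(10*a + b, 4, 1);   % hypothetical data: each column encodes its (a, b)

[ua, ~, ia] = unique(a);            % ua = sorted distinct a, ia = position of each column's a
[ub, ~, ib] = unique(b);            % same for b

data = nan(size(olddata, 1), numel(ua), numel(ub));
for k = 1:numel(a)
    data(:, ia(k), ib(k)) = olddata(:, k);   % place column k at its grid position
end
```

After this, data(:, i, j) holds the column measured at a = ua(i), b = ub(j), and any parameter combination that was never measured stays NaN.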

Sequence of dots in matlab and psychotoolbox

How would I display, one at a time, the dots of a 3x3 grid such as the one in the code below?
I would like dot1 to appear at position [x1,y1] of the grid for a time t1, then dot2 to appear at position [x2,y2] of the grid for a time t2. Only one dot is shown at any time.
Thanks for your help.
%grid
dim = 1
[x, y] = meshgrid(-dim:1:dim, -dim:1:dim);
pixelScale = screenYpixels / (dim * 2 + 2);
x = x .* pixelScale;
y = y .* pixelScale;
% Calculate the number of dots
numDots = numel(x);
% Make the matrix of positions for the dots.
dotPositionMatrix = [reshape(x, 1, numDots); reshape(y, 1, numDots)];
% We can define a center for the dot coordinates to be relative to.
dotCenter = [xCenter yCenter];
dotColors = [1 0 0];
dotSizes = 20;
Screen('DrawDots', window, dotPositionMatrix,...
dotSizes, dotColors, dotCenter, 2);
I think you want something like this?
%positions of each successive dots:
x_vec = [1,2,3,1,2,3,1,2,3];
y_vec = [1,1,1,2,2,2,3,3,3];
%wait times in sec for each dot:
wait_times = [1,1,2,1,1,2,1,1,2]
dotColor = [1 0 0];
dotSize = 400;
num_dots = length(x_vec);
for i = 1:num_dots
scatter(x_vec(i),y_vec(i),dotSize,dotColor,'filled');
xlim([0,max(x_vec)])
ylim([0,max(y_vec)])
pause(wait_times(i));
end

Create a zero-filled 2D array with ones at positions indexed by a vector

I'm trying to vectorize the following MATLAB operation:
Given a column vector with indexes, I want a matrix with the
same number of rows of the column and a fixed number of columns. The
matrix is initialized with zeroes and contains ones in the locations
specified by the indexes.
Here is an example of the script I've already written:
y = [1; 3; 2; 1; 3];
m = size(y, 1);
% For loop
yvec = zeros(m, 3);
for i=1:m
yvec(i, y(i)) = 1;
end
The desired result is:
yvec =
1 0 0
0 0 1
0 1 0
1 0 0
0 0 1
Is it possible to achieve the same result without the for loop? I tried something like this:
% Vectorization (?)
yvec2 = zeros(m, 3);
yvec2(:, y(:)) = 1;
but it doesn't work.
Two approaches you can use here.
Approach 1:
y = [1; 3; 2; 1; 3];
yvec = zeros(numel(y),3);
yvec(sub2ind(size(yvec),1:numel(y),y'))=1
Approach 2 (One-liner):
yvec = bsxfun(@eq, 1:3,y)
Yet another approach:
yvec = full(sparse(1:numel(y),y,1));
You could do this with accumarray:
yvec = accumarray([(1:numel(y)).' y], 1);
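Since the question asks for a fixed number of columns, it may be worth passing the output size to accumarray explicitly; otherwise the width is taken from max(y). A small sketch, where `K` is an assumed column count chosen larger than max(y) to make the difference visible:

```matlab
y = [1; 3; 2; 1; 3];
K = 4;                                      % assumed fixed number of columns
subs = [(1:numel(y)).' y];                  % one (row, column) pair per entry of y
yvec = accumarray(subs, 1, [numel(y) K]);   % 5-by-4, last column all zeros
```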
I did it this way:
classes_count = 10;
sample_count = 20;
y = randi([1 classes_count], 1, sample_count);
y_onehot = zeros(classes_count, size(y, 2));
idx = sub2ind(size(y_onehot), y, [1:size(y, 2)]);
y_onehot(idx) = 1

Solving a difference equation with initial condition

Consider a difference equation with its initial conditions.
5y(n) + y(n-1) - 3y(n-2) = (1/5^n) u(n), n>=0
y(n-1) = 2, y(n-2) = 0
How can I determine y(n) in Matlab?
Use an approach similar to this (using filter), but specifying initial conditions as done here (using filtic).
I'm assuming your initial conditions are: y(-1)=2, y(-2)=0.
num = 1; %// numerator of transfer function (from difference equation)
den = [5 1 -3]; %// denominator of transfer function (from difference equation)
n = 0:100; %// choose as desired
x = (1/5).^n; %// n is >= 0, so u(n) is 1
y = filter(num, den, x, filtic(num, den, [2 0], [0 0]));
%// [2 0] reflects initial conditions on y, and [0 0] those on x.
Here's a plot of the result, obtained with stem(n,y).
The second line of your code does not give initial conditions, because it refers to the index variable n. Since Matlab only allows positive integer indices, I'll assume that you mean y(1) = 0 and y(2) = 2.
You can get an iteration rule out of your first equation by simple algebra:
y(n) = ( (1/5^n) u(n) - y(n-1) + 3y(n-2) ) / 5
Code to apply this rule in Matlab:
n_max = 100;
y = nan(n_max, 1);
y(1) = 0;
y(2) = 2;
for n = 3 : n_max
y(n) = ( (1/5^n) * u(n) - y(n-1) + 3 * y(n-2) ) / 5;
end
This code assumes that the array u is already defined. n_max specifies how many elements of y to compute.
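For comparison, the same iteration rule can be run under the other reading of the initial conditions, y(-1) = 2 and y(-2) = 0, by storing y(n) at an offset index. This is only a sketch under that assumption, with u(n) = 1 for n >= 0; the names `N`, `off` and `yn` are introduced here for illustration.

```matlab
N   = 20;                 % number of samples n = 0..N-1 (arbitrary choice)
off = 3;                  % y(n) is stored at index n + off, so n = -2 maps to index 1
y   = zeros(N + 2, 1);
y(-2 + off) = 0;          % assumed initial condition y(-2) = 0
y(-1 + off) = 2;          % assumed initial condition y(-1) = 2

for n = 0:N-1
    x = (1/5)^n;          % right-hand side (1/5^n) u(n), with u(n) = 1 for n >= 0
    y(n + off) = (x - y(n - 1 + off) + 3 * y(n - 2 + off)) / 5;
end

yn = y(off:end);          % yn(k) holds y(k-1), i.e. y(0), y(1), ..., y(N-1)
```

Working the first two steps by hand gives y(0) = (1 - 2 + 0)/5 = -0.2 and y(1) = (0.2 + 0.2 + 6)/5 = 1.28, which is a handy check on the indexing.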