Julia vs. MATLAB - Distance Matrix - Run Time Test - matlab

I started learning Julia not a long time ago and I decided to do a simple
comparison between Julia and Matlab on a simple code for computing Euclidean
distance matrices from a set of high dimensional points.
The task is simple and can be divided into two cases:
Case 1: Given two datasets in the form of n x d matrices, say X1 and X2, compute the pair wise Euclidean distance between each point in X1 and all the points in X2. If X1 is of size n1 x d, and X2 is of size n2 x d, then the resulting Euclidean distance matrix D will be of size n1 x n2. In the general setting, matrix D is not symmetric, and diagonal elements are not equal to zero.
Case 2: Given one dataset in the form of n x d matrix X, compute the pair wise Euclidean distance between all the n points in X. The resulting Euclidean distance matrix D will be of size n x n, symmetric, with zero elements on the main diagonal.
My implementation of these functions in Matlab and in Julia is given below. Note that none of the implementations rely on loops of any sort, but rather simple linear algebra operations. Also, note that the implementation using both languages is very similar.
My expectations before running any tests for these implementations is that the Julia code will be much faster than the Matlab code, and by a significant margin. To my surprise, this was not the case!
The parameters for my experiments are given below with the code. My machine is a MacBook Pro. (15" Mid 2015) with 2.8 GHz Intel Core i7 (Quad Core), and 16 GB 1600 MHz DDR3.
Matlab version: R2018a
Julia version: 0.6.3
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
The results are given in Table (1) below.
Table 1: Average time in seconds (with standard deviation) over 30 trials for computing Euclidean distance matrices between two different datasets (Col. 1),
and between all pairwise points in one dataset (Col. 2).
Two Datasets || One Dataset
Matlab: 2.68 (0.12) sec. 1.88 (0.04) sec.
Julia V1: 5.38 (0.17) sec. 4.74 (0.05) sec.
Julia V2: 5.2 (0.1) sec.
I was not expecting this significant difference between both languages. I expected Julia to be faster than Matlab, or at least, as fast as Matlab. It was really a surprise to see that Matlab is almost 2.5 times faster than Julia in this particular task. I didn't want to draw any early conclusions based on these results for few reasons.
First, while I think that my Matlab implementation is as good as it can be, I'm wondering whether my Julia implementation is the best one for this task. I'm still learning Julia and I hope there is a more efficient Julia code that can yield faster computation time for this task. In particular, where is the main bottleneck for Julia in this task? Or, why does Matlab have an edge in this case?
Second, my current Julia package is based on the generic and standard BLAS and LAPACK packages for MacOS. I'm wondering whether JuliaPro with BLAS and LAPACK based on Intel MKL will be faster than the current version I'm using. This is why I opted to get some feedback from more knowledgeable people on StackOverflow.
The third reason is that I'm wondering whether the compile time for Julia was
included in the timings shown in Table 1 (2nd and 3rd rows), and whether there is a better way to assess the execution time for a function.
I will appreciate any feedback on my previous three questions.
Thank you!
Hint: This question has been identified as a possible duplicate of another question on StackOverflow. However, this is not entirely true. This question has three aspects as reflected by the answers below. First, yes, one part of the question is related to the comparison of OpenBLAS vs. MKL. Second, it turns out that the implementation as well can be improved as shown by one of the answers. And last, bench-marking the julia code itself can be improved by using BenchmarkTools.jl.
MATLAB
num_trials = 30;
dim = 1000;
n1 = 10000;
n2 = 10000;
T = zeros(num_trials,1);
XX1 = randn(n1,dim);
XX2 = rand(n2,dim);
%%% DIFEERENT MATRICES
DD2ds = zeros(n1,n2);
for (i = 1:num_trials)
tic;
DD2ds = distmat_euc2ds(XX1,XX2);
T(i) = toc;
end
mt = mean(T);
st = std(T);
fprintf(1,'\nDifferent Matrices:: dim: %d, n1 x n2: %d x %d -> Avg. Time %f (+- %f) \n',dim,n1,n2,mt,st);
%%% SAME Matrix
T = zeros(num_trials,1);
DD1ds = zeros(n1,n1);
for (i = 1:num_trials)
tic;
DD1ds = distmat_euc1ds(XX1);
T(i) = toc;
end
mt = mean(T);
st = std(T);
fprintf(1,'\nSame Matrix:: dim: %d, n1 x n1 : %d x %d -> Avg. Time %f (+- %f) \n\n',dim,n1,n1,mt,st);
distmat_euc2ds.m
function [DD] = distmat_euc2ds (XX1,XX2)
n1 = size(XX1,1);
n2 = size(XX2,1);
DD = sqrt(ones(n1,1)*sum(XX2.^2.0,2)' + (ones(n2,1)*sum(XX1.^2.0,2)')' - 2.*XX1*XX2');
end
distmat_euc1ds.m
function [DD] = distmat_euc1ds (XX)
n1 = size(XX,1);
GG = XX*XX';
DD = sqrt(ones(n1,1)*diag(GG)' + diag(GG)*ones(1,n1) - 2.*GG);
end
JULIA
include("distmat_euc.jl")
num_trials = 30;
dim = 1000;
n1 = 10000;
n2 = 10000;
T = zeros(num_trials);
XX1 = randn(n1,dim)
XX2 = rand(n2,dim)
DD = zeros(n1,n2)
# Euclidean Distance Matrix: Two Different Matrices V1
# ====================================================
for i = 1:num_trials
tic()
DD = distmat_eucv1(XX1,XX2)
T[i] = toq();
end
mt = mean(T)
st = std(T)
println("Different Matrices V1:: dim:$dim, n1 x n2: $n1 x $n2 -> Avg. Time $mt (+- $st)")
# Euclidean Distance Matrix: Two Different Matrices V2
# ====================================================
for i = 1:num_trials
tic()
DD = distmat_eucv2(XX1,XX2)
T[i] = toq();
end
mt = mean(T)
st = std(T)
println("Different Matrices V2:: dim:$dim, n1 x n2: $n1 x $n2 -> Avg. Time $mt (+- $st)")
# Euclidean Distance Matrix: Same Matrix V1
# =========================================
for i = 1:num_trials
tic()
DD = distmat_eucv1(XX1)
T[i] = toq();
end
mt = mean(T)
st = std(T)
println("Same Matrix V1:: dim:$dim, n1 x n2: $n1 x $n2 -> Avg. Time $mt (+- $st)")
distmat_euc.jl
function distmat_eucv1(XX1::Array{Float64,2},XX2::Array{Float64,2})
(num1,dim1) = size(XX1)
(num2,dim2) = size(XX2)
if (dim1 != dim2)
error("Matrices' 2nd dimensions must agree!")
end
DD = sqrt.((ones(num1)*sum(XX2.^2.0,2)') +
(ones(num2)*sum(XX1.^2.0,2)')' - 2.0.*XX1*XX2');
end
function distmat_eucv2(XX1::Array{Float64,2},XX2::Array{Float64,2})
(num1,dim1) = size(XX1)
(num2,dim2) = size(XX2)
if (dim1 != dim2)
error("Matrices' 2nd dimensions must agree!")
end
DD = (ones(num1)*sum(Base.FastMath.pow_fast.(XX2,2.0),2)') +
(ones(num2)*sum(Base.FastMath.pow_fast.(XX1,2.0),2)')' -
Base.LinAlg.BLAS.gemm('N','T',2.0,XX1,XX2);
DD = Base.FastMath.sqrt_fast.(DD)
end
function distmat_eucv1(XX::Array{Float64,2})
n = size(XX,1)
GG = XX*XX';
DD = sqrt.(ones(n)*diag(GG)' + diag(GG)*ones(1,n) - 2.0.*GG)
end

First question: If I re-write the julia distance function like so:
function dist2(X1::Matrix, X2::Matrix)
size(X1, 2) != size(X2, 2) && error("Matrices' 2nd dimensions must agree!")
return sqrt.(sum(abs2, X1, 2) .+ sum(abs2, X2, 2)' .- 2 .* (X1 * X2'))
end
I shave >40% off the execution time.
For a single dataset you can save a bit more, like this:
function dist2(X::Matrix)
G = X * X'
dG = diag(G)
return sqrt.(dG .+ dG' .- 2 .* G)
end
Third question: You should do your benchmarking with BenchmarkTools.jl, and perform the benchmarking like this (remember $ for variable interpolation):
julia> using BenchmarkTools
julia> #btime dist2($XX1, $XX2);
Additionally, you should not do powers using floats, like this: X.^2.0. It is faster, and equally correct to write X.^2.
For multiplication there is no speed difference between 2.0 .* X and 2 .* X, but you should still prefer using an integer, because it is more generic. As an example, if X has Float32 elements, multiplying with 2.0 will promote the array to Float64s, while multiplying with 2 will preserve the eltype.
And finally, note that in new versions of Matlab, too, you can get broadcasting behaviour by simply adding Mx1 arrays with 1xN arrays. There is no need to first expand them by multiplying with ones(...).

Related

MATLAB: Fast correlation computation for all indices in 2 vectors

I have 2 vectors A and B, each of length 10,000. For each of ind=1:10000, I want to compute the Pearson's correlation of A(1:ind) and B(1:ind). When I do this in a for loop, it takes too much time. parfor does not work with more than 2 workers in my machine. Is there a way to do this operation fast and save results in a vector C (apparently of length 10,000 where the first element is NaN)? I found the question Fast rolling correlation in Matlab, but this is a little different than what I need.
You can use this method to compute cumulative correlation coefficient:
function result = cumcor(x,y)
n = reshape(1:numel(x),size(x));
sumx = cumsum(x);
sumy = cumsum(y);
sumx2 = cumsum(x.^2);
sumy2 = cumsum(y.^2);
sumxy = cumsum(x.*y);
result = (n.*sumxy-sumx.*sumy)./(sqrt((sumx.^2-n.*sumx2).*(sumy.^2-n.*sumy2)));
end
Solution
I suggest the following approach:
Pearson correlation can be calculated by using the following formula:
calculating the accumulative mean of each of the random variabiles above efficiently is realtively easy
(X, Y, XY, X^2, Y^2).
given the accumulative mean calculated in 2, we can calculate the accumulative std of X and Y.
given the accumulative std of X,Y and accumulative mean above, we can calculate the accumulative pearson coefficient.
Code
%defines inputs
N = 10000;
X = rand(N,1);
Y = rand(N,1);
%calculates accumolative mean for X, Y, X^2, Y^2, XY
EX = accumMean(X);
EY = accumMean(Y);
EX2 = accumMean(X.^2);
EY2 = accumMean(Y.^2);
EXY = accumMean(X.*Y);
%calculates accumolative pearson correlation
accumPearson = zeros(N,1);
for ii=2:N
stdX = (EX2(ii)-EX(ii)^2).^0.5;
stdY = (EY2(ii)-EY(ii)^2).^0.5;
accumPearson(ii) = (EXY(ii)-EX(ii)*EY(ii))/(stdX*stdY);
end
%accumulative mean function, to be defined in an additional m file.
function [ accumMean ] = accumMean( vec )
accumMean = zeros(size(vec));
accumMean(1) = vec(1);
for ii=2:length(vec)
accumMean(ii) = (accumMean(ii-1)*(ii-1) +vec(ii))/ii;
end
end
Runtime
for N=10000:
Elapsed time is 0.002096 seconds.
for N=1000000:
Elapsed time is 0.240669 seconds.
Correctness
Testing the correctness of the code above could be done by calculative the accumulative pearson coefficient by corr function, and comparing it to the result given from the code above:
%ground truth for correctness comparison
gt = zeros(N,1)
for z=1:N
gt(z) = corr(X(1:z),Y(1:z));
end
Unfortunately, I dont have the Statistics and Machine Learning Toolbox, so I cant make this check.
I do think that it is a good start though, and you can continue from here :)

State space system gives different bode plot then transfer function matrix

I have a state space system with matrices A,B,C and D.
I can either create a state space system, sys1 = ss(A,B,C,D), of it or compute the transfer function matrix, sys2 = C*inv(z*I - A)*B + D
However when I draw the bode plot of both systems, they are different while they should be the same.
What is going wrong here? Does anyone have a clue? I know btw that the bodeplot generated by sys1 is correct.
The system can be downloaded here: https://dl.dropboxusercontent.com/u/20782274/system.mat
clear all;
close all;
clc;
Ts = 0.01;
z = tf('z',Ts);
% Discrete system
A = [0 1 0; 0 0 1; 0.41 -1.21 1.8];
B = [0; 0; 0.01];
C = [7 -73 170];
D = 1;
% Set as state space
sys1 = ss(A,B,C,D,Ts);
% Compute transfer function
sys2 = C*inv(z*eye(3) - A)*B + D;
% Compute the actual transfer function
[num,den] = ss2tf(A,B,C,D);
sys3 = tf(num,den,Ts);
% Show bode
bode(sys1,'b',sys2,'r--',sys3,'g--');
Edit: I made a small mistake, the transfer function matrix is sys2 = C*inv(z*I - A)*B + D, instead of sys2 = C*inv(z*I - A)*B - D which I did wrote done before. The problem still holds.
Edit 2: I have noticted that when I compute the denominator, it is correct.
syms z;
collect(det(z*eye(3) - A),z)
Your assumption that sys2 = C*inv(z*I- A)*B + D is incorrect. The correct equivalent to your state-space system (A,B,C,D) is sys2 = C*inv(s*I- A)*B + D. If you want to express it in terms of z, you'll need to invert the relationship z = exp(s*T). sys1 is the correct representation of your state-space system. What I would suggest for sys2 is to do as follows:
sys1 = ss(mjlsCE.A,mjlsCE.B,mjlsCE.C,mjlsCE.D,Ts);
sys1_c = d2c(sys1);
s = tf('s');
sys2_c = sys1_c.C*inv(s*eye(length(sys1_c.A)) - sys1_c.A)*sys1_c.B + sys1_c.D;
sys2_d = c2d(sys2_c,Ts);
That should give you the correct result.
Due to inacurracy of the inverse function extra unobservable poles and zeros are added to the system. For this reason you need to compute the minimal realization of your transfer function matrix.
Meaning
% Compute transfer function
sys2 = minreal(C*inv(z*eye(3) - A)*B + D);
What you are noticing is actually a numerical instability regarding pole-zero pair cancellations.
If you run the following code:
A = [0, 1, 0; 0, 0, 1; 0.41, -1.21, 1.8] ;
B = [0; 0; 0.01] ;
C = [7, -73, 170] ;
D = 1 ;
sys_ss = ss(A, B, C, D) ;
sys_tf_simp = tf(sys_ss) ;
s = tf('s') ;
sys_tf_full = tf(C*inv(s*eye(3) - A)*B + D) ;
zero(sys_tf_simp)
zero(sys_tf_full)
pole(sys_tf_simp)
pole(sys_tf_full)
you will see that the transfer function formulated by matrices directly has a lot more poles and zeros than the one formulated by MatLab's tf function. You will also notice that every single pair of these "extra" poles and zeros are equal- meaning that they cancel with each other if you were to simply the rational expression. MatLab's tf presents the simplified form, with equal pole-zero pairs cancelled out. This is algebraically equivalent to the unsimplified form, but not numerically.
When you call bode on the unsimplified transfer function, MatLab begins its numerical plotting routine with the pole-zero pairs not cancelled algebraically. If the computer was perfect, the result would be the same as in the simplified case. However, numerical error when evaluating the numerator and denominators effectively leaves some of the pole-zero pairs "uncancelled" and as many of these poles are in the far right side of the s plane, they drastically influence the output behavior.
Check out this link for info on this same problem but from the perspective of design: http://ctms.engin.umich.edu/CTMS/index.php?aux=Extras_PZ
In your original code, you can think of the output drawn in green as what the naive designer wanted to see when he cancelled all his unstable poles with zeros, but the output drawn in red is what he actually got because in practice, finite-precision and real-world tolerances prevent the poles and zeros from cancelling perfectly.
Why is an unobservable / uncontrollable pole? I think this issue comes only because the inverse of a transfer function matrix is inaccurate in Matlab.
Note:
A is 3x3 and the minimal realization has also order 3.
What you did is the inverse of a transfer function matrix, not a symbolic or numeric matrix.
# Discrete system
Ts = 0.01;
A = [0 1 0; 0 0 1; 0.41 -1.21 1.8];
B = [0; 0; 0.01];
C = [7 -73 170];
D = 1;
z = tf('z', Ts)) # z is a discrete tf
A1 = z*eye(3) - A # a tf matrix with a direct feedthrough matrix A
# inverse it, multiply with C and B from left and right, and plus D
G = D + C*inv(A1)*B
G is now a scalar (SISO) transfer function.
Without "minreal", G has order 9 (funny, I don't know how Matlab computes it, perhaps the "Adj(.)/det(.)" method). Matlab cannot cancel the common factors in the numerator and the denominator, because z is of class 'tf' rather than a symbolic variable.
Do you agree or do I have misunderstanding?

tensile tests in matlab

The problem says:
Three tensile tests were carried out on an aluminum bar. In each test the strain was measured at the same values of stress. The results were
where the units of strain are mm/m.Use linear regression to estimate the modulus of elasticity of the bar (modulus of elasticity = stress/strain).
I used this program for this problem:
function coeff = polynFit(xData,yData,m)
% Returns the coefficients of the polynomial
% a(1)*x^(m-1) + a(2)*x^(m-2) + ... + a(m)
% that fits the data points in the least squares sense.
% USAGE: coeff = polynFit(xData,yData,m)
% xData = x-coordinates of data points.
% yData = y-coordinates of data points.
A = zeros(m); b = zeros(m,1); s = zeros(2*m-1,1);
for i = 1:length(xData)
temp = yData(i);
for j = 1:m
b(j) = b(j) + temp;
temp = temp*xData(i);
end
temp = 1;
for j = 1:2*m-1
s(j) = s(j) + temp;
temp = temp*xData(i);
end
end
for i = 1:m
for j = 1:m
A(i,j) = s(i+j-1);
end
end
% Rearrange coefficients so that coefficient
% of x^(m-1) is first
coeff = flipdim(gaussPiv(A,b),1);
The problem is solved without a program as follows
MY ATTEMPT
T=[34.5,69,103.5,138];
D1=[.46,.95,1.48,1.93];
D2=[.34,1.02,1.51,2.09];
D3=[.73,1.1,1.62,2.12];
Mod1=T./D1;
Mod2=T./D2;
Mod3=T./D3;
xData=T;
yData1=Mod1;
yData2=Mod2;
yData3=Mod3;
coeff1 = polynFit(xData,yData1,2);
coeff2 = polynFit(xData,yData2,2);
coeff3 = polynFit(xData,yData3,2);
x1=(0:.5:190);
y1=coeff1(2)+coeff1(1)*x1;
subplot(1,3,1);
plot(x1,y1,xData,yData1,'o');
y2=coeff2(2)+coeff2(1)*x1;
subplot(1,3,2);
plot(x1,y2,xData,yData2,'o');
y3=coeff3(2)+coeff3(1)*x1;
subplot(1,3,3);
plot(x1,y3,xData,yData3,'o');
What do I have to do to get this result?
As a general advice:
avoid for loops wherever possible.
avoid using i and j as variable names, as they are Matlab built-in names for the imaginary unit (I really hope that disappears in a future release...)
Due to m being an interpreted language, for-loops can be very slow compared to their compiled alternatives. Matlab is named MATtrix LABoratory, meaning it is highly optimized for matrix/array operations. Usually, when there is an operation that cannot be done without a loop, Matlab has a built-in function for it that runs way way faster than a for-loop in Matlab ever will. For example: computing the mean of elements in an array: mean(x). The sum of all elements in an array: sum(x). The standard deviation of elements in an array: std(x). etc. Matlab's power comes from these built-in functions.
So, your problem. You have a linear regression problem. The easiest way in Matlab to solve this problem is this:
%# your data
stress = [ %# in Pa
34.5 69 103.5 138] * 1e6;
strain = [ %# in m/m
0.46 0.95 1.48 1.93
0.34 1.02 1.51 2.09
0.73 1.10 1.62 2.12]' * 1e-3;
%# make linear array for the data
yy = strain(:);
xx = repmat(stress(:), size(strain,2),1);
%# re-formulate the problem into linear system Ax = b
A = [xx ones(size(xx))];
b = yy;
%# solve the linear system
x = A\b;
%# modulus of elasticity is coefficient
%# NOTE: y-offset is relatively small and can be ignored)
E = 1/x(1)
What you did in the function polynFit is done by A\b, but the \-operator is capable of doing it way faster, way more robust and way more flexible than what you tried to do yourself. I'm not saying you shouldn't try to make these thing yourself (please keep on doing that, you learn a lot from it!), I'm saying that for the "real" results, always use the \-operator (and check your own results against it as well).
The backslash operator (type help \ on the command prompt) is extremely useful in many situations, and I advise you learn it and learn it well.
I leave you with this: here's how I would write your polynFit function:
function coeff = polynFit(X,Y,m)
if numel(X) ~= numel(X)
error('polynFit:size_mismathc',...
'number of elements in matrices X and Y must be equal.');
end
%# bad condition number, rank errors, etc. taken care of by \
coeff = bsxfun(#power, X(:), m:-1:0) \ Y(:);
end
I leave it up to you to figure out how this works.

Improve Execution Time of MATLAB Function

The following function calculates the Gaussian Kernel and is part of the Kernel Ridge Regression algorithm that I wrote. I was wondering how could I modify this function properly in order to improve the execution time (i.e. get rid of the two for loops). Any ideas?
function [K] = calculate_krr_gaussiankernel(Xi,Xj,S)
K = zeros(size(Xi,1),size(Xj,1));
for Ixi = 1:size(Xi,1),
for Ixj = 1:size(Xj,1),
K(Ixi,Ixj) = exp((-norm(Xi(Ixi,:) - Xj(Ixj,:)) .^ 2) ./ (2 * (S .^ 2)));
end
end
end
EDIT: The formula:
Here's a version that's most likely faster. It might however give rise to memory issues for large Xi/Xj.
function K = calculate_krr_gaussiankernel(Xi, Xj, S)
%# create an array of difference between Xi(r,:) and Xj(s,:) for all r,s
delta = bsxfun(#minus, permute(Xi,[1 3 2]), permute(Xj,[3 1 2]));
%# calculate the squared norm
ssq = sum(delta.^2, 3);
%# calculate the kernel
K = exp(-ssq./(2*S.^2));
Here's an explanation of what I'm doing:
the bsxfun line: I reshape the inputs, such that I can get, at every (i,j), the difference vector in the third dimension
the ssq line simply takes the sum of squares. I could take the square root here and thus get the norm, but since we'll square that again, anyway, there's no point in that.
the final line implements the formula in the OP, where ssq is the squared norm of the differences.
You can certainly double the speed (approximately) since K is symmetric. In addition you can calculate the norm of the difference vector and then make a single call to exp() which may be faster than calling exp() over and over again. Putting this together:
function [K] = calculate_krr_gaussiankernel(Xi,Xj,S)
arg = zeros(size(Xi,1),size(Xj,1));
for Ixi = 1:size(Xi,1),
% diagonal elements can be done in outer loop:
arg(Ixi,Ixi) = norm(Xi(Ixi,:) - Xj(Ixi,:));
for Ixj = Ixi+1:size(Xj,1), % off-diagonals done once and copied
arg(Ixi,Ixj) = norm(Xi(Ixi,:) - Xj(Ixj,:));
arg(Ixj,Ixi) = arg(Ixi,Ixj);
end
end
end
K = exp(( -arg.^ 2) ./ (2 * (S .^ 2)))

How can I speed up this call to quantile in Matlab?

I have a MATLAB routine with one rather obvious bottleneck. I've profiled the function, with the result that 2/3 of the computing time is used in the function levels:
The function levels takes a matrix of floats and splits each column into nLevels buckets, returning a matrix of the same size as the input, with each entry replaced by the number of the bucket it falls into.
To do this I use the quantile function to get the bucket limits, and a loop to assign the entries to buckets. Here's my implementation:
function [Y q] = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
if isvector(q)
q=transpose(q);
end
Y = zeros(size(X));
for i = 1:nLevels
% "The variables g and l indicate the entries that are respectively greater than
% or less than the relevant bucket limits. The line Y(g & l) = i is assigning the
% value i to any element that falls in this bucket."
if i ~= nLevels % "The default; doesnt include upper bound"
g = bsxfun(#ge,X,q(i,:));
l = bsxfun(#lt,X,q(i+1,:));
else % "For the final level we include the upper bound"
g = bsxfun(#ge,X,q(i,:));
l = bsxfun(#le,X,q(i+1,:));
end
Y(g & l) = i;
end
Is there anything I can do to speed this up? Can the code be vectorized?
If I understand correctly, you want to know how many items fell in each bucket.
Use:
n = hist(Y,nbins)
Though I am not sure that it will help in the speedup. It is just cleaner this way.
Edit : Following the comment:
You can use the second output parameter of histc
[n,bin] = histc(...) also returns an index matrix bin. If x is a vector, n(k) = >sum(bin==k). bin is zero for out of range values. If x is an M-by-N matrix, then
How About this
function [Y q] = levels(X,nLevels)
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
Y = zeros(size(X));
for i = 1:numel(q)-1
Y = Y+ X>=q(i);
end
This results in the following:
>>X = [3 1 4 6 7 2];
>>[Y, q] = levels(X,2)
Y =
1 1 2 2 2 1
q =
1 3.5 7
You could also modify the logic line to ensure values are less than the start of the next bin. However, I don't think it is necessary.
I think you shoud use histc
[~,Y] = histc(X,q)
As you can see in matlab's doc:
Description
n = histc(x,edges) counts the number of values in vector x that fall
between the elements in the edges vector (which must contain
monotonically nondecreasing values). n is a length(edges) vector
containing these counts. No elements of x can be complex.
I made a couple of refinements (including one inspired by Aero Engy in another answer) that have resulted in some improvements. To test them out, I created a random matrix of a million rows and 100 columns to run the improved functions on:
>> x = randn(1000000,100);
First, I ran my unmodified code, with the following results:
Note that of the 40 seconds, around 14 of them are spent computing the quantiles - I can't expect to improve this part of the routine (I assume that Mathworks have already optimized it, though I guess that to assume makes an...)
Next, I modified the routine to the following, which should be faster and has the advantage of being fewer lines as well!
function [Y q] = levels(X,nLevels)
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
if isvector(q), q = transpose(q); end
Y = ones(size(X));
for i = 2:nLevels
Y = Y + bsxfun(#ge,X,q(i,:));
end
The profiling results with this code are:
So it is 15 seconds faster, which represents a 150% speedup of the portion of code that is mine, rather than MathWorks.
Finally, following a suggestion of Andrey (again in another answer) I modified the code to use the second output of the histc function, which assigns entries to bins. It doesn't treat the columns independently, so I had to loop over the columns manually, but it seems to be performing really well. Here's the code:
function [Y q] = levels(X,nLevels)
p = linspace(0,1,nLevels+1);
q = quantile(X,p);
if isvector(q), q = transpose(q); end
q(end,:) = 2 * q(end,:);
Y = zeros(size(X));
for k = 1:size(X,2)
[junk Y(:,k)] = histc(X(:,k),q(:,k));
end
And the profiling results:
We now spend only 4.3 seconds in codes outside the quantile function, which is around a 500% speedup over what I wrote originally. I've spent a bit of time writing this answer because I think it's turned into a nice example of how you can use the MATLAB profiler and StackExchange in combination to get much better performance from your code.
I'm happy with this result, although of course I'll continue to be pleased to hear other answers. At this stage the main performance increase will come from increasing the performance of the part of the code that currently calls quantile. I can't see how to do this immediately, but maybe someone else here can. Thanks again!
You can sort the columns and divide+round the inverse indexes:
function Y = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
[S,IX]=sort(X);
[grid1,grid2]=ndgrid(1:size(IX,1),1:size(IX,2));
invIX=zeros(size(X));
invIX(sub2ind(size(X),IX(:),grid2(:)))=grid1;
Y=ceil(invIX/size(X,1)*nLevels);
Or you can use tiedrank:
function Y = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
R=tiedrank(X);
Y=ceil(R/size(X,1)*nLevels);
Surprisingly, both these solutions are slightly slower than the quantile+histc solution.