How to accelerate this matlab function - matlab

I have a function that computes the Hodges-Lehmann robust mean over the columns of a matrix x of size [m,n], where m is the number of samples and n is the batch index of the data.
function HLe = HodgesLehmann(x)
% Obtain dimensions
[m,n] = size(x);
% Create xi and xj values with the i <= j restriction enforced
q = logical(triu(ones(m,m),0));
i = uint32((1:m)'*ones(1,m));
xi = x(i(q),:);
j = uint32(ones(m,1)*(1:m));
xj = x(j(q),:);
% Calculate pairwise means (Walsh averages)
W = (xi+xj)./2;
% Calculate ordinary median of Walsh averages
HLe = median(W);
I am looking for a way to accelerate this function, because it does not scale well for large dimensions of x. Any way of accelerating it is welcome.
Many thanks.

Inspired by this solution, here's a possible (not tested for performance) improvement -
%// Calculate pairwise means (Walsh averages)
[I,J] = find(bsxfun(@le,[1:m]',[1:m])); %//'
W = (x(J,:) + x(I,:))./2;
%// Calculate ordinary median of Walsh averages
HLe = median(W);
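
To check the index-based approach by hand, here is a minimal sketch on a tiny 3-by-2 input. The matrix x is made up for illustration, and triu(true(m)) is swapped in for the bsxfun call; both select the pairs with i <= j.

```matlab
x = [1 10; 2 20; 4 40];        % hypothetical data: m = 3 samples, n = 2 batches
m = size(x, 1);

% Row/column subscripts of the upper triangle, i.e. all pairs with i <= j
[I, J] = find(triu(true(m)));

% Walsh averages: one row per pair, one column per batch
W = (x(I,:) + x(J,:)) ./ 2;

% Median along the first dimension, stated explicitly so that a
% single-row W (m = 1) is not collapsed to a scalar
HLe = median(W, 1);            % gives [2.25 22.5]
```

Note that W still holds m*(m+1)/2 rows per batch, so memory grows quadratically with m whichever way the indices are built.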

Related

Vectors must be the same length error in Curve Fitting in Matlab

I'm having problems in curve fitting my randomized data for the function
Here is my code
N = 100;
mu = 5; stdev = 2;
x = mu+stdev*randn(N,1);
bin=mu-6*stdev:0.5:mu+6*stdev;
f=hist(x,bin);
plot(bin,f,'bo'); hold on;
x_ = x(1):0.1:x(end);
y_ = (1./sqrt(8.*pi)).*exp(-((x_-mu).^2)./8);
plot(x_,y_,'b-'); hold on;
It seems like I'm having vector size problems since it is giving me the error
Error using plot
Vectors must be the same length.
Note that I simplified y_ since mu and the standard deviation is known.
Well first of all some adjustments to your question:
You are not trying to do curve fitting. What you are trying to do (in my opinion) is to overlay a probability density function on a histogram obtained by taking random points from the same distribution (a normal distribution with parameters (mu,sigma)). These two curves should indeed overlay, as they represent the same thing; one is analytical and the other is obtained numerically.
As seen in the hist documentation, hist is not recommended and you should use histogram instead.
First step: Generating your random data
Knowing the distribution is the Normal distribution, we can use MATLAB's random function to do that :
N = 150;
rng('default') % For reproducibility
mu = 5;
sigma = 2;
r = random('Normal',mu,sigma,N,1);
Second step: Plot the histogram
Because we don't just want a count of the elements in each bin, but a feel for the probability density function, we can use the 'Normalization','pdf' name-value pair:
Nbins = 25;
f=histogram(r,Nbins,'Normalization','pdf');
hold on
Here I'd rather specify a number of bins than specifying the bins themselves, because you never know in advance how far from the mean your data is going to be.
Last step: overlay the probability density function over the histogram
The histogram being already consistent with a probability density function, it is sufficient to just overlay the density function:
x_ = linspace(min(r),max(r),100);
y_ = (1./sqrt(2*sigma^2*pi)).*exp(-((x_-mu).^2)./(2*sigma^2));
plot(x_,y_,'b-');
(Figures omitted: results with N = 150, with N = 1500, and with N = 150,000 and Nbins = 50.)
If for some obscure reason you want to use the old hist() function
The old hist() function can't handle normalization, so you'll have to do it by hand, by scaling your density function by N times the bin width so that it matches the raw counts of the histogram:
N = 1500;
% rng('default') % For reproducibility
mu = 5;
sigma = 2;
r = random('Normal',mu,sigma,1,N);
Nbins = 50;
[~,centers]=hist(r,Nbins);
hist(r,Nbins); hold on
% Width of bins
Widths = diff(centers);
x_ = linspace(min(r),max(r),100);
y_ = N*mean(Widths)*(1./sqrt(2*sigma^2*pi)).*exp(-((x_-mu).^2)./(2*sigma^2));
plot(x_,y_,'r-');
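
The same scaling can be applied the other way round, dividing the counts instead of multiplying the density. The sketch below is an illustration only: it uses randn rather than random so that no toolbox is needed, and the variable names pdf_est and area are made up here.

```matlab
N = 1500; mu = 5; sigma = 2;
r = mu + sigma * randn(1, N);          % normal samples without the toolbox
Nbins = 50;

[counts, centers] = hist(r, Nbins);    % raw counts and equally spaced bin centers
width = centers(2) - centers(1);       % bin width

% Dividing the counts by N*width turns them into a density estimate
pdf_est = counts / (N * width);

% A density must integrate to 1; with equal-width bins this is a plain sum
area = sum(pdf_est) * width;
```

Calling bar(centers, pdf_est, 1) would then draw a histogram on the same scale as the unscaled density function.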

Which function allows me to calculate cumulative variance over a vector?

I need to calculate the cumulative variance of a vector. I have tried to build a script, but it takes too much time to calculate the cumulative variance of my vectors of size 1*100000. Do you know if there is a faster way to find this cumulative variance?
This is the code I am using
%% Creation of the rand vectors and calculation of the variances
d=100000; %dimension of the vectors
nv=6 %quantity of vectors
for j=1:nv;
VItimeseries(:,j)=rand(d,1); % Final matrix with vectors
end
%% script to calculate the cumulative variance in the columns of my matrix
VectorVarianza=0;
VectoFinalVar=0;
VectorFinalTotalVAriances=zeros(d,nv);
for k=1:nv %number of columns
for j=1:numel(VItimeseries(:,k)) %size of the rows
Vector=VItimeseries(:,k);
VectorVarianza(1:j)= Vector(1:j); % Vector to calculate the variance...
...Independently
VectorFinalVar(j,k)= var(VectorVarianza);%Calculation of variances
end
VectorFinalTotalVAriances(:,k)=VectorFinalVar(:,k)% construction of the...
...Final Vector with the cumulative variances
end
Looping over the n elements of x, and within the loop computing the variance of all elements up to i using var(x(1:i)), amounts to an O(n^2) algorithm. This is inherently expensive.
Sample variance (what var computes) is defined as sum((x-mean(x)).^2) / (n-1), with n = length(x). This can be rewritten as (sum(x.^2) - sum(x).^2 / n) / (n-1). This formula allows us to accumulate sum(x) and sum(x.^2) within a single loop, then compute the variance later. It also allows us to compute the cumulative variance in O(n).
For a vector x, we'd have the following loop:
x = randn(100,1); % some data
v = zeros(size(x)); % cumulative variance
s = x(1); % running sum of x
s2 = x(1).^2; % running sum of square of x
for ii = 2:numel(x) % loop starts at 2, for ii=1 we cannot compute variance
s = s + x(ii);
s2 = s2 + x(ii).^2;
v(ii) = (s2 - s.^2 / ii) / (ii-1);
end
We can avoid the explicit loop by using cumsum:
s = cumsum(x);
s2 = cumsum(x.^2);
n = (1:numel(x)).';
v = (s2 - s.^2 ./ n) ./ (n-1); % v(1) will be NaN, rather than 0 as in the first version
v(1) = 0; % so we set it to 0 explicitly here
The code in the OP computes the cumulative variance for each column of a matrix. The code above can be trivially adapted to do the same:
s = cumsum(VItimeseries,1); % cumulative sum explicitly along columns
s2 = cumsum(VItimeseries.^2,1);
n = (1:size(VItimeseries,1)).'; % use number of rows, rather than `numel`.
v = (s2 - s.^2 ./ n) ./ (n-1);
v(1,:) = 0; % fill first row with zeros, not just first element
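
As a sanity check, the cumsum version can be compared against the slow var-in-a-loop approach on a small matrix. This is only a sketch: X, vref and maxErr are names made up for the check, and the implicit expansion in s.^2 ./ n needs MATLAB R2016b or later.

```matlab
X = rand(50, 3);                       % small test matrix, one series per column

% Fast version: running sums along the columns
s  = cumsum(X, 1);
s2 = cumsum(X.^2, 1);
n  = (1:size(X, 1)).';
v  = (s2 - s.^2 ./ n) ./ (n - 1);
v(1,:) = 0;

% Reference version: recompute var on every prefix (O(n^2), small inputs only)
vref = zeros(size(X));
for j = 2:size(X, 1)
    vref(j,:) = var(X(1:j,:), 0, 1);
end

maxErr = max(abs(v(:) - vref(:)));     % expected to be at rounding-error level
```

One caveat: the sum-of-squares formula subtracts two large numbers when the mean is large compared to the spread, so it can lose precision on such data. Subtracting a constant close to the mean from the data first is a common way to reduce that effect.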

Computing an ODE in Matlab

Given a system of the form y' = A*y(t) with solution y(t) = e^(tA)*y(0), where e^A is the matrix exponential (i.e. sum from n=0 to infinity of A^n/n!), how would I use matlab to compute the solution given the values of matrix A and the initial values for y?
That is, given A = [-2.1, 1.6; -3.1, 2.6], y(0) = [1;2], how would I solve for y(t) = [y1; y2] on t = [0:5] in matlab?
I try to use something like
t = 0:5
[y1; y2] = expm(A.*t).*[1;2]
and I'm finding errors in computing the multiplication due to dimensions not agreeing.
Please note that the matrix exponential is defined for square matrices. Multiplying A by the whole time vector doesn't give you what you want (that would have to be a 3D array, exponentiated slice by slice).
One of the simple ways would be this:
A = [-2.1, 1.6; -3.1, 2.6];
t = 0:5;
n = numel(t); %'number of samples'
y = NaN(2, n);
y(:,1) = [1;2];
for k =2:n
y(:,k) = expm(t(k)*A) * y(:,1);
end;
figure();
plot(t, y(1,:), t, y(2,:));
Please note that in MATLAB arrays are indexed from 1.
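
When the time points are equally spaced, as in t = 0:5, a single call to expm is enough, because expm((t+dt)*A) = expm(dt*A)*expm(t*A). The sketch below assumes that uniform spacing; P, yDirect and relErr are names introduced here for illustration.

```matlab
A  = [-2.1, 1.6; -3.1, 2.6];
y0 = [1; 2];
t  = 0:5;

dt = t(2) - t(1);              % assumes equally spaced time points
P  = expm(dt * A);             % one-step propagator, computed once

y = zeros(2, numel(t));
y(:,1) = y0;
for k = 2:numel(t)
    y(:,k) = P * y(:,k-1);     % advance the solution by one step
end

% Cross-check the last column against the direct formula
yDirect = expm(t(end) * A) * y0;
relErr  = norm(y(:,end) - yDirect) / norm(yDirect);
```

Repeated multiplication accumulates rounding error, so for very long time grids the direct expm(t(k)*A) call per time point may be the safer choice.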

Mean and median calculation of a Gaussian Mixture Model in MATLAB

How can I calculate the mean and median of a Gaussian Mixture Model with three components like the following parameters in MATLAB:
Priors[0.4,0.25,0.34]
Centers [0.44;0.74;0.05]
Co-variance [0.03,0.18,0.03]
Thanks
Here is the MATLAB code for calculating mean and median of a Gaussian Mixture Model (GMM):
Mean Calculation for N GMMs:
for i = 1:N
mu = center{i};
p = prior{i};
mean_mix(i) = mu(1)*p(1) + mu(2)*p(2) + mu(3)*p(3);
end
Median Calculation for N GMMs:
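median_mix = zeros(N,1); % named median_mix to avoid shadowing the built-in median
for i = 1:N
for j = 2:numel(x) % walk along the evaluation grid x
% cumulative probability up to x(j), truncated to whole percent
if fix(trapz(x(1:j), gmm_pdfs(1:j,i))*100) == 50
median_mix(i) = x(j);
break % keep the first grid point where the CDF reaches 50%
end
end
end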
Note: gmm_pdfs are the evaluated pdfs against x.
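
For the single one-dimensional mixture in the question, a self-contained sketch could look as follows. It rests on two assumptions that the question does not confirm: the "Co-variance" values are taken to be per-component variances, and the priors (which sum to 0.99) are renormalized to sum to 1. The names mixcdf, mean_mix and median_mix are made up here.

```matlab
% Mixture parameters from the question
p     = [0.4, 0.25, 0.34];
mu    = [0.44, 0.74, 0.05];
sigma = sqrt([0.03, 0.18, 0.03]);   % assumption: the listed values are variances
p     = p / sum(p);                 % assumption: renormalize the priors

% The mean of a mixture is the prior-weighted mean of the component means
mean_mix = sum(p .* mu);

% Mixture CDF, with the normal CDF written via erf (no toolbox needed)
mixcdf = @(t) sum(p .* 0.5 .* (1 + erf((t - mu) ./ (sqrt(2) .* sigma))));

% The median is where the CDF crosses 0.5; the bracket is a generous guess
median_mix = fzero(@(t) mixcdf(t) - 0.5, [min(mu) - 10, max(mu) + 10]);
```

Solving for the 0.5 crossing with fzero avoids depending on how fine the evaluation grid x is, which the trapz-based search above is sensitive to.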

Finding principal components with maximum variance in matlab

I used the following code to compute PCA :
function [signals,PC,V] = pca2(data)
[M,N] = size(data);
% subtract off the mean for each dimension
mn = mean(data,2);
data = data - repmat(mn,1,N);
% construct the matrix Y
Y = data' / sqrt(N-1);
% SVD does it all
[u,S,PC] = svd(Y);
% calculate the variances
S = diag(S);
V = S .* S;
% project the original data
signals = PC' * data;
I want to keep the principal components with the maximum variance, say the first 10 principal components, which contribute the most variance. How do I go about this?
function [signals,V] = pca2(data)
[M,N] = size(data);
data = reshape(data, M*N,1);
% subtract off the mean for each dimension
mn = mean(data,2);
data = bsxfun(@minus, data, mean(data,1));
% construct the matrix Y
Y = data'*data / (M*N-1);
[V D] = eigs(Y, 10); % reduce to 10 dimension
% project the original data
signals = data * V;
I guess svds can do the job for you.
In the doc, it says:
s = svds(A,k) computes the k largest singular values and associated
singular vectors of matrix A.
For a symmetric positive semidefinite matrix such as a covariance matrix, these are the k largest eigenvalues and their eigenvectors. They are returned sorted in descending order.
So for 10 principal components, just use [eigvec eigval] = svds(Y, 10);
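
Alternatively, since svd already returns the singular values in descending order, the original pca2 code can be trimmed to the first k components directly. The sketch below uses random data purely for illustration, with the same layout as the question (one dimension per row, one observation per column); k, PCk and explained are names introduced here.

```matlab
data = randn(20, 200);                 % hypothetical: 20 dimensions, 200 observations
k = 10;                                % number of components to keep
N = size(data, 2);

% Subtract the mean of each dimension
data0 = bsxfun(@minus, data, mean(data, 2));

% Same construction as in pca2: the columns of PC are the principal components
Y = data0' / sqrt(N - 1);
[~, S, PC] = svd(Y, 'econ');
V = diag(S).^2;                        % variances, already sorted in descending order

% Keep only the k leading components and project the data onto them
PCk = PC(:, 1:k);
signals = PCk' * data0;                % k-by-N matrix of projected data

% Fraction of the total variance captured by the k components
explained = sum(V(1:k)) / sum(V);
```

The value of explained gives a way to choose k from a variance target rather than fixing it at 10 in advance.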