Simulate and plot with MATLAB

I'm trying to simulate random variables Y such that P(Y=1)=P(Y=-1)=0.5, and X_n = sum of Y_i (i from 1 to n). I want to use MATLAB to simulate X_n and plot it versus n, where n = 1,2,3,...,100. Here is my MATLAB code:
N = 100;
for M = 1:N
y_i = randi([-1 1], M, 1);
X_n = sum(y_i);
end
plot(M, X_n)
But my plot does not look right. Can someone help me fix it? Is there something wrong with my code? Thank you.

Seems like somebody provided you with the right answer already, but let me explain how I would go about it. The only thing you're doing wrong is the indexing: X_n is overwritten on every pass of the loop, so after the loop both M and X_n are scalars and plot(M, X_n) draws just a single point. Try this.
N = 100; % sets your maximum
for M = 1:N % loops from 1 - N
y_i = randi([-1 1], M, 1); % your formula
X(M) = sum(y_i); % stores your data in vectors with increasing index from 1 - 100
end
index = 1:N; % generates a vector 1-100 to serve as indices
plot(index, X) % plots each point of X against its corresponding index
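One caveat worth mentioning: randi([-1 1], M, 1) draws values from {-1, 0, 1}, so it does not quite match P(Y=1)=P(Y=-1)=0.5. If you want strictly +/-1 steps, and want to avoid the loop altogether, here is a minimal sketch of how I might do it (note it simulates a single path of the walk, reusing the same Y_i for every n, rather than redrawing them for each n as the loop above does):
N = 100;
Y = 2*randi([0 1], N, 1) - 1; % each step is -1 or +1 with probability 0.5
X = cumsum(Y);                % X(n) is the partial sum Y_1 + ... + Y_n
plot(1:N, X)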

Related

How to loop over vectors when using MATLAB?

I want to do a simple loop in MATLAB. For example, with v=[1,2,3], I want to get v(1)=v*2+[1,2,5]; v(2)=2*v(1)+[1,2,5], and so on, so that v(1)=[3,6,11].
I have tried:
x=[1,2,3];
y=x;
for j=1:5
y(j+1)=2*y(j)+[1,2,5];
end
but it is wrong.
How can I solve it?
How about:
N = 100;
B = rand(N, N);
A = B / norm(B); % substitute norm of your choice
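Coming back to the recurrence described in the question: assigning a whole vector to the single element y(j+1) is what breaks. One sketch (nIter and c are made-up names for the iteration count and the added vector) is to store each iterate as a row of a matrix:
x = [1, 2, 3];
c = [1, 2, 5];
nIter = 5;
y = zeros(nIter + 1, numel(x)); % row j+1 holds the j-th iterate
y(1, :) = x;
for j = 1:nIter
    y(j + 1, :) = 2*y(j, :) + c; % e.g. y(2, :) = [3, 6, 11]
end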

Function "pdist" in Matlab

I have an N by 2 matrix called r (N is very large), holding the positions of points in 2D. I searched for the most optimized way of calculating the distances between points, and found that pdist is the least time-consuming, as long as one does not convert its output to a square matrix. Suppose I write
D= pdist(r, 'euclidean');
When I need the distance between particles i and j, what is the best way to find it using the vector D? I do not really see any way to do it without using if.
I know that I can do it by
if (i < j)
D((i-1)*(N-i/2)+j-i)
end
But as N is very large, this is not efficient. Could anyone help me, please?
I'm using ii and jj as row and column indices into the hypothetical distance matrix M = squareform(D) of size N by N. The result is ind, such that D(ind) equals M(ii,jj).
t = sort([ii, jj]); % temporary variable
ii = t(2); % maximum of ii and jj
jj = t(1); % minimum of ii and jj
t = N-1:-1:1;
ind = sum(t(1:jj-1)) + ii - jj;
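For a quick sanity check, here is a small made-up example (N = 6, random points) comparing the computed ind against the full squareform matrix:
N = 6;
r = rand(N, 2);
D = pdist(r, 'euclidean');
M = squareform(D);              % full N-by-N distance matrix, only for checking
ii = 5; jj = 2;                 % any pair with ii ~= jj
t = sort([ii, jj]); ii = t(2); jj = t(1);
t = N-1:-1:1;
ind = sum(t(1:jj-1)) + ii - jj;
isequal(D(ind), M(5, 2))        % expected to return logical 1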

Vectorize with Matlab Meshgrid in Chebfun

I am trying to use meshgrid in Matlab together with Chebfun to get rid of double for loops. I first define a quasi-matrix of N functions,
%Define functions of type Chebfun
N = 10; %number of functions
x = chebfun('x', [0 8]); %Domain
psi = [];
for i = 1:N
psi = [psi sin(i.*pi.*x./8)];
end
A sample calculation would be to compute the double sum $\sum_{i,j=1}^{10} \psi_i \psi_j$, i.e. summing psi(:,i).*psi(:,j) over all pairs (i,j). I can achieve this using two for loops in Matlab,
h = 0;
for i = 1:N
for j = 1:N
h = h + psi(:,i).*psi(:,j);
end
end
I then tried to use meshgrid to vectorize in the following way:
[i j] = meshgrid(1:N,1:N);
h = psi(:,i).*psi(:,j);
I get the error "Column index must be a vector of integers". How can I overcome this issue so that I can get rid of my double for loops and make my code a bit more efficient?
BTW, Chebfun is not part of native MATLAB and you have to download it in order to run your code: http://www.chebfun.org/. However, that shouldn't affect how I answer your question.
Basically, psi is an N-column quasi-matrix, and you want to add up the products of all combinations of pairs of its columns. You have the right idea with meshgrid, but what you should do instead is unroll the 2D coordinate matrices for both i and j into single vectors. You then use these to pull out two N^2-column matrices from psi, where corresponding columns of the two matrices are the columns of psi selected by i and by j respectively. Finally, you do an element-wise multiplication between these two matrices and sum across the columns for each row. BTW, I'm going to use ii and jj as the outputs of meshgrid instead of i and j; those names shadow the imaginary unit in MATLAB and I don't want to overshadow it unintentionally.
Something like this:
%// Your code
N = 10; %number of functions
x = chebfun('x', [0 8]); %Domain
psi = [];
for i = 1:N
psi = [psi sin(i.*pi.*x./8)];
end
%// New code
[ii,jj] = meshgrid(1:N, 1:N);
%// Create two matrices and sum
matrixA = psi(:, ii(:));
matrixB = psi(:, jj(:));
h = sum(matrixA.*matrixB, 2);
If you want to do away with the temporary variables, you can do it in one statement after calling meshgrid:
h = sum(psi(:, ii(:)).*psi(:, jj(:)), 2);
I don't have Chebfun installed, but we can verify that this calculates what we need with a simple example:
rng(123);
N = 10;
psi = randi(20, N, N);
Running this code with the above more efficient solution gives us:
>> h
h =
8100
17161
10816
12100
14641
9216
10000
8649
9025
11664
Running the above double for loop code also gives us:
>> h
h =
8100
17161
10816
12100
14641
9216
10000
8649
9025
11664
If you want to be absolutely sure, we can have both codes run with the outputs as separate variables, then check if they're equal:
%// Setup
rng(123);
N = 10;
psi = randi(20, N, N);
%// Old code
h = 0;
for i = 1:N
for j = 1:N
h = h + psi(:,i).*psi(:,j);
end
end
%// New code
[ii,jj] = meshgrid(1:N, 1:N);
hnew = sum(psi(:, ii(:)).*psi(:, jj(:)), 2);
%// Check for equality
eql = isequal(h, hnew);
eql checks whether both variables are equal, and indeed they are:
>> eql
eql =
1
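As a side note, this particular double sum factors as $\sum_{i,j=1}^{N} \psi_i \psi_j = \left(\sum_{i=1}^{N} \psi_i\right)^2$, so for the plain-matrix test above the same h can be obtained with a row sum and a square (I haven't checked whether this one-liner also works on a Chebfun quasi-matrix):
h2 = sum(psi, 2).^2; % (sum across the columns of each row), squared
isequal(h, h2)       % returns 1 for the integer test data above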

Estimated mean and covariance calculation in matlab using maximum likelihood method

I am trying to calculate the estimated mean and covariance using the maximum likelihood method in MATLAB. I am a newbie in MATLAB and have some problems that I would like cleared up here.
I am using the following code:
clear all;
%Visualization of 2D Gaussian Distribution
% Mean of the distribution
mu = [1 -1];
% Covariance matrix (must be symmetric)
sigma = [ 2 1 ; 1 3 ];
% Samples
X = mvnrnd(mu,sigma,1000);
analytical_mean = mean(X);
analytical_cov = cov(X);
N = size(X,1);
estimated_mean = sum(X)/N;
summation = 0;
for i=1:N,
row = X(i,:);
tmp1= (row - estimated_mean);
tmp2 = tmp1';
summation = summation + tmp2;
end
covar = summation/N;
Now analytical_mean and estimated_mean come out equal, but my calculated covariance covar does not come out as a matrix like analytical_cov does. I need to know how to calculate covar correctly.
I am using the maximum-likelihood estimators $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i$ and $\hat{\Sigma} = \frac{1}{N}\sum_{i=1}^{N} (x_i - \hat{\mu})(x_i - \hat{\mu})^T$.
You can try this instead:
[m,n] = size(X);
estimated_mean = sum(X)/m;
tmp=zeros(m,n);
for i=1:n
tmp(:,i)= ((X(:,i) - estimated_mean(i)));
end
covar = (tmp.'*tmp)/m;
I think you want
tmp2 = tmp1'*tmp1;
instead of
tmp2 = tmp1'
That change makes covar pretty close for me:
covar =
1.9042 0.9534
0.9534 3.0195
The clue was the dimensions of covar for your code: it should have been 2-by-2, but yours was 2-by-1.
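For what it's worth, the per-sample loop can be avoided entirely. Here is a minimal loop-free sketch of the maximum-likelihood estimate, assuming X is the N-by-2 sample matrix from the question (bsxfun is used for the centering so it also runs on releases without implicit expansion):
N = size(X, 1);
estimated_mean = sum(X, 1) / N;          % ML estimate of the mean
Xc = bsxfun(@minus, X, estimated_mean);  % centered samples
covar = (Xc' * Xc) / N;                  % ML estimate (divides by N, not N-1)
% cov(X) uses the N-1 convention, so covar * N/(N-1) should match cov(X) up to round-off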

Vectorize octave/matlab codes

Following is the Octave code (part of k-means):
centroidSum = zeros(K);
valueSum = zeros(K, n);
for i = 1 : m
for j = 1 : K
if(idx(i) == j)
centroidSum(j) = centroidSum(j) + 1;
valueSum(j, :) = valueSum(j, :) + X(i, :);
end
end
end
The code works; is it possible to vectorize it?
It is easy to vectorize code without an if statement,
but how could we vectorize code that contains an if statement?
I assume the purpose of the code is to compute the centroids of subsets of a set of m data points in an n-dimensional space, where the points are stored in a matrix X (points x coordinates) and the vector idx specifies for each data point the subset (1 ... K) the point belongs to. Then a partial vectorization is:
centroid = zeros(K, n);
for j = 1 : K
centroid(j, :) = mean(X(idx == j, :));
end
The if is eliminated by indexing, in particular logical indexing: idx == j gives a boolean array which indicates which data points belong to subset j.
I think it might be possible to get rid of the second for-loop, too, but this would result in very convoluted, unintelligible code.
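For readers new to logical indexing, here is a tiny made-up example of how the comparison replaces the if:
X = [1 2; 3 4; 5 6; 7 8]; % 4 points in 2D (made-up values)
idx = [1; 2; 1; 2];       % cluster assignment of each point
X(idx == 1, :)            % rows belonging to cluster 1 -> [1 2; 5 6]
mean(X(idx == 1, :))      % its centroid -> [3 4]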
Brief introduction and solution code
This could be one fully vectorized approach based on -
accumarray: for accumulating the summations, as done when calculating valueSum. This also introduces a technique for using accumarray on a 2D matrix along a certain direction, which isn't possible with it in a straightforward manner.
bsxfun: for calculating linear indices across all columns for the matching row indices from idx.
Here's the implementation -
%// Store no. of columns in X for frequent usage later on
ncols = size(X,2);
%// Find the entries of idx that fall within the 1:K range (logical flags pos)
%// and the cluster id of each such entry (labels)
[pos,id] = ismember(idx,1:K);
labels = id(pos);
%// OR with bsxfun: [pos,labels] = find(bsxfun(@eq,idx(:),1:K));
%// Find all labels, i.e. across all columns of X
all_labels = bsxfun(@plus,labels(:),[0:ncols-1]*K);
%// Get truncated X corresponding to all indices matches across all columns
X_cut = X(pos,:);
%// Accumulate summations within each column based on the labels.
%// Note that accumarray doesn't accept matrices, so we were required
%// to create all_labels that had same labels within each column and
%// offsetted at constant intervals from consecutive columns
acc1 = accumarray(all_labels(:),X_cut(:));
%// Regularise accumulated array and reshape back to a 2D array version
acc1_reg2D = [acc1 ; zeros(K*ncols - numel(acc1),1)];
valueSum = reshape(acc1_reg2D,[],ncols);
centroidSum = histc(labels,1:K); %// Get labels counts as centroid sums
Benchmarking code
%// Datasize parameters
K = 5000;
n = 5000;
m = 5000;
idx = randi(9,1,m);
X = rand(m,n);
disp('----------------------------- With Original Approach')
tic
centroidSum1 = zeros(K,1);
valueSum1 = zeros(K, n);
for i = 1 : m
for j = 1 : K
if(idx(i) == j)
centroidSum1(j) = centroidSum1(j) + 1;
valueSum1(j, :) = valueSum1(j, :) + X(i, :);
end
end
end
toc, clear valueSum1 centroidSum1
disp('----------------------------- With Proposed Approach')
tic
%// ... Code from the earlier mentioned section
toc
Runtime results
----------------------------- With Original Approach
Elapsed time is 1.235412 seconds.
----------------------------- With Proposed Approach
Elapsed time is 0.379133 seconds.
Not sure about its runtime performance but here's a non-convoluted vectorized implementation:
b = idx == 1:K;
centroids = (b' * X) ./ sum(b)';
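In case it helps to unpack that one-liner: b is an m-by-K logical membership matrix, b' * X accumulates the per-cluster sums, and sum(b)' holds the per-cluster counts. The comparison idx == 1:K and the final division rely on implicit expansion (R2016b or newer) and on idx being a column vector; an equivalent that avoids both assumptions is:
b = bsxfun(@eq, idx(:), 1:K);                  % m-by-K cluster membership
centroids = bsxfun(@rdivide, b' * X, sum(b)'); % per-cluster sums ./ counts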
Vectorizing the calculation makes a huge difference in performance. Benchmarking the original code, the partial vectorization from A. Donda, and the full vectorization from Tom gave me the following results:
Original Code: Elapsed time is 1.327877 seconds.
Partial Vectorization: Elapsed time is 0.630767 seconds.
Full Vectorization: Elapsed time is 0.021129 seconds.
Benchmarking code here:
%// Datasize parameters
K = 5000;
n = 5000;
m = 5000;
idx = randi(9,1,m);
X = rand(m,n);
fprintf('\nOriginal Code: ')
tic
centroidSum1 = zeros(K,1);
valueSum1 = zeros(K, n);
for i = 1 : m
for j = 1 : K
if(idx(i) == j)
centroidSum1(j) = centroidSum1(j) + 1;
valueSum1(j, :) = valueSum1(j, :) + X(i, :);
end
end
end
centroids = valueSum1 ./ centroidSum1;
toc, clear valueSum1 centroidSum1 centroids
fprintf('\nPartial Vectorization: ')
tic
centroids = zeros(K,n);
for k = 1:K
centroids(k,:) = mean( X(idx == k, :) );
end
toc, clear centroids
fprintf('\nFull Vectorization: ')
tic
centroids = zeros(K,n);
b = idx == 1:K;
centroids = (b * X) ./ sum(b)';
toc
Note, I added an extra line to the original code to element-wise divide valueSum1 by centroidSum1 to make the output of each type of code the same.
Finally, I know this isn't strictly an "answer", but I don't have enough reputation to add a comment, and I thought the benchmarking figures would be useful to anyone who is learning MATLAB (like myself) and needs some extra motivation to master vectorization.