Given two random variables X and Y, where X=(x1,..,xn) and Y=(y1,...,yn) in a nx2 matrix A, so A=[X Y], i need to perform the next operation:
median((x-median(x))(y-median(y)))
I'm trying to obtain an estimator of the covariance matrix using the median instead the mean, for a nxt matrix where t represents the number of random variables and n the length of the data set.
So far, I made the next code:
for i=1:n
for j=1:n
a1=median(A(:,i));
a2=median(A(:,j));
SMM(i,j)=median(((A(:,i)-a1(ones(t,1),:)).*(A(:,j)-a2(ones(t,1),:))));
end
end
However, theoretically I must obtain a semidefinite (positive or negative) symmetric matrix, however that's not the case with this code.
Am I making any mistake in the code formulation?
Various points:
For each of your columns of A (x, y), the median (a1, a2) doesn't change. You should compute these outside the loops.
The loops go over n, rather than t, which are the variables and the indices to the output matrix.
I would first subtract the median from each column, to avoid repeatedly doing the same computations:
A = A - median(A,1); % be explicit about which dimension to take the median over!
Next, we'd loop over the txt output elements of the covariance matrix, and compute each of the elements:
t = size(A,2);
SMM = zeros(t,t); % always preallocate output arrays before a loop
for j=1:t
for i=1:t
SMM(i,j) = median(A(:,i).*A(:,j));
end
end
The loop can likely be vectorized, but that leads to a large intermediate matrix, which slow down code also. So it might not be worth the effort to vectorize. Only try it if this code is too slow!
It should also be possible to run the inner loop from i=j:t, to skip computing the redundant half of the symmetric matrix, instead copying over the previously computed values.
Related
Please correct me if there are somethings unclear in this question. I have two matrices pop, and ben of 3 dimensions. Call these dimensions as c,t,w . I want to repeat the exact same process I describe below for all of the c dimensions, without using a for loop as that is slow. For the discussion below, fix a value of the dimension c, to explain my thinking, later I will give a MWE. So when c is fixed I have a 2D matrix with dimension t,w.
Now I repeat the entire process (coming below!) for all of the w dimension.
If the value of u is zero, then I find the next non zero entry in this same t dimension. I save both this entry as well as the corresponding t index. If the value of u is non zero, I simply store this value and the corresponding t index. Call the index as i - note i would be of dimension (c,t,w). The last entry of every u(c,:,w) is guaranteed to be non zero.
Example if the u(c,:,w) vector is [ 3 0 4 2 0 1], then the corresponding i values are [1,3,3,4,6,6].
Now I take these entries and define a new 3d array of dimension (c,t,w) as follows. I take my B array and do the following what is not a correct syntax but to explain you: B(c,t,w)/u(c,i(c,t,w),w). Meaning I take the B values and divide it by the u values corresponding to the non zero indices of u from i that I computed.
For the above example, the denominator would be [3,4,4,2,1,1]. I hope that makes sense!!
QUESTION:
To do this, as this process simply repeats for all c, I can do a very fast vectorizable calculation for a single c. But for multiple c I do not know how to avoid the for loop. I don't knw how to do vectorizable calculations across dimensions.
Here is what I did, where c_size is the dimension of c.
for c=c_size:-1:1
uu=squeeze(pop(c,:,:)) ; % EXTRACT A 2D MATRIX FROM pop.
BB=squeeze(B(c,:,:)) ; % EXTRACT A 2D MATRIX FROM B
ii = nan(size(uu)); % Start with all nan values
[dum_row, ~] = find(uu); % Get row indices of non-zero values
ii(uu ~= 0) = dum_row; % Place row indices in locations of non-zero values
ii = cummin(ii, 1, 'reverse'); % Column-wise cumulative minimum, starting from bottomi
dum_i = ii+(time_size+1).*repmat(0:(scenario_size-1), time_size+1, 1); % Create linear index
ben(c,:,:) = BB(dum_i)./uu(dum_i);
i(c,:,:) = ii ;
clear dum_i dum_row uu BB ii
end
The central question is to avoid this for loop.
Related questions:
Vectorizable FIND function with if statement MATLAB
Efficiently finding non zero numbers from a large matrix
Vectorizable FIND function with if statement MATLAB
I have a matrix W which is a block diagonal matrix with dimensions 2*4, and each of its two block diagonals is 1*2 vector. I want to find the values of its entries that minimize the difference between the following function:
( F = BH-AW )
Where: W is the required block diagonal matrix to be optimized, B is a 2*2 matrix, H is a given 2*4 matrix, and A is a 2*2 matrix. A and B are calculated using the functions used in the attached code.
I tried this attached code, but I think it is now in an infinite loop and I don't know what should I do?
%% My code is:
while ((B*H)-(A*W)~=zeros(2,4))
w1=randn(1,2);
% generate the first block diagonal vector with dimensions 1*2. The values of each entry of the block diagonal vector maybe not the same.
w2=randn(1,2);
% generate the second block diagonal vector with dimensions 1*2.
W=blkdiag(w1,w2);
% build the block diagonal matrix that I want to optimize with dimensions 2*4.
R=sqrtm(W*inv(inv(P)+(H'*inv(eye(2)+D)*H))*W');
% R is a 2*2 matrix that will be used to calculate matrix A using the LLL lattice reduction algorithm. The values of P (4*4 matrix), H (2*4 matrix) and D (2*2 matrix) are given. It's clear here that matrix R is a function of W.
A= LLL(R,3/4);
% I use here LLL lattice reduction algorithm to obtain 2*2 matrix A which is function of R.
B=A'*W*P*H'*inv(eye(2)+D+H*P*H');
% B is 2*2 matrix which is function of A and W. The values of P (4*4 matrix), H (2*4 matrix) and D (2*2 matrix) are given.
end
Numerical operations with floating point numbers are only approximate on a computer (any number is only ever represented with a finite number of bits, which means you cannot exactly represent Pi for example). For more info, see this link.
Consequently, it is extremely unlikely that the loop you wrote will ever terminate, because the difference between B*H and A*W will not be exactly zero. Instead, you need to use a tolerance factor to decide when you are satisfied with the similarity achieved.
Additionally, as suggested by others in comment, the "distance" between two matrices is typically measured using some sort of norm (e.g. the Frobenius norm). By default, the norm function in Matlab will give the 2-norm of an input matrix.
In your case, this would give something like:
tol = 1e-6;
while norm(B*H-A*W) > tol
% generate the first block diagonal vector with dimensions 1*2.
% The values of each entry of the block diagonal vector maybe not the same.
w1=randn(1,2);
% generate the second block diagonal vector with dimensions 1*2.
w2=randn(1,2);
% build the block diagonal matrix that I want to optimize with dimensions 2*4.
W=blkdiag(w1,w2);
% R is a 2*2 matrix that will be used to calculate matrix A using the LLL lattice reduction algorithm.
% The values of P (4*4 matrix), H (2*4 matrix) and D (2*2 matrix) are given.
% It's clear here that matrix R is a function of W.
R=sqrtm(W/(inv(P)+(H'/(eye(2)+D)*H))*W');
% I use here LLL lattice reduction algorithm to obtain 2*2 matrix A which is function of R.
A= LLL(R,3/4);
% B is 2*2 matrix which is function of A and W. The values of P (4*4 matrix),
% H (2*4 matrix) and D (2*2 matrix) are given.
B=A'*W*P*H'/(eye(2)+D+H*P*H');
end
Note that:
With regards to the actual algorithm, I am a little bit concerned that your loop never seems to update the value of W, but instead updates matrices A and B. This suggests that your description of the problem might be incorrect or incomplete, but that is beyond the scope of this forum anyway (ask on Maths.SE if you want to know more).
Using inv() directly is discouraged in many cases. This is because the algorithm to compute the inverse of a matrix is less reliable than the algorithm to solve systems of the type AX=B. Matlab should give you a warning to use / and \ where possible; I would advise following this recommendation unless you know what you are doing.
In Matlab, I have two m-by-n matrices X and Y, with n>m.
I need to define a 3D m-by-m-by-n matrix Z whose components can be computed as
for i=2:m
for j=i+1:m
for k=1:n
Z(i,j,k) = (Y(j-1,k)-Y(i-1,k))*X(j-1,k);
end
end
end
As these nested loops require a long computational time, I have been looking for a way to define the matrix Z using matrices multiplication, but I have not managed so far. Any suggestion?
You can simply remove the inner loop (over k) by writing
Z(i,j,:) = (Y(j-1,:)-Y(i-1,:)).*X(j-1,:);
Note the .* element-wise multiplication. You can then proceed to remove additional loops in a similar manner.
But note that, most likely, your loop is slow because you don't preallocate the output array. Do this before the loop:
Z = zeros(m,m,n);
You can also gain a bit of speed by reversing the loop order, such that the first index is iterated over in the innermost loop, and the last index is iterated over in the outermost loop. This accesses the matrix data in memory order, making your cache more efficient.
I have two matrices A(2*1600*3) and B(2*1600). I am trying to do xcorr operation for each row in A against each row in B and want to store the results in a Matrix. At present I am using the following code.
for ii=1:3
for jj=1:2
X(ii,jj)=max((xcorr(A(jj,:,ii),B(jj,:))));
end
end
Since I am using two for loops, it is consuming more time and is affecting the execution time of my entire program which already had a for loop. How can I do this without the two for loops and store the output in a matrix ?
Meanwhile, I have also tried the above code with cellfun for a single column of the output matrix.
`cellfun(#(x) max(xcorr(x, B(1,:))), A, 'UniformOutput', false);`
In my observation, for loop is much faster than cellfun.
Execution times:
For loop: 2.4 secs for two columns of matrix output. Cellfun:2.6 secs for one column of matrix output.
You can do this easily using fft. Cross-correlation is very similar to convolution:
% Compute the size of the cross-correlation.
N = size(A,2) + size(B,2) - 1;
% Do correlation using FFT. We have to flip one of the inputs.
% If A and B are both symmetric, you might want to add the 'symmetric' flag to ifft
Y = ifft(fft(A,N,2) .* fft(flip(B,2),N,2), N, 2);
% Squeeze out the second dimension and transpose so it matches your size and shape.
Y = squeeze(max(Y, [], 2))'
Im am currently writing a code to implement a numerical approximation to the 3D steady state heat equation using finite difference matrix methods. This involves discritising the 2nd order PDE into the matrix A and solving Ax=b. where x is temperature at each of the specified grid points. Further information on this type of question can be found here:
http://people.nas.nasa.gov/~pulliam/Classes/New_notes/Matrix_ODE.pdf
To complete this problem, I have represented the 3D matrix A by a 2D array calling the values in the 1D array b using an indexing function of the form:
i+(j-1)*Nx+Nx*Ny*(k-1)
for the (i,j,k)th element of the 3D matrix where Nx, Ny, Nz are the number of points in the x,y,z coordinates. There ends up being a lot of loop computation in order to create the matrix A and b and I was wondering what is the most computationally efficient and less memory exhaustive way to run these loops, i.e. is it better to use something like
for j=1:Ny
for i=2:Nx-1
b(i+(j-1)*Nx)=D4;
end
end
for j=1:Ny
for i=2:Nx-1
b(i+(j-1)*Nx+Nx*Ny*(Nz-1))=D3;
end
end
or should I condense these into a single loop like:
for j=1:Ny
for i=2:Nx-1
b(i+(j-1)*Nx)=D4;
b(i+(j-1)*Nx+Nx*Ny*(Nz-1))=D3;
end
end
I have preallocated both the arrays A and b. Is there a vectorised way to do this also?
Assuming Nx, Ny, Nz, D3 and D4 to be scalars and that you are using pre-allocation for b with zeros, you may try this vectorized approach -
I = 2:Nx-1; %// Vectors to represent i
J = 1:Ny; %// Vectors to represent j
ind1 = bsxfun(#plus,I,[(J-1)*Nx]'); %//' Indices, 1st set of nested loops
ind2 = bsxfun(#plus,I,[(J-1)*Nx+Nx*Ny*(Nz-1)]'); %//' Indices, 2nd set of loops
b(ind1) = D4; %// Assign values for 1st set
b(ind2) = D3; %// Assign values for 2nd set
The second method should be slightly faster since it performs the same number of calculations with fewer increments of the loop variables. You can look into MATLAB's built-in stopwatch commands tic and toc to time your code. http://www.mathworks.com/help/matlab/ref/tic.html
Something more vectorized might be possible but I would need to know more about the format of the arrays that contain D3 and D4. The reshape() function might be able to help.