Mean value of each column of a matrix - matlab

I have a 64 X 64 matrix that I need to find the column-wise mean values for.
However, instead of dividing by the total number of elements in each column (i.e. 64), I need to divide by the total number of non-zeros in the matrix.
I managed to get it to work for a single column as shown below. For reference, the function that generates my matrix is titled fmu2(i,j).
q = 0;
for i = 1:64
if fmu2(i,1) ~= 0;
q = q + 1;
end
end
for i = 1:64
mv = (1/q).*sum(fmu2(i,1));
end
This works for generating the "mean" value of the first column. However, I'm having trouble looping this procedure so that I will get the mean for each column. I tried doing a nested for loop, but it just calculated the mean for the entire 64 X 64 matrix instead of one column at a time. Here's what I tried:
q = 0;
for i = 1:64
for j = 1:64
if fmu2(i,j) ~= 0;
q = q +1;
end
end
end
for i = 1:64
for j = 1:64
mv = (1/q).*sum(fmu2(i,j));
end
end
Like I said, this just gave me one value for the entire matrix instead of 64 individual "means" for each column. Any help would be appreciated.

For one thing, do not call the function that generates your matrix in each iteration of a loop. This is extremely inefficient and will cause major problems if your function is complex enough to have side effects. Store the return value in a variable once, and refer to that variable from then on.
Secondly, you do not need any loops here at all. The total number of nonzeros is given by the nnz function (short for number of non-zeros). The sum function accepts an optional dimension argument, so you can just tell it to sum along the columns instead of along the rows or the whole matrix.
m = fmu2(i,1)
averages = sum(m, 1) / nnz(m)
averages will be a 64-element array with an average for each column, since sum(m, 1) is a 64 element sum along each column and nnz(m) is a scalar.
One of the great things about MATLAB is that it provides vectorized implementations of just about everything. If you do it right, you should almost never have to use an explicit loop to do any mathematical operations at all.

If you want the column-wise mean of non-zero elements you can do the following
m = randi([0,5], 5, 5); % some data
avg = sum(m,1) ./ sum(m~=0,1);
This is a column-wise sum of values, divided by the column-wise number of elements not equal to 0. The result is a row vector where each element is the average of the corresponding column in m.
Note this is very flexible, you could use any condition in place of ~=0.

Related

Generating all ordered samples with replacement

I would like to generate an array which contains all ordered samples of length k taken from a set of n elements {a_1,...,a_n}, that is all the k-tuples (x_1,...,x_k) where each x_j can be any of the a_i (repetition of elements is allowed), and whose total number is n^k.
Is there a built-in function in Matlab to obtain it?
I have tried to write a code that iteratively uses the datasample function, but I couldn't get what desired so far.
An alternative way to get all the tuples is based on k-base integer representation.
If you take the k-base representation of all integers from 0 to n^k - 1, it gives you all possible set of k indexes, knowing that these indexes start at 0.
Now, implementing this idea is quite straightforward. You can use dec2base if k is lower than 10:
X = A(dec2base(0:(n^k-1), k)-'0'+1));
For k between 10 and 36, you can still use dec2base but you must take care of letters as there is a gap in ordinal codes between '9' and 'A':
X = A(dec2base(0:(n^k-1), k)-'0'+1));
X(X>=17) = X(X>=17)-7;
Above 36, you must use a custom made code for retrieving the representation of the integer, like this one. But IMO you may not need this as 2^36 is quite huge.
What you are looking for is ndgrid: it generates the grid elements in any dimension.
In the case k is fixed at the moment of coding, get all indexes of all elements a this way:
[X_1, ..., X_k] = ndgrid(1:n);
Then build the matrix X from vector A:
X = [A(X_1(:)), ..., A(X_k(:))];
If k is a parameter, my advice would be to look at the code of ndgrid and adapt it in a new function so that the output is a matrix of values instead of storing them in varargout.
What about this solution, I don't know if it's as fast as yours, but do you think is correct?
function Y = ordsampwithrep(X,K)
%ordsampwithrep Ordered samples with replacement
% Generates an array Y containing in its rows all ordered samples with
% replacement of length K with elements of vector X
X = X(:);
nX = length(X);
Y = zeros(nX^K,K);
Y(1,:) = datasample(X,K)';
k = 2;
while k < nX^K +1
temprow = datasample(X,K)';
%checknew = find (temprow == Y(1:k-1,:));
if not(ismember(temprow,Y(1:k-1,:),'rows'))
Y(k,:) = temprow;
k = k+1;
end
end
end

Optimize nested for loop for calculating xcorr of matrix rows

I have 2 nested loops which do the following:
Get two rows of a matrix
Check if indices meet a condition or not
If they do: calculate xcorr between the two rows and put it into new vector
Find the index of the maximum value of sub vector and replace element of LAG matrix with this value
I dont know how I can speed this code up by vectorizing or otherwise.
b=size(data,1);
F=size(data,2);
LAG= zeros(b,b);
for i=1:b
for j=1:b
if j>i
x=data(i,:);
y=data(j,:);
d=xcorr(x,y);
d=d(:,F:(2*F)-1);
[M,I] = max(d);
LAG(i,j)=I-1;
d=xcorr(y,x);
d=d(:,F:(2*F)-1);
[M,I] = max(d);
LAG(j,i)=I-1;
end
end
end
First, a note on floating point precision...
You mention in a comment that your data contains the integers 0, 1, and 2. You would therefore expect a cross-correlation to give integer results. However, since the calculation is being done in double-precision, there appears to be some floating-point error introduced. This error can cause the results to be ever so slightly larger or smaller than integer values.
Since your calculations involve looking for the location of the maxima, then you could get slightly different results if there are repeated maximal integer values with added precision errors. For example, let's say you expect the value 10 to be the maximum and appear in indices 2 and 4 of a vector d. You might calculate d one way and get d(2) = 10 and d(4) = 10.00000000000001, with some added precision error. The maximum would therefore be located in index 4. If you use a different method to calculate d, you might get d(2) = 10 and d(4) = 9.99999999999999, with the error going in the opposite direction, causing the maximum to be located in index 2.
The solution? Round your cross-correlation data first:
d = round(xcorr(x, y));
This will eliminate the floating-point errors and give you the integer results you expect.
Now, on to the actual solutions...
Solution 1: Non-loop option
You can pass a matrix to xcorr and it will perform the cross-correlation for every pairwise combination of columns. Using this, you can forego your loops altogether like so:
d = round(xcorr(data.'));
[~, I] = max(d(F:(2*F)-1,:), [], 1);
LAG = reshape(I-1, b, b).';
Solution 2: Improved loop option
There are limits to how large data can be for the above solution, since it will produce large intermediate and output variables that can exceed the maximum array size available. In such a case for loops may be unavoidable, but you can improve upon the for-loop solution above. Specifically, you can compute the cross-correlation once for a pair (x, y), then just flip the result for the pair (y, x):
% Loop over rows:
for row = 1:b
% Loop over upper matrix triangle:
for col = (row+1):b
% Cross-correlation for upper triangle:
d = round(xcorr(data(row, :), data(col, :)));
[~, I] = max(d(:, F:(2*F)-1));
LAG(row, col) = I-1;
% Cross-correlation for lower triangle:
d = fliplr(d);
[~, I] = max(d(:, F:(2*F)-1));
LAG(col, row) = I-1;
end
end

MATLAB: For loop structure [duplicate]

This question already has answers here:
Element-wise array replication in Matlab
(7 answers)
Closed 6 years ago.
This is a basic program but since I'm new to MATLAB, I'm not able to figure out the solution.
I have a column vector "Time" in which I want to print value "1" in first 147 cells, followed by "2" in 148 to 2*147 cells and so on. For that, I have written the following script:
Trial>> c=1;
Trial>> k=0;
Trial>> for i = c:146+c
Time(i,1)=1+k;
c=i;
k=k+1;
end
I know I need to iterate the loop over "Time(i,1)=1+k;" before it executes the next statement. I tried using break but that's not supposed to work. Can anyone suggest me the solution to get the desired results?(It was quite simple in C with just the use of curly braces.)
I am sure you don't want to run c=i; in every iteration.
My code should work for you:
x = 10; % Replace 10 by the max number you need in your array.
k = 1;
for i = 1 : x * 147
Time(i, 1) = k;
if rem(i, 147) == 0
k = k + 1;
end
end
This is the prime example of a piece of code that should be vectorized can help you understand vectorization. Your code can be written like this:
n = 147;
reps = 10; %% Replace this by the maximum number you want your matrix to have
Time = reshape(bsxfun(#plus, zeros(n,1), 0:reps), 1, []);
Explanation:
Let A be a column vector (1 column, n rows), and B be a row vector (1 row, m columns.
What bsxfun(#plus, A, B) will do here is to add all elements in A with all elements in B, like this:
A(1)+B(1) A(1)+B(2) A(1)+B(3) ... A(1)+B(m)
A(2)+B(1) A(2)+B(2) ............. A(2)+B(m)
............................................
A(n)+B(1) A(n)+B(2) .............. A(n)+B(m)
Now, for the two vectors we have: zeros(n,1), and 0:reps, this will give us;
0+0 0+1 0+2 0+reps
0+0 0+1 0+2 0+reps
% n rows of this
So, what we need to do now is place each column underneath each other, so that you will have the column with zeros first, then the row with ones, ... and finally the one with reps (147 in your case).
This can be achieved by reshaping the matrix:
reshape(bsxfun(#plus, zeros(n,1), 0:reps), [], 1);
^ ^ ^ ^
| | | Number of rows in the new matrix. When [] is used, the appropriate value will be chosen by Matlab
| | Number of rows in the new matrix
| matrix to reshape
reshape command
Another approach is using kron:
kron(ones(reps+1, 1) * 0:(n-1)
For the record, a review of your code:
You should always preallocate memory for matrices that are created inside loops. In this case you know it will become a matrix of dimensions ((reps+1)*n-by-1). This means you should do Time = zeros((reps+1)*n, 1);. This will speed up your code a lot.
You shouldn't use i and j as variable names in Matlab, as they denote the imaginary unit (sqrt(-1)). You can for instance do: for ii = 1:(n*147) instead.
You don't want c=i inside the loop, when the loop is supposed to go from c to c + 146. That doesn't make much sense.
You can use repmat,
x = 10; % Sequence length (or what ever it can be called)
M = repmat(1:x,147,1); % Replicate array 1:x for 147 columns
M = M(:); % Reshape the matrix so that is becomes a column vector.
I can assume that this is a task to practice for loops, but this will work.
An alternative solution may be to do
n = 147;
reps = 10;
a = ceil( (1:(n*reps)) / n);
You first construct an array with the length you want. Then you divide, and round of upwards. 1 to 147 will then become 1.

How to vectorize this Matlab loop

I need some help to vectorize the following operation since I'm a little confused.
So, I have a m-by-2 matrix A and n-by-1 vector b. I want to create a n-by-1 vector c whose entries should be the values of the second column of A whose line is given by the line where the correspondent value of b would fall...
Not sure if I was clear enough. Anyway, the code below does compute c correctly so you can understand what is my desired output. However, I want to vectorize this function since my real n and m are in the order of many thousands.
Note that values of bare non-integer and not necessarily equal to any of those in the first column of A (these ones could be non-integers too!).
m = 5; n = 10;
A = [(0:m-1)*1.1;rand(1,m)]'
b = (m-1)*rand(n,1)
[bincounts, ind] = histc(b,A(:,1))
for i = 1:n
c(i) = A(ind(i),2);
end
All you need is:
c = A(ind,2);

MATLAB: Indexing a large matrix for Monte Carlo Simulation

I'm trying to index a large matrix in MATLAB that contains numbers monotonically increasing across rows, and across columns, i.e. if the matrix is called A, for every (i,j), A(i+1,j) > A(i,j) and A(i,j+1) > A(i,j).
I need to create a random number n and compare it with the values of the matrix A, to see where that random number should be placed in the matrix A. In other words, the value of n may not equal any of the contents of the matrix, but it may lie in between any two rows and any two columns, and that determines a "bin" that identifies its position in A. Once I find this position, I increment the corresponding index in a new matrix of the same size as A.
The problem is that I want to do this 1,000,000 times. I need to create a random number a million times and do the index-checking for each of these numbers. It's a Monte Carlo Simulation of a million photons coming from a point landing on a screen; the matrix A consists of angles in spherical coordinates, and the random number is the solid angle of each incident photon.
My code so far goes something like this (I haven't copy-pasted it here because the details aren't important):
for k = 1:1000000
n = rand(1,1)*pi;
for i = length(A(:,1))
for j = length(A(1,:))
if (n > A(i-1,j)) && (n < A(i+1,j)) && (n > A(i,j-1)) && (n < A(i,j+1))
new_img(i,j) = new_img(i,j) + 1; % new_img defined previously as zeros
end
end
end
end
The "if" statement is just checking to find the indices of A that form the bounds of n.
This works perfectly fine, but it takes ridiculously long, especially since my matrix A is an image of dimensions 11856 x 11000. is there a quicker / cleverer / easier way of doing this?
Thanks in advance.
You can get rid of the inner loops by performing the calculation on all elements of A at once. Also, you can create the random numbers all at once, instead of one at a time. Note that the outermost pixels of new_img can never be different from zero.
randomNumbers = rand(1,1000000)*pi;
new_img = zeros(size(A));
tmp_img = zeros(size(A)-2);
for r = randomNumbers
tmp_img = tmp_img + A(:,1:end-2)<r & A(:,3:end)>r & A(1:end-1,:)<r & A(3:end,:)>r;
end
new_img(2:end-1,2:end-1) = tmp_img;
/aside: If the arrays were smaller, I'd have used bsxfun for the comparison, but with the array sizes in the OP, the approach would run out of memory.
Are the values in A bin edges? Ie does A specify a grid? If this is the case then you can QUICKLY populate A using hist3.
Here is an example:
numRand = 1e
n = randi(100,1e6,1);
nMatrix = [floor(data./10), mod(data,10)];
edges = {0:1:9, 0:10:99};
A = hist3(dataMat, edges);
If your A doesn't specify a grid, then you should create all of your random values once and sort them. Then iterate through those values.
Because you know that n(i) >= n(i-1) you don't have to check bins that were too small for n(i-1). This is a very easy way to optimize away most redundant checks.
Here is a snippet that should help a lot in the inner loop, it finds the location of the greatest point that is smaller than your value.
idx1 = A<value
idx2 = A(idx1) == max(A(idx1))
if you want to find the exact location you can wrap it with a find.