Summing values of function whose input is in a certain interval, MATLAB - matlab

The heading might be slightly confusing, but what I want to do is the following:
I have function inputs x,t, outputs y (i.e y = f(x,t)), and a set of ranges xr, tr and I want to do
v = zeros(1,length(xr)-1)
for kk=1:(length(xr)-1)
ix = x >= xr(kk) & x < xr(kk+1) & t >= tr(kk) & t < tr(kk+1)
v(kk) = sum(y(ix));
end
This is very slow, while histc, which does almost the same (except it sums the number of entries in the interval instead of the function output) is very fast. How can this be implemented faster? I tried using arrayfun, but this only gave a 25% increas in speed.
Thanks,

If you use histc with two output arguments, the second output will give you the bin numbers for each data entry. You can use the bin numbers to sum up the entries belonging to each bin, for instance, using bsxfun or accumarray.
[val, id] = histc(x, xr);
v = accumarray(id(:), y(:));

Related

Generating all ordered samples with replacement

I would like to generate an array which contains all ordered samples of length k taken from a set of n elements {a_1,...,a_n}, that is all the k-tuples (x_1,...,x_k) where each x_j can be any of the a_i (repetition of elements is allowed), and whose total number is n^k.
Is there a built-in function in Matlab to obtain it?
I have tried to write a code that iteratively uses the datasample function, but I couldn't get what desired so far.
An alternative way to get all the tuples is based on k-base integer representation.
If you take the k-base representation of all integers from 0 to n^k - 1, it gives you all possible set of k indexes, knowing that these indexes start at 0.
Now, implementing this idea is quite straightforward. You can use dec2base if k is lower than 10:
X = A(dec2base(0:(n^k-1), k)-'0'+1));
For k between 10 and 36, you can still use dec2base but you must take care of letters as there is a gap in ordinal codes between '9' and 'A':
X = A(dec2base(0:(n^k-1), k)-'0'+1));
X(X>=17) = X(X>=17)-7;
Above 36, you must use a custom made code for retrieving the representation of the integer, like this one. But IMO you may not need this as 2^36 is quite huge.
What you are looking for is ndgrid: it generates the grid elements in any dimension.
In the case k is fixed at the moment of coding, get all indexes of all elements a this way:
[X_1, ..., X_k] = ndgrid(1:n);
Then build the matrix X from vector A:
X = [A(X_1(:)), ..., A(X_k(:))];
If k is a parameter, my advice would be to look at the code of ndgrid and adapt it in a new function so that the output is a matrix of values instead of storing them in varargout.
What about this solution, I don't know if it's as fast as yours, but do you think is correct?
function Y = ordsampwithrep(X,K)
%ordsampwithrep Ordered samples with replacement
% Generates an array Y containing in its rows all ordered samples with
% replacement of length K with elements of vector X
X = X(:);
nX = length(X);
Y = zeros(nX^K,K);
Y(1,:) = datasample(X,K)';
k = 2;
while k < nX^K +1
temprow = datasample(X,K)';
%checknew = find (temprow == Y(1:k-1,:));
if not(ismember(temprow,Y(1:k-1,:),'rows'))
Y(k,:) = temprow;
k = k+1;
end
end
end

Count values from range in Matlab

I want to count the number of values in the array. I have a code which works:
Range = [1:10^3];% [1:10^6];
N = 10^2;% 10^8
Data = randi([Range(1),Range(end)],N,1);
Counts = nan(numel(Range),1);
for iRange = 1:numel(Range)
Counts(iRange) = sum(Data==Range(iRange));
end
Could you help me to make this code faster?
I feel that it should be via unique or hist, but I could not find a solution.
N = histcounts(Data,Range)
gives me 999 numbers instead of 1000.
As Ander Biguri stated at a comment, histcounts is what you seek.
The function counts the number of values of X (Data in your example), are found at every bin between two edges, where bins defined as such:
The value X(i) is in the kth bin if edges(k) ≤ X(i) < edges(k+1).
While the last bin also includes the right edges.
This means:
For N values, you need N+1 edges.
Each bin should start at the value you want it to include (1 between 1:2, 2 between 2:3, etc).
In your example:
Counts = histcounts(Data,Range(1):(Range(end)+1))';
I wanted to point out an issue with this code:
Counts = nan(numel(Range),1);
for iRange = 1:numel(Range)
Counts(iRange) = sum(Data==Range(iRange));
end
It shows a single loop, but == and sum work over all elements in the array, making this really expensive compared to a loop that doesn't do so, especially if N is large:
Counts = zeros(numel(Range),1);
for elem = Data(:).'
Counts(elem) = Counts(elem) + 1;
end

Mean value of each column of a matrix

I have a 64 X 64 matrix that I need to find the column-wise mean values for.
However, instead of dividing by the total number of elements in each column (i.e. 64), I need to divide by the total number of non-zeros in the matrix.
I managed to get it to work for a single column as shown below. For reference, the function that generates my matrix is titled fmu2(i,j).
q = 0;
for i = 1:64
if fmu2(i,1) ~= 0;
q = q + 1;
end
end
for i = 1:64
mv = (1/q).*sum(fmu2(i,1));
end
This works for generating the "mean" value of the first column. However, I'm having trouble looping this procedure so that I will get the mean for each column. I tried doing a nested for loop, but it just calculated the mean for the entire 64 X 64 matrix instead of one column at a time. Here's what I tried:
q = 0;
for i = 1:64
for j = 1:64
if fmu2(i,j) ~= 0;
q = q +1;
end
end
end
for i = 1:64
for j = 1:64
mv = (1/q).*sum(fmu2(i,j));
end
end
Like I said, this just gave me one value for the entire matrix instead of 64 individual "means" for each column. Any help would be appreciated.
For one thing, do not call the function that generates your matrix in each iteration of a loop. This is extremely inefficient and will cause major problems if your function is complex enough to have side effects. Store the return value in a variable once, and refer to that variable from then on.
Secondly, you do not need any loops here at all. The total number of nonzeros is given by the nnz function (short for number of non-zeros). The sum function accepts an optional dimension argument, so you can just tell it to sum along the columns instead of along the rows or the whole matrix.
m = fmu2(i,1)
averages = sum(m, 1) / nnz(m)
averages will be a 64-element array with an average for each column, since sum(m, 1) is a 64 element sum along each column and nnz(m) is a scalar.
One of the great things about MATLAB is that it provides vectorized implementations of just about everything. If you do it right, you should almost never have to use an explicit loop to do any mathematical operations at all.
If you want the column-wise mean of non-zero elements you can do the following
m = randi([0,5], 5, 5); % some data
avg = sum(m,1) ./ sum(m~=0,1);
This is a column-wise sum of values, divided by the column-wise number of elements not equal to 0. The result is a row vector where each element is the average of the corresponding column in m.
Note this is very flexible, you could use any condition in place of ~=0.

Optimize nested for loop for calculating xcorr of matrix rows

I have 2 nested loops which do the following:
Get two rows of a matrix
Check if indices meet a condition or not
If they do: calculate xcorr between the two rows and put it into new vector
Find the index of the maximum value of sub vector and replace element of LAG matrix with this value
I dont know how I can speed this code up by vectorizing or otherwise.
b=size(data,1);
F=size(data,2);
LAG= zeros(b,b);
for i=1:b
for j=1:b
if j>i
x=data(i,:);
y=data(j,:);
d=xcorr(x,y);
d=d(:,F:(2*F)-1);
[M,I] = max(d);
LAG(i,j)=I-1;
d=xcorr(y,x);
d=d(:,F:(2*F)-1);
[M,I] = max(d);
LAG(j,i)=I-1;
end
end
end
First, a note on floating point precision...
You mention in a comment that your data contains the integers 0, 1, and 2. You would therefore expect a cross-correlation to give integer results. However, since the calculation is being done in double-precision, there appears to be some floating-point error introduced. This error can cause the results to be ever so slightly larger or smaller than integer values.
Since your calculations involve looking for the location of the maxima, then you could get slightly different results if there are repeated maximal integer values with added precision errors. For example, let's say you expect the value 10 to be the maximum and appear in indices 2 and 4 of a vector d. You might calculate d one way and get d(2) = 10 and d(4) = 10.00000000000001, with some added precision error. The maximum would therefore be located in index 4. If you use a different method to calculate d, you might get d(2) = 10 and d(4) = 9.99999999999999, with the error going in the opposite direction, causing the maximum to be located in index 2.
The solution? Round your cross-correlation data first:
d = round(xcorr(x, y));
This will eliminate the floating-point errors and give you the integer results you expect.
Now, on to the actual solutions...
Solution 1: Non-loop option
You can pass a matrix to xcorr and it will perform the cross-correlation for every pairwise combination of columns. Using this, you can forego your loops altogether like so:
d = round(xcorr(data.'));
[~, I] = max(d(F:(2*F)-1,:), [], 1);
LAG = reshape(I-1, b, b).';
Solution 2: Improved loop option
There are limits to how large data can be for the above solution, since it will produce large intermediate and output variables that can exceed the maximum array size available. In such a case for loops may be unavoidable, but you can improve upon the for-loop solution above. Specifically, you can compute the cross-correlation once for a pair (x, y), then just flip the result for the pair (y, x):
% Loop over rows:
for row = 1:b
% Loop over upper matrix triangle:
for col = (row+1):b
% Cross-correlation for upper triangle:
d = round(xcorr(data(row, :), data(col, :)));
[~, I] = max(d(:, F:(2*F)-1));
LAG(row, col) = I-1;
% Cross-correlation for lower triangle:
d = fliplr(d);
[~, I] = max(d(:, F:(2*F)-1));
LAG(col, row) = I-1;
end
end

MATLAB: Indexing a large matrix for Monte Carlo Simulation

I'm trying to index a large matrix in MATLAB that contains numbers monotonically increasing across rows, and across columns, i.e. if the matrix is called A, for every (i,j), A(i+1,j) > A(i,j) and A(i,j+1) > A(i,j).
I need to create a random number n and compare it with the values of the matrix A, to see where that random number should be placed in the matrix A. In other words, the value of n may not equal any of the contents of the matrix, but it may lie in between any two rows and any two columns, and that determines a "bin" that identifies its position in A. Once I find this position, I increment the corresponding index in a new matrix of the same size as A.
The problem is that I want to do this 1,000,000 times. I need to create a random number a million times and do the index-checking for each of these numbers. It's a Monte Carlo Simulation of a million photons coming from a point landing on a screen; the matrix A consists of angles in spherical coordinates, and the random number is the solid angle of each incident photon.
My code so far goes something like this (I haven't copy-pasted it here because the details aren't important):
for k = 1:1000000
n = rand(1,1)*pi;
for i = length(A(:,1))
for j = length(A(1,:))
if (n > A(i-1,j)) && (n < A(i+1,j)) && (n > A(i,j-1)) && (n < A(i,j+1))
new_img(i,j) = new_img(i,j) + 1; % new_img defined previously as zeros
end
end
end
end
The "if" statement is just checking to find the indices of A that form the bounds of n.
This works perfectly fine, but it takes ridiculously long, especially since my matrix A is an image of dimensions 11856 x 11000. is there a quicker / cleverer / easier way of doing this?
Thanks in advance.
You can get rid of the inner loops by performing the calculation on all elements of A at once. Also, you can create the random numbers all at once, instead of one at a time. Note that the outermost pixels of new_img can never be different from zero.
randomNumbers = rand(1,1000000)*pi;
new_img = zeros(size(A));
tmp_img = zeros(size(A)-2);
for r = randomNumbers
tmp_img = tmp_img + A(:,1:end-2)<r & A(:,3:end)>r & A(1:end-1,:)<r & A(3:end,:)>r;
end
new_img(2:end-1,2:end-1) = tmp_img;
/aside: If the arrays were smaller, I'd have used bsxfun for the comparison, but with the array sizes in the OP, the approach would run out of memory.
Are the values in A bin edges? Ie does A specify a grid? If this is the case then you can QUICKLY populate A using hist3.
Here is an example:
numRand = 1e
n = randi(100,1e6,1);
nMatrix = [floor(data./10), mod(data,10)];
edges = {0:1:9, 0:10:99};
A = hist3(dataMat, edges);
If your A doesn't specify a grid, then you should create all of your random values once and sort them. Then iterate through those values.
Because you know that n(i) >= n(i-1) you don't have to check bins that were too small for n(i-1). This is a very easy way to optimize away most redundant checks.
Here is a snippet that should help a lot in the inner loop, it finds the location of the greatest point that is smaller than your value.
idx1 = A<value
idx2 = A(idx1) == max(A(idx1))
if you want to find the exact location you can wrap it with a find.