Unable to write to matrix lines with parfor - matlab

How can I write into my result matrix lines using parfor?
Code sample:
xCount = 10;
yCount = 20;
area = xCount*yCount;
PP = nan(area,3);
parfor x = 1:10
for y = 1:20
id = y + (x-1)*yCount; % global PP line id.
z = x^2+y*10; % my stuff to get Z.
PP(id,:) = [x y z]; % write to PP line
end
end
The PARFOR loop cannot run due to the way variable 'PP' is used.

It actually says "Valid indices are restricted within PARFOR loops". The reason is that MATLAB runs the iterations of a parfor loop in no guaranteed order: it may execute them as 5 2 4 1 3 rather than 1 2 3 4 5. To collect results from the workers without conflicts, MATLAB has to be able to tell, before entering the parallel environment, that no two iterations write to the same lines of PP. It can only prove that when PP is indexed as a sliced variable: one subscript is the loop variable itself (optionally plus or minus a constant) and the other subscripts are constants, colons or broadcast values. Here the row index id = y + (x-1)*yCount is a computed expression, so the check fails.
The solution is to structure PP so that it is known beforehand which part of it each iteration writes to, e.g. by preallocating a 2D array whose rows are indexed directly by the loop variable x:
xCount = 10;
yCount = 20;
area = xCount*yCount;
PP(xCount,yCount) = 0;
y=1:yCount;
parfor x = 1:xCount
z = x^2+y.*10; % my stuff to get Z.
PP(x,:) = z; % write to PP line
end
%// Go to the [x y z] format (rows in the question's order: y runs fastest)
PPt = PP.';
PP = [kron((1:xCount).', ones(yCount,1)), repmat((1:yCount).', xCount, 1), PPt(:)];
I'd personally not do the last line in this case, since it stores three doubles for each useful value (z), whilst in the 2D matrix that comes out of the loop it only stores 1 double which can be indexed by simply reading PP(x,y). Thus it costs you 3 times the memory to store the same amount of useful data.
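If you do need the [x y z] list in the question's row order (id = y + (x-1)*yCount), one option is to let each iteration fill its own yCount-by-3 slab of a 3-D array indexed only by the loop variable, which fits the sliced-variable pattern, and to flatten the slabs after the loop. A minimal sketch, assuming the same toy formula for z; the names yVals and blocks are illustrative, not from the original:

```matlab
xCount = 10;
yCount = 20;
yVals = (1:yCount).';              % broadcast variable, same on every worker
blocks = nan(yCount, 3, xCount);   % one yCount-by-3 slab per value of x
parfor x = 1:xCount
    z = x^2 + yVals*10;            % same formula as in the question
    % indices are (:, :, x): only the loop variable, so blocks is sliced
    blocks(:,:,x) = [repmat(x, yCount, 1), yVals, z];
end
% permute to 3-by-yCount-by-xCount, flatten to 3-by-N, transpose to N-by-3:
% row y + (x-1)*yCount then holds [x y z]
PP = reshape(permute(blocks, [2 1 3]), 3, []).';
```

The 3-D array still costs the 3x memory mentioned above, so the plain PP(x,y) layout remains the leaner choice when you only need z.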


Are there any special rules for nesting an if-statement in a for-loop in MATLAB?

I am trying to create a signal and then build a discrete-time signal by sampling the CT signal I created first. Until the last for-loop, things work fine, but I need to take N samples separated by T. Without an if statement I get an index out-of-bounds error, so I had to limit sampling to the duration of the signal. For some reason, my code goes into the if statement once and no more. For debugging, I am printing out the values both inside and outside the if. Although the logical condition should be true for more than one iteration (the print statements show the values), the statements inside the if-statement are not printed. What's wrong here?
function x = myA2D(b,w,p,T,N)
%MYA2D description: Takes in parameters to construct the CT-sampled DT signal
%b,w,p are Mx1 vectors and it returns Nx1 vector.
timeSpace = 0:0.001:3*pi;
xConstT = zeros(size(timeSpace));
%Construct Xc(t) signal
for k = 1:size(b,1)
temp = b(k) .* cos(w(k).*timeSpace + p(k));
xConstT = xConstT + temp;
end
plot(xConstT);
%Sampling CT-Signal to build DT-signal
disp(strcat('xConstT size',int2str(size(xConstT))));
x = zeros(N,1);
sizeConstT = size(xConstT);
for i = 0:N-1
index = i .* T .* 1000 + 1;
disp(strcat('indexoo=',int2str(index)));
disp(strcat('xConstSizeeee',int2str(sizeConstT)));
if index <= sizeConstT
disp(strcat('idx=',int2str(index)));
disp(strcat('xSize',int2str(sizeConstT)));
%x(i+1,1) = xConstT(index);
end
end
end
sizeConstT = size(xConstT); creates a 1x2 array, so you are comparing a scalar to an array. The comparison is done element-wise, and an if statement with a non-scalar condition only runs its body when every element of the result is true. This example illustrates the issue:
if 1 <= [1 12]; disp('one'); end % <- prints 'one'
if 2 <= [1 12]; disp('two'); end % <- prints nothing
Your code will work with sizeConstT = length(xConstT);
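A self-contained sketch of both points: what size gives you inside an if, and a sampling loop bounded by numel instead. It assumes a single cosine in place of the question's sum of cosines and illustrative values T = 0.5 and N = 20 (chosen so the last sample falls past the end of the signal). The round() call is an extra safeguard added here, since i*T*1000 is not always an exact integer in floating point.

```matlab
% Same time grid as the question; one cosine stands in for the sum of cosines
timeSpace = 0:0.001:3*pi;
xConstT = cos(timeSpace);

sz = size(xConstT);          % 1-by-2 array [rows cols], not an element count
% 2 <= [1 cols] is [false true]; an if needs ALL elements true,
% so this takes the else branch
if 2 <= sz
    disp('inside if');
else
    disp('skipped: comparison against size() is not all true');
end

% Compare against a scalar element count instead
nSamples = numel(xConstT);
T = 0.5;                     % sample spacing in seconds (illustrative)
N = 20;                      % number of samples requested (illustrative)
x = zeros(N,1);
for i = 0:N-1
    index = round(i*T*1000) + 1;   % round() guards against floating-point drift
    if index <= nSamples           % samples past the end of the signal stay 0
        x(i+1) = xConstT(index);
    end
end
```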

how to vectorize array reformatting?

I have a .csv file with data on each line in the format (x,y,z,t,f), where f is the value of some function at location (x,y,z) at time t. So each new line in the .csv gives a new set of coordinates (x,y,z,t), with accompanying value f. The .csv is not sorted.
I want to use imagesc to create a video of this data in the xy-plane, as time progresses. The way I've done this is by reformatting M into something more easily usable by imagesc. I'm doing three nested loops, roughly like this
M = csvread('file.csv');
uniqueX = unique(M(:,1));
uniqueY = unique(M(:,2));
uniqueT = unique(M(:,4));
M_reformatted = zeros(length(uniqueX), length(uniqueY), length(uniqueT));
for i = 1:length(uniqueX)
for j = 1:length(uniqueY)
for k = 1:length(uniqueT)
M_reformatted(i,j,k) = M( ...
M(:,1)==uniqueX(i) & ...
M(:,2)==uniqueY(j) & ...
M(:,4)==uniqueT(k), ...
5 ...
);
end
end
end
Once I have M_reformatted, I can loop through timesteps k and use imagesc on M_reformatted(:,:,k). But the nested loops above are very slow. Is it possible to vectorize them? If so, an outline of the approach would be very helpful.
Edit: as noted in the answers/comments below, I made a mistake: there are several possible z-values, which I haven't taken into account. With only a single z-value, the above would be OK.
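This vectorized solution allows negative values of x and y and is many times faster than the non-vectorized solution (close to 20 times faster for the test case at the bottom).
The idea is to sort the x, y and t values in lexicographical order using sortrows and then use reshape to build the time slices of M_reformatted. This relies on the z == 0 data forming a complete grid: every (x, y, t) combination must appear exactly once, otherwise the reshape fails or scrambles the values.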
The code:
idx = find(M(:,3)==0); %// find rows where z==0
M2 = M(idx,:); %// M2 has only the rows where z==0
M2(:,3) = []; %// delete z coordinate in M2
M2(:,[1 2 3]) = M2(:,[3 1 2]); %// change from (x,y,t,f) to (t,x,y,f)
M2 = sortrows(M2); %// sort rows by t, then x, then y
numT = numel(unique(M2(:,1))); %// number of unique t values
numX = numel(unique(M2(:,2))); %// number of unique x values
numY = numel(unique(M2(:,3))); %// number of unique y values
%// fill the time slice matrix with data
M_reformatted = reshape(M2(:,4), numY, numX, numT);
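Note: with this reshape, the first (row) index of M_reformatted runs over y and the second (column) index over x, so imagesc draws x horizontally and y vertically. The loop in the question uses the opposite layout (x on the rows); to match it, use M_reformatted = permute(M_reformatted,[2 1 3]) at the end of the code.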
The test case I used for M (to compare the result to other solutions) has a NxNxN space with T times slices:
N = 10;
T = 10;
[x,y,z] = meshgrid(-N:N,-N:N,-N:N);
numPoints = numel(x);
x=x(:); y=y(:); z=z(:);
s = repmat([x,y,z],T,1);
t = repmat(1:T,numPoints,1);
M = [s, t(:), rand(numPoints*T,1)];
M = M( randperm(size(M,1)), : );
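If the data does not form a complete grid, a different vectorized technique (swapped in here, not the answer's method) avoids sorting altogether: the third output of unique maps each coordinate to a 1-based index, and sub2ind turns those into linear indices for a single assignment. A sketch on a small hypothetical M; missing combinations stay NaN, and if a combination appears twice, one value silently overwrites the other.

```matlab
% Hypothetical data, one row per point: (x, y, z, t, f).
% Rows are unsorted, x is negative in places, several (x,y,t) combinations
% are missing, and one row lies in the z = 1 plane.
M = [ -1  2  0  1  10;
       0  2  0  1  11;
      -1  5  0  1  12;
       0  5  0  2  13;
      -1  2  1  1  99];
M2 = M(M(:,3) == 0, :);              % keep the z == 0 plane only

% Third output of unique: position of each value in the sorted list of
% distinct values, i.e. a valid 1-based index along that dimension
[ux, ~, ix] = unique(M2(:,1));
[uy, ~, iy] = unique(M2(:,2));
[ut, ~, it] = unique(M2(:,4));

sz = [numel(ux), numel(uy), numel(ut)];
M_reformatted = nan(sz);             % combinations with no data stay NaN
M_reformatted(sub2ind(sz, ix, iy, it)) = M2(:,5);   % (x, y, t) layout
```

The actual coordinate values for each index are kept in ux, uy and ut, which is handy for axis labels in imagesc.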
I don't think you need to vectorize. I think you should change your algorithm.
You only need one loop to step through the lines of the CSV file. For every line, you have (x,y,z,t,f) so just store it in M_reformatted where it belongs. Something like this:
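% assumes x, y and t are positive integers, so they can be used directly as indices
M_reformatted = zeros(max(M(:,1)), max(M(:,2)), max(M(:,4)));
for line = 1:size(M,1)   % one iteration per row (data point) of M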
z = M(line, 3);
if z ~= 0, continue; end;
x = M(line, 1);
y = M(line, 2);
t = M(line, 4);
f = M(line, 5);
M_reformatted(x, y, t) = f;
end
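Also note that pre-allocating M_reformatted is a very good idea, but your code may have been getting the size wrong (depending on the data). Using max like this sizes the array correctly as long as x, y and t are positive integers; negative, zero or non-integer coordinates would first have to be mapped to indices.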

Fixed Point Iteration

I am new to MATLAB and I have to use fixed point iteration to find the x value for the intersection between y = x and y = sqrt(10/(x+4)), which after graphing it, looks to be around 1.4. I'm using an initial guess of x1 = 0. This is my current MATLAB code:
f = @(x)sqrt(10./(x+4));
x1 = 0;
xArray(10) = [];
for i = 1:10
x2 = f(x1);
xArray(i) = x2;
x1 = x1 + 1;
end
plot(xArray);
fprintf('%15.8e\n',xArray);
Now when I run this it seems like my x is approaching 0.8. Can anyone tell me what I am doing wrong?
Well done. You've made a decent start at this.
Let's look at the graphical solution. BTW, this is how I'd have done the graphical part:
ezplot(@(x) x,[-1 3])
hold on
ezplot(@(x) sqrt(10./(x+4)),[-1 3])
grid on
Or, I might subtract the two functions, then look for a zero of the difference, i.e. where it crosses the x axis.
This is what the fixed point iteration does anyway, trying to solve for x, such that
x = sqrt(10/(x+4))
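Why this iteration converges at all can be checked with the contraction-mapping conditions: g must map an interval into itself and satisfy |g'| < 1 there. A short worked check, with the interval [1, 2] chosen for illustration (starting from 0, the first iterate g(0) = 1.581 already lies inside it):

```latex
g(x) = \sqrt{\frac{10}{x+4}}, \qquad
g'(x) = -\frac{\sqrt{10}}{2}\,(x+4)^{-3/2}

g([1,2]) = \left[\sqrt{10/6},\ \sqrt{10/5}\right] \approx [1.291,\ 1.414] \subset [1,2]

\max_{x \in [1,2]} |g'(x)| = |g'(1)| = \frac{\sqrt{10}}{2 \cdot 5^{3/2}} = \frac{\sqrt{2}}{10} \approx 0.141 < 1

g'(x^\ast) \approx -0.127 \quad \text{at } x^\ast \approx 1.36523
```

Because g' is negative and small at the fixed point, the error shrinks by roughly a factor of 8 per step and alternates in sign, which is consistent with the printed iterates further down swinging above and below 1.36523.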
So how would I change your code to fix it? First of all, I'd want to use more descriptive names for the variables. You don't get charged by the character, and making your code easier to read & follow will pay off greatly in the future for you.
There were a couple of code issues. To initialize a vector, use a form like one of these:
xArray = zeros(1,10);
xArray(1,10) = 0;
Note that if xArray was ALREADY defined because you have been working on this problem, the latter form will only zero out that single element. So the first form is best by a large margin. It affirmatively creates an array, or overwrites an existing array if it is already present in your workspace.
Finally, I like to initialize an array like this with something special, rather than zero, so we can see when an element was overwritten. NaNs are good for this.
Next, there was no need to add one to x1 in your code. Again, I'd strongly suggest using better variable names. It is also a good idea to use comments. Be liberal.
I'd suggest the idea of a convergence tolerance. You can also have an iteration counter.
f = @(x)sqrt(10./(x+4));
% starting value
xcurrent = 0;
% count the iterations, setting a maximum in maxiter, here 25
iter = 0;
maxiter = 25;
% initialize the array to store our iterations
xArray = NaN(1,maxiter);
% convergence tolerance
xtol = 1e-8;
% before we start, the error is set to be BIG. this
% just lets our while loop get through that first iteration
xerr = inf;
% the while will stop if either criterion fails
while (iter < maxiter) && (xerr > xtol)
iter = iter + 1;
xnew = f(xcurrent);
% save each iteration
xArray(iter) = xnew;
% compute the difference between successive iterations
xerr = abs(xnew - xcurrent);
xcurrent = xnew;
end
% retain only the elements of xArray that we actually generated
xArray = xArray(1:iter);
plot(xArray);
fprintf('%15.8e\n',xArray);
What was the result?
1.58113883e+00
1.33856229e+00
1.36863563e+00
1.36479692e+00
1.36528512e+00
1.36522300e+00
1.36523091e+00
1.36522990e+00
1.36523003e+00
1.36523001e+00
1.36523001e+00
For a little more accuracy to see how well we did...
format long g
xcurrent
xcurrent =
1.36523001364783
f(xcurrent)
ans =
1.36523001338436
By the way, it is a good idea to know why the loop terminated. Did it stop for insufficient iterations?
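One way to answer that in code is to check, after the loop, which condition of the while failed. A sketch reusing the loop above (same f, starting value, maxiter and xtol), with the array bookkeeping dropped:

```matlab
f = @(x) sqrt(10./(x+4));
xcurrent = 0;                % starting value
iter = 0;  maxiter = 25;     % iteration counter and cap
xtol = 1e-8;                 % convergence tolerance
xerr = inf;                  % forces the first pass through the loop
while (iter < maxiter) && (xerr > xtol)
    iter = iter + 1;
    xnew = f(xcurrent);
    xerr = abs(xnew - xcurrent);   % size of the last step
    xcurrent = xnew;
end
% Report which criterion ended the loop
if xerr <= xtol
    fprintf('Converged: step %.2e <= xtol after %d iterations\n', xerr, iter);
else
    fprintf('Hit maxiter = %d; last step %.2e is still > xtol\n', maxiter, xerr);
end
```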
The point of my response here was NOT to do your homework, since you were close to getting it right anyway. The point is to show some considerations on how you might improve your code for future work.
There is no need to add 1 to x1. The output of each iteration is the input for the next one, so x2 = f(x1) should become the new x1. The corrected code would be:
for i = 1:10
x2 = f(x1);
xArray(i) = x2;
x1 = x2;
end
f(x) = x^3+4*x^2-10 in [1,2]: find an approximate root.
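This is the same problem in another form: x = sqrt(10/(x+4)) is equivalent (for x > 0) to x^2(x+4) = 10, i.e. x^3 + 4x^2 - 10 = 0, so the fixed point 1.36523001 found above is the root in [1,2]. A sketch that checks this against MATLAB's polynomial root finder roots, starting the iteration from 1.5 (an arbitrary point in the interval):

```matlab
f = @(x) x.^3 + 4*x.^2 - 10;   % the polynomial from the comment
g = @(x) sqrt(10 ./ (x + 4));  % the same equation solved for x
x = 1.5;                       % arbitrary start inside [1, 2]
for k = 1:30
    x = g(x);                  % fixed-point step
end
% Cross-check: roots takes the coefficients [1 4 0 -10] of x^3 + 4x^2 + 0x - 10
r = roots([1 4 0 -10]);
rReal = real(r(abs(imag(r)) < 1e-12));   % the cubic has one real root
fprintf('fixed point %.10f, f(x) = %.2e, real root %.10f\n', x, f(x), rReal);
```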

How can I speed up this call to quantile in Matlab?

I have a MATLAB routine with one rather obvious bottleneck. I've profiled the function, with the result that 2/3 of the computing time is used in the function levels:
The function levels takes a matrix of floats and splits each column into nLevels buckets, returning a matrix of the same size as the input, with each entry replaced by the number of the bucket it falls into.
To do this I use the quantile function to get the bucket limits, and a loop to assign the entries to buckets. Here's my implementation:
function [Y q] = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
if isvector(q)
q=transpose(q);
end
Y = zeros(size(X));
for i = 1:nLevels
% "The variables g and l indicate the entries that are respectively greater than
% or less than the relevant bucket limits. The line Y(g & l) = i is assigning the
% value i to any element that falls in this bucket."
if i ~= nLevels % "The default; doesn't include upper bound"
g = bsxfun(@ge,X,q(i,:));
l = bsxfun(@lt,X,q(i+1,:));
else % "For the final level we include the upper bound"
g = bsxfun(@ge,X,q(i,:));
l = bsxfun(@le,X,q(i+1,:));
end
Y(g & l) = i;
end
Is there anything I can do to speed this up? Can the code be vectorized?
If I understand correctly, you want to know how many items fell in each bucket.
Use:
n = hist(Y,nbins)
Though I am not sure that it will help in the speedup. It is just cleaner this way.
Edit: following the comment:
You can use the second output parameter of histc:
[n,bin] = histc(...) also returns an index matrix bin. If x is a vector, n(k) = sum(bin==k). bin is zero for out of range values. If x is an M-by-N matrix, then
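How about this (written for a vector X; for a matrix, quantile returns one column of limits per column of X, so you would compare against q(i,:) with bsxfun instead of q(i)):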
function [Y q] = levels(X,nLevels)
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
Y = zeros(size(X));
for i = 1:numel(q)-1
Y = Y + (X >= q(i));   % parentheses needed: + binds tighter than >=
end
This results in the following:
>>X = [3 1 4 6 7 2];
>>[Y, q] = levels(X,2)
Y =
1 1 2 2 2 1
q =
1 3.5 7
You could also modify the logic line to ensure values are less than the start of the next bin. However, I don't think it is necessary.
I think you should use histc:
[~,Y] = histc(X,q)
As you can see in matlab's doc:
Description
n = histc(x,edges) counts the number of values in vector x that fall
between the elements in the edges vector (which must contain
monotonically nondecreasing values). n is a length(edges) vector
containing these counts. No elements of x can be complex.
I made a couple of refinements (including one inspired by Aero Engy in another answer) that have resulted in some improvements. To test them out, I created a random matrix of a million rows and 100 columns to run the improved functions on:
>> x = randn(1000000,100);
First, I ran my unmodified code, with the following results:
Note that of the 40 seconds, around 14 are spent computing the quantiles. I can't expect to improve this part of the routine (I assume that MathWorks have already optimized it, though I guess that to assume makes an...)
Next, I modified the routine to the following, which should be faster and has the advantage of being fewer lines as well!
function [Y q] = levels(X,nLevels)
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
if isvector(q), q = transpose(q); end
Y = ones(size(X));
for i = 2:nLevels
Y = Y + bsxfun(@ge,X,q(i,:));
end
The profiling results with this code are:
So it is 15 seconds faster, which represents a 150% speedup of the portion of code that is mine, rather than MathWorks.
Finally, following a suggestion of Andrey (again in another answer) I modified the code to use the second output of the histc function, which assigns entries to bins. It doesn't treat the columns independently, so I had to loop over the columns manually, but it seems to be performing really well. Here's the code:
function [Y q] = levels(X,nLevels)
p = linspace(0,1,nLevels+1);
q = quantile(X,p);
if isvector(q), q = transpose(q); end
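q(end,:) = Inf;   % top edge at +Inf so each column's maximum lands in bucket nLevels (2*q(end,:) fails when the maximum is <= 0)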
Y = zeros(size(X));
for k = 1:size(X,2)
[junk Y(:,k)] = histc(X(:,k),q(:,k));
end
And the profiling results:
We now spend only 4.3 seconds in code outside the quantile function, which is around a 500% speedup over what I wrote originally. I've spent a bit of time writing this answer because I think it's turned into a nice example of how you can use the MATLAB profiler and Stack Exchange in combination to get much better performance from your code.
I'm happy with this result, although of course I'll continue to be pleased to hear other answers. At this stage the main performance increase will come from increasing the performance of the part of the code that currently calls quantile. I can't see how to do this immediately, but maybe someone else here can. Thanks again!
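A small check that the bsxfun counting loop and the per-column histc version agree, and of why the top edge matters. It is a sketch under assumptions: the bucket limits q are typed in by hand (standing in for quantile(X,p), so the example does not depend on it), and the second column is all negative, the case where doubling the top edge pushes the maximum out of range.

```matlab
% 6 rows, 2 columns; the second column is entirely negative
X = [3 -1; 1 -5; 4 -2; 6 -8; 7 -3; 2 -4];
nLevels = 2;
% Hand-typed per-column limits (rows = edges), standing in for quantile(X,p)
q = [1 -8; 3.5 -3.5; 7 -1];

% Version 1: start at bucket 1 and add one per interior limit reached
Y1 = ones(size(X));
for i = 2:nLevels
    Y1 = Y1 + bsxfun(@ge, X, q(i,:));
end

% Version 2: histc per column with the top edge moved to +Inf
qInf = q;
qInf(end,:) = Inf;
Y2 = zeros(size(X));
for k = 1:size(X,2)
    [~, Y2(:,k)] = histc(X(:,k), qInf(:,k));
end

% For comparison: the 2*q(end,:) trick on the negative column.
% 2*(-1) = -2, so -1 falls outside (bin 0) and -2 matches the last edge (bin 3)
[~, bad] = histc(X(:,2), [q(1:end-1,2); 2*q(end,2)]);
disp([Y1 Y2 [zeros(size(bad)) bad]])
```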
You can sort each column, then scale the inverse indices (the ranks) to the range (0, nLevels] and round up:
function Y = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
[S,IX]=sort(X);
[grid1,grid2]=ndgrid(1:size(IX,1),1:size(IX,2));
invIX=zeros(size(X));
invIX(sub2ind(size(X),IX(:),grid2(:)))=grid1;
Y=ceil(invIX/size(X,1)*nLevels);
Or you can use tiedrank:
function Y = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
R=tiedrank(X);
Y=ceil(R/size(X,1)*nLevels);
Surprisingly, both these solutions are slightly slower than the quantile+histc solution.

Mutual Information of MATLAB Matrix

I have a square matrix that represents the frequency counts of co-occurrences in a data set. In other words, the rows represent all possible observations of feature 1, and the columns are the possible observations of feature 2. The number in cell (x, y) is the number of times feature 1 was observed to be x at the same time feature 2 was y.
I want to calculate the mutual information contained in this matrix. MATLAB has a built-in information function, but it takes 2 arguments, one for x and one for y. How would I manipulate this matrix to get the arguments it expects?
Alternatively, I wrote my own mutual information function that takes a matrix, but I'm unsure about its accuracy. Does it look right?
function [mutualinfo] = mutualInformation(counts)
total = sum(counts(:));
pX = sum(counts, 1) ./ total;
pY = sum(counts) ./ total;
pXY = counts ./ total;
[h, w] = size(counts);
mutualinfo = 0;
for row = 1:h
for col = 1:w
mutualinfo = mutualinfo + pXY(row, col) * log(pXY(row, col) / (pX(row)*pY(col)));
end;
end;
end
I don't know of any built-in mutual information functions in MATLAB. Perhaps you got a hold of one of the submissions from the MathWorks File Exchange or some other third-party developer code?
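There is something wrong with how you are computing pX and pY: sum(counts,1) and sum(counts) are the same thing (both sum down the columns), so pX and pY come out identical. The row marginal needs sum(counts,2). Also, any zero cell in counts makes the term pXY*log(...) evaluate to 0*log(0) = NaN, which poisons the whole sum; by convention such terms count as 0. Plus, you can vectorize your operations instead of using for loops. Here's another version of your function to try out: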
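function mutualInfo = mutualInformation(counts)
pXY = counts./sum(counts(:));   % joint distribution
pX = sum(pXY,2);                % row marginal (feature 1), column vector
pY = sum(pXY,1);                % column marginal (feature 2), row vector
pXpY = pX*pY;                   % outer product: pX(i)*pY(j) for every cell
nz = pXY > 0;                   % skip empty cells: 0*log(0) is taken as 0
mutualInfo = sum(pXY(nz).*log(pXY(nz)./pXpY(nz)));  % in nats; use log2 for bits
end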
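A quick way to sanity-check any mutual information code is the identity I(X;Y) = H(X) + H(Y) - H(X,Y). A sketch on a hypothetical 3-by-3 count table (with zero cells, to exercise the 0*log(0) convention), computing the direct sum and the entropy form side by side:

```matlab
counts = [10 2 0; 3 8 1; 0 1 5];     % hypothetical co-occurrence counts
pXY = counts ./ sum(counts(:));      % joint distribution
pX = sum(pXY, 2);                    % row marginal (feature 1)
pY = sum(pXY, 1);                    % column marginal (feature 2)

% Entropy in nats, skipping zero-probability entries (0*log 0 = 0)
H = @(p) -sum(p(p > 0) .* log(p(p > 0)));
miEntropy = H(pX) + H(pY) - H(pXY);

% Direct definition, as in the function above
outer = pX * pY;
nz = pXY > 0;
miDirect = sum(pXY(nz) .* log(pXY(nz) ./ outer(nz)));
fprintf('direct %.10f, via entropies %.10f\n', miDirect, miEntropy);
```

For a table whose rows are proportional (e.g. [2 4; 3 6]) both forms should give 0, since the two features are then independent.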