I have the following MATLAB code in a project of mine. image_working at this point is a logical image, the result of edge detection. The loop below expands each white point into what is essentially a cross with arm length width (so that a later call to imfill() will find more closed regions). The four if statements check that each point is within the original bounds.
[edge_row, edge_col] = find(image_working);
for width = 1:width_edge_widen
for i = 1:length(edge_row)
if (edge_row(i) + width <= m)
image_working(edge_row(i) + width, edge_col(i)) = 1;
end
if (edge_row(i) - width >= 1)
image_working(edge_row(i) - width, edge_col(i)) = 1;
end
if (edge_col(i) + width <= n)
image_working(edge_row(i), edge_col(i) + width) = 1;
end
if (edge_col(i) - width >= 1)
image_working(edge_row(i), edge_col(i) - width) = 1;
end
end
end
I suspect there's a good way to vectorize it and avoid the top-level loop, but I'm at a loss as to how to go about it. Simply indexing (like image_working(edge_row, edge_col)) doesn't work, since this will give a rectangular region rather than the individual points. Linear indexing (calling inds = find(image_working)) is undesirable because it's difficult to do both vertical and horizontal shifts, although there may well be a vectorizable transform on the indices that I haven't thought of. Any advice?
First, a "vectorized" solution will not necessarily be the fastest here; this depends on how sparse your binary image is. Second, here are a few solutions that will be faster than your code:
First create a random binary image
im0=rand(2000)>0.999; % this means sparsity (density) ~ 1e-3
Solution 1 - for loop (for a cross of width 1, but you can change it as needed):
im=im0;
sd=size(im);
width=1;
[x, y]=find(im((1+width):sd(1)-width, (1+width):sd(2)-width)); % only points at least width pixels from the border
x=x+width; y=y+width;
for n=1:numel(y)
im(x(n)-width:x(n)+width,y(n))=1;
im(x(n),y(n)-width:y(n)+width)=1;
end
Solution 2 - vectorized and one line (for a cross of width 1):
im=conv2(single(im0),[0 1 0; 1 1 1; 0 1 0],'same')>0;
Solution 3 - vectorized logical indexing (for a cross of width 1)
im =( im0(2:end-1,2:end-1) | im0(1:end-2,2:end-1) |...
im0(2:end-1,1:end-2) | im0(3:end ,2:end-1) |...
im0(2:end-1,3:end));
im =[zeros(1,size(im,2)); im; zeros(1,size(im,2))];
im= [zeros(size(im0,1),1) im zeros(size(im,1),1)];
You'll see that for sparse images the for loop will be faster than the other methods.
Solution 1: Elapsed time is 0.028668 seconds.
Solution 2: Elapsed time is 0.041758 seconds.
Solution 3: Elapsed time is 0.120594 seconds.
For less sparse images (say a density of ~1%) you can use the vectorized solution (Solution 2), as the for loop will very quickly become less efficient, but I would check performance on your data before deciding.
Bonus edit, for fun I vectorized the cross filter in solution 2 to have arbitrary width as follows:
f=@(width) circshift(vander([1 zeros(1,2*width)]),[width -width]);
so im=conv2(single(im0),f(1),'same')>0; is equivalent to what is written in Solution 2, but now you can use any f(width_size) you want.
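For readers who find the vander/circshift construction hard to read, the same cross-shaped kernel can also be built by direct indexing. This is only a sketch: the names w and k and the small random test image are assumptions made for illustration, not part of the answer above.

```matlab
w = 2;                          % assumed arm length of the cross
k = zeros(2*w + 1);             % (2w+1)-by-(2w+1) kernel, all zeros
k(w + 1, :) = 1;                % horizontal arm
k(:, w + 1) = 1;                % vertical arm

im0 = rand(200) > 0.99;         % small random binary test image
im  = conv2(double(im0), k, 'same') > 0;   % widen every white pixel into a cross
```

With w = 1 this should give the same 3-by-3 kernel as the one written out in Solution 2.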
I have a scatter plot of approximately 30,000 pts, all of which lie above a horizontal line which I've visually defined in my plot. My goal now is to sum the vertical distance of all of these points to this horizontal line.
The data was read in from a .csv file and is already saved to the workspace, but I also need to check whether a value is NaN, and ignore these.
This is where I'm at right now:
vert_deviation = 0;
idx = 1;
while idx <= numel(my_data(:,5)) && isnan(idx) == 0
vert_deviation = vert_deviation + ((my_data(idx,5) - horiz_line_y_val));
idx = idx + 1;
end
I believe the && operator requires two logical conditions, but I'm not sure how to rewrite this loop accordingly at the moment. I also don't understand why vert_deviation currently returns NaN, but I assume this might have to do with the first mistake I described...
I would really appreciate some guidance here - thank you in advance!
EDIT: The 'horizontal line' is a slight oversimplification - in reality the lower limit I need to find the distance to consists of 6 different line segments
I should have specified that the lower limit to which I need to calculate the distance for all scatterplot points varies for different x values (the horizontal line snippet was meant to be a simplification but may have been misleading... apologies for that)
I first modified the data I had already read into the workspace by replacing all NaN values with 0. Next, I wrote a while loop which defines the number of indexes to loop through, and defined an && condition to filter out any zeroes. I then wrote a nested if statement which checks what range of x values the given index falls into, and subsequently takes the delta between the y value of the linear lower limit for that section of the plot and the given point. I repeated this for all points.
while idx <= numel(my_data(:,3)) && not(my_data(idx,3) == 0)
...
if my_data(idx,3) < upper_x_lim && my_data(idx,5) > lower_x_lim
vert_deviation = vert_deviation + (my_data(idx,4) - (m6 * (my_data(idx,5)) + b6))
end
...
m6 and b6 in this case are the slope and y intercept calculated for one section of the plot. The if loop is repeated six times for each section of the lower limit.
I'm sure there are more elegant ways to do this, so I'm open to any feedback if there's room for improvement!
Your loop doesn't exclude NaN values because isnan(idx) == 0 checks whether the index is NaN, rather than checking whether the data point is NaN. Instead, check isnan(my_data(idx,5)).
Also, you can simplify your code using for instead of while:
vert_deviation = 0;
for idx=1:size(my_data,1)
if ~isnan(my_data(idx,5)) % skip NaN entries
vert_deviation = vert_deviation + ((my_data(idx,5) - horiz_line_y_val));
end
end
As @Adriaan suggested, you can remove the loop altogether, but it seems that the code in the OP is an oversimplification of the problem. Looking at the additional code posted, I guess it is still possible to remove the loops, but I'm not certain it will be a significant speed improvement. Just use a loop.
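For reference, a loop-free version of the simple horizontal-line case could look like the sketch below, followed by one possible way to handle a piecewise-linear lower limit with interp1. The example data, the knot vectors x_knots and y_knots, and the vectors x and y are all made up for illustration; they are assumptions and not taken from the question.

```matlab
% --- Horizontal line case ---
my_data = [zeros(4,4), [3; NaN; 5; 4]];   % made-up data, y values in column 5
horiz_line_y_val = 2;
y5 = my_data(:,5);
valid = ~isnan(y5);                        % logical mask of usable rows
vert_deviation = sum(y5(valid) - horiz_line_y_val);   % (3-2)+(5-2)+(4-2) = 6

% --- Piecewise-linear lower limit (assumed breakpoints) ---
x_knots = [0 1 2];                         % hypothetical segment end points
y_knots = [0 1 1];                         % hypothetical lower-limit values there
x = [0.5; 1.5; NaN];                       % made-up x coordinates
y = [2; 3; 4];                             % made-up y coordinates
ok = ~isnan(x) & ~isnan(y);                % drop rows with a NaN in either column
lower_limit = interp1(x_knots, y_knots, x(ok));       % lower limit at each x
vert_deviation_pw = sum(y(ok) - lower_limit);
```

Whether this is noticeably faster than the loop for about 30,000 points would need to be measured on the real data.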
For some simulations, I need to make use of an approximation of the exponential function. Now, the problem that I have is that:
function s=expone(N,k)
s=0
for j=1:k
s=s+(exp(-N+j*log(N)-log(factorial(j))));
end
end
is pretty stable, in the sense that it is almost 1 for k large enough. However, as soon as N is bigger than 200, it quickly drops to zero. How can I improve that? I need large N. I cannot really change the mathematical way of writing this, since I have an additional perturbation; my final code will look something like:
function s=expone(N,k)
s=0
for j=1:k
s=s+(exp(-N+j*log(N)-log(factorial(j))))*pertubation(N,k);
end
end
The perturbation is between 0 and 1, so that's no problem, but the prefactor seems not to work for N>200. Can anyone help?
Thanks a lot!
The function log(x) - x has positive and negative part
Graphic in Wolframalpha
while x - log(x!) is negative for x>= 0
Graphic in Wolframalpha
So the problem arises when (N - log(N)) is much greater than (j - log(j)), because exp of a large negative number tends to zero. The solution is to choose j much bigger than N.
For example expone(20,1) = 7.1907e-05 but expone(20,20) = 0.5591 and expone(20,50) = 1.000
In conclusion, if you want to work with large N, j should be larger still. As an extra tip, you may want to change your function to avoid for loops:
function s = expone(N,k)
j = 1:k;
s = sum ((exp(-N+j*log(N)-log(factorial(j)))));
end
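A further point that may matter for N > 200: in double precision factorial(j) overflows to Inf for j > 170, so log(factorial(j)) becomes Inf and every term with j > 170 evaluates to exp(-Inf) = 0. Those are exactly the terms that carry most of the weight when N is large. A possible workaround, sketched below, is to replace log(factorial(j)) with gammaln(j+1), which computes the logarithm directly without forming the factorial. The name expone_stable is hypothetical, and the perturbation factor from the question is left out.

```matlab
% Same sum as expone, but log(j!) is computed as gammaln(j+1)
expone_stable = @(N, k) sum(exp(-N + (1:k)*log(N) - gammaln((1:k) + 1)));

overflowed = isinf(factorial(171));    % true: 171! exceeds realmax
s_large = expone_stable(300, 600);     % should be close to 1 when k is well above N
```

The exponent still has the same mathematical form, so a perturbation factor could be multiplied in term by term as in the question.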
I have written code in which I have to check whether the position (x,y) (saved in the matrix Mat) is inside a circular object centered at (posx,posy). If so, the point gets a value val; otherwise it is zero.
My code looks like this, but it is generally advised NOT to use loops in MATLAB. Since I use not one but two loops, I was wondering if there is a more efficient way of solving my problem.
Mat = zeros(300); %creates my coordinate system with zeros
...
for i =lowlimitx:highlimitx %variable boundary of my object
for j=lowlimity:highlimity
helpsqrdstnc = abs(posx-i)^2 + abs(posy-j)^2; %square distance from center
if helpsqrdstnc < radius^2
Mat(i,j)= val(helpsqrdstnc);
end
end
end
The usual way to optimize MATLAB code is to vectorize the operations. This is because built-in functions and operators are in general much faster. For your case this would leave you with this code:
Mat = zeros(300); %creates my coordinate system with zeros
...
xSq = abs(posx-(lowlimitx:highlimitx)).^2;
ySq = abs(posy-(lowlimity:highlimity)).^2;
helpsqrdstnc = bsxfun(@plus,xSq.',ySq); %bsxfun to do [xSq(1)+ySq(1),xSq(1)+ySq(2),...; xSq(2)+ySq(1),xSq(2)+ySq(2)...; ...], so rows follow x as in Mat(i,j)
Mat(helpsqrdstnc < radius^2)= val(helpsqrdstnc(helpsqrdstnc < radius^2));
where helpsqrdstnc must be the same size as Mat. It may also be necessary to do a reshape here, but you will notice that yourself if you get a column vector.
This does of course assume that radius, posx and posy are constant, but reading the question this seems to be the case. However, I do not know exactly what val looks like, so I have not managed to test the code. I also think that val(helpsqrdstnc) is problematic, since the argument is a distance, which does not necessarily need to be an integer.
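If the limits do not span the whole of Mat, one option is to work on the sub-block Mat(xs, ys) and write it back afterwards. The sketch below assumes concrete values for posx, posy and radius, and treats val as a function handle of the squared distance; all of these are placeholders, since the question does not show them.

```matlab
Mat = zeros(300);                        % coordinate system, as in the question
posx = 150; posy = 120; radius = 20;     % assumed centre and radius
val = @(d2) exp(-d2 / radius^2);         % hypothetical value function of squared distance

% Index ranges of the bounding box around the circle, clipped to the matrix
xs = max(1, posx - radius) : min(size(Mat, 1), posx + radius);
ys = max(1, posy - radius) : min(size(Mat, 2), posy + radius);

% Squared distance of every (x, y) pair in the box; rows follow x, columns follow y
d2 = bsxfun(@plus, ((xs - posx).^2).', (ys - posy).^2);
inside = d2 < radius^2;                  % logical mask of points inside the circle

block = Mat(xs, ys);                     % work on the sub-block only
block(inside) = val(d2(inside));
Mat(xs, ys) = block;                     % write the result back
```

If val is really a lookup table indexed by integer squared distance, as val(helpsqrdstnc) in the question suggests, the same mask can be used to index into it instead.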
I've been getting into Matlab more and more lately and another question came up during my latest project.
I generate several rectangles (or meshes) within an overall boundary.
These meshes can have varying spacings/intervals.
I do so because I want to decrease the mesh/pixel resolution of certain areas of a digital elevation model. So far, everything works fine.
But because the rectangles can be chosen in a GUI, it might happen that the rectangles overlap. This overlap is what I want to find and remove. If they had the same spacing, the rectangles would look something like this:
[t1x, t1y] = meshgrid(1:1:9,1:1:9);
[t2x, t2y] = meshgrid(7:1:15,7:1:15);
[t3x, t3y] = meshgrid(5:1:17,7:1:24);
In this case, I could just use unique, to find the overlapping areas.
However, they look more like this:
[t1x, t1y] = meshgrid(1:2:9,1:2:9);
[t2x, t2y] = meshgrid(7:3:15,7:3:15);
[t3x, t3y] = meshgrid(5:4:17,7:4:24);
Therefore, unique cannot be applied, because mesh 1 might very well overlap with mesh 2 without having the same nodes. For convenience and further processing, all rectangles / meshes are brought into column notation and put in one result matrix within my code:
result = [[t1x(:), t1y(:)]; [t2x(:), t2y(:)]; [t3x(:), t3y(:)]];
Now I was thinking about using 2 nested for-loops to solve this problem, something like this (which does not quite work yet):
res = zeros(length(result),1);
for i=1:length(result)
currX = result(i,1);
currY = result(i,2);
for j=1:length(result)
if result(j,1)< currX < result(j+1,1) && result(j,2)< currY < result(j+1,2)
res(j) = 1;
end
end
end
BUT: First of all, this does not quite work yet, because I get an out-of-bounds error when j+1 exceeds length(result), and moreover, res(j) = 1 seems to get overwritten by the loop.
But this was just for testing and demonstration anyway.
Because the meshes shown here are just examples, and the ones I use are fairly big, the result matrix contains up to 2000x2000 = 4 million nodes --> length(result) ~ 4 million.
Putting this into a nested for-loop running over the entire length will most likely kill my memory.
Therefore I was hoping to find a sophisticated solution which does not require a nested loop, but takes advantage of MATLAB's find and clever matrix indexing.
I am not able to think of something, but was hoping to get help here.
Discussion and help are very much appreciated!
Cheers,
Theo
Here follows a quick stab (not extensively tested):
% Example meshes
[t1x, t1y] = meshgrid(1:2:9,1:2:9);
[t2x, t2y] = meshgrid(7:3:15,7:3:15);
% Group points for convenience
A = [t1x(:), t1y(:)];
B = [t2x(:), t2y(:)];
% Find which points of A lie within the edges of B (and vice versa)
idxA = A(:,1) >= B(1,1) & A(:,1) <= B(end,1) & A(:,2) >= B(1,2) & A(:,2) <= B(end,2);
idxB = B(:,1) >= A(1,1) & B(:,1) <= A(end,1) & B(:,2) >= A(1,2) & B(:,2) <= A(end,2);
% Plot result of identified points
plot(A(:,1),A(:,2), '*r')
hold on
plot(B(:,1),B(:,2), '*b')
plot([A(idxA,1); B(idxB,1)], [A(idxA,2); B(idxB,2)], 'sk')
The points identified as overlapping are marked with black squares in the resulting plot.
Also, related to your question is this Puzzler: overlapping rectangles by Doug Hull of TMW.
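The comparison above relies on the first and last rows of A and B holding the corner points, which is true for meshgrid output flattened with (:). A variant that does not depend on the point order takes the bounding box from min and max instead. This is a sketch only; the helper name inBox is made up for illustration.

```matlab
% Example meshes, as in the answer above
[t1x, t1y] = meshgrid(1:2:9, 1:2:9);
[t2x, t2y] = meshgrid(7:3:15, 7:3:15);
A = [t1x(:), t1y(:)];
B = [t2x(:), t2y(:)];

% Hypothetical helper: true for every point of P inside the bounding box of Q
inBox = @(P, Q) P(:,1) >= min(Q(:,1)) & P(:,1) <= max(Q(:,1)) & ...
                P(:,2) >= min(Q(:,2)) & P(:,2) <= max(Q(:,2));

idxA = inBox(A, B);          % points of A inside the box spanned by B
idxB = inBox(B, A);          % points of B inside the box spanned by A

% One possible way to remove the overlap: keep B and drop the points of A inside it
A_trimmed = A(~idxA, :);
```

Which mesh keeps its points in the overlap region is a design decision (for example, preferring the finer spacing); the sketch simply drops the points of A.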
Suppose I have a vector J of jump sizes and an initial starting point X_0. Also I have boundaries 0, B (assume 0 < X_0 < B). I want to do a random walk where X_i = [min(X_{i-1} + J_i,B)]^+. (positive part). Basically if it goes over a boundary, it is made equal to the boundary. Anyone know a vectorized way to do this? The current way I am doing it consists of doing cumsums and then finding places where it violates a condition, and then starting from there and repeating the cumsum calculation, etc until I find that I stop violating the boundaries. It works when the boundaries are rarely hit, but if they are hit all the time, it basically becomes a for loop.
In the code below, I am doing this across many samples. To 'fix' the ones that go out of the boundary, I have to loop through the samples to check...(don't think there is a vectorized 'find')
% X_init is a row vector describing initial resource values to use for
% each sample
% J is matrix where each col is a sequence of Jumps (columns = sample #)
% In this code the jumps are subtracted, but same thing
X_intvl = repmat(X_init,NumJumps,1) - cumsum(J);
X = [X_init; X_intvl];
for sample = 1:NumSamples
k = find(or(X_intvl(:,sample) > B, X_intvl(:,sample) < 0),1);
while(~isempty(k))
change = X_intvl(k-1,sample) - X_intvl(k,sample);
X_intvl(k:end,sample) = X_intvl(k:end,sample)+change;
k = find(or(X_intvl(:,sample) > B, X_intvl(:,sample) < 0),1);
end
end
Interesting question (+1).
I faced a similar problem a while back, although slightly more complex as my lower and upper bound depended on t. I never did work out a fully-vectorized solution. In the end, the fastest solution I found was a single loop which incorporates the constraints at each step. Adapting the code to your situation yields the following:
%# Set the parameters
LB = 0; %# Lower bound
UB = 5; %# Upper bound
T = 100; %# Number of observations
N = 3; %# Number of samples
X0 = (1/2) * (LB + UB); %# Arbitrary start point halfway between LB and UB
%# Generate the jumps
Jump = randn(N, T-1);
%# Build the constrained random walk
X = X0 * ones(N, T);
for t = 2:T
X(:, t) = max(min(X(:, t-1) + Jump(:, t-1), UB), LB); %# Clamp each step to [LB, UB]
end
X = X';
I would be interested in hearing if this method proves faster than what you are currently doing. I suspect it will be for cases where the constraint is binding in more than one or two places. I can't test it myself as the code you provided is not a "working" example, i.e. I can't just copy and paste it into MATLAB and run it, as it depends on several variables for which example (or simulated) values are not provided. I tried adapting it myself, but couldn't get it to work properly.
UPDATE: I just switched the code around so that observations are indexed on columns and samples are indexed on rows, and then I transpose X in the last step. This will make the routine more efficient, since Matlab allocates memory for numeric arrays column-wise - hence it is faster when performing operations down the columns of an array (as opposed to across the rows). Note, you will only notice the speed-up for large N.
FINAL THOUGHT: These days, the JIT accelerator is very good at making single loops in Matlab efficient (double loops are still pretty slow). Therefore personally I'm of the opinion that every time you try and obtain a fully-vectorized solution in Matlab, ie no loops, you should weigh up whether the effort involved in finding a clever solution is worth the slight gains in efficiency to be made over an easier-to-obtain method that utilizes a single loop. And it is important to remember that fully-vectorized solutions are sometimes slower than solutions involving single loops when T and N are small!
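For comparison with the question's layout (jumps subtracted, one sample per column of J, X_init as a row vector), the same single-loop idea can be written as below. The parameter values are arbitrary and chosen only so that the snippet runs on its own.

```matlab
B = 5;                                   % upper boundary (lower boundary is 0)
NumJumps = 100;                          % assumed number of jumps per sample
NumSamples = 3;                          % assumed number of samples
X_init = 2.5 * ones(1, NumSamples);      % arbitrary start values, one per sample
J = randn(NumJumps, NumSamples);         % each column is one sequence of jumps

X = zeros(NumJumps + 1, NumSamples);     % preallocate the whole path
X(1, :) = X_init;
for i = 1:NumJumps
    % Subtract the jump, then clamp the result to [0, B] for all samples at once
    X(i + 1, :) = max(min(X(i, :) - J(i, :), B), 0);
end
```

Here time already runs down the columns, so no transpose is needed at the end.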
I'd like to propose another vectorized solution.
So, first we should set the parameters and generate random jumps. I used the same set of parameters as Colin T Bowers:
% Set the parameters
LB = 0; % Lower bound
UB = 20; % Upper bound
T = 1000; % Number of observations
N = 3; % Number of samples
X0 = (1/2) * (UB + LB); % Arbitrary start point halfway between LB and UB
% Generate the jumps
Jump = randn(N, T-1);
But I changed generation code:
% Generate initial data without bounds
X = cumsum(Jump, 2);
% Apply bounds
Amplitude = UB - LB;
nsteps = ceil( max(abs(X(:))) / Amplitude - 0.5 );
for ii = 1:nsteps
ind = abs(X) > (1/2) * Amplitude;
X(ind) = Amplitude * sign(X(ind)) - X(ind);
end
% Shifting X
X = X0 + X;
So, instead of a for loop over every time step, I'm using the cumsum function with some post-processing.
N.B. This solution works significantly slower than Colin T Bowers's one for tight bounds (Amplitude < 5), but for loose bounds (Amplitude > 20) it works much faster.