I have to implement the steepest descent method and test it on functions of two variables, using Matlab. Here's what I did so far:
x_0 = [0;1.5]; %Initial guess
alpha = 1.5; %Step size
iteration_max = 10000;
tolerance = 10e-10;
% Two anonymous functions to compute the 1st and 2nd entries of the gradient
f = @(x,y) (cos(y) * exp(-(x-pi)^2 - (y-pi)^2) * (sin(x) - 2*cos(x)*(pi-x)));
g = @(x,y) (cos(x) * exp(-(x-pi)^2 - (y-pi)^2) * (sin(y) - 2*cos(y)*(pi-y)));
%Initialization
iter = 0;
grad = [1; 1]; %Gradient
while (norm(grad,2) >= tolerance)
    grad(1,1) = f(x_0(1), x_0(2));
    grad(2,1) = g(x_0(1), x_0(2));
    x_new = x_0 - alpha * grad; % New solution
    x_0 = x_new;                % Update old solution
    iter = iter + 1;
    if iter > iteration_max
        break
    end
end
The problem is that my results do not match those from, for example, WolframAlpha. For this particular function I should obtain either (3.14, 3.14) or (1.3, 1.3), but I obtain (0.03, 1.4).
You should know that this method is a local search, so it can get stuck in a local minimum depending on the initial guess and the step size.
With a different initial guess, it will find a different local minimum.
The step size is important: a large step size can prevent the algorithm from converging, while a small one makes the algorithm really slow. This is why you should adapt the step size as the function value decreases, for example with a backtracking line search as sketched below.
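A minimal backtracking sketch (my addition, not part of the original answer); here F is an assumed objective whose gradient appears to match the f and g defined in the question:
F = @(x,y) -cos(x).*cos(y).*exp(-(x-pi).^2 - (y-pi).^2); % assumed objective
alpha = 1.5;                         % start with a large trial step
while F(x_0(1)-alpha*grad(1), x_0(2)-alpha*grad(2)) > F(x_0(1), x_0(2)) ...
        && alpha > 1e-12
    alpha = alpha/2;                 % halve the step until F decreases
end
x_new = x_0 - alpha*grad;            % step is now guaranteed non-increasing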
It is always a good idea to understand the function you want to optimize by plotting it (if possible). The function you are working with looks like this (on the range [-pi, pi]):
With the following parameter values you will get to the local minimum you are looking for:
x_0 = [2;2]; %Initial guess
alpha = 0.5; %Step size
I want to generate a random series x with length N through the following rule related to non-central chi-square distribution:
x_{n+1} ~ χ²_ν(λ·x_n)
where ν is a given constant representing the degrees of freedom, λ is also pre-specified, the product λ·x_n is the non-centrality parameter, and x_1 is given.
I wrote the following code to generate such a sequence and time the run with x_1 = 0.04, ν = 0.005, λ = 100, and N = 1e5:
tic;
N = 1e5;
x = zeros(1,N);
x(1) = 0.04;
nu = 0.005;
lambda = 100;
for i = 1:N-1
    x(i+1) = ncx2rnd(nu,lambda*x(i));
end
toc;
To illustrate my question, I tested another example, different from the one above. Here I considered generating N = 1e5 samples from the distribution χ²_ν(λ) with ν = 0.005 and λ = 100:
tic;
N = 1e5;
x = zeros(1,N);
nu = 0.005;
lambda = 100;
for i = 1:N
    x(i) = ncx2rnd(nu,lambda);
end
toc;
tic;
N = 1e5;
nu = 0.005;
lambda = 100;
x = ncx2rnd(nu,lambda*ones(1,N));
toc;
These two approaches produce equivalent results. However, it turns out that the second approach, which avoids the for-loop, is much faster than the first. The difference between the examples is that in the second one the rule for generating a sample does not require information about previous samples, so all samples can be generated simultaneously without a for-loop; that is not the case in the first. Based on this, I wonder whether avoiding the for-loop would accelerate the code execution. Is there any MATLAB built-in function to generate a random series like the one in the first example without a for-loop, when the dependence on previous samples is explicit? If the rule is linear, I know filter would be a possible choice; what about cases like the first example?
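(As an aside, the linear case mentioned above is straightforward to vectorize; a minimal illustration, with an arbitrary coefficient of my choosing, since filter implements exactly the recursion x(n) = a*x(n-1) + w(n):)
a = 0.9;                  % arbitrary linear coefficient (assumption)
N = 1e5;
w = randn(1, N);          % driving noise
x = filter(1, [1 -a], w); % x(n) = a*x(n-1) + w(n), no loop needed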
Logically it's impossible to calculate something iterative without doing the iterations. If x(n+1) is dependent on x(n) then you must calculate x(n) first, there is no "clever trick" here.
That just leaves us to optimise the calculation within the loop, specifically ncx2rnd. As with most MATLAB in-built functions, it is already fairly concise and performant, but there are some things to consider. Note that what I'm about to suggest involves using edit ncx2rnd to look inside this in-built function, which contains code under MathWorks copyright; I'm simply noting observations about how it works.
There are some input checks to handle incorrectly sized inputs and/or inputs with negative values. If you can take the burden of validation on yourself (i.e. you know your inputs are valid) then you can reduce the function to its single mathematical operation:
% function r = ncx2rnd(v,delta)
% (sizeOut is the requested output size, e.g. [1,1] for a scalar draw)
r = 2.*randg(poissrnd(delta./2, sizeOut)) + 2.*randg(v./2,sizeOut);
Running this standalone saves around 20% of the processing time; that saving is the cost of the input validation (with a nominal N=1e5).
In the MathWorks syntax, delta is equal to your lambda*x(i), the other term including v is independent of your x, so you could compute it outside of the loop, i.e. vectorising one of the calls to randg. Again using N=1e5 this brings the total time saving to around 25%.
The result would mean this change to your example:
% Common inputs
N = 1e5;
nu = 0.1;
lambda = 0.1;
% Baseline example
x = zeros(1,N);
x(1) = 0.04;
for i = 1:N-1
    x(i+1) = ncx2rnd(nu,lambda*x(i));
end
% ~25% faster alternative, with no input validation and partially vectorised
x = zeros(1,N);
x(1) = 0.04;
vTerm = 2.*randg(nu./2, [1,N]);
for i = 1:N-1
    x(i+1) = 2.*randg(poissrnd(lambda*x(i)./2, [1,1])) + vTerm(i);
end
I have a system of ODEs that I want to solve, but there is a tricky part: when the system reaches steady state, I would like to change the value of one (or more) of the parameters. For example, consider the following:
function dydt = diff(t,x,params) % note: naming this "diff" shadows MATLAB's built-in diff
    F = params(1);
    G = params(2);
    dydt = zeros(2,1);
    dydt(1) = F*x(1) - G*x(1)*x(2);
    dydt(2) = (F-G)*x(2);
end
I would like my code to work such that when the system has reached steady-state, the value of F is changed to 10 and the value of G is changed to 2, for example. I was thinking of detecting the values of dydt(1) and dydt(2) by using, for example,
if norm(dydt) < 1
    F = 10;
    G = 2;
end
How do I do that for the ODE expressions in MATLAB? If I put this if condition before the ODE expressions, the value of dydt will always be zero. But if I put it after the ODE expressions, the condition has no way to correct them.
Thank you!
A parameter of an ODE is assumed to be fixed and does not depend on the state of the system. What you're trying to do is simulate a piecewise continuous ODE. This sort of problem is usually solved with event location (assuming a solution exists). You need to stop the simulation at the key point, change your parameters, and restart a new simulation with initial conditions the same as those at the end of the previous.
Here's a little example with your ODE function. I don't know your initial conditions, initial parameter values, or other settings, so this is just to demonstrate this scheme:
function eventsdemo
    params = [-1.5 1];
    tspan = [0 10];
    x0 = [1;1];
    opts = odeset('Events',@events);
    [t1,x1] = ode45(@(t,x)f(t,x,params),tspan,x0,opts); % Simulate with events
    % Change parameters, set initial conditions based on end of previous run
    params = [1.5 1];
    x0 = x1(end,:);
    tspan = [t1(end) 10];
    [t2,x2] = ode45(@(t,x)f(t,x,params),tspan,x0); % Simulate again
    % Concatenate results, removing duplicate points
    t = [t1;t2(2:end)];
    x = [x1;x2(2:end,:)];
    figure;
    plot(t,x);
    hold on;
    plot(t2(1),x2(1,:),'k*'); % Plot event location

function dxdt = f(t,x,params) %#ok<INUSL>
    F = params(1);
    G = params(2);
    dxdt = [F*x(1) - G*x(1)*x(2);
            (F-G)*x(2)];

function [value,isterminal,direction] = events(t,x) %#ok<INUSL>
    value = norm(x)-1e-3; % Don't try to detect exact zero for asymptotics
    isterminal = true;
    direction = -1;
In this case the solution asymptotically approaches a stable fixed point at (0,0). It's important to not try to detect (0,0) exactly as the solution may never reach that point. You can instead use a small tolerance. However, depending on your system, it's possible that choosing this tolerance could impact the behavior after you change parameters. You could also consider reformulating this problem as a boundary value problem. I don't know what you're trying to do with this system so I can't say much else (and it would probably be off-topic for this site).
My approach
fun = @(y) (1/sqrt(pi))*exp(-(y-1).^2).*log(1 + exp(-4*y));
integral(fun,-Inf,Inf)
This gives NaN.
So I tried plotting it.
y= -10:0.1:10;
plot(y,exp(-(y-1).^2).*log(1 + exp(-4*y)))
Then I understood that the significant part of the domain is from -4 to +4, so I changed the limits to
integral(fun,-10,10)
However, I do not want to have to plot the graph every time to find the limits. Is there a way to compute the integral directly from -Inf to Inf?
Discussion
If your integrals are always of the form
I = ∫_{-∞}^{∞} e^{-(y-c)²} g(y) dy,
I would use a high-order Gauss–Hermite quadrature rule.
It's similar to the Gauss-Legendre-Kronrod rule that forms the basis for quadgk but is specifically tailored for integrals over the real line with a standard Gaussian multiplier.
Rewriting your equation with the substitution x = y - 1, we get
I = (1/√π) ∫_{-∞}^{∞} e^{-x²} log(1 + e^{-4(x+1)}) dx.
The integral can then be computed using the Gauss-Hermite rule of arbitrary order (within reason):
>> order = 10;
>> [nodes,weights] = GaussHermiteRule(order);
>> f = @(x) log(1 + exp(-4*(x+1)))/sqrt(pi);
>> sum(f(nodes).*weights)
ans =
    0.1933
I'd note that the function below builds a full order x order matrix to compute nodes, so it shouldn't be made too large.
There is a way to avoid this by explicitly computing the weights, but I decided to be lazy.
Besides, even at order 100, the Gaussian multiplier is about 2E-98, so the integrand's contribution at the outermost nodes is negligible.
And while this isn't inherently adaptive, a high-order rule should be sufficient in most cases ... I hope.
Code
function [nodes,weights] = GaussHermiteRule(n)
% ------------------------------------------------------------------------------
% Find the nodes and weights for a Gauss-Hermite Quadrature integration.
%
    if (n < 1)
        error('There is no Gauss-Hermite rule of order 0.');
    elseif (abs(n - round(n)) > eps())
        error('Given order ''n'' must be a strictly positive integer.');
    else
        n = round(n);
    end
    % Three-term recurrence coefficients for the Hermite polynomials, then
    % get the nodes and weights from the Golub-Welsch function
    n = (0:n)' ;
    b = n*0 ;
    a = b + 0.5 ;
    c = n ;
    [nodes,weights] = GolubWelsch(a,b,c,sqrt(pi));
end
function [xk,wk] = GolubWelsch(ak,bk,ck,mu0)
%GolubWelsch
% Calculate the approximate* nodes and weights (normalized to 1) of an orthogonal
% polynomial family defined by a three-term recurrence relation of the form
%     x pk(x) = ak pkp1(x) + bk pk(x) + ck pkm1(x)
%
% The weight scale factor mu0 is the integral of the weight function over the
% orthogonal domain.
%
    % Calculate the terms for the orthonormal version of the polynomials
    alpha = sqrt(ak(1:end-1) .* ck(2:end));
    % Build the symmetric tridiagonal matrix
    T = full(spdiags([[alpha;0],bk,[0;alpha]],[-1,0,+1],length(alpha),length(alpha)));
    % Calculate the eigenvectors and eigenvalues of the matrix
    [V,xk] = eig(T,'vector');
    % Calculate the weights from the eigenvectors - technically, Golub-Welsch requires
    % a normalization, but since MATLAB returns unit eigenvectors, it is omitted.
    wk = mu0*(V(1,:).^2)';
end
I've had success with transforming such infinite-bounded integrals using a numerical variable transformation, as explained in Numerical Recipes 3e, section 4.5.3. Basically, you substitute in y=c*tan(t)+b and then numerically integrate over t in (-pi/2,pi/2), which sweeps y from -infinity to infinity. You can tune the values of c and b to optimize the process. This approach largely dodges the question of trying to determine cutoffs in the domain, but for this to work reliably using quadrature you have to know that the integrand does not have features far from y=b.
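A caveat worth noting: the naive integrand from the question returns NaN far in the left tail, where exp(-4y) overflows and a 0*Inf product occurs; that is likely why integral(fun,-Inf,Inf) gave NaN in the first place. A minimal sketch of the substitution (my addition), assuming the integrand from the question plus an overflow-safe softplus for log(1 + exp(z)):
softplus = @(z) max(z,0) + log1p(exp(-abs(z)));  % stable log(1+exp(z))
fun = @(y) (1/sqrt(pi))*exp(-(y-1).^2).*softplus(-4*y);
b = 1;  c = 1;                               % center b on the Gaussian peak
g = @(t) fun(c*tan(t) + b).*(c*sec(t).^2);   % dy = c*sec(t)^2 dt
integral(g, -pi/2, pi/2)                     % approximately 0.1933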
A quick and dirty solution would be to look for a position where your function is sufficiently small, and then take that position as the limits. This assumes that for x > 0 the function fun decreases monotonically, and that fun(x) is roughly the same size as fun(-x) for all x.
%// A small number
epsilon = eps;
%// Stepsize for searching bound
stepTest = 1;
%// Starting position for searching bound
position = 0;
%// Not yet small enough
smallEnough = false;
%// Search bound
while ~smallEnough
    smallEnough = (fun(position) < epsilon);
    position = position + stepTest;
end
%// Calculate integral
integral(fun, -position, position)
If you were happy with plotting the function and deciding by eye where to cut, then this code should suffice, I guess.
I am working on a problem that involves using the Euler method to approximate the differential equation df/dt = a*f(t) - b*[f(t)]^2, both when b = 0 and when b is not zero; I am to compare the analytic solution to the approximate solution when b = 0.
f(1) = 1000;
t(1)= 0;
a = 10;
b = 0 ;
dt = 0.01;
Nsteps = 10/dt;
for i = 2:Nsteps
    t(i) = dt + t(i-1);
    %f(i) = f(i-1)*(1 + dt*(a - b*f(i-1)));
    f(i) = f(i-1)*(1 + a*dt);
end
plot(t,f,'r-')
hold on
fa= a*exp(a*t)
plot(t,fa,'bo')
When b=0, the solution to the differential equation is f(t) = c*exp(at). When I apply the initial condition f(0) = 1000, the solution becomes f(t) = 1000*exp(at). Now, my professor said that if a differential equation has an analytic solution, then no matter what time step you use, the graphs of the analytic solution and the approximation (Euler's method) will coincide. So I expected the two graphs to overlap. I attached a picture of what I got.
Why did this occur? Just for the heck of it, I changed 1000 to 10 (which is a=10) to get the graphs to overlap, and when I did this, the two overlapped. I don't understand. What am I doing incorrectly?
Why should the numerical solution give the same answer as the analytical one? Looking at pixels overlapping on the screen is not a very precise way to discern anything; you should examine the error between the two (absolute and/or relative). You might also want to examine what happens when you change the step size, and you might want to play with a linear system as well. You don't need to integrate out very far to see these effects; just setting the final time to 0.1 or 1 suffices. Here is some better-formatted code to work with:
t0 = 0;
dt = 0.01;
tf = 0.1;
t = t0:dt:tf; % No need to integrate t in for loop for fixed time step
lt = length(t);
f = zeros(1,lt); % Pre-allocate f
f0 = 1000; % Initial condition
f(1) = f0;
a = 10;
for i = 1:lt-1
    f(i+1) = f(i) + a*f(i)*dt;
    %f(i+1) = f(i) + a*dt; % Alternative linear system to try
end
% Analytic solution
fa = f0*exp(a*t);
%fa = f0+a*t; % Alternative linear system to try
figure;
plot(t,f,'r-',t,fa,'bo')
% Plot absolute error
figure;
plot(t,abs(f-fa))
% Plot relative error
figure;
plot(t,abs(f-fa)./fa)
You're also not preallocating any of your arrays, which makes your code very inefficient; my code does. Read about that here.
Much more beyond this is really off-topic for this site, which is focussed on programming rather than mathematics. If you really have questions about the numerical details that aren't answered by reading your text book (or the Wikipedia page for the Euler method) then you should ask them at Math.StackExchange.
Numerical methods are not exact; there is always an error between the numerical and analytical solutions. As Euler's method is a first-order method, the global truncation error is proportional to the integration step size.
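As a quick sanity check of that claim (my addition, reusing the numbers from above): halving dt at a fixed final time should roughly halve the error for a first-order method.
a = 10; f0 = 1000; tf = 0.1;
for dt = [0.01 0.005]
    f = f0;
    for i = 1:round(tf/dt)
        f = f + a*f*dt;              % one forward Euler step
    end
    fprintf('dt = %g, abs error = %g\n', dt, abs(f - f0*exp(a*tf)));
end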
I have an image as shown in fig. 1. I am trying to fit this binary image with a capped rectangle (fig. 2) to figure out:
the orientation (the angle between the long axis and the horizontal axis)
the length (l) and radius (R) of the object. What is the best way to do it?
Thanks for the help.
My very naive idea is to use a least-squares fit to find this information; however, I found that there is no equation for a capped rectangle. In MATLAB there is a function called rectangle that can create the capped rectangle perfectly, but it seems to be for plotting purposes only.
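(For what it's worth, a capped rectangle, i.e. a stadium shape, can indeed be drawn with rectangle via its Curvature property; a purely illustrative sketch, with dimensions of my choosing:)
w = 100; h = 40;                 % box width and height (arbitrary)
figure; axis equal;
rectangle('Position',[0 0 w h], 'Curvature',[h/w 1], 'FaceColor','k');
% [h/w 1] makes the short sides exact semicircles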
I solved this two different ways and have notes on each approach below. The methods vary in complexity, so you will need to decide the best trade-off for your application.
First Approach: Least-Squares-Optimization:
Here I used unconstrained optimization through Matlab's fminunc() function. Take a look at Matlab's help to see the options you can set prior to optimization. I made some fairly simple choices just to get this approach working for you.
In summary, I set up a model of your capped rectangle as a function of the parameters L, W, and theta. You can include R if you wish, but personally I don't think you need it; by examining continuity with the semicircular caps at each end, I think it is sufficient to let R = W, by inspection of your model geometry. This also reduces the number of optimization parameters by one.
I made a model of your capped rectangle using boolean layers; see the cappedRectangle() function below. As a result, I needed a function to calculate finite difference gradients of the model with respect to L, W, and theta. If you don't provide these gradients to fminunc(), it will attempt to estimate them, but I found that Matlab's estimates didn't work well for this application, so I provided my own as part of the error function that gets called by fminunc() (see below).
I didn't initially have your data so I simply right-clicked on your image above and downloaded: 'aRIhm.png'
To read your data I did this (creates the variable cdata):
img = importdata('aRIhm.png'); % avoid the name "image", which shadows a built-in
vars = fieldnames(img);
for i = 1:length(vars)
    assignin('base', vars{i}, img.(vars{i}));
end
Then I converted to double type and "cleaned-up" the data by normalizing. Note: this pre-processing was important to get the optimization to work properly, and may have been needed since I didn't have your raw data (as mentioned I downloaded your image from the webpage for this question):
data = im2double(cdata);
data = data / max(data(:));
figure(1); imshow(data); % looks the same as your image above
Now get the image sizes:
nY = size(data,1);
nX = size(data,2);
Note #1: you might consider adding the center of the capped rectangle, (xc,yc), as optimization parameters. These extra degrees of freedom will make a difference in the overall fitting results (see comment on final error function values below). I didn't set that up here but you can follow the approach I used for L, W, and theta, to add that functionality with the finite difference gradients. You will also need to setup the capped rectangle model as a function of (xc,yc).
EDIT: Out of curiosity I added the optimization over the capped rectangle center, see the results at the bottom.
Note #2: for "continuity" at the ends of the capped rectangle, let R = W. If you like, you can later include R as an explicit optimization
parameter following the examples for L, W, theta. You might even want to have say R1 and R2 at each endpoint as variables?
Below are arbitrary starting values that I used to simply illustrate an example optimization. I don't know how much information you have in your application but in general, you should try to provide the best initial estimates that you can.
L = 25;
W = L;
theta = 90;
params0 = [L W theta];
Note that you will get different results based on your initial estimates.
Next display the starting estimate (the cappedRectangle() function is defined later):
capRect0 = reshape(cappedRectangle(params0,nX,nY),nX,nY);
figure(2); imshow(capRect0);
Define an anonymous function for the error metric (errorFunc() is listed below):
f = @(x)errorFunc(x,data);
% Define several optimization parameters for fminunc():
options = optimoptions(@fminunc,'GradObj','on','TolX',1e-3, 'Display','iter');
% Call the optimizer:
tic
[x,fval,exitflag,output] = fminunc(f,params0,options);
time = toc;
disp(['convergence time (sec) = ',num2str(time)]);
% Results:
disp(['L0 = ',num2str(L),'; ', 'L estimate = ', num2str(x(1))]);
disp(['W0 = ',num2str(W),'; ', 'W estimate = ', num2str(x(2))]);
disp(['theta0 = ',num2str(theta),'; ', 'theta estimate = ', num2str(x(3))]);
capRectEstimate = reshape(cappedRectangle(x,nX,nY),nX,nY);
figure(3); imshow(capRectEstimate);
Below is the output from fminunc (for more details on each column see Matlab's help):
Iteration f(x) step optimality CG-iterations
0 0.911579 0.00465
1 0.860624 10 0.00457 1
2 0.767783 20 0.00408 1
3 0.614608 40 0.00185 1
.... and so on ...
15 0.532118 0.00488281 0.000962 0
16 0.532118 0.0012207 0.000962 0
17 0.532118 0.000305176 0.000962 0
You can see that the final error metric values have not decreased that much relative to the starting value; this indicates to me that the model function probably doesn't have enough degrees of freedom to really "fit" the data that well, so consider adding extra optimization parameters, e.g., the image center, as discussed earlier.
EDIT: Added optimization over the capped rectangle center, see results at the bottom.
Now print the results (using a 2011 Macbook Pro):
Convergence time (sec) = 16.1053
L0 = 25; L estimate = 58.5773
W0 = 25; W estimate = 104.0663
theta0 = 90; theta estimate = 36.9024
And display the results:
EDIT: The exaggerated "thickness" of the fitting results above are because the model is trying to fit the data while keeping its center fixed, resulting in larger values for W. See updated results at bottom.
You can see by comparing the data to the final estimate that even a relatively simple model starts to resemble the data fairly well.
You can go further and calculate error bars for the estimates by setting up your own Monte-Carlo simulations to check accuracy as a function of noise and other degrading factors (with known inputs that you can generate to produce simulated data).
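A bare-bones version of such a Monte-Carlo check might look like this (my sketch; trueParams and the noise level are assumptions, and it reuses cappedRectangle() and errorFunc() from below):
trueParams = [60 25 30];                       % assumed true L, W, theta
truth = reshape(cappedRectangle(trueParams,nX,nY),nX,nY);
nTrials = 50;
est = zeros(nTrials,3);
for k = 1:nTrials
    noisy = truth + 0.05*randn(size(truth));   % additive Gaussian noise
    est(k,:) = fminunc(@(p)errorFunc(p,noisy), trueParams, options);
end
std(est)                                       % spread = error bars for L, W, theta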
Below is the model function I used for the capped rectangle (note: the way I did image rotation is kind of sketchy numerically and not very robust for finite differences, but it's quick and dirty and gets you going):
function result = cappedRectangle(params, nX, nY)
    [x,y] = meshgrid(-(nX-1)/2:(nX-1)/2,-(nY-1)/2:(nY-1)/2);
    L = params(1);
    W = params(2);
    theta = params(3); % units are degrees
    R = W;
    % Define r1 and r2 for the displaced rounded edges:
    x1 = x - L;
    x2 = x + L;
    r1 = sqrt(x1.^2+y.^2);
    r2 = sqrt(x2.^2+y.^2);
    % Capped rectangle prior to rotation (theta = 0):
    temp = double( (abs(x) <= L) & (abs(y) <= W) | (r1 <= R) | (r2 <= R) );
    cappedRectangleRotated = im2double(imrotate(mat2gray(temp), theta, 'bilinear', 'crop'));
    result = cappedRectangleRotated(:);
return
And then you will also need the error function called by fminunc:
function [error, df_dx] = errorFunc(params,data)
    nY = size(data,1);
    nX = size(data,2);
    % Anonymous function for the model:
    model = @(params)cappedRectangle(params,nX,nY);
    % Least-squares error (analogous to chi^2 in the literature):
    f = @(x)sum( (data(:) - model(x) ).^2 ) / sum(data(:).^2);
    % Scalar error:
    error = f(params);
    [df_dx] = finiteDiffGrad(f,params);
return
As well as the function to calculate the finite difference gradients:
function [df_dx] = finiteDiffGrad(fun,x)
    N = length(x);
    x = reshape(x,N,1);
    % Pick a small delta; dx should be experimented with:
    dx = norm(x(:))/10;
    % Define an array of dx values:
    h_array = dx*eye(N);
    df_dx = zeros(size(x));
    f = @(x) feval(fun,x);
    % Finite difference approximation ("centered difference"; error is O(h^2)):
    for j = 1:N
        hj = h_array(j,:)';
        df_dx(j) = ( f(x+hj) - f(x-hj) )/(2*dx);
    end
return
Second Approach: use regionprops()
As others have pointed out, you can also use Matlab's regionprops(). Overall I think this could work the best, with some tuning and checking to ensure that it's doing what you expect. So the approach would be to call it like this (it certainly is a lot simpler than the first approach!):
data = im2double(cdata);
data = round(data / max(data(:)));
s = regionprops(data, 'Orientation', 'MajorAxisLength', ...
'MinorAxisLength', 'Eccentricity', 'Centroid');
And then the struct result s:
>> s
s =
Centroid: [345.5309 389.6189]
MajorAxisLength: 365.1276
MinorAxisLength: 174.0136
Eccentricity: 0.8791
Orientation: 30.9354
This gives enough information to feed into a model of a capped rectangle. At first glance this seems like the way to go, but it seems like you have your mind set on another approach (maybe the first approach above).
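One rough way to seed the capped-rectangle model from these properties might be the following (my assumption about the correspondence; the ellipse-equivalent axis lengths only approximate the true extents):
R = s.MinorAxisLength/2;          % cap radius, roughly half the short axis
W = R;                            % model constraint R = W from above
L = s.MajorAxisLength/2 - R;      % half-length of the straight section
theta = s.Orientation;            % degrees, counter-clockwise
% Centroid is in pixel coordinates; the model's (xc,yc) is measured from
% the image center, so shift accordingly:
xc = s.Centroid(1) - (nX+1)/2;
yc = s.Centroid(2) - (nY+1)/2;
params0 = [L W theta xc yc];      % seed for the extended 5-parameter model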
Anyway, below is an image of the results (in red) overlaid on top of your data which you can see looks quite good:
EDIT: I couldn't help myself, I suspected that by including the image center as an optimization parameter, much better results could be obtained, so I went ahead and did it just to check. Sure enough, with the same starting estimates used earlier in the Least-Squares Estimation, here are the results:
Iteration f(x) step optimality CG-iterations
0 0.911579 0.00465
1 0.859323 10 0.00471 2
2 0.742788 20 0.00502 2
3 0.530433 40 0.00541 2
... and so on ...
28 0.0858947 0.0195312 0.000279 0
29 0.0858947 0.0390625 0.000279 1
30 0.0858947 0.00976562 0.000279 0
31 0.0858947 0.00244141 0.000279 0
32 0.0858947 0.000610352 0.000279 0
By comparison with the earlier values we can see that the new least-square error values are quite a bit smaller when including the image center, confirming what we suspected earlier (so no big surprise).
The updated estimates for the capped rectangle parameters are thus:
Convergence time (sec) = 96.0418
L0 = 25; L estimate = 89.0784
W0 = 25; W estimate = 80.4379
theta0 = 90; theta estimate = 31.614
And relative to the image array center we get:
xc = -22.9107
yc = 35.9257
The optimization takes longer but the results are improved as seen by visual inspection:
If performance is an issue you may want to consider writing your own optimizer or first try tuning Matlab's optimization parameters, perhaps using different algorithm options as well; see the optimization options above.
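For example (purely illustrative, reusing the option names from the call above), switching the algorithm and loosening the tolerance can trade accuracy for speed:
options = optimoptions(@fminunc, 'Algorithm','quasi-newton', ...
                       'GradObj','on', 'TolX',1e-2, 'Display','off');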
Here is the code for the updated model:
function result = cappedRectangle(params, nX, nY)
    [X,Y] = meshgrid(-(nX-1)/2:(nX-1)/2,-(nY-1)/2:(nY-1)/2);
    % Extract params to make code more readable:
    L = params(1);
    W = params(2);
    theta = params(3); % units are degrees
    xc = params(4); % new param: image center in x
    yc = params(5); % new param: image center in y
    % Shift coordinates to the image center:
    x = X-xc;
    y = Y-yc;
    % Define R = W as a constraint:
    R = W;
    % Define r1 and r2 for the rounded edges:
    x1 = x - L;
    x2 = x + L;
    r1 = sqrt(x1.^2+y.^2);
    r2 = sqrt(x2.^2+y.^2);
    temp = double( (abs(x) <= L) & (abs(y) <= W) | (r1 <= R) | (r2 <= R) );
    cappedRectangleRotated = im2double(imrotate(mat2gray(temp), theta, 'bilinear', 'crop'));
    result = cappedRectangleRotated(:);
and then prior to calling fminunc() I adjusted the parameter list:
L = 25;
W = L;
theta = 90;
% Set image center to zero as initial guess:
xc = 0;
yc = 0;
params0 = [L W theta xc yc];
Enjoy.
First, I have to say that I do not have answers to all of your questions, but I can help you with the orientation.
I suggest using principal component analysis on the binary image. A good tutorial on PCA is given by Jon Shlens. In Figure 2 of his tutorial there is an example of what it can be used for. In Section 5 of his paper there are instructions for computing the principal components; with the singular value decomposition it is much easier, as shown in Section 6.1.
To use PCA you have to get measurements for which you want to compute the principal components. In your case each white pixel is a measurement, represented by its pixel location (x, y)'. You will have N two-dimensional vectors that give your measurements. Thus, your 2xN measurement matrix X is the concatenation of these vectors.
When you have built this matrix, proceed as given in Section 6.1. The singular values represent the "strength" of the different components: the largest singular value corresponds to the long axis of your ellipse, and the second largest (there should be only two) to the perpendicular axis.
Remember: if the ellipse is a circle, the singular values should be equal, but with a discrete image representation you will not get a perfect circle.
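A minimal MATLAB sketch of this (my illustration; bw is assumed to be the logical binary image, and note that image row coordinates increase downward, so the sign of the angle differs from the usual plot convention):
[r, c] = find(bw);               % coordinates of the white pixels
X  = [c.'; r.'];                 % 2xN measurement matrix, one (x,y)' per pixel
Xc = X - mean(X, 2);             % center the measurements (implicit expansion, R2016b+)
[U, S, ~] = svd(Xc, 'econ');     % U(:,1) is the direction of the long axis
orientationDeg = atan2d(U(2,1), U(1,1))
singularValues = diag(S)         % two values: long-axis vs short-axis strength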