mle memory error with custom negative log-likelihood function - matlab

I am trying to use 'mle' with a custom negative log-likelihood function, but I get the following error:
Requested 1200000x1200000 (10728.8GB) array exceeds maximum array size preference (15.6GB). This might cause MATLAB to become unresponsive.
The data I am using is a 1x1200000 binary array (which I had to convert to double), and the function has 10 arguments: one for the data, 3 known parameters, and 6 to be optimized. I tried setting 'OptimFun' to both 'fminsearch' and 'fmincon'. Also, optimizing the parameters using 'fminsearch' and 'fminunc' instead of 'mle' works fine.
The problem happens in the 'checkFunErrs' function, inside the 'mlecustom.m' file (call at line 173, actual error at line 705).
With 'fminunc' I could calculate the optimal parameters, but it does not give me confidence intervals. Is there a way to circumvent this? Or am I doing something wrong?
Thanks for the help.
T_1 = 50000;
T_2 = 100000;
npast = 10000;
start = [0 0 0 0 0 0];
func = @(x, data, cens, freq)loglike(data, [x(1) x(2) x(3) x(4) x(5) x(6)],...
T_1, T_2, npast);
params = mle(data, 'nloglf', func, 'Start', start, 'OptimFun', 'fmincon');
% Computes the negative log-likelihood
function out = loglike(data, params, T_1, T_2, npast)
size = length(data);
if npast == 0
past = 0;
else
past = zeros(1, size);
past(npast+1:end) = movmean(data(npast:end-1),[npast-1, 0]); % Average number of events in the previous n years
end
lambda = params(1) + ...
(params(2)*cos(2*pi*(1:size)/T_1)) + ...
(params(3)*sin(2*pi*(1:size)/T_1)) + ...
(params(4)*cos(2*pi*(1:size)/T_2)) + ...
(params(5)*sin(2*pi*(1:size)/T_2)) + ...
params(6)*past;
out = sum(log(1+exp(lambda))-data.*lambda);
end

Your issue is line 228 (as of MATLAB R2017b) of the in-built mle function, which happens just before the custom function is called:
data = data(:);
The input variable data is converted to a column array without warning. This is typically done to ensure that all further calculations are robust to the orientation of the input vector.
However, this is causing you issues, because your custom function assumes data is a row vector, specifically this line:
out = sum(log(1+exp(lambda))-data.*lambda);
Due to implicit expansion, when the row vector lambda and the column vector data interact, you get a huge square matrix per your error message.
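A small-scale sketch of the same shape mismatch, using made-up three-element vectors in place of data and lambda (implicit expansion requires R2016b or later). The last lines check the figure in the error message: an N-by-N double array with N = 1.2e6 needs N^2 × 8 bytes, which the message reports in units of 2^30 bytes.

```matlab
lambda = [0.5 -1 2];        % row vector, shaped like lambda inside loglike
data   = [1; 0; 1];         % column vector, shaped like data after mle's data(:)

P = data .* lambda;         % implicit expansion: a 3x3 matrix, not 3 elements
Q = data .* lambda(:);      % both columns: element-wise product, 3x1

% Memory for the N-by-N double matrix MATLAB tried to allocate
N = 1.2e6;
gib = N^2 * 8 / 2^30;       % about 10728.8, matching the error message
```

With N = 3 the square matrix is harmless; with 1.2 million data points it is what exhausts memory.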
Adding these two lines to make it explicit that both are column vectors resolves the issue, avoids implicit expansion, and applies the calculation element-wise as you intended.
lambda = lambda(:);
data = data(:);
So your function becomes:
function out = loglike(data, params, T_1, T_2, npast)
N = length(data);
if npast == 0
past = 0;
else
past = zeros(1,N);
past(npast+1:end) = movmean(data(npast:end-1),[npast-1, 0]); % Average number of events in the previous n years
end
lambda = params(1) + ...
(params(2)*cos(2*pi*(1:N)/T_1)) + ...
(params(3)*sin(2*pi*(1:N)/T_1)) + ...
(params(4)*cos(2*pi*(1:N)/T_2)) + ...
(params(5)*sin(2*pi*(1:N)/T_2)) + ...
params(6)*past;
lambda = lambda(:);
data = data(:);
out = sum(log(1+exp(lambda))-data.*lambda);
end
An alternative would be to rewrite your function so that it uses column vectors throughout, but several expressions (the (1:N) ranges and the zeros(1,N) preallocation of past) create row vectors, so each of them would need changing. The suggested approach is arguably "lazier", but it is also robust to row or column inputs.
Note also I've changed your variable name from size to N, since size is an in-built function which you should avoid shadowing.
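For comparison, here is a sketch of the column-oriented alternative mentioned above, written as a script on made-up inputs rather than as the actual loglike function (the data, params, N, npast, T_1 and T_2 values are illustrative only). Every vector is built as N-by-1, so the data(:) that mle passes in never meets a row vector.

```matlab
% Made-up inputs, only to exercise the code
N = 300; npast = 10; T_1 = 50; T_2 = 100;
data = double(mod((1:N)', 3) == 0);        % binary column vector, as mle passes it
params = [0.1 0.2 -0.3 0.05 0.1 0.5];

% Column-oriented body of loglike: every vector is N-by-1
t = (1:N)';                                % column time index instead of (1:N)
past = zeros(N, 1);                        % column instead of zeros(1, N)
if npast > 0
    past(npast+1:end) = movmean(data(npast:end-1), [npast-1, 0]);
end
lambda = params(1) ...
    + params(2)*cos(2*pi*t/T_1) + params(3)*sin(2*pi*t/T_1) ...
    + params(4)*cos(2*pi*t/T_2) + params(5)*sin(2*pi*t/T_2) ...
    + params(6)*past;
out = sum(log(1 + exp(lambda)) - data.*lambda);   % scalar negative log-likelihood
```

This version only works for column input, which is what mle supplies; the lambda(:)/data(:) approach above accepts either orientation.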

Related

Two functions in Matlab to approximate integral - not enough input arguments?

I want to write a function that approximates integrals with the trapezoidal rule.
I first defined a function in one file:
function[y] = integrand(x)
y = x*exp(-x^2); %This will be integrand I want to approximate
end
Then I wrote my function that approximates definite integrals with lower bound a and upper bound b (also in another file):
function [result] = trapez(integrand,a,b,k)
sum = 0;
h = (b-a)/k; %split up the interval in equidistant spaces
for j = 1:k
x_j = a + j*h; %this are the points in the interval
sum = sum + ((x_j - x_(j-1))/2) * (integrand(x_(j-1)) + integrand(x_j));
end
result = sum
end
But when I want to call this function from the command window, using result = trapez(integrand,0,1,10) for example, I always get an error 'not enough input arguments'. I don't know what I'm doing wrong?
There are numerous issues with your code:
x_(j-1) is not defined: MATLAB reads it as indexing into a variable named x_, which doesn't exist, rather than as "the previous x_j" (assuming that is what you meant).
By calling trapez(integrand,0,1,10) you're actually calling the integrand function with no input arguments. If you want to pass a handle, use @integrand instead. But in this case there's no need to pass it at all.
You should avoid variable names that coincide with Matlab functions, such as sum. This can easily lead to issues which are difficult to debug, if you also try to use sum as a function.
Here's a working version (note also a better code style):
function res = trapez(a, b, k)
res = 0;
h = (b-a)/k; % split up the interval in equidistant spaces
for j = 1:k
x_j1 = a + (j-1)*h;
x_j = a + j*h; % these are the points in the interval
res = res + ((x_j - x_j1)/2) * (integrand(x_j1) + integrand(x_j));
end
end
function y = integrand(x)
y = x*exp(-x^2); % This will be the integrand I want to approximate
end
And the way to call it is: result = trapez(0, 1, 10);
Your integrand function requires an input argument x, which you are not supplying in your command-line function call.
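If you do want trapez to take the integrand as an argument, here is a minimal sketch using a function handle. trapezH is a hypothetical name, and this variant evaluates all nodes at once, so the integrand must use element-wise operators (.* and .^).

```matlab
% Hypothetical variant: the integrand is passed in as a function handle.
% Nodes are a + h*(0:k); sum all f values, then remove half of each endpoint.
trapezH = @(f, a, b, k) ((b - a)/k) * (sum(f(a + ((b - a)/k)*(0:k))) - (f(a) + f(b))/2);

integrand = @(x) x.*exp(-x.^2);      % vectorized form of the question's integrand
result = trapezH(integrand, 0, 1, 10);
exact = (1 - exp(-1))/2;             % antiderivative is -exp(-x^2)/2
```

With the question's integrand.m on the path (rewritten with .* and .^), the equivalent call is trapezH(@integrand, 0, 1, 10).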

Solving differential equation for a single time in loop with matlab

I have a mechanical system with the following equation:
xdot = Ax + Bu
I want to solve this equation in a loop because I need to update u at every step, but solvers like ode45 or lsim solve the differential equation over a time interval.
for i = 1:10001
if x(i,:)>= Sin1 & x(i,:)<=Sout2
U(i,:) = Ueq - (K*(S/Alpha))
else
U(i,:) = Ueq - (K*S)
end
% [y(i,:),t,x(i+1,:)]=lsim(sys,U(i,:),(time=i/1000),x(i,:));
% or: [t,x] = ode45(@(t,x)furuta(t,x,A,B,U),(time=i/1000),x)
end
Is there another way to solve this equation in a loop for a single time (not a single time step)?
There are a number of methods for updating and storing data across function calls.
For the ODE suite, I've come to like what is called "closures" for doing that.
A closure is basically a nested function accessing or modifying a variable from its parent function.
The code below makes use of this feature by wrapping the right-hand side function passed to ode45 and the 'OutputFcn' in a parent function called odeClosure().
You'll notice that I am using a logical array instead of an if statement.
An if statement with a vector condition runs its body only if every element is true; if any element is false, the else branch runs instead.
Therefore, I create a logical array and use it to make the denominator either 1 or Alpha depending on the signal value for each row of x/U.
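A small standalone sketch of that trick, reusing the Sin1, Sout2 and Alpha values from the example function below; the state vector x is made up. It shows the if statement falling through to else for a mixed condition, while the element-wise version picks Alpha or 1 per row.

```matlab
% Values copied from the example below; x is a made-up state vector
Sin1  = [-0.0172; -4.0974; -0.0517; -0.2993];
Sout2 = [ 0.0172;  4.0974;  0.0517;  0.2993];
Alpha = 0.2;
x = [0.01; 5; 0; -1];

between = (x >= Sin1) & (x <= Sout2);   % [true; false; true; false]

% A vector condition in an if statement is true only if ALL elements are true
if between
    branch = 'if';
else
    branch = 'else';                    % taken here, because some rows are false
end

% Element-wise alternative: Alpha where between is true, 1 elsewhere
denom = 1 + (Alpha - 1) .* between;     % approximately [0.2; 1; 0.2; 1]
```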
The 'OutputFcn' storeU() is called by ode45 after each successful time step.
The function grows the U storage array and updates it appropriately.
The array U will have the same number of columns as the number of solution points requested by tspan (12 in this made-up example).
If a successful step passes over several requested points, the function is called with all of those requested times and their associated solution values at once (so x may be a matrix rather than a vector); this is why I used bsxfun in storeU and not in rhs.
Example function:
function [sol,U] = odeClosure()
% Initialize
% N = 10 ;
A = [ 0,0,1.0000,0; 0,0,0,1.0000;0,1.3975,-3.7330,-0.0010;0,21.0605,-6.4748,-0.0149];
B = [0;0;0.6199;1.0752 ] ;
x0 = [11;11;0;0];
K = 100;
S = [-0.2930;4.5262;-0.5085;1.2232];
Alpha = 0.2 ;
Ueq = [0;-25.0509;6.3149;-4.5085];
U = Ueq;
Sin1 = [-0.0172;-4.0974;-0.0517;-0.2993];
Sout2 = [0.0172 ; 4.0974; 0.0517; 0.2993];
% Solve
options = odeset('OutputFcn', @(t,x,flag) storeU(t,x,flag));
sol = ode45(@(t,x) rhs(t,x),[0,0.01:0.01:0.10,5],x0,options);
function xdot = rhs(~,x)
between = (x >= Sin1) & (x <= Sout2);
uwork = Ueq - K*S./(1 + (Alpha-1).*between);
xdot = A*x + B.*uwork;
end
function status = storeU(t,x,flag)
if isempty(flag)
% grow array
nAdd = length(t) ;
iCol = size(U,2) + (1:nAdd);
U(:,iCol) = 0 ;
% update U
between = bsxfun(@ge,x,Sin1) & bsxfun(@le,x,Sout2);
U(:,iCol) = bsxfun(@minus, Ueq, bsxfun(@rdivide, K*S, 1 + (Alpha-1).*between));
end
status = 0;
end
end

How to vectorize a piecewise periodic function in MATLAB?

I've noticed that MATLAB built-in functions can handle either scalar or vector parameters. Example:
sin(pi/2)
ans =
1
sin([0:pi/5:pi])
ans =
0 0.5878 0.9511 0.9511 0.5878 0.0000
If I write my own function, for example, a piecewise periodic function:
function v = foo(t)
t = mod( t, 2 ) ;
if ( t < 0.1 )
v = 0 ;
elseif ( t < 0.2 )
v = 10 * t - 1 ;
else
v = 1 ;
end
I can call this on individual values:
[foo(0.1) foo(0.15) foo(0.2)]
ans =
0 0.5000 1.0000
however, if the input for the function is a vector, it is not auto-vectorized like the builtin function:
foo([0.1:0.05:0.2])
ans =
1
Is there a syntax that can be used in the definition of the function that indicates that if a vector is provided, a vector should be produced? Or do built-in functions like sin, cos, ... check the types of their inputs, and if the input is a vector produce the same result?
You need to change your syntax slightly to be able to handle data of any size. I typically use logical filters to vectorise if-statements, as you're trying to do:
function v = foo(t)
v = zeros(size(t));
t = mod( t, 2 ) ;
filt1 = t<0.1;
filt2 = ~filt1 & t<0.2;
filt3 = ~filt1 & ~filt2;
v(filt1) = 0;
v(filt2) = 10*t(filt2)-1;
v(filt3) = 1;
In this code, we've got three logical filters. The first picks out all elements such that t<0.1. The second picks out all of the elements such that t<0.2 that weren't in the first filter. The final filter gets everything else.
We then use this to set the vector v. We set every element of v that matches the first filter to 0. We set everything in v which matches the second filter to 10*t-1. We set every element of v which matches the third filter to 1.
For more comprehensive coverage of vectorisation, see the "Vectorization" page in the MATLAB documentation.
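A compact variant of the same logical-filter idea (a sketch, not from either answer; fooVec is a hypothetical name): because comparisons return 0/1 arrays of the same size as t, the three segments of foo can be added together element-wise.

```matlab
% Each comparison yields a logical 0/1 array the same size as t,
% so the three segments of foo can be combined arithmetically.
fooVec = @(t) (mod(t, 2) >= 0.2) + ...
    (mod(t, 2) >= 0.1 & mod(t, 2) < 0.2) .* (10*mod(t, 2) - 1);

v = fooVec([0.05 0.1 0.15 0.2 2.15]);   % approximately [0 0 0.5 1 0.5]
```

This recomputes mod(t, 2) three times, so for large inputs it may be worth storing it in a variable inside a regular function file instead.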
A simple approach that minimizes the number of operations is:
function v = foo(t)
t = mod(t, 2);
v = ones(size(t)) .* (t > 0.1);
v(t > 0.1 & t < 0.2) = 10*t(t > 0.1 & t < 0.2) - 1;
end
If the vectors are large, it might be faster to compute ind = t > 0.1 & t < 0.2 once and use it on both sides of the last assignment, so the comparisons run only once. The multiplication could also be replaced by an extra line with logical indices.
I repeatedly hit the same problem, thus I was looking for a more generic solution and came up with this:
%your function definition
c={@(t)(mod(t,2))<0.1,0,...
@(t)(mod(t,2))<0.2,@(t)(10*mod(t,2) - 1),...
true,1};
%call pw which returns the function
foo=pw(c{:});
%example evaluation
foo([0.1:0.05:0.2])
Now the code for pw
function f=pw(varargin)
for ip=1:numel(varargin)
switch class(varargin{ip})
case {'double','logical'}
varargin{ip}=@(x)(repmat(varargin{ip},size(x)));
case 'function_handle'
%do nothing
otherwise
error('wrong input class')
end
end
c=struct('cnd',varargin(1:2:end),'fcn',varargin(2:2:end));
f=@(x)pweval(x,c);
end
function y=pweval(x,p)
todo=true(size(x));
y=x.*0;
for segment=1:numel(p)
mask=todo;
mask(mask)=logical(p(segment).cnd(x(mask)));
y(mask)=p(segment).fcn(x(mask));
todo(mask)=false;
end
assert(~any(todo));
end

Integration via trapezoidal sums in MATLAB

I need help finding an integral of a function using trapezoidal sums.
The program should take successive trapezoidal sums with n = 1, 2, 3, ...
subintervals until there are two neighbouring values of n whose sums differ by less than a given tolerance. I want at least one FOR loop within a WHILE loop and I don't want to use the trapz function. The program takes four inputs:
f: A function handle for a function of x.
a: A real number.
b: A real number larger than a.
tolerance: A real number that is positive and very small
The problem I have is trying to implement the formula for trapezoidal sums, which is:
$$\frac{\Delta x}{2}\left[y_0 + 2y_1 + 2y_2 + \cdots + 2y_{n-1} + y_n\right]$$
Here is my code; the part I'm stuck on is the "sum" term within the FOR loop. I'm trying to sum up 2y_2 + 2y_3 + ... + 2y_{n-1}, since I already accounted for 2y_1. I get an answer, but it isn't as accurate as it should be. For example, I get 6.071717974723753 instead of 6.101605982576467.
Thanks for any help!
function t=trapintegral(f,a,b,tol)
format compact; format long;
syms x;
oldtrap = ((b-a)/2)*(f(a)+f(b));
n = 2;
h = (b-a)/n;
newtrap = (h/2)*(f(a)+(2*f(a+h))+f(b));
while (abs(newtrap-oldtrap)>=tol)
oldtrap = newtrap;
for i=[3:n]
dx = (b-a)/n;
trapezoidsum = (dx/2)*(f(x) + (2*sum(f(a+(3:n-1))))+f(b));
newtrap = trapezoidsum;
end
end
t = newtrap;
end
This code isn't working because there are two slight errors in your summation for the trapezoidal rule. Specifically, I am referring to this statement:
trapezoidsum = (dx/2)*(f(x) + (2*sum(f(a+(3:n-1))))+f(b));
Recall the equation for the trapezoidal integration rule:
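$$\int_a^b f(x)\,dx \approx \frac{b-a}{2N}\left[f(x_1) + 2f(x_2) + 2f(x_3) + \cdots + 2f(x_N) + f(x_{N+1})\right]$$
Source: Wikipedia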
For the first error, f(x) should be f(a), as you are including the starting point, and it shouldn't be left symbolic. In fact, you should simply get rid of the syms x statement, as it is not useful in your script. a corresponds to x_1 in the above equation.
The next error is in the second term. You actually need to multiply your index values (3:n-1) by dx. Also, the indices should go from (1:n-1), as I'll explain below. The equation above goes from 2 to N, but for our purposes we are going to go from 1 to N-1, because that is how your code is set up.
Remember, in the trapezoidal rule you are subdividing the finite interval into n pieces. The ith point is defined as:
x_i = a + dx*i
where i goes from 1 up to N-1. Note that this starts at 1 and not 3. The reason is that the first point (i = 0, which is a itself) is already taken into account by f(a), and we only count up to N-1 because point N is b, which is accounted for by f(b). In the equation, the interior terms run from 2 to N; shifting the index down by one gives exactly the 1 to N-1 used here.
Therefore, your statement actually needs to be:
trapezoidsum = (dx/2)*(f(a) + (2*sum(f(a+dx*(1:n-1))))+f(b));
Try this and let me know if you get the right answer. FWIW, MATLAB already implements trapezoidal integration with trapz, as @ADonda already pointed out. However, you need to set up your x and y values properly first: compute dx beforehand, then calculate your x points using the x_i equation that I specified above, then use these to generate your y values. You then use trapz to calculate the area:
dx = (b-a) / n;
x = a + dx*(0:n);
y = f(x);
trapezoidsum = trapz(x,y);
You can use the above code as a reference to see if you are implementing the trapezoidal rule correctly. Your implementation and using the above code should generate the same results. All you have to do is change the value of n, then run this code to generate the approximation of the area for different subdivisions underneath your curve.
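A minimal check along those lines, assuming the integrand from the example later in this answer: for a few values of n it evaluates the corrected summation and trapz on the same nodes and prints both.

```matlab
f = @(x) (x + x.^2).^(1/3);     % integrand from the example later in this answer
a = 1; b = 4;
maxDiff = 0;
for n = [1 2 5 10 50]
    dx = (b - a)/n;
    % corrected statement; for n = 1 the interior sum is over an empty set (0)
    manual = (dx/2)*(f(a) + 2*sum(f(a + dx*(1:n-1))) + f(b));
    x = a + dx*(0:n);           % same n+1 nodes, passed to trapz
    viaTrapz = trapz(x, f(x));
    maxDiff = max(maxDiff, abs(manual - viaTrapz));
    fprintf('n = %2d: manual = %.12f, trapz = %.12f\n', n, manual, viaTrapz);
end
```

The two columns should agree to rounding error for every n.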
Edit - August 17th, 2014
I figured out why your code isn't working. Here are the reasons why:
The for loop is unnecessary. Take a look at the for loop iteration. You have a loop going from i = [3:n] yet you don't reference the i variable at all in your loop. As such, you don't need this at all.
You are not computing successive intervals properly. After you compute the trapezoidal sum for the nth subinterval, you need to increment n and compute the trapezoidal rule again. n is never incremented in your while loop, which is why your area never improves.
You need to save the previous area inside the while loop; then, when you compute the next area, you check whether the difference between the two areas is less than the tolerance. We can also get rid of the code at the beginning that tries to compute the area for n = 2. That's not needed, as we can place this inside your while loop. As such, this is what your code should look like:
function t=trapintegral(f,a,b,tol)
format long; %// Got rid of format compact. Useless
%// n starts at 2 - Also removed syms x - Useless statement
n = 2;
newtrap = ((b-a)/2)*(f(a) + f(b)); %// Initialize
oldtrap = 0; %// Initialize to 0
while (abs(newtrap-oldtrap)>=tol)
oldtrap = newtrap; %//Save the old area from the previous iteration
dx = (b-a)/n; %//Compute width
%//Determine sum
trapezoidsum = (dx/2)*(f(a) + (2*sum(f(a+dx*(1:n-1))))+f(b));
newtrap = trapezoidsum; % //This is the new sum
n = n + 1; % //Go to the next value of n
end
t = newtrap;
end
Running this code, this is what I get:
trapezoidsum = trapintegral(@(x) (x+x.^2).^(1/3),1,4,0.00001)
trapezoidsum =
6.111776299189033
Caveat
Look at the way I defined your function. You must use element-by-element operations, because f is called with a whole vector of points inside sum. Take a look at the ^ operations specifically: you need to prepend a dot to them (.^). Once you do this, I get the right answer.
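A short sketch of why the dot matters, with hypothetical names fBad and fGood: the matrix power ^ fails as soon as f receives a row vector, while .^ works element by element.

```matlab
fBad  = @(x) (x + x^2)^(1/3);      % matrix power: fails for a 1xN vector input
fGood = @(x) (x + x.^2).^(1/3);    % element-wise power: works for any size
xs = [1 2 3];

try
    fBad(xs);
    badWorked = true;
catch
    badWorked = false;             % x^2 needs a square matrix, so this branch runs
end

yGood = fGood(xs);                 % one value per element of xs
```

Both versions give the same result for a scalar, which is why the problem only shows up once f is called with the vector of interior points.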
Edit #2 - August 18th, 2014
You said you want at least one for loop. This is highly inefficient, and whoever specified having one for loop in the code really doesn't know how MATLAB works. Nevertheless, you can use the for loop to accumulate the sum term. As such:
function t=trapintegral(f,a,b,tol)
format long; %// Got rid of format compact. Useless
%// n starts at 2 - Also removed syms x - Useless statement
n = 2;
%// The n = 1 trapezoid is computed first, then we keep refining
%// until successive sums differ by less than the tolerance
newtrap = ((b-a)/2)*(f(a) + f(b)); %// Initialize with n = 1
oldtrap = 0; %// Initialize to 0
while (abs(newtrap-oldtrap)>=tol)
oldtrap = newtrap; %//Save the old area from the previous iteration
dx = (b-a)/n; %//Compute width
%//Determine sum
%// Initialize
trapezoidsum = (dx/2)*(f(a) + f(b));
%// Accumulate sum terms
%// Note that we multiply each term by (dx/2), but because of the
%// factor of 2 for each of these terms, these cancel and we thus have dx
for n2 = 1 : n-1
trapezoidsum = trapezoidsum + dx*f(a + dx*n2);
end
newtrap = trapezoidsum; % //This is the new sum
n = n + 1; % //Go to the next value of n
end
t = newtrap;
end
Good luck!

Mutual Information of MATLAB Matrix

I have a square matrix that represents the frequency counts of co-occurrences in a data set. In other words, the rows represent all possible observations of feature 1, and the columns are the possible observations of feature 2. The number in cell (x, y) is the number of times feature 1 was observed to be x at the same time feature 2 was y.
I want to calculate the mutual information contained in this matrix. MATLAB has a built-in information function, but it takes 2 arguments, one for x and one for y. How would I manipulate this matrix to get the arguments it expects?
Alternatively, I wrote my own mutual information function that takes a matrix, but I'm unsure about its accuracy. Does it look right?
function [mutualinfo] = mutualInformation(counts)
total = sum(counts(:));
pX = sum(counts, 1) ./ total;
pY = sum(counts) ./ total;
pXY = counts ./ total;
[h, w] = size(counts);
mutualinfo = 0;
for row = 1:h
for col = 1:w
mutualinfo = mutualinfo + pXY(row, col) * log(pXY(row, col) / (pX(row)*pY(col)));
end;
end;
end
I don't know of any built-in mutual information functions in MATLAB. Perhaps you got a hold of one of the submissions from the MathWorks File Exchange or some other third-party developer code?
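Your pX and pY are computed identically: sum(counts) sums down the columns by default, which is the same as sum(counts, 1), so both give the column marginal. Since the rows index feature 1, pX should instead sum across each row, i.e. sum(counts, 2). Plus, you can vectorize your operations instead of using for loops. Here's another version of your function to try out: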
function mutualInfo = mutualInformation(counts)
pXY = counts./sum(counts(:));
pX = sum(pXY,2);
pY = sum(pXY,1);
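pXpY = pX*pY;                        % outer product: joint distribution under independence
nz = pXY > 0;                        % empty cells contribute 0 (0*log(0) is taken as 0)
mutualInfo = sum(pXY(nz).*log(pXY(nz)./pXpY(nz)));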
end
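A quick sanity check of the vectorized formula, written with hypothetical anonymous functions (miFromJoint, mutualInfoAnon) so it runs as a plain script. It sums only over nonzero cells, so empty cells do not turn the result into NaN. Two perfectly dependent binary features should give log(2) nats (1 bit), and a table whose joint distribution equals the product of its marginals should give 0.

```matlab
% Natural log, as in the functions above, so results are in nats
miFromJoint = @(pXY, pXpY) sum(pXY(pXY > 0) .* log(pXY(pXY > 0) ./ pXpY(pXY > 0)));
% pX*pY written as (row sums * column sums) / total^2
mutualInfoAnon = @(c) miFromJoint(c ./ sum(c(:)), (sum(c, 2) * sum(c, 1)) ./ sum(c(:))^2);

miDependent   = mutualInfoAnon([10 0; 0 10]);   % one feature determines the other
miIndependent = mutualInfoAnon([1 2; 2 4]);     % joint = product of marginals
```

Replace log with log2 if you want the result in bits.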