GoodnessOfFit.StandardError wrong answer - mathnet-numerics

Why am I getting the wrong answer (err2) from GoodnessOfFit.StandardError? In the code below, I do the computation myself and get the right answer (err3). I get the right answer from GoodnessOfFit.RSquared. Note: esttime and phrf are double[]. Length is 63.
Tuple<double, double> p = Fit.Line(esttime, phrf);
double ss = 0.0;
for (int j = 0; j < esttime.Length; j++)
{
est[j] = p.Item1 + p.Item2 * esttime[j];
ss += Math.Pow(est[j] - phrf[j], 2);
}
double err2 = GoodnessOfFit.StandardError(est, phrf, phrf.Length - 2);
Console.WriteLine(err2.ToString()); //writes 70.91 which is wrong
double err3 = Math.Sqrt(ss / est.Length - 2);
Console.WriteLine(err3.ToString()); // writes 12.56 which is correct

Answer: The third argument, Degrees of Freedom, is actually the number of degrees of freedom lost in the regression. So in the example it should be 2 and not phrf.Length - 2. Even so, it does not match my calculation, and Excel's, exactly.

I get the exact same result with excel and Math.NET for a polynomial fit when I set the degrees of freedom to be the order of the polynomial fit + 1. So, in the following exemple:
double[] paramsPoly = MathNet.Numerics.Fit.Polynomial(X.ToArray(),Y.ToArray(),6);
The StandardError function must receive 6 + 1 as it's last parameter:
double sey = MathNet.Numerics.GoodnessOfFit.StandardError(Yest, Y, 7);

Related

Function fzero in Matlab is not converging

I am solving a problem in my Macroeconomics class. Consider the following equation:
Here, k is fixed and c(k) was defined through the ```interp1''' function in Matlab. Here is my code:
beta = 0.98;
delta = 0.13;
A = 2;
alpha = 1/3;
n_grid = 1000; % Number of points for capital
k_grid = linspace(5, 15, n_grid)';
tol = 1e-5;
max_it = 1000;
c0 = ones(n_grid, 1);
new_k = zeros(n_grid, 1);
dist_c = tol + 1;
it_c = 0;
while dist_c > tol && it_c < max_it
c_handle = #(k_tomorrow) interp1(k_grid, c0, k_tomorrow, 'linear', 'extrap');
for i=1:n_grid
% Solve for k'
euler = #(k_tomorrow) (1/((1-delta)* k_grid(i) + A * k_grid(i)^alpha - k_tomorrow)) - beta*(1-delta + alpha*A*k_tomorrow^(alpha - 1))/c_handle(k_prime);
new_k(i) = fzero(euler, k_grid(i)); % What's a good guess for fzero?
end
% Compute new values for consumption
new_c = A*k_grid.^alpha + (1-delta)*k_grid - new_k;
% Check convergence
dist_c = norm(new_c - c0);
c0 = new_c;
it_c = it_c + 1;
end
When I run this code, for some indexes $i$, it runs fine and fzero can find the solution. But for indexes it just returns NaN and exits without finding the root. This is a somewhat well-behaved problem in Economics and the solution we are looking indeed exists and the algorithm I tried to implement is guaranteed to work. But I don't have much experience with solving this in MATLAB and I guess I have a silly mistake somewhere. Any ideas on how to procede?
This is the typical error message:
Exiting fzero: aborting search for an interval containing a sign change
because complex function value encountered during search.
(Function value at -2.61092 is 0.74278-0.30449i.)
Check function or try again with a different starting value.
Thanks a lot in advance!
The only term that can produce complex numbers is
k'^(alpha - 1) = k'^(-2/3)
You probably want the result according to the real variant of the cube root, which you could get as
sign(k') * abs(k')^(-2/3)
or more generally and avoiding divisions by zero
k' * (1e-16+abs(k'))^(alpha - 2)

Summing very large numbers without using toolboxes

I am trying to sum very large numbers in MATLAB, such as e^800 and e^1000 and obtain an answer.
I know that in Double-Precision, the largest number I can represent is 1.8 * 10^308, otherwise I get Inf, which I am getting when trying to sum these numbers.
My question is, how do I go about estimating an answer for sums of very, very large numbers like these without using vpa, or some other toolbox?
Should I use strings? It is possible to do this using logs? Can I represent the floats as m x 2^E and if so, how do I take a number such as e^700 and convert it to that? If the number is larger than the threshold for Inf, should I divide it by two, and store it in two different variables?
For example, how would I obtain an approximate answer for:
e^700 + e^800 + e^900 + e^1000 ?
A possible approximation is to use the rounded values of these numbers (I personally used Wolfram|Alpha), then perform "long addition" as they teach in elementary school:
function sumStr = q57847408()
% store rounded values as string:
e700r = "10142320547350045094553295952312676152046795722430733487805362812493517025075236830454816031618297136953899163768858065865979600395888785678282243008887402599998988678389656623693619501668117889366505232839133350791146179734135738674857067797623379884901489612849999201100199130430066930357357609994944589";
e800r = "272637457211256656736477954636726975796659226578982795071066647118106329569950664167039352195586786006860427256761029240367497446044798868927677691427770056726553709171916768600252121000026950958713667265709829230666049302755903290190813628112360876270335261689183230096592218807453604259932239625718007773351636778976141601237086887204646030033802";
e900r = "7328814222307421705188664731793809962200803372470257400807463551580529988383143818044446210332341895120636693403927733397752413275206079839254190792861282973356634441244426690921723184222561912289431824879574706220963893719030715472100992004193705579194389741613195142957118770070062108395593116134031340597082860041712861324644992840377291211724061562384383156190256314590053986874606962229";
e1000r = "197007111401704699388887935224332312531693798532384578995280299138506385078244119347497807656302688993096381798752022693598298173054461289923262783660152825232320535169584566756192271567602788071422466826314006855168508653497941660316045367817938092905299728580132869945856470286534375900456564355589156220422320260518826112288638358372248724725214506150418881937494100871264232248436315760560377439930623959705844189509050047074217568";
% pad to the same length with zeros on the left:
padded = pad([e700r; e800r; e900r; e1000r], 'left', '0');
% convert the padded value to an array of digits:
dig = uint8(char(padded) - '0');
% some helpful computations for later:
colSum = [0 uint8(sum(dig, 1))]; % extra 0 is to prevent overflow of MSB
remainder = mod(colSum, 10);
carry = idivide(colSum, 10, 'floor');
while any(carry) % can also be a 'for' loop with nDigit iterations (at most)
result = remainder + circshift(carry, -1);
remainder = mod(result, 10);
carry = idivide(result, 10, 'floor');
end
% remove leading zero (at most one):
if ~result(1)
result = result(2:end);
end
% convert result back to string:
sumStr = string(char(result + '0'));
This gives the (rounded) result of:
197007111401704699388887935224332312531693805861198801302702004327171116872054081548301452764017301057216669857236647803717912876737392925607579016038517631441936559738211677036898431095605804172455718237264052427496060405708350697523284591075347592055157466708515626775854212347372496361426842057599220506613838622595904885345364347680768544809390466197511254544019946918140384750254735105245290662192955421993462796807599177706158188
Typos fixed from before.
Decimal Approximation:
function [m, new_exponent] = base10_mantissa_exponent(base, exponent)
exact_exp = exponent*log10(abs(base));
new_exponent = floor(exact_exp);
m = power(10, exact_exp - new_exponent);
end
So the value e600 would become 3.7731 * 10260.
And the value 117150 would become 1.6899 * 10310.
To add these two values together, I took the difference between the two exponents and divided the mantissa of the smaller term by it. Then it's just as simple as adding the mantissas together.
mantissaA = 3.7731;
exponentA = 260;
mantissaB = 1.6899;
exponentB = 310;
diff = abs(exponentA - exponentB);
if exponentA < exponentB
mantissaA = mantissaA / (10^diff);
finalExponent = exponentB;
elseif exponentB < exponentA
mantissaB = mantissaB / (10^diff);
finalExponent = exponentA;
end
finalMantissa = mantissaA + mantissaB;
This was important for me as I was performing sums such as:
(Σ ex) / (Σ xex)
From x=1 to x=1000.

Matlab: binomial simulation

How do i simulate a binomial distribution with values for investment with two stocks acme and widget?
Number of trials is 1000
invest in each stock for 5 years
This is my code. What am I doing wrong?
nyears = 5;
ntrials = 1000;
startamount = 100;
yrdeposit = 50;
acme = zeros(nyears, 1);
widget = zeros(nyears,1);
v5 = zeros(ntrials*5, 1);
v5 = zeros(ntrials*5, 1);
%market change between -5 to 1%
marketchangeacme = (-5+(1+5)*rand(nyears,1));
marketchangewidget = (-3+(3+3)*rand(nyears,1));
acme(1) = startamount;
widget(1) = startamount;
for m=1:numTrials
for n=1:nyears
acme(n) = acme(n-1) + (yrdeposit * (marketchangeacme(n)));
widget(n) = acme(n-1) + (yrdeposit * (marketchangewidget(n)));
vacme5(i) = acme(j);
vwidget5(i) = widget(j);
end
theMean(m) = mean(1:n*nyears);
p = 0.5 % prob neg return
acmedrop = (marketchangeacme < p)
widgetdrop = (marketchangewidget <p)
end
plot(mean)
Exactly what you are trying to calculate is not clear. However some things that are obviously wrong with the code are:
widget(n) presumable isn't a function of acme(n-1) but rather 'widget(n-1)`
Every entry of theMean will be mean(1:nyears*nyears), which for nyears=5 will be 13. (This is because n=nyears always at that point in code.)
The probability of a negative return for acme is 5/6, not 0.5.
To find the locations of the negative returns you want acmedrop = (marketchangeacme < 0); not < 0.5 (nor any other probability). Similarly for widgetdrop.
You are not preallocating vacme5 nor vwidget5 (but you do preallocate v5 twice, and then never use it.
You don't create a variable called mean (and you never should) so plot(mean) will not work.

What's the difference between these two codes to get the sum of infinite series in MATLAB?

1 + 1/(2^4) + 1/(3^4) + 1/(4^4) + ...
This is the infinite series that I'd like to get the sum value. So I wrote this code in MATLAB.
n = 1;
numToAdd = 1;
sum = 0;
while numToAdd > 0
numToAdd = n^(-4);
sum = sum + numToAdd;
n = n + 1;
end
disp(sum);
But I couldn't get the result because this code occurred an infinite loop. However, the code I write underneath -- it worked well. It took only a second.
n = 1;
oldsum = -1;
newsum = 0;
while newsum > oldsum
oldsum = newsum;
newsum = newsum + n^(-4);
n = n+1;
end
disp(newsum);
I read these codes again and googled for a while, but coudln't find out the critical point. What makes the difference between these two codes? Is it a matter of precision of double in MATLAB?
The first version would have to go down to the minimum value for a double ~10^-308, while the second will only need to go down to the machine epsilon ~10^-16. The epsilon value is the largest value x such that 1+x = 1.
This means the first version will need approximately 10^77 iterations, while the second only needs 10^4.
The problem boils down to this:
x = 1.23456789; % Some random number
xEqualsXPlusEps = (x == x + 1e-20)
ZeroEqualsEps = (0 == 1e-20)
xEqualsXPlusEps will be true, while ZeroEqualsEps is false. This is due to the way floating point arithmetic works. The value 1e-20 is smaller than the least significant bit of x, so x+1e-20 won't be larger than x. However 1e-20 is not considered equal to 0. In comparison to x, 1e-20 is relatively small, whereas in comparison to 0, 1e-20 is not small at all.
To fix this problem you would have to use:
while numToAdd > tolerance %// Instead of > 0
where tolerance is some small number greater than zero.

Removing duplicate entries in a vector, when entries are complex and rounding errors are causing problems

I want to remove duplicate entries from a vector on Matlab. The problem I'm having is that rounding errors are stopping the inbuilt Matlab function 'unique' from working properly. Ideally I'd like a way to set some sort of tolerance on the 'unique' function, or a small procedure that will remove the duplicates otherwise. If both the real and imaginary parts of two entries differ by less than 0.0001, then I'm happy to consider them equal. How can I do this?
Any help will be greatly appreciated. Thanks
A simple approximation would be to round the numbers and the use the indices returned by unique:
X = ... (input vector)
[b, i] = unique(round(X / (tolerance * (1 + i))));
output = X(i);
(you can probably replace b with ~ depending on your Matlab version).
it won't quite have the behaviour you want, since it is possible that two numbers are very close but will be rounded differently. I think you could mitigate this by doing:
X = ... (input vector)
[b, ind] = unique(round(X / (tolerance * (1 + i))));
X = X(ind);
[b, ind] = unique(round(X / (tolerance * (1 + i)) + 0.5 * (1 + i)));
X = X(ind);
This will round them twice, so any numbers that are exactly on a rounding boundary will be caught by the second unique.
There is still some messiness in this - some numbers will be affected as though the tolerance was doubled. But it might be sufficient for your needs.
The alternative is probably a for loop:
X = sort(X);
last = X(1);
indices = ones(numel(X), 1);
for j=2:numel(X)
if X(j) > last + tolerance * (1 + i)
last = X(j) + tolerance * (1 + i) / 2;
else
indices(j) = 0;
end
end
X = X(logical(indices));
I think this has the best behaviour you can expect (because you want to represent the vector by as few unique values as possible - when there are lots of numbers that differ by less than the tolerance level, there may be multiple ways of splitting them. This algorithm does so greedily, starting with the smallest).
I'm almost certain the below ill always assume any values closer than 1e-8 are equal. Simply replace 1e-8 with whatever value you want.
% unique function that assumes 1e-8 is equal
function [out, I] = unique(input, first_last)
threshold = 1e-8;
if nargin < 2
first_last = 'last';
end
[out, I] = sort(input);
db = diff(out);
k = find(abs(db) < threshold);
if strcmpi(first_last, 'last')
k2 = min(I(k), I(k+1));
elseif strcmpi(first_last, 'first')
k2 = max(I(k), I(k+1));
else
error('unknown flag option for unique, must be first or last');
end
k3 = true(1, length(input));
k3(k2) = false;
out = out(k3(I));
I = I(k3(I));
end
The following might serve your purposes. Given X, an array of complex doubles, it sorts them, then checks whether the absolute value differences between elements is within the complex tolerance, real_tol and imag_tol. It removes elements that satisfy this tolerance.
function X_unique = unique_complex_with_tolerance(X,real_tol,imag_tol)
X_sorted = sort(X); %Sorts by magnitude first, then imaginary part.
dX_sorted = diff(X_sorted);
dX_sorted_real = real(dX_sorted);
dX_sorted_imag = imag(dX_sorted);
remove_idx = (abs(dX_sorted_real)<real_tol) & (abs(dX_sorted_imag)<imag_tol);
X_unique = X_sorted;
X_unique(remove_idx) = [];
return
Note that this code will remove all elements which satisfy this difference tolerance. For example, if X = [1+i,2+2i,3+3i,4+4i], real_tol = 1.1, imag_tol = 1.1, then this function will return only one element, X_unique = [4+4i], even though you might consider, for example, X_unique = [1+i,4+4i] to also be a valid answer.