I am reading the statistics textbook Introduction to Statistics for Engineers by Sheldon Ross (p. 275) and trying to redo its examples on paper and in Octave. I cannot replicate many of the Bayes calculations in Octave when it comes to the integration step. How should I go about replicating the calculation below in Octave? It is a simple Bayes estimator example that naturally becomes a symbolic integration problem, which is where I often run into difficulty in Octave.
[Clarification: This calculation is from a textbook and I understand it by hand. What I don't understand is how one should approach such statistical computing exercises in practice. This question relates to statistical/scientific computing, not coding or statistics per se.]
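Suppose $X_1, \dots, X_n$ are independent Bernoulli variables, each having pmf
$$f(x \mid p) = p^{x}(1-p)^{1-x}, \qquad x = 0, 1,$$
where $p$ is the unknown parameter, taken to be uniformly distributed on $(0, 1)$.
Compute the Bayes estimator for p.
We know that
$$f(x_1, \dots, x_n \mid p) = p^{\sum_i x_i}(1-p)^{n - \sum_i x_i}.$$
The conditional pdf of p given X is then
$$f(p \mid x_1, \dots, x_n) = \frac{p^{\sum_i x_i}(1-p)^{n - \sum_i x_i}}{\int_0^1 p^{\sum_i x_i}(1-p)^{n - \sum_i x_i}\,dp}.$$
It can be shown that
$$\int_0^1 p^{m}(1-p)^{r}\,dp = \frac{m!\,r!}{(m+r+1)!} \qquad (1)$$
Using (1) and letting $x = \sum_i x_i$, the conditional pdf becomes
$$f(p \mid x_1, \dots, x_n) = \frac{(n+1)!}{x!\,(n-x)!}\,p^{x}(1-p)^{n-x}.$$
Recall that the Bayes estimator is $E[p \mid X_1, \dots, X_n]$.
Therefore, the Bayes estimator for p is:
$$E[p \mid x_1, \dots, x_n] = \frac{(n+1)!}{x!\,(n-x)!}\int_0^1 p^{x+1}(1-p)^{n-x}\,dp = \frac{(n+1)!}{x!\,(n-x)!}\cdot\frac{(x+1)!\,(n-x)!}{(n+2)!} = \frac{x+1}{n+2}.$$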
I tried to replicate these steps in Octave as shown below and failed (the integration took 40 minutes on my $2500 Dell desktop). Can you show my confused soul how to do the above steps in Octave, Matlab or R and arrive at the same Bayes estimator?
#Use Octave to derive the above Bayes estimator
pkg load symbolic;
syms p n x;
f = (p^x) * (1-p)^(n-x);
F = int(f, p, [0, 1]); #integrate f, which gives the conditional pdf denominator
f_conditional = f/F; #the conditional pdf
integrand = p * f_conditional; # the integrand to derive Bayes estimator
estimator = int(integrand, p, [0, 1]);
#this integration takes forever, how else should I replicate the above in Octave?
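One way to sidestep the slow symbolic integral is to avoid integrating with symbolic exponents at all. The sketch below is in Python rather than Octave and rests on two assumptions: the prior on p is uniform on (0, 1) (which is what the integrand in the Octave code implies), and the Beta-integral identity, the integral of p^m (1-p)^r over [0, 1] equals m! r! / (m+r+1)!, holds for non-negative integers m and r. The helper names `beta_integral`, `bayes_estimator` and `midpoint_check` are made up for illustration.

```python
from fractions import Fraction
from math import factorial


def beta_integral(m, r):
    """Exact integral of p^m (1-p)^r over [0, 1] for non-negative integers m, r."""
    return Fraction(factorial(m) * factorial(r), factorial(m + r + 1))


def bayes_estimator(n, x):
    """Posterior mean of p under a uniform prior, given x successes in n trials."""
    # E[p | x] = (integral of p^(x+1) (1-p)^(n-x)) / (integral of p^x (1-p)^(n-x))
    return beta_integral(x + 1, n - x) / beta_integral(x, n - x)


def midpoint_check(n, x, steps=20_000):
    """Numerical cross-check of the same ratio with a simple midpoint rule."""
    h = 1.0 / steps
    num = den = 0.0
    for k in range(steps):
        p = (k + 0.5) * h
        w = p**x * (1.0 - p) ** (n - x)
        num += p * w
        den += w
    return num / den


n, x = 10, 3  # hypothetical data: 3 successes in 10 trials
print(bayes_estimator(n, x))  # 1/3, i.e. (x + 1) / (n + 2)
print(midpoint_check(n, x))   # floating-point approximation of the same value
```

For concrete n and x the exact ratio reduces to (x + 1) / (n + 2), which matches the hand derivation; the numerical check is only there to show that plain quadrature is also cheap once the exponents are numbers rather than symbols.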
Related
I am tasked with performing a prediction analysis. This requires a linear regression on several (~10) predictor variables, yielding a coefficient for each predictor and a constant.
So the final equation will be of the form y = c + c1x1 + c2x2 + c3x3 + ...
I know that the fitlm function in MATLAB's Statistics and Machine Learning Toolbox does this, but at this point I don't know whether we will be purchasing the toolbox. How do I perform linear regression without it?
You can use the closed form solution of linear least squares.
C=inv(transpose(X)*X)*transpose(X)*y
In the above, make the first column of X all ones and the following columns x1, x2, ..., with one row per observation.
C will contain the corresponding constants. The first entry in C is c.
From: https://www.mathworks.com/help/matlab/data_analysis/linear-regression.html
You can write your predictor variables as a matrix X using X = [ones(length(x1),1),x1,x2,x3,...,xn], formulate the response as the equation Y = XB, and solve it in the least-squares sense using mldivide, B = X\Y, to find your regression coefficients. This avoids forming an explicit matrix inverse.
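The same two routes (a least-squares solve and the normal equations) can be sketched in Python with NumPy. The data here are synthetic and made up for illustration; `np.linalg.lstsq` plays the role that `X\Y` plays in MATLAB.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (made up for illustration): y = 2 + 1.5*x1 - 0.7*x2 + noise
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.7 * x2 + 0.01 * rng.normal(size=n)

# Design matrix: one row per observation, first column all ones (the constant)
X = np.column_stack([np.ones(n), x1, x2])

# Least-squares solve, the NumPy counterpart of MATLAB's X\y
coef, residuals, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)

# Same coefficients from the normal equations, solved without an explicit inverse
coef_normal = np.linalg.solve(X.T @ X, X.T @ y)

print(coef)         # [c, c1, c2]
print(coef_normal)
```

Both routes agree on well-conditioned data; the least-squares solve is usually the safer default when predictors are nearly collinear.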
I am taking an Econometrics course, and have been trying to use Python rather than the proprietary STATA and EVIEWS that the assignments are set in.
In one of the questions, I have consumption data over time. I am asked to compute it in two ways.
The first way is calculating a model of the form consumption = Aexp(Bt), and the second way is to log both sides and do ordinary OLS on log(consumption) = alpha + Bt
I know how to do the second way. However, when I try to do the first way it goes wrong. Using statsmodels, I can exponentiate the time data (after normalising), but this calculates a regression of the form consumption = Aexp(t) + B, which is not what I want (I want to specify where the parameters go). In sklearn I could find a polynomial regression, but not an exponential one.
Then I found scipy.curve_fit
However this seems to have two problems:
(1) It seems to rely on initial guesses for parameters, which means my output will end up being different from proprietary software (whereas output for things like OLS is the same) [I assume initial guesses mean some iterative solution is used, which is helpful for very weird and wonderful functions, but I assume fairly standard results hold for exponential regression]
(2) every time I try to implement it, it just returns the guess parameters.
Here is my code
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

consumption_data = pd.read_csv(r"......\consumption.csv")

def func(x, a, b):
    return a * np.exp(b * x)

xdata = consumption_data.YEAR
ydata = consumption_data.CONSUMPTION
ydata = (ydata - 1948)/100
popt, pcov = curve_fit(func, xdata, ydata, (1,1))
print(popt)
plt.plot(xdata, func(xdata, *popt), 'g--',)
The scipy.optimize code is basically just copy-pasted from their tutorial
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
short answer: use statsmodels GLM
statsmodels does not have nonlinear least squares. The best python library for that is lmfit https://pypi.org/project/lmfit/
curve_fit, lmfit and nonlinear least squares algorithms in general find an iterative solution to the optimization problem. Even though we have to provide starting values, the solution is in many cases the same across packages up to the convergence tolerance, e.g. 1e-5 or 1e-6.
Many standard models in statistics and econometrics have a single global maximum with well behaved data. However, in other cases like mixture models, there might be many local optima and the estimation might converge to one of them.
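To illustrate the point about starting values, here is a minimal sketch with `scipy.optimize.curve_fit` on synthetic data (the numbers are made up and stand in for the consumption series). It assumes time has been rescaled to a small range so that `exp(b*t)` cannot overflow, and it takes starting values from the log-linear OLS fit; a second, cruder starting point is included for comparison.

```python
import numpy as np
from scipy.optimize import curve_fit


def exp_model(t, a, b):
    """consumption = a * exp(b * t)"""
    return a * np.exp(b * t)


rng = np.random.default_rng(1)

# Synthetic stand-in for the consumption series (values are made up)
year = np.arange(1948, 1998)
t = (year - year.min()) / 10.0  # rescale time so exp(b*t) stays moderate
consumption = 5.0 * np.exp(0.3 * t) + rng.normal(scale=0.05, size=t.size)

# Starting values from the log-linear OLS fit: log(y) = alpha + B*t
B0, alpha0 = np.polyfit(t, np.log(consumption), 1)
start = (np.exp(alpha0), B0)

popt, pcov = curve_fit(exp_model, t, consumption, p0=start)

# A second, cruder starting point for comparison
popt_alt, _ = curve_fit(exp_model, t, consumption, p0=(1.0, 0.1))

print(popt)
print(popt_alt)
```

On well-behaved data like this, both runs should land on the same optimum up to the solver tolerance, which is the sense in which results are comparable across packages.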
To the specific case:
consumption = A exp(B t)
can be rewritten as
consumption = exp(a + B t)
So this is just a single index model or a generalized linear model with an exponential mean function.
The general version has the expectation of the dependent variable as a nonlinear function of a linear combination of the explanatory variables:
E(y | x) = g(x b)
This can be estimated with statsmodels with GLM with family Gaussian and the log-link.
Aside: In econometrics, there is a literature on using Poisson quasi-likelihood as an estimator for exp models instead of taking the log of the dependent variable.
Poisson usually uses the log-link function as in the above.
However, using GLM allows us to use log-link, i.e. exponential mean function, with any of the supported distribution families. The main difference is in the underlying variance assumption. Gaussian assumes constant variance, Poisson assumes that the variance is proportional to the mean and Gamma assumes that the variance is quadratic in the mean.
If we use a robust sandwich covariance estimator for parameter inference, then standard errors and inference are correct even if the variance function is misspecified.
I'm using the Matlab function fitcsvm for training a SVM with a RBF kernel. I'm using the following calls:
SVMModel = fitcsvm(X_train,labels,'KernelFunction','rbf','KernelScale',0.2087,'BoxConstraint',2.8779);
[~,scores] = predict(SVMModel,X_test);
X_train is a NxD matrix with training data, labels is a Nx1 vector with the labels for the training data and X_test is a MxD matrix with test data points.
Now I would like to use custom kernels. To start with, I decided to try the RBF kernel. The implementation goes as follows:
SVMModel = fitcsvm(X_train,labels,'KernelFunction','rbfKernel','BoxConstraint', 2.8779);
[~,scores] = predict(SVMModel,X_test);
function K = rbfKernel(U,V)
sigma = 0.2087;
gamma = 1 ./ (2*(sigma ^2));
K = exp(-gamma .* pdist2(U,V,'euclidean').^2);
end
I stored the function rbfKernel in a rbfKernel.m file.
The result of the built-in kernel as well as the custom kernel is very similar and the fitcsvm method runs for both approaches very fast.
The problem is that the predict method is extremely slow when using the custom kernel. It takes around 1 minute, compared to 5 seconds with the built-in kernel.
Why is this? Is there a mistake I made?
It is probably due to code optimization. MATLAB engineers spend a lot of time optimizing their code, so my bet is that although your kernel computes the same thing as the built-in one, it does not run as fast as MATLAB's own implementation.
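For reference, the Gram matrix that the custom kernel in the question computes can be sketched in Python with NumPy as follows. This only mirrors the formula K = exp(-gamma * ||u - v||^2) with gamma = 1 / (2 * sigma^2) from the question; it says nothing about how fitcsvm evaluates its built-in kernel internally. The function name `rbf_kernel` and the test data are made up.

```python
import numpy as np


def rbf_kernel(U, V, sigma=0.2087):
    """Gram matrix K[i, j] = exp(-||U[i] - V[j]||^2 / (2 * sigma^2))."""
    gamma = 1.0 / (2.0 * sigma**2)
    # Squared Euclidean distances via ||u||^2 + ||v||^2 - 2 u.v
    sq_dists = (
        np.sum(U**2, axis=1)[:, None]
        + np.sum(V**2, axis=1)[None, :]
        - 2.0 * U @ V.T
    )
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from round-off
    return np.exp(-gamma * sq_dists)


rng = np.random.default_rng(3)
U = rng.normal(size=(5, 4))  # 5 points in 4 dimensions (made-up data)
V = rng.normal(size=(3, 4))
K = rbf_kernel(U, V)
print(K.shape)  # (5, 3)
```

Computing all pairwise distances in one matrix expression, rather than point by point, is the usual way to keep a user-supplied kernel from dominating prediction time.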
I would like to perform conditional simulations for Gaussian process (GP) models in Matlab. I have found a tutorial by Martin Kolář (http://mrmartin.net/?p=223).
sigma_f = 1.1251; %parameter of the squared exponential kernel
l = 0.90441; %parameter of the squared exponential kernel
kernel_function = @(x,x2) sigma_f^2*exp((x-x2)^2/(-2*l^2));
%This is one of many popular kernel functions, the squared exponential
%kernel. It favors smooth functions. (Here, it is defined as an anonymous
%function handle)
% we can also define an error function, which models the observation noise
sigma_n = 0.1; %known noise on observed data
error_function = @(x,x2) sigma_n^2*(x==x2);
%this is just iid gaussian noise with mean 0 and variance sigma_n^2
%kernel functions can be added together. Here, we add the error kernel to
%the squared exponential kernel
k = @(x,x2) kernel_function(x,x2)+error_function(x,x2);
X_o = [-1.5 -1 -0.75 -0.4 -0.3 0]';
Y_o = [-1.6 -1.3 -0.5 0 0.3 0.6]';
prediction_x=-2:0.01:1;
K = zeros(length(X_o));
for i=1:length(X_o)
for j=1:length(X_o)
K(i,j)=k(X_o(i),X_o(j));
end
end
%% Demo #5.2 Sample from the Gaussian Process posterior
clearvars -except k prediction_x K X_o Y_o
%We can also sample from this posterior, the same way as we sampled before:
K_ss=zeros(length(prediction_x),length(prediction_x));
for i=1:length(prediction_x)
for j=i:length(prediction_x)%We only calculate the top half of the matrix. This is an optional speedup trick
K_ss(i,j)=k(prediction_x(i),prediction_x(j));
end
end
K_ss=K_ss+triu(K_ss,1)'; % We can use the upper half of the matrix and copy it to the lower half
K_s=zeros(length(prediction_x),length(X_o));
for i=1:length(prediction_x)
for j=1:length(X_o)
K_s(i,j)=k(prediction_x(i),X_o(j));
end
end
[V,D]=eig(K_ss-K_s/K*K_s');
A=real(V*(D.^(1/2)));
for i=1:7
standard_random_vector = randn(length(A),1);
gaussian_process_sample(:,i) = A * standard_random_vector+K_s/K*Y_o;
end
hold on
plot(prediction_x,real(gaussian_process_sample))
set(plot(X_o,Y_o,'r.'),'MarkerSize',20)
The tutorial generates the conditional simulations using a direct simulation method based on covariance matrix decomposition. It is my understanding that there are several methods of generating conditional simulations that may be better when the number of simulation points is large such as conditioning by Kriging using a local neighborhood. I have found information regarding several methods in J.-P. Chilès and P. Delfiner, “Chapter 7 - Conditional Simulations,” in Geostatistics: Modeling Spatial Uncertainty, Second Edition, John Wiley & Sons, Inc., 2012, pp. 478–628.
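The direct method based on covariance matrix decomposition can be sketched compactly in Python with NumPy. This is a simplified variant of the tutorial code under stated assumptions, not a port of it: kernels are evaluated in vectorised form, the observation noise is added only to the covariance of the observed points (so the draws are of the latent function), the prediction grid is coarser, and a symmetric eigendecomposition with negative eigenvalues clipped to zero replaces `eig` followed by `real`.

```python
import numpy as np

sigma_f, length, sigma_n = 1.1251, 0.90441, 0.1  # values from the tutorial code


def sq_exp(a, b):
    """Squared exponential covariance between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return sigma_f**2 * np.exp(-(d**2) / (2.0 * length**2))


X_o = np.array([-1.5, -1.0, -0.75, -0.4, -0.3, 0.0])
Y_o = np.array([-1.6, -1.3, -0.5, 0.0, 0.3, 0.6])
x_s = np.linspace(-2.0, 1.0, 61)  # coarser grid than the tutorial, for brevity

K = sq_exp(X_o, X_o) + sigma_n**2 * np.eye(X_o.size)  # noisy observations
K_s = sq_exp(x_s, X_o)
K_ss = sq_exp(x_s, x_s)

# Posterior (conditional) mean and covariance at the prediction points
post_mean = K_s @ np.linalg.solve(K, Y_o)
post_cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
post_cov = 0.5 * (post_cov + post_cov.T)  # remove round-off asymmetry

# Factor the covariance as A @ A.T, clipping round-off negative eigenvalues
w, V = np.linalg.eigh(post_cov)
A = V * np.sqrt(np.clip(w, 0.0, None))

rng = np.random.default_rng(4)
samples = post_mean[:, None] + A @ rng.standard_normal((x_s.size, 7))
print(samples.shape)  # (61, 7)
```

The factorisation step costs on the order of the cube of the number of simulation points, which is why this direct method becomes impractical for large grids.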
Is there an existing Matlab toolbox that can be used for conditional simulations? I am aware of DACE, GPML, and mGstat (http://mgstat.sourceforge.net/). I believe only mGstat offers the capability to perform conditional simulations. However, mGstat also seems to be limited to only 3D models and I am interested in higher dimensional models.
Can anybody offer any advice on getting started performing conditional simulations with an existing toolbox such as GPML?
===================================================================
EDIT
I have found a few more Matlab toolboxes: STK, ScalaGauss, ooDACE
It appears STK is capable of conditional simulations using covariance matrix decomposition. However, it is limited to a moderate number (maybe a few thousand?) of simulation points due to the Cholesky factorization.
I used the STK toolbox and I recommend it for others:
http://kriging.sourceforge.net/htmldoc/
I found that if you need conditional simulations at a large number of points then you might consider generating a conditional simulation at the points in a large design of experiment (DoE) and then simply relying on the mean prediction conditional on that DoE.
This question has confused me for several days. I asked senior students, but they could not answer it either.
We have ten ODEs, and a noise term should be added to each of them. The noise is defined as follows. Since I cannot upload a picture, the formulas below may not be very clear. To follow them, you can either read my explanation or go to this address: Plos one. The description of the equations is directly above the Supporting Information section on that page.
The white noise term epsilon_i(t) is assumed to be Gaussian; epsilon_i(t) is the value of the noise for equation i at time t.
The autocorrelation of the noise is given by:
(EQ.1)
where delta(t) is the Dirac delta function and the diffusion matrix D is defined by
(EQ.2)
Our problem is how to interpret the Dirac delta function in the diffusion matrix. Since the Dirac delta function satisfies delta(0) = Inf and delta(t) = 0 for t neq 0, we don't know how to calculate epsilon if we take the square root of 2D(x, t)delta(t-t'). So we simply assumed that delta(0) = 1 and delta(t) = 0 for t neq 0, but we don't know whether this is right. Could you please tell me how to handle the delta function of the diffusion equation in MATLAB?
This question is related to stochastic processes in MATLAB, so we reviewed several of them for ideas. In MATLAB, the increments of a Wiener process are often defined as a = sqrt(dt) * randn(1, N), where N is the number of steps and dt is the step length. Correspondingly, the Brownian motion can be defined as b = cumsum(a). However, these do not cover a white noise process whose autocorrelation is constrained by the matrix D.
We then considered simply using randn(1, 10) to generate a vector representing the noise. However, since the noise must satisfy equation (2), this does not give the noise terms in different equations the predefined partial correlation (D_ij). We then tried mvnrnd to draw from a multivariate normal distribution at each time step. Unfortunately, mvnrnd in MATLAB returns a matrix, whereas we need a vector of length 10.
We are rather confused, so could you please shed some light on this? Thanks so much!
NOTE: I see two hazy questions in here: 1) how to deal with a stochastic term in a DE and 2) how to deal with a delta function in a DE. Both of these are math related questions and http://www.math.stackexchange.com will be a better place for this. If you had a question pertaining to MATLAB, I haven't been able to pin it down, and you should perhaps add code examples to better illustrate your point. That said, I'll answer the two questions briefly, just to put you on the right track.
What you have here are not ODEs, but Stochastic differential equations (SDE). I'm not sure how you're using MATLAB to work with this, but routines like ode45 or ode23 will not be of any help. For SDEs, your usual mathematical tools of separation of variables/method of characteristics etc don't work and you'll need to use Itô calculus and Itô integrals to work with them. The solutions, as you might have guessed, will be stochastic. To learn more about SDEs and working with them, you can consider Stochastic Differential Equations: An Introduction with Applications by Bernt Øksendal and for numerical solutions, Numerical Solution of Stochastic Differential Equations by Peter E. Kloeden and Eckhard Platen.
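As a rough illustration of the numerical side, here is a minimal Euler-Maruyama sketch in Python with NumPy. Everything specific in it is an assumption: the noise autocorrelation is taken to have the form 2 D_ij delta(t - t') with a constant matrix D, the equations are read in the Itô sense, and the drift, the 3-by-3 D and the step size are placeholders rather than the system from the question. Under those assumptions the delta function is never evaluated pointwise; integrated over one step of length dt, the noise is a Gaussian vector with covariance 2 D dt, which can be drawn with a Cholesky factor.

```python
import numpy as np

rng = np.random.default_rng(5)


def drift(x):
    """Hypothetical drift: linear relaxation toward zero."""
    return -x


# Hypothetical diffusion matrix, assumed symmetric positive definite
D = np.array([[0.10, 0.03, 0.00],
              [0.03, 0.20, 0.05],
              [0.00, 0.05, 0.15]])

dt, n_steps = 1e-3, 20_000
L = np.linalg.cholesky(2.0 * D)  # L @ L.T equals 2*D

x = np.zeros(3)
path = np.empty((n_steps + 1, 3))
path[0] = x
increments = np.empty((n_steps, 3))
for k in range(n_steps):
    # Noise integrated over one step: Gaussian vector with covariance 2*D*dt
    dW = np.sqrt(dt) * (L @ rng.standard_normal(3))
    x = x + drift(x) * dt + dW
    increments[k] = dW
    path[k + 1] = x

# Sample covariance of the increments, rescaled, should be close to 2*D
print(np.cov(increments.T) / dt)
```

Each draw here is a single vector whose length equals the number of equations, so the cross-correlations between equations come from the off-diagonal entries of D rather than from independent `randn` calls.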
Coming to the delta function part, you can easily deal with it by taking the Fourier transform of the ODE. Recall that the Fourier transform of a delta function is 1. This greatly simplifies the DE and you can take an inverse transform in the very end to return to the original domain.