I know that WinBUGS uses the precision as the second parameter of dnorm instead of the variance:
model {
  # Likelihood
  for (i in 1:N1) {
    y1[i] ~ dnorm(mu, tau)
  }
  sigma <- sqrt(1/tau)
  # Priors
  mu ~ dnorm(0, 0.000001)
  tau ~ dgamma(taumu, taus)
}
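For reference, the deterministic node sigma <- sqrt(1/tau) above is just the usual precision/standard-deviation conversion:

$$\tau = \frac{1}{\sigma^{2}}, \qquad \sigma = \sqrt{1/\tau}.$$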
My question is: if I instead want to specify the prior on sigma, because I know its mean and variance, would it be right to use the following model?
model {
  # Likelihood
  for (i in 1:N1) {
    y1[i] ~ dnorm(mu, tau)
  }
  tau <- sqrt(1/sigma)
  # Priors
  mu ~ dnorm(0, 0.000001)
  sigma ~ dnorm(sigmamu, sigmas)
}
Thanks in advance
I need to efficiently compute an array of values like f(i, a) = exp(-0.5 * (i-1) * i * a) for all i in (0..n), with n up to 20,000 and a a positive value very close to 0.
To avoid calling exp n times, I used an incremental approach such as (written in Scala):
def fInc(n: Int, a: Double): Unit = {
  val expA = Math.exp(-a)
  var u = 1.0
  var v = 1.0
  var i = 1
  while (i < n) {
    u *= expA
    v *= u // in practice I store that value in an array, for all i
    i += 1
  }
}
// reference, calling exp directly
def fRef(i: Int, a: Double) = Math.exp(-0.5 * (i - 1) * i * a)
This is mathematically correct, but the difference from the direct exp computation turns out to be too big. Here are some results:
n a v Math.exp diff
1000 1E-6 0.6068340008761639 0.6068340008714599 4.704014955336788E-12
1000 1E-9 0.9995006247427483 0.9995006247293567 1.3391510123028638E-11
1000 1E-12 0.9999995005111699 0.9999995005001248 1.1045164782785832E-11
1000 1E-15 0.9999999995008992 0.9999999995005 3.992361996552063E-13
10000 1E-6 1.938417748402E-22 1.938417746809E-22 1.5929953847004499E-31
10000 1E-9 0.9512341819777599 0.9512341806597269 1.3180330160622589E-9
10000 1E-12 0.9999500073554776 0.9999500062497292 1.1057483817467073E-9
10000 1E-15 0.9999999500449599 0.9999999500050013 3.995859199079632E-11
As you can see, for some values the difference goes up to 1e-9, while I can only accept about 1e-13.
So the question: is there a way to get a better approximation with an algorithm that is still much more efficient than calling exp for every i?
Notes:
I use Apache Commons Math's FastMath.exp, which gives almost the same results as the standard Java exp.
The actual algorithm is more complex, with other similar incremental exp computations (not quadratic, though).
Here is the best solution I found:
The error of u grows (roughly) linearly with each multiplication by the "unitary" exp(-a), and v accumulates those errors, so we can think of the error as a function like err(i) ~= i*i*err0 for some err0. The point is that the error of v is quadratic w.r.t. i.
The best I found is to:
- reset v to the exact value at some chosen frequency (every k iterations)
- refresh u at the same time, maintaining it incrementally in steps of k so that it also stays accurate
val k = 100
val expA = Math.exp(-a)
val expAk = Math.exp(-k * a)
var u = 1.0
var uk = 1.0
var v = 1.0
var i = 1
while (i < n) {
  if (i % k == 0) {
    uk *= expAk                          // exp(-i*a), rebuilt in steps of k
    u = uk
    v = Math.exp(-0.5 * (i + 1) * i * a) // exact reset of v
  } else {
    u *= expA
    v *= u
  }
  i += 1
}
This method requires n/k + 2 calls to exp, which is not quite satisfying, but it is the best I have for now. It can probably be improved by choosing the frequency parameter k well.
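For anyone who wants to experiment with that trade-off, here is a small self-contained sketch of the same reset-every-k scheme in Python (Python just so it runs standalone; the names f_inc and max_abs_error and the scanned k values are mine, not part of the original code). It measures the worst-case error against calling exp directly, which makes it easy to pick the largest k that stays below a target tolerance:

import math

def f_inc(n, a, k=100):
    """f(i) = exp(-0.5*(i-1)*i*a) for i = 1..n, with an exact reset every k steps."""
    out = [1.0]                              # f(1) = exp(0) = 1
    exp_a, exp_ak = math.exp(-a), math.exp(-k * a)
    u = uk = v = 1.0                         # u tracks exp(-i*a), v tracks f(i+1)
    for i in range(1, n):
        if i % k == 0:
            uk *= exp_ak                     # exp(-i*a), rebuilt in steps of k
            u = uk
            v = math.exp(-0.5 * (i + 1) * i * a)   # exact reset of v
        else:
            u *= exp_a
            v *= u
        out.append(v)
    return out                               # about n/k + 2 calls to exp in total

def max_abs_error(n, a, k):
    """Worst absolute difference against calling exp directly for every i."""
    return max(abs(v - math.exp(-0.5 * (i - 1) * i * a))
               for i, v in enumerate(f_inc(n, a, k), start=1))

for k in (25, 50, 100, 200):                 # scan a few candidate frequencies
    print(k, max_abs_error(20000, 1e-9, k))

With numbers in the range of the table above (n up to 20,000, a down to 1e-15), a scan like this is a quick way to choose k for a given tolerance such as 1e-13.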
I've developed a neural network in R to classify a set of images, namely the images in the MNIST handwritten digit database.
I use PCA on the images, and the network has two hidden layers.
So far I can't get more than 95% accuracy on the validation set.
What can I do to get 100% accuracy on the validation set? That is, what can I do to improve the generalization capabilities of the NN?
(I'm using a stochastic back-propagation algorithm to find the optimal weights).
I'll post the code for the function that finds the weights.
DISCLAIMER: I'm totally new to neural networks and R, so this is just an attempt to come up with something.
fixedLearningRateStochasticGradientDescent <- function(X_in, Y, w_list, eta, numOfIterations) {
  x11()
  err_data <- NULL
  N <- dim(X_in)[2]
  X_in <- rbind(rep(1, N), X_in)  # add bias neurons to the input
  iter <- 0
  for (it in 1:numOfIterations) {
    iter <- it
    e_in <- 0
    g_list <- initGradient(w_list)
    L <- length(w_list)
    for (i in 1:N) {
      # forward pass: compute the activations x
      s_list <- list()
      x_list <- list(X_in[, i, drop = FALSE])
      for (l in 1:L) {
        S <- t(w_list[[l]]) %*% x_list[[l]]
        s_list[[length(s_list) + 1]] <- S
        X <- apply(S, 1:2, theta_list[[l]])
        X_n <- dim(X)[2]
        if (l < L) {
          X <- rbind(rep(1, X_n), X)  # add bias neuron to the hidden layer
        }
        x_list[[length(x_list) + 1]] <- X
      }
      # backward pass: compute the sensitivities d
      d_list <- vector("list", L)
      target <- t(Y[i, , drop = FALSE])
      d_list[[L]] <- 2 * (x_list[[L + 1]] - target) * theta_der_list[[L]](x_list[[L + 1]])
      for (l in (L - 1):1) {
        T <- theta_der_list[[l]](x_list[[l + 1]])
        Q <- w_list[[l + 1]] %*% d_list[[l + 1]]
        D <- T * Q
        D <- D[-1, , drop = FALSE]  # remove bias
        d_list[[l]] <- D
      }
      e_in <- e_in + (1 / N * sum((x_list[[L + 1]] - target)^2))
      # gradient for this sample
      for (l in 1:L) {
        G <- x_list[[l]] %*% t(d_list[[l]])
        g_list[[l]] <- G
      }
      # stochastic update of the weights
      for (l in 1:L) {
        w_list[[l]] <- w_list[[l]] - eta * g_list[[l]]
      }
    }
    err <- e_in
    err_data <- c(err_data, err)
    print(paste0(iter, ": ", err))
  }
  plot(err_data, type = "o", col = "red")
  print(err)
  return(w_list)
}
The rest of the code is straightforward (a rough sketch of the pipeline follows this list):
- perform PCA on the input
- initialize the weights
- find the weights with the function above
- calculate performance on the test and validation sets
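For comparison, here is roughly what that pipeline looks like in Python with scikit-learn (an illustrative sketch, not the poster's R code; the number of principal components, the layer sizes, the SGD settings, and loading MNIST via fetch_openml are all assumptions made for the example):

# Sketch: PCA + a two-hidden-layer network trained by SGD, on MNIST.
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X_train, X_val, y_train, y_val = train_test_split(X / 255.0, y, test_size=10000)

model = make_pipeline(
    PCA(n_components=50),                          # dimensionality reduction
    MLPClassifier(hidden_layer_sizes=(100, 50),    # two hidden layers
                  solver="sgd", learning_rate_init=0.1, max_iter=50),
)
model.fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))

A ready-made baseline like this can be a useful sanity check for a hand-rolled implementation.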
I want to solve the following max-min problem: find x = (x(1), x(2)) in [0,1] x [0,1] that maximises the smallest of the three expressions
log(a1) + log(x(1)/(x(1)+1)),
log(a2) + log(x(2)/(x(2)+1)) + log(1-x(1)),
log(a3) + log(1-x(1)) + log(1-x(2)),
with a1 = a2 = a3 = 1.1.
I use the following MATLAB code, but it does not work.
Can someone please guide me?
function f=objfun
f=-f;
function [c1,c2,c3]=constraint(x)
a1=1.1; a2=1.1; a3=1.1;
c1=f-log(a1)-log(x(1)/(x(1)+1));
c2=f-log(a2)-log(x(2)/(x(2)+1))-log(1-x(1));
c3=f-log(a3)-log(1-x(1))-log(1-x(2));
x0=[0.01;0.01];
[x,fval]=fmincon('objfun',x0,[],[],[],[],[0;0],[1;1],'constraint')
You need to flip the problem around a bit. You are trying to find the point x (which is (l_1, l_2)) that makes the smallest of the three expressions as large as possible. So you can rewrite your problem, in pseudocode, as
maximise, by varying x in [0,1] X [0,1]
min([log(a1)+log(x(1)/(x(1)+1)) ...
log(a2)+log(x(2)/(x(2)+1))+log(1-x(1)) ...
log(a3)+log(1-x(1))+log(1-x(2))])
Since MATLAB's fmincon minimises rather than maximises, rewrite this as a minimisation problem,
minimise, by varying x in [0,1] X [0,1]
max(-[log(a1)+log(x(1)/(x(1)+1)) ...
log(a2)+log(x(2)/(x(2)+1))+log(1-x(1)) ...
log(a3)+log(1-x(1))+log(1-x(2))])
So the actual code is
F = @(x) max(-[log(a1)+log(x(1)/(x(1)+1)) ...
    log(a2)+log(x(2)/(x(2)+1))+log(1-x(1)) ...
    log(a3)+log(1-x(1))+log(1-x(2))]);
[L,fval] = fmincon(F,[0.5 0.5],[],[])
which returns
L =
0.3383 0.6180
fval =
1.2800
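For what it's worth, the same min-max formulation can also be sanity-checked with a few lines of SciPy (a sketch, not part of the original answer; the derivative-free Powell method, the small interior margin eps, and the clipping are my choices to keep the logs finite):

import numpy as np
from scipy.optimize import minimize

a1 = a2 = a3 = 1.1
eps = 1e-9                            # stay strictly inside (0, 1)

def F(x):
    l1, l2 = np.clip(x, eps, 1 - eps)  # guard: keep every log finite
    terms = [np.log(a1) + np.log(l1 / (l1 + 1)),
             np.log(a2) + np.log(l2 / (l2 + 1)) + np.log(1 - l1),
             np.log(a3) + np.log(1 - l1) + np.log(1 - l2)]
    return max(-t for t in terms)      # minimise the largest negated term

res = minimize(F, x0=[0.5, 0.5], method="Powell",
               bounds=[(eps, 1 - eps), (eps, 1 - eps)])
print(res.x, res.fun)

If everything is set up the same way, this should land close to the fmincon result above.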
You can also solve this in the convex optimization package CVX with the following MATLAB code:
cvx_begin
variables T(1);
variables x1(1);
variables x2(1);
maximize(T)
subject to:
log(a1) + x1 - log_sum_exp([0, x1]) >= T;
log(a2) + x2 - log_sum_exp([0, x2]) + log(1 - exp(x1)) >= T;
log(a3) + log(1 - exp(x1)) + log(1 - exp(x2)) >= T;
x1 <= 0;
x2 <= 0;
cvx_end
l1 = exp(x1); l2 = exp(x2);
To use CVX, each constraint and the objective function have to be written in a way that is provably convex under CVX's ruleset. Making the substitution x1 = log(l1) and x2 = log(l2) allows one to do that. Note that log_sum_exp([0, x1]) = log(exp(0) + exp(x1)) = log(1 + l1).
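Spelling that substitution out for the first constraint (a worked step added for clarity, not part of the original answer): with $x_1 = \log l_1$,

$$\log\frac{l_1}{l_1+1} = \log l_1 - \log(l_1+1) = x_1 - \log\left(e^{0}+e^{x_1}\right) = x_1 - \operatorname{log\_sum\_exp}([0,x_1]),$$

which is exactly the first CVX constraint above; the $1-l_i$ factors in the other two constraints become $\log(1-e^{x_i})$ in the same way.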
This also returns the answers: l1 = .3383, l2 = .6180, T = -1.2800
I need to solve the following equation for the Mach number M over an entire flow field:
Where q_c is defined as
γ is a constant, the ratio of specific heats (1.4 for air), and p is the pressure, a matrix with the dimensions of the mesh. It is thus an equation with M on both sides, so it cannot be solved explicitly.
Is there a built-in MATLAB function, or any other way, to solve this equation for M over the entire flow field?
Basically, this is a polynomial equation with non-integer powers:
a := 0.88128485
N := M²
⇒ N − a²·(½γpN + 1)·(1 − 1/(7N))^2.5 = 0
for which there is no analytic solution. So, you'll have to go numerical. The easiest (but not the fastest) way:
gamma = 1.4;
a = 0.88128485;
M = zeros(size(p));
for ii = 1:numel(M)
    M(ii) = fzero(@(M) ...
        M - a*sqrt( (gamma/2*p(ii)*M.^2 + 1).*(1-1./7./M.^2).^(2.5) ), ...
        2.5); % initial value; insert your roughly expected value here
end
I am trying to illustrate the CLT in MATLAB by comparing the histogram of the sum of three random variables with a normal distribution.
Here is my code:
clc;clear;
len = 50000;
%y0 : Exponential Distribution
lambda = 3;
y0=-log(rand(1,len))./lambda;
%y1 : Rayleigh Distribution
mu = 0;
sig = 2;
var1 = mu + sig*randn(1,len);
var2 = mu + sig*randn(1,len);
t1 = var1 .^ 2;
t2 = var2 .^ 2;
y1 = sqrt(t1+t2);
% %y2: Normal Distribution
y2 = randn(1,len);
%y3 : what the result is expected to be:
mean0 = (sum(y0)+ sum(y1)+ sum(y2)) / (len * 3);%how do I calculate this?
var0 = 1;%how do I calculate this?
y3 = mean0 + var0*randn(1,len);
delta = 0.1;
x3 = min(y3):delta:max(y3);
figure('Name','Normal Distribution');
hist(y3,x3);
%Central Limit Theorem:
%what result is:
res = y0+y1+y2;
xn = min(res):delta:max(res);
figure('Name','Final Result');
hist(res,xn);
I have two main problems:
- How can I calculate the mean and variance for y3 (what the result should be)?
- Is my code correct?
Since y0, y1 and y2 are row vectors, you have to do:
mean0 = mean([y0 y1 y2]);
variance0 = var([y0 y1 y2]);
When you create [y0 y1 y2] you are creating one big vector with all your previous samples in it (as if they were samples from one single distribution).
Now just plug it into the functions you want (mean and variance) as shown above.
About the statistical part: I think you are getting some things wrong.
The Central Limit Theorem applies to the sum of variables that all follow the same distribution. It can indeed be any distribution D, but all the variables must have that same distribution D. You are trying to sum variables from different distributions.
The theorem says that if X_1, ..., X_N are independent and identically distributed with mean μ and variance σ², then √N·(mean(X_1, ..., X_N) − μ) converges in distribution to a normal distribution N(0, σ²) as N grows.
I've coded an example for variables distributed according to an exponential distribution.
Run it and you will observe that, as you increase N, the resulting distribution tends to the expected normal distribution. For N = 1 you have your exponential distribution (very different from a normal distribution), but for N = 100 you already have a distribution that is very close to the expected normal distribution (you can see how the mean and variance are basically the same now).
(Figures: histograms of the result for N = 1, N = 3, N = 10 and N = 100, together with the expected normal distribution, i.e. the convergence distribution of the CLT.)
clc;clear;
len = 50000;
lambda = 3;
%yA : Exponential Distribution A
yA=-log(rand(1,len))./lambda;
%yB : Exponential Distribution B
yB=-log(rand(1,len))./lambda;
%yC : Exponential Distribution C
yC=-log(rand(1,len))./lambda;
%yD : Exponential Distribution D
yD=-log(rand(1,len))./lambda;
%yE : Exponential Distribution E
yE=-log(rand(1,len))./lambda;
%yF : Exponential Distribution F
yF=-log(rand(1,len))./lambda;
%yG : Exponential Distribution G
yG=-log(rand(1,len))./lambda;
%yH : Exponential Distribution H
yH=-log(rand(1,len))./lambda;
%yI : Exponential Distribution I
yI=-log(rand(1,len))./lambda;
%yJ : Exponential Distribution J
yJ=-log(rand(1,len))./lambda;
%y1 : What you expect the result to be (centred Gaussian with the same variance as the exponential):
mean0 = 0;
var0 = var(yA);
y1 = mean0 + sqrt(var0)*randn(1,len);
delta = 0.01;
x1 = min(y1):delta:max(y1);
figure('Name','Normal Distribution (Expected)');
hist(y1,x1);
%Central Limit Theorem:
%what result is:
res1 = (((yA)/1) - mean(yA))*sqrt(1);
res2 = (((yA+yB)/2) - mean(yA))*sqrt(2);
res3 = (((yA+yB+yC)/3) - mean(yA))*sqrt(3);
res4 = (((yA+yB+yC+yD)/4) - mean(yA))*sqrt(4);
res5 = (((yA+yB+yC+yD+yE)/5) - mean(yA))*sqrt(5);
res10 = (((yA+yB+yC+yD+yE+yF+yG+yH+yI+yJ)/10) - mean(yA))*sqrt(10);
delta = 0.01;
xn = min(res1):delta:max(res1);
figure('Name','Final Result for N=1');
hist(res1,xn);
xn = min(res2):delta:max(res2);
figure('Name','Final Result for N=2');
hist(res2,xn);
xn = min(res3):delta:max(res3);
figure('Name','Final Result for N=3');
hist(res3,xn);
xn = min(res4):delta:max(res4);
figure('Name','Final Result for N=4');
hist(res4,xn);
xn = min(res5):delta:max(res5);
figure('Name','Final Result for N=5');
hist(res5,xn);
xn = min(res10):delta:max(res10);
figure('Name','Final Result for N=10');
hist(res10,xn);
%for N = 100
y100=-log(rand(100,len))./lambda;
res100 = ((sum(y100)/100) - mean(yA))*sqrt(100);
xn = min(res100):delta:max(res100);
figure('Name','Final Result for N=100');
hist(res100,xn);