Neural network for classification: generalization

I've developed a neural network in R to classify a set of images, namely the images in the MNIST handwritten digit database.
I apply PCA to the images, and the network has two hidden layers.
So far I can't get more than 95% accuracy on the validation set.
What can I do to get 100% accuracy on the validation set? That is, what can I do to improve the generalization capabilities of the network?
(I'm using stochastic back-propagation to find the optimal weights.)
I'll post the code for the function that finds the weights.
DISCLAIMER: I'm totally new to neural networks and R, so this is just an attempt to come up with something.
fixedLearningRateStochasticGradientDescent <- function(X_in, Y, w_list, eta, numOfIterations){
  # Relies on theta_list (activation function per layer) and theta_der_list
  # (its derivative, written in terms of the layer's output), defined elsewhere.
  x11()                                # open a graphics window for the error plot
  err_data <- NULL
  N <- dim(X_in)[2]                    # one column per training example
  X_in <- rbind(rep(1, N), X_in)       # add bias neurons to input
  L <- length(w_list)                  # number of weight layers
  for(iter in 1:numOfIterations){
    e_in <- 0
    for(n in sample(N)){               # visit the examples in random order
      # forward pass: activations x of every layer
      x_list <- list(X_in[, n, drop = FALSE])
      for(l in 1:L){
        S <- t(w_list[[l]]) %*% x_list[[l]]
        X <- apply(S, 1:2, theta_list[[l]])
        if(l < L){
          X <- rbind(rep(1, ncol(X)), X) # add bias neuron to hidden layer
        }
        x_list[[l + 1]] <- X
      }
      # backward pass: sensitivities d of every layer
      d_list <- vector("list", L)
      target <- t(Y[n, , drop = FALSE])
      d_list[[L]] <- 2 * (x_list[[L + 1]] - target) * theta_der_list[[L]](x_list[[L + 1]])
      if(L > 1){
        for(l in (L - 1):1){
          Tder <- theta_der_list[[l]](x_list[[l + 1]])
          Q <- w_list[[l + 1]] %*% d_list[[l + 1]]
          D <- Tder * Q
          d_list[[l]] <- D[-1, , drop = FALSE]  # remove bias
        }
      }
      e_in <- e_in + sum((x_list[[L + 1]] - target)^2) / N
      # gradient step using this single example
      for(l in 1:L){
        G <- x_list[[l]] %*% t(d_list[[l]])
        w_list[[l]] <- w_list[[l]] - eta * G
      }
    }
    err_data <- c(err_data, e_in)
    print(paste0(iter, ": ", e_in))
  }
  plot(err_data, type = "o", col = "red")
  print(e_in)
  return(w_list)
}
The rest of the code is trivial:
- perform PCA on the input
- initialize the weights
- find the weights
- calculate performance on the test and validation sets.
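For reference, these are the backpropagation recurrences the weight-finding function appears to implement, read directly off the code (the notation is mine, not the poster's). Here $x^{(0)}$ is the input column with the bias 1 prepended, $W^{(l)}$ is w_list[[l]], $\theta_l$ and $\theta_l'$ are theta_list[[l]] and theta_der_list[[l]] (the derivative evaluated on the layer's output), $y$ is the target, and $[\,\cdot\,]_{-1}$ drops the bias entry:

$$
\begin{aligned}
s^{(l)} &= \big(W^{(l)}\big)^{\top} x^{(l-1)}, \qquad x^{(l)} = \begin{bmatrix} 1 \\ \theta_l\big(s^{(l)}\big) \end{bmatrix} \ (l < L), \qquad x^{(L)} = \theta_L\big(s^{(L)}\big),\\
\delta^{(L)} &= 2\,\big(x^{(L)} - y\big) \odot \theta_L'\big(x^{(L)}\big),\\
\delta^{(l)} &= \Big[\theta_l'\big(x^{(l)}\big) \odot \big(W^{(l+1)}\,\delta^{(l+1)}\big)\Big]_{-1}, \qquad l = L-1, \dots, 1,\\
G^{(l)} &= x^{(l-1)} \big(\delta^{(l)}\big)^{\top}, \qquad W^{(l)} \leftarrow W^{(l)} - \eta\, G^{(l)}.
\end{aligned}
$$

Each $G^{(l)}$ has the same shape as $W^{(l)}$, and $\delta^{(l)}$ is the gradient of the per-example squared error $\lVert x^{(L)} - y \rVert^2$ with respect to $s^{(l)}$.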

Related

Scala for-yield with multiple conditions

I have a bitmap object that is a 3-dimensional array with third dimension equal to 3. I want to split it into 64x64x3 blocks. For this I have the following code snippet:
val tiles: someType = for {
  x <- bitmap.indices by 64
  y <- bitmap(0).indices by 64
  data = for {
    // For all x and y within the coordinates of one future tile
    tx <- x until x + 64
    ty <- y until y + 64
  } yield bitmap(tx)(ty)
  ...
}
In the data loop, the yield throws an ArrayIndexOutOfBoundsException at the last chunk. How can I check whether the indices stay within the array bounds in this loop? Is it possible to have multiple until conditions for the same variable in the same loop?
What about clamping each upper bound with math.min, like this?
val tiles: someType = for {
  x <- bitmap.indices by 64
  y <- bitmap(0).indices by 64
  data = for {
    // For all x and y within the coordinates of one future tile,
    // clamped so the last tile stops at the array border
    tx <- x until math.min(x + 64, bitmap.length)
    ty <- y until math.min(y + 64, bitmap(0).length)
  } yield bitmap(tx)(ty)
} yield data // build the tile from data here
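To try the clamped bounds in isolation, here is a small self-contained sketch. The 100 x 70 x 3 bitmap of zeros and the name tileSize are made up for illustration. Because 100 and 70 are not multiples of 64, the last row and column of tiles simply come out smaller instead of throwing. A guard such as `ty <- y until y + 64 if ty < bitmap(0).length` would also avoid the exception, but it still iterates over the out-of-range indices before filtering them out.

```scala
// Hypothetical tile size and bitmap; the sizes are deliberately not multiples of 64.
val tileSize = 64
val bitmap: Array[Array[Array[Int]]] = Array.fill(100, 70, 3)(0)

// For each tile origin (x, y), collect only the pixels that exist:
// math.min clamps the upper bound to the array border.
val tiles: IndexedSeq[IndexedSeq[Array[Int]]] = for {
  x <- bitmap.indices by tileSize
  y <- bitmap(0).indices by tileSize
} yield for {
  tx <- x until math.min(x + tileSize, bitmap.length)
  ty <- y until math.min(y + tileSize, bitmap(0).length)
} yield bitmap(tx)(ty)

// Number of pixels in each tile
println(tiles.map(_.size))
```

With these sizes the four tiles should hold 4096, 384, 2304 and 216 pixels, each pixel still being an array of 3 channels.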

Conversion of Looping to Recursive Solution

I have written a method pythagoreanTriplets in Scala using nested loops. As a newbie in Scala, I am struggling with how to do the same thing using recursion, and how to use lazy evaluation for the returned list (a List of tuples). Any help will be highly appreciated.
P.S.: The following method works perfectly fine.
import scala.util.control.Breaks._

// This method returns the list of all Pythagorean triples whose components are
// at most a given limit. Formula: a^2 + b^2 = c^2
def pythagoreanTriplets(limit: Int): List[(Int, Int, Int)] = {
  // triplet: a^2 + b^2 = c^2, generated from m > n with Euclid's formula
  var (a, b, c, m) = (0, 0, 0, 2)
  var triplets: List[(Int, Int, Int)] = List()
  while (c < limit) {
    breakable {
      for (n <- 1 until m) {
        a = m * m - n * n
        b = 2 * m * n
        c = m * m + n * n
        if (c > limit)
          break()
        triplets = triplets :+ ((a, b, c))
      }
      m += 1
    }
  } // end of while
  triplets
}
I don't see where recursion would offer significant advantages.
def pythagoreanTriplets(limit: Int): List[(Int, Int, Int)] =
  for {
    m <- (2 to limit/2).toList
    n <- 1 until m
    c = m*m + n*n if c <= limit
  } yield (m*m - n*n, 2*m*n, c)
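If you still want recursion and laziness, here is a sketch using LazyList (Scala 2.13 and later; on older versions Stream plays the same role). It recurses on m and n, enumerates the same Euclid-formula triples with c <= limit as the for-comprehension above, and only computes triples as they are consumed. The helper names forM and fromM are my own.

```scala
// Lazily enumerates (m*m - n*n, 2*m*n, m*m + n*n) for m > n >= 1 with c <= limit.
def pythagoreanTriplets(limit: Int): LazyList[(Int, Int, Int)] = {
  // Triples for a fixed m, recursing on n. For fixed m, c grows with n,
  // so the recursion stops at the first n whose c exceeds the limit.
  def forM(m: Int, n: Int): LazyList[(Int, Int, Int)] = {
    val c = m * m + n * n
    if (n >= m || c > limit) LazyList.empty
    else (m * m - n * n, 2 * m * n, c) #:: forM(m, n + 1)
  }
  // Recurse on m. The smallest c for a given m is m*m + 1, which grows with m,
  // so once it exceeds the limit no larger m can contribute.
  // #::: takes its right operand by name, so fromM(m + 1) is evaluated lazily.
  def fromM(m: Int): LazyList[(Int, Int, Int)] =
    if (m * m + 1 > limit) LazyList.empty
    else forM(m, 1) #::: fromM(m + 1)

  fromM(2)
}

println(pythagoreanTriplets(50).take(3).toList)
```

Here take(3).toList forces only the first three triples, (3,4,5), (8,6,10) and (5,12,13); nothing beyond them is computed.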

Scala yield IndexedSeq conversion

I have the following code snippet:
val tiles = for {
  x <- 0 to bitmap.length by tileSize
  y <- 0 to bitmap(0).length by tileSize
} yield new Tile[Number](x, y, tileSize, tileSize,
  data = for {
    tx <- x to x + tileSize - 1
    ty <- y to y + tileSize - 1
  } yield (bitmap(tx)(ty)))
This may look complicated, but the idea is to create a Tile object for every XY position in a 3-dimensional bitmap object. The 'nested' yield that is passed as the data parameter to Tile's constructor is an IndexedSeq[Number], which should be converted to an Array[Number] to match the type of the data parameter. The problem is that the toArray method doesn't exist for the final yielded object:
val tiles = for {
  x <- 0 to bitmap.length by tileSize
  y <- 0 to bitmap(0).length by tileSize
} yield new Tile[Number](x, y, tileSize, tileSize,
  data = for {
    tx <- x to x + tileSize - 1
    ty <- y to y + tileSize - 1
  } yield (bitmap(tx)(ty).toArray))
causes the error "Cannot resolve symbol toArray", even though yield (bitmap(tx)(ty).toArray)) is shown as IndexedSeq[java.lang.Number] in IntelliJ IDEA and theoretically should have a toArray method.
What is happening in the last yield? How can I convert the resulting collection to an Array? I know this code could and should be rewritten in a simpler, more readable manner, but for now I want to know what is going on behind the curtain.
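You need to call toArray on the result of the whole inner for expression, not on each yielded element. In yield (bitmap(tx)(ty).toArray), toArray is applied to bitmap(tx)(ty), a single Number, which has no such method.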
You may do this:
val tiles = for {
  x <- 0 to bitmap.length by tileSize
  y <- 0 to bitmap(0).length by tileSize
} yield new Tile[Number](x, y, tileSize, tileSize,
  data = (for {
    tx <- x to x + tileSize - 1
    ty <- y to y + tileSize - 1
  } yield bitmap(tx)(ty)).toArray)
However, this style is not encouraged in Scala; consider this snippet instead:
val tiles = for {
  x <- 0 to bitmap.length by tileSize
  y <- 0 to bitmap(0).length by tileSize
  data = for {
    tx <- x to x + tileSize - 1
    ty <- y to y + tileSize - 1
  } yield bitmap(tx)(ty)
  tile = new Tile[Number](x, y, tileSize, tileSize, data.toArray)
} yield tile
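A self-contained sketch of the fix, assuming a made-up Tile case class that only mirrors the constructor shape from the question, and a hypothetical 4 x 4 bitmap of boxed integers. It uses until rather than to for the tile origins, so that no origin lands one position past the end of the array.

```scala
// Hypothetical stand-in for the question's Tile class; data must be an Array[T].
final case class Tile[T](x: Int, y: Int, w: Int, h: Int, data: Array[T])

val tileSize = 2
// Hypothetical 4 x 4 bitmap of java.lang.Number values (boxed integers 0..15).
val bitmap: Array[Array[Number]] =
  Array.tabulate[Number](4, 4)((i, j) => Integer.valueOf(i * 4 + j))

val tiles: IndexedSeq[Tile[Number]] = for {
  x <- 0 until bitmap.length by tileSize
  y <- 0 until bitmap(0).length by tileSize
  // The parenthesised inner for expression is an IndexedSeq[Number];
  // toArray is applied to that whole collection, not to a single Number.
  data = (for {
    tx <- x until x + tileSize
    ty <- y until y + tileSize
  } yield bitmap(tx)(ty)).toArray
} yield Tile[Number](x, y, tileSize, tileSize, data)

tiles.foreach(t => println(s"(${t.x}, ${t.y}): ${t.data.mkString(", ")}"))
```

Each tile's data is a real Array[Number]; with this bitmap the first tile holds 0, 1, 4, 5 and the last holds 10, 11, 14, 15.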

Multiple definitions of node W[1]

I am trying to estimate delta[j,k] under the condition sum(delta[j,1:5]) = 0 for each j, but when I compile the code, the software outputs "multiple definitions of node W[1]". Could someone help me?
model {
  for (j in 1:p) {
    for (k in 1:5) {
      Z[j, k] <- sum(delta[j, 1:k])
    }
    for (i in 1:n) {
      Y[i, j] ~ dcat(prob[i, j, 1:5])
    }
  }
  for (i in 1:n) {
    theta[i] ~ dnorm(0.0, 1.0)
  }
  for (i in 1:n) {
    for (j in 1:p) {
      for (k in 1:5) {
        eta[i, j, k] <- alpha[j] * (k*theta[i] - k*beta[j] + Z[j, k])
        psum[i, j, k] <- sum(eta[i, j, 1:k])
        exp.psum[i, j, k] <- exp(psum[i, j, k])
        prob[i, j, k] <- exp.psum[i, j, k] / sum(exp.psum[i, j, 1:5])
      }
    }
  }
  for (j in 1:p) {
    W[j] <- sum(delta[j, 1:5])
    W[j] <- 0
    alpha[j] ~ dlnorm(0.83, pr.alpha)
    beta[j] ~ dnorm(-1.73, pr.beta)
    delta[j, 1] <- 0.0
    for (k in 2:5) {
      delta[j, k] ~ dnorm(0.02, pr.delta)
    }
  }
  pr.alpha <- pow(1.2, -2)
  pr.beta <- pow(0.7, -2)
  pr.delta <- pow(1.3, -2)
}
thanks
BUGS does not allow you to overwrite deterministic nodes, and you have W[j] <- twice in the last for loop.
I guess there are many ways to write the code to meet your condition. For example, you could use a different distribution for delta, or set delta[j,1] to the remainder required for all the deltas to sum to 0 after simulating delta[j,2] to delta[j,5], i.e. delta[j,1] <- -sum(delta[j,2:5]) in place of delta[j,1] <- 0.0.
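The remainder construction is easy to check outside BUGS. This Scala sketch draws delta[j,2..5] from a normal with mean 0.02 and standard deviation 1.3 (BUGS' dnorm is parameterised by precision, so pr.delta = pow(1.3, -2) corresponds to sd 1.3), sets delta[j,1] to minus their sum, and prints each row. The number of rows p = 3 and the seed are arbitrary choices for illustration.

```scala
import scala.util.Random

val rng = new Random(42) // arbitrary fixed seed for reproducibility
val p = 3                // hypothetical number of items j
val mu = 0.02            // prior mean of delta[j, k], k = 2..5
val sd = 1.3             // prior sd: precision pow(1.3, -2) in BUGS terms

// Each row: delta[j, 1] = -sum(delta[j, 2..5]), so the row sums to zero by construction.
val delta: Array[Array[Double]] = Array.fill(p) {
  val free = Array.fill(4)(mu + sd * rng.nextGaussian()) // delta[j, 2..5]
  (-free.sum) +: free                                     // prepend delta[j, 1]
}

delta.foreach(row => println(row.map(v => f"$v%.3f").mkString(" ") + f"  (sum = ${row.sum}%.1e)"))
```

The row sums are zero up to floating-point rounding. The design choice is the same one the answer suggests: only four of the five deltas are free parameters, and the fifth is deterministic.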

Assessing parameter bias in simulation

set.seed(123456)
reps <- 500                        # no. of repetitions
par.est <- matrix(NA, nrow = reps, ncol = 2)  # empty matrix to store the estimates
b0 <- .2                           # true value for the intercept
b1 <- .5                           # true value for the slope
n <- 1000                          # sample size
X <- runif(n, -1, 1)               # create a sample of n obs on the independent variable X
for (i in 1:reps) {                # start of the loop
  Y <- b0 + b1*X + rnorm(n, 0, 1)  # the true DGP, with N(0,1) error
  model <- lm(Y ~ X)               # estimate the OLS model
  par.est[i, 1] <- coef(model)[1]  # put the estimate for the intercept in the 1st column
  par.est[i, 2] <- coef(model)[2]  # put the estimate for the coefficient of X in the 2nd column
}
Can someone show me how to assess the bias in the estimates of the intercept and the slope?
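One common way to assess it: the bias of each estimator is estimated by the mean of its estimates across repetitions minus the true value, and the Monte Carlo standard error of that mean is sd(estimates) / sqrt(reps). Below is a Scala sketch of the same simulation, with closed-form simple-regression OLS standing in for lm. The seed value is reused from the R code, but Scala's Random produces different draws from R's, so the numbers will not match R's exactly.

```scala
import scala.util.Random

val rng = new Random(123456) // same seed value as the R code; the draws differ from R's
val reps = 500
val (b0, b1, n) = (0.2, 0.5, 1000)
val x = Array.fill(n)(rng.nextDouble() * 2 - 1) // X ~ Uniform(-1, 1), fixed across reps

// Closed-form OLS for y = a + b*x: b = Sxy / Sxx, a = ybar - b * xbar.
def ols(x: Array[Double], y: Array[Double]): (Double, Double) = {
  val xbar = x.sum / x.length
  val ybar = y.sum / y.length
  val sxy = x.indices.map(i => (x(i) - xbar) * (y(i) - ybar)).sum
  val sxx = x.map(v => (v - xbar) * (v - xbar)).sum
  val slope = sxy / sxx
  (ybar - slope * xbar, slope)
}

// One (intercept, slope) estimate per repetition, each with fresh N(0, 1) errors.
val est: Array[(Double, Double)] = Array.fill(reps) {
  val y = x.map(xi => b0 + b1 * xi + rng.nextGaussian())
  ols(x, y)
}

// Bias = mean(estimates) - truth; MCSE = sd(estimates) / sqrt(reps).
def summary(name: String, values: Array[Double], truth: Double): Unit = {
  val mean = values.sum / values.length
  val sd = math.sqrt(values.map(v => (v - mean) * (v - mean)).sum / (values.length - 1))
  val bias = mean - truth
  val mcse = sd / math.sqrt(values.length)
  println(f"$name%-9s mean=$mean%.4f bias=$bias%.4f MCSE=$mcse%.4f relative bias=${bias / truth * 100}%.2f%%")
}

summary("intercept", est.map(_._1), b0)
summary("slope", est.map(_._2), b1)
```

Since OLS is unbiased under this data-generating process, both estimated biases should be within a few Monte Carlo standard errors of zero. In the R code itself, colMeans(par.est) - c(b0, b1) gives the two biases, and apply(par.est, 2, sd) / sqrt(reps) gives their Monte Carlo standard errors.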