What is the usefulness of the mean file with AlexNet neural network? - neural-network

When using an AlexNet neural network, be it with caffe or CNTK, it needs a mean file as input. What is this mean file for ? How does it affect the training ? How is it generated, only from training sample ?

Mean subtraction removes the DC component from images. It has the geometric interpretation of centering the cloud of data around the origin along every dimension. It reduces the correlation between images which improves training. From my experience I can say that it improves the training accuracy significantly. It is computed from the training data. Computing mean from the testing data makes no sense.

Related

TensorFlow: Binary classification accuracy

In the context of a binary classification, I use a neural network with 1 hidden layer using a tanh activation function. The input is coming from a word2vect model and is normalized.
The classifier accuracy is between 49%-54%.
I used a confusion matrix to have a better understanding on what’s going on. I study the impact of feature number in input layer and the number of neurons in the hidden layer on the accuracy.
What I can observe from the confusion matrix is the fact that the model predict based on the parameters sometimes most of the lines as positives and sometimes most of the times as negatives.
Any suggestion why this issue happens? And which other points (other than input size and hidden layer size) might impact the accuracy of the classification?
Thanks
It's a bit hard to guess given the information you provide.
Are the labels balanced (50% positives, 50% negatives)? So this would mean your network is not training at all as your performance corresponds to the random performance, roughly. Is there maybe a bug in the preprocessing? Or is the task too difficult? What is the training set size?
I don't believe that the number of neurons is the issue, as long as it's reasonable, i.e. hundreds or a few thousand.
Alternatively, you can try another loss function, namely cross entropy, which is standard for multi-class classification and can also be used for binary classification:
https://www.tensorflow.org/api_docs/python/nn/classification#softmax_cross_entropy_with_logits
Hope this helps.
The data set is well balanced, 50% positive and negative.
The training set shape is (411426,X)
The training set shape is (68572,X)
X is the number of the feature coming from word2vec and I try with the values between [100,300]
I have 1 hidden layer, and the number of neurons that I test varied between [100,300]
I also test with mush smaller features/neurons size: 2-20 features and 10 neurons on the hidden layer.
I use also the cross entropy as cost fonction.

Neural network parameter selection

I am looking at (two-layer) feed-forward Neural Networks in Matlab. I am investigating parameters that can minimise the classification error.
A google search reveals that these are some of them:
Number of neurons in the hidden layer
Learning Rate
Momentum
Training type
Epoch
Minimum Error
Any other suggestions?
I've varied the number of hidden neurons in Matlab, varying it from 1 to 10. I found that the classification error is close to 0% with 1 hidden neuron and then grows very slightly as the number of neurons increases. My question is: shouldn't a larger number of hidden neurons guarantee an equal or better answer, i.e. why might the classification error go up with more hidden neurons?
Also, how might I vary the Learning Rate, Momentum, Training type, Epoch and Minimum Error in Matlab?
Many thanks
Since you are considering a simple two layer feed forward network and have already pointed out 6 different things you need to consider to reduce classification errors, I just want to add one thing only and that is amount of training data. If you train a neural network with more data, it will work better. Note that, training with large amount of data is a key to get good outcome from neural networks, specially from deep neural networks.
Why the classification error goes up with more hidden neurons?
Answer is simple. Your model has over-fitted the training data and thus resulting in poor performance. Note that, if you increase the number of neurons in hidden layers, it would decrease training errors but increase testing errors.
In the following figure, see what happens with increased hidden layer size!
How may I vary the Learning Rate, Momentum, Training type, Epoch and Minimum Error in Matlab?
I am expecting you have already seen feed forward neural net in Matlab. You just need to manipulate the second parameter of the function feedforwardnet(hiddenSizes,trainFcn) which is trainFcn - a training function.
For example, if you want to use gradient descent with momentum and adaptive learning rate backpropagation, then use traingdx as the training function. You can also use traingda if you want to use gradient descent with adaptive learning rate backpropagation.
You can change all the required parameters of the function as you want. For example, if you want to use traingda, then you just need to follow the following two steps.
Set net.trainFcn to traingda. This sets net.trainParam to traingda's default parameters.
Set net.trainParam properties to desired values.
Example
net = feedforwardnet(3,'traingda');
net.trainParam.lr = 0.05; % setting the learning rate to 5%
net.trainParam.epochs = 2000 % setting number of epochs
Please see this - gradient descent with adaptive learning rate backpropagation and gradient descent with momentum and adaptive learning rate backpropagation.

Convolution Neural Network for image detection/classification

So here is there setup, I have a set of images (labeled train and test) and I want to train a conv net that tells me whether or not a specific object is within this image.
To do this, I followed the tensorflow tutorial on MNIST, and I train a simple conv net reduced to the area of interest (the object) which are training on image of size 128x128. The architecture is as follows : successively 3 layers consisting of 2 conv layers and 1 max pool down-sampling layers, and one fully connected softmax layers (with two class 0 and 1 whether the object is present or not)
I impleted it using tensorflow, and this works quite well, but since I have enough computing power I was wondering how I could improve the complexity of the classification:
- adding more layers ?
- adding more channel at each layer ? (currently 32,64,128 and 1024 for the fully connected)
- anything else ?
But the most important part is that now I want to detect this same object on larger images (roughle 600x600 whereas the size of the object should be around 100x100).
I was wondering how I could use the previously training "small" network used for small images, in order to pretrained a larger network on the large images ? One option could be to classify the image using a slicing window of size 128x128 and scan the whole image but I would like to try if possible to train a whole network on it.
Any suggestion on how to proceed ? Or an article / ressource tackling this kind of problem ? (I am really new to deep learning so sorry if this is stupid question...)
Thanks !
I suggest that you continue reading on the field overall. Your search keys include CNN, image classification, neural net, AlexNet, GoogleNet, and ResNet. This will return many articles, on-line classes and lectures, and other materials to help you learn about classification with neural nets.
Don't just add layers or filters: the complexity of the topology (net design) must be fitted to the task; a net that's too complex will over-fit the training data. The one you've been using is probably LeNet; the three I cite above are for the ImageNet image classification contest.
Since you are working on images, I would suggest you to use a pretrained image classification network (like VGG, Alexnet etc.)and fine tune this network with your 128x128 image data. In my experience until we have very large data set fine tuned network will give more accuracy and also save training time. After building a good image classifier on your data set you can use any popular algorithm to generate region of proposal from the image. Now take all regions of proposal and pass them to classification network one by one and check weather this network is classifying given region of proposal as positive or negative. If it classifying as positively then most probably your object is present in that region. Otherwise it's not. If there are a lot of region of proposal in which object is present according to classifier then you can use non maximal suppression algorithms to reduce number of positive proposals.

How to improve digit recognition prediction in Neural Networks in Matlab?

I've made digit recognition (56x56 digits) using Neural Networks, but I'm getting 89.5% accuracy on test set and 100% on training set. I know that it's possible to get >95% on test set using this training set. Is there any way to improve my training so I can get better predictions? Changing iterations from 300 to 1000 gave me +0.12% accuracy. I'm also file size limited so increasing number of nodes can be impossible, but if that's the case maybe I could cut some pixels/nodes from the input layer.
To train I'm using:
input layer: 3136 nodes
hidden layer: 220 nodes
labels: 36
regularized cost function with lambda=0.1
fmincg to calculate weights (1000 iterations)
As mentioned in the comments, the easiest and most promising way is to switch to a Convolutional Neural Network. But with you current model you can:
Add more layers with less neurons each, which increases learning capacity and should increase accuracy by a bit. Problem is that you might start overfitting. Use regularization to counter this.
Use batch Normalization (BN). While you are already using regularization, BN accelerates training and also does regularization, and is a NN specific algorithm that might work better.
Make an ensemble. Train several NNs on the same dataset, but with a different initialization. This will produce slightly different classifiers and you can combine their output to get a small increase in accuracy.
Cross-entropy loss. You don't mention what loss function you are using, if its not Cross-entropy, then you should start using it. All the high accuracy classifiers use cross-entropy loss.
Switch to backpropagation and Stochastic Gradient Descent. I do not know the effect of using a different optimization algorithm, but backpropagation might outperform the optimization algorithm you are currently using, and you could combine this with other optimizers such as Adagrad or ADAM.
Other small changes that might increase accuracy are changing the activation functions (like ReLU), shuffle training samples after every epoch, and do data augmentation.

Continuously train MATLAB ANN, i.e. online training?

I would like to ask for ideas what options there is for training a MATLAB ANN (artificial neural network) continuously, i.e. not having a pre-prepared training set? The idea is to have an "online" data stream thus, when first creating the network it's completely untrained but as samples flow in the ANN is trained and converges.
The ANN will be used to classify a set of values and the implementation would visualize how the training of the ANN gets improved as samples flows through the system. I.e. each sample is used for training and then also evaluated by the ANN and the response is visualized.
The effect that I expect is that for the very first samples the response of the ANN will be more or less random but as the training progress the accuracy improves.
Any ideas are most welcome.
Regards, Ola
In MATLAB you can use the adapt function instead of train. You can do this incrementally (change weights every time you get a new piece of information) or you can do it every N-samples, batch-style.
This document gives an in-depth run-down on the different styles of training from the perspective of a time-series problem.
I'd really think about what you're trying to do here, because adaptive learning strategies can be difficult. I found that they like to flail all over compared to their batch counterparts. This was especially true in my case where I work with very noisy signals.
Are you sure that you need adaptive learning? You can't periodically re-train your NN? Or build one that generalizes well enough?