I have trained a Faster R-CNN model to detect human faces in images using caffe. My current model size is 530 MB. I wanted to reduce the size of my model, so I came across Deep Compression by Song Han.
Using pycaffe, I have set the less significant weights in my model to zero, but the model size has not been reduced. How do I remove those insignificant connections from the trained caffe model so that its size shrinks?
Since the Blob data type in caffe (the basic "container" of numerical arrays) does not support a "sparse" representation, replacing weights with zeros does not change the storage complexity: caffe still needs space to store these zeros. This is why you do not see a reduction in model size.
In order to prune connections you have to ensure the zeros follow a certain pattern: for example, if an entire row of an "InnerProduct" layer's weight matrix is zero, you can eliminate that output neuron (and the matching input dimension of the following layer), etc.
These modifications can be made manually, and carefully, using net surgery. Read more about it here (that example is actually about adding connections, but you can apply the same steps to prune them).
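For illustration, a minimal pycaffe sketch of such surgery, assuming a second prototxt ('deploy_pruned.prototxt') in which fc6's num_output has already been reduced to the number of surviving rows; all file and layer names here are hypothetical:

import caffe
import numpy as np

# Hypothetical file and layer names -- adapt to your own net.
net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)
pruned = caffe.Net('deploy_pruned.prototxt', caffe.TEST)  # num_output of fc6 reduced

W, b = net.params['fc6'][0].data, net.params['fc6'][1].data
keep = ~np.all(W == 0, axis=1)            # keep rows that are not entirely zero

# Copy all layers, adjusting the pruned layer and its successor.
for name in net.params:
    if name == 'fc6':
        pruned.params[name][0].data[...] = W[keep]
        pruned.params[name][1].data[...] = b[keep]
    elif name == 'fc7':  # drop the matching input columns of the next layer
        pruned.params[name][0].data[...] = net.params[name][0].data[:, keep]
        pruned.params[name][1].data[...] = net.params[name][1].data
    else:
        for i, p in enumerate(net.params[name]):
            pruned.params[name][i].data[...] = p.data
pruned.save('model_pruned.caffemodel')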
You might find the SVD "trick" useful for reducing model complexity.
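As a quick sketch of that trick: factor a fully connected layer's weight matrix into two low-rank factors, which replaces one large InnerProduct layer with two smaller ones (the shapes and rank below are arbitrary examples):

import numpy as np

W = np.random.randn(4096, 9216).astype(np.float32)  # stand-in for FC weights
k = 256                                             # target rank (arbitrary)
U, S, Vt = np.linalg.svd(W, full_matrices=False)
W1 = U[:, :k] * S[:k]     # shape (4096, k)
W2 = Vt[:k]               # shape (k, 9216)
# Two layers storing k*(4096+9216) values replace one storing 4096*9216;
# W1 @ W2 is the rank-k approximation of the original W.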
@Shai's answer explains well why your model size wasn't reduced.
As a supplement, to make the weights sparser and obtain compression in model size, you can try the caffe fork for Structurally Sparse Deep Neural Networks.
Its main idea is to add some regularizers to the loss function; these are in fact L2 norms of weights grouped by row, column, or channel, etc. (assuming the weights of a layer have shape (num_out, channel, row, column)). During training, these regularizers make the weights within the same group decay uniformly, so the weights become sparser and it is easier to eliminate an entire row, column, or even a whole channel of weights.
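A minimal numpy sketch of such a group regularizer (a group Lasso penalty; the groupings and strengths are illustrative assumptions):

import numpy as np

W = np.random.randn(64, 32, 3, 3)   # (num_out, channel, row, column)

# One L2 norm per group; the regularizer is the sum of the group norms.
filter_wise = np.sqrt((W ** 2).sum(axis=(1, 2, 3))).sum()   # group = output filter
channel_wise = np.sqrt((W ** 2).sum(axis=(0, 2, 3))).sum()  # group = input channel
lambda_f, lambda_c = 1e-4, 1e-4     # regularization strengths (arbitrary)
penalty = lambda_f * filter_wise + lambda_c * channel_wise  # added to the loss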
Suppose we have a set of images and labels meant for a machine-learning classification task. The problem is that these images come with a relatively short retention policy. While one could train a model online (i.e. update it with new image data every day), I'm ideally interested in a solution that can somehow retain images for training and testing.
To this end, I'm interested if there are any known techniques, for example some kind of one-way hashing on images, which obfuscates the image, but still allows for deep learning techniques on it.
I'm not an expert on this, but the way I'm thinking about it is as follows: we have an N×N image I (say 1024×1024) with pixel values in P := {0,1,...,255}^3, and a one-way hash map f : P^(N×N) -> S. Then, when we train a convolutional neural network on I, we first map the convolutional filters via f, and then train in the high-dimensional space S. I think there is no need for f to be locality-sensitive, in the sense that pixels near each other don't need to map to values in S near each other, as long as we know how to map the convolutional filters to S. Please note that it is imperative that f is not invertible, and that the resulting stored image in S is unrecognizable.
One option for f and S is to run a convolutional neural network on I and extract the representation of I from its fully connected layer. This is not ideal, because there is a high chance this network won't retain the finer features needed for the classification task. So I think this rules out a CNN or autoencoder for f.
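For concreteness, the option described above (storing a fixed, pretrained CNN's fully connected representation instead of the raw image) might be sketched like this; the choice of model and layer name is arbitrary:

import numpy as np
import tensorflow as tf

# A fixed pretrained CNN plays the role of f; only its FC output is stored.
base = tf.keras.applications.VGG16(weights='imagenet', include_top=True)
f = tf.keras.Model(inputs=base.input, outputs=base.get_layer('fc2').output)

image = np.random.rand(1, 224, 224, 3) * 255   # stand-in for a real image
x = tf.keras.applications.vgg16.preprocess_input(image)
representation = f.predict(x)   # shape (1, 4096); store this, not the image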
Can someone tell me what to do to improve my neural network for a classification task, given the training and validation error shown in the picture? I tried stopping the training earlier so that the validation error is smaller, but it is still too high. I get a validation accuracy of 62.45%, which is too low. The dataset consists of images that show objects somewhere in the image (not centered). If I use the same network with the same number of images, but where the shown objects are always centered on the principal point, it works much better, with a validation accuracy of 95%.
One can check the following things while implementing the neural net:
Dataset Issues:
i) Check whether the input data you are feeding the network makes sense and whether it contains too much noise.
ii) Try passing random input and see if the error behavior persists. If it does, then it's time to make changes to your net (see the sketch after this list).
iii) Check that the input data has appropriate labels.
iv) If the input data is not shuffled and is passed in a specific label order, this has a negative impact on learning. So, shuffling data and labels together is necessary.
v) Reduce the batch size and make sure a batch doesn't contain only a single label.
vi) Too much data augmentation is not good, as it has a regularizing effect and, when combined with other forms of regularization (L2 weight decay, dropout, etc.), can cause the net to underfit.
vii) Data must be pre-processed as the task requires. For example, if you are training the network for face classification, then face images with little or no background should be passed to the network for learning.
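As a sketch of the random-input check in item ii): train a fresh copy of the model on noise of the same shape as the data and compare the loss curves. X_train, y_train, and build_model() are placeholders for your own data and model:

import numpy as np

X_noise = np.random.rand(*X_train.shape).astype(np.float32)

for name, X in [('real', X_train), ('noise', X_noise)]:
    m = build_model()                          # fresh weights for a fair comparison
    h = m.fit(X, y_train, epochs=5, verbose=0)
    print(name, h.history['loss'])
# If the loss falls the same way on noise as on real data, the network
# is memorizing labels rather than learning from the inputs.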
Implementation Issues:
i) Check your loss function, weight initialization, and gradient checking to make sure backpropagation works appropriately.
ii) Visualize the biases, activations, and weights of each layer with the help of a visualization library like TensorBoard.
iii) Try a dynamic learning rate, where the learning rate changes over a designed set of epochs (see the sketch after this list).
iv) Increase the network size by adding more layers or more neurons, as the current network might not be large enough to capture the distinguishing features.
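A sketch of the dynamic learning rate from item iii), using a Keras callback (the decay factor and epoch boundaries are arbitrary choices):

import tensorflow as tf

def schedule(epoch, lr):
    # Drop the learning rate by 10x at epochs 30 and 60 (arbitrary boundaries).
    return lr * 0.1 if epoch in (30, 60) else lr

lr_callback = tf.keras.callbacks.LearningRateScheduler(schedule)
# model.fit(X_train, y_train, epochs=90, callbacks=[lr_callback])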
In the context of binary classification, I use a neural network with one hidden layer using a tanh activation function. The input comes from a word2vec model and is normalized.
The classifier accuracy is between 49% and 54%.
I used a confusion matrix to get a better understanding of what's going on. I studied the impact of the number of features in the input layer and of the number of neurons in the hidden layer on the accuracy.
What I can observe from the confusion matrix is that, depending on the parameters, the model sometimes predicts most of the samples as positive and sometimes most of them as negative.
Any suggestions as to why this happens? And which other factors (other than input size and hidden layer size) might impact the accuracy of the classification?
Thanks
It's a bit hard to guess given the information you provide.
Are the labels balanced (50% positives, 50% negatives)? If so, your accuracy roughly corresponds to chance level, which would mean your network is not training at all. Is there maybe a bug in the preprocessing? Or is the task too difficult? What is the training set size?
I don't believe that the number of neurons is the issue, as long as it's reasonable, i.e. hundreds or a few thousand.
Alternatively, you can try another loss function, namely cross entropy, which is standard for multi-class classification and can also be used for binary classification:
https://www.tensorflow.org/api_docs/python/nn/classification#softmax_cross_entropy_with_logits
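For a binary task, the sigmoid variant of that op family is the natural fit; a minimal sketch (the toy logits and labels are made up):

import tensorflow as tf

logits = tf.constant([[0.3], [-1.2], [2.0]])   # raw network outputs
labels = tf.constant([[1.0], [0.0], [1.0]])    # binary targets

# Numerically stable cross entropy computed directly from the logits.
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))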
Hope this helps.
The data set is well balanced, 50% positive and negative.
The training set shape is (411426, X).
The test set shape is (68572, X).
X is the number of features coming from word2vec, and I tried values between [100, 300].
I have 1 hidden layer, and the number of neurons that I tested varied between [100, 300].
I also tested with much smaller feature/neuron sizes: 2-20 features and 10 neurons in the hidden layer.
I also use cross entropy as the cost function.
I am having some issues using a neural network. I am using a non-linear activation function for the hidden layer and a linear function for the output layer. Adding more neurons to the hidden layer should have increased the capability of the NN and made it fit the training data better / have less error on the training data.
However, I am seeing a different phenomenon: adding more neurons decreases the accuracy of the neural network, even on the training set.
Here is the graph of the mean absolute error with an increasing number of neurons. The accuracy on the training data is decreasing. What could be the cause of this?
Could it be that MATLAB's nntool splits the data randomly into training, test, and validation sets to check generalization, instead of using cross-validation?
Also, I see a lot of negative output values as I add neurons, while my targets are supposed to be positive. Could this be another issue?
I am not able to explain the behavior of the NN here. Any suggestions? Here is the link to my data, consisting of the covariates and targets:
https://www.dropbox.com/s/0wcj2y6x6jd2vzm/data.mat
I am unfamiliar with nntool but I would suspect that your problem is related to the selection of your initial weights. Poor initial weight selection can lead to very slow convergence or failure to converge at all.
For instance, notice that as the number of neurons in the hidden layer increases, the number of inputs to each neuron in the output layer also increases (one for each hidden unit). Say you are using a logistic activation in your hidden layer (always positive) and pick your initial weights from a uniform distribution over a fixed interval. Then, as the number of hidden units increases, the summed input to each neuron in the output layer also increases, because there are more incoming connections. With a very large number of hidden units, your initial solution may become very large and result in poor convergence.
Of course, how this all behaves depends on your activation functions, the distribution of the data, and how it is normalized. I would recommend looking at Efficient Backprop by Yann LeCun for some excellent advice on normalizing your data and selecting initial weights and activation functions.
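To illustrate the point about initial weights: one common remedy, in the spirit of Efficient Backprop, is to scale the initial weights by the fan-in so that each unit's summed input stays bounded as the hidden layer grows. A numpy sketch (the layer sizes are arbitrary):

import numpy as np

n_in, n_hidden = 20, 500

# Naive: a fixed interval regardless of fan-in -- each output unit's
# summed input grows with n_hidden, which can hurt convergence.
W_naive = np.random.uniform(-0.5, 0.5, size=(n_hidden, n_in))

# Fan-in scaling: standard deviation 1/sqrt(fan_in), so the summed
# input to each unit has roughly constant variance at any width.
W_hidden = np.random.randn(n_hidden, n_in) / np.sqrt(n_in)
W_output = np.random.randn(1, n_hidden) / np.sqrt(n_hidden)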
In Matlab (Neural Network Toolbox + Image Processing Toolbox), I have written a script to extract features from images and construct a "feature vector". My problem is that some features have more data than others. I don't want these features to have more significance than others with less data.
For example, I might have a feature vector made up of 9 elements:
hProjection = [12,45,19,10];
vProjection = [3,16,90,19];
area = 346;
featureVector = [hProjection, vProjection, area];
If I construct a Neural Network with featureVector as my input, the area only makes up 10% of the input data and is less significant.
I'm using a feed-forward back-propagation network with a tansig transfer function (pattern-recognition network).
How do I deal with this?
When you present your input data to the network, each column of your feature vector is fed to the input layer as an attribute by itself.
The only bias you have to worry about is the scale of each feature (i.e., we usually normalize the features to the [0,1] range); see the sketch below.
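For instance, a simple per-feature min-max rescaling in numpy (the two sample rows are made-up data following the featureVector layout above):

import numpy as np

# rows = samples, columns = the 9 features of featureVector
X = np.array([[12, 45, 19, 10,  3, 16, 90, 19, 346],
              [ 8, 30, 25, 14,  5, 20, 70, 11, 512]], dtype=float)

X_min, X_max = X.min(axis=0), X.max(axis=0)
span = np.where(X_max > X_min, X_max - X_min, 1.0)  # avoid division by zero
X_scaled = (X - X_min) / span                       # every feature now in [0, 1]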
Also, if you believe that the features are dependent/correlated, you might want to perform some kind of attribute selection technique. In your case it depends on the meaning of the hProj/vProj features...
EDIT:
It just occurred to me that, as an alternative to feature selection, you can use a dimensionality reduction technique (PCA/SVD, Factor Analysis, ICA, ...). For example, factor analysis can be used to extract a set of latent hidden variables on which those hProj/vProj features depend. So instead of these 8 features, you can get 2 features such that the original 8 are a linear combination of the new two (plus some error term). Refer to this page for a complete example.
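A quick sketch of that alternative with scikit-learn's PCA, compressing the 8 projection features into 2 components (the choice of 2 and the toy data are just for illustration):

import numpy as np
from sklearn.decomposition import PCA

proj = np.random.rand(100, 8)          # (n_samples, 8): hProjection + vProjection
pca = PCA(n_components=2)
latent = pca.fit_transform(proj)       # (n_samples, 2) latent features
print(pca.explained_variance_ratio_)   # variance captured by the 2 components
# Concatenate latent with the remaining feature(s), e.g. area, as the new input.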