How can I reduce U-Net parameters? - image-segmentation

I need to implement a U-Net for a semantic segmentation task. Is it possible to decrease the number of parameters in a U-Net by reducing the image size, for example from (256,256,3) to (32,32,3)? Or are there other ways?

For a fully convolutional architecture, the number of parameters is independent of the input size: the filter sizes are fixed and do not change with image size; only the computed activation maps do.
If you want to reduce the model size, you can:
Decrease the kernel size of the convolutions.
Reduce the number of filters (out_channels) in the conv layers.
Apply group convolutions instead of regular ones (see the sketch below).
Note that reducing the number of parameters (model size) does not always mean reducing the number of FLOPs required to evaluate the model. With convolutional networks, the number of operations required for evaluation depends heavily on the input size.
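As a rough sketch of these options, assuming a Keras-style implementation (the block structure, filter counts and groups setting below are illustrative, not taken from the question), a slimmed-down encoder block might look like this:

# Minimal sketch of a slimmed-down U-Net encoder block (Keras).
# Filter counts, the `groups` value, and the block layout are assumptions
# made for illustration; they are not prescribed by the question.
from tensorflow.keras import Input, Model, layers

def encoder_block(x, filters, groups=1):
    # Fewer filters and grouped convolutions both shrink the parameter count;
    # the input resolution has no effect on it.
    x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu", groups=groups)(x)
    x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu", groups=groups)(x)
    skip = x                                   # skip connection for the decoder
    x = layers.MaxPooling2D((2, 2))(x)
    return x, skip

inputs = Input(shape=(256, 256, 3))
x, skip1 = encoder_block(inputs, filters=16)        # e.g. 16 filters instead of 64
x, skip2 = encoder_block(x, filters=32, groups=4)   # grouped conv: roughly 1/4 the weights
model = Model(inputs, x)
model.summary()  # the parameter count stays the same for any input resolution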

Related

Choosing a margin for contrastive loss in a siamese network

I'm building a siamese network for a metric-learning task, using a contrastive loss function, and I'm uncertain how to set the 'margin' hyperparameter for the loss.
My inputs to the loss function are currently 1024-dimensional dense embeddings from an RNN layer. Does the dimensionality of that input affect how I pick a margin? Should I use a dense layer to project it to a lower-dimensional space first? Any pointers on how to pick a specific margin value (or any relevant research) would be really appreciated! In case it matters, I'm using PyTorch.
You don't need to project it to a lower dimensional space.
How the margin relates to the dimensionality of the space depends on how the loss is formulated: if you don't normalize the embedding values and compute an unnormalized difference between vectors (e.g. a Euclidean distance), the right margin will depend on the dimensionality. But if you compute a normalized distance, such as cosine distance, the margin value won't depend on the dimensionality of the embedding space.
Ranking (or contrastive) losses are explained here, which might be useful: https://gombru.github.io/2019/04/03/ranking_loss/
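For what it's worth, here is a minimal sketch of a contrastive loss on cosine distance in PyTorch (the function name, the 0.5 margin, and the batch shapes are illustrative assumptions): with the distance bounded in [0, 2], the margin no longer depends on the 1024-dimensional embedding size.

import torch
import torch.nn.functional as F

def contrastive_loss(emb1, emb2, label, margin=0.5):
    # label = 1 for similar pairs, 0 for dissimilar pairs
    cos_dist = 1.0 - F.cosine_similarity(emb1, emb2)            # bounded in [0, 2]
    pos_term = label * cos_dist.pow(2)                          # pull similar pairs together
    neg_term = (1 - label) * F.relu(margin - cos_dist).pow(2)   # push dissimilar pairs past the margin
    return 0.5 * (pos_term + neg_term).mean()

# Usage with 1024-d embeddings straight from the RNN, no projection layer needed:
a, b = torch.randn(8, 1024), torch.randn(8, 1024)
y = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(a, b, y)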

Convolutional Neural Networks filter

When coding a convolutional neural network, I am unsure where to start with the convolutional layer. When different convolutional filters are used to produce different feature maps, does that mean the filters have different sizes (for example, 3x3, 2x2, etc.)?
Most examples, which are a good indication of how to go about coding a convolutional neural network, start with one convolutional layer to which you pass the layer size, a 3x3 window, and the input data shape.
model.add(Conv2D(layer_size, (3, 3), input_shape=x.shape[1:]))
The window size usually only differs in the max-pooling layer, e.g. 2x2.
model.add(MaxPooling2D(pool_size=(2,2)))
Layer sizes are usually selected from a range such as layer_size = [32, 64, 128], and you can do the same to experiment with different numbers of convolutional layers, convolution_layers = [1, 2, 3].
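As a rough sketch of that experiment with a Keras Sequential model (the helper name build_model and the input shape are made up for illustration):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

def build_model(layer_size, n_layers, input_shape):
    # Every conv layer uses the same 3x3 window; only the number of filters
    # (layer_size) and the number of conv blocks (n_layers) change.
    model = Sequential()
    model.add(Conv2D(layer_size, (3, 3), activation="relu", input_shape=input_shape))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    for _ in range(n_layers - 1):
        model.add(Conv2D(layer_size, (3, 3), activation="relu"))
        model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(1, activation="sigmoid"))
    return model

for layer_size in [32, 64, 128]:
    for convolution_layers in [1, 2, 3]:
        model = build_model(layer_size, convolution_layers, input_shape=(64, 64, 3))
        # model.compile(...) and model.fit(...) would go here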
I've never seen different kernel sizes for the filters in the same layer; although it is possible to do so, it is not a default option in the frameworks I have used. What makes filters yield different feature maps are their weights.
Across layers, different kernel sizes are used because the idea of convolutional networks is to gradually reduce dimensionality through downsampling layers (max pooling, for example), so at deeper levels you have smaller feature maps, and a smaller filter keeps the layer convolutional rather than effectively fully connected (having a kernel the same size as the feature map is equivalent to having a dense layer).
If you're starting with convolutional networks, I recommend playing with this interactive visualization of a CNN; it helped me with a lot of concepts.
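A quick way to see the parenthetical above is to compare parameter counts; this sketch (Keras, with made-up shapes) shows that a convolution whose kernel covers the whole feature map has exactly the same number of weights as a dense layer on the flattened input:

import tensorflow as tf

x = tf.keras.Input(shape=(4, 4, 128))                  # small, deep feature map
conv = tf.keras.layers.Conv2D(256, (4, 4))(x)           # kernel as large as the feature map
dense = tf.keras.layers.Dense(256)(tf.keras.layers.Flatten()(x))

print(tf.keras.Model(x, conv).count_params())   # 4*4*128*256 + 256 = 524544
print(tf.keras.Model(x, dense).count_params())  # (4*4*128)*256 + 256 = 524544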

TensorFlow: Binary classification accuracy

In the context of a binary classification task, I use a neural network with 1 hidden layer using a tanh activation function. The input comes from a word2vec model and is normalized.
The classifier accuracy is between 49%-54%.
I used a confusion matrix to get a better understanding of what’s going on. I studied the impact of the number of features in the input layer and the number of neurons in the hidden layer on the accuracy.
What I can observe from the confusion matrix is that, depending on the parameters, the model sometimes predicts most of the examples as positive and sometimes most of them as negative.
Any suggestion why this issue happens? And which other points (other than input size and hidden layer size) might impact the accuracy of the classification?
Thanks
It's a bit hard to guess given the information you provide.
Are the labels balanced (50% positives, 50% negatives)? If so, this would mean your network is not training at all, since your performance roughly corresponds to random guessing. Is there maybe a bug in the preprocessing? Or is the task too difficult? What is the training set size?
I don't believe that the number of neurons is the issue, as long as it's reasonable, i.e. hundreds or a few thousand.
Alternatively, you can try another loss function, namely cross entropy, which is standard for multi-class classification and can also be used for binary classification:
https://www.tensorflow.org/api_docs/python/nn/classification#softmax_cross_entropy_with_logits
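As a minimal sketch of that setup with the current Keras API (the layer sizes here are placeholders; the link above points to the lower-level logits op):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(200, activation="tanh", input_shape=(300,)),  # 1 hidden layer, tanh
    tf.keras.layers.Dense(1),                                           # raw logits, no sigmoid here
])
# from_logits=True applies the numerically stable sigmoid cross entropy
model.compile(optimizer="adam",
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=["accuracy"])
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10)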
Hope this helps.
The data set is well balanced, 50% positive and negative.
The training set shape is (411426, X).
The test set shape is (68572, X).
X is the number of features coming from word2vec, and I tried values between [100, 300].
I have 1 hidden layer, and the number of neurons I tested varied between [100, 300].
I also tested with much smaller feature/neuron sizes: 2-20 features and 10 neurons in the hidden layer.
I also use cross entropy as the cost function.

How to remove unwanted connections from an trained caffe model?

I have trained a Faster R-CNN model to detect human faces in an image using caffe. My current model size is 530MB. I wanted to reduce the size of my model, so I came across Deep Compression by Song Han.
I've set the less significant weights to 0 in my model using Pycaffe. The model size isn't reduced, though. How can I remove those insignificant connections from the trained caffe model so that the size of the model is reduced?
Since the Blob data type in caffe (the basic "container" of numerical arrays) does not support a "sparse" representation, replacing weights with zeros does not change the storage complexity: caffe still needs space to store these zeros. This is why you do not see a reduction in model size.
In order to prune connections you have to ensure the zeros follow a certain pattern: for example, if an entire row of an "InnerProduct" is zero, you can eliminate one dimension of the previous layer, etc.
These modifications can be made manually, with care, using net surgery. Read more about it here (that example is actually about adding connections, but you can apply the same steps to prune connections); a rough sketch is shown below.
You might find the SVD "trick" useful for reducing model complexity.
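A rough sketch of the net-surgery pruning step, assuming pycaffe and an illustrative layer name 'fc6' (the prototxt/caffemodel file names are placeholders, and the smaller prototxt must already declare the reduced num_output):

import caffe
import numpy as np

big = caffe.Net('big_deploy.prototxt', 'big.caffemodel', caffe.TEST)
small = caffe.Net('small_deploy.prototxt', caffe.TEST)   # same net, but 'fc6' has fewer outputs

W = big.params['fc6'][0].data          # shape: (num_output, input_dim)
b = big.params['fc6'][1].data
keep = np.abs(W).sum(axis=1) > 0       # keep only rows that were not zeroed out
small.params['fc6'][0].data[...] = W[keep]
small.params['fc6'][1].data[...] = b[keep]
# Note: any layer consuming 'fc6' must have its corresponding input weights
# trimmed in the same way before saving.
small.save('pruned.caffemodel')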
@Shai's answer explains well why your model size wasn't reduced.
As a supplement, to make the weights more sparse and obtain model compression in size, you can try the caffe implementation for Structurally Sparse Deep Neural Networks.
Its main idea is to add some regularizers to the loss function, which are in fact L2-norms of weights grouped by row, column, or channel, etc. (assuming the weights from a layer have shape (num_out, channel, row, column)). During training, these regularizers make weights within the same group decay uniformly, so the weights become more sparse and it is easier to eliminate the weights in a whole row or column, or even a whole channel.
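For illustration only, here is a rough sketch of such a group regularizer written in PyTorch (the linked work implements it inside caffe; the channel grouping and the 1e-4 weight below are arbitrary assumptions):

import torch

def group_lasso_channels(weight, eps=1e-8):
    # weight shape: (num_out, channel, row, column)
    # One L2 norm per input channel; summing the norms pushes whole channels
    # toward exactly zero so they can be pruned afterwards.
    per_channel_norm = weight.pow(2).sum(dim=(0, 2, 3)).add(eps).sqrt()
    return per_channel_norm.sum()

w = torch.randn(64, 32, 3, 3, requires_grad=True)
reg = 1e-4 * group_lasso_channels(w)   # added to the task loss during training
reg.backward()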

Issues with neural network

I am having some issues with using a neural network. I am using a non-linear activation function for the hidden layer and a linear function for the output layer. Adding more neurons in the hidden layer should have increased the capability of the NN and made it fit the training data better / have less error on the training data.
However, I am seeing a different phenomena. Adding more neurons is decreasing the accuracy of the neural network even on the training set.
Here is the graph of the mean absolute error with an increasing number of neurons. The accuracy on the training data is decreasing. What could be the cause of this?
Is it that the MATLAB nntool I am using splits the data randomly into training, test, and validation sets for checking generalization, instead of using cross-validation?
Also, I can see lots of negative output values when adding neurons, while my targets are supposed to be positive. Could that be another issue?
I am not able to explain the behavior of the NN here. Any suggestions? Here is the link to my data consisting of the covariates and targets:
https://www.dropbox.com/s/0wcj2y6x6jd2vzm/data.mat
I am unfamiliar with nntool, but I would suspect that your problem is related to the selection of your initial weights. Poor initial weight selection can lead to very slow convergence or failure to converge at all.
For instance, notice that as the number of neurons in the hidden layer increases, the number of inputs to each neuron in the output layer also increases (one for each hidden unit). Say you are using a logistic sigmoid in your hidden layer (its output is always positive) and pick your initial weights from a uniform distribution over a fixed interval. Then as the number of hidden units increases, the inputs to each neuron in the output layer will also increase because there are more incoming connections. With a very large number of hidden units, your initial solution may become very large and result in poor convergence.
Of course, how this all behaves depends on your activation functions, the distribution of the data, and how it is normalized. I would recommend looking at Efficient BackProp by Yann LeCun for some excellent advice on normalizing your data and selecting initial weights and activation functions.
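A small numeric illustration of that effect (NumPy; the layer sizes and the fixed interval [-0.5, 0.5] are arbitrary assumptions): the pre-activation of an output neuron spreads out as the number of hidden units grows when the initial weights ignore the fan-in, but stays roughly constant when they are scaled by 1/sqrt(fan-in) in the spirit of Efficient BackProp.

import numpy as np

rng = np.random.default_rng(0)
for n_hidden in [10, 100, 1000]:
    pre_fixed, pre_scaled = [], []
    for _ in range(200):
        h = rng.uniform(0, 1, size=n_hidden)              # positive hidden activations (e.g. sigmoid outputs)
        w = rng.uniform(-0.5, 0.5, size=n_hidden)         # fixed interval, ignores fan-in
        pre_fixed.append(h @ w)
        pre_scaled.append(h @ (w / np.sqrt(n_hidden)))    # fan-in-scaled initial weights
    print(n_hidden, round(np.std(pre_fixed), 2), round(np.std(pre_scaled), 2))
# Typical output: the spread of the fixed-interval pre-activation grows roughly 10x
# from 10 to 1000 hidden units, while the scaled version stays about the same.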