Large Neural Network Pruning

I have done some experiments on neural network pruning, but only on small models. I used to prune the relevant weights as follows (similarly to how it is explained in the official tutorial https://pytorch.org/tutorials/intermediate/pruning_tutorial.html):
import torch.nn.utils.prune as prune

parameters_to_prune = []
for name, module in model.named_modules():
    if 'layer' in name:
        # use the module object directly; getattr(model, name) breaks on nested names
        parameters_to_prune.append((module, 'weight'))

prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=sparsity_constant,
)
The main problem with this approach is that I have to define a list (or tuple) of layers to prune. This works when I define my model by hand and know the names of the different layers (for example, in the code above I knew that all the fully connected layers had the string "layer" in their name).
How can I avoid this process, and define a pruning method that prunes all the parameters of a given model, without having to call the layers by name?
All in all, I'm looking for a function that, given a model and a constant of sparsity, globally prunes the given model (by masking it):
model = models.resnet18()
function_that_prunes(model, sparsity_constant)
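A minimal sketch of what such a function could look like: instead of matching modules by name, match them by type. (Pruning only the weights of Conv2d and Linear modules is an assumption here; extend the isinstance check for other layer types.)

import torch.nn as nn
import torch.nn.utils.prune as prune

def function_that_prunes(model, sparsity_constant):
    # collect the weight of every prunable module, regardless of its name
    parameters_to_prune = [
        (module, 'weight')
        for module in model.modules()
        if isinstance(module, (nn.Conv2d, nn.Linear))
    ]
    prune.global_unstructured(
        parameters_to_prune,
        pruning_method=prune.L1Unstructured,
        amount=sparsity_constant,
    )

This masks weights in place; calling prune.remove(module, 'weight') afterwards makes the masking permanent for each pruned module.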

Related

Calculate number of parameters in neural network

I am wondering whether the number of parameters in models like ResNet18, VGG16, and DenseNet201 would change if we changed the input size to the model.
I measured the number of parameters with the following command:
pytorch_total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
Also, I have tried this snippet, and the number of parameters did not change for different input sizes:
import torchvision.models as models
from torchsummary import summary  # pip install torchsummary

model = models.resnet18(pretrained=False)
model.cuda()
summary(model, (3, 64, 64))  # resnet18 expects 3 input channels
No, it would not. Parameters of a model have the purpose of processing the input as it propagates through the network pipeline.
The parameters are trained to serve their purpose, which is defined by the training task. Consider an increase in the number of parameters based on the input: what would their values be? Would they be random? How would these new parameters, with new values, affect the inference of the model?
Such a sudden, random change to the fine-tuned, well-trained parameters of the model would be impractical. Maybe there are some other algorithms that I am unaware of that change their parameter collection based on the input, but the architectures mentioned in the question do not support such functionality.
Trainable parameters do not change with a change in input size. If you look at the weights in the first layer of the model with the command list(model.parameters())[0].shape, you can see that they do not depend on the height and width of the input, but only on the number of input channels (e.g. grayscale, RGB, hyperspectral), which accounts for a very insignificant share of the parameters in bigger models. For further information about getting the input shape, you can see this toy example.
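To see this concretely, here is a short check (the shape shown in the comment is for torchvision's ResNet18):

import torchvision.models as models

model = models.resnet18(pretrained=False)

# the first conv weight has shape (out_channels, in_channels, kH, kW):
# torch.Size([64, 3, 7, 7]) -- it depends on channels, never on height or width
print(list(model.parameters())[0].shape)

# total trainable parameters; identical no matter what resolution you feed in
print(sum(p.numel() for p in model.parameters() if p.requires_grad))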

Predictions using Convolutional Neural Networks and DL4J

This is my first time working with DL4J (Deep Learning for Java) and also my first Convolutional Neural Network. My goal is to use a Convolutional Neural Network to give me some predicted values about an image. I gathered and labelled my images myself. The labels or expected outputs consist of two numbers between 0 and 1 (I just wrote them in the file name, like 0.01x0.87.jpg).
Now I can't find any way to use DL4J's DataSetIterator class in a way that lets me also set my label values.
Is there a simple way to tell DL4J that I want to train my Network to recognize that image 0.01x0.01.jpg should spit out the values 0.01 and 0.01?
What you want to do is usually known as regression. In contrast to classification, where the output is a discrete class, in regression any value can be the target.
In your case, you will likely want to use a network architecture that uses either a sigmoid (which forces your values to be between 0 and 1) or an identity (which keeps the values as is, i.e. allows for them to be outside of the 0 to 1 range) activation function.
As you have two values that you are trying to predict, you will have to also define that you are using two outputs.
So much for your model architecture.
For data loading, you can use the ImageRecordReader and pass it a PathMultiLabelGenerator of your own. When you implement the PathMultiLabelGenerator interface, you get the full path of the image as a string and can do whatever you want with it: for example, remove the file extension, split on "x", and parse the filename into a list of DoubleWritable. DoubleWritable is just a simple wrapper class for double, so creating one is as easy as passing the actual value to the constructor.
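The PathMultiLabelGenerator implementation itself is Java, but the parsing logic it has to perform is simple; here it is sketched in Python for illustration (the filename format is taken from the question):

import os

def labels_from_path(path):
    # '/data/0.01x0.87.jpg' -> [0.01, 0.87]
    name, _ = os.path.splitext(os.path.basename(path))  # drop directory and '.jpg'
    return [float(v) for v in name.split('x')]          # split on 'x', parse doubles

print(labels_from_path('/data/0.01x0.87.jpg'))  # [0.01, 0.87]

Inside the Java implementation, each parsed double would then be wrapped in a DoubleWritable as described above.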
To create a dataset iterator you can now follow the documentation on RecordReaderDataSetIterator.

How to create a "Denoising Autoencoder" in Matlab?

I know MATLAB has the function trainAutoencoder(input, settings) to create and train an autoencoder. The result is capable of running the two functions encode and decode.
But this is only applicable to the case of normal autoencoders. What if you want to have a denoising autoencoder? I searched and found some sample code where they used the network function to convert the autoencoder to a normal network and then ran train(network, noisyInput, smoothOutput) like a denoising autoencoder.
But there are multiple missing parts:
How can I use this new network object to encode new data points? It doesn't support encode().
How can I get the latent variables (the features) out of this network?
I would appreciate it if anyone could help me resolve this issue.
Thanks,
-Moein
At present (2019a), MATLAB does not let users add layers manually to an autoencoder. If you want to build your own, you will have to start from scratch using the layers provided by MATLAB.
In order to use trainNetwork(...) to train your model, you will have to find a way to put your data into an object called imageDatastore. The difficulty with an autoencoder's data is that there are NO labels, which imageDatastore expects, hence you will have to find a smart way around it: essentially you are dealing with a so-called OCC (One-Class Classification) problem.
https://www.mathworks.com/help/matlab/ref/matlab.io.datastore.imagedatastore.html
Use activations(...) to dump outputs from intermediate (hidden) layers
https://www.mathworks.com/help/deeplearning/ref/activations.html?searchHighlight=activations&s_tid=doc_srchtitle
I swung between using MATLAB and Python (Keras) for deep learning for a couple of weeks, and eventually I chose the latter, even though I am a long-term, loyal MATLAB user and a rookie in Python. My two cents are that there are too many restrictions in the former when it comes to deep learning.
Good luck.:-)
If by 'simulation' you mean prediction/inference, simply use activations(...) to dump outputs from any intermediate (hidden) layer, as I mentioned earlier, so that you can check them.
Another way is to construct an identical network with the encoding part only, copy your trained parameters into it, and feed it your simulated signals.
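Since the answer above recommends Keras, here is a minimal sketch of that route (the layer sizes, noise level, and toy data are illustrative assumptions, not from the question). It shows both the denoising training and an encode()-style sub-model for the latent features:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# toy data: clean signals plus additive Gaussian noise (assumed setup)
clean = np.random.rand(1000, 64).astype('float32')
noisy = clean + 0.1 * np.random.randn(1000, 64).astype('float32')

inputs = keras.Input(shape=(64,))
latent = layers.Dense(16, activation='relu')(inputs)      # encoder -> latent features
outputs = layers.Dense(64, activation='sigmoid')(latent)  # decoder

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(noisy, clean, epochs=10, batch_size=32)   # noisy in, clean target out

# the 'encode' equivalent: a sub-model that stops at the latent layer
# and shares the trained weights with the full autoencoder
encoder = keras.Model(inputs, latent)
features = encoder.predict(clean[:5])

The encoder sub-model is the same trick as the "identical network with the encoding part only" suggested above, except that Keras shares the layers between the two models, so no parameter copying is needed.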

Object detection for a single object only

I have been working with object detection, but these methods consist of very deep neural networks and require lots of memory to store the trained models. E.g. I once tried to train a Mask R-CNN model, and the weights took 200 MB.
However, my focus is on detecting a single object only, so I guess these methods are not suitable. Is there any object detection method that can do this job with a low memory requirement?
You can try SSD or Faster R-CNN; they are easily available in the TensorFlow Object Detection API.
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
Here you can get pre-trained models and config files.
You can select your model by looking at the speed and mAP (accuracy) columns, as per your requirements.
Following mukul's answer, I specifically recommend you check out SSDLite-MobileNetV2.
It's a lightweight model which is still expressive enough for good results.
Especially when you're restricting yourself to a single class, as you can see in the example of FaceSSD-MobileNetV2 here (note, however, that this is vanilla SSD).
So you can simply take the pre-trained SSDLite-MobileNetV2 model with the corresponding config file and modify it for a single class.
This means changing num_classes to 1, modifying the label_map.pbtxt, and of course preparing the dataset with the single class you want.
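For concreteness, a single-class label_map.pbtxt is just one entry (the class name below is a placeholder):

item {
  id: 1
  name: 'my_object'
}

together with num_classes: 1 in the model section of the pipeline config.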
If you want a more robust model, but one which has no pre-trained model, you can use an FPN version.
Check out this config file, which uses MobileNetV1, and modify it for your needs (e.g. switching to MobileNetV2, switching to use_depthwise, etc.).
On the one hand there's no pre-trained detection model, but on the other the detection head is shared over all (relevant) scales, so it's somewhat easier to train.
So simply fine-tune it from the corresponding classification checkpoint from here.

Evaluating neural networks built with Comp Graph dl4j

I am trying to build a complex neural network using Computation Graph implementation in Deeplearning4J. I need to have multiple outputs so that's why I can't go with the generic MultiLayerConfiguration.
However, my problem is that in this case I do not know how to evaluate my model, and I would like at least to know the accuracy.
Has anybody worked with Comp Graphs in dl4j?
First of all, yes: tons of people use the computation graph. They usually start from our existing examples, though, and tend to mainly use it for things like seq2seq.
As for your question on evaluation, it's conceptually the same as for a multi layer network, but how you evaluate is likely going to be task specific. If you think about where evaluation happens, it's always tied to a task (classification, regression, binary classification, ...) with an output layer. In the most common case you have only one output, which produces a classification; in that case you can just use the first array it outputs.
Otherwise, for multiple outputs, you'd have to define what you're evaluating. Usually tasks merge to one path.
If they don't, you'd have multiple output layers, and you would want an evaluation object per output.
Computation graphs and multi layer networks both use a .output method to give you the raw arrays. That is typically what you pass to eval.eval.