How to add a custom layer and loss function into a pretrained CNN model by matconvnet? - loss

I'm new to matconvnet. Recently, I'd like to try a new loss function instead of the existing one in pretrained model, e.g., vgg-16, which usually uses softmax loss layer. What's more, I want to use a new feature extractor layer, instead of pooling layer or max layer. I know there are 2 CNN wrappers in matconvnet, simpleNN and DagNN respectively, since I'm using vgg-16,a linear model which has a linear sequence of building blocks. So, in simpleNN wrapper, how to create a custom layer in detail, espectially the procedure and the relevant concept, e.g., do I need to remove layers behind the new feature extractor layer or just leave them ? And I know how to compute the derivative of the loss function so the details of computation inside the layer is not that important in this question, I just want to know the procedure represented by codes. Could someone help me? I'll appreciate it a lot !

You can remove the older error or objective layer
net.layer('abc')=[];
and you can add new error code in vl_nnloss() file

Related

how to improve TensorFlow object detection model?

I need to diagnosis captcha for a project. I did this using the object_detection provided by Tensorflow.
also, I added 500 captcha samples by turning images into XML by LabelImg and then to TFRecord.
beside I used "faster_rcnn_inception_v2_coco_2018_01_28"
The problem is that the accuracy of the machine is very low.
My questions are:
Can the problem be solved by increasing the number of training data?
Should I change my algorithm?
How effective is the use of the Yolo 3 instead of the detection object provided by Tensorflow?
Q. Can the problem be solved by increasing the number of training data?
A. It would be depend on how many data you can get more. I think that only increasing the number of training data is not good approach.
Consider using Fine-tuning existing trained model to detect object class. If you want to fine-tune the model, you need to be careful class label assignment because existing trained model like YOLO3, Faster RCNN, etc. has no label "captcha" in their training dataset.
I recommend you to refer to this website that can help you to fine-tune the model.
Q. Should I change my algorithm?
A. Do as you wish.
Q. How effective is the use of the Yolo 3 instead of the detection object provided by Tensorflow?
A. In my opinion, two different models are much the same if you don't need to consider inference time.

create deep network in matlab with logsig layer instead of softmax layer

I want to create a deep classification net, but my classes aren't mutually exclusive (that is what sofmaxlayer do).
Is it possible to define a non mutually exclusive classification layer (i.e., a data can be in more than one class)?
One way to do it, it would be with a logsig function in the classification layer, instead of a softmax, but I have no idea how to acomplish that....
In CNN you can have multiple class in last layer as you know. But if I understand correctly your need in last layer an out put with that is in a range of numbers instead of 1 or 0 for each class. Its mean you need regression. If your labels support this task it's OK and you can do it with regression just like what happen in bounding box regression for localization. And you don't need soft-max in last layer. just use other activation functions that produce sufficient out put for your task.

Keras: using VGG16 to detect specific, non-generic item?

I'm learning about using neural networks and object detection, using Python and Keras. My goal is to detect something very specific in an image, let's say a very specific brand / type of car carburetor (part of a car engine).
The tutorials I found so far use the detection of cats and dogs as example, and many of those use a pre-trained VGG16 network to improve performance.
If I want to detect only my specific carburetor, and don't care about anything else in the image, does it make sense to use VGG16.? Is VGG16 only useful when you want to detect many generic items, rather than one specific item.?
Edit: I only want to know if there is a specific object (carburetor) in the image. No need to locate or put a box around it. I have about 1000 images of this specific carburetor for the network to train on.
VGG16 or some other pretrained neural network is primarily used for classification. That means that you can use it to distinguish in what category the image belongs in.
As i understand, what you need is to detect where in an image a carburetor is located. For something like that you need a different, more complicated approach.
You could use
NVIDIA DetectNet,
YOLO,
Segmentation networks such as U-net or
SegNet, etc...
The VGG 16 can be used for that. (Now is it the best? This is an open question without a clear answer)
But you must replace its ending to fit your needs.
While a regular VGG model has about a thousand classes at its end, a cats x dogs VGG has its end changed to have two classes. In your case, you should change its ending to have only one class.
In Keras, you'd have to load the VGG model with the option include_top = False.
And you should then add your own final Dense layers (two or three dense layers at the end), making sure that the last layer has only one unit: Dense(1, activation='sigmoid').
This will work for "detecting" (yes / no results).
But if your goal is "locating/segmentation", then you should create your own version of a U-net or a SegNet, for instance.

Can I use next layer's output as current layer's input by Keras?

In text generate mission, we usually use model's last output as current input to generate next word. More generalized, I want to achieve a neural network that regards next layer's finally hidden state as current layer's input. Just like the following(what confuses me is the decoder part):
But I have read Keras document and haven't found any functions to achieve it.
Can I achieve this structure by Keras? How?
What you are asking is an autoencoders, you can find similar structures in Keras.
But there are certain details that you should figure it out on your own. Including the padding strategy and preprocessing your input and output data. Your input cannot get dynamic input size, so you need to have a fixed length for input and outputs. I don't know what do you mean by arrows who join in one circle but I guess you can take a look at Merge layer in Keras (basically adding, concatenating, and etc.)
You probably need 4 sequential model and one final model that represent the combined structure.
One more thing, the decoder setup of LSTM (The Language Model) is not dynamic in design. In your model definition, you basically introduce a fixed inputs and outputs for it. Then you prepare the training correctly, so you don't need anything dynamic. Then during the test, you can predict each decoded word in a loop by running the model once predict the next output step and run it again for next time step and so on.
The structure you have showed is a custom structure. So, Keras doesn't provide any class or wrapper to directly build such structure. But YES, you can build this kind of structure in Keras.
So, it looks like you need LSTM model in backward direction. I didn't understand the other part which probably looks like incorporating previous sentence embedding as input to the next time-step input of LSTM unit.
I rather encourage you to work with simple language-modeling with LSTM first. Then you can tweak the architecture later to build an architecture depicted in figure.
Example:
Text generation with LSTM in Keras

Neural Network bias

I'm building a feed forward neural network, and I'm trying to decide how to implement the bias. I'm not sure about two things:
1) Is there any downside to implementing the bias as a trait of the node as opposed to a dummy input+weight?
2) If I implement it as a dummy input, would it be input just in the first layer (from the input to the hidden layer), or would I need a dummy input in every layer?
Thanks!
P.S. I'm currently using 2d arrays to represent weights between layers. Any ideas for other implementation structures? This isn't my main question, just looking for food for thought.
Implementation doesn't matter as long as the behaviour is right.
Yes, it is needed in every layer.
2d array is a way to go.
I'd suggest to include bias as another neuron with constant input 1. This will make it easier to implement - you don't need a special variable for it or anything like that.