I am trying to understand the caffe library, so I am stepping through feature_extraction.cpp and classification.cpp line by line.
In those cpp files I came across layers, prototxt files, caffemodel files, net.cpp, caffe.pb.cc, and caffe.pb.h.
I know caffe is built from different layers, so the layer files inside the layers folder are used.
A prototxt file describes the structure of a particular network, such as GoogLeNet or AlexNet; different nets have different structures.
A caffemodel file holds the weights of a model trained with the caffe library for a specific net structure.
What do net.cpp and caffe.pb.cc do? That is, what are their roles in forming a caffe deep learning network?
You understand correctly that caffe implements deep learning by stacking "layers" one on top of the other to form a "net".
'net.cpp'
Each layer works as a "functional block" and its behavior/implementation is defined in src/caffe/layers/<layer>.cpp, src/caffe/layers/<layer>.cu and include/caffe/layers/<layer>.hpp.
The code that actually "stacks" all the layers into a net can be found (mostly) in net.cpp.
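As a small illustration of that stacking, here is a hedged pycaffe sketch (file names are placeholders) that loads a net and lists the layers it was assembled from:

    # Sketch (placeholder file names): Net::Init in net.cpp parses the
    # prototxt and instantiates each layer; from Python we can inspect
    # the assembled stack.
    import caffe

    net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)
    for name, layer in zip(net._layer_names, net.layers):
        print(name, layer.type)  # layer name and its registered type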
'caffe.pb.h', 'caffe.pb.cc'
In order to define the specific structure of a particular deep net architecture (e.g., AlexNet, GoogLeNet, ResNet, etc.), caffe uses the protocol buffers library. The format of caffe's protocol buffers is defined in src/caffe/proto/caffe.proto. caffe.proto is "compiled" with the Google protocol buffer compiler (protoc) to produce caffe.pb.h and caffe.pb.cc, C++ code for parsing and processing caffe prototxt and caffemodel files.
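For illustration, the same generated parsing code is exposed to Python as caffe_pb2; this hedged sketch (placeholder file name) reads a prototxt with it, which is exactly what the C++ side does through caffe.pb.h/caffe.pb.cc:

    # Sketch: parse a net definition with the classes protoc generated
    # from caffe.proto (caffe_pb2 is the Python twin of caffe.pb.h/.cc).
    from caffe.proto import caffe_pb2
    from google.protobuf import text_format

    net_param = caffe_pb2.NetParameter()
    with open('deploy.prototxt') as f:           # placeholder file name
        text_format.Merge(f.read(), net_param)   # prototxt is protobuf text format

    for layer in net_param.layer:
        print(layer.name, layer.type)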
Related
I'm working in MATLAB and trying to use the pretrained models cited above as feature extractors. In AlexNet and VGGNet the fully connected layer to use is clear (it is named 'fc7'), but in GoogLeNet/ResNet-50/ResNet-101/Inception v2-v3 it is not. Could someone guide me? Also, what is the size of the features in these models? In AlexNet, for example, it is 4096.
In any CNN, the fully connected layer can be spotted at the end of the network, since it processes the features extracted by the convolutional layers. If you inspect
net.Layers, you will see that MATLAB labels the fully connected layer "Fully Connected" (in ResNet-50 it is named fc1000). It is followed by a softmax layer and a classification output layer.
The size of the classification layer depends on the convolutional layers used for feature extraction. In AlexNet, several fully connected layers are stacked (fc6, fc7, fc8). I think you can obtain the extracted feature matrix by flattening the output just before the first fully connected layer; in ResNet-50's case, that is before fc1000.
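If you also have the original Caffe versions of these models, a hedged pycaffe sketch (blob names differ per deploy file) makes the feature layer easy to find: print the blob shapes and take the blob feeding the final fully connected layer. For GoogLeNet that is the 1024-dimensional global average pool, and for ResNet-50/ResNet-101 it is 2048-dimensional.

    # Sketch (placeholder file names): list every blob and its shape; the
    # feature blob is the one right before the final InnerProduct layer.
    import caffe

    net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)
    for name, blob in net.blobs.items():
        print(name, blob.data.shape)
    # e.g., GoogLeNet: pool5/7x7_s1 -> (1, 1024, 1, 1)
    #       ResNet-50: pool5        -> (1, 2048, 1, 1)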
I have a "simple" problem after loading a Keras model.
During training my network has the following structure:
model.summary()
Output from command line after defining architecture
After training + saving + loading, my network appears to have the following structure:
Output from command line after loading
Unfortunately I'm unable to find a way to access the sequential layer. My goal is to visualize the feature maps that live inside it (see the sketch below).
I can provide more code if necessary.
Thank you,
Vaclav
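Edit: to make the goal concrete, here is a sketch of the kind of access I am after (layer names are placeholders; the nested model shows up as a single entry in model.layers):

    # Sketch (placeholder layer names): after load_model, the inner
    # Sequential is one entry in model.layers; its own layers sit in
    # its .layers attribute.
    import numpy as np
    from keras.models import load_model, Model

    model = load_model('my_model.h5')
    inner = model.get_layer('sequential_1')  # the nested Sequential block
    for layer in inner.layers:
        print(layer.name, layer.output_shape)

    # One way to pull feature maps out of an inner conv layer (may need
    # adjusting depending on the Keras version):
    probe = Model(inputs=inner.input,
                  outputs=inner.get_layer('conv2d_1').output)
    x = np.random.rand(1, 64, 64, 3)  # placeholder batch shaped like the real input
    feature_maps = probe.predict(x)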
I used the Core ML Converter to convert a Caffe AlexNet model to a Core ML model. The model works just fine and outputs correct classification results. However, I do not know how to access the output of a layer inside the CNN model. Say, for example, I want to know what the output of one of the convolution layers (e.g., conv5) is. Caffe lets you do so easily, but I could not find documentation on how to do this with Core ML.
Does Core ML allow access to the outputs of layers inside the CNN model the way Caffe does?
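For reference, this is what that access looks like on the Caffe side (a pycaffe sketch with placeholder file names), which is the behavior I would like to reproduce with Core ML:

    # pycaffe sketch: after a forward pass, any intermediate blob can be
    # read by name.
    import caffe

    net = caffe.Net('deploy.prototxt', 'alexnet.caffemodel', caffe.TEST)
    net.forward()
    conv5_output = net.blobs['conv5'].data  # activations of the conv5 blob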
I have some questions about how to actually interact with a pre-trained Caffe model. In my case I'm using a model for scene recognition.
In the caffe git repository there are code examples in Python and C++ implementing image classifiers. However, they do not apply to my use case, since they assign only ONE class to the entire input image.
My goal is an application that takes an input image (jpg) and outputs the highest-scoring predicted class label for each pixel of the input image (e.g., indices for sky, beach, road, car).
Could anyone give me some pointers on how to proceed?
There already seem to be implementations of this. This demo (http://places.csail.mit.edu/demo.html) is close to what I want.
Thank you!
What you are looking for is not image classification, but rather semantic segmentation.
A recent work by Jonathan Long, Evan Shelhamer, and Trevor Darrell is based on Caffe and can be found here. It uses a fully convolutional network, that is, a network with no "InnerProduct" layers, only convolutional ones, and is thus capable of producing outputs of different sizes for inputs of different sizes.
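As a rough pycaffe sketch of using such a network (file and blob names below are assumptions, adjust them to the actual deploy files): the output blob has shape (1, num_classes, H, W), so an argmax over the channel axis gives a class index for every pixel.

    # Hedged sketch (assumed file/blob names): per-pixel class labels
    # from a fully convolutional network.
    import caffe

    net = caffe.Net('fcn_deploy.prototxt', 'fcn.caffemodel', caffe.TEST)
    im = caffe.io.load_image('scene.jpg')                  # HxWx3 in [0, 1]
    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    transformer.set_transpose('data', (2, 0, 1))           # HWC -> CHW
    net.blobs['data'].data[...] = transformer.preprocess('data', im)
    net.forward()
    score = net.blobs['score'].data[0]                     # (num_classes, H, W)
    label_map = score.argmax(axis=0)                       # class index per pixel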
I was wondering if anyone has managed to use the OpenCV implementation of the Latent SVM detector (http://docs.opencv.org/modules/objdetect/doc/latent_svm.html) successfully. There is sample code that shows how to use the library, but the problem is that the sample code uses a ready-made detector model that was generated with MATLAB. Can someone guide me through the steps to generate my own detector model?
The MATLAB implementation of Latent SVM by the authors of the paper has a training script called pascal. There is a README in the tarball explaining its usage:
Using the learning code
=======================
1. Download and install the 2006-2011 PASCAL VOC devkit and dataset.
(you should set VOCopts.testset='test' in VOCinit.m)
2. Modify 'voc_config.m' according to your configuration.
3. Start matlab.
4. Run the 'compile' function to compile the helper functions.
(you may need to edit compile.m to use a different convolution
routine depending on your system)
5. Use the 'pascal' script to train and evaluate a model.
example:
>> pascal('bicycle', 3); % train and evaluate a 6 component bicycle model
The learning code saves a number of intermediate models in a model cache
directory defined in 'voc_config.m'.
For more information, visit the authors' website. The page also contains the paper describing this method.