Trying to train the ImageNet model with Region CNN (R-CNN) - neural-network

After several months working with Caffe, I've been able to train my own models successfully. Beyond my own models, I've also been able to train ImageNet with 1000 classes.
In my current project I'm trying to extract the region of my class of interest. I've compiled and run the Fast R-CNN demo and it works fine, but the sample models contain only 20 classes and I'd like to have more, ideally all of them.
I've already downloaded the bounding boxes of ImageNet, with the real images.
Now I'm stuck: I can't figure out the next steps and there's no documentation on how to do it. The only thing I've found is how to train the INRIA person model, for which they provide the dataset, the annotations, and a Python script.
My questions are:
Is there maybe any tutorial or guide that I've missed?
Is there already a model trained with 1000 classes able to classify images and extract the bounding boxes?
Thank you very much in advance.
Regards.
Rafael.

Ross Girshick has done a lot of work on object detection. You can learn a lot from his detailed GitHub repository for Fast R-CNN: you should be able to find a Caffe branch there, with a demo. I have not used it myself, but it seems very comprehensible.
Another direction you might find interesting is LSDA: using weak supervision to train object detection for many classes.
BTW, have you looked into faster-rcnn?

Related

Predict a number with a given image (0 to 1)

I am a total beginner to ML and Neural networks. I am currently working on a project where I have a lot of pictures stored in a MongoDB database. Each one of those pictures has a number from 0 to 1. For example "picture 1" 0.71.
I want to train my model on the database. The main goal of the project is that, once the model is trained, it can take an image and return (predict) a number from 0 to 1. After doing some research and asking a few people, I figured out that some useful libraries for the project are TensorFlow and Keras. Some people told me that it is impossible, but I'm not sure, so I came to ask here.
So my questions are: Is it possible? If so, how can I implement it? Are there any specific tools you recommend? If you specify a way that I should use for my project do I need to export my MongoDB database in a certain form? Since I am a beginner maybe there are some tutorials that you think that can help?
I'm sorry if this question is a bit too general, if there are any misunderstandings please comment and I will try to answer.
Thanks in advance!
What you want to do is totally feasible. This kind of problem is called regression. Since you are using image data, the best-suited type of model is a convolutional neural network (CNN); you'll need some understanding of it if you want to build your own model. I've done a project where I had to predict the number of bacterial colonies in an image, much like your problem except that I had no bounds on the predicted values.
What is a CNN ? Here is a link
Basically a CNN will understand the features in the images and will use those features to predict a value.
You won't need to design your own model; most people just use a well-designed one from the scientific literature.
Go for Keras, it's the easiest framework out there and works like a charm. Here is how to implement VGG16 (an architecture that is probably a good fit for your problem) : link
You should follow this tutorial to get going on developing with keras.
Last hint: don't use the same last layer as the one on the VGG16 implementation, use a Dense Layer with one neuron and with a sigmoid/linear/leaky relu activation.
ie:
#model.add(Dense(1000, activation='softmax'))
model.add(Dense(1, activation='sigmoid'))
This means : predict 1 number (sigmoid will bound it between 0 and 1, but maybe lrelu or linear is better)
Also, I guess you could use MongoDB to read the images as arrays, but I would just put the images on a folder.
Edit : When compiling the model, use a mean squared error as in
import keras
adam = keras.optimizers.Adam(learning_rate=1e-4)  # older Keras releases spell this argument lr
model.compile(optimizer=adam, loss='mse')
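To make the sigmoid output and the mean squared error a bit more concrete, here is a minimal pure-Python sketch of what those two pieces compute. It is not Keras code: the raw outputs and target values below are made-up numbers for illustration only.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def mse(predictions, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Hypothetical raw outputs of the single last neuron, before the activation.
raw_outputs = [-3.0, 0.0, 2.5]
# Hypothetical labels in the same 0-to-1 range as the question's "picture 1" 0.71.
targets = [0.05, 0.5, 0.71]

predictions = [sigmoid(z) for z in raw_outputs]
loss = mse(predictions, targets)
```

Every value in predictions lies strictly between 0 and 1, which is why sigmoid is a natural choice when the labels are bounded that way, and loss is the quantity that training with loss='mse' tries to minimize.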
Here you have the "hello world" program of neural networks: digit classification. You can start by studying it, because I think you will end up with a similar architecture for your NN. What you should focus on is the output of your model: in this example they perform classification over 10 classes (digits from 0 to 9), but you are trying to predict a real number. You could try a single neuron with a sigmoid or linear activation at the end of your model.

Best discriminatory method for 1d data with a lot of variance

I have a problem that I have tried to solve using Support Vector Machines (SVMs) to discriminate 1d series of data between two classes. One of the classes has very specific characteristics and is easily distinguishable from a human perspective. The drawback is that the other class varies a lot from data sample to data sample, and it looks like it is not feasible to use it as a class at all. I'm only interested in discriminating between data from the class of interest (see image below) and all other "uninteresting" data.
I then tried implementing a one-class SVM (OC-SVM), and it works okay but not as well as I had hoped. So I started looking at alternatives, and came across one-class neural networks and Generative Adversarial Networks (GANs) as possible solutions. The idea is that since the data points I want to detect have a certain characteristic (see image below), an adversarial network could perform well. I am very new to the field of neural networks and deep learning, so I wanted to ask the community whether I am on to something before diving into it. Feel free to suggest alternative methods as well.
PS: Unsupervised methods and clustering have not worked well on this problem because of the huge variations in the data.
Image of data of interest

How would I find the cost of a neural network without knowing the target output like for a game?

For example,
I want to create an AI that plays tic-tac-toe, and this is how I would go about it.
I have 9 input nodes, one for each space on the board, 3 nodes in one hidden layer (which I'm guessing would somehow benefit the AI by having it select a row or column with 3 spaces), and then 9 output nodes to see where on the board the AI would put its mark.
I'm lost on how I would find the cost of this neural network because I don't know how I would judge its prediction and affect its weights and biases.
If I wanted the AI to play a guessing game, it would make sense since I have the correct answer and I can teach it to be more accurate based on how off it was to the actual answer.
(NOTE: I am very new to neural networks, so there may be a simple answer that I've missed somewhere)
So, I did some digging around and found a good introduction to reinforcement learning. This is the method used to train neural networks to achieve a goal without knowing an exact target, such as which move is good in a certain scenario. Backpropagation against a known target is not the only learning method, but so many sources only used that method without mentioning any others, which confused me.
Going through this playlist right now: https://www.youtube.com/watch?v=2pWv7GOvuf0&index=1&list=PL7-jPKtc4r78-wCZcQn5IqyuWhBZ8fOxT
Hope this will help someone getting started with neural networks!
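To illustrate where the "cost" comes from when there is no target output, here is a minimal sketch in plain Python. It uses a lookup table instead of a neural network and a simple Monte Carlo-style update (nudging each visited state/move pair toward the final game result), which is only one of many reinforcement learning approaches. The reward values (+1, -1, 0) and the step size are my own assumptions.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if a line is complete, 'draw' if the board is full, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return 'draw' if ' ' not in board else None

def reward_for(player, outcome):
    """Training signal taken from the game result: +1 win, -1 loss, 0 draw."""
    if outcome == 'draw':
        return 0.0
    return 1.0 if outcome == player else -1.0

def play_random_game(rng):
    """Play one game with random moves; return the outcome and each player's (state, move) history."""
    board = [' '] * 9
    history = {'X': [], 'O': []}
    player = 'X'
    while winner(board) is None:
        move = rng.choice([i for i, cell in enumerate(board) if cell == ' '])
        history[player].append((''.join(board), move))
        board[move] = player
        player = 'O' if player == 'X' else 'X'
    return winner(board), history

def update_values(values, moves, reward, step=0.1):
    """Nudge the stored value of every visited (state, move) pair toward the final reward."""
    for key in moves:
        old = values.get(key, 0.0)
        values[key] = old + step * (reward - old)

rng = random.Random(0)
values = {}  # hypothetical table standing in for the network's 9 outputs per board state
for _ in range(1000):
    outcome, history = play_random_game(rng)
    update_values(values, history['X'], reward_for('X', outcome))
```

The point is that nobody ever tells the learner which move was correct: the only feedback is the result at the end of the game, and that result is spread back over the moves that led to it.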

Training a model for Latent-SVM

GOOD MORNING COLLEAGUES!
I am trying to train a new model from my own data set of faces!
I have found no information about this topic, so I hope my notes can help other people and that I can get some answers as well.
I will try to explain the steps I have needed to do to train my own model and later on some questions...
I have downloaded the Latent code from: http://cs.brown.edu/~pff/latent-release4/
I have downloaded the PASCAL VOC 2008 code (devkit) from: http://host.robots.ox.ac.uk/pascal/VOC/voc2008/index.html
I have emulated the file/folder structure of PASCAL VOC with my own data set:
Annotations. I have created an .xml file for each image in which I define one object, face (each image contains only one face). I didn't define difficulties or poses...
JPEGImages where I have stored all the images
ImageSets where I have defined three files:
test.txt, where I wrote the file name of my positive samples
train.txt, where I wrote the file name of my negative samples
trainval.txt, where I wrote the file name of my positive samples (exactly the same file as test.txt).
I have changed some things in globals.m and VOCinit.m (to tell the algorithm the path and the location of some files...)
Then I ran the training with the command: pascal('face', 1);
Following these steps, the training runs to completion without failing and I get my own model, BUT I have some doubts...
Can you see anything weird in my explanation? Could it work?
Must the files test.txt/trainval.txt be equal? Why... What does it mean?
Do I have to choose the number of parts I want in the model INSIDE the function?
Imagine I have two kinds of samples (frontal faces and side faces) and I want to detect both... How can I address this? I thought I would have to train a model with two components... but how can I tell the training code which samples are frontal and which are side views? In the annotations, with the pose label? (I don't think so...) Is there another way to handle this?
Thank you for your time!!
I hope you can solve my doubts :)
I think test.txt should contain samples (images) that will be used to estimate how good the system is after learning the faces. However, trainval.txt is used during the learning stage (training) to fine-tune the parameters of the model; it is an essential part of supervised learning.
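As a rough illustration of keeping the two sets separate, here is a small Python sketch that shuffles a list of image ids and splits it into disjoint trainval and test lists. The ids, the 20% test fraction, and the one-id-per-line layout are assumptions on my part, so check them against what the devkit actually expects.

```python
import random

def split_image_ids(image_ids, test_fraction=0.2, seed=0):
    """Shuffle the ids and split them into disjoint (trainval, test) lists."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)
    n_test = max(1, int(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]

def format_image_set(image_ids):
    """One image id per line (assumed layout of the ImageSets .txt files)."""
    return ''.join(f'{image_id}\n' for image_id in image_ids)

# Hypothetical ids; real ones would be the file names (without extension) in JPEGImages.
face_ids = [f'face_{i:04d}' for i in range(50)]

trainval_ids, test_ids = split_image_ids(face_ids)
trainval_txt = format_image_set(trainval_ids)  # contents for trainval.txt
test_txt = format_image_set(test_ids)          # contents for test.txt
```

With a split like this, the score measured on test.txt tells you how the model behaves on faces it has never seen during training, which is not the case when test.txt and trainval.txt are the same file.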
Also, it is very hard to have one single SVM to classify faces that are both frontal and sideways. Here is my suggestion:
Train one SVM to detect if the input image is a frontal face or a sideways face. Call this something like SVM-0.
Train another SVM for frontal faces. This SVM will classify all your individuals. Note, however, that an SVM is usually a binary classifier, so make sure you choose the right SVM, one that has a multiclass architecture. Call this SVM-F.
Train a final SVM for sideways faces. Again, use a multiclass SVM. Call it SVM-S.
Present the input image to SVM-0 and if it detects it is a frontal face, present the input again to SVM-F; otherwise, give the input to SVM-S.
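The routing in the steps above can be sketched in a few lines of Python. The three classifiers are passed in as plain functions, and the toy stand-ins below (with their made-up "yaw_degrees" and "feature" fields) are purely hypothetical placeholders for trained SVMs.

```python
def classify_face(image, pose_svm, frontal_svm, side_svm):
    """Ask the pose classifier first, then route to the matching identity classifier."""
    pose = pose_svm(image)
    if pose == "frontal":
        return pose, frontal_svm(image)
    return pose, side_svm(image)

# Hypothetical stand-ins for SVM-0, SVM-F and SVM-S; real ones would be trained models.
def toy_svm_0(image):
    return "frontal" if abs(image["yaw_degrees"]) < 30 else "side"

def toy_svm_f(image):
    return "person_A" if image["feature"] > 0.0 else "person_B"

def toy_svm_s(image):
    return "person_A" if image["feature"] > 0.5 else "person_B"

frontal_result = classify_face({"yaw_degrees": 5, "feature": 0.2},
                               toy_svm_0, toy_svm_f, toy_svm_s)
side_result = classify_face({"yaw_degrees": 70, "feature": 0.2},
                            toy_svm_0, toy_svm_f, toy_svm_s)
```

Keeping the router separate from the classifiers means each SVM can be trained and evaluated on its own before they are chained together.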
In my experience, you should expect very low performance from SVM-S. It is a hard problem to solve. Frontal faces are not a big deal, unless you are working with faces that vary in pose, illumination, and expression (PIE). Face recognition is greatly affected by PIE variations in the images.
I recommend this website; it contains very good information and tutorials for starters, with or without experience.

matlab train set UCI

I'm being required to train a perceptron in Matlab to learn a classification data set (any, really). The only restriction is that the data set must come from the UCI Machine Learning Repository. The problem is that I have really no idea where to begin as my teacher is extremely bad at what he does and never explained it well. I've tried asking other class-mates for help but none of them seem to have the answers. I hope I can get help from this community as it's my last chance. Thank you guys.
Well, we're really not your last chance. There are plenty of tutorials, examples, and resources that are easy to find through Google and would help (for example, search on "MATLAB perceptron iris" - the iris dataset is a famous example dataset, included in the UCI repository).
But here's a start. I'm assuming that if you've been set the task of training a perceptron in MATLAB, then you have access to Neural Network Toolbox (if you're asking how to implement a perceptron algorithm from scratch in MATLAB, look in a textbook).
Type doc nnet. That will bring up the documentation for Neural Network Toolbox. Then click through to the section labelled "Examples". Scrolling down to the bottom, there are several demos in a section called "Perceptrons". Try looking at the demos "Classification with a 2-Input Perceptron" or "Linearly Non-separable Vectors". Those demos use toy datasets, but should give you an idea of how to train a perceptron.
Then scroll up to the section "Pattern Recognition and Classification", and take a look at the demo "Wine Classification". The Wine dataset this demo uses is part of the UCI repository. Adapt and combine the demos you've now learnt from, to create an example your prof will like.
Neural Network Toolbox also comes with the Iris dataset that is part of the UCI repository. You may also find a demo somewhere that uses this as an example.
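If it helps to see what a perceptron actually does before opening the toolbox demos, here is a from-scratch sketch of the classic perceptron learning rule. It is written in Python rather than MATLAB and does not use Neural Network Toolbox; the six data points are made up for illustration and are not taken from the UCI repository.

```python
def predict(weights, bias, x):
    """Perceptron output: 1 if the weighted sum plus bias is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Perceptron rule: after each mistake, move the weights toward the correct answer."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)
            if error != 0:
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:  # every sample classified correctly, stop early
            break
    return weights, bias

# Toy, linearly separable data with two features and two classes (made up).
samples = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [4.0, 4.5], [5.0, 4.0], [4.5, 5.0]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_perceptron(samples, labels)
predicted = [predict(weights, bias, x) for x in samples]
```

Swapping the toy lists for two classes and two features of a real dataset (Iris, for example) is the same exercise, keeping in mind that a single perceptron can only separate classes that are linearly separable.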
Hope that helps!