Combining labeled and unlabeled data in a single pipeline - scipy

I'm building an image classifier that uses a DBN (deep belief network) for feature learning and logistic regression to fine-tune the resulting network. Normally, the most convenient way to implement such an architecture in scikit-learn is to use the Pipeline class. But in my case I have ~10K unlabeled images and only ~300 labeled ones. Naturally, I want to use all the images to train the DBN and fit the logistic regression with only the labeled examples.
I could implement my own Pipeline class that handles this case, but first I'd like to know whether something like that already exists. Does it?

The current scikit-learn Pipeline API is not well suited for supervised learning with unsupervised pre-training. Implementing your own wrapper class is probably the best way to go forward for that case.
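A minimal sketch of such a wrapper follows. It assumes unlabeled rows are marked with `y == -1`, the convention `sklearn.semi_supervised` uses. scikit-learn has no DBN, so `BernoulliRBM` (a single RBM layer) stands in for it here, and the class name `PretrainThenClassify` is hypothetical. The unsupervised step is fitted on every row, and the classifier only on the labeled ones:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM


class PretrainThenClassify(BaseEstimator, ClassifierMixin):
    """Hypothetical wrapper: fit `feature_learner` on ALL rows and
    `classifier` on labeled rows only (unlabeled rows have y == -1)."""

    def __init__(self, feature_learner=None, classifier=None):
        self.feature_learner = feature_learner
        self.classifier = classifier

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        fl = self.feature_learner if self.feature_learner is not None else BernoulliRBM(random_state=0)
        clf = self.classifier if self.classifier is not None else LogisticRegression(max_iter=1000)
        # Unsupervised step: learn features from labeled + unlabeled rows.
        self.feature_learner_ = clone(fl).fit(X)
        # Supervised step: only rows that actually carry a label.
        labeled = y != -1
        features = self.feature_learner_.transform(X[labeled])
        self.classifier_ = clone(clf).fit(features, y[labeled])
        self.classes_ = self.classifier_.classes_
        return self

    def predict(self, X):
        return self.classifier_.predict(self.feature_learner_.transform(np.asarray(X)))

    def predict_proba(self, X):
        return self.classifier_.predict_proba(self.feature_learner_.transform(np.asarray(X)))
```

`BernoulliRBM` expects inputs scaled to [0, 1], so pixel values would need rescaling first. Cross-validating this wrapper would also need a scorer that ignores the `-1` rows.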

Related

Is it possible to simultaneously use and train a neural network?

Is it possible to use TensorFlow or a similar library to build a model that you can efficiently train and use at the same time?
An example use case would be a chatbot that you give feedback to. This would be somewhat like how pets learn (i.e. repeating what they just did when they get a reward), or like teaching the bot new entries or responses it can use.
I think what you are asking is whether a model can be trained continuously without having to retrain it from scratch each time new labelled data comes in.
The answer to that is online models.
Some models can be trained continuously as new data arrives, without having to retrain them from scratch. As per the Wikipedia definition:
Online machine learning is a method of machine learning in which data becomes available in sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques which generate the best predictor by learning on the entire training data set at once.
Some examples of such algorithms are:
BernoulliNB
GaussianNB
MiniBatchKMeans
MultinomialNB
PassiveAggressiveClassifier
PassiveAggressiveRegressor
Perceptron
SGDClassifier
SGDRegressor
DNNs
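Here is a minimal sketch of the online idea. scikit-learn estimators such as `SGDClassifier` expose `partial_fit`, which updates an already-fitted model with each new batch instead of retraining it. The feedback stream below is a hypothetical stand-in for, e.g., user-labelled chatbot replies:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)


def next_feedback_batch(n=20):
    """Hypothetical stand-in for a stream of newly labelled examples."""
    X = rng.randn(n, 2)
    y = (X[:, 0] > 0).astype(int)  # toy rule the model has to discover
    return X, y


model = SGDClassifier(random_state=0)
all_classes = np.array([0, 1])  # the first partial_fit call must list every class

for _ in range(50):
    X_new, y_new = next_feedback_batch()
    # Update the existing model in place; nothing is retrained from scratch.
    model.partial_fit(X_new, y_new, classes=all_classes)
    # The model can be used for predictions between any two updates.
    _ = model.predict(X_new[:1])

X_test, y_test = next_feedback_batch(500)
accuracy = model.score(X_test, y_test)
```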

Best discriminatory method for 1d data with a lot of variance

I have tried to solve a problem using Support Vector Machines (SVMs) to discriminate 1D data series between two classes. One of the classes has very specific characteristics and is easily distinguishable from a human perspective. The drawback is that the other class varies a lot from sample to sample, so it does not seem feasible to use it as a class at all. I'm only interested in discriminating between data from the class of interest (see the image below) and all other "uninteresting" data. I then tried a one-class SVM (OC-SVM), and it seems to work okay, but not as well as I had hoped. Therefore I started looking at alternatives and came across one-class neural networks and Generative Adversarial Networks (GANs) as possible solutions. The idea is that since the data points I want to detect have a certain characteristic (see the image below), an adversarial network could perform well. I am very new to the field of neural networks and deep learning, so I wanted to ask the community whether I am on to something before diving into it. Feel free to suggest alternative methods as well.
PS: Unsupervised methods and clustering have not worked well on this problem because of the huge variation in the data.
[Image: data of interest]
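For reference, the one-class SVM baseline from the question can be sketched with scikit-learn's `OneClassSVM`. It is trained only on fixed-length windows from the class of interest and labels new windows `+1` (inlier) or `-1` (outlier). The bump shape, the random-walk "other" data, the window length and `nu=0.05` are all hypothetical choices for illustration:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
t = np.linspace(0.0, 1.0, 50)  # hypothetical window of 50 samples


def class_of_interest(n):
    # Hypothetical stand-in: a fixed bump shape plus small noise.
    return np.sin(np.pi * t) + 0.05 * rng.randn(n, t.size)


def everything_else(n):
    # Hypothetical stand-in for the highly variable "uninteresting" data.
    return np.cumsum(0.3 * rng.randn(n, t.size), axis=1)


# Train on the class of interest ONLY; no model of the other class is needed.
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(class_of_interest(200))

pred_interest = detector.predict(class_of_interest(100))  # mostly +1
pred_other = detector.predict(everything_else(100))       # mostly -1
```

`nu` is an upper bound on the fraction of training windows treated as outliers, so it is the main knob for trading missed detections against false alarms.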

How to jointly learn two tasks at prediction level?

I have trained a network on two different modalities of the same image. I pass the data together into one layer, but after that it is pretty much two networks in parallel. They don't share a layer, and the two tasks have different sets of labels, so I have two different loss and accuracy layers (I use Caffe, by the way). I would like to learn these tasks jointly. For example, the predicted probability of a task-1 class should be higher when task 2 predicts a certain class label. I don't want to join them at the feature level but at the prediction level. How can I do this?
Why don't you want to join the predictions at the feature level?
If you really want to stick to your idea of not joining any layers of the network, you can apply a CRF or SVM on top of the overall prediction pipeline to learn cross-correlations between the predictions. For any other method you will need to combine features inside the network, one way or another. However, I would strongly recommend that you consider doing this. It is a general theme in deep learning that doing things inside the network works better than doing them outside.
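Below is a minimal sketch of the "SVM on top of the predictions" option. It assumes each network already outputs softmax probabilities for its task. A second-stage linear SVM is trained on the concatenated probability vectors, so it can learn that task-2 class 2 makes task-1 class 1 more likely. The probabilities are simulated stand-ins for the two networks' outputs, and the label correlation is hypothetical:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
n = 2000


def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


# Hypothetical labels: task-1 class 1 co-occurs with task-2 class 2 (90% of the time).
y2 = rng.randint(0, 3, n)
is_class2 = (y2 == 2).astype(int)
y1 = np.where(rng.rand(n) < 0.9, is_class2, 1 - is_class2)

# Stand-ins for the two networks' softmax outputs:
# the task-2 net is accurate, the task-1 net is weak (mostly noise).
p2 = softmax(3.0 * np.eye(3)[y2] + rng.randn(n, 3))
p1 = softmax(0.5 * np.eye(2)[y1] + rng.randn(n, 2))

# Second stage: learn cross-task correlations from the concatenated predictions.
Z = np.hstack([p1, p2])
Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y1, test_size=0.5, random_state=0)
stacker = LinearSVC(dual=False).fit(Z_tr, y_tr)

acc_task1_only = np.mean(Z_te[:, :2].argmax(axis=1) == y_te)
acc_stacked = stacker.score(Z_te, y_te)
```

The stacker should be fitted on predictions for data the networks were not trained on; otherwise it learns from over-confident outputs.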
From what I have learned by experimenting with joint prediction, you get the largest performance gain if you share weights between all convolutional layers of the network. You can then apply independent fc-layers, each followed by a softmax regression and its own loss function, on top of the shared features. This allows the network to learn cross-correlations between features while still being able to make separate predictions.
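The shared-trunk design can be sketched in plain NumPy. To keep the example short, one dense ReLU layer stands in for the shared convolutional layers. Each task gets its own fc head and softmax, and the two cross-entropy losses are summed, so the shared weights receive gradient from both tasks. Data and layer sizes are hypothetical:

```python
import numpy as np

rng = np.random.RandomState(0)
n, d, hidden, k1, k2 = 256, 20, 32, 3, 4


def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


# Hypothetical data: one input, two tasks with different label sets.
X = rng.randn(n, d)
Y1 = np.eye(k1)[X[:, :k1].argmax(axis=1)]
Y2 = np.eye(k2)[X[:, k1:k1 + k2].argmax(axis=1)]

params = {
    "shared": 0.1 * rng.randn(d, hidden),  # shared trunk (stand-in for the conv layers)
    "head1": 0.1 * rng.randn(hidden, k1),  # task-1 fc layer, followed by softmax
    "head2": 0.1 * rng.randn(hidden, k2),  # task-2 fc layer, followed by softmax
}


def train_step(p, lr=0.2):
    h = np.maximum(X @ p["shared"], 0.0)
    P1, P2 = softmax(h @ p["head1"]), softmax(h @ p["head2"])
    # Two separate cross-entropy losses, summed into one objective.
    loss = -np.mean(np.log(P1[Y1 == 1])) - np.mean(np.log(P2[Y2 == 1]))
    g1, g2 = (P1 - Y1) / n, (P2 - Y2) / n
    # The shared trunk receives gradient from BOTH heads; this is where
    # information from each task flows back into the common features.
    g_h = (g1 @ p["head1"].T + g2 @ p["head2"].T) * (h > 0)
    p["head1"] -= lr * h.T @ g1
    p["head2"] -= lr * h.T @ g2
    p["shared"] -= lr * X.T @ g_h
    return loss


losses = [train_step(params) for _ in range(500)]
```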
Have a look at my MultiNet paper as a good starting point. All our training code is on GitHub.

Use a trained neural network to imitate its training data

I'm in the early stages of designing a prose imitation system. It will read a bunch of prose, then mimic it. It's mostly for fun, so the mimicked prose doesn't need to make too much sense, but I'd like to make it as good as I can with a minimal amount of effort.
My first idea is to use my example prose to train a classifying feed-forward neural network, which classifies its input as either part of the training data or not part. Then I'd like to somehow invert the neural network, finding new random inputs that also get classified by the trained network as being part of the training data. The obvious and stupid way of doing this is to randomly generate word lists and only output the ones that get classified above a certain threshold, but I think there is a better way, using the network itself to limit the search to certain regions of the input space. For example, maybe you could start with a random vector and do gradient descent optimisation to find a local maximum around the random starting point. Is there a word for this kind of imitation process? What are some of the known methods?
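The gradient-based search described in the question can be sketched as projected gradient ascent on the input of a trained classifier. Here, a logistic regression on hypothetical 10-dimensional feature vectors plays the role of the "is this from the corpus?" network, and the input is kept inside the feature box [0, 1]^10. Mapping an optimised feature vector back to actual words is a separate problem that this sketch does not address:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
d = 10

# Hypothetical feature vectors: "in-corpus" texts score high on the first
# half of the features and low on the second half; "random" texts are uniform.
corpus = np.hstack([rng.uniform(0.6, 1.0, (200, d // 2)),
                    rng.uniform(0.0, 0.4, (200, d // 2))])
random_text = rng.uniform(0.0, 1.0, (200, d))
X = np.vstack([corpus, random_text])
y = np.r_[np.ones(200), np.zeros(200)]
clf = LogisticRegression().fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]


def prob_in_corpus(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))


# Start from a random input and climb the classifier's log-probability,
# projecting back onto the valid feature box [0, 1]^d after every step.
x = rng.uniform(0.0, 1.0, d)
p_start = prob_in_corpus(x)
for _ in range(300):
    p = prob_in_corpus(x)
    grad = (1.0 - p) * w  # gradient of log sigmoid(w.x + b) with respect to x
    x = np.clip(x + 0.5 * grad, 0.0, 1.0)
p_end = prob_in_corpus(x)
```

With a linear classifier the ascent simply runs to a corner of the box. That illustrates a known weakness of this approach: the inputs that maximise a discriminative score tend to be extreme rather than typical.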
How about Generative Adversarial Networks (GAN, Goodfellow 2014) and their more advanced siblings, like Deep Convolutional Generative Adversarial Networks? There are plenty of proper research articles out there, and also gentler introductions, like this one on DCGAN and this one on GAN. To quote the latter:
GANs are an interesting idea that were first introduced in 2014 by a group of researchers at the University of Montreal led by Ian Goodfellow (now at OpenAI). The main idea behind a GAN is to have two competing neural network models. One takes noise as input and generates samples (and so is called the generator). The other model (called the discriminator) receives samples from both the generator and the training data, and has to be able to distinguish between the two sources. These two networks play a continuous game, where the generator is learning to produce more and more realistic samples, and the discriminator is learning to get better and better at distinguishing generated data from real data. These two networks are trained simultaneously, and the hope is that the competition will drive the generated samples to be indistinguishable from real data.
(DC)GAN should fit your task quite well.

How to combine two classification models in MATLAB?

I am trying to detect faces using MATLAB's built-in Viola-Jones face detector. Is there any way I can combine two classification models, like "FrontalFaceCART" and "ProfileFace", into one in order to get a better result?
Thank you.
You can't merge the two models into one. That makes no sense in any classification task, since every classifier is different: it works differently (i.e. there is a different algorithm behind it), and it may also have been trained differently.
According to the help page for the classification models (which can be found here), your two classifiers work as follows:
FrontalFaceCART is a model composed of weak classifiers, based on classification and regression tree analysis
ProfileFace is composed of weak classifiers, based on a decision stump
More information can be found at the link provided, but you can easily see that their inner behaviour is rather different, so you can't mix or combine them.
In machine learning terms, it's like mixing a Support Vector Machine with a K-Nearest Neighbours classifier: the former uses separating hyperplanes, whereas the latter is simply based on distances.
You can, however, train several models in parallel (i.e. independently) and choose the model that best suits you (e.g. the lowest error rate or highest accuracy). You basically create as many different classifiers as you like, give them the same training set, evaluate each one's accuracy (and/or other metrics) and choose the best model.
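The train-several-and-pick-the-best approach can be sketched with scikit-learn's `cross_val_score`, as a Python stand-in for the MATLAB workflow. The synthetic dataset and the three candidate models are arbitrary choices:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

candidates = {
    "svm": SVC(),
    "knn": KNeighborsClassifier(),
    "tree": DecisionTreeClassifier(random_state=0),
}

# Same training data for every candidate; score each one with 5-fold CV.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}

# Keep the best-scoring model and refit it on all the data.
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)
```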
One option is to build a hierarchical classifier. In the first step you use the frontal face classifier (assuming that most pictures show frontal faces). If that classifier fails, you try the profile classifier.
I did that with a dataset of faces and it improved my overall classification accuracy. Furthermore, if you have some a priori information, you can use it. In my case, the faces were usually in the upper-middle part of the picture.
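A minimal sketch of the hierarchical (fallback) classifier, including a positional prior, is shown below. The two detectors are passed in as hypothetical callables returning `(x, y, width, height)` boxes. In MATLAB they would presumably be `vision.CascadeObjectDetector('FrontalFaceCART')` and `vision.CascadeObjectDetector('ProfileFace')`. The "upper-middle" rule is an assumed example of such a prior:

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), origin at the top-left
Detector = Callable[[object], List[Box]]


def in_upper_middle(box: Box, img_w: int, img_h: int) -> bool:
    """Hypothetical prior: box centre in the middle third horizontally
    and in the top half vertically (image y grows downwards)."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    return img_w / 3 <= cx <= 2 * img_w / 3 and cy <= img_h / 2


def detect_face(image, frontal: Detector, profile: Detector,
                img_w: int, img_h: int) -> Tuple[str, List[Box]]:
    # Stage 1: frontal detector (assumed to cover most pictures).
    # Stage 2: profile detector, only if stage 1 found nothing plausible.
    for name, detector in (("frontal", frontal), ("profile", profile)):
        boxes = [b for b in detector(image) if in_upper_middle(b, img_w, img_h)]
        if boxes:
            return name, boxes
    return "none", []
```

In this design, a frontal detection outside the expected region counts as a miss, so the profile detector still gets a chance. Whether that is desirable depends on the data.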
To improve your performance further without the two MATLAB classifiers you are using, you would need to change your technique (and probably your programming language). At the time of writing, the best method was FaceNet.