Is it possible to simultaneously use and train a neural network? - neural-network

Is it possible to use TensorFlow or a similar library to build a model that you can efficiently train and use at the same time?
An example use case would be a chat bot that you give feedback to, somewhat like how pets learn (i.e. repeating what they just did for a reward), or a bot that can be given new entries or new responses to use.

I think what you are asking is whether a model can be trained continuously without having to retrain it from scratch each time new labelled data comes in.
The answer to that is online models.
There are models that can be trained continuously on new data without having to be retrained from scratch. Per the Wikipedia definition:
Online machine learning is a method of machine learning in which data becomes available in sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques which generate the best predictor by learning on the entire training data set at once.
Some examples of such algorithms (apart from DNNs, the names below are scikit-learn estimator classes) are:
BernoulliNB
GaussianNB
MiniBatchKMeans
MultinomialNB
PassiveAggressiveClassifier
PassiveAggressiveRegressor
Perceptron
SGDClassifier
SGDRegressor
DNNs
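To make "use and train at the same time" concrete, here is a minimal sketch of an online learner: a plain perceptron written in pure Python (not the scikit-learn `Perceptron` class listed above). The class name `OnlinePerceptron`, the toy data stream and the learning rate are all made up for illustration. Each incoming example is first used for a prediction, and then the same example updates the weights once the feedback (the true label) is known.

```python
import random


class OnlinePerceptron:
    """A tiny perceptron that can predict and learn one example at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        # Use the current model: 1 if the weighted sum is positive, else 0.
        score = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if score > 0 else 0

    def update(self, x, label):
        # Train on a single example (perceptron rule); old data is never revisited.
        error = label - self.predict(x)
        if error != 0:
            self.weights = [w + self.learning_rate * error * xi
                            for w, xi in zip(self.weights, x)]
            self.bias += self.learning_rate * error


# Hypothetical data stream: the label is 1 when x0 + x1 > 1.
rng = random.Random(0)
stream = []
for _ in range(2000):
    x = [rng.random(), rng.random()]
    stream.append((x, 1 if x[0] + x[1] > 1 else 0))

model = OnlinePerceptron(n_features=2)
mistakes_first_half = 0
mistakes_second_half = 0
for i, (x, label) in enumerate(stream):
    guess = model.predict(x)      # the model is being used...
    if guess != label:
        if i < len(stream) // 2:
            mistakes_first_half += 1
        else:
            mistakes_second_half += 1
    model.update(x, label)        # ...and trained on the same example

print("mistakes in first half:", mistakes_first_half)
print("mistakes in second half:", mistakes_second_half)
```

Comparing the two mistake counts gives a rough idea of whether the model keeps improving while it is in use. Libraries that support this style of learning usually expose it as an incremental update call on mini-batches rather than a hand-written loop like this one.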

Related

During ML agent training, If I want to change observations, do I have to run the training from the beginning again?

During ML agent training, if I want to change the observations (sensor shape, number of sensors, etc.), do I have to run the training from the beginning again?
Short answer: Yes!
A bit longer answer: Any change to the model structure or to the shape of the training data means starting the training from the beginning. All the popular frameworks transfer the model structure to the GPU when the model is built/compiled, and there is no way of hot-swapping it at runtime.

Neural Network checkpoints?

I am new to neural networks and I don't know exactly what to search for on Google, so here is my problem; please let me know what I should be looking for.
I am working on a project that will have many contributors over time. Each contributor will add a new row to an Excel file and then run the code to train on the dataset.
What I want to ask is: is there a way to save a checkpoint, so that the code doesn't have to train on the whole dataset each time and can just continue training on the new entries instead of starting from zero?
Please let me know what exactly I should google.
Kind regards
This is, as you guessed, extremely common and usually referred to as "fine-tuning". In your case, since the dataset barely changes between training runs, you can expect the model to be very similar, so you could initialize your weights to the weights of the previous best model and retrain for only a few epochs, likely with a small learning rate.
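A minimal sketch of that pattern, assuming a toy linear model trained by SGD in pure Python rather than any particular framework. The file name `checkpoint.json`, the helper names and the data are hypothetical. Real frameworks ship their own checkpoint functions, but the flow is the same: save the weights, reload them on the next run, and train for a few more epochs with a small learning rate.

```python
import json
import os
import tempfile


def train(weights, data, epochs, learning_rate):
    """Run SGD on the model y = w * x + b, starting from the given weights."""
    w, b = weights["w"], weights["b"]
    for _ in range(epochs):
        for x, y in data:
            error = (w * x + b) - y
            w -= learning_rate * error * x
            b -= learning_rate * error
    return {"w": w, "b": b}


def mean_squared_error(weights, data):
    return sum((weights["w"] * x + weights["b"] - y) ** 2 for x, y in data) / len(data)


def save_checkpoint(weights, path):
    with open(path, "w") as f:
        json.dump(weights, f)


def load_checkpoint(path):
    with open(path) as f:
        return json.load(f)


# Hypothetical dataset following y = 2x + 1.
old_rows = [(x / 10, 2 * (x / 10) + 1) for x in range(20)]
new_rows = [(x / 10, 2 * (x / 10) + 1) for x in range(20, 25)]

checkpoint_path = os.path.join(tempfile.gettempdir(), "checkpoint.json")

# First run: train from scratch, then save the weights.
weights = train({"w": 0.0, "b": 0.0}, old_rows, epochs=200, learning_rate=0.05)
save_checkpoint(weights, checkpoint_path)

# Later run: a contributor has added rows. Resume from the checkpoint
# and train only a few epochs with a smaller learning rate.
resumed = load_checkpoint(checkpoint_path)
fine_tuned = train(resumed, old_rows + new_rows, epochs=5, learning_rate=0.01)

print("error after fine-tuning:", mean_squared_error(fine_tuned, old_rows + new_rows))
```

The second run trains on the old and new rows together; training on the new rows alone is also possible but risks drifting away from what was learned on the old ones.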
People usually do fine-tuning starting from a network trained on an entirely different dataset, so most of what you find will describe that use case rather than yours, but it will work even better if you keep a very similar dataset.
Another term to search for is "continual learning without forgetting".

Use a trained neural network to imitate its training data

I'm in the overtures of designing a prose imitation system. It will read a bunch of prose, then mimic it. It's mostly for fun so the mimicking prose doesn't need to make too much sense, but I'd like to make it as good as I can, with a minimal amount of effort.
My first idea is to use my example prose to train a classifying feed-forward neural network, which classifies its input as either part of the training data or not part. Then I'd like to somehow invert the neural network, finding new random inputs that also get classified by the trained network as being part of the training data. The obvious and stupid way of doing this is to randomly generate word lists and only output the ones that get classified above a certain threshold, but I think there is a better way, using the network itself to limit the search to certain regions of the input space. For example, maybe you could start with a random vector and do gradient descent optimisation to find a local maximum around the random starting point. Is there a word for this kind of imitation process? What are some of the known methods?
How about Generative Adversarial Networks (GAN, Goodfellow 2014) and their more advanced siblings like Deep Convolutional Generative Adversarial Networks? There are plenty of proper research articles out there, and also more gentle introductions like this one on DCGAN and this on GAN. To quote the latter:
GANs are an interesting idea that were first introduced in 2014 by a
group of researchers at the University of Montreal led by Ian
Goodfellow (now at OpenAI). The main idea behind a GAN is to have two
competing neural network models. One takes noise as input and
generates samples (and so is called the generator). The other model
(called the discriminator) receives samples from both the generator
and the training data, and has to be able to distinguish between the
two sources. These two networks play a continuous game, where the
generator is learning to produce more and more realistic samples, and
the discriminator is learning to get better and better at
distinguishing generated data from real data. These two networks are
trained simultaneously, and the hope is that the competition will
drive the generated samples to be indistinguishable from real data.
(DC)GAN should fit your task quite well.
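For comparison, the gradient-based search proposed in the question (start from a random input and climb the classifier's score) can be sketched in a few lines. This is not a GAN, only the input-optimisation idea. The fixed logistic "classifier", its weights, the box constraint and the step size are hypothetical stand-ins for a trained network.

```python
import math
import random

# Hypothetical stand-in for a trained classifier: logistic regression with
# fixed weights. score(x) plays the role of "probability x is training-like".
WEIGHTS = [1.5, -2.0, 0.5]
BIAS = -0.25


def score(x):
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def score_gradient(x):
    # d(sigmoid(z)) / dx_i = sigmoid(z) * (1 - sigmoid(z)) * w_i
    s = score(x)
    return [s * (1.0 - s) * w for w in WEIGHTS]


def climb(x, steps=200, step_size=0.5):
    """Gradient ascent on the input, keeping each coordinate in [-1, 1]."""
    for _ in range(steps):
        grad = score_gradient(x)
        x = [min(1.0, max(-1.0, xi + step_size * g)) for xi, g in zip(x, grad)]
    return x


rng = random.Random(0)
start = [rng.uniform(-1, 1) for _ in WEIGHTS]
found = climb(start)

print("score at start:", score(start))
print("score after ascent:", score(found))
```

Starting from different random vectors gives different candidates. With a real network the gradient would come from backpropagation to the input instead of a closed-form expression.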

Why we need training and test datasets in research?

I'm a newbie in the research area of data mining (text clustering) and I have a couple of questions regarding training and test datasets.
Does clustering need training and test datasets?
Why do we need to separate the data into training and test datasets?
Sorry for the rookie questions; I hope the experts in this group can help me.
As your question is on clustering:
In cluster analysis, there usually is no training or test data split.
That is because you do cluster analysis when you do not have labels, so you cannot "train".
Training is a concept from machine learning, and train-test splitting is used to guard against overfitting.
But if you are not learning labels, you cannot overfit.
Properly used cluster analysis is a knowledge discovery method. You want to discover some new structure in your data, not rediscover something that is already labeled.
To train your algorithm you need a set of relevant data that is similar but not identical to your testing data. For example, you could split up your data so that 70% of it is used for training and the rest for testing. The training portion allows your algorithm to get a feel for what it should be looking for. The remaining 30% can be used for testing because it is (hopefully) a distinct set of information, which shows how well the algorithm copes with data it has not seen.
Why split it up?
Well, if you train your algorithm on data A and then test it on data A, it will be able to identify all the information correctly because that is what it was trained on.
For example, if when learning addition you were given the sums 3+4, 4+5 and 6+9, and you solved them correctly, it would be redundant to test your knowledge of addition using the same sums.
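The point about testing on the training data can be sketched as follows. This is a toy illustration only: the dataset is synthetic, and the 1-nearest-neighbour "model" and the helper names are my own choices, picked because such a model simply memorises its training points.

```python
import random


def nearest_neighbour_predict(train, x):
    """Return the label of the training point closest to x (1-NN)."""
    closest = min(train, key=lambda point: abs(point[0] - x))
    return closest[1]


def accuracy(train, data):
    correct = sum(1 for x, label in data
                  if nearest_neighbour_predict(train, x) == label)
    return correct / len(data)


# Hypothetical dataset: 100 distinct points with noisy threshold labels.
rng = random.Random(0)
data = []
for i in range(100):
    label = 1 if i >= 50 else 0
    if rng.random() < 0.2:          # flip roughly 20% of labels as noise
        label = 1 - label
    data.append((float(i), label))

rng.shuffle(data)
split = len(data) * 7 // 10         # 70% for training, the remaining 30% for testing
train, test = data[:split], data[split:]

# Every training point is its own nearest neighbour, so this is always 1.0.
print("accuracy on training data:", accuracy(train, train))
# Only this number says anything about unseen data.
print("accuracy on test data:", accuracy(train, test))
```

The perfect score on the training data is the "same sums" situation: it says nothing about whether the model has learned anything general.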
further information:
http://en.wikipedia.org/wiki/Natural_language_processing
http://www.nltk.org/book
Hope this helps.

When to start using the selection set in a Back Propagation Neural Network?

Beginner on ANNs:
I am implementing a back propagation neural network to predict the price of gold. I know that I have to split my data into training data, selection data and test data.
However, I am unsure how to go about using these sets of data. At first I was training the network with my training set; then, once it was trained, I fed a number of inputs from the test set to the network and compared the outputs.
I'm not sure if I'm doing this right, and where does the selection set come in?
Thanks in advance!
The general idea is:
1. Train the network for a little while on the training set.
2. Evaluate the network on a second set, often called the validation set. This is probably what you're calling the selection set.
3. Train the network a little more on the training set.
4. Evaluate the new network on the selection set again.
5. Which did better, the old network or the new network? If the new network is better, we're still getting some use out of training, so go to 3. If the new network is worse, more training will probably only hurt. Use the previous version of the network, since it did better.
In this way, you can tell when to stop training.
One easy modification to this is to always keep track of the best network seen so far, and only stop training after some number (say, three) of training rounds in a row that do worse.
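The stopping rule, including the keep-the-best modification, can be sketched as a small generic loop. The function name `train_with_early_stopping` and its parameters are hypothetical. In the toy usage the "model" is just a counter of training rounds and the selection-set error is a made-up curve with its minimum at 5 rounds, standing in for a real network and a real selection set.

```python
import copy


def train_with_early_stopping(model, train_step, selection_error,
                              patience=3, max_rounds=1000):
    """Train in small rounds; stop after `patience` rounds without improvement.

    train_step(model) returns the model after a little more training.
    selection_error(model) returns its error on the selection (validation) set.
    """
    best_model = copy.deepcopy(model)
    best_error = selection_error(model)
    bad_rounds = 0
    for _ in range(max_rounds):
        model = train_step(model)
        error = selection_error(model)
        if error < best_error:
            # New best network: remember it and reset the counter.
            best_model = copy.deepcopy(model)
            best_error = error
            bad_rounds = 0
        else:
            bad_rounds += 1
            if bad_rounds >= patience:
                break
    return best_model, best_error


# Toy stand-ins (NOT a real network): the model is the number of rounds
# trained so far, and the error curve bottoms out at 5 rounds.
best_model, best_error = train_with_early_stopping(
    model=0,
    train_step=lambda rounds: rounds + 1,
    selection_error=lambda rounds: (rounds - 5) ** 2,
)
print(best_model, best_error)  # prints: 5 0
```

In the toy run the loop trains through round 8, sees three worse results in a row (rounds 6, 7 and 8), and returns the network from round 5.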
The third set, the test set, is necessary because the selection set is, if indirectly, involved in the training process. Final evaluation must be done on data that was not used at all during training.
This sort of thing is sufficient for simple experiments, but in general you'll want to use cross-validation to get a better idea of your system's performance.
I wanted to leave a comment just to say that validation sets are a good place for model-dependent hyper-parameter tuning, but I'm new here and hence lack the reputation points to do so. To make this more worthy of a separate posting, I've included an outline of my own train-validate-test process. In practice, my workflow is as follows:
1. Identify, collect, and clean data. Try to limit complaining during the data munging process.
2. Split data into three sets: training, validation, test.
3. Establish two "base" models for evaluating more complex models built later on in the process. The first of these models is typically a basic linear/logistic regression using all possible features. The second model uses only the most obviously informative features (initial identification of informative features depends on the use case, and typically involves a combination of domain knowledge, basic clustering, and simple correlation).
4. Begin more empirical feature selection (e.g. unsupervised NN, but usually random forest) and prototype a broad range of models using the training set.
5. Eliminate poorly performing models as well as uninformative features.
6. Compare performance of the remaining models against each other and the "base" models, using a modified version of the training set (same data, but sans uninformative features). Toss under-performing models.
7. Using the validation set, tune the appropriate hyper-parameters for each of the models (either by hand or by grid search). Further reduce the number of models in consideration, ideally to just 2-3 (excluding base models).
8. Finally, evaluate model performance (with optimized hyper-parameters) on the test set. Again, compare models among themselves and against the base models. Make the final model choice based on a problem-specific combination of computational complexity/cost, ease of interpretation/transparency/"explainability", and improvement over and/or performance vs. the base models.
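The role of the three sets in the last two steps can be sketched as follows. This is a toy illustration under my own assumptions: a synthetic one-dimensional dataset, a hand-written k-nearest-neighbour classifier, and a 60/20/20 split. The hyper-parameter `k` is chosen on the validation set only, and the test set is touched once at the end.

```python
import random


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train, data, k):
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


# Hypothetical noisy dataset: label is 1 when x > 5, with ~15% of labels flipped.
rng = random.Random(1)
data = []
for _ in range(300):
    x = rng.uniform(0, 10)
    label = 1 if x > 5 else 0
    if rng.random() < 0.15:
        label = 1 - label
    data.append((x, label))

# The points were generated in random order, so slicing gives a random split.
train, validation, test = data[:180], data[180:240], data[240:]

# Tune the hyper-parameter on the validation set only (a tiny grid search).
candidates = [1, 3, 5, 9, 15]
best_k = max(candidates, key=lambda k: accuracy(train, validation, k))

# Evaluate on the test set once, with the chosen hyper-parameter.
print("chosen k:", best_k)
print("test accuracy:", accuracy(train, test, best_k))
```

Because `k` was picked using the validation set, the validation accuracy is an optimistic estimate; the test accuracy is the number to report.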