I know I need a GPU to train a model, but once the model is trained, do I also need a GPU to deploy it?
For example, I have a model for a car with autopilot that makes predictions and takes decisions. Do I need a GPU for the prediction too?
Especially in the case of reinforcement learning.
Strictly speaking, you usually don't need a GPU for training either, depending on the platform; training would just be much slower on the CPU than on the GPU.
For deploying the model you do not need a GPU. Most models are simply an organized list of weights which are used by the model to operate on its inputs. Since this usually isn't particularly computationally expensive, except for very large models, a GPU isn't necessary for deployment, but it may provide some performance benefit for larger models.
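To make the "organized list of weights" point concrete, here is a minimal sketch of CPU-only inference in plain Python. The two-layer network, its weight values, and the function names are all made up for illustration; a real deployment would load weights saved by the training framework instead.

```python
import math

# Hypothetical "trained" weights: 2 inputs -> 2 hidden units -> 1 output.
W1 = [[0.5, -0.2], [0.1, 0.8]]  # one row of input weights per hidden unit
B1 = [0.0, 0.1]
W2 = [1.0, -1.5]
B2 = 0.2


def relu(x):
    return x if x > 0.0 else 0.0


def predict(inputs):
    """Forward pass: a handful of multiply-adds, no GPU involved."""
    hidden = [relu(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(W1, B1)]
    logit = sum(w * h for w, h in zip(W2, hidden)) + B2
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid turns the logit into a probability


print(predict([1.0, 2.0]))
```

The cost of a prediction grows with the number of weights, which is why a GPU only starts to pay off at inference time for larger models or when many inputs must be processed at once.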
I've already trained a neural network in Keras to distinguish two classes of images (cats and dogs) and measured its accuracy on test data. Is that enough for the conclusion of a master's thesis, or should I take other steps to evaluate the quality of the network (for instance, cross-validation)?
Not really, I would expect more than just accuracy from my students in any classification setup. Accuracy only evaluates that particular network on that particular test set, but you would, to some extent, have to justify the design choices you made in building that network. Here are some things to consider:
Presumably you have fixed some hyper-parameters; you can investigate how these affect your results. How many filters? How many layers? And, most importantly, why?
An important aspect of object classification is how your model handles noise. Depending on your dataset, one simple way to test this is to pre-process the test data (blur it, invert the colours, etc.), and you'll see that your performance drops. Why does it do that? What does the confusion matrix look like then?
What is the performance of the network? Is it fast or slow compared to another system, say VGG?
When you evaluate your project in general, not just the network, it helps a lot to ask why things worked, not just why things didn't work.
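The noise check above can be sketched as follows. This is only an illustration in plain Python: the label lists are invented, and in practice `pred_clean` and `pred_blurred` would come from running the trained network on the original and on the perturbed test images.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


labels = ["cat", "dog"]
y_true = ["cat", "cat", "cat", "dog", "dog", "dog"]
# Hypothetical predictions on the clean and on the blurred test set.
pred_clean = ["cat", "cat", "cat", "dog", "dog", "cat"]
pred_blurred = ["cat", "dog", "dog", "dog", "dog", "cat"]

for name, pred in [("clean", pred_clean), ("blurred", pred_blurred)]:
    print(name, accuracy(y_true, pred), confusion_matrix(y_true, pred, labels))
```

Comparing the two matrices shows which class absorbs the new errors; in this made-up example, blurring mostly turns cats into dogs, which a single accuracy number would hide.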
I am currently running a TensorFlow convnet for image recognition, and I am considering buying new GPUs to enable more complex graphs, larger batch sizes, and larger input dimensions. I have read posts like this that do not recommend using AWS GPU instances to train convnets, but more opinions are always welcome.
I've read TensorFlow's guide 'Training a Model Using Multiple GPU Cards', and it seems that the graph is duplicated across the GPUs. Is this the only way to use parallel GPUs for a TensorFlow convnet?
The reason I am asking is that if TensorFlow can only duplicate graphs across multiple GPUs, each GPU must have at least as much memory as my model requires for one batch. (For example, if the minimum memory required is 5GB, two cards of 4GB each would not do the job.)
Thank you in advance!
No, it is definitely possible to use different variables on different GPUs.
For every variable and every layer that you declare, you can choose which device it is placed on.
And in the specific case where you want to use multiple GPUs to duplicate your model, only to increase its batch_size training parameter and train faster, you would still need to explicitly build your model using the concept of shared parameters and manage how those parameters communicate.
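The "shared parameters" idea for the duplicated-model case can be sketched without any framework. The following plain-Python simulation is only an assumption-laden illustration, not TensorFlow code: a one-feature linear model stands in for the convnet, each "device" is just a slice of the batch, and all function names are hypothetical. In TensorFlow itself, placement is expressed with device scopes such as `tf.device`.

```python
def gradient(w, b, batch):
    """Mean-squared-error gradient of y = w*x + b over one batch."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def data_parallel_step(w, b, batch, n_devices=2, lr=0.05):
    """Split the batch, compute one gradient per replica, average, update the shared parameters."""
    size = len(batch) // n_devices  # assumes the batch divides evenly
    shards = [batch[i * size:(i + 1) * size] for i in range(n_devices)]
    grads = [gradient(w, b, shard) for shard in shards]  # each would run on its own GPU
    gw = sum(g[0] for g in grads) / n_devices
    gb = sum(g[1] for g in grads) / n_devices
    return w - lr * gw, b - lr * gb


batch = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # samples of y = 2x + 1
w, b = 0.0, 0.0
for _ in range(2000):
    w, b = data_parallel_step(w, b, batch)
print(round(w, 3), round(b, 3))
```

Every replica reads the same `w` and `b` and only the gradients are exchanged, so each device still has to hold the whole model; splitting the variables themselves across devices is the other option mentioned above.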
I'm searching for existing work on neural net architectures that grow based on need or on the complexity/variability of the training data. Some architectures that I've found include self-organizing maps and growing neural gas. Are these the only ones out there?
What I'm searching for is best illustrated by a simple scenario:
If the training data only has a few patterns, then the neural net would be 2-3 layers deep with a small set of nodes in each layer. If the training data were more convoluted, then we would see deeper networks.
Such work seems rare or absent in the AI literature. Is it because the performance is comparatively weak? I'd appreciate any guidance.
An example of this is called neuro-evolution. What you could do is combine backprop with evolution to find the optimal structure for your dataset. Neataptic is one of the NN libraries which offers neuro-evolution. With some simple coding you could turn this into backprop + evolution.
The disadvantage of this is that it requires much more computational power, since the genetic algorithm has to train and evaluate an entire population of networks. So using neuro-evolution does make the performance comparatively weak.
However, I think there are other techniques out there that disable certain nodes and remove them if doing so has no negative effect on the output. I'm not sure though.
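The node-disabling idea in the last paragraph is usually called pruning. A minimal sketch in plain Python, under heavy assumptions: the network is a single hidden layer with hand-picked weights rather than trained ones, and the greedy `prune` function is a hypothetical illustration, not the API of Neataptic or any other library.

```python
def forward(x, hidden_units):
    """Tiny network: each hidden unit is (w_in, b, w_out) with a ReLU activation."""
    return sum(w_out * max(0.0, w_in * x + b) for w_in, b, w_out in hidden_units)


def loss(hidden_units, data):
    return sum((forward(x, hidden_units) - y) ** 2 for x, y in data) / len(data)


def prune(hidden_units, data, tolerance=1e-6):
    """Greedily drop units whose removal does not raise the loss by more than tolerance."""
    kept = list(hidden_units)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if loss(candidate, data) <= loss(kept, data) + tolerance:
            kept = candidate  # unit i was not needed, try the unit that moved into slot i
        else:
            i += 1
    return kept


# Made-up units: the second has zero output weight, the third never activates on this data.
units = [(1.0, 0.0, 2.0), (0.7, 0.3, 0.0), (1.0, -100.0, 5.0)]
data = [(x, 2.0 * x) for x in [0.0, 1.0, 2.0, 3.0]]
print(len(units), "->", len(prune(units, data)))
```

This shrinks a network that is too large rather than growing one that is too small, so it addresses the question from the opposite direction.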
My neural network model was built in Keras over Theano using a GPU.
I am storing it using Pickle for future use, possibly on another computer.
Is it possible to use the model for prediction without a GPU?
Sure. It's even a common use-case. GPUs help boost training, but sometimes aren't available in production (for example, if you run on a customer's phone).
I don't know Theano well, but it might have an equivalent to TensorFlow Serving. You can always serialize the trained Model object and read it on the other machine.
To serialize, you can either use:
The built-in keras.models.save_model and keras.models.load_model, which write models to and read them from HDF5 files.
If you need/prefer pickle - it's basically not supported by Keras, but you can use this trick - http://zachmoshe.com/2017/04/03/pickling-keras-models.html
I have a training dataset which gives me the ranking of various cricket players (2008) on the basis of their performance in the preceding years (2005-2007).
I have to develop a model using this data and then apply it to another dataset to predict the ranking of players (2012) using the data already given to me (2009-2011).
Which predictive modelling technique will be best for this? What are the pros and cons of using the different forms of regression or neural networks?
The type of model to use depends on different factors:
Amount of data: if you have very little data, you had better opt for a simple prediction model like linear regression. If you use a prediction model which is too powerful, you run the risk of over-fitting your model, with the effect that it generalizes badly to new data. Now you might ask, what is little data? That depends on the number of input dimensions and on the underlying distributions of your data.
Your experience with the model: neural networks can be quite tricky to handle if you have little experience with them. There are quite a few parameters to be optimized, like the network layer structure, the number of iterations, the learning rate, and the momentum term, just to mention a few. Linear prediction is a lot easier to handle with respect to this "meta-optimization".
A pragmatic approach for you, if you still cannot opt for one of the methods, would be to evaluate a couple of different prediction methods. You take some of your data where you already have target values (the 2008 data), split it into training and test data (take some 10% as test data, e.g.), train and test using cross-validation and compute the error rate by comparing the predicted values with the target values you already have.
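The comparison procedure just described can be sketched in plain Python. Everything here is an illustrative assumption: the data is synthetic, there is a single input feature, the two candidate models are a constant-mean baseline and a one-variable least-squares fit, and 5 folds are used instead of a single 10% hold-out.

```python
def fit_mean(train):
    """Baseline: always predict the mean target of the training rows."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_linear(train):
    """Ordinary least squares for y = a*x + b with one input feature."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b


def cross_val_mse(fit, data, k=5):
    """Average held-out mean squared error over k folds."""
    errors = []
    for fold in range(k):
        test = data[fold::k]  # every k-th row is held out
        train = [row for i, row in enumerate(data) if i % k != fold]
        model = fit(train)
        errors.append(sum((model(x) - y) ** 2 for x, y in test) / len(test))
    return sum(errors) / k


# Synthetic rows: (past performance score, later ranking score) with a small alternating offset.
data = [(float(x), 3.0 * x + 2.0 + (0.5 if x % 2 else -0.5)) for x in range(20)]
for name, fit in [("mean baseline", fit_mean), ("linear regression", fit_linear)]:
    print(name, round(cross_val_mse(fit, data), 3))
```

Any other candidate, including a neural network, can be plugged in as another `fit` function, and the model with the lowest held-out error on the 2008 targets is the one to apply to the 2009-2011 data.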
One great book, which is also available on the web, is Pattern Recognition and Machine Learning by C. Bishop. It has a great introductory section on prediction models.
1. Which predictive modelling will be best for this? 2. What are the pros and cons of using the different forms of regression or neural networks?
"What is best" depends on the resources you have. Full Bayesian Networks (or k-Dependency Bayesian Networks) with information theoretically learned graphs, are the ultimate 'assumptionless' models, and often perform extremely well. Sophisticated Neural Networks can perform impressively well too. The problem with such models is that they can be very computationally expensive, so models that employ methods of approximation may be more appropriate. There are mathematical similarities connecting regression, neural networks and bayesian networks.
Regression is actually a simple form of neural network with some additional assumptions about the data. Neural networks can be constructed to make fewer assumptions about the data, but, as Thomas789 points out, at the cost of being considerably more difficult to understand (and sometimes monumentally difficult to debug).
As a rule of thumb: the more assumptions and approximations in a model, the easier it is to (A) understand and (B) find the computational power necessary, but potentially at the cost of performance. Conversely, a model with few assumptions is more prone to "overfitting" (this is when a model suits the training data well, but doesn't extrapolate to the general case).
Free online books:
http://www.inference.phy.cam.ac.uk/mackay/itila/
http://ciml.info/dl/v0_8/ciml-v0_8-all.pdf