Spark MLlib retraining saved models - Scala

I am trying to do classification with Spark MLlib, specifically using RandomForestModel.
I have taken a look at this example from Spark (RandomForestClassificationExample.scala), but I need a somewhat expanded approach.
I need to be able to train a model and save it for future use, but also to be able to load it and train further, i.e. extend the dataset and train again.

I completely understand the need to export and import a model for future use.
Unfortunately, training "further" is not possible with Spark, nor would it make sense. Thus it is recommended to retrain the model with the data used to train the first model plus the new data.
The values/metrics from your first training (e.g. features, intercept, coefficients) no longer make much sense once you add more data.
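A minimal PySpark sketch of that retrain-from-scratch loop (Python for consistency with the note on pickle files below; the Scala API is analogous). It assumes an active SparkContext sc and LibSVM-formatted data; all paths and parameters are placeholders:

from pyspark.mllib.tree import RandomForest, RandomForestModel
from pyspark.mllib.util import MLUtils

# Load the data used for the first model plus the newly collected data.
old_data = MLUtils.loadLibSVMFile(sc, "data/old_training_data")
new_data = MLUtils.loadLibSVMFile(sc, "data/new_training_data")

# "Training further" is not supported, so retrain from scratch on the union.
combined = old_data.union(new_data)
model = RandomForest.trainClassifier(
    combined, numClasses=2, categoricalFeaturesInfo={},
    numTrees=10, featureSubsetStrategy="auto",
    impurity="gini", maxDepth=5)

# Save the retrained model for the next round, and reload it later
# for scoring (not for further training).
model.save(sc, "models/rf_v2")
same_model = RandomForestModel.load(sc, "models/rf_v2")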
I hope that this answers your question.

You may need to look at some reinforcement learning technique instead of Random Forest if you want to reuse the old model and retrain it with new data.
As far as I know, deeplearning4j implements deep reinforcement learning algorithms on top of Spark (and Hadoop).

If you only need to save JavaRDD[Object], you can do (in Java):
model.saveAsObjectFile(pathOfYourModel);
Values will be written out using Java serialization. Then, to read your data back, you do:
JavaRDD<Object> model = jsc.objectFile(pathOfYourModel);
Be careful: object files are not available in Python. But you can use saveAsPickleFile() to write your model and pickleFile() to read it.
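For the Python side, a minimal sketch of that pickle-file round trip (the RDD contents and path are placeholders, and an active SparkContext sc is assumed):

# Write the RDD out with Python pickling.
rdd = sc.parallelize([{"weights": [0.1, 0.2]}, {"weights": [0.3, 0.4]}])
rdd.saveAsPickleFile("hdfs:///tmp/my_model_rdd")

# Read it back later.
restored = sc.pickleFile("hdfs:///tmp/my_model_rdd")
print(restored.collect())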

Related

How to create a "Denoising Autoencoder" in Matlab?

I know MATLAB has the function trainAutoencoder(input, settings) to create and train an autoencoder. The result is capable of running the two functions "encode" and "decode".
But this is only applicable to the case of normal autoencoders. What if you want a denoising autoencoder? I searched and found some sample code that used the network function to convert the autoencoder to a normal network and then ran train(network, noisyInput, smoothOutput), like a denoising autoencoder.
But there are multiple missing parts:
How do you use this new network object to "encode" new data points? It doesn't support encode().
How do you get the "latent" variables, i.e. the features, out of this network?
I would appreciate it if anyone could help me resolve this issue.
Thanks,
-Moein
At present (2019a), MATLAB does not let users add layers manually to an autoencoder. If you want to build your own, you will have to start from scratch using the layers provided by MATLAB.
In order to use trainNetwork(...) to train your model, you will have to find a way to get your data into an imageDatastore object. The difficulty with autoencoder data is that there are NO labels, which imageDatastore requires, so you will have to find a smart way around it; essentially you are dealing with a so-called OCC (one-class classification) problem.
https://www.mathworks.com/help/matlab/ref/matlab.io.datastore.imagedatastore.html
Use activations(...) to dump outputs from intermediate (hidden) layers:
https://www.mathworks.com/help/deeplearning/ref/activations.html?searchHighlight=activations&s_tid=doc_srchtitle
I swung between using MATLAB and Python (Keras) for deep learning for a couple of weeks; eventually I chose the latter, although I am a long-term and loyal MATLAB user and a rookie in Python. My two cents: there are too many restrictions in the former regarding deep learning.
Good luck. :-)
If by 'simulation' you mean prediction/inference, simply use activations(...) to dump outputs from any intermediate (hidden) layer, as I mentioned earlier, so that you can check them.
Another way is to construct an identical network with the encoding part only, copy your trained parameters into it, and feed it your simulated signals.
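Since this answer settles on Keras, here is a minimal sketch of the denoising setup the question asks about, including the "encode" and "latent features" parts (layer sizes, noise level, and the random data are placeholders):

import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

input_dim, latent_dim = 784, 32

inp = Input(shape=(input_dim,))
latent = Dense(latent_dim, activation="relu")(inp)    # encoder
out = Dense(input_dim, activation="sigmoid")(latent)  # decoder

autoencoder = Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")

# Denoising: train on (noisy input -> clean target) pairs.
x_clean = np.random.rand(1000, input_dim).astype("float32")
x_noisy = x_clean + 0.1 * np.random.randn(1000, input_dim).astype("float32")
autoencoder.fit(x_noisy, x_clean, epochs=5, batch_size=64)

# The encode() equivalent: a sub-model ending at the latent layer
# yields the hidden features for new data points.
encoder = Model(inp, latent)
features = encoder.predict(x_noisy[:10])
print(features.shape)  # (10, 32)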

Object detection for a single object only

I have been working with object detection, but these methods consist of very deep neural networks and require lots of memory to store the trained models. E.g. I once trained a Mask R-CNN model, and the weights took 200 MB.
However, my focus is on detecting a single object only, so I guess these methods are not suitable. Is there any object detection method that can do this job with a low memory requirement?
You can try SSD or Faster R-CNN; they are easily available in the TensorFlow Object Detection API:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
Here you can get pre-trained models and config files.
You can select your model by looking at the speed and mAP (accuracy) columns, as per your requirements.
Following mukul's answer, I specifically recommend you check out SSDLite-MobileNetV2.
It's a lightweight model which is still expressive enough for good results.
This holds especially when you're restricting yourself to a single class, as you can see in the example of FaceSSD-MobileNetV2 here (note, however, that this is vanilla SSD).
So you can simply take the pre-trained model of SSDLite-MobileNetV2 with the corresponding config file and modify it for a single class.
This means changing num_classes to 1, modifying the label_map.pbtxt, and of course preparing the dataset with the single class you want; a sketch of the config change follows below.
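A hedged sketch of that single-class change using the config utilities that ship with the TF Object Detection API (the file and directory names are placeholders; label_map.pbtxt and the TFRecord paths still have to point at your own dataset):

from object_detection.utils import config_util

# Load the downloaded SSDLite-MobileNetV2 pipeline config.
configs = config_util.get_configs_from_pipeline_file(
    "ssdlite_mobilenet_v2_coco.config")

# Restrict the detector to a single class.
configs["model"].ssd.num_classes = 1

# Write the modified pipeline back out for training.
pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, "my_single_class_model")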
If you want a more robust model, but one which has no pre-trained detection model, you can use an FPN version.
Check out this config file, which is with MobileNetV1, and modify it for your needs (e.g. switching to MobileNetV2, switching to use_depthwise, etc.).
On one hand there is no pre-trained detection model, but on the other the detection head is shared over all (relevant) scales, so it's somewhat easier to train.
So simply fine-tune it from the corresponding classification checkpoint from here.

Train a huge Inception model with Keras

I need to train an Inception model with more than 400,000 images.
I know I can't load them all into memory, since the dataset is too big.
So I will certainly have to train batch by batch, loading every batch from disk.
But won't that be very slow?
Do you know of a different way of doing it?
I also want to apply different, random transformations to my images during training.
I looked at the ImageDataGenerator class, but it's incompatible with the images I have.
So, is there a way to do that without the generator?
Thank you!
You can use the fit_generator method (https://keras.io/models/model/#fit_generator) of the model. This still loads images into memory, but it is done in parallel and has less overhead. You can write your own generator to apply the transformations you want (https://wiki.python.org/moin/Generators); see the sketch below.
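A minimal sketch of such a custom generator, assuming the images sit on disk as individual files and the labels are a NumPy array (paths, image size, and batch size are placeholders):

import numpy as np
from keras.preprocessing.image import load_img, img_to_array

def image_batch_generator(paths, labels, batch_size=32):
    # Keras generators must loop forever; fit_generator stops each epoch
    # via steps_per_epoch.
    while True:
        idx = np.random.permutation(len(paths))
        for start in range(0, len(paths), batch_size):
            batch = idx[start:start + batch_size]
            x = np.stack([
                img_to_array(load_img(paths[i], target_size=(299, 299)))
                for i in batch]) / 255.0
            y = labels[batch]
            # random transformations could be applied to x here
            yield x, y

# model.fit_generator(image_batch_generator(train_paths, train_labels),
#                     steps_per_epoch=len(train_paths) // 32, epochs=10)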
If you need faster access you can take a look at HDF5. You can store the images in an HDF5 file to provide faster indexing and loading for your program (http://www.h5py.org/).
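A sketch of that HDF5 idea with h5py (shapes and file names are placeholders); batches can then be sliced straight from disk without loading the whole dataset:

import h5py

# One-time conversion: create one big dataset and fill it image by image.
with h5py.File("images.h5", "w") as f:
    dset = f.create_dataset("x", shape=(400000, 299, 299, 3), dtype="uint8")
    # inside your conversion loop: dset[i] = image_array

# In the generator: read only the images of the current batch.
with h5py.File("images.h5", "r") as f:
    batch = f["x"][0:32] / 255.0
    print(batch.shape)  # (32, 299, 299, 3)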

Can I get predictions from Winbugs/OpenBUGS?

I am new to WinBUGS and OpenBUGS. I just ran an example model, and am wondering whether I can get the predictions generated by WinBUGS/OpenBUGS. If not, is there any convenient way to achieve this (e.g. with the help of other applications such as R)?
Yes, you can. In Bayesian tools it is very easy to get predictions: in your design matrix, just add new rows with the response variable set to NA. You can see a concrete example here.
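WinBUGS/OpenBUGS themselves are usually driven from the GUI or from R (e.g. via R2WinBUGS/R2OpenBUGS). As an illustration of the same NA trick in Python, here is a sketch with PyMC standing in for BUGS (an assumption, not part of the answer above): masked responses are treated as unknowns and sampled, i.e. they become predictions.

import numpy as np
import pymc as pm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
# The last two responses play the role of the NA rows in the design matrix.
y = np.ma.masked_invalid([2.3, 4.1, 5.8, 8.2, np.nan, np.nan])

with pm.Model():
    a = pm.Normal("a", 0.0, 10.0)
    b = pm.Normal("b", 0.0, 10.0)
    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("y", mu=a + b * x, sigma=sigma, observed=y)
    trace = pm.sample(1000)

# The imputed entries appear in the trace as their own variable
# (named e.g. "y_missing" or "y_unobserved", depending on the PyMC version).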

Sample data for testing binary linear classification code

I am looking for some sample binary data for testing my linear classification code. I need a data set where the data is 2D and belongs to either one of two classes. If anyone has such data or any reference for it, kindly reply. Any help is appreciated.
I have my own dataset which contains 2 categories of data with 2 features each:
http://dl.dropbox.com/u/28068989/segmentation_mi_kit.zip
Extract this archive and go to 'segmentation_mi_kit/mango_banyan_dataset/'.
Alternatively, if you want something standard to test your algorithm on, have a look at the UCI Machine Learning Repository: http://www.ics.uci.edu/~mlearn/
I guess that's the kind of data you need.
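If generating a toy set yourself is also an option, here is a sketch with scikit-learn (an assumption, not something the answers above mention; sizes and centers are arbitrary):

from sklearn.datasets import make_blobs

# 200 points, 2 features, 2 well-separated classes.
X, y = make_blobs(n_samples=200, n_features=2,
                  centers=[(-2, -2), (2, 2)], cluster_std=1.0,
                  random_state=0)
print(X.shape, y.shape)  # (200, 2) (200,)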