How do I save H2O Sparkling Water models to disk - pyspark

I have PySpark code that trains an H2O DRF model. I need to save this model to disk and then load it again later.
from pysparkling.ml import H2ODRF
drf = H2ODRF(featuresCols = predictors,
             labelCol = response,
             columnsToCategorical = [response])
I cannot find any documentation on this, so I am asking here.

I think the section of the docs on deploying pipeline models might be relevant: https://docs.h2o.ai/sparkling-water/2.3/latest-stable/doc/deployment/pysparkling_pipeline.html
Pipelines may not be what you're looking for depending on the use case.
Something like the following might work for your use case.
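from pyspark.ml import Pipeline, PipelineModel
from pysparkling.ml import H2ODRF
drf = H2ODRF(featuresCols = predictors,
             labelCol = response,
             columnsToCategorical = [response])
# Wrap the estimator in a pipeline so the fitted model can be written to disk.
pipeline = Pipeline(stages=[drf])
model = pipeline.fit(data)
model.save("drf_model")
# Load the saved pipeline model back (untested here with H2O stages).
loaded = PipelineModel.load("drf_model")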

Related

Save a pipeline Logistic Regression model

lrmodel=logreg_pipeline.fit(X_train_resh,y_train_resh)
lrmodel.write().overwrite().save("E:/strokestuff/strokelrpred")
lrmodel.save("E:/strokestuff/strokelrpred")
lrmodel is a fitted pipeline, and I want to save it. My aim is to save this model and then load it so I can deploy it in Flutter. I have tried every solution I could find. Can someone help me with this?
You can use joblib to save your model in .joblib file:
import joblib
pipe_clf_params = {}
filename = 'E:/strokestuff/strokelrpred/strokelrpred.joblib'
pipe_clf_params['pipeline'] = lrmodel
joblib.dump(pipe_clf_params, filename)
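Loading it back mirrors the save. The following is only a sketch: it assumes lrmodel is a picklable object such as a scikit-learn pipeline (a Spark ML pipeline would not work this way), and the helper names save_bundle and load_pipeline are made up for illustration. A plain dict stands in for the fitted pipeline so the snippet runs on its own.

```python
import os
import tempfile

import joblib


def save_bundle(pipeline, path):
    """Store the pipeline under the 'pipeline' key, as in the answer above."""
    joblib.dump({"pipeline": pipeline}, path)


def load_pipeline(path):
    """Read the bundle back from disk and return the stored pipeline."""
    return joblib.load(path)["pipeline"]


if __name__ == "__main__":
    # Hypothetical stand-in for a fitted pipeline such as lrmodel.
    stand_in = {"note": "placeholder for a fitted pipeline"}
    path = os.path.join(tempfile.mkdtemp(), "strokelrpred.joblib")
    save_bundle(stand_in, path)
    restored = load_pipeline(path)
    print(restored == stand_in)
```

Note that a .joblib file can only be loaded from Python, so a Flutter app would typically call the model through a small Python service rather than load the file directly.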

How to deploy a pytorch model?

I trained a style transfer model on thousands of images. I saved a checkpoint after every 1,000 images and also saved the final weights of the transform network. Now I want to export it as a model that I can use in an app, but none of my searches give a clear answer on how to do that.
VG1 = vgg.VGG16("/kaggle/working/transformer_weight.pth")
example = torch.rand(1, 3, 800, 800)
traced_script_module = torch.jit.script(VG1, example)
traced_script_module.save('kaggle/working')
but it gives
RuntimeError:
Module 'Sequential' has no attribute '_modules' :
  File "/kaggle/working/vgg.py", line 43
        layers = {'3': 'relu1_2', '8': 'relu2_2', '15': 'relu3_3', '22': 'relu4_3'}
        features = {}
        for name, layer in self.features._modules.items():
                           ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            x = layer(x)
            if name in layers:
I am a beginner and I have been trying for days. Please tell me if you need more information.
I want to save the model so that I can use it in Android Studio to build an app.
The notebook is at https://www.kaggle.com/starktony45/fast-neural-style

LSTM dimension issues with swift/coreml implementation

I have built an LSTM model for audio classification using Keras with TensorFlow as the backend. After converting it to a .mlmodel with coremltools I am running into issues, as you can see here. The dimensions are very different from what is expected.
I used this as the base for my Swift project in Xcode.
In particular, I believe this snippet is what is giving me trouble:
    do {
        let request = try SNClassifySoundRequest(mlModel: soundClassifier.model)
        try analyzer.add(request, withObserver: resultsObserver)
    } catch {
        print("Unable to prepare request: \(error.localizedDescription)")
        return
    }
}
Running this model gives me the following error:
Invalid model, inputDescriptions.count = 5
Unable to prepare request: Invalid model, inputDescriptions.count = 5
Even though when I build the model I see what is expected in the spec:
description {
  input {
    name: "audioSamples"
    shortDescription: "Audio from microphone"
    type {
      multiArrayType {
        shape: 13
        dataType: DOUBLE
      }
    }
  }
I am trying to incorporate this post into my code but I am not sure how to format it to my needs. Any advice is greatly appreciated. I can see that MLMultiArray is the key to my question, but I am unsure of: how to put the proper data into it and how to push this into a SNClassifySoundRequest type.
keras == 2.3.1
coremltools == 3.3
When you use SNClassifySoundRequest, your model needs to have a certain structure. I don't know the exact details off the top of my head, but I think it needs to be a pipeline where the first model is a built-in model that converts the audio to spectrograms.
If you trained your model with Keras, it's most likely not compatible with the requirements of SNClassifySoundRequest.
The good news is that you don't need SNClassifySoundRequest to run your model. Simply call soundClassifier.prediction(...) on the model.
Note that you need to pass in the input but also the hidden states of the LSTM layers. Core ML will not automatically manage the LSTM state for you (unlike Keras).

How to convert tensorflow .pb file to .bytes?

I am trying to convert the android tensorflow example provided in the tensorflow github into a Unity project. I have a .pb file for ssd_mobilenet_v1_android_export. But to use tensorflow models in Unity you have to have the model in a .bytes format. I can't figure out how to convert my .pb file to .bytes. I was going to use this code but I don't have any checkpoints for this graph, only the .pb file.
from tensorflow.python.tools import freeze_graph
freeze_graph.freeze_graph(input_graph = model_path +'/raw_graph_def.pb',
                          input_binary = True,
                          input_checkpoint = last_checkpoint,
                          output_node_names = "action",
                          output_graph = model_path +'/your_name_graph.bytes' ,
                          clear_devices = True, initializer_nodes = "",input_saver = "",
                          restore_op_name = "save/restore_all", filename_tensor_name = "save/Const:0")
Is there a simple way to do this conversion? Or a simple way to get a checkpoint for this model? It seems like this should be obvious but I can't figure it out. Thanks.
You can just switch extension from .pb to .bytes and for most cases this will work just fine. Check my TF Classify example for Unity.
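If you would rather keep the original .pb file and script the step, the extension switch can be sketched with the standard library alone. The helper name copy_as_bytes and the file name in the demo are made up for illustration, and the sketch assumes the .pb file is already a frozen graph. The contents are copied byte for byte, nothing is converted.

```python
import shutil
import tempfile
from pathlib import Path


def copy_as_bytes(pb_path):
    """Copy a .pb file next to itself with a .bytes extension and return the new path."""
    src = Path(pb_path)
    dst = src.with_suffix(".bytes")
    shutil.copyfile(src, dst)  # file contents are copied unchanged
    return dst


if __name__ == "__main__":
    # Hypothetical stand-in file so the sketch runs without a real model.
    workdir = Path(tempfile.mkdtemp())
    fake_graph = workdir / "frozen_graph.pb"
    fake_graph.write_bytes(b"not a real graph")
    print(copy_as_bytes(fake_graph).name)
```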

saving picture to mongodb

I am trying to do this using Tornado, PIL, and MongoDB.
avat = self.request.files['avatar'][0]["body"]
nomfich = self.request.files['avatar'][0]["filename"]
try:
    image = Image.open(StringIO.StringIO(buf=avat))
    size = image.size
    type = image.format
    avatar = r"/profile-images/{0}/{1}".format(pseudo, nomfich)
except IOError:
    self.redirect("/erreur-im")
and the database code:
user={
    "pseudo": pseudo,
    "password":password,
    "email":email,
    "tel":tel,
    "commune":commune,
    "statut":statut,
    "nom":nom,
    "prenom":prenom,
    "daten":daten,
    "sexe":sexe,
    "avatar":avatar
}
self.db.essog.insert(user)
This works, and the "avatar" field is saved, but there is no image: only the file name is stored!
My questions are:
How does the database deal with pictures? Must I call image.save(path, format), and if so, is the path an ordinary file-system path (Windows or Linux)?
The profile is simple, and I have limited picture uploads to 500 KB. Since a MongoDB document can be up to 16 MB, a single document can hold the entire profile. Must I still use GridFS for a small document when it contains a picture?
The key problem is the path used for saving the picture. I am stuck, and this is the first time I have dealt with a database, so I apologize for the question.
You don't necessarily need GridFS for storing files in MongoDB, but it surely makes it a nicer experience, because it handles the splitting and saving of the binary data, while making the metadata also available. You can then store an ID in your User document to the avatar picture.
That aside, you could also store binary data directly in your documents, though in your code you are not saving the data. You simply are opening it with PIL.Image, but then doing nothing with it.
Assuming you are using pymongo for your driver, I think what you can do is just wrap the binary data in a Binary container, and then store it. This is untested by me, but I assume it should work:
from bson.binary import Binary  # older PyMongo releases exposed this as pymongo.binary
binary_avatar = Binary(avat)
user={
    ...
    "avatar":avatar,
    "avatar_file": binary_avatar
    ...
}
Now that being said... just make it easier on yourself and use GridFS. That is what it is meant for.
If you were to use GridFS, it might look like this:
from gridfs import GridFS
avat_ctype = self.request.files['avatar'][0]["content_type"]
fs = GridFS(db)
avatar_id = fs.put(avat, content_type=avat_ctype, filename=nomfich)
user={
    ...
    "avatar_name":avatar,
    "avatar_id": avatar_id
    ...
}
Here is code to insert and retrieve an image in MongoDB without using GridFS.
import base64
from django.http import HttpResponse  # assumed: the views look like Django views
# `db` is assumed to be an existing PyMongo database handle.

def insert_image(request):
    # Read the file named in the query string and base64-encode it as text.
    with open(request.GET["image_name"], "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode("ascii")
    db.database_name.insert_one({"image": encoded_string})
    return HttpResponse("inserted")

def retrieve_image(request):
    # Take the first stored document and embed its image as a data URI.
    doc = db.database_name.find_one()
    img_tag = '<img alt="sample" src="data:image/png;base64,{0}">'.format(doc["image"])
    return HttpResponse(img_tag)
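The encode and decode steps in the code above can be checked without a database. The following is a small sketch using only the standard library. The helper names to_data_uri and from_data_uri are made up for illustration, and the PNG MIME type is assumed as in the answer. Keep in mind that base64 text is roughly a third larger than the raw bytes, which matters for the document size limit.

```python
import base64


def to_data_uri(raw, mime="image/png"):
    """Base64-encode raw image bytes and wrap them in a data URI."""
    encoded = base64.b64encode(raw).decode("ascii")
    return "data:{0};base64,{1}".format(mime, encoded)


def from_data_uri(uri):
    """Recover the raw bytes from a base64 data URI."""
    _header, _sep, payload = uri.partition(",")
    return base64.b64decode(payload)


if __name__ == "__main__":
    sample = b"\x89PNG\r\n\x1a\n"  # the PNG signature, used as stand-in image bytes
    uri = to_data_uri(sample)
    print(from_data_uri(uri) == sample)
```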
There is an error in:
from pymongo.binary import Binary
The correct import is:
from bson.binary import Binary
Thank you all for your endless support.
Luca
You need to save binary data using the Binary() datatype of pymongo.
http://api.mongodb.org/python/2.0/api/bson/binary.html#module-bson.binary