Saving SparkXGBoost model yields empty directory - pyspark

I'm trying to save a SparkXGBoost model using the following code (based on documentation):
model_1.save("/tmp/xgboost-pyspark-model")
The model was trained on a dataset and returns correct predictions. However, the directories created by the code (xgboost-pyspark-model/metadata and xgboost-pyspark-model/model) are empty (only _SUCCESS files are there). When I try to load the model:
model_2 = SparkXGBClassifier.load("./models/xgboost-pyspark-model")
I'm, getting the following error:
ValueError: RDD is empty
I tried to save the model using model.write().overwrite().save() code as well. The same result.

Related

An error occurs when using MLDataTable to load data

I tried to create a word tagger model in Swift according to this tutorial in the latest XCode. But I cannot load data from a local file using MLDataTable. Here is my code.
let data = try MLDataTable(contentsOf:
URL(fileURLWithPath: "/path/to/data.json"))
The error is as follows.
error: Couldn't lookup symbols:
CreateML.MLDataTable.init(contentsOf: Foundation.URL, options:
CreateML.MLDataTable.ParsingOptions) throws -> CreateML.MLDataTable
I tried absolute path and relative path, but neither of them worked(I am pretty sure that the data file is in the right location and the paths are correct). In addition, I can load the local file to a URL object, so the problem should lie in MLDataTable.
Could someone help?
I have the same error however I used .csv file. But the problem is solved when I use COREML tool under developer tools of Xcode.
Here are some recommendations:
Your training data's class column label should be "label"
Your training data can be one file but testing data should contains sub folders named exactly the same name of your label's names. To illustrate, you have "negative", "positive" and "neutral as label names. Then you should have three sub folders named "negative", "positive" and "neutral". Moreover testing data files can't be one json or csv file including all the testing data. For example if you have five rows of negative labeled data, you can't put that csv file under negative sub-folder. You have to create five txt file for each five row.

Reading files frame by frame

I have a folder containing .ply files. I want to read them and plot them like an animation. Initially i am trying to read the files and plot individually using the following code:
testfiledir = 'Files\';
plyfiles = dir(fullfile(testfiledir, '*.ply'));
for k=1:length(plyfiles)
FileNames = plyfiles(k).name;
plys=pcread(FileNames);
pcshow(plys)
end
But while running the script i get the error:
Error using pcread (line 51)
File "val0.ply" does not exist.
Error in read_pcd (line 6)
plys=pcread(FileNames);
val0.ply is one my first frame which is read in the variable 'plyfiles'
Where am I making mistake?
Use a datastore it is much easier and will keep track of everything for you. E.g.
ds = fileDatastore("Files/","ReadFcn",#pcread,"FileExtensions",".ply");
then you can read the files from it using read or readall, e.g.
while hasdata(ds)
plys = read(ds);
pcshow(plys)
end
It is slightly slower than if you can make the optimal implementation, but I prefer it big-time for its ease.

CNTK TransferLearning.model

I've been following the Build your own image classifier using Transfer Learning tutorial found at this link.
At the end of the tutorial, it says you use your own data set, which I created by following the example in the tutorial. It successfully completed and created a folder ~/CNTK-Samples-2-3-1/Examples/Image/TransferLearning/Output, which is expected. Inside this output folder is predictions.txt, predOutput.txt, and TransferLearning.model.
My question is, how do I access TransferLearning.model? I'm not sure what to do with it and I can't find anything in the documentation allowing me to alter it or run it.
Once you successfully create your model, you can use it in Python environment (with CNTK) to classify your images. The code should look something like this:
#load the trained transfer learning model
trained_model = load_model(model_file)
#for every new image:
#get predictions for a single image
probs = eval_single_image(trained_model, img_file, image_width, image_height)

export als recommendation model to a file

I am new to Apache Spark. I ran the sample ALS algorithm code present in the examples folder. I gave a csv file as an input. When I use model.save(path) to save the model, it is stored in gz.parquet file.
When I tried to open this file, I get these errors
Now I want to store the recommendation model generated in a text or csv file for using it outside Spark.
I tried the following function to store the model generated in a file but it was useless:
model.saveAsTextFile("path")
Please suggest me a way to overcome this issue.
Lest say you have trained your model with something like this:
val model = ALS.train(ratings, rank, numIterations, 0.01)
All that you have to do is:
import org.apache.spark.mllib.recommendation.ALS
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.mllib.recommendation.Rating
// Save
model.save(sc, "yourpath/yourmodel")
// Load Model
val sameModel = MatrixFactorizationModel.load(sc, "yourpath/yourmodel")
As it turns out saveAsTextFile() only works on the slaves.Use collect() to collect the data from the slaves so it can be saved locally on the master. Solution can be found here

error using save can't write file

I have got this really strange error in matlab. When I try to run the command
save(fullfile('filepath','filename'),'var','-v7');
I get the error message,
error using save can't write file
but when I try
save(fullfile('filepath','filename'),'var','-v7.3');
everything works fine. The the variable takes some space on the workspace, 165MB, but the I would guess that the size should not be an issue here. Does anyone know why it does not work to save in v7?
For the one that want to confirm the size of the variable, I will add the whos information,
Name Size Bytes Class Attributes
myName 1x1 173081921 struct
BR/ Patrik
EDIT
The variable I try to save is a struct with plenty of fields. I have tried to save a 3 dimensional matrix of size 800 mb, which went through without problems.
This is not an exact match to your problem, but I received the same error message when trying to save to -v6 format. Matlab is supposed to issue an error when a variable type or size is not supported:
help save
...
If any data items require features that the specified version does not support, MATLAB does not save those items and issues a warning. You cannot specify a version later than your version of MATLAB software.
Matlab's error checking seems to not be perfect, because there are certain situations (dependent on Matlab version and the particular variable type) that just fail all together with this not so helpful error message:
Error using save
Can't write file filename.mat.
For example, saving a string with certain unicode characters with the '-v6' option in Matlab r2015b in Linux produces the error, but Matlab r2016a in Windows does not. This is the output from my Matlab r2015b session:
>> A=char(double(65533))
A =
?
>> save('filename.mat','-v6','A')
Error using save
Can't write file filename.mat.
Without having your specific variable to test with, but seeing that the error messages match, I suggest removing parts of your data structure until it will save in order to isolate the variable that is causing it to fail.