PySpark NLP ContextSpellChecker unable to load from path - pyspark

I cannot load ContextSpellCheckerModel from a path, but I am able to load other models. Is there an issue with this model? Thank you.
spellModel = ContextSpellCheckerModel \
    .load(parameters["paths"]["model_check_spelling"]) \
    .setInputCols("token") \
    .setOutputCol("checked")

Related

How to load a spark-nlp pre-trained model from disk

From the spark-nlp GitHub page I downloaded a .zip file containing a pre-trained NerCrfModel. The zip contains three folders: embeddings, fields, and metadata.
How do I load that into a Scala NerCrfModel so that I can use it? Do I have to drop it into HDFS or onto the host where I launch my Spark shell? How do I reference it?
You just need to provide the path to the folder that contains the folders you mentioned:
import com.johnsnowlabs.nlp.annotators.ner.crf.NerCrfModel
val path = "path/to/unzipped/file/folder"
val model = NerCrfModel.read.load(path)
// use your model
model.setInputCols(someCol)
model.transform(yourData) // yourData must contain 'someCol'
As far as I remember, you can place the folder either in the local FS or in a distributed FS. Hope this helps other users as well!
best,
Alberto.

In Spark MLlib, How to save the BisectingKMeansModel with Python to HDFS?

In Spark MLlib, BisectingKMeansModel in PySpark has no save/load function.
Why?
How can I save or load a BisectingKMeans model with Python to HDFS?
It may be your Spark version. For bisecting k-means it is recommended to use version 2.1.0 or above.
You can find a complete example on the class pyspark.ml.clustering.BisectingKMeans here, hope it helps:
https://spark.apache.org/docs/2.1.0/api/python/pyspark.ml.html#pyspark.ml.clustering.BisectingKMeans
The last part of the example code includes saving and loading the model:
model_path = temp_path + "/bkm_model"
model.save(model_path)
model2 = BisectingKMeansModel.load(model_path)
It works for HDFS as well, but make sure that the temp_path/bkm_model folder does not exist before saving the model, or it will give you an error:
(java.io.IOException: Path <temp_path>/bkm_model already exists)

Spark read error file path does not exist

Hi Everyone,
While reading data from a file in Spark I'm getting a "path does not exist" error. Please find the screenshot of the same below.
Could you please tell me what I missed when processing the data?
Many thanks for your help in advance.
Regards,
Sunitha.
Your path should include the file extension; it's missing here.
Add the extension to us-500.
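That fix can be sanity-checked in code: before handing a path to the reader, verify the file name actually carries an extension. A plain-Python sketch; the helper is hypothetical, not a Spark API:

```python
import os

def has_extension(path):
    """True if the file name in `path` carries an extension,
    e.g. 'data/us-500.csv' but not 'data/us-500'."""
    return bool(os.path.splitext(path)[1])
```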

export als recommendation model to a file

I am new to Apache Spark. I ran the sample ALS algorithm code in the examples folder, with a CSV file as input. When I use model.save(path) to save the model, it is stored as gz.parquet files.
When I tried to open those files, I got errors.
Now I want to store the generated recommendation model in a text or CSV file so I can use it outside Spark.
I tried the following to store the model in a file, but it did not work:
model.saveAsTextFile("path")
Please suggest me a way to overcome this issue.
Let's say you have trained your model with something like this:
val model = ALS.train(ratings, rank, numIterations, 0.01)
All that you have to do is:
import org.apache.spark.mllib.recommendation.ALS
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.mllib.recommendation.Rating
// Save
model.save(sc, "yourpath/yourmodel")
// Load Model
val sameModel = MatrixFactorizationModel.load(sc, "yourpath/yourmodel")
As it turns out, saveAsTextFile() only works on the slaves. Use collect() to collect the data from the slaves so it can be saved locally on the master. Solution can be found here.
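To get the model out of Spark as plain CSV, one common route is to collect() the factor matrices on the driver and write them with an ordinary CSV writer. A hedged sketch: collecting model.userFeatures (or productFeatures) yields (id, features) pairs; here they are stubbed with made-up numbers so the writing step is self-contained and does not need a running cluster:

```python
import csv
import io

def factors_to_csv(factors):
    """Write (id, features) pairs, such as those returned by collecting
    model.userFeatures / model.productFeatures, into CSV text:
    one row per id, followed by its factor values."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for factor_id, features in factors:
        writer.writerow([factor_id] + list(features))
    return buf.getvalue()

# Stubbed stand-in for what collect() might return (made-up numbers):
collected = [(1, [0.1, 0.2]), (2, [0.3, 0.4])]
print(factors_to_csv(collected))
```

In a real job you would replace the stub with model.userFeatures.collect() and write the returned text to a file on the driver.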

Play Framework/Scala getting specific files

I have two inputs, image and music.
With the image I have no problems handling it as
val imageFile = request.body.file("imageFile").get.ref.file
However, there can be multiple music files, and I couldn't find a way to get them with request.body.file("musicFile").
I can get them via request.body.files, but this also returns the image file; the problem now is how I am going to identify them.
I am using Play Framework 2.1.1 with Scala.
Cheers,
I found my way: you can get all the files with request.body.files
and then filter them by key:
request.body.files.filter(file => file.key.equals("musicFile"))
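The same idea sketched outside Play: once you have all the uploaded parts, partition them by their form key so the single image and the multiple music files come apart cleanly. Plain Python with stand-in (key, filename) tuples; the Play API itself is not used here:

```python
def split_by_key(parts):
    """Group uploaded parts, given as (key, filename) tuples, by form key,
    mirroring request.body.files filtered on file.key."""
    grouped = {}
    for key, filename in parts:
        grouped.setdefault(key, []).append(filename)
    return grouped

parts = [("imageFile", "cover.png"),
         ("musicFile", "track1.mp3"),
         ("musicFile", "track2.mp3")]
```

grouped["musicFile"] then holds only the music uploads, and grouped["imageFile"] the single image.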