I ran a multi-class logistic regression with Spark, but I would like to use SVM to cross-validate the results.
It looks like Spark 1.6 only supports SVM for binary classification. Should I use other tools for this, H2O for example?
After some research, I found this branch, which was not integrated into Spark 1.6, that allowed me to run SVM on a multi-class classification problem.
Big thanks to Bekbolatov.
The commit can be found here:
https://github.com/Bekbolatov/spark/commit/463d73323d5f08669d5ae85dc9791b036637c966
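The internals of that commit are not described here, so the following is not its method. A common, separate way to get multi-class predictions from binary classifiers is one-vs-rest: train one binary model per class (that class vs. everything else) and predict the class whose model returns the largest raw margin. The plain-Python sketch below shows only that idea; `train_binary_hinge` is a hypothetical stand-in for a binary linear SVM (plain SGD on the L2-regularized hinge loss), not Spark's `SVMWithSGD`, and the hyperparameters are arbitrary.

```python
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias) of a linear scorer


def train_binary_hinge(xs: Sequence[Sequence[float]], ys: Sequence[int],
                       epochs: int = 200, lr: float = 0.1,
                       reg: float = 0.01) -> Model:
    """Hypothetical stand-in for a binary linear SVM: plain SGD on the
    L2-regularized hinge loss. Labels in ys must be +1 or -1."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * reg) for wi in w]  # L2 shrinkage on the weights
            if margin < 1.0:  # hinge loss is active: step toward the correct side
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def score(model: Model, x: Sequence[float]) -> float:
    """Raw margin w.x + b (no thresholding)."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_one_vs_rest(xs, labels, classes) -> Dict[int, Model]:
    # One binary problem per class: that class is +1, every other class is -1
    return {c: train_binary_hinge(xs, [1 if lab == c else -1 for lab in labels])
            for c in classes}


def predict_one_vs_rest(models: Dict[int, Model], x: Sequence[float]) -> int:
    # Pick the class whose binary model gives the largest raw margin
    return max(models, key=lambda c: score(models[c], x))


if __name__ == "__main__":
    xs = [[0.0, 0.0], [0.2, 0.1], [5.0, 0.0], [5.1, 0.3], [0.0, 5.0], [0.2, 5.2]]
    labels = [0, 0, 1, 1, 2, 2]
    models = train_one_vs_rest(xs, labels, classes=[0, 1, 2])
    print([predict_one_vs_rest(models, x) for x in xs])
```

In Spark 1.6 the same idea could be applied with one `SVMWithSGD` model per class, calling `clearThreshold()` on each model so that `predict` returns raw margins instead of 0/1 labels. Later Spark versions (2.2 and up) also ship `LinearSVC`, which can be wrapped in `spark.ml`'s `OneVsRest`.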
Related
Has anybody ever built a custom torch_geometric.data.InMemoryDataset for a Spark GraphFrame (or rather, PySpark DataFrames)? I looked for people who have done it already but didn't find anything on GitHub, Stack Overflow, etc., and I have little knowledge of PyTorch Geometric as of right now.
I'd be thankful for code samples, tips, or relevant links :)
You cannot run a GCN on Spark as of now, so PyTorch Geometric doesn't support Spark-based training.
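If the graph is small enough to collect to the driver, one possible workaround is to pull the GraphFrame's vertex and edge DataFrames (for example with `collect()`) and convert them into the COO `edge_index` layout that PyTorch Geometric graph objects use: a 2 x E pair of source and target node indices, with vertex ids renumbered to 0..N-1. The sketch below does only that conversion in plain Python. It assumes rows arrive as `(id, features)` and `(src, dst)` tuples, where `features` is a hypothetical array-of-doubles column; turning the result into tensors and a dataset class is left out.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def to_coo(vertices: Sequence[Tuple[Hashable, Sequence[float]]],
           edges: Sequence[Tuple[Hashable, Hashable]]):
    """Convert collected GraphFrame rows into (x, edge_index).

    vertices: (id, feature_vector) pairs, e.g. vertices.select("id", "features").collect()
    edges:    (src, dst) pairs,           e.g. edges.select("src", "dst").collect()
    """
    # Map arbitrary GraphFrame vertex ids to contiguous indices 0..N-1
    index: Dict[Hashable, int] = {vid: i for i, (vid, _) in enumerate(vertices)}
    x: List[List[float]] = [list(feats) for _, feats in vertices]
    sources: List[int] = []
    targets: List[int] = []
    for src, dst in edges:
        # An edge pointing at a vertex that was not collected is a data error; fail loudly
        if src not in index or dst not in index:
            raise KeyError(f"edge ({src!r}, {dst!r}) references an unknown vertex")
        sources.append(index[src])
        targets.append(index[dst])
    edge_index = [sources, targets]  # shape 2 x E, COO layout
    return x, edge_index


if __name__ == "__main__":
    x, edge_index = to_coo([("a", [1.0]), ("b", [2.0]), ("c", [3.0])],
                           [("a", "b"), ("b", "c"), ("c", "a")])
    print(x, edge_index)
```

This keeps training single-machine, consistent with the point above; Spark is only used to prepare and collect the data.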
Is it possible to write a custom loss function for multi-class classification in Spark using Scala? I want to implement multi-class logarithmic loss in Scala. I searched the Spark documentation but could not find any hint.
From the Spark 2.2.0 MLlib guide:
Currently, only binary classification is supported... This will likely change when multiclass classification is supported.
If you are not restricted to a particular classification technique, I would suggest using XGBoost. It has a Spark-compatible implementation, and it lets you use any loss function, provided you can compute its first and second derivatives.
You can find a tutorial here.
An explanation of why a custom loss function can be used can also be found here.
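For the multi-class log loss asked about above, what a second-order booster such as XGBoost needs from a custom objective is the gradient and (diagonal) Hessian of the loss with respect to each raw class score. With softmax probabilities p_k and a one-hot label y, the per-row loss is -log p_label, the gradient is p_k - y_k, and the Hessian diagonal is p_k(1 - p_k). The plain-Python sketch below computes these for a single row; the function names are hypothetical, and how the arrays are passed to XGBoost (or its Spark integration) is not shown.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multiclass_logloss(scores: List[float], label: int, eps: float = 1e-15) -> float:
    """Multi-class log loss (cross-entropy) for one row: -log p[label].
    The probability is clipped at eps to avoid log(0)."""
    p = softmax(scores)
    return -math.log(max(p[label], eps))


def logloss_grad_hess(scores: List[float], label: int) -> Tuple[List[float], List[float]]:
    """Gradient and diagonal Hessian of the log loss w.r.t. the raw scores."""
    p = softmax(scores)
    grad = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
    hess = [pk * (1.0 - pk) for pk in p]
    return grad, hess


if __name__ == "__main__":
    print(multiclass_logloss([0.0, 0.0, 0.0], 1))  # equals log(3), about 1.0986
    print(logloss_grad_hess([2.0, 0.5, -1.0], 0))
```

The dataset-level log loss is the mean of `multiclass_logloss` over all rows.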
Is there a runWithValidation feature for Gradient-Boosted Trees (GBT) in Spark ML to prevent overfitting? It exists in MLlib, which works with RDDs; I am looking for the same for DataFrames.
I found K-fold cross-validation support in Spark. It can be done using CrossValidator with an Estimator, an Evaluator, a set of ParamMaps, and the number of folds. This helps find the best parameters for the model, i.e., model tuning.
Refer to http://spark.apache.org/docs/latest/ml-tuning.html for more details.
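The idea behind Spark's `CrossValidator` can be sketched in plain Python: split the rows into k folds; for each candidate parameter setting, train on k-1 folds and score on the held-out fold; average the scores and keep the best setting. This is a simplified illustration, not Spark's implementation. `fit_constant` and `mse` are hypothetical placeholders for an Estimator and an Evaluator, and folds are assigned round-robin without shuffling. Note that this selects parameters; it is not the same as the early stopping that MLlib's `runWithValidation` performs during boosting.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[float, float]  # (feature, label) in this toy example


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Assign row indices 0..n-1 to k folds round-robin (no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_validate(rows: Sequence[Row], param_grid: Sequence[Dict],
                   fit: Callable, evaluate: Callable, k: int = 3):
    """Return (best_params, best_average_score); lower scores are better here."""
    folds = k_fold_indices(len(rows), k)
    best_params, best_score = None, float("inf")
    for params in param_grid:
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [r for i, r in enumerate(rows) if i not in held]
            test = [rows[i] for i in held_out]
            scores.append(evaluate(fit(train, params), test))
        avg = sum(scores) / len(scores)
        # Spark evaluators declare the direction via isLargerBetter; here smaller wins
        if avg < best_score:
            best_params, best_score = params, avg
    return best_params, best_score


def fit_constant(train: Sequence[Row], params: Dict) -> float:
    """Hypothetical 'model': predict a shrunk mean of the training labels."""
    mean = sum(label for _, label in train) / len(train)
    return params["shrink"] * mean


def mse(model: float, test: Sequence[Row]) -> float:
    return sum((label - model) ** 2 for _, label in test) / len(test)


if __name__ == "__main__":
    rows = [(float(i), 2.0) for i in range(9)]
    grid = [{"shrink": 0.0}, {"shrink": 0.5}, {"shrink": 1.0}]
    print(cross_validate(rows, grid, fit_constant, mse, k=3))
```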
I have developed an application that performs logistic regression using Spark MLlib. How can we visualize the results? In R, we can see the results graphically. Is there any way to visualize the results in a Scala Spark program as well?
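One common way to look at a binary logistic regression's results is an ROC curve. Spark MLlib's `BinaryClassificationMetrics` (in `org.apache.spark.mllib.evaluation`) can compute ROC points from an RDD of (score, label) pairs, and those points can be collected and plotted with any charting tool. The plain-Python sketch below only shows what the points are: for each distinct score used as a threshold, the false-positive rate and the true-positive rate. It assumes 0/1 labels with both classes present; rows with tied scores are grouped into one step.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, starting at (0, 0), one per distinct score."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative labels")
    # Highest score first: lowering the threshold admits more rows as positive
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all rows tied at this score together
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


if __name__ == "__main__":
    pts = roc_points([0.9, 0.8, 0.7, 0.1], [1, 0, 1, 0])
    print(pts, auc(pts))
```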
I do not think the Gaussian mixture model is available in MLlib yet. I am wondering whether a good Scala/Java implementation of GMM (suitable for large data) is available elsewhere. Please let me know.
It is available in Spark MLlib now:
http://spark.apache.org/docs/latest/mllib-clustering.html#gaussian-mixture
Have a look at https://issues.apache.org/jira/browse/SPARK-4156
It is still in progress. We can expect it in MLlib soon.
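For reference, the core of fitting a Gaussian mixture is the expectation-maximization (EM) loop. The sketch below is a plain-Python, single-machine, one-dimensional version with a fixed iteration count and a naive initialization (means spread over the sorted data, equal weights, global variance). It is meant only to show the E and M steps; it is not how Spark's distributed implementation works and it does not scale to large data.

```python
import math
from typing import List, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs: List[float], k: int, iters: int = 100,
               min_var: float = 1e-6) -> Tuple[List[float], List[float], List[float]]:
    """Fit a k-component 1-D Gaussian mixture with EM.
    Returns (weights, means, variances)."""
    n = len(xs)
    srt = sorted(xs)
    # Naive initialization
    means = [srt[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(xs) / n
    global_var = max(sum((x - overall_mean) ** 2 for x in xs) / n, min_var)
    variances = [global_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against total underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var)
    return weights, means, variances


if __name__ == "__main__":
    data = [0.0, 0.1, -0.1, 0.2, -0.2, 10.0, 10.1, 9.9, 10.2, 9.8]
    print(fit_gmm_1d(data, k=2))
```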