I have developed an application that performs logistic regression using Spark MLlib. How can we visualize the results? In R, for example, we can see the results graphically. Is there any way to visualize the results in a Scala Spark program as well?
Related
Is it possible to write a custom loss function for multi-class classification in Spark using Scala? I want to implement multi-class logarithmic loss in Scala, but I searched the Spark documentation and could not find any hint.
From the Spark 2.2.0 MLlib guide:
Currently, only binary classification is supported. This will likely change when multiclass classification is supported.
If you are not restricted to a particular classification technique, I would suggest using XGBoost. It has a Spark-compatible implementation, and it makes it possible to use any loss function provided you can compute its derivative twice.
You can find a tutorial here.
Also the explanation about why it is possible to use a custom loss function can be found here.
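As a sketch of what "compute its derivative twice" means here: for multi-class log loss over softmax probabilities, the gradient of the loss with respect to the raw score z_k is p_k − 1{k = label}, and the diagonal of the Hessian is p_k(1 − p_k). These are exactly the two quantities a boosting library asks a custom objective to supply. A plain-Scala illustration (not tied to any particular XGBoost API):

```scala
// Gradient and diagonal Hessian of the multi-class log loss
// L(z, y) = -log(softmax(z)_y), evaluated at raw scores z.
object MultiClassLogLoss {
  def softmax(z: Array[Double]): Array[Double] = {
    val m = z.max                       // subtract max for numerical stability
    val e = z.map(v => math.exp(v - m))
    val s = e.sum
    e.map(_ / s)
  }

  // First derivative: dL/dz_k = p_k - 1{k == label}
  def gradient(z: Array[Double], label: Int): Array[Double] = {
    val p = softmax(z)
    p.zipWithIndex.map { case (pk, k) => if (k == label) pk - 1.0 else pk }
  }

  // Second derivative (diagonal): d2L/dz_k^2 = p_k * (1 - p_k)
  def hessianDiag(z: Array[Double]): Array[Double] =
    softmax(z).map(pk => pk * (1.0 - pk))
}
```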
I'm new to Spark and Scala. I'm working on a project doing forecasting with ARIMA models. I see from the posts below that I can train ARIMA models with Spark.
I'm wondering what's the advantage of using spark for ARIMA models?
How to do time-series simple forecast?
https://badrit.com/blog/2017/5/29/time-series-analysis-using-spark#.W9ONGBNKi7M
The advantage of Spark is that it is a distributed processing engine. If you have a huge amount of data, which is typically the case in real-life systems, you need such an engine. Running any algorithm, not only ARIMA, on a platform like Spark brings benefits in scalability and performance.
I am currently looking for an Algorithm in Apache Spark (Scala/Java) that is able to cluster data that has numeric and categorical features.
As far as I have seen, there is an implementation for k-medoids and k-prototypes for pyspark (https://github.com/ThinkBigAnalytics/pyspark-distributed-kmodes), but I could not identify something similar for the Scala/Java version I am currently working with.
Is there another recommended algorithm to achieve similar things for Spark running Scala? Or am I overlooking something and could actually make use of the pyspark library in my Scala project?
If you need further information or clarification feel free to ask.
I think you first need to convert your categorical variables to numbers using OneHotEncoder; then you can apply your clustering algorithm from MLlib (e.g. k-means). I also recommend scaling or normalizing the features before clustering, since k-means is distance sensitive.
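A minimal pipeline sketch of that approach, assuming the Spark 3.x DataFrame-based ML API; the column names (`category`, `num1`, `num2`) are hypothetical placeholders for your own schema:

```scala
// Index a categorical column, one-hot encode it, assemble it with the
// numeric columns, scale everything, then cluster with k-means.
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.{OneHotEncoder, StandardScaler, StringIndexer, VectorAssembler}

val indexer = new StringIndexer()
  .setInputCol("category").setOutputCol("categoryIdx")

val encoder = new OneHotEncoder()
  .setInputCol("categoryIdx").setOutputCol("categoryVec")

val assembler = new VectorAssembler()
  .setInputCols(Array("num1", "num2", "categoryVec"))
  .setOutputCol("rawFeatures")

// k-means is distance based, so scale the assembled features first
val scaler = new StandardScaler()
  .setInputCol("rawFeatures").setOutputCol("features")

val kmeans = new KMeans().setK(3).setFeaturesCol("features")

val pipeline = new Pipeline()
  .setStages(Array(indexer, encoder, assembler, scaler, kmeans))

// val model = pipeline.fit(df)          // df: DataFrame with num1, num2, category
// val clustered = model.transform(df)   // adds a "prediction" column
```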
Wondering if there is a runWithValidation feature for Gradient Boosted Trees (GBT) in Spark ML to prevent overfitting. It exists in MLlib, which works with RDDs; I am looking for the same for DataFrames.
Found k-fold cross-validation support in Spark. It can be done using CrossValidator with an Estimator, an Evaluator, a ParamMap, and a number of folds. This helps in finding the best parameters for the model, i.e. model tuning.
Refer http://spark.apache.org/docs/latest/ml-tuning.html for more details.
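A sketch of that setup for a GBT classifier, assuming the DataFrame-based spark.ml API; the grid values and column names are illustrative:

```scala
// Tune a GBT classifier with k-fold cross-validation instead of
// runWithValidation: settings that overfit score worse on held-out folds.
import org.apache.spark.ml.classification.GBTClassifier
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val gbt = new GBTClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")

// grid over tree depth and number of boosting iterations
val grid = new ParamGridBuilder()
  .addGrid(gbt.maxDepth, Array(3, 5))
  .addGrid(gbt.maxIter, Array(20, 50))
  .build()

val cv = new CrossValidator()
  .setEstimator(gbt)
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(grid)
  .setNumFolds(3)

// val cvModel = cv.fit(trainingDf)  // picks the best ParamMap by average metric
```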
I ran a multi-class logistic regression with Spark, but I would like to use an SVM to cross-validate the results.
It looks like Spark 1.6 only supports binary SVM classification. Should I use other tools, such as H2O, to do this?
After some research, I found a branch, not integrated into Spark 1.6, that allowed me to run the SVM on a multi-class classification problem.
Big thanks to Bekbolatov.
The commit can be found here:
https://github.com/Bekbolatov/spark/commit/463d73323d5f08669d5ae85dc9791b036637c966
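For anyone on a newer Spark version: since Spark 2.2 the spark.ml package ships a linear SVM (LinearSVC), and the binary classifier can be lifted to multi-class with the OneVsRest wrapper, avoiding the need for a custom branch. A sketch, with hypothetical DataFrames `trainingDf`/`testDf` holding "label" and "features" columns:

```scala
// One-vs-all multi-class SVM built from the binary LinearSVC (Spark 2.2+).
import org.apache.spark.ml.classification.{LinearSVC, OneVsRest}

val svm = new LinearSVC().setMaxIter(50).setRegParam(0.1)
val ovr = new OneVsRest().setClassifier(svm)

// val ovrModel = ovr.fit(trainingDf)           // trains one binary SVM per class
// val predictions = ovrModel.transform(testDf) // adds a "prediction" column
```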