load pmml (generated by sklearn) in spark to predict but get error

I am following the jpmml-evaluator-spark instructions to load a local PMML model.
My code is below:
import java.io.File
import org.jpmml.evaluator.spark._
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql._
// load pmml
val pmmlFile = new File(getClass.getClassLoader.getResource("random_forest.pmml").getFile)
// create evaluator
val evaluator = EvaluatorUtil.createEvaluator(pmmlFile)
I cannot show the error message directly, so I put it here
Guesses:
Here are some reasons that I think may cause this problem:
1. "jpmml-evaluator-spark" does not support PMML 4.3, even though the author said the new version 1.1.0 already supports PMML 4.3.
2. There is something wrong with my "random_forest.pmml", because this file came from someone else.
Note:
Development environment:
Spark 2.1.1
Scala 2.11.8
I am running locally on a Mac; the system version is OS X El Capitan 10.11.6.

You are using Apache Spark 2.0, 2.1 or 2.2, which has prepended a legacy version of the JPMML-Model library (1.2.15, to be precise) to your application classpath. This issue is documented in SPARK-15526.
Solution: fix your application classpath as described in the JPMML-Evaluator-Spark documentation (alternatively, consider switching to Apache Spark 2.3.0 or newer).
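Once the classpath is fixed and EvaluatorUtil.createEvaluator succeeds, scoring a DataFrame goes through the library's transformer builder. A minimal Scala sketch, assuming the TransformerBuilder API shown in the JPMML-Evaluator-Spark README and a hypothetical input DataFrame inputDf whose columns match the PMML input fields:
import org.jpmml.evaluator.spark.{EvaluatorUtil, TransformerBuilder}
// Build a PMML evaluator from the local file (as in the question)
val evaluator = EvaluatorUtil.createEvaluator(pmmlFile)
// Wrap the evaluator in a Spark ML Transformer; withTargetCols/withOutputCols
// expose the model's target and output fields as result columns
val pmmlTransformer = new TransformerBuilder(evaluator)
  .withTargetCols()
  .withOutputCols()
  .exploded(true)
  .build()
// inputDf is a hypothetical DataFrame with the model's input columns
val scoredDf = pmmlTransformer.transform(inputDf)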

Another option for using PMML in Spark is PMML4S-Spark, which supports the latest PMML 4.4. For example:
import org.pmml4s.spark.ScoreModel
// Load the PMML model from a local file
val model = ScoreModel.fromFile(pmmlFile)
// Score an input DataFrame; the result contains the model's prediction columns
val scoreDf = model.transform(df)

Related

Cannot import Cosmosdb in databricks

I set up a new cluster on Databricks using Databricks Runtime version 10.1 (includes Apache Spark 3.2.0, Scala 2.12). I also installed azure_cosmos_spark_3_2_2_12_4_6_2.jar in Libraries.
I created a new notebook with Scala:
import com.microsoft.azure.cosmosdb.spark.schema._
import com.microsoft.azure.cosmosdb.spark.CosmosDBSpark
import com.microsoft.azure.cosmosdb.spark.config.Config
But I still get the error: object cosmosdb is not a member of package com.microsoft.azure
Does anyone know which step I am missing?
Thanks
It looks like the imports you are using are for the older Spark connector (https://github.com/Azure/azure-cosmosdb-spark).
For the Spark 3.2 Connector, you might want to follow the quickstart guides: https://learn.microsoft.com/azure/cosmos-db/sql/create-sql-api-spark
The official repository is: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/cosmos/azure-cosmos-spark_3-2_2-12
Complete Scala sample: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos-spark_3_2-12/Samples/Scala-Sample.scala
Here is the configuration reference: https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos-spark_3_2-12/docs/configuration-reference.md
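For reference, here is a minimal Scala sketch of reading a container with the Spark 3 connector, using the notebook's built-in spark session and assuming the cosmos.oltp format and the configuration keys from the configuration reference above; the endpoint, key, database and container values are placeholders:
// Placeholder connection details; replace with your own account values
val cosmosConfig = Map(
  "spark.cosmos.accountEndpoint" -> "https://<account>.documents.azure.com:443/",
  "spark.cosmos.accountKey" -> "<account-key>",
  "spark.cosmos.database" -> "<database>",
  "spark.cosmos.container" -> "<container>"
)
// Read the container as a DataFrame using the Spark 3 connector
val df = spark.read
  .format("cosmos.oltp")
  .options(cosmosConfig)
  .load()
df.show()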
You may be missing the pip install step:
pip install azure-cosmos

java.lang.NoSuchMethodError: com.mongodb.internal.connection.Cluster.selectServer

I am new to Apache Spark, and I am using Scala and MongoDB to learn it.
https://docs.mongodb.com/spark-connector/current/scala-api/
I am trying to read an RDD from my MongoDB database; my notebook script is below:
import com.mongodb.spark.config._
import com.mongodb.spark._
val readConfig = ReadConfig(Map("uri" -> "mongodb+srv://root:root@mongodbcluster.td5gp.mongodb.net/test_database.test_collection?retryWrites=true&w=majority"))
val testRDD = MongoSpark.load(sc, readConfig)
print(testRDD.collect)
At the print(testRDD.collect) line, I got this error:
java.lang.NoSuchMethodError:
com.mongodb.internal.connection.Cluster.selectServer(Lcom/mongodb/selector/ServerSelector;)Lcom/mongodb/internal/connection/Server;
followed by more than 10 "at ..." stack-trace lines.
Used libraries:
org.mongodb.spark:mongo-spark-connector_2.12:3.0.1
org.mongodb.scala:mongo-scala-driver_2.12:4.2.3
Is this a problem with MongoDB's internal libraries, and how could I fix it?
Many thanks
I suspect that there is a conflict between mongo-spark-connector and mongo-scala-driver. The former uses Mongo Java driver 4.0.5, but the latter is based on version 4.2.3. I would recommend trying with only mongo-spark-connector, as sketched below.
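A minimal build sketch of that suggestion, assuming sbt and the connector coordinates from the question (the connector pulls in a compatible Mongo Java driver transitively):
// build.sbt: declare only the Spark connector and drop mongo-scala-driver,
// so its 4.2.x driver cannot shadow the driver bundled with the connector
libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "3.0.1"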
I was also facing the same problem. I solved it using the mongo-spark-connector_2.12:3.0.1 jar, and with that I also added the Scalaj HTTP 2.4.2 jar. It's working fine now.

How to install kafka module in pyspark

The problem I have when importing KafkaUtils is:
No module named 'pyspark.streaming.kafka'
But I don't know how to install the kafka module.
I use Python 3.6.8, Spark 2.2.0 and kafka_2.12-2.5.0.
As it turns out, KafkaUtils is being deprecated and replaced with Spark Structured Streaming. This means you have two paths forward:
Redesign your application to use Structured Streaming instead (see https://spark.apache.org/docs/2.2.0/structured-streaming-kafka-integration.html for a primer, and the sketch after this list).
Downgrade your version of Spark to a version that still includes KafkaUtils as part of the distribution (you'll find that KafkaUtils won't need to be installed separately).
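A minimal Scala sketch of the Structured Streaming route (the PySpark API is analogous); the broker address and topic name are placeholders, and the spark-sql-kafka-0-10 package must be on the classpath:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("KafkaStructuredStreaming").getOrCreate()
// Subscribe to a Kafka topic as a streaming DataFrame
val kafkaDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "my_topic")
  .load()
// Kafka keys and values arrive as binary; cast them to strings
val messages = kafkaDf.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
// Write to the console sink for a quick check
messages.writeStream
  .format("console")
  .start()
  .awaitTermination()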

Why does from_json fail with “not found : value from_json"? (2)

I have already read the answers to the existing question about this on SO. None of those fixes solve my problem.
I am unable to call the function "from_json".
I already had the following in my code:
import org.apache.spark.sql.functions._
I also tried adding:
import org.apache.spark.sql.Column
I am running Scala/Spark through Eclipse. Scala Version 2.11.11, Spark Version 2.0.0.
Any ideas?
The from_json function isn't available in Spark 2.0.
It is available from Spark 2.1 onwards.
The release notes of Spark 2.1 mention the addition of from_json functionality.
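For reference, a minimal sketch of calling from_json on Spark 2.1+, assuming a DataFrame df with a JSON string column named json_col (both hypothetical):
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
// Schema of the JSON payload (example fields)
val schema = StructType(Seq(
  StructField("id", IntegerType),
  StructField("name", StringType)
))
// Parse the JSON string column into a struct column
val parsed = df.withColumn("parsed", from_json(df("json_col"), schema))
parsed.select("parsed.id", "parsed.name").show()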

SparkContext cannot be initialized in 'yarn-client' mode called from Scala-IDE

I have installed the Cloudera VM (single node), and inside this VM I have Spark running on top of YARN. I would like to use the Eclipse IDE (with the Scala plugin) for testing/learning with Spark.
If I instantiate SparkContext as follows, everything works as I expect:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
val sparkConf = new SparkConf().setAppName("TwitterPopularTags").setMaster("local[2]")
val sc = new SparkContext(sparkConf)
However, if I now want to connect to the local cluster by changing the master to 'yarn-client', it does not work:
val master = "yarn-client"
val sparkConf = new SparkConf().setAppName("TwitterPopularTags").setMaster(master)
Specifically, I'm getting the following errors:
Error details displayed in the Eclipse console:
Error details from the NodeManager logs:
Here are the things I have tried so far:
1. Dependencies
I added all the dependencies through the Maven repository.
The Cloudera version is 5.5, the corresponding Hadoop version is 2.6.0, and the Spark version is 1.5.0.
2. Configurations
I added 3 path variables to the Eclipse classpath:
SPARK_CONF_DIR=/etc/spark/conf/
HADOOP_CONF_DIR=/usr/lib/hadoop/
YARN_CONF_DIR=/etc/hadoop/conf.cloudera.yarn/
Can anybody clarify what the problem is here and how to solve it?
I worked around it! I still don't understand what the exact problem is, but I created a folder with my username in HDFS, i.e. the /user/myusername directory, and it worked. Anyway, I have now switched to the Hortonworks distribution, and I found it much smoother to get started with than the Cloudera distribution.
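For anyone hitting the same thing, a minimal Scala sketch of creating that HDFS home directory programmatically (the equivalent of hdfs dfs -mkdir -p /user/myusername), assuming the Hadoop configuration directories above are on the classpath and myusername is a placeholder:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
// Picks up the cluster settings from the Hadoop configuration on the classpath
val hadoopConf = new Configuration()
val fs = FileSystem.get(hadoopConf)
// Create the user's home directory in HDFS (no-op if it already exists)
fs.mkdirs(new Path("/user/myusername"))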