Why does from_json fail with "not found: value from_json"? (2) - scala

I have already read the answers to this question on SO; none of those fixes solve my problem.
I am unable to call the function "from_json".
I already had the following in my code:
import org.apache.spark.sql.functions._
I also tried adding:
import org.apache.spark.sql.Column
I am running Scala/Spark through Eclipse. Scala Version 2.11.11, Spark Version 2.0.0.
Any ideas?

The from_json function isn't available in Spark 2.0.
It is available from Spark 2.1 onward.
The Spark 2.1 release notes mention the addition of from_json.
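For reference, once you are on Spark 2.1 or later the call looks like this; a minimal sketch with made-up data and schema:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val spark = SparkSession.builder().appName("from_json-example").master("local[*]").getOrCreate()
import spark.implicits._

// A made-up DataFrame with one JSON string column
val df = Seq("""{"id": 1, "name": "alice"}""").toDF("value")

// Schema of the JSON payload
val schema = new StructType()
  .add("id", IntegerType)
  .add("name", StringType)

// from_json (Spark 2.1+) parses the string column into a struct column
val parsed = df.withColumn("json", from_json(col("value"), schema))
parsed.select("json.id", "json.name").show()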

Related

How to install kafka module in pyspark

The problem I have when importing KafkaUtils is:
No module named 'pyspark.streaming.kafka'
But I don't know how to install the kafka module.
I use Python 3.6.8, Spark 2.2.0, and kafka_2.12-2.5.0.
As it turns out, KafkaUtils is being deprecated and replaced with Spark Structured Streaming. Which means you have two paths forward:
Redesign your application to use Structured Streaming instead (see https://spark.apache.org/docs/2.2.0/structured-streaming-kafka-integration.html for a primer; a short Scala sketch follows after this list).
Downgrade your version of Spark to a version that still includes KafkaUtils as part of the distribution (you'll find that KafkaUtils won't need to be installed separately).
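For the first option, here is a minimal sketch of reading from Kafka with Structured Streaming, shown in Scala to match the rest of this page (the PySpark API mirrors it). The broker address and topic name are assumptions, and the spark-sql-kafka-0-10 package must be on the classpath:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("kafka-structured-streaming")
  .getOrCreate()

// Subscribe to a hypothetical topic "events" on a local broker
val kafkaDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()

// Kafka delivers key/value as binary; cast the value to a string
val values = kafkaDf.selectExpr("CAST(value AS STRING)")

// Write the stream to the console for inspection
val query = values.writeStream
  .format("console")
  .start()

query.awaitTermination()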

Upgrading Play to 2.4, Slick to 3.1.1, value withTransaction is not a member of play.api.db.slick.Database

I am trying to upgrade my application from using Play 2.3.x to Play 2.4.x (will end at 2.6, but going one step at a time) and Slick from 2.1.0 to 3.1.1.
I have done my best to follow Play's migration guide, the Play Slick migration guide, and the Slick upgrade guides.
One of the problems I'm having right now is with the following line:
val db: slick.Database = play.api.db.slick.DB
This no longer seems to be the correct way to do this, because I get errors like:
value withTransaction is not a member of play.api.db.slick.Database
From the Play Slick migration guide, it seems like I should modify this to something like:
val dbConfig = DatabaseConfigProvider.get[JdbcProfile](Play.current)
I don't know if I just don't have the right imports or what, but I get errors like:
object driver is not a member of package play.api.db.slick
not found: value DatabaseConfigProvider
For more context, here is one of the files I'm working with that gives this error: https://github.com/ProjectSidewalk/SidewalkWebpage/blob/2c48dfa2e34c691e40568bfa9d50493aa3fe9971/app/models/attribute/GlobalAttributeTable.scala
Anyone know what I missed among these migration guides?
Thank you in advance!
Turns out that I was missing a few things:
I had not realized that I needed to use a more recent version of the play-slick library (I was still using 0.8.0 instead of 1.1.1).
I needed to add the import play.api.Play instead of the import play.api.Play.current that I already had.
I had an import of play.api.db.slick that was causing the "object driver is not a member of package play.api.db.slick" error at the line with the import slick.driver.JdbcProfile; I just removed the former import, which was not needed.
As @Valerii said, withTransaction has been removed in Slick 3.1, and the replacement is documented in the various links in the comments above.
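For anyone hitting the same wall, here is a minimal sketch of the Slick 3.1 / play-slick 1.1.x pattern that replaces withTransaction; the Users table and the inserted row are made up just to have something to run an action against:
import play.api.Play
import play.api.db.slick.DatabaseConfigProvider
import slick.driver.JdbcProfile

// Resolve the Slick database config the play-slick 1.1.x way (Play.current is deprecated but works on Play 2.4)
val dbConfig = DatabaseConfigProvider.get[JdbcProfile](Play.current)
import dbConfig.driver.api._
val db = dbConfig.db

// Hypothetical table definition, for illustration only
class Users(tag: Tag) extends Table[(Int, String)](tag, "users") {
  def id = column[Int]("id", O.PrimaryKey)
  def name = column[String]("name")
  def * = (id, name)
}
val users = TableQuery[Users]

// Slick 3 replaces withTransaction with composed DBIO actions plus .transactionally
val action = (for {
  _    <- users += ((1, "alice"))
  rows <- users.result
} yield rows).transactionally

val futureRows = db.run(action)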

load pmml (generated by sklearn) in spark to predict but get error

I am following the jpmml-evaluator-spark instructions to load a local PMML model.
My code is like below:
import java.io.File
import org.jpmml.evaluator.spark._
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql._
// load pmml
val pmmlFile = new File(getClass.getClassLoader.getResource("random_forest.pmml").getFile)
// create evaluator
val evaluator = EvaluatorUtil.createEvaluator(pmmlFile)
I cannot show the error message directly, so I put it here
Guesses:
There are some reasons I think may cause this problem:
1. "jpmml-evaluator-spark" does not support PMML 4.3, even though the author said the new version 1.1.0 already supports PMML 4.3.
2. There is some problem with my "random_forest.pmml", because this file came from someone else.
Note:
Development environment:
Spark 2.1.1
Scala 2.11.8
I run locally on a Mac; the system version is OS X El Capitan 10.11.6.
You are using Apache Spark 2.0, 2.1 or 2.2, which has prepended a legacy version of the JPMML-Model library (1.2.15, to be precise) to your application classpath. This issue is documented in SPARK-15526.
Solution - fix your application classpath as described in JPMML-Evaluator-Spark documentation (alternatively, consider switching to Apache Spark 2.3.0 or newer).
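To confirm the conflict yourself, one generic JVM check (an assumption here is that org.dmg.pmml.PMML is the JPMML-Model class in question) is to print where that class is actually loaded from; if the location points into the Spark distribution rather than your own dependency, you are hitting SPARK-15526:
// Prints the jar that provides org.dmg.pmml.PMML at runtime
val location = classOf[org.dmg.pmml.PMML]
  .getProtectionDomain
  .getCodeSource
  .getLocation
println(location)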
Another option for using PMML in Spark is PMML4S-Spark, which supports the latest PMML 4.4, for example:
import org.pmml4s.spark.ScoreModel
val model = ScoreModel.fromFile(pmmlFile)
val scoreDf = model.transform(df)

How to Solve "Cannot Import Name UNIX_TIMESTAMP" in PySpark?

Spark version 1.3.0
Python Version: 2.7.8
I am trying to import a function with:
from pyspark.sql.functions import unix_timestamp
However, it gives me an error:
ImportError: cannot import name SparkSession
How can I solve this?
Your pyspark Python library is incompatible with the Spark version that you are using (version == 1.3.0); SparkSession was introduced in 2.0.0.
Try updating Spark to the latest version, 2.3.0.
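Once the pyspark library and the Spark installation agree on a 2.x version, the import resolves; for completeness, a minimal sketch of using unix_timestamp, shown in Scala to match the rest of this page, with a made-up timestamp column:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.unix_timestamp

val spark = SparkSession.builder().appName("ts-example").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical string timestamps converted to epoch seconds
val df = Seq("2018-01-01 12:00:00").toDF("ts")
df.select(unix_timestamp($"ts", "yyyy-MM-dd HH:mm:ss").as("epoch")).show()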

object SparkSession is not a member of package org.apache.spark.sql [duplicate]

This question already has answers here:
Resolving dependency problems in Apache Spark
(7 answers)
I am trying to use the latest Spark API with SparkSession.
While I am importing the package, my Eclipse is showing an error (shown in the attachment).
I am using the 2.10.6 Scala compiler.
Please help me resolve this issue.
The version specified in your Maven config is too old. SparkSession was introduced in Spark 2.0. You need to use Spark 2.0.0 or later to import it. The answers to the question below may help you configure the details:
What is version library spark supported SparkSession
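Once the spark-sql dependency in your build points at a 2.x release built for your Scala version, the import resolves and you can create a session; a minimal sketch:
import org.apache.spark.sql.SparkSession

// SparkSession is the Spark 2.x entry point that replaces SQLContext/HiveContext
val spark = SparkSession.builder()
  .appName("example")
  .master("local[*]")
  .getOrCreate()

println(spark.version)  // should print a 2.x version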