How to fix "error: encountered unrecoverable cycle resolving import"? - scala

How to resolve the following compile error?
SOApp.scala:7: error: encountered unrecoverable cycle resolving import.
Note: this is often due in part to a class depending on a definition nested within its companion.
If applicable, you may wish to try moving some members into another object.
import spark.implicits._
Code:
object SOApp extends App with Logging {
  // For implicit conversions like converting RDDs to DataFrames
  import spark.implicits._
  import org.apache.spark.sql.SparkSession

  val spark = SparkSession
    .builder()
    .appName("Stackoverflow App")
    .master("local[*]")
    .getOrCreate()
}

tl;dr Move import spark.implicits._ after val spark = SparkSession...getOrCreate().
The name spark causes a lot of confusion, since it could refer to the org.apache.spark package as well as to the spark value.
Unlike Java, Scala allows for import statements in many more places.
What you could consider a Spark SQL idiom is to create a spark value that gives access to the implicits. In Scala, you can only bring implicits into scope from a stable identifier (like a val), so the following is correct:
// For implicit conversions like converting RDDs to DataFrames
import spark.implicits._
And, as your comment says, it is there to bring in the implicit conversions of RDDs to DataFrames (among other things).
It does not import the org.apache.spark package; it brings in the implicit conversions.
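A minimal sketch of the fix (the Logging trait from the question is dropped here since it isn't relevant): define the spark value first, then import its implicits.

import org.apache.spark.sql.SparkSession

object SOApp extends App {
  val spark = SparkSession
    .builder()
    .appName("Stackoverflow App")
    .master("local[*]")
    .getOrCreate()

  // For implicit conversions like converting RDDs to DataFrames
  import spark.implicits._
}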

Related

sparkpb UDF compile giving "error: could not find implicit value for evidence parameter of type frameless.TypedEncoder[Array[Byte]]"

I'm a Scala newbie, using PySpark extensively (on Databricks, FWIW). I'm finding that Protobuf deserialization is too slow for me in Python, so I'm porting my deserialization UDF to Scala.
I've compiled my .proto files to Scala and then a JAR using scalapb as described here.
When I try to use these instructions to create a UDF like this:
import gnmi.gnmi._
import org.apache.spark.sql.{Dataset, DataFrame, functions => F}
import spark.implicits.StringToColumn
import scalapb.spark.ProtoSQL
// import scalapb.spark.ProtoSQL.implicits._
import scalapb.spark.Implicits._
val deserialize_proto_udf = ProtoSQL.udf { bytes: Array[Byte] => SubscribeResponse.parseFrom(bytes) }
I get the following error:
command-4409173194576223:9: error: could not find implicit value for evidence parameter of type frameless.TypedEncoder[Array[Byte]]
val deserialize_proto_udf = ProtoSQL.udf { bytes: Array[Byte] => SubscribeResponse.parseFrom(bytes) }
I've double-checked that I'm importing the correct implicits, to no avail. I'm pretty fuzzy on implicits, evidence parameters, and Scala in general.
I would really appreciate it if someone would point me in the right direction. I don't even know how to start diagnosing!
Update
It seems like frameless doesn't include an implicit encoder for Array[Byte]???
This works:
frameless.TypedEncoder[Byte]
this does not:
frameless.TypedEncoder[Array[Byte]]
The code for frameless.TypedEncoder seems to include a generic Array encoder, but I'm not sure I'm reading it correctly.
@Dymtro, thanks for the suggestion. That helped.
Does anyone have ideas about what is going on here?
Update
OK, progress: this looks like a Databricks issue. I think that the notebook does something like the following on startup:
import spark.implicits._
I'm using scalapb, which requires that you don't do that.
I'm hunting for a way to disable that automatic import now, or to "unimport" or "shadow" those implicits after they get imported.
If spark.implicits._ is already imported, then a way to "unimport" it (hide or shadow those implicits) is to create a duplicate object and import it too:
import org.apache.spark.sql.{SQLContext, SQLImplicits}

object implicitShadowing extends SQLImplicits with Serializable {
  protected override def _sqlContext: SQLContext = ???
}
import implicitShadowing._
Testing for case class Person(id: Long, name: String)
// no import
List(Person(1, "a")).toDS() // doesn't compile, value toDS is not a member of List[Person]
import spark.implicits._
List(Person(1, "a")).toDS() // compiles
import spark.implicits._
import implicitShadowing._
List(Person(1, "a")).toDS() // doesn't compile, value toDS is not a member of List[Person]
How to override an implicit value?
Wildcard Import, then Hide Particular Implicit?
How to override an implicit value, that is imported?
How can an implicit be unimported from the Scala repl?
Not able to hide Scala Class from Import
NullPointerException on implicit resolution
Constructing an overridable implicit
Caching the circe implicitly resolved Encoder/Decoder instances
Scala implicit def do not work if the def name is toString
Is there a workaround for this format parameter in Scala?
Please check whether this helps.
A possible problem is that you don't want just to unimport spark.implicits._ (scalapb.spark.Implicits._); you probably want to import scalapb.spark.ProtoSQL.implicits._ too. And I don't know whether implicitShadowing._ shadows some of them as well.
Another possible workaround is to resolve implicits manually and use them explicitly.
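For example, here is a minimal sketch of that idea with plain Spark encoders (Person, toDataset and the sample data are illustrative, not from the question): the Encoder is resolved once by hand and passed explicitly, so no wildcard implicit import is needed.

import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

case class Person(id: Long, name: String)

// Resolve the encoder explicitly instead of relying on import spark.implicits._
val personEncoder: Encoder[Person] = Encoders.product[Person]

def toDataset(people: Seq[Person], spark: SparkSession): Dataset[Person] =
  spark.createDataset(people)(personEncoder)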

Using Apache Spark's Time class/type

Note: I am using Spark 2.2.0. I am getting an error when trying to run my Scala code from my Zeppelin notebook
%spark
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Time, Seconds, StreamingContext}
...
...
case class Record(time: Time, topic: String, count: Integer)
...
...
import org.apache.spark.streaming.{Time, Seconds, StreamingContext} should allow me to use Time
When I try to run the paragraph/block in Zeppelin, I am getting this error:
<console>:12: error: not found: type Time
case class Record(time: Time, topic: String, count: Integer)
What could the issue be? Is Time deprecated or something in Spark 2? Any alternative to Time?
In general, when using Spark SQL or Spark Structured Streaming, I can recommend sticking to java.sql.Timestamp and java.sql.Date: they are fully integrated with the ecosystem, meaning you won't need custom serializers, and there are very nice built-in functions (look them up in the date functions section of the documentation) that you can use.
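A minimal sketch of that recommendation, assuming the record only needs a point in time rather than Spark Streaming's batch Time:

import java.sql.Timestamp

// java.sql.Timestamp is natively supported by Spark SQL encoders,
// so this case class works in Datasets without custom serializers.
case class Record(time: Timestamp, topic: String, count: Int)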

Dataframe methods within SBT project

I have the following code that works in the spark-shell:
df1.withColumn("tags_splitted", split($"tags", ",")).withColumn("tag_exploded", explode($"tags_splitted")).select("id", "tag_exploded").show()
But fails in sbt with the following errors:
not found: value split
not found: value explode
My scala code has the following
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("Books").getOrCreate()
import spark.implicits._
Can someone give me a pointer to what is wrong in the sbt environment?
Thanks
The split and explode functions are available in the org.apache.spark.sql.functions object.
So you need to import both:
import org.apache.spark.sql.functions.split
import org.apache.spark.sql.functions.explode
or simply:
import org.apache.spark.sql.functions._
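Putting it together, a minimal sketch of the fixed sbt code (df1 and its sample data are illustrative, and .master("local[*]") is added only so the sketch runs standalone):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, split}

val spark = SparkSession.builder().appName("Books").master("local[*]").getOrCreate()
import spark.implicits._

val df1 = Seq((1, "scala,spark"), (2, "sbt")).toDF("id", "tags")

df1.withColumn("tags_splitted", split($"tags", ","))
  .withColumn("tag_exploded", explode($"tags_splitted"))
  .select("id", "tag_exploded")
  .show()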
Hope this helps!

How to make SparkSession and Spark SQL implicits globally available (in functions and objects)?

I have a project with many .scala files inside a package. I want to use Spark SQL as follows:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
val spark: SparkSession = SparkSession.builder()
.appName("My app")
.config("spark.master", "local")
.getOrCreate()
// For implicit conversions like converting RDDs to DataFrames
import spark.implicits._
Is it a good practice to wrap the above code inside a singleton object like:
object sparkSessX {
  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions._

  val spark: SparkSession = SparkSession.builder()
    .appName("My App")
    .config("spark.master", "local")
    .getOrCreate()

  // For implicit conversions like converting RDDs to DataFrames
  import spark.implicits._
}
and every class to extend or import that object?
I've never seen it before, but the more Scala developers use Spark the more we see new design patterns emerge. That could be one.
I think you could instead consider making val spark implicit and passing it around where needed as an implicit parameter (in the second parameter list of your functions).
I'd however consider making the object a trait (you can't extend a Scala object), which moreover leaves room for your classes to mix in other traits.
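A minimal sketch of both suggestions (SparkSessionWrapper, WordJob and toDataset are illustrative names, not from the question):

import org.apache.spark.sql.{Dataset, SparkSession}

// A trait your classes can mix in instead of extending a singleton object.
trait SparkSessionWrapper {
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("My App")
    .config("spark.master", "local")
    .getOrCreate()
}

object WordJob extends SparkSessionWrapper {

  // The session travels as an implicit parameter in the second parameter list,
  // and its implicits are imported exactly where they are needed.
  def toDataset(words: Seq[String])(implicit ss: SparkSession): Dataset[String] = {
    import ss.implicits._
    words.toDS()
  }

  def run(): Unit = {
    implicit val ss: SparkSession = spark
    toDataset(Seq("a", "b", "c")).show()
  }
}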

SQLContext implicits

I am learning Spark and Scala. I am well versed in Java, but not so much in Scala. I am going through a tutorial on Spark, and came across the following line of code, which has not been explained:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
(sc is the SparkContext instance)
I know the concepts behind Scala implicits (at least I think I do). Could somebody explain to me what exactly is meant by the import statement above? What implicits are bound to the sqlContext instance when it is instantiated, and how? Are these implicits defined inside the SQLContext class?
EDIT
The following seems to work for me as well (fresh code):
val sqlc = new SQLContext(sc)
import sqlContext.implicits._
In the code just above, what exactly is sqlContext and where is it defined?
From ScalaDoc:
sqlContext.implicits contains "(Scala-specific) Implicit methods available in Scala for converting common Scala objects into DataFrames."
And it is also explained in the Spark programming guide:
// this is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._
For example, in the code below .toDF() won't work unless you import sqlContext.implicits._:
// assumes a case class Airport and airportsPath pointing to a CSV file
import scala.io.Source

val airports = sc.makeRDD(Source.fromFile(airportsPath).getLines().drop(1).toSeq, 1)
  .map(s => s.replaceAll("\"", "").split(","))
  .map(a => Airport(a(0), a(1), a(2), a(3), a(4), a(5), a(6)))
  .toDF()
What implicits are bound to the sqlContext instance when it is instantiated and how? Are these implicits defined inside the SQLContext class?
Yes, they are defined in the object implicits inside the SQLContext class, which extends SQLImplicits. It looks like there are two types of implicit conversions defined there:
RDD to DataFrameHolder conversion, which enables using the above-mentioned rdd.toDF().
Various instances of Encoder which are "Used to convert a JVM object of type T to and from the internal Spark SQL representation."
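A minimal sketch of both kinds in action (the sample data is illustrative; SQLContext is used here because the question does, even though SparkSession supersedes it in Spark 2+):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

val sc = new SparkContext("local[*]", "implicits-demo")
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// 1. The RDD conversion enables .toDF() on an RDD
val df = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "name")

// 2. The implicit Encoder instances enable .toDS() on local collections
val ds = Seq(1, 2, 3).toDS()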