I'm trying to write some abstractions in some Spark Scala code, but I'm running into issues when using objects. I'm using Spark's Encoder, which converts case classes to database schemas, as an example here, but I think this question applies to any context bound.
Here is a minimal code example of what I'm trying to do:
package com.sample.myexample
import org.apache.spark.sql.Encoder
import scala.reflect.runtime.universe.TypeTag
case class MySparkSchema(id: String, value: Double)
abstract class MyTrait[T: TypeTag: Encoder]
object MyObject extends MyTrait[MySparkSchema]
Which fails with the following compilation error:
Unable to find encoder for type com.sample.myexample.MySparkSchema. An implicit Encoder[com.sample.myexample.MySparkSchema] is needed to store com.sample.myexample.MySparkSchema instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
I tried defining the implicit evidence in the object like so (the import statement was suggested by IntelliJ, but it looks a bit weird):
import com.sample.myexample.MyObject.encoder
object MyObject extends MyTrait[MySparkSchema] {
implicit val encoder: Encoder[MySparkSchema] = Encoders.product[MySparkSchema]
}
Which fails with the error message
MyTrait.scala:13:25: super constructor cannot be passed a self reference unless parameter is declared by-name
One other thing I tried is to convert the object to a class and provide implicit evidence to the constructor:
class MyObject(implicit evidence: Encoder[MySparkSchema]) extends MyTrait[MySparkSchema]
This compiles and works fine, but at the expense of MyObject now being a class instead.
Question: Is it possible to provide implicit evidence for the context bounds when extending a trait? Or does the implicit evidence force me to make a constructor and use class instead?
Your first error almost gives you the solution: you have to import spark.implicits._ for Product types.
You could do this:
val spark: SparkSession = SparkSession.builder().getOrCreate()
import spark.implicits._
Full Example
package com.sample.myexample
import org.apache.spark.sql.{Encoder, SparkSession}
import scala.reflect.runtime.universe.TypeTag
case class MySparkSchema(id: String, value: Double)
abstract class MyTrait[T: TypeTag: Encoder]
val spark: SparkSession = SparkSession.builder().getOrCreate()
import spark.implicits._
object MyObject extends MyTrait[MySparkSchema]
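Another option (a sketch, relying only on Encoders.product rather than spark.implicits._): the implicit scope for Encoder[MySparkSchema] includes the companion object of MySparkSchema, so you can define the encoder there and the object then extends the trait without needing a SparkSession in scope at the definition site.
import org.apache.spark.sql.{Encoder, Encoders}
import scala.reflect.runtime.universe.TypeTag
case class MySparkSchema(id: String, value: Double)
object MySparkSchema {
  // Encoders.product derives an Encoder for any case class (Product type);
  // placing it in the companion puts it in implicit scope for Encoder[MySparkSchema].
  implicit val encoder: Encoder[MySparkSchema] = Encoders.product[MySparkSchema]
}
abstract class MyTrait[T: TypeTag: Encoder]
object MyObject extends MyTrait[MySparkSchema]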
Related
I'm trying to make a generic implicit provider which can create an implicit value for a given type, something along the lines of:
trait Evidence[T]
class ImplicitProvider[T] {
class Implementation extends Evidence[T]
implicit val evidence: Evidence[T] = new Implementation
}
To use this implicit, I create a val provider = new ImplicitProvider[T] instance where necessary and import its contents with import provider._. This works fine as long as there is just one instance. However, sometimes implicits for several types are needed in one place:
case class A()
case class B()
class Test extends App {
val aProvider = new ImplicitProvider[A]
val bProvider = new ImplicitProvider[B]
import aProvider._
import bProvider._
val a = implicitly[Evidence[A]]
val b = implicitly[Evidence[B]]
}
And this fails to compile with could not find implicit value for parameter and not enough arguments for method implicitly errors.
If I use implicit vals from providers directly, everything starts to work again.
implicit val aEvidence = aProvider.evidence
implicit val bEvidence = bProvider.evidence
However, I'm trying to avoid importing individual values, as there are actually several implicits inside each provider and the goal is to abstract them away if possible.
Can this be achieved somehow or do I want too much from the compiler?
The issue is that when you import from both objects, you bring in two entities with colliding names: evidence in aProvider and evidence in bProvider. The compiler cannot disambiguate them, both because of how it's implemented, and because it would be a bad idea for implicits, which can already be arcane, to be able to do things that cannot be done explicitly (disambiguating between clashing names).
What I don't understand is what the point of ImplicitProvider is. You can pull the Implementation class out to the top level and have an object somewhere that holds the implicit vals.
class Implementation[T] extends Evidence[T]
object Evidence {
implicit val aEvidence: Evidence[A] = new Implementation[A]
implicit val bEvidence: Evidence[B] = new Implementation[B]
}
// Usage:
import Evidence._
implicitly[Evidence[A]]
implicitly[Evidence[B]]
Now, there is no name clash.
If you need to have an actual ImplicitProvider, you can instead do this:
class ImplicitProvider[T] { ... }
object ImplicitProviders {
implicit val aProvider = new ImplicitProvider[A]
implicit val bProvider = new ImplicitProvider[B]
implicit def ImplicitProvider2Evidence[T: ImplicitProvider]: Evidence[T]
= implicitly[ImplicitProvider[T]].evidence
}
// Usage
import ImplicitProviders._
// ...
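A quick usage sketch of this second approach (needsEvidence is just a made-up helper to force resolution): asking for an Evidence[T] triggers ImplicitProvider2Evidence, which in turn finds the implicit ImplicitProvider[T] and pulls the evidence out of it.
import ImplicitProviders._
// Hypothetical helper with an Evidence context bound, only here to show resolution.
def needsEvidence[T: Evidence](value: T): T = value
needsEvidence(A())  // Evidence[A] derived from aProvider via ImplicitProvider2Evidence
needsEvidence(B())  // Evidence[B] derived from bProvider via ImplicitProvider2Evidence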
I'm using ArgonautShapeless to define some JSON codecs.
When I provide the type for my codec I get StackOverflowErrors, but if I leave the type off it works. How can I provide the type?
My understanding of the problem is that the implicit lookup from def of[A: DecodeJson] = implicitly[DecodeJson[A]] finds my definition on the same line, implicit def fooCodec: DecodeJson[Foo], and is thus recursive, so it breaks.
Is there some other way that will allow me to provide the type? Ideally I want to have one object in my project where I define all of the codecs, and they may depend on each other.
import $ivy.`com.github.alexarchambault::argonaut-shapeless_6.2:1.2.0-M4`
import argonaut._, Argonaut._
case class Foo(a: Int)
object SomeCodecs {
import ArgonautShapeless._
// this doesn't work
implicit def fooCodec: DecodeJson[Foo] = DecodeJson.of[Foo]
}
import SomeCodecs._
"""{"a":1}""".decode[Foo]
java.lang.StackOverflowError
ammonite.$sess.cmd3$SomeCodecs$.fooCodec(cmd3.sc:3)
It works if I leave the type off.
object SomeCodecs {
import ArgonautShapeless._
// this works
implicit def fooCodec = DecodeJson.of[Foo]
}
import SomeCodecs._
"""{"a":1}""".decode[Foo]
res4: Either[Either[String, (String, CursorHistory)], Foo] = Right(Foo(1))
Thanks
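For reference, the recursion can be reproduced without Argonaut at all; a minimal sketch of the mechanism, with Codec and Codec.of standing in for DecodeJson and DecodeJson.of:
trait Codec[A]
object Codec {
  // Stand-in for DecodeJson.of: just summons whatever implicit Codec[A] is in scope.
  def of[A](implicit c: Codec[A]): Codec[A] = c
}
case class Foo(a: Int)
object Codecs {
  // Because the result type is written out, fooCodec is already visible as an
  // implicit candidate inside its own body, so Codec.of[Foo] picks fooCodec itself
  // and the call recurses forever: a StackOverflowError at runtime.
  implicit def fooCodec: Codec[Foo] = Codec.of[Foo]
}
Leaving the result type off hides the definition from its own body (an implicit whose type must be inferred is only visible after its definition), so the lookup falls through to the derived instance instead, which matches the behaviour above.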
I want to create a Scala trait that should be implemented with a case class T. The trait simply loads data and transforms it into a Spark Dataset of type T. I get the error that no encoder can be found, which I think is because Scala does not know that T should be a case class. How can I tell the compiler that? I've seen somewhere that I should mention Product, but there is no such class defined. Feel free to suggest other ways to do this!
I have the following code but it is not compiling with the error: 42: error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._
[INFO] .as[T]
I'm using Spark 1.6.1
Code:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Dataset, SQLContext}
/**
* A trait that moves data on Hadoop with Spark based on the location and the granularity of the data.
*/
trait Agent[T] {
/**
* Load a DataFrame from the location and convert it into a Dataset
* @return Dataset[T]
*/
protected def load(): Dataset[T] = {
// Read in the data
SparkContextKeeper.sqlContext.read
.format("com.databricks.spark.csv")
.load("/myfolder/" + location + "/2016/10/01/")
.as[T]
}
}
Your code is missing 3 things:
Indeed, you must let the compiler know that T is a subclass of Product (the supertype of all Scala case classes and tuples)
The compiler also requires the TypeTag and ClassTag of the actual case class. These are used implicitly by Spark to overcome type erasure
An import of sqlContext.implicits._
Unfortunately, you can't use context bounds on a trait's type parameters (they desugar into implicit constructor parameters, which traits can't have), so the simplest workaround would be to use an abstract class instead:
import scala.reflect.runtime.universe.TypeTag
import scala.reflect.ClassTag
abstract class Agent[T <: Product : ClassTag : TypeTag] {
protected def load(): Dataset[T] = {
val sqlContext: SQLContext = SparkContextKeeper.sqlContext
import sqlContext.implicits._
sqlContext.read.// same...
}
}
Obviously, this isn't equivalent to using a trait, and might suggest that this design isn't the best fit for the job. Another alternative is placing load in an object and moving the type parameter to the method:
object Agent {
protected def load[T <: Product : ClassTag : TypeTag](): Dataset[T] = {
// same...
}
}
Which one is preferable is mostly up to where and how you're going to call load and what you're planning to do with the result.
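For illustration, a hedged sketch of a concrete agent using the abstract-class version (MyRow is a made-up case class matching whatever the CSV contains; load stays protected, so it is exposed here through a small public wrapper):
case class MyRow(a: String, b: String)
object MyRowAgent extends Agent[MyRow] {
  // MyRow is a case class, so it is a Product and the compiler supplies its
  // ClassTag and TypeTag automatically, satisfying the context bounds.
  def loadAll(): Dataset[MyRow] = load()
}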
You need to take two actions:
Add import sparkSession.implicits._ in your imports
Make your trait trait Agent[T <: Product]
Linked from this question
I came across Slick's documentation and found it mandates a def * method in the definition of a table to get a mapped projection.
So the line looks like this
def * = (name, id.?).<>(User.tupled,User.unapply)
Slick example here
I see the <> method is invoked on a tuple - in this case a Tuple2. The method is defined on the case class ShapedValue in Slick's code. How do I find out the implicit method that is doing the lookup?
Here are my imports:
import scala.concurrent.Await
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import slick.driver.H2Driver.api._
import slick.lifted.ShapedValue
import slick.lifted.ProvenShape
So I figured that one out for myself.
The object Shape implements three traits, namely ConstColumnShapeImplicits, AbstractTableShapeImplicits and TupleShapeImplicits. These three traits handle the implicit conversions concerning Shapes in Slick.
The TupleShapeImplicits houses all implicit conversion methods required to convert a Tuple to a TupleShape.
Now, in the line (name, id.?, salary.?).<>(User.tupled, User.unapply), what is happening is that the method <> has an implicit parameter of type Shape.
The Shape object thus comes into scope for the implicit conversion, and TupleShapeImplicits comes into scope as well.
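If you want to see exactly which conversion the compiler inserted, a general (not Slick-specific) trick is to print the typed trees:
// Print the trees after the typer phase, with implicits made explicit
// (UserTable.scala is a hypothetical file containing the table definition):
//   scalac -Xprint:typer UserTable.scala
// or, inside the Scala 2 REPL:
//   :settings -Xprint:typer
// The output shows the implicit conversion wrapped around (name, id.?) and the
// Shape argument passed explicitly to <>.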
I have the following code which uses spray-json to deserialise some JSON into a case class, via the parseJson method.
Depending on where the implicit JsonFormat[MyCaseClass] is defined (in-line or imported from companion object), and whether there is an explicit type provided when it is defined, the code may not compile.
I don't understand why importing the implicit from the companion object requires it to have an explicit type when it is defined, but if I put it inline, this is not the case?
Interestingly, IntelliJ correctly locates the implicit parameters (via cmd-shift-p) in all cases.
I'm using Scala 2.11.7.
Broken Code - Wildcard import from companion object, inferred type:
import SampleApp._
import spray.json._
class SampleApp {
import MyJsonProtocol._
val inputJson = """{"children":["a", "b", "c"]}"""
println(s"Deserialise: ${inputJson.parseJson.convertTo[MyCaseClass]}")
}
object SampleApp {
case class MyCaseClass(children: List[String])
object MyJsonProtocol extends DefaultJsonProtocol {
implicit val myCaseClassSchemaFormat = jsonFormat1(MyCaseClass)
}
}
Results in:
Cannot find JsonReader or JsonFormat type class for SampleAppObject.MyCaseClass
Note that the same thing happens with an explicit import of the myCaseClassSchemaFormat implicit.
Working Code #1 - Wildcard import from companion object, explicit type:
Adding an explicit type to the JsonFormat in the companion object causes the code to compile:
import SampleApp._
import spray.json._
class SampleApp {
import MyJsonProtocol._
val inputJson = """{"children":["a", "b", "c"]}"""
println(s"Deserialise: ${inputJson.parseJson.convertTo[MyCaseClass]}")
}
object SampleApp {
case class MyCaseClass(children: List[String])
object MyJsonProtocol extends DefaultJsonProtocol {
//Explicit type added here now
implicit val myCaseClassSchemaFormat: JsonFormat[MyCaseClass] = jsonFormat1(MyCaseClass)
}
}
Working Code #2 - Implicits inline, inferred type:
However, putting the implicit parameters in-line where they are used, without the explicit type, also works!
import SampleApp._
import spray.json._
class SampleApp {
import DefaultJsonProtocol._
//Now in-line custom JsonFormat rather than imported
implicit val myCaseClassSchemaFormat = jsonFormat1(MyCaseClass)
val inputJson = """{"children":["a", "b", "c"]}"""
println(s"Deserialise: ${inputJson.parseJson.convertTo[MyCaseClass]}")
}
object SampleApp {
case class MyCaseClass(children: List[String])
}
After searching for the error message Huw mentioned in his comment, I was able to find this StackOverflow question from 2010: Why does this explicit call of a Scala method allow it to be implicitly resolved?
This led me to this Scala issue created in 2008, and closed in 2011: https://issues.scala-lang.org/browse/SI-801 ('require explicit result type for implicit conversions?')
Martin stated:
I have implemented a slightly more permissive rule: An implicit conversion without explicit result type is visible only in the text following its own definition. That way, we avoid the cyclic reference errors. I close for now, to see how this works. If we still have issues we might come back to this.
This holds - if I re-order the breaking code so that the companion object is declared first, then the code compiles. (It's still a little weird!)
(I suspect I don't see the 'implicit method is not applicable here' message because I have an implicit value rather than a conversion - though I'm assuming here that the root cause is the same as the above).
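For reference, a sketch of the re-ordered version described above, with the companion object (and its inferred-type implicit) now textually before the class that uses it:
import SampleApp._
import spray.json._
object SampleApp {
  case class MyCaseClass(children: List[String])
  object MyJsonProtocol extends DefaultJsonProtocol {
    // The inferred type is fine here because every use site now follows this definition.
    implicit val myCaseClassSchemaFormat = jsonFormat1(MyCaseClass)
  }
}
class SampleApp {
  import MyJsonProtocol._
  val inputJson = """{"children":["a", "b", "c"]}"""
  println(s"Deserialise: ${inputJson.parseJson.convertTo[MyCaseClass]}")
}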