Strange behavior with implicits - scala

I'm using the ScalaCheck library to test my application. In that library there's a Gen object that defines implicit conversions from any value to a generator of values of that type.
E.g., importing Gen._ lets you call methods such as sample on any object, through its implicit conversion to Gen:
scala> import org.scalacheck.Gen._
import org.scalacheck.Gen._
scala> "foo" sample
res1: Option[java.lang.String] = Some(foo)
In this example, the implicit Gen.value() is applied to "foo", yielding a generator whose sample always returns Some(foo).
But this doesn't work:
scala> import org.scalacheck.Gen.value
import org.scalacheck.Gen.value
scala> "foo" sample
<console>:5: error: value sample is not a member of java.lang.String
"foo" sample
^
Why not?
Update
I'm using Scala 2.7.7final and ScalaCheck 2.7.7-1.6.
Update
Just switched to Scala 2.8.0.final with ScalaCheck 2.8.0-1.7. The problem did indeed go away.

I just tried this with Scala 2.8.0.final and ScalaCheck 1.7 built for the same. Both imports worked, meaning the second line produced the desired result for both imports:
scala> "foo" sample
res1: Option[java.lang.String] = Some(foo)
What version of Scala and ScalaCheck did you use?

Simple: you did not import the implicit conversion (whatever its name is); you only imported something called value from object org.scalacheck.Gen.
Correction / clarification:
Gen.value (that's object Gen, not trait Gen[+T]) is the implicit used to wrap an arbitrary value in an instance of (an anonymous class implementing) trait Gen[T], where T is the type of the argument to which Gen.value is applied; internally the generator is essentially a function from Gen.Params to that value. Gen.sample is a method of trait Gen[T] that invokes its (the concrete Gen subclass's) apply method to get the synthesized value.
Sadly, having looked closer, I have to admit I don't understand why the code doesn't work when the rest of the members of object Gen are not imported.
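For illustration, this is roughly how enrichment via an implicit conversion works in general (the names below are illustrative, not ScalaCheck's actual code); the conversion has to be visible as an implicit for "foo".sample to compile:
object MyGen {
  // A tiny stand-in for Gen[T]: sample just returns the wrapped value
  class Gen[T](x: T) {
    def sample: Option[T] = Some(x)
  }
  // The implicit conversion that wraps any value in a Gen
  implicit def value[T](x: T): Gen[T] = new Gen(x)
}

import MyGen._   // or: import MyGen.value
"foo".sample     // Option[String] = Some(foo)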

Related

Scala: type hint for lambda

I'm brushing up on my Scala. This looks very simple to me, but I can't get it to run:
import java.nio.file.{FileSystems, Files}
object ScalaHello extends App {
  val dir = FileSystems.getDefault.getPath("/home/intelli/workspace")
  Files.walk(dir).map(_.toFile).forEach(println)
}
It throws an error at the mapping lambda:
argument expression's type is not compatible with formal parameter type;
found : java.util.function.Function[java.nio.file.Path,java.io.File]
required: java.util.function.Function[_ >: java.nio.file.Path, _ <: ?R]
I suspect it has something to do with providing type hints for the lambda, but I can't find anything searching Google. Any help is much appreciated.
Note that Files.walk returns a Java Stream, so map and forEach come from Java.
Assuming you are using Scala 2.12, your code will work if you either:
Update Scala version to 2.13 (no need to make any other changes in this case)
Specify the return type of map explicitly:
Files.walk(dir).map[java.io.File](_.toFile).forEach(println)
Convert to Scala collections before calling map:
import scala.collection.JavaConverters._
Files.walk(dir).iterator().asScala.map(_.toFile).foreach(println)
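For reference, a complete version of the last option, assembled from the code in the question (Scala 2.12, using scala.collection.JavaConverters):
import java.nio.file.{FileSystems, Files}
import scala.collection.JavaConverters._

object ScalaHello extends App {
  val dir = FileSystems.getDefault.getPath("/home/intelli/workspace")
  // iterator().asScala turns the Java Stream's iterator into a Scala Iterator,
  // so map and foreach here are the usual Scala collection methods
  Files.walk(dir).iterator().asScala.map(_.toFile).foreach(println)
}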

sparkpb UDF compile giving "error: could not find implicit value for evidence parameter of type frameless.TypedEncoder[Array[Byte]]"

I'm a Scala newbie, using PySpark extensively (on Databricks, FWIW). I'm finding that Protobuf deserialization is too slow for me in Python, so I'm porting my deserialization UDF to Scala.
I've compiled my .proto files to Scala and then to a JAR using scalapb as described here
When I try to use these instructions to create a UDF like this:
import gnmi.gnmi._
import org.apache.spark.sql.{Dataset, DataFrame, functions => F}
import spark.implicits.StringToColumn
import scalapb.spark.ProtoSQL
// import scalapb.spark.ProtoSQL.implicits._
import scalapb.spark.Implicits._
val deserialize_proto_udf = ProtoSQL.udf { bytes: Array[Byte] => SubscribeResponse.parseFrom(bytes) }
I get the following error:
command-4409173194576223:9: error: could not find implicit value for evidence parameter of type frameless.TypedEncoder[Array[Byte]]
val deserialize_proto_udf = ProtoSQL.udf { bytes: Array[Byte] => SubscribeResponse.parseFrom(bytes) }
I've double checked that I'm importing the correct implicits, to no avail. I'm pretty fuzzy on implicits, evidence parameters and scala in general.
I would really appreciate it if someone would point me in the right direction. I don't even know how to start diagnosing!!!
Update
It seems like frameless doesn't include an implicit encoder for Array[Byte]???
This works:
frameless.TypedEncoder[Byte]
this does not:
frameless.TypedEncoder[Array[Byte]]
The code for frameless.TypedEncoder seems to include a generic Array encoder, but I'm not sure I'm reading it correctly.
@Dymtro, thanks for the suggestion. That helped.
Does anyone have ideas about what is going on here?
Update
Ok, progress - this looks like a Databricks issue. I think that the notebook does something like the following on startup:
import spark.implicits._
I'm using scalapb, which requires that you don't do that.
I'm hunting for a way to disable that automatic import now, or "unimport" or "shadow" those modules after they get imported.
If spark.implicits._ is already imported, then one way to "unimport" it (hide or shadow those implicits) is to create a duplicate object and import it too:
import org.apache.spark.sql.{SQLContext, SQLImplicits}
object implicitShadowing extends SQLImplicits with Serializable {
  protected override def _sqlContext: SQLContext = ???
}
import implicitShadowing._
Testing with case class Person(id: Long, name: String):
// no import
List(Person(1, "a")).toDS() // doesn't compile, value toDS is not a member of List[Person]
import spark.implicits._
List(Person(1, "a")).toDS() // compiles
import spark.implicits._
import implicitShadowing._
List(Person(1, "a")).toDS() // doesn't compile, value toDS is not a member of List[Person]
See also:
How to override an implicit value?
Wildcard Import, then Hide Particular Implicit?
How to override an implicit value, that is imported?
How can an implicit be unimported from the Scala repl?
Not able to hide Scala Class from Import
NullPointerException on implicit resolution
Constructing an overridable implicit
Caching the circe implicitly resolved Encoder/Decoder instances
Scala implicit def do not work if the def name is toString
Is there a workaround for this format parameter in Scala?
Please check whether this helps.
A possible problem is that you don't just want to unimport spark.implicits._ (scalapb.spark.Implicits._); you probably want to import scalapb.spark.ProtoSQL.implicits._ too. And I don't know whether implicitShadowing._ shadows some of those as well.
Another possible workaround is to resolve implicits manually and use them explicitly.
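For example, with the Person case class from above, manual resolution could look like this (a sketch using member names from Spark's SQLImplicits; spark is the SparkSession):
// Instead of relying on import spark.implicits._, pass the conversion
// and the encoder explicitly:
val ds = spark.implicits
  .localSeqToDatasetHolder(List(Person(1, "a")))(spark.implicits.newProductEncoder[Person])
  .toDS()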

Why does a spark RDD behave differently depending on contents?

Based on this description of datasets and dataframes I wrote this very short test code which works.
import org.apache.spark.sql.functions._
val thing = Seq("Spark I am your father", "May the spark be with you", "Spark I am your father")
val wordsDataset = sc.parallelize(thing).toDS()
If that works... why does running this give me a
error: value toDS is not a member of org.apache.spark.rdd.RDD[org.apache.spark.sql.catalog.Table]
import org.apache.spark.sql.functions._
val sequence = spark.catalog.listDatabases().collect().flatMap(db =>
  spark.catalog.listTables(db.name).collect()).toSeq
val result = sc.parallelize(sequence).toDS()
toDS() is not a member of RDD[T]. Welcome to the bizarre world of Scala implicits where nothing is what it seems to be.
toDS() is a member of DatasetHolder[T]. In SparkSession, there is an object called implicits. When brought into scope with an expression like import spark.implicits._ (where spark is the SparkSession), an implicit method called rddToDatasetHolder becomes available for resolution:
implicit def rddToDatasetHolder[T](rdd: RDD[T])(implicit arg0: Encoder[T]): DatasetHolder[T]
When you call rdd.toDS(), the compiler first searches the RDD class and all of its superclasses for a method called toDS(). It doesn't find one, so it starts searching the compatible implicits in scope. While doing so, it finds the rddToDatasetHolder method, which accepts an RDD instance and returns an object of a type that does have a toDS() method. Basically, the compiler rewrites:
sc.parallelize(sequence).toDS()
into
spark.implicits.rddToDatasetHolder(sc.parallelize(sequence)).toDS()
Now, if you look at rddToDatasetHolder itself, it has two argument lists:
(rdd: RDD[T])
(implicit arg0: Encoder[T])
Implicit arguments in Scala are optional and if you do not supply the argument explicitly, the compiler searches the scope for implicits that match the required argument type and passes whatever object it finds or can construct. In this particular case, it looks for an instance of the Encoder[T] type. There are many predefined encoders for the standard Scala types, but for most complex custom types no predefined encoders exist.
So, in short: The existence of a predefined Encoder[String] makes it possible to call toDS() on an instance of RDD[String], but the absence of a predefined Encoder[org.apache.spark.sql.catalog.Table] makes it impossible to call toDS() on an instance of RDD[org.apache.spark.sql.catalog.Table].
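If you do need a Dataset of such a type, one workaround (a sketch, assuming spark.implicits._ is already in scope as in the spark-shell) is to supply an encoder yourself, for example a Kryo-based one, at the cost of the Dataset storing opaque binary objects instead of named columns:
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.catalog.Table

// Bring a Kryo-serialized Encoder[Table] into implicit scope
implicit val tableEncoder: Encoder[Table] = Encoders.kryo[Table]

val result = sc.parallelize(sequence).toDS()  // now finds an Encoder[Table]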
By the way, SparkSession.implicits contains the implicit class StringToColumn which has a $ method. This is how the $"foo" expression gets converted to a Column instance for column foo.
Resolving all the implicit arguments and implicit transformations is why compiling Scala code is so dang slow.

What's the difference between scala.List and scala.collection.immutable.List?

I'm new to Scala, and I noticed that List exists in both scala and scala.collection.immutable. At first, I thought scala.List was just an alias for scala.collection.immutable.List. But then I found that:
scala> typeOf[scala.List[A]]
res1: reflect.runtime.universe.Type = scala.List[A]
scala> typeOf[scala.collection.immutable.List[A]]
res2: reflect.runtime.universe.Type = List[A]
As shown above, typeOf on these two Lists gives different results, which makes me doubt my assumption.
I would like to know:
Are scala.List and scala.collection.immutable.List the same thing?
If yes, why typeOf gives different results as shown above?
If no, what are the differences?
Thanks!
As @Patryk mentioned in a comment, it is an alias for scala.collection.immutable.List.
If you look at scala/package.scala:
type List[+A] = scala.collection.immutable.List[A]
val List = scala.collection.immutable.List
So fundamentally they are the same:
scala> implicitly[scala.List[Int] =:= scala.collection.immutable.List[Int]]
res19: =:=[List[Int],List[Int]] = <function1>
scala> :type List
scala.collection.immutable.List.type
scala> :type scala.collection.immutable.List
collection.immutable.List.type
What you are using is, I believe, typeOf from import scala.reflect.runtime.universe._, which gives a reflective representation of the type and hence shows you just scala.List (which is correct behavior, in my opinion).
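Another quick check in the REPL: because of the val alias, both names refer to the very same companion object:
scala> scala.List eq scala.collection.immutable.List
res3: Boolean = true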
scala.List is a type alias for scala.collection.immutable.List:
package scala
type List[+A] = scala.collection.immutable.List[A]
Anything under the package scala is available by default; you don't have to import it explicitly.
Since List is so widely used, it makes perfect sense for it to be available by default rather than requiring an import. That is why a type alias for scala.collection.immutable.List lives in package scala.
typeOf must be showing the higher-precedence package name by default.

What is the consequence of ignoring scala's advanced feature warning?

I'm reading this tutorial about implicit conversions.
I entered this code in the REPL with the -feature switch:
object Rational {
  implicit def intToRational(x: Int): Rational = new Rational(x)
}
And I got this warning:
<console>:9: warning: implicit conversion method intToRational should be enabled
by making the implicit value scala.language.implicitConversions visible.
This can be achieved by adding the import clause 'import scala.language.implicitConversions'
or by setting the compiler option -language:implicitConversions.
See the Scala docs for value scala.language.implicitConversions for a discussion
why the feature should be explicitly enabled.
implicit def intToRational(x: Int): Rational = new Rational(x)
But the implicit conversion works fine when I run this code:
scala> 12 * new Rational(1, 3)
res5: Rational = 4/1
So is there any bad consequence if I don't follow the warning's suggestion (i.e. adding the import clause or setting the compiler option)?
Possibly, in some future version, the code won't compile without the import clause. It also won't compile if you use -Xfatal-warnings, since that turns the warning into an error.
For other feature warnings (reflective calls in particular) you may actually want to eliminate them; this doesn't really apply to this specific warning. Read the docs, as the warning suggests.
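For completeness, the warning goes away if you do what it suggests; a minimal sketch (assuming the Rational class from the tutorial is already defined):
import scala.language.implicitConversions

object Rational {
  implicit def intToRational(x: Int): Rational = new Rational(x)
}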