No implicits found for parameter evidence - scala

I have a line of code in a scala app that takes a dataframe with one column and two rows, and assigns them to variables start and end:
val Array(start, end) = datesInt.map(_.getInt(0)).collect()
This code works fine when run in a REPL, but when I try to put the same line in a scala object in Intellij, it inserts a grey (?: Encoder[Int]) before the .collect() statement, and show an inline error No implicits found for parameter evidence$6: Encoder[Int]
I'm pretty new to scala and I'm not sure how to resolve this.

Spark needs to know how to serialize JVM types to send them from workers to the master. In some cases they can be automatically generated and for some types there are explicit implementations written by Spark devs. In this case you can implicitly pass them. If your SparkSession is named spark then you miss following line:
import spark.implicits._
As you are new to Scala: implicits are parameters that you don't have to explicitly pass. In your example map function requires Encoder[Int]. By adding this import, it is going to be included in the scope and thus passed automatically to map function.
Check Scala documentation to learn more.

Related

Scala | Spark | Invoking undefined method

I am new to Scala and trying to grab the language fundamentals. I have working knowledge of Spark with Java API.
I have some hard time understanding some scala code and therfore I am not able to write the same in Java. I got this piece of code in https://learn.microsoft.com/en-us/azure/cosmos-db/spark-connector
// Import Necessary Libraries
import com.microsoft.azure.cosmosdb.spark.schema._
import com.microsoft.azure.cosmosdb.spark._
import com.microsoft.azure.cosmosdb.spark.config.Config
// Read Configuration
val readConfig = Config(Map(
"Endpoint" -> "https://doctorwho.documents.azure.com:443/",
"Masterkey" -> "YOUR-KEY-HERE",
"Database" -> "DepartureDelays",
"Collection" -> "flights_pcoll",
"query_custom" -> "SELECT c.date, c.delay, c.distance, c.origin, c.destination FROM c WHERE c.origin = 'SEA'" // Optional
))
// Connect via azure-cosmosdb-spark to create Spark DataFrame
val flights = spark.read.cosmosDB(readConfig)
flights.count()
As far as I know the read method returns an object of type org.apache.spark.sql.DataFrameReader and this does not have any method cosmosDB(), then how this code is working. Also how do I convert this code to Java.
Thank You
What you are seeing is the magic of Scala implicit conversions. The compiler sees that you intend to call the cosmosDB method of a DataFrameReader and that there's no method of that name with the proper signature, as you note.
When you
import com.microsoft.azure.cosmosdb.spark.schema._
you also import the contents of the package object (current git commit as of this writing, last updated in 2017 so it's stable code). The relevant bit that gets imported is
implicit def toDataFrameReaderFunctions(dfr: DataFrameReader): DataFrameReaderFunctions
An implicit def which takes one argument signals to the compiler that, if this def is in scope, the compiler can insert a call to this method if:
it has a DataFrameReader
a method is being called which is not a member of DataFrameReader
com.microsoft.azure.cosmosdb.spark.schema.DataFrameReaderFunctions has member with the desired name and signature
Since DataFrameReaderFunctions has a method cosmosDB, the compiler then translates your code to
toDataFrameReaderFunctions(spark.read).cosmosDB(readConfig)
This general approach of using an implicit conversion to make it look like you're adding methods to a type without modifying the type is called enrichment or an extension method. Implicit conversions in general should probably be avoided: they very often make code hard to follow and an errant implicit conversion in scope can make code you don't intend to compile compile. For an enrichment like this, there's an alternative: use an implicit class, where the compiler essentially autogenerates the implicit conversion but this doesn't allow you to use an Int in place of a String.

Special grammar in scala

I am very new at Scala and Spark area, and I found a strange grammar usage in the scala inside the Apache beam project and I can't understand.
Here is the strange place:
JavaDStream<Metadata> metadataDStream = mapWithStateDStream.map(new Tuple2MetadataFunction());
// register ReadReportDStream to report information related to this read.
new ReadReportDStream(metadataDStream.dstream(), id, getSourceName(source, id), stepName)
.register();
From the above code, you can see inside the constructor of ReadReportDstream, the first parameter is
metadataDStream.dstream()
If we go inside the dstream() method, you will see the following code:
class JavaDStream[T](val dstream: DStream[T])(implicit val classTag: ClassTag[T])
extends AbstractJavaDStreamLike[T, JavaDStream[T], JavaRDD[T]] {
I am wondering why it uses "metadataDStream.dstream()" in the constructor instead of "metadataDStream.dstream"? What does the "()" do?
It's mostly a question of convention. Methods with empty parameter lists are evaluated for their side-effects. Methods without parameters are assumed to be purely functional, and free of side-effects. You can read more about that here - https://docs.scala-lang.org/style/method-invocation.html (Arity-0 section)
So in that case, we're probably having some side-effects in metadataDStream.dstream(). However, syntactically writing it as metadataDStream.dstream won't be an error.

SparkContext implicit functions definition

I am reading the latest code of Spark,there are comments:
// The following implicit functions were in SparkContext before 1.3 and users had to
// `import SparkContext._` to enable them. Now we move them here to make the compiler find
// them automatically. However, we still keep the old functions in SparkContext for backward
// compatibility and forward to the following functions directly.
I don't understand Now we move them here to make the compiler find them automatically, I would ask how spark could automatically find these implicit definitions,and put the implicit definition into scope, because user only create s SparkContext instance in their spark code
I would ask how spark could automatically find these implicit
definition
It is not Spark, it is the Scala compiler. The compiler has several places it traces while compiling your code to try and find implicits. You can find them in Where does Scala look for implicits?
Since these methods are defined on WritableConverter, anytime that type is in scope in one of the ways Scala looks for implicits, you'll automatically have these conversions in scope and be able to apply them.

Spark toDF cannot resolve symbol after importing sqlContext implicits

I'm working on writing some unit tests for my Scala Spark application
In order to do so I need to create different dataframes in my tests. So I wrote a very short DFsBuilder code that basically allows me to add new rows and eventually create the DF. The code is:
class DFsBuilder[T](private val sqlContext: SQLContext, private val columnNames: Array[String]) {
var rows = new ListBuffer[T]()
def add(row: T): DFsBuilder[T] = {
rows += row
this
}
def build() : DataFrame = {
import sqlContext.implicits._
rows.toList.toDF(columnNames:_*) // UPDATE: added :_* because it was accidently removed in the original question
}
}
However the toDF method doesn't compile with a cannot resolve symbol toDF.
I wrote this builder code with generics since I need to create different kinds of DFs (different number of columns and different column types). The way I would like to use it is to define some certain case class in the unit test and use it for the builder
I know this issue somehow relates to the fact that I'm using generics (probably some kind of type erasure issue) but I can't quite put my finger on what the problem is exactly
And so my questions are:
Can anyone show me where the problem is? And also hopefully how to fix it
If this issue cannot be solved this way, could someone perhaps offer another elegant way to create dataframes? (I prefer not to pollute my unit tests with the creation code)
I obviously googled this issue first but only found examples where people forgot to import the sqlContext.implicits method or something about a case class out of scope which is probably not the same issue as I'm having
Thanks in advance
If you look at the signatures of toDF and of SQLImplicits.localSeqToDataFrameHolder (which is the implicit function used) you'll be able to detect two issues:
Type T must be a subclass of Product (the superclass of all case classes, tuples...), and you must provide an implicit TypeTag for it. To fix this - change the declaration of your class to:
class DFsBuilder[T <: Product : TypeTag](...) { ... }
The columnNames argument is not of type Array, it's a "repeated parameter" (like Java's "varargs", see section 4.6.2 here), so you have to convert the array into arguments:
rows.toList.toDF(columnNames: _*)
With these two changes, your code compiles (and works).

Scala Unit Testing - Mocking an implicitly wrapped function

I have a question concerning unit tests that I'm trying to achieve using Mockito in Scala. I've also looked up ScalaMock but it sounds like the feature is not provided as well. I suppose that maybe I'm looking from a narrow way to the solution and there might be a different perspective or approach to what im doing so all your opinions are welcomed.
Basically, I want to mock a function that is available to the object using implicit conversion, and I don't have any control to change how that is done. Since I'm a user to the library. The concrete example is similar to the following scenario
rdd: RDD[T] = //existing RDD
sqlContext: SQLContext = //existing sqlcontext
import sqlContext.implicits._
rdd.toDF()
/*toDF() doesn't originally exist at RDD but is implicitly added when importing sqlContext.implicits._*/
Now In the testing, I'm mocking the rdd and the sqlContext and I want to mock the toDF() function. I Can't mock the function toDF() since it doesn't exist on the RDD level. Even if I do a simple trick, importing the mocked sqlContext.implicit._ I get an error that any function that is not publicaly available to the object can't be mocked. I even tried to mock the code that is implicitly executed until toDF() but I get stuck with Final/Pivate[in accessible] classes that I also can't mock. Your suggestions are more than welcomed. Thanks in advance :)