could not find implicit value for evidence parameter of type org.apache.flink.api.common.typeinfo.TypeInformation[...] - scala

I am trying to write some use cases for Apache Flink. One error I run into pretty often is
could not find implicit value for evidence parameter of type org.apache.flink.api.common.typeinfo.TypeInformation[SomeType]
My problem is that I can't really nail down when they happen and when they don't.
The most recent example of this would be the following
...
val largeJoinDataGen = new LargeJoinDataGen(dataSetSize, dataGen, hitRatio)
val see = StreamExecutionEnvironment.getExecutionEnvironment
val newStreamInput = see.addSource(largeJoinDataGen)
...
where LargeJoinDataGen extends GeneratorSource[(Int, String)] and GeneratorSource[T] extends SourceFunction[T], both defined in separate files.
When trying to build this I get
Error:(22, 39) could not find implicit value for evidence parameter of type org.apache.flink.api.common.typeinfo.TypeInformation[(Int, String)]
val newStreamInput = see.addSource(largeJoinDataGen)
1. Why is there an error in the given example?
2. What would be a general guideline for when these errors happen, and how can I avoid them in the future?
P.S.: This is my first Scala project and my first Flink project, so please be patient.

Instead of declaring the implicits yourself, you can add an import:
import org.apache.flink.streaming.api.scala._
That import brings the implicits that generate TypeInformation into scope, so it will also help.

This mostly happens when you have user code, i.e. a source or a map function or something of that nature that has a generic parameter. In most cases you can fix that by adding something like
implicit val typeInfo = TypeInformation.of(classOf[(Int, String)])
If your code is inside another method that has a generic parameter you can also try adding a context bound to the generic parameter of the method, as in
def myMethod[T: TypeInformation](input: DataStream[Int]): DataStream[T] = ...
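Applied to the snippet from the question, a minimal sketch of the first fix could look like this (LargeJoinDataGen and its constructor arguments are the ones from the question):
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

// Explicitly provide the evidence that addSource asks for
implicit val tupleInfo: TypeInformation[(Int, String)] =
  TypeInformation.of(classOf[(Int, String)])

val largeJoinDataGen = new LargeJoinDataGen(dataSetSize, dataGen, hitRatio)
val see = StreamExecutionEnvironment.getExecutionEnvironment
val newStreamInput = see.addSource(largeJoinDataGen)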

My problem is that I can't really nail down when they happen and when they don't.
They happen when an implicit parameter is required. If we look at the method definition we see:
def addSource[T: TypeInformation](function: SourceFunction[T]): DataStream[T]
But we don't see any implicit parameter defined, where is it?
When you see a polymorphic method whose type parameter has the form
def foo[T : M](param: T)
where T is the type parameter and M is a context bound, it means that the creator of the method is requesting an implicit parameter of type M[T]. The signature is equivalent to:
def foo[T](param: T)(implicit ev: M[T])
In the case of your method, it is actually expanded to:
def addSource[T](function: SourceFunction[T])(implicit evidence: TypeInformation[T]): DataStream[T]
This is why you see the compiler complaining, as it can't find the implicit parameter the method is requiring.
If we go to the Apache Flink Wiki, under Type Information, we can see why this happens:
No Implicit Value for Evidence Parameter Error
In the case where TypeInformation could not be created, programs fail to compile with an error stating “could not find implicit value for evidence parameter of type TypeInformation”.
A frequent reason is that the code that generates the TypeInformation has not been imported. Make sure to import the entire flink.api.scala package.
import org.apache.flink.api.scala._
For generic methods, you'll need to require a TypeInformation to be generated at the call site as well:
For generic methods, the data types of the function parameters and return type may not be the same for every call and are not known at the site where the method is defined. The code above will result in an error that not enough implicit evidence is available.
In such cases, the type information has to be generated at the invocation site and passed to the method. Scala offers implicit parameters for that.
Note that import org.apache.flink.streaming.api.scala._ may also be necessary.
For your types this means that if the invoking method is generic, it also needs to request the context bound for its type parameter.
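For example, a sketch of what that forwarding can look like (the helper addGeneratorSource is hypothetical; the imports are the usual Flink Scala API ones):
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.apache.flink.streaming.api.scala._

// Because this helper is itself generic, it has to request the same context
// bound so that a TypeInformation[T] is available when it calls addSource.
def addGeneratorSource[T: TypeInformation](env: StreamExecutionEnvironment, source: SourceFunction[T]): DataStream[T] =
  env.addSource(source)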

Another cause: Scala minor versions (2.11, 2.12, etc.) are not binary compatible with each other.
The following is a wrong configuration, even if you use import org.apache.flink.api.scala._:
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <scala.version>2.12.8</scala.version>
    <scala.binary.version>2.11</scala.binary.version>
</properties>
Correct configuration in Maven:
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <scala.version>2.12.8</scala.version>
    <scala.binary.version>2.12</scala.binary.version>
</properties>
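The binary version matters because it ends up in the artifact names of the Scala-dependent Flink modules, so a mismatch pulls in classes compiled against a different Scala version. A sketch of such a dependency (flink.version is a placeholder property):
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
</dependency>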

Related

Why does a spark RDD behave differently depending on contents?

Based on this description of datasets and dataframes, I wrote this very short test code, which works.
import org.apache.spark.sql.functions._
val thing = Seq("Spark I am your father", "May the spark be with you", "Spark I am your father")
val wordsDataset = sc.parallelize(thing).toDS()
If that works... why does running this give me a
error: value toDS is not a member of org.apache.spark.rdd.RDD[org.apache.spark.sql.catalog.Table]
import org.apache.spark.sql.functions._
val sequence = spark.catalog.listDatabases().collect().flatMap(db =>
  spark.catalog.listTables(db.name).collect()).toSeq
val result = sc.parallelize(sequence).toDS()
toDS() is not a member of RDD[T]. Welcome to the bizarre world of Scala implicits where nothing is what it seems to be.
toDS() is a member of DatasetHolder[T]. In SparkSession, there is an object called implicits. When brought into scope with an expression like import spark.implicits._ (where spark is your SparkSession), an implicit method called rddToDatasetHolder becomes available for resolution:
implicit def rddToDatasetHolder[T](rdd: RDD[T])(implicit arg0: Encoder[T]): DatasetHolder[T]
When you call rdd.toDS(), the compiler first searches the RDD class and all of its superclasses for a method called toDS(). It doesn't find one so what it does is start searching all the compatible implicits in scope. While doing so, it finds the rddToDatasetHolder method which accepts an RDD instance and returns an object of a type which does have a toDS() method. Basically, the compiler rewrites:
sc.parallelize(sequence).toDS()
into
spark.implicits.rddToDatasetHolder(sc.parallelize(sequence)).toDS()
Now, if you look at rddToDatasetHolder itself, it has two argument lists:
(rdd: RDD[T])
(implicit arg0: Encoder[T])
Implicit arguments in Scala are optional and if you do not supply the argument explicitly, the compiler searches the scope for implicits that match the required argument type and passes whatever object it finds or can construct. In this particular case, it looks for an instance of the Encoder[T] type. There are many predefined encoders for the standard Scala types, but for most complex custom types no predefined encoders exist.
So, in short: The existence of a predefined Encoder[String] makes it possible to call toDS() on an instance of RDD[String], but the absence of a predefined Encoder[org.apache.spark.sql.catalog.Table] makes it impossible to call toDS() on an instance of RDD[org.apache.spark.sql.catalog.Table].
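If you really need a Dataset of such a type, one workaround (a sketch, not necessarily the best design) is to supply the missing evidence yourself with a Kryo-based encoder:
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.catalog.Table

// Kryo encoders work for arbitrary classes, at the cost of storing the
// objects as a single opaque binary column instead of typed columns.
implicit val tableEncoder: Encoder[Table] = Encoders.kryo[Table]

val result = sc.parallelize(sequence).toDS() // sequence and sc as in the question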
By the way, the SparkSession's implicits object also contains the implicit class StringToColumn, which has a $ method. This is how the $"foo" expression gets converted to a Column instance for column foo.
Resolving all the implicit arguments and implicit transformations is why compiling Scala code is so dang slow.

Scala type inference not working with play json

I am writing an http client and this is my signature:
def post[Req, Resp](json: Req)(implicit r: Reads[Resp], w: Writes[Req]): Future[Resp]
Using play json behind the scenes.
When I use it like this
def create(req: ClusterCreateRequest): Future[ClusterCreateResponse] = endpoint.post(req)
I get the following error
diverging implicit expansion for type play.api.libs.json.Reads[Resp]
The following works
def create(req: ClusterCreateRequest): Future[ClusterCreateResponse] = endpoint.post[ClusterCreateRequest, ClusterCreateResponse](req)
Why is type inference not working as expected? What can I do about this?
diverging implicit expansion for type play.api.libs.json.Reads[Resp]
means that Resp has more than one JSON serializer (Reads instance) in scope, and none of them shadows the others.
It's not possible to pinpoint the root cause of the issue and say "fix X and everything will work" from the information given in the post.
But you can try to "debug" the implicit search. Consider checking the implicit search order:
Where does Scala look for implicits? Enabling implicit parameter expansion in IntelliJ IDEA (Ctrl+Shift+=) might help to check which implicits cause the clash.
General advice for type class instances: keep them organized and declared in one place, either in the companion object or in a dedicated object.
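For example, a minimal sketch of that advice with Play JSON (the case class fields are made up for illustration):
import play.api.libs.json.{Json, Reads, Writes}

// Exactly one Reads/Writes per type, declared in the companion object, is
// always found by implicit search without competing with other instances.
case class ClusterCreateRequest(name: String)
object ClusterCreateRequest {
  implicit val writes: Writes[ClusterCreateRequest] = Json.writes[ClusterCreateRequest]
}

case class ClusterCreateResponse(id: String)
object ClusterCreateResponse {
  implicit val reads: Reads[ClusterCreateResponse] = Json.reads[ClusterCreateResponse]
}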

Scala- How to get a Vector's contained class?

Does Scala have a way to get the contained class(es) of a collection? i.e. if I have:
val foo = Vector[Int]()
Is there a way to get back classOf[Int] from it?
(Just checking the first element doesn't work since it might be empty.)
You can use TypeTag:
import scala.reflect.runtime.universe._
def getType[F[_], A: TypeTag](as: F[A]) = typeOf[A]
val foo = Vector[Int]()
getType(foo)
Not from the collection itself, but if you receive it as a parameter to a method, you could add an implicit TypeTag to that method to obtain the type at runtime. E.g.
def mymethod[T](x: Vector[T])(implicit tag: TypeTag[T]) = ...
See https://docs.scala-lang.org/.../typetags-manifests.html for details.
Technically you can do it by using TypeTag, or Typeable/TypeCase from the Shapeless library (see link). But I just want to note that all of these tricks are really advanced solutions for when there is no better way to get the task done without digging into type parameters.
All type parameters in Scala and Java are affected by type erasure at runtime, and if you catch yourself thinking about extracting this information from the class, it might be a good sign that the solution you are trying to implement needs a redesign.

Excluding type evidence parameters from analysis in Scala when using -Ywarn-unused

Compiling a program that contains a type evidence parameter in Scala (such as T <:< U) can cause a warning when -Ywarn-unused is passed to the compiler. Especially in the case when the type evidence parameter is used to verify a constraint encoded using phantom types, this warning is likely to occur.
As an example, compiling the file here:
https://github.com/hseeberger/demo-phantom-types/blob/master/src/main/scala/de/heikoseeberger/demophantomtypes/Hacker.scala returns the following:
# scalac -Ywarn-unused Hacker.scala
Hacker.scala:42: warning: parameter value ev in method hackOn is never used
def hackOn(implicit ev: IsCaffeinated[S]): Hacker[State.Decaffeinated] = {
^
Hacker.scala:47: warning: parameter value ev in method drinkCoffee is never used
def drinkCoffee(implicit ev: IsDecaffeinated[S]): Hacker[State.Caffeinated] = {
^
two warnings found
It's clear to me that the parameter ev is not actually necessary at runtime, but the parameter is useful at compile time. Is there any way to instruct the compiler to ignore this case, while still raising the warning for unused function parameters in other contexts?
For example, I think instructing the compiler to ignore implicit parameters of class <:< or =:= would solve this issue, but I'm not sure how that could be accomplished.
I often find myself adding this because of either -Ywarn-unused or -Ywarn-value-discard:
package myproject

package object syntax {
  implicit class IdOps[A](a: A) {
    def unused: Unit = ()
  }
}
This lets you write ev.unused in the code to explicitly "specify" that the value is not going to be used or is only there for its side effects. You're not using the class field a in the definition of unused, but that's okay for -Ywarn-unused.
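A sketch of how that reads at a use site (Ordering stands in for the evidence types from the question):
import myproject.syntax._

object Example {
  // The evidence is only needed at compile time; calling .unused on it
  // keeps -Ywarn-unused quiet without dropping the parameter.
  def onlyNeedsEvidence[A](implicit ev: Ordering[A]): Unit =
    ev.unused
}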
Your other option is to use the silencer plugin to suppress warnings for these few methods.
Many years later, it's worth mentioning that there is an @unused annotation available (since when, I am not sure):
import scala.annotation.unused
def drinkCoffee(implicit @unused ev: IsDecaffeinated[S]): Hacker[State.Caffeinated]
Consequently, you cannot use a context bound.
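A self-contained sketch with a standard-library evidence type: a context bound has no parameter to attach the annotation to, so the implicit parameter list is spelled out.
import scala.annotation.unused

object UnusedDemo {
  // [A: Ordering] would not let us annotate the evidence, so we write the
  // implicit parameter explicitly and mark it @unused.
  def requiresOrdering[A](implicit @unused ev: Ordering[A]): Unit = ()
}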

[Akka]: Passing generic type function without type loss

I have an actor message of the following type:
case class RetrieveEntities[A](func:(Vector[A]) => Vector[A])
I then would like to handle the message in the following way:
def receive = {
  case RetrieveEntities(parameters, func) =>
    context.become(retrieveEntities(func))
}

def retrieveEntities(func: (Vector[T]) => Vector[T])(implicit mf: Manifest[T]): Receive = {
  case _ => ...
}
And I instantiate the actor in the following way:
TestActorRef(new RetrieveEntitiesService[Picture])
The problem is I receive the following compiler error:
type mismatch;
[error] found : Vector[Any] => Vector[Any]
[error] required: Vector[T] => Vector[T]
[error] context.become(retrieveEntities(func))
Which I suppose means I lost the type information but I am unsure why and how to prevent it.
Your example code is a bit too short to give you a solution, but from what you show, it seems like what you are trying to do is not possible.
This is why:
In Scala (and Java) the type parameters are erased, which means they disappear after compilation, so during runtime they are no longer available. This means that your pattern match on RetrieveEntities(parameters, func) is really a match where A can be anything. You then go on and call a method that is typed with T and there is no way for the compiler to know what you mean with that.
Manifest (which is deprecated), TypeTag and ClassTag are mechanisms that tell the compiler to create an object that provides the type information after compilation, but you have to "save" that information somewhere.
To be able to know which A your RetrieveEntitiesService was typed with, you would need to take an implicit ClassTag in the constructor and base any logic on it (since the constructor call is the point where you know what A is):
import scala.reflect.ClassTag
case class RetrieveEntities[A](func:(Vector[A]) => Vector[A])(implicit val tag: ClassTag[A])
You could then call runtimeClass on the tag to get the type of A:
scala> val retrieve = RetrieveEntities[String](identity)
scala> retrieve.tag.runtimeClass
res2: Class[_] = class java.lang.String
Note that this still would not let you type a method call with it, since we are now at runtime, but it would let you use that Class instance to compare against the runtimeClass of the actor's E and then do a safe cast to RetrieveEntities[E] if you like (and also regular runtime conditional flows, reflection, etc.).
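A sketch of that runtime comparison (matchesActorType is a hypothetical helper; RetrieveEntities is the ClassTag-carrying version from above):
import scala.reflect.ClassTag

// Compare the tag captured when the message was constructed against the
// actor's own tag before deciding how to handle the message.
def matchesActorType[E](msg: RetrieveEntities[_])(implicit actorTag: ClassTag[E]): Boolean =
  msg.tag.runtimeClass == actorTag.runtimeClass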
Two important notes before you start doing that
I would not advise you to go down that path until you are more confident with the type system and really, really know that there is no other reasonable design that solves your problem. Again, I cannot help you towards such a solution with the sparse example code given. (Maybe your actor does not really need to know about the type of A, for example, or there is a limited set of Es that you might match on with concrete types.)
As an additional warning, type and class tags are not thread safe in Scala 2.10, and might not be safe in 2.11 either, so mixing them with actors might be a bad idea. (http://docs.scala-lang.org/overviews/reflection/thread-safety.html)
johanandren's answer was certainly helpful, but in the end I found a way to make it compile, and it is working for now.
I needed to give the compiler a more precise type annotation to make it work:
case RetrieveEntities(parameters, func:(Vector[T]) => Vector[T])
I will still continue to use Manifest instead of the newer reflection API (TypeTag and ClassTag), mainly because the library I am using (json4s) also uses Manifest internally, and I assume this will lead to fewer problems.