How to use the Flink fold function in Scala

This is a non-working attempt at using Flink's fold with a Scala anonymous function:
val myFoldFunction = (x: Double, t: (Double, String, String)) => x + t._1

env.readFileStream(...)
  ...
  .groupBy(1)
  .fold(0.0, myFoldFunction: Function2[Double, (Double, String, String), Double])
It compiles, but at execution I get a "type erasure issue" (see below). Doing the same in Java works fine, but is of course more verbose; I like the concise and clear lambdas. How can I do this in Scala?
Caused by: org.apache.flink.api.common.functions.InvalidTypesException:
Type of TypeVariable 'R' in 'public org.apache.flink.streaming.api.scala.DataStream org.apache.flink.streaming.api.scala.DataStream.fold(java.lang.Object,scala.Function2,org.apache.flink.api.common.typeinfo.TypeInformation,scala.reflect.ClassTag)' could not be determined.
This is most likely a type erasure problem.
The type extraction currently supports types with generic variables only in cases where all variables in the return type can be deduced from the input type(s).

The problem you encountered is a bug in Flink [1]. The problem originates from Flink's TypeExtractor and the way the Scala DataStream API is implemented on top of the Java implementation. The TypeExtractor cannot generate a TypeInformation for the Scala type and thus returns a MissingTypeInformation. This missing type information is manually set after creating the StreamFold operator. However, the StreamFold operator is implemented in a way that it does not accept a MissingTypeInformation and, consequently, fails before setting the right type information.
I've opened a pull request [2] to fix this problem; it should be merged within the next two days. Once it is in, using the latest 0.10 snapshot version should fix your problem.
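For illustration, here is roughly what the concise form should look like once the fix is in. This is a hedged sketch against the 0.10-SNAPSHOT Scala DataStream API, using a hypothetical fromElements source in place of the question's readFileStream:

import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
val myFoldFunction = (x: Double, t: (Double, String, String)) => x + t._1

env.fromElements((1.0, "a", "b"), (2.0, "a", "c"))
  .groupBy(1)                // key by the second tuple field
  .fold(0.0, myFoldFunction) // plain lambda, no Function2 ascription needed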
[1] https://issues.apache.org/jira/browse/FLINK-2631
[2] https://github.com/apache/flink/pull/1101

Related

How to get erased type at compile time?

In Scala 2, there is a function TypeApi#erasure described as follows:
The erased type corresponding to this type after
all transformations from Scala to Java have been performed.
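For reference, a minimal Scala 2 sketch of that API (run in a REPL; the exact printed form may vary between compiler versions):

import scala.reflect.runtime.universe._

// Type arguments are widened by erasure, so this prints something
// like List[Any] rather than List[Int].
println(typeOf[List[Int]].erasure)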
How can I get the erased type for a given TypeRepr in Scala 3?
I have also asked this question on the Dotty Gitter, and it turned out that there is currently no available API for getting erased types.
I then created a Dotty feature request in which you can find possible workarounds.

Specify type of a TaggedOutput to pass through GroupByKey (as a part of CombinePerKey)

When I tried to migrate my Apache Beam pipeline project from Python 3.7 to 3.8, the type hint check started to fail at this place:
pcoll = (
    wrong_pcoll,
    some_pcoll_1,
    some_pcoll_2,
    some_pcoll_3,
) | beam.Flatten(pipeline=pipeline)

pcoll | beam.CombinePerKey(MyCombineFn())  # << here
with this error:
apache_beam.typehints.decorators.TypeCheckError: Input type hint violation at GroupByKey: expected Tuple[TypeVariable[K], TypeVariable[V]], got Union[TaggedOutput, Tuple[Any, Any], Tuple[Any, _MyType1], Tuple[Any, _MyType2]]
The wrong_pcoll is actually a TaggedOutput, because it is received as a tagged output from one of the previous ptransforms.
The type hint check fails because the type of wrong_pcoll, which is a TaggedOutput, becomes part of the type of pcoll (whose type, according to the exception, is Union[TaggedOutput, Tuple[Any, Any], Tuple[Any, _MyType1], Tuple[Any, _MyType2]]) when pcoll is passed to the GroupByKey used inside CombinePerKey.
So I have two questions:
Why does it work in Python 3.7 but not in 3.8?
How can I specify the type of a tagged output? I tried annotating the process() method of the PTransform that produced it with a union of all the output types it yields, but for some reason the type hint check picked the wrong one. When I instead annotated it strictly with the type I need, Tuple[Any, Any], it worked. But that is not an option, since process() also yields other types, such as plain str.
As a workaround, I can pass this wrong_pcoll through a simple beam.Map with lambda x: x and .with_output_types(Tuple[Any, Any]), but that does not seem like a clean way to fix it.
I investigated similar failures recently.
Beam has some type-inferencing capabilities which rely on opcode analysis of pipeline code. Inferencing is somewhat limited and conservative. For example, when Beam attempts to infer a function's return type and encounters an opcode that it does not know, Beam infers the return type as Any. It is also sensitive to Python minor version.
Python 3.8 removed some opcodes, such as SETUP_LOOP, that Beam didn't handle previously. Therefore, on Python 3.8 type inference kicks in for portions of the code where it did not work before. I've seen pipelines where this increased type inference exposed incorrectly-specified hints.
You are running into a bug/limitation in Beam's type inference for multi-output DoFns, tracked in https://issues.apache.org/jira/browse/BEAM-4132. There has been some progress, but it is not completely addressed. As a workaround, you could manually specify the hints; I think beam.Flatten().with_output_types(Tuple[str, Union[_MyType1, _MyType2]]) should work for your case.

can't bind[SttpBackend[Try, Nothing]]

I want to use the sttp library with Guice (via the scala-guice wrapper) in my app, but it seems it is not so easy to correctly bind things like SttpBackend[Try, Nothing].
SttpBackend.scala
Using Try[_] or Try[AnyRef] shows some other errors, but I still have no idea how this should be done correctly.
The error I got:
kinds of the type arguments (scala.util.Try) do not conform to the expected kinds of the type parameters (type T).
[error] scala.util.Try's type parameters do not match type T's expected parameters:
[error] class Try has one type parameter, but type T has none
[error] bind[SttpBackend[Try, Nothing]].toProvider[SttpBackendProvider]
[error]        ^
SttpBackendProvider looks like:
def get: SttpBackend[Try, Nothing] = TryHttpURLConnectionBackend(opts)
complete example in scastie
Interestingly, scala-guice version 4.1.0 shows this error, while the latest 4.2.2 shows an error inside the library itself when converting Nothing to a JavaType.
I believe you hit two different bugs in Scala-Guice, one of which is not fixed yet (and probably not even reported yet).
To describe those issues I need a quick intro into how Guice and Scala-Guice work. Essentially, Guice maintains a mapping from a type onto a factory method for an object of that type. To support some advanced features, types are mapped onto an internal "key" representation, and for each "key" Guice builds a way to construct a corresponding object. It is also important that generics in Java are implemented using type erasure. That's why when you write something like:
bind(classOf[SttpBackend[Try, Nothing]]).toProvider(classOf[SttpBackendProvider])
in raw Guice, the "key" actually becomes something like "com.softwaremill.sttp.SttpBackend". Luckily, the Guice developers thought about this issue with generics and introduced TypeLiteral[T] so you can convey the generic type information.
The Scala type system is richer than Java's, and it has better reflection support from the compiler. Scala-Guice exploits this to map Scala types onto those more detailed keys automatically. Unfortunately, it doesn't always work perfectly.
The first issue results from two facts: the type SttpBackend is defined as
trait SttpBackend[R[_], -S]
so it expects its first type parameter to be a type constructor; and originally Scala-Guice used the scala.reflect.Manifest infrastructure. As far as I understand, such higher-kinded types are not representable as a Manifest, and this is what the error in your question really says.
Luckily, Scala added the newer scala.reflect.runtime.universe.TypeTag infrastructure to tackle this issue in a better and more consistent way, and Scala-Guice has migrated to it. That's why with the newer version of Scala-Guice the compiler error goes away. Unfortunately, there is another bug in Scala-Guice that makes the code fail at runtime: a lack of handling of the Scala Nothing type. You see, the Nothing type is a kind of fake one on the JVM. It is one of the places where the Scala type system is richer than the Java one: there is no direct mapping for Nothing in the JVM world. Luckily, there is no way to create any value of type Nothing. Unfortunately, you can still create a classOf[Nothing]. The Scala-to-JVM compiler handles it by using an artificial scala.runtime.Nothing$, which is not part of the public API; it is an implementation detail of Scala on the JVM. Anyway, this means that the Nothing type needs additional handling when converting into a Guice TypeLiteral, and there is none: there is handling for Any, the cousin of Nothing, but not for Nothing (see the usage of anyType in TypeConversions.scala).
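You can see this artificial class from a Scala 2 REPL (a quick illustration; the exact rendering may differ between versions):

scala> classOf[Nothing]
res0: Class[Nothing] = class scala.runtime.Nothing$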
So there are really two workarounds:
Use raw Java-based syntax for Guice instead of the nice Scala-Guice one:
bind(new TypeLiteral[SttpBackend[Try, Nothing]]() {})
  .toInstance(sttpBackend) // or to whatever
See online demo based on your example.
Patch TypeConversions.scala in Scala-Guice, as in:
private[scalaguice] object TypeConversions {
  private val mirror = runtimeMirror(getClass.getClassLoader)
  private val anyType = typeOf[Any]
  private val nothingType = typeOf[Nothing] // added
  ...

  def scalaTypeToJavaType(scalaType: ScalaType): JavaType = {
    scalaType.dealias match {
      case `anyType` => classOf[java.lang.Object]
      case `nothingType` => classOf[scala.runtime.Nothing$] // added
      ...
I tried it locally and it seems to fix your example. I didn't do any extensive testing, so it might have broken something else.

Correct class names in IntelliJ debugger (Scala)

I am new to Scala and IntelliJ, so this might sound like a stupid question. However, I see that when debugging, IntelliJ displays parameter types as some kind of "weird" type, in my case MapLike$MappedValues, whereas the expected type would be Map[String, Iterable[Person]].
Why can IntelliJ not display the correct type? Right now this is rather important to me, because I am debugging precisely to find out the correct parameter type, and the API documentation on this part is not very clear (I am working with Apache Flink CEP).
Code example:
val result: DataStream[String] = patternStream.select(patterns => {
  val person: Person = patterns.head._2.head
  s"Person ${person.name} of age ${person.age} can drink!"
})
This is what I can see in the debugger: [screenshot omitted; the value is shown as MapLike$MappedValues]
According to the documentation:
"The select() method takes a selection function as argument, which is called for each matching event sequence. It receives a match in the form of Map[String, Iterable[IN]]"
Why does IntelliJ display MapLike$MappedValues and not something like Map[String,Iterable[Person]]?
The debugger is showing you the runtime class.
On the JVM, generic type parameters are "erased" at runtime, so they are not visible in the debugger. Map is just an interface; MapLike$MappedValues implements it as an inheritor of AbstractMap.
Thus, the displayed class is compatible with the type of the value.
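To illustrate (a minimal sketch with a hypothetical Person class; the concrete class name depends on the Scala version and on how the map was built):

case class Person(name: String, age: Int)

val patterns: Map[String, Iterable[Person]] =
  List(Person("Ann", 21)).groupBy(_.name)

// The static type is Map[String, Iterable[Person]], but getClass reports the
// concrete runtime class, e.g. scala.collection.immutable.Map$Map1 here; the
// debugger shows the same kind of internal name (MapLike$MappedValues in the
// question, because that map was produced via a lazy mapValues view).
println(patterns.getClass.getName)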
To find the compile time type of an expression, IntelliJ can help you:
select the expression
press the "type info" hotkey (usually shift+ctrl+p)

Finding applicable implicit conversions in Scala

In the Coq proof assistant, which also has implicit conversions, it is possible to search for an implicit conversion using the SearchAbout T command, which returns all the things that have T in their type (which would include conversions to or from T).
Is there a way of finding all conversions to or from a type for Scala programmers? Note that the conversions might be defined outside the project that defines either the source or destination type.
To quickly check whether a conversion between two reference types S and T exists in the current scope, just type
((null: S): T)
and see if it compiles. With Eclipse Scala IDE >= 2.1M2 you can see which conversion is called, if implicit highlighting is enabled in the preferences.
Of course this requires you to guess both types (but you will probably already have a clear idea of what you want to convert to and from), and it requires the conversions to already be in scope.
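For example, to check whether a conversion from String to Seq[Char] is in scope (a small sketch; the conversion here is supplied by scala.Predef):

// Paste into a REPL: this compiles only because an implicit conversion
// String => Seq[Char] is in scope; the null itself is never used.
((null: String): Seq[Char])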