Given a simple case class with a type annotation @Bar:
case class Foo(
  field: Option[String] @Bar
)
converting an RDD[Foo] to a Dataset[Foo] fails at runtime with the following stack trace:
User class threw exception: scala.MatchError: scala.Option[String] @Bar (of class scala.reflect.internal.Types$AnnotatedType)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:483)
at ...
A ticket is open for this issue (SPARK-27625). However, is there a workaround?
Using Spark 2.3.2.
The frameless library supports type annotations.
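A minimal sketch of that route, assuming a frameless version built for Spark 2.3 and that (as noted above) its TypedEncoder accepts the annotated field; TypedDataset derives its own encoder at compile time instead of going through ScalaReflection, which is what fails on @Bar:
import frameless.TypedDataset
import org.apache.spark.sql.{Dataset, SparkSession}

// rdd is the RDD[Foo] from the question; frameless needs an implicit SparkSession.
implicit val session: SparkSession = SparkSession.builder().getOrCreate()
val typed: TypedDataset[Foo] = TypedDataset.create(rdd)
val ds: Dataset[Foo] = typed.dataset // back to a plain Dataset[Foo]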
Related
I am relatively new to Scala. Using Jackson, I am getting the error below for the following case class:
case class Result(label: String, resultDate: Option[Date] = None)
Resolved[org.springframework.http.converter.HttpMessageNotReadableException: JSON parse error: No deserializer for document type 'result' found; nested exception is com.fasterxml.jackson.databind.JsonMappingException: No deserializer for document type 'result' found
at [Source: (PushbackInputStream); line: 1, column: 209] (through reference chain: com.project["document"])]
You need to give Jackson a way to deserialize this type: Jackson (core) doesn't know about Scala, and especially not about case classes, since it is originally a Java library.
You can add the Jackson Scala Module for automatic support of Scala types.
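For example, a minimal sketch (the JSON input here is a made-up example):
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Registering the Scala module teaches Jackson about case classes and Option.
val mapper = new ObjectMapper()
mapper.registerModule(DefaultScalaModule)

val json = """{"label": "pass"}"""                    // hypothetical input
val result = mapper.readValue(json, classOf[Result])  // Result("pass", None)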
I have the following case classes defined in my Flink application (Flink 1.10.1):
case class FilterDefinition(filterDefId: String, filter: TileFilter)
case class TileFilter(tiles: Seq[Long], zoomLevel: Int)
At runtime, I noticed a log entry saying:
FilterDefinition cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance.
If I interpreted the Flink documentation correctly, Flink should be able to serialize Scala case classes without needing Kryo. However, it looks like the above case class falls back on the Kryo serializer for me.
Did I misinterpret how case classes are handled by Flink?
Excerpting here from the documentation:
Java and Scala classes are treated by Flink as a special POJO data type if they fulfill the following requirements:
- The class must be public.
- It must have a public constructor without arguments (default constructor).
- All fields are either public or must be accessible through getter and setter functions. For a field called foo, the getter and setter methods must be named getFoo() and setFoo().
- The type of a field must be supported by a registered serializer.
In this case, it appears that Flink doesn't know how to serialize TileFilter (or, more specifically, Seq[Long]).
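If you are on the Scala API, one thing to check (a sketch, assuming Flink 1.10's streaming Scala API; the pipeline below is hypothetical) is that the wildcard import is present, since it brings the implicit TypeInformation derivation for case classes into scope and avoids the GenericType/Kryo fallback:
// The wildcard import provides implicit TypeInformation for case
// classes, including Scala collection fields such as Seq[Long].
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
val filters: DataStream[FilterDefinition] = env.fromElements(
  FilterDefinition("f1", TileFilter(Seq(1L, 2L), zoomLevel = 10))
)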
I have defined a Scala object (say, MyObject) which extends the following:
trait GeneratedMessageCompanion[A <: GeneratedMessage with Message[A]]
When I call the parseFrom method on the object, I get the following error:
Caused by: java.lang.NoSuchMethodError:....MyObject$.parseFrom([B)Lscalapb/GeneratedMessage;
I tried both scalapb-runtime_2.11 and scalapb-runtime_2.12.
Edit: The issue is solved. It was a case of dependency mismatch.
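For anyone hitting the same NoSuchMethodError: a sketch of the usual guard against such mismatches is to pin scalapb-runtime to the exact version of the ScalaPB compiler plugin in build.sbt (this assumes the ScalaPB sbt plugin is on the build classpath, which exposes the Version constant):
// build.sbt -- keep the runtime in lockstep with the compiler plugin
// that generated the code, so generated signatures match at runtime.
libraryDependencies += "com.thesamet.scalapb" %% "scalapb-runtime" %
  scalapb.compiler.Version.scalapbVersion % "protobuf"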
I have an exception:
java.lang.ClassCastException: scala.collection.immutable.Map$ cannot be cast to scala.collection.immutable.Map
which I'm getting in this part of the code:
val iterator = new CsvMapper()
.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES)
.readerFor(Map.getClass).`with`(CsvSchema.emptySchema().withHeader()).readValues(reader)
while (iterator.hasNext) {
println(iterator.next.asInstanceOf[Map[String, String]])
}
So, is there any way to avoid this issue? Because this:
val iterator = new CsvMapper()
.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES)
.readerFor(Map[String,String].getClass).`with`(CsvSchema.emptySchema().withHeader()).readValues(reader)
doesn't help, because I get
[error] Unapplied methods are only converted to functions when a function type is expected.
[error] You can make this conversion explicit by writing `apply _` or `apply(_)` instead of `apply`.
Thanks in advance
As has been pointed out in the earlier comments, in general you need classOf[X[_,_]] rather than X.getClass or X[A, B].getClass for a class that takes two generic types. (instance.getClass retrieves the class of the associated instance; classOf[X] does the same for some type X when an instance isn't available. Since Map is an object and objects are also instances, it retrieves the class type of the object Map - the Map trait's companion.)
However, a second problem here is that scala.collection.immutable.Map is abstract (it's actually a trait), and so it cannot be instantiated as-is. (If you look at the type of Scala Map instances created via the companion's apply method, you'll see that they're actually instances of classes such as Map.EmptyMap or Map.Map1, etc.) As a consequence, that's why your modified code still produced an error.
However, the ultimate problem here is that you required - as you mentioned - a Java java.util.Map and not a Scala scala.collection.immutable.Map (which is what you'll get by default if you just type Map in a Scala program). Just one more thing to watch out for when converting Java code examples to Scala. ;-)
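Putting those three points together, a corrected version might look like this (a sketch; reader stands for whatever java.io.Reader you already have):
import com.fasterxml.jackson.databind.DeserializationFeature
import com.fasterxml.jackson.dataformat.csv.{CsvMapper, CsvSchema}
import scala.collection.JavaConverters._

// Ask Jackson for a java.util.Map via classOf, then convert each row
// to an immutable Scala Map.
val iterator = new CsvMapper()
  .disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES)
  .readerFor(classOf[java.util.Map[String, String]])
  .`with`(CsvSchema.emptySchema().withHeader())
  .readValues[java.util.Map[String, String]](reader)

while (iterator.hasNext) {
  println(iterator.next.asScala.toMap)
}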
Squeryl requires a zero-argument constructor when using Option[] fields. I worked out how to write such a constructor for a Long (e.g. 0L), but how do I create such a value for a Timestamp or Date?
Essentially I need to finish this:
def this() = this(0L,"",TIMESTAMP,TIMESTAMP,0L,"","","","",Some(""),Some(""),"",DATE,DATE,false,false,false,Some(0L),Some(0),Some(0L))
Below is how I originally found the timestamp and date problem.
Background
Getting the following error in my Play! 2.0 Scala app (also using Squeryl):
Caused by: java.lang.RuntimeException: Could not deduce Option[] type of field 'startOrder' of class models.Job
This field in models.Job:
#Column("start_order")
var startOrder: Option[Int],
In the Postgres DB it is defined as an integer. Does Play! 2.0 handle models differently, is this a bug, or is it a Squeryl problem? Thanks!
Stack trace (it looks like a Squeryl problem):
Caused by: java.lang.RuntimeException: Could not deduce Option[] type of field 'startOrder' of class models.Job
at scala.sys.package$.error(package.scala:27) ~[scala-library.jar:na]
at scala.Predef$.error(Predef.scala:66) ~[scala-library.jar:0.11.2]
at org.squeryl.internals.FieldMetaData$$anon$1.build(FieldMetaData.scala:441) ~[squeryl_2.9.1-0.9.4.jar:na]
at org.squeryl.internals.PosoMetaData$$anonfun$3.apply(PosoMetaData.scala:111) ~[squeryl_2.9.1-0.9.4.jar:na]
at org.squeryl.internals.PosoMetaData$$anonfun$3.apply(PosoMetaData.scala:80) ~[squeryl_2.9.1-0.9.4.jar:na]
at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:176) ~[scala-library.jar:0.11.2]
If startOrder is defined as
val startOrder: Option[java.sql.Timestamp]
in the class definition, then I believe
Some(new java.sql.Timestamp(0))
should be passed to the constructor.
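With that, the placeholder constructor from the question could be finished along these lines (a sketch: the argument order mirrors the original, with epoch dummies for the bare TIMESTAMP and DATE placeholders; Option-typed fields would take Some(...) instead):
import java.sql.{Date, Timestamp}

// Epoch-based dummy values stand in for the TIMESTAMP and DATE placeholders.
def this() = this(0L, "", new Timestamp(0), new Timestamp(0), 0L, "", "", "",
  "", Some(""), Some(""), "", new Date(0), new Date(0), false, false, false,
  Some(0L), Some(0), Some(0L))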
Option is used when a value is optional, i.e. when there may or may not be a value. If there is a value, you wrap it in Some; if there is no value, you use None.