spark custom class encoder - scala

I have a CustomClass trait,
I have implemented the trait in different classes,
and I have used a factory pattern to create objects of these classes.
import org.apache.spark.sql.Encoders
implicit val encoder = Encoders.kryo[CustomClass](classOf[CustomClass])
Now when I try to create a CustomClass DataFrame,
I get the error below.
The DataFrame has only one column, value, and the data is binary.
org.apache.spark.sql.AnalysisException: cannot resolve '`id`' given input columns: [value];;
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:143)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:140)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:333)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:333)
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:106)
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:118)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:118)
at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:129)
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:134)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:134)
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:139)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237)
at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:139)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:106)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:140)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:92)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:177)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:92)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:89)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:130)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:156)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:153)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:68)
Please help.

You have to use Encoders.bean, so:
implicit val encoder = Encoders.bean[CustomClass](classOf[CustomClass])
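For reference, here is a minimal, hedged sketch of what the bean-encoder route can look like. CustomClassImpl, its fields, and the id column are hypothetical stand-ins (the question does not show the real class); the point is that Encoders.bean derives named columns from getter/setter pairs, whereas Encoders.kryo packs the whole object into a single binary value column, which is why a reference to id cannot be resolved.

import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

// Hypothetical bean-style implementation of the trait (not from the question):
// Encoders.bean needs a no-arg constructor and getter/setter pairs.
class CustomClassImpl extends Serializable {
  private var id: Long = 0L
  private var name: String = ""
  def getId: Long = id
  def setId(v: Long): Unit = { id = v }
  def getName: String = name
  def setName(v: String): Unit = { name = v }
}

object BeanEncoderSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("bean-encoder").getOrCreate()

    implicit val encoder: Encoder[CustomClassImpl] = Encoders.bean(classOf[CustomClassImpl])

    val obj = new CustomClassImpl
    obj.setId(1L)
    obj.setName("a")

    val ds = spark.createDataset(Seq(obj))
    ds.printSchema() // columns id and name, so expressions such as ds.filter("id = 1") resolve
    spark.stop()
  }
}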

Related

Scala & Spark : java.lang.ArrayStoreException on deserialisation

I'm working in Scala & Spark to load a big (60+ GB) JSON file and process it. Since using sparksession.read.json leads to an out-of-memory exception, I've gone the RDD route.
import org.json4s._
import org.json4s.jackson.JsonMethods._
import org.json4s.jackson.Serialization.{read}
val submissions_rdd = sc.textFile("/home/user/repos/concepts/abcde/RS_2019-09")
//val columns_subset = Set("author", "title", "selftext", "score", "created_utc", "subreddit")
case class entry(title: String,
                 selftext: String,
                 score: Double,
                 created_utc: Double,
                 subreddit: String,
                 author: String)
def jsonExtractObject(jsonStr: String) = {
  implicit val formats = org.json4s.DefaultFormats
  read[entry](jsonStr)
}
Upon testing my function on a single entry, I get the desired result:
val res = jsonExtractObject(submissions_rdd.take(1)(0))
res: entry =
entry(Last Ditch Effort,​
https://preview.redd.it/9x4ld036ivj31.jpg?width=780&format=pjpg&auto=webp&s=acaed6cc0d913ec31b54235ca8bb73971bcfe598,1.0,1.567296E9,YellowOnlineUnion,Sgedelta)
The problem is, after mapping the same function over the RDD, I'm getting an error:
val subset = submissions_rdd.map(line => jsonExtractObject(line) )
subset.take(5)
org.apache.spark.SparkDriverExecutionException: Execution error
at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1485)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2236)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2177)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:775)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2120)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2139)
at org.apache.spark.rdd.RDD.$anonfun$take$1(RDD.scala:1423)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
at org.apache.spark.rdd.RDD.take(RDD.scala:1396)
... 41 elided
Caused by: java.lang.ArrayStoreException: [Lentry;
at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:75)
at org.apache.spark.SparkContext.$anonfun$runJob$4(SparkContext.scala:2120)
at org.apache.spark.SparkContext.$anonfun$runJob$4$adapted(SparkContext.scala:2120)
at org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:59)
at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1481)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2236)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2177)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
I would appreciate it if anyone has hints on how to get around that. Thanks!

intermittent exception from a spark job: Schema for type scala.collection.Map[String,String] is not supported

I keep intermittently hitting the following exception in my spark job:
java.lang.UnsupportedOperationException: Schema for type scala.collection.Map[String,String] is not supported
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:780)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:715)
at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:824)
at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:714)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1$$anonfun$apply$8.apply(ScalaReflection.scala:776)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1$$anonfun$apply$8.apply(ScalaReflection.scala:775)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:775)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:715)
at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:824)
at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:714)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:711)
at org.apache.spark.sql.functions$.udf(functions.scala:3382)
...
I then looked into the source in ScalaReflection.scala, and it looks like Map[String, String] should always be a valid type in the schemaFor(..) function, given the line at
https://github.com/apache/spark/blob/1a5e460762593c61b7ff2c5f3641d406706616ff/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L742
where it shows
case t if t <:< localTypeOf[Map[_, _]] =>
  val TypeRef(_, _, Seq(keyType, valueType)) = t
  val Schema(valueDataType, valueNullable) = schemaFor(valueType)
  Schema(MapType(schemaFor(keyType).dataType,
Does anyone know in what case Map[String, String] can miss this check and fall into the "other" case?
Also, this issue is hard to reproduce; it happens randomly, about once in thousands of runs.
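For context, here is a tiny sketch of the kind of UDF definition that exercises this schemaFor path; the function and column names are illustrative only, not taken from the job in question.

import org.apache.spark.sql.functions.{col, udf}

// Illustrative only: a UDF whose declared return type is Map[String, String].
// Defining it makes Catalyst derive a schema for the return type via
// ScalaReflection.schemaFor, the code path shown in the stack trace above.
val wrapAsMap = udf((s: String) => Map("value" -> s))

// Hypothetical usage (df and someStringCol are assumed to exist):
// df.withColumn("tags", wrapAsMap(col("someStringCol")))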

Spark dataset and scala.ScalaReflectionException: type V is not a class

I have the following classes:
case class S1(value: String, ws: Map[Int, String])
case class S2(value: String, ws: Map[Int, String], dep: BS)
As shown above, these two differ in one field, dep, which is of type BS.
The code below works fine.
sparkSQL.createDataset(Seq(S1("heloo", Map(0 -> "0")))).foreach(x => println(x))
The code below also works fine, and it uses the BS class by itself.
sparkSQL.createDataset(Seq(BS(List(0), List(Edge(0, 1, DepRelation("0-->1", "", "")))))).foreach(x => println(x))
Now if I use S2, which is basically S1 plus a BS field, I get a runtime error message:
sparkSQL.createDataset(Seq(S2("heloo", Map(0 -> "0"), BS(List(0), List(Edge(0, 1, DepRelation("0-->1", "", ""))))))).foreach(x => println(x))
Exception in thread "main" scala.ScalaReflectionException: type V is not a class
at scala.reflect.api.Symbols$SymbolApi$class.asClass(Symbols.scala:275)
at scala.reflect.internal.Symbols$SymbolContextApiImpl.asClass(Symbols.scala:84)
at org.apache.spark.sql.catalyst.ScalaReflection$.getClassFromType(ScalaReflection.scala:689)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$dataTypeFor$1.apply(ScalaReflection.scala:84)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$dataTypeFor$1.apply(ScalaReflection.scala:66)
at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:809)
at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$dataTypeFor(ScalaReflection.scala:65)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.toCatalystArray$1(ScalaReflection.scala:458)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:503)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:455)
at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:809)
at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:455)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1$$anonfun$10.apply(ScalaReflection.scala:626)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1$$anonfun$10.apply(ScalaReflection.scala:614)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:355)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:614)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:455)
at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:809)
at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:455)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1$$anonfun$10.apply(ScalaReflection.scala:626)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1$$anonfun$10.apply(ScalaReflection.scala:614)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:355)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:614)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor$1.apply(ScalaReflection.scala:455)
at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:809)
at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:455)
at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:444)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:71)
at org.apache.spark.sql.Encoders$.product(Encoders.scala:275)
at org.apache.spark.sql.LowPrioritySQLImplicits$class.newProductEncoder(SQLImplicits.scala:233)
at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:33)
And here is my env:
val conf = new SparkConf()
  .setAppName("test")
  .setMaster("local[*]")
val spark = new SparkContext(conf)
val sparkSQL = new SQLContext(spark)
import sparkSQL.implicits._
-- Edit 1 -- Edge and DepRelation definitions, per request in the comments:
case class Edge[V,E](from:V, to:V, label:E)
case class DepRelation(vl:String)
It's pretty obvious from your stack trace that it's failing with the following:
Exception in thread "main" scala.ScalaReflectionException: type V is not a class
So the main pain point here is the type parameter V, which you have used to define the following case class:
case class Edge[V,E](from:V, to:V, label:E)
What are the types of V (and E as well)? For example, are they String/Int/Double? Could you please show the code here? (Because that is where the answer to your question lies.)
If you already know the concrete types of V and E, then try rewriting your case class definition of Edge with those actual types, e.g.:
case class Edge(from: Int, to: Int, label: Int)
It might solve your problem, or you may need to dig further into your type V.
Hope this helps.
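In the same spirit, here is a minimal, self-contained sketch that uses simplified stand-ins for BS and DepRelation (not the original definitions): once Edge carries concrete, encodable field types instead of unresolved type parameters, Spark can derive a product encoder for the whole nested structure.

import org.apache.spark.sql.SparkSession

// Simplified stand-ins, not the asker's real classes:
case class DepRelation(vl: String)
case class Edge(from: Int, to: Int, label: DepRelation) // concrete types instead of [V, E]
case class BS(nodes: List[Int], edges: List[Edge])
case class S2(value: String, ws: Map[Int, String], dep: BS)

object EncodeNestedSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("test").getOrCreate()
    import spark.implicits._

    val ds = spark.createDataset(Seq(
      S2("heloo", Map(0 -> "0"), BS(List(0), List(Edge(0, 1, DepRelation("0-->1")))))
    ))
    ds.foreach(x => println(x))
    spark.stop()
  }
}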

Apache Spark 2.0: java.lang.UnsupportedOperationException: No Encoder found for java.time.LocalDate

I am using Apache Spark 2.0 and creating a case class for the schema of my Dataset. When I try to define a custom encoder according to How to store custom objects in Dataset?, for java.time.LocalDate I get the following exception:
java.lang.UnsupportedOperationException: No Encoder found for java.time.LocalDate
- field (class: "java.time.LocalDate", name: "callDate")
- root class: "FireService"
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:598)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:592)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:583)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
............
Following is my code:
case class FireService(callNumber: String, callDate: java.time.LocalDate)
implicit val localDateEncoder: org.apache.spark.sql.Encoder[java.time.LocalDate] = org.apache.spark.sql.Encoders.kryo[java.time.LocalDate]
val fireServiceDf = df.map(row => {
  val dateFormatter = java.time.format.DateTimeFormatter.ofPattern("MM/dd /yyyy")
  FireService(row.getAs[String](0), java.time.LocalDate.parse(row.getAs[String](4), dateFormatter))
})
How can we define an encoder for third-party API types in Spark?
Update
When I create the encoder for the whole case class, df.map maps the object into binary, as below:
implicit val fireServiceEncoder: org.apache.spark.sql.Encoder[FireService] = org.apache.spark.sql.Encoders.kryo[FireService]
val fireServiceDf = df.map(row => {
  val dateFormatter = java.time.format.DateTimeFormatter.ofPattern("MM/dd/yyyy")
  FireService(row.getAs[String](0), java.time.LocalDate.parse(row.getAs[String](4), dateFormatter))
})
fireServiceDf: org.apache.spark.sql.Dataset[FireService] = [value: binary]
I was expecting a Dataset of FireService, but the map returns binary.
As the last comment there says, "if class contains a field Bar you need encoder for a whole object." You need to provide an implicit Encoder for FireService itself; otherwise Spark constructs one for you using SQLImplicits.newProductEncoder[T <: Product : TypeTag]: Encoder[T]. You can see from the type that it doesn't take any implicit Encoder parameters for fields, so it can't make use of the presence of localDateEncoder.
Spark could be changed to handle this e.g. using the Shapeless library, or using macros directly; I don't know whether this is the plan in the future.
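As a hedged sketch of one commonly used workaround (not the answer's exact suggestion): keep the date in a type that Spark 2.x encodes natively, such as java.sql.Date, and convert from java.time.LocalDate at the boundary. Here df and import spark.implicits._ are assumed to be in scope, as in the question, and FireServiceRow is a hypothetical variant of the case class.

import java.sql.Date
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical variant of the case class: java.sql.Date instead of LocalDate,
// so the default product encoder yields real columns rather than a binary value.
case class FireServiceRow(callNumber: String, callDate: Date)

val dateFormatter = DateTimeFormatter.ofPattern("MM/dd/yyyy")

val fireServiceDs = df.map { row =>
  val local = LocalDate.parse(row.getAs[String](4), dateFormatter)
  FireServiceRow(row.getAs[String](0), Date.valueOf(local))
}
// fireServiceDs: org.apache.spark.sql.Dataset[FireServiceRow] = [callNumber: string, callDate: date]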

Even trivial serialization examples in Scala don't work. Why?

I am trying the simplest possible serialization examples of a class:
@serializable class Person(age:Int) {}
val fred = new Person(45)
import java.io._
val out = new ObjectOutputStream(new FileOutputStream("test.obj"))
out.writeObject(fred)
out.close()
This throws the exception "java.io.NotSerializableException: Main$$anon$1$Person" at me. Why?
Is there a simple serialization example?
I also tried
@serializable class Person(nm:String) {
  private val name:String=nm
}
val fred = new Person("Fred")
...
and tried to remove #serializable and some other permutations. The file "test.obj" is created, over 2Kb in size and has plausible contents.
EDIT:
Reading the "test.obj" back in (from the 2nd answer below) causes
Welcome to Scala version 2.10.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51).
Type in expressions to have them evaluated.
Type :help for more information.

scala> import java.io._
import java.io._

scala> val fis = new FileInputStream( "test.obj" )
fis: java.io.FileInputStream = java.io.FileInputStream@716ad1b3

scala> val oin = new ObjectInputStream( fis )
oin: java.io.ObjectInputStream = java.io.ObjectInputStream@1f927f0a

scala> val p= oin.readObject
java.io.WriteAbortedException: writing aborted; java.io.NotSerializableException: Main$$anon$1
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1354)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at .<init>(<console>:12)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:756)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:801)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:713)
at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:577)
at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:584)
at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:587)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:878)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:833)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:833)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:833)
at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:83)
at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:96)
at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala) Caused
by: java.io.NotSerializableException: Main$$anon$1
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at Main$$anon$1.<init>(a.scala:11)
at Main$.main(a.scala:1)
at Main.main(a.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.tools.nsc.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:71)
at scala.tools.nsc.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:139)
at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:71)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:139)
at scala.tools.nsc.CommonRunner$class.run(ObjectRunner.scala:28)
at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:45)
at scala.tools.nsc.CommonRunner$class.runAndCatch(ObjectRunner.scala:35)
at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:45)
at scala.tools.nsc.ScriptRunner.scala$tools$nsc$ScriptRunner$$runCompiled(ScriptRunner.scala:171)
at scala.tools.nsc.ScriptRunner$$anonfun$runScript$1.apply(ScriptRunner.scala:188)
at scala.tools.nsc.ScriptRunner$$anonfun$runScript$1.apply(ScriptRunner.scala:188)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1.apply$mcZ$sp(ScriptRunner.scala:157)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1.apply(ScriptRunner.scala:131)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1.apply(ScriptRunner.scala:131)
at scala.tools.nsc.util.package$.trackingThreads(package.scala:51)
at scala.tools.nsc.util.package$.waitingForThreads(package.scala:35)
at scala.tools.nsc.ScriptRunner.withCompiledScript(ScriptRunner.scala:130)
at scala.tools.nsc.ScriptRunner.runScript(ScriptRunner.scala:188)
at scala.tools.nsc.ScriptRunner.runScriptAndCatch(ScriptRunner.scala:201)
at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:76)
... 3 more
Note that the @serializable scaladoc says it has been deprecated since 2.9.0:
Deprecated (Since version 2.9.0) instead of @serializable class C, use class C extends Serializable
So you just have to use Serializable trait:
class Person(val age: Int) extends Serializable
This works for me (type :paste in REPL and paste these lines):
import java.io.{ObjectOutputStream, ObjectInputStream}
import java.io.{FileOutputStream, FileInputStream}
class Person(val age: Int) extends Serializable {
  override def toString = s"Person($age)"
}
val os = new ObjectOutputStream(new FileOutputStream("/tmp/example.dat"))
os.writeObject(new Person(22))
os.close()
val is = new ObjectInputStream(new FileInputStream("/tmp/example.dat"))
val obj = is.readObject()
is.close()
obj
This is the output:
// Exiting paste mode, now interpreting.
import java.io.{ObjectOutputStream, ObjectInputStream}
import java.io.{FileOutputStream, FileInputStream}
defined class Person
os: java.io.ObjectOutputStream = java.io.ObjectOutputStream@5126abfd
is: java.io.ObjectInputStream = java.io.ObjectInputStream@41e598aa
obj: Object = Person(22)
res8: Object = Person(22)
So, you can see, the [de]serialization attempt was successful.
Edit (on why you're getting NotSerializableException when you run a Scala script from a file)
I've put my code into a file and tried to run it via scala test.scala and got exactly the same error as you. Here is my speculation on why it happens.
According to the stack trace, a weird class Main$$anon$1 is not serializable. The logical question is: why is it there in the first place? We're trying to serialize Person, after all, not something weird.
Scala script is special in that it is implicitly wrapped into an object called Main. This is indicated by the stack trace:
at Main$$anon$1.<init>(test.scala:9)
at Main$.main(test.scala:1)
at Main.main(test.scala)
The names here suggest that the static method Main.main is the program entry point, and this method delegates to the Main$.main instance method (an object's class is named after the object but with $ appended). This instance method in turn tries to create an instance of the class Main$$anon$1. As far as I remember, anonymous classes are named that way.
Now, let's try to find the exact Person class name (run this as a Scala script):
class Person(val age: Int) extends Serializable {
  override def toString = s"Person($age)"
}
println(new Person(22).getClass)
This prints something I was expecting:
class Main$$anon$1$Person
This means that Person is not a top-level class; instead it is a nested class defined in the anonymous class generated by the compiler! So in fact we have something like this:
object Main {
  def main(args: Array[String]) {
    new { // this is where Main$$anon$1 is generated, and the following code is its constructor body
      class Person(val age: Int) extends Serializable { ... }
      // all other definitions
    }
  }
}
But in Scala, all nested classes are what Java would call "non-static nested" (or "inner") classes. This means that these classes always contain an implicit reference to an instance of their enclosing class. In this case, the enclosing class is Main$$anon$1. Because of that, when the Java serializer tries to serialize Person, it transitively encounters an instance of Main$$anon$1 and tries to serialize it, but since it is not Serializable, the process fails. BTW, serializing non-static inner classes is notoriously problematic in the Java world and is known to cause issues exactly like this one.
As for why it works in the REPL, it seems that classes declared in the REPL somehow do not end up as inner classes, so they don't have any implicit fields. Hence serialization works normally for them.
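To make the mechanism concrete, here is a small self-contained sketch (not from the answer above) that reproduces the same failure mode: an inner class carries a hidden reference to its enclosing instance, so serializing the inner object drags in the non-serializable outer one.

import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

class Outer { // Outer is NOT Serializable
  class Person(val age: Int) extends Serializable
}

object InnerClassSerializationDemo extends App {
  val outer = new Outer
  val p = new outer.Person(45) // p holds an implicit reference to `outer`
  val out = new ObjectOutputStream(new ByteArrayOutputStream())
  try out.writeObject(p)
  catch {
    case e: NotSerializableException => println(s"Failed as expected: $e")
  } finally out.close()
}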
You could use the Serializable Trait:
Trivial Serialization example using Java Serialization with the Serializable Trait:
case class Person(age: Int) extends Serializable
Usage:
Serialization, Write Object
val fos = new FileOutputStream( "person.serializedObject" )
val o = new ObjectOutputStream( fos )
o writeObject Person(31)
Deserialization, Read Object
val fis = new FileInputStream( "person.serializedObject" )
val oin = new ObjectInputStream( fis )
val p= oin.readObject
Which creates the following output:
fis: java.io.FileInputStream = java.io.FileInputStream@43a2bc95
oin: java.io.ObjectInputStream = java.io.ObjectInputStream@710afce3
p: Object = Person(31)
As you see, the deserialization can't infer the object type itself, which is a clear drawback.
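If you need the static type back, a small follow-up sketch (not part of the original answer) is to cast the result of readObject; the file name below reuses the one from the snippet above.

import java.io.{FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

case class Person(age: Int) extends Serializable

val out = new ObjectOutputStream(new FileOutputStream("person.serializedObject"))
out.writeObject(Person(31))
out.close()

val in = new ObjectInputStream(new FileInputStream("person.serializedObject"))
val p: Person = in.readObject().asInstanceOf[Person] // the cast restores the static type
in.close()
println(p.age) // 31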
Serialization with Scala-Pickling
https://github.com/scala/pickling, or part of the standard distribution starting with Scala 2.11.
In the example code the object is not written to a file, and JSON is used instead of bytecode serialization, which avoids certain problems originating in bytecode incompatibilities between different Scala versions.
import scala.pickling._
import json._
case class Person(age: Int)
val person = Person(31)
val pickledPerson = person.pickle
val unpickledPerson = pickledPerson.unpickle[Person]
class Person(age:Int) {} is equivalent to the Java code:
class Person {
    Person(int age) {}
}
which is probably not what you want. Note that the parameter age is simply discarded and Person has no member fields.
You want either:
@serializable case class Person(age:Int)
@serializable class Person(val age:Int)
You can leave out the empty curly brackets at the end. In fact, it's encouraged.