Generating recursive structures in ScalaCheck

I'm trying to make a generator for a recursive datatype called Row. A row is a list of named Vals, where a Val is either an atomic Bin or else a nested Row.
This is my code:
package com.dtci.data.anonymize.parquet
import java.nio.charset.StandardCharsets
import org.scalacheck.Gen
object TestApp extends App {
sealed trait Val
case class Bin(bytes: Array[Byte]) extends Val
object Bin {
def from_string(str: String): Bin = Bin(str.getBytes(StandardCharsets.UTF_8))
}
case class Row(flds: List[(String, Val)]) extends Val
val gen_bin = Gen.alphaStr.map(Bin.from_string)
val gen_field_name = Gen.alphaLowerStr
val gen_field = Gen.zip(gen_field_name, gen_val)
val gen_row = Gen.nonEmptyListOf(gen_field).map(Row.apply)
def gen_val: Gen[Val] = Gen.oneOf(gen_bin, gen_row)
gen_row.sample.get.flds.foreach( fld => println(s"${fld._1} --> ${fld._2}"))
}
It crashes with the following stack trace:
Exception in thread "main" java.lang.NullPointerException
at org.scalacheck.Gen.$anonfun$flatMap$2(Gen.scala:84)
at org.scalacheck.Gen$R.flatMap(Gen.scala:243)
at org.scalacheck.Gen$R.flatMap$(Gen.scala:240)
at org.scalacheck.Gen$R$$anon$3.flatMap(Gen.scala:228)
at org.scalacheck.Gen.$anonfun$flatMap$1(Gen.scala:84)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
at org.scalacheck.Gen$$anon$1.$anonfun$doApply$1(Gen.scala:110)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$1.doApply(Gen.scala:109)
at org.scalacheck.Gen.$anonfun$map$1(Gen.scala:79)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
at org.scalacheck.Gen.$anonfun$flatMap$2(Gen.scala:84)
at org.scalacheck.Gen$R.flatMap(Gen.scala:243)
at org.scalacheck.Gen$R.flatMap$(Gen.scala:240)
at org.scalacheck.Gen$R$$anon$3.flatMap(Gen.scala:228)
at org.scalacheck.Gen.$anonfun$flatMap$1(Gen.scala:84)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
at org.scalacheck.Gen$$anon$1.$anonfun$doApply$1(Gen.scala:110)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$1.doApply(Gen.scala:109)
at org.scalacheck.Gen$.$anonfun$sequence$2(Gen.scala:492)
at scala.collection.LinearSeqOps.foldLeft(LinearSeq.scala:168)
at scala.collection.LinearSeqOps.foldLeft$(LinearSeq.scala:164)
at scala.collection.immutable.List.foldLeft(List.scala:79)
at org.scalacheck.Gen$.$anonfun$sequence$1(Gen.scala:490)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
at org.scalacheck.Gen.$anonfun$map$1(Gen.scala:79)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
at org.scalacheck.Gen$$anon$1.$anonfun$doApply$1(Gen.scala:110)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$1.doApply(Gen.scala:109)
at org.scalacheck.Gen.$anonfun$flatMap$2(Gen.scala:84)
at org.scalacheck.Gen$R.flatMap(Gen.scala:243)
at org.scalacheck.Gen$R.flatMap$(Gen.scala:240)
at org.scalacheck.Gen$R$$anon$3.flatMap(Gen.scala:228)
at org.scalacheck.Gen.$anonfun$flatMap$1(Gen.scala:84)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
at org.scalacheck.Gen$.$anonfun$sized$1(Gen.scala:551)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
at org.scalacheck.Gen$$anon$1.$anonfun$doApply$1(Gen.scala:110)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$1.doApply(Gen.scala:109)
at org.scalacheck.Gen.$anonfun$map$1(Gen.scala:79)
at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
at org.scalacheck.Gen.sample(Gen.scala:154)
What's wrong with my code, and what would have been the best way for me to diagnose it myself?
As a note, I've seen the remarks about Gen.oneOf being strict and needing Gen.lzy for recursive structures. But if, in my code, I wrap the definition of gen_val inside of Gen.lzy(...) then I get a stack overflow rather than the current null pointer exception.

First of all, be careful using object Main extends App. I find its field-initialization semantics less obvious than a plain old main with line-after-line semantics:
object Main {
def main(args: Array[String]): Unit = {...}
}
This is most likely the cause of the NullPointerException.
Usually it can be fixed by carefully checking the field initialization order and marking some (or all) of the vals as lazy.
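For illustration, here is a hedged sketch of that kind of restructuring applied to the generators from the question: a plain main plus lazy vals and Gen.lzy, so no field is read before it is initialized. It removes the NullPointerException, but sampling it would still risk a StackOverflowError, because nothing bounds the nesting depth yet:
object TestApp {
  // Val, Bin and Row defined exactly as in the question
  def main(args: Array[String]): Unit = {
    lazy val gen_bin = Gen.alphaStr.map(Bin.from_string)
    lazy val gen_field_name = Gen.alphaLowerStr
    def gen_val: Gen[Val] = Gen.oneOf(gen_bin, Gen.lzy(gen_row))
    lazy val gen_field = Gen.zip(gen_field_name, gen_val)
    lazy val gen_row: Gen[Row] = Gen.nonEmptyListOf(gen_field).map(Row.apply)
    // sampling gen_row here would still recurse without bound -- see below
  }
}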
The StackOverflowError arises because the generated data structure is too deeply nested.
Generally, when you are dealing with any kind of recursion, always consider the base case where the recursion should stop and the step that will eventually hit that base case.
In your particular case we can use Gen.sized and Gen.resize, which control how "big" the generated elements are (check out the docs for more information).
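To see why halving the size guarantees termination, here is a small self-contained sketch (mine, not part of the original answer) that counts how many nesting levels a given size allows; size / 2 reaches 0 after finitely many steps, at which point the base case is taken:
import org.scalacheck.Gen

// depth-bounded recursion: every level halves the size, so it must bottom out
def gen_depth: Gen[Int] = Gen.sized { size =>
  if (size <= 0) Gen.const(0)                      // base case
  else Gen.resize(size / 2, gen_depth).map(_ + 1)  // recursive step with a smaller size
}
Applying the same idea to the generators from the question: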
package com.dtci.data.anonymize.parquet
import java.nio.charset.StandardCharsets
import org.scalacheck.Gen
object Main extends App {
sealed trait Val
case class Bin(bytes: Array[Byte]) extends Val
object Bin {
def from_string(str: String): Bin = Bin(str.getBytes(StandardCharsets.UTF_8))
}
case class Row(flds: List[(String, Val)]) extends Val
val gen_bin = Gen.alphaStr.map(Bin.from_string)
val gen_field_name = Gen.alphaLowerStr
val gen_field = Gen.zip(gen_field_name, gen_val)
val gen_row = Gen.sized(size => Gen.resize(size / 2, Gen.nonEmptyListOf(gen_field).map(Row.apply)))
def gen_val: Gen[Val] = Gen.sized { size =>
if (size <= 0) {
gen_bin
} else {
Gen.oneOf(gen_bin, gen_row)
}
}
gen_row.sample.get.flds.foreach(fld => println(s"${fld._1} --> ${fld._2}"))
}

Related

java.lang.NoSuchMethodException: <Class>.<init>(java.lang.String) when copying custom Transformer

Currently playing with custom transformers in my spark-shell using both Spark 2.0.1 and 2.2.1.
While writing a custom ml transformer, in order to add it to a pipeline, I noticed that there is an issue with the override of the copy method.
The copy method is called by the fit method of the TrainValidationSplit in my case.
The error I get :
java.lang.NoSuchMethodException: Custom.<init>(java.lang.String)
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getConstructor(Class.java:1825)
at org.apache.spark.ml.param.Params$class.defaultCopy(params.scala:718)
at org.apache.spark.ml.PipelineStage.defaultCopy(Pipeline.scala:42)
at Custom.copy(<console>:16)
... 48 elided
I then tried to directly call the copy method but I still get the same error.
Here is my class and the call I perform:
import org.apache.spark.ml.Transformer
import org.apache.spark.sql.{Dataset, DataFrame}
import org.apache.spark.sql.types.{StructField, StructType, DataTypes}
import org.apache.spark.ml.param.{Param, ParamMap}
// Simple DF
val doubles = Seq((0, 5d, 100d), (1, 4d,500d), (2, 9d,700d)).toDF("id", "rating","views")
class Custom(override val uid: String) extends org.apache.spark.ml.Transformer {
def this() = this(org.apache.spark.ml.util.Identifiable.randomUID("custom"))
def copy(extra: org.apache.spark.ml.param.ParamMap): Custom = {
defaultCopy(extra)
}
override def transformSchema(schema: org.apache.spark.sql.types.StructType): org.apache.spark.sql.types.StructType = {
schema.add(org.apache.spark.sql.types.StructField("trending", org.apache.spark.sql.types.IntegerType, false))
}
def transform(df: org.apache.spark.sql.Dataset[_]): org.apache.spark.sql.DataFrame = {
df.withColumn("trending", (df.col("rating") > 4 && df.col("views") > 40))
}
}
val mycustom = new Custom("Custom")
// This call throws the exception.
mycustom.copy(new org.apache.spark.ml.param.ParamMap())
Does anyone know if this is a known issue? I can't seem to find it anywhere.
Is there another way to implement the copy method in a custom transformer ?
Thanks
These are a couple of things that I would change about your custom Transformer (also to enable SerDe operations of your PipelineModel):
Implement the DefaultParamsWritable trait
Add a Companion object that extends the DefaultParamsReadable Interface
e.g.
class Custom(override val uid: String) extends Transformer
with DefaultParamsWritable {
...
...
}
object Custom extends DefaultParamsReadable[Custom]
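Putting both suggestions together, here is a hedged sketch of what the transformer from the question could look like (the BooleanType for the trending column is my adjustment so that transformSchema matches what transform actually produces):
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.types.{BooleanType, StructField, StructType}

class Custom(override val uid: String) extends Transformer with DefaultParamsWritable {
  def this() = this(Identifiable.randomUID("custom"))

  // defaultCopy relies on the single-String-argument constructor above
  override def copy(extra: ParamMap): Custom = defaultCopy(extra)

  override def transformSchema(schema: StructType): StructType =
    schema.add(StructField("trending", BooleanType, nullable = false))

  override def transform(df: Dataset[_]): DataFrame =
    df.withColumn("trending", df.col("rating") > 4 && df.col("views") > 40)
}

// the companion enables Custom.load(path), so the stage survives PipelineModel save/load
object Custom extends DefaultParamsReadable[Custom]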
Do take a look at UnaryTransformer if you have only one input/output column.
Finally, what's the need to call mycustom.copy(new ParamMap()) exactly??

Class 'SessionTrigger' must either be declared abstract or implement abstract member

I am building a tutorial on Flink 1.2 and I want to run some simple windowing examples. One of them is Session Windows.
The code I want to run is the following:
import <package>.Session
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows
import org.apache.flink.streaming.api.windowing.triggers.PurgingTrigger
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow
import scala.util.Try
object SessionWindowExample {
def main(args: Array[String]) {
val env = StreamExecutionEnvironment.getExecutionEnvironment
val source = env.socketTextStream("localhost", 9000)
//session map
val values = source.map(value => {
val columns = value.split(",")
val endSignal = Try(Some(columns(2))).getOrElse(None)
Session(columns(0), columns(1).toDouble, endSignal)
})
val keyValue = values.keyBy(_.sessionId)
// create global window
val sessionWindowStream = keyValue.
window(GlobalWindows.create()).
trigger(PurgingTrigger.of(new SessionTrigger[GlobalWindow]()))
sessionWindowStream.sum("value").print()
env.execute()
}
}
As you'll notice, I need to instantiate a new SessionTrigger object, which I do based on this class:
import <package>.Session
import org.apache.flink.streaming.api.windowing.triggers.Trigger.TriggerContext
import org.apache.flink.streaming.api.windowing.triggers.{Trigger, TriggerResult}
import org.apache.flink.streaming.api.windowing.windows.Window
class SessionTrigger[W <: Window] extends Trigger[Session,W] {
override def onElement(element: Session, timestamp: Long, window: W, ctx: TriggerContext): TriggerResult = {
if(element.endSignal.isDefined) TriggerResult.FIRE
else TriggerResult.CONTINUE
}
override def onProcessingTime(time: Long, window: W, ctx: TriggerContext): TriggerResult = {
TriggerResult.CONTINUE
}
override def onEventTime(time: Long, window: W, ctx: TriggerContext): TriggerResult = {
TriggerResult.CONTINUE
}
}
However, IntelliJ keeps complaining that:
Class 'SessionTrigger' must either be declared abstract or implement abstract member 'clear(window: W, ctx: TriggerContext):void' in 'org.apache.flink.streaming.api.windowing.triggers.Trigger'.
I tried adding this in the class:
override def clear(window: W, ctx: TriggerContext): Unit = ctx.deleteEventTimeTimer(4)
but it is not working. This is the error I am getting:
03/27/2017 15:48:38 TriggerWindow(GlobalWindows(), ReducingStateDescriptor{serializer=co.uk.DRUK.flink.windowing.SessionWindowExample.SessionWindowExample$$anon$2$$anon$1@1aec64d0, reduceFunction=org.apache.flink.streaming.api.functions.aggregation.SumAggregator@1a052a00}, PurgingTrigger(co.uk.DRUK.flink.windowing.SessionWindowExample.SessionTrigger@f2f2cc1), WindowedStream.reduce(WindowedStream.java:276)) -> Sink: Unnamed(4/4) switched to CANCELED
03/27/2017 15:48:38 Job execution switched to status FAILED.
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:900)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at co.uk.DRUK.flink.windowing.SessionWindowExample.SessionWindowExample$$anonfun$1.apply(SessionWindowExample.scala:27)
at co.uk.DRUK.flink.windowing.SessionWindowExample.SessionWindowExample$$anonfun$1.apply(SessionWindowExample.scala:24)
at org.apache.flink.streaming.api.scala.DataStream$$anon$4.map(DataStream.scala:521)
at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:38)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:185)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:272)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:655)
at java.lang.Thread.run(Thread.java:745)
Process finished with exit code 1
Anybody know why?
Well, the exception clearly says:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at co.uk.DRUK.flink.windowing.SessionWindowExample.SessionWindowExample$$anonfun$1.apply(SessionWindowExample.scala:27)
at co.uk.DRUK.flink.windowing.SessionWindowExample.SessionWindowExample$$anonfun$1.apply(SessionWindowExample.scala:24)
which obviously maps to the following line of code
Session(columns(0), columns(1).toDouble, endSignal)
So the next obvious thing is to log your columns and value after
val columns = value.split(",")
I suspect that value simply doesn't contain a second comma-separated column, at least for some of the inputs.
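If some lines really can be short, one option is to guard the parsing instead of indexing blindly. A hedged sketch, assuming the Session case class from the question takes (sessionId: String, value: Double, endSignal: Option[String]):
// parse one line defensively; malformed lines are dropped instead of crashing the job
def parseLine(line: String): Option[Session] = {
  val columns = line.split(",", -1)
  if (columns.length < 2) None // no second column to read
  else {
    val endSignal = if (columns.length > 2) Some(columns(2)) else None
    Some(Session(columns(0), columns(1).toDouble, endSignal))
  }
}

// in the job, flatMap keeps only the successfully parsed lines
val values = source.flatMap(line => parseLine(line).toList)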

Code working in Spark-Shell not in eclipse

I have a small piece of Scala code which works properly in the Spark shell but not in Eclipse with the Scala plugin. I can access HDFS using the plugin; I tried writing another file and it worked.
FirstSpark.scala
package bigdata.spark
import org.apache.spark.SparkConf
import java.io._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
object FirstSpark {
def main(args: Array[String])={
val conf = new SparkConf().setMaster("local").setAppName("FirstSparkProgram")
val sparkcontext = new SparkContext(conf)
val textFile =sparkcontext.textFile("hdfs://pranay:8020/spark/linkage")
val m = new Methods()
val q =textFile.filter(x => !m.isHeader(x)).map(x=> m.parse(x))
q.saveAsTextFile("hdfs://pranay:8020/output") }
}
Methods.scala
package bigdata.spark
import java.util.function.ToDoubleFunction
class Methods {
def isHeader(s:String):Boolean={
s.contains("id_1")
}
def parse(line:String) ={
val pieces = line.split(',')
val id1=pieces(0).toInt
val id2=pieces(1).toInt
val matches=pieces(11).toBoolean
val mapArray=pieces.slice(2, 11).map(toDouble)
MatchData(id1,id2,mapArray,matches)
}
def toDouble(s: String) = {
if ("?".equals(s)) Double.NaN else s.toDouble
}
}
case class MatchData(id1: Int, id2: Int,
scores: Array[Double], matched: Boolean)
Error Message:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2032)
at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:335)
at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:334)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
Can anyone please help me with this?
Try changing class Methods { .. } to object Methods { .. }.
I think the problem is at val q = textFile.filter(x => !m.isHeader(x)).map(x => m.parse(x)). When Spark sees the filter and map functions, it tries to serialize the functions passed to them (x => !m.isHeader(x) and x => m.parse(x)) so that it can dispatch the work of executing them to all of the executors (this is the Task referred to). To do that it needs to serialize m, since that object is referenced inside the functions (it is in the closure of the two anonymous functions), but it cannot, because Methods is not serializable. You could add extends Serializable to the Methods class, but in this case an object is more appropriate (and is already Serializable).
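For reference, a sketch of the suggested change, with the same logic as in the question just moved into an object so that no Methods instance has to be captured and serialized:
package bigdata.spark

object Methods {
  def isHeader(s: String): Boolean = s.contains("id_1")

  def parse(line: String): MatchData = {
    val pieces = line.split(',')
    val id1 = pieces(0).toInt
    val id2 = pieces(1).toInt
    val matches = pieces(11).toBoolean
    val scores = pieces.slice(2, 11).map(toDouble)
    MatchData(id1, id2, scores, matches)
  }

  def toDouble(s: String): Double =
    if ("?".equals(s)) Double.NaN else s.toDouble
}

// usage in FirstSpark stays almost the same:
// val q = textFile.filter(x => !Methods.isHeader(x)).map(Methods.parse)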

Spark: Task not serializable (Broadcast/RDD/SparkContext)

There are numerous questions about "Task not serializable" in Spark. However, this case seems quite particular.
I have created a class:
class Neighbours(e: RDD[E], m: KMeansModel) extends Serializable {
val allEs: RDD[(String, E)] = e.map(e => (e.w, e))
.persist()
val sc = allEs.sparkContext
val centroids = sc.broadcast(m.clusterCenters)
[...]
The class defines the following method:
private def centroidDistances(v: Vector): Array[Double] = {
centroids.value.map(c => (centroids.value.indexOf(c), Vectors.sqdist(v, c)))
.sortBy(_._1)
.map(_._2)
}
However, when the class is called, a Task is not serializable exception is thrown.
Strangely enough, a tiny change in the header of class Neighbours suffices to fix the issue. Instead of creating a val sc: SparkContext to use for broadcasting, I merely inline the code that obtains the Spark context:
class Neighbours(e: RDD[E], m: KMeansModel) extends Serializable {
val allEmbeddings: RDD[(String, E)] = e.map(e => (e.w, e))
.setName("embeddings")
.persist()
val centroids = allEmbeddings.sparkContext.broadcast(m.clusterCenters)
[...]
My question is: how does the second variant make a difference? What goes wrong in the first one? Intuitively, this should be merely syntactic sugar, is this a bug in Spark?
I use Spark 1.4.1 on a Hadoop/Yarn cluster.
When you define
class Neighbours(e: RDD[E], m: KMeansModel) extends Serializable {
...
val sc = allEmbeddings.sparkContext
val centroids = sc.broadcast(m.clusterCenters)
...
}
You have made sc into a class variable, meaning it could be accessed from an instance of Neighbours, e.g. neighbours.sc. This means that sc needs to be serializable, which it is not.
When you inline the code, only the final value of centroids needs to be serializable. centroids is of type Broadcast, which is Serializable.
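For reference, a hedged sketch of the two usual ways to write this so that only the Broadcast value ends up as a field (E stands for the element type from the question and is assumed to expose a field w: String):
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

class Neighbours(e: RDD[E], m: KMeansModel) extends Serializable {
  val allEs: RDD[(String, E)] = e.map(x => (x.w, x)).persist()

  // Option 1: inline, so the SparkContext is never stored as a field
  val centroids: Broadcast[Array[Vector]] = allEs.sparkContext.broadcast(m.clusterCenters)

  // Option 2 (equivalent): keep the context, but exclude it from serialization
  // @transient private val sc = allEs.sparkContext
  // val centroids = sc.broadcast(m.clusterCenters)

  private def centroidDistances(v: Vector): Array[Double] =
    centroids.value.map(c => (centroids.value.indexOf(c), Vectors.sqdist(v, c)))
      .sortBy(_._1)
      .map(_._2)
}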

Even trivial serialization examples in Scala don't work. Why?

I am trying the simplest possible serialization examples of a class:
@serializable class Person(age:Int) {}
val fred = new Person(45)
import java.io._
val out = new ObjectOutputStream(new FileOutputStream("test.obj"))
out.writeObject(fred)
out.close()
This throws "java.io.NotSerializableException: Main$$anon$1$Person". Why?
Is there a simple serialization example?
I also tried
@serializable class Person(nm:String) {
private val name:String=nm
}
val fred = new Person("Fred")
...
and tried to remove @serializable and some other permutations. The file "test.obj" is created, over 2Kb in size and has plausible contents.
EDIT:
Reading the "test.obj" back in (from the 2nd answer below) causes
Welcome to Scala version 2.10.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51).
Type in expressions to have them evaluated. Type :help for more information.
scala> import java.io._
import java.io._
scala> val fis = new FileInputStream( "test.obj" )
fis: java.io.FileInputStream = java.io.FileInputStream@716ad1b3
scala> val oin = new ObjectInputStream( fis )
oin: java.io.ObjectInputStream = java.io.ObjectInputStream@1f927f0a
scala> val p = oin.readObject
java.io.WriteAbortedException: writing aborted; java.io.NotSerializableException: Main$$anon$1
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1354)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at .<init>(<console>:12)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:756)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:801)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:713)
at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:577)
at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:584)
at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:587)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:878)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:833)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:833)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:833)
at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:83)
at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:96)
at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala) Caused
by: java.io.NotSerializableException: Main$$anon$1
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at Main$$anon$1.<init>(a.scala:11)
at Main$.main(a.scala:1)
at Main.main(a.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.tools.nsc.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:71)
at scala.tools.nsc.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:139)
at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:71)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:139)
at scala.tools.nsc.CommonRunner$class.run(ObjectRunner.scala:28)
at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:45)
at scala.tools.nsc.CommonRunner$class.runAndCatch(ObjectRunner.scala:35)
at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:45)
at scala.tools.nsc.ScriptRunner.scala$tools$nsc$ScriptRunner$$runCompiled(ScriptRunner.scala:171)
at scala.tools.nsc.ScriptRunner$$anonfun$runScript$1.apply(ScriptRunner.scala:188)
at scala.tools.nsc.ScriptRunner$$anonfun$runScript$1.apply(ScriptRunner.scala:188)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1.apply$mcZ$sp(ScriptRunner.scala:157)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1.apply(ScriptRunner.scala:131)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1.apply(ScriptRunner.scala:131)
at scala.tools.nsc.util.package$.trackingThreads(package.scala:51)
at scala.tools.nsc.util.package$.waitingForThreads(package.scala:35)
at scala.tools.nsc.ScriptRunner.withCompiledScript(ScriptRunner.scala:130)
at scala.tools.nsc.ScriptRunner.runScript(ScriptRunner.scala:188)
at scala.tools.nsc.ScriptRunner.runScriptAndCatch(ScriptRunner.scala:201)
at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:76)
... 3 more
Note that the @serializable scaladoc says it has been deprecated since 2.9.0:
Deprecated (Since version 2.9.0) instead of @serializable class C, use class C extends Serializable
So you just have to use Serializable trait:
class Person(val age: Int) extends Serializable
This works for me (type :paste in REPL and paste these lines):
import java.io.{ObjectOutputStream, ObjectInputStream}
import java.io.{FileOutputStream, FileInputStream}
class Person(val age: Int) extends Serializable {
override def toString = s"Person($age)"
}
val os = new ObjectOutputStream(new FileOutputStream("/tmp/example.dat"))
os.writeObject(new Person(22))
os.close()
val is = new ObjectInputStream(new FileInputStream("/tmp/example.dat"))
val obj = is.readObject()
is.close()
obj
This is the output:
// Exiting paste mode, now interpreting.
import java.io.{ObjectOutputStream, ObjectInputStream}
import java.io.{FileOutputStream, FileInputStream}
defined class Person
os: java.io.ObjectOutputStream = java.io.ObjectOutputStream@5126abfd
is: java.io.ObjectInputStream = java.io.ObjectInputStream@41e598aa
obj: Object = Person(22)
res8: Object = Person(22)
So, you can see, the [de]serialization attempt was successful.
Edit (on why you're getting NotSerializableException when you run Scala script from file)
I've put my code into a file and tried to run it via scala test.scala and got exactly the same error as you. Here is my speculation on why it happens.
According to the stack trace, a weird class Main$$anon$1 is not serializable. The logical question is: why is it there in the first place? We're trying to serialize Person, after all, not something weird.
Scala script is special in that it is implicitly wrapped into an object called Main. This is indicated by the stack trace:
at Main$$anon$1.<init>(test.scala:9)
at Main$.main(test.scala:1)
at Main.main(test.scala)
The names here suggest that the static method Main.main is the program entry point, and that it delegates to the Main$.main instance method (an object's class is named after the object, with $ appended). This instance method in turn tries to create an instance of a class Main$$anon$1. As far as I remember, anonymous classes are named that way.
Now, let's try to find exact Person class name (run this as Scala script):
class Person(val age: Int) extends Serializable {
override def toString = s"Person($age)"
}
println(new Person(22).getClass)
This prints something I was expecting:
class Main$$anon$1$Person
This means that Person is not a top-level class; instead it is a nested class defined in the anonymous class generated by the compiler! So in fact we have something like this:
object Main {
def main(args: Array[String]) {
new { // this is where Main$$anon$1 is generated, and the following code is its constructor body
class Person(val age: Int) extends Serializable { ... }
// all other definitions
}
}
}
But nested classes in Scala are what Java calls "non-static nested" (or "inner") classes. This means that these classes always contain an implicit reference to an instance of their enclosing class. In this case, the enclosing class is Main$$anon$1. Because of that, when the Java serializer tries to serialize Person, it transitively encounters an instance of Main$$anon$1 and tries to serialize it, but since it is not Serializable, the process fails. By the way, serializing non-static inner classes is notoriously problematic in the Java world and is known to cause issues exactly like this one.
As for why it works in the REPL: it seems that classes declared in the REPL somehow do not end up as inner ones, so they don't have any implicit fields. Hence serialization works normally for them.
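A corollary: if you compile a regular program instead of running a script, Person becomes a genuine top-level class with no hidden outer reference, and serialization works. A minimal sketch of that setup (my example, not the answerer's):
import java.io.{FileOutputStream, ObjectOutputStream}

// top-level class: no implicit reference to any enclosing instance
class Person(val age: Int) extends Serializable

object Main {
  def main(args: Array[String]): Unit = {
    val out = new ObjectOutputStream(new FileOutputStream("test.obj"))
    out.writeObject(new Person(45)) // succeeds: only Person itself is serialized
    out.close()
  }
}
Compile it with scalac and run it with scala Main, rather than executing the file as a script.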
You could use the Serializable trait.
A trivial serialization example using Java serialization with the Serializable trait:
case class Person(age: Int) extends Serializable
Usage:
Serialization, Write Object
val fos = new FileOutputStream( "person.serializedObject" )
val o = new ObjectOutputStream( fos )
o writeObject Person(31)
Deserialization, Read Object
val fis = new FileInputStream( "person.serializedObject" )
val oin = new ObjectInputStream( fis )
val p= oin.readObject
Which creates the following output:
fis: java.io.FileInputStream = java.io.FileInputStream@43a2bc95
oin: java.io.ObjectInputStream = java.io.ObjectInputStream@710afce3
p: Object = Person(31)
As you can see, deserialization can't infer the object's type by itself (readObject returns Object), which is a clear drawback.
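The usual workaround is to cast the result back to the expected type, e.g. (a small sketch building on the snippet above):
val person = oin.readObject().asInstanceOf[Person]
println(person.age) // usable as a Person again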
Serialization with Scala-Pickling
https://github.com/scala/pickling or part of the Standard-Distribution starting with Scala 2.11
In the example code the object is not written to a file, and JSON is used instead of bytecode serialization, which avoids certain problems originating in bytecode incompatibilities between different Scala versions.
import scala.pickling._
import json._
case class Person(age: Int)
val person = Person(31)
val pickledPerson = person.pickle
val unpickledPerson = pickledPerson.unpickle[Person]
class Person(age:Int) {} is equivalent to the Java code:
class Person {
Person(int age) {}
}
which is probably not what you want. Note that the parameter age is simply discarded and Person has no member fields.
You want either:
@serializable case class Person(age:Int)
@serializable class Person(val age:Int)
You can leave out the empty curly brackets at the end. In fact, it's encouraged.
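A quick way to see the difference between the three forms (my illustration, not from the answer):
class A(x: Int)        // x is only a constructor parameter, not a member
class B(val x: Int)    // x becomes a field with a public getter
case class C(x: Int)   // x becomes a field, plus equals/hashCode/toString/copy

println(new B(1).x)    // compiles
println(C(1).x)        // compiles
// println(new A(1).x) // does not compile: value x is not a member of A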