Scala Case Class serialization - scala

I've been trying to binary serialize a composite case class object that kept throwing a weird exception. I don't really understand what is wrong with this example which throws the following exception. I used to get that exception for circular references which is not the case here. Some hints please?
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field com.Table.rows of type scala.collection.immutable.List in instance of com.Table
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field com.Table.rows of type scala.collection.immutable.List in instance of com.Table
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at com.TestSeri$.serializeBinDeserialise(TestSeri.scala:37)
at com.TestSeri$.main(TestSeri.scala:22)
at com.TestSeri.main(TestSeri.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
Here is the code
import java.io._
import scalax.file.Path
case class Row(name: String)
case class Table(rows: List[Row])
case class Cont(docs: Map[String, Table])
case object TestSeri {
def main(args: Array[String]) {
val cc = Cont(docs = List(
"1" -> Table(rows = List(Row("r1"), Row("r2"))),
"2" -> Table(rows = List(Row("r301"), Row("r31"), Row("r32")))
).toMap)
val tt = Table(rows = List(Row("r1"), Row("r2")))
val ttdes = serializeBinDeserialize(tt)
println(ttdes == tt)
val ccdes = serializeBinDeserialize(cc)
println(ccdes == cc)
}
def serializeBinDeserialize[T](payload: T): T = {
val bos = new ByteArrayOutputStream()
val out = new ObjectOutputStream(bos)
out.writeObject(payload)
val bis = new ByteArrayInputStream(bos.toByteArray)
val in = new ObjectInputStream(bis)
in.readObject().asInstanceOf[T]
}
}

Replacing List with Array which is immutable too, fixed the problem.
In my original problem I had a Map which I replaced with TreeMap.
I think is likely to be related to the proxy pattern implementation in generic immutable List and Map mentioned here:
https://issues.scala-lang.org/browse/SI-9237.
Can't believe I wasted a full day on this.

Related

Scala error: Exception in thread "main" org.apache.spark.SparkException: Task not serializable

I got not serializable error when running this code:
import org.apache.spark.{SparkContext, SparkConf}
import scala.collection.mutable.ArrayBuffer
object Task1 {
def findHighestRatingUsers(movieRating: String): (String) = {
val tokens = movieRating.split(",", -1)
val movieTitle = tokens(0)
val ratings = tokens.slice(1, tokens.size)
val maxRating = ratings.max
var userIds = ArrayBuffer[Int]()
for(i <- 0 until ratings.length){
if (ratings(i) == maxRating) {
userIds += (i+1)
}
}
return movieTitle + "," + userIds.mkString(",")
return movieTitle
}
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("Task 1")
val sc = new SparkContext(conf)
val Lines = sc.textFile(args(0))
val TitleAndMaxUserIds = Lines.map(findHighestRatingUsers)
.saveAsTextFile(args(1))
}
}
The error occurs at line:
val TitleAndMaxUserIds =Lines.map(findHighestRatingUsers)
.saveAsTextFile(args(1))
I believe this is due to something in function 'findHighestRatingUsers'. Could somebody explain why and how to fix it?
More info in the exception is like:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2362)
at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:396)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
at org.apache.spark.rdd.RDD.map(RDD.scala:395)
at Task1$.main(Task1.scala:63)
at Task1.main(Task1.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.NotSerializableException: Task1$
Serialization stack:
- object not serializable (class: Task1$, value: Task1$#3c770db4)
- element of array (index: 0)
- array (class [Ljava.lang.Object;, size 1)
- field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
- object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class Task1$, functionalInterfaceMethod=scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic Task1$.$anonfun$main$1:(LTask1$;Ljava/lang/String;)Ljava/lang/String;, instantiatedMethodType=(Ljava/lang/String;)Ljava/lang/String;, numCaptured=1])
- writeReplace data (class: java.lang.invoke.SerializedLambda)
- object (class Task1$$$Lambda$1023/20408451, Task1$$$Lambda$1023/20408451#4f59a516)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:413)
... 22 more
I checked this post
Difference between object and class in Scala and tried to use object to enclose the function:
import org.apache.spark.{SparkContext, SparkConf}
import scala.collection.mutable.ArrayBuffer
object Function{
def _findHighestRatingUsers(movieRating: String): (String) = {
val tokens = movieRating.split(",", -1)
val movieTitle = tokens(0)
val ratings = tokens.slice(1, tokens.size)
val maxRating = ratings.max
var userIds = ArrayBuffer[Int]()
for(i <- 0 until ratings.length){
if (ratings(i) == maxRating) {
userIds += (i+1)
}
}
return movieTitle + "," + userIds.mkString(",")
}
}
object Task1 {
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("Task 1")
val sc = new SparkContext(conf)
val textFile = sc.textFile(args(0))
val output = textFile.map(Function._findHighestRatingUsers)
.saveAsTextFile(args(1))
}
}
But still got exception With a huge amount of errors...
This time I tried to put object Function in the object task1 like this:
import org.apache.spark.{SparkContext, SparkConf}
import scala.collection.mutable.ArrayBuffer
object Task1 {
object Function{
def _findHighestRatingUsers(movieRating: String): (String) = {
val tokens = movieRating.split(",", -1)
val movieTitle = tokens(0)
val ratings = tokens.slice(1, tokens.size)
val maxRating = ratings.max
var userIds = ArrayBuffer[Int]()
for(i <- 0 until ratings.length){
if (ratings(i) == maxRating) {
userIds += (i+1)
}
}
return movieTitle + "," + userIds.mkString(",")
}
}
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("Task 1")
val sc = new SparkContext(conf)
val textFile = sc.textFile(args(0))
val output = textFile.map(Function._findHighestRatingUsers)
.saveAsTextFile(args(1))
}
}
And problem solved. But I still don't know why the nested object solves this problem. Could somebody explain it?
And further more, I have several points not sure:
What is main function in scala? Is it the entrance of program?
Why we use an object to describe main function?
Could somebody give a common structure of a Scala program containing function, class or some basic components?
First thing is that I would recommend that you should get familiar by reading documentation both with Scala and Spark as your questions highlight that you are just starting working with it.
I'll give you some insights for your original question about "Task not serializable" (but not answering it precisely though) and let you open other questions for the questions you added later in your post otherwise this answer will be a mess.
As you probably know, Spark allows distributed computation. To do so, one thing Spark does is take the code you write, serialize it and send it to some executors somewhere to actually run it. The key part here is that your code must be serializable.
The error you got is telling you that Spark cannot serialize your code.
Now, how to make it serializable? This is where it can becomes challenging and even though Spark tries to help you by providing a "serialization stack", sometimes the info it provides are not that helpful.
In your case (1st example of code), findHighestRatingUsers must be serialized but to do so it has to serialize the whole object Task1 which is not serializable.
Why is Task1 not serializable? I'll admit I'm not really sure but I would bet on the main method, though I'd expected your 2nd example to be serializable then.
You can read more about this on various documentation or blog posts on the web. For instance: https://medium.com/swlh/spark-serialization-errors-e0eebcf0f6e6

Setting and Getting of Values using Scala Reflection

I have a patient resource of below type:
val p:Patient = new Patient
which comes under below package:
import org.hl7.fhir.r4.model.Patient
Now, I want to set some value for it like one ID attribute with value like example and when I try something like p.getId() I should be able to retrieve it. I was trying scala reflection and desgined below methods by referring one of the posts but not sure how to use it over here. Below are the methods for get and set:
object PatientInvoker {
def main(args: Array[String]): Unit = {
val spark = SparkSession.builder().appName("Patient").master("local[1]").getOrCreate()
val patientOutput = "C:\\Users\\siddheshk2\\IdeaProjects\\fhir\\mapper\\src\\main\\resources\\patientOutput.json"
val idValue = spark.read.option("multiline", "true").json(patientOutput).select(col("id")).first.getString(0)
implicit def reflector(ref: AnyRef) = new {
def getV(name: String): Any = ref.getClass.getMethods.find(_.getName == name).get.invoke(ref)
def setV(name: String, value: Any): Unit = ref.getClass.getMethods.find(_.getName == name + "_$eq").get.invoke(ref, value.asInstanceOf[AnyRef])
}
val p: Patient = new Patient
p.setV("id", idValue)
println("id:" + p.getV("id"))
}
}
I am getting below error:
Exception in thread "main" java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at com.fhir.mapper.io.PatientInvoker$$anon$1.setV(StudentInvoker.scala:16)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
Unable to set value of idValue using reflector method. Please guide me through it

Why does custom DefaultSource give java.io.NotSerializableException?

this is my first post on SO and my apology if the improper format is being used.
I'm working with Apache Spark to create a new source (via DefaultSource), BaseRelations, etc... and run into a problem with serialization that I would like to understand better. Consider below a class that extends BaseRelation and implements the scan builder.
class RootTableScan(path: String, treeName: String)(#transient val sqlContext: SQLContext) extends BaseRelation with PrunedFilteredScan{
private val att: core.SRType =
{
val reader = new RootFileReader(new java.io.File(Seq(path) head))
val tmp =
if (treeName==null)
buildATT(findTree(reader.getTopDir), arrangeStreamers(reader), null)
else
buildATT(reader.getKey(treeName).getObject.asInstanceOf[TTree],
arrangeStreamers(reader), null)
tmp
}
// define the schema from the AST
def schema: StructType = {
val s = buildSparkSchema(att)
s
}
// builds a scan
def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] = {
// parallelize over all the files
val r = sqlContext.sparkContext.parallelize(Seq(path), 1).
flatMap({fileName =>
val reader = new RootFileReader(new java.io.File(fileName))
// get the TTree
/* PROBLEM !!! */
val rootTree =
// findTree(reader)
if (treeName == null) findTree(reader)
else reader.getKey(treeName).getObject.asInstanceOf[TTree]
new RootTreeIterator(rootTree, arrangeStreamers(reader),
requiredColumns, filters)
})
println("Done building Scan")
r
}
}
}
PROBLEM identifies where the issue happens. treeName is a val that gets injected into the class thru the constructor. The lambda that uses it is supposed to be executed on the slave and I do need to send the treeName - serialize it. I would like to understand why exactly the code snippet below causes this NotSerializableException. I know for sure that without treeName in it, it works just fine
val rootTree =
// findTree(reader)
if (treeName == null) findTree(reader)
else reader.getKey(treeName).getObject.asInstanceOf[TTree]
Below is the Stack trace
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2056)
at org.apache.spark.rdd.RDD$$anonfun$flatMap$1.apply(RDD.scala:375)
at org.apache.spark.rdd.RDD$$anonfun$flatMap$1.apply(RDD.scala:374)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.RDD.flatMap(RDD.scala:374)
at org.dianahep.sparkroot.package$RootTableScan.buildScan(sparkroot.scala:95)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$8.apply(DataSourceStrategy.scala:260)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$8.apply(DataSourceStrategy.scala:260)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:303)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:302)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:379)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:298)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:256)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:60)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:60)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:61)
at org.apache.spark.sql.execution.SparkPlanner.plan(SparkPlanner.scala:47)
at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1$$anonfun$apply$1.applyOrElse(SparkPlanner.scala:51)
at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1$$anonfun$apply$1.applyOrElse(SparkPlanner.scala:48)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:300)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:298)
at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1.apply(SparkPlanner.scala:48)
at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1.apply(SparkPlanner.scala:48)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:78)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:76)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:83)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:83)
at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2572)
at org.apache.spark.sql.Dataset.head(Dataset.scala:1934)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2149)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:239)
at org.apache.spark.sql.Dataset.show(Dataset.scala:526)
at org.apache.spark.sql.Dataset.show(Dataset.scala:486)
at org.apache.spark.sql.Dataset.show(Dataset.scala:495)
... 50 elided
Caused by: java.io.NotSerializableException: org.dianahep.sparkroot.package$RootTableScan
Serialization stack:
- object not serializable (class: org.dianahep.sparkroot.package$RootTableScan, value: org.dianahep.sparkroot.package$RootTableScan#6421e9e7)
- field (class: org.dianahep.sparkroot.package$RootTableScan$$anonfun$1, name: $outer, type: class org.dianahep.sparkroot.package$RootTableScan)
- object (class org.dianahep.sparkroot.package$RootTableScan$$anonfun$1, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
From the stack I think I can deduce that it tries to serialize my lambda and can not. this lambda should be a closure as we have a val in there that is defined outside of the lambda scope. But I don't understand why this can not be serialized.
Any help would be really appreciated!!!
Thanks a lot!
Any time a scala closure references a class variable, like treeName, then the JVM serializes the parent class along with the closure. Your class RootTableScan is not serializable, though! The solution is to create a local string variable:
// builds a scan
def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] = {
val localTreeName = treeName // this is safe to serialize
// parallelize over all the files
val r = sqlContext.sparkContext.parallelize(Seq(path), 1).
flatMap({fileName =>
val reader = new RootFileReader(new java.io.File(fileName))
// get the TTree
/* PROBLEM !!! */
val rootTree =
// findTree(reader)
if (localTreeName == null) findTree(reader)
else reader.getKey(localTreeName).getObject.asInstanceOf[TTree]
new RootTreeIterator(rootTree, arrangeStreamers(reader),
requiredColumns, filters)
})

NoSuchElementException while trying to read header from excel

I have a class that parsing an xsl file for me and mapping it by the headers of the xsl.
I have another class that is the object of each line of the xsl and is using the headers to know which cell get which attribute by the headers that im mapping before...
from some reason im getting en error NoSuchElementException on one of the headers that is actually there, and there is no typos...it worked before, I dont know whats wrong now.
this is DataSource.scala class (its a trait) that controlling the xsl:
import java.io.File
import com.github.tototoshi.csv.CSVReader
import jxl.{Cell, Workbook}
import scala.collection.mutable
trait DataSource {
def read (fileName: String): Seq[Map[String, String]]
}
object CsvDataSource extends DataSource {
import com.github.tototoshi.csv.CSVFormat
import com.github.tototoshi.csv.Quoting
import com.github.tototoshi.csv.QUOTE_MINIMAL
implicit object VATBoxFormat extends CSVFormat {
val delimiter: Char = '\t'
val quoteChar: Char = '"'
val escapeChar: Char = '"'
val lineTerminator: String = "\r\n"
val quoting: Quoting = QUOTE_MINIMAL
val treatEmptyLineAsNil: Boolean = false
}
override def read(file: String): Seq[Map[String, String]] = {
val reader = CSVReader.open(file, "UTF-16")(VATBoxFormat)
reader.readNext()
val country = reader.readNext().get(5)
reader.readNext()
reader.iteratorWithHeaders.toSeq.map(c => c + ("country" -> country))
}
}
object ExecDataSource extends DataSource {
override def read(file: String): Seq[Map[String, String]] = {
val workbook = Workbook.getWorkbook(new File(file))
val sheet = workbook.getSheet(0)
val rowsUsed: Int = sheet.getRows
val headers = sheet.getRow(3).toList
// println(headers.map(_.getContents))
val country = sheet.getCell(5, 1).getContents
(4 until rowsUsed).map { i =>
val c = headers.zip(sheet.getRow(i)).map{case (k,v) => (k.getContents, v.getContents)}.toMap
c + ("country" -> country)
}
}
}
this is the PurchaseInfo class which is creating an object of each line of the excel:
case class PurchaseInfo(
something1: String,
something2: String,
something3: String,
something4: String) {
}
object PurchaseInfo {
private def changeDateFormat(dateInString: String): String = {
//System.out.println(dateInString)
val formatter: SimpleDateFormat = new SimpleDateFormat("MMM dd, yyyy")
val formatter2: SimpleDateFormat = new SimpleDateFormat("dd/MM/yyyy")
val date: Date = formatter.parse(dateInString)
return formatter2.format(date).toString
}
def fromDataSource (ds: DataSource)(fileName: String): Seq[PurchaseInfo] = {
ds.read(fileName).map { c =>
PurchaseInfo(
something1 = c("Supplier Address Street Number"),
something2 = c("Supplier Address Route"),
something3 = c("Supplier Address Locality"),
something4 = c("Supplier Address Postal Code")
)
}
}
}
(iv cut some of the var's in purchaseInfo to make it shorter for the question)
Now, this is the error im getting while running my code (from a diff class that runs my actions, this is an automation project that I use selenium)
Exception in thread "main" java.util.NoSuchElementException: key not found: Supplier Address Street Number
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:59)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:59)
at PurchaseInfo$$anonfun$fromDataSource$1.apply(PurchaseInfo.scala:50)
at PurchaseInfo$$anonfun$fromDataSource$1.apply(PurchaseInfo.scala:48)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at PurchaseInfo$.fromDataSource(PurchaseInfo.scala:48)
at HolandPortal$.main(HolandPortal.scala:22)
at HolandPortal.main(HolandPortal.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Process finished with exit code 1
Does someone can see the issue...? I dont know why he cant find "Supplier Address Street Number", in the xsl I have this header exactly the same :/
thanks

Even trivial serialization examples in Scala don't work. Why?

I am trying the simplest possible serialization examples of a class:
#serializable class Person(age:Int) {}
val fred = new Person(45)
import java.io._
val out = new ObjectOutputStream(new FileOutputStream("test.obj"))
out.writeObject(fred)
out.close()
This throws exception "java.io.NotSerializableException: Main$$anon$1$Person" on me. Why?
Is there a simple serialization example?
I also tried
#serializable class Person(nm:String) {
private val name:String=nm
}
val fred = new Person("Fred")
...
and tried to remove #serializable and some other permutations. The file "test.obj" is created, over 2Kb in size and has plausible contents.
EDIT:
Reading the "test.obj" back in (from the 2nd answer below) causes
Welcome to Scala version 2.10.3 (Java HotSpot(TM) 64-Bit Server VM,
Java 1.7.0_51). Type in expressions to have them evaluated. Type :help
for more information.
scala> import java.io._ import java.io._
scala> val fis = new FileInputStream( "test.obj" ) fis:
java.io.FileInputStream = java.io.FileInputStream#716ad1b3
scala> val oin = new ObjectInputStream( fis ) oin:
java.io.ObjectInputStream = java.io.ObjectInputStream#1f927f0a
scala> val p= oin.readObject java.io.WriteAbortedException: writing
aborted; java.io.NotSerializableException: Main$$anon$1
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1354)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at .(:12)
at .()
at .(:7)
at .()
at $print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:756)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:801)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:713)
at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:577)
at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:584)
at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:587)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:878)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:833)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:833)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:833)
at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:83)
at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:96)
at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala) Caused
by: java.io.NotSerializableException: Main$$anon$1
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at Main$$anon$1.(a.scala:11)
at Main$.main(a.scala:1)
at Main.main(a.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.tools.nsc.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:71)
at scala.tools.nsc.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:139)
at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:71)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:139)
at scala.tools.nsc.CommonRunner$class.run(ObjectRunner.scala:28)
at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:45)
at scala.tools.nsc.CommonRunner$class.runAndCatch(ObjectRunner.scala:35)
at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:45)
at scala.tools.nsc.ScriptRunner.scala$tools$nsc$ScriptRunner$$runCompiled(ScriptRunner.scala:171)
at scala.tools.nsc.ScriptRunner$$anonfun$runScript$1.apply(ScriptRunner.scala:188)
at scala.tools.nsc.ScriptRunner$$anonfun$runScript$1.apply(ScriptRunner.scala:188)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1.apply$mcZ$sp(ScriptRunner.scala:157)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1.apply(ScriptRunner.scala:131)
at scala.tools.nsc.ScriptRunner$$anonfun$withCompiledScript$1.apply(ScriptRunner.scala:131)
at scala.tools.nsc.util.package$.trackingThreads(package.scala:51)
at scala.tools.nsc.util.package$.waitingForThreads(package.scala:35)
at scala.tools.nsc.ScriptRunner.withCompiledScript(ScriptRunner.scala:130)
at scala.tools.nsc.ScriptRunner.runScript(ScriptRunner.scala:188)
at scala.tools.nsc.ScriptRunner.runScriptAndCatch(ScriptRunner.scala:201)
at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:76)
... 3 more
Note that #serializable scaladoc tells that it is deprecated since 2.9.0:
Deprecated (Since version 2.9.0) instead of #serializable class C, use class C extends Serializable
So you just have to use Serializable trait:
class Person(val age: Int) extends Serializable
This works for me (type :paste in REPL and paste these lines):
import java.io.{ObjectOutputStream, ObjectInputStream}
import java.io.{FileOutputStream, FileInputStream}
class Person(val age: Int) extends Serializable {
override def toString = s"Person($age)"
}
val os = new ObjectOutputStream(new FileOutputStream("/tmp/example.dat"))
os.writeObject(new Person(22))
os.close()
val is = new ObjectInputStream(new FileInputStream("/tmp/example.dat"))
val obj = is.readObject()
is.close()
obj
This is the output:
// Exiting paste mode, now interpreting.
import java.io.{ObjectOutputStream, ObjectInputStream}
import java.io.{FileOutputStream, FileInputStream}
defined class Person
os: java.io.ObjectOutputStream = java.io.ObjectOutputStream#5126abfd
is: java.io.ObjectInputStream = java.io.ObjectInputStream#41e598aa
obj: Object = Person(22)
res8: Object = Person(22)
So, you can see, the [de]serialization attempt was successful.
Edit (on why you're getting NotSerializableException when you run Scala script from file)
I've put my code into a file and tried to run it via scala test.scala and got exactly the same error as you. Here is my speculation on why it happens.
According to the stack trace a weird class Main$$anon$1 is not serializable. Logical question is: why it is there in the first place? We're trying to serialize Person after all, not something weird.
Scala script is special in that it is implicitly wrapped into an object called Main. This is indicated by the stack trace:
at Main$$anon$1.<init>(test.scala:9)
at Main$.main(test.scala:1)
at Main.main(test.scala)
The names here suggest that Main.main static method is the program entry point, and this method delegates to Main$.main instance method (object's class is named after the object but with $ appended). This instance method in turn tries to create an instance of a class Main$$anon$1. As far as I remember, anonymous classes are named that way.
Now, let's try to find exact Person class name (run this as Scala script):
class Person(val age: Int) extends Serializable {
override def toString = s"Person($age)"
}
println(new Person(22).getClass)
This prints something I was expecting:
class Main$$anon$1$Person
This means that Person is not a top-level class; instead it is a nested class defined in the anonymous class generated by the compiler! So in fact we have something like this:
object Main {
def main(args: Array[String]) {
new { // this is where Main$$anon$1 is generated, and the following code is its constructor body
class Person(val age: Int) extends Serializable { ... }
// all other definitions
}
}
}
But in Scala all nested classes are something called "nested non-static" (or "inner") classes in Java. This means that these classes always contain an implicit reference to an instance of their enclosing class. In this case, enclosing class is Main$$anon$1. Because of that when Java serializer tries to serialize Person, it transitively encounters an instance of Main$$anon$1 and tries to serialize it, but since it is not Serializable, the process fails. BTW, serializing non-static inner classes is a notorious thing in Java world, it is known to cause problems like this one.
As for why it works in REPL, it seems that in REPL declared classes somehow do not end up as inner ones, so they don't have any implicit fields. Hence serialization works normally for them.
You could use the Serializable Trait:
Trivial Serialization example using Java Serialization with the Serializable Trait:
case class Person(age: Int) extends Serializable
Usage:
Serialization, Write Object
val fos = new FileOutputStream( "person.serializedObject" )
val o = new ObjectOutputStream( fos )
o writeObject Person(31)
Deserialization, Read Object
val fis = new FileInputStream( "person.serializedObject" )
val oin = new ObjectInputStream( fis )
val p= oin.readObject
Which creates following output
fis: java.io.FileInputStream = java.io.FileInputStream#43a2bc95
oin: java.io.ObjectInputStream = java.io.ObjectInputStream#710afce3
p: Object = Person(31)
As you see the deserialization can't infer the Object Type itself, which is a clear drawback.
Serialization with Scala-Pickling
https://github.com/scala/pickling or part of the Standard-Distribution starting with Scala 2.11
In the exmple code the object is not written to a file and JSON is used instead of ByteCode Serialization which avoids certain problems originating in byte code incompatibilities between different Scala version.
import scala.pickling._
import json._
case class Person(age: Int)
val person = Person(31)
val pickledPerson = person.pickle
val unpickledPerson = pickledPerson.unpickle[Person]
class Person(age:Int) {} is equivalent to the Java code:
class Person{
Person(Int age){}
}
which is probably not what you want. Note that the parameter age is simply discarded and Person has no member fields.
You want either:
#serializable case class Person(age:Int)
#serializable class Person(val age:Int)
You can leave out the empty curly brackets at the end. In fact, it's encouraged.