Play 2.2 Scala class serializable for cache - scala

I have a Scala class from the play-mongojack example. It works fine. However, when I try to save it in Play's Ehcache, it throws a NotSerializableException. How can I make this class serializable?
class BlogPost(@ObjectId @Id val id: String,
               @BeanProperty @JsonProperty("date") val date: Date,
               @BeanProperty @JsonProperty("title") val title: String,
               @BeanProperty @JsonProperty("author") val author: String,
               @BeanProperty @JsonProperty("content") val content: String) {
  @ObjectId @Id @BeanProperty var blogId: String = _
  @BeanProperty @JsonProperty("uploadedFile") var uploadedFile: Option[(String, String, Long)] = None
}
object BlogPost {
  def apply(
    date: Date,
    title: String,
    author: String,
    content: String): BlogPost = new BlogPost(date, title, author, content)

  def unapply(e: Event) =
    new Some((e.messageId,
      e.date,
      e.title,
      e.author,
      e.content,
      e.blogId,
      e.uploadedFile))

  private lazy val db = MongoDB.collection("blogposts", classOf[BlogPost], classOf[String])

  def save(blogPost: BlogPost) { db.save(blogPost) }

  def findByAuthor(author: String) = db.find().is("author", author).asScala
}
Saving to cache:
var latestBlogs = List[BlogPost]()
Cache.set("latestBlogs", latestBlogs, 30)
It throws an exception:
[error] n.s.e.s.d.DiskStorageFactory - Disk Write of latestBlogs failed:
java.io.NotSerializableException: BlogPost
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183) ~[na:1.7.0_45]
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) ~[na:1.7.0_45]
at java.util.ArrayList.writeObject(ArrayList.java:742) ~[na:1.7.0_45]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_45]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_45]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_45]
EDIT 1:
I tried to extend the object with Serializable, but it doesn't work:
object BlogPost extends Serializable {}
EDIT 2:
vitalii's comment works for me:
class BlogPost() extends scala.Serializable {}

Try deriving class BlogPost from Serializable, or define it as a case class; case classes are serializable by default.
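For example, a minimal sketch of the first option applied to the class above (extending the companion object alone is not enough, because Ehcache's disk store serializes the BlogPost instances themselves):
// Sketch: the cached instances must be serializable, so the class itself
// (not just its companion object) extends Serializable.
class BlogPost(@ObjectId @Id val id: String,
               @BeanProperty @JsonProperty("date") val date: Date,
               @BeanProperty @JsonProperty("title") val title: String,
               @BeanProperty @JsonProperty("author") val author: String,
               @BeanProperty @JsonProperty("content") val content: String)
  extends Serializable {
  @ObjectId @Id @BeanProperty var blogId: String = _
  @BeanProperty @JsonProperty("uploadedFile") var uploadedFile: Option[(String, String, Long)] = None
}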

Related

Task not serializable after adding it to ForEachPartition

I am receiving a "task not serializable" exception in Spark when attempting to implement an Apache Pulsar sink in Spark Structured Streaming.
I have already tried extracting the PulsarConfig into a separate class and calling it inside the .foreachPartition lambda, which is what I normally do for JDBC connections and other systems I integrate into Spark Structured Streaming, as shown below:
PulsarSink Class
class PulsarSink(
    sqlContext: SQLContext,
    parameters: Map[String, String],
    partitionColumns: Seq[String],
    outputMode: OutputMode) extends Sink {

  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    data.toJSON.foreachPartition( partition => {
      val pulsarConfig = new PulsarConfig(parameters).client
      val producer = pulsarConfig.newProducer(Schema.STRING)
        .topic(parameters.get("topic").get)
        .compressionType(CompressionType.LZ4)
        .sendTimeout(0, TimeUnit.SECONDS)
        .create
      partition.foreach(rec => producer.send(rec))
      producer.flush()
    })
  }
}
PulsarConfig Class
class PulsarConfig(parameters: Map[String, String]) {
  def client(): PulsarClient = {
    import scala.collection.JavaConverters._
    if (!parameters.get("tlscert").isEmpty && !parameters.get("tlskey").isEmpty) {
      val tlsAuthMap = Map("tlsCertFile" -> parameters.get("tlscert").get,
        "tlsKeyFile" -> parameters.get("tlskey").get).asJava
      val tlsAuth: Authentication = AuthenticationFactory.create(classOf[AuthenticationTls].getName, tlsAuthMap)
      PulsarClient.builder
        .serviceUrl(parameters.get("broker").get)
        .tlsTrustCertsFilePath(parameters.get("tlscert").get)
        .authentication(tlsAuth)
        .enableTlsHostnameVerification(false)
        .allowTlsInsecureConnection(true)
        .build
    }
    else {
      PulsarClient.builder
        .serviceUrl(parameters.get("broker").get)
        .enableTlsHostnameVerification(false)
        .allowTlsInsecureConnection(true)
        .build
    }
  }
}
The error message I receive is the following:
ERROR StreamExecution: Query [id = 12c715c2-2d62-4523-a37a-4555995ccb74, runId = d409c0db-7078-4654-b0ce-96e46dfb322c] terminated with error
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:340)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:330)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:156)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2294)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:925)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:924)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:924)
at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply$mcV$sp(Dataset.scala:2341)
at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2341)
at org.apache.spark.sql.Dataset$$anonfun$foreachPartition$1.apply(Dataset.scala:2341)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2828)
at org.apache.spark.sql.Dataset.foreachPartition(Dataset.scala:2340)
at org.apache.spark.datamediation.impl.sink.PulsarSink.addBatch(PulsarSink.scala:20)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$1.apply$mcV$sp(StreamExecution.scala:666)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$1.apply(StreamExecution.scala:666)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$1.apply(StreamExecution.scala:666)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch(StreamExecution.scala:665)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(StreamExecution.scala:306)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:294)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:290)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:206)
Caused by: java.io.NotSerializableException: org.apache.spark.datamediation.impl.sink.PulsarSink
Serialization stack:
- object not serializable (class: org.apache.spark.datamediation.impl.sink.PulsarSink, value: org.apache.spark.datamediation.impl.sink.PulsarSink@38813f43)
- field (class: org.apache.spark.datamediation.impl.sink.PulsarSink$$anonfun$addBatch$1, name: $outer, type: class org.apache.spark.datamediation.impl.sink.PulsarSink)
- object (class org.apache.spark.datamediation.impl.sink.PulsarSink$$anonfun$addBatch$1, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:337)
... 31 more
Values used inside foreachPartition can be copied from class-level members into local variables, so that the closure no longer captures the non-serializable PulsarSink instance:
override def addBatch(batchId: Long, data: DataFrame): Unit = {
  val parametersLocal = parameters
  data.toJSON.foreachPartition( partition => {
    val pulsarConfig = new PulsarConfig(parametersLocal).client
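Spelled out in full, the reworked addBatch looks roughly like this (a sketch reusing the producer settings from the question; only the local copy of parameters is new):
override def addBatch(batchId: Long, data: DataFrame): Unit = {
  // Copy the constructor parameter into a local val: the closure below then
  // captures only this serializable Map, not the enclosing PulsarSink.
  val parametersLocal = parameters
  data.toJSON.foreachPartition { partition =>
    val pulsarConfig = new PulsarConfig(parametersLocal).client
    val producer = pulsarConfig.newProducer(Schema.STRING)
      .topic(parametersLocal.get("topic").get)
      .compressionType(CompressionType.LZ4)
      .sendTimeout(0, TimeUnit.SECONDS)
      .create
    partition.foreach(rec => producer.send(rec))
    producer.flush()
  }
}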

solr add document error

I am trying to load a CSV file into Solr documents using Scala. I am new to Scala. If I pass one set of values to the case class it works fine, but if I try to use all the values read from the CSV, it gives an error. I am not sure how to do this in Scala; any help is greatly appreciated.
object BasicParseCsv {
  case class Person(id: String, name: String, age: String, addr: String)
  val schema = ArrayBuffer[Person]()

  def main(args: Array[String]) {
    val master = args(0)
    val inputFile = args(1)
    val outputFile = args(2)
    val sc = new SparkContext(master, "BasicParseCsv", System.getenv("SPARK_HOME"))
    val params = new ModifiableSolrParams
    val Solr = new HttpSolrServer("http://localhost:8983/solr/person1")

    //Preparing the Solr document
    val doc = new SolrInputDocument()

    val input = sc.textFile(inputFile)
    val result = input.map { line =>
      val reader = new CSVReader(new StringReader(line));
      reader.readNext();
    }

    def getSolrDocument(person: Person): SolrInputDocument = {
      val document = new SolrInputDocument()
      document.addField("id", person.id)
      document.addField("name", person.name)
      document.addField("age", person.age)
      document.addField("addr", person.addr)
      document
    }

    def send(persons: List[Person]) {
      persons.foreach(person => Solr.add(getSolrDocument(person)))
      Solr.commit()
    }

    val people = result.map(x => Person(x(0), x(1), x(2), x(3)))
    val book1 = new Person("101", "xxx", "20", "abcd")

    send(List(book1))

    people.map(person => send(List(Person(person.id, person.name, person.age, person.addr))))

    System.out.println("Documents added")
  }
}
people.map(person => send(List(Person(person.id, person.name, person.age, person.addr)))) ==> gives an error
val book1 = new Person("101","xxx","20","abcd") ==> works fine
Update: I get the error below:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2067)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:324)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:323)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.map(RDD.scala:323)
at BasicParseCsv$.main(BasicParseCsv.scala:90)
at BasicParseCsv.main(BasicParseCsv.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.io.NotSerializableException: org.apache.http.impl.client.SystemDefaultHttpClient
Serialization stack:
- object not serializable (class: org.apache.http.impl.client.SystemDefaultHttpClient, value: org.apache.http.impl.client.SystemDefaultHttpClient@1dbd580)
- field (class: org.apache.solr.client.solrj.impl.HttpSolrServer, name: httpClient, type: interface org.apache.http.client.HttpClient)
- object (class org.apache.solr.client.solrj.impl.HttpSolrServer, org.apache.solr.client.solrj.impl.HttpSolrServer@17e0827)
- field (class: BasicParseCsv$$anonfun$main$1, name: Solr$1, type: class org.apache.solr.client.solrj.impl.HttpSolrServer)
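The serialization stack points at the driver-side HttpSolrServer (whose internal HttpClient is not serializable) being captured by the closure passed to map. A minimal sketch of the usual workaround, building the client inside each partition instead of capturing it (assuming the same Person case class as above; not from the original post):
// Sketch: create the Solr client on the executor, inside foreachPartition,
// so the closure never captures the non-serializable HttpSolrServer from main.
people.foreachPartition { persons =>
  val solr = new HttpSolrServer("http://localhost:8983/solr/person1")
  persons.foreach { person =>
    val document = new SolrInputDocument()
    document.addField("id", person.id)
    document.addField("name", person.name)
    document.addField("age", person.age)
    document.addField("addr", person.addr)
    solr.add(document)
  }
  solr.commit()
}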

How to use a function to map line data and convert it to a DataFrame?

I want to use a function to map line data from HDFS and then convert it to a DataFrame, but it doesn't work. Please help me as soon as possible.
For example :
case class Kof(UID: String, SITEID: String, MANAGERID: String, ROLES: String,
               EXTERNALURL: String, EXTERNALID: String, OPTION1: String,
               OPTION2: String, OPTION3: String)

def GetData(argv1: Array[String]): Kof = {
  return Kof(argv1(0), argv1(1), argv1(2), argv1(3), argv1(4),
    argv1(5), argv1(6), argv1(7), argv1(8))
}

val textFile2 = sc.textFile("hdfs://hadoop-s3:8020/tmp/mefang/modify.txt")
  .map(_.split(","))
  .map(p => GetData(p))
  .toDF // <- it breaks here with the error below
Exception in thread "main" org.apache.spark.SparkException: Task not serializable

Scala: XStream complains object not serializable

I have the following case classes defined, and I would like to print out ClientData in XML format using XStream.
case class Address(addressLine1: String,
                   addressLine2: String,
                   city: String,
                   provinceCode: String,
                   country: String,
                   addressTypeDesc: String) extends Serializable {
}

case class ClientData(title: String,
                      firstName: String,
                      lastName: String,
                      addrList: Option[List[Address]]) extends Serializable {
}

object ex1 {
  def main(args: Array[String]) {
    ...
    ...
    ...
    // In below, x is Try[ClientData]
    val xstream = new XStream(new DomDriver)
    newClientRecord.foreach(x => if (x.isSuccess) println(xstream.toXML(x.get)))
  }
}
And when the program executes the line that prints each ClientData in XML format, I get the runtime error below. Please help.
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:911)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:910)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.foreach(RDD.scala:910)
at lab9$.main(lab9.scala:63)
at lab9.main(lab9.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.io.NotSerializableException: com.thoughtworks.xstream.XStream
Serialization stack:
- object not serializable (class: com.thoughtworks.xstream.XStream, value: com.thoughtworks.xstream.XStream@51e94b7d)
- field (class: lab9$$anonfun$main$1, name: xstream$1, type: class com.thoughtworks.xstream.XStream)
- object (class lab9$$anonfun$main$1, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
... 16 more
It isn't XStream that complains, it's Spark. You need to define the xstream variable inside the task:
newClientRecord.foreach { x =>
  if (x.isSuccess) {
    val xstream = new XStream(new DomDriver)
    println(xstream.toXML(x.get))
  }
}
if XStream is sufficiently cheap to create, or
newClientRecord.foreachPartition { xs =>
  val xstream = new XStream(new DomDriver)
  xs.foreach { x =>
    if (x.isSuccess) {
      println(xstream.toXML(x.get))
    }
  }
}
otherwise.

NoSuchElementException while trying to read header from excel

I have a class that parses an xls file for me and maps it by the headers of the xls.
I have another class that is the object for each line of the xls; it uses the headers, which I map beforehand, to know which cell goes with which attribute.
For some reason I'm getting a NoSuchElementException on one of the headers that is actually there, and there are no typos. It worked before; I don't know what's wrong now.
This is the DataSource.scala class (it's a trait) that handles the xls:
import java.io.File

import com.github.tototoshi.csv.CSVReader
import jxl.{Cell, Workbook}

import scala.collection.mutable

trait DataSource {
  def read(fileName: String): Seq[Map[String, String]]
}

object CsvDataSource extends DataSource {
  import com.github.tototoshi.csv.CSVFormat
  import com.github.tototoshi.csv.Quoting
  import com.github.tototoshi.csv.QUOTE_MINIMAL

  implicit object VATBoxFormat extends CSVFormat {
    val delimiter: Char = '\t'
    val quoteChar: Char = '"'
    val escapeChar: Char = '"'
    val lineTerminator: String = "\r\n"
    val quoting: Quoting = QUOTE_MINIMAL
    val treatEmptyLineAsNil: Boolean = false
  }

  override def read(file: String): Seq[Map[String, String]] = {
    val reader = CSVReader.open(file, "UTF-16")(VATBoxFormat)
    reader.readNext()
    val country = reader.readNext().get(5)
    reader.readNext()
    reader.iteratorWithHeaders.toSeq.map(c => c + ("country" -> country))
  }
}

object ExecDataSource extends DataSource {
  override def read(file: String): Seq[Map[String, String]] = {
    val workbook = Workbook.getWorkbook(new File(file))
    val sheet = workbook.getSheet(0)
    val rowsUsed: Int = sheet.getRows
    val headers = sheet.getRow(3).toList
    // println(headers.map(_.getContents))
    val country = sheet.getCell(5, 1).getContents
    (4 until rowsUsed).map { i =>
      val c = headers.zip(sheet.getRow(i)).map { case (k, v) => (k.getContents, v.getContents) }.toMap
      c + ("country" -> country)
    }
  }
}
This is the PurchaseInfo class, which creates an object for each line of the Excel file:
case class PurchaseInfo(
    something1: String,
    something2: String,
    something3: String,
    something4: String) {
}

object PurchaseInfo {

  private def changeDateFormat(dateInString: String): String = {
    //System.out.println(dateInString)
    val formatter: SimpleDateFormat = new SimpleDateFormat("MMM dd, yyyy")
    val formatter2: SimpleDateFormat = new SimpleDateFormat("dd/MM/yyyy")
    val date: Date = formatter.parse(dateInString)
    return formatter2.format(date).toString
  }

  def fromDataSource(ds: DataSource)(fileName: String): Seq[PurchaseInfo] = {
    ds.read(fileName).map { c =>
      PurchaseInfo(
        something1 = c("Supplier Address Street Number"),
        something2 = c("Supplier Address Route"),
        something3 = c("Supplier Address Locality"),
        something4 = c("Supplier Address Postal Code")
      )
    }
  }
}
(I've cut some of the fields in PurchaseInfo to make it shorter for the question.)
Now, this is the error I'm getting when running my code (from a different class that runs my actions; this is an automation project using Selenium):
Exception in thread "main" java.util.NoSuchElementException: key not found: Supplier Address Street Number
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:59)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:59)
at PurchaseInfo$$anonfun$fromDataSource$1.apply(PurchaseInfo.scala:50)
at PurchaseInfo$$anonfun$fromDataSource$1.apply(PurchaseInfo.scala:48)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at PurchaseInfo$.fromDataSource(PurchaseInfo.scala:48)
at HolandPortal$.main(HolandPortal.scala:22)
at HolandPortal.main(HolandPortal.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Process finished with exit code 1
Can someone see the issue? I don't know why it can't find "Supplier Address Street Number"; in the xls I have this header spelled exactly the same :/
Thanks
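One way to narrow this down is to dump the keys the data source actually produces and compare them with the expected header, since an invisible difference (extra whitespace, a BOM, or a shifted header row) raises exactly this exception. A minimal sketch reusing the ExecDataSource above (the file path is illustrative):
// Sketch: print every parsed header key with its length, so hidden whitespace
// or a wrong header row becomes visible next to the expected names.
val rows = ExecDataSource.read("path/to/file.xls")
rows.headOption.foreach { row =>
  row.keys.toSeq.sorted.foreach(k => println(s"[$k] length=${k.length}"))
}

// If the only difference turns out to be surrounding whitespace, trimming the
// keys when building the map in ExecDataSource.read is often enough:
// headers.zip(sheet.getRow(i)).map { case (k, v) => (k.getContents.trim, v.getContents) }.toMap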