Deserialization of JSON using Jackson in Scala

I am trying to deserialize the JSON string below into a Scala object using the Jackson JSON API:
{ "Domain1": { "data-file": "dataFile1", "filter": {
"affected-object": "AffectedObject1", "affected-nd":
"AffectedNd1" } }, "Domain2": { "data-file": "dataFile2",
"filter": { "affected-ci": "AffectedCI2", "affected-net":
"AffectedNet2" } } }
I tried using a case class, first passing classOf[...] as the value type of the readValue method, but the output is a Map of Maps; the data is not converted into the case class.
case class CrossDomainFilterObj(@JsonProperty("data-file") dataFile: String,
                                @JsonProperty("filter") filter: Map[String, String])
val jsonString = "{\"Domain1\": {\"data-file\": \"dataFile1\", \"filter\": { \"affected-object\": " +
  "\"AffectedObject1\", \"affected-nd\" : \"AffectedNd1\"}},\"Domain2\": {\"data-file\": " +
  "\"dataFile2\", \"filter\": { \"affected-ci\":\"AffectedCI2\", \"affected-net\" : " +
  "\"AffectedNet2\"}}}"
val mapper = new ObjectMapper
mapper.registerModule(DefaultScalaModule)
val data = mapper.readValue(jsonString, classOf[Map[String, CrossDomainFilterObj]])
println(data)
I am getting output like the following:
Map(Domain1 -> Map(data-file -> dataFile1, filter -> Map(affected-object ->
AffectedObject1, affected-nd -> AffectedNd1)), Domain2 -> Map(data-file ->
dataFile2, filter -> Map(affected-ci -> AffectedCI2,
affected-net -> AffectedNet2)))
But I am expecting output like the following:
Map(Domain1 -> CrossDomainFilterObj(dataFile1, Map(affected-object ->
AffectedObject1, affected-nd -> AffectedNd1)), Domain2 ->
CrossDomainFilterObj(dataFile2, Map(affected-ci ->
AffectedCI2, affected-net -> AffectedNet2)))
Then I tried using a TypeReference as the value type, as shown below:
case class CrossDomainFilterObj(@JsonProperty("data-file") dataFile: String,
                                @JsonProperty("filter") filter: Map[String, String])
val jsonString = "{\"Domain1\": {\"data-file\": \"dataFile1\", \"filter\": { \"affected-object\": " +
  "\"AffectedObject1\", \"affected-nd\" : \"AffectedNd1\"}},\"Domain2\": {\"data-file\": " +
  "\"dataFile2\", \"filter\": { \"affected-ci\":\"AffectedCI2\", \"affected-net\" : " +
  "\"AffectedNet2\"}}}"
val mapper = new ObjectMapper
mapper.registerModule(DefaultScalaModule)
val reference = new TypeReference[Map[String, CrossDomainFilterObj]] {}
val data = mapper.readValue(jsonString, reference)
println(data)
I am getting an error like the one below:
dead code following this construct
"val data = mapper.readValue(jsonString, reference)"
Could someone help identify what I am doing wrong here?

Just make sure you use ScalaObjectMapper:
val mapper = new ObjectMapper() with ScalaObjectMapper
Then this should work:
val data = mapper.readValue[Map[String, CrossDomainFilterObj]](jsonString)
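For reference, here is a minimal, self-contained sketch of that approach, assuming jackson-module-scala is on the classpath and reusing the case class from the question (the object name Example is only for illustration):
import com.fasterxml.jackson.annotation.JsonProperty
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper

case class CrossDomainFilterObj(@JsonProperty("data-file") dataFile: String,
                                @JsonProperty("filter") filter: Map[String, String])

object Example {
  def main(args: Array[String]): Unit = {
    val jsonString =
      "{\"Domain1\": {\"data-file\": \"dataFile1\", \"filter\": {\"affected-object\": \"AffectedObject1\", \"affected-nd\": \"AffectedNd1\"}}," +
      "\"Domain2\": {\"data-file\": \"dataFile2\", \"filter\": {\"affected-ci\": \"AffectedCI2\", \"affected-net\": \"AffectedNet2\"}}}"

    // The ScalaObjectMapper mixin adds readValue[T], which carries the full
    // generic type (including the Map's value type) via an implicit Manifest.
    val mapper = new ObjectMapper() with ScalaObjectMapper
    mapper.registerModule(DefaultScalaModule)

    val data = mapper.readValue[Map[String, CrossDomainFilterObj]](jsonString)
    println(data)
    // Map(Domain1 -> CrossDomainFilterObj(dataFile1, Map(affected-object -> AffectedObject1, ...)), ...)
  }
}
Note that in newer jackson-module-scala versions the ScalaObjectMapper trait has moved out of the experimental package (and is eventually deprecated in favour of ClassTagExtensions), so the import may need adjusting for your version.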

Related

Map JSON String to JsonObject

I have a JSON string like
{
"key1": "value1",
"definition": {
// JSON content here
}
}
"definition" key in JSON can contains JSONArray, JSONObject.
For example, it can have
"key2" : ""
or
"key2" : {}
or
"key2" : []
To accommodate this, I have created a corresponding Scala class:
import com.google.gson.JsonObject
class JsonData {
var key1: String = _
var definition: JsonObject = _
}
While mapping the JSON string to the JsonData class, the "definition" field in the resulting JsonData instance comes back empty.
Sample code:
import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper
import com.google.gson.JsonObject
object TestMe {
val mapper = new ObjectMapper with ScalaObjectMapper
mapper.registerModule(DefaultScalaModule)
mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
def main(args: Array[String]): Unit = {
val jsonString = "{\"key1\": \"value1\",\"definition\": {\"key2\" : \"abc\"}}"
val data = mapper.readValue[JsonData](jsonString)
println(data.definition.getAsJsonObject()) //empty
val jsonString1 = "{\"key1\": \"value1\",\"definition\": {\"key2\" : {\"key3\" : \"\"}}}"
val data1 = mapper.readValue[JsonData](jsonString1)
println(data1.definition.getAsJsonObject()) //empty
val jsonString2 = "{\"key1\": \"value1\",\"definition\": {\"key2\" : [\"a\",\"b\"]}}"
val data2 = mapper.readValue[JsonData](jsonString2)
println(data2.definition.getAsJsonObject()) //empty
}
class JsonData {
var key1: String = _
var definition: JsonObject = _
}
}
How can I read a JSON string and map it to a class that has an attribute of type JsonObject?
Versions:
Scala: 2.11
Jackson-core: 2.6.x
Gson: 2.6.x
Jackson-databind: 2.6.x
Jackson-module-scala: 2.6.5
I would use com.fasterxml.jackson.databind.JsonNode instead of using Google's Gson JsonObject class. Using Jackson's own classes should make it pretty trivial.
Alternatively, you could just map to a Map[String, Any] for this kind of flexibility, unless you really need it to stay as JSON.
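As a rough sketch of the JsonNode suggestion, keeping the mapper setup from the question (the class name JsonDataNode is made up for this example):
import com.fasterxml.jackson.databind.{DeserializationFeature, JsonNode, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper

// Same shape as JsonData, but "definition" is a Jackson JsonNode, so it can
// hold an object, an array, or a plain string without any Gson involvement.
class JsonDataNode {
  var key1: String = _
  var definition: JsonNode = _
}

object TestJsonNode {
  val mapper = new ObjectMapper with ScalaObjectMapper
  mapper.registerModule(DefaultScalaModule)
  mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)

  def main(args: Array[String]): Unit = {
    val jsonString = "{\"key1\": \"value1\",\"definition\": {\"key2\" : [\"a\",\"b\"]}}"
    val data = mapper.readValue[JsonDataNode](jsonString)
    println(data.definition)             // {"key2":["a","b"]}
    println(data.definition.get("key2")) // ["a","b"]
  }
}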

Returning the custom return type upon exception in scala

I am trying to parse XML using the following method to extract data from it:
def xmlparser(xml:String): (String,List[String]) =
Try {
val documentbuilder=DocumentBuilderFactory.newInstance.newDocumentBuilder
val xmldocument = documentbuilder.parse(new InputSource(new java.io.StringReader(xml)))
val nodesofchild=xmldocument.getChildNodes
val xmlvalues=extractvalues(nodesofchild)
("xmlname",xmlvalues)
}
I need to return ("xmlname",xmlvalues) if xml is valid ,else i need to return ("xmlname",null).I tried using ".toOption.orNull" but it is returning only "null".Could somebody help me how to return ("xmlname",null) instead of "null"
Instead of your current code, try something like this (returning an Option rather than null):
def xmlparser(xml: String): (String, Option[List[String]]) = {
  val values = Try {
    val documentbuilder = DocumentBuilderFactory.newInstance.newDocumentBuilder
    val xmldocument = documentbuilder.parse(new InputSource(new java.io.StringReader(xml)))
    val nodesofchild = xmldocument.getChildNodes
    extractvalues(nodesofchild)
  }
  ("xmlname", values.toOption)
}
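If the caller really must receive ("xmlname", null) rather than an Option, a variant under the same assumptions (the extractvalues helper and the imports from the question) could apply toOption.orNull to the list itself instead of to the whole tuple:
def xmlparser(xml: String): (String, List[String]) = {
  val values = Try {
    val documentbuilder = DocumentBuilderFactory.newInstance.newDocumentBuilder
    val xmldocument = documentbuilder.parse(new InputSource(new java.io.StringReader(xml)))
    extractvalues(xmldocument.getChildNodes)
  }
  // orNull turns a failed Try into null, while the tuple's first element is kept
  ("xmlname", values.toOption.orNull)
}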

Scala : Write to a file inside foreachRDD

I'm using Spark Streaming to process data coming from Kafka, and I would like to write the result to a local file. When I print to the console everything works fine and I get my results, but when I try to write them to a file I get an error.
I use a PrintWriter to do that, and I get this error:
Exception in thread "main" java.io.NotSerializableException: DStream checkpointing has been enabled but the DStreams with their functions are not serializable
java.io.PrintWriter
Serialization stack:
- object not serializable (class: java.io.PrintWriter, value: java.io.PrintWriter@20f6f88c)
- field (class: streaming.followProduction$$anonfun$main$1, name: qualityWriter$1, type: class java.io.PrintWriter)
- object (class streaming.followProduction$$anonfun$main$1, <function1>)
- field (class: streaming.followProduction$$anonfun$main$1$$anonfun$apply$1, name: $outer, type: class streaming.followProduction$$anonfun$main$1)
- object (class streaming.followProduction$$anonfun$main$1$$anonfun$apply$1, <function1>)
- field (class: org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3, name: cleanedF$1, type: interface scala.Function1)
- object (class org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3, <function2>)
- writeObject data (class: org.apache.spark.streaming.dstream.DStreamCheckpointData)
- object (class org.apache.spark.streaming.kafka010.DirectKafkaInputDStream$DirectKafkaInputDStreamCheckpointData,
I guess I can't use the writer like this inside foreachRDD!
Here is my code:
object followProduction extends Serializable {
def main(args: Array[String]) = {
val qualityWriter = new PrintWriter(new File("diskQuality.txt"))
qualityWriter.append("dateTime , quality , status \n")
val sparkConf = new SparkConf().setMaster("spark://address:7077").setAppName("followProcess").set("spark.streaming.concurrentJobs", "4")
val sc = new StreamingContext(sparkConf, Seconds(10))
sc.checkpoint("checkpoint")
val kafkaParams = Map[String, Object](
"bootstrap.servers" -> "address:9092",
"key.deserializer" -> classOf[StringDeserializer],
"value.deserializer" -> classOf[StringDeserializer],
"group.id" -> s"${UUID.randomUUID().toString}",
"auto.offset.reset" -> "earliest",
"enable.auto.commit" -> (false: java.lang.Boolean)
)
val topics = Array("A", "C")
topics.foreach(t => {
val stream = KafkaUtils.createDirectStream[String, String](
sc,
PreferConsistent,
Subscribe[String, String](Array(t), kafkaParams)
)
stream.foreachRDD(rdd => {
rdd.collect().foreach(i => {
val record = i.value()
val newCsvRecord = process(t, record)
println(newCsvRecord)
qualityWriter.append(newCsvRecord)
})
})
})
qualityWriter.close()
sc.start()
sc.awaitTermination()
}
var componentQuantity: componentQuantity = new componentQuantity("", 0.0, 0.0, 0.0)
var diskQuality: diskQuality = new diskQuality("", 0.0)
def process(topic: String, record: String): String = topic match {
case "A" => componentQuantity.checkQuantity(record)
case "C" => diskQuality.followQuality(record)
}
}
Here is the class I am calling:
case class diskQuality(datetime: String, quality: Double) extends Serializable {
def followQuality(record: String): String = {
val dateFormat: SimpleDateFormat = new SimpleDateFormat("dd-mm-yyyy hh:mm:ss")
var recQuality = msgParse(record).quality
var date: Date = dateFormat.parse(msgParse(record).datetime)
var recDateTime = new SimpleDateFormat("dd-mm-yyyy hh:mm:ss").format(date)
// some operations here
return recDateTime + " , " + recQuality
}
def msgParse(value: String): diskQuality = {
import org.json4s._
import org.json4s.native.JsonMethods._
implicit val formats = DefaultFormats
val res = parse(value).extract[diskQuality]
return res
}
}
How can I achieve this? I'm new to both Spark and Scala, so maybe I'm not doing things right.
Thank you for your time
EDIT:
I've changed my code and I don't get this error anymore. But now I have only the first line in my file and the records are not appended. The writer (handleWriter) inside is actually not working.
Here is my code:
stream.foreachRDD(rdd => {
val qualityWriter = new PrintWriter(file)
qualityWriter.write("dateTime , quality , status \n")
qualityWriter.close()
rdd.collect().foreach(i =>
{
val record = i.value()
val newCsvRecord = process(topic , record)
val handleWriter = new PrintWriter(file)
handleWriter.append(newCsvRecord)
handleWriter.close()
println(newCsvRecord)
})
})
What did I miss? Maybe I'm doing this wrong...
The PrintWriter is a local resource, bound to a single machine and cannot be serialized.
To remove this object from the Java serialization plan, we can declare it @transient. That means that the serialized form of the followProduction object will not attempt to serialize this field.
In the code of the question, it should be declared as:
@transient val qualityWriter = new PrintWriter(new File("diskQuality.txt"))
Then it becomes possible to use it within the foreachRDD closure.
But this alone does not solve the issues with the proper handling of the file. The qualityWriter.close() will be executed on the first pass of the streaming job, so the file descriptor will be closed for writing while the job is still running. To properly use local resources, such as a File, I would follow Yuval's suggestion to recreate the PrintWriter within the foreachRDD closure. The missing piece is opening the new PrintWriter in append mode. The modified code within foreachRDD would look like this (making some additional code changes):
// Initialization phase
val qualityWriter = new PrintWriter(new File("diskQuality.txt"))
qualityWriter.println("dateTime , quality , status")
qualityWriter.close()
....
dstream.foreachRDD{ rdd =>
val data = rdd.map(e => e.value())
.collect() // get the data locally
.map(i=> process(topic , i)) // create csv records
val allRecords = data.mkString("\n") // why do I/O if we can do in-mem?
val handleWriter = new PrintWriter(new FileOutputStream(file, true)) // java.io.FileOutputStream(file, true) opens the file in append mode
handleWriter.append(allRecords)
handleWriter.close()
}
A few notes about the code in the question:
"spark.streaming.concurrentJobs", "4"
This will create an issue with multiple threads writing to the same local file. It's probably also being misused in this context.
sc.checkpoint("checkpoint")
There seems to be no need for checkpointing on this job.
The simplest thing to do would be to create the instance of PrintWriter inside foreachRDD, which means it wouldn't be captured by the function closure:
stream.foreachRDD(rdd => {
  val qualityWriter = new PrintWriter(new File("diskQuality.txt"))
  qualityWriter.append("dateTime , quality , status \n")
  rdd.collect().foreach(i => {
    val record = i.value()
    val newCsvRecord = process(t, record)
    qualityWriter.append(newCsvRecord)
  })
  qualityWriter.close() // close so the buffered records are flushed to disk
})

playframework 2.4 - Unspecified value parameter headers error

I am upgrading Play Framework from 2.3 to 2.4. After changing the versions, compiling the same code produces the following error. Since I am a novice at Scala, I am trying to learn it to solve this issue, but I still don't know what the problem is. What I want to do is add a request header value to the original request headers. Any help will be appreciated.
[error] /mnt/garner/project/app-service/app/com/company/playframework/filters/LoggingFilter.scala:26: not enough arguments for constructor Headers: (headers: Seq[(String, String)])play.api.mvc.Headers.
[error] Unspecified value parameter headers.
[error] val newHeaders = new Headers { val data = (requestHeader.headers.toMap
The LoggingFilter class
class LoggingFilter extends Filter {
val logger = AccessLogger.getInstance();
def apply(next: (RequestHeader) => Future[Result])(requestHeader: RequestHeader): Future[Result] = {
val startTime = System.currentTimeMillis
val requestId = logger.createLog();
val newHeaders = new Headers { val data = (requestHeader.headers.toMap
+ (AccessLogger.X_HEADER__REQUEST_ID -> Seq(requestId))).toList }
val newRequestHeader = requestHeader.copy(headers = newHeaders)
next(newRequestHeader).map { result =>
val endTime = System.currentTimeMillis
val requestTime = endTime - startTime
val bytesToString: Enumeratee[ Array[Byte], String ] = Enumeratee.map[Array[Byte]]{ bytes => new String(bytes) }
val consume: Iteratee[String,String] = Iteratee.consume[String]()
val resultBody : Future[String] = result.body |>>> bytesToString &>> consume
resultBody.map {
body =>
logger.finish(requestId, result.header.status, requestTime, body)
}
result;
}
}
}
Edit
I updated the code as follows and it compiles now.
The following code changed:
val newHeaders = new Headers { val data = (requestHeader.headers.toMap
+ (AccessLogger.X_HEADER__REQUEST_ID -> Seq(requestId))).toList }
to
val newHeaders = new Headers((requestHeader.headers.toSimpleMap
+ (AccessLogger.X_HEADER__REQUEST_ID -> requestId)).toList)
It simply states that if you want to construct a Headers you need to supply a field named headers of type Seq[(String, String)]. If you omit the initial new, you will instead be using the apply function of the companion object for Headers, which just takes a vararg of (String, String), and your code should work. If you look at the documentation https://www.playframework.com/documentation/2.4.x/api/scala/index.html#play.api.mvc.Headers and flip between the docs for the object and the class, it should become clear.
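For illustration, here is a sketch of that apply-based construction, using the same requestHeader and AccessLogger fields that appear in the question's filter:
// The companion object's apply takes (String, String)* varargs,
// so there is no `new` and no abstract `data` field to implement.
val newHeaders = Headers(
  (requestHeader.headers.toSimpleMap
    + (AccessLogger.X_HEADER__REQUEST_ID -> requestId)).toSeq: _*
)
val newRequestHeader = requestHeader.copy(headers = newHeaders)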

Why is immutable map size always zero?

The Scala class below parses a file using Jsoup and populates the values from the file into a Scala immutable Map. Using the + operator on the Map does not seem to have any effect, as the Map's size is always zero.
import java.io.File
import org.jsoup.nodes.Document
import org.jsoup.Jsoup
import org.jsoup.select.Elements
import org.jsoup.nodes.Element
import scala.collection.immutable.TreeMap
class JdkElementDetail() {
var fileLocation: String = _
def this(fileLocation: String) = {
this()
this.fileLocation = fileLocation;
}
def parseFile : Map[String , String] = {
val jdkElementsMap: Map[String, String] = new TreeMap[String , String];
val input: File = new File(fileLocation);
val doc: Document = Jsoup.parse(input, "UTF-8", "http://example.com/");
val e: Elements = doc.getElementsByAttribute("href");
val href: java.util.Iterator[Element] = e.iterator();
while (href.hasNext()) {
var objectName = href.next();
var hrefValue = objectName.attr("href");
var name = objectName.text();
jdkElementsMap + name -> hrefValue
println("size is "+jdkElementsMap.size)
}
jdkElementsMap
}
}
println("size is "+jdkElementsMap.size) always prints "size is 0"
Why is the size always zero, am I not adding to the Map correctly?
Is the only fix for this to convert jdkElementsMap to a var and then use the following?
jdkElementsMap += name -> hrefValue
Removing the while loop, here is my updated object:
package com.parse
import java.io.File
import org.jsoup.nodes.Document
import org.jsoup.Jsoup
import org.jsoup.select.Elements
import org.jsoup.nodes.Element
import scala.collection.immutable.TreeMap
import scala.collection.JavaConverters._
class JdkElementDetail() {
var fileLocation: String = _
def this(fileLocation: String) = {
this()
this.fileLocation = fileLocation;
}
def parseFile : Map[String , String] = {
var jdkElementsMap: Map[String, String] = new TreeMap[String , String];
val input: File = new File(fileLocation);
val doc: Document = Jsoup.parse(input, "UTF-8", "http://example.com/");
val elements: Elements = doc.getElementsByAttribute("href");
val elementsScalaIterator = elements.iterator().asScala
elementsScalaIterator.foreach {
keyVal => {
var hrefValue = keyVal.attr("href");
var name = keyVal.text();
println("size is "+jdkElementsMap.size)
jdkElementsMap += name -> hrefValue
}
}
jdkElementsMap
}
}
Immutable data structures -- be they lists or maps -- are just that: immutable. You don't ever change them, you create new data structures based on changes to the old ones.
If you do val x = jdkElementsMap + (name -> hrefValue), then you'll get the new map on x, while jdkElementsMap continues to be the same.
If you change jdkElementsMap into a var, then you could do jdkElementsMap = jdkElementsMap + (name -> hrefValue), or just jdkElementsMap += (name -> hrefValue). The latter will also work for mutable maps. A tiny standalone sketch of the difference follows below.
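The sketch (the values are made up, purely to show the mechanics):
// var + immutable Map: += desugars into reassigning the var to a new map
var immutableMap: Map[String, String] = Map.empty
immutableMap += ("name" -> "href")   // same as immutableMap = immutableMap + ("name" -> "href")

// mutable Map: += mutates the map in place, so the reference can stay a val
import scala.collection.mutable
val mutableMap = mutable.Map.empty[String, String]
mutableMap += ("name" -> "href")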
Is that the only way? No, but you have to let go of while loops to achieve the same thing. You could replace these lines:
val href: java.util.Iterator[Element] = e.iterator();
while (href.hasNext()) {
var objectName = href.next();
var hrefValue = objectName.attr("href");
var name = objectName.text();
jdkElementsMap + name -> hrefValue
println("size is "+jdkElementsMap.size)
}
jdkElementsMap
With a fold, such as in:
import scala.collection.JavaConverters.asScalaIteratorConverter
e.iterator().asScala.foldLeft(jdkElementsMap) {
case (accumulator, href) => // href here is not an iterator
val objectName = href
val hrefValue = objectName.attr("href")
val name = objectName.text()
val newAccumulator = accumulator + (name -> hrefValue)
println("size is "+newAccumulator.size)
newAccumulator
}
Or with recursion:
def createMap(hrefIterator: java.util.Iterator[Element],
jdkElementsMap: Map[String, String]): Map[String, String] = {
if (hrefIterator.hasNext()) {
val objectName = hrefIterator.next()
val hrefValue = objectName.attr("href")
val name = objectName.text()
val newMap = jdkElementsMap + (name -> hrefValue)
println("size is "+newMap.size)
createMap(hrefIterator, newMap)
} else {
jdkElementsMap
}
}
createMap(e.iterator(), new TreeMap[String, String])
Performance-wise, the fold will be rather slower, and the recursion should be very slightly faster.
Mind you, Scala does provide mutable maps, and not just to be able to say it has them: if they fit your problem better, then go ahead and use them! If you want to learn how to use the immutable ones, then the two approaches above are the ones you should learn.
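For completeness, a rough sketch of the mutable-map variant, reusing the Jsoup Elements value e from the question (note this uses a plain mutable Map rather than the sorted TreeMap of the original):
import scala.collection.JavaConverters._
import scala.collection.mutable

val jdkElementsMutable = mutable.Map.empty[String, String]
for (element <- e.iterator().asScala) {
  // += mutates the map in place, so the size grows as expected
  jdkElementsMutable += (element.text() -> element.attr("href"))
}
val jdkElementsMap = jdkElementsMutable.toMap // back to an immutable Map for callers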
The map is immutable, so any modifications will return the modified map. jdkElementsMap + (name -> hrefValue) returns a new map containing the new pair, but you are discarding the modified map after it is created.
EDIT: It looks like you can convert Java iterables to Scala iterables, so you can then fold over the resulting sequence and accumulate a map:
import scala.collection.JavaConverters._
val e: Elements = doc.getElementsByAttribute("href");
val jdkElementsMap = e.asScala
.foldLeft(new TreeMap[String, String])((map, href) => map + (href.text() -> href.attr("href")))
If you don't care what kind of map you create, you can use toMap:
val jdkElementsMap = e.asScala
.map(href => (href.text(), href.attr("href")))
.toMap