Scala ObjectMapper writes incorrect value - scala

I have a complex object which contains another object, and I want to write it out as a YAML file. I am using Scala with ObjectMapper. My final object looks like this:
Configuration(Some(List(s3://etl/configuration/minioApp/23-08-2021/metrics.yaml)),Some(Map(inputDataFrame -> Input(None,Some(Kafka(List(uguyhi, ytvvt),Some(hgvgvugb),Some(ytfytvi),Some(yftug),None,None,None,Some( uyguytf)))))),None,Some(Output(None,Some(Kafka(null,Some(yrdryft),None)))),None,None,None,None,None,None,None,None,Some(minioApp),None,None,None,None)
I want to write it to a file. My main case class looks like this:
case class Configuration(@BeanProperty var metrics: Option[Seq[String]],
                         @BeanProperty var inputs: Option[Map[String, Input]],
                         @BeanProperty var variables: Option[Map[String, String]] = None,
                         @BeanProperty var output: Option[Output] = None,
                         @BeanProperty var outputs: Option[Map[String, Output]] = None,
                         @BeanProperty var cacheOnPreview: Option[Boolean] = None,
                         @BeanProperty var showQuery: Option[Boolean] = None,
                         @BeanProperty var streaming: Option[Streaming] = None,
                         @BeanProperty var periodic: Option[Periodic] = None,
                         @BeanProperty var logLevel: Option[String] = None,
                         @BeanProperty var showPreviewLines: Option[Int] = None,
                         @BeanProperty var explain: Option[Boolean] = None,
                         @BeanProperty var appName: Option[String] = None,
                         @BeanProperty var continueOnFailedStep: Option[Boolean] = None,
                         @BeanProperty var cacheCountOnOutput: Option[Boolean] = None,
                         @BeanProperty var ignoreDeequValidations: Option[Boolean] = None,
                         @BeanProperty var failedDFLocationPrefix: Option[String] = None) extends Conf
The Input, Output, and other nested classes also have @BeanProperty added. I am trying to write this file using the snippet below:
val objectMapper = new ObjectMapper(new YAMLFactory)
objectMapper.writeValue(new File("/Users/ayush.goyal/Downloads/servingWrapper/src/main/resources/input1.yaml"),jobYamls)
where jobYamls is the object shown above. I am getting the file below:
metrics:
  empty: false
  defined: true
inputs:
  empty: false
  defined: true
variables:
  empty: true
  defined: false
output:
  empty: false
  defined: true
outputs:
  empty: true
  defined: false
cacheOnPreview:
  empty: true
  defined: false
showQuery:
  empty: true
  defined: false
streaming:
  empty: true
  defined: false
periodic:
  empty: true
  defined: false
logLevel:
  empty: true
  defined: false
showPreviewLines:
  empty: true
  defined: false
explain:
  empty: true
  defined: false
appName:
  empty: false
  defined: true
continueOnFailedStep:
  empty: true
  defined: false
cacheCountOnOutput:
  empty: true
  defined: false
ignoreDeequValidations:
  empty: true
  defined: false
failedDFLocationPrefix:
  empty: true
  defined: false
Now this file has 2 issues:
1. The values of the object are not populated.
2. I don't want to populate fields that have a null value.
How can I do that?
EDIT:
As per the suggestion, I was able to get the YAML below:
---
metrics:
- "s3://etl/configuration/minioApp/23-08-2021/metrics.yaml"
inputs:
  inputDataFrame:
    file: null
    kafka:
      servers:
      - "uguyhi"
      - "ytvvt"
      topic: "hgvgvugb"
      topicPattern: "ytfytvi"
      consumerGroup: "yftug"
      options: null
      schemaRegistryUrl: "yrftg"
      schemaSubject: null
      schemaId: " uyguytf"
output:
  file: null
  kafka:
    vservers: null
    checkpointLocation: "yrdryft"
    compressionType: null
streaming: null
appName: "minioApp"
As you can see in the above YAML, several fields have a null value. I don't want to write them; I just don't want those fields to appear in the YAML at all. How can I do that?

Jackson doesn't know how to serialize Option out of the box, as it's a Java library.
You can add the jackson-module-scala library and register the DefaultScalaModule on your ObjectMapper to let it know how to serialize common Scala types.
See https://github.com/FasterXML/jackson-module-scala
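A minimal sketch of how that might look, assuming jackson-module-scala is on the classpath; setSerializationInclusion with NON_ABSENT additionally covers the second issue by skipping fields that are null or None (the output path is shortened here):

import java.io.File
import com.fasterxml.jackson.annotation.JsonInclude
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory
import com.fasterxml.jackson.module.scala.DefaultScalaModule

val objectMapper = new ObjectMapper(new YAMLFactory)
// Teach Jackson how to serialize Option, Seq, Map and other Scala types
objectMapper.registerModule(DefaultScalaModule)
// Skip fields whose value is null or None instead of writing them to the YAML
objectMapper.setSerializationInclusion(JsonInclude.Include.NON_ABSENT)
objectMapper.writeValue(new File("input1.yaml"), jobYamls)

With NON_ABSENT (available since Jackson 2.6), empty Options are treated the same as nulls and left out of the output entirely.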

Related

Scala spark dataframe parse string field to typesafe Config error

I'm trying to parse a String field from a Spark DataFrame into a Config, reading the DataFrame as a specific case class. The objective is to get a new row in the DataFrame for each input and output that the config field contains. I'm far from having a solution, and I'm fighting an error about serializable objects.
The config field comes as this:
data {
  inputs = [
    {
      name = "table1"
      paths = ["/data/master/table1"]
    },
    {
      name = "table2"
      paths = ["/data/master/table2"]
    }
  ]
  outputs = [
    {
      name = "table1"
      type = "parquet"
      mode = "append"
      force = true
      path = "/data/master/table1"
    },
    {
      name = "table2"
      type = "parquet"
      mode = "append"
      force = true
      path = "/data/master/table2"
    }
  ]
}
Code I'm using:
// case class to read the config field
case class JobConf(name: String, job: String, version: Long, updatedAt: Timestamp, conf: String)

// case class for the output dataframe
case class InputOutputJob(name: String, job: String, version: String, path: String,
                          inputOuput: String)

val jobsDF = spark.read.parquet("path")

jobsDF.map { job =>
  val x = JobConf(job.getString(0), job.getString(1), job.getInt(2), job.getTimestamp(3), job.getString(4))
  val c = ConfigFactory.parseString(x.conf, ConfigParseOptions.defaults().setSyntax(ConfigSyntax.PROPERTIES))
  val p: Map[String, ConfigValue] = c.root().asScala
  // Pending
  InputOutputJob()
}.toDF
The error that comes when I try to compile is:
Cause: java.io.NotSerializableException:
Serialization stack:
- object not serializable
- element of array (index: 0)
- array (class [Ljava.lang.Object;, size 1)
- field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)

Serializing/Deserializing model using spray json scala test

I am new to Scala and am writing test cases using ScalaTest and spray-json. My code is as follows:
case class MyModel(Point1: String,
                   Point2: String,
                   Point3: Seq[String],
                   Point4: Seq[String])
it should "serialise/deserialize a MyModel to JSON" in {
val json= """{"Point1":"","Point3":[],"Point2":"","Point4":[]}""".parseJson
val myModelViaJson= json.convertTo[MyModel]
myModelViaJson.Point1 shouldBe ""
myModelViaJson.Point3.isEmpty shouldBe true
myModelViaJson.Point2 shouldBe ""
myModelViaJson.Point4.isEmpty shouldBe true
}
On doing sbt test I am getting the following error:
should serialise/deserialize a MyModel to JSON *** FAILED ***
[info] spray.json.DeserializationException: Expected String as JsString, but got []
[info] at spray.json.package$.deserializationError(package.scala:23)
[info] at spray.json.ProductFormats.fromField(ProductFormats.scala:63)
[info] at spray.json.ProductFormats.fromField$(ProductFormats.scala:51)
How to solve this?
Define the implicit JsonFormat for MyModel (via jsonFormat4) before calling json.convertTo[MyModel].
Refer: jsonformats-for-case-classes
So the code will look like:
val json= """{"Point1":"","Point3":[],"Point2":"","Point4":[]}""".parseJson
implicit val format = jsonFormat4(MyModel)
val myModelViaJson= json.convertTo[MyModel]
myModelViaJson.Point1 shouldBe ""
myModelViaJson.Point3.isEmpty shouldBe true
myModelViaJson.Point2 shouldBe ""
myModelViaJson.Point4.isEmpty shouldBe true

How can I create a custom CellEncoder in kantan

I have code that converts a list of case classes into a CSV string. I'm using kantan, and when I tried to pass the encoder I got this error:
could not find implicit value for evidence parameter of type kantan.csv.CellEncoder[Option[javax.xml.datatype.XMLGregorianCalendar]]
The original date: 2020-08-13T21:52:27.000Z
This is my code:
import kantan.csv._
import kantan.csv.ops._
import kantan.csv.java8._
import kantan.csv.CellEncoder
val itemsList: List[ItemData] = getItems.getOrElse(Seq.empty[ItemData]).toList

implicit val itemEncoder: HeaderEncoder[ItemData] =
  HeaderEncoder.caseEncoder("absolutePath", "creationDate", "displayName", "fileName",
    "lastModified", "lastModifier", "owner", "parentAbsolutePath", "typeValue")(ItemData.unapply _)

val csvItems: String = itemsList.asCsv(rfc.withHeader)
The case class:
case class ItemData(absolutePath: Option[String] = None,
                    creationDate: Option[javax.xml.datatype.XMLGregorianCalendar] = None,
                    displayName: Option[String] = None,
                    fileName: Option[String] = None,
                    lastModified: Option[javax.xml.datatype.XMLGregorianCalendar] = None,
                    lastModifier: Option[String] = None,
                    owner: Option[String] = None,
                    parentAbsolutePath: Option[String] = None,
                    typeValue: Option[String] = None)
dependencies:
lazy val `kantan-csv` = "com.nrinaudo" %% "kantan.csv" % Version.kantan
lazy val `kantan-csv-commons` = "com.nrinaudo" %% "kantan.csv-commons" % Version.kantan
lazy val `kantan-csv-generic` = "com.nrinaudo" %% "kantan.csv-generic" % Version.kantan
lazy val `kantan-csv-java8` = "com.nrinaudo" %% "kantan.csv-java8" % Version.kantan
I don't actually know much about javax.xml.datatype.XMLGregorianCalendar, so I'm not sure how you'd represent that as a string. This answer assumes it's done by calling toString, but change that to whatever is the correct way of doing so.
You need to provide a CellEncoder[javax.xml.datatype.XMLGregorianCalendar]. This is documented, and fairly straightforward:
implicit val xmlCalendarEncoder: CellEncoder[javax.xml.datatype.XMLGregorianCalendar] = CellEncoder.from(_.toString)
kantan.csv should be able to work out the rest for you.
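For context, a sketch of how this could fit together with the code from the question; _.toString is the assumption from above, and toXMLFormat is only mentioned as an alternative if the strict xsd:dateTime lexical form is wanted:

import kantan.csv._
import kantan.csv.ops._

// Encode the calendar as a string; swap _.toString for _.toXMLFormat if you
// want the lexical xsd:dateTime form, e.g. 2020-08-13T21:52:27.000Z
implicit val xmlCalendarEncoder: CellEncoder[javax.xml.datatype.XMLGregorianCalendar] =
  CellEncoder.from(_.toString)

// With the cell encoder in implicit scope, the header encoder derives as before
implicit val itemEncoder: HeaderEncoder[ItemData] =
  HeaderEncoder.caseEncoder("absolutePath", "creationDate", "displayName", "fileName",
    "lastModified", "lastModifier", "owner", "parentAbsolutePath", "typeValue")(ItemData.unapply _)

val csvItems: String = itemsList.asCsv(rfc.withHeader)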

Read yaml of same nested structure with different properties in scala

I am trying to make a generic function to read YAML files that have the same nested structure but different properties, in Scala, using SnakeYAML. For example, one YAML could be:
myMap:
  -
    name: key1
    value: value1
  -
    name: key2
    value: value2
Another YAML could be:
myMap:
  -
    name: key1
    value: value1
    data: data1
  -
    name: key2
    value: value2
    data: data2
To read the first YAML, I am able to use the code below, from here:
class configParamsKeyValue {
  @BeanProperty var name: String = null
  @BeanProperty var value: String = null
}

class myConfig {
  @BeanProperty var myMap = new java.util.ArrayList[configParamsKeyValue]()
}

def loadConfig(filename: String): myConfig = {
  val yaml = new Yaml(new Constructor(classOf[myConfig]))
  val stream = new FileInputStream(filename)
  try {
    val obj = yaml.load(stream)
    obj.asInstanceOf[myConfig]
  } finally {
    stream.close()
  }
}
I want to be able to pass the element type of the ArrayList (configParamsKeyValue) as a parameter to the class myConfig, so that I can read the second YAML file as well by defining another class like:
class configParamsKeyValueData {
  @BeanProperty var name: String = null
  @BeanProperty var value: String = null
  @BeanProperty var data: String = null
}
Can somebody please help?
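No answer is given in this thread, but one possible direction, sketched under the assumption that SnakeYAML needs a concrete top-level class to bind to, is to parameterise the loader rather than myConfig itself; the class name myConfigData and the file names below are hypothetical:

import java.io.FileInputStream
import scala.beans.BeanProperty
import org.yaml.snakeyaml.Yaml
import org.yaml.snakeyaml.constructor.Constructor

// Hypothetical second top-level config, binding the list to the richer element type
class myConfigData {
  @BeanProperty var myMap = new java.util.ArrayList[configParamsKeyValueData]()
}

// Generic loader: the top-level class to bind to is passed in as a parameter
def loadConfig[T](filename: String, clazz: Class[T]): T = {
  val yaml = new Yaml(new Constructor(clazz))
  val stream = new FileInputStream(filename)
  try yaml.loadAs(stream, clazz)
  finally stream.close()
}

val first = loadConfig("first.yaml", classOf[myConfig])        // name/value entries
val second = loadConfig("second.yaml", classOf[myConfigData])  // name/value/data entries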

Retrieving list of objects from application.conf

I have the following entry in Play for Scala application.conf:
jobs = [
  { number: 0, dir: "/dir1", name: "General" },
  { number: 1, dir: "/dir2", name: "Customers" }
]
I want to retrieve this list of objects in a Scala program:
val conf = ConfigFactory.load
val jobs = conf.getAnyRefList("jobs").asScala
println(jobs)
This prints:
Buffer({number=0, name=General, dir=/dir1}, {number=1, name=Customers, dir=/dir2})
But how do I convert the result to actual Scala objects?
Try this one:
case class Job(number: Int, dir: String, name: String)

object Job {
  implicit val configLoader: ConfigLoader[List[Job]] = ConfigLoader(_.getConfigList).map(
    _.asScala.toList.map(config =>
      Job(
        config.getInt("number"),
        config.getString("dir"),
        config.getString("name")
      )
    )
  )
}
Then, using the Configuration obtained from DI:
Configuration.get[List[Job]]("jobs")
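For illustration, a sketch of how that call might look with a Configuration injected by Play's DI; the class name JobsComponent is hypothetical:

import javax.inject.Inject
import play.api.Configuration

class JobsComponent @Inject() (configuration: Configuration) {
  // Picks up the implicit ConfigLoader[List[Job]] from the Job companion object
  val jobs: List[Job] = configuration.get[List[Job]]("jobs")
}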
Here is a Config object which will extract data from a config file into a type that you specify.
Usage:
case class Job(number: Int, dir: String, name: String)
val jobs = Config[List[Job]]("jobs")
Code:
import com.typesafe.config._
import org.json4s._
import org.json4s.jackson.JsonMethods._
object Config {
  private val conf = ConfigFactory.load()
  private val jData = parse(conf.root.render(ConfigRenderOptions.concise))

  def apply[T](name: String)(implicit formats: Formats = DefaultFormats, mf: Manifest[T]): T =
    Extraction.extract(jData \\ name)(formats, mf)
}
This will throw an exception if the particular config object does not exist or does not match the format of T.