How to dump nested list using SnakeYAML - scala

I'm parsing this yaml file
View:
from : 01.01.2007
to : 04.01.2007
driver : sun.jdbc.odbc.JdbcOdbcDriver
using SnakeYAML in Scala like this:
val stream = getClass.getResourceAsStream("/config_view.yml")
var configMap: Map[String, Any] = new Yaml().load(stream).asInstanceOf[java.util.Map[String, Any]].asScala
var view = configMap("View").asInstanceOf[java.util.LinkedHashMap[String, String]].asScala
view = view + ("from" -> "neu") // some test modifying
and I dump it like this:
val fileWriter = new FileWriter(System.getProperty("user.home") + "\\Desktop\\test.yml")
new Yaml().dump(Map[String, Any]("View" -> view.asJava).asJava, fileWriter)
which saves the new yaml file like this:
View: {driver: sun.jdbc.odbc.JdbcOdbcDriver, from: neu, to: 04.01.2007}
But I want it to save it like this:
View:
driver: sun.jdbc.odbc.JdbcOdbcDriver
from: neu
to: 04.01.2007
How can I tell SnakeYAML to save it in the desired format you see above?

By default SnakeYAML uses the DumperOptions.FlowStyle.FLOW but it can be changed to the DumperOptions.FlowStyle.BLOCK that will dump the data with the desired format.
An example in Kotlin:
val options = DumperOptions()
options.indent = 2
options.defaultFlowStyle = DumperOptions.FlowStyle.BLOCK
Yaml(options).dump(yourObject)

How about manually handling the indentation and key: value formatting:
view.map{ case (k,v) => s"\t$k: $v\n" }
In the case of nested maps you will want a method that
accepts the current "level" of nesting. Place the level tabs in front of the output to give the proper output nesting
checks each of the entries. If it were another collection type then it needs to recursively invoke itself - which will increase the indention level

Related

Scala changing parquet path in config (typesafe)

Currently I have a configuration file like this:
project {
inputs {
baseFile {
paths = ["project/src/test/resources/inputs/parquet1/date=2020-11-01/"]
type = parquet
applyConversions = false
}
}
}
And I want to change the date "2020-11-01" to another one during run time. I read I need a new config object since it's immutable, I'm trying this but I'm not quite sure how to edit paths since it's a list and not a String and it definitely needs to be a list or else it's going to say I haven't configured a path for the parquet.
val newConfig = config.withValue("project.inputs.baseFile.paths"(0),
ConfigValueFactory.fromAnyRef("project/src/test/resources/inputs/parquet1/date=2020-10-01/"))
But I'm getting a:
Error com.typesafe.config.ConfigException$BadPath: path parameter: Invalid path 'project.inputs.baseFile.': path has a leading, trailing, or two adjacent period '.' (use quoted "" empty string if you want an empty element)
What's the correct way to set the new path?
One option you have, is to override the entire array:
import scala.collection.JavaConverters._
val mergedConfig = config.withValue("project.inputs.baseFile.paths",
ConfigValueFactory.fromAnyRef(Seq("project/src/test/resources/inputs/parquet1/date=2020-10-01/").asJava))
But a more elegant way to do this (IMHO), is to create a new config, and to use the existing as a fallback.
For example, we can create a new config:
val newJsonString = """project {
|inputs {
|baseFile {
| paths = ["project/src/test/resources/inputs/parquet1/date=2020-10-01/"]
|}}}""".stripMargin
val newConfig = ConfigFactory.parseString(newJsonString)
And now to merge them:
val mergedConfig = newConfig.withFallback(config)
The output of:
println(mergedConfig.getList("project.inputs.baseFile.paths"))
println(mergedConfig.getString("project.inputs.baseFile.type"))
is:
SimpleConfigList(["project/src/test/resources/inputs/parquet1/date=2020-10-01/"])
parquet
As expected.
You can read more about Merging config trees. Code run at Scastie.
I didn't find any way to replace one element of the array with withValue.

Add element to the xml string in scala

I have the following relatively simple scenario, but it’s working.
I need an append to my xml string, here's the scenario:
val xmlStr = "<return> <numberPin> 123456 </numberPin> </return>"
I need some way to add the element data and return the string below, I would like some solution with regular expression if possible
"<return> <numberPin> 123456 </numberPin> <date> 2019-09-04 00:00:00 </date> </return>"
You can create a template xml at first that can be updated at runtime.
You can do something like below:
def updateXml (xmlStr:String, dateContent: String) = {
xmlStr.replace("DATE_DATA", dateContent)
}
val xmlStr = "<return> <numberPin> 123456 </numberPin> DATE_DATA </return>"
val dateData = "<date> 2019-09-04 00:00:00 </date>"
updateXml(xmlStr, dateData)
Another alternative is to create an xml template in a file(if the xml content is like a big file). Read it in your code and insert required data at run-time as shown in the above example(where i stuffed DATE_DATA in template and replaced it at runtime using the method).

Mixed Content XML parsing using DataFrame

I have an XML document that has mixed content and I am using a custom schema in Dataframe to parse it. I am having an issue where the schema will only pick up the text for "Measure".
The XML looks like this
<QData>
<Measure> some text here
<Answer>Answer1</Answer>
<Question>Question1</Question>
</Measure>
<Measure> some text here
<Answer>Answer1</Answer>
<Question>Question1</Question>
</Meaure>
</QData>
My schema is as follows:
def getCustomSchema():StructType = {StructField("QData",
StructType(Array(
StructField("Measure",
StructType( Array(
StructField("Answer",StringType,true),
StructField("Question",StringType,true)
)),true)
)),true)}
When I try to access the data in Measure I am only getting "some text here" and it fails when I try to get info from Answer. I am also just getting one Measure.
EDIT: This is how I am trying to access the data
val result = sc.read.format("com.databricks.spark.xml").option("attributePrefix", "attr_").schema(getCustomSchema)
.load(filename.toString)
val qDfTemp = result.mapPartitions(partition =>{val mapper = new QDMapper();partition.map(row=>{mapper(row)}).flatMap(list=>list)}).toDF()
case class QDMapper(){
def apply(row: Row):List[QData]={
val qDList = new ListBuffer[QData]()
val qualData = row.getAs[Row]("QData") //When I print as list I get the first Measure text and that is it
val measure = qualData.getAs[Row]("Measure") //This fails
}
}
you can use row tag as a root tag and access other element:-
df_schema = sqlContext.read.format('com.databricks.spark.xml').options(rowTag='<xml_tag_name>').load(schema_path)
please visit https://github.com/harshaltaware/Pyspark/blob/main/Spark-data-parsing/xmlparsing.py for brief code

Way to Extract list of elements from Scala list

I have standard list of objects which is used for the some analysis. The analysis generates a list of Strings and i need to look through the standard list of objects and retrieve objects with same name.
case class TestObj(name:String,positions:List[Int],present:Boolean)
val stdLis:List[TestObj]
//analysis generates a list of strings
var generatedLis:List[String]
//list to save objects found in standard list
val lisBuf = new ListBuffer[TestObj]()
//my current way
generatedLis.foreach{i=>
val temp = stdLis.filter(p=>p.name.equalsIgnoreCase(i))
if(temp.size==1){
lisBuf.append(temp(0))
}
}
Is there any other way to achieve this. Like having an custom indexof method that over rides and looks for the name instead of the whole object or something. I have not tried that approach as i am not sure about it.
stdLis.filter(testObj => generatedLis.exists(_.equalsIgnoreCase(testObj.name)))
use filter to filter elements from 'stdLis' per predicate
use exists to check if 'generatedLis' has a value of ....
Don't use mutable containers to filter sequences.
Naive solution:
val lisBuf =
for {
str <- generatedLis
temp = stdLis.filter(_.name.equalsIgnoreCase(str))
if temp.size == 1
} yield temp(0)
if we discard condition temp.size == 1 (i'm not sure it is legal or not):
val lisBuf = stdLis.filter(s => generatedLis.exists(_.equalsIgnoreCase(s.name)))

Scala Play upload file within a form

How would I upload a file within a form defined with Scala Play's play.api.data.Forms framework. I want the file to be stored under Treatment Image.
val cForm: Form[NewComplication] = Form(
mapping(
"Name of Vital Sign:" -> of(Formats.longFormat),
"Complication Name:" -> text,
"Definition:" -> text,
"Reason:" -> text,
"Treatment:" -> text,
"Treatment Image:" -> /*THIS IS WHERE I WANT THE FILE*/,
"Notes:" -> text,
"Weblinks:" -> text,
"Upper or Lower Bound:" -> text)
(NewComplication.apply _ )(NewComplication.unapply _ ))
is there a simple way to do this? By using built in Formats?
I think you have to handle the file component of a multipart upload separately and combine it with your form data afterwards. You could do this several ways, depending on what you want the treatment image field to actually be (the file-path as a String, or, to take you literally, as a java.io.File object.)
For that last option, you could make the treatment image field of your NewComplication case class an Option[java.io.File] and handle it in your form mapping with ignored(Option.empty[java.io.File]) (so it won't be bound with the other data.) Then in your action do something like this:
def createPost = Action(parse.multipartFormData) { implicit request =>
request.body.file("treatment_image").map { picture =>
// retrieve the image and put it where you want...
val imageFile = new java.io.File("myFileName")
picture.ref.moveTo(imageFile)
// handle the other form data
cForm.bindFromRequest.fold(
errForm => BadRequest("Ooops"),
complication => {
// Combine the file and form data...
val withPicture = complication.copy(image = Some(imageFile))
// Do something with result...
Redirect("/whereever").flashing("success" -> "hooray")
}
)
}.getOrElse(BadRequest("Missing picture."))
}
A similar thing would apply if you wanted just to store the file path.
There are several ways to handle file upload which will usually depend on what you're doing with the files server-side, so I think this approach makes sense.