I found one library for this, https://github.com/daltontf/scala-yaml, but not many developers seem to use it and it's pretty outdated. There might also have been http://www.lag.net/configgy/, but that link appears to be dead.
I wonder, what is the most popular or de-facto library for working with YAML in Scala?
Here's an example of using the Jackson YAML databinding.
First, here's our sample document:
name: test
parameters:
  "VERSION": 0.0.1-SNAPSHOT
things:
  - colour: green
    priority: 128
  - colour: red
    priority: 64
Add these dependencies:
libraryDependencies ++= Seq(
  "com.fasterxml.jackson.core" % "jackson-core" % "2.1.1",
  "com.fasterxml.jackson.core" % "jackson-annotations" % "2.1.1",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.1.1",
  "com.fasterxml.jackson.dataformat" % "jackson-dataformat-yaml" % "2.1.1"
)
Here's our outermost class (Preconditions is Guava's null check; it raises an exception if the given field is not in the YAML):
import java.util.{List => JList, Map => JMap}
import collection.JavaConversions._
import com.fasterxml.jackson.annotation.JsonProperty
import com.google.common.base.Preconditions

class Sample(@JsonProperty("name") _name: String,
             @JsonProperty("parameters") _parameters: JMap[String, String],
             @JsonProperty("things") _things: JList[Thing]) {
  val name = Preconditions.checkNotNull(_name, "name cannot be null")
  val parameters: Map[String, String] = Preconditions.checkNotNull(_parameters, "parameters cannot be null").toMap
  val things: List[Thing] = Preconditions.checkNotNull(_things, "things cannot be null").toList
}
And here's the inner class:
import com.fasterxml.jackson.annotation.JsonProperty
import com.google.common.base.Preconditions

class Thing(@JsonProperty("colour") _colour: String,
            @JsonProperty("priority") _priority: Int) {
  val colour = Preconditions.checkNotNull(_colour, "colour cannot be null")
  val priority = Preconditions.checkNotNull(_priority, "priority cannot be null")
}
Finally, here's how to instantiate it:
import java.io.FileReader
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory
val reader = new FileReader("sample.yaml")
val mapper = new ObjectMapper(new YAMLFactory())
val config: Sample = mapper.readValue(reader, classOf[Sample])
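For a quick sanity check, the parsed values can be inspected; the expected output below is inferred from the sample document above:
println(config.name)                  // test
println(config.parameters("VERSION")) // 0.0.1-SNAPSHOT
println(config.things.map(_.colour))  // List(green, red)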
A little late to the party, but I think this method works in the most seamless way. This method has:
Automatic conversion to scala collection types
Support for case classes
No need for boilerplate code like BeanProperty/JsonProperty
Uses Jackson-YAML & Jackson-scala
Code:
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory
import com.fasterxml.jackson.module.scala.DefaultScalaModule
case class Prop(url: List[String])
// uses Jackson YAML for parsing; relies on SnakeYAML for low-level handling
val mapper: ObjectMapper = new ObjectMapper(new YAMLFactory())
// provides all of the Scala goodiness
mapper.registerModule(DefaultScalaModule)
val prop: Prop = mapper.readValue("url: [abc, def]", classOf[Prop])
// prints List(abc, def)
println(prop.url)
SnakeYAML is a high-quality, actively maintained YAML parser/renderer for Java. You can of course use it from Scala.
If you're already working with circe, you might be interested in circe-yaml which uses SnakeYAML to parse a YAML file and then converts the result to a circe AST.
I would love to see a library that could parse either JSON or YAML (or whatever -- pluggable) to a common AST and then construct Scala objects using typeclasses. Several JSON libraries work like that (and of course can also render JSON for objects using the same typeclasses), but I don't know of such a facility for YAML.
PS: There also appear to be a number of seemingly abandoned wrappers for SnakeYAML, namely HelicalYAML and yaml4s
And now we have circe-yaml https://github.com/circe/circe-yaml
SnakeYAML provides a Java API for parsing YAML and marshalling its structures into JVM classes. However, you might find circe's way of marshalling into a Scala ADT preferable -- using compile-time specification or derivation rather than runtime reflection. This enables you to parse YAML into Json, and use your existing (or circe's generic) Decoders to perform the ADT marshalling. You can also use circe's Encoder to obtain a Json, and print that to YAML using this library.
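A minimal sketch of that flow, close to the circe-yaml README (assumes circe-yaml and circe-generic on the classpath, and Scala 2.12+ for the right-biased Either):
import io.circe.generic.auto._
import io.circe.yaml.parser

case class Thing(colour: String, priority: Int)

val doc =
  """colour: green
    |priority: 128
    |""".stripMargin

// parse the YAML into a circe Json AST, then decode it with a derived Decoder
val decoded: Either[io.circe.Error, Thing] =
  parser.parse(doc).flatMap(_.as[Thing])

println(decoded) // Right(Thing(green,128))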
I came across moultingyaml today.
MoultingYAML is a Scala wrapper for SnakeYAML based on spray-json.
It looks quite familiar to me, having worked for years with spray-json. I think it might fit @sihil's need for a "compelling" and "mature" Scala YAML library.
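A small sketch of what that looks like, adapted from the MoultingYAML README (the Color class and values are just the README's example, so treat the details as illustrative):
import net.jcazevedo.moultingyaml._

case class Color(name: String, red: Int, green: Int, blue: Int)

object MyYamlProtocol extends DefaultYamlProtocol {
  implicit val colorFormat = yamlFormat4(Color)
}
import MyYamlProtocol._

// spray-json style round trip: case class -> YAML AST -> case class
val yamlAst = Color("CadetBlue", 95, 158, 160).toYaml
val color   = yamlAst.convertTo[Color]

// parsing a YAML string also goes through the AST
val parsed = "{ name: CadetBlue, red: 95, green: 158, blue: 160 }".parseYaml.convertTo[Color]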
For anyone else who runs across this answer and is looking for help and examples, I found a basic example that uses SnakeYAML. Hope it helps. Here's the code:
package yaml
import org.yaml.snakeyaml.Yaml
import org.yaml.snakeyaml.constructor.Constructor
import scala.collection.mutable.ListBuffer
import scala.beans.BeanProperty
object YamlBeanTest1 {

  val text = """
accountName: Ymail Account
username: USERNAME
password: PASSWORD
mailbox: INBOX
imapServerUrl: imap.mail.yahoo.com
protocol: imaps
minutesBetweenChecks: 1
usersOfInterest: [barney, betty, wilma]
"""

  def main(args: Array[String]) {
    val yaml = new Yaml(new Constructor(classOf[EmailAccount]))
    val e = yaml.load(text).asInstanceOf[EmailAccount]
    println(e)
  }
}
/**
* With the Snakeyaml Constructor approach shown in the main method,
* this class must have a no-args constructor.
*/
class EmailAccount {
  @BeanProperty var accountName: String = null
  @BeanProperty var username: String = null
  @BeanProperty var password: String = null
  @BeanProperty var mailbox: String = null
  @BeanProperty var imapServerUrl: String = null
  @BeanProperty var minutesBetweenChecks: Int = 0
  @BeanProperty var protocol: String = null
  @BeanProperty var usersOfInterest = new java.util.ArrayList[String]()

  override def toString: String =
    "acct (%s), user (%s), url (%s)".format(accountName, username, imapServerUrl)
}
So I don't have enough reputation to comment (41 atm), but I thought my experience was worth mentioning.
After reading this thread, I decided to try the Jackson YAML parser because I didn't want zero-argument constructors and it was much more readable. What I didn't realize was that there is no support for inheritance (merging), and only limited support for anchor references (isn't that the whole point of YAML??).
Merge is explained here.
Anchor reference is explained here. While it appears that complex anchor reference is supported, I could not get it to work in a simple case.
In my experience JSON libraries for Scala are more mature and easier to use (none of the YAML approaches are enormously compelling or as mature as JSON equivalents when it comes to dealing with case classes or writing custom serialisers and deserialisers).
As such I prefer converting from YAML to JSON and then using a JSON library. This might sound slightly crazy, but it works really well provided that:
You are only working with YAML that is a subset of JSON (a great deal of use cases in my experience)
The path is not performance critical (as there is overhead in taking this approach)
The approach I use for converting from YAML to JSON leverages Jackson:
import com.fasterxml.jackson.core.util.DefaultPrettyPrinter
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory

val tree = new ObjectMapper(new YAMLFactory()).readTree(yamlTemplate)
val json = new ObjectMapper()
  .writer(new DefaultPrettyPrinter().withoutSpacesInObjectEntries())
  .writeValueAsString(tree)
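From there any Scala JSON library can take over. Purely as an illustration (MyConfig and its fields are hypothetical, and play-json is just one possible choice):
import play.api.libs.json.{Json, Reads}

// hypothetical target class for the converted document
case class MyConfig(name: String, parameters: Map[String, String])
implicit val myConfigReads: Reads[MyConfig] = Json.reads[MyConfig]

val config: MyConfig = Json.parse(json).as[MyConfig]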
I want to access all the parameters from BeamConfig.scala in another Scala class. The parameters stored in BeamConfig.scala look like this:
case class WarmStart(
  enabled: scala.Boolean,
  path: java.lang.String
)

object WarmStart {
  def apply(c: com.typesafe.config.Config): BeamConfig.Beam.WarmStart = {
    BeamConfig.Beam.WarmStart(
      enabled = c.hasPathOrNull("enabled") && c.getBoolean("enabled"),
      path = if (c.hasPathOrNull("path")) c.getString("path") else "output"
    )
  }
}
So there are more than 100 parameter objects like the one above in BeamConfig.scala. If I want to get a parameter from this file, I do it like this:
beam.warmStart.enabled
beam.warmStart.path
Here beam is the root class. So is there any way I can access all the parameters in bulk, or store all the objects in some Map or something similar?
Thanks
There are a couple of different ways you could do this:
Using Typesafe Config in a somewhat unsafe-ish manner:
https://github.com/lightbend/config#api-example
This would give you map-like access, but it can very easily explode if the names are wrong, the types don't line up, etc.
Using PureConfig (a wrapper around Typesafe Config that allows automatic derivation of case-class-based config decoders, kind of like circe for JSON):
https://pureconfig.github.io/docs/
So you'd have to write your large case class with 100 fields once, but you get safe decoding of the config into that case class, and after that you have normal named properties with their correct types.
(Note that this loses you invariance under rename refactorings.)
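To illustrate both options, here is a minimal sketch. The config paths and case class layout are assumptions based on the question, and PureConfig's API has moved around between versions, so check the docs for the one you use:
import com.typesafe.config.ConfigFactory
import pureconfig._
import pureconfig.generic.auto._

// Option 1: raw Typesafe Config access -- map-like, but fails at runtime on typos
val raw = ConfigFactory.load()
val warmStartEnabled = raw.getBoolean("beam.warmStart.enabled")

// Option 2: PureConfig -- derive a decoder for case classes mirroring the config.
// Note: by default PureConfig maps camelCase fields to kebab-case keys
// (warm-start); a ProductHint can change that if your keys are camelCase.
case class WarmStart(enabled: Boolean, path: String)
case class Beam(warmStart: WarmStart /* ...the other ~100 sections... */)
case class AppConfig(beam: Beam)

val appConfig: AppConfig = ConfigSource.default.loadOrThrow[AppConfig]
println(appConfig.beam.warmStart.path)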
Firstly, I would separate the code that reads the config from the code that processes the results. In this case the default value "output" is embedded in the code that reads the config when it should probably be done in a separate pass.
Secondly, I would use a package to automatically populate a case class from a config entry. You then need one line per config object, and you get the results checked for you. E.g.
object BeamConfig {
  val warmStart = Config[WarmStart]("warmStart")
  val config2 = Config[Config2]("config2")
  ...
}
If you need some processing you can do this
val warmStart = ProcessWarmStart(Config[WarmStart]("warmStart"))
This approach still requires a bit of boilerplate code, but it has better type safety than a bulk import of the config.
I would also consider combining the objects into fewer, nested objects with matching nested case classes.
Here is a cut-down version of Config using json4s and Jackson:
import com.typesafe.config._
import org.json4s._
import org.json4s.jackson.JsonMethods._

object Config {
  private val cfgFile = "configFile"
  private val conf = ConfigFactory.load(cfgFile).withFallback(ConfigFactory.load())
  private val jData = parse(conf.root.render(ConfigRenderOptions.concise))

  def apply[T](name: String)(implicit formats: Formats = DefaultFormats, mf: Manifest[T]): T =
    Extraction.extract(jData \\ name)(formats, mf)
}
This will throw an exception if the particular config object does not exist or does not match the format of class T.
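For illustration, hypothetical usage (the config file name and its contents are assumptions):
// assuming the loaded config contains:  warmStart { enabled = true, path = "output" }
case class WarmStart(enabled: Boolean, path: String)

val warmStart = Config[WarmStart]("warmStart") // WarmStart(true, "output")
// Config[WarmStart]("noSuchEntry") would throw, since nothing in the config matches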
I have a large result set from a database call that I need to stream back to the user as it can't all fit into memory.
I am able to stream the results from the database back by setting the options
val statement = session.conn.prepareStatement(query,
  java.sql.ResultSet.TYPE_FORWARD_ONLY,
  java.sql.ResultSet.CONCUR_READ_ONLY)
statement.setFetchSize(Integer.MIN_VALUE)
....
....
val res = statement.executeQuery
And then by using an Iterator
val result = new Iterator[MyResultClass] {
  def hasNext = res.next
  def next = MyResultClass(someValue = res.getString("someColumn"), anotherValue = res.getInt("anotherValue"))
}
In Scala, Iterator extends TraversableOnce which should allow me to pass the Iterator to the Enumerator class that is used for the Chunked Response in the play framework according to the documentation at https://www.playframework.com/documentation/2.3.x/ScalaStream
When looking at the source code for Enumerator I discovered that it has an overloaded apply method for consuming a TraversableOnce object
I tried using the following code
import play.api.libs.iteratee.Enumerator
val dataContent = Enumerator(result)
Ok.chunked(dataContent)
But this isn't working as it throws the following exception
Cannot write an instance of Iterator[MyResultClass] to HTTP response. Try to define a Writeable[Iterator[MyResultClass]]
I can't find anything in the documentation that explains what Writeable is or does. I thought once the Enumerator consumed the TraversableOnce object it would take it from there, but I guess not??
Problem in your approach
There are two problems with your approach:
You are writing the Iterator itself to the Enumerator / Iteratee. You should write the content of the Iterator, not the whole Iterator.
Play doesn't know how to express objects of MyResultClass on an HTTP stream. Convert them to a String representation (e.g. JSON) before writing them.
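In short, the fix looks roughly like this (a condensed sketch; the full worked example follows below):
import play.api.libs.iteratee.Enumerator
import play.api.libs.json.{JsValue, Json, Writes}
import play.api.mvc.{Action, Controller}
import scala.concurrent.ExecutionContext.Implicits.global

case class MyResultClass(someValue: String, anotherValue: Int)

object StreamController extends Controller {
  // teach Play how to render MyResultClass as JSON
  implicit val myResultWrites: Writes[MyResultClass] = Json.writes[MyResultClass]

  def streamed = Action {
    // the JDBC-backed iterator from the question goes here
    val result: Iterator[MyResultClass] = Iterator.empty

    // enumerate the *elements* of the iterator and map each one to JSON
    val dataContent: Enumerator[JsValue] = Enumerator.enumerate(result).map(Json.toJson(_))
    Ok.chunked(dataContent)
  }
}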
Example
build.sbt
A simple Play Scala project with H2 and SQL support.
lazy val root = (project in file(".")).enablePlugins(PlayScala)

scalaVersion := "2.11.6"

libraryDependencies ++= Seq(
  jdbc,
  "org.scalikejdbc" %% "scalikejdbc" % "2.2.4",
  "com.h2database" % "h2" % "1.4.185",
  "ch.qos.logback" % "logback-classic" % "1.1.2"
)
project/plugins.sbt
Just the minimal config for the sbt play plugin in the current stable version
resolvers += "Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/"
addSbtPlugin("com.typesafe.play" % "sbt-plugin" % "2.3.8")
conf/routes
Just one route on /json
GET /json controllers.Application.json
Global.scala
Configuration file, creates and fills the database with demo data during startup of the Play application
import play.api.Application
import play.api.GlobalSettings
import scalikejdbc._

object Global extends GlobalSettings {

  override def onStart(app: Application): Unit = {
    // initialize JDBC driver & connection pool
    Class.forName("org.h2.Driver")
    ConnectionPool.singleton("jdbc:h2:mem:hello", "user", "pass")

    // ad-hoc session provider
    implicit val session = AutoSession

    // Create table
    sql"""
      CREATE TABLE persons (
        customer_id SERIAL NOT NULL PRIMARY KEY,
        first_name VARCHAR(64),
        sure_name VARCHAR(64)
      )""".execute.apply()

    // Fill table with demo data
    Seq(("Alice", "Anderson"), ("Bob", "Builder"), ("Chris", "Christoph")).
      foreach { case (firstName, sureName) =>
        sql"INSERT INTO persons (first_name, sure_name) VALUES (${firstName}, ${sureName})".update.apply()
      }
  }
}
models/Person.scala
Here we define the database schema and the Scala representation of the database objects. Key here is the function personWrites. It converts Person objects to JSON representation (real code is conveniently generated by a macro).
package models

import scalikejdbc._
import scalikejdbc.WrappedResultSet
import play.api.libs.json._

case class Person(customerId: Long, firstName: Option[String], sureName: Option[String])

object PersonsTable extends SQLSyntaxSupport[Person] {
  override val tableName: String = "persons"

  def apply(rs: WrappedResultSet): Person =
    Person(rs.long("customer_id"), rs.stringOpt("first_name"), rs.stringOpt("sure_name"))
}

// models/package.scala -- the package object must live in its own file without the
// `package models` header above, otherwise it would end up in models.models
package object models {
  implicit val personWrites: Writes[Person] = Json.writes[Person]
}
controllers/Application.scala
Here you have the Iteratee / Enumerator code. First we read the data from the database, then we convert the result to an Iterator and then to an Enumerator. That Enumerator would not be useful on its own, because its contents are Person objects and Play doesn't know how to write such objects over HTTP. But with the help of personWrites, we can convert these objects to JSON. And Play knows how to write JSON over HTTP.
package controllers

import play.api.libs.json.JsValue
import play.api.mvc._
import play.api.libs.iteratee._
import scala.concurrent.ExecutionContext.Implicits.global
import scalikejdbc._
import models._
import models.personWrites

object Application extends Controller {

  implicit val session = AutoSession

  val allPersons: Traversable[Person] = sql"SELECT * FROM persons".map(rs => PersonsTable(rs)).traversable().apply()

  def personIterator(): Iterator[Person] = allPersons.toIterator

  def personEnumerator(): Enumerator[Person] = Enumerator.enumerate(personIterator)

  def personJsonEnumerator(): Enumerator[JsValue] = personEnumerator.map(personWrites.writes(_))

  def json = Action {
    Ok.chunked(personJsonEnumerator())
  }
}
Discussion
Database config
The database config is a hack in this example. Usually we would configure Play so it provides a data source and handles all the database stuff in the background.
JSON conversion
In the code I call the JSON conversion directly. There are better approaches leading to more compact code, but this way is easier for a beginner to understand.
The response you get is not really valid JSON. Example:
{"customerId":1,"firstName":"Alice","sureName":"Anderson"}
{"customerId":2,"firstName":"Bob","sureName":"Builder"}
{"customerId":3,"firstName":"Chris","sureName":"Christoph"}
(Remark: the line breaks are only for formatting. On the wire it looks like this: ...son"}{"custom...)
Instead you get blocks of valid JSON chunked together. That's what you requested. The receiving end can consume each block on its own. But there is a problem: you must find some way to separate the response into the valid blocks.
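One hypothetical way to deal with that, building on the controller above, is to emit newline-delimited JSON so the client can split on line breaks (the names below are made up, and you would need an extra route entry for it):
// in controllers/Application.scala, alongside personJsonEnumerator()
// (also needs: import play.api.libs.json.Json)
def personNdjsonEnumerator(): Enumerator[String] =
  personJsonEnumerator().map(js => Json.stringify(js) + "\n")

def ndjson = Action {
  Ok.chunked(personNdjsonEnumerator()).as("application/x-ndjson")
}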
The response itself is indeed chunked. Consider the following HTTP headers (in JSON HAR format, exported from Google Chrome):
"status": 200,
"statusText": "OK",
"httpVersion": "HTTP/1.1",
"headers": [
{
"name": "Transfer-Encoding",
"value": "chunked"
},
{
"name": "Content-Type",
"value": "application/json; charset=utf-8"
}
Code organization
I put some SQL code in the controller. In this case this is totally fine. If the code becomes bigger, it might be better to put the SQL stuff in the model and let the controller use a more general (in this case: "monadic plus", i.e. map, filter, flatMap) interface.
In the controller, JSON code and SQL code are mixed together. When the code gets bigger, you should organize it, e.g. per technology or per model object / business domain.
Blocking iterator
The usage of an iterator leads to blocking behavior. This is usually not a big problem, but it should be avoided for applications that must handle a lot of load (hundreds or thousands of hits per second) or that must answer really fast (think of trading algorithms working live on the stock exchange). In that case you could use a NoSQL database as a cache (please don't use it as the only data store) or non-blocking JDBC (e.g. async Postgres / MySQL drivers). Again: this is not necessary for most applications.
Attention: As soon as you convert to an iterator, remember that you can consume an iterator only once. For each request, you need a fresh iterator.
Conclusion
A complete web app including database access, entirely in a (not so short) SO answer. I really like the Play framework.
This code is for educational purposes. It is extra awkward in some places, to make the concepts easier for a beginner to understand. In a real application, you would straighten these things out, because you already know the concepts and you just want to see the purpose of the code (why is it there? which tools is it using? when is it doing what?) at first glance.
Have fun!
Twitter-chill looks like a good solution to the problem of how to serialize efficiently in Scala without excessive boilerplate.
However, I don't see any evidence of how they handle case classes. Does this just work automatically or does something need to be done (e.g. creating a zero-arg constructor)?
I have some experience with the WireFormat serialization mechanism built into Scoobi, which is a Scala Hadoop wrapper similar to Scalding. They have serializers for case classes up to 22 arguments that use the apply and unapply methods and do type matching on the arguments to these functions to retrieve the types. (This might not be necessary in Kryo/chill.)
They generally just work (as long as the component members are also serializable by Kryo):
import com.twitter.chill.ScalaKryoInstantiator
import com.esotericsoftware.kryo.io.{Input, Output}

case class Foo(id: Int, name: String)

// setup
val instantiator = new ScalaKryoInstantiator
instantiator.setRegistrationRequired(false)
val kryo = instantiator.newKryo()

// write
val data = Foo(1, "bob")
val buffer = new Array[Byte](4096)
val output = new Output(buffer)
kryo.writeObject(output, data)

// read
val input = new Input(buffer)
val data2 = kryo.readObject(input, classOf[Foo])
I am trying to pickle some relatively-simple-structured but large-and-slow-to-create classes in a Scala NLP (natural language processing) app of mine. Because there's lots of data, it needs to pickle and esp. unpickle quickly and without bloat. Java serialization evidently sucks in this regard. I know about Kryo but I've never used it. I've also run into Apache Avro, which seems similar although I'm not quite sure why it's not normally mentioned as a suitable solution. Neither is Scala-specific and I see there's a Scala-specific package called Scala Pickling. Unfortunately it lacks almost all documentation and I'm not sure how to create a custom pickler.
I see a question here:
Scala Pickling: Writing a custom pickler / unpickler for nested structures
There's still some context lacking in that question, and also it looks like an awful lot of boilerplate to create a custom pickler, compared with the examples given for Kryo or Avro.
Here's some of the classes I need to serialize:
trait ToIntMemoizer[T] {
  protected val minimum_raw_index: Int = 1
  protected var next_raw_index: Int = minimum_raw_index

  // For replacing items with ints. This is a wrapper around
  // gnu.trove.map.TObjectIntMap to make it look like mutable.Map[T, Int].
  // It behaves the same way.
  protected val value_id_map = trovescala.ObjectIntMap[T]()

  // Map in the opposite direction. This is a wrapper around
  // gnu.trove.map.TIntObjectMap to make it look like mutable.Map[Int, T].
  // It behaves the same way.
  protected val id_value_map = trovescala.IntObjectMap[T]()
  ...
}
class FeatureMapper extends ToIntMemoizer[String] {
  val features_to_standardize = mutable.BitSet()
  ...
}

class LabelMapper extends ToIntMemoizer[String] {
}

case class FeatureLabelMapper(
  feature_mapper: FeatureMapper = new FeatureMapper,
  label_mapper: LabelMapper = new LabelMapper
)

class DoubleCompressedSparseFeatureVector(
  var keys: Array[Int], var values: Array[Double],
  val mappers: FeatureLabelMapper
) { ... }
How would I create custom picklers/unpicklers in a way that uses as little boilerplate as possible (since I have a number of other classes that need similar treatment)?
Thanks!
I'm currently using Scala with JSF, and the two play pretty well together. However, at times JSF needs to re-instantiate (via Class.newInstance) a data structure, like a list. For example, in a managed bean I have:
@BeanProperty
var countries: java.util.List[String] = List("US").asJava
This works fine until you get to JSF's process-validations phase, where it runs into a java.lang.InstantiationException:
java.lang.InstantiationException: scala.collection.JavaConversions$SeqWrapper
    at java.lang.Class.newInstance0(Class.java:340)
    at java.lang.Class.newInstance(Class.java:308)
    at com.sun.faces.renderkit.html_basic.MenuRenderer.createCollection(MenuRenderer.java:906)
    at com.sun.faces.renderkit.html_basic.MenuRenderer.convertSelectManyValuesForModel(MenuRenderer.java:366)
    at com.sun.faces.renderkit.html_basic.MenuRenderer.convertSelectManyValue(MenuRenderer.java:128)
    at com.sun.faces.renderkit.html_basic.MenuRenderer.getConvertedValue(MenuRenderer.java:314)
    at org.primefaces.component.selectcheckboxmenu.SelectCheckboxMenuRenderer.getConvertedValue(SelectCheckboxMenuRenderer.java:34)
    ...
On a fundamental level it makes sense that wrappers can't be re-instantiated from scratch, so using JavaConverters won't work well here. My question: is there a library that already provides complete data structure mapping/conversion without wrappers? If not, I'll just write my own internal ones.
Use a Java ArrayList as the var and then use JavaConverters/JavaConversions in your code to manipulate it. That's the usual approach I use for APIs like Hibernate, JAX-WS, JSR-303, etc. that need Java collections.
import collection.JavaConversions._

@BeanProperty
var countries: java.util.List[String] = new java.util.ArrayList[String] += "US"
or
import collection.JavaConverters._

@BeanProperty
var countries: java.util.List[String] = new java.util.ArrayList[String]

countries.asScala += "US"
countries.asScala ++= List("US", "MX")
If you really want to just convert back and forth and not wrap, it's easy enough without creating your own classes:
import collection.JavaConverters._
import collection.mutable.ArrayBuffer

@BeanProperty
var countries: java.util.List[String] = new java.util.ArrayList[String]

val countriesBuff = ArrayBuffer.empty[String]
countriesBuff ++= countries.asScala // Convert from ArrayList to ArrayBuffer
// ...
countries.addAll(countriesBuff.asJava) // Convert the other direction
But then you have to worry about the cost of copying and about when synchronization needs to happen. Wrapping/decorating is just a lot more convenient.