I have a large result set from a database call that I need to stream back to the user as it can't all fit into memory.
I am able to stream the results from the database back by setting the options
val statement = session.conn.prepareStatement(query,
java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY)
statement.setFetchSize(Integer.MIN_VALUE)
....
....
val res = statement.executeQuery
And then by using an Iterator
val result = new Iterator[MyResultClass] {
def hasNext = res.next
def next = MyResultClass(someValue = res.getString("someColumn"), anotherValue = res.getInt("anotherValue"))
}
In Scala, Iterator extends TraversableOnce, which should allow me to pass the Iterator to the Enumerator class that is used for the chunked response in the Play Framework, according to the documentation at https://www.playframework.com/documentation/2.3.x/ScalaStream
When looking at the source code for Enumerator I discovered that it has an overloaded apply method for consuming a TraversableOnce object
I tried using the following code
import play.api.libs.iteratee.Enumerator
val dataContent = Enumerator(result)
Ok.chunked(dataContent)
But this doesn't work; it throws the following exception:
Cannot write an instance of Iterator[MyResultClass] to HTTP response. Try to define a Writeable[Iterator[MyResultClass]]
I can't find anywhere in the documentation that explains what Writeable is or does. I thought once the Enumerator consumed the TraversableOnce object, it would take it from there, but I guess not?
Problem in your approach
There are two problems with your approach:
You are writing the Iterator itself to the Enumerator / Iteratee. You should write the contents of the Iterator, not the whole Iterator as a single element.
Play doesn't know how to write objects of MyResultClass to an HTTP stream. Convert them to a String representation (e.g. JSON) before writing them (see the sketch below).
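A minimal sketch of the fix, staying close to the code from the question (it assumes a Writes[MyResultClass] can be derived; the myResultWrites value is made up here):

import play.api.libs.iteratee.Enumerator
import play.api.libs.json.{Json, Writes}
import scala.concurrent.ExecutionContext.Implicits.global

// hypothetical serializer; adjust to your actual fields
implicit val myResultWrites: Writes[MyResultClass] = Json.writes[MyResultClass]

// enumerate the elements of the iterator (not the iterator as a single element)
// and turn each element into JSON, which Play knows how to write
val dataContent = Enumerator.enumerate(result).map(Json.toJson(_))
Ok.chunked(dataContent)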
Example
build.sbt
A simple Play Scala project with H2 and SQL support.
lazy val root = (project in file(".")).enablePlugins(PlayScala)
scalaVersion := "2.11.6"
libraryDependencies ++= Seq(
jdbc,
"org.scalikejdbc" %% "scalikejdbc" % "2.2.4",
"com.h2database" % "h2" % "1.4.185",
"ch.qos.logback" % "logback-classic" % "1.1.2"
)
project/plugins.sbt
Just the minimal config for the sbt play plugin in the current stable version
resolvers += "Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/"
addSbtPlugin("com.typesafe.play" % "sbt-plugin" % "2.3.8")
conf/routes
Just one route on /json
GET /json controllers.Application.json
Global.scala
Global settings object; it creates and fills the database with demo data during startup of the Play application.
import play.api.Application
import play.api.GlobalSettings
import scalikejdbc._
object Global extends GlobalSettings {
override def onStart(app : Application): Unit = {
// initialize JDBC driver & connection pool
Class.forName("org.h2.Driver")
ConnectionPool.singleton("jdbc:h2:mem:hello", "user", "pass")
// ad-hoc session provider
implicit val session = AutoSession
// Create table
sql"""
CREATE TABLE persons (
customer_id SERIAL NOT NULL PRIMARY KEY,
first_name VARCHAR(64),
sure_name VARCHAR(64)
)""".execute.apply()
// Fill table with demo data
Seq(("Alice", "Anderson"), ("Bob", "Builder"), ("Chris", "Christoph")).
foreach { case (firstName, sureName) =>
sql"INSERT INTO persons (first_name, sure_name) VALUES (${firstName}, ${sureName})".update.apply()
}
}
}
models/Person.scala
Here we define the database schema and the Scala representation of the database objects. Key here is the implicit value personWrites: it converts Person objects to their JSON representation (the real code is conveniently generated by a macro).
package models
import scalikejdbc._
import scalikejdbc.WrappedResultSet
import play.api.libs.json._
case class Person(customerId : Long, firstName: Option[String], sureName : Option[String])
object PersonsTable extends SQLSyntaxSupport[Person] {
override val tableName : String = "persons"
def apply(rs : WrappedResultSet) : Person =
Person(rs.long("customer_id"), rs.stringOpt("first_name"), rs.stringOpt("sure_name"))
}
package object models {
implicit val personWrites: Writes[Person] = Json.writes[Person]
}
controllers/Application.scala
Here you have the Iteratee / Enumerator code. First we read the data from the database, then we convert the result to an Iterator and then to an Enumerator. That Enumerator alone would not be useful, because its contents are Person objects and Play doesn't know how to write such objects over HTTP. But with the help of personWrites we can convert these objects to JSON, and Play knows how to write JSON over HTTP.
package controllers
import play.api.libs.json.JsValue
import play.api.mvc._
import play.api.libs.iteratee._
import scala.concurrent.ExecutionContext.Implicits.global
import scalikejdbc._
import models._
import models.personWrites
object Application extends Controller {
implicit val session = AutoSession
val allPersons : Traversable[Person] = sql"SELECT * FROM persons".map(rs => PersonsTable(rs)).traversable().apply()
def personIterator(): Iterator[Person] = allPersons.toIterator
def personEnumerator() : Enumerator[Person] = Enumerator.enumerate(personIterator)
def personJsonEnumerator() : Enumerator[JsValue] = personEnumerator.map(personWrites.writes(_))
def json = Action {
Ok.chunked(personJsonEnumerator())
}
}
Discussion
Database config
The database config is a hack in this example. Usually we would configure Play so it provides a data source and handles all the database stuff in the background.
JSON conversion
In the code I call the JSON conversion directly. There are better approaches leading to more compact code, but the direct call is easier to understand for a beginner.
The response you get is not really valid JSON. Example:
{"customerId":1,"firstName":"Alice","sureName":"Anderson"}
{"customerId":2,"firstName":"Bob","sureName":"Builder"}
{"customerId":3,"firstName":"Chris","sureName":"Christoph"}
(Remark: the line breaks are only for formatting. On the wire it looks like this:
...son"}{"custom...)
Instead of one valid JSON document, you get blocks of valid JSON chunked together. That's what you requested. The receiving end can consume each block on its own, but there is a problem: you must find some way to separate the response into the valid blocks.
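One common convention (a sketch on top of the example code, not part of it) is newline-delimited JSON: each chunk is one JSON object followed by a line break, so the client can split the stream on '\n':

// requires play.api.libs.json.Json in scope
def personNdjsonEnumerator(): Enumerator[String] =
  personJsonEnumerator().map(js => Json.stringify(js) + "\n")

def ndjson = Action {
  Ok.chunked(personNdjsonEnumerator())
}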
The response itself is indeed chunked. Consider the following HTTP headers (in JSON HAR format, exported from Google Chrome):
"status": 200,
"statusText": "OK",
"httpVersion": "HTTP/1.1",
"headers": [
{
"name": "Transfer-Encoding",
"value": "chunked"
},
{
"name": "Content-Type",
"value": "application/json; charset=utf-8"
}
Code organization
I put some SQL code in the controller. In this case this is totally fine. If the code becomes bigger, it might be better to move the SQL stuff into the model and let the controller use a more general (in this case: "monadic plus", i.e. map, filter, flatMap) interface.
In the controller, JSON code and SQL code are mixed together. When the code gets bigger, you should organize it, e.g. per technology or per model object / business domain.
Blocking iterator
The usage of an iterator leads to blocking behavior. This is usually not a big problem, but it should be avoided in applications that must handle a lot of load (hundreds or thousands of hits per second) or that must answer really fast (think of trading algorithms working live on the stock exchange). In such cases you could use a NoSQL database as a cache (please don't use it as the only data store) or non-blocking JDBC (e.g. async postgres / mysql drivers). Again: this is not necessary for most applications.
Attention: as soon as you convert to an iterator, remember that an iterator can be consumed only once. For each request you need a fresh iterator.
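A sketch of that, reusing the pieces from the controller above (the jsonFresh action name is made up):

def jsonFresh = Action {
  // build a fresh iterator, and hence a fresh enumerator, per request
  val persons: Iterator[Person] =
    sql"SELECT * FROM persons".map(rs => PersonsTable(rs)).traversable().apply().toIterator
  Ok.chunked(Enumerator.enumerate(persons).map(personWrites.writes(_)))
}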
Conclusion
A complete web app including database access, all in one (not so short) SO answer. I really like the Play framework.
This code is for educational purposes. It is extra verbose in some places to make the concepts easier to understand for a beginner. In a real application you would straighten these things out, because you already know the concepts and you just want to see the purpose of the code (why is it there? which tools is it using? when is it doing what?) at first glance.
Have fun!
Related
I created a Spark Data Source that uses the "older" DataSource V1 API to write data in a specific binary format that our measuring devices and some software require, i.e., my DefaultSource extends CreatableRelationProvider.
In the appropriate createRelation method I call my own custom method to write the data from the DataFrame passed in. I am doing this with the help of Hadoop's FileSystem API, initialized from the Hadoop Configuration one can pull out of the supplied DataFrame:
def createRelation(sqlContext: SQLContext,
mode : SaveMode,
parameters: Map[String, String],
data : DataFrame): BaseRelation = {
val path = ... // get from parameters; in real here is more preparation code, checking save mode etc.
MyCustomWriter.write(data, path)
EchoingRelation(data) // small class that just wraps the data frame into a BaseRelation with TableScan
}
In MyCustomWriter I then do all sorts of things, and in the end I save the data as a side effect of map, mapPartitions and foreachPartition calls on the executors of the cluster, like this:
val confBytes = conf.toByteArray // implicit I wrote turning Hadoop Writables to Byte Array, as Configuration isn't serializable
data.
select(...).
where(...).
// much more
as[Foo].
mapPartitions { it =>
val conf = confBytes.toWritable[Configuration] // vice-versa like toByteArray
val writeResult = customWriteRecords(it, conf) // writes data to the disk using Hadoop FS API
writeResult.iterator
}.
// do more stuff
While this approach works fine, I notice that when running this, the Output column in the Spark job UI is not updated. Is it somehow possible to propagate this information or do I have to wrap the data in Writables and use a Hadoop FileOutputFormat approach instead?
I found a hacky approach.
Inside a RDD/DF operation you can get OutputMetrics:
val metrics = TaskContext.get().taskMetrics().outputMetrics
This has the fields bytesWritten and recordsWritten. However, the setters are package-private to org.apache.spark.executor. So I created a "breakout object" in that package:
package org.apache.spark.executor
object OutputMetricsBreakout {
def setRecordsWritten(outputMetrics: OutputMetrics,
recordsWritten: Long): Unit =
outputMetrics.setRecordsWritten(recordsWritten)
def setBytesWritten(outputMetrics: OutputMetrics,
bytesWritten: Long): Unit =
outputMetrics.setBytesWritten(bytesWritten)
}
Then I can use:
val myBytesWritten = ... // calculate written bytes
OutputMetricsBreakout.setBytesWritten(metrics, myBytesWritten + metrics.bytesWritten)
This is a hack but the only "simple" way I could come up with.
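Putting it together, a sketch of how the breakout object slots into the mapPartitions call from the question (here writeResult is assumed to carry the counters; in reality the custom writer must track the written bytes and records itself):

import org.apache.spark.TaskContext
import org.apache.spark.executor.OutputMetricsBreakout

data.as[Foo].mapPartitions { it =>
  val conf = confBytes.toWritable[Configuration]
  // this runs on an executor, so TaskContext.get() is available
  val metrics = TaskContext.get().taskMetrics().outputMetrics
  val writeResult = customWriteRecords(it, conf)
  // hypothetical counters reported by the custom writer
  OutputMetricsBreakout.setBytesWritten(metrics, metrics.bytesWritten + writeResult.bytesWritten)
  OutputMetricsBreakout.setRecordsWritten(metrics, metrics.recordsWritten + writeResult.recordCount)
  writeResult.iterator
}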
I am new to Scala and was trying my hand at Akka. I am trying to access data from MongoDB in Scala and want to convert it into JSON and XML format.
The code attached below handles the path /getJson and calls the getJson() function to get the data in the form of a Future.
get {
concat(
path("getJson"){
val f = Patterns.ask(actor1,getJson(),10.seconds)
val res = Await.result(f,10.seconds)
val result = res.toString
complete(res.toString)
}
}
The getJson() method is as follows:
def getJson()= {
val future = collection.find().toFuture()
future
}
I have a Greeting Case class in file Greeting.scala:
case class Greeting(msg:String,name:String)
And a MyJsonProtocol.scala file for marshalling Scala objects to JSON format, as follows:
trait MyJsonProtocol extends SprayJsonSupport with DefaultJsonProtocol {
implicit val templateFormat = jsonFormat2(Greeting)
}
I am getting the output of complete(res.toString) in Postman as:
Future(Success(List(
Iterable(
(_id,BsonObjectId{value=5fc73944986ced2b9c2527c4}),
(msg,BsonString{value='Hiiiiii'}),
(name,BsonString{value='Ruchirrrr'})
),
Iterable(
(_id,BsonObjectId{value=5fc73c35050ec6430ec4b211}),
(msg,BsonString{value='Holaaa Amigo'}),
(name,BsonString{value='Pablo'})),
Iterable(
(_id,BsonObjectId{value=5fc8c224e529b228916da59d}),
(msg,BsonString{value='Demo'}),
(name,BsonString{value='RuchirD'}))
)))
Can someone please tell me how to iterate over this output and to display it in JSON format?
When working with Scala, it's very important to know your way around types. The first step towards this is at least knowing the types of your variables and values.
If you look at this method,
def getJson() = {
val future = collection.find().toFuture()
future
}
It lacks type information at all levels, which is really bad practice.
I am assuming that you are using mongo-scala-driver, and that your collection is actually a MongoCollection[Document].
This means that the output of collection.find() should be a FindObservable[Document], hence collection.find().toFuture() should be a Future[Seq[Document]]. So your getJson method should be written as:
def getJson(): Future[Seq[Document]] =
collection.find().toFuture()
Now, this means that you are passing a Future[Seq[Document]] to your actor1, which is again bad practice. You should never send Future values between actors. It looks like your actor1 does nothing but send the same message back. Why is actor1 even required when it does nothing?
This means your f is a Future[Future[Seq[Document]]]. Then you use Await.result to get the result of this future f, which is again an anti-pattern, since Await blocks your thread.
Now your res is a Future[Seq[Document]], and you are converting it to a String and sending that string back with complete.
Your JsonProtocol is not working because you are not even passing it any Greetings.
You have to do the following:
Read raw Bson objects from mongo.
Convert the raw Bson objects to your Greeting objects.
Complete your result with these Greeting objects. The JsonProtocol should take care of converting these Greeting objects to JSON.
The easiest way to do all this is by using the mongo driver's CodecRegistries.
case class Greeting(msg:String, name:String)
Now, your MongoDAL object will look like following (it might be missing some imports, fill any missing imports as you did in your own code).
import org.mongodb.scala.bson.codecs.Macros
import org.mongodb.scala.bson.codecs.DEFAULT_CODEC_REGISTRY
import org.bson.codecs.configuration.CodecRegistries
import org.mongodb.scala.{MongoClient, MongoCollection, MongoDatabase}
object MongoDAL {
val greetingCodecProvider = Macros.createCodecProvider[Greeting]()
val codecRegistry = CodecRegistries.fromRegistries(
CodecRegistries.fromProviders(greetingCodecProvider),
DEFAULT_CODEC_REGISTRY
)
val mongoClient: MongoClient = ... // however you are connecting to mongo and creating a mongo client
val mongoDatabase: MongoDatabase =
mongoClient
.getDatabase("database_name")
.withCodecRegistry(codecRegistry)
val greetingCollection: MongoCollection[Greeting] =
mongoDatabase.getCollection[Greeting]("greeting_collection_name")
def fetchAllGreetings(): Future[Seq[Greeting]] =
greetingCollection.find().toFuture()
}
Now, your route can be defined as
get {
concat(
path("getJson") {
val greetingSeqFuture: Future[Seq[Greeting]] = MongoDAL.fetchAllGreetings()
// I don't see any need for that actor thing,
// but if you really need to do that, then you can
// do that by using flatMap to chain future computations.
val actorResponseFuture: Future[Seq[Greeting]] =
greetingSeqFuture
.flatMap(greetingSeq => Patterns.ask(actor1, greetingSeq, 10.seconds).mapTo[Seq[Greeting]])
// complete can handle futures just fine:
// it will wait for future completion,
// then convert the seq of Greetings to Json using your JsonProtocol
complete(actorResponseFuture)
}
}
First of all, don't call toString in complete(res.toString).
As stated in the Akka HTTP JSON support guide, if you set everything up right, your case class will be converted to JSON automatically.
But as I see in the output, your res is not an object of the Greeting type. It looks like it is somehow related to Greeting and has the same structure; it seems to be the raw output of the MongoDB request. If that assumption is correct, you should convert the raw output from MongoDB to your Greeting case class.
I guess this could be done in getJson(), after collection.find().
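For example, a sketch assuming collection is a MongoCollection[Document] from the mongo-scala-driver:

import org.mongodb.scala.bson.collection.immutable.Document
import scala.concurrent.Future

def getGreetings(): Future[Seq[Greeting]] =
  collection.find()
    // map each raw BSON document to the case class before completing the route
    .map((doc: Document) => Greeting(doc("msg").asString.getValue, doc("name").asString.getValue))
    .toFuture()

The route can then simply complete(getGreetings()) and let the JsonProtocol handle the JSON conversion.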
I want to access all the parameters from BeamConfig.scala in another Scala class. The parameters stored in BeamConfig.scala look like the following:
case class WarmStart(
enabled: scala.Boolean,
path: java.lang.String
)
object WarmStart {
def apply(c: com.typesafe.config.Config): BeamConfig.Beam.WarmStart = {
BeamConfig.Beam.WarmStart(
enabled = c.hasPathOrNull("enabled") && c.getBoolean("enabled"),
path = if (c.hasPathOrNull("path")) c.getString("path") else "output"
)
}
}
There are more than 100 parameter objects like the one above in BeamConfig.scala. To get a parameter from this file, I currently do this:
beam.warmStart.enabled
beam.warmStart.path
where beam is the root class. Is there any way I can read all the parameters in bulk, or store all the objects in a Map or something similar?
Thanks
There are a couple of different ways you could do this:
Using Typesafe Config in a somewhat unsafe-ish manner:
https://github.com/lightbend/config#api-example
This would give you map-like access, but it can very easily explode if the names are wrong, the types don't line up, etc.
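A sketch of that map-like access (the "beam" path is taken from the question; any missing or misspelled key throws):

import com.typesafe.config.ConfigFactory
import scala.collection.JavaConverters._

val conf = ConfigFactory.load()
// every leaf under "beam" as a path -> unwrapped value
val beamParams: Map[String, AnyRef] =
  conf.getConfig("beam").entrySet().asScala
    .map(e => e.getKey -> e.getValue.unwrapped())
    .toMap

beamParams("warmStart.enabled") // AnyRef, so you still have to cast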
Using PureConfig (a wrapper around typesafe config which allows automatic derivation of case class based config decoders, kinda like circe for json)
https://pureconfig.github.io/docs/
So you'd have to write your large case class with 100 fields once, but then you get safe decoding of the config into that case class, and after that you have normal named properties with their correct types.
(Note that this costs you invariance under rename refactorings: renaming a case class field changes the config key it expects.)
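A sketch of the PureConfig route (the case class names mirror the question; loadConfig is the older API, newer releases use ConfigSource.default.at("beam").load[Beam]):

import pureconfig.loadConfig
import pureconfig.generic.auto._

case class WarmStart(enabled: Boolean, path: String)
case class Beam(warmStart: WarmStart) // ...plus the other ~100 sections

// decode the "beam" namespace of application.conf into the case class tree
val beam: Beam = loadConfig[Beam]("beam") match {
  case Right(b) => b
  case Left(failures) => sys.error(failures.toString)
}

beam.warmStart.enabled // typed access, validated at decode time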
Firstly, I would separate the code that reads the config from the code that processes the results. In this case the default value "output" is embedded in the code that reads the config, when it should probably be applied in a separate pass.
Secondly, I would use a package to automatically populate a case class from a config entry. You then need one line per config object, and you get the results checked for you. E.g.
object BeamConfig {
val warmStart = Config[WarmStart]("warmStart")
val config2 = Config[Config2]("config2")
...
}
If you need some processing you can do this
val warmStart = ProcessWarmStart(Config[WarmStart]("warmStart"))
This approach still requires a bit of boilerplate code, but it has better type safety than a bulk import of the config.
I would also consider combining the objects into fewer, nested objects with matching nested case classes.
Here is a cut-down version of Config using json4s and jackson:
import com.typesafe.config._
import org.json4s._
import org.json4s.jackson.JsonMethods._
object Config {
private val cfgFile = "configFile"
private val conf = ConfigFactory.load(cfgFile).withFallback(ConfigFactory.load())
private val jData = parse(conf.root.render(ConfigRenderOptions.concise))
def apply[T](name: String)(implicit formats: Formats = DefaultFormats, mf: Manifest[T]): T =
Extraction.extract(jData \\ name)(formats, mf)
}
This will throw an exception if the particular config object does not exist or does not match the format of class T.
I've done a sample project in GitHub: akauppi/akka-2.4.6-trial
What I want seems simple: read a URL, provide the contents as a line-wise stream of Strings. Now I've struggled with this (and reading documentation) for the whole day so decided to push the sample public and ask for help.
I'm comfortable with Scala. I know Akka, but the last time I used Akka Streams was probably pre-2.4. Now I'm lost.
Questions:
On these lines I'd like to return a Source[String,Any], not a Future (note: those lines do not compile).
The problem probably is that Http().singleRequest(...) materialises the flow, and I don't want that. How to just inject the "recipe" of reading a web page without actually reading it?
def sourceAsByteString(url: URL)(implicit as: ActorSystem, mat: Materializer): Source[ByteString, Any] = {
import as.dispatcher
val req: HttpRequest = HttpRequest( uri = url.toString )
val tmp: Source[ByteString, Any] = Http().singleRequest(req).map( resp => resp.entity.dataBytes ) // does not compile, gives a 'Future'
tmp
}
The problem is that the chunks you get from the server are not lines, but might be anything. You will often get small responses in a single chunk. So you have to split the stream to lines yourself.
Something like this should work:
import akka.NotUsed
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.client.RequestBuilding._
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Framing}
import akka.util.ByteString

implicit val system = ActorSystem("test")
implicit val mat = ActorMaterializer()

val delimiter: Flow[ByteString, ByteString, NotUsed] =
  Framing.delimiter(
    ByteString("\r\n"),
    maximumFrameLength = 100000,
    allowTruncation = true)
import system.dispatcher
val f = Http().singleRequest(Get("http://www.google.com")).flatMap { res =>
val lines = res.entity.dataBytes.via(delimiter).map(_.utf8String)
lines.runForeach { line =>
println(line)
}
}
f.foreach { _ =>
system.terminate()
}
Note that if you wanted to return the lines instead of printing them, you would end up with a Future[Source[String, Any]], which is unavoidable because everything in akka-http is async. You could "flatten" this to a Source[String, Any] that produces no elements in case of a failed request, but that would probably not be a good idea.
To get a "recipe" for reading a web page, you could use Http().outgoingConnection("http://www.google.com"), which creates a Flow[HttpRequest, HttpResponse, Future[OutgoingConnection]], so a thing where you put in HttpRequest objects and get back HttpResponse objects.
The problem probably is that Http().singleRequest(...) materialises
the flow, and I don't want that.
That was indeed at the heart of the problem. There are two ways to start:
Http().singleRequest(...) leads to a Future (i.e. materializes the stream, already in the very beginning).
Source.single(HttpRequest(...)) leads to a Source (non-materialized).
http://doc.akka.io/docs/akka/2.4.7/scala/http/client-side/connection-level.html#connection-level-api
Ideally, such an important difference would be visible in the names of the methods used, but it's not. One simply has to know it, and understand that the two approaches above are actually vastly different.
@RüdigerKlaehn's answer covers the linewise cutting pretty well, but also see the Cookbook.
When dealing with a Source, use flatMapConcat in place of flatMap (which works on Futures) to flatten res.entity.dataBytes (which is an inner stream). Having these two levels of streams (requests, then chunks per request) adds to the mental complexity, especially since we only have one of the outer entities.
There might still be some simpler way, but I'm not looking at that actively any more. Things work. Maybe once I become more fluent with akka streams, I'll suggest a further solution.
Code for reading an HTTP response, linewise (akka-http 1.1.0-RC2):
val req: HttpRequest = ???
val fut: Future[Source[String,_]] = Http().singleRequest(req).map { resp =>
resp.entity.dataBytes.via(delimiter)
.map(_.utf8String)
}
delimiter as in @RüdigerKlaehn's answer.
I found one library for this, https://github.com/daltontf/scala-yaml, but it seems like not many developers use it and it's pretty outdated. There might also have been http://www.lag.net/configgy/, but the link is dead.
I wonder, what is the most popular or de-facto library for working with YAML in Scala?
Here's an example of using the Jackson YAML databinding.
First, here's our sample document:
name: test
parameters:
  "VERSION": 0.0.1-SNAPSHOT
things:
  - colour: green
    priority: 128
  - colour: red
    priority: 64
Add these dependencies:
libraryDependencies ++= Seq(
"com.fasterxml.jackson.core" % "jackson-core" % "2.1.1",
"com.fasterxml.jackson.core" % "jackson-annotations" % "2.1.1",
"com.fasterxml.jackson.core" % "jackson-databind" % "2.1.1",
"com.fasterxml.jackson.dataformat" % "jackson-dataformat-yaml" % "2.1.1"
)
Here's our outermost class (Preconditions is a Guava-like check and raises an exception if said field is not in the YAML):
import java.util.{List => JList, Map => JMap}
import collection.JavaConversions._
import com.fasterxml.jackson.annotation.JsonProperty
class Sample(@JsonProperty("name") _name: String,
             @JsonProperty("parameters") _parameters: JMap[String, String],
             @JsonProperty("things") _things: JList[Thing]) {
val name = Preconditions.checkNotNull(_name, "name cannot be null")
val parameters: Map[String, String] = Preconditions.checkNotNull(_parameters, "parameters cannot be null").toMap
val things: List[Thing] = Preconditions.checkNotNull(_things, "things cannot be null").toList
}
And here's the inner object:
import com.fasterxml.jackson.annotation.JsonProperty
class Thing(@JsonProperty("colour") _colour: String,
            @JsonProperty("priority") _priority: Int) {
val colour = Preconditions.checkNotNull(_colour, "colour cannot be null")
val priority = Preconditions.checkNotNull(_priority, "priority cannot be null")
}
Finally, here's how to instantiate it:
import java.io.FileReader
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory

val reader = new FileReader("sample.yaml")
val mapper = new ObjectMapper(new YAMLFactory())
val config: Sample = mapper.readValue(reader, classOf[Sample])
A little late to the party but I think this method works in the most seamless way. This method has:
Automatic conversion to scala collection types
Use case classes
No need for boilerplate code like BeanProperty/JsonProperty
Uses Jackson-YAML & Jackson-scala
Code:
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory
import com.fasterxml.jackson.module.scala.DefaultScalaModule
case class Prop(url: List[String])
// uses Jackson for YAML parsing; relies on SnakeYAML for low-level handling
val mapper: ObjectMapper = new ObjectMapper(new YAMLFactory())
// provides all of the Scala goodiness
mapper.registerModule(DefaultScalaModule)
val prop: Prop = mapper.readValue("url: [abc, def]", classOf[Prop])
// prints List(abc, def)
println(prop.url)
SnakeYAML is a high-quality, actively maintained YAML parser/renderer for Java. You can of course use it from Scala.
If you're already working with circe, you might be interested in circe-yaml which uses SnakeYAML to parse a YAML file and then converts the result to a circe AST.
I would love to see a library that could parse either JSON or YAML (or whatever -- pluggable) to a common AST and then construct Scala objects using typeclasses. Several JSON libraries work like that (and of course can also render JSON for objects using the same typeclasses), but I don't know of such a facility for YAML.
PS: There also appear to be a number of seemingly abandoned wrappers for SnakeYAML, namely HelicalYAML and yaml4s
And now we have circe-yaml https://github.com/circe/circe-yaml
SnakeYAML provides a Java API for parsing YAML and marshalling its structures into JVM classes. However, you might find circe's way of marshalling into a Scala ADT preferable -- using compile-time specification or derivation rather than runtime reflection. This enables you to parse YAML into Json, and use your existing (or circe's generic) Decoders to perform the ADT marshalling. You can also use circe's Encoder to obtain a Json, and print that to YAML using this library.
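A sketch of that workflow (the Server case class and the YAML snippet are made up for illustration):

import io.circe.generic.auto._
import io.circe.yaml.parser

case class Server(host: String, port: Int)

val yaml = "host: localhost\nport: 8080"

// parse YAML to a circe Json AST, then decode it with the derived Decoder
val server: Either[io.circe.Error, Server] =
  parser.parse(yaml).flatMap(_.as[Server])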
I came across moultingyaml today.
MoultingYAML is a Scala wrapper for SnakeYAML based on spray-json.
It looks quite familiar to me, having worked for years with spray-json. I think it might fit @sihil's need for a "compelling" and "mature" Scala YAML library.
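A sketch of what that looks like (the Person case class is made up for illustration):

import net.jcazevedo.moultingyaml._
import net.jcazevedo.moultingyaml.DefaultYamlProtocol._

case class Person(name: String, age: Int)
implicit val personFormat = yamlFormat2(Person)

// spray-json style: parse to a YAML AST, then convert to the case class
val person = "name: Alice\nage: 42".parseYaml.convertTo[Person]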
For anyone else who runs across this answer and is looking for help and examples: I found a basic example that uses SnakeYAML. Hope it helps. Here's the code:
package yaml
import org.yaml.snakeyaml.Yaml
import org.yaml.snakeyaml.constructor.Constructor
import scala.collection.mutable.ListBuffer
import scala.beans.BeanProperty
object YamlBeanTest1 {
val text = """
accountName: Ymail Account
username: USERNAME
password: PASSWORD
mailbox: INBOX
imapServerUrl: imap.mail.yahoo.com
protocol: imaps
minutesBetweenChecks: 1
usersOfInterest: [barney, betty, wilma]
"""
def main(args: Array[String]) {
val yaml = new Yaml(new Constructor(classOf[EmailAccount]))
val e = yaml.load(text).asInstanceOf[EmailAccount]
println(e)
}
}
/**
* With the Snakeyaml Constructor approach shown in the main method,
* this class must have a no-args constructor.
*/
class EmailAccount {
@BeanProperty var accountName: String = null
@BeanProperty var username: String = null
@BeanProperty var password: String = null
@BeanProperty var mailbox: String = null
@BeanProperty var imapServerUrl: String = null
@BeanProperty var minutesBetweenChecks: Int = 0
@BeanProperty var protocol: String = null
@BeanProperty var usersOfInterest = new java.util.ArrayList[String]()
override def toString: String = {
"acct (%s), user (%s), url (%s)".format(accountName, username, imapServerUrl)
}
}
I don't have enough reputation to comment (41 atm), but I thought my experience was worth mentioning.
After reading this thread, I decided to try the Jackson YAML parser because I didn't want zero-argument constructors and it was much more readable. What I didn't realize was that there is no support for inheritance (merging), and only limited support for anchor references (isn't that the whole point of YAML?).
Merge is explained here.
Anchor references are explained here. While it appears that complex anchor references are supported, I could not get them to work in a simple case.
In my experience, JSON libraries for Scala are more mature and easier to use (none of the YAML approaches is enormously compelling or as mature as the JSON equivalents when it comes to dealing with case classes or writing custom serialisers and deserialisers).
As such I prefer converting from YAML to JSON and then using a JSON library. This might sound slightly crazy, but it works really well, provided that:
You are only working with YAML that is a subset of JSON (which covers a great many use cases, in my experience)
The path is not performance critical (as there is overhead in taking this approach)
The approach I use for converting from YAML to JSON leverages Jackson:
val tree = new ObjectMapper(new YAMLFactory()).readTree(yamlTemplate)
val json = new ObjectMapper()
.writer(new DefaultPrettyPrinter().withoutSpacesInObjectEntries())
.writeValueAsString(tree)