How do I generate pretty JSON with json4s?

This snippet of code works very well, but it generates compact JSON (no line breaks / not very human readable).
import org.json4s.native.Serialization.write
implicit val jsonFormats = DefaultFormats
// snapshotList is an instance of a case class
val jsonString: String = write(snapshotList)
Is there an easy way to generate pretty JSON from this?
I have this workaround but I'm wondering if a more efficient way exists:
import org.json4s.jackson.JsonMethods._
val prettyJsonString = pretty(render(parse(jsonString)))

Use writePretty instead of write:
import org.json4s.jackson.Serialization.writePretty
val jsonString: String = writePretty(snapshotList)
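For a self-contained check, here is a minimal sketch (SnapshotList is an illustrative stand-in for the asker's case class):
import org.json4s.DefaultFormats
import org.json4s.jackson.Serialization.writePretty
case class SnapshotList(snapshots: List[String]) // illustrative stand-in
implicit val jsonFormats = DefaultFormats
val prettyJson: String = writePretty(SnapshotList(List("a", "b")))
// {
//   "snapshots" : [ "a", "b" ]
// }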

You can use the ObjectMapper writerWithDefaultPrettyPrinter method:
val mapper = new ObjectMapper()
val pretty = mapper.writerWithDefaultPrettyPrinter()
  .writeValueAsString(mapper.readTree(jsonString))
writerWithDefaultPrettyPrinter() returns an ObjectWriter from which you can get a pretty-formatted string; note that the compact string has to be re-parsed (readTree) before writing it back out.
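For example (output shape from Jackson's default pretty printer):
import com.fasterxml.jackson.databind.ObjectMapper
val compact = """{"a":1,"b":[2,3]}"""
val mapper = new ObjectMapper()
println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(mapper.readTree(compact)))
// {
//   "a" : 1,
//   "b" : [ 2, 3 ]
// }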

Related

Create SparkSQL UDF with non serializable objects

I'm trying to write a UDF that I would like to use on Hive tables through an sqlContext. Is it in any way possible to include objects from other libraries that are not serializable? Here's a minimal example of what does not work:
def myUDF(s: String) = {
  import sun.misc.BASE64Encoder
  val coder = new BASE64Encoder
  val encoded = coder.encode(s.getBytes("UTF-8"))
  encoded
}
I register the function in the Spark shell as a UDF:
val encoding = sqlContext.udf.register("encoder", myUDF)
If I try to run it on a table "test"
sqlContext.sql("SELECT encoder(colname) from test").show()
I get the error
org.apache.spark.SparkException: Task not serializable
object not serializable (class: sun.misc.BASE64Encoder, value: sun.misc.BASE64Encoder@4a7f9a94)
Is there a workaround for this? I tried embedding myUDF in an object and in a class but that didn't work either.
You can try defining the udf function as
import org.apache.spark.sql.functions.{col, udf}
def encoder = udf((s: String) => {
  import sun.misc.BASE64Encoder
  val coder = new BASE64Encoder
  val encoded = coder.encode(s.getBytes("UTF-8"))
  encoded
})
And call the udf function as
dataframe.withColumn("encoded", encoder(col("id"))).show
Update
As @santon pointed out, a BASE64Encoder is instantiated for each row of the dataframe, which might lead to performance issues. The solution is to create a single BASE64Encoder inside an object and call it from within the udf function, as sketched below.
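A minimal sketch of that approach (Base64Holder is an illustrative name; this assumes sun.misc.BASE64Encoder is still available on your JDK):
import org.apache.spark.sql.functions.{col, udf}
object Base64Holder {
  import sun.misc.BASE64Encoder
  // Created once per JVM (on the driver and on each executor), not once per row.
  val coder = new BASE64Encoder
}
def encoder = udf((s: String) => Base64Holder.coder.encode(s.getBytes("UTF-8")))
dataframe.withColumn("encoded", encoder(col("id"))).show
Because a Scala object is accessed as a static singleton, the closure no longer captures the non-serializable encoder; each executor initializes its own instance.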

Reading YAML in Scala without 'case class'

I am trying to read a YAML file from Scala, and I am able to read it using the code given below. One disadvantage I see here is the necessity to create a case class that maps onto the YAML file I am using. Every time I change the content of the YAML I have to change the case class as well. Is there any way in Scala to read YAML without needing to create the case class? (I have also used Python to read YAML, where there is no constraint of mapping a class onto the YAML structure, and I would like to do something similar in Scala.)
Note: when I add a new property to the YAML and my case class does not have a matching property, I get an UnrecognizedPropertyException.
package yamlexamples
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
object YamlTest extends App {
  case class Prop(country: String, state: List[String])
  val mapper: ObjectMapper = new ObjectMapper(new YAMLFactory())
  mapper.registerModule(DefaultScalaModule)
  val fileStream = getClass.getResourceAsStream("/sample.yaml")
  val prop: Prop = mapper.readValue(fileStream, classOf[Prop])
  println(prop.country + ", " + prop.state)
}
sample.yaml (this works with the code above)
country: US
state:
- TX
- VA
sample.yaml (this throws the exception)
country: US
state:
- TX
- VA
continent: North America
You could parse the YAML file and load it as a collections object instead of a case class, but this comes at the cost of losing type safety in your code. The code below uses the load function from SnakeYAML's Yaml class. Note that load has overloaded methods to read from an InputStream/Reader as well.
import scala.collection.JavaConverters._
import org.yaml.snakeyaml.Yaml
val yaml = new Yaml()
val data = yaml.load(
"""
|country: US
|state:
| - TX
| - VA
|continent: North America
""".stripMargin).asInstanceOf[java.util.Map[String, Any]].asScala
Now data is a Scala mutable Map instead of a case class:
data: scala.collection.mutable.Map[String,Any] = Map(country -> US, state -> [TX, VA], continent -> North America)
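Accessing entries then looks like ordinary map lookups; a sketch against the data above (the casts are needed because the values are untyped Any):
val country = data("country").asInstanceOf[String] // "US"
val states = data("state").asInstanceOf[java.util.List[String]].asScala // Buffer(TX, VA)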
You could parse the YAML file using Jackson or SnakeYAML. However, Jackson does not support references/anchors while SnakeYAML does, so it is better to parse the YAML file with SnakeYAML and then access the data elements with the Jackson library:
import java.io.{File, FileInputStream, FileReader}
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
import org.yaml.snakeyaml.Yaml
// Parsing the YAML file with SnakeYAML - since Jackson Parser does not support Anchors and references
val ios = new FileInputStream(new File(yamlFilePath)) // yamlFilePath: path to your YAML file
val yaml = new Yaml()
val mapper = new ObjectMapper().registerModules(DefaultScalaModule)
val yamlObj = yaml.loadAs(ios, classOf[AnyRef])
// Converting the YAML to Jackson YAML - since it has more flexibility
val jsonString = mapper.writerWithDefaultPrettyPrinter().writeValueAsString(yamlObj) // Formats YAML to a pretty printed JSON string - easy to read
val jsonObj = mapper.readTree(jsonString)
Finally, you get a JsonNode object, which lets you navigate the data and convert it to other datatypes.
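For instance, assuming the sample.yaml from the previous answer was parsed, navigation looks like this:
val country = jsonObj.get("country").asText() // "US"
val firstState = jsonObj.get("state").get(0).asText() // "TX"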

How do I set Jackson parser features when using json4s?

I am receiving the following error while attempting to parse JSON with json4s:
Non-standard token 'NaN': enable JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS to allow
How do I enable this feature?
Assuming your ObjectMapper object is named mapper:
import com.fasterxml.jackson.core.JsonParser
import com.fasterxml.jackson.databind.ObjectMapper
val mapper = new ObjectMapper()
// Configure NaN handling here
mapper.configure(JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS, true)
...
val json = ... //Get your json
val imported = mapper.readValue(json, classOf[Thing]) // Thing being whatever class you're importing to.
@Nathaniel Ford, thanks for setting me on the right path!
I ended up looking at the source code for the parse() method (which is what I should have done in the first place). This works:
import com.fasterxml.jackson.core.JsonParser
import com.fasterxml.jackson.databind.ObjectMapper
import org.json4s._
import org.json4s.jackson.Json4sScalaModule
val jsonString = """{"price": NaN}"""
val mapper = new ObjectMapper()
// Configure NaN here
mapper.configure(JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS, true)
mapper.registerModule(new Json4sScalaModule)
val json = mapper.readValue(jsonString, classOf[JValue])
While the answers above still work, it should be noted that since Jackson 2.10, JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS is deprecated.
The sustainable way to configure NaN handling is the following:
import com.fasterxml.jackson.core.json.JsonReadFeature
import com.fasterxml.jackson.databind.json.JsonMapper
val mapper = JsonMapper.builder().enable(JsonReadFeature.ALLOW_NON_NUMERIC_NUMBERS).build()
// now your parsing
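A quick check that the feature is active (the JSON literal is illustrative):
val node = mapper.readTree("""{"price": NaN}""")
node.get("price").asDouble().isNaN // true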

scala json4s, can't convert LocalDate

I'm having an issue with org.json4s (Scala), org.joda.time.LocalDate and org.json4s.ext.JodaTimeSerializers. Given that JodaTimeSerializers.all has a LocalDate conversion in it, I was hoping I could do the following, but I get the exception shown after it:
scala> import org.json4s.JString
import org.json4s.JString
scala> import org.joda.time.LocalDate
import org.joda.time.LocalDate
scala> import org.json4s.ext.JodaTimeSerializers
import org.json4s.ext.JodaTimeSerializers
scala> import org.json4s._
import org.json4s._
scala> implicit val formats: Formats = DefaultFormats ++ JodaTimeSerializers.all
formats: org.json4s.Formats = org.json4s.Formats$$anon$3@693d3d7f
scala> val jDate = JString("2016-01-26")
jDate: org.json4s.JsonAST.JString = JString(2016-01-26)
scala> jDate.extract[LocalDate]
org.json4s.package$MappingException: Can't convert JString(2016-01-26) to class org.joda.time.LocalDate
On the other hand, this works (not surprisingly):
scala> val jodaDate = LocalDate.parse(jDate.values)
I've tried to create a custom Serializer, which never gets called because it falls into the JodaSerializer realm, it seems. I have also created a custom Deserializer that works with java.time.LocalDate (ints and bytes from strings), but java.time.LocalDate messes with some other code, which is likely a different question. This one is: I'm looking for clues as to why JodaTimeSerializers.all cannot parse JString(2016-01-26), or any date string.
The top of the exception is: org.json4s.package$MappingException:
Can't convert JString(2016-01-01) to class org.joda.time.LocalDate (JodaTimeSerializers.scala:126)
Edit
This is still biting me, so I dug a bit further, and it's reproducible with the following.
import org.joda.time.LocalDate
import org.json4s.ext.JodaTimeSerializers
import org.json4s._
implicit val formats: Formats = DefaultFormats ++ JodaTimeSerializers.all
case class MyDate(myDate: LocalDate)
val stringyDate =
"""
{
"myDate" : "2016-01-01"
}
"""
import org.json4s.jackson.JsonMethods.parse
parse(stringyDate).extract[MyDate]
org.json4s.package$MappingException: No usable value for myDate
Can't convert JString(2016-01-01) to class org.joda.time.LocalDate
This seems to happen because on line 125 of JodaTimeSerializers.scala the value is not a JObject, it is a JString, so it falls into the value case on line 126, which throws the error.
Adding this here in case it bites someone else, and hopefully to get some assistance fixing it... but now I'm late. I have moved the code locally, hopefully to come up with a fix tomorrow.
This works. I define a custom serializer for LocalDate.
import org.json4s.JString
import org.joda.time.LocalDate
import org.json4s._
case object LocalDateSerializer extends CustomSerializer[LocalDate](format =>
  (
    { case JString(s) => LocalDate.parse(s) }, // deserialization: JString -> LocalDate
    Map() /* serialization side: TO BE IMPLEMENTED (an empty PartialFunction for now) */
  )
)
implicit val formats: Formats = DefaultFormats + LocalDateSerializer
val jDate = JString("2016-01-26")
jDate.extract[LocalDate] // res173: org.joda.time.LocalDate = 2016-01-26
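If you also need the serialization side, the second partial function can emit a JString; a minimal sketch (LocalDateRoundTripSerializer is an illustrative name; Joda's toString prints ISO-8601):
case object LocalDateRoundTripSerializer extends CustomSerializer[LocalDate](format =>
  (
    { case JString(s) => LocalDate.parse(s) }, // read: JString -> LocalDate
    { case d: LocalDate => JString(d.toString) } // write: LocalDate -> "yyyy-MM-dd"
  )
)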
The java.time serializers are also included in the library (in json4s-ext), but not in the default formats:
import org.json4s.ext.JavaTimeSerializers
implicit val formats: Formats = DefaultFormats ++ JavaTimeSerializers.all

How do you deserialize immutable collection using Kryo?

How do you deserialize an immutable collection using Kryo? Do I need to register something in addition to what I've done?
Here is my sample code
import java.io.{BufferedInputStream, BufferedOutputStream, FileInputStream, FileOutputStream}
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.Input
import com.esotericsoftware.kryo.io.Output
import com.romix.scala.serialization.kryo._
val kryo = new Kryo
// Serialization of Scala maps like Trees, etc
kryo.addDefaultSerializer(classOf[scala.collection.Map[_,_]], classOf[ScalaMapSerializer])
kryo.addDefaultSerializer(classOf[scala.collection.generic.MapFactory[scala.collection.Map]], classOf[ScalaMapSerializer])
// Serialization of Scala sets
kryo.addDefaultSerializer(classOf[scala.collection.Set[_]], classOf[ScalaSetSerializer])
kryo.addDefaultSerializer(classOf[scala.collection.generic.SetFactory[scala.collection.Set]], classOf[ScalaSetSerializer])
// Serialization of all Traversable Scala collections like Lists, Vectors, etc
kryo.addDefaultSerializer(classOf[scala.collection.Traversable[_]], classOf[ScalaCollectionSerializer])
val filename = "c:\\aaa.bin"
val ofile = new FileOutputStream(filename)
val output2 = new BufferedOutputStream(ofile)
val output = new Output(output2)
kryo.writeClassAndObject(output, List("Test1", "Test2"))
output.close()
val ifile = new FileInputStream(filename)
val input = new Input(new BufferedInputStream(ifile))
val deserialized = kryo.readClassAndObject(input)
input.close()
It throws exception
Exception in thread "main" com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): scala.collection.immutable.$colon$colon
Try this out as it worked for me:
import java.io.{BufferedInputStream, BufferedOutputStream, FileInputStream, FileOutputStream}
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.Input
import com.esotericsoftware.kryo.io.Output
import com.romix.scala.serialization.kryo._
import org.objenesis.strategy.StdInstantiatorStrategy
val kryo = new Kryo
kryo.setRegistrationRequired(false)
kryo.setInstantiatorStrategy(new StdInstantiatorStrategy())
kryo.register(classOf[scala.collection.immutable.$colon$colon[_]],60)
kryo.register(classOf[scala.collection.immutable.Nil$],61)
kryo.addDefaultSerializer(classOf[scala.Enumeration#Value], classOf[EnumerationSerializer])
val filename = "c:\\aaa.bin"
val ofile = new FileOutputStream(filename)
val output2 = new BufferedOutputStream(ofile)
val output = new Output(output2)
kryo.writeClassAndObject(output, List("Test1", "Test2"))
output.close()
val ifile = new FileInputStream(filename)
val input = new Input(new BufferedInputStream(ifile))
val deserialized = kryo.readClassAndObject(input)
input.close()
As an FYI, I got this working by looking at the unit tests for the romix library and then doing what they were doing. The key piece is Objenesis' StdInstantiatorStrategy, which creates instances without invoking a constructor, so the missing no-arg constructor on scala.collection.immutable.$colon$colon no longer matters.