I'm learning Circe and need help navigating JSON hierarchies.
Given JSON defined as follows:
import io.circe._
import io.circe.parser._
val jStr = """
{
"a1": ["c1", "c2"],
"a2": [{"d1": "abc", "d2": "efg"}, {"d1": "hij", "d2": "klm"}]
}
"""
val j = parse(jStr).getOrElse(Json.Null)
val ja = j.hcursor.downField("a2").as[Json].getOrElse("").toString
ja
ja is now: [ { "d1" : "abc", "d2" : "efg" }, { "d1" : "hij", "d2" : "klm" } ]
I can now do the following to this list:
case class Song(id: String, title: String)
implicit val songDecoder: Decoder[Song] = (c: HCursor) => for {
id <- c.downField("d1").as[String]
title <- c.downField("d2").as[String]
} yield Song(id,title)
io.circe.parser.decode[List[Song]](ja).getOrElse("")
Which returns what I want: List(Song(abc,efg), Song(hij,klm))
My questions are as follows:
How do I add item a1.c1 from the original json (j) to each item retrieved from the array? I want to add it to Song modified as follows: case class Song(id: String, title: String, artist: String)
It seems wrong to turn the json object back into a String for the iterative step of retrieving id and title. Is there a way to do this without turning json into String?
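A hedged sketch of one way to do both (my own attempt, not from the thread): decode straight off the cursor so nothing round-trips through a String, and read the artist from a1's first element before building each Song.
case class Song(id: String, title: String, artist: String)

val songs: Either[DecodingFailure, List[Song]] = for {
  artist <- j.hcursor.downField("a1").downN(0).as[String]            // a1's first element, "c1"
  pairs  <- j.hcursor.downField("a2").as[List[Map[String, String]]]  // no String round-trip
} yield pairs.map(m => Song(m("d1"), m("d2"), artist))
// Right(List(Song(abc,efg,c1), Song(hij,klm,c1)))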
Related
I'm new in Scala and trying to achieve the following using Scala's foldLeft() method or any other functional solution:
I have the following JSON:
{
"aspects": [
{
"name": "Name",
"values": [
"Phone"
]
},
{
"name": "Color",
"values": [
"Red",
"Black"
]
},
{
"name": "Size",
"values": [
"6",
"10"
]
}
]
}
I want to convert this into the following Seq[String]:
["Name:::Phone", "Color:::Red", "Color:::Black", "Size:::6", "Size:::10"]
I did that using Java style where aspects is an object representing the JSON:
aspects.foreach(pair => {
pair.values.foreach(value => {
valuesList += pair.name + ":::" + value
})
})
What is the best Scala way to do this?
It depends on the JSON library you are using, but once you have the aspects data in a suitable format, the output can be generated like this:
case class Aspect(name: String, values: Seq[String])
val aspects: Seq[Aspect] = ???
aspects.flatMap(a => a.values.map(a.name + ":::" + _))
I use json4s and jackson and the conversion code is basically just
val aspects = parse(json).extract[Seq[Aspect]]
Check the documentation for details, and check out other JSON libraries which may be more suitable for your application.
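A minimal end-to-end sketch of the json4s + jackson route (my own fill-in of the setup the answer alludes to; note the `\ "aspects"` step to descend past the wrapping object):
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

case class Aspect(name: String, values: Seq[String])

implicit val formats: Formats = DefaultFormats

val json = """{ "aspects": [ { "name": "Color", "values": ["Red", "Black"] } ] }"""

// Descend into "aspects" and extract the case classes, then flatMap as above.
val aspects: Seq[Aspect] = (parse(json) \ "aspects").extract[Seq[Aspect]]
aspects.flatMap(a => a.values.map(a.name + ":::" + _))
// List("Color:::Red", "Color:::Black")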
I have not parsed the JSON, since the object aspects already has these values; I'm just constructing the case classes manually and flatMapping them:
scala> final case class Pair(name: String, values: Seq[String])
defined class Pair
scala> final case class Aspects(pairs: Seq[Pair])
defined class Aspects
val namepair = Pair("Name",Seq("Phone"))
namepair: Pair = Pair(Name,List(Phone))
val colorpair = Pair("Color",Seq("Red","Black"))
colorpair: Pair = Pair(Color,List(Red, Black))
val sizepair = Pair("Size",Seq("6","10"))
sizepair: Pair = Pair(Size,List(6, 10))
val aspects = Aspects(Seq(namepair,colorpair,sizepair))
aspects: Aspects = Aspects(List(Pair(Name,List(Phone)), Pair(Color,List(Red, Black)), Pair(Size,List(6, 10))))
aspects.pairs.flatMap(pair=>pair.values.map(value=>s"${pair.name}:::$value"))
res1: Seq[String] = List(Name:::Phone, Color:::Red, Color:::Black, Size:::6, Size:::10)
I need to change from a List[String] to a List[MyObject] in Scala.
For example, the JSON input is like below:
employee: {
name: "test",
employeeBranch: ["CSE", "IT", "ECE"]
}
The output should be like this:
Employee: {
Name: "test",
EmployeeBranch:[{"branch": "CSE"}, {"branch": "IT"}, {"branch": "ECE"}]
}
Input case class:
case class Office(
name: Option[String],
employeeBranch: Option[List[String]])
Output case class:
case class Output(
Name: Option[String],
EmployeeBranch: Option[List[Branch]])
case class Branch(
branch: Option[String])
This is the requirement.
It is hard to answer without knowing details of the particular JSON library, but an Object is probably represented as a Map. So to convert a List[String] to a List[Map[String, String]] you can do this:
val list = List("CSE", "IT", "ECE")
val map = list.map(x => Map("branch" -> x))
This gives
List(Map(branch -> CSE), Map(branch -> IT), Map(branch -> ECE))
which should convert to the JSON you want.
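A hedged sketch tying this back to the case classes in the question (my own wiring, not part of the original answer):
case class Branch(branch: Option[String])
case class Office(name: Option[String], employeeBranch: Option[List[String]])
case class Output(Name: Option[String], EmployeeBranch: Option[List[Branch]])

// Wrap each branch name in a Branch, preserving the Option on the list itself.
def toOutput(office: Office): Output =
  Output(
    Name = office.name,
    EmployeeBranch = office.employeeBranch.map(_.map(b => Branch(Some(b))))
  )

toOutput(Office(Some("test"), Some(List("CSE", "IT", "ECE"))))
// Output(Some(test),Some(List(Branch(Some(CSE)), Branch(Some(IT)), Branch(Some(ECE)))))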
I have a use case where I need to read a JSON file or JSON string using Spark as a Dataset[T] in Scala. The JSON has nested elements and some of them are optional. I am able to read the JSON file and map it to a case class if I ignore the optional fields, as the schema then matches the case class.
According to this link and answer, it works for first-level JSON when the case class has an Option field, but if the Option field is inside a nested element it does not work.
Json String that I am using is as below :
val jsonString = """{
"Input" :
{
"field1" : "Test1",
"field2" : "Test2",
"field3Array" : [
{
"key1" : "Key123",
"key2" : ["keyxyz","keyAbc"]
}
]
},
"Output":
{
"field1" : "Test2",
"field2" : "Test3",
"requiredKey" : "RequiredKeyValue",
"field3Array" : [
{
"key1" : "Key123",
"key2" : ["keyxyz","keyAbc"]
}
]
}
}"""
The case class that I have created are as below :
case class InternalFields (key1: String, key2 : Array[String])
case class Input(field1:String, field2: String,field3Array : Array[InternalFields])
case class Output(field1:String, field2: String,requiredKey : String,field3Array : Array[InternalFields])
case class ExternalObject(input : Input, output : Output)
The code through which I am reading the jsonString is as below :
val df = spark.read.option("multiline","true").json(Seq(jsonString).toDS).as[ExternalObject]
The above code works perfectly fine. Now when I add an optional field to the Output case class (the JSON string may contain it for some use cases), it throws an error saying that the optional field I specified in the case class is missing.
To get around this, I tried providing the schema using encoders to see if that works.
After adding optional field my case class got changed to as below :
case class InternalFields (key1: String, key2 : Array[String])
case class Input(field1:String, field2: String,field3Array : Array[InternalFields])
case class Output(field1:String, field2: String,requiredKey : String, optionalKey : Option[String],field3Array : Array[InternalFields]) //changed
case class ExternalObject(input : Input, output : Output)
There is one additional optional field added in Output case class.
Now I am trying to read the jsonString as below :
import org.apache.spark.sql.Encoders
val schema = Encoders.product[ExternalObject].schema
val df = spark.read
.schema(schema)
.json(Seq(jsonString).toDS)
.as[ExternalObject]
When I do df.show or display(df), both the Input and Output columns come back null.
If I remove that optional field from the case class then this code also works fine and shows me the expected output.
Is there any way to make this optional field in the inner JSON / inner case class work and map it directly to the corresponding case class inside a Dataset[T]?
Any ideas, guidance, suggestions that can make it work would be of great help.
The problem is that Spark uses struct types to map a class to a Row. Take this as an example:
case class MyRow(a: String, b: String, c: Option[String])
Can spark create a dataframe, which sometimes has column c and sometimes not? like:
+-----+-----+-----+
| a | b | c |
+-----+-----+-----+
| a1 | b1 | c1 |
+-----+-----+-----+
| a2 | b2 | <-- note the non-existence here :)
+-----+-----+-----+
| a3 | b3 | c3 |
+-----+-----+-----+
Well, it cannot. Being nullable means the key has to exist, but the value can be null:
... other key values
"optionalKey": null,
...
This is considered to be valid, and is convertible to your structs. I suggest you use a dedicated JSON library (as you know there are many of them out there), and use UDFs or similar to extract what you need from the JSON.
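To sketch that last suggestion concretely (using Spark's built-in get_json_object rather than a hand-rolled UDF, which is a substitution on my part): keep the raw document as a string column and pull out only the fields you need; missing paths simply come back as null.
import org.apache.spark.sql.functions.{col, get_json_object}
import spark.implicits._

// Treat the whole document as one string row and extract fields by JSON path.
val raw = Seq(jsonString).toDF("value")
raw.select(
  get_json_object(col("value"), "$.Output.requiredKey").as("requiredKey"),
  get_json_object(col("value"), "$.Output.optionalKey").as("optionalKey") // null when the key is absent
).show()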
I tested the above code with the case class structures shown in the full listing below.
The JSON string cannot be passed directly to the DataFrameReader as you have tried, since the json method expects a path. I put the JSON string in a file, passed the file path to the DataFrameReader, and the results were as follows:
import org.apache.spark.sql.{Encoder,Encoders}
import org.apache.spark.sql.Dataset
case class Field3Array(
key1: String,
key2: List[String]
)
case class Input(
field1: String,
field2: String,
field3Array: List[Field3Array]
)
case class Output(
field1: String,
field2: String,
requiredKey: String,
field3Array: List[Field3Array]
)
case class Root(
Input: Input,
Output: Output
)
val pathToJson: String = "file:////path/to/json/file/on/local/filesystem"
val jsEncoder: Encoder[Root] = Encoders.product[Root]
val df: Dataset[Root] = spark.read.option("multiline","true").json(pathToJson).as[Root]
The results for show are as follows:
df.show(false)
+--------------------------------------------+--------------------------------------------------------------+
|Input |Output |
+--------------------------------------------+--------------------------------------------------------------+
|[Test1, Test2, [[Key123, [keyxyz, keyAbc]]]]|[Test2, Test3, [[Key123, [keyxyz, keyAbc]]], RequiredKeyValue]|
+--------------------------------------------+--------------------------------------------------------------+
df.select("Input.field1").show()
+------+
|field1|
+------+
| Test1|
+------+
Let say I have a config file with the following:
someConfig: [
{"t1" :
[ {"t11" : "v11",
"t12" : "v12",
"t13" : "v13",
"t14" : "v14",
"t15" : "v15"},
{"t21" : "v21",
"t22" : "v22",
"t23" : "v13",
"t24" : "v14",
"t25" : "v15"}]
},
"p1" :
[ {"p11" : "k11",
"p12" : "k12",
"p13" : "k13",
"p14" : "k14",
"p15" : "k15"},
{"p21" : "k21",
"p22" : "k22",
"p23" : "k13",
"p24" : "k14",
"p25" : "k15"}]
}
]
I would like to retrieve it as a Scala immutable collection Map[String, List[Map[String, String]]].
Using the following code I am only able to retrieve it as a List of HashMaps (more precisely a $colon$colon of HashMap), which fails when I try to iterate through it. Ideally, to complete my code, I need a way to convert the HashMaps to Scala Maps:
import java.util.Map.Entry
import com.typesafe.config.{ConfigFactory, ConfigObject, ConfigValue}
import scala.collection.JavaConverters._

def example: Map[String, List[Map[String, String]]] = {
  val tmp = ConfigFactory.load("filename.conf")
  val mylist: Iterable[ConfigObject] = tmp.getObjectList("someConfig").asScala
  (for {
    item: ConfigObject <- mylist
    myEntry: Entry[String, ConfigValue] <- item.entrySet().asScala
    name = myEntry.getKey
    value = myEntry.getValue.unwrapped()
      .asInstanceOf[java.util.ArrayList[Map[String, String]]]
      .asScala.toList
  } yield (name, value)).toMap
}
This code should be able to give you what you are looking for.
It builds up lists and maps for your bespoke structure.
The final reduceLeft is because your JSON starts with a list, someConfig: [ ], so I've flattened that out. If you wanted, you could probably remove the [ ]'s, as they are probably not required to represent the data you have.
import com.typesafe.config.ConfigFactory
import scala.collection.JavaConverters._

val config = ConfigFactory.load("filename.conf")

// These methods convert from Java lists/maps to Scala ones, so it's easier to use
private def toMap(hashMap: AnyRef): Map[String, AnyRef] =
  hashMap.asInstanceOf[java.util.Map[String, AnyRef]].asScala.toMap
private def toList(list: AnyRef): List[AnyRef] =
  list.asInstanceOf[java.util.List[AnyRef]].asScala.toList

val someConfig: Map[String, List[Map[String, String]]] =
  config.getList("someConfig").unwrapped().asScala.map { someConfigItem =>
    toMap(someConfigItem) map {
      case (key, value) =>
        key -> toList(value).map {
          x => toMap(x).map { case (k, v) => k -> v.toString }
        }
    }
  }.reduceLeft(_ ++ _)
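For reference, a quick usage check against the sample config above (keys taken from that sample):
someConfig("t1").headOption.flatMap(_.get("t11"))   // Some("v11")
someConfig("p1").flatMap(_.get("p21"))              // List("k21")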
If you store your configs in the application.conf like this:
someConfig{
list1{
value1 = "myvalue1"
value2 = "myvalue2"
.....
valueN = "myvalueN"
}
list2{
......
}
.....
listN{
......
}
}
you can do the following:
val myconfig = ConfigFactory.load().getObject("someConfig.list1").toConfig
and afterwards you can access the values like
myconfig.getString("value1")
myconfig.getString("value2")
etc.
which will return the strings "myvalue1" and "myvalue2".
Not the most elegant way, but plain and easy.
I have a user object as follows:
{ user: "joe", acks: ["a", "b" ] }
I want to add a set of strings to the acks field. Here's my attempt to do this with one update:
def addSomeAcks(toBeAcked: Set[String]) {
  DB.getCollection("userAcks").update(
    MongoDBObject("user" -> "joe"),
    $addToSet("acks") $each toBeAcked
  )
}
def test() {
addSomeAcks(Set("x", "y", "z"))
}
When I run this code I get an embedded set as follows:
{ user: "joe", acks: ["a", "b", ["x", "y", "z" ] ] }
but the result I want is:
{ user: "joe", acks: ["a", "b", "x", "y", "z" ] }
I can make it work by calling update for each item in toBeAcked, is there a way to do this in one call?
The problem is that $each takes a variable number of arguments, not a collection type like Traversable. Because of that it treats the set that you pass as a single element and adds it to array as such. This leads to nesting as you observe. You need to unwrap it this way: $each(toBeAcked: _*) or pass each elem separately $each("x", "y", "z").
Here is a complete example that works as you'd expect it to:
package com.example
import com.mongodb.casbah.Imports._
object TestApp extends App {
val col = MongoConnection()("test")("userAcks")
def printAll(): Unit =
col.find().foreach(println)
def insertFirst(): Unit =
col.insert(MongoDBObject("user" -> "joe", "acks" -> List("a", "b")))
def addSomeAcks(toBeAcked: Seq[String]): Unit =
col.update(
MongoDBObject("user" -> "joe"),
$addToSet("acks") $each (toBeAcked: _*))
printAll()
insertFirst()
printAll()
addSomeAcks(Seq("x", "y", "z"))
printAll()
}