Spark Scala - How to construct Scala Map from nested JSON?

I have JSON data with nested fields that I want to extract into a Scala Map.
Here's the sample JSON:
"nested_field": [
{
"airport": "sfo",
"score": 1.0
},
{
"airport": "phx",
"score": 1.0
},
{
"airport": "sjc",
"score": 1.0
}
]
I want to construct a Scala Map and use saveToEs() to index the field into an Elasticsearch index with the mapping below:
"nested_field": {
"properties": {
"score": {
"type": "double"
},
"airport": {
"type": "keyword",
"ignore_above": 1024
}
}
}
The JSON file is read into a DataFrame using spark.read.json("example.json"). What's the right way to construct the Scala Map in this case?
Thanks for any help!

You can do it with json4s, using the sample code below:
import org.json4s.DefaultFormats
import org.json4s.jackson.JsonMethods.parse
case class AirPortScores(airport: String, score: Double)
case class JsonRulesHandler(airports: List[AirPortScores])
val jsonString: String = """{"airports":[{"airport":"sfo","score":1},{"airport":"phx","score":1},{"airport":"sjc","score":1}]}"""
// Parse the JSON string and extract it into the case classes above
def loadJsonString(json: String): JsonRulesHandler = {
  implicit val formats: DefaultFormats.type = DefaultFormats
  parse(json).extract[JsonRulesHandler]
}
val parsedJson: JsonRulesHandler = loadJsonString(jsonString)
// Each element exposes .airport and .score
parsedJson.airports.foreach(println)
// Output:
// AirPortScores(sfo,1.0)
// AirPortScores(phx,1.0)
// AirPortScores(sjc,1.0)
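The question asks for a Scala Map, so here is a minimal sketch of that last step in plain Scala. It assumes the nested entries have already been parsed into a case class; the names AirportScore, NestedFieldMap and toDocument are hypothetical and only serve the illustration. Whether the resulting Map can be passed directly to saveToEs() depends on how the connector is set up, so treat this as a starting point rather than a definitive answer.

```scala
object NestedFieldMap {
  // Hypothetical case class mirroring one entry of "nested_field"
  final case class AirportScore(airport: String, score: Double)

  // Build a document shaped like the mapping in the question:
  // "nested_field" holds a sequence of maps with "airport" and "score" keys
  def toDocument(scores: Seq[AirportScore]): Map[String, Any] =
    Map(
      "nested_field" -> scores.map { s =>
        Map[String, Any]("airport" -> s.airport, "score" -> s.score)
      }
    )
}
```

For example, NestedFieldMap.toDocument(Seq(NestedFieldMap.AirportScore("sfo", 1.0))) yields a Map with a single "nested_field" key whose value is a one-element sequence of maps.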

Related

use JsLookup for multiple level array using play Json for lists with objects and simple lists

I have a function that extracts a list of elements from a JSON object based on the path I give it.
The function looks like this:
import play.api.libs.json._
def findAllValuesAtPath(jsValue: JsObject, path: String): List[JsValue] = {
  // Split on "[*]." first, then on ".", and search recursively for each segment
  val jsPath = JsPath(path
    .split("\\[\\*]\\.")
    .flatMap(s => s.split("\\.").map(RecursiveSearch))
    .toList)
  jsPath(jsValue)
}
for example:
{
"person": {
"kids": [
{
"name": "josh",
"age": 5
},
{
"name": "julia",
"age": 13
}
]
}
}
Now if I give the path "person.kids[*].name", I get the list of their names, List("josh", "julia"), which is what I want.
But if the list of kids were just a simple list like:
{
"person": {
"kids": [
"josh",
"julia"
]
}
}
and I give the path "person.kids[*]", I get an empty list, List(), whereas I want List("josh", "julia") (without the brackets).
Do you see any way to improve my function to handle both cases?
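One way to see why the second path returns nothing is to look at the segments the function produces. The sketch below reproduces only the string splitting, in plain Scala with no Play JSON involved; PathSegments, original and stripped are hypothetical names, and stripped is just one possible alternative tokenization, not a confirmed fix.

```scala
object PathSegments {
  // Same tokenization as findAllValuesAtPath: split on "[*]." first, then on "."
  def original(path: String): List[String] =
    path.split("\\[\\*]\\.").flatMap(_.split("\\.")).toList

  // Hypothetical alternative: drop every "[*]" marker, then split on "."
  def stripped(path: String): List[String] =
    path.replace("[*]", "").split("\\.").toList
}
```

With the original tokenization, a trailing "[*]" is not followed by a dot, so it stays glued to the last segment and the search looks for a key literally named "kids[*]". With the stripped variant the last segment is "kids", but the search would then presumably return the array itself as a single value, so the caller would still need to flatten arrays into their elements.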

Scala & json4s: How do I filter a json array

Array example:
[
{
"name": "John"
},
{
"name": "Joseph"
},
{
"name": "Peter"
}
]
I'd like to filter out the objects whose names do not start with Jo:
[
{
"name": "John"
},
{
"name": "Joseph"
}
]
The result might be a String or JValue with json array inside.
I was not able to find a direct JSON query mechanism in json4s, so I created a case class.
I mapped the JSON to the case class, filtered it, and wrote it back to JSON:
import org.json4s.{DefaultFormats, Formats}
import org.json4s.jackson.JsonMethods.parse
import org.json4s.jackson.Serialization.write
case class PersonInfo(name: String)
object JsonFilter {
  def main(args: Array[String]): Unit = {
    // DefaultFormats is enough here; type hints would add a "jsonClass" field to the output
    implicit val formats: Formats = DefaultFormats
    val people: List[PersonInfo] = parse("""[
        | {
        | "name": "John"
        | },
        | {
        | "name": "Joseph"
        | },
        | {
        | "name": "Peter"
        | }
        |]""".stripMargin)
      .extract[List[PersonInfo]]
    // Keep only the names starting with "Jo" and serialize back to a JSON string
    val output = write(people.filter(p => p.name.startsWith("Jo")))
    println(output)
  }
}

Decode a nested array with circe-optics

I have JSON like this:
"data": {
  "project": {
    "activityChildren": [
      {
        "id": 2,
        "parents": [
          {
            "id": 1
          }
        ]
      }
    ]
  }
}
I'd like to decode this to List[(Long, List[Long])] with circe-optics. I got as far as:
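import io.circe.optics.JsonPath.root
// Collect each activity's id together with the ids of its parents
val activityParents: Map[Long, List[Long]] = root.data.project.activityChildren.each.json
  .getAll(json)
  .flatMap { activity =>
    root.id.long.getOption(activity).map(_ -> root.parents.each.id.long.getAll(activity))
  }
  .toMap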
I wonder whether it's possible to define a single lens for this that just turns the JSON into the desired map without explicitly mapping over the intermediate array. If so, how?

Scala - Extract a list efficiently from a map

I have a large json object: myNestedObject
{
"size": 2,
"values": [{
"name": "mullock",
"upstatus": "Green",
"details": {
"key": "rupture farms",
"server": "mudos",
"owner": "magog_cartel",
"type": "NORMAL",
"links": {
"self": [{
"address": "https://mudos.com:port/access"
}]
}
}
},{
"name": "tassadar",
"upstatus": "Orange",
"details": {
"key": "archon",
"server": "protoss",
"owner": "aspp67",
"type": "NORMAL",
"links": {
"self": [{
"address": "https://aiur.com:port/access"
}]
}
}
}],
"limit": 100
}
These are the case classes and the Jackson setup:
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper
case class Result(size: Int, limit: Int, values: Seq[Values])
case class Values(name: String, upstatus: String, details: ValuesDetails)
case class ValuesDetails(key: String, server: String, owner: String, `type`: String, links: ValuesDetailsLinks)
case class ValuesDetailsLinks(self: Seq[ValuesDetailsLinksAddress])
case class ValuesDetailsLinksAddress(address: String)
object Foo {
  val mapper = new ObjectMapper() with ScalaObjectMapper
  mapper.registerModule(DefaultScalaModule)
  // Deserialize the JSON string into the case classes above
  def test(json: String): Result = mapper.readValue[Result](json)
}
myNestedObject.size is a field that gives the length of the array myNestedObject.values.
I'm trying to extract the values below into a List[String]:
myNestedObject.values(i).name
I've been using a crude for loop to extract them, and the code works:
val selectNames: List[String] =
  for (i <- (0 until myNestedObject.size).toList) yield
    myNestedObject.values(i).name
Refactoring my code and trying to use a .map (unsuccessfully) to do the same thing, two attempts:
myNestedObject.map(_ => values(i).name)
myNestedObject.(values(i).name).asInstanceOfList
Disclaimer: I'm a complete novice at this.
SOLUTION: the values list can be accessed without specifying index
myNestedObject.values.map(_.name)
I think you could do this:
myNestedObject.values.map(_.name)
Dylan Grald's answer is correct, but you could also use an equivalent for-comprehension:
for (x <- myNestedObject.values)
  yield x.name
This desugars to the version using the map method. For simple cases like this I prefer just calling the map method directly, but I thought I would mention the for version as an alternative.
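To compare the three variants side by side, here is a small self-contained sketch. It uses trimmed-down case classes (Value and Result with only a few fields) and a hard-coded sample instead of the Jackson-parsed object, so those names and values are assumptions made for illustration.

```scala
object ExtractNames {
  // Simplified stand-ins for the case classes in the question
  final case class Value(name: String, upstatus: String)
  final case class Result(size: Int, limit: Int, values: Seq[Value])

  val sample: Result =
    Result(2, 100, Seq(Value("mullock", "Green"), Value("tassadar", "Orange")))

  // Index-based loop, as in the question
  val viaLoop: List[String] =
    (0 until sample.size).toList.map(i => sample.values(i).name)

  // Direct map over the sequence; no index needed
  val viaMap: List[String] = sample.values.map(_.name).toList

  // Equivalent for-comprehension
  val viaFor: List[String] = (for (v <- sample.values) yield v.name).toList
}
```

All three produce the same list of names. The map version also avoids relying on the size field agreeing with the actual length of values.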

Elasticsearch nested filtering (elastic4s, scala)

I am using Elasticsearch 1.7 and elastic4s DSL.
My problem is that I can't combine and and or filters on a nested document.
For example, here is a JSON representation of my instance of case class Candidate:
{
  "name": "Samy",
  "interviews": [
    {
      "clientId": 0,
      "stateId": "CANCELED"
    },
    {
      "clientId": 1,
      "stateId": "APPROVED"
    }
  ]
}
Here is my filter :
def filtering(interviewAndCandidates: IntCand)(implicit user: PublicUser): Seq[FilterDefinition] = {
  nestedFilter("interviews").filter(termFilter("clientId", user.id)) ::
    List(or(interviewAndCandidates.interviews.map(state ⇒ nestedFilter("interviews").filter(termFilter("stateId", state)))))
}
Then I build the query:
var request: SearchDefinition = search in "myIndex" -> "candidate" query {
  filteredQuery query {
    matchAllQuery
  } filter {
    and(filters)
  }
}
With:
case class IntCand(interviews: List[String])
case class Candidate(name: String, interviews: List[Interview])
case class Interview(clientId: Long, stateId: String)
The problem is that when I filter on IntCand(List("CANCELED")) and clientId=1, the response still returns the Candidate (I want to filter on clientId AND interview state).
I solved it by denormalizing the data.
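A likely reason for the unexpected match is that each nestedFilter is evaluated independently, so the clientId condition and the stateId condition can be satisfied by two different interviews. The sketch below models that behaviour with plain Scala collections. It does not use elastic4s or Elasticsearch at all; NestedMatchModel, separateFilters and singleFilter are hypothetical names for this illustration only.

```scala
object NestedMatchModel {
  final case class Interview(clientId: Long, stateId: String)
  final case class Candidate(name: String, interviews: List[Interview])

  // Two independent nested conditions: each may be satisfied by a different interview
  def separateFilters(c: Candidate, clientId: Long, states: Set[String]): Boolean =
    c.interviews.exists(_.clientId == clientId) &&
      c.interviews.exists(i => states.contains(i.stateId))

  // One nested condition: the same interview must satisfy both parts
  def singleFilter(c: Candidate, clientId: Long, states: Set[String]): Boolean =
    c.interviews.exists(i => i.clientId == clientId && states.contains(i.stateId))

  val samy: Candidate =
    Candidate("Samy", List(Interview(0, "CANCELED"), Interview(1, "APPROVED")))
}
```

In this model, separateFilters(samy, 1, Set("CANCELED")) is true because interview 1 matches the client and interview 0 matches the state, while singleFilter is false. Under that reading, placing both term conditions inside a single nested filter would be an alternative to denormalizing, though this was not verified against Elasticsearch 1.7.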