Decoding a nested json using circe - scala

Hi, I am trying to write a decoder for a nested JSON using circe in Scala 3 but can't quite figure out how. The JSON I have looks something like this:
[{
  "id" : "something",
  "clientId" : "something",
  "name" : "something",
  "rootUrl" : "something",
  "baseUrl" : "something",
  "surrogateAuthRequired" : someBoolean,
  "enabled" : someBoolean,
  "alwaysDisplayInConsole" : someBoolean,
  "clientAuthenticatorType" : "client-secret",
  "redirectUris" : [
    "/realms/WISEMD_V2_TEST/account/*"
  ],
  "webOrigins" : [ ],
  ...
  "protocolMappers" : [
    {
      "id" : "some Id",
      "name" : "something",
      "protocol" : "something",
      "protocolMapper" : "something",
      "consentRequired" : someBoolean,
      "config" : {
        "claim.value" : "something",
        "userinfo.token.claim" : "someBoolean",
        "id.token.claim" : "someBoolean",
        "access.token.claim" : "someBoolean",
        "claim.name" : "something",
        "jsonType.label" : "something",
        "access.tokenResponse.claim" : "something"
      }
    },
    {
      "id" : "some Id",
      "name" : "something",
      "protocol" : "something",
      "protocolMapper" : "something",
      "consentRequired" : someBoolean,
      "config" : {
        "claim.value" : "something",
        "userinfo.token.claim" : "someBoolean",
        "id.token.claim" : "someBoolean",
        "access.token.claim" : "someBoolean",
        "claim.name" : "something",
        "jsonType.label" : "something",
        "access.tokenResponse.claim" : "something"
      }
    },
    ...
  ]
}]
What I want is for my decoder to produce a list of protocolMappers with name and claim.value, something like List(ProtocolMappers("something", Configs("something")), ProtocolMappers("something", Configs("something"))).
The case class I have consists of just the needed keys and looks something like this:
case class ClientsResponse(
  id: String,
  clientId: String,
  name: String,
  enabled: Boolean,
  alwaysDisplayInConsole: Boolean,
  redirectUris: Seq[String],
  directAccessGrantsEnabled: Boolean,
  publicClient: Boolean,
  access: Access,
  protocolMappers: List[ProtocolMappers]
)

case class ProtocolMappers(
  name: String,
  config: Configs
)

case class Configs(
  claimValue: String
)
And my decoder looks something like this:
given clientsDecoder: Decoder[ClientsResponse] = new Decoder[ClientsResponse] {
  override def apply(x: HCursor) =
    for {
      id <- x.downField("id").as[Option[String]]
      clientId <- x.downField("clientId").as[Option[String]]
      name <- x.downField("name").as[Option[String]]
      enabled <- x.downField("enabled").as[Option[Boolean]]
      alwaysDisplayInConsole <- x
        .downField("alwaysDisplayInConsole")
        .as[Option[Boolean]]
      redirectUris <- x.downField("redirectUris").as[Option[Seq[String]]]
      directAccessGrantsEnabled <- x
        .downField("directAccessGrantsEnabled")
        .as[Option[Boolean]]
      publicClient <- x.downField("publicClient").as[Option[Boolean]]
      access <- x.downField("access").as[Option[Access]]
      protocolMapper <- x.downField("protocolMappers").as[Option[List[ProtocolMappers]]]
    } yield ClientsResponse(
      id.getOrElse(""),
      clientId.getOrElse(""),
      name.getOrElse(""),
      enabled.getOrElse(false),
      alwaysDisplayInConsole.getOrElse(false),
      redirectUris.getOrElse(Seq()),
      directAccessGrantsEnabled.getOrElse(false),
      publicClient.getOrElse(false),
      access.getOrElse(Access(false, false, false)),
      protocolMapper.getOrElse(List(ProtocolMappers("", Configs(""))))
    )
}

given protocolMapperDecoder: Decoder[ProtocolMappers] = new Decoder[ProtocolMappers] {
  override def apply(x: HCursor) =
    for {
      protocolName <- x.downField("protocolMappers").downField("name").as[Option[String]]
      configs <- x.downField("protocolMappers").downField("config").as[Option[Configs]]
      claimValue <- x.downField("protocolMappers").downField("config").downField("claim.value").as[Option[String]]
    } yield ProtocolMappers(protocolName.getOrElse(""), configs.getOrElse(Configs("")))
}

given configsDecoder: Decoder[Configs] = new Decoder[Configs] {
  override def apply(x: HCursor) =
    for {
      claimValue <- x.downField("protocolMappers").downField("config").downField("claim.value").as[Option[String]]
    } yield Configs(claimValue.getOrElse(""))
}
but it just returns empty strings. Can you please help me with how to do this?
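A likely cause, for what it's worth: protocolMapperDecoder and configsDecoder are invoked on each element of the protocolMappers array, so their cursor is already positioned inside that element; calling downField("protocolMappers") again finds nothing, every Option decodes to None, and the getOrElse defaults produce the empty strings. A minimal sketch of element-level decoders, assuming the case classes above:

import io.circe.Decoder

// Sketch only: each decoder starts at the node it is asked to decode,
// so no extra downField("protocolMappers") hop is needed.
given Decoder[Configs] = Decoder.instance { c =>
  c.downField("claim.value").as[String].map(Configs(_))
}

given Decoder[ProtocolMappers] = Decoder.instance { c =>
  for {
    name   <- c.downField("name").as[String]
    config <- c.downField("config").as[Configs]
  } yield ProtocolMappers(name, config)
}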

Related

Join data-frame based on value in list of WrappedArray

I have to join two spark data-frames in Scala based on a custom function. Both data-frames have the same schema.
Sample Row of data in DF1:
{
  "F1" : "A",
  "F2" : "B",
  "F3" : "C",
  "F4" : [
    {
      "name" : "N1",
      "unit" : "none",
      "count" : 50.0,
      "sf1" : "val_1",
      "sf2" : "val_2"
    },
    {
      "name" : "N2",
      "unit" : "none",
      "count" : 100.0,
      "sf1" : "val_3",
      "sf2" : "val_4"
    }
  ]
}
Sample Row of data in DF2:
{
  "F1" : "A",
  "F2" : "B",
  "F3" : "C",
  "F4" : [
    {
      "name" : "N1",
      "unit" : "none",
      "count" : 80.0,
      "sf1" : "val_5",
      "sf2" : "val_6"
    },
    {
      "name" : "N2",
      "unit" : "none",
      "count" : 90.0,
      "sf1" : "val_7",
      "sf2" : "val_8"
    },
    {
      "name" : "N3",
      "unit" : "none",
      "count" : 99.0,
      "sf1" : "val_9",
      "sf2" : "val_10"
    }
  ]
}
RESULT of Joining these sample rows:
{
  "F1" : "A",
  "F2" : "B",
  "F3" : "C",
  "F4" : [
    {
      "name" : "N1",
      "unit" : "none",
      "count" : 80.0,
      "sf1" : "val_5",
      "sf2" : "val_6"
    },
    {
      "name" : "N2",
      "unit" : "none",
      "count" : 100.0,
      "sf1" : "val_3",
      "sf2" : "val_4"
    },
    {
      "name" : "N3",
      "unit" : "none",
      "count" : 99.0,
      "sf1" : "val_9",
      "sf2" : "val_10"
    }
  ]
}
The result is:
a full outer join based on the values of "F1", "F2" and "F3", plus
a join of "F4" keeping unique nodes (using name as the id) with the max value of "count".
I am not very familiar with Scala and have been struggling with this for more than a day now. Here is what I have gotten to so far:
val df1 = sqlContext.read.parquet("stack_a.parquet")
val df2 = sqlContext.read.parquet("stack_b.parquet")
val df4 = df1.toDF(df1.columns.map(_ + "_A"): _*)
val df5 = df2.toDF(df1.columns.map(_ + "_B"): _*)
val df6 = df4.join(df5, df4("F1_A") === df5("F1_B") && df4("F2_A") === df5("F2_B") && df4("F3_A") === df5("F3_B"), "outer")

def joinFunction(r: Row) = {
  // Need the real-deal here!
  // print(r(3)) // --> Any = WrappedArray([..])
  // also considering parsing as json to do the processing but not sure about the performance impact
  // val parsed = JSON.parseFull(r.json) // then play with parsed
  r.toSeq
}

val finalResult = df6.rdd.map(joinFunction)
finalResult.collect
I was planning to add the custom merge logic in joinFunction but I am struggling to convert the WrappedArray/Any class to something I can work with.
Any inputs on how to do the conversion or the join in a better way will be very helpful.
Thanks!
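On the WrappedArray/Any part specifically, here is a minimal sketch (the column name "F4_A" is an assumption taken from the renaming above): Spark exposes an array-of-struct column on a Row as a Seq[Row], so getAs can pull it out without a JSON round trip.

import org.apache.spark.sql.Row

// Sketch: read the "F4_A" array-of-struct column from a joined row and
// project out the fields needed for the merge; the Option guards against
// nulls produced by the full outer join.
def extractF4(r: Row): Seq[(String, Double)] =
  Option(r.getAs[Seq[Row]]("F4_A")).getOrElse(Seq.empty)
    .map(e => (e.getAs[String]("name"), e.getAs[Double]("count")))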
Edit (7 Mar, 2021)
The full-outer join actually has to be performed only on "F1".
Hence, using @werner's answer, I am doing:
val df1_a = df1.toDF(df1.columns.map(_ + "_A"): _*)
val df2_b = df2.toDF(df2.columns.map(_ + "_B"): _*)

val finalResult = df1_a.join(df2_b, df1_a("F1_A") === df2_b("F1_B"), "full_outer")
  .drop("F1_B")
  .withColumn("F4", joinFunction(col("F4_A"), col("F4_B")))
  .drop("F4_A", "F4_B")
  .withColumn("F2", when(col("F2_A").isNull, col("F2_B")).otherwise(col("F2_A")))
  .drop("F2_A", "F2_B")
  .withColumn("F3", when(col("F3_A").isNull, col("F3_B")).otherwise(col("F3_A")))
  .drop("F3_A", "F3_B")
But I am getting this error. What am I missing..?
You can implement the merge logic with the help of a UDF:
// case class to define the schema of the UDF's return value
case class F4(name: String, unit: String, count: Double, sf1: String, sf2: String)

val joinFunction = udf((a: Seq[Row], b: Seq[Row]) =>
  (a ++ b).map(r => F4(r.getAs[String]("name"),
      r.getAs[String]("unit"),
      r.getAs[Double]("count"),
      r.getAs[String]("sf1"),
      r.getAs[String]("sf2")))
    // group the elements from both arrays by name
    .groupBy(_.name)
    // take the element with the max count from each group
    .map { case (_, d) => d.maxBy(_.count) }
    .toSeq)

// join the two dataframes
val finalResult = df1.withColumnRenamed("F4", "F4_A").join(
    df2.withColumnRenamed("F4", "F4_B"), Seq("F1", "F2", "F3"), "full_outer")
  // call the merge function
  .withColumn("F4", joinFunction('F4_A, 'F4_B))
  // drop the intermediate columns
  .drop("F4_A", "F4_B")

SCALA: Reading JSON file with the path provided

I have a scenario where I will be getting different JSON results from multiple APIs, and I need to read a specific value from each response.
For instance, my JSON response is as below, and I need the user to provide a format by which I can read the value of lat. I don't want a hard-coded approach for this; the user can provide the node to read in some other JSON or txt file:
{
  "name" : "Watership Down",
  "location" : {
    "lat" : 51.235685,
    "long" : -1.309197
  },
  "residents" : [ {
    "name" : "Fiver",
    "age" : 4,
    "role" : null
  }, {
    "name" : "Bigwig",
    "age" : 6,
    "role" : "Owsla"
  } ]
}
You can get the key from the JSON using the Scala JSON parser as below. I'm defining a function to get the lat, which you can make generic as per your need, so that you only need to change the function.
import scala.util.parsing.json.JSON

val json =
  """
    |{
    |  "name" : "Watership Down",
    |  "location" : {
    |    "lat" : 51.235685,
    |    "long" : -1.309197
    |  },
    |  "residents" : [ {
    |    "name" : "Fiver",
    |    "age" : 4,
    |    "role" : null
    |  }, {
    |    "name" : "Bigwig",
    |    "age" : 6,
    |    "role" : "Owsla"
    |  } ]
    |}
  """.stripMargin

val jsonObject = JSON.parseFull(json).get.asInstanceOf[Map[String, Any]]

val latLambda: (Map[String, Any] => Option[Double]) = _.get("location")
  .map(_.asInstanceOf[Map[String, Any]]("lat").toString.toDouble)

assert(latLambda(jsonObject) == Some(51.235685))
The expanded version of the function:
val latitudeLambda = new Function[Map[String, Any], Double] {
  override def apply(input: Map[String, Any]): Double = {
    input("location").asInstanceOf[Map[String, Any]]("lat").toString.toDouble
  }
}
Make the function generic so that once you know what key you want from the JSON, you just change the function and apply it to the JSON.
Hope it helps. But there are nicer APIs out there, like the Play JSON lib. You can simply use:
import play.api.libs.json._
val jsonVal = Json.parse(json)
val lat = (jsonVal \ "location" \ "lat").get
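Since the question asks for the node to be supplied by the user rather than hard-coded, here is a minimal sketch on top of Play JSON (the dot-separated path format, e.g. "location.lat", is an assumption, not part of the question):

import play.api.libs.json._

// Sketch: walk a user-supplied, dot-separated path down a parsed JSON value.
def readPath(json: JsValue, path: String): Option[JsValue] =
  path.split('.').foldLeft(Option(json)) { (acc, key) =>
    acc.flatMap(j => (j \ key).toOption)
  }

// readPath(Json.parse(json), "location.lat").map(_.as[Double]) // Some(51.235685)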

elastic4s geodistance sort query syntax

I am using elastic4s version 1.6.2 and need to write a query which searches for a given geo location, sorts results by distance and also returns the distance.
I can do this using a GET request in curl, but I am struggling to find the correct syntax and an example for the geolocation sort.
case class Store(id: Int, name: String, number: Int, building: Option[String] = None, street: String, postCode: String, county: String, geoLocation: GeoLocation)

case class GeoLocation(lat: Double, lon: Double)

val createMappings = client.execute {
  create index "stores" mappings (
    "store" as (
      "store" typed StringType,
      "geoLocation" typed GeoPointType
    )
  )
}

def searchByGeoLocation(geoLocation: GeoLocation) = {
  client.execute {
    search in "stores" -> "store" postFilter {
      geoDistance("geoLocation") point(geoLocation.lat, geoLocation.lon) distance(2, KILOMETERS)
    }
  }
}
Does someone know how to add the sort by distance from a geo location and get the distance?
The following curl command works as expected:
curl -XGET 'http://localhost:9200/stores/store/_search?pretty=true' -d '
{
  "sort" : [
    {
      "_geo_distance" : {
        "geoLocation" : {
          "lat" : 52.0333793839746,
          "lon" : -0.768937531935448
        },
        "order" : "asc",
        "unit" : "km"
      }
    }
  ],
  "query": {
    "filtered" : {
      "query" : {
        "match_all" : {}
      },
      "filter" : {
        "geo_distance" : {
          "distance" : "20km",
          "geoLocation" : {
            "lat" : 52.0333793839746,
            "lon" : -0.768937531935448
          }
        }
      }
    }
  }
}'
Not an expert in elastic4s, but this query should be equivalent to your curl command:
val geoLocation = GeoLocation(52.0333793839746, -0.768937531935448)

search in "stores" -> "store" query {
  filteredQuery filter {
    geoDistance("geoLocation") point(geoLocation.lat, geoLocation.lon) distance(20, KILOMETERS)
  }
} sort {
  geoSort("geoLocation") point(geoLocation.lat, geoLocation.lon) order SortOrder.ASC
}
it prints the following query:
{
  "query" : {
    "filtered" : {
      "query" : {
        "match_all" : { }
      },
      "filter" : {
        "geo_distance" : {
          "geoLocation" : [ -0.768937531935448, 52.0333793839746 ],
          "distance" : "20.0km"
        }
      }
    }
  },
  "sort" : [ {
    "_geo_distance" : {
      "geoLocation" : [ {
        "lat" : 52.0333793839746,
        "lon" : -0.768937531935448
      } ]
    }
  } ]
}
asc is the default sort order, so you can remove order SortOrder.ASC; I just wanted to be explicit in this example.
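As for getting the distance back: when hits are sorted by _geo_distance, Elasticsearch returns the computed distance (in the requested unit, here km) as the sort value on each hit, so it can be read from the search response rather than recomputed on the client side.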

Scala Salat BasicDBObject cannot be cast to

I've spent so much time and still don't understand what the problem is here.
So I have a collection whose data looks like:
{ "_id" : "someId", "employment" : { "data" : [
{ "retrieved" : { "$date" : "2015-03-12T14:39:41.214Z"} , "value" : { "city" : "someSity" , "fromMonth" : 0 , "name" : "someName" , "fromYear" : 2011 , "toMonth" : 0 , "speciality" : "someSpeciality"}},
{ "retrieved" : { "$date" : "2015-03-12T14:39:41.214Z"} , "value" : { "city" : "someSity" , "fromMonth" : 7 , "name" : "someName" , "fromYear" : 2013 , "toMonth" : 7 , "toYear" : 2014 , "speciality" : "someSpeciality"}},
{ "retrieved" : { "$date" : "2015-03-12T14:39:41.214Z"} , "value" : { "city" : "someSity" , "fromMonth" : 10 , "name" : "someName" , "fromYear" : 2010 , "toMonth" : 10 , "toYear" : 2010 , "speciality" : "someSpeciality"}}
{ "retrieved" : { "$date" : "2015-03-12T14:39:41.214Z"} , "value" : { "fromMonth" : 2 , "name" : "someName" , "fromYear" : 2007 , "toMonth" : 2 , "toYear" : 2010 , "speciality" : "someSpeciality"}}
]}}
I also have a SalatDAO for that collection:
object ProfileDAO extends SalatDAO[Profile, ObjectId](
  collection = MongoFactory.getDB("profiles"))
and of course a bunch of case classes:
case class Profile(
  @Key("_id") id: String,
  employment: Option[ListField[Employment]])

case class ListField[T](
  data: List[Value[T]])

case class Value[T](
  value: Option[T],
  retrieved: Option[Instant],
  deleted: Option[Instant])
and finally the Employment class:
case class Employment(
  name: Option[String],
  country: Option[String],
  city: Option[String],
  fromMonth: Option[Int],
  toMonth: Option[Int],
  fromYear: Option[Int],
  toYear: Option[Int],
  speciality: Option[String]
)
But when I try to do something like this:
ProfileDAO.findAll().take(20).map(
  profile => profile.employment.map(
    employment => employment.data.map(
      employmentData => employmentData.value.name)))
  .foreach(println)
I get Exception: com.mongodb.BasicDBObject cannot be cast to com....Employment
The only idea I have is that some data in the DBCollection doesn't match the Employment class, but there is Option[] everywhere, so...
Go to where it's blowing up and print the _id value of the document that's failing to deserialize?

Casbah cast from BasicDBObject to my type

I have a collection in the database that looks like below:
Question
{
  "_id" : ObjectId("52b3248a43fa7cd2bc4a2d6f"),
  "id" : 1001,
  "text" : "Which is a valid java access modifier?",
  "questype" : "RADIO_BUTTON",
  "issourcecode" : true,
  "sourcecodename" : "sampleques",
  "examId" : 1000,
  "answers" : [
    {
      "id" : 1,
      "text" : "private",
      "isCorrectAnswer" : true
    },
    {
      "id" : 2,
      "text" : "personal",
      "isCorrectAnswer" : false
    },
    {
      "id" : 3,
      "text" : "protect",
      "isCorrectAnswer" : false
    },
    {
      "id" : 4,
      "text" : "publicize",
      "isCorrectAnswer" : false
    }
  ]
}
I have a case class that represents both the Question and the Answer. The Question case class has a List of Answer objects. I tried converting the result of the find operation from DBObject to my Answer type:
def toList[T](dbObj: DBObject, key: String): List[T] =
  (List[T]() ++ dbObj(key).asInstanceOf[BasicDBList]) map { _.asInstanceOf[T] }
The result of the above operation when I call it like
toList[Answer](dbObj, "answers") map {y => Answer(y.id,y.text, y.isCorrectAnswer)}
fails with the following exception:
com.mongodb.BasicDBObject cannot be cast to domain.content.Answer
Why should it fail? Is there a way to convert the DBObject to Answer type that I want?
You have to retrieve values from BasicDBObject, cast them and then populate the Answer class:
Answer class:
case class Answer(id:Int,text:String,isCorrectAnswer:Boolean)
toList, which I changed to return List[Answer]:
def toList(dbObj: DBObject, key: String): List[Answer] =
  dbObj.get(key).asInstanceOf[BasicDBList].map { o =>
    Answer(
      o.asInstanceOf[BasicDBObject].as[Int]("id"),
      o.asInstanceOf[BasicDBObject].as[String]("text"),
      o.asInstanceOf[BasicDBObject].as[Boolean]("isCorrectAnswer")
    )
  }.toList
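A hypothetical call site, assuming dbObj is the Question document shown above fetched from the collection:

// Hypothetical usage: dbObj is the Question document from the question.
val answers: List[Answer] = toList(dbObj, "answers")
// answers.map(_.text) gives List("private", "personal", "protect", "publicize")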