How can I extract JSON values using for-comprehensions - scala

I want to extract JSON values usgin for-comprehensions
my code is this:
import net.liftweb.json._
val json = parse("""
{
"took": 212,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0.625,
"hits": [
{
"_index": "siteindex",
"_type": "posts",
"_id": "1",
"_score": 0.625,
"_source": {
"title": "title 1",
"content": "content 1"
},
"highlight": {
"title": [
"<b>title</b> 1"
]
}
},
{
"_index": "siteindex",
"_type": "posts",
"_id": "4",
"_score": 0.19178301,
"_source": {
"title": "title 4",
"content": "content 4"
},
"highlight": {
"title": [
"<b>title</b> 4"
]
}
},
{
"_index": "siteindex",
"_type": "posts",
"_id": "2",
"_score": 0.19178301,
"_source": {
"title": "title 2",
"content": "content 2"
},
"highlight": {
"title": [
"<b>title</b> 2"
]
}
},
{
"_index": "siteindex",
"_type": "posts",
"_id": "3",
"_score": 0.19178301,
"_source": {
"title": "title 3",
"content": "content 3"
},
"highlight": {
"title": [
"<b>title</b> 3"
]
}
}
]
}
}
""")
my "case class" is this:
case class Document(title:String, content:String)
my "for" is this:
val ret: List[Document] = for {
JObject(child) <- json
JField("title", JString(title)) <- child
JField("content", JString(content)) <- child
} yield (Document( title, content ))
and my "list" is this:
ret: List[Document] = List(Document(title 1,content 1), Document(title 4,content 4), Document(title 2,content 2), Document(title 3,content 3))
until here everything is fine!
but now i need something like this:
List(Document2(1,<b>title</b> 1,content 1), Document2(4,<b>title</b> 4,content 4), Document2(2,<b>title</b> 2,content 2), Document2(3,<b>title</b> 3,content 3))
i need the value of:
"highlight": {
"title": [
"<b>*</b> *"
]
}
and this:
"_id": "*",
in my list.
my "case class" is this:
case class Document2(_id:String, title:String, content:String)
i try this, but it does not work
val ret: List[Document2] = for {
JObject(child) <- json
JField("_id", JString(_id)) <- child
JField("title", JString(title)) <- child
JField("content", JString(content)) <- child
} yield (Document2( _id, title, content ))
i don't know, if there is a better way of data extraction for this json
but the result is this:
<console>:23: warning: `withFilter' method does not yet exist on net.liftweb.json.JValue, using `filter' method instead
JObject(child) <- json
ret: List[Document2] = List()
any suggestion please
thanks for your help

Here is an answer similar to the approach from #flav, but I'll give you the structure to map to the json and how to get your end result too. First, the case classes:
case class Document2(_id:String, title:String, content:String)
case class Results(hits:HitsList)
case class HitsList(hits:List[Hit])
case class Hit(_id:String, _source:Source, highlight:Highlight)
case class Source(title:String, content:String)
case class Highlight(title:List[String])
Then, the code for parsing it and converting it:
implicit val formats = DefaultFormats
val results = json.extract[Results]
val docs2 = results.hits.hits.map{ hit =>
Document2(hit._id, hit.highlight.title.head, hit._source.content)
}

Related

play framework json lookup inside array

I have simple json:
{
"name": "John",
"placesVisited": [
{
"name": "Paris",
"data": {
"weather": "warm",
"date": "31/01/22"
}
},
{
"name": "New York",
"data": [
{
"weather": "warm",
"date": "31/01/21"
},
{
"weather": "cold",
"date": "28/01/21"
}
]
}
]
}
as you can see in this json there is placesVisited field, and if name is "New York" the "data" field is a List, and if the name is "Paris" its an object.
what I want to do is to pull the placesVisited object where "name": "New York" and then I will parse it to a case class I have, I can't use this case class for both objects in placesVisited cause they have diff types for the same name.
so what I thought is to do something like:
(myJson \ "placesVisited") and here I need to add something that will give me element where name is "New York", how can I do that?
my result should be this:
{
"name": "New York",
"data": [
{
"weather": "warm",
"date": "31/01/21"
},
{
"weather": "cold",
"date": "28/01/21"
}
]
}
something like this maybe can happen but its horrible haha:
(Json.parse(myjson) \ "placesVisited").as[List[JsObject]].find(item => {
item.value.get("name").toString.contains("New York")
}).getOrElse(throw Exception("could not find New York element")).as[NewYorkModel]
item.value.get("name").toString can slightly be simplified to (item \ "name").as[String] but otherwise there's not much to improve.
Another option is to use a case class Place(name: String, data: JsValue) and do it like this:
(Json.parse(myjson) \ "placesVisited")
.as[List[Place]]
.find(_.name == "New York")

Create JSON with class objects

I have almost ready what I want to do, however the method that converts to a JSON object does not help me to solve what is missing. I want to get the same thing, but there will be more content inside "add" and inside "firsts" and so I need them to be arrays of objects.
My code:
case class FirstIdentity(docType: String, docNumber: String, pId: String)
case class SecondIdentity(firm: String, code: String, orgType: String,
orgNumber: String, typee: String, perms: Seq[String])
case class General(id: Int, pName: String, description: String, add: Seq[SecondIdentity],
delete: Seq[String], act: String, firsts: Seq[FirstIdentity])
val someDF = Seq(
("0010XR_TYPE_6","0010XR", "222222", "6", "TYPE", "77444478", "6", 123, 1, "PF 1", "name", "description",
Seq("PERM1", "PERM2"))
).toDF("firm", "code", "org_number", "org_type", "type", "doc_number",
"doc_type", "id", "p_id", "p_name", "name", "description", "perms")
someDF.createOrReplaceTempView("vw_test")
val filter = spark.sql("""
select
firm, code, org_number, org_type, type, doc_number,
doc_type, id, p_id, p_name, name, description, perms
from vw_test
""")
val group =
filter.rdd.map(x => {
(
x.getInt(x.fieldIndex("id")),
x.getString(x.fieldIndex("p_name")),
x.getString(x.fieldIndex("description")),
SecondIdentity(
x.getString(x.fieldIndex("firm")),
x.getString(x.fieldIndex("code")),
x.getString(x.fieldIndex("org_type")),
x.getString(x.fieldIndex("org_number")),
x.getString(x.fieldIndex("type")),
x.getSeq(x.fieldIndex("perms"))
),
"act",
FirstIdentity(
x.getString(x.fieldIndex("doc_number")),
x.getString(x.fieldIndex("doc_type")),
x.getInt(x.fieldIndex("p_id")).toString
)
)
})
.toDF("id", "name", "desc", "add", "actKey", "firsts")
.groupBy("id", "name", "desc", "add", "actKey", "firsts")
.agg(collect_list("add").as("null"))
.drop("null")
group.toJSON.show(false)
result:
{
"id": 123,
"name": "PF 1",
"desc": "description",
"add": {
"firm": "0010XR_TYPE_6",
"code": "0010XR",
"orgType": "6",
"orgNumber": "222222",
"typee": "TYPE",
"perms": [
"PERM1",
"PERM2"
]
},
"actKey": "act",
"firsts": {
"docType": "77444478",
"docNumber": "6",
"pId": "1"
}
}
I want to have an array of "add" and also of "firsts"
this:
EDIT
{
"id": 123,
"name": "PF 1",
"desc": "description",
"add": [ <----
{
"firm": "0010XR_TYPE_6",
"code": "0010XR",
"orgType": "6",
"orgNumber": "222222",
"typee": "TYPE",
"perms": [
"PERM1",
"PERM2"
]
},
{
"firm": "0010XR_TYPE_6",
"code": "0010XR",
"orgType": "5",
"orgNumber": "11111",
"typee": "TYPE2",
"perms": [
"PERM1",
"PERM2"
]
}
],
"actKey": "act",
"firsts": [ <----
{
"docType": "77444478",
"docNumber": "6",
"pId": "1"
},
{
"docType": "411133",
"docNumber": "6",
"pId": "2"
}
]
}
As per your comment, you want to aggregate add depending on some grouping. Please check what all columns you want to group by. The columns which you want to Agrregate cannot be part of grouping. That will never work, and will give you always separate records.
This will work as per your expectations (I suppose):
val group =
filter.rdd.map(x => {
(
x.getInt(x.fieldIndex("id")),
x.getString(x.fieldIndex("p_name")),
x.getString(x.fieldIndex("description")),
SecondIdentity(
x.getString(x.fieldIndex("firm")),
x.getString(x.fieldIndex("code")),
x.getString(x.fieldIndex("org_type")),
x.getString(x.fieldIndex("org_number")),
x.getString(x.fieldIndex("type")),
x.getSeq(x.fieldIndex("perms"))
),
"act",
FirstIdentity(
x.getString(x.fieldIndex("doc_number")),
x.getString(x.fieldIndex("doc_type")),
x.getInt(x.fieldIndex("p_id")).toString
)
)
})
.toDF("id", "name", "desc", "add", "actKey", "firsts")
.groupBy("id", "name", "desc", "actKey")
.agg(collect_list("add").as("null"))
.drop("null")
Result:
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"id":123,"name":"PF 1","desc":"description","actKey":"act","collect_list(add)":[{"firm":"0010XR_TYPE_6","code":"0010XR","orgType":"6","orgNumber":"222222","typee":"TYPE","perms":["PERM1","PERM2"]},{"firm":"0010XR_TYPE_5","code":"0010XR","orgType":"5","orgNumber":"222223","typee":"TYPE","perms":["PERM1","PERM2"]}]}|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Inside your map function, you are not mapping the FirstEntity and SecondEntity as Seq hence the add is not getting converted to array.
Change your map function to this:
filter.rdd.map(x => {
(
x.getInt(x.fieldIndex("id")),
x.getString(x.fieldIndex("p_name")),
x.getString(x.fieldIndex("description")),
Seq(SecondIdentity(
x.getString(x.fieldIndex("firm")),
x.getString(x.fieldIndex("code")),
x.getString(x.fieldIndex("org_type")),
x.getString(x.fieldIndex("org_number")),
x.getString(x.fieldIndex("type")),
x.getSeq(x.fieldIndex("perms"))
)),
"act",
Seq(FirstIdentity(
x.getString(x.fieldIndex("doc_number")),
x.getString(x.fieldIndex("doc_type")),
x.getInt(x.fieldIndex("p_id")).toString
))
)
})
Will result into this:
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"id":123,"name":"PF 1","desc":"description","add":[{"firm":"0010XR_TYPE_6","code":"0010XR","orgType":"6","orgNumber":"222222","typee":"TYPE","perms":["PERM1","PERM2"]}],"actKey":"act","firsts":[{"docType":"77444478","docNumber":"6","pId":"1"}]}|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Cannot use Nested VariableOperators.mapItemsOf in Spring Data MongoDb

I'm forced to use the aggregation framework and the project operation of Spring Data MongoDb.
What I'd like to do is creating an array of object as a result of a project operation.
Considering this intermediate aggregation result:
{
"processes": [
{
"id": "101a",
"assignees": [
{
"id": "201a",
"username": "carl93"
},
{
"id": "202a",
"username": "susan"
}
]
},
{
"id": "101b",
"assignees": [
{
"id": "201a",
"username": "carl93"
},
{
"id": "202a",
"username": "susan"
}
]
}
]
}
I'm trying to get for each process, all the assignee usernames and ids. Hence, what I want to obtain is something like this:
[
{
"results": [
{
"id": "201a",
"value": "carl93",
"parentObjectId": "101a"
},
{
"id": "202a",
"value": "susan",
"parentObjectId": "101a"
},
{
"id": "201a",
"value": "carl93",
"parentObjectId": "101b"
},
{
"id": "202a",
"value": "susan",
"parentObjectId": "101b"
}
]
}
]
To reach this goal I'm using 2 nested VariableOperators.mapItemsOf obtaining:
org.springframework.data.mapping.MappingException: Cannot convert [Document{{id= 201a, value= carl93, parentObjectId= 101a}}, Document{{id= 202a, value = susan, parentObjectId= 101a}}]
of type class java.util.ArrayList into an instance of class java.lang.Object!
Implement a custom Converter<class java.util.ArrayList, class java.lang.Object> and register it with the CustomConversions.
Here's the code that I'm currently using:
new ProjectionOperation().and(
VariableOperators.mapItemsOf("processes")
.as("pr")
.andApply(
VariableOperators.mapItemsOf("$pr.ownership.assignees")
.as("ass")
.andApply(aggregationOperationContext -> {
Document document = new Document();
document.append("id", "$$ass.id");
document.append("value", "$$ass.username");
document.append("parentObjectId", "$$pr.id");
return document;
})
)
).as("results");
The code produces this:
[
[
{
"id": "201a",
"value": "carl93",
"parentObjectId": "101a"
},
{
"id": "202a",
"value": "susan",
"parentObjectId": "101a"
}
],
[
{
"id": "201a",
"value": "carl93",
"parentObjectId": "101b"
},
{
"id": "202a",
"value": "susan",
"parentObjectId": "101b"
}
]
]
As you can see there are 2 nested arrays, [[],[]]. This is the reason why the exception is thrown.
Nevertheless what I want to obtain is just one array, adding all the objects in it (possibly without duplicates or null values). I've tried the addToSet operator and other aggregtion operators, without any success.
Use $reduce with $concatArrays to join the arrays.
new ProjectionOperation().and(
ArrayOperators.arrayOf("processes")
.reduce(ArrayOperators.ConcatArrays.arrayOf("$$value").concat(
VariableOperators.mapItemsOf("$$this.ownership.assignees")
.as("ass")
.andApply(aggregationOperationContext -> {
Document document = new Document();
document.append("id", "$$ass.id");
document.append("value", "$$ass.username");
document.append("parentObjectId", "$$this.id");
return document;
})
)).startingWith(Arrays.asList())
).as("results");

Scala Map -add new key and copy value from another key

Considering 2 sets of data as follows:
JSON1=> {
"data": [
{"id": "1-abc",
"model": "Agile",
"status":"open"
"configuration": {
"state": "running",
"rootVolumeSize": "0.00000",
"count": "2",
"type": "large",
"platform": "Linux"
}
"stateId":"123-567"
}
]}
JSON2=>{
"data": [
{"id": "1-abc",
"model": "Agile",
"configuration": {
"state": "running",
"diskSize": "0",
"type": "small",
"platform":"Windows"
}
}
]}
I need to compare JSON1 and JSON2 based on the 1st field id and if they match , I need to merge JSON1 with JSON 2 retaining the existing values in JSON2( only append fields not present).
I have coded the same as below:
private def merger(JSON1: Seq[JSON], JSON2: Seq[JSON]):Seq[JSON] = {
val abcKey = JSON1.groupBy(_.id) map { case (k, v) => (k, v.head)
val mergedRecords = for {
xyzJSON<- JSON2
} yield (
abcKey.get(xyzJSON.id) match {
case Some(JSON1) => xyzJSON.copy(status = JSON1.status,
stateId = JSON1.stateId)
case None => xyzJSON.copy(origin = "N/A")
}
)
I am not able to derive at a solution for reconciling the fields within the configurationMap.
Expected result set should be like:
{
"data": [
{"id": "1-abc",
"model": "Agile",
"status":"open"
"configuration": {
"state": "running",
"diskSize": "0",
"rootVolumeSize": "0.00000",
"count": "2",
"type": "small",
"platform": "Windows",
}
"stateId":"123-567"
}
]}

Fields are empty when doing GET in elastic4s

I'm trying to implement a service in my play2 app that uses elastic4s to get a document by Id.
My document in elasticsearch:
curl -XGET 'http://localhost:9200/test/venues/3659653'
{
"_index": "test",
"_type": "venues",
"_id": "3659653",
"_version": 1,
"found": true,
"_source": {
"id": 3659653,
"name": "Salong Anna och Jag",
"description": "",
"telephoneNumber": "0811111",
"postalCode": "16440",
"streetAddress": "Kistagången 12",
"city": "Kista",
"lastReview": null,
"location": {
"lat": 59.4045675,
"lon": 17.9502138
},
"pictures": [],
"employees": [],
"reviews": [],
"strongTags": [
"skönhet ",
"skönhet ",
"skönhetssalong"
],
"weakTags": [
"Frisörsalong",
"Frisörer"
],
"reviewCount": 0,
"averageGrade": 0,
"roundedGrade": 0,
"recoScore": 0
}
}
My Service:
#Singleton
class VenueSearchService extends ElasticSearchService[IndexableVenue] {
/**
* Elastic search conf
*/
override def path = "test/venues"
def getVenue(companyId: String) = {
val resp = client.execute(
get id companyId from path
).map { response =>
// transform response to IndexableVenue
response
}
resp
}
If I use getFields() on the response object I get an empty object. But if I call response.getSourceAsString I get the document as json:
{
"id": 3659653,
"name": "Salong Anna och Jag ",
"description": "",
"telephoneNumber": "0811111",
"postalCode": "16440",
"streetAddress": "Kistagången 12",
"city": "Kista",
"lastReview": null,
"location": {
"lat": 59.4045675,
"lon": 17.9502138
},
"pictures": [],
"employees": [],
"reviews": [],
"strongTags": [
"skönhet ",
"skönhet ",
"skönhetssalong"
],
"weakTags": [
"Frisörsalong",
"Frisörer"
],
"reviewCount": 0,
"averageGrade": 0,
"roundedGrade": 0,
"recoScore": 0
}
As you can se the get request omits info:
"_index": "test",
"_type": "venues",
"_id": "3659653",
"_version": 1,
"found": true,
"_source": {}
If I try to do a regular search:
def getVenue(companyId: String) = {
val resp = client.execute(
search in "test"->"venues" query s"id:${companyId}"
//get id companyId from path
).map { response =>
Logger.info("response: "+response.toString)
}
resp
}
I get:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test",
"_type": "venues",
"_id": "3659653",
"_score": 1,
"_source": {
"id": 3659653,
"name": "Salong Anna och Jag ",
"description": "",
"telephoneNumber": "0811111",
"postalCode": "16440",
"streetAddress": "Kistagången 12",
"city": "Kista",
"lastReview": null,
"location": {
"lat": 59.4045675,
"lon": 17.9502138
},
"pictures": [],
"employees": [],
"reviews": [],
"strongTags": [
"skönhet ",
"skönhet ",
"skönhetssalong"
],
"weakTags": [
"Frisörsalong",
"Frisörer"
],
"reviewCount": 0,
"averageGrade": 0,
"roundedGrade": 0,
"recoScore": 0
}
}
]
}
}
My Index Service:
trait ElasticIndexService [T <: ElasticDocument] {
val clientProvider: ElasticClientProvider
def path: String
def indexInto[T](document: T, id: String)(implicit writes: Writes[T]) : Future[IndexResponse] = {
Logger.debug(s"indexing into $path document: $document")
clientProvider.getClient.execute {
index into path doc JsonSource(document) id id
}
}
}
case class JsonSource[T](document: T)(implicit writes: Writes[T]) extends DocumentSource {
def json: String = {
val js = Json.toJson(document)
Json.stringify(js)
}
}
and indexing:
#Singleton
class VenueIndexService #Inject()(
stuff...) extends ElasticIndexService[IndexableVenue] {
def indexVenue(indexableVenue: IndexableVenue) = {
indexInto(indexableVenue, s"${indexableVenue.id.get}")
}
Why is getFields empty when doing get?
Why is query info left out when doing getSourceAsString in a get request?
Thank you!
What you're hitting in question 1 is that you're not specifying which fields to return. By default ES will return the source and not fields (other than type and _id). See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html
I've added a test to elastic4s to show how to retrieve fields, see:
https://github.com/sksamuel/elastic4s/blob/master/src%2Ftest%2Fscala%2Fcom%2Fsksamuel%2Felastic4s%2FSearchTest.scala
I am not sure on question 2.
The fields are empty because elasticsearch don't return it.
If you need fields, you must indicate in query what field you need:
this is you search query without field:
search in "test"->"venues" query s"id:${companyId}"
and in this query we indicate which field we want to, in this case 'name' and 'description':
search in "test"->"venues" fields ("name","description") query s"id:${companyId}"
now you can retrieve the fields:
for(x <- response.getHits.hits())
{
println(x.getFields.get("name").getValue)
You found a getSourceAsString in a get request because the parameter _source is to default 'on' and fields is to default 'off'.
I hope this will help you

Categories