Elasticsearch nested filtering (elastic4s, scala) - scala

I am using Elasticsearch 1.7 and elastic4s DSL.
My problem is I can't add and & or filter on a nested document.
For example, here is a JSON representation of my instance of case class Candidate:
{
"name": "Samy"
"interviews": [
{
"clientId": 0,
"stateId": "CANCELED",
},
{
"clientId": 1,
"stateId": "APPROVED"
}
]
Here is my filter :
def filtering(interviewAndCandidates: IntCand)(implicit user: PublicUser): Seq[FilterDefinition] = {
nestedFilter("interviews").filter(termFilter("clientId", user.id)) ::
List(or(interviewAndCandidates.interviews.map(state ⇒ nestedFilter("interviews").filter(termFilter("stateId", state)))))
}
Then I build the query:
var request: SearchDefinition = search in "myIndex" -> "candidate" query {
filteredQuery query {
matchAllQuery
} filter {
and(filters)
}
}
With:
case class IntCand(interviews: List[String])
case class Candidate(name: String, interviews: List[Interview])
case class Interview(clientId: Long, stateId: String)
The problem is when I filter on IntCand(List("CANCELED")) and clientId=1, the response show me the Candidate (I want filter on clientId AND interviews)

I succeed by denormalizing the data

Related

use JsLookup for multiple level array using play Json for lists with objects and simple lists

i have a function that exctract from a json list of elements based on the path i give it.
the func looks like this:
def findAllValuesAtPath(jsValue: JsObject, path: String): List[JsValue] = {
val jsPath = JsPath(path
.split("\\[\\*]\\.")
.flatMap(s => s.split("\\.")
.map(RecursiveSearch)
).toList)
jsPath(jsValue)
}
for example:
{
"person": {
"kids": [
{
"name": "josh",
"age": 5
},
{
"name": "julia",
"age": 13
}
]
}
}
now if i give the path - "person.kids[*].name" I will get list of their names List("josh", "julia") which its what i want.
but if the list of kids were just simple list like:
{
"person": {
"kids": [
"josh",
"julia"
]
}
}
and i will give the path - "person.kids[*]" I will get empty list List() and I want to get this as List("josh", "julia") (without the brakets)
do you see any way to improve my func to handle both cases?

Decode a nested array with circe-optics

I have JSON like this:
"data": {
"project": {
"activityChildren": [
{
"id": 2,
"parents": [
{
"id": 1
}
]
},
]
}
}
I'd like to decode this to List[(Long, List[Long])] with circe-optics. I got as far as:
val activityParents: Map[Long, List[Long]] = root.data.activityChildren.each.json
.getAll(json)
.flatMap { activity =>
root.id.long.getOption(activity).map(_ -> root.parents.each.long.getAll(activity))
}
.toMap
I wonder whether it's possible to define a single lens for this that just turns the JSON into the desired map without explicitly mapping over the intermediate array. If so, how?

Spark Scala - How to construct Scala Map from nested JSON?

I've a nested json data with nested fields that I want to extract and construct a Scala Map.
Heres the sample JSON:
"nested_field": [
{
"airport": "sfo",
"score": 1.0
},
{
"airport": "phx",
"score": 1.0
},
{
"airport": "sjc",
"score": 1.0
}
]
I want to use saveToES() and construct a Scala Map to index the field into ES index with mapping as below:
"nested_field": {
"properties": {
"score": {
"type": "double"
},
"airport": {
"type": "keyword",
"ignore_above": 1024
}
}
}
The json file is read into the dataframe using spark.read.json("example.json"). Whats the right way to construct the Scala Map in this case?
Thanks for any help!
You can do it using the below sample code
import org.json4s.DefaultFormats
import org.json4s.jackson.JsonMethods.parse
case class AirPortScores(airport: String, score: Double)
case class JsonRulesHandler(airports: List[AirPortScores])
val jsonString: String = """{"airports":[{"airport":"sfo","score":1},{"airport":"phx","score":1},{"airport":"sjc","score":1}]}"""
def loadJsonString(JsonString: String): JsonRulesHandler = {
implicit val formats: DefaultFormats.type = org.json4s.DefaultFormats
parse(JsonString).extract[JsonRulesHandler]
}
val parsedJson: JsonRulesHandler = loadJsonString(jsonString)
parsedJson.airports.foreach(println)//you can select parsedJson.airport or scores
//below ouput
AirPortScores(sfo,1.0)
AirPortScores(phx,1.0)
AirPortScores(sjc,1.0)

RemoteTransportException, Fielddata is disabled on text fields when doing aggregation on text field

I am migrating from 2.x to 5.x
I am adding values to the index like this
indexInto (indexName / indexType) id someKey source foo
however I would also want to fetch all values by field:
def getValues(tag: String) ={
client execute {
search(indexName / indexType) query ("_field_names", tag) aggregations (termsAggregation( "agg") field tag size 1)
}
But I am getting this exception :
RemoteTransportException[[8vWOLB2][172.17.0.5:9300][indices:data/read/search[phase/query]]];
nested: IllegalArgumentException[Fielddata is disabled on text fields
by default. Set fielddata=true on [my_tag] in order to load fielddata
in memory by uninverting the inverted index. Note that this can
however use significant memory.];
I am thought maybe to use keyword as shown here , but the fields are not known in advanced (sent by the user) so I cannot use perpend mappings
By default all the unknown fields will be indexed/added to elasticsearch as text fields which are not specified in the mappings.
If you will take a look at mappings of such a field, you can see there a field is enabled with for such fields with type 'keyword' and these fields are indexed but not analyzed.
GET new_index2/_mappings
{
"new_index2": {
"mappings": {
"type": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
so you can use the fields values for the text fields for aggregations like the following
POST new_index2/_search
{
"aggs": {
"NAME": {
"terms": {
"field": "name.fields",
"size": 10
}
}
}
}
Check name.fields
So your scala query can work if you can shift to fields value.
def getValues(tag: String) = {
client.execute {
search(indexName / indexType)
.query("_field_name", tag)
.aggregations {
termsAgg("agg", "field_name.fields")
}.size(1)
}
}
Hope this helps.
Thanks

[json4s]:Extracting Array of different objects

I am using the facebook graph API and the responses look similar to this:
{
"data": [
{
"id": "311620272349920_311718615673419",
"from": {
"id": "1456046457993048",
"name": "Richard Ettinson"
},
"to": {
"data": [
{
"id": "311620272349920",
"name": "Barbara Fallerman"
}
]
},
"with_tags": {
"data": [
{
"id": "311620272349920",
"name": "Barbara Fallerman"
}
]
},
"message": "I was gong out with her",
"actions": [
{
"name": "Comment",
"link": "https://www.facebook.com/311620272349920/posts/311718615673419"
},
{
"name": "Like",
"link": "https://www.facebook.com/311620272349920/posts/311718615673419"
}
]
}
I managed to for example extract the from field through
val extracted = (json \ "data" \"from").extract[PostFrom]
But I worry that if I use this technique I will need to pass over the Json multiple times to extract all the values I need which could lead to a bad performance.
How exactly could I extract these fields into case classes from the array of non similar objects?
I tried with the following case classes:
abstract class BaseResponse()
case class Data(list:List[Post])
case class Post(id: String, post: PostFrom) extends BaseResponse
case class PostFrom(id: String, name:String)
Which always lead to an empty List, is there a way to get back a Data class which has a list of certain classes which I am interested in? (As example the top level id, from and with_tags)
A possibility I found was to use more case classes instead of inheritance:
case class Root[T](data:Option[T])
case class Post(id: String, from: From, message: String)
case class From(id: String, name:String)
Basically there has to be a root object which takes some kind of graphs response object, additionally it is optional so that it won't throw an exception if there was a problem with the parsing of the response.
I then used it in the following way:
val body = r.entity.asString
val json = parse(r.entity.asString)
val root = json.extract[Root[Post]]
root.data match {
case Some(post) =>
val tagger = Tagger(post.from.id, post.from.name, post.id, post.message)
log.info(s"We received $tagger")
originalSender ! RetrievedTagger(tagger)
case None => originalSender ! NoTaggerFound
}