Elasticsearch: Class Cast Exception Scala API

I have been using ES 5.6, where our aggregation queries worked
fine. We recently upgraded to ES 7.1, and one of the queries now
throws a ClassCastException. Below are the ES index mapping, the
Scala code, and the ES query that triggers the exception.
Mapping:
{
"orgs": {
"mappings": {
"org": {
"properties": {
"people": {
"type": "nested",
"properties": {
"email": {
"type": "keyword"
},
"first_name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"last_name": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"pcsi": {
"type": "keyword"
},
"position": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"position_type": {
"type": "keyword"
},
"source_guid": {
"type": "keyword"
},
"source_lni": {
"type": "keyword"
},
"suffix": {
"type": "keyword"
}
}
}
}
}
}
}
}
Scala Query:
baseQuery.aggs(nestedAggregation("people", OrganizationSchema.People)
.subAggregations(termsAgg("positiontype", "people.position_type")))
Elastic Query:
{"query":{"term":{"_id":{"value":"id"}}},"aggs":{"people":{"nested":{"path":"people"},"aggs":{"positiontype":{"terms":{"field":"people.position_type"}}}}}}
response:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 6,
"successful": 6,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
},
"aggregations": {
"people": {
"doc_count": 52,
"positiontype": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Board Member",
"doc_count": 28
},
{
"key": "Executive",
"doc_count": 22
},
{
"key": "Others",
"doc_count": 2
}
]
}
}
}
}
Scala code:
def getOrganizationPeopleFilters(
    client: ElasticClient,
    entityType: String,
    entityId: String,
    request: Option[PostFilterApiRequest],
    baseQuery: SearchRequest
): IO[PostFilters] = {
  // Nested aggregation on "people" with a terms sub-aggregation on position_type
  val q = baseQuery.aggs(
    nestedAggregation("people", OrganizationSchema.People)
      .subAggregations(termsAgg("positiontype", "people.position_type"))
  )
  client.execute(q).flatMap { res =>
    esToJsonOrganizationPeopleFilters(res.result)
  }
}
The ES query runs and aggregates correctly in Kibana. However, when we flatMap over the response in the Scala code above, it throws a ClassCastException (java.lang.ClassCastException: scala.collection.immutable.Map$Map2 cannot be cast to java.lang.Integer).
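One possible explanation, offered as an assumption rather than a confirmed diagnosis: from Elasticsearch 7.0 onward, hits.total in a search response is an object such as {"value": 1, "relation": "eq"} instead of a plain integer. A client library built for 5.x that casts that field to an integer would fail in the way described, since a two-entry map cannot be cast to Integer. The Python sketch below only illustrates the two response shapes; total_hits is a hypothetical helper, not part of any Elasticsearch client.

```python
from typing import Any, Dict


def total_hits(response: Dict[str, Any]) -> int:
    """Return the hit count from either the pre-7.0 or the 7.x response shape."""
    total = response["hits"]["total"]
    if isinstance(total, dict):
        # 7.x shape: {"value": <count>, "relation": "eq" | "gte"}
        return int(total["value"])
    # Pre-7.0 shape: a bare integer
    return int(total)


# Illustrative responses, trimmed to the relevant field.
es5_response = {"hits": {"total": 1, "max_score": 0, "hits": []}}
es7_response = {
    "hits": {"total": {"value": 1, "relation": "eq"}, "max_score": None, "hits": []}
}

print(total_hits(es5_response), total_hits(es7_response))
```

If this is indeed the cause, upgrading to a client release that targets ES 7.x is the likely fix. ES 7 also accepts the rest_total_hits_as_int=true request parameter, which restores the integer form.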

Related

Querying a map (<String, Object>) in JSON through MongoDB

How to query a map of type Map<String, List> in JSON form, in MongoDB?
Sample JSON:
{
"WIDTH": 810,
"HEIGHT": 465,
"MODULES": {
"23": {
"XNAME": "COMP1",
"PARAMS": {
"_Klockers": {
"TYPE": "text",
"VALUE": "Klocker#3"
},
"SUBSYS": {
"TYPE": "text",
"VALUE": "2"
},
"EP": {
"TYPE": "integer",
"VALUE": "2"
}
}
},
"24": {
"XNAME": "COMP2",
"PARAMS": {
"_Rockers": {
"TYPE": "text",
"VALUE": "Rocker#3"
},
"Driver": {
"TYPE": "binary",
"VALUE": 1
},
"EP": {
"TYPE": "long",
"VALUE": "233"
}
}
},
"25": {
"XNAME": "COMP3",
"PARAMS": {
"_Mockers": {
"TYPE": "text",
"VALUE": "Mocker#3"
},
"SYSMain": {
"TYPE": "text",
"VALUE": "2342"
},
"TLP": {
"TYPE": "double",
"VALUE": "2.3"
}
}
}
}
}
Basically, I want to:
1. List the "XNAME" field values for all keys in "MODULES".
Expected output: {"COMP1", "COMP2", "COMP3"}
2. List every "TYPE" in the "PARAMS" object within each key of "MODULES".
Expected output: {"text", "text", "integer", "text", "binary", "long", "text", "text", "double"}
I am new to MongoDB and any help or redirection is appreciated.

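You can use this aggregation pipeline. Because it collects values with $addToSet, the result contains distinct values only: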
db.collection.aggregate([
{
$project: {//You require this as your data is dynamic
"modules": {
"$objectToArray": "$MODULES"
}
}
},
{//Destruct the array
"$unwind": "$modules"
},
{
"$project": {//Again, requires the same as keys are dynamic
"types": {
"$objectToArray": "$modules.v.PARAMS"
},
xname: "$modules.v.XNAME"
}
},
{//Destruct the types
$unwind: "$types"
},
{//Get the distinct values
$group: {
"_id": null,
"xname": {
"$addToSet": "$xname"
},
"types": {
"$addToSet": "$types.v.TYPE"
},
}
}
])
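To make the stages easier to follow, here is a rough pure-Python emulation of what the pipeline computes. It does not talk to MongoDB; object_to_array and distinct_xnames_and_types are hypothetical helper names, and the sample document is trimmed to the fields the pipeline reads.

```python
from typing import Any, Dict, Iterable, List


def object_to_array(obj: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Mimic $objectToArray: {"a": 1} becomes [{"k": "a", "v": 1}]."""
    return [{"k": key, "v": value} for key, value in obj.items()]


def distinct_xnames_and_types(documents: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    xnames, types = set(), set()
    for document in documents:
        # First $project + $unwind: one entry per dynamic key under MODULES.
        for module in object_to_array(document.get("MODULES") or {}):
            params = module["v"].get("PARAMS") or {}
            # Second $project + $unwind: one entry per dynamic key under PARAMS.
            for param in object_to_array(params):
                # $group with $addToSet: keep distinct values only.
                xnames.add(module["v"]["XNAME"])
                types.add(param["v"]["TYPE"])
    return {"_id": None, "xname": xnames, "types": types}


sample = {
    "MODULES": {
        "23": {"XNAME": "COMP1", "PARAMS": {
            "_Klockers": {"TYPE": "text"},
            "SUBSYS": {"TYPE": "text"},
            "EP": {"TYPE": "integer"}}},
        "24": {"XNAME": "COMP2", "PARAMS": {
            "_Rockers": {"TYPE": "text"},
            "Driver": {"TYPE": "binary"},
            "EP": {"TYPE": "long"}}},
        "25": {"XNAME": "COMP3", "PARAMS": {
            "_Mockers": {"TYPE": "text"},
            "SYSMain": {"TYPE": "text"},
            "TLP": {"TYPE": "double"}}},
    }
}

result = distinct_xnames_and_types([sample])
print(sorted(result["xname"]))
print(sorted(result["types"]))
```

Note that the types come back de-duplicated (five values for the sample), which differs from the nine-element list in the question. Keeping duplicates would presumably mean using $push instead of $addToSet in the $group stage.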

Need JOLT spec file for transfer of complex JSON

I have a complex JSON object (I've simplified it for this example) that I cannot figure out the JOLT transform JSON for. Does anybody have any ideas of what the JOLT spec file should be?
Original JSON
[
{
"date": {
"isoDate": "2019-03-22"
},
"application": {
"name": "SiebelProject"
},
"applicationResults": [
{
"reference": {
"name": "Number of Code Lines"
},
"result": {
"value": 44501
}
},
{
"reference": {
"name": "Transferability"
},
"result": {
"grade": 3.1889542208002064
}
}
]
},
{
"date": {
"isoDate": "2019-03-21"
},
"application": {
"name": "SiebelProject"
},
"applicationResults": [
{
"reference": {
"name": "Number of Code Lines"
},
"result": {
"value": 45000
}
},
{
"reference": {
"name": "Transferability"
},
"result": {
"grade": 3.8
}
}
]
}
]
Desired JSON after transformation and sorting by "Name" ASC, "Date" DESC
[
{
"Name": "SiebelProject",
"Date": "2019-03-22",
"Number of Code Lines": 44501,
"Transferability" : 3.1889542208002064
},
{
"Name": "SiebelProject",
"Date": "2019-03-21",
"Number of Code Lines": 45000,
"Transferability" : 3.8
}
]

I couldn't find a way to do the sort (I'm not even sure you can sort descending in JOLT) but here's a spec to do the transform:
[
{
"operation": "shift",
"spec": {
"*": {
"date": {
"isoDate": "[#3].Date"
},
"application": {
"name": "[#3].Name"
},
"applicationResults": {
"*": {
"reference": {
"name": {
"Number of Code Lines": {
"#(3,result.value)": "[#7].Number of Code Lines"
},
"Transferability": {
"#(3,result.grade)": "[#7].Transferability"
}
}
}
}
}
}
}
}
]
After the transform, a separate tool (jq, I think) could do the sort.
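If post-processing outside JOLT is acceptable, the flattening and the sort are both short in a general-purpose language. The Python sketch below is an alternative to the spec above, not a translation of it: flatten and sort_rows are hypothetical names, and unlike the spec it copies every metric it finds rather than only the two named ones.

```python
from typing import Any, Dict, List


def flatten(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn each nested record into one flat row keyed by metric name."""
    rows = []
    for record in records:
        row = {
            "Name": record["application"]["name"],
            "Date": record["date"]["isoDate"],
        }
        for item in record["applicationResults"]:
            result = item["result"]
            # A result carries either "value" or "grade" in the sample data.
            row[item["reference"]["name"]] = (
                result["value"] if "value" in result else result["grade"]
            )
        rows.append(row)
    return rows


def sort_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort by Name ascending, then Date descending (ISO dates sort as text)."""
    by_date_desc = sorted(rows, key=lambda row: row["Date"], reverse=True)
    # sorted() is stable, so the Date order survives within each Name.
    return sorted(by_date_desc, key=lambda row: row["Name"])


source = [
    {"date": {"isoDate": "2019-03-22"}, "application": {"name": "SiebelProject"},
     "applicationResults": [
         {"reference": {"name": "Number of Code Lines"}, "result": {"value": 44501}},
         {"reference": {"name": "Transferability"}, "result": {"grade": 3.1889542208002064}}]},
    {"date": {"isoDate": "2019-03-21"}, "application": {"name": "SiebelProject"},
     "applicationResults": [
         {"reference": {"name": "Number of Code Lines"}, "result": {"value": 45000}},
         {"reference": {"name": "Transferability"}, "result": {"grade": 3.8}}]},
]

rows = sort_rows(flatten(source))
for row in rows:
    print(row)
```

The two-pass sort relies on Python's sort being stable: sorting by the secondary key first and the primary key second gives the combined ordering without a custom comparator.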

How can I use CloudKit web services to query based on a reference field?

I've got two CloudKit data objects that look somewhat like this:
Parent Object:
{
"records": [
{
"recordName": "14102C0A-60F2-4457-AC1C-601BC628BF47-184-000000012D225C57",
"recordType": "ParentObject",
"fields": {
"fsYear": {
"value": "2015",
"type": "STRING"
},
"displayOrder": {
"value": 2015221153856287200,
"type": "INT64"
},
"fjpFSGuidForReference": {
"value": "14102C0A-60F2-4457-AC1C-601BC628BF47-184-000000012D225C57",
"type": "STRING"
},
"fsDateSearch": {
"value": "2015221153856287158",
"type": "STRING"
}
},
"recordChangeTag": "id4w7ivn",
"created": {
"timestamp": 1439149087571,
"userRecordName": "_0d26968032e31bbc72c213037b6cb35d",
"deviceID": "A19CD995FDA3093781096AF5D818033A241D65C1BFC3D32EC6C5D6B3B4A9AA6B"
},
"modified": {
"timestamp": 1439149087571,
"userRecordName": "_0d26968032e31bbc72c213037b6cb35d",
"deviceID": "A19CD995FDA3093781096AF5D818033A241D65C1BFC3D32EC6C5D6B3B4A9AA6B"
}
}
],
"total":
}
Child Object:
{
"records": [
{
"recordName": "2015221153856287168",
"recordType": "ChildObject",
"fields": {
"District": {
"value": "002",
"type": "STRING"
},
"ZipCode": {
"value": "12345",
"type": "STRING"
},
"InspecReference": {
"value": {
"recordName": "14102C0A-60F2-4457-AC1C-601BC628BF47-184-000000012D225C57",
"action": "NONE",
"zoneID": {
"zoneName": "_defaultZone"
}
},
"type": "REFERENCE"
}
},
"recordChangeTag": "id4w7lew",
"created": {
"timestamp": 1439149090856,
"userRecordName": "_0d26968032e31bbc72c213037b6cb35d",
"deviceID": "A19CD995FDA3093781096AF5D818033A241D65C1BFC3D32EC6C5D6B3B4A9AA6B"
},
"modified": {
"timestamp": 1439149090856,
"userRecordName": "_0d26968032e31bbc72c213037b6cb35d",
"deviceID": "A19CD995FDA3093781096AF5D818033A241D65C1BFC3D32EC6C5D6B3B4A9AA6B"
}
}
],
"total": 1
}
I'm trying to write a query to directly access the CloudKit web service and return the Child Object based on the reference of the parent object.
My test JSON looks something like this:
{"query":{"recordType":"ChildObject","filterBy":{"fieldName":"InspecReference","fieldValue":{ "value" : "14102C0A-60F2-4457-AC1C-601BC628BF47-184-000000012D225C57", "type" : "string" },"comparator":"EQUALS"}},"zoneID":{"zoneName":"_defaultZone"}}
However, I'm getting the following error from CloudKit:
{"uuid":"33db91f3-b768-4a68-9056-216ecc033e9e","serverErrorCode":"BAD_REQUEST","reason":"BadRequestException: Unexpected input"}
I'm guessing I have the Record Field Dictionary in the query wrong. However, the documentation isn't clear on what this should look like on a reference object.

You have to re-create the actual reference object in the fieldValue, rather than passing the record name as a string. In this particular case, the JSON looks like this:
{
"query": {
"recordType": "ChildObject",
"filterBy": {
"fieldName": "InspecReference",
"fieldValue": {
"value": {
"recordName": "14102C0A-60F2-4457-AC1C-601BC628BF47-184-000000012D225C57",
"action": "NONE"
},
"type": "REFERENCE"
},
"comparator": "EQUALS"
}
},
"zoneID": {
"zoneName": "_defaultZone"
}
}
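For building this request body programmatically, a small helper along the following lines may be useful. It is only a sketch: reference_query is a hypothetical name, it produces the same JSON shape as above, and it does not send the request or handle authentication.

```python
import json
from typing import Any, Dict


def reference_query(
    record_type: str,
    field_name: str,
    parent_record_name: str,
    zone_name: str = "_defaultZone",
) -> Dict[str, Any]:
    """Build a query body that filters on a REFERENCE field."""
    return {
        "query": {
            "recordType": record_type,
            "filterBy": {
                "fieldName": field_name,
                "fieldValue": {
                    # The value is the reference object, not a plain string.
                    "value": {"recordName": parent_record_name, "action": "NONE"},
                    "type": "REFERENCE",
                },
                "comparator": "EQUALS",
            },
        },
        "zoneID": {"zoneName": zone_name},
    }


body = reference_query(
    "ChildObject",
    "InspecReference",
    "14102C0A-60F2-4457-AC1C-601BC628BF47-184-000000012D225C57",
)
print(json.dumps(body, indent=2))
```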

Elastic Search: Any way to make space-separated words in a comma-separated list regarded as one term?

I don't know if this is possible, but I'm trying to search by locations with an "exact search" option. There are a couple fields that get searched, with the most important one being the "location_raw" field:
"match": {
"location.location_raw": {
"type": "boolean",
"operator": "AND",
"query": "[location query]",
"analyzer": "standard"
}
}
The location_raw field is a location string with a comma between each place, such as "Sudbury, Middlesex, Massachusetts" or "Leamington, Warwickshire, England". If someone searches for "Sudbury, Middlesex" it gets passed in as
"query": "Sudbury Middlesex"
and both of those terms must exist in the location_raw field. This part works.
The problem is that when the location_raw field contains multi-word location, like New York or Saint George, these get returned when someone searches for "York" or "George." If I do an exact search for "George," I do not want to get results for "Saint George." Is there any way to make Elastic consider "Saint George" one term in the string "Saint George, Stamford, Lincoln, England"?

Here's one way to do it, though you also have to write the query in the same comma-separated form, or use a terms filter.
I used a pattern analyzer with a simple pattern: ", ". I set up a simple index with a single document:
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"csv": {
"type": "pattern",
"pattern": ", ",
"lowercase": false
}
}
}
},
"mappings": {
"doc": {
"properties": {
"location": {
"type": "string",
"index_analyzer": "csv",
"search_analyzer": "standard",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"location":"Saint George, Stamford, Lincoln, England"}
I can see the terms generated with a simple terms aggregation:
POST /test_index/_search?search_type=count
{
"aggs": {
"location_terms": {
"terms": {
"field": "location"
}
}
}
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
},
"aggregations": {
"location_terms": {
"buckets": [
{
"key": "England",
"doc_count": 1
},
{
"key": "Lincoln",
"doc_count": 1
},
{
"key": "Saint George",
"doc_count": 1
},
{
"key": "Stamford",
"doc_count": 1
}
]
}
}
}
And then if I query with the same csv syntax, the document isn't returned for "George, England":
POST /test_index/_search
{
"query": {
"match": {
"location": {
"type": "boolean",
"operator": "AND",
"query": "George, England",
"analyzer": "csv"
}
}
}
}
...
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
but is for "Saint George, England":
POST /test_index/_search
{
"query": {
"match": {
"location": {
"type": "boolean",
"operator": "AND",
"query": "Saint George, England",
"analyzer": "csv"
}
}
}
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.2169777,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.2169777,
"_source": {
"location": "Saint George, Stamford, Lincoln, England"
}
}
]
}
}
This query is equivalent, and probably more performant:
POST /test_index/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"terms": {
"location": [
"Saint George",
"England"
],
"execution": "and"
}
}
}
}
}
Here's the code I used to test it:
http://sense.qbox.io/gist/234ea93accb7b20ad8fd33e62fe92f1d450a51ab
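The matching behavior can be emulated outside Elasticsearch to see why it works. The Python sketch below is only an approximation of the idea (split on ", ", keep case, require every query term to be present); csv_analyze and matches_all are hypothetical names, not Elasticsearch APIs.

```python
import re
from typing import List


def csv_analyze(text: str) -> List[str]:
    """Emulate the pattern analyzer above: split on ", " without lowercasing."""
    return [token for token in re.split(", ", text) if token]


def matches_all(query: str, field_value: str) -> bool:
    """Emulate a match query with operator AND, analyzed with the csv analyzer."""
    indexed_terms = set(csv_analyze(field_value))
    return all(term in indexed_terms for term in csv_analyze(query))


location = "Saint George, Stamford, Lincoln, England"

print(csv_analyze(location))
print(matches_all("George, England", location))
print(matches_all("Saint George, England", location))
```

Because "Saint George" is stored as a single term, a query for "George" alone has nothing to match against, which is the exact-search behavior the question asks for.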

Highlighting part of word in elasticsearch

I have built an auto-suggester in Elasticsearch using an n-gram tokenizer. Now I want to highlight the character sequence the user entered in the auto-suggest list. I used the highlighter available in Elasticsearch for this; my query is below. In the output, however, the complete term is highlighted. Where am I going wrong?
{
"query": {
"query_string": {
"query": "soft",
"default_field": "competency_display_name"
}
},
"highlight": {
"pre_tags": ["<b>"],
"post_tags": ["</b>"],
"fields": {
"competency_display_name": {}
}
}
}
The result is:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "competency_auto_suggest",
"_type": "competency",
"_id": "4",
"_score": 1,
"_source": {
"review": null,
"competency_title": "Software Development",
"id": 4,
"competency_display_name": "Software Development"
},
"highlight": {
"competency_display_name": [
"<b>Software Development</b>"
]
}
}
]
}
}
Mapping:
"competency":{
"properties": {
"competency_display_name":{
"type":"string",
"index_analyzer": "index_ngram_analyzer",
"search_analyzer": "search_term_analyzer"
}
}
}
Settings:
"analysis": {
"filter": {
"ngram_tokenizer": {
"type": "nGram",
"min_gram": "1",
"max_gram": "15",
"token_chars": [ "letter", "digit" ]
}
},
"analyzer": {
"index_ngram_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [ "ngram_tokenizer", "lowercase" ]
},
"search_term_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
How can I highlight only "Soft" instead of "Software Development"?

To highlight partial matches in this case, use an ngram tokenizer instead of an ngram token filter.
Setting term_vector to with_positions_offsets also helps make highlighting faster.
Here are working settings and mapping:
"analysis": {
"tokenizer": {
"ngram_tokenizer": {
"type": "nGram",
"min_gram": "1",
"max_gram": "15",
"token_chars": [ "letter", "digit" ]
}
},
"analyzer": {
"index_ngram_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [ "lowercase" ]
},
"search_term_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
Mapping:
"competency":{
"properties": {
"competency_display_name":{
"type":"string",
"index_analyzer": "index_ngram_analyzer",
"search_analyzer": "search_term_analyzer",
"term_vector":"with_positions_offsets"
}
}
}
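To illustrate why the tokenizer matters, here is a rough Python emulation. It is not Elasticsearch code; ngram_tokenizer, keyword_then_ngram_filter, and highlight are hypothetical names. It assumes, as I understand the behavior, that grams produced by a token filter keep the offsets of the single token emitted by the keyword tokenizer, whereas an ngram tokenizer records offsets for each gram.

```python
from typing import List, Tuple

Token = Tuple[str, int, int]  # (term, start_offset, end_offset)


def ngram_tokenizer(text: str, min_gram: int = 1, max_gram: int = 15) -> List[Token]:
    """Grams over runs of letters/digits, each with its own offsets."""
    tokens, i, n = [], 0, len(text)
    while i < n:
        if not text[i].isalnum():
            i += 1
            continue
        j = i
        while j < n and text[j].isalnum():
            j += 1
        for start in range(i, j):
            for size in range(min_gram, max_gram + 1):
                if start + size > j:
                    break
                tokens.append((text[start:start + size].lower(), start, start + size))
        i = j
    return tokens


def keyword_then_ngram_filter(text: str, min_gram: int = 1, max_gram: int = 15) -> List[Token]:
    """Keyword tokenizer plus ngram filter: every gram inherits offsets (0, len)."""
    tokens = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(text):
                break
            tokens.append((text[start:start + size].lower(), 0, len(text)))
    return tokens


def highlight(text: str, tokens: List[Token], query: str) -> str:
    """Wrap the span of the first token equal to the lowercased query."""
    for term, start, end in tokens:
        if term == query.lower():
            return text[:start] + "<b>" + text[start:end] + "</b>" + text[end:]
    return text


name = "Software Development"
print(highlight(name, keyword_then_ngram_filter(name), "soft"))
print(highlight(name, ngram_tokenizer(name), "soft"))
```

With the filter-based analyzer the matching gram spans the whole string, so the whole string is wrapped; with the tokenizer-based analyzer only the first four characters are.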