elastic4s geodistance sort query syntax - scala

I am using elastic4s version 1.6.2 and need to write a query that searches around a given geo location, sorts the results by distance, and also returns the distance.
I can do this with a GET request in curl, but I am struggling to find the correct syntax or an example for a geolocation sort.
case class Store(id: Int, name: String, number: Int, building: Option[String] = None, street: String, postCode: String,county: String, geoLocation: GeoLocation)
case class GeoLocation(lat: Double, lon: Double)
val createMappings = client.execute {
create index "stores" mappings (
"store" as(
"store" typed StringType,
"geoLocation" typed GeoPointType
)
)
}
def searchByGeoLocation(geoLocation: GeoLocation) = {
client.execute {
search in "stores" -> "store" postFilter {
geoDistance("geoLocation") point(geoLocation.lat, geoLocation.lon) distance(2, KILOMETERS)
}
}
}
Does anyone know how to add a sort by distance from a geo location, and how to get the distance back?
The following curl command works as expected:
curl -XGET 'http://localhost:9200/stores/store/_search?pretty=true' -d '
{
"sort" : [
{
"_geo_distance" : {
"geoLocation" : {
"lat" : 52.0333793839746,
"lon" : -0.768937531935448
},
"order" : "asc",
"unit" : "km"
}
}
],
"query": {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "20km",
"geoLocation" : {
"lat" : 52.0333793839746,
"lon" : -0.768937531935448
}
}
}
}
}
}'

I am not an expert in elastic4s, but this query should be equivalent to your curl command:
val geoLocation = GeoLocation(52.0333793839746, -0.768937531935448)
search in "stores" -> "store" query {
filteredQuery filter {
geoDistance("geoLocation") point(geoLocation.lat, geoLocation.lon) distance(20, KILOMETERS)
}
} sort {
geoSort("geoLocation") point (geoLocation.lat, geoLocation.lon) order SortOrder.ASC
}
It prints the following query:
{
"query" : {
"filtered" : {
"query" : {
"match_all" : { }
},
"filter" : {
"geo_distance" : {
"geoLocation" : [ -0.768937531935448, 52.0333793839746 ],
"distance" : "20.0km"
}
}
}
},
"sort" : [ {
"_geo_distance" : {
"geoLocation" : [ {
"lat" : 52.0333793839746,
"lon" : -0.768937531935448
} ]
}
} ]
}
asc is the default sort order, so you can remove order SortOrder.ASC. I just wanted to be explicit in this example.
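On getting the distance back: when a search is sorted by _geo_distance, Elasticsearch includes the computed sort values in a "sort" array on each hit of the JSON response. The sketch below reads that value from a hypothetical, trimmed response; the store names and distances are made up for illustration. How elastic4s 1.6.2 exposes the same values on its response object is not shown here and would need checking against that version.

```javascript
// Hypothetical, trimmed search response; only the fields used below are shown.
const response = {
  hits: {
    hits: [
      { _id: "1", _source: { name: "Store A" }, sort: [1.25] },
      { _id: "2", _source: { name: "Store B" }, sort: [7.5] }
    ]
  }
};

// Pair each hit with its first sort value, which is the distance when
// "_geo_distance" is the first (or only) sort clause.
function withDistances(searchResponse) {
  return searchResponse.hits.hits.map(hit => ({
    id: hit._id,
    name: hit._source.name,
    distance: hit.sort[0]
  }));
}

const stores = withDistances(response);
console.log(stores);
```

The distance is expressed in the unit given in the sort clause ("km" in the curl example); without an explicit unit the default is meters.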

MongoDB querying nested documents

I have records like:
{
"_id" : ObjectId("5f99cede36fd08653a3d4e92"),
"accessions" : {
"sample_accessions" : {
"5f99ce9636fd08653a3d4e86" : {
"biosampleAccession" : "SAMEA7494329",
"sraAccession" : "ERS5250977",
"submissionAccession" : "ERA3032827",
"status" : "accepted"
},
"5f99ce9636fd08653a3d4e87" : {
"biosampleAccession" : "SAMEA7494330",
"sraAccession" : "ERS5250978",
"submissionAccession" : "ERA3032827",
"status" : "accepted"
}
}
}
}
How do I query by the mongo id in sample_accessions? I thought this should work but it doesn't. What should I be doing?
db.getCollection('collection').find({"accessions.sample_accessions":"5f99ce9636fd08653a3d4e86"})
The id is a key, not a value, so use $exists to check whether the key is present, and use a projection to return only that specific object:
db.getCollection('collection').find(
{
"accessions.sample_accessions.5f99ce9636fd08653a3d4e86": {
$exists: true
}
},
{ sample_doc: "$accessions.sample_accessions.5f99ce9636fd08653a3d4e86" }
)
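If the sample id is only known at run time, the field path has to be built as a string. A minimal sketch, assuming plain JavaScript with computed property names; the helper names sampleAccessionFilter and sampleAccessionProjection are hypothetical:

```javascript
// Build a filter that matches documents in which the given sample id
// exists as a key under accessions.sample_accessions.
function sampleAccessionFilter(sampleId) {
  const path = "accessions.sample_accessions." + sampleId;
  return { [path]: { $exists: true } };
}

// Build a projection that returns only that sub-document as "sample_doc".
// Aggregation expressions in find() projections need MongoDB 4.4 or later.
function sampleAccessionProjection(sampleId) {
  return { sample_doc: "$accessions.sample_accessions." + sampleId };
}

const filter = sampleAccessionFilter("5f99ce9636fd08653a3d4e86");
const projection = sampleAccessionProjection("5f99ce9636fd08653a3d4e86");
console.log(JSON.stringify(filter), JSON.stringify(projection));
```

The two objects can then be passed as the first and second argument of find().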

Mongo DB aggregation pipeline: convert string to document/object

I have a field of type "String" that contains the string representation of an object/document:
" {"a":35,b:[1,2,3,4]}"
I know it is a strange construct, but I can't change it.
My goal is to extract, for example, the value of "a".
Because the documents represented by the string are nested and repeated, a regex doesn't fit.
So how can I convert this string to an object in a MongoDB aggregation/query, so that I can process it in a following aggregation step?
(I could extract the string with Python, build a dict and pull out the information, but I'd like to stay inside the aggregation pipeline for better performance.)
In MongoDB 4.4 this works:
db.target.aggregate([{$project: {
X: "$AD_GRAPHIC",
Y : {
$function : {
body: function(jsonString) {
return JSON.parse(jsonString)
},
args: [ "$AD_GRAPHIC"],
lang: "js"
}
}
}
}])
Basically, use the $function operator to invoke the JSON parser. (This assumes server-side JavaScript is enabled.)
Results
{ "_id" : ObjectId("60093dc8f2c829000e38a8d0"), "X" : "{\"alias\":\"MEDIA_DIR\",\"path\":\"modem.jpg\"}", "Y" : { "alias" : "MEDIA_DIR", "path" : "modem.jpg" } }
{ "_id" : ObjectId("60093dc8f2c829000e38a8d1"), "X" : "{\"alias\":\"MEDIA_DIR\",\"path\":\"monitor.jpg\"}", "Y" : { "alias" : "MEDIA_DIR", "path" : "monitor.jpg" } }
{ "_id" : ObjectId("60093dc8f2c829000e38a8d2"), "X" : "{\"alias\":\"MEDIA_DIR\",\"path\":\"mousepad.jpg\"}", "Y" : { "alias" : "MEDIA_DIR", "path" : "mousepad.jpg" } }
{ "_id" : ObjectId("60093dc8f2c829000e38a8d3"), "X" : "{\"alias\":\"MEDIA_DIR\",\"path\":\"keyboard.jpg\"}", "Y" : { "alias" : "MEDIA_DIR", "path" : "keyboard.jpg" } }
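JSON.parse throws on input that is not valid JSON, and the sample string in the question has an unquoted key (b), so it would not parse as written. A guarded variant of the function body might look like the sketch below; parseOrNull is a hypothetical name, and whether returning null suits the rest of the pipeline is an assumption.

```javascript
// Parse a JSON string, returning null instead of throwing on invalid input.
function parseOrNull(jsonString) {
  try {
    return JSON.parse(jsonString);
  } catch (e) {
    return null;
  }
}

const valid = parseOrNull('{"alias":"MEDIA_DIR","path":"modem.jpg"}');
const invalid = parseOrNull('{"a":35,b:[1,2,3,4]}'); // key b is unquoted
console.log(valid.path, invalid);
```

The same function could be used as the body of $function, so that one malformed document does not fail the whole aggregation.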
There's no native way in the MongoDB engine to parse a blob of JSON from a field. I'd recommend doing it client-side in your language of choice and then, if required, saving the result back.
Alternatively, if your data is too big and you still need to aggregate it, you could use a regex to project the required fields out of the JSON and then use them later to filter, sort and so on.
For example if we insert the following document:
> db.test.insertOne({ name: 'test', blob: '{"a":35,b:[1,2,3,4]}' })
{
"acknowledged" : true,
"insertedId" : ObjectId("5ed9fe21b5d91941c9e85cdb")
}
We can then just project out the array with some regex:
db.test.aggregate([
{ $addFields: { b: { $regexFind: { input: "$blob", regex: /\[(((\d+,*))+)\]/ } } } },
{ $addFields: { b: { $split: [ { $arrayElemAt: [ "$b.captures", 0 ] }, "," ] } } }
]);
{
"_id" : ObjectId("5ed9fe21b5d91941c9e85cdb"),
"name" : "test",
"blob" : "{\"a\":35,b:[1,2,3,4]}",
"b" : [
"1",
"2",
"3",
"4"
]
}
This means we can do some filtering, sorting and any of the other aggregation stages.
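The capture logic can be tried outside the database first. The sketch below applies the same pattern in plain JavaScript, where group 1 of the match corresponds to the first entry of "captures" in $regexFind. The second pattern, for the scalar value of "a", is a hypothetical addition and only handles unsigned integers.

```javascript
const blob = '{"a":35,b:[1,2,3,4]}';

// Same pattern as in the $regexFind stage; group 1 holds the array contents.
const arrayMatch = blob.match(/\[(((\d+,*))+)\]/);
const b = arrayMatch ? arrayMatch[1].split(",") : [];

// Hypothetical extra pattern for the scalar value of "a".
const aMatch = blob.match(/"a":(\d+)/);
const a = aMatch ? Number(aMatch[1]) : null;

console.log(a, b);
```

As in the pipeline output, the array elements come back as strings and would need converting before numeric comparisons.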
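You could also just use JSON.parse() on the client side (in the shell or in your driver language). Note that the string must be valid JSON, so the key b has to be quoted.
For example:
db.getCollection('system').find({
a: JSON.parse('{"a":35,"b":[1,2,3,4]}').a
})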

Elasticsearch doesn't find value in range query

I run the following query:
GET archive-bp/_search
{
"query": {
"bool" : {
"filter" : [ {
"bool" : {
"should" : [ {
"terms" : {
"naDataOwnerCode" : [ "ACME-FinServ", "ACME-FinServ CA", "ACME-FinServ NY", "ACME-FinServ TX", "ACME-Shipping APA", "ACME-Shipping Eur", "ACME-Shipping LATAM", "ACME-Shipping ME", "ACME-TelCo-CN", "ACME-TelCo-ESAT", "ACME-TelCo-NL", "ACME-TelCo-PL", "ACME-TelCo-RO", "ACME-TelCo-SA", "ACME-TelCo-Treasury", "Default" ]
}
},
{
"bool" : {
"must_not" : {
"exists" : {
"field" : "naDataOwnerCode"
}
}
}
} ]
}
}, {
"range" : {
"bankCommunicationStatusDate" : {
"from" : "2006-02-27T06:45:47.000Z",
"to" : null,
"time_zone" : "+02:00",
"include_lower" : true,
"include_upper" : true
}
}
} ]
}
}
}
I receive no results, but the field exists in my index.
When I strip off the data owner part, I still get no results. When I strip off the bankCommunicationStatusDate part, I get 10 results, so that is where the problem lies.
The query with only the bankCommunicationStatusDate part:
GET archive-bp/_search
{
"query" :
{
"range" : {
"bankCommunicationStatusDate" : {
"from" : "2016-04-27T09:45:43.000Z",
"to" : "2026-04-27T09:45:43.000Z",
"time_zone" : "+02:00",
"include_lower" : true,
"include_upper" : true
}
}
}
}
The mapping of my index contains the following bankCommunicationStatusDate field:
"bankCommunicationStatusDate": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
}
And there are values for the field bankCommunicationStatusDate in elasticsearch:
"bankCommunicationStatusDate": "2016-04-27T09:45:43.000Z"
"bankCommunicationStatusDate": "2016-04-27T09:45:47.000Z"
What is wrong?
What version of Elasticsearch do you use?
I guess the reason is that you should use "gte/lte" instead of "from/to/include_lower/include_upper".
According to the documentation for version 0.90:
https://www.elastic.co/guide/en/elasticsearch/reference/0.90/query-dsl-range-query.html
Deprecated in 0.90.4.
The from, to, include_lower and include_upper parameters have been deprecated in favour of gt,gte,lt,lte.
The strange thing is that I tried your example on Elasticsearch version 1.7 and it returns data!
I guess the parameters were actually removed much later, somewhere between 1.7 and the newer version you may be running.
By the way, you can isolate the problem even further using the Sense plugin for Chrome and this code:
DELETE /test
PUT /test
{
"mappings": {
"myData" : {
"properties": {
"bankCommunicationStatusDate": {
"type": "date"
}
}
}
}
}
PUT test/myData/1
{
"bankCommunicationStatusDate":"2016-04-27T09:45:43.000Z"
}
PUT test/myData/2
{
"bankCommunicationStatusDate":"2016-04-27T09:45:47.000Z"
}
GET test/_search
{
"query" :
{
"range" : {
"bankCommunicationStatusDate" : {
"gte" : "2016-04-27T09:45:43.000Z",
"lte" : "2026-04-27T09:45:43.000Z"
}
}
}
}
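If many existing queries use the older parameters, the translation can be done mechanically. A minimal sketch, assuming that a missing include_lower or include_upper means "inclusive" and that a null bound means "unbounded"; the function name toGteLte is hypothetical:

```javascript
// Convert legacy range parameters (from/to/include_lower/include_upper)
// into gt/gte/lt/lte. Null or missing bounds are left out.
function toGteLte(legacy) {
  const range = {};
  if (legacy.from != null) {
    range[legacy.include_lower === false ? "gt" : "gte"] = legacy.from;
  }
  if (legacy.to != null) {
    range[legacy.include_upper === false ? "lt" : "lte"] = legacy.to;
  }
  if (legacy.time_zone != null) {
    range.time_zone = legacy.time_zone;
  }
  return range;
}

const converted = toGteLte({
  from: "2006-02-27T06:45:47.000Z",
  to: null,
  time_zone: "+02:00",
  include_lower: true,
  include_upper: true
});
console.log(JSON.stringify({ range: { bankCommunicationStatusDate: converted } }));
```

Applied to the range clause from the question, this yields a query with only a gte bound and the time_zone, since "to" is null there.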

Using 'findAndModify' on mongodb within multiple nested array (complex)

I have the following document in a collection:
{
"_id" : ObjectId("546a7a0f44aee82db8469f6d"),
...
"valoresVariablesIterativas" : [
{
"asignaturaVO" : {
"_id" : ObjectId("546a389c44aee54fc83112e9")
},
"valoresEstaticos" : {
"IT_VAR3" : "",
"IT_VAR1" : "",
"IT_VAR2" : "asdasd"
},
"valoresPreestablecidos" : {
"IT_ASIGNATURA" : "Matemáticas",
"IT_NOTA_DEFINITIVA_ASIGNATURA" : ""
}
},
{
"asignaturaVO" : {
"_id" : ObjectId("546a3d8d44aee54fc83112fa")
},
"valoresEstaticos" : {
"IT_VAR3" : "",
"IT_VAR1" : "",
"IT_VAR2" : ""
},
"valoresPreestablecidos" : {
"IT_ASIGNATURA" : "Español",
"IT_NOTA_DEFINITIVA_ASIGNATURA" : ""
}
}
]
...
}
I want to modify one element of valoresEstaticos. I know the document "_id", the "asignaturaVO" id, and the key of the valoresEstaticos item that I want to modify.
What is the correct query for this? So far I have this:
db.myCollection.findAndModify({
query:{"_id" : ObjectId("546a7a0f44aee82db8469f6d")},
update: {
{valoresVariablesIterativas.asignaturaVO._id: ObjectId("546a389c44aee54fc83112e9")},
{ $set: {}}
}
})
but I don't know how to build the query.
Please help, thank you very much!
You can just use update:
db.myCollection.update(
{ "_id" : ObjectId("546a7a0f44aee82db8469f6d"), "valoresVariablesIterativas.asignaturaVO._id" : ObjectId("546a389c44aee54fc83112e9") },
{ "$set" : { "valoresVariablesIterativas.$.valoresEstaticos.IT_VAR3" : 99 } }
)
This assumes you want to update the key IT_VAR3. The important part is the positional update operator $. The condition on the array in the query portion of the update is redundant for finding the document, but it is required so that $ refers to the matching array element.
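Since the key inside valoresEstaticos is only known at run time, the path in $set can be assembled as a string. A minimal sketch, assuming plain JavaScript; buildPositionalUpdate is a hypothetical helper, and plain strings stand in for ObjectId values so that it runs outside the shell:

```javascript
// Build the filter and update documents for setting one key of
// valoresEstaticos in the array element that matches asignaturaId.
function buildPositionalUpdate(docId, asignaturaId, key, value) {
  const filter = {
    _id: docId,
    "valoresVariablesIterativas.asignaturaVO._id": asignaturaId
  };
  const update = {
    $set: { ["valoresVariablesIterativas.$.valoresEstaticos." + key]: value }
  };
  return { filter, update };
}

// Plain strings stand in for ObjectId values in this sketch.
const { filter, update } = buildPositionalUpdate(
  "546a7a0f44aee82db8469f6d", "546a389c44aee54fc83112e9", "IT_VAR3", 99);
console.log(JSON.stringify(filter), JSON.stringify(update));
```

In the shell, the two documents would be passed as db.myCollection.update(filter, update), or as the query and update fields of findAndModify if the modified document is needed back.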

MongoDB MapReduce producing different results for each document

This is a follow-up from this question, where I tried to solve this problem with the aggregation framework. Unfortunately, I have to wait before being able to update this particular mongodb installation to a version that includes the aggregation framework, so have had to use MapReduce for this fairly simple pivot operation.
I have input data in the format below, with multiple daily dumps:
{
"_id" : "daily_dump_2013-05-23",
"authors_who_sold_books" : [
{
"id" : "Charles Dickens",
"original_stock" : 253,
"customers" : [
{
"date_bought" : 1368627290,
"customer_id" : 9715923
}
]
},
{
"id" : "JRR Tolkien",
"original_stock" : 24,
"customers" : [
{
"date_bought" : 1368540890,
"customer_id" : 9872345
},
{
"date_bought" : 1368537290,
"customer_id" : 9163893
}
]
}
]
}
I'm after output in the following format, that aggregates across all instances of each (unique) author across all daily dumps:
{
"_id" : "Charles Dickens",
"original_stock" : 253,
"customers" : [
{
"date_bought" : 1368627290,
"customer_id" : 9715923
},
{
"date_bought" : 1368622358,
"customer_id" : 9876234
},
etc...
]
}
I have written this map function...
function map() {
for (var i in this.authors_who_sold_books)
{
author = this.authors_who_sold_books[i];
emit(author.id, {customers: author.customers, original_stock: author.original_stock, num_sold: 1});
}
}
...and this reduce function.
function reduce(key, values) {
sum = 0
for (i in values)
{
sum += values[i].customers.length
}
return {num_sold : sum};
}
However, this gives me the following output:
{
"_id" : "Charles Dickens",
"value" : {
"customers" : [
{
"date_bought" : 1368627290,
"customer_id" : 9715923
},
{
"date_bought" : 1368622358,
"customer_id" : 9876234
},
],
"original_stock" : 253,
"num_sold" : 1
}
}
{ "_id" : "JRR Tolkien", "value" : { "num_sold" : 3 } }
{
"_id" : "JK Rowling",
"value" : {
"customers" : [
{
"date_bought" : 1368627290,
"customer_id" : 9715923
},
{
"date_bought" : 1368622358,
"customer_id" : 9876234
},
],
"original_stock" : 183,
"num_sold" : 1
}
}
{ "_id" : "John Grisham", "value" : { "num_sold" : 2 } }
The even indexed documents have the customers and original_stock listed, but an incorrect sum of num_sold.
The odd indexed documents only have the num_sold listed, but it is the correct number.
Could anyone tell me what it is I'm missing, please?
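Your problem is that the value returned by the reduce function must have the same format as the values emitted by the map function (see the requirements for the reduce function for an explanation). MongoDB does not call reduce for a key that has only one emitted value, and it may call reduce more than once for the same key, feeding earlier results back in. That is why some authors keep the raw map output with num_sold set to 1, while others only have num_sold.
You need to change the code to something like the following to fix the problem: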
function map() {
// Emit one value per author; num_sold starts as the number of customers in this dump.
for (var i in this.authors_who_sold_books)
{
var author = this.authors_who_sold_books[i];
emit(author.id, {customers: author.customers, original_stock: author.original_stock, num_sold: author.customers.length});
}
}
function reduce(key, values) {
// Return the same shape that map emits, so the result can be reduced again.
var result = {customers:[] , num_sold:0, original_stock: (values.length ? values[0].original_stock : 0)};
for (var i in values)
{
result.num_sold += values[i].num_sold;
result.customers = result.customers.concat(values[i].customers);
}
return result;
}
I hope that helps.
Note the change to num_sold: author.customers.length in the map function. I think that's what you want.
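One way to check a reduce function is to run it outside MongoDB and confirm that reducing in several steps gives the same result as reducing everything at once. The sketch below does that in plain JavaScript with hypothetical emitted values for one author; it uses an index loop instead of for...in, which is a small deviation from the code above.

```javascript
// The corrected reduce function, with local variables.
function reduce(key, values) {
  var result = {
    customers: [],
    num_sold: 0,
    original_stock: (values.length ? values[0].original_stock : 0)
  };
  for (var i = 0; i < values.length; i++) {
    result.num_sold += values[i].num_sold;
    result.customers = result.customers.concat(values[i].customers);
  }
  return result;
}

// Hypothetical emitted values for one author across three daily dumps.
var emitted = [
  { customers: [{ customer_id: 1 }], original_stock: 253, num_sold: 1 },
  { customers: [{ customer_id: 2 }, { customer_id: 3 }], original_stock: 253, num_sold: 2 },
  { customers: [{ customer_id: 4 }], original_stock: 253, num_sold: 1 }
];

// Reduce everything at once, then reduce in two steps as MongoDB may do.
var allAtOnce = reduce("Charles Dickens", emitted);
var partial = reduce("Charles Dickens", emitted.slice(0, 2));
var twoSteps = reduce("Charles Dickens", [partial, emitted[2]]);

console.log(allAtOnce.num_sold, twoSteps.num_sold);
```

With the original reduce function, the second step would fail or lose data, because its return value had no customers field to read from.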