Waypoint matching query - mongodb

We have a collection as follows. Each document represents a driver's trip: the loc property contains the waypoints and the time property contains the times corresponding to those waypoints. For example, in Trip A, the driver would be at GeoLocation tripA.loc.coordinates[0] at time tripA.time[0]:
{
tripId : "Trip A",
time : [
"2015-03-08T04:47:43.589Z",
"2015-03-08T04:48:43.589Z",
"2015-03-08T04:49:43.589Z",
"2015-03-08T04:50:43.589Z",
],
loc: {
type: "MultiPoint",
coordinates: [
[ -73.9580, 40.8003 ],
[ -73.9498, 40.7968 ],
[ -73.9737, 40.7648 ],
[ -73.9814, 40.7681 ]
]
}
}
{
tripId : "Trip B",
time : [
"2015-03-08T04:47:43.589Z",
"2015-03-08T04:48:43.589Z",
"2015-03-08T04:49:43.589Z",
"2015-03-08T04:50:43.589Z",
],
loc: {
type: "MultiPoint",
coordinates: [
[ -72.9580, 41.8003 ],
[ -72.9498, 41.7968 ],
[ -72.9737, 41.7648 ],
[ -72.9814, 41.7681 ]
]
}
}
We would like to query for trips which start near (within 1 km of) location [long1, lat1] around time t (±10 minutes) and end at [long2, lat2].
Is there a simple and efficient way to formulate this query for MongoDB or Elasticsearch?
If so, could you please give the query, in either MongoDB or Elasticsearch (MongoDB preferred)?

This did start as a comment but was clearly getting way too long. So it's a long explanation of the limitations and the approach.
The bottom line of what you are asking to achieve here is effectively a "union query", which is generally defined as two separate queries where the end result is the "set intersection" of their results. In brief, the "trips" selected by your "origin" query must also appear in the results of your "destination" query.
In general database terms we refer to a "union" operation as a "join" or at least a condition where the selection of one set of criteria "and" the selection of another must both meet with a common base grouping identifier.
The basic point for MongoDB, and I believe this also applies to Elasticsearch indexes, is that neither datastore supports the notion of a "join" from a single direct query.
There is another MongoDB principle to consider with your proposed or existing modelling: even with items specified in "array" terms, there is no way to implement an "and" condition with a geospatial search on coordinates. Also, with your choice of a GeoJSON "MultiPoint", the query cannot "choose" which element of that object to match as the "nearest". Therefore "all points" would be considered when looking for the "nearest match".
Your explanation is quite clear in its intent. The "origin" is represented as the "first" element of each of the "two arrays" in your document structure, which hold a "location" and a "time" for each progressive "waypoint" in the "trip". The "destination" is naturally the last element of each array, given of course that the data points are "paired".
I see the logic in thinking that this is a good way to store things, but it does not follow the allowed query patterns of either of the storage solutions you mention here.
As I mentioned already, this is indeed a "union" in intent so while I see the thinking that led to the design it would be better to store things like this:
{
"tripId" : "Trip A",
"time" : ISODate("2015-03-08T04:47:43.589Z"),
"loc": {
"type": "Point",
"coordinates": [ -73.9580, 40.8003 ]
},
"seq": 0
},
{
"tripId" : "Trip A",
"time" : ISODate("2015-03-08T04:48:43.589Z"),
"loc": {
"type": "Point",
"coordinates": [ -73.9498, 40.7968 ]
},
"seq": 1
},
{
"tripId" : "Trip A",
"time" : ISODate("2015-03-08T04:49:43.589Z"),
"loc": {
"type": "Point",
"coordinates": [ -73.9737, 40.7648 ]
},
"seq": 2
},
{
"tripId" : "Trip A",
"time" : ISODate("2015-03-08T04:50:43.589Z"),
"loc": {
"type": "Point",
"coordinates": [ -73.9814, 40.7681 ]
},
"seq": 3,
"isEnd": true
}
In the example, I'm just inserting those documents into a collection called "geojunk", and then issuing a 2dsphere index for the "loc" field:
db.geojunk.createIndex({ "loc": "2dsphere" })
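If the existing trips are already stored in the original "MultiPoint" shape, the conversion to one document per waypoint could be sketched roughly like this. The helper name flattenTrip is hypothetical, and the sketch assumes that the time and coordinates arrays are always paired and that the last waypoint is the destination:

```javascript
// Hypothetical helper: split one "MultiPoint" trip document into
// one document per waypoint, in the shape listed above.
function flattenTrip(trip) {
  var coords = trip.loc.coordinates;
  if (coords.length !== trip.time.length) {
    throw new Error("time and coordinates must be paired");
  }
  return coords.map(function (point, i) {
    var doc = {
      tripId: trip.tripId,
      time: new Date(trip.time[i]),        // store a real date, not a string
      loc: { type: "Point", coordinates: point },
      seq: i
    };
    // Assumption: the last waypoint is the destination
    if (i === coords.length - 1) doc.isEnd = true;
    return doc;
  });
}

var tripA = {
  tripId: "Trip A",
  time: [
    "2015-03-08T04:47:43.589Z",
    "2015-03-08T04:48:43.589Z",
    "2015-03-08T04:49:43.589Z",
    "2015-03-08T04:50:43.589Z"
  ],
  loc: {
    type: "MultiPoint",
    coordinates: [
      [ -73.9580, 40.8003 ],
      [ -73.9498, 40.7968 ],
      [ -73.9737, 40.7648 ],
      [ -73.9814, 40.7681 ]
    ]
  }
};

var waypoints = flattenTrip(tripA);
console.log(JSON.stringify(waypoints, undefined, 2));
```

The resulting array can then be inserted into the new collection in whatever way suits your driver or the shell.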
The processing of this is then done with "two" .aggregate() queries. The reason for .aggregate() is because you want to match the "first" document "per trip" in each case. This represents the nearest waypoint for each trip found by the queries. Then basically you want to "merge" these results into some kind of "hash" structure keyed by "tripId".
The end logic says that if both an "origin" and a "destination" matched your query conditions for a given "trip", then that is a valid result for your overall query.
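Stripped of the database calls, that merge logic can be sketched in plain JavaScript. The function name mergeMatches and the sample inputs are made up for illustration, and the sketch assumes each input array holds at most one document per trip, with the trip identifier in "_id" as produced by a $group stage:

```javascript
// Hypothetical in-memory merge of the two result sets, keyed by trip id.
function mergeMatches(origins, destinations) {
  var byTrip = Object.create(null);

  function add(doc) {
    var entry = byTrip[doc._id];
    if (!entry) {
      entry = byTrip[doc._id] = { trip: doc._id, time: [], loc: [] };
    }
    entry.time.push(doc.time);
    entry.loc.push(doc.loc);
  }

  origins.forEach(add);
  destinations.forEach(add);

  // A trip is valid only when both queries contributed a waypoint
  return Object.keys(byTrip)
    .map(function (key) { return byTrip[key]; })
    .filter(function (entry) { return entry.loc.length === 2; });
}

var origins = [
  { _id: "Trip A", time: "2015-03-08T04:47:43.589Z",
    loc: { type: "Point", coordinates: [ -73.9580, 40.8003 ] } },
  { _id: "Trip C", time: "2015-03-08T04:46:00.000Z",
    loc: { type: "Point", coordinates: [ -73.9581, 40.8001 ] } }
];
var destinations = [
  { _id: "Trip A", time: "2015-03-08T04:50:43.589Z",
    loc: { type: "Point", coordinates: [ -73.9814, 40.7681 ] } }
];

var valid = mergeMatches(origins, destinations);
console.log(JSON.stringify(valid, undefined, 2));
```

Here the made-up "Trip C" matched only the origin query, so it is dropped and only "Trip A" remains.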
The code I give here is an arbitrary Node.js implementation, mostly because it's a good base for issuing the queries in "parallel" for best performance, and also because I'm choosing to use nedb as an example of the "hash" with a little more "Mongolike" syntax:
var async = require('async'),
MongoClient = require("mongodb").MongoClient,
DataStore = require('nedb');
// Common stream upsert handler
function streamProcess(stream,store,callback) {
stream.on("data",function(data) {
// Clean "_id" to keep nedb happy
data.trip = data._id;
delete data._id;
// Upsert to store
store.update(
{ "trip": data.trip },
{
"$push": {
"time": data.time,
"loc": data.loc
}
},
{ "upsert": true },
function(err,num) {
if (err) callback(err);
}
);
});
stream.on("error",callback);
stream.on("end",callback);
}
MongoClient.connect('mongodb://localhost/test',function(err,db) {
if (err) throw err;
db.collection('geojunk',function(err,collection) {
if (err) throw err;
var store = new DataStore();
// Parallel execution
async.parallel(
[
// Match origin trips
function(callback) {
var stream = collection.aggregate(
[
{ "$geoNear": {
"near": {
"type": "Point",
"coordinates": [ -73.9580, 40.8003 ],
},
"query": {
"time": {
"$gte": new Date("2015-03-08T04:40:00.000Z"),
"$lte": new Date("2015-03-08T04:50:00.000Z")
},
"seq": 0
},
"maxDistance": 1000,
"distanceField": "distance",
"spherical": true
}},
{ "$group": {
"_id": "$tripId",
"time": { "$first": "$time" },
"loc": { "$first": "$loc" }
}}
],
{ "cursor": { "batchSize": 1 } }
);
streamProcess(stream,store,callback);
},
// Match destination trips
function(callback) {
var stream = collection.aggregate(
[
{ "$geoNear": {
"near": {
"type": "Point",
"coordinates": [ -73.9814, 40.7681 ]
},
"query": { "isEnd": true },
"maxDistance": 1000,
"distanceField": "distance",
"spherical": true
}},
{ "$group": {
"_id": "$tripId",
"time": { "$first": "$time" },
"loc": { "$first": "$loc" }
}}
],
{ "cursor": { "batchSize": 25 } }
);
streamProcess(stream,store,callback);
}
],
function(err) {
if (err) throw err;
// Just documents that matched origin and destination
store.find({ "loc": { "$size": 2 }},{ "_id": 0 },function(err,result) {
if (err) throw err;
console.log( JSON.stringify( result, undefined, 2 ) );
db.close();
});
}
);
});
});
On the sample data as I listed it this will return:
[
{
"trip": "Trip A",
"time": [
"2015-03-08T04:47:43.589Z",
"2015-03-08T04:50:43.589Z"
],
"loc": [
{
"type": "Point",
"coordinates": [
-73.958,
40.8003
]
},
{
"type": "Point",
"coordinates": [
-73.9814,
40.7681
]
}
]
}
]
So it found the origin and destination that was nearest to the queried locations, also being an "origin" within the required time and something that is defined as a destination, i.e. "isEnd".
So the $geoNear operation does the matching, with the returned results being the documents nearest to the point that also meet the other conditions. The $group stage is required because other documents in the same trip could "possibly" match the conditions, so it's just a way of making sure. The $first operator makes sure that the already "sorted" results will contain only one result per "trip". If you are really "sure" that will not happen with the conditions, then you could just use a standard $nearSphere query outside of aggregation instead. So I'm playing it safe here.
One thing to note is that even with the inclusion of "nedb" here, and though it does support dumping output to disk, the data is still accumulated in memory. If you are expecting large results then rather than this type of "hash table" implementation, you would need to output in a similar fashion to what is shown to another MongoDB collection and retrieve the matching results from there.
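As a rough sketch of that $nearSphere alternative for the "origin" side, the filter could be built as below. The function name originFilter and its parameters are hypothetical, and it assumes the one-document-per-waypoint layout with "seq" and "time" fields shown earlier:

```javascript
// Hypothetical builder for a plain .find() filter on the "origin" waypoint.
// t is a Date, windowMinutes is the +/- tolerance, maxMeters is the radius.
function originFilter(coordinates, t, windowMinutes, maxMeters) {
  var windowMs = windowMinutes * 60 * 1000;
  return {
    loc: {
      $nearSphere: {
        $geometry: { type: "Point", coordinates: coordinates },
        $maxDistance: maxMeters            // meters, since the point is GeoJSON
      }
    },
    time: {
      $gte: new Date(t.getTime() - windowMs),
      $lte: new Date(t.getTime() + windowMs)
    },
    seq: 0                                 // only the first waypoint of a trip
  };
}

var filter = originFilter(
  [ -73.9580, 40.8003 ],
  new Date("2015-03-08T04:45:00.000Z"),
  10,
  1000
);
console.log(JSON.stringify(filter, undefined, 2));
```

The resulting object would be passed to .find() on the collection. The "destination" side would be the same idea with "isEnd": true in place of the "seq" and "time" conditions.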
That doesn't change the overall logic though, and yet another reason to use "nedb" to demonstrate, since you would "upsert" to documents in the results collection in the same way.

Related

Mongoose aggregate match returns empty array

I'm working with MongoDB aggregations using mongoose and I'm not sure what I'm doing wrong in my application.
Here is my document:
{
"_id": "5bf6fe505ca52c2088c39a45",
"loc": {
"type": "Point",
"coordinates": [
-43.......,
-19......
]
},
"name": "......",
"friendlyName": "....",
"responsibleName": "....",
"countryIdentification": "0000000000",
"categories": [
"5bf43af0f9b41a21e03ef1f9"
],
"created_at": "2018-11-22T19:06:56.912Z",
"__v": 0
}
At the context of my application I need to search documents by GeoJSON, and I execute this search using geoNear. Ok it works fine! But moreover I need to "match" or "filter" specific "categories" in the document. I think it's possible using $match but certainly I'm doing the things wrong. Here is the code:
CompanyModel.aggregate(
[
{
"$geoNear": {
"near": {
"type": "Point",
"coordinates": [pageOptions.loc.lng, pageOptions.loc.lat]
},
"distanceField": "distance",
"spherical": true,
"maxDistance": pageOptions.distance
}
},
{
"$match": {
categories: { "$in": [pageOptions.category] }
}
}
]
).then(data => {
resolve({ statusCode: 200, data: data });
}).catch(err => {
console.log(err);
reject({ statusCode: 500, error: "Error getting documents", err: err });
})
pageOptions:
var pageOptions = {
loc: {
lat: parseFloat(req.query.lat),
lng: parseFloat(req.query.lng)
},
distance: parseInt(req.query.distance) || 10000,
category: req.params.category || ""
}
If I remove $match I get all the documents by location, but I need to filter specific categories... I don't believe that I need to filter it manually, I believe it can be possible with aggregation functions...
So anyone can help me with this mongoose implementation?
Thanks for all help
In MongoDB you need to make sure that the data type in your document matches the type in your query. In this case you have a string stored in the database and you're trying to use ObjectId to build the $match stage. To fix that you can call the valueOf() method on pageOptions.category. Try:
{
"$match": {
categories: { "$in": [pageOptions.category.valueOf()] }
}
}

Forbid usage of the specific index for the query

I have a mongodb collection with the following schema:
{
"description": "some arbitrary text",
"type": "TYPE", # there are a lot of different types
"status": "STATUS" # there are a few different statuses
}
I also have two indexes: for "type" and for "status".
Now I run a query:
db.obj.count({
type: { $in: ["SOME_TYPE"] },
status: { $ne: "SOME_STATUS" },
description: { $regex: /.*/ }
})
MongoDB chooses to use an index for "status", while "type" would be much better:
"query": {
"count": "obj",
"query": {
"description": Regex('.*', 2),
"status": {
"$ne": "SOME_STATUS"
},
"type": {
"$in": [
"SOME_TYPE"
]
}
}
},
"planSummary": "IXSCAN { status: 1 }"
I know I can use hint to specify an index to use, but I have different queries (which should use different indexes) and I can't annotate every one of them.
As far as I can see, a possible solution would be to forbid usage of "status" index for all queries that contain status: { $ne: "SOME_STATUS" } condition.
Is there a way to do it? Or maybe I want something weird and there is a better way?

Get only closest documents using mongodb geoNear or near

I'm going crazy trying to return the closest Venues to a specific point using MongoDB. It is the first time I've worked with it, so I'm totally new to this.
What I did at the beginning was to create a 2D index on my Venue collection.
Now I'm trying to get Venues within a range of 500 meters of a specific point, and the code is this:
Venue.find({ location:
{
$near: [ 52.3835443 , 4.8353073 ],
$maxDistance: 0.5 / 6371
}
}, function (err, venues) {
return venues;
});
Unfortunately it returns all documents.
The Venue Model has the field for location like this:
"location": {
"type": {
"type": "string"
},
"coordinates": [{ "type": "Number" }]
}
And all my Venues are like this:
{
"name": "name",
"address": "address",
"location": {
"type": "Point",
"coordinates": [50.1981668, 7.9943994999]
}
}
I also tried using $geoNear but I always receive all documents and not only those in 500 meters distance.
EDIT:
Mongo version is 3.2;
index:
{
"v": 1,
"key": {
"location": "2dsphere"
},
"name": "location_2dsphere",
"ns": "mydb.Venue",
"2dsphereIndexVersion": 2
}
document as written above:
{
"name": "A name",
"address": "An address",
"location": {
"type": "Point",
"coordinates": [50.1981668, 7.9943994999]
}
}
The $maxDistance operator constrains the results of a geospatial $near or $nearSphere query to the specified distance. The measuring units for the maximum distance are determined by the coordinate system in use. For a GeoJSON Point object, specify the distance in meters, not radians.
When I was executing this query I got:
planner returned error: unable to find index for $geoNear query
so I added $geometry to the query body:
Venue.find({
  location: {
    $near: {
      $geometry: {
        type: "Point",
        coordinates: [ 52.3835443, 4.8353073 ]
      },
      $maxDistance: 500 // meters
    }
  }
}, function (err, venues) {
  return venues;
});
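To illustrate the unit difference, here is a small sketch that converts the radian value from the question into meters and builds the GeoJSON form of the filter. The helper names and the Earth radius constant are assumptions for illustration only:

```javascript
// Approximate mean Earth radius in meters (matches the 6371 km in the question)
var EARTH_RADIUS_M = 6371000;

// Hypothetical helper: convert a distance in radians to meters
function radiansToMeters(radians) {
  return radians * EARTH_RADIUS_M;
}

// Hypothetical helper: GeoJSON form of $near, where $maxDistance is in meters
function nearFilter(coordinates, maxMeters) {
  return {
    location: {
      $near: {
        $geometry: { type: "Point", coordinates: coordinates },
        $maxDistance: maxMeters
      }
    }
  };
}

// The question's "0.5 / 6371" is 0.5 km expressed in radians
var meters = radiansToMeters(0.5 / 6371);
var filter = nearFilter([ 52.3835443, 4.8353073 ], Math.round(meters));
console.log(JSON.stringify(filter, undefined, 2));
```

So a value such as 0.5 / 6371 would be read as a tiny fraction of a meter when combined with a GeoJSON point, which is why the distance has to be given as 500 instead.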

How to use a document property for $maxDistance in a $nearSphere query?

I have a collection which contains documents like this:
{
"_id" : "cysMrqjootq6YS6WP",
"profile" : {
……
"deliveryDistance" : 20,
"address" : {
"loc" : {
"type" : "Point",
"coordinates" : [
-2.120361,
52.536273
]
} }
}
}
And I have a GeoJSON point like:
var referencePoint= {
"type" : "Point",
"coordinates" : [
-2.120361,
52.536273
]
}
I am using Meteor.js, Node.js and MongoDB. I would like to create a query where the maxDistance from this point is the deliveryDistance property from each document into my collection.
If the maxDistance was a fixed value, the query would be:
myCollection.find({
"profile.address.loc":{
"$nearSphere": {
"$geometry": referencePoint,
"$maxDistance": 20000 //for 20Kms
}
}
})
But this is not the case. For each document, the maxDistance has to be the value of 'profile.deliveryDistance'. How do I use this value from the document as maxDistance in this query? Is it possible? If not, any other ideas?
You cannot reference existing properties of a document within a .find() query, at least not within a $near or $nearSphere operation.
Instead the approach here is to use the aggregation framework and $geoNear. This allows you to calculate the distance from the queried point and then compare if that falls within the "deliveryDistance" in the document.
So for meteor, you are probably best off installing the meteorhacks aggregate package, and then doing something like this:
Meteor.publish("aggResults",function(referencePoint) {
var self = this;
var results = myCollection.aggregate([
{ "$geoNear": {
"near": referencePoint,
"distanceField": "distance",
"spherical": true
}},
{ "$redact": {
"$cond": {
"if": { "$lt": [ "$distance", "$profile.deliveryDistance" ] },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
]);
_.each(results,function(result) {
self.added("client_collection_name",result._id, {
"profile": result.profile,
"distance": result.distance
});
});
self.ready();
})
If your MongoDB server is older than version 2.6 (and it would have to be at least 2.4 for geospatial queries), then you would use $project and $match in place of $redact to filter out the documents that did not fall within the "deliveryDistance":
var results = myCollection.aggregate([
  { "$geoNear": {
    "near": referencePoint,
    "distanceField": "distance",
    "spherical": true
  }},
  { "$project": {
    "profile": 1,
    "distance": 1,
    // Compare against the document's own "deliveryDistance" field
    "within": { "$lt": [ "$distance", "$profile.deliveryDistance" ] }
  }},
  { "$match": { "within": true } }
]);
But that is the basic case, where you give the server the tools to work out the distance comparison and then return any of those documents.
The wrapping of aggregate output really depends on which way it is important for you to use the data in your application. This is just one sample of putting the output into a client addressable collection.
Of course you can also dig into the driver internals to call .aggregate() as shown here, but it's likely not as flexible as using the mentioned meteor package.
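One detail worth checking is units: with a GeoJSON "near" point, $geoNear reports "distance" in meters, while the question's data suggests "deliveryDistance" is in kilometers (20 in the document versus 20000 in the fixed query). Assuming that is the case, a sketch of the pipeline using the $geoNear "distanceMultiplier" option to convert before comparing might look like this. The function name deliveryPipeline and its metersPerUnit parameter are hypothetical:

```javascript
// Hypothetical builder: scale the computed distance into the same unit
// as "profile.deliveryDistance" before comparing the two.
function deliveryPipeline(referencePoint, metersPerUnit) {
  return [
    { "$geoNear": {
      "near": referencePoint,
      "distanceField": "distance",
      // Assumption: deliveryDistance is in km, so pass metersPerUnit = 1000
      "distanceMultiplier": 1 / metersPerUnit,
      "spherical": true
    }},
    { "$redact": {
      "$cond": {
        "if": { "$lt": [ "$distance", "$profile.deliveryDistance" ] },
        "then": "$$KEEP",
        "else": "$$PRUNE"
      }
    }}
  ];
}

var referencePoint = {
  "type": "Point",
  "coordinates": [ -2.120361, 52.536273 ]
};

var pipeline = deliveryPipeline(referencePoint, 1000);
console.log(JSON.stringify(pipeline, undefined, 2));
```

If "deliveryDistance" is actually stored in meters, passing 1 for metersPerUnit leaves the distance unchanged and the pipeline behaves as in the listings above.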

Get documents within a distance using Geospatial Query without Point object

I have a places collection that stores location plainly as
place = {
name : "",
latitude: "",
longitude:""
}
Is there any way using mongo shell or spring data mongo where I can query places like this :
select all places with coordinates(places.longitude, place.latitude) near a point(x,y) and within a distance z . Something like:
db.places.find( {
{
"type" : "Point",
"coordinates" : [
places.longitude,
places.latitude
]
}:
{ $geoWithin:
{ $centerSphere: [ [ x, y ] ,z / 3963.2 ]
}
}
})
Or will I have to modify my collection to
place = {
name : "",
"loc" : {
"type" : "Point",
"coordinates" : [
longitude,
latitude
]
}
}
You really should change your data. MongoDB supports only legacy coordinate pairs or GeoJSON for geospatial indexes and queries. You cannot use different fields for the data or "transform" in any way, as the supported field format is required by the "index" that is necessary for operations using $near or $nearSphere.
Best to do the transformation in the shell, since writing other API code for a "one off" operation is unnecessary. And yes, moving forward you really should be using the GeoJSON format:
var bulk = db.places.initializeUnorderedBulkOp(),
count = 0;
db.places.find().forEach(function(doc) {
bulk.find({ "_id": doc._id }).updateOne({
"$set": {
"location": {
"type": "Point",
"coordinates": [parseFloat(doc.longitude),parseFloat(doc.latitude)]
}
},
"$unset": { "latitude": "", "longitude": "" }
});
count++;
if ( count % 1000 == 0 ) {
bulk.execute();
bulk = db.places.initializeUnorderedBulkOp();
}
});
if ( count % 1000 !=0 )
bulk.execute();
Now that the data is fixed and compatible with an index, create the index. What makes sense here with GeoJSON data is a "2dsphere" index:
db.places.createIndex({ "location": "2dsphere" })
After that then you can query on the document as normal:
db.places.find({
  "location": {
    "$geoWithin": {
      // the radius z must be expressed in radians here
      "$centerSphere": [ [ x, y ], z ]
    }
  }
})
I should also note that a $centerSphere operation in a $geoWithin actually works out to be the same operation as $nearSphere with the $maxDistance modifier. The exception is that the latter should both process "faster" and produce "ordered" results for the "nearest" locations, which is something $geoWithin does not do:
db.places.find({
  "location": {
    "$nearSphere": {
      "$geometry": {
        "type": "Point",
        "coordinates": [x,y]
      },
      // with a GeoJSON point, z is in meters here
      "$maxDistance": z
    }
  }
})
The only way you can do this on your existing data is with $geoWithin. This is because that operation does not require a geospatial index, so you are allowed to "transform" the document first.
You can do this using the .aggregate() method and its $project pipeline stage along with the $map operator:
db.places.aggregate([
  { "$project": {
    "name": 1,
    "location": {
      "type": "Point",
      // Build the [longitude, latitude] array from the two separate fields
      "coordinates": {
        "$map": {
          "input": ["A","B"],
          "as": "el",
          "in": {
            "$cond": [
              { "$eq": [ "$$el", "A" ] },
              "$longitude",
              "$latitude"
            ]
          }
        }
      }
    }
  }},
  { "$match": {
    "location": {
      "$geoWithin": {
        // the radius z must be expressed in radians here
        "$centerSphere": [ [ x, y ], z ]
      }
    }
  }}
])
However your longitude and latitude data must be numeric already as this is something you cannot transform in the aggregation framework. And you must remember that this cannot be used for operations such as $nearSphere as the required index is not available after the initial pipeline stage.
So it can be done, but it is not advisable. It's going to add processing time, and things are going to be better, more flexible and "faster" if you fix the data and add the appropriate index instead.
Also note that distances with GeoJSON data in $nearSphere and $maxDistance are in meters rather than radians, whereas the radius given to $centerSphere is always in radians.
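Because $centerSphere takes its radius in radians, a distance in kilometers or miles has to be divided by the Earth's radius in the same unit, much like the "z / 3963.2" in the question. A minimal sketch, assuming the "location" field name from the migration above and a hypothetical helper named withinFilter:

```javascript
// Approximate equatorial radius of the Earth in kilometers
// (the 3963.2 in the question is the same radius in miles)
var EARTH_RADIUS_KM = 6378.1;

// Hypothetical helper: build a $geoWithin / $centerSphere filter
// from a center point and a distance given in kilometers.
function withinFilter(lng, lat, distanceKm) {
  return {
    "location": {
      "$geoWithin": {
        "$centerSphere": [ [ lng, lat ], distanceKm / EARTH_RADIUS_KM ]
      }
    }
  };
}

// Example: everything within 10 km of a made-up point
var filter = withinFilter(-73.9580, 40.8003, 10);
console.log(JSON.stringify(filter, undefined, 2));
```

The resulting object can be passed to db.places.find(), or used in the $match stage of the aggregation shown above.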