MongoDB: How can I order by distance considering multiple fields? - mongodb

I have a collection that stores information about doctors. Each doctor can work in private practices and/or in hospitals.
The collection has the following relevant fields (there are geospatial indexes on both privatePractices.address.loc and hospitals.address.loc):
{
    "name" : "myName",
    "privatePractices" : [{
        "_id": 1,
        "address" : {
            "loc" : {
                "lng" : 2.1608502864837646,
                "lat" : 41.3943977355957
            }
        }
    },
    ...
    ],
    "hospitals" : [{
        "_id": 5,
        "address" : {
            "loc" : {
                "lng" : 2.8192520141601562,
                "lat" : 41.97784423828125
            }
        }
    },
    ...
    ]
}
I am trying to query that collection to get a list of doctors ordered by distance from a given point. This is where I am stuck:
The following queries return a list of doctors ordered by distance to the point defined in $nearSphere, considering only one of the two location types:
{ "hospitals.address.loc" : { "$nearSphere" : [2.1933, 41.4008] } }
{ "privatePractices.address.loc" : { "$nearSphere" : [2.1933, 41.4008] } }
What I want is to get the doctors ordered by the nearest hospital OR private practice, whichever is nearest. Is it possible to do this in a single Mongo query?
Plan B is to use the queries above and then manually order the results outside Mongo (e.g. using Linq). To do this, my two queries would need to return the distance of each hospital or private practice from the $nearSphere point. Is it possible to do that in Mongo?
EDIT - APPLIED SOLUTION (MongoDB 2.6):
I took my own approach, inspired by what Neil Lunn suggests in his answer: I added a field to the Doctor document for sorting purposes, containing an array with all of the doctor's locations.
I tried this approach in MongoDB 2.4 and MongoDB 2.6, and the results are different.
Queries on 2.4 returned duplicate doctors for any doctor that had more than one location, even if the _id was included in the query filter. Queries on 2.6 returned valid results.
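For reference, a minimal sketch of that applied solution (the locations field name and the doctors collection name are my own assumptions; the coordinates are copied from the sample above):

// Keep a flattened "locations" array on each doctor that duplicates every
// private practice and hospital coordinate, and index it for geo queries
db.doctors.ensureIndex({ "locations": "2d" })

db.doctors.update(
    { "name": "myName" },
    { $set: { "locations": [
        { "lng": 2.1608502864837646, "lat": 41.3943977355957 },   // private practice
        { "lng": 2.8192520141601562, "lat": 41.97784423828125 }   // hospital
    ] } }
)

// Doctors ordered by whichever of their locations is nearest to the point
db.doctors.find({ "locations": { $nearSphere: [2.1933, 41.4008] } })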

I would have hoped for a little more information here, but the basics still apply. The general problem you have stumbled on is trying to have "two" location fields on what appear to be your doctor documents.
There is another problem with the approach. You have the "locations" within arrays in your document. This does not give you an error when creating the index, but it is also not going to work like you expect. The big problem here is that, being within an array, you might find the document that "contains" the nearest location, but then the question is "which one", as nothing is done to affect the array content.
The core problem, though, is that you cannot use more than one geospatial index per query. But to really get what you want, turn the problem on its head and essentially attach the doctors to the locations, which is the other way around.
For example here, a "practices" collection or such:
{
    "type": "Hospital",
    "address" : {
        "loc" : {
            "lng" : 2.8192520141601562,
            "lat" : 41.97784423828125
        }
    },
    "doctors": [
        { "_id": 1, "name": "doc1", "specialty": "bones" },
        { "_id": 2, "name": "doc2", "specialty": "heart" }
    ]
}
{
    "type": "Private",
    "address" : {
        "loc" : {
            "lng" : 2.1608502864837646,
            "lat" : 41.3943977355957
        }
    },
    "doctors": [
        { "_id": 1, "name": "doc1", "specialty": "bones" },
        { "_id": 3, "name": "doc3", "specialty": "brain" }
    ]
}
The advantage here is that, as a single collection with everything in the same index, you can simply get both "types" correctly ordered by distance, or within bounds, or whatever your geo-queries need. This avoids the problems of the other modelling form.
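As a rough sketch (assuming this collection is called practices and has a 2d index on address.loc, matching the sample documents above):

db.practices.ensureIndex({ "address.loc": "2d" })

// Hospitals and private practices together, nearest first
db.practices.find({ "address.loc": { $nearSphere: [2.1933, 41.4008] } })

// Or restricted to a single type
db.practices.find({
    "type": "Hospital",
    "address.loc": { $nearSphere: [2.1933, 41.4008] }
})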
As for the "doctors" information, of course you actually keep a separate collection for the full doctor information themselves, and possibly even keep an array of the _id values for the location documents there as well. But the main point here is that you can generally live with "embedding" some of the useful search information in a collection here that will help you.
That seems to be the better option here. Matching a doctor to criteria from inside the location is something that can be done, whereas finding or sorting on the nearest entry inside an array is not supported by MongoDB itself, and would result in you applying the math yourself when processing the results.

Related

dynamic size of subdocument mongodb

I'm using mongodb and mongoose for my web application. The web app is used for registration for swimming competitions and each competition can have X number of races. My data structure as of now:
{
    "_id": "1",
    "name": "Utmanaren",
    "location": "town",
    "startdate": "20150627",
    "enddate": "20150627",
    "race" : {
        "gender" : "m",
        "style" : "freestyle",
        "length" : "100"
    }
}
Doing this I need to determine and define the number of races for every competition in advance. A solution I tried is having a separate document with an id indicating which competition a race belongs to, like below.
{
    "belongsTOId" : "1",
    "gender" : "m",
    "style" : "freestyle",
    "length" : "100"
}
{
    "belongsTOId" : "1",
    "gender" : "f",
    "style" : "butterfly",
    "length" : "50"
}
Is there a way of creating and defining a dynamic number of races as subdocuments in MongoDB?
Thanks!
You basically have two approaches to modelling your data structure; you can design a schema that either references or embeds the race documents.
Let's consider the following example that maps the swimming competition and multiple races relationship. It demonstrates the advantage of embedding over referencing when you need to view many data entities in the context of another. In this one-to-many relationship between competition and race data, a competition has multiple race entities:
// db.competition schema
{
    "_id": 1,
    "name": "Utmanaren",
    "location": "town",
    "startdate": "20150627",
    "enddate": "20150627",
    "races": [
        {
            "gender" : "m",
            "style" : "freestyle",
            "length" : "100"
        },
        {
            "gender" : "f",
            "style" : "butterfly",
            "length" : "50"
        }
    ]
}
With the embedded data model, your application can retrieve the complete swimming competition information with just one query. This design has other merits as well, one of them being data locality: since MongoDB stores data contiguously on disk, putting all the data you need in one document means spinning disks spend less time seeking to a particular location. The other advantage of embedded documents is atomicity and isolation when writing data. To illustrate, say you want to remove a competition which has a race "style" property with value "butterfly"; this can be done with a single operation (atomic per matching document):
db.competition.remove({"races.style": "butterfly"});
For more details on data modelling in MongoDB, please read the docs Data Modeling Introduction, specifically Model One-to-Many Relationships with Embedded Documents
The other design option is referencing documents, following a normalized schema where the race documents contain a reference to the competition document:
// db.race schema
{
"_id": 1,
"competition_id": 1,
"gender": "m",
"style": "freestyle",
"length": "100"
},
{
"_id": 2,
"competition_id": 1,
"gender": "f",
"style": "butterfly",
"length": "50"
}
The above approach gives you increased flexibility in performing queries. For instance, retrieving all child race documents whose parent competition has id 1 is straightforward: simply query the race collection:
db.race.find({"competiton_id": 1});
The above normalized schema using the document-reference approach also has an advantage when you have one-to-many relationships with very unpredictable arity. If you have hundreds or thousands of race documents per competition, embedding has serious drawbacks as far as space constraints are concerned, because the larger the document, the more RAM it uses, and MongoDB documents have a hard size limit of 16MB.
On the other hand, if your application frequently retrieves the race data together with the competition information, it needs to issue multiple queries to resolve the references.
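For example, a rough sketch of the two round trips this implies, using the collection and field names above:

// First fetch the competition, then resolve its races with a second query
var competition = db.competition.findOne({ "_id": 1 });
var races = db.race.find({ "competition_id": competition._id }).toArray();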
The general rule of thumb is that if your application's query pattern is well known and data tends to be accessed in only one way, an embedded approach works well. If your application queries data in many ways, or you are unable to anticipate the query patterns, a more normalized document-referencing model is more appropriate.
Ref:
MongoDB Applied Design Patterns: Practical Use Cases with the Leading NoSQL Database By Rick Copeland
You basically want to update the data, so you should upsert it, which is basically an update on the subdocument key.
Keep an array of keys in the main document.
Insert the sub-document and add the key to the list or update the list.
To push a single item into the field (note that update() takes a query selector as its first argument):
db.yourcollection.update( { "_id": 1 }, { $push: { "races": { "belongsTOId" : "1", "gender" : "f", "style" : "butterfly", "length" : "50" } } } );
To push multiple items into the field (this allows duplicates in the array):
db.yourcollection.update( { "_id": 1 }, { $push: { "races": { $each: [ { "belongsTOId" : "1", "gender" : "f", "style" : "butterfly", "length" : "50" }, { "belongsTOId" : "2", "gender" : "m", "style" : "horse", "length" : "70" } ] } } } );
To push multiple items without duplicating existing items:
db.yourcollection.update( { "_id": 1 }, { $addToSet: { "races": { $each: [ { "belongsTOId" : "1", "gender" : "f", "style" : "butterfly", "length" : "50" }, { "belongsTOId" : "2", "gender" : "m", "style" : "horse", "length" : "70" } ] } } } );
$pushAll has been deprecated since version 2.4, so use $each with $push instead of $pushAll.
While using $push you are also able to sort and slice items; check the MongoDB manual for details.
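For instance, a rough sketch of combining $each with $sort and $slice (the _id filter and the cap of 10 are just example values):

// Push two races, keep the "races" array sorted by length (string comparison,
// since length is stored as a string) and capped to the last 10 elements
db.yourcollection.update(
    { "_id": 1 },
    { $push: { "races": {
        $each: [
            { "gender": "f", "style": "butterfly", "length": "50" },
            { "gender": "m", "style": "freestyle", "length": "100" }
        ],
        $sort: { "length": 1 },
        $slice: -10
    } } }
);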

query to retrieve multiple objects in an array in mongodb

Suppose I have an array of objects as below.
"array" : [
{
"id" : 1
},
{
"id" : 2
},
{
"id" : 2
},
{
"id" : 4
}
]
If I want to retrieve multiple objects ({id : 2}) from this array, the aggregation query goes like this.
db.coll.aggregate([
    { $match : { "_id" : ObjectId("5492690f72ae469b0e37b61c") } },
    { $unwind : "$array" },
    { $match : { "array.id" : 2 } },
    { $group : { _id : "$_id", array : { $push : { id : "$array.id" } } } }
])
The output of above aggregation is
{
"_id" : ObjectId("5492690f72ae469b0e37b61c"),
"array" : [
{
"id" : 2
},
{
"id" : 2
}
]
}
Now the question is:
1) Is retrieving multiple objects from an array possible using find() in MongoDB?
2) With respect to performance, is aggregation the correct way to do this (given that we need four pipeline operators)?
3) Can we do the manipulation in Java instead (loop over the array and keep only the {id : 2} objects) after a
find({"_id" : ObjectId("5492690f72ae469b0e37b61c")}) query? find retrieves the document once and keeps it in RAM, whereas with aggregation four operations need to be performed in RAM to produce the output.
The reason I ask 3) is: suppose thousands of clients are accessing at the same time; then RAM would be overloaded, whereas doing it in Java puts less load on the server's RAM.
4) For how long will the working set stay in RAM?
Is my understanding correct?
Please correct me if I am wrong, and point me toward the right way to think about this.
No. You project the first matching one with $, you project all of them, or you project none of them.
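For example (a quick sketch against the sample document above), the positional $ projection returns only the first matching element, not both:

db.coll.find(
    { "_id": ObjectId("5492690f72ae469b0e37b61c"), "array.id": 2 },
    { "array.$": 1 }
)
// => { "_id": ObjectId("5492690f72ae469b0e37b61c"), "array": [ { "id": 2 } ] }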
No-ish. If you have to work with this array, aggregation is what will allow you to extract multiple matching elements, but the correct solution, conceptually and for performance, is to design your document structure so this problem does not arise, or arises only for rare queries whose performance is not particularly important.
Yes.
We have no information that would allow us to give a reasonable answer to this question. This is also out of scope relative to the rest of the question and should be a separate question.

MongoDB / Morphia - Projection not working on recursive objects?

I have a test object which acts as a node in a tree, containing 0 or more child instances of the same type. I'm persisting it in MongoDB and querying it with Morphia.
I perform the following query:
db.TestObject.find( {}, { _id: 1, childrenTestObjects: 1 } ).limit(6).sort( {_id: 1 } ).pretty();
Which results in:
{ "_id" : NumberLong(1) }
{ "_id" : NumberLong(2) }
{ "_id" : NumberLong(3) }
{ "_id" : NumberLong(4) }
{
"_id" : NumberLong(5),
"childrenTestObjects" : [
{
"stringValue" : "6eb887126d24e8f1cd8ad5033482c781",
"creationDate" : ISODate("1997-05-24T00:00:00Z")
"childrenTestObjects" : [
{
"stringValue" : "2ab8f86410b4f3bdcc747699295eb5a4",
"creationDate" : ISODate("2024-10-10T00:00:00Z"),
"_id" : NumberLong(7)
}
],
"_id" : NumberLong(6)
}
]
}
That's awesome, but also a little surprising. I'm having two issues with the results:
1) When I do a projection, it only applies to the top-level elements. The children elements still return other properties not in the projection (stringValue and creationDate). I'd like the field selection to apply to all documents and subdocuments of the same type. This tree has an undetermined number of sub-items, so I can't specify them in the query explicitly. How do I accomplish that?
2) To my surprise, limit applied to subdocuments! You can see that there was one embedded document with id 6. I was expecting to see 6 top-level documents, each with N subdocuments, but instead got just 5. How do I tell MongoDB to return 6 top-level elements, regardless of what is embedded in them? Without that, having a consistent pagination system is impossible.
All your help has made learning MongoDB way faster and I really appreciate it! Thanks!
As for 1), projections retain whole fields in the results. In this case that field is childrenTestObjects, which happens to be a document (or an array of them), so Mongo returns that entire field, which is of course the entire subdocument. Projections are not recursive, so you'd have to specify each field explicitly.
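For example, a rough sketch of what that explicit projection looks like for a couple of known levels (field names taken from the output above); beyond a fixed depth there is no way to express it in a single projection:

db.TestObject.find(
    {},
    {
        "_id": 1,
        "childrenTestObjects._id": 1,
        "childrenTestObjects.childrenTestObjects._id": 1
    }
).limit(6).sort({ "_id": 1 })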
As for 2), that doesn't sound right. It would help to see the query results without the projection added (full documents in each returned document) and we can take it from there.

what is the real purpose of $ref (DBRef) in MongoDb

I want to use Mongo for my app, and while I was thinking about design issues I came up with a question: what are the advantages/purposes of DBRef?
For example:
> names = ['apple', 'banana', 'orange', 'peach', 'pineapple']
[ "apple", "banana", "orange", "peach", "pineapple" ]
> for (i=0; i<5; i++) {
... db.fruits.insert({_id:i, name:names[i]})
... }
> db.fruits.find()
{ "_id" : 0, "name" : "apple" }
{ "_id" : 1, "name" : "banana" }
{ "_id" : 2, "name" : "orange" }
{ "_id" : 3, "name" : "peach" }
{ "_id" : 4, "name" : "pineapple" }
and I want to store those fruits in a basket collection:
> db.basket.insert({_id:1, items:[ {$ref:'fruits', $id:1}, {$ref:'fruits', $id:3} ] })
> db.basket.insert({_id:2, items:[{fruit_id: 1}, {fruit_id: 3}]})
> db.basket.find()
{ "_id" : 1, "items" : [ DBRef("fruits", 1), DBRef("fruits", 3) ] }
{ "_id" : 2, "items" : [ { "fruit_id" : 1 }, { "fruit_id" : 3 } ] }
What is the real difference between those two techniques? To me it looks like with DBRef you just have to insert more data without gaining any advantage... Please correct me if I'm wrong.
Basically a DBRef is a self-describing ObjectID for which a client-side helper, which exists in all drivers (I think all), provides the ability within your application to fetch the related document easily.
They are not:
JOINs
Cascadeable relations
Server-side relations
Resolved Server-side
They also are not used within Map Reduce, the functionality was taken out due to complications with sharding.
It is not always great to use these, though. For one, they take quite a bit of space compared to just storing the ObjectID, if you already know which collection the referenced document lives in. Not only that, but due to how they are resolved, each related record needs to be lazily loaded one by one instead of (easily) forming a range to query for related rows all in one go, so they can increase the number of queries you make to the database, and in turn the number of cursors.
From "MongoDB: The Definitive Guide" DBRefs aren't necessary and storing a MongoID is more lightweight, but DBRefs offer some interesting functionality like the following:
Loading each DBRef in a document:
var note = db.notes.findOne({"_id":20});
note.references.forEach(function(ref) {
printjson(db[ref.$ref].findOne({"_id": ref.$id}));
});
They're also helpful if the references are stored across different collections and databases as the DBRef contains that info. If you use a MongoID you'd have to remember which DB and collection the MongoID is in reference to.
In your example, a basket document's items array might contain references to the fruits collection, but also to a vegetables collection. A DBRef would actually be handy in that case.
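As a rough sketch of that mixed case (the vegetables collection here is hypothetical), the $ref part of each DBRef tells you which collection to resolve the item against:

db.basket.insert({ "_id": 3, "items": [ DBRef("fruits", 1), DBRef("vegetables", 7) ] })

// Resolve every item, whichever collection it points to
var basket = db.basket.findOne({ "_id": 3 });
basket.items.forEach(function(ref) {
    printjson(db[ref.$ref].findOne({ "_id": ref.$id }));
});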

Suitability of MongoDB for hierarchial type queries

I have a particular data manipulation requirement that I have worked out how to do in SQL Server and PostgreSQL. However, I'm not too happy with the speed, so I am investigating MongoDB.
The best way to describe the query is as follows. Picture the hierarchical data of the USA: Country, State, County, City. Let's say a particular vendor can service the whole of California. Another can perhaps service only Los Angeles. There are potentially hundreds of thousands of vendors and they all can service from some point(s) in this hierarchy down. I am not confusing this with Geo - I am using this to illustrate the need.
Using recursive queries, it is quite simple to get a list of all vendors who could service a particular user. If he were in say Pasadena, Los Angeles, California, we would walk up the hierarchy to get the applicable IDs, then query back down to find the vendors.
I know this can be optimized. Again, this is just a simple query example.
I know MongoDB is a document store. That suits other needs I have very well. The question is how well suited is it to the query type I describe? (I know it doesn't have joins - those are simulated).
I get that this is a "how long is a piece of string" question. I just want to know if anyone has any experience with MongoDB doing this sort of thing. It could take me quite some time to go from 0 to tested, and I'm looking to save time if MongoDB is not suited to this.
EXAMPLE
A local movie store "A" can supply Blu-Rays in Springfield. A chain store "B" with state-wide distribution can supply Blu-Rays to all of IL. And a download-on-demand store "C" can supply to all of the US.
If we wanted to get all applicable movie suppliers for Springfield, IL, the answer would be [A, B, C].
In other words, there are numerous vendors attached at differing levels on the hierarchy.
I realize this question was asked nearly a year ago, but since then MongoDB has an officially supported solution for this problem, and I just used their solution. Refer to their documentation here: https://docs.mongodb.com/manual/tutorial/model-tree-structures-with-materialized-paths/
The concept relating closest to your question is named "partial path."
While it may feel a bit heavy to embed ancestor data, this approach is the most suitable way to solve your problem in MongoDB. The only pitfall I've experienced so far is that, if you store all of this in a single document, you can hit the (as of this time) 16MB document size limit when working with enough data (although I can only see this happening if you're using this structure to track user referrals [which could reach millions] rather than US cities [which number upwards of 26,000 according to the latest US Census]).
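A rough sketch of how that can look for the vendor example (the collection and field names, and the trick of matching the user's ancestor paths with $in, are my own illustration rather than something from the linked tutorial):

// Each vendor stores the materialized path of the node it can service
db.vendors.insert({ "name": "A", "coveragePath": ",US,IL,Springfield," })
db.vendors.insert({ "name": "B", "coveragePath": ",US,IL," })
db.vendors.insert({ "name": "C", "coveragePath": ",US," })
db.vendors.ensureIndex({ "coveragePath": 1 })

// For a user in Springfield, IL, build the list of ancestor paths of that
// location and match vendors attached at or above it
db.vendors.find({
    "coveragePath": { $in: [ ",US,", ",US,IL,", ",US,IL,Springfield," ] }
})
// => A, B and C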
References:
http://www.mongodb.org/display/DOCS/Schema+Design
http://www.census.gov/geo/www/gazetteer/places2k.html
Modifications:
Replaced link: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB
Note that this question was also asked on the Google group. See http://groups.google.com/group/mongodb-user/browse_thread/thread/5cd5edd549813148 for that discussion.
One option is to use an array key. You can store the hierarchy as an array of values (for example ['US','CA','Los Angeles']). Then you can query against records based on individual elements in that array key.
For example:
First, store some documents with the array value representing the hierarchy:
> db.hierarchical.save({ location: ['US','CA','LA'], name: 'foo'} )
> db.hierarchical.save({ location: ['US','CA','SF'], name: 'bar'} )
> db.hierarchical.save({ location: ['US','MA','BOS'], name: 'baz'} )
Make sure we have an index on the location field so we can perform fast queries against its values:
> db.hierarchical.ensureIndex({'location':1})
Find all records in California
> db.hierarchical.find({location: 'CA'})
{ "_id" : ObjectId("4d9f69cbf88aea89d1492c55"), "location" : [ "US", "CA", "LA" ], "name" : "foo" }
{ "_id" : ObjectId("4d9f69dcf88aea89d1492c56"), "location" : [ "US", "CA", "SF" ], "name" : "bar" }
Find all records in Massachusetts
> db.hierarchical.find({location: 'MA'})
{ "_id" : ObjectId("4d9f6a21f88aea89d1492c5a"), "location" : [ "US", "MA", "BOS" ], "name" : "baz" }
Find all records in the US
> db.hierarchical.find({location: 'US'})
{ "_id" : ObjectId("4d9f69cbf88aea89d1492c55"), "location" : [ "US", "CA", "LA" ], "name" : "foo" }
{ "_id" : ObjectId("4d9f69dcf88aea89d1492c56"), "location" : [ "US", "CA", "SF" ], "name" : "bar" }
{ "_id" : ObjectId("4d9f6a21f88aea89d1492c5a"), "location" : [ "US", "MA", "BOS" ], "name" : "baz" }
Note that in this model, your values in the array would need to be unique. So for example, if you had 'Springfield' in different states, then you would need to do some extra work to differentiate.
> db.hierarchical.save({location:['US','MA','Springfield'], name: 'one' })
> db.hierarchical.save({location:['US','IL','Springfield'], name: 'two' })
> db.hierarchical.find({location: 'Springfield'})
{ "_id" : ObjectId("4d9f6b7cf88aea89d1492c5b"), "location" : [ "US", "MA", "Springfield"], "name" : "one" }
{ "_id" : ObjectId("4d9f6b86f88aea89d1492c5c"), "location" : [ "US", "IL", "Springfield"], "name" : "two" }
You can overcome this by using the $all operator and specifying more levels of the hierarchy. For example:
> db.hierarchical.find({location: { $all : ['US','MA','Springfield']} })
{ "_id" : ObjectId("4d9f6b7cf88aea89d1492c5b"), "location" : [ "US", "MA", "Springfield"], "name" : "one" }
> db.hierarchical.find({location: { $all : ['US','IL','Springfield']} })
{ "_id" : ObjectId("4d9f6b86f88aea89d1492c5c"), "location" : [ "US", "IL", "Springfield"], "name" : "two" }