What is the real purpose of $ref (DBRef) in MongoDB?

I want to use Mongo for my app, and while thinking about design issues I came up with a question: what are the advantages/purposes of DBRef?
for example:
> names = ['apple', 'banana', 'orange', 'peach', 'pineapple']
[ "apple", "banana", "orange", "peach", "pineapple" ]
> for (i=0; i<5; i++) {
... db.fruits.insert({_id:i, name:names[i]})
... }
> db.fruits.find()
{ "_id" : 0, "name" : "apple" }
{ "_id" : 1, "name" : "banana" }
{ "_id" : 2, "name" : "orange" }
{ "_id" : 3, "name" : "peach" }
{ "_id" : 4, "name" : "pineapple" }
and I want to store those fruits in a basket collection:
> db.basket.insert({_id:1, items:[ {$ref:'fruits', $id:1}, {$ref:'fruits', $id:3} ] })
> db.basket.insert({_id:2, items:[{fruit_id: 1}, {fruit_id: 3}]})
> db.basket.find()
{ "_id" : 1, "items" : [ DBRef("fruits", 1), DBRef("fruits", 3) ] }
{ "_id" : 2, "items" : [ { "fruit_id" : 1 }, { "fruit_id" : 3 } ] }
What is the real difference between those two techniques? To me it looks like DBRef just makes you insert more data without any advantage. Please correct me if I'm wrong.

Basically, a DBRef is a self-describing ObjectID with a client-side helper, available in (I think) all drivers, that lets your application fetch related records easily.
They are not:
JOINs
Cascadable relations
Server-side relations
Resolved Server-side
They also are not used within Map Reduce; that functionality was taken out due to complications with sharding.
They are not always great to use, though. For one, they take quite a bit more space than just storing the ObjectID, which is wasteful if you already know which collection the related row lives in. On top of that, because of how they are resolved, each related record has to be lazy-loaded one by one instead of (easily) forming a range query that fetches all related rows in one go, so they can also increase the number of queries you make to the database and, in turn, the number of cursors.
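To make that concrete, here is a rough shell sketch using the fruits/basket collections from the question, contrasting one-by-one DBRef resolution with a single batched lookup over plain ids (the variable names are just for illustration):
// DBRef style: one findOne per referenced item
var basketWithRefs = db.basket.findOne({ "_id": 1 });
basketWithRefs.items.forEach(function(ref) {
    printjson(db[ref.$ref].findOne({ "_id": ref.$id }));   // 2 items => 2 queries
});

// Plain-id style: collect the ids and fetch them in one query with $in
var basketWithIds = db.basket.findOne({ "_id": 2 });
var fruitIds = basketWithIds.items.map(function(item) { return item.fruit_id; });
db.fruits.find({ "_id": { "$in": fruitIds } }).forEach(printjson);   // 1 query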

From "MongoDB: The Definitive Guide" DBRefs aren't necessary and storing a MongoID is more lightweight, but DBRefs offer some interesting functionality like the following:
Loading each DBRef in a document:
var note = db.notes.findOne({"_id": 20});
note.references.forEach(function(ref) {
    printjson(db[ref.$ref].findOne({"_id": ref.$id}));
});
They're also helpful if the references are stored across different collections and databases, as the DBRef contains that info. If you use a MongoID you'd have to remember which DB and collection the MongoID refers to.
In your example, a basket document's items array might contain references to the fruits collection, but also to the vegetables collection. A DBRef would actually be handy in this case.
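As a hypothetical sketch building on the question's data (the vegetables collection is invented for illustration), the same resolution loop works regardless of which collection each item points to, because every DBRef carries its collection name:
db.vegetables.insert({ "_id": 1, "name": "carrot" });
db.basket.insert({
    "_id": 3,
    "items": [ { "$ref": "fruits", "$id": 1 }, { "$ref": "vegetables", "$id": 1 } ]
});

db.basket.findOne({ "_id": 3 }).items.forEach(function(ref) {
    printjson(db[ref.$ref].findOne({ "_id": ref.$id }));   // banana, then carrot
});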

Related

dynamic size of subdocument mongodb

I'm using MongoDB and Mongoose for my web application. The web app is used for registration for swimming competitions, and each competition can have X number of races. My data structure as of now:
{
    "_id": "1",
    "name": "Utmanaren",
    "location": "town",
    "startdate": "20150627",
    "enddate": "20150627",
    "race": {
        "gender": "m",
        "style": "freestyle",
        "length": "100"
    }
}
Doing this I need to determine and define the number of races for every competition up front. A solution I tried is having a separate document with an id for the competition a race belongs to, like below.
{
    "belongsTOId": "1",
    "gender": "m",
    "style": "freestyle",
    "length": "100"
}
{
    "belongsTOId": "1",
    "gender": "f",
    "style": "butterfly",
    "length": "50"
}
Is there a way of creating and defining a dynamic number of races as subdocuments while using MongoDB?
Thanks!
You basically have two approaches to modelling your data structure: you can design a schema that either references or embeds the race documents.
Let's consider the following example that maps the swimming competition and multiple races relationship. It demonstrates the advantage of embedding over referencing when you need to view many data entities in the context of another. In this one-to-many relationship between competition and race data, one competition has multiple race entities:
// db.competition schema
{
    "_id": 1,
    "name": "Utmanaren",
    "location": "town",
    "startdate": "20150627",
    "enddate": "20150627",
    "races": [
        {
            "gender": "m",
            "style": "freestyle",
            "length": "100"
        },
        {
            "gender": "f",
            "style": "butterfly",
            "length": "50"
        }
    ]
}
With the embedded data model, your application can retrieve the complete swimming competition information with just one query. This design has other merits as well, one of them being data locality: since MongoDB stores data contiguously on disk, putting all the data you need in one document means the disk spends less time seeking to a particular location. Another advantage of embedded documents is atomicity and isolation when writing data. To illustrate, say you want to remove a competition which has a race "style" property with the value "butterfly"; this can be done with one single (atomic) operation:
db.competition.remove({"races.style": "butterfly"});
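And since everything is in one document, retrieving a competition together with all of its races is likewise a single read, for example:
db.competition.findOne({ "_id": 1 });   // returns the competition including its races array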
For more details on data modelling in MongoDB, please read the docs Data Modeling Introduction, specifically Model One-to-Many Relationships with Embedded Documents
The other design option is referencing: documents follow a normalized schema where the race documents contain a reference to the competition document:
// db.race schema
{
    "_id": 1,
    "competition_id": 1,
    "gender": "m",
    "style": "freestyle",
    "length": "100"
},
{
    "_id": 2,
    "competition_id": 1,
    "gender": "f",
    "style": "butterfly",
    "length": "50"
}
The above approach gives increased flexibility in performing queries. For instance, retrieving all child race documents whose parent competition has id 1 is straightforward; simply query the race collection:
db.race.find({"competition_id": 1});
The normalized, document-referencing schema above also has an advantage when you have one-to-many relationships with very unpredictable arity. If you have hundreds or thousands of race documents per competition, embedding has significant drawbacks as far as space constraints are concerned, because the larger the document, the more RAM it uses, and MongoDB documents have a hard size limit of 16MB.
On the other hand, if your application frequently retrieves the race data together with the competition information, it needs to issue multiple queries to resolve the references.
The general rule of thumb is that if your application's query pattern is well known and data tends to be accessed in only one way, an embedded approach works well. If your application queries data in many ways, or you are unable to anticipate the query patterns, a more normalized document-referencing model is appropriate.
Ref:
MongoDB Applied Design Patterns: Practical Use Cases with the Leading NoSQL Database By Rick Copeland
You basically want to update the data, so you should upsert it, which is essentially an update on the subdocument key.
Keep an array of keys in the main document.
Insert the sub-document and add its key to the list, or update the list.
To push a single item into the field:
db.yourcollection.update( { "_id": "1" } /* the competition to update */, { $push: { "races": { "belongsTOId" : "1" , "gender" : "f" , "style" : "butterfly" , "length" : "50"} } } );
To push multiple items into the field (this allows duplicates in the field):
db.yourcollection.update( { "_id": "1" }, { $push: { "races": { $each: [ { "belongsTOId" : "1" , "gender" : "f" , "style" : "butterfly" , "length" : "50"}, { "belongsTOId" : "2" , "gender" : "m" , "style" : "horse" , "length" : "70"} ] } } } );
To push multiple items without duplicates:
db.yourcollection.update( { "_id": "1" }, { $addToSet: { "races": { $each: [ { "belongsTOId" : "1" , "gender" : "f" , "style" : "butterfly" , "length" : "50"}, { "belongsTOId" : "2" , "gender" : "m" , "style" : "horse" , "length" : "70"} ] } } } );
$pushAll has been deprecated since version 2.4, so we use $each with $push instead of $pushAll.
With $push you can also sort and slice the array (using the $sort and $slice modifiers); check the MongoDB manual for details.
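As a rough illustration of that last point (field names are taken from the question, the _id selector is assumed, and a positive $slice needs MongoDB 2.6+), $push can combine $each with $sort and $slice to keep the embedded array ordered and bounded:
db.yourcollection.update(
    { "_id": "1" },                       // the competition to update (assumed selector)
    { $push: { "races": {
        $each:  [ { "gender": "m", "style": "freestyle", "length": "100" } ],
        $sort:  { "length": 1 },          // keep the races sorted by length
        $slice: 50                        // cap the array at 50 entries
    } } }
);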

query to retrieve multiple objects in an array in mongodb

Suppose I have an array of objects as below.
"array" : [
{
"id" : 1
},
{
"id" : 2
},
{
"id" : 2
},
{
"id" : 4
}
]
If I want to retrieve multiple objects ({id : 2}) from this array, the aggregation query goes like this.
db.coll.aggregate([
    { $match : { "_id" : ObjectId("5492690f72ae469b0e37b61c") } },
    { $unwind : "$array" },
    { $match : { "array.id" : 2 } },
    { $group : { _id : "$_id", array : { $push : { id : "$array.id" } } } }
])
The output of the above aggregation is:
{
    "_id" : ObjectId("5492690f72ae469b0e37b61c"),
    "array" : [
        { "id" : 2 },
        { "id" : 2 }
    ]
}
Now the questions are:
1) Is retrieving multiple objects from an array possible using find() in MongoDB?
2) With respect to performance, is aggregation the correct way to do this (given that it needs four pipeline stages)?
3) Can we instead do the manipulation in Java (loop over the array and keep only the {id : 2} objects) after a
find({"_id" : ObjectId("5492690f72ae469b0e37b61c")}) query? find retrieves the document once and keeps it in RAM, whereas with aggregation four operations need to be performed in RAM to produce the output.
The reason I ask question 3) is: if thousands of clients access the database at the same time, RAM could be overloaded. If the filtering is done in Java, there is less work for the database to do in RAM.
4) For how long will the working set stay in RAM?
Is my understanding correct? Please correct me if I am wrong, and help me get the right insight on this.
1) No. You project the first matching one with $ (see the sketch after this list), you project all of them, or you project none of them.
2) No-ish. If you have to work with this array, aggregation is what will allow you to extract multiple matching elements, but the correct solution, conceptually and for performance, is to design your document structure so this problem does not arise, or arises only for rare queries whose performance is not particularly important.
3) Yes.
4) We have no information that would allow us to give a reasonable answer to this question. This is also out of scope relative to the rest of the question and should be a separate question.
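For reference, a minimal sketch of that positional-projection variant, reusing the collection and _id from the question; find() can only surface the first array element that matches, never every match:
db.coll.find(
    { "_id": ObjectId("5492690f72ae469b0e37b61c"), "array.id": 2 },
    { "array.$": 1 }
);
// => only the first matching element, e.g. { "array": [ { "id": 2 } ] }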

MongoDB / Morphia - Projection not working on recursive objects?

I have a test object which works as a node in a tree, containing 0 or more child instances of the same type. I'm persisting it in MongoDB and querying it with Morphia.
I perform the following query:
db.TestObject.find( {}, { _id: 1, childrenTestObjects: 1 } ).limit(6).sort( {_id: 1 } ).pretty();
Which results in:
{ "_id" : NumberLong(1) }
{ "_id" : NumberLong(2) }
{ "_id" : NumberLong(3) }
{ "_id" : NumberLong(4) }
{
    "_id" : NumberLong(5),
    "childrenTestObjects" : [
        {
            "stringValue" : "6eb887126d24e8f1cd8ad5033482c781",
            "creationDate" : ISODate("1997-05-24T00:00:00Z"),
            "childrenTestObjects" : [
                {
                    "stringValue" : "2ab8f86410b4f3bdcc747699295eb5a4",
                    "creationDate" : ISODate("2024-10-10T00:00:00Z"),
                    "_id" : NumberLong(7)
                }
            ],
            "_id" : NumberLong(6)
        }
    ]
}
That's awesome, but also a little surprising. I'm having two issues with the results:
1) When I do a projection, it only applies to the top-level elements. The child elements still return properties not in the projection (stringValue and creationDate). I'd like the field selection to apply to all documents and subdocuments of the same type. This tree has an undetermined number of sub-items, so I can't specify them in the query explicitly. How can I accomplish that?
2) To my surprise, limit applied to subdocuments! You can see that there was one embedded document with id 6. I was expecting to see 6 top-level documents with N subdocuments each, but instead got just 5. How do I tell MongoDB to return 6 top-level elements, regardless of what is embedded in them? Without that, a consistent pagination system is impossible.
All your help has made learning MongoDB way faster and I really appreciate it! Thanks!
As for 1), projections retain fields in the results. In this case that field is childrenTestObjects, which happens to be a document, so Mongo returns that entire field, which is of course the entire subdocument. Projections are not recursive, so you'd have to specify each field explicitly.
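For example (a sketch against the structure shown above), limiting the children to just their _id means naming each path yourself, one entry per level of nesting:
db.TestObject.find(
    {},
    {
        "_id": 1,
        "childrenTestObjects._id": 1,
        "childrenTestObjects.childrenTestObjects._id": 1   // repeat for each deeper level
    }
).limit(6).sort({ "_id": 1 }).pretty();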
As for 2), that doesn't sound right. It would help to see the query results without the projection added (full documents in each returned document) and we can take it from there.

MongoDB: How can I order by distance considering multiple fields?

I have a collection that stores information about doctors. Each doctor can work in private practices and/or in hospitals.
The collection has the following relevant fields (there are geospatial indexes on both privatePractices.address.loc and hospitals.address.loc):
{
    "name" : "myName",
    "privatePractices" : [{
        "_id": 1,
        "address" : {
            "loc" : {
                "lng" : 2.1608502864837646,
                "lat" : 41.3943977355957
            }
        }
    },
    ...
    ],
    "hospitals" : [{
        "_id": 5,
        "address" : {
            "loc" : {
                "lng" : 2.8192520141601562,
                "lat" : 41.97784423828125
            }
        }
    },
    ...
    ]
}
I am trying to query that collection to get a list of doctors ordered by distance from a given point. This is where I am stuck:
The following queries return a list of doctors ordered by distance to the point defined in $nearSphere, considering only one of the two location types:
{ "hospitals.address.loc" : { "$nearSphere" : [2.1933, 41.4008] } }
{ "privatePractices.address.loc" : { "$nearSphere" : [2.1933, 41.4008] } }
What I want is to get the doctors ordered by the nearest hospital OR private practice, whichever is nearer. Is it possible to do this in a single Mongo query?
Plan B is to use the queries above and then manually order the results outside Mongo (e.g. using LINQ). To do this, my two queries would need to return the distance of each hospital or private practice from the $nearSphere point. Is it possible to do that in Mongo?
EDIT - APPLIED SOLUTION (MongoDB 2.6):
I took my own approach inspired by what Neil Lunn suggests in his answer: I added a field in the Doctor document for sorting purposes, containing an array with all the locations of the doctor.
I tried this approach in MongoDB 2.4 and MongoDB 2.6, and the results are different.
Queries on 2.4 returned duplicate doctors when a doctor had more than one location, even if the _id was included in the query filter. Queries on 2.6 returned valid results.
I would have hoped for a little more information here, but the basics still apply. The general problem you have stumbled on is trying to have "two" location fields on what appear to be your doctor documents.
There is another problem with the approach: you have the "locations" within arrays in your document. This will not give you an error when creating the index, but it also is not going to work as you expect. The big problem is that, being within an array, you might find the document that "contains" the nearest location, but then the question is "which one", as nothing is done to filter the array content.
The core problem, though, is that you cannot use more than one geospatial index per query. To really get what you want, turn the problem on its head and essentially attach the doctors to the locations, which is the other way around.
For example here, a "practices" collection or such:
{
    "type": "Hospital",
    "address" : {
        "loc" : {
            "lng" : 2.8192520141601562,
            "lat" : 41.97784423828125
        }
    },
    "doctors": [
        { "_id": 1, "name": "doc1", "specialty": "bones" },
        { "_id": 2, "name": "doc2", "specialty": "heart" }
    ]
}
{
    "type": "Private",
    "address" : {
        "loc" : {
            "lng" : 2.1608502864837646,
            "lat" : 41.3943977355957
        }
    },
    "doctors": [
        { "_id": 1, "name": "doc1", "specialty": "bones" },
        { "_id": 3, "name": "doc3", "specialty": "brain" }
    ]
}
The advantage here is that, as a single collection with all locations in the same index, you can simply get both "types" correctly ordered by distance, or within bounds, or whatever your geo queries need. This avoids the problems of the other modelling form.
As for the "doctors" information, of course you actually keep a separate collection for the full doctor information themselves, and possibly even keep an array of the _id values for the location documents there as well. But the main point here is that you can generally live with "embedding" some of the useful search information in a collection here that will help you.
That seems to be the better option here. Matching a doctor to criteria from inside the location document is something that can be done, whereas finding or sorting the nearest entry inside an array is not supported by MongoDB itself and would leave you applying the math yourself while processing the results.
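As a rough sketch of how that inverted model would be queried (collection, field, and specialty names come from the example above; the 2d index type and radian distances are assumptions to check against your setup), a single geospatial index covers both location types, and the $geoNear aggregation stage can also return the distance that plan B in the question asks for:
// one geospatial index over both hospitals and private practices
db.practices.ensureIndex({ "address.loc": "2d" });

// nearest practices of either type, optionally filtered by doctor criteria
db.practices.find({
    "address.loc": { "$nearSphere": [2.1933, 41.4008] },
    "doctors.specialty": "bones"
});

// same ordering, but with the computed distance included (in radians here)
db.practices.aggregate([
    { "$geoNear": {
        "near": [2.1933, 41.4008],
        "distanceField": "distance",
        "spherical": true,
        "query": { "doctors.specialty": "bones" }
    } }
]);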

How can I work with translated strings in my schema in mongodb?

I'm designing a schema for MongoDB and have a question. Here's an example of a document that I need to save:
Product {
    "_id" : ObjectID("..."),
    "name" : "MyProduct",
    "category" : { Catid: ObjectID(".."), name: "Electronic" }
}
This "category" refers to another collection that has all the categories...I save the 'name' inside the product because I need the name of the category when I find a Product..
But this category's name needs to be translated..
How I design this??
I'd suggest storing the category (or categories) as an identifier within the product and not denormalizing. Since you'll typically have an application/middle-tier/web server doing queries against MongoDB, it's reasonable to apply a simple in-memory caching layer for categories and their translations (you wouldn't even need to cache them very long if that mattered).
Product {
    "_id" : ObjectID("..."),
    "name" : "MyProduct",
    "category" : ObjectID("..")
}
Category {
    "_id" : ObjectID("..."),
    "en-us" : "cheese",
    "de-de" : "Käse",
    "es-mx" : "queso"
}
Or, category could be stored with more structure to handle regional variances:
Category {
    "_id" : ObjectID("..."),
    "en" : { default: "cheese" },
    "de-de" : { default: "käse", "at": "käse2" },
    "es" : { default: "queso" }
}
If you do a query like:
db.products.find({ price : { $gt: 50.00 }})
which returns a list of matches, you can gather all of the categories from the matching product documents and use $in to quickly fetch any non-cached category values for the current locale. This way you minimize the number of extra round trips to the database. If you have a large set of categories to match, you might consider doing them in batches.
db.categories.find( { _id : { $in : [array_of_ids] } });
Then, match them together.
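Here is a minimal shell sketch of that flow (the price filter and the "en-us" field come from the examples above; a real application would layer its cache on top of this):
var products = db.products.find({ "price": { "$gt": 50.00 } }).toArray();

// gather the category ids referenced by the matching products
var categoryIds = products.map(function(p) { return p.category; });

// one round trip for all of them via $in (in practice, only the non-cached ones)
var names = {};
db.categories.find({ "_id": { "$in": categoryIds } }).forEach(function(c) {
    names[c._id] = c["en-us"];
});

// match them back onto each product
products.forEach(function(p) {
    p.categoryName = names[p.category];
});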
MongoDB (and most other NoSQL databases) does not support relations.
This doesn't mean you cannot define relationships/references in NoSQL databases; it simply means there are no native query operators for resolving them.
There are two different ways to "refer" to one document from another in MongoDB :
Store the referred document's ID (usually an ObjectId) as a field in the referring document. This is the best approach if your app will know in which collection it has to look for the referred document. Example: {_id: ObjectId(...), category: ObjectId(...)}, where category is the reference.
Not technically a reference, but in a lot of cases it makes sense to embed (parts of) documents into other documents. Note that normalization of your schema should be less of a focus with NoSQL databases. Example: {_id: ObjectId(...), category: {_id: ObjectId(...), name: "xyz"}}.
There are two ways you can go. One, as you have indicated, is called "denormalization", where you save the category information (the name in each language) in the Product document itself. That way, when you load a Product, you already have the name in each language. You could model a Product like this:
{
    _id: ObjectId(""),
    name: "MyProduct",
    category: {
        Catid: ObjectId(""),
        names: {
            en: "MyCategory",
            es: "...",
            fr: "..."
        }
    }
}
The other option, if the category names change too often or you add languages regularly, is not to save the category names on the product, but rather to query the Category collection for the category when you need it.
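A small sketch of that second option (assuming the product keeps only the category id under category.Catid, and the Category collection stores per-locale names the same way as the embedded names object above):
var product = db.products.findOne({ "name": "MyProduct" });
var category = db.categories.findOne({ "_id": product.category.Catid });
print(category.names.en);   // pick whichever locale the current request needs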