dynamic size of subdocument mongodb - mongodb

I'm using mongodb and mongoose for my web application. The web app is used for registration for swimming competitions and each competition can have X number of races. My data structure as of now:
{
"_id": "1",
"name": "Utmanaren",
"location": "town",
"startdate": "20150627",
"enddate": "20150627",
"race" : {
"gender" : "m",
"style" : "freestyle",
"length" : "100"
}
}
With this structure I have to decide on and define the number of races for every competition up front. One solution I tried is to store each race as a separate document with an ID indicating which competition it belongs to, like below.
{
"belongsTOId" : "1",
"gender" : "m",
"style" : "freestyle",
"length" : "100"
}
{
"belongsTOId" : "1",
"gender" : "f",
"style" : "butterfly",
"length" : "50"
}
Is there a way of creating and defining dynamic number of races as a subdocument while using Mongodb?
Thanks!

You basically have two approaches to modelling this data structure: you can design a schema that either embeds the race documents or references them.
Let's consider the following example, which maps the relationship between a swimming competition and its races. It demonstrates the advantage of embedding over referencing when you need to view many data entities in the context of another. In this one-to-many relationship, the competition has multiple race entities:
// db.competition schema
{
"_id": 1,
"name": "Utmanaren",
"location": "town",
"startdate": "20150627",
"enddate": "20150627",
"races": [
{
"gender" : "m",
"style" : "freestyle",
"length" : "100"
},
{
"gender" : "f",
"style" : "butterfly",
"length" : "50"
}
]
}
With the embedded data model, your application can retrieve the complete swimming competition information with just one query. This design has other merits as well, one of them being data locality. Since MongoDB stores data contiguously on disk, putting all the data you need in one document ensures that the spinning disks will take less time to seek to a particular location on the disk. The other advantage with embedded documents is the atomicity and isolation in writing data. To illustrate this, say you want to remove a competition which has a race "style" property with value "butterfly", this can be done with one single (atomic) operation:
db.competition.remove({"races.style": "butterfly"});
For more details on data modelling in MongoDB, please read the docs: Data Modeling Introduction, and specifically Model One-to-Many Relationships with Embedded Documents.
The other design option is referencing, which follows a normalized schema where the race documents contain a reference to the competition document:
// db.race schema
{
"_id": 1,
"competition_id": 1,
"gender": "m",
"style": "freestyle",
"length": "100"
},
{
"_id": 2,
"competition_id": 1,
"gender": "f",
"style": "butterfly",
"length": "50"
}
The above approach gives increased flexibility in performing queries. For instance, retrieving all child race documents for the parent competition with id 1 is straightforward; simply query the race collection:
db.race.find({"competition_id": 1});
The normalized schema using document references also has an advantage when you have one-to-many relationships with very unpredictable arity. If you have hundreds or thousands of race documents per competition, the embedding option runs into space constraints: the larger the document, the more RAM it uses, and MongoDB documents have a hard size limit of 16MB.
If your application frequently retrieves the race data with the competition information, then your application needs to issue multiple queries to resolve the references.
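Resolving the references on the application side can be sketched as below. This is a minimal illustration in plain JavaScript, assuming the results of the two queries (one against competition, one against race) are already in memory; the function name attachRaces and the sample arrays are hypothetical and not part of any MongoDB API.

```javascript
// Hypothetical in-memory stand-ins for the results of two queries:
// one against db.competition and one against db.race.
const competitions = [
  { _id: 1, name: "Utmanaren", location: "town" },
];
const races = [
  { _id: 1, competition_id: 1, gender: "m", style: "freestyle", length: "100" },
  { _id: 2, competition_id: 1, gender: "f", style: "butterfly", length: "50" },
];

// Group the race documents by the competition they reference,
// then attach each group to its competition.
function attachRaces(competitionDocs, raceDocs) {
  const byCompetition = new Map();
  for (const race of raceDocs) {
    const list = byCompetition.get(race.competition_id) || [];
    list.push(race);
    byCompetition.set(race.competition_id, list);
  }
  // Return copies so the original query results are left untouched.
  return competitionDocs.map((c) => ({
    ...c,
    races: byCompetition.get(c._id) || [],
  }));
}

const joined = attachRaces(competitions, races);
console.log(joined[0].name, "has", joined[0].races.length, "races");
```

The extra round trip and this merge step are the cost of the referenced model; the embedded model returns the same shape from a single query.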
The general rule of thumb is that if your application's query pattern is well known and data tends to be accessed in only one way, an embedded approach works well. If your application queries data in many ways, or you are unable to anticipate the query patterns, a more normalized document-referencing model is more appropriate.
Ref:
MongoDB Applied Design Patterns: Practical Use Cases with the Leading NoSQL Database By Rick Copeland

You basically want to update the data, so you should upsert the data which is basically an update on the subdocument key.
Keep an array of keys in the main document.
Insert the sub-document and add the key to the list or update the list.

To push a single item into the field (the first argument selects the competition document to update, here the one with _id 1):
db.yourcollection.update( { "_id": 1 }, { $push: { "races": { "belongsTOId" : "1" , "gender" : "f" , "style" : "butterfly" , "length" : "50"} } } );
To push multiple items into the field (this allows duplicates in the field):
db.yourcollection.update( { "_id": 1 }, { $push: { "races": { $each: [ { "belongsTOId" : "1" , "gender" : "f" , "style" : "butterfly" , "length" : "50"}, { "belongsTOId" : "2" , "gender" : "m" , "style" : "horse" , "length" : "70"} ] } } } );
To push multiple items without duplicates:
db.yourcollection.update( { "_id": 1 }, { $addToSet: { "races": { $each: [ { "belongsTOId" : "1" , "gender" : "f" , "style" : "butterfly" , "length" : "50"}, { "belongsTOId" : "2" , "gender" : "m" , "style" : "horse" , "length" : "70"} ] } } } );
$pushAll has been deprecated since version 2.4, so use $each with $push instead of $pushAll.
With $push you can also sort and slice the items; check the MongoDB manual for details.
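The difference between the two operators can be illustrated with a small in-memory emulation. This is only a sketch in plain JavaScript and not how the server implements the operators; the helper names pushEach and addToSetEach are made up for the illustration. It assumes that two subdocuments count as duplicates only when they have the same fields and values in the same order, which is approximated here by comparing their JSON text.

```javascript
// In-memory emulation only; real updates are executed by the MongoDB server.

// Emulates $push with $each: every item is appended, duplicates included.
function pushEach(array, items) {
  return array.concat(items);
}

// Emulates $addToSet with $each: an item is appended only if no identical
// element is already present. Comparing JSON text is an approximation of
// "same fields and values in the same order".
function addToSetEach(array, items) {
  const result = array.slice();
  for (const item of items) {
    const key = JSON.stringify(item);
    if (!result.some((existing) => JSON.stringify(existing) === key)) {
      result.push(item);
    }
  }
  return result;
}

const existing = [{ gender: "f", style: "butterfly", length: "50" }];
const incoming = [
  { gender: "f", style: "butterfly", length: "50" }, // already present
  { gender: "m", style: "freestyle", length: "100" },
];

console.log(pushEach(existing, incoming).length);
console.log(addToSetEach(existing, incoming).length);
```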

Related

MongoDb Schema Structure

I have a collection named 'Category' with this structure:
{
"CategoryID" : 1,
"ParentID" : 0,
"Name" : "Sample Cat"
}
And another collection which will be using this category
{
"DocumentID" : 1,
"CategoryID" : 1,
"DocumentName" : "Doc XPXSAX"
}
The problem with this design is that I cannot use it to build a live search that shows the document as
"Doc XPXSAX found in Sample Cat" (along with the category name, without using a join).
Also I cannot embed the documents inside the Category collection (as an array in one of the fields) as I am expecting the number of documents to go up to 50k.
What alternate schema design will enable me to incorporate an efficient search functionality without using hacks imitating joins ?
Thanks.
If you dislike an application-level join, why not embed the categories inside the document documents?
{
"DocumentID" : 1,
"category" : {
"ID" : 1,
"Name" : "Sample Cat",
"ParentID" : 0
},
"DocumentName" : "Doc XPXSAX"
}
Keep the category information you need for display in the document document. Information you need more rarely can live in the category document and be found with a second query or application-level join.
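As a rough sketch of that idea, the application could copy the display fields of the category into each document at write time and build the search label from the document alone. The helper names embedCategory and searchLabel are hypothetical, and the code works on plain objects rather than on a live collection.

```javascript
// Sample documents taken from the question.
const category = { CategoryID: 1, ParentID: 0, Name: "Sample Cat" };
const doc = { DocumentID: 1, CategoryID: 1, DocumentName: "Doc XPXSAX" };

// Replace the bare CategoryID with an embedded copy of the category fields
// that are needed for display.
function embedCategory(document, cat) {
  const { CategoryID, ...rest } = document;
  return {
    ...rest,
    category: { ID: cat.CategoryID, Name: cat.Name, ParentID: cat.ParentID },
  };
}

// Build the live-search label without touching the Category collection.
function searchLabel(embedded) {
  return `${embedded.DocumentName} found in ${embedded.category.Name}`;
}

const embeddedDoc = embedCategory(doc, category);
console.log(searchLabel(embeddedDoc)); // prints "Doc XPXSAX found in Sample Cat"
```

The trade-off is that renaming a category means updating every document that embeds it.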

Index strategy for queries with dynamic match criteria

I have a collection that will hold machine data as well as mobile data. The data is captured per channel and kept at a single level, with no embedded objects. The structure is as follows:
{
"Id": ObjectId("544e4b0ae4b039d388a2ae3a"),
"DeviceTypeId":"DeviceType1",
"DeviceTypeParentId":"Parent1",
"DeviceId":"D1",
"ChannelName": "Login",
"Timestamp": ISODate("2013-07-23T19:44:09Z"),
"Country": "India",
"Region": "Maharashtra",
"City": "Nasik",
"Latitude": 13.22,
"Longitude": 56.32,
//and more 10 - 15 fields
}
Most of the queries are aggregation queries, used for an analytics dashboard and real-time analysis. The $match stage is as follows:
{$match:{"DeviceTypeId":{"$in":["DeviceType1"]},"Timestamp":{"$gte":ISODate("2013-07-23T00:00:00Z"),"$lt":ISODate("2013-08-23T00:00:00Z")}}}
or
{$match:{"DeviceTypeParentId":{"$in":["Parent1"]},"Timestamp":{"$gte":ISODate("2013-07-23T00:00:00Z"),"$lt":ISODate("2013-08-23T00:00:00Z")}}}
Many of the find and findOne queries in my DAL layer also filter on DeviceType or DeviceTypeParentId.
The collection is huge and growing. I have used compound indexes to support these queries; the indexes are as follows:
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "DB.channel_data"
},
{
"v" : 1,
"key" : {
"DeviceType" : 1,
"Timestamp" : 1
},
"name" : "DeviceType_1_Timestamp_1",
"ns" : "DB.channel_data"
},
{
"v" : 1,
"key" : {
"DeviceTypeParentId" : 1,
"Timestamp" : 1
},
"name" : "DeviceTypeParentId_1_Timestamp_1",
"ns" : "DB.channel_data"
}
]
Now we are going to add support for match criteria on DeviceId. Following the same strategy as for DeviceType and DeviceTypeParentId does not seem good, as I feel that with my current approach I'm creating many indexes that are almost all the same and huge.
So is there a good way to do the indexing? I have read a bit about Index Intersection but I'm not sure how it would help.
If I have followed a wrong approach, please point it out, as this is my first project and the first time I am using MongoDB.
Those indexes all look appropriate for your queries, including the new one you're proposing. Three separate indexes supporting your three kinds of queries are the overall best option in terms of fast queries. You could put indexes on each field and let the planner use index intersection, but it won't be as good as the compound indexes. The indexes are not the same since they support different queries.
I think the real question is: is the (apparently) large memory footprint of the indexes actually a problem at this point? Do you have a lot of page faults because indexes and data are being paged in from disk?

MongoDB: How can I order by distance considering multiple fields?

I have a collection that stores information about doctors. Each doctor can work in private practices and/or in hospitals.
The collection has the following relevant fields (there are geospatial indexes on both privatePractices.address.loc and hospitals.address.loc):
{
"name" : "myName",
"privatePractices" : [{
"_id": 1,
"address" : {
"loc" : {
"lng" : 2.1608502864837646,
"lat" : 41.3943977355957
}
}
},
...
],
"hospitals" : [{
"_id": 5,
"address" : {
"loc" : {
"lng" : 2.8192520141601562,
"lat" : 41.97784423828125
}
}
},
...
]
}
I am trying to query that collection to get a list of doctors ordered by distance from a given point. This is where I am stuck:
The following queries return a list of doctors ordered by distance to the point defined in $nearSphere, considering only one of the two location types:
{ "hospitals.address.loc" : { "$nearSphere" : [2.1933, 41.4008] } }
{ "privatePractices.address.loc" : { "$nearSphere" : [2.1933, 41.4008] } }
What I want is to get the doctors ordered by the nearest hospital OR private practice, whatever is the nearest. Is it possible to do it on a single Mongo query?
Plan B is to use the queries above and then manually order the results outside Mongo (eg. using Linq). To do this, my two queries should return the distance of each hospital or private practice to the $nearSphere point. Is it possible to do that in Mongo?
EDIT - APPLIED SOLUTION (MongoDB 2.6):
I took my own approach inspired by what Neil Lunn suggests in his answer: I added a field in the Doctor document for sorting purposes, containing an array with all the locations of the doctor.
I tried this approach in MongoDB 2.4 and MongoDB 2.6, and the results are different.
Queries on 2.4 returned duplicates of doctors that had more than one location, even if the _id was included in the query filter. Queries on 2.6 returned valid results.
I would have been hoping for a little more information here, but the basics still apply. So the general problem you have stumbled on is trying to have "two" location fields on what appears to be your doctors documents.
There is another problem with the approach. You have the "locations" within arrays in your document. This would not give you an error when creating the index, but it is also not going to work as you expect. The big problem here is that, with the locations inside an array, you might find the document that "contains" the nearest location, but then the question is "which one", as nothing is done to affect the array content.
The core problem, though, is that you cannot use more than one geospatial index per query. To really get what you want, turn the problem on its head and attach the doctors to the locations, which is the other way around.
For example here, a "practices" collection or such:
{
"type": "Hospital",
"address" : {
"loc" : {
"lng" : 2.8192520141601562,
"lat" : 41.97784423828125
}
},
"doctors": [
{ "_id": 1, "name": "doc1", "specialty": "bones" },
{ "_id": 2, "name": "doc2", "specialty": "heart" }
]
}
{
"type": "Private",
"address" : {
"loc" : {
"lng" : 2.1608502864837646,
"lat" : 41.3943977355957
}
},
"doctors": [
{ "_id": 1, "name": "doc1", "specialty": "bones" },
{ "_id": 3, "name": "doc3", "specialty": "brain" }
]
}
The advantage here is that, with a single collection and a single index, you can simply get both "types", correctly ordered by distance, within bounds, or whatever your geo-queries need. This avoids the problems with the other modelling form.
As for the "doctors" information, of course you actually keep a separate collection for the full doctor information themselves, and possibly even keep an array of the _id values for the location documents there as well. But the main point here is that you can generally live with "embedding" some of the useful search information in a collection here that will help you.
That seems to be the better option here. Matching a doctor to criteria from inside the location is something that can be done, whereas finding or sorting the nearest entry inside an array is not supported by MongoDB itself and would result in you applying the math yourself when processing the results.
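For completeness, the "apply the math yourself" fallback (the question's Plan B) might look roughly like the sketch below. It assumes the doctor documents have already been fetched, uses the haversine great-circle formula with a mean Earth radius of 6371 km, and ranks each doctor by the closest of all their locations. The function names and the two sample doctors are invented for the illustration.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius, an approximation

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance between two { lng, lat } points (haversine formula).
function haversineKm(a, b) {
  const dLat = toRadians(b.lat - a.lat);
  const dLng = toRadians(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Smallest distance from the point to any practice or hospital of a doctor.
// A doctor without locations gets Infinity and therefore sorts last.
function nearestDistanceKm(doctor, point) {
  const places = [...(doctor.privatePractices || []), ...(doctor.hospitals || [])];
  return Math.min(...places.map((p) => haversineKm(point, p.address.loc)));
}

function sortByNearest(doctors, point) {
  return doctors
    .map((doctor) => ({ doctor, distanceKm: nearestDistanceKm(doctor, point) }))
    .sort((x, y) => x.distanceKm - y.distanceKm);
}

// Hypothetical sample data using the coordinates from the question.
const doctors = [
  { name: "hospitalOnly", hospitals: [{ _id: 5, address: { loc: { lng: 2.8192520141601562, lat: 41.97784423828125 } } }] },
  { name: "practiceOnly", privatePractices: [{ _id: 1, address: { loc: { lng: 2.1608502864837646, lat: 41.3943977355957 } } }] },
];
const ranked = sortByNearest(doctors, { lng: 2.1933, lat: 41.4008 });
console.log(ranked.map((r) => r.doctor.name));
```

This only scales to result sets small enough to sort in application memory, which is why the restructured "practices" collection is the preferable route.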

How can I work with translated strings in my schema in mongodb?

I'm designing a schema for MongoDB and have a question. Here's an example of a document that I need to save:
Product {
"_id" : ObjectID("..."),
"name" : "MyProduct",
"category" : {Catid:ObjectID(".."), name: "Eletronic"}
}
This "category" refers to another collection that has all the categories...I save the 'name' inside the product because I need the name of the category when I find a Product..
But this category's name needs to be translated..
How I design this??
I'd suggest storing the category (or categories) as an identifier within the product and not doing denormalization. As it would be typical that you'll have an application/middle-tier/web server doing queries against the MongoDB, it's reasonable to apply a simple caching layer for categories and their translations in memory (you wouldn't even need to cache them very long if that was important).
Product {
"_id" : ObjectID("..."),
"name" : "MyProduct",
"category" : ObjectID("..")
}
Category {
"_id" : ObjectID("..."),
"en-us" : "cheese",
"de-de" : "Käse",
"es-mx" : "queso"
}
Or, category could be stored with more structure to handle regional variances:
Category {
"_id" : ObjectID("..."),
"en" : { default: "cheese" },
"de-de" : { default: "käse", "at": "käse2" },
"es" : { default: "queso" }
}
If you do a query like:
db.products.find({ price : { $gt: 50.00 }})
which returns a list of matches, you can gather all of the categories from the matching product documents, and use $in to quickly fetch any non-cached category values for the current locale. So, you can minimize the number of extra round-trips to the database by doing the query using this technique. If you have a large set of categories to match, you might consider doing them in batches.
db.categories.find( { _id : { $in : [array_of_ids] } });
Then, match them together.
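A minimal sketch of the gather-and-match steps, assuming the product query results are already in memory and using a plain Map as the cache. The names categoryCache, missingCategoryIds, localizedName and attachCategoryNames are hypothetical. String ids stand in for ObjectId values here; real ObjectId values would need to be converted to strings before being used as Map keys.

```javascript
// Hypothetical in-memory cache: category id (as a string) -> category document.
const categoryCache = new Map([
  ["c2", { _id: "c2", "en-us": "bread", "de-de": "Brot" }],
]);

// Collect the distinct category ids that are not cached yet. The result is
// what would be passed to $in in the follow-up categories query.
function missingCategoryIds(products, cache) {
  const ids = new Set();
  for (const product of products) {
    if (!cache.has(product.category)) ids.add(product.category);
  }
  return [...ids];
}

// Pick the name for the requested locale, falling back to a default locale.
function localizedName(category, locale, fallback = "en-us") {
  return category[locale] ?? category[fallback];
}

// "Match them together": attach the localized category name to each product.
function attachCategoryNames(products, cache, locale) {
  return products.map((product) => ({
    ...product,
    categoryName: localizedName(cache.get(product.category), locale),
  }));
}

const products = [
  { _id: "p1", name: "MyProduct", category: "c1" },
  { _id: "p2", name: "Other", category: "c1" },
  { _id: "p3", name: "Third", category: "c2" },
];

const toFetch = missingCategoryIds(products, categoryCache);
console.log(toFetch);

// Stand-in for the documents returned by the $in query.
categoryCache.set("c1", { _id: "c1", "en-us": "cheese", "de-de": "Käse", "es-mx": "queso" });
const localized = attachCategoryNames(products, categoryCache, "de-de");
console.log(localized.map((p) => p.categoryName));
```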
MongoDB (and most other NoSQL databases) does not support relations natively.
This doesn't mean you cannot define relationships/references in NoSQL databases. It simply means there is no native query operator available for them.
There are two different ways to "refer" to one document from another in MongoDB :
Store the referred document's ID (usually an ObjectId) as a field in the referring document. This is the best approach if your app knows in which collection it has to look for the referred document. Example: {_id: ObjectId(...), category: ObjectId(...)} (category is the reference).
Not technically a reference, but in a lot of cases it makes sense to embed (parts of) documents into other documents. Note that normalization of your schema should be less of a focus with NoSQL databases. Example: {_id: ObjectId(...), category: {_id: ObjectId(...), name: "xyz"}}.
There are two ways you can go. One, like you have indicated, is called "Denormalization" where you save the category information (the name in each language) in the Product document itself. That way, when you load a Product, you already have the name in each language. You could model a Product like this:
{
_id: ObjectId(""),
name: "MyProduct",
category: {
Catid: ObjectId(""),
names: {
en: "MyCategory",
es: "...",
fr: "..."
}
}
}
The other option, if the category names change often or you add languages regularly, is to not save the category names on the product, but rather to query the Category collection for the category when you need it.

Suitability of MongoDB for hierarchical type queries

I have a particular data manipulation requirement that I have worked out how to do in SQL Server and PostgreSQL. However, I'm not too happy with the speed, so I am investigating MongoDB.
The best way to describe the query is as follows. Picture the hierarchical data of the USA: Country, State, County, City. Let's say a particular vendor can service the whole of California. Another can perhaps service only Los Angeles. There are potentially hundreds of thousands of vendors and they all can service from some point(s) in this hierarchy down. I am not confusing this with Geo - I am using this to illustrate the need.
Using recursive queries, it is quite simple to get a list of all vendors who could service a particular user. If he were in say Pasadena, Los Angeles, California, we would walk up the hierarchy to get the applicable IDs, then query back down to find the vendors.
I know this can be optimized. Again, this is just a simple query example.
I know MongoDB is a document store. That suits other needs I have very well. The question is how well suited is it to the query type I describe? (I know it doesn't have joins - those are simulated).
I get that this is a "how long is a piece of string" question. I just want to know if anyone has any experience with MongoDB doing this sort of thing. It could take me quite some time to go from 0 to tested, and I'm looking to save time if MongoDB is not suited to this.
EXAMPLE
A local movie store "A" can supply Blu-Rays in Springfield. A chain store "B" with state-wide distribution can supply Blu-Rays to all of IL. And a download-on-demand store "C" can supply to all of the US.
If we wanted to get all applicable movie suppliers for Springfield, IL, the answer would be [A, B, C].
In other words, there are numerous vendors attached at differing levels on the hierarchy.
I realize this question was asked nearly a year ago, but since then MongoDB has an officially supported solution for this problem, and I just used their solution. Refer to their documentation here: https://docs.mongodb.com/manual/tutorial/model-tree-structures-with-materialized-paths/
The concept relating closest to your question is named "partial path."
While it may feel a bit heavy to embed ancestor data, this approach is the most suitable way to solve your problem in MongoDB. The only pitfall I've experienced so far is that if you store all of this in a single document you can hit the document size limit (16MB as of this writing) when working with enough data. I can only see this happening if you use this structure to track user referrals, which could reach millions, rather than US cities, of which there are upwards of 26,000 according to the latest US Census.
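Applied to the movie-store example from the question, the lookup could be sketched as follows. This is plain JavaScript over an in-memory array; the path field name, the comma-delimited path format, and the helper names are assumptions made for the illustration. Each vendor stores the path of the node it services from, and a user's location is expanded into the paths of all its ancestors, so an exact match on any of them finds the vendor.

```javascript
// Hypothetical vendor documents: "path" is the materialized path of the
// hierarchy node each vendor services from (and everything below it).
const vendors = [
  { name: "A", path: ",US,IL,Springfield," },
  { name: "B", path: ",US,IL," },
  { name: "C", path: ",US," },
  { name: "D", path: ",US,MA,Springfield," },
];

// Expand ["US", "IL", "Springfield"] into the path of every ancestor level:
// [",US,", ",US,IL,", ",US,IL,Springfield,"]
function ancestorPaths(levels) {
  const paths = [];
  let current = ",";
  for (const level of levels) {
    current += level + ",";
    paths.push(current);
  }
  return paths;
}

// In-memory equivalent of matching the path field against the ancestor list.
function vendorsServing(allVendors, levels) {
  const wanted = new Set(ancestorPaths(levels));
  return allVendors.filter((v) => wanted.has(v.path)).map((v) => v.name);
}

console.log(vendorsServing(vendors, ["US", "IL", "Springfield"])); // prints [ 'A', 'B', 'C' ]
```

Against a real collection, the same idea would be a single query with $in on the path field using the list returned by ancestorPaths, which an ordinary index on that field can serve. Because the full path is stored, the two Springfields stay distinct.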
References:
http://www.mongodb.org/display/DOCS/Schema+Design
http://www.census.gov/geo/www/gazetteer/places2k.html
Modifications:
Replaced link: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB
Note that this question was also asked on the Google group. See http://groups.google.com/group/mongodb-user/browse_thread/thread/5cd5edd549813148 for that discussion.
One option is to use an array key. You can store the hierarchy as an
array of values (for example ['US','CA','Los Angeles']). Then you can
query against records based on individual elements in that array key.
For example:
First, store some documents with the array value representing the
hierarchy
> db.hierarchical.save({ location: ['US','CA','LA'], name: 'foo'} )
> db.hierarchical.save({ location: ['US','CA','SF'], name: 'bar'} )
> db.hierarchical.save({ location: ['US','MA','BOS'], name: 'baz'} )
Make sure we have an index on the location field so we can perform
fast queries against its values
> db.hierarchical.ensureIndex({'location':1})
Find all records in California
> db.hierarchical.find({location: 'CA'})
{ "_id" : ObjectId("4d9f69cbf88aea89d1492c55"), "location" : [ "US", "CA", "LA" ], "name" : "foo" }
{ "_id" : ObjectId("4d9f69dcf88aea89d1492c56"), "location" : [ "US", "CA", "SF" ], "name" : "bar" }
Find all records in Massachusetts
> db.hierarchical.find({location: 'MA'})
{ "_id" : ObjectId("4d9f6a21f88aea89d1492c5a"), "location" : [ "US", "MA", "BOS" ], "name" : "baz" }
Find all records in the US
> db.hierarchical.find({location: 'US'})
{ "_id" : ObjectId("4d9f69cbf88aea89d1492c55"), "location" : [ "US", "CA", "LA" ], "name" : "foo" }
{ "_id" : ObjectId("4d9f69dcf88aea89d1492c56"), "location" : [ "US", "CA", "SF" ], "name" : "bar" }
{ "_id" : ObjectId("4d9f6a21f88aea89d1492c5a"), "location" : [ "US", "MA", "BOS" ], "name" : "baz" }
Note that in this model, your values in the array would need to be
unique. So for example, if you had 'springfield' in different states,
then you would need to do some extra work to differentiate.
> db.hierarchical.save({location:['US','MA','Springfield'], name: 'one' })
> db.hierarchical.save({location:['US','IL','Springfield'], name: 'two' })
> db.hierarchical.find({location: 'Springfield'})
{ "_id" : ObjectId("4d9f6b7cf88aea89d1492c5b"), "location" : [ "US", "MA", "Springfield"], "name" : "one" }
{ "_id" : ObjectId("4d9f6b86f88aea89d1492c5c"), "location" : [ "US", "IL", "Springfield"], "name" : "two" }
You can overcome this by using the $all operator and specifying more
levels of the hierarchy. For example:
> db.hierarchical.find({location: { $all : ['US','MA','Springfield']} })
{ "_id" : ObjectId("4d9f6b7cf88aea89d1492c5b"), "location" : [ "US", "MA", "Springfield"], "name" : "one" }
> db.hierarchical.find({location: { $all : ['US','IL','Springfield']} })
{ "_id" : ObjectId("4d9f6b86f88aea89d1492c5c"), "location" : [ "US", "IL", "Springfield"], "name" : "two" }