MongoDb Schema Structure - mongodb

I have a collection named 'Category' with this structure:
{
"CategoryID" : 1,
"ParentID" : 0,
"Name" : "Sample Cat"
}
And another collection which will be using this category
{
"DocumentID" : 1,
"CategoryID" : 1,
"DocumentName" : "Doc XPXSAX"
}
The problem with this design is that when is that I cannot use it to make a live search which will show me the document as
Doc XPXSAX found in Sample Cat"(along with category name without using join)
Also I cannot embed the documents inside the Category collection (as an array in one of the fields) as I am expecting the number of documents to go up to 50k.
What alternate schema design will enable me to incorporate an efficient search functionality without using hacks imitating joins ?
Thanks.

If you dislike an application-level join, why not embed the categories inside the document documents?
{
"DocumentID" : 1,
"category" : {
"ID" : 1,
"Name" : "Sample Cat",
"ParentID" : 0
},
"DocumentName" : "Doc XPXSAX"
}
Keep the category information you need for display in the document document. Information you need more rarely can live in the category document and be found with a second query or application-level join.

Related

Finding Sub-documents through dot notation in mongodb

I have a simplified collection structure below for simplicity's sake. I am unable to query the specific sub-documents under the profile field.
For example, I want to find the sub-document "profiles.Bernie". Is there a find() query that will allow me to only retrieve the document for Bernie (i.e. category and id for Bernie)? Apologies if this is a duplicate, but I was unable to find solutions that catered the way this collection is structured below
{
"_id" : ObjectId("56aec822ceb6e9dc23d32271"),
"sm_user" : "user1",
"profiles" : {
"Bernie" : {
"category" : "Politics",
"id" : "bernie"
},
"Hilary" : {
"category" : "Politics",
"id" : "hilary"
}
}
}
Yes, you can choose to just get the Bernie subtree and suppress the normally included _id;
db.test.find({},{ _id:0, "profiles.Bernie":1 })
This will include the structure leading up to Bernie (aka the profiles tag), but only the data from that subtree.
If you don't want the structure to remain, you could also use the aggregate framework to project Bernie to the root of the output document;
db.test.aggregate([{$project: { _id:0, 'Bernie': "$profiles.Bernie"}}])

Relationship between 2 collections in MongoDB and show data between them

I am making a small database for library with MongoDB. I have 2 collections, first one is called 'books' which stores information about books. The second collection is called 'publishers' which stores information about the publishers and the IDs of the books which they published.
This is the document structure for 'books'. It has 3 documents
{
"_id" : ObjectId("565f2481104871a4a235ba00"),
"book_id" : 1,
"book_name" : "C++",
"book_detail" : "This is details"
},
{
"_id" : ObjectId("565f2492104871a4a235ba01"),
"book_id" : 2,
"book_name" : "JAVA",
"book_detail" : "This is details"
},
{
"_id" : ObjectId("565f24b0104871a4a235ba02"),
"book_id" : 3,
"book_name" : "PHP",
"book_detail" : "This is details"
}
This is the document structure for 'publishers'. It has 1 document.
{
"_id" : ObjectId("565f2411104871a4a235b9ff"),
"pub_id" : 2,
"pub_name" : "Publisher 2",
"pub_details" : "This is publishers details",
"book_id" : [2,3]
}
I want to write a query to show all the details of the books which are published by this publisher. I have written this query but it does not work. When I run it, it displays this message "Script executed successfuly, but there are no results to show.".
db.getCollection('publishers').find({"pub_id" : 2}).forEach(
function (functionName) {
functionName.books = db.books.find( { "book_id": functionName.book_id } ).toArray();
}
)
I think that your data structure is flawed. The publisher is a property of a book, not the other way around. You should add pub_id to each book, and remove book_id from the publisher:
{
"_id" : ObjectId("565f2481104871a4a235ba00"),
"book_id" : 1,
"book_name" : "C++",
"book_detail" : "This is details",
"pub_id" : 1
},
{
"_id" : ObjectId("565f2492104871a4a235ba01"),
"book_id" : 2,
"book_name" : "JAVA",
"book_detail" : "This is details"
"pub_id" : 2
},
{
"_id" : ObjectId("565f24b0104871a4a235ba02"),
"book_id" : 3,
"book_name" : "PHP",
"book_detail" : "This is details"
"pub_id" : 2
}
Then, select your books like such:
db.getCollection('books').find({"pub_id" : 2});
Try this way,
db.getCollection('publishers').find({"pub_id" : "2"}).exec(function(err, publisher){
if (err) {
res.send(err);
}
else
if(publisher)
{
publisher.forEach(function(functionName)
{
functionName.books = db.books.find( { "book_id": functionName.book_id } ).toArray();
})
}
})
I would suggest reading the official documentation, because the relationship between books and publishers is precisely the example which is used there: https://docs.mongodb.org/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/
In mongoDB and noSQL at large, it is not true that publisher must be a property of book. This is only the case in RDBMS, where in one-to-many relationships the reference is in the "one" part. The same works the other way round, books don't have to be a property of publisher. The clue here is in the absence of "must."
It all depends on how many is the "many-to-many". In this case, I'd say it's also about what type of library we're talking about, size of catalogue and whether new book purchases are common:
Is the number of books per publisher small AND data about publisher is often accessed with data about the book? Then, embed publisher info in the book document.
Is the number of books per publisher fairly big but reasonably stable (e.g.: historical library where acquisitions are rare)? Then, create a publishers collection with an array of books per publisher.
Is the number of books per publisher fairly big and catalogue grows at a reasonable pace? Then, include reference in the book document and fetch publisher info with it.
Side note
Although not related to the question, I think your document structure is flawed. _id and book_id are redundant. If you want to follow the RDBMS pattern of incremental integer IDs, then it's absolutely OK that you specify your own _id at the time of inserting the document with 1, 2, 3, etc. ObjectID() is a great thing, but, again, there's no obligation to use it.

dynamic size of subdocument mongodb

I'm using mongodb and mongoose for my web application. The web app is used for registration for swimming competitions and each competition can have X number of races. My data structure as of now:
{
"_id": "1",
"name": "Utmanaren",
"location": "town",
"startdate": "20150627",
"enddate": "20150627"
"race" : {
"gender" : "m"
"style" : "freestyle"
"length" : "100"
}
}
Doing this i need to determine and define the number of races for every competition. A solution i tried is having a separate document and having a Id for which competition a races belongs to, like below.
{
"belongsTOId" : "1"
"gender" : "m"
"style" : "freestyle"
"length" : "100"
}
{
"belongsTOId" : "1"
"gender" : "f"
"style" : "butterfly"
"length" : "50"
}
Is there a way of creating and defining dynamic number of races as a subdocument while using Mongodb?
Thanks!
You have basically two approaches of modelling your data structure; you can either design a schema where you can reference or embed the races document.
Let's consider the following example that maps swimming competition and multiple races relationships. This demonstrates the advantage of embedding over referencing if you need to view many data entities in context of another. In this one-to-many relationship between competition and race data, the competition has multiple races entities:
// db.competition schema
{
"_id": 1,
"name": "Utmanaren",
"location": "town",
"startdate": "20150627",
"enddate": "20150627"
"races": [
{
"gender" : "m"
"style" : "freestyle"
"length" : "100"
},
{
"gender" : "f"
"style" : "butterfly"
"length" : "50"
}
]
}
With the embedded data model, your application can retrieve the complete swimming competition information with just one query. This design has other merits as well, one of them being data locality. Since MongoDB stores data contiguously on disk, putting all the data you need in one document ensures that the spinning disks will take less time to seek to a particular location on the disk. The other advantage with embedded documents is the atomicity and isolation in writing data. To illustrate this, say you want to remove a competition which has a race "style" property with value "butterfly", this can be done with one single (atomic) operation:
db.competition.remove({"races.style": "butterfly"});
For more details on data modelling in MongoDB, please read the docs Data Modeling Introduction, specifically Model One-to-Many Relationships with Embedded Documents
The other design option is referencing documents follow a normalized schema where the race documents contain a reference to the competition document:
// db.race schema
{
"_id": 1,
"competition_id": 1,
"gender": "m",
"style": "freestyle",
"length": "100"
},
{
"_id": 2,
"competition_id": 1,
"gender": "f",
"style": "butterfly",
"length": "50"
}
The above approach gives increased flexibility in performing queries. For instance, to retrieve all child race documents where the main parent entity competition has id 1 will be straightforward, simply create a query against the collection race:
db.race.find({"competiton_id": 1});
The above normalized schema using document reference approach also has an advantage when you have one-to-many relationships with very unpredictable arity. If you have hundreds or thousands of race documents per given competition, the embedding option has so many setbacks in as far as spacial constraints are concerned because the larger the document, the more RAM it uses and MongoDB documents have a hard size limit of 16MB.
If your application frequently retrieves the race data with the competition information, then your application needs to issue multiple queries to resolve the references.
The general rule of thumb is that if your application's query pattern is well-known and data tends to be accessed only in one way, an embedded approach works well. If your application queries data in many ways or you unable to anticipate the data query patterns, a more normalized document referencing model will be appropriate for such case.
Ref:
MongoDB Applied Design Patterns: Practical Use Cases with the Leading NoSQL Database By Rick Copeland
You basically want to update the data, so you should upsert the data which is basically an update on the subdocument key.
Keep an array of keys in the main document.
Insert the sub-document and add the key to the list or update the list.
To push single item into the field ;
db.yourcollection.update( { $push: { "races": { "belongsTOId" : "1" , "gender" : "f" , "style" : "butterfly" , "length" : "50"} } } );
To push multiple items into the field it allows duplicate in the field;
db.yourcollection.update( { $push: { "races": { $each: [ { "belongsTOId" : "1" , "gender" : "f" , "style" : "butterfly" , "length" : "50"}, { "belongsTOId" : "2" , "gender" : "m" , "style" : "horse" , "length" : "70"} ] } } } );
To push multiple items without duplicated items;
db.yourcollection.update( { $addToSet: { "races": { $each: [ { "belongsTOId" : "1" , "gender" : "f" , "style" : "butterfly" , "length" : "50"}, { "belongsTOId" : "2" , "gender" : "m" , "style" : "horse" , "length" : "70"} ] } } } );
$pushAll deprecated since version 2.4, so we use $each in $push instead of $pushAll.
While using $push you will be able to sort and slice items. You might check the mongodb manual.

I have big database on mongodb and can't find and use my info

This my code:
db.test.find() {
"_id" : ObjectId("4d3ed089fb60ab534684b7e9"),
"title" : "Sir",
"name" : {
"_id" : ObjectId("4d3ed089fb60ab534684b7ff"),
"first_name" : "Farid"
},
"addresses" : [
{
"city" : "Baku",
"country" : "Azerbaijan"
},{
"city" : "Susha",
"country" : "Azerbaijan"
},{
"city" : "Istanbul",
"country" : "Turkey"
}
]
}
I want get output only all city. Or I want get output only all country. How can i do it?
I'm not 100% about your code example, because if your 'find' by ID there's no need to search by anything else... but I wonder whether the following can help:
db.test.insert({name:'farid', addresses:[
{"city":"Baku", "country":"Azerbaijan"},
{"city":"Susha", "country":"Azerbaijan"},
{"city" : "Istanbul","country" : "Turkey"}
]});
db.test.insert({name:'elena', addresses:[
{"city" : "Ankara","country" : "Turkey"},
{"city":"Baku", "country":"Azerbaijan"}
]});
Then the following will show all countries:
db.test.aggregate(
{$unwind: "$addresses"},
{$group: {_id:"$country", countries:{$addToSet:"$addresses.country"}}}
);
result will be
{ "result" : [
{ "_id" : null,
"countries" : [ "Turkey", "Azerbaijan"]
}
],
"ok" : 1
}
Maybe there are other ways, but that's one I know.
With 'cities' you might want to take more care (because I know cities with the same name in different countries...).
Based on your question, there may be two underlying issues here:
First, it looks like you are trying to query a Collection called "test". Often times, "test" is the name of an actual database you are using. My concern, then, is that you are trying to query the database "test" to find any collections that have the key "city" or "country" on any of the internal documents. If this is the case, what you actually need to do is identify all of the collections in your database, and search them individually to see if any of these collections contain documents that include the keys you are looking for.
(For more information on how the db.collection.find() method works, check the MongoDB documentation here: http://docs.mongodb.org/manual/reference/method/db.collection.find/#db.collection.find)
Second, if this is actually what you are trying to do, all you need to for each collection is define a query that only returns the key of the document you are looking for. If you get more than 0 results from the query, you know documents have the "city" key. If they don't return results, you can ignore these collections. One caveat here is if data about "city" is in embedded documents within a collection. If this is the case, you may actually need to have some idea of which embedded documents may contain the key you are looking for.

Retrieving only the relevant part of a stored document

I'm a newbie with MongoDB, and am trying to store user activity performed on a site. My data is currently structured as:
{ "_id" : ObjectId("4decfb0fc7c6ff7ff77d615e"),
"activity" : [
{
"action" : "added",
"item_name" : "iPhone",
"item_id" : 6140,
},
{
"action" : "added",
"item_name" : "iPad",
"item_id" : 7220,
}
],
"name" : "Smith,
"user_id" : 2
}
If I want to retrieve, for example, all the activity concerning item_id 7220, I would use a query like:
db.find( { "activity.item_id" : 7220 } );
However, this seems to return the entire document, including the record for item 6140.
Can anyone suggest how this might be done correctly? I'm not sure if it's a problem with my query, or with the structure of the data itself.
Many thanks.
You have to wait the following dev: https://jira.mongodb.org/browse/SERVER-828
You can use $slice only if you know insertion order and position of your element.
Standard queries on MongoDb always return all document.
(question also available here: MongoDB query to return only embedded document)