Why should I have multiple collections in a NoSQL database (MongoDB)

Example:
If I have a database, let's call it world, then I can have a collection of countries.
The collection can contain documents; one example of a document:
{
"_id" : "hiqgfuywefowehfnqfdqfe",
"name" : "Italy",
"cities" : [
  { "_id": "ihevfbwyfv", "name": "Napoli" },
  { "_id": "hjbyiu", "name": "Milano" }
]
}
Why, or when, should I create a new collection of documents? I could expand the documents inside my world collection instead of creating a new collection.
Isn't creating a new collection like creating a new domain?

Separate collections offer the greatest querying flexibility. They are a good fit if you need to select individual documents or need more control over querying, and they become necessary once documents get huge: a document, including all its embedded documents and arrays, cannot exceed 16 MB. Selecting embedded documents is more limited, but embedded documents are easy and fast. They work well when you want the entire document, the document with a $slice of an embedded array (e.g. comments), or the document with the embedded array excluded altogether.
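For instance, with the world/countries document from the question, the $slice projection operator returns only part of an embedded array (a sketch; the collection name is assumed):
db.countries.find(
  { "countryName": "Italy" },      // match the parent document
  { "cities": { $slice: 2 } }      // return only the first 2 embedded cities
);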

A world collection should contain one document per country. The following is a single document:
{
  "_id": "objectKey", // generated by mongo
  "countryName": "Italy",
  "cities": [
    {
      "id": 1,
      "cityName": "Napoli"
    },
    {
      "id": 2,
      "cityName": "Milano"
    }
  ]
}
Extend this collection with other countries. There are, however, some situations where you should consider creating a new collection and some sort of relation:
If you use the mmapv1 storage engine and you want to add cities over time, you should consider the relational approach, because mmapv1 does not tolerate rapidly growing arrays. When you create a document with mmapv1, the storage engine gives the document some padding for growing arrays. Once this padding is filled, the document is relocated on the hard drive, which results in disk fragmentation and performance loss. Note that the default storage engine is now WiredTiger, which does not have this problem.
Do not nest documents too deeply; it makes complex queries against them hard to write. For example, do not create documents like the following:
{
  "continentName": "Europe",
  "continentCountries": [
    {
      "countryName": "France",
      "cities": [
        {
          "cityId": 1,
          "cityName": "Paris"
        },
        { another city }
      ]
    },
    { another country }
  ],
  "TimeZone": "GMT+1",
  "continentId": 1
}
In this situation you should instead give each country object a continent property holding the continent name or id.
With MongoDB you should avoid relations most of the time, but it is not forbidden to create them. It is much better to create a relation than to nest documents too deeply, but the best solution is to flip the hierarchical levels of the document. For example:
use 'country -> cities, continentId, etc.' instead of 'continent -> country -> city'
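A minimal sketch of that flipped shape, reusing the fields from the examples above (the continentId value is assumed):
{
  "_id": "objectKey",
  "countryName": "Italy",
  "continentId": 1,
  "cities": [
    { "id": 1, "cityName": "Napoli" },
    { "id": 2, "cityName": "Milano" }
  ]
}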
Useful reading: https://docs.mongodb.org/manual/core/data-modeling-introduction/

Related

Relate children to parent using Mongo aggregate query

I have a single collection named assets that contains documents in 2+ formats, ParentObject and ChildObject. I am currently associating ParentObject to ChildObject with two queries. Can this be done with an aggregate query?
ParentObject
{
  "_id" : {
    "oid" : "ParentFooABC",
    "brand" : "acme"
  },
  "type": "com.ParentClass",
  "title": "Title1234",
  "availableDate": Date,
  "expirationDate": Date
}
ChildObject
{
  "_id" : {
    "oid" : "ChildFoo",
    "brand" : "acme"
  },
  "type": "com.ChildClass",
  "parentObject": "ParentFooABC",
  "title": "Title1234",
  "modelNumber": "8HE56",
  "modelLine": "Metro",
  "availableDate": Date,
  "expirationDate": Date,
  "IDRequired": true
}
Currently I filter the data like this:
val parent = db.asset.findOne(MongoDBObject("_id.brand" -> "acme", "type" -> "com.ParentClass"))
val children = db.asset.find(MongoDBObject("_id.brand" -> "acme", "type" -> "com.ChildClass", "parentObject" -> parent._id.oid))
if (children.nonEmpty) {
  // I have confirmed this parent has a child associated and should be returned
  val childModelNumbers = children.map(child => child.modelNumber)
  val response = ResponseObject(parent, childModelNumbers)
}
Can I do this in an aggregate query?
Updated:
Mongo Version: db version v2.6.11
Language: Scala
Driver: Casbah 2.8.1
Technically yes, however what you're doing now is standard practice with MongoDB. If you need to join collections of data frequently, you should probably use an RDBMS. If you only occasionally need to aggregate data from two separate collections, there is $lookup (note it requires MongoDB 3.2+, newer than your 2.6.11). In general you'll find yourself populating data into your document from another document pretty frequently, but that's typically how a document database works. Using $lookup when you're really looking for an equivalent to a SQL JOIN isn't something I would recommend: it's not really the intended purpose, and you'd be better off doing exactly what you're doing now or moving to an RDBMS.
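For illustration only: if the parents and children lived in two hypothetical collections named parents and children, a $lookup stage (MongoDB 3.2+) would look roughly like this sketch:
db.parents.aggregate([
  { $match: { "_id.brand": "acme" } },
  { $lookup: {
      from: "children",             // hypothetical child collection
      localField: "_id.oid",        // parent key...
      foreignField: "parentObject", // ...matched against the child's reference
      as: "children"                // matching children land here as an array
  } }
]);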
Addition:
Since the data is in the same collection, you can use an aggregation to $group that data together; you just have to group on _id.brand.
A quick example (in the mongodb shell):
db.asset.aggregate([
  { // start with a $match to narrow down
    $match: { "_id.brand": "acme" }
  },
  { // group by `_id.brand`
    $group: {
      _id: "$_id.brand",
      "parentObject": { $first: "$parentObject" },
      title: { $first: "$title" },
      modelNumber: { $addToSet: "$modelNumber" }, // accumulate child model numbers as an array
      modelLine: { $addToSet: "$modelLine" },     // accumulate child model lines as an array
      availableDate: { $first: "$availableDate" },
      expirationDate: { $first: "$expirationDate" }
    }
  }
]);
This should return documents where all the children's properties have been grouped into the parent document as arrays (modelNumber, modelLine). It's probably better to do what you're doing above, and it might be better still to put the children into their own collection instead of keeping track of their type with a type field in the document. That way you know that the documents in each collection represent a given data structure. However, if you were to do that, you would not be able to perform the aggregation in this example.

MongoDB finding an item in an array and how to index properly

I have documents stored like so:
{
  ...
  "users" : [
    {
      "id" : "123456",
      "username" : "John",
      "address" : "fake st"
    }
  ],
  ...
}
What is the best way to retrieve all the documents with the username "john"? Also, what is the proper way to index this for performance, considering it's inside an array? Do I want to index "users", or is there a better way? This is inside a database with 50+ million documents.
To find all the documents which contain at least one "john" in the users array, use db.collection.find({"users.username":"john"});
To make this faster, create an index with db.collection.createIndex({"users.username":1});. Indexes in MongoDB can reach inside arrays and index individual array entries (this is called a multikey index).
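To confirm that queries actually use the multikey index, you can inspect the query plan (a sketch; the exact explain output format varies by server version):
db.collection.find({ "users.username": "john" }).explain();
// with the index in place, the winning plan should report the index
// as multikey, e.g. "isMultiKey" : true in the index details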

Mongo operation speed: array $addToSet/$pull vs object $set/$unset

I have an index collection containing lots of terms, and an items field containing identifiers from another collection. Currently that field stores an array of documents, which are added with $addToSet, but I have some performance issues. It seems an $unset operation executes faster, so I plan to change the array of documents into a document of embedded documents.
Am I right to think that $set/$unset on fields is faster than $push/$pull of embedded documents in arrays?
EDIT:
After some small tests, we see $set/$unset is about 4 times faster. On the other hand, if I use an object instead of an array, it's a little harder to count the number of properties (vs the length of the array), and we were counting that a lot. But we can consider using $set every time and adding a field with the number of items.
This is a document of the current index:
{
  "_id": ObjectId("5594dea2b693fffd8e8b48d3"),
  "term": "clock",
  "nbItems": NumberLong("1"),
  "items": [
    {
      "_id": ObjectId("55857b10b693ff18948ca216"),
      "id": NumberLong("123")
    },
    {
      "_id": ObjectId("55857b10b693ff18948ca217"),
      "id": NumberLong("456")
    }
  ]
}
Frequent update operations are:
* remove item : {$pull: {"items": {"id": 123}}}
* add item : {$addToSet: {"items": {"_id": ObjectId("55857b10b693ff18948ca216"), "id": 123}}}
* I can change $addToSet to $push and check for duplicates beforehand if performance is better
And this is what I plan to do:
{
  "_id": ObjectId("5594dea2b693fffd8e8b48d3"),
  "term": "clock",
  "nbItems": NumberLong("1"),
  "items": {
    "123": {
      "_id": ObjectId("55857b10b693ff18948ca216")
    },
    "456": {
      "_id": ObjectId("55857b10b693ff18948ca217")
    }
  }
}
* remove item : {$unset: {"items.123": true}}
* add item : {$set: {"items.123": {"_id": ObjectId("55857b10b693ff18948ca216"), "id": 123}}}
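If you go the $set/$unset route and still need the count, one possible way (a sketch, not tested) is to guard each update so nbItems stays in sync:
// add an item and bump the counter only if the key is not there yet
db.index.update(
  { "term": "clock", "items.123": { $exists: false } },
  {
    $set: { "items.123": { "_id": ObjectId("55857b10b693ff18948ca216") } },
    $inc: { "nbItems": 1 }
  }
);
// remove an item and decrement only if the key exists
db.index.update(
  { "term": "clock", "items.123": { $exists: true } },
  { $unset: { "items.123": true }, $inc: { "nbItems": -1 } }
);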
For information, these operations are made with pymongo (or can be done with PHP if there is a good reason to), but I don't think this is relevant.
As with any performance question, a number of factors can come into play, such as indexes, the need to hit disk, etc.
That being said, I suspect you are likely correct: adding a new field to or removing an old field from a MongoDB document will be slightly faster than appending to or removing from an array, since array types are less easy to traverse when searching for duplicates.
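If you want to measure this on your own data, a rough micro-benchmark in the mongodb shell could look like this (a sketch; the collection name, document shape, and iteration count are all assumptions):
db.bench.drop();
db.bench.insert({ _id: 1, itemsObj: {}, itemsArr: [] });
var t0 = new Date();
for (var i = 0; i < 5000; i++) {
  var d = {};
  d["itemsObj." + i] = { id: i };   // e.g. {$set: {"itemsObj.42": {id: 42}}}
  db.bench.update({ _id: 1 }, { $set: d });
}
print("$set:      " + (new Date() - t0) + " ms");
t0 = new Date();
for (var i = 0; i < 5000; i++) {
  db.bench.update({ _id: 1 }, { $addToSet: { itemsArr: { id: i } } });
}
print("$addToSet: " + (new Date() - t0) + " ms");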

Storing country state city in MongoDB

I am currently working on building a database to store country-state-city information, which will later be used in drop-down menus on our website.
I wanted to get a few suggestions on the schema I have designed and how efficiently it will work.
I am using MongoDB to store the data.
The schema that I have designed is as follows:
{
  _id: "XXXX",
  country_name: "XXXX",
  some more fields
  state_list: [
    {
      state_name: "XXXX",
      some more fields
      city_list: [
        {
          city_name: "XXXX",
          some more fields
        },
        {
          city_name: "XXXX",
          some more fields
        }
      ]
    }
  ]
}
The data will be increasing. Also there is a long list of cities for each state.
How well will this schema work for the intended purpose?
Should I use the linking-documents technique (this will require manual coding to map the _id)?
I would suggest keeping it 1-N on country with respect to state, while making city a separate collection with a stateid.
Each country has a limited number of states, while a state can have many cities.
Embedding cities in states and then states in countries would make the country collection far too huge.
The common notions for schema design are based on the following criteria:
* 1-to-few: embed the child documents inside the parent object.
* 1-to-many: create an array of ids inside the parent document and keep the child collection separate.
* 1-to-squillions: keep the parent id in the child object and keep them as separate collections, as sketched below.
This link explains it much better.
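A minimal sketch of the suggested split, with cities as their own collection referencing their state (collection and field names are assumptions):
// "1-to-squillions": each city stores the _id of its parent state
var stateId = ObjectId();  // stand-in for a real state _id
db.cities.insert({ city_name: "Mumbai", state_id: stateId });
// fetch all cities of a given state
db.cities.find({ state_id: stateId });
// index the reference so the lookup stays fast at scale
db.cities.createIndex({ state_id: 1 });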
I think as your data increases, this schema will collapse. The best way is to break the database into 3 schemas and use references to their ids.
Country Schema:
{
  _id: "XXXX",
  country_name: "XXXX",
  some more fields
  state_list: [
    { "_id": reference id to state object }
  ]
}
State Schema:
{
  _id: "XXXX",
  state_name: "XXXX",
  some more fields
  city_list: [
    { "_id": reference id to city object }
  ]
}
City Schema:
{
  _id: "XXXX",
  city_name: "XXXX",
  some more fields
}

How do I manage a sublist in Mongodb?

I have different types of data that would be difficult to model and scale with a relational database (e.g., a product type).
I'm interested in using MongoDB to solve this problem.
I am referencing the documentation on MongoDB's website:
http://docs.mongodb.org/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/
For the data type that I am storing, I also need to maintain a relational list of ids where this particular product is available (e.g., store location ids).
In their example regarding "one-to-many relationships with embedded documents", they have the following:
{
  name: "O'Reilly Media",
  founded: 1980,
  location: "CA",
  books: [12346789, 234567890, ...]
}
I am currently importing the data from a spreadsheet and want to use a batchInsert.
To avoid duplicates, I assume that:
1) I need to do an ensureIndex on the ID, and ignore errors on the insert?
2) Do I then need to loop through all the IDs to insert a new related ID into the books?
Your question could possibly be defined a little better, but let's consider the case where you have rows in a spreadsheet or other source that are all de-normalized in some way. In a JSON representation the rows would look something like this:
{
  "publisher": "O'Reilly Media",
  "founded": 1980,
  "location": "CA",
  "book": 12346789
},
{
  "publisher": "O'Reilly Media",
  "founded": 1980,
  "location": "CA",
  "book": 234567890
}
So in order to get those sorts of row results into the structure you want, one way to do this is with the "upsert" functionality of the .update() method. Assuming you have some way of looping the input values, and that they are identified with some structure, an analog would be something like:
books.forEach(function(book) {
  db.publishers.update(
    { "name": book.publisher },
    {
      "$setOnInsert": {
        "founded": book.founded,
        "location": book.location
      },
      "$addToSet": { "books": book.book }
    },
    { "upsert": true }
  );
});
This essentially simplifies the code so that MongoDB does all of the data collection work for you. Where the "name" of the publisher is considered unique, the statement first searches for a document in the collection that matches the given query condition, the "name".
In the case where that document is not found, a new document is inserted. Either the database or the driver takes care of creating the new _id value for this document, and your query "condition" is automatically written into the new document as well, since it is an implied value that should exist.
The $setOnInsert operator says that those fields will only be set when a new document is created. The final part uses $addToSet to "push" book values that have not already been found into the "books" array (or set).
The reason for the separation matters when a document with the specified "publisher" name is actually found. In that case, all of the fields under $setOnInsert are ignored, as they should already be in the document; only the $addToSet operation is processed and sent to the server, adding the new entry to the "books" array (set) where it does not already exist.
That is simpler logic than aggregating the new records in code before sending a new insert operation. However, it is not very "batch"-like, as you are still performing one operation against the server for each row.
This is fixed in MongoDB version 2.6 and above as there is now the ability to do "batch" updates. So with a similar analog:
var batch = [];
books.forEach(function(book) {
  batch.push({
    "q": { "name": book.publisher },
    "u": {
      "$setOnInsert": {
        "founded": book.founded,
        "location": book.location
      },
      "$addToSet": { "books": book.book }
    },
    "upsert": true
  });
  if ( batch.length % 500 == 0 ) {
    // flush a full batch to the server
    db.runCommand({ "update": "publishers", "updates": batch });
    batch = [];
  }
});
if ( batch.length > 0 ) {
  // flush any remaining operations
  db.runCommand({ "update": "publishers", "updates": batch });
}
What this does is set up all of the constructed update statements and send them to the server in a single call, with a sensible number of operations per batch, in this case once every 500 items processed. The actual limit is the BSON document maximum of 16 MB, so this can be adjusted as appropriate to your data.
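The same batching can also be written with the shell's Bulk API (likewise introduced in MongoDB 2.6), which builds the update command for you; a sketch:
var bulk = db.publishers.initializeUnorderedBulkOp();
books.forEach(function(book) {
  bulk.find({ "name": book.publisher }).upsert().updateOne({
    "$setOnInsert": {
      "founded": book.founded,
      "location": book.location
    },
    "$addToSet": { "books": book.book }
  });
});
bulk.execute();  // the shell splits this into server-side batches under the hood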
If your MongoDB version is lower than 2.6, then you either use the first form or do something similar using the existing batch insert functionality. But if you choose to insert, then you need to do all the pre-aggregation work within your code.
All of these methods are of course supported by the PHP driver, so it is just a matter of adapting this to your actual code and deciding which course you want to take.