I am currently building a database to store country-state-city information, which will later be used to populate drop-down menus on our website.
I would like some suggestions on how efficiently the schema I have chosen will work.
I am using MongoDB to store the data.
The schema that I have designed is as follows:
{
_id: "XXXX",
country_name: "XXXX",
some more fields
state_list:[
{
state_name: "XXXX",
some more fields
city_list:[
{
city_name : "XXXX",
some more fields
},
{
city_name : "XXXX",
some more fields
}
]
}
]
}
The data will keep growing, and each state has a long list of cities.
How well will this schema work for the intended purpose?
Should I use the linked-documents technique instead (this will require manual coding to map the _id)?
I would suggest keeping the states as a 1-N list on the country document, while cities go in a separate collection with each city storing its state id.
Each country has a limited number of states, while a state can have many cities.
Embedding cities in states and then states in countries will make the country documents too large.
The common rule of thumb for schema design is based on the following cases:
1-few (embed the children inside the parent object).
1-many (keep an array of child ids inside the parent document and keep the children in a separate collection).
1-squillions (keep the parent id in each child object and keep the children in a separate collection).
This link explains it much better.
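As a rough sketch of one reading of the layout suggested above (states embedded in the country document, cities in a separate collection that stores the parent state id), the plain JavaScript below uses in-memory arrays in place of real collections. The field name `state_id`, the helper names, and the sample values are assumptions made for illustration only.

```javascript
// In-memory stand-ins for two collections; all names and values are illustrative.
const countries = [
  {
    _id: "IN",
    country_name: "India",
    // 1-few: states are embedded in the country document.
    state_list: [
      { _id: "MH", state_name: "Maharashtra" },
      { _id: "KA", state_name: "Karnataka" }
    ]
  }
];

// 1-squillions: each city stores the id of its parent state.
const cities = [
  { _id: 1, city_name: "Mumbai", state_id: "MH" },
  { _id: 2, city_name: "Pune", state_id: "MH" },
  { _id: 3, city_name: "Bengaluru", state_id: "KA" }
];

// Values for the state drop-down of one country.
function statesOf(countryId) {
  const country = countries.find((c) => c._id === countryId);
  return country ? country.state_list.map((s) => s.state_name) : [];
}

// Values for the city drop-down of one state.
// Against a real collection this would be a query filtered on state_id.
function citiesOf(stateId) {
  return cities
    .filter((c) => c.state_id === stateId)
    .map((c) => c.city_name);
}

console.log(statesOf("IN"));
console.log(citiesOf("MH"));
```

With this split, picking a country needs one country document, and picking a state needs one filtered read of the cities, so no single document grows with the number of cities.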
I think this schema will collapse as your data grows. The best approach is to split the data into 3 schemas and reference documents by their ids.
Country Schema:
{
_id: "XXXX",
country_name: "XXXX",
some more fields
state_list:[{
"_id": reference id to state object
}]
}
State Schema :
{
_id: "XXXX",
state_name: "XXXX",
some more fields
city_list:[{
"_id" : reference to city object
}]
}
City Schema:
{
_id: "XXXX",
city_name: "XXXX",
some more fields
}
Example:
If I have a database, let's call it world, then I can have a collection of countries.
The collection can have documents; here is an example of a document:
"_id" : "hiqgfuywefowehfnqfdqfe",
"name" : "Italy",
"cities" : [{"_id": "ihevfbwyfv", "name": "Napoli"}, {"_id": "hjbyiu", "name": "Milano"}]
Why or when should I create a new collection of documents? I could expand the documents inside my world collection instead of creating a new collection.
Is creating a new collection not like creating a new domain?
Separate collections offer the greatest querying flexibility. Separate collections are good if you need to select individual documents, need more control over querying, or have huge documents, but they require more work. Selecting embedded documents is more limited. A document, including all its embedded documents and arrays, cannot exceed 16 MB. Embedded documents are easy and fast, good when you want the entire document, the document with a $slice of comments, or with no comments at all.
A world collection should look like this, with one document per country.
The following is a single document:
{
"_id": "objectKey", //generated by mongo
"countryName": "Italy",
"cities":[
{
"id": 1,
"cityName": "Napoli"
},
{
"id": 2,
"cityName": "Milano"
}
]
}
Try to expand this collection with other countries. There are, however, some situations where you should consider creating a new collection and setting up some sort of relation:
If you use the MMAPv1 storage engine and you expect to add cities over time, you should consider the relational approach, because MMAPv1 does not tolerate rapidly growing arrays. When you create a document with MMAPv1, the storage engine gives the document some padding for growing arrays. Once this padding is filled, the document is relocated on disk, which results in disk fragmentation and performance loss. The default storage engine is now WiredTiger, which does not have this problem.
Do not nest documents too deeply. It makes complex queries against your documents hard to write. For example, do not create documents like the following:
{
"continentName": "Europe",
"continentCountries": [
{
"countryName": "France",
"cities": [
{
"cityId": 1,
"cityName": "Paris"
},
{
another city
}
]
},
{
another country
}
],
"TimeZone": "GMT+1",
"continentId": 1
}
In this situation you should add a continent property to your country object, holding the continent name or id.
With MongoDB you should avoid relations most of the time, but it is not forbidden to create them. It is much better to create a relation than to nest documents too deeply, but the best solution is to flip the hierarchy of the documents, for example:
use 'country->cities, continentId, etc...' instead of 'continent->country->city'
useful reading: https://docs.mongodb.org/manual/core/data-modeling-introduction/
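To illustrate the "flip the hierarchy" advice, here is a minimal sketch in plain JavaScript that turns one deeply nested continent document into one document per country, each carrying a `continentId`. The function name `toCountryDocuments` and the sample data are assumptions, not part of any MongoDB API.

```javascript
// A deeply nested continent document (sample values are illustrative).
const continent = {
  continentId: 1,
  continentName: "Europe",
  continentCountries: [
    { countryName: "France", cities: [{ cityId: 1, cityName: "Paris" }] },
    { countryName: "Italy", cities: [{ cityId: 2, cityName: "Napoli" }] }
  ]
};

// Flip the hierarchy: one document per country, carrying the continent id.
function toCountryDocuments(continentDoc) {
  return continentDoc.continentCountries.map((country) => ({
    countryName: country.countryName,
    continentId: continentDoc.continentId,
    cities: country.cities
  }));
}

const countryDocs = toCountryDocuments(continent);
console.log(JSON.stringify(countryDocs, null, 2));
```

Each resulting document is only two levels deep, so a query on a city name has to reach into a single array rather than an array nested inside another array.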
I have two MongoDB collections, user and customer, which are in a one-to-one relationship. I'm new to MongoDB and I'm trying to insert documents manually, although I have Mongoose installed. I'm not sure which is the correct way of storing a document reference in MongoDB.
I'm using normalized data model and here is my Mongoose schema snapshot for customer:
/** Parent user object */
user: {
type: Schema.Types.ObjectId,
ref: "User",
required: true
}
user
{
"_id" : ObjectId("547d5c1b1e42bd0423a75781"),
"name" : "john",
"email" : "test#localhost.com",
"phone" : "01022223333",
}
I want to make a reference to this user document from the customer document. Which of the following is correct - (A) or (B)?
customer (A)
{
"_id" : ObjectId("547d916a660729dd531f145d"),
"birthday" : "1983-06-28",
"zipcode" : "12345",
"address" : "1, Main Street",
"user" : ObjectId("547d5c1b1e42bd0423a75781")
}
customer (B)
{
"_id" : ObjectId("547d916a660729dd531f145d"),
"birthday" : "1983-06-28",
"zipcode" : "12345",
"address" : "1, Main Street",
"user" : {
"_id" : ObjectId("547d5c1b1e42bd0423a75781")
}
}
Remember these things
Embedding is better for...
Small subdocuments
Data that does not change regularly
When eventual consistency is acceptable
Documents that grow by a small amount
Data that you’ll often need to perform a second query to fetch
Fast reads
References are better for...
Large subdocuments
Volatile data
When immediate consistency is necessary
Documents that grow a large amount
Data that you’ll often exclude from the results
Fast writes
Variant A is better.
You can also use populate with Mongoose.
Use variant A. As long as you don't want to denormalize any other data (like the user's name), there's no need to create a child object.
This also avoids unexpected complexities with the index, because indexing an object might not behave like you expect.
Even if you were to embed an object, _id would be a weird name - _id is only a reserved name for a first-class database document.
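As a small illustration of variant A, the sketch below stores the bare id in `customer.user` and resolves it with a second lookup, which is roughly the idea behind Mongoose's populate. It runs on in-memory arrays, uses plain strings instead of ObjectId values, and the helper name `withUser` is hypothetical.

```javascript
// In-memory stand-ins for the two collections; ids are plain strings here
// instead of ObjectId values so the sketch runs without a MongoDB driver.
const users = [
  { _id: "547d5c1b1e42bd0423a75781", name: "john", phone: "01022223333" }
];
const customers = [
  {
    _id: "547d916a660729dd531f145d",
    zipcode: "12345",
    user: "547d5c1b1e42bd0423a75781" // variant A: the reference is the bare id
  }
];

// Hypothetical helper: return a copy of the customer in which the stored id
// is replaced by the referenced user document (or null if none matches).
function withUser(customer) {
  const user = users.find((u) => u._id === customer.user) || null;
  return Object.assign({}, customer, { user: user });
}

const resolved = withUser(customers[0]);
console.log(resolved.user.name);
```

The stored customer document is left untouched; only the returned copy carries the resolved user.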
One to one relations
1 to 1 relations are relations where each item corresponds to exactly one other item. e.g.:
an employee has a resume and vice versa
a building has a floor plan and vice versa
a patient has a medical history and vice versa
//employee
{
_id : '25',
name: 'john doe',
resume: 30
}
//resume
{
_id : '30',
jobs: [....],
education: [...],
employee: 25
}
We can model the employee-resume relation by having a collection of employees and a collection of resumes, with the employee pointing to the resume through linking, where we store an ID that corresponds to an ID in the resume collection. Or, if we prefer, we can link in the other direction, where we have an employee key inside the resume collection that points to the employee itself. Or, if we want, we can embed: we could take the entire resume document and embed it right inside the employee document, or vice versa.
This embedding depends upon how the data is being accessed by the application and how frequently the data is being accessed. We need to consider:
frequency of access
the size of the items - what is growing all the time and what is not growing. Every time we add something to a document, there is a point beyond which the document needs to be moved in the collection. A document also cannot exceed 16MB, although reaching that size is unlikely.
atomicity of data - at the time of writing there were no multi-document transactions in MongoDB (they were added in version 4.0), only atomic operations on individual documents. So if we know that we cannot tolerate any inconsistency and want to be able to update the employee and the resume together, we may decide to embed one in the other so that both are updated at once.
In MongoDB it is strongly recommended to embed documents wherever you can, especially in your case, where you have 1-to-1 relations.
Why? MongoDB has no atomic join operations to use in your queries (although that is not the main reason). The main reason is that each join operation (theoretically) needs a disk seek that takes about 20 ms, whereas reading a document with its embedded sub-documents needs just one disk seek.
I believe the best schema for you is a single document with one _id for all of your entities:
{
_id : ObjectId("547d5c1b1e42bd0423a75781"),
userInfo :
{
"name" : "john",
"email" : "test#localhost.com",
"phone" : "01022223333",
},
customerInfo :
{
"birthday" : "1983-06-28",
"zipcode" : "12345",
"address" : "1, Main Street",
},
staffInfo :
{
........
}
}
Now if you just want the userinfo you can use
db.users.findOne({_id : ObjectId("547d5c1b1e42bd0423a75781")},{userInfo : 1}).userInfo;
it will give you just the userInfo:
/* 0 */
{
"name" : "john",
"email" : "test#localhost.com",
"phone" : "01022223333"
}
And if you just want the customerInfo you can use
db.users.findOne({_id : ObjectId("547d5c1b1e42bd0423a75781")},{customerInfo : 1}).customerInfo;
it will give you just the customerInfo :
/* 0 */
{
"birthday" : "1983-06-28",
"zipcode" : "12345",
"address" : "1, Main Street"
}
and so on.
This schema needs the minimum number of disk round-trips, and it uses MongoDB's document-based model to get the best performance you can achieve.
I have different types of data that would be difficult to model and scale with a relational database (e.g., a product type)
I'm interested in using Mongodb to solve this problem.
I am referencing the documentation at mongodb's website:
http://docs.mongodb.org/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/
For the data type that I am storing, I need to also maintain a relational list of id's where this particular product is available (e.g., store location id's).
In their example regarding "one-to-many relationships with embedded documents", they have the following:
{
name: "O'Reilly Media",
founded: 1980,
location: "CA",
books: [12346789, 234567890, ...]
}
I am currently importing the data with a spreadsheet, and want to use a batchInsert.
To avoid duplicates, I assume that:
1) I need to do an ensure index on the ID, and ignore errors on the insert?
2) Do I then need to loop through all the ID's to insert a new related ID to the books?
Your question could possibly be defined a little better, but let's consider the case that you have rows in a spreadsheet or other source that are all de-normalized in some way. So in a JSON representation the rows would be something like this:
{
"publisher": "O'Reilly Media",
"founded": 1980,
"location": "CA",
"book": 12346789
},
{
"publisher": "O'Reilly Media",
"founded": 1980,
"location": "CA",
"book": 234567890
}
So in order to get those sort of row results into the structure you wanted, one way to do this would be using the "upsert" functionality of the .update() method:
So assuming you have some way of looping the input values and they are identified with some structure then an analog to this would be something like:
books.forEach(function(book) {
db.publishers.update(
{
"name": book.publisher
},
{
"$setOnInsert": {
"founded": book.founded,
"location": book.location,
},
"$addToSet": { "books": book.book }
},
{ "upsert": true }
);
})
This simplifies the code so that MongoDB does all of the data collection work for you. Where the "name" of the publisher is considered to be unique, the statement first searches the collection for a document that matches the given query condition on "name".
In the case where that document is not found, then a new document is inserted. So either the database or driver will take care of creating the new _id value for this document and your "condition" is also automatically inserted to the new document since it was an implied value that should exist.
The usage of the $setOnInsert operator is to say that those fields will only be set when a new document is created. The final part uses $addToSet in order to "push" the book values that have not already been found into the "books" array (or set).
The reason for the separation is for when a document is actually found to exist with the specified "publisher" name. In this case, all of the fields under the $setOnInsert will be ignored as they should already be in the document. So only the $addToSet operation is processed and sent to the server in order to add the new entry to the "books" array (set) and where it does not already exist.
So that would be simplified logic compared to aggregating the new records in code before sending a new insert operation. However it is not very "batch" like as you are still performing some operation to the server for each row.
This is fixed in MongoDB version 2.6 and above as there is now the ability to do "batch" updates. So with a similar analog:
var batch = [];
books.forEach(function(book) {
batch.push({
"q": { "name": book.publisher },
"u": {
"$setOnInsert": {
"founded": book.founded,
"location": book.location,
},
"$addToSet": { "books": book.book }
},
"upsert": true
});
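if ( ( batch.length % 500 ) == 0 ) {
// "publishers" is the target collection of the update command
db.runCommand({ "update": "publishers", "updates": batch });
batch = [];
}
});
// send the remainder, if any
if ( batch.length > 0 ) {
db.runCommand({ "update": "publishers", "updates": batch });
}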
What this does is collect all of the constructed update statements into a single call to the server, with a sensible number of operations per batch, in this case one call for every 500 items processed. The actual limit is the BSON document maximum of 16MB, so this can be altered as appropriate for your data.
If your MongoDB version is lower than 2.6 then you either use the first form or do something similar to the second form using the existing batch insert functionality. But if you choose to insert then you need to do all the pre-aggregation work within your code.
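The pre-aggregation work mentioned above could look something like the following sketch, which groups the de-normalized rows by publisher and collects each book id only once before any insert is attempted. The function name `aggregateByPublisher` and the repeated sample row are assumptions for illustration.

```javascript
// De-normalized input rows, shaped like the rows shown earlier
// (the third row repeats a book id on purpose).
const rows = [
  { publisher: "O'Reilly Media", founded: 1980, location: "CA", book: 12346789 },
  { publisher: "O'Reilly Media", founded: 1980, location: "CA", book: 234567890 },
  { publisher: "O'Reilly Media", founded: 1980, location: "CA", book: 12346789 }
];

// Group rows by publisher name and collect each book id only once.
function aggregateByPublisher(inputRows) {
  const byName = new Map();
  inputRows.forEach((row) => {
    if (!byName.has(row.publisher)) {
      byName.set(row.publisher, {
        name: row.publisher,
        founded: row.founded,
        location: row.location,
        books: []
      });
    }
    const doc = byName.get(row.publisher);
    if (doc.books.indexOf(row.book) === -1) {
      doc.books.push(row.book);
    }
  });
  return Array.from(byName.values());
}

const publisherDocs = aggregateByPublisher(rows);
console.log(JSON.stringify(publisherDocs));
```

Note that this only removes duplicates within the imported rows; it knows nothing about documents that already exist in the collection, which is the part the upsert forms handle for you.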
All of the methods are of course supported with the PHP driver, so it is just a matter of adapting this to your actual code and which course you want to take.
I am having some difficulty structuring my data so that I can benefit from reactivity in Meteor. Mainly nesting arrays of objects makes queries tricky.
The three main views I am trying to project this data onto are
waiter: shows the order for one table, with each person's meal (items nested, essentially what I have below)
kitchen manager: columns of orders by table (only needs table, note, and the items)
cook: columns of items, by category where started=true (only need item info)
Currently I have a meteor collection of order objects like this:
Order {
table: "1",
waiter: "neil",
note: "note from kitchen",
meals: [
{
seat: "1",
items: [ {n: "potato", category: "fryer", started: false },
{n: "water", category: "drink" }
]
},
{
seat: "2",
items: [ {n: "water", category: "drink" } ]
},
]
}
Is there any way to query inside the nested array and apply some projection, or do I need to look at an entirely different data model?
Assuming you're building this app for one restaurant, there shouldn't be many active orders at any given time—presumably you don't have thousands of tables. Even if you want to keep the orders in the database after they're served, you could add a field active to separate out the current ones.
Then your query is simple: activeOrders = Orders.find({active: true}).fetch(). The fetch returns an array, which you could loop through several times for each of your views, using nested if and for loops as necessary to dig down into the child objects. See also Underscore's _.pluck. You don't need to get everything right with some complicated Mongo query, and in fact your app will run faster if you query once and process the results however many times you need to.
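A minimal sketch of the "query once, process many times" idea, in plain JavaScript: the array below stands in for the result of the single fetch, and the two views are derived from it with ordinary loops. The variable names `kitchenView` and `cookView` and the helper `itemsOf` are assumptions, and one item is marked `started: true` only so that the cook view has something to show.

```javascript
// Stand-in for the result of a single fetch; the shape follows the order
// document in the question.
const activeOrders = [
  {
    table: "1",
    waiter: "neil",
    note: "note from kitchen",
    meals: [
      {
        seat: "1",
        items: [
          { n: "potato", category: "fryer", started: true },
          { n: "water", category: "drink" }
        ]
      },
      { seat: "2", items: [{ n: "water", category: "drink" }] }
    ]
  }
];

// Flatten all items of one order, regardless of seat.
function itemsOf(order) {
  return order.meals.reduce((all, meal) => all.concat(meal.items), []);
}

// Kitchen manager view: table, note, and items per order.
const kitchenView = activeOrders.map((order) => ({
  table: order.table,
  note: order.note,
  items: itemsOf(order)
}));

// Cook view: names of started items, grouped by category.
const cookView = {};
activeOrders.forEach((order) => {
  itemsOf(order)
    .filter((item) => item.started === true)
    .forEach((item) => {
      cookView[item.category] = cookView[item.category] || [];
      cookView[item.category].push(item.n);
    });
});

console.log(kitchenView[0].items.length);
console.log(cookView);
```

The waiter view needs no reshaping at all, since the stored order document already has the per-seat nesting it displays.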
These are my collections (many-to-many):
actors:
{
_id: 1,
name: "Name 1"
}
movies:
{
_id: 1,
name: "The Terminator",
production_year: 1984,
actors: [
{
actors_id: 1,
role_id : 1
},
{
actors_id: 2,
role_id : 1
}
]
}
I can't get the list of actors for a given movie.
It is not a problem when I have this:
{
_id: 1,
name: "The Terminator",
production_year: 1984,
actors: [1,2,3,4,5] // actor ids
}
var a = db.movies.findOne({name:"The Terminator"}).actors
db.actors.find({"_id":{$in:a}})
But how can I do it with the structure above?
If I do this: var a = db.movies.findOne({name:"The Terminator"}).actors
it returns this:
[
{
actors_id: 1,
role_id : 1
},
{
actors_id: 2,
role_id : 1
}
]
How do I get only the actors_id values as an array, [1,2], so that I can get the names of the actors (with $in)?
Thanks,
Zoran
You don't. Within MongoDB you always query for documents, so you have to make sure your schema is such that you can get all the information you need by querying for specific documents. At the time of writing there was no join/view-like functionality in MongoDB (the $lookup aggregation stage was added later, in version 3.2).
Denormalization is usually the most appropriate choice in such cases. Your schema looks like it's designed for a traditional relational database and you will have to try and let go of some of the schema design principles that come with relational data.
Specifically for your example you could add the actor name to the embedded array so you have that information after querying for the movie.
Finally, consider whether you're using the right tool for what you need to do. Too often people think of MongoDB as a "fast MySQL", which is entirely wrong. Document databases are very different from RDBMS and even k/v stores. If you have a lot of related data, use an RDBMS.
The variable a in var a = db.movies.findOne({name:"The Terminator"}).actors is an array of documents, so you have to turn it into an array of integers (ids) before passing it to $in.
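One way to do that conversion is with the standard `Array.prototype.map`, as in the sketch below. The `actors` array is an in-memory stand-in for the actors collection, and the final filter only imitates what an `$in` query on `_id` would return; the names "Name 2" and "Name 3" are made-up sample data.

```javascript
// The value returned for the "actors" field in the question.
const a = [
  { actors_id: 1, role_id: 1 },
  { actors_id: 2, role_id: 1 }
];

// Pull out just the ids.
const actorIds = a.map((entry) => entry.actors_id);
console.log(actorIds);

// In-memory stand-in for the actors collection (sample data).
const actors = [
  { _id: 1, name: "Name 1" },
  { _id: 2, name: "Name 2" },
  { _id: 3, name: "Name 3" }
];

// Imitates a query on _id with $in: keep actors whose _id is in actorIds.
const cast = actors.filter((actor) => actorIds.indexOf(actor._id) !== -1);
console.log(cast.map((actor) => actor.name));
```

In the shell, the resulting `actorIds` array can then be passed to the same `$in` query that already works for the simpler structure.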