I'm currently learning NoSQL with Cloudant and trying to design a database. As I understand it, Cloudant treats every record as a document and relies on denormalization instead of tables, so I'm a bit confused about how to decide which data belongs in one document and which should be kept separate.
Below is my test case:
Let's say I'm designing the table structure for a bookstore. For simplicity, I'll have these tables: BOOK, STORE, STORE_BRANCH
BOOK fields: _id, book_name, author
STORE fields: _id, store_name
STORE_BRANCH fields: _id, store_branch_name, address, store_id_fk
With the above case, I'm not able to decide where to put the "price" field. In a normal RDBMS I would just create another table with the fields (book_id, store_branch_id, price), assuming the price of a book is different for each branch. So I'm wondering how to model this in Cloudant.
Any suggestion is appreciated.
Your doubts are pretty common for RDBMS users.
In NoSQL you generally use the everything-in-one-document approach. In fact, in some cases, approximating JOINs in a document-oriented database like Cloudant is outright trivial. For example, to model a one-to-n relationship, you can put all n related documents inside the document they belong to. In your case, you should put all the store branches inside the related store document. This strategy is OK if:
The document does not get so big that it impairs performance. This can be mitigated somewhat by using database views or show functions.
The information in the inner document only appears there and does not need to be duplicated into other documents, or such duplication is acceptable for your application.
The document does not get updated concurrently. If it does, there will likely be unnecessary conflicts that will need to be resolved by the application.
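A minimal sketch of the embedding strategy, written as plain JavaScript objects rather than calls to any Cloudant API. The `_id` values, the `type` field and the `branch_id` key are illustrative assumptions; note that `store_id_fk` is no longer needed because each branch now lives inside its store.

```javascript
// Hypothetical Cloudant-style JSON document: one STORE with its
// STORE_BRANCH records embedded as an array (one-to-n embedding).
const storeDoc = {
  _id: "store:1",      // illustrative id format, not required by Cloudant
  type: "store",       // assumed field to tell document kinds apart
  store_name: "City Books",
  branches: [
    { branch_id: "b1", store_branch_name: "Downtown", address: "1 Main St" },
    { branch_id: "b2", store_branch_name: "Airport", address: "Terminal 2" }
  ]
};

// Fetching the single store document already yields every branch,
// so looking one up is a scan of the embedded array, not a join.
function findBranch(store, branchId) {
  return store.branches.find((b) => b.branch_id === branchId) || null;
}

console.log(findBranch(storeDoc, "b2").address); // prints Terminal 2
```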
If the above strategy is not applicable, you can use an approach that more closely mimics how you solve this problem in a relational database: create one document per row of each "relational table". In your case, you would also create price documents with the fields book_id, store_branch_id and price.
This Cloudant article explores these possibilities in depth: Cloudant - Join the fun, have fun with JOINs.
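The separate-document approach for prices can be sketched the same way. Each price becomes its own document, and a map function in the style of a Cloudant (CouchDB) view emits a composite key `[book_id, store_branch_id]`. In Cloudant the map function calls a global `emit`; here `emit` is passed in as a parameter, and `buildIndex` is a hypothetical local stand-in for the view index, so the sketch runs without a database.

```javascript
// Hypothetical documents: one book plus one price document per
// (book, branch) pair, mirroring the extra relational table.
const priceDocs = [
  { _id: "book:1", type: "book", book_name: "Example Book", author: "A. Writer" },
  { _id: "price:book:1:b1", type: "price", book_id: "book:1", store_branch_id: "b1", price: 9.99 },
  { _id: "price:book:1:b2", type: "price", book_id: "book:1", store_branch_id: "b2", price: 11.5 }
];

// View-style map function: emit [book_id, store_branch_id] -> price
// for price documents only; other document types are skipped.
function priceMap(doc, emit) {
  if (doc.type === "price") {
    emit([doc.book_id, doc.store_branch_id], doc.price);
  }
}

// Local stand-in for the view index: run the map over every document
// and store emitted rows keyed by the JSON-encoded composite key.
function buildIndex(docs, mapFn) {
  const index = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => index.set(JSON.stringify(key), value));
  }
  return index;
}

// Look up the price of one book at one branch, or null if none exists.
function priceAt(index, bookId, branchId) {
  const value = index.get(JSON.stringify([bookId, branchId]));
  return value === undefined ? null : value;
}

const priceIndex = buildIndex(priceDocs, priceMap);
console.log(priceAt(priceIndex, "book:1", "b2")); // prints 11.5
```

Unlike a real view, which can hold several rows with the same key, this `Map` keeps only one row per key; that is enough here because each (book, branch) pair has a single price.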
Related
I'm taking my first steps with NoSQL databases, so I would like to hear about best practices for implementing the following requirement.
Let's suppose I have a messages database powered by the MongoDB engine. This DB contains a collection of documents, where each document has the following fields:
timestamp;
message author/source;
message content.
Now, I want to build a list of authors/sources in order to add some metadata about each source. In the case of the classical RDBMS, I would define a table tblSources where I would store the names of the message sources and all additional meta-data (or links to the relevant tables) for each author.
What is the right approach to such a task in the NoSQL/MongoDB world?
It really depends on how you want to use the data. NoSQL dbs are generally not designed with fast joins in mind but they are still capable of doing joins and storing foreign keys.
Your options here are really:
Duplicate data, a.k.a. store the author metadata in every document. This might be better when you are really trying to optimize lookups and use Mongo as a key-value store.
Join on a foreign key: this is pretty similar to how you would use an RDBMS.
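Both options can be sketched with plain JavaScript objects standing in for MongoDB documents. The field names (`source`, `country`) and the sample data are hypothetical; the join runs in application code, which is what "join on a foreign key" amounts to when the database does not do it for you.

```javascript
// Option 1 (duplication): each message embeds a copy of the source metadata.
const messageWithCopy = {
  timestamp: "2014-05-01T10:00:00Z",
  source: { name: "alice", country: "Italy" },
  content: "hello"
};

// Option 2 (foreign key): messages keep only the source name, and the
// metadata lives once in a separate "sources" collection.
const sourceDocs = [
  { _id: "alice", country: "Italy" },
  { _id: "bob", country: "USA" }
];
const messageDocs = [
  { timestamp: "2014-05-01T10:00:00Z", source: "alice", content: "hello" },
  { timestamp: "2014-05-01T10:05:00Z", source: "bob", content: "hi" },
  { timestamp: "2014-05-01T10:06:00Z", source: "carol", content: "hey" }
];

// Application-side join: index sources by _id once, then attach the
// matching metadata to each message (null when no source document exists).
function joinWithSources(messages, sources) {
  const byId = new Map(sources.map((s) => [s._id, s]));
  return messages.map((m) => ({ ...m, sourceDoc: byId.get(m.source) || null }));
}

const joined = joinWithSources(messageDocs, sourceDocs);
console.log(joined[0].sourceDoc.country, messageWithCopy.source.country); // prints Italy Italy
```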
I'm coming from a Postgres background, and I am currently contemplating whether I should use a NoSQL database such as MongoDB in my next project. For this I have a few questions:
Is it possible to perform queries in NoSQL that fetch all the documents sharing some common subdocument/attribute, for example "select all users where country = italy"?
Also, how is redundancy handled in NoSQL? Say I have a document that represents a given car model that multiple people can own. Would I then have to insert the exact same data describing that car model into all of these People documents?
Thanks
Sure, you can do queries with a where clause in MongoDB (and other NoSQL engines). Taking your example, you would store the users in a "collection" named "users" and query it in more or less the same way:
db.users.find( { "country" : "Italy" } );
MongoDB has a very rich and powerful query and aggregation engine ( http://docs.mongodb.org/manual/tutorial/query-documents/ ). I invite you to follow the tutorials ( http://docs.mongodb.org/manual/tutorial/getting-started/ ) or the free online training ( http://university.mongodb.com ) to learn more about it.
Inserting a document is also really easy:
db.users.insertOne( {"first_name" : "John", "last_name" : "Doe", "country" : "USA" } );
That's it!
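To make the "common subdocument/attribute" part concrete, here is a deliberately simplified, hypothetical matcher in plain JavaScript. It only does exact equality on top-level or dotted paths such as `"address.country"`, which is the notation MongoDB uses for fields inside embedded documents; real MongoDB queries also support operators and array matching, which this sketch ignores.

```javascript
// Read a possibly nested value by a dotted path, e.g. "address.country".
// Returns undefined as soon as a step is missing or not an object.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value !== null && typeof value === "object" ? value[key] : undefined),
    doc
  );
}

// True when every path in the filter equals the expected value exactly.
function matches(doc, filter) {
  return Object.entries(filter).every(([path, expected]) => getPath(doc, path) === expected);
}

const sampleUsers = [
  { first_name: "John", country: "USA", address: { city: "Boston", country: "USA" } },
  { first_name: "Maria", country: "Italy", address: { city: "Rome", country: "Italy" } }
];

// Same idea as db.users.find({ "country": "Italy" }) ...
const byTopLevel = sampleUsers.filter((u) => matches(u, { country: "Italy" }));
// ... and db.users.find({ "address.country": "Italy" }) for a subdocument field.
const byNested = sampleUsers.filter((u) => matches(u, { "address.country": "Italy" }));

console.log(byTopLevel.map((u) => u.first_name), byNested.length); // prints [ 'Maria' ] 1
```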
You talk about redundancy; as in your SQL world, it depends a lot on the design. In MongoDB you organize your documents and the links between them (linked or embedded documents) based on your business needs. It is hard to give an answer about document design in this context, so I invite you to read some interesting articles:
MongoDB Documentation : http://docs.mongodb.org/manual/data-modeling/
MongoDB Blog, 6 Rules of Thumb for MongoDB Schema Design (part 1,2,3)
http://blog.mongodb.org/post/87200945828/6-rules-of-thumb-for-mongodb-schema-design-part-1
http://blog.mongodb.org/post/87892923503/6-rules-of-thumb-for-mongodb-schema-design-part-2
http://blog.mongodb.org/post/88473035333/6-rules-of-thumb-for-mongodb-schema-design-part-3
The answer to the question about users and cars will be "It depends on your application".
If your application is mostly read-heavy and you need most of the data about cars & users, duplication (denormalization) is a good approach that makes development easier (and yes, you will need more work when you have to update the information...). The blog posts and documentation should help you find your way.
Yes, you can do queries in MongoDB on documents that have common subdocument/attribute.
MongoDB encourages embedding (denormalization) of data, as disk space is cheap and embedding docs can result in better query performance. Yes, embedding documents would mean inserting the same car model data into all the People documents. But if you want to avoid duplication/denormalization, you can go for 'referencing', a normalized model that stores data like this: a People collection contains people docs and a Car collection contains car docs, similar to what we have in an RDBMS. But the primary/foreign key relationship is not enforced by MongoDB. You will end up doing joins in code, and query performance will degrade as a result.
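The trade-off between embedding and referencing shows up when the shared data changes. This is a minimal in-memory sketch with hypothetical field names (`car`, `carId`, `engine`): with embedding, correcting the car model means rewriting every person's copy; with referencing, one document changes, but every read needs a second lookup.

```javascript
// Embedded model: every person document carries its own copy of the car model.
const peopleEmbedded = [
  { name: "Ann", car: { model: "Roadster", engine: "1.8L" } },
  { name: "Ben", car: { model: "Roadster", engine: "1.8L" } },
  { name: "Cid", car: { model: "Roadster", engine: "1.8L" } }
];

// Correcting the model description touches every embedded copy;
// returns how many person documents had to be modified.
function updateEmbedded(people, model, engine) {
  let modified = 0;
  for (const person of people) {
    if (person.car && person.car.model === model) {
      person.car.engine = engine;
      modified += 1;
    }
  }
  return modified;
}

// Referenced model: the car model is stored once, people hold its id.
const carModels = new Map([["roadster", { model: "Roadster", engine: "1.8L" }]]);
const peopleReferenced = [
  { name: "Ann", carId: "roadster" },
  { name: "Ben", carId: "roadster" },
  { name: "Cid", carId: "roadster" }
];

// One document changes, no matter how many people own the model.
function updateReferenced(cars, carId, engine) {
  const car = cars.get(carId);
  if (!car) return 0;
  car.engine = engine;
  return 1;
}

// The cost of referencing: reading a person's car is a second lookup,
// i.e. the "join in code".
function carOf(person, cars) {
  return cars.get(person.carId) || null;
}

console.log(updateEmbedded(peopleEmbedded, "Roadster", "2.0L")); // prints 3
console.log(updateReferenced(carModels, "roadster", "2.0L"));    // prints 1
console.log(carOf(peopleReferenced[1], carModels).engine);       // prints 2.0L
```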
I am new to MongoDB. I have one master collection, user_group. A sample document is shown below:
{group_name:"xyz","previlege":["Add","Delete"],...}
And a second collection, user_detail:
{"user_name":"pt123","group_name":"xyz",...}
How can I maintain the relation between these two collections? Should I use a reference from user_group in user_detail, or some other alternative?
Often, in MongoDB, the "has many" relationship is managed on the opposite side from a relational database. A MongoDB document will often have an array of ObjectIds or group names (or whatever you're using to identify the foreign document). This is the opposite of a relational database, where the other side usually has a "belongs to" column.
To be clear, this is not required. In your example, you could store an array of user detail IDs in your group document if that supported the most common query you were going to make. Basically, the question you should ask is "what query am I likely to need?" and design your documents to support it.
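A minimal sketch of the two shapes, using the question's `group_name` and `previlege` fields plus a hypothetical `members` array on the group side. Which one to pick follows the advice above: whichever makes your most common query a single read.

```javascript
// Shape A ("belongs to"): each user_detail document names its group.
const userDetailDocs = [
  { user_name: "pt123", group_name: "xyz" },
  { user_name: "ab456", group_name: "xyz" },
  { user_name: "cd789", group_name: "abc" }
];

// Shape B ("has many"): the user_group document lists its members.
// The members field is an assumption added for this sketch.
const userGroupDocs = [
  { group_name: "xyz", previlege: ["Add", "Delete"], members: ["pt123", "ab456"] },
  { group_name: "abc", previlege: ["Add"], members: ["cd789"] }
];

// With shape A, "which group is this user in?" reads one user document,
// while "who is in group X?" filters user_detail (ideally backed by an
// index on group_name).
function membersViaUsers(users, groupName) {
  return users.filter((u) => u.group_name === groupName).map((u) => u.user_name);
}

// With shape B, "who is in group X?" is answered by one group document.
function membersViaGroup(groups, groupName) {
  const group = groups.find((g) => g.group_name === groupName);
  return group ? group.members : [];
}

console.log(membersViaUsers(userDetailDocs, "xyz"), membersViaGroup(userGroupDocs, "xyz"));
```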
Simple answer: You don't.
The entire design philosophy changes when you start looking at MongoDB. If I were you, I would maintain the previlege field inside the user_detail documents themselves.
{"user_name":"abc","group_name":"xyz","previlege" : ["add","delete"]}
This may not be ideal if you keep changing group privileges, though. But the idea is that you design your data storage so that all the information for one "record" is stored in one object.
MongoDB, being NoSQL, does not have explicit joins. Workarounds are possible but not recommended (read: MapReduce).
Your best bet is to retrieve both documents from the Mongo collections on the client side and apply the user-specific privileges. Make sure you have an index on group_name in the user_group collection.
Or, better still, store the permissions [read, del, etc.] for the user in the same document after applying the join on the client side. But then you cannot update the collection externally, since this might break invariants. Every time a user group is updated, you will need to reapply those permissions (privileges) yourself on the client side and save them in the same document. Writes might suffer, but reads will be fast (assuming a few fields, like username, are indexed).
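The "join on the client, then store the result" idea can be sketched like this, with in-memory arrays standing in for the two collections. `syncPrivileges` is a hypothetical helper, not a MongoDB call; it has to run again every time a group's privileges change, which is the write cost described above.

```javascript
// In-memory stand-ins for the user_group and user_detail collections.
const groupDocs = [{ group_name: "xyz", previlege: ["Add", "Delete"] }];
const userDocs = [
  { user_name: "pt123", group_name: "xyz" },
  { user_name: "zz000", group_name: "missing" }
];

// Client-side join on group_name, then denormalize: copy each group's
// privileges into its users. Returns the number of user documents rewritten.
function syncPrivileges(groups, users) {
  const byName = new Map(groups.map((g) => [g.group_name, g]));
  let rewritten = 0;
  for (const user of users) {
    const group = byName.get(user.group_name);
    // Copy the array so the user document holds its own snapshot;
    // a user whose group does not exist gets no privileges.
    user.previlege = group ? [...group.previlege] : [];
    rewritten += 1;
  }
  return rewritten;
}

syncPrivileges(groupDocs, userDocs);

// A later change to the group is NOT visible in the users until the
// sync is run again.
groupDocs[0].previlege.push("Update");
const before = userDocs[0].previlege.length; // still 2
syncPrivileges(groupDocs, userDocs);
const after = userDocs[0].previlege.length;  // now 3
console.log(before, after); // prints 2 3
```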
I just started learning about NoSQL databases, specifically MongoDB (no particular reason for MongoDB). I browsed a few tutorial sites, but still can't figure out how it handles relationships between two documents/entities.
Let's say, for example:
1. One Employee works in one department
2. One Employee works in many departments
I don't know whether the term 'relationship' makes sense for MongoDB or not.
Can somebody please explain joins and relationships?
The short answer: with "nosql" you wouldn't do it that way.
What you'd do instead of a join or a relationship is add the departments the user is in to the user object.
You could also add the user to a field in the "department" object, if you needed to see users from that direction.
Denormalized data like this is typical in a "nosql" database.
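A minimal sketch of that denormalized shape, with hypothetical data. Giving every employee a `departments` array covers both of the question's cases (one entry or many); in MongoDB an equality query such as `{ departments: "Sales" }` matches documents whose array contains that value. The department-side view is derived here, whereas a stored copy would have to be kept in sync by the application.

```javascript
// Each employee document embeds the departments it belongs to.
// Case 1 (one department) is just an array with one entry.
const staff = [
  { name: "Alice", departments: ["Sales"] },
  { name: "Bob", departments: ["Sales", "Support"] },
  { name: "Carol", departments: ["Engineering"] }
];

// "Who works in department X?" filters on the embedded array.
function employeesIn(employees, department) {
  return employees.filter((e) => e.departments.includes(department)).map((e) => e.name);
}

// The reverse direction: department -> employee names. Built on the fly
// here; storing it in department documents would duplicate the data.
function employeesByDepartment(employees) {
  const result = {};
  for (const e of employees) {
    for (const d of e.departments) {
      (result[d] = result[d] || []).push(e.name);
    }
  }
  return result;
}

console.log(employeesIn(staff, "Sales")); // prints [ 'Alice', 'Bob' ]
```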
See this very closely related question: How do I perform the SQL Join equivalent in MongoDB?
In general, you want to denormalize the data in your collections (= tables). Your collections should be optimized so that you don't need to do joins (NoSQL databases generally don't perform joins for you).
In MongoDB you can either reference other collections (= tables) or embed documents in each other, whichever makes more sense in your domain. There is a size limit on documents (16 MB per document in MongoDB), so you can't just embed the Encyclopedia Britannica ;-)
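MongoDB's maximum BSON document size is 16 MB (16,777,216 bytes). A rough way to keep an eye on an embedded design is sketched below; it measures UTF-8 JSON, which is only an approximation of BSON size, and the 50% headroom is an arbitrary assumption, not a MongoDB rule.

```javascript
// MongoDB's per-document limit: 16 MB of BSON.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Approximate size as UTF-8 encoded JSON. BSON adds type bytes and
// field lengths, so treat this as an estimate, not an exact check.
function approxJsonBytes(doc) {
  return new TextEncoder().encode(JSON.stringify(doc)).length;
}

// Hypothetical rule of thumb: keep embedding while the parent document
// stays below a fraction of the limit; beyond that, consider moving the
// children into their own collection and referencing them.
function fitsComfortably(doc, headroom = 0.5) {
  return approxJsonBytes(doc) < MAX_BSON_BYTES * headroom;
}

const article = { title: "Short entry", sections: ["intro", "body"] };
console.log(approxJsonBytes(article), fitsComfortably(article));
```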
It's probably best if you look for API documentation and examples for the programming language of your choice.
For Ruby, I'd recommend the Mongoid library: http://mongoid.org/docs/relations.html
Generally, if you decide to learn about NoSQL databases, you should follow the "NoSQL way", i.e. learn the principles behind the movement and its approach to design, and not simply try to map an RDBMS onto your first NoSQL project.
Simply put - you should learn how to embed and denormalize data (like Will above suggested), and not simply copy the id to simulate foreign keys.
If you do it the "foreign _id way", the next step is to search for transactions to ensure that two "rows" are inserted/updated consistently. A few steps later, Oracle/MySQL is waiting. :)
There are some instances in which you want or need to keep the documents separate, in which case you take the _id from one object and add it as a value in the other object.
For example:
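// _id can hold any unique value; the GUID is stored as a string here,
// because ObjectId() expects a 24-character hex string, not a GUID
db.authors.insertOne({
  _id: "21EC2020-3AEA-1069-A2DD-08002B30309D",
  name: 'George R.R. Martin'
})
db.books.insertOne({
  name: 'A Dance with Dragons',
  authorId: "21EC2020-3AEA-1069-A2DD-08002B30309D"
})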
There is no official relationship between books and authors; it's just a copy of the _id from authors into the authorId value in books.
Hope that helps.
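Because nothing enforces that `authorId` points at an existing author, checking it is the application's job. A minimal in-memory sketch, reusing the GUID from the example as a plain string id; the second book and both helper names are hypothetical.

```javascript
// Authors and books as plain objects; the GUID is used as a string _id.
const authorDocs = [
  { _id: "21EC2020-3AEA-1069-A2DD-08002B30309D", name: "George R.R. Martin" }
];
const bookDocs = [
  { name: "A Dance with Dragons", authorId: "21EC2020-3AEA-1069-A2DD-08002B30309D" },
  // Hypothetical broken reference: MongoDB would accept this insert.
  { name: "Orphaned Book", authorId: "no-such-author" }
];

// Return the books whose authorId matches no author document.
function findDanglingReferences(books, authors) {
  const knownIds = new Set(authors.map((a) => a._id));
  return books.filter((b) => !knownIds.has(b.authorId));
}

// Resolve one book's author, or null when the reference is broken.
function authorOf(book, authors) {
  return authors.find((a) => a._id === book.authorId) || null;
}

console.log(findDanglingReferences(bookDocs, authorDocs).map((b) => b.name)); // prints [ 'Orphaned Book' ]
```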
I've been looking at the rise of the NoSQL movement and the accompanying rise in popularity of document databases like MongoDB, RavenDB, and others. While there are quite a few things about these that I like, I feel like I'm not understanding something important.
Let's say that you are implementing a store application, and you want to store products in the database, each of which has a single category. In relational databases, this would be accomplished by having two tables, a product and a category table, and the product table would have a field (called perhaps "category_id") which would reference the row in the category table holding the correct category entry. This has several benefits, including non-repetition of data.
It also means that if you misspelled the category name, for example, you could update the category table and then it's fixed, since that's the only place that value exists.
In document databases, though, this is not how it works. You completely denormalize, meaning in the "products" document, you would actually have a value holding the actual category string, leading to lots of repetition of data, and errors are much more difficult to correct. Thinking about this more, doesn't it also mean that running queries like "give me all products with this category" can lead to results that do not have integrity?
Of course the way around this is to re-implement the whole "category_id" thing in the document database, but when I get to that point in my thinking, I realize I should just stay with relational databases instead of re-implementing them.
This leads me to believe I'm missing some key point about document databases that sends me down this incorrect path. So I wanted to put it to Stack Overflow: what am I missing?
You completely denormalize, meaning in the "products" document, you would actually have a value holding the actual category string, leading to lots of repetition of data [...]
True, denormalizing means storing additional data. It also means fewer collections (tables in SQL), resulting in fewer relations between pieces of data. Each single document can contain information that would otherwise come from multiple SQL tables.
Now, if your database is distributed across multiple servers, it's more efficient to query a single server instead of multiple servers. With the denormalized structure of document databases, it's much more likely that you only need to query a single server to get all the data you need. With a SQL database, chances are that your related data is spread across multiple servers, making queries very inefficient.
[...] and errors are much more difficult to correct.
Also true. Most NoSQL solutions don't guarantee things such as referential integrity, which are common to SQL databases. As a result, your application is responsible for maintaining relations between data. However, as the number of relations in a document database is very small, it's not as hard as it may sound.
One of the advantages of a document database is that it is schema-less. You're completely free to define the contents of a document at all times; you're not tied to a predefined set of tables and columns as you are with a SQL database.
Real-world example
If you're building a CMS on top of a SQL database, you'll either have a separate table for each CMS content type, or a single table with generic columns in which you store all types of content. With separate tables, you'll have a lot of tables. Just think of all the join tables you'll need for things like tags and comments for each content type. With a single generic table, your application is responsible for correctly managing all of the data. Also, the raw data in your database is hard to update and quite meaningless outside of your CMS application.
With a document database, you can store each type of CMS content in a single collection, while maintaining a strongly defined structure within each document. You could also store all tags and comments within the document, making data retrieval very efficient. This efficiency and flexibility come at a price: your application is more responsible for managing the integrity of the data. On the other hand, scaling out is much cheaper with a document database than with a SQL database.
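A minimal sketch of such a collection, with hypothetical content types and fields: two types of content share one collection, each keeps its own fields, and tags and comments are embedded. The `addComment` helper illustrates the integrity point: only application code decides what a valid comment looks like.

```javascript
// One collection, several content types, each with its own fields.
const cmsContent = [
  {
    _id: "article-1",
    type: "article",
    title: "Hello",
    body: "First post",
    tags: ["intro", "news"],
    comments: [{ author: "ann", text: "Nice" }]
  },
  {
    _id: "video-1",
    type: "video",
    title: "Demo",
    duration_s: 120, // only videos have this field
    tags: ["demo"],
    comments: []
  }
];

// Retrieving by tag needs no join table: tags live in the document.
function idsWithTag(docs, tag) {
  return docs.filter((d) => (d.tags || []).includes(tag)).map((d) => d._id);
}

// The application, not the database, enforces what a comment looks like.
function addComment(doc, author, text) {
  if (typeof author !== "string" || typeof text !== "string") {
    throw new TypeError("author and text must be strings");
  }
  if (!Array.isArray(doc.comments)) doc.comments = [];
  doc.comments.push({ author, text });
  return doc.comments.length;
}

console.log(idsWithTag(cmsContent, "demo")); // prints [ 'video-1' ]
```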
Advice
As you can see, both SQL and NoSQL solutions have advantages and disadvantages. As David already pointed out, each type has its uses. I recommend that you analyze your requirements and create two data models, one for a SQL solution and one for a document database. Then choose the solution that fits best, keeping scalability in mind.
I'd say that the number one thing you're overlooking (at least based on the content of the post) is that document databases are not meant to replace relational databases. The example you give does, in fact, work really well in a relational database. It should probably stay there. Document databases are just another tool to accomplish tasks in another way, they're not suited for every task.
Document databases were made to address the problem that (looking at it the other way around), relational databases aren't the best way to solve every problem. Both designs have their use, neither is inherently better than the other.
Take a look at the Use Cases on the MongoDB website: http://www.mongodb.org/display/DOCS/Use+Cases
A document db gives a feeling of freedom when you start. You no longer have to write create table and alter table scripts. You simply embed details in the master 'records'.
But after a while you realize that you are locked in a different way. It becomes less easy to combine or aggregate the data in a way that you didn't think was needed when you stored the data. Data mining/business intelligence (searching for the unknown) becomes harder.
That means that it is also harder to check if your app has stored the data in the db in a correct way.
For instance, say you have two collections with approximately 10,000 'records' each. Now you want to know which ids are present in 'table' A but not in 'table' B.
Trivial with SQL, a lot harder with MongoDB.
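In SQL this is a one-liner such as `SELECT id FROM a WHERE id NOT IN (SELECT id FROM b)`. Without a join, one workable approach is to fetch just the ids from both collections (for example with an `{ _id: 1 }` projection) and compute the difference in application code. A minimal sketch, assuming both id lists fit in memory, which 10,000 entries each easily do:

```javascript
// Anti-join in application code: ids present in A but not in B.
function idsOnlyInA(docsA, docsB) {
  const idsInB = new Set(docsB.map((d) => d._id));
  return docsA.map((d) => d._id).filter((id) => !idsInB.has(id));
}

// Hypothetical data: A holds ids 0..9999, B holds only the even ids.
const collA = Array.from({ length: 10000 }, (_, i) => ({ _id: i }));
const collB = collA.filter((d) => d._id % 2 === 0);

const missingFromB = idsOnlyInA(collA, collB);
console.log(missingFromB.length); // prints 5000
```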
But I like MongoDB !!
OrientDB, for example, supports schema-less, schema-full or mixed mode. In some contexts you need constraints, validation, etc., but you also want the flexibility to add fields without touching the schema. This is schema-mixed mode.
Example:
{
'@rid': '#10:3',
'@class': 'Customer',
'@version': 3,
'name': 'Jay',
'surname': 'Miner',
'invented': [ 'Amiga' ]
}
In this example the fields "name" and "surname" are mandatory (because they are defined in the schema), but the field "invented" has been created only for this document. Your application doesn't need to know about it, but you can still execute queries against it:
SELECT FROM Customer WHERE invented IS NOT NULL
It will return only the documents with the field "invented".
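The schema-mixed idea can be imitated in plain JavaScript, as a rough sketch rather than anything OrientDB-specific: a hypothetical class definition lists the mandatory fields, validation checks only those, and extra fields such as `invented` pass through. `whereNotNull` is a local stand-in for the `IS NOT NULL` query.

```javascript
// Hypothetical mixed-mode class: only these fields are mandatory.
const customerClass = { name: "Customer", mandatory: ["name", "surname"] };

// Check mandatory fields only; any extra field is accepted as-is.
function validate(cls, doc) {
  const missing = cls.mandatory.filter((f) => doc[f] === undefined || doc[f] === null);
  return { ok: missing.length === 0, missing };
}

const customerDocs = [
  { name: "Jay", surname: "Miner", invented: ["Amiga"] }, // extra field
  { name: "Ann", surname: "Smith" }                        // schema fields only
];

// Local stand-in for: SELECT FROM Customer WHERE invented IS NOT NULL
// (a missing field counts as null here).
function whereNotNull(docs, field) {
  return docs.filter((d) => d[field] !== undefined && d[field] !== null);
}

console.log(customerDocs.every((d) => validate(customerClass, d).ok)); // prints true
console.log(whereNotNull(customerDocs, "invented").length);           // prints 1
```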