In the website of MongoDB they wrote that MonogDB is Document-oriented Database, so if the MongoDB is not an Object Oriented database, so what is it? and what are the differences between Document and Object oriented databases?
This may be a bit late in reply, but just thought it is worth pointing out, there are big differences between ODB and MongoDB.
In general, the focus of ODB is tranparent references (relations) between objects in an arbitarily complex domain model without having to use and manage code for something like a DBRef. Even if you have a couple thousand classes, you don't need to worry about managing any keys, they come for free and when you create instances of those 1000's of classes at runtime, they will automatically create the schema in the database .. even for things like a self-referencing object with collections of collections.
Also, your transactions can span these references, so you do not have to use a completely embedded model.
The concepts are those leveraged in ORM solutions like JPA, the managed persistent object life-cycle, is taken from the ODB space, but the HUGE difference is that there is no mapping AT ALL in the ODB and relations are stored as part of the database so there is no runtime JOIN to resolve relations, all relations are resolved with the same speed as a b-tree look-up. For those of you who have used Hibernate, imagine Hibernate without ANY mapping file and orders of magnitude faster becase there is no runtime JOIN behind the scenes.
Also, ODB allows queries across any relationship in your model, so you are not restricted to queries in a particular collection as you are in MongoDB. Of course, hash/b-tree/aggregate indexes are supported to so queries are very fast when they are used.
You can evolve instances of any class in an ODB at the class level and at runtime the correct class version is resolved. Quite different than the way it works in MongoDB maintaining code to decide how to deal with varied forms of blob ( or value object ) that result from evolving a schema-less database ... or writing the code to visit and change every value object because you wanted to change the schema.
As far as partioning goes, I think it is a lot easier to decide on a partitioning model for a domain model which can talk across arbitary objects, then it is to figure out the be-all, end-all embedding strategy for your collection contained documents in MongoDB. As a rediculous example, you have a Contact and an Address and a ShoppingCart and these are related in a JSON document and you decide to partition on Contact by Contact_id. There is absolutely nothing to keep you from treating those 3 classes as just objects instead of JSON documents and storing those with a partition on Contact_id just as you would with MongoDB. However, if you had another object Account and you wanted to manage those in a non-embedded way because of some aggregate billing operations done on accounts, you can have that for free ( no need to create code for a DBRef type ) in the ODB ... and you can choose to partition right along with the Contact or choose to store the Accounts in a completely separate physical node, yet it will all be connected at runtime in the application space ... just like magic.
If you want to see a really cool video on how to create an application with an ODB which shows distribution, object movement, fault tolerance, performance optimization .. see this ( if you want to skip to the cool part, jump about 21 minutes in and you will avoid the building of the application and just see the how easy it is to add distribution and fault tolerance to any existing application ):
http://www.blip.tv/file/3285543
I think doc-oriented and object-oriented databases are quite different. Fairly detailed post on this here:
http://blog.10gen.com/post/437029788/json-db-vs-odbms
Document-oriented
Documents (objects) map nicely to
programming language data types
Embedded documents and arrays reduce
need for joins
Dynamically-typed (schemaless) for
easy schema evolution
No joins and no (multi-object)
transactions for high performance and
easy scalability
(MongoDB Introduction)
In my understanding MongoDB treats every single record like a Document no matter it is 1 field or n fields. You can even have embedded Documents inside a Document. You don't have to define a schema which is very strictly controlled in other Relational DB Systems (MySQL, PorgeSQL etc.). I've used MongoDB for a while and I really like its philosophy.
Object Oriented is a database model in which information is represented in the form of objects as used in object-oriented programming (Wikipedia).
A document oriented database is a different concept to object and relational databases.
A document database may or may not contain field, whereas a relational or object database would expect missing fields to be filled with a null entry.
Imagine storing an XML or JSON string in a single field on a database table. That is similar to how a document database works. It simply allows semi-structured data to be stored in a database without having lots of null fields.
Related
Schemaless is a term that is currently floating around in the NoSql world.
What does this mean ?
I have a document with 3 properties today and I move to production with it, then what happens to my data when I need to add 2 more properties to my document?
Is this purely a migrations problem where I need to manage my data migration or can a NoSql database create as much friction as a RDBMS or make it easier in someway ?
Schema-less is a bit of a misnomer, it's better to think of it as:
SQL = Schema enforced by a RDBMS on Write
NoSQL = Partial Schema enforced by the DBMS on Write, PLUS schema fully enforced by the Application on Read (Externalised
schema)
So while a supposed Schema-less NoSQL data-store will in theory allow you to store any data you like (typically key value pairs, in a document) without prior knowledge of the keys, or data types, it will be pointless unless you have some mechanism to retrieve and use the data. So essentially the schema is partially moved from the RDBMS into the application code. I say partially as you'll have added Indexes to document collections and or partitioned the data for performance, so the NoSQL DBMS will have a partial schema defined locally, and possibly enforced via Unique constraints.
As to adding additional attributes to a document/object in the store. Depending on how much padding is around the document (un-used space), in its physical data block, adding a few more key value pairs to the documents may result in the document having to be physically moved to a larger contiguous block of storage, and the associated indexes re-built. If you plan to use the new keys in a frequently utilised query then you'll be wanting to also add a suitable new index, which will obviously require some physical storage, take a while to initially build and possibly lead you to ask the sysadmin to allocate more memory to the DBMS, to allow the new index(s) to be cached.
A bit late in the day but while searching on the topic again I found this article
http://tech.pro/tutorial/1189/basics-of-ravendb-nosql
Refer to section 3 in the article, I will quote it again for ease.
Adding and changing data models to RavenDB couldn't be simpler. Since
it is a NoSQL database, it can handle additions and deletions to your
models very simply. If a property is added to your class, it will be
set to the default value of that type. If a property is deleted, then
upon deserialization that value will be ignored. No more futzing with
SQL Scripts.
This seems to be the logical answer for RavenDB.
I just started to use MongoDB and I'm confused to build object models with list property.
I have a User model related to Followers and Following object which are list of User IDs.
So I can think of some object model structures to represent the relation.
Embedded Document. Followers and Following are embedded to User model. In this way, a "current_user" object is generated in many web frameworks in every request, and it's an extra overhead to serialize/deserialize the Follower and Following list property since we seldom use these properties in most requests. We can exclude these properties when "current_user" is generated. However, we need to fetch full "current_user" object again before we do any updates to it.
Use Reference Property in User model. We can have Followers and Following object models themselves, not embedded, but save references to the User object.
Use Reference Property in Followers and Following models. We can save User ID in Follower and Following property for later queries.
There might be some other ways to do it, easier to use or better performance. And my question is:
What's the suggested way to design a model with some related list properties?
For folks coming from the SQL world (such as myself) one of the hardest things to learn about MongoDB is the new style of schema design. In the SQL world, everything goes into third normal form. Folks come to think that there is a single right way to design their schema, because there typically is one.
In the MongoDB world, there is no one best schema design. More accurately, in MongoDB schema design depends on how the application is going to access the data.
Here are the key questions that you need to have answered in order to design a good schema for MongoDB:
How much data do you have?
What are your most common operations? Will you be mostly inserting new data, updating existing data, or doing queries?
What are your most common queries?
What are your most common updates?
How many I/O operations do you expect per second?
Here's how these questions might play out if you are considering one-to-many object relationships.
In SQL you simply create a pair of master/detail tables with a primary key/foreign key relationship. In MongoDB, you have a number of choices: you can embed the data, you can create a linked relationship, you can duplicate and denormalize the data, or you can use a hybrid approach.
The correct approach would depend on a lot of details about the use case of your application.
Here are some good general references on MongoDB schema design.
MongoDB presentations:
http://www.10gen.com/presentations/mongosf2011/schemabasics
http://www.10gen.com/presentations/mongosv-2011/schema-design-by-example
http://www.10gen.com/presentations/mongosf2011/schemascale
http://www.10gen.com/presentations/MongoNYC-2012/Building-a-MongoDB-Power-Chat-Server
Here are a couple of books about MongoDB schema design that I think you would find useful:
http://www.manning.com/banker/ (MongoDB in Action)
http://shop.oreilly.com/product/0636920018391.do (Document Design for MongoDB)
Here are some sample schema designs:
http://docs.mongodb.org/manual/use-cases/
https://openshift.redhat.com/community/blogs/designing-mongodb-schemas-with-embedded-non-embedded-and-bucket-structures
What would be the equivalent of ERD for a NoSQL database such as MongoDB?
It looks like you asked a similar question on Quora.
As mentioned there, the ERD is simply a mapping of the data you intend to store and the relations amongst that data.
You can still make an ERD with MongoDB as you still want to track the data and the relations. The big difference is that MongoDB has no joins, so when you translate the ERD into an actual schema you'll have to make some specific decisions about implement the relationships.
In particular, you'll need to make the "embed vs. reference" decision when deciding how this data will actually be stored. Relations are still allowed, just not enforced. Many of the wrappers for MongoDB actually provide lookups across collections to abstract some of this complexity.
Even though MongoDB does not enforce a schema, it's not recommended to proceed completely at random. Modeling the data you expect to have in the system is still a really good idea and that's what the ERD provides you.
So I guess the equivalent to the ERD is the ERD?
You could just use a UML class diagram instead too.
Moon Modeler supports schema design for MongoDB. It allows users to define diagrams with nested structures.
I know of no standard means of diagramming document-oriented "schema".
I'm sure you could use an ERD to map out your schemata but since document databases do not truly support--or more importantly enforce--relationships between data, it would only be as useful as your code was disciplined to internally enforce such relationships.
I have been thinking about the same issue for quite some time.
And I came to the following conclusion:
If NoSQL databases are generally schemaless, you don't actually have a 'schema' to illustrate in a diagram.
Thus, I think you should take a "by example" approach.
You could draw some mindmaps exemplifying how your data would look like when stored in a NoSQL DB such as MongoDB.
And since these databases are very dynamic you could also create some derived mindmaps to show how the data from today could evolve in time.
Take a look at this topic too.
Confusion about NoSQL Design
MongoDB does support 'joins', just not in the SQL sense of INNER JOIN (the default SQL join). While the concept of 'join' is typically associated with SQL, MongoDB does have the aggregation framework with its data processing pipeline stages. The $lookup pipeline stage is used to create the equivalent of a LEFT JOIN in SQL. That is, all documents on the left of a relationship will be pass through the pipeline, as well as any relating documents on the right side of the relationship. The documents are modified to include the relationship as part of the new documents.
Consequently, I postulate that Entity Relationship Diagrams do have a role in MongoDB. Documents are certainly related to each other in the db, and we should have a visualization of these relationships, including the cardinality relationship, e.g. full participation, partial participation, weak/strong entities, etc.
Of course, MongoDB also introduces the concept of embedded documents and referenced documents, and so I argue it adds additional flavor to the model of the ERD. And I certainly would want to see embedded and referenced relationships mapped out in a visual diagram.
The remaining question is so what is out there? What is out there for Mongoose for NodeJS? Mongoid for Ruby? etc. If you check the respective repositories for their corresponding ORMs (Object Relational Mappers), then you will see there are ERDs for them. But in terms of their completeness, perhaps there is a lot to be desired and the open source community is welcome to make contributions.
https://www.npmjs.com/package/mongoose-erd
https://rubygems.org/gems/railroady
I've been looking at the rise of the NoSql movement and the accompanying rise in popularity of document databases like mongodb, ravendb, and others. While there are quite a few things about these that I like, I feel like I'm not understanding something important.
Let's say that you are implementing a store application, and you want to store in the database products, all of which have a single, unique category. In Relational Databases, this would be accomplished by having two tables, a product and a category table, and the product table would have a field (called perhaps "category_id") which would reference the row in the category table holding the correct category entry. This has several benefits, including non-repetition of data.
It also means that if you misspelled the category name, for example, you could update the category table and then it's fixed, since that's the only place that value exists.
In document databases, though, this is not how it works. You completely denormalize, meaning in the "products" document, you would actually have a value holding the actual category string, leading to lots of repetition of data, and errors are much more difficult to correct. Thinking about this more, doesn't it also mean that running queries like "give me all products with this category" can lead to result that do not have integrity.
Of course the way around this is to re-implement the whole "category_id" thing in the document database, but when I get to that point in my thinking, I realize I should just stay with relational databases instead of re-implementing them.
This leads me to believe I'm missing some key point about document databases that leads me down this incorrect path. So I wanted to put it to stack-overflow, what am I missing?
You completely denormalize, meaning in the "products" document, you would actually have a value holding the actual category string, leading to lots of repetition of data [...]
True, denormalizing means storing additional data. It also means less collections (tables in SQL), thus resulting in less relations between pieces of data. Each single document can contain the information that would otherwise come from multiple SQL tables.
Now, if your database is distributed across multiple servers, it's more efficient to query a single server instead of multiple servers. With the denormalized structure of document databases, it's much more likely that you only need to query a single server to get all the data you need. With a SQL database, chances are that your related data is spread across multiple servers, making queries very inefficient.
[...] and errors are much more difficult to correct.
Also true. Most NoSQL solutions don't guarantee things such as referential integrity, which are common to SQL databases. As a result, your application is responsible for maintaining relations between data. However, as the amount of relations in a document database is very small, it's not as hard as it may sound.
One of the advantages of a document database is that it is schema-less. You're completely free to define the contents of a document at all times; you're not tied to a predefined set of tables and columns as you are with a SQL database.
Real-world example
If you're building a CMS on top of a SQL database, you'll either have a separate table for each CMS content type, or a single table with generic columns in which you store all types of content. With separate tables, you'll have a lot of tables. Just think of all the join tables you'll need for things like tags and comments for each content type. With a single generic table, your application is responsible for correctly managing all of the data. Also, the raw data in your database is hard to update and quite meaningless outside of your CMS application.
With a document database, you can store each type of CMS content in a single collection, while maintaining a strongly defined structure within each document. You could also store all tags and comments within the document, making data retrieval very efficient. This efficiency and flexibility comes at a price: your application is more responsible for managing the integrity of the data. On the other hand, the price of scaling out with a document database is much less, compared to a SQL database.
Advice
As you can see, both SQL and NoSQL solutions have advantages and disadvantages. As David already pointed out, each type has its uses. I recommend you to analyze your requirements and create two data models, one for a SQL solution and one for a document database. Then choose the solution that fits best, keeping scalability in mind.
I'd say that the number one thing you're overlooking (at least based on the content of the post) is that document databases are not meant to replace relational databases. The example you give does, in fact, work really well in a relational database. It should probably stay there. Document databases are just another tool to accomplish tasks in another way, they're not suited for every task.
Document databases were made to address the problem that (looking at it the other way around), relational databases aren't the best way to solve every problem. Both designs have their use, neither is inherently better than the other.
Take a look at the Use Cases on the MongoDB website: http://www.mongodb.org/display/DOCS/Use+Cases
A document db gives a feeling of freedom when you start. You no longer have to write create table and alter table scripts. You simply embed details in the master 'records'.
But after a while you realize that you are locked in a different way. It becomes less easy to combine or aggregate the data in a way that you didn't think was needed when you stored the data. Data mining/business intelligence (searching for the unknown) becomes harder.
That means that it is also harder to check if your app has stored the data in the db in a correct way.
For instance you have two collection with each approximately 10000 'records'. Now you want to know which ids are present in 'table' A that are not present in 'table' B.
Trivial with SQL, a lot harder with MongoDB.
But I like MongoDB !!
OrientDB, for example, supports schema-less, schema-full or mixed mode. In some contexts you need constraints, validation, etc. but you would need the flexibility to add fields without touch the schema. This is a schema mixed mode.
Example:
{
'#rid': 10:3,
'#class': 'Customer',
'#ver': 3,
'name': 'Jay',
'surname': 'Miner',
'invented': [ 'Amiga' ]
}
In this example the fields "name" and "surname" are mandatories (by defining them in the schema), but the field "invented" has been created only for this document. All your app need to don't know about it but you can execute queries against it:
SELECT FROM Customer WHERE invented IS NOT NULL
It will return only the documents with the field "invented".
I'm used to using relational databases like MySQL or PostgreSQL, and combined with MVC frameworks such as Symfony, RoR or Django, and I think it works great.
But lately I've heard a lot about MongoDB which is a non-relational database, or, to quote the official definition,
a scalable, high-performance, open
source, schema-free, document-oriented
database.
I'm really interested in being on edge and want to be aware of all the options I'll have for a next project and choose the best technologies out there.
In which cases using MongoDB (or similar databases) is better than using a "classic" relational databases?
And what are the advantages of MongoDB vs MySQL in general?
Or at least, why is it so different?
If you have pointers to documentation and/or examples, it would be of great help too.
Here are some of the advantages of MongoDB for building web applications:
A document-based data model. The basic unit of storage is analogous to JSON, Python dictionaries, Ruby hashes, etc. This is a rich data structure capable of holding arrays and other documents. This means you can often represent in a single entity a construct that would require several tables to properly represent in a relational db. This is especially useful if your data is immutable.
Deep query-ability. MongoDB supports dynamic queries on documents using a document-based query language that's nearly as powerful as SQL.
No schema migrations. Since MongoDB is schema-free, your code defines your schema.
A clear path to horizontal scalability.
You'll need to read more about it and play with it to get a better idea. Here's an online demo:
http://try.mongodb.org/
There are numerous advantages.
For instance your database schema will be more scalable, you won't have to worry about migrations, the code will be more pleasant to write... For instance here's one of my model's code :
class Setting
include MongoMapper::Document
key :news_search, String, :required => true
key :is_availaible_for_iphone, :required => true, :default => false
belongs_to :movie
end
Adding a key is just adding a line of code !
There are also other advantages that will appear in the long run, like a better scallability and speed.
... But keep in mind that a non-relational database is not better than a relational one. If your database has a lot of relations and normalization, it might make little sense to use something like MongoDB. It's all about finding the right tool for the job.
For more things to read I'd recommend taking a look at "Why I think Mongo is to Databases what Rails was to Frameworks" or this post on the mongodb website. To get excited and if you speak french, take a look at this article explaining how to set up MongoDB from scratch.
Edit: I almost forgot to tell you about this railscast by Ryan. It's very interesting and makes you want to start right away!
The advantage of schema-free is that you can dump whatever your load is in it, and no one will ever have any ground for complaining about it, or for saying that it was wrong.
It also means that whatever you dump in it, remains totally void of meaning after you have done so.
Some would label that a gross disadvantage, some others won't.
The fact that a relational database has a well-established schema, is a consequence of the fact that it has a well-established set of extensional predicates, which are what allows us to attach meaning to what is recorded in the database, and which are also a necessary prerequisite for us to do so.
Without a well-established schema, no extensional predicates, and without extensional precicates, no way for the user to make any meaning out of what was stuffed in it.
My experience with Postgres and Mongo after working with both the databases in my projects .
Postgres(RDBMS)
Postgres is recommended if your future applications have a complicated schema that needs lots of joins or all the data have relations or if we have heavy writing. Postgres is open source, faster, ACID compliant and uses less memory on disk, and is all around good performant for JSON storage also and includes full serializability of transactions with 3 levels of transaction isolation.
The biggest advantage of staying with Postgres is that we have best of both worlds. We can store data into JSONB with constraints, consistency and speed. On the other hand, we can use all SQL features for other types of data. The underlying engine is very stable and copes well with a good range of data volumes. It also runs on your choice of hardware and operating system. Postgres providing NoSQL capabilities along with full transaction support, storing JSON documents with constraints on the fields data.
General Constraints for Postgres
Scaling Postgres Horizontally is significantly harder, but doable.
Fast read operations cannot be fully achieved with Postgres.
NO SQL Data Bases
Mongo DB (Wired Tiger)
MongoDB may beat Postgres in dimension of “horizontal scale”. Storing JSON is what Mongo is optimized to do. Mongo stores its data in a binary format called BSONb which is (roughly) just a binary representation of a superset of JSON. MongoDB stores objects exactly as they were designed. According to MongoDB, for write-intensive applications, Mongo says the new engine(Wired Tiger) gives users an up to 10x increase in write performance(I should try this), with 80 percent reduction in storage utilization, helping to lower costs of storage, achieve greater utilization of hardware.
General Constraints of MongoDb
The usage of a schema less storage engine leads to the problem of implicit schemas. These schemas aren’t defined by our storage engine but instead are defined based on application behavior and expectations.
Stand-alone NoSQL technologies do not meet ACID standards because they sacrifice critical data protections in favor of high throughput performance for unstructured applications. It’s not hard to apply ACID on NoSQL databases but it would make database slow and inflexible up to some extent. “Most of the NoSQL limitations were optimized in the newer versions and releases which have overcome its previous limitations up to a great extent”.
It's all about trade offs. MongoDB is fast but not ACID, it has no transactions. It is better than MySQL in some use cases and worse in others.
Bellow Lines Written in MongoDB: The Definitive Guide.
There are several good reasons:
Keeping different kinds of documents in the same collection can be a
nightmare for developers and admins. Developers need to make sure
that each query is only returning documents of a certain kind or
that the application code performing a query can handle documents of
different shapes. If we’re querying for blog posts, it’s a hassle to
weed out documents containing author data.
It is much faster to get a list of collections than to extract a
list of the types in a collection. For example, if we had a type key
in the collection that said whether each document was a “skim,”
“whole,” or “chunky monkey” document, it would be much slower to
find those three values in a single collection than to have three
separate collections and query for their names
Grouping documents of the same kind together in the same collection
allows for data locality. Getting several blog posts from a
collection containing only posts will likely require fewer disk
seeks than getting the same posts from a collection con- taining
posts and author data.
We begin to impose some structure on our documents when we create
indexes. (This is especially true in the case of unique indexes.)
These indexes are defined per collection. By putting only documents
of a single type into the same collection, we can index our
collections more efficiently
After a question of databases with textual storage), I glanced at MongoDB and similar systems.
If I understood correctly, they are supposed to be easier to use and setup, and much faster. Perhaps also more secure as the lack of SQL prevents SQL injection...
Apparently, MongoDB is used mostly for Web applications.
Basically, and they state that themselves, these databases aren't suited for complex queries, data-mining, etc. But they shine at retrieving quickly lot of flat data.
MongoDB supports search by fields, regular expression searches.Includes user defined java script functions.
MongoDB can be used as a file system, taking advantage of load balancing and data replication features over multiple machines for storing files.