What does being schema-less mean for a NoSQL Database? - nosql

Schemaless is a term that is currently floating around in the NoSql world.
What does this mean ?
I have a document with 3 properties today and I move to production with it, then what happens to my data when I need to add 2 more properties to my document?
Is this purely a migrations problem where I need to manage my data migration or can a NoSql database create as much friction as a RDBMS or make it easier in someway ?

Schema-less is a bit of a misnomer, it's better to think of it as:
SQL = Schema enforced by a RDBMS on Write
NoSQL = Partial Schema enforced by the DBMS on Write, PLUS schema fully enforced by the Application on Read (Externalised
schema)
So while a supposed Schema-less NoSQL data-store will in theory allow you to store any data you like (typically key value pairs, in a document) without prior knowledge of the keys, or data types, it will be pointless unless you have some mechanism to retrieve and use the data. So essentially the schema is partially moved from the RDBMS into the application code. I say partially as you'll have added Indexes to document collections and or partitioned the data for performance, so the NoSQL DBMS will have a partial schema defined locally, and possibly enforced via Unique constraints.
As to adding additional attributes to a document/object in the store. Depending on how much padding is around the document (un-used space), in its physical data block, adding a few more key value pairs to the documents may result in the document having to be physically moved to a larger contiguous block of storage, and the associated indexes re-built. If you plan to use the new keys in a frequently utilised query then you'll be wanting to also add a suitable new index, which will obviously require some physical storage, take a while to initially build and possibly lead you to ask the sysadmin to allocate more memory to the DBMS, to allow the new index(s) to be cached.

A bit late in the day but while searching on the topic again I found this article
http://tech.pro/tutorial/1189/basics-of-ravendb-nosql
Refer to section 3 in the article, I will quote it again for ease.
Adding and changing data models to RavenDB couldn't be simpler. Since
it is a NoSQL database, it can handle additions and deletions to your
models very simply. If a property is added to your class, it will be
set to the default value of that type. If a property is deleted, then
upon deserialization that value will be ignored. No more futzing with
SQL Scripts.
This seems to be the logical answer for RavenDB.

Related

When should I create a new collections in MongoDB?

So just a quick best practice question here. How do I know when I should create new collections in MongoDB?
I have an app that queries TV show data. Should each show have its own collection, or should they all be store within one collection with relevant data in the same document. Please explain why you chose the approach you did. (I'm still very new to MongoDB. I'm used to MySql.)
The Two Most Popular Approaches to Schema Design in MongoDB
Embed data into documents and store them in a single collection.
Normalize data across multiple collections.
Embedding Data
There are several reasons why MongoDB doesn't support joins across collections, and I won't get into all of them here. But the main reason why we don't need joins is because we can embed relevant data into a single hierarchical JSON document. We can think of it as pre-joining the data before we store it. In the relational database world, this amounts to denormalizing our data. In MongoDB, this is about the most routine thing we can do.
Normalizing Data
Even though MongoDB doesn't support joins, we can still store related data across multiple collections and still get to it all, albeit in a round about way. This requires us to store a reference to a key from one collection inside another collection. It sounds similar to relational databases, but MongoDB doesn't enforce any of key constraints for us like most relational databases do. Enforcing key constraints is left entirely up to us. We're good enough to manage it though, right?
Accessing all related data in this way means we're required to make at least one query for every collection the data is stored across. It's up to each of us to decide if we can live with that.
When to Embed Data
Embed data when that embedded data will be accessed at the same time as the rest of the document. Pre-joining data that is frequently used together reduces the amount of code we have to write to query across multiple collections. It also reduces the number of round trips to the server.
Embed data when that embedded data only pertains to that single document. Like most rules, we need to give this some thought before blindly following it. If we're storing an address for a user, we don't need to create a separate collection to store addresses just because the user might have a roommate with the same address. Remember, we're not normalizing here, so duplicating data to some degree is ok.
Embed data when you need "transaction-like" writes. Prior to v4.0, MongoDB did not support transactions, though it does guarantee that a single document write is atomic. It'll write the document or it won't. Writes across multiple collections could not be made atomic, and update anomalies could occur for how many ever number of scenarios we can imagine. This is no longer the case since v4.0, however it is still more typical to denormalize data to avoid the need for transactions.
When to Normalize Data
Normalize data when data that applies to many documents changes frequently. So here we're talking about "one to many" relationships. If we have a large number of documents that have a city field with the value "New York" and all of a sudden the city of New York decides to change its name to "New-New York", well then we have to update a lot of documents. Got anomalies? In cases like this where we suspect other cities will follow suit and change their name, then we'd be better off creating a cities collection containing a single document for each city.
Normalize data when data grows frequently. When documents grow, they have to be moved on disk. If we're embedding data that frequently grows beyond its allotted space, that document will have to be moved often. Since these documents are bigger each time they're moved, the process only grows more complex and won't get any better over time. By normalizing those embedded parts that grow frequently, we eliminate the need for the entire document to be moved.
Normalize data when the document is expected to grow larger than 16MB. Documents have a 16MB limit in MongoDB. That's just the way things are. We should start breaking them up into multiple collections if we ever approach that limit.
The Most Important Consideration to Schema Design in MongoDB is...
How our applications access and use data. This requires us to think? Uhg! What data is used together? What data is used mostly as read-only? What data is written to frequently? Let your applications data access patterns drive your schema, not the other way around.
The scope you've described is definitely not too much for "one collection". In fact, being able to store everything in a single place is the whole point of a MongoDB collection.
For the most part, you don't want to be thinking about querying across combined tables as you would in SQL. Unlike in SQL, MongoDB lets you avoid thinking in terms of "JOINs"--in fact MongoDB doesn't even support them natively.
See this slideshare:
http://www.slideshare.net/mongodb/migrating-from-rdbms-to-mongodb?related=1
Specifically look at slides 24 onward. Note how a MongoDB schema is meant to replace the multi-table schemas customary to SQL and RDBMS.
In MongoDB a single document holds all information regarding a record. All records are stored in a single collection.
Also see this question:
MongoDB query multiple collections at once

Is MongoDB object-oriented?

In the website of MongoDB they wrote that MonogDB is Document-oriented Database, so if the MongoDB is not an Object Oriented database, so what is it? and what are the differences between Document and Object oriented databases?
This may be a bit late in reply, but just thought it is worth pointing out, there are big differences between ODB and MongoDB.
In general, the focus of ODB is tranparent references (relations) between objects in an arbitarily complex domain model without having to use and manage code for something like a DBRef. Even if you have a couple thousand classes, you don't need to worry about managing any keys, they come for free and when you create instances of those 1000's of classes at runtime, they will automatically create the schema in the database .. even for things like a self-referencing object with collections of collections.
Also, your transactions can span these references, so you do not have to use a completely embedded model.
The concepts are those leveraged in ORM solutions like JPA, the managed persistent object life-cycle, is taken from the ODB space, but the HUGE difference is that there is no mapping AT ALL in the ODB and relations are stored as part of the database so there is no runtime JOIN to resolve relations, all relations are resolved with the same speed as a b-tree look-up. For those of you who have used Hibernate, imagine Hibernate without ANY mapping file and orders of magnitude faster becase there is no runtime JOIN behind the scenes.
Also, ODB allows queries across any relationship in your model, so you are not restricted to queries in a particular collection as you are in MongoDB. Of course, hash/b-tree/aggregate indexes are supported to so queries are very fast when they are used.
You can evolve instances of any class in an ODB at the class level and at runtime the correct class version is resolved. Quite different than the way it works in MongoDB maintaining code to decide how to deal with varied forms of blob ( or value object ) that result from evolving a schema-less database ... or writing the code to visit and change every value object because you wanted to change the schema.
As far as partioning goes, I think it is a lot easier to decide on a partitioning model for a domain model which can talk across arbitary objects, then it is to figure out the be-all, end-all embedding strategy for your collection contained documents in MongoDB. As a rediculous example, you have a Contact and an Address and a ShoppingCart and these are related in a JSON document and you decide to partition on Contact by Contact_id. There is absolutely nothing to keep you from treating those 3 classes as just objects instead of JSON documents and storing those with a partition on Contact_id just as you would with MongoDB. However, if you had another object Account and you wanted to manage those in a non-embedded way because of some aggregate billing operations done on accounts, you can have that for free ( no need to create code for a DBRef type ) in the ODB ... and you can choose to partition right along with the Contact or choose to store the Accounts in a completely separate physical node, yet it will all be connected at runtime in the application space ... just like magic.
If you want to see a really cool video on how to create an application with an ODB which shows distribution, object movement, fault tolerance, performance optimization .. see this ( if you want to skip to the cool part, jump about 21 minutes in and you will avoid the building of the application and just see the how easy it is to add distribution and fault tolerance to any existing application ):
http://www.blip.tv/file/3285543
I think doc-oriented and object-oriented databases are quite different. Fairly detailed post on this here:
http://blog.10gen.com/post/437029788/json-db-vs-odbms
Document-oriented
Documents (objects) map nicely to
programming language data types
Embedded documents and arrays reduce
need for joins
Dynamically-typed (schemaless) for
easy schema evolution
No joins and no (multi-object)
transactions for high performance and
easy scalability
(MongoDB Introduction)
In my understanding MongoDB treats every single record like a Document no matter it is 1 field or n fields. You can even have embedded Documents inside a Document. You don't have to define a schema which is very strictly controlled in other Relational DB Systems (MySQL, PorgeSQL etc.). I've used MongoDB for a while and I really like its philosophy.
Object Oriented is a database model in which information is represented in the form of objects as used in object-oriented programming (Wikipedia).
A document oriented database is a different concept to object and relational databases.
A document database may or may not contain field, whereas a relational or object database would expect missing fields to be filled with a null entry.
Imagine storing an XML or JSON string in a single field on a database table. That is similar to how a document database works. It simply allows semi-structured data to be stored in a database without having lots of null fields.

CouchDB and MongoDB really search over each document with JavaScript?

From what I understand about these two "Not only SQL" databases. They search over each record and pass it to a JavaScript function you write which calculates which results are to be returned by looking at each one.
Is that actually how it works? Sounds worse than using a plain RBMS without any indexed keys.
I built my schemas so they don't require join operations which leaves me with simple searches on indexed int columns. In other words, the columns are in RAM and a quick value check through them (WHERE user_id IN (12,43,5,2) or revision = 4) gives the database a simple list of ID's which it uses to find in the actual rows in the massive data collection.
So I'm trying to imagine how in the world looking through every single row in the database could be considered acceptable (if indeed this is how it works). Perhaps someone can correct me because I know I must be missing something.
#Xeoncross
I built my schemas so they don't require join operations which leaves me with simple searches on indexed int columns. In other words, the columns are in RAM and a quick value check through them (WHERE user_id IN (12,43,5,2) or revision = 4)
Well then, you'll love MongoDB. MongoDB support indexes so you can index user_id and revision and this query will be able to return relatively quickly.
However, please note that many NoSQL DBs only support Key lookups and don't necessarily support "secondary indexes" so you have to do you homework on this one.
So I'm trying to imagine how in the world looking through every single row in the database could be considered acceptable (if indeed this is how it works).
Well if you run a query in an SQL-based database and you don't have an index that database will perform a table scan (i.e.: looking through every row).
They search over each record and pass it to a JavaScript function you write which calculates which results are to be returned by looking at each one.
So in practice most NoSQL databases support this. But please never use it for real-time queries. This option is primarily for performing map-reduce operations that are used to summarize data.
Here's maybe a different take on NoSQL. SQL is really good at relational operations, however relational operations don't scale very well. Many of the NoSQL are focused on Key-Value / Document-oriented concepts instead.
SQL works on the premise that you want normalized non-repeated data and that you to grab that data in big sets. NoSQL works on the premise that you want fast queries for certain "chunks" of data, but that you're willing to wait for data dependent on "big sets" (running map-reduces in the background).
It's a big trade-off, but if makes a lot of sense on modern web apps. Most of the time is spent loading one page (blog post, wiki entry, SO question) and most of the data is really tied to or "hanging off" that element. So the concept of grabbing everything you need with one query horizontally-scalable query is really useful.
It's the not the solution for everything, but it is a really good option for lots of use cases.
In terms of CouchDB, the Map function can be Javascript, but it can also be Erlang. (or another language altogether, if you pull in a 3rd Party View Server)
Additionally, Views are calculated incrementally. In other words, the map function is run on all the documents in the database upon creation, but further updates to the database only affect the related portions of the view.
The contents of a view are, in some ways, similar to an indexed field in an RDBMS. The output is a set of key/value pairs that can be searched very quickly, as they are stored as b-trees, which some RDBMSs use to store their indexes.
Think CouchDB stores the docs in a btree according to the "index" (view) and just walks this tree.. so it's not searching..
see http://guide.couchdb.org/draft/btree.html
You should study them up a bit more. It's not "worse" than and RDMBS it's different ... in fact, given certain domains/functions the "NoSQL" paradigm works out to be much quicker than traditional and in some opinions, outdated, RDMBS implementations. Think Google's Big Table platform and you get what MongoDB, Riak, CouchDB, Cassandra (Facebook) and many, many others are trying to accomplish. The primary difference is that most of these NoSQL solutions focus on Key/Value stores (some call these "document" databases) and have limited to no concept of relationships (in the primary/foreign key respect) and joins. Join operations on tables can be very expensive. Also, let's not forget the object relational impedance mismatch issue... You don't need an ORM to access MongoDB. It can actually store your code object (or document) as it is in memory. Can you imagine the savings in lines of code and complexity!? db4o is another lightweight solution that does this.
I don't know what you mean when you say "Not only SQL" database? It's a NoSQL paradigm - wherein no SQL is used to query the underlying data store of the system. NoSQL also means not an RDBMS which SQL is generally built on top of. Although, MongoDB does has an SQL like syntax that can be used from .NET when retrieving data - it's called NoRM.
I will say I've only really worked with Riak and MongoDB... I'm by no means familiar with Cassandra or CouchDB past a reading level and feature set comprehension. I prefer to use MongoDB over them all. Riak was nice too but not for what I needed. You should download a few of these NoSQL solutions and you will get the concept. Check out db4o, MongoDB and Riak as I've found them to be the easiest with more support for .NET based languages. It will just make sense for certain applications. All in all, the NoSQL or Document databse or OODBMS ... whatever you want to call it is very appealing and gaining lots of movement.
I also forgot about your javascript question... MongoDB has JavaScript "bindings" that enable it to be used as one method of searching for data. Riak handles data via a JSON format. MongoDB uses BSON I believe and I can't remember what the others use. In any case, the point is instead of SQL (structured query language) to "ask" the database for information some of these (MongoDB being one) use Javascript and/or RESTful syntax to ask the NoSQL system for data. I believe CouchDB and Riak can be queried over HTTP to which makes them very accessible. Not to mention, that's pretty frickin cool.
Do your research.... download them, they are all free and OSS.
db4o: http://www.db4o.com/ (Java & .NET versions)
MongoDB: mongodb.org/
Riak: http://www.basho.com/Riak.html
NoRM: http://thechangelog.com/post/436955815/norm-bringing-mongodb-to-net-linq-and-mono

Am I missing something about Document Databases?

I've been looking at the rise of the NoSql movement and the accompanying rise in popularity of document databases like mongodb, ravendb, and others. While there are quite a few things about these that I like, I feel like I'm not understanding something important.
Let's say that you are implementing a store application, and you want to store in the database products, all of which have a single, unique category. In Relational Databases, this would be accomplished by having two tables, a product and a category table, and the product table would have a field (called perhaps "category_id") which would reference the row in the category table holding the correct category entry. This has several benefits, including non-repetition of data.
It also means that if you misspelled the category name, for example, you could update the category table and then it's fixed, since that's the only place that value exists.
In document databases, though, this is not how it works. You completely denormalize, meaning in the "products" document, you would actually have a value holding the actual category string, leading to lots of repetition of data, and errors are much more difficult to correct. Thinking about this more, doesn't it also mean that running queries like "give me all products with this category" can lead to result that do not have integrity.
Of course the way around this is to re-implement the whole "category_id" thing in the document database, but when I get to that point in my thinking, I realize I should just stay with relational databases instead of re-implementing them.
This leads me to believe I'm missing some key point about document databases that leads me down this incorrect path. So I wanted to put it to stack-overflow, what am I missing?
You completely denormalize, meaning in the "products" document, you would actually have a value holding the actual category string, leading to lots of repetition of data [...]
True, denormalizing means storing additional data. It also means less collections (tables in SQL), thus resulting in less relations between pieces of data. Each single document can contain the information that would otherwise come from multiple SQL tables.
Now, if your database is distributed across multiple servers, it's more efficient to query a single server instead of multiple servers. With the denormalized structure of document databases, it's much more likely that you only need to query a single server to get all the data you need. With a SQL database, chances are that your related data is spread across multiple servers, making queries very inefficient.
[...] and errors are much more difficult to correct.
Also true. Most NoSQL solutions don't guarantee things such as referential integrity, which are common to SQL databases. As a result, your application is responsible for maintaining relations between data. However, as the amount of relations in a document database is very small, it's not as hard as it may sound.
One of the advantages of a document database is that it is schema-less. You're completely free to define the contents of a document at all times; you're not tied to a predefined set of tables and columns as you are with a SQL database.
Real-world example
If you're building a CMS on top of a SQL database, you'll either have a separate table for each CMS content type, or a single table with generic columns in which you store all types of content. With separate tables, you'll have a lot of tables. Just think of all the join tables you'll need for things like tags and comments for each content type. With a single generic table, your application is responsible for correctly managing all of the data. Also, the raw data in your database is hard to update and quite meaningless outside of your CMS application.
With a document database, you can store each type of CMS content in a single collection, while maintaining a strongly defined structure within each document. You could also store all tags and comments within the document, making data retrieval very efficient. This efficiency and flexibility comes at a price: your application is more responsible for managing the integrity of the data. On the other hand, the price of scaling out with a document database is much less, compared to a SQL database.
Advice
As you can see, both SQL and NoSQL solutions have advantages and disadvantages. As David already pointed out, each type has its uses. I recommend you to analyze your requirements and create two data models, one for a SQL solution and one for a document database. Then choose the solution that fits best, keeping scalability in mind.
I'd say that the number one thing you're overlooking (at least based on the content of the post) is that document databases are not meant to replace relational databases. The example you give does, in fact, work really well in a relational database. It should probably stay there. Document databases are just another tool to accomplish tasks in another way, they're not suited for every task.
Document databases were made to address the problem that (looking at it the other way around), relational databases aren't the best way to solve every problem. Both designs have their use, neither is inherently better than the other.
Take a look at the Use Cases on the MongoDB website: http://www.mongodb.org/display/DOCS/Use+Cases
A document db gives a feeling of freedom when you start. You no longer have to write create table and alter table scripts. You simply embed details in the master 'records'.
But after a while you realize that you are locked in a different way. It becomes less easy to combine or aggregate the data in a way that you didn't think was needed when you stored the data. Data mining/business intelligence (searching for the unknown) becomes harder.
That means that it is also harder to check if your app has stored the data in the db in a correct way.
For instance you have two collection with each approximately 10000 'records'. Now you want to know which ids are present in 'table' A that are not present in 'table' B.
Trivial with SQL, a lot harder with MongoDB.
But I like MongoDB !!
OrientDB, for example, supports schema-less, schema-full or mixed mode. In some contexts you need constraints, validation, etc. but you would need the flexibility to add fields without touch the schema. This is a schema mixed mode.
Example:
{
'#rid': 10:3,
'#class': 'Customer',
'#ver': 3,
'name': 'Jay',
'surname': 'Miner',
'invented': [ 'Amiga' ]
}
In this example the fields "name" and "surname" are mandatories (by defining them in the schema), but the field "invented" has been created only for this document. All your app need to don't know about it but you can execute queries against it:
SELECT FROM Customer WHERE invented IS NOT NULL
It will return only the documents with the field "invented".

What are the advantages of using a schema-free database like MongoDB compared to a relational database?

I'm used to using relational databases like MySQL or PostgreSQL, and combined with MVC frameworks such as Symfony, RoR or Django, and I think it works great.
But lately I've heard a lot about MongoDB which is a non-relational database, or, to quote the official definition,
a scalable, high-performance, open
source, schema-free, document-oriented
database.
I'm really interested in being on edge and want to be aware of all the options I'll have for a next project and choose the best technologies out there.
In which cases using MongoDB (or similar databases) is better than using a "classic" relational databases?
And what are the advantages of MongoDB vs MySQL in general?
Or at least, why is it so different?
If you have pointers to documentation and/or examples, it would be of great help too.
Here are some of the advantages of MongoDB for building web applications:
A document-based data model. The basic unit of storage is analogous to JSON, Python dictionaries, Ruby hashes, etc. This is a rich data structure capable of holding arrays and other documents. This means you can often represent in a single entity a construct that would require several tables to properly represent in a relational db. This is especially useful if your data is immutable.
Deep query-ability. MongoDB supports dynamic queries on documents using a document-based query language that's nearly as powerful as SQL.
No schema migrations. Since MongoDB is schema-free, your code defines your schema.
A clear path to horizontal scalability.
You'll need to read more about it and play with it to get a better idea. Here's an online demo:
http://try.mongodb.org/
There are numerous advantages.
For instance your database schema will be more scalable, you won't have to worry about migrations, the code will be more pleasant to write... For instance here's one of my model's code :
class Setting
include MongoMapper::Document
key :news_search, String, :required => true
key :is_availaible_for_iphone, :required => true, :default => false
belongs_to :movie
end
Adding a key is just adding a line of code !
There are also other advantages that will appear in the long run, like a better scallability and speed.
... But keep in mind that a non-relational database is not better than a relational one. If your database has a lot of relations and normalization, it might make little sense to use something like MongoDB. It's all about finding the right tool for the job.
For more things to read I'd recommend taking a look at "Why I think Mongo is to Databases what Rails was to Frameworks" or this post on the mongodb website. To get excited and if you speak french, take a look at this article explaining how to set up MongoDB from scratch.
Edit: I almost forgot to tell you about this railscast by Ryan. It's very interesting and makes you want to start right away!
The advantage of schema-free is that you can dump whatever your load is in it, and no one will ever have any ground for complaining about it, or for saying that it was wrong.
It also means that whatever you dump in it, remains totally void of meaning after you have done so.
Some would label that a gross disadvantage, some others won't.
The fact that a relational database has a well-established schema, is a consequence of the fact that it has a well-established set of extensional predicates, which are what allows us to attach meaning to what is recorded in the database, and which are also a necessary prerequisite for us to do so.
Without a well-established schema, no extensional predicates, and without extensional precicates, no way for the user to make any meaning out of what was stuffed in it.
My experience with Postgres and Mongo after working with both the databases in my projects .
Postgres(RDBMS)
Postgres is recommended if your future applications have a complicated schema that needs lots of joins or all the data have relations or if we have heavy writing. Postgres is open source, faster, ACID compliant and uses less memory on disk, and is all around good performant for JSON storage also and includes full serializability of transactions with 3 levels of transaction isolation.
The biggest advantage of staying with Postgres is that we have best of both worlds. We can store data into JSONB with constraints, consistency and speed. On the other hand, we can use all SQL features for other types of data. The underlying engine is very stable and copes well with a good range of data volumes. It also runs on your choice of hardware and operating system. Postgres providing NoSQL capabilities along with full transaction support, storing JSON documents with constraints on the fields data.
General Constraints for Postgres
Scaling Postgres Horizontally is significantly harder, but doable.
Fast read operations cannot be fully achieved with Postgres.
NO SQL Data Bases
Mongo DB (Wired Tiger)
MongoDB may beat Postgres in dimension of “horizontal scale”. Storing JSON is what Mongo is optimized to do. Mongo stores its data in a binary format called BSONb which is (roughly) just a binary representation of a superset of JSON. MongoDB stores objects exactly as they were designed. According to MongoDB, for write-intensive applications, Mongo says the new engine(Wired Tiger) gives users an up to 10x increase in write performance(I should try this), with 80 percent reduction in storage utilization, helping to lower costs of storage, achieve greater utilization of hardware.
General Constraints of MongoDb
The usage of a schema less storage engine leads to the problem of implicit schemas. These schemas aren’t defined by our storage engine but instead are defined based on application behavior and expectations.
Stand-alone NoSQL technologies do not meet ACID standards because they sacrifice critical data protections in favor of high throughput performance for unstructured applications. It’s not hard to apply ACID on NoSQL databases but it would make database slow and inflexible up to some extent. “Most of the NoSQL limitations were optimized in the newer versions and releases which have overcome its previous limitations up to a great extent”.
It's all about trade offs. MongoDB is fast but not ACID, it has no transactions. It is better than MySQL in some use cases and worse in others.
Bellow Lines Written in MongoDB: The Definitive Guide.
There are several good reasons:
Keeping different kinds of documents in the same collection can be a
nightmare for developers and admins. Developers need to make sure
that each query is only returning documents of a certain kind or
that the application code performing a query can handle documents of
different shapes. If we’re querying for blog posts, it’s a hassle to
weed out documents containing author data.
It is much faster to get a list of collections than to extract a
list of the types in a collection. For example, if we had a type key
in the collection that said whether each document was a “skim,”
“whole,” or “chunky monkey” document, it would be much slower to
find those three values in a single collection than to have three
separate collections and query for their names
Grouping documents of the same kind together in the same collection
allows for data locality. Getting several blog posts from a
collection containing only posts will likely require fewer disk
seeks than getting the same posts from a collection con- taining
posts and author data.
We begin to impose some structure on our documents when we create
indexes. (This is especially true in the case of unique indexes.)
These indexes are defined per collection. By putting only documents
of a single type into the same collection, we can index our
collections more efficiently
After a question of databases with textual storage), I glanced at MongoDB and similar systems.
If I understood correctly, they are supposed to be easier to use and setup, and much faster. Perhaps also more secure as the lack of SQL prevents SQL injection...
Apparently, MongoDB is used mostly for Web applications.
Basically, and they state that themselves, these databases aren't suited for complex queries, data-mining, etc. But they shine at retrieving quickly lot of flat data.
MongoDB supports search by fields, regular expression searches.Includes user defined java script functions.
MongoDB can be used as a file system, taking advantage of load balancing and data replication features over multiple machines for storing files.