Comment system: RDBMS vs NoSQL (MongoDB)

What is the best way to implement a commenting system (huge data writing)?
1) Use an RDBMS such as MySQL, with two tables: one for the topics and one for the comments.
Pros: inserting a new comment is fast, simple and efficient, and indexing is efficient. Cons: scaling out (horizontal scaling) is hard.
2) Use a NoSQL database such as CouchDB or MongoDB. Pros: scaling out (horizontal scaling) is easy, it supports huge data writes, and it is schemaless. Cons: I think that inserting new data is not as fast and efficient as in the RDBMS.
For example, to update a CouchDB document you need to fetch the whole document, update it locally, then submit it again, and the document size will be huge, so it will consume bandwidth.
Also, I think that CouchDB doesn't do in-place updates, and that MongoDB updates would be slow and not as efficient as in an RDBMS.
Also, when you want to get the comments of each user across various topics, I think the search would be faster in an RDBMS than in the NoSQL system.
Here is a sample CouchDB document (one document per topic):
{"_id":"doc id",
"_rev":"45521231465421"
"topic_title":"the title of the topic"
"topic_body":"the body of the topic"
"comments":[
{"date":"mm/dd/yy hh:mm:ss"}, {"commment":"bla1"}, {"user":"user1"}
{"date":"mm/dd/yy hh:mm:ss"}, {"commment":"bla2"}, {"user":"user2"}
{"date":"mm/dd/yy hh:mm:ss"}, {"commment":"bla3"}, {"user":"user3"}
{"date":"mm/dd/yy hh:mm:ss"}, {"commment":"bla4"}, {"user":"user4"}
{"date":"mm/dd/yy hh:mm:ss"}, {"commment":"bla5"}, {"user":"user5"}
{"date":"mm/dd/yy hh:mm:ss"}, {"commment":"bla6"}, {"user":"user6"}
]
}

I think that the insertion of new data is not as fast and efficient as in the RDBMS
You have hit something there. Insertion speed in NoSQL databases depends on your scenario. I cannot make that clear enough: so many people expect MongoDB to just perform magically faster than SQL and are sorely disappointed when it doesn't for them; in fact, the mongodb-user Google group has been filled with such people before now.
For example to update couchdb
Not only that but CouchDB also uses versioning and JSON which is not as efficient as storing it in SQL and will consume more space per record.
MongoDB updates would be slow and won't be as efficient as in an RDBMS
Schema, Queries, Schema, Queries...
That is what it comes down to. Ask yourself one question.
Will I be expecting a lot of comments per post?
If so, the in-memory (yes, in-memory) $push, $pull and other subdocument operators may get slow on a large subdocument (let's be honest, they will).
Not only that, but constantly growing documents can be a problem and can cause heavy fragmentation and space usage, creating a "swiss cheese" effect that slows your system down massively (bringing it to a grinding halt). This presentation should help you understand more about how storage really works: http://www.10gen.com/presentations/storage-engine-internals
So you already know that, if used wrong, subdocuments can be a bad idea. That being said you could partially remedy it with power of 2 sizes allocation: http://docs.mongodb.org/manual/reference/command/collMod/#usePowerOf2Sizes but if you are getting way too many comment insertions then it won't help too much.
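For reference, here is a rough sketch of turning that allocation strategy on from the mongo shell; the collection name "topics" is just an example, not something from the question:
// usePowerOf2Sizes makes the (MMAPv1-era) storage engine round record
// allocations up to powers of two, leaving headroom for growing documents.
db.runCommand({ collMod: "topics", usePowerOf2Sizes: true })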
I personally would not embed this relationship.
So I would go for the same set-up as an RDBMS, and now you start to see the problem. Insertions would probably be about the same speed if it weren't for MongoDB's fsync queue, unlike SQL which writes straight to disk. You can set up MongoDB with journalled writes, but then you will probably get the same performance metrics as SQL at the end of the day.
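As a rough sketch of that non-embedded set-up (collection and field names are assumptions, not from the question):
// One document per topic, one document per comment, RDBMS-style.
var topicId = ObjectId();
db.topics.insert({ _id: topicId, title: "the title of the topic", body: "the body of the topic" });

// Indexes so "comments for a topic" and "comments by a user" stay index lookups.
db.comments.ensureIndex({ topic_id: 1, date: 1 });
db.comments.ensureIndex({ user: 1 });

db.comments.insert({ topic_id: topicId, user: "user1", comment: "bla1", date: new Date() });

// Comments for one topic, newest first:
db.comments.find({ topic_id: topicId }).sort({ date: -1 });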
As for querying, this is where MongoDB can still come out on top, providing your working set fits into RAM. I cannot bold that last bit enough!!
Unlike SQL, MongoDB maps everything (your entire data set) into virtual memory, not RAM (definitely not to be confused with RAM). This makes it faster for larger lookups; for smaller lookups the speed will be about the same, because both will be serving from an in-memory cache.
Also when you want to get the comments of each user in various topics I think the search would be faster in RDBMS than in the nosql system.
If the topic id is in the comment document it would definitely be faster in MongoDB, providing your working set is already in RAM.
What is meant by the working set? Here is a good answer: What does it mean to fit "working set" into RAM for MongoDB?
Hope this helps,

I can speak only about MongoDB, and you are indeed wrong about inserts. Here is a nice comparison of Mongo with MSSQL, and Mongo performs 100x better than MSSQL. So it's quite suitable for large data processing.
Searching is also much faster (what would be the point of NoSQL if inserting and searching weren't faster?) - but with one caveat: you can't perform joins in queries, you have to join tables manually in your application (but there is a recommended workaround - nested documents).

Related

NoSQL vs. Relational Databases vs. Possible Hybrid

I'm hearing more about NoSQL, but have yet to have someone give me a clear explanation of how it is to be used instead of relational databases.
I've read that it can't do left joins, so I was trying to figure out how you'd be able to use such a data storage. From reading: Preserve Joins by code in MongoDB it seems like a suggestion is to just make a large table, as if you already did the joins on it.
If the above statement is true, then I can see how it can be used. However, I'm curious how you'd handle repeat data, as the concept of normalizing helps you remove the redundancy and ensure consistency in the data (e.g. slight modifications like capitalization, white space, etc.)...
Are we simply sacrificing the consistency of the data for scalable speed, or am I missing something?
Edit
I've been doing some more digging and found the answers the following questions useful for clarifying my understanding:
Why Google's BigTable referred as a NoSQL database?
How do you track record relations in NoSQL?
My understanding of consistency seems to be correct based on those answers. And it looks like NoSQL is supposed to be used for specific problem types, and that if you need relations you should use a relational database.
But this raises more questions like:
It makes me wonder about real life examples of when to use NoSQL versus when not to?
By denormalizing the data, you should be able to solve all of the same problems that relational databases do... But there are rules on how to normalize data with relational databases. Are there rules that one can use to help them denormalize the data to use a NoSQL solution?
Any examples on when you might want to consider using both a NoSQL solution in parallel with a relational database?
MongoDB has the ability to have documents which include arrays of other documents. This solves many cases where you would have relations in relational databases.
When an invoice has multiple positions, you wouldn't put these positions into a separate collection. You would embed them as an array.
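A minimal sketch of that, with made-up field names:
// The invoice positions live inside the invoice document itself.
db.invoices.insert({
    number: "2013-0042",
    customer: "ACME Corp",
    positions: [
        { article: "Widget", quantity: 10, unitPrice: 2.5 },
        { article: "Gadget", quantity: 1,  unitPrice: 99.0 }
    ]
});

// Adding another position is a single atomic update on that one document:
db.invoices.update(
    { number: "2013-0042" },
    { $push: { positions: { article: "Gizmo", quantity: 3, unitPrice: 5.0 } } }
);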
It makes me wonder about real life examples of when to use NoSQL versus when not to?
There are many different NoSQL databases, each one designed with different use-cases in mind. But you tagged this question as MongoDB, so I assume that you mean MongoDB in particular.
MongoDB has two main advantages over relational databases.
First, it scales well.
When the database is too slow or too big, you can easily add more servers by creating a cluster or replica-set of multiple shards. This doesn't work nearly as well with most relational databases.
Second, it allows heterogeneous data.
Imagine, for example, the product database of a computer hardware store. What properties do products have? All products have a price and a vendor. But CPUs have a clock rate, hard drives and RAM chips have a capacity (and these capacities aren't comparable), monitors have a resolution and so on. How would you design this in a relational database? You would either create a very long productID-property-value table or you would create a very wide and sparse product table with every property you can imagine, but most of them being NULL for most products. Both solutions aren't really elegant. But MongoDB can solve this much better because it allows each document in a collection to have a different set of properties.
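To make that concrete, here is a hypothetical products collection; none of these field names come from a real schema:
// Each document only carries the properties that make sense for that product.
db.products.insert({ type: "cpu",     vendor: "Intel",   price: 199, clockRateGHz: 3.4 });
db.products.insert({ type: "monitor", vendor: "Dell",    price: 249, resolution: "1920x1080" });
db.products.insert({ type: "hdd",     vendor: "Seagate", price: 89,  capacityGB: 2000 });

// Queries can filter on the shared fields or the type-specific ones:
db.products.find({ price: { $lt: 100 } });
db.products.find({ type: "cpu", clockRateGHz: { $gte: 3.0 } });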
What can't it do?
As a rather new technology, there isn't that much literature about it. The software ecosystem around it isn't as mature either. The tools you can get for relational databases are often much more shiny.
There are also some use-cases MongoDB isn't well-suited for.
MongoDB doesn't do JOINs. When your data is very relational and denormalizing it would be counter-productive, it might be a poor choice for your product. But you might want to take a look at graph databases like Neo4j, which focus even more on relations than relational databases. Update 2016: MongoDB 3.2 now has rudimentary JOIN support with the $lookup aggregation stage, but it's still very limited in functionality compared to relational and graph databases.
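For illustration, a minimal $lookup stage (MongoDB 3.2+); the collection and field names are made up:
// Behaves like a left outer join from posts to authors.
db.posts.aggregate([
    { $lookup: {
        from: "authors",          // collection to join against
        localField: "author_id",  // field in posts
        foreignField: "_id",      // field in authors
        as: "author"              // joined documents land in this array field
    } }
]);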
MongoDB doesn't do transactions. At least not complex transactions. Certain actions which only affect a single document are guaranteed to be atomic, but as soon as you affect more than one document, you can't guarantee that no other query will happen in-between and find an inconsistent state.
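A small sketch of what that means in practice (collection and field names assumed):
// This single-document update is atomic; no one can observe it half-applied.
db.accounts.update({ _id: "alice" }, { $inc: { balance: -50 } });

// But nothing ties these two updates together; another query may run between
// them and see the money gone from alice but not yet added to bob.
db.accounts.update({ _id: "alice" }, { $inc: { balance: -50 } });
db.accounts.update({ _id: "bob" },   { $inc: { balance: 50 } });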
MongoDB is bad for ad-hoc reporting. Its options for data mining are severely limited. The rather new aggregation functions help, and MapReduce can also solve some surprisingly complex problems when you learn to use it smartly, but SQL usually has better tools for things like that.
By denormalizing the data, you should be able to solve all of the same problems that relational databases do... But there are rules on how to normalize data with relational databases. Are there rules that one can use to help them denormalize the data to use a NoSQL solution?
Relational databases are around for about 40 years. Their theory is a well-researched topic in computer science. There are whole libraries of books written about the theory behind them. There is a by-the-book solution for every imaginable corner-case by now.
But NoSQL databases, on the other hand, are a rather new technology. We are still figuring out the best practices. The most frequent advice is: "Use your own head. Think about which queries are performed most often, and optimize your data schema for them."
Any examples on when you might want to consider using both a NoSQL solution in parallel with a relational database?
When possible I would advise against using two different database technologies in the same product:
Anyone who maintains and supports the product must be familiar with both technologies
Troubleshooting gets a lot harder
The sysadmins need to keep an additional database running and updated
You have an additional point of failure which can lead to downtime
I would only recommend mixing database technologies when fulfilling your requirements without doing so is not just hard but practically impossible. Otherwise, make your pick and stay with it.

MongoDB: Billions of documents in a collection

I need to load 6.6 billion bigrams into a collection but I can't find any information on the best way to do this.
Loading that many documents onto a single primary key index would take forever but as far as I'm aware mongo doesn't support the equivalent of partitioning?
Would sharding help? Should I try and split the data set over many collections and build that logic into my application?
It's hard to say what the optimal bulk insert size is -- this partly depends on the size of the objects you're inserting and other immeasurable factors. You could try a few ranges and see what gives you the best performance. As an alternative, some people like using mongoimport, which is pretty fast, but your import data needs to be JSON or CSV. There's obviously mongorestore, if the data is in BSON format.
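One way to feel out a batch size from the mongo shell; the collection name and document shape are invented for the sketch:
// The shell's insert() accepts an array, so you can load in batches and time
// different batch sizes (100, 1,000, 10,000, ...) against your own data.
var batchSize = 1000;
var batch = [];
for (var i = 0; i < 100000; i++) {
    batch.push({ bigram: "word" + i + " word" + (i + 1), count: 1 });
    if (batch.length === batchSize) {
        db.bigrams.insert(batch);
        batch = [];
    }
}
if (batch.length > 0) db.bigrams.insert(batch);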
Mongo can easily handle billions of documents and can have billions of documents in the one collection but remember that the maximum document size is 16mb. There are many folk with billions of documents in MongoDB and there's lots of discussions about it on the MongoDB Google User Group. Here's a document on using a large number of collections that you may like to read, if you change your mind and want to have multiple collections instead. The more collections you have, the more indexes you will have also, which probably isn't what you want.
Here's a presentation from Craigslist on inserting billions of documents into MongoDB and the guy's blogpost.
It does look like sharding would be a good solution for you but typically sharding is used for scaling across multiple servers and a lot of folk do it because they want to scale their writes or they are unable to keep their working set (data and indexes) in RAM. It is perfectly reasonable to start off with a single server and then move to a shard or replica-set as your data grows or you need extra redundancy and resilience.
However, there are other users who use multiple mongods to get around the locking limits of a single mongod with lots of writes. It's obvious but still worth saying that a multi-mongod setup is more complex to manage than a single server. If your IO or CPU isn't maxed out, your working set is smaller than RAM and your data is easy to keep balanced (pretty randomly distributed), you should see improvement (with sharding on a single server). As an FYI, there is potential for memory and IO contention. With 2.2 having improved concurrency with db locking, I suspect that there will be much less of a reason for such a deployment.
You need to plan your move to sharding properly, i.e. think carefully about choosing your shard key. If you go this way then it's best to pre-split and turn off the balancer. It will be counter-productive to be moving data around to keep things balanced which means you will need to decide up front how to split it. Additionally, it is sometimes important to design your documents with the idea that some field will be useful for sharding on, or as a primary key.
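Very roughly, and with a made-up database, collection and shard key, the pre-split set-up looks something like this from the mongo shell:
sh.enableSharding("ngrams");
sh.shardCollection("ngrams.bigrams", { bigram: 1 });

// Turn the balancer off during the bulk load...
sh.setBalancerState(false);

// ...and split on boundaries you have decided up front, e.g. by first letter:
sh.splitAt("ngrams.bigrams", { bigram: "g" });
sh.splitAt("ngrams.bigrams", { bigram: "n" });
sh.splitAt("ngrams.bigrams", { bigram: "t" });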
Here's some good links -
Choosing a Shard Key
Blog post on shard keys
Overview presentation on sharding
Presentation on Sharding Best Practices
You can absolutely shard data in MongoDB (which partitions across N servers on the shard key). In fact, that's one of its core strengths. There is no need to do that in your application.
For most use cases, I would strongly recommend doing that for 6.6 billion documents. In my experience, MongoDB performs better with a number of mid-range servers rather than one large one.

What operations are cheap/expensive in mongodb?

I'm reading up on MongoDB, and trying to get a sense of where it's best used. One question that I don't see a clear answer to is which operations are cheap or expensive, and under what conditions.
Can you help clarify?
Thanks.
It is often claimed that MongoDB has insanely fast writes. While they are indeed not slow, this is quite an overstatement. Write throughput in MongoDB is limited by a global write lock. Yes, you heard me right, there can be only ONE* write operation happening on the server at any given moment.
Also, I suggest you take advantage of the schemaless nature of MongoDB and store your data denormalized. Often it is possible to do just one disk seek to fetch all the required data (because it is all in the same document). Fewer disk seeks - faster queries.
If data sits in RAM - no disk seeks are required at all, data is served right from memory. So, make sure you have enough RAM.
Map/Reduce, group, $where queries are slow.
It is not fast to keep writing to one big document (using $push, for example). The document will outgrow its disk boundaries and will have to be copied to another place, which involves more disk operations.
And I agree with @AurelienB, some basic principles are universal across all databases.
Update
* Since 2011, several major versions of MongoDB have been released, improving the situation with locking (from server-wide to database-level to collection-level). A new storage engine was introduced, WiredTiger, which has document-level locks. All in all, writes should be significantly faster now, in 2018.
From my practice, one thing that should be mentioned is that MongoDB is not a very good fit for reporting, because usually in reports you need data from different collections (a 'join'), and MongoDB does not provide a good way to aggregate data across multiple collections (and isn't supposed to). For sure, for some reports map/reduce or incremental map/reduce can work well, but those are rare situations.
For reporting, some people suggest migrating the data into a relational database, which has a lot of tools for reporting.
This is not very different than all database systems.
Queries on indexed data are fast. Queries over a lot of data are... slow.
Due to denormalization, if there is no index, writes to the database will be fast; that's why logging is the basic use case.
At the opposite end, reading data which is on disk (not in RAM) without an index can be very slow when you have billions of documents.

CouchDB and MongoDB really search over each document with JavaScript?

From what I understand about these two "Not only SQL" databases, they search over each record and pass it to a JavaScript function you write, which calculates which results are to be returned by looking at each one.
Is that actually how it works? It sounds worse than using a plain RDBMS without any indexed keys.
I built my schemas so they don't require join operations which leaves me with simple searches on indexed int columns. In other words, the columns are in RAM and a quick value check through them (WHERE user_id IN (12,43,5,2) or revision = 4) gives the database a simple list of IDs which it uses to find the actual rows in the massive data collection.
So I'm trying to imagine how in the world looking through every single row in the database could be considered acceptable (if indeed this is how it works). Perhaps someone can correct me because I know I must be missing something.
@Xeoncross
I built my schemas so they don't require join operations which leaves me with simple searches on indexed int columns. In other words, the columns are in RAM and a quick value check through them (WHERE user_id IN (12,43,5,2) or revision = 4)
Well then, you'll love MongoDB. MongoDB supports indexes, so you can index user_id and revision and this query will be able to return relatively quickly.
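Roughly, using the questioner's own example (collection name assumed):
db.posts.ensureIndex({ user_id: 1 });
db.posts.ensureIndex({ revision: 1 });

// Both of these are served from the index rather than a collection scan:
db.posts.find({ user_id: { $in: [12, 43, 5, 2] } });
db.posts.find({ revision: 4 });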
However, please note that many NoSQL DBs only support key lookups and don't necessarily support "secondary indexes", so you have to do your homework on this one.
So I'm trying to imagine how in the world looking through every single row in the database could be considered acceptable (if indeed this is how it works).
Well if you run a query in an SQL-based database and you don't have an index that database will perform a table scan (i.e.: looking through every row).
They search over each record and pass it to a JavaScript function you write which calculates which results are to be returned by looking at each one.
So in practice most NoSQL databases support this. But please never use it for real-time queries. This option is primarily for performing map-reduce operations that are used to summarize data.
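For example, a minimal map-reduce job (collection and field names are made up) that summarizes comments per user in the background:
db.comments.mapReduce(
    function ()            { emit(this.user, 1); },          // map: runs per document
    function (key, values) { return Array.sum(values); },    // reduce: sums the counts
    { out: "comments_per_user" }                             // results land in a collection
);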
Here's maybe a different take on NoSQL. SQL is really good at relational operations, however relational operations don't scale very well. Many of the NoSQL are focused on Key-Value / Document-oriented concepts instead.
SQL works on the premise that you want normalized, non-repeated data and that you want to grab that data in big sets. NoSQL works on the premise that you want fast queries for certain "chunks" of data, but that you're willing to wait for data dependent on "big sets" (running map-reduces in the background).
It's a big trade-off, but it makes a lot of sense for modern web apps. Most of the time is spent loading one page (blog post, wiki entry, SO question) and most of the data is really tied to or "hanging off" that element. So the concept of grabbing everything you need with one horizontally-scalable query is really useful.
It's not the solution for everything, but it is a really good option for lots of use cases.
In terms of CouchDB, the Map function can be Javascript, but it can also be Erlang. (or another language altogether, if you pull in a 3rd Party View Server)
Additionally, Views are calculated incrementally. In other words, the map function is run on all the documents in the database upon creation, but further updates to the database only affect the related portions of the view.
The contents of a view are, in some ways, similar to an indexed field in an RDBMS. The output is a set of key/value pairs that can be searched very quickly, as they are stored as b-trees, which some RDBMSs use to store their indexes.
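A minimal CouchDB map function, with made-up document fields, looks like this; CouchDB runs it over each document when the view is built and stores the emitted pairs in the view's b-tree:
function (doc) {
    // Emit one key/value pair per comment document; queries then walk the
    // b-tree of emitted keys instead of re-scanning the documents.
    if (doc.type === "comment") {
        emit([doc.topic_id, doc.date], doc.user);
    }
}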
I think CouchDB stores the docs in a b-tree according to the "index" (view) and just walks this tree, so it's not searching..
see http://guide.couchdb.org/draft/btree.html
You should study them a bit more. It's not "worse" than an RDBMS, it's different... in fact, given certain domains/functions, the "NoSQL" paradigm works out to be much quicker than traditional and, in some opinions, outdated RDBMS implementations. Think of Google's BigTable platform and you get what MongoDB, Riak, CouchDB, Cassandra (Facebook) and many, many others are trying to accomplish. The primary difference is that most of these NoSQL solutions focus on key/value stores (some call these "document" databases) and have limited to no concept of relationships (in the primary/foreign key respect) and joins. Join operations on tables can be very expensive. Also, let's not forget the object-relational impedance mismatch issue... You don't need an ORM to access MongoDB. It can actually store your code object (or document) as it is in memory. Can you imagine the savings in lines of code and complexity!? db4o is another lightweight solution that does this.
I don't know what you mean when you say "Not only SQL" database? It's a NoSQL paradigm, wherein no SQL is used to query the underlying data store of the system. NoSQL also means not an RDBMS, which SQL is generally built on top of. Although MongoDB does have an SQL-like syntax that can be used from .NET when retrieving data - it's called NoRM.
I will say I've only really worked with Riak and MongoDB... I'm by no means familiar with Cassandra or CouchDB past a reading level and feature-set comprehension. I prefer MongoDB over them all. Riak was nice too, but not for what I needed. You should download a few of these NoSQL solutions and you will get the concept. Check out db4o, MongoDB and Riak, as I've found them to be the easiest with the most support for .NET-based languages. It will just make sense for certain applications. All in all, the NoSQL or document database or OODBMS... whatever you want to call it, is very appealing and gaining a lot of momentum.
I also forgot about your JavaScript question... MongoDB has JavaScript "bindings" that enable it to be used as one method of searching for data. Riak handles data via a JSON format. MongoDB uses BSON, I believe, and I can't remember what the others use. In any case, the point is that instead of SQL (structured query language) to "ask" the database for information, some of these (MongoDB being one) use JavaScript and/or RESTful syntax to ask the NoSQL system for data. I believe CouchDB and Riak can be queried over HTTP too, which makes them very accessible. Not to mention, that's pretty frickin' cool.
Do your research.... download them, they are all free and OSS.
db4o: http://www.db4o.com/ (Java & .NET versions)
MongoDB: mongodb.org/
Riak: http://www.basho.com/Riak.html
NoRM: http://thechangelog.com/post/436955815/norm-bringing-mongodb-to-net-linq-and-mono

What are the advantages of using a schema-free database like MongoDB compared to a relational database?

I'm used to using relational databases like MySQL or PostgreSQL, and combined with MVC frameworks such as Symfony, RoR or Django, I think they work great.
But lately I've heard a lot about MongoDB which is a non-relational database, or, to quote the official definition,
"a scalable, high-performance, open source, schema-free, document-oriented database."
I'm really interested in being on the cutting edge and want to be aware of all the options I'll have for my next project, and choose the best technologies out there.
In which cases using MongoDB (or similar databases) is better than using a "classic" relational databases?
And what are the advantages of MongoDB vs MySQL in general?
Or at least, why is it so different?
If you have pointers to documentation and/or examples, it would be of great help too.
Here are some of the advantages of MongoDB for building web applications:
A document-based data model. The basic unit of storage is analogous to JSON, Python dictionaries, Ruby hashes, etc. This is a rich data structure capable of holding arrays and other documents. This means you can often represent in a single entity a construct that would require several tables to properly represent in a relational db. This is especially useful if your data is immutable.
Deep query-ability. MongoDB supports dynamic queries on documents using a document-based query language that's nearly as powerful as SQL.
No schema migrations. Since MongoDB is schema-free, your code defines your schema.
A clear path to horizontal scalability.
You'll need to read more about it and play with it to get a better idea. Here's an online demo:
http://try.mongodb.org/
There are numerous advantages.
For instance, your database schema will be more scalable, you won't have to worry about migrations, the code will be more pleasant to write... For instance, here's the code of one of my models:
class Setting
  include MongoMapper::Document

  key :news_search, String, :required => true
  key :is_availaible_for_iphone, :required => true, :default => false

  belongs_to :movie
end
Adding a key is just adding a line of code!
There are also other advantages that will appear in the long run, like better scalability and speed.
... But keep in mind that a non-relational database is not better than a relational one. If your database has a lot of relations and normalization, it might make little sense to use something like MongoDB. It's all about finding the right tool for the job.
For more things to read, I'd recommend taking a look at "Why I think Mongo is to Databases what Rails was to Frameworks" or this post on the MongoDB website. To get excited, and if you speak French, take a look at this article explaining how to set up MongoDB from scratch.
Edit: I almost forgot to tell you about this railscast by Ryan. It's very interesting and makes you want to start right away!
The advantage of schema-free is that you can dump whatever your load is in it, and no one will ever have any ground for complaining about it, or for saying that it was wrong.
It also means that whatever you dump in it, remains totally void of meaning after you have done so.
Some would label that a gross disadvantage, some others won't.
The fact that a relational database has a well-established schema, is a consequence of the fact that it has a well-established set of extensional predicates, which are what allows us to attach meaning to what is recorded in the database, and which are also a necessary prerequisite for us to do so.
Without a well-established schema, no extensional predicates; and without extensional predicates, no way for the user to make any meaning out of what was stuffed in it.
My experience with Postgres and Mongo, after working with both databases in my projects:
Postgres(RDBMS)
Postgres is recommended if your future applications have a complicated schema that needs lots of joins, or all the data has relations, or if you have heavy writes. Postgres is open source, faster, ACID-compliant, uses less memory on disk, is an all-around good performer for JSON storage as well, and includes full serializability of transactions with 3 levels of transaction isolation.
The biggest advantage of staying with Postgres is that we get the best of both worlds. We can store data in JSONB with constraints, consistency and speed. On the other hand, we can use all SQL features for other types of data. The underlying engine is very stable and copes well with a good range of data volumes. It also runs on your choice of hardware and operating system. Postgres provides NoSQL capabilities along with full transaction support, storing JSON documents with constraints on the field data.
General Constraints for Postgres
Scaling Postgres Horizontally is significantly harder, but doable.
Fast read operations cannot be fully achieved with Postgres.
NoSQL Databases
MongoDB (WiredTiger)
MongoDB may beat Postgres in the dimension of "horizontal scale". Storing JSON is what Mongo is optimized to do. Mongo stores its data in a binary format called BSON, which is (roughly) just a binary representation of a superset of JSON. MongoDB stores objects exactly as they were designed. According to MongoDB, for write-intensive applications the new engine (WiredTiger) gives users up to a 10x increase in write performance (I should try this), with an 80 percent reduction in storage utilization, helping to lower the cost of storage and achieve greater utilization of hardware.
General Constraints of MongoDb
The usage of a schemaless storage engine leads to the problem of implicit schemas. These schemas aren't defined by our storage engine but instead are defined based on application behavior and expectations.
Stand-alone NoSQL technologies do not meet ACID standards because they sacrifice critical data protections in favor of high-throughput performance for unstructured applications. It's not hard to apply ACID on NoSQL databases, but it would make the database slow and inflexible to some extent. "Most of the NoSQL limitations were optimized in the newer versions and releases, which have overcome the previous limitations to a great extent."
It's all about trade offs. MongoDB is fast but not ACID, it has no transactions. It is better than MySQL in some use cases and worse in others.
The lines below are from MongoDB: The Definitive Guide.
There are several good reasons:
Keeping different kinds of documents in the same collection can be a nightmare for developers and admins. Developers need to make sure that each query is only returning documents of a certain kind or that the application code performing a query can handle documents of different shapes. If we're querying for blog posts, it's a hassle to weed out documents containing author data.
It is much faster to get a list of collections than to extract a list of the types in a collection. For example, if we had a type key in the collection that said whether each document was a "skim," "whole," or "chunky monkey" document, it would be much slower to find those three values in a single collection than to have three separate collections and query for their names.
Grouping documents of the same kind together in the same collection allows for data locality. Getting several blog posts from a collection containing only posts will likely require fewer disk seeks than getting the same posts from a collection containing posts and author data.
We begin to impose some structure on our documents when we create indexes. (This is especially true in the case of unique indexes.) These indexes are defined per collection. By putting only documents of a single type into the same collection, we can index our collections more efficiently.
After a question about databases with textual storage, I glanced at MongoDB and similar systems.
If I understood correctly, they are supposed to be easier to use and set up, and much faster. Perhaps also more secure, as the lack of SQL prevents SQL injection...
Apparently, MongoDB is used mostly for Web applications.
Basically, and they state that themselves, these databases aren't suited for complex queries, data mining, etc. But they shine at quickly retrieving lots of flat data.
MongoDB supports search by fields and regular-expression searches, and includes user-defined JavaScript functions.
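For instance (collection and field names are illustrative):
// Plain field match and regular-expression search:
db.products.find({ vendor: "Intel" });
db.products.find({ name: { $regex: /^usb/i } });

// $where runs a JavaScript predicate against each candidate document
// (flexible, but much slower than an indexed query):
db.products.find({ $where: "this.price > this.cost * 2" });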
MongoDB can be used as a file system, taking advantage of load balancing and data replication features over multiple machines for storing files.