Document store databases and connected domains - MongoDB

Consider this picture:
The book says document store databases struggle with highly connected domains because "relationships between aggregates aren’t first-class citizens in the data model, most aggregate stores furnish only the insides of aggregates with structure, in the form of nested maps."
It also says: "Instead, the application that uses the database must build relationships from these flat, disconnected data structures."
I'm sorry, I don't understand what this means. Why do document store databases struggle with a domain that is built on many relationships?

Because document stores do not support joins. Each time you need to get more data it is a separate query. Instead, document stores support the idea of nesting data within documents.
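To make "a separate query each time" concrete, here is a minimal sketch in plain Python. It does not use a MongoDB driver: the `people` dict stands in for a collection, and `find_one` is a hypothetical helper that counts one round trip per call. The names and data are assumptions for illustration only.

```python
# In-memory stand-in for a "people" collection. Each document nests only
# its own data and refers to other people by id.
people = {
    "alice": {"_id": "alice", "friends": ["bob", "carol"]},
    "bob": {"_id": "bob", "friends": ["dave"]},
    "carol": {"_id": "carol", "friends": ["dave", "erin"]},
    "dave": {"_id": "dave", "friends": []},
    "erin": {"_id": "erin", "friends": []},
}

query_count = 0


def find_one(person_id):
    """Hypothetical helper standing in for one round trip to the database."""
    global query_count
    query_count += 1
    return people[person_id]


def friends_of_friends(person_id):
    """Follow the relationship two hops out, joining in application code."""
    person = find_one(person_id)
    result = set()
    for friend_id in person["friends"]:
        friend = find_one(friend_id)  # one extra query per friend
        result.update(friend["friends"])
    result.discard(person_id)
    return result - set(person["friends"])


fof = friends_of_friends("alice")
print(sorted(fof), "found with", query_count, "queries")
```

Every extra hop multiplies the number of lookups, which is the sense in which the application, not the database, builds the relationships.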

Related

What kind of data is not relational?

I noticed Storing relational data in MongoDB (NoSQL).
I've been trying to get my head around NoSQL, and I do see the benefits [...]
What I can't understand, and hope someone can clear up, is how to store data if it must be relational.
User David commented, "If data is relational, store it in a relational database."
What useful or significant data isn't relational?
You never just have, say, users with maybe addresses, end of story.
Users always have associated comments/posts/favorites/reviews/friends, or whatever.
Look at any app. Look at Zillow. Houses have one address. But they're also in school_districts, each of which contains many other houses. Now it's relational.
I've never seen an application where associations within the data model were dispensable. Or that lacked any entities that needed to be associated with more than one other entity.
What type of data is like that?
An example I can provide is something I'm working on right now. It's pure binary data blobs taken from measurements, which also have to be stored somewhere in a database. MongoDB is one of the more popular databases for such data. The only relation that could be necessary is keeping metadata about each measurement, like the date/time of recording.
To answer your question, I think a data model with no relations between its entities is very unlikely. MongoDB can handle a relational data model, but if your data model has a lot of relations, I recommend not using a document-oriented NoSQL database such as MongoDB.
I suggest you take a look at the following MongoDB documentation, which explains how to represent data in your MongoDB database.
Data Modeling Introduction (link)
Thinking in Documents: Part 1 (link)
Thinking in Documents: Part 2 (link)

MongoDB and one-to-many relation

I am trying to come up with a rough design for an application we're working on. What I'd like to know is, if there is a way to directly map a one to many relation in mongo.
My schema is like this:
There are a bunch of Devices.
Each device is uniquely identified by its name/ID.
Each device can have multiple interfaces.
These interfaces can be added by a user in the front end at any given time.
An interface is uniquely identified by its ID and can be associated with only one Device.
A device can contain on the order of 100 interfaces or more.
I was going through the MongoDB documentation, which discusses embedded documents vs. multiple collections. I'm far from clear on this, as I've just started with Mongo and Meteor.
The question is, which is the better approach: having multiple small collections or having one big embedded collection? I know this question is somewhat subjective; I just need some clarity from folks who have more expertise in this field.
Another question: suppose I go with the embedded model. Is there a way to update only a part of the document (the part specific to one interface), so that whenever an interface is added, it can be inserted into the same device document?
It depends on the purpose of the application.
Big document
A good example of where you'd want a big embedded document is data that you will not normally modify but will query a lot. In my application I use this for storing pre-processed trips with all their information, so when someone wants to consult a trip, everything is located in a single document. However, if your query is based on a value embedded in a list inside a trip, it can be very slow unless that field is indexed. If that's the case, I'd recommend creating another collection with a relation between both collections. Updating part of a large document can also be costly: if the application fetches the whole document, changes it and writes it back, the update is slow, although update operators such as $push and $set can modify a single field or array in place.
Small documents with relations
If you plan on modifying the data a lot, I'd recommend sticking to references to another collection. With small documents, you can update any collection more quickly. If you want to enforce a unique relation, consider using a unique index in Mongo. This can be done using: db.members.createIndex( { "user_id": 1 }, { unique: true } ).
Therefore:
Big object: Great for querying data but slow for complex queries.
Small related collections: Great for updating but requires several queries on distinct collections.
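As a rough illustration of the two layouts for the Device/Interface case, here is a sketch in plain Python with dicts standing in for collections. It is not driver code: `push_interface` only mimics what a $push update does to an embedded array, and `add_interface` and all field names are hypothetical.

```python
# Option 1: one big document, interfaces embedded in the device.
device_embedded = {"_id": "router-1", "interfaces": []}


def push_interface(device, interface):
    """Append to the embedded array, mimicking what a $push update does."""
    device["interfaces"].append(interface)


push_interface(device_embedded, {"_id": "eth0", "speed_mbps": 1000})
push_interface(device_embedded, {"_id": "eth1", "speed_mbps": 100})

# Option 2: small documents, each interface refers to its device.
devices = {"router-1": {"_id": "router-1"}}
interfaces = {}  # keyed by interface id, so an id cannot be reused


def add_interface(interface_id, device_id, **fields):
    """Insert an interface document that references exactly one device."""
    if interface_id in interfaces:
        raise ValueError("duplicate interface id: " + interface_id)
    if device_id not in devices:
        raise ValueError("unknown device: " + device_id)
    interfaces[interface_id] = {"_id": interface_id, "device_id": device_id, **fields}


add_interface("eth0", "router-1", speed_mbps=1000)
add_interface("eth1", "router-1", speed_mbps=100)

# Reading a device with its interfaces: one lookup vs. a lookup plus a scan.
embedded_ids = [i["_id"] for i in device_embedded["interfaces"]]
referenced_ids = [i["_id"] for i in interfaces.values() if i["device_id"] == "router-1"]
print(embedded_ids, referenced_ids)
```

With around 100 interfaces per device either layout is workable; the choice mostly comes down to whether interfaces are usually read together with their device or on their own.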

When should I create a new collections in MongoDB?

So just a quick best practice question here. How do I know when I should create new collections in MongoDB?
I have an app that queries TV show data. Should each show have its own collection, or should they all be stored within one collection, with the relevant data in the same document? Please explain why you chose the approach you did. (I'm still very new to MongoDB. I'm used to MySQL.)
The Two Most Popular Approaches to Schema Design in MongoDB
Embed data into documents and store them in a single collection.
Normalize data across multiple collections.
Embedding Data
There are several reasons why MongoDB was designed without relational-style joins across collections, and I won't get into all of them here. (The $lookup aggregation stage, added in v3.2, performs a left outer join, but it is not the usual way to model data.) The main reason we don't need joins is that we can embed relevant data into a single hierarchical JSON document. We can think of it as pre-joining the data before we store it. In the relational database world, this amounts to denormalizing our data. In MongoDB, this is about the most routine thing we can do.
Normalizing Data
Even though MongoDB doesn't join collections the way a relational database does, we can still store related data across multiple collections and get to it all, albeit in a roundabout way. This requires us to store a reference to a key from one collection inside another collection. It sounds similar to relational databases, but MongoDB doesn't enforce any key constraints for us as most relational databases do. Enforcing key constraints is left entirely up to us. We're good enough to manage it though, right?
Accessing all related data in this way means we're required to make at least one query for every collection the data is stored across. It's up to each of us to decide if we can live with that.
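A minimal sketch of what "left entirely up to us" can look like, using the TV show data from the question. It is plain Python with in-memory stand-ins for a `shows` and an `episodes` collection; the collection names, fields and helper functions are assumptions for illustration, not driver calls.

```python
shows = {1: {"_id": 1, "title": "Example Show"}}
episodes = []


def insert_episode(show_id, title):
    """Store an episode that references a show, checking the key ourselves."""
    if show_id not in shows:  # the check a foreign key constraint would do
        raise LookupError(f"no show with _id {show_id}")
    episodes.append({"show_id": show_id, "title": title})


def show_with_episodes(show_id):
    """Two lookups, one per collection, stitched together in the application."""
    show = dict(shows[show_id])
    show["episodes"] = [e["title"] for e in episodes if e["show_id"] == show_id]
    return show


insert_episode(1, "Pilot")
try:
    insert_episode(99, "Orphan")
    rejected = False
except LookupError:
    rejected = True

print(show_with_episodes(1), rejected)
```

Without the check in `insert_episode`, nothing would stop an episode from pointing at a show that does not exist.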
When to Embed Data
Embed data when that embedded data will be accessed at the same time as the rest of the document. Pre-joining data that is frequently used together reduces the amount of code we have to write to query across multiple collections. It also reduces the number of round trips to the server.
Embed data when that embedded data only pertains to that single document. Like most rules, we need to give this some thought before blindly following it. If we're storing an address for a user, we don't need to create a separate collection to store addresses just because the user might have a roommate with the same address. Remember, we're not normalizing here, so duplicating data to some degree is ok.
Embed data when you need "transaction-like" writes. Prior to v4.0, MongoDB did not support multi-document transactions, though it does guarantee that a single document write is atomic: it'll write the document or it won't. Writes across multiple collections could not be made atomic, and update anomalies could occur in any number of scenarios. This is no longer the case since v4.0; however, it is still more typical to denormalize data to avoid the need for transactions.
When to Normalize Data
Normalize data when data that applies to many documents changes frequently. So here we're talking about "one to many" relationships. If we have a large number of documents that have a city field with the value "New York" and all of a sudden the city of New York decides to change its name to "New-New York", well then we have to update a lot of documents. Got anomalies? In cases like this where we suspect other cities will follow suit and change their name, then we'd be better off creating a cities collection containing a single document for each city.
Normalize data when data grows frequently. When documents grow, they have to be moved on disk. If we're embedding data that frequently grows beyond its allotted space, that document will have to be moved often. Since these documents are bigger each time they're moved, the process only grows more complex and won't get any better over time. By normalizing those embedded parts that grow frequently, we eliminate the need for the entire document to be moved.
Normalize data when the document is expected to grow larger than 16MB. Documents have a 16MB limit in MongoDB. That's just the way things are. We should start breaking them up into multiple collections if we ever approach that limit.
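The first rule above (the "New-New York" rename) can be sketched by counting writes. This is plain Python over in-memory lists and dicts, not MongoDB code; the collection shapes and the 1000-user figure are made up for illustration.

```python
# Denormalized: the city name is copied into every user document.
users_embedded = [{"_id": i, "city": "New York"} for i in range(1000)]

# Normalized: users hold a reference; the name lives in one cities document.
cities = {"nyc": {"_id": "nyc", "name": "New York"}}
users_referenced = [{"_id": i, "city_id": "nyc"} for i in range(1000)]


def rename_embedded(old, new):
    """Rewrite every matching user document; returns how many were touched."""
    touched = 0
    for user in users_embedded:
        if user["city"] == old:
            user["city"] = new
            touched += 1
    return touched


def rename_referenced(city_id, new):
    """Rewrite the single city document; returns how many were touched."""
    cities[city_id]["name"] = new
    return 1


embedded_writes = rename_embedded("New York", "New-New York")
referenced_writes = rename_referenced("nyc", "New-New York")
print(embedded_writes, referenced_writes)
```

The trade-off runs the other way on reads: the normalized layout needs a second lookup to show a user's city name.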
The Most Important Consideration to Schema Design in MongoDB is...
How our applications access and use data. This requires us to think? Ugh! What data is used together? What data is used mostly as read-only? What data is written to frequently? Let your application's data access patterns drive your schema, not the other way around.
The scope you've described is definitely not too much for "one collection". In fact, being able to store everything in a single place is the whole point of a MongoDB collection.
For the most part, you don't want to be thinking about querying across combined tables as you would in SQL. Unlike SQL, MongoDB lets you avoid thinking in terms of "JOINs"; in fact, MongoDB was designed without them (the $lookup aggregation stage came later, in v3.2).
See this slideshare:
http://www.slideshare.net/mongodb/migrating-from-rdbms-to-mongodb?related=1
Specifically look at slides 24 onward. Note how a MongoDB schema is meant to replace the multi-table schemas customary to SQL and RDBMS.
In MongoDB a single document holds all information regarding a record. All records are stored in a single collection.
Also see this question:
MongoDB query multiple collections at once

MongoDB object model design with list property

I just started to use MongoDB and I'm confused about how to build object models with list properties.
I have a User model related to Followers and Following objects, which are lists of User IDs.
So I can think of some object model structures to represent the relation.
Embedded Document. Followers and Following are embedded in the User model. In many web frameworks a "current_user" object is generated on every request, so serializing/deserializing the Followers and Following list properties is extra overhead, since we seldom use them in most requests. We can exclude these properties when "current_user" is generated. However, we then need to fetch the full "current_user" object again before we make any updates to it.
Use Reference Property in User model. We can have Followers and Following as object models of their own, not embedded, and save references to them in the User object.
Use Reference Property in Followers and Following models. We can save the User ID in the Followers and Following models for later queries.
There might be other ways to do it that are easier to use or perform better. My question is:
What's the suggested way to design a model with some related list properties?
For folks coming from the SQL world (such as myself) one of the hardest things to learn about MongoDB is the new style of schema design. In the SQL world, everything goes into third normal form. Folks come to think that there is a single right way to design their schema, because there typically is one.
In the MongoDB world, there is no one best schema design. More accurately, in MongoDB schema design depends on how the application is going to access the data.
Here are the key questions that you need to have answered in order to design a good schema for MongoDB:
How much data do you have?
What are your most common operations? Will you be mostly inserting new data, updating existing data, or doing queries?
What are your most common queries?
What are your most common updates?
How many I/O operations do you expect per second?
Here's how these questions might play out if you are considering one-to-many object relationships.
In SQL you simply create a pair of master/detail tables with a primary key/foreign key relationship. In MongoDB, you have a number of choices: you can embed the data, you can create a linked relationship, you can duplicate and denormalize the data, or you can use a hybrid approach.
The correct approach would depend on a lot of details about the use case of your application.
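For the Followers/Following case in the question, two of these choices might look roughly like the sketch below. It is plain Python with in-memory stand-ins for collections; the `follows` edge collection and every field name are assumptions for illustration, not a recommended schema.

```python
# Choice 1: embed the id lists in the user document.
user_embedded = {"_id": "u1", "name": "Ann", "following": ["u2", "u3"], "followers": ["u2"]}

# A lightweight "current_user" can leave the lists out (a projection).
current_user = {k: v for k, v in user_embedded.items() if k not in ("following", "followers")}

# Choice 2: link. Keep users small and store one document per follow edge.
users = {"u1": {"_id": "u1", "name": "Ann"}}
follows = [
    {"follower": "u1", "followee": "u2"},
    {"follower": "u1", "followee": "u3"},
    {"follower": "u2", "followee": "u1"},
]


def following_of(user_id):
    """Ids this user follows, read from the edge collection."""
    return [f["followee"] for f in follows if f["follower"] == user_id]


def followers_of(user_id):
    """Ids following this user, read from the same edge collection."""
    return [f["follower"] for f in follows if f["followee"] == user_id]


print(current_user, following_of("u1"), followers_of("u1"))
```

The embedded form answers both questions from one document, while the edge form keeps user documents small when follower lists grow large.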
Here are some good general references on MongoDB schema design.
MongoDB presentations:
http://www.10gen.com/presentations/mongosf2011/schemabasics
http://www.10gen.com/presentations/mongosv-2011/schema-design-by-example
http://www.10gen.com/presentations/mongosf2011/schemascale
http://www.10gen.com/presentations/MongoNYC-2012/Building-a-MongoDB-Power-Chat-Server
Here are a couple of books about MongoDB schema design that I think you would find useful:
http://www.manning.com/banker/ (MongoDB in Action)
http://shop.oreilly.com/product/0636920018391.do (Document Design for MongoDB)
Here are some sample schema designs:
http://docs.mongodb.org/manual/use-cases/
https://openshift.redhat.com/community/blogs/designing-mongodb-schemas-with-embedded-non-embedded-and-bucket-structures

Is MongoDB object-oriented?

On the MongoDB website they write that MongoDB is a document-oriented database. So if MongoDB is not an object-oriented database, what is it? And what are the differences between document-oriented and object-oriented databases?
This may be a bit late in reply, but just thought it is worth pointing out, there are big differences between ODB and MongoDB.
In general, the focus of ODB is transparent references (relations) between objects in an arbitrarily complex domain model without having to use and manage code for something like a DBRef. Even if you have a couple thousand classes, you don't need to worry about managing any keys; they come for free, and when you create instances of those thousands of classes at runtime, they will automatically create the schema in the database, even for things like a self-referencing object with collections of collections.
Also, your transactions can span these references, so you do not have to use a completely embedded model.
The concepts are those leveraged in ORM solutions like JPA; the managed persistent object life-cycle is taken from the ODB space. But the HUGE difference is that there is no mapping AT ALL in the ODB, and relations are stored as part of the database, so there is no runtime JOIN to resolve relations; all relations are resolved with the same speed as a b-tree look-up. For those of you who have used Hibernate, imagine Hibernate without ANY mapping file and orders of magnitude faster because there is no runtime JOIN behind the scenes.
Also, ODB allows queries across any relationship in your model, so you are not restricted to queries in a particular collection as you are in MongoDB. Of course, hash/b-tree/aggregate indexes are supported too, so queries are very fast when they are used.
You can evolve instances of any class in an ODB at the class level, and at runtime the correct class version is resolved. This is quite different from MongoDB, where you maintain code to decide how to deal with the varied forms of blob (or value object) that result from evolving a schema-less database, or write code to visit and change every value object because you wanted to change the schema.
As far as partitioning goes, I think it is a lot easier to decide on a partitioning model for a domain model which can talk across arbitrary objects than it is to figure out the be-all, end-all embedding strategy for your collection-contained documents in MongoDB. As a ridiculous example, you have a Contact and an Address and a ShoppingCart, and these are related in a JSON document, and you decide to partition on Contact by Contact_id. There is absolutely nothing to keep you from treating those 3 classes as just objects instead of JSON documents and storing those with a partition on Contact_id just as you would with MongoDB. However, if you had another object Account and you wanted to manage those in a non-embedded way because of some aggregate billing operations done on accounts, you can have that for free (no need to create code for a DBRef type) in the ODB, and you can choose to partition right along with the Contact or choose to store the Accounts in a completely separate physical node, yet it will all be connected at runtime in the application space, just like magic.
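As a rough picture of the "code to deal with varied forms" that a schema-less store can require, here is a sketch of upgrading documents as they are read. It is plain Python; the `schema_version` field, the two document shapes and the `upgrade` function are hypothetical conventions chosen for illustration, not anything MongoDB provides.

```python
def upgrade(doc):
    """Bring a stored document to the current (hypothetical) version 2 shape."""
    if doc.get("schema_version", 1) == 1:
        # Version 1 stored a single "name" string; version 2 splits it.
        first, _, last = doc["name"].partition(" ")
        doc = {"_id": doc["_id"], "first_name": first, "last_name": last, "schema_version": 2}
    return doc


stored = [
    {"_id": 1, "name": "Ada Lovelace"},  # written by the old code
    {"_id": 2, "first_name": "Alan", "last_name": "Turing", "schema_version": 2},
]
current = [upgrade(d) for d in stored]
print(current)
```

The alternative mentioned above, visiting and rewriting every stored document once, trades this per-read branching for a one-off migration.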
If you want to see a really cool video on how to create an application with an ODB which shows distribution, object movement, fault tolerance, performance optimization .. see this ( if you want to skip to the cool part, jump about 21 minutes in and you will avoid the building of the application and just see the how easy it is to add distribution and fault tolerance to any existing application ):
http://www.blip.tv/file/3285543
I think doc-oriented and object-oriented databases are quite different. Fairly detailed post on this here:
http://blog.10gen.com/post/437029788/json-db-vs-odbms
Document-oriented
Documents (objects) map nicely to programming language data types
Embedded documents and arrays reduce need for joins
Dynamically-typed (schemaless) for easy schema evolution
No joins and no (multi-object) transactions for high performance and easy scalability
(MongoDB Introduction)
In my understanding, MongoDB treats every single record as a Document, whether it has 1 field or n fields. You can even have embedded Documents inside a Document. You don't have to define a schema, which is very strictly controlled in relational DB systems (MySQL, PostgreSQL, etc.). I've used MongoDB for a while and I really like its philosophy.
Object Oriented is a database model in which information is represented in the form of objects as used in object-oriented programming (Wikipedia).
A document oriented database is a different concept to object and relational databases.
A document in a document database may or may not contain a given field, whereas a relational or object database would expect missing fields to be filled with a null entry.
Imagine storing an XML or JSON string in a single field on a database table. That is similar to how a document database works. It simply allows semi-structured data to be stored in a database without having lots of null fields.
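A small sketch of that difference, in plain Python with made-up data: two documents with different fields, the null-padded rows a fixed table layout would need for the same data, and the JSON string form of one document.

```python
import json

# Two documents in the same collection with different sets of fields.
documents = [
    {"_id": 1, "name": "Ann", "phone": "555-0100"},
    {"_id": 2, "name": "Bob", "twitter": "@bob"},
]

# The relational equivalent needs one column per possible field, padded with None.
columns = ["_id", "name", "phone", "twitter"]
rows = [tuple(doc.get(col) for col in columns) for doc in documents]

null_count = sum(value is None for row in rows for value in row)
print(rows, null_count)

# Each document can be stored as a self-describing JSON string.
as_json = [json.dumps(doc) for doc in documents]
print(as_json[1])
```

The more optional fields the data has, the more of the table ends up holding nulls, while each document carries only the fields it actually uses.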