I have a schema to build for a bus station application that stores the distance and other information between two nearby bus stations. A single station id is unlikely to work as the index or unique key. I think a better approach is to group station 1 and station 2 together as the unique index key, but I am not confident that this is the right way to do it. Should I put the two bus station ids into an array and make that array the index and unique key?
That sounds very reasonable. It's a "relationship table", or a many-to-many join table with additional attributes. You would store the distance as an attribute of the M:N relationship, and the two bus station ids would form the composite primary key.
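As a minimal sketch of the composite-key idea, here is one way it could look using Python's built-in sqlite3 module. The table name, column names and helper functions are assumptions made for illustration, not part of the original question. The pair is stored in sorted order so that (3, 7) and (7, 3) map to the same row; in MongoDB a unique compound index on the two id fields would play a comparable role.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE station_link (
        station_a  INTEGER NOT NULL,
        station_b  INTEGER NOT NULL,
        distance_m INTEGER NOT NULL,          -- attribute of the relationship
        PRIMARY KEY (station_a, station_b),   -- composite key: the pair is unique
        CHECK (station_a < station_b)         -- enforce one canonical ordering
    )
    """
)


def add_link(a, b, distance_m):
    """Store the link with the smaller id first so each pair has one row."""
    lo, hi = sorted((a, b))
    conn.execute("INSERT INTO station_link VALUES (?, ?, ?)", (lo, hi, distance_m))


def distance_between(a, b):
    """Look up the distance regardless of the order the ids are given in."""
    lo, hi = sorted((a, b))
    row = conn.execute(
        "SELECT distance_m FROM station_link WHERE station_a = ? AND station_b = ?",
        (lo, hi),
    ).fetchone()
    return row[0] if row else None


add_link(7, 3, 850)

# Inserting the same pair again (in either order) violates the composite key.
try:
    add_link(3, 7, 900)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Sorting the ids before writing is a design choice: without it, the pair would need to be checked in both orders on every read and write.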
Look at the image in section 2.1.5 of this modeling guide
You may want to learn a little bit about database design techniques. If so, some useful sources on databases and modeling are:
Fundamentals of Database Systems by Elmasri and Navathe - very technical, covers all aspects of databases and treats modeling in depth
An Introduction to Database Systems by C. J. Date - similar to the above
A Tutorial on Normal Form (BCNF) - BCNF prescribes a way of refining your data model by iteratively applying a rule until the model meets the normal form (that is, it is efficient, barring intended redundancy).
The Wikipedia entry - covers BCNF as above and looks nice and concise (perhaps focus on sections 3 and 4)
EDIT: Update relevant to MongoDB
Actually, the above is all pretty general for database modeling. Having read some of the excellent resources on Data Modeling Considerations for MongoDB Applications, I think you need more specific guidance.
As such, I refer you to this informative SO post: how-to-organise-a-many-to-many-relationship-in-mongodb. The author gives a good explanation that sounds like what you're after. There are even references to docs and a video.
Related
I have just started learning DynamoDB and I am facing a big problem.
Suppose I have an author table and a book table, where an author can have multiple books and each book must have an author.
So, in a NoSQL DB, I just embedded the author information in the book table to solve this problem.
Sample code: https://pastebin.ubuntu.com/p/DvHpS8JQJV/
Recently, however, I ran into a problem: if, a long time later, an admin wants to change some information about an author, such as the live attribute, how can I make that change take effect in the book table?
Note: Embedding the book collection in the author table could solve this problem, but retrieving all book data with pagination and performing other operations could become more difficult in the future.
In a relational DB this is very easy to solve: just use a foreign key and retrieve the data with a join query.
How can I solve this type of problem in NoSQL or DynamoDB? Any suggestions?
You have two options.
Go with a semi-SQL design. Create separate tables for books and authors, and handle joins at the application level. It's not perfect from a performance perspective, but it's an easy start for devs with a SQL background.
Go with a single table design. This is a complex topic. There is no silver bullet for handling one-to-many relationships the way SQL does. You need a good understanding of your domain and of single table design to do this well.
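The first option can be sketched roughly as follows. This is plain Python with in-memory dictionaries standing in for the two tables; it makes no DynamoDB calls, and the ids, attribute names and the helper function are assumptions made for illustration. In a real DynamoDB setup each lookup would be a separate read request.

```python
# In-memory stand-ins for two separate tables (hypothetical data).
authors = {
    "a1": {"name": "Jane Doe", "city": "Oslo"},
}
books = {
    "b1": {"title": "First Book", "author_id": "a1"},
    "b2": {"title": "Second Book", "author_id": "a1"},
}


def get_book_with_author(book_id):
    """Application-level join: read the book, then read its author by id."""
    book = books[book_id]
    author = authors[book["author_id"]]
    return {**book, "author": author}


# The author is stored once, so a single update is seen by every book.
authors["a1"]["city"] = "Bergen"

first = get_book_with_author("b1")
second = get_book_with_author("b2")
```

The trade-off is the one named above: the update touches one record, but every read of a book with its author costs two lookups instead of one.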
I am coming from the relational database world (Rails / PostgreSQL) and transitioning to the NoSQL world (Meteor / MongoDB), so I am learning about denormalization, embedding and true links.
It seems that, in many cases, choosing between various database schemas comes down to the number of documents that will be "related" to each other.
In this video series, the author distinguishes:
one-to-many relationships from one-to-few relationships
many-to-many relationships from few-to-few relationships
So, I am wondering: where is the limit between few and many?
I guess there may not be a hard number, but are we in the dozens, the hundreds, the thousands or the millions?
It's all relative and is really kind of a dangerous question to make assumptions about when you're designing an architecture. It's worth investing time to make the right choices for your schema and your setup. I would advise a few steps:
Do the math. Multiply your relationships out based on what you expect your application to need to do. If you have a few nested arrays or embedded documents, a couple of "one-to-few" can expand out to many documents pretty easily when you start $unwinding them.
Write a prototype. Do some basic testing on your expected hardware/environment to see if it can easily handle that load when you do queries for all the data.
Based on your testing, create the limitations. This is where you need to draw the line on how many relations you can create per document, for each relationship type, before the system breaks down.
If it were me, I would say one-to-few is less than a dozen, and one-to-many is theoretically unlimited, but practically in the millions. Maybe there should be a middle ground of "one-to-some" to indicate possibly hundreds.
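The "do the math" step can be illustrated with a small calculation. The function name and the numbers are assumptions chosen for illustration: unwinding several array fields in sequence produces one row per combination of elements, so the sizes multiply.

```python
from math import prod


def unwound_rows(doc_count, array_sizes):
    """Rows produced when every listed array field is unwound in turn.

    Each unwind multiplies the row count by the size of that array,
    so the total is doc_count times the product of the array sizes.
    """
    return doc_count * prod(array_sizes)


# Hypothetical case: 10,000 documents, each with 10 tags and 50 comments.
rows = unwound_rows(10_000, [10, 50])
```

Here two seemingly small "one-to-few" arrays turn 10,000 documents into 5,000,000 rows, which is the kind of figure worth testing against in a prototype.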
Taken from 6 rules of thumb for MongoDB schema design:
one-to-few - from two up to a few hundred
one-to-many - from a couple of hundred up to a few thousand
one-to-squillions - thousands and up
I totally agree with @womp about the need to choose the right schema for your use case. The article I posted above has some good guidelines and examples of which schema design to use.
I'm going to try and make this as straight-forward as I can.
Coming from MySQL and thinking in terms of tables, let's use the following example:
Let's say that we have a real-estate website and we're displaying a list of houses
normally, I'd use the following tables:
houses - the real estate asset at hand
owners - the owner of the house (one-to-many relationship with houses)
agencies - the real-estate broker agency (many-to-many relationship with houses)
images - many-to-one relationship with houses
reviews - many-to-one relationship with houses
I understand that MongoDB gives you the flexibility to design your web app with separate collections and unique IDs, much like a (normalized) relational database, and that, to enjoy quick selections, you can nest related objects and data within a collection (denormalized).
Back to our real-estate house list: the query used to populate it is quite expensive in a normal relational DB. For each house you need to query its images, reviews, owner and agencies. Each entity resides in a different table with its own fields, so you'd probably use joins and have multiple queries joined into one. Expensive!
Enter MongoDB, where you don't need joins and you can store all the related data of a house in a house item in the houses collection. Selection was never faster; it's DB heaven!
But what happens when you need to add/update/delete related reviews/agencies/owner/images?
This is a mystery to me. If I had to guess, each related entity lives in its own collection in addition to the copy of its data embedded in the houses collection, and whenever one of these pieces of related data is added, updated or deleted, you have to update it in its own collection as well as in the houses collection. Upon this update, do I also need to query the other collections to make sure I'm updating the house record with all the latest related data?
I'm just guessing here and would really appreciate your feedback.
Thanks,
Ajar
Try this approach:
Work out which entity (or entities) are the hero(s)
By 'hero', I mean the entity (or entities) that the database is centered around. Let's take your example. The hero of the real-estate example is the house*.
Work out the ownerships
Go through the other entities, such as the owner, agency, images and reviews and ask yourself whether it makes sense to place their information together with the house. Would you have a cascading delete on any of the foreign keys in your relational database? If so, then that implies ownership.
Work out whether it actually matters that data is de-normalised
You will have agency (and probably owner) details spread across multiple houses. Does that matter?
Your house collection will probably look like this:
house: {
    owner,
    agency,
    images[], // recommend references to GridFS here
    reviews[] // you probably won't get too many of these for a single house
}
*Actually, it's probably the ad of the house (since houses are typically advertised on a real-estate website and that's probably what you're really interested in), so just consider that.
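To make the trade-off concrete, here is a rough sketch in plain Python, using dictionaries in place of MongoDB documents. All field names, values and helper functions are assumptions made for illustration. Data owned by the house (reviews) is changed in one place, while denormalised data (the agency) has to be changed in every house that carries a copy.

```python
# Hypothetical house documents with embedded owner, agency, images and reviews.
houses = [
    {"_id": 1, "owner": {"name": "Ann"},
     "agency": {"id": "ag1", "phone": "555-0100"},
     "images": ["img-1"], "reviews": []},
    {"_id": 2, "owner": {"name": "Bob"},
     "agency": {"id": "ag1", "phone": "555-0100"},
     "images": ["img-2"], "reviews": []},
    {"_id": 3, "owner": {"name": "Cy"},
     "agency": {"id": "ag2", "phone": "555-0199"},
     "images": [], "reviews": []},
]


def add_review(house_id, review):
    """Owned data: only the one house document is touched."""
    for house in houses:
        if house["_id"] == house_id:
            house["reviews"].append(review)
            return True
    return False


def update_agency_phone(agency_id, phone):
    """Denormalised data: every house embedding this agency must be updated."""
    updated = 0
    for house in houses:
        if house["agency"]["id"] == agency_id:
            house["agency"]["phone"] = phone
            updated += 1
    return updated


add_review(1, {"stars": 5, "text": "Lovely garden"})
touched = update_agency_phone("ag1", "555-0123")
```

Whether the second kind of update matters depends on how often agency details change compared with how often houses are read.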
Sarah Mei wrote an informative article about the kinds of data-integrity issues that can arise in NoSQL DBs. It covers the choice between duplicating data and using ids with code-based joins, and the challenges of keeping data integrity. Her take is that any NoSQL DB with code-based joins will lose data integrity at some point. IMHO the article's comments are as valuable as the article itself in understanding these issues and possible resolutions.
Link: http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/comment-page-1/
I would just like to give a normalization refresher from MongoDB's perspective:
What are the goals of normalization?
Frees the database from modification anomalies - In MongoDB, it is mostly embedding data that would cause these. In fact, we should try to avoid embedding data in documents in ways that could create such anomalies. Occasionally, we might need to duplicate data in documents for performance reasons. However, that's not the default approach. The default is to avoid it.
Should minimize re-design when extending - MongoDB is flexible here because it allows keys to be added without re-designing all the documents.
Avoid bias toward any particular access pattern - This is something we're not going to worry about when describing a schema in MongoDB. One of the ideas behind MongoDB is to tune your database to the application you're trying to write and the problem you're trying to solve.
I’m creating a document model of my entities to store in a Document Database (RavenDB). The domain I’m modeling revolves around Incidents. An incident has a source, a priority, a category, a level of impact and many other classification attributes. In an RDBMS, I’d have an Incident table with Foreign Keys to the Priorities table, the Categories table, the Impacts table, etc., but I don't know how to handle that in a document database (this is my first Doc DB).
I have two types of reference data:
Simple lookup values: Countries, States, Sources, Languages. Attributes: They only have a Name but this is a multilingual system so there is name for each language. Supported operations: create, delete, rename, deactivate and merge.
Complex reference data: Same as the Simple Lookups plus: Several of those have many fields and have business rules and validation rules of their own. For instance two Priorities cannot have the same Rank value. Some have a more complex structure, for instance Categories are composed of Subcategories.
How should I model those as (or as part of) documents?
PS: Links to Document Database Modeling Guidelines would be appreciated as well
Handling relationships in a document database is very different from handling them in a SQL database. The RavenDB documentation discusses this here. For things that rarely, if ever, change, you should use denormalized references.
Further, there is a good discussion about modelling reference data by the main RavenDB author, here. You can expand this example to include a dictionary of abbreviations/names per locale pretty easily. An example of this, here.
To answer your specific questions:
You can store a key for each Country/State/etc and then retrieve the locale-specific version using this key, by loading the whole reference data document and performing an in-memory lookup.
Denormalized references would be a good fit for categories. You can include the Name and/or parent category if it has to be displayed. It sounds like the entity itself is small, so you may as well store the whole thing (and don't need to denormalize it). It's OK to replicate it: it's cheaper to process this way and it won't change, or at least not often (and if it does, you can use patching to update it). The same applies to your other entities. As far as I can see, business rules have nothing to do with the database, other than that you must be able to run the appropriate queries to enforce them.
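The two suggestions above can be sketched in plain Python, with dictionaries standing in for documents. This is not RavenDB's API; the document ids, field names and the helper function are assumptions made for illustration. The incident stores only a key for the country and resolves the localized name in memory, while the category is carried as a small denormalized reference.

```python
# One reference-data document holding all countries with per-locale names.
countries_doc = {
    "id": "reference/countries",
    "items": {
        "DE": {"names": {"en": "Germany", "fr": "Allemagne"}, "active": True},
        "ES": {"names": {"en": "Spain", "fr": "Espagne"}, "active": True},
    },
}

# The incident keeps a key for the simple lookup and a denormalized
# reference (id plus display name) for the category.
incident = {
    "id": "incidents/1",
    "country_key": "DE",
    "category": {"id": "categories/7", "name": "Network"},
}


def display_name(reference_doc, key, locale, fallback="en"):
    """In-memory lookup of the localized name, falling back to a default locale."""
    names = reference_doc["items"][key]["names"]
    return names.get(locale, names[fallback])


french = display_name(countries_doc, incident["country_key"], "fr")
unknown_locale = display_name(countries_doc, incident["country_key"], "pt")
```

Renaming a country then means editing one reference document, whereas renaming a category means updating every incident that carries its name.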
Update: Here's a post that describes how to deal with a tree structure in Raven.
I am trying to figure out how to best implement this for my system...and get my head out of the RDBMS space for now...
A part of my current DB has three tables: Show, ShowEntry, and Entry. Basically ShowEntry is a many-to-many joining table between Show and Entry. In my RDBMS thinking it's quite logical since any changes to Show details can be done in one place, and the same with Entry.
What's the best way to reflect this in document-based storage? I'm sure there is no single way of doing this, but I can't help wondering whether document-based storage is appropriate for this case at all.
FYI, I am currently considering implementing RavenDB. While a discussion of general NoSQL design would be good, a more RavenDB-focused one would be fantastic!
Thanks,
D.
When modelling a many-to-many relationship in a document database, you usually store a collection of foreign keys in just one of the documents. The document you choose largely depends on the direction you intend to traverse the relationship. Traversing it one way is trivial, traversing it the other way requires an index.
Take the shopping basket example. It's more important to know exactly which items are in a particular basket than which baskets contain a particular item. Since we're usually following the relationship in the basket-to-item direction, it makes more sense to store item IDs in a basket than it does to store basket IDs in an item.
You can still traverse the relationship in the opposite direction (e.g. find baskets containing a particular item) by using an index, but the index will be updated in the background so it won't always be 100% accurate. (You can wait for the index to become accurate with WaitForNonStaleResults, but that delay will show in your UI.)
If you require immediate 100% accuracy in both directions, you can store foreign keys in both documents, but your application will have to update two documents whenever a relationship is created or destroyed.
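The basket example can be sketched in plain Python, with dictionaries standing in for documents. This is a simulation of the idea rather than RavenDB's API, and all ids and function names are assumptions made for illustration. The foreign keys live in the basket, the reverse direction goes through a separately built index, and that index is stale until it is rebuilt.

```python
from collections import defaultdict

# Each basket document stores the ids of the items it contains.
baskets = {
    "baskets/1": {"item_ids": ["items/apple", "items/pear"]},
    "baskets/2": {"item_ids": ["items/apple"]},
}


def items_in_basket(basket_id):
    """Forward direction: read the ids straight off the basket document."""
    return baskets[basket_id]["item_ids"]


def build_item_index():
    """Reverse direction: map each item id to the baskets that contain it."""
    index = defaultdict(set)
    for basket_id, basket in baskets.items():
        for item_id in basket["item_ids"]:
            index[item_id].add(basket_id)
    return index


item_index = build_item_index()

# A write arrives after the index was built, so the index is now stale.
baskets["baskets/2"]["item_ids"].append("items/pear")

stale = item_index["items/pear"]          # still reflects the old state
fresh = build_item_index()["items/pear"]  # accurate again after rebuilding
```

The forward lookup is always current because it reads the basket itself; only the reverse lookup depends on how recently the index was refreshed.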
This went a long way towards solving my question!
Answer to the question
Many-to-many relationships in NoSQL are implemented via an array of references on one of the entities.
You've got two options:
Show has an array of Entry items;
Entry has an array of Shows.
Location of the array is determined by the most common direction of querying. To resolve records in the other direction - index the array (in RavenDB it works like a charm).
Usually, having two arrays on both entities pointing to each other brings more grief than joy. You're losing the single source of truth in an eventually consistent environment... it has potential for breaking data integrity.
Check out this article - Entity Relationships in NoSQL (one-to-many, many-to-many). It covers entity relationships from various angles, taking into account performance, operational costs, time/costs of development and maintenance... and provides examples for RavenDB.