What exactly is demoralization in NoSQL databases?
I have read that it means modelling different object types as different documents. My first guess was that it means aggregation without storing related data, i.e. storing all rows of an entity in a single document, with the related data for each row being referenced from separate documents.
But I'm not sure whether this is correct.
An example would be helpful.
Thanks in advance
I do mean demoralization and not denormalization. I came across this term in the following links:
1. Couchbase documentation
2. Blog on Nosql
In the context of NoSQL (and databases in general), demoralization is synonymous with denormalization. Many documents mix the two terms, or describe demoralization as the opposite of normalization (so, again, the same as denormalization):
What Is Meant By Denormalization In SQL?
Database Denormalization
What is demoralization?
Normalization & Demoralization
Designing databases - OLTP and OLAP
There is even a reference which mentions that many spell checkers suggest "demoralization" in place of "denormalization". This could explain why some people use demoralization: The effect of denormalization
NoSQL is a very, very wide field. It covers a lot of entirely different database systems with entirely different concepts of how data should be structured.
The dogma of database normalization applies mostly to classic relational databases. The further a NoSQL database is from the relational philosophy, the more you have to question this dogma.
The philosophy of normalization assumes that database JOINs are cheap. So any data which can be split over multiple tables to remove redundancies should be split. But that doesn't apply to all NoSQL databases. Some of them don't support JOIN operations, so getting data stored in many different database entries can be a very expensive operation which requires either multiple consecutive queries to the database or expensive server-side code execution. When you use one of those databases, you should store your data in a way that every performance-critical use case can be fulfilled by looking up as few entries as possible, even when this means that you will have redundant data.
Those non-relational NoSQL databases which don't support JOINs frequently support arrays in database entries instead. These are usually the preferred way to model 1:n relations. So when 1 person has n telephone numbers, you wouldn't store the telephone numbers in a separate table/document/collection/whateveryoucallit, you would store them in an array in the person entry. There would usually be no reason to treat telephone numbers as self-contained entities if it weren't for the inability of SQL to work properly with multiple values in a single field.
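To make the telephone number example concrete, here is a minimal Python sketch that models both layouts with plain dictionaries and lists. The names (persons_table, phones_table, person_document) and the sample numbers are made up for illustration and do not belong to any particular database.

```python
# Relational-style layout: phone numbers live in their own "table"
# and point back to the person through a foreign key.
persons_table = [{"id": 1, "name": "John Doe"}]
phones_table = [
    {"person_id": 1, "number": "555-0100"},
    {"person_id": 1, "number": "555-0101"},
]

# Document-style layout: the 1:n relation is an array inside the person entry.
person_document = {
    "_id": 1,
    "name": "John Doe",
    "phones": ["555-0100", "555-0101"],
}


def phones_relational(person_id):
    # Equivalent of a JOIN or a second query: scan the phone table for matches.
    return [row["number"] for row in phones_table if row["person_id"] == person_id]


def phones_document(document):
    # The numbers arrive together with the person; no second lookup is needed.
    return document["phones"]
```

Both functions return the same list; the difference is that the document layout gets it from a single entry.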
Denormalization in a NoSQL world would mean the same as in an RDBMS world: duplication of data for read performance.
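As a rough sketch of that trade-off, assume a hypothetical blog where the author's name is duplicated into every post. The data and the helper names (render_post, rename_author) are invented for this example.

```python
# Hypothetical blog data: the author's display name is duplicated into each
# post so that rendering a post needs only the post itself.
authors = {"a1": {"name": "Jane Doe"}}
posts = [
    {"_id": "p1", "author_id": "a1", "author_name": "Jane Doe", "title": "First"},
    {"_id": "p2", "author_id": "a1", "author_name": "Jane Doe", "title": "Second"},
]


def render_post(post):
    # Read path: a single entry is enough, no lookup of the author is needed.
    return f'{post["title"]} by {post["author_name"]}'


def rename_author(author_id, new_name):
    # Write path: every copy has to be updated, otherwise the copies drift apart.
    authors[author_id]["name"] = new_name
    updated = 0
    for post in posts:
        if post["author_id"] == author_id:
            post["author_name"] = new_name
            updated += 1
    return updated
```

Reads get cheaper because one entry answers the query; writes get more expensive because each duplicate must be kept in sync.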
Related
I would like to know if Spatialite is considered as a NoSQL database.
What is NoSQL?
NoSQL encompasses a wide variety of different database technologies
that were developed in response to a rise in the volume of data stored
about users, objects and products, the frequency in which this data is
accessed, and performance and processing needs. Relational databases,
on the other hand, were not designed to cope with the scale and
agility challenges that face modern applications, nor were they built
to take advantage of the cheap storage and processing power available
today.
NoSQL Database Types
Document databases pair each key with a complex data structure known
as a document. Documents can contain many different key-value pairs,
or key-array pairs, or even nested documents.
Graph stores are used to store information about networks, such as
social connections. Graph stores include Neo4J and HyperGraphDB.
Key-value stores are the simplest NoSQL databases. Every single item
in the database is stored as an attribute name (or "key"), together
with its value. Examples of key-value stores are Riak and Voldemort.
Some key-value stores, such as Redis, allow each value to have a type,
such as "integer", which adds functionality.
Wide-column stores such as Cassandra and HBase are optimized for
queries over large datasets, and store columns of data together,
instead of rows.
The Benefits of NoSQL
When compared to relational databases, NoSQL databases are more
scalable and provide superior performance, and their data model
addresses several issues that the relational model is not designed to
address:
Large volumes of structured, semi-structured, and unstructured data
Agile sprints, quick iteration, and frequent code pushes
Object-oriented programming that is easy to use and flexible
Efficient, scale-out architecture instead of expensive, monolithic
architecture
This explanation is from the MongoDB site.
NoSQL is a very vaguely defined term (I once wrote a blog post about this issue).
But even though the definition of NoSQL is quite fuzzy, you can definitely say that SpatiaLite is not a NoSQL database. It is, in fact, not a database at all. It is just a library for using SQLite (which is a SQL database).
The library includes some utility-functions which make it easier to store and query geospatial data in SQLite. But that data is still queried with normal SQL syntax and stored in a relational manner, so you couldn't even claim that it is a NoSQL abstraction layer on a SQL database.
I'm hearing more about NoSQL, but have yet to have someone give me a clear explanation of how it is to be used instead of relational databases.
I've read that it can't do left joins, so I was trying to figure out how you'd be able to use such a data storage. From reading: Preserve Joins by code in MongoDB it seems like a suggestion is to just make a large table, as if you already did the joins on it.
If the above statement is true, then I can see how it can be used. However, I'm curious how you'd handle repeated data, since normalizing helps you remove redundancy and ensure consistency in the data (e.g. avoiding slight variations in capitalization, white space, etc.).
Are we simply sacrificing the consistency of the data for scalable speed, or am I missing something?
Edit
I've been doing some more digging and found the answers to the following questions useful for clarifying my understanding:
Why Google's BigTable referred as a NoSQL database?
How do you track record relations in NoSQL?
My understanding of consistency seems to be correct from those answers. And it looks like NoSQL is supposed to be used for specific types of problems, and that if you need relations you should use a relational database.
But this raises more questions like:
It makes me wonder about real life examples of when to use NoSQL versus when not to?
By denormalizing the data, you should be able to solve all of the same problems that relational databases do... But there are rules on how to normalize data with relational databases. Are there rules that one can use to help them denormalize the data to use a NoSQL solution?
Any examples on when you might want to consider using both a NoSQL solution in parallel with a relational database?
MongoDB allows documents to include arrays of other documents. This covers many cases where you would use relations in relational databases.
When an invoice has multiple positions, you wouldn't put these positions into a separate collection. You would embed them as an array.
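A minimal sketch of such an invoice, written as a Python dictionary that mirrors the shape of a MongoDB document. The field names, identifier and amounts are assumptions made for this example.

```python
# Hypothetical invoice document: the positions are embedded as an array
# instead of being stored in a separate collection.
invoice = {
    "_id": "inv-1001",
    "customer": "ACME Corp.",
    "positions": [
        {"description": "Keyboard", "quantity": 2, "unit_price": 25.0},
        {"description": "Monitor", "quantity": 1, "unit_price": 150.0},
    ],
}


def invoice_total(document):
    # Everything needed for the total is inside the one document.
    return sum(p["quantity"] * p["unit_price"] for p in document["positions"])
```

Loading the invoice brings all of its positions along, so computing the total needs no second query.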
It makes me wonder about real life examples of when to use NoSQL versus when not to?
There are many different NoSQL databases, each one designed with different use-cases in mind. But you tagged this question as MongoDB, so I assume that you mean MongoDB in particular.
MongoDB has two main advantages over relational databases.
First, it scales well.
When the database is too slow or too big, you can easily add more servers by creating a sharded cluster, in which the data is spread over multiple shards (each of which can be a replica set). This doesn't work nearly as well with most relational databases.
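The general idea of spreading data over several servers can be sketched with simple hash partitioning. Note that this is a generic illustration and not MongoDB's actual sharding mechanism; the function names shard_for and distribute are hypothetical.

```python
import hashlib


def shard_for(key, shard_count):
    # Map a key to a shard index with a stable hash, so the same key
    # always lands on the same server.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def distribute(keys, shard_count):
    # Group keys by the shard they would be stored on.
    shards = [[] for _ in range(shard_count)]
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards
```

With more shards each server holds a smaller share of the keys. A naive modulo scheme like this one would move most keys whenever the shard count changes, which is one reason real systems use more elaborate schemes.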
Second, it allows heterogeneous data.
Imagine, for example, the product database of a computer hardware store. What properties do products have? All products have a price and a vendor. But CPUs have a clock rate, hard drives and RAM chips have a capacity (and these capacities aren't comparable), monitors have a resolution and so on. How would you design this in a relational database? You would either create a very long productID-property-value table or you would create a very wide and sparse product table with every property you can imagine, but most of them being NULL for most products. Neither solution is really elegant. But MongoDB can handle this much better because it allows each document in a collection to have a different set of properties.
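The hardware store example could look roughly like this when the documents are modelled as Python dictionaries. The product names, prices and field names are invented, and sparse_row is only there to show what the wide relational table would have to store.

```python
# Hypothetical product collection: every document has price and vendor,
# but the remaining properties differ per product type.
products = [
    {"name": "CPU X", "price": 199.0, "vendor": "A", "clock_rate_ghz": 3.6},
    {"name": "Disk Y", "price": 89.0, "vendor": "B", "capacity_gb": 2000},
    {"name": "Monitor Z", "price": 249.0, "vendor": "C", "resolution": "2560x1440"},
]


def with_property(documents, field):
    # Select the documents that actually carry the given property.
    return [d["name"] for d in documents if field in d]


def sparse_row(document, columns):
    # What the wide relational table would hold: one column per possible
    # property, with None (NULL) wherever the product lacks it.
    return {column: document.get(column) for column in columns}
```

Each document carries only the properties that make sense for it, while the wide-table row has to fill the gaps with NULLs.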
What can't it do?
Because it is a rather new technology, there isn't that much literature about it. The software ecosystem around it isn't that well developed either. The tools you can get for relational databases are often much shinier.
There are also some use-cases MongoDB isn't well-suited for.
MongoDB doesn't do JOINs. When your data is very relational and denormalizing it would be counter-productive, it might be a poor choice for your product. But you might want to take a look at graph databases like Neo4j, which focus even more on relations than relational databases. Update 2016: MongoDB 3.2 now has rudimentary JOIN support with the $lookup aggregation stage, but it's still very limited in functionality compared to relational and graph databases.
MongoDB doesn't do transactions. At least not complex transactions. Certain actions which only affect a single document are guaranteed to be atomic, but as soon as you affect more than one document, you can't guarantee that no other query will happen in-between and find an inconsistent state. Update: MongoDB 4.0 and later support multi-document transactions, so this limitation applies to older versions.
MongoDB is bad for ad-hoc reporting. Its options for data mining are severely limited. The rather new aggregation functions help, and MapReduce can also solve some surprisingly complex problems once you learn to use it well, but SQL usually has the better tools for things like that.
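The problem with updates that span more than one document can be sketched in a few lines of Python. The accounts and the transfer_steps generator are hypothetical; each yield marks a moment at which another query could read the data.

```python
# Two hypothetical account documents; a transfer has to modify both of them.
accounts = {"a": {"balance": 100}, "b": {"balance": 0}}


def transfer_steps(source, target, amount):
    # Each write touches one document and is atomic on its own. The generator
    # yields the total of all balances after every write, i.e. what a
    # concurrent reader would see at that moment.
    accounts[source]["balance"] -= amount
    yield sum(account["balance"] for account in accounts.values())
    accounts[target]["balance"] += amount
    yield sum(account["balance"] for account in accounts.values())
```

Between the two writes the total is too low by the transferred amount. That is the inconsistent intermediate state which a transaction spanning both documents would hide from other queries.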
By denormalizing the data, you should be able to solve all of the same problems that relational databases do... But there are rules on how to normalize data with relational databases. Are there rules that one can use to help them denormalize the data to use a NoSQL solution?
Relational databases have been around for about 40 years. Their theory is a well-researched topic in computer science. There are whole libraries of books written about the theory behind them. There is a by-the-book solution for every imaginable corner case by now.
But NoSQL databases, on the other hand, are a rather new technology. We are still figuring out the best practices. The most frequent advice is: "Use your own head. Think about what queries are performed most often, and optimize your data schema for them."
Any examples on when you might want to consider using both a NoSQL solution in parallel with a relational database?
When possible I would advise against using two different database technologies in the same product:
Anyone who maintains and supports the product must be familiar with both technologies
Troubleshooting gets a lot harder
The sysadmins need to keep an additional database running and updated
You have an additional point of failure which can lead to downtime
I would only recommend mixing database technologies when fulfilling your requirements without doing so isn't just hard but physically impossible. Otherwise, make your pick and stay with it.
Having worked with MongoDB and Solr/Lucene, I am starting to wonder why multi-value fields in relational databases are (generally) considered a bad idea.
I am aware of the theoretical foundation of relational databases and normalization. In practice, however, I have run into many use cases where I end up using a meta table of key-value pairs to supplement the main table, such as tagging, where I wish I didn't have to make multiple joins to look up the data, or where requirements suddenly changed from supporting a single author to multiple authors per article.
So, what are some disadvantages of having multi-value fields? Or did the vendors choose not to support them because they are not part of the SQL standard?
The main disadvantage is query bias: the phenomenon that such databases tend to be designed with one particular kind of query in mind, and turn out to be difficult to handle when other queries need to be written.
Suppose you have Students and Courses, and you model all of that so that you can say, in a single row in a single table, "John Doe takes {French, Algebra, Relational Theory}" and "Jane Doe takes {German, Functional Computing, Relational Theory}".
That makes it easy to query "what are all the courses followed by ...", but try and imagine what it would take to produce the answer to "what are all the students who follow Relational Theory".
Try and imagine all the things the system should itself be doing to give such a query (if it were possible to write it) any chance of performing reasonably ...
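The asymmetry can be sketched in Python, using the two students from the example above. The structure and the function names are illustrative only; a real multi-value database may offer indexes that change the picture.

```python
# One row per student, with a multi-value "courses" field.
enrollments = {
    "John Doe": ["French", "Algebra", "Relational Theory"],
    "Jane Doe": ["German", "Functional Computing", "Relational Theory"],
}


def courses_of(student):
    # The query the design was built for: a single lookup by key.
    return enrollments[student]


def students_of(course):
    # The "other" query: every row and every value in it must be inspected.
    return sorted(s for s, courses in enrollments.items() if course in courses)


def as_rows():
    # Normalized alternative: one (student, course) pair per row, so both
    # questions become the same kind of filter over the same rows.
    return [(s, c) for s, courses in enrollments.items() for c in courses]
```

Without extra indexing, students_of has to scan the whole structure, whereas the flat list of pairs treats both directions of the relationship the same way.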
The query-bias argument assumes that SQL is always a good query language. In fact it is sometimes an excellent query language, but it has never been one size fits all. Multivalue databases allow you to pack multiple values and handle 'alternate perspective' queries.
Examples of MVDBs: UniData http://u2.rocketsoftware.com/products/u2-unidata, OpenInsight http://www.revelation.com/, Reality http://www.northgate-is.com/. There are many others.
Their query languages support what you are looking to do.
I think this has its roots in the fact that there is no simple, standard way to map a collection to a column in the relational world. A multi-value field is basically a simple collection (an array of strings in most use cases), which is difficult to represent as a column. Some RDBMSs support this by using a delimiter, but then it starts to feel like an anti-pattern, even if the DB driver lets you use multi-value fields in a relational database. Databases like MongoDB rely on a JSON-like structure to define the data, in which collections are easily mapped and retrieved.
I have an MSSQL database which I am considering porting to CouchDB or MongoDB. I have a many-to-many relationship within the SQL db which has hundreds of thousands of rows in the xref table, corresponding to tens of thousands of rows in the tables on each side of the relationship. Will CouchDB and/or MongoDB be able to handle this data, and what would be the best way of formatting the relevant documents for performant querying? Thanks very much.
For CouchDB, I would highly recommend reading this article about Entity Relationships.
One thing I would note in CouchDB is to be careful of attempting to "normalize" a non-relational data model. The document-based storage offers you a great deal of flexibility, and it's seldom the best idea to abstract everything into as many "document types" as you can think of. Many times, it's best to leave much of your data within the same document unless you have clear cases where separate entities exist.
One common use-case of many-to-many relationships is implementing tagging. There are articles about different methods you can use to accomplish this in CouchDB. It may apply to your requirements, it may not, but it's probably worth a read.
Since the 'collection' model of MongoDB is similar to tables, you can of course maintain the m:n relationship inside a dedicated mapping collection (using the _id values of the related documents from the other collections).
If you can, consider redesigning your application to use embedded documents.
http://www.mongodb.org/display/DOCS/Schema+Design
In general: try to set aside your RDBMS habits when working with MongoDB.
Blindly copying the database design from an RDBMS to MongoDB is neither helpful nor advisable, nor will it work in general.
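The two options for an m:n relationship mentioned in this answer, a mapping collection and embedding, can be compared in a small Python sketch. The article/tag data and all names are made up for the example and are not taken from the question.

```python
# Variant 1: a dedicated mapping collection, like an xref table in SQL.
articles = {1: {"title": "Intro to CouchDB"}, 2: {"title": "Schema design"}}
tags = {10: {"label": "nosql"}, 11: {"label": "couchdb"}}
article_tags = [
    {"article_id": 1, "tag_id": 10},
    {"article_id": 1, "tag_id": 11},
    {"article_id": 2, "tag_id": 10},
]


def tags_via_mapping(article_id):
    # Needs the mapping entries plus one lookup per referenced tag.
    return [tags[m["tag_id"]]["label"]
            for m in article_tags if m["article_id"] == article_id]


# Variant 2: the tags are embedded in each article as an array.
articles_embedded = {
    1: {"title": "Intro to CouchDB", "tags": ["nosql", "couchdb"]},
    2: {"title": "Schema design", "tags": ["nosql"]},
}


def articles_with_tag(label):
    # The reverse direction filters on the embedded array.
    return sorted(article_id for article_id, doc in articles_embedded.items()
                  if label in doc["tags"])
```

The embedded variant answers "which tags does this article have" from a single document. The reverse query still has to look into every article unless the database can index array fields.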
There seem to be a lot of new "NoSQL" type databases out there.
Some of the popular ones are CouchDB, Cassandra and MongoDB.
What are the differences between such databases and how are they different from traditional relational databases? What are the advantages and disadvantages of picking NoSQL DBs over SQL DBs?
The term NoSQL covers a lot of different approaches to data storage ranging from the simplest key/value storage to sophisticated document databases. It's a catchy buzzword, but not very descriptive IMHO.
For a quick intro you could take a look at the Wikipedia entry for NoSQL
Agreed, the question is "not which is better," it's "which solution or set of solutions is best for this particular situation."
NoSQL covers a lot of different storage technologies such as CouchDB, MongoDB, Cassandra and Solr.
CouchDB and MongoDB store multi-dimensional data-structures. MongoDB is also schema-less. Cassandra is a column-based storage engine for fast retrieval, and Solr helps solve other problems such as faceting.
NoSQL simply refers to any storage facility which is not interacted with via SQL queries.
They are not better. NOSQL doesn't involve any new innovation or special feature. NOSQL just refers to a collection of software products that are used for certain types of application but don't necessarily have much else in common with each other. NOSQL does not have to mean a non-relational database.
Folks, it's a hot debate nowadays: SQL or NoSQL. Some admire the elegance and performance of NoSQL databases, while others want to stay with the legacy of SQL and the RDBMS. Each has its merits and demerits, so I have tried to contrast them briefly in a few points.
An RDBMS uses relations and joins to keep the data in its tables simple, while NoSQL avoids joins for the sake of performance.
NoSQL scales freely in terms of both schema and data, while it's very tough to scale an RDBMS as the data grows.
An RDBMS restricts the size of data through the capabilities of its data types, while files of any size can be used in NoSQL databases.
Data integrity enforcement comes into play only in an RDBMS, not in NoSQL databases.
ACID is the cup of tea of the RDBMS, not of NoSQL databases.
An RDBMS supports complex transactions, whereas NoSQL keeps mum about transactions.
NoSQL does not support constraints and validations, while they are a basic ingredient of an RDBMS.
Data is not structured in NoSQL, but is highly structured in the form of tables in an RDBMS.
It all depends on the nature and needs of the project whether to use SQL or NoSQL.
An RDBMS is a completely structured way of storing data, while NoSQL is an unstructured way of storing data.
Another main difference is that in an RDBMS the amount of data stored mainly depends on the physical memory of the system, while in NoSQL you don't have any such limit, as you can scale the system horizontally.
You'll find that NoSQL databases have a few common characteristics. They can be roughly divided into a few categories:
key/value stores
Bigtable inspired databases (based on the Google Bigtable paper)
Dynamo inspired databases
distributed databases
document databases
Well, the basic differences are discussed below. Of course, NoSQL concepts are getting more popular day by day, but which one to use still depends on the needs and requirements of the project.
1) SQL databases are primarily called RDBMSs, whereas NoSQL databases are primarily called non-relational or distributed databases.
2) An RDBMS follows the ACID properties, i.e. Atomicity, Consistency, Isolation and Durability, whereas NoSQL follows the CAP theorem (Consistency, Availability and Partition tolerance).
3) In SQL we store data in tabular form only, whereas NoSQL uses collections of key-value pairs, documents, graphs or wide-column stores. So NoSQL is schema-free and can handle structured, semi-structured and unstructured data.
SQL, however, is not schema-free; it has a predefined schema. For example, if the first column of a table has the int data type, you can't store string or float values in it.
4) An RDBMS uses SQL (Structured Query Language) for defining and manipulating the data, which is very powerful. In a NoSQL database, queries are focused on collections of documents. This is sometimes also called UnQL (Unstructured Query Language), and its syntax varies from database to database. SQL databases are also a good fit for complex, query-intensive environments, whereas NoSQL databases are not a good fit for complex queries. At a high level, NoSQL databases don't have standard interfaces for performing complex queries, and the queries themselves are not as powerful as those of the SQL query language.
For example, take social networking sites. We upload photos, videos, music, albums, etc., and for those we get comments, replies to comments, likes and so on. These can contain numbers, special characters and more, so we can hardly predict what a reply or comment might look like. In this case we go for a document-type NoSQL database and store the comments like below.
{
    user_id: ObjectId("65f82bda42e7b8c76f5c1969"),
    update: [
        {
            date: ISODate("2015-09-18T10:02:47.620Z"),
            text: "Nice picture."
        },
        {
            date: ISODate("2015-09-17T13:14:20.789Z"),
            text: "1234#some smile symbol"
        },
        {
            date: ISODate("2015-09-17T12:33:02.132Z"),
            text: "...Oh my god.."
        }
    ]
}
In the above case, if we go for SQL we can't simply store the comments (the text fields above) in a single column; we need to store them according to their type. So we end up with a big, complex query with a number of joins across different tables. But SQL is good for transactions.
5) In most typical situations, SQL databases are vertically scalable: you can manage increasing load by increasing the CPU, RAM, SSD, etc. on a single server. NoSQL databases, on the other hand, are horizontally scalable: you can just add a few more servers to your NoSQL database infrastructure to handle heavy traffic.
6) SQL databases are the best fit for heavy-duty transactional applications, as they are more stable and promise atomicity as well as integrity of the data. While you can use NoSQL for transactional purposes, it is still not comparable or stable enough under high load and for complex transactional applications.
7) Examples of NoSQL databases are MongoDB, Cassandra, etc., while examples of SQL databases are MySQL, SQL Server, etc.