How is it possible for DynamoDB to support both Key-Value and Document database properties at the same time - nosql

As per DynamoDB's documentation, it supports both key-value and document-oriented properties of NoSQL, even though other NoSQL databases fall under only one type: either Key-Value, Document, Graph or Column-oriented.
Also it says
Amazon DynamoDB is "built on the principles of Dynamo"[3] and is a hosted service within the AWS infrastructure. However, while Dynamo is based on leaderless replication, DynamoDB uses single-leader replication.
And Dynamo is
A set of techniques that together can form a highly available key-value structured storage system[1] or a distributed data store
So if DynamoDB is built on the principles of Dynamo, which is not related to document-oriented storage systems, and since it is mandatory for a developer to create a primary key and the table requires a key for every item, how and in what sense is DynamoDB called a document-oriented database?
Can a DB fall under two types of NoSQL databases in the first place?

First, it is important to realize that "Dynamo" was an earlier NoSQL database designed by Amazon, and its design was made public in 2007 (e.g., see https://www.allthingsdistributed.com/2007/10/amazons_dynamo.html). Other people later took this design and other contemporary designs (like Google's BigTable) and improved on them, resulting in projects such as Cassandra (2008). Amazon's DynamoDB was only released in 2012, based on many ideas from those other systems (and especially Cassandra), and had very little in common with the original "Dynamo". So almost anything you can say about the original "Dynamo" would not be relevant when you discuss the modern DynamoDB.
Now regarding your main question:
A key-value store holds a single value for each key. Arguably, if the value can be an entire document, you can call this database a "document store". In this sense, DynamoDB is a document store. The DynamoDB API lets you conveniently store a JSON document as the value, and also read or write part of this document directly instead of reading or writing the entire document (although you actually pay for reading and writing the entire document).
You should note that DynamoDB, like Cassandra and BigTable (and unlike the original "Dynamo"), actually gives you more than that: each so-called "partition key" can hold not just one value (or document), but a sorted list of such values. I mentioned this interesting feature, which I don't know what to call, in my question
How do you call the data model of DynamoDB and Cassandra?
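To make the "sorted list of values per partition key" idea more concrete, here is a minimal in-memory sketch in Python. It is not the DynamoDB API: the class name ToyTable and its methods put, get and query are made up for illustration, and the sketch only models how items might be grouped by partition key and ordered by a second "sort key".

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Toy model (hypothetical): partition key -> documents sorted by sort key."""

    def __init__(self):
        # Two parallel lists per partition: the sort keys and the documents.
        self._sort_keys = defaultdict(list)
        self._docs = defaultdict(list)

    def put(self, partition_key, sort_key, doc):
        keys = self._sort_keys[partition_key]
        docs = self._docs[partition_key]
        i = bisect.bisect_left(keys, sort_key)
        if i < len(keys) and keys[i] == sort_key:
            docs[i] = doc  # same full key: overwrite the stored document
        else:
            keys.insert(i, sort_key)  # keep the partition ordered
            docs.insert(i, doc)

    def get(self, partition_key, sort_key):
        """Plain key-value lookup on the full (partition, sort) key."""
        keys = self._sort_keys.get(partition_key, [])
        i = bisect.bisect_left(keys, sort_key)
        if i < len(keys) and keys[i] == sort_key:
            return self._docs[partition_key][i]
        return None

    def query(self, partition_key, low, high):
        """Documents of one partition whose sort key lies in [low, high]."""
        keys = self._sort_keys.get(partition_key, [])
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return self._docs.get(partition_key, [])[start:end]


table = ToyTable()
table.put("user#1", "2012-01-02", {"title": "second", "tags": ["a"]})
table.put("user#1", "2012-01-01", {"title": "first"})
table.put("user#2", "2012-01-01", {"title": "other"})

# Range query inside one partition, returned in sort-key order.
titles = [d["title"] for d in table.query("user#1", "2012-01-01", "2012-12-31")]
```

Each stored value is a whole document (the "document store" view), get is the plain key-value view, and query is the extra capability that a pure key-value store does not offer.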

Related

I would like to know if Spatialite is considered as a NoSQL database?

I would like to know if Spatialite is considered as a NoSQL database.
What is NoSQL?
NoSQL encompasses a wide variety of different database technologies
and were developed in response to a rise in the volume of data stored
about users, objects and products, the frequency in which this data is
accessed, and performance and processing needs. Relational databases,
on the other hand, were not designed to cope with the scale and
agility challenges that face modern applications, nor were they built
to take advantage of the cheap storage and processing power available
today.
NoSQL Database Types
Document databases pair each key with a complex data structure known
as a document. Documents can contain many different key-value pairs,
or key-array pairs, or even nested documents.
Graph stores are used to store information about networks, such as
social connections. Graph stores include Neo4J and HyperGraphDB.
Key-value stores are the simplest NoSQL databases. Every single item
in the database is stored as an attribute name (or "key"), together
with its value. Examples of key-value stores are Riak and Voldemort.
Some key-value stores, such as Redis, allow each value to have a type,
such as "integer", which adds functionality.
Wide-column stores such as Cassandra and HBase are optimized for
queries over large datasets, and store columns of data together,
instead of rows.
The Benefits of NoSQL
When compared to relational databases, NoSQL databases are more
scalable and provide superior performance, and their data model
addresses several issues that the relational model is not designed to
address:
Large volumes of structured, semi-structured, and unstructured data
Agile sprints, quick iteration, and frequent code pushes
Object-oriented programming that is easy to use and flexible
Efficient, scale-out architecture instead of expensive, monolithic
architecture
The explanation above is from the MongoDB site.
NoSQL is a very vaguely defined term (I once wrote a blog post about this issue).
But even though the definition of NoSQL is quite fuzzy, you can definitely say that SpatiaLite is not a NoSQL database. It is, in fact, not a database at all. It is just a library for using SQLite (which is a SQL database).
The library includes some utility-functions which make it easier to store and query geospatial data in SQLite. But that data is still queried with normal SQL syntax and stored in a relational manner, so you couldn't even claim that it is a NoSQL abstraction layer on a SQL database.

Why can't an RDBMS store unstructured data, and why can NoSQL databases?

I have read that one of the differences between RDBMS and NoSQL databases is the ability to store unstructured data. I know each NoSQL database has its own architecture and algorithms, but I want to know why an RDBMS can't store unstructured data
and why NoSQL databases can. I would be really thankful if you could show me a simple example so that I can understand how NoSQL databases do that, and what makes an RDBMS unable to store unstructured data.
Relational databases are based on Edgar F. Codd's relational data model which assumes strictly structured data. The whole SQL language is constructed around this model and the databases which implement it are optimized for working that way.
But in the past few years, there were attempts to add features to SQL which allow working with unstructured data, like the SQL/XML extension, which allows storing XML documents in fields of SQL tables and querying their document trees transparently.
Document-oriented databases like MongoDB or CouchDB, on the other hand, were designed from the start to work with unstructured data and their query languages were designed around this concept, so when working with unstructured data they are usually much faster and more convenient to use.
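Since the question asks for a simple example, here is a small sketch of the difference, using only the Python standard library. It is an illustration under assumptions, not how any particular product is implemented: SQLite stands in for a relational database, a plain dict of JSON strings stands in for a document collection, and the names posts, collection and insert are made up.

```python
import json
import sqlite3

# Relational side: the columns are declared once, up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO posts (id, title) VALUES (?, ?)", (1, "Hello"))

try:
    # A row with a field the table does not know about is rejected.
    conn.execute(
        "INSERT INTO posts (id, title, tags) VALUES (?, ?, ?)",
        (2, "Hi", "nosql"),
    )
    relational_rejected = False
except sqlite3.OperationalError:
    relational_rejected = True

# Document side: every document carries its own field names,
# so two documents in the same collection may have different shapes.
collection = {}


def insert(doc):
    collection[doc["_id"]] = json.dumps(doc)


insert({"_id": 1, "title": "Hello"})
insert({"_id": 2, "title": "Hi", "tags": ["nosql", "mongodb"]})

second = json.loads(collection[2])
```

In the relational case you would first have to run an ALTER TABLE (or design a separate tags table) before the second row could be stored; in the document case the structure travels with each document.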
I misread this question at first (there are multiple local definitions of "structured"), so ignore my comments. However, MongoDB does actually store structured data. The Wikipedia definition (which, may I say, seems to differ from others across the internet) is ( http://en.wikipedia.org/wiki/Unstructured_data ):
Unstructured Data (or unstructured information) refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner.
But that is untrue for MongoDB since it actually does have one:
{
_id:{}
}
Since the _id is always required, MongoDB users have more accurately said recently that MongoDB has a "flexible" schema and is not necessarily schemaless, which is why most people say it stores unstructured data.
So yes, it does kind of store unstructured data but not totally...
Simply put, NoSQL data stores are hierarchical, variable-length, highly distributed file-based systems with eventual consistency. The schema is embedded in the data (or in the code; NoSQL stores are not truly schemaless), the columns can vary from one instance to the next even in the same structure, and the size of the columns is not fixed.

What does being schema-less mean for a NoSQL Database?

Schemaless is a term that is currently floating around in the NoSQL world.
What does this mean?
I have a document with 3 properties today and I move to production with it. What happens to my data when I need to add 2 more properties to my document?
Is this purely a migration problem where I need to manage my data migration, or can a NoSQL database create as much friction as an RDBMS, or does it make things easier in some way?
Schema-less is a bit of a misnomer; it's better to think of it as:
SQL = Schema enforced by an RDBMS on Write
NoSQL = Partial Schema enforced by the DBMS on Write, PLUS schema fully enforced by the Application on Read (Externalised schema)
So while a supposedly schema-less NoSQL data store will in theory allow you to store any data you like (typically key-value pairs, in a document) without prior knowledge of the keys or data types, it will be pointless unless you have some mechanism to retrieve and use the data. So essentially the schema is partially moved from the RDBMS into the application code. I say partially because you'll have added indexes to document collections and/or partitioned the data for performance, so the NoSQL DBMS will have a partial schema defined locally, and possibly enforced via unique constraints.
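A minimal sketch of "schema enforced by the application on read", in Python. Everything here is hypothetical: the dict store stands in for a schema-less data store, and REQUIRED_FIELDS and read_post are names invented for this example.

```python
# The store itself accepts anything: no schema is checked on write.
store = {}
store["a"] = {"id": 1, "title": "A valid post"}
store["b"] = {"id": "oops"}  # wrong type, missing title, still accepted

# The schema lives in the application code instead (assumed field set).
REQUIRED_FIELDS = {"id": int, "title": str}


def read_post(raw):
    """Validate a stored document at read time and return it if it conforms."""
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in raw:
            raise ValueError(f"missing field: {name}")
        if not isinstance(raw[name], expected_type):
            raise ValueError(f"bad type for field: {name}")
    return raw


good = read_post(store["a"])

try:
    read_post(store["b"])
    bad_rejected = False
except ValueError:
    bad_rejected = True
```

The write of document "b" succeeds, and the problem only surfaces when the application reads it, which is the trade-off described above.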
As to adding additional attributes to a document/object in the store: depending on how much padding (unused space) there is around the document in its physical data block, adding a few more key-value pairs to the documents may result in the document having to be physically moved to a larger contiguous block of storage, and the associated indexes being rebuilt. If you plan to use the new keys in a frequently used query, then you'll also want to add a suitable new index, which will obviously require some physical storage, take a while to build initially, and possibly lead you to ask the sysadmin to allocate more memory to the DBMS to allow the new indexes to be cached.
A bit late in the day but while searching on the topic again I found this article
http://tech.pro/tutorial/1189/basics-of-ravendb-nosql
Refer to section 3 in the article, I will quote it again for ease.
Adding and changing data models to RavenDB couldn't be simpler. Since
it is a NoSQL database, it can handle additions and deletions to your
models very simply. If a property is added to your class, it will be
set to the default value of that type. If a property is deleted, then
upon deserialization that value will be ignored. No more futzing with
SQL Scripts.
This seems to be the logical answer for RavenDB.
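The behaviour described in that quote (new properties get a default, removed properties are ignored on deserialization) can be sketched in Python. This is not RavenDB code: the Post class, its fields and the deserialize helper are assumptions made up for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Post:
    # The 3 properties the documents were originally stored with.
    title: str = ""
    content: str = ""
    author: str = ""
    # 2 properties added later; old documents do not contain them.
    tags: list = field(default_factory=list)
    views: int = 0


def deserialize(cls, stored):
    """Build an object from a stored document, ignoring unknown keys."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in stored.items() if k in known})


# A document written before the model changed, with one key that the
# current model no longer has.
old_doc = {"title": "Hello", "content": "World", "author": "me", "legacy_flag": True}
post = deserialize(Post, old_doc)
```

Old documents keep working without a migration script; the new properties simply take their defaults until the document is saved again.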

What are the uses of a key-value database?

I've heard about highly scalable key-value databases, like Amazon DynamoDB and Kyoto Cabinet.
But I can't see a use for them in everyday problems.
Ex.: Suppose I have a "post object", which has an id, a title, a content and some tags, and I want to query the database to find all posts with some particular tag. How can I do it with a key-value database?
Key-value stores are typically very fast (many keep their data in memory), and many provide only eventual consistency. Standard database features such as ACID guarantees may be missing, but in exchange traditional limitations, such as struggling under a very high write rate, tend not to apply.
The idea is to think outside of the traditional key-value store as you would do in a hashtable and instead look at this form of storage as a de-normalized form of storage that is designed based on the queries you want to fire at it.
For your specific question, I would use the tags as keys and the post IDs as values, so that each tag maps to the list of posts carrying it. Then bingo...
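A minimal sketch of that idea in Python, with a plain dict standing in for the key-value store. The key prefixes "post:" and "tag:" and the function names are assumptions for this example; a real store would have its own client API.

```python
kv = {}  # stands in for the key-value store


def save_post(post):
    """Store the post itself, then update one index entry per tag."""
    kv[f"post:{post['id']}"] = post
    for tag in post["tags"]:
        key = f"tag:{tag}"
        ids = kv.get(key, [])
        if post["id"] not in ids:
            kv[key] = ids + [post["id"]]


def posts_with_tag(tag):
    """Two kinds of key lookup: first the tag entry, then each post."""
    return [kv[f"post:{pid}"] for pid in kv.get(f"tag:{tag}", [])]


save_post({"id": 1, "title": "Intro", "content": "...", "tags": ["nosql", "intro"]})
save_post({"id": 2, "title": "Dynamo", "content": "...", "tags": ["nosql", "aws"]})

nosql_titles = [p["title"] for p in posts_with_tag("nosql")]
```

The data is denormalized on purpose: the tag entries duplicate information already in the posts, so the application has to keep them in sync on every write (including when a tag is removed, which this sketch does not handle).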

What are the advantages of using a schema-free database like MongoDB compared to a relational database?

I'm used to using relational databases like MySQL or PostgreSQL combined with MVC frameworks such as Symfony, RoR or Django, and I think it works great.
But lately I've heard a lot about MongoDB which is a non-relational database, or, to quote the official definition,
a scalable, high-performance, open
source, schema-free, document-oriented
database.
I'm really interested in staying on the cutting edge, and I want to be aware of all the options I'll have for my next project and choose the best technologies out there.
In which cases is using MongoDB (or similar databases) better than using a "classic" relational database?
And what are the advantages of MongoDB vs MySQL in general?
Or at least, why is it so different?
If you have pointers to documentation and/or examples, it would be of great help too.
Here are some of the advantages of MongoDB for building web applications:
A document-based data model. The basic unit of storage is analogous to JSON, Python dictionaries, Ruby hashes, etc. This is a rich data structure capable of holding arrays and other documents. This means you can often represent in a single entity a construct that would require several tables to properly represent in a relational db. This is especially useful if your data is immutable.
Deep query-ability. MongoDB supports dynamic queries on documents using a document-based query language that's nearly as powerful as SQL.
No schema migrations. Since MongoDB is schema-free, your code defines your schema.
A clear path to horizontal scalability.
You'll need to read more about it and play with it to get a better idea. Here's an online demo:
http://try.mongodb.org/
There are numerous advantages.
For instance, your database schema will be more scalable, you won't have to worry about migrations, the code will be more pleasant to write... For instance, here's the code for one of my models:
class Setting
  include MongoMapper::Document
  key :news_search, String, :required => true
  key :is_availaible_for_iphone, :required => true, :default => false
  belongs_to :movie
end
Adding a key is just adding a line of code!
There are also other advantages that will appear in the long run, like better scalability and speed.
... But keep in mind that a non-relational database is not better than a relational one. If your database has a lot of relations and normalization, it might make little sense to use something like MongoDB. It's all about finding the right tool for the job.
For more things to read, I'd recommend taking a look at "Why I think Mongo is to Databases what Rails was to Frameworks" or this post on the mongodb website. To get excited, and if you speak French, take a look at this article explaining how to set up MongoDB from scratch.
Edit: I almost forgot to tell you about this railscast by Ryan. It's very interesting and makes you want to start right away!
The advantage of schema-free is that you can dump whatever your load is in it, and no one will ever have any ground for complaining about it, or for saying that it was wrong.
It also means that whatever you dump in it, remains totally void of meaning after you have done so.
Some would label that a gross disadvantage, some others won't.
The fact that a relational database has a well-established schema, is a consequence of the fact that it has a well-established set of extensional predicates, which are what allows us to attach meaning to what is recorded in the database, and which are also a necessary prerequisite for us to do so.
Without a well-established schema, there are no extensional predicates, and without extensional predicates, there is no way for the user to make any meaning out of what was stuffed in it.
Here is my experience with Postgres and Mongo after working with both databases in my projects.
Postgres(RDBMS)
Postgres is recommended if your future applications have a complicated schema that needs lots of joins, if all the data is related, or if you have heavy writes. Postgres is open source, fast, ACID compliant, uses less disk space, and performs well all around, including for JSON storage. It also supports fully serializable transactions, with 3 levels of transaction isolation.
The biggest advantage of staying with Postgres is that we have the best of both worlds. We can store data in JSONB with constraints, consistency and speed. On the other hand, we can use all SQL features for other types of data. The underlying engine is very stable and copes well with a good range of data volumes. It also runs on your choice of hardware and operating system. Postgres provides NoSQL capabilities along with full transaction support, storing JSON documents with constraints on the field data.
General Constraints for Postgres
Scaling Postgres horizontally is significantly harder, but doable.
Fast read operations cannot be fully achieved with Postgres.
NoSQL Databases
MongoDB (WiredTiger)
MongoDB may beat Postgres on horizontal scaling. Storing JSON is what Mongo is optimized to do. Mongo stores its data in a binary format called BSON, which is (roughly) a binary representation of a superset of JSON. MongoDB stores objects exactly as they were designed. According to MongoDB, the newer WiredTiger engine gives write-intensive applications up to a 10x increase in write performance (I should try this), with an 80 percent reduction in storage utilization, helping to lower storage costs and achieve greater hardware utilization.
General Constraints of MongoDB
The usage of a schema-less storage engine leads to the problem of implicit schemas. These schemas aren’t defined by our storage engine but instead are defined based on application behavior and expectations.
Many stand-alone NoSQL technologies do not meet ACID standards because they sacrifice critical data protections in favor of high-throughput performance for unstructured applications. It’s not hard to apply ACID to NoSQL databases, but it would make the database slower and less flexible to some extent. “Most of the NoSQL limitations were optimized in the newer versions and releases which have overcome its previous limitations up to a great extent”.
It's all about trade-offs. MongoDB is fast, but historically it was not fully ACID: it had no multi-document transactions until version 4.0. It is better than MySQL in some use cases and worse in others.
The lines below are from MongoDB: The Definitive Guide.
There are several good reasons:
Keeping different kinds of documents in the same collection can be a
nightmare for developers and admins. Developers need to make sure
that each query is only returning documents of a certain kind or
that the application code performing a query can handle documents of
different shapes. If we’re querying for blog posts, it’s a hassle to
weed out documents containing author data.
It is much faster to get a list of collections than to extract a
list of the types in a collection. For example, if we had a type key
in the collection that said whether each document was a “skim,”
“whole,” or “chunky monkey” document, it would be much slower to
find those three values in a single collection than to have three
separate collections and query for their names
Grouping documents of the same kind together in the same collection
allows for data locality. Getting several blog posts from a
collection containing only posts will likely require fewer disk
seeks than getting the same posts from a collection containing
posts and author data.
We begin to impose some structure on our documents when we create
indexes. (This is especially true in the case of unique indexes.)
These indexes are defined per collection. By putting only documents
of a single type into the same collection, we can index our
collections more efficiently
After a question about databases with textual storage, I glanced at MongoDB and similar systems.
If I understood correctly, they are supposed to be easier to use and setup, and much faster. Perhaps also more secure as the lack of SQL prevents SQL injection...
Apparently, MongoDB is used mostly for Web applications.
Basically, and they state that themselves, these databases aren't suited for complex queries, data mining, etc. But they shine at quickly retrieving lots of flat data.
MongoDB supports searching by field and regular expression searches. It also supports user-defined JavaScript functions.
MongoDB can be used as a file system, taking advantage of load balancing and data replication features over multiple machines for storing files.