Differences between NoSQL databases

The term NoSQL covers 4 categories:
Key/value stores
Document oriented
Graph
Column oriented
From my point of view, all these data models have the same definition. What are the differences?
A key/value database maintains data in a structure like an object in OOP. Access to data is based on a unique key.
Column oriented is an approach like key/value! But in key/value you can't access a value by query; I mean, queries are key-based.
Compare the 1st and 2nd pictures, from 2 different categories.
Document oriented stores data in collections, something like rows. Access to data is based on a unique key, and the collections store data like key/value. However, you can also access data by value.
As you can see, in these 3 categories we define a unique key to identify a unique object, plus some key/value pairs for more information.
Graph db is a little different.
So, what are the differences, both in definition and in the real world?

Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents.
Graph stores are used to store information about networks of data, such as social connections. Graph stores include Neo4j and Giraph.
Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or 'key'), together with its value. Examples of key-value stores are Riak and Berkeley DB. Some key-value stores, such as Redis, allow each value to have a type, such as 'integer', which adds functionality.
Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.
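To make the four categories concrete, here is a minimal sketch that models one small dataset in each shape using plain Python data structures. It only illustrates what each data model looks like to the application; it is not how Riak, MongoDB, Cassandra or Neo4j store data internally, and all keys and names are hypothetical.

```python
# Key-value: the value is opaque to the store; you can only look it up by key.
kv_store = {
    "user:1": '{"name": "Alice", "city": "Paris"}',
    "user:2": '{"name": "Bob"}',
}

# Document: values are structured (nested maps, arrays) and queryable by content.
documents = [
    {"_id": 1, "name": "Alice", "address": {"city": "Paris"}, "tags": ["admin"]},
    {"_id": 2, "name": "Bob", "tags": []},
]

# Wide-column: row key -> column family -> columns; rows may have different columns.
wide_columns = {
    "user1": {"profile": {"name": "Alice", "city": "Paris"}},
    "user2": {"profile": {"name": "Bob"}},
}

# Graph: nodes plus explicit, first-class relationships between them.
nodes = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
edges = [(1, "FOLLOWS", 2)]

_MISSING = object()  # marks "field not present" so None values still compare correctly


def find_documents(docs, path, value):
    """Return documents whose nested field at `path` equals `value`."""
    matches = []
    for doc in docs:
        current = doc
        for key in path:
            current = current.get(key, _MISSING) if isinstance(current, dict) else _MISSING
        if current == value:
            matches.append(doc)
    return matches


def followers_of(node_id):
    """Follow stored relationships directly instead of joining tables."""
    return [nodes[src]["name"] for src, rel, dst in edges
            if rel == "FOLLOWS" and dst == node_id]


print(kv_store["user:1"])                                        # lookup by key only
print(find_documents(documents, ("address", "city"), "Paris"))   # query by value
print(wide_columns["user2"]["profile"])                          # only this row's columns
print(followers_of(2))                                           # traversal
```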
For more, follow this link on MongoDB.

Related

Document store databases and connected domains

Consider this picture:
The book says document store databases struggle with highly connected domains because "relationships between aggregates aren’t first-class citizens in the data model, most aggregate stores furnish only the insides of aggregates with structure, in the form of nested maps."
And besides: "Instead, the application that uses the database must build relationships from these flat, disconnected data structures."
I'm sorry, I don't understand what this means. Why do document store databases struggle with highly connected domains?
Because document stores do not support joins. Each time you need to get more data it is a separate query. Instead, document stores support the idea of nesting data within documents.

How to store an array of data without knowing the fields?

I am currently developing a Facebook application that calls the API, retrieves the user's info and displays it on the front end.
However, there are so many fields in the data. Could I actually store them in an array with MongoDB without knowing the fields?
Data in MongoDB has a flexible schema. Unlike SQL databases, where you must determine and declare a table’s schema before inserting data, MongoDB’s collections do not enforce document structure.
Read this article: Data Modeling Introduction
Instead of inserting the data into an array, you can insert each item as a new key-value pair in the document.
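A minimal sketch of that idea, assuming the API response has already been parsed into a Python dict (the field names below are made up). Every field is copied into the document as its own key-value pair, so nothing has to be declared in advance. Key sanitizing is a defensive assumption, because older MongoDB versions reject field names that contain `.` or start with `$`. With the official `pymongo` driver, the resulting dict could then be passed to `collection.insert_one(doc)`.

```python
from datetime import datetime, timezone


def sanitize_keys(value):
    """Recursively rewrite keys that older MongoDB versions reject ('.' inside, '$' prefix)."""
    if isinstance(value, dict):
        clean = {}
        for key, inner in value.items():
            key = key.replace(".", "_")
            if key.startswith("$"):
                key = "_" + key[1:]
            clean[key] = sanitize_keys(inner)
        return clean
    if isinstance(value, list):
        return [sanitize_keys(item) for item in value]
    return value


def build_user_document(profile, fetched_at=None):
    """Store every field of an unknown profile as its own key-value pair."""
    doc = sanitize_keys(profile)
    doc["fetched_at"] = fetched_at or datetime.now(timezone.utc)
    return doc


# Hypothetical API response; the code never needs to know these field names.
profile = {"id": "1234", "name": "Franck", "likes": [{"page.name": "MongoDB"}]}
doc = build_user_document(profile)
print(doc["likes"])  # [{'page_name': 'MongoDB'}]
```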

Dynamic Data inside MongoDB

I'm building dynamic data storage, which was modeled in a relational database as:
form -> field
form -> form_data
field -> field_data
form_data -> field_data
and field_data contains field_id, form_data_id, and value.
For scalability and performance, I'm planning to move form_data and field_data to MongoDB. My problem now is how to design the MongoDB collections: either use one collection for all form_data and move field_data into a map inside it (the key is the field_id and the value is the value of that field_data), or make a collection for each form and store the data directly in form_data without a map, since all data in that case will be consistent.
In document-oriented databases like MongoDB you should always favor aggregation over references, because they don't support JOIN operations on the database to link multiple documents.
That means that an "A has many B" relationship between two entities should not be modeled as two tables A and B, but rather as one collection of A where each A has an embedded array of B objects.
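As an illustration, here is the same hypothetical post/comment data in both shapes: the relational shape with two "tables" linked by a foreign key, and the document shape where each post embeds an array of its comments. The `embed_children` helper is made up for this sketch; after the conversion, reading a post with all its comments is a single lookup instead of a join.

```python
# Relational shape: A (posts) and B (comments) linked by post_id.
posts_table = [{"id": 1, "title": "Hello"}]
comments_table = [
    {"id": 10, "post_id": 1, "text": "First!"},
    {"id": 11, "post_id": 1, "text": "Nice post"},
]


def embed_children(parents, children, fk, field):
    """Turn parent/child rows into parent documents that embed their children."""
    by_parent = {}
    for child in children:
        # The foreign key becomes redundant once the child is nested in its parent.
        child_doc = {k: v for k, v in child.items() if k != fk}
        by_parent.setdefault(child[fk], []).append(child_doc)
    return [dict(parent, **{field: by_parent.get(parent["id"], [])}) for parent in parents]


post_documents = embed_children(posts_table, comments_table, "post_id", "comments")
print(post_documents[0])
# {'id': 1, 'title': 'Hello', 'comments': [{'id': 10, 'text': 'First!'}, {'id': 11, 'text': 'Nice post'}]}
```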
In the context of MongoDB, there is however an additional limitation: MongoDB doesn't like objects that grow. When an object accumulates more and more data over its lifetime, the object will grow. That means that the disk space MongoDB allocated for it will run out again and again, which requires space reallocation. This costs performance and fragments the database. Also, MongoDB has an artificial size limit for documents, mainly to discourage developers from designing growing objects.
Therefore:
When the data exists at creation, embed it directly.
When more and more data is added after creation, put it into a different collection.
When a form has X fields, the number of fields will likely not change much over its lifetime. So you should embed the fields and their descriptions directly into the form object.
But the number of answers entered into these forms will grow over time, which means that these should be treated as separate objects in a separate collection.
So I would recommend you to have two collections, forms and form_data.
Each document in forms embeds a sub-object of fields with the static field properties.
Each document in form_data has a field with the _id of the corresponding form and embeds a sub-object of field_data which uses the same keys as the fields sub-object of forms and stores the entries the user made in that form.
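A minimal sketch of the two document shapes described above, written as Python dicts. The field keys (`f_name`, `f_age`) and the `validate_entry` helper are hypothetical; the point is that `field_data` in `form_data` reuses the keys of `fields` in `forms`, and that checking an entry against its form is up to the application:

```python
forms = {
    "form1": {
        "_id": "form1",
        "title": "Signup",
        # Static field definitions are embedded because they rarely change.
        "fields": {
            "f_name": {"label": "Name", "type": "string", "required": True},
            "f_age": {"label": "Age", "type": "int", "required": False},
        },
    }
}

form_data = [
    # Each submission is its own document and references its form by _id.
    {"_id": "entry1", "form_id": "form1", "field_data": {"f_name": "Ann", "f_age": 31}},
]

PY_TYPES = {"string": str, "int": int}


def validate_entry(entry):
    """Return a list of problems; MongoDB itself would not check any of this."""
    form = forms.get(entry["form_id"])
    if form is None:
        return ["unknown form"]
    problems = []
    for key, spec in form["fields"].items():
        if key not in entry["field_data"]:
            if spec["required"]:
                problems.append(f"missing {key}")
        elif not isinstance(entry["field_data"][key], PY_TYPES[spec["type"]]):
            problems.append(f"wrong type for {key}")
    for key in entry["field_data"]:
        if key not in form["fields"]:
            problems.append(f"unknown field {key}")
    return problems


print(validate_entry(form_data[0]))  # []
```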
When your use case requires frequent access to aggregated data (like when you want to publish up-to-date statistics on a public website), you could also store this information in the fields of forms to avoid an expensive aggregation query over many form_data documents. In general, MongoDB recommends designing your schema around your performance requirements rather than around the semantics of your data.
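One way to keep such statistics in `forms` is to update counters on the form each time an entry is stored, instead of aggregating over all `form_data` documents on every read. A minimal in-memory sketch; the `stats` sub-document and the `save_entry` / `recompute_stats` helpers are hypothetical, and in MongoDB the counter update would typically be an `$inc` update on the form document:

```python
from collections import Counter

forms = {"form1": {"_id": "form1", "stats": {"entries": 0, "answers": {}}}}
form_data = []


def save_entry(form_id, field_data):
    """Store the entry and update the precomputed statistics on its form."""
    form_data.append({"form_id": form_id, "field_data": field_data})
    stats = forms[form_id]["stats"]
    stats["entries"] += 1  # comparable to {"$inc": {"stats.entries": 1}}
    for key, value in field_data.items():
        per_field = stats["answers"].setdefault(key, {})
        per_field[str(value)] = per_field.get(str(value), 0) + 1


def recompute_stats(form_id):
    """The expensive alternative: aggregate over every stored entry on each read."""
    entries = [d for d in form_data if d["form_id"] == form_id]
    answers = {}
    for entry in entries:
        for key, value in entry["field_data"].items():
            answers.setdefault(key, Counter())[str(value)] += 1
    return {"entries": len(entries), "answers": {k: dict(c) for k, c in answers.items()}}


save_entry("form1", {"color": "red"})
save_entry("form1", {"color": "blue"})
save_entry("form1", {"color": "red"})
print(forms["form1"]["stats"])  # {'entries': 3, 'answers': {'color': {'red': 2, 'blue': 1}}}
```

The trade-off is extra work on every write in exchange for cheap reads, which pays off when statistics are read far more often than entries are submitted.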
Regarding your remark "all data in this case will be consistent": keep in mind that MongoDB does not enforce referential integrity. When an application deletes or changes a document, it's the responsibility of the application to fix any outdated references to it in other documents.
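Because nothing in the database stops you from deleting a form that still has entries, the application has to deal with the dependent documents itself. A minimal in-memory sketch of an application-side "cascade delete"; the helper names and the choice to delete (rather than, say, archive) the entries are assumptions. With `pymongo`, the equivalent would be a `delete_many({"form_id": ...})` followed by a `delete_one({"_id": ...})`; these are separate operations, so an interruption between them is still possible unless you use a multi-document transaction where your MongoDB version supports it.

```python
forms = {"form1": {"_id": "form1"}, "form2": {"_id": "form2"}}
form_data = [
    {"_id": "e1", "form_id": "form1"},
    {"_id": "e2", "form_id": "form2"},
    {"_id": "e3", "form_id": "form1"},
]


def delete_form(form_id):
    """Delete a form and the entries that reference it, children first.

    Removing the dependents before the parent means an interruption leaves a
    form with fewer entries, rather than entries pointing at a missing form.
    """
    global form_data
    form_data = [entry for entry in form_data if entry["form_id"] != form_id]
    forms.pop(form_id, None)


def find_orphans():
    """Entries whose form no longer exists (what the application must prevent)."""
    return [entry["_id"] for entry in form_data if entry["form_id"] not in forms]


delete_form("form1")
print([entry["_id"] for entry in form_data], find_orphans())  # ['e2'] []
```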

Custom Attributes - NoSQL Data Store

We want to develop an application that needs to support custom attributes on different entities (like user, project, folder, document, etc.).
I googled, and prima facie it looks like a NoSQL database could suit our requirement. Do you see any limitations? What are the pros/cons of using NoSQL instead of an RDBMS?
There are many NoSQL databases available (http://nosql-database.org/), but we don't have any experience using one, and we couldn't find a good article comparing them. Any suggestion as to which NoSQL data store we could use to achieve custom attribute functionality?
One big advantage of NoSQL databases is their free style: you never have to specify columns like "user, project, folder" before you insert your real data. Columns can be added at any time.
In an RDBMS, by contrast, the table schema is strictly defined, and adding a column means altering the table definition (e.g. with ALTER TABLE) rather than simply writing new data.
Another advantage is query performance. Querying all the records of a user, say "Michael", is quite efficient, since the data is stored following the principles of Google's Bigtable.
There are two ways to solve your question: a column database such as Cassandra; or a name-value pair (also called attribute-value pair) in relational.
First, Cassandra is a structured key-value store. A key can contain multiple and variable attributes and values. Values or columns are grouped into column families. The column families are fixed when a Cassandra database is created. A family is analogous to an entity in a logical data model or to a table in relational. Columns can be added to a family at any time. Thereby, different instances of the column family can have different columns, which is what you need. Furthermore, columns are assigned to specified keys, so different keys can have different numbers of columns in any given family.
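A minimal in-memory sketch of that structure. It models the shape only, not Cassandra's storage, consistency or query API: a column family maps each row key to its own, independently sized set of columns, so a custom attribute is simply an extra column on the rows that need it. The `users` family and the column names are hypothetical.

```python
from collections import defaultdict

# Column family "users": row key -> {column name: value}. Rows need not share columns.
users = defaultdict(dict)


def set_column(row_key, column, value):
    """Adding a column needs no schema change; it only affects this row."""
    users[row_key][column] = value


set_column("michael", "email", "michael@example.com")
set_column("michael", "shoe_size", 44)  # a custom attribute for one user only
set_column("anna", "email", "anna@example.com")

print(sorted(users["michael"]))  # ['email', 'shoe_size']
print(sorted(users["anna"]))     # ['email']
```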
A name value pair, also called an attribute value pair, can be created in logical data modeling and in relational. This can be done with three related entities or tables:
The base entity (such as customer), which is analogous to a column family.
A "type" entity, which describes the attribute and its characteristics, such as Net Worth Amount.
A "value" entity, which assigns the attribute to an instance of a base entity and assigns it a value.
The "type" entity is simply a code table identified by a type code and containing a description and other domain characteristics. Domain refers to data type, length, meaning, and units of measure. It describes the attribute out of context (i.e., unassigned). An example could be Net Worth Amount, which is an 8-digit number with 2 decimal places, right justified, and its description is "a value representing the total financial value of a customer including liquid and non-liquid amounts".
The "value" entity is an associative entity or table that is identified by the customer id and the attribute type code, and has a value attribute that assigns the Net Worth Amount type to the customer and gives it a value, such as "$2,000,000".
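The three-entity design can be sketched with Python's built-in `sqlite3` module. Table names, column names, the type code `NET_WORTH_AMT` and the customer "Acme Corp" are illustrative assumptions. Note that the value column has to be generic (TEXT here), and that reading a customer's attributes already needs a three-way join, which is part of why the pattern is awkward to query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- "type" entity: a code table describing each attribute out of context.
CREATE TABLE attribute_type (
    type_code   TEXT PRIMARY KEY,
    data_type   TEXT NOT NULL,
    description TEXT NOT NULL
);
-- "value" entity: associative table identified by customer id + type code.
CREATE TABLE attribute_value (
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    type_code   TEXT    NOT NULL REFERENCES attribute_type(type_code),
    value       TEXT    NOT NULL,
    PRIMARY KEY (customer_id, type_code)
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme Corp')")
conn.execute(
    "INSERT INTO attribute_type VALUES (?, ?, ?)",
    ("NET_WORTH_AMT", "DECIMAL",
     "a value representing the total financial value of a customer "
     "including liquid and non-liquid amounts"),
)
conn.execute("INSERT INTO attribute_value VALUES (1, 'NET_WORTH_AMT', '2000000.00')")

# Reading one customer's attributes needs a three-way join.
rows = conn.execute("""
    SELECT c.name, t.description, v.value
    FROM customer c
    JOIN attribute_value v ON v.customer_id = c.customer_id
    JOIN attribute_type  t ON t.type_code   = v.type_code
    WHERE c.customer_id = 1
""").fetchall()
print(rows[0][0], rows[0][2])  # Acme Corp 2000000.00
```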
However, in relational name-value pairs are somewhat difficult to query in SQL and generally do not perform well. This could be addressed by denormalizing the "type" and "value" entities into one. Instead of having three tables you have two -- one-to-many. Actually, that is essentially how Cassandra does it. A column family is a fully flattened attribute-value pair.
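For comparison, a sketch of the denormalized variant, again with `sqlite3` and illustrative names: the attribute's name is stored directly on each value row, leaving a one-to-many pair of tables. Reading a customer's attributes becomes a single-table query, at the cost of repeating the attribute name on every row and losing the central code table that described each attribute's domain:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- "type" and "value" collapsed into one table: one customer, many attributes.
CREATE TABLE customer_attribute (
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    attr_name   TEXT    NOT NULL,
    attr_value  TEXT    NOT NULL,
    PRIMARY KEY (customer_id, attr_name)
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme Corp')")
conn.executemany(
    "INSERT INTO customer_attribute VALUES (?, ?, ?)",
    [(1, "net_worth_amount", "2000000.00"), (1, "industry", "Manufacturing")],
)

# One customer's attributes as a name -> value dict, without any join.
attrs = dict(conn.execute(
    "SELECT attr_name, attr_value FROM customer_attribute WHERE customer_id = ?",
    (1,),
).fetchall())
print(attrs)
```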
I hope this helps. If you are going to use NoSQL, I'd use something like Cassandra. If you use relational, I'd denormalize (i.e., collapse into one) the type and value. The advantage of relational is that you already have it. The disadvantage of Cassandra is that you have to learn it, but it is built to do what you want.
Couchbase would be a great answer for you: if you can encapsulate your model in JSON, you are already halfway there. You can have any number of properties for your object:
product::001
{
"name": "Hard Drive",
"brand": "Toshiba",
...
...
}
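As a rough illustration of "any number of properties", here is a sketch that keeps JSON documents under `type::id` keys, following the `product::001` key in the example. It does not use the Couchbase SDK: a plain dict stands in for a bucket, the helper names are hypothetical, and the extra properties (`capacity_gb`, the second product) are invented for the example.

```python
import json

bucket = {}  # stand-in for a key-value bucket: key -> JSON document string


def save_document(key, document):
    bucket[key] = json.dumps(document)


def load_document(key):
    return json.loads(bucket[key])


save_document("product::001", {"name": "Hard Drive", "brand": "Toshiba", "capacity_gb": 1000})
save_document("product::002", {"name": "Cable", "brand": "Acme", "length_m": 2, "color": "black"})

# Each document carries its own set of properties; no migration was needed.
print(sorted(load_document("product::002")))  # ['brand', 'color', 'length_m', 'name']
```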
To learn some simple patterns moving from RDBMS to Couchbase, check out their webinars at http://www.couchbase.com/webinars or some simple design patterns at http://CouchbaseModels.com (examples are in Ruby though)
The real advantage of Couchbase is schema flexibility, horizontal scalability on commodity hardware, and speed. After learning the basics, it fits better into Agile processes, with almost no need for migrations. In enterprise organizations it's very effective, since there every column modification in an RDBMS requires business processes and approvals from the DBA. Couchbase's schema flexibility circumvents a lot of these issues.

MongoDB: multiple specific collections or one "store-it-all" collection for performance / indexing

I'm logging different actions users make on our website. Each action can be of a different type: a comment, a search query, a page view, a vote, etc. Each of these types has its own schema plus some common info. For instance:
comment : {"_id":(mongoId), "type":"comment", "date":4/7/2012, "user":"Franck", "text":"This is a sample comment"}
search : {"_id":(mongoId), "type":"search", "date":4/6/2012, "user":"Franck", "query":"mongodb"} etc...
Basically, in OOP or RDBMS, I would design an Action class / table and a set of inherited classes / tables (Comment, Search, Vote).
As MongoDB is schemaless, I'm inclined to set up a single collection ("Actions") where I would store all these objects, instead of multiple collections (an Actions collection plus a Comments collection with a link key to its parent Action, etc.).
My question is: what about performance / response time if I search by specific columns?
As I understand indexing best practices, if I want "every user searching for mongodb", I would index the columns "type" + "query". But that index will not concern the whole data set, only documents of type "search".
Will the MongoDB engine scan the whole collection, or only focus on documents having this specific schema?
If you create a sparse index, MongoDB will skip any documents that don't have the indexed key. However, sparse indexes have (at the time of writing) the specific limitation that they can only index one field.
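To illustrate what a sparse index buys you, here is a small in-memory sketch: the index only has entries for documents that contain the indexed field, so a lookup on `query` never touches comments or votes. It models the idea only; MongoDB's real indexes are B-trees managed by the server, and with `pymongo` such an index would be requested with something like `collection.create_index("query", sparse=True)`. The sample actions are hypothetical.

```python
from collections import defaultdict

actions = [
    {"_id": 1, "type": "comment", "user": "Franck", "text": "This is a sample comment"},
    {"_id": 2, "type": "search", "user": "Franck", "query": "mongodb"},
    {"_id": 3, "type": "search", "user": "Anna", "query": "mongodb"},
    {"_id": 4, "type": "vote", "user": "Anna"},
]


def build_sparse_index(docs, field):
    """Map field value -> positions of matching docs, skipping docs without the field."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        if field in doc:
            index[doc[field]].append(position)
    return index


query_index = build_sparse_index(actions, "query")
print(sum(len(p) for p in query_index.values()))            # 2: only searches are indexed
print([actions[p]["user"] for p in query_index["mongodb"]])  # ['Franck', 'Anna']
```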
However, if you are only going to query using common fields there's absolutely no reason not to use a single collection.
I.e. if an index on user+type (or date+user+type) will satisfy all your querying needs - there's no reason to create multiple collections
Tip: use date objects for dates, and use ObjectIds rather than names where appropriate.
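Following the tip, a sketch of the two sample actions from the question with real `datetime` values instead of strings like `4/7/2012`, which are assumed here to be month/day/year. As strings, `"4/10/2012" < "4/7/2012"` would sort wrongly; as dates, range filters and sorting behave correctly (and `pymongo` stores `datetime` objects as BSON dates). The `user_id` values and the `actions_between` helper are hypothetical.

```python
from datetime import datetime

# Dates assumed to be month/day/year, as in the question's 4/7/2012 and 4/6/2012.
actions = [
    {"type": "comment", "date": datetime(2012, 4, 7), "user_id": "u1",
     "text": "This is a sample comment"},
    {"type": "search", "date": datetime(2012, 4, 6), "user_id": "u1", "query": "mongodb"},
]


def actions_between(docs, start, end):
    """Half-open date range [start, end), newest first, like a sorted range query."""
    hits = [d for d in docs if start <= d["date"] < end]
    return sorted(hits, key=lambda d: d["date"], reverse=True)


april_6 = actions_between(actions, datetime(2012, 4, 6), datetime(2012, 4, 7))
print([d["type"] for d in april_6])  # ['search']
```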
Here is some useful information from MongoDB's Best Practices
Store all data for a record in a single document.
MongoDB provides atomic operations at the document level. When data for a record is stored in a single document the entire record can be retrieved in a single seek operation, which is very efficient. In some cases it may not be practical to store all data in a single document, or it may negatively impact other operations. Make the trade-offs that are best for your application.
Avoid Large Documents.
The maximum size for documents in MongoDB is 16MB. In practice most documents are a few kilobytes or less. Consider documents more like rows in a table than the tables themselves. Rather than maintaining lists of records in a single document, instead make each record a document. For large media documents, such as video, consider using GridFS, a convention implemented by all the drivers that stores the binary data across many smaller documents.