What is the main difference between key-value and wide column nosql databases? - nosql

What is the main difference between key-value and wide column? Is it that all data from a given column is stored together, which makes reads of specific columns faster?

A wide-column store is like a 2-dimensional key-value store, where the first key is used as a row identifier and the second is used as a column identifier.

With a key-value NoSQL database, every key maps to exactly one value. With a wide-column NoSQL database, every key maps to potentially many columns, which can be selected individually. This can make reads more efficient, since only the columns of interest need to be read. In a key-value store, all the columns would be packed into the same value field, so the whole value would have to be read.
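As a rough illustration of that difference, here is a minimal sketch in plain Python. It uses nested dictionaries as a stand-in for both kinds of store; the record contents and the function names are made up for the example and do not correspond to any particular database's API.

```python
import json

# Key-value store: one key maps to one opaque value (here a JSON string).
kv_store = {
    "user:1": json.dumps(
        {"name": "Ada", "email": "ada@example.com", "bio": "A long biography"}
    ),
}

# Wide-column store: row key -> column name -> value (the "2-dimensional" view).
wide_store = {
    "user:1": {"name": "Ada", "email": "ada@example.com", "bio": "A long biography"},
}


def kv_read_column(store, row_key, column):
    """The whole value has to be fetched and decoded to get at one field."""
    record = json.loads(store[row_key])
    return record[column]


def wide_read_columns(store, row_key, columns):
    """Only the requested columns of the row are touched."""
    row = store[row_key]
    return {name: row[name] for name in columns if name in row}


print(kv_read_column(kv_store, "user:1", "email"))
print(wide_read_columns(wide_store, "user:1", ["email"]))
```

In a real wide-column store the saving comes from not reading or transferring the unrequested columns; the dictionaries here only show the addressing scheme.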

Related

Is it possible to create a custom index on Postgres catalog table pg_largeobject?

I am aware of the fact that large objects are stored in a separate table called pg_largeobject, which stores b-tree indexed rows, and the user table merely stores the Oid of the object stored in pg_largeobject.
Now, creating an index on the column which stores just the Oid(s) is kind of absurd. So, can we create custom indexes on the pg_largeobject table to improve data retrieval performance?
No, you cannot do that, because pg_largeobject is a system catalog. It also wouldn't do you much good, since the objects are stored in chunks there.
If you want to index a large object, you are doing something wrong. The large object would be too large for fitting into an index entry anyway, and who wants to search like WHERE blob = '...'?
I suspect that you have some information stored inside the large object that you would like to index, like the (benighted) idea of keeping your state in a JSON, storing that as a large object and then indexing one of its attributes.
It would be better to store such attributes that you want to search for outside the large object as regular table columns, then the problem would go away.
That said, in PostgreSQL you can define indexes on expressions, so if you use bytea rather than a large object (which is preferable for smaller binary data anyway), you can define an index on an expression that extracts the desired attribute from the binary data. You cannot do that with a large object, because the functions to access large objects are not IMMUTABLE, as the contents of a large object can change, while the oid stays the same.

Differences between NoSQL databases

The term NoSQL covers 4 categories:
Key/value stores
Document oriented
Graph
Column oriented
From my point of view, all of these data models have the same definition. What are the differences?
A key/value database keeps data in a structure like an object in OOP. Access to the data is based on a unique key.
Column oriented is an approach like key/value! But in key/value, you can't access a value by query. I mean, queries are key-based.
Compare 1st & 2nd picture from 2 different categories.
Document oriented stores keep data in collections, something like rows. Access to the data is based on a unique key. The collections store data like key/value. However, you can access data by value.
As you can see, in these 3 categories we define a unique key to specify a unique object, plus some key/value pairs for more information.
A graph db is a little different.
So, what are the differences in definition and in the real world?
Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents.
Graph stores are used to store information about networks of data, such as social connections. Graph stores include Neo4J and Giraph.
Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or 'key'), together with its value. Examples of key-value stores are Riak and Berkeley DB. Some key-value stores, such as Redis, allow each value to have a type, such as 'integer', which adds functionality.
Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.
for more, follow this link on MongoDB
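To make the four categories a little more concrete, here is a small sketch in plain Python showing roughly how the same made-up user data might be shaped in each model. All names and records are assumptions for illustration; real products differ in the details.

```python
# Key-value: the value is opaque to the store; lookups are by key only.
key_value = {"user:42": b"serialized-user-bytes"}

# Document: key -> nested structure; fields inside the document can be queried.
documents = {
    "user:42": {"name": "Ada", "tags": ["admin", "dev"], "address": {"city": "London"}},
    "user:43": {"name": "Bob", "tags": ["dev"], "address": {"city": "Paris"}},
}

# Wide-column: row key -> column -> value; rows may have different columns.
wide_column = {
    "user:42": {"name": "Ada", "city": "London"},
    "user:43": {"name": "Bob"},
}

# Graph: nodes plus typed edges between them.
nodes = {"user:42": {"name": "Ada"}, "user:43": {"name": "Bob"}}
edges = [("user:42", "FOLLOWS", "user:43")]


def find_documents(docs, field, value):
    """Query by value, which a pure key-value store does not offer."""
    return sorted(key for key, doc in docs.items() if doc.get(field) == value)


def neighbours(edge_list, node, relation):
    """Follow edges of one type outward from a node."""
    return [dst for src, rel, dst in edge_list if src == node and rel == relation]


print(find_documents(documents, "name", "Bob"))
print(neighbours(edges, "user:42", "FOLLOWS"))
```

The point is only the shape of the data: key-value hides the structure, document and wide-column expose it to queries, and a graph makes the relationships themselves first-class.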

Custom Attributes - NoSQL Data Store

We want to develop an application which needs to support custom attributes for different entities (like user, project, folder, document, etc.).
I googled, and prima facie it looks like a NoSQL database could suit our requirement. Do you see any limitations? What are the pros/cons of using NoSQL instead of an RDBMS?
There are many NoSQL databases available (http://nosql-database.org/), but we don't have any experience using a NoSQL database, and I can't find any good article that compares them. Any suggestion as to which NoSQL data store we could use to achieve the custom attributes functionality?
One big advantage of a NoSQL database is its free-form style: you never have to specify columns like "user, project, folder" before you insert your real data. Columns can be added at any time.
In an RDBMS, by contrast, the table schema is strictly defined up front, and changing it means a schema change (for example ALTER TABLE) rather than just writing new data.
Another advantage is query performance. It is quite efficient to query all the records of a user, say "Michael", since the data is stored following the principles of Bigtable, as named by Google.
There are two ways to solve your problem: a column database such as Cassandra, or name-value pairs (also called attribute-value pairs) in a relational database.
First, Cassandra is a structured key-value store. A key can contain multiple and variable attributes and values. Values or columns are grouped into column families. The column families are fixed when a Cassandra database is created. A family is analogous to an entity in a logical data model or to a table in relational. Columns can be added to a family at any time. Thereby, different instances of the column family can have different columns, which is what you need. Furthermore, columns are assigned to specified keys, so different keys can have different numbers of columns in any given family.
A name-value pair, also called an attribute-value pair, can be created in logical data modeling and in a relational database. This can be done with three related entities or tables:
The base entity (such as customer), which is analogous to a column family.
A "type" entity, which describes the attribute and its characteristics, such as Net Worth Amount.
A "value" entity, which assigns the attribute to an instance of a base entity and assigns it a value.
The "type" entity is simply a code table identified by a type code and containing a description and other domain characteristics. Domain refers to data type, length, meaning, and units of measure. It describes the attribute out of context (i.e., unassigned). An example could be Net Worth Amount, which is a number 8 digits with 2 decimal places, right justified, and its description is "a value representing the total financial value of a customer including liquid and non-liquid amounts".
The "value" entity is an associative entity or table that is identified by the customer id and the attribute type code, and has a value attribute that assigns the Net Worth Amount type to the Customer and gives it a value, such as "$2,000,000."
However, in a relational database, name-value pairs are somewhat difficult to query in SQL and generally do not perform well. This could be addressed by denormalizing the "type" and "value" entities into one. Instead of having three tables you have two, in a one-to-many relationship. Actually, that is essentially how Cassandra does it. A column family is a fully flattened attribute-value pair.
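The three-table name-value layout described above could be sketched as follows, using Python's built-in sqlite3 module purely for illustration. The table names, column names and sample row are assumptions made up for the example; the join in attributes_for hints at why such a model gets awkward to query as the number of attributes grows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Base entity
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- "Type" entity: a code table describing each attribute out of context
    CREATE TABLE attribute_type (
        type_code   TEXT PRIMARY KEY,
        description TEXT NOT NULL
    );
    -- "Value" entity: assigns an attribute type to a customer with a value
    CREATE TABLE attribute_value (
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        type_code   TEXT NOT NULL REFERENCES attribute_type(type_code),
        value       TEXT NOT NULL,
        PRIMARY KEY (customer_id, type_code)
    );
    """
)
conn.execute("INSERT INTO customer VALUES (1, 'Example Customer')")
conn.execute("INSERT INTO attribute_type VALUES ('NET_WORTH', 'Net Worth Amount')")
conn.execute("INSERT INTO attribute_value VALUES (1, 'NET_WORTH', '2000000.00')")


def attributes_for(connection, customer_id):
    """Return {attribute description: value} for one customer."""
    rows = connection.execute(
        """
        SELECT t.description, v.value
        FROM attribute_value AS v
        JOIN attribute_type AS t ON t.type_code = v.type_code
        WHERE v.customer_id = ?
        ORDER BY t.description
        """,
        (customer_id,),
    ).fetchall()
    return dict(rows)


print(attributes_for(conn, 1))
```

Collapsing attribute_type into attribute_value, as suggested, would remove the join at the cost of repeating the description on every value row.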
I hope this helps. If you are going to use NoSQL, I'd use something like Cassandra. If you use relational, I'd denormalize (i.e., collapse into one) the type and value. The advantage of relational is that you already have it. The disadvantage of Cassandra is that you have to learn it, but it is built to do what you want.
Couchbase would be a great answer for you. If you can encapsulate your model into JSON, then you are already halfway there. You can have any number of properties for your object:
product::001
{
"name": "Hard Drive",
"brand": "Toshiba",
...
...
}
To learn some simple patterns moving from RDBMS to Couchbase, check out their webinars at http://www.couchbase.com/webinars or some simple design patterns at http://CouchbaseModels.com (examples are in Ruby though)
The real advantage of Couchbase is schema flexibility, horizontal scalability on commodity hardware, and speed. After learning the basics, it fits better into Agile processes, with almost no need for migrations. In enterprise organizations it's very effective, since every column modification in an RDBMS typically requires business processes and approvals from the DBA. Couchbase schema flexibility circumvents a lot of these issues.

DynamoDB: Get All Items

I'm trying to retrieve all of the keys from a DynamoDB table in an optimized way. There are millions of keys.
In Cassandra I would probably create a single row with a column for every key, which would eliminate the need to do a full table scan. DynamoDB's 64 KB limit per item would seemingly preclude this option, though.
Is there a quick way for me to get back all of the keys?
Thanks.
I believe the DynamoDB analogue would be to use composite keys: have a primary key of "allmykeys" and a range attribute of the originals being tracked: http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/DataModel.html#DataModelPrimaryKey
I suspect this will scale poorly to billions of entries, but should work adequately for a few million.
Finally, again as with Cassandra, the most straightforward solution is to use map/reduce to get the keys: http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/EMRforDynamoDB.html
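The composite-key idea can be sketched in plain Python as below. This is a toy in-memory model, not the DynamoDB API: the class, its methods and the page size are all made up for illustration. It keeps every tracked key as a range key under one hash key ("allmykeys") and reads them back page by page, which is roughly how a paginated query against a hash-and-range table would be consumed.

```python
import bisect


class CompositeKeyTable:
    """Toy model of a hash-and-range keyed table (hypothetical, in-memory)."""

    def __init__(self):
        self._partitions = {}  # hash key -> sorted list of range keys

    def put(self, hash_key, range_key):
        keys = self._partitions.setdefault(hash_key, [])
        pos = bisect.bisect_left(keys, range_key)
        if pos == len(keys) or keys[pos] != range_key:
            keys.insert(pos, range_key)  # keep sorted, skip duplicates

    def query(self, hash_key, limit, start_after=None):
        """Return one page of range keys plus a cursor, or None when done."""
        keys = self._partitions.get(hash_key, [])
        start = 0 if start_after is None else bisect.bisect_right(keys, start_after)
        page = keys[start:start + limit]
        has_more = start + limit < len(keys)
        return page, (page[-1] if has_more and page else None)


def all_keys(table, hash_key, page_size=2):
    """Walk every page under one hash key and collect the range keys."""
    collected, cursor = [], None
    while True:
        page, cursor = table.query(hash_key, page_size, cursor)
        collected.extend(page)
        if cursor is None:
            return collected


table = CompositeKeyTable()
for original_key in ["c", "a", "b", "a", "d", "e"]:
    table.put("allmykeys", original_key)
print(all_keys(table, "allmykeys"))
```

Because everything lives under a single hash key, all reads and writes for the key list hit one partition, which is one reason this approach is expected to scale poorly to billions of entries.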

Non-Relational Database: Key-Value or flat table

My application needs configurable columns, and the titles of these columns get configured at the beginning. With a relational database I would have created generic columns in the table, like CodeA, CodeB, etc., because that helps with querying on these columns (CodeA = 11) and also helps in displaying the values (if the column stores a code and a value). But now I am using a non-relational database, Datastore (and I am new to it). Should I follow the same old approach, or should I use a collection (key-value pair) type of structure?
There will be a lot of filters on these columns. Please suggest.
What you've just described is one of the classic scenarios for a Key-Value database. The limitation here is that you will not have many of the set-based tools you're used to.
Most of the K-V databases are really good at loading one "record" or small set thereof. However, they don't tend to be any good at loading anything that may require a join. Given that you're using AppEngine, you probably appreciate this limitation. But it's worth stating.
As an important note, not all K-V databases will allow you to "select by any column". Many K-V stores actually only allow selection by a primary key. If you take a look at MongoDB, you'll find that you can query on any column, which sounds like a necessary feature for you.
I would suggest using key/value pairs where keys will act as your column names and value will be their data.
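A minimal sketch of that suggestion in plain Python follows. The column keys, titles and records are hypothetical, and the filter is a simple in-memory scan; a real datastore would need indexes on the filtered keys to do this efficiently.

```python
# Column titles are configured once, up front (hypothetical names).
column_titles = {"code_a": "Region code", "code_b": "Product code"}

# Each record keeps its configurable columns as key/value pairs.
records = [
    {"id": 1, "attrs": {"code_a": 11, "code_b": 7}},
    {"id": 2, "attrs": {"code_a": 12}},
    {"id": 3, "attrs": {"code_a": 11, "code_b": 9}},
]


def filter_records(rows, **conditions):
    """Keep rows whose attributes match every key=value condition."""
    return [
        row
        for row in rows
        if all(row["attrs"].get(key) == value for key, value in conditions.items())
    ]


def display(row, titles):
    """Map stored keys to their configured titles for display."""
    return {titles[key]: value for key, value in row["attrs"].items() if key in titles}


matches = filter_records(records, code_a=11)
print([row["id"] for row in matches])
print(display(records[1], column_titles))
```

Keeping the titles in a separate mapping means a title can be renamed without touching the stored records.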