Clarification of uses of different NoSQL databases - nosql

I understand this may seem like a redundant question, but I hope it will clear up my doubts and those of other users.
Can someone explain the following for MongoDB, Redis, CouchDB and Hadoop:
The best use cases (when I should use each one)
The pros and cons (limitations included)
Their added value (why each is better, ideally with a mathematical/scientific explanation)
Thanks
g

MongoDB and CouchDB are not key-value stores; they are document stores.

The best way to clear up doubts is to read the technical documentation and overviews =)
In short: MongoDB and CouchDB are reasonably fast, reliable document stores that persist data to disk. MongoDB works over a custom TCP/IP protocol; CouchDB uses a REST approach over HTTP.
Redis is a different kind of store: a key/value store that keeps all data in memory, so all reads and writes go directly to memory. This approach has both drawbacks and benefits. It also persists changes to disk.
Hadoop is not just a key/value store. It's an engine for the distributed processing of large data sets. If you are going to build something on the scale of Google, you can use Hadoop.
In my opinion, if you are not going to build something specific (I think you won't), go ahead and use MongoDB.
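To make the key/value versus document-store distinction concrete, here is a minimal in-memory sketch. The `KeyValueStore` and `DocumentStore` classes are hypothetical toys written for illustration; they are not the Redis or MongoDB client APIs, and they ignore persistence, networking and indexing entirely.

```python
# Toy in-memory models (hypothetical; not the Redis or MongoDB APIs).

class KeyValueStore:
    """Opaque values that can only be looked up by their exact key."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Returns None when the key is absent.
        return self._data.get(key)


class DocumentStore:
    """Structured documents that can be queried by their fields."""

    def __init__(self):
        self._docs = []

    def insert(self, doc):
        self._docs.append(dict(doc))

    def find(self, **criteria):
        # Linear scan; a real document store would use indexes here.
        return [
            d for d in self._docs
            if all(d.get(k) == v for k, v in criteria.items())
        ]


kv = KeyValueStore()
# The store does not interpret the value; it is just a string to it.
kv.set("user:1", '{"name": "Ann", "city": "Oslo"}')

docs = DocumentStore()
docs.insert({"name": "Ann", "city": "Oslo"})
docs.insert({"name": "Bob", "city": "Rome"})
oslo_users = docs.find(city="Oslo")
```

The key/value store can answer only "what is stored under `user:1`?", while the document store can answer "which documents have `city` equal to `Oslo`?" because it understands the structure of what it holds.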

Related

Using PostgreSQL or PostgreSQL + MongoDB?

I'm currently planning a social-media application - especially the backend.
Basically I have all the social aspects, for which I want to use SQL (PostgreSQL, I guess), but I also have geolocations organized in lists (so many-to-one), which will probably make up the largest amount of data. I know that PostgreSQL has modules for GIS capabilities, and my initial thought was to use PostgreSQL for everything, for the sake of simplicity and because the performance of geolocation searches should be about the same for both systems, if not in favor of PostgreSQL. I can also use the JSON type in PostgreSQL, so it basically covers the most obvious advantages of MongoDB.
On the other hand, I'm worried about scalability, as the geolocations are going to be the biggest chunk of data and the tables are probably going to have heaps of rows.
So my thought now is to implement geolocations in MongoDB, with its easy scalability and easy-to-use geolocation search, and to embed e.g. comments/likes for a geolocation directly in the document. That would make geolocation reads/searches much easier, but then I would have to combine this data with the social data from SQL, e.g. fetch all users that commented on a geolocation, get their profile info from PostgreSQL and 'manually' combine it. Parts of this could be done on the frontend, though, saving me a lot of resources.
I'm not sure how well this idea would perform or whether I'm really doing myself a favor here.
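The 'manual' combination described above amounts to an application-side join. A minimal sketch of what that involves, assuming a geolocation document with embedded comments and a separate table of user profiles. All names and sample data here are hypothetical, and `fetch_profiles` is only a stand-in for a single SQL query:

```python
# Hypothetical geolocation document, as it might come back from a document store.
geolocation_doc = {
    "_id": "loc-1",
    "name": "Harbour",
    "comments": [
        {"user_id": 1, "text": "Nice view"},
        {"user_id": 2, "text": "Too crowded"},
        {"user_id": 1, "text": "Back again"},
    ],
}

# Hypothetical rows standing in for a "users" table in the SQL database.
profiles_by_id = {
    1: {"id": 1, "name": "Ann"},
    2: {"id": 2, "name": "Bob"},
    3: {"id": 3, "name": "Cid"},
}


def fetch_profiles(user_ids):
    """Stand-in for one SQL query, e.g. SELECT id, name FROM users WHERE id = ANY(...)."""
    return {uid: profiles_by_id[uid] for uid in user_ids if uid in profiles_by_id}


def comments_with_profiles(doc):
    # Collect the distinct user ids first so the SQL side is hit once,
    # not once per comment.
    user_ids = {c["user_id"] for c in doc["comments"]}
    profiles = fetch_profiles(user_ids)
    return [
        {
            "text": c["text"],
            "author": profiles.get(c["user_id"], {}).get("name"),
        }
        for c in doc["comments"]
    ]


combined = comments_with_profiles(geolocation_doc)
```

Every read that needs both kinds of data pays for two round trips plus this merge step, which is the cost to weigh against keeping everything in one database.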
tldr: Use PostgreSQL.
Long answer:
You are trying to pre-optimize for a problem you don't even know you will have. You don't know how many geolocations you will have or how your users will behave, and you probably don't even have any users yet.
I've used MongoDB before and migrated to PostgreSQL. There are many, many features and benefits to using a 'real' database for storing highly structured data. I suggest googling around for 'PostgreSQL vs X' articles, but the overall consensus that I've found is that PGSQL is extremely mature, reliable, performant and supported.
From my personal experience using Mongo then switching to PGSQL, I will never use Mongo again unless PGSQL (or another full-fledged SQL database) is completely falling over and I've spent months fixing it. Even then I'd take a hard look at other NoSQL databases too. PGSQL has so many amazing features and powerful tools that make it a joy to use.
For the seemingly few things you think you need Mongo for, PGSQL can do them, and do them just as well or better. It has native JSON types with indexes, geo support, full text indexing, etc. PGSQL has been around longer and has more support (useful for debugging, performance tuning, etc).
Regardless of which technologies you are thinking of using, you can't make any sort of informed decision if you don't:
Test with large data sets
and
Know your usage patterns and data volumes
So at this point I'd pick the more mature and powerful tool and set up monitoring for it. Watch the usage and performance of PGSQL and see how it holds up. Research best practices for PGSQL. Get to know it, learn it, dive in deep. When it comes to scaling individual services, each one is somewhat unique and will not fit a simple "Should I use X or Y?" question.
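As a rough illustration of "test with large data sets", the sketch below generates synthetic geolocations and times a naive radius search in plain Python. It is only a stand-in: in practice you would load comparable data into the candidate database and time its own geo queries. The point count, the query location and the radius are all arbitrary assumptions.

```python
import math
import random
import time


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, using a mean Earth radius of 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, a)))


def make_points(n, seed=0):
    """Synthetic (lat, lon) pairs; real data will be far less evenly spread."""
    rng = random.Random(seed)
    return [(rng.uniform(-90, 90), rng.uniform(-180, 180)) for _ in range(n)]


def within_radius(points, lat, lon, radius_km):
    # Full scan with no index: the worst case a database index is meant to avoid.
    return [p for p in points if haversine_km(lat, lon, p[0], p[1]) <= radius_km]


points = make_points(50_000)
start = time.perf_counter()
hits = within_radius(points, 48.85, 2.35, 500.0)
elapsed = time.perf_counter() - start
print(f"{len(points)} points scanned, {len(hits)} hits, {elapsed:.3f}s")
```

Repeating the measurement at several data sizes, and with queries shaped like your real usage, says more than any general "X vs Y" comparison.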
Good luck!

Which is better to rebuild stackoverflow, CouchDB or MongoDB?

If we were to build a site similar to Stack Overflow (this site), which would you prefer, CouchDB or MongoDB, and why?
I guess this site needs only predefined queries, and the data just accumulates. If we want master-master replication, CouchDB seems to be the better choice.
But from a performance point of view, MongoDB seems better.
Intuitively, I think CouchDB. It's easier to replicate, and that's important if you want to scale, as you already guessed.

NoSql vs BigTable (comparison of client API's)

This might sound like a dumb question, but I have recently started learning about Big Table.
Would someone please tell me the advantage of using Big Table over NoSQL databases? I see both of them as semi-structured data storage. Some people mention that Big Table has a much simpler interface than a NoSQL database, but I don't quite understand how. Also, is there a way I can try out the APIs of Big Table?
Also, does Big Table have web interfaces? If yes, can I get links to them as well?
BigTable is Google's system for storing very large amounts of data. It doesn't model relations between records, as relations don't benefit the architecture of Google's applications. This philosophy of "unrelated" data instances is the basic idea of NoSQL. So, long story short, BigTable is a NoSQL database, since NoSQL is the general idea (just as RDBMS is the general idea behind MySQL, MSSQL and others).
The BigTable design inspired HBase, which is part of the Hadoop ecosystem and is widely used across many industries. Another related project is Storm, which aims to be faster at processing real-time data.
Regarding NoSQL databases, you should take a good look at HBase and Cassandra, and if you are coming from the RDBMS world, MongoDB would be the best choice for starting to understand how NoSQL is used.
Be sure to take a good look at Google's notes on BigTable.
Cheers!

NoSQL Document DB

I need a document-oriented database for a project I am working on. I basically have two requirements: full ACID support and the ability to have references. Scalability is not a major issue, since the total number of users is at most 300.
I know MongoDB supports references between documents and CouchDB supports ACID but I have not found one that has both.
I am really trying to avoid implementing either (ACID, References) in the application layer. The obvious fallback is RDBMS with some tree structure implementation which I am also trying to avoid.
Any suggestions?
Thanks
You require ACID and full references, and CouchDB is not good for that.
You do not require scalability either. My guess is a database that is well-known wouldn't hurt either.
For those reasons, a relational database sounds appropriate.
Check out RavenDB - it has both ACID and transaction support, and it supports the notion of relations between documents via Includes and Live Projections. Denormalization can probably come in handy, too.
Don't use an RDBMS if it doesn't fit your business logic.
You mentioned the constraints - you mentioned correctly what CouchDB/MongoDB give you.
So based on those facts: use your fallback.

Something about MongoDB

I'm new to MongoDB. Can anyone explain how it could be used efficiently in enterprise applications, so as to give good performance (using joins, indexing etc.)?
And perhaps also point me to any MongoDB production applications on the web.
For a good introduction to MongoDB, check out The Little MongoDB Book. Here's a list of sites currently using MongoDB in production.
You talk about joins and indexes. It seems your head is still in the RDBMS world. NoSQL and Mongo are not just different relational databases; they are a completely different way of managing and thinking about data. You need to think of your data schema in terms of structured objects rather than rows.
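As a small illustration of "structured objects rather than rows", here is a sketch that reshapes two relational-style tables into documents with embedded comments. The sample data and the `to_documents` helper are hypothetical; whether to embed or reference is a modelling decision that depends on how the data is read and updated.

```python
# Hypothetical relational-style rows: posts and comments in separate tables,
# linked by a foreign key (post_id).
posts = [
    {"id": 1, "title": "Hello"},
    {"id": 2, "title": "Second"},
]
comments = [
    {"post_id": 1, "author": "ann", "text": "First!"},
    {"post_id": 1, "author": "bob", "text": "Welcome"},
    {"post_id": 2, "author": "ann", "text": "Nice"},
]


def to_documents(posts, comments):
    """Embed each post's comments inside the post, document-style."""
    by_post = {}
    for c in comments:
        by_post.setdefault(c["post_id"], []).append(
            {"author": c["author"], "text": c["text"]}
        )
    return [
        {"_id": p["id"], "title": p["title"], "comments": by_post.get(p["id"], [])}
        for p in posts
    ]


documents = to_documents(posts, comments)
```

With the embedded shape, reading a post together with its comments is a single lookup with no join; the trade-off is that data shared between documents has to be duplicated or referenced by id.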
Sounds like you need to start from the beginning. MongoDB.org has a lot of the info you're asking for already available. Specifically, read their page on use cases, and the page on production deployments.
A more specific question would receive more comprehensive answers and fewer downvotes.