MongoDB + Postgres (or do I need a graph db?)

I'm planning to build a wiki/resource app which, by itself, would be a sensible fit for Mongo. However, the primary purpose of the app is to have associative tables showing connections between individual content items. As a heavily simplified example, Odin, Zeus and Jupiter would form a row within an "Allfather" association. The problem is that these tables could grow indefinitely, and it seems like developing this type of network in Mongo would be a rather complex and frustrating experience.
I've been thinking about using Mongo for the pages and just maintaining a small Postgres database for these associations, but something intuitively feels wrong about that. However, I'm an experienced frontend dev who's just starting to dabble in backend/database work, so I'm not willing to trust my intuitions on databases yet haha.
Is postgres + mongo a good solution for the above problem, or is this where something like a graph database (which I only learned about yesterday) would come into play?
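For readers weighing the relational option, the "Allfather" example can be sketched as a plain junction table. This is only an illustration: it uses Python's built-in sqlite3 as a stand-in for Postgres, and the table and column names (item, association, association_member) are invented here, not taken from the question.

```python
import sqlite3

# In-memory SQLite database standing in for a small Postgres instance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE association (id INTEGER PRIMARY KEY, label TEXT UNIQUE NOT NULL);
    -- Junction table: one row per (association, item) membership.
    CREATE TABLE association_member (
        association_id INTEGER NOT NULL REFERENCES association(id),
        item_id INTEGER NOT NULL REFERENCES item(id),
        PRIMARY KEY (association_id, item_id)
    );
""")
conn.executemany(
    "INSERT INTO item (name) VALUES (?)",
    [("Odin",), ("Zeus",), ("Jupiter",), ("Thor",)],
)
conn.execute("INSERT INTO association (label) VALUES ('Allfather')")
conn.execute("""
    INSERT INTO association_member (association_id, item_id)
    SELECT a.id, i.id FROM association a, item i
    WHERE a.label = 'Allfather' AND i.name IN ('Odin', 'Zeus', 'Jupiter')
""")

def members(label):
    """Return the names of all items linked to the given association."""
    rows = conn.execute("""
        SELECT i.name FROM item i
        JOIN association_member m ON m.item_id = i.id
        JOIN association a ON a.id = m.association_id
        WHERE a.label = ?
        ORDER BY i.name
    """, (label,))
    return [name for (name,) in rows]

print(members("Allfather"))
```

The junction table grows by one row per membership, so an association that keeps growing is not a problem in itself; what gets awkward in SQL is following chains of associations several hops deep.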

After spending the last several hours researching further, it does appear that a graph database is the right solution to managing the "association" feature that I'm looking to develop here because the actual relationships will be rather multidimensional in nature.
Furthermore, I've decided to go with ArangoDB, as it combines a key-value store (e.g. Redis or Postgres' hstore, IIRC), a document store (e.g. Mongo's documents or Postgres' JSONB), and graph database functionality. Arango can do joins between documents and, even better, it has a single, unified query language that works across all three models. It also has a rather robust tooling environment around it already that seems pretty promising.
I found this YouTube video rather enlightening as well, if anyone wants a nice introduction to why you might want to use a "multi-model database" like ArangoDB.

Related

Replace PostgreSQL with MongoDB?

My client has an existing PostgreSQL database with around 100 tables and most every table has one or more relationships to other tables. He's got around a thousand customers who use an app that hits that database.
Recently he hired a new frontend web developer, and that person is trying to tell him that we should throw out the PostgreSQL database and replace it with a MongoDB solution. That seems odd to me, but I don't have experience with MongoDB.
Are there any clear reasons why he should, or should not, make the change? Obviously I'm arguing against it and the other guy for it, but I would like to remove the "I like this one better" from the argument and really hear from the community on their experience with such things.

1) Performance
Over the last few years there have been several benchmarks comparing Postgres and Mongo.
Here you can find the most recent performance benchmark (Yahoo): https://www.slideshare.net/profyclub_ru/postgres-vs-mongo-postgres-professional (start with slide #58, where an overview of past benchmarks is given).
Note that MongoDB traditionally published benchmarks in which write-ahead logging was not turned on, or fsync was even turned off, so those benchmarks were unfair: in such configurations the database doesn't wait for the filesystem, so TPS is high but the probability of losing data is also very high.
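To make the fsync point concrete, here is a toy sketch of the trade-off in Python. It is not how either database implements its log; the function names and the log file are invented for illustration. The durable variant returns only after the operating system has been asked to flush the record to the device, while the fast variant returns as soon as the write has been handed over.

```python
import os
import tempfile

def append_durably(path, record):
    """Append one record and return only after it has been flushed to disk."""
    with open(path, "ab") as log:
        log.write(record + b"\n")
        log.flush()             # push Python's buffer to the OS
        os.fsync(log.fileno())  # ask the OS to flush its cache to the device

def append_fast(path, record):
    """Append one record without waiting for the disk; a crash may lose it."""
    with open(path, "ab") as log:
        log.write(record + b"\n")

# Hypothetical log file in a temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
append_durably(log_path, b"tx1 commit")
append_fast(log_path, b"tx2 commit")

with open(log_path, "rb") as log:
    print(log.read().splitlines())
```

Both calls leave the same bytes in the file when nothing goes wrong; the difference only shows up after a power loss, which is exactly why a benchmark that skips the fsync looks faster.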
2) Flexibility – JSON
Postgres has had unstructured and semi-structured data types since 2003 (hstore, XML, array data types). It now has very strong JSON support with indexing (the jsonb data type): you can create partial indexes, functional indexes, index only part of a JSON document, or index whole documents in different ways (you can tweak an index to tune its size and speed).
More interestingly, with Postgres, you can combine relational approach and non-relational JSON data – see this talk again https://www.slideshare.net/profyclub_ru/postgres-vs-mongo-postgres-professional for details. This gives you a lot of flexibility and power (I wouldn't keep money-related or basic accounts-related data in JSON format).
3) Standards and costs of support
SQL is experiencing a rebirth now: NoSQL products have started to add SQL dialects, a lot of people are doing big data analysis with SQL, and you can even run machine learning algorithms inside an RDBMS (see the MADlib project http://madlib.incubator.apache.org).
When you need to work with data, SQL was, is and will for a long time be the best language; it includes so many features that other languages lag far behind. I recommend http://modern-sql.com/ to learn modern SQL features and https://use-the-index-luke.com (from the same author) to learn how to reach the best performance using SQL.
When Mongo needed to create "BI connector", they also needed to speak SQL, so guess what they chose? https://www.linkedin.com/pulse/mongodb-32-now-powered-postgresql-john-de-goes
SQL isn't going anywhere; it is now extended with SQL/JSON, which means that Postgres is an excellent choice for the future.
4) Scalability
If your data size is up to several terabytes, it's easy to live on a "single master, multiple replicas" architecture, either on your own installation or in the cloud (Amazon RDS, Heroku, Google Cloud Platform and, more recently, Azure all support Postgres). There is an increasing number of solutions which help you work with a microservice architecture, have automatic failover, and/or shard your data. Here are only a few of them, which are actively developed and supported, in no particular order:
https://wiki.postgresql.org/wiki/PL/Proxy
https://github.com/zalando/spilo and https://github.com/zalando/patroni
https://github.com/dalibo/PAF
https://github.com/postgrespro/postgres_cluster
https://www.2ndquadrant.com/en/resources/bdr/
https://www.postgresql.org/docs/10/static/postgres-fdw.html
5) Extensibility
There are many more additional projects built to work with Postgres than with Mongo. You can work with literally any data type (including but not limited to time ranges, geospatial data, JSON, XML, arrays), have index support for it, get ACID guarantees and manipulate it using standard SQL. You can develop your own functions, data types, operators, index structures and much more!

If your data is relational (and it appears that it is), it makes no sense whatsoever to use a non-relational db (like mongodb). You can't underestimate the power and expressiveness of standard SQL queries.
On top of that, postgres has full ACID. And it can handle free-form JSON reasonably well, if that is that guy's primary motivation.

Neo4j instead of relational database

I am implementing a sinatra/rails based web portal that might eventually have few many:many relationships between tables/models. This is a one man team and part time but real world app.
I discussed my entity with someone and was advised to try neo4j. Coming from real 'non-sexy' enterprise world, my inclination is to use relational db until it stops scaling or becomes a nightmare because of sharding etc and then think about anything else.
HOWEVER,
I am using Postgres for the first time in this project along with DataMapper, and it's taking me a while to get up to speed.
I am just trying out a few things and building more use cases, so I constantly have to update my schema (prototyping the idea and feedback from the beta). I won't have to do this in neo4j (except for changing my queries).
It seems like it's very easy to set up search using neo4j. But Postgres can do full text search as well.
Postgres recently announced support for JSON and JavaScript. Wondering if I should just stick with PG and invest more time learning PG (which has a good community) instead of neo4j.
Looking for use cases where neo4j is better, especially at the prototyping/initial phase of a project. I understand that if the website grows I might end up having multiple persistence technologies like S3, relational (PG), Mongo etc.
Also it would be good to know how it plays out with Rails/Ruby ecosystem.
Update1:
I got a lot of good answers and seems like the right thing to do is stick with Postgres for now (especially since I deploy to heroku)
However, the idea of being schema-less is tempting. Basically I am thinking of an approach where you don't define a data model until you have, say, 100-150 users and have figured out a good schema (business use cases) for your product, while you are just demoing the concept and getting feedback with limited signups. Then one can decide on a schema and start with relational.
It would be nice to know if there are easy-to-use schema-less persistence options (based on ease of use/setup for a new user) that might give up, say, scaling etc.

Graph databases should be considered if you have a really chaotic data model. They were created to express highly complex relationships between entities. To do that, they store relationships at the data level, whereas RDBMSs use a declarative approach. Storing relationships only makes sense if these relationships are very different; otherwise you'll just end up duplicating data over and over, taking up a lot of space for nothing.
To require such variety in relationships you'd have to handle a huge amount of data. This is where graph databases shine, because instead of doing tons of joins, they just pick a record and follow its relationships. To support my statement: you'll notice that all the use cases on Neo4j's website deal with very complex data.
In brief, if what I said above doesn't apply to you, I think you should use another technology. If this is just about scaling, schemalessness or getting a project started fast, then look at other NoSQL solutions (more specifically, either column- or document-oriented databases). Otherwise you should stick with PostgreSQL. You could also, like you said, consider polyglot persistence.
About your update, you might consider hstore. I think it fits your requirements. It's a PostgreSQL module which also works on Heroku.

I don't think I agree that you should only use a graph database when your data model is very complex. I'm sure they could handle a simple data model/relationships as well.
If you have no prior experience with Neo4j or Postgres, then most likely both will take quite a bit of time to learn well.
Some things to keep in mind when picking:
It's not just about development against a database technology. You should consider deployment as well. How easy is it to deploy and scale Postgres/Neo4j?
Consider the community and tools around each technology. Is there a data mapper for Neo4j like there is for Postgres?
Consider that the data models are considerably different between the two. If you can already think relationally, then I'd probably stick with Postgres. If you go with Neo4j you're going to be making a lot of mistakes for several months with your data models.
Over time I've learned to keep it simple when I can. Postgres might be the boring choice compared to Neo4j, but boring doesn't keep you up at night. =)
Also I never see anyone mention it, but you should look at Riak (http://basho.com/riak/) too. It's a key-value store that also provides relationships (links) between objects. Not as mature as a graph database, but it can connect a few entities quickly.

The most appropriate choice depends on what problem you are trying to solve.
If you just have a few many to many tables, a relational database can be fine. In general, there is better OR-mapper support for relational databases, as they are much older and have a standardized interface and row-column structure. They also have been improved on for a long time, so they are stable and optimized for what they are doing.
A graph database is better if, e.g., your problem is more about the connections between entities, especially if you need longer-distance connections, like "detect cycles (of unspecified length)" or "what do friends-of-a-friend like". Things like that get unwieldy when restricted to SQL joins. A problem-specific language like Cypher in the case of Neo4j makes that much more concise. On the downside, there are mappers between graph DBs and objects, but not for every framework and language under the sun.
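The two query shapes mentioned here, "within N hops" and "is there a cycle of any length", can be sketched in plain Python to show what a graph engine does natively. The graph below is made-up sample data and the function names are invented for illustration; this is not Neo4j or Cypher code.

```python
from collections import deque

# Hypothetical "follows" graph stored as adjacency lists.
graph = {
    "ann": ["bob", "cid"],
    "bob": ["dee"],
    "cid": ["dee", "eve"],
    "dee": ["ann"],  # closes the cycle ann -> bob -> dee -> ann
    "eve": [],
}

def within_distance(graph, start, max_hops):
    """Breadth-first search: return {node: hops} for nodes within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]
    return seen

def has_cycle(graph):
    """Depth-first search with three colours; a back edge means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for neighbour in graph.get(node, []):
            state = colour.get(neighbour, WHITE)
            if state == GREY:
                return True  # reached a node that is still on the current path
            if state == WHITE and visit(neighbour):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in graph)

reachable = within_distance(graph, "ann", 2)
friends_of_friends = sorted(n for n, hops in reachable.items() if hops == 2)
print(friends_of_friends, has_cycle(graph))
```

In SQL each extra hop is typically another self-join (or a recursive query), whereas here the hop limit is just a parameter, which is the conciseness the answer is pointing at.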
I recently implemented a system prototype using neo4j and it was very useful to be able to talk about the structure and connections of our data and be able to model that one to one in the data storage. Also, adding other connections between data points was easy, neo4j being a schemaless storage. We ended up switching to mongodb due to troubles with write performance, but I don't think we could have finished the prototype with that in the same time.
Other NoSQL datastores (document-based, column, key-value) also cover specific use cases. Polyglot persistence is definitely something to look at, so keep your choice of backend reasonably separated from your business logic, to allow you to change your technology later if you learn something new.

When to use CouchDB over MongoDB and vice versa

I am stuck between these two NoSQL databases.
In my project, I will be creating a database within a database. For example, I need a solution to create dynamic tables.
So users can create tables with columns and rows. I think either MongoDB or CouchDB will be good for this, but I am not sure which one. I will also need efficient paging as well.

Of C, A & P (Consistency, Availability & Partition tolerance) which 2 are more important to you? Quick reference, the Visual Guide To NoSQL Systems
MongoDB : Consistency and Partition Tolerance
CouchDB : Availability and Partition Tolerance
A blog post, Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Membase vs Neo4j comparison has 'Best used' scenarios for each NoSQL database compared. Quoting the link,
MongoDB: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.
CouchDB : For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important.
A recent (Feb 2012) and more comprehensive comparison by Riyad Kalla,
MongoDB : Master-Slave Replication ONLY
CouchDB : Master-Master Replication
A blog post (Oct 2011) by someone who tried both, A MongoDB Guy Learns CouchDB commented on the CouchDB's paging being not as useful.
A dated (Jun 2009) benchmark by Kristina Chodorow (part of team behind MongoDB),
I'd go for MongoDB.

The answers above all overcomplicate the story.
If you plan to have a mobile component, or need desktop users to work offline and then sync their work to a server, you need CouchDB.
If your code will run only on the server, then go with MongoDB.
That's it. Unless you need CouchDB's (awesome) ability to replicate to mobile and desktop devices, MongoDB has the performance, community and tooling advantage at present.

Very old question but it's on top of Google and I don't quite like the answers I see so here's my own.
There's much more to CouchDB than the ability to develop CouchApps. Most people use CouchDB in a classical 3-tier web architecture.
In practice the deciding factor for most people will be the fact that MongoDB allows ad-hoc querying with a SQL-like syntax while CouchDB doesn't (you've got to create map/reduce views, which turns some people off even though creating these views is Rapid Application Development friendly; they have nothing to do with stored procedures).
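For readers who have not met map/reduce views, here is a rough model of the idea in Python. It only mimics the concept: real CouchDB views are defined in design documents (JavaScript by default) and are maintained incrementally by the server. The documents and the helper names (map_tags, reduce_count, build_view) are invented for this sketch.

```python
from collections import defaultdict

# Made-up sample documents.
docs = [
    {"_id": "1", "type": "page", "tags": ["norse", "god"]},
    {"_id": "2", "type": "page", "tags": ["greek", "god"]},
    {"_id": "3", "type": "comment", "tags": []},
]

def map_tags(doc):
    """Map step: emit one (key, value) pair per tag found on a page."""
    if doc["type"] == "page":
        for tag in doc["tags"]:
            yield tag, 1

def reduce_count(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)

def build_view(docs, map_fn, reduce_fn):
    """Run the map step over every document, then reduce per key."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}

view = build_view(docs, map_tags, reduce_count)
print(view)
```

The query is fixed when the view is defined, which is the contrast with ad-hoc querying: asking a new question means writing a new view rather than a new query.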
To address points raised in the accepted answer: CouchDB has a great versioning system, but that doesn't mean it is only suited (or more suited) for places where versioning is important. Also, CouchDB is heavy-write friendly thanks to its append-only nature (write operations return in no time while guaranteeing that no data will ever be lost).
One very important thing that is not mentioned by anyone is the fact that CouchDB relies on B-tree indexes. This means that whether you have 1 "row" or 20 billion, lookup cost grows only logarithmically with the number of rows, so the querying time barely grows and will typically remain below 10ms. This is a game changer which makes CouchDB a low-latency and read-friendly database, and this really shouldn't be overlooked.
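The reason a B-tree lookup stays cheap can be shown with a little arithmetic: the number of levels to traverse grows with the logarithm of the row count. The sketch below assumes a fanout of 100 keys per node, which is an illustrative figure and not a CouchDB constant.

```python
def btree_levels(row_count, fanout=100):
    """Approximate number of B-tree levels needed to index row_count keys.

    fanout (keys per node) is an assumed value chosen for illustration.
    """
    levels, capacity = 1, fanout
    while capacity < row_count:
        capacity *= fanout  # each extra level multiplies capacity by fanout
        levels += 1
    return levels

for rows in (1, 1_000_000, 20_000_000_000):
    print(rows, btree_levels(rows))
```

Under that assumption, going from one row to 20 billion rows adds only a handful of levels (and therefore node reads) to each lookup, which is why latency depends far more on caching and disk speed than on the row count.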
To be fair and exhaustive the advantage MongoDb has over CouchDb is tooling and marketing. They have first-class citizen tools for all major languages and platforms making the on-boarding easy and this added to their adhoc querying makes the transition from SQL even easier.
CouchDb doesn't have this level of tooling - even though there are many libraries available today - but CouchDb is exposed as an HTTP API and it is therefore quite easy to create a wrapper in your favorite language to talk with it. I personally like this approach as it avoids bloat and allows you to only take what you want (interface segregation principle).
So I'd say using one or the other is largely a matter of comfort and preference with their paradigms. CouchDb approach "just fits", for certain people, but if after learning about the database features (in the exhaustive official guide) you don't have your "hell yeah" moment, you should probably move on.
I'd discourage using CouchDB if you just want to use "the right tool for the right job", because you'll find out that you can't just use it that way and you'll end up being pissed and writing blog posts such as "Where are joins in CouchDB?" and "Where is transaction management?". Indeed CouchDB is, paradoxically, very transparent but at the same time requires a paradigm shift and a change in the way you approach problems to really shine (and really work).
But once you've done that it really pays off. I'd personally need very strong reasons or a major deal breaker on a project to choose another database, but so far I haven't met any.
Update December 2022:
Since this post is still getting a lot of views, I felt it important to inform people that I have recently moved to using MongoDB as my daily driver, while keeping CouchDB in my toolbelt for specialized cases where this database makes more sense (namely cases where views are not needed). There were multiple reasons for this choice; the most important ones were:
Performance: While precomputed indexes are a powerful asset, the main limitation of CouchDB is its QueryServer architecture. Every time a document is updated, it has to be serialized and processed by every view (even though this happens in a deferred manner, namely when the view is accessed). But more importantly, every time a view is updated (for example to add filtering logic for a new field added as part of the implementation of a new feature), ALL documents of the database must be sent to the view. This becomes a big deal when you have millions of documents in the database. You start worrying about the impact of updating your views and it becomes a distraction. Should you decide to create one database per data type to bypass this limitation, you'd then lose the ability to map/reduce across all your documents since views are scoped per database. MongoDB avoids this by segmenting documents into collections (ie. data types) so that when an index is updated only a subset of the data of the database is impacted. Moreover, MongoDB uses a binary format making these operations way more performant (while CouchDB uses JSON sent to the view server in plain text). This point may not be important if you do not design products needing to operate at large scale (hundreds of thousands of daily users or more).
Tooling: The tooling available with MongoDB is comprehensive and mature, whether we are talking about the drivers officially supported for various programming languages, or integration with IDEs.
Advanced querying: A wide range of data types and advanced query capabilities are available out of the box (geo types, GridFS allowing one to store files of arbitrary size directly in the DB etc...). Having easy access to powerful query aggregation capabilities made me realize how much CouchDB had been inhibiting my productivity.
Seamless support for resharding: resharding is easy with MongoDB, while it is a dangerous operation involving moving files by hand with CouchDB.
Many other small items that improve quality of life and really add up.
I have been a big CouchDB fan but I have to admit that moving to MongoDB as a daily driver felt a lot like moving back to civilization in terms of productivity and quality of life improvement. Now I only consider CouchDB for key-value store scenarios (in which no map-reduce views are required and all that is needed is getting a document by key - CouchDB shines quite a lot for this), and advanced situations in which having per-user like databases is needed (for example to support advanced synchronization between devices).
The only drawback I see with MongoDB is that it consumes a lot of memory to the point that I cannot install it on development machines having low specs (while by comparison couchdb is launched at startup without me noticing and consumes almost no resource). However I feel this is worth it considering the time saved and the features provided.
As a long-time CouchDB user, the value I see in MongoDB is quite different from the items highlighted in the other answers promoting MongoDB, so I felt it was important for me to provide this update (and also out of intellectual honesty when I remembered this post). CouchDB gave me quite a boost in productivity back in the day compared to the SQL products and ORMs I had been using, and at that time there were a lot of horror stories circulating regarding the reliability of MongoDB.
However, as of now, the few concerns I could have (and that were probably given disproportionate importance by internet folks - they essentially all boiled down to defaults whose reliability tradeoffs may surprise new users in a number of scenarios) no longer stand.
At this point, as a long-time CouchDB user in a great position to compare both products, I would recommend MongoDB to people needing a productive and scalable software development experience for their web app and advise to only pick CouchDB for specific needs.
CouchDB had momentum back in the day, which probably influenced my perception, but development has stalled and no meaningful features have been introduced for a long time; otherwise it would probably have caught up with MongoDB in terms of quality of life. I see two possible reasons for this: the way a now aborted rewrite of CouchDB diverted resources for a long time, and maybe early architectural decisions (such as the Query Server architecture) that may very well have restricted its future from the start. None of these aspects seem to be the priority of the core team.
I do not totally regret choosing CouchDB because it has been massively helpful and the mindset it has taught me is extremely helpful to allow me to write performant code in MongoDB (writing performant code in MongoDB is a breeze compared to the discipline one has to observe to solve business problems using CouchDB). However if I had to do it again today, I would have transitioned to MongoDB as my daily driver MUCH sooner. I'm usually quite good at picking the winning horse when technologies pop up, but this time it seems I haven't played the game that well. Hope this helps.

Ask yourself these questions and you will be able to decide on your DB.
Do you need master-master? Then CouchDB. Mainly CouchDB supports master-master replication which anticipates nodes being disconnected for long periods of time. MongoDB would not do well in that environment.
Do you need MAXIMUM R/W throughput? Then MongoDB
Do you need ultimate single-server durability because you are only going to have a single DB server? Then CouchDB.
Are you storing a MASSIVE data set that needs sharding while maintaining insane throughput? Then MongoDB.
Do you need strong consistency of data? Then MongoDB.
Do you need high availability of database? Then CouchDB.
Do you want multiple databases and multiple tables/collections? Then MongoDB.
Do you have a mobile app with offline users whose activity data you want to sync to a server? Then you need CouchDB.
Do you need a wide variety of querying capabilities? Then MongoDB.
Do you need a large community around the DB? Then MongoDB.

Here is a summary of the answers found in this article:
http://www.quora.com/How-does-MongoDB-compare-to-CouchDB-What-are-the-advantages-and-disadvantages-of-each
MongoDB: Better querying, data storage in BSON (faster access), better data consistency, multiple collections
CouchDB: Better replication, with master to master replication and conflict resolution, data storage in JSON (human-readable, better access through REST services), querying through map-reduce.
So in conclusion, MongoDB is faster, CouchDB is safer.
Also: http://nosql.mypopescu.com/post/298557551/couchdb-vs-mongodb

Be aware of an issue with sparse unique indexes in MongoDB. I've hit it and it is extremely cumbersome to work around.
The problem is this - you have a field, which is unique if present and you wish to find all the objects where the field is absent. The way sparse unique indexes are implemented in Mongo is that objects where that field is missing are not in the index at all - they cannot be retrieved by a query on that field - {$exists: false} just does not work.
The only workaround I have come up with is having a special null family of values, where an empty value is translated to a special prefix (like null:) concatenated to a UUID. This is a real headache, because one has to take care of transforming to/from the empty values when writing/querying/reading. A major nuisance.
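The workaround described above can be modelled in a few lines of Python. This is a toy model of the behaviour as the answer describes it, not MongoDB's API: SparseUniqueIndex, encode, decode and ids_without_field are hypothetical names, and the "email" field is sample data.

```python
import uuid

NULL_PREFIX = "null:"

class SparseUniqueIndex:
    """Toy model: documents lacking the field are simply not indexed."""

    def __init__(self, field):
        self.field = field
        self.entries = {}  # field value -> document id

    def insert(self, doc):
        if self.field not in doc:
            return  # sparse: nothing is recorded for this document
        value = doc[self.field]
        if value in self.entries:
            raise ValueError(f"duplicate {self.field}: {value!r}")
        self.entries[value] = doc["_id"]

def encode(doc, field):
    """On write: give documents without the field a unique placeholder."""
    if field not in doc:
        doc = {**doc, field: NULL_PREFIX + uuid.uuid4().hex}
    return doc

def decode(doc, field):
    """On read: strip the placeholder so callers see the field as absent."""
    if str(doc.get(field, "")).startswith(NULL_PREFIX):
        doc = {k: v for k, v in doc.items() if k != field}
    return doc

def ids_without_field(index):
    """Find 'empty' documents through the index via the placeholder prefix."""
    return sorted(
        doc_id for value, doc_id in index.entries.items()
        if str(value).startswith(NULL_PREFIX)
    )

index = SparseUniqueIndex("email")
raw = [{"_id": 1, "email": "a@example.com"}, {"_id": 2}, {"_id": 3}]
stored = [encode(doc, "email") for doc in raw]
for doc in stored:
    index.insert(doc)

print(ids_without_field(index))
```

The cost the answer complains about is visible here: every write path needs encode, every read path needs decode, and every query for "missing" has to be rewritten as a prefix match.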
I have never used server-side JavaScript execution in MongoDB (it is not advised anyway), and their map/reduce has awful performance when there is just one Mongo node. For all these reasons I am now considering checking out CouchDB; maybe it is a better fit for my particular scenario.
BTW, if anyone knows the link to the respective Mongo issue describing the sparse unique index problem - please share.

I'm sure you can with Mongo (more familiar with it), and pretty sure you can with couch too.
Both are document-oriented (JSON-based), so there would be no "columns" but rather fields in documents, and these can be fully dynamic.
Since both can do it, you may want to look at other factors to decide which to use: other features you care about, popularity, etc. Google Insights and indeed.com job posts would be ways to look at popularity.
You could just try it; I think you should be able to have Mongo running in 5 minutes.
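As a rough picture of "fields instead of columns", the sketch below models a user-defined table as a list of documents in plain Python, with paging done by remembering the last key seen. It is database-agnostic sample code; the data and the helper names (columns, page_after) are invented for illustration.

```python
# Each user-defined "table" is a list of documents; "columns" are just keys,
# and different rows are free to carry different ones.
rows = [
    {"_id": 1, "name": "Odin", "realm": "Asgard"},
    {"_id": 2, "name": "Zeus", "weapon": "thunderbolt"},
    {"_id": 3, "name": "Jupiter"},
    {"_id": 4, "name": "Thor", "weapon": "hammer"},
]

def columns(rows):
    """Derive the current set of columns from the documents themselves."""
    return sorted({key for row in rows for key in row if key != "_id"})

def page_after(rows, last_id=None, size=2):
    """Return the next page of rows ordered by _id, starting after last_id."""
    ordered = sorted(rows, key=lambda row: row["_id"])
    if last_id is not None:
        ordered = [row for row in ordered if row["_id"] > last_id]
    return ordered[:size]

print(columns(rows))
first = page_after(rows)
second = page_after(rows, last_id=first[-1]["_id"])
print([row["_id"] for row in first], [row["_id"] for row in second])
```

Paging by "everything after the last key I saw" tends to stay efficient on an indexed key, whereas skipping N rows usually gets slower the deeper you page.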

MongoDB for personal non-distributed work

This might be answered here (or elsewhere) before but I keep getting mixed/no views on the internet.
I have never used anything except SQL-like databases, and then I came across NoSQL DBs (MongoDB, specifically). I tried my hand at it. I was doing it just for fun, but everywhere the talk is that it is really great when you are using it across distributed servers. So I wonder whether it is helpful (in a non-trivial way) for small projects run mainly on a personal computer. Are there real advantages when there is just one server?
Although it would be cool to use MapReduce (and talk about it to peers :d), won't it be overkill for small projects run on single servers? Or are there other advantages? I need some clear thoughts. Sorry if I sound naive here.
Optional: Some examples where/how you have used would be great.
Thanks.

IMHO, MongoDB is perfectly valid for use for single server/small projects and it's not a pre-requisite that you should only use it for "big data" or multi server projects.
If MongoDB solves a particular requirement, it doesn't matter on the scale of the project so don't let that aspect sway you. Using MapReduce may be a bit overkill/not the best approach if you truly have low volume data and just want to do some basic aggregations - these could be done using the group operator (which currently has some limitations with regard to how much data it can return).
So I guess what I'm saying in general is, use the right tool for the job. There's nothing wrong with using MongoDB on small projects/single PC. If an RDBMS like SQL Server provides a better fit for your project then use that. If a NoSQL technology like MongoDB fits, then use that.

+1 on AdaTheDev - but there are 3 more things to note here:
Durability: From version 1.8 onwards, MongoDB has single-server durability when started with --journal, so now it's more applicable to single-server scenarios.
Choosing a NoSQL DB over say an RDBMS shouldn't be decided upon the single or multi server setting, but based on the modelling of the database. See for example 1 and 2 - it's easy to store comment-like structures in MongoDB.
MapReduce: again, it depends on the data modelling and the operation/calculation that needs to occur. Depending on the way you model your data you may or may not need to use MapReduce.
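To illustrate the modelling point about comment-like structures, here is what such a document could look like, written as a plain Python dict. The field names (title, comments, author, text, replies) are assumptions made for this sketch and are not taken from the linked examples.

```python
# A post with its comment thread embedded as one nested document.
post = {
    "title": "MongoDB on a single server",
    "comments": [
        {
            "author": "ann",
            "text": "Works fine for my side project.",
            "replies": [
                {"author": "bob", "text": "Same here.", "replies": []},
            ],
        },
        {"author": "cid", "text": "What about durability?", "replies": []},
    ],
}

def count_comments(comments):
    """Count comments at every nesting depth."""
    return sum(1 + count_comments(comment["replies"]) for comment in comments)

print(count_comments(post["comments"]))
```

In a relational schema the same thread would normally be a separate table with a self-referencing parent column, so reading one post with its whole thread means a join or a recursive query instead of fetching a single document.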

Which noSQL database to choose for a network daemon?

I am writing a custom server, which should be very performant.
It has 100,000-600,000 clients connected, and around 10 million records stored.
The database will run on a single server.
The server code is implemented with the Twisted framework (in Python).
Currently it uses MySQL, but I think a NoSQL database would be much more efficient (no complex queries, many simple writes / timestamp changes and many simple reads).
Which NoSQL database should I go for? Easy indexing would be a plus, I want the option to search the database from an administration system, create groups from logs containing a specific keyword and stuff like that.
I had a look at Cassandra and MongoDB, MongoDB seemed easier to get in / use for me.
Thanks for the help!

As far as pure learning curve goes, MongoDB has positioned itself to be a very friendly alternative to MySQL. Cassandra is a very different beast and will have a higher learning curve. That said, both have the potential to solve your problem based upon what you describe.

You have pretty simple requirements: easy indexing, arbitrary searches, grouping on keyword, etc -- pretty much every NoSQL system would work. It really comes down to the technologies with which you're comfortable. Like C#? Then go with RavenDB -- it can even automatically add indices as you execute queries. Like Erlang? Then you're a freak, but you should go with CouchDB. Like Javascript and JSON? Go with MongoDB.
Personally I really like Mongo, as it feels like a lovely hybrid of SQL and NoSQL databases. You can index the hell out of it (and get amazing performance!), which makes it almost like an RDBMS. You can also use it like a key/value store, and use it like a "giant hashtable in the sky". Still, YMMV. Play with them and see what works for you.

Cassandra is really designed for multiple server nodes, providing transparent replication. So you won't get the best value out of it with a single server host. Cassandra is also designed primarily for large-scale (and sacrifices indexing and flexible queries as a result). 10 million records isn't really very big, so you can afford to try something more flexible but less scalable.
