How can I use MongoDB as a cache for PostgreSQL?

I have an application that cannot afford to lose data, so PostgreSQL is my choice of database (ACID).
However, the speed and query advantages of MongoDB are very attractive. Based on what I've read so far, MongoDB can report a successful write that may not have gone to disk, so I can't make it my mission-critical DB (I'll also need transactions).
I've seen references to people using MySQL and MongoDB together, one for the transactions and the other for queries. Please note that I'm not talking about keeping some data in one DB and the rest in another; I want to use PostgreSQL as the gateway for data entry, and MongoDB for reads.
Are there any resources that offer an architecture/guide for PostgreSQL + MongoDB usage in this way? I remember seeing this topic on a PostgreSQL conference agenda, but I could not find the link.

I don't think you'll get much speed using MongoDB just as a cache. Its strengths are replication and horizontal scalability. On one computer you'd make MongoDB and PostgreSQL compete for memory, I/O bandwidth and processor time.
As you cannot afford to lose transactions, you'll be better off with PostgreSQL only. Its efficient caching, sophisticated query planner, prepared statements and wide indexing support mean that read-only queries will be very fast, really comparable to MongoDB on a single computer.
PostgreSQL can even scale horizontally now, using asynchronous or, from version 9.1, synchronous replication.

One way to achieve this would be to set up master-slave replication with the PostgreSQL database as the master and the MongoDB database as the slave. You would then do all reads from MongoDB and all writes to PostgreSQL.
This post discusses such a setup using a tool called Bucardo:
http://blog.endpoint.com/2011/06/mongodb-replication-from-postgres-using.html
You may also be able to do it with Tungsten Replicator, although it seems designed to be used with MySQL:
http://code.google.com/p/tungsten-replicator/wiki/TRCHeterogeneousReplication
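Whichever tool performs the replication, the application-side pattern stays the same: all writes go to PostgreSQL and all reads come from MongoDB. A minimal sketch of that split, assuming psycopg2 and pymongo and a hypothetical users table/collection kept in sync by the replication layer:

```python
import psycopg2
from pymongo import MongoClient

# Hypothetical connection details; the replication tool (e.g. Bucardo)
# is assumed to copy rows from Postgres into the "users" collection.
pg = psycopg2.connect("dbname=app user=app")
mongo = MongoClient("mongodb://localhost:27017")["app"]

def create_user(name, email):
    # All writes go to PostgreSQL, the system of record.
    with pg, pg.cursor() as cur:
        cur.execute(
            "INSERT INTO users (name, email) VALUES (%s, %s) RETURNING id",
            (name, email),
        )
        return cur.fetchone()[0]

def find_user(email):
    # All reads come from MongoDB; note the result may lag the master
    # by the replication delay.
    return mongo.users.find_one({"email": email})
```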

I remember seeing this topic on a PostgreSQL conference agenda, but I could not find the link.
Maybe you are talking about this: https://www.postgresqlconference.org/content/hybrid-applications-using-mongodb-and-postgres

Depending on how important transactions are to you, one option is to use the MongoDB driver's safe mode and drop PostgreSQL.
http://www.mongodb.org/display/DOCS/getLastError+Command
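Modern drivers express that "safe mode" idea as a write concern rather than an explicit getLastError call. A hedged pymongo sketch (the database and collection names here are made up):

```python
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017")

# w=1 waits for the primary's acknowledgement; j=True additionally waits
# until the write has hit the on-disk journal, which is the durability
# guarantee the question is worried about.
orders = client["app"].get_collection(
    "orders", write_concern=WriteConcern(w=1, j=True)
)

# The insert only returns once the journaled write is confirmed.
orders.insert_one({"sku": "abc-123", "qty": 2})
```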

How can you expect transactional consistency from Postgres but trust MongoDB for reads? How would you support rollbacks in this scenario? How do you detect when they've gotten out of sync?
I think you're better off going with memcached and implementing a higher-level object cache. Alternatively, you could consider a replication slave for reads. If you have performance needs beyond what a dedicated read slave can provide, consider denormalizing your tables on your slave system.
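A read-through object cache of the kind suggested here can be quite small. A sketch using the python-memcached client together with psycopg2 (the key scheme and users table are illustrative):

```python
import json
import memcache
import psycopg2

mc = memcache.Client(["127.0.0.1:11211"])
pg = psycopg2.connect("dbname=app user=app")

def get_user(user_id, ttl=300):
    key = "user:%d" % user_id
    cached = mc.get(key)
    if cached is not None:
        return json.loads(cached)
    # Cache miss: read from Postgres, then populate the cache.
    with pg, pg.cursor() as cur:
        cur.execute("SELECT name, email FROM users WHERE id = %s", (user_id,))
        row = cur.fetchone()
    if row is None:
        return None
    user = {"id": user_id, "name": row[0], "email": row[1]}
    mc.set(key, json.dumps(user), time=ttl)
    return user
```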
Make sure that any of this is actually needed. For thin tables with PK lookups, most modern database engines such as Postgres or InnoDB will generally keep up with NoSQL solutions. Don't fall into the ROFLSCALE trap:
http://www.youtube.com/watch?v=b2F-DItXtZs

I think you can run a Mongo replica set, say one master and three slaves. Then in your app you should run all write transactions on PostgreSQL first and then on the Mongo replica set; after that you can serve read operations from the Mongo replica set.
But keeping them synchronized will be a problem; you will have to work on that.
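Keeping the two stores consistent is indeed the hard part. A rough sketch of the dual-write idea, assuming PostgreSQL commits first and the MongoDB write is best-effort (connection strings, table and collection names are made up):

```python
import json
import logging
import psycopg2
from pymongo import MongoClient

pg = psycopg2.connect("dbname=app user=app")
mongo = MongoClient("mongodb://localhost:27017")["app"]

def save_event(event):
    # 1. Commit to PostgreSQL first; it is the source of truth.
    with pg, pg.cursor() as cur:
        cur.execute(
            "INSERT INTO events (payload) VALUES (%s) RETURNING id",
            (json.dumps(event),),
        )
        event_id = cur.fetchone()[0]
    # 2. Best-effort write to the Mongo replica set for fast reads.
    #    If this fails, the stores have drifted and some repair job
    #    (a re-sync from Postgres) has to fix things up later.
    try:
        mongo.events.insert_one(dict(event, _id=event_id))
    except Exception:
        logging.exception("Mongo write failed for event %s; needs re-sync", event_id)
```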

You may find some replacement for Mongo here or here that is safer and fast as well,
but I advise simplifying your solution instead of making a complicated design.
Visual Guide to NoSQL Systems

In MongoDB you can set the writeConcern option to require that a write reach the journal and/or a given number of instances before the acknowledgement is sent, and I think MongoDB even has the concept of transactions now. I'm not sure why we need Postgres behind it.
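For reference, here is how both knobs look from a driver. A hedged pymongo sketch (multi-document transactions require MongoDB 4.0+ on a replica set, and all names here are made up):

```python
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

# w="majority" waits for a majority of replica-set members and j=True
# waits for the journal before the write is acknowledged.
client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client["app"]
accounts = db.get_collection(
    "accounts", write_concern=WriteConcern(w="majority", j=True)
)

# Multi-document transaction (MongoDB 4.0+, replica set required):
# both updates commit or neither does.
with client.start_session() as session:
    with session.start_transaction():
        accounts.update_one(
            {"_id": "a"}, {"$inc": {"balance": -100}}, session=session
        )
        accounts.update_one(
            {"_id": "b"}, {"$inc": {"balance": 100}}, session=session
        )
```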

Related

Greenplum selection criteria

I'm getting familiar with the Greenplum solution concepts and trying to understand whether, and if so when, the organisation I work for should use this solution. Our conceptual idea is to set up a kind of central 'datastore' suitable for both OLTP and OLAP access.
My research: this article suggests Greenplum is more suitable for OLAP and PostgreSQL for OLTP. But I have also read about Greenplum improvements for OLTP processing. And in favour of PostgreSQL, there are also articles like this one suggesting that OLAP (e.g., a data warehouse implementation) can be done with PostgreSQL.
So my question is: how should we move forward, and what are the main criteria for the decision? For example, given that we currently have just a few TBs (1-5), should we start with a PostgreSQL cluster (for OLTP+OLAP) and move to Greenplum when data volumes grow? Or start straight away with Greenplum?
Maybe use Postgres if it can handle your use case. If you have too much data and need to finish reports and analytics faster, change to Greenplum.

Why does database replication optimize reading and writing from the database?

Can someone explain why database replication usually optimizes reading and writing from the database? I cannot understand how it works; after all, replication adds a lot of extra operations, such as master-slave merges. The database I'm considering is a cloud-based PostgreSQL database handling 1.5M records per day, of which only 1% are reads.
PostgreSQL replication does not optimize reading and writing.
You can use it for load balancing, but it is not perfect for that:
your application has to know that it must write to one database and read from another
data written at the primary server don't become visible at the standby right away
If your workload consists of 99% writes, that won't help you at all.
Look into sharding if you want to distribute the load.
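To make the first caveat concrete, the read/write split lives in the application: it holds one connection to the primary for writes and one to a standby for reads. A minimal psycopg2 sketch (hostnames and the table are placeholders):

```python
import psycopg2

# The application holds two connections and routes queries itself.
primary = psycopg2.connect("host=pg-primary dbname=app")  # accepts writes
standby = psycopg2.connect("host=pg-standby dbname=app")  # read-only replica

def record_measurement(value):
    # Writes must go to the primary.
    with primary, primary.cursor() as cur:
        cur.execute("INSERT INTO measurements (value) VALUES (%s)", (value,))

def latest_measurements(limit=10):
    # Reads can go to the standby, but may lag the primary
    # by the replication delay.
    with standby, standby.cursor() as cur:
        cur.execute(
            "SELECT value FROM measurements ORDER BY id DESC LIMIT %s", (limit,)
        )
        return cur.fetchall()
```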

MongoDB as failover database

If a system is already running SQL Server, is it possible to use a NoSQL database (i.e. MongoDB in particular) as the failover database in a SQL Server failover environment, such that if the primary SQL node fails, the secondary node running/hosting MongoDB takes the primary's place?
The short answer to this question is "no". The long answer is anything is possible given enough code and resources.
SQL Server and MongoDB do not speak the same language, so there would need to be an intermediary that can translate, but that adds another failure mode to the system. The intermediary also needs to be complex enough to understand concepts such as "primary". There are connectors out there that will handle either SQL -> MongoDB or MongoDB -> SQL, but I'm not aware of any that are capable of syncing the two in real time. Additionally, it would be up to your application to determine where to query data from and where to write data to; that is outside the scope of what such a connector will do.

PostgreSQL equivalent of Oplog Tailing in MongoDB

Is there an equivalent to MongoDB's oplog tailing in PostgreSQL? I find it very useful in MongoDB for real-time analytics and for building dashboards on what is going on in the DB by peeking at the log. Unfortunately MongoDB is not suitable for my particular DB needs. I'm really looking for a legitimate, non-hackish way of doing it; this would go into a production environment, and I can't cause more problems than it's worth down the line.
Thanks in advance, and let's try not to make this a NoSQL vs. RDBMS debate.
In PostgreSQL 9.4 and newer you can use the test_decoding plugin via pg_recvlogical to stream changes from a replication slot.
In 9.3 and newer pg_xlogdump can decode the transaction log segments, but that means you have to capture and dump each segment, and it really requires WAL archiving to be enabled in order to be practical.
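For a quick look at what logical decoding emits, the slot functions can also be called over a normal connection. A sketch with psycopg2, assuming wal_level = logical and max_replication_slots > 0 are configured and the role has replication rights; the slot name is arbitrary:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # needs REPLICATION or superuser rights
conn.autocommit = True
cur = conn.cursor()

# Create a logical replication slot that uses the built-in test_decoding
# output plugin (requires wal_level = logical in postgresql.conf).
cur.execute(
    "SELECT * FROM pg_create_logical_replication_slot(%s, 'test_decoding')",
    ("dashboard_slot",),
)

# ... later, after other sessions have written data ...

# Fetch and consume every change recorded since the last call.
cur.execute(
    "SELECT lsn, xid, data FROM pg_logical_slot_get_changes(%s, NULL, NULL)",
    ("dashboard_slot",),
)
for lsn, xid, data in cur.fetchall():
    # data is a text description such as:
    # table public.users: INSERT: id[integer]:1 name[text]:'bob'
    print(lsn, xid, data)
```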
You should also look at:
The pg_stat_statements extension
The built-in pg_stat_activity view
The built-in pg_stat_* views, such as pg_stat_user_indexes, etc.

How can one replicate a graph database like Neo4J

I am a noob in the NoSQL world, but after going through the basics of how Neo4j works, I didn't understand how replication can be fast compared to column or document databases or a plain key-value DB.
It has nodes and edges, which are nothing but relations between those nodes, something similar to joins in an RDBMS.
So how does replication work here compared to an RDBMS?
Each NoSQL database behaves differently when it comes to replication. NoSQL is quite a wide term, so you should not expect good replication performance from all of them. In fact Neo4j Enterprise has some support for replication, but the design of Neo4j does not naturally lend itself to scaling; it is certainly not one of the core objectives, unlike in others such as Cassandra, for example.
What do you mean by replication? Neo4j Enterprise comes with a high-availability cluster that replicates your data across a number of machines.
If it is just about replicating the data, you can also shut down your database and copy the database files (or, in Enterprise, perform an online backup).