PostgreSQL equivalent of Oplog Tailing in MongoDB


Is there an equivalent process similar to oplog tailing for MongoDB in PostgreSQL? I find it very useful in MongoDB for real-time analytics and building out dashboards on what is going on in the DB by peeking at the log. Unfortunately MongoDB is not useful for my particular DB needs. I'm really looking for a legitimate, non-hackish way of doing it. This would be put in a production environment, and I can't have it cause more problems than it's worth down the line.
Thanks in advance, and let's try not to make this a NoSQL vs RDBMS debate.

In PostgreSQL 9.4 and newer you can use the test_decoding plugin via pg_recvlogical to stream changes from a replication slot.
In 9.3 and newer pg_xlogdump can decode the transaction log segments, but that means you have to capture and dump each segment, and it really requires WAL archiving to be enabled in order to be practical.
You should also look at:
The pg_stat_statements extension
The built-in pg_stat_activity view
The built-in pg_stat_* views such as pg_stat_user_indexes
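For a dashboard, the text that test_decoding emits can be tallied with a few lines of code. The following is a minimal sketch, assuming the lines come from pg_recvlogical writing to standard output and follow the plugin's usual "table schema.table: OPERATION: columns" layout; the function names parse_change and count_changes are illustrative, not part of PostgreSQL.

```python
import re

# Matches data-change lines in test_decoding's text output, e.g.
#   table public.users: INSERT: id[integer]:1 name[text]:'alice'
# Assumption: schema and table names contain no dots or colons.
CHANGE_RE = re.compile(
    r"^table (?P<schema>[^.]+)\.(?P<table>[^:]+): "
    r"(?P<op>INSERT|UPDATE|DELETE): ?(?P<columns>.*)$"
)


def parse_change(line):
    """Return a dict for a change line, or None for BEGIN/COMMIT and other lines."""
    match = CHANGE_RE.match(line.strip())
    if match is None:
        return None
    return match.groupdict()


def count_changes(lines):
    """Count changes per (qualified table name, operation)."""
    counts = {}
    for line in lines:
        change = parse_change(line)
        if change is None:
            continue
        key = (f"{change['schema']}.{change['table']}", change["op"])
        counts[key] = counts.get(key, 0) + 1
    return counts


sample = [
    "BEGIN 693",
    "table public.users: INSERT: id[integer]:1 name[text]:'alice'",
    "table public.users: UPDATE: id[integer]:1 name[text]:'bob'",
    "COMMIT 693",
]
print(count_changes(sample))
```

The column text is kept as one raw string on purpose: splitting it reliably needs quote-aware parsing, and for anything beyond counting a structured output plugin is probably a better fit than parsing text.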

Related

Greenplum selection criteria

I'm getting familiar with the Greenplum solution concepts and trying to understand whether, and if so when, the organisation I work for should use it. Our conceptual idea is to set up a kind of central 'datastore' suitable for both OLTP and OLAP access.
My research: this article suggests Greenplum is more suitable for OLAP, and PostgreSQL for OLTP. But I also read about Greenplum improvements for OLTP processing. And in favour of PostgreSQL, there are also articles like this that suggest that OLAP (e.g., a data warehouse implementation) can be done with PostgreSQL.
So my question is: how should we move forward, and what are the main criteria for deciding? For example, given that we currently have just a few TB (1-5), should we start with a PostgreSQL cluster (for OLTP+OLAP) and move to Greenplum when data volumes grow? Or start straight away with Greenplum?

Use Postgres if it can handle your use case. If you have too much data and need reports and analytics to finish faster, switch to Greenplum.

Is it safe taking an SQL export from a running production GCP SQL service?

We have one Google Cloud SQL instance with 1 vCPU for production. I want to grab a copy of the data by exporting to a bucket. Is this safe to do? As in might it block other operations on the instance?

I think it's important to take into account which RDBMS you are using. It's mentioned here that PostgreSQL has issues handling big blobs in an export, and in this other SO post the most-voted answer has hints for a smoother export, since exports can leave databases unresponsive, which is a fairly well-known problem.
In the case of MySQL, the product docs have some tips in this article, which states:
"If the server is running, it is necessary to perform appropriate locking so that the server does not change database contents during the backup"
You can avoid that table locking by adding --lock-tables=false to your mysqldump export command. For InnoDB tables, --single-transaction still gives a consistent dump without locking.

postgres copy database to another server reduces database size

I installed Postgres 9.1 on both machines.
Initially the DB size was 7052 MB. I then used the following command to copy it to another server:
pg_dump -C dbname | bzip2 | ssh remoteuser@remotehost "bunzip2 | psql dbname"
After the copy completed successfully, I checked the size on the destination machine and it shows 6653 MB.
I then checked the table count, and it is the same.
Has there been data loss? Is there missing data?
Note:
The two machines have the same hardware and software configuration.
I used:
SELECT pg_size_pretty(pg_database_size('dbname'));

One of PostgreSQL's most sophisticated features is so-called Multi-Version Concurrency Control (MVCC), a standard technique for avoiding conflicts between reads and writes of the same object in a database. MVCC guarantees that each transaction sees a consistent view of the database by reading non-current data for objects modified by concurrent transactions. Thanks to MVCC, PostgreSQL has great scalability, a robust hot backup tool and many other nice features comparable to the most advanced commercial databases.
Unfortunately, there is one downside to MVCC: databases tend to grow over time, and sometimes that can be a problem. In recent versions of PostgreSQL there is a separate server process called the autovacuum daemon, whose purpose is to keep the database size reasonable. It does that by trying to recover reusable chunks of the database files. Still, there are many scenarios that will force the database to grow even if the amount of useful data in it doesn't really change. That typically happens if you have lots of UPDATE and/or DELETE statements in the applications that use the database.
When you dump and restore, only the live rows are written to the new database, so the dead space is left behind and your copied DB appears smaller.

That looks normal. Databases are often smaller after restore, because a newly created b-tree index is more compact than one that's been progressively built by inserts. Additionally, UPDATEs and DELETEs leave empty space in the tables.
So you have nothing to worry about. You'll find that if you diff an SQL dump from the old DB and a dump taken from the just-restored DB, they'll be the same except for comments.
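The dump comparison suggested above can be sketched in a few lines. This is a minimal illustration, assuming both dumps are plain-text SQL already read into strings; the helper names significant_lines and dumps_match are made up for this example. It treats any line starting with -- as a comment, which could also drop a data line that happens to begin with two dashes.

```python
def significant_lines(dump_text):
    """Drop blank lines and SQL comment lines (starting with --) from a dump."""
    return [
        line
        for line in dump_text.splitlines()
        if line.strip() and not line.lstrip().startswith("--")
    ]


def dumps_match(old_dump, new_dump):
    """True when two plain-text dumps are identical apart from comments and blank lines."""
    return significant_lines(old_dump) == significant_lines(new_dump)


old = "-- dump taken on the source server\nCREATE TABLE t (id integer);\n"
new = "-- dump taken on the restored server\n\nCREATE TABLE t (id integer);\n"
print(dumps_match(old, new))
```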

Upsert in Amazon RedShift without Function or Stored Procedures

As there is no support for user-defined functions or stored procedures in Redshift, how can I achieve an UPSERT mechanism in Redshift, which uses ParAccel, a PostgreSQL 8.0.2 fork?
Currently, I'm trying to achieve an UPSERT mechanism using an IF...THEN...ELSE... statement,
e.g.:
IF NOT EXISTS(SELECT...WHERE(SELECT..))
THEN INSERT INTO tblABC() SELECT... FROM tblXYZ
ELSE UPDATE tblABC SET.,.,.,. FROM tblXYZ WHERE...
which gives me an error, as I'm writing this code on its own, without wrapping it in a function or SP.
So, is there any way to achieve UPSERT?
Thanks

You should probably read this article on upsert by depesz. You can't rely on SERIALIZABLE for this since, AFAIK, ParAccel doesn't support full serializability like Pg 9.1+ does. As outlined in that post, you can't really do what you want purely in the DB anyway.
The short version is that even on current PostgreSQL versions that support writable CTEs it's still hard. On an 8.0 based ParAccel, you're pretty much out of luck.
I'd do a staged merge. COPY the new data to a temporary table on the server, LOCK the destination table, then do an UPDATE ... FROM followed by an INSERT INTO ... SELECT. Doing the data uploads in big chunks and locking the table for the upserts is reasonably in keeping with how Redshift is used anyway.
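The staged merge described above can be sketched as follows. This is only an illustration under stated assumptions: SQLite (via Python's sqlite3 module) stands in for Redshift, the table names target and staging and the function staged_merge are hypothetical, and the update uses a correlated subquery where Redshift would use UPDATE ... FROM. SQLite has no LOCK TABLE statement, so the single transaction plays that role here.

```python
import sqlite3


def staged_merge(conn, rows):
    """Upsert (id, value) rows into the hypothetical target table via a staging table."""
    conn.execute("CREATE TEMP TABLE staging (id INTEGER PRIMARY KEY, value TEXT)")
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.executemany("INSERT INTO staging (id, value) VALUES (?, ?)", rows)
            # Step 1: update rows that already exist in the target.
            conn.execute(
                "UPDATE target SET value = "
                "(SELECT s.value FROM staging s WHERE s.id = target.id) "
                "WHERE id IN (SELECT id FROM staging)"
            )
            # Step 2: insert staged rows that are not in the target yet.
            conn.execute(
                "INSERT INTO target (id, value) "
                "SELECT s.id, s.value FROM staging s "
                "WHERE s.id NOT IN (SELECT id FROM target)"
            )
    finally:
        conn.execute("DROP TABLE staging")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "old"), (2, "keep")])
conn.commit()

staged_merge(conn, [(1, "new"), (3, "added")])
print(conn.execute("SELECT id, value FROM target ORDER BY id").fetchall())
```

Running the update before the insert matters: if the insert ran first, the update would then touch the freshly inserted rows as well, doing redundant work.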
Another approach is to externally co-ordinate the upserts via something local to your application cluster. Have all your tools communicate via an external tool where they take an "insert-intent lock" before doing an insert. You want a distributed locking tool appropriate to your system. If everything's running inside one application server, it might be as simple as a synchronized singleton object.
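For the single-application-server case, the "insert-intent lock" could look roughly like the sketch below. It assumes all writers live in one Python process; the class InsertIntentLocks, the module-level LOCKS object and the upsert function are invented for illustration, and a plain dict stands in for the database table.

```python
import threading


class InsertIntentLocks:
    """Process-wide registry that hands out one lock per key."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the registry itself
        self._locks = {}

    def lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())


LOCKS = InsertIntentLocks()  # the shared singleton


def upsert(store, key, value):
    """Insert or update under the key's lock; store is a dict standing in for the table."""
    with LOCKS.lock_for(key):
        if key in store:
            store[key] = value
            return "updated"
        store[key] = value
        return "inserted"


store = {}
results = []
threads = [
    threading.Thread(target=lambda: results.append(upsert(store, "k", 1)))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results.count("inserted"), results.count("updated"))
```

The registry never discards locks, so a long-running process with many distinct keys would need some eviction policy. Across several application servers the same idea needs a distributed locking tool instead, as noted above.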

How can I use MongoDB as a cache for Postgresql?

I have an application that cannot afford to lose data, so PostgreSQL is my choice of database (ACID).
However, the speed and query advantages of MongoDB are very attractive. Based on what I've read so far, though, MongoDB can report a successful write that may not have gone to disk, so I can't make it my mission-critical DB (I'll also need transactions).
I've seen references to people using MySQL and MongoDB together, one for the transactions and the other for queries. Please note that I'm not talking about keeping some data in one DB and the rest in another. I want to use PostgreSQL as a gateway for data entry, and MongoDB for reads.
Are there any resources that offer an architecture/guide for using PostgreSQL + MongoDB in this way? I can remember seeing this topic in a PostgreSQL conference agenda, but I could not find the link.

I don't think you'll gain much speed using MongoDB just as a cache. Its strengths are replication and horizontal scalability. On one computer you'd make Mongo and Postgres compete for memory, IO bandwidth and processor time.
As you cannot afford to lose transactions, you'll be better off with Postgres only. Its efficient caching, sophisticated query planner, prepared queries and wide indexing support mean that read-only queries will be very fast, really comparable to MongoDB on a single computer.
Postgres can even scale horizontally now using asynchronous or, from version 9.1, synchronous replication.

One way to achieve this would be to set up a master-slave replication with the PostgreSQL database as master, and the MongoDB database as slave. You would then do all reads from MongoDB, and all writes to PostgreSQL.
This post discusses such a setup using a tool called Bucardo:
http://blog.endpoint.com/2011/06/mongodb-replication-from-postgres-using.html
You may also be able to do it with Tungsten Replicator, although it seems designed to be used with MySQL:
http://code.google.com/p/tungsten-replicator/wiki/TRCHeterogeneousReplication

"I can remember seeing this topic in Postgresql conference agenda, but I could not find the link."
Maybe you are talking about this: https://www.postgresqlconference.org/content/hybrid-applications-using-mongodb-and-postgres

Depending on how important transactions are to you, one option is to use the MongoDB driver's safe mode and drop PostgreSQL.
http://www.mongodb.org/display/DOCS/getLastError+Command

How can you expect transactional consistency from Postgres but trust MongoDB for reads? How would you support rollbacks in this scenario? How do you detect when they've gotten out of sync?
I think you're better off going with memcache and implementing a higher level object cache. Alternatively, you could consider a replication slave for reads. If you have performance needs beyond what a dedicated read slave can provide, consider denormalizing your tables on your slave system.
Make sure that any of this is actually needed. For thin tables with PK lookups, most modern database engines like Postgres or InnoDB will generally keep up with NoSQL solutions. Don't fall into the ROFLSCALE trap:
http://www.youtube.com/watch?v=b2F-DItXtZs

I think you can run a Mongo replica set, say with 3 slaves and 1 master. Then in your app you would run all write transactions on PostgreSQL first and then on the Mongo replica set. After that you can run read queries against the Mongo replica set.
But synchronization will be a problem that you will have to work on.

You may find a replacement for Mongo here or here that is safer and just as fast,
but I advise simplifying your solution instead of building a complicated design.
Visual Guide to NoSQL Systems
Good luck.

In MongoDB we can specify the writeConcern property so that a write goes to the journal/instances before the confirmation/acknowledgement is sent, and I think MongoDB even has the concept of transactions. I'm not sure why we need Postgres behind it.
