I would like to have a Postgres database that is kept in sync with my production database, like a read replica, but that I can also write to. AWS allows read replicas to be made writable for MySQL and MariaDB, but not for Postgres. Is there any other way to achieve this?
Well, by definition, read replicas are not writable, so I'm afraid you won't have much luck with that approach.
Amazon themselves state that read replicas are for read only traffic:
You can create one or more replicas of a given source DB Instance and
serve high-volume application read traffic from multiple copies of
your data, thereby increasing aggregate read throughput.
Now, as you say, for MySQL read replicas can be promoted to masters (and therefore become writable), but pay special attention to the "when needed" below:
Read replicas can also be promoted when needed to become standalone DB
instances.
However, RDS itself does not support multi-master deployments for MySQL.
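For reference, promoting a MySQL read replica is a single API call, e.g. with boto3 (the instance identifier below is made up):

```python
# Promote an RDS MySQL read replica to a standalone, writable instance.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.promote_read_replica(
    DBInstanceIdentifier="mydb-read-replica",  # hypothetical replica name
    BackupRetentionPeriod=7,                   # enable backups on the promoted instance
)
```

After promotion the instance no longer replicates from the source, which is why this only helps "when needed" rather than giving you a continuously writable replica.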
For PostgreSQL things are even "worse". AWS RDS for Postgres does not (at the time of writing this) support automatic promotion of read-replicas, leaving you with Multi-AZ as your only option.
Outside RDS, multi-master deployments of PostgreSQL (which sounds like what you're looking for) require an even more elaborate setup. You can find more information in the clustering section of their wiki.
As a general note, horizontally scaling relational / SQL databases is probably not something you'll have a lot of fun with and you're bound to run into problems along the way.
That's because they were simply not designed for horizontal scaling the same way that newer "NoSQL" databases are (take a look at MongoDB or Cassandra, etc.). You are far better off scaling them vertically, for as far as that will take you (and it will take you quite some way).
The only relational database that I know of that is being built to scale out is CockroachDB. It's a very promising solution, but it's still in beta -- there's no 1.0 release of it yet.
As per the standard Postgres documentation:
As with the plain file-system-backup technique, this method can only support restoration of an entire database cluster, not a subset.
From this, I understood that it is not possible to set up PITR for individual databases in a cluster (i.e. a database instance holding multiple databases).
If my understanding is incorrect, the next part of the question is probably not relevant; but if it is correct, here it is:
I still do not see the theoretical problem in setting this up, as each database is generating its own WAL archive.
The problem here is: I need to set up multiple Postgres clusters, and I only have 2 RHEL 7.6 machines to handle this. I am trying to reduce the number of clusters on these 2 machines to just 2, and I plan to create multiple databases rather than multiple instances to handle the customer applications. But that means I have to sacrifice PITR, as PITR can only be performed at the instance/cluster level and not at the database level (as per the official documentation).
Could someone please help clarify my misunderstanding?
You are correct, you can only do PITR on a PostgreSQL database cluster, not on an individual database.
There is only one WAL stream for the complete database cluster; WAL is not split up per database.
Don't hesitate to run several PostgreSQL clusters on a single machine if that is advantageous for you.
There is little overhead in running a second database cluster. The biggest resource that is hogged by a cluster is shared buffers, but you want that to be only a fraction of the available RAM anyway. Most of the memory should be left to the filesystem cache that is shared by all PostgreSQL clusters.
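For what it's worth, a second cluster on the same machine really is just a second data directory listening on a different port. A rough sketch of what that looks like, assuming the PostgreSQL binaries are on the PATH (the data directory, log file, and port below are made up):

```python
# Sketch: initialise and start a second PostgreSQL cluster on the same host.
# Paths and port are hypothetical; tune shared_buffers etc. per cluster.
import subprocess

DATADIR = "/var/lib/pgsql/cluster2"   # hypothetical data directory

# Create the second cluster's data directory.
subprocess.run(["initdb", "-D", DATADIR], check=True)

# Start it on a non-default port so it does not clash with the first cluster.
subprocess.run(
    ["pg_ctl", "-D", DATADIR, "-o", "-p 5433",
     "-l", DATADIR + "/logfile", "start"],
    check=True,
)
```

Each cluster then has its own WAL stream, so each can have its own PITR setup.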
Currently, I have an application consisting of a backend, frontend, and database. The Postgres database has a table with around 60 million rows.
This table has a foreign key to another table: categories. So, if I want to count—I know it's one of the slowest operations in a DB—every row belonging to a specific category, on my current setup this results in a 5-minute query. Currently, the DB, backend, and frontend are just running on a VM.
I've now containerized the backend and the frontend and I want to spin them up in Google Kubernetes Engine.
So my question: will the performance of my queries go up if I also use a containerized DB and let Kubernetes do some load-balancing work, or should I use Google's Cloud SQL? Does anyone have experience with this?
will the performance of my queries go up if you also use a container DB
Raw performance will only go up if the Kubernetes nodes have more capacity than the node you are currently on. If you use the same machine as a Kubernetes node, it will not go up. In that case you won't get any performance benefit from containers; at most, updating your DB software might be a bit easier if you run it in Kubernetes. There are many other factors in play here, including what disks you use for your storage (SSD, magnetic, clustered filesystem?).
If your goal is to maximize resource usage in your cluster by making use of spare capacity when not many queries are being sent to your database, then Kubernetes/containers might be a good choice (but that's not what the original question is about).
should I use Google's Cloud SQL
The only reason I would use Cloud SQL is if you want to offload managing your SQL DB. Other than that, you'll get performance numbers similar to running on the same-size instance on GCE.
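If you want to verify that for your workload, the simplest check is to time the same slow count against each setup. A minimal sketch with psycopg2 (the table, column, and connection strings are made up):

```python
# Time the same category count against two PostgreSQL endpoints.
import time
import psycopg2

QUERY = "SELECT count(*) FROM big_table WHERE category_id = %s"  # hypothetical schema

def time_query(dsn, category_id=42):
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        start = time.perf_counter()
        cur.execute(QUERY, (category_id,))
        cur.fetchone()
        return time.perf_counter() - start

print("GKE pod:   %.1f s" % time_query("host=pg.gke.internal dbname=app user=app"))
print("Cloud SQL: %.1f s" % time_query("host=10.0.0.5 dbname=app user=app"))
```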
We use PostgreSQL 9.3 in our application. We want to set up PostgreSQL active-active clustering with DRBD. I googled it and found a lot of resources about active-passive.
Does PostgreSQL support Active-Active Clustering with DRBD?
No, PostgreSQL does not support active/active clustering with DRBD.
PostgreSQL does not support any form of shared-storage clustering in any way - active/active, active/passive, or otherwise.
It's rather implausible to support shared storage clustering with the architecture in PostgreSQL. Lots of things would need to change. In particular, Pg couldn't lazily write buffers to disk anymore, which would be brutal for performance.
You'll need to use replication instead; that way you can have read replicas (with a few limitations).
There's no support for multi-master, nor is there any support for auto-relaying write queries to the master from a replica. Some people use PgPool-II for routing queries, though it also has some significant limitations.
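In practice that usually means routing queries in the application (or in something like PgPool-II). A minimal, hand-rolled sketch with psycopg2, using hypothetical hostnames:

```python
# Naive application-level routing: writes go to the primary,
# everything else goes to a hot-standby replica.
import psycopg2

primary = psycopg2.connect(host="pg-primary.example.com", dbname="app", user="app")
replica = psycopg2.connect(host="pg-replica.example.com", dbname="app", user="app")

def run(sql, params=None):
    is_write = sql.lstrip().split()[0].upper() in ("INSERT", "UPDATE", "DELETE")
    conn = primary if is_write else replica
    with conn.cursor() as cur:
        cur.execute(sql, params)
        if is_write:
            conn.commit()
            return None
        return cur.fetchall()
```

A real router also has to worry about statements that look like reads but actually write (functions with side effects, writable CTEs), which is part of where PgPool-II's limitations come from.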
(I'm involved in work on bi-directional replication, which seeks to offer another alternative, but it's very much alpha. See BDR on the PostgreSQL wiki)
I am a noob in the NoSQL world, but after going through the basics of how Neo4j works, I didn't understand how replication would be fast compared to column or document databases or a plain key-value DB.
It has nodes and edges, which are nothing but relationships between those nodes, something similar to joins in an RDBMS.
So how does replication work here compared to an RDBMS?
Each NoSQL database behaves differently when it comes to replication. NoSQL is quite a broad term, so you should not expect good replication performance from all of them. In fact, Neo4j Enterprise has some support for replication, but the design of Neo4j does not naturally lend itself to scaling. It is certainly not one of its core objectives, unlike others such as Cassandra, for example.
What do you mean by replication? Neo4j Enterprise comes with a high-availability cluster that replicates your data across a number of machines.
If it is just about replicating the data, you can also shut down your database and copy the database files (or, in Enterprise, execute an online backup).
I have an application that cannot afford to lose data, so PostgreSQL is my choice of database (ACID).
However, the speed and query advantages of MongoDB are very attractive. Based on what I've read so far, MongoDB can report a successful write that may not have gone to disk, so I can't make it my mission-critical DB (I'll also need transactions).
I've seen references to people using MySQL and MongoDB together, one for the transactions and the other for queries. Please note that I'm not talking about keeping some data in one DB and the rest in another. I want to use PostgreSQL as a gateway to data entry, and MongoDB for reads.
Are there any resources that offer an architecture/guide for Postgresql + MongoDB usage in this way? I can remember seeing this topic in Postgresql conference agenda, but I could not find the link.
I don't think you'll get much speed from using MongoDB just as a cache. Its strengths are replication and horizontal scalability. On one computer you'd make Mongo and Postgres compete for memory, IO bandwidth, and processor time.
As you cannot afford to lose transactions, you'll be better off with Postgres only. Its efficient caching, sophisticated query planner, prepared queries, and wide indexing support mean that read-only queries will be very fast - really comparable to MongoDB on a single computer.
Postgres can even scale horizontally now, using asynchronous or, from version 9.1, synchronous replication.
One way to achieve this would be to set up a master-slave replication with the PostgreSQL database as master, and the MongoDB database as slave. You would then do all reads from MongoDB, and all writes to PostgreSQL.
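In application code the split might look roughly like this (a sketch only; the hostnames, orders table, and collection name are made up, and it assumes some replication process is copying the Postgres data into MongoDB):

```python
# Writes go to PostgreSQL (transactional), reads come from the MongoDB copy.
import psycopg2
from pymongo import MongoClient

pg = psycopg2.connect(host="pg.example.com", dbname="app", user="app")
mongo = MongoClient("mongodb://mongo.example.com:27017")["app"]

def create_order(customer_id, total):
    # All writes go through PostgreSQL, inside a transaction.
    with pg, pg.cursor() as cur:
        cur.execute(
            "INSERT INTO orders (customer_id, total) VALUES (%s, %s) RETURNING id",
            (customer_id, total),
        )
        return cur.fetchone()[0]

def recent_orders(customer_id):
    # Reads hit the MongoDB copy that the replication job keeps up to date.
    return list(mongo.orders.find({"customer_id": customer_id}).limit(10))
```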
This post discusses such a setup using a tool called Bucardo:
http://blog.endpoint.com/2011/06/mongodb-replication-from-postgres-using.html
You may also be able to do it with Tungsten Replicator, although it seems designed to be used with MySQL:
http://code.google.com/p/tungsten-replicator/wiki/TRCHeterogeneousReplication
I can remember seeing this topic in Postgresql conference agenda, but I could not find the link.
Maybe you are talking about this: https://www.postgresqlconference.org/content/hybrid-applications-using-mongodb-and-postgres
Depending on how important transactions are to you, one option is to use the MongoDB driver's safe mode and drop PostgreSQL.
http://www.mongodb.org/display/DOCS/getLastError+Command
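In current drivers, "safe mode" has been superseded by write concerns. A sketch of the equivalent in PyMongo (connection string and collection name are made up):

```python
# Ask MongoDB to acknowledge the write only once it has been journaled
# and replicated to a majority of the replica set.
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017")
orders = client.app.get_collection(
    "orders", write_concern=WriteConcern(w="majority", j=True)
)

# insert_one() blocks until the write concern is satisfied (or raises).
orders.insert_one({"customer_id": 1, "total": 9.99})
```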
How can you expect transactional consistency from Postgres but trust MongoDB for reads? How would you support rollbacks in this scenario? How do you detect when they've gotten out of sync?
I think you're better off going with memcache and implementing a higher level object cache. Alternatively, you could consider a replication slave for reads. If you have performance needs beyond what a dedicated read slave can provide, consider denormalizing your tables on your slave system.
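A higher-level object cache can be as simple as a read-through lookup in front of Postgres. A rough sketch using pymemcache (the table, key scheme, and TTL are illustrative only):

```python
# Read-through cache: serve objects from memcached, fall back to Postgres on a miss.
import json
import psycopg2
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))
pg = psycopg2.connect(dbname="app", user="app")

def get_product(product_id, ttl=300):
    key = "product:%d" % product_id
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                 # cache hit
    with pg.cursor() as cur:                      # cache miss: query Postgres
        cur.execute("SELECT id, name, price FROM products WHERE id = %s",
                    (product_id,))
        row = cur.fetchone()
    product = {"id": row[0], "name": row[1], "price": float(row[2])}
    cache.set(key, json.dumps(product), expire=ttl)
    return product
```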
Make sure that any of this is actually needed. For thin tables with PK lookups, most modern database engines like Postgres or InnoDB are generally going to keep up with NoSQL solutions. Don't fall into the ROFLSCALE trap:
http://www.youtube.com/watch?v=b2F-DItXtZs
I think you can run a Mongo replica set - say, 3 slaves and 1 master. Then, in your app, you would run all write transactions on PostgreSQL and then on the Mongo replica set. After that, you can send read operations to the Mongo replica set.
But synchronizing the two will be a problem; you will have to work on that.
You may find some replacements for Mongo in here or here that are safer and fast as well.
But I would advise simplifying your solution instead of making a complicated design.
Visual Guide to NoSQL Systems
In MongoDB we can specify the writeConcern property to require that a write has gone to the journal / to a number of instances before the acknowledgement is sent, and I think MongoDB even has the concept of transactions now. Not sure why we need Postgres behind it.
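For completeness, a sketch of what a multi-document transaction looks like in PyMongo (MongoDB 4.0+ and a replica set are required; the connection string and collection are made up):

```python
# Both updates inside the callback commit or abort together.
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client.app

def transfer(session):
    db.accounts.update_one({"_id": "alice"}, {"$inc": {"balance": -100}}, session=session)
    db.accounts.update_one({"_id": "bob"}, {"$inc": {"balance": 100}}, session=session)

with client.start_session() as session:
    session.with_transaction(transfer, write_concern=WriteConcern(w="majority"))
```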