We have successfully deployed Postgres 9.3 with streaming replication (WAL replication). We currently have 2 slaves, the second slave being a cascaded slave fed from the first slave. Both slaves are hot standbys with active read-only connections in use.
Due to load, we'd like to create a 3rd slave, with slightly different hardware specifications, and a different application using it as a read-only database in more of a data warehouse use case. As it's for a different application, we'd like to optimize it specifically for that application, and improve performance by utilizing some additional indexes. For size and performance reasons, we'd rather not have those indexes on the master, or the other 2 slaves.
So my main question is, can we create different indexes on slaves for streaming replication, and if not, is there another data warehouse technique that I'm missing out on?
So my main question is, can we create different indexes on slaves for streaming replication
No, you can't. Streaming physical replication works at a lower level than that, copying disk blocks around. It doesn't really pay attention to "this is an index update," "this is an insert into a table," etc. It does not have the information it'd need to maintain standby-only indexes.
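You can see this limitation directly on a hot standby, which only accepts read-only transactions; a minimal illustration (the table and index names here are hypothetical):

    -- On the standby: the server is in recovery and read-only.
    SELECT pg_is_in_recovery();   -- returns true on a standby

    CREATE INDEX orders_customer_idx ON orders (customer_id);
    -- ERROR:  cannot execute CREATE INDEX in a read-only transaction

The standby just replays the master's block-level changes, so there is nowhere for a standby-only index to live.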
and if not, is there another data warehouse technique that I'm missing out on?
Logical replication solutions like:
Londiste
pglogical
Slony-I
can do what you want. They send row changes, so the secondary server can have additional indexes.
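For instance, with pglogical (which needs 9.4 or newer, so on 9.3 it would mean upgrading first), the rough shape of the setup is below; the connection strings, node names, and the table/index in the last statement are placeholders, not anything from your schema:

    -- On the provider (master): register the node and publish tables.
    SELECT pglogical.create_node(
        node_name := 'provider1',
        dsn := 'host=master dbname=mydb');
    SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);

    -- On the subscriber (the new DW slave): register and subscribe.
    SELECT pglogical.create_node(
        node_name := 'subscriber1',
        dsn := 'host=dwslave dbname=mydb');
    SELECT pglogical.create_subscription(
        subscription_name := 'dw_sub',
        provider_dsn := 'host=master dbname=mydb');

    -- Changes arrive as row operations, so the subscriber can carry
    -- extra indexes the master never sees.
    CREATE INDEX big_table_reporting_idx ON big_table (reporting_column);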
I have a distributed table, but it only has one replica, and a single replica doesn't give me HA. I want to add one more replica for the table. Can I, and how?
I have searched the online help docs, but didn't find any solution.
Replicas in Citus are not an HA solution. For HA you will need to set up standard Postgres tooling for every member in your cluster to stream WAL to another node. Citus specializes in distributed queries and separates that problem from HA by relying on proven technology available in the Postgres ecosystem.
If you want to scale out reads, adding a replica can help. However, adding replicas has a significantly high impact on write throughput. Before adding replicas, please thoroughly test that your database can handle your expected load. And again: if HA is your goal, don't add Citus replicas; instead, apply Postgres HA solutions to every worker and the coordinator.
For the reasons above, increasing the replica count of an already distributed table is not an operation Citus provides out of the box. The easiest approach is to create a new table and use an INSERT INTO ... SELECT to reinsert the data into a table with a shard count and replica count appropriate to your application's needs, as sketched below.
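A rough sketch of that approach (the table and distribution column are made up; citus.shard_count and citus.shard_replication_count are the standard Citus settings controlling the new table's layout):

    -- Choose shard count and replication factor for the new table.
    SET citus.shard_count = 32;
    SET citus.shard_replication_count = 2;

    -- Create an identically shaped table and distribute it.
    CREATE TABLE events_new (LIKE events INCLUDING ALL);
    SELECT create_distributed_table('events_new', 'tenant_id');

    -- Reinsert the data, then swap the tables once you are satisfied.
    INSERT INTO events_new SELECT * FROM events;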
My problem statement: I can't use Bucardo (no approval for creating triggers on prod tables), and pglogical only works on the master node (the publisher). I want to replicate from a slave (this is my end goal).
If you have v10 or better, you should definitely use logical replication. The burden on the primary server is minimal.
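On v10+ the built-in form is just a publication on the primary and a subscription on the target server, with no triggers on the production tables; a minimal sketch (the database, table, and connection details are placeholders):

    -- On the primary (requires wal_level = logical):
    CREATE PUBLICATION my_pub FOR TABLE accounts, orders;

    -- On the target server (the tables must already exist there):
    CREATE SUBSCRIPTION my_sub
        CONNECTION 'host=primary dbname=prod user=repl'
        PUBLICATION my_pub;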
If for whatever incomprehensible reason you absolutely have to replicate from a standby, you'll have to use streaming replication. Then you have to replicate the whole cluster, but there is no better solution.
One way out could be to put walbouncer in between, which can filter out WAL records for all but one database, so that you are effectively replicating just one database.
Disclaimer: walbouncer was developed by my company (it is open source though).
I am planning to migrate my production Oracle cluster to a PostgreSQL cluster. The current system supports 2000 TPS, and in order to sustain that I would be very thankful if someone could clarify the points below.
1) What is the best replication strategy (streaming or DRBD-based replication)?
2) In streaming replication, can the master keep processing traffic without the slave, and when the slave comes back up, does it catch up on what it missed during the downtime?
About TPS - it depends mainly on your HW and PostgreSQL configuration. I already wrote about it on Stack Overflow in this answer. You cannot expect magic on some notebook-like configuration. Here is my text "PostgreSQL and high data load application".
1) Streaming replication is the simplest and almost "painless" solution. So if you want to start quickly I highly recommend it.
2) Yes, but you have to archive the WAL. See below.
All this being said, here are some links I would recommend you read:
how to set streaming replication
example of WAL log archiving script
But of course streaming replication has some caveats which you should know about:
problem with increasing some parameters like max_connections
how to add new disk and new tablespace to master and replicas
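For point 2), the gist of enabling WAL archiving is a couple of settings; a minimal sketch assuming 9.4+ (where ALTER SYSTEM exists) and a placeholder archive directory:

    ALTER SYSTEM SET archive_mode = on;   -- takes effect only after a restart
    ALTER SYSTEM SET archive_command =
        'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f';
    SELECT pg_reload_conf();              -- reloads archive_command

With archiving in place, a slave that was down can replay the archived WAL it missed and then resume streaming.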
There is no “best solution” in this case. Which solution you pick depends on your requirements.
Do you need a guarantee of no data loss?
How big a performance hit can you tolerate?
Do you need failover or just a backup?
Do you need PITR (Point In Time Recovery)?
By default, a failed slave will simply be ignored by the master. Depending on your configuration, the slave might take a long time to catch up again after e.g. a reboot.
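Concretely, that behaviour hinges on synchronous_standby_names: left empty (the default, i.e. asynchronous replication), the master ignores a dead slave and keeps accepting writes; if a standby is listed there, commits block while it is down. A sketch (the standby name is a placeholder):

    -- Asynchronous (default): a failed standby is ignored by the master.
    ALTER SYSTEM SET synchronous_standby_names = '';

    -- Synchronous: commits wait for 'standby1'; if it dies, writes
    -- hang until it returns or you change this setting.
    ALTER SYSTEM SET synchronous_standby_names = 'standby1';
    SELECT pg_reload_conf();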
I'd recommend you read https://www.postgresql.org/docs/10/static/different-replication-solutions.html
I have decided to start developing a little web application in my spare time so I can learn about MongoDB. I was planning to get an Amazon AWS micro instance and start the development and the alpha stage there. However, I stumbled across a question here on Stack Overflow that concerned me:
But for durability, you need to use at least 2 mongodb server instances as master/slave. Otherwise you can lose the last minute of your data.
Is that true? Can't I just have my box with everything installed on it (Apache, PHP, MongoDB) and rely on the data being correctly stored? At least, there must be a config option in MongoDB to make it behave reliably even if installed on a single box - isn't there?
The information you have on master/slave setups is outdated. A single MongoDB server running with journaling enabled is a durable data store, so for use cases where you don't need replica sets, or while you're in the development stage, journaling will work well.
However, if you're in production, we recommend using replica sets. For the bare-minimum setup, you would ideally run three (or more) instances of mongod: a 'primary', which receives reads and writes; a 'secondary', to which the writes from the primary are replicated; and an arbiter, a single instance of mongod that allows a vote to take place should the primary become unavailable. This 'automatic failover' means that, should your primary be unable to receive writes from your application at a given time, the secondary will become the primary and take over receiving data from your app.
You can read more about journaling here and replication here, and you should definitely familiarize yourself with the documentation in general in order to get a better sense of what MongoDB is all about.
Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.
In some cases, you can use replication to increase read capacity. Clients have the ability to send read and write operations to different servers. You can also maintain copies in different data centers to increase the locality and availability of data for distributed applications.
Replication in MongoDB
A replica set is a group of mongod instances that host the same data set. One mongod, the primary, receives all write operations. All other instances, secondaries, apply operations from the primary so that they have the same data set.
The primary accepts all write operations from clients. A replica set can have only one primary. Because only one member can accept write operations, replica sets provide strict consistency. To support replication, the primary logs all changes to its data sets in its oplog. See primary for more information.
I have a master postgres database M with tables M.A1, etc.
I have a slave database S with tables S.A1, etc., populated and maintained by Skytools/Londiste. Everything works great.
I don't know exactly how it works, since I am not the person who set up my Skytools instance. I have just read some pieces of documentation and interacted with it slightly.
I would like to add some auxiliary read/write tables to S: S.B1. (I want to join against S.A1 without adding any extra load to M, which is why I want to install B1 on S.) Is it possible to maintain this setup?
If I create a new table S.B1 on a Skytools/Londiste slave, will that interfere with replication of table A1?
Edit to add a follow-up:
How safe would such a setup be, with respect to slave failures impacting the master?
I am not very concerned about replication lag or downtime on my analytics slave (but I would need a way to eventually recover without taking downtime on the master).
I am very concerned about a slave failure causing the master to grow its replication queue indefinitely and consuming HD/RAM/resources on the master. How would I mitigate that? Is there a way to set a tolerance so that the master just drops the slave connection if the slave falls too far behind?
Part 2
If I do get this set up working, I'll want to have a slave backup of S.B1 somewhere, in case S fails.
Is it possible to set up a secondary slave T, and configure Skytools/Londiste to replicate S.B1 to T.B1, while M.A1 is also replicating to S.A1?
What caveats or gotchas should I be concerned about?
Thank you so much for advice and pointers.
Firstly, I would really suggest that you spend the time to understand how Skytools' pgq and londiste work. It is not very complicated, but the documentation is rather scant.
For your first question - yes, you can have other tables on the slave which are not replicated from the master.
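Londiste only touches tables you explicitly subscribe, so something along these lines on S is safe (B1's columns, and the assumption that A1 has an id key, are made up for illustration):

    -- On S only: B1 is never registered with Londiste, so it is
    -- invisible to replication and fully read/write.
    CREATE TABLE b1 (
        a1_id  integer PRIMARY KEY,
        note   text
    );

    -- Joining against the replicated copy of A1 works as usual.
    SELECT a1.*, b1.note
    FROM a1
    JOIN b1 ON b1.a1_id = a1.id;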
Your second question is a bit more involved and I am not sure if your requirement is entirely clear.
Assuming the tables you want to replicate from the slave to a secondary slave are an entirely separate group from the tables you are replicating from the master to the initial slave, then you could install pgq on the initial slave and londiste on the secondary slave, create a new queue, and add to that queue the tables you wish to replicate to the secondary slave.
You can't use Skytools/Londiste for cascading replication (e.g. master -> slave1 -> slave2), so it is not obvious what benefit you would get from the partial replication of data from one slave to another.
It would be simpler to have all the tables on the master and then just one queue for replication to a slave, and then, for resilience, keep a warm standby of the master (see the explanation for 8.4), from which you could do a point-in-time recovery if necessary and then rebuild the slave from a consistent master. Skytools has packages to help you with setting up warm standby/PITR.
If you cannot have all the tables on the master, then you might do better to maintain a warm standby of the slave for PITR, but bear in mind that you would probably have to resubscribe the tables replicated from the master after doing such a restore. This might be complicated if the slave tables you are joining to the master tables have foreign key constraints.
If you are on Postgres 9 there is streaming replication, which may also serve, but I have not used it.
Just to expand on the topic, if anyone reaches this: you can have multiple queues as Gavin suggested above, but you can also have cascading replication as of Skytools version 3 (March 2012). And indeed you can replicate any subset of tables, and you can even have tables renamed on the destination if needed.