I am using the test_decoding plugin (I had issues with the pglogical plugin being unable to detect the replication slot) on RDS Postgres 11. Replication slot lag increases drastically (~33 GB) even when there is no user activity.
As soon as I send DML operations from the application, I can see the data replicated to the cutover target, the replication slot lag comes down to 224 bytes, and the restart_lsn/confirmed_flush_lsn values change.
Can you please advise why the slot piles up when there is no user activity?
Is there any way to read what data is in the slot (see the sketch after the config below)? Also, has anyone used this plugin in production?
I am using the following config:
max_wal_size=2GB
max_worker_processes=21
max_wal_senders=15
max_parallel_workers=8
autovacuum_max_workers=3
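For reading what is in the slot: a minimal sketch using the built-in logical decoding functions (the slot name 'my_slot' is a placeholder; peek returns decoded changes without consuming them):

-- how much WAL each slot is retaining
select slot_name, restart_lsn, confirmed_flush_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) as retained_bytes
from pg_replication_slots;

-- next ten decoded changes, as test_decoding renders them
select * from pg_logical_slot_peek_changes('my_slot', null, 10);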
I am currently working with a Managed Instance from TimescaleDB that has incoming data. Now I am setting up another instance, but this time it will be self-hosted and managed by me.
As such, I would like to set up some sort of replication so that the data coming into the managed instance is accessible in the self-hosted one (after a while; it does not have to be live). I've looked into setting up replication with WAL streaming, but I've run into an issue.
Most replication workflows require changes to the postgresql.conf and pg_hba.conf files, which I cannot access (managed instance). TimescaleDB support also says modifying these is not possible. The typical changes involved are sketched below.
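For reference, this is roughly what a streaming-replication setup normally asks for in those files (a sketch; the replication role and network are placeholders):

# postgresql.conf
wal_level = replica
max_wal_senders = 5

# pg_hba.conf
host replication replicator 203.0.113.0/24 md5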
Is there a way to achieve this, without access to those files?
I wanted to check if there is a way to stop and resume PostgreSQL replication using pglogical. If either the publisher or subscriber needs to restart and go offline for some time (or there are connectivity issues due to network problems), is there a way to stop the replication and resume it again? I know it is not a directly comparable example, but AWS DMS (which uses Postgres native logical replication) gives you an option to stop/resume the replication. I wanted to check if a similar option is available in pglogical.
Thanks
Sure there is. I presume you have the pglogical extension installed and standard logical replication in place:
select pglogical.alter_subscription_disable('subscription_name');
You can also use a WHERE clause against the pglogical catalog to target subscriptions, for example all currently enabled ones:
select pglogical.alter_subscription_disable(sub_name) from pglogical.subscription where sub_enabled;
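To resume, use the matching enable function. Note that after a crash, restart, or network blip the subscription normally picks up from its replication slot on its own, so an explicit disable/enable is mainly useful for a planned pause:

select pglogical.alter_subscription_enable('subscription_name');

-- verify it is replicating again
select subscription_name, status from pglogical.show_subscription_status();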
I have set up a Postgres 11 streaming replication cluster. The standby is a "hot standby". Is it possible to attach a second standby as a warm standby?
I assume that you are talking about WAL file shipping when you are speaking of a “warm standby”.
Sure, there is nothing that keeps you from adding a second standby that ships WAL files rather than attaching directly to the primary, but I don't see the reason for that.
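On PostgreSQL 11 that would be a recovery.conf on the second standby with a restore_command and no primary_conninfo (the archive path is a placeholder), plus hot_standby = off in postgresql.conf if it should not accept read-only queries:

standby_mode = 'on'
restore_command = 'cp /mnt/wal_archive/%f "%p"'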
According to this decent documentation of the Postgres 11 streaming replication architecture, you can configure a 2nd standby so that its sync_state is potential. This means that if/when the 1st synchronous standby fails, the detected failure (through ACK communication) will cause the 2nd standby to move from potential to sync, becoming the active synchronous replica. See Section 11.3 - Managing Multiple Stand-by Servers in that link for more details.
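For reference, a sketch of the primary-side configuration that produces this behaviour (the names standby1 and standby2 are placeholders matching each standby's application_name):

# postgresql.conf on the primary: standby1 is sync, standby2 reports as potential
synchronous_standby_names = 'FIRST 1 (standby1, standby2)'

-- verify from the primary
select application_name, sync_state from pg_stat_replication;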
I'm using PostgreSQL 9.6.
Is it possible to have replication and incremental backup in the same setup?
I would like to have a high-availability setup. On the main site I will have two servers with replication between them, and pgpool will handle the failover in case the primary server goes down.
I would also like to have another remote site for geographical redundancy. This site will be active only if the main site is no longer functioning. The remote site does not need to be updated in real time, so if it saves resources I thought about using incremental backup and restore from the main site to the remote site. In other words, the main site primary server will replicate its data to the main site secondary server; in addition, it will generate incremental backups, and those backups will be restored on the remote site.
From your answer I understood that it is possible to have both replication and incremental backup. However, will this solution be better (resource consumption, reliability, etc.) than simply replicating to both the main secondary server and the remote site server?
Yes, you can have PITR and streaming replication in use at the same time. Streaming replication can fall back to restoring from the WAL archive if it loses direct connectivity to the master too.
Lots more detail in the manual. It's hard to be more specific with a rather open and vague question - what, exactly, do you mean by "incremental backup"? etc.
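For illustration, a minimal 9.6 sketch of that combination (paths, host, and user are placeholders): the primary archives WAL, and the standby streams but falls back to the archive via restore_command if the connection drops:

# postgresql.conf on the primary
wal_level = replica
archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'

# recovery.conf on the standby
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replicator'
restore_command = 'cp /mnt/wal_archive/%f "%p"'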
I have been observing that my PostgreSQL read replica shows periodic replication lag. The lag builds up to 30-40 minutes and then automatically drops back to 0. There is a correlation with CPU utilization, but it's nowhere close to the CPU limit.
Read traffic comes from a reporting tool called DOMO. DOMO periodically copies large chunks of data and full tables into its warehouse.
Here are the AWS CloudWatch graphs. The red line shows replication lag in seconds; the blue line shows the CPU load.
Lag vs CPU
Lag vs Network Out
Lag vs Read IOPS
Lag vs Write IOPS
Cloud: Amazon RDS
Instance Size: db.m3.2xlarge
PostgreSQL version: 9.3
Postgres Settings:
Shared Buffers (Set by RDS) = 7.3 GB (956978 * 8KB)
Updates
Tried setting Shared Buffers to 1GB (didn't help)
Update: June 5, 2017
I created a brand new replica for my database and pointed the reporting software (DOMO) at it. Things on the new instance look stable for now. The old replica, which now has no read traffic, is stable as well. I am beginning to suspect some type of AWS config issue or something to do with remaining artifacts in the database (vacuum?).
The RDS read replica lag metric isn't updated when there's nothing to replicate. If the master database has no changes to replicate, then the replica would only be updated at time-forced checkpoints - the periodic sync of data from the write-ahead log to the tables.
This would cause the graph to look like the above. To see real graph data you'd have to generate some traffic on the master, for example by updating a dedicated heartbeat row or sequence every minute or even every second, depending on how much resolution you need; a sketch follows.
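A minimal sketch of such a heartbeat (the table name heartbeat is a placeholder; schedule the update from cron or similar):

-- one-time setup on the master
create table heartbeat (id int primary key, ts timestamptz not null);
insert into heartbeat values (1, now());

-- run every minute (or every second, for finer resolution)
update heartbeat set ts = now() where id = 1;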
Also, the WAL-generation rate on the master and network utilization on the replica would be interesting graphs - the alternative explanation is that there is too much traffic (IO or network) for the replica to handle, and it can only catch up when the traffic stops.
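You can also measure the replica's real lag directly (these are the pre-10, xlog-named functions, so they are valid on 9.3):

-- on the replica: time since the last replayed transaction
select now() - pg_last_xact_replay_timestamp() as replay_delay;

-- compare WAL positions: run the first on the master, the second on the replica
select pg_current_xlog_location();
select pg_last_xlog_replay_location();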