pg_wal folder on standby node not removing files (postgresql-11) - postgresql

I have master-slave (primary-standby) streaming replication set up on 2 physical nodes. Although the replication is working correctly and walsender and walreceiver both work fine, the files in the pg_wal folder on the slave node are not getting removed. This is a problem I have been facing every time I try to bring the slave node back after a crash. Here are the details of the problem:
postgresql.conf on master and slave/standby node
# Connection settings
# -------------------
listen_addresses = '*'
port = 5432
max_connections = 400
tcp_keepalives_idle = 0
tcp_keepalives_interval = 0
tcp_keepalives_count = 0
# Memory-related settings
# -----------------------
shared_buffers = 32GB # Physical memory 1/4
##DEBUG: mmap(1652555776) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
#huge_pages = try # on, off, or try
#temp_buffers = 16MB # depends on DB checklist
work_mem = 8MB # Need tuning
effective_cache_size = 64GB # Physical memory 1/2
maintenance_work_mem = 512MB
wal_buffers = 64MB
# WAL/Replication/HA settings
# --------------------
wal_level = logical
synchronous_commit = remote_write
archive_mode = on
archive_command = 'rsync -a %p /TPINFO01/wal_archive/%f'
#archive_command = ':'
max_wal_senders=5
hot_standby = on
restart_after_crash = off
wal_sender_timeout = 5000
wal_receiver_status_interval = 2
max_standby_streaming_delay = -1
max_standby_archive_delay = -1
hot_standby_feedback = on
random_page_cost = 1.5
max_wal_size = 5GB
min_wal_size = 200MB
checkpoint_completion_target = 0.9
checkpoint_timeout = 30min
# Logging settings
# ----------------
log_destination = 'csvlog,syslog'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql_%Y%m%d.log'
log_truncate_on_rotation = off
log_rotation_age = 1h
log_rotation_size = 0
log_timezone = 'Japan'
log_line_prefix = '%t [%p]: [%l-1] %h:%u#%d:[PG]:CODE:%e '
log_statement = all
log_min_messages = info # DEBUG5
log_min_error_statement = info # DEBUG5
log_error_verbosity = default
log_checkpoints = on
log_lock_waits = on
log_temp_files = 0
log_connections = on
log_disconnections = on
log_duration = off
log_min_duration_statement = 1000
log_autovacuum_min_duration = 3000ms
track_functions = pl
track_activity_query_size = 8192
# Locale/display settings
# -----------------------
lc_messages = 'C'
lc_monetary = 'en_US.UTF-8' # ja_JP.eucJP
lc_numeric = 'en_US.UTF-8' # ja_JP.eucJP
lc_time = 'en_US.UTF-8' # ja_JP.eucJP
timezone = 'Asia/Tokyo'
bytea_output = 'escape'
# Auto vacuum settings
# -----------------------
autovacuum = on
autovacuum_max_workers = 3
autovacuum_vacuum_cost_limit = 200
auto_explain.log_min_duration = 10000
auto_explain.log_analyze = on
include '/var/lib/pgsql/tmp/rep_mode.conf' # added by pgsql RA
recovery.conf
primary_conninfo = 'host=xxx.xx.xx.xx port=5432 user=replica application_name=xxxxx keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
restore_command = 'rsync -a /TPINFO01/wal_archive/%f %p'
recovery_target_timeline = 'latest'
standby_mode = 'on'
Result of pg_stat_replication on master/primary
select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 8868
usesysid | 16420
usename | xxxxxxx
application_name | sub_xxxxxxx
client_addr | xx.xx.xxx.xxx
client_hostname |
client_port | 21110
backend_start | 2021-06-10 10:55:37.61795+09
backend_xmin |
state | streaming
sent_lsn | 97AC/589D93B8
write_lsn | 97AC/589D93B8
flush_lsn | 97AC/589D93B8
replay_lsn | 97AC/589D93B8
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
-[ RECORD 2 ]----+------------------------------
pid | 221533
usesysid | 3541624258
usename | replica
application_name | xxxxx
client_addr | xxx.xx.xx.xx
client_hostname |
client_port | 55338
backend_start | 2021-06-12 21:26:40.192443+09
backend_xmin | 72866358
state | streaming
sent_lsn | 97AC/589D93B8
write_lsn | 97AC/589D93B8
flush_lsn | 97AC/589D93B8
replay_lsn | 97AC/589D93B8
write_lag |
flush_lag |
replay_lag |
sync_priority | 1
sync_state | sync
Steps I had followed to bring the standby node back from a crash
On master started select pg_start_backup('backup');
rsync data folder and wal_archive folder from master/primary to slave/standby
On master `select pg_stop_backup();
Restart postgres on slave/standby node.
This resulted in the slave/standby node being in sync with master and has been working fine since then.
On the primary/master node the pg_wal folder gets its files removed after nearly 2 hours. But the files on the slave/standby node are not removed. Almost all the files are in the archive_status folder in the pg_wal folder with the <filename>.done as well on the standby node.
I guess the problem can go away if I perform a switchover, but I still want to understand the reason why it is happening.
Please see, I am also trying to find the answers to some of the following questions as well:
Which process writes the files to pg_wal on the slave/standby node? I am following this link
https://severalnines.com/database-blog/postgresql-streaming-replication-deep-dive
Which parameter removes the files from the pg_wal folder on the standby node?
Do they need to go to wal_archive folder on the disk just like they go to wal_archive folder on the master node?

You didn't describe omitting pg_replslot during your rsync, as the docs recommend. If you didn't omit it, then now your replica has a replication slot which is a clone of the one on the master. But if nothing ever connects to that slot on the replica and advances the cutoff, then the WAL never gets released to recycling. To fix you just need to shutdown the replica, remove that directory, restart it, (and wait for the next restart point to finish).
Do they need to go to wal_archive folder on the disk just like they go to wal_archive folder on the master node?
No, that is optional not necessary. It is set by archive_mode = always if you want it to happen.

Related

Using sql query in shell script

I have a shell script that uses output from a sql query and based on the value of one column sends out an alert. However i don't think it's capturing the value. although the value is not greater than 0 yet it still sends out an email.
Any idea where i am going wrong? Thanks.
............................................................................
#!/bin/sh
psql -d postgres -U postgres -c "select pid,application_name,pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) sending_lag,pg_wal_lsn_diff(sent_lsn,flush_lsn) receiving_lag,pg_wal_lsn_diff(flush_lsn, replay_lsn) replaying_lag,pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) total_lag from pg_stat_replication;"| while read total_lag;
do
echo $total_lag
lag1=$(echo $total_lag)
done
if [[ $lag1 -ge 0 ]]
then echo "Current replication lag is $lag1" |mail -s "WARNING!" abcd#mail.com
else
echo "No issue"
fi
............................................................................
this is the output of above query
pid | application_name | sending_lag | receiving_lag | replaying_lag | total_lag
-------+------------------+-------------+---------------+---------------+-----------
27823 | db123 | 0 | 0 | 0 | 0
27824 | db023 | 0 | 0 | 0 | 0

ERROR: user "xxx" cannot be dropped because permission dependency is found

I'm trying to drop a user from a Redshift cluster but receive the following error:
drop user "xxx";
ERROR: user "xxx" cannot be dropped because permission dependency is found
I've installed the admin views and revoked all privileges from all tables and schemas. I cannot find any reference to this specific error. It is also not included in this instructional: https://aws.amazon.com/premiumsupport/knowledge-center/redshift-user-cannot-be-dropped/
select ddl from admin.v_generate_user_grant_revoke_ddl where ddltype='revoke' and grantee='xxx' order by objseq, grantseq desc;
ddl
-----
(0 rows)
select ddl, grantor, grantee from admin.v_generate_user_grant_revoke_ddl where grantee='xxx' and ddltype='grant' and objtype <>'default acl' order by objseq,grantseq;
ddl | grantor | grantee
-----+---------+---------
(0 rows)
select * from pg_user where usename = 'xxx';
usename | usesysid | usecreatedb | usesuper | usecatupd | passwd | valuntil | useconfig
------------+----------+-------------+----------+-----------+----------+----------+-----------
xxx | 110 | f | f | f | xxx | |
(1 row)
select * from pg_default_acl where defacluser=110;
defacluser | defaclnamespace | defaclobjtype | defaclacl
------------+-----------------+---------------+-----------
(0 rows)
The user in not in any groups either. Any guidances is appreciated.
The user had not run queries for at least the last two weeks, he was not very active. His only access was through a Redshift/Excel ODBC setup. I did not check initially, but a day later there were no active sessions for him. I re-ran the drop user command and got the expected result. There must have been some lingering 'something'. For reference I ran this cmd to see who had active session: select * from stv_sessions;. My problem has been resolved by trying again the next day. https://docs.aws.amazon.com/redshift/latest/dg/r_STV_SESSIONS.html

Bad or inaccessible location specified in external data source

I'm trying to save a file from Azure File Storage into Azure SQL Database table varbinary(max) column (store whole content as advised in this answer). I've tried a few times to adjust my SQL query but without success. Here's the code which results in error 'Bad or inaccessible location specified in external data source "my_Azure_Files".' when it invokes OPENROWSET:
OPEN MASTER KEY DECRYPTION BY PASSWORD = 'mypassword123'
GO
CREATE DATABASE SCOPED CREDENTIAL [https://mystorageaccount.file.core.windows.net/]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sas_token_generated_on_azure_portal';
CREATE EXTERNAL DATA SOURCE my_Azure_Files
WITH (
LOCATION = 'https://mystorageaccount.file.core.windows.net/test',
CREDENTIAL = [https://mystorageaccount.file.core.windows.net/],
TYPE = BLOB_STORAGE
);
Insert into dbo.myTable(targetColumn)
Select BulkColumn FROM OPENROWSET(
BULK 'test.csv',
DATA_SOURCE = 'my_Azure_Files',
SINGLE_BLOB) AS testFile;
CLOSE MASTER KEY;
GO
I'm able to download the test.csv file by a web-browser using the same SAS token and url path. I'm also able to verify that the credential and the external source are successfully created in the database:
+-------------------------------------------------+------------------+-----------------------------------------------------+-------------------------+------------------+---------------------------+---------------+---------------+----------------+--------------------+----------+
| data_source_id | name | location | type_desc | type | resource_manager_location | credential_id | database_name | shard_map_name | connection_options | pushdown |
+-------------------------------------------------+------------------+-----------------------------------------------------+-------------------------+------------------+---------------------------+---------------+---------------+----------------+--------------------+----------+
| 65540 | my_Azure_Files | https://mystorageaccount.file.core.windows.net/test | BLOB_STORAGE | 05/01/1900 00:00 | NULL | 65539 | NULL | NULL | NULL | ON |
| name | principal_id | credential_id | credential_identity | create_date | modify_date | target_type | target_id | | | |
+-------------------------------------------------+------------------+-----------------------------------------------------+-------------------------+------------------+---------------------------+---------------+---------------+----------------+--------------------+----------+
| https://mystorageaccount.file.core.windows.net/ | 1 | 65539 | SHARED ACCESS SIGNATURE | 15/07/2020 13:14 | 15/07/2020 13:14 | NULL | NULL | | | |
When creating SAS on Azure portal I checked all allowed resource types and all allowed permissions, except 'Delete'. I also removed the leading '?' from SAS to use in the SECRET field.
I've tried variations of TYPE = BLOB_STORAGE and TYPE = HADOOP as well as SINGLE_BLOB, SINGLE_CLOB and SINGLE_NCLOB parameters.
Please help me solve my problem.
By following below steps, able to successfully insert into the target table:
While generating the SAS, please select Allowed Resource Type as ‘Container’ and ‘Object’:
Copy the SAS and use below command:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'password#123'
Use SAS token generated without ‘?’ at the start and create Scoped Credentials:
CREATE DATABASE SCOPED CREDENTIAL MyAzureBlobStorageCredential WITH IDENTITY =
'SHARED ACCESS SIGNATURE', SECRET = 'sv=2019-10-
10XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';
Create External Data Source referencing your blob path:
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH ( TYPE = BLOB_STORAGE,
LOCATION = 'https://mystorageaccount.file.core.windows.net'
, CREDENTIAL= MyAzureBlobStorageCredential
);
Run the insert using OPENROWSET:
Insert into dbo.test(name1)
Select BulkColumn FROM OPENROWSET(
BULK 'test/test.csv',
DATA_SOURCE = 'MyAzureBlobStorage',
SINGLE_BLOB) AS testFile;
Can also use Bulk insert:
BULK INSERT dbo.test
FROM 'test/test.csv'
WITH (DATA_SOURCE = 'MyAzureBlobStorage',
FORMAT = 'CSV');
Assuming table dbo.test is already created

Converting bytes to megabytes on the fly

In the PS command below I'm getting the name, creationTimeStamp, Disk_Size and storageBytes of yesterday's snapshots in my gcp project and outputting the data to a csv file (which is later converted to HTML and emailed):
$csv = gcloud --project $gcpProject compute snapshots list --format="csv(name,creationTimestamp,diskSizeGb,storageBytes)" --filter="creationTimestamp.date('%Y-%m-%d')=$yesterday" | Out-File C:\data.csv
The result looks something like this (the number of snapshots displayed varies):
+---------------------------+-------------------------------+--------------+---------------+
| NAME | CREATION_TIMESTAMP | DISK_SIZE_GB | STORAGE_BYTES |
+---------------------------+-------------------------------+--------------+---------------+
| snapshot1-us-central1 | 2019-10-24T19:24:09.061-07:00 | 50 | 1250586048 |
| snapshot2-data-us-east1 | 2019-10-24T19:01:49.791-07:00 | 150 | 425018600 |
+---------------------------+-------------------------------+--------------+---------------+
This is good except that the STORAGE_BYTES data is all in bytes thus making it hard to read. How can I record this data in MB instead in the csv file (or just replacing this data in the csv file that's in bytes with MB)
$data=Get-Content C:\data.csv
$NewData=#($data[0..2])#Adding Header in a new Variable
$Data[3..$($data.Count - 2)]|Foreach{
$startRange=$_.LastIndexOf("| ")+1
$length=$_.LastIndexOf(" |") - $_.LastIndexOf("| ")
#Converting Bytes into MB and replacing in the line
$NewData+=$_.Replace(($_.Substring($startRange,$length)).Trim(),"$([math]::Round(($_.Substring($startRange,$length)/1MB),2)) ")
}
#Adding last line
$NewData+=$data[$data.Count - 1]
#Modifying exsiting file
$NewData |Set-Content C:\data.csv -Force
I think, it should be simple using GCP PowerShell cmdlets.
Get-GceSnapshot -Project $gcpProject
I never used it, but you can use calculated properties to convert the size in oneline, an example below
Get-Process | Select-Object -First 3 -Property Name,#{E={$_.STORAGE_BYTES/1024};L='Size'}
PS: assuming the size property name is STORAGE_BYTES
https://googlecloudplatform.github.io/google-cloud-powershell/#/google-compute-engine/GceSnapshot/Get-GceSnapshot

PostgreSQL Service unexpectedly closing

using PostgreSQL 9.2 on Windows Xp(RAM 4GB) sometimes the Database service is closing unexpectedly.
am new in PostgreSQL Database, So i cannot figure out the reason behind it, can any one help me ?!
log 1
log 2
Current Backend Access config(pg_hba.conf)
host all all 0.0.0.0/0 trust
Postgresql.conf Settings
effective_cache_size = 131072
lc_messages = English_United States.1252
lc_monetary = English_United States.1252
lc_numeric = English_United States.1252
lc_time = English_United States.1252
listen_addresses = *
log_destination = stderr
log_line_prefix = %t
log_timezone = Asia/Calcutta
logging_collector = on
max_connections = 10000
max_files_per_process = 1000
port = 5432
restart_after_crash = on
shared_buffers = 131072
temp_buffers = 131072
TimeZone = Asia/Calcutta
work_mem = 1048576