Postgres synchronous_standby_names var not accepting '-' in the hostname

I am trying to set up a Postgres cluster with 3 machines to get high availability with automatic failover:
postgres-01 --> master
postgres-02 --> sync replica
postgres-03 --> async replica
When I tried to use synchronous_standby_names = 'postgres-02' in postgresql.conf, PostgreSQL failed to restart with the following error:
LOG: invalid value for parameter "synchronous_standby_names": "postgres-02"
DETAIL: syntax error at or near "-"
FATAL: configuration file "/pgsql/postgresql.conf" contains errors
postgresql-10.service: main process exited, code=exited, status=1/FAILURE
Failed to start PostgreSQL 10 database server.
-- Subject: Unit postgresql-10.service has failed
-- Defined-By: systemd
Removing the '-' from the hostname fixes the problem, but is this really required?

You'll have to quote the name:
synchronous_standby_names = '"postgres-02"'
You should have at least two synchronous standby servers, otherwise your system will stop processing writes if the single synchronous standby server goes down.
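For example, a setting that waits for any one of two quoted standby names could look like this (a sketch; it assumes the standbys report application_name values of postgres-02 and postgres-03 in their primary_conninfo, since synchronous_standby_names matches application_name rather than the hostname):
synchronous_standby_names = 'ANY 1 ("postgres-02", "postgres-03")'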

Related

PSQL timeline conflict prevent start of master

We had an outage on one of our PSQL 14 clusters (managed by Zalando) because the k8s control plane was unreachable for 30 minutes.
The control plane is now OK, but the master PSQL does not want to start:
LOG,00000,"listening on IPv4 address ""0.0.0.0"", port 5432"
LOG,00000,"listening on IPv6 address ""::"", port 5432"
LOG,00000,"listening on Unix socket ""/var/run/postgresql/.s.PGSQL.5432"""
LOG,00000,"database system was shut down at 2023-01-30 02:51:10 UTC"
WARNING,01000,"specified neither primary_conninfo nor restore_command",,"The database server will regularly poll the pg_wal subdirectory to check for files placed there."
LOG,00000,"entering standby mode"
FATAL,XX000,"requested timeline 5 is not a child of this server's history","Latest checkpoint is at 2/82000028 on timeline 4, but in the history of the requested timeline, the server forked off from that timeline at 0/530000A0."
LOG,00000,"startup process (PID 23007) exited with exit code 1"
LOG,00000,"aborting startup due to startup process failure"
LOG,00000,"database system is shut down"
We can see in archive_status folder:
-rw-------. 1 postgres postgres 0 Jan 30 02:51 000000040000000200000081.ready
-rw-------. 1 postgres postgres 0 Jan 30 02:51 00000005.history.done
Would you know how we can recover safely from this?
I guess switching back to timeline 4 would be enough, as timeline 5 was created after the start of the outage.
The server is started in standby mode. Remove standby.signal if you want to start it as the primary server.
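A minimal sketch of that change, assuming the data directory path below is replaced with your actual one and that no other node is currently running as primary:
rm /path/to/datadir/standby.signal
pg_ctl -D /path/to/datadir start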

Unable to start PostgreSQL 12 Server

I want to set up PostgreSQL 12 with PostGIS 3 on Ubuntu 20.04 for the purpose of creating an OSM tile server. I want to have 2 different clusters, one for a regular PSQL database and another for the OSM data. I can't seem to get the one for the OSM data up and running:
When I run pg_lsclusters, I get the following:
Ver Cluster Port Status Owner Data directory Log file
12 main 5433 online postgres /var/lib/postgresql/12/main /var/log/postgresql/postgresql-12-main.log
12 osm_psql_db 5432 down postgres /var/lib/postgresql/12/2TB1/osm_psql_db /var/log/postgresql/postgresql-12-osm_psql_db.log
When I run journalctl -xe, I get the following:
Mar 13 11:47:37 cdil-MS-7B92 systemd[1]: Dependency failed for PostgreSQL Cluster 12-osm_psql_db.
-- Subject: A start job for unit postgresql@12-osm_psql_db.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit postgresql@12-osm_psql_db.service has finished with a failure.
--
-- The job identifier is 9566 and the job result is dependency.
Mar 13 11:47:37 cdil-MS-7B92 systemd[1]: postgresql@12-osm_psql_db.service: Job postgresql@12-osm_psql_db.service/start failed with result 'dependency'.
Mar 13 11:47:37 cdil-MS-7B92 systemd[1]: var-lib-postgresql-12-osm_psql_db.mount: Job var-lib-postgresql-12-osm_psql_db.mount/start failed with result 'dependency'.
Mar 13 11:47:37 cdil-MS-7B92 systemd[1]: dev-disk-by\x2dlabel-osm_psql_db.device: Job dev-disk-by\x2dlabel-osm_psql_db.device/start failed with result 'timeout'.
Mar 13 11:47:43 cdil-MS-7B92 PackageKit[27900]: daemon quit
Mar 13 11:47:43 cdil-MS-7B92 systemd[1]: packagekit.service: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit packagekit.service has successfully entered the 'dead' state.
Any idea what could be holding me up?
*** EXTRA INFO JUST IN CASE ***
In terms of how I set up everything, I installed the following packages:
sudo apt install postgresql-12 postgresql-contrib postgis postgresql-12-postgis-3
Because the OSM data is quite large, I want to store that particular cluster on another hard disk. It's called "2TB1" and it's been mounted to /var/lib/postgresql/12/2TB1 because I realized that the postgres user needed access to the data_directory folder and all parent folders leading up to it.
To do so I modified the permissions of the new hard drive:
sudo chown -R postgres:postgres /var/lib/postgresql/12/2TB1
Next, I created the new db cluster instance:
sudo pg_createcluster 12 osm_psql_db -d /var/lib/postgresql/12/2TB1/osm_psql_db -p 5432
I start the new instance:
sudo pg_ctlcluster 12 osm_psql_db start
I get the following error:
A dependency job for postgresql@12-osm_psql_db.service failed. See 'journalctl -xe' for details.
For anyone who stumbles upon the same issue: I tracked the problem down to the *.service file referencing the wrong mount point for the database cluster location. Here's what I did:
Enable the new service (not sure if this is needed, but what the heck...)
sudo systemctl enable postgresql@12-osm_psql_db
Edit the postgresql@12-osm_psql_db.service
sudo systemctl edit --full postgresql@12-osm_psql_db.service
Change
RequiresMountsFor=/etc/postgresql/%I /var/lib/postgresql/%I
To
RequiresMountsFor=/etc/postgresql/%I /var/lib/postgresql/12/2TB1/osm_psql_db
As part of the service script, %I expands to VERSION/CLUSTER, which in my case would have been 12/osm_psql_db. Since I chose to place the DB on another SSD, and the database can't reside in the root directory of a disk, the mount location in the *.service file needed to be updated to 12/2TB1/osm_psql_db. This would not be necessary if you were storing all your databases on a single hard disk.
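After saving the edited unit, something like the following should bring the cluster up (a sketch; systemctl edit normally reloads units by itself, so the daemon-reload may be redundant):
sudo systemctl daemon-reload
sudo systemctl start postgresql@12-osm_psql_db
pg_lsclusters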

How to initial sync mongo replica

My mongo slave is dead because it stopped unexpectedly after running out of disk space, and it won't start due to
mongodb.service: Main process exited, code=exited, status=14/n/a
I tried to fix the error with the following suggestions:
https://askubuntu.com/questions/823288/mongodb-loads-but-breaks-returning-status-14
but it led to the next error code:
mongodb.service: Main process exited, code=exited, status=100/n/a
which I tried to fix with the following:
https://dba.stackexchange.com/questions/220411/sudo-service-mongod-start-returns-error-100
This is its log output:
2021-05-01T18:25:30.987+0000 I - [initandlisten] Fatal assertion 28579 UnsupportedFormat: Unable to find metadata for table:index-3-848131710157586571 Index: {name: _id_, ns: local.me} - version too new for this mongod. See http://dochub.mongodb.org/core/3.4-index-downgrade for detailed instructions on how to handle this error. at src/mongo/db/storage/wiredtiger/wiredtiger_index.cpp 241
The command sudo service mongodb start won't work because the status command shows that the service is dead.
I figured out that it would be easier to resync the data from scratch. I found the documentation
https://docs.mongodb.com/manual/tutorial/resync-replica-set-member/#resync-a-member-of-a-replica-set
but I am not fully sure which commands to run to perform this operation.
My dbPath is "/mnt/mongo/mongodb", my MongoDB shell version is v3.4.14, and my database is about 2.5 TB. Could you give me some guidance on how to perform the initial sync of the mongo replica?
From my understanding I should:
sudo rm -r /mnt/mongo/mongodb/*
sudo service mongodb start
After some time everything should get back to normal(?)
Correct me if I am wrong...
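For checking the sync afterwards, I assume something like this from a mongo shell connected to the primary would do (rs.status() lists each member with its stateStr; a member doing initial sync shows STARTUP2 until it returns to SECONDARY):
mongo --eval 'rs.status().members.forEach(function(m) { print(m.name, m.stateStr); })'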

Postgresql WalReceiver process waits on connecting master regardless of "connect_timeout"

I am trying to deploy an automated, highly available PostgreSQL cluster on Kubernetes. In cases of master failover or temporary failures on the master, the standby loses the streaming replication connection, and when it retries, it takes a long time for the attempt to fail and be retried.
I use PostgreSQL 10 and streaming replication (cluster-main-cluster-master-service is a service that always routes to the master, and all the replicas connect to this service for replication). I've tried setting options like connect_timeout and keepalives in primary_conninfo in recovery.conf, and wal_receiver_timeout in postgresql.conf on the standby, but I could not make any progress with them.
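For reference, the standby settings I tried look roughly like this (a sketch; the replication user and the exact values are placeholders):
# recovery.conf on the standby
primary_conninfo = 'host=cluster-main-cluster-master-service port=5432 user=replicator connect_timeout=10 keepalives=1 keepalives_idle=5 keepalives_interval=5 keepalives_count=3'
# postgresql.conf on the standby
wal_receiver_timeout = 10s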
At first, when the master goes down, replication stops with the following error (state 1):
2019-10-06 14:14:54.042 +0330 [3039] LOG: replication terminated by primary server
2019-10-06 14:14:54.042 +0330 [3039] DETAIL: End of WAL reached on timeline 17 at 0/33000098.
2019-10-06 14:14:54.042 +0330 [3039] FATAL: could not send end-of-streaming message to primary: no COPY in progress
2019-10-06 14:14:55.534 +0330 [12] LOG: record with incorrect prev-link 0/2D000028 at 0/33000098
After investigating Postgres activity, I found that the WAL receiver process gets stuck in the LibPQWalReceiverConnect wait_event (state 2), but the timeout is much longer than what I configured (although I set connect_timeout to 10 seconds, it takes about 2 minutes). Then it fails with the following error (state 3):
2019-10-06 14:17:06.035 +0330 [3264] FATAL: could not connect to the primary server: could not connect to server: Connection timed out
Is the server running on host "cluster-main-cluster-master-service" (192.168.0.166) and accepting
TCP/IP connections on port 5432?
On the next try, it successfully connects to the primary (state 4):
2019-10-06 14:17:07.892 +0330 [5786] LOG: started streaming WAL from primary at 0/33000000 on timeline 17
I also tried killing the process while it is stuck (state 2); when I do, it starts again, connects, and then streams normally (jumps to state 4).
After checking netstat, I also found that the walreceiver process has a connection in the SYN_SENT state to the old master (in the failover case).
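The check was roughly this (a sketch; ss -tnp gives the same information):
netstat -tnp | grep SYN_SENT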
connect_timeout governs how long PostgreSQL will wait for the replication connection to succeed, but that does not include establishing the TCP connection.
To reduce the time that the kernel waits for a successful answer to a TCP SYN request, reduce the number of retries. In /etc/sysctl.conf, set:
net.ipv4.tcp_syn_retries = 3
and run sysctl -p.
That should reduce the time significantly.
Reducing the value too much might make your system less stable.
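If you want to try the value before persisting it, a quick sketch using standard sysctl usage:
sudo sysctl -w net.ipv4.tcp_syn_retries=3    # apply at runtime only
sysctl net.ipv4.tcp_syn_retries              # verify the active value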

Restart PostgreSQL without postgresql-server

I'm on CentOS 7 and I'm trying to get past the 'PG::ConnectionBad: FATAL: Peer authentication failed for user' error.
So I've already figured out that I should change pg_hba.conf (peer to md5), and I've done it. It seems that I have to restart Postgres, but it is not as easy as I thought.
I tried 'service postgresql restart', which resulted in 'Failed to restart postgresql.service: Unit not found.'
Then I tried to install postgresql-server. I got:
oct 23 01:16:15 serverct1 pg_ctl[3280]: HINT: Is another postmaster already running on port 5432? If ...try.
oct 23 01:16:15 serverct1 pg_ctl[3280]: WARNING: could not create listen socket for "localhost"
oct 23 01:16:15 serverct1 pg_ctl[3280]: FATAL: could not create any TCP/IP sockets
oct 23 01:16:16 serverct1 pg_ctl[3280]: pg_ctl: could not start server
oct 23 01:16:16 serverct1 systemd[1]: postgresql.service: control process exited, code=exited status=1
oct 23 01:16:16 serverct1 systemd[1]: Failed to start PostgreSQL database server.
About 5432 port usage:
postgres 5432/tcp postgresql # POSTGRES
postgres 5432/udp postgresql # POSTGRES
So I'm curious:
1) Do postgresql and postgresql-server work separately?
2) Is it possible to restart postgresql without postgresql-server?
3) If not - how to get the port 5432 free in order to run postgresql-server?
You can avoid trouble with systemctl if you use the standard Postgres pg_ctl, e.g.:
pg_ctl reload
Or, if needed, pg_ctl reload -D $PGDATA
You don't need to restart Postgres for pg_hba.conf changes to apply: https://www.postgresql.org/docs/current/static/auth-pg-hba-conf.html
The pg_hba.conf file is read on start-up and when the main server
process receives a SIGHUP signal. If you edit the file on an active
system, you will need to signal the postmaster (using pg_ctl reload or
kill -HUP) to make it re-read the file.
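A minimal sketch of doing the reload on CentOS 7 (assuming pg_ctl is on the postgres user's PATH and the data directory is the default /var/lib/pgsql/data; adjust both if yours differ):
sudo -u postgres pg_ctl reload -D /var/lib/pgsql/data
Or, from any superuser SQL session:
sudo -u postgres psql -c 'SELECT pg_reload_conf();'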