Postgres Replication with pglogical: ERROR: connection to other side has died - postgresql

Got this error (on replica) while replicating between 2 Postgres instances:
ERROR: connection to other side has died
Here are the logs on the replica/subscriber:
2017-09-15 20:03:55 UTC [14335-3] LOG: apply worker [14335] at slot 7 generation 109 crashed
2017-09-15 20:03:55 UTC [2961-1732] LOG: worker process: pglogical apply 16384:3661733826 (PID 14335) exited with exit code 1
2017-09-15 20:03:59 UTC [14331-2] ERROR: connection to other side has died
2017-09-15 20:03:59 UTC [14331-3] LOG: apply worker [14331] at slot 2 generation 132 crashed
2017-09-15 20:03:59 UTC [2961-1733] LOG: worker process: pglogical apply 16384:3423246629 (PID 14331) exited with exit code 1
2017-09-15 20:04:02 UTC [14332-2] ERROR: connection to other side has died
2017-09-15 20:04:02 UTC [14332-3] LOG: apply worker [14332] at slot 4 generation 125 crashed
2017-09-15 20:04:02 UTC [2961-1734] LOG: worker process: pglogical apply 16384:2660030132 (PID 14332) exited with exit code 1
2017-09-15 20:04:02 UTC [14350-1] LOG: starting apply for subscription parking_sub
2017-09-15 20:04:05 UTC [14334-2] ERROR: connection to other side has died
2017-09-15 20:04:05 UTC [14334-3] LOG: apply worker [14334] at slot 6 generation 119 crashed
2017-09-15 20:04:05 UTC [2961-1735] LOG: worker process: pglogical apply 16384:394989729 (PID 14334) exited with exit code 1
2017-09-15 20:04:06 UTC [14333-2] ERROR: connection to other side has died
Logs on master/provider:
2017-09-15 23:22:43 UTC [22068-5] repuser#ga-master ERROR: got sequence entry 1 for toast chunk 1703536315 instead of seq 0
2017-09-15 23:22:43 UTC [22068-6] repuser#ga-master LOG: could not receive data from client: Connection reset by peer
2017-09-15 23:22:44 UTC [22067-5] repuser#ga-master ERROR: got sequence entry 1 for toast chunk 1703536315 instead of seq 0
2017-09-15 23:22:44 UTC [22067-6] repuser#ga-master LOG: could not receive data from client: Connection reset by peer
2017-09-15 23:22:48 UTC [22070-5] repuser#ga-master ERROR: got sequence entry 1 for toast chunk 1703536315 instead of seq 0
2017-09-15 23:22:48 UTC [22070-6] repuser#ga-master LOG: could not receive data from client: Connection reset by peer
2017-09-15 23:22:49 UTC [22069-5] repuser#ga-master ERROR: got sequence entry 1 for toast chunk 1703536315 instead of seq 0
2017-09-15 23:22:49 UTC [22069-6] repuser#ga-master LOG: could not receive data from client: Connection reset by peer
Config on master/provider:
archive_mode = on
archive_command = 'cp %p /data/pgdata/wal_archives/%f'
max_wal_senders = 20
wal_level = logical
max_worker_processes = 100
max_replication_slots = 100
shared_preload_libraries = pglogical
max_wal_size = 20GB
Config on the replica/subscriber:
max_replication_slots = 100
shared_preload_libraries = pglogical
max_worker_processes = 100
max_wal_size = 20GB
I have a total of 18 subscriptions for 18 schemas. Replication seemed to work fine in the beginning, but it quickly deteriorated, and some subscriptions started to bounce between the down and replicating statuses, with the error posted above.
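For reference, the down/replicating statuses come from pglogical itself; a minimal sketch of how they can be checked on the subscriber, assuming pglogical 2.x:
-- run on the replica/subscriber
SELECT subscription_name, status
FROM pglogical.show_subscription_status();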
Question
What could be the possible causes? Do I need to change my Pg configurations?
Also, I noticed that when replication is going on, the CPU usage on the master/provider is pretty high.
/# ps aux | sort -nrk 3,3 | head -n 5
postgres 18180 86.4 1.0 415168 162460 ? Rs 22:32 19:03 postgres: getaround getaround 10.240.0.7(64106) CREATE INDEX
postgres 20349 37.0 0.2 339428 38452 ? Rs 22:53 0:07 postgres: wal sender process repuser 10.240.0.7(49742) idle
postgres 20351 33.8 0.2 339296 36628 ? Rs 22:53 0:06 postgres: wal sender process repuser 10.240.0.7(49746) idle
postgres 20350 28.8 0.2 339016 44024 ? Rs 22:53 0:05 postgres: wal sender process repuser 10.240.0.7(49744) idle
postgres 20352 27.6 0.2 339420 36632 ? Rs 22:53 0:04 postgres: wal sender process repuser 10.240.0.7(49750) idle
Thanks in advance!

I had a similar problem, which was fixed by setting wal_sender_timeout on the master/provider to 5 minutes (the default is 1 minute). The sender drops the connection when this timeout is hit, and raising it seems to have fixed the problem for me.
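For reference, a minimal sketch of raising the timeout on the provider (assumes superuser access; wal_sender_timeout only requires a configuration reload, not a restart):
ALTER SYSTEM SET wal_sender_timeout = '5min';
SELECT pg_reload_conf();
Alternatively, set wal_sender_timeout = '5min' directly in postgresql.conf and reload.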

Related

ERROR: invalid logical replication message type "T"

I am getting the error below from PostgreSQL logical replication.
Setup
Master (publisher): PostgreSQL 12.3
Logical replica (subscriber): PostgreSQL 10.3
Logs
2021-03-22 13:06:57.332 IST # 25929 LOG: checkpoints are occurring too frequently (22 seconds apart)
2021-03-22 13:06:57.332 IST # 25929 HINT: Consider increasing the configuration parameter "max_wal_size".
2021-03-22 14:34:21.263 IST # 21461 ERROR: invalid logical replication message type "T"
2021-03-22 14:34:21.315 IST # 3184 LOG: logical replication apply worker for subscription "elk_subscription_133" has started
2021-03-22 14:34:21.367 IST # 3184 ERROR: invalid logical replication message type "T"
2021-03-22 14:34:21.369 IST # 25921 LOG: worker process: logical replication worker for subscription 84627 (PID 3184) exited with exit code 1
2021-03-22 14:34:22.259 IST # 25921 LOG: worker process: logical replication worker for subscription 84627 (PID 21461) exited with exit code 1
2021-03-22 14:34:27.281 IST # 3187 LOG: logical replication apply worker for subscription "elk_subscription_133" has started
2021-03-22 14:34:27.311 IST # 3187 ERROR: invalid logical replication message type "T"
2021-03-22 14:34:27.313 IST # 25921 LOG: worker process: logical replication worker for subscription 84627 (PID 3187) exited with exit code 1
2021-03-22 14:34:32.336 IST # 3188 LOG: logical replication apply worker for subscription "elk_subscription_133" has started
2021-03-22 14:34:32.362 IST # 3188 ERROR: invalid logical replication message type "T"
The documentation describes message T:
Truncate
      Byte1('T')
              Identifies the message as a truncate message.
Support for replicating TRUNCATE was added in v11, so your v10.3 subscriber cannot decode this message; either upgrade the subscriber to v11 or later, or keep TRUNCATE out of the publication.
You will have to remove the affected table from the publication, refresh the subscription, truncate the table manually, add it back to the publication, and refresh the subscription again (see the sketch below).
To avoid the problem in the future, keep TRUNCATE out of the publication:
ALTER PUBLICATION name SET (publish = 'insert, update, delete');
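A sketch of the recovery sequence described above; my_pub and my_table are placeholder names (the subscription name is taken from the logs), so adjust them to your setup:
-- on the publisher (12.3): remove the affected table from the publication
ALTER PUBLICATION my_pub DROP TABLE my_table;
-- on the subscriber (10.3): pick up the publication change
ALTER SUBSCRIPTION elk_subscription_133 REFRESH PUBLICATION;
-- truncate the table manually so both sides are consistent again
TRUNCATE my_table;
-- on the publisher: add the table back
ALTER PUBLICATION my_pub ADD TABLE my_table;
-- on the subscriber: refresh once more
ALTER SUBSCRIPTION elk_subscription_133 REFRESH PUBLICATION;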

How to resolve error reading result of streaming command?

I have a master database doing logical replication with a publication and a slave database subscribing to that publication. It is on the slave that I am occasionally getting the following error:
ERROR: error reading result of streaming command:
LOG: logical replication table synchronization worker for subscription ABC, table XYZ
How do I stop the above error from happening?
Here is the log output demonstrating the error:
2020-11-25 06:50:51.736 UTC [91572] LOG: background worker "logical replication worker" (PID 96504) exited with exit code 1
2020-11-25 06:50:51.740 UTC [96505] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "contractservices_devicekioskrating" has started
2020-11-25 06:50:52.197 UTC [96505] ERROR: error reading result of streaming command:
2020-11-25 06:50:52.200 UTC [91572] LOG: background worker "logical replication worker" (PID 96505) exited with exit code 1
2020-11-25 06:50:52.203 UTC [96506] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "workorders_sectorbranchinformation" has started
2020-11-25 06:50:52.286 UTC [96506] ERROR: error reading result of streaming command:
2020-11-25 06:50:52.288 UTC [91572] LOG: background worker "logical replication worker" (PID 96506) exited with exit code 1
2020-11-25 06:50:52.292 UTC [96507] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "contractservices_kioskstatetransitions" has started
2020-11-25 06:52:14.887 UTC [96339] ERROR: error reading result of streaming command:
2020-11-25 06:52:14.896 UTC [91572] LOG: background worker "logical replication worker" (PID 96339) exited with exit code 1
2020-11-25 06:52:14.900 UTC [96543] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "contractservices_sensordatafeed" has started
2020-11-25 06:52:21.385 UTC [96507] ERROR: error reading result of streaming command:
2020-11-25 06:52:21.393 UTC [91572] LOG: background worker "logical replication worker" (PID 96507) exited with exit code 1
2020-11-25 06:52:21.397 UTC [96547] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "contractservices_sitemappoint" has started
2020-11-25 06:52:21.523 UTC [96547] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "contractservices_sitemappoint" has finished
2020-11-25 06:52:21.528 UTC [96548] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "core_event" has started
2020-11-25 06:55:35.401 UTC [96543] ERROR: error reading result of streaming command:
2020-11-25 06:55:35.408 UTC [91572] LOG: background worker "logical replication worker" (PID 96543) exited with exit code 1
2020-11-25 06:55:35.412 UTC [96642] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "contractservices_doorevents" has started
2020-11-25 06:56:43.633 UTC [96642] ERROR: error reading result of streaming command:
2020-11-25 06:56:43.641 UTC [91572] LOG: background worker "logical replication worker" (PID 96642) exited with exit code 1
2020-11-25 06:56:43.644 UTC [96678] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "workorders_sectorbranchinformation" has started
2020-11-25 06:56:43.776 UTC [96678] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "workorders_sectorbranchinformation" has finished
2020-11-25 06:56:43.782 UTC [96679] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "core_batteryhistory" has started
2020-11-25 06:57:04.166 UTC [96679] ERROR: error reading result of streaming command:
2020-11-25 06:57:04.174 UTC [91572] LOG: background worker "logical replication worker" (PID 96679) exited with exit code 1
2020-11-25 06:57:04.178 UTC [96685] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "contractservices_attendantvisittime" has started
2020-11-25 06:57:06.100 UTC [96685] ERROR: error reading result of streaming command:
2020-11-25 06:57:06.160 UTC [91572] LOG: background worker "logical replication worker" (PID 96685) exited with exit code 1
2020-11-25 06:57:06.164 UTC [96693] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "contractservices_kioskstatetransitions" has started
2020-11-25 06:59:50.375 UTC [96548] ERROR: error reading result of streaming command:
2020-11-25 06:59:50.382 UTC [91572] LOG: background worker "logical replication worker" (PID 96548) exited with exit code 1
2020-11-25 06:59:50.389 UTC [96755] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "contractservices_sensordatafeed" has started
2020-11-25 07:00:56.844 UTC [96693] ERROR: error reading result of streaming command:
2020-11-25 07:00:56.852 UTC [91572] LOG: background worker "logical replication worker" (PID 96693) exited with exit code 1
2020-11-25 07:00:56.856 UTC [96779] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "workorders_wastestream" has started
2020-11-25 07:00:57.391 UTC [96779] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "workorders_wastestream" has finished
2020-11-25 07:00:57.397 UTC [96780] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "core_event" has started
2020-11-25 07:02:39.650 UTC [96755] ERROR: error reading result of streaming command:
2020-11-25 07:02:39.658 UTC [91572] LOG: background worker "logical replication worker" (PID 96755) exited with exit code 1
2020-11-25 07:02:39.662 UTC [96824] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "contractservices_devicekioskrating" has started
2020-11-25 07:02:40.276 UTC [96824] ERROR: error reading result of streaming command:
2020-11-25 07:02:40.279 UTC [91572] LOG: background worker "logical replication worker" (PID 96824) exited with exit code 1
2020-11-25 07:02:40.283 UTC [96825] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "contractservices_kioskstatetransitions" has started
2020-11-25 07:04:07.222 UTC [96825] ERROR: error reading result of streaming command:
2020-11-25 07:04:07.230 UTC [91572] LOG: background worker "logical replication worker" (PID 96825) exited with exit code 1
2020-11-25 07:04:07.234 UTC [96862] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "contractservices_attendantvisit" has started
2020-11-25 07:04:49.971 UTC [96862] ERROR: error reading result of streaming command:
2020-11-25 07:04:49.978 UTC [91572] LOG: background worker "logical replication worker" (PID 96862) exited with exit code 1
2020-11-25 07:04:50.432 UTC [97013] LOG: logical replication table synchronization worker for subscription "snf5_cba_isp_db_staging_app1_srv_sub", table "core_batteryhistory" has started
Despite this error, on PostgreSQL v13.0 the tables on the slave database seem to replicate okay; however, I would like to resolve the error.
I also tried PostgreSQL v13.1 and noticed that I still get this error, and on that version replication does not work okay.
I found this post:
https://www.postgresql-archive.org/BUG-16643-PG13-Logical-replication-initial-startup-never-finishes-and-gets-stuck-in-startup-loop-td6156051.html
The poster there (Henry Hinze) said it was a bug that was fixed by installing version 13 RC1.
My experience was the reverse: on PostgreSQL v13.0 it was not getting stuck in the startup loop, but after installing v13.1 it was.
I can confirm that I am using postgresql version 13.1 as /usr/lib/postgresql/13/bin/postgres -V gives me the following output:
postgres (PostgreSQL) 13.1 (Ubuntu 13.1-1.pgdg18.04+1)
I am using Ubuntu v18.04.
I have uninstalled postgresql completely and reinstalled it and it has not resolved the issue.
The postgresql.conf settings on the slave are the default settings.
The relevant postgresql.conf settings on the master are as follows:
wal_level = logical
checkpoint_timeout = 5min
max_wal_size = 1GB
min_wal_size = 80MB
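For reference, a minimal sketch of checking the per-table synchronization state on the subscriber, using only the standard catalogs ('i' = initialize, 'd' = data copy in progress, 's' = synchronized, 'r' = ready):
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel
WHERE srsubid = (SELECT oid FROM pg_subscription
                 WHERE subname = 'snf5_cba_isp_db_staging_app1_srv_sub');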

PostgreSQL 9.4.1 Switchover & Switchback without recover_target_timeline=latest

I have tested different scenarios for switchover and switchback in PostgreSQL version 9.4.1.
Scenario 1: PostgreSQL switchover and switchback in 9.4.1
Scenario 2: Is the parameter recovery_target_timeline='latest' mandatory for switchover and switchback in PostgreSQL 9.4.1?
Scenario 3: On this page
To test scenario 3 I followed the steps below.
1) Stop the application connected to the primary server.
2) Confirm that the application was stopped and all threads were disconnected from the primary DB.
# 192.x.x.129 (Primary)
3) Cleanly shut down the primary using:
pg_ctl -D $PGDATA stop -m fast
# On the DR side (192.x.x.128), check the sync status:
postgres=# select pg_last_xlog_receive_location(),pg_last_xlog_replay_location();
-[ RECORD 1 ]-----------------+-----------
pg_last_xlog_receive_location | 4/57000090
pg_last_xlog_replay_location | 4/57000090
4) Stop the DR server (192.x.x.128):
pg_ctl -D $PGDATA stop -m fast
pg_log:
2019-12-02 13:16:09 IST LOG: received fast shutdown request
2019-12-02 13:16:09 IST LOG: aborting any active transactions
2019-12-02 13:16:09 IST LOG: shutting down
2019-12-02 13:16:09 IST LOG: database system is shut down
#192.x.x.128(DR)
5) Make the following change on the DR server:
mv recovery.conf recovery.conf_bkp
6) Make the following changes on 192.x.x.129 (the old primary):
[postgres@localhost data]$ cat recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=replication password=postgres host=192.x.x.128 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
restore_command = 'cp %p /home/postgres/restore/%f'
trigger_file='/tmp/promote'
7) Start the DR server in read/write mode:
pg_ctl -D $PGDATA start
pg_log:
2019-12-02 13:20:21 IST LOG: database system was shut down in recovery at 2019-12-02 13:16:09 IST
2019-12-02 13:20:22 IST LOG: database system was not properly shut down; automatic recovery in progress
2019-12-02 13:20:22 IST LOG: consistent recovery state reached at 4/57000090
2019-12-02 13:20:22 IST LOG: invalid record length at 4/57000090
2019-12-02 13:20:22 IST LOG: redo is not required
2019-12-02 13:20:22 IST LOG: database system is ready to accept connections
2019-12-02 13:20:22 IST LOG: autovacuum launcher started
(END)
We can see in the log above that the old DR is now the primary, and the old primary can become its DR without any errors, because the timeline ID on the new primary is the same one that already exists on the new DR (no promotion took place, so no timeline switch occurred).
8) Start the old primary in read-only (standby) mode:
pg_ctl -D $PGDATA start
logs:
2019-12-02 13:24:50 IST LOG: database system was shut down at 2019-12-02 11:14:50 IST
2019-12-02 13:24:51 IST LOG: entering standby mode
cp: cannot stat ‘pg_xlog/RECOVERYHISTORY’: No such file or directory
cp: cannot stat ‘pg_xlog/RECOVERYXLOG’: No such file or directory
2019-12-02 13:24:51 IST LOG: consistent recovery state reached at 4/57000090
2019-12-02 13:24:51 IST LOG: record with zero length at 4/57000090
2019-12-02 13:24:51 IST LOG: database system is ready to accept read only connections
2019-12-02 13:24:51 IST LOG: started streaming WAL from primary at 4/57000000 on timeline 9
2019-12-02 13:24:51 IST LOG: redo starts at 4/57000090
(END)
Question 1: In this scenario I performed only the switchover, but with this method we can do both switchover and switchback. If switchover and switchback work with the method above, why did the PostgreSQL community introduce recovery_target_timeline='latest' and apply the related patches (see this blog: https://www.enterprisedb.com/blog/switchover-switchback-in-postgresql-9-3) from PostgreSQL 9.3 up to the latest version?
Question 2: What does the line cp: cannot stat ‘pg_xlog/RECOVERYHISTORY’: No such file or directory in the above log mean?
Question 3: Which of scenario 1 and scenario 3 is the correct way to do switchover and switchback? (Scenario 2 fails unless recovery_target_timeline='latest' is used, as all community experts know.)
Answers:
1. If you shut down the standby cleanly, then remove recovery.conf and restart it, it will come up, but it has to perform crash recovery (database system was not properly shut down).
The proper way to promote a standby to a primary is by using the trigger file or running pg_ctl promote (or, from v12 on, by calling the SQL function pg_promote). Then you have no downtime and don't need to perform crash recovery.
Promoting the standby makes it pick a new timeline, so you need recovery_target_timeline = 'latest' if you want the new standby to follow that timeline switch (a minimal sketch follows below).
2. The cp: cannot stat messages are caused by your restore_command.
3. The method shown in 1. above is the correct one.
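For illustration, a minimal sketch of the promotion-based flow on 9.4; the archive path and connection string are placeholders, not taken from the setup above:
# promote the DR server instead of just removing recovery.conf
pg_ctl -D $PGDATA promote

# recovery.conf on the old primary, which becomes the new standby
standby_mode = 'on'
primary_conninfo = 'user=replication host=192.x.x.128 port=5432'
restore_command = 'cp /path/to/wal_archive/%f %p'   # copy from the archive into %p
recovery_target_timeline = 'latest'                 # follow the timeline switch caused by the promotion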

Unexpected termination and restart of postgresql 9.6

We have a PostgreSQL 9.6.14 server where we ran a query that caused the termination and restart of the PostgreSQL process.
We don't know why it happened.
The query runs fine with another filter value, so I guess it has to do with the amount of data it is querying. But can this really cause a restart of the whole Postgres service? Is it maybe a memory problem?
postgresql.log
2019-07-12 17:54:13.487 CEST [6459]: [7-1] user=,db=,app=,client= LOG: server process (PID 11064) was terminated by signal 11: Segmentation fault
2019-07-12 17:54:13.487 CEST [6459]: [8-1] user=,db=,app=,client= DETAIL: Failed process was running:
2019-07-12 17:54:13.487 CEST [6459]: [9-1] user=,db=,app=,client= LOG: terminating any other active server processes
2019-07-12 17:54:13.488 CEST [11501]: [1-1] user=hg,db=test,app=[unknown],client=172.31.0.43 WARNING: terminating connection because of crash of another server process
2019-07-12 17:54:13.488 CEST [11501]: [2-1] user=hg,db=test,app=[unknown],client=172.31.0.43 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-07-12 17:54:13.488 CEST [11501]: [3-1] user=hg,db=test,app=[unknown],client=172.31.0.43 HINT: In a moment you should be able to reconnect to the database and repeat your command.
2019-07-12 17:54:13.488 CEST [8889]: [2-1] user=hg,db=_test,app=[unknown],client=172.31.0.46 WARNING: terminating connection because of crash of another server process
select stat.*,
       (select 1
        from table1 a, table2 pg
        where a.field_1::text = stat.field_1::text
          and a.field_2::text = stat.field_2::text
          and stat.field_3::text = pg.field_3::text
          and a.field_4 = pg.field_4
        limit 1)
from table3 stat
where field_1 = 'xyz';

Slow postgresql startup in docker container

We built a Debian Docker image with PostgreSQL to run one of our services. The database is for internal container use and does not need port mapping. I believe it is installed via apt-get in the Dockerfile.
We stop and start this service often, and the slow database startup is a performance issue. Although the database is empty, it takes slightly over 20 seconds to accept connections the first time we start the Docker image. The log is as follows:
2019-04-05 13:05:30.924 UTC [19] LOG: could not bind IPv6 socket: Cannot assign requested address
2019-04-05 13:05:30.924 UTC [19] HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
2019-04-05 13:05:30.982 UTC [20] LOG: database system was shut down at 2019-04-05 12:57:16 UTC
2019-04-05 13:05:30.992 UTC [20] LOG: MultiXact member wraparound protections are now enabled
2019-04-05 13:05:30.998 UTC [19] LOG: database system is ready to accept connections
2019-04-05 13:05:30.998 UTC [24] LOG: autovacuum launcher started
2019-04-05 13:05:31.394 UTC [26] [unknown]#[unknown] LOG: incomplete startup packet
2019-04-19 13:21:58.974 UTC [37] LOG: could not bind IPv6 socket: Cannot assign requested address
2019-04-19 13:21:58.974 UTC [37] HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
2019-04-19 13:21:59.025 UTC [38] LOG: database system was interrupted; last known up at 2019-04-05 13:05:34 UTC
2019-04-19 13:21:59.455 UTC [39] [unknown]#[unknown] LOG: incomplete startup packet
2019-04-19 13:21:59.971 UTC [42] postgres#postgres FATAL: the database system is starting up
[...]
2019-04-19 13:22:15.221 UTC [85] root#postgres FATAL: the database system is starting up
2019-04-19 13:22:15.629 UTC [38] LOG: database system was not properly shut down; automatic recovery in progress
2019-04-19 13:22:15.642 UTC [38] LOG: redo starts at 0/14EEBA8
2019-04-19 13:22:15.822 UTC [38] LOG: invalid record length at 0/24462D0: wanted 24, got 0
2019-04-19 13:22:15.822 UTC [38] LOG: redo done at 0/24462A8
2019-04-19 13:22:15.822 UTC [38] LOG: last completed transaction was at log time 2019-04-05 13:05:36.602318+00
2019-04-19 13:22:16.084 UTC [38] LOG: MultiXact member wraparound protections are now enabled
2019-04-19 13:22:16.094 UTC [37] LOG: database system is ready to accept connections
2019-04-19 13:22:16.094 UTC [89] LOG: autovacuum launcher started
2019-04-19 13:22:21.528 UTC [92] root#test LOG: could not receive data from client: Connection reset by peer
2019-04-19 13:22:21.528 UTC [92] root#test LOG: unexpected EOF on client connection with an open transaction
Any suggestions for fixing this startup issue?
EDIT: Some requested the Dockerfile; here are the relevant lines:
RUN apt-get update \
&& apt-get install -y --force-yes \
postgresql-9.6-pgrouting \
postgresql-9.6-postgis-2.3 \
postgresql-9.6-postgis-2.3-scripts \
[...]
# Download, compile and install GRASS 7.2
[...]
USER postgres
# Create a database 'grass_backend' owned by the "root" role.
RUN /etc/init.d/postgresql start \
&& psql --command "CREATE USER root WITH SUPERUSER [...];" \
&& psql --command "CREATE EXTENSION postgis; CREATE EXTENSION plpython3u;" --dbname [dbname] \
&& psql --command "CREATE EXTENSION postgis_sfcgal;" --dbname [dbname] \
&& psql --command "CREATE EXTENSION postgis; CREATE EXTENSION plpython3u;" --dbname grass_backend
WORKDIR [...]
The file ends after WORKDIR, which I guess means the database isn't properly shut down.
Answer: I now stop PostgreSQL properly inside the Docker build (a sketch of the change is below). It now starts 15 s faster. Thanks for replying.
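For illustration, a sketch of the amended RUN step; the key change is the final stop, so the data directory baked into the image is cleanly shut down (the psql commands are abbreviated from the original Dockerfile):
RUN /etc/init.d/postgresql start \
 && psql --command "CREATE USER root WITH SUPERUSER [...];" \
 && psql --command "CREATE EXTENSION postgis;" --dbname grass_backend \
 && /etc/init.d/postgresql stop  # clean shutdown so the next container start skips crash recovery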
Considering the line database system was not properly shut down; automatic recovery in progress, that would definitely explain the slow startup: please don't kill the service; send the stop command and wait for it to shut down properly.
Please note that the system might kill the process if it takes too long to stop; this will happen with PostgreSQL if there are connections still held to it (probably from your application). If you disconnect all the connections and then stop, PostgreSQL should be able to stop relatively quickly.
Also make sure you stop the PostgreSQL service inside the container before turning it off.
TCP will keep connections lingering for a while. If you are starting and stopping in quick succession without properly stopping the service inside the container, that would explain why the port is unavailable; normally the service can start and stop in very quick succession on my machine if nothing is connected to it.
3 start-stop cycles of postgresql on my machine (I have 2 decently sized databases)
$ time bash -c 'for i in 1 2 3; do /etc/init.d/postgresql-11 restart; done'
* Stopping PostgreSQL 11 (this can take up to 92 seconds) ... [ ok ]
* /run/postgresql: correcting mode
* Starting PostgreSQL 11 ... [ ok ]
* Stopping PostgreSQL 11 (this can take up to 92 seconds) ... [ ok ]
* /run/postgresql: correcting mode
* Starting PostgreSQL 11 ... [ ok ]
* Stopping PostgreSQL 11 (this can take up to 92 seconds) ... [ ok ]
* /run/postgresql: correcting mode
* Starting PostgreSQL 11 ... [ ok ]
real 0m1.188s
user 0m0.260s
sys 0m0.080s