pgpool second standby shows "down" in status - postgresql

I have setup pgpool with 3 nodes. Lets say node1 as primary and node2,node3 and secondary. When I stop the database service on node1, the node2 becomes primary as expected but node3 is being shown as "down" in show pool_nodes command. It is showing that streaming is live on node3 and I can connect to psql as well on node3 but status is down in pgpool
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
2020-05-07 05:58:39.969 EDT [23820] LOG: record with incorrect prev-link 0/7D000028 at 0/81000060
2020-05-07 05:58:39.976 EDT [25653] FATAL: could not connect to the primary server: could not connect to server: Connection refused
Is the server running on host "node1" (10.32.5.121) and accepting
TCP/IP connections on port 5432?
Permission denied, please try again.
Permission denied, please try again.
2020-05-13 08:02:45.738 EDT [3159] ERROR: pgpool_remote_start failed
2020-05-13 08:02:45.738 EDT [3159] STATEMENT: SELECT pgpool_remote_start('node3', '/var/lib/pgsql/11/data')
2020-05-13 08:03:37.917 EDT [3032] LOG: unexpected EOF on standby connection
On executing pcp_recovery_node command, the following error occassionally occurs but node recovers with same command after 1 or 2 more retries
ERROR: executing remote start failed with error: "ERROR: pgpool_remote_start failed
I then have to attach the node with pcp_attach_node to change status to "up"

Related

Postgres: "Failed to load sql modules into the database cluster" and other errors

I run Windows and I am trying to install postgres, but I can't get it to work. During installation, I get the error:
Failed to load sql modules into the database cluster
When I try to access the server in pgadmin or via SQL Shell, I get the error:
could not connect to server: Connection refused (0x0000274D/10061) Is the server running on host "localhost" (::1) and accepting TCP/IP connections on port 5432? could not connect to server: Connection refused (0x0000274D/10061) Is the server running on host "localhost" (127.0.0.1) and accepting TCP/IP connections on port 5432?"
I have found a log in the data-folder saying:
2021-04-08 14:53:44.315 CEST [11088] LOG: starting PostgreSQL 13.2, compiled by Visual C++ build 1914, 64-bit
2021-04-08 14:53:44.321 CEST [11088] LOG: could not listen on IPv6 address "::": Permission denied
2021-04-08 14:53:44.326 CEST [11088] LOG: could not listen on IPv4 address "0.0.0.0": Permission denied
2021-04-08 14:53:44.327 CEST [11088] WARNING: could not create listen socket for "*"
2021-04-08 14:53:44.328 CEST [11088] FATAL: could not create any TCP/IP sockets
2021-04-08 14:53:44.331 CEST [11088] LOG: database system is shut down
Can someone please help me?
I had the same issue and followed different tips and tricks available on the internet but nothing worked.
I fixed that issue by following these steps:
First disable antivirus for an hour
Create a folder in either C drive or D drive and give full permission to User
Then again install a fresh application. If you have already installed then uninstall that and install from starting.
Make sure to choose that folder only at the time of installation

PSQL TimescaleDB, ERROR: the database system is in recovery mode

We have an application pipeline and Postgres-12(TimescaleDB, managed through Patroni) on a separate server (VM with Ubuntu 18.04 LTS).
We are facing an issue with the DB, it suddenly stuck in the recovery mode, and also we can’t access it from the psql client and select queries also hung.
After an hour or late all got back to normal (As my current pipeline terminated) and able to run queries against the DB server.
Master DB error details:
2020-11-03 18:35:08.612 IST [9773] [unknown]#[unknown] LOG: connection received: host=x.x.x.x port=58780
2020-11-03 18:35:08.612 IST [9773] FATAL: the database system is in recovery mode
2020-11-03 18:35:08.596 IST [18276] LOG: could not send data to client: Broken pipe
Replica server error details:
2020-11-03 18:34:55 IST [18316]: [85649-1] user=postgres,db=postgres,app=[unknown],client=x.x.x.x LOG: duration: 10.228 ms statement: SELECT * FROM pg_stat_bgwriter;
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
2020-11-03 18:35:08 IST [18322]: [2-1] user=,db=,app=,client= FATAL: could not receive data from WAL stream: SSL SYSCALL error: EOF detected
2020-11-03 18:35:08 IST [20500]: [1-1] user=,db=,app=,client= FATAL: could not connect to the primary server: FATAL: the database system is in recovery mode
FATAL: the database system is in recovery mode
Pipeline error details:
Job aborted due to stage failure: Task 4 in stage 0.0 failed 3 times, most recent failure: Lost task 4.2 in stage 0.0 (TID 29, ip-x-x-x-x.ap-southeast-1.compute.internal, executor 19): org.postgresql.util.PSQLException: FATAL: the database system is in recovery mode at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:514) at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:141) at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:192) at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49) at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:195) at org.postgresql.Driver.makeConnection(Driver.java:454) at org.postgresql.Driver.connect(Driver.java:256) at org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.connect(DriverWrapper.scala:45)
Please any advise on this issue?
What version of TimescaleDB are you running? In particular, there were some issues with 1.7.x if you try to query a read replica; we recommend upgrading to 1.7.4.
(Otherwise, there's not much information about to suggest what might have happened.)
https://github.com/timescale/timescaledb/releases/tag/1.7.4

pg_Ctl -data directory has wrong ownership

I am unable to start the Postgres server and whenever I use pg_ctl I am getting the following error - can some one help me to fix this. I changed the folder permissions using CHmod and tried running with Sudo -s also but still the problem exists.
one error I did was, I deleted the Postmaster.pid when the server was running- post this I am getting this issue when ever I try to start the server through pg_ctl and another error when I use the pgadmin.
Any suggestions here will be really helpful- thanks.
Using Macos Shell command :
'pg_ctl start -D /Library/PostgreSQL/12/data waiting for server to start....2020-05-05 11:40:04.838 IST [1216] FATAL: data directory "/Library/PostgreSQL/12/data" has wrong ownership 2020-05-05 11:40:04.838 IST [1216] HINT: The server must be started by the user that owns the data directory. stopped waiting pg_ctl: could not start server Examine the log output.'
Using pgadmin the error is as follows :
'could not connect to server: Connection refused Is the server running on host "localhost" (::1) and accepting TCP/IP connections on port 5434? could not connect to server: Connection refused Is the server running on host "localhost" (127.0.0.1) and accepting TCP/IP connections on port 5434?'
p.s. : I modified the hba.conf and also the postgres.conf files to allow connection from the local ip
Error received on 5May
waiting for server to start....2020-05-05 19:54:13.029 IST [7274] LOG: starting PostgreSQL 12.2 on x86_64-apple-darwin, compiled by Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn), 64-bit
2020-05-05 19:54:13.030 IST [7274] LOG: listening on IPv6 address "::", port 5433
2020-05-05 19:54:13.030 IST [7274] LOG: listening on IPv4 address "0.0.0.0", port 5433
2020-05-05 19:54:13.030 IST [7274] LOG: listening on Unix socket "/tmp/.s.PGSQL.5433"
2020-05-05 19:54:13.039 IST [7274] LOG: redirecting log output to logging collector process... 2020-05-05 19:54:13.039 IST [7274] HINT: Future log output will appear in directory "log" stopped waiting .. pg_ctl: could not start server
Examine the log output.
Log file details
2020-05-05 21:29:30.748 IST [8853] LOG: invalid authentication method "0.0.0.0/0"
2020-05-05 21:29:30.748 IST [8853] CONTEXT: line 80 of configuration file "/Library/PostgreSQL/12/data/pg_hba.conf"
2020-05-05 21:29:30.748 IST [8853] FATAL: could not load pg_hba.conf
2020-05-05 21:29:30.749 IST [8853] LOG: database system is shut down
Details of my pg_HBA conf
# "local" is for Unix domain socket connections only
local all all 0.0.0.0/0 md5
local all all md5
# IPv4 local connections:
host all all 127.0.0.1/32 md5
# IPv6 local connections:
host all all ::1/128 md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication all md5
host replication all 127.0.0.1/32 md5
host replication all ::1/128 md5
host all all 0.0.0.0/0 md5
host all all ::/0 md5
latest log file
bash-3.2$ cat postgresql-2020-05-05_221328.log
2020-05-05 22:13:28.794 IST [9834] LOG: database system was interrupted; last known up at 2020-05-05 22:13:09 IST
2020-05-05 22:13:28.872 IST [9834] LOG: database system was not properly shut down; automatic recovery in progress
2020-05-05 22:13:28.874 IST [9834] LOG: redo starts at 0/17742C8
2020-05-05 22:13:28.874 IST [9834] LOG: invalid record length at 0/1774300: wanted 24, got 0
2020-05-05 22:13:28.874 IST [9834] LOG: redo done at 0/17742C8
2020-05-05 22:13:28.881 IST [9832] LOG: database system is ready to accept connections
......
also I found this error while staring the server and the PID is chaning everytime..
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2020-05-05 22:09:21.941 IST [9746] FATAL: lock file "postmaster.pid" already exists
2020-05-05 22:09:21.941 IST [9746] HINT: Is another postmaster (PID 9735) running in data directory "/Library/PostgreSQL/12/data"?
stopped waiting
pg_ctl: could not start server
Examine the log output.
bash-3.2$ kill -9 9735
bash-3.2$ pg_ctl start -D /Library/PostgreSQL/12/data
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2020-05-05 22:09:35.829 IST [9758] FATAL: lock file "postmaster.pid" already exists
2020-05-05 22:09:35.829 IST [9758] HINT: Is another postmaster (PID 9747) running in data directory "/Library/PostgreSQL/12/data"?
stopped waiting
pg_ctl: could not start server
Examine the log output.
502 9833 9832 0 10:13PM ?? 0:00.00 postgres: logger
502 9835 9832 0 10:13PM ?? 0:00.00 postgres: checkpointer
502 9836 9832 0 10:13PM ?? 0:00.04 postgres: background writer
502 9837 9832 0 10:13PM ?? 0:00.01 postgres: walwriter
502 9838 9832 0 10:13PM ?? 0:00.01 postgres: autovacuum launcher
502 9839 9832 0 10:13PM ?? 0:00.01 postgres: stats collector
502 9840 9832 0 10:13PM ?? 0:00.00 postgres: logical replication launcher
0 9641 9504 0 10:03PM ttys000 0:00.02 sudo -u postgres -s /bin/bash
502 9904 9642 0 10:37PM ttys000 0:00.00 grep postgres
The data directory should be owned by the postgres user and have user-only access (700 or u+rwx)
Does this match what you have set up?
Thom Brown
Disclosure: I am an EnterpriseDB employee.
Try running this code
pg_ctl -D /usr/local/var/postgres start

mongodb keyFile between replicas throws Permission denied

I have a single node ReplicaSet with auth activated, a root user and a keyFile I've created with this tutorial, I also have two more mongod processes in the same server in different ports (37017 and 47017) and the same replSet name, but when I try to add the secondaries in the mongo shell connected to PRIMARY with rs.add("172.31.48.41:37017") I get:
{
"ok" : 0,
"errmsg" : "Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: 172.31.48.41:27017; the following nodes did not respond affirmatively: 172.31.48.41:37017 failed with Failed attempt to connect to 172.31.48.41:37017; couldn't connect to server 172.31.48.41:37017 (172.31.48.41), connection attempt failed",
"code" : 74
}
Then I went to the mongod process log of the PRIMARY and found out this:
2015-05-19T20:53:59.848-0400 I REPL [conn51] replSetReconfig admin command received from client
2015-05-19T20:53:59.848-0400 W NETWORK [conn51] Failed to connect to 172.31.48.41:37017, reason: errno:13 Permission denied
2015-05-19T20:53:59.848-0400 I REPL [conn51] replSetReconfig config object with 2 members parses ok
2015-05-19T20:53:59.849-0400 W NETWORK [ReplExecNetThread-0] Failed to connect to 172.31.48.41:37017, reason: errno:13 Permission denied
2015-05-19T20:53:59.849-0400 W REPL [ReplicationExecutor] Failed to complete heartbeat request to 172.31.48.41:37017; Location18915 Failed attempt to connect to 172.31.48.41:37017; couldn't connect to server 172.31.48.41:37017 (172.31.48.41), connection attempt failed
2015-05-19T20:53:59.849-0400 E REPL [conn51] replSetReconfig failed; NodeNotFound Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: 172.31.48.41:27017; the following nodes did not respond affirmatively: 172.31.48.41:37017 failed with Failed attempt to connect to 172.31.48.41:37017; couldn't connect to server 172.31.48.41:37017 (172.31.48.41), connection attempt failed
And the log of the mongod that should become SECONDARY shows nothing, the last two lines are:
2015-05-19T20:48:36.584-0400 I REPL [initandlisten] Did not find local replica set configuration document at startup; NoMatchingDocument Did not find replica set configuration document in local.system.replset
2015-05-19T20:48:36.591-0400 I NETWORK [initandlisten] waiting for connections on port 37017
It's clear that I cannot rs.initiate() in this node because it will self vote to be PRIMARY and that would create a conflict, so the line that states "Did not find local replica set configuration document at startup" is to be ignores as far as I know.
So I would think that the permission should be ok since I'm using the same key file in every mongod process and the replSet is the same in every config file, and that's all the tutorial states to be needed, but obviously something is missing.
Any ideas? Is this a bug?
If you are using ec2 instances and ip 27017 port in security group for both instances, just add a secondary instance port. It worked for me.

Shutdown mongos for upgrade

I am upgrading a mongo sharded cluster, and am in the first step of replacing the mongos process. Can I just kill this process or is there a safer way to shut it down before replacing it?
It is the answer. But may be you misunderstood. You cannot shutdown mongos from command line, but you have to logon to the shell. Here is the result I am trying on my own:
$ mongo --port 27077 (please replace with your own port on mongos instance)
**mongos**> db.shutdownServer({timeoutSecs:30})
shutdown command only works with the admin database; try 'use admin'
mongos> use admin
switched to db admin
mongos> db.shutdownServer({timeoutSecs:30})
2015-02-27T12:53:54.408+0800 DBClientCursor::init call() failed
**server should be down...**
2015-02-27T12:53:54.410+0800 trying reconnect to 127.0.0.1:27077 (127.0.0.1) failed
2015-02-27T12:53:54.410+0800 warning: Failed to connect to 127.0.0.1:27077, reason: errno:111 Connection refused
2015-02-27T12:53:54.410+0800 reconnect 127.0.0.1:27077 (127.0.0.1) failed failed couldn't connect to server 127.0.0.1:27077 (127.0.0.1), connection attempt failed
2015-02-27T12:53:54.413+0800 trying reconnect to 127.0.0.1:27077 (127.0.0.1) failed
2015-02-27T12:53:54.413+0800 warning: Failed to connect to 127.0.0.1:27077, reason: errno:111 Connection refused
2015-02-27T12:53:54.413+0800 reconnect 127.0.0.1:27077 (127.0.0.1) failed failed couldn't connect to server 127.0.0.1:27077 (127.0.0.1), connection attempt failed
>
Try to run "top" or "htop" to display all processes running on your computer. You shall see the "mongos" process has gone. Read the log file. Here is the last two lines of my log:
2015-02-27T12:53:54.406+0800 [conn1] terminating, shutdown command received
2015-02-27T12:53:54.406+0800 [conn1] dbexit: shutdown called rc:0 shutdown called
And here is the link of my testing script:
https://github.com/babycaseny/QuickStart/blob/master/StartShard.sh
Note that you have to replace the "localhost" in the mongod/mongos command with the hostname of your computer, or you will not be able to config your shards.
See also this one:
https://groups.google.com/forum/#!topic/mongodb-user/TQLlRI6HG1M
In case you need a command line to do the work, here is one:
mongo admin --port portnumber --eval "db.shutdownServer()"
Notice that you have to run the command in localhost.