We have master-slave replication on
PostgreSQL 9.4.9
CentOS 6.8
and till today we've had beautiful time with our replication between our two +- identical servers. But today I ran VACUUM FULL (on the master of course) which destroyed replication (as expected), but that was not supposed to be a big deal as we have "turned" the replication off and back on for so many times before. But this time it was different.
After executing our many-times-proved script (basically pg_start_backup(), full rsync of data/ directory (with some conf exludes) and pg_stop_backup()), the synchronization looked ok, but the slave DB has become no longer (RO-) accessible by psql. The error reads:
psql: FATAL: could not open file "global/12745": No such file or
directory
After a couple of re-runs I gave up and created empty global/12745 to see what's going to happen, but instead I am always getting
psql: FATAL: role "postgres" does not exist
Actually it seems, that no role we have on the master exists for the slave DB, and this is still true even after disabling replication.
So now, I have no idea how even to access the slave database.
At the same time, the master DB has no such issue, and "postgres" (or any other user we have) is functioning there perfectly.
I did many attempts, including complete removal of /var/lib/pgsql/9.4 directory and reinstall of rpms with initdb. (Fresh empty DB works fine on the slave.)
What could have gone wrong? Have my primary DB became somehow "non-replicable" anymore? That'd be pitty, as this is our primary mean of backup.
Any help is greatly appreciated. Thanks a lot.
Related
I've followed the instructions to set up a replica server of PostgreSQL with repmgr, But I can't start the PostgreSQL service because of a permission problem.
On the stand by server, I have this on my /etc/repmgr.conf file:
node_id=2
node_name=aws-replica
conninfo='host=<REDACTED> user=repmgr dbname=repmgr port=5432'
data_directory='/mnt/data/postgres/data'
log_file='/var/log/repmgr.log'
As you can see, I've changed the location of the data directory to /mnt/data/postgres/data and also updated the postgresql.conf file with the same information.
When I try to start the PostgreSQL service, I get this error on journalctl:
FATAL: data directory "/mnt/data/postgres/data" has wrong ownership
HINT: The server must be started by the user that owns the data directory.
The folder in question is owned by a normal user, the same one that runs repmgr. If I set the ownership of the folder to postgres:postgres, then repmgr can't perform any operations because it can't access the folder. I tried joining the postgres group, but the service won't start unless the folder has 700 permissions, so it's pointless to join the group.
So, I'm either not being able to start postgresql or repmgr. What can I do to make it work?
The problem was that I created a postgres DB using innitdb, which wasn't necessary. I deleted the database and then ran repmgr as the postgres user, and it worked.
My team and I are currently experiencing an issue where we can't connect to Cloud SQL's Postgres instance(s) from anything other than the psql cli tool. We get a too many connections for database "postgres" error (in PGAdmin, DBeaver, and our node typeorm/pg backend). It initially happened on our (only) Postgres database instance. After restarting, stopping and starting again, increasing machine CPU/memory proved to do nothing, I deleted the database instance entirely and created a new one from scratch.
However, after a few hours the problem came back. I know that we're not actually having too many connections as I am able to query pg_stat_activity from psql command line and see the following:
Only one of those (postgres username) connections is ours.
My coworker also can't connect at all - not even from psql cli.
If it matters, we are using PostgreSQL 13, europe-west2 (London), single zone availability, db-g1-small instance with 1.7GB memory, 10GB HDD, and we have public IP enabled and the correct IP addresses whitelisted.
I'd really appreciate if anyone has any insights into what's causing this.
EDIT: I further increased the instance size (to no longer be a shared core), and I managed to successfully connect my backend to it. However my psql cli no longer works - it appears that only the first client to connect is allowed to connect after a restart (even if it disconnects, other clients can't connect...).
From the error message, it is clear that the database "postgres" has a custom connection limit (set, for example, by ALTER DATABASE postgres CONNECTION LIMIT 1). And apparently, it is quite small. Why is everyone try to connect to that database anyway? Usually 'postgres' database is reserved for maintenance operations, and you should create other databases for daily use.
You can see the setting with:
select datconnlimit from pg_database where datname='postgres';
I don't know if the low setting is something you did, or maybe Google does it on its own for their cloud offering.
#jjanes had the right idea/mention.
I created another database within the Cloud SQL instance that wasn't named postgres and then it was fine.
It wasn't anything to do with maximum connection settings (as this was within Google Cloud SQL) or not closing connections (as TypeORM/pg does this already).
A slave database was set up some time ago for the purpose of backing up or replicating a remote database. However I can no longer write to the database using a Delphi based ETL (the ETL works for another database pair, but to date has never been used for this particular pair). The replication database was setup by somebody else who has since left the company. I am reasonably sure this has been setup as a replication database, however the employee who has since left told me that replication never worked for unrelated reasons. Using the ETL we can (using SQL queries) read from the one database, and write back to the replication database, Or should be able to, as it is currently read only.
I have tried:
Maintenance such as VACUUM
Attempt to drop tables and the entire database
Restore a full backup from the master database
None of these work, and I am told the database is read-only.
I have looked at postgresql.conf and see that hot_standby is checked, so I think (but am not 100% certain) that the database is in some sort of replication mode (I've never touched replication as supported by Postgres, so I wouldn't know).
I have checked permissions in pg_hba.conf and see there are some credentials in there for replication. I am not sure whether this activates "replication mode" for the database, or simply means these credentials are for replication only.
I have been through months worth of log files (This has not been working since our IT department upgraded the entire network about 5 months ago). I see the log file contents seen below, repeated over and over with nothing else for months. Note the IP address shown below is listed in the pg_hba.conf file, so credentials are valid.
The database is in recovery mode, as I have found by using:
select pg_is_in_recovery();
This explains to me why it's read only, but why can I not restore databases, or just simply dump the entire database and start again (it's a backup so losing/restoring it is not an issue)?
I was tempted to try modifying the recovery.conf file (which exists) but I read/believe that once recovery has been initiated (which in my case it has) modifying the file will have no effect.
I'm using a legacy version of Postgres: 9.2.9
Any help here would be greatly appreciated, as I have been working solidly on this for more than a day now.
Log File entry (sample):
FATAL: could not connect to the primary server:
FATAL: no pg_hba.conf entry for replication connection from host "192.168.20.2", user "postgres", SSL off
FATAL: could not connect to the primary server: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
A couple of options would work for me:
Convert the database from being a read-only replication database, to a standard read/write database or
Dump/drop the entire database so I can create a new one with write capabilities.
It looks like the two database clusters have been set up for replication, but configuration changes on one of the machines broke the replication (changed pg_hba.conf on the primary, changed IP addresses, …).
Here is the way to your desired solutions:
Bringing the standby out of recovery mode: Run
/path/to/pg_ctl promote -D /path/to/data/directory
on the standby as operating system user postgres.
Nuking the standby: Remove the data directory on the standby with rm -rf (or the equivalent on your operating system). Kill all PostgreSQL processes.
Then use initdb to create a new database cluster in the same location.
I have tried to do database replication in linux 7.0 red hat using postgresql.
I refered this URL for Confuring http://blog.terminal.com/postgresql-replication-and-backup-methods/ I completed the the steps upto this step
Configuring the slave server
but the step
Initial replication
when I executed this command in Master
-bash-4.2$ psql -c "select pg_start_backup('initial_backup');"
I got Error Like this
ERROR: WAL level not sufficient for making an online backup
HINT: wal_level must be set to "archive" or "hot_standby" at server start.
So kindly let me know where we are wrong.
On your master PostgreSQL server's configuration file <PG_DATA>/postgresql.conf there is parameter called wal_level it should be set to either "archive" or "hot_standby"
Both said to postgres to keep WAL segmetnts for replication server. The difference between the two is rather simple:
"hot_standby" means that your replication server will allow connections for SELECT statements
"archive" means you don't want that possibility.
More to read in this tutorial on streaming replication
Also keep in mind that change of wal_level property needs server restart to take an effect.
I've been a MySQL guy, and now I'm working with Postgres so I am learning. Wondering if someone can tell me why my postgres process on my macbook is sending and receiving data over my network. I am just noticing this is happening for the first time - so maybe it's been going on before this and I just never noticed postgres does this.
What has me a bit nervous, is that I pulled down a production datadump from our server which is set up with replication and I imported it to my local postgres db. The settings in my postgresql.conf don't indicate replication is turned on. So it shouldn't be streaming out to anything, right?
If someone has some insight into what may be happening, or why postgres is sending/receiving packets, I'd love to hear the easy answer (and the complex one if there's more to what's happening).
This is a postgres install via Homebrew on MacOSX.
Thanks in advance!
Some final thoughts: It's entirely possible, I guess, that Mac's activity monitor also shows local 'network' traffic stats. Maybe this isn't going out to the internets.....
In short, I would not expect replication to be enabled for a DB that was dumped from a server that had it if the server to which it was restored had no replication configured at all.
More detail:
Normally, to get a local copy of a database in Postgres, one would do a pg_dump of the remote database (this could be done from your laptop, pointing at your server), followed by a createdb on your laptop to create the database stub and then a pg_restore pointed at the dump to populate its contents. [Edit: Re-reading your post, it seems like you may perhaps have done this, but meant that the dump you used had replication enabled.)]
That would be entirely local (assuming no connections into the DB from off-box), so long as you didn't explicitly setup any replication or anything else that would go off-box. Can you elaborate on what exactly you mean by importing with replication?
Also, if you're concerned about remote traffic coming from Postgres, try running this command a few times over the period of a minute or two (when you are seeing the traffic):
netstat | grep postgres
In general, replication in Postgres in configured at a server level, and has to do with things such as the master server shipping WAL files to the standby server (for streaming replication). You would have almost certainly have had to setup entries in postgresql.conf and pg_hba.conf to ensure that the standby server had access (such as a replication entry in the latter conf file). Assuming you didn't do steps such as this, I think it can pretty safely be concluded that there's no replication going on (especially in conjunction with double-checking via netstat).
You might also double-check the Postgres log to see if it's doing anything replication related. In a default install, that'd probably be in /var/log/postgresql (although I'm not 100% sure if Homebrew installs put it somewhere else).
If it's UDP traffic, to and from a high port, it's likely to be PostgreSQL's internal statistics collector.
These are pre-bound to prevent interference and should not be accessible outside of PostgreSQL.