SQL Live Backup Over Intermittent Connection - postgresql

I have a few PCs that have local PostgreSQL databases running, just logging data. Data is only ever inserted, never removed or updated. The remote PCs are connected to the internet by cellular modem and depending on their location, often do not have internet access. When they do have an internet connection I would like them to push a copy of their databases to a central location and keep the remote database up to date with any new data. Essentially, I need an 'rsync' for databases.
At first it seemed like what I need is to set up PostgreSQL Hot-Standby but I'm unsure if this is actually what I need because my situation seems to differ from the examples I've seen.
Each remote PC has a Postgres server with a single database that has a unique name, the tables within the DBs have generic names. I would like to synchronize these databases to a single remote Postgres server. I think this should be okay due to the unique DB names.
My connectivity is very intermittent, days to weeks without a connection. I've seen PgAdmin be very reliable despite a terrible (cellular) internet connection, if Postges Hot-Standby is the same I may be alright.
As far as I can see my options are either to set up PostgreSQL Hot-Standby, or roll my own solution. I don't want to roll my own solution. However it is simple enough if I can't find anything better; a Python daemon run by systemd to find the diff between the local and remote DB, then push the new rows from the local to the remote DB. But I'm sure someone has solved this problem, I just haven't found the solution yet.

You don't need hot standby (which is the PostgreSQL term for being able to query the replicated database), but streaming replication. You need a central standby server for each intermittently connected remote database server. If you use replication slots, you can be sure that replication will never fall behind.

Related

Synchronize local changes to remote database

I got a requirement to setup 2-way data sync between main remote Postgresql DB and a local one. Main DB is used for multi-tenant access, and local db is used only by one tenant. So I need to sync only this tenant data. Local DB is installed on tenant's site and should be used only when internet connection is down. So, when there is internet connection, use remote DB and sync all the changes to local. When internet is down, switch to local DB. When internet is back online, sync local changes to remote.
I tried to figure out how to set this up, but it seems that Postgres replication isn't suitable for this case.
I need a tool that is capable of partial table sync both ways. Should I, maybe, consider changing DB design?

"error: too many connections for database 'postgres'" when trying to connect to any Postgres 13 instance

My team and I are currently experiencing an issue where we can't connect to Cloud SQL's Postgres instance(s) from anything other than the psql cli tool. We get a too many connections for database "postgres" error (in PGAdmin, DBeaver, and our node typeorm/pg backend). It initially happened on our (only) Postgres database instance. After restarting, stopping and starting again, increasing machine CPU/memory proved to do nothing, I deleted the database instance entirely and created a new one from scratch.
However, after a few hours the problem came back. I know that we're not actually having too many connections as I am able to query pg_stat_activity from psql command line and see the following:
Only one of those (postgres username) connections is ours.
My coworker also can't connect at all - not even from psql cli.
If it matters, we are using PostgreSQL 13, europe-west2 (London), single zone availability, db-g1-small instance with 1.7GB memory, 10GB HDD, and we have public IP enabled and the correct IP addresses whitelisted.
I'd really appreciate if anyone has any insights into what's causing this.
EDIT: I further increased the instance size (to no longer be a shared core), and I managed to successfully connect my backend to it. However my psql cli no longer works - it appears that only the first client to connect is allowed to connect after a restart (even if it disconnects, other clients can't connect...).
From the error message, it is clear that the database "postgres" has a custom connection limit (set, for example, by ALTER DATABASE postgres CONNECTION LIMIT 1). And apparently, it is quite small. Why is everyone try to connect to that database anyway? Usually 'postgres' database is reserved for maintenance operations, and you should create other databases for daily use.
You can see the setting with:
select datconnlimit from pg_database where datname='postgres';
I don't know if the low setting is something you did, or maybe Google does it on its own for their cloud offering.
#jjanes had the right idea/mention.
I created another database within the Cloud SQL instance that wasn't named postgres and then it was fine.
It wasn't anything to do with maximum connection settings (as this was within Google Cloud SQL) or not closing connections (as TypeORM/pg does this already).

How to exit out of database recovery mode (currently locked in read-only mode)

A slave database was set up some time ago for the purpose of backing up or replicating a remote database. However I can no longer write to the database using a Delphi based ETL (the ETL works for another database pair, but to date has never been used for this particular pair). The replication database was setup by somebody else who has since left the company. I am reasonably sure this has been setup as a replication database, however the employee who has since left told me that replication never worked for unrelated reasons. Using the ETL we can (using SQL queries) read from the one database, and write back to the replication database, Or should be able to, as it is currently read only.
I have tried:
Maintenance such as VACUUM
Attempt to drop tables and the entire database
Restore a full backup from the master database
None of these work, and I am told the database is read-only.
I have looked at postgresql.conf and see that hot_standby is checked, so I think (but am not 100% certain) that the database is in some sort of replication mode (I've never touched replication as supported by Postgres, so I wouldn't know).
I have checked permissions in pg_hba.conf and see there are some credentials in there for replication. I am not sure whether this activates "replication mode" for the database, or simply means these credentials are for replication only.
I have been through months worth of log files (This has not been working since our IT department upgraded the entire network about 5 months ago). I see the log file contents seen below, repeated over and over with nothing else for months. Note the IP address shown below is listed in the pg_hba.conf file, so credentials are valid.
The database is in recovery mode, as I have found by using:
select pg_is_in_recovery();
This explains to me why it's read only, but why can I not restore databases, or just simply dump the entire database and start again (it's a backup so losing/restoring it is not an issue)?
I was tempted to try modifying the recovery.conf file (which exists) but I read/believe that once recovery has been initiated (which in my case it has) modifying the file will have no effect.
I'm using a legacy version of Postgres: 9.2.9
Any help here would be greatly appreciated, as I have been working solidly on this for more than a day now.
Log File entry (sample):
FATAL: could not connect to the primary server:
FATAL: no pg_hba.conf entry for replication connection from host "192.168.20.2", user "postgres", SSL off
FATAL: could not connect to the primary server: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
A couple of options would work for me:
Convert the database from being a read-only replication database, to a standard read/write database or
Dump/drop the entire database so I can create a new one with write capabilities.
It looks like the two database clusters have been set up for replication, but configuration changes on one of the machines broke the replication (changed pg_hba.conf on the primary, changed IP addresses, …).
Here is the way to your desired solutions:
Bringing the standby out of recovery mode: Run
/path/to/pg_ctl promote -D /path/to/data/directory
on the standby as operating system user postgres.
Nuking the standby: Remove the data directory on the standby with rm -rf (or the equivalent on your operating system). Kill all PostgreSQL processes.
Then use initdb to create a new database cluster in the same location.

Why is Postgres sending data somewhere? [duplicate]

I've been a MySQL guy, and now I'm working with Postgres so I am learning. Wondering if someone can tell me why my postgres process on my macbook is sending and receiving data over my network. I am just noticing this is happening for the first time - so maybe it's been going on before this and I just never noticed postgres does this.
What has me a bit nervous, is that I pulled down a production datadump from our server which is set up with replication and I imported it to my local postgres db. The settings in my postgresql.conf don't indicate replication is turned on. So it shouldn't be streaming out to anything, right?
If someone has some insight into what may be happening, or why postgres is sending/receiving packets, I'd love to hear the easy answer (and the complex one if there's more to what's happening).
This is a postgres install via Homebrew on MacOSX.
Thanks in advance!
Some final thoughts: It's entirely possible, I guess, that Mac's activity monitor also shows local 'network' traffic stats. Maybe this isn't going out to the internets.....
In short, I would not expect replication to be enabled for a DB that was dumped from a server that had it if the server to which it was restored had no replication configured at all.
More detail:
Normally, to get a local copy of a database in Postgres, one would do a pg_dump of the remote database (this could be done from your laptop, pointing at your server), followed by a createdb on your laptop to create the database stub and then a pg_restore pointed at the dump to populate its contents. [Edit: Re-reading your post, it seems like you may perhaps have done this, but meant that the dump you used had replication enabled.)]
That would be entirely local (assuming no connections into the DB from off-box), so long as you didn't explicitly setup any replication or anything else that would go off-box. Can you elaborate on what exactly you mean by importing with replication?
Also, if you're concerned about remote traffic coming from Postgres, try running this command a few times over the period of a minute or two (when you are seeing the traffic):
netstat | grep postgres
In general, replication in Postgres in configured at a server level, and has to do with things such as the master server shipping WAL files to the standby server (for streaming replication). You would have almost certainly have had to setup entries in postgresql.conf and pg_hba.conf to ensure that the standby server had access (such as a replication entry in the latter conf file). Assuming you didn't do steps such as this, I think it can pretty safely be concluded that there's no replication going on (especially in conjunction with double-checking via netstat).
You might also double-check the Postgres log to see if it's doing anything replication related. In a default install, that'd probably be in /var/log/postgresql (although I'm not 100% sure if Homebrew installs put it somewhere else).
If it's UDP traffic, to and from a high port, it's likely to be PostgreSQL's internal statistics collector.
These are pre-bound to prevent interference and should not be accessible outside of PostgreSQL.

Mirror one database to another in PostgreSQL

I know the way to set up a Master/Slave DB in Postgres is having 2 DB servers, but unfortunately i have only one server for now.
How can i mirror my production db into another "backup db" in "real_time"? I want to give access to another user to the mirrored db, so even if he does something there it will not affect production.
Nothing stops you setting up hot standby streaming replication, or another replication option like Londiste, between two PostgreSQL instances on the same computer.
The two copies of PostgreSQL must use different ports, but that's the only real restriction.
How to set up the second PostgreSQL instance depends on your operating system and how you installed PostgreSQL, which you have not mentioned.
You'll want to use streaming replication with hot standby if you want a read-only replica. If you want it to be read/write, then you can do a one-off copy of the database with pg_basebackup and not keep them in sync after that. Or you can use a tool like Londiste to replicate changes selectively.
You can run multiple instances of PostgreSQL on the same computer, by using different ports.