PostgreSQL: even read access changes data files on disk, leading to large incremental backups with pgbackrest

We are using pgbackrest to backup our database to Amazon S3. We do full backups once a week and an incremental backup every other day.
The size of our database is around 1 TB; a full backup is around 600 GB, and an incremental backup is still around 400 GB!
We found out that even read access (pure SELECT statements) on the database has the effect that the underlying data files (in /usr/local/pgsql/data/base/xxxxxx) change. This results in large incremental backups and also in high storage costs on Amazon S3.
Usually the files with low index names (e.g. 391089.1) change on read access.
On an update, we see changes in one or more files - the index could correlate to the age of the row in the table.
Some more facts:
Postgres version 13.1
Database is running in docker container (docker version 20.10.0)
OS is CentOS 7
We see the phenomenon on multiple servers.
Can someone explain why PostgreSQL changes data files on pure read access?
We tested this on an otherwise idle database with nothing else accessing it.

This is normal. Some cases I can think of right away are:
a SELECT or other SQL statement setting a hint bit
This is a shortcut for subsequent statements that access the data, so they don't have to consult the commit log any more.
a SELECT ... FOR UPDATE writing a row lock
autovacuum removing dead row versions
These are leftovers from DELETE or UPDATE.
autovacuum freezing old visible row versions
This is necessary to prevent data corruption if the transaction ID counter wraps around.
The only fairly reliable way to prevent PostgreSQL from modifying a table in the future is to:
never perform an INSERT, UPDATE or DELETE on it
run VACUUM (FREEZE) on the table and make sure that there are no concurrent transactions
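A minimal sketch of that last step (the table name is hypothetical), assuming no concurrent transactions are open:

    VACUUM (FREEZE) my_table;
    -- sanity check: after the freeze, every page should be marked all-visible,
    -- so plain SELECTs have no hint bits or visibility work left to do
    SELECT relpages, relallvisible FROM pg_class WHERE relname = 'my_table';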


How to see changes in a postgresql database

My PostgreSQL database is updated each night.
At the end of each nightly update, I need to know what data changed.
The update process is complex, taking a couple of hours and requires dozens of scripts, so I don't know if that influences how I could see what data has changed.
The database is around 1 TB in size, so any method that requires starting a temporary database may be very slow.
The database is an AWS RDS instance. I have automated backups enabled (these are different from RDS snapshots, which are user-initiated). Is it possible to see the difference between two RDS automated backups?
I do not know whether it is possible to diff RDS snapshots, but in the past we tested several solutions for a similar problem; maybe you can take some inspiration from them.
The obvious solution is, of course, an auditing system. That way you can see in a relatively simple way what was changed - down to column values, depending on the granularity of your auditing. Of course, there is an impact on your application due to the auditing triggers and the queries into the audit tables.
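A minimal sketch of such a trigger-based audit, with hypothetical table names (the IF/ELSIF branching follows the pattern of the audit example in the PostgreSQL documentation):

    CREATE TABLE audit_log (
        changed_at timestamptz NOT NULL DEFAULT now(),
        table_name text NOT NULL,
        operation  text NOT NULL,
        old_row    jsonb,
        new_row    jsonb
    );

    CREATE FUNCTION audit_changes() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'DELETE' THEN
            INSERT INTO audit_log (table_name, operation, old_row)
            VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD));
            RETURN OLD;
        ELSIF TG_OP = 'UPDATE' THEN
            INSERT INTO audit_log (table_name, operation, old_row, new_row)
            VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD), to_jsonb(NEW));
            RETURN NEW;
        ELSE  -- INSERT
            INSERT INTO audit_log (table_name, operation, new_row)
            VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
            RETURN NEW;
        END IF;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER my_table_audit
    AFTER INSERT OR UPDATE OR DELETE ON my_table
    FOR EACH ROW EXECUTE PROCEDURE audit_changes();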
Another possibility, for tables with primary keys: you can store the values of the primary key together with the 'xmin' and 'ctid' hidden system columns (https://www.postgresql.org/docs/current/static/ddl-system-columns.html) for each row before the update, and compare them with the values after the update. This way you can identify changed / inserted / deleted rows, but not changes in individual columns.
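A hedged sketch of that idea (table and key names hypothetical; xmin is cast to text because the xid type lacks ordering operators):

    -- before the nightly update: snapshot keys plus row versions
    CREATE TABLE pre_update_state AS
    SELECT id, xmin::text AS row_version, ctid FROM my_table;

    -- after the update: rows whose version changed were updated
    SELECT t.id
    FROM my_table t
    JOIN pre_update_state s USING (id)
    WHERE t.xmin::text <> s.row_version;

    -- rows deleted by the update
    SELECT s.id FROM pre_update_state s
    LEFT JOIN my_table t USING (id) WHERE t.id IS NULL;

    -- rows inserted by the update
    SELECT t.id FROM my_table t
    LEFT JOIN pre_update_state s USING (id) WHERE s.id IS NULL;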
You can set up a streaming replica using replication slots (and, to be on the safe side, WAL archiving as well). Then stop replication on the replica before the updates and compare the data after the updates using dblink selects. But these queries can be very heavy.
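For example, run from the primary after the update, the frozen replica serving as the "before" image (connection details and the (id, payload) column list are hypothetical):

    CREATE EXTENSION IF NOT EXISTS dblink;

    -- rows that are new or changed relative to the pre-update state
    SELECT * FROM my_table
    EXCEPT
    SELECT *
    FROM dblink('host=replica-host dbname=mydb',
                'SELECT id, payload FROM my_table')
         AS before_image(id bigint, payload text);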

Slow insert and update commands during MySQL to Redshift replication

I am trying to build a replication pipeline from MySQL to Redshift by parsing the MySQL binlog. For the initial replication, I take a dump of the MySQL table, convert it into a CSV file, upload it to S3, and then use the Redshift COPY command. This performs efficiently.
After the initial replication, for the continuous sync, when I read the binlog the inserts and updates have to be run sequentially, which is very slow.
Is there anything that can be done for increasing the performance?
One possible solution I can think of is to wrap the statements in a transaction and send the transaction at once, to avoid multiple network calls. But that would not address the problem that single UPDATE and INSERT statements in Redshift run very slowly; a single UPDATE statement takes 6 seconds. Knowing the limitations of Redshift (it is a columnar database, and single-row insertion is slow), what can be done to work around those limitations?
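One common work-around (not from the original thread; all names here are hypothetical) is to avoid row-by-row statements entirely: batch the binlog changes into a file on S3, COPY them into a staging table, and merge in bulk:

    BEGIN;

    CREATE TEMP TABLE staging (LIKE target_table);

    -- bulk-load the batched binlog changes
    COPY staging
    FROM 's3://my-bucket/binlog-batch.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
    CSV;

    -- one bulk delete replaces many single-row UPDATE/DELETE statements
    DELETE FROM target_table
    USING staging
    WHERE target_table.id = staging.id;

    -- one bulk insert replaces many single-row INSERTs
    INSERT INTO target_table SELECT * FROM staging;

    COMMIT;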
Edit 1:
Regarding DMS: I want to use Redshift as a warehousing solution that just replicates our MySQL continuously; I don't want to denormalise the data, since I have 170+ tables in MySQL. During ongoing replication, DMS shows many errors multiple times a day and fails completely after a day or two, and it's very hard to decipher the DMS error logs. Also, when I drop and reload tables, it deletes the existing tables on Redshift, creates new ones, and then starts inserting data, which causes downtime in my case. What I wanted was to create a new table, switch the old one with the new one, and then delete the old table.
Here is what you need to do to get DMS to work:
1) Create and run a DMS task with "migrate and ongoing replication" and "Drop tables on target".
2) This will probably fail; do not worry. "Stop" the DMS task.
3) On Redshift, make the following changes to the tables (see the sketch after this list):
Change all dates and timestamps to varchar (because the options used by DMS for the Redshift COPY cannot cope with the '0000-00-00 00:00:00' dates you get in MySQL).
Change all bool columns to varchar (due to a bug in DMS).
4) On DMS, modify the task to "Truncate" in "Target table preparation mode".
5) Restart the DMS task (full reload).
Now the initial copy and ongoing binlog replication should work.
Make sure you are on the latest replication instance software version.
Make sure you have followed the instructions here exactly:
http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MySQL.html
If your source is Aurora, also make sure you have set binlog_checksum to "none" (this is poorly documented).
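For step 3, since the reload in steps 4-5 repopulates the data anyway, the simplest approach may be to recreate each affected table with varchar columns. A hedged sketch with hypothetical names and sizes:

    DROP TABLE my_table;
    CREATE TABLE my_table (
        id         bigint,
        created_at varchar(32),   -- was timestamp
        is_active  varchar(8),    -- was boolean, due to the DMS bug
        payload    varchar(1024)
    );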

Best way to backup and restore data in PostgreSQL for testing

I'm trying to migrate our database engine from MS SQL to PostgreSQL. In our automated tests, we restore the database to a "clean" state at the start of every test. We do this by computing the "diff" between the working copy of the database and the clean copy (table by table), then copying over any records that have changed and deleting any records that have been added. So far this strategy seems to be the best way to go for us, because per test not a lot of data is changed, and the size of the database is not very big.
Now I'm looking for a way to essentially do the same thing but with PostgreSQL. I'm considering doing the exact same thing with PostgreSQL. But before doing so, I was wondering if anyone else has done something similar and what method you used to restore data in your automated tests.
On a side note - I considered using MsSql's snapshot or backup/restore strategy. The main problem with these methods is that I have to re-establish the db connection from the app after every test, which is not possible at the moment.
If you're okay with some extra storage, and if you (like me) are particularly not interested in re-inventing the wheel in terms of checking for diffs via your own code, you should try creating a new DB (per run) via templates feature of createdb command (or CREATE DATABASE statement) in PostgreSQL.
For example:
(from bash) createdb todayDB -T snapshotDB
or
(from psql) CREATE DATABASE todayDB TEMPLATE snapshotDB;
Pros:
In theory, always exact same DB by design (no custom logic)
Cloning is a file-level copy, not a SQL-level restore, so it takes far less time (it doesn't run SQL again, recreate indexes, restore tables, etc.)
Cons:
Takes 2x the disk space (although the template could live on a low-performance NFS, etc.)
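One caveat worth adding (not in the original answer): PostgreSQL refuses to clone a database while other sessions are connected to the template, so a test run may need to kick connections first. A sketch with hypothetical database names:

    -- the template must have no other active connections while cloning
    SELECT pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE datname = 'snapshotDB' AND pid <> pg_backend_pid();

    DROP DATABASE IF EXISTS "todayDB";
    CREATE DATABASE "todayDB" TEMPLATE "snapshotDB";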
For my specific situation, I decided to go back to the original solution, which is to compare the "working" copy of the database with the "clean" copy of the database.
There are 3 types of changes.
For INSERTed records - find max(id) in the clean table and delete any record in the working table that has a higher ID.
For UPDATEd or DELETEd records - find all records in the clean table EXCEPT the records found in the working table, then UPSERT those records into the working table.
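A hedged sketch of both steps, assuming a table with columns (id, payload), a unique constraint on id, and the clean copy living in a separate schema (all names hypothetical; ON CONFLICT needs PostgreSQL 9.5+):

    -- 1) undo INSERTs: drop rows beyond the clean copy's highest id
    DELETE FROM working.my_table
    WHERE id > (SELECT max(id) FROM clean.my_table);

    -- 2) undo UPDATEs/DELETEs: upsert every clean row that is missing
    --    from, or different in, the working copy
    WITH diff AS (
        SELECT * FROM clean.my_table
        EXCEPT
        SELECT * FROM working.my_table
    )
    INSERT INTO working.my_table
    SELECT * FROM diff
    ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload;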

postgres copy database to another server reduces database size

Installed Postgres 9.1 on both machines.
Initially the DB size was 7052 MB; then I used the following command to copy it to another server:
pg_dump -C dbname | bzip2 | ssh remoteuser@remotehost "bunzip2 | psql dbname"
After the copy completed successfully, I checked the size on the destination machine; it shows 6653 MB.
Then I checked the table count; it is the same.
Has there been data loss? Is there missing data?
Note:
Both machines have the same hardware and software configuration.
I used:
SELECT pg_size_pretty(pg_database_size('dbname'));
One of PostgreSQL's most sophisticated features is so-called Multi-Version Concurrency Control (MVCC), a standard technique for avoiding conflicts between reads and writes of the same object in a database. MVCC guarantees that each transaction sees a consistent view of the database by reading non-current data for objects modified by concurrent transactions. Thanks to MVCC, PostgreSQL has great scalability, a robust hot backup tool, and many other nice features comparable to the most advanced commercial databases.
Unfortunately, there is one downside to MVCC: databases tend to grow over time, and sometimes that can be a problem. In recent versions of PostgreSQL there is a separate server process called the autovacuum daemon, whose purpose is to keep the database size reasonable. It does that by trying to recover reusable chunks of the database files. Still, there are many scenarios that will force the database to grow even if the amount of useful data in it doesn't really change. That typically happens if you have lots of UPDATE and/or DELETE statements in the applications that use the database.
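To see how much of that growth is dead rows waiting to be reclaimed, the statistics views can help; for example:

    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;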
When you copy the database via dump and restore, that extraneous space is not carried over, so the copied DB appears smaller.
That looks normal. Databases are often smaller after restore, because a newly created b-tree index is more compact than one that's been progressively built by inserts. Additionally, UPDATEs and DELETEs leave empty space in the tables.
So you have nothing to worry about. You'll find that if you diff an SQL dump from the old DB and a dump taken from the just-restored DB, they'll be the same except for comments.

is it possible to fork a mysqldump of data?

I am restoring a MySQL database with Perl on a remote server, with about 30 million records. It's taking more than 2 days, and looking at my network connections, I am not fully utilizing my uplink bandwidth. I will need to do this at least once per week. Is there a way to fork a mysqldump (I'm using Perl) so that I can take full advantage of my bandwidth? (I don't mind if I'm choked off for a bit; I just need to get this done faster.)
Can't you upload the whole dump to the remote server and start the restore there?
A restore of a mysqldump is just the execution of a long series of commands that would restore your database from scratch. If the execution path for that is: 1) send command, 2) remote system executes command, 3) remote system replies that the command is complete, 4) send next command - then you are spending most of your time waiting on network latency.
I do know that most SQL hosts will allow you to upload a dump file specifically to avoid the kinds of restore time that you're talking about. The company that takes my money each month even has a web-based form that you can use to restore a database from a file that has been uploaded via sftp. Poke around your hosting service's documentation. They should have something similar. If nothing else (and you're comfortable on the command line) you can upload it directly to your account and do it from a shell there.
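If you have shell access on the remote end, the whole round trip might look like this (hostnames and database name hypothetical):

    # ship the compressed dump once, then restore locally on the remote box
    gzip -c dump.sql > dump.sql.gz
    scp dump.sql.gz user@remotehost:/tmp/
    ssh user@remotehost 'gunzip -c /tmp/dump.sql.gz | mysql dbname'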
mk-parallel-dump and mk-parallel-restore are designed to do what you want, but in my testing mk-parallel-dump was actually slower than plain old mysqldump. Your mileage may vary.
(I would guess the biggest factor would be the number of spindles your data files reside on, which in my case, 1, was not especially conducive to parallelization.)
First caveat: mk-parallel-* writes a bunch of files, and figuring out when it's safe to start sending them (and when you're done receiving them) may be a little tricky. I believe that's left as an exercise for the reader, sorry.
Second caveat: mk-parallel-dump is specifically advertised as not being for backups. Because "At the time of this release there is a bug that prevents --lock-tables from working correctly," it's really only useful for databases that you know will not change, e.g., a slave that you can STOP SLAVE on with no repercussions, and then START SLAVE once mk-parallel-dump is done.
I think a better solution than parallelizing a dump may be this:
If you're doing your mysqldump on a weekly basis, you can just do it once - dumping with --single-transaction (which you should be doing anyway) and --master-data=n - and then start a slave that connects over an SSH tunnel to the remote master, so the slave is continually updated. The disadvantage is that if you want to clone a local copy (perhaps to make a backup) you will need enough disk to keep an extra copy around. The advantage is that a week's worth of (query-based) replication log is probably quite a bit smaller than resending the data, and it also arrives gradually, so you don't clog your pipe.
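The seed dump for that setup might look like this (database name hypothetical); --master-data=2 records the binlog coordinates as a comment so you can point the slave at the right position:

    mysqldump --single-transaction --master-data=2 dbname | gzip > seed.sql.gz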
How big is your database in total? What kind of tables are you using?
A big risk with backups using mysqldump has to do with table locking, and updates to tables during the backup process.
The mysqldump backup process basically works as follows:
For each table {
    Lock table as Read-Only
    Dump table to disk
    Unlock table
}
The danger is that if you run an INSERT/UPDATE/DELETE query that affects multiple tables while your backup is running, your backup may not capture the results of your query properly. This is a very real risk when your backup takes hours to complete and you're dealing with an active database. Imagine: your code runs a series of queries that update tables A, B, and C. The backup process currently has table B locked.
The update to A will not be captured, as this table was already backed up.
The update to B will not be captured, as the table is currently locked for writing.
The update to C will be captured, because the backup has not reached C yet.
This is an easy way to destroy referential integrity in your database.
Your backup process needs to be atomic, and transactional. If you can't shut down the entire database to writes during the backup process, you're risking disaster.
Also - there must be something wrong here. At a previous company, we were running nightly backups of a 450 GB MySQL DB (the largest table had 150M rows), and it took less than 6 hours for the backup to complete.
Two thoughts:
Do you have a slave database? Run the backup from there - Stop replication (preventing RW risk), run the backup, restart replication.
Are your tables using InnoDB? Consider investing in InnoDB Hot Backup, which solves this problem, because the backup process leverages the journaling that is part of the InnoDB storage engine.