Forgot to make a backup. Now I have harddrive with databases and new system with empty postgres. Can I somehow restore databases? by simple copy of files etc?
If you have the full data directory of your old postgresql system (and if it was the same version, or differing only in a revision number) you can just try to put it in place of your data directory in your new postgresql installation. (Of course, stop postgres server before doing this).
It's basically the same procedure used when upgrading postgresql, when there is no need to do backup-restore.
Edit: As pointed out in the comments, I assume not only same (or almost same) version, but same architecture (32 - 64 bits , Linux - Windows, etc)
In addition to the leonbloy's answer, you could try pg_migrator, especially if you need to upgrade from 8.3 to 8.4 (and 9.0 eventually).
In your case you have the files, but if you haven't, Maybe, only maybe, you can do something with the logs of the database, you can try to see the log of the statements in the database normally in /var/log/postgresql/postgresql.log, if it is there or close to it, and if log_statements = 'mod' or 'all' is set up before, you can recovery some of your data.
Table by table, by searching by insert into in this tables in all or recent history of database. You can cut text with some Unix tools to get only the statements and put a ";" at the end of each statement, and another important queries like delete, etc.
But you must to do it table by table, and data must be there, and database don't runned up too much time without backups.
In certain cases you just need the last operation or something like this to save the day.
This, however, its just for Apolo 13 disasters moment and never can replace a good backup.
Related
We are using pgbackrest to backup our database to Amazon S3. We do full backups once a week and an incremental backup every other day.
Size of our database is around 1TB, a full backup is around 600GB and an incremental backup is also around 400GB!
We found out that even read access (pure select statements) on the database has the effect that the underlying data files (in /usr/local/pgsql/data/base/xxxxxx) change. This results in large incremental backups and also in very large storage (costs) on Amazon S3.
Usually the files with low index names (e.g. 391089.1) change on read access.
On an update, we see changes in one or more files - the index could correlate to the age of the row in the table.
Some more facts:
Postgres version 13.1
Database is running in docker container (docker version 20.10.0)
OS is CentOS 7
We see the phenomenon on multiple servers.
Can someone explain, why postgresql changes data files on pure read access?
We tested on a pure database without any other resources accessing the database.
This is normal. Some cases I can think of right away are:
a SELECT or other SQL statement setting a hint bit
This is a shortcut for subsequent statements that access the data, so they don't have t consult the commit log any more.
a SELECT ... FOR UPDATE writing a row lock
autovacuum removing dead row versions
These are leftovers from DELETE or UPDATE.
autovacuum freezing old visible row versions
This is necessary to prevent data corruption if the transaction ID counter wraps around.
The only way to fairly reliably prevent PostgreSQL from modifying a table in the future is:
never perform an INSERT, UPDATE or DELETE on it
run VACUUM (FREEZE) on the table and make sure that there are no concurrent transactions
I often have to execute complex sql scripts in a single transaction on a large PostgreSQL database and I would like to verify everything that was changed during the transaction.
Verifying each single entry on each table "by hand" would take ages.
Dumping the database before and after the script to plain sql and using diff on the dumps isn't really an option since each dump would be about 50G of data.
Is there a way to show all the data that was added, deleted or modified during a single transaction?
Dude, What are you looking for is the most searchable thing on the internet when it comes to capturing Database changes. It is a kind of version control we can say.
But as long as I know, sadly there are no in-built approaches are available in PostgreSQL or MySql. But you can overcome it by setting/adding some triggers for your most usable operations.
You can create some backup schemas, and tables to capture your changes that are changed(updated), created, or deleted.
In this way you can achieve what you want. I know this process is fully manual, But really effective.
If you need to analyze the script's behaviour only sporadically, then the easiest approach would be to change server configuration parameter log_min_duration_statement to 0 and then back to any value it had before the analysis. Then all of the script activity will be written to the instance log.
This approach is not suitable if your storage is not prepared to accommodate this amount of data, or for systems in which you don't want sensitive client data to be written to a plain-text log file.
Background of the issue (could be irrelevent, but only relation to these issues makes sense for me)
In our production environment, disk space had run out. (We do have monitoring and notifications for this, but no-one read them - the classical)
Anyway, after fixing the issue Postgresql PostgreSQL 9.4.17 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit has shown couple of weird behaviors.
1. Unique indexes
I have couple of (multi column) unique indexes specified for the database, but they do not appear to be functioning. However I can find duplicate rows from the database.
2. Sorting based on date
We got one table which is basically just logging some json data. We got three columns: id, json and insertedAt DEFAULT NOW(). If I do simple query where I try to sort based on the insertedAt column, the sorting doesn't work around the area of disk overflow. All of the data is valid and readable, but order is invalid
3. Db dumps / backups are having some corruption.
Again, when I was browsing this logging data and tried to recover a backup to my local machine for better observation it gave an error around some random row. When I examined the sql-file with text editor, I encountered that data was otherwise valid expect that it was missing some semicolons on some rows. I think I'll give a try shortly for the never backup if it's still having the same error or if it was random issue with the one backup I tried playing with.
I've tried the basic ones: restarting the machine and PG process.
I'm trying to migrate our database engine from MsSql to PostgreSQL. In our automated test, we restore the database back to "clean" state at the start of every test. We do this by comparing the "diff" between the working copy of the database with the clean copy (table by table). Then copying over any records that have changed. Or deleting any records that have been added. So far this strategy seems to be the best way to go about for us because per test, not a lot of data is changed, and the size of the database is not very big.
Now I'm looking for a way to essentially do the same thing but with PostgreSQL. I'm considering doing the exact same thing with PostgreSQL. But before doing so, I was wondering if anyone else has done something similar and what method you used to restore data in your automated tests.
On a side note - I considered using MsSql's snapshot or backup/restore strategy. The main problem with these methods is that I have to re-establish the db connection from the app after every test, which is not possible at the moment.
If you're okay with some extra storage, and if you (like me) are particularly not interested in re-inventing the wheel in terms of checking for diffs via your own code, you should try creating a new DB (per run) via templates feature of createdb command (or CREATE DATABASE statement) in PostgreSQL.
So for e.g.
(from bash) createdb todayDB -T snapshotDB
or
(from psql) CREATE DATABASE todayDB TEMPLATE snaptshotDB;
Pros:
In theory, always exact same DB by design (no custom logic)
Replication is a file-transfer (not DB restore). So far less time taken (i.e. doesn't run SQL again, doesn't recreate indexes / restore tables etc.)
Cons:
Takes 2x the disk space (although template could be on a low performance NFS etc)
For my specific situation. I decided to go back to the original solution. Which is to compare the "working" copy of the database with "clean" copy of the database.
There are 3 types of changes.
For INSERT records - find max(id) from clean table and delete any record on working table that has higher ID
For UPDATE or DELETE records - find all records in clean table EXCEPT records found in working table. Then UPSERT those records into working table.
On my PostgreSQL 8.0 database, I started receiving a "ERROR: could not open relation 1663/17269/16691: No such file or directory" message, and now my data is inaccessible.
Any ideas on how to recover at least some of the data? Professional support is an option.
Regards.
RP
If you want your data back in a hurry and it's worth something to you, then the professional support option should be simple enough.
Some things to check, now that you've got a full backup of all your database (that's base, pg_clog, pg_xlog and all the other folders at that level).
Does that file actually exist? It might be a permissions problem rather than the file actualy going missing.
Check your anti-virus/security packages - have they mistakenly quarantined the file? If you can exclude PostgreSQL's database directories from scans/active scans that's worthwhile too.
Make a note of everything you can remember about when this happened and what happened just before. This will help with troubleshooting for you or a consultant.
Check the logs likewise - this error will be logged, find the first occurrence and see if there's anything odd before.
Double-check you really do have all your existing files backed up, and restart PostgreSQL.
Try connecting as user postgres to database postgres or database template1. If that works then the file is one of your database files rather than the global list of users or some such.
Try creating an empty file with the right name (and permissions - check the other files). If you are really lucky it's just an index. Otherwise it could be a data table you can live without. Then you can dump other tables individually.
OK - if you're here then you can connect to your DB. Those numbers in the file-path are PostgreSQL's OIDs identifying system objects. You can try a couple of useful queries here. These two queries should give you the IDs of the databases and then the object with the missing file. This is useful information for your professional too.
SELECT oid, datname, dattablespace FROM pg_database;
SELECT * FROM pg_class WHERE relfilenode = 16691;
Remember make sure you have the filesystem backup before tinkering.