PostgreSQL: invalid page header in block

I'm getting an error
ERROR: invalid page header in block 411 of relation "t_value_time"
in my PostgreSQL database. This keeps happening on different machines. Is there a way to prevent it from happening, or at least to tell PostgreSQL to ignore the data in the invalid block and move on?
I'd rather lose the data from that block and have it skip over it, reading the rest of the data. Is there a way to tell PostgreSQL to skip this block?

WARNING: You will lose some data!
We managed to get over it (crashed DEV VM) by issuing:
database=# SET zero_damaged_pages = on;
SET
database=# VACUUM FULL damaged_table;
WARNING: invalid page header in block xxx of relation base/yyy/zzz; zeroing out page
[..]
REINDEX TABLE damaged_table;

Same block every time?
From what I've read, the most common cause of invalid blocks is hardware. Red Hat has a utility, pg_filedump, that formats "PostgreSQL heap, index, and control files into a human-readable form". I don't think they support any PostgreSQL version greater than 8.4.0, but I could be wrong.
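If you want to look at the suspect block before doing anything destructive, pg_filedump can be pointed at the relation's data file. This is only a sketch; the file path placeholders come from pg_relation_filepath() (available on reasonably recent versions), and the exact options can differ between pg_filedump releases:
database=# SELECT pg_relation_filepath('t_value_time');
$ pg_filedump -f -R 411 411 $PGDATA/base/<dboid>/<relfilenode>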
You will want to prove your hardware is good by running thorough disk, RAM, and NIC diagnostics.

There's no simple way to do it, but it's reasonably easy to do just by editing the data file directly (relfilenode of the pg_class entry gives the filename).
Just copy a block from elsewhere in the file over the bad block. Ideally, synthesise an empty block or update the one you're overwriting to have no valid tuples in it.
Once you've got something that doesn't produce that error, dump the table and reload it for safety.
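For example, a minimal sketch of the approach above, zeroing the bad block with the server stopped (PostgreSQL treats an all-zeros page as empty). The 8192 block size assumes the default build, and all paths and names are placeholders:
$ pg_ctl -D $PGDATA stop
$ dd if=/dev/zero of=$PGDATA/base/<dboid>/<relfilenode> bs=8192 seek=411 count=1 conv=notrunc
$ pg_ctl -D $PGDATA start
$ pg_dump -t t_value_time db_name > t_value_time_salvaged.sql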

These are almost always hardware problems, by the way. Verify and test RAM, disk, and CPU. Make sure your environment is good (bad power input can cause problems, as can overheating). That's the best way to prevent it. The best way to address it is point-in-time recovery from a base backup.

If you have a standby (slave), set hot_standby_feedback to 'on' on it if it isn't already; this helps keep the long-running dump below from being cancelled by replication conflicts.
Do pg_dump and write it to /dev/null so that you don't consume any space.
nohup pg_dump db_name -v -Fc -f /dev/null &
If the dump succeeds, then your slave is fine. Do a failover. There will be no data loss.
Another way to validate your standby is to run select count(*) from table_name; and check with EXPLAIN that the plan is a sequential scan.
If the count succeeds using a sequential scan, then your standby is good, because a sequential scan reads every page of the table.
This check doesn't tell you much if the plan uses an index scan instead.
Note: This works only if your master is affected by storage-level corruption.
I happened to face the same issue just today and I was able to fix it.
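For reference, a minimal sketch of the sequential-scan check described above, run on the standby (table_name is a placeholder):
database=# EXPLAIN SELECT count(*) FROM table_name;
database=# SELECT count(*) FROM table_name;
If EXPLAIN shows a Seq Scan and the count completes, every heap page of that table was read successfully.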

Related

How to ignore invalid byte sequences when doing a pg_dump

I need a way to just ignore all invalid characters when doing dumps.
I get over a thousand tables from hundreds of sources on a regular basis and much of it is of questionable quality. The way I am getting data into the database sidesteps any incorrect or invalid encoding. I have no need to repair any of them.
But when I am backing up my database pg_dump chokes on every one of them. It seemed like this was not a problem before I moved to version 9.6.
I have three choices. Go fix each table when it chokes. Run a process to strip all non-ASCII characters from every field in every table. Both of those take time and add a step to my backup procedure.
Or I can find a switch that just tells pg_dump to stop being so picky. Is there any way to do this?
PostgreSQL tries to be 100% exact about encoding, and the checks have become tighter from version to version. That may explain what you observe.
PostgreSQL has no pg_dump switch to disable the encoding check.
If you don't care about encoding, you could use the database encoding SQL_ASCII, but even that won't allow zero bytes in strings (if you have those).
You could use a physical backup with pg_basebackup, which will solve your immediate problem, but expect more trouble down the road.
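For example, a minimal pg_basebackup sketch; the host, user, and target directory are placeholders:
$ pg_basebackup -h db_host -U replication_user -D /backups/base -Ft -z -P
Here -Ft writes tar archives, -z compresses them, and -P reports progress. A physical backup copies the data files byte for byte, so it won't complain about encoding, but it also preserves whatever problems are already in them.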

PostgreSQL - Recovery of Functions' code following accidental deletion of data files

So, I am (well... I was) running PostgreSQL within a container (Ubuntu 14.04 LTS with all the recent updates; the back-end storage is "dir" for convenience).
To cut a long story short, the container folder got deleted. Using extundelete and ext4magic, I have managed to extract some of the database's physical files (it appears as if most of the files are there... but I'm not 100% sure if and what is missing).
I have two copies of the database files: one from 9.5.3 (which appears to be more complete) and one from 9.6 (I upgraded the container very recently to 9.6, but it appears to be missing data files).
All I am after is to attempt to extract the SQL code that relates to the user-defined functions. Is anyone aware of an approach that I could try?
P.S.: Last backup is a bit dated (due to bad practices really) so it would be last resort if the task of extracting the needed information is "reasonable" and "successful".
Regards,
G
Update - 20/4/2017
I was hoping for a "quick fix" by somehow extracting the function body text off the recovered data files... however, nothing's free in this life :)
Starting from the old-ish backup along with the recovered logs, we managed to cover a lot of ground into bringing the DB back to life.
Lessons learned:
1. Do implement a good backup/restore strategy
2. Do not store backups on the same physical machine
3. Hardware failure can be disruptive... Human error can be disastrous!
If you can reconstruct enough of a data directory to start postgres in single-user mode, you might be able to dump pg_proc. But this seems unlikely.
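If you do get that far, a rough sketch of pulling function source out in single-user mode (the data directory and database name are placeholders); prosrc holds the raw function body, and pg_get_functiondef() reconstructs a complete CREATE FUNCTION statement for ordinary functions:
$ postgres --single -D /recovered/data your_db_name
backend> SELECT proname, prosrc FROM pg_proc;
backend> SELECT pg_get_functiondef(oid) FROM pg_proc WHERE proname = 'your_function';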
Otherwise, if you're really lucky you'll be able to find the relation for pg_proc and its corresponding pg_toast relation. The latter will often contain compressed text, so searches for parts of variables you know appear in function bodies may not help you out.
Anything stored inline in pg_proc will be short functions, significantly less than 8k long. Everything else will be in the toast relation.
To decode that you have to unpack the pages to get the toast hunks, then reassemble them and uncompress them (if compressed).
If I had to do this, I would probably create a table with the exact same schema as pg_proc in a new postgres instance of the same version. I would then find the relfilenode(s) for pg_catalog.pg_proc and its toast table using the relfilenode map file (if it survived) or by pattern matching and guesswork. I would replace the empty relation files for the new table I created with the recovered ones, restart postgres, and if I was right, I'd be able to select from the tables.
Not easy.
I suggest reading up on postgres's storage format as you'll need to understand it.
You may consider https://www.postgresql.org/support/professional_support/ . (Disclaimer, I work for one of the listed companies).
P.S.: Last backup is a bit dated (due to bad practices really) so it would be last resort if the task of extracting the needed information is "reasonable" and "successful".
Backups are your first resort here.
If the 9.5 files are complete and undamaged (or enough so to dump the schema) then simply copying them in place, checking permissions and starting the server will get you going. Don't trust the data though, you'll need to check it all.
Although it is possible to partially recover given damaged files, it's a long complicated process and the fact that you are asking on Stack Overflow probably means it's not for you.

I TRUNCATEd a table. How do I get the data back?

In my PostgreSQL database, I unfortunately ran TRUNCATE on the table mail_group, and its data is gone from the database. How do I get this data back?
Kindly help me; I'm waiting for a reply.
Thanks
Anyone else in the same situation: immediately stop your database with pg_ctl stop -m immediate (the immediate is important, you need to simulate a crash and prevent a checkpoint), then do not restart it. If you had concurrent transactions still in progress you might be really lucky and PostgreSQL might not have unlinked the backing files for the table yet, so it could maybe be recoverable.
You very likely can't get the data back, you deleted it. Restore from a backup.
A normal DELETE in PostgreSQL marks the rows as deleted but does not actually erase the data immediately, so it can often be recovered if you promptly stop the database and you don't write anything else to the table.
This is not the case for TRUNCATE. TRUNCATE deletes the underlying files that represent the database table from the file system.
Recovering the data, if possible at all, would require forensic analysis of your hard drive. If the data is truly important, then power the computer off now and take a disk image of the hard drive. Expect recovery work to cost several thousand dollars, if it is possible at all, since you will need someone who knows both (a) file system internals and (b) PostgreSQL internals. The only person I can think of who I know has the skills to possibly do this would probably charge about €5000 to €10000 for the time required for this sort of work. (It isn't me.)
If you didn't have backups you have just learned a very expensive lesson.
If someone else is reading this and DELETEd rows, please immediately follow the instructions in the corruption wiki article, since the first recovery steps are the same. This will not help if you ran TRUNCATE.

postgresql: Accidentally deleted pg_filenode.map

Is there any way to recover or re-create the pg_filenode.map file that was accidentally deleted? Or is there any solution for how to fix this issue without affecting the database? Any suggestions are highly appreciated! The PostgreSQL version we have is 9.0, running on Red Hat Linux 5. Thanks!
STOP TRYING TO FIX ANYTHING RIGHT NOW. Everything you do risks making it worse.
Treat this as critical database corruption. Read and act on this wiki article.
Only once you have followed its advice should you even consider attempting repair or recovery.
Since you may have some hope of recovering the deleted file if it hasn't been overwritten yet, you should also STOP THE ENTIRE SERVER MACHINE or unmount the file system PostgreSQL is on and disk image it.
If this data is important to you I advise you to contact professional support. This will cost you, but is probably your best chance of getting your data back after a severe administrator mistake like this. See PostgreSQL professional support. (Disclaimer: I work for one of the listed companies as shown in my SO profile).
It's possible you could reconstruct pg_filenode.map by hand using information about the table structure and contents extracted from the on-disk tables. Probably a big job, though.
First, if this is urgent and the data is valuable, I strongly recommend contacting professional support right away. However, if you can work on a disk image and it is not time-critical, here are some important points to note and how to proceed (we recently had to recover a bad pg_filenode.map). You are better off working on a disk image of a disk image.
What follows is what I learned from having to recover a file damaged by an incomplete write on the containing directory. It is current as of PostgreSQL 10, but that could change at any time.
Before you begin
Data recovery is risky business. Always note what recovery means to your organization, what data loss is tolerable, what downtime is tolerable etc before you begin. Work on a copy of a copy if you can. If anything doesn't seem right, circle back, evaluate what went wrong and make sure you understand why before proceeding.
What this file is and what it does
The standard file node map for PostgreSQL is stored in the pg_class relation, which is referenced by object ID inside the PostgreSQL catalogs. Unfortunately you need a way to bootstrap the mappings of the system tables themselves so you can look up this sort of information, and that is what pg_filenode.map provides.
In most deployments this file will never be written. It can be copied from a new initdb on the same version of Postgres with the same options passed to initdb aside from data directory. However this is not guaranteed.
Several things can change this mapping. If you do a vacuum full or similar on system catalogs, this can change the mapping from the default and then copying in a fresh file from an initdb will not help.
Some Things to Try
The first thing to try (on a copy of a copy!) is to replace the file with one from a fresh initdb onto another filesystem from the same server (this could be a thumb drive or whatever). This may work. It may not work.
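A rough sketch of that first attempt, using the same PostgreSQL version and initdb options as the damaged cluster (all paths are placeholders; note there is a pg_filenode.map in global/ and one in each base/<database-oid>/ directory):
$ initdb -D /tmp/fresh_cluster
$ cp /tmp/fresh_cluster/global/pg_filenode.map /copy-of-copy/data/global/pg_filenode.map
Per-database maps, if those are the missing ones, live under base/<oid>/pg_filenode.map.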
If that fails, then it would be possible perhaps to use pg_filedump and custom scripting/C programming to create a new file based on efforts to look through the data of each relation file in the data directory. This would be significant work as Craig notes above.
If you get it to work
Take a fresh pg_dump of your database and restore it into a fresh initdb. This way you know everything is consistent and complete.
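A minimal sketch of that final step (names and paths are placeholders):
$ pg_dump -Fc -f /backups/recovered_db.dump recovered_db
$ initdb -D /var/lib/pgsql/fresh_data
$ pg_ctl -D /var/lib/pgsql/fresh_data -l /tmp/fresh.log start
$ createdb recovered_db
$ pg_restore -d recovered_db /backups/recovered_db.dump
If the recovered cluster is still running on the same machine, run the fresh cluster on a different port and pass -p to createdb and pg_restore so they connect to the right one.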

Executing VACUUM FULL on postgresql db which is stopped/down

I am attempting to restart a PostgreSQL database which has stopped/is down and requires a VACUUM.
http://suwala.eu/blog/2010/10/09/how-to-vacuum-postgresql/
Following the above sequence of commands, I can't seem to get the last line to execute right.
$ postgres -D /var/lib/pgsql/data YOUR_DATABASE_NAME < /tmp/fix.sql
This gives me an error that says
postgres: invalid argument: "YOUR_DATABASE_NAME"
Try "postgres --help" for more information.
Any idea why?
CLARIFICATION
The 'YOUR_DATABASE_NAME' and the data directory I used on my server are the correct ones.
The referenced "how-to-vacuum-postgresql" page referenced in the question gives some very bad advice when it recommends VACUUM FULL. All that is needed is a full-database vacuum, which is simply a VACUUM run as the database superuser against the entire database (i.e., you don't specify any table name).
A VACUUM FULL works differently based on the version, but it eliminates all space within the heap files which is held by the database for quick re-use, and releases it to the OS. This can be much slower than the minimum needed to get back to a usable database, by orders of magnitude. And since any inserts or updates after the VACUUM FULL require OS calls to re-allocate space to the database, it can cause slower execution afterward, unless your database had a lot of bloat. (Although, if you turned off autovacuum, it might be in horrible shape, but you probably want to get back on your feet first, and sort that out later.)
Another issue with VACUUM FULL before version 9.0 is that while it eliminates bloat in a table's heap files, it tends to increase bloat in its index files, sometimes dramatically. If you issue a VACUUM FULL, you should normally follow it with a REINDEX to get the indexes back into good shape.
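For example, on those older versions (the table name is a placeholder):
database=# VACUUM FULL my_table;
database=# REINDEX TABLE my_table;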
The page referenced in the question also fails to heed the advice given in the PostgreSQL docs at http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND to use single-user mode:
since the system will not execute commands once it has gone into the
safety shutdown mode, the only way to do this is to stop the server
and use a single-user backend to execute VACUUM. The shutdown mode is
not enforced by a single-user backend. See the postgres reference page
for details about using a single-user backend.
As others have mentioned -- there is almost no use case where turning off autovacuum is beneficial. It may be useful to supplement the autovacuum activity with explicit vacuums on large tables, or you may want to adjust autovacuum configuration, but really -- don't turn it off or you will see bloat which saps performance and you'll run into transaction ID wraparound problems periodically. People who notice a performance hit when autovacuum is performing maintenance sometimes have an instinct to make it less aggressive in triggering, but that is usually counter-productive. It is generally better to adjust the autovacuum cost limitation parameters to pace the work, rather than have it neglect tables which need maintenance.
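As an illustration of pacing the work rather than disabling it, the relevant postgresql.conf settings look roughly like this (the values are illustrative, not recommendations):
autovacuum = on
autovacuum_vacuum_cost_delay = 10ms    # sleep between bursts of cleanup work; larger values are gentler but slower
autovacuum_vacuum_cost_limit = 200     # how much work is allowed before each sleep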
This appears to be an issue in PostgreSQL, as according to the documentation for 9.0 and 8.3 it should work with those versions, but it doesn't.
However, using the --single switch makes it work:
postgres --single -D [path-to-data-dir] [db-name] < /tmp/fix.sql
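For completeness, assuming the fix.sql from the linked blog post contains just a database-wide VACUUM, the whole step would look roughly like this (the data directory and database name are placeholders):
$ echo 'VACUUM;' > /tmp/fix.sql
$ postgres --single -D /var/lib/pgsql/data your_database_name < /tmp/fix.sql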