Repair corrupt PostgreSQL database

I have multiple errors with my PostgreSQL DB, which resulted from a power surge:
I cannot access most tables in my database. When I try, for example, select * from ac_cash_collection, I get the following error:
ERROR: missing chunk number 0 for toast value 118486855 in pg_toast_2619
When I try pg_dump, I get the following error:
Error message from server: ERROR: relation "public.st_stock_item_newlist" does not exist
pg_dump: The command was: LOCK TABLE public.st_stock_item_newlist IN ACCESS SHARE MODE
I went ahead and tried to run a reindex of the whole database. I actually left it running, went to sleep, and found in the morning that it had not done anything, so I had to cancel it.
I need some help to fix this as soon as possible. Please help.

Before you do anything else, read http://wiki.postgresql.org/wiki/Corruption and act on the instructions. Failing to do so risks making the problem worse.
There are two configuration parameters listed in the Fine Manual that might be of use: ignore_system_indexes and zero_damaged_pages. I have never used them, but I would if I were desperate ...
I don't know if they help against TOAST tables. In any case, if setting them makes your database(s) usable again, I would {backup + drop + restore} to get all tables and catalogs into newborn shape again. Success!
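A minimal sketch of how those parameters could be applied, assuming superuser access (mydb is a placeholder database name; zero_damaged_pages irreversibly zeroes unreadable pages as they are read, so only try it against a file-level copy of the cluster):
PGOPTIONS="-c ignore_system_indexes=on" psql -U postgres mydb   # backend-start parameter, set at connection time
SET zero_damaged_pages = on;        -- superuser-only session parameter
SELECT * FROM ac_cash_collection;   -- retry the failing query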

If you have backups, just restore from them.
If not - you've just learned why you need regular backups. There's nothing PostgreSQL can do if hardware misbehaves.
In addition, if you ever find yourself in this situation again, first stop PostgreSQL and take a complete file-level backup of everything - all tablespaces, WAL etc. That way you have a known starting point.
So - if you still want to recover some data:
Try dumping individual tables. Get what you can this way.
Drop indexes if they cause problems.
Dump sections of tables (id=0..9999, 10000..19999, etc.) - that way you can identify where some rows may be corrupted and dump ever-smaller sections to recover what's still good.
Try dumping just certain columns - large text values are stored out-of-line (in TOAST tables), so avoiding them might get the rest of your data out. A sketch of these tactics follows this list.
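A hedged sketch of those tactics (mydb and the column names are placeholders, not from the question):
pg_dump -t ac_cash_collection mydb > ac_cash_collection.sql   # one table at a time
psql mydb -c "COPY (SELECT id, amount FROM ac_cash_collection WHERE id BETWEEN 0 AND 9999) TO STDOUT" > rows_0_9999.tsv   # a row range, selected columns only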
If you've got corrupted system tables then you're getting into a lot of work.
Then you'll need to go through and audit what you've recovered and try to figure out what's missing or incorrect.
There are more things you can do (creating empty blocks in some cases can let you dump partial data) but they're all more complicated and fiddly and unless the data is particularly valuable not worth the effort.
Key message to take away from this - make sure you take regular backups, and make sure they work.

Before you do ANYTHING ELSE, take a complete file-system-level copy of the damaged database.
http://wiki.postgresql.org/wiki/Corruption
Failure to do so destroys evidence about what caused the corruption, and means that if your repair efforts go badly and make things worse you can't undo them.
Copy it now!
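A minimal sketch of such a copy (paths and <version> are placeholders for a Debian-style layout; adjust to your installation):
sudo systemctl stop postgresql
sudo cp -a /var/lib/postgresql/<version>/main /safe/place/main-copy
sudo tar -C /var/lib/postgresql/<version> -czf /safe/place/main.tar.gz main   # or keep it as an archive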

If only a few specific files are corrupted, the following tricks might help.
Restore an older dump on a different node or a second installation.
Copy the required files from the restored/second installation to the failed node.
Stop & start PostgreSQL.
From today's experience!
Error message from server: ERROR: could not read block 226448 in file "base/12345/12345.1": Input/output error
First try a copy (it will probably fail on the bad blocks):
cp base/12345/12345.1 /root/backup/12345.1-orig
Then try a move, expecting it to finish:
mv base/12345/12345.1 /root/backup/12345.1-orig
If even that fails, remove the damaged file and the partial copy: rm -f base/12345/12345.1 /root/backup/12345.1-orig
Finally, the magic of tar (if the tar below completes, you're in luck!):
tar -zcvf my_backup.tar.gz /var/lib/postgresql/xx/main/xx
Extract the corrupted file from the tar.
Replace it in the original location, base/12345/12345.1.
Stop & start PostgreSQL.
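A hedged sketch of that extract-and-replace step (GNU tar stores member paths without the leading /; <member-path> stands for whatever the first command prints):
tar -ztf my_backup.tar.gz | grep 12345.1    # locate the file inside the archive
tar -zxf my_backup.tar.gz <member-path>
cp <member-path> base/12345/12345.1         # back into the cluster, then restart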
IMPORTANT: please try googling, and do try VACUUM, REINDEX, and disk checks like fsck, etc. before getting to this stage.
Also, always take a filesystem backup before doing any trial-and-error method :)

Related

Schema pg_dump failed due to a Lock on a table

I'm running a backup/restore on a schema every day and get this every now and then:
pg_dump: Error message from server: ERROR: relation not found (OID 86157003)
DETAIL: This can be validly caused by a concurrent delete operation on this object.
pg_dump: The command was: LOCK TABLE myschema.products IN ACCESS SHARE MODE
How can this be avoided? It seems the table was being used at the time, or someone was running something against it. Can I just kill all connections to the DB before restoring, or is there another alternative?
As far as I understand, pg_dump should be able to run even while users are doing something with the table, but that doesn't seem to be the case.
Thanks,
It is somewhat buried but the answer lies here:
https://www.postgresql.org/docs/current/app-pgdump.html
"
-j njobs
...
To detect this conflict, the pg_dump worker process requests another shared lock using the NOWAIT option. If the worker process is not granted this shared lock, somebody else must have requested an exclusive lock in the meantime and there is no way to continue with the dump, so pg_dump has no choice but to abort the dump.
"
Which is borne out by this in the error message:
"LOCK TABLE myschema.products IN ACCESS SHARE MODE"
ACCESS SHARE will cooperate with all other lock modes except ACCESS EXCLUSIVE. ACCESS EXCLUSIVE is used by DROP TABLE, TRUNCATE, REINDEX, etc. See the Locks documentation for more information. So you need to run the dump at a time when the operations listed under ACCESS EXCLUSIVE are known not to happen, or block/drop connections.
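A hedged sketch of checking for a conflicting session before dumping (the table name is the one from the error; run as a superuser):
SELECT a.pid, l.mode, a.query
FROM pg_locks l JOIN pg_stat_activity a USING (pid)
WHERE l.relation = 'myschema.products'::regclass;
-- if something holds ACCESS EXCLUSIVE, it can be terminated:
-- SELECT pg_terminate_backend(<pid from above>);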
Somebody dropped a table between the time pg_dump took an inventory of the tables and the time it tries to dump the table.
This can happen if your application is in the habit of dropping tables all the time.
This is not an answer to your main question, but a caution regarding:
As far as I understand, pg_dump should be able to run even while users are doing something with the table, but that doesn't seem to be the case.
It assumes that the application performs every action in a single transaction. I have known of applications which accomplish some tasks using more than one.
I don't know exactly what the tasks were or if it was unavoidable that they use multiple transactions, but dumps could only be trusted when the application was idle or, better yet, when the service was stopped.
For the function that those applications performed, it wasn't a big deal to work around down times or stop services.
I don't know how you'd determine this behaviour without being told by the developers. Just something to consider.

PostgreSQL is showing odd behavior after the disk ran out of space

Background of the issue (could be irrelevant, but it's the only connection to these issues that makes sense to me):
In our production environment, disk space had run out. (We do have monitoring and notifications for this, but no one read them - the classic.)
Anyway, after fixing the issue, PostgreSQL (PostgreSQL 9.4.17 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit) has shown a couple of weird behaviors.
1. Unique indexes
I have a couple of (multi-column) unique indexes specified for the database, but they do not appear to be enforced: I can find duplicate rows in the database.
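A hedged way to list the duplicates that a unique index on, say, (col_a, col_b) should have prevented (table and column names are placeholders):
SELECT col_a, col_b, count(*)
FROM the_table
GROUP BY col_a, col_b
HAVING count(*) > 1;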
2. Sorting based on date
We have one table which basically just logs some JSON data. It has three columns: id, json, and insertedAt DEFAULT NOW(). If I do a simple query sorting on the insertedAt column, the sorting doesn't work around the time of the disk overflow. All of the data is valid and readable, but the order is wrong.
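A hedged diagnostic, assuming the mis-ordering comes from a corrupted index on insertedAt being used for the ORDER BY (log_table is a placeholder): disable index scans so the rows are actually sorted, then compare the output.
SET enable_indexscan = off;
SET enable_bitmapscan = off;
SELECT id, "insertedAt" FROM log_table ORDER BY "insertedAt" LIMIT 100;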
3. DB dumps/backups have some corruption.
Again, when I was browsing this logging data and tried to restore a backup to my local machine for better observation, it gave an error around some random row. When I examined the SQL file with a text editor, I found the data was otherwise valid except that some rows were missing semicolons. I'll shortly try a newer backup to see whether it has the same error or whether it was a random issue with the one backup I was playing with.
I've tried the basic ones: restarting the machine and PG process.

Export from mongo database file to bson

I have a mongo database db.ns, db.0, db.1, ... db.7
I accidentally removed all the data from a collection, but the database files (explored with vim) still contain all (or part of) the data.
I tried to recover the data by moving the files to another mongodb instance, and with mongorestore; I also tried mongodump, but the collection appears empty.
I tried to recover from scratch, directly from the files: with bsondump on each one, and on a single concatenated file (cat db.ns db.1 ... > bigDB), but nothing.
I don't know what other ways there are to recover the data from a mongo database file.
Any suggestions?? Thx!!!
[SOLVED]
I will try to explain what I did to "solve" the problem.
First. Theory.
In this SlideShare you can see a little of how MongoDB database files work:
http://www.slideshare.net/mdirolf/inside-mongodb-the-internals-of-an-opensource-database
Options:
When you accidentally remove a collection:
the first thing you have to do is quickly copy the whole database directory (normally /data/db or /var/lib/mongodb) and stop the service.
Remove the journal directory, try to recover from this copy, and pray ;D
You can see more about that, here:
mongodb recovery removed records
In my case, this did not work for me.
When journaling is on, Mongo does not update its database files directly, only their indexes.
So you can access the files (named database.ns, database.0, database.1, ...) and try to recover from them.
These files are split BSON and binary, so you can open them and see all the information.
In my case, I wrote a simple PHP function that first reads a file and splits it into smaller files.
Then it takes them one by one, applies some regular expressions to remove hexadecimal values, splits the data into records (you can use the "_id" key for that), and does some other cleanup tasks.
And finally, I had to process all the preprocessed data manually to extract the information.
I think I lost at least 15-25% of the information, but I prefer to think that I recovered 75% of what was lost.
Caution:
This is not an easy or safe way to solve this problem. In my case, the DB only received information and never modified or updated it.
With this method a lot of information will be lost: Mongo IDs, integers, and dates can't be recovered.
The process is 100% manual; you can spend time automating certain tasks, but that will depend on your database structure.

How to Recover PostgreSQL 8.0 Database

On my PostgreSQL 8.0 database, I started receiving an "ERROR: could not open relation 1663/17269/16691: No such file or directory" message, and now my data is inaccessible.
Any ideas on how to recover at least some of the data? Professional support is an option.
Regards.
RP
If you want your data back in a hurry and it's worth something to you, then the professional support option should be simple enough.
Some things to check, now that you've got a full backup of all your database (that's base, pg_clog, pg_xlog and all the other folders at that level).
Does that file actually exist? It might be a permissions problem rather than the file actually going missing.
Check your anti-virus/security packages - have they mistakenly quarantined the file? If you can exclude PostgreSQL's database directories from scans/active scans that's worthwhile too.
Make a note of everything you can remember about when this happened and what happened just before. This will help with troubleshooting for you or a consultant.
Check the logs likewise - this error will be logged, find the first occurrence and see if there's anything odd before.
Double-check you really do have all your existing files backed up, and restart PostgreSQL.
Try connecting as user postgres to database postgres or database template1. If that works then the file is one of your database files rather than the global list of users or some such.
Try creating an empty file with the right name (and permissions - check the other files). If you are really lucky it's just an index. Otherwise it could be a data table you can live without. Then you can dump other tables individually.
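A hedged sketch of that step (the relative path comes from the numbers in the error message, 1663/17269/16691; stop the server first and keep your file-level backup):
touch $PGDATA/base/17269/16691
chown postgres:postgres $PGDATA/base/17269/16691
chmod 600 $PGDATA/base/17269/16691   # match the permissions of the neighbouring files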
OK - if you're here then you can connect to your DB. Those numbers in the file-path are PostgreSQL's OIDs identifying system objects. You can try a couple of useful queries here. These two queries should give you the IDs of the databases and then the object with the missing file. This is useful information for your professional too.
SELECT oid, datname, dattablespace FROM pg_database;
SELECT * FROM pg_class WHERE relfilenode = 16691;
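If the pg_class lookup shows the missing file belonged to an index rather than a table (check the relkind column), you may be able to simply rebuild it; the name here is whatever that query returned:
REINDEX INDEX the_index_name;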
Remember: make sure you have the filesystem backup before tinkering.

how to restore postgresql DB without backup

I forgot to make a backup. Now I have a hard drive with the databases and a new system with an empty Postgres. Can I somehow restore the databases, e.g. by simply copying the files?
If you have the full data directory of your old postgresql system (and if it was the same version, or differs only in the revision number), you can just try to put it in place of the data directory in your new postgresql installation. (Of course, stop the postgres server before doing this.)
It's basically the same procedure used when upgrading postgresql, when there is no need to do a backup-restore.
Edit: As pointed out in the comments, I assume not only the same (or almost the same) version, but also the same architecture (32 vs 64 bits, Linux vs Windows, etc.)
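A minimal sketch of that swap (paths, <version>, and the old-disk mount point are placeholders for a Debian-style layout; versions and architecture must match; run as root):
systemctl stop postgresql
mv /var/lib/postgresql/<version>/main /var/lib/postgresql/<version>/main.empty   # keep the fresh dir around
cp -a /mnt/old-disk/var/lib/postgresql/<version>/main /var/lib/postgresql/<version>/
chown -R postgres:postgres /var/lib/postgresql/<version>/main
systemctl start postgresql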
In addition to leonbloy's answer, you could try pg_migrator, especially if you need to upgrade from 8.3 to 8.4 (and eventually 9.0).
In your case you have the files, but if you didn't: maybe, only maybe, you could do something with the database logs. Try looking at the statement log, normally in /var/log/postgresql/postgresql.log (or somewhere close to it); if log_statement = 'mod' or 'all' was set beforehand, you can recover some of your data.
Go table by table, searching for the INSERT INTO statements for those tables in the full or recent history of the database. You can cut the text with some Unix tools to get only the statements and put a ";" at the end of each one, and likewise for other important queries like DELETE, etc.; see the sketch below.
But you have to do it table by table, the data must actually be there, and the database must not have run too long without backups.
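A hedged sketch of that extraction, assuming each statement was logged on a single line (the log path, database, and table name are placeholders):
grep -o "INSERT INTO my_table .*" /var/log/postgresql/postgresql.log | sed 's/$/;/' > my_table_replay.sql
psql mydb -f my_table_replay.sql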
In certain cases you just need the last operation or something like that to save the day.
This, however, is just for Apollo 13 disaster moments and can never replace a good backup.