missing chunk number 0 for toast value 37946637 in pg_toast_2619

Main Issue:
Getting "ERROR: missing chunk number 0 for toast value 37946637 in pg_toast_2619" while selecting from tables.
Steps that led to the issue:
- Used pg_basebackup from a Primary db and tried to restore it onto a Dev host.
- Did a pg_resetxlog -f /${datadir} and started up the Dev db.
- After starting up the Dev db, when I query a varchar column, I keep getting:
psql> select text_col_name from big_table;
ERROR: missing chunk number 0 for toast value 37946637 in pg_toast_2619
This seems to be happening for most varchar columns in the restored db.
Has anyone else seen it?
Does anyone have ideas of why it happens and how to fix it?

pg_resetxlog is very much a last-resort utility that you should avoid whenever possible. The easiest way to make a fully working backup is to use pg_basebackup with the -X s option (that is an uppercase X). With this option, pg_basebackup opens two connections: one to copy all the data files and one to receive all of the WAL that is written for the duration of the backup. That way you cannot run into the problem that parts of the WAL you need have already been deleted.
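For example, something along these lines, where the host name, user, and target directory are placeholders for your own setup:
pg_basebackup -h primary.example.com -U replication -D /path/to/new/datadir -X stream -P
(-X stream is the long form of -X s; -P just reports progress while the copy runs.)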

I tried a few things since my original question. I can confirm that the source of my error "ERROR: missing chunk number 0 for toast value 37946637 in pg_toast_2619" was running pg_resetxlog during the restore process.
I re-did the restore today, but this time I applied the pg_xlog files from the Primary using recovery.conf. The restored db started up fine and all queries are running as expected.
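For reference, a minimal sketch of what that recovery.conf can look like, assuming the WAL files copied from the Primary sit in a local directory (the path is a placeholder):
restore_command = 'cp /path/to/copied/pg_xlog/%f %p'
With no recovery target set, the server replays all the WAL it can fetch, reaches consistency, and then comes up normally.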

Related

I have loaded wrong psql dump into my database, anyway to revert?

Ok, I screwed up.
I dumped one of my psql (9.6.18) staging database with the following command
pg_dump -U postgres -d <dbname> > db.out
And after doing some testing, I "restored" the data using the following command.
psql -f db.out postgres
Notice the absence of the -d option? Yup. The 'postgres' argument was supposed to be the username, but psql took it as the database name.
And as the database happened to have the same name as its user, it overwrote the 'default' database (postgres), which had data that other QAs were using.
I cancelled the operation as soon as I realised my mistake, but the damage was already done. Around 1/3 to 1/2 of the database is now roughly identical to the staging database - at least in terms of the schema.
Is there any way to revert this? I am still looking for any other dumps, in case one of these guys made one, but I don't think there has been one in the past two to three months. Seems like I have no choice but to own up and apologise to them in the morning.
Without a recent dump or some sort of PITR/replication setup, you can't easily undo this. The only option is to manually go through the log of what was restored and remove/alter it in the postgres database. That will work for the schema; the data is another matter.
FYI, the postgres database should not really be used as a 'working' database. It is there to be a database to connect to for doing other operations, such as CREATE DATABASE, or to bootstrap your way into a cluster. If it had been left empty, the above would not have been a problem: you could simply have done, from another database, DROP DATABASE postgres; and then CREATE DATABASE postgres.
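For illustration only, if postgres really were empty, that reset would look roughly like this, run while connected to another database such as template1:
psql -U postgres -d template1 -c "DROP DATABASE postgres;"
psql -U postgres -d template1 -c "CREATE DATABASE postgres;"
(You cannot drop the database you are currently connected to, hence the detour through template1.)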
Do you have a capture of the output of the psql -f db.out postgres run?
Since the pg_dump didn't specify --clean or -c, it should not have overwritten anything, just appended. And if your tables have unique or primary keys, most of the data copy operations should have failed with unique key violations and rolled back. Even one overlapping row (per table) would roll back the entire dataset for that table.
Without having the output, it will be hard to figure out what damage has actually been done.
You should also immediately copy the pg_xlog data someplace safe. If it comes down to it, you might be able to use pg_xlogdump to figure out what changes committed and what did not.
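A rough example of that, with the directory and segment name as placeholders (on 9.6 the tool is called pg_xlogdump; it was renamed pg_waldump in version 10):
pg_xlogdump -p /safe/copy/of/pg_xlog -r Transaction 000000010000000000000010
The -r Transaction filter limits the output to commit/abort records, which is what you need to see which transactions actually committed.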

PostgreSQL PANIC: WAL contains references to invalid pages

I have a problem with a PostgreSQL database running as a replica of the master database server. The database on the master runs without any problems, but the replica runs only for a few hours (a random amount of time each run) and then crashes with this reason:
WARNING: page 3318889 of relation base/16389/19632 is uninitialized
...
PANIC: WAL contains references to invalid pages
Do you have any idea what is wrong, please? I have not been able to solve this problem for many days! Thanks.
There have been several Postgres bugs with these symptoms, and a lot of them have already been fixed. Please check whether your Postgres is on the latest minor release. If it is, report this issue to the mailing list: https://www.postgresql.org/list/pgsql-hackers/.
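To see exactly which release the replica is running, you can ask the server itself:
SELECT version();
Compare that against the newest minor release for your major version on postgresql.org before reporting.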

pg_dump FATAL: segment too big

pg_dump is failing with the error message:
"pg_dump FATAL: segment too big"
What does that mean?
PostgreSQL 10.4 on Ubuntu 16.04.
It appears that pg_dump passes the error messages it receives from the queries it runs through into the logs.
The following line in the logs (maybe buried deeper if you have busy logs) shows the query that failed.
In this case, we had a corrupted sequence. Any query on the sequence, whether interactive, via a column default, or via pg_dump, returned the "segment too big" error and killed the querying process.
I figured out the new start value for the sequence, dropped the dependencies, and created a new sequence starting where the old one left off and then put the dependencies back.
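In outline, that swap can look something like this; the table, column, and sequence names here are made up, and the start value came from looking at the current maximum of the column:
ALTER TABLE my_table ALTER COLUMN id DROP DEFAULT;
DROP SEQUENCE my_table_id_seq;
CREATE SEQUENCE my_table_id_seq START WITH 123457;
ALTER TABLE my_table ALTER COLUMN id SET DEFAULT nextval('my_table_id_seq');
ALTER SEQUENCE my_table_id_seq OWNED BY my_table.id;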
pg_dump worked fine after that.
It is not clear why or how a sequence could get so corrupted that merely accessing it produces a session-killing error. We did have a recent database hard crash, though, so it may be related. (Although that sequence is accessed very rarely and it is unlikely we went down in the middle of incrementing it.)

postgres index contains "unexpected zero page at block" exceptions

I have the following errors in the pg_log file several thousand times. How do I resolve them?
index "meeting_pkey" contains unexpected zero page at block 410.Please REINDEX it.
index "faultevent_props_pkey" contains unexpected zero page at block 37290.
index "faultevent_pkey" contains unexpected zero page at block 1704
The cause of the issue is bad index pages that PostgreSQL is unable to read.
Reindex the problematic index to overcome the issue:
REINDEX INDEX <schema_name>.<index_name>;
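For the first index in your log, assuming it lives in the public schema, that would be:
REINDEX INDEX public.meeting_pkey;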
Here you have some hints.
Your database is corrupt.
Try to run pg_dumpall to get a logical dump of the database.
If that fails, buy support from somebody who can salvage data from corrupted databases.
If it succeeds:
- Check the hardware, particularly storage and RAM.
- Once you are certain the hardware is OK, install the latest patch update for your PostgreSQL version.
- Create a new database cluster with initdb.
- Restore the dump into the new cluster.
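In outline, that dump-and-restore path can look something like this; the paths are placeholders:
pg_dumpall > /backup/full_dump.sql        # taken from the old cluster while it is still up
initdb -D /path/to/new/datadir            # new cluster, on hardware you have verified
pg_ctl -D /path/to/new/datadir start
psql -f /backup/full_dump.sql postgres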
Did you have crashes recently?
Did you test if your storage handles fsync requests properly?
Do you have any dangerous settings like fsync = off?
I ran into this issue and after a lot of reading I decided to do a complete DB reindex:
REINDEX DATABASE <database_name>;
and it solved the issue for me. Hope this helps you.

SQL2059W A device full warning - when trying to bring tablespace online

Trying to do a DB2 import as part of a system copy and the transaction logs filled up. The import was cancelled, a transaction log backup ran, and the number of logs was increased to use approximately 90% of the available disk (previously 70%).
Restarted the DB and kicked off the import again, but now it errors because of the tablespace state - running db2 list tablespaces show detail shows I have 4 tablespaces in Backup Pending state.
So I tried db2 backup database <SID> tablespace <SID>#BTABI online but I get the error:
SQL2059W A device full warning was encountered on device "/db2/db2". Do you want to continue(c), terminate this device only(d), abort the utility(t) ? (c/d/t) t
No option works but to terminate.
The thing is, the device isn't full. There is no activity on the DB; running db2 list applications gives:
SQL1611W No data was returned by Database System Monitor.
Running db2 "select log_utilization_percent,dbpartitionnum from sysibmadm.log_utilization order by 2" to show the log utilization returns 0.
There are no logs in use and the filesystem has free space. I even tried reducing the number of logs again to make sure, but I get the same issue.
I tried db2 "alter tablespace <SID>#BTABI switch online" instead, and although this returns a 'success' statement it doesn't actually do anything - my tablespaces are still in Backup Pending.
Any ideas, please?
You're trying to write the backup images to the /db2/db2 file system, which doesn't have enough space to hold the backup image(s).
Note: when you execute BACKUP DATABASE as in your example above without specifying where to send the backup (i.e. you don't use to /dir/ectory or another option like use TSM), DB2 will write the backup image to the current directory. Make sure you specify where to store the backup image, and that the location has enough free space to hold it. If you don't care about recoverability and are just trying to get the table space out of Backup Pending state, you can specify /dev/null as the location, as @mustaccio suggests in the comments above.
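For example, reusing your own command but adding an explicit target (the first path is a placeholder for a filesystem with enough free space):
db2 backup database <SID> tablespace <SID>#BTABI online to /backup/with/enough/space
db2 backup database <SID> tablespace <SID>#BTABI online to /dev/null
The second form discards the backup image; it only serves to clear the Backup Pending state.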
Also: you may want to look at the COMMITCOUNT option of the import utility so you're not trying to insert all the data in one massive transaction.
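A sketch of that, with the input file and target table made up for illustration:
db2 import from /path/to/data.del of del commitcount 10000 insert into <SCHEMA>.<TABLE>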
As per the comments above - I just kept running the import, resetting the 'pending load' status each time with:
load from /dev/null of del terminate into SAPECD.
A few packages fail each time but the rest process. Letting it finish, resetting again, and restarting the import gets through a little more each time.