PostgreSQL behaving oddly after the disk ran out of space - postgresql

Background of the issue (could be irrelevant, but it's the only connection to these issues that makes sense to me)
In our production environment, disk space had run out. (We do have monitoring and notifications for this, but nobody read them - the classic situation.)
Anyway, after fixing that, PostgreSQL (PostgreSQL 9.4.17 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit) has shown a couple of weird behaviors.
1. Unique indexes
I have a couple of (multi-column) unique indexes specified for the database, but they do not appear to be functioning: I can find duplicate rows in the database (see the query sketch at the end of this question).
2. Sorting based on date
We have one table which is basically just logging some JSON data. It has three columns: id, json and insertedAt DEFAULT NOW(). If I run a simple query that sorts on the insertedAt column, the sorting doesn't work for rows inserted around the time of the disk overflow. All of the data is valid and readable, but the order is wrong.
3. DB dumps / backups show some corruption.
Again, while browsing this logging data I tried to restore a backup to my local machine for closer inspection, and it gave an error around some random row. When I examined the SQL file with a text editor, I found that the data was otherwise valid except that some rows were missing semicolons. I'll shortly try a newer backup to see whether it has the same error or whether it was a random issue with the one backup I was playing with.
I've tried the basics: restarting the machine and the PostgreSQL process.
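For reference, duplicates like the ones in point 1 can be confirmed with a grouping query along these lines (the table and column names are placeholders for the columns covered by one of the unique indexes):
-- any row returned here means the unique index is being violated
SELECT col_a, col_b, count(*) AS copies
FROM some_table
GROUP BY col_a, col_b
HAVING count(*) > 1;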

Related

PostgreSQL: even read access changes data files on disk, leading to large incremental backups using pgbackrest

We are using pgbackrest to backup our database to Amazon S3. We do full backups once a week and an incremental backup every other day.
The size of our database is around 1TB; a full backup is around 600GB and even an incremental backup is around 400GB!
We found out that even read access (pure select statements) on the database has the effect that the underlying data files (in /usr/local/pgsql/data/base/xxxxxx) change. This results in large incremental backups and also in very large storage (costs) on Amazon S3.
Usually the files with low index names (e.g. 391089.1) change on read access.
On an update, we see changes in one or more files - the index could correlate to the age of the row in the table.
Some more facts:
Postgres version 13.1
Database is running in docker container (docker version 20.10.0)
OS is CentOS 7
We see the phenomenon on multiple servers.
Can someone explain why PostgreSQL changes data files on pure read access?
We tested this on a quiet database with no other clients accessing it.
This is normal. Some cases I can think of right away are:
a SELECT or other SQL statement setting a hint bit
This is a shortcut for subsequent statements that access the data, so they don't have to consult the commit log any more.
a SELECT ... FOR UPDATE writing a row lock
autovacuum removing dead row versions
These are leftovers from DELETE or UPDATE.
autovacuum freezing old visible row versions
This is necessary to prevent data corruption if the transaction ID counter wraps around.
The only way to fairly reliably prevent PostgreSQL from modifying a table in the future is:
never perform an INSERT, UPDATE or DELETE on it
run VACUUM (FREEZE) on the table and make sure that there are no concurrent transactions
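A minimal sketch of that last step, assuming a table named logdata (the name is a placeholder):
-- Freeze all currently visible rows so that later reads and autovacuum runs
-- have no reason to rewrite the table's data files.
-- Run this while no concurrent transactions are touching the table.
VACUUM (FREEZE) logdata;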

DB2 Tables Not Loading when run in Batch

I have been working on a reporting database in DB2 for a month or so, and I have it set up to a pretty decent degree of what I want. I am, however, noticing small inconsistencies that I have not been able to work out.
Less important, but still annoying:
1) Users claim it takes two login attempts to connect: the first always fails, the second succeeds. (Is there a recommendation for what to check for this?)
More importantly:
2) Whenever I want to refresh the data (which will be nightly), I have a script that drops and then recreates all of the tables. There are 66 tables, each ranging from tens of records to just under 100,000 records. The data is not massive, and it takes about 2 minutes to run all 66 tables.
The issue is that once it says it has completed, there are usually at least 3-4 tables that did not load any data. So the table is dropped and then created, but is empty. The log shows that the command completed successfully, and if I run the statements independently they populate just fine.
If it helps, 95% of the commands are just CAST functions.
While I am sure I am not doing it the recommended way, is there a reason why a number of my tables are not populating? Are the commands executing too fast? Should I delay the CREATE after the DROP?
(This is DB2 Express-C 11.1 on Windows 2012 R2, The source DB is remote)
Example of my SQL:
DROP TABLE TEST.TIMESHEET;
CREATE TABLE TEST.TIMESHEET AS (
  SELECT NAME00, CAST(TIMESHEET_ID AS INTEGER(34)) TIMESHEET_ID ....
  .. (for 5-50 more columns)
  FROM REMOTE_DB.TIMESHEET
) WITH DATA;
It is possible to configure DB2 to tolerate certain SQL errors in nested table expressions.
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyfqetnint.html
When the federated server encounters an allowable error, the server allows the error and continues processing the remainder of the query rather than returning an error for the entire query. The result set that the federated server returns can be a partial or an empty result.
However, I assume that your REMOTE_DB.TIMESHEET is simply a nickname, and not a view with nested table expressions, and so any errors when pulling data from the source should be surfaced by DB2. Taking a look at the db2diag.log is likely the way to go - you might even be hitting a Db2 issue.
It might be useful to change your script to TRUNCATE and INSERT into your local tables and see if that helps avoid the issue.
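A sketch of that approach for the example table from the question, keeping the table definition in place and reloading only the rows (the SELECT list is elided just like in the original):
-- TRUNCATE must be the first statement of its transaction in Db2
TRUNCATE TABLE TEST.TIMESHEET IMMEDIATE;
INSERT INTO TEST.TIMESHEET
  SELECT NAME00, CAST(TIMESHEET_ID AS INTEGER(34)) TIMESHEET_ID ....
  .. (same remaining columns as above)
  FROM REMOTE_DB.TIMESHEET;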
As you say, you are maybe not doing things the most efficient way. You could consider using cache tables to take a periodic copy of your remote data: https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyvfed_tuning_cachetbls.html

SQLite to PostgreSQL data-only transfer (to maintain alembic functionality)

There are a few questions and answers already on PostgreSQL import (as well as the specific SQLite->PostgreSQL situation). This question is about a specific corner-case.
Background
I have an existing, in-production web-app written in python (pyramid) and using alembic for easy schema migration. Due to the database creaking with unexpectedly high write-load (probably due to the convoluted nature of my own code), I've decided to migrate to PostgreSQL.
Data migration
There are a few recommendations on data migration. The simplest one involved using
sqlite3 my.db .dump > sqlitedumpfile.sql
and then importing it with
psql -d newpostgresdb < sqlitedumpfile.sql
This required a bit of editing of sqlitedumpfile.sql - in particular, removing some incompatible statements and changing values (SQLite represents booleans as 0/1), etc. It ended up being too complicated to do programmatically for my data, and too much work to handle manually (some tables had 20k rows or so).
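For illustration, the kind of edit involved looked roughly like this (the table and values are made up). A line from the SQLite dump such as
INSERT INTO "users" VALUES(1,'alice',1);
had to become something PostgreSQL would accept into a boolean column, e.g.
INSERT INTO users VALUES (1, 'alice', true);
on top of dropping SQLite-specific statements such as PRAGMA foreign_keys=OFF;, which PostgreSQL rejects.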
A good tool for data migration, which I eventually settled on, was pgloader, which 'worked' immediately. However, as is typical for data migration of this sort, it exposed various data inconsistencies in my database which I had to solve at source before doing the migration (in particular, removing foreign keys to non-unique columns, which had seemed a good idea at the time for convenient joins, and removing orphan rows that relied on rows in other tables which had since been deleted). After these were solved, I could just do
pgloader my.db postgresql:///newpostgresdb
and get all my data across appropriately.
The problem?
pgloader worked really well for the data, but not so well for the table structure itself. This resulted in three problems:-
I had to create a new alembic revision with a ton of changes (mostly datatype related, but also some related to problem 2).
Constraint/index names were unreliable (unique numeric names were generated). There's actually an option to disable this. Still, it was a problem because I needed a reliable upgrade path that was replicable in production without me having to manually tweak the alembic code.
Sequences/autoincrement just failed for most primary keys. This broke my webapp, as I was not able to add new rows to some (not all) tables.
In contrast, re-creating a blank database using alembic to maintain the schema works well without changing any of my webapp's code. However, pgloader defaults to overriding existing tables, so this would leave me nowhere, as the data is what really needs migrating.
How do I get proper data migration using a schema I've already defined (and which works)?
What eventually worked was, in summary:-
Create the appropriate database structure in postgresql:///newpostgresdb (I just used alembic upgrade head for this)
Use pgloader to move the data over from SQLite to a different database in PostgreSQL. As mentioned in the question, some data inconsistencies needed to be solved before this step, but that's not relevant to this question itself.
createdb tempdb
pgloader my.db postgresql:///tempdb
Dump the data in tempdb using pg_dump
pg_dump -a -d tempdb > dumped_postgres_database
Edit the resulting dump to accomplish the following (a sketch of the edited dump appears after these steps):-
Add SET session_replication_role = replica because some of my rows are circular in reference to other rows in the same table
Remove the data for the alembic_version table, as we're restarting a new branch for alembic.
Regenerate any sequences, with the equivalent of SELECT pg_catalog.setval('"table_colname_seq"', (select max(colname) from table));
Finally, psql can be used to load the data into your actual database
psql -d newpostgresdb < dumped_postgres_database
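For illustration, the edited dump ended up shaped roughly like this (table, column and sequence names are placeholders):
-- added at the top so circular/self-referencing rows load without firing FK triggers
SET session_replication_role = replica;
-- ... the data-only COPY blocks produced by pg_dump -a, minus the one for alembic_version ...
-- added at the end, once per serial column, to bring each sequence in line with the data
SELECT pg_catalog.setval('"mytable_id_seq"', (SELECT max(id) FROM mytable));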

How do I ALTER a set of partitioned tables in Postgres?

I created a set of partitioned tables in Postgres and started inserting a lot of rows via the master table. When the load process blew up on me, I realized I should have declared the id column BIGSERIAL (BIGINT with a sequence, behind the scenes), but I had inadvertently set it as SERIAL (INTEGER). Now that I have a couple of billion rows loaded, I am trying to ALTER the column to BIGINT. The process seems to be working, but it is taking a long time, so in reality I don't know whether it is working or hung. I'd rather not restart the entire load process.
Any suggestions?
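For reference, the statement in question is presumably something along these lines (the table and column names are placeholders); with inheritance-based partitioning, an ALTER on the master table recurses to the child partitions by default:
-- changing the type rewrites every row of the master and each partition
-- under an ACCESS EXCLUSIVE lock, so it can run for a very long time
ALTER TABLE measurements ALTER COLUMN id TYPE bigint;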
When you update a row to alter it in PostgreSQL, that writes out a new copy of the row and then does some cleanup later to remove the original. This means that trying to fix the problem by doing updates can take longer than just loading all the data in from scratch again - it's more disk I/O than loading a new copy, plus some extra processing time. The only situation where you'd want to do an update instead of a reload is when the original load was very inefficient, for example if a slow client program was inserting the data and it was the bottleneck of the process.
To figure out whether the process is still working, see if it's using CPU when you run top (UNIX-ish systems) or the Task Manager (Windows). On Linux, "top -c" will even show you what the PostgreSQL client processes are doing. You probably just expected it to take less time than the original load, which it won't, and it's most likely still running rather than hung.
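From inside the database, a quick way to check that the ALTER is still actively working rather than waiting on a lock is pg_stat_activity (the columns below exist in 9.2 and later; older releases use procpid and current_query instead):
-- an 'active' row with a steadily growing transaction runtime means it is still working
SELECT pid, state, now() - xact_start AS running_for, query
FROM pg_stat_activity
WHERE state <> 'idle';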
Restart it (clarifying edit: restart the entire load process again).
Altering a column value requires a new row version, and all indexes pointing to the old version have to be updated to point to the new one.
Additionally, see how much of the advice on populating databases you can follow.
Correction from @archnid:
Altering the type of the column will trigger a table rewrite, so the row versioning isn't a big problem, but it will still take a lot of disk space temporarily. You can usually monitor progress by looking at which files in the database directory are being appended to...
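If you do end up reloading from scratch, the populating-a-database advice mentioned above boils down to something like this sketch (table, index and file names are placeholders):
-- load in bulk with COPY rather than row-by-row INSERTs,
-- and build indexes only after the data is in place
DROP INDEX IF EXISTS measurements_ts_idx;
COPY measurements FROM '/path/to/measurements.csv' CSV;
CREATE INDEX measurements_ts_idx ON measurements (ts);
ANALYZE measurements;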

How to restore a PostgreSQL DB without a backup

I forgot to make a backup. Now I have the hard drive with the databases and a new system with an empty Postgres installation. Can I somehow restore the databases? By simply copying the files, etc.?
If you have the full data directory of your old PostgreSQL system (and if it was the same version, or one differing only in the revision number), you can just try to put it in place of the data directory in your new PostgreSQL installation. (Of course, stop the Postgres server before doing this.)
It's basically the same procedure used when upgrading PostgreSQL, when there is no need to do a backup and restore.
Edit: As pointed out in the comments, I assume not only the same (or almost the same) version, but also the same architecture (32 vs 64 bit, Linux vs Windows, etc.)
In addition to leonbloy's answer, you could try pg_migrator, especially if you need to upgrade from 8.3 to 8.4 (and eventually 9.0).
In your case you have the files, but if you hadn't, then maybe - only maybe - you could do something with the logs of the database. You can try looking at the statement log, normally in /var/log/postgresql/postgresql.log or somewhere close to it; if log_statement = 'mod' or 'all' had been set up beforehand, you can recover some of your data.
Table by table, by searching for INSERT INTO statements on those tables across all or the recent history of the database. You can cut the text with some Unix tools to keep only the statements and put a ";" at the end of each one, and do the same for other important statements like DELETE, etc.
But you have to do it table by table, the data must actually be there, and the database must not have run for too long without backups.
In certain cases you just need the last few operations, or something like that, to save the day.
This, however, is just for Apollo 13 disaster moments and can never replace a good backup.