Firebird database file growing too fast - firebird

Is there a way to check why my database is increasing in size so fast? It grew by 30 GB in less than a week, and all my processes and procedures are very slow now, with the same volume of transactions as usual.
Alternatively, is there a way to check the size of a table?

If you have a table that you suspect might be growing, you can run the following command from the command line:
gstat <DATABASE NAME|ALIAS> -t <TABLE_NAME>
Additional information on gstat can be found here: https://firebirdsql.org/file/documentation/html/en/firebirddocs/gstat/firebird-gstat.html
You can run this command on a schedule and save the results so you can see what is happening over time, although it might take a while to run depending on the size of the table.
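For example, a minimal sketch of a scheduled snapshot, assuming a database alias mydb, a table MYTABLE, and a log path that are all placeholders (gstat may also need ISC_USER/ISC_PASSWORD set, or local embedded access):
# Append a timestamped gstat report for one table so growth can be compared over time.
echo "=== $(date) ===" >> /var/log/firebird/mytable_stats.log
gstat mydb -t MYTABLE >> /var/log/firebird/mytable_stats.log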

Related

Postgres pg_dump getting very slow while copying large objects

I was performing a pg_dump operation on a Postgres (v9) database of around 80 GB.
The operation never seemed to finish even when trying the following:
running a FULL VACUUM before dumping
dumping the db into a directory-format archive (using -Fd)
without compression (-Z 0)
dumping the db into a directory in parallel (tried up to 10 threads -j 10)
When using the --verbose flag I saw that most of the logs are related to creating/executing large objects.
When I tried dumping each table on its own (pg_dump -t table_name), the result was fast again (minutes). But when I restored that dump to another db, the application that uses the db started throwing exceptions about some resources not being found (they should have been in the db).
As stated in the Postgres pg_dump docs, when using the -t flag the command will not copy blobs.
I added the -b flag (pg_dump -b -t table_name) and the operation went back to being slow.
So I guess the problem is with exporting the blobs in the db.
The number of blobs should be around 5 million, which could explain some slowness in general, but the execution lasted as long as 5 hours before I killed the process manually.
The blobs are relatively small (max 100 KB per blob).
Is this expected, or is there something fishy going on?
The slowness was due to a high number of orphaned blobs.
Apparently, launching a FULL VACUUM on a Postgres database does not remove orphaned large objects.
When I queried the number of large objects in my database:
select count(distinct loid) from pg_largeobject;
output:
151200997
The count returned by the query did not match the expected value: the expected number of blobs should be around 5 million in my case.
The table (the one I created in the app) that references those blobs is subject to frequent updates, and Postgres does not delete the old tuples (rows) but rather marks them as 'dead' and inserts the new ones. With each update, the old blob is no longer referenced by any live tuple, only by dead ones, which makes it an orphaned blob.
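As a rough cross-check (a sketch only; the referencing table my_table and its OID column blob_oid are hypothetical names, not from the original post), you can count the large objects that no live row points to:
-- pg_largeobject_metadata has one row per large object.
SELECT count(*) AS orphaned_blobs
FROM pg_largeobject_metadata m
WHERE NOT EXISTS (
    SELECT 1 FROM my_table t WHERE t.blob_oid = m.oid
);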
Postgres has a dedicated utility, vacuumlo, to remove orphaned blobs.
After using it (the vacuum took around 4h), the dump operation became much faster. The new duration is around 2h (previously it ran for hours and hours without finishing).
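For reference, a minimal vacuumlo invocation (mydb is a placeholder database name):
# Dry run: only report how many orphaned large objects would be removed.
vacuumlo -n -v mydb
# Actually remove them.
vacuumlo -v mydb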

psql copy command hangs with large CSV data set

I'm trying to load some large data sets from CSV into a Postgres 11 database (Windows) to do some testing. The first problem I ran into was that with a very large CSV I got this error: ERROR: could not stat file "D:/temp/data.csv": Unknown error. So after searching, I found a workaround to load the data from a zip file. I set up 7-Zip and was able to load some data with a command like this:
psql -U postgres -h localhost -d MyTestDb -c "copy my_table(id,name) FROM PROGRAM 'C:/7z e -so d:/temp/data.zip' DELIMITER ',' CSV"
Using this method, I was able to load a bunch of files of varying sizes, including one with 100 million records that was 700 MB zipped. But I have one more large file with 100 million records, around 1 GB zipped, and that one for some reason is giving me grief. Basically, the psql process just keeps running and never stops. I can see from the data files growing that it generates data up to a certain point, but then the growth stops. I'm seeing 6 files in a data folder named 17955, 17955.1, 17955.2, etc. up to 17955.5. The Date Modified timestamp on those files continues to be updated, but they're not growing in size and my psql program just sits there. If I shut down the process, I lose all the data, since I assume it rolls back when the process does not run to completion.
I looked at the logs in the data/log folder and there doesn't seem to be anything meaningful there. I can't say I'm very used to Postgres (I've used SQL Server the most), so I'm looking for tips on where to look, what extra logging to turn on, or anything else that could help figure out why this process is stalling.
Figured it out thanks to jjanes's comment above (sadly he/she didn't add an answer). I was adding 100 million records to a table with a foreign key to another table with 100 million records. I removed the foreign key, added the records, and then re-added the foreign key; that did the trick. I guess checking the foreign key is just too much with a bulk insert of this size.
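A minimal sketch of that workaround, with hypothetical table, column, and constraint names (only the commented COPY line mirrors the command from the question):
-- 1) Drop the foreign key before the bulk load.
ALTER TABLE my_table DROP CONSTRAINT my_table_other_id_fkey;
-- 2) Run the bulk COPY as before, e.g.:
--    copy my_table(id,name) FROM PROGRAM 'C:/7z e -so d:/temp/data.zip' DELIMITER ',' CSV
-- 3) Re-add the foreign key; one validation pass here is far cheaper than
--    100 million per-row checks during the load.
ALTER TABLE my_table
    ADD CONSTRAINT my_table_other_id_fkey
    FOREIGN KEY (other_id) REFERENCES other_table (id);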

PostgreSQL how to find what is causing Deadlock in vacuum when using --jobs parameter

How can I find out, in PostgreSQL 9.5, what is causing the deadlock error/failure when doing a full vacuumdb over the database with the --jobs option (running the full vacuum in parallel)?
All I get is some process numbers and table names... How can I prevent this so I can successfully do a full vacuum over the database in parallel?
Completing a VACUUM FULL under load is a pretty hard task. The problem is that Postgres is compacting the space taken by the table, so any concurrent data manipulation interferes with that.
To achieve a full vacuum you have these options:
Lock access to the vacuumed table. Not sure if acquiring some exclusive lock will help, though; you may need to prevent access to the table at the application level.
Use a create new table - swap (rename tables) - move data - drop original technique (a rough sketch follows below). This way you do not compact space under the original table; you free it by simply dropping the old table. Of course, you are then rebuilding all indexes, redirecting FKs, etc.
Another question is: do you need VACUUM FULL at all? The only thing it does that VACUUM ANALYZE does not is compacting the table on the file system. If you are not very limited by disk space, you rarely need a full vacuum.
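A rough sketch of that second option, assuming a hypothetical table big_table (index, key, and FK recreation is only indicated in a comment):
-- Rebuild into a fresh table, then swap names and drop the bloated original.
-- Needs a maintenance window: writes arriving after the INSERT would be lost
-- unless they are handled separately.
BEGIN;
CREATE TABLE big_table_new (LIKE big_table INCLUDING DEFAULTS);
INSERT INTO big_table_new SELECT * FROM big_table;
ALTER TABLE big_table RENAME TO big_table_old;
ALTER TABLE big_table_new RENAME TO big_table;
COMMIT;
-- Recreate indexes, the primary key, and any foreign keys on the new table, then:
DROP TABLE big_table_old;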
Hope that helps.

postgresql 9.2 never vacuumed and analyzed

I have been given a Postgres 9.2 DB of around 20 GB in size.
I looked through the database and saw that vacuum and/or analyze have never been run on any table.
Autovacuum is on and the transaction wraparound limit is still very far away (only 1% of it used).
I know nothing about the data activity (number of deletes, inserts, updates), but I can see that it uses a lot of indexes and sequences.
My question is:
does the lack of vacuum and/or analyze affect data integrity (for example, a select not showing all the rows that match, whether it reads from the table or from an index)? The speed of queries and writes doesn't matter.
is it possible that after the vacuum and/or analyze the same query gives a different answer than it would have given before the vacuum/analyze command?
I'm fairly new to PG, thank you for your help!!
Regards,
Figaro88
Running vacuum and/or analyze will not change the result set produced by any select operation (unless there was a bug in PostgreSQL). They may affect the order of results if you do not supply an ORDER BY clause.
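For example (the table and columns are hypothetical), only an explicit ORDER BY makes the ordering deterministic:
-- The set of rows returned is the same before and after VACUUM/ANALYZE;
-- without ORDER BY, only the physical row order may differ.
SELECT id, name
FROM my_table
ORDER BY id;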

Dump postgres data with indexes

I've got a Postgres 9.0 database which I frequently take data dumps of.
This database has a lot of indexes, and every time I restore a dump Postgres starts a background vacuum task (is that right?). That task consumes a lot of processing time and memory to recreate the indexes of the restored dump.
My questions are:
Is there a way to dump the database data together with the indexes of that database?
If there is a way, will it be worth the effort (I mean, will dumping the data with the indexes perform better than the vacuum task)?
Oracle has the "data pump" command as a faster way to imp and exp. Does Postgres have something similar?
Thanks in advance,
Andre
If you use pg_dump twice, once with --schema-only and once with --data-only, you can cut the schema-only output into two parts: the first with the bare table definitions and the final part with the constraints and indexes.
Something similar can probably be done with pg_restore.
Best Practice is probably to
restore the schema without indexes
and possibly without constraints,
load the data,
then create the constraints,
and create the indexes.
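A hedged sketch of that workflow using a custom-format dump and pg_restore's --section switches (these switches arrived in PostgreSQL 9.2, i.e. newer than the 9.0 in the question; database and file names are placeholders):
# Dump once in custom format.
pg_dump -Fc -f mydb.dump mydb
# Restore in three passes: table definitions, then data, then constraints and indexes last.
pg_restore --section=pre-data  -d newdb mydb.dump
pg_restore --section=data      -d newdb mydb.dump
pg_restore --section=post-data -d newdb mydb.dump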
If an index exists, a bulk load will make PostgreSQL write both to the table and to the index, and a bulk load will also leave your table statistics useless. But if you load the data first and then create the index, the stats are automatically up to date.
We store scripts that create indexes and scripts that create tables in different files under version control. This is why.
In your case, changing autovacuum settings might help you. You might also consider disabling autovacuum for some tables or for all tables, but that might be a little extreme.
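For example, a per-table tweak (the table name is hypothetical), which is less drastic than turning autovacuum off globally:
-- Make autovacuum trigger less often on one large table...
ALTER TABLE my_big_table SET (autovacuum_vacuum_scale_factor = 0.4);
-- ...or, more extreme, disable autovacuum for that table entirely.
ALTER TABLE my_big_table SET (autovacuum_enabled = false);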