Question on checkpoints during large data loads

This is a question about how PostgreSQL works. During large data loads using the 'COPY' command, I see multiple checkpoints occur where 100% of the log files (checkpoint_segments) are recycled.
I don't understand this, I guess. What does pgsql do when a single transaction requires more space than the available log files provide? It seems that it is wrapping around multiple times in the course of this load, which is a single transaction. What am I missing?
Everything is working, I just want to understand it better in case I can tune things, etc.

When a checkpoint happens, all dirty pages are written to disk. Since those pages can no longer be lost, the log is not needed for them any more, so it is safe to recycle it. Writing dirty pages to disk does not mean the data is committed: the database can see from the metadata stored with each row that it belongs to a transaction that has not committed yet, and it can still abort that transaction, in which case VACUUM will eventually clean up those rows.
When loading large amounts of data, it is advisable to temporarily increase checkpoint_segments.
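As a minimal sketch of what that tuning could look like in postgresql.conf (the values are only illustrative and depend on your hardware and version; checkpoint_segments exists up to 9.4 and was replaced by max_wal_size in 9.5):

    # illustrative values only, not a recommendation
    checkpoint_segments = 64            # pre-9.5; default is 3 (16MB per segment)
    checkpoint_completion_target = 0.9  # spread checkpoint I/O over more time
    # on PostgreSQL 9.5 and later use instead:
    # max_wal_size = 4GB

Both parameters only need a configuration reload, not a restart, and you can lower them again after the load.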

Related

Is there any way we can monitor all data modifications in intersystems cache?

I'm a newbie to InterSystems Caché. We have an old system using a Caché database. We want to extract and transform all of its data and store it in another database such as PostgreSQL first, and then monitor all modifications of the original Caché data so that we can apply them (inserts or updates) to the transformed data in PostgreSQL in a timely manner.
Is there any way we can monitor all data modifications in Caché?
Does Caché have any modification/replication log, similar to MongoDB's oplog?
Any ideas would be appreciated, thanks!
In short, yes, there is a way to monitor data modifications. InterSystems Caché uses journaling for several purposes, mostly related to keeping data consistent; depending on the situation, the journals may contain more or fewer records. Journals are used to roll back transactions, to restore data after an unexpected shutdown, for backups, for mirroring, and so on.
In your situation I think it can help. Most of the quite old applications on Caché do not use objects and SQL tables and work with globals directly. The journals contain only globals, so if your Caché application already uses objects and tables, you need to know where and how it stores its data in globals. With the Journal API you will be able to read every change to the data. The journal contains every change; if the application uses transactions, you will see flags for those as well. Each journal record describes a change to a single value, either a set or a kill.
Also be aware that Caché purges outdated journal files according to its settings, so you may have to increase the number of days after which journal files are purged.

Force data to stay in cache

I have a part of a large table in PostgreSQL 12 that I would like to be cached at all times. The queries are such that the same rows would (almost) never be read twice, so I can't rely on the automatic caching. I'm reading up on pg_prewarm, which seems suitable for loading the cache, but I can't find anything about preventing the data from being evicted over time. Any hints? Thanks!
AFAIK there is no documented feature in PostgreSQL to lock pages for a specific object in the database cache.
That is true. The only way you can make sure that the table stays cached is to have shared_buffers big enough to contain the whole database.
But in practice that is no big problem: if a block doesn't get used regularly, it can drop out of the cache, so you have to read it in again when it gets used. But a single block contains many rows, so it only drops out if none of these rows are needed.
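A minimal sketch of what you can do instead (the table name mytable is a placeholder): load the table into shared_buffers with pg_prewarm and use the pg_buffercache extension to see how much of it is currently cached; note that nothing pins those pages, so they remain subject to normal eviction:

    -- read the table's main fork into shared_buffers
    CREATE EXTENSION IF NOT EXISTS pg_prewarm;
    SELECT pg_prewarm('mytable');

    -- count how many of the table's blocks are currently in shared_buffers
    CREATE EXTENSION IF NOT EXISTS pg_buffercache;
    SELECT count(*) AS cached_blocks
    FROM pg_buffercache
    WHERE relfilenode = pg_relation_filenode('mytable')
      AND reldatabase = (SELECT oid FROM pg_database
                         WHERE datname = current_database());

The usual workaround is to rerun pg_prewarm periodically (for example from a scheduled job) if you want the table to stay warm.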

Delete temporary files in PostgreSQL

I have a huge database of about 800GB. When I tried to run a query that groups certain variables and aggregates the result, it stopped after running for a couple of hours; Postgres threw a message that the disk space is full. After looking at the statistics I realized that the DB has about 400GB of temporary files. I believe these temp files were created while I was running the query. My question is: how do I delete these temp files? Also, how do I avoid such problems: should I use cursors or for-loops so that I don't process all the data at once? Thanks.
I'm using Postgres 9.2
The temporary files that get created in base/pgsql_tmp during query execution will get deleted when the query is done. You should not delete them by hand.
These files have nothing to do with temporary tables; they are used to store data for large hash or sort operations that would not fit in work_mem.
Make sure that the query is finished or canceled, try running CHECKPOINT twice in a row and see if the files are still there. If yes, that's a bug; did the PostgreSQL server crash when it ran out of disk space?
If you really have old files in base/pgsql_tmp that do not get deleted automatically, I think it is safe to delete them manually. But I'd file a bug with PostgreSQL in that case.
There is no way to avoid large temporary files if your execution plan needs to sort large result sets or needs to create large hashes. Cursors won't help you there. I guess that with for-loops you mean moving processing from the database to application code – doing that is usually a mistake and will only move the problem from the database to another place where processing is less efficient.
Change your query so that it doesn't have to sort or hash large result sets (check with EXPLAIN). I know that does not sound very helpful, but there's no better way. You'll probably have to do that anyway, or is a runtime of several hours acceptable for you?
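A hedged sketch of how to check and experiment (big_table, some_column and the work_mem value are placeholders, not a recommendation):

    -- raise the per-operation memory limit for this session only
    SET work_mem = '256MB';

    -- the plan shows whether a sort or hash spilled to disk, e.g.
    --   Sort Method: external merge  Disk: 345678kB
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT some_column, count(*)
    FROM big_table
    GROUP BY some_column;

    -- cumulative temporary file usage per database (available since 9.2)
    SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_size
    FROM pg_stat_database;

Keep in mind that EXPLAIN (ANALYZE) actually runs the query, so on a query that takes hours you may prefer plain EXPLAIN to at least see the planned sort and hash steps.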

I TRUNCATEd a table. How do I get the data back?

In my PostgreSQL database I unfortunately ran TRUNCATE on the table mail_group, and its data is gone from the database. How do I get this data back?
Kindly help me, waiting for reply.
Thanks
Anyone else in the same situation: immediately stop your database with pg_ctl stop -m immediate (the immediate is important, you need to simulate a crash and prevent a checkpoint), then do not restart it. If you had concurrent transactions still in progress you might be really lucky and PostgreSQL might not have unlinked the backing files for the table yet, so it could maybe be recoverable.
You very likely can't get the data back, you deleted it. Restore from a backup.
A normal DELETE in PostgreSQL marks the rows as deleted but does not actually erase the data immediately, so it can often be recovered if you promptly stop the database and you don't write anything else to the table.
This is not the case for TRUNCATE. TRUNCATE deletes the underlying files that represent the database table from the file system.
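To illustrate the difference, here is a small sketch against a throwaway table (not the asker's mail_group); TRUNCATE gives the table a brand-new backing file, which is why the old rows are not simply lying around marked as dead:

    CREATE TABLE demo (id int);
    INSERT INTO demo SELECT generate_series(1, 1000);

    SELECT pg_relation_filepath('demo');  -- e.g. base/16384/24576

    DELETE FROM demo;                     -- rows only marked dead
    SELECT pg_relation_filepath('demo');  -- same file as before

    TRUNCATE demo;                        -- new relfilenode assigned
    SELECT pg_relation_filepath('demo');  -- a different file; the old one is unlinked at commit

The example file paths are made up; the point is only that the path changes after TRUNCATE but not after DELETE.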
Recovering the data, if possible at all, would require forensic analysis of your hard drive. If the data is truly important, then power the computer off now and take a disk image of the hard drive. Expect recovery work to cost multiple thousands of dollars, if it is possible at all, since you will need someone who knows both (a) file system internals and (b) PostgreSQL internals. The only person I can think of who I know has the skills to possibly be able to do this would probably cost about €5000 to €10000 for the time required for this sort of work. (It isn't me.)
If you didn't have backups you have just learned a very expensive lesson.
If someone else is reading this and DELETEd rows instead, please immediately follow the instructions on the PostgreSQL wiki's Corruption page, since the first recovery steps are the same. This will not help if you ran TRUNCATE.

Compact Firebird 2.1 Database

How can I compact a Firebird 2.1 database, like we do in MS Access (discarding erased data, rebuilding indexes, etc.)?
Is there a way to do it?
Thanks!
Usually there is no need to compact a Firebird database: see the Firebird release notes about garbage collection and an automatic (per-database configurable) operation named "sweep".
In a few words, Firebird reuses space in pages when records are deleted or old record versions are freed, and asks for new chunks of disk space only when the free space becomes too small (i.e. drops below a defined percentage).
Sweep is performed by default after a predefined number of committed transactions, but it is an expensive task.
Backup and restore should be regarded as a last resort for optimizing and shrinking a database; it rebuilds and optimizes indexes too, but that is usually not needed, since there are commands and tools to rebuild indexes.
The only way to do it is to make a backup and a restore.
From the official FAQ:
Many users wonder why they don't get their disk space back when they delete a lot of records from a database.
The reason is that it is an expensive operation; it would require a lot of disk writes and memory, just like defragmenting a hard disk partition. The parts of the database (pages) that were used by such data are marked as empty and Firebird will reuse them the next time it needs to write new data.
If disk space is critical for you, you can get the space back by doing a backup and then a restore. Since you're doing the backup only to restore right away, it's wise to use the "inhibit garbage collection" or "don't use garbage collection" switch (-G in gbak), which will make the backup go A LOT FASTER. Garbage collection is used to clean up your database, and as it is a maintenance task, it's often done together with backup (as backup has to go through the entire database anyway). However, you're soon going to ditch that database file, and there's no need to clean it up.
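A hedged sketch of that backup-and-restore cycle with the gbak tool that ships with Firebird (file names and credentials are placeholders; keep the original file until you have verified the restored database):

    # back up while skipping garbage collection (-g), then restore into a fresh file
    gbak -b -g -user SYSDBA -password masterkey mydb.fdb mydb.fbk
    gbak -c -user SYSDBA -password masterkey mydb.fbk mydb_restored.fdb

Only replace the original database file with the restored one after checking that the restore completed without errors.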