delete temporary files in postgresql - postgresql

I have a huge database of about 800GB. When I tried to run a query that groups certain variables and aggregates the result, it stopped after running for a couple of hours. Postgres reported that the disk was full. After looking at the statistics I realized that the database has about 400GB of temporary files. I believe these temp files were created while I was running the query. My question is: how do I delete these temp files? Also, how do I avoid such problems? Should I use cursors or for-loops so that I don't process all the data at once? Thanks.
I'm using Postgres 9.2

The temporary files that get created in base/pgsql_tmp during query execution will get deleted when the query is done. You should not delete them by hand.
These files have nothing to do with temporary tables; they are used to store data for large hash or sort operations that would not fit in work_mem.
Make sure that the query is finished or canceled, try running CHECKPOINT twice in a row and see if the files are still there. If yes, that's a bug; did the PostgreSQL server crash when it ran out of disk space?
If you really have old files in base/pgsql_tmp that do not get deleted automatically, I think it is safe to delete them manually. But I'd file a bug with PostgreSQL in that case.
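Before deleting anything by hand, it can help to see which backend each temporary file belongs to. The following is a minimal sketch, not an official tool: it assumes the documented naming scheme pgsql_tmpPPP.NNN (PPP being the PID of the owning backend), it only looks at base/pgsql_tmp under the data directory and ignores pgsql_tmp directories in other tablespaces, and the function name temp_usage_by_pid is made up for illustration.

```python
import re
from collections import defaultdict
from pathlib import Path

# Temp file names have the form pgsql_tmpPPP.NNN, where PPP is the backend PID.
TEMP_NAME = re.compile(r"^pgsql_tmp(\d+)\.(\d+)$")


def temp_usage_by_pid(data_dir):
    """Return {backend_pid: total_bytes} for files in <data_dir>/base/pgsql_tmp.

    Hypothetical helper: it only reads file names and sizes and never
    deletes anything.
    """
    tmp_dir = Path(data_dir) / "base" / "pgsql_tmp"
    if not tmp_dir.is_dir():
        return {}
    usage = defaultdict(int)
    for entry in tmp_dir.iterdir():
        match = TEMP_NAME.match(entry.name)
        if match and entry.is_file():
            usage[int(match.group(1))] += entry.stat().st_size
    return dict(usage)
```

A PID reported here that no longer shows up as a running backend (for example in pg_stat_activity) would point to leftover files of the kind described above; files belonging to a live backend should be left alone.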
There is no way to avoid large temporary files if your execution plan needs to sort large result sets or needs to create large hashes. Cursors won't help you there. I guess that with for-loops you mean moving processing from the database to application code – doing that is usually a mistake and will only move the problem from the database to another place where processing is less efficient.
Change your query so that it doesn't have to sort or hash large result sets (check with EXPLAIN). I know that does not sound very helpful, but there's no better way. You'll probably have to do that anyway, or is a runtime of several hours acceptable for you?

Related

What is the correct procedure to delete records from a large mongodb

So, here is the problem I am running into. I have a mongodb that is about 2000tb, and there are constant read and write operations happening on it. I have decided to delete some old data that is no longer being used, and this is where things start to get interesting. Since the db is large and there are constant read and write operations happening, won't a delete query on the old records be process-intensive for the database, or hold up reads or writes? So, here are the questions that I have:
Which parameters or situations should be considered while writing a delete script like this?
What might be affected when I run a delete query to delete a batch of records?
Is mongodb designed to handle situations like this? Will a simple delete query be effective? Can anyone point me to a resource on the internal workings of mongodb when a delete is fired (like deleting the record, then re-indexing, etc.)?
I found this, however I am not sure this is the best fit that answers this situation.

PostgreSQL - Recovery of Functions' code following accidental deletion of data files

So, I am (well... I was) running PostgreSQL within a container (Ubuntu 14.04LTS with all the recent updates; back-end storage is "dir" for convenience).
To cut the long story short, the container folder got deleted. Following the use of extundelete and ext4magic, I have managed to extract some of the database physical files (it appears as if most of the files are there... but not 100% sure if and what is missing).
I have two copies of the database files. One from 9.5.3 (which appears to be more complete) and one from 9.6 (I upgraded the container very recently to 9.6, however it appears to be missing datafiles).
All I am after is to attempt to extract the SQL code that relates to the user-defined functions. Is anyone aware of an approach that I could try?
P.S.: Last backup is a bit dated (due to bad practices really) so it would be last resort if the task of extracting the needed information is "reasonable" and "successful".
Regards,
G
Update - 20/4/2017
I was hoping for a "quick fix" by somehow extracting the function body text off the recovered data files... however, nothing's free in this life :)
Starting from the old-ish backup along with the recovered logs, we managed to cover a lot of ground into bringing the DB back to life.
Lessons learned:
1. Do implement a good backup/restore strategy
2. Do not store backups on the same physical machine
3. Hardware failure can be disruptive... Human error can be disastrous!
If you can reconstruct enough of a data directory to start postgres in single user mode you might be able to dump pg_proc. But this seems unlikely.
Otherwise, if you're really lucky you'll be able to find the relation for pg_proc and its corresponding pg_toast relation. The latter will often contain compressed text, so searches for parts of variables you know appear in function bodies may not help you out.
Anything stored inline in pg_proc will be short functions, significantly less than 8k long. Everything else will be in the toast relation.
To decode that you have to unpack the pages to get the toast chunks, then reassemble them and uncompress them (if compressed).
If I had to do this, I would probably create a table with the exact same schema as pg_proc in a new postgres instance of the same version. I would then find the relfilenode(s) for pg_catalog.pg_proc and its toast table using the relfilenode map file (if it survived) or by pattern matching and guesswork. I would replace the empty relation files for the new table I created with the recovered ones, restart postgres, and if I was right, I'd be able to select from the tables.
Not easy.
I suggest reading up on postgres's storage format as you'll need to understand it.
You may consider https://www.postgresql.org/support/professional_support/ . (Disclaimer, I work for one of the listed companies).
P.S.: Last backup is a bit dated (due to bad practices really) so it would be last resort if the task of extracting the needed information is "reasonable" and "successful".
Backups are your first resort here.
If the 9.5 files are complete and undamaged (or enough so to dump the schema) then simply copying them in place, checking permissions and starting the server will get you going. Don't trust the data though, you'll need to check it all.
Although it is possible to partially recover given damaged files, it's a long complicated process and the fact that you are asking on Stack Overflow probably means it's not for you.

I TRUNCATEd a table. How do I get the data back?

In my PostgreSQL database, I unfortunately truncated the table mail_group, and the table was deleted from the database. How do I get this table back?
Kindly help me, waiting for reply.
Thanks
Anyone else in the same situation: immediately stop your database with pg_ctl stop -m immediate (the immediate is important; you need to simulate a crash and prevent a checkpoint), then do not restart it. If you had concurrent transactions still in progress you might be really lucky and PostgreSQL might not have unlinked the backing files for the table yet, so it could maybe be recoverable.
You very likely can't get the data back, you deleted it. Restore from a backup.
A normal DELETE in PostgreSQL marks the rows as deleted but does not actually erase the data immediately, so it can often be recovered if you promptly stop the database and you don't write anything else to the table.
This is not the case for TRUNCATE. TRUNCATE deletes the underlying files that represent the database table from the file system.
Recovering the data, if possible at all, would require forensic analysis of your hard drive. If the data is truly important then power the computer off now and take a disk image of the hard drive. Expect recovery work to cost multiple thousand dollars, if it is possible at all, since you will need someone who knows both (a) file system internals and (b) PostgreSQL internals. The only person I can think of who I know has the skills to possibly be able to do this would probably cost about €5000 to €10000 for the time required for this sort of work. (It isn't me).
If you didn't have backups you have just learned a very expensive lesson.
If someone else is reading this and DELETEd rows, please immediately follow the instructions in corruption since the first recovery steps are the same. This will not help if you ran TRUNCATE.

How to verify that large PostgreSQL databases running different versions have the same data without dumping

How would I verify that the data in an 8.3 PostgreSQL DB is the same as the data in a 9.0 DB?
When I did an SQL dump of an example table, many differences showed up, but this was due to 9.0 truncating 0's at the end and beginning of date fields. The order of the dump was also not fixed, even though this can be sorted out with sort (no pun intended). Sorting does not allow validation, though, as it would lose which table each row was part of: the sorted SQL dump would be a meaningless splat of SQL commands with dump settings thrown in for extra.
count(*) is also not adequate.
I would like to be 100% sure that the data in one is equal to the data in the other despite the version differences and the way that at the very least dates are held in 9.0.
I should add that I have several hundred tables and many hundred GB of data, so I need an automated process like diff DUMPa.sql DUMP2.sql. A SHA of the data (not the format) would be ideal, but one cannot diff binary dumps of PostgreSQL for well-known reasons. I am aware MySQL has a checksum feature, but I'm using PostgreSQL.
First the bad news. There is really no way to address all of the concerns you raise without loading all the data into an intermediary program and directly comparing. This will take time and it will drag your system down load-wise, so my recommendation is to set up some sort of replication and compare replicas.
One thing you might be able to do is to use something like Slony or Bucardo to replicate, and then triggers to move data into secondary child partitions and replicate those onto a consolidated server for comparison. You could then compare within PostgreSQL. This would reduce the load and it would mean your reporting data would be relatively easy to manage compared to other approaches. But all the data is going to have to be loaded and compared line-by-line.
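As an illustration of what such an intermediary program could look like, here is a minimal sketch of a "SHA of the data, not the format". It assumes rows arrive as tuples of Python values from some database driver (not shown), and the names normalize, row_digest and table_digest are hypothetical. Each value is mapped to a canonical string so that formatting differences such as trailing zeros vanish, each row is hashed with SHA-256, and the row hashes are summed so that row order does not matter.

```python
import hashlib
from datetime import date, datetime
from decimal import Decimal


def normalize(value):
    """Map a value to a canonical, type-tagged string (illustrative rules only)."""
    if value is None:
        return "null"
    if isinstance(value, datetime):  # check before date: datetime subclasses date
        return "ts:" + value.isoformat(sep=" ", timespec="microseconds")
    if isinstance(value, date):
        return "d:" + value.isoformat()
    if isinstance(value, Decimal):
        # normalize() drops trailing zeros, so 1.50 and 1.5 compare equal
        return "num:" + format(value.normalize(), "f")
    return type(value).__name__ + ":" + str(value)


def row_digest(row):
    """SHA-256 over the length-prefixed normalized fields, as an integer."""
    h = hashlib.sha256()
    for value in row:
        field = normalize(value).encode("utf-8")
        h.update(len(field).to_bytes(8, "big"))
        h.update(field)
    return int.from_bytes(h.digest(), "big")


def table_digest(rows):
    """Return (row_count, hex_digest); independent of the order of rows."""
    total, count = 0, 0
    for row in rows:
        total = (total + row_digest(row)) % (1 << 256)
        count += 1
    return count, format(total, "064x")
```

Running table_digest over the same table on both servers and comparing the results table by table keeps the table name attached to each checksum. It still reads every row once, as the answer says, and the normalization rules would have to be extended for whatever column types actually differ between 8.3 and 9.0.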

SQLite3: Batch Insert?

I've got some old code on a project I'm taking over.
One of my first tasks is to reduce the final size of the app binary.
Since the contents include a lot of text files (around 10.000 of them), my first thought was to create a database containing them all.
I'm not really used to SQLite and Core Data, so I've got basically two questions:
1 - Is my assumption correct? Should my SQLite file have a smaller size than all of the text files together?
2 - Is there any way of automating the task of getting them all into my newly created database (maybe using some kind of GUI or script), one file per record inside a single table?
I'm still experimenting with CoreData, but I've done a lot of searching already and could not find anything relevant to bringing everything together inside the database file. Doing that manually has proven no easy task already!
Thanks.
An alternative to using SQLite might be to use a zipfile instead. This is easy to create, and will surely save space (and definitely reduce the number of files). There are several implementations of using zipfiles on the iPhone, e.g. ziparchive or TWZipArchive.
1 - It probably won't be any smaller, but you can compress the files before storing them in the database. Or without the database for that matter.
2 - Sure. It shouldn't be too hard to write a script to do that.
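To make point 1 concrete, here is a minimal sketch of compressing text before storing it, written in Python as a build-time script rather than app code. The table name documents, the helper names pack and unpack, and the sample text are all made up for illustration; zlib is just one possible compressor, and the app would need a matching decompression step when reading the rows.

```python
import sqlite3
import zlib


def pack(text):
    """Compress text to bytes suitable for a BLOB column."""
    return zlib.compress(text.encode("utf-8"), 9)


def unpack(blob):
    """Inverse of pack()."""
    return zlib.decompress(blob).decode("utf-8")


# In-memory database and a made-up, highly repetitive sample document.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (name TEXT PRIMARY KEY, body BLOB NOT NULL)")
sample = "lorem ipsum dolor sit amet\n" * 200
conn.execute(
    "INSERT INTO documents (name, body) VALUES (?, ?)",
    ("sample.txt", pack(sample)),
)
conn.commit()

# Read the row back and restore the original text.
(stored,) = conn.execute(
    "SELECT body FROM documents WHERE name = ?", ("sample.txt",)
).fetchone()
restored = unpack(stored)
```

How much space this saves depends entirely on the real files; repetitive sample text like the one above compresses far better than typical content would.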
If you're looking for a SQLite bulk insert command to write your script for 2), there isn't one AFAIK. Prepared insert statements in a loop inside a transaction are the best you can do. I imagine it would take only a few seconds (if that) to insert 10,000 records.
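A minimal sketch of that approach as a Python script, assuming the text files are UTF-8 and end in .txt; the table name files and the function name import_text_files are hypothetical. executemany reuses one parameterized INSERT for every row, and the with block wraps all of the inserts in a single transaction.

```python
import sqlite3
from pathlib import Path


def import_text_files(db_path, source_dir):
    """Insert every *.txt file under source_dir as one row; return the row count."""
    source = Path(source_dir)
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )
        # Lazily yield (relative name, contents) pairs, one per file.
        rows = (
            (path.relative_to(source).as_posix(), path.read_text(encoding="utf-8"))
            for path in sorted(source.rglob("*.txt"))
        )
        with conn:  # commits once at the end, rolls back on error
            conn.executemany("INSERT INTO files (name, body) VALUES (?, ?)", rows)
        return conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
    finally:
        conn.close()
```

Calling import_text_files("texts.db", "path/to/text/files") with your own paths produces a single database file with one record per text file. Running it twice against the same database fails on the primary key, which is deliberate here so that duplicates are not inserted silently.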