I get an error "could not write block .... of temporary file no space left on device ..." using postgresql

I'm running a really big query that inserts a lot of rows into a table: almost 8 million rows, divided into several smaller queries. At some point I get this error: "could not write block .... of temporary file: no space left on device". I don't know whether I need to delete temporary files after each query (and how to do that), or whether it is related to some other issue.
Thank you

OK. As there are still some facts missing, here is an attempt at an answer that may clarify the issue:
It appears that you are running out of disk space, most likely because the partition your database lives on simply does not have enough free space. On Linux/Unix, check with df -h, for example.
To show how this can happen:
Take a table with, say, 3 integer columns: the data alone occupies about 12 bytes per row. On top of that comes some overhead for row management etc.; in another answer Erwin mentioned about 23 bytes and linked to the manual for more information. There may also be some padding between rows. So, doing a little math:
Even with just 3 integers we end up at roughly 40 bytes per row. Keeping in mind that you want to insert 8,000,000 rows, this sums up to 320,000,000 bytes, or ~300 MB (for our 3-integer example only, and very roughly).
Now, assuming you have a couple of indexes on this table, those indexes will also grow during the inserts. Another aspect could be bloat on the table and indexes, which can often be cleared with a VACUUM.
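If you want to verify how much space the table and its indexes actually occupy, the built-in size functions can tell you; a quick sketch (your_table is a placeholder):
select pg_size_pretty(pg_table_size('your_table'))          as table_size,
       pg_size_pretty(pg_indexes_size('your_table'))        as index_size,
       pg_size_pretty(pg_total_relation_size('your_table')) as total_size;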
So what's the solution:
Provide more disk space to your database
Split your inserts a little more and ensure that VACUUM is running between them (a rough sketch follows).
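For the second point, something along these lines keeps each batch small and gives VACUUM a chance to clean up dead row versions in between; table names and ranges are placeholders:
insert into your_table select * from staging_table where id between 1 and 1000000;
vacuum your_table;
insert into your_table select * from staging_table where id between 1000001 and 2000000;
vacuum your_table;
-- ... and so on for the remaining batches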

Inserting data or creating an index can need temporary files; temp_tablespaces determines the placement of temporary tables and indexes, as well as temporary files that are used for purposes such as sorting large data sets. Your error means that the location your temporary files are written to does not have enough disk space.
To resolve this problem you have two options:
1. Reclaim space at the location your temporary files are written to, by default PGDATA/base/pgsql_tmp.
2. If that location still does not have enough space for temporary storage, create another temporary tablespace for that database:
create tablespace tmp_YOURS location '/path/with/enough/space';
alter database yourDB set temp_tablespaces = tmp_YOURS;
GRANT ALL ON TABLESPACE tmp_YOURS TO USER_OF_DB;
Then disconnect the session and reconnect.
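To see how much temporary file traffic the database actually generates (and so how big the new tablespace needs to be), pg_stat_database tracks it in PostgreSQL 9.2 and later; a small sketch, with yourDB as above:
select temp_files, pg_size_pretty(temp_bytes) as temp_bytes
from pg_stat_database
where datname = 'yourDB';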

The error is quite self-explanatory: you are running a big query, yet you do not have enough disk space to do so. If PostgreSQL is installed in /opt, for example, check whether that filesystem has enough space to run the query. If not, add a LIMIT to confirm you are getting the expected output, and only then run the full query and write the output to a file.
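A quick way to find out which filesystem to check, and to smoke-test the query first (the table name is a placeholder):
show data_directory;                  -- the mount point to check with df; needs sufficient privileges
select * from your_table limit 100;   -- confirm the output before the full run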

Related

Postgres: Do we always need at least 3-4 times free the space of the biggest table?

We are using Postgres to store ~2,000,000,000 samples. This ends up in tables with ~500 million entries and ~100 GB in size each.
What I want to do:
E.g. update the table entries: UPDATE table SET flag = true;
After this, the table is twice as big, i.e. 200 GB.
To get the space (stored on an SSD) back, we run VACUUM FULL table.
Unfortunately, this step again needs a lot of space, which results in the VACUUM FULL failing because too little space is left.
My Questions:
Does this mean that, in order to run this UPDATE only once and to get the space back for other tables in this DB, we need at least 300-400 GB of space for a 100 GB table?
In your scenario, you won't get away without having at least twice as much space as the table data would require.
The cheapest solution is probably to define the table with a fillfactor of 50 so that half of each block is left empty, thereby doubling the table size. Then the updated rows can all be in the same block as the original rows, and the UPDATE won't increase the table size because PostgreSQL can use the heap only tuple (HOT) update feature. The old versions will be freed immediately if there are no long running transactions that can still see them.
NOTE: This will only work if the column you are updating is not indexed.
The downside of this approach is that the table is always twice the necessary size, and all sequential scans will take twice as long. It won't bother you if you don't use sequential scans of the table.
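A minimal sketch of what that looks like, assuming the table is called samples and flag is not indexed (fillfactor only affects newly written pages, so existing data needs a one-time rewrite):
alter table samples set (fillfactor = 50);
vacuum full samples;              -- rewrite existing pages with the new fillfactor
update samples set flag = true;   -- can now use HOT updates, so the table does not grow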

postgres 9.2 table size with pg_total_relation_size

I process a table with ~10^7 rows in the following way: take the last N rows, update them in some way, then delete them, then vacuum the table. At the end I run a query for pg_total_relation_size. The loop repeats until the table is exhausted. Each iteration lasts several seconds. There are no other queries against this table except those mentioned above. The problem is that I keep getting the same result for the table size; it changes only about once every several hours.
So the question is: does Postgres store the table size somewhere, or does it calculate it every time the function is invoked? I.e., does my table size really stay the same in spite of the processing?
Your table really does stay the same size on disk despite the DELETEs and VACUUMing you're doing. As per the documentation on VACUUM, ordinary VACUUM only releases space back to the OS if it can do so by truncating free space from the end of the file without rearranging live rows.
The space is still "free" in that PostgreSQL can re-use it for other new rows. It is much, much faster to re-use space that PostgreSQL hasn't given back to the OS than it is to extend a relation with new space, so this is often preferable.
The other reason Pg doesn't just give this space back is that it can only give space back to the OS when it's a contiguous chunk with no visible rows until the end of the file. This doesn't happen much so in practice Pg needs to move some rows around to compact the table and allow it to free space at the end, kind of like a defrag on a file system. This is an inefficient and slow process that can counter-intuitively make the table slower to access instead of faster, so it's not always a good idea.
If you have a relation that's mostly but not entirely empty it can be worth doing a VACUUM FULL (Pg 9.0 and above) or CLUSTER (all versions) to free the space. If you expect to refill the table this is usually counter-productive; it's actually better to leave it as-is.
(For what I mean by terms like "live" and "visible" see the documentation on MVCC which will help you understand Pg's table organisation.)
Personally, I'd skip the manual VACUUM in your case. Turn autovacuum up if you need to. If you really need the space back, you could consider partitioning your table, processing it partition by partition, and truncating each partition when you're done processing it.
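If you want to see how much of that unchanged on-disk size is dead rows and reusable free space rather than live data, the pgstattuple contrib extension can report it; a sketch (your_table is a placeholder):
create extension if not exists pgstattuple;
select tuple_len, dead_tuple_len, free_space
from pgstattuple('your_table');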

Is killing a "CLUSTER ON index" dangerous for database?

The whole question is in the title:
if we kill a CLUSTER query on a 100 million row table, will it be dangerous for the database?
The query has been running for 2 hours now, and I need to access the table tomorrow morning (12 h left, hopefully).
I thought it would be far quicker; my database is running on RAID SSDs and a bi-Xeon processor.
Thanks for your wise advice.
Sid
No, you can kill the cluster operation without any risk. Until the operation is done, nothing is changed in the original table and index files. From the manual:
When an index scan is used, a temporary copy of the table is created that contains the table data in the index order. Temporary copies of each index on the table are created as well. Therefore, you need free space on disk at least equal to the sum of the table size and the index sizes.
When a sequential scan and sort is used, a temporary sort file is also created, so that the peak temporary space requirement is as much as double the table size, plus the index sizes.
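So before starting (or restarting) the CLUSTER, it is worth checking that at least the table size plus the index sizes are free on disk; a quick sketch, with the table name as a placeholder:
select pg_size_pretty(pg_table_size('your_table') + pg_indexes_size('your_table'))
       as minimum_free_space_needed;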
As @Frank points out, it is perfectly fine to do so.
Assuming you want to run this query in the future and assuming you have the luxury of a service window and can afford some downtime, I'd tweak some settings to boost the performance a bit.
In your configuration:
turn off fsync, for higher throughput to the file system
Fsync stands for file system sync. With fsync on, the database waits for the file system to commit on every page flush.
maximize your maintenance_work_mem
It's OK to just take all the memory available, as it will not be allocated during production hours. I don't know how big your table and the index you are working on are, but things will run faster when they can be fully loaded into main memory.
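A sketch of the kind of tweaks described above, for a maintenance window only (the values and names are assumptions, and fsync = off is unsafe if the server crashes, so revert it afterwards):
-- in postgresql.conf (reload the server; revert after the window):
--   fsync = off
set maintenance_work_mem = '8GB';      -- for this session only; size it to your RAM
cluster your_table using your_index;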

DB Trigger to limit maximum table size in Postgres

Is it possible, perhaps using DB-triggers to set a maximum table-size in a postgres DB?
For example, say I have a table called: Comments.
From the user's perspective, comments can be added as frequently as they like, but say I only want to store the 100 most recent comments in the DB. So what I want to do is have a trigger that automatically maintains this, i.e. when more than 100 comments are there, it deletes the oldest one, and so on.
Could someone help me with writing such a trigger?
I think a trigger is the wrong tool for the job, although it is possible to implement this. Something about spawning a "delete" from an executing insert makes the hair on my neck stand up. You will generate a lot of locking and possibly contention that way, and inserts should generally not generate locks.
To me this says "stored procedure" all the way.
But I also think you should ask yourself: why delete old comments at all? Deletes are anathema. Better to just limit them when you display them. If you are really worried about the size of the table, use a TEXT column: Postgres will maintain large values in a shadow (TOAST) table, and full scans of the original table will blaze along just fine.
Limiting to 100 comments per user is rather simple, e.g. (DELETE itself has no ORDER BY/OFFSET, so a subquery is needed):
delete from comments
where ctid in (select ctid from comments where user_id = new.user_id
               order by comment_date desc offset 100);
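For completeness, a minimal sketch of the trigger the question asks for, using the table and column names from the statement above; treat it as a starting point rather than a finished implementation:
create function trim_comments() returns trigger as $$
begin
    delete from comments
    where ctid in (select ctid from comments
                   where user_id = new.user_id
                   order by comment_date desc
                   offset 100);
    return null;    -- AFTER trigger: the return value is ignored
end;
$$ language plpgsql;

create trigger trim_comments_after_insert
after insert on comments
for each row execute procedure trim_comments();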
Limiting the byte size is trickier. You'd need to calculate the relevant row sizes and that won't account for index sizes, dead rows, etc. At best you'd use the admin functions to get the table size but these won't yield the size per user, only the total size.
We could in theory create a table of 100 dummy records and then simply overwrite them with the actual comments. Once we pass the 100th we would overwrite the 1st one, and so on.
This way we would expect to keep the table at the same size, but that is not possible, because an update is equivalent to a delete plus an insert in PostgreSQL. So the size of the table will continue to grow.
So if the objective is not to overflow the disk drive, then once the disk is 80% full a VACUUM FULL should be performed to free up disk space. VACUUM FULL requires disk space by itself. If you keep the number of records fixed, the vacuum will then actually have an effect. Also, there seem to be cases where VACUUM can fail.

What is the effect on record size of reordering columns in PostgreSQL?

Since Postgres can only add columns at the end of tables, I end up re-ordering by adding new columns at the end of the table, setting them equal to existing columns, and then dropping the original columns.
So, what does PostgreSQL do with the memory that's freed by dropped columns? Does it automatically re-use the memory, so a single record consumes the same amount of space as it did before? But that would require a re-write of the whole table, so to avoid that, does it just keep a bunch of blank space around in each record?
The question is old, but since both answers are wrong or misleading, I'll add another one.
When updating a row, Postgres writes a new row version and the old one is eventually removed by VACUUM after no running transaction can see it any more.
Plain VACUUM does not return disk space from the physical file that contains the table to the system, unless it finds completely dead or empty blocks at the physical end of the table. You need to run VACUUM FULL or CLUSTER to aggressively compact the table and return excess space to the system. This is not typically desirable in normal operation. Postgres can re-use dead tuples to keep new row versions on the same data page, which benefits performance.
In your case, since you update every row, the size of the table is doubled (from its minimum size). It's advisable to run VACUUM FULL or CLUSTER to return the bloat to the system.
Both take an exclusive lock on the table. If that interferes with concurrent access, consider pg_repack, which can do the same without exclusive locks.
To clarify: Running CLUSTER reclaims the space completely. No VACUUM FULL is needed after CLUSTER (and vice versa).
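Putting that together for the column-reordering scenario, a sketch with a hypothetical table t and a column a being moved to the end:
alter table t add column a_new integer;
update t set a_new = a;            -- every row gets a new version; the table roughly doubles
alter table t drop column a;
alter table t rename column a_new to a;
vacuum full t;                     -- rewrite the table to reclaim the bloat (takes an exclusive lock)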
More details:
PostgreSQL 9.2: vacuum returning disk space to operating system
From the docs:
The DROP COLUMN form does not physically remove the column, but simply makes it invisible to SQL operations. Subsequent insert and update operations in the table will store a null value for the column. Thus, dropping a column is quick but it will not immediately reduce the on-disk size of your table, as the space occupied by the dropped column is not reclaimed. The space will be reclaimed over time as existing rows are updated.
You'll need to do a CLUSTER followed by a VACUUM FULL to reclaim the space.
Why do you "reorder" ? There is no order in SQL, it doesn't make sence. If you need a fixed order, tell your queries what order you need or use a view, that's what views are made for.
Disk space will be reused after a vacuum; autovacuum will do the job, unless you have disabled that process.
Your current approach will kill overall performance (table locks), indexes have to be recreated, statistics go down the toilet, etc. And in the end, you end up with the same situation you already had. So why the effort?