How to set the table space of a sub select in PostgreSQL - postgresql

I have a rather large Insert Query, and upon running this my slow disks fill up towards 100% upon where I revive:
Transaction aborted because DBD::Pg::db do failed: ERROR: could not write to hash-join temporary file: No space left on device
Sounds believable, I have a Fast drive with lots of space on it that i could use instead of the slow disk but I dont want to make the fast disk the default table-space or move the table im inserting into to the fast disk, I just want that data-blob that is generated as part of the insert query to be on the fast disk table-space. Is this possible is PostgreSQL and if so how?
version 9.1

You want the temp_tablespaces configuration directive. See the docs.
Temporary files for purposes such as sorting large data sets are also
created in these tablespaces
You must CREATE TABLESPACE the tablespace(s) before using them in an interactive SET temp_tablespaces command.
SET LOCAL temp_tablespaces may be used to set it only for the current transaction.

Related

Limit size of temporary tables (PostgreSQL)

I'm managing a PostgreSQL database server for some users who need to create temporary tables. One user accidentally sent a query with ridiculously many outer joins, and that completely filled the disk up.
PostgreSQL has a temp_file_limit parameter but it seems to me that it is not relevant:
It should be noted that disk space used for explicit temporary tables, as opposed to temporary files used behind-the-scenes in query execution, does not count against this limit.
Is there a way then to put a limit on the size on disk of "explicit" temporary tables? Or limit the row count? What's the best approach to prevent this?
The only way to limit a table's size in PostgreSQL is to put it in a tablespace on a file system of an appropriate size.
Since temporary tables are created in the default tablespace of the database you are connected to, you have to place your database in that size restricted tablespace. To keep your regular tables from being limited in the same way, you'd have to explicitly create them in a different, less limited tablespace. Make sure that your user has no permissions on that less limited tablespace.
This is a rather unappealing solution, so maybe you should rethink your requirement. After all, the user could just as well fill up the disk by inserting the data into a permanent table.

How to resolve Amazon RDS Postgresql instance's DiskFull error?

We are having a very small Database for storing some relational data in an Amazon RDS instance. The version of the PostgreSQL Engine is 12.7.
There are a number of lambda functions in AWS in the same region, that access this instance for inserting records. In the process, some join queries are also used. We use psycopg2 Python library to interact with the DB. Since, the size of the data is very small, we have used a t2.small instance with 20GB storage and 1 CPU. In the production, however, a t2.medium instance has been used. Auto Scaling has not been enabled.
Recently, we have started experiencing an issue with this database. After the lambda functions run for a while, at some point, they time out. This is because the database takes too long to return a response, or, some times throws a Disk Full error as follows:
DiskFull
could not write to file "base/pgsql_tmp/pgsql_tmp1258.168": No space left on device
I have referred this documentation to identify the cause. Troubleshoot RDS DiskFull error
Following are the queries for checking the DB file size:
SELECT pg_size_pretty(pg_database_size('db_name'));
The response of this query is 35 MB.
SELECT pg_size_pretty(SUM(pg_relation_size(oid))) FROM pg_class;
The output of the above query is 33 MB.
As we can see, the DB file size is very small. However, on checking the size of the temporary files, we see the following:
SELECT datname, temp_files AS "Temporary files",temp_bytes AS "Size of temporary files" FROM pg_stat_database;
If we look at the size of the temporary files, its roughly 18.69 GB, which is why the DB is throwing a DiskFull error.
Why is the PostgreSQL instance not deleting the temporary files after the queries have finished? Even after rebooting the instance, the temporary file size is the same (although this is not a feasible solution as we want the DB to delete the temporary files on its own). Also, how do I avoid the DiskFull error as I may want to run more lambda functions that interact with the DB.
Just for additional information, I am including some RDS Monitoring graphs taken while the DB slowed down for CPU Utilisation and Free Storage Space:
From this, I am guessing that we probably need to enable autoscaling as the CPU Utilisation hits 83.5%. I would highly appreciate if someone shared some insights and helped in resolving the DiskFull error and identify why the temporary files are not deleted.
One of the join queries the lambda function runs on the database is:
SELECT DISTINCT
scl1.*, scl2.date_to AS compiled_date_to
FROM
logger_main_config_column_name_loading
JOIN
column_name_loading ON column_name_loading.id = logger_main_config_column_name_loading.column_name_loading_id
JOIN
sensor_config_column_name_loading ON sensor_config_column_name_loading.column_name_loading_id = column_name_loading.id
JOIN
sensor_config_loading AS scl1 ON scl1.id = sensor_config_column_name_loading.sensor_config_loading_id
INNER JOIN (
SELECT id, hash, min(date_from) AS date_from, max(date_to) AS date_to
FROM sensor_config_loading
GROUP BY id, hash
) AS scl2
ON scl1.id = scl2.id AND scl1.hash=scl2.hash AND scl1.date_from=scl2.date_from
WHERE
logger_main_config_loading_id = %(logger_main_config_loading_id)s;
How can this query be optimized? Will running smaller queries in a loop be faster?
pg_stat_database does not show the current size and number of temporary files, it shows cumulative historical data. So your database had 145 temporary files since the statistics were last reset.
Temporary files get deleted as soon as the query is done, no matter if it succeeds or fails.
You get the error because you have some rogue queries that write enough temporary files to fill the disk (perhaps some forgotten join conditions). To avoid the out-of-space condition, set the parameter temp_file_limit in postgresql.conf to a reasonable value and reload PostgreSQL.

PostgreSQL: even read access changes data files disk leading to large incremental backups using pgbackrest

We are using pgbackrest to backup our database to Amazon S3. We do full backups once a week and an incremental backup every other day.
Size of our database is around 1TB, a full backup is around 600GB and an incremental backup is also around 400GB!
We found out that even read access (pure select statements) on the database has the effect that the underlying data files (in /usr/local/pgsql/data/base/xxxxxx) change. This results in large incremental backups and also in very large storage (costs) on Amazon S3.
Usually the files with low index names (e.g. 391089.1) change on read access.
On an update, we see changes in one or more files - the index could correlate to the age of the row in the table.
Some more facts:
Postgres version 13.1
Database is running in docker container (docker version 20.10.0)
OS is CentOS 7
We see the phenomenon on multiple servers.
Can someone explain, why postgresql changes data files on pure read access?
We tested on a pure database without any other resources accessing the database.
This is normal. Some cases I can think of right away are:
a SELECT or other SQL statement setting a hint bit
This is a shortcut for subsequent statements that access the data, so they don't have t consult the commit log any more.
a SELECT ... FOR UPDATE writing a row lock
autovacuum removing dead row versions
These are leftovers from DELETE or UPDATE.
autovacuum freezing old visible row versions
This is necessary to prevent data corruption if the transaction ID counter wraps around.
The only way to fairly reliably prevent PostgreSQL from modifying a table in the future is:
never perform an INSERT, UPDATE or DELETE on it
run VACUUM (FREEZE) on the table and make sure that there are no concurrent transactions

Postgres - Is it necessary to create tablespace in my case?

I have a mobile/web project, using pg9.3 as database, and linux as server.
The data won't be huge, but as time goes on, the data increase.
For long term considering, I want to know about:
Questions:
1. Is it necessary for me to create tablespace for my database, or just use the default one?
2. If I create new tablespace, what is the proper location on linux to create the folder, and why?
3. If I don't create it now, and wait until I have to, till then, will it be easy for me to migrate db with data to new tablespace?
Just use the default tablespace, do not create new tablespaces. Tablespaces are only useful if you have multiple physical disks, so you can define which data is stored on which physical disk. The directory where your data is located is not that important for the workings of postgres, so if you only have one disk it is useless to use tablespaces
Should your data grow beyond the capacity of 1 disk, you will have to perform a full data migration anyway to move it to another physical disk, so you can configure tablespaces at that time
The idea behind defining which data is located on which disk (with tablespaces) is that you can do things like putting a big table which is hardly used on a slow disk, and putting this very intensively used table on a separated faster disk. But I assume you're not there yet, so don't over complicate things

Error: 1105 in sql server

I'm using SOL server 2012 as database server.
Right now mdf file size is around 10 GB.
When ever I'm doing any transaction into this database sql server troughs bellow error
Error Number : 1105 Error Message :Could not allocate space for object dbo.tblsdr . PK_tblsdr_3213E83F0AD2A005 in database hwbsssdr because the PRIMARY filegroup is full.
Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the file-group.
There is almost 400 GB of free space is available on my disc.
Can any one tell me what is the issue and how can i solve that.
Since you are using Express edition of SQL Server 2012, there is a limitations of 10GB per database, so this is your problem.
By the way, problem doesn't need necessarily to be with disk that is running out.
Also, if you have database with set up MAXSIZE to specific value, and if database reach that value every next transaction will report error from your question.
So, if you are sure that you have enough disk space for next transactions, check MAXSIZE property of your database executing next code:
use master;
go
exec sp_helpdb [YourDatabase]
If you want to change MAXSIZE database property, you can do this with next code:
alter database [YourDatabase]
modify file
(
name = 'YourDatabaseFile',
maxsize = X MB
)
Explanation
The specified filegroup has run out of free space.
Action
To gain more space, you can free disk space on any disk drive containing a file in the full filegroup, allowing files in the group to grow. Or you can gain space using a data file with the specified database.
Freeing disk space
You can free disk space on your local drive or on another disk drive. To free disk space on another drive:
Move the data files in the filegroup with an insufficient amount of free disk space to a different disk drive.
Detach the database by executing sp_detach_db.
Attach the database by executing sp_attach_db, pointing to the moved files.
Using a data file
Another solution is to add a data file to the specified database using the ADD FILE clause of the ALTER DATABASE statement. Or you can enlarge the data file by using the MODIFY FILE clause of the ALTER DATABASE statement, specifying the SIZE and MAXSIZE syntax.