I'm managing a PostgreSQL database server for some users who need to create temporary tables. One user accidentally sent a query with ridiculously many outer joins, and that completely filled the disk up.
PostgreSQL has a temp_file_limit parameter, but it seems to me that it is not relevant, since the documentation says:
It should be noted that disk space used for explicit temporary tables, as opposed to temporary files used behind-the-scenes in query execution, does not count against this limit.
Is there a way, then, to limit the on-disk size of "explicit" temporary tables? Or to limit their row count? What's the best approach to prevent this?
The only way to limit a table's size in PostgreSQL is to put it in a tablespace on a file system of an appropriate size.
Since temporary tables are created in the default tablespace of the database you are connected to (unless temp_tablespaces is set), you have to place your database in that size-restricted tablespace. To keep your regular tables from being limited in the same way, you'd have to explicitly create them in a different, less limited tablespace. Make sure that your user has no CREATE privilege on that less limited tablespace.
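A minimal sketch of the DDL for this workaround, generated from Python so each step is explicit. The tablespace names small_ts and big_ts, the table ledger, the role analyst and the mount points are hypothetical placeholders. The directories must already exist, be empty and be owned by the PostgreSQL OS user; the statements are meant to be run by a superuser, with the CREATE TABLE issued while connected to the new database. Plain string formatting is used for illustration only and is not safe for untrusted input.

```python
def restricted_temp_setup(db: str, table: str, user: str,
                          small_dir: str, big_dir: str) -> list[str]:
    """Return DDL for the size-restricted-tablespace workaround (hypothetical names)."""
    return [
        # Tablespace on a small, dedicated file system: caps temp table growth.
        f"CREATE TABLESPACE small_ts LOCATION '{small_dir}';",
        # Tablespace on the large file system, reserved for regular tables.
        f"CREATE TABLESPACE big_ts LOCATION '{big_dir}';",
        # Temporary tables go to the database's default tablespace.
        f"CREATE DATABASE {db} TABLESPACE small_ts;",
        # Run while connected to the new database. No primary key here: a
        # constraint index would otherwise land in the default (small_ts)
        # unless USING INDEX TABLESPACE is given.
        f"CREATE TABLE {table} (id bigint) TABLESPACE big_ts;",
        # PUBLIC has no CREATE on a new tablespace anyway; this makes it explicit.
        f"REVOKE CREATE ON TABLESPACE big_ts FROM {user};",
    ]


if __name__ == "__main__":
    for stmt in restricted_temp_setup("acct", "ledger", "analyst",
                                      "/mnt/small/pg", "/mnt/big/pg"):
        print(stmt)
```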
This is a rather unappealing solution, so maybe you should rethink your requirement. After all, the user could just as well fill up the disk by inserting the data into a permanent table.
I use schemas in PostgreSQL to organize my huge accounting database. At the end of every year I run a reconciliation process that creates a new schema for the next year.
Are the files of the new schema physically separated from the old schema? Or are all schemas stored together on the hard disk?
This matters a lot to me because by the end of every year I have huge tables with millions of records, which means I will soon be running heavy queries (I didn't plan for this when I chose PostgreSQL).
Schemas are namespaces so they are a "logical" thing, not a physical thing.
As documented in the manual, each table is represented by one (or more) files inside the directory corresponding to the database the table was created in. The namespaces (schemas) are not reflected in the physical database layout.
In general, you shouldn't need to care about the physical storage of the database to begin with, and your SQL queries do not know where the actual data is stored.
"Millions" of rows is not considered "huge" these days. If you do run into performance problems, you tune your queries, e.g. with indexes or by rewriting them more efficiently. In rare cases, partitioning can help with really huge tables, but we are talking about hundreds of millions or even billions of rows. With small to medium-sized tables, partitioning usually doesn't improve performance.
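To illustrate the layout, here is a small Python sketch that lists the files holding a table's main data, following the file-layout rules in the manual: in the default tablespace the files live under base/<database OID>/, the first file is named after the table's filenode, and a table larger than 1 GB is split into further segments named <filenode>.1, <filenode>.2 and so on. The OIDs and the tablespace directory are made-up examples, and the sketch ignores the extra _fsm and _vm fork files; on a real server, SELECT pg_relation_filepath('tablename') reports the actual path. Note that the schema name never appears.

```python
import math
from typing import Optional

SEGMENT_SIZE = 1024 ** 3  # default segment size: 1 GB per file


def relation_files(db_oid: int, relfilenode: int, size_bytes: int,
                   tablespace_dir: Optional[str] = None) -> list[str]:
    """List main-fork segment files for one table (paths relative to the data dir)."""
    if tablespace_dir is None:
        # Default tablespace: $PGDATA/base/<database OID>/
        directory = f"base/{db_oid}"
    else:
        # Other tablespace: <tablespace version directory>/<database OID>/
        directory = f"{tablespace_dir}/{db_oid}"
    # Even an empty table has a (zero-length) first segment file.
    n_segments = max(1, math.ceil(size_bytes / SEGMENT_SIZE))
    # First segment carries the bare filenode; later ones get .1, .2, ...
    return [f"{directory}/{relfilenode}" + (f".{i}" if i else "")
            for i in range(n_segments)]


# A 2.5 GB table in database 16384 with filenode 16385:
print(relation_files(16384, 16385, int(2.5 * 1024 ** 3)))
```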
What is the hierarchy of database-related objects in PostgreSQL?
Is it correct that a tablespace must be created at the instance level, unlike in other RDBMSs (where a tablespace belongs to a database)?
If the tablespace is created at the instance level, what is the purpose of a database? And what is the difference between a tablespace and a database on a PostgreSQL server?
An instance (called a cluster in PostgreSQL) is a data directory initialized with initdb, together with the PostgreSQL server process running on it.
A tablespace is a directory outside the data directory where objects can also be stored. Tablespaces are useful for certain corner cases like distributing I/O or limiting space for a subset of the data.
A database is a container for objects with permissions, organized in schemas.
The difference is that tablespaces are a physical concept: a tablespace defines a space where the data are stored. Databases, on the other hand, are a logical concept about how data are organized, what they mean, how they are related, who is allowed to access them and so on.
The two concepts are orthogonal.
A database can have tables in several tablespaces, and a tablespace can contain data from several databases.
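The hierarchy and this orthogonality can be modelled roughly as follows. This is a hypothetical Python data model for illustration, not anything PostgreSQL itself exposes: databases and tablespaces both belong directly to the cluster, schemas belong to a database, tables belong to a schema, and each table merely refers to a tablespace by name. The database, schema, table and tablespace names in the usage example are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    tablespace: str  # physical placement: refers to a cluster-wide tablespace


@dataclass
class Schema:
    name: str  # logical namespace inside one database
    tables: list[Table] = field(default_factory=list)


@dataclass
class Database:
    name: str
    default_tablespace: str = "pg_default"
    schemas: list[Schema] = field(default_factory=list)


@dataclass
class Cluster:
    # Tablespaces and databases both hang directly off the cluster.
    # initdb creates pg_default and pg_global (the latter for shared catalogs).
    tablespaces: set[str] = field(default_factory=lambda: {"pg_default", "pg_global"})
    databases: list[Database] = field(default_factory=list)

    def databases_using(self, tablespace: str) -> set[str]:
        """Databases with at least one table in the given tablespace."""
        return {db.name for db in self.databases
                for s in db.schemas for t in s.tables
                if t.tablespace == tablespace}

    def tablespaces_of(self, db_name: str) -> set[str]:
        """Tablespaces holding at least one table of the given database."""
        return {t.tablespace for db in self.databases if db.name == db_name
                for s in db.schemas for t in s.tables}


cluster = Cluster()
cluster.tablespaces.add("disk_b")
cluster.databases += [
    Database("sales", schemas=[Schema("public", [Table("orders", "pg_default"),
                                                 Table("archive", "disk_b")])]),
    Database("hr", schemas=[Schema("public", [Table("staff", "disk_b")])]),
]
print(cluster.databases_using("disk_b"))  # one tablespace, several databases
print(cluster.tablespaces_of("sales"))    # one database, several tablespaces
```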
A database is where you organize all your objects. A tablespace is just storage space for those objects.
You can store your database objects in different tablespaces. For example, one table can be stored in a tablespace on disk A while another table uses a tablespace on disk B to improve performance. Or you may want a tablespace for big tables and not mind using a big, slow HDD for those objects.
I have a mobile/web project using PostgreSQL 9.3 as the database and Linux as the server.
The data won't be huge, but it will grow over time.
Thinking long term, I have the following questions:
1. Is it necessary for me to create a tablespace for my database, or should I just use the default one?
2. If I create a new tablespace, what is the proper location on Linux for its directory, and why?
3. If I don't create one now and wait until I need it, will it be easy then to migrate the database and its data to a new tablespace?
Just use the default tablespace; do not create new tablespaces. Tablespaces are only useful if you have multiple physical disks, so that you can define which data is stored on which disk. The directory where your data is located is not that important to how PostgreSQL works, so if you only have one disk, tablespaces are of no use.
Should your data grow beyond the capacity of one disk, you will have to copy the data to another physical disk anyway, so you can configure tablespaces at that time. For example, ALTER TABLE ... SET TABLESPACE (or ALTER DATABASE ... SET TABLESPACE) moves existing data into a new tablespace.
The idea behind defining which data is located on which disk (with tablespaces) is that you can, for example, put a big, rarely used table on a slow disk and a heavily used table on a separate, faster disk. But I assume you're not there yet, so don't overcomplicate things.
I am using DB2 v9.5; the database does not use automatic storage, and the table spaces are all SMS (I know that SMS is not best practice, but I'm studying how to perform the migration).
I dropped a total of 144 unused indexes, but the number of pages used/allocated in the database did not change after the DROP INDEX statements.
As far as I remember, for SMS table spaces a REORG is not necessary after dropping objects (tables or indexes); it is only needed when you have deleted rows from a table, in which case you run REORG to reduce the space allocated to the table.
Does anyone have an idea what can be done to actually free the space used by the dropped indexes?
If you are sure your indexes were in SMS table spaces, check the space usage of the corresponding file system directly, e.g. with df -h or a similar tool.
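If you want to script that check, a minimal Python sketch using the standard library's shutil.disk_usage reports roughly what df shows for the file system holding a given directory. Pass the directory where your SMS table space containers live (the "." below is only a placeholder) and compare the output before and after the DROP INDEX statements. The used percentage here is computed against the total size, so it can differ slightly from df's Use% column, which excludes reserved blocks.

```python
import shutil

GIB = 1024 ** 3


def usage_report(path: str) -> dict:
    """Space on the file system holding `path`, similar to one row of df -h."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "total_gib": usage.total / GIB,
        "used_gib": usage.used / GIB,
        "free_gib": usage.free / GIB,
        "used_pct": 100 * usage.used / usage.total,
    }


if __name__ == "__main__":
    report = usage_report(".")  # placeholder: use the SMS container directory
    print({k: round(v, 2) for k, v in report.items()})
```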
I have a rather large INSERT query, and when I run it my slow disks fill up towards 100%, at which point I receive:
Transaction aborted because DBD::Pg::db do failed: ERROR: could not write to hash-join temporary file: No space left on device
That sounds plausible. I have a fast drive with lots of free space that I could use instead of the slow disk, but I don't want to make the fast disk the default tablespace or move the table I'm inserting into onto it. I just want the intermediate data generated as part of the INSERT query to go to a tablespace on the fast disk. Is this possible in PostgreSQL, and if so, how?
PostgreSQL version: 9.1
You want the temp_tablespaces configuration parameter. See the docs:
Temporary files for purposes such as sorting large data sets are also created in these tablespaces.
You must create the tablespace(s) with CREATE TABLESPACE before using them in an interactive SET temp_tablespaces command, and a non-superuser needs CREATE privilege on them.
SET LOCAL temp_tablespaces can be used to set it for the current transaction only.
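A minimal sketch of the per-transaction approach, written as a Python helper that wraps the large INSERT in a transaction with SET LOCAL. The tablespace name fast_temp, the role app_user and the mount point in the one-time setup are hypothetical. The helper only builds the SQL text, which you would then run with psql or your client library; the target table stays where it is, and only the temporary files (such as the hash-join spill files from the error above) go to the fast disk.

```python
# One-time setup, run as a superuser (names and path are hypothetical):
SETUP = [
    "CREATE TABLESPACE fast_temp LOCATION '/mnt/fast/pg_temp';",
    # Needed so a non-superuser may name it in an interactive SET.
    "GRANT CREATE ON TABLESPACE fast_temp TO app_user;",
]


def insert_with_fast_temp(insert_sql: str, tablespace: str = "fast_temp") -> str:
    """Wrap a statement so its temp files use `tablespace` for this transaction only."""
    return "\n".join([
        "BEGIN;",
        # SET LOCAL lasts only until COMMIT/ROLLBACK; other sessions are unaffected.
        f"SET LOCAL temp_tablespaces = '{tablespace}';",
        # Normalize the trailing semicolon of the caller's statement.
        insert_sql.rstrip().rstrip(";") + ";",
        "COMMIT;",
    ])


print(insert_with_fast_temp("INSERT INTO target SELECT * FROM a JOIN b USING (id)"))
```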