How to create a Buffer Pool in a database dedicated only to ONE BIG table? - db2

I have a table TICKET with 400K records in a DB2 database.
I want to create one large buffer pool dedicated only to this one big table, for faster response times. What are the steps to do it?
Also, at the moment I have one buffer pool that covers the whole tablespace with all the tables (about 200) in the database. What will then happen to my specific table in that original buffer pool? Should the table stay in the first buffer pool, or how do I remove it from that buffer pool?
Also, are there any risks with this action?
Thank you

I think this article will help you: http://www.ibm.com/developerworks/data/library/techarticle/0212wieser/index.html
Moving your large table into a different buffer pool may increase performance, but it depends on your use case. A relevant quote from the article:
Having more than one buffer pool can preserve data in the buffers. For example, you might have a database with many very-frequently used small tables, which would normally be in the buffer in their entirety to be accessible very quickly. You might also have a query that runs against a very large table that uses the same buffer pool and involves reading more pages than the total buffer size. When this query runs, the pages from the small, very frequently used tables are lost, making it necessary to re-read them when they are needed again. If the small tables have their own buffer pool, thereby making it necessary for them to have their own table space, their pages cannot be overwritten by the large query. This can lead to better overall system performance, albeit at the price of a small negative effect on the large query.
If you do decide to do this, you can only have one buffer pool per tablespace, so you would need to move your large table into its own tablespace. The article gives examples of creating tablespaces and buffer pools.
A table can be moved to another tablespace with ADMIN_MOVE_TABLE. I don't think it is risky: it captures changes made to the source table while the move is in progress. The only restriction is that it disables a few (rarely used) operations on the source table during the move.
You assign a buffer pool to a tablespace by specifying it in the CREATE TABLESPACE or ALTER TABLESPACE statement.
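Putting the pieces together, here is a minimal sketch of the sequence, assuming a 4K page size; the names TICKET_BP, TICKET_TS and the schema MYSCHEMA are placeholders you would replace with your own:
-- create a dedicated buffer pool (SIZE is in pages; tune it to your memory budget)
CREATE BUFFERPOOL TICKET_BP SIZE 50000 PAGESIZE 4K;
-- create a tablespace that uses the new buffer pool (page sizes must match)
CREATE TABLESPACE TICKET_TS PAGESIZE 4K BUFFERPOOL TICKET_BP;
-- move the table online into the new tablespace
CALL SYSPROC.ADMIN_MOVE_TABLE(
  'MYSCHEMA', 'TICKET',
  'TICKET_TS', 'TICKET_TS', 'TICKET_TS',  -- data, index and LOB tablespaces
  '', '', '', '', '', 'MOVE');
The existing buffer pool stays attached to the old tablespace and keeps serving the other 200 tables; once the move completes, pages of TICKET are read through the new buffer pool only.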

Related

huge postgres database size reduction

I have a huge database (9 billion rows, 3000 columns) currently hosted on Postgres; it is about 30 TB in size. I am wondering if there are some practical ways to reduce the size of the database while preserving the same information, as storage is really costly.
If you don't want to delete any data:
Vacuuming. Depending on how many updates/deletes your database performs, there may be a lot of garbage. A database that size is likely full of tables that do not cross autovacuum thresholds very often (PostgreSQL 13 has a fix for this). Manually running vacuums will mark dead rows for re-use and free up space where the end of the table file is no longer needed.
Index management. Indexes bloat over time and should be smaller than your tables. Re-indexing (concurrently) will give you some space back, or allow re-use of existing pages; a sketch of both maintenance commands follows this list.
Data de-duplication / normalisation. See where you can remove data from tables where it is not needed, or is already present elsewhere in the database.
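A minimal sketch of the first two items, assuming a hypothetical table big_table with an index big_table_idx (REINDEX ... CONCURRENTLY needs PostgreSQL 12 or later):
-- mark dead rows reusable and truncate free space at the end of the table file
VACUUM (VERBOSE) big_table;
-- rebuild a bloated index without blocking writes
REINDEX INDEX CONCURRENTLY big_table_idx;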

When should one vacuum a database, and when analyze?

I just want to check that my understanding of these two things is correct. If it's relevant, I am using Postgres 9.4.
I believe that one should vacuum a database when looking to reclaim space from the filesystem, e.g. periodically after deleting tables or large numbers of rows.
I believe that one should analyse a database after creating new indexes, or (periodically) after adding or deleting large numbers of rows from a table, so that the query planner can make good calls.
Does that sound right?
vacuum analyze;
collects statistics and should be run as often as the data changes (especially after bulk inserts). It does not take exclusive locks on objects. It puts some load on the system, but it is worth it. It does not reduce the size of the table, but marks scattered freed-up space (e.g. deleted rows) for reuse.
vacuum full;
reorganises the table by creating a copy of it and switching to it. This form of vacuum requires additional disk space to run, but it reclaims all unused space in the object. It therefore takes an exclusive lock on the object (other sessions have to wait for it to complete). Run it when a lot of data has changed (deletes, updates) and when you can afford to make others wait.
Both are very important on a dynamic database.
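A minimal sketch of both commands run against a single table, assuming a hypothetical table my_table:
-- routine maintenance: refresh planner statistics and mark dead rows reusable
VACUUM ANALYZE my_table;
-- aggressive reclaim: rewrites the table into a new file and takes an exclusive lock
VACUUM FULL my_table;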
Correct.
I would add that you can change the value of the default_statistics_target parameter (default 100) in the postgresql.conf file to a higher number, after which you need to reload the configuration (a full restart is not required) and run ANALYZE to obtain more accurate statistics.
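A minimal sketch of one way to do this without editing postgresql.conf by hand (ALTER SYSTEM writes the value to postgresql.auto.conf; 500 is just an illustrative target):
ALTER SYSTEM SET default_statistics_target = 500;
SELECT pg_reload_conf();  -- pick up the new setting
ANALYZE;                  -- recompute planner statistics with the larger sample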

postgres 9.2 table size with pg_total_relation_size

I process a table with ~10^7 rows the following way: take the last N rows, update them in some way, then delete them, then vacuum the table. At the end I query pg_total_relation_size. The loop repeats until the whole table has been processed. Each iteration lasts several seconds. There are no other queries against this table apart from those mentioned above. The problem is that I keep getting the same result for the table size; it changes only about once every several hours.
So the question is: does Postgres store the table size somewhere, or does it calculate it every time the function is invoked? I.e., does my table size really stay the same in spite of this processing?
Your table really does stay the same size on disk despite the DELETEs and VACUUMing you're doing. As per the documentation on VACUUM, ordinary VACUUM only releases space back to the OS if it can do so by truncating free space from the end of the file without rearranging live rows.
The space is still "free" in that PostgreSQL can re-use it for other new rows. It is much, much faster to re-use space that PostgreSQL hasn't given back to the OS than it is to extend a relation with new space, so this is often preferable.
The other reason Pg doesn't just give this space back is that it can only give space back to the OS when it's a contiguous chunk with no visible rows until the end of the file. This doesn't happen much so in practice Pg needs to move some rows around to compact the table and allow it to free space at the end, kind of like a defrag on a file system. This is an inefficient and slow process that can counter-intuitively make the table slower to access instead of faster, so it's not always a good idea.
If you have a relation that's mostly but not entirely empty it can be worth doing a VACUUM FULL (Pg 9.0 and above) or CLUSTER (all versions) to free the space. If you expect to refill the table this is usually counter-productive; it's actually better to leave it as-is.
(For what I mean by terms like "live" and "visible" see the documentation on MVCC which will help you understand Pg's table organisation.)
Personally I'd skip the manual VACUUM in your case. Turn autovacuum up if you need to. If you really need to you could consider partitioning your table, processing it partition by partition and TRUNCATE each partition when you're done processing it.
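A minimal sketch of checking the on-disk size, assuming a hypothetical table my_table; these functions report the current file sizes, so the number only drops when space is physically handed back to the OS:
-- total size including indexes and TOAST, human-readable
SELECT pg_size_pretty(pg_total_relation_size('my_table'));
-- the heap alone, without indexes and TOAST
SELECT pg_size_pretty(pg_relation_size('my_table'));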

DB Trigger to limit maximum table size in Postgres

Is it possible, perhaps using DB-triggers to set a maximum table-size in a postgres DB?
For example, say I have a table called Comments.
From the user's perspective, comments can be added as frequently as they like, but say I only want to store the 100 most recent comments in the DB. So what I want is a trigger that automatically maintains this, i.e. when there are more than 100 comments, it deletes the oldest one, and so on.
Could someone help me with writing such a trigger?
I think a trigger is the wrong tool for the job, although it is possible to implement this. Something about spawning a "delete" from an executing insert makes the hair on my neck stand up. You will generate a lot of locking and possibly contention that way, and inserts should generally not generate locks.
To me this says "stored procedure" all the way.
But I also think you should ask yourself: why delete old comments? Deletes are anathema. Better to just limit them when you display them. If you are really worried about the size of the table, use a TEXT column. Postgres will keep large values in a shadow (TOAST) table, and full scans of the original table will blaze along just fine.
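A minimal sketch of limiting at display time instead of deleting, assuming a hypothetical comments(user_id, comment_date) table; $1 stands for the user id parameter:
-- show only the 100 most recent comments; older rows simply never surface
SELECT * FROM comments
WHERE user_id = $1
ORDER BY comment_date DESC
LIMIT 100;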
Limiting to 100 comments per user is rather simple, e.g.
-- DELETE has no ORDER BY/OFFSET, so pick the rows to remove via a subquery (ctid avoids assuming a key column)
delete from comments where ctid in (
  select ctid from comments where user_id = new.user_id
  order by comment_date desc offset 100);
Limiting the byte size is trickier. You'd need to calculate the relevant row sizes and that won't account for index sizes, dead rows, etc. At best you'd use the admin functions to get the table size but these won't yield the size per user, only the total size.
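If you do go the trigger route despite the caveats above, here is a minimal sketch, assuming the same comments(user_id, comment_date) table; the function and trigger names are placeholders:
create or replace function trim_comments() returns trigger as $$
begin
  -- keep only the 100 newest comments of the user who just posted
  delete from comments where ctid in (
    select ctid from comments where user_id = new.user_id
    order by comment_date desc offset 100);
  return new;
end;
$$ language plpgsql;

create trigger trim_comments_after_insert
  after insert on comments
  for each row execute procedure trim_comments();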
We could in theory create a table of 100 dummy records and then simply overwrite them with the actual comments. Once we pass the 100th we will overwrite the 1st one, etc.
That way we would supposedly keep the table the same size, but that is not possible, because an update is equivalent to a delete plus an insert in PostgreSQL. So the size of the table will continue to grow.
So if the objective is not to overflow the disk drive, then once the disk is about 80% full a VACUUM FULL should be performed to free up disk space. VACUUM FULL itself requires additional disk space to run. If you keep the number of records fixed, the vacuum will actually have an effect. Also, there seem to be cases where vacuum can fail.

What is the effect on record size of reordering columns in PostgreSQL?

Since Postgres can only add columns at the end of tables, I end up re-ordering by adding new columns at the end of the table, setting them equal to existing columns, and then dropping the original columns.
So, what does PostgreSQL do with the memory that's freed by dropped columns? Does it automatically re-use the memory, so a single record consumes the same amount of space as it did before? But that would require a re-write of the whole table, so to avoid that, does it just keep a bunch of blank space around in each record?
The question is old, but since both answers are wrong or misleading, I'll add another one.
When updating a row, Postgres writes a new row version and the old one is eventually removed by VACUUM after no running transaction can see it any more.
Plain VACUUM does not return disk space from the physical file that contains the table to the system, unless it finds completely dead or empty blocks at the physical end of the table. You need to run VACUUM FULL or CLUSTER to aggressively compact the table and return excess space to the system. This is not typically desirable in normal operation. Postgres can re-use dead tuples to keep new row versions on the same data page, which benefits performance.
In your case, since you update every row, the size of the table is doubled (from its minimum size). It's advisable to run VACUUM FULL or CLUSTER to return the bloat to the system.
Both take an exclusive lock on the table. If that interferes with concurrent access, consider pg_repack, which can do the same without exclusive locks.
To clarify: Running CLUSTER reclaims the space completely. No VACUUM FULL is needed after CLUSTER (and vice versa).
More details:
PostgreSQL 9.2: vacuum returning disk space to operating system
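A minimal sketch of the two in-place options, assuming a hypothetical table my_table with a primary key index my_table_pkey; pg_repack is a separate extension and tool that has to be installed first:
-- rewrite the table in index order and return freed space to the OS (exclusive lock)
CLUSTER my_table USING my_table_pkey;
-- or: rewrite without re-ordering rows (also takes an exclusive lock)
VACUUM FULL my_table;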
From the docs:
The DROP COLUMN form does not physically remove the column, but simply makes it invisible to SQL operations. Subsequent insert and update operations in the table will store a null value for the column. Thus, dropping a column is quick but it will not immediately reduce the on-disk size of your table, as the space occupied by the dropped column is not reclaimed. The space will be reclaimed over time as existing rows are updated.
You'll need to do a CLUSTER or a VACUUM FULL to reclaim the space.
Why do you "reorder"? There is no column order in SQL; it doesn't make sense. If you need a fixed order, tell your queries which order you need, or use a view; that's what views are made for.
Disk space will be re-used after a vacuum; autovacuum will do the job, unless you have disabled that process.
Your current approach will kill overall performance (table locks), indexes have to be recreated, statistics go down the toilet, etc. And in the end you end up with the same situation you already had. So why the effort?