Postgres too slow - PostgreSQL

I'm doing large-scale tests on a Postgres database.
Basically I have two tables: I inserted 40,000,000 records into table1 and 80,000,000 into table2.
After that I deleted all of those records.
Now SELECT * FROM table1 takes 199,000 ms.
I can't understand what's happening.
Can anyone help me with this?

If you delete all the rows from a table, they are marked as deleted but not actually removed from disk immediately. To remove them you need to run VACUUM; autovacuum should kick in automatically some time after such a big delete. Even then, plain VACUUM mostly just makes the space reusable: it can only return empty pages at the end of the table to the operating system, so to reliably shrink the file you need VACUUM FULL. Until the dead rows are cleaned up, a sequential scan such as SELECT * still has to read every one of those pages, which is why the query is so slow.
If you regularly need to delete all the rows from a large table, consider using TRUNCATE instead, which simply discards the table's data files.
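A toy model of why the SELECT stays slow after the DELETE. It only counts live and dead tuples and is not PostgreSQL's real page format; the figure of 100 tuples per page is an arbitrary assumption. Plain VACUUM is deliberately not modelled, because whether it shrinks the file depends on where the empty pages lie.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyHeap:
    """Counts live and dead tuples; NOT PostgreSQL's real storage format."""
    tuples_per_page: int
    live: int = 0
    dead: int = 0

    def pages(self) -> int:
        # Dead tuples keep occupying page space until the table is rewritten.
        return math.ceil((self.live + self.dead) / self.tuples_per_page)

    def insert(self, n: int) -> None:
        self.live += n

    def delete_all(self) -> None:
        # DELETE only marks the tuples dead; nothing is removed from disk yet.
        self.dead += self.live
        self.live = 0

    def seq_scan(self) -> tuple[int, int]:
        # SELECT * reads every page: returns (rows returned, pages read).
        return self.live, self.pages()

    def vacuum_full(self) -> None:
        # VACUUM FULL rewrites the table keeping only the live tuples.
        self.dead = 0

    def truncate(self) -> None:
        # TRUNCATE discards the table data outright.
        self.live = self.dead = 0
```

With 40,000,000 deleted rows, `seq_scan()` returns zero rows but still reads 400,000 toy pages; after `vacuum_full()` or `truncate()` it reads none.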

The tuples are deleted logically, not physically.
You should run VACUUM on the table (or the whole database).
More info here

If you are deleting all records, use TRUNCATE, not DELETE. Furthermore, the first time you run the query the relation will not be cached (in the OS file cache or in shared buffers), so it will be slower than on subsequent runs.

Related

Temp table updates are slower than on a normal table in PostgreSQL

I have a situation where updates on my temp table are slow. Here is the scenario:
A temp table is created once per session; from then on I run insert, update and delete operations on it until the session ends.
First I insert the rows, and based on those rows I update other columns, but these updates are slow compared to a normal table. When I replaced the temp table with a normal table, the updates took around 50 to 60 s, but with the temp table they take nearly 5 minutes.
When I ran ANALYZE on the temp table, performance improved: the updates completed within 50 seconds.
I also tried user-defined types, but with no luck.
The temp table holds 480 records.
Can anyone help me improve performance on the temp table without ANALYZE, or suggest an alternative for bulk collect and bulk insert with user-defined types?
All of the above operations run in PostgreSQL.
The lack of information in your question forces me to guess, but if all other things are equal, the difference is probably that you don't have accurate statistics on the temporary table. For normal tables, autovacuum gathers statistics automatically, but autovacuum cannot access temporary tables (they are private to the session that created them), so you have to run ANALYZE explicitly to gather their statistics.
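A minimal sketch of that order of operations, assuming a DB-API 2.0 cursor such as one from `psycopg2.connect(...).cursor()`. The table name `work_items`, its columns and the UPDATE are hypothetical stand-ins for your own.

```python
from typing import Iterable, Tuple


def fill_and_update(cur, rows: Iterable[Tuple[int, str]]) -> None:
    """Load a session-private temp table, ANALYZE it, then run the update.

    `cur` is any DB-API 2.0 cursor using the %s parameter style (psycopg2 does).
    Table "work_items" and its columns are hypothetical.
    """
    cur.execute(
        "CREATE TEMP TABLE IF NOT EXISTS work_items "
        "(id integer PRIMARY KEY, name text, status text)"
    )
    cur.executemany(
        "INSERT INTO work_items (id, name) VALUES (%s, %s)", list(rows)
    )
    # Autovacuum never analyzes temp tables, so without this the planner has
    # no statistics for the freshly loaded rows and may choose a poor plan.
    cur.execute("ANALYZE work_items")
    cur.execute("UPDATE work_items SET status = 'done' WHERE name IS NOT NULL")
```

The point is simply that ANALYZE runs after the bulk insert and before the statements that depend on a good plan.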

Huge delete on a PostgreSQL table: deleting 99.9% of the rows

I have a table in my PostgreSQL database that became huge, filled with a lot of useless rows.
As these useless rows represent 99.9% of my table data (about 3.3M rows), I was wondering whether deleting them could have a bad impact on my DB:
I know that this operation could take some time, and I will be able to block writes on the table during the maintenance operation.
But I was wondering whether such a huge change in the data could also affect performance after the operation itself.
I found solutions like creating a new table or using TRUNCATE to remove all rows, but as this is a one-off operation, I would like to choose the most suitable solution.
I know that PostgreSQL has a VACUUM mechanism, but I'm not a DBA expert: could anyone please confirm that this delete will not affect my table's integrity or data structure, and that the freed space will be reclaimed if needed for new data?
PostgreSQL 11.12, with default settings on AWS RDS. I don't have any index on my table, and the criterion for deleting rows will not be based on the PK.
Deleting rows typically does not shrink a PostgreSQL table, so you would then have to run VACUUM (FULL) to compact it, during which the table is inaccessible.
If you are deleting many rows, both the DELETE and the VACUUM (FULL) will take a long time, and you would be much better off like this:
create a new table that is defined like the old one
INSERT INTO new_tab SELECT * FROM old_tab WHERE ... to copy over the rows you want to keep
drop foreign key constraints that point to the old table
create all indexes and constraints on the new table
drop the old table and rename the new one
By planning that carefully, you can get away with a short down time.
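The steps above can be written down as an ordered list of statements. This is a sketch under stated assumptions: the `_new` suffix is a hypothetical naming scheme, `keep_condition` is the WHERE clause you write yourself (trusted SQL, not user input), and the foreign-key and index statements are yours to supply because they depend on your schema.

```python
def quote_ident(name: str) -> str:
    # PostgreSQL identifier quoting: wrap in double quotes, double embedded ones.
    return '"' + name.replace('"', '""') + '"'


def copy_and_swap_statements(old: str, keep_condition: str,
                             before_drop: list[str],
                             after_rename: list[str]) -> list[str]:
    """Ordered SQL for replacing `old` with a copy holding only the kept rows.

    before_drop:  your ALTER TABLE ... DROP CONSTRAINT statements for foreign
                  keys pointing at the old table, then CREATE INDEX / ADD
                  CONSTRAINT statements for the new table.
    after_rename: your statements re-creating those foreign keys.
    """
    new = old + "_new"  # hypothetical naming scheme
    o, n = quote_ident(old), quote_ident(new)
    return [
        "BEGIN",
        # INCLUDING DEFAULTS only: indexes are built after the bulk copy.
        f"CREATE TABLE {n} (LIKE {o} INCLUDING DEFAULTS)",
        f"INSERT INTO {n} SELECT * FROM {o} WHERE {keep_condition}",
        *before_drop,
        f"DROP TABLE {o}",
        f"ALTER TABLE {n} RENAME TO {o}",
        *after_rename,
        "COMMIT",
    ]
```

Using `INCLUDING DEFAULTS` rather than `INCLUDING ALL` keeps indexes off the new table during the INSERT, matching the order of the steps above.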

AWS database: updating a single column adds an extremely large amount of data

I'm retrieving data from an AWS database using pgAdmin. This works well. The problem is that I have one column, originally NULL, that I set to True after retrieving the corresponding row. Doing so adds an enormous amount of data to my database.
I have checked that this is not due to other processes: it only happens when my program is running.
I am certain no rows are being added, I have checked the number of rows before and after and they're the same.
Furthermore, it only does this when changing specific tables: when I update other tables in the same database with the same process, the database size stays the same. It also does not increase the database size on every change; only every few changes does the total size increase.
How can changing a single boolean from NULL to True add 0.1 MB to my database?
I'm using the following commands to check my database makeup:
To get table sizes:
SELECT
relname AS "Table",
pg_total_relation_size(relid) AS "Size",
pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS "External Size"
FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;
To get number of rows:
SELECT schemaname,relname,n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
To get database size:
SELECT pg_database_size('mydatabasename')
If you have not changed it, your table's fillfactor is 100%, since that is the default.
Every UPDATE marks the old version of the row as obsolete and writes a new version. With fillfactor 100 there is usually no room left on the same page, so the new version goes to another page (often a new one at the end of the table). The issue is even worse if you have indexes on the table, since they then have to be updated on every row change too. As you can imagine, this also hurts UPDATE performance.
So if you read the whole table and then update even the smallest column in every row, the table can roughly double in size when fillfactor is 100.
What you can do is ALTER your table to lower its fillfactor, then rewrite it with VACUUM FULL so that the new setting also applies to the existing pages:
ALTER TABLE your_table SET (fillfactor = 90);
VACUUM FULL your_table;
Of course, after this step your table will be about 10% bigger, but Postgres will keep free space on each page for your updates, so updated rows can stay on the same page and the table should grow much less while your process runs.
Autovacuum helps because it periodically cleans up the obsolete rows and therefore keeps your table at roughly the same size, but it can put significant load on your database. If you know in advance that you will run operations like the one described in the question, I would recommend tuning the fillfactor for your needs.
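Rough arithmetic for the fillfactor setting, assuming the default 8192-byte page size (a compile-time option). The 64-byte row size in the usage note is an assumed example figure covering tuple header, data and line pointer, not a measurement.

```python
BLCKSZ = 8192  # default PostgreSQL page size in bytes


def reserved_bytes_per_page(fillfactor: int, block_size: int = BLCKSZ) -> int:
    """Bytes that INSERTs leave free on each heap page for later UPDATEs."""
    if not 10 <= fillfactor <= 100:
        raise ValueError("heap fillfactor must be between 10 and 100")
    return block_size * (100 - fillfactor) // 100


def updates_fitting_on_page(fillfactor: int, row_bytes: int) -> int:
    """Rough count of new row versions that fit into the reserved space."""
    return reserved_bytes_per_page(fillfactor) // row_bytes
```

At fillfactor 100 nothing is reserved; at 90 each page keeps 819 bytes free, which for an assumed 64-byte row is room for about 12 updated row versions before the page is full.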
The problem is that (source):
"In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table"
Furthermore, we did not always close the cursor, which also increased the database size while running.
One last problem is that we were running one huge query, which kept autovacuum from cleaning up properly: a long-running transaction keeps dead rows visible to it, so vacuum cannot remove them. This problem is described in more detail here
Our solution was to re-approach the problem so that the rows did not have to be updated. Another solution we could think of, but have not tried, is to stop the process every once in a while to let autovacuum do its work.
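A related variation on "stop the process every once in a while", which the poster did not try: commit the flag updates in small batches, so no single transaction stays open long enough to keep dead rows from being vacuumed. This is a sketch assuming a psycopg2 connection (which adapts a Python list to an array for `= ANY(%s)`); the table `items`, column `retrieved` and batch size are hypothetical.

```python
from typing import Iterator, Sequence


def batches(ids: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Split ids into consecutive chunks of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def mark_retrieved_in_batches(conn, ids: Sequence[int], size: int = 1000) -> None:
    """Set the flag in short transactions instead of one huge one.

    Table "items" and column "retrieved" are hypothetical.
    """
    for chunk in batches(ids, size):
        # The with-block closes the cursor every time.
        with conn.cursor() as cur:
            cur.execute(
                "UPDATE items SET retrieved = true WHERE id = ANY(%s)",
                (list(chunk),),
            )
        # Ending the transaction lets vacuum reclaim the rows it made obsolete.
        conn.commit()
```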
What do you mean by "adds data"? To all the data files, or only to specific files?
To get a precise answer you should supply more details, but generally speaking, any DB operation will add data to the transaction log (WAL), and possibly to other files.

Postgres: parallel/efficient loading of a huge amount of data with psycopg

I want to load many rows from CSV files.
The files contain data like this: "article_name,article_time,start_time,end_time"
There is a constraint on the table: for the same article name, I don't insert a new row if the new article_time falls in an existing range [start_time, end_time] for the same article.
i.e. don't insert row y if there exists a range [start_time_x, end_time_x] such that article_time_y lies inside [start_time_x, end_time_x], with article_name_y = article_name_x
I tried with psycopg by selecting the existing article names and checking manually whether there is an overlap --> too slow
I tried again with psycopg, this time by defining an 'EXCLUDE USING ...' constraint and inserting with "ON CONFLICT DO NOTHING" (so that it does not fail), but it was still too slow
I tried the same thing, but this time inserting many values in each call of execute (psycopg): it got a little better (1M rows processed in almost 10 minutes), but it is still not as fast as it needs to be for the amount of data I have (500M+)
I tried to parallelize by running the same script several times on different files, but the timing didn't get any better, I guess because of the locks on the table each time we want to write something
Is there any way to lock only the rows with the same article_name (and not the whole table)?
Could you please help with any idea to make this parallelizable and/or more time-efficient?
Lots of thanks, folks
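The rule restated as a Python predicate, only to pin down its semantics. It assumes inclusive bounds, as the [start_time, end_time] notation suggests, and a hypothetical in-memory dict of existing ranges per article name; it is not a suggested loading strategy, since checking client-side is what proved too slow.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Range = Tuple[datetime, datetime]


def violates_rule(existing: Dict[str, List[Range]], name: str, t: datetime) -> bool:
    """True if t falls inside an existing [start, end] range (both ends
    inclusive) of the same article, i.e. the new row must not be inserted."""
    return any(start <= t <= end for start, end in existing.get(name, []))
```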
Your idea with the exclusion constraint and INSERT ... ON CONFLICT is good.
You could improve the speed as follows:
Do it all in a single transaction.
Like Vao Tsun suggested, maybe COPY the data into a staging table first and do it all with a single SQL statement.
Remove all indexes except the exclusion constraint from the table where you modify data and re-create them when you are done.
Speed up insertion by disabling autovacuum and raising max_wal_size (or checkpoint_segments on older PostgreSQL versions) while you load the data.
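A minimal sketch of the staging-table approach in a single transaction, assuming a psycopg2 connection (its cursor's `copy_expert` streams a file through `COPY ... FROM STDIN`), a CSV file without a header row, and a hypothetical target table `articles` that already has the four columns and the exclusion constraint. The column types are assumptions.

```python
from typing import IO

# Hypothetical staging table matching the CSV layout; dropped at commit.
STAGING_DDL = """
CREATE TEMP TABLE staging (
    article_name text,
    article_time timestamp,
    start_time   timestamp,
    end_time     timestamp
) ON COMMIT DROP
"""


def load_csv_via_staging(conn, csv_file: IO[str]) -> None:
    """COPY the CSV into a temp table, then move all rows with one statement."""
    with conn.cursor() as cur:
        cur.execute(STAGING_DDL)
        # Bulk-load the file; far cheaper than many INSERT round trips.
        cur.copy_expert("COPY staging FROM STDIN WITH (FORMAT csv)", csv_file)
        # Rows that conflict with a constraint on "articles" are skipped.
        cur.execute(
            "INSERT INTO articles (article_name, article_time, start_time, end_time) "
            "SELECT article_name, article_time, start_time, end_time FROM staging "
            "ON CONFLICT DO NOTHING"
        )
    conn.commit()
```

ON CONFLICT DO NOTHING without a conflict target is the form PostgreSQL accepts together with exclusion constraints; DO UPDATE is not supported for them.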

Move truncated records to another table in PostgreSQL 9.5

The problem is the following: remove all records from one table and insert them into another.
I have a table that is partitioned by date. To avoid routing each record to its partition one by one, I'm collecting the data in one table and periodically moving it to another table. Copied records have to be removed from the first table. I'm using a DELETE query with RETURNING, but the side effect is that autovacuum has a lot of work to do to clean up the mess in the original table.
I'm trying to achieve the same effect (copy and remove records), but without creating additional work for the vacuum mechanism.
As I'm removing all rows (DELETE without a WHERE condition), I was thinking about TRUNCATE, but it does not support a RETURNING clause. Another idea was to somehow configure the table to remove a tuple from its page automatically on delete, without waiting for vacuum, but I could not find out whether that is possible.
Can you suggest something, that I could use to solve my problem?
You need to use something like:
--Open your transaction
BEGIN;
--Prevent concurrent writes, but allow concurrent data access
LOCK TABLE table_a IN SHARE MODE;
--Copy the data from table_a to table_b, you can also use CREATE TABLE AS to do this
INSERT INTO table_b SELECT * FROM table_a;
--Empty table_a
TRUNCATE TABLE table_a;
--Commit and release the lock
COMMIT;