Deleting from a table on the basis of indexed columns is taking forever - PostgreSQL

We have a table with three indexed columns, say:
column1 of type bigint
column2 of type timestamp without time zone
column3 of type timestamp without time zone
The table has more than 12 crore records, and we are trying to delete all records older than current date - 45 days using the query below:
delete from tableA
where column2 <= '2019-04-15 00:00:00.00'
OR column3 <= '2019-04-15 00:00:00.00';
This executes forever and never completes.
Is there any way we can improve the performance of this query?
I tried dropping the indexes, deleting the data, and recreating the indexes, but this is not working: I am not able to delete the data even after dropping the indexes.
I do not want to change the query; rather, I want Postgres configured through some property so that it is able to delete the records.

See also Best way to delete millions of rows by ID for a good discussion of the issue.
12 crores == 120 million rows?
Deleting from a large indexed table is slow because the index entries for every deleted row have to be maintained throughout the process. If you can select the rows you want to keep, use them to create a new table, and then drop the old one, the process is much faster. If you do this regularly, use table partitioning and detach a partition when required; the detached partition can then be dropped cheaply.
1) Check the logs; you are probably suffering from deadlocks.
2) Try creating a new table selecting the data you need, then drop and rename. Use all the columns in your index in the query. DROP TABLE is much faster than DELETE ... FROM:
CREATE TABLE new_table AS (
    SELECT * FROM old_table
    WHERE column1 >= 1
      AND column2 >= current_date - 45
      AND column3 >= current_date - 45);
DROP TABLE old_table;
ALTER TABLE new_table RENAME TO old_table;
CREATE INDEX ...
3) Create a new table using partitions based on date, with one partition per, say, 15, 30, or 45 days (if you regularly remove data that is 45 days old). See https://www.postgresql.org/docs/10/ddl-partitioning.html for details.
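As a rough sketch of option 3 under PostgreSQL 10's declarative partitioning (hypothetical table and partition names; note the range key covers column2 only, so the original OR condition on column3 does not map onto a single partition key):
-- Parent table, range-partitioned on one of the timestamp columns
CREATE TABLE tableA_part (
    column1 bigint,
    column2 timestamp without time zone,
    column3 timestamp without time zone
) PARTITION BY RANGE (column2);
-- One partition per 15-day window
CREATE TABLE tableA_p20190401 PARTITION OF tableA_part
    FOR VALUES FROM ('2019-04-01') TO ('2019-04-16');
CREATE TABLE tableA_p20190416 PARTITION OF tableA_part
    FOR VALUES FROM ('2019-04-16') TO ('2019-05-01');
-- Expiring old data then becomes a cheap metadata operation:
ALTER TABLE tableA_part DETACH PARTITION tableA_p20190401;
DROP TABLE tableA_p20190401;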

Related

How to truncate a PostgreSQL table with conditions

I'm trying to truncate a PostgreSQL table with some conditions: remove all the data in the table and keep only the data of the last 6 months.
For that I have written this query:
select distinct datecalcul
from Table
where datecalcul > now() - INTERVAL '6 months'
order by datecalcul asc
How could I add the truncate clause?
TRUNCATE does not support a WHERE condition. You will have to use a DELETE statement.
delete from the_table
where ...
If you want to get rid of old ("expired") rows efficiently based on a timestamp, you can think about partitioning. Then you can just drop the old partitions.
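For this question's table, a minimal sketch of such a DELETE, keeping the last 6 months (using the datecalcul column from the query above):
delete from the_table
where datecalcul <= now() - INTERVAL '6 months';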

Can a heavily indexed table have slower updates even if the updated columns aren't in any of the indexes?

I'm trying to understand why a 14 million row table is so slow to update, even though I'm joining on its primary key and updating in batches (5,000 rows).
This is the query:
UPDATE A
SET COL1 = B.COL1,
    COL2 = B.COL2,
    COL3 = 'ALWAYS THE SAME VAL'
FROM TABLE_X A, TABLE_Y B
WHERE A.PK = B.PK
TABLE_X has 14 million rows.
TABLE_X has 12 indexes; however, the updated columns do not belong to any index, so it's not expected that this slowness is caused by having so many indexes, right?
TABLE_Y has 5,000 rows.
ADDITIONAL INFORMATION
I must update in the order of another column (Group) rather than the PK. If I could update in the order of the PK, it would be much faster.
This is a business need: if the process has to be stopped, they want each group to be either fully updated or not updated at all.
What could be causing such slow updates?
The database is Sybase 15.7.

Busy table performance optimization

I have a PostgreSQL table storing data from a table-like form:
id SERIAL,
item_id INTEGER,
date BIGINT,
column_id INTEGER,
row_id INTEGER,
value TEXT,
some_flags INTEGER,
The issue is that we have 5,000+ entries per day and the information needs to be kept for years.
So I end up with a huge table which is busy in its top 1,000-5,000 rows, with lots of SELECT, UPDATE, and DELETE queries, while the old content is rarely used (only in statistics) and is almost never changed.
The question is how I can boost the performance for the daily work (the top 5,000 entries out of 50 million).
There are simple indexes on almost all columns, but nothing fancy.
Splitting the table is not possible for now; I'm looking more for index optimisation.
The advice in the comments from dezso and Jack is good. If you want the simplest option, this is how you implement the partial index:
create table t ("date" bigint, archive boolean default false);

insert into t ("date")
select generate_series(
    extract(epoch from current_timestamp - interval '5 year')::bigint,
    extract(epoch from current_timestamp)::bigint,
    5);

create index the_date_partial_index on t ("date")
where not archive;
To avoid having to change all queries to add the index condition, rename the table:
alter table t rename to t_table;
And create a view with the old name, including the index condition:
create view t as
select *
from t_table
where not archive
;
explain
select *
from t
;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Index Scan using the_date_partial_index on t_table (cost=0.00..385514.41 rows=86559 width=9)
Then each day you archive older rows:
update t_table
set archive = true
where "date" < extract(epoch from current_timestamp - interval '1 week')
  and not archive;
The not archive condition is there to avoid updating millions of already archived rows.

To delete records beyond the latest 20 from a table

At any time, I want my table to keep the latest 20 rows and delete the rest.
I tried rownum > 20 but it said "0 rows deleted" even when my table had 50 records. However, on trying rownum < 20, the first 19 records were deleted.
Please help.
ROWNUM is a pseudo-column which is assigned 1 for the first row produced by the query, 2 for the next, and so on. If you say WHERE ROWNUM > 20, no row will ever be matched: the first row, if there were one, would have ROWNUM = 1, but your predicate rejects it, so the query returns no rows.
If you want to query just the latest 20 rows, you'd need some way of determining what order they were inserted into the table. For example, if each row gets a timestamp when it is inserted, this would usually be pretty reliable (unless you get thousands of rows inserted every second).
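As a sketch, such a table could fill the timestamp in automatically on insert (hypothetical definition; SYSTIMESTAMP is Oracle's current-timestamp function):
CREATE TABLE MYTABLE (
    ts    TIMESTAMP DEFAULT SYSTIMESTAMP,
    mycol NUMBER
);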
With a table defined as MYTABLE(ts TIMESTAMP, mycol NUMBER), you could query the latest 20 rows like this:
SELECT * FROM (
    SELECT ts, mycol FROM MYTABLE ORDER BY ts DESC
)
WHERE ROWNUM <= 20;
Note that if more than one row has the exact same timestamp, this query may pick rows non-deterministically when two or more rows are tied for the 20th spot.
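If you need a deterministic result, one sketch is to add a unique column as a tiebreaker in the sort (assuming mycol is unique, as the delete below does):
SELECT * FROM (
    SELECT ts, mycol FROM MYTABLE ORDER BY ts DESC, mycol DESC
)
WHERE ROWNUM <= 20;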
If you have an index on ts, Oracle is likely to use it to avoid a sort, and it will use the stopkey optimisation to halt the query once it has found the 20th row.
If you want to delete the older rows, you could do something like this, assuming mycol is unique:
DELETE MYTABLE
WHERE mycol NOT IN (
    SELECT mycol FROM (
        SELECT ts, mycol FROM MYTABLE ORDER BY ts DESC
    )
    WHERE ROWNUM <= 20
);
The performance of this delete, if the number of rows to be deleted is large, will probably be helped by an index on mycol.
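For example (hypothetical index name):
CREATE INDEX mytable_mycol_idx ON MYTABLE (mycol);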

SQLite - a smart way to remove and add new objects

I have a table in my database, and I want each row in my table to have a unique id, with the rows numbered sequentially.
For example: I have 10 rows, each with an id starting from 0 and ending at 9. When I remove a row from the table, let's say row number 5, a "hole" appears. And afterwards when I add more data, the "hole" is still there.
It is important for me to know the exact number of rows and to have data at every row position, so that I can access my table arbitrarily.
Is there a way in SQLite to do this? Or do I have to manually manage the removing and adding of data?
Thank you in advance,
Ilya.
It may be worth considering whether you really want to do this. Primary keys usually should not change through the lifetime of the row, and you can always find the total number of rows by running:
SELECT COUNT(*) FROM table_name;
That said, the following trigger should "roll down" every ID number whenever a delete creates a hole:
CREATE TRIGGER sequentialize_ids AFTER DELETE ON table_name FOR EACH ROW
BEGIN
    UPDATE table_name SET id = id - 1 WHERE id > OLD.id;
END;
I tested this on a sample database and it appears to work as advertised. If you have the following table:
id name
1 First
2 Second
3 Third
4 Fourth
And delete where id=2, afterwards the table will be:
id name
1 First
2 Third
3 Fourth
This trigger can take a long time and has very poor scaling properties (the time grows with both the number of rows you delete and the number of rows remaining in the table). On my computer, deleting 15 rows at the beginning of a 1000 row table took 0.26 seconds, but this will certainly be longer on an iPhone.
I strongly suggest that you rethink your design. In my opinion you're asking for trouble in the future (e.g. if you create another table and want to have some relations between the tables).
If you want to know the number of rows just use:
SELECT count(*) FROM table_name;
If you want to access rows in the order of id, just define this field using PRIMARY KEY constraint:
CREATE TABLE test (
id INTEGER PRIMARY KEY,
...
);
and get rows using ORDER BY clause with ASC or DESC:
SELECT * FROM table_name ORDER BY id ASC;
SQLite creates an index for the primary key field, so this query is fast.
I think that you would be interested in reading about LIMIT and OFFSET clauses.
The best source of information is the SQLite documentation.
If you don't want to take Stephen Jennings's very clever but performance-killing approach, just query a little differently. Instead of:
SELECT * FROM mytable WHERE id = ?
Do:
SELECT * FROM mytable ORDER BY id LIMIT 1 OFFSET ?
Note that OFFSET is zero-based, so you may need to subtract 1 from the variable you're indexing in with.
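For instance, under that convention the nth row in id order sits at OFFSET n - 1, so the 5th row would be:
SELECT * FROM mytable ORDER BY id LIMIT 1 OFFSET 4;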
If you want to reclaim deleted row ids, the VACUUM command or the auto_vacuum pragma may be what you seek:
http://www.sqlite.org/faq.html#q12
http://www.sqlite.org/lang_vacuum.html
http://www.sqlite.org/pragma.html#pragma_auto_vacuum