I need to execute more than 50 thousand update queries in DB2, and it's taking 3 hours to process only 1,500 records. Below is a sample of the update queries; the table has an index, yet it still takes a long time. I would like to know whether there is any other way to speed up the execution. The unique index is on ITEMNUM and ITEMSETID. Each update statement updates 1 row only.
UPDATE DEMO SET CXDEMO=(select orgid from organization where itemsetid = 'ABC')
WHERE ITEMNUM='0039523-03' AND itemsetid='ABC';
UPDATE DEMO SET CXDEMO=(select orgid from organization where itemsetid = 'ABC')
WHERE ITEMNUM='0039523-07' AND itemsetid='ABC';
UPDATE DEMO SET CXDEMO=(select orgid from organization where itemsetid = 'ABC')
WHERE ITEMNUM='0039528-03' AND itemsetid='ABC';
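Since every statement uses the identical subquery and differs only in the ITEMNUM literal, one option worth testing (a minimal sketch, not a verified fix for this table) is to fold many single-row updates into one set-based UPDATE, driven by an IN list or by a staging table holding the 50,000 keys:

UPDATE DEMO
SET CXDEMO = (SELECT orgid FROM organization WHERE itemsetid = 'ABC')
WHERE itemsetid = 'ABC'
  AND ITEMNUM IN ('0039523-03', '0039523-07', '0039528-03');
-- the IN list above is illustrative; a staging table of item numbers used in
-- a subselect avoids building a huge literal list

Committing once per batch of statements, rather than after every single-row update, also typically removes a large share of the per-statement overhead.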
I have a simple query that fetches some results from the User model.
Query 1:
SELECT users.id, users.name, company_id, updated_at
FROM "users"
WHERE (TRIM(telephone) = '8973847' AND company_id = 90)
LIMIT 20 OFFSET 0;
Result:
Then I updated customer 341683 and ran the same query again; this time the results came back in a different order, with the last-updated row first. Is Postgres ordering by last update by default, or is something else happening here?
Without an order by clause, the database is free to return rows in any order, and will usually just return them in whichever way is fastest. It stands to reason the row you recently updated will be in some cache, and thus returned first.
If you need to rely on the order of the returned rows, you need to explicitly state it, e.g.:
SELECT users.id, users.name, company_id, updated_at
FROM "users"
WHERE (TRIM(telephone) = '8973847' AND company_id = 90)
ORDER BY id -- Here!
LIMIT 20 OFFSET 0
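And if the intent is to see the most recently updated rows first, that too must be requested explicitly; a minimal sketch using the updated_at column already present in the select list (assuming the application keeps updated_at current):

SELECT users.id, users.name, company_id, updated_at
FROM "users"
WHERE (TRIM(telephone) = '8973847' AND company_id = 90)
ORDER BY updated_at DESC -- newest updates first
LIMIT 20 OFFSET 0;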
I want to use Postgres's SELECT ... FOR UPDATE SKIP LOCKED functionality to ensure that two different users reading from a table and claiming tasks do not block each other and also do not get tasks that are already being read by another user.
The query that retrieves tasks uses a join. We do not want row-level locks on any table other than the one that contains the main info. Sample query below; only the rows in the task table should be locked:
SELECT v.someid , v.info, v.parentinfo_id, v.stage FROM task v, parentinfo pi WHERE v.stage = 'READY_TASK'
AND v.parentinfo_id = pi.id
AND pi.important_info_number = (
SELECT MAX(important_info_number) FROM parentinfo )
ORDER BY v.id limit 200 for update skip locked;
Now if user A is retrieving some 200 rows of this table, user B should be able to retrieve another set of 200 rows.
EDIT: As per the comment below, the query has been changed to:
SELECT v.someid , v.info, v.parentinfo_id, v.stage FROM task v, parentinfo pi WHERE v.stage = 'READY_TASK'
AND v.parentinfo_id = pi.id
AND pi.important_info_number = (
SELECT MAX(important_info_number) FROM parentinfo )
ORDER BY v.id limit 200 for update of v skip locked;
How best to place the ORDER BY so that the returned rows are ordered? Even though the ordering is affected when multiple users invoke this command concurrently, the rows being returned should still preserve some ordering.
Also, does this ensure that multiple threads invoking the same SELECT query retrieve different sets of rows, or does the locking only apply to UPDATE commands?
Just experimented with this a little: multiple SELECT queries do end up retrieving different sets of rows, and ORDER BY ensures the ordering of the final result.
Yes,
FOR UPDATE OF "TABLE_NAME" SKIP LOCKED
will lock only TABLE_NAME
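For completeness, a minimal sketch of how two workers interact with the edited query above (the session split is illustrative): each worker runs the statement inside its own transaction, so rows locked by worker A are skipped by worker B and each claims a disjoint batch of up to 200 tasks.

-- Worker A
BEGIN;
SELECT v.someid, v.info, v.parentinfo_id, v.stage
FROM task v
JOIN parentinfo pi ON v.parentinfo_id = pi.id
WHERE v.stage = 'READY_TASK'
  AND pi.important_info_number = (SELECT MAX(important_info_number) FROM parentinfo)
ORDER BY v.id
LIMIT 200
FOR UPDATE OF v SKIP LOCKED;
-- ... process the claimed rows ...
COMMIT;

-- Worker B, run concurrently, executes the same statement and receives the
-- next 200 unlocked rows, because the rows A has locked are skipped rather
-- than waited on.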
I have a table in a Redshift cluster with 5 billion rows. I have a job that tries to update some column values based on some filter. Updating anything at all in this table is incredibly slow. Here's an example:
Update tbl1
set price=tbl2.price, flag=true
from tbl2 join tbl1 on tbl1.id=tbl2.id
where tbl1.time between (some value) and
tbl2.createtime between (some value)
I have a sort key on time and a dist key on id. When I checked the stl_scan table, it shows that my query scans 50 million rows on each slice and returns only 50K rows per slice. I stopped the query after 20 minutes.
For testing, I created the same table with 1 billion rows, and the same update query took 3 minutes.
When I run a SELECT with the same conditions, I get the results in a few seconds. Is there anything I am doing wrong?
I believe the correct syntax is:
Update tbl1
set price = tbl2.price,
flag = true
from tbl2
where tbl1.id = tbl2.id and
tbl1.time between (some value) and
tbl2.createtime between (some value);
Note that tbl1 is only mentioned once, in the update clause. There is no join, just a correlation clause.
Every time I do an INSERT or upsert (ON CONFLICT DO UPDATE), the auto-increment column on each table jumps by the number of updates that came before it.
For instance, if I have this table:
id int4
title text
description text
updated_at timestamp
created_at timestamp
And then run these queries:
INSERT INTO notifications (title, description) VALUES ('something', 'whatever'); -- generates id = 1
UPDATE notifications SET title = 'something else' WHERE id = 1; -- repeat this query 20 times with different values
INSERT INTO notifications (title, description) VALUES ('something more', 'whatever again'); -- generates id = 22
This is a pretty big issue. The script we are running processes 100,000+ notifications every day. This can create gaps between inserts on the order of 10,000, so we might start off with 100 rows, but by the time we reach 1,000 rows the auto-incremented primary key of the last row is already over 100,000.
We will quickly run out of auto-increment values on our tables if this continues.
Is our PostgreSQL server misconfigured? Using Postgres 9.5.3.
I'm using Eloquent Schema Builder (e.g. $table->increments('id')) to create the table and I don't know if that has something to do with it.
A sequence will be incremented whenever an insertion is attempted, regardless of whether it succeeds. A simple UPDATE (as in your example) will not increment it, but an INSERT ... ON CONFLICT DO UPDATE will, since the insert is tried before the update.
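As an illustration (a minimal sketch that assumes a unique constraint on title, which the table definition above does not show), an upsert consumes a sequence value even when it ends up taking the update path:

INSERT INTO notifications (title, description)
VALUES ('something', 'whatever')
ON CONFLICT (title) DO UPDATE
SET description = EXCLUDED.description;
-- nextval() runs while the candidate row is built, before the conflict is
-- detected, so the sequence advances even though no new row is stored.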
One solution is to change the id to bigint. Another is not to use a sequence and manage it yourself. And another is to do a manual upsert:
with s as (
    -- find the existing row, if any
    select id
    from notifications
    where title = 'something'
), i as (
    -- insert only when no such row exists
    insert into notifications (title, description)
    select 'something', 'whatever'
    where not exists (select 1 from s)
)
-- update the pre-existing row; does nothing when s is empty
update notifications
set title = 'something else'
where id = (select id from s);
This supposes title is unique.
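For the first option, changing the column type is enough, since the backing sequence is already 64-bit internally; a minimal sketch using the table from the example:

ALTER TABLE notifications ALTER COLUMN id TYPE bigint;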
You can reset the auto-increment column to the maximum inserted value by running this command before the insert command:
SELECT setval('notifications_id_seq', MAX(id)) FROM notifications;
I have a Customer table that has 55 million records. I need to update a column HHPK with incrementing values.
Example:
1, 2, 3, 4, 5, ... up to 55 million
I'm using the following script, but it errors out because the transaction log for the database is getting full.
The DB is using the simple recovery model.
DECLARE @SEQ BIGINT
SET @SEQ = 0

UPDATE Customers
SET @SEQ = HHPK = @SEQ + 1
Is there any other way to do that task? Please help
As your table already has a CustomerPK identity column, just use:
UPDATE dbo.Customers
SET HHPK = CustomerPK
Of course - with 55 million rows, this will be a strain on your log file. So you might want to do this in batches - preferably of less than 5000 rows to avoid lock escalation effects that would exclusively lock the entire table:
UPDATE TOP (4500) dbo.Customers
SET HHPK = CustomerPK
WHERE HHPK IS NULL
and repeat this until the entire table has been updated.
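A minimal sketch of that repeat-until-done loop (it assumes HHPK is NULL for rows that have not been updated yet, as in the batch above):

WHILE 1 = 1
BEGIN
    UPDATE TOP (4500) dbo.Customers
    SET HHPK = CustomerPK
    WHERE HHPK IS NULL;

    IF @@ROWCOUNT = 0
        BREAK;      -- nothing left to update

    CHECKPOINT;     -- under the simple recovery model this lets log space be reused between batches
END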
But really: if you already have an INT IDENTITY column CustomerPK - why do you need a second column to hold the same values? Doesn't make a lot of sense to me ....