Updating a table in SQL Server 2008 R2 - sql-server-2008-r2

I have a Customer table that has 55 million records. I need to update a column HHPK with incrementing values.
Example:
1, 2, 3, 4, 5 ... up to 55 million
I'm using the following script, but it errors out because the transaction log for the database is getting full.
The DB is using the simple recovery model.
DECLARE @SEQ BIGINT
SET @SEQ = 0
UPDATE Customers
SET @SEQ = HHPK = @SEQ + 1
Is there any other way to do that task? Please help

As your table already has a CustomerPK identity column, just use:
UPDATE dbo.Customers
SET HHPK = CustomerPK
Of course - with 55 million rows, this will be a strain on your log file. So you might want to do this in batches - preferably of less than 5000 rows to avoid lock escalation effects that would exclusively lock the entire table:
UPDATE TOP (4500) dbo.Customers
SET HHPK = CustomerPK
WHERE HHPK IS NULL
and repeat this until the entire table has been updated.
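If you want to script the repetition, a minimal sketch (assuming HHPK starts out NULL everywhere, as above) is a WHILE loop driven by @@ROWCOUNT:
-- Sketch only: repeat the 4500-row batches until no NULL HHPK rows remain
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (4500) dbo.Customers
    SET HHPK = CustomerPK
    WHERE HHPK IS NULL;
    SET @rows = @@ROWCOUNT;  -- 0 once every row has been updated
END
Under the simple recovery model, the log space used by each completed batch can be reused after a checkpoint, so the log should no longer fill up.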
But really: if you already have an INT IDENTITY column CustomerPK - why do you need a second column to hold the same values? Doesn't make a lot of sense to me ....

Related

Prevent two threads from selecting the same row in IBM Db2

I have a situation where I have multiple (potentially hundreds of) threads repeating the same task (using a Java scheduled executor, if you are curious). This task entails selecting rows of changes (from a table called change) that have not yet been processed (processed changes are tracked in an m:n join table called change_process_rel that records the process id, record id and status), processing them, then updating the status back.
My question is: what is the best way to prevent two threads from the same process from selecting the same row? Will the solution below (using FOR UPDATE to lock rows) work? If not, please suggest a working solution.
Create table change (
-- id, autogenerated pk
-- other fields
)
Create table change_process_rel (
-- change id (pk of change table)
-- process id (pk of process table)
-- status
)
Query I would use is listed below
Select * from
change c
where c.id not in(select changeid from change_process_rel with cs) for update
Please let me know if this would work
You have to "lock" a row which you are going to process somehow. Such "locking" should, of course, work concurrently with a minimum of conflicts / errors.
One way is as follows:
Create table change
(
id int not null generated always as identity
, v varchar(10)
) in userspace1;
insert into change (v) values '1', '2', '3';
Create table change_process_rel
(
id int not null
, pid int not null
, status int not null
) in userspace1;
create unique index change_process_rel1 on change_process_rel(id);
Now you should be able to run the same statement from multiple concurrent sessions:
SELECT ID
FROM NEW TABLE
(
insert into change_process_rel (id, pid, status)
select c.id, mon_get_application_handle(), 1
from change c
where not exists (select 1 from change_process_rel r where r.id = c.id)
fetch first 1 row only
with ur
);
Each such statement inserts 1 or 0 rows into the change_process_rel table, which is used here as a "lock" table. The corresponding ID from change is returned, and you may proceed with processing the corresponding event in the same transaction.
If the transaction completes successfully, the row inserted into change_process_rel is kept, so the corresponding id from change may be considered processed. If the transaction fails, the corresponding "lock" row in change_process_rel disappears, and that row may be processed later by this or another application.
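For illustration, one worker iteration might look like the sketch below (the "done" status value 2 and the parameter marker are hypothetical, not part of the original design):
-- claim a row with the INSERT ... SELECT above; if it returned an ID, process it, then:
UPDATE change_process_rel SET status = 2 WHERE id = ?;  -- ? = the claimed ID, 2 = hypothetical "done" status
COMMIT;   -- keeps the "lock" row, so the change counts as processed
-- on any error instead: ROLLBACK;  -- removes the "lock" row so the change can be claimed again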
The problem with this method is that when both tables become large enough, the sub-select may not run as quickly as before.
Another method is to use "Evaluate uncommitted data through lock deferral".
It requires placing the status column in the change table.
Unfortunately, Db2 for LUW doesn't have SKIP LOCKED functionality, which might help with this sort of algorithm.
If, let's say, status=0 means "not processed" and status<>0 is some processing / processed status, then after setting the DB2_EVALUNCOMMITTED and DB2_SKIP* registry variables and restarting the instance, you may "catch" the next ID for processing with the following statement.
SELECT ID
FROM NEW TABLE
(
update
(
select id, status
from change
where status=0
fetch first 1 row only
)
set status=1
);
Once you get it, you may do further processing of this ID in the same transaction as previously.
It's good to create an index for performance:
create index change1 on change(status);
and maybe declare this table as volatile, or periodically collect distribution statistics on this column in addition to regular statistics on the table and its indexes.
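For example (a sketch only; the MYSCHEMA qualifier is an assumption), the table can be declared volatile and distribution statistics collected through ADMIN_CMD:
alter table change volatile cardinality;
call sysproc.admin_cmd('runstats on table myschema.change with distribution on columns (status) and indexes all');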
Note that setting these registry variables has a global effect, which you should keep in mind...

Update a very large table in PostgreSQL without locking

I have a very large table with 100M rows in which I want to update a column with a value on the basis of another column. The example query to show what I want to do is given below:
UPDATE mytable SET col2 = 'ABCD'
WHERE col1 is not null
This is a master DB in a live environment with multiple slaves, and I want to update it without locking the table or affecting the performance of the live environment. What would be the most effective way to do it? I'm thinking of making a procedure that updates rows in batches of 1000 or 10000 rows using something like LIMIT, but I'm not quite sure how to do it, as I'm not that familiar with Postgres and its pitfalls. Oh, and neither column has an index, but the table has other columns that do.
I would appreciate a sample procedure code.
Thanks.
There is no update without locking, but you can strive to keep the row locks few and short.
You could simply run batches of this:
UPDATE mytable
SET col2 = 'ABCD'
FROM (SELECT id
FROM mytable
WHERE col1 IS NOT NULL
AND col2 IS DISTINCT FROM 'ABCD'
LIMIT 10000) AS part
WHERE mytable.id = part.id;
Just keep repeating that statement until it modifies less than 10000 rows, then you are done.
Note that mass updates don't lock the table, but of course they lock the updated rows, and the more of them you update, the longer the transaction, and the greater the risk of a deadlock.
To make that performant, an index like this would help:
CREATE INDEX ON mytable (col2) WHERE col1 IS NOT NULL;
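If you want to script the repetition, a rough sketch (assuming PostgreSQL 11 or later, so the DO block may COMMIT between batches, and assuming an id primary key as above) could look like this:
DO $$
DECLARE
    rows_updated bigint;
BEGIN
    LOOP
        UPDATE mytable
        SET col2 = 'ABCD'
        FROM (SELECT id
              FROM mytable
              WHERE col1 IS NOT NULL
                AND col2 IS DISTINCT FROM 'ABCD'
              LIMIT 10000) AS part
        WHERE mytable.id = part.id;

        GET DIAGNOSTICS rows_updated = ROW_COUNT;
        COMMIT;  -- release the row locks after each batch
        EXIT WHEN rows_updated < 10000;
    END LOOP;
END;
$$;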
Just an off-the-wall, out-of-the-box idea. Requiring both col1 and col2 to be null to qualify precludes using an index, so perhaps building a pseudo-index might be an option. This 'index' would of course be a regular table, but it would only exist for a short period. Additionally, this relieves the worry about lock time.
create table indexer (mytable_id integer primary key);
insert into indexer(mytable_id)
select mytable_id
from mytable
where col1 is null
and col2 is null;
The above creates our 'index' containing only the qualifying rows. Now wrap an update/delete statement in an SQL function. This function updates the main table, deletes the updated rows from the 'index', and returns the number of rows remaining.
create or replace function set_mytable_col2(rows_to_process_in integer)
returns bigint
language sql
as $$
with idx as
( update mytable
set col2 = 'ABCD'
where col2 is null
and mytable_id in (select mytable_id
from indexer
limit rows_to_process_in
)
returning mytable_id
)
delete from indexer
where mytable_id in (select mytable_id from idx);
select count(*) from indexer;
$$;
When the function returns 0, all rows initially selected have been processed. At this point, repeat the entire process to pick up any rows added or updated that the initial selection didn't identify. That should be a small number, and the process remains available if needed later.
Like I said just an off-the-wall idea.
Edited
I must have read something into it that wasn't there concerning col1. However, the idea remains the same; just change the INSERT statement for 'indexer' to meet your requirements. As for setting it in the 'index': no, the 'index' contains a single column - the primary key of the big table (and of itself).
Yes, you would need to run it multiple times unless you pass the total number of rows to process as the parameter. Below is a DO block that would satisfy your condition. It processes 200,000 rows on each pass; change that to fit your needs.
do $$
declare
    rows_remaining bigint;
begin
    loop
        rows_remaining = set_mytable_col2(200000);
        commit;
        exit when rows_remaining = 0;
    end loop;
end;
$$;

Delete from a table on the basis of indexed columns is taking forever

We have a table having three indexed columns say
column1 of type bigint
column2 of type timestamp without time zone
column3 of type timestamp without time zone
The table has more than 12 crore records, and we are trying to delete all the records older than current date - 45 days using the below query
delete from tableA
where column2 <= '2019-04-15 00:00:00.00'
OR column3 <= '2019-04-15 00:00:00.00';
This is executing for ever and never completes.
Is there any way we can improve the performance of this query?
I also tried dropping the indexes, deleting the data, and recreating the indexes, but this is not working, as I am not able to delete the data even after dropping the indexes.
delete
from tableA
where column2 <= '2019-04-15 00:00:00.00'
OR column3 <= '2019-04-15 00:00:00.00'
I do not want to change the query, but I want Postgres configured through some property so that it is able to delete the records.
See also Best way to delete millions of rows by ID for a good discussion of the issue.
12 crores == 120 million rows?
Deleting from a large indexed table is slow because the index is rebuilt many times during the process. If you can select the rows you want to keep, use them to create a new table, and then drop the old one, the process is much faster. If you do this regularly, use table partitioning and detach a partition when required; it can then be dropped.
1) Check the logs, you are probably suffering from deadlocks.
2) Try creating a new table selecting the data you need, then drop and rename. Use all the columns in your index in the query. DROP TABLE is much faster than DELETE .. FROM
CREATE TABLE new_table AS (
SELECT * FROM old_table WHERE
column1 >= 1 AND column2 >= current_date - 45 AND column3 >= current_date - 45);
DROP TABLE old_table;
ALTER TABLE new_table RENAME TO old_table;
CREATE INDEX ...
3) Create a new table using partitions based on date, with a table for say 15, 30 or 45 days (if you regularly remove data that is 45 days old). See https://www.postgresql.org/docs/10/ddl-partitioning.html for details.
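For point 3, a minimal sketch of declarative range partitioning (PostgreSQL 10+; the new table and partition names are hypothetical, and the table is partitioned on column2 only for simplicity) might look like this:
CREATE TABLE tableA_new (
    column1 bigint,
    column2 timestamp without time zone,
    column3 timestamp without time zone
) PARTITION BY RANGE (column2);

CREATE TABLE tableA_2019_04 PARTITION OF tableA_new
    FOR VALUES FROM ('2019-04-01') TO ('2019-05-01');

-- removing expired data then becomes a cheap metadata operation:
ALTER TABLE tableA_new DETACH PARTITION tableA_2019_04;
DROP TABLE tableA_2019_04;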

Can a heavily indexed table have slower updates even if the updated columns aren't in any of the indexes?

I'm trying to understand why updating a 14 million row table is so slow, even though I'm joining on its primary key and updating in batches (5000 rows).
THIS IS THE QUERY
UPDATE A
SET COL1= B.COL1,
COL2 = B.COL2,
COL3 = 'ALWAYS THE SAME VAL'
FROM TABLE_X A, TABLE_Y B
WHERE A.PK = B.PK
TABLE_X has 14 Million rows
TABLE_X has 12 indexes; however, the updated columns do not belong to any index, so it's not expected that this slowness is caused by having so many indexes, right?
TABLE_Y has 5000 rows
ADDITIONAL INFORMATION
I must update in the order of another column (Group) rather than the PK. If I could update in the order of the PK, it would be much faster.
This is a business need: if they need to stop the process, they want groups to be either fully updated or not updated at all.
What could be causing such slow updates?
Database is SYBASE 15.7

SQLite - a smart way to remove and add new objects

I have a table in my database and I want each row in my table to have a unique id and for the rows to be numbered sequentially.
For example: I have 10 rows, each with an id - starting from 0 and ending at 9. When I remove a row from the table, let's say row number 5, a "hole" occurs. And afterwards I add more data, but the "hole" is still there.
It is important for me to know the exact number of rows and to have data at every row position in order to access my table arbitrarily.
Is there a way in SQLite to do this? Or do I have to manually manage the removing and adding of data?
Thank you in advance,
Ilya.
It may be worth considering whether you really want to do this. Primary keys usually should not change through the lifetime of the row, and you can always find the total number of rows by running:
SELECT COUNT(*) FROM table_name;
That said, the following trigger should "roll down" every ID number whenever a delete creates a hole:
CREATE TRIGGER sequentialize_ids AFTER DELETE ON table_name FOR EACH ROW
BEGIN
UPDATE table_name SET id=id-1 WHERE id > OLD.id;
END;
I tested this on a sample database and it appears to work as advertised. If you have the following table:
id name
1 First
2 Second
3 Third
4 Fourth
And delete where id=2, afterwards the table will be:
id name
1 First
2 Third
3 Fourth
This trigger can take a long time and has very poor scaling properties (it takes longer for each row you delete and each remaining row in the table). On my computer, deleting 15 rows at the beginning of a 1000 row table took 0.26 seconds, but this will certainly be longer on an iPhone.
I strongly suggest that you rethink your design. In my opinion, you're asking for trouble in the future (e.g. if you create another table and want to have relations between the tables).
If you want to know the number of rows just use:
SELECT count(*) FROM table_name;
If you want to access rows in the order of id, just define this field using PRIMARY KEY constraint:
CREATE TABLE test (
id INTEGER PRIMARY KEY,
...
);
and get rows using ORDER BY clause with ASC or DESC:
SELECT * FROM table_name ORDER BY id ASC;
Sqlite creates an index for the primary key field, so this query is fast.
I think that you would be interested in reading about LIMIT and OFFSET clauses.
The best source of information is the SQLite documentation.
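For instance, a small illustration of paging with LIMIT and OFFSET over the test table defined above (the page size of 10 is just an example):
SELECT * FROM test ORDER BY id ASC LIMIT 10 OFFSET 10;  -- returns rows 11-20 in id order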
If you don't want to take Stephen Jennings's very clever but performance-killing approach, just query a little differently. Instead of:
SELECT * FROM mytable WHERE id = ?
Do:
SELECT * FROM mytable ORDER BY id LIMIT 1 OFFSET ?
Note that OFFSET is zero-based, so you may need to subtract 1 from the variable you're indexing in with.
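For example, to fetch the row at 1-based position 5 in id order, you would pass 4 as the offset:
SELECT * FROM mytable ORDER BY id LIMIT 1 OFFSET 4;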
If you want to reclaim deleted row ids, the VACUUM command or pragma may be what you seek:
http://www.sqlite.org/faq.html#q12
http://www.sqlite.org/lang_vacuum.html
http://www.sqlite.org/pragma.html#pragma_auto_vacuum
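A short illustration of both (note: VACUUM may renumber rowids only in tables without an explicit INTEGER PRIMARY KEY, and auto_vacuum must be enabled before any tables are created or be followed by a VACUUM to take effect):
PRAGMA auto_vacuum = FULL;  -- reclaims free pages automatically as data is deleted
VACUUM;                     -- rebuilds the database file and reclaims free space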