I'm attempting to update a table in a PostgreSQL database that has ~2.3 million rows. We also have an event trigger on this table that is supposed to invoke a microservice to perform further calculations whenever a row is inserted, updated, or deleted.
As expected, the first time I updated the table, this led to the creation of over 2 million pending events. At the rate of a few thousand events cleared an hour, I don't have the option to wait for all events to be processed.
I'm looking to update the data in the table without the event trigger creating any pending events. Things I've tried:
Deleting the event trigger, updating the table, and then re-creating the event trigger. While we didn't have any pending events at first, all of them reappeared as soon as the event trigger was recreated.
Manipulating the table that stores the event logs to manually delete all pending events created in the last 2 days (following the Hasura docs here):
DELETE FROM hdb_catalog.event_invocation_logs
WHERE event_id IN (
SELECT id FROM hdb_catalog.event_log
WHERE trigger_name = 'my_trigger_name'
AND delivered = false
AND created_at > now() - interval '2 days');
The above would only delete a few tens of events each run and then finish, for reasons I don't understand.
Before I try deleting all event logs as a last resort, I was wondering if it's safe to do so:
DELETE FROM hdb_catalog.event_invocation_logs;
DELETE FROM hdb_catalog.event_log;
Any help is appreciated, thanks.
How are you actually executing the update statement which is touching 2.3 million rows? Are you just running it using SQL directly?
If so, you can wrap your statements like so:
SET session_replication_role = replica;
UPDATE table SET thing = 'whatever';
SET session_replication_role = DEFAULT;
Triggers do not execute when the session is in replica mode (unless they were explicitly declared ENABLE REPLICA or ENABLE ALWAYS).
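If you run the update as a single transaction, you can also scope the setting with SET LOCAL so it reverts automatically on COMMIT or ROLLBACK, even if the statement fails partway. A minimal sketch (table and column names are placeholders):

BEGIN;
SET LOCAL session_replication_role = replica; -- reverts automatically at transaction end
UPDATE my_table SET my_column = 'whatever';
COMMIT;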
Related
I have a database where data from different source tables are processed and stored in a materialized view.
I chose to store it as an MV because the query to process this data takes a while - about 3 seconds - and needs to be called all the time.
So, I created a trigger to refresh the MV every time the source table is modified (INSERT, DELETE or UPDATE).
The problem is, it seems the trigger function waits for the materialized view to finish refreshing to return, and I don't want this.
I want the insert operation to return as fast as possible, and the MV to refresh in parallel.
My function:
CREATE OR REPLACE FUNCTION "MOBILIDADE".atualizar_mv_solicitacao()
RETURNS TRIGGER AS
$$
BEGIN
REFRESH MATERIALIZED VIEW CONCURRENTLY "MOBILIDADE"."MV_SOLICITACAO";
RETURN NULL;
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER solicitacao_atualizar_mv_solicitacao
AFTER INSERT OR DELETE OR UPDATE ON "MOBILIDADE"."GESTAOPROJETOS_SOLICITACAO"
FOR EACH STATEMENT
EXECUTE PROCEDURE "MOBILIDADE".atualizar_mv_solicitacao();
When I run an INSERT operation with the trigger function enabled, it takes about 3 seconds to finish, while when I execute it with the trigger disabled it takes only 0.07 seconds.
INSERT INTO "MOBILIDADE"."GESTAOPROJETOS_SOLICITACAO" (documento_tipo,documento_numero,documento_sigla,documento_ano,requerente,solicitacao,data,data_recebimento_semob,categorias,geom,endereco_regiao,endereco_bairro,endereco_logradouro,anexo,created_by,created_at,acao) VALUES('Indicação',12345,'TESTE',2022,'TESTE','TESTE','2022-09-15','2022-09-15','{"Barreiras físicas" , "Pavimentação"}',ST_Transform(ST_SetSRID(ST_MakePoint(-45.888675631640105,-23.236909838714148),4326),4326),'Sul','Bosque dos Eucaliptos','Rua Lima Duarte',false,1,NOW(),1) RETURNING id
This is the wrong way to go about it. If refreshing the materialized view takes a long time and you modify the table often, then you cannot refresh the materialized view on every data change. Even if the refresh ran asynchronously (which is not possible with a trigger), it would still put a lot of load on your system.
Consider alternative solutions:
Refresh the materialized view every five minutes or so (see the sketch after this list).
Don't use a materialized view, but a regular table that contains the aggregates and update that table from a trigger whenever the underlying data change. That will only work if the "materialized view" is simple enough.
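For the first option, a sketch of scheduling the refresh inside the database, assuming the pg_cron extension is available (the job name is arbitrary):

SELECT cron.schedule(
    'refresh-mv-solicitacao', -- arbitrary job name
    '*/5 * * * *',            -- every five minutes
    'REFRESH MATERIALIZED VIEW CONCURRENTLY "MOBILIDADE"."MV_SOLICITACAO"'
);

Any external scheduler (cron calling psql, for example) works just as well.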
I have been using insert/update to update or insert a table in MySQL from SQL Server. The job is set up as a cron job. The job runs every 8 hours. The number of records in the source table is around 400,000. Every 8 hours around 100 records might get updated or inserted.
I run the job in such a way that, at the source level, I only take the rows modified between the last run and the current run.
I have observed that it takes 30 minutes just to update/insert 100 rows.
However, another way was to dump all of the 400,000 records into a file, then truncate the destination table and insert all of those records all over again. This process is done at every job run.
So, now may I know why does insert/update take so much time?
Thanks
Rathi
As you said, you run the job in such a way that, at the source level, you only take the rows modified between the last run and the current run.
So just insert all these modified rows into a temp table.
Take the minimum modified date from the temp table (or use the same criteria you use to extract only modified rows from the source) and delete all the matching rows from the destination table.
Then you can insert all the rows from the temp table into the destination table.
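A sketch of that pattern in MySQL syntax on the destination side, assuming the modified rows from the source have already been loaded into a staging table (all table and column names are hypothetical placeholders):

-- 1. Delete the old versions of the modified rows from the destination
DELETE d FROM destination_table d
JOIN staging_table s ON d.id = s.id;

-- 2. Re-insert the fresh versions
INSERT INTO destination_table
SELECT * FROM staging_table;

This replaces per-row update-or-insert lookups with two set-based statements, which is usually much faster.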
Let me know if you have any question.
Without knowing how your database is configured, it's hard to tell the exact reason, but I'd say the updates are slow because you don't have an index on your target table.
Try adding an index on your insert/update key column, it will speed things up.
Also, are you doing a commit after each insert? If so, disable autocommit, and only commit on success like this: tMysqlOutput -- OnComponentOk -- tMysqlCommit.
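For example, assuming the destination table is matched on an id column (names are hypothetical):

CREATE INDEX idx_destination_id ON destination_table (id);

Without such an index, every update forces a full scan of the ~400,000-row destination table.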
PostgreSQL uses the read committed isolation level by default. Now I have a transaction which consists of a single DELETE statement, and this DELETE statement has a subquery consisting of a SELECT statement for selecting the rows to delete.
Is it true that I have to use FOR UPDATE in the select statement to get no conflicts with other transaction?
My thinking is the following: first the corresponding rows are read from the table, and in a second step these rows are deleted, so another transaction could interfere.
And what about a simple DELETE FROM myTable WHERE id = 4 statement? Do I also have to use FOR UPDATE?
Is it true that I have to use FOR UPDATE in the select statement to get no conflicts with other transaction?
What does "no conflicts with other transaction" mean to you? You can test this by opening two terminals, and executing statements in each of them. Interleaved correctly, the DELETE statement will make the "other transaction" (the one that has its isolation level set to READ COMMITTED) wait until it commits or rolls back.
sandbox=# set transaction isolation level read committed;
SET
sandbox=# select * from customer;
date_of_birth
---------------
1996-09-29
1996-09-28
(2 rows)
sandbox=# begin transaction;
BEGIN
sandbox=# delete from customer
sandbox-# where date_of_birth = '1996-09-28';
DELETE 1
sandbox=# update customer
sandbox-# set date_of_birth = '1900-01-01'
sandbox-# where date_of_birth = '1996-09-28';
(Execution pauses here, waiting for transaction in other terminal.)
sandbox=# commit;
COMMIT
sandbox=#
UPDATE 0
sandbox=#
See below for the documentation.
And what about a simple DELETE FROM myTable WHERE id = 4 statement? Do I also have to use FOR UPDATE?
There's no such statement as DELETE ... FOR UPDATE.
You need to be sensitive to context when you're reading about database updates. Update can mean any change to a database; it can include inserting, deleting, and updating rows. In the docs cited below, "locked as though for update" is explicitly talking about UPDATE and DELETE statements, among others.
Current docs
FOR UPDATE causes the rows retrieved by the SELECT statement to be
locked as though for update. This prevents them from being modified or
deleted by other transactions until the current transaction ends. That
is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE,
SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of
these rows will be blocked until the current transaction ends. The FOR
UPDATE lock mode is also acquired by any DELETE on a row, and also by
an UPDATE that modifies the values on certain columns. Currently, the
set of columns considered for the UPDATE case are those that have a
unique index on them that can be used in a foreign key (so partial
indexes and expressional indexes are not considered), but this may
change in the future. Also, if an UPDATE, DELETE, or SELECT FOR UPDATE
from another transaction has already locked a selected row or rows,
SELECT FOR UPDATE will wait for the other transaction to complete, and
will then lock and return the updated row (or no row, if the row was
deleted).
Short version: the FOR UPDATE in a sub-select is not necessary because the DELETE implementation already does the necessary locking. It would be redundant.
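In other words, a statement like this (hypothetical table and column) already locks each row as it deletes it, so adding FOR UPDATE to the subquery gains nothing:

DELETE FROM my_table
WHERE id IN (SELECT id FROM my_table WHERE status = 'stale');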
Ideally you should read and digest Concurrency Control to learn how the concurrency issues are dealt with by the SQL engine.
Specifically for the case you're mentioning, I think these couple of excerpts are the most relevant, in Read Committed Isolation Level:
UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands
behave the same as SELECT in terms of searching for target rows: they
will only find target rows that were committed as of the command start
time.
However, such a target row might have already been updated (or
deleted or locked) by another concurrent transaction by the time it is
found. In this case, the would-be updater will wait for the first
updating transaction to commit or roll back (if it is still in
progress).
So one of your two concurrent DELETEs will be put to wait as soon as it tries to delete a row that the other one already processed just before. This wait will only end when the other one commits or rolls back. In a way, that means that the engine "detected the conflict" and serialized the two DELETEs in order to deal with it.
If the first updater rolls back, then its effects are
negated and the second updater can proceed with updating the
originally found row. If the first updater commits, the second updater
will ignore the row if the first updater deleted it, otherwise it will
attempt to apply its operation to the updated version of the row.
In your scenario, after the first DELETE has committed and the second one is woken up, the second one will be unable to delete the row it was waiting for, because that row is no longer current; it's gone. That's not an error at this isolation level. The execution will just go on with the other rows, some of which may also have disappeared. Eventually it will report the actual number of rows that were deleted by this statement, which may differ from the number that the sub-select initially found before the statement was put to wait.
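A compact illustration of that sequence (hypothetical table and id), with two sessions running concurrently:

-- Session A: BEGIN; DELETE FROM my_table WHERE id = 4;  -- reports DELETE 1
-- Session B: BEGIN; DELETE FROM my_table WHERE id = 4;  -- blocks behind session A
-- Session A: COMMIT;
-- Session B: resumes; the row is gone, so it reports DELETE 0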
I have a database table 'MyTable' that has a trigger upon update of field 'Status' in it.
Below is dummy code for what I'm trying to do:
MyTable table = new Mytable();
table.setTableId(1);
table.setStatus ("NEW");
em.persist (table); //At this point the trigger did not kick in since this inserted a new record
...
MyTable table2 = em.find(MyTable.class, 1);
table2.setStatus ("NEW");
em.merge(table2); // Even though I'm updating the record with the same status value, I still want the trigger to kick in. However, the trigger is not being activated.
...
MyTable table3 = em.find(MyTable.class, 1);
table3.setStatus ("OLD");
em.merge(table3); // The trigger is activated here since the status is different from the value it had when the row was first inserted.
Long story short, how can I make the changes done to 'table2' trigger an update even though the status is the same?
-Thanks
Use em.flush(); to synchronize the persistence context to the underlying database. Your pending queries will be sent to the database, but you can still do a complete rollback.
JPA does not update objects that have no changes.
You could try changing the field to something else, flushing, then changing it back.
You could also use a JPQL update query to update it.
Depending on your JPA provider you could probably force it to update fields that have not changed, but this would lead to very bad performance.
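As a sketch of the JPQL bulk-update route (entity and field names taken from the dummy code above): a bulk update always issues an UPDATE statement against the database, bypassing dirty checking, so the trigger fires even when the value is unchanged:

UPDATE MyTable t SET t.status = 'NEW' WHERE t.tableId = 1

Note that bulk updates bypass the persistence context, so already-loaded entity instances will not see the change until refreshed.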
Could you please try updating the entity and committing the transaction, without using merge?
Like
em.getTransaction().begin();
MyTable table2 = em.find(MyTable.class, 1);
table2.setStatus ("NEW");
//em.merge(table2)//Even though im updating the record with the same status with the
// same value, i still want the trigger to kick. However the trigger is not being activated.
em.getTransaction().commit();
I have a table in my SQL Server 2008 R2 database, and would like to add a column called LastUpdated, that will automatically be changed every time the row is updated. That way, I can see when each individual row was last updated.
It seems that SQL Server 2008 R2 doesn't have a data type to handle this like earlier versions did, so I'm not sure of the best way to do it. I wondered about using a trigger, but what would happen when the trigger updated the row? Will that fire the trigger again, etc?
To know which row was last updated, you need to create a new column of type DATETIME/DATETIME2 and update it with a trigger. There is no data type that automatically updates itself with date/time information every time the row is updated.
To avoid recursion you can use the UPDATE() function inside the trigger, e.g.
ALTER TRIGGER dbo.SetLastUpdatedBusiness
ON dbo.Businesses
AFTER UPDATE -- not insert!
AS
BEGIN
IF NOT UPDATE(LastUpdated)
BEGIN
UPDATE t
SET t.LastUpdated = CURRENT_TIMESTAMP -- not dbo.LastUpdated!
FROM dbo.Businesses AS t -- not b!
INNER JOIN inserted AS i
ON t.ID = i.ID;
END
END
GO
In modern versions you can trick SQL Server into doing this using temporal tables:
Maintaining LastModified Without Triggers
But this is full of caveats and limitations and was really only making light of multiple other similar posts:
A System-Maintained LastModifiedDate Column
Tracking Row Changes With Temporal Columns
How to add “created” and “updated” timestamps without triggers
Need a datetime column that automatically updates
It's not that easy, unfortunately.
You can add a new DATETIME (or DATETIME2) field to your table, and you can give it a default constraint of GETDATE() - that will set the value when a new row is inserted.
Unfortunately, other than creating an AFTER UPDATE trigger, there is no out-of-the-box way to keep it updated all the time. The trigger per se isn't hard to write, but you'll have to write it for each and every table that should have that feature...
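For the insert side mentioned above, a sketch of the default constraint (reusing the dbo.Businesses table from the earlier answer; the column name is an assumption):

ALTER TABLE dbo.Businesses
ADD LastUpdated DATETIME NOT NULL
CONSTRAINT DF_Businesses_LastUpdated DEFAULT GETDATE();

The AFTER UPDATE trigger shown earlier then keeps the column current on subsequent updates.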