Enabling ACCELERATED_DATABASE_RECOVERY leads to incorrect handling of the trigger tables INSERTED and DELETED

On the latest version of SQL Server 2019 I created a test database with the new ACCELERATED_DATABASE_RECOVERY option enabled.
The database contains a single table (4 columns, 127 records) with a trigger, a primary key and 3 indexes (one per column).
The trigger performs an update on the same table (on an arbitrary column in my test; normally it is used to automatically update a modification-date column). A rough sketch of such a trigger follows after the script below.
The problem occurs when I update and delete the data by executing this simple T-SQL TWICE:
begin tran
update DAT_UDC set UDC_CELLA = 0
delete DAT_UDC
rollback
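A trigger along these lines reproduces the described setup (a rough sketch, not the exact trigger from the test database; UDC_LASTMOD and UDC_ID are placeholder column names):
create trigger TRG_DAT_UDC_UPD on DAT_UDC
after update
as
begin
    set nocount on;
    -- update another column of the same table, as if maintaining a
    -- last-modified date column (UDC_LASTMOD and UDC_ID are assumed names)
    update d
    set d.UDC_LASTMOD = getdate()
    from DAT_UDC d
    join inserted i on i.UDC_ID = d.UDC_ID;
end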
The first execution is OK: during the update, the INSERTED and DELETED tables are both populated.
The second execution is NOT OK: during the update, the DELETED table is empty.
Every subsequent execution is NOT OK as well.
I have found a few tricks that make the T-SQL always produce the correct result, but they are not real solutions:
Waiting 10 seconds (the persisted version data is cleaned up automatically)
Running the cleanup manually every time: EXEC sys.sp_persistent_version_cleanup [DATABASE_NAME];
Dropping one of the indexes (or the primary key); maybe some internal memory limit is involved;
Removing the update on the same table from the trigger.
Maybe there is some additional setting needed for ACCELERATED_DATABASE_RECOVERY to work correctly.
Maybe someone can help me.
The test database and the test script are available.
Thanks
Wandalous

Related

Postgres: parallel/efficient load of a huge amount of data with psycopg

I want to load many rows from a CSV file.
The files contain data like: "article_name, article_time, start_time, end_time".
There is a constraint on the table: for the same article name, I don't insert a new row if the new article_time falls in an existing range [start_time, end_time] for that article.
I.e. don't insert row y if there exists a row x with article_name_y = article_name_x and article_time_y inside the range [start_time_x, end_time_x].
I tried with psycopg by selecting the existing article names and checking manually whether there is an overlap --> too slow.
I tried again with psycopg, this time adding an EXCLUDE USING constraint and inserting with "ON CONFLICT DO NOTHING" (so that it does not fail), but it was still too slow.
I tried the same thing, but this time inserting many values per execute() call (psycopg): it got a little better (1M rows processed in almost 10 minutes), but still not fast enough for the amount of data I have (500M+ rows).
I tried to parallelize by running the same script several times on different files, but the timing didn't get any better, I guess because of the locks taken on the table each time we want to write something.
Is there any way to lock only the rows that share the same article_name (and not the whole table)?
Could you please help with any idea to make this parallelizable and/or more time efficient?
Lots of thanks, folks
Your idea with the exclusion constraint and INSERT ... ON CONFLICT is good.
You could improve the speed as follows:
Do it all in a single transaction.
Like Vao Tsun suggested, COPY the data into a staging table first and do the rest with a single SQL statement (a sketch follows after this list).
Remove all indexes except the exclusion constraint from the table where you modify data and re-create them when you are done.
Speed up insertion by disabling autovacuum and raising max_wal_size (or checkpoint_segments on older PostgreSQL versions) while you load the data.
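A rough sketch of that combination (the table name articles, the constraint name, the staging table and the file path are placeholders; btree_gist is needed to mix equality and range operators, and start_time/end_time are assumed to be timestamps):
create extension if not exists btree_gist;

-- no two rows may have the same article_name and overlapping [start_time, end_time] ranges
alter table articles
    add constraint articles_no_overlap
    exclude using gist (article_name with =,
                        tsrange(start_time, end_time) with &&);

begin;

-- load everything into an unconstrained, unindexed staging table first
create temp table staging (like articles including defaults);
copy staging (article_name, article_time, start_time, end_time)
    from '/path/to/file.csv' with (format csv);
-- (with psycopg the COPY can be streamed from the client, e.g. via copy_expert,
--  instead of reading a server-side file)

-- one statement moves the data; conflicting rows are silently skipped
insert into articles (article_name, article_time, start_time, end_time)
select article_name, article_time, start_time, end_time
from staging
on conflict on constraint articles_no_overlap do nothing;

commit;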

PostgreSQL TEMP table alternating between exist and not exist

I'm using PostgreSQL 9.6.2, with Toad client on Mac. Auto-commit is set to ON.
I first created a simple temp table like this:
CREATE TEMP TABLE demo_pairs
AS
WITH t (name, value) AS (VALUES ('a', 'b'), ('c', 'd'))
SELECT * FROM t;
Then something weird happens when I run:
SELECT * FROM demo_pairs;
Every time I run the select (without re-running the create), it alternates between successfully selecting the values and failing with a "table does not exist" error!
Can anyone help me understand what's going on?
https://www.postgresql.org/docs/current/static/sql-createtable.html
TEMPORARY or TEMP
If specified, the table is created as a temporary table. Temporary
tables are automatically dropped at the end of a session, or
optionally at the end of the current transaction (see ON COMMIT
below). Existing permanent tables with the same name are not visible
to the current session while the temporary table exists, unless they
are referenced with schema-qualified names. Any indexes created on a
temporary table are automatically temporary as well.
If you use a session pooler that can close the session for you, or the session is closed for some other reason (e.g. a network problem), the temp table will be dropped.
You can also create the table so that it is dropped at the end of the transaction (a sketch follows after the quoted docs):
ON COMMIT
The behavior of temporary tables at the end of a transaction block can
be controlled using ON COMMIT. The three options are:
PRESERVE ROWS
No special action is taken at the ends of transactions. This is the
default behavior.
DELETE ROWS
All rows in the temporary table will be deleted at the end of each
transaction block. Essentially, an automatic TRUNCATE is done at each
commit.
DROP
The temporary table will be dropped at the end of the current transaction block.
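For example, with the table from the question (a sketch; note that with auto-commit ON, an ON COMMIT DROP table only makes sense inside an explicit transaction block, otherwise it is gone as soon as the statement commits):
drop table if exists demo_pairs;   -- in case the version from above still exists in this session

begin;

create temp table demo_pairs
on commit drop   -- dropped automatically when this transaction ends
as
with t (name, value) as (values ('a', 'b'), ('c', 'd'))
select * from t;

select * from demo_pairs;   -- works inside the same transaction

commit;   -- the temp table no longer exists after this point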

Sending only updated rows to a client

I'd like to create a web service that allows a client to fetch all rows in a table, and then later allows the client to only fetch new or updated rows.
The simplest implementation seems to be to send the current timestamp to the client, and then have the client ask for rows that are newer than the timestamp in the following request.
It seems that this is doable by keeping an "updated_at" column with a timestamp set to NOW() in update and insert triggers, and then querying newer rows, and also passing down the value of NOW().
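For concreteness, the naive setup described above looks something like this (a sketch; big_table and its columns are placeholder names):
alter table big_table
    add column updated_at timestamptz not null default now();

create function touch_updated_at()
returns trigger
language plpgsql
as $$
begin
    -- now() is the transaction start time, which is exactly the weakness discussed below
    new.updated_at := now();
    return new;
end;
$$;

create trigger big_table_touch
before insert or update on big_table
for each row execute procedure touch_updated_at();

-- a client that last synced at some timestamp ts then asks:
-- select * from big_table where updated_at > ts;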
The problem is that if there are uncommitted transactions, these transactions will set updated_at to the start time of the transaction, not the commit time.
As a result, this simple implementation doesn't work, because rows can be lost since they can appear with a timestamp in the past.
I have been unable to find any simple solution to this problem, despite the fact that it seems to be a very common need: any ideas?
Possible solutions:
Keep a monotonic timestamp in a table, update it at the start of every transaction to MAX(NOW(), last_timestamp + 1), and use it as the row timestamp. Problem: this effectively serializes all write transactions and locks the whole database, because they all conflict on the timestamp table.
At the end of the transaction, record a mapping from NOW() to a monotonic value in a table like the one in the previous solution. This seems to require taking an explicit lock and using a sequence to generate non-temporal "timestamps", because just using an UPDATE on a single row would cause rollbacks in SERIALIZABLE mode.
Somehow have PostgreSQL, at commit time, iterate over all updated rows and set updated_at to a monotonic timestamp
Somehow have PostgreSQL itself maintain a table of transaction commit times, which it doesn't seem to do at the moment
Using the built-in xmin column also seems impossible, because VACUUM can trash it.
It would be nice to be able to do this in the database without modifications to all updates in the application.
What is the usual way this is done?
The problem with the naive solution
In case it's not obvious, this is the problem with using NOW() or CLOCK_TIMESTAMP():
At time 1, we call NOW() or CLOCK_TIMESTAMP() in a transaction, which returns 1, and we update a row, setting time 1 as its update time.
At time 2, a client fetches all rows, and we tell it that it now has all rows up to time 2.
At time 3, the transaction commits with "time 1" in the updated_at field.
The client asks for rows updated since time 2 (the time it got from the previous full fetch); we query for updated_at >= 2 and return nothing, instead of returning the row that was just added.
That row is lost and will never be seen by the client.
Your whole proposition goes against some of the underlying fundamentals of an ACID-compliant RDBMS like PostgreSQL. Time of transaction start (e.g. current_timestamp()) and other time-based metrics are meaningless as a measure of what a particular client has received or not. Abandon the whole idea.
Assuming that your clients connect to the database through a persistent session, you can follow this procedure (a sketch follows after these steps):
When the session starts, CREATE a TEMP TABLE for the session user (temporary tables are never WAL-logged, so UNLOGGED is not needed). This table contains nothing but the PK and the last update time of the table you want to fetch the data from.
The client polls for new data and receives only those records whose PK is not yet in the temp table, or whose PK is present but with a newer last update time. Currently uncommitted transactions are invisible, but they will be picked up at the next poll for new or updated records. The update time is required because there is no way to delete records from the temp tables of all concurrent clients.
The PK and last update time of each retrieved record are stored in the temp table.
When the user closes the session, the temp table is dropped.
If you want to persist the retrieved records over multiple sessions for each client, or the client disconnects after every query, then you need a regular table. In that case I would suggest also adding the oid of the user, so that all users can share a single table for keeping track of retrieved records. You can then create an AFTER UPDATE trigger on the table with your data which deletes the PK from the table of fetched records for all users in one sweep; on their next poll the clients will get the updated record.
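A minimal sketch of the per-session bookkeeping described in the steps above (big_table, id and updated_at are placeholder names; ON CONFLICT assumes PostgreSQL 9.5 or later):
-- created once, when the session starts
create temp table fetched (
    id          bigint primary key,
    updated_at  timestamptz not null
);

-- poll: rows the client has never seen, or has only seen in an older version
select b.*
from big_table b
left join fetched f on f.id = b.id
where f.id is null
   or b.updated_at > f.updated_at;

-- remember what was just handed out
insert into fetched (id, updated_at)
select b.id, b.updated_at
from big_table b
left join fetched f on f.id = b.id
where f.id is null
   or b.updated_at > f.updated_at
on conflict (id) do update set updated_at = excluded.updated_at;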
Add a column that will be used to track which records have been sent to a client:
alter table table_under_view
add column access_order int null;
create sequence table_under_view_access_order_seq
owned by table_under_view.access_order;
create function table_under_view_reset_access_order()
returns trigger
language plpgsql
as $func$
begin
  -- clear access_order on every update so the row will be handed out again
  new.access_order := null;
  return new;
end;
$func$;
create trigger table_under_view_reset_access_order_before_update
before update on table_under_view
for each row execute procedure table_under_view_reset_access_order();
create index table_under_view_access_order_idx
on table_under_view (access_order);
create index table_under_view_access_order_where_null_idx
on table_under_view (access_order)
where (access_order is null);
(You could use a before insert on table_under_view trigger too, to ensure only NULL values are inserted into access_order).
You need to update this column after the transactions with INSERTs & UPDATEs on this table have finished, but before any client queries your data. You cannot hook into the moment right after a transaction finishes, so let's do it right before a query happens. You can do this with a function, e.g.:
create function table_under_access(from_access int)
returns setof table_under_view
language sql
as $func$
-- assign fresh access_order values to rows changed since the last call
update table_under_view
set access_order = nextval('table_under_view_access_order_seq'::regclass)
where access_order is null;
-- return everything the caller has not seen yet
select *
from table_under_view
where access_order > from_access;
$func$;
Now, your first "chunk" of data (which fetches all rows of the table) looks like:
select *
from table_under_access(0);
The key point after this is that your client needs to process every "chunk" of data and keep track of the greatest access_order it received (unless you include it in your result, e.g. with window functions; but if you are going to process the results anyway, which seems highly likely, you don't need that). Always pass that value to the subsequent calls.
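For example, if the greatest access_order in the previous chunk was 1000 (a made-up value), the next call would be:
select *
from table_under_access(1000);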
You can add an updated_at column too for ordering your results, if you want to.
You can also use a view + rule(s) for the last part (instead of the function), to make it more transparent.

INSERT statement that does not fire an INSERT trigger

I am using PostgreSQL 9.2 and I need to write an INSERT statement which copies data from table A to table B without firing the INSERT trigger defined on table B (maybe some sort of bulk insertion operation??).
On this specific table (table B) many INSERT, UPDATE and DELETE operations are executed. During each and every one of these executions, a trigger must fire.
I cannot temporarily disable the triggers, because of the standard day-to-day DML operations.
Can anyone help me with the syntax for this non-trigger-firing INSERT statement?
Run your "privileged" inserts as a different user. That way your trigger can check the current user and exit if it shouldn't do anything.
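A sketch of that pattern for the PostgreSQL 9.2 setup in the question (the role name bulk_loader and the table, trigger and function names are placeholders):
create or replace function table_b_ins_trg()
returns trigger
language plpgsql
as $$
begin
    -- skip the normal trigger work for the dedicated bulk-copy role
    if current_user = 'bulk_loader' then
        return new;
    end if;

    -- ... the usual trigger logic for day-to-day DML goes here ...

    return new;
end;
$$;

create trigger table_b_insert
before insert on table_b
for each row execute procedure table_b_ins_trg();

-- then run the privileged copy as that role:
set role bulk_loader;
insert into table_b select * from table_a;
reset role;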

MS-SQL 2000: Turn off logging during stored procedure

Here's my scenario:
I have a simple stored procedure that removes a specific set of rows from a table (we'll say about 30k rows), and then inserts about the same amount of rows. This generally should only take a few seconds; however, the table has a trigger on it that watches for inserts/deletes, and tries to mimic what happened to a linked table on another server.
This process is in turn unbearably slow because of the trigger, and the table is also locked for the duration. So here are my two questions:
I'm guessing a decent part of the slowdown is due to the transaction log. Is there a way for me to specify in my stored procedure that I do not want what's in the procedure to be logged?
Is there a way for me to do my 'DELETE FROM' and 'INSERT INTO' commands without me locking the table during the entire process?
Thanks!
edit - Thanks for the answers; I figured that was the case (not being able to do either of the above), but wanted to make sure. The trigger was created a long time ago and doesn't look very efficient, so it looks like my next step will be to dig into it and find out what's needed and how it can be improved. Thanks!
1) No. Besides, you are not doing a minimally logged operation like TRUNCATE or BULK INSERT anyway.
2) No, how would you prevent corruption otherwise?
I wouldn't automatically assume that the performance problem is due to logging. In fact, it's likely that the trigger is written in such a way that is causing your performance problems. I encourage you to modify your original question and show the code for the trigger.
You can't turn off transactional integrity when modifying the data. You could ignore locks when you select data using select * from table (nolock); however, you need to be very careful and ensure your application can handle doing dirty reads.
It doesn't help with your trigger, but the solution to the locking issue is to perform the transactions in smaller batches.
Instead of
DELETE FROM Table WHERE <Condition>
Do something like
WHILE EXISTS ( SELECT * FROM table WHERE <condition to delete>)
BEGIN
SET ROWCOUNT 1000
DELETE FROM Table WHERE <Condition>
SET ROWCOUNT 0
END
You can temporarily disable the trigger, run your proc, then do whatever the trigger was doing in a more efficient manner.
-- disable trigger
ALTER TABLE [Table] DISABLE TRIGGER [Trigger]
GO
-- execute your proc
EXEC spProc
GO
-- do more stuff to clean up / sync with other server
GO
-- enable trigger
ALTER TABLE [Table] ENABLE TRIGGER [Trigger]
GO