When running a batch UPDATE statement (ie. a statement that updates multiple records at once), do the Inserted and Deleted tables contain the same number of records and are the records in the same order (ie. record at pos 0 of Inserted corresponds to record at pos 0 of Deleted)?
Also...
I have an update statement...
UPDATE table SET column = value WHERE id = idvalue
If originally, the column had a value of 1, and the update sets the column's value = 1 also, both the inserted and deleted tables have one record and the column is flagged as changed (even though it didn't). Is this normal behavior?
Tables have no inherent order. If you wish to correlate rows between inserted and deleted, you're really going to want to have an unchangeable key on the table (such as e.g. a surrogate generated by IDENTITY). Otherwise, there's no way to identify which rows correspond with each other1.
And yes, it's normal for the trigger to fire, and for rows to be present, even when no actual data change occurred.
1Indeed, there are come optimizations where UPDATEs can be decomposed into INSERT and DELETEs, and then re-composed back into UPDATEs which actually "change" different columns - overall, the correct final rows exist in the table, but which columns got updated is actually switched around.
Related
I have a table with roughly 100,000,000 rows. We need to delete around 80,000 of them for a remediation.
In order to prevent downtime, I have a job setup to grab the records that needs to be deleted and then processes the delete in chunks of 100. However, even processing the first 100 is taking forever.
There is no primary ID on this table and the only way I can reliably reference each row is with a unique column called tx which is a varchar(250)` (though the field is never longer than 18-20 characters). I created an index on this row, but still takes roughly 4-6s to select a row.
Seemed likely the varchar was causing the problem, so I wanted to add a new id bigint serial column, but was trying to figure out whether or not doing this would lock the table until it's able to populate all of the ID's.
I know alter table add column is non blocking as long as there is no default value. But does Serial count as a default value?
I couldn't find an answer to this in the documentation. We're on Postgres 12.
Adding a new column with a sequence-generated value will rewrite the table, which will cause down time. With some care, it could be done without down time, but that is complicated and not worth the effort if you already have a varchar column with a unique index on it that does not contain NULL values.
Searching for rows with the existing index should be a matter of milliseconds. If it isn't, that's the problem you have to solve. Can you add EXPLAIN (ANALYZE, BUFFERS) output for the query to the question?
I have a non-empty PostgreSQL table with a GENERATED ALWAYS AS IDENTITY column id. I do a bulk insert with the C++ binding pqxx::stream_to, which I'm assuming uses COPY FROM. My problem is that I want to know the ids of the newly created rows, but COPY FROM has no RETURNING clause. I see several possible solutions, but I'm not sure if any of them is good, or which one is the least bad:
Provide the ids manually through COPY FROM, taking care to give the values which the identity sequence would have provided, then afterwards synchronize the sequence with setval(...).
First stream the data to a temp-table with a custom index column for ordering. Then do something likeINSERT INTO foo (col1, col2)
SELECT ttFoo.col1, ttFoo.col2 FROM ttFoo
ORDER BY ttFoo.idx RETURNING foo.id
and depend on the fact that the identity sequence produces ascending numbers to correlate them with ttFoo.idx (I cannot do RETURNING ttFoo.idx too because only the inserted row is available for that which doesn't contain idx)
Query the current value of the identity sequence prior to insertion, then check afterwards which rows are new.
I would assume that this is a common situation, yet I don't see an obviously correct solution. What do you recommend?
You can find out which rows have been affected by your current transaction using the system columns. The xmin column contains the ID of the inserting transaction, so to return the id values you just copied, you could:
BEGIN;
COPY foo(col1,col2) FROM STDIN;
SELECT id FROM foo
WHERE xmin::text = (txid_current() % (2^32)::bigint)::text
ORDER BY id;
COMMIT;
The WHERE clause comes from this answer, which explains the reasoning behind it.
I don't think there's any way to optimise this with an index, so it might be too slow on a large table. If so, I think your second option would be the way to go, i.e. stream into a temp table and INSERT ... RETURNING.
I think you can create id with type is uuid.
The first step, you should random your ids after that bulk insert them, by this way your will not need to return ids from database.
I'm having a trouble finding an option to keep an auto-increment function that follows the column number/ID without having a whole complicated process. Is there a data-type like serial/identity that keeps track of the column ID, but also tracks it when it deletes it?
Here's what happens when I delete values (20) from a table and the ID doesn't match the row number anymore.
I am working with a little website crawler program. I use PostgresQL to store data and use such statement to update that,
INSERT INTO topic (......) VALUES (......)
ON CONFLICT DO UPDATE /* updagte all fields here */
The question is if all fields before upate and after update are really equals, would PostgresQL really update that?
Postgres (like nearly all other DBMS) will not check if the target values are different then the original ones. So the answer is: yes, it will update the row even if the values are different.
However, you can easily prevent the "empty" update in this case by including a where clause:
INSERT INTO topic (......)
VALUES (......)
ON CONFLICT (...)
DO UPDATE
set ... -- update all column
WHERE topic IS DISTINCT FROM excluded;
The where clause will prevent updating a row that is identical to the one that is being inserted. To make that work correctly your insert has to list all columns of the target tables. Otherwise the topic is distinct from excluded condition will always be true because the excluded row has fewer columns then the topic row and thus it id "distinct" from it.
Adding a check for modified values has been discussed multiple times on the mailing list and has always be discarded. The main reason being, that it doesn't make sense to have the overhead of checking for changes for every statement just to cope with a few badly written ones.
I am getting a duplicate key error, DB2 SQL Error: SQLCODE=-803, SQLSTATE=23505, when I try to INSERT records. The primary key is one column, INTEGER 4, Generated, and it is the first column.
the insert looks like this: INSERT INTO SCHEMA.TABLE1 values (DEFAULT, ?, ?, ...)
It's my understanding that using the value DEFAULT will just let DB2 auto-generate the key at the time of insert, which is what I want. This works most of the time, but sometimes/randomly I get the duplicate key error. Thoughts?
More specifically, I'm running against DB2 9.7.0.3, using Scriptella to copy a bunch of records from one database to another. Sometimes I can process a bunch with no problems, other times I'll get the error right away, other times after 2 records, or 20 records, or 30 records, etc. Does not seem to be a pattern, nor is it the same record every time. If I change the data to copy 1 record instead of a bunch, sometimes I'll get the error one time, then it's fine the next time.
I thought maybe some other process was inserting records during my batch program, and creating keys at the same time. However, the tables I'm copying TO should not have any other users/processes trying to INSERT records during this same time frame, although there could be READS happening.
Edit: adding create info:
Create table SCHEMA.TABLE1 (
SYSTEM_USER_KEY INTEGER NOT NULL
generated by default as identity (start with 1 increment by 1 cache 20),
COL2...,
)
alter table SCHEMA.TABLE1
add constraint SYSTEM_USER_SYSTEM_USER_KEY_IDX
Primary Key (SYSTEM_USER_KEY);
You most likely have records in your table with IDs that are bigger then the next value in your identity sequence. To find out what the current value your sequence is about at, run the following query.
select s.nextcachefirstvalue-s.cache, s.nextcachefirstvalue-s.increment
from syscat.COLIDENTATTRIBUTES as a inner join syscat.sequences as s on a.seqid=s.seqid
where a.tabschema='SCHEMA'
and a.TABNAME='TABLE1'
and a.COLNAME='SYSTEM_USER_KEY'
So basically what happened is that somehow you got records in your table with ids that are bigger then the current last value of your identity sequence. So sooner or later these ids will collide with identity generated ids.
There are different reasons on how this could have happened. One possibility is that data was loaded which already contained values for the id column or that records were inserted with an actual value for the ID. Another option is that the identity sequence was reset to start at a lower value than the max id in the table.
Whatever the cause, you may also want the fix:
SELECT MAX(<primary_key_column>) FROM onsite.forms;
ALTER TABLE <table> ALTER COLUMN <primary_key_column> RESTART WITH <number from previous query + 1>;