CDC multiple insert/delete of the same identity value - sql-server-2008-r2

I have a table T that contains an ID column defined as identity and primary key. I enabled CDC on the table and later added an XML field that I didn't care about capturing, so I did nothing further (to recreate the capture table and/or migrate old capture data).
I now have a stored procedure that (among other things) updates only the newly created field (no other field) in table T. I noticed that instead of recording an update (operation=3 followed by operation=4), CDC records a delete (operation=1) followed by an insert (operation=2), with all fields identical (naturally, since none of them was updated).
I actually noticed this because the same identity value appeared as inserted and/or deleted more than once, which should not be possible (unless identity_insert is on, which it is not).
Why does CDC record operation=1 instead of 3 and operation=2 instead of 4?
Is this documented anywhere or is it a bug?

The reason you are seeing a delete/insert pair (operation numbers 1/2) as opposed to an update pair (3/4) is that you are updating a "set" of data that ALSO has a unique constraint on your column.
For SQL Server to make sense of this without violating the unique constraint, it deletes the row and reinserts it (with the "update").
To expand on this: it is not an issue or a defect; it is the way SQL Server works, and CDC innocently logs it as it sees it. Remember, CDC is just a subscriber and replicates things as they happen.
If you need to see an update, you may have to look for the 1/2 "pair" and not ONLY the operation codes 3/4.
Some great articles:
Bounded Update is the term used to describe certain types of UPDATE statements from the publisher that will replicate as DELETE/INSERT pairs on the subscriber. We perform a bounded update for every set-based update that changes a column that is part of a unique index or constraint. In other words, if an UPDATE statement touches more than one row and modifies a column that has any UNIQUE constraint, the UPDATE statement is sent to the subscriber as a DELETE/INSERT pair ... read more here:
https://support.microsoft.com/en-us/kb/238254
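For illustration, here is a minimal T-SQL sketch of how you might surface those delete/insert pairs when reading the change data. The capture instance name dbo_T is an assumption; substitute the one CDC generated for your table:

DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_T');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

-- An identity value that shows up as both a delete (1) and an insert (2)
-- under the same __$start_lsn was written by a single statement, i.e. a
-- bounded update rather than a true delete followed by a true insert.
SELECT __$start_lsn, ID,
       CASE WHEN MIN(__$operation) = 1 AND MAX(__$operation) = 2
            THEN 'update (logged as delete/insert pair)'
            ELSE 'plain insert or delete'
       END AS change_kind
FROM cdc.fn_cdc_get_all_changes_dbo_T(@from_lsn, @to_lsn, 'all')
WHERE __$operation IN (1, 2)
GROUP BY __$start_lsn, ID;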

Related

How can I do a read-create transaction with Npgsql?

In an ASP.NET Core 6 website which uses EF Core (Npgsql) I need to do the following scenario.
There is a table which holds users' data of some type. Each row of this table (per user) has a calculated column which uses the previous row's value. This opens up a race condition. I need to read the latest row (for a user), then create the new row and insert it into the DB. During this time I need assurance that the row being created is the last row and that no two rows are created at the same time.
How can I achieve this with EF Core?
I have read about PostgreSQL's table locks, which are not supported in EF and slow down the app anyway (not recommended). Another thing I read about is transactions, but every example I've seen uses them for updating or creating multiple rows; I don't know if this scenario can be handled with transactions.
One simple way to model this is to have a composite key consisting of the user ID and a user row ID, which the application increments client-side. In this model, your application simply loads the latest row and then attempts to insert a new row for that user with an incremented value. If two threads attempt to do this concurrently, one will get a unique index violation and can retry.
So let's assume there's user ID 4, and their latest row has user row ID 10. Your code loads that, and then attempts to insert a new row with user ID 4 and user row ID 11. Since there's a composite key defined over the two columns, only one such insert can succeed.
This should work just fine in EF.
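As a rough sketch of what that looks like at the SQL level (the table and column names here are made up for illustration):

CREATE TABLE user_rows (
    user_id     integer NOT NULL,
    user_row_id integer NOT NULL,   -- incremented client-side
    calculated  numeric NOT NULL,   -- computed by the app from the previous row
    PRIMARY KEY (user_id, user_row_id)  -- the composite key guards the race
);

-- Load the latest row for user 4 ...
SELECT user_row_id, calculated
FROM user_rows
WHERE user_id = 4
ORDER BY user_row_id DESC
LIMIT 1;

-- ... then attempt the insert with the incremented row id. If another
-- connection won the race, PostgreSQL raises a unique violation
-- (SQLSTATE 23505), which the application catches before retrying.
INSERT INTO user_rows (user_id, user_row_id, calculated)
VALUES (4, 11, 123.45);

In EF Core, that failure surfaces as a DbUpdateException wrapping a PostgresException with SQLSTATE 23505, which is the signal to reload the latest row and retry.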

automatically change column values to lower case while inserting

I have a table in DB2 which has a varchar column. I want to insert only lower-case strings into the column.
Is it possible to change the case to lower whenever an insertion happens in that column? What would the ALTER
query for that be, if it is possible?
I cannot make another column that references my current column, e.g. defined like ucase(Current_column).
The means to ensure that data inserted into a column is lower-cased, i.e. "to change the case to lower whenever an insertion happens in that column", is a TRIGGER.
Presumably, much like in Why is my "before update" trigger changing unexpected columns?, and given the OP's follow-up comment that "I tried making a BEFORE INSERT", a TRIGGER similar to the following was apparently implemented in that attempt:
-- Note: only NEW is referenced; there is no OLD row on an INSERT.
CREATE TRIGGER TOLOWER_BI BEFORE INSERT ON USERS
REFERENCING NEW AS N FOR EACH ROW MODE DB2ROW
SET N.LoginId = LCASE(N.LoginId)
If so, then "the application is not picking the trigger" [also from a followup comment to the OP] must be explained, because a TRIGGER is in effect at the database layer, such that an application has no choice [no picking] with regard to the effects of the trigger being enforced.

Avoid duplicate inserts without unique constraint in target table?

Source & target tables are similar.
Target table has a UUID field that is computed in tMap; however, the flow should not insert duplicate persons into the target, i.e. unique (firstname,lastname,dob,gender). I tried marking those columns as keys in tMap as in the screenshot below, but that does not prevent duplicate inserts. How can I avoid duplicate inserts without adding a unique constraint on the target?
I also tried "using field" in target.
Edit: Solution as suggested below:
The CDC components in the paid version of Talend Studio for Data Integration undoubtedly address this.
In Open Studio, you can roll your own change data capture based on the composite unique key (firstname,lastname,dob,gender), as sketched in the steps and SQL below:
Use tUniqRow on the data coming from stage_geno_patients, unique on the following columns: firstname, lastname, dob, gender
Feed that into a tMap
Add another query as an input to the tMap, to perform look-ups against the table behind "patients_test" and find a match on firstname, lastname, dob, gender. That lookup should use "Reload at each row", looking up against values from the staging row
In the case of no match, detect it and then insert the staging row of data into the table behind "patients_test"
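For reference, the no-match insert the last two steps implement is equivalent to the following SQL. The table names come from the question; the column names, and the assumption that the UUID is already populated upstream, are illustrative:

INSERT INTO patients_test (uuid, firstname, lastname, dob, gender)
SELECT s.uuid, s.firstname, s.lastname, s.dob, s.gender
FROM stage_geno_patients s
WHERE NOT EXISTS (
    -- skip staging rows that already match on the composite key
    SELECT 1
    FROM patients_test p
    WHERE p.firstname = s.firstname
      AND p.lastname  = s.lastname
      AND p.dob       = s.dob
      AND p.gender    = s.gender
);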
Q: Are you going to update information, also? Or, is the goal only to perform unique inserts where the data is not already present?

Is it relevant to put "version" on a separate sql server table?

I have a table with several fields. This table almost never changes except for one field, "version", which changes very often.
Would it be relevant to put that single field into a separate table in order to reduce how often locks are taken on the main table?
For instance I have a table tType and a table tEntry.
Whenever I add/delete/update any row of tEntry, I need to update the "version" field of tType. There might be thousands of rows inside tEntry for a single referenced tType row, meaning the version number could change very often, even though the other data in tType (such as name, id, etc.) doesn't change.
Your reference to tType and tEntry sounds like you are implementing a key-value store in an RDBMS. There are several discussions you can google on this topic. On the web there seems to be a consensus that the cons outweigh the pros. An option would be to look at key-value stores, NoSQL, multi-column DBs, etc. (see Wikipedia)...
The next "anti-pattern" I recognized is that you are mixing transactional data with master data in the table tType. Try to avoid this, even if your selects get less comfortable and need more tuning. Keep the version info out of tType if it changes extremely often. Look here to get the concept: MySQL JOIN the most recent row only?

Insert record in table if it does not exist in iPhone app

I am obtaining a JSON array from a URL and inserting the data into a table. Since the contents of the URL are subject to change, I want to make a second connection to the URL, check for updates, and insert new records into my table using sqlite3.
The issues that I face are:
1) My table doesn't have a primary key
2) The URL lists the changes on the same day. Hence, if I run my app multiple times, I get duplicate entries when I insert values into my database. I want a check that keeps duplicate entries for the same day from accumulating. The problem could be solved by adding a constraint, but since the URL itself has duplicated values, I find that difficult.
The only way I can see to do it, if you have no primary key or anything unique to each record, is to go through the new entries as they arrive and, for each one, check whether the exact same data already exists in the database. If it doesn't, you add it; if it does, you skip over it.
You could even create a unique key yourself for each entry as a concatenation of every column of the table. That way you can quickly check whether the entry already exists in the database.
I see two possibilities depending on your setup:
You have a column set up as UNIQUE (this can be through a PRIMARY KEY or not). In this case, you can use the ON CONFLICT clause:
http://www.sqlite.org/lang_conflict.html
If you find this construct a little confusing, you can instead use "INSERT OR REPLACE" or "INSERT OR IGNORE" as described here:
http://www.sqlite.org/lang_insert.html
You do not have a column set up as UNIQUE. In this case, you will need to SELECT first to check for duplicate data, and based on the result INSERT, UPDATE, or do nothing.
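A minimal SQLite sketch of the first option, combined with the synthetic concatenated key suggested above (the table and column names are made up):

CREATE TABLE IF NOT EXISTS entries (
    entry_key TEXT UNIQUE,  -- e.g. a concatenation of every column
    title     TEXT,
    body      TEXT
);

-- Re-running this is harmless: rows whose entry_key already exists
-- are silently skipped instead of creating duplicates.
INSERT OR IGNORE INTO entries (entry_key, title, body)
VALUES ('2012-05-01|Title|Body', 'Title', 'Body');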
A more common & robust way to handle this is to associate a timestamp with each data item on the server. When your app interrogates the server it provides the timestamp corresponding to the last time it synced. The server then queries its database and returns all values that are timestamped later than the timestamp provided by the app. Then it also returns a new timestamp value for the app to store, to use on the next sync.