Duplicate records in Debezium - PostgreSQL

I'm a beginner with Debezium. I just created a Maven project for a Debezium integration, with two Java classes: one for the configuration and one for the operations (CREATE, DELETE, UPDATE).
Step 1: In the configuration class I added a few database-related properties such as host name, port, database name, etc., as shown in the uploaded image (1.1).
I insert one record (id 1000) into my table; Debezium reads that record and I insert it into another table, ABC.
Then I shut the application down (Debezium stopped) and insert a record that had not been inserted before. After adding snapshot.mode=never this works fine and the record (id 1001) is inserted.
Here is my issue: the new data is inserted along with the previous records. When Debezium is shut down and then restarted, both ids 1001 and 1000 are inserted into the ABC table. If I shut the application down again without inserting anything and then restart it, ids 1000 and 1001 are inserted yet again; on every restart those two records are re-inserted without any new operation.
I don't want the previously processed records; I only need however many records were inserted while Debezium was down.
Could you please tell me what properties I need to add to resolve this issue?
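
This behaviour typically means the connector's offsets are not being persisted between runs, so every restart replays from the same position. A minimal sketch of the relevant embedded-engine properties, assuming file-based offset storage (the class name and settings are standard Debezium/Kafka Connect ones; the file path is a placeholder, not a value from the question):

# Persist offsets to disk so a restart resumes from the last
# committed position instead of replaying already-processed events.
offset.storage=org.apache.kafka.connect.storage.FileOffsetBackingStore
offset.storage.file.filename=/var/debezium/offsets.dat
offset.flush.interval.ms=60000
# Do not snapshot existing rows on startup; stream only new changes.
snapshot.mode=never

If the offset file survives restarts, Debezium should emit only the changes made while the application was down.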

Related

SSIS deadlock because of the need for parallel updates inside the same data flow task

I am creating a data flow task which will extract data from a source table and update a destination table as follows:
1) Use the unique id in the source record to find the record to update in the destination table.
2) If the id does not exist in the destination table, check whether the email of the source record exists in the destination table instead.
a) If the email exists, update the destination record through the email. Also update the unique id of that destination record.
b) If the email does not exist, insert a new record into the destination table.
So, in simple words, I am creating a task that updates a table on its unique id; if there is no match, it attempts to update on email, and if there is still no match, it inserts a new record.
This means that I will have two updates running in parallel, as you can see in the image (the two circled components will be running in parallel).
[image: SSIS data flow task with the two update components circled]
Now, this generates a deadlock issue because of those two updates.
I have tried using WITH (NOLOCK), but that hint is for reading data, not updating it. I have also looked for delay tasks to hold back one of the two data pipelines until the other is finished.
Any ideas? Could I maybe design my data flow task differently in order to avoid having multiple parallel updates in the first place?
Any help will be greatly appreciated.
With these types of flows I always work with a work table (destination table id, work type ('U' or 'I'), ...). In a first step I fill that table with the work that needs to be done; then I apply the work, as sketched below.
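
A T-SQL sketch of that pattern, using hypothetical tables src, dest, and work_queue with illustrative columns (none of these names come from the question):

-- Step 1: decide the work in set-based passes and record it.
-- Matched on id: a plain update.
INSERT INTO work_queue (src_id, dest_id, work_type)
SELECT s.id, d.id, 'U'
FROM src AS s
JOIN dest AS d ON d.id = s.id;

-- No id match, but matched on email: still an update; the
-- destination id gets realigned in step 2.
INSERT INTO work_queue (src_id, dest_id, work_type)
SELECT s.id, d.id, 'U'
FROM src AS s
JOIN dest AS d ON d.email = s.email
WHERE NOT EXISTS (SELECT 1 FROM dest AS x WHERE x.id = s.id);

-- No match at all: an insert.
INSERT INTO work_queue (src_id, dest_id, work_type)
SELECT s.id, NULL, 'I'
FROM src AS s
WHERE NOT EXISTS
      (SELECT 1 FROM dest AS x WHERE x.id = s.id OR x.email = s.email);

-- Step 2: apply the work as sequential statements, so the two
-- update paths can never deadlock each other.
UPDATE d
SET d.id = s.id,        -- fixes the unique id for email-only matches
    d.email = s.email
FROM dest AS d
JOIN work_queue AS w ON w.dest_id = d.id AND w.work_type = 'U'
JOIN src AS s ON s.id = w.src_id;

INSERT INTO dest (id, email)
SELECT s.id, s.email
FROM src AS s
JOIN work_queue AS w ON w.src_id = s.id AND w.work_type = 'I';

Because the apply phase runs as sequential statements rather than two parallel SSIS components, the deadlock between the updates goes away.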

Delete journal entries counter data in Odoo 12

I needed to move an old database with all of its settings from one server to another, and then remove the accounting test data. I was able to delete the journal items, journal entries, and invoices, but I couldn't make the counter shown next to the journal entries equal zero.
I need to set up the Odoo system for production. I don't want to create a new database, as that would require a lot of data entry.
You can run the REINDEX maintenance operation on your database, or reset the journal entries table's id sequence directly with the following SQL query:
ALTER SEQUENCE account_move_id_seq RESTART WITH 1;
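
If any rows are left in account_move, restarting the sequence at 1 would cause duplicate-key errors on the next insert. A safer variant (a sketch using PostgreSQL's setval with the standard Odoo table and sequence names) restarts just past the highest remaining id:

-- Make the next nextval() return MAX(id) + 1 (or 1 on an empty table).
SELECT setval('account_move_id_seq',
              COALESCE((SELECT MAX(id) FROM account_move), 0) + 1,
              false);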

iSQLOutput - Update only Selected columns

My flow is simple: I am just reading a raw file into a SQL table.
At times the raw file contains data corresponding to existing records. In that case I do not want to insert a new record, only update the existing record in the SQL table. The challenge is that there is a 'record creation date' column which I initialize at the time of record creation, and the update operation overwrites that column too. I just want to avoid overwriting that column while updating the other columns with the information coming from the raw file.
So far I have no idea how to do that. Could someone make a recommendation?
I defaulted the creation column to auto-populate in the SQL database itself, and I changed my flow to update only the remaining columns. The Talend job now does not touch that column. Problem solved.
Yet another reminder that simplification is underrated. :)
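
A minimal sketch of that approach on a hypothetical table (the names and columns are illustrative, not from the question):

-- The DEFAULT makes the database stamp the creation date on INSERT,
-- so the load job never has to supply or preserve it.
CREATE TABLE target_table (
    id            INT PRIMARY KEY,
    payload       VARCHAR(200),
    creation_date DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- The update path simply omits creation_date from its SET list.
UPDATE target_table
SET payload = 'new value from the raw file'
WHERE id = 42;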

CDC multiple insert/delete of the same identity value

I have a table T with an ID column set as identity and primary key. I enabled CDC on the table and later added an XML field that I didn't care about capturing, so I did nothing further (I did not recreate the capture table or migrate the old capture data).
I now have a stored procedure that (among other things) updates only the newly created field (no other field) in table T. I notice that instead of recording an update (operation=3 followed by operation=4), CDC records a delete (operation=1) followed by an insert (operation=2), and all fields are the same (of course, since none of the captured fields was updated).
I actually noticed this because the same identity value appeared as inserted and/or deleted more than once, which should not be possible (unless IDENTITY_INSERT is ON, which it is not).
Why does CDC record operation=1 instead of 3 and operation=2 instead of 4?
Is this documented anywhere, or is it a bug?
The reason you are seeing a delete/insert pair (operation codes 1/2) as opposed to an update pair (3/4) is that you are updating a set of data that also has a unique constraint on your column.
For SQL Server to apply this without violating the unique constraint, it deletes the row and reinserts it (with the "update").
More information on this: it is not an issue or a defect; it is the way SQL Server works, and CDC innocently logs it as it sees it. Remember, CDC is just a subscriber and replicates things as they happen.
If you need to see an update, you may have to look for the 1/2 pair and not only operation codes 3/4.
Some great articles:
Bounded Update is the term used to describe certain types of UPDATE statements from the publisher that will replicate as DELETE/INSERT pairs on the subscriber. We perform a bounded update for every set-based update that changes a column that is part of a unique index or constraint. In other words, if an UPDATE statement touches more than one row and modifies a column that has any UNIQUE constraints, the UPDATE statement is sent to the subscriber as a DELETE/INSERT pair ... read more here:
https://support.microsoft.com/en-us/kb/238254
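
To treat those pairs as updates when reading the change table, you can match the delete and insert halves by transaction and key. A sketch, assuming a capture instance named dbo_T and a key column named ID (both hypothetical; the fn_cdc_* helpers are the standard CDC functions):

DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_T');
DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

-- A delete (1) and an insert (2) for the same key inside the same
-- transaction is a bounded update: report the pair as one change.
SELECT d.ID, d.__$start_lsn
FROM cdc.fn_cdc_get_all_changes_dbo_T(@from_lsn, @to_lsn, N'all') AS d
JOIN cdc.fn_cdc_get_all_changes_dbo_T(@from_lsn, @to_lsn, N'all') AS i
  ON i.ID = d.ID
 AND i.__$start_lsn = d.__$start_lsn
WHERE d.__$operation = 1   -- the delete half
  AND i.__$operation = 2;  -- the insert half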

Select new or updated records from a table in PostgreSQL

Is it possible to get all the new or updated records from one table in PostgreSQL for a specified date?
Something like this:

SELECT * FROM anyTable WHERE dt_insert = '2015-01-01' OR dt_update = '2015-01-01';

Thanks.
You can only do this if you have added a trigger-maintained field that keeps track of the last change time.
There is no internal row timestamp in PostgreSQL, so in the absence of a trigger-maintained timestamp for the row there's no way to find rows changed or added after a certain time.
PostgreSQL does have internal information on the transaction ID that wrote a row, stored in the xmin hidden column. There's no record of which transaction ID committed when, though, until PostgreSQL 9.5, which keeps track of this if and only if the track_commit_timestamp setting is turned on. Additionally, PostgreSQL eventually clears the creator transaction ID information from a tuple because it re-uses transaction IDs, so this only works for fairly recent transactions.
In other words: it's kind-of possible in a rough way if you understand the innards of the database, but it should really only be used for forensic and recovery purposes.
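
The trigger-maintained approach, by contrast, is straightforward. A minimal sketch, reusing the hypothetical anyTable and the dt_insert/dt_update columns from the question:

-- dt_insert is stamped by its DEFAULT when the row is created.
ALTER TABLE anyTable
    ADD COLUMN dt_insert timestamptz NOT NULL DEFAULT now(),
    ADD COLUMN dt_update timestamptz NOT NULL DEFAULT now();

-- dt_update is refreshed by a trigger on every UPDATE.
CREATE OR REPLACE FUNCTION touch_dt_update() RETURNS trigger AS $$
BEGIN
    NEW.dt_update := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_touch_dt_update
BEFORE UPDATE ON anyTable
FOR EACH ROW EXECUTE PROCEDURE touch_dt_update();

-- With those in place, the query from the question works:
SELECT * FROM anyTable
WHERE dt_insert >= '2015-01-01' OR dt_update >= '2015-01-01';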