Found DDL change for a table in CDC management console - DB2

Our target DB is DB2 and the source is Oracle. We found DDL changes in the CDC management console, and I need to get the instance back into proper running condition.

Paul Vernon's answer assumes that what you are looking for is how to replicate DDL changes. I will assume that you don't want to replicate DDL changes, but just want to restart the subscription after minor layout changes (for example, after a column size has been increased, or after a change to a column you are not going to replicate).
If that is the case, right-click the specific table mapping in your subscription and update the table definition. I am not sure, but I think that after that you have to refresh the entire subscription. If the tables are very large, you will want to avoid refreshing them all, but that's another question.
Of course, if a column has been added in the table change and you want to deal with it, you can edit the column mapping and make the specific assignment you want for that column.
I hope this helps.

Related

How to actually delete files in Iceberg

I know that in Apache Iceberg I can set limits on the number and age of snapshots, and that "deleting" data from the table does not result in underlying data removal; it simply masks or deletes tracking information.
I would like to actually delete the underlying files on delete, however. I know this will make time-travel inconsistent, but it is still a business requirement.
https://iceberg.apache.org/docs/latest/configuration/
As best as I can tell, I'll have to track and manage the physical life-cycle of every file independently. Am I missing something?
If you don't care about table history (or time travel), you can simply call the expire_snapshots procedure after each delete.
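In Spark SQL, a call could look roughly like this (a minimal sketch; the catalog name my_catalog and table db.sample are hypothetical placeholders):
-- Expire snapshots older than a cutoff, keeping at least the most recent one
CALL my_catalog.system.expire_snapshots(
  table => 'db.sample',
  older_than => TIMESTAMP '2023-06-30 00:00:00',
  retain_last => 1
);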
This is a common question for many Iceberg users.
We often need an asynchronous task to expire snapshots and delete data.
If you use Spark, you can use https://iceberg.apache.org/docs/latest/spark-procedures/#expire_snapshots, as shay said.
You can also do this using the Java API provided by Iceberg: https://iceberg.apache.org/docs/latest/api/.
Starting a task for each table is difficult to manage, and tables often have different TTLs. In this case, you can add custom configuration properties to a table, periodically scan all Iceberg tables, and then decide whether to delete expired snapshots and data based on those properties.
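As a sketch of that idea, a per-table TTL could be stored as a custom table property that your scanning task reads back (the property name custom.snapshot-ttl-days is made up here, not an Iceberg setting):
-- Spark SQL: tag the table with a custom retention hint for the async task
ALTER TABLE my_catalog.db.sample
SET TBLPROPERTIES ('custom.snapshot-ttl-days' = '7');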
If you are using Iceberg with Hive (version 4.0.0-alpha-2 or later), you can try the expire_snapshots command in Beeline, like this:
ALTER TABLE test_table EXECUTE expire_snapshots('2021-12-09 05:39:18.689000000');
Further reading:
https://docs.cloudera.com/cdw-runtime/cloud/iceberg-how-to/topics/iceberg-expiring-snapshots.html
The Hive JIRA that added support:
https://issues.apache.org/jira/browse/HIVE-26354

PostgreSQL logical replication - ignore pre-existing data

Imagine dropping a subscription and recreating it from scratch. Is it possible to ignore existing data during the first synchronization?
Creating a subscription with (copy_data=false) is not an option because I do want to copy data, I just don't want to copy already existing data.
Example: There is a users table and a corresponding publication on the master. This table has 1 million rows and every minute a new row is added. Then we drop the subscription for a day.
If we recreate the subscription with (copy_data=true), replication will not start due to a conflict with already existing data. If we specify (copy_data=false), 1440 new rows will be missing. How can we synchronize the publisher and the subscriber properly?
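For reference, this is the statement in question; a minimal sketch with hypothetical names mysub and mypub:
-- copy_data = false skips the initial table synchronization entirely;
-- only changes from this point on are replicated
CREATE SUBSCRIPTION mysub
  CONNECTION 'host=master dbname=mydb'
  PUBLICATION mypub
  WITH (copy_data = false);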
You cannot do that, because PostgreSQL has no way of telling when the data were added.
You'd have to reconcile the tables by hand (or INSERT ... ON CONFLICT DO NOTHING).
Unfortunately, PostgreSQL does not yet support nice skip options for conflicts, but I believe this will be enhanced in the future.
Based on Laurenz Albe's answer, which recommends the use of the statement:
INSERT ... ON CONFLICT DO NOTHING
I believe it would be better to use the following form, which will also take care of any updates made to your data before you start the subscription again:
INSERT ... ON CONFLICT ... DO UPDATE SET ...
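A hedged sketch of what that reconciliation could look like, assuming a hypothetical users table with primary key id and a staging table source_users holding rows pulled from the publisher:
INSERT INTO users (id, name, created_at)
SELECT id, name, created_at
FROM source_users                  -- staged copy of the publisher's rows
ON CONFLICT (id) DO UPDATE
SET name = EXCLUDED.name,          -- overwrite with the publisher's latest values
    created_at = EXCLUDED.created_at;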
Finally, I have to say that both are dirty solutions: between the execution of the statement above and the creation of the subscription, new rows may arrive, and those will be lost until you perform the custom sync again.
I have seen some other suggested solutions using the LSN from the PostgreSQL log file...
For me, it is perhaps more elegant and safe to delete all the data from the destination table and create the subscription again!

DB2 updated rows since last check

I want to periodically export data from Db2 and load it into another database for analysis.
In order to do this, I would need to know which rows have been inserted or updated since the last time I exported from a given table.
A simple solution would probably be to add a timestamp to every table and use that as a reference, but I don't have such a TS at the moment, and I would like to avoid adding it if possible.
Is there any other solution for finding the rows which have been added/updated after a given time (or something else that would solve my issue)?
There is an easy option for a timestamp in Db2 (for LUW) called
ROW CHANGE TIMESTAMP
This is managed by Db2 and can be defined as HIDDEN, so existing SELECT * FROM queries will not retrieve the new column, which would otherwise cause extra costs.
Check out the Db2 CREATE TABLE documentation
This functionality was originally added for optimistic locking but can be used for such situations as well.
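A hedged sketch of how this could look on Db2 LUW (the table orders and column name rct are hypothetical):
-- Add a Db2-maintained, hidden row change timestamp column
ALTER TABLE orders
  ADD COLUMN rct TIMESTAMP NOT NULL
  GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP
  IMPLICITLY HIDDEN;

-- Export only rows inserted or updated since the last check
SELECT * FROM orders
WHERE ROW CHANGE TIMESTAMP FOR orders > TIMESTAMP('2024-01-01-00.00.00');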
There is a similar concept for Db2 z/OS - you will have to check that out yourself, as I have not tried that one.
Of course, there are other ways to solve this, like replication, etc.
That is not possible if you do not have a timestamp column. With a timestamp, you can tell which rows are new or modified.
You can also use the time travel feature in order to get the new values, but that also implies a timestamp column (see the sketch after this answer).
Another option is to put the tables in append mode and then fetch the rows after a given one. However, this option is not reliable after a reorg, and it affects performance and space utilisation.
One possible option is to use SQL replication, but that needs extra tables for staging.
Finally, another option is to read the logs with the db2ReadLog API, but that implies development work. Also, just applying the archived logs to the new database is possible; however, the database will remain in roll-forward pending state.
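A hedged sketch of the time travel idea, assuming orders has been turned into a system-period temporal table with PERIOD SYSTEM_TIME (sys_start, sys_end); all names here are hypothetical:
-- Current rows inserted or updated after the given point in time
SELECT * FROM orders
WHERE sys_start > TIMESTAMP('2024-01-01-00.00.00');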

Change tracking in postgres

I want to track changes on some tables in Postgres. Is there any native means in PG for doing this? Probably not, because this requirement may be very specific depending on use-case.
I would like to observe only some columns of a table for changes, and make a copy of the row if such a field changes. The row should be copied into another, similar table, which additionally has a column for the current user, a change timestamp, the id of the original row, and of course its own primary key column.
Are there any good patterns for doing this? Which native Postgres tools could I use, and what should I implement myself?
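A hedged sketch of the trigger-based pattern described above, assuming a hypothetical accounts table (with columns id and status) in which only status is observed; EXECUTE FUNCTION needs PostgreSQL 11+, older versions use EXECUTE PROCEDURE:
-- History table: own key, original row id, current user, change timestamp
CREATE TABLE accounts_history (
  history_id bigserial PRIMARY KEY,
  account_id bigint NOT NULL,        -- id of the original row
  status     text,
  changed_by text NOT NULL,          -- user who made the change
  changed_at timestamptz NOT NULL
);

CREATE FUNCTION accounts_track() RETURNS trigger AS $$
BEGIN
  -- copy the old row only when the observed column actually changed
  IF NEW.status IS DISTINCT FROM OLD.status THEN
    INSERT INTO accounts_history (account_id, status, changed_by, changed_at)
    VALUES (OLD.id, OLD.status, current_user, now());
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER accounts_track_trg
  BEFORE UPDATE ON accounts
  FOR EACH ROW EXECUTE FUNCTION accounts_track();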

SQL Server 2005 - ALTER TABLE AFTER COLUMN

Is there any way to add one or more columns in a specific order (after column X, for example) in SQL Server 2005? Or something like changing the master database, or a sysobject, or a MODIFY command?
Please:
NOT MySQL (AFTER COLUMN doesn't work)
NOT DROP TABLE-CREATE TABLE (I can NOT implement this option in production without taking down the application)
I can NOT touch the application; it's not my app or my team's app
I can NOT know whether somewhere in the application there is a SELECT * FROM, so I must assume that YES, there is.
No, this is not a mere desire, it is a specific requirement: the table gets a feed from an external source (app) through a job.
You can only add columns at the end.
And even that will use a schema modify lock for a short time, so in a very sensitive production environment, you should be aware of this.
http://msdn.microsoft.com/en-us/library/ms190273.aspx
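In other words, the only supported form is an append (a minimal sketch; feed_table and new_col are hypothetical names):
-- T-SQL has no AFTER clause; the new column always lands at the end
ALTER TABLE feed_table ADD new_col INT NULL;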
If your app depends on a specific order of columns, the cure is not to change the column order, but to fix the app.
Some of the principles of RDBMS operation are better understood than others, and every definition of 1NF I know of concurs that column order is to be considered without meaning.