GoldenGate Replicat not putting all Columns in After - oracle12c

I have a GoldenGate for Oracle (123015) Replicat process set up to pick up changes from an Oracle 12c database.
I'm able to get all INSERT/UPDATE/DELETE changes into the trail files. However, for UPDATE operations, the After section of the record in the trail file only includes the columns that were modified.
According to the documentation, if
FORCE LOGGING and SUPPLEMENTAL LOG DATA ALL COLUMNS are enabled on the database, and
the Extract parameter file in GoldenGate includes the LOGALLSUPCOLS and UPDATERECORDFORMAT FULL parameters,
then the After section of the record should contain all columns, but it doesn't :(

Don't rely on the documentation alone; run some tests of your own. Please check the full explanation of what exactly the GoldenGate 12.3 Extract process writes to the trail.

You need to use GETUPDATEBEFORES to get the before image of the update records:
https://docs.oracle.com/en/middleware/goldengate/core/19.1/reference/getupdatebefores-ignoreupdatebefores.html#GUID-3CA51C08-8D8C-4CD5-AEB2-E0DFEF1B6BF7
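For reference, here is a minimal Extract parameter-file sketch combining the parameters discussed above; the Extract name, credential alias, trail path and schema are made-up placeholders, not a working configuration:

EXTRACT ext1
USERIDALIAS ggadmin
LOGALLSUPCOLS
UPDATERECORDFORMAT FULL
GETUPDATEBEFORES
EXTTRAIL ./dirdat/lt
TABLE orclpdb.hr.*;

Per the answer above, GETUPDATEBEFORES is what tells Extract to include the before images of update records in the trail, alongside the after images governed by the other two parameters.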

On the pluggable database you just need to add supplemental logging for all columns:
ALTER PLUGGABLE DATABASE ORCLPDB ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
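To double-check the database-level settings afterwards, a query along these lines should do (V$DATABASE exposes the force-logging and supplemental-logging flags; the PDB-level dictionary views differ between releases, so treat this as a sketch):

SELECT force_logging,
       supplemental_log_data_min,
       supplemental_log_data_all
FROM   v$database;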

Related

IBM DB2 and IBM IMS Change Data Capture Capabilities

I'd like to understand whether CDC-enabled IBM IMS segments and IBM DB2 table sources would be able to provide both the before and after change values (like the Oracle :OLD and :NEW values in a trigger) so that both could be used for further processing.
Note:
We are supposed to retrieve these values through Informatica PowerExchange, then process them and push them to targets.
As of now, we need to know whether we would be able to retrieve both the before-snapshot and after-snapshot values from IBM DB2 and IBM IMS (:OLD and :NEW as in Oracle triggers - not an exactly comparable example, but mentioned just to aid understanding).
Any help is much appreciated, thanks.
I don't believe CDC captures before images in the change messages it compiles from the DBMS log data. Its main purpose is to issue the minimum number of commands needed to replicate the data from one database to another. You'll want to take a snapshot of your replica database prior to processing the change messages if you want to preserve the state of the data so that you can query it.
Alternatively for Db2, it's probably easier to work with the temporal tables feature added in Db2 10 as that allows you to define what changes should drive a snapshot. You can then access the temporal data using a temporal SQL query.
SELECT … FROM … <period specification>
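For example, assuming a system-period temporal table named policy_info (a made-up name), a point-in-time query looks roughly like this:

SELECT *
FROM   policy_info
  FOR SYSTEM_TIME AS OF TIMESTAMP '2016-01-01-00.00.00.000000'
WHERE  policy_id = 1234;

Db2 also supports FOR SYSTEM_TIME FROM … TO … and FOR SYSTEM_TIME BETWEEN … AND … for period ranges.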
Example trigger with old and new referencing...
CREATE TRIGGER danny117
NO CASCADE BEFORE UPDATE ON mylib.myfile
REFERENCING NEW AS N OLD AS O
FOR EACH ROW
-- don't let the claim change and force upper case
-- just do something automatically on update blah...
BEGIN ATOMIC
  SET N.claim = UCASE(O.claim);
END
w.r.t PowerExchange 9.1.0 & 9.6:
Before-snapshot data can't be processed via PowerExchange for DB2. I recently worked on a migration project and thought that, like Oracle CDC which uses SCN numbers, there should be something for DB2 that lets you start the logger from any desired point. But to my surprise, Informatica global support confirmed that before-snapshot data can't be captured by PowerExchange.
They talk about materializing and de-materializing targets, which was beyond my knowledge at the time; later I found out they meant exporting and importing the history data.
Even if you have a table with CDC enabled, you can't capture the before-snapshot data from PWX.
DB2 change capture reads data from the DB2 logs, which carry a marker for the operation type (U/I/D); that's enough for PowerExchange to proceed.

How to use Apache Apex to ingest data in batch from DB2 to Vertica

Use Case: Ingest transaction data (e.g. rows = 10,000) in a single batch from DB2 and insert it into a Vertica database.
Question:
Should I fetch a single row at a time from the source database, or a batch of 10k rows, then process and insert into the destination database?
Is there any sample code which reads from one database and writes into another database?
You should always prefer batch execution; you will minimize your network round trips and improve your load into Vertica.
You can use the JDBC input and output operators to fetch from origin database and destination database. They should have configurable batch sizes. In general batching is faster than tuple by tuple.
Check https://github.com/apache/incubator-apex-malhar/tree/master/library/src/main/java/com/datatorrent/lib/db/jdbc
You can add multiple XML configuration files at src/site/conf in your project and select one of them at launch time.
This is described briefly at http://docs.datatorrent.com/application_packages/ under the section entitled "Adding pre-set configurations"
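As an illustration, such a pre-set configuration file is a standard Hadoop-style properties XML; the operator name (JdbcInput) and property name (fetchSize) below are placeholders for whatever your application and the JDBC operators actually expose:

<configuration>
  <property>
    <name>dt.operator.JdbcInput.prop.fetchSize</name>
    <value>10000</value>
  </property>
</configuration>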

How to identify recently changed SPs in postgresql?

I want to get the list of stored procedures which were recently changed.
In MS SQL Server, there are system tables which store that information, and we can easily retrieve what has changed. Similarly, I want to find the most recently changed SPs and tables in PostgreSQL.
Thanks
You can use an EVENT TRIGGER for logging. More information about how to create and use event triggers can be found in the manual and at www.youlikeprogramming.com.
You need at least version 9.3.
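A minimal sketch (the table and function names here are made up): an event trigger fired at ddl_command_end can record every DDL statement, including CREATE OR REPLACE FUNCTION, so you can see when a procedure was last changed.

CREATE TABLE ddl_audit (
    event        text,
    command_tag  text,
    logged_at    timestamptz DEFAULT now()
);

CREATE OR REPLACE FUNCTION log_ddl()
RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- TG_EVENT is e.g. 'ddl_command_end', TG_TAG is e.g. 'CREATE FUNCTION'
    INSERT INTO ddl_audit (event, command_tag) VALUES (TG_EVENT, TG_TAG);
END;
$$;

CREATE EVENT TRIGGER audit_ddl ON ddl_command_end
    EXECUTE PROCEDURE log_ddl();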

Loading DB2 table rows as Marklogic documents

Is there any tool to quickly convert DB2 table rows into a collection of XML documents that we can load into MarkLogic?
DB2 supports the SQL/XML publishing extensions that were introduced in SQL:2003. These functions include XMLSERIALIZE, XMLELEMENT, XMLATTRIBUTES, and XMLFOREST, and are easily added to a SQL SELECT statement to produce a simple, well-formed XML document for each row in the result set. By writing queries that retrieve the table names and column layouts from DB2's catalog views, it is possible to automate the creation of the XML-publishing SELECT statements for a large number of tables.
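For example, against the EMPLOYEE table from the DB2 sample database (substitute your own table and columns), a publishing query might look roughly like this:

SELECT XMLSERIALIZE(
         CONTENT XMLELEMENT(NAME "employee",
           XMLATTRIBUTES(e.empno AS "id"),
           XMLFOREST(e.firstnme AS "firstName",
                     e.lastname AS "lastName",
                     e.salary   AS "salary"))
         AS CLOB(1M)) AS employee_doc
FROM   employee e;

Each row of the result set is then one serialized XML document, ready to be written out and loaded into MarkLogic.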
One way of doing this would be to use the MLSQL toolkit ( http://developer.marklogic.com/code/mlsql ). It allows accessing relational databases from within your XQuery code in MarkLogic. Not sure what the returned data actually looks like, but it should be easy to process it within XQuery and insert your data as XML into MarkLogic.
Just make sure not to try to load a million records in one statement, but instead spawn batches of, let's say, 1000 records at a time. Spawning will also allow for handling it with multiple threads, so it should be faster for that reason too.
HTH!
Do you need to stream from DB2 to MarkLogic? Or can you temporarily dump all the documents to an intermediary filesystem and then read them in? If you can dump, then simply use some DB2 tooling (like @Fred's answer above) to export the rows to a bunch of XML documents in a filesystem, and use one of the many methods for reading a directory full of XML files into MarkLogic (like Information Studio (UI or APIs), RecordLoader, and so on).
If you don't want to store them on the filesystem as an intermediary, then you could write an Information Studio plugin for MarkLogic that will pull out each row and insert a document into MarkLogic. You'd likely need some web service or REST endpoint that the plugin could call to extract the document data from DB2.
Alternatively, I suspect you could use the DB2 tooling (described by @Fred) that will let you execute some code per row of your table. If you can do that in Java (or .NET), then pull in the MarkLogic XCC APIs, which will give you the ability to write documents into MarkLogic.

How can I audit with the Microsoft SQL Server LDF file?

We need an audit log in the product that we are creating. We use SQL Server 2008 R2. I learned that the LDF file keeps a complete log of all transactions that were made*.
I've found ApexSQL Log; this tool analyses the LDF file and provides a GUI. It's a great demonstration of what's possible, but it's expensive. More info: http://www.apexsql.com/sql_tools_log.aspx
Do you know of other programs that can analyse LDF files? Or perhaps other methods to provide audit-trail functionality? I know that it's possible to create triggers, but if it isn't necessary to add things to my database schema then I would rather not do it.
*Only if you select the full recovery model.
How about the new Change Data Capture (CDC) functionality in R2? Doesn't that serve your purpose?
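For reference, a minimal sketch of what that would look like; the dbo.Orders table is a made-up example, and note that CDC on 2008 R2 requires Enterprise (or Developer) edition:

-- enable CDC for the database
EXEC sys.sp_cdc_enable_db;

-- enable CDC for a table (hypothetical dbo.Orders)
EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Orders',
     @role_name     = NULL;

-- read all captured changes, including the before images of updates
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Orders');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();
SELECT *
FROM   cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all update old');

The 'all update old' row filter returns two rows per UPDATE, one with the old values and one with the new, which is essentially the before/after pair an audit trail needs.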
When it comes to the information stored in an LDF file, make sure you have a full log chain. A log chain is a continuous sequence of transaction log backups. It starts with a full database backup, followed by all subsequent log backups up through the auditing point. If it becomes broken, only the transactions in the logs up to the last backup before the missing one can be shown with full information (e.g. a schema and object name, or a row history).
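As an illustration, a simple chain is built with something like the following (the database name and paths are placeholders):

-- start the chain with a full backup
BACKUP DATABASE MyDb TO DISK = N'D:\backup\MyDb_full.bak';

-- then take regular log backups; each one continues the chain
BACKUP LOG MyDb TO DISK = N'D:\backup\MyDb_log_1.trn';
BACKUP LOG MyDb TO DISK = N'D:\backup\MyDb_log_2.trn';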
Unlike INSERT and DELETE operations, which are fully logged in the LDF files, UPDATE operations are logged minimally: only the changes that were made are logged, not the full old and new values. When logging an UPDATE, SQL Server doesn't record the complete before and after row states, only the incremental change that occurred to the row. For example, if the word “log” was updated to the word “blog”, SQL Server will, in the general case, only log the addition of the letter “b” at index 0. This is enough for its purpose of ensuring ACID, but not enough to easily show the before and after states of the row. So, in order to understand what change really occurred, you have to reconstruct the context in which it occurred from the rest of the transaction log and/or backups and the online database data.
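If you want to poke at the raw log yourself, the undocumented (and unsupported) fn_dblog function shows these records, so treat this purely as a sketch:

SELECT [Current LSN], [Operation], [Transaction ID], [AllocUnitName]
FROM   fn_dblog(NULL, NULL)  -- NULL, NULL = the whole active log
WHERE  [Operation] IN ('LOP_INSERT_ROWS', 'LOP_MODIFY_ROW', 'LOP_DELETE_ROWS');

Interpreting the LOP_MODIFY_ROW entries still requires reconstructing the row state as described above, which is exactly the work tools like ApexSQL Log automate.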