How to create a Derived Column in IIDR CDC for Kafka Topics?

We are currently working on a project to get data from an IBM i (formerly known as AS/400) system into Apache Kafka (Confluent Platform) with IBM IIDR CDC.
So far everything has been working fine: everything gets replicated and appears in the topics.
Now we are trying to create a derived column in a table mapping that gives us the journal entry type from the source system (IBM i).
We would like that information so we can see whether it was an Insert, Update, or Delete operation.
Therefore we created a derived column called OPERATION as Char(2) with the expression &ENTTYP.
But unfortunately the Kafka topic doesn't show the value.
Can someone tell me what we are missing here?
Best regards,
Michael

I own the IBM IDR Kafka target, so let's see if I can help a bit.
You have two options. The recommended way to see audit information would be to use one of the audit KCOPs. For instance, you might use this one:
https://www.ibm.com/support/knowledgecenter/en/SSTRGZ_11.4.0/com.ibm.cdcdoc.cdckafka.doc/tasks/kcopauditavroformat.html#kcopauditavroformat
You'll note that the audit.jcf property in the example is set to CCID and ENTTYP, so you get both the operation type and the transaction id.
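To make that concrete, here is a minimal sketch of the relevant KCOP parameter (hedged: only audit.jcf is taken from the linked example; the exact user-exit class name and any other parameters are on that page):
audit.jcf=ENTTYP,CCID
With ENTTYP included, every Kafka record carries the journal entry type (on IBM i these are codes such as PT for an insert, UP for an update, DL for a delete), which is exactly the insert/update/delete information you're after.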
Now, if you are using derived columns, I believe you would follow this procedure: https://www.ibm.com/support/knowledgecenter/en/SSTRGZ_11.4.0/com.ibm.cdcdoc.mcadminguide.doc/tasks/addderivedcolumn.html
If this is not working out, open a ticket and the L2 folks will provide deeper debugging. Also, if you did end up adding one: does the actual column get created in the output, just with no value in it?
Cheers,
Shawn

Your colleagues told me how to do it:
IDR Management Console -> go to the "Filtering" tab -> find the derived column in the "Filter Columns" (Source Columns) section and check "Replicate" next to it. Save the table mapping afterwards and see if it appears now.
Unfortunately a derived column isn't automatically selected for replication, but now I know how to select it.

You need to enable the new column for replication on the Filtering tab:
https://www.ibm.com/docs/en/idr/11.4.0?topic=mstkul-mapping-audit-fields-journal-control-fields-kafka-targets

Related

How to configure a Synapse Mapping Data Flow for INSERT/UPDATE/DELETE when the destination table does not yet exist

I am trying to build a generic Mapping Data Flow for some basic cleansing on tables in my Data Lake. I need it to be able to work both on an ongoing basis, after data already exists in my cleansed tables, and when new tables are added (it would detect them automatically and create and populate the destination). Both the source and destination tables will be Delta tables.
The approach I have taken is to have Sources configured to both my actual source and to the target and use either JOIN transformations or EXISTS transformations to identify the new, updated and removed rows.
This works fine for INSERTs and UPDATEs; however, my issue is dealing with DELETEs when there is no data currently in the destination. Obviously there will be nothing to DELETE, and that is as expected. However, because I reference the key column that will exist once data is loaded to the table, I get an error on an initial run that states:
ERROR Dataflow AppManager: name=BatchJobListener.failed, opId=xxx, message=Job 'xxx failed due to reason: DF-SINK-007 at Sink 'cleansedTableWithDeletes': Sink results in 0 output columns. Please ensure at least one column is mapped.
The overall process looks as follows: [screenshot of the full data flow omitted]
Has anyone developed a pattern that works for a generic flow (this one is parameter driven and ensures schema drift is accommodated) or a way for the Data Flow to think that there IS a column in the destination that it can refer to and get past this issue?
In the Source options, check Allow no files found.
You can also provide the date dynamically via the Filter by last modified option.
Refer to https://learn.microsoft.com/en-us/azure/data-factory/data-flow-sink#sink-settings

StreamSets Transformer - JDBC Origin without offset column

I'm testing platforms that can allow any user to easily create data processing pipelines. This platform has to meet certain requirements and one of them is to be capable of moving data from Oracle/SQL Server to HDFS.
StreamSets Transformer (v3.11) meets all requirements, including the one referred to above. I just can't get it to work in one very specific case: when ingesting a table that contains no numeric columns.
In these cases I want the pipeline to process all data, so in the JDBC origin I enabled the "Skip Offset Tracking" property. I thought that by skipping the offset tracking there would be no need to set the "Offset Column" property (guess I was wrong):
JDBC_05 - Table doesn't have compatible primary key configuration - supporting exactly one column but table have 0
If a numeric column exists, a possible workaround is to set it as the offset column, but I can't find a way of doing this when none exists.
Am I missing something?
Thanks
We are looking at providing this functionality in Transformer in a future release. I'll come back and update this answer with any news.
In the meantime, you might want to look at using StreamSets Data Collector for these tables. It does not have the 'numeric offset column' requirement.
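In case it helps, here is a rough sketch of the Data Collector side (hedged: this assumes the JDBC Query Consumer origin, and property names may vary slightly by version):
Origin: JDBC Query Consumer
SQL Query: SELECT * FROM mytable
Incremental Mode: disabled (full read on each run, no offset column required)
With Incremental Mode disabled, the origin runs the full query every time the pipeline starts, so tables without a numeric column can still be ingested.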

How to call a SAS dataset by its label, or where to check its name

I have a problem with SAS Enterprise Guide, which runs on my client's server.
I do not have access to the libraries, so in order to use the datasets the only thing we can do is store them on the local C: drive of the computer and drag them into SAS.
We cannot create libraries because the server does not read local paths.
Once you drag a table into SAS, let's call it "mydata", it is automatically renamed to something like "mydata9865", with random numbers at the end, and "mydata" becomes its label.
If you right-click the table and go to Properties, you can't find the name of the table, just the label.
The only way I found to check the real name of the dataset is to open the Query Builder and check the name in the code preview.
The problem is that I am dealing with tables of millions of records, and the machine I am using is very slow, so whenever I want to open the Query Builder just to check the table's name, it sometimes takes up to an hour.
I am not a SAS expert, so I am sure there is a smarter way to do so. Is it possible for instance to use the table by calling it with its label?
data mydata2;
set mydata;
run;
instead of
set mydata9865?
Or is there some place I can rapidly check the name of the table without going through the query builder?
I tried to google it but couldn't find anything. I hope someone will be able to help me!
Thank you in advance
Hover the mouse pointer over a data node to see its attributes. The data set name is the File name: value.
For example: [screenshot omitted]
In that example I had renamed the nodes created by two different queries to be the same (doable: yes; smart: maybe not). NOTE: a data node's Label: is not necessarily the same as its underlying data set's label metadata.
Regarding
use the table by calling it with its label?
Two nodes can have the same label, and that is a situation that defeats this approach.
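If the goal is just to map labels back to real data set names quickly, a dictionary-table query can do it without opening the Query Builder. A minimal sketch, assuming the dragged tables land in the WORK library:

/* List every data set in WORK together with its label */
proc sql;
  select memname, memlabel
    from dictionary.tables
    where libname = 'WORK';
quit;

Here memname is the real name to use in a set statement, and memlabel is the label Enterprise Guide displays.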
Use the COPY task to upload your data explicitly. It sounds like you're not adding your data to the project properly, so SAS automatically assigns a name; that wouldn't happen if you explicitly imported or loaded your data.
Problem solved! I should have simply uploaded the data to the server with Tasks -> Data -> Upload Data Sets to Server, but I didn't know about this task, so I didn't know it was possible at all!
https://communities.sas.com/t5/SAS-Enterprise-Guide/Importing-sas-data-sets-from-C-drive-into-SAS-EG-not-possible/td-p/135184
Thank you everybody for your help!

Visio 2013: How to trigger a change in databinding of all shapes

I have a nice process overview for our ordering process in Visio. I have an external data source (SQL Server), which works fine. Every record in my data source represents one ordering process. Currently all my shapes of the process are linked to the first record of the data source.
Now I want to add a dynamic behavior. What I want to achieve is this:
A user provides the order reference in a textbox (order reference is a column in the data source)
Afterwards the user clicks a button
After the button click, the process is updated and all shapes are now linked to the external data source record, that matches the provided order reference
So in short: the user should be able to select which process needs to be visualized.
I assume that this is common functionality, but I don't see how to deal with this requirement. I've already searched for some days on this issue, but without any success.
Can you help me with this issue?
Thanks a lot!
Problem solved :-)
Some old-school VBA was required. Using the DataRecordset object did the trick. It contains a method, GetDataRowIDs, that you can use to query the external dataset. Once you have the record to visualize, it's just a matter of dynamically updating the shapes with the correct record. Use macro recording to see how to do this.
MSDN: http://msdn.microsoft.com/en-us/library/office/ms195694(v=office.12).aspx
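For future readers, here is a rough VBA sketch of that approach (hedged: DataRecordsets(1) assumes a single linked recordset, and [OrderReference] is a hypothetical column name, so substitute your own):

Sub RelinkShapesToOrder(orderRef As String)
    ' Assumes the document has exactly one external data recordset
    Dim drs As Visio.DataRecordset
    Set drs = ActiveDocument.DataRecordsets(1)

    ' GetDataRowIDs takes a criteria string, much like a SQL WHERE clause.
    ' [OrderReference] is a hypothetical column name - use your own here.
    Dim rowIDs() As Long
    rowIDs = drs.GetDataRowIDs("[OrderReference] = '" & orderRef & "'")

    On Error GoTo NoMatch   ' LBound raises an error when no row matched
    Dim rowID As Long
    rowID = rowIDs(LBound(rowIDs))
    On Error GoTo 0

    ' Relink every shape on the active page to the matching row,
    ' overwriting the existing links to the first record
    Dim shp As Visio.Shape
    For Each shp In ActivePage.Shapes
        shp.LinkToData drs.ID, rowID, True
    Next shp
    Exit Sub

NoMatch:
    MsgBox "No record found for order reference " & orderRef
End Sub

Wire the macro to the button's click event and pass in the textbox value.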

Eclipse BIRT Reports: creating a report from a SQL database (User Key?)

I'm fairly new to using the BIRT Report Designer and need to figure out how to generate a report from a SQLite database. I have succeeded in getting it to connect to the DB, but am now unsure how to generate a report, and the tutorials I have found haven't been much help so far.
I have a template that was given to me by my employer that has a few fields; I'm wondering if these field names (in the template) are supposed to match field names in the DB.
Also, when I go to Run -> View Report -> As PDF, I am unsure what I am supposed to enter for the "User Key" field; does this correspond to a table name in the DB or something along those lines?
As of now, I have tried entering a table name but just a blank report is generated.
If anyone can point me to a good resource or help with this I would greatly appreciate it. Thanks
There are two books I can really recommend:
BIRT - A Field Guide to Reporting
Integrating and Extending BIRT
and the Eclipse Help containing the BIRT documentation.
I suppose the User Key could be a report parameter (listed in the Data Explorer window) that is passed to a Data Set to select the appropriate data. If I'm guessing right, check within the Data Set editor (the "Parameters" tab and the "Query" tab) where the User Key parameter goes in, probably into one of the table fields in a WHERE clause. Parameters in a query are represented by question marks: SELECT * FROM fooTable WHERE barColumn = ?. Tracking this should lead you to find out what to enter for the parameter.
Additionally, make sure your Data Set(s) are connected correctly to your SQLite Data Source (the "Data Source" tab in the Data Set editor).
Being as new as you are to BIRT, I would suggest building a couple of reports with the sample DB (Classic Models). There are many, many samples out there for you to use as a guide. Additionally, most tutorials will use the Classic Models data so you can follow right along. After you create a couple of practice reports (this should not take more than 30-45 minutes) the template you have been given will likely make A LOT more sense and allow you to make progress almost immediately.
If you are looking for a nice collection of tutorials and samples, be sure to check out BIRT Exchange for DevShare (samples) and tutorials.
As for the "User Key" this is almost certainly a report-level parameter used to filter the data set (as the previous answer points out).
Good Luck!