How do I get data for more than one table in Talend using Oracle CDC?

We are trying to connect Talend to our Oracle 12c database using CDC. The tOracleCDC component uses Oracle XStream to do the actual change data capture work. The issue is that when creating the CDC endpoint in Oracle one creates an "Outbound Server" which listens for changes on a number of tables, or even a number of whole schemas.
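For reference, the outbound server is created for several tables roughly like this (a sketch using the DBMS_XSTREAM_ADM package; hr.employees and hr.departments are illustrative names):

    DECLARE
      tables  DBMS_UTILITY.UNCL_ARRAY;
      schemas DBMS_UTILITY.UNCL_ARRAY;
    BEGIN
      -- capture changes for two tables (whole schemas work similarly)
      tables(1) := 'hr.employees';
      tables(2) := 'hr.departments';
      DBMS_XSTREAM_ADM.CREATE_OUTBOUND(
        server_name  => 'xout',
        table_names  => tables,
        schema_names => schemas);
    END;
    /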
In Talend, when configuring the tOracleCDC component, one of the required fields is "Table Using CDC", which in the generated Java code is used to filter the incoming change records with something like "TableName".equalsIgnoreCase(...).
This means that we can only get changes for a single table for a given XStream connection (and each connection will require a unique outbound server object in the database).
We must be missing something; how can we pull changes for multiple tables in Talend?
Thanks!

The solution is to use an empty string as the table name in the Table Using CDC field. This will cause the templating engine to not emit the table name check that was causing this problem.
I could not find this documented anywhere, so it might be unsupported, but examining the templates shows that it is the intended behavior.

Related

Dynamic loading not working in talend

Not able to load multiple tables; I am getting this error:
Exception in component tMysqlInput_1 (MYSQL_DynamicLoading)
java.sql.SQLException: Bad format for Timestamp 'GUINESS' in column 3
One table works fine. Basically, after the first iteration the second table tries to use the schema of the first table. Please help: how do I edit the component to make this work correctly? I am trying to load the actor and country tables from the sakila MySQL DB to another DB on the same server.
You should not use tMysqlInput if the output schemas differ; for that case there is no way around tJavaRow and custom code. However, I cannot guess what happens in your tMap, so you should provide some more details about what you want to achieve.
If all you need is to load data from one table to another without any transformations, you can do one of the following:
If your tables reside in 2 different databases on the same server, you can use a tMysqlRow and execute a query such as "INSERT INTO catalog.table SELECT * FROM catalog2.table2 ..." (a sketch follows below). You can do some simple transformations in SQL if needed.
If your tables live on different servers, check the generic solution I suggested for a similar question here. It may need some tweaking depending on your use case, but the general idea is to replicate the functionality of INSERT INTO SELECT when the tables are not on the same server.
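For the first option, the statement executed from tMysqlRow would look roughly like this (a sketch; sakila and sakila_copy are assumed database names on the same MySQL server):

    -- copy one table across databases on the same server
    INSERT INTO sakila_copy.actor
    SELECT * FROM sakila.actor;

    -- same pattern for the second table
    INSERT INTO sakila_copy.country
    SELECT * FROM sakila.country;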

Mirror SAP internal data to an external system

We would like to mirror data which is inside SAP to an external database.
Up to now, a script has exported the data every night.
The customer wants this to happen more often: it should happen every hour.
The export is quite big, and we are searching for a better way to mirror the data inside SAP to an external database.
Based on the tag, I assume that your external database is a PostgreSQL database. In this case, I don't think you will really find a pure SAP, database-independent solution.
The standard solution for this sort of replication is the SAP SLT Server. It supports taking data out of your SAP system to either a SAP target or a non-SAP target. Currently it supports the following non-SAP targets:
DB2
SAP MaxDB
Microsoft SQL Server
Oracle
Sybase ASE
As you can see, PostgreSQL is not included there (yet). In conclusion, I see the following possibilities:
Use SLT in combination with some other external DB that is supported.
Use a third-party replication tool, for example SymmetricDS.
Depending on your source database, you might be able to use some database specific tools (e.g. SAP HANA Smart Data Integration).
Write some custom code for doing it. In my opinion, you should try to build a sort of log table in this case, to record (using triggers, perhaps) which rows were inserted / updated / deleted since the last replication; a minimal sketch follows below. IMO, this should really be a last resort, as database replication is a fairly common topic and you should not reinvent the wheel.
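As a sketch of that last-resort approach, assuming the underlying database supports triggers (Oracle-style syntax shown; the orders table, order_id column and sequence name are all illustrative):

    CREATE SEQUENCE repl_log_seq;

    -- log table recording what changed since the last replication run
    CREATE TABLE repl_log (
      log_id     NUMBER        PRIMARY KEY,
      table_name VARCHAR2(30)  NOT NULL,
      row_key    VARCHAR2(100) NOT NULL,              -- key of the changed row
      op         CHAR(1)       NOT NULL,              -- 'I', 'U' or 'D'
      changed_at TIMESTAMP     DEFAULT SYSTIMESTAMP NOT NULL
    );

    -- one trigger per replicated table
    CREATE OR REPLACE TRIGGER trg_orders_log
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH ROW
    BEGIN
      INSERT INTO repl_log (log_id, table_name, row_key, op)
      VALUES (repl_log_seq.NEXTVAL, 'ORDERS',
              TO_CHAR(COALESCE(:NEW.order_id, :OLD.order_id)),
              CASE WHEN INSERTING THEN 'I'
                   WHEN UPDATING  THEN 'U'
                   ELSE 'D' END);
    END;
    /

The hourly job would then read repl_log, apply the changes to the external database, and delete the processed entries.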

Talend open studio run only created or modified records among 15k

I have a job in Talend Open Studio which is working fine. It connects a tMSSqlInput to a tMap, then a tMysqlOutput; very straightforward. My problem is that I need this job to run on a daily basis, but it should only pick up records that were newly created or modified... any help is highly appreciated!
It seems that you are searching for a Change Data Capture tool for Talend.
Unfortunately, it is only available in the licensed product.
There are several ways to implement this; I want to show the most popular ones.
CDC from Talend
As Corentin correctly said, you could choose to use CDC (Change Data Capture) from Talend if you use the subscription version.
CDC of MSSQL
Alternatively, you can check whether you can activate and use CDC on your MSSQL server. This depends on your license. If it is possible, you can use this feature to identify new elements and process them.
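For illustration, enabling the built-in CDC on SQL Server (where the edition and license permit it) looks roughly like this; dbo.orders is an assumed table name:

    -- enable CDC for the database (requires sysadmin rights)
    EXEC sys.sp_cdc_enable_db;

    -- enable CDC for one table
    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'orders',
        @role_name     = NULL;

    -- changed rows then show up in the generated change table
    SELECT * FROM cdc.dbo_orders_CT;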
Triggers
You can also create triggers on your database (if you have access to it). For example, creating a trigger for the INSERT, UPDATE and DELETE cases would help you get the deltas; see the sketch below. You could then store those records, or just their IDs, separately.
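A minimal sketch of such a trigger on the MSSQL side (dbo.orders and its order_id key are assumed names):

    -- delta table filled by the trigger
    CREATE TABLE dbo.orders_delta (
        order_id   INT       NOT NULL,
        op         CHAR(1)   NOT NULL,   -- 'I', 'U' or 'D'
        changed_at DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    );
    GO
    CREATE TRIGGER dbo.trg_orders_delta
    ON dbo.orders
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- rows only in "inserted" are INSERTs, rows in both are UPDATEs
        INSERT INTO dbo.orders_delta (order_id, op)
        SELECT i.order_id,
               CASE WHEN d.order_id IS NULL THEN 'I' ELSE 'U' END
        FROM inserted i
        LEFT JOIN deleted d ON d.order_id = i.order_id;

        -- rows only in "deleted" are DELETEs
        INSERT INTO dbo.orders_delta (order_id, op)
        SELECT d.order_id, 'D'
        FROM deleted d
        LEFT JOIN inserted i ON i.order_id = d.order_id
        WHERE i.order_id IS NULL;
    END;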
Software driven / API
If your database is connected to an application and you have developers around, you could ask for a service which identifies records on insert / update / delete and exposes them to you. This could be done e.g. via a REST interface.
Delta via ID
If the primary key is an autoincrement ID, you could also check your MySQL table for the biggest number and only SELECT those rows from the source which have a bigger ID than the ones you already have. This of course depends on the database layout.
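In SQL, that boils down to something like this (table and column names are assumptions):

    -- highest ID already present in the MySQL target
    SELECT COALESCE(MAX(id), 0) FROM target_db.orders;

    -- then pull only newer rows from the source, e.g. if the result was 4711:
    SELECT * FROM orders WHERE id > 4711;

Note that this only catches newly created rows; to also catch modifications you would need something like a last-modified timestamp column.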

Viewing tableau server when one data source is missing

I have a dashboard in Tableau which pulls data from about 10 tables in a SQL database.
These tables are refreshed at various times of day. There are occasions where one of them is not available (or has been deleted and is awaiting a rebuild).
However, when I open my Tableau dashboard on the server, it won't let me see any of it. Not seeing the data from the missing table is fine, but the majority of the data, which does not come from that table, is unavailable too.
I get this error
An unexpected error occurred. If you continue to receive this error please contact your Tableau Server Administrator.
TableauException: [Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid object name 'dbo.survey_order_info_fy16_TV_L'. The table "[dbo].[survey_order_info_fy16_TV_L]" does not exist. Unable to connect to the server "dbedwro.vistaprint.net". Check that the server is running and that you have access privileges to the requested database.
"survey_order_info_fy16_TV_L" being the missing table but not one I'm bothered about right now.
Is there an option that might help me see all the other data?
I am not sure if it's possible to avoid this behavior.
If it isn't, there is a workaround: create extracts of these tables and store them on the Tableau server. You can then use these extracts instead of the tables in the DB and refresh them either on a schedule, if you know when the tables become available again, or from the SQL server (e.g. with SSIS, by triggering the refresh once the data is available again).
The advantages of that would be:
you can refresh the extracts independently and always have the latest data
they perform better than a live SQL connection
you don't jam your SQL server with connections (in case you have a lot of users accessing it)
you can filter and select fields if you don't want your users to have access to the full dataset
Disadvantages:
you will have to create one extract per table, and replace all data sources in the workbooks you already use
It's a matter of creating a workbook, connecting to the source (adding filters or hiding fields) and publishing it to the server. Details of that can be found here:
http://onlinehelp.tableau.com/current/pro/online/mac/en-us/publish_datasources.html

Synchronize between an MS Access (Jet / MADB) database and PostgreSQL DB, is this possible?

Is it possible to have an MS Access backend database (Microsoft JET or Access Database Engine) set up so that whenever entries are inserted/updated those changes are replicated* to a PostgreSQL database?
Two-way synchronization would be nice, but one way would be acceptable.
I know it's popular to link the two and use one as a frontend, but it's essential that both be backend.
Any suggestions?
* i.e. reflected, synchronized, mirrored
Can you use Microsoft SQL Server Express Edition? Or do you have to use Microsoft Access Database Engine? It's possible you'll have more options using MS SQL express, like more complete triggers and logging.
Either way, you're going to need a way to accumulate a log of changed rows from the source database engine, and a program to sync them to PostgreSQL by reading the log and converting it into suitable PostgreSQL INSERT, UPDATE and DELETE statements.
You could do this by having audit triggers in MADB/Express insert a row into an audit shadow table for every "real" table whenever it changes, including inserting special "row deleted" audit entries. Then your sync program could connect to both databases, read the audit tables from MADB/Express, apply the changes to PostgreSQL, and empty the audit tables; a sketch of the apply step follows below.
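For the apply step on the PostgreSQL side (9.5 or later), each audit entry could be replayed as an upsert or a delete; a minimal sketch with assumed table and column names:

    -- replay an insert/update audit entry as an upsert
    INSERT INTO customers (id, name)
    VALUES (42, 'Example Co')
    ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name;

    -- replay a "row deleted" audit entry
    DELETE FROM customers WHERE id = 42;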
I'll be surprised if you find anything that does this out of the box. It's one area where Microsoft SQL Server has a big advantage, because of all the deep Access and MADB engine integration to support its synchronisation and integration features.
There are some ETL ("Extract, Transform, Load") tools that might be helpful, like Pentaho and Talend. I don't know if you can achieve the desired degree of automation with them though.