Talend open studio run only created or modified records among 15k - talend

I have a job in talend open studio which is working fine, it conects a tMSSqlinput to a tMap then tMysqlOutput, very straight forward. My problem is that i need this job running on daily basis, but only run when a new record is created or modified...any help is highly aprecciated!

It seems that you are searching for a Change Data Capture Tool for Talend.
Unfortunately it is only available on the licenced product.

To implement your need, you do have several ways. I want to show the most popular ones.
CDC from Talend
As Corentin said correctly, you could choose to use CDC (Change Data Capture) from Talend if you use the subscription version.
CDC of MSSQL
Alternatively you can check if you can activate or use CDC in your MSSQL server. This depends on your license. If it is possible, you can use the function to identify new elements and proceed them.
Triggers
Also you can create triggers on your database (if you have access to it). For example, creating a trigger for the cases INSERT, UPDATE, DELETE would help you getting the deltas. Then you could store those records separately or their IDs.
Software driven / API
If your database is connected to a software and you have developers around, you could ask for a service which identifies records on insert / update / delete and shows them to you. This could be done e.g. in a REST interface.
Delta via ID
If the primary key is an ID and it is set to autoincrement, you could also check your MySQL table for the biggest number and only SELECT those from the source which have a bigger ID than you have already got. This depends of course from the database layout.

Related

IBM DB2 and IBM IMS Change Data Capture Capabilities

I'd like to understand if the CDC enabled IBM IMS segments and IBM DB2 table sources would be able to provide both the before and after snapshot change values (like the Oracle .OLD and .NEW values in trigger) so that both could be used for further processing.
Note:
We are supposed to retrieve these values through Informatica PowerExchange and process and push to targets.
As of now, we need to know would we be able to retrieve both before snapshot and after snapshot values from IBM DB2 and IBM IMS (.OLD and .NEW as in Oracle triggers - not an exact similar example, but mentioned just as an example to understand)
Any help is much appreciated, Thanks.
I don't believe CDC captures before data in its change messages that it compiles from the DBMS log data. It's main purpose is to issue the minimum number of commands needed to replicate the data from one database to another. You'll want to take a snapshot of your replica database prior to processing the change messages if you want to preserve the state of data such that you can query it.
Alternatively for Db2, it's probably easier to work with the temporal tables feature added in Db2 10 as that allows you to define what changes should drive a snapshot. You can then access the temporal data using a temporal SQL query.
SELECT … FROM…period specification
Example trigger with old and new referencing...
CREATE TRIGGER danny117
NO CASCADE BEFORE Update ON mylib.myfile
REFERENCING NEW AS N old as O
FOR EACH ROW
-- don't let the claim change and force upper case
--just do something automatically on update blah...
BEGIN ATOMIC
SET N.claim = ucase(O.claim);
END
w.r.t PowerExchange 9.1.0 & 9.6:
Before snapshot data can't be processed via the powerexchange for DB2 database. Recently I worked on a migration project and I thought like the Oracle CDC which uses SCN numbers there should be something for db2 to start the logger from any desired point. But to my surprise Inforamtica global support confirmed that before snapshot data can't be captured by PowerExchange.
They talk about materialize and de-materialize targets which was out of my knowledge at that time, later I found out they meant to export and import of history data.
Even if you have table with CDC enanbled, you can't capture the data before snapshot from PWX.
DB2 reads capture data from the DB2-logs which has a marking for the operation like U/I/D that's enough for PowerExchange to progress.

Mirror SAP internal data to an external system

We would like to mirror data which is inside SAP to an external database.
Up to now there is a script which exports the data every night.
The customer wants this to happen more often. It should happen every hour.
The export is quite big, and we search for a better way to mirror data which is inside SAP to an external database.
Based on the tag, I assume that your external database is a PostgreSQL database. In this case, I don't think you will really find a pure SAP, database independent solution.
The standard solution for this sort of replication is the SAP SLT Server. It supports taking data out of your SAP system to either a SAP target or a non-SAP target. Currently it supports the following non-SAP targets:
DB2
SAP MaxDB
Microsoft SQL Server
Oracle
Sybase ASE
As you can see, PostgreSQL is not included in there (yet). In conclusion, I see the following possibilities:
Use SLT in combination with some other external DB that is supported.
Use a third party replication tool like for example SymmetricDS.
Depending on your source database, you might be able to use some database specific tools (e.g. SAP HANA Smart Data Integration).
Write some custom code for doing it. In my opinion, you should try to build a sort of log table in this case, to record (using maybe triggers) which rows were inserted / updated / deleted since the last replication. IMO, this should be really a last resort, as database replication is a fairly common topic and you should not reinvent the wheel.

How to identify recently changed SPs in postgresql?

I want to get the list of stored procedures which were recently changed.
In MS SQL Server, there are system tables which store those information and we can easily retrieve what has changed. Similarly I want to find most recent changed SPs and tables in PostgreSql
Thanks
You can use an EVENT TRIGGER for logging. More information about how to create and use event triggers, can be found in the manual and www.youlikeprogramming.com.
You need at least version 9.3.

Synchronize between an MS Access (Jet / MADB) database and PostgreSQL DB, is this possible?

Is it possible to have a MS access backend database (Microsoft JET or Access Database Engine) set up so that whenever entries are inserted/updated those changes are replicated* to a PostgreSQL database?
Two-way synchronization would be nice, but one way would be acceptable.
I know it's popular to link the two and use one as a frontend, but it's essential that both be backend.
Any suggestions?
* ie reflected, synchronized, mirrored
Can you use Microsoft SQL Server Express Edition? Or do you have to use Microsoft Access Database Engine? It's possible you'll have more options using MS SQL express, like more complete triggers and logging.
Either way, you're going to need a way to accumulate a log of changed rows from the source database engine, and a program to sync them to PostgreSQL by reading the log and converting it into suitable PostgreSQL INSERT, UPDATE and DELETE statements.
You could do this by having audit triggers in MADB/Express insert a row into an audit shadow table for every "real" table whenever it changed, including inserting special "row deleted" audit entries. Then your sync program could connect to both MADB/Express, read the audit tables, apply the changes to PostgreSQL, and empty the audit tables.
I'll be surprised if you find anything to do this out of the box. It's one area where Microsoft SQL Server has a big advantage because of all the deep Access and MADB engine integation to support the synchronisation and integration features.
There are some ETL ("Extract, Transform, Load") tools that might be helpful, like Pentaho and Talend. I don't know if you can achieve the desired degree of automation with them though.

What applications do you use for data entry and retrieval via ODBC?

What apps or tools do you use for data entry into your database? I'm trying to improve our existing (cumbersome) system that uses a php web based system for entering data one ... item ... at ... a ... time.
My current solution to this is to use a spreadsheet. It works well with text and numbers that are human readable, but not with foreign keys that are used to join with the other table's rows.
Imagine that I want a row of data to include what city someone lives in. The column holding this is id_city, which is keyed to the "city" table which has two columns: id (serial) and name (text).
I envision being able to extend the spreadsheet capabilities to include dropdown menu's for every row of the id_city column that would allow the user to select which city (displaying the text of the city names), but actually storing the city id chosen. This way, the spreadsheet would:
(1) show a great deal of data on each screen and
(2) could be exported as a csv file and thrown to our existing scripts that manually insert rows into the database.
I have been playing around with MS Excel and Access, as well as OpenOffice's suite, but have not found something that gives me the functionality I mention above.
Other items on my wish-list:
(1) dynamically fetch the name of cities that can be selected by the user.
(2) allow the user to push the data directly into the backend (not via external files/scripts.
(3) If any of the columns of the rows of data gets changed in the backend, the user could refresh the data on the screen to reflect any recent changes.
Do you know how I could improve the process of data entry? What tools do you use? I use PostgreSQL for the backend and have access to MS Office, OpenOffice, as well as web based solutions. I would love a solution that is flexible, powerful, and doesn't require much time to develop or deploy (I know, dream on...)
I know that pgAdmin3 has similar functionality, but from what I have seen, it is more of an administrative tool rather than something for users to use.
As j_random_hacker noted, I've used MS Access for years (since Access 97) to connect to an ODBC Data Source.
You can do this via linking to external tables: (in Access 2010:)
New -> Blank Database
External Data -> ODBC Database -> Link to Data Source
Machine Data Source -> New -> System Data Source -> Select Driver (Oracle, or whatever) -> Finish
Enter a new name for your DSN, the all of the connection parameters, then click OK
Select newly created DSN, hit ok.
You can do so much once Access sees your external table as a linked table, including sorting, filtering, etc. There's one caveat: as far as I can tell, ALL operations happen on the client side unless you're using a pass-through query. That's fine if you're looking at a table with 3000 records. With 2,000,000 records, that hurts. To be clear, all data in the table comes down to the workstation, for all tables being joined, and the join happens client-side, NOT server-side.
There are usually standalone tools for basic database management - e.g., for Oracle and MySQL a free tool called SQL Developer suffices for basic database data entry.
For more complex types (especially involving clobs) I can usually knock an application together in Java+SWT in a day if we already have the model and DAOs available on the Java side. Yeah, you have to put some effort in, but if it will be used regularly in the future then it is probably worth it.
In your case (well, the case where you have bulk imports of data) knocking up some Perl that reads from the CSV and does the city id lookup would be trivial to implement. Maybe a waste for a one-off thing? Depends on the amount of data to import.
I would be surprised if MS Access can't do what you're looking for -- this is basically the exact use case for it. Namely, quickly throwing together a nice UI for a simple CRUD DB application that a spreadsheet doesn't quite stretch to.
This is an answer, technically, but not a recommendation:
I've used Excel and SSIS for importing simple data entry files into MS SQL, but it's not adequate - there's very little ability to control the data, and SSIS is so very touchy, especially when working with Excel.
MS Access does not work well with some non-Microsoft databases. There is an open-source equivalent called Apache OpenOffice Base you may want to try.