Mirror SAP internal data to an external system - postgresql

We would like to mirror data which is inside SAP to an external database.
Up to now there is a script which exports the data every night.
The customer wants this to happen more often. It should happen every hour.
The export is quite big, so we are looking for a better way to mirror data which is inside SAP to an external database.

Based on the tag, I assume that your external database is a PostgreSQL database. In this case, I don't think you will really find a pure SAP, database-independent solution.
The standard solution for this sort of replication is the SAP SLT Server. It supports taking data out of your SAP system to either an SAP target or a non-SAP target. Currently it supports the following non-SAP targets:
DB2
SAP MaxDB
Microsoft SQL Server
Oracle
Sybase ASE
As you can see, PostgreSQL is not included in there (yet). In conclusion, I see the following possibilities:
Use SLT in combination with some other external DB that is supported.
Use a third-party replication tool such as SymmetricDS.
Depending on your source database, you might be able to use some database specific tools (e.g. SAP HANA Smart Data Integration).
Write some custom code for doing it. In my opinion, you should try to build a sort of log table in this case, to record (using maybe triggers) which rows were inserted / updated / deleted since the last replication. IMO, this should be really a last resort, as database replication is a fairly common topic and you should not reinvent the wheel.
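To illustrate the log-table idea from the last option, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in for the real source database; trigger syntax, and whether you are allowed to add triggers at all, will differ on the database underneath an SAP system. The names material, change_log, create_log_and_triggers and changes_since are made up for the example.

```python
import sqlite3


def create_log_and_triggers(conn):
    """Create a demo table plus a log table that triggers fill on every change."""
    conn.executescript(
        """
        CREATE TABLE material (id INTEGER PRIMARY KEY, name TEXT);

        -- Hypothetical log table: one row per inserted/updated/deleted row.
        CREATE TABLE change_log (
            seq        INTEGER PRIMARY KEY AUTOINCREMENT,
            table_name TEXT NOT NULL,
            row_id     INTEGER NOT NULL,
            op         TEXT NOT NULL CHECK (op IN ('I', 'U', 'D'))
        );

        CREATE TRIGGER material_ins AFTER INSERT ON material
        BEGIN
            INSERT INTO change_log (table_name, row_id, op)
            VALUES ('material', NEW.id, 'I');
        END;

        CREATE TRIGGER material_upd AFTER UPDATE ON material
        BEGIN
            INSERT INTO change_log (table_name, row_id, op)
            VALUES ('material', NEW.id, 'U');
        END;

        CREATE TRIGGER material_del AFTER DELETE ON material
        BEGIN
            INSERT INTO change_log (table_name, row_id, op)
            VALUES ('material', OLD.id, 'D');
        END;
        """
    )


def changes_since(conn, last_seq):
    """Return all log entries recorded after the last replicated sequence number."""
    return conn.execute(
        "SELECT seq, table_name, row_id, op FROM change_log "
        "WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_log_and_triggers(conn)
    conn.execute("INSERT INTO material (id, name) VALUES (1, 'bolt')")
    conn.execute("UPDATE material SET name = 'nut' WHERE id = 1")
    for entry in changes_since(conn, 0):
        print(entry)
```

The replication job would remember the highest seq it has processed and pass it to changes_since on the next run, so an hourly export only touches rows that actually changed.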

Related

Talend open studio run only created or modified records among 15k

I have a job in Talend Open Studio which is working fine. It connects a tMSSqlinput to a tMap and then a tMysqlOutput, very straightforward. My problem is that I need this job running on a daily basis, but it should only run for records that were newly created or modified. Any help is highly appreciated!
It seems that you are searching for a Change Data Capture Tool for Talend.
Unfortunately it is only available on the licenced product.
To implement your need, you do have several ways. I want to show the most popular ones.
CDC from Talend
As Corentin said correctly, you could choose to use CDC (Change Data Capture) from Talend if you use the subscription version.
CDC of MSSQL
Alternatively you can check if you can activate or use CDC in your MSSQL server. This depends on your license. If it is possible, you can use the function to identify new elements and process them.
Triggers
Also you can create triggers on your database (if you have access to it). For example, creating a trigger for the cases INSERT, UPDATE, DELETE would help you getting the deltas. Then you could store those records separately or their IDs.
Software driven / API
If your database is connected to a software and you have developers around, you could ask for a service which identifies records on insert / update / delete and shows them to you. This could be done e.g. in a REST interface.
Delta via ID
If the primary key is an ID and it is set to autoincrement, you could also check your MySQL table for the biggest number and only SELECT those rows from the source which have a bigger ID than you have already got. This depends of course on the database layout.
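As a rough sketch of the "Delta via ID" option: the snippet below uses two in-memory sqlite3 databases as stand-ins for the MSSQL source and the MySQL target, and a hypothetical customer table with an auto-incrementing id. In a real job the two connections would come from the respective database drivers. Note that this approach only picks up newly inserted rows; updates and deletes on existing rows are not detected.

```python
import sqlite3


def copy_new_rows(source, target):
    """Copy rows whose id is larger than the largest id already in the target."""
    (max_id,) = target.execute(
        "SELECT COALESCE(MAX(id), 0) FROM customer"
    ).fetchone()
    rows = source.execute(
        "SELECT id, name FROM customer WHERE id > ? ORDER BY id", (max_id,)
    ).fetchall()
    target.executemany("INSERT INTO customer (id, name) VALUES (?, ?)", rows)
    target.commit()
    return len(rows)


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    for db in (source, target):
        db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    source.executemany(
        "INSERT INTO customer (id, name) VALUES (?, ?)",
        [(1, "Ann"), (2, "Bob")],
    )
    print("copied on first run:", copy_new_rows(source, target))
    print("copied on second run:", copy_new_rows(source, target))
```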

Data mining with postgres in production environment - is there a better way?

There is a web application which has been running for years, and during its lifetime the application has gathered a lot of user data. The data is stored in a relational DB (Postgres). Not all of this data is needed to run the application (to do the business). However, from time to time business people ask me to provide reports on this data. And this causes some problems:
sometimes these SQL queries are long running
queries are executed against the production DB (not cool)
it is not so easy to deliver reports on a weekly or monthly basis
some parts of the data are stored in a way which is not suitable for such querying (queries are inefficient)
My idea (note that I am a developer, not a data mining specialist) for improving this whole process of delivering reports is:
create a separate DB which is regularly updated with production data
optimize how data is stored
create a dashboard to present reports
Question: But is there a better way? Is there another DB which better fits for such data analysis? Or should I look into modern data mining tools?
Thanks!
Do you really do data mining (as in: classification, clustering, anomaly detection), or is "data mining" for you any reporting on the data? In the latter case, all the "modern data mining tools" will disappoint you, because they serve a different purpose.
Have you used the indexing functionality of Postgres well? Your scenario sounds as if selection and aggregation are most of the work, and SQL databases are excellent for this - if well designed.
For example, materialized views and triggers can be used to process data into a schema more usable for your reporting.
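A small sketch of the idea behind that suggestion. In PostgreSQL itself this would be a CREATE MATERIALIZED VIEW plus a periodic REFRESH MATERIALIZED VIEW; since SQLite has no materialized views, the example below emulates one with a plain summary table that is rebuilt on demand. The users and daily_signups tables are hypothetical.

```python
import sqlite3


def create_schema(conn):
    """Create a transactional table and a precomputed reporting table."""
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT)")
    conn.execute("CREATE TABLE daily_signups (day TEXT PRIMARY KEY, signups INTEGER)")


def refresh_daily_signups(conn):
    """Rebuild the summary table in one transaction (a manual 'refresh')."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM daily_signups")
        conn.execute(
            "INSERT INTO daily_signups (day, signups) "
            "SELECT date(created_at), COUNT(*) FROM users "
            "GROUP BY date(created_at)"
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.executemany(
        "INSERT INTO users (created_at) VALUES (?)",
        [("2024-01-01 09:00:00",), ("2024-01-01 17:30:00",), ("2024-01-02 08:15:00",)],
    )
    refresh_daily_signups(conn)
    print(conn.execute("SELECT day, signups FROM daily_signups ORDER BY day").fetchall())
```

Reports then read the small daily_signups table instead of aggregating the large users table on every request.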
There are a thousand ways to approach this issue but I think that the path of least resistance for you would be postgres replication. Check out this Postgres replication tutorial for a quick proof of concept. (There are many hits when you Google for postgres replication and that link is just one of them.) Here is a link documenting streaming replication from the PostgreSQL site's wiki.
I am suggesting this because it meets all of your criteria and also stays within the bounds of the technology you're familiar with. The only learning curve would be the replication part.
Replication solves your issue because it would create a second database which would effectively become your "read-only" db which would be updated via the replication process. You would keep the schema the same but your indexing could be altered and reports/dashboards customized. This is the database you would query. Your main database would be your transactional database which serves the users and the replicated database would serve the stakeholders.
This is a wide topic, so please do your due diligence and research it. But it's also something that can work for you and can be quickly turned around.
If you really want to try Data Mining with PostgreSQL there are some tools which can be used.
The very simple way is KNIME. It is easy to install. It has full-featured Data Mining tools. You can access your data directly from the database, process it and save it back to the database.
The hardcore way is MADLib. It installs Data Mining functions in Python and C directly in Postgres so you can mine with SQL queries.
Both projects are stable enough to try.
For reporting, we use a non-transactional (read-only) database. We don't care about normalization. If I were you, I would use another database for reporting. I would design the tables following OLAP principles (star schema, snowflake), and use an ETL tool to dump the data periodically (maybe weekly) to the read-only database to start creating reports.
Reports are used for decision support, so they don't have to be real-time, and usually don't have to be current. In other words, it is acceptable for a report to cover data only up to last week or last month.
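To make the star-schema suggestion a bit more concrete, here is a minimal sketch of such a periodic load, assuming a hypothetical production schema with customers and orders tables and a reporting database with one dimension table (dim_customer) and one fact table (fact_sales). sqlite3 stands in for both databases; a real setup would more likely use an ETL tool, as described above.

```python
import sqlite3


def create_report_schema(report):
    """Create a tiny star schema: one dimension table and one fact table."""
    report.execute(
        "CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, country TEXT)"
    )
    report.execute(
        "CREATE TABLE fact_sales (order_id INTEGER PRIMARY KEY, "
        "customer_id INTEGER, day TEXT, amount_cents INTEGER)"
    )


def load_star_schema(prod, report):
    """Full reload: read normalized production rows, write dimension and facts."""
    rows = prod.execute(
        "SELECT o.id, c.id, c.country, date(o.created_at), o.amount_cents "
        "FROM orders o JOIN customers c ON c.id = o.customer_id"
    ).fetchall()
    with report:  # one transaction, so report readers never see a half-loaded state
        report.execute("DELETE FROM fact_sales")
        report.execute("DELETE FROM dim_customer")
        report.executemany(
            "INSERT OR IGNORE INTO dim_customer (customer_id, country) VALUES (?, ?)",
            [(cust_id, country) for _, cust_id, country, _, _ in rows],
        )
        report.executemany(
            "INSERT INTO fact_sales (order_id, customer_id, day, amount_cents) "
            "VALUES (?, ?, ?, ?)",
            [(order_id, cust_id, day, amount) for order_id, cust_id, _, day, amount in rows],
        )
    return len(rows)


def sales_by_country(report):
    """Typical report query: aggregate the facts, grouped by a dimension attribute."""
    return report.execute(
        "SELECT d.country, SUM(f.amount_cents) FROM fact_sales f "
        "JOIN dim_customer d ON d.customer_id = f.customer_id "
        "GROUP BY d.country ORDER BY d.country"
    ).fetchall()
```

Because the load is a full reload, it is only suitable for data volumes where a weekly or nightly rebuild is affordable; larger tables would need an incremental load.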

Synchronize between an MS Access (Jet / MADB) database and PostgreSQL DB, is this possible?

Is it possible to have an MS Access backend database (Microsoft JET or Access Database Engine) set up so that whenever entries are inserted/updated those changes are replicated* to a PostgreSQL database?
Two-way synchronization would be nice, but one way would be acceptable.
I know it's popular to link the two and use one as a frontend, but it's essential that both be backend.
Any suggestions?
* ie reflected, synchronized, mirrored
Can you use Microsoft SQL Server Express Edition? Or do you have to use Microsoft Access Database Engine? It's possible you'll have more options using MS SQL express, like more complete triggers and logging.
Either way, you're going to need a way to accumulate a log of changed rows from the source database engine, and a program to sync them to PostgreSQL by reading the log and converting it into suitable PostgreSQL INSERT, UPDATE and DELETE statements.
You could do this by having audit triggers in MADB/Express insert a row into an audit shadow table for every "real" table whenever it changed, including inserting special "row deleted" audit entries. Then your sync program could connect to both MADB/Express and PostgreSQL, read the audit tables, apply the changes to PostgreSQL, and empty the audit tables.
I'll be surprised if you find anything to do this out of the box. It's one area where Microsoft SQL Server has a big advantage because of all the deep Access and MADB engine integration to support the synchronisation and integration features.
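A minimal sketch of what the sync program's core loop could look like, assuming a hypothetical item table and an item_audit shadow table with the columns audit_id, op, item_id and name. Both connections are sqlite3 stand-ins here; in practice the source would be reached through an ODBC driver and the target through a PostgreSQL driver, and the parameter placeholder style may differ.

```python
import sqlite3


def apply_audit(source, target):
    """Replay audit entries on the target, then remove the processed entries."""
    entries = source.execute(
        "SELECT audit_id, op, item_id, name FROM item_audit ORDER BY audit_id"
    ).fetchall()
    for audit_id, op, item_id, name in entries:
        if op == "INSERT":
            target.execute(
                "INSERT INTO item (id, name) VALUES (?, ?)", (item_id, name)
            )
        elif op == "UPDATE":
            target.execute("UPDATE item SET name = ? WHERE id = ?", (name, item_id))
        elif op == "DELETE":
            target.execute("DELETE FROM item WHERE id = ?", (item_id,))
        else:
            raise ValueError(f"unknown audit operation: {op!r}")
    target.commit()
    if entries:
        # Delete only what was read, so rows audited during this run survive.
        last_id = entries[-1][0]
        source.execute("DELETE FROM item_audit WHERE audit_id <= ?", (last_id,))
        source.commit()
    return len(entries)
```

The target is committed before the audit rows are removed, so a crash in between leads to entries being replayed rather than lost; making the replay idempotent (for example with an upsert) would be a sensible next step.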
There are some ETL ("Extract, Transform, Load") tools that might be helpful, like Pentaho and Talend. I don't know if you can achieve the desired degree of automation with them though.

DB2 external tables?

I just heard that Oracle has a feature called External Table that allows you to access a flat file (for example a CSV file in the file system) from the database.
I just want to know if there is something similar in DB2 for LUW.
The closest thing I could see is to implement a Table function (written in Java, for example) that will read the file, and return a table with the data from the file. However, this procedure takes a long time (create the Java code, compile the Java and create the function in DB2 associating the Java class) and the implementation is not dynamic for different files with different quantity of columns (table function returns a predefined set of columns).
Here is the documentation of Oracle External Tables: http://docs.oracle.com/cd/B28359_01/server.111/b28319/et_concepts.htm
Yes, IBM offers this as part of their InfoSphere Federation Server, which basically allows you to define nicknames inside a database to various data sources. Supported data sources
IBM Db2 11.5 has support for external tables that will allow you to do this.
This was formerly provided only by Netezza and this functionality has made its way to Db2.
See the manual page for CREATE EXTERNAL TABLE here https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r_create_ext_table.html
As mentioned, InfoSphere Federation Server is a good choice. There are two alternatives for DB2 UDB (Universal Database), which may be helpful in specific use cases:
DataLinks: it is basically another data type that keeps a reference to your external file. It also provides several levels of control over external data such as referential integrity, access control, coordinated backup and recovery, and transaction consistency.
DB2 Extenders: they extend the functionality of DB2 to operate on specific file formats, e.g. the XML Extender provides a set of features to operate on XML files inside DB2.
There is also:
(a) external table support in the warehousing engine products (Db2 Warehouse, Db2 Warehouse on Cloud) (b) Data virtualization (aka federation/fluid query) in all Db2 products which may achieve the same thing.

How to transfer or copy tables of DB2 to oracle database

I want to transfer some DB2 tables to Oracle daily so that they can be accessed from a web page,
but I don't know the DB2 commands. How can I do this?
I want this to run on the database daily at a particular time, so is there any tool available for this operation? And if I have to write a program for it, which programming language should I use? I am using Windows XP.
I think Change Data Capture is used to replicate DML from one database to other databases continuously.
However, what you need is to transfer some data at a particular time each day, thus CDC could be too heavy for that.
You could do a simple "db2 export", and then import the generated file on the Oracle side.
There should be an option to create an adapter in Oracle that allows you to query DB2 tables. The opposite is called federation in DB2 (InfoSphere Information Server), which allows you to query Oracle tables.
Export http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0008303.html
CMD examples http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.dm.doc/doc/r0004567.html
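As a rough illustration of the "db2 export" step, the sketch below generates a small script for the DB2 command line processor and shows how it could be run. The database name SAMPLE, the table MYSCHEMA.ORDERS and the file names are placeholders, and the helper names build_export_script and run_export are invented for this example. It assumes a DB2 client is installed; on Windows the db2 command normally has to be started from a DB2 command window.

```python
import subprocess


def build_export_script(database, table, out_file):
    """Return a DB2 CLP script that exports one table to a delimited (DEL) file."""
    statements = [
        f"CONNECT TO {database};",
        f"EXPORT TO {out_file} OF DEL SELECT * FROM {table};",
        "CONNECT RESET;",
    ]
    return "\n".join(statements) + "\n"


def run_export(script_path):
    """Run a saved script with the DB2 command line processor."""
    # -t: statements end with ';'   -v: echo each statement   -f: read from file
    return subprocess.run(["db2", "-tvf", str(script_path)], check=True)


if __name__ == "__main__":
    # Placeholder names; nothing is executed against a database here.
    print(build_export_script("SAMPLE", "MYSCHEMA.ORDERS", "orders.del"))
```

Saving the generated text to a file and calling run_export on it from the operating system's task scheduler would give the daily run; the resulting delimited file can then be loaded on the Oracle side with Oracle's own import tooling.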
Check this link
http://blogs.oracle.com/warehousebuilder/entry/simple_change_data_capture_from_db2_table_to_oracle_table
In 11.2 releases, Change Data Capture (CDC) can be done by code template mapping. This allows users to capture the data changes from heterogeneous data sources, and load them into the target across different platforms.