SAP / DB2 for LUW 11.1: finding the process that creates a high number of transaction log entries

I have a problem replicating data from DB2 LUW (the database behind an SAP ECC application).
The replication procedure uses the low-level DB2 log reader API (db2ReadLog):
https://www.ibm.com/docs/en/db2/11.1?topic=apis-db2readlog-read-log-records
Recently I can no longer read the logs in a timely manner (the log reader connects to DB2 over ExpressRoute, so the network is not the bottleneck).
What I have identified so far is a 4-hour interval during which log reading kept getting slower. I was also replicating AUFK (an SAP ECC table) and saw an enormous number of UPDATEs every 4 hours. We were able to track down the responsible job in SAP ECC and disable it.
Now I am seeing a different interval (probably another job hitting a different table).
Is there a way (in SAP ECC or directly in DB2) to see which tables receive the most DML (writes) or consume the most log space?
I would like to get top-10 statistics like:
TABLE Name || % of Daily Log utilization
or
TABLE Name || No of Inserts || No of Updates || No of Deletes
Even better would be getting these stats by hour, so I can associate them with a particular job that is running.
Note: I don't have access to DB2 or SAP ECC myself, so I need to give the team a guideline for what to run on the SAP ECC or DB2 side.
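A minimal sketch of the kind of query I would ask the team to run on the DB2 side, assuming the MON_GET_TABLE monitoring function is available in this environment (its counters are cumulative since database activation, so sampling the query hourly and diffing the results would give the per-hour breakdown):
-- Top 10 tables by row-change counters (cumulative since activation)
SELECT TABSCHEMA, TABNAME,
       ROWS_INSERTED, ROWS_UPDATED, ROWS_DELETED,
       ROWS_INSERTED + ROWS_UPDATED + ROWS_DELETED AS TOTAL_DML
FROM TABLE(MON_GET_TABLE('', '', -2)) AS T
ORDER BY TOTAL_DML DESC
FETCH FIRST 10 ROWS ONLY;
This reports row counts rather than actual log bytes, but the tables at the top of that list are usually the ones dominating the transaction log.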

Related

Streaming PostgreSQL tables into Google BigQuery

I would like to automatically stream data from an external PostgreSQL database into a Google Cloud Platform BigQuery database in my GCP account. So far, I have seen that one can query external databases (MySQL or PostgreSQL) with the EXTERNAL_QUERY() function, e.g.:
https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries
But for that to work, the database has to be in GCP Cloud SQL. I tried to see what options there are for streaming from the external PostgreSQL into a Cloud SQL PostgreSQL database, but I could only find information about replicating it as a one-time copy, not streaming:
https://cloud.google.com/sql/docs/mysql/replication/replication-from-external
The reason I want this streaming into BigQuery is that I am using Google Data Studio to create reports from the external PostgreSQL, which works great, but GDS can only accept SQL query parameters if the data source is a Google BigQuery database. E.g. if we have a table with 1M entries and we want a Google Data Studio parameter to be supplied by the user, this turns into:
SELECT * from table WHERE id=#parameter;
which means the query will be faster and won't hit the 100K-record limit in Google Data Studio.
What's the best way of creating a connection between an external PostgreSQL (read-only access) and Google BigQuery so that when querying via BigQuery, one gets the same live results as querying the external PostgreSQL?
Perhaps you missed the options stated in the Google Cloud user guide?
https://cloud.google.com/sql/docs/mysql/replication/replication-from-external#setup-replication
Notice in this section, it says:
"When you set up your replication settings, you can also decide whether the Cloud SQL replica should stay in-sync with the source database server after the initial import is complete. A replica that should stay in-sync is online. A replica that is only updated once, is offline."
I suspect online mode is what you are looking for.
What you are looking for will require some architecture design based on your needs, plus some coding. There isn't a feature to automatically sync your PostgreSQL database with BigQuery (apart from the EXTERNAL_QUERY() functionality, which has limitations: one connection per database, performance, total number of connections, etc.).
In case you are not looking for the data in real time, you could use Airflow, for instance: have a DAG connect to all your DBs once per day (using KubernetesPodOperator, for example), extract the data from the past day, and load it into BQ. A typical ETL process, though in this case more EL(T). You can run this process more often if you cannot wait a full day for the previous day's data.
On the other hand, if streaming is what you are looking for, then I can think of a Dataflow job; I believe you can connect using a JDBC connector.
In addition, depending on how your pipeline is structured, it might be easier to implement (but harder to maintain) to stream the data into BigQuery at the same moment you write to your PostgreSQL DB.
Not sure if you have tried this already, but instead of adding a parameter, if you add a dropdown filter based on a dimension, Data Studio will push that down to the underlying Postgres db in this form:
SELECT * from table WHERE id=$filter_value;
This should achieve the same results you want without going through BigQuery.

Best way to unload delta data from DB2 transactional tables?

Sorry if this has been asked before; I am hoping to save some time this way :)
What would be the best way to unload delta data from a DB2 source database that has been optimized for OLTP? E.g. by analyzing the redo files, as with Oracle LogMiner?
Background: we want near-real-time ETL, and a full table unload every 5 minutes is not feasible.
This is more about the actual technology for accessing DB2 than about determining which deltas to load into the (Teradata) target.
I.e., we want to unload all records changed since the last unload timestamp.
Many thanks!
Check out IBM InfoSphere Data Replication.
Briefly:
There are 3 replication solutions: CDC, SQL & Q replication.
All three solutions read the Db2 transaction logs using the same db2ReadLog API, which anyone may use for a custom implementation. Everything else, such as staging and transformation of the data changes captured from the logs, their transport, and applying the data to the target, differs between the methods.

PCI Compliance with Native PostgreSQL

We have a PostgreSQL database, no 3rd-party software, a Linux admin, and a SQL DBA with little PostgreSQL experience.
We need to set up audit/access logging of all transactions on the CC (credit card) tables. We enabled logging, but we are concerned about logging everything; we want to restrict it to specific tables. I have not found a resource I understand well enough to accomplish this.
A few blogs mention table triggers and log files, and another discusses functions. I am just not sure how to proceed. The following is the PCI information I am working from:
1) (Done) Install the pg_stat_statements extension to monitor all queries (SELECT, INSERT, UPDATE, DELETE)
2) Set up monitoring to detect suspicious access to the PAN-holding table
3) Enable connection/disconnection logging
4) Enable web server access logs
5) Monitor Postgres logs for unsuccessful login attempts
6) Automated log analysis & access monitoring using alerts
7) Keep an archive of audit and log history for at least one year, with the last 3 months readily available for analysis
Update
We also need to apply a password policy to PostgreSQL DB users:
90-day expiration (there is a place to set a date, but not an interval; see the sketch below)
Lock out a user after 6 failed attempts, for 30 minutes or until an administrator re-enables the user ID
Force re-authentication when idle for more than 15 minutes
Passwords/passphrases must meet the following: a minimum length of at least seven characters, contain both numeric and alphabetic characters, and cannot be the same as any of the last 4 passwords/passphrases used
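For the 90-day expiration, the only native hook I can see is VALID UNTIL, which only accepts a fixed timestamp; something like the following sketch (the role name app_user is a placeholder) would have to be re-run periodically, e.g. from cron:
DO $$
BEGIN
    -- VALID UNTIL takes a literal, not an expression, so compute the date
    -- and build the statement dynamically.
    EXECUTE format('ALTER ROLE app_user VALID UNTIL %L',
                   now() + interval '90 days');
END
$$;
The lockout and idle-timeout requirements have no equivalent in core PostgreSQL and would need to be handled outside the database.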
2) There is no direct way in core PostgreSQL to log access to specific tables. The pgAudit extension claims it can do that; I have never used it myself, though (a configuration sketch follows after this list).
3) can easily be done using log_connections and log_disconnections
4) has nothing to do with Postgres
5) can be done once connection logging has been done by monitoring the logfile
6) no idea what that should mean
7) that is independent of the Postgres setup. You just need to make sure the Postgres logfiles are archived properly.
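For item 2), a minimal sketch of what a per-table pgAudit setup could look like; the role and table names are placeholders, and pgaudit has to be added to shared_preload_libraries (which requires a restart) before the extension can be created:
-- Load the extension (after adding 'pgaudit' to shared_preload_libraries)
CREATE EXTENSION pgaudit;
-- Object audit logging: only statements touching objects granted to the
-- audit role get logged, so non-CC tables stay out of the log.
CREATE ROLE cc_auditor NOLOGIN;
ALTER SYSTEM SET pgaudit.role = 'cc_auditor';
GRANT SELECT, INSERT, UPDATE, DELETE ON credit_card TO cc_auditor;
-- Item 3): connection/disconnection logging
ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
SELECT pg_reload_conf();
Treat this as a starting point only, and test it on a non-production instance first.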

PostgreSQL: Automating deletion of old partitions

I am using Imperial Wicket's fine automated partitioning function for PostgreSQL (http://imperialwicket.com/postgresql-automating-monthly-table-partitions/) to create weekly partitions to store a large amount of data. The database grows by ~100GB per week, so frequent attention is needed to ensure the server does not run out of disk space.
I have searched SO but cannot find any solutions to automatically drop tables based on disk space thresholds or time periods. Are there any native PostgreSQL solutions or developed functions to automate dropping old partitions, or is this a roll-my-own solution kind of problem?
This seems to be a roll-my-own-solution kind of problem. Since I am using Imperial Wicket's automated table partitioning function, I used it as the base to develop a new function that can drop partition tables older than a specified date.
The code is available at https://github.com/stevbev/postgresql-drop-time-series-table-partitions
Usage is similar to Imperial Wicket's syntax and supports day/week/month/year partitioning schemes. For example, to delete partitions of 'my_table_name' created with weekly partitioning that are older than the week of 180 days ago, execute the SQL statement:
SELECT public.drop_partitions(current_date-180, 'public', 'my_table_name', 5, 'week');
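If you only need the general idea rather than the full function, a rough sketch of the approach looks like this; the my_table_name_YYYYMMDD naming pattern is an assumption on my part and not necessarily what the linked function uses:
-- Drop partitions whose date suffix is older than the cutoff.
-- Adjust the name pattern and suffix format to your scheme.
DO $$
DECLARE
    part   record;
    cutoff date := current_date - 180;
BEGIN
    FOR part IN
        SELECT schemaname, tablename
        FROM pg_tables
        WHERE schemaname = 'public'
          AND tablename::text ~ '^my_table_name_[0-9]{8}$'
          AND to_date(right(tablename::text, 8), 'YYYYMMDD') < cutoff
    LOOP
        EXECUTE format('DROP TABLE %I.%I', part.schemaname, part.tablename);
        RAISE NOTICE 'Dropped partition %.%', part.schemaname, part.tablename;
    END LOOP;
END
$$;
Scheduling it can then be a simple cron job calling psql.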

Synchronize between an MS Access (Jet / MADB) database and PostgreSQL DB, is this possible?

Is it possible to have an MS Access backend database (Microsoft Jet or Access Database Engine) set up so that whenever entries are inserted/updated, those changes are replicated* to a PostgreSQL database?
Two-way synchronization would be nice, but one way would be acceptable.
I know it's popular to link the two and use one as a frontend, but it's essential that both be backend.
Any suggestions?
* ie reflected, synchronized, mirrored
Can you use Microsoft SQL Server Express Edition? Or do you have to use Microsoft Access Database Engine? It's possible you'll have more options using MS SQL express, like more complete triggers and logging.
Either way, you're going to need a way to accumulate a log of changed rows from the source database engine, and a program to sync them to PostgreSQL by reading the log and converting it into suitable PostgreSQL INSERT, UPDATE and DELETE statements.
You could do this by having audit triggers in MADB/Express insert a row into an audit shadow table for every "real" table whenever it changed, including inserting special "row deleted" audit entries. Then your sync program could connect to both MADB/Express, read the audit tables, apply the changes to PostgreSQL, and empty the audit tables.
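As a rough illustration of that audit-shadow-table idea, assuming you go with SQL Server Express (Jet/ACE itself has no conventional triggers, so there you would be looking at data macros or application-level logging instead), and with made-up table and column names:
-- Hypothetical source table: dbo.customer(customer_id, name)
CREATE TABLE dbo.customer_audit (
    audit_id    INT IDENTITY(1,1) PRIMARY KEY,
    op          CHAR(1)       NOT NULL,   -- 'I' = insert, 'U' = update, 'D' = delete
    customer_id INT           NOT NULL,
    name        NVARCHAR(200) NULL,
    changed_at  DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
);
GO
CREATE TRIGGER dbo.trg_customer_audit
ON dbo.customer
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Inserted rows and the new side of updates
    INSERT INTO dbo.customer_audit (op, customer_id, name)
    SELECT CASE WHEN EXISTS (SELECT 1 FROM deleted) THEN 'U' ELSE 'I' END,
           i.customer_id, i.name
    FROM inserted AS i;
    -- Deleted rows (no matching rows in inserted)
    INSERT INTO dbo.customer_audit (op, customer_id, name)
    SELECT 'D', d.customer_id, d.name
    FROM deleted AS d
    WHERE NOT EXISTS (SELECT 1 FROM inserted);
END;
GO
The sync program then reads customer_audit in audit_id order, applies the rows to PostgreSQL as INSERT/UPDATE/DELETE statements, and deletes the audit rows it has processed.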
I'll be surprised if you find anything that does this out of the box. It's one area where Microsoft SQL Server has a big advantage, because of all the deep Access and MADB engine integration supporting its synchronisation and integration features.
There are some ETL ("Extract, Transform, Load") tools that might be helpful, like Pentaho and Talend. I don't know if you can achieve the desired degree of automation with them though.