Is there a way to create a timestamped log of transactions for a postgres (PG-admin) database table? - postgresql

How can I (using PG-Admin) access or create a log (ideally timestamped) that displays the changes that have happened to a table?

First of the way enable pg_stat_statements on PostgreSQL.
create extension pg_stat_statements;
After then you can view all SQL queries executed on your DB. At this you can view when executed SQL, when finished, duration of execute times and etc.
For more details: PostgreSQL Documentation - pg_stat_statements
If you need history of updated or deleted records on the tables, you can do it for all tables manually writing triggers or own functions, using JSON(JSONB) data types.

Related

Source of data in Redshift tables

I am looking to find the data source of couple of Tables in Redshift. I have gone through all the stored procedures in Redshift instance. I couldn't find any stored procedure which populates these tables in Redshift. I have also checked the Data Migration Service and didn't see these tables are being migrated from RDS instance. However, the tables are updated regularly each day.
What would be the way to find how data is populated in those 2 tables? Is there any logs or system tables I can look in to?
One place I'd look is svl_statementtext. That will pull any queries and utility queries that may be inserting or running copy jobs against that table. Just use a WHERE text LIKE %yourtablenamehere% and see what comes back.
https://docs.aws.amazon.com/redshift/latest/dg/r_SVL_STATEMENTTEXT.html
Also check scheduled queries in the Redshift UI console.

postgresql materialized view refresh history time

I'm working on a project which requires me to write a query to create a materialized view in PostgreSQL. My requirement is that How I can get PostgreSQL materialized view refresh history time for specific materialized view.
PostgreSQL does not store the time when an SQL statement like REFRESH MATERIALIZED VIEW is run.
Any attempt to rely on the file modification time of the underlying data file is in vain, as jobs like autovacuum may modify the file.
The only way to retain such information is to store the times when you run the statement in a table yourself.
An alternative could be to log all DDL statements (log_statement = 'ddl') and retrieve the information from the log file.

How can I automatically maintain a dump of modified rows in PostGreSql

So, I have a PostGreSQL DB. For some chosen tables in that DB I want to maintain a plain dump of the rows when modified. Note this dump is not a recovery or backup dump. It is just a file which will have the incremental rows. That is, whenever a row is inserted or updated, I want that appended to this file or to a file in a folder. Idea is to load that folder into say something like hive periodically so that I can run queries to check previous states of certain rows, columns. Now, these are very high transactional tables and the dump does not need to be real time. It can be in batches, every hour. I want to avoid a trigger firing hundreds of times every minute. I am looking for something which is off the shelf - already available in PostGreSQL. I did some research but everything is related to PostGreSQL backup - which is not the exact use case.
I have read some links like https://clarkdave.net/2015/02/historical-records-with-postgresql-and-temporal-tables-and-sql-2011/ Implementing history of PostgreSQL table etc - but these are based on insert update trigger and create the history table on PostGreSQL itself. I want to avoid both. I cannot have the history on PostGreSQL as it will be huge soon. And I do not want to keep writing to files through a trigger firing constantly.

Slow insert and update commands during mysql to redshift replication

I am trying to make a replication server from MySQL to redshift, for this, I am parsing the MySQL binlog. For initial replication, I am taking the dump of the mysql table, converting it into a CSV file and uploading the same to S3 and then I use the redshift copy command. For this the performance is efficient.
After the initial replication, for the continuous sync when I am reading the binlog the inserts and updates have to be run sequentially which are very slow.
Is there anything that can be done for increasing the performance?
One possible solution that I can think of is to wrap the statements in a transaction and then send the transaction at once, to avoid multiple network calls. But that would not address the problem that single update and insert statements in redshift run very slow. A single update statement is taking 6s. Knowing the limitations of redshift (That it is a columnar database and single row insertion will be slow) what can be done to work around those limitations?
Edit 1:
Regarding DMS: I want to use redshift as a warehousing solution which just replicates our MYSQL continuously, I don't want to denormalise the data since I have 170+ tables in mysql. During ongoing replication, DMS shows many errors multiple times in a day and fails completely after a day or two and it's very hard to decipher DMS error logs. Also, When I drop and reload tables, it deletes the existing tables on redshift and creates and new table and then starts inserting data which causes downtime in my case. What I wanted was to create a new table and then switch the old one with new one and delete old table
Here is what you need to do to get DMS to work
1) create and run a dms task with "migrate and ongoing replication" and "Drop tables on target"
2) this will probably fail, do not worry. "stop" the dms task.
3) on redshift make the following changes to the table
Change all dates and timestamps to varchar (because the options used
by dms for redshift copy cannot cope with '00:00:00 00:00' dates that
you get in mysql)
change all bool to be varchar - due to a bug in dms.
4) on dms - modify the task to "Truncate" in "Target table preparation mode"
5) restart the dms task - full reload
now - the initial copy and ongoing binlog replication should work.
Make sure you are on latest replication instance software version
Make sure you have followed the instructions here exactly
http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MySQL.html
If your source is aurora, also make sure you have set binlog_checksum to "none" (bad documentation)

Replicating data between Postgres DBs

I have a Postgres DB that is used by a chat application. The chat system often truncates these tables when they grow to big but I need this data copied to another Postgres database. I will not be truncating the tables in this DB.
How I can configure a few tables on the chat-system's database to replicate data to another Postgres database. Is there a quick way to accomplish this?
Slony can replicate only select tables, but I'm not sure how it handles truncates, and it can be a pain to configure.
You might also use something like pgpool to send copies of the insert statements to a second database.
You might modify the source of your chat application to do two writes (one to each db) when a new record is created.
You could just write a script in Perl/PHP/Python to read from one and write to another, then fire it by cron so that you're sure it gets run before truncation.
If you only copy a batch of rows every other day, you may be better off with a plain INSERT to a different schema in the same database or a different database in the same database cluster (you need something like dblink for that).
The safest / fastest solution in the same database would be a data-modifying CTE. Something along these lines:
WITH del AS (
DELETE FROM tbl
WHERE <some condition>
RETURNING *
)
INSERT INTO backup.tbl
SELECT * FROM del;
For true replication consider these official sources:
https://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling
https://www.postgresql.org/docs/current/runtime-config-replication.html