Replaying database load on a test instance - PostgreSQL

I am able to download logs from RDS containing queries in the following format:
2022-09-18 00:00:02 UTC:<host><region>.compute.amazonaws.com(10000):user#db:[20000]:LOG: duration: 0.768 ms statement: <possibly multi-line query>
It was suggested that I use pgreplay, but there are two issues:
(1) I want to provide a fixed user for all queries.
(2) The log format above is not compatible with what pgreplay expects.
Is there something that can be done, or would it be better to write a custom script? Or is there an alternative tool you would suggest?
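For reference, pgreplay's documentation lists the logging settings it expects; below is a sketch of those parameters as given in the pgreplay README (treat them as assumptions to verify against your version). On RDS they have to be set through the DB parameter group rather than postgresql.conf:

    # Logging settings the pgreplay README asks for (stderr log format)
    log_min_messages = error
    log_min_error_statement = log
    log_connections = on
    log_disconnections = on
    log_line_prefix = '%m|%u|%d|%c|'   # only needed for log_destination = 'stderr'
    log_statement = 'all'
    lc_messages = 'C'                  # pgreplay needs English log messages

Note this only helps for logs captured going forward; logs already written in the RDS default format would indeed need a custom script to convert them.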

Related

Logging SQL commands but not committing them

I'm looking for some advice for a system that has a "dry run" mode that will be used to help with the QA of data transformations. The idea is to be able to log and report what would happen to the database, and then apply that log when not in dry-run mode.
We've considered creating a sort of common-format CSV file, but that has proved limiting. My other thought was to log the SQL inserts/updates/deletes so we can review them in dry-run mode and apply them in production mode, but that certainly adds work and raises concerns about SQL injection (though I could use parameterized queries).
What would be ideal is to use something within Postgres itself, such as running the statements in a transaction that is rolled back, while still capturing the SQL executed within that transaction.
Has anyone had to solve something like this? Or is this a pipe dream that may be more trouble than it's worth?
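A minimal sketch of the rollback idea described above, using a hypothetical accounts table: run the transformation inside a transaction, capture the would-be changes with RETURNING, then roll back in dry-run mode:

    -- Hypothetical example: preview an UPDATE without committing it.
    BEGIN;

    UPDATE accounts
    SET    balance = balance * 1.05
    WHERE  region = 'EU'
    RETURNING id, balance;  -- each row as it would look after the change

    -- Dry-run mode: discard the changes.
    ROLLBACK;
    -- Production mode: run COMMIT; instead.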

Is there a way to show everything that was changed in a PostgreSQL database during a transaction?

I often have to execute complex SQL scripts in a single transaction on a large PostgreSQL database, and I would like to verify everything that was changed during the transaction.
Verifying every single entry in each table "by hand" would take ages.
Dumping the database to plain SQL before and after the script and running diff on the dumps isn't really an option, since each dump would be about 50 GB of data.
Is there a way to show all the data that was added, deleted, or modified during a single transaction?
What you are looking for is one of the most searched-for topics when it comes to capturing database changes; it is essentially a kind of version control for data.
As far as I know, there is sadly no built-in approach for this in PostgreSQL or MySQL. You can work around it, however, by adding triggers for your most-used operations.
You can create backup schemas and tables to capture the rows that are updated, created, or deleted.
In this way you can achieve what you want. I know this process is fully manual, but it is really effective.
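A minimal sketch of that trigger approach, with a hypothetical audit schema and a hypothetical orders table (EXECUTE FUNCTION needs PostgreSQL 11+; use EXECUTE PROCEDURE on older versions):

    -- Hypothetical audit table capturing row images as JSON.
    CREATE SCHEMA IF NOT EXISTS audit;

    CREATE TABLE IF NOT EXISTS audit.changes (
        id         bigserial PRIMARY KEY,
        table_name text        NOT NULL,
        operation  text        NOT NULL,
        old_row    jsonb,
        new_row    jsonb,
        changed_at timestamptz NOT NULL DEFAULT now()
    );

    CREATE OR REPLACE FUNCTION audit.log_change() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            INSERT INTO audit.changes (table_name, operation, new_row)
            VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
        ELSIF TG_OP = 'UPDATE' THEN
            INSERT INTO audit.changes (table_name, operation, old_row, new_row)
            VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD), to_jsonb(NEW));
        ELSE  -- DELETE
            INSERT INTO audit.changes (table_name, operation, old_row)
            VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD));
        END IF;
        RETURN NULL;  -- return value is ignored for AFTER triggers
    END;
    $$ LANGUAGE plpgsql;

    -- Attach to each table you want audited.
    CREATE TRIGGER orders_audit
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH ROW EXECUTE FUNCTION audit.log_change();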
If you only need to analyze the script's behaviour sporadically, then the easiest approach is to change the server configuration parameter log_min_duration_statement to 0 and then back to whatever value it had before the analysis. All of the script's activity will then be written to the instance log.
This approach is not suitable if your storage is not prepared to accommodate that amount of log data, or for systems in which you don't want sensitive client data written to a plain-text log file.
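A sketch of toggling that parameter around the analysis window (ALTER SYSTEM needs PostgreSQL 9.4+ and superuser; on older versions edit postgresql.conf instead):

    -- Log every statement while the script runs.
    ALTER SYSTEM SET log_min_duration_statement = 0;
    SELECT pg_reload_conf();

    -- ... run the script under analysis, then restore the previous value:
    ALTER SYSTEM RESET log_min_duration_statement;
    SELECT pg_reload_conf();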

Is there an equivalent of pg_backend_pid in Cassandra?

I am starting to use Cassandra, and I need to work with several sessions without creating different roles. I am trying to implement a record that saves the session ID on each modification (i.e. an audit log). I had previously implemented this in PostgreSQL, which is how I learned about triggers, and I am now adapting to Cassandra's triggers. So far I can't find a way to track a CQL session/connection that doesn't involve an external process, but going that route would rule out the use of triggers.
Cassandra can enable or disable tracing with the TRACING command, which creates traces for all queries in that session. There is also a more broadly useful approach with nodetool settraceprobability, which lets you set the percentage of queries whose traces are stored.
All those traces are kept in a separate keyspace; for 3.x this is system_traces. The traces are kept with a time-to-live (TTL) of 24 hours.
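A short cqlsh sketch of the per-session variant (keyspace, table, and predicate here are hypothetical); the cluster-wide alternative is nodetool settraceprobability 0.1 from the shell:

    -- Trace every query in this cqlsh session.
    TRACING ON;
    SELECT * FROM my_keyspace.my_table WHERE id = 42;  -- trace is printed after the rows
    TRACING OFF;

    -- Stored traces live in system_traces (TTL 24 hours on 3.x).
    SELECT session_id, client, command, duration
    FROM system_traces.sessions;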

PostgreSQL equivalent of Oplog Tailing in MongoDB

Is there a process in PostgreSQL equivalent to oplog tailing in MongoDB? I find it very useful in MongoDB for real-time analytics and for building dashboards on what is going on in the DB by peeking at the log. Unfortunately, MongoDB is not suitable for my particular DB needs. I'm really looking for a legitimate, non-hackish way of doing it: this would go into a production environment, and I can't cause more problems than it's worth down the line.
Thanks in advance, and let's try not to make this a NoSQL vs. RDBMS debate.
In PostgreSQL 9.4 and newer you can use the test_decoding plugin via pg_recvlogical to stream changes from a replication slot.
In 9.3 and newer pg_xlogdump can decode the transaction log segments, but that means you have to capture and dump each segment, and it really requires WAL archiving to be enabled in order to be practical.
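If you would rather stay inside SQL than run pg_recvlogical, the same test_decoding output can be read with the built-in slot functions (this sketch assumes wal_level = logical and a free replication slot; the slot name is made up):

    -- Create a logical replication slot with the test_decoding output plugin.
    SELECT pg_create_logical_replication_slot('audit_slot', 'test_decoding');

    -- Peek at pending changes without consuming them...
    SELECT * FROM pg_logical_slot_peek_changes('audit_slot', NULL, NULL);

    -- ...or consume them:
    SELECT * FROM pg_logical_slot_get_changes('audit_slot', NULL, NULL);

    -- Drop the slot when finished, or WAL will pile up behind it.
    SELECT pg_drop_replication_slot('audit_slot');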
You should also look at:
The pg_stat_statements extension (see the example query after this list)
The built-in pg_stat_activity view
The built-in pg_stat_* views like pg_stat_user_indexes, etc.
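For instance, a quick look at the heaviest statements via pg_stat_statements (the extension must be listed in shared_preload_libraries; the column is named total_time rather than total_exec_time before PostgreSQL 13):

    CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

    -- Top 5 statements by cumulative execution time.
    SELECT query, calls, total_exec_time, rows
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 5;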

How to parse PostgreSQL WAL files to SQL

The PostgreSQL database server stores "change data" in the WAL, and I want to parse the archived WAL files into SQL, the way mysqlbinlog parses the binlog into SQL, so that I can see the SQL the application executed. Does anyone know of a tool like this?
You can't. The WAL records changes to the actual disk blocks, not SQL statements.
You can configure the server to log all SQL statements to a file if you like. I'm not sure you'd be able to replay them, though, without being very careful about transaction boundaries.
This feature is currently under development. (Look for "logical replication" patches by Andres Freund.) It's a huge project, so don't hold your breath. The short answer is: It's currently not possible.
If you are feeling adventurous, xlogdump might get you part way to extracting data from your WAL segments. If you truly only need the SQL that gets executed in your cluster, then set log_min_duration_statement = 0 to log all statements.
Nowadays you can replicate changes at the SQL level: look at pglogical. Note, however, that it doesn't cover schema changes (DDL).
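A minimal pglogical sketch, assuming the extension is installed on both nodes and wal_level = logical; the node names and DSNs here are hypothetical:

    -- On the provider:
    CREATE EXTENSION pglogical;
    SELECT pglogical.create_node(
        node_name := 'provider',
        dsn := 'host=provider-host dbname=mydb'
    );
    -- Replicate all tables in the public schema.
    SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);

    -- On the subscriber:
    CREATE EXTENSION pglogical;
    SELECT pglogical.create_node(
        node_name := 'subscriber',
        dsn := 'host=subscriber-host dbname=mydb'
    );
    SELECT pglogical.create_subscription(
        subscription_name := 'my_subscription',
        provider_dsn := 'host=provider-host dbname=mydb'
    );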