How to retrieve all database SQL executed from a particular server? - postgresql

I am running an application on a particular server which updates a PostgreSQL database table. Is there any way that I can retrieve all the queries executed against that database (maybe just my table) over a period of time if I have admin privileges?

You can install the extension pg_stat_statements, which will give you a summary of the queries executed.
Note that the number of queries kept in the pg_stat_statements view is limited (the limit is configured with pg_stat_statements.max). So you probably want to store a snapshot of that view on a regular basis; how often depends on your workload. Increasing pg_stat_statements.max means you can reduce the frequency of taking snapshots.
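A rough sketch of how that might look (the snapshot table name is an assumption; on PostgreSQL 13+ the timing column is total_exec_time rather than total_time):
-- requires shared_preload_libraries = 'pg_stat_statements' in postgresql.conf,
-- then enable the extension in the database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- inspect the most expensive statements collected so far
SELECT calls, total_time, rows, query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;
-- take a periodic snapshot into a regular table (table name is hypothetical)
CREATE TABLE IF NOT EXISTS pgss_snapshot AS
SELECT now() AS taken_at, * FROM pg_stat_statements WHERE false;
INSERT INTO pgss_snapshot
SELECT now(), * FROM pg_stat_statements;
Keep in mind that pg_stat_statements stores normalized query texts and aggregate counters, not every individual statement with its parameters.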

Related

Get maximum number of job slaves per instance that can be created for the execution in PostgreSQL

I am migrating an Oracle database to PostgreSQL.
During the migration I came across the following query on the Oracle side.
Oracle Query:
SELECT
TRIM(value) AS val
FROM v$parameter
WHERE name = 'job_queue_processes';
I just want to know how we can get the maximum number of job slaves per instance that can be created for execution on the PostgreSQL side.
I have created the pg_cron extension and created the required jobs so far. But one of the functions uses the above query on the Oracle side, so I want to convert it to PostgreSQL.
The documentation is usually a good source of information.
Important: By default, pg_cron uses libpq to open a new connection to the local database.
In this case, there is no specific limit. It would be limited in the same way other user connections are limited: mostly by max_connections, but possibly lowered from that for particular users or particular databases with ALTER ROLE ... CONNECTION LIMIT or ALTER DATABASE ... CONNECTION LIMIT. You could create a user specifically for cron if you wanted to limit its connections separately, then grant that user the privileges of the other roles it will operate on behalf of. I don't know whether pg_cron handles reaching that limit gracefully or not.
Alternatively, pg_cron can be configured to use background workers. In that case, the number of concurrent jobs is limited by the max_worker_processes setting, so you may need to raise that.
Note that this pool of workers may have to be shared with parallel query workers and possibly with other extensions.
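To check the relevant limits, and optionally cap a dedicated cron role separately, something along these lines could be used (the role names are assumptions):
SHOW max_connections;        -- ceiling for libpq-based pg_cron connections
SHOW max_worker_processes;   -- ceiling when pg_cron runs jobs as background workers
-- a dedicated role whose connections can be limited independently
CREATE ROLE cron_runner LOGIN CONNECTION LIMIT 4;
GRANT app_owner TO cron_runner;  -- hypothetical role it operates on behalf of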

Insert data into remote DB tables from multiple databases through trigger or replication or foreign data wrapper

I need some advice about the following scenario.
I have multiple embedded systems supporting PostgreSQL database running at different places and we have a server running on CentOS at our premises.
Each system runs at a remote location and has multiple tables inside its database. These tables have the same names as the corresponding tables on the server, but each system's table names differ from those of the other systems, e.g.:
system 1 has tables:
sys1_table1
sys1_table2
system 2 has tables:
sys2_table1
sys2_table2
I want to update the tables sys1_table1, sys1_table2, sys2_table1 and sys2_table2 on the server on every insert done on system 1 and system 2.
One solution is to write a trigger on each table which runs on every insert on both systems' tables and inserts the same data into the server's tables. This trigger would also delete the records from the systems after inserting the data into the server. The problem with this solution is that if the connection to the server cannot be established because of a network issue, the trigger will not execute or the insert will be lost. I have checked the following solution for this:
Trigger to insert rows in remote database after deletion
The second solution is to replicate the tables from system 1 and system 2 to the server's tables. The problem with replication is that if we delete data from the systems, it will also delete the records on the server. I could add another trigger on the server's tables which copies rows into a duplicate table, so the replicated table can become empty without affecting the data, but that would result in a very long list of tables if we have more than 200 systems.
The third solution is to create a foreign table using postgres_fdw or dblink and write the data into the server's tables that way, but won't this also affect the data on the server when we delete the data inside the system's table? And what will happen if there is no connectivity to the server?
The fourth solution is to write an application in Python inside each system which makes a connection to the server's database and writes the data in real time; if there is no connectivity to the server, it stores the data inside sys1_table1 or sys2_table2 or whichever table the data belongs to, and after reconnecting it sends that table's data to the server's tables.
Which option is best for this scenario? I like the trigger solution best, but is there any way to avoid data loss in case of a disconnection from the server?
I'd go with the fourth solution, or perhaps with the third, as long as it is triggered from outside the database. That way you can easily survive connection loss.
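If you go with the third option, a rough postgres_fdw setup on one of the embedded systems could look like this (host, credentials and the column list are assumptions, not taken from the question):
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER central_srv
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'server.example.com', port '5432', dbname 'central');
CREATE USER MAPPING FOR CURRENT_USER
  SERVER central_srv
  OPTIONS (user 'sync_user', password 'secret');
-- foreign table pointing at the server's sys1_table1
CREATE FOREIGN TABLE srv_sys1_table1 (
  id      integer,
  payload text
) SERVER central_srv OPTIONS (schema_name 'public', table_name 'sys1_table1');
-- an external sync job (not a trigger) can then move rows atomically:
WITH moved AS (
  DELETE FROM sys1_table1 RETURNING *
)
INSERT INTO srv_sys1_table1 SELECT * FROM moved;
If the connection drops, the whole statement fails and the local rows stay where they are, which is exactly the property you want when run from an external job that can simply retry later.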
The first solution with triggers has the problems you already detected. It is also a bad idea to start potentially long operations, like data replication across a network of uncertain quality, inside a database transaction. Long transactions mean long locks and inefficient autovacuum.
The second solution may actually also be an option if you have a recent PostgreSQL version that supports logical replication. You can use a publication WITH (publish = 'insert,update'), so that DELETE and TRUNCATE are not replicated. Replication can deal well with lost connectivity (for a while), but it is not an option if you want the data at the source to be deleted after it has been replicated.
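A rough sketch of that setup, with publication, subscription and connection details as placeholders (assuming PostgreSQL 10 or later on both sides):
-- on each embedded system (publisher)
CREATE PUBLICATION sys1_pub
  FOR TABLE sys1_table1, sys1_table2
  WITH (publish = 'insert,update');  -- deletes and truncates stay local
-- on the central server (subscriber)
CREATE SUBSCRIPTION sys1_sub
  CONNECTION 'host=sys1.example.com dbname=sysdb user=repl password=secret'
  PUBLICATION sys1_pub;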

Processing multiple concurrent read queries in Postgres

I am planning to use AWS RDS Postgres version 10.4 or above for storing data in a single table comprising ~15 columns.
My use case is to serve:
1. Periodically (every hour) store/update rows in this table.
2. Periodically (every hour) fetch data from the table, say 500 rows at a time.
3. Frequently fetch small amounts of data (10 rows) from the table (hundreds of queries in parallel).
Does AWS RDS Postgres support serving all of the above use cases?
I am aware of Read-Replicas support, but is there any in built load balancer to serve the queries that come in parallel?
How many read queries can Postgres process concurrently?
Thanks in advance
Your use cases seem to be a normal fit for all relational database systems. So I would say: yes.
The question is how fast the DB can handle the 100 parallel queries (use case 3).
In general, the PostgreSQL documentation is one of the best I have ever read. So give it a try:
https://www.postgresql.org/docs/10/parallel-query.html
But also take into consideration how big your data is!
That said, try w/o read replicas first! You might not need them.
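If you want to see what the instance allows and how busy it is at a given moment, a quick check could look like this:
SHOW max_connections;  -- hard ceiling on concurrent sessions (RDS derives the default from the instance class)
SELECT count(*) FILTER (WHERE state = 'active') AS running_queries,
       count(*)                                 AS open_connections
FROM pg_stat_activity;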

How to see changes in a postgresql database

My PostgreSQL database is updated each night.
At the end of each nightly update, I need to know what data changed.
The update process is complex, taking a couple of hours and requires dozens of scripts, so I don't know if that influences how I could see what data has changed.
The database is around 1 TB in size, so any method that requires starting a temporary database may be very slow.
The database is an AWS instance (RDS). I have automated backups enabled (these are different to RDS snapshots which are user initiated). Is it possible to see the difference between two RDS automated backups?
I do not know whether it is possible to see the difference between RDS snapshots. But in the past we tested several solutions for a similar problem; maybe you can take some inspiration from them.
The obvious solution is of course an auditing system. This way you can see in a relatively simple way what was changed, depending on the granularity of your auditing system down to column values. Of course there is an impact on your application due to the auditing triggers and the queries into the audit tables.
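A minimal sketch of such an audit trigger (table, function and trigger names are assumptions; a real auditing system would record more, e.g. the user):
CREATE TABLE audit_log (
  changed_at timestamptz NOT NULL DEFAULT now(),
  table_name text        NOT NULL,
  operation  text        NOT NULL,
  old_row    jsonb,
  new_row    jsonb
);
CREATE OR REPLACE FUNCTION audit_changes() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    INSERT INTO audit_log (table_name, operation, new_row)
    VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
  ELSIF TG_OP = 'DELETE' THEN
    INSERT INTO audit_log (table_name, operation, old_row)
    VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD));
  ELSE
    INSERT INTO audit_log (table_name, operation, old_row, new_row)
    VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD), to_jsonb(NEW));
  END IF;
  RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER my_table_audit
AFTER INSERT OR UPDATE OR DELETE ON my_table
FOR EACH ROW EXECUTE PROCEDURE audit_changes();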
Another possibility: for tables with primary keys you can store the values of the primary key and of the hidden system columns xmin and ctid (https://www.postgresql.org/docs/current/static/ddl-system-columns.html) for each row before the update, and compare them with the values after the update. But this way you can only identify changed / inserted / deleted rows, not changes in individual columns.
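A sketch of that approach for one table, tracking only xmin for brevity (table and snapshot names are assumptions; xmin is cast to text because the xid type lacks ordinary comparison operators):
-- before the nightly update
CREATE TABLE my_table_before AS
SELECT id, xmin::text AS xmin_before
FROM my_table;
-- after the update: new or modified rows
SELECT t.id
FROM my_table t
LEFT JOIN my_table_before b ON b.id = t.id
WHERE b.id IS NULL OR t.xmin::text <> b.xmin_before;
-- after the update: deleted rows
SELECT b.id
FROM my_table_before b
LEFT JOIN my_table t ON t.id = b.id
WHERE t.id IS NULL;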
You can create a streaming replica and set up replication slots (and, to be on the safe side, also WAL archiving). Then stop replication on the replica before the updates and compare the data after the updates using dblink selects. But these queries can be very heavy.

Set index statistics after insert in Firebird database

I would like to automate the process of setting index statistics in a Firebird database so that it doesn't require a database administrator to run the command, or a user to click a button.
Since the statistics only need to be recalculated after a large number of inserts or deletes, I am considering using an After Insert and After Delete trigger to keep track of how many inserts or deletes have taken place, and then run a procedure to set index statistics based on that value.
My question is whether there is anything to watch out for when setting the index statistics in this manner on a live database. To be clear, I am not rebuilding indexes, but recalculating index statistics only. It is quite possible that this would occur during a mass import or delete operation. Would calculating index statistics during a mass import or delete have the potential to cause any problems?
It is safe to recalculate index statistics on a live database while it is in use. It is also safe to do that in PSQL, e.g. in a stored procedure. For example, I'm running a scheduled batch job at night which executes a stored procedure recalculating statistics for all indexes.
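A rough PSQL sketch of such a procedure (the procedure name is an assumption):
SET TERM ^ ;
CREATE PROCEDURE RECALC_INDEX_STATS
AS
  DECLARE VARIABLE idx VARCHAR(63);
BEGIN
  /* loop over all user indexes and refresh their selectivity statistics */
  FOR SELECT TRIM(RDB$INDEX_NAME)
      FROM RDB$INDICES
      WHERE COALESCE(RDB$SYSTEM_FLAG, 0) = 0
      INTO :idx
  DO
    EXECUTE STATEMENT 'SET STATISTICS INDEX "' || idx || '"';
END^
SET TERM ; ^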
I'm not sure it is wise to do that in a trigger, because triggers in Firebird fire per row and not per statement, so you have to make sure the recalculation runs only from some kind of conditional branch in your PSQL body.