I can count the currently active INSERT queries on the PostgreSQL server like this:
SELECT count(*) FROM pg_stat_activity WHERE query LIKE 'INSERT%';
But is there a way to count all INSERT queries executed on the server in a given period of time? E.g. in the past minute?
I have a bunch of tables into which I send a lot of inserts, and I would like to aggregate how many rows I am inserting per minute. I could code a solution for this, but it would be much easier if I could extract this directly from the server.
Any stats like this over a given period of time would be very helpful: the average time a query takes to process, the bandwidth that goes through per minute, etc.
Note: I am using PostgreSQL 12
If not already done, install the pg_stat_statements extension and take periodic snapshots of the pg_stat_statements view: the diff between two snapshots gives the number of times each query was executed in that interval.
Note: it doesn't save each individual query; rather, it parameterizes them and saves the aggregated result.
See https://www.citusdata.com/blog/2019/02/08/the-most-useful-postgres-extension-pg-stat-statements/
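For example, a rough sketch of the snapshot approach (assuming pg_stat_statements is already listed in shared_preload_libraries; the snapshot table name is just an illustration):

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- snapshot the current counters
CREATE TABLE stmt_snapshot AS
SELECT now() AS taken_at, queryid, query, calls
FROM pg_stat_statements
WHERE query ILIKE 'INSERT%';

-- a minute later, diff the live counters against the snapshot
SELECT p.query, p.calls - s.calls AS inserts_since_snapshot
FROM pg_stat_statements p
JOIN stmt_snapshot s USING (queryid)
WHERE p.query ILIKE 'INSERT%';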
I believe you can use an audit trigger.
The audit creates a table that registers INSERT, UPDATE and DELETE actions, which you can adapt. Every time the database runs one of those commands, the audit table records the action, the table and the time of the action. It is then easy to do a COUNT() on that table with a WHERE clause restricted to the last minute.
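A minimal sketch of the idea (the table, function and trigger names here are only illustrative, not the ones created by a generic audit trigger):

CREATE TABLE insert_audit (
    table_name  text NOT NULL,
    action_time timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION log_insert() RETURNS trigger AS $$
BEGIN
    INSERT INTO insert_audit (table_name) VALUES (TG_TABLE_NAME);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER my_table_insert_audit
AFTER INSERT ON my_table
FOR EACH ROW EXECUTE FUNCTION log_insert();

-- inserts into my_table during the past minute
SELECT count(*) FROM insert_audit
WHERE table_name = 'my_table'
  AND action_time > now() - interval '1 minute';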
I couldn't find anything solid, so I created a table where I log the number of inserts using a script that runs as a cron job. It was simple enough to implement, and I get real values rather than estimations: I actually count all new rows inserted into the tables in a given interval.
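For comparison, a similar sketch can be built on the cumulative counters in pg_stat_user_tables (n_tup_ins only ever grows, so the per-interval figure is the difference between two snapshots; the snapshot table name is illustrative):

CREATE TABLE insert_snapshots (
    taken_at   timestamptz NOT NULL DEFAULT now(),
    table_name name NOT NULL,
    n_tup_ins  bigint NOT NULL
);

-- run from cron every minute
INSERT INTO insert_snapshots (table_name, n_tup_ins)
SELECT relname, n_tup_ins FROM pg_stat_user_tables;

-- rows inserted per table between consecutive snapshots
SELECT taken_at, table_name,
       n_tup_ins - lag(n_tup_ins) OVER (PARTITION BY table_name ORDER BY taken_at) AS rows_inserted
FROM insert_snapshots;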
Related
I have built an application with Spring Boot and JPA to migrate a Jira postgres database.
Basically, I have 5000 users that I need to migrate. Each user means 67 update queries in different tables.
Each query uses the LOWER function to compare ignoring case.
Some pseudo-code:
for (user : users) {
    for (query : queries) {
        jdbcTemplate.execute(query.replace(user....
I ignore any errors, so if a single query fails, I still go on and execute the other 66.
I am running this in 10 separate threads, and each user takes roughly 120 seconds to migrate. (20 threads resulted in a database deadlock.)
At this pace, it's going to take more than a day, which is not acceptable (I am running this in a test environment before doing it in production).
The queries look like this:
UPDATE table SET column = 'NEWUSERNAME' where LOWER(column) = LOWER('CURRENTUSERNAME');
Is there anything I can do to try and optimize this migration?
UPDATE:
I changed my approach. First, I select every element with the CURRENTUSERNAME and get its ID. Then I create the UPDATE queries using the ID in the WHERE clause.
Other than that, it is still taking a long time (4+ hours) to execute.
I am running millions of UPDATEs, one at a time. I know jdbcTemplate has a batch method, but if a single UPDATE fails, I believe it rolls back every successful update too. Also, I am not sure what performance improvement it would bring, if any.
So, to update the question: given that I have millions of UPDATE queries to run, what would be the best way to execute them? (Batching, multithreading, something else?)
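Not an authoritative answer, but a sketch of two things that often help with this pattern, assuming my_table and my_column stand in for the real names from the query above: an expression index so the LOWER(...) predicate can use an index instead of a sequential scan, and folding many single-row statements into one set-based UPDATE driven by a VALUES list:

-- let WHERE LOWER(my_column) = ... use an index
CREATE INDEX CONCURRENTLY idx_my_table_my_column_lower ON my_table (LOWER(my_column));

-- one set-based statement instead of thousands of single-row UPDATEs
UPDATE my_table t
SET    my_column = m.new_name
FROM   (VALUES ('currentusername1', 'NEWUSERNAME1'),
               ('currentusername2', 'NEWUSERNAME2')) AS m(old_name, new_name)
WHERE  LOWER(t.my_column) = LOWER(m.old_name);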
I have a cron job that fires a query every three minutes. With each firing, data is entered into my db, so my database keeps growing after each interval. However, I am only interested in the latest row. Is it safe? Will PostgreSQL truncate the oldest entries automatically?
If you post the gist of your cron job you could get a better answer.
If it's a direct insert, execute a TRUNCATE beforehand to delete the old, unwanted data (see the sketch below). DELETE is also possible, but you will end up with a lot of dead tuples and will need to vacuum the table regularly.
UPDATE is a good option, but it depends on how much of the data is static and how much is not. E.g. if you repeat values in any columns, then go for UPDATE. This is also subject to dead tuples and vacuuming.
If you are loading from an external source, e.g. CSV, JSON or XML, there are methods to overwrite existing data automatically. pgloader may be an option here.
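For illustration, two sketches of the options above (table and column names are hypothetical): a transactional TRUNCATE-then-INSERT that keeps only the newest batch, and a single-row upsert that overwrites one row in place (still subject to dead tuples and autovacuum, as noted).

-- variant 1: clear out the old data, then insert the fresh row(s)
BEGIN;
TRUNCATE my_readings;
INSERT INTO my_readings (reading_time, value) VALUES (now(), 42);
COMMIT;

-- variant 2: keep exactly one row and overwrite it (assumes a primary key on id)
INSERT INTO latest_reading (id, reading_time, value)
VALUES (1, now(), 42)
ON CONFLICT (id) DO UPDATE
SET reading_time = EXCLUDED.reading_time,
    value        = EXCLUDED.value;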
I have a PostgreSQL table with about 250K records. It gets updated a few times an hour; however, the entire table gets deleted and new records are added (a batch job).
I don't have much control over that process. While the transaction deletes and re-loads the data, queries on the table basically lock/hang until the job finishes. The job takes about a minute to run. We have real-time users looking at this data (which is spatial data on a map with a time slider), and they recognize the missing records very easily.
Is there anything that can be done about these 60-or-so-second query times during the update? I've thought about loading into a second table, dropping the original and renaming the second one to the original (sketch below), but that introduces more chance of error. Are there any settings that will just grab the data as-is and not necessarily look for a consistent view of it?
Basically just looking for ideas on how to handle this situation.
I'm running PostgreSQL 9.3.1.
Thanks
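No definitive answer, but a sketch of the load-and-rename idea mentioned above (names are hypothetical, and this ignores views or foreign keys that reference the table): readers keep hitting the old table while the slow load runs, and only the rename at the end needs a brief exclusive lock.

BEGIN;
CREATE TABLE spatial_data_new (LIKE spatial_data INCLUDING ALL);

-- the slow part: load the fresh batch into the new table
INSERT INTO spatial_data_new SELECT * FROM batch_source;

-- the swap itself is near-instant
ALTER TABLE spatial_data RENAME TO spatial_data_old;
ALTER TABLE spatial_data_new RENAME TO spatial_data;
DROP TABLE spatial_data_old;
COMMIT;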
Context:
Using PostgreSQL (9.6), for a custom synchronisation project, we have an agent that makes a lot of INSERTs between database_1 and database_2 when syncing data.
For example: DB2 is down for 5 minutes and 40,000 new rows appear in DB1, so when DB2 is up again, all 40,000 rows are immediately synced from DB1 to DB2.
All this works great.
Problem/Fact:
During the synchronisation, the INSERT rate is around 1000 lines / second.
However, when we run a simple SELECT count(*) FROM table during the sync (in the middle of these thousands of INSERTs), we notice that the INSERT rate falls down to a few dozen per second (instead of ~1000 per second).
Question:
Is there any reason why a SELECT operation (made inside pgAdmin, by a process other than the syncing process) would slow down the batch of INSERTs?
Any locking or internal reason that might explain this?
Or should I provide more information? How can I debug more?
Hints:
Logging is fully enabled, and the INSERTs always take around 0.700 ms (the same before and during the slowdown); it doesn't change.
INSERTs are currently performed one row at a time.
(I'll be happy to provide more information)
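One way to dig further (a debugging sketch, not a diagnosis): 9.6's pg_stat_activity already exposes wait events, so sampling it together with pg_locks while the rate drops should show whether the inserting backend is waiting on a lock, on I/O, or on something else.

-- sample repeatedly while the INSERT rate is low
SELECT pid, state, wait_event_type, wait_event, left(query, 60) AS query
FROM pg_stat_activity
WHERE state <> 'idle';

-- any lock requests that are not being granted?
SELECT locktype, relation::regclass, mode, granted, pid
FROM pg_locks
WHERE NOT granted;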
I am facing an issue that is possibly quite easy to solve; I am just new to advanced transaction settings.
Every 30 minutes I run an INSERT query that gets the latest data from a linked server into a table on my client's server, which we can call ImportTable. For this I have a simple job that looks like this:
BEGIN TRAN
DELETE FROM ImportTable
INSERT INTO ImportTable (columns)
SELECT (columns)
FROM QueryGettingResultsFromLinkedServer
COMMIT
The thing is, each time the job runs, ImportTable is locked for the duration of the query (2-5 minutes) and nobody can read the records. I want the table to be read-accessible all the time, with as little downtime as possible.
Now, I read that it is possible to allow SNAPSHOT ISOLATION in the database settings (it is set to FALSE at the moment), which could probably solve my problem, but I have never played with different transaction isolation levels, and as this is not my DB but my client's, I'd rather not alter any database settings unless I am sure it won't break something.
I know I could have an intermediary table that the records are inserted into first and then moved to the final table, and that is certainly a possible solution; I was just hoping for something more sophisticated, and to learn something new in the process.
PS: My client's server and database are fairly new and barely used, so I expect very little impact if I change some settings, but still, I cannot just randomly change various settings for learning purposes.
Many thanks!
Inserts won't normally block the table, unless the lock escalates to table level. In this case, you are deleting the whole table first and inserting the data again; why not insert only the changed data? For the reading query, read committed snapshot isolation (RCSI) or snapshot isolation will help you, but you will have the added impact of row versioning, which means SQL Server will store row versions of changed rows in tempdb.
Please see the MCM isolation videos by Kimberly Tripp for an in-depth understanding, and don't forget to test in a staging environment first.
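For reference, a sketch of the two database-level row-versioning options being referred to (MyClientDb is a placeholder; try it on a test database first, as suggested):

-- lets sessions explicitly request SNAPSHOT isolation
ALTER DATABASE MyClientDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- switches the default READ COMMITTED behaviour to row versioning (RCSI)
ALTER DATABASE MyClientDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;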
You are making this harder than it needs to be.
The problem is the 2-5 minutes of the linked-server query that you let be part of the transaction.
It is only a few thousand rows; that part takes just a few milliseconds.
If you need ImportTable to be available even during those few milliseconds, then use snapshot isolation.
DELETE FROM ImportTableStaging;

INSERT INTO ImportTableStaging (columns)
SELECT (columns)
FROM QueryGettingResultsFromLinkedServer;

BEGIN TRAN
    DELETE FROM ImportTable;

    -- the table hint goes right after the table name
    INSERT INTO ImportTable WITH (TABLOCK) (columns)
    SELECT (columns)
    FROM ImportTableStaging;
COMMIT
If you are worried about concurrent updates to ImportTableStaging, then use a #temp table instead.
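If going the #temp route, the staging step might look like this sketch (same placeholder names as above; the temp table is private to the session, so concurrent runs can't collide):

-- pull from the linked server into a session-local temp table
SELECT (columns)
INTO   #ImportTableStaging
FROM   QueryGettingResultsFromLinkedServer;

BEGIN TRAN
    DELETE FROM ImportTable;

    INSERT INTO ImportTable WITH (TABLOCK) (columns)
    SELECT (columns)
    FROM   #ImportTableStaging;
COMMIT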