I'm new to PostgreSQL and would like to know, or get some leads on, how to do the following:
Emit event (call an API) when a table is updated
My problem is: I have an SSO that inserts a row into an event table whenever a user does something (login, register, update info). I need to use these inserts in another solution (a loyalty program) in real time.
For now I have in mind to query the table every minute (in Node.js) and compare the size of the table with its size from the previous minute. I don't think that is a good way to do it :)
You can do that with a trigger in principle. If the API is external to the database, you'd need a trigger function written in C or a language like PL/Perl or PL/Python that can perform the action you need.
However, unless this action can be guaranteed to be fast, it may not be a good idea to run it in a trigger. The trigger runs in the same transaction as the triggering statement, so if your trigger happens to run for a long time, you end up with a long database transaction. This has two main disadvantages:
Locks are held for a long time, which harms concurrency and hence performance, and also increases the risk of deadlocks.
Autovacuum cannot remove dead rows that were still active when the transaction started, which can lead to excessive table bloat on busy tables.
To avoid that risk, it is often better to use a queuing system: The trigger creates an entry in the queue, which is a fast action, and worker processes read and process these queue entries asynchronously outside the database.
Implementing a queue in a database is notoriously difficult, so you may want to look for existing solutions.
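To make the trigger-plus-queue idea concrete, here is a minimal sketch, not a production queue. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server. The table, trigger and function names (sso_event, event_queue, enqueue_event, process_pending) are hypothetical. With several concurrent workers on PostgreSQL you would typically claim entries with SELECT ... FOR UPDATE SKIP LOCKED (available since 9.5), which this single-worker sketch does not show.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; all table and trigger names are hypothetical.
SCHEMA = """
CREATE TABLE sso_event (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    action  TEXT NOT NULL
);
CREATE TABLE event_queue (
    id       INTEGER PRIMARY KEY,
    event_id INTEGER NOT NULL REFERENCES sso_event(id)
);
-- The trigger only records that work is pending: a single cheap insert.
CREATE TRIGGER enqueue_event AFTER INSERT ON sso_event
BEGIN
    INSERT INTO event_queue (event_id) VALUES (NEW.id);
END;
"""


def process_pending(conn, handler, batch_size=100):
    """Worker step: read queued events, pass each to handler, then dequeue it."""
    rows = conn.execute(
        "SELECT q.id, e.user_id, e.action "
        "FROM event_queue q JOIN sso_event e ON e.id = q.event_id "
        "ORDER BY q.id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for queue_id, user_id, action in rows:
        handler(user_id, action)  # the slow API call happens here, not in the trigger
        conn.execute("DELETE FROM event_queue WHERE id = ?", (queue_id,))
        conn.commit()  # commit per entry so finished work is not redone after a crash
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO sso_event (user_id, action) VALUES (1, 'login')")
conn.execute("INSERT INTO sso_event (user_id, action) VALUES (2, 'register')")
conn.commit()

delivered = []
processed = process_pending(conn, lambda user_id, action: delivered.append((user_id, action)))
remaining = conn.execute("SELECT count(*) FROM event_queue").fetchone()[0]
```

The inserting transaction pays only for one extra row in event_queue; the handler (the call to the loyalty program's API in the question) runs in the worker, outside the transaction that inserted the event.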
I have a big table (bo_sip_cti_event) which is too large to even run queries on, so I made a copy of it (bo_sip_cti_event_day) and added an AFTER INSERT trigger on bo_sip_cti_event that adds the same values to bo_sip_cti_event_day. Now I am wondering whether I have significantly slowed down inserts into bo_sip_cti_event.
So generally, does an AFTER INSERT trigger slow down operations on the table?
Yes, the trigger must slow down inserts.
The reason is that relational databases are ACID compliant: All actions, including side-effects like triggers, must be completed before the update transaction completes. So triggers must be executed synchronously, and that consumes CPU, and in your case I/O too, which ultimately takes more time. There's no getting around it.
The answer is yes: it is additional overhead, so obviously it takes time to finish the transaction with the additional trigger execution.
Your design makes me wonder if:
You explored all options to speed up your large table. Even billions of rows can be handled quite well if you have proper indexes etc. But it all depends on the table, the design, the data and the queries.
What exactly your trigger is doing. The table name "_day" raises the question of when, where and how exactly this table is cleaned out at midnight. Hopefully not inside the trigger function, and hopefully not with a "DELETE FROM".
I have about 30 million records to delete from a table, and deleting even 10,000 is taking 30 minutes. I'm concerned about issuing the delete command for all 30 million records at once, so I'd like to do the delete in batches.
So my approach was to do a loop deleting a batch, then commit, then loop to delete the next batch. But that generates the following error:
ERROR: 0A000: cannot begin/end transactions in PL/pgSQL
HINT: Use a BEGIN block with an EXCEPTION clause instead.
LOCATION: exec_stmt_raise, pl_exec.c:3216
This is the code I wrote:
DO $$
BEGIN
    FOR i in 1..30000 loop
        DELETE FROM my_table
        WHERE id IN (
            SELECT id
            FROM my_table
            WHERE should_delete = true
            LIMIT 1000
        );
        RAISE NOTICE 'Done with batch %', i;
        COMMIT;
    END LOOP;
END
$$;
What is an alternative way to achieve this?
A few things jump out at me:
1. Transaction overhead in PostgreSQL is significant. It's possible you would see a substantial performance improvement if you just did this all in one big transaction.
2. It sounds as if you are experimenting on a live production database. Don't do that. Set up a test instance with statistically similar (or identical) data and a similar workload, and use that. That way, if you manage to break something, you'll only hurt the test instance.
3. Once you have a test instance via (2), you can use it to test (1) and see how long it takes. Then you won't have to guess at which approach is superior.
4. You mention that other jobs run at fixed times. So this is not a database backing (something like) a web server exposed to the internet. It is a batch system which runs scheduled jobs at predetermined times. Since you already have a scheduling system, use it to schedule your deletes at times when the system is not in use (or when it is unlikely to be in use).
5. If you decide that you must use multiple transactions, use something other than PL/pgSQL to do the actual looping. You could, for instance, use a shell script or another programming language like Python or Java. Anything with Postgres bindings will do.
6. The really drastic approach is to replicate the whole database, perform the deletes on the replica, and then replace the original with the replica. The swap may require putting the database into read-only mode for a short time to avoid write inconsistencies and to allow the replica to converge. Since yours is a batch system, that might not matter. Obviously this is the most resource-intensive approach since it requires a whole extra database.
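As an illustration of driving the loop from a client language with one commit per batch, here is a rough sketch in Python. It uses the built-in sqlite3 module as a stand-in so it runs without a PostgreSQL server. The helper name delete_in_batches is hypothetical; my_table and should_delete come from the question. Against PostgreSQL you would use a driver such as psycopg2, write the condition as should_delete = true and use the driver's own placeholder style.

```python
import sqlite3


def delete_in_batches(conn, batch_size=1000):
    """Delete flagged rows batch by batch, committing after each batch.

    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM my_table WHERE id IN ("
            "  SELECT id FROM my_table WHERE should_delete = 1 LIMIT ?"
            ")",
            (batch_size,),
        )
        conn.commit()  # each batch is its own short transaction
        if cur.rowcount == 0:
            break  # nothing left to delete
        total += cur.rowcount
    return total


# Demo data: 5000 rows, every even id is flagged for deletion.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, should_delete INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO my_table (id, should_delete) VALUES (?, ?)",
    [(i, 1 if i % 2 == 0 else 0) for i in range(1, 5001)],
)
conn.commit()

deleted = delete_in_batches(conn, batch_size=1000)
left = conn.execute("SELECT count(*) FROM my_table").fetchone()[0]
```

The loop stops when a batch deletes nothing, so there is no need to guess the number of iterations up front as the fixed 1..30000 range in the question does.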
We have multiple processes which read one database table, get an available record and work with it. It works fine.
When there is no record in this table, each process waits 5 seconds and reads it again.
So a record can sit idle in the table for up to 5 seconds, which is not good.
What would be the recommended solution to eliminate this waiting and proceed immediately when a record is created? One solution could be a trigger which does something when a record is created. But this solution requires knowledge of the worker processes in order to deliver the record to one of the idle processes.
It looks like the ideal solution would be for each process to start a blocking read via SQL from something, so that when a record is created, one of the waiting processes gets that record and the others continue to wait.
Does Oracle 10 provide such or similar mechanism?
Look at Database Change Notification in 10g, which has since been renamed Continuous Query Notification.
I normally like to include an example, but it's hard to find a 10g instance these days, and even a short example requires a lot of code. The process looks complicated, so you might be better off using triggers as you suggested and dealing with the tight coupling.
I have 2 stored procedures that interact with the same data tables.
The first executes for several hours and the second one is instant.
So if I run the first one and then the second one (on a second connection), the second procedure waits for the first one to end.
It is harmless for my data if both run at the same time. How can I do that?
The fact that the shorter query is blocked while being on a second connection suggests that the longer query is getting an exclusive lock on the table during the query.
That suggests it is doing writes, since if they were both reads there shouldn't be any locking issues. pgAdmin can show what locks are active during the longer query and also whether the shorter query is indeed blocked on the longer one.
If the longer query is indeed doing writes, it's possible that you may be able to reduce the lock contention -- by chunking it, for example, which could allow readers in between chunked updates/inserts -- but if it's an operation that requires an exclusive write lock, then it will block everybody until it's done.
It's also possible that you may be able to optimize the query so that it needs only a lower-level lock that isn't exclusive, but that would all depend on the specifics of what the query is doing and your data.
I have a table called deposits
When a deposit is made, the table is locked, so the query looks something like:
SELECT * FROM deposits WHERE id=123 FOR UPDATE
I assume FOR UPDATE is locking the table so that we can manipulate it without another thread stomping on the data.
The problem occurs though, when other deposits are trying to get the lock for the table. What happens is, somewhere in between locking the table and calling psql_commit() something is failing and keeping the lock for a stupidly long amount of time. There are a couple of things I need help addressing:
Subsequent queries trying to get the lock should fail, I have tried achieving this with NOWAIT but would prefer a timeout method (because it may be ok to wait, just not wait for a 'stupid amount of time')
Ideally I would head this off at the pass, and have my initial query only hold the lock for a certain amount of time, is this possible with postgresql?
Is there some other magic function I can tack onto the query (similar to NOWAIT) which will only wait for the lock for 4 seconds before failing?
Due to the painfully monolithic spaghetti code nature of the code base, it's not simply a matter of changing global configs, it kinda needs to be a per-query based solution
Thanks for your help guys, I will keep poking around but I haven't had much luck. Is this a non-existing function of psql, because I found this: http://www.postgresql.org/message-id/40286F1F.8050703#optusnet.com.au
I assume FOR UPDATE is locking the table so that we can manipulate it without another thread stomping on the data.
Nope. FOR UPDATE locks only those rows, so that another transaction that attempts to lock them (with FOR SHARE, FOR UPDATE, UPDATE or DELETE) blocks until your transaction commits or rolls back.
If you want a whole table lock that blocks inserts/updates/deletes you probably want LOCK TABLE ... IN EXCLUSIVE MODE.
Subsequent queries trying to get the lock should fail, I have tried achieving this with NOWAIT but would prefer a timeout method (because it may be ok to wait, just not wait for a 'stupid amount of time')
See the lock_timeout setting. This was added in 9.3 and is not available in older versions.
Crude approximations for older versions can be achieved with statement_timeout, but that can lead to statements being cancelled unnecessarily. If statement_timeout is 1s and a statement waits 950ms on a lock, it might then get the lock and proceed, only to be immediately cancelled by a timeout. Not what you want.
There's no query-level way to set lock_timeout, but you can and should just:
SET LOCAL lock_timeout = '1s';
after you BEGIN a transaction.
Ideally I would head this off at the pass, and have my initial query only hold the lock for a certain amount of time, is this possible with postgresql?
There is a statement timeout, but locks are held at transaction level. Older releases have no transaction timeout feature (a transaction_timeout setting only arrived in PostgreSQL 17).
If you're running single-statement transactions you can just set a statement_timeout before running the statement to limit how long it can run for. This isn't quite the same thing as limiting how long it can hold a lock, though, because it might wait 900ms of an allowed 1s for the lock, only actually hold the lock for 100ms, then get cancelled by the timeout.
Is there some other magic function I can tack onto the query (similar to NOWAIT) which will only wait for the lock for 4 seconds before failing?
No. You must:
BEGIN;
SET LOCAL lock_timeout = '4s';
SELECT ....;
COMMIT;
Due to the painfully monolithic spaghetti code nature of the code base, it's not simply a matter of changing global configs, it kinda needs to be a per-query based solution
SET LOCAL is suitable, and preferred, for this.
There's no way to do it in the text of the query, it must be a separate statement.
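In a large code base, one way to apply this per call site is a small wrapper that issues SET LOCAL as the first statement of each transaction. The following is only a sketch: the helper transaction_with_lock_timeout is hypothetical, and it assumes a DB-API connection (psycopg2-style) that opens a transaction implicitly on the first execute(), so no explicit BEGIN is sent. The RecordingConnection stub exists only so the example runs without a database; it records statements instead of executing them.

```python
from contextlib import contextmanager


@contextmanager
def transaction_with_lock_timeout(conn, timeout_ms=4000):
    """Run the body in one transaction whose lock waits are capped at timeout_ms."""
    cur = conn.cursor()
    try:
        # Format a validated integer rather than relying on bind parameters for SET;
        # a lock_timeout value without a unit is interpreted as milliseconds.
        cur.execute("SET LOCAL lock_timeout = %d" % int(timeout_ms))
        yield cur
        conn.commit()  # ends the transaction, and with it the SET LOCAL scope
    except Exception:
        conn.rollback()  # for example when a lock wait exceeded the timeout
        raise


class RecordingCursor:
    """Stub cursor that records SQL text instead of running it."""

    def __init__(self, log):
        self.log = log

    def execute(self, sql, params=()):
        self.log.append(sql)


class RecordingConnection:
    """Stub connection standing in for a real PostgreSQL connection."""

    def __init__(self):
        self.log = []

    def cursor(self):
        return RecordingCursor(self.log)

    def commit(self):
        self.log.append("COMMIT")

    def rollback(self):
        self.log.append("ROLLBACK")


conn = RecordingConnection()
with transaction_with_lock_timeout(conn, timeout_ms=4000) as cur:
    cur.execute("SELECT * FROM deposits WHERE id = %s FOR UPDATE", (123,))
    # ... work on the locked row would go here ...
statements = conn.log
```

Because the setting is made with SET LOCAL, it reverts at commit or rollback and does not leak into other code that reuses the same connection.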
The mailing list post you linked to is a proposal for an imaginary syntax that was never implemented (at least in a public PostgreSQL release) and does not exist.
In a situation like this you may want to consider "optimistic concurrency control", often called "optimistic locking". It gives you greater control over locking behaviour at the cost of increased rates of query repetition and the need for more application logic.
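A common way to implement optimistic concurrency control is a version column: read the row without locking it, then make the UPDATE conditional on the version still being the one you read, and retry if no row was updated. The sketch below assumes hypothetical balance and version columns on deposits and a hypothetical add_to_balance helper. It uses Python's built-in sqlite3 as a stand-in so it runs without a server; the same SQL pattern applies to PostgreSQL.

```python
import sqlite3


class ConcurrentUpdateError(Exception):
    """Raised when every retry lost the race against another writer."""


def add_to_balance(conn, deposit_id, amount, max_retries=3):
    """Add amount to a deposit without holding a row lock between read and write."""
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT balance, version FROM deposits WHERE id = ?", (deposit_id,)
        ).fetchone()
        if row is None:
            raise KeyError(deposit_id)
        balance, version = row
        # The write only succeeds if nobody changed the row since we read it.
        cur = conn.execute(
            "UPDATE deposits SET balance = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (balance + amount, deposit_id, version),
        )
        conn.commit()
        if cur.rowcount == 1:
            return balance + amount
        # rowcount == 0: someone else updated the row first, so re-read and retry.
    raise ConcurrentUpdateError(deposit_id)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deposits ("
    " id INTEGER PRIMARY KEY, balance INTEGER NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO deposits (id, balance, version) VALUES (123, 100, 0)")
conn.commit()

new_balance = add_to_balance(conn, 123, 50)

# A writer still holding the old version (0) now matches no row at all.
stale_rows = conn.execute(
    "UPDATE deposits SET balance = 0, version = version + 1 "
    "WHERE id = 123 AND version = 0"
).rowcount
conn.commit()
final = conn.execute("SELECT balance, version FROM deposits WHERE id = 123").fetchone()
```

No lock is held while the application works on the value it read, so a stalled client cannot block other deposits; the cost is the retry loop and the extra application logic mentioned above.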