I am new to PostgreSQL and trying to understand advisory locks. I have the following two scenarios:
With different databases in two different sessions (works in the expected manner):
Session 1: SELECT pg_advisory_lock(1); Successfully acquires the lock
Session 2 (note in different database): SELECT pg_advisory_lock(1); Successfully acquires the lock
With different schemas in the same database: when I do the same operation, the second session blocks.
It appears that advisory locks operate at the database level rather than per (database, schema) combination. Is my assumption correct, or is there anything I am missing?
In Postgres a schema is a namespace: more than just a prefix, but less than another database. In your case two, the second session does not "block", but rather waits, as per the docs:
If another session already holds a lock on the same resource
identifier, this function will wait until the resource becomes
available.
Regarding successful locking on different databases:
After you run SELECT pg_advisory_lock(1);, check pg_locks, column objid:
OID of the lock target within its system catalog, or null if the
target is not a general database object
So this lock is per database: you can use the same key 1 in many databases, and each database gets its own, independent lock (pg_locks distinguishes them by the database column).
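A quick way to see this, sketched with two hypothetical databases db_a and db_b (any schema inside one database behaves the same way):

-- Session 1, connected to db_a:
SELECT pg_advisory_lock(1);

-- Inspect the lock: the key shows up in classid/objid and the owning
-- database in the "database" column of pg_locks:
SELECT locktype, database, classid, objid, objsubid, granted
FROM pg_locks
WHERE locktype = 'advisory';

-- Session 2, connected to db_b: gets its own, independent lock immediately
SELECT pg_advisory_lock(1);

-- Session 3, connected to db_a again (any schema): this call waits until
-- session 1 runs SELECT pg_advisory_unlock(1);
SELECT pg_advisory_lock(1);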
I need some advice about the following scenario.
I have multiple embedded systems supporting PostgreSQL database running at different places and we have a server running on CentOS at our premises.
Each system is running at a remote location and has multiple tables inside its database. These tables have the same names as the server's tables, but each system's table names differ from the other systems', e.g.:
system 1 has tables:
sys1_table1
sys1_table2
system 2 has tables:
sys2_table1
sys2_table2
I want to update the tables sys1_table1, sys1_table2, sys2_table1 and sys2_table2 on the server on every insert done on system 1 and system 2.
One solution is to write a trigger on each table which will run on every insert into both systems' tables and insert the same data into the server's tables. This trigger will also delete the records from the systems after inserting the data into the server. The problem with this solution is that if the connection with the server is not established due to a network issue, then the trigger will not execute or the insert will be lost. I have checked the following solution for this:
Trigger to insert rows in remote database after deletion
The second solution is to replicate the tables from system 1 and system 2 to the server's tables. The problem with replication is that if we delete data from the systems, it will also delete the records on the server. I could add an alternative trigger on the server's tables which updates a duplicate table, so the replicated table can become empty without affecting the data, but that would make for a very long table list if we have more than 200 systems.
The third solution is to use a foreign table via postgres_fdw or dblink and update the data inside the server's tables, but won't this affect the data inside the server when we delete data inside the system's table? And what will happen if there is no connectivity to the server?
The fourth solution is to write an application in Python inside each system which makes a connection to the server's database and writes the data in real time; if there is no connectivity to the server, it stores the data inside sys1_table1 or sys2_table2 (or whichever table the data belongs to), and after reconnecting the code sends the table data to the server's tables.
Which option is best for this scenario? I like the trigger solution best, but is there any way to avoid data loss in case of disconnection from the server?
I'd go with the fourth solution, or perhaps with the third, as long as it is triggered from outside the database. That way you can easily survive connection loss.
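For the third solution driven from outside the database, a rough sketch could look like this (all names, hosts and credentials are invented, and it assumes a PostgreSQL version with writable postgres_fdw, i.e. 9.3 or later):

-- On the server: point a foreign table at system 1's table.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER sys1_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'sys1.example.com', dbname 'sysdb');
CREATE USER MAPPING FOR CURRENT_USER SERVER sys1_srv
    OPTIONS (user 'repl', password 'secret');
CREATE FOREIGN TABLE sys1_table1_remote (id integer, payload text)
    SERVER sys1_srv OPTIONS (table_name 'sys1_table1');

-- Run periodically by an external job (cron, a small script, ...): move new
-- rows to the server and delete them at the source in one statement. If the
-- connection is down, the statement simply fails and the rows stay on the
-- embedded system until the next run.
WITH moved AS (
    DELETE FROM sys1_table1_remote RETURNING id, payload
)
INSERT INTO sys1_table1 (id, payload)
SELECT id, payload FROM moved;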
The first solution with triggers has the problems you already detected. It is also a bad idea to start potentially long operations, like data replication across a network of uncertain quality, inside a database transaction. Long transactions mean long locks and inefficient autovacuum.
The second solution may actually also be an option if you have a recent PostgreSQL version that supports logical replication. You can use a publication WITH (publish = 'insert,update'), so that DELETE and TRUNCATE are not replicated. Replication can deal well with lost connectivity (for a while), but it is not an option if you want the data at the source to be deleted after it has been replicated.
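A minimal sketch of that publication setup, assuming PostgreSQL 10 or later on both sides (the publication, subscription and connection details are invented):

-- On the embedded system (publisher); the tables need a primary key
-- (replica identity) for UPDATEs to be replicated:
CREATE PUBLICATION sys1_pub
    FOR TABLE sys1_table1, sys1_table2
    WITH (publish = 'insert,update');   -- DELETE and TRUNCATE are not published

-- On the central server (subscriber):
CREATE SUBSCRIPTION sys1_sub
    CONNECTION 'host=sys1.example.com dbname=sysdb user=repl password=secret'
    PUBLICATION sys1_pub;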
From https://wiki.postgresql.org/wiki/Psycopg2_Tutorial
PostgreSQL can not drop databases within a transaction, it is an all
or nothing command. If you want to drop the database you would need to
change the isolation level of the database this is done using the
following.
conn.set_isolation_level(0)
You would place the above immediately preceding the DROP DATABASE
cursor execution.
Why "If you want to drop the database you would need to change the isolation level of the database"?
In particular, why do we need to change the isolation level to 0? (If I am correct, 0 means psycopg2.extensions.ISOLATION_LEVEL_READ_COMMITTED)
From https://stackoverflow.com/a/51859484/156458
The operation of destroying a database is implemented in a way which
prevents undoing it - therefore you can not run it from inside a
transaction because transactions are always undoable. Also keep in
mind that unlike most other databases PostgreSQL allows almost all DDL
statements (obviously not the DROP DATABASE one) to be executed inside
a transaction.
Actually you can not drop a database if anyone (including you) is
currently connected to this database - so it does not matter what is
your isolation level, you still have to connect to another database
(e.g. postgres)
"you can not run it from inside a transaction because transactions are always undoable". Then how can I drop a database not from inside a transaction?
I found my answer at https://stackoverflow.com/a/51880577/156458
I'm unfamiliar with psycopg2 so I can only provide steps to be performed.
Steps to be taken to perform DROP DATABASE from Python:
Connect to a different database, which you don't want to drop
Store current isolation level in a variable
Set isolation level to 0
Execute DROP DATABASE query
Set isolation level back to original (from #2)
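A minimal psycopg2 sketch of those steps (connection parameters and the database name first_db are placeholders; level 0 is psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT, i.e. no transaction is opened at all):

import psycopg2
from psycopg2 import extensions

# Step 1: connect to a database you are NOT about to drop
# (dbname/user/password are placeholders).
conn = psycopg2.connect(dbname="postgres", user="postgres", password="secret")

# Steps 2/3: switch to autocommit (level 0) so no transaction block is opened.
conn.set_isolation_level(extensions.ISOLATION_LEVEL_AUTOCOMMIT)

# Step 4: DROP DATABASE must run outside a transaction block.
cur = conn.cursor()
cur.execute("DROP DATABASE first_db")
cur.close()

# Step 5 (restoring the old level) only matters if you keep using this
# connection for other work; here we simply close it.
conn.close()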
Steps to be taken to perform DROP DATABASE from PSQL:
Connect to a different database, which you don't want to drop
Execute DROP DATABASE query
Code in psql
\c second_db
DROP DATABASE first_db;
Remember, that there can be no live connections to the database you are trying to drop.
I use connection pooling provided by JBoss 7 and I do a connection.close() after every action.
But when I create a temporary table, it does not survive for the current user, because the next action uses a new database session (due to connection.close() and pooling).
For example, I create a temp table in an action. The user changes page. The new action has to query the temp table, but it no longer exists.
So, I really don't know how to provide temporary tables with this kind of architecture.
Temporary tables in combination with connection pooling mean request scope for the temporary tables. What you'd like is session-scoped temporary tables.
I am not aware of an out-of-the box solution. So options are
Create normal tables with a prefix/suffix to associate these with a user
Use the same connection for a user throughout the whole session
Rewrite the application: Instead of temporary tables, read the data completely, and store it in the HTTP session. Possibly the database can paginate so there is no need to cache intermediate results etc.
Options 1 and 2 require cleanup, especially if the session is not shut down properly. Option 2 possibly requires many resources. So I would investigate the 3rd option, even if it seems to be the most cumbersome one.
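For illustration, option 1 could look roughly like this (the scratch_ naming scheme and the session id are invented for the example):

-- When the user's HTTP session starts (session id 'abc123' is made up):
CREATE TABLE scratch_abc123 (
    id      integer,
    payload text
);

-- Every request from that session, on whatever pooled connection it happens
-- to get, reads and writes the same table:
INSERT INTO scratch_abc123 (id, payload) VALUES (1, 'intermediate result');
SELECT * FROM scratch_abc123;

-- Cleanup when the HTTP session ends; a periodic job should also drop
-- leftover scratch_* tables from sessions that died without cleanup:
DROP TABLE IF EXISTS scratch_abc123;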
I have a PostgreSQL 9.2.2 database that serves orders to my ERP system. The database tables contain boolean columns indicating, among other things, whether a customer has been added. The code I use extracts the rows from the database and sends them to our ERP system one at a time (single-threaded). My code works perfectly in this regard; however, over the past year our volume has grown enough to require a multi-threaded solution.
I don't think the MVCC modes will work for me because the added_customer column is only updated once a customer has been successfully added. The default MVCC modes could cause the same row to be worked on at the same time resulting in duplicate web service calls. What I want to avoid is duplicate web service calls to our ERP system as they can be rather heavy, although admittedly I am not an expert on MVCC nor the other modes that PostgreSQL provides.
My question is: How can I be sure that a row, or series of rows returned in one select statement are excluded from other queries to the database in separate threads?
You will need to record the fact that the rows are being processed somehow. You will also need to deal with concurrent attempts to mark them as being processed, and handle failures when sending them to your ERP system.
You may find SELECT ... FOR UPDATE useful to get a set of rows and simultaneously lock them against updates. One approach might be for each thread to select a target row, try to add its ID to a "processing" table, then remove it in the same transaction in which you update added_customer.
If a thread fetches no candidate rows, or fails to insert, it just needs to sleep briefly and try again. If anything goes badly wrong, you should have rows left in the "processing" table that you can inspect and correct.
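A rough sketch of that flow for a single worker (the orders table, the id value 42 and the processing table's columns are assumptions, not your actual schema):

-- Claim one unprocessed row and lock it against concurrent updates:
BEGIN;
SELECT id
FROM orders
WHERE added_customer = false
LIMIT 1
FOR UPDATE;                      -- say this returns id = 42

-- Record that the row is being worked on; a UNIQUE constraint on order_id
-- makes a second worker's insert fail cleanly instead of duplicating work:
INSERT INTO processing (order_id, claimed_at) VALUES (42, now());
COMMIT;

-- ... call the ERP web service outside any open transaction ...

-- On success, mark the row done and drop the claim in one transaction:
BEGIN;
UPDATE orders SET added_customer = true WHERE id = 42;
DELETE FROM processing WHERE order_id = 42;
COMMIT;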
Of course the other option is to just grab a set of candidate rows and spawn a separate process/thread for each that communicates with the ERP. That keeps the database fetching single-threaded while allowing multiple channels to the ERP.
You can add a column user_is_processed to the table. It can hold the process ID of the backend that updates the record.
Then use a small serializable transaction to set user_is_processed, i.e. to "lock the row for processing".
Something like:
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE user_table
SET user_is_processed = pg_backend_pid()
WHERE <some condition>
AND user_is_processed IS NULL; -- no one is processing it now
COMMIT;
The key thing here - with SERIALIZABLE only one transaction can successfully update the record (all other concurrent SERIALIZABLE updates will fail with ERROR: could not serialize access due to concurrent update).
I am trying to use a foreign table to link 2 PostgreSQL databases.
Everything is fine and I can retrieve all the data I want.
The only issue is that the foreign data wrapper seems to lock tables on the foreign server, and that is very annoying when I unit test my code.
If I don't do any select request, I can initialize data and truncate both the tables in the local server and the tables in the remote server.
But if I execute one select statement, the truncate command on the remote server seems to be stuck in a deadlock-like state.
Do you know how I can avoid this lock?
Thanks
[edit]
I use this data wrapper to link 2 PostgreSQL databases: http://interdbconnect.sourceforge.net/pgsql_fdw/pgsql_fdw-en.html
I use table1 of db1 as a foreign table in db2.
When I execute a select query on foreign_table1 in db2, there is an AccessShareLock on table1 in db1.
The query is very simple: select * from foreign_table1
The lock is never released, so when I execute a truncate command at the end of my unit test there is a conflict, because the truncate adds an AccessExclusiveLock.
I don't know how to release the first AccessShareLock, but I think it should be done automatically by the wrapper...
Hope this helps
AccessExclusiveLock and AccessShareLock aren't generally obtained explicitly. They're obtained automatically by certain normal statements. See the locking documentation - the lock mode list says which statements acquire which locks:
ACCESS SHARE
Conflicts with the ACCESS EXCLUSIVE lock mode only.
The SELECT command acquires a lock of this mode on referenced tables.
In general, any query that only reads a table and does not modify it
will acquire this lock mode.
What this means is that your 1st transaction hasn't committed or rolled back (thus releasing its locks) yet, so the 2nd can't TRUNCATE the table because TRUNCATE requires ACCESS EXCLUSIVE which conflicts with ACCESS SHARE.
Make sure the 1st transaction commits or rolls back.
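In the unit test that means something like this (using the foreign_table1/table1 names from your edit); the FDW keeps its remote transaction, and therefore the remote ACCESS SHARE lock, open until the local transaction ends:

-- In db2 (the local database):
BEGIN;
SELECT * FROM foreign_table1;   -- takes ACCESS SHARE on table1 in db1
COMMIT;                         -- ends the FDW's remote transaction and releases the lock

-- Now, in db1, this no longer waits:
TRUNCATE table1;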
BTW, is the "foreign" database actually the local database, ie are you using pgsql_fdw as an alternative to dblink to simulate autonomous transactions?