the difference between transaction and prepare transaction in postgresql - postgresql

In my perspective of view, the prepared transaction and normal transaction can both be used for distributed transaction, so what's the difference between them?

Related

Distributed transaction on PostgreSQL

Can someone tell me about how I can perform a distributed transaction on PostgreSQL?
I need to start transaction from a node x to node y (this node has a database).
But I don't find information on internet on how I can do it.
All I can do is a distributed query with:
select dblink_connect
('conn','dbname=ConsultaRemota host=192.168.3.9
user=remoto password=12345 port=5432');
select * from dblink('conn','select * from tablaremota') as
temp (id_remoto int, nombre_remoto text, descripcion text);
Using dblink is no true distributed transaction, because it is possible that the remote transaction succeeds, while the local transaction fails.
To perform a distributed transaction:
Create a normal transaction with BEGIN or START TRANSACTION on both databases.
Perform work on both databases.
Once you are done, prepare the transaction on both databases:
PREPARE TRANSACTION 'some_name';
This step will perform everything that could potentially fail during COMMIT and persist the transaction, but it will not yet commit it.
If that step fails somewhere, use ROLLBACK or ROLLBACK PREPARED to abort the transaction on all databases.
Commit the transaction on all databases:
COMMIT PREPARED 'some_name';
This is guaranteed to succeed.
To reliably perform a distributed transaction, you need a transaction manager: that is a piece of software that keeps track of all distributed transactions. This component has to persist its information, so that it can survive a crash. The job of the transaction manager is to commit or rollback any transaction that was left in an incomplete state after a crash.
This is necessary, because prepared transactions will stay around even if you restart the database, and they will hold locks and block VACUUM progress. Such orphaned prepared transactions can break your database.
Never use distributed transactions without a transaction manager!

XA or non XA in JEE

I have question about this paragraph
"Initially, all transactions are local. If a non-XA data source connection is the first resource connection enlisted in a transaction scope, it will become a global transaction when a (second) XA data source connection joins it. If a second non-XA data source connection attempts to join, an exception is thrown." -> link https://docs.oracle.com/cd/E19229-01/819-1644/detrans.html (Global and Local TRansaction).
Can I have the first connection non XA and the second XA? So the first become xa without any Exception thrown? (I'm in doubt)
Can I have fist transaction marked xa, second marked xa and third non xa? (I suppose no)
what happens if the first ejb trans-type=required use XA on db and call a remote EJB trans-type=required(deployed in another app server) with a db non-xa? Could I have in this moment two distinct transaction so that xa is not the right choice? What happens if two ejb are in the same server but in two distinct ear?
"In scenarios where there is only a single one-phase commit resource provider that participates in the transaction and where all the two-phase commit resource-providers that participate in the transaction are used in a read-only fashion. In this case, the two-phase commit resources all vote read-only during the prepare phase of two-phase commit. Because the one-phase commit resource provider is the only provider to complete any updates, the one-phase commit resource does not have to be prepared."
https://www.ibm.com/support/knowledgecenter/SSEQTP_8.5.5/com.ibm.websphere.base.doc/ae/cjta_trans.html
What mean for readonly ? So we can mix xa updates with readonly non xa?
Some of these should really be split out into separate questions. I can answer the first couple of questions.
Can I have the first connection non XA and the second XA?
Yes, if you are willing to use Last Participant Support
So the first become xa without any Exception thrown?
No, the transaction manager cannot convert a non-xa capable connection into one that is xa capable. A normal non-xa commit or rollback will be performed on the connection, but it still participates in the transaction alongside the XA resources. I'll discuss how this is done further down in summarizing the Last Participant Support optimization.
Can I have fist transaction marked xa, second marked xa and third non xa?
I assume you meant to say first connection marked xa, and so forth. Yes, you can do this relying on Last Participant Support
What mean for readonly ?
read-only refers to usage of the transactional resource in a way that does not modify any data. For example, you might run a query that locks a row in a database and reads data from it, but does not perform any updates on it.
So we can mix xa updates with readonly non xa?
You have this in reverse. The document that you cited indicates that the XA resources can be read only and the non-xa resource can make updates. This works because the XA resources have a spec-defined way of indicating to the transaction manager that they did not modify any data (by voting XA_RDONLY in their response to the xa.prepare request). Because they haven't written any data, they only need to release their locks, so the commit of the overall transaction just reduces to non-xa commit/rollback of the one-phase resource and then either resolution of the xa-capable resources (commit or rollback) would have the same effect.
Last Participant Support
Last Participant Support, mentioned earlier, is a feature of the application server that simulates the participation of a non-xa resource as part of a transaction alongside one or more xa-capable resources. There are some risks involved in relying on this optimization, namely a timing window where the transaction can be left in-doubt, requiring manual intervention to resolve it.
Here is how it works:
You operate on all of the enlisted resources (xa and non-xa) as you normally would, and when you are ready, you invoke the userTransaction.commit operation (or rely on container managed transactions to issue the commit for you). When the transaction manager receives the request to commit, it sees that there is a non-xa resource involved and orders the prepare/commit operations to the backend in a special way. First, it tells all of the xa-capable resources to do xa.prepare, and receives the vote from each of them. If all indicate that they have successfully prepared and would be able to commit, then the transaction manager proceeds to issue a commit to the non-xa resource. If the commit of the non-xa resource succeeds, then the transaction manager commits all of the xa-capable resources. Even if the system goes down at this point, it is written in the recovery log that these resources must commit, and the transaction manager will later find them during a recovery attempt and commit them, with their corresponding records in the back end being locked until that happens. If the commit of the non-xa resource fails, then the transaction manager would instead proceed to roll back all of the xa-capable resources. The risk here comes from the possibility that the request to commit the non-xa capable resources might not return at all, leaving the transaction manager no way of knowing whether that resource has committed or rolled back, and thus no way knowing whether to commit or roll back the xa-capable resources, leaving the transaction in-doubt and in need of manual intervention to properly recover. Only enable/rely upon Last Participant Support if you are okay with accepting this risk.

maxTransactionLockRequestTimeoutMillis with concurrent transactions

I'm trying to get a better understanding of the lock acquisition behavior on MongoDB transactions. I have a scenario where two concurrent transactions try to modify the same document. Since one transaction will get the write lock on the document first, the second transaction will run into a write conflict and fail.
I stumbled upon the maxTransactionLockRequestTimeoutMillis setting as documented here: https://docs.mongodb.com/manual/reference/parameters/#param.maxTransactionLockRequestTimeoutMillis and it states:
The maximum amount of time in milliseconds that multi-document transactions should wait to acquire locks required by the operations in the transaction.
However, changing this value does not seem to have an impact on the observed behavior with a write conflict. Transaction 2 does not seem to wait for the lock to be released again but immediately runs into a write conflict when another transaction holds the lock (other than concurrent writes outside a transaction which will block and wait for the lock).
Do I understand correctly that the configured time in maxTransactionLockRequestTimeoutMillis does not include the act of actually receiving the write lock on the document or is there something wrong with my tests?

Purpose of "initial" state in MongoDB two phase commits

Referencing this article about performing two-phase commits with MongoDB:
What is the purpose of the "initial" state of the transaction? Why wouldn't you just insert your transaction document with a "pending" state and save a round trip to the database?
If there are more than one applications running the same transaction, as explained later in the article, "initial" state basically means that logical lock is free, so findAndModify() could update the document to get that lock. Thus only one application is allowed to run the transaction.
There is actually a good reason for the initial state: The 2PC protocol defines a state machine with an 'initial' state, which transitions to 'pending'. The transaction should only be in 'pending' state once all participants are in pending state. In addition the transition to 'commit' should only be started when the transaction is in 'pending' state. Finally the transaction should only be marked as 'committed' once all participants have successfully committed.
If you started the transition to commit without fully reaching the pending state, you risk breaking the atomicity and isolation guarantees.
In the article referenced the transaction state is changed to pending before setting the participants to pending (inserting the records). The transaction state is set to committed before committing the participants. And there is an extra state called 'done', this might lead to bugs in the coordinator implementation. I suggest you take the article with a grain of salt.

How does TransactionScope guarantee data integrity across multiple databases?

Can someone tell me the principle of how TransactionScope guarantees data integrity across multiple databases? I imagine it first sends the commands to the databases and then waits for the databases to respond before sending them a message to apply the command sent earlier. However when execution is stopped abruptly when sending those apply messages we could still end up with a database that has applied the command and one that has not. Can anyone shed some light on this?
Edit:
I guess what Im asking is can I rely on TransactionScope to guarantee data integrity when writing to multiple databases in case of a power outage or a sudden shutdown.
Thanks, Bas
Example:
using(var scope=new TransactionScope())
{
using (var context = new FirstEntities())
{
context.AddToSomethingSet(new Something());
context.SaveChanges();
}
using (var context = new SecondEntities())
{
context.AddToSomethingElseSet(new SomethingElse());
context.SaveChanges();
}
scope.Complete();
}
It promotes it to the Distributed Transaction Coordinator (msdtc) if it detects multiple databases which use each scope as a part of a 2-phase commit. Each scope votes to commit and hence we get the ACID properties but distributed accross databases. It can also be integrated with TxF, TxR. You should be able to use it the way you describe.
The two databases are consistent as the distributed COM+ transaction running under MTC have attached to them, database transactions.
If one database votes to commit (e.g. by doing a (:TransactionScope).Commit()), "it" tells the DTC that it votes to commit. When all databases have done this they have a change-list. As far as the database transactions don't deadlock or conflict with other transactions now (e.g. by means of a fairness algorithm that pre-empts one transaction) all operations for each database are in the transaction log. If the system loses power when not yet commit for one database has finished but it has for another, it has been recorded in the transaction log that all resources have voted to commit, so there is no logical implication that the commit should fail. Hence, next time the database that wasn't able to commit boots up, it will finish those transactions left in this indeterminate state.
With distributed transactions it can in fact happen that the databases become inconsistent. You said:
At some point both databases have to
be told to apply their changes. Lets
say there is a power outage after
telling the first db to apply, then
the databases are out of sync. Or am I
missing something?
You aren't. I think this is known as the generals problem. It can provably not be prevented. The windows of failure is however quite small.