What happens when rollback fails in the two-phase commit protocol - REST

I am trying to implement transactions for distributed services in Java over REST. I have some questions:
What happens when the resources reply affirmatively but then fail to commit in phase 2?
I searched but could not find a proper answer to what happens when rollback fails in the 2PC protocol. I know that it's a blocking protocol and that it can wait for a response indefinitely, but what happens in a real-world scenario?
What other protocols exist for distributed transaction management?
I read about JTA for implementing transactions, but are there other implementations that can be used?
Any reply will be helpful. Thanks in advance.

I don't have answers to these questions, but I created a specific method for my specific case. I'm posting it here in case someone needs transactions for a similar case.
In my case there are no changes to existing entries in the database (or in the indexer, which also runs as a service); there are only new entries added in different places. So false failures were not harmful, but false successes were. For my particular case I followed this strategy:
i. Every resource stores a transaction id with each row in its database. In the first phase, when the coordinator asks the resources, each resource inserts its entries into the database together with the transaction id generated by the coordinator.
ii. After phase 1, when all resources have replied affirmatively (meaning they have made their changes to the database), the coordinator records in its own log that the transaction is successful and conveys this to the resources. Each resource then marks the transaction status as successful in the rows it inserted.
iii. A service runs continuously, scanning the database and correcting the transaction status by asking the coordinator for it. If the coordinator has no entry or a failure entry, it returns a failure status, and the same is updated on the service. When fetching data, if an entry in the database has a failure label, the service always checks the transaction status with the coordinator; if the coordinator has no success entry, the entry is filtered out of the results. Hence entries for which there is no information or only failure information are never returned, so the outcome is always consistent.
This strategy provides atomicity for my case.
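The strategy in steps i to iii can be sketched in a few lines. This is only an in-memory illustration under assumptions: the class and function names (Coordinator, Resource, run_transaction) are hypothetical, the "database" is a Python list, and a missing coordinator entry is treated as a failure, as described above.

```python
class Coordinator:
    """Keeps the authoritative outcome of each transaction id."""

    def __init__(self):
        self.log = {}  # tx_id -> "success"

    def status(self, tx_id):
        # No log entry is treated the same as a failure entry.
        return self.log.get(tx_id, "failure")


class Resource:
    """A service that only ever inserts new rows, each tagged with a tx id."""

    def __init__(self):
        self.rows = []

    def prepare(self, tx_id, data):
        # Phase 1: insert the row right away, not yet confirmed.
        self.rows.append({"tx_id": tx_id, "data": data, "status": "pending"})
        return True

    def confirm(self, tx_id):
        # Phase 2: mark the rows of this transaction as successful.
        for row in self.rows:
            if row["tx_id"] == tx_id:
                row["status"] = "success"

    def read(self, coordinator):
        # Rows not marked successful are re-checked with the coordinator
        # and filtered out unless the coordinator logged a success.
        visible = []
        for row in self.rows:
            if row["status"] != "success":
                row["status"] = coordinator.status(row["tx_id"])
            if row["status"] == "success":
                visible.append(row["data"])
        return visible


def run_transaction(coordinator, tx_id, writes, lost_confirms=()):
    """writes: list of (resource, data); lost_confirms simulates lost phase-2 messages."""
    if all(resource.prepare(tx_id, data) for resource, data in writes):
        coordinator.log[tx_id] = "success"
        for resource, _ in writes:
            if not any(resource is lost for lost in lost_confirms):
                resource.confirm(tx_id)
```

A row whose confirmation message was lost is still returned on read, because the coordinator's log says the transaction succeeded; a row from a transaction the coordinator never logged stays hidden.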

Related

XA or non XA in JEE

I have a question about this paragraph:
"Initially, all transactions are local. If a non-XA data source connection is the first resource connection enlisted in a transaction scope, it will become a global transaction when a (second) XA data source connection joins it. If a second non-XA data source connection attempts to join, an exception is thrown." -> link https://docs.oracle.com/cd/E19229-01/819-1644/detrans.html (Global and Local Transaction).
Can I have the first connection non XA and the second XA? So the first becomes XA without any exception being thrown? (I'm in doubt)
Can I have the first transaction marked XA, the second marked XA and the third non XA? (I suppose not)
What happens if the first EJB (trans-type=required) uses XA on its db and calls a remote EJB (trans-type=required, deployed in another app server) with a non-XA db? Could I have two distinct transactions at that moment, so that XA is not the right choice? What happens if the two EJBs are in the same server but in two distinct EARs?
"In scenarios where there is only a single one-phase commit resource provider that participates in the transaction and where all the two-phase commit resource-providers that participate in the transaction are used in a read-only fashion. In this case, the two-phase commit resources all vote read-only during the prepare phase of two-phase commit. Because the one-phase commit resource provider is the only provider to complete any updates, the one-phase commit resource does not have to be prepared."
https://www.ibm.com/support/knowledgecenter/SSEQTP_8.5.5/com.ibm.websphere.base.doc/ae/cjta_trans.html
What does read-only mean here? So can we mix XA updates with read-only non XA?
Some of these should really be split out into separate questions. I can answer the first couple of questions.
Can I have the first connection non XA and the second XA?
Yes, if you are willing to use Last Participant Support.
So the first become xa without any Exception thrown?
No, the transaction manager cannot convert a non-xa capable connection into one that is xa capable. A normal non-xa commit or rollback will be performed on the connection, but it still participates in the transaction alongside the XA resources. I'll discuss how this is done further down in summarizing the Last Participant Support optimization.
Can I have first transaction marked xa, second marked xa and third non xa?
I assume you meant to say first connection marked xa, and so forth. Yes, you can do this by relying on Last Participant Support.
What does read-only mean?
Read-only refers to usage of the transactional resource in a way that does not modify any data. For example, you might run a query that locks a row in a database and reads data from it, but does not perform any updates on it.
So we can mix xa updates with read-only non xa?
You have this in reverse. The document that you cited indicates that the XA resources can be read-only and the non-xa resource can make updates. This works because the XA resources have a spec-defined way of indicating to the transaction manager that they did not modify any data (by voting XA_RDONLY in their response to the xa.prepare request). Because they haven't written any data, they only need to release their locks, so the commit of the overall transaction reduces to a non-xa commit/rollback of the one-phase resource; either resolution of the xa-capable resources (commit or rollback) would then have the same effect.
Last Participant Support
Last Participant Support, mentioned earlier, is a feature of the application server that simulates the participation of a non-xa resource as part of a transaction alongside one or more xa-capable resources. There are some risks involved in relying on this optimization, namely a timing window where the transaction can be left in-doubt, requiring manual intervention to resolve it.
Here is how it works:
You operate on all of the enlisted resources (xa and non-xa) as you normally would, and when you are ready, you invoke the userTransaction.commit operation (or rely on container managed transactions to issue the commit for you). When the transaction manager receives the request to commit, it sees that there is a non-xa resource involved and orders the prepare/commit operations to the backend in a special way.
First, it tells all of the xa-capable resources to do xa.prepare, and receives the vote from each of them. If all indicate that they have successfully prepared and would be able to commit, then the transaction manager proceeds to issue a commit to the non-xa resource.
If the commit of the non-xa resource succeeds, then the transaction manager commits all of the xa-capable resources. Even if the system goes down at this point, it is written in the recovery log that these resources must commit, and the transaction manager will later find them during a recovery attempt and commit them, with their corresponding records in the back end being locked until that happens. If the commit of the non-xa resource fails, then the transaction manager instead rolls back all of the xa-capable resources.
The risk here comes from the possibility that the request to commit the non-xa resource might not return at all, leaving the transaction manager no way of knowing whether that resource has committed or rolled back, and thus no way of knowing whether to commit or roll back the xa-capable resources. The transaction is then left in-doubt and needs manual intervention to recover properly. Only enable or rely upon Last Participant Support if you are okay with accepting this risk.
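The ordering described above can be illustrated with a small simulation. This is a sketch and not any application server's implementation: XAResource, OnePhaseResource and commit_with_last_participant are hypothetical names, and the "no reply" case is modelled as a TimeoutError.

```python
class XAResource:
    """Two-phase resource: can promise to commit before actually committing."""

    def __init__(self, can_prepare=True):
        self.can_prepare = can_prepare
        self.state = "active"

    def prepare(self):
        self.state = "prepared" if self.can_prepare else "rolled_back"
        return self.can_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


class OnePhaseResource:
    """Non-xa resource: only a plain commit or rollback is available."""

    def __init__(self, outcome="ok"):  # "ok", "fail" or "no_reply"
        self.outcome = outcome
        self.state = "active"

    def commit(self):
        if self.outcome == "no_reply":
            self.state = "unknown"
            raise TimeoutError("no reply from one-phase commit")
        if self.outcome == "fail":
            self.state = "rolled_back"
            raise RuntimeError("one-phase commit failed")
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def commit_with_last_participant(xa_resources, last):
    # 1. Prepare every xa-capable resource and collect the votes.
    if not all(resource.prepare() for resource in xa_resources):
        for resource in xa_resources:
            resource.rollback()
        last.rollback()
        return "rolled_back"
    # 2. Commit the non-xa resource last; its result decides the outcome.
    try:
        last.commit()
    except TimeoutError:
        # Outcome unknown: the prepared resources stay locked (in-doubt).
        return "in_doubt"
    except RuntimeError:
        for resource in xa_resources:
            resource.rollback()
        return "rolled_back"
    # 3. The one-phase commit succeeded, so the xa resources must commit.
    for resource in xa_resources:
        resource.commit()
    return "committed"
```

In the "in_doubt" case the xa resources are deliberately left in the prepared state, which corresponds to the manual-intervention scenario above.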

Synchronising transactions between database and Kafka producer

We have a micro-services architecture, with Kafka used as the communication mechanism between the services. Some of the services have their own databases. Say the user makes a call to Service A, which should result in a record (or set of records) being created in that service’s database. Additionally, this event should be reported to other services, as an item on a Kafka topic. What is the best way of ensuring that the database record(s) are only written if the Kafka topic is successfully updated (essentially creating a distributed transaction around the database update and the Kafka update)?
We are thinking of using spring-kafka (in a Spring Boot WebFlux service), and I can see that it has a KafkaTransactionManager, but from what I understand this is more about Kafka transactions themselves (ensuring consistency across the Kafka producers and consumers), rather than synchronising transactions across two systems (see here: “Kafka doesn't support XA and you have to deal with the possibility that the DB tx might commit while the Kafka tx rolls back.”). Additionally, I think this class relies on Spring’s transaction framework which, at least as far as I currently understand, is thread-bound, and won’t work if using a reactive approach (e.g. WebFlux) where different parts of an operation may execute on different threads. (We are using reactive-pg-client, so are manually handling transactions, rather than using Spring’s framework.)
Some options I can think of:
1. Don’t write the data to the database: only write it to Kafka. Then use a consumer (in Service A) to update the database. This seems like it might not be the most efficient, and will have problems in that the service which the user called cannot immediately see the database changes it should have just created.
2. Don’t write directly to Kafka: write to the database only, and use something like Debezium to report the change to Kafka. The problem here is that the changes are based on individual database records, whereas the business significant event to store in Kafka might involve a combination of data from multiple tables.
3. Write to the database first (if that fails, do nothing and just throw the exception). Then, when writing to Kafka, assume that the write might fail. Use the built-in auto-retry functionality to get it to keep trying for a while. If that eventually completely fails, try to write to a dead letter queue and create some sort of manual mechanism for admins to sort it out. And if writing to the DLQ fails (i.e. Kafka is completely down), just log it some other way (e.g. to the database), and again create some sort of manual mechanism for admins to sort it out.
Anyone got any thoughts or advice on the above, or able to correct any mistakes in my assumptions above?
Thanks in advance!
I'd suggest using a slightly altered variant of approach 2.
Write into your database only, but in addition to the actual table writes, also write "events" into a special table within that same database; these event records would contain the aggregations you need. In the simplest case, you'd just insert another entity, e.g. mapped by JPA, which contains a JSON property with the aggregate payload. Of course this could be automated by some kind of transaction listener or framework component.
Then use Debezium to capture the changes just from that table and stream them into Kafka. That way you have both: eventually consistent state in Kafka (the events in Kafka may trail behind or you might see a few events a second time after a restart, but eventually they'll reflect the database state) without the need for distributed transactions, and the business level event semantics you're after.
(Disclaimer: I'm the lead of Debezium; funnily enough I'm just in the process of writing a blog post discussing this approach in more detail)
Here are the posts:
https://debezium.io/blog/2018/09/20/materializing-aggregate-views-with-hibernate-and-debezium/
https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/
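The write side of this approach can be sketched with Python's built-in sqlite3 module. The table and column names (orders, outbox, event_type, payload) and the place_order helper are assumptions made for the illustration; in the setup described above, Debezium would then capture only the outbox table.

```python
import json
import sqlite3


def create_schema(conn):
    with conn:
        conn.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
            "customer TEXT NOT NULL, total REAL NOT NULL)"
        )
        conn.execute(
            "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "event_type TEXT NOT NULL, payload TEXT NOT NULL)"
        )


def place_order(conn, order_id, customer, total, event_type="OrderPlaced"):
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the business row and the event row are written atomically.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, customer, total) VALUES (?, ?, ?)",
            (order_id, customer, total),
        )
        payload = json.dumps(
            {"order_id": order_id, "customer": customer, "total": total}
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            (event_type, payload),
        )
```

Because both inserts share one local transaction, there is never an order without its event or an event without its order, and no distributed transaction is needed.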
First of all, I have to say that I’m neither a Kafka nor a Spring expert, but I think this is more of a conceptual challenge when writing to independent resources, and the solution should be adaptable to your technology stack. Furthermore, I should say that this solution tries to solve the problem without an external component like Debezium, because in my opinion each additional component brings challenges in testing, maintaining and running an application, which is often underestimated when choosing such an option. Also, not every database can be used as a Debezium source.
To make sure that we are talking about the same goals, let’s clarify the situation with a simplified airline example, where customers can buy tickets. After a successful order the customer will receive a message (mail, push notification, …) that is sent by an external messaging system (the system we have to talk to).
In a traditional JMS world with an XA transaction between our database (where we store orders) and the JMS provider, it would look like the following: The client sends the order to our app, where we start a transaction. The app stores the order in its database. Then the message is sent to JMS and you can commit the transaction. Both operations participate in the transaction even though they’re talking to their own resources. As the XA transaction guarantees ACID, we’re fine.
Let’s bring Kafka (or any other resource that is not able to participate in the XA transaction) into the game. As there is no longer a coordinator that syncs both transactions, the main idea of the following is to split processing into two parts with a persistent state.
When you store the order in your database you can also store the message that you want to send to Kafka afterwards (with aggregated data) in the same database (e.g. as JSON in a CLOB column). Same resource, ACID guaranteed, everything fine so far. Now you need a mechanism that polls your “KafkaTasks” table for new tasks that should be sent to a Kafka topic (e.g. with a timer service; maybe the @Scheduled annotation can be used in Spring). After the message has been successfully sent to Kafka you can delete the task entry. This ensures that the message to Kafka is only sent when the order is also successfully stored in the application database. Did we achieve the same guarantees as we have when using an XA transaction? Unfortunately, no, as there is still the chance that writing to Kafka works but the deletion of the task fails. In this case the retry mechanism (you would need one, as mentioned in your question) would reprocess the task and send the message twice. If your business case is happy with this “at-least-once” guarantee, you’re done here with an imho semi-complex solution that could easily be implemented as framework functionality, so not everyone has to bother with the details.
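A minimal sketch of such a poller, again using sqlite3. The kafka_tasks table layout and the send callable are assumptions; send stands in for whatever producer call your stack provides and is expected to raise if delivery is not confirmed.

```python
import sqlite3


def publish_pending(conn, send, batch_size=10):
    """Send pending tasks in id order; delete a task only after its send succeeded."""
    rows = conn.execute(
        "SELECT id, payload FROM kafka_tasks ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for task_id, payload in rows:
        # If send raises, the task row stays in place and is retried on the
        # next poll. That retry is what makes duplicates possible.
        send(payload)
        with conn:
            conn.execute("DELETE FROM kafka_tasks WHERE id = ?", (task_id,))
        sent += 1
    return sent
```

If the process dies (or the acknowledgement is lost) after the broker accepted the message but before the DELETE committed, the next poll sends the same payload again, which is the at-least-once behaviour described above.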
If you need “exactly-once” then you cannot store your state in the application database (in this case “deletion of a task” is the “state”) but instead you must store it in Kafka (assuming that you have ACID guarantees between two Kafka topics). An example: Let’s say you have 100 tasks in the table (IDs 1 to 100) and the task job processes the first 10. You write your Kafka messages to their topic and another message with the ID 10 to “your topic”. All in the same Kafka-transaction. In the next cycle you consume your topic (value is 10) and take this value to get the next 10 tasks (and delete the already processed tasks).
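The checkpoint idea can be illustrated without a real broker. In this sketch FakeLog is a hypothetical stand-in whose append_atomically method plays the role of a Kafka transaction spanning two topics, and the task table is a plain dict; none of this is Kafka's API.

```python
class FakeLog:
    """Stand-in for Kafka: writes to several topics are applied all or nothing."""

    def __init__(self):
        self.topics = {"orders": [], "progress": []}

    def append_atomically(self, writes):
        # Validate everything first so that either all writes land or none do.
        for topic, _ in writes:
            if topic not in self.topics:
                raise KeyError(topic)
        for topic, value in writes:
            self.topics[topic].append(value)

    def last_progress(self):
        progress = self.topics["progress"]
        return progress[-1] if progress else 0


def publish_batch(tasks, log, batch_size=10):
    """tasks: dict mapping task id -> payload (the task table)."""
    checkpoint = log.last_progress()
    # Clean up tasks already covered by the checkpoint; safe to repeat.
    for task_id in [t for t in tasks if t <= checkpoint]:
        del tasks[task_id]
    batch = sorted(t for t in tasks if t > checkpoint)[:batch_size]
    if not batch:
        return 0
    # Messages and the new checkpoint are written in one atomic step.
    writes = [("orders", tasks[t]) for t in batch] + [("progress", batch[-1])]
    log.append_atomically(writes)
    return len(batch)
```

The progress marker lives next to the messages, so a crash between publishing and cleaning up the task table cannot cause a resend: the next cycle starts from the checkpoint, not from the table contents.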
If there are easier (in-application) solutions with the same guarantees, I’m looking forward to hearing from you!
Sorry for the long answer but I hope it helps.
All the approaches described above are good ways to tackle the problem and are well-defined patterns. You can explore them in the links provided below.
Pattern: Transactional outbox
Publish an event or message as part of a database transaction by saving it in an OUTBOX in the database.
http://microservices.io/patterns/data/transactional-outbox.html
Pattern: Polling publisher
Publish messages by polling the outbox in the database.
http://microservices.io/patterns/data/polling-publisher.html
Pattern: Transaction log tailing
Publish changes made to the database by tailing the transaction log.
http://microservices.io/patterns/data/transaction-log-tailing.html
Debezium is a valid answer but (as I've experienced) it can require some extra overhead of running an extra pod and making sure that pod doesn't fall over. This could just be me griping about a few back to back instances where pods OOM errored and didn't come back up, networking rule rollouts dropped some messages, WAL access to an aws aurora db started behaving oddly... It seems that everything that could have gone wrong, did. Not saying Debezium is bad, it's fantastically stable, but often for devs running it becomes a networking skill rather than a coding skill.
A KISS solution using ordinary code that will work 99.99% of the time (and inform you of the other 0.01%) would be:
1. Start transaction.
2. Synchronously save to the DB.
   -> If it fails, bail out.
3. Asynchronously send the message to Kafka.
4. Block until the topic reports that it has received the message.
   -> If it times out or fails, abort the transaction.
   -> If it succeeds, commit the transaction.
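The steps above could look roughly like this. It is a sketch with assumed names: the records table is invented for the example, and send is a placeholder for a producer call that blocks until the broker acknowledges the message and raises on failure or timeout.

```python
import sqlite3


def save_and_publish(conn, record, send, timeout_s=5.0):
    """Commit the DB write only after the message was acknowledged."""
    try:
        # The insert implicitly opens a transaction on this connection.
        conn.execute("INSERT INTO records (body) VALUES (?)", (record,))
        # Blocks until the broker confirms receipt; raises otherwise.
        send(record, timeout_s)
    except Exception:
        # Either the save or the send failed: abort and report to the caller.
        conn.rollback()
        raise
    # The remaining gap: if this commit fails, the message is already out.
    conn.commit()
```

The residual failure case is the final commit failing after a successful send, which is presumably the small fraction of cases the answer says you will be informed of.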
I'd suggest using a newer approach, the 2-phase message. With this approach, much less code is needed, and you don't need Debezium any more.
https://betterprogramming.pub/an-alternative-to-outbox-pattern-7564562843ae
For this new approach, what you need to do is:
1. When writing to your database, write an event record to an auxiliary table.
2. Submit a 2-phase message to DTM.
3. Write a service to query whether an event is saved in the auxiliary table.
With the help of the DTM SDK, you can accomplish the above 3 steps with 8 lines of Go, much less code than other solutions:
    msg := dtmcli.NewMsg(DtmServer, gid).
        Add(busi.Busi+"/TransIn", &TransReq{Amount: 30})
    err := msg.DoAndSubmitDB(busi.Busi+"/QueryPrepared", db, func(tx *sql.Tx) error {
        return AdjustBalance(tx, busi.TransOutUID, -req.Amount)
    })
    app.GET(BusiAPI+"/QueryPrepared", dtmutil.WrapHandler2(func(c *gin.Context) interface{} {
        return MustBarrierFromGin(c).QueryPrepared(db)
    }))
Each of your original options has its disadvantage:
1. The service the user called cannot immediately see the database changes it has just created.
2. Debezium captures the database log, which may be much larger than the events you want. Also, deploying and maintaining Debezium is not an easy job.
3. The "built-in auto-retry functionality" is not cheap; it may require a lot of code or maintenance effort.

In 2PC what happens in case of failure to commit?

In 2PC, what happens if the coordinator asks 3 participants to commit and the second one fails without responding to the coordinator?
A client then asks the second node for the value; the second node has just come back up but did not manage to commit, so it returns an old value... Is that a fault of 2PC?
The missing part of 2PC: 2PR (two-phase read)
If any of the commit messages is lost or for some reason doesn't take effect at some participant, the resource remains in the prepared state (which is uncertain), even after a restart, because the prepared state must be persisted in non-volatile storage before the coordinator can ever send a commit message.
Anyone who tries to read an uncertain resource must consult the coordinator to determine the exact state of that resource. Once that is determined, you can choose the right version of the value.
In your case, the second node returns the new value (with the help of the coordinator, it finds out that the new value is really committed and the old value is stale).
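The read path described here can be sketched as follows. The Coordinator and Participant classes are hypothetical and keep everything in memory; the point is only that a prepared (uncertain) value is resolved through the coordinator before anything is returned.

```python
class Coordinator:
    """Holds the final decision for each transaction id."""

    def __init__(self):
        self.decisions = {}  # tx_id -> "commit" or "abort"

    def decision(self, tx_id):
        return self.decisions.get(tx_id)  # None while still undecided


class Participant:
    def __init__(self, value):
        self.value = value      # last committed value
        self.prepared = None    # (tx_id, new_value) while uncertain

    def prepare(self, tx_id, new_value):
        # In a real system this record is persisted before voting yes.
        self.prepared = (tx_id, new_value)

    def read(self, coordinator):
        if self.prepared is not None:
            tx_id, new_value = self.prepared
            decision = coordinator.decision(tx_id)
            if decision is None:
                # Still in doubt: the reader has to wait or retry.
                raise RuntimeError("transaction %s is in doubt" % tx_id)
            if decision == "commit":
                self.value = new_value
            self.prepared = None
        return self.value
```

A node that missed the commit message therefore still serves the new value, because the read itself finishes the commit once the coordinator confirms the decision.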
---------- edit --------------
Some implementations take an exclusive lock during the prepare phase, which means that, once prepared, no one else can read or write the prepared resource. So anyone who tries to read before the participant has committed must wait.
If the coordinator is asking them to commit, then it means that all participants have already answered that they are prepared to commit. Prepared means that the participant is guaranteed to be able to commit. There is no failure. If the node vanished in a meteor strike, then the node is restored from the HA/DR data and the restored node resumes the transaction and proceeds with the commit.
Participants in 2PC are durable, persisted coordinators capable of backup and restore. In theory, if one of the participants cannot be restored, then every participant, and the coordinator, is restored back to a point in time before the last coordinated transaction. In practice, all coordinators support handling the case where a participant is lost by letting the transaction be manually forced into one state or the other; see Resolve Transactions Manually or Resolving indoubt transactions manually.

Downsides of CommitAsync() w/o any changes to collection

All the samples usually demonstrate some sort of change to reliable collections with CommitAsync() or rollback in case of a failure. My code is using TryRemoveAsync(), so failure is not a concern (will be retried later).
Is there a significant downside to invoking tx.CommitAsync() when no changes to reliable collections were performed?
Whenever you open a transaction and execute commands against a collection, these commands acquire locks in the TStore (collection) and are recorded in the transaction's temporary dictionary (change tracking) and in the transaction logs; the replicator then forwards these changes to the replicas.
Once you execute tx.CommitAsync(), the temporary records are saved to disk, the transaction is registered in the logs and then replicated to the secondary replicas so that they also commit and save to disk, and then the locks are released.
If the collection is not modified, the transaction won't have anything to save or replicate and will just be closed.
If you don't call tx.CommitAsync() after the operation, the transaction is aborted, any pending operations are discarded, and the abort operation is written to the logs to notify the other replicas.
Both commit and abort generate logs (and replicate them). The only detail I am not sure about is whether these logs are also generated when no changes are in place; I assume they are. Regarding performance, reading or attempting to change a collection acquires locks that need to be released with a commit or abort. I think these have the biggest impact on your code, because they prevent other threads from modifying the collection until you complete the transaction. In this case I wouldn't be too worried about committing an empty transaction.
    // Create a new Transaction object for this partition
    using (ITransaction tx = base.StateManager.CreateTransaction()) {
        // Modify the collection
        await m_dic.AddAsync(tx, key, value, cancellationToken);
        // CommitAsync sends Commit record to log & secondary replicas
        // After quorum responds, all locks released
        await tx.CommitAsync();
    } // If CommitAsync is not called, this line will Dispose the transaction and discard the changes
You can find most of these details in this documentation.
If you really want to go deep into the implementation details to answer this question, I suggest you dig into the source code for the replicator here.

Looking for message bus implementations that offer something between full ACID and nothing

Anyone know of a message bus implementation which offers granular control over consistency guarantees? Full ACID is too slow and no ACID is too wrong.
We're currently using Rhino ESB wrapping MSMQ for our messaging. When using durable, transactional messaging with distributed transactions, MSMQ can block the commit for considerable time while it waits on I/O completion.
Our messages fall into two general categories: business logic and denormalisation. The latter account for a significant percentage of message bus traffic.
Business logic messages require the guarantees of full ACID and MSMQ has proven quite adequate for this.
Denormalisation messages:
1. MUST be durable.
2. MUST NOT be processed until after the originating transaction completes.
3. MAY be processed multiple times.
4. MAY be processed even if the originating transaction rolls back, as long as 2) is adhered to.
(In some specific cases the durability requirements could probably be relaxed, but identifying and handling those cases as exceptions to the rule adds complexity.)
All denormalisation messages are handled in-process so there is no need for IPC.
If the process is restarted, all transactions may be assumed to have completed (committed or rolled back) and all denormalisation messages not yet processed must be recovered. It is acceptable to replay denormalisation messages which were already processed.
As far as I can tell, messaging systems which deal with transactions tend to offer a choice between full ACID or nothing, and ACID carries a performance penalty. We're seeing calls to TransactionScope#Commit() taking as long as a few hundred milliseconds in some cases depending on the number of messages sent.
Using a non-transactional message queue causes messages to be processed before their originating transaction completes, resulting in consistency problems.
Another part of our system which has similar consistency requirements but lower complexity is already using a custom implementation of something akin to a transaction log, and generalising that for this use case is certainly an option, but I'd rather not implement a low-latency, concurrent, durable, transactional messaging system myself if I don't have to :P
In case anyone's wondering, the reason for requiring durability of denormalisation messages is that detecting desyncs and fixing desyncs can be extremely difficult and extremely expensive respectively. People do notice when something's slightly wrong and a page refresh doesn't fix it, so ignoring desyncs isn't an option.
It's not exactly the answer you're looking for, but Jonathan Oliver has written extensively on how to avoid using distributed transactions in messaging and yet maintain transactional integrity:
http://blog.jonathanoliver.com/2011/04/how-i-avoid-two-phase-commit/
http://blog.jonathanoliver.com/2011/03/removing-2pc-two-phase-commit/
http://blog.jonathanoliver.com/2010/04/idempotency-patterns/
Not sure if this helps you but, hey.
It turns out that MSMQ+SQL+DTC don't even offer the consistency guarantees we need. We previously encountered a problem where messages were being processed before the distributed transaction which queued them had been committed to the database, resulting in out-of-date reads. This is a side-effect of using ReadCommitted isolation to consume the queue, since:
1. Start transaction A.
2. Update database table in A.
3. Queue message in A.
4. Request commit of A.
5. Message queue commits A.
6. Start transaction B.
7. Read message in B.
8. Read database table in B, using ReadCommitted <- gets pre-A data.
9. Database commits A.
Our requirement is that B's read of the table block on A's commit, which requires Serializable transactions, which carries a performance penalty.
It looks like the normal thing to do is indeed to implement the necessary constraints and guarantees oneself, even though it sounds like reinventing the wheel.
Anyone got any comments on this?
If you want to do this by hand, here is a reliable approach. It satisfies (1) and (2), and it doesn't even need the liberties that you allow in (3) and (4).
1. Producer (business logic) starts transaction A.
2. Insert/update whatever into one or more tables.
3. Insert a corresponding message into PrivateMessageTable (part of the domain, and unshared, if you will). This is what will be distributed.
4. Commit transaction A. Producer has now simply and reliably performed its writes including the insertion of a message, or rolled everything back.
5. Dedicated distributor job queries a batch of unprocessed messages from PrivateMessageTable.
6. Distributor starts transaction B.
7. Mark the unprocessed messages as processed, rolling back if the number of rows modified is different than expected (two instances running at the same time?).
8. Insert a public representation of the messages into PublicMessageTable (a publicly exposed table, in whatever way). Assign new, strictly sequential Ids to the public representations. Because only one process is doing these inserts, this can be guaranteed. Note that the table must be on the same host to avoid 2PC.
9. Commit transaction B. Distributor has now distributed each message to the public table exactly once, with strictly sequential Ids.
10. A consumer (there can be several) queries the next batch of messages from PublicMessageTable with Id greater than its own LastSeenId.
11. Consumer starts transaction C.
12. Consumer inserts its own representation of the messages into its own table ConsumerMessageTable (thus advancing LastSeenId). Insert-ignore can help protect against multiple instances running. Note that this table can be on a completely different server.
13. Commit transaction C. Consumer has now consumed each message exactly once, in the same order the messages were made publicly available, without ever skipping a message.
14. We can do whatever we want based on the consumed messages.
Of course, this requires very careful implementation.
It is even suitable for database clusters, as long as there is only a single write node, and both reads and writes perform causality checks. It may well be that having one of these is sufficient, but I'd have to consider the implications more carefully to make that claim.
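The distributor and consumer steps of this approach can be sketched with sqlite3. Table and column names are lower-case variants of the ones in the answer and the helper names are assumptions. The two connections stand for the producer's database and a consumer's separate database.

```python
import sqlite3


def setup_producer(conn):
    with conn:
        conn.execute(
            "CREATE TABLE private_messages (id INTEGER PRIMARY KEY, "
            "body TEXT NOT NULL, processed INTEGER NOT NULL DEFAULT 0)"
        )
        conn.execute(
            "CREATE TABLE public_messages "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )


def setup_consumer(conn):
    with conn:
        conn.execute(
            "CREATE TABLE consumed_messages "
            "(id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
        )


def distribute(conn, batch_size=100):
    rows = conn.execute(
        "SELECT id, body FROM private_messages "
        "WHERE processed = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    if not rows:
        return 0
    ids = [row[0] for row in rows]
    marks = ",".join("?" * len(ids))
    with conn:  # transaction B: rolled back if anything below raises
        cur = conn.execute(
            "UPDATE private_messages SET processed = 1 "
            "WHERE processed = 0 AND id IN (%s)" % marks,
            ids,
        )
        if cur.rowcount != len(ids):
            raise RuntimeError("another distributor touched this batch")
        conn.executemany(
            "INSERT INTO public_messages (body) VALUES (?)",
            [(row[1],) for row in rows],
        )
    return len(rows)


def consume(public_conn, consumer_conn, batch_size=100):
    last_seen = consumer_conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM consumed_messages"
    ).fetchone()[0]
    rows = public_conn.execute(
        "SELECT id, body FROM public_messages WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen, batch_size),
    ).fetchall()
    with consumer_conn:  # transaction C: advances LastSeenId atomically
        consumer_conn.executemany(
            "INSERT OR IGNORE INTO consumed_messages (id, body) VALUES (?, ?)",
            rows,
        )
    return len(rows)
```

Each transaction touches only one database, so no 2PC is involved; the consumer's LastSeenId is simply the highest id it has stored.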