ActiveMQ Artemis Brokers behavior on trasaction commit error due to failover - activemq-artemis

We have encountered the following problem aka TransactionRolledBackException:
Transaction completion in doubt due to failover. Forcing rollback of ID:***** at Apache.NMS.ActiveMQ.Connection.SyncRequest(Command command, TimeSpan requestTimeout)
at Apache.NMS.ActiveMQ.Connection.SyncRequest(Command command)
at Apache.NMS.ActiveMQ.TransactionContext.Commit()
at Apache.NMS.ActiveMQ.Session.DoCommit()
at Apache.NMS.ActiveMQ.Session.Commit()
Here's our topology: we have 6 nodes (n1, .., n6) where nodes n1, n3, n5 are clustered master nodes, and n2 is slave of n1, n4 is slave of n3, n6 is slave of n5.
We use transactions in sending and consuming message processes, which means that this exception could be thrown in both. I'm looking for best practices of handling TransactionRolledBackException without needing to rollback all the business logic executed before commit. We have our own application-level library, which allows fast and easy connection to the broker for sending/receiving messages, and we would like to implement into it some code which would handle TransactionRolledBackException for every application using it and provide "once and only once" delivery.
Thanks for your answers in advance

The transaction itself is there to provide the guarantees you're looking for (i.e. "once and only once").
Typically in the kind of situation you describe (where the consumption or production of a message or groups of messages needs to happen atomically with other business operations) all the resources involved in the atomic operation (e.g. JMS consumption, JMS production, JDBC inserts, JDBC deletes, etc.) are enlisted into the same transaction and managed by a transaction manager. That way when one of the individual operations fails then all the other work is automatically undone as well. Transactions like this are typically called XA transactions and are supported in both Spring and Java EE via the Jakarta/Java Transactions API (i.e. JTA).
In Java EE it is very common, for example, to have a Message Driven Bean (i.e. MDB) consume a message, work with a database, and then send another message so that every piece of that work is done atomically. If sending the message at the end fails then the database changes are reversed and the message the MDB consumed originally is put back on the queue to be consumed again in the future. In this way the message represents a kind of "unit of work" which is a simple, powerful pattern for managing business processes.

Related

Synchronising transactions between database and Kafka producer

We have a micro-services architecture, with Kafka used as the communication mechanism between the services. Some of the services have their own databases. Say the user makes a call to Service A, which should result in a record (or set of records) being created in that service’s database. Additionally, this event should be reported to other services, as an item on a Kafka topic. What is the best way of ensuring that the database record(s) are only written if the Kafka topic is successfully updated (essentially creating a distributed transaction around the database update and the Kafka update)?
We are thinking of using spring-kafka (in a Spring Boot WebFlux service), and I can see that it has a KafkaTransactionManager, but from what I understand this is more about Kafka transactions themselves (ensuring consistency across the Kafka producers and consumers), rather than synchronising transactions across two systems (see here: “Kafka doesn't support XA and you have to deal with the possibility that the DB tx might commit while the Kafka tx rolls back.”). Additionally, I think this class relies on Spring’s transaction framework which, at least as far as I currently understand, is thread-bound, and won’t work if using a reactive approach (e.g. WebFlux) where different parts of an operation may execute on different threads. (We are using reactive-pg-client, so are manually handling transactions, rather than using Spring’s framework.)
Some options I can think of:
Don’t write the data to the database: only write it to Kafka. Then use a consumer (in Service A) to update the database. This seems like it might not be the most efficient, and will have problems in that the service which the user called cannot immediately see the database changes it should have just created.
Don’t write directly to Kafka: write to the database only, and use something like Debezium to report the change to Kafka. The problem here is that the changes are based on individual database records, whereas the business significant event to store in Kafka might involve a combination of data from multiple tables.
Write to the database first (if that fails, do nothing and just throw the exception). Then, when writing to Kafka, assume that the write might fail. Use the built-in auto-retry functionality to get it to keep trying for a while. If that eventually completely fails, try to write to a dead letter queue and create some sort of manual mechanism for admins to sort it out. And if writing to the DLQ fails (i.e. Kafka is completely down), just log it some other way (e.g. to the database), and again create some sort of manual mechanism for admins to sort it out.
Anyone got any thoughts or advice on the above, or able to correct any mistakes in my assumptions above?
Thanks in advance!
I'd suggest to use a slightly altered variant of approach 2.
Write into your database only, but in addition to the actual table writes, also write "events" into a special table within that same database; these event records would contain the aggregations you need. In the easiest way, you'd simply insert another entity e.g. mapped by JPA, which contains a JSON property with the aggregate payload. Of course this could be automated by some means of transaction listener / framework component.
Then use Debezium to capture the changes just from that table and stream them into Kafka. That way you have both: eventually consistent state in Kafka (the events in Kafka may trail behind or you might see a few events a second time after a restart, but eventually they'll reflect the database state) without the need for distributed transactions, and the business level event semantics you're after.
(Disclaimer: I'm the lead of Debezium; funnily enough I'm just in the process of writing a blog post discussing this approach in more detail)
Here are the posts
https://debezium.io/blog/2018/09/20/materializing-aggregate-views-with-hibernate-and-debezium/
https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/
first of all, I have to say that I’m no Kafka, nor a Spring expert but I think that it’s more a conceptual challenge when writing to independent resources and the solution should be adaptable to your technology stack. Furthermore, I should say that this solution tries to solve the problem without an external component like Debezium, because in my opinion each additional component brings challenges in testing, maintaining and running an application which is often underestimated when choosing such an option. Also not every database can be used as a Debezium-source.
To make sure that we are talking about the same goals, let’s clarify the situation in an simplified airline example, where customers can buy tickets. After a successful order the customer will receive a message (mail, push-notification, …) that is sent by an external messaging system (the system we have to talk with).
In a traditional JMS world with an XA transaction between our database (where we store orders) and the JMS provider it would look like the following: The client sets the order to our app where we start a transaction. The app stores the order in its database. Then the message is sent to JMS and you can commit the transaction. Both operations participate at the transaction even when they’re talking to their own resources. As the XA transaction guarantees ACID we’re fine.
Let’s bring Kafka (or any other resource that is not able to participate at the XA transaction) in the game. As there is no coordinator that syncs both transactions anymore the main idea of the following is to split processing in two parts with a persistent state.
When you store the order in your database you can also store the message (with aggregated data) in the same database (e.g. as JSON in a CLOB-column) that you want to send to Kafka afterwards. Same resource – ACID guaranteed, everything fine so far. Now you need a mechanism that polls your “KafkaTasks”-Table for new tasks that should be send to a Kafka-Topic (e.g. with a timer service, maybe #Scheduled annotation can be used in Spring). After the message has been successfully sent to Kafka you can delete the task entry. This ensures that the message to Kafka is only sent when the order is also successfully stored in application database. Did we achieve the same guarantees as we have when using a XA transaction? Unfortunately, no, as there is still the chance that writing to Kafka works but the deletion of the task fails. In this case the retry-mechanism (you would need one as mentioned in your question) would reprocess the task an sends the message twice. If your business case is happy with this “at-least-once”-guarantee you’re done here with a imho semi-complex solution that could be easily implemented as framework functionality so not everyone has to bother with the details.
If you need “exactly-once” then you cannot store your state in the application database (in this case “deletion of a task” is the “state”) but instead you must store it in Kafka (assuming that you have ACID guarantees between two Kafka topics). An example: Let’s say you have 100 tasks in the table (IDs 1 to 100) and the task job processes the first 10. You write your Kafka messages to their topic and another message with the ID 10 to “your topic”. All in the same Kafka-transaction. In the next cycle you consume your topic (value is 10) and take this value to get the next 10 tasks (and delete the already processed tasks).
If there are easier (in-application) solutions with the same guarantees I’m looking forward to hear from you!
Sorry for the long answer but I hope it helps.
All the approach described above are the best way to approach the problem and are well defined pattern. You can explore these in the links provided below.
Pattern: Transactional outbox
Publish an event or message as part of a database transaction by saving it in an OUTBOX in the database.
http://microservices.io/patterns/data/transactional-outbox.html
Pattern: Polling publisher
Publish messages by polling the outbox in the database.
http://microservices.io/patterns/data/polling-publisher.html
Pattern: Transaction log tailing
Publish changes made to the database by tailing the transaction log.
http://microservices.io/patterns/data/transaction-log-tailing.html
Debezium is a valid answer but (as I've experienced) it can require some extra overhead of running an extra pod and making sure that pod doesn't fall over. This could just be me griping about a few back to back instances where pods OOM errored and didn't come back up, networking rule rollouts dropped some messages, WAL access to an aws aurora db started behaving oddly... It seems that everything that could have gone wrong, did. Not saying Debezium is bad, it's fantastically stable, but often for devs running it becomes a networking skill rather than a coding skill.
As a KISS solution using normal coding solutions that will work 99.99% of the time (and inform you of the .01%) would be:
Start Transaction
Sync save to DB
-> If fail, then bail out.
Async send message to kafka.
Block until the topic reports that it has received the
message.
-> if it times out or fails Abort Transaction.
-> if it succeeds Commit Transaction.
I'd suggest to use a new approach 2-phase message. In this new approach, much less codes are needed, and you don't need Debeziums any more.
https://betterprogramming.pub/an-alternative-to-outbox-pattern-7564562843ae
For this new approach, what you need to do is:
When writing your database, write an event record to an auxiliary table.
Submit a 2-phase message to DTM
Write a service to query whether an event is saved in the auxiliary table.
With the help of DTM SDK, you can accomplish the above 3 steps with 8 lines in Go, much less codes than other solutions.
msg := dtmcli.NewMsg(DtmServer, gid).
Add(busi.Busi+"/TransIn", &TransReq{Amount: 30})
err := msg.DoAndSubmitDB(busi.Busi+"/QueryPrepared", db, func(tx *sql.Tx) error {
return AdjustBalance(tx, busi.TransOutUID, -req.Amount)
})
app.GET(BusiAPI+"/QueryPrepared", dtmutil.WrapHandler2(func(c *gin.Context) interface{} {
return MustBarrierFromGin(c).QueryPrepared(db)
}))
Each of your origin options has its disadvantage:
The user cannot immediately see the database changes it have just created.
Debezium will capture the log of the database, which may be much larger than the events you wanted. Also deployment and maintenance of Debezium is not an easy job.
"built-in auto-retry functionality" is not cheap, it may require much codes or maintenance efforts.

Kafka Streams and RPC: is calling REST service in map() operator considered an anti-pattern?

The naive approach for implementing the use case of enriching an incoming stream of events stored in Kafka with reference data - is by calling in map() operator an external service REST API that provides this reference data, for each incoming event.
eventStream.map((key, event) -> /* query the external service here, then return the enriched event */)
Another approach is to have second events stream with reference data and store it in KTable that will be a lightweight embedded "database" then join main event stream with it.
KStream<String, Object> eventStream = builder.stream(..., "event-topic");
KTable<String, Object> referenceDataTable = builder.table(..., "reference-data-topic");
KTable<String, Object> enrichedEventStream = eventStream
.leftJoin(referenceDataTable , (event, referenceData) -> /* return the enriched event */)
.map((key, enrichedEvent) -> new KeyValue<>(/* new key */, enrichedEvent)
.to("enriched-event-topic", ...);
Can the "naive" approach be considered an anti-pattern? Can the "KTable" approach be recommended as the preferred one?
Kafka can easily manage millions of messages per minute. Service that is called from the map() operator should be capable of handling high load too and also highly-available. These are extra requirements for the service implementation. But if the service satisfies these criteria can the "naive" approach be used?
Yes, it is ok to do RPC inside Kafka Streams operations such as map() operation. You just need to be aware of the pros and cons of doing so, see below. Also, you should do any such RPC calls synchronously from within your operations (I won't go into details here why; if needed, I'd suggest to create a new question).
Pros of doing RPC calls from within Kafka Streams operations:
Your application will fit more easily into an existing architecture, e.g. one where the use of REST APIs and request/response paradigms is common place. This means that you can make more progress quickly for a first proof-of-concept or MVP.
The approach is, in my experience, easier to understand for many developers (particularly those who are just starting out with Kafka) because they are familiar with doing RPC calls in this manner from their past projects. Think: it helps to move gradually from request-response architectures to event-driven architectures (powered by Kafka).
Nothing prevents you from starting with RPC calls and request-response, and then later migrating to a more Kafka-idiomatic approach.
Cons:
You are coupling the availability, scalability, and latency/throughput of your Kafka Streams powered application to the availability, scalability, and latency/throughput of the RPC service(s) you are calling. This is relevant also for thinking about SLAs.
Related to the previous point, Kafka and Kafka Streams scale very well. If you are running at large scale, your Kafka Streams application might end up DDoS'ing your RPC service(s) because the latter probably can't scale as much as Kafka. You should be able to judge pretty easily whether or not this is a problem for you in practice.
An RPC call (like from within map()) is a side-effect and thus a black box for Kafka Streams. The processing guarantees of Kafka Streams do not extend to such side effects.
Example: Kafka Streams (by default) processes data based on event-time (= based on when an event happened in the real world), so you can easily re-process old data and still get back the same results as when the old data was still new. But the RPC service you are calling during such reprocessing might return a different response than "back then". Ensuring the latter is your responsibility.
Example: In the case of failures, Kafka Streams will retry operations, and it will guarantee exactly-once processing (if enabled) even in such situations. But it can't guarantee, by itself, that an RPC call you are doing from within map() will be idempotent. Ensuring the latter is your responsibility.
Alternatives
In case you are wondering what other alternatives you have: If, for example, you are doing RPC calls for looking up data (e.g. for enriching an incoming stream of events with side/context information), you can address the downsides above by making the lookup data available in Kafka directly. If the lookup data is in MySQL, you can setup a Kafka connector to continuously ingest the MySQL data into a Kafka topic (think: CDC). In Kafka Streams, you can then read the lookup data into a KTable and perform the enrichment of your input stream via a stream-table join.
I suspect most of the advice you hear from the internet is along the lines of, "OMG, if this REST call takes 200ms, how wil I ever process 100,000 Kafka messages per second to keep up with my demand?"
Which is technically true: even if you scale your servers up for your REST service, if responses from this app routinely take 200ms - because it talks to a server 70ms away (speed of light is kinda slow, if that server is across the continent from you...) and the calling microservice takes 130ms even if you measure right at the source....
With kstreams the problem may be worse than it appears. Maybe you get 100,000 messages a second coming into your stream pipeline, but some kstream operator flatMaps and that operation in your app creates 2 messages for every one object... so now you really have 200,000 messages a second crashing through your REST server.
BUT maybe you're using Kstreams in an app that has 100 messages a second, or you can partition your data so that you get a message per partition maybe even just once a second. In that case, you might be fine.
Maybe your Kafka data just needs to go somewhere else: ie the end of the stream is back into a Good Ol' RDMS. In which case yes, there's some careful balancing there on the best way to deal with potentially "slow" systems, while making sure you don't DDOS yourself, while making sure you can work your way out of a backlog.
So is it an anti-pattern? Eh, probably, if your Kafka cluster is LinkedIn size. Does it matter for you? Depends on how many messages/second you need to drive, how fast your REST service really is, how efficiently it can scale (ie your new kstreams pipeline suddenly delivers 5x the normal traffic to it...)

How to implement a microservice Event Driven architecture with Spring Cloud Stream Kafka and Database per service

I am trying to implement an event driven architecture to handle distributed transactions. Each service has its own database and uses Kafka to send messages to inform other microservices about the operations.
An example:
Order service -------> | Kafka |------->Payment Service
| |
Orders MariaDB DB Payment MariaDB Database
Order receives an order request. It has to store the new Order in its DB and publish a message so that Payment Service realizes it has to charge for the item:
private OrderBusiness orderBusiness;
#PostMapping
public Order createOrder(#RequestBody Order order){
logger.debug("createOrder()");
//a.- Save the order in the DB
orderBusiness.createOrder(order);
//b. Publish in the topic so that Payment Service charges for the item.
try{
orderSource.output().send(MessageBuilder.withPayload(order).build());
}catch(Exception e){
logger.error("{}", e);
}
return order;
}
These are my doubts:
Steps a.- (save in Order DB) and b.- (publish the message) should be performed in a transaction, atomically. How can I achieve that?
This is related to the previous one: I send the message with: orderSource.output().send(MessageBuilder.withPayload(order).build()); This operations is asynchronous and ALWAYS returns true, no matter if the Kafka broker is down. How can I know that the message has reached the Kafka broker?
Steps a.- (save in Order DB) and b.- (publish the message) should be
performed in a transaction, atomically. How can I achieve that?
Kafka currently does not support transactions (and thus also no rollback or commit), which you'd need to synchronize something like this. So in short: you can't do what you want to do. This will change in the near-ish future, when KIP-98 is merged, but that might take some time yet. Also, even with transactions in Kafka, an atomic transaction across two systems is a very hard thing to do, everything that follows will only be improved upon by transactional support in Kafka, it will still not entirely solve your issue. For that you would need to look into implementing some form of two phase commit across your systems.
You can get somewhat close by configuring producer properties, but in the end you will have to chose between at least once or at most once for one of your systems (MariaDB or Kafka).
Let's start with what you can do in Kafka do ensure delivery of a message and further down we'll dive into your options for the overall process flow and what the consequences are.
Guaranteed delivery
You can configure how many brokers have to confirm receipt of your messages, before the request is returned to you with the parameter acks: by setting this to all you tell the broker to wait until all replicas have acknowledged your message before returning an answer to you. This is still no 100% guarantee that your message will not be lost, since it has only been written to the page cache yet and there are theoretical scenarios with a broker failing before it is persisted to disc, where the message might still be lost. But this is as good a guarantee as you are going to get.
You can further reduce the risk of data loss by lowering the intervall at which brokers force an fsync to disc (emphasized text and/or flush.ms) but please be aware, that these values can bring with them heavy performance penalties.
In addition to these settings you will need to wait for your Kafka producer to return the response for your request to you and check whether an exception occurred. This sort of ties into the second part of your question, so I will go into that further down.
If the response is clean, you can be as sure as possible that your data got to Kafka and start worrying about MariaDB.
Everything we have covered so far only addresses how to ensure that Kafka got your messages, but you also need to write data into MariaDB, and this can fail as well, which would make it necessary to recall a message you potentially already sent to Kafka - and this you can't do.
So basically you need to choose one system in which you are better able to deal with duplicates/missing values (depending on whether or not you resend partial failures) and that will influence the order you do things in.
Option 1
In this option you initialize a transaction in MariaDB, then send the message to Kafka, wait for a response and if the send was successful you commit the transaction in MariaDB. Should sending to Kafka fail, you can rollback your transaction in MariaDB and everything is dandy.
If however, sending to Kafka is successful and your commit to MariaDB fails for some reason, then there is no way of getting back the message from Kafka. So you will either be missing a message in MariaDB or have a duplicate message in Kafka, if you resend everything later on.
Option 2
This is pretty much just the other way around, but you are probably better able to delete a message that was written in MariaDB, depending on your data model.
Of course you can mitigate both approaches by keeping track of failed sends and retrying just these later on, but all of that is more of a bandaid on the bigger issue.
Personally I'd go with approach 1, since the chance of a commit failing should be somewhat smaller than the send itself and implement some sort of dupe check on the other side of Kafka.
This is related to the previous one: I send the message with:
orderSource.output().send(MessageBuilder.withPayload(order).build());
This operations is asynchronous and ALWAYS returns true, no matter if
the Kafka broker is down. How can I know that the message has reached
the Kafka broker?
Now first of, I'll admit I am unfamiliar with Spring, so this may not be of use to you, but the following code snippet illustrates one way of checking produce responses for exceptions.
By calling flush you block until all sends have finished (and either failed or succeeded) and then check the results.
Producer<String, String> producer = new KafkaProducer<>(myConfig);
final ArrayList<Exception> exceptionList = new ArrayList<>();
for(MessageType message : messages){
producer.send(new ProducerRecord<String, String>("myTopic", message.getKey(), message.getValue()), new Callback() {
#Override
public void onCompletion(RecordMetadata metadata, Exception exception) {
if (exception != null) {
exceptionList.add(exception);
}
}
});
}
producer.flush();
if (!exceptionList.isEmpty()) {
// do stuff
}
I think the proper way for implementing Event Sourcing is by having Kafka be filled directly from events pushed by a plugin that reads from the RDBMS binlog e.g using Confluent BottledWater (https://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/) or more active Debezium (http://debezium.io/). Then consuming Microservices can listen to those events, consume them and act on their respective databases being eventually consistent with the RDBMS database.
Have a look here to my full answer for a guideline:
https://stackoverflow.com/a/43607887/986160

How are distributed queues architectured?

What are architectural patterns/solutions that make distributed queues tick?
Please share for both ordered and non-ordered types.
You can think of the backend of a queue as a replicated database. (I am assuming the queues you are talking about consider themselves as durable: when they accept a message, they guarantee at least once delivery.)
As a replicated database, the message queue backend uses a replication protocol to make sure the message is on at least N hosts before acknowledging receipt to the sender. Common replication protocols are 2PC, 3PC, and consensus protocols like Raft, Multi-Paxos, and Chain Replication.
To send a message to a receiver, you have to do almost the same replication with a message lease. The queue server reserves the message for a certain period of time; it sends the message to the receiver, and if/when the receiver ackowledges receipt of the message the server deletes the message. Otherwise, the servers will resend the message to the next available receiver.
Some message queues stop there, others add lots of bells and whistles. SQS is one queue implementation that doesn't add many bells and whistles so that it can scale more. It allows them, for example, to shard the queue so that one SQS queue is actually made of many—even thousands—of these queues as described above. As an aside, I once heard one SQS developer ask another "What does 'ordering' mean when you are accepting millions of messages per second?"
That being said, some queues do provide strong ordering guarantees. (I have implemented a couple of these types of systems.) The cost of this is less ability to scale. To maintain ordering the queue's complexity goes way up. The queue has to maintain an ordered log of all the messages, and have the same ordering replicated across its servers. This is much much harder than unordered replication. Ordered queue systems typically elect a master to maintain the ordering and all messages are routed to the master. They also tend to use the more complex protocols for replication.

Looking for message bus implementations that offer something between full ACID and nothing

Anyone know of a message bus implementation which offers granular control over consistency guarantees? Full ACID is too slow and no ACID is too wrong.
We're currently using Rhino ESB wrapping MSMQ for our messaging. When using durable, transactional messaging with distributed transactions, MSMQ can block the commit for considerable time while it waits on I/O completion.
Our messages fall into two general categories: business logic and denormalisation. The latter account for a significant percentage of message bus traffic.
Business logic messages require the guarantees of full ACID and MSMQ has proven quite adequate for this.
Denormalisation messages:
MUST be durable.
MUST NOT be processed until after the originating transaction completes.
MAY be processed multiple times.
MAY be processed even if the originating transaction rolls back, as long as 2) is adhered to.
(In some specific cases the durability requirements could probably be relaxed, but identifying and handling those cases as exceptions to the rule adds complexity.)
All denormalisation messages are handled in-process so there is no need for IPC.
If the process is restarted, all transactions may be assumed to have completed (committed or rolled back) and all denormalisation messages not yet processed must be recovered. It is acceptable to replay denormalisation messages which were already processed.
As far as I can tell, messaging systems which deal with transactions tend to offer a choice between full ACID or nothing, and ACID carries a performance penalty. We're seeing calls to TransactionScope#Commit() taking as long as a few hundred milliseconds in some cases depending on the number of messages sent.
Using a non-transactional message queue causes messages to be processed before their originating transaction completes, resulting in consistency problems.
Another part of our system which has similar consistency requirements but lower complexity is already using a custom implementation of something akin to a transaction log, and generalising that for this use case is certainly an option, but I'd rather not implement a low-latency, concurrent, durable, transactional messaging system myself if I don't have to :P
In case anyone's wondering, the reason for requiring durability of denormalisation messages is that detecting desyncs and fixing desyncs can be extremely difficult and extremely expensive respectively. People do notice when something's slightly wrong and a page refresh doesn't fix it, so ignoring desyncs isn't an option.
It's not exactly the answer you're looking for, but Jonathan Oliver has written extensively on how to avoid using distributed transactions in messaging and yet maintain transactional integrity:
http://blog.jonathanoliver.com/2011/04/how-i-avoid-two-phase-commit/
http://blog.jonathanoliver.com/2011/03/removing-2pc-two-phase-commit/
http://blog.jonathanoliver.com/2010/04/idempotency-patterns/
Not sure if this helps you but, hey.
It turns out that MSMQ+SQL+DTC don't even offer the consistency guarantees we need. We previously encountered a problem where messages were being processed before the distributed transaction which queued them had been committed to the database, resulting in out-of-date reads. This is a side-effect of using ReadCommitted isolation to consume the queue, since:
Start transaction A.
Update database table in A.
Queue message in A.
Request commit of A.
Message queue commits A
Start transaction B.
Read message in B.
Read database table in B, using ReadCommitted <- gets pre-A data.
Database commits A.
Our requirement is that B's read of the table block on A's commit, which requires Serializable transactions, which carries a performance penalty.
It looks like the normal thing to do is indeed to implement the necessary constraints and guarantees oneself, even though it sounds like reinventing the wheel.
Anyone got any comments on this?
If you want to do this by hand, here is a reliable approach. It satisfies (1) and (2), and it doesn't even need the liberties that you allow in (3) and (4).
Producer (business logic) starts transaction A.
Insert/update whatever into one or more tables.
Insert a corresponding message into PrivateMessageTable (part of the domain, and unshared, if you will). This is what will be distributed.
Commit transaction A. Producer has now simply and reliably performed its writes including the insertion of a message, or rolled everything back.
Dedicated distributer job queries a batch of unprocessed messages from PrivateMessageTable.
Distributer starts transaction B.
Mark the unprocessed messages as processed, rolling back if the number of rows modified is different than expected (two instances running at the same time?).
Insert a public representation of the messages into PublicMessageTable (a publically exposed table, in whatever way). Assign new, strictly sequential Ids to the public representations. Because only one process is doing these inserts, this can be guaranteed. Note that the table must be on the same host to avoid 2PC.
Commit transaction B. Distributor has now distributed each message to the public table exactly once, with strictly sequantial Ids.
A consumer (there can be several) queries the next batch of messages from PublicMessageTable with Id greater than its own LastSeenId.
Consumer starts transaction C.
Consumer inserts its own representation of the messages into its own table ConsumerMessageTable (thus advancing LastSeenId). Insert-ignore can help protect against multiple instances running. Note that this table can be in a completely different server.
Commit transaction C. Consumer has now consumed each message exactly once, in the same order the messages were made publically available, without ever skipping a message.
We can do whatever we want based on the consumed messages.
Of course, this requires very careful implementation.
It is even suitable for database clusters, as long as there is only a single write node, and both reads and writes perform causality checks. It may well be that having one of these is sufficient, but I'd have to consider the implications more carefully to make that claim.