Is the ClearAsync method atomic? - azure-service-fabric

I had some questions about the IReliableCollection.ClearAsync method. I could not find the answers for this in the documentation.
Can we assume it is an atomic operation?
What happens if the node hosting the partition with the reliable collection crashes during, or right after the method is called? Is it possible for the collection to contain a subset of the older items due to such crashes?
Appreciate any help!

The idea of reliable collections is that all operations, absent exceptions, are persistent and replicated. The confusion is probably arising from the fact that ClearAsync doesn't take a ITransaction parameter (doesn't support undoing), but nothing in the documentation indicates it offers anything less than the strong consistency guarantees of every other operation.
In short, the answer to your first question is yes, you can consider it atomic; the second question follows.
Source.

Related

Event Sourcing - How to query inside a command?

We would like to be able to read state inside a command use case.
We could get the state from event store for the specific aggregate, but what about querying aggregates by field(not id) or performing more complicated queries, that are not fitted for the event store?
The approach we were thinking was to use our read model for those cases as well and not only for query use cases.
This might be inconsistent, so a solution could be to have the latest version of the aggregate stored in both write/read models, in order to be able to tell if the state is correct or stale.
Does this make sense and if yes, if we need to get state by Id should we use event store or the read model?
If you want the absolute latest state of an event-sourced aggregate, you're going to have to read the latest snapshot (assuming that you are snapshotting) and then replay events since that snapshot from the event store. You can be aggressive about snapshotting (conceivably even saving a snapshot after every command), but you're giving away some write performance to make the read faster.
Updating the read model directly is conceivably possible, though that level of coupling is something that should be considered very carefully. Note also that you will very likely need some sort of two-phase commit to ensure that the read model is only updated when the write model is updated and vice versa. I strongly suggest considering why you're using CQRS/ES in this project, because you are quite possibly undermining that reason by doing this sort of thing.
In general, if you need a query for processing a particular command, it's likely that query will generally be the same, i.e. you don't need free-form query support. In that case, you can often have a read model that's tuned for exactly that query and which only cares about events which could affect that query: often a fairly small subset of the events. The finer-grained the read model, the easier it is to keep in sync (if it ignores 99% of events, for instance, it can't really fall that far behind).
Needing to make complex queries as part of command processing could also be a sign that your aggregate boundaries aren't right and could do with a re-examination.
Does this make sense
Maybe. Let's start with
This might be inconsistent
Yup, they might be. So what?
We typically respond to a query by sending an unlocked copy of the answer. In other words, it's possible that the actual information in the write model will change after this response is dispatched but before the response arrives at its destination. The client will be looking at a copy of the answer taken from the past.
So we might reasonably ask how much better it is to get information no more than one minute old compared to information no more than five minutes old. If the difference in value is pennies, then you should probably deploy the five minute version. If the difference is millions of dollars, then you're in a good position to negotiate a real budget to solve the problem.
For processing a command in our own write model, that kind of inconsistency isn't usually acceptable or wise. But neither of the two common answers require keeping the read and write models synchronized. The most common answer is to just work with the write model alone. The less common answer is to grab a snapshot out of a cache, and then apply any additional events to it to bring it up to date. The latter approach is "just" a performance optimization (first rule: don't.)
The variation that trips everyone up is trying to process a command somewhere else, enforcing a consistency rule on our data here. Once again, you need a really clear picture of how valuable the consistency is to the business. If it's really important, that may be a signal that the information in question shouldn't be split into two different piles - you may be working with the wrong underlying data model.
Possibly useful references
Pat Helland Data on the Outside Versus Data on the Inside
Udi Dahan Race Conditions Don't Exist

Do Firebase/Firestore Transactions create internal queues?

I'm wondering if transactions (https://firebase.google.com/docs/firestore/manage-data/transactions) are viable tools to use in something like a ticketing system where users maybe be attempting to read/write to the same collection/document and whoever made the request first will be handled first and second will be handled second etc.
If not what would be a good structure for such a need with firestore?
Transactions just guarantee atomic consistent update among the documents involved in the transaction. It doesn't guarantee the order in which those transactions complete, as the transaction handler might get retried in the face of contention.
Since you tagged this question with google-cloud-functions (but didn't mention it in your question), it sounds like you might be considering writing a database trigger to handle incoming writes. Cloud Functions triggers also do not guarantee any ordering when under load.
Ordering of any kind at the scale on which Firestore and other Google Cloud products operate is a really difficult problem to solve (please read that link to get a sense of that). There is not a simple database structure that will impose an order where changes are made. I suggest you think carefully about your need for ordering, and come up with a different solution.
The best indication of order you can get is probably by adding a server timestamp to individual documents, but you will still have to figure out how to process them. The easiest thing might be to have a backend periodically query the collection, ordered by that timestamp, and process things in that order, in batch.

Can PostgreSQL Serializable be safely mixed with lower isolation levels?

I'm trying to figure out what the behaviour is when mixing Serialisable and lower levels of isolation, but not having much luck. Specifically, we have a reorder-queue-items transaction that is currently taking the table lock to ensure no new items will be added to it in the meantime. Will making it Serialised and reading the head of the queue (to ensure a dependency on the result) provide the same level of guarantee, even if other transactions operate at Repeatable Read?
Using PostgreSQL 9.5.
PS. I know there's another question with a similar title, but it's over 4 years old, much less specific in what it asks, and the only answer is essentially unsourced given the cited materials, and doesn't truly answer the question.
To guarantee serializability, all participating transactions should use the SERIALIZABLE isolation level.
But I am not certain that serializable isolation is the solution to your problem. It won't block anybody from reading from or writing to the queue, it will make some transactions fail, and the failing transaction might well be the one that is trying to reorder the queue.
I think that a lock is the way to go in such a case. Using serializable transactions will affect performance just as bad, the difference being that rather than waiting, you have to keep retrying transactions until reordering succeeds.

NoSQL databases: what about read consistency?

From what I can make out NoSQL databases might be a good option for high intensity data read applications, but are a less good fit if you need to do also do a lot data updates and transactionality is very important to you (what with there being no ACID compliance). Right? Too simplistic maybe.
But anyway, supposing I'm partly right at least I'm now concerned about how NoSQL databases maintain a "read consistent" view of the data that you're either reading or writing. Or do they? And if they don't, isn't that a really big problem?
I mean, if the data that you're reading (or updating) is changing as you read it then you're potentially going to get an inconsistent/dirty result set. Coming from an Oracle rdbms background, where all this is just handled for you, I find it confusing how the lack of read consistency is anything but a big problem. Could well be though that I'm missing some key point about all this. Can someone set me straight?
I am a developer on the Oracle NoSQL Database and will answer your question relative to that particular NoSQL system.
The Oracle NoSQL Database API allows the programmer to specify -- with each API call -- the level of read consistency. The four possible values, ranging from strictest to loosest, are Absolute, Time, Version, and None. Absolute says to always read from the replication master so that the most current value is returned. "Time" says that the system can return a value from any replica that is at least within a certain time delta of the master (e.g. read the value from any replica that is within 2 seconds of the master). Every read and write call to the system returns a "version handle". This version handle may be passed into any read call when Consistency.Version is specified and it tells the system to read from any replica which is at least as up to date as that version. This is useful for Read Modify Write (aka CAS) scenarios. The last value, Consistency.None says that any replica can be used (i.e. there is no consistency guaranteed).
I hope this is helpful.
Charles Lamb
A NoSQL database can be read-consistent, although it's generally not a big problem if it's not strictly so, check out the CAP theorem. There's been quite a lot of research done in this area, I recommend reading Amazon's Dynamo paper for a quick view of some of the problems and solutions faced by distributed systems like NoSQL databases.
MongoDB allows the application to select the desired level of read consistency using "write concern". This concept allows your application to block until a certain condition is met for a given write.
By way of example, you can consider any write successful so long as the operation is communicated to a master server. Alternatively, you can block until a write has been propagated to a majority of nodes in your replica set. In this way, you can mix performance/consistency to taste.
It depends on the NoSQL database you are using as each implements a different strategy. You can read, for example, Riak's explanation of their "eventual consistency" model or Lars Hofhansel's writeup on ACID in HBase

What are atomic store types?

I've read the Core Data references on Apple's site. But I wonder, what they mean with atomic store types? They write something of "they have to be read and written in their entirety".
Could somebody clarify this, please?
Atomicity refers to the property of an entire operation happening before another operation can interrupt it-i.e. for that thread an "atomic" operation will complete if started. In real life this usually means checking the value of something at the start and end of an operation, and if that value changed apart from what the atomic operation performed re-attempt the operation.
It means that they cannot be in a state where they are half-written.