SimpleDB: Guaranteed to see all item attributes if we see the item? (non-consistent read)

SimpleDB: Guaranteed to see all item attributes if we see the item? (non-consistent read) - nosql

I've just discovered an assumption in my use of SimpleDB. I suspect it's safe but would like other opinions since the docs don't seem to cover it.
So say Process 1 stores an item with x attributes. When Process 2 tries to access said item (without consistent read) & finds it, is it guaranteed to have all the attributes stored by Process 1?
I'm excluding the possibility that another process could have changed the data.
I also know that Process 2 has no guarantee of seeing the item unless consistent read is used, I'm just talking about the point when it does eventually see it.
I guess the question is, once I can get an item & am not changing it anywhere else can I assume it has an ad-hoc fixed schema and access all my expected attributes without checking they actually exist?
I don't want to be in a situation where I need to keep requesting items until they have all the attributes I need to use them.
Thanks.

Although Amazon makes no such guarantees in the documentation the current implementation of their eventual consistency guarantees that you'll see all the properties stored by Process 1 or none of them.
See this thread over at the AWS forums and more specifically this answer by an Amazon employee confirming the behavior (emphasis mine).
I don't think we make that guarantee in the documentation, but the
current implementation treats each Put request as a bundle. It won't
split the request up and apply the operations piecemeal. You'll get
either step-1 responses or step-2 responses until eventual consistency
shakes out and leaves you with step-2 responses.
While this is undocumented behavior I suspect quite a few SimpleDB users are relying on it now and as such Amazon won't be likely to change it anytime soon, but that's ju my guess.

Related

How to manage a pool via a RESTful interface

As I am not sure I stated the question very well originally, I am restating it to see if there is a better response.
I have a problem with how best to manage a specific kind collection with a RESTful API. To help illustrate the issue I have I will use an simple artificial example. Lets call it the 'Raffle Ticket Selector'. For this question I am only interested in how to perform one function.
I have a collection of unpurchased raffle tickets (raffleTickets). Each with a unique Raffle Number along with other information.
I need to be able to take an identified number of tickets (numTickets) from the raffleTickets collection without uniquely selecting them. The collection itself has a mechanism for random selection.
The result is that I am returned 5 unique tickets from the collection and the size of the collection is decreased by 5 as the 5 returned have been removed.
The quesition is, how do I do it in a RESTfull way?
I intuatively want to do METHOD .../raffelTickets?numTickets=5 but struggle with which HTTP Method to use
In answering; you are not allowed to suggest that I just PATCH/PUT a status change to effect a removal by marking them taken. It must result an actual change in the cardanality of the collection.
Note: Calling the method twice will return a different result set every time and will always alter the collection on which it is performed (unless it is empty!)
So what method should I use? PUT? POST? DELETE? PATCH? Identpotent restrictions would seem to only leave me with POST and PATCH neither of which feels ideal to me. Or perhaps there is another way of providing the overall behavior that is considered the correct approach.
I am really interested to know what is best practice and understand why.
Cheers
Original Post on which the first response was based:
I have a pool of a given item which is to be managed with a RESTful API. Now adding items to the pool is not an issue but how to I take items from the pool? Is it also a POST or is it a DELETE?
Lets say it is a pool of random numbers and I want to retrieve a variable number of items in a single method call.
I have two scenarios:
I am not checking them out as once taken they will not be returned to the pool.
I only want to check them out and they effectively remain part of the pool but have a status altered to 'inUse'
The important thing in each case is I do not care which items I get, I just want N of them.
What is considered the RESTful way performing each of the two actions on the pool? I have an opinion on the second option but I dither on the former so I am interested in your thoughts for both so I better understand the thought pattern
Thanks

Not sure if I understood well your question. It will mostly depend on the way you developed the API side of your REST communication.
In a generic solution, you would use DELETE to take items out of a list. However, if you just want to PARTIALY update the items, you could use PATCH instead of POST or PUT.
Give this a look: http://restcookbook.com/HTTP%20Methods/patch/

Is it RESTful do DELETE collections?

Some say it's "often not desirable" for a REST server to allow the DELETEion of the entire collection of entities.
DELETE http://www.example.com/customers
Is this a real rule for achieving RESTful nirvana?
And what about sub-collections, defined by query parameters?
DELETE http://www.example.com/customers?gender=m

The answer to this depends more on the requirements and risks of your application than on the inherent RESTfulness of either construct.
It's "not often desirable" to delete an entire collection if you imagine the collection as something with enduring importance like a customer list. It doesn't break with some essential REST wisdom.
If the collection contains information that a user should be able to delete, and potentially a lot of such information, DELETE of the entire collection can be the nicest REST-ish way to go, rather than run a lot of individual DELETEs.
Deleting based on criteria (e.g. the query parameter) is so essential to some applications that if the REST police declared it Officially UnRESTful I would continue to do it without shame.

(They actually say "not often desirable," which one might interpret slightly differently than "often not desirable.")
Yes, it's RESTful. If you have a valid use case, it's fine to do it. Your second scenario (deleting with a query) is frequently useful, and can be an easy way to reduce the number of HTTP requests the client has to make.
Edit: as #peeskillet says, do consider if you actually want to delete something, versus change some flag on the record (e.g. "active").

EventStore basics - what's the difference between Event Meta Data/MetaData and Event Data?

I'm very much at the beginning of using / understanding EventStore or get-event-store as it may be known here.
I've consumed the documentation regarding clients, projections and subscriptions and feel ready to start using on some internal projects.
One thing I can't quite get past - is there a guide / set of recommendations to describe the difference between event metadata and data ? I'm aware of the notional differences; Event data is 'Core' to the domain, Meta data for describing, but it is becoming quite philisophical.
I wonder if there are hard rules regarding implementation (querying etc).
Any guidance at all gratefully received!

Shamelessly copying (and paraphrasing) parts from Szymon Kulec's blog post "Enriching your events with important metadata" (emphases mine):
But what information can be useful to store in the metadata, which info is worth to store despite the fact that it was not captured in
the creation of the model?
1. Audit data
who? – simply store the user id of the action invoker
when? – the timestamp of the action and the event(s)
why? – the serialized intent/action of the actor
2. Event versioning
The event sourcing deals with the effect of the actions. An action
executed on a state results in an action according to the current
implementation. Wait. The current implementation? Yes, the
implementation of your aggregate can change and it will either because
of bug fixing or introducing new features. Wouldn’t it be nice if
the version, like a commit id (SHA1 for gitters) or a semantic version
could be stored with the event as well? Imagine that you published a
broken version and your business sold 100 tickets before fixing a bug.
It’d be nice to be able which events were created on the basis of the
broken implementation. Having this knowledge you can easily compensate
transactions performed by the broken implementation.
3. Document implementation details
It’s quite common to introduce canary releases, feature toggling and
A/B tests for users. With automated deployment and small code
enhancement all of the mentioned approaches are feasible to have on a
project board. If you consider the toggles or different implementation
coexisting in the very same moment, storing the version only may be
not enough. How about adding information which features were applied
for the action? Just create a simple set of features enabled, or map
feature-status and add it to the event as well. Having this and the
command, it’s easy to repeat the process. Additionally, it’s easy to
result in your A/B experiments. Just run the scan for events with A
enabled and another for the B ones.
4. Optimized combination of 2. and 3.
If you think that this is too much, create a lookup for sets of
versions x features. It’s not that big and is repeatable across many
users, hence you can easily optimize storing the set elsewhere, under
a reference key. You can serialize this map and calculate SHA1, put
the values in a map (a table will do as well) and use identifiers to
put them in the event. There’s plenty of options to shift the load
either to the query (lookups) or to the storage (store everything as
named metadata).
Summing up
If you create an event sourced architecture, consider adding the
temporal dimension (version) and a bit of configuration to the
metadata. Once you have it, it’s much easier to reason about the
sources of your events and introduce tooling like compensation.
There’s no such thing like too much data, is there?

I will share my experiences with you which may help. I have been playing with akka-persistence, akka-persistence-eventstore and eventstore. akka-persistence stores it's event wrapper, a PersistentRepr, in binary format. I wanted this data in JSON so that I could:
use projections
make these events easily available to any other technologies
You can implement your own serialization for akka-persistence-eventstore to do this, but it still ended up just storing the wrapper which had my event embedded in a payload attribute. The other attributes were all akka-persistence specific. The author of akka-persistence-eventstore gave me some good advice, get the serializer to store the payload as the Data, and the rest as MetaData. That way my event is now just the business data, and the metadata aids the technology that put it there in the first place. My projections now don't need to parse out the metadata to get at the payload.

Can I use claims to secure EF fields using PostSharp?

It it possible to use claims based permissions to secure EF fields using post sharp. We have a multi-tenanted app that we are moving to claims and also have issues of who can read/write to what fields. I saw this but it seems role based http://www.postsharp.net/aspects/examples/security.
As far as I can see it would just be a case of rewriting the ISecurable part.
We were hoping to be able to decorate a field with a permission and silently ignore any write to if if the user did not have permission. We were also hopping that if they read it we could swap in another value e.g. Read salary and get back 0 if you don't have a claim ReadSalary.
Are these standard sort of things to do I've never done any serious AOP. So just wanted a quick confirmation before I mention this as an option.

Yes, it is possible to use PostSharp in this case and it should be pretty easy to convert given example from RBAC to claims based.
One thing that has to be considered is performance. When decorated fields are accessed frequently during processing an use-case (e.g. are read inside a loop) then a lot of time is wasted in redundant security checks. Decorating a method that represent an end-user's use-case would be more appropriate.
I would be afraid to silently swapping values of fields when user has insufficient permission. It could lead to some very surprising results when an algorithm is fed by artificial not-expected data.

Forcing web api consumers to accept new fields in responses

I'm creating v2 of an existing RESTful web api.
The responses are JSON lists of objects, roughly in the form:
[
{
name1=value1,
name2=value2,
},
{
name1=value3,
name2=value4,
}
]
One problem we've observed with v1 is that some clients will access fields by integer position, instead of by name. This means that if we decide to add fields to the response (which we had originally considered a compatibility-preserving change), then some of our client's code breaks, unless we add the fields at the end. Even then, other clients code breaks anyway, because they will fail in some way when they encounter an unexpected attribute name.
To counter this in v2, we are considering randomly reordering the fields in every response. This will force clients to index fields by name instead of by position.
Additionally, we are considering adding a randomly-named field to every response. This will force clients to ignore fields they don't recognize.
While this sounds somewhat barmy, it does have the advantage that we will be able to add new fields, safe in the knowledge that this isn't breaking any clients. This means we can issue compatible updates to v2.1, v2.3, etc at the same URL, and that means we will only have to maintain & support a smaller number of API versions.
The alternative is to issue compatibility-breaking v3, v4, at new URLs, which means that we will have to maintain & support many incompatible API versions, which will stretch us that little bit thinner.
Is this a bad idea, and if so, why? Are there any other similar ideas I should think about?
Update: The first couple of responses are pointing out that if I document the issue (i.e. indicate in the docs that fields may be added or reordered) then I am no longer to blame if client code breaks when I subsequently add or reorder fields. Sadly I don't think this is an appropriate option for us: Many dozens of organisations rely on the functionality of our APIs for real-world transactions with substantial financial impact. These organisations are not technically oriented - and the resulting implementations at the client end cover the whole spectrum of technical proficiency. We already did document that fields may get added or reordered in the docs for v1, and that clearly didn't work, because now we're having to issue v2 because many clients, due to lack of time or experience or ability, still wrote code that breaks when we add new fields. If I were now to add fields to the interface, it breaks a dozen different company's interfaces to us, which means they (and us) are bleeding money every minute. If I were to refuse to revert the change or fix it, saying "They should have read the docs!", then I will soon be out of the job, and rightly so. We may attempt to educate the 'failing' partners, but this is doomed to fail as the problem gets larger every month as we continue to grow. My question is, can I systemically head the whole issue off at the pass, preventing this situation from ever arising, no matter what clients try to do? If the techniques I suggest would work, why shouldn't I use them? Why isn't everyone else using them?

If you want your media types to be "evolvable", make that point very clear in the documentation. Similarly, if the placement order of fields is not guaranteed, make that explicitly clear too. If you supply sample code for your API, make sure it does not rely on field ordering.
However, even assuming that you have to maintain different versions of your media types, you don't have to version the URI. REST gives you the ability to maintain the same version-agnostic URI but use HTTP content negotiation (via the Accept and Content-Type headers) to offer different payloads at the same URI.
Therefore any client that doesn't explicitly wish to accept your new v2/v3/etc encoding won't get it. By default, you can return the old v1 encoding with the original field ordering and all of those brittle client apps will work fine. However, new client developers will know (thanks to your documentation) to indicate via Accept that they are willing and able to see the new fields and they don't care about their order. Best of all, you can (and should) use the same URI throughout. Remember - different payloads like this are just different representations of the same underlying resource, so the URI should be the same.

I've decided to run with the described techniques, to the max. I haven't heard any objections to them that hold any water for me. Brian's answer, about re-using the same URI for different API versions, is a solid and much-appreciated complementary idea (with upvote), but I can't award it 'best answer' because it doesn't get to the core of my original question.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse