A situation when HTTP PUT is not idempotent


Consider the following scenario:
Alice updates item1 using HTTP PUT.
Bob updates item1 using HTTP PUT with different data.
Alice accidentally updates item1 using HTTP PUT again with the same data, for instance by using the back button in a browser.
Charlie reads the data.
Is this idempotent?

Is this idempotent?
Yes. The relevant definition of idempotent is provided by RFC 7231:
A request method is considered "idempotent" if the intended effect on the server of multiple identical requests with that method is the same as the effect for a single such request.
However, the situation you describe is that of a data race -- the representation that Charlie receives depends on the order that the server applies the PUT requests received from Alice and Bob.
The usual answer to avoiding lost writes is to use requests that target a particular version of the resource to update. This is analogous to using compare-and-swap semantics on your request: a write that loses the data race gets dropped on the floor.
For example:
x = 7
x.swap(7, 8) # Request from Alice changes x == 7 to x == 8
x.swap(8, 9) # Request from Bob changes x == 8 to x == 9
x.swap(7, 8) # No-Op, this request is ignored, x == 9
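The same idea as a runnable Python sketch, assuming a single in-memory value. `Register` and `swap` are hypothetical names that mirror the pseudocode above; they are not part of any HTTP library.

```python
class Register:
    """A single value that can only be changed with compare-and-swap."""

    def __init__(self, value):
        self.value = value

    def swap(self, expected, new):
        """Set the value to `new` only if it currently equals `expected`.

        Returns True if the write was applied, False if it was dropped.
        """
        if self.value != expected:
            return False  # this write lost the race: no-op
        self.value = new
        return True


x = Register(7)
print(x.swap(7, 8))  # True: Alice's request, x == 8
print(x.swap(8, 9))  # True: Bob's request, x == 9
print(x.swap(7, 8))  # False: Alice's replay is ignored
print(x.value)  # 9
```

Alice's replayed request is dropped because the value it expects (7) is no longer current.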
In HTTP, the Conditional Requests specification gives you a way to take simple predicates and lift them into the metadata, so that generic components can understand the semantics of what is going on. This is done with validators like ETag.
The basic idea is this: the server provides, in the metadata, a representation of the validator associated with the current representation of the resource. When the client wants to make a request on the condition that the representation hasn't changed, it includes that same validator in the request. The server is expected to recalculate the validator using the current state of the server side resource, and apply the change only if the two validator representations match.
If the origin server rejects a request because the expected precondition headers are missing from the request, it can use 428 Precondition Required to classify the nature of the client error.
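A minimal sketch of how an origin server might evaluate If-Match on a PUT, assuming (as in the related question below) that a version counter serves as the ETag. `VersionedResource` and `handle_put` are hypothetical, headers are a plain dict, and for brevity only a single entity-tag in If-Match is handled.

```python
class VersionedResource:
    """In-memory resource whose strong ETag is a version counter."""

    def __init__(self, data):
        self.data = data
        self.version = 1

    def etag(self):
        return f'"{self.version}"'


def handle_put(resource, headers, new_data):
    """Evaluate If-Match before applying a PUT; returns (status, headers)."""
    if_match = headers.get("If-Match")
    if if_match is None:
        # This server insists on conditional writes.
        return 428, {}  # Precondition Required
    if if_match != resource.etag():
        # Someone else changed the resource since the client's GET.
        return 412, {}  # Precondition Failed
    resource.data = new_data
    resource.version += 1
    return 204, {"ETag": resource.etag()}


item1 = VersionedResource({"owner": "nobody"})
tag = item1.etag()  # what Alice saw on her GET
print(handle_put(item1, {"If-Match": tag}, {"owner": "alice"})[0])  # 204
print(handle_put(item1, {"If-Match": tag}, {"owner": "alice"})[0])  # 412
print(handle_put(item1, {}, {"owner": "bob"})[0])  # 428
```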

Yes, this is idempotent. If this behavior is wrong for you, we would need to know the business logic behind it.

Related

RESTful - How to update a subresource and what ETag/payload to return concerning optimistic locking?

As an example, I have an order where the invoicing address can be modified. The change might trigger various additional actions (e.g. creating a cancellation invoice and a new invoice with the updated address).
As recommended by various sources (see below), I don't want to have a PATCH on the order resource, because it has many other properties; instead, I want to expose a dedicated endpoint, also called an "intent" resource or subresource in the web links below:
/orders/{orderId}/invoicing-address
Should I use a POST or a PATCH against this subresource?
The invoicing address itself has no ID. In the domain layer it is represented as a value object that is part of the order entity.
What ETag should be used for the subresource?
The address is part of the order and together with the items they form an aggregate in the domain layer. When the aggregate is updated it gets a new version number in the database. That version number is used as an ETag for optimistic locking.
Should a GET on invoicing-address respond with the order aggregate version number or a hash value of the address DTO in the ETag header?
What payload should be returned after updating the address?
Since the resource is the invoicing address it seems natural to return the updated address object (maybe with server side added fields). Should the body also include the ID/URI and the ETag of the order resource?
None of the examples I found with subresources showed any server responses or considered optimistic locking.
https://rclayton.silvrback.com/case-against-generic-use-of-patch-and-put
https://www.thoughtworks.com/insights/blog/rest-api-design-resource-modeling
https://softwareengineering.stackexchange.com/questions/371273/design-update-properties-on-an-entity-in-a-restful-resource-based-api (see provided answer)
https://www.youtube.com/watch?v=aQVSzMV8DWc&t=188s (Jim Webber at about 31 mins)

As far as REST is concerned, "subresources" aren't a thing. /orders/12345/invoicing-address identifies a resource. The fact that this resource has a relationship with another resource identified by /orders/12345 is irrelevant.
Thus, the invoicing-address resource should understand HTTP methods exactly the same way as every other resource on the web.
Should I use a POST or a PATCH against this subresource?
Use PUT/PATCH if you are proposing a direct change to the representation of the resource. For example, these are the HTTP methods we would use if we were trying to fix a spelling error in an HTML document (PUT if we were sending a complete copy of the HTML document; PATCH if we were sending a diff).
PUT /orders/12345/invoicing-address
Content-Type: text/plain
1060 W Addison St.
Chicago, IL
60613
On the other hand, if you are proposing an indirect change to the representation of the resource (the request shows some information to the server, and the server is expected to compute a new representation itself)... well, we don't have a standardized method that means exactly that; therefore, we use POST.
POST serves many useful purposes in HTTP, including the general purpose of “this action isn’t worth standardizing.” -- Fielding, 2009
What ETag should be used for the subresource?
You should first give some thought to whether you want to use a strong validator or a weak validator:
A strong validator is representation metadata that changes value whenever a change occurs to the representation data that would be observable in the content of a 200 (OK) response to GET.
...
In contrast, a weak validator is representation metadata that might
not change for every change to the representation data.
...
a weak entity-tag ought to change whenever the origin server wants caches to invalidate old responses.
I might use a weak validator if the representation included volatile but insignificant information; I don't need clients to refresh their copy of a document because it doesn't have the latest timestamp metadata. But I probably wouldn't use an "aggregate version number" if I expected the aggregate to be changing more frequently than the invoicing-address itself changes.
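A small Python sketch of that trade-off, assuming the order aggregate is a plain dict; `address_etag` and `aggregate_etag` are hypothetical helpers. A validator computed from the address representation alone survives an unrelated change to the order, while one based on the aggregate version does not.

```python
import hashlib
import json


def address_etag(address):
    """Strong ETag computed from the invoicing-address representation only."""
    body = json.dumps(address, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def aggregate_etag(order_version):
    """ETag that reuses the order aggregate's version number."""
    return f'"v{order_version}"'


order = {
    "version": 3,
    "items": ["book"],
    "invoicing_address": {"street": "1060 W Addison St.", "city": "Chicago"},
}
before = (address_etag(order["invoicing_address"]), aggregate_etag(order["version"]))

# Adding an item bumps the aggregate version but leaves the address untouched.
order["items"].append("lamp")
order["version"] += 1
after = (address_etag(order["invoicing_address"]), aggregate_etag(order["version"]))

print(before[0] == after[0])  # True: clients' cached address stays valid
print(before[1] == after[1])  # False: clients would have to refetch
```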
What payload should be returned after updating the address?
See 200 OK.
In the case of a POST request, sending the current representation of the resource (after changes have been made to it) is nice because the response is cacheable (assuming you include the appropriate metadata signals in the response headers).
Responses to PATCH have similar rules to POST (see RFC 5789).
PUT is the odd one out here:
Responses to the PUT method are not cacheable.
Should the body also include the ID/URI and the ETag of the order resource?
Entirely up to you - HTTP components aren't going to be paying attention to the representation, so you can design that representation as makes sense to you. On the web, it's perfectly normal to return HTML documents with links to other HTML documents.

Should API PUT endpoint receive all parameters, even if they are not editable?

There is an object named Car in the backend database. It contains several fields:
id
name
age
vinNumber
retailerId
There is also an API that facilitates adding and editing cars:
POST /car - creates a car
PUT /car/{carId} - updates a car
The API user can provide name, age and vinNumber in the POST body when creating a car.
When updating a car, the user can edit name and age. vinNumber cannot be edited after the car has been created.
Also, retailerId is not editable, since it comes from another system into the backend database.
That said, we have two fields that should not be edited through the API: vinNumber and retailerId.
So, taking REST idempotency into account, should the PUT request require the API user to also provide vinNumber and retailerId, as received earlier from a GET request, even though these parameters are not editable?

An important thing to recognize -- the HTTP specification describes the semantics of an HTTP request; what does this message mean? It allows clients and servers implemented by different organizations to collaborate without requiring a direct partnership between the two.
The point being that a generic client can prepare a request for a generic server without needing out of band information.
PUT, semantically, is a request that the server change its current representation of a resource to match the client's local copy.
If "the server" was just an anemic data store (a facade in front of a file system, or a document database), then the effect of PUT at the server would just be to write the message-body as is into storage.
The point of REST, and the uniform interface, is that your server should always understand the messages the same way that the anemic facade understands them.
Similarly, your server should use the same shared semantics for its responses.
If the representations you are working with include vinNumber and retailerId, then the client should be sending those fields unless the request is to remove them from the representation altogether (which may or may not be allowed, depending on whether or not they are required).
The server should understand that the request missing those fields is trying to remove them, and a request with new values in those fields is trying to change them. It can then decide what it wants to do with that request, and send the corresponding response.
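One way a server could apply those semantics, sketched in Python under stated assumptions: the car representation is a dict with name, age, vinNumber and retailerId, the id comes from the URI, and answering 409 Conflict to attempts to change or drop read-only fields is a hypothetical policy choice (a server might prefer a different error status).

```python
READ_ONLY_FIELDS = ("vinNumber", "retailerId")


def handle_put_car(stored, body):
    """Treat a PUT body as the complete new representation of a car.

    Returns (status, representation now held by the server).
    """
    for field in READ_ONLY_FIELDS:
        if field not in body:
            # A missing field means "remove it"; this server refuses.
            return 409, stored
        if body[field] != stored[field]:
            # A different value means "change it"; also refused.
            return 409, stored
    # Everything else is replaced by the client's copy, as PUT asks.
    return 200, dict(body)


car = {"name": "Herbie", "age": 5, "vinNumber": "VIN123", "retailerId": "R9"}
status, car = handle_put_car(car, {**car, "name": "Herbie II"})
print(status, car["name"])  # 200 Herbie II
print(handle_put_car(car, {"name": "X", "age": 1})[0])  # 409
```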
Roy Fielding wrote about GET semantics in 2002:
HTTP does not attempt to require the results of a GET to be safe. What it does is require that the semantics of the operation be safe, and therefore it is a fault of the implementation, not the interface or the user of that interface, if anything happens as a result that causes loss of property (money, BTW, is considered property for the sake of this definition).
The same idea holds for PUT (and also the other HTTP methods); we hold the implementation responsible for loss of property if its handling of the request doesn't match the semantics.

According to the PUT request documentation, one should provide the complete data (i.e. vinNumber and retailerId as well): https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods
You could use PATCH instead for such cases.
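If PATCH is used, one standardized patch format is JSON Merge Patch (RFC 7396, media type application/merge-patch+json), where the body lists only the members to change and null removes a member. Below is a minimal Python sketch of its merge algorithm; a server would still need to reject patches that touch vinNumber or retailerId.

```python
def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7396) to an already-parsed JSON value."""
    if not isinstance(patch, dict):
        return patch  # a non-object patch replaces the target entirely
    if not isinstance(target, dict):
        target = {}
    result = dict(target)  # leave the caller's object untouched
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "remove this member"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


car = {"name": "Herbie", "age": 5, "vinNumber": "VIN123", "retailerId": "R9"}
print(merge_patch(car, {"age": 6, "name": None}))
# {'age': 6, 'vinNumber': 'VIN123', 'retailerId': 'R9'}
```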
Also, what we did initially, and what I have seen many times, is POST /car/{carId}.

What is the proper response status for a REST API GET returning no content?

I have an endpoint like so:
GET /api/customer/primary
If a primary customer exists, I return something like
{
"name": "customerName"
}
But what if I send a GET and a primary customer doesn't exist?
Is it better to send a 200 OK with an empty JSON object {}?
Or is it better to only send a 204 No Content?

404 is the appropriate status code for this. You're trying to represent a 'primary customer' as a resource, but in some cases this relationship doesn't exist. This situation is pretty clear: it should be a 404 for GET requests.
This is a perfectly acceptable way to communicate this. A 404 might signal a client that the resource doesn't exist yet, and perhaps that it can be created with PUT.
204 No Content has a specific meaning, and doesn't make much sense for your case. 204 is not just meant to signal that there's not going to be a response body (Content-Length: 0 can do that); it has a more specific application for hypermedia applications. Specifically, it signals that when a user performs an action that results in the 204, the view shouldn't refresh. This makes sense for, for example, an "Update" operation where a user can occasionally save their progress while working on a document. Contrast this with 205 Reset Content, which signals that the 'view' should reset so that (perhaps) a new document can be created from scratch.
Most applications don't go this far. Frankly, I haven't seen a single one. Given that, returning 200 with Content-Length: 0 or 204 No Content is an almost completely irrelevant discussion. The HTTP specification certainly doesn't forbid 200 OK with Content-Length: 0.
That was a bit of a tangent. To conclude, 404 signals that this 'thing' doesn't exist, and that's appropriate here. There aren't multiple interpretations: there are the people who wrote the specifications, those who read them well, and, on the other side of the discussion, the people who are wrong.

But what if I send a GET and a primary customer doesn't exist?
Is it better to send a 200 OK with an empty JSON object {}?
Or is it better to only send a 204 No Content?
If I'm interpreting your question correctly, you aren't really asking about status codes, but rather what kind of schema should you be using to manage the different cases in your API.
For cases like REST, where the two ends of the conversation are not necessarily controlled by the same organization and same release cycle, you may need to consider that one side of the conversation is using a more recent schema version than the other.
So how is that going to be possible? The best treatments I have seen focus on designing schema for extension - new fields are optional, and have documented semantics for how they should be understood if a field is absent.
From that perspective
{}
Doesn't look like a representation of a missing object - it looks like a representation of an object with default values for all of the optional fields.
It might be that what you want is something like Maybe or Option, where instead of promising to send back an object or not, you are promising to send back a collection of zero or one object. I would normally expect collections to be represented in JSON as an array rather than an object.
[]
Now, with that idea in pocket, I think it's reasonable to decide that you are returning a representation of a Maybe, where the representation of None is zero bytes long, and the representation of Some(object) is the JSON representation of the object.
So in that design 204 when returning None makes a lot of sense, and you can promise that if a successful response returns a body, that there is really something there.
There's a trade off here - the list form allows consumers to always parse the data, but they have to do that even when a None is sent. On the other hand, using the empty representation for None saves a parse, but requires that the consumer be paying attention to the content length.
So, looking back to your two proposals, I would expect that using 204 is going to be the more successful long term approach.
Another possibility would be to return the null primitive type when you want to express that there is no object available. This would go with a 200 response, because the content length would be four bytes long.
null
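A minimal Python sketch of the Maybe design described above, assuming the server and client exchange a status code and raw body bytes; `encode_primary_customer` and `decode_primary_customer` are hypothetical names. None travels as 204 with zero bytes, and Some(customer) as 200 with JSON.

```python
import json


def encode_primary_customer(customer):
    """Server side: map an optional customer to (status, body bytes)."""
    if customer is None:
        return 204, b""  # None: no content at all
    return 200, json.dumps(customer).encode("utf-8")


def decode_primary_customer(status, body):
    """Client side: a 204 or an empty body means there is no primary customer."""
    if status == 204 or len(body) == 0:
        return None
    return json.loads(body)


print(encode_primary_customer(None))  # (204, b'')
print(decode_primary_customer(*encode_primary_customer({"name": "customerName"})))
# {'name': 'customerName'}
```

The client has to check the status (or content length) before parsing, which is the trade-off mentioned above.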

The reason phrase of HTTP 404 ("Not Found") is the closest match to the situation, but:
The first digit of the Status-Code defines the class of response. The last two digits do not have any categorization role. There are 5 values for the first digit:
1xx: Informational - Request received, continuing process
2xx: Success - The action was successfully received, understood, and accepted
3xx: Redirection - Further action must be taken in order to complete the request
4xx: Client Error - The request contains bad syntax or cannot be fulfilled
5xx: Server Error - The server failed to fulfill an apparently valid request
(reference)
In practice, 4xx is recognized as an error, and it is likely that some alerts will be raised by network, security, or logging infrastructure.
The semantics of 204 indicate that the server has successfully fulfilled the request and that there is no additional content to send, which is not exactly what is happening here.
A common use case is to return 204 as a result of a PUT request, updating the resource.
Therefore I would recommend using either:
HTTP 200 with an empty object / array
like you suggested.
HTTP 200 returning a null object, e.g.:
"none" (valid JSON)
or
{
"name": "NO_PRIMARY_CUSTOMER"
}
(implementation of such a null object depends on your specific system behavior with the returned data)
Custom HTTP 2xx code with an empty result
Less common, but still workable alternative is to return a custom HTTP code within the 2xx range (e.g. HTTP 230) with an empty result.
This option should be used with extra caution or even avoided if the API is exposed to a wide audience that may use unknown tools to access / monitor the API.

Server return status 200 but client doesn't receive it because network connection is broken

I have a REST service and a client (Android app) that sends POST requests to the REST service. On the client side there are documents (orders) that need to be synchronized with the web server. Synchronization means that the client sends a POST request to the REST service for each order. When the REST service receives a POST request, it writes the data to the database and sends a response with status 200 to the client. The client receives the 200 and marks that order as synchronized.
The problem is when the connection breaks after the server has sent the 200 response but before the client has received it. The client doesn't mark the order as synchronized. The next time, the client sends this order again and the server writes it to the database again, so we have the same order twice.
What is good practice to deal with this kind of problem?

The problem is when the connection breaks after the server has sent the 200 response but before the client has received it. The client doesn't mark the order as synchronized. The next time, the client sends this order again and the server writes it to the database again, so we have the same order twice.
Welcome to the world of unreliable messaging.
What is good practice to deal with this kind of problem?
You should review Nobody Needs Reliable Messaging, by Marc de Graauw (2010).
The cornerstone of reliable messaging is idempotent request handling. Idempotent semantics are described this way:
A request method is considered "idempotent" if the intended effect on the server of multiple identical requests with that method is the same as the effect for a single such request.
Simply fussing with the request method, however, doesn't get you anything. First, the other semantics in the message may not align with the idempotent request methods, and second, the server needs to know how to implement the effect as intended.
There are two basic patterns to idempotent request handling. The simpler of these is set, meaning "overwrite the current representation with the one I am providing".
// X == 6
server.setX(7)
// X == 7
server.setX(7) // a second, identical request, but the _effect_ is the same.
// X == 7
The alternative is test and set (sometimes called compare and swap); in this pattern, the request has two parts: a predicate to determine whether some condition holds, and the change to apply if the condition does hold.
// X == 6
server.testAndSetX(6,7)
// X == 7
server.testAndSetX(6,7) // this is a no-op, because 7 != 6
// X == 7
That's the core idea.
From your description, what you are doing is manipulating a collection of orders.
The same basic idea works there. If you can calculate a unique identifier from the information in the request, then you can treat your collection like a set/key-value store.
// collection.get(Id.of(7)) == Nothing
collection.put(Id.of(7), 7)
// collection.get(Id.of(7)) == Just(7)
collection.put(Id.of(7), 7) // a second, identical request, but the _effect_ is the same.
// collection.get(Id.of(7)) == Just(7)
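The same key-value idea in Python, assuming the identifier can be a SHA-256 hash of the request's canonical JSON; `id_of` and `put` are hypothetical names mirroring the pseudocode. Note the assumption baked in: two requests with identical content are treated as the same item.

```python
import hashlib
import json

collection = {}  # keyed by identifier, so it behaves like a set, not a list


def id_of(value):
    """Derive an identifier from the request data itself."""
    canonical = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def put(value):
    """Idempotent insert: a repeated request overwrites its own entry."""
    collection[id_of(value)] = value


put(7)
put(7)  # a second, identical request
print(len(collection))  # 1
```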
When that isn't an option, you need some property of the collection that will change when your edit is made, encoded into the request:
if (collection.size() == 3) {
collection.append(7)
}
A generic way to manage something like this is to consider version numbers -- each time a change is made, the version number is incremented as part of the same transaction
// begin transaction
if (resource.version.get() == expectedVersion) {
resource.version.set(1 + expectedVersion)
resource.applyChange(request)
}
// end transaction
For a real world example, consider JSON Patch, which includes a test operation that can be used as a condition to prevent "concurrent" modification of a document.
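A minimal Python sketch of how a JSON Patch test operation guards a change. It implements only the "test" and "replace" operations from RFC 6902, and only non-root JSON Pointers into objects, so it is a sketch rather than a complete implementation; `apply_patch` is a hypothetical name.

```python
import copy


def _resolve(doc, pointer):
    """Return (parent, key) for a non-root JSON Pointer into nested objects."""
    parts = [p.replace("~1", "/").replace("~0", "~") for p in pointer.split("/")[1:]]
    parent = doc
    for part in parts[:-1]:
        parent = parent[part]
    return parent, parts[-1]


def apply_patch(doc, operations):
    """Apply "test" and "replace" operations; all-or-nothing.

    Works on a deep copy, so if any "test" fails the original is unchanged.
    """
    result = copy.deepcopy(doc)
    for op in operations:
        parent, key = _resolve(result, op["path"])
        if op["op"] == "test":
            if parent[key] != op["value"]:
                raise ValueError("test failed at " + op["path"])
        elif op["op"] == "replace":
            parent[key] = op["value"]
        else:
            raise NotImplementedError(op["op"])
    return result


doc = {"version": 3, "status": "open"}
patch = [
    {"op": "test", "path": "/version", "value": 3},
    {"op": "replace", "path": "/status", "value": "paid"},
    {"op": "replace", "path": "/version", "value": 4},
]
doc = apply_patch(doc, patch)
print(doc)  # {'version': 4, 'status': 'paid'}
try:
    apply_patch(doc, patch)  # a repeated request: version is now 4
except ValueError as err:
    print(err)  # test failed at /version
```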
What we're describing in all of these test and set scenarios is the notion of a conditional request
Conditional requests are HTTP requests [RFC7231] that include one or more header fields indicating a precondition to be tested before applying the method semantics to the target resource.
What the conditional requests specification gives you is a generic way to describe conditions in the metadata of your requests and responses, so that generic HTTP components can usefully contribute.
Note well: what this work gets us is not a guarantee that the server will do what the client wants. Instead, it's a weaker guarantee: that the client can safely repeat the request until it receives the acknowledgement from the server.

Surely your documents must have a unique identifier. The semantically correct way would be to PUT the document to a URI that contains that identifier, with an If-None-Match: * header.
Then the server checks whether a document with that identifier already exists, and responds with 412 Precondition Failed if that is the case.
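A minimal Python sketch of that conditional create, assuming the identifier is part of the target URI (for example a hypothetical /orders/abc-123) and the server keeps resources in a dict. `If-None-Match: *` asks the server to apply the PUT only if the target has no current representation; `handle_put_create_only` is a hypothetical name.

```python
store = {}  # path -> stored representation


def handle_put_create_only(path, headers, body):
    """Apply a PUT, honoring If-Null-Match semantics for "*"; returns a status."""
    if headers.get("If-None-Match") == "*" and path in store:
        return 412  # Precondition Failed: something already exists here
    existed = path in store
    store[path] = body
    return 200 if existed else 201


print(handle_put_create_only("/orders/abc-123", {"If-None-Match": "*"}, {"total": 10}))  # 201
print(handle_put_create_only("/orders/abc-123", {"If-None-Match": "*"}, {"total": 10}))  # 412
```

On a retry after a lost response, the client can treat the 412 as confirmation that the document was already stored.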

One possible option would be validation on the server side. An order should have some uniqueness parameter: a name, an id, or something else. This parameter must also be sent by the client. Then you take this value (e.g. if the name is unique and the client sends it) and look up the order in the database. If the order is found, you don't need to save it to the database and should send a 409 Conflict response to the client. If you didn't find the order in the database, then you save it and send a 201 Created response.
Best practices:
201 Created for POST
409 Conflict if the resource already exists

Your requests should be idempotent.
From your description, you should be using PUT instead of POST.
Client-side generated IDs (GUIDs) and server-side upsert logic help achieve this.
This way you can implement a retry logic client side for failed requests, without introducing multiple records.
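A sketch of that approach in Python: the client generates a UUID once, stores it with the local order, and reuses it on every retry, while the server upserts by that id. `upsert_order` is hypothetical, and the lost response is simulated by simply calling it twice.

```python
import uuid

orders_by_id = {}


def upsert_order(order_id, order):
    """Server side: PUT /orders/{order_id} inserts or replaces (upsert).

    Returns 201 on first creation and 200 when a retry overwrites it.
    """
    status = 200 if order_id in orders_by_id else 201
    orders_by_id[order_id] = order
    return status


# Client side: the id is generated once and saved with the local order.
local_order = {"id": str(uuid.uuid4()), "items": ["widget"]}
print(upsert_order(local_order["id"], local_order))  # 201, but the response is lost
print(upsert_order(local_order["id"], local_order))  # 200 on retry
print(len(orders_by_id))  # 1
```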

How to handle network connectivity loss in the middle of REST POST request?

REST POST is used to create resources.
Let's say we have resource url
"http://example.com/cars"
We want to create a new car.
We POST to "http://example.com/cars" with JSON payload containing car properties (color, weight, model, etc).
Server receives the request, creates a new car, sends a response over the network.
At this point network fails (let's say router stops working properly and ignores every packet).
Client fails with TCP timeout (like 90 seconds).
Client has no idea whether car was created or not.
Also, the client hasn't received the car's resource id, so it can't GET it to check whether it was created.
Now what?
How do you handle this?
You can't simply retry creating, because retrying will just create a duplicate (which is bad).

REST POST is used to create resources.
HTTP POST is used for lots of things. REST doesn't particularly care; it just wants resources that support a uniform interface, and hypermedia.
At this point network fails
Bummer!
Now what? How do you handle this? You can't simply retry creating, because retrying will just create a duplicate (which is bad).
This is a general messaging concern, not directly related to REST. The most common solution is to use the Idempotent Receiver pattern. In short, you need to define your messages so that the receiver has enough information to recognize the request as something that has already been done.
Ideally, this is being supported at the business level.
Idempotent collections of values are often straightforward; we just need to think in sets rather than lists.
Idempotent collections of entities are trickier; if the request includes an identifier for the new entity, or if we can compute one from the data provided, then we can think of our collection as a hash.
If none of those approaches fits, then there's another possibility. Instead of performing an idempotent mutation of the collection, we make the mutation of the collection itself idempotent. Think "compare and swap": we encode into the request information that identifies the current state of the collection; if that state is still current when the request arrives, then the mutation is applied. If the condition does not hold, then the request becomes a no-op.
Translating this into HTTP, we make a small modification to the protocol for updating the collection resource. First, we GET the current representation; in the metadata, the server provides validators that can be used in subsequent requests. Having obtained the validator, the client evaluates the current representation of the resource to determine whether it needs to be changed. If the client decides to make a change, it submits the change with an If-Match or an If-Unmodified-Since header that includes the validator. Before processing the request, the server checks the validator and, if it no longer matches the current state, immediately abandons the request with 412 Precondition Failed.
Thus, if a conditional state-changing request is lost, the client can at its own discretion repeat the request without concern that server will misunderstand the client's intent.

Retry it a limited number of times, with increasing delays between the attempts, and make sure the transaction concerned is idempotent.
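A generic Python sketch of retrying with increasing (doubling) delays, assuming the transport signals failure with ConnectionError or TimeoutError. `retry` is a hypothetical helper, and `sleep` is injectable so the delays can be observed without waiting. It is only safe to use with an idempotent request.

```python
import time


def retry(request, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call `request` up to `attempts` times, doubling the delay after each failure.

    Re-raises the last error if every attempt fails.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except (ConnectionError, TimeoutError):
            if attempt == attempts:
                raise  # give up and let the caller decide
            sleep(delay)
            delay *= 2


calls = []


def flaky():
    """Hypothetical request that times out twice, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("timed out")
    return 201


delays = []
print(retry(flaky, sleep=delays.append))  # 201
print(delays)  # [0.5, 1.0]
```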
because retrying will just create a duplicate (which is bad).
It is indeed, and it needs fixing, see above. It should be impossible in your system to create two entries with the same attributes. This is easily accomplished at the database level. You can attain idempotence by having the transaction return the same thing whether the entry already existed or was newly created. Or else just have it return EXISTS if the entry already exists, and adjust your client accordingly.
