Transaction safety in RESTful APIs

I wonder how to establish transaction safety in RESTful APIs, where everything is built around single entities.
Example
Database model:
Invoice
Item
User-performed steps in the browser:
Change order number.
Add an item.
Remove an item.
Edit an item.
Requests made:
PATCH/PUT invoice data/order number.
POST item.
DELETE item.
PATCH/PUT item.
Issue
If an error happens after any of the requests above, further calls might compromise data integrity. Additionally, previous requests have to be undone. E.g. if deleting the item fails, then steps 1 and 2 have to be rolled back so the overall invoice ends up the way it was before.
Another problem that might arise is a browser crash, dying internet connection, server failure or whatever.
How can one make sure that certain actions are executed in some kind of transaction to maintain data integrity and safety?

So the thing to remember with REST is the "state transfer" bit. You are not telling the server the steps needed to update a resource, you are telling the server the state the resource should be in after the update, because you have already updated it on the client and are now simply transferring this new state over to the server.
So say you have an Invoice resource on the server that looks like this JSON:
{
invoice_id: 123,
invoice_description: "Some invoice",
invoice_items: [
{item_id: 10, item_desc: "Large item", order_amount: 34},
{item_id: 11, item_desc: "Small item", order_amount: 400}
]
}
And a user wants to edit that invoice as a single atomic transaction. Firstly they GET the invoice from the server. This essentially says "Give me the current state of the invoice"
GET /invoices/123
The user then edits that invoice any way they want. They decide the number of large items should be 40, not 34. They decide to delete the small items completely. And finally they decide to add an "Extra small" item to the invoice. After the user has edited the invoice, the client has the following invoice:
{
invoice_id: 123,
invoice_description: "Some invoice",
invoice_items: [
{item_id: 10, item_desc: "Large item", order_amount: 40},
{item_id: 30, item_desc: "Extra small item", order_amount: 5}
]
}
So the client has the invoice in a different state to the server. The user now wants to send this back to the server to be stored, so it PUTs the new state of the invoice to the server.
PUT /invoices/123
which essentially says "Here is the new state of this resource."
Now depending on how fancy you want your validation to be the server can simply accept this new state the invoice is in as it is, or it can do a whole load of validation for each change. How much you want to do is up to you.
You would at the very least want to check that no other client has PUT an updated invoice onto the server while this user was editing their copy of it. You can do this by checking the relevant headers of the HTTP request (such as the ETag header, http://en.wikipedia.org/wiki/HTTP_ETag).
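As a rough illustration of that check from the client's side, here is a minimal sketch using Python's requests library (the host name is made up, and it assumes the server emits an ETag and honours If-Match):

import requests

# GET the current state of the invoice and remember its version (ETag).
resp = requests.get("https://api.example.com/invoices/123")
invoice = resp.json()
etag = resp.headers.get("ETag")

# Edit the invoice locally, e.g. change the order amount of the first item.
invoice["invoice_items"][0]["order_amount"] = 40

# PUT the new state back, but only if nobody else has changed it in the meantime.
update = requests.put(
    "https://api.example.com/invoices/123",
    json=invoice,
    headers={"If-Match": etag},
)
if update.status_code == 412:  # Precondition Failed
    # Someone else PUT a newer invoice first: re-GET, re-apply the edits, try again.
    pass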
If for any reason the server decides that this update is not valid, it simply fails the entire PUT request. This is what gives you transactions in HTTP. A request should either work or fail. If it fails, it is the server's responsibility to make sure the resource has not been affected by the failed request. From an implementation point of view, on the server you would probably do some validation of the new JSON and then attempt to save the new data to the database within a DB transaction. If anything fails, the database is kept in its original state and the user is told that the PUT didn't work.
If the request fails, the user should be returned an HTTP status code and a response that explains why the PUT request failed. This might be because someone else edited the invoice while the user was thinking about their changes. It might be because the user is trying to put the invoice into an invalid state (say the user tried to PUT an invoice with no items and this breaks the business logic of the company).
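On the server, a rough sketch of that pattern could look like the following (framework-agnostic; the table layout, the version column used as the ETag, and the exact status codes are assumptions — the only point is that validation plus persistence happen inside a single database transaction):

import sqlite3

def put_invoice(conn: sqlite3.Connection, invoice_id, new_state, expected_version):
    # Business-rule validation before touching the database.
    if not new_state.get("invoice_items"):
        return 422, {"error": "an invoice must contain at least one item"}
    try:
        with conn:  # sqlite3 commits on success and rolls back if anything below raises
            row = conn.execute(
                "SELECT version FROM invoices WHERE id = ?", (invoice_id,)
            ).fetchone()
            if row is None:
                return 404, {"error": "no such invoice"}
            if str(row[0]) != expected_version:  # value taken from the If-Match header
                return 412, {"error": "invoice was changed by another client"}
            conn.execute("DELETE FROM invoice_items WHERE invoice_id = ?", (invoice_id,))
            for item in new_state["invoice_items"]:
                conn.execute(
                    "INSERT INTO invoice_items (invoice_id, item_desc, order_amount) "
                    "VALUES (?, ?, ?)",
                    (invoice_id, item["item_desc"], item["order_amount"]),
                )
            conn.execute(
                "UPDATE invoices SET version = version + 1 WHERE id = ?", (invoice_id,)
            )
    except Exception:
        return 500, {"error": "update failed; the stored invoice is unchanged"}
    return 200, new_state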
You can of course develop a URI scheme that allows editing of individual items in an invoice, for example
GET /invoices/123/items/10
Would give you the item with id 10 from invoice id 123. But if you do this you have to allow the editing of these resources independently of each other. If I delete item 10 by sending the delete command
DELETE /invoices/123/items/10
that action must be an independent transaction. If other requests depend on it, you must instead do it as detailed above, by updating the invoice itself in a single request. You should never be able to put the resource into an invalid state through a single HTTP request, or to put it another way, it should never require multiple HTTP requests to get the resource into a valid state (and thus never require a string of HTTP requests to all succeed in order for the data to be valid).
Hope that helps

Great answer, thanks a lot. There's just one bit I'm not sure about:
What if the user sends back an updated invoice with a new item that
hasn't got an ID yet, because that is always generated at the server?
AFAIK at least PUT wouldn't be correct here, but POST instead.
However, is that how it's done?
Yes PUT would be wrong here. PUT should be idempotent, which means that you should be able to make multiple PUT requests to a resource and if they are the same request the end result should be the same after all of them.
This makes sense if you think of state transfer again: doing a PUT of the same state multiple times should still end up with the resource in that state. If I upload a PNG file to a resource 20 times, the PNG file should still be the same PNG file as if I had uploaded it just once.
So you should not have anything ambiguous in the state that you PUT to the server. If you leave out the ID of the item, you are essentially saying to the server "As part of this state update, create an item". And of course if you run that 10 times you will create 10 new items and the invoice will not be in the same state.
So POST would be better here, and you might want to do that to an "items" endpoint for clarity if you are just updating the items.
POST /invoices/123/items
[
{item_id: 10, item_desc: "Large item", order_amount: 40},
{item_desc: "Extra small item", order_amount: 5}
]
The server can then return you the state of the invoice's items with the newly created IDs in the body of the response
[
{item_id: 10, item_desc: "Large item", order_amount: 40},
{item_id: 30, item_desc: "Extra small item", order_amount: 5}
]

Related

What is the REST way to update a record without primary key?

I've created a REST API. According to my design, we have to store a user's blood sugar level on a daily basis.
The problem is:
I want to use a single endpoint for the insert and the update operations
I don't want to use the primary key of the blood-sugar resource in the URI because I want to store only the last value for a single day.
For example if I make this call
POST https://{host}/users/1/blood-sugar/
{
"measureDate": "2019-05-04",
"bloodSugarLevel": 86
}
It will create a blood-sugar resource and the database will assign an ID (let's say ID=333).
It's OK until here.
Then, I want to be able to make a second request with the same date but a different blood sugar level. As a result, the backend should find the previous blood-sugar resource (with ID=333) and update the bloodSugarLevel field, because we already have a record for this day (2019-05-04). I don't want to send ID=333 in the request body or the URI.
POST https://{host}/users/1/blood-sugar/
{
"measureDate": "2019-05-04",
"bloodSugarLevel": 105 # only this value is different
}
My question is:
Is there any way to achieve this (or a similar) result with REST? Feel free to suggest changing the verb, the URI, or the request body.
Note:
If I were doing this with WCF or something similar, a single method would satisfy all my requirements. For example: CreateOrUpdateBloodSugarLevel(int userId, DateTime measureDate, int bloodSugarLevel)
Thanks.
Is there any way to achieve this (or similar) result with REST?
Just POSTing the updated value to the same endpoint is fine.
Think about how you would do this on the world wide web. You would visit a website, and would load some form, containing a text field for date, a text field for bloodSugarLevel, and a submit button. That would POST the message to the web server, and your browser would get back some response.
Note that, as a client, we really don't care whether the server appends the new message into a list, or upserts the message into a map, or does some clever thing with an RDBMS or a graph database. Those are implementation details; part of the point of having a uniform interface is that the interface means that the clients (and generic components) don't really need to know what is happening.
Another application protocol that could work would be to treat bloodSugarLevel as a document that users can edit locally. That way, a client could just use any HTTP aware editor to do the right thing.
GET /users/1/blood-sugar/
200 OK
{
"measureDate": "2019-05-03",
"bloodSugarLevel": 90
}
PUT /users/1/blood-sugar/
{
"measureDate": "2019-05-04",
"bloodSugarLevel": 86
}
204 No Content
PUT /users/1/blood-sugar/
{
"measureDate": "2019-05-04",
"bloodSugarLevel": 105
}
There are some semantic advantages to using PUT when the network is unreliable; because the server agrees that the message handling will be done idempotently, clients can respond to a timeout waiting for an acknowledgment by repeating the send.
Semantically, PUT means "upsert", but the underlying implementation doesn't have to be an upsert. We're only making promises about the semantics that the client can expect.
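For illustration, one way the server might implement that PUT behind the scenes is as a per-day upsert (a minimal sketch; the table and column names are invented, SQLite's ON CONFLICT syntax is used, and a UNIQUE constraint on (user_id, measure_date) is assumed):

import sqlite3

def put_blood_sugar(conn: sqlite3.Connection, user_id, body):
    # One row per (user, day): repeating the same PUT leaves the stored state unchanged,
    # and a PUT with a new level for the same day replaces the old value.
    with conn:
        conn.execute(
            "INSERT INTO blood_sugar (user_id, measure_date, level) VALUES (?, ?, ?) "
            "ON CONFLICT (user_id, measure_date) DO UPDATE SET level = excluded.level",
            (user_id, body["measureDate"], body["bloodSugarLevel"]),
        )
    return 204  # No Content, matching the exchange above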

How to design a REST API to fetch a large (ephemeral) data stream?

Imagine a request that starts a long running process whose output is a large set of records.
We could start the process with a POST request:
POST /api/v1/long-computation
The output consists of a large sequence of numbered records that must be sent to the client. Since the output is large, the server does not store everything, and so maintains a window of records with an upper limit on the size of the window. Let's say that it stores up to 1000 records (and pauses computation whenever this many records are available). When the client fetches records, the server may subsequently delete those records and so continue generating more (as more slots in the 1000-length window are freed).
Let's say we fetch records with:
GET /api/v1/long-computation?ack=213
We can take this to mean that the server should return records starting from index 214. When the server receives this request, it can assume that the (well-behaved) client is acknowledging that records up to number 213 have been received, so it deletes them and then returns records starting from number 214 up to whatever is available at that time.
Next if the client requests:
GET /api/v1/long-computation?ack=214
the server would delete record 214 and return records starting from 215.
This seems like a reasonable design until it is noticed that GET requests need to be safe and idempotent (see section 9.1 in the HTTP RFC).
Questions:
Is there a better way to design this API?
Is it OK to keep it as GET even though it appears to violate the standard?
Would it be reasonable to make it a POST request such as:
POST /api/v1/long-computation/truncate-and-fetch?ack=213
One question I always feel needs to be asked is: are you sure that REST is the right approach for this problem? I'm a big fan and proponent of REST, but I try to only apply it to situations where it's applicable.
That being said, I don't think there's anything necessarily wrong with expiring resources after they have been used, but I think it's bad design to re-use the same url over and over again.
Instead, when I call the first set of results (maybe with):
GET /api/v1/long-computation
I'd expect that resource to give me a next link with the next set of results.
Although that particular url design does sort of tell me there's only 1 long-computation on the entire system going on at the same time. If this is not the case, I would also expect a bit more uniqueness in the url design.
The best solution here is to buy a bigger hard drive. I'm assuming you've pushed back and that's not in the cards.
I would consider your operation to be "unsafe" as defined by RFC 7231, so I would suggest not using GET. I would also strongly advise you to not delete records from the server without the client explicitly requesting it. One of the principles REST is built around is that the web is unreliable. Under your design, what happens if a response doesn't make it to the client for whatever reason? If they make another request, any records from the lost response will be destroyed.
I'm going to second @Evert's suggestion that, unless you absolutely must keep this design, you instead pick a technology that's built around reliable delivery of information, such as a message queue. If you're going to stick with REST, you need to allow clients to tell you when it's safe to delete records.
For instance, is it possible to page records? You could do something like:
POST /long-running-operations?recordsPerPage=10
202 Accepted
Location: "/long-running-operations/12"
{
"status": "building next page",
"retry-after-seconds": 120
}
GET /long-running-operations/12
200 OK
{
"status": "next page available",
"current-page": "/pages/123"
}
-- or --
GET /long-running-operations/12
200 OK
{
"status": "building next page",
"retry-after-seconds": 120
}
-- or --
GET /long-running-operations/12
200 OK
{
"status": "complete"
}
GET /pages/123
{
// a page of records
}
DELETE /pages/123
// remove this page so new records can be made
You'll need to cap the page size at the number of records you can hold. If the client's requested page size is smaller than that limit, you can build up more records in the background while they process the first page.
That's just spitballing, but maybe you can start there. No promises on quality - this is totally off the top of my head. This approach is a little chatty, but it saves you from returning a 404 if the new page isn't ready yet.
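A rough sketch of how a client might drive that flow from the other side (the URLs and field names follow the responses sketched above; the host, the use of Python's requests library, and the polling behaviour are assumptions):

import time
import requests

BASE = "https://api.example.com"  # hypothetical host

# Start the long-running operation and remember where to poll its status.
start = requests.post(f"{BASE}/long-running-operations", params={"recordsPerPage": 10})
op_url = BASE + start.headers["Location"]  # relative Location, as in the sketch above

while True:
    status = requests.get(op_url).json()
    if status["status"] == "complete":
        break
    if status["status"] == "next page available":
        page_url = BASE + status["current-page"]
        records = requests.get(page_url).json()
        # ... process this page of records locally ...
        requests.delete(page_url)  # tell the server it is now safe to drop the page
    else:
        # Still building the next page; wait and poll again.
        time.sleep(status.get("retry-after-seconds", 120))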

REST best practice for syncing log data in reverse order

Consider a backend system that stores logs of this form:
{"id": 541, "timestamp": 123, "status": "ok"}
{"id": 681, "timestamp": 124, "status": "waiting"}
...
Assuming that there are MANY logs, a client (e.g. an Android app) wants to sync the log data stored at a server to the client's device for presentation. Since the most recent logs are of more interest to a user, the GET request should be paged and start with the most recent logs and walk its way towards the oldest ones.
What is a proper design for this situation? What about the following design?
Let the server respond in reverse order, add parameters lastReceivedId and size to the request, and add a field more=true/false to the response that indicates whether more old logs are available before the oldest log sent in the current response. On the first request, set lastReceivedId=-1, indicating that the server should answer with the most recent logs.
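For illustration, a minimal client loop for that proposed contract might look like this (the endpoint URL, the name of the array field in the response, and the use of Python's requests library are assumptions; lastReceivedId, size and more come from the design above):

import requests

LOGS_URL = "https://api.example.com/logs"  # hypothetical endpoint

last_received_id = -1  # -1 means "start with the most recent logs"
while True:
    page = requests.get(
        LOGS_URL, params={"lastReceivedId": last_received_id, "size": 100}
    ).json()
    for entry in page["logs"]:  # assumed name of the array of log records
        print(entry)            # stand-in for storing the entry locally
    if not page["more"]:
        break                   # no older logs are available on the server
    last_received_id = page["logs"][-1]["id"]  # oldest log received so far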
Ship 'em all, let the server sort them out. The endpoint simply doesn't care what order they show up in, the server will handle that detail for presentation.
On the client, the client can choose to send the latest logs first, but that's simply coincidence. There's no requirement one way or the other.
There's also no need to send them in any particular order. If the client has a thousand log entries (in chronological order), it can send back batches of 100 starting at 900-1000, then 800-899, etc. The server will figure it out in the end.

Rest API: path for accessing derived data

It is not clear to me how the REST API should be designed for a microservice that exists to provide some derived data. For instance:
If I have a customer and I want to access the customer I would define the API as:
/customer/1234
this would return everything we know about the customer
However, if I want to provide a microservice that simply tells me whether the customer was previously known to the system under another account number, what do I do? I want this logic to be in the microservice, but how do I define the API?
customer/1234/previouslyKnow
customerPreviouslyKnown/1234
Neither seems correct. In the first case it implies that
customer/1234
could be used to get all the customer information, but the microservice doesn't offer this.
Confused!
Adding some extra details for clarification.
I suppose my issue is that I don't really want a massive service which handles everything customer-related. It would be better if there were lighter-weight services that handle customer orders, customer info, customer history, customer status (live, lost, dead...).
It strikes me all of these would start with
/customer/XXXX
so would all the services be expected to provide a customer object back if only customer/XXXX was given, with nothing extra in the path such as /orders?
Also, some of the data, as mentioned, isn't actually persisted anywhere; it is derived, and I want that logic hidden in a service and not in the calling code. So how is this requested and returned?
Doing microservices doesn't mean having a separate artifact for each method. The rules of coupling and cohesion also apply to the microservices world. So if you can query several pieces of data all related to a customer, the related resources should probably belong to the same service.
So your resource would be /customers/{id}/previous-customer-numbers whereas /customers (plural!) is the list of customers, /customers/{id} is a single customer and /customers/{id}/previous-customer-numbers the list of customer numbers the customer previously had.
Try to think in resources, not operations. So returning the list of previously used customer numbers is better than returning just a boolean value. /customers/{id}/previous-accounts would be even better, I think...
Back to topic: If the value of previous-accounts is directly derived from the same data, i.e. you don't need to query a second database, etc. I would even recommend just adding the value to the customer representation:
{
"id": "1234",
"firstName": "John",
"lastName": "Doe",
"previouslyKnown": true,
"previousAccounts": [
{
"id": "987",
...
}
]
}
Whether the data is stored or derived shouldn't matter to the service client, so it should not be visible at the service boundary.
Adding another resource or even another service is unnecessary complexity and complexity kills you in the long run.
You mention other examples:
customer orders, customer info, customer history, customer status (live, lost, dead....)
Orders is clearly different from customer data so it should reside in a separate service. An order typically also has an order id which is globally unique. So there is the resource /orders/{orderId}. Retrieving orders by customer id is also possible:
/orders;customer={customerId}
which reads give me the list of orders for which the customer is identified by the given customer id.
These parameters which filter a list-like REST resource are called matrix parameters. You can also use a query parameter: /orders?customer={customerId}. This is also quite common, but a matrix parameter has the advantage that it clearly belongs to a specific part of the URL. Consider the following:
/orders;customer=1234/notifications
This would return the list of notifications belonging to the orders of the customer with the id 1234.
With a query parameter it would look like this:
/orders/notifications?customer=1234
It is not clear from the URL that the orders are filtered and not the notifications.
The drawback is that framework support for matrix parameters is varying. Some support them, some don't.
I like matrix parameters best here, but a query parameter is OK, too.
Going back to your list:
customer orders, customer info, customer history, customer status (live, lost, dead....)
Customer info and customer status most likely belong to the same service (customer core data or the like) or even the same resource. Customer history can also go there. I would place it there as long as there isn't a reason to think of it separately. Maybe customer history is such a complicated domain (and it surely can be) that it's worth a separate service: /customer-history/{id} or maybe just /customer/{id}.
It's no problem that different services use the same paths for providing different information about one customer. They are different services and they have different endpoints so there is no collision whatsoever. Ideally you even have a DNS alias pointing to the corresponding service:
https://customer-core-data.service.lan/customers/1234
https://customer-history.service.lan/customers/1234
I'm not sure if I really understand your question. However, let me show how you can check whether a certain resource exists on your server.
Consider the server provides a URL that locates a certain resource (in this situation, the URL locates a customer with the identifier 1): http://example.org/api/customers/1.
When a client performs a GET request to this URL, the client can expect the following results (there may be other situations, like authentication/authorization problems, but let's keep it simple):
If a customer with the identifier 1 exists, the client is supposed to receive a response with the status code 200 and a representation of the resource (for example, a JSON or XML representing the customer) in the response payload.
If the customer with the identifier 1 does not exist, the client is supposed to receive a response with the status code 404.
To check whether a resource exists or not, the client doesn't need the resource representation (the JSON or XML that represents the customer). What's relevant here is the status code: 200 when the resource exists and 404 when it does not. Besides GET requests, the URL that locates a customer (http://example.org/api/customers/1) could also handle HEAD requests. The HEAD method is identical to the GET method, but the server won't send the resource representation in response to a HEAD request. Hence, it's useful for checking whether a resource exists or not.
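A quick sketch of such an existence check from the client side, using Python's requests library purely for illustration (the URL is the example one from above):

import requests

resp = requests.head("http://example.org/api/customers/1")
if resp.status_code == 200:
    print("customer 1 exists")        # no body is transferred, only the status matters
elif resp.status_code == 404:
    print("customer 1 does not exist")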
See more details regarding the HEAD method:
4.3.2. HEAD
The HEAD method is identical to GET except that the server MUST NOT
send a message body in the response (i.e., the response terminates at
the end of the header section). The server SHOULD send the same
header fields in response to a HEAD request as it would have sent if
the request had been a GET, except that the payload header fields MAY be omitted. This method can be used for obtaining
metadata about the selected representation without transferring the
representation data and is often used for testing hypertext links for
validity, accessibility, and recent modification. [...]
If the difference between resource and resource representation is not clear, please check this answer.
One thing I want to add to the already great answers: URL design doesn't really matter that much if you do REST correctly.
One of the important tenets of REST is that URLs are discovered. A client that already has the customer's information and wants to find out the "previously known" information should just be able to discover that URL on the main customer resource. If it links from there to the "previously known" information, it doesn't matter if the URL is on a different domain, path, or even protocol.
So if your application naturally makes more sense with "previouslyKnown" on a separate base path, then maybe you should just go for that.

REST DELETE with payload

This draft of the HTTP spec (https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-p2-semantics-14#page-20) states that:
Bodies on DELETE requests have no defined semantics. Note that sending
a body on a DELETE request might cause some existing implementations
to reject the request.
This of course makes sense.
But we have a scenario where, when a customer deletes one of his purchased services, an interaction record needs to be created so CRM is notified and can e.g. try to win back the customer.
When the user deletes the service, he/she can enter a free-format reason, which needs to be stored in the interaction.
We just pass this reason text as body payload to the DELETE service call.
Because of some discussion we had I'm wondering if this is semantically right or how others would implement this.
Note: we don't want to send the reason text as query string as in theory this could be very long.
As the DELETE you mention has more semantics than specified in HTTP spec (i.e. it affects (creates) another resource), I'd use another resource, e.g. DeleteRequest, to which POST-ing something like:
{
"resource_href": "http://some/resource/to/delete",
"reason": {
"source": "customer request",
"msg": "I'm really not satisfied with the quality of your service"
}
}
would have the effect of removing the purchase (or maybe just changing its state?) and notifying the CRM — and of course creating a DeleteRequest resource for e.g. later CRM monitoring.
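For illustration, the call itself might look like this (the /delete-requests path and the 201 Created response are assumptions; the answer above only names the DeleteRequest resource):

import requests

payload = {
    "resource_href": "http://some/resource/to/delete",
    "reason": {
        "source": "customer request",
        "msg": "I'm really not satisfied with the quality of your service",
    },
}
resp = requests.post("https://api.example.com/delete-requests", json=payload)
# Expect something like 201 Created plus a Location header for the new DeleteRequest,
# which CRM tooling can later inspect.
print(resp.status_code, resp.headers.get("Location"))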
In more complex systems it is often a better idea to let the object stay (as immutable) and mark it somehow as "invisible" than to delete it permanently, without any traces.
UPDATE: In fact, the current HTTP spec more directly points out that DELETE doesn't have to mean removing the resource, which is in line with my feeling that actually DELETE-ing things would be rare. The caveat the OP quotes about message body semantics is still there, though.