how to isolate data in Restful API - rest

There are some restful apis, as follows:
api/v1/billing/invoices/{invoiceNumber}
api/v1/billing/transactions/{transactionNumber}
And, each invoice or transaction belong to a specific account.
When implementing the restful apis, we must meet: Each account can only view their own invoice or transaction.
How should we isolate the data in restful apis?
Of course, we can pass the account number to the api, such as:
api/v1//billing/invoices/{invoiceNumber}?accoutNumber=XXX
api/v1/billing/{accountNumber}/invoices/{invoiceNumber}
But the Invoice Number has been able to uniquely identify a resource. So I do not want the problem to be complicated.
Is there any other way to solve this problem?

You are mixing a lot of things here.
This is not a REST problem, this is a security problem. More precisely, it's a OWASP top 10 2013 Insecure direct object vulnerability.
Let's make it simple: you have a URL like this
.../superSensitiveStuff/1
and you want to prevent the owner of "1" from accessing to ".../superSensitiveStuff/2"
To the best of my knowledge, there are three ways of dealing with this issue:
enforcing integrity in request URLs. This strategy does not apply to all cases, it only works in those scenarios where the client issues a request to a resource previously communicated by the server. In this case, the server may add a query param like this
.../superSensitiveStuff/1?sec=HMAC(.../superSensitiveStuff/1)
where HMAC is a cryptographic HASH function. If the parameter is missing, the server will drop the request and if it's there the server will be able to verify that it's exactly the authorized URL because the HMAC value guarantees its integrity(for additional infos, hit the link above).
using unpredictable references. The problem here is that a user can guess another id. "uhmm... I have the resource number 1, let me check whether the resource number 2 exists". If you drop sequences and move to long random number this is very hard to do. The resource will become
.../superSensitiveStuff/195A23FR3548...32OT465
This is good because it's effective and cheap.
exploiting a mixed RBAC-ABAC approach. RBAC stands for Role Based Access Control and this is what you are using. The leading A of the second acronym stands for Attribute. This means that access is provided on the basis of a user role and an attribute. In this case is the userId, since it must be authenticated for accessing private resources. In few words, when a user requests a specific .../superSensitiveStuff resource it is loaded from the repository when you have the ownership information for that resource. It could be a DB, for example, and your SuperSensitiveStuff java business model could be like this
public class SuperSensitiveStuff {
private String userId;
private String secretStuff;
...
}
now, in your controller you can do the following
String principal = getPrincipal(); //you request the logged userId
SuperSensitiveStuff resource = myService.load(id); //you load the resource using the {id} in the request path
if (resource.getUserId.equals(principal))
return resource //200 ok, this is an authorized access
else
throw new EvilAttemptException() //401 unauthorized, cheater detected

Related

REST API Design: Path variable vs request body on UPDATE (Best practices)

When creating an UPDATE endpoint to change a resource, the ID should be set in the path variable and also in the request body.
Before updating a resource, I check if the resource exists, and if not, I would respond with 404 Not Found.
Now I ask myself which of the two information I should use and if I should check if both values are the same.
For example:
PUT /users/42
// request body
{
"id": 42,
"username": "user42"
}
You should PUT only the properties you can change into the request body and omit read-only properties. So you should check the id in the URI, because it is the only one that should exist in the message.
It is convenient to accept the "id" field in the payload. But you have to be sure it is the same as the path parameter. I solve this problem by setting the id field to the value of the path parameter (be sure to explain that in the Swagger of the API). In pseudo-code :
idParam = request.getPathParam("id");
object = request.getPayload();
object.id = idParam;
So all these calls are equivalent :
PUT /users/42 + {"id":"42", ...}
PUT /users/42 + {"id":"41", ...}
PUT /users/42 + {"id":null, ...}
PUT /users/42 + {...}
Why do you need the id both in URL and in the body? because now you have to validate that they are both the same or ignore one in any case. If it is a requirement for some reason, than pick which one is the one that is definitive and ignore the other one. If you don't have to have this strange duplication, than I'd say pass it in the body only
If you take a closer look at how HTTP works you might notice that the URI used to send a request to is also used as key for caching results. Any non-safe operation performed on that URI, such as POST, PUT, PATCH, will lead to (intermediary) caches automatically invalidating any stored responses for that URI. As such, if you use an other URI than the actual resource URI you are actually bypassing that feature and risk getting served outdated state from caches. As caching is one of the few constraints REST has simply skipping all caching via certain directives isn't ideal in first place.
In regards to including the ID of the resource or domain entity in the URI and/or in the payload: A common mistake in designing so-called REST APIs is that the domain object is mapped in a 1:1 manner onto a resource. We had a customer once who went through a merger and in a result they ended up with the same products being addressed by multiple IDs. In order to reduce the data in their DB they at one point tried to consolidate their data and continue. But they had to support still the old URIs they exposed for their products. In the end they realized that exposing the product ID via the URI wasn't ideal in their situation as it lead to plenty of downstream changes that affected their customers. As such, a recommendation here is to use UUIDs that don't give the target resource any semantic meaning and don't ever change. If the product ID in the back changes it doesn't affect the exposed URI at all. Sure, you might need a further table/collection to map from the product to the actual resource URI but you in the end designed your system with the eventuality of change which it now is more likely to coop with.
I've read so many times that the product ID shouldn't be part of the resource as it is already present in the URI. First, the whole URI is a unique identifier of that resource and not only a part of it. Next as mentioned above, IMO the product ID shouldn't be part of the URI in first place but it should be part of the resources' state. After all, the product ID is part of the products properties and therefore should be included there accordingly. As such, the media type exposed should contain all the necessities that a client is able to identify the product ID off the payload. The media type the resource's state is exchange with should also provide means to include the ID if you want to perform an update. I.e. if you take HTML as example, here you get served a HTML form by the server which basically teaches you where to send the request to, which HTTP operation to use, which media-type to marshal the request with and the actual properties of the resource, including the ones you are not meant to change. HTML does this i.e. via hidden input fields. Other form-based media types, such as HAL forms, JsonForms or Ion, might provide other mechanisms though.
So, to sum my post up:
Don't map the product ID onto URIs. Use a mapping from product ID to UUIDs instead
Use form-based media-types that support clients in creating requests. These media types should allow to include unmodifiable properties, such as hidden input fields and the like

How to model a CANCEL action in a RESTful way?

We are currently in the process of wrangling smaller services from our monoliths. Our domain is very similar to a ticketing system. We have decided to start with the cancellation process of the domain.
Our cancel service has as simple endpoint "Cancel" which takes in the id of the ticket. Internally, we retrieve the id, perform some operations related to cancel on it and update the state of the entity in the store. From the store's perspective the only difference between a cancelled ticket and a live ticket are a few properties.
From what I have read, PATCH seems to be the correct verb to be used in this case, as am updating only a simple property in the resource.
PATCH /api/tickets/{id}
Payload {isCancelled: true}
But isCancelled is not an actual property in the entity. Is it fair to send properties in the payload that are not part of the entity or should I think of some other form of modeling this request? I would not want to send the entire entity as part of the payload, since it is large.
I have considered creating a new resource CancelledTickets, but in our domain we would never have the need to a GET on cancelled tickets. Hence stayed away from having to create a new resource.
Exposing the GET interface of a resource is not compulsory.
For example, use
PUT /api/tickets/{id}/actions/cancel
to submit the cancellation request. I choose PUT since there would be no more than one cancellation request in effect.
Hope it be helpful.
Take a look what exactly is RESTful way. No matter if you send PATCH request with isCancelled as payload or even DELETE if you want tickets to disappear. It's still RESTful.
Your move depends on your needs. As you said
I have considered creating a new resource CancelledTickets, but in our
domain we would never have the need to a GET on cancelled tickets.
I would just send DELETE. You don't have to remove it physically. If it's possible to un-cancel, then implement isCancelled mechanism. It's just question of taste.
REST is basically a generalization of the browser based Web. Any concepts you apply for the Web can also be applied to REST.
So, how would you design a cancel activity in a Web page? You'd probably have a table row with certain activities like edit and delete outlined with icons and mouse-over text that on clicking invoke a URI on the server and lead to a followup state. You are not that much interested how the URI of that button might look like or if a PATCH or DELETE command is invoked in the back. You are just interested that the request is processed.
The same holds true if you want to perform the same via REST. Instead of images that hint the user that an edit or cancel activity is performed on an entry, a meaningful link-relation name should be used to hint the client about the possiblities. In your case this might be something like reserve new tickets, edit reservation or cancel reservation. Each link relation name is accompanied by a URL the client can invoke if he wants to perform one of the activities. The exact characters of the URI is not of importance here to the client also. Which method to invoke might either be provided already in the response (as further accompanying field) or via the media type the response was processed for. If neither the media type nor an accompanying field gives a hint on which HTTP operation to use an OPTIONS request may be issued on the URI beforehand. The rule of thumb here is, the server should teach a client on how to achieve something in the end.
By following such a concept you decouple a client from the API and make it robust to changes. Instead of a client generating a URI to invoke the client is fed by the server with possible URIs to invoke. If the server ever changes its iternal URI structure a client using one of the provided URIs will still be able to invoke the service as it simply used one of the URIs provided by the server. Which one to use is determined by analyzing the link relation name that hints the client when to invoke such URI. As mentioned above, such link relation names need to be defined somewhere. But this is exactly what Fielding claimed back in 2008 by:
A REST API should spend almost all of its descriptive effort in defining the media type(s) used for representing resources and driving application state, or in defining extended relation names and/or hypertext-enabled mark-up for existing standard media types. (Source)
Which HTTP operation to choose for canceling a ticket/reservation may depend on your desing. While some of the answers recommended DELETE RFC 7231 states that only the association between the URI and the resource is removed and no gurantee is given that the actual resource is also being removed here as well. If you design a system where a cancelation might be undone, then DELETE is not the right choice for you as the mapping of the URI to the resource should not exist further after handling a DELETE request. If you, however, consider a cancelation to also lead to a removal of the reservation then you MAY use DELETE.
If you model your resource in a way that maintains the state as property within the resource, PATCHing the resource might be a valid option. However, simply sending something like state=canceled is probably not enough here as PATCH is a calculation of steps done by the client in order to transform a certain resource (or multipe resources) into a desired target state. JSON Patch might give a clue on how this might be done. A further note needs to be done on the atomicy requirement PATCH has. Either all of the instructions succeed or none at all.
As also PUT was mentioned in one of the other answers. PUT has the semantics of replacing the current representation available at the given URI with the one given in the request' payload. The server is further allowed to either reject the content or transform it to a more suitable representation and also affect other resources as well, i.e. if they mimic a version history of the resource.
If neither of the above mentioned operations really satisfies your needs you should use POST as this is the all-purpose, swiss-army-knife toolkit of HTTP. While this operation is usually used to create new resources, it isn't limited to it. It should be used in any situation where the semantics of the other operations aren't applicable. According to the HTTP specification
The POST method requests that the target resource process the representation enclosed in the request according to the resource's own specific semantics.
This is basically the get-free-out-of-jail card. Here you can literally process anything at the server according to your own rules. If you want to cancel or pause something, just do it.
I highly discourage to use GET for creating, altering or canceling/removing something. GET is a safe operation and gurantees any invoking client that it wont alter any state for the invoked resource on the server. Note that it might have minor side-effects, i.e. logging, though the actual state should be unaffected by an invocation. This is a property Web crawler rely on. They will simply invoke any URI via GET and learn the content of the received response. And I assume you don't want Google (or any other crawler) to cancel all of your reservations, or do you?
As mentioned above, which HTTP operation you should use depends on your design. DELETE should only be used if you are also going to remove the representation, eventhough the spec does not necessarily require this, but once the URI mapping to the resource is gone, you basically have no way to invoke this resource further (unless you have created a further URI mapping first, of course). If you designed your resource to keep the state within a property I'd probably go for PATCH but in general I'd basically opt for POST here as here you have all the choices at your hands.
I would suggest having a state resource.
This keeps things RESTful. You have your HTTP method acting as the verb. The state part of the URI is a noun. Then your request body is simple and consistent with the URI.
The only thing I don't like about this is the value of state requires documentation, meaning what states are there? This could be solved via the API by providing possible states on a meta resource or part of the ticket body in meta object.
PUT /api/tickets/:id/state
{state: "canceled"}
GET /api/meta/tickets/state
// returns
[
"canceled",
...
]
GET /api/tickets/:id
{
id: ...
meta: {
states: [
"canceled",
...
]
}
}
For modelling CANCEL Action in Restful way :
Suppose we have to delete a note in DB by providing noteId(Note's ID) and Note is a pojo
1] At controller :
#DeleteMapping(value="/delete/{noteId}")
public ResponseEntity<Note> deleteNote( #PathVariable Long noteId)
{
noteServiceImpl.deleteNote(noteId);
return ResponseEntity.ok().build();
}
2]Service Layer :
#Service
public class NoteServiceImpl {
#Autowired
private NotesRepository notesDao;
public void deleteNote(Long id) {
notesDao.delete(id);
}
}
3] Repository layer :
#Repository
public interface NotesRepository extends CrudRepository<Note, Long> {
}
and in 4] postman : http://localhost:8080/delete/1
So we have deleted note Id 1 from DB by CANCEL Action

Rest API: path for accessing derived data

It is not clear to me that if I have a micro service that is in place to provide some derived data how the rest api should be designed for this. For instance :-
If I have a customer and I want to access the customer I would define the API as:
/customer/1234
this would return everything we know about the customer
however if I want to provide a microservice that simply tells me if the customer was previously known to the system with another account number what do I do. I want this logic to be in the microservice but how do I define the API
customer/1234/previouslyKnow
customerPreviouslyKnown/1234
Both don't seem correct. In the first case it implies
customer/1234
could be used to get all the customer information but the microservice doesn't offer this.
Confused!
Adding some extra details for clarification.
I suppose my issue is, I don't really want a massive service which handles everything customer related. It would be better if there were lighter weight services that handles customer orders, customer info, customer history, customer status (live, lost, dead....).
It strikes me all of these would start with
/customer/XXXX
so would all the services be expected to provide a customer object back if only customer/XXXX was given with no extra in the path such as /orders
Also some of the data as mentioned isn't actually persisted anywhere it is derived and I want the logic of this hidden in a service and not in the calling code. So how is this requested and returned.
Doing microservices doesn't mean to have a separate artifact for each method. The rules of coupling and cohesion also apply to the microservices world. So if you can query several data all related to a customer, the related resources should probably belong to the same service.
So your resource would be /customers/{id}/previous-customer-numbers whereas /customers (plural!) is the list of customers, /customers/{id} is a single customer and /customers/{id}/previous-customer-numbers the list of customer numbers the customer previously had.
Try to think in resources, not operations. So returning the list of previously used customer numbers is better than returning just a boolean value. /customer/{id}/previous-accounts would be even better, I think...
Back to topic: If the value of previous-accounts is directly derived from the same data, i.e. you don't need to query a second database, etc. I would even recommend just adding the value to the customer representation:
{
"id": "1234",
"firstName": "John",
"lastName": "Doe",
"previouslyKnown": true,
"previousAccounts": [
{
"id": "987",
...
}
]
}
Whether the data is stored or derived shouldn't matter so the service client to it should not be visible on the boundary.
Adding another resource or even another service is unnecessary complexity and complexity kills you in the long run.
You mention other examples:
customer orders, customer info, customer history, customer status (live, lost, dead....)
Orders is clearly different from customer data so it should reside in a separate service. An order typically also has an order id which is globally unique. So there is the resource /orders/{orderId}. Retrieving orders by customer id is also possible:
/orders;customer={customerId}
which reads give me the list of orders for which the customer is identified by the given customer id.
These parameters which filter a list-like rest resource are called matrix parameters. You can also use a query parameter: /orders?customer={customerId} This is also quite common but a matrix parameter has the advantage that it clearly belongs to a specific part of the URL. Consider the following:
/orders;customer=1234/notifications
This would return the list of notifications belonging to the orders of the customer with the id 1234.
With a query parameter it would look like this:
/orders/notifications?customer=1234
It is not clear from the URL that the orders are filtered and not the notifications.
The drawback is that framework support for matrix parameters is varying. Some support them, some don't.
I'd like matrix parameters best here but a query parameter is OK, too.
Going back to your list:
customer orders, customer info, customer history, customer status (live, lost, dead....)
Customer info and customer status most likely belong to the same service (customer core data or the like) or even the same resource. Customer history can also go there. I would place it there as long as there isn't a reason to think of it separately. Maybe customer history is such a complicated domain (and it surely can be) that it's worth a separate service: /customer-history/{id} or maybe just /customer/{id}.
It's no problem that different services use the same paths for providing different information about one customer. They are different services and they have different endpoints so there is no collision whatsoever. Ideally you even have a DNS alias pointing to the corresponding service:
https://customer-core-data.service.lan/customers/1234
https://customer-history.service.lan/customers/1234
I'm not sure if I really understand your question. However, let me show how you can check if a certain resource exist in your server.
Consider the server provides a URL that locates a certain resource (in this situation, the URL locates a customer with the identifier 1): http://example.org/api/customers/1.
When a client perform a GET request to this URL, the client can expect the following results (there may be other situation, like authentication/authorization problems, but let's keep it simple):
If a customer with the identifier 1 exists, the client is supposed to receive a response with the status code 200 and a representation of the resource (for example, a JSON or XML representing the customer) in the response payload.
If the customer with the identifier 1 do not exist, the client is supposed to receive a response with the status code 404.
To check whether a resource exists or not, the client doesn't need the resource representation (the JSON or XML that represents the customer). What's relevant here is the status code: 200 when the resource exists and 404 when the resource do not exist. Besides GET requests, the URL that locates a customer (http://example.org/api/customers/1) could also handle HEAD requests. The HEAD method is identical to the GET method, but the server won't send the resource representation in HEAD requests. Hence, it's useful to check whether a resource exists or not.
See more details regarding the HEAD method:
4.3.2. HEAD
The HEAD method is identical to GET except that the server MUST NOT
send a message body in the response (i.e., the response terminates at
the end of the header section). The server SHOULD send the same
header fields in response to a HEAD request as it would have sent if
the request had been a GET, except that the payload header fields MAY be omitted. This method can be used for obtaining
metadata about the selected representation without transferring the
representation data and is often used for testing hypertext links for
validity, accessibility, and recent modification. [...]
If the difference between resource and resource representation is not clear, please check this answer.
One thing I want to add to the already great answers is: URLS design doesn't really matter that much if you do REST correctly.
One of the important tenets of REST is that urls are discovered. A client that has the customers's information already, and wants to find out what the "previously known" information, should just be able to discover that url on the main customer resource. If it links from there to the "previously known" information, it doesn't matter if the url is on a different domain, path, or even protocol.
So if you application naturally makes more sense if "previouslyKnown" is on a separate base path, then maybe you should just go for that.

REST API with segmented/path ID

I am trying to design a REST API for a system where the resources are essentially identified by path-like addresses with varying numbers of segments. For example, a "Schema" resource could be represented on the file system as follows:
/Resources/Schemas/MyFolder2/MyFolder5/MySchema27
The file-system path /Resources/Schemas/ is the root folder for all Schemas, and everything below this is entirely user defined (as far as folder depth and folder naming). So, in the example above, the particular Schema would be uniquely identified by the following address (since "MySchema27" by itself would not necessarily be unique):
/MyFolder2/MyFolder5/MySchema27
What would be the best way to refer to a resource like this in a REST API?
If I have a /schemas collection my REST URL could be:
/schemas/MyFolder2/MyFolder5/MySchema27
Would that be a reasonable approach? Are there better ways of handling this?
I could, potentially, do a 2-step approach where the client would first have to search for a Schema using the Schema address (in URL parameters or in the request body), which would then return a unique ID that could then be used with a more traditional /schemas/{id} design. Not sure that I like that, though, especially since it would mean tracking a separate ID for each resource. Thoughts? Thanks.
The usual way to add a resource to your "folder" /Resources/Schemas/ is to make a POST request on it with the body of this POST request containing a representation of the resource to add, then the server will take care of finding the next free {id} and and setting the new resource to /Resources/Schemas/{id}.
Another approach is to, as you said, make a GET request on /Resources/Schemas/new which would return the next free {id}, and then, make a second request PUT directly on /Resources/Schemas/{id}. However this second approach is not as secure as the first since two simultaneous request could lead to the same new {id} returned and so the second PUT would erase the first. You can secure this with some sort of reservation mechanism.
This is called as Resource Based URI approach for building REST services . Follow these wonderful set of video tutorials to understand more about them and learn how to implement too . https://javabrains.io/courses/javaee_jaxrs

Resftul API structure, Is userID expected in route if already derived from session token?

I am currently planning a web API based on the principles of REST. I am using a session token to correct identify what user is making a request (after authentication of course), then determining if that user has access to the given resource.
Assuming the user making the request has a userID of 7, and I am wanting to retrieve a list of only the presentations that he can access, would best/proper practice be to:
1. Include my userID in the route, such as:
localhost:55555/api/users/7/presentations
or
2. Not include userID, such as:
localhost:55555/api/presentations
Each presentation can be accessed by any number of other users. For this reason I am leaning towards option 2 but would like to know what others think before I finalize the structure.
A very common pattern for REST APIs is to have both:
a list resource with optional parameters like /presentation/?by=Alice&since=2013-1-1,
object resources like /presentation/0AFF56E7.
For presentations, I wouldn't use a composite ID containing the user ID, since it doesn't seem really needed and it would prevent future features like changing the "owner" of the presentation (without changing its ID).