Informative vs unique generated ID in REST API

I am designing a RESTful API and have two ways of identifying resources (person data): either by the unique ID generated by the database, or by a social security number (SSN) entered for each person. The SSN is supposedly unique, though it can change.
Using the ID would be most convenient for me, since it is guaranteed to be unique and does not change. Hence the URL for the resource always stays the same:
GET /persons/12
{
"name": "Morgan",
"ssn": "840212-3312"
}
The argument for using the SSN is that it is more informative and understandable to API clients. The SSN is also used more in surrounding systems:
GET /persons/840212-3312
{
"name": "Morgan",
"id": "12"
}
So the question is: should I go with the first approach and avoid the implementation headaches that arise when an SSN changes, perhaps providing a helper method that converts from SSN to ID?
Or should I go with the second approach, which provides a more informative API but means dealing with some not-so-RESTful strangeness where URLs might change due to SSN changes?

URL design is a personal choice. But to give you some more examples which differ from those Ray has already provided, I will give you some of my own.
I have a user account resource and allow access via both URIs:
/users/12
and
/users/morgan
where the numerical value is an auto-incremented ID, and the alphabetic value is a unique username chosen by the user. These resources are uncacheable, so I do not bother with canonicalisation; however, the /users page links to the alphabetic forms.
No other resources on my system have two unique fields, so they are referred to by ID: /jobs/123, /quotations/456.
As you can see, I prefer plural URI segments ;-)
I think of "job 123" as being from the "jobs" collection, so it seems logical to have a "jobs" resource, with subresources for each job.
You do not need a separate /search/ area for performing searches; I think it would be cleaner to apply your search criteria to the collection resource directly:
/people?ssn=123456-7890 (people with SSN matching/containing "123456-7890")
/people?name=morgan (people whose name is/contains "Morgan")
I have something similar, but use only the first letter as a filter:
/sites?alpha=f
Lists all sites beginning with F. You can think of it as a filter or as a search criterion; those terms are just two sides of the same coin.
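As a rough sketch of the two ideas above (one resource reachable by either key, and search criteria applied to the collection itself): the in-memory data, the second person record, and the helper names `get_person` and `filter_people` are all assumptions made for illustration, and Python is used only as a neutral language.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical in-memory data; a real service would query its data store.
PEOPLE = [
    {"id": 12, "username": "morgan", "name": "Morgan", "ssn": "840212-3312"},
    {"id": 13, "username": "frank", "name": "Frank", "ssn": "790101-1234"},
]

def get_person(key):
    """Resolve /users/{key}: all digits means the numeric ID, anything else a username."""
    for person in PEOPLE:
        if key.isdigit() and person["id"] == int(key):
            return person
        if not key.isdigit() and person["username"] == key.lower():
            return person
    return None

def filter_people(url):
    """Treat the query parameters of e.g. /people?name=morgan as 'contains' filters."""
    criteria = parse_qs(urlsplit(url).query)
    matches = PEOPLE
    for field, values in criteria.items():
        needle = values[0].lower()
        matches = [p for p in matches if needle in str(p.get(field, "")).lower()]
    return matches
```

Note that this only works if usernames can never consist solely of digits; otherwise the two key spaces overlap and a single path segment becomes ambiguous.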

Good to see someone taking time to think about their resource URLs!
I would make a URL with the unique ID to provide access to a single user's resource, like:
http://api.mysite.com/person/12/
Where 12 is your unique ID. Note that I also prefer the singular 'person'....
Regardless, the URL should return:
{
"ssn": "840212-3312",
"name": "Morgan",
"id": "12"
}
However, I would also create a general search URL that returns a list of users matching the parameters (either a JSON array or whatever format you need). You can specify search parameters as GET parameters like this:
http://api.mysite.com/person/search/?ssn=840212-3312
Or
http://api.mysite.com/person/search/?name=Morgan
These would return something like this for a single search hit. Note that it's an array, not a single item like the unique ID URL that points directly to a single user.
[{
"ssn": "840212-3312",
"name": "Morgan",
"id": "12"
}]
This search could later be extended with other search criteria. You might return only the unique IDs via the search URL; you could always make a request to the unique ID URL once you've got the ID from the search.

I would suggest that you use neither. Generate resource IDs that are unique both to a single user of your API and across all other resources (including other users' resources).
Using the unique database ID is not ideal for a couple of reasons. First, API resources and database records won't necessarily always be 1-to-1, even if you have designed it that way today. Second, you might change to a different data store that generates unique IDs in a different format.
Also, it is good practice to separate out the ID from other resource properties, such as SSN (as an aside I hope you are storing SSN in a very secure manner, but that's another topic). If for whatever reason an SSN changed, more than one API resource was associated with the same SSN, or you decide that piece of data is not needed someday, you don't want to have to change the ID.
One pattern is to prepend the unique ID with a few characters that indicate the resource type. For example, if User is a resource type in your API, a generated unique ID would be something like USR56382.
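A minimal sketch of that prefix pattern, assuming a hypothetical `PREFIXES` table and a helper named `new_resource_id`; neither comes from a particular library, and the digit count is arbitrary.

```python
import secrets

# Hypothetical prefixes per resource type; "USR" follows the example above.
PREFIXES = {"user": "USR", "job": "JOB"}

def new_resource_id(resource_type, digits=8):
    """Return an opaque ID such as 'USR04821937', unrelated to any DB key or SSN."""
    prefix = PREFIXES[resource_type]
    number = secrets.randbelow(10 ** digits)  # random, so IDs are not guessable in sequence
    return f"{prefix}{number:0{digits}d}"
```

Random values can collide, so a real system would still need to enforce uniqueness when storing the ID, for example with a unique constraint and a retry.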

REST is an architectural style that emphasizes a resource-centric design approach.
In my opinion, resources should be named as plural nouns.
Every resource, for example customers, has the following uniform interface:
POST /customers - creates a resource instance
PUT /customers/{customerId} - updates a particular instance
GET /customers - searches customers. So @Ray, search does not need to be part of the URI itself; any filter or query parameters that need to be supported belong here.
GET /customers/{customerId} - retrieves a particular customer instance
DELETE /customers/{customerId} - deletes a particular instance
The reason for the plural is that the collection behaves as a factory. For example, when you are trying to create a new instance of a resource, the instance does not exist yet, so the request cannot be addressed to the instance itself. Hence the singular is not used.
The same goes for search/inquiry, where you do not know or hold the actual resource instance. Hence the plural form is recommended.
Now, the question is what to use for a resource ID: a database primary key, a generated identifier, or an encrypted token.
In my opinion, database primary keys should not be exposed, and resource identifiers should not be designed 1-to-1 with DB primary keys, although that happens a lot. A generated UUID-based key is preferable because it prevents an attacker from enumerating resources by guessing sequential IDs, but the world is not always ideal.
Coming back to tokens: an encrypted token is a recommended approach for sensitive APIs and where data is exchanged between two separate applications. If we use one, encryption and decryption should happen solely at the API end. That means the encrypted keys for sub-resources should be returned as part of the parent API response; otherwise it defeats the purpose.
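One way to keep the primary key private, sketched under the assumption of a simple in-memory mapping; the class name `PublicIdMap` is made up here, and a real system would more likely store the UUID in its own indexed column.

```python
import uuid

class PublicIdMap:
    """Hides the database primary key behind a random (version 4) UUID."""

    def __init__(self):
        self._by_public = {}  # public UUID string -> primary key
        self._by_pk = {}      # primary key -> public UUID string

    def public_id_for(self, pk):
        """Return the stable public ID for a primary key, creating it on first use."""
        if pk not in self._by_pk:
            public = str(uuid.uuid4())
            self._by_pk[pk] = public
            self._by_public[public] = pk
        return self._by_pk[pk]

    def pk_for(self, public_id):
        """Translate a public ID from a URL back to the primary key, or None if unknown."""
        return self._by_public.get(public_id)
```

Because the public IDs are random, knowing one of them says nothing about its neighbours, which is the property sequential keys lack.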

Related

REST API: Resource hierarchy and multiple URI

I work with a banking database, which is structured like this:
Table      Primary Key   Unique Keys          Foreign Keys
-------------------------------------------------------------
BANK       ID            BIC
CUSTOMER   ID            CUSTNO, PASS, CARD   BANK
ACCOUNT    ID            IBAN                 BANK, CUSTOMER
I want to design a clean REST API, but I run into the following problems:
Should I put resources in a hierarchy, or rather flat? The problem with the hierarchy might be that the client only knows the ACCOUNT ID, but does not know the CUSTOMER ID, so how is he supposed to get the resource?
/banks/{id}/customers/{id}/accounts/{id}
or
/banks/{id}
/customers/{id}
/accounts/{id}
The primary key in each table is the database ID. It is an internal ID and has no business meaning. Is it correct to use it as the default URI of the resource?
Each object has its own set of unique keys. For example, CUSTOMER can be identified by his CUSTNO, PASS or CARD. Each client only has a subset of these keys. Should I define a sub-resource per key or provide a lookup service that will give the proper URI back?
/customers/id/{id}
/customers/custno/{custno}
/customers/pass/{pass}
/customers/card/{card}
or
/lookup/customer?keyType=card&keyValue=AB-303555
(gives back customer {id})
I am asking what is the truly RESTful way, what is best practice. I haven't found proper answers yet.

I am asking what is the truly RESTful way, what is best practice.
REST doesn't care what spellings you use for your identifiers.
/ef726381-dd43-4017-9778-83cee2bbbd93
is a perfectly RESTful URI, suitable for any use case.
Outside of some purely mechanical concerns, general purpose consumers treat a URI as a single opaque unit. There's no notion of a consumer extracting semantic information from the URI -- which means that any information encoded into the identifier is done at the server's discretion and for its use alone.
For cases where information known to the client needs to be included in the target-uri of the request, we have URI Templates, which are a sort of generalization of a GET form in HTML. So a way to think about your problem is to consider what information the client has, and how they would put that information into a form.
HTML's form processing rules are pretty limiting -- general URI templates have fewer constraints.
/customers/id/{id}
/customers/custno/{custno}
/customers/pass/{pass}
/customers/card/{card}
Having multiple resources sharing common information is normal in REST -- your resource model is not your data model. So this could be fine. It's even OK to have multiple resources that share representations. You could have them stand alone, or you could have them share a Content-Location, or a canonical link relation, or you could simply have those resources redirect to the canonical resource.
It's all good.
So you mean if a UUID can be a valid URI, then a table autonumber key can be too?
Yes, exactly.
Note that if you want the lifetime of the URI to extend beyond the lifetime of your current implementation, then you need to design your identifiers with that constraint in mind. See Cool URIs Don't Change.
The clients don't care what the URI is, they just want the link to work again when they need it.
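To illustrate the URI Template idea mentioned above, here is a small sketch of only the simplest kind of expansion (plain `{name}` placeholders with percent-encoded values). The function `expand` is a hypothetical helper written for this example, not a full RFC 6570 implementation.

```python
import re
from urllib.parse import quote

def expand(template, values):
    """Expand simple {name} placeholders, percent-encoding each value."""
    def substitute(match):
        # safe="" also encodes "/" so a value can never add extra path segments
        return quote(str(values[match.group(1)]), safe="")
    return re.sub(r"\{(\w+)\}", substitute, template)

# The client fills in whichever key it happens to know.
by_card = expand("/customers/card/{card}", {"card": "AB-303555"})
by_pass = expand("/customers/pass/{pass}", {"pass": "X 1/2"})
```

A missing value raises `KeyError` in this sketch; a complete template library would define what happens to undefined variables.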

Good URL syntax for a GET request with a composite key

Let's take the following resource in my REST API:
GET `http://api/v1/user/users/{id}`
In normal circumstances I would use this like so:
GET `http://api/v1/user/users/aabc`
Where aabc is the user id.
There are times, however, when I have had to design my REST API in a way that some extra information is passed with the ID. For example:
GET `http://api/v1/user/users/customer:1`
Where customer:1 denotes I am using an id from the customer domain to lookup the user and that id is 1.
I now have a scenario where the identifier is more than one key (a composite key). For example:
GET `http://api/v1/user/users/customer:1;type:agent`
My question: in the above URL, what should I use as the separator between customer:1 and type:agent?
According to https://www.ietf.org/rfc/rfc3986.txt I believe that the semi-colon is not allowed.

You should either:
Use parameters:
GET http://api/v1/user/users?customer=1
Or use a new URL:
GET http://api/v1/user/users/customer/1
Either way, follow established conventions such as this one:
("Paths tend to be cached, parameters tend to not be, as a general rule.")
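A hedged sketch of the query-parameter option for a composite key, using the standard library to build and read the URL. The helper names `build_user_query` and `read_keys` are assumptions; `customer` and `type` are taken from the question.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_user_query(base, **keys):
    """Encode each part of the composite key as its own query parameter."""
    # Sorting keeps the URL stable regardless of argument order.
    return f"{base}?{urlencode(sorted(keys.items()))}"

def read_keys(url):
    """Recover the key parts on the server side (first value of each parameter)."""
    return {name: values[0] for name, values in parse_qs(urlsplit(url).query).items()}

url = build_user_query("http://api/v1/user/users", customer=1, type="agent")
```

With this shape there is no need to pick a separator character inside a path segment, because `&` and `=` already have well-understood roles in the query string.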

Instead of trying to create a general structure for accessing records via multiple keys at once, I would suggest trying to think of this on more of a case-by-case basis.
To take your example, one way to interpret it is that you have multiple customers, and those customers each may have multiple user accounts. A natural hierarchy for this would be:
/customer/x/user/y
Often an elegant decision like this can be made, that not only solves the problem but also documents your data-model in a way that someone can easily see that users belong to customers via a 1-to-many relationship.

Designing REST end-point(s) for GET request supporting different IDs

I seek suggestions regarding designing an API endpoint.
I have a table (resource) with id (PK) and some other ids, which are not unique but have not-null constraints.
Now for designing this:
1. For the PK search: /<resourceName>/{id}
2. Non-PK search:
2.1 /<resourceName>/someOtherIdName/{someOtherId} - using path param, distinct for different IDs
2.2 or /<resourceName>?<nameOfId>=<value> - using query param
For the 2nd one, which way is better? If I use 2.2, then multiple IDs can be supported, but it becomes convoluted because I have to check nameOfId. And what about 2.1?
Edit: For example, take transactions to be a resource, with txn_id as the primary key and txn_event_id and txn_activity_id as other IDs. The last two IDs can each represent a group of related transactions. Does 2.2 suit the last two IDs?
In case of 2.1, the implementation looks like:
// JAX-RS annotations (javax.ws.rs.*; jakarta.ws.rs.* on newer stacks)
@Path("/transactions")
class TransactionResource {
    @GET
    @Path("/eventid/{event_id}")
    public List getTxnWithEventId(@PathParam("event_id") String eventId) {
        // do an "event_id" based search
    }

    @GET
    @Path("/activityid/{activity_id}")
    public List getTxnWithActivityId(@PathParam("activity_id") String txnActivityId) {
        // do an "activity_id" based search
    }
}
In case of 2.2, the implementation becomes something like:
@Path("/transactions")
class TransactionResource {
    @GET
    public List getTxnsWithAnotherId(@QueryParam("searchKey") String id,
                                     @QueryParam("searchValue") String value) {
        if ("event_id".equals(id)) {
            // do an "event_id" based search
        } else if ("activity_id".equals(id)) {
            // do an "activity_id" based search
        } else {
            return null;
        }
    }
}
In my opinion, the 2nd option feels better for searches, but if that's true, why not use the former?
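On the worry that option 2.2 becomes convoluted: one possible way to avoid a growing if/else chain is a lookup table of allowed filters. This is only a sketch in Python with made-up sample data; `ALLOWED_FILTERS` and `find_transactions` are hypothetical names, and using the ID name itself as the parameter (`?event_id=...`) is an assumption rather than the searchKey/searchValue pair shown above.

```python
# Hypothetical sample rows standing in for the transactions table.
TRANSACTIONS = [
    {"txn_id": "t1", "txn_event_id": "e1", "txn_activity_id": "a1"},
    {"txn_id": "t2", "txn_event_id": "e1", "txn_activity_id": "a2"},
    {"txn_id": "t3", "txn_event_id": "e2", "txn_activity_id": "a2"},
]

# Query parameter name -> column it filters on.
ALLOWED_FILTERS = {"event_id": "txn_event_id", "activity_id": "txn_activity_id"}

def find_transactions(params):
    """Handle GET /transactions?event_id=...&activity_id=...; returns (status, body)."""
    unknown = set(params) - set(ALLOWED_FILTERS)
    if unknown:
        return 400, {"error": f"unsupported filter(s): {sorted(unknown)}"}
    result = TRANSACTIONS
    for name, value in params.items():
        column = ALLOWED_FILTERS[name]
        result = [t for t in result if t[column] == value]
    return 200, result
```

Supporting another non-unique ID then means adding one entry to the table, and combining filters comes for free.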

I think it all comes down to the developer's preference. I would not go with either of the options you listed. My approach would be collectionId/resourceId/collectionId/resourceId. So in your case, it would be something like users/1/messages to get all messages of a specific user, or users/1/messages/1 to get the message with an ID of 1 for that specific user. That way, you create clearer API endpoints, which can be routed more efficiently in your app and can be better documented and managed.
Have a look at how Google's API Design Guide approaches this subject for the Gmail resource model:
A collection of users: users/*. Each user has the following resources.
A collection of messages: users/*/messages/*.
A collection of threads: users/*/threads/*.
A collection of labels: users/*/labels/*.
A collection of change history: users/*/history/*.
A resource representing the user profile: users/*/profile.
A resource representing user settings: users/*/settings.

For 2nd one, which way is better?
Either of these is fine for most use cases:
/<resourceName>?<nameOfId>=<value>
/<resourceName>/<nameOfId>/<value>
Tomato, tomato.
One reason that you might care about the difference is in the use of relative resolution and dot segments. Dot segments are useful for traversing the hierarchical portion of the URI, which is to say the path segments.
Another reason that you might care is that the query part of a URI has not always been understood to be part of the identifier. Old versions of the HTTP spec described exceptions to the caching rules when the query part was present. In the current standard, it shouldn't make a difference.
If you are struggling with readability of the URI with data encoded into the path segments, there are a number of spelling conventions that may help -- many derive ideas from TBL's work on Matrix URIs. If your clients and servers have access to decent URI Template implementations, then a lot of the work has already been done for you.
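The point about relative resolution and dot segments can be seen with the standard library's `urljoin`; the host `example.org` and the concrete IDs are placeholders chosen for this sketch.

```python
from urllib.parse import urljoin

path_style = "http://example.org/transactions/eventid/42"
query_style = "http://example.org/transactions?eventid=42"

# Dot segments walk the path hierarchy...
sibling = urljoin(path_style, "../activityid/7")
collection = urljoin(path_style, "..")

# ...whereas a relative reference that is only a query replaces the query
# and leaves the path untouched.
replaced = urljoin(query_style, "?activityid=7")

print(sibling)     # http://example.org/transactions/activityid/7
print(collection)  # http://example.org/transactions/
print(replaced)    # http://example.org/transactions?activityid=7
```

So data placed in path segments can be navigated with relative links, while data in the query cannot be addressed that way.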

I am not sure what your resources are specifically, but here are some tips to keep in mind while designing RESTful APIs.
Identify what the primary resource is.
For example: employees
In your first case, you'd then access employees as
GET /employees. To get all employees.
GET /employees/1. Get a specific employee with ID 1.
Search is specific to your needs. If you need to fetch multiple employees based on IDs, you could do
GET /employees?id=1,2,3,4
Alternately if you find that you will need to search based on more than one parameter, I'd recommend a POST
POST /employees/search
{
"id": [1, 2, 3, 4],
"department": "computer-science"
}
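For the comma-separated form `GET /employees?id=1,2,3,4`, the server has to split the single parameter value itself. A small sketch, where `parse_id_list` is a hypothetical helper name:

```python
from urllib.parse import urlsplit, parse_qs

def parse_id_list(url):
    """Turn ?id=1,2,3,4 into [1, 2, 3, 4]; a missing parameter gives an empty list."""
    raw = parse_qs(urlsplit(url).query).get("id", [""])[0]
    return [int(part) for part in raw.split(",") if part.strip()]
```

A non-numeric entry raises `ValueError` here; an API would normally translate that into a 400 response.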

REST Resources, should they be client specific, server specific or both?

I have a use case where the same entity has 2 identifiers, each of which maps to the entity when used on its own. One identifier is client-friendly (say c_id), and the other is server-friendly (say s_id). Clients do know the s_id, but in most cases they won't use it. The server knows both IDs, but its implementation is such that everything is most easily mapped using s_id.
In such a case, is it good practice to provide resources at both the c_id and s_id level, where the resource name and ID (in the input) differ but do the same thing? Or should there be only a single resource, which leads to the debate over which one should be used?

I would normally have the resources existing under their server identities, and then provide search methods that return 302 redirects (or 307 if you prefer) to the appropriate resources. The clients would use these search/query methods to arrive at the correct resources.
Because the server "owns" the resources (and their URLs), it's fine for it to arrive at the resources by URL-fiddling. Whereas, because the clients shouldn't engage in URL-fiddling to find resources, giving them a query method where they can specify the ID(s) they know as query string parameters feels cleaner.

Your API exists for the benefit of its users. You should always strive to make it as easy as possible for them. If unique client-friendly IDs exist for all resources, then make your API work against the client-friendly ID. Your API can store a map from clientId to serverId in memory and easily switch to the ID it's more comfortable with.
If there aren't unique client-friendly IDs for all resources, you've got a pickle. I'd start by trying to close the gap (i.e. give the remaining resources clientIds). If that's not an option, then I agree with Damien.

What I often do is create the canonical URI as the one with the s_id, so something like,
http://example.org/api/foo/{s_id}
and then provide a search facility that allows searching by c_id, and potentially other attributes.
http://example.org/api/foos?c_id={c_id}
This can return a list of foos, each with a link to the s_id URI. Or if, as in your case, only one foo returns, then you could either redirect to the canonical URI, or you can actually return the full foo representation and set the Content-Location header to the canonical URI.
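A framework-free sketch of the two single-hit behaviours described above, modelling a response as a `(status, headers, body)` tuple. The sample data and the names `find_by_c_id` and `canonical_uri` are assumptions for illustration only.

```python
# Hypothetical store keyed by the server-friendly ID.
FOOS = {"s-1001": {"s_id": "s-1001", "c_id": "alpha", "name": "First foo"}}

# Index from client-friendly ID to server-friendly ID.
C_TO_S = {foo["c_id"]: s_id for s_id, foo in FOOS.items()}

def canonical_uri(s_id):
    return f"/api/foo/{s_id}"

def find_by_c_id(c_id, redirect=True):
    """Handle GET /api/foos?c_id=... when at most one foo can match."""
    s_id = C_TO_S.get(c_id)
    if s_id is None:
        return 404, {}, None
    if redirect:
        # Option 1: send the client to the canonical resource.
        return 302, {"Location": canonical_uri(s_id)}, None
    # Option 2: return the representation and name its canonical URI.
    return 200, {"Content-Location": canonical_uri(s_id)}, FOOS[s_id]
```

The redirect costs the client a second request, while the Content-Location variant answers in one round trip and still tells the client where the canonical resource lives.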

RESTful url to GET resource by different fields

A simple question I'm having trouble finding an answer to.
If I have a REST web service, and my design is not using url parameters, how can I specify two different keys to return the same resource by?
Example
I want (and have already implemented)
/Person/{ID}
which returns a person as expected.
Now I also want
/Person/{Name}
which returns a person by name.
Is this the correct RESTful format? Or is it something like:
/Person/Name/{Name}

You should only use one URI to refer to a single resource. Having multiple URIs will only cause confusion. In your example, confusion would arise due to two people having the same name. Which person resource are they referring to then?
That said, you can have multiple URIs refer to a single resource, but for anything other than the "true" URI you should simply redirect the client to the right place using a status code of 301 - Moved Permanently.
Personally, I would never implement a multi-ID scheme or redirection to support it. Pick a single identification scheme and stick with it. The users of your API will thank you.
What you really need to build is a query API, so focus on how you would implement something like a /personFinder resource which could take a name as a parameter and return potentially multiple matching /person/{ID} URIs in the response.

I guess technically you could have both URIs point to the same resource (perhaps with one of them as the canonical resource), but I think you wouldn't want to do this from an implementation perspective. What if there is an overlap between IDs and names?
It sure does seem like a good place to use query parameters, but if you insist on not doing so, perhaps you could do
person/{ID}
and
personByName/{Name}

I generally agree with this answer that for clarity and consistency it'd be best to avoid multiple ids pointing to the same entity.
Sometimes however, such a situation arises naturally. An example I work with is Polish companies, which can be identified by their tax id ('NIP' number) or by their national business registry id ('KRS' number).
In such a case, I think one should first add the secondary ID as a criterion to the search endpoint. Users will then be able to "translate" between the secondary ID and the primary ID.
However, if users still keep insisting on being able to retrieve an entity directly by the secondary id (as we experienced), one other possibility is to provide a "secret" URL, not described in the documentation, performing such an operation. This can be given to users who made the effort to ask for it, and the potential ambiguity and confusion is then on them, if they decide to use it, not on everyone reading the documentation.
In terms of ambiguity and confusion for the API maintainer, I think this can be kept reasonably minimal with a helper function to immediately detect and translate the secondary id to primary id at the beginning of each relevant API endpoint.
It obviously matters much less than normal what scheme is chosen for the secret URL.
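The helper function mentioned above might look roughly like this. The identifiers below are placeholders, not real NIP or KRS formats, and `to_primary_id` and `get_company` are hypothetical names.

```python
# Placeholder mapping from secondary ID to primary ID (not real registry numbers).
SECONDARY_TO_PRIMARY = {"krs-0001": "nip-0001"}

# Placeholder store keyed by primary ID.
COMPANIES = {"nip-0001": {"name": "Example Sp. z o.o."}}

def to_primary_id(identifier):
    """Translate a secondary ID to the primary one; primary IDs pass through unchanged."""
    return SECONDARY_TO_PRIMARY.get(identifier, identifier)

def get_company(identifier):
    """Endpoint body: normalise the ID first, then work with primary IDs only."""
    return COMPANIES.get(to_primary_id(identifier))
```

This relies on the two ID spaces never overlapping; if they could, the translation would need an explicit hint about which kind of ID was passed.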
