What would be the best approach to implementing a GET REST API that checks whether a given URL exists in the database?
Each GET request will have the following parts: hostname, port, origin, path, and query.
My idea is to set up the resource as follows.
/urlservice/1/{hostname}/{port}/{origin}/{path}/{query}
But this seems very verbose, since it results in resource URLs like:
/urlservice/1/google.com/80/"https://google.com/"/"/search"/"?q=aba"
What is a better way of designing this?
The main caveat with REST when designing your URI structure is that you follow the URI spec. With that in mind, the URI spec has a couple of important notes on structure that help with your question:
1.2.3. Hierarchical Identifiers
The URI syntax is organized hierarchically, with components listed in
order of decreasing significance from left to right. For some URI
schemes, the visible hierarchy is limited to the scheme itself:
everything after the scheme component delimiter (":") is considered
opaque to URI processing. Other URI schemes make the hierarchy
explicit and visible to generic parsing algorithms.
The generic syntax uses the slash ("/"), question mark ("?"), and
number sign ("#") characters to delimit components that are
significant to the generic parser's hierarchical interpretation of an
identifier...
Now in regards to the query string:
3.4. Query
The query component contains non-hierarchical data that, along with
data in the path component (Section 3.3), serves to identify a
resource within the scope of the URI's scheme and naming authority
(if any). The query component is indicated by the first question
mark ("?") character and terminated by a number sign ("#") character
or by the end of the URI...
With the above in mind, you need to determine whether your structure is hierarchical in order to follow the URI spec and so satisfy REST. Of course, there is a bit of subjectivity here, but most (if not all) of the parameters you called out look like candidates for the query string, as they are non-hierarchical. To that end, I'd recommend moving them to the query string.
/urlservice/1?hostname={hostname}&port={port}&origin={origin}&path={path}&query={query}
Again, as there is some subjectivity and you know your domain better than others, use the above guidance and make your best judgement call.
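As a minimal sketch of how a client might build that query-string form, assuming Python's standard urllib.parse (the function name build_lookup_url is hypothetical, not part of the question's API). It also shows that reserved characters in the values get percent-encoded, which removes the need for the quoting used in the original path-based example:

```python
from urllib.parse import urlencode

def build_lookup_url(hostname, port, origin, path, query):
    """Build the lookup URI with every part passed as a query parameter."""
    params = {
        "hostname": hostname,
        "port": port,
        "origin": origin,
        "path": path,
        "query": query,
    }
    # urlencode percent-encodes reserved characters such as ':', '/', '?' and '='
    return "/urlservice/1?" + urlencode(params)

lookup_url = build_lookup_url("google.com", 80, "https://google.com/", "/search", "?q=aba")
print(lookup_url)
# /urlservice/1?hostname=google.com&port=80&origin=https%3A%2F%2Fgoogle.com%2F&path=%2Fsearch&query=%3Fq%3Daba
```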
You could decompose it to be service based rather than specifying everything in the request:
/urlservice/1/{service}/{request}
The services would be 'service based' so a google search service would know how to build a google search url:
/urlservice/1/google/aba
would be resolved to:
https://google.com/search/?q=aba
by the google service. It also means all clients wouldn't have to change if google changed their service parameters. Only the google service would change its internal implementation of the url builder.
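A rough sketch of that idea, assuming a simple in-process registry (URL_BUILDERS, google_search_url and resolve are hypothetical names used only for illustration). Each service owns its own URL builder, so a change to the target URL format touches one function:

```python
from urllib.parse import quote

def google_search_url(request):
    # Only this function changes if the target URL format changes.
    return "https://google.com/search/?q=" + quote(request, safe="")

# Hypothetical registry mapping {service} to its URL builder.
URL_BUILDERS = {"google": google_search_url}

def resolve(service, request):
    """Resolve /urlservice/1/{service}/{request} to the full target URL."""
    builder = URL_BUILDERS.get(service)
    if builder is None:
        raise KeyError(f"unknown service: {service}")
    return builder(request)

resolved = resolve("google", "aba")
print(resolved)  # https://google.com/search/?q=aba
```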
I'm learning REST and I have a question.
Is there a scenario where the endpoint person/pathParm1/PathParam2 is legitimate?
For example:
person/ben/stiller
people/2/4
As far as I understand REST, query parameters should be used for searches:
person?firstName=ben&secondName=stiller
or
person/2/order4
REST doesn't care what spelling conventions you use for your resource identifiers.
So if you want to have a URI template with multiple variables to expand, and more than one of those variables are expanded as path segments, that's fine.
For example, you'll notice that your browser has no trouble with this resource identifier:
https://stackoverflow.com/questions/74969638/endpoint-with-two-path-parameters
which might reasonably be produced by expanding variables into a template like
https://stackoverflow.com/questions/{id}/{hint}
As far as I understand REST, query parameters should be used for searches:
That's not a REST constraint, although for the special case of the web it turned out that way. This is primarily a historical accident: we didn't have standards for URI templates when the web was young, which meant that searches came about from the standardized implementation of HTML form submissions (application/x-www-form-urlencoded key value parameters replacing the query part of the form action)
REST does say that we use resource identifiers to... identify resources; and that we all use the same general purpose resource identifiers (ie: conforming to the production rules defined in RFC 3986), but without constraints on the spelling or semantics of those identifiers.
Example: URL shorteners work.
(Note: your misunderstanding is a common one, and not at all your fault; the literature sucks. FWIW, I was once where you are; Stefan Tilkov's 2014 talk was the one that really got my own thinking straightened out.)
That said, you might find a "query parameters should be used for searches" constraint coming from somewhere else; a local style guide, for example.
This means I could also make a RESTful endpoint like api/person/{firstName}/{lastName} instead of api/person?firstName=ben&lastName=stiller?
Yes; you can use either of those spellings for your resource identifiers, and all of the general purpose components out there will still "just work" -- because they are treating the resource identifier as semantically opaque.
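To make that concrete, here is a small sketch, assuming Python's standard urllib.parse, of an origin server extracting the same two values from either spelling. The function names and the leading slash on the paths are assumptions made for the example:

```python
from urllib.parse import urlsplit, parse_qs, unquote

def person_from_path(uri):
    # Spelling 1: /api/person/{firstName}/{lastName}
    segments = urlsplit(uri).path.strip("/").split("/")
    return unquote(segments[-2]), unquote(segments[-1])

def person_from_query(uri):
    # Spelling 2: /api/person?firstName=...&lastName=...
    params = parse_qs(urlsplit(uri).query)
    return params["firstName"][0], params["lastName"][0]

from_path = person_from_path("/api/person/ben/stiller")
from_query = person_from_query("/api/person?firstName=ben&lastName=stiller")
print(from_path, from_query)  # ('ben', 'stiller') ('ben', 'stiller')
```

Only the server-side parsing differs; clients, caches and other general purpose components treat both strings as opaque identifiers.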
I've come across this curious scenario while writing tests + documentation for a REST API I am developing. According to this REST tutorial, a key abstraction to exploit in a RESTful API is the concept of a resource, and a common pattern is to have resources which themselves contain resources of their own. Additionally, returning 404 for an ID'd resource that does not actually exist is just as much of a common pattern.
My question comes from the fact that a 404 response code can be ambiguous considering the hierarchical nature of a REST API.
For example, assume the data layer our REST API interacts with has the following data:
{
"users": {
"foo": {
"notes": {
"hello": "world"
}
}
}
}
Calls to our REST API that return 200 imply that all resources in the path exist:
GET /users/foo returns 200 because the user foo exists.
GET /users/foo/notes returns 200 for the same reason.
GET /users/foo/notes/hello returns 200 because both the user foo and a note named hello belonging to foo exist.
There are even expected 404 response codes for particular paths:
GET /users/bar returns 404. That is nonambiguous since the 404 only refers to one resource.
GET /users/bar/notes returns 404. This is just as unambiguous (assuming the API does not return 404 for nonexistent paths).
But consider that the following return 404 for different and ambiguous reasons:
GET /users/bar/notes/baz returns 404 because the user bar does not exist.
GET /users/foo/notes/baz returns 404 because the existing user foo does not have a baz note.
In short, the 404s returned do not inform the client what exactly failed to be found: the user or the note. So my question is as follows:
Is it the responsibility of the server to be nonambiguous with 404 response codes? And if so, how should it differentiate to the client the nonexistence of a user versus the nonexistence of a user's note?
By providing "a representation containing an explanation of the error situation, and whether it is a temporary or permanent condition" as described in RFC 7231.
In other words, put the explanatory details into the document that you include in the HTTP response.
It may help to think more carefully about how all this works with web pages.
The status code is metadata in the transfer of documents over a network domain. The intended audience for that information is the web browser (and other general purpose components - spiders, caches, and so on). It's provided so that your browser (and other general purpose components) can correctly interpret the semantics of the response.
The audience for the "representation of the error" is the human being using the web browser. That's the place where one would provide, for example, information about what specifically has gone wrong, or what corrective actions might be taken.
In modern days, it is often the case that we are expecting bespoke machine clients, rather than humans, to be looking at the "web browser". Free form text or free form text marked up with hypermedia controls aren't likely to be useful. So we probably want to use problem details - a standardized schema for reporting problems.
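As a hedged sketch of what that could look like for the two ambiguous 404s in the question, assuming the problem details member names (type, title, status, detail, instance) and the application/problem+json media type. The USERS data mirrors the question, while get_note, not_found and the detail wording are hypothetical:

```python
import json

# Data layer from the question.
USERS = {"foo": {"notes": {"hello": "world"}}}

def not_found(path, detail):
    """Build a 404 response whose body explains what was missing."""
    problem = {
        "type": "about:blank",
        "title": "Not Found",
        "status": 404,
        "detail": detail,
        "instance": path,
    }
    return 404, "application/problem+json", json.dumps(problem)

def get_note(user_id, note_id):
    """Return (status, content_type, body) for GET /users/{user_id}/notes/{note_id}."""
    path = f"/users/{user_id}/notes/{note_id}"
    user = USERS.get(user_id)
    if user is None:
        return not_found(path, f"User '{user_id}' does not exist.")
    if note_id not in user["notes"]:
        return not_found(path, f"User '{user_id}' has no note named '{note_id}'.")
    return 200, "application/json", json.dumps({note_id: user["notes"][note_id]})

for user_id, note_id in [("bar", "baz"), ("foo", "baz")]:
    status, content_type, body = get_note(user_id, note_id)
    print(status, json.loads(body)["detail"])
```

Both calls return the same status code; a machine client that needs to tell the cases apart reads the body.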
One difficulty you may be having (not your fault; the literature sucks) is recognizing that identifiers are semantically opaque. /users/foo/notes/baz does not, generally, have any dependency on /users/foo/notes or any of the other prefixes. Nor does the identifier mean that /users/foo/notes/baz has four different parts that need to be satisfied.
Identifiers should be understood like keys into a map/dictionary - 200 means that the key exists in the map, 404 means the key doesn't exist in the map. But that doesn't actually tell you anything about the presence or absence of other keys with similar spellings!
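The map analogy can be sketched literally. This is an illustration only: the RESOURCES table is hypothetical and deliberately omits /users/foo/notes, which the API in the question would actually serve, to show that a 200 for one key implies nothing about its prefixes:

```python
# Hypothetical flat table of identifiers; prefixes are not implied.
RESOURCES = {
    "/users/foo": {"name": "foo"},
    "/users/foo/notes/hello": "world",
}

def status_for(identifier):
    """200 if the exact key is present, 404 otherwise."""
    return 200 if identifier in RESOURCES else 404

print(status_for("/users/foo/notes/hello"))  # 200
print(status_for("/users/foo/notes"))        # 404
```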
Is your API, which conventionally organizes its resource model into a hierarchy, and chooses identifiers that are closely aligned with that hierarchy, "better" than an API that uses an unconventional resource model and arbitrary identifiers? Probably.
But good resource models and good identifier spelling conventions are not a REST constraint, and the HTTP and URI specifications also support designs that don't follow the current conventions (among other things, backwards compatibility is really important to REST and the web; REST and the web predate these spelling conventions by quite a bit).
(Analogy: we have coding conventions that describe "best practices" around ideas like variable naming and function naming because we use languages that don't restrict us to using "good" names. The machines don't care.)
To illustrate my problem I'll use the Express.js term "route parameters" (I also see people call them path parameters). The "proper" URL would be
Route path: /users/:userId/books/:bookId
But I am currently taking over a project that designs its API like this:
/:userId/:bookId
/:groupId/:userId/some_resource
...
The obvious problem is that when I look at a URL in the browser, I am confused about what those parameters mean. But the project has been running for more than a year, so I need to know whether it is worth the effort to rewrite it.
So are there other problems with URLs like these?
They might be making extra work for your operators when reading the access logs?
REST doesn't care about URI spelling conventions - until you get to the origin server, a URI is effectively an opaque string; only the origin server has the authority to decompose the URI into its semantic parts.
Which is to say, general purpose components don't care that there are identifiers encoded into the path, or that the semantics of those identifiers changes depending on other path elements.
In particular, they don't care at all that unrelated identifiers have common elements:
/1/2
/1/2/some_resource
As far as a general purpose component is concerned, the resources identified here have no special relationship to one another. (For example, if you DELETE /1/2, that's not expected to impact /1/2/some_resource in any way).
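A small sketch of the "only the origin server decomposes the URI" point, using plain regular expressions rather than Express itself (the pattern names and the decompose helper are hypothetical). Both spellings give the server the same parameters; everything upstream just sees a string:

```python
import re

# The inherited design: /:userId/:bookId
CURRENT_ROUTE = re.compile(r"^/(?P<userId>[^/]+)/(?P<bookId>[^/]+)$")
# The more conventional design: /users/:userId/books/:bookId
VERBOSE_ROUTE = re.compile(r"^/users/(?P<userId>[^/]+)/books/(?P<bookId>[^/]+)$")

def decompose(route, path):
    """Return the named parameters if the path matches the route, else None."""
    match = route.match(path)
    return match.groupdict() if match else None

print(decompose(CURRENT_ROUTE, "/1/2"))              # {'userId': '1', 'bookId': '2'}
print(decompose(VERBOSE_ROUTE, "/users/1/books/2"))  # {'userId': '1', 'bookId': '2'}
print(decompose(CURRENT_ROUTE, "/1/2/some_resource"))  # None
```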
when I look at the url from browser I will feel confused with what those parameters mean
Yup - this is your primary argument: that the current URI design doesn't consider human affordances.
Unless you can make a case that those human focused considerations (users, operators, tech writers) offset the costs of change, you are probably stuck with it.
For getting the latest valid address (of the logged in user), how RESTful is the following URL?
GET /addresses/valid/latest
Probably
GET /addresses?valid=true&limit=1
is the best, but it should then return a list. And I'd like to return an object rather than a list.
Any other suggestions?
Your url structure doesn't have much to do with how RESTful something is.
So let's ask instead which one is 'best'. That is also a bit hard to say, and pretty subjective.
I would generally avoid a pattern like /addresses/valid/latest. This kind of suggests that there is a 'latest resource' in the 'valid collection', in the 'addresses collection'.
So I like your other suggestion a bit better, because it suggests that you're using an 'addresses' collection, filtering by valid items and only showing 1.
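A minimal sketch of that collection-with-filters reading of GET /addresses?valid=true&limit=1. The sample data, the "updated" field and the newest-first ordering are assumptions for illustration; the URL in the question does not say how "latest" is determined:

```python
from datetime import date

# Hypothetical sample data.
ADDRESSES = [
    {"id": 1, "valid": True, "updated": date(2020, 1, 5)},
    {"id": 2, "valid": False, "updated": date(2021, 3, 9)},
    {"id": 3, "valid": True, "updated": date(2021, 2, 1)},
]

def list_addresses(valid=None, limit=None):
    """Filter the collection, newest first, and return a (possibly short) list."""
    items = [a for a in ADDRESSES if valid is None or a["valid"] == valid]
    items.sort(key=lambda a: a["updated"], reverse=True)
    return items if limit is None else items[:limit]

latest_valid = list_addresses(valid=True, limit=1)
print([a["id"] for a in latest_valid])  # [3]
```

The result is still a list, of length zero or one, which is the trade-off raised in the question.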
If you don't want all kinds of parameters, I would be more inclined to find a url pattern that's not literally 'addresses, but only the valid, but only the latest', but think about what the purpose is of the endpoint. Maybe something that's easier to remember like /fresh-address =)
how RESTful is the following URL?
Any identifier that satisfies the production rules described by RFC 3986 is RESTful.
General purpose components are not supposed to derive semantics from identifiers, they are opaque. Which means that the server is free to encode information into those identifiers at its own discretion.
Consider Google search: does your browser care what URI is used as the target of the search form? Does your browser care about the href provided by Google with each search result? In both cases, the browser just does what it is told, which is to say it creates an HTTP request based on the representation of application state that was provided by the server.
URI are in the same broad category as variable names in a programming language - the machines don't care so long as the spellings are consistent with some simple constraints. People care, so there are some benefits to having a locally consistent and logical scheme.
But there are contexts in which easily guessed URI are not what you want. See Mark Seemann 2013.
Since the semantic content of the URI is reserved for use by the server only, it follows that the server can choose to encode that information into path segments or the query part. Or both.
Spellings that can be described by a URI Template can be very powerful. The most familiar URI template is probably an HTML form using the GET method, which encodes key value pairs onto the query part of the URI; so you should think about whether that's a use case you want to support.
I want the following URLs:
Get me players with names John and age 30
Get me players with names John or Mary
So thinking:
/players?names=John&age=30
But for the second one, the "or" bit is making me think
- /players?names=John&name=Mary
is wrong
and it should be:
- /players?names=John,Mary
Note: I get the idea that pure REST doesn't care as URLs are supposed to be opaque but just thinking more API or pragmatic REST here.
REST doesn't care what spelling you use for your identifiers, so long as the spelling conforms to the production rules defined by RFC 3986.
Level 4 URI Templates support behaviors for encoding/extracting lists into a URI. They can be made to work pretty much the way you illustrate in your second example /players?names=John,Mary, with some additional magic required to handle cases where the delimiters and reserved characters appear in the data.
You'll also see examples where a particular query name is repeated more than once: /players?name=John&name=Mary; JAX-RS is an example where this sort of spelling can be extracted into a List or a Set.
In some cases, you'll find examples where brackets [] are used to identify array parameters: /players?name[]=John&name[]=Mary. That's not RFC 3986 compliant, but is based on an older specification that is now obsolete.
This is mainly decided by how you handle the request in the server. Please be aware that many frameworks will consider /players?names=John,Mary to be a single element and not necessarily a list if you do not set up a matching URI pattern (e.g. take a look at this discussion regarding micronaut).
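The difference is easy to see with Python's standard urllib.parse, used here only as an illustration of generic query parsing, not of any particular framework. The repeated-name spelling parses to a list, while the comma-separated spelling arrives as one string that the server must split itself:

```python
from urllib.parse import parse_qs

repeated = parse_qs("name=John&name=Mary")
comma = parse_qs("names=John,Mary")

print(repeated)  # {'name': ['John', 'Mary']}
print(comma)     # {'names': ['John,Mary']}

# Naive split; breaks if a name can itself contain a comma.
names = comma["names"][0].split(",")
print(names)     # ['John', 'Mary']
```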
Nothing is stopping you from using the array param and treating it as OR. Just make sure your API documentation is clear.