I have a couple of microservices and would like to be able to see every endpoint that a specific API call goes through. In essence, given a requestID, I should be able to generate a sequence diagram of its journey.
Research suggests that I need to attach a UUID to every request, and then I can log the request to ELK wherever I am interested. Seems logical.
My concerns:
How do you guarantee that some intermediate service or function does not strip or change this requestID?
Is it a good idea to generate the ID in the client or the API gateway?
Where would you keep this requestID: header? body? url parameter?
Rather than hypothetical recommendations (which I have already considered), I would appreciate real-world experience from someone who has done this. Thanks.
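For reference, the header-based variant of this idea can be sketched as follows. This is only an illustration: the header name X-Request-ID and both helper names are assumptions, not part of any existing service. The entry point (gateway or first service) mints the ID when it is missing, and every downstream call copies it forward unchanged.

```python
import uuid
from typing import Dict, Mapping, Optional

# Assumed header name for illustration; it is a common convention, not a standard.
REQUEST_ID_HEADER = "X-Request-ID"


def ensure_request_id(headers: Mapping[str, str]) -> str:
    """Return the incoming request ID, or mint a new UUID if none was sent."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    for name, value in headers.items():
        if name.lower() == REQUEST_ID_HEADER.lower() and value.strip():
            return value.strip()
    return str(uuid.uuid4())


def outgoing_headers(request_id: str,
                     extra: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build headers for a downstream call that always carry the request ID."""
    headers = dict(extra or {})
    headers[REQUEST_ID_HEADER] = request_id
    return headers
```

Each service would call ensure_request_id once per incoming request, include the value in every log line it sends to ELK, and build all outbound calls through outgoing_headers so the ID cannot be dropped by accident.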
I want to design a RESTful API for a website scraping service. A user delegates a task to the service. Each task is a website that has to be scraped. The user can check the status of each task. When a task is done, the user can fetch its result.
The status can be either "Waiting", "In progress" or "Done". When it is done, the user can get the data.
What I have now is:
POST /tasks - post a URL to scrape
GET /tasks - returns a list of tasks
I need two more endpoints: one to get the status of a task and one to get the scraped data from a website. What should the GET look like?
GET /tasks/{id} - return a status? Or return the data?
Or maybe
GET /tasks/{id}/status
GET /tasks/{id}/data
But what would /tasks/{id}/ return then?
And what if I would also like to present the scraped data as HTML?
Should I use
GET /tasks/{id}/data or GET /tasks/{id}/result
POST /tasks - post a URL to scrape
GET /tasks - returns a list of tasks
That's good. Notice that when you POST successfully, cache invalidation kicks in. Generic clients will know that the previously returned representation(s) of the list of tasks is no longer valid.
GET /tasks/{id} - return a status? Or return the data?
Why not both? /tasks/{id} identifies a resource; you can use any sort of representation you like for it. There's no reason that the representation shouldn't include optional elements.
(Heuristic: what would the web page look like? Do you really feel like there need to be two different web pages for this one concept? If not, then it can probably be a single resource in your API.)
what if I would also like to present the scraped data as HTML?
Same identifier is fine for multiple representations; the client can use the Accept header to describe its preferences to the server.
You may want to give some thought to the problem of how the client knows what representations are possible. On the web, the specification for HTML describes a number of different kinds of links -- browsers can state different preferences when they encounter a script tag or an image tag, for example. You'll want something similar in your own media types.
There is nothing wrong with deciding that these should all be different resources, too. Either approach can be implemented in a way that is consistent with the REST architectural style.
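As a rough sketch of the single-resource approach: one function produces the representation for /tasks/{id}, includes the data only when the task is done, and picks JSON or HTML from the Accept header. The function name and field names are made up for illustration, and the negotiation is deliberately naive (it ignores q-values and wildcards).

```python
import json
from html import escape
from typing import Optional, Tuple


def task_representation(status: str, data: Optional[str],
                        accept: str = "application/json") -> Tuple[str, str]:
    """Return (content_type, body) for a single task resource."""
    task = {"status": status}
    if status == "Done" and data is not None:
        task["data"] = data  # optional element, present only once the task is finished

    # Naive content negotiation: a real server should parse q-values properly.
    if "text/html" in accept:
        rows = "".join(
            f"<dt>{escape(key)}</dt><dd>{escape(value)}</dd>"
            for key, value in task.items()
        )
        return "text/html", f"<dl>{rows}</dl>"
    return "application/json", json.dumps(task)
```

The same identifier therefore serves both a machine-readable and a human-readable view, and a waiting task simply omits the data element.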
I don't really know the constraints but GET /tasks/{id} could return both status and data if available.
If you prefer not to (for example, if getting data too often would be a problem from a performance perspective), it seems sensible to have:
GET /tasks/{id} #returns status and other plain task fields
and then:
GET /tasks/{id}/scrappeddata #returns data
Why? Because that way is probably the most consistent with your model (and/or the mental model in the mind of your API users).
The general rules on resource naming given in Rest API tutorial are helpful: https://www.restapitutorial.com/lessons/restfulresourcenaming.html
There are no hard rules when it comes to naming routes for a RESTful API.
You can adhere to a convention, learn best practices, and take advice from SO, but at the end of the day, you're the one designing your API, so you know better than anyone else what would fit your particular use case.
Search for "rest api naming best practices" or "how to structure rest api routes" and you'll get plenty of ideas.
The two suggestions that @jonrsharpe and I made are both valid; it's up to you to define what makes sense for your project.
We have a series of REST services that pull resources by identifier but we've been recently tasked with passing disclosure parameters to save with audit.
What used to be...
GET entity/{id}
now turns into something like...
GET entity/{id}?requestName=&requestingOrganization=&reasonForUse=&verificationMethod=&otherAuditDisclosureProperties....
The state of the entity does not change and the call is still idempotent; however, we must audit the additional information with each call in order to provide it.
My first thought was to construct a body instead, but that did not seem proper for a GET. This is the second approach, using query parameters that are not intended for querying or filtering. These additional parameters are truly context information captured at the point of request. They are the equivalent of SAML attributes within a SOAP call that live outside of the SOAP body (which makes me think of them as possible header attributes).
Also note that this information is relayed, so the authentication token provided is for the service user calling in and not the actual identity of the context. The identity of the original caller is implicitly trusted in the surrounding trust framework.
How would you define this verb/path?
Maybe a custom header: vnd.mycompany.myheader; where you put all the params you need in some parseable format: name1=value1; name2=value2. You take the waste out of the query string.
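A minimal sketch of parsing such a header on the server side, assuming the "name1=value1; name2=value2" format described above. The function name is hypothetical, and this simple split does not handle values that themselves contain ";" or "=" (those would need quoting or encoding).

```python
from typing import Dict


def parse_disclosure_header(value: str) -> Dict[str, str]:
    """Parse 'name1=value1; name2=value2' into a dict of audit parameters."""
    params: Dict[str, str] = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing or doubled ';'
        name, sep, val = part.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {part!r}")
        params[name.strip()] = val.strip()
    return params
```

For example, parse_disclosure_header("requestName=Alice; reasonForUse=audit") yields a dict with the two keys requestName and reasonForUse, which can then be written to the audit log without touching the query string.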
The off-topic response
I cannot imagine a scenario where you ask the user of an API for such subjective information, which requires a lot of effort to provide (as it changes per request) and provides no value to the client. It is only for your internal use. The most probable result is clients hard-coding those values and repeating them in all requests.
If the client is internal you may be looking for ways to correlate requests that span multiple services, like Sleuth, which will let you understand why clients are using your API.
If the client is external, think of making surveys and personal interviews with developers. I'd also suggest that you first nurture your API community to reach those people and understand how and why they use your API.
I agree with Daniel Cerecedo. The proper way is to add the information as part of your Request Header.
General information can be found at: https://www.w3.org/Protocols/HTTP/HTRQ_Headers.html
The implementation will depend on your programming language.
Considerations:
First of all, I'm looking for a programmed/automated solution, not a -personal- solution. I'm afraid that this question may not have a direct answer because of how the technology works, so I'll consider any workaround that makes this validation possible.
Scenario:
I've a public RESTful service that my customers (third party applications) can consume.
It uses Basic authentication (in the header), and the POST has a parameter that contains a SHA-256 hash of the data sent in the other parameters, in order to validate the data.
This hash is computed with a hash key that I provide to each customer, because some customers are competitors of one another.
Anyway...
Problem:
Some customers are hitting the service directly from AJAX instead of using a server-side HTTP client. They are using the hash key and the user/pass inside JavaScript, and despite my recommendations, there were no changes in their code. Because of this, we are not enabling them in our production environment.
Question:
Is it possible (and how can I do it) to validate that a call comes from the server side without checking the referer URL?
Just as a comment, I'm using Web Api 2.2 in C#, but I think I could handle writing the code myself, so any answer without code will be useful anyway.
I'm afraid that no answer exists, because the clients look the same, but any workaround or idea will be appreciated.
Sorry for my English and my poor knowledge of HTTP clients.
If you could describe why it is a problem that customers are using AJAX, it would be easier to suggest a general solution. For example, you can create a registration service where your customers must specify their IPs so you can whitelist them, or you can create a client auth library that all customers should use.
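The IP whitelist idea could look roughly like this. It is a sketch under assumptions: the function name and the example addresses are invented, and behind a proxy or load balancer the remote address your service sees may be the proxy's rather than the caller's. It does not prove a call is server-side; it only rejects calls from addresses the customer did not register, which is what browser-originated AJAX calls would normally be.

```python
import ipaddress
from typing import Iterable


def is_allowed(remote_addr: str, allowlist: Iterable[str]) -> bool:
    """Return True if remote_addr falls inside any registered address or CIDR range."""
    addr = ipaddress.ip_address(remote_addr)
    # ip_network accepts both single addresses ("203.0.113.7") and ranges ("198.51.100.0/24").
    return any(addr in ipaddress.ip_network(entry) for entry in allowlist)


# Hypothetical per-customer registration data (documentation-only address ranges).
CUSTOMER_ALLOWLIST = ["203.0.113.7", "198.51.100.0/24"]
```

A request whose source address fails is_allowed for the authenticated customer would be rejected before the hash is even checked.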
One of the goals of the REST API architecture is decoupling of the client and the server.
One of the questions I have run across in planning a REST API is: "how does the client know what is a valid payload for POST methods?"
Somehow the API needs to communicate to the UI what a valid payload is for a given resource’s POST method. Otherwise we are back to depending on out-of-band knowledge to work with an API, and we are tightly coupled again.
So I’ve had this idea that the API response for a GET on a resource would provide a specification for constructing a valid payload for the POST method on that resource. This would include field names, data type, max length, etc.
This guy has a similar idea.
What's the correct way to handle this? Are most people just relying on out-of-band information? What are people doing in the real world with this problem?
EDIT
Something I have come up with to solve this problem is illustrated in the following sequence diagram:
The client and the api service are separate. The client knows:
Entry point
How to navigate the API via the hypermedia.
Here's what happens:
Someone (user) requests the registration page from the client
The client requests the entry point from the API and receives all hypermedia links with appropriate meta data on how to traverse them legally.
Client constructs the registration form based on the meta data associated with the registration hypermedia POST method.
User fills in the form and submits.
Client POSTs to the API with the correct data and all is well.
No magic /meta resources, no need to use a method for the meta data. Everything is provided by the API.
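To make the flow concrete, here is a small sketch of what the client side could do with such metadata. The shape of the link description (rel, href, method, fields with name, type, maxLength, required) is purely hypothetical; it is one possible format, not an existing standard.

```python
from typing import Any, Dict, List

# Hypothetical metadata as it might arrive with the entry-point response.
REGISTER_LINK = {
    "rel": "register",
    "href": "/users",
    "method": "POST",
    "fields": [
        {"name": "email", "type": "string", "maxLength": 254, "required": True},
        {"name": "displayName", "type": "string", "maxLength": 50},
    ],
}


def validate_payload(fields: List[Dict[str, Any]], payload: Dict[str, Any]) -> List[str]:
    """Check a payload against the field metadata and return a list of error messages."""
    errors: List[str] = []
    known = {field["name"] for field in fields}
    for field in fields:
        name = field["name"]
        if name not in payload:
            if field.get("required", False):
                errors.append(f"{name}: required")
            continue
        value = payload[name]
        if field.get("type") == "string":
            if not isinstance(value, str):
                errors.append(f"{name}: expected string")
            elif "maxLength" in field and len(value) > field["maxLength"]:
                errors.append(f"{name}: longer than {field['maxLength']}")
    # Anything the metadata does not describe is rejected.
    for name in payload:
        if name not in known:
            errors.append(f"{name}: unknown field")
    return errors
```

The client would build the form from REGISTER_LINK["fields"], run validate_payload before submitting, and POST to REGISTER_LINK["href"] only when the returned list is empty.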
Thoughts?
Most people are relying on out-of-band information. This is usually ok, though, because most clients aren't being built dynamically, but statically. They rely on known parts of the API rather than being HATEOAS-driven.
If you are developing or want to support a metadata-driven client, then yes, you're going to need to come up with a schema for providing that information. The implementation you linked to seems reasonable after a quick skim. Note that you've only moved the problem, though. Clients still need to know how to interpret the information in the metadata responses.
You are right, the client should understand the semantics of the links in the response, and choose the right one from them to achieve its goal. The client is coupled to the semantics the API provides about this and not to the API itself. So, for example, a client should not retrieve information from the URI structure, since that is tightly coupled to the actual API.
I know of two current types of solution for this:
With HAL+JSON, you use IANA link relations to describe what the link does, and vendor-specific MIME types to describe the schema of the fields.
With JSON-LD (or any other RDF format) and the Hydra vocab, you send back RDF metadata according to the operation the link calls. This metadata can contain the validation details of the fields (xsd vocab) and the semantics of the fields (microdata, microformats, etc.). This information is completely decoupled from the API implementation, so it might be a better option than using vendor-specific MIME types, but Hydra is still under development and HAL is much simpler.
Your solution is valid as well, but I think you should check both of these, since they are already standard solutions, and the uniform interface / self-descriptive message constraint of REST encourages the use of existing standards instead of custom solutions. But it is up to you if you want to create your own standard.
I think you are asking about REST API metadata handling. Unlike SOAP, REST APIs don't normally use metadata, but it can be pretty useful once your API gets bigger.
I think you should look into Swagger. It is the most elegant option you can find for REST APIs. I have been using it for some time, and with the annotation support it has been rather easy to work with. It also has many examples on GitHub. Another advantage is that it includes a nice configurable UI.
Apart from that, you can find other ways of doing it, like WADL and WSDL 2.0. Even though I haven't been using them, you can read more about them here.
With RFC 6861, you can link to your form with create-form and edit-form Link Relations, instead of the client constructing the form by itself. The corresponding form should have the necessary schema to construct the POST request.
I have a question about REST in general.
Imagine I have a WCF webservice that is used to add an operation performed on a bank card.
The problem is that there are about 30 different parameters to pass on the WS.
On WCF that's pretty easy to do, calling a RPC with all those parameters.
The problem is that I wanted to switch this WCF WS to a REST API with ServiceStack.
The problem I encountered is that if I try to create the operation using REST and pass the parameters through the query string, I get a string that is AWFUL to read and VERY VERY LONG (?amount=1234&operationID=12& etc.).
I know this way of doing it is not good, as it's not resource oriented, but does that mean I should split the creation of that item into SEVERAL steps (I mean, first create it using POST and then add new info/fields using several POSTs)?
In this situation I can't see clearly the gain with REST.
If you are passing these parameters in a query string, I assume you are performing an HTTP GET. In a REST API, GETs are generally reserved for getting data back, and the only parameters you pass in are to filter your results. If you are performing an operation that changes the state of the system, you want to perform a POST or PUT and pass the data in the body of the message as either XML or JSON, not in the query string.
The gain with REST comes if you are opening this API up to others, as it makes it much more portable to heterogeneous systems and there are some performance benefits. It also opens your API up to being used by clients such as web browsers. But if this API is just for internal use by a .NET application that is not run in a browser, then you may want to stick with WCF. REST is not the answer for every problem.
I am not sure I understand your question... REST doesn't mean "no payload". On the contrary, REST means "representational state transfer", so the body of HTTP requests (aka "representational state") is essential.
For a lot of reasons, in the case of a bank, resources are usually bank operations. CouchDB's guide has a very nice scenario about that.
In other words, your "parameters" would be the attributes of the resource representation (in JSON, XML or what you want) you would GET, POST, PUT or DELETE.
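As an illustration, the 30 parameters become fields of one JSON document sent in the body of a single POST. This sketch only builds the request object and does not send it; the /operations path, the host name and the field names are assumptions made up for the example.

```python
import json
import urllib.request
from typing import Any, Dict


def build_create_operation_request(base_url: str,
                                   operation: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST whose body is the operation's JSON representation."""
    body = json.dumps(operation).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/operations",  # hypothetical collection resource
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage: every former query-string parameter is now a JSON attribute.
request = build_create_operation_request(
    "https://bank.example",
    {"amount": 1234, "operationID": 12, "currency": "EUR"},
)
```

The URL stays short no matter how many attributes the operation has, and the whole resource is created in one step rather than through several partial POSTs.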