RESTful design for handling requests differentiated as either JSON or multipart/form-data

I am designing a REST API for text analysis, and I want to accept submitted content as either JSON or in file format (which is more convenient for large submissions). The JSON format has a "text" field, from which the text is extracted. Uploaded files just contain raw textual content. So I will be handling these two types of content as either a JSON-encoded body, or in the latter case as a multipart/form-data upload. My question is really related to how I should design my API with respect to handling these two different types of submissions, whilst remaining as RESTful as possible.
My initial thought was to have two different endpoints, /json for JSON and /files for uploaded files. It doesn't seem right, however, to have two endpoints differentiated only by the type of content submitted by clients rather than by functionality. As an alternative I then considered differentiating according to request method, using PUT for JSON and POST for files. Again this seems wrong, since it is at odds with the semantics of the request methods themselves.
It seems the only alternative is to accept the two types of encodings via the same endpoint. I'm still not sure if this is the right way to proceed from a design perspective, however, hence my question. I guess this is precisely what the Content-Type header is for, as stated here. But there seems to be a more radical distinction between JSON and multipart/form-data than between JSON and XML.

When doing REST design my inclination is to stick with:
Unified methods (PUT is PUT regardless of Content-Type, POST is always POST)
I think that Content-Type is ultimately the correct differentiator, but if you'd prefer to embed it in the URL, I'd go with a content suffix of some sort.
.json for application/json and .file for form-data?
Once you move afield of Content-Type as the differentiator, it's sort of wibbly-wobbly and free-form at that point.
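For what it's worth, single-endpoint dispatch on Content-Type can be sketched in a few lines. This is a minimal illustration using only Python's standard library, assuming the JSON body's "text" field and a single uploaded file per request, as described in the question; the `extract_text` helper name is mine:

```python
import json
from email.parser import BytesParser
from email.policy import default

def extract_text(content_type: str, body: bytes) -> str:
    """One endpoint, two representations: dispatch on the Content-Type header."""
    if content_type.startswith("application/json"):
        # JSON submissions carry the text in a "text" field.
        return json.loads(body)["text"]
    if content_type.startswith("multipart/form-data"):
        # Re-attach the header line so the stdlib MIME parser can split the
        # parts; the uploaded file is raw text, so return the first part.
        raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
        msg = BytesParser(policy=default).parsebytes(raw)
        for part in msg.iter_parts():
            return part.get_content()
    raise ValueError("415 Unsupported Media Type: " + content_type)
```

A real framework would do the header parsing for you, but the design point stands: one resource, with the representation negotiated via Content-Type.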

Need feedback on the quality of REST URL

For getting the latest valid address (of the logged in user), how RESTful is the following URL?
GET /addresses/valid/latest
Probably
GET /addresses?valid=true&limit=1
is the best, but it should then return a list. And I'd like to return an object rather than a list.
Any other suggestions?
Your URL structure doesn't have much to do with how RESTful something is.
So let's consider which one is the 'best'. That's also a bit hard to say; it's pretty subjective.
I would generally avoid a pattern like /addresses/valid/latest. This kind of suggests that there is a 'latest' resource in the 'valid' collection, in the 'addresses' collection.
So I like your other suggestion a bit better, because it suggests that you're using an 'addresses' collection, filtering by valid items and only showing 1.
If you don't want all kinds of parameters, I would be more inclined to find a URL pattern that's not literally 'addresses, but only the valid, but only the latest', but to think about what the purpose of the endpoint is. Maybe something that's easier to remember, like /fresh-address =)
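A sketch of what the query-parameter variant might look like server-side; the data shape, field names, and `get_addresses` helper are made up for illustration:

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory data; each address has a validity flag and a timestamp.
ADDRESSES = [
    {"street": "1 Old Rd", "valid": False, "created": 1},
    {"street": "2 Main St", "valid": True, "created": 2},
    {"street": "3 New Ave", "valid": True, "created": 3},
]

def get_addresses(url: str) -> list:
    """Handle GET /addresses?valid=true&limit=1: filter, sort newest-first, truncate.

    Note it still returns a list even when limit=1, keeping collection semantics.
    """
    params = parse_qs(urlparse(url).query)
    results = sorted(ADDRESSES, key=lambda a: a["created"], reverse=True)
    if params.get("valid") == ["true"]:
        results = [a for a in results if a["valid"]]
    if "limit" in params:
        results = results[: int(params["limit"][0])]
    return results
```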
how RESTful is the following URL?
Any identifier that satisfies the production rules described by RFC 3986 is RESTful.
General purpose components are not supposed to derive semantics from identifiers, they are opaque. Which means that the server is free to encode information into those identifiers at its own discretion.
Consider Google search: does your browser care what URI is used as the target of the search form? Does your browser care about the href provided by Google with each search result? In both cases, the browser just does what it is told, which is to say it creates an HTTP request based on the representation of application state that was provided by the server.
URI are in the same broad category as variable names in a programming language - the machines don't care so long as the spellings are consistent with some simple constraints. People care, so there are some benefits to having a locally consistent and logical scheme.
But there are contexts in which easily guessed URI are not what you want. See Mark Seemann 2013.
Since the semantic content of the URI is reserved for use by the server only, it follows that the server can choose to encode that information into path segments or the query part. Or both.
Spellings that can be described by a URI Template can be very powerful. The most familiar URI template is probably an HTML form using the GET method, which encodes key value pairs onto the query part of the URI; so you should think about whether that's a use case you want to support.
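The HTML-form behaviour described above (key/value pairs encoded onto the query part of the URI) is easy to reproduce; a quick sketch with Python's urllib, using the address-filtering parameters from earlier in this thread as the example:

```python
from urllib.parse import urlencode, parse_qs, urlparse

# An HTML GET form encodes its key/value pairs onto the query part of the
# form's action URI; urlencode performs the same transformation.
params = {"valid": "true", "limit": "1"}
uri = "/addresses?" + urlencode(params)

# The client treats the identifier as opaque, but the server is free to
# decode the information it chose to put there.
decoded = parse_qs(urlparse(uri).query)
```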

Why is a JWT split into three dot-delimited parts?

A JSON Web Token (JWT) is split into three Base64-encoded parts, separated by periods ("."). The first two parts encode JSON objects: the first is a header detailing the signature and hashing algorithm, and the second contains the assertions. The third is binary data, the signature itself.
My question is: why is the JSON Web Token split into three separate parts like this? It seems like it would have made parsing them a lot easier to have encoded them as a single JSON object, like so (the example below is incomplete for brevity's sake):
{
    "header": {
        "alg": "rsa"
    },
    "assertions": {
        "iss": "2019-10-09T12:34:56Z"
    },
    "sig": "qoewrhgoqiethgio3n5h325ijh3=="
}
Stated differently: why didn't the designers of JWT just put all parts of the JWT in a single JSON object like shown above?
IMHO, it would cause more issues. Yes, you could parse it nicely, but what about verification of the signature?
The structure of a JWT is <B64 String>.<B64 String>.<B64 String>. The signature is basically the first two parts signed. It is unlikely that this structure will be modified by various frameworks in any way.
Now consider JSON: during serialisation and deserialisation the order of elements may change. The objects {"a":1,"b":2} and {"b":2,"a":1} might be equal in JavaScript, but if you stringify them, they will generate different signatures.
Also, to check the signature you would need to decide upon a standard form of JSON to be used when generating the signature (for instance, beautified or minified). Again, different choices will generate different signatures.
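The order-sensitivity is easy to demonstrate; a short sketch (the canonical form chosen here, sorted keys and no insignificant whitespace, is just one possible convention, not something JWT mandates):

```python
import json

def canonical(obj) -> str:
    """One agreed-upon byte form: sorted keys, no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

# Semantically equal objects, but insertion order survives naive serialisation:
a = {"a": 1, "b": 2}
b = {"b": 2, "a": 1}

naive_differs = json.dumps(a) != json.dumps(b)      # different byte strings
canonical_matches = canonical(a) == canonical(b)    # identical once canonicalised
```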
As a result, there are more hassles than benefits to simply using JSON.
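By contrast, signing the already-encoded, dot-joined strings sidesteps canonicalisation entirely, because the verifier re-signs the exact bytes it received. A toy HS256-style sketch (this is not a real JWT library, and the helper names are mine):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWTs use unpadded base64url; "." never appears in its alphabet.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def make_token(payload: dict, key: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    # The signature covers the two opaque base64 strings, not the JSON itself.
    signing_input = f"{header}.{body}".encode("ascii")
    sig = b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify(token: str, key: bytes) -> bool:
    # No JSON re-serialisation here, so key order and whitespace cannot matter.
    header, body, sig = token.split(".")
    expected = b64url(hmac.new(key, f"{header}.{body}".encode("ascii"),
                               hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)
```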
While I'm not speaking for the people who designed the JWT, I can think of one major reason why your suggestion won't fly:
Headers don't allow newlines
Remember that a primary use-case for a JWT is to use it as a cookie value. Cookies get passed up in headers. Header values don't support newlines: each header key/value pair needs to fit on one line.
Therefore, arbitrary JSON will not work for something that is meant to be passed as a header value in an HTTP request.
Therefore, some sort of encoding is required, which is why base64 is used in the first place. The reason base64 shows up so often is that it converts any blob or string into something that can be reliably transported as plain ASCII in almost any circumstances. I.e., three base64-encoded "payloads" separated by periods (a character that isn't valid in base64 encoding) are pretty much guaranteed to transport safely and without mangling between just about any systems.
JSON cannot make the same guarantees. Of course, you could remove the newlines (JSON ignores whitespace anyway), but quotes are still a problem: they should be encoded in URLs, and maybe not in other circumstances, although they would probably be okay in HTTP headers. As a result it just becomes one more "gotcha" as a developer tries to implement it for the first time.
I'm sure there are other reasons too: this isn't meant to be a comprehensive list.
The signature cannot be part of what is signed, therefore it has to be separate.
The header and payload could be combined into one JSON object, but it would be bad design. It is the job of your JWT library to check the headers and verify the signature. That can (and should) be done without concern for the payload. It is the job of your application to react to the payload. As long as the signature checks out, that can be done without concern for the headers.
Separate concerns, separate objects.

RESTful compliant by using only POST

Until now I have been using only the POST method to create web services (not websites). I thought it was more secure and the better way, if it is not for a website, because the parameters are not stored in web server logs when sensitive data is sent.
Now I'm not sure if this is RESTful compliant and the best way. My current definition is something like
POST https://{url}/order/getOrder
Content-Type: application/json
{
"orderId": "42"
}
Normally a GET request would be
GET https://{url}/order/42
or
GET https://{url}/order/getOrder
Content-Type = application/json
{ "orderId" : 42 }
My question is whether all the examples are RESTful compliant, or only the last two.
best regards
jd
Now I'm not sure if this is RESTful compliant and the best way
Technically, I suppose not. Cache constraints are a first-class concern in the REST architectural style:
Cache constraints require that the data within a response to a request be implicitly or explicitly labeled as cacheable or non-cacheable.
In the HTTP specification, POST is explicitly listed as a cacheable method. However, the cache invalidation rules require that a non-error response to a POST request invalidate previously cached entries for the effective request URI.
So for queries, which would normally be safe operations anyway, you should be using GET or HEAD.
Thus, of the options that you've listed, the "REST compliant" approach would be
GET https://{url}/order/42
Your third alternative fails for a different reason:
A payload within a GET request message has no defined semantics
This is primarily, I would argue, because it is difficult for caches to do sensible things when the payload needs to be considered part of the cache key.
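The invalidation rule can be illustrated with a toy cache keyed by the request URI. This is a deliberately simplified model for illustration, not a real HTTP cache (no freshness, headers, or Vary handling):

```python
class ToyHttpCache:
    """Toy sketch of the invalidation rule: a non-error response to a POST
    evicts whatever was cached for that effective request URI."""

    def __init__(self):
        self._store = {}

    def handle(self, method: str, uri: str, origin):
        if method in ("GET", "HEAD"):
            if uri in self._store:
                return self._store[uri]           # cache hit, origin not contacted
            response = origin(method, uri)
            self._store[uri] = response           # safe methods populate the cache
            return response
        response = origin(method, uri)
        if response["status"] < 400:              # non-error: invalidate the URI
            self._store.pop(uri, None)
        return response
```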

How do I mark SOAP service 'MTOM enabled'

This is not a Java-specific question, but let's have an example in Java: it is a standard practice in the Java world to add xmime:expectedContentTypes="*/*" to base64 elements to enable MTOM processing on the server side. It results in the @XmlMimeType annotation, use of DataHandlers instead of byte arrays, etc. While this description is of course greatly simplified, xmime:expectedContentTypes="*/*" is usually recognized as 'MTOM ready' by developers (and, more importantly, by the implementing libraries) when seen in the schema. From what I've gathered from the examples, the situation is the same in the C# world.
It does, however, make no sense to me: the attribute specifies what kind of data we might actually expect in the XML, not that it can be used together with MTOM. I have also not found any direct connection between expected content type and MTOM in any RFC or similar document for SOAP 1.1.
My question can be phrased in two ways:
How does the service make clear that it accepts / serves binary data as MTOM attachments in the request / response?
How does the client correctly recognize that the binary data can be sent / obtained by using MTOM attachments for the given service?
It seems you are slightly confused between attachments, SOAP with Attachments, and MTOM.
SOAP with Attachments was first introduced in December 2000 as a W3C Note (not a specification) and defined an extension to the transport binding mechanisms defined in SOAP 1.1. In particular, this Note defined:
a binding for a SOAP 1.1 message to be carried within a MIME multipart/related message in such a way that the processing rules for the SOAP 1.1 message are preserved. The MIME multipart mechanism for encapsulation of compound documents can be used to bundle entities related to the SOAP 1.1 message such as attachments.
In simple terms, it defined a mechanism for multiple documents (attachments) to be associated with SOAP message in their native formats using a multipart mime structure for transport. This was achieved using a combination of "Content-Location" and "Content-ID" headers along with a set of rules for interpreting the URI that was referred to by "Content-Location" headers.
A SOAP message in this format is encapsulated as a multipart/MIME message.
This is also the format that you might have worked with when you used SAAJ, but is not recommended anymore, unless you are working with legacy code. The W3C note was later revised to a "feature" level in 2004 (along with SOAP 1.2) and was eventually superseded by SOAP MTOM mechanism.
SOAP Message Transmission Optimization Mechanism (MTOM) is officially defined as not one, but three separate features that work together to deliver the functionality:
"Abstract SOAP Transmission Optimization Feature" describes an abstract feature for optimizing the transmission and/or wire format of the SOAP message by selectively encoding portions of the message, while still presenting an XML infoset to the SOAP application.
"An optimized MIME Multipart/Related serialization of SOAP Messages" describes an Optimized MIME Multipart/Related Serialization of SOAP Messages implementing the Abstract SOAP Transmission Optimization Feature in a binding independent way.
"HTTP SOAP Transmission Optimization Feature" describes an implementation of the Abstract Transport Optimization Feature for the SOAP 1.2 HTTP binding.
If you read the second document, you will realize that "attachments" has been replaced with XML binary optimized "packages" or XOP.
A XOP package is created by placing a serialization of the XML Infoset inside of an extensible packaging format (such as a MIME Multipart/Related package, see [RFC 2387]). Then, selected portions of its content that are base64-encoded binary data are extracted and re-encoded (i.e., the data is decoded from base64) and placed into the package. The locations of those selected portions are marked in the XML with a special element that links to the packaged data using URIs.
In simple terms, this means that instead of encapsulating the data as an "attachment" in a multipart/MIME message, the data is now referred to by a "pointer", or link.
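One concrete motivation for extracting base64 content out of the XML is size: base64 maps every 3 payload bytes to 4 ASCII characters, roughly a one-third inflation, which XOP avoids by shipping the raw bytes in a separate MIME part and leaving only a link behind. A quick standard-library check of that ratio:

```python
import base64

# Base64 encodes every 3 payload bytes as 4 ASCII characters, so inlining
# binary data in XML inflates it by roughly a third.
payload = bytes(300)                       # 300 arbitrary binary bytes
encoded = base64.b64encode(payload)        # 400 ASCII characters

inflation = len(encoded) / len(payload)    # 400 / 300, i.e. about 1.33x
```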
Now that we have the background, let us come back to your questions.
How does the service make clear that it accepts / serves binary data as MTOM attachments in the request / response?
It does not. There is no concept of an attachment with MTOM, and thus a server can't declare that it accepts attachments.
How does the client correctly recognize that the binary data can be sent / obtained by using MTOM attachments for the given service?
Like I said above, there is no way for a client to do this as "attachments" are not supported.
Having said that, there is yet another W3C spec on XML media types that states:
The xmime:contentType attribute information item allows Web services applications to optimize the handling of the binary data defined by a binary element information item and should be considered as meta-data. The presence of the xmime:contentType attribute does not change the value of the element content.
When you enable MTOM using xmime:contentType and xmime:expectedContentTypes="application/octet-stream" ("*/*" should not be used), the generated WSDL will have an entry like this:
<element name="myImage" type="xsd:base64Binary" xmime:expectedContentTypes="application/octet-stream"/>
This is the server's way of declaring that it can receive an XML-binary optimized package (which can be broken down into a multipart MIME message).
When the client sees the above, it knows the server can accept XML-binary optimized packages and generates appropriate HTTP requests, as defined in "Identifying XOP Documents":
XOP Documents, when used in MIME-like systems, are identified with the "application/xop+xml" media type, with the required "type" parameter conveying the original XML serialization's associated content type.
Hope that helps!

Is there a way to get serializer or content-type from inside Catalyst::Controller::REST class?

Generally, it seems that the way Catalyst::Controller::REST works is that you put a reference into "entity" and then Catalyst::Action::Serialize picks a content type and a serializer after you're done.
In my case, I may be dealing with very large data and I can't hold the entire thing in memory at once (it's coming from a different server and I'm reformatting and returning it). If I knew what content-type Serialize was going to choose, I could transform the incoming data and write it to a file as it comes in and then serve it back out from disk. Is there any way for me to find out what content-type I'm being asked for beyond copying the code in Catalyst::Action::SerializeBase?
The workaround would be to say "I don't care what you asked for, here's your JSON", but it'd be nice to actually provide what's requested. :)