Handle the cases when users input all one-of optional paramaters in API response - rest

An API endpoint I wrote supports two one-of parameters, and users are requested to supply values for one of them only, but ideally, they should not specify both.
{"A": "something", "B": "something"}
When users do not supply any of the two, an exception will be thrown.
However, I'm wondering how I should handle the scenarios when users put in values for both.
For the context, A is, loosely speaking, a subset of B. There's two opinions with my teammate:
When users input both, A shall prevail.
When users input both, a 400 exception is thrown to remind the users that we only need one of the two.
Thanks!

However, I'm wondering how I should handle the scenarios when users put in values for both.
You could look to media types as an analog.
That is, suppose we had a specification for application/vnd.example.AorB+json, which defined the A field, the B field, and explicitly expressed that exactly one of those two options should be present.
Now when our endpoint gets a POST, or a PUT, with content-type application/vnd.example.AorB+json, we know the entity included in the body of the request should be a json document with additional constraints on the keys and values.
So by analogy, if the keys and values are wrong, our endpoint should reject the document in the same way that it would if the JSON structure itself were not well formed.
If you buy that logic, then the practical answer is to deploy an endpoint that expects json, send it something broken, and see what response you get back. Then model your own design on that.
The WebDAV standard has an interesting discussion related to its definition of the 422 status code.
The 422 (Unprocessable Entity) status code means the server understands the content type of the request entity (hence a 415(Unsupported Media Type) status code is inappropriate), and the syntax of the request entity is correct (thus a 400 (Bad Request) status code is inappropriate) but was unable to process the contained instructions.
So you might look at your situation, and try to decide whether including both elements is "syntactically correct" or not, and choose the matching status code.
In practice, it really doesn't matter very much. You SHOULD be explaining the specific problem in the body of the response, so human beings can figure it out what happened. The different status codes are really meta data, so that generic components can figure out what to do -- but there isn't anything in the standard that tells generic components to handle a 422 differently from a 400.
So don't over think it.

If it's reasonable/meaningful to expect one property, you should only allow one property.
Strict APIs are helpful because they guide users much better to what the intent of the server and resource is.
One great tool for this kind of stuff, is to use something like json-schema to specify exactly what your server does and does not expect. Being able to specify that only 1 out of 2 properties can appear is one of its features.

Related

REST API: Does validation on identifiers break encapsulation? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 months ago.
Improve this question
I figured I'd post here to get some ideas/feedback on something I've come up against recently. The API I've developed has validation on an identifier that's passed through as a path parameter:
e.g. /resource/resource_identifier
There are some specific business rules as to what makes an idenfier valid and my API has validation which enforces these rules and returns a 400 when that's violated.
Now the reason I'm writing this is that I've been doing this sort of thing in every REST (ish) API I've ever written. It's kind of ingrained in me now but ecently I've been told that this is 'bad' and that it breaks encapsulation. Furthermore, it does this by forcing a consumer to have knowledge about the format of an identifier. I'm told that I should be returning a 404 instead and simply accept anything as an idenfier.
We've had some pretty heated debates about this and what encapsulation actually means in the context of REST. I've found numerous definitions but they aren't specific. As with any REST contention it's hard to substantiate an argument for either.
If StackOverflow would allow me, I'd like to try and gain a concensus on this and why APIs like Spotify for example, use 400 in this scenario.
I've been doing this sort of thing in every REST (ish) API I've ever written. It's kind of ingrained in me now but recently I've been told that this is 'bad'
In the context of HTTP, it is an "anti-pattern", yes.
I'm told that I should be returning a 404 instead
And that is the right pattern when you want the advantages of responding like a general purpose web server.
Here's the point: if you want general purpose components in the HTTP application to be able to do sensible things with your response messages, then you need to provide them with the appropriate meta data.
In the case of a target resource identifier that satisfies the request-target production rules defined in RFC 9112 but is otherwise unsatisfactory; you can choose any response semantics you want (400? 403? 404? 499? 200?).
But if you choose 404, then general purpose components will know that the response is an error that can be re-used for other requests (under appropriate conditions - see RFC 9111).
why APIs like Spotify for example, use 400 in this scenario.
Remember: engineering is about trade offs.
The benefits of caching may not outweigh more cost effective request processing, or more efficient incident analysis, or ....
It's also possible that it's just habit - it's done that way because that's the way that they have always done it; or because they were taught it as a "best practice", or whatever. One of the engineering trade offs we need to consider is whether or not to invest in analyzing a trade off!
An imperfect system that ships earns more market share than a perfect solution that doesn't.
While it may sound natural to expose the resource internal ID as ID used in the URI, remember that the whole URI itself is the identifier of a resource and not only the last bit of the URI. Clients are usually also not interested in the characters that form the URI (or at least they shouldn't care about it) but only in the state they receive upon requesting that from the API/server.
Further, if you think long-term, which should be the reason why you want to build your design on top of a REST architecture, is there a chance that the internal identifier of a resource could ever change? If so, introducing an indirection could make more sense then i.e. by using UUIDs instead of product IDs in the URI and then have a further table/collection to perform a mapping from UUID to domain object ID. Think of a resource that exposes some data of a product. It may sound like a good idea to use the product ID at the end of the URI as they identify the product in your domain model clearly. But what happens if you company undergoes a merge with an other company that happens to have an overlap on product but then uses different identifiers than you? I've seen such cases in reality, unfortunately, and almost all of them wanted to avoid change for their clients and so had to support multiple URIs for the same products in the end.
This is exactly why Mike Amundsen said
... your data model is not your object model is not your resource model ... (Source)
REST is full of such indirection mechanisms to allow such systems to avoid coupling. I.e. besides above mentioned mechanism, you also have link-relations to allow servers to switch URIs when needed while clients can still lookup the URI via the exposed relation name, or its focus on negotiated media types and its representation formats rather than forcing clients to speak their API-specific RPC-like, plain-JSON slang.
Jim Webber further coined the term domain application protocol to describe that HTTP is an application protocol for exchanging documents and any business rules we infer are just side effects of the actual document management performed by HTTP. So all we do in "REST" is basically to send documents back and forth and infer some business logic to act upon receiving certain documents.
In regards to encapsulation, this isn't the scope of REST nor HTTP. What data you return depends on your business needs and/or on the capabilities of the representation formats exchanged. If a certain media-type isn't able to express a certain capability, providing such data to clients might not make much sense.
In general, I'd would recommend not to use domain internal IDs as part of URIs for the above mentioned reasons. Usually that information should be part of the exchanged payload to give users/customers the option to refer to that resources on other channels like e/mail, telephone, ... Of course, that depends on the resource and its purpose at hand. As a user I'd prefer to refer to myself with my full name rather than some internal user- or customer ID or the like.
edit: sorry, missed the validation aspect ...
If you expect user/client input on the server/API side, you should always validate the data before starting to process it. Usually though, URIs are provided by the server and might only trigger business activities if the URI requested matches one of your defined rules. In general, most frameworks will respond with 400 Bad Request responses when they couldn't map the URI to a concrete action, giving the client a chance to correct its mistake and reissue the updated request. As URIs shouldn't be generated or altered by clients anyways, validating such parameters might be unnecessary overhead unless they might introduce security risks. Here it might be a better approach then to toughen-up the mapping rules of URIs to actions then and let those frameworks respond with a 400 message when clients use stuff they aren't supposed to.
Encapsulation makes sense when we want to hide data and implementation behind an interface. Here we want to expose the structure of the data, because it is for communication, not for storage and the service certainly needs this communication in order to function. Validation of data is a very basic concept, because it makes the service reliable and because is protects against hacking attempts. The id here is a parameter and checking its structure is just parameter validation, which should return 400 if failed. So this is not restricted to the body of the request, the problem can be anywhere in the HTTP message as you can read below. Another argument against 404 that the requested resource cannot possibly exist, because we are talking about a malformed id and so a malformed URI. It is very important to validate every user input, because a malformed parameter can be used for injections e.g. for SQL injection if it is not validated.
The HyperText Transfer Protocol (HTTP) 400 Bad Request response status
code indicates that the server cannot or will not process the request
due to something that is perceived to be a client error (for example,
malformed request syntax, invalid request message framing, or
deceptive request routing).
vs
The HTTP 404 Not Found response status code indicates that the server
cannot find the requested resource. Links that lead to a 404 page are
often called broken or dead links and can be subject to link rot.
A 404 status code only indicates that the resource is missing: not
whether the absence is temporary or permanent. If a resource is
permanently removed, use the 410 (Gone) status instead.
In the case of REST we describe the interface using the HTTP protocol, URI standard, MIME types, etc. instead of the actual programming language, because they are language independent standards. As of your specific case it would be nice to check the uniform interface constraints including the HATEOAS constraint, because if your service makes the URIs as it should, then it is clear that a malformed id is something malicious. As of Spotify and other APIs, 99% of them are not REST APIs, maybe REST-ish. Read the Fielding dissertation and standards instead of trying to figure it out based on SO answers and examples. So this a classic RTFM situation.
In the context of REST a very simple example of data hiding is storing a number something like:
PUT /x {"value": "111"} "content-type:application/vnd.example.binary+json"
GET /x "accept:application/vnd.example.decimal+json" -> {"value": 7}
Here we don't expose how we store the data. We just send the binary and decimal representations of it. This is called data hiding. In the case of id it does not make sense to have an external id and convert it to an internal id, it is why you use the same in your database, but it is fine to check if its structure is valid. Normally you validate it and convert it into a DTO.
Implementation hiding is more complicated in this context, it is sort of avoiding micromanagement with the service and rather implement new features if it happens frequently. It might involve consumer surveys about what features they need and checking logs and figuring out why certain consumers send way too many messages and how to merge them into a single one. For example we have a math service:
PUT /x 7
PUT /y 8
PUT /z 9
PUT /s 0
PATCH /s {"add": "x"}
PATCH /s {"add": "y"}
PATCH /s {"add": "z"}
GET /s -> 24
vs
POST /expression {"sum": [7,8,9]} -> 24
If you want to translate between structured programming, OOP and REST, then it is something like this:
Number countCartTotal(CartId cartId);
<=>
interface iCart {
Number countTotal();
}
<=>
GET api/cart/{cartid}/total -> {total}
So an endpoint represents an exposed operation something like verbNoun(details) e.g. countCartTotal(cartId), which you can split into verb=countTotal, noun=cart, details=cartId and build the URI from it. The verb must be transformed into a HTTP method. In this case using GET makes the most sense, because we need data instead of sending data. The rest of the verb must be transformed into a noun, so countTotal -> GET totalCount. Then you can merge the two nouns: totalCount + cart -> cartTotal. Then you can build an URI template based on the resulting noun and the details: cartTotal + cartId -> cart/{cartid}/total and you are done with the endpoint design GET {root}/cart/{cartid}/total. Now you can bind it to the countCartTotal(cartId) or to the repo.resource(iCart, cartId).countTotal().
So I think if the structure of the id does not change, then you can even add it to the API documentation if you want to. Though it is not necessary to do so.
From security perspective you can return 404 if the only possible reason to send such a request is a hacking attempt, so the hacker won't know for certain why it failed and you don't expose details of the protection. In this situation it would be overthinking the problem, but in certain scenarios it makes sense e.g. where the API can leak data. For example when you send a password reset link, then a web application usually asks for an email address and most of them send an error message if it is not registered. This can be used to check if somebody is registered on the site, so better to hide this kind of errors. I guess in your case the id is not something sensitive and if you have proper access control, then even if a hacker knows the id, they cannot do much with that information.
Another possible aspect is something like what if the structure of the id changes. Well we write a different validation code, which allows only the new structure or maybe both structures and make a new version of the API with v2/api and v2/docs root and documentation URIs.
So I fully support your point of view and I think the other developer you mentioned does not even understand OOP and encapsulation, not to mention webservices and REST APIs.

HTTP GET Request Params - which error code? 400 vs 404 vs 422

I've read a lot of questions about 400 vs 422, but they are almost for HTTP POST requests for example this one: 400 vs 422 response to POST of data.
I am still not sure what should I use for GET when the required parameters are sent with wrong values.
Imagine this scenario:
I have the endpoint /searchDeclaration, with the parameter type.
My declarations have 2 types: TypeA and TypeB.
So, I can call this endpoint like this: /searchDeclaration?type=TypeA to get all TypeA declarations.
What error should I send when someone calls the endpoint with an invalid type? For example: /searchDeclaration?type=Type123
Should I send 400? I am not sure it is the best code, because the parameter is right, only the value is not valid.
The 422 code looks like it is more suitable for POST request.
EDIT:
After some responses I have another doubt.
The /searchDeclaration endpoints will return the declaration for the authenticated user. TypeA and TypeB are valid values, but some users don't have submitted a TypeB declaration, so when they call /searchDeclaration?type=TypeB which error should I send? Returning 404 does not seem right because the URI is correct, but that user does not have a declaration for that value yet. Am I overthinking this?
If the URI is wrong, use 404 Not Found.
The 404 (Not Found) status code indicates that the origin server did not find a current representation for the target resource
The target resource is, as one might expect, identified by the target URI.
Aside from the semantics of the response, 404 indicates that the response is cacheable - meaning that general purpose caches will know that they can re-use this response to future requests
I am not sure it is the best code, because the parameter is right, only the value is not valid.
The fact that this URI includes parameters and values is an implementation detail; something that we don't care about at the HTTP level.
Casually: 404 means "the URI is spelled wrong"; but doesn't try to discriminate which part(s) of the URI have errors. That information is something that you can include in the body of the response, as part of the explanation of the error situation.
Am I overthinking this?
No, but I don't think you are thinking about the right things, yet.
One of the reasons that you are finding this challenging is that you have multiple logical resources sharing the same target URI. If each user declaration document had its own unique identifier, then the exercise of choosing the right response semantics would be a lot more straight forward.
Another way of handing it would be to redirect the client to the more specific URI, and then handle the response semantics there in the straight forward way.
It's trying to use a common URI for different logical resources AND respond without requiring an extra round trip that is making it hard. Bad news: this is one of the trade offs that ought to have been considered when designing your resource identifiers; if you don't want this to require harder thinking, don't use this kind of design.
The good news: 404 is still going to be fine - you are dealing with authorized requests, and the rules about sharing responses to authorized requests mean that the only possible confusion will be if different users are sharing the same private cache.
Remember: part of the point is that all resources share a common, general purpose message vocabulary. Everything is supposed to look like a document being served by a boring web server.
The fact that there's a bunch of complexity of meaning behind the resource is an implementation detail that is correctly hidden behind the uniform interface.
There are two options, it all depends on your 'type' variable.
If 'type' is an ENUM, which only allows 'typeA' and 'typeB', and your client sends 'type123', the service will respond with a '400 Bad Request' error, you don't need to check. In my opinion, this should be ideal, since if you need to add new 'type's in the future, you will only have to add them in the ENUM, instead of doing 'if-else' inside your code to check them all.
In case the 'type' variable is a String, the controller will admit a 'type123' and you should be the one to return an error, since the client request is not malformed, but rather it is trying to access a resource that does not exist.
In this case, you can return a 404 Not Found error, (that resource the client is filtering by cannot be found), or a 422 error as you say, since the server understands the request, but is not able to process it.
Let's assume for a moment that the resource you are querying is returning a set of entries that do contain certain properties. If you don't specify a filter you will basically get a (pageable) representation of those entries either as embedded objects or as links to those resources.
Now you want to filter the results based on some properties these entries have. Most programming languages nowadays provide some lambda functionality in the form of
List filteredList = list.filter(item => item.propertyX == ...)...;
The result of such a filter function is usually a list of items that fulfilled the specified conditions. If no items met the given condition then the result will be an empty list.
Applying certain filter conditions on the Web can be designed similarly. Is it really an error when a provided filter expression doesn't yield any entries? IMO it is not an error in terms of the actual message transport itself as the server was able to receive and parse the request without any issues. As such it has to be some kind of business rule that states that only admissible values are allowed as input.
If you or your company consider the case of a provided filter value for a property returning no results as an error or you perform some i.e. XML or JSON schemata validation on the received payload (for complex requests) then we should look at how those mentioned HTTP errors are defined:
The 400 (Bad Request) status code indicates that the server cannot or will not process the request due to something that is perceived to be a client error (e.g., malformed request syntax, invalid request message framing, or deceptive request routing). (Source: RFC 7230)
Here, it is clearly the case that you don't want to process requests that come with an invalid property value and as such decline the request as such.
The 422 (Unprocessable Entity) status code means the server understands the content type of the request entity (hence a 415(Unsupported Media Type) status code is inappropriate), and the syntax of the request entity is correct (thus a 400 (Bad Request) status code is inappropriate) but was unable to process the contained instructions. For example, this error condition may occur if an XML request body contains well-formed (i.e., syntactically correct), but semantically erroneous, XML instructions. (Source: RFC 4918 (WebDAV))
In this case you basically say that the payload was actually syntactically correct but failed on a semmantical level.
Note that 422 Unprocessable Entity stems from WebDAV while 400 Bad Request is defined in the HTTP specification. This can have some impact if your API serves arbitrary HTTP clients. Ones that only know and support the HTTP error codes defined in the HTTP sepcification won't be able to really determine the semantics of the 422 response. They will still consider it as a user error, but won't be able to provide the client with any more help on that issue. As such, if your API needs to be as generic as possible, stick to 400 Bad Request. If you are sure all clients support 422 Unprocessable Entity go for that one.
General improvement hints
As you tagged your question with rest, let's see how we can improve this case.
REST is an architectural style with an intention of decoupling clients from servers to make the former one more failure tollerant while allowing the latter one to evolve freely over time. Servers in such architectures should therefore provide clients with all the things clients need to make valid requests. To avoid having clients to know upfront what a server expects as input, servers usually provide some kind of input mask clients can use to fill in stuff the server needs.
On the browsable Web this is usually accomplished by HTML Forms. The form not only teaches your client where to send the request to, which HTTP operation to use and which representation format the request should actually use (usually given implicitly as application/x-www-form-urlencoded) but also the sturcture and properties the server supports.
In HTML forms it is rather easy for the server to restrict the input choices of a client by using something along the lines of
<form action="/target">
<label for="cars">Choose a car:</label>
<select name="cars" id="cars">
<option value="volvo">Volvo</option>
<option value="saab">Saab</option>
<option value="opel">Opel</option>
<option value="audi">Audi</option>
</select>
<br/>
<input type="submit" value="Submit">
</form>
This doesn't really remove the needs to check and verify the correctness of the request on the server side, tough you make it much easier for the client to actually perform a valid request.
Unfortunately, HTML forms itself have their limits. I.e. they only allow POST and GET requests to be issues. While encType defaults to application/x-www-form-urlencoded, if you want to transfer files you should use multipart/form-data. Other than that, any valid content-type should be admissible.
If you prefer JSON-based payloads over HTML you might want to look into JSON Forms, HAL forms, Ion Forms among others.
Note though that you should adhere to the content type negotiation principles. Most often proactive content type negotiation is performed where a client sends its preferences within the Accept header and the server will select the best match somehow and return either the resource mapped to that representation format or respond with a 406 Not Acceptable response. While the standard doesn't prevent returning a default representation in such caes, it bears the danger that clients won't be able to process such responses then. A better alternative here would be to fall back to reactive negotiation where the server responds with a 300 Muliple Choice response where a client has to select one of the provided alternatives and then send a GET request to the selected alternative URIs to retrieve the content in the payload may be able to process.
If you want to provide a simple link a client can use to retrieve filtered results, the server should provide the client already with the full URI as well as a link relation name and/or extension relation type that the client can use to lookup the URI to retrieve the content for if interested in.
Both, forms and link-relation support, fall under the HATEOAS umbrella as they help to remove the need for any external documentation such as OpenAPI or Swagger documentation.
To sum things up, I would rethink whether a provided property value that does not exist should really end up as a business failure. I think returning an empty list is just fine here as you clearly state that way that for the given criterias no result was obtainable. If you though still want to stick to a business error check what clients actually make use of your API. If they support 422 go for that one. If you don't know, better stick to 400 as it should be understood by all HTTP clients equally.
In oder to remove the likelihood of ending up with requests that issue invalid property values, use forms to teach clients how requests should look like. Through certain elements or properties you can already teach a client that only a limited set of choices is valid for a certain property. Instead of a form you could also provide dedicated links a client can just use to obtain the filtered result. Just make sure to issue those links with meaningful link relatin names then.

Should model of a request for a PUT method be the same class with the model of the response of the GET method?

In my app, the backend will use an endpoint to get and update data (with GET and PUT method). The exact same JSON schema is used for both operations. The question is, should I also use the same model for the update request body and the get response body for this or not? Can you tell me the pros and cons of either separating and combining them?
Thanks.
Should model of a request for a PUT method be the same class with the model of the response of the GET method?
See RFC 7231:
A successful PUT of a given representation would suggest that a subsequent GET on that same target resource will result in an equivalent representation being sent in a 200 (OK) response.
It is normal that the PUT request will have the same Media Type as a successful GET response for that target-uri.
HTTP defines the semantics of the messages, but doesn't constrain the implementation (see Fielding 2002).
That said, if GET and PUT are using representations that change for the same reason, then a common implementation is reasonable, so it will likely be easier to maintain your code when those two paths share the same underlying model.
what about the Single Responsibility Principle if it's the same class?
My answer is that the Single Responsibility Principle doesn't actually make your code better in this case, so you don't use it. There are different ways you can try to generalize that idea; one is that you are working with a data structure, not an "object". Another is that this information is crossing a boundary, and at the boundaries, applications are not object-oriented.
Expressing that idea somewhat differently: what you have here is a single responsibility (manage some in memory representation of the message), but two different roles; an outbound role for the GET scenario (where you need to be able to convert the message into bytes, describe the content-type, and so on), and an inbound role for the PUT scenario (where you need to extract units of information from the data structure).

Correct HTTP request method for nullipotent action that takes binary data

Consider a web API method that has no side effects, but which takes binary data as a parameter. An example would be a method that tells the user whether or not their image is photoshopped, but does not permanently store the image or the result on its servers.
Should such a method be a GET or a POST?
GET doesn't seem to have a recommended way of sending data outside of URL parameters, but the behavior of the method implies a GET, which according to the HTTP spec is for safe, stateless responses. This becomes particularly constraining under the semantics of REST, which imply that POST methods create a new object on the server.
This becomes particularly constraining under the semantics of REST, which imply that POST methods create a new object on the server.
While a POST request means that the entity sent will be treated "as a new subordinate of the resource identified by the Request-URI", there is no requirement that this result in the creation of a new permanent object or that any such new object be identified by a URI (so no new object as far as the client knows). An object can be transient, representing the results of e.g. "Providing a block of data, such as the result of submitting a form, to a data-handling process" and not persisting after the entity representing that object has been sent.
While this means that a POST can create a new resource, and is certainly the best way to do so when it is the server that will give that new resource its URI (with PUT being the more appropriate method when the client dictates the new URI) it is can also be used for cases that delete objects (though again if it's a deletion of a single* resource identifiable by a URI then DELETE is far more appropriate), both create and delete objects, change multiple objects, it can mean that your kitchen light turns on but that the response is the same whether that worked or failed because the communication from the webserver to the kitchen light doesn't allow for feedback about success. It really can do anything at all.
But, your instincts are good in wanting this to be a GET: While the looseness of POST means we can make a case for it for just about every request (as done by approaches that use HTTP for an RPC-like protocol, essentially treating HTTP as if it was a transport protocol), this is inelegant in theory, inefficient in practice and clumsy in definition. You have an idempotent function that depends on solely on what the client is interested in, and that maps most obviously the GET in a few ways.
If we could fit everything in a URI then GET would be easy. E.g we can define a simple integer addition with something like http://example.net/addInts?x=1;y=2 representing the addition of 71 and 2 and hence being a permanent immutable resource representing the number 3 (since the results of GET can vary with changes to a resource over time, but this resource never changes) and then use a mechanism like HTML's <form> or javascript to allow the server to inform the client as to how to construct the URIs for other numbers (to maintain the HATEOS and/or COD constraints). Simples!
Your problem here is that you do not have input that can be represented as concisely as the numbers 1 and 2 can above. In theory you could do something like http://example.net/photoshoppedCheck?image=data:image/png;base64,iVBORw0KGgoAAAANSU… and hence create a URI that represents the resource of the results of the check. This URI will though will have 4 characters for every 3 bytes in the image. While there's no absolute limit on URI length both the theory and the practice allow this to fail (in theory HTTP allows for proxies and servers to set a limit on URI length, and in practice they do).
An argument could be made for using GET and sending a request body the same way as you would with a POST, and some webservers will even allow you to do this. However, GET is defined as returning an entity describing the resource identified in the URI with headers restricting how that entity does that describing: Since the request body is not part of that definition it must be ignored by your code! If you were tempted to bend this rule then you must consider that:
Some webservers will refuse the request or strip the body, so you may not be able to.
If your webserver does allow it, the fact that its not specified means you can't be sure an upgrade won't "fix" this and so break your code.
Some proxies will refuse or strip the request.
Some client libraries will most certainly refuse to allow developers to send a request body along with a GET.
So it's a no-no in both theory and practice.
About the only other approach we could do apart from POST is to have a URI that we consider as representing an image that was not photoshopped. Hence if you GET that you get an entity describing the image (obviously it could be the actual image, though it could also be something else if we stretch the concept of content-negotiation) and then PUT will check the image and if its deemed to not be photoshopped it responds with the same image and a 200 or just a 204 while if it is deemed to be photoshopped it responds with a 400 because we've tried to PUT a photoshopped image as a resource that can only be a non-photoshopped image. Because we respond immediately, there's no race-condition with simultaneous requests.
Frankly, this would be darn-right horrible. While I think I've made a case for it by the letter of the specs, it's just nasty: REST is meant to help us design clear APIs, not obtuse APIs we can offer a too-clever-for-its-own-good justification of.
No, in all the way to go here is to POST the image to a fixed URI which then returns a simple entity describing the analysis.
It's perfectly justifiable as REST (the POST creates a transient object based on that image, and then responds with an entity describing that object, and then that object disappears again). It's straight-forward. It's about as efficient as it could be (we can't do any HTTP caching† but most of the network delay is going to be on the upload rather than the download anyway). It also fits into the general use-case of "process something" that POST was first invented for. (Remember that first there was HTTP, then REST described why it worked so well, and then HTTP was refined to better play to those strengths).
In all, while the classic mistake that moves a web application away from REST is to abuse POST into doing absolutely everything when GET, PUT and DELETE (and perhaps the WebDAV methods) would be superior, don't be afraid to use its power when those don't meet the bill, and don't think that the "new subordinate of the resource" has to mean a full long-lived resource.
*Note that a "single" resource here might be composed of several resources that may have their own URIs, so it can be easy to have a single DELETE that deletes multiple objects, but if deleting X deletes A, B & C then it better be obvious that you can't have A, B or C if you don't have X or your API is not going to be understandable. Generally this comes down to what is being modelled, and how obvious it is that one thing depends on another.
†Strictly speaking we can, as we're allowed to send cache headers indicating that sending an identical entity to the same URI will have the same results, but there's no general-purpose web-software that will do this and your custom client can just "remember" the opinion about a given image itself anyway.
It's a difficult one. Like with many other scenarios there is no absolutely correct way of doing it. You have to try to interpret RESTful principles in terms of the limitations of the semantics of HTTP. (Incidentally, I don't think it's right to think of REST having semantics, REST is an architectural style which is commonly used with HTTP services, but can be used for any type of interface.)
I've faced a similar situation in my current project. We chose to use a POST but with the response code being a 200 (OK) rather than the 201 (Resource Created) usually returned by RESTful Web APIs.

Proper RESTful way to handle a request that is not really creating or getting something?

I am writing a little app that does one thing only: takes some user-provided data, does some analysis on it, and returns a "tag" for that data. I am thinking that the client should either GET or POST their request to /getTag in order to get a response back.
Nothing is stored on the server when the client does this, so it feels weird to use a POST. However, there is not a uniform URI for the analysis either, so using a GET feels weird, since it will return different things depending on what data is provided.
What is the best way to represent this functionality with REST?
The "best way" is to do whatever is most appropriate for your application and its needs. Not knowing that, here are a few ideas:
GET is the most appropriate verb since you're not creating or storing anything on the server, just retrieving something that the server provides.
Don't put the word get in the URI as you've suggested. Verbs like that are already provided by HTTP, so just use /tag and GET it instead.
You should use a well-understood (or "cool") URI for this resource and pass the data as query parameters. I wouldn't worry about it feeling weird (see this question's answers to find out why).
To sum up, just GET on /tag?foo=bar&beef=dead, and you're done.
POST can represent performing an action. The action doesn't have to be a database action.
What you have really created is a Remote Procedure. RPC is usually all POST. I don't think this is a good fit for REST, but that doesn't have to stop you from using simple URLs and JSON.
It seems to me like there would probably be a reason you or the user who generated the original data would want the generated tag to persist, wouldn't they?
If that's a possibility, then I'd write it as POST /tags and pass the /tags/:id resource URI back as a Location: header.
If I really didn't care about persisting the generated tag, I'd think about what the "user-generated data" was and how much processing is happening behind the scenes. If the "tag" is different enough from whatever data is being passed into the system, GET /tag might be really confusing for an API consumer.
I'll second Brian's answer: use a GET. If the same input parameters return the same output, and you're not really creating anything, it's an idempotent action and thus perfectly suited for a GET.
You can use GET and POST either:
GET /tag?data="..." -> 200, tag
The GET method means retrieve whatever information (in the form of an
entity) is identified by the Request-URI. If the Request-URI refers to
a data-producing process, it is the produced data which shall be
returned as the entity in the response and not the source text of the
process, unless that text happens to be the output of the process.
POST /tag {data: "..."} -> 200, tag
The action performed by the POST method might not result in a resource
that can be identified by a URI. In this case, either 200 (OK) or 204
(No Content) is the appropriate response status, depending on whether
or not the response includes an entity that describes the result.
according to the HTTP standard / method definitions section.
I would use GET if I were you (and POST only if you want to send files).