ESAPI for email address "blabla#example.com" - rest

I saw some related questions. But I was not getting what exactly I was looking for. Sorry, if this turns out to be a silly request. Hopefully, I am having this specific query:
So I am trying to make a ReST API with MySQL database.
I am trying to read data from a table which is basically pulling out the valid email addresses of the users.
The output is going to be displayed on a HTML page.
temp = blabla#example.com
temp = ESAPI.encoder().canonicalize(temp);
temp = ESAPI.encoder().encodeForHTML(temp);
OUTPUT: temp = blabla#gmail.com
How can I avoid this from happening? and get blabla#email.com
I think the behavior here is as expected. But I just wanted to know if there is a work around other can Conditional Handling (if..else)
Also, what if someone can point me to the reasoning behind some of design choices for ESAPI. I t should be interesting read.

SHORT ANSWER:
If you can truly trust what's coming from your database, you don't need to perform canonicalize. If you know your data isn't going to be used by a browser, don't encode for HTML. If however you suspect your data will be used by a browser, encode it, have the caller deal with the results. If that's deemed unacceptable, expose an "unsafe" version of your webservice, one whose URL will explicitly use warning words to flag as "potentially malicious," forcing your caller to be aware that they're engaging in unsafe activity.
LONG ANSWER:
Well first, according to your use-case, you're essentially providing data to a calling client. My first instinct upon reading your question is that I don't think you're comfortable with your data contexts.
So, typically you're going to see a call to canonicalize() when you need safe data to perform validation against. So, the first questions to ask are these:
q1: Can I trust the data coming from my database?
Guidelines for q1: If the data is appropriately validated and neutralized, say by using a call to ESAPI.validator().getValidInput( args ); by the process that stores the data, then the application will store a safe email string into the database. If you can provably trust your input data at this point, it should be completely safe for you to not canonicalize your output as you're doing here.
If however, you cannot trust the data at this point, then you're in a scenario where before you pass along data to a downstream system, you'll need to validate it. A call to ESAPI.validator().getValidInput( args ); will BOTH canonicalize the input and ensure that its a valid email address. However this comes with the baggage that your caller is going to have to properly transform the neutralized input, which according to your question is what you want to avoid.
If you want to send safe data downstream, and you cannot defensibly trust your data source, you have no choice but to send safe data to your caller and have them work with it on their end--except perhaps to expose an unsafe method, which I will discuss shortly.
q2: Will browsers be used to consume my data?
Guidelines for q2: the encoder.encodeForHTML() method is designed to neutralize browser interpretation. Since you're talking about RESTful web services, I don't understand why you think you need to use it, because a browser should correctly interpret blabla#gmail.com to the correct canonical form--unless perhaps its being correctly trapped as a data element, such as in a dropdown box. But this is something I'm guessing you have NO control over?
As you can now tell, there are no fast answers to questions like this. You have to have some idea of how the data will be used by your caller. Since you have the possibility of having your data treated correctly as data by the browser, and the possibility of the data treated as code, you might be forced to offer a "safe" and "unsafe" call to retrieve your data, assuming that you have no control over how the client uses your service. That puts you in a bad spot, because a lazy caller might simply only ever use the unsafe version. When this happens in my industry, I'll usually make it so that the URL to call for an unsafe function looks something like mywebservice.com/unSafeNonPCICompliantMethod or something similar, so that you force your caller to explicitly accept the risk. If its being used in the correct context on the browser... the unsafe method might actually be safe. You just won't know.

Related

REST Best practise for filtering and knowing the result is singular: List or single?

Variety of REST practises suggest (i.e. 1, 2, 3) to use plurals in your endpoints and the result is always a list of objects, unless it's filtered by a specific value, such as /users/123 Query parameters are used to filter the list, but still result in a list, nevertheless. I want to know if my case should 'abandon' those best practices.
Let's use cars for my example below.
I've got a database full of cars and each one has a BuildNumber ("Id"), but also a model and build year which combination is unique. If I then query for /cars/ and search for a specific model and year, for example /cars?model=golf&year=2018 I know, according to my previous sentence, my retrieve will always contain a single object, never multiple. My result, however, will still be a list, containing just one object, nevertheless.
In such case, what will be the best practise as the above would mean the object have to be extracted from the list, even though a single object could've been returned instead.
Stick to best practises and export a list
Make a second endpoind /car/ and use the query parameters ?model=golf&year=2018, which are primarily used for filtering in a list, and have the result be a single object, as the singular endpoint states
The reason that I'm asking this is simply for the cleanness of the action: I'm 100% sure my GET request will result in single object, but still have to perform actions to extract it from the list. These steps should've been unnecessary. Aside of that, In my case I don't know the unique identifier, so cars/123 for retrieving a specific car isn't an option. I know, however, filters that will result in one object and one specific object altogether. The additional steps simply feel redundant.
1: https://learn.microsoft.com/en-us/azure/architecture/best-practices/api-design
2: https://blog.mwaysolutions.com/2014/06/05/10-best-practices-for-better-restful-api/
3: https://medium.com/hashmapinc/rest-good-practices-for-api-design-881439796dc9
As you've specifically asked for best practices in regards to REST:
REST doesn't care how you specify your URIs or that semantically meaningful tokens are used inside the URI at all. Further, a client should never expect a certain URI to return a certain type but instead rely on content-type negotiation to tell the server all of the capabilities the client supports.
You should furthermore not think of REST in terms of object orientation but more in terms of affordance and statemachines where a client get served every information needed in order to make an educated decision on what to do next.
The best sample to give here is probably to take a close look at the Web and how it's done for HTML pages. How can you filter for a specific car and how it will be presented to you? The same concepts that are used in the Web also apply to REST as both use the same interaction model. In regards to your car sample, the API should initially return some control-structures that teach a client how a request needs to be formed and what options could be filtered for. In HTML this is done via forms. For non-HTML based REST APIs dedicated media-types should be defined that translate the same approach to non-HTML structures. On sending the request to the server, your client would include all of the supported media-types it supports in an Accept HTTP header, which informs the server about the capabilities of the client. Media-types are just human-readable specification on how to process payloads of such types. Such specifications may include hints on type information a link relation might return. In order to gain wide-usage of media-types they should be defined as generic as possible. Instead of defining a media-type specific for a car, which is possible, it probably would be more convenient to use an existing or define a new general data-container format (similar to HTML).
All of the steps mentioned here should help you to design and implement an API that is free to evolve without having to risk to break clients, that furthermore is also scalable and minimizes interoperability concerns.
Unfortunately your question targets something totally different IMO, something more related to RPC. You basically invoke a generic method via HTTP on an endpoint, similar like SOAP, RMI or CORBA work. Whether you respect the semantics of HTTP operations or not is only of sub-interest here. Even if you'd reached level 3 of the Richardson Maturity Model (RMM) it does not mean that you are compliant to REST. Your client might still break if the server changes anything within the response. The RMM further doesn't even consider media-types at all, hence I consider it as rather useless.
However, regardless if you use a (true) REST or RPC/CRUD client, if retrieving single items is your preference instead of feeding them into a collection you should consider to include the URI of the items of interest instead of its data directly into the collection, as Evert also has suggested. While most people seem to be concerned on server performance and round-trip-times, it actually is very elegant in terms of caching. Further certain link-relation names such as prefetch may inform the client that it may fetch the targets payload early as it is highly possible that it's content will be requested next. Through caching a request might not even have to be triggered or sent to the server for processing, which is probably the best performance gain you can achieve.
1) If you use query like cars/where... - use CARS
2) If you whant CAR - make method GetCarById
You might not get a perfect answer to this, because all are going to be a bit subjective and often in a different way.
My general thought about this is that every item in my system will have its own unique url, for example /cars/1234. That case is always singular.
But this specific item might appear as a member in collections and search results. When /cars/1234 apears in these, they will always appear as a list with 1 item (or 0 or more depending on the query).
I feel that this is ultimately the most predictable.
In my case though, if a car appears as a member of a search or colletion, it's 'true url' will still be displayed.

Why use two step approach to deleting multiple items with REST

We all know the 'standard' way of deleting a single item via REST is to send a single DELETE request to a URI example.com/Items/666. Grand, let's move on to deleting many at once. As we do not require atomic deleting (or true transaction, ie all or nothing) we could just tell the client 'tough luck, make many requests' but that's not very nice is it. So we need a way to allow a client to request many 'Items' be deleted at once.
From my understanding, the 'typical' solution to this problem is a 'two step' approach. First the client POSTs a list of item IDs and is returned a URI such as example.com/Items/Collection/1. Once that collection is created, they call DELETE on it.
Now, I see that this works just fine, except to me, it is a bad solution. Firstly, you are forcing the client to make two requests to accommodate the server. Secondly, 'I thought DELETE was supposed to delete an Item?', shouldn't calling DELETE on this URI effectively cancel the transaction (it's not a true transaction though), how would we even cancel it? Really would be better if there was some form 'EXECUTE' action, but I can't rock the boat that much. It also forces the server to have to consider 'the JSON that was POSTed looks more like a request to modify these Items, but the request was DELETE... so I guess I will delete them'. This approach also starts to impose a sort of state on the client/server, not a true state I will admit, but it is sort of.
In my opinion, a better solution would be to simply call DELETE on example.com/Items (or maybe example.com/Items/Collection to imply this is a multiple delete) and pass JSON data containing a list of IDs that you wish to delete. As far as I can see, this basically solves all the problems the first method had. It is easier to use as a client, reduces the work the server has to do, is truly stateless, is more semantic.
I would really appreciate the feed back on this, am I missing something about REST that makes my solution to this problem unrealistic? I would also appreciate links to articles, especially if they compare these two methods; I am aware this is not normally approved of for SO. I need to be able to disprove that only the first method is truly RESTfull, prove that the second approach is a viable solution. Of course, if I am barking up the wrong tree do tell me.
I have spent the last week or so reading a fair bit on REST, and to the best of my understanding, it would be wrong to describe either of these solutions as 'RESTfull', rather you should say that 'neither solution goes against what REST means'.
The short answer is simply that REST, as laid out in Roy Fielding's dissertation (See chapter 5), does not cover the topic of how to go about deleting resources, singular or multiple, in a REST manor. That's right, there is no 'correct RESTful way to delete a resource'... well, not quite.
REST itself does not define how delete a resource, but it does define that what ever protocol you are using (remember that REST is not a protocol) will dictate the how perform these actions. The protocol will usually be HTTP; 'usually' being the key word as Fielding will point out, REST is not synonymous with HTTP.
So we look to HTTP to say which method is 'right'. Sadly, as far as HTTP is concerned, both approaches are viable. Yes 'viable'. HTTP will allow a client to send a POST request with a payload (to create a collection resource), and then call a DELETE method on this new collection to delete the resources; it will also allow you to send the data within the payload of a single DELETE method to delete the list of resources. HTTP is simply the medium by which you send requests to the server, it would be up to the server to respond appropriately. To me, the HTTP protocol seems to be rather open to interpretation in places, but it does seem to lay down fairly clear guide lines for what actions mean, how they should be dealt with and what response should be given; it's just it is a 'you should do this' rather than 'you must do this', but perhaps I am being a little pedantic on the wording.
Some people would argue that the 'two stage' approach cannot possibly be 'REST' as the server has to store a 'state' for the client to perform the second action. This is simply a misunderstanding of some part. It must be understood that neither the client nor the server is storing any 'state' information about the other between the list being POSTed and then subsequently being DELETEd. Yes, the list must have been created before it can deleted, but the server does not remember that it was client alpha that made this list (such an approach would allow the client to simply call 'DELETE' as the next request and the server remembers to use that list, this would not be stateless at all) as such, the client must tell the server to DELETE that specific list, the list it was given a specific URI for. If the client attempted to DELETE a collection list that did not already exist it would simply be told 'the resource can not be found' (the classic 404 error most likely). If you wish to claim that this two step approach does maintain a state, you must also claim that to simply GET an URI requires a state, as the URI must first exist. To claim that there is this 'state' persisting is misunderstanding what 'state' means. And as further 'proof' that such a two stage approach is indeed stateless, you could quite happily have client alpha POST the list and later client beta (without having had any communication with the other client) call DELETE on the list resources.
I think it can stand rather self evident that the second option, of just sending the list in the payload of the DELETE request, is stateless. All the information required to complete the request is stored completely within the one request.
It could be argued though that the DELETE action should only be called on a 'tangible' resource, but in doing so you are blatantly ignoring the REpresentational part of REST; It's in the name! It is the representational aspect that 'permits' URIs such as http://example.com/myService/timeNow, a URI that when 'got' will return, dynamically, the current time, with out having to load some file or read from some database. It is a key concept that the URIs are not mapping directly to some 'tangible' piece of data.
There is however one aspect of that stateless nature that must be questioned. As Fielding describes the 'client-stateless-server' in section 5.1.3, he states:
We next add a constraint to the client-server interaction: communication must
be stateless in nature, as in the client-stateless-server (CSS) style of
Section 3.4.3 (Figure 5-3), such that each request from client to server must
contain all of the information necessary to understand the request, and
cannot take advantage of any stored context on the server. Session state is
therefore kept entirely on the client.
The key part here in my eyes is "cannot take advantage of any stored context on the server". Now I will grant you that 'context' is somewhat open for interpretation. But I find it hard to see how you could consider storing a list (either in memory or on disk) that will be used to give actual useful meaning would not violate this 'rule'. With out this 'list context' the DELETE operation makes no sense. As such, I can only conclude that making use of a two step approach to perform an action such as deleting multiple resources cannot and should not be considered 'RESTfull'.
I also begrudge somewhat the effort that has had to be put into finding arguments either way for this. The Internet at large seems to have become swept up with this idea the the two step approach is the 'RESTfull' way doing such actions, with the reasoning 'it is the RESTfull way to do it'. If you step back for a moment from what everybody else is doing, you will see that either approach requires sending the same list, so it can be ignored from the argument. Both approaches are 'representational' and 'stateless'. The only real difference is that for some reason one approach has decided to require two requests. These two requests then come with follow up questions, such as how 'long do you keep that data for' and 'how does a client tell a server that it no longer wants that this collection, but wishes to keep the actual resources it refers to'.
So I am, to a point, answering my question with the same question, 'Why would you even consider a two step approach?'
IMO:
HTTP DELETE on existing collection to delete all of its member seems fine. Creating the collection just to delete all of the member sounds odd. As you yourself suggest, just pass IDs of the to be deleted items using JSON (or any other payload format). I think that the server should try to make multiple deletes an internal transaction though.
I would argue that HTTP already provides a method of deleting multiple items in the form of persistent connections and pipelining. At the HTTP protocol level it is absolutely fine to request idempotent methods like DELETE in a pipelined way - that is, send all the DELETE requests at once on a single connection and wait for all the responses.
This may be problematic for an AJAX client running in a browser since few browsers have pipelining support enabled by default. This is not the fault of HTTP, though, it is the fault of those specific clients.

post body in REST

I was referring to the O'Reilly book on REST api design, that clearly lays down the message format specifically around the areas of how links should be used to represent interrelated resources and stuff. But all the examples are for reading a resource (GET) and how the server structures the message. But what about a Create (POST) ? Should the message structure for create of a similarily inter-connected object be similar i.e through links ??
By the way of an example, let us consider we want to create a Person object with a Parent field . Should the json message format sent to server thru POST (Post msg body) be like :-
{
name:'test',
age:12,
links:[
{
rel:'parent',
href:'/people/john'
}
]
}
Here is a media type you could look at
http://stateless.co/hal_specification.html
Yes, that is one way of doing it. GET information might be usefully made human-readable, but POST/PUT information targets the machine.
Adding information to reduce the server's need to process information (e.g. by limiting itself to verifying information makes sense rather than recovering it all from scratch) also makes a lot of sense, performance-wise. As long as you do verify: keep in mind that user data must be treated as suspect on general principles. You don't want the first ExtJS-savvy guy being able to forge requests to your services.
You might also format data in XML or CSV, depending on what's best for the specific application. And keeping in mind that you might want to refactor or reuse the code, so adhering to a single standard also makes sense. All things considered, JSON is probably the best option.

Proper RESTful way to handle a request that is not really creating or getting something?

I am writing a little app that does one thing only: takes some user-provided data, does some analysis on it, and returns a "tag" for that data. I am thinking that the client should either GET or POST their request to /getTag in order to get a response back.
Nothing is stored on the server when the client does this, so it feels weird to use a POST. However, there is not a uniform URI for the analysis either, so using a GET feels weird, since it will return different things depending on what data is provided.
What is the best way to represent this functionality with REST?
The "best way" is to do whatever is most appropriate for your application and its needs. Not knowing that, here are a few ideas:
GET is the most appropriate verb since you're not creating or storing anything on the server, just retrieving something that the server provides.
Don't put the word get in the URI as you've suggested. Verbs like that are already provided by HTTP, so just use /tag and GET it instead.
You should use a well-understood (or "cool") URI for this resource and pass the data as query parameters. I wouldn't worry about it feeling weird (see this question's answers to find out why).
To sum up, just GET on /tag?foo=bar&beef=dead, and you're done.
POST can represent performing an action. The action doesn't have to be a database action.
What you have really created is a Remote Procedure. RPC is usually all POST. I don't think this is a good fit for REST, but that doesn't have to stop you from using simple URLs and JSON.
It seems to me like there would probably be a reason you or the user who generated the original data would want the generated tag to persist, wouldn't they?
If that's a possibility, then I'd write it as POST /tags and pass the /tags/:id resource URI back as a Location: header.
If I really didn't care about persisting the generated tag, I'd think about what the "user-generated data" was and how much processing is happening behind the scenes. If the "tag" is different enough from whatever data is being passed into the system, GET /tag might be really confusing for an API consumer.
I'll second Brian's answer: use a GET. If the same input parameters return the same output, and you're not really creating anything, it's an idempotent action and thus perfectly suited for a GET.
You can use GET and POST either:
GET /tag?data="..." -> 200, tag
The GET method means retrieve whatever information (in the form of an
entity) is identified by the Request-URI. If the Request-URI refers to
a data-producing process, it is the produced data which shall be
returned as the entity in the response and not the source text of the
process, unless that text happens to be the output of the process.
POST /tag {data: "..."} -> 200, tag
The action performed by the POST method might not result in a resource
that can be identified by a URI. In this case, either 200 (OK) or 204
(No Content) is the appropriate response status, depending on whether
or not the response includes an entity that describes the result.
according to the HTTP standard / method definitions section.
I would use GET if I were you (and POST only if you want to send files).

RESTful API design: should unchangable data in an update (PUT) be optional?

I'm in the middle of implementing a RESTful API, and I am unsure about the 'community accepted' behavior for the presence of data that can not change. For example, in my API there is a 'file' resource that when created contains a number of fields that can not be modified after creation, such as the file's binary data, and some metadata associated with it. Additionally, the 'file' can have a written description, and tags associated.
My question concerns doing an update to one of these 'file' resources. A GET of a specific 'file' will return all the metadata, description & tags associated with the file, plus the file's binary data. Should a PUT of a specific 'file' resource include the 'read only' fields? I realize that it can be coded either way: a) include the read only fields in the PUT data and then verify they match the original (or issue an error), or b) ignore the presence of the read only fields in the PUT data because they can't change, never issuing an error if they don't match or are missing because the logic ignores them.
Seems like it could go either way and be acceptable. The second method of ignoring the read only fields can be more compact, because the API client can skip sending that read only data if they want; which seems good for people who know what they are doing...
Personally, both ways are acceptable.... however, if I were you, I'll opt for option A (check read-only fields to ensure they are not changed, else throw an error). Depending on the scope of your project, you cannot assume what the consumers know about your Restful WS in depth because most of them don't read documentations or WADL, even if they are the experienced ones. :)
If you don't provide immediate feedback to the consumers that certain fields are read-only, they will have a false assumption that your web service will take care all the changes they have made without double checking, OR once they find out the "inconsistent" updates, they complain to others that your web service is buggy.
You can approach this in two different ways if the read-only field doesn't match the original values...
Don't process the request. Send a 409 Conflict code and specific error message.
Process the request, send a 200 OK and a message stating that changes made the read-only fields are ignored.
Unless the read-only data makes up a significant portion of the data (to the extreme that transmitting the read-only data has a noticeable impact on network traffic and response times), you should write the service to accept the read only fields in the PUT and check them for changes. It's just simpler to have the same data going in and out.
Look at it this way: You could make inclusion of the read only fields optional in the PUT, but you will still have to / should write the code in the service to check that any read only fields that were received contain the expected values. You have to write the read only checking either way.
Prohibiting the read-only fields in the PUT is a bad idea because it will require the clients to strip away fields they received from you in the GET. This requires that the client get more intimately involved with your data and semantics than they really need to be. The clients will consider this a headache, an unnecessary complication, and downright mean of you to add to their burden. Taking data received from your GET, modifying one field of interest, and sending it back to you with a PUT should be a brain-dead simple round-trip for the client. Don't complicate things when you don't have to.