How can client-side output encoding be bypassed in XSS?

I hear everyone saying output encoding has to be done client-side instead of server-side. My question is: doesn't it vary with context?
Are there cases where client-side output encoding is good enough and can't be bypassed?
If I use a client-side JS function like encodeURIComponent to encode a URL that is causing XSS, how can an attacker bypass this and still cause XSS?
Phishing can also happen due to XSS. If I at least do output encoding, can phishing be prevented?

The short answer is that XSS encoding needs to happen where data is put into HTML or JavaScript, be it server-side and/or client-side. I could easily imagine data put into a script tag on the server side being properly encoded, but then JavaScript on the client side using that value in an insecure way, creating an XSS vulnerability.
So when putting untrusted data into a web page (be it in an HTML tag, inside script tags, in CSS, etc. - see the OWASP XSS Prevention Cheat Sheet) we need to encode. Then when we come to the client side, we also need to make sure our JavaScript does not introduce XSS problems. This could for instance be DOM-based XSS, or the example mentioned above.
So my answer is, you need to do encoding both on the server AND client side.
I don't understand how the 3rd question is related. Phishing can happen in many different ways, for example on a completely different domain that just mimics the original page.
Edit: One more thing. If untrusted data is put into the page server-side without encoding, there is very little the client side can do to fix that. It's most likely already too late.
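To illustrate the server-side half, a minimal sketch (assuming the OWASP ESAPI library is on the classpath; the "comment" field is just an example) could look like this:

import org.owasp.esapi.ESAPI;

public class CommentRenderer {
    // Encode untrusted data at the point where it is written into an HTML context.
    // The raw value stays as-is in storage; only the output is encoded.
    public String renderComment(String untrustedComment) {
        String encoded = ESAPI.encoder().encodeForHTML(untrustedComment);
        return "<div class=\"comment\">" + encoded + "</div>";
    }
}

If client-side JavaScript later takes that value and writes it into the DOM through an unsafe sink, the encoding has to be done again for that new context.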

Erlend's answer is beautiful. I want to share my findings regarding output encoding.
Output encoding done on the server side is better than on the client side.
You can get more knowledge regarding output encoding from the OWASP XSS Prevention Cheat Sheet,
and you can do this on the client side too. If you are going to use untrusted (user-given) data in an HTML context, please use JavaScript's native innerText property (textContent in standards-compliant browsers), or encode the characters (<, >, ', ", /) into HTML entities.
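As a rough illustration of that entity-encoding step (shown in Java here, but the same character-to-entity mapping applies wherever you encode; the helper name is made up):

public final class HtmlEntities {
    // Replace the characters that are dangerous in an HTML context
    // with their corresponding HTML entities. The ampersand is encoded
    // as well so that already-encoded text cannot be smuggled through.
    public static String encode(String untrusted) {
        StringBuilder sb = new StringBuilder(untrusted.length());
        for (char c : untrusted.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#x27;"); break;
                case '/':  sb.append("&#x2F;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}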

Related

Need help in identifying the difference between ESAPI.validator() and ESAPI.encoder()

We are implementing application security in our website. It's a REST-based application, so I will have to validate the whole request payload rather than each attribute. This payload needs to be validated against all types of attacks (SQL injection, XSS, etc.). While browsing, I found that people are using ESAPI for web security.
I am confused between the ESAPI.validator().getValidXXX and ESAPI.encoder() Java APIs of the ESAPI library. What is the difference between these two, and when should which API be used? I would also like to know in what cases we might use both APIs.
As per my understanding, I could encode an input to form valid HTML using both APIs.
Eg:
ESAPI.encoder().encodeForHTML(input);
ESAPI.validator().getValidSafeHTML(context, input, maxLength, allowNull).
For XSS attacks, I have made code changes to strip off HTML tags using Java's Pattern and Matcher, but I would like to achieve the same using ESAPI. Can someone help me achieve this?
Or
Are there any new Java plugins developed for web security, similar to ESAPI, which I have not come across? I have found https://jsoup.org/, but it only addresses XSS attacks; I am looking for a library which provides APIs for several attacks (SQL injection/XSS).
ESAPI.encoder().encodeForHTML(input);
You use this when you're sending input to a browser, so that the data you're sending gets escaped for HTML. This can get tricky, because you have to know whether that exact data is, for example, being passed to JavaScript before it is rendered into HTML, or whether it's being used as part of an HTML attribute.
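For instance, the same untrusted value needs a different ESAPI encoder depending on the context it lands in (a sketch; the parameter name and markup are made up):

import javax.servlet.http.HttpServletRequest;
import org.owasp.esapi.ESAPI;

public class ContextEncodingExample {
    public void render(HttpServletRequest request, StringBuilder page) {
        String name = request.getParameter("name"); // untrusted input

        // Element content: <p>Hello, ...</p>
        page.append("<p>Hello, ")
            .append(ESAPI.encoder().encodeForHTML(name))
            .append("</p>");

        // Attribute value: <input value="...">
        page.append("<input value=\"")
            .append(ESAPI.encoder().encodeForHTMLAttribute(name))
            .append("\">");

        // JavaScript string literal: var n = '...';
        page.append("<script>var n = '")
            .append(ESAPI.encoder().encodeForJavaScript(name))
            .append("';</script>");
    }
}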
We use:
ESAPI.validator().getValidSafeHTML(context, input, maxLength, allowNull).
when we want to get "safe" HTML from a client, that is backed by an AntiSamy policy file describing exactly what kinds of HTML tags and HTML attributes we will accept from the user. The default is deny, so you have to explicitly tell the policy file what you will accept, for example:
<a href="...">text</a>
You need to specify that you want the "a" tag, and that you will allow an "href" attribute, and you can even specify further rules against the content within the text fields and tag attributes.
You only need "getValidSafeHTML" if your application needs to accept HTML content from the user... which is rarely a genuine requirement in most corporate applications. (Myspace used to allow this, and the result was the Samy worm.)
Generally, you use the validator API when content is coming into your application, and the encoder API when you direct content back to a user or a backend interpreter. AntiSamy isn't supported anymore, so if you need a "safe HTML" solution, use OWASP's HTML Sanitizer.
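If you really do need to accept a limited subset of HTML, a sketch using the OWASP Java HTML Sanitizer might look like this (the policy choice here is just an example; tailor it to the tags and attributes you actually want to allow):

import org.owasp.html.PolicyFactory;
import org.owasp.html.Sanitizers;

public class SafeHtmlExample {
    // Allow only basic formatting and links; everything else is stripped.
    private static final PolicyFactory POLICY =
            Sanitizers.FORMATTING.and(Sanitizers.LINKS);

    public static String sanitize(String untrustedHtml) {
        return POLICY.sanitize(untrustedHtml);
    }
}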
Are there any new Java plugins developed for web security, similar to ESAPI, which I have not come across? I have found https://jsoup.org/, but it only addresses XSS attacks; I am looking for a library which provides APIs for several attacks (SQL injection/XSS).
The only other one that attempts a similar amount of security is HDIV. Here is an answer that compares HDIV to ESAPI by an HDIV developer.
*DISCLAIMER: I am an ESAPI developer and an OWASP member.
Sidenote: I discourage the use of Jsoup, because by default it mutates incoming data, constructing "best guess" (invalid) parse trees, and doesn't allow you fine-grained control of that behavior... meaning, if there's an instance where you want to override and mandate a particular kind of policy, Jsoup asserts that it is always smarter than you are... and that's simply not the case.

How to check if there is any script injected in the json request?

We have cross-site scripting issues in our AEM application. We decided to check for any scripts before submitting a request. How do we check whether there is any script in the SOAP request on the server side (Java)? Is this the correct solution for avoiding cross-site scripting issues?
This is a pretty broad question, and we can't provide any implementation details since we don't know your architecture or implementation. However, there are some general XSS things to keep in mind:
If you are "checking for scripts" only in the browser, using JS, before submitting a form, that will not solve anything. People can easily bypass this by simply issuing the HTTP request that the form would have made from any other tool (e.g. curl, Postman, etc.). You need to check for bad data on the server side while processing the request that the form is submitting.
As far as how to do this sort of thing on the CQ server side, Adobe has some recommendations that you should read through:
AEM 6.1
AEM 5.6
The PDF "cheat sheet" link on those pages will probably be most helpful.
There are different ways to mitigate the XSS risk: white-listing the data to let only known-good data through, black-listing the data to block out any known-bad data, and encoding the data to prevent scripts from being treated as HTML. For an excellent read on what to do, pay attention to the OWASP recommendations.
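As a rough sketch of the white-listing approach (the field name and pattern are made-up examples; the point is to accept only known-good input rather than chase known-bad input):

import java.util.regex.Pattern;

public class InputWhitelist {
    // Accept only values that match a known-good pattern;
    // reject everything else instead of trying to enumerate "bad" input.
    private static final Pattern USERNAME = Pattern.compile("^[a-zA-Z0-9_]{1,32}$");

    public static boolean isValidUsername(String value) {
        return value != null && USERNAME.matcher(value).matches();
    }
}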
Check out the XSSAPI; you can use the methods in this API to prevent XSS security risks.
On the other hand, you could probably start using Sightly, which provides automatic contextual XSS protection.
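A minimal sketch of how the XSSAPI might be used (assuming the Sling XSSAPI service is available to your component; check the method names against your AEM version):

import org.apache.sling.xss.XSSAPI;

public class XssApiExample {
    private final XSSAPI xssAPI; // typically injected, e.g. via @Reference in an OSGi component

    public XssApiExample(XSSAPI xssAPI) {
        this.xssAPI = xssAPI;
    }

    public String renderName(String untrustedName) {
        // Encode untrusted data for an HTML element context before output.
        return xssAPI.encodeForHTML(untrustedName);
    }

    public String renderLink(String untrustedHref) {
        // Return a scrubbed href, or an empty string if the URL is not safe.
        return xssAPI.getValidHref(untrustedHref);
    }
}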

HATEOAS Content-Type: Custom MIME type

I've been trying to implement a RESTful architecture, but I've gotten thoroughly confused as to whether custom media types are good or bad.
Currently my application conveys "links" using the HTTP Link: header. This is great; I use it with a title attribute, allowing the server to describe what on earth this 'action' actually is, especially when presented to a user.
Where I've gotten confused is whether I should specify a custom MIME type or not. For instance, I have the concept of a user. It may be associated with the current resource. I'm going to make up an example and say I have an item up for auction. We may have a user "watching" it. So I would include the link
<http://someuserrelation>; rel="http://myapp/watching"; title="Joe Blogg"; methods="GET"
in the header. If you had the ability to remove that user from watching, you would get:
<http://someuserrelation>; rel="http://myapp/watching"; title="Joe Blogg"; methods="GET,DELETE"
I'm pretty happy with this: if the client has the correct role, he can remove the relationship. So I'm defining what to do with a relationship. The neat thing is that, say we call GET on that 'relation' resource, I redirect the client to the user resource.
What's confusing me is whether or not to use a custom MIME type. There are arguments both ways on the internet, and in my head, regarding this.
I've done a sample in which I call HEAD on an unknown URL, and the server returns Content-Type: application/vnd.myapp.user. My client then decides whether it can understand this MIME type (it maintains mappings of resources it understands to views), and will either follow it, or explain that it's unable to figure out what is at the end of that link.
Is this bad? I have to maintain special MIME types. What's particularly odd is that I'm more than happy to use a standard application/user format, but can't find one specified anywhere.
I'm beginning to think I should be attempting to guess at rendering whatever is in any HTTP response, almost to the point that maybe my RESTful API should just be rendering HTML instead of attempting to do anything with JSON/XML.
I've tried searching (even Roy Fielding's blog), but can't find anything that describes how the client should deal with this sort of situation.
EDIT: The argument I have for including the custom type is that it may not necessarily be a 'user' watching the item; it could be something with application/vnd.myapp.group. By getting the response, a client knows the body contains something different, and can switch to a view that displays groups. But is this coupling of MIME type to view bad?
I'd say, you definitely want to have a specific media-type for all representations. If you can find a standard one (html, jpeg, atom, etc.) use that, but if not, you should define one (or multiple ones).
The reason is: representations should be self-contained. That means when your client gets a link from somewhere, it should know what to do with it: how to display it, how to proceed from there, etc. For example, a browser knows how to display text/html. Your client should know how to display/handle application/vnd.company.user.
Also, I think you've got content negotiation backwards. You don't need to call HEAD to determine what representations the server supports. You can tell the server what your client supports in the GET/POST/etc. requests using the "Accept" header. Indeed, this would be the standard way to do it. The server then responds with the 'best' representation it can give you for your accepted MIME types. You don't need more round-trips.
So, although the links you are providing can contain contextual information, usually given in the 'rel' attribute (for example if the link points to a 'next page', 'previous page', 'subscribed user' or 'owner user', etc.), the client cannot assume any representation under those links. It knows the link is semantically a 'user', so it can fill the 'Accept' header with all supported representations for a user (application/vnd.company.user). If the representation only says text/xml, there is nothing the client can assume about either the content, or the semantics of links that it may receive.
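For instance, the client can express what it understands in one round-trip (a sketch using the JDK 11+ HTTP client; the URL and media type are the hypothetical ones from this discussion):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NegotiationExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Tell the server which representations we understand;
        // the server picks the best match and reports it in Content-Type.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://myapp/users/42"))
                .header("Accept", "application/vnd.myapp.user, application/json;q=0.5")
                .GET()
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.headers().firstValue("Content-Type").orElse("unknown"));
    }
}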
In practice you can of course code any client to just assume what representations are under what links/urls, and you don't have to conform to REST all the time, but you do get a lot of benefits (described in Roy Fielding's paper) if you do.
Another minor point: the links do not need to contain which methods are available for a given resource; that's what OPTIONS is for. Admittedly, it is rarely implemented.
You don't have to send HTML to serve hypermedia. There are many different hypermedia formats which are much easier to parse than HTML. https://sookocheff.com/post/api/on-choosing-a-hypermedia-format/ https://stackoverflow.com/a/13063069/607033
You don't have to use a domain-specific MIME type; in my opinion it is better to use a domain-specific vocabulary with a general hypermedia type, e.g. microformats or schema.org with JSON-LD + Hydra, or ATOM/XML + microdata/RDFa, etc. There are many alternatives depending on your taste. https://en.wikipedia.org/wiki/Microdata_(HTML) http://microformats.org/ http://schema.org/ http://www.hydra-cg.com/
I am not sure whether advertising multiple methods on the same relation is a good choice. You can send multiple links with different rels in the Link headers if you want: https://stackoverflow.com/a/25416118/607033

Can you please correct me if i am wrong about REST?

REST is used for communication between any two systems.
So if you want to get info from one machine you have to use the GET method, and to add info to a system you need to use the POST method; likewise PUT and DELETE.
When a machine GETs the resource, it will ask for the machine-readable one. When a browser GETs a resource for a human, it will ask for the human-readable one.
So when you send a request from machine 1, it will go to some machine X. Machine X will send a machine-readable format back to machine 1. The browser then changes it into a user-readable format.
So JSON is a machine-readable format and HTML is a client-readable format... Correct me if I am wrong?
REST is an architectural style, not a technology. That being said, the only technology that most people know that is intended to align with the REST architectural style is HTTP. If you want to understand the REST architectural style, I recommend the following two resources:
Roy Fielding's presentation "The Rest of REST" (http://roy.gbiv.com/talks/200709_fielding_rest.pdf)
The book "RESTful Web Services"
When you send a GET request for a resource, it is up to the server to determine what representation (format, e.g. html vs. json) it wishes to send back. The client can send along an Accept header that specifies a set of preferred formats, but it's ultimately up to the server to decide what it wants to send. To learn more about this interaction, Google on "HTTP content negotiation".
The reason browsers tend to get back HTML is that they send an Accept header with "text/html". If you somehow configured your browser to always send an Accept header of only "application/json", you would sometimes get JSON back (if the server supported JSON representations), sometimes HTML (if the server ignored your Accept header) and sometimes an error saying that the server could not support the representation you requested.
A computer can parse either JSON or HTML if you have the right libraries. JSON content tends to be structured data (optimized for parsing) and HTML tends to be optimized for presentation, so JSON is generally much easier for a program to parse.
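A minimal sketch of that server-side decision (assuming a plain servlet; in practice most frameworks handle content negotiation for you):

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class GreetingServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String accept = req.getHeader("Accept");

        // Pick a representation based on what the client says it prefers.
        if (accept != null && accept.contains("application/json")) {
            resp.setContentType("application/json");
            resp.getWriter().write("{\"greeting\":\"hello\"}");
        } else {
            resp.setContentType("text/html");
            resp.getWriter().write("<p>hello</p>");
        }
    }
}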
Sounds about right to me. HTML is definitely for end-user consumption (despite all the nasty screen-scraping code out there) and there's no way that I'd want to deliver JSON (or XML or YAML) to end-user clients for direct display.
You probably want to ensure that you only deliver HTML that matches up with the same basic data model that you're delivering to mechanical clients; producing XHTML on demand by applying an XSLT stylesheet to an XML version of your standard responses is probably the easiest way to do it since it's likely you can do that with a layer that's independent of your basic application.
To be pedantic, both HTML and JSON are machine readable formats. The difference is that HTML has a specification that describes some semantics that web browsers know how to interpret in order to visually render it. The JSON spec has really no semantics other than defining how to serialize arrays, objects and properties.
Don't forget JSON and HTML are just two of the hundreds of potentially useful media type formats that a RESTful system can use.

What are the Pros and Cons of using URI vs Accept Headers for REST content format negotiation?

Based on info in the following question, REST Content-Type: Should it be based on extension or Accept header?, I'm aware either custom URIs or Accept headers are 'acceptable' (pun intended) methods for a REST-ish web service to determine the response format for the client.
However, a lot of big names seem to use the custom URI method with their APIs. What are the strengths of one way over the other?
In REST, URIs are intended to identify only a resource. Content negotiation is used to identify representation format. It's your traditional separation of concerns. When using the URI to identify the representation format you are mixing those concerns.
In addition to mixing concerns, my observation is that when using the URI based approach people generally know the convention and rely on URI building rather than hypertext to navigate. This increases coupling and can cause problems if the server ever wants to change the URI structure.
With that being said, there are some positives to the URI approach, namely convenience. During development, you can launch the browser and easily see what the server is responding with by simply entering the URI in the address bar ('example.com/foo.json'). When relying 100% on content negotiation it's a bit more difficult, and you have to rely on browser plugins or cURL, or anything else that can manipulate the headers.