How to determine the encoding of request query string - unicode

Suppose I have a .NET HttpModule that analyzes incoming requests to check for possible attacks like Sql Injection.
Now suppose that a user of my application enters the following in a form field and submits it:
&#039&#032&#079&#082&#032&#049&#061&#049
That is Unicode for ' OR 1=1. So in the request I get something like:
http://example.com/?q=%26%23039%26%23032%26%23079%26%23082%26%23032%26%23049%26%23061%26%23049
Which in my HttpModule looks fine (no Sql Injection), but the server will correctly decode it to q=' OR 1=1 and my filter will fail.
So, my question is: Is there any way to know at that point what is the encoding used by the request query string, so I can decode it and detect the attack?
I guess the browser has to tell the server which encoding the request is in, so it can be correctly decoded. Or am I wrong?

the server will correctly decode it to q=' OR 1=1
It shouldn't. There is no valid reason(*) an application would HTML-decode the &#039... string before using it in an SQL query. HTML-decoding is a client-side occurrence.
(* there's the invalid reason: that the application author doesn't have the foggiest idea what they're doing, tries to write an input-HTML-escaping function - a misguided idea in the first place - and due to incompetence writes an input-de-escaping function instead... but that would be an unlikely case. Hopefully.)
Is there any way to know at that point what is the encoding used by the request query string
No. Some Web Application Firewalls attempt to get around this by applying every decoding scheme they can think of to the incoming data, and triggering if any of them match something suspicious, just in case the application happens to have an arbitrary decoder of that type sitting between the input and a vulnerable system.
This can result in a performance hit as well as increased false positives, and doubly so for the WAFs that try all possible combinations of two or more decoders. (eg is T1IrMQ a base-64-encoded, URL-encoded OR 1 SQL attack, or just a car numberplate?)
Quite how far you take this idea is a trade-off between how many potential attacks you catch and how much negative impact you have on real users of the app. There's no one 'correct' solution because ultimately you can never provide complete protection against app vulnerabilities in a layer outside the app (aka "WAFs don't work").

What you are seeing is URL Encoded, where a percent sign followed by 2 hex digits represents a single encoded byte octet. In HTML, an entity starting with an ampersand and ending with a semicolon contains an entity name or an explicit Unicode codepoint value.
What gets sent over the wire between the browser and server is http://example.com/?q=%26%23039%26%23032%26%23079%26%23082%26%23032%26%23049%26%23061%26%23049, but logically is actually represents http://example.com/?q=&#039&#032&#079&#082&#032&#049&#061&#049 when decoded by the server upon receiving it. When your code reads the query string, it should be receiving &#039&#032&#079&#082&#032&#049&#061&#049. The server should not be decoding that any further to ' OR 1=1, you would have to do that in your own code.
If you are allowing a URL query string to specify an SQL query filter as-is, then that is a mistake on your part to begin with. That suggests you are building SQL queries dynamically instead of using parameterized SQL queries or stored procedures, so you are leaving yourself open to SQL Injection attacks. You should not be using that. Parameterized SQL queries and stored procedure are not subject to injection attacks, so your clients should only be allowed to submit the indiviudal parameter values in the URL. Your server code can then extract the individual values from the URL query and pass them to the SQL parameters as needed. The SQL Engine will make sure the values are santitized and formatted to avoid attacks. You should not be handling that manually.

Related

How to avoid Mongo DB NoSQL blind (sleep) injection

While scanning my Application for vulnerability, I have got one high risk error i.e.
Blind MongoDB NoSQL Injection
I have checked what exactly request is sent to database by tool which performed scanning and found while Requesting GET call it had added below line to GET request.
{"$where":"sleep(181000);return 1;"}
Scan received a "Time Out" response, which indicates that the injected "Sleep" command succeeded.
I need help to fix this vulnerability. Can anyone help me out here? I just wanted to understand what I need to add in my code to perform this check before connecting to database?
Thanks,
Anshu
Similar to SQL injection, or any other type of Code Injection, don't copy untrusted content into a string that will be executed as a MongoDB query.
You apparently have some code in your app that naively accepts user input or some other content and runs it as a MongoDB query.
Sorry, it's hard to give a more specific answer, because you haven't shown that code, or described what you intended it to do.
But generally, in every place where you use external content, you have to imagine how it could be misused if the content doesn't contain the format you assume it does.
You must instead validate the content, so it can only be in the format you intend, or else reject the content if it's not in a valid format.

RESTful query API design

I want to ask what is the most RESTful way for queries, I have this existing API
/entities/users?skip=0&limit=100&queries={"$find":{"$minus":{"$find":{"username":"markzu"}}}}
Easily the first parts of the query, skip and limit are easily identifiable however I find the "queries" part quite confusing for others. What the query means is to
Find every User minus Find User entities with username 'markzu'
The reason it is defined this way is due to the internal database query behavior.
Meaning in the NoSQL database we use, the resource run two transactional queries, first is to find everything in the User table minus a find User with a username that was specified (similar to SQL) -- boolean operations. So in other words, the query means, "fetch every User except username 'markzu' "
What is the proper way to define this in RESTful way, based on standards?
What is the proper way to define this in RESTful way, based on standards?
REST doesn't care what spelling you use for resource identifiers, so long as your choice is consistent with the production rules defined in RFC 3986.
However, we do have a standard for URI Templates
A URI Template is a compact sequence of characters for describing a range of Uniform Resource Identifiers through variable expansion.
You are already aware of the most familiar form of URI template -- key-value pairs encoded in the query string.
?skip=0&limit=100&username=markzu
That's often a convenient choice, because HTML understands how to process forms into url encoded queries.
It doesn't look like you need any other parameters, you just need to be able this query from others. So a perfectly reasonable choice might be
/every-user-except?skip=0&limit=100&username=markzu
It may help to think "prepared statement", rather than "query".
The underlying details of the implementation really shouldn't enter into the calculation at all. Your REST API is a facade that makes your app look like an HTTP aware key value store.

Is it RESTful to include parameters in a GET request’s body rather than the URI?

I have a complicated schema that requires complicated API calls. For many resource retrievals, the user would want to specify several parameters to filter the results. Including all of these parameters in the URI seems like it would be messy and difficult for front-end developers to craft, so I’ve opted to put the parameters into the request body as JSON. Unfortunately, this doesn’t seem to sit well with the web back-end I’m using (Django-Rest Framework). Is this RESTful, or am I making a mistake?
As a follow-up question, if I should put the parameters in the URI, how would I represent complex pieces of data, like lists of strings, and the relationships between pieces of data?
Is this RESTful, or am I making a mistake?
It sounds to me as though you are making a mistake. The authority in this case is RFC 7231
A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request.
My interpretation is this: caching is an important part of the web; for caching to work as people would expect it requires compliant caches to be able to manage that message body as part of the key.
An HTTP method that may serve your needs better is SEARCH.
The SEARCH method plays the role of transport mechanism for the query and the result set. It does not define the semantics of the query. The type of the query defines the semantics.
SEARCH is a safe method; it does not have any significance other than executing a query and returning a query result.
If that doesn't fit your needs, you could look through the HTTP method registry to see if one of the other standards fits your use case.
how would I represent complex pieces of data, like lists of strings, and the relationships between pieces of data?
The real answer is "any way you want" -- the origin server has control of its URI space, and any information encoded into it is done so at the server's convenience for its own use.
You could, for instance, consider using one of the Base64 encodings defined in RFC 4648
From what I read about RESTful, you can only use GET, POST, PUT, PATCH, and DELETE.
The GET and DELETE are not expected to include a body. As #VoiceOfUnreason mentioned, this is mainly because caches can have difficulties handling a body along a GET. That being said, if your results are never cached, it should not be a concern at all. (i.e. return Cache: no-cache and other similar HTTP header from your server.)
There is no real convention on the Query String and supporting lists or JSON and such. If you want to keep a GET, you could use an encoded JSON string, though. There is no problem with that, except the length of the URL.
http://www.example.com/?query=<encoded-json>
(encoded just means that you have to properly escape URI special characters, what the JavaScript encodeURICompent() function does.)
The length of the URL should be kept under 1Kb to be 100% safe. You can do some research on it, I think that the browser with the most stringent limit is around 2k.
If you want to use a bigger query, then you should revert your queries to using a POST and not a GET. Then the buffer is normal in that situation and the reply is not expected to be cached.
Real World Use (but Not An Excuse)
If you look into Elasticsearch, you will see that all their queries accept a JSON. You can send a DSL query using a GET or a POST. Either one accept a JSON in their body.
They offer the POST because most browsers will not accept to attach a body to a GET method. So GET queries would not work at all from a browser.
Example of Arrays in Query Strings
There has been various libraries, and at least PHP, that added support for arrays in parameters. In most cases this is done by supporting the array syntax in the parameter name. For example:
path/?var[1]=123&var[2]=456&var[3]=789
In this case, those languages will convert the value in an array. $_GET['var'][1] would then return 123.
This is not a convention, just an extension of that specific environment.

When and where to encode user input?

I am currently storing data submitted from users into my database already encoded like such:
<cfquery>
INSERT INTO dbo.MyTable (UserID, Comment)
VALUES
(
<cfqueryparam value="#FORM.UserID#" cfsqltype="cf_sql_integer"/>,
<cfqueryparam value="#EncodeForHTML(FORM.Comment)#" cfsqltype="cf_sql_nvarchar"/>
)
</cfquery>
Evidently this is not the right way to do it because now I have escaped characters in my DB table which are only useful for HTML output and difficult to perform searches on within SQL Server.
So how do I ensure that I apply the EncodeForHTML() on the input before it hits the server and then Canonicalize() the data received to be stored in the DB?
Mitigate potentially DB-harmful text when it heads towards the DB: pass it as a parameter, not hard-coded into the SQL statement, as you have kinda done in your example. You are still exposing yourself by not parameterising your ID value. As a rule, only SQL should go in your <cfquery>'s SQL string; any data values should be passed as parameters.
Similarly, mitigate risk your user-provided data might expose when you use the data. Not when it goes into storage, but when you actually use it. encodeForHtml() is only appropriate for stuff being written into HTML. It's no help if it's being passed on a URL, or used in JavaScript, etc. There are different mitigation approaches for those (urlEncodedFormat() and encodeForJavaScript() respectively). The point being you handle the mitigation on a use-by-use basis, not just generically.
And how to ensure this is done (you ask this)? Well... you write your code diligently and have a rigorous code review and QA process (with QA doing pen. tests).
You can store them as is, and use <cfqueryparam> for form.userid as well. On output, you use encodeforhtml().
If you prefer to have some data sanitizing done before storing, try AntiSamy (built-in in CF11) http://www.adobe.com/devnet/coldfusion/articles/security-improvements-cf11.edu.html#articlecontentAdobe_numberedheader_2

ESAPI for email address "blabla#example.com"

I saw some related questions. But I was not getting what exactly I was looking for. Sorry, if this turns out to be a silly request. Hopefully, I am having this specific query:
So I am trying to make a ReST API with MySQL database.
I am trying to read data from a table which is basically pulling out the valid email addresses of the users.
The output is going to be displayed on a HTML page.
temp = blabla#example.com
temp = ESAPI.encoder().canonicalize(temp);
temp = ESAPI.encoder().encodeForHTML(temp);
OUTPUT: temp = blabla#gmail.com
How can I avoid this from happening? and get blabla#email.com
I think the behavior here is as expected. But I just wanted to know if there is a work around other can Conditional Handling (if..else)
Also, what if someone can point me to the reasoning behind some of design choices for ESAPI. I t should be interesting read.
SHORT ANSWER:
If you can truly trust what's coming from your database, you don't need to perform canonicalize. If you know your data isn't going to be used by a browser, don't encode for HTML. If however you suspect your data will be used by a browser, encode it, have the caller deal with the results. If that's deemed unacceptable, expose an "unsafe" version of your webservice, one whose URL will explicitly use warning words to flag as "potentially malicious," forcing your caller to be aware that they're engaging in unsafe activity.
LONG ANSWER:
Well first, according to your use-case, you're essentially providing data to a calling client. My first instinct upon reading your question is that I don't think you're comfortable with your data contexts.
So, typically you're going to see a call to canonicalize() when you need safe data to perform validation against. So, the first questions to ask are these:
q1: Can I trust the data coming from my database?
Guidelines for q1: If the data is appropriately validated and neutralized, say by using a call to ESAPI.validator().getValidInput( args ); by the process that stores the data, then the application will store a safe email string into the database. If you can provably trust your input data at this point, it should be completely safe for you to not canonicalize your output as you're doing here.
If however, you cannot trust the data at this point, then you're in a scenario where before you pass along data to a downstream system, you'll need to validate it. A call to ESAPI.validator().getValidInput( args ); will BOTH canonicalize the input and ensure that its a valid email address. However this comes with the baggage that your caller is going to have to properly transform the neutralized input, which according to your question is what you want to avoid.
If you want to send safe data downstream, and you cannot defensibly trust your data source, you have no choice but to send safe data to your caller and have them work with it on their end--except perhaps to expose an unsafe method, which I will discuss shortly.
q2: Will browsers be used to consume my data?
Guidelines for q2: the encoder.encodeForHTML() method is designed to neutralize browser interpretation. Since you're talking about RESTful web services, I don't understand why you think you need to use it, because a browser should correctly interpret blabla#gmail.com to the correct canonical form--unless perhaps its being correctly trapped as a data element, such as in a dropdown box. But this is something I'm guessing you have NO control over?
As you can now tell, there are no fast answers to questions like this. You have to have some idea of how the data will be used by your caller. Since you have the possibility of having your data treated correctly as data by the browser, and the possibility of the data treated as code, you might be forced to offer a "safe" and "unsafe" call to retrieve your data, assuming that you have no control over how the client uses your service. That puts you in a bad spot, because a lazy caller might simply only ever use the unsafe version. When this happens in my industry, I'll usually make it so that the URL to call for an unsafe function looks something like mywebservice.com/unSafeNonPCICompliantMethod or something similar, so that you force your caller to explicitly accept the risk. If its being used in the correct context on the browser... the unsafe method might actually be safe. You just won't know.