When and where to encode user input?


I am currently storing data submitted by users in my database already encoded, like this:
<cfquery>
INSERT INTO dbo.MyTable (UserID, Comment)
VALUES
(
<cfqueryparam value="#FORM.UserID#" cfsqltype="cf_sql_integer"/>,
<cfqueryparam value="#EncodeForHTML(FORM.Comment)#" cfsqltype="cf_sql_nvarchar"/>
)
</cfquery>
Evidently this is not the right way to do it, because my DB table now contains escaped characters that are only useful for HTML output and make searching within SQL Server difficult.
So how do I ensure that I apply EncodeForHTML() to the input before it hits the server, and then Canonicalize() the data received before storing it in the DB?

Mitigate potentially DB-harmful text when it heads towards the DB: pass it as a parameter, not hard-coded into the SQL statement, as you have kinda done in your example. You are still exposing yourself by not parameterising your ID value. As a rule, only SQL should go in your <cfquery>'s SQL string; any data values should be passed as parameters.
Similarly, mitigate the risk your user-provided data might pose when you use the data: not when it goes into storage, but when you actually use it. encodeForHtml() is only appropriate for stuff being written into HTML. It's no help if the data is being passed on a URL, used in JavaScript, etc. There are different mitigation approaches for those (urlEncodedFormat() and encodeForJavaScript() respectively). The point is that you handle the mitigation on a use-by-use basis, not just generically.
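A minimal sketch of the "store raw, encode on use" approach, written with Python's standard library rather than ColdFusion: `sqlite3` stands in for SQL Server and `<cfqueryparam>`, while `html.escape`, `urllib.parse.quote` and `json.dumps` stand in for `encodeForHTML()`, `urlEncodedFormat()` and `encodeForJavaScript()`. The table mirrors `dbo.MyTable` from the question; the comment text is made up.

```python
import html
import json
import sqlite3
from urllib.parse import quote

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (UserID INTEGER, Comment TEXT)")

user_id, comment = 42, 'Tom & Jerry <script>alert("hi")</script>'

# Store the raw text. Both values travel as bound parameters, never inside the SQL string.
conn.execute("INSERT INTO MyTable (UserID, Comment) VALUES (?, ?)", (user_id, comment))

# Because nothing was pre-encoded, plain SQL searches still match what the user typed.
(stored,) = conn.execute(
    "SELECT Comment FROM MyTable WHERE Comment LIKE ?", ("%Tom & Jerry%",)
).fetchone()

# Encode only at the point of use, with the encoder that matches each output context.
for_html = html.escape(stored)       # text placed in HTML (quote=True also escapes " and ')
for_url = quote(stored, safe="")     # a single URL query-string value
# A JavaScript string literal; escaping "<" as \u003c keeps "</script>" from
# closing an inline <script> block.
for_js = json.dumps(stored).replace("<", "\\u003c")
```

The database keeps exactly what the user typed, and each of the three outputs is derived from it only when it is about to be written into that particular context.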
And how do you ensure this is done (as you ask)? Well... you write your code diligently and have a rigorous code review and QA process (with QA doing penetration tests).

You can store the values as-is, and use <cfqueryparam> for form.userid as well. On output, use encodeForHTML().
If you prefer to have some data sanitizing done before storing, try AntiSamy (built into CF11): http://www.adobe.com/devnet/coldfusion/articles/security-improvements-cf11.edu.html#articlecontentAdobe_numberedheader_2

Related

Best practice for RESTful API design updating 1/many-to-many relationship?

Suppose I have a recipe page where the recipe can have a number of ingredients associated with it. Users can edit the ingredients list and update/save the recipe. In the database there are these tables: recipes table, ingredients table, ingredients_recipes_table. Suppose a recipe has ingredients a, b, c, d but then the user changes it to a, d, e, f. With the request to the server, do I just send only the new ingredients list and have the back end determine what values need to be deleted/inserted into the database? Or do I explicitly state in the payload what values need to be deleted and what values need to be inserted? I'm guessing it's probably the former, but then is this handled before or during the db query? Do I read from the table first then write after calculating the differences? Or does the query just handle this?
I searched and I'm seeing solutions involving INSERT IGNORE... + DELETE ... NOT IN ... or using the MERGE statement. The project isn't using an ORM -- would I be right to assume that this could be done easily with an ORM?
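For reference, the "let the query handle it" route mentioned above can be sketched in SQLite, where `INSERT OR IGNORE` plays the role of MySQL's `INSERT IGNORE`. The schema (an `ingredients_recipes` link table with a composite primary key and text ingredient IDs) and the function name are assumptions for illustration.

```python
import sqlite3

def set_ingredients(conn: sqlite3.Connection, recipe_id: int, ingredient_ids) -> None:
    """Make the link rows for recipe_id match ingredient_ids exactly, in one transaction."""
    ids = sorted(set(ingredient_ids))
    with conn:  # commits on success, rolls back if any statement raises
        if ids:
            placeholders = ",".join("?" * len(ids))  # "?,?,?" - placeholders only, no data
            conn.execute(
                "DELETE FROM ingredients_recipes "
                f"WHERE recipe_id = ? AND ingredient_id NOT IN ({placeholders})",
                (recipe_id, *ids),
            )
        else:
            conn.execute("DELETE FROM ingredients_recipes WHERE recipe_id = ?", (recipe_id,))
        # Rows that already exist collide with the primary key and are silently skipped.
        conn.executemany(
            "INSERT OR IGNORE INTO ingredients_recipes (recipe_id, ingredient_id) VALUES (?, ?)",
            [(recipe_id, i) for i in ids],
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ingredients_recipes ("
    " recipe_id INTEGER, ingredient_id TEXT, PRIMARY KEY (recipe_id, ingredient_id))"
)
set_ingredients(conn, 1, ["a", "b", "c", "d"])
set_ingredients(conn, 1, ["a", "d", "e", "f"])
rows = [r[0] for r in conn.execute(
    "SELECT ingredient_id FROM ingredients_recipes WHERE recipe_id = 1 ORDER BY ingredient_id")]
```

The client only sends the full new list; the two statements work out what to drop and what to add without the application reading the table first.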

Can you share what the user interface looks like? It would be fairly standard practice to let the user either add a single new ingredient (a POST) or delete one (a DELETE), each as its own action. You can simply have a button next to each ingredient to initiate a DELETE request, and a form beneath the list for a POST.
Having the users input a list creates unnecessary complexity.

A common pattern to use would be to treat this like a remote authoring problem.
The basic idea of remote authoring is that we ask the server for its current representation of a resource. We then make local (to the client) edits to the representation, and then request that the server accept our representation as a replacement.
So we might GET a representation that includes a JSON array of ingredients. In our local copy, we remove the ingredients we no longer want and add the new ones. Then we would PUT our local copy back to the server.
When the documents are very large and the changes are easily described, then instead of sending the entire document to the server, we might send a PATCH request with a "patch document" that describes the changes we have made locally.
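To make the idea of a "patch document" concrete: JSON Patch (RFC 6902) is one standard format for it. The sketch below applies only the `add` and `remove` operations, and only to paths shaped like `/<key>/<index>` or `/<key>/-`; it is a simplified illustration, not a full RFC 6902 implementation. The patch turns the recipe's a, b, c, d into a, d, e, f.

```python
import copy

def apply_patch(doc: dict, ops: list[dict]) -> dict:
    """Apply a tiny subset of JSON Patch (RFC 6902) to a copy of `doc`.

    Supported: "add" and "remove" on paths shaped like /<key>/<index>,
    plus "add" with /<key>/- meaning "append". Anything else raises.
    """
    result = copy.deepcopy(doc)  # never mutate the caller's document
    for op in ops:
        _, key, index = op["path"].split("/")
        target = result[key]
        if op["op"] == "add":
            if index == "-":
                target.append(op["value"])
            else:
                target.insert(int(index), op["value"])
        elif op["op"] == "remove":
            del target[int(index)]
        else:
            raise ValueError(f"unsupported op: {op['op']!r}")
    return result

recipe = {"ingredients": ["a", "b", "c", "d"]}
patch = [
    # Operations apply in order, so remove the higher index first.
    {"op": "remove", "path": "/ingredients/2"},  # drops "c"
    {"op": "remove", "path": "/ingredients/1"},  # drops "b"
    {"op": "add", "path": "/ingredients/-", "value": "e"},
    {"op": "add", "path": "/ingredients/-", "value": "f"},
]
updated = apply_patch(recipe, patch)  # {"ingredients": ["a", "d", "e", "f"]}
```

Because each operation sees the result of the previous one, the order matters: removing index 1 first would shift "c" into index 1, and a following removal of index 2 would then drop "d" instead.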
When the server is just a document store, the implementation on the server is easy -- you can review the changes to decide if they are valid, compute the new representation (if necessary), and then save it into a file, or whatever.
When you are using a relational database? Then the server implementation needs to figure out how to update itself. An ORM library might save you a bunch of work, but there are no guarantees -- people tend to get tangled up in the "object" end of the "object relational mapper". You may need to fall back to hand rolling your own SQL.
An alternative to remote authoring is to treat the problem like a web site. In that case, you would get some representation of a form that allows the client to describe the change that should be made, and then submit the form, producing a POST request that describes the intended changes.
But you run into the same mapping problem on the server end -- how much work do you have to do to translate the POST request into the correct database transaction?
REST, alas, doesn't tell you anything about how to transform the representation provided in the request into your relational database. After all, that's part of the point -- REST is intended to allow you to replace the server with an alternative implementation without breaking existing clients, and vice versa.
That said, yes - your basic ideas are right; you might just replace the entire existing representation in your database, or you might instead optimize to only issue the necessary changes. An ORM may be able to perform the transformations for you, though optimizations like lazy loading have been known to complicate things significantly.
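The "only issue the necessary changes" option can be sketched with plain set arithmetic: read the current links, diff them against the submitted list, and run only the required DELETEs and INSERTs. This uses SQLite; the `ingredients_recipes` table, its columns, and the function name are assumptions for illustration.

```python
import sqlite3

def sync_ingredients(conn: sqlite3.Connection, recipe_id: int, wanted) -> tuple[set, set]:
    """Diff the stored links against `wanted` and apply only the differences.

    Assumes conn was opened with isolation_level=None so the explicit
    BEGIN IMMEDIATE below controls the transaction: it takes SQLite's write
    lock up front, so the read and the writes see a consistent state.
    """
    wanted = set(wanted)
    conn.execute("BEGIN IMMEDIATE")
    try:
        current = {row[0] for row in conn.execute(
            "SELECT ingredient_id FROM ingredients_recipes WHERE recipe_id = ?", (recipe_id,))}
        to_delete, to_insert = current - wanted, wanted - current
        conn.executemany(
            "DELETE FROM ingredients_recipes WHERE recipe_id = ? AND ingredient_id = ?",
            [(recipe_id, i) for i in sorted(to_delete)])
        conn.executemany(
            "INSERT INTO ingredients_recipes (recipe_id, ingredient_id) VALUES (?, ?)",
            [(recipe_id, i) for i in sorted(to_insert)])
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return to_delete, to_insert

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE ingredients_recipes ("
             " recipe_id INTEGER, ingredient_id TEXT, PRIMARY KEY (recipe_id, ingredient_id))")
sync_ingredients(conn, 1, ["a", "b", "c", "d"])
removed, added = sync_ingredients(conn, 1, ["a", "d", "e", "f"])
```

Compared with deleting every link and re-inserting the full list, this touches only b, c (removed) and e, f (added), at the cost of a read and an explicit transaction.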

How to expose URL friendly UUIDs?

Hello Internet Denizens,
I was reading through a nice database design article and the final determination on how to properly generate DB primary keys was ...
So, in reality, the right solution is probably: use UUIDs for keys, and don’t ever expose them. The external/internal thing is probably best left to things like friendly-url treatments, and then (as Medium does) with a hashed value tacked on the end.
That is, use UUIDs for internal purposes like db joins, but use a friendly-url for external purposes (like a REST API).
My question is ... how do you make uniquely identifiable (and friendly) keys for external purposes?
I've used several APIs (Stripe, QuickBooks, Amazon, etc.) and it seems like they use straight-up sequential IDs for things like customers, report IDs, etc. when retrieving information. It makes me wonder whether the security risk of exposing UUIDs is a little overblown, b/c in theory you should be able to append a where clause to your queries:
SELECT * FROM products where UUID = <supplied uuid> AND owner/role/group/etc = <logged in user>
The follow-up question is: If you expose a primary key, how do people efficiently restrict access to that resource in a database environment? Assign an owner to a db row?
Interested in the design responses.
Potentially relevant posts for further reading:
Should I use UUIDs for resources in my public API?

It is not a good idea to expose your internal ids to the outside. You should either encode them (with some algorithm) or have a look up table.
Also, do not append parameters provided by the user (or the URL) to your SQL query, UUIDs or not; this is prone to SQL injection. Use parameterized SQL queries for that.
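A minimal SQLite sketch combining both suggestions with the "assign an owner to a db row" idea from the question: the internal integer key stays private, each row also gets a random, opaque `public_id` (the lookup idea, here as a unique column), the friendly URL is a slug with that token tacked on the end, and every lookup is a parameterized query that also filters on the logged-in user. The table, column and function names, the 16-hex-character token, and the URL shape are all assumptions for illustration.

```python
import re
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        id        INTEGER PRIMARY KEY,   -- internal key: joins only, never leaves the server
        public_id TEXT NOT NULL UNIQUE,  -- opaque key exposed in URLs and APIs
        owner_id  INTEGER NOT NULL,
        name      TEXT NOT NULL
    )""")

def create_product(owner_id: int, name: str) -> str:
    public_id = secrets.token_hex(8)  # 64 random bits as 16 hex chars; not sequential
    with conn:
        conn.execute(
            "INSERT INTO products (public_id, owner_id, name) VALUES (?, ?, ?)",
            (public_id, owner_id, name))
    return public_id

def friendly_url(public_id: str, name: str) -> str:
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return f"/products/{slug}-{public_id}"  # hex has no "-", so the last "-" splits slug and token

def get_product(url_key: str, current_user_id: int):
    public_id = url_key.rsplit("-", 1)[-1]  # the slug is decorative; only the token is used
    # Both values are bound parameters, and the owner filter means a valid token
    # belonging to someone else still returns nothing.
    return conn.execute(
        "SELECT name FROM products WHERE public_id = ? AND owner_id = ?",
        (public_id, current_user_id)).fetchone()
```

Usage: `pid = create_product(7, "Blue Widget")` gives a URL like `/products/blue-widget-<token>`; looking up the last path segment as user 7 returns the row, while user 8 gets `None`.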

What kind of int storage is this?

We have a Firebird database for a (very crappy) application, and the app's front end, but nothing in between (i.e. no source code).
There is a field in the database that is stored as -2086008209 but is displayed in the front end as 63997.
Examples:
Database Front-End
758038959 44093
1532056691 61409
28401112 65866
-712038758 40712
936488434 43872
-688079579 48567
1796491935 39437
1178382500 30006
1419373703 66069
1996421588 48454
890825339 46313
-820234748 45206
What kind of storage is this? Our aim here is to access the application's back-end data and bypass the front-end GUI altogether, so I need to know how to decode this field in order to get appropriate values from it. It is stored as an int in Firebird (I don't know if Firebird has signed/unsigned ints, but this shows as signed when we select it).
This is the definition of the field:
It is not, as far as I can tell, de-normalised. The generator GEN_CONTACTS_ID has 66241 against it, which at a glance looks accurate.

I work with an application that stores bitmaps in integers (just don't ask). If you express these values in that form, i.e. as bit patterns, do you see something useful or consistent?
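One quick way to follow that suggestion is to print each stored value as its raw 32-bit pattern (two's complement, so the negative numbers show their high bit set), next to its unsigned reading and the displayed number, and look for recurring structure. The sample pairs are taken from the question; the helper is just an illustration.

```python
def as_bits(value: int, width: int = 32) -> str:
    """Two's-complement bit pattern of `value`, truncated to `width` bits."""
    return format(value & ((1 << width) - 1), f"0{width}b")

# (stored in Firebird, shown in the front end), taken from the question
samples = [
    (-2086008209, 63997),
    (758038959, 44093),
    (1532056691, 61409),
    (-712038758, 40712),
]

for stored, shown in samples:
    unsigned = stored & 0xFFFFFFFF  # same bits read as an unsigned 32-bit integer
    print(f"{as_bits(stored)}  {unsigned:>10}  |  {as_bits(shown, 17)}  {shown}")
```

Seventeen bits are enough for the displayed column, since the largest front-end value in the question (66069) is below 2^17.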

My impression is that the problem is in the front end. If what is stored in the DB is -2086008209, then what is stored in the DB is -2086008209. To understand better how the application is manipulating the data, try storing other numbers in the DB and see how they are displayed.

Did you come to this realization through logging SQL? If you haven't, you may serve yourself well by using the Firebird Trace API to capture that SQL: http://www.firebirdfaq.org/faq95/. An easier tool for working with the Trace API is this commercial product: http://www.upscene.com/products.fbtm.index.php.
I've used these tools and other techniques (triggers, etc.) to find what an application is using/changing in the database.
Of course, if the SQL statement is select * from table, then these tools would not help much.

Mule: after delivering a message, save the current timestamp for later use. What's the correct idiom?

I'm connecting to a third-party web service to retrieve rows from the underlying database. I can optionally pass a parameter like this:
http://server.com/resource?createdAfter=[yyyy-MM-dd hh:ss]
to get only the rows created after a given date.
This means I have to store the current timestamp (using #[function:datestamp:...], no problem) in one message scope and then retrieve it in another.
It also implies the timestamp should be preserved in case of an outage.
Obviously, I could use a subflow containing a file endpoint, saving in a designated file on a path. But, intuitively, based on my (very!) limited experience, it feels hackish.
What's the correct idiom to solve this?
Thanks!

The Object Store Module is designed just for that: to allow you to save bits of information from your flows.
See:
http://mulesoft.github.io/mule-module-objectstore/mule/objectstore-config.html
https://github.com/mulesoft/mule-module-objectstore/
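Outside Mule, the same idiom (a persisted "high-water mark") looks roughly like the Python sketch below. It does not use the Object Store API: a small SQLite-backed key/value class stands in for a persistent object store, and `fetch`/`deliver` are hypothetical callbacks for the web-service call and the downstream delivery. Two details matter whatever the storage: take the timestamp before fetching, so rows created while the request is in flight are not skipped next time, and persist it only after delivery succeeds, so an outage mid-run replays rows rather than losing them. The timestamp format is an assumption; use whatever the service's `createdAfter` parameter expects.

```python
import sqlite3
from datetime import datetime, timezone

class WatermarkStore:
    """Durable key/value store in a SQLite file; a stand-in for a persistent object store."""

    def __init__(self, path: str):
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)")
        self.conn.commit()

    def get(self, key: str, default=None):
        row = self.conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else default

    def put(self, key: str, value: str) -> None:
        with self.conn:  # committed before put() returns
            self.conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))

def poll_once(store, fetch, deliver, fmt="%Y-%m-%d %H:%M:%S"):
    since = store.get("createdAfter")                   # None on the very first run
    started = datetime.now(timezone.utc).strftime(fmt)  # captured before fetching
    rows = fetch(since)                                 # e.g. GET /resource?createdAfter=<since>
    deliver(rows)                                       # if this raises, the watermark is untouched
    store.put("createdAfter", started)                  # advanced only after successful delivery
    return since, started
```

Pointing `WatermarkStore` at a file path rather than `":memory:"` is what lets the watermark survive a restart.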

How to determine the encoding of request query string

Suppose I have a .NET HttpModule that analyzes incoming requests to check for possible attacks like SQL injection.
Now suppose that a user of my application enters the following in a form field and submits it:
&#039&#032&#079&#082&#032&#049&#061&#049
That is Unicode for ' OR 1=1. So in the request I get something like:
http://example.com/?q=%26%23039%26%23032%26%23079%26%23082%26%23032%26%23049%26%23061%26%23049
Which looks fine in my HttpModule (no SQL injection), but the server will correctly decode it to q=' OR 1=1 and my filter will fail.
So, my question is: Is there any way to know at that point what is the encoding used by the request query string, so I can decode it and detect the attack?
I guess the browser has to tell the server which encoding the request is in, so it can be correctly decoded. Or am I wrong?

the server will correctly decode it to q=' OR 1=1
It shouldn't. There is no valid reason(*) an application would HTML-decode the &#039... string before using it in an SQL query. HTML-decoding is a client-side occurrence.
(* there's the invalid reason: that the application author doesn't have the foggiest idea what they're doing, tries to write an input-HTML-escaping function - a misguided idea in the first place - and due to incompetence writes an input-de-escaping function instead... but that would be an unlikely case. Hopefully.)
Is there any way to know at that point what is the encoding used by the request query string
No. Some Web Application Firewalls attempt to get around this by applying every decoding scheme they can think of to the incoming data, and triggering if any of them match something suspicious, just in case the application happens to have an arbitrary decoder of that type sitting between the input and a vulnerable system.
This can result in a performance hit as well as increased false positives, and doubly so for WAFs that try all possible combinations of two or more decoders. (E.g., is T1IrMQ a Base64-encoded, URL-encoded OR 1 SQL attack, or just a car number plate?)
Quite how far you take this idea is a trade-off between how many potential attacks you catch and how much negative impact you have on real users of the app. There's no one 'correct' solution because ultimately you can never provide complete protection against app vulnerabilities in a layer outside the app (aka "WAFs don't work").
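To illustrate the "apply every decoder" approach, and why it produces false positives, the sketch below runs a value through every ordering of up to two decoders (URL, HTML entity, Base64) and reports which chains produce something matching a deliberately naive `OR 1` signature. The signature, the decoder set and the function names are assumptions for illustration; real WAF rule sets are far larger.

```python
import base64
import html
import re
from itertools import permutations
from urllib.parse import unquote_plus

# Deliberately naive signature, for illustration only.
SUSPICIOUS = re.compile(r"\bOR\s+1\b", re.IGNORECASE)

def b64(s: str) -> str:
    padded = s + "=" * (-len(s) % 4)  # tolerate stripped "=" padding
    # Raises ValueError (binascii.Error / UnicodeDecodeError) when s isn't Base64 text.
    return base64.b64decode(padded, validate=True).decode("utf-8")

DECODERS = {"url": unquote_plus, "html": html.unescape, "base64": b64}

def flagged_chains(value: str, max_depth: int = 2) -> list[tuple[str, ...]]:
    """Return every decoder chain (applied left to right) whose output looks like an attack."""
    hits = []
    for depth in range(max_depth + 1):
        for chain in permutations(DECODERS, depth):
            text = value
            try:
                for name in chain:
                    text = DECODERS[name](text)
            except ValueError:
                continue  # this chain doesn't apply to this input
            if SUSPICIOUS.search(text):
                hits.append(chain)
    return hits
```

With this toy rule, `flagged_chains("T1IrMQ")` returns `[("base64", "url")]`: Base64 gives `OR+1`, and URL decoding turns the `+` into a space. That is exactly the numberplate ambiguity described above, and every extra decoder multiplies the number of chains to try.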

What you are seeing is URL Encoded, where a percent sign followed by 2 hex digits represents a single encoded byte octet. In HTML, an entity starting with an ampersand and ending with a semicolon contains an entity name or an explicit Unicode codepoint value.
What gets sent over the wire between the browser and server is http://example.com/?q=%26%23039%26%23032%26%23079%26%23082%26%23032%26%23049%26%23061%26%23049, but logically it represents http://example.com/?q=&#039&#032&#079&#082&#032&#049&#061&#049 once the server has decoded it on receipt. When your code reads the query string, it should receive &#039&#032&#079&#082&#032&#049&#061&#049. The server should not decode that any further to ' OR 1=1; you would have to do that in your own code.
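This is easy to check. In the Python sketch below, `urllib.parse.parse_qs` stands in for the server's query-string parsing (the question's .NET HttpModule is not reproduced). It removes exactly one layer of URL encoding, and only a separate, explicit HTML-decoding step yields the SQL fragment.

```python
import html
from urllib.parse import parse_qs

raw_query = "q=%26%23039%26%23032%26%23079%26%23082%26%23032%26%23049%26%23061%26%23049"

# One layer of URL decoding: what application code normally receives.
q = parse_qs(raw_query)["q"][0]
print(q)  # &#039&#032&#079&#082&#032&#049&#061&#049

# A second, HTML-entity decoding step happens only if code explicitly asks for it.
print(html.unescape(q))  # ' OR 1=1
```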
If you are allowing a URL query string to specify an SQL query filter as-is, then that is a mistake on your part to begin with. It suggests you are building SQL queries dynamically instead of using parameterized SQL queries or stored procedures, which leaves you open to SQL injection attacks. Don't do that. Parameterized SQL queries and stored procedures are not subject to injection attacks (as long as the procedure does not itself build dynamic SQL from its inputs), so your clients should only be allowed to submit the individual parameter values in the URL. Your server code can then extract the individual values from the URL query and pass them to the SQL parameters as needed. The database engine treats bound parameter values strictly as data, never as SQL, so they cannot change the structure of the query. You should not be handling that manually.
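A small SQLite demonstration of the difference, using a hypothetical `users` table: spliced into the SQL text, the payload rewrites the WHERE clause; bound as a parameter, it is just a string that matches no name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

payload = "' OR 1=1 --"

# Vulnerable: the value becomes part of the SQL text, giving
#   SELECT name FROM users WHERE name = '' OR 1=1 --'
unsafe_rows = conn.execute(f"SELECT name FROM users WHERE name = '{payload}'").fetchall()

# Parameterized: the value is bound as data and compared literally.
safe_rows = conn.execute("SELECT name FROM users WHERE name = ?", (payload,)).fetchall()
```

`unsafe_rows` contains every user, while `safe_rows` is empty.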
