Storing parts of user data in files for preventing SQL injection

I am new to web programming and have been exploring issues related to web security.
I have a form where the user can post two types of data - let's call them "safe" and "unsafe" (from the point of view of SQL).
Most places recommend storing both parts of the data in the database after sanitizing the "unsafe" part (to make it "safe").
I am wondering about a different approach: store the "safe" data in the database and the "unsafe" data in files (outside the database). Of course, this approach creates its own set of problems related to maintaining the association between files and DB entries. But are there any other major issues with this approach, especially related to security?
UPDATE: Thanks for the responses! Apologies for not being clear regarding what I am
considering "safe" so some clarification is in order. I am using Django, and the form
data that I am considering "safe" is accessed through the form's "cleaned_data"
dictionary which does all the necessary escaping.
For the purpose of this question, let us consider a wiki page. The title of a
wiki page does not need any styling attached to it. So, it can be accessed
through the form's "cleaned_data" dictionary, which will convert the user input to
a "safe" format. But since I wish to give users the ability to style their content
arbitrarily, I probably can't access the content part through the "cleaned_data" dictionary.
Does the file approach solve the security aspects of this problem? Or are there other
security issues that I am overlooking?

You know the "safe" data you're talking about? It isn't. It's all unsafe and you should treat it as such. Not by storing it all in files, but by properly constructing your SQL statements.
As others have mentioned, using prepared statements, or a library which simulates them, is the way to go, e.g.
$db->Execute("insert into foo(x,y,z) values (?,?,?)", array($one, $two, $three));

What do you consider "safe" and "unsafe"? Are you considering data with the slashes escaped to be "safe"? If so, please don't.
Use bound variables with SQL placeholders. It is the only sensible way to protect against SQL injection.
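A minimal sketch of bound variables, using Python's built-in sqlite3 module since the question mentions Django. The page table and the save_page helper are made up for illustration; the point is only that the value travels separately from the SQL text, so a hostile string ends up stored as plain data:

```python
import sqlite3

def save_page(conn, title, body):
    # The "?" placeholders keep both values out of the SQL text;
    # the driver passes them to the database as data, not as SQL.
    conn.execute("INSERT INTO page (title, body) VALUES (?, ?)", (title, body))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (title TEXT, body TEXT)")

# A classic injection attempt, treated here like any other string.
hostile = "x'); DROP TABLE page; --"
save_page(conn, "Home", hostile)

rows = conn.execute("SELECT title, body FROM page").fetchall()
```

The same applies to the "unsafe" wiki content in the question: bound this way, it needs no SQL escaping at all and can live in the database next to the title.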

Splitting your data will not protect you from SQL injection. It will only limit the data that can be exposed through it, and exposure is not the only risk of the attack: an attacker can also delete data, add bogus data and so on.
I see no justification for your approach, especially given that prepared statements are supported in many, if not all, development platforms and databases.
And that is without even getting into the maintenance nightmare your approach will end up being.
In the end, why use a database if you don't trust it? Just use plain files if you wish; a mix is a no-no.

SQL injection can target the whole database, not only the user's data, and it works by poisoning the query. So for me the best way (if not the only way) to avoid an SQL injection attack is to control your queries and protect them from being injected with malicious characters, rather than splitting the storage.

Related

Suggest data access design for Entity Framework (stored procedure less)

The plan is to use Entity Framework for data access. We are in a dilemma in deciding whether to use stored procedures or not.
The main idea behind avoiding stored procedures: we don't want anyone tempted to write business logic at the database level. I believe the database is only for storage.
Is there any performance hit if I write joins and business logic at the data access level? Is this as good as stored procedures? Please provide your recommendations.
Regards,
Ramana Akula.

SP disadvantages:
SP code is "fixed" - e.g. you don't have LINQ flexibility here. This requires an extra level of synchronization.
They should be written by a specialist in most cases. Historically, most C# developers are not very good at this.
Although you put extra effort into maintaining SPs, you don't get a significant performance improvement.
Even if you don't intend it, SPs will end up containing some part of the business logic.
If you lose some control over SP maintenance (or if you add some premature optimization to the DAL), you'll need more and more SPs as you go: getXbyY, getXbyZ, getSomeFieldByX, getAnotherFieldsByX, etc.
So my opinion is to avoid SPs as much as possible. If you feel you need one, that could indicate an incorrect data/storage structure design.

Using MicroORM for read layer in CQRS

Folks,
I'm considering using a micro ORM such as Dapper.net for the read access component of a CQRS application (ASP.NET MVC), with Entity Framework being used for manipulating the domain.
This is CQRS light; I am not using event sourcing etc. I have seen it mentioned several times that the read-only model in CQRS should be as light/simple as possible when querying the data layer, possibly using something like ADO.NET.
That implies potentially hardcoding SQL query strings in our code or in some XML file. How should I go about justifying this approach, where we have to maintain the domain mappings on one side and SQL statements on another?
Has anyone used micro ORMs in a CQRS solution in this way?
Thanks
Mick

Yes, absolutely you can use Dapper, PetaPoco, Massive, Simple.Data, or any other micro ORM you would like. In the past we have used NHibernate to solve the problem but it was a 10,000 lbs. gorilla compared to what we needed.
One thing that we really liked about Simple.Data and Petapoco in our evaluation of those libraries was that they each could adapt your queries to different database engines (including Mongo) with minimal tweaking necessary, whereas Dapper was basically one big bunch of SQL strings--it was "stringly typed". Don't get me wrong, Dapper's great and is very, very fast and will absolutely work great. Just evaluate your functional and non-functional requirements before committing.
Here are the relative numbers of NuGet downloads for each of the primary micro ORMs (as of about 1/1/2012). For us, having a good community with lots of downloads is always a must in order to help iron out issues when they arise:
5568 Simple.Data
4990 Petapoco
4913 Dapper
2203 Massive
1152 OrmLite
Lastly, one thing you may want to investigate is your reasoning for using SQL at all for your read models. If your domain is publishing events (regardless of event sourcing), and you're writing to simple, flat/non-relational view models, you may be able to get away with something as simple as JSON files that are pushed to the browser, which the browser then interprets and uses to populate your HTML templates. There are all kinds of options available; you just need to determine what works best in your scenario.
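To make the JSON idea a bit more concrete, here is a rough sketch in Python (not .NET, just to keep it short). The event names PageCreated and PageEdited and the project function are invented for illustration; the idea is only that domain events get folded into a flat view model that can be serialized as-is for the browser:

```python
import json

def project(events):
    # Fold domain events into one flat record per page (the read model).
    view = {}
    for event in events:
        if event["type"] == "PageCreated":
            view[event["id"]] = {"id": event["id"], "title": event["title"], "edits": 0}
        elif event["type"] == "PageEdited":
            view[event["id"]]["edits"] += 1
    return list(view.values())

# Hypothetical events published by the domain side.
events = [
    {"type": "PageCreated", "id": 1, "title": "Home"},
    {"type": "PageEdited", "id": 1},
    {"type": "PageEdited", "id": 1},
]

# This string could be written to a .json file and served directly.
payload = json.dumps(project(events))
```

No SQL or ORM is involved on the read side here, which is the trade-off being suggested: the write side still does the real work, and the read side only serves pre-shaped data.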

Small methods - Small sprocs

Uncle Bob recommends having small methods. Do stored procedures have an ideal size? Or can they run on for hundreds and hundreds of lines?
Also, does anyone have anything to say about where to place business logic? If it is located in stored procedures, the database is being used as a data processing tier.
If you read Adam Machanic, his bias is toward the database. Does that imply long stored procedures that only the author of the sproc understands, leaving maintainers to deal with the mess?
I guess there are two inter-related questions here, somehow.
Thanks in advance for responding to a fuzzy question(s).

Stored procedures are no different than regular functions. They should be of manageable size, regardless. I am biased towards keeping away business logic from stored procedures but reasonable people may disagree.

I think stored procedures nowadays should NEVER be used across a whole system as the only access method to the database. This is an outdated architecture that in the long term gives many more maintenance problems than benefits.
There are much better ways nowadays to handle every data access requirement.
The best use for a stored procedure is in certain rare cases when you want a single, well-defined and unique function to retrieve data that you know will be used in the same way by several applications. The stored procedure will allow you to be DRY in this case.
Another is when the db administrator who handles security needs to protect a certain part of the data (for example a credit card table) in such a granular way that allowing access only through an SP is a good option.
Apart from those cases, avoid stored procedures as much as possible and stick with code only, with all the benefits of inheritance, compiler checks, refactoring tools, enumerations instead of magic strings (also in queries), source control, easier deployment, etc. The list of benefits of avoiding SPs is just too long to pass up nowadays.
BUT if for some reason you decide to use stored procedures, you might as well put business logic in there. Having such a layer so close to the data without allowing it to contain business logic will just further complicate your project, and you will not reap the very few positive points of using SPs.

I would apply the same recommendations to SPs as much as possible, as I consider SPs to be code.
Business logic, in my opinion, belongs in a tier of the code base, not in SPs. To me, if the SPs hold the business logic, they know too much about what they are supposed to do. I think SPs, when used, should mainly be tasked with retrieving data and/or storing it. If business logic has been applied up the chain of command, in the code, then SPs would only be called once the business logic has been satisfied.
I doubt Adam Machanic, or most people, would advocate that long SPs that are hard to understand and maintain are a good thing.

How would you use EF in a typical Business Layer/Data Access Layer/Stored Procedures set up?

Whenever I watch a demo regarding the Entity Framework the demonstrator simply sets up some tables and performs Inserts, Updates and Deletes using automatically created code stubs but never shows any use of stored procedures. It seems to me that this is executing SQL from the client.
In my experience this is not particularly good practice, so I am presuming that my understanding of the Entity Framework is wrong.
Similarly, WCF RIA Services demos use EF and the demos are always the same. Can anyone shed any light on how you would use EF in a typical Business Layer/Data Access Layer/Stored Procedures setup?
I think I am confused and shouldn't be!!?

There's nothing wrong with executing SQL from the client. Most (if not all) of the problems that it might cause are in fact not there when using something like EF. For instance:
Client generated SQL might cause runtime syntax errors. This is unlikely, since the description of your query is mostly checked at compile time (assuming that the generator itself doesn't generate invalid SQL, which is also unlikely).
Client generated SQL might be inefficient. This is not true with modern database software, which has query caches. EF works in a way that's compatible with query caches, i.e. it generates the same SQL consistently (as long as you use the same code consistently) and uses parameters for varying data.
Client generated SQL might be insecure (SQL injections and whatnot). This is all handled by the generator, which uses parameters for your values and does not interpolate user input into the query itself.
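The last two points can be illustrated with a small sketch. This is not EF and not its real output; it is a hypothetical find_by generator written in Python against sqlite3, assuming a made-up users table. It shows the two properties described above: the same code yields the same SQL text every time (cache-friendly), and user input only ever travels as a parameter:

```python
import sqlite3

# Identifiers the generator may emit; they come from code, never from user input.
ALLOWED = {("users", "name"), ("users", "email")}

def find_by(table, column):
    # Hypothetical generator: returns SQL text with a placeholder for the value.
    if (table, column) not in ALLOWED:
        raise ValueError("unknown table/column")
    return f"SELECT id, name FROM {table} WHERE {column} = ?"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("alice", "a@example.com"))

# Same code path, same SQL text; only the bound parameter differs.
sql_first = find_by("users", "name")
sql_second = find_by("users", "name")
found = conn.execute(sql_first, ("alice",)).fetchall()
attack = conn.execute(sql_second, ("' OR '1'='1",)).fetchall()
```

Because the injection attempt is bound as a value, it is compared against the name column literally and matches nothing.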

Back in the old Client / Server days, it used to be considered good practice to do all db updates using stored procedures.
But now, it's perfectly acceptable to have an O/RM generate SQL and run directly against DB.

Well, part of the reason why executing sql in stored procedures is a good idea is that it gives you a level of abstraction - when db changes inevitably occur, you make a change in a single place (the proc) rather than a dozen places (all the places where you were calling the client sql). Entity Framework provides this layer of abstraction through the data model, and you have the same advantage.
There are some other reasons why you might want to look at procs, like security granularity (only allowing certain users the right to execute), and some minor performance differences. Ultimately, you have to decide for yourself what the right trade-off is. EF is an attempt to dramatically reduce the developer time spent creating a data layer, with the trade-offs listed above.

never shows any use of stored procedures
Take a look at this video: Using Your Own Stored Procedures to Insert, Update and Delete Entities in Entity Framework.
Note that there are a lot of other videos on that topic there that are certainly worth watching!

The legend is that Scott Hanselman once said "It's not a real demo unless someone drags a datagrid" (pg 478 Silverlight 4 In Action, Pete Brown)
You have to remember that demos are all about selling software, and not at all about communicating best practice. So your observations about the demos are absolutely correct: they cover the basics and leave it to the observer to fill in the blanks.
As to your comment about stored procedures, and the various answers about the generator: the generator is good, and getting better. However, there are certain circumstances when it will generate completely unusable queries. (see my SO question here and discussed on the ADO.NET team blog)
Therefore there are occasions when hand-crafted queries are your only recourse (by way of stored procs, table-valued functions, views, etc.)

relational_database vs config_file vs spreadsheet usage

I have heard some genuine arguments for the use of a relational database vs a spreadsheet before. A relational database provides fast reporting and (relatively speaking) reliable data warehousing, whereas spreadsheets are lightweight, quick to replicate, and easy to pass around the organization to different audiences. Although I see the advantages of each, I can rarely tell which is better in which scenario, and I always end up using a database.
In development, it's easy to forget to consider other options when one can place config settings in the database. I've run into quite a few apps where user menus, workflows and their order, and constants are defined at the database level. That would be good if these entities were subject to change by the end user from the application level, but that was not the case.
So, what's your take on the roles of databases, config files, and spread sheets?

The old adage is this.
When you use a spreadsheet to solve a problem, you now have two problems.
Database is for records of the business. Long-lasting. Permanent.
Other configuration files are for other configuration information -- not long-lasting business records. Current settings and what-not are not enduring business records, they're part of a specific software configuration that processes the business records.
Spreadsheets are -- well -- they are what they are. Too complex to be a simple, configuration file. Too simple to be a real database.
Since they're (almost) impossible to control, you need one standard, correct, idempotent result in the database. You should be able to rebuild spreadsheets from that controlled source.
Similarly, if you accept a spreadsheet for upload, you have to extract the data, and never refer back to the (almost uncontrollable) source document again.

For me, I want all of the core data to be stored in a database. Two reasons:
to allow ad hoc reporting access to the data
to allow applications to share data.
Databases should contain all of the domain data, and occasionally some on-the-fly data (user preferences for example). Relational databases are most popular, but for some apps there are other options.
The config file, on the other hand, should contain all of the 'parameters' you want to be able to change in the system: the ones that are not changed rapidly (on-the-fly). Config items can be changed, but not easily, and usually not from the interface. If it's a param that you only want the coder to possibly change, it should be right in the code (so no one else has access).
If you want to fiddle with data mining, provide some generic mechanism to download a CSV file with the results of a SQL query, directly into Excel. That way people can fiddle with pivot tables, without having to alter the application's schema.
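A rough sketch of such a generic mechanism, using only Python's standard library (sqlite3 and csv). The sales table and the query_to_csv helper are assumptions for the example; a real app would stream the result as a download and restrict who can run which queries:

```python
import csv
import io
import sqlite3

def query_to_csv(conn, sql, params=()):
    # Run a query and return its result as CSV text, with a header row
    # taken from the cursor's column names.
    cur = conn.execute(sql, params)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)
    return buf.getvalue()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 20), ("south", 5)],
)

csv_text = query_to_csv(
    conn,
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region",
)
```

The resulting text opens directly in Excel, so the pivot-table fiddling happens in a throwaway copy while the database stays the controlled source.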

Spreadsheets are documents, databases are repositories for information, configuration files store rules for how a specific instance of an application should behave. If you think of it that way, it's usually not hard to make a call.
