erlang: how to protect database against SQL injection inside json field

I am using the epgsql library (https://github.com/epgsql/epgsql), which allows writing queries like this:
"INSERT INTO my_table"
"(item_id, json_data) "
"VALUES ($1, $2) "
"ON CONFLICT "
"DO NOTHING "
"RETURNING *;",
I then call such queries with different parameters. In general, we expect a wide range of incoming data without any predefined format. The only thing we expect is that it is JSON of the following form:
{"field_name": "long value from the user with a potential injection of the SQL code I need to be protected from"}
My question is how to protect the query against malicious input, e.g. someone entering something like ; DROP table ... --- or anything similar.

When you provide a service that runs DB queries, cleaning up each call is really hard.
Only one option comes to my mind:
Normalise the incoming command (convert it to upper case, collapse double spaces, split it into lines on ";", etc.), then do a dictionary search for text such as "DROP TABLE…" or "DELETE FROM…" to mark the command as dangerous and prevent execution. Other cases can be added too, but this creates a knowledge base that has to be maintained.

I think you have to write it yourself, using implementations in other languages as a reference.
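For comparison, here is a minimal sketch of what placeholders such as $1 and $2 in the question's query do, assuming the driver sends the values as bound parameters. It uses Python's sqlite3 module as a stand-in for epgsql and PostgreSQL (an assumption made only so the sketch runs without a server); the table and field names are taken from the question.

```python
import json
import sqlite3

# Stand-in for the PostgreSQL table from the question (assumption: SQLite
# is used here only so the sketch runs without a database server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (item_id INTEGER PRIMARY KEY, json_data TEXT)")

# Hostile-looking user input wrapped in the JSON shape from the question.
payload = json.dumps({"field_name": "x'); DROP TABLE my_table; --"})

# The value is bound to a placeholder, so it travels as data and is
# never parsed as part of the SQL statement.
conn.execute("INSERT INTO my_table (item_id, json_data) VALUES (?, ?)", (1, payload))

stored = conn.execute("SELECT json_data FROM my_table WHERE item_id = 1").fetchone()[0]
table_count = conn.execute(
    "SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = 'my_table'"
).fetchone()[0]
print(json.loads(stored)["field_name"])
print(table_count)
```

The hostile text comes back unchanged as a stored value and the table still exists, which suggests that keyword blacklists are not needed for values passed this way.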

Related

Truncating a mergefield in Microsoft Word for mailmerge

I'm looking to create a formula and apply it to a mergefield in Word 2013. I have no access to the db and am only given merge fields. My goal is to truncate/shorten certain mergefields, e.g. Full Name to just an initial, Age [27 years] to a short age [27].
Access and Excel have the function 'left', which I've tried to use with no success. There seem to be a lot more options available for numbers.
{=left({ MERGEFIELD First_Name },1)}
However this gives a syntax error. Is there a list of formulas that work for mergefield?
Outcome
'Steven' -> 'S'
'27 Years' -> '27'

The short answer to your question is that there is nothing in the Word field language that can reliably do string manipulation such as left(), mid() and so on. The {=} field only has numeric functions such as SUM, ABS, PRODUCT and so on. There are unreliable approaches, and in some cases they may be reliable enough for your requirement, but that really depends on how sure you can be that the data source will always contain values formatted as you expect.
As a simple example, let's take the "27 years" thing.
If every value in the relevant data source column is in the same general format, which I will describe as "something that Word recognises as an individual number, followed by an alpha string", then you can in fact use
{ SET dat { MERGEFIELD age } }{ =dat }
Notice that in that case, if you are merging to a new document, { =dat } fields will remain in the output, and updating those fields will cause errors. You can avoid that by nesting either the { =dat } field or all the fields in a QUOTE field:
{ QUOTE "{ SET dat { MERGEFIELD age } }{ =dat }" }
{ SET dat { MERGEFIELD age } }{ QUOTE { =dat } }
However, if your data source field could contain a value such as
4 years 2 months
then this will not work because in that case { =dat } will evaluate to 6, not 4. Word will also evaluate anything that looks like an { = } field expression, e.g. if your data source contains
SUM(23,25)
then { =dat } will evaluate to 48. There are further oddities that I will not describe now.
The simplest unreliable approach to extracting the first letter from a field is to use a large number of IF fields to test for every possible initial letter, e.g.
{ IF "{ MERGEFIELD First_Name }" = "A*" "A" }{ IF "{ MERGEFIELD First_Name }" = "B*" "B" } etc.
If you don't need to distinguish between lower and upper case you can use
{ IF "{ MERGEFIELD First_Name \*Upper }" = "A*" "A" } etc.
That's OK if you know (for example) that the names can only start with A-Z,a-z (and you could obviously test for 0-9 etc. as well). But what if a name could start with any Unicode letter? I am not sure that inserting thousands of IF fields is a reliable approach.
There is a similarly "unreliable" - and resource-consuming - way to use functions such as left, mid etc., as long as you are using recent versions of Windows Word (not Mac Word).
What you can do is create a completely empty Access/Jet database .mdb (let's say it is at c:\i\i.mdb), then insert a DATABASE field nested in a QUOTE field like this
{ QUOTE { DATABASE \d "c:\\i\\i.mdb" \s "SELECT left('{ MERGEFIELD First_Name }',1)" } }
Normally, a DATABASE field inserts a Word table (unless the data source has more columns than a Word table can contain), but when you only insert a single value with no headings, Word does not put the value in a cell. Unfortunately, these days Word does add a paragraph mark, but nesting the DATABASE field inside a QUOTE field appears to remove that again.
So why is that "unreliable"? Well, the main reason is that if the First_Name field contains any quotation marks (certainly single quotation marks, and OTTOMH I think double quotation marks), then the query that Word sends to Jet will look like this
SELECT left('a name containing a ' mark',1)
and Jet will return a syntax error.
There are other problems with the DATABASE field approach, including:
- Word restricts the SELECT statement to 255 characters (I think). If your data source field causes the SELECT statement length to exceed that, Jet will return an error.
- You have to put the database somewhere. If you are just using this merge yourself, that may not be a problem, but if you have to distribute the Word document etc. for others to use, you also have to ensure they have the .mdb and that it's at the specified location.
- Word sometimes gets confused between a Mail Merge data source and a data source introduced via a DATABASE field.
- Even one DATABASE field will execute a query for every record in the data source. If you use this technique in several places, a very large number of queries will be issued. That could cause problems.
As far as "single letter extraction" is concerned, there is another approach, rather similar to the DATABASE one, that uses an external .XML file and a set of INCLUDETEXT fields to specify a node in the file and return its content. But there are also similar difficulties. I may modify this Answer to describe that approach at some point, but as far as I know it has never been used in a real-world scenario.
So what if you need something more reliable? Well, there are several approaches, but all of them suffer from shortcomings of one kind or another. The main approaches I know are:
1. Use Word VBA and the OpenDataSource method to open the data source. That allows you to specify a query in the SQL dialect understood by the data source.
2. Use a Query/View defined in an intermediate database to extract the data items you need, and use that Query/View as your data source.
3. Use Word VBA's MailMerge Events to manipulate the data for each record in the data source as Word processes the mailmerge.
4. Use a manual intermediate step.
5. (More drastic) ditch Word MailMerge and find another approach altogether, e.g. create a .docx using .NET, the relevant database provider, and the Office Open XML SDK.
If you are creating this merge for use by other people, two side-effects of all those approaches are that the overall process becomes more complicated or unfamiliar for the user, and in particular, they may not be able to use Word's facilities for data source record filtering and so on. Another issue that some people encounter is that if your database contains long text fields/memo fields longer than 255 characters, they have a tendency to be truncated by Jet whenever you do something much more complicated than the default "SELECT * FROM TABLE".
(1) requires that you can write a suitable query to get the columns you need from your data source. Because the query is executed using OLE DB you don't actually need to create any permanent objects in your database. So it may be a viable approach as long as the backend database allows you to execute external queries. But Word also imposes a 255 or 511 character limit on the query, so if you have to manipulate a lot of fields or the functions you need are complicated, you may find that you exceed the character limit quite quickly.
(2) is rather similar to (1) but may allow you to specify a much more complex query. For example, if your data source is a Jet .accdb, you may be able to create your own .accdb and define a query in it that accesses the tables in the .accdb that you are not allowed to modify. You might either use "linked tables" to achieve that, or in certain cases you can specify the locations of the underlying tables/queries in the SQL.
(3) means that you use VBA to intercept Word as it processes each data source record. I leave you to research that. You have to control the process from VBA to ensure that the MailMerge events are invoked. There have been reports of various unreliabilities. VBA can only access the first 255 characters of any memo fields.
(4) means, for example, that you create an Excel workbook and use it to query the database. In that case you may be able to issue a much longer SQL query than you can in Word, and you may be able to create new Excel columns that manipulate the data using Excel formulas. (I have never tried that, though.) Then use that workbook as your data source.
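As a rough illustration of an intermediate step like (4), the shortening could also be done by a small script before the merge, rather than in Excel. This is only a sketch in Python: the field names First_Name and Age follow the question, while Initial and Short_Age are hypothetical names for the added columns.

```python
import re

def shorten_record(record):
    """Return a copy of the record with two extra, pre-shortened fields."""
    out = dict(record)
    # First character of the name, or an empty string if the name is blank.
    out["Initial"] = record["First_Name"].strip()[:1]
    # Leading run of digits in the age text, e.g. "27 Years" -> "27".
    match = re.match(r"\s*(\d+)", record["Age"])
    out["Short_Age"] = match.group(1) if match else ""
    return out

print(shorten_record({"First_Name": "Steven", "Age": "27 Years"}))
```

Unlike the { =dat } trick, a value such as "4 years 2 months" yields "4" here, because only the leading digits are taken.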
Finally, a web search should reveal a list of functions recognised by Word's "=" field, but recent Microsoft documentation tends to omit the IF() function. The ISO29500 documents on the .docx standard omit it as well, but I think that was not the intention and may be fixed in a future version of the standard. The functions are:
ABS, AND, AVERAGE, COUNT, DEFINED, FALSE, IF, INT, MIN, MAX, MOD, NOT, OR, PRODUCT, ROUND, SUM, TRUE.

Is there any logical reason to use CFQUERYPARAM in Query of Queries?

I primarily use CFQUERYPARAM to prevent SQL injection. Since Query-of-Queries (QoQ) does not touch the database, is there any logical reason to use CFQUERYPARAM in them? I know that values that do not match the cfsqltype and maxlength will throw an exception, but these values should already have been validated before that point, with friendly messages displayed (from a UX viewpoint).

"Since Query-of-Queries (QoQ) does not touch the database, is there any logical reason to use CFQUERYPARAM in them?"
Actually, it does touch the database: the database that you currently have stored in memory. The data in that database could still theoretically be tampered with via some sort of injection from the user. Does that affect your physical database? No. Does that affect the use of the data within your application? Yes.
You did not give any specific details but I would err on the side of caution. If ANY of the data you are using to build your query comes from the client then use cfqueryparam in them. If you can guarantee that none of the elements in your query comes from the client then I think it would be okay to not use the cfqueryparam.
As an aside, using cfqueryparam also helps optimize the query for the database although I'm not sure if that is true for query of queries. It also escapes characters for you like apostrophes.

Here is a situation where it's simpler, in my opinion.
<cfquery name="NoVisit" dbtype="query">
select chart_no, patient_name, treatment_date, pr, BillingCompareField
from BillingData
where BillingCompareField not in
(<cfqueryparam cfsqltype="cf_sql_varchar"
value="#ValueList(FinalData.FinalCompareField)#" list="yes">)
</cfquery>
The alternative would be to use QuotedValueList. However, if anything in that value list contains an apostrophe, cfqueryparam will escape it. Otherwise I would have to do that myself.
Edit starts here
Here is another example where not using query parameters causes an error.
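<cfscript>
// Assumed setup: the original snippet omitted the creation of x.
x = QueryNew("dt", "date");
QueryAddRow(x,2);
QuerySetCell(x,"dt",CreateDate(2001,1,1),1);
QuerySetCell(x,"dt",CreateDate(2001,1,11),2);
</cfscript>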
<cfquery name="y" dbtype="query">
select * from x
<!---
where dt in (<cfqueryparam cfsqltype="cf_sql_date" value="#ValueList(x.dt)#" list="yes">)
--->
where dt in (#ValueList(x.dt)#)
</cfquery>
The code as written throws this error:
Query Of Queries runtime error.
Comparison exception while executing IN.
Unsupported Type Comparison Exception:
The IN operator does not support comparison between the following types:
Left hand side expression type = "DATE".
Right hand side expression type = "LONG".
With the query parameter, commented out above, the code executes successfully.

SQL injection? CHAR(45,120,49,45,81,45)

I just saw this come up in our request logs. What were they trying to achieve?
The full request string is:
properties?page=2side1111111111111 UNION SELECT CHAR(45,120,49,45,81,45),CHAR(45,120,50,45,81,45),CHAR(45,120,51,45,81,45),CHAR(45,120,52,45,81,45),CHAR(45,120,53,45,81,45),CHAR(45,120,54,45,81,45),CHAR(45,120,55,45,81,45),CHAR(45,120,56,45,81,45),CHAR(45,120,57,45,81,45),CHAR(45,120,49,48,45,81,45),CHAR(45,120,49,49,45,81,45),CHAR(45,120,49,50,45,81,45),CHAR(45,120,49,51,45,81,45),CHAR(45,120,49,52,45,81,45),CHAR(45,120,49,53,45,81,45),CHAR(45,120,49,54,45,81,45) -- /*
Edit: As a google search didn't return anything useful I wanted to ask the question for people who encounter the same thing.

This is just a test for injection. If an attacker can see xQs in the output then they'll know injection is possible.
There is no "risk" from this particular query.
A developer need not pay attention to particular injection mechanisms, formats or meanings; these are none of his business.
There is only one cause for all the infinite number of injections: an improperly formatted query. As long as your queries are properly formatted, SQL injections are not possible. Focus on your queries rather than on methods of SQL injection.
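To see which markers the attacker was hoping to find in the output, the CHAR() arguments can be decoded as character codes. A small Python sketch, using only the first three columns of the logged request (the function name is just for illustration):

```python
import re

# Fragment of the logged request string, shortened to its first three columns.
payload = (
    "UNION SELECT CHAR(45,120,49,45,81,45),"
    "CHAR(45,120,50,45,81,45),CHAR(45,120,51,45,81,45)"
)

def decode_char_calls(sql):
    """Turn every CHAR(n1,n2,...) call into the string it would produce."""
    decoded = []
    for args in re.findall(r"CHAR\(([\d,]+)\)", sql):
        decoded.append("".join(chr(int(n)) for n in args.split(",")))
    return decoded

print(decode_char_calls(payload))  # ['-x1-Q-', '-x2-Q-', '-x3-Q-']
```

Each column carries its own number, so whichever marker shows up on the page tells the attacker which column of the UNION is reflected in the output.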

The Char() function interprets each value as an integer and returns a string made up of the characters given by the code values of those integers. With Char(), NULL values are skipped. The function is used within Microsoft SQL Server, Sybase, and MySQL, while CHR() is used by other RDBMSs such as Oracle and PostgreSQL.
SQL's Char() function comes in handy when (for example) addslashes() for PHP is used as a precautionary measure within the SQL query. Using Char() removes the need for quotation marks within the injected query.
An example of some PHP code vulnerable to an SQL injection using Char() would look similar to the following:
$id = addslashes( $_GET['id'] );
$query = 'SELECT username FROM users WHERE id = ' . $id;
While addslashes() has been used, the script fails to properly sanitize the input, as the value is not enclosed in quotation marks in the query. This could be exploited using the following SQL injection string to load the /etc/passwd file:
Source: http://hakipedia.com/index.php/SQL_Injection#Char.28.29

Relation does not exist

I have just connected PowerBuilder with PostgreSQL through ODBC, but something goes wrong when I try to create a DataWindow! I can't understand where the problem is. I would be grateful for any answers.
The error:
Cannot create DataWindow
SQLSTATE=42P01
ERROR:relation "core sample" does not exist;
No query has been executed with that handle
SELECT CORE_SAMPLE.N_CORE, CORE_SAMPLE.DEPTH,
CORE_SAMPLE.WELL_ID_WELL, CORE_SAMPLE.ID_CORE FROM
CORE_SAM'

Obviously, there is a mixup with names. "core sample" is not the same as CORE_SAMPLE. Hard to say more, based on what little information we have here.
Unquoted identifiers are folded to lower case in PostgreSQL, so CORE_SAMPLE, Core_Sample and core_sample all end up identical.
But once you enclose identifiers in double quotes, the name is preserved as is. This way you can have otherwise illegal characters like a space in the name: "core sample". My standing advice is to stay away from that and use legal, lower case identifiers exclusively with PostgreSQL.
The error message tells you there is no table named "core sample", at least not in the database you connected to in any of the schemas listed in the search_path.
But the displayed query refers to a table named CORE_SAMPLE which does not match this error message.
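The folding rule described above can be approximated in a few lines. This is a simplified Python sketch for illustration only, not PostgreSQL's actual parser, and fold_identifier is a made-up name:

```python
def fold_identifier(ident):
    """Approximate how PostgreSQL resolves an identifier as written in SQL."""
    if len(ident) >= 2 and ident.startswith('"') and ident.endswith('"'):
        # Quoted: case and spaces are preserved; doubled quotes are un-doubled.
        return ident[1:-1].replace('""', '"')
    # Unquoted: folded to lower case.
    return ident.lower()

print(fold_identifier("CORE_SAMPLE"))
print(fold_identifier('"core sample"'))
```

The two results differ (an underscore versus a space), which is the mismatch between the displayed query and the error message.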

Parameterized SQL Columns?

I have some code which uses parameterized queries to protect against injection, but I also need to be able to construct the query dynamically regardless of the structure of the table. What is the proper way to do this?
Here's an example: say I have a table with columns Name, Address, Telephone. I have a web page where I run Show Columns and populate a select drop-down with them as options.
Next, I have a textbox called Search. This textbox is used as the parameter.
Currently my code looks something like this:
result = pquery('SELECT * FROM contacts WHERE `' + escape(column) + '`=?', search);
I get an icky feeling from it though. The reason I'm using parameterized queries is to avoid using escape. Also, escape is likely not designed for escaping column names.
How can I make sure this works the way I intend?
Edit:
The reason I require dynamic queries is that the schema is user-configurable, and I will not be around to fix anything hard-coded.

Instead of passing the column names, just pass an identifier that your code will translate to a column name using a hardcoded table. This means you don't need to worry about malicious data being passed, since all the data is either translated legally, or is known to be invalid. Pseudo-ish code:
# $param is the column index submitted by the form;
# run_sql() stands for whatever parameterized query helper you use.
my @columns = qw/Name Address Telephone/;
my $query;
if (defined $columns[$param]) {
    $query = "select * from contacts where $columns[$param] = ?";
} else {
    die "Invalid column!";
}
run_sql($query, $search);

The trick is to be confident in your escaping and validating routines. I use my own SQL escape function that is overloaded for literals of different types. Nowhere do I insert expressions (as opposed to quoted literal values) directly from user input.
Still, it can be done. I recommend a separate, strict function for validating the column name. Allow it to accept only a single identifier, something like
/^\w[\w\d_]*$/
You'll have to rely on assumptions you can make about your own column names.

I use ADO.NET with SQL Commands and SQLParameters on those commands, which take care of the escape problem. So if you are in a Microsoft tool environment as well, I can say that I use this very successfully to build dynamic SQL and yet protect my parameters.
best of luck

Choose the column based on the results of another query against a table that enumerates the possible schema values. In that second query you can hardcode the select on the column name that is used to define the schema. If no rows are returned, then the entered column is invalid.
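A minimal sketch of that idea, assuming SQLite via Python's sqlite3 module so that it runs standalone (the question's Show Columns suggests MySQL, where the lookup query would differ). The contacts table follows the question; search_contacts is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (Name TEXT, Address TEXT, Telephone TEXT)")
conn.execute("INSERT INTO contacts VALUES ('Ann', '1 Main St', '555-0100')")

def search_contacts(conn, column, search):
    # Ask the database which columns exist; row[1] is the column name.
    valid = {row[1] for row in conn.execute("PRAGMA table_info(contacts)")}
    if column not in valid:
        raise ValueError(f"Invalid column: {column!r}")
    # The name is now known to be a real column; quote it anyway in case
    # it contains spaces or quotes. The search text stays a bound parameter.
    quoted = '"' + column.replace('"', '""') + '"'
    sql = f"SELECT * FROM contacts WHERE {quoted} = ?"
    return conn.execute(sql, (search,)).fetchall()

print(search_contacts(conn, "Name", "Ann"))
```

Because the list of valid names is read from the schema at run time, nothing needs to be hard-coded when the schema changes.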

In standard SQL, you enclose delimited identifiers in double quotes. This means that:
SELECT * FROM "SomeTable" WHERE "SomeColumn" = ?
will select from a table called SomeTable with the shown capitalization (not a case-converted version of the name), and will apply a condition to a column called SomeColumn with the shown capitalization.
Of itself, that's not very helpful, but...if you can apply the escape() technique with double quotes to the names entered via your web form, then you can build up your query reasonably confidently.
Of course, you said you wanted to avoid using escape - and indeed you don't have to use it on the parameters where you provide the ? place-holders. But where you are putting user-provided data into the query, you need to protect yourself from malicious people.
Different DBMS have different ways of providing delimited identifiers. MS SQL Server, for instance, seems to use square brackets [SomeTable] instead of double quotes.
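For the double-quote style, the escaping amounts to doubling any embedded double quote before wrapping the name. A sketch in Python, with quote_identifier as a hypothetical helper name; backtick or square-bracket dialects would need their own rule:

```python
def quote_identifier(name):
    """Wrap a name in double quotes, doubling any embedded double quote."""
    if "\x00" in name:
        raise ValueError("identifier must not contain NUL")
    return '"' + name.replace('"', '""') + '"'

print(quote_identifier("SomeColumn"))
print(quote_identifier('evil" OR 1=1 --'))
```

The second call stays a single (nonexistent) column name rather than breaking out of the quotes, so the query fails with an unknown-column error instead of executing injected SQL.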

Column names in some databases can contain spaces, which means you'd have to quote the column name, but if your database contains no such columns, just run the column name through a regular expression or some sort of check before splicing it into the SQL:
if ( $column !~ /^\w+$/ ) {
die "Bad column name [$column]";
}
