How can I prevent SQL injection with an arbitrary JSONB query string provided by an external client? - postgresql

I have a basic REST service backed by a PostgreSQL database with a table that has various columns, one of which is a JSONB column containing arbitrary data. Clients can store data by filling in the fixed columns and providing any JSON as opaque data, which is stored in the JSONB column.
I want to allow the client to query the database with constraints on both the fixed columns and the JSONB. It is easy to translate query parameters like ?field=value into a parameterized SQL query for the fixed columns, but I also want to append an arbitrary JSONB query to the SQL.
This JSONB query string could contain SQL injection; how can I prevent this? Because the structure of the JSONB data is arbitrary, I don't think I can use a parameterized query for this purpose. All the documentation I can find recommends parameterized queries, and I can't find any useful information on how to actually sanitize the query string itself, which seems like my only option.
For example a similar question is:
How to prevent SQL Injection in PostgreSQL JSON/JSONB field?
But I can't apply the same solution because I don't know the structure of the JSONB or of the query: I can't assume the client wants to query a particular path using a particular operator; the entire JSONB query needs to be freely provided by the client.
I'm using golang, in case there are any existing libraries or code fragments that I can use.
Edit: some example queries that the client might run against the JSONB:
(content->>'company') is NULL
(content->>'income')::numeric>80000
content->'company'->>'name'='EA' AND (content->>'income')::numeric>80000
content->'assets' @> '[{"kind":"car"}]'
(content->>'DOB')::TIMESTAMP<'2000-01-30T10:12:18.120Z'::TIMESTAMP
EXISTS (SELECT FROM jsonb_array_elements(content->'assets') asset WHERE (asset->>'value')::numeric > 100000)
Note that these don't cover all possible types of queries. Ideally I want any query that PostgreSQL supports on the JSONB data to be allowed. I just want to check the query to ensure it doesn't contain SQL injection. For example, a simplistic and probably inadequate solution would be to not allow any ";" in the query string.

You could allow the users to specify a path within the JSON document, and then parameterize that path within a call to a function like jsonb_extract_path_text (json_extract_path_text for plain json columns). That is, the WHERE clause would look like:
WHERE jsonb_extract_path_text(data, VARIADIC $1::text[]) = $2
The path argument is just the list of keys (and array indexes) to traverse down to the given value, easily parameterized as a single text array, e.g. '{foo,bars,0,name}'; the operator form data #>> $1 accepts the same array. The right-hand side of the clause would be parameterized along the same rules as you're using for fixed column filtering.
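Here is a minimal sketch of that approach in Go, since the question mentions golang. It assumes a hypothetical table mytable with an id column and the question's JSONB column content, and the github.com/lib/pq driver; the client-supplied path elements and comparison value are only ever bound as parameters, never spliced into the SQL text:

package jsonquery

import (
    "database/sql"

    "github.com/lib/pq" // registers the "postgres" driver and provides pq.Array
)

// FindByJSONPath returns the ids of rows whose JSONB "content" column holds the
// given value at the given path, e.g. path = []string{"company", "name"}, value = "EA".
// The (hypothetical) table and column names are fixed in the query text; only the
// client-supplied path and value are bound, so they cannot change the shape of the SQL.
func FindByJSONPath(db *sql.DB, path []string, value string) ([]int64, error) {
    // content #>> $1::text[] would be an equivalent operator form.
    const q = `SELECT id
                 FROM mytable
                WHERE jsonb_extract_path_text(content, VARIADIC $1::text[]) = $2`
    rows, err := db.Query(q, pq.Array(path), value)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var ids []int64
    for rows.Next() {
        var id int64
        if err := rows.Scan(&id); err != nil {
            return nil, err
        }
        ids = append(ids, id)
    }
    return ids, rows.Err()
}

The same pattern extends to the question's numeric and timestamp comparisons by casting the extracted text, though it only covers path-and-value style filters, not fully free-form predicates written by the client.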

Related

Redshift Spectrum table doesn't recognize array

I ran a crawler on a JSON S3 file to update an existing external table.
Once it finished, I checked SVL_S3LOG to see the structure of the external table and saw it was updated: I have a new column with the Array<int> type, as expected.
When I tried to execute select * on the external table I got this error: "Invalid operation: Nested tables do not support '*' in the SELECT clause.;"
So I tried to spell out the select statement with all column names:
select name, date, books.... (books is the Array<int> type)
from external_table_a1
and got this error:
Invalid operation: column "books" does not exist in external_table_a1;"
I have also checked the table external_table_a1 under "AWS Glue" and saw that the column "books" is recognized and has the type Array<int>.
Can someone explain why my simple query is wrong?
What am I missing?
Querying JSON data is a bit of a hassle with Redshift: when parsing is enabled (e.g. using the appropriate SerDe configuration) the JSON is stored as a SUPER type. In your case that's the Array<int>.
The AWS documentation on Querying semistructured data seems pretty straightforward, mentioning that PartiQL uses "dotted notation and array subscript for path navigation when accessing nested data". This didn't work for me, although I can't find any reason for that in their SUPER limitations documentation.
Solution 1
What I had to do was set the flags set json_serialization_enable to true; and set json_serialization_parse_nested_strings to true;, which serialize the SUPER type back to JSON. I can then use JSON functions to query the data. Unnesting gets even stranger, because you can only use the FROM-clause unnest syntax select item from table as t, t.items as item on SUPER types. I genuinely don't think this is the intended way to query and unnest SUPER objects, but it's the only approach that worked for me.
This is described in an older "Amazon Redshift Developer Guide".
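For reference, a minimal sketch of Solution 1 driven from Go (the language used elsewhere on this page). It assumes the question's external_table_a1 with a name column and the books SUPER array, that the cluster is reachable over the PostgreSQL wire protocol via github.com/lib/pq, and a placeholder DSN; the session flags must be set on the same connection that runs the query:

package main

import (
    "context"
    "database/sql"
    "fmt"
    "log"

    _ "github.com/lib/pq" // Redshift accepts PostgreSQL-protocol connections
)

func main() {
    db, err := sql.Open("postgres", "postgres://user:pass@my-cluster:5439/dev?sslmode=require")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    ctx := context.Background()

    // SET only affects the current session, so pin one connection from the pool;
    // otherwise the later query may run on a connection without the flags.
    conn, err := db.Conn(ctx)
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    // Solution 1: serialize SUPER values back to JSON in query results.
    if _, err := conn.ExecContext(ctx, "set json_serialization_enable to true"); err != nil {
        log.Fatal(err)
    }
    if _, err := conn.ExecContext(ctx, "set json_serialization_parse_nested_strings to true"); err != nil {
        log.Fatal(err)
    }

    // Unnest the SUPER array with the FROM-clause navigation syntax from the answer.
    rows, err := conn.QueryContext(ctx, "select t.name, b from external_table_a1 as t, t.books as b")
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()

    for rows.Next() {
        var name, book string // with serialization enabled the SUPER element arrives as JSON text
        if err := rows.Scan(&name, &book); err != nil {
            log.Fatal(err)
        }
        fmt.Println(name, book)
    }
    if err := rows.Err(); err != nil {
        log.Fatal(err)
    }
}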
Solution 2
When you write a query, Redshift will try to fit the output into one of the basic column data types. If the result of your query does not match any of those types, Redshift will not process the query. Hence, in order to convert a SUPER value to a compatible type you will have to unnest it (using the rather peculiar Redshift unnest syntax).
For me this works in certain cases, but I'm not always able to properly index arrays, nor can I access the array index (using the my_table.array_column as array_entry at array_index syntax).

test JOOQ postgres jsonb column for key exists

I've got a table TABLE that contains a jsonb column named tags. The tags element in each row may or may not contain a field called group. My goal is to group by tags.group for all rows where tags contains a group field. Like the following postgres query:
select tags->>'group' as group, sum(n) as sum
from TABLE
where tags ? 'group'
group by tags->>'group';
I'm trying to translate it into jOOQ and cannot figure out how to express the where tags ? 'group' condition.
For example,
val selectGroup = DSL.field("{0}->>'{1}'", String::class.java, TABLE.TAGS, "group")
dsl().select(selectGroup, DSL.sum(TABLE.N))
.from(TABLE)
.where(TABLE.TAGS.contains("group"))
.groupBy(selectGroup)
This is equivalent to testing the containment condition @> in postgres. But I need the key-exists condition ?. How can I express that in jOOQ?
There are two things worth mentioning here:
The ? operator in JDBC
Unfortunately, there's no good solution to this as ? is currently strictly limited to be used as a bind variable placeholder in the PostgreSQL JDBC driver. So, even if you could find a way to send that character to the server through jOOQ, the JDBC driver would still misinterpret it.
A workaround is documented in this Stack Overflow question.
Plain SQL and string literals
When you're using the plain SQL templating language in jOOQ, beware that there is a parser that will parse certain tokens of your string, including e.g. comments and string literals. This means that your usage of...
DSL.field("{0}->>'{1}'", String::class.java, TABLE.TAGS, "group")
is incorrect, as '{1}' will be parsed as a string literal and sent to the server as is. If you want to use a variable string literal, do this instead:
DSL.field("{0}->>{1}", String::class.java, TABLE.TAGS, DSL.inline("group"))
See also DSL.inline()

How do I prevent SQL injection if I want to build a query in parts within the fatfree framework?

I am using the fatfree framework, and on the front end I am using the jQuery DataTables plugin with server-side processing. My server-side controller may therefore receive a variable amount of information, for example a variable number of columns to sort on, a variable number of filtering options, and so forth. If I don't receive any request for sorting, I don't need an ORDER BY portion in my query. So I want to generate the query string in parts according to these conditions and join them at the end to get the final query for execution. But if I do it this way, I won't have any data sanitization, which is really bad.
Is there a way I can use the framework's internal sanitization methods to build the query string in parts? Also, is there a better/safer way to do this than how I am approaching it?
Just use parameterized queries. They are here to prevent SQL injection.
Two possible syntaxes are allowed:
with question mark placeholders:
$db->exec('SELECT * FROM mytable WHERE username=? AND category=?',
array(1=>'John',2=>34));
with named placeholders:
$db->exec('SELECT * FROM mytable WHERE username=:name AND category=:cat',
array(':name'=>'John',':cat'=>34));
EDIT:
Parameters are there to filter the field values, not the column names, so to answer your question more specifically:
you must pass filtering values through parameters to avoid SQL injection
you can check if column names are valid by testing them against an array
Here's a quick example:
$columns=array('category','age','weight');//columns available for filtering/sorting
$sql='SELECT * FROM mytable';
$params=array();
//filtering
$ctr=0;
if (isset($_GET['filter']) && is_array($_GET['filter']))
foreach($_GET['filter'] as $col=>$val)
if (in_array($col,$columns,TRUE)) {//test for column name validity
$sql.=($ctr?' AND ':' WHERE ')."$col=?";
$params[$ctr+1]=$val;
$ctr++;
}
//sorting
$ctr=0;
if (isset($_GET['sort']) && is_array($_GET['sort']))
foreach($_GET['sort'] as $col=>$asc)
if (in_array($col,$columns,TRUE)) {//test for column name validity
$sql.=($ctr?',':' ORDER BY ')."$col ".($asc?'ASC':'DESC');
$ctr++;
}
//execution
$db->exec($sql,$params);
NB: if column names contain weird characters or spaces, they must be quoted: $db->quote($col)

Dynamic WHERE Clause & SQL Injection

I need to create functionality for users to determine the WHERE criteria of a select - the criteria will be dynamic.
Is there a way I can achieve this without opening up my code to SQL injection?
I'm using C# / .NET Windows Application.
Using parameterized queries would go a long way toward protecting you from SQL injection attacks, because most bad things happen in the value portion of your WHERE conditions.
For example, given a condition a=="hello" && b=="WORLD", do this:
select a,b,c,d
from table
where a=@pa and b=@pb -- this is generated dynamically
Then, bind @pa="hello" and @pb="WORLD", and run your query.
In C#, you would start with an in-memory representation of your where clause in hand, go through it element-by-element, and produce two output objects:
A string with the where clause, where constants are replaced by automatically generated parameter references @pa, @pb, and so on (use your favorite naming scheme for these blind parameters: the actual names do not matter)
A dictionary of name-value pairs, where names correspond to the parameters that you've inserted in your where clause, and values that correspond to the constants that you pulled from the expression representation.
With these outputs in hand, you prepare your dynamic query using the string, add parameter values using the dictionary, and then execute the query against your RDBMS source.
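As a language-agnostic illustration, here is that two-output idea as a minimal sketch in Go (the language of the question at the top of this page). The Cond type, the column whitelist, and the $n placeholder style are assumptions; swap in whatever condition representation and parameter syntax your driver uses:

package dynwhere

import (
    "fmt"
    "strings"
)

// Cond is a hypothetical in-memory representation of one "column = value" leaf
// of the user's criteria.
type Cond struct {
    Column string
    Value  interface{}
}

// Only whitelisted column names may appear in the generated SQL text.
var allowedColumns = map[string]bool{"a": true, "b": true, "c": true, "d": true}

// BuildWhere walks the conditions and returns (1) a WHERE clause whose constants
// have been replaced by generated placeholders and (2) the values to bind, in order.
func BuildWhere(conds []Cond) (string, []interface{}, error) {
    var parts []string
    var args []interface{}
    for _, c := range conds {
        if !allowedColumns[c.Column] {
            return "", nil, fmt.Errorf("unknown column %q", c.Column)
        }
        args = append(args, c.Value)
        parts = append(parts, fmt.Sprintf("%s = $%d", c.Column, len(args)))
    }
    if len(parts) == 0 {
        return "", nil, nil
    }
    return "WHERE " + strings.Join(parts, " AND "), args, nil
}

For the example above, BuildWhere([]Cond{{"a", "hello"}, {"b", "WORLD"}}) yields "WHERE a = $1 AND b = $2" plus the argument list ("hello", "WORLD"), which you then pass to your query API alongside the SQL string.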
DO NOT DO THIS
select a,b,c,d
from table
where a='hello' and b='WORLD' -- This dynamic query is ripe for an injection attack
Ah, two phases. Given that your column names and operators are not direct user input (e.g. picked from a list or a radio group), then:
String WhereClause = String.Format("Where {0} {1} @{0}", "Customer", "=");
So now you have "Where Customer = @Customer".
Then you can add a parameter Customer and set it from the user input.
There are a few ways to attack this; it depends on how complex your criteria could be, though.

Using table names as parameters in t-sql (eg from #tblname)

Is it possible to use the name of a table as a parameter in t-sql?
I want to insert data into a table, but I want one method in C# which has a parameter for the table.
Is this a good approach? I think if I have one form and I am choosing the table and fields to insert data into, I am essentially looking to write my own dynamic sql query built on the fly. This is another thing altogether which I am sure has its catches?
Thanks
Not directly. The only way to do this is through dynamic SQL: either EXEC or sp_executesql. The latter has the advantage of query plan caching and re-use, and of avoiding injection via parameters for the values, but you will have to concatenate the table name itself into the query (you can't parameterize it), so be sure to whitelist it against a list of known-good table names.
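A minimal sketch of that advice in Go (the language of the question at the top of this page), assuming the github.com/denisenkom/go-mssqldb driver, where positional arguments bind to @p1, @p2, and so on; the table names and the Name column are hypothetical. The table name is validated against a whitelist and then concatenated, while the value stays a parameter:

package dyninsert

import (
    "database/sql"
    "fmt"

    _ "github.com/denisenkom/go-mssqldb" // SQL Server driver
)

// Only these (hypothetical) table names may ever be concatenated into the SQL text.
var allowedTables = map[string]bool{
    "Customers": true,
    "Orders":    true,
}

// InsertName inserts a single value into one of the whitelisted tables.
// The table name cannot be a bind parameter, so it is whitelisted and then
// spliced into the statement; the value itself is always bound.
func InsertName(db *sql.DB, table, name string) error {
    if !allowedTables[table] {
        return fmt.Errorf("table %q is not allowed", table)
    }
    query := fmt.Sprintf("INSERT INTO %s (Name) VALUES (@p1)", table)
    _, err := db.Exec(query, name)
    return err
}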