How to properly parameterize my PostgreSQL query

I'm trying to parameterize my PostgreSQL query in order to prevent SQL injection in my Ruby on Rails application. The SQL query will sum a different value in my table depending on the input.
Here is a simplified version of my function:
def self.calculate_value(value)
  calculated_value = ""
  if value == "quantity"
    calculated_value = "COALESCE(sum(amount), 0)"
  elsif value == "retail"
    calculated_value = "COALESCE(sum(amount * price), 0)"
  elsif value == "wholesale"
    calculated_value = "COALESCE(sum(amount * cost), 0)"
  end

  query = <<-SQL
    select CAST(? AS DOUBLE PRECISION) as ? from table1
  SQL

  return Table1.find_by_sql([query, calculated_value, value])
end
If I call calculate_value("retail"), it will execute the query like this:
select location, CAST('COALESCE(sum(amount * price), 0)' AS DOUBLE PRECISION) as 'retail' from table1 group by location
This results in an error. I want it to execute without the quotes like this:
select location, CAST(COALESCE(sum(amount * price), 0) AS DOUBLE PRECISION) as retail from table1 group by location
I understand that the added quoting is what prevents SQL injection, but how do I avoid the quotes in this case? What is the best way to handle this scenario?
NOTE: This is a simplified version of the queries I'll be writing and I'll want to use find_by_sql.

A prepared statement cannot change the query structure: table or column names, ORDER BY clauses, function names, and so on. Only literal values can be passed as parameters.
Where is the SQL injection here? You are not putting a user-supplied value into the query text. Instead, you check the given value against an allowed list and use only SQL fragments you wrote yourself. In that case there is no danger of SQL injection.
I also want to link to this article. It is safe to build a query text dynamically as long as you control every part of that query, and it is much better for the RDBMS than putting clever conditional logic inside the query itself.
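Applied to the function in the question, that whitelist approach might look roughly like this (a minimal sketch; the column names and the Table1 model come from the question, everything else is illustrative):

CALCULATIONS = {
  "quantity"  => "COALESCE(sum(amount), 0)",
  "retail"    => "COALESCE(sum(amount * price), 0)",
  "wholesale" => "COALESCE(sum(amount * cost), 0)"
}.freeze

def self.calculate_value(value)
  # Reject anything that is not one of our own, hand-written SQL fragments.
  expression = CALCULATIONS.fetch(value) do
    raise ArgumentError, "unknown calculation: #{value.inspect}"
  end

  # Both interpolated pieces are developer-controlled constants at this point,
  # so building the query text dynamically is safe here.
  query = <<-SQL
    SELECT location,
           CAST(#{expression} AS DOUBLE PRECISION) AS #{value}
    FROM table1
    GROUP BY location
  SQL

  Table1.find_by_sql(query)
end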

Related

In Postgres, how can I efficiently filter using the inner numbers of this jsonb structure?

So I work with PostgreSQL, and I have a jsonb column with the following structure:
{
  "Store1": [
    {
      "price": 5.99,
      "seller": "seller"
    },
    {
      "price": 56.43,
      "seller": "seller"
    }
  ],
  "Store2": [
    {
      "price": 45.65,
      "seller": "seller"
    },
    {
      "price": 44.66,
      "seller": "seller"
    }
  ]
}
I have a jsonb like this for every product in the database. I want to run an SQL query that will answer the following question:
For each product, is one of the prices in this JSON bigger than / equal to / smaller than X?
Basically, filter the products to include only the ones that have at least one price satisfying a mathematical condition.
How can I do it efficiently? What's the best way in Postgres to iterate over a JSON like this, with a relatively complex inner structure?
Also, if I could control the way the data is structured (to an extent, I can), what changes can I do to make this query more efficient?
Thanks!
Use a json path expression:
WHERE col @@ '$.*[*].price < 20'
or
WHERE col @? '$.*[*] ? (@.price < 20)'
If you need to compare to another column or make the query parameterised, you can either build the jsonpath dynamically
WHERE col @@ format('$.*[*].price < %s', $1)::jsonpath
WHERE col @? format('$.*[*] ? (@.price < %s)', $1)::jsonpath
or you can use the respective function and pass variables as an object:
WHERE jsonb_path_match(col, '$.*[*].price < $limit', jsonb_build_object('limit', $1))
WHERE jsonb_path_exists(col, '$.*[*] ? (@.price < $limit)', jsonb_build_object('limit', $1))
I admit I had to check my cheat sheet to figure out the right combination of operator and expression. Takeaways:
if a comparison operator needs to work with multiple values, it generally functions as an ANY
@@ does not work with ? (@ …) filter expressions since they don't return a boolean,
@? does not work with predicates since those always return a value (even if it's false)
What changes can I do to make this query more efficient?
As @jjanes commented on my other answer, the jsonpath match col @@ '$.*[*].price < $limit' isn't going to be fast and needs a full table scan, at least for < and >. To make a useful index, a different approach is required: an index can only compare against a single value, not an arbitrary number of them. For that, we need to change the condition from EXISTS(SELECT prices_of(col) WHERE price < $limit) into (SELECT MIN(prices_of(col))) < $limit.
With this idea it is possible to build an expression index on the result of a custom immutable function:
CREATE FUNCTION min_price(data jsonb) RETURNS float
  LANGUAGE SQL
  IMMUTABLE
  RETURNS NULL ON NULL INPUT
RETURN (
  SELECT min((offer ->> 'price')::float)
  FROM jsonb_each(data) AS entries(name, store),
       LATERAL jsonb_array_elements(store) AS elements(offer)
);
CREATE INDEX example_min_data_price_idx ON example (min_price(data));
which you can use as
SELECT * FROM example WHERE min_price(data) < 20;
Looking for rows with a price larger than a certain number requires a separate index on max_price(data). If you want to use the index in a JOIN with more conditions, consider making it a multi-column index.
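For completeness, the max_price counterpart would mirror the function above (sketch only; same example table as before):

CREATE FUNCTION max_price(data jsonb) RETURNS float
  LANGUAGE SQL
  IMMUTABLE
  RETURNS NULL ON NULL INPUT
RETURN (
  SELECT max((offer ->> 'price')::float)
  FROM jsonb_each(data) AS entries(name, store),
       LATERAL jsonb_array_elements(store) AS elements(offer)
);

CREATE INDEX example_max_data_price_idx ON example (max_price(data));

-- "at least one price above X" becomes an indexable comparison
SELECT * FROM example WHERE max_price(data) > 20;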
Looking for row with a price equalling a certain number can be optimised by indexing the jsonb column and using a jsonpath:
CREATE INDEX example_data_idx ON example USING GIN (data jsonb_ops);
SELECT * FROM example WHERE data @@ '$.*[*].price == 20';
SELECT * FROM example WHERE data @? '$.*[*] ? (@.price == 20)';
Unfortunately you can't use jsonb_path_ops here since that doesn't support the wildcard.

How to format a number in Entity Framework LINQ (without trailing zeroes)?

In a SQL Server database I have a column of decimal datatype defined something like this:
CREATE TABLE MyTable
(
    Id INT,
    Number DECIMAL(9, 4)
)
I use Entity Framework and I would like to return the Number column converted to a string with only the digits to the right of the decimal separator that are actually needed. A strict constraint is that the result must be an IQueryable.
So my query is:
IQueryable queryable = (
    from myTable in MyDatabase.NyTable
    select new
    {
        Id = myTable.Id,
        Number = SqlFunctions.StringConvert(myTable.Number, 9, 4)
    }
);
The problem is that it always converts the number to a string with 4 decimals, even if they are 0.
Examples:
3 is converted to "3.0000"
1.2 is converted to "1.2000"
If I use other parameters for StringConvert i.e.
SqlFunctions.StringConvert(myTable.Number, 9, 2)
the results are also not OK:
0.375 gets rounded to 0.38.
The StringConvert() function is translated into the SQL Server STR function.
https://learn.microsoft.com/en-us/sql/t-sql/functions/str-transact-sql?view=sql-server-2017
This explains the weird results.
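A quick check of STR() directly in SQL Server illustrates the behaviour (a sketch; results are left-padded to the given length, 9 here):

SELECT STR(3, 9, 4);      -- '   3.0000'  (always 4 decimals)
SELECT STR(1.2, 9, 4);    -- '   1.2000'
SELECT STR(0.375, 9, 2);  -- '     0.38'  (rounded to 2 decimals)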
In the realm of Entity Framework and LINQ I was not able to find a working solution.
What I am looking for is something like the C# call
String.Format("{0:0.####}", number)
but this cannot be used in a LINQ query.
In plain simple SQL I could write my query like this
SELECT
Id,
Number = CAST(CAST(Number AS REAL) AS VARCHAR(15))
FROM
MyTable
I have not managed to massage LINQ into producing a query like that.
A workaround would be to forget doing this in LINQ (which is quite inflexible and messy here, borderline useless), just return the DECIMAL from the database, and do the formatting on the client side before displaying. But that is additional, unnecessary code, and I would hate to do it that way if there is a simpler way via LINQ.
Is it possible to format numbers in LINQ queries?
I would absolutely return a decimal from the database and format it when needed, possibly directly after the query. But usually this is done at display time, to take into account culture-specific formatting on the client.
var q =
    (from myTable in MyDatabase.NyTable
     select new
     {
         Id = myTable.Id,
         Number = myTable.Number
     })
    .AsEnumerable()
    .Select(x => new { Id = x.Id, Number = x.Number.ToString("G29") });

Prepare dynamic case statement using PostgreSQL 9.3

I have the following CASE statement that I need to prepare dynamically, as shown below:
Example:
I have the case statement:
case
  when cola between '2001-01-01' and '2001-01-05' then 'G1'
  when cola between '2001-01-10' and '2001-01-15' then 'G2'
  when cola between '2001-01-20' and '2001-01-25' then 'G3'
  when cola between '2001-02-01' and '2001-02-05' then 'G4'
  when cola between '2001-02-10' and '2001-02-15' then 'G5'
  else ''
end
Note: Now I want to build the CASE statement dynamically, because the dates and names are passed in as parameters and may change.
Declare
dates varchar = '2001-01-01to2001-01-05,2001-01-10to2001-01-15,
2001-01-20to2001-01-25,2001-02-01to2001-02-05,
2001-02-10to2001-02-15';
names varchar = 'G1,G2,G3,G4,G5';
The values in the variables may change as per the requirements; it will be dynamic, so the CASE statement should be built dynamically, without using a loop.
You may not need any function for this; just join to a mapping data set:
with cola_map(low, high, value) as (
  values
    (date '2001-01-01', date '2001-01-05', 'G1'),
    ('2001-01-10', '2001-01-15', 'G2'),
    ('2001-01-20', '2001-01-25', 'G3'),
    ('2001-02-01', '2001-02-05', 'G4'),
    ('2001-02-10', '2001-02-15', 'G5')
    -- you can include as many rows as you want
)
select table_name.*,
       coalesce(cola_map.value, '') -- else branch from case expression
from table_name
left join cola_map on table_name.cola between cola_map.low and cola_map.high
If your date ranges could collide, you can use DISTINCT ON or GROUP BY to avoid row duplication.
Note: you could use a simple sub-select too; I used a CTE because it's more readable.
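For the colliding-ranges case mentioned above, a DISTINCT ON variant of the same query could look like this (a sketch; it assumes table_name has an id primary key and keeps the match with the earliest low bound per row):

with cola_map(low, high, value) as (
  values
    (date '2001-01-01', date '2001-01-05', 'G1'),
    ('2001-01-10', '2001-01-15', 'G2')
    -- ... remaining ranges ...
)
select distinct on (table_name.id)
       table_name.*,
       coalesce(cola_map.value, '')
from table_name
left join cola_map on table_name.cola between cola_map.low and cola_map.high
order by table_name.id, cola_map.low;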
Edit: passing this data as a single parameter can be achieved with a multi-dimensional array (or an array of row values, but that requires a distinct, predefined composite type).
Passing arrays as parameters can depend on the actual client (& driver) you use, but in general, you can use the array's input representation:
-- sql
with cola_map(low, high, value) as (
  select d[1]::date, d[2]::date, d[3]
  from unnest(?::text[][]) d
)
select table_name.*,
       coalesce(cola_map.value, '') -- else branch from case expression
from table_name
left join cola_map on table_name.cola between cola_map.low and cola_map.high
// client pseudo code
query = db.prepare(sql);
query.bind(1, "{{2001-01-10,2001-01-15,G2},{2001-01-20,2001-01-25,G3}}");
query.execute();
Passing each chunk of data separately is also possible with some clients (or with some abstractions), but this depends highly on the driver/ORM/etc. you use.

UDTF returning a Table on DB2 V5R4 with Dynamic SQL

I have to write a UDF returning a table. I've done it with static SQL.
I've created procedures that prepare a dynamic, complex SQL statement and return a cursor.
But now I have to create a UDF with dynamic SQL that returns a table, to be used in an IN clause inside another SELECT.
Is it possible on DB2 V5R4? Do you have an example?
Thanks in advance...
I don't have V5R4, but I have i 6.1 and V5R3. I have a 6.1 example, and I poked around in V5R3 to find out how to make the same example work there. I can't guarantee V5R4, but this ought to be extremely close. Generating the working V5R3 code into 'Run SQL Scripts' gives this:
DROP SPECIFIC FUNCTION SQLEXAMPLE.DYNTABLE ;

SET PATH "QSYS","QSYS2","SYSPROC","SYSIBMADM","SQLEXAMPLE" ;

CREATE FUNCTION SQLEXAMPLE.DYNTABLE (
      SELECTBY VARCHAR( 64 ) )
   RETURNS TABLE (
      CUSTNBR      DECIMAL( 6, 0 ) ,
      CUSTFULLNAME VARCHAR( 12 ) ,
      CUSTBALDUE   DECIMAL( 6, 0 ) )
   LANGUAGE SQL
   NO EXTERNAL ACTION
   MODIFIES SQL DATA
   NOT FENCED
   DISALLOW PARALLEL
   CARDINALITY 100
BEGIN
   DECLARE DYNSTMT VARCHAR ( 512 ) ;

   DECLARE GLOBAL TEMPORARY TABLE SESSION.TCUSTCDT
      ( CUSTNBR    DECIMAL ( 6 , 0 ) NOT NULL ,
        CUSTNAME   VARCHAR ( 12 ) ,
        CUSTBALDUE DECIMAL ( 6 , 2 ) )
      WITH REPLACE ;

   SET DYNSTMT = 'INSERT INTO Session.TCustCDt SELECT t2.CUSNUM , (t2.INIT CONCAT '' '' CONCAT t2.LSTNAM) as FullName , t2.BALDUE FROM QIWS.QCUSTCDT t2 ' CONCAT CASE WHEN SELECTBY = '' THEN '' ELSE SELECTBY END ;

   EXECUTE IMMEDIATE DYNSTMT ;

   RETURN SELECT * FROM SESSION.TCUSTCDT ;
END ;
COMMENT ON SPECIFIC FUNCTION SQLEXAMPLE.DYNTABLE
IS 'UDTF returning dynamic table' ;
And in 'Run SQL Scripts', the function can be called like this:
SELECT t1.* FROM TABLE(sqlexample.dyntable('WHERE STATE = ''TX''')) t1
The example is intended to work over IBM's sample QCUSTCDT table in library QIWS. Most systems will have that table available. The table function returns values from two QCUSTCDT columns, CUSNUM and BALDUE, directly through two of the table function's columns, CUSTNBR and CUSTBALDUE. The third table function column, CUSTFULLNAME, gets its value by a concatenation of INIT and LSTNAM from QCUSTCDT.
However, the part that apparently relates to the question is the SELECTBY parameter of the function. The usage example shows that a WHERE clause is passed in and used to help build a dynamic INSERT INTO ... SELECT ... statement. The example shows that rows containing STATE = 'TX' will be returned. A more complex clause could be passed in, or the needed condition(s) could be retrieved from somewhere else, e.g., from another table.
The dynamic statement inserts rows into a GLOBAL TEMPORARY TABLE named SESSION.TCUSTCDT. The temporary table is defined in the function. The temporary column definitions are guaranteed (by the developer) to match the RETURNS TABLE columns of the table function, because no dynamic changes can be made to any of those elements. This allows SQL to handle the columns returned from the function reliably, and that lets it compile the function.
The RETURN statement simply returns whatever rows are in the temporary table after the dynamic statement completes.
The various field definitions take into account the somewhat unusual definitions in the QCUSTCDT file. Those don't make great sense, but they're useful enough.

SQL invalid conversion return null instead of throwing error

I have a table with a varchar column, and I want to find values that match a certain number. So let's say that column contains the following entries (except with millions of rows in real life):
123456789012
2345678
3456
23 45
713?2
00123456789012
So I decide I want all the rows which are numerically 123456789012 and write a statement that looks something like this:
SELECT * FROM MyTable WHERE CAST(MyColumn as bigint) = 123456789012
It should return the first and last row, but instead the whole query blows up because it can't convert the "23 45" and "713?2" to bigint.
Is there another way to do the conversion that will return NULL for values that can't convert?
SQL Server does NOT guarantee boolean operator short-circuiting, see On SQL Server boolean operator short-circuit. So all solutions using ISNUMERIC(...) AND CAST(...) are fundamentally flawed (they may work, but they can arbitrarily fail later depending on the generated plan). A better solution is using CASE, as Thomas suggests: CASE ISNUMERIC(...) WHEN 1 THEN CAST(...) ELSE NULL END. But, as gbn pointed out, ISNUMERIC is notoriously finicky in identifying what 'numeric' means, and in many cases where one would expect it to return 0 it returns 1. So mix the CASE with LIKE:
CASE WHEN MyRow NOT LIKE '%[^0-9]%' THEN CAST(MyRow as bigint) ELSE NULL END
But the real problem is that if you have millions of rows and you have to search them like this, you'll always end up scanning end-to-end, since the expression is not SARG-able (no matter how we rewrite it). The real issue here is data purity, and it should be addressed at the appropriate level, where the data is populated. Another thing to consider is whether it is possible to create a persisted computed column with this expression and create a filtered index on it which eliminates NULL (i.e. non-numeric) values. That would speed things up a little.
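The persisted computed column idea could be sketched like this (column and index names are made up; it assumes the digit-only strings actually fit in a bigint):

ALTER TABLE MyTable ADD MyColumnAsBigint AS
    (CASE WHEN MyColumn NOT LIKE '%[^0-9]%'
          THEN CAST(MyColumn AS bigint)
          ELSE NULL END) PERSISTED;

-- The filtered index skips the non-numeric (NULL) rows entirely.
CREATE INDEX IX_MyTable_MyColumnAsBigint
    ON MyTable (MyColumnAsBigint)
    WHERE MyColumnAsBigint IS NOT NULL;

-- Now the search can seek on the index instead of scanning and casting every row.
SELECT * FROM MyTable WHERE MyColumnAsBigint = 123456789012;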
If you are using SQL Server 2012 you can use the 2 new methods:
TRY_CAST()
TRY_CONVERT()
Both methods are equivalent. They return a value cast to the specified data type if the cast succeeds; otherwise, they return NULL. The only difference is that CONVERT is SQL Server specific while CAST is ANSI; using CAST will make your code more portable (although I'm not sure whether any other database provider implements TRY_CAST).
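Applied to the query from the question, that becomes:

-- '23 45' and '713?2' produce NULL here and simply drop out of the result
SELECT *
FROM MyTable
WHERE TRY_CAST(MyColumn AS bigint) = 123456789012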
ISNUMERIC will accept an empty string and values like 1.23 or 5E-04, so it can be unreliable.
And you don't know what order things will be evaluated in, so it could still fail (SQL is declarative, not procedural, so the WHERE clause probably won't be evaluated left to right).
So:
you want to accept values that consist only of the characters 0-9
you need to materialise the "number" filter so it's applied before the CAST
Something like:
SELECT
    *
FROM
    (
        SELECT TOP 2000000000 *
        FROM MyTable
        WHERE MyColumn NOT LIKE '%[^0-9]%' -- double negative rejects anything except 0-9
        ORDER BY MyColumn
    ) foo
WHERE
    CAST(MyColumn as bigint) = 123456789012 -- applied after number check
Edit: quick example that fails.
CREATE TABLE #foo (bigintstring varchar(100))
INSERT #foo (bigintstring )VALUES ('1.23')
INSERT #foo (bigintstring )VALUES ('1 23')
INSERT #foo (bigintstring )VALUES ('123')
SELECT * FROM #foo
WHERE
ISNUMERIC(bigintstring) = 1
AND
CAST(bigintstring AS bigint) = 123
SELECT *
FROM MyTable
WHERE ISNUMERIC(MyRow) = 1
AND CAST(MyRow as float) = 123456789012
The ISNUMERIC() function should give you what you need.
SELECT * FROM MyTable
WHERE ISNUMERIC(MyRow) = 1
AND CAST(MyRow as bigint) = 123456789012
And to add a CASE expression like Thomas suggested:
SELECT * FROM MyTable
WHERE CASE ISNUMERIC(MyRow)
          WHEN 1 THEN CAST(MyRow as bigint)
          ELSE NULL
      END = 123456789012
http://msdn.microsoft.com/en-us/library/ms186272.aspx
SELECT *
FROM MyTable
WHERE (ISNUMERIC(MyColumn) = 1) AND (CAST(MyColumn as bigint) = 123456789012)
Additionally you can use a CASE statement in order to get null values.
SELECT
    CASE
        WHEN (ISNUMERIC(MyColumn) = 1) THEN CAST(MyColumn as bigint)
        ELSE NULL
    END AS 'MyColumnAsBigInt'
FROM tableName
If you require additional filtering, for numerics which are not valid to be cast to bigint, you can use the following instead of ISNUMERIC:
PATINDEX('%[^0-9]%', MyColumn) = 0
If you need decimal values instead of integers, cast to float instead and change the pattern to '%[^0-9.]%'
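For example, the decimal variant might look like this (a sketch; note that the relaxed pattern also lets through strings with more than one dot, which would still fail the cast, so the evaluation-order caveat above still applies):

SELECT *
FROM MyTable
WHERE PATINDEX('%[^0-9.]%', MyColumn) = 0  -- digits and '.' only
  AND CAST(MyColumn AS float) = 1234.5678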