I was reading the PostgreSQL documentation here, and came across the following code snippet:
EXECUTE 'SELECT count(*) FROM mytable WHERE inserted_by = $1 AND inserted <= $2'
INTO c
USING checked_user, checked_date;
The documentation states that "This method is often preferable to inserting data values into the command string as text: it avoids run-time overhead of converting the values to text and back, and it is much less prone to SQL-injection attacks since there is no need for quoting or escaping".
Can you show me how this code is prone to SQL injection at all?
Edit: in all other RDBMSs I have worked with, this would completely prevent SQL injection. What is implemented differently in PostgreSQL?
The quick answer is that it isn't, by itself, prone to SQL injection. As I understand it, you are asking why the documentation doesn't just say so. Since you are looking for scenarios where this might still lead to SQL injection, consider that mytable might be a view, and so could have additional functions behind it. Those functions might themselves be vulnerable to SQL injection.
So you can't look at a query and conclude that it is definitely not susceptible to SQL injection. The best you can do is state that, at the level shown, this specific layer of your application does not raise SQL injection concerns.
Here is an example of a case where SQL injection might very well happen.
CREATE OR REPLACE FUNCTION ban_user() returns trigger
language plpgsql security definer as
$$
begin
    insert into banned_users (username) values (new.username);
    -- new.username is concatenated into the string, not escaped: the injection point
    execute 'alter user ' || new.username || ' WITH VALID UNTIL ''YESTERDAY''';
    return new;
end;
$$;
Note that utility commands such as ALTER USER cannot be parameterized the way your query is, and we "forgot" to wrap new.username in quote_ident(), thus making the field vulnerable.
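For contrast, a minimal sketch of the safe version of that line, escaping the username as an identifier:
-- quote_ident() escapes the value so it can only ever parse as a single identifier
execute 'alter user ' || quote_ident(new.username) || ' WITH VALID UNTIL ''YESTERDAY''';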
CREATE OR REPLACE VIEW banned_users_today AS
SELECT username FROM banned_users where banned_date = 'today';
CREATE TRIGGER i_banned_users_today INSTEAD OF INSERT ON banned_users_today
FOR EACH ROW EXECUTE PROCEDURE ban_user();
EXECUTE 'insert into banned_users_today (username) values ($1)'
USING 'postgres with password ''boo''; drop function ban_user() cascade; --';
So no, it doesn't completely solve the problem, even if used everywhere it can be used. And proper use of quote_literal() and quote_ident() doesn't always solve the problem either.
The thing is that the problem can always be at a lower level than the query you are executing.
The bound parameters prevent garbage from manipulating the statement into doing anything other than what it is intended to do.
This guarantees no possibility for SQL-injection attacks short of a Postgres bug. (See H2C03's link for examples of what could go wrong.)
I imagine the "much less prone to SQL-injection attacks" amounts to CYA verbiage in case such a thing were ever to arise.
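To make the bound-parameters point concrete, here is a hypothetical malicious input passed through USING; the whole string is bound as a single value of inserted_by's type and compared as data, never parsed as SQL:
EXECUTE 'SELECT count(*) FROM mytable WHERE inserted_by = $1 AND inserted <= $2'
INTO c
USING 'alice''; DROP TABLE mytable; --', checked_date;
-- c simply ends up 0 (no user has that literal string as a name); nothing is dropped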
SQL injection is usually associated with large data dumps on pastebin.com, and such a scenario won't work here even if the example used concatenation instead of variables, because the COUNT(*) will aggregate away all the data you'd be trying to steal.
But I can imagine scenarios where the count of arbitrary records would be sufficiently valuable information - e.g. the number of a competitor's clients, the number of products sold, etc. And recalling some of the really tricky blind SQL injection methods, it might be possible to build a query that, using COUNT alone, would allow an attacker to iteratively recover actual text from the database.
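A hypothetical sketch of that kind of count-based blind probe, assuming the attacker controls part of the WHERE clause via concatenation (not a bound parameter) and the querying role can read the target (pg_shadow here is purely illustrative):
-- a zero vs. non-zero count leaks one comparison result about the hidden value;
-- varying the position and comparison character recovers it character by character
SELECT count(*) FROM mytable
WHERE inserted_by = 'x'
OR (SELECT substr(passwd, 1, 1) FROM pg_shadow WHERE usename = 'postgres') < 'm';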
It would also be much easier to exploit on a database sufficiently old and misconfigured to allow the ; separator, in which case the attacker might just append a completely separate query.
I use PostgreSQL exclusively. I have no plans to ever change this. However, I recognize that other people are not me, and they instead use MySQL, MS SQL, IBM SQL, SQLite SQL, Oracle SQL and ManyOthers SQL. I'm aware that they have different names in reality.
My queries look like:
SELECT * FROM table WHERE id = $1;
UPDATE table SET col = $1 WHERE col = $2;
INSERT INTO table (a, b, c) VALUES ($1, $2, $3);
My database wrapper functions currently support only PostgreSQL, by internally calling the pg_* functions.
I wish to support "the other databases" too. This would involve (the trivial part) making my wrapper functions able to interact with the other databases by using the PHP functions for those.
The difficult part is to reconstruct the PostgreSQL-flavor SQL queries from the application into something that works identically yet will be understood by the other SQL database in use, such as MySQL. This obviously involves highly advanced parsing and analysis to construct the final query string. For example, this PostgreSQL query:
SELECT * FROM table WHERE col ILIKE $1 ORDER BY random() LIMIT 1;
... will be turned into WeirdSQL like this:
SELECT * FROM table WHERE col ISEQUALTOKINDA %1 ORDER BY rnd() LIMIT 1;
I don't require support from any other input SQL flavor than PostgreSQL, but the output must be "all the big SQL database vendors".
Has anyone even attempted this? Or is it something that is never gonna happen as free software but might exist as a commercial offering? It sounds like it would be a thing. It would be insanely useful, and "crazier" projects have been attempted.
jOOQ is a Java library that aims to hide differences between databases. It has its own SQL grammar which tries to be compatible with everything (but parameter markers must be the JDBC ?), and generates DB-specific SQL from that.
There is an online translator, which generates the following from your query for Oracle:
select *
from table
where lower(cast(col as varchar2(4000))) like lower(cast(:1 as varchar2(4000)))
order by DBMS_RANDOM.RANDOM
fetch next 1 rows only
ODBC uses its own syntax on top of the database's syntax. ODBC drivers are required to convert ODBC parameter markers (?) to whatever the database uses, and to translate escape sequences for certain elements that are likely to have non-standard syntax in the DB (time/GUID/interval literals, the LIKE escape character, outer joins, procedure calls, function calls).
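For illustration, a few of the standard ODBC escape clauses that drivers rewrite into native syntax:
SELECT {fn UCASE(col)} FROM t WHERE d = {d '2024-01-01'} -- scalar function and date literal
SELECT * FROM {oj t1 LEFT OUTER JOIN t2 ON t1.id = t2.id} -- outer join escape
{call some_procedure(?)} -- procedure call with a parameter marker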
However, most escape sequences are optional, and this does not help with other syntax differences, such as the LIMIT 1.
ODBC drivers provide a long list of information about SQL syntax details, but it is the application's job to construct queries that conform to those restrictions, and not all differences can be described by this list. In practice, most ODBC applications restrict themselves to a commonly supported subset of SQL.
The issue can be simplified to this: within a view, any CTE referring to some_immutable_func breaks (except when the call appears in the WHERE or HAVING clause), producing the error shown below:
create or replace function some_immutable_func ()
returns int
immutable as $$
SELECT 1
$$ language sql;
create view some_view as
WITH some_cte AS (
SELECT some_immutable_func()
)
SELECT * FROM some_cte;
FATAL: Query processing failed due to an internal error.
CONTEXT: SQL function "some_immutable_func"
SSL connection has been closed unexpectedly
The connection to the server was lost. Attempting reset: Succeeded.
-- but this is okay
create view some_view as
WITH some_cte AS (
SELECT * FROM some_table
WHERE ...some_immutable_func()...
HAVING ...some_immutable_func()...
)
SELECT * FROM some_cte;
CREATE VIEW
Built-in immutable UDFs such as ABS(-3) work fine. Simply changing the UDF to be stable fixes the issue, but I am looking to optimize query performance within a complex view, where apparently the stable nature of a certain UDF slows it down nearly 100x. Ideally, I would also like to minimize changes to the view's structure, hopefully doing a simple replace-all instead of rearranging all references to the UDF into WHERE and HAVING clauses.
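For reference, the workaround mentioned above is a one-word change to the declaration:
-- declaring the function STABLE instead of IMMUTABLE avoids the crash,
-- at a significant cost in query performance
create or replace function some_immutable_func ()
returns int
stable as $$
SELECT 1
$$ language sql;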
I believe the problem may be with the query optimizer, but I'm surprised there is so little information out there regarding the cryptic/finer details of Redshift/Postgres and immutability.
EDIT: I've also discovered that changing the UDF language to Python seems to work fine. However, it has quite slow performance in my specific use case, potentially worse than just using the STABLE SQL UDF.
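For reference, the Python flavor of the same UDF (Redshift's plpythonu; you may need to drop the SQL version first):
create or replace function some_immutable_func ()
returns int
immutable as $$
    return 1
$$ language plpythonu;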
Just wanted to confirm that this has been resolved since the question was posted.
The provided example now works as expected.
Using Postgres 9.3:
I am attempting to automatically populate a table when an insert is performed on another table. This seems like a good use for rules, but after adding the rule to the first table, I am no longer able to perform inserts into the second table using the writable CTE. Here is an example:
CREATE TABLE foo (
id INT PRIMARY KEY
);
CREATE TABLE bar (
id INT PRIMARY KEY REFERENCES foo
);
CREATE RULE insertFoo AS ON INSERT TO foo DO INSERT INTO bar VALUES (NEW.id);
WITH a AS (SELECT * FROM (VALUES (1), (2)) b)
INSERT INTO foo SELECT * FROM a
When this is run, I get the error
"ERROR: WITH cannot be used in a query that is rewritten by rules
into multiple queries".
I have searched for that error string, but am only able to find links to the source code. I know that I can perform the above using row-level triggers instead, but it seems like I should be able to do this at the statement level. Why can I not use the writable CTE, when queries like this can (in this case) be easily re-written as:
INSERT INTO foo SELECT * FROM (VALUES (1), (2)) a
Does anyone know of another way to accomplish what I am attempting, other than 1) using rules, which prevents the use of WITH queries, or 2) using row-level triggers? Thanks,
TL;DR: use triggers, not rules.
Generally speaking, prefer triggers over rules, unless rules are absolutely necessary. (Which, in practice, they never are.)
Using rules introduces heaps of problems which will needlessly complicate your life down the road. You've run into one here. Another (major) one is that the number of affected rows will correspond to that of the very last query -- if you're relying on FOUND somewhere and your query incorrectly reports that no rows were affected, you'll be in for painful bugs.
Moreover, there's occasional talk of deprecating Postgres rules outright:
http://postgresql.nabble.com/Deprecating-RULES-td5727689.html
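For reference, a minimal sketch of a trigger-based replacement for the insertFoo rule (the function name insert_foo_trg is my own):
CREATE FUNCTION insert_foo_trg() RETURNS trigger AS $$
BEGIN
    -- mirror every insert on foo into bar, as the rule did
    INSERT INTO bar VALUES (NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER insertFoo AFTER INSERT ON foo
FOR EACH ROW EXECUTE PROCEDURE insert_foo_trg();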
Like the other answer, I definitely recommend using INSTEAD OF triggers before RULEs.
However, if for some reason you don't want to change existing VIEW RULEs and still want to use WITH, you can do so by wrapping the VIEW in a stored procedure:
create function insert_foo(int) returns void as $$
insert into foo values ($1)
$$ language sql;
WITH a AS (SELECT * FROM (VALUES (1), (2)) b)
SELECT insert_foo(a.column1) from a;
This could be useful when using some legacy db through some system that wraps statements with CTEs.
Given a client library that can only execute one statement in a batch, if you run
query.exec_sql("SELECT * FROM (" + sql + ") AS sub")
Are there any vectors where sql can run anything but a SELECT ?
Are there any other ways to temporarily de-elevate a connection so it can only perform SELECT?
Note: It looks like SET ROLE solves this problem, but the issue I have is that I am unable to create a role upfront in an easy way.
While you can put data-modifying statements in queries by embedding INSERT/UPDATE/DELETE statements in CTEs, they're only allowed at the top level, so that's not an issue.
You can, however, invoke a function, which could contain just about anything. Even if you ran this in a read-only transaction, a function could potentially elevate it to read-write.
But the solution is simple: If you don't want to allow the caller to do something, don't give them permission to do it. Create a user with only the GRANTs they need, and you can execute sql as-is.
Without the ability to define permissions, the closest you're going to get is probably a read-only transaction and/or an explicit rollback after the query, but there will still be holes you can't plug (e.g. you can't roll back a setval() call).
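A minimal sketch of both approaches (the role name is hypothetical):
-- a dedicated role that can only read
CREATE ROLE report_reader LOGIN PASSWORD 'secret';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO report_reader;
-- or, failing that, a read-only transaction around the untrusted query
BEGIN READ ONLY;
SELECT * FROM (SELECT 1) AS sub; -- the wrapped untrusted SELECT goes here
ROLLBACK;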
If the sql string comes from a third party then it can be used for SQL injection. I'm not sure if that is what you are asking, because it seems too basic for a 56k-points user to ask. Sorry if that is not the case. The string could be:
some_table; insert into user_table (user_id, admin_privilege) values (1, true);
I use both Firebird embedded and Firebird Server, and from time to time I need to reindex the tables using a procedure like the following:
CREATE PROCEDURE MAINTENANCE_SELECTIVITY
AS
DECLARE VARIABLE S VARCHAR(200);
BEGIN
FOR select RDB$INDEX_NAME FROM RDB$INDICES INTO :S DO
BEGIN
S = 'SET statistics INDEX ' || s || ';';
EXECUTE STATEMENT :s;
END
SUSPEND;
END
I guess this is normal using embedded, but is it really needed using a server? Is there a way to configure the server to do it automatically when required or periodically?
First, let me point out that I'm no Firebird expert, so I'm answering on the basis of how SQL Server works.
In that case, the answer is both yes, and no.
The indexes are of course updated on SQL Server, in the sense that if you insert a new row, all indexes for that table will contain that row, so it will be found. So basically, you don't need to keep reindexing the tables for that part to work. That's the "no" part.
The problem, however, is not with the index, but with the statistics. You're saying that you need to reindex the tables, but then you show code that manipulates statistics, and that's why I'm answering.
The short answer is that statistics slowly go out of whack as time goes by. They might not deteriorate to the point where they're unusable, but they will drift away from the accurate state they were in when you recreated/recalculated them. That's the "yes" part.
The main problem with stale statistics is that if the distribution of the keys in the indexes changes drastically, the statistics might not pick that up right away, and thus the query optimizer will pick the wrong indexes, based on the old, stale statistics it has on hand.
For instance, let's say one of your indexes has statistics saying that the keys are clumped together at one end of the value space (say, an int column with lots of 0s and 1s). Then you insert lots and lots of rows with values that make this index contain values spread out over the entire spectrum.
If you now do a query that uses a join from another table, on a column with low selectivity (also lots of 0's and 1's) against the table with this index of yours, the query optimizer might deduce that this index is good, since it will fetch many rows that will be used at the same time (they're on the same data page).
However, since the data has changed, it'll jump all over the index to find the relevant pieces, and thus not be so good after all.
After recalculating the statistics, the query optimizer might see that this index is sub-optimal for this query, and pick another index instead, which is more suited.
Basically, you need to recalculate the statistics periodically if your data is in flux. If your data rarely changes, you probably don't need to do it very often, but I would still add a maintenance job with some regularity that does this.
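In SQL Server terms (which, again, is what I'm drawing on here), the recalculation itself is trivial to script into such a job:
-- refresh statistics for a single table...
UPDATE STATISTICS mytable;
-- ...or for every table in the database
EXEC sp_updatestats;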
As for whether or not it is possible to ask Firebird to do it on its own, again, I'm on thin ice, but I suspect it is. In SQL Server you can set up maintenance jobs that do this on a schedule, and at the very least you should be able to kick off a batch file from the Windows scheduler to do something like it.
That does not reindex; it recomputes weights (selectivities) for the indexes, which the optimizer uses to select the most suitable index. You don't need to do that unless index size changes a lot. If you created the index before adding the data, you do need to do the recalculation.
Embedded and Server should have exactly the same functionality apart from the process model.
I wanted to update this answer for newer Firebird. Here is the updated DSQL.
SET TERM ^ ;
CREATE OR ALTER PROCEDURE NEW_PROCEDURE
AS
DECLARE VARIABLE S VARCHAR(300);
begin
FOR select 'SET statistics INDEX ' || RDB$INDEX_NAME || ';'
FROM RDB$INDICES
WHERE RDB$INDEX_NAME <> 'PRIMARY' INTO :S
DO BEGIN
EXECUTE STATEMENT :s;
END
end^
SET TERM ; ^
GRANT EXECUTE ON PROCEDURE NEW_PROCEDURE TO SYSDBA;