I feel like I'm missing a very basic concept. I could use some clarification or reference material.
On my website, I have a user that enters text into an input box and submits that to the database to be stored. I insert that text into the database using a function in the code block below where $conn->exec(query) is from Pg.pm.
$conn->exec("select someFunc($mykey,'text to insert');");
This works, but it is vulnerable to SQL injection, and it breaks even if a user merely enters a comma.
I read about DBD::Pg, which has a prepare statement that seems to be what I want, but I could not find the equivalent for Pg.pm. Did I miss it?
If Pg.pm does not support prepare, should I be using a Perl module that does? Or can I just follow the approach outlined at bobby-tables and use quote_ident() and quote_literal() in the SQL functions that insert or update user-input fields?
How should I be handling user-input in a safe way?
You cannot just use quote_ident and quote_literal, because they work at the SQL level and apply to dynamic SQL invoked with EXECUTE. They won't do you any good when passing arguments into the function, because the SQL string parsing (and the SQL injection risk) occurs before the function is even executed with those arguments.
You really need either prepared statement support or a strong, secure literal escaping function that understands PostgreSQL literal quoting rules. If your database driver provides neither, then it is unacceptably insecure and should be discarded in favour of one that does.
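To make the difference concrete, here is a minimal sketch in Python. It uses the built-in sqlite3 module purely as a stand-in so it runs anywhere; the notes table and the sample input are made up, and the same idea applies to any driver that supports placeholders.

```python
import sqlite3

# sqlite3 is only a stand-in so the sketch runs anywhere; the table name
# and the sample input are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER, body TEXT)")

user_text = "it's here, with a comma and a quote"

# Unsafe: the input is pasted into the SQL string, so the apostrophe
# ends the literal early and the statement no longer parses.
try:
    conn.execute(f"INSERT INTO notes VALUES (1, '{user_text}')")
    interpolation_failed = False
except sqlite3.Error:
    interpolation_failed = True

# Safe: the SQL text contains only placeholders; the values travel separately.
conn.execute("INSERT INTO notes VALUES (?, ?)", (1, user_text))
stored = conn.execute("SELECT body FROM notes WHERE id = ?", (1,)).fetchone()[0]

print(interpolation_failed, stored == user_text)
```

The interpolated version fails on a harmless apostrophe; a malicious input could instead be crafted to parse successfully and do something else entirely, which is the injection risk.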
I have ormlite integrated into an application I'm working on. Right now I'm trying to build in functionality to easily switch from automatically inserting data to the database to outputting the equivalent collection of insert statements to a file for later use. The data isn't user input but still requires proper escaping to handle basic gotchas like apostrophes.
Ideas I've burned through:
Dao.create() writes to the database directly, so that's a no-go.
QueryBuilder can't handle inserts.
JdbcDatabaseConnection.compileStatement() might work but the amount of setup required is inappropriate.
Using a java.sql.PreparedStatement has a reasonable enough interface (if toString() returns the SQL like I would hope) but it's not compatible with ormlite's connection types.
This should be very easy and if it is, I can't find the right combination of method calls to make it happen.
Right now I'm trying to build in functionality to easily switch from automatically inserting data to the database to outputting the equivalent collection of insert statements to a file for later use.
Interesting. So one hack would be to use the MappedCreate class. The MappedCreate.build(...) method takes a DatabaseType and a TableInfo which is available from the dao.getTableInfo().
The mappedCreate.toString() method exposes the generated INSERT statement (with a prefix), which might help, but you would still need to convert the ? arguments into the actual values with escaped quotes. That you would have to do in your own code.
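As a rough idea of what that "own code" could look like, here is a sketch in Python rather than Java. The names sql_literal and render_insert and the account table are hypothetical. It assumes the template contains no literal question marks, that the target database uses standard SQL quote doubling (no backslash escapes), and, as in the question, that the data is not untrusted user input.

```python
def sql_literal(value):
    """Render one Python value as a standard-SQL literal."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return repr(value)
    # Standard SQL escapes an apostrophe inside a string by doubling it.
    return "'" + str(value).replace("'", "''") + "'"


def render_insert(template, params):
    """Fill each ? in template with the matching literal from params."""
    parts = template.split("?")
    if len(parts) - 1 != len(params):
        raise ValueError("placeholder count does not match parameter count")
    out = [parts[0]]
    for value, tail in zip(params, parts[1:]):
        out.append(sql_literal(value))
        out.append(tail)
    return "".join(out)


statement = render_insert(
    "INSERT INTO account (name, visits) VALUES (?, ?)",
    ["O'Brien", 3],
)
print(statement)
```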
Hope this helps somewhat.
In PostgreSQL, when are (SELECT) queries planned?
Is it:
at statement-prepare time, or
at the start of processing the SELECT, or
something else
The reason I ask is that there is a Stackoverflow question: same query, two different ways, vastly different performance
A lot of people seem to be thinking that the query is planned differently because in one case the query contains a string literal ('foo') and in another case it's a placeholder (?).
Now my thinking is that this is a red herring, because the query isn't planned at statement-prepare time, but is actually planned at SELECT time.
So, say, I could prepare a statement with a placeholder, then run the query multiple times with different bound values, and the query planner will be run for each different bound value.
I suspect that the question linked above boils down to the PostgreSQL data type of the value, which in the case of a 'foo' literal is known to be a string, but in the case of a placeholder, the type can't be divined, so is coming through to the query planner as some strange type, which it can't create an efficient plan for. In which case, the issue is not that the query is being planned differently because the value is a placeholder (at statement preparation time) per se but that the value is coming through to the query as a different PostgreSQL type, and that is what is influencing the query planner. To fix this would simply be a matter of binding the placeholder with an appropriate explicit type declaration.
I cannot talk about the client-side Perl interface itself but I can shed some light on the PostgreSQL server side.
PostgreSQL has prepared statements and unprepared statements. Unprepared statements are parsed, planned and executed immediately. They also do not support parameter substitution. On a plain psql shell you can show their query plan like this:
tmpdb> explain select * from sometable where flag = true;
On the other hand there are prepared statements: They are usually (see "exception" below) parsed and planned in one step and executed in a second step. They can be re-executed several times with different parameters, because they do support parameter substitution. The equivalent in psql is this:
tmpdb> prepare foo as select * from sometable where flag = $1;
tmpdb> explain execute foo(true);
You may see that the plan differs from the plan for the unprepared statement, because planning already took place in the prepare phase, as described in the doc for PREPARE:
When the PREPARE statement is executed, the specified statement is parsed, rewritten, and planned. When an EXECUTE command is subsequently issued, the prepared statement need only be executed. Thus, the parsing, rewriting, and planning stages are only performed once, instead of every time the statement is executed.
This also means that the plan is NOT optimized for the substituted parameters. The first example might use an index on flag, because PostgreSQL knows that only ten entries out of a million have the value true. This reasoning is impossible when PostgreSQL plans a prepared statement. In that case it creates a plan that works as well as possible for all possible parameter values. This might exclude the mentioned index, because fetching the better part of the complete table via random access (due to the index) is slower than a plain sequential scan. The PREPARE doc confirms this:
In some situations, the query plan produced for a prepared statement will be inferior to the query plan that would have been chosen if the statement had been submitted and executed normally. This is because when the statement is planned and the planner attempts to determine the optimal query plan, the actual values of any parameters specified in the statement are unavailable. PostgreSQL collects statistics on the distribution of data in the table, and can use constant values in a statement to make guesses about the likely result of executing the statement. Since this data is unavailable when planning prepared statements with parameters, the chosen plan might be suboptimal.
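The reasoning can be illustrated with a toy cost model in Python. This is not PostgreSQL's planner: the statistics, the per-row costs and the rule for an unknown parameter (an even split over the distinct values) are all assumptions chosen to mirror the million-row example above.

```python
TABLE_ROWS = 1_000_000
# Made-up statistics: fraction of rows holding each value of "flag".
FLAG_STATS = {True: 10 / TABLE_ROWS, False: (TABLE_ROWS - 10) / TABLE_ROWS}

SEQ_COST_PER_ROW = 1.0      # toy cost of reading one row sequentially
INDEX_COST_PER_ROW = 4.0    # toy cost of one random access via the index


def choose_plan(value=None):
    """Pick the cheaper scan; value=None means the parameter is unknown."""
    if value is None:
        # No value to look up, so fall back to an even split over distinct values.
        selectivity = 1 / len(FLAG_STATS)
    else:
        selectivity = FLAG_STATS[value]
    index_cost = selectivity * TABLE_ROWS * INDEX_COST_PER_ROW
    seq_cost = TABLE_ROWS * SEQ_COST_PER_ROW
    return "index scan" if index_cost < seq_cost else "seq scan"


print(choose_plan(True))   # literal true is known at planning time
print(choose_plan(None))   # placeholder: value unknown at planning time
```

With the literal, the model sees that only ten rows match and picks the index. With the placeholder, it has to assume half the table might match, so the sequential scan wins.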
BTW - Regarding plan caching the PREPARE doc also has something to say:
Prepared statements only last for the duration of the current database session. When the session ends, the prepared statement is forgotten, so it must be recreated before being used again.
Also there is no automatic plan caching and no caching/reuse over multiple connections.
EXCEPTION: I have mentioned "usually". The shown psql examples are not the stuff a client adapter like Perl DBI really uses. It uses a certain protocol. Here the term "simple query" corresponds to the "unprepared query" in psql, the term "extended query" corresponds to "prepared query" with one exception: There is a distinction between (one) "unnamed statement" and (possibly multiple) "named statements". Regarding named statements the doc says:
Named prepared statements can also be created and accessed at the SQL command level, using PREPARE and EXECUTE.
and also:
Query planning for named prepared-statement objects occurs when the Parse message is processed.
So in this case planning is done without parameters as described above for PREPARE - nothing new.
The mentioned exception is the "unnamed statement". The doc says:
The unnamed prepared statement is likewise planned during Parse processing if the Parse message defines no parameters. But if there are parameters, query planning occurs every time Bind parameters are supplied. This allows the planner to make use of the actual values of the parameters provided by each Bind message, rather than use generic estimates.
And here is the benefit: Although the unnamed statement is "prepared" (i.e. can have parameter substitution), it also can adapt the query plan to the actual parameters.
BTW: The exact handling of the unnamed statement has changed several times over past releases of the PostgreSQL server. You can look up the old docs for details if you really want.
Rationale - Perl / any client:
How a client like Perl uses the protocol is a completely different question. Some clients like the JDBC driver for Java basically say: Even if the programmer uses a prepared statement, the first five (or so) executions are internally mapped to a "simple query" (i.e. effectively unprepared), after that the driver switches to "named statement".
So a client has these choices:
Force (re)planning each time by using the "simple query" protocol.
Plan once, execute multiple times by using the "extended query" protocol and the "named statement" (plan might be bad because planning is done without parameters).
Parse once, plan for each execution (with current PostgreSQL version) by using the "extended query" protocol and the "unnamed statement" and obeying some more things (provide some params during "parse" message)
Play completely different tricks like the JDBC driver.
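The last choice, the threshold trick described for the JDBC driver, can be sketched as a small Python class. ThresholdClient is hypothetical and only records which mode it would pick; it does not talk to a server, and the default of five simply follows the "first five (or so)" wording above.

```python
class ThresholdClient:
    """Hypothetical client that switches protocol after a number of runs."""

    def __init__(self, threshold=5):
        self.threshold = threshold
        self.executions = {}  # SQL text -> number of executions so far

    def mode_for(self, sql):
        """Record one execution of sql and return the mode it would use."""
        count = self.executions.get(sql, 0) + 1
        self.executions[sql] = count
        return "simple query" if count <= self.threshold else "named statement"


client = ThresholdClient()
sql = "select * from sometable where flag = $1"
modes = [client.mode_for(sql) for _ in range(7)]
print(modes)
```

The point of such a scheme is that one-off statements never pay for a stored plan, while statements that repeat often enough switch to plan-once behaviour.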
What Perl does currently: I don't know. But the mentioned "red herring" is not very unlikely.
In an effort to adhere to the DRY principle, I have some code I feel could easily live in a function. I may need to reuse this code at some point in the future, or I may not. Ideally I would have a function that lives just in this piece of code, as it provides no benefit to the database as a whole, and putting it in any of the existing schemas would create noise when trying to find meaningful and globally useful functions.
I have tried to write a script which uses typical syntax to create a function before my other code and drop the function at the end of the code. This is less than ideal because of potential collisions in the future, but an acceptable risk. Unfortunately I get an error:
'CREATE FUNCTION' must be the first statement in a query batch.
Adding semicolons before and after the statement is unfortunately not a quick fix. Is there no way to quickly use functions without building them into the framework of the database?
Or am I asking the wrong question? Is there a way to force separate batches in one script?
If you're truly running a "batch" (e.g. a set of T-SQL commands run in Query Analyzer or osql), then simply use "GO". Your "create function" should work if it's the first line after a "GO", again depending on your T-SQL interpreter. osql: should work. An ADO connection in a VB6 program: definitely WON'T work.
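The reason it depends on the interpreter is that GO is not a T-SQL statement; it is a separator that the client tool recognizes before sending each batch to the server on its own. Here is a minimal sketch of that splitting in Python, for clients that don't do it for you. split_batches and dbo.AddOne are hypothetical names, and the sketch ignores details such as "GO 5" repeat counts or GO appearing inside comments or strings.

```python
import re


def split_batches(script):
    """Split a script into batches at lines that contain only GO."""
    batches, current = [], []
    for line in script.splitlines():
        if re.fullmatch(r"\s*GO\s*", line, flags=re.IGNORECASE):
            if current:
                batches.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if current:
        batches.append("\n".join(current))
    return batches


script = """\
SELECT 1
GO
CREATE FUNCTION dbo.AddOne(@n int) RETURNS int AS
BEGIN
    RETURN @n + 1
END
GO
SELECT dbo.AddOne(41)
"""
batches = split_batches(script)
print(len(batches))
```

Each element of batches would then be executed as a separate command, so CREATE FUNCTION is the first statement in its own batch.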
I'm trying to search all tables and columns in a database, a la here. The suggested technique is to construct SQL query strings and then EXEC them. This works well, as a stored procedure. (Another example of variable table/column names is here. Again, EXEC is used to execute "dynamic SQL".)
However, my app requires that I do this in a function, not an SP. (Our development framework has trouble obtaining results from an SP.) But in a function, at least on SQL Server 2008 R2, you can't use EXEC; I get this error:
Invalid use of a side-effecting operator 'INSERT EXEC' within a function.
According to the answer to this post, apparently by a Microsoft developer, this is by design; it has nothing to do with the INSERT, only the fact that when you execute dynamically-constructed SQL code, the parser cannot guarantee a lack of side effects. Therefore it won't allow you to create such a function.
So... is there any way to iterate over many tables/columns within a function?
I see from BOL that
The following statements are valid in a function: ...
EXECUTE
statements calling extended stored procedures.
Huh - How could extended SP's be guaranteed side-effect free?
But that doesn't help me anyway:
The extended stored procedure, when it is called from inside a
function, cannot return result sets to the client. Any ODS APIs that
return result sets to the client will return FAIL. The extended stored
procedure could connect back to an instance of SQL Server; however, it
should not try to join the same transaction as the function that
invoked the extended stored procedure.
Since we need the function to return the results of the search, an ESP won't help.
I don't really want to get into extended SP's anyway: incrementing the number of programming languages in the environment would complicate our development environment more than it's worth.
I can think of a few solutions right now, none of which is very satisfactory:
First call an SP that produces the needed data and puts it in a table, then select from the function which merely reads the result from the table; this could be trouble if the search takes a while and two users' searches overlap. Or,
Have the application (not the function) generate a long query naming every table and column name from the db. I wonder if the JDBC driver can handle a query that long. Or,
Have the application (not the function) generate a long series of short queries naming every table and column name from the db. This will make the overall search a lot slower.
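For the last option, here is a sketch in Python of how the application might generate the short queries. The helper names and the catalog listing are hypothetical; a real listing would come from the database metadata. The search term stays a ? placeholder to be bound by the driver, and the bracket quoting follows SQL Server's rule of doubling a closing bracket.

```python
def bracket_quote(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_search_queries(columns):
    """Return one parameterized query per (table, column) pair."""
    return [
        f"SELECT * FROM {bracket_quote(table)} WHERE {bracket_quote(column)} LIKE ?"
        for table, column in columns
    ]


# Hypothetical catalog listing; a real one would come from the database metadata.
catalog = [("Customers", "Name"), ("Order Details", "Notes")]
queries = build_search_queries(catalog)
for query in queries:
    print(query)
```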
Thanks for any suggestions.
P.S. Upon further searching, I stumbled across this question which is closely related. It has no answers.
Update: No longer needed
I think this question is still valid, and we may again have a situation where we need it. However, I don't need an answer anymore for the present problem. After much trial-and-error I managed to get our application framework to retrieve row results from the RDBMS via the JDBC driver from the stored procedure. Therefore getting the thing to work as a function is unnecessary.
But if anyone posts an answer here that helps with the stated problem, I will be happy to upvote and/or accept it as appropriate.
An SP is basically a predefined SQL statement with some add-ons.
So if you had
PSEUDOCODE
CREATE PROCEDURE SP_DoSomething AS
BEGIN
    SELECT * FROM MyTable
END
And you can't use the SP, then you just execute the SQL directly, as in "Select * From MyTable".
As for that naff SQL code: for a start, you could join table to column with a WHERE clause, which would get rid of that line-by-line IF stuff.
Ask another question, like "How could this be improved?". There's lots of scope for more attempts than mine.
How can I disable string escaping in $db->insert? I need to insert HTML into my database, so I don't want any string escaping. Any solutions?
You don't want to disable that escaping.
Escaping data doesn't prevent you from inserting anything. In fact, quite the opposite: escaping data enables you to properly insert characters like quote marks that could otherwise confuse the database. More importantly, passing unescaped data directly to a database exposes an enormous security hole, making it trivial for a "hacker" (if we use the term liberally) to gain unrestricted access to your site and to your database.
You're probably confusing SQL escaping (which escapes data for use in SQL queries) with htmlspecialchars(), which escapes data for use on webpages. The two are unrelated.
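Here is a small sketch of the two kinds of escaping, in Python rather than PHP, with the built-in sqlite3 module as a stand-in database and html.escape playing roughly the role of htmlspecialchars(). The pages table and the sample markup are made up.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (body TEXT)")

markup = "<p class='intro'>Tom & Jerry's \"show\"</p>"

# SQL side: the driver handles quoting, and the HTML is stored byte for byte.
conn.execute("INSERT INTO pages (body) VALUES (?)", (markup,))
stored = conn.execute("SELECT body FROM pages").fetchone()[0]
print(stored == markup)

# HTML side: only needed when the text should be displayed literally on a
# page instead of being rendered as markup.
print(html.escape(stored))
```

What comes back from the database is exactly the HTML that went in; nothing about the SQL-level escaping changes the stored data.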