How does postgresql jdbc driver implement prepared batch? - postgresql

As document said, in the extended query protocol, "The query string
contained in a Parse message cannot include more than one SQL
statement".
So to support batch in prepared statement, I think the only way is to
determine the batch size in advance and then create the appropriate
prepared statement, e.g.
Given the batch size is fixed to 3, then prepare below statement:
-- create table foobar(a int, b text);
insert into foobar values($1, $2), ($3, $4), ($5, $6);
Then this prepared statement must be bound with 3 set of arguments.
The limitation is obvious: the batch size is fixed, so if you need to
do batch with size of 4, the previous prepared statement is useless
and you need to recreate it.
On the other hand, in JDBC, it seems that you just need to prepare
below statement:
insert into foobar values($1, $2);
And then calls addBatch() repeatedly until you think the batch size is enough.
What's the final statement does postgresql jdbc driver convert to? I'm
so curious.
I'm not familiar with jdbc, and although I check the driver source
codes, but I still cannot understand it.
Anybody knows the answer? Thanks.

Related

Postgresql - moving data from loosely typed staging table to strongly typed final table - and catch errors

I have a staging table that is loosely typed (all cols (except pk) of datatype text) and I want to move this data into a table that is correctly typed for the data e.g. numerics are numerics, dates are dates etc.
My initial approach is to create a cursor in pgsql and loop through the data inserting it into the final table with appropriate casts.
I will catch errors and use them to insert rows that fail into an errors table that I can look at and address later.
Is there a cleverer way of doing this?
Regards
Dave
Use a PL/PgSQL function with a BEGIN ... EXCEPTION WHEN ... END block wrapping the operation(s) that might fail.
See Postgres sql exception handling for batch insert for an example, and caveats.
It'd be nice if PostgreSQL's cast functions could be called in a test mode, where they return a boolean result indicating success and discard the product, or even better return a tuple of (result, successflag). That way you could do it all with plain SQL. Unfortunately they don't support that, so you've got to use an exception handling loop and subtransactions.

How is this code prone to SQL injection?

I was reading PostgreSql documentation here, and came across the following code snippet:
EXECUTE 'SELECT count(*) FROM mytable WHERE inserted_by = $1 AND inserted <= $2'
INTO c
USING checked_user, checked_date;
The documentation states that "This method is often preferable to inserting data values into the command string as text: it avoids run-time overhead of converting the values to text and back, and it is much less prone to SQL-injection attacks since there is no need for quoting or escaping".
Can you show me how this code is prone to SQL injection at all?
Edit: in all other RDBMS I have worked with this would completely prevent SQL injection. What is implemented differently in PostgreSql?
Quick answer is that it isn't by itself prone to SQL injection, As I understand your question you are asking why we don't just say so. So since you are looking for scenarios where this might lead to SQL injection, consider that mytable might be a view, and so could have additional functions behind it. Those functions might be vulnerable to SQL injection.
So you can't look at a query and conclude that it is definitely not susceptible to SQL injection. The best you can do is indicate that at the level provided, this specific level of your application does not raise sql injection concerns here.
Here is an example of a case where sql injection might very well happen.
CREATE OR REPLACE FUNCTION ban_user() returns trigger
language plpgsql security definer as
$$
begin
insert into banned_users (username) values (new.username);
execute 'alter user ' || new.username || ' WITH VALID UNTIL ''YESTERDAY''';
return new;
end;
Note that utility functions cannot be parameterized as you indicate, and we forgot to quote_ident() around new.username, thus making the field vulnerable.
CREATE OR REPLACE VIEW banned_users_today AS
SELECT username FROM banned_users where banned_date = 'today';
CREATE TRIGGER i_banned_users_today INSTEAD OF INSERT ON banned_users_today
FOR EACH ROW EXECUTE PROCEDURE ban_user();
EXECUTE 'insert into banned_users_today (username) values ($1)'
USING 'postgres with password ''boo''; drop function ban_user() cascade; --';
So no it doesn't completely solve the problem even if used everywhere it can be used. And proper use of quote_literal() and quote_ident() don't always solve the problem either.
The thing is that the problem can always be at a lower level than the query you are executing.
The bound parameters prevent garbage to manipulate the statement into doing anything other than what it's intended to do.
This guarantees no possibility for SQL-injection attacks short of a Postgres bug. (See H2C03's link for examples of what could go wrong.)
I imagine the "much less prone to SQL-injection attacks" amounts to CYA verbiage were such a thing were to arise.
SQL injection is usually associated with large data dumps on pastebin.com and such scenario won't work here even if the example used contatenation not variables. It's because that COUNT(*) will aggregate all data you'd be trying to steal.
But I can imagine scenarios where count of arbitrary record would be sufficiently valuable information - e.g. number of competitor's clients, number of sold products etc. And actually recalling some of the really tricky blind SQL injection methods it might be possible to build a query that using COUNT alone would allow to iteratively recover actual text from the database.
It would be also much easier to exploit on a database sufficiently old and misconfigured to allow the ; separator, in which case the attacker might just append a completely separate query.

DB2 deadlock timeout Sqlstate: 40001, reason code 68 due to update statements called from servlet using SQL

I am calling update statements one after the other from a servlet to DB2. I am getting error sqlstate 40001, reason code 68 which i found it is due to deadlock timeout.
How can I resolve this issue?
Can it be resolved by setting query timeout?
If yes then how to use it with update statements in servlet or where to use it?
The reason code 68 already tells you this is due to a lock timeout (deadlock is reason code 2) It could be due to other users running queries at the same time that use the same data you are accessing, or your own multiple updates.
Begin by running db2pd -db locktest -locks show detail from a db2 command line to see where the locks are. You'll then need to run something like:
select tabschema, tabname, tableid, tbspaceid
from syscat.tables where tbspaceid = # and tableid = #
filling in the # symbols with the ID number you get from the db2pd command output.
Once you see where the locks are, here are some tips:
◦Deadlock frequency can sometimes be reduced by ensuring that all applications access their common data in the same order – meaning, for example, that they access (and therefore lock) rows in Table A, followed by Table B, followed by Table C, and so on.
taken from: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.trb.doc/doc/t0055074.html
recommended reading: http://www.ibm.com/developerworks/data/library/techarticle/dm-0511bond/index.html
Addendum: if your servlet or another guilty application is using select statements found to be involved in the deadlock, you can try appending with ur to the select statements if accuracy of the newly updated (or inserted) data isn't important.
For me, the solution was adding FOR READ ONLY WITH UR at the end of all my SELECT statements. (Apparently my select statements were returning so much data, it locked the tables long enough to interfere with other SQL statements)
See https://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/sqlref/src/tpc/db2z_sql_isolationclause.html

Multiple prepared statements disrupt a transaction using DBD::Sybase

In my Perl script, I use DBD::Sybase (via DBI module) to connect to a SQL Server 2008. The base program as below runs without problem:
use DBI;
# assign values to $host, $usr, $pwd
my $dbh = DBI->connect("dbi:Sybase:$host", $usr, $pwd);
$dbh->do("BEGIN TRAN tr1");
my $update = $dbh->prepare("UPDATE mytable SET qty = ? where name = ?");
$update->execute(100, 'apple');
$dbh->do("END TRAN tr1");
however, if I insert one more prepare statement right before the existing prepare statement, to have the program look like:
...
my $insert = $dbh->prepare("INSERT INTO mytable (name, qty) VALUES (?, ?)");
my $update = $dbh->prepare("UPDATE mytable SET qty = ? where name = ?");
...
and the rest is all the same, then when I run it, I got:
DBD::Sybase::db do failed: Server message number=3902 severity=16 state=1 line=1 server=xxx text=The COMMIT TRANSACTION request has no corresponding BEGIN TRANSACTION.
So looks like the additional prepare statement somehow disrupted the entire transaction flow. I had been running the same code via the DBD::ODBC driver with no problem against a SQL SERVER 2005. (But my firm upgraded to 2008 and I had to use the DBD::Sybase to get around some other problems.)
Any help / suggestion on how to resolve this issue would be much appreciated. In particular, using a different db handle for the other prepare is not a desired solution since that will beat the purpose of having them in a single transaction.
UPDATE: Turns out if I execute at least once on the additional insert, then the program is again run fine. So looks like every prepared statement needs to be run under Sybase. But that isn't a requirement with ODBC and isn't a reasonable requirement in general. Anyway to get around it?
You are learning perl AND Sybase basics and making several incorrect conclusions.
Forget about what it does under ODBC for a moment. ODBC most probably has AUTOCOMMIT turned on, and thus you have no transaction control whatsoever. (Why anyone would use ODBC when the DBD:: supports DB-Lib and CT-Lib is beyond me, but that's a separate story.)
Re: "So looks like every prepared statement needs to be run under Sybase."
Rawheiser is correct. What exactly do you expect to achieve by preparing a batch but performing a Do instead ? Where else do you expect to execute the batch prepared under Sybase, other than under Sybase?
Do vs prepare/execute are quite different. prepare/execute for Sybase works just fine in millions of programs. you just have to learn what it does, not what you think it should do. prepare let's you load a batch, a block of commands terminated by GO in the normal Sybase sense. Execute executes the prepared batch (supplies the GO and sends the batch to the server), and captures whatever is returned (according to whatever array/variables you have set).
Do is immediate, single command, with no prepare. A prepare+execute combined.
Performing only single-statement do's, and only dynamic SQL, simply because that's all that you could get to work, is very limiting and quite unnecessary.
You currently have:
Prepare:
UPDATE
Execute (100)
ExecuteImmediate(Do):
COMMIT TRAN
So of course, there is no BEGIN TRAN. (The first "do" executed, the BEGIN TRAN is gone)
I think what you want (intended originally) is this. Forget the 'do':
Prepare:
BEGIN TRAN
UPDATE
COMMIT TRAN
Execute (100)
Then change it to:
BEGIN TRAN
INSERT
UPDATE
COMMIT TRAN
Execute (100)
Your $update and $insert will confuse you (you're executing a multi-statement batch, right ?not a isolated single command in the middle of a prepare batch). If you get rid of them, and think in terms of $execute [whatever you have prepared in the batch], it might help you to understand the problem better.
Do not form conclusions until you have all the above working as intended.
And read up on BEGIN/COMMIT TRAN.
Last, What exactly is a "END TRAN" ? I do not think the code block you have posted is real.
Don't dynamically create SQL, it is dangerous (sql injection).
You should be able to prepare multiple inserts/updates and your link to the DBI documentation does not say you cannot, it says some drivers may not be able to tell you much about a statement which is ONLY prepared.
I'd post a failing example with error to the dbi-users list for comment as the DBD::Sybase maintainer hangs out there (see dbi.perl.org).
Turns out that DBI's prepare method is not quite portable across various database drivers as noted here. For the Sybase driver, it is most likely that prepare is not working as intended. One way to tell is that after running prepare, the variable $insert->{NUM_OF_FIELDS} is undefined.
To get around the problem, do one of the following:
1) do not prepare anything. Just dynamically construct the statement in text string and run $dbh->do($stmt), or
2) run finish on all outstanding statement handles (under that database handle) before running COMMIT TRAN. I personally prefer this way much better.

DB2 Equivalent to SQL's GO?

I have written a DB2 query to do the following:
Create a temp table
Select from a monster query / insert into the temp table
Select from the temp table / delete from old table
Select from the temp table / insert into a different table
In MSSQL, I am allowed to run the commands one after another as one long query. Failing that, I can delimit them with 'GO' commands. When I attempt this in DB2, I get the error:
DB2CLI.DLL: ERROR [42601] [IBM][CLI Driver][DB2] SQL0199N The use of the reserved
word "GO" following "" is not valid. Expected tokens may include: "".
SQLSTATE=42601
What can I use to delimit these instructions without the temp table going out of scope?
GO is something that is used in MSSQL Studio, I have my own app for running upates into live and use "GO" to break the statements apart.
Does DB2 support the semi-colon (;)? This is a standard delimiter in many SQL implementations.
have you tried using just a semi-colon instead of "GO"?
This link suggests that the semi-colon should work for DB2 - http://www.scribd.com/doc/16640/IBM-DB2
I would try wrapping what you are looking to do in BEGIN and END to set the scope.
GO is not a SQL command, it's not even a TSQL command. It is an instruction for the parser. I don't know DB2, but I would imagine that GO is not neccessary.
From Devx.com Tips
Although GO is not a T-SQL statement, it is often used in T-SQL code and unless you know what it is it can be a mystery. So what is its purpose? Well, it causes all statements from the beginning of the script or the last GO statement (whichever is closer) to be compiled into one execution plan and sent to the server independent of any other batches.