JDBC batch for multiple prepared statements - postgresql

Is it possible to batch together commits from multiple JDBC prepared statements?
In my app the user will insert one or more records along with records in related tables. For example, we'll need to update a record in the "contacts" table, delete related records in the "tags" table, and then insert a fresh set of tags.
UPDATE contacts SET name=? WHERE contact_id=?;
DELETE FROM tags WHERE contact_id=?;
INSERT INTO tags (contact_id,tag) values (?,?);
// insert more tags as needed here...
These statements need to be part of a single transaction, and I want to do them in a single round trip to the server.
To send them in a single round-trip, there are two choices: for each command create a Statement and then call .addBatch(), or for each command create a PreparedStatement, and then call .setString(), .setInt() etc. for parameter values, then call .addBatch().
The problem with the first choice is that sending a full SQL string in the .addBatch() call is inefficient and you don't get the benefit of sanitized parameter inputs.
The problem with the second choice is that it may not preserve the order of the SQL statements. For example,
Connection con = ...;
PreparedStatement updateState = con.prepareStatement("UPDATE contacts SET name=? WHERE contact_id=?;");
PreparedStatement deleteState = con.prepareStatement("DELETE FROM tags WHERE contact_id=?;");
PreparedStatement insertState = con.prepareStatement("INSERT INTO tags (contact_id,tag) values (?,?);");
updateState.setString(1, "Bob");
updateState.setInt(2, 123);
updateState.addBatch();
deleteState.setInt(1, 123);
deleteState.addBatch();
... etc ...
... now add more parameters to updateState, and addBatch()...
... repeat ...
con.commit();
In the code above, are there any guarantees that all of the statements will execute in the order we called .addBatch(), even across different prepared statements? Ordering is obviously important; we need to delete tags before we insert new ones.
I haven't seen any documentation that says that ordering of statements will be preserved for a given connection.
I'm using Postgres and the default Postgres JDBC driver, if that matters.

The batch is per statement object, so a batch is executed per executeBatch() call on a Statement or PreparedStatement object. In other words, this only executes the statements (or value sets) associated with the batch of that statement object. It is not possible to 'order' execution across multiple statement objects. Within an individual batch, the order is preserved.
If you need statements executed in a specific order, then you need to explicitly execute them in that order. This either means individual calls to execute() per value set, or using a single Statement object and generating the statements on the fly. Because of the risk of SQL injection, this last approach is not recommended.
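A minimal sketch of the "execute explicitly in order" approach. It is written in Python with the standard sqlite3 module as a stand-in for JDBC and Postgres, an assumption made only so the example runs without a server. The table layout follows the question; replace_tags is a hypothetical helper name. Note that this keeps the statements in one transaction but does not reduce them to a single round trip.

```python
import sqlite3

def replace_tags(conn, contact_id, name, tags):
    """Run UPDATE, DELETE and INSERT in that order inside one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("UPDATE contacts SET name=? WHERE contact_id=?", (name, contact_id))
        conn.execute("DELETE FROM tags WHERE contact_id=?", (contact_id,))
        # A batch on a single statement: the value sets run in list order.
        conn.executemany(
            "INSERT INTO tags (contact_id, tag) VALUES (?, ?)",
            [(contact_id, tag) for tag in tags],
        )

# Hypothetical sample data, not taken from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE tags (contact_id INTEGER, tag TEXT)")
conn.execute("INSERT INTO contacts VALUES (123, 'Robert')")
conn.execute("INSERT INTO tags VALUES (123, 'old')")
conn.commit()

replace_tags(conn, 123, "Bob", ["friend", "work"])
current_tags = [row[0] for row in conn.execute(
    "SELECT tag FROM tags WHERE contact_id=123 ORDER BY tag")]
current_name = conn.execute(
    "SELECT name FROM contacts WHERE contact_id=123").fetchone()[0]
```

Each statement still uses parameter markers, so the ordering guarantee comes from the order of the calls and not from the batch machinery.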

Get data copied by a function

I have a quite complicated data structure that lies in several tables. I have a function that makes a copy of that structure. I want to make a copy and get newly created data in a single query like this:
SELECT
*
FROM
main_table
JOIN other_table
ON (main_table.id = other_table.main_id)
WHERE
main_table.id = make_copy(old_id);
The copy is successfully created, but is not returned by the above query. I guess it is not yet visible for the outer query or somehow committed.
I have also tried to use WITH ... SELECT ... but with no success...
The function make_copy(id) is declared as VOLATILE because it modifies the database, and multiple calls with the same parameter will create multiple copies.
One possible solution would be for make_copy(id) to return the whole new data structure (SELECT * FROM make_copy(old_id)), but that would require a lot of aliasing (many tables have an id or name column). I would also end up with many places that build (read) that data structure.
How can I call that function and use its result (and all side effects) in one query?
I'm afraid that's not possible without splitting it into two queries.
A CTE can't help you; see "Data-Modifying Statements in WITH" in the PostgreSQL documentation (it includes an example of updating a table inside a CTE):
...The sub-statements in WITH are executed concurrently with each
other and with the main query. Therefore, when using data-modifying
statements in WITH, the order in which the specified updates actually
happen is unpredictable. All the statements are executed with the same
snapshot (see Chapter 13), so they cannot “see” one another's effects
on the target tables. This alleviates the effects of the
unpredictability of the actual order of row updates, and means that
RETURNING data is the only way to communicate changes between
different WITH sub-statements and the main query...
And I guess you cannot do this with a function either; see "Function Volatility Categories":
For functions written in SQL or in any of the standard procedural
languages, there is a second important property determined by the
volatility category, namely the visibility of any data changes that
have been made by the SQL command that is calling the function. A
VOLATILE function will see such changes, a STABLE or IMMUTABLE
function will not. ... VOLATILE functions obtain a fresh snapshot at
the start of each query they execute.

Postgresql: split cell containing column names (WHERE Metacolumn='col1;col2;col3;..') apart into array to dynamically generate INSERT statement

In Postgresql (and Sybase ADS), I am making my own trigger-based multimaster replication across both platforms which must dynamically handle various composite keys and sometimes no PK on certain tables. To make it easiest, I am trying to auto generate the INSERT/UPDATE/DELETE where the user can choose which columns they want to copy over by listing column names in a cell separated by semicolon.
-"SELECT Address, city, us_state, zipcode FROM public.place;" would be a table that needs to replicate.
-The Metatable for Publication/Subscriptions would have a cell containing 'Address;city;us_state;zipcode'.
-I am using Insert/update/delete triggers to capture new row data and want to use the columns to dynamically make a statement like
"insert into place (Address,city,us_state,zipcode) VALUES (NEW.Address,NEW.city,NEW.us_state,NEW.zipcode);" which can be read and executed on the destination via script. I will do the same action for UPDATE and DELETE, using OLD prefix in the UPDATE and DELETE generated statements where needed.
I am not looking for someone to do a bunch of work, but to give an idea of any functions, logic and statements involved. Thank you for any ideas or advice.
You can split a String and create an Array using the regexp_split_to_array function.
Probably something like: regexp_split_to_array(metacolumn, ';')
More info about string functions: https://www.postgresql.org/docs/9.6/functions-string.html
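The logic can be sketched outside the database first. The Python below is only an illustration of the split-and-join steps; inside a trigger the same thing would be done in PL/pgSQL with regexp_split_to_array. build_insert is a hypothetical helper, and the identifier check is an added assumption, since column names cannot be passed as bound parameters.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def build_insert(table, metacolumn, prefix="NEW"):
    """Turn 'col1;col2;...' into an INSERT statement that reads from NEW.*."""
    # Same idea as regexp_split_to_array(metacolumn, ';').
    columns = [c.strip() for c in metacolumn.split(";") if c.strip()]
    for name in [table] + columns:
        # Reject anything that is not a plain identifier before it reaches SQL text.
        if not IDENTIFIER.fullmatch(name):
            raise ValueError("unexpected identifier: " + repr(name))
    column_list = ",".join(columns)
    value_list = ",".join(prefix + "." + c for c in columns)
    return "insert into {} ({}) VALUES ({});".format(table, column_list, value_list)

stmt = build_insert("place", "Address;city;us_state;zipcode")
```

With the sample metacolumn from the question, stmt is the INSERT statement quoted there; passing prefix="OLD" gives the variant needed for DELETE.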

PHP and sanitizing strings for use in dynamically created DB2 queries

I'm relatively new to DB2 for IBMi and am wondering how to properly cleanse data for a dynamically generated query in PHP.
For example if writing a PHP class which handles all database interactions one would have to pass table names and such, some of which cannot be passed in using db2_bind_param(). Does db2_prepare() cleanse the structured query on its own? Or is it possible a malformed query can be "executed" within a db2_prepare() call? I know there is db2_execute() but the db is doing something in db2_prepare() and I'm not sure what (just syntax validation?).
I know if the passed values are in no way affected by user input there shouldn't be much of an issue, but if one wanted to cleanse data before using it in a query (without using db2_prepare()/db2_execute()) what is the checklist for db2? The only thing I can find is to escape single quotes by prefixing them with another single quote. Is that really all there is to watch out for?
There is no magic "cleansing" happening when you call db2_prepare() -- it will simply attempt to compile the string you pass as a single SQL statement. If it is not a valid DB2 SQL statement, an error will be returned. Same with db2_exec(), only it will do in one call what db2_prepare() and db2_execute() do separately.
EDIT (to address further questions from the OP).
Execution of every SQL statement has three stages:
1. Compilation (or preparation), when the statement is parsed, syntactically and semantically analyzed, the user's privileges are determined, and the statement execution plan is created.
2. Parameter binding -- an optional step that is only necessary when the statement contains parameter markers. At this stage each parameter data type is verified to match what the statement text expects based on the preparation.
3. Execution proper, when the query plan generated at step 1 is performed by the database engine, optionally using the parameter (variable) values provided at step 2. The statement results, if any, are then returned to the client.
db2_prepare(), db2_bind_param(), and db2_execute() correspond to steps 1, 2 and 3 respectively. db2_exec() combines steps 1 and 3, skipping step 2 and assuming the absence of parameter markers.
Now, speaking about parameter safety, the binding step ensures that the supplied parameter values correspond to the expected data type constraints. For example, in the query containing something like ...WHERE MyIntCol = ?, if I attempt to bind a character value to that parameter it will generate an error.
If instead I were to use db2_exec() and compose a statement like so:
$stmt = "SELECT * FROM MyTab WHERE MyIntCol=" . $parm;
I could easily pass something like "0 or 1=1" as the value of $parm, which would produce a perfectly valid SQL statement that db2_exec() would then successfully parse, prepare and execute.
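The difference between the two approaches can be reproduced in a small runnable sketch. It uses Python's standard sqlite3 module as a stand-in for DB2 (an assumption, chosen so the example needs no server); the table contents are made up. One difference to keep in mind: SQLite does not raise a type error when a string is bound to an integer comparison, it simply finds no matching row, whereas the answer describes DB2 rejecting the value at the binding step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTab (MyIntCol INTEGER, Secret TEXT)")
conn.executemany("INSERT INTO MyTab VALUES (?, ?)", [(0, "a"), (1, "b"), (2, "c")])

parm = "0 or 1=1"  # hostile input from the answer's example

# Concatenation: the input becomes part of the SQL text, so "or 1=1" is
# parsed as a condition and every row matches.
concatenated = conn.execute(
    "SELECT * FROM MyTab WHERE MyIntCol=" + parm).fetchall()

# Parameter marker: the input is passed as one value and is never parsed as SQL.
bound = conn.execute(
    "SELECT * FROM MyTab WHERE MyIntCol=?", (parm,)).fetchall()
```

Here concatenated holds all three rows, while bound is empty because no row has the literal value "0 or 1=1".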

SQL Anywhere, Entity Framework 4 and Transactions

I have a process in my program that uses an Entity Framework 4 EDM. The entity context object contains function imports for calling stored procedures.
The process receives a batch of data from a remote server. The batch can consist of data for any of our tables / data types (each data type is stored in its own table). The batch can also contain data for the same row multiple times. It has to handle this as a single insert (for the first occurrence) and one or more updates (for each subsequent occurrence). The stored procedures therefore implement an upsert operation using the INSERT ... ON EXISTING UPDATE command.
Our code basically determines which stored procedure to call and then calls it using the entity context object's method for that stored procedure. The entire batch has to be done in a single transaction, so we call context.Connection.BeginTransaction() at the beginning of the batch.
There is one data type that has millions of rows. We need to load that data as quickly as possible. I'm implementing logic to import that data type using the SABulkCopy class. This also needs to be a part of the single transaction already started. The issue is that I need to pass an SATransaction to the SABulkCopy class's constructor (there is no way to set it using properties) and I don't have an SATransaction. context.Connection.BeginTransaction() returns a DbTransaction. I tried to cast this into an SATransaction without success.
What's the right way to get the SABulkCopy object to join the transaction?
We gave up on the SABulkCopy class. It turns out that it doesn't do a bulk load. It creates an SACommand object that executes an INSERT statement and inserts the rows one at a time. And it does it inefficiently, to boot.
I still needed to get at the SATransaction associated with the DbTransaction returned by context.Connection.BeginTransaction(). I was given some reflection code that does this in response to another question I posted about this:
// Requires: using System.Reflection;
// Reads the non-public StoreTransaction property of the EntityTransaction.
SATransaction saTransaction = (SATransaction) dbTransaction.GetType()
    .InvokeMember( "StoreTransaction",
        BindingFlags.FlattenHierarchy | BindingFlags.NonPublic | BindingFlags.InvokeMethod |
        BindingFlags.Instance | BindingFlags.GetProperty,
        null, dbTransaction, new object[ 0 ] );
The program does what it needs to do. It's unfortunate, though, that Microsoft didn't make the StoreTransaction property of the EntityTransaction class public.

What is a "batch", and why is GO used?

I have read and read over MSDN, etc. Ok, so it signals the end of a batch.
What defines a batch? I don't see why I need go when I'm pasting in a bunch of scripts to be run all at the same time.
I've never understood GO. Can anyone explain this better and when I need to use it (after how many or what type of transactions)?
For example why would I need GO after each update here:
UPDATE [Country]
SET [CountryCode] = 'IL'
WHERE code = 'IL'
GO
UPDATE [Country]
SET [CountryCode] = 'PT'
WHERE code = 'PT'
GO is not properly a TSQL command.
Instead it's a command to the specific client program which connects to an SQL server (Sybase or Microsoft's - not sure about what Oracle does), signalling to the client program that the set of commands that were input into it up till the "go" need to be sent to the server to be executed.
Why/when do you need it?
GO in MS SQL server has a "count" parameter - so you can use it as a "repeat N times" shortcut.
Extremely large updates might fill up the SQL server's log. To avoid that, they might need to be separated into smaller batches via go.
In your example, if updating for a set of country codes has such a volume that it will run out of log space, the solution is to separate each country code into a separate transaction - which can be done by separating them on the client with go.
Some SQL statements MUST be separated by GO from the following ones in order to work.
For example, you can't drop a table and re-create the same-named table in a single batch, at least in Sybase (ditto for creating procedures/triggers):
> drop table tempdb.guest.x1
> create table tempdb.guest.x1 (a int)
> go
Msg 2714, Level 16, State 1
Server 'SYBDEV', Line 2
There is already an object named 'x1' in the database.
> drop table tempdb.guest.x1
> go
> create table tempdb.guest.x1 (a int)
> go
>
GO is not a statement, it's a batch separator.
The blocks separated by GO are sent by the client to the server for processing and the client waits for their results.
For instance, if you write
DELETE FROM a
DELETE FROM b
DELETE FROM c
this will be sent to the server as a single 3-line query.
If you write
DELETE FROM a
GO
DELETE FROM b
GO
DELETE FROM c
this will be sent to the server as 3 one-line queries.
GO itself does not go to the server (no pun intended). It's a pure client-side reserved word and is only recognized by client tools such as SSMS, sqlcmd and osql.
If you use a custom query tool to send it over the connection, the server won't recognize it and will issue an error.
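To make the client-side role of GO concrete, here is a rough sketch of what a query tool does with a script before anything reaches the server. split_batches is a hypothetical helper, not the actual SSMS or sqlcmd implementation, and it assumes GO appears alone on its line (optionally followed by a repeat count); it does not handle GO inside comments or string literals.

```python
import re

# A line that contains only GO, optionally followed by a repeat count.
GO_LINE = re.compile(r"^\s*GO(?:\s+(\d+))?\s*$", re.IGNORECASE)

def split_batches(script):
    """Return a list of (batch_text, repeat_count) tuples."""
    batches, current = [], []
    for line in script.splitlines():
        match = GO_LINE.match(line)
        if match:
            text = "\n".join(current).strip()
            if text:
                batches.append((text, int(match.group(1) or 1)))
            current = []  # GO itself is consumed here and never sent on
        else:
            current.append(line)
    tail = "\n".join(current).strip()
    if tail:
        batches.append((tail, 1))  # trailing statements form a final batch
    return batches

script = "DELETE FROM a\nGO\nDELETE FROM b\nGO\nDELETE FROM c\n"
batches = split_batches(script)
repeated = split_batches("INSERT INTO dbo.TEST (ROWID) VALUES (NEWID())\nGO 1000\n")
```

A client would then send each batch_text to the server as a separate request, repeat_count times. Without the GO lines the same script comes back as a single batch.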
Many commands need to be in their own batch, like CREATE PROCEDURE.
Or, if you add a column to a table, then it should be in its own batch.
If you try to SELECT the new column in the same batch it fails because at parse/compile time the column does not exist.
GO is used by the SQL tools to work this out from one script: it is not a SQL keyword and is not recognised by the engine.
These are 2 concrete examples of day to day usage of batches.
Edit: In your example, you don't need GO...
Edit 2, example. You can't drop, create and grant permissions in one batch... not least, where is the end of the stored procedure?
IF OBJECT_ID ('dbo.uspDoStuff') IS NOT NULL
DROP PROCEDURE dbo.uspDoStuff
GO
CREATE PROCEDURE dbo.uspDoStuff
AS
SELECT Something From ATable
GO
GRANT EXECUTE ON dbo.uspDoStuff TO RoleSomeOne
GO
Sometimes there is a need to execute the same command or set of commands over and over again. This may be to insert or update test data or it may be to put a load on your server for performance testing. Whatever the need, the easiest way to do this is to set up a while loop and execute your code, but in SQL Server 2005 there is an even easier way to do this.
Let's say you want to create a test table and load it with 1000 records. You could issue the following command and it will run the same command 1000 times:
CREATE TABLE dbo.TEST (ID INT IDENTITY (1,1), ROWID uniqueidentifier)
GO
INSERT INTO dbo.TEST (ROWID) VALUES (NEWID())
GO 1000
source:
http://www.mssqltips.com/tip.asp?tip=1216
Other than that, it marks the "end" of an SQL block (e.g. in a stored procedure), meaning you're back in a "clean" state: for example, variables declared in the statements before GO are no longer defined after it.
As everyone already said, "GO" is not part of T-SQL. "GO" is a batch separator in SSMS, a client application used to submit queries to the database. This means that declared variables and table variables will not persist from code before the "GO" to code following it.
In fact, GO is simply the default word used by SSMS. This can be changed in the options if you want. For a bit of fun, change the option on someone else's system to use "SELECT" as a batch separator instead of "GO". Forgive my cruel chuckle.
It is used to split logical blocks. The client tool breaks your script into batches, and GO marks where the next block of code begins.
It can also be used to repeat a batch a specific number of times.
Try:
exec sp_who2
go 2
Some statements have to be delimited by GO:
use DB
create view thisViewCreationWillFail