Fast Array Inserts with Postgres - postgresql

In Oracle OCI and OCCI there are API facilities to perform array inserts: you build up an array of values in the client and send it along with a prepared statement to the server, inserting thousands of entries into a table in a single shot. This results in huge performance improvements in some scenarios. Is there anything similar in PostgreSQL?
I am using the stock PostgreSQL C API.
Some pseudo code to illustrate what I have in mind:
stmt = con->prepare("INSERT INTO mytable VALUES ($1, $2, $3)");
pg_c_api_array arr(stmt);
for triplet(a, b, c) in mylongarray:
pg_c_api_variant var = arr.add();
var.bind(1, a);
var.bind(2, b);
var.bind(3, c);
stmt->bindarray(arr);
stmt->exec()

PostgreSQL has similar functionality - statement COPY and COPY API - it is very fast
libpq documentation
const char *data = "10\t20\t40\n20\t30\t40";
pres = PQexec(pconn, "COPY mytable FROM stdin");
/* can be called repeatedly */
/* use strlen, not sizeof: data is a pointer (strlen needs <string.h>) */
copy_result = PQputCopyData(pconn, data, (int) strlen(data));
if (copy_result != 1)
{
fprintf(stderr, "Copy to target table failed: %s\n",
PQerrorMessage(pconn));
EXIT;
}
if (PQputCopyEnd(pconn, NULL) == -1)
{
fprintf(stderr, "Copy to target table failed: %s\n",
PQerrorMessage(pconn));
EXIT;
}
pres = PQgetResult(pconn);
if (PQresultStatus(pres) != PGRES_COMMAND_OK)
{
fprintf(stderr, "Copy to target table failed:%s\n",
PQerrorMessage(pconn));
EXIT;
}
PQclear(pres);
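The buffer handed to PQputCopyData above is plain COPY text: tab-separated fields, one row per line. As a rough sketch of how such a buffer could be assembled from in-memory rows, the Python helpers below escape the characters that COPY text format treats specially and emit \N for NULL. The names copy_escape and rows_to_copy_text are hypothetical and not part of libpq or any driver.

```python
def copy_escape(value):
    """Render one field in COPY text format; None becomes the \\N null marker."""
    if value is None:
        return "\\N"
    text = str(value)
    # Backslash is escaped first so the escapes added afterwards are not doubled.
    for raw, escaped in (("\\", "\\\\"), ("\t", "\\t"), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(raw, escaped)
    return text


def rows_to_copy_text(rows):
    """Join rows into tab-separated, newline-terminated COPY text."""
    return "".join("\t".join(copy_escape(v) for v in row) + "\n" for row in rows)


payload = rows_to_copy_text([(10, 20, 40), (20, 30, None), (1, "a\tb", 3)])
print(repr(payload))
```

The resulting string can be sent in one call or in several chunks, since the copy-data call may be repeated before the copy is ended.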

As Pavel Stehule points out, there is the COPY command and, when using libpq in C, associated functions for transmitting the copy data. I haven't used these. I mostly program against PostgreSQL in Python, and have used similar functionality from psycopg2. It's extremely simple:
conn = psycopg2.connect(CONN_STR)
cursor = conn.cursor()
f = open('data.tsv')
cursor.copy_from(f, 'incoming')
f.close()
In fact I've often replaced open with a file-like wrapper object that performs some basic data cleaning first. It's pretty seamless.
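A minimal sketch of what such a cleaning wrapper might look like, assuming tab-separated input with three fields per row. The helper names cleaned_lines and cleaned_file and the cleaning rules themselves are made up for illustration; this version buffers everything in memory, so a streaming wrapper would be a better fit for very large files.

```python
import io


def cleaned_lines(lines, expected_fields=3, sep="\t"):
    """Yield normalised lines, skipping blanks and rows with the wrong field count."""
    for line in lines:
        fields = [field.strip() for field in line.rstrip("\r\n").split(sep)]
        if len(fields) != expected_fields or not any(fields):
            continue
        yield sep.join(fields) + "\n"


def cleaned_file(source, **kwargs):
    """Return an in-memory file-like object holding the cleaned rows."""
    return io.StringIO("".join(cleaned_lines(source, **kwargs)))


raw = io.StringIO("1\t alice \t10\r\n\nbroken line\n2\tbob\t20\n")
f = cleaned_file(raw)
print(f.read())
```

The object returned by cleaned_file supports read() and readline(), so it could be passed to copy_from in place of the opened file.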

I like this way of creating thousands of rows in a single command:
INSERT INTO mytable VALUES (UNNEST($1), UNNEST($2), UNNEST($3));
Bind an array of the values of column 1 to $1, an array of the values of column 2 to $2, etc. Providing the values in columns may seem a bit strange at first when you are used to thinking in rows.
You need PostgreSQL >= 8.4 for UNNEST or your own function to convert arrays into sets.
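If the data starts out as rows, it has to be transposed into one array per column before binding. A small sketch in Python, assuming the rows are equal-length tuples; rows_to_columns is a hypothetical helper name, and how the resulting lists are bound to $1, $2, $3 depends on the client library.

```python
def rows_to_columns(rows):
    """Transpose a list of equal-length row tuples into one list per column."""
    rows = list(rows)
    if not rows:
        return []
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of values")
    return [list(column) for column in zip(*rows)]


triplets = [(1, "a", 1.5), (2, "b", 2.5), (3, "c", 3.5)]
col1, col2, col3 = rows_to_columns(triplets)
print(col1, col2, col3)
```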

Related

Inserting many rows causes locking conflicts with Hibernate and Postgres, leaving the table empty

We are benchmarking some queries to see if they will still work reliably for "a lot of" data. (1 million isn't that much to be honest, but Postgres already fails here, so it evidently is.)
Our Java code to call these queries looks something like this:
@PersistenceContext
private EntityManager em;
@Resource
private UserTransaction utx;
for (int i = 0; i < 20; i++) {
this.utx.begin();
for (int inserts = 0; inserts < 50_000; inserts ++) {
em.createNativeQuery(SQL_INSERT).executeUpdate();
}
this.utx.commit();
for (int parameter = 0; parameter < 25; parameter ++) {
long time = System.currentTimeMillis();
Assert.assertNotNull(this.em.createNativeQuery(SQL_SELECT).getResultList());
System.out.println(i + " iterations \t" + parameter + "\t" + (System.currentTimeMillis() - time) + "ms");
}
}
Or with plain JDBC:
Connection connection = //...
for (int i = 0; i < 20; i++) {
for (int inserts = 0; inserts < 50_000; inserts ++) {
try (Statement statement = connection.createStatement();) {
statement.execute(SQL_INSERT);
}
}
for (int parameter = 0; parameter < 25; parameter ++) {
long time = System.currentTimeMillis();
try (Statement statement = connection.createStatement();) {
statement.execute(SQL_SELECT);
}
System.out.println(i + " iterations \t" + parameter + "\t" + (System.currentTimeMillis() - time) + "ms");
}
}
The queries we tried were a simple INSERT into a table with JSON and an INSERT over two tables with about 25 lines. The SELECT has one or two JOINs and is pretty easy. One set of queries is (I had to anonymize the SQL else I wouldn't have been allowed to post it):
CREATE TABLE ts1.p (
id integer NOT NULL,
CONSTRAINT p_pkey PRIMARY KEY ("id")
);
CREATE TABLE ts1.m(
pId integer NOT NULL,
mId character varying(100) NOT NULL,
a1 character varying(50),
a2 character varying(50),
CONSTRAINT m_pkey PRIMARY KEY (pId, mId)
);
CREATE SEQUENCE ts1.seq_p;
/*
* SQL_INSERT
*/
WITH p AS (
INSERT INTO ts1.p (id)
VALUES (nextval('ts1.seq_p'))
RETURNING id AS pId
)
INSERT INTO ts1.m(pId, mId, a1, a2)
VALUES ((SELECT pId from p), 'M1', '11', '12'),
((SELECT pId from p), 'M2', '13', '14'),
/* ... about 20 to 25 rows of values */
/*
* SQL_SELECT
*/
WITH userInput (mId, a1, a2) AS (
VALUES
('M1', '11', '11'),
('M2', '12', '15'),
/* ... about "parameter" rows of values */
)
SELECT m.pId, COUNT(m.a1) AS matches
FROM userInput u
LEFT JOIN ts1.m m ON (m.mId) = (u.mId)
WHERE (m.a1 IS NOT DISTINCT FROM u.a1) AND
(m.a2 IS NOT DISTINCT FROM u.a2) OR
(m.a1 IS NULL AND m.a2 IS NULL)
GROUP BY m.pId
/* plus HAVING, additional WHERE clauses etc. according to the use case, but that just speeds up the query */
When executing, we get the following output (the values are supposed to rise steadily and linearly):
271ms
414ms
602ms
820ms
995ms
1192ms
1396ms
1594ms
1808ms
1959ms
110ms
33ms
14ms
10ms
11ms
10ms
21ms
8ms
13ms
10ms
As you can see, after some value (usually at around 300,000 to 500,000 inserts) the time needed for the query drops significantly. Sadly we can't really debug what the result is at that point (other than that it's not null), but we assume it's an empty list, because the database tables are empty.
Let me repeat that: After half a million INSERTS, Postgres clears tables.
Of course that's not acceptable at all.
We tried different queries, all of easy to medium difficulty, and all produced this behavior, so we assume it's not the queries.
We thought that maybe the sequence returned a value too high for an integer column, so we dropped and recreated the sequence.
Once there was this exception:
org.postgresql.util.PSQLException : FEHLER: Verklemmung (Deadlock) entdeckt
Detail: Prozess 1620 wartet auf AccessExclusiveLock-Sperre auf Relation 2001098 der Datenbank 1937678; blockiert von Prozess 2480.
Which I'm entirely unable to translate. I guess it's something like:
org.postgresql.util.PSQLException : ERROR: Jamming? Clamping? Constipation? (Deadlock) found
But I don't think this error has anything to do with the clearing of the table. We just tested against the wrong database, so multiple queries were run on the same table. Normally we have one database per benchmark test.
Of course it's important that we find out what the error is, so that we can decide if there is any risk to our customers losing their data (because again, on error the database empties some table of its choice).
Postgres version: PostgreSQL 10.6, compiled by Visual C++ build 1800, 64-bit
We tried PostgreSQL 9.6.11, compiled by Visual C++ build 1800, 64-bit, too. And we never had the same problem there (even though that could just be luck, since it's not 100% reproducible).
Do you have any idea what the error is? Or how we could debug it? The entire benchmark test runs for an hour, so there is no immediate feedback.

very large fields in As400 ISeries database

I would like to save a large XML string (possibly longer than 32K or 64K) into an AS400 file field. Either DDS or SQL files would be OK. Example of SQL file below.
CREATE TABLE MYLIB/PRODUCT
(PRODCODE DEC (5 ) NOT NULL WITH DEFAULT,
PRODDESC CHAR (30 ) NOT NULL WITH DEFAULT,
LONGDESC CLOB (70K ) ALLOCATE(1000) NOT NULL WITH DEFAULT)
We would use RPGLE to read and write to fields.
The goal is to then pull out data via ODBC connection on a client side.
AS400 character fields seem to have a 32K limit, so this is not a great option.
What options do I have? I have been reading up on CLOBs but there appear to be restrictions writing large strings to CLOBS and reading CLOB field remotely. Note that client is (still) on v5R4 of AS400 OS.
thanks!
Charles' answer below shows how to extract data. I would like to insert data. This code runs, but throws a '22501' SQL error.
D wLongDesc s 65531a varying
D longdesc s sqltype(CLOB:65531)
/free
//eval longdesc = *ALL'123';
eval Wlongdesc = '123';
exec SQL
INSERT INTO PRODUCT (PRODCODE, PRODDESC, LONGDESC)
VALUES (123, 'Product Description', :LongDesc );
if %subst(sqlstt:1:2) <> '00';
// an error occurred.
endif;
// get length explicitly, variables are setup by pre-processor
longdesc_len = %len(%trim(longdesc_data));
wLongDesc = %subst(longdesc_data:1:longdesc_len);
/end-free
C Eval *INLR = *on
C Return
Additional question: Is this technique suitable for storing data which I want to extract via ODBC connection later? Does ODBC read CLOB as pointer or can it pull out text?
At v5r4, RPGLE actually supports 64K character variables.
However, the DB is limited to 32K for regular char/varchar fields.
You'd need to use a CLOB for anything bigger than 32K.
If you can live with 64K (or so):
CREATE TABLE MYLIB/PRODUCT
(PRODCODE DEC (5 ) NOT NULL WITH DEFAULT,
PRODDESC CHAR (30 ) NOT NULL WITH DEFAULT,
LONGDESC CLOB (65531) ALLOCATE(1000) NOT NULL WITH DEFAULT)
You can use RPGLE SQLTYPE support
D code S 5s 0
d wLongDesc s 65531a varying
D longdesc s sqltype(CLOB:65531)
/free
exec SQL
select prodcode, longdesc
into :code, :longdesc
from mylib/product
where prodcode = :mykey;
wLongDesc = %subst(longdesc_data:1:longdesc_len);
DoSomthing(wLongDesc);
The pre-compiler will replace longdesc with a DS defined like so:
D longdesc ds
D longdesc_len 10u 0
D longdesc_data 65531a
You could simply use it directly, making sure to only use up to longdesc_len, or convert it to a VARYING as I've done above.
If you absolutely must handle larger than 64K...
1. Upgrade to a supported version of the OS (16MB variables supported)
2. Access the CLOB contents via an IFS file using a file reference
Option 2 is one I've never seen used....and I can't find any examples. Just saw it mentioned in this old article..
http://www.ibmsystemsmag.com/ibmi/developer/general/BLOBs,-CLOBs-and-RPG/?page=2
This example shows how to write to a CLOB field in Db2 database... with help from Charles and Mr Murphy's feedback.
* ----------------------------------------------------------------------
* Create table with CLOB:
* CREATE TABLE MYLIB/PRODUCT
* (MYDEC DEC (5 ) NOT NULL WITH DEFAULT,
* MYCHAR CHAR (30 ) NOT NULL WITH DEFAULT,
* MYCLOB CLOB (65531) ALLOCATE(1000) NOT NULL WITH DEFAULT)
* ----------------------------------------------------------------------
D PRODCODE S 5i 0
D PRODDESC S 30a
D i S 10i 0
D wLongDesc s 65531a varying
D longdesc s sqltype(CLOB:65531)
D* Note that variables longdesc_data and longdesc_len
D* are created automatically by the SQL pre-processor.
/free
eval wLongdesc = '123';
longdesc_data = wLongDesc;
longdesc_len = %len(%trim(wLongDesc));
exec SQL set option commit = *none;
exec SQL
INSERT INTO PRODUCT (MYDEC, MYCHAR, MYCLOB)
VALUES (123, 'Product Description',:longDesc);
if %subst(sqlstt:1:2)<>'00' ;
// an error occurred.
endif;
Eval *INLR = *on;
Return;
/end-free

Can't get mysql_query to work, mysql_error displays nothing

Recently converted from mysql to mysqli and things were working fine. Connect, select still work fine but now one query fails. Here is the basic code:
Connect to database ($con) - successful.
mysqli_query to select some data from table1 (fn1, ln1, yr1) - successful.
Table data goes to $fn, $ln, $yr after mysql_fetch_array - successful.
Use the data to form an insert:
$sql = "insert into table2 (fn2, ln2, yr2) values ('$fn', '$ln', '$yr')";
mysql-query($con, $sql) or die ("Insert failed: " . mysqli_error($con));
The query fails with the "Insert failed" message, but mysql_error gives no reason.
What have I missed?
I tried it and it works; the insert is fine:
$link = new mysqli($dbhost, $dbuser, $dbpass, $dbname);
$qry = "insert into reputazione (iduser,star,votante,commento)
values(10,5,\"pippo\",\"commento\")";
mysqli_query($link, $qry);
if (mysqli_affected_rows($link) > 0) {
echo "ok!";
}
Okay, I solved the problem by coding all the mysqli commands in-line and not calling functions and passing the sql statements to them. I had functions for connecting to the DB, selecting from one table, inserting into another table and then deleting from the first table.

Psycopg2 insert python dictionary in postgres database

In python 3+, I want to insert values from a dictionary (or pandas dataframe) into a database. I have opted for psycopg2 with a postgres database.
The problem is that I cannot figure out the proper way to do this. I can easily concatenate a SQL string to execute, but the psycopg2 documentation explicitly warns against this. Ideally I wanted to do something like this:
cur.execute("INSERT INTO table VALUES (%s);", dict_data)
and hoped that the execute could figure out that the keys of the dict matches the columns in the table. This did not work. From the examples of the psycopg2 documentation I got to this approach
cur.execute("INSERT INTO table (" + ", ".join(dict_data.keys()) + ") VALUES (" + ", ".join(["%s" for pair in dict_data]) + ");", dict_data)
from which I get a
TypeError: 'dict' object does not support indexing
What is the most Pythonic way of inserting a dictionary into a table with matching column names?
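Two solutions:
from psycopg2.extensions import AsIs
d = {'k1': 'v1', 'k2': 'v2'}
cursor = conn.cursor()
# Solution 1: pass the column list with AsIs and the values as a tuple
insert = 'insert into table (%s) values %s'
l = [(c, v) for c, v in d.items()]
columns = ','.join([t[0] for t in l])
values = tuple([t[1] for t in l])
# mogrify returns bytes on Python 3, hence the decode()
print(cursor.mogrify(insert, ([AsIs(columns)] + [values])).decode())
# Solution 2: build named placeholders from the keys
keys = d.keys()
columns = ','.join(keys)
values = ','.join(['%({})s'.format(k) for k in keys])
insert = 'insert into table ({0}) values ({1})'.format(columns, values)
print(cursor.mogrify(insert, d).decode())
Output (on Python 3.7+, where dicts keep insertion order):
insert into table (k1,k2) values ('v1', 'v2')
insert into table (k1,k2) values ('v1','v2')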
I sometimes run into this issue, especially with respect to JSON data, which I naturally want to deal with as a dict. Very similar. . .But maybe a little more readable?
def do_insert(rec: dict):
    cols = rec.keys()
    cols_str = ','.join(cols)
    vals = [ rec[k] for k in cols ]
    vals_str = ','.join( ['%s' for i in range(len(vals))] )
    sql_str = """INSERT INTO some_table ({}) VALUES ({})""".format(cols_str, vals_str)
    cur.execute(sql_str, vals)
I typically call this type of thing from inside an iterator, and usually wrapped in a try/except. Either the cursor (cur) is already defined in an outer scope or one can amend the function signature and pass a cursor instance in. I rarely insert just a single row. . .And like the other solutions, this allows for missing cols/values provided the underlying schema allows for it too. As long as the dict underlying the keys view is not modified as the insert is taking place, there's no need to specify keys by name as the values will be ordered as they are in the keys view.
[Suggested answer/workaround - better answers are appreciated!]
After some trial/error I got the following to work:
sql = "INSERT INTO table (" + ", ".join(dict_data.keys()) + ") VALUES (" + ", ".join(["%("+k+")s" for k in dict_data]) + ");"
This gives the sql string
"INSERT INTO table (k1, k2, ... , kn) VALUES (%(k1)s, %(k2)s, ... , %(kn)s);"
which may be executed by
with psycopg2.connect(database='deepenergy') as con:
    with con.cursor() as cur:
        cur.execute(sql, dict_data)
Pros/cons?
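One drawback of building the column list by string concatenation is that the dictionary keys end up in the SQL text unescaped, so they need to be trusted or checked. The sketch below is one possible way to guard against that, assuming column and table names are plain ASCII identifiers; build_named_insert and the identifier pattern are illustrative choices, not part of psycopg2. The psycopg2.sql module offers its own tools for composing identifiers and may be the better choice in real code.

```python
import re

# Assumption: only simple unquoted identifiers are allowed.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_named_insert(table, data):
    """Return an INSERT statement with %(name)s placeholders for the keys of data."""
    names = [table, *data.keys()]
    bad = [name for name in names if not _IDENTIFIER.fullmatch(name)]
    if bad:
        raise ValueError("unsafe identifier(s): {}".format(bad))
    columns = ", ".join(data)
    placeholders = ", ".join("%({})s".format(key) for key in data)
    return "INSERT INTO {} ({}) VALUES ({});".format(table, columns, placeholders)


sql = build_named_insert("some_table", {"k1": "v1", "k2": "v2"})
print(sql)
```

The returned string would then be executed together with the same dictionary, so the values themselves are still passed as parameters rather than interpolated.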
Using %(name)s placeholders may solve the problem:
dict_data = {'key1':val1, 'key2':val2}
cur.execute("""INSERT INTO table (field1, field2)
VALUES (%(key1)s, %(key2)s);""",
dict_data)
you can find the usage in psycopg2 doc Passing parameters to SQL queries
Here is another solution inserting a dictionary directly
Product Model (has the following database columns)
name
description
price
image
digital - (defaults to False)
quantity
created_at - (defaults to current date)
Solution:
data = {
"name": "product_name",
"description": "product_description",
"price": 1,
"image": "https",
"quantity": 2,
}
cur = conn.cursor()
cur.execute(
"INSERT INTO products (name,description,price,image,quantity) "
"VALUES(%(name)s, %(description)s, %(price)s, %(image)s, %(quantity)s)", data
)
conn.commit()
conn.close()
Note: The columns to be inserted are specified in the execute statement: INTO products (column names to be filled) VALUES ..., followed by data, the dictionary. Because the %(name)s placeholders are matched by key, the order of the keys in the dictionary does not matter.

Psycopg2 copy_from throws DataError: invalid input syntax for integer

I have a table with some integer columns. I am using psycopg2's copy_from
conn = psycopg2.connect(database=the_database,
                        user="postgres",
                        password=PASSWORD,
                        host="",
                        port="")
print("Putting data in the table: Opened database successfully")
cur = conn.cursor()
with open(the_file, 'r') as f:
    cur.copy_from(file=f, table=the_table, sep=the_delimiter)
conn.commit()
print("Successfully copied all data to the database!")
conn.close()
The error says that it expects the 8th column to be an integer and not a string. But Python's write method can only write strings to the file. So how would you import a file full of string representations of numbers into a Postgres table whose columns expect integers, when the file can only hold the character representation of each integer (e.g. str(your_number))?
You either have to write numbers in integer format to the file (which Python's write method disallows), or psycopg2 should be smart enough to do the conversion as part of the copy_from procedure, which it apparently is not. Any idea is appreciated.
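COPY input is text in any case, so digits stored as characters are what the server expects; a DataError like this usually points at one specific value that does not parse as an integer. Which value that is cannot be known from the question, so the sketch below simply scans a file for candidates such as a header row, an empty field, or a decimal. The helper find_bad_integer_fields is hypothetical, and Python's int() is only an approximation of PostgreSQL's integer parser.

```python
def find_bad_integer_fields(lines, sep, column_index, null="\\N"):
    """Return (line_number, value) pairs whose field is not a plain integer."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split(sep)
        if column_index >= len(fields):
            problems.append((line_number, None))  # row is too short
            continue
        value = fields[column_index]
        if value == null:
            continue  # COPY's default NULL marker in text format
        try:
            int(value)
        except ValueError:
            problems.append((line_number, value))
    return problems


sample = ["id,name,qty\n", "1,apple,3\n", "2,pear,\n", "3,plum,4.0\n"]
print(find_bad_integer_fields(sample, ",", 2))
```

For the table in the question, column_index would be 7 (the 8th column, counting from zero), and the lines would come from the opened file.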
I ended up using the copy_expert command. Note that on Windows, you have to set the permissions of the file; this post is very useful for setting permissions.
with open(the_file, 'r') as f:
    sql_copy_statement = "copy {table} FROM '{from_file}' DELIMITER '{deli}' {file_type} HEADER;".format(
        table=the_table,
        from_file=the_file,
        deli=the_delimiter,
        file_type=the_file_type,
    )
    print(sql_copy_statement)
    cur.copy_expert(sql_copy_statement, f)
conn.commit()
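A statement of the form COPY ... FROM 'file' makes the server process open the file, which is why the file permissions matter. A possible alternative, sketched here under the assumption that the input is CSV, is to build a COPY ... FROM STDIN statement so that the already opened file object is streamed from the client instead. The helper build_copy_from_stdin and its identifier check are illustrative only.

```python
import re

# Assumption: the table name is a simple unquoted identifier.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_copy_from_stdin(table, delimiter=",", header=True):
    """Return a COPY ... FROM STDIN statement for CSV input."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError("unsafe table name: {!r}".format(table))
    if len(delimiter) != 1:
        raise ValueError("COPY expects a single-character delimiter")
    quoted = delimiter.replace("'", "''")  # double any single quote for SQL
    return "COPY {} FROM STDIN WITH (FORMAT csv, HEADER {}, DELIMITER '{}')".format(
        table, "true" if header else "false", quoted
    )


print(build_copy_from_stdin("the_table", delimiter=";"))
```

The resulting statement would be passed to copy_expert together with the open file object, in the same way as the statement above.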