Does the MATLAB Interface to SQLite support concurrency? - matlab

SQLite as a database supports varying degrees of concurrency depending on version and settings, so I expected the MATLAB Interface to SQLite in the Database Toolbox to support some level of concurrency as well. And when a database access fails, it should at the very least report an error.
However, when I ran the following snippet,
conn=sqlite("sqlite_concurr_test.db","create");
conn.close();
ppool=parpool(4);
ff=parallel.Future.empty();
disp("Write 500 numbers")
for ii=1:500
ff(ii)=parfeval(ppool,@writeOne,0,ii); % queue one asynchronous write per value
end
for ii=1:500
ff(ii).wait()
end
delete(ppool);
conn=sqlite("sqlite_concurr_test.db");
readback=conn.sqlread("test");
disp("Readback "+num2str(size(readback,1))+" numbers");
function writeOne(ii)
conn=sqlite("sqlite_concurr_test.db");
conn.sqlwrite("test",array2table(ii));
conn.close();
end
I got the unexpected result of
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to the parallel pool (number of workers: 4).
Write 500 numbers
Parallel pool using the 'Processes' profile is shutting down.
Readback 74 numbers
This indicates that most of the database writes did not occur, yet no errors were reported.
What can I do to change this behavior? Is there anything I should do to make parallel access work reliably, or at the very least to be notified when something goes wrong?

Whatever the configuration of your SQLite database connection, you can only ever have one writer at a time. With WAL mode you can have a single writer running concurrently with multiple readers. In your code sample, whatever conn.sqlwrite("test",array2table(ii)); does internally, you never check whether it reports an error. So you probably do get errors on the concurrent writes, but you simply don't detect them.
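As a minimal sketch of how each worker could at least surface and retry failures, assuming the MATLAB interface signals a failed write by throwing an error (worth verifying for your release); writeOneChecked is a hypothetical replacement for writeOne:
function writeOneChecked(ii)
% Hypothetical variant of writeOne: retry a few times when the write fails,
% e.g. because another worker currently holds the SQLite write lock.
maxAttempts = 10;
for attempt = 1:maxAttempts
    try
        conn = sqlite("sqlite_concurr_test.db");
        conn.sqlwrite("test", array2table(ii));
        conn.close();
        return                        % success
    catch err
        try, conn.close(); catch, end % best effort: don't leave a dangling lock
        if attempt == maxAttempts
            rethrow(err)              % surface the failure instead of silently losing the row
        end
        pause(0.1 * attempt);         % back off a little before retrying
    end
end
end
Enabling WAL journal mode can help readers coexist with the single writer, but writes are still serialized, so the error check and retry is what keeps rows from being silently lost.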

Related

Does Firebird have "Group Commit"

As I understand it, this is where a background thread is responsible for writing transactions to disk in "careful write" order so that the user does not have to wait for the actual writing to disk to occur.
I have seen references to this (e.g. here) from a long time ago relating to InterBase, but I could not find it mentioned in relation to Firebird anywhere.
Using the gfix utility you can turn the FORCED WRITES flag on or off for a database file. When it is on, the server waits until the actual disk write occurs. When it is off, the server continues execution and leaves it to the OS to decide when to write the data to disk. Performance gains are up to 3x, but there is a possibility that some data will be written in the wrong order if a power failure occurs.
We strongly advise our customers to use a RAID controller with an independently powered (battery-backed) cache together with FORCED WRITES = ON.
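For reference, the gfix invocation looks roughly like this (check gfix -help for your Firebird version; mydb.fdb and any credentials you may need to add are placeholders):
gfix -write sync mydb.fdb     (turn forced writes on)
gfix -write async mydb.fdb    (turn forced writes off)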
Based on the comments on this thread and on searching online, it seems that Firebird does not have group commit.

Multiple connections to PostgreSQL with huge number of INSERTs

This question follows on from this one: How to speed up insertion performance in PostgreSQL
So, I have a Java application that does a lot of INSERTs (approximately a billion) into a PostgreSQL database. It opens several JDBC connections to the same DB to do these inserts in parallel. As I read in the question and answer mentioned above:
INSERT or COPY in parallel from several connections. How many depends on your hardware's disk subsystem; as a rule of thumb, you want one connection per physical hard drive if using direct attached storage.
But in my case I have only one disk for the DB storage.
So, my question is: does it really make sense to open several connections in this case? Could it reduce performance instead of increasing it, because the connections compete for I/O?
To clarify, here is a picture of the actual load on the PostgreSQL processes:
Since you mentioned doing the INSERTs from a Java application, I'd assume you're using plain JDBC and that COPY is not what you're looking for. Without using an API like JPA or a framework such as Spring Data, may I introduce addBatch() and executeBatch(), in case you haven't heard of them:
/*
the whole nine yards
*/
Connection c = ...;
PreparedStatement ps = c.prepareStatement("INSERT INTO table1 (columnInt2, columnVarchar) VALUES (?, ?)");
Then read data in a loop:
ps.setShort(1, someShortValue);
ps.setString(2, someStringValue);
ps.addBatch(); // one row at a time from human's perspective
When the data for all rows has been added:
ps.executeBatch();
May I also recommend:
Connection pooling, which saves you a lot of resources; check out Commons DBCP, c3p0 and BoneCP.
When doing multiple CUD (create, update, delete) operations, wrapping them in a transaction so you can roll back if any row goes wrong; see the sketch below.
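A minimal sketch combining the batch above with a transaction (table1 and its columns are the ones from the example; the Connection is assumed to come from whatever pool you pick):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsertExample {
    // Inserts a batch of rows inside a single transaction, rolling back if any row fails.
    static void insertBatch(Connection c, short[] ints, String[] strings) throws SQLException {
        boolean oldAutoCommit = c.getAutoCommit();
        c.setAutoCommit(false);                       // start a transaction
        try (PreparedStatement ps = c.prepareStatement(
                "INSERT INTO table1 (columnInt2, columnVarchar) VALUES (?, ?)")) {
            for (int i = 0; i < ints.length; i++) {
                ps.setShort(1, ints[i]);
                ps.setString(2, strings[i]);
                ps.addBatch();                        // queue the row
            }
            ps.executeBatch();                        // send all queued rows at once (driver permitting)
            c.commit();                               // make the whole batch durable in one go
        } catch (SQLException e) {
            c.rollback();                             // no partial batch left behind
            throw e;
        } finally {
            c.setAutoCommit(oldAutoCommit);
        }
    }
}
Batching inside one transaction also avoids paying a commit, and thus a disk sync, per row, which is usually the dominant cost of row-at-a-time INSERTs.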

PostgreSQL. Slow queries in log file are fast in psql

I have an application written on Play Framework 1.2.4 with Hibernate (default C3P0 connection pooling) and a PostgreSQL database (9.1).
Recently I turned on slow-query logging (>= 100 ms) in postgresql.conf and found some issues.
But when I tried to analyze and optimize one particular query, I found that it is blazing fast in psql (0.5 - 1 ms) in comparison to 200-250 ms in the log. The same thing happened with the other queries.
The application and database server are running on the same machine and communicate over the localhost interface.
JDBC driver - postgresql-9.0-801.jdbc4
I wonder what could be wrong, because the query duration in the log counts only database processing time, excluding external factors like network round trips.
Possibility 1: If the slow queries occur occasionally or in bursts, it could be checkpoint activity. Enable checkpoint logging (log_checkpoints = on), make sure the log level (log_min_messages) is 'info' or lower, and see what turns up. Checkpoints that are taking a long time or happening too often suggest you probably need some checkpoint/WAL and bgwriter tuning. This isn't likely to be the cause if the same statements are always slow and others always perform well.
Possibility 2: Your query plans are different because you're running them directly in psql, while Hibernate, via PgJDBC, will at least sometimes be doing a PREPARE and EXECUTE (at the protocol level, so you won't see the actual statements). To check this, compare query performance with PREPARE test_query(...) AS SELECT ..., then EXPLAIN ANALYZE EXECUTE test_query(...). The parameters in the PREPARE are type names for the positional parameters ($1, $2, etc.); the parameters in the EXECUTE are values.
If the prepared plan is different to the one-off plan, you can set PgJDBC's prepare threshold via connection parameters to tell it never to use server-side prepared statements.
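For example, via the connection URL (a sketch; the database name and credentials are placeholders, and my understanding is that prepareThreshold=0 tells the driver not to use server-side prepared statements, but check the PgJDBC documentation for your driver version):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class NoServerPrepare {
    static Connection connect() throws SQLException {
        // prepareThreshold=0 asks PgJDBC not to switch to server-side prepared
        // statements, so the plans should match what you see for plain SQL in psql.
        // Database name and credentials below are placeholders.
        String url = "jdbc:postgresql://localhost:5432/mydb?prepareThreshold=0";
        return DriverManager.getConnection(url, "user", "password");
    }
}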
This difference between the plans of prepared and unprepared statements should go away in PostgreSQL 9.2. It's been a long-standing wart, but Tom Lane dealt with it for the upcoming release.
It's very hard to say for sure without knowing all the details of your system, but I can think of a couple of possibilities:
The data is cached. If you run the same query twice in a short space of time, it will almost always complete much more quickly on the second pass, because PostgreSQL and the operating system keep recently read data in memory for just this purpose. If you are pulling the queries from the tail of your log and executing them immediately, this could be what's happening.
Other processes are interfering. The execution time for a query varies depending on what else is going on in the system. If the queries are taking 100ms during peak hour on your website when a lot of users are connected but only 1ms when you try them again late at night this could be what's happening.
The point is you are correct that the query duration isn't affected by which library or application is calling it, so the difference must be coming from something else. Keep looking, good luck!
There are several possible reasons. First, if the database was very busy when the slow queries executed, the queries may have run more slowly. So you may want to record the OS load at that moment for later analysis.
Second, the plan used when the query was logged may be different from the plan in your current session. So you may want to install auto_explain to log the actual plan of the slow query.
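If you go the auto_explain route, a minimal postgresql.conf sketch might look like this (the 100 ms threshold mirrors the slow-query threshold in the question; log_analyze adds per-query overhead):
shared_preload_libraries = 'auto_explain'   # requires a server restart
auto_explain.log_min_duration = '100ms'     # log plans of statements slower than this
auto_explain.log_analyze = on               # include actual timings and row counts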

How to use the same SQLite3 database from multiple Perl processes?

I have an unfortunate situation where multiple Perl processes write to and read from the same SQLite3 database at the same time.
This often causes Perl processes to crash, as two processes end up writing at the same time, or one process reads from the database while another tries to update the same record.
Does anyone know how I could coordinate the multiple processes to work with the same sqlite database?
I'll be working on moving this system to a different database engine but before I do that, I somehow need to fix it to work as it is.
SQLite is designed to be used from multiple processes. There are some exceptions if you host the SQLite file on a network drive, and there may be a way to compile it such that it expects to be used from one process, but I use it from multiple processes regularly. If you are experiencing problems, try increasing the timeout value. SQLite uses filesystem locks to protect the data from simultaneous access. If one process is writing to the file, a second process might have to wait. I set my timeouts to 3 seconds, and have had very few problems with that.
Here is the link to set the timeout value
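As a concrete sketch with DBD::SQLite (shared.db and the records table are made-up names; 3000 ms mirrors the 3-second timeout mentioned above):
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# RaiseError makes locking failures visible as exceptions instead of silent failures.
my $dbh = DBI->connect("dbi:SQLite:dbname=shared.db", "", "",
                       { RaiseError => 1, AutoCommit => 1 });

# Wait up to 3 seconds for a competing writer to release its lock
# before failing with a "database is locked" error.
$dbh->sqlite_busy_timeout(3000);

$dbh->do("INSERT INTO records (value) VALUES (?)", undef, 42);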

should i activate c3p0 statement pooling?

We are running a Java 6 / Hibernate / c3p0 / PostgreSQL stack.
Our JDBC driver is 8.4-701.jdbc3.
I have a few questions about prepared statements. I have read an excellent document about prepared statements, but I still have a question about how to configure c3p0 with PostgreSQL.
At the moment we have
c3p0.maxStatements = 0
c3p0.maxStatementsPerConnection = 0
In my understanding, prepared statements and statement pooling are two different things:
Our Hibernate stack uses prepared statements. PostgreSQL caches the execution plan, and the next time the same statement is used it reuses that plan. This saves the time spent planning statements inside the DB.
Additionally, c3p0 can cache Java instances of java.sql.PreparedStatement, which means it is caching the Java objects. So when using c3p0.maxStatementsPerConnection = 100 it caches at most 100 different objects. This saves time on creating objects, but it has nothing to do with the PostgreSQL database and its prepared statements.
Right?
As we use about 100 different statements, I would set
c3p0.maxStatementsPerConnection = 100
But the c3p0 docs say, under c3p0's known shortcomings:
The overhead of Statement pooling is too high. For drivers that do not perform significant preprocessing of PreparedStatements, the pooling overhead outweighs any savings. Statement pooling is thus turned off by default. If your driver does preprocess PreparedStatements, especially if it does so via IPC with the RDBMS, you will probably see a significant performance gain by turning Statement pooling on. (Do this by setting the configuration property maxStatements or maxStatementsPerConnection to a value greater than zero.)
So: is it reasonable to activate maxStatementsPerConnection with c3p0 and PostgreSQL?
Is there a real benefit activating it?
kind regards
Janning
I don't remember offhand if Hibernate actually stores PreparedStatement instances itself, or relies on the connection provider to reuse them. (A quick scan of BatcherImpl suggests it reuses the last PreparedStatement if executing the same SQL multiple times in a row)
I think the point that the c3p0 documentation is trying to make is that for many JDBC drivers, a PreparedStatement isn't useful: some drivers will end up simply splicing the parameters in client-side and then passing the built SQL statement to the database anyway. For these drivers, PreparedStatements are no advantage at all, and any effort to reuse them is wasted. (The PostgreSQL JDBC FAQ says this was the case for PostgreSQL before server protocol version 3, and there is more detailed information in the documentation.)
For drivers that do handle PreparedStatements usefully, it's still likely necessary to actually reuse PreparedStatement instances to get any benefit. For example if the driver implements:
Connection.prepareStatement(sql) - create a server-side statement
PreparedStatement.execute(..) etc - execute that server-side statement
PreparedStatement.close() - deallocate the server-side statement
Given this, if the application always opens a prepared statement, executes it once and then closes it again, there's still no benefit; in fact, it might be worse, since there are now potentially more round trips. So the application needs to hang on to PreparedStatement instances. Of course, this leads to another problem: if the application hangs on to too many, and each server-side statement consumes some resources, then this can lead to server-side issues. When someone is using JDBC directly, this might be managed by hand: some statements are known to be reusable and hence are prepared; the rest just use transient Statement instances instead. (This is skipping over the other benefit of prepared statements: handling argument escaping.)
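To illustrate the difference (a sketch only; lookupOnce, lookupReused and the users table are made-up names):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ReuseExample {
    // One-shot pattern: prepare, execute once, close. Any server-side statement
    // the driver created is thrown away immediately, so preparing buys nothing.
    static String lookupOnce(Connection c, int id) throws SQLException {
        try (PreparedStatement ps =
                 c.prepareStatement("SELECT name FROM users WHERE id = ?")) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }

    // Reuse pattern: the caller holds on to one PreparedStatement and executes it
    // many times; this is what lets server-side preparation (or a statement cache
    // such as c3p0's) actually pay off.
    static String lookupReused(PreparedStatement ps, int id) throws SQLException {
        ps.setInt(1, id);
        try (ResultSet rs = ps.executeQuery()) {
            return rs.next() ? rs.getString(1) : null;
        }
    }
}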
So this is why c3p0 and other connection pools also have prepared statement caches: it allows application code to avoid dealing with all this. The statements are usually kept in some limited LRU pool, so common statements reuse a PreparedStatement instance.
The final pieces of the puzzle are that JDBC drivers may themselves decide to be clever and do this; and servers may themselves also decide to be clever and detect a client submitting a statement that is structurally similar to a previous one.
Given that Hibernate doesn't itself keep a cache of PreparedStatement instances, you need to have c3p0 do that in order to get the benefit of them (which should mean reduced overhead for common statements, thanks to reusing cached plans). If c3p0 doesn't cache prepared statements, then the driver will just see the application preparing a statement, executing it, and then closing it again. It looks like the JDBC driver has a "threshold" setting for avoiding the prepare/execute server overhead in the case where the application always does this. So, yes, you need to have c3p0 do statement caching.
Hope that helps, sorry it's a bit long-winded. The answer is yes.
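For completeness, a programmatic configuration sketch (the JDBC URL and credentials are placeholders; the same properties can also be set in a c3p0.properties file):
import com.mchange.v2.c3p0.ComboPooledDataSource;

public class C3p0Config {
    static ComboPooledDataSource newPool() {
        ComboPooledDataSource cpds = new ComboPooledDataSource();
        cpds.setJdbcUrl("jdbc:postgresql://localhost:5432/mydb"); // placeholder URL
        cpds.setUser("app");                                      // placeholder credentials
        cpds.setPassword("secret");
        // Turn statement pooling on: cache up to 100 PreparedStatements per
        // pooled connection, matching the ~100 distinct statements in use.
        cpds.setMaxStatementsPerConnection(100);
        return cpds;
    }
}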
Remember that statements have to be cached per connection, which means you're going to consume quite a chunk of memory, and it will take a while before you see any benefit. So if you configure 100 cached statements, that's actually 100 * the number of connections (with maxStatementsPerConnection), or effectively 100 shared across the connections (with the global maxStatements), and it will still take quite some time before the cache has any meaningful effect.