When streaming large volumes of data out of PostgreSQL into C#, using Npgsql, does the command default to Single Row Mode or will it extract the entire result set before returning? I can find no mention of Single Row Mode in the Npgsql documentation, and nothing in the source code to suggest that it is optional one way or the other.
When Npgsql sends the SQL query you give it, PostgreSQL will immediately send back all the rows. If you pass CommandBehavior.SingleRow (or SingleResult) to NpgsqlCommand.ExecuteReader, Npgsql will simply not return those rows to the user; it will consume them internally, but they are still sent from the server. In other words, if you expect these options to reduce the network bandwidth used, that won't work; your only way to do that is to limit the resultset in the SQL itself, via a LIMIT clause. This is in general a better idea anyway.
See https://github.com/npgsql/npgsql/issues/410 for a bit more detail on why we didn't implement something more aggressive.
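As a rough illustration of limiting the result set in the SQL itself, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a server; the same LIMIT idea applies to a query sent through Npgsql. The events table and the first_rows helper are hypothetical names, not part of Npgsql.

```python
import sqlite3

# Assumption: sqlite3 stands in for PostgreSQL so the sketch is self-contained.
# The table and column names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [(f"row-{i}",) for i in range(1000)],
)


def first_rows(connection, n):
    """Ask the database for at most n rows instead of discarding extras client-side."""
    cur = connection.execute(
        "SELECT id, payload FROM events ORDER BY id LIMIT ?", (n,)
    )
    return cur.fetchall()


rows = first_rows(conn, 1)
print(rows)  # prints [(1, 'row-0')]
```

With the LIMIT in the query text, the database only produces the rows that are wanted, so nothing extra has to cross the wire and be thrown away by the driver.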
From my experience, the default in Npgsql is to get a cursor for the result set that fetches rows as you process them: each call to reader.Read() brings a row from the server to the driver. There might be some buffering taking place, but streaming the result is the norm.
My application is running 200 select statements per second (like SELECT A, B, C FROM DUMMYSC.DUMMYTB, etc.). 10-15% of the queries fail with the error below:
DB2 SQL Error: SQLCODE=-913, SQLSTATE=57033, SQLERRMC=00C9008E;00000304;DSNDB06 .SYSTSTSS.X'000001C5'.X'0C'
I'm looking to use one of the solutions below, but I'm unable to understand the difference between the two.
ResultSet.CONCUR_READ_ONLY in
statement = connection.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
and
FOR FETCH ONLY in SELECT A, B, C FROM DUMMYSC.DUMMYTB FOR FETCH ONLY.
FOR FETCH ONLY (also known as FOR READ ONLY) prevents the cursor from being used in a positioned UPDATE or positioned DELETE statement (i.e. UPDATE ... WHERE CURRENT OF cursor-name, or DELETE ... WHERE CURRENT OF cursor-name).
At the JDBC level on the client, the ResultSet concurrency option determines whether the Java code can update the result set contents or not. If you do not need the cursor to be scrollable, then don't use TYPE_SCROLL_*; use TYPE_FORWARD_ONLY instead, as that should improve concurrency. CONCUR_READ_ONLY and FOR FETCH ONLY work together.
Sometimes it's best to ensure a plan-specific isolation level by using a WITH CS or WITH UR clause on the query, instead of depending on the package isolation or some default that you don't control.
For Db2 for z/OS, if your application can cope with incomplete results (i.e. if that makes business sense), then you can use SKIP LOCKED DATA in your query. For Db2 for Linux/Unix/Windows, other registry settings and special register settings are available to get similar behaviour.
There's also the USE AND KEEP...LOCKS syntax in the isolation clause of the query, which influences the duration of locks.
I cannot tell from your question whether the result set is read-only by nature (for example, if the query is against a read-only view), or how your Java code runs the query (via a prepared statement or not); both influence the outcome.
A DBA will be able to show you exactly what locks your transaction is taking for a specific combination of JDBC cursor/ResultSet settings and query syntax.
The information you posted is not enough to decide what caused the timeout on the table space access. It could be other SQL statements holding the lock, some of these 200 statements attempting an update, or something else.
But if you know for sure that you don't need to update the data in your SQL and dirty reads are acceptable, then you should specify "FOR READ ONLY WITH UR" in your query. This not only avoids any potential timeout caused by other SQL statements but also lowers the resource overhead and improves system performance.
I am using Azure PostgreSQL, and I have a lot of files saved as the bytea data type in a table. In my project, I will execute some SQL queries to get these files.
Sometimes a query will involve multiple files, so the result of the query will be large. My questions: is there a size limit on the result of a single SQL query? Should I impose a limit here? Any suggestion is appreciated.
There is no limit for the size of a result set in PostgreSQL.
However, many clients cache the whole result set in memory, which can easily lead to an out-of-memory condition on the client side.
There are ways around that:
Use cursors and fetch the result row by row or in batches. That should work with any client API.
With the C API (libpq), you could activate single-row mode.
With JDBC, you could set the fetch size.
Note that this means that you could get a runtime error from the database server in the middle of processing a result set.
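To make the "fetch in batches" option concrete, here is a minimal sketch in Python. It uses the standard DB-API fetchmany call with the built-in sqlite3 module as a stand-in, so it runs without a PostgreSQL server; with a real PostgreSQL driver you would additionally need a server-side cursor for the batching to reach the server. The files table and the iter_batches helper are hypothetical.

```python
import sqlite3


def iter_batches(cursor, batch_size):
    """Yield lists of at most batch_size rows until the result is exhausted."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield batch


# Assumption: sqlite3 stands in for the real database; names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, content BLOB)")
conn.executemany(
    "INSERT INTO files (content) VALUES (?)",
    [(bytes([i % 256]) * 1024,) for i in range(250)],
)

total_bytes = 0
batch_sizes = []
cur = conn.execute("SELECT id, content FROM files ORDER BY id")
try:
    for batch in iter_batches(cur, 100):
        # Only one batch of file contents is held in memory at a time.
        batch_sizes.append(len(batch))
        total_bytes += sum(len(content) for _, content in batch)
except sqlite3.Error as exc:
    # An error can surface mid-stream, after earlier batches were already handled.
    print(f"query failed after {sum(batch_sizes)} rows: {exc}")
    raise

print(batch_sizes, total_bytes)  # prints [100, 100, 50] 256000
```

The try/except around the loop reflects the caveat above: when results arrive incrementally, the code that consumes them has to decide what to do with the rows it already processed if a later fetch fails.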
I am working on a project which uses graphql and PostgreSQL where we want to select data from the database with a value after a certain date. It is currently selecting all data from the database and then filtering it on the server:
.filter(({time}) => moment(time).isAfter(startTime))
However I would have thought it would be best to do this filtering in the database query as the full dataset is never used.
Is there any benefit to doing it on the server rather than in the database query?
Barring some unusual edge case -- such as other parts of your backend code really do need all the data for some reason -- it would definitely be more efficient to filter everything on the Postgres side via the SQL that is being used to fetch the data in the first place.
This is true for several reasons:
Assuming the table is properly indexed, the filtering will be able to occur much faster within the database.
The unneeded data will not need to be serialized and sent over the wire to the backend, only to then be discarded by the backend's own filtering.
The memory footprint should be reduced on both the Postgres and server end due to needing to process only a portion of the results.
I've not worked with GraphQL myself, but from doing a bit of poking around through its docs, it appears GraphQL often uses other mechanisms in different layers (outside of the database) to try to improve performance.
It would be worth seeing what the actual SQL is that your GraphQL query is generating (that may be possible via a function in GraphQL; it could also be done by enabling certain log settings on the Postgres server and correlating the log output to the query). That may lead to further optimization possibilities if you want to keep things purely GraphQL.
Jumping down to a raw query seems like it would be a good possibility though. Certainly that is something that is often done with ORMs like Django and ActiveRecord.
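As a small sketch of what pushing the filter into the query could look like, the example below compares a parameterized WHERE clause with filtering the full table in application code. It uses Python's built-in sqlite3 module as a stand-in for Postgres so it runs on its own; the readings table, its index, and the readings_after helper are hypothetical names.

```python
import sqlite3
from datetime import datetime, timedelta

# Assumption: sqlite3 stands in for Postgres; timestamps are stored as ISO 8601
# text, which sorts chronologically when every value uses the same format.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, time TEXT, value REAL)")
conn.execute("CREATE INDEX readings_time_idx ON readings (time)")

base = datetime(2024, 1, 1)
conn.executemany(
    "INSERT INTO readings (time, value) VALUES (?, ?)",
    [((base + timedelta(days=i)).isoformat(), float(i)) for i in range(10)],
)


def readings_after(connection, start_time):
    """Let the database do the filtering, so only matching rows are returned."""
    cur = connection.execute(
        "SELECT id, time, value FROM readings WHERE time > ? ORDER BY time",
        (start_time.isoformat(),),
    )
    return cur.fetchall()


start = datetime(2024, 1, 7)
in_db = readings_after(conn, start)

# For comparison: fetch everything, then filter in application code.
all_rows = conn.execute(
    "SELECT id, time, value FROM readings ORDER BY time"
).fetchall()
in_app = [row for row in all_rows if datetime.fromisoformat(row[1]) > start]

print(len(in_db), in_db == in_app)  # prints 3 True
```

Both approaches return the same rows, but the second one transfers all ten rows before discarding seven of them; on a large table that difference is what the reasons listed above are about.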
For a simple select query like select column_name from table_name on a very large table, is it possible to have the output being provided as the scan of the table progresses?
If I abort the command after sometime, I expect to get output from the select at least thus far.
Think cat, which I believe won't wait till it completes the full read of the file.
Does MySQL or other RDBMS systems support this?
PostgreSQL always streams the result to the client, and usually it is the client library that collects the whole result set before returning it to the user.
The C API libpq has functionality that supports this. The main disadvantage of this approach is that you could get a runtime error after you have already received some rows, so that's a case you'd have to handle.
The traditional way to receive a query result in parts is to use a cursor and fetch results from it. This is a technique supported by all client APIs.
Cursors are probably what you are looking for, and they are supported by all RDBMS I know in some fashion.
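To illustrate the "cat-like" behaviour asked about, here is a minimal sketch in Python that consumes a cursor lazily and stops after a few rows. It uses the built-in sqlite3 module as a stand-in so it runs without a server; whether rows are actually produced incrementally depends on the database and the client library, as noted above. The big_table name and the stream_column helper are hypothetical.

```python
import sqlite3
from itertools import islice

# Assumption: sqlite3 stands in for the real RDBMS; names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (column_name INTEGER)")
conn.executemany(
    "INSERT INTO big_table VALUES (?)", [(i,) for i in range(100_000)]
)


def stream_column(connection):
    """Yield values one at a time as the cursor is stepped, without fetchall()."""
    cur = connection.execute("SELECT column_name FROM big_table")
    for (value,) in cur:
        yield value


# "Abort" after five rows: the remaining rows are never fetched by this code.
seen = list(islice(stream_column(conn), 5))
print(len(seen))  # prints 5
```

Because the generator only advances the cursor when the caller asks for the next value, stopping early leaves the rest of the scan unread, and everything received up to that point is still available.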
I'm looking to send multiple read queries to a Postgres database in order to reduce the number of trips that need to be made to a painfully remote database. Is there anything in libpq that supports this behavior?
Yes, you can use the asynchronous handling functions in libpq. On the linked page it says:
Using PQsendQuery and PQgetResult solves one of PQexec's problems: If
a command string contains multiple SQL commands, the results of those
commands can be obtained individually. (This allows a simple form of
overlapped processing, by the way: the client can be handling the
results of one command while the server is still working on later
queries in the same command string.)
For example, you should be able to call PQsendQuery with a string containing multiple queries, then repeatedly call PQgetResult to get the result sets. PQgetResult returns NULL when there are no more result sets to obtain.
If desired, you can also avoid your application blocking while it waits for these queries to execute (described in more detail on the linked page).