I'm looking to send multiple read queries to a Postgres database in order to reduce the number of trips that need to be made to a painfully remote database. Is there anything in libpq that supports this behavior?
Yes, you can use the asynchronous handling functions in libpq. On the linked page it says:
Using PQsendQuery and PQgetResult solves one of PQexec's problems: If a command string contains multiple SQL commands, the results of those commands can be obtained individually. (This allows a simple form of overlapped processing, by the way: the client can be handling the results of one command while the server is still working on later queries in the same command string.)
For example, you should be able to call PQsendQuery with a string containing multiple queries, then repeatedly call PQgetResult to get the result sets. PQgetResult returns NULL when there are no more result sets to obtain.
If desired, you can also avoid your application blocking while it waits for these queries to execute (described in more detail on the linked page).
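A minimal sketch of that pattern, assuming a hypothetical connection string and table names (error handling abbreviated):

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        /* The conninfo string and table names are placeholders. */
        PGconn *conn = PQconnectdb("host=remote.example.com dbname=mydb");
        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* One round trip carrying two queries. */
        if (!PQsendQuery(conn, "SELECT * FROM orders; SELECT * FROM customers;")) {
            fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* PQgetResult returns one PGresult per statement, then NULL. */
        PGresult *res;
        while ((res = PQgetResult(conn)) != NULL) {
            if (PQresultStatus(res) == PGRES_TUPLES_OK)
                printf("result set with %d rows\n", PQntuples(res));
            else
                fprintf(stderr, "query failed: %s", PQresultErrorMessage(res));
            PQclear(res);
        }

        PQfinish(conn);
        return 0;
    }

Note that when multiple statements are sent in a single string like this, the server runs them in one implicit transaction unless the string itself contains explicit transaction-control commands.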
We have a use case where we access an RDS Postgres instance from a Java application. We want to run two SELECT queries, and we don't want to run them sequentially, to save latency. If query1's results are non-null and non-empty we'll use them for further processing; otherwise we'll use the results of query2.
I haven't found a proper resource that explains how to do this. Is it possible at all? Do I need to create two different sessions first and then start the async calls? As far as I understand, two queries cannot run together in one Postgres session.
Please guide me here.
When streaming large volumes of data out of PostgreSQL into C#, using Npgsql, does the command default to Single Row Mode or will it extract the entire result set before returning? I can find no mention of Single Row Mode in the Npgsql documentation, and nothing in the source code to suggest that it is optional one way or the other.
When Npgsql sends the SQL query you give it, PostgreSQL will immediately send back all the rows. If you pass CommandBehavior.SingleRow (or SingleResult) to NpgsqlCommand.ExecuteReader, Npgsql will simply not return those rows to the user; it will consume them internally, but they are still sent from the server. In other words, if you expect these options to reduce the network bandwidth used, that won't work; your only way to do that is to limit the resultset in the SQL itself, via a LIMIT clause. This is in general a better idea anyway.
See https://github.com/npgsql/npgsql/issues/410 for a bit more detail on why we didn't implement something more aggressive.
From my experience, the default in Npgsql is to stream the result set: when you invoke reader.Read(), the driver pulls the next row from the server into the client. There may be some buffering taking place, but streaming the result is the norm.
For a simple select query like select column_name from table_name on a very large table, is it possible to have the output delivered as the scan of the table progresses?
If I abort the command after some time, I expect to get at least the output the select has produced up to that point.
Think of cat, which doesn't wait until it has read the whole file before producing output.
Does MySQL or other RDBMS systems support this?
PostgreSQL always streams the result to the client, and usually it is the client library that collects the whole result set before returning it to the user.
The C API libpq supports this through single-row mode (PQsetSingleRowMode). The main disadvantage of this approach is that you could get a run-time error after you have already received some rows, so that's a case you'd have to handle.
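A sketch of single-row mode, assuming an already-open PGconn and a hypothetical big_table:

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Stream a large result row by row using libpq's single-row mode.
     * Assumes conn is an open connection; "big_table" is hypothetical. */
    static void stream_rows(PGconn *conn)
    {
        if (!PQsendQuery(conn, "SELECT column_name FROM big_table")) {
            fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
            return;
        }
        PQsetSingleRowMode(conn);   /* must follow PQsendQuery immediately */

        PGresult *res;
        while ((res = PQgetResult(conn)) != NULL) {
            switch (PQresultStatus(res)) {
            case PGRES_SINGLE_TUPLE:
                /* one row, delivered while the scan is still running */
                printf("%s\n", PQgetvalue(res, 0, 0));
                break;
            case PGRES_TUPLES_OK:
                /* empty terminal result marking the end of the set */
                break;
            default:
                /* an error can still arrive after rows were received */
                fprintf(stderr, "error: %s", PQresultErrorMessage(res));
            }
            PQclear(res);
        }
    }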
The traditional way to receive a query result in parts is to use a cursor and fetch results from it. This is a technique supported by all client APIs.
Cursors are probably what you are looking for, and every RDBMS I know supports them in some fashion.
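A sketch of the cursor approach through libpq, again assuming an open PGconn and a hypothetical big_table:

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Fetch a large result in batches through a cursor.
     * Error handling is abbreviated for brevity. */
    static void fetch_with_cursor(PGconn *conn)
    {
        PQclear(PQexec(conn, "BEGIN"));   /* cursors live inside a transaction */
        PQclear(PQexec(conn, "DECLARE c CURSOR FOR "
                             "SELECT column_name FROM big_table"));

        for (;;) {
            PGresult *res = PQexec(conn, "FETCH 1000 FROM c");
            if (PQresultStatus(res) != PGRES_TUPLES_OK) {
                fprintf(stderr, "fetch failed: %s", PQresultErrorMessage(res));
                PQclear(res);
                break;
            }
            int n = PQntuples(res);
            for (int i = 0; i < n; i++)
                printf("%s\n", PQgetvalue(res, i, 0));
            PQclear(res);
            if (n == 0)     /* cursor exhausted */
                break;
        }

        PQclear(PQexec(conn, "CLOSE c"));
        PQclear(PQexec(conn, "COMMIT"));
    }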
I'm writing a program to run mass calculations and output the results into PostgreSQL.
My platform is Windows Server 2008 with PostgreSQL 10. My program is written in C.
The results are produced group by group, and the completion of each group creates an extra thread to write the output.
Since the output threads are created one by one, it is possible that two or more SQL commands will be issued simultaneously, or that the previous one is still being processed when a new thread calls the function.
So my questions are:
(1) What happens if one thread is processing a SQL command and another thread calls PQexec(PGconn *conn, const char *query) on the same connection? Will they affect each other?
(2) What if I use a different PGconn for each thread? Would that speed things up?
If you try to call PQexec on a connection that is in the process of executing an SQL statement, you would cause a protocol violation. That just doesn't work.
Processing could certainly be made faster if you use several database connections in parallel; concurrent transactions are something that PostgreSQL is designed for.
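A sketch of the one-connection-per-thread pattern, using POSIX threads for brevity (on Windows the same structure applies with CreateThread); the conninfo string and INSERT statements are placeholders:

    #include <stdio.h>
    #include <pthread.h>
    #include <libpq-fe.h>

    /* Each thread opens its own PGconn: one connection must never be
     * used by two threads at once, but separate connections may run
     * their commands concurrently. */
    static void *writer_thread(void *arg)
    {
        const char *sql = arg;

        PGconn *conn = PQconnectdb("dbname=results");  /* placeholder */
        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "connect failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return NULL;
        }

        PGresult *res = PQexec(conn, sql);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "write failed: %s", PQresultErrorMessage(res));
        PQclear(res);

        PQfinish(conn);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, writer_thread,
                       (void *) "INSERT INTO output VALUES (1, 'group one')");
        pthread_create(&t2, NULL, writer_thread,
                       (void *) "INSERT INTO output VALUES (2, 'group two')");
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }

Opening a fresh connection per group is the simplest thing to sketch but is relatively expensive; for many small writes, a long-lived connection per worker or a connection pool would be the usual design.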
I have two SQL files, each containing 100 queries.
I need to execute the first 10 queries from the first SQL file and then the first 10 queries from the second SQL file. After those 10 queries from the second file have executed, execution should resume with the 11th query of the first file.
Is there a way to keep count of how many queries have completed?
How can I pause query execution in the first file and resume it after a certain number of queries have completed?
You can't do this with the psql command-line client; its file handling is limited to reading the file and sending the whole contents to the server query by query.
You'll want to write a simple Perl or Python script, using DBD::Pg (Perl) or psycopg2 (Python), that reads the input files and sends the queries.
Splitting the input requires parsing the SQL, which takes a bit of care. You can't just split it into queries on semicolons; you must handle quoted "identifier"s and 'literal's, as well as E'escape literals' and $dollar$ quoting $dollar$. You may be able to find existing code to help you with this, or use functionality from the database client driver to do it.
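Once the statements are split, the interleaving itself is straightforward. Here is a sketch in C with libpq, keeping this thread's language (the same loop is a few lines in Perl or Python); it assumes the files have already been parsed into statement arrays, and the counters double as the completed-query counts you asked about:

    #include <libpq-fe.h>

    /* Run queries from two files in alternating blocks of ten.
     * a has na statements, b has nb; conn is an open connection.
     * Error checking is omitted for brevity. */
    static void run_interleaved(PGconn *conn,
                                char **a, int na,
                                char **b, int nb)
    {
        int ia = 0, ib = 0;   /* how many queries of each file completed */

        while (ia < na || ib < nb) {
            for (int i = 0; i < 10 && ia < na; i++, ia++)
                PQclear(PQexec(conn, a[ia]));
            for (int i = 0; i < 10 && ib < nb; i++, ib++)
                PQclear(PQexec(conn, b[ib]));
        }
    }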
Alternately, if you can modify the input files to insert synchronization statements into them, you can run them with multiple psql instances and use advisory locking as an interlock to make them wait for each other at set points. For details see the PostgreSQL documentation on explicit locking.