How to count the number of queries executed in Postgres?

I have two SQL files, each containing 100 queries.
I need to execute the first 10 queries from the first SQL file and then the first 10 queries from the second SQL file. After those 10 queries from the second file have finished, execution should resume with the 11th query of the first file, and so on.
Is there a way to keep count of how many queries have completed?
How can I pause query execution in the first file and resume it after a certain number of queries from the other file have completed?

You can't do this with the psql command-line client; its file handling is limited to reading a file and sending its contents to the server query by query.
You'll want to write a simple Perl or Python script, using DBD::Pg (Perl) or psycopg2 (Python), that reads the input files and sends the queries.
Splitting the input into individual queries requires parsing the SQL, which takes a bit of care. You can't just split on semicolons: you must handle quoted "identifier"s and 'literal's as well as E'escape literals' and $dollar$ dollar quoting $dollar$. You may be able to find existing code to help with this, or use functionality from the database client driver to do it.
Alternatively, if you can modify the input files to add statements at the synchronization points, you can potentially run them with multiple psql instances and use advisory locking as an interlock so they wait for each other at set points. For details, see the explicit locking documentation.
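For illustration, here is a rough sketch of the driving loop in C with libpq; the same structure carries over to a DBD::Pg or psycopg2 script. To keep it short it assumes one statement per line in each file, sidestepping the splitting problem described above, and the file names and connection string are made up.

/* Sketch: interleave batches of 10 statements from two files and keep a
 * running count. Assumes one SQL statement per line (a simplification;
 * real input needs quote- and dollar-quote-aware splitting).
 * Build: cc interleave.c -lpq
 */
#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

#define BATCH 10

static int run_batch(PGconn *conn, FILE *f, int n, int *executed)
{
    char line[8192];
    int done = 0;
    while (done < n && fgets(line, sizeof line, f) != NULL) {
        if (line[strspn(line, " \t\r\n")] == '\0')
            continue;                       /* skip blank lines */
        PGresult *res = PQexec(conn, line);
        if (PQresultStatus(res) != PGRES_COMMAND_OK &&
            PQresultStatus(res) != PGRES_TUPLES_OK)
            fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
        PQclear(res);
        done++;
        (*executed)++;                      /* running count of completed queries */
    }
    return done;                            /* 0 once the file is exhausted */
}

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=mydb");       /* hypothetical connection string */
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }
    FILE *f1 = fopen("first.sql", "r");              /* hypothetical file names */
    FILE *f2 = fopen("second.sql", "r");
    if (!f1 || !f2) {
        fprintf(stderr, "cannot open input files\n");
        return 1;
    }
    int total = 0, progress = 1;
    while (progress) {
        progress  = run_batch(conn, f1, BATCH, &total);   /* 10 from file 1 */
        progress += run_batch(conn, f2, BATCH, &total);   /* then 10 from file 2 */
        printf("%d queries completed so far\n", total);
    }
    fclose(f1); fclose(f2);
    PQfinish(conn);
    return 0;
}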

Related

What would happen if I run two SQL commands using the same DB connection?

I'm writing a program that runs mass calculations and writes the results into PostgreSQL.
My platform is Windows Server 2008 with PostgreSQL 10. My program is written in C.
The results are produced group by group; finishing a group creates an extra thread to write its output.
Since the output threads are created one by one, two or more SQL commands may be issued at effectively the same time, or an earlier one may still be in progress when a new thread calls the function.
So my questions are:
(1) What happens if one thread is still processing an SQL command and another thread calls PQexec(PGconn *conn, const char *query)? Would they affect each other?
(2) What if I use a separate PGconn for each thread? Would that speed things up?
If you call PQexec on a connection that is already in the middle of executing an SQL statement, you cause a protocol violation. That simply doesn't work.
Processing can certainly be made faster if you use several database connections in parallel; concurrent transactions are something PostgreSQL is designed for.
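To make that concrete, here is a minimal sketch (not your actual program) of the one-connection-per-thread pattern with libpq and POSIX threads: a single PGconn must never be used by two threads at the same time, but separate connections can run statements concurrently. The connection string, table name, and worker count are placeholders.

/* Sketch: one PGconn per worker thread. A single PGconn must not be
 * used concurrently by multiple threads, so each thread opens its own
 * connection and runs its own statements.
 * Build: cc workers.c -lpq -lpthread
 */
#include <stdio.h>
#include <pthread.h>
#include <libpq-fe.h>

#define NWORKERS 4

static void *worker(void *arg)
{
    int id = *(int *)arg;
    PGconn *conn = PQconnectdb("dbname=mydb");   /* hypothetical connection string */
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "worker %d: %s", id, PQerrorMessage(conn));
        PQfinish(conn);
        return NULL;
    }
    /* Placeholder for the real output statement of this group. */
    char sql[128];
    snprintf(sql, sizeof sql, "INSERT INTO results(worker) VALUES (%d)", id);
    PGresult *res = PQexec(conn, sql);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "worker %d: %s", id, PQerrorMessage(conn));
    PQclear(res);
    PQfinish(conn);
    return NULL;
}

int main(void)
{
    pthread_t threads[NWORKERS];
    int ids[NWORKERS];
    for (int i = 0; i < NWORKERS; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}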

How to improve import speed on SQL Workbench/J

I tried the following, but the import is terribly slow, at about 3 rows per second:
WbImport -file=c:/temp/_Cco_.txt
-table=myschema.table1
-filecolumns=warehouse_id,bin_id,cluster_name
-deleteTarget
-batchSize=10000
-commitBatch
WbImport can use the COPY API of the Postgres JDBC driver.
To enable it, add the -usePgCopy option:
WbImport -file=c:/temp/_Cco_.txt
-usePgCopy
-table=myschema.table1
-filecolumns=warehouse_id,bin_id,cluster_name
The options -batchSize and -commitBatch are ignored in that case, so you should remove them.
SQL Workbench/J will then essentially use the equivalent of a COPY ... FROM STDIN. That should be massively faster than regular INSERT statements.
This requires that the input file is formatted according to the requirements of the COPY command.
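For context, this is roughly what "use the COPY API" means at the driver level. The sketch below uses libpq in C rather than the JDBC driver, purely to show the mechanism, and it assumes the input file is already in COPY text format; the table, columns, and file name are taken from the question above.

/* Sketch: bulk load via COPY ... FROM STDIN using libpq's copy API.
 * This streams rows to the server instead of sending individual INSERTs,
 * which is why it is so much faster.
 */
#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=mydb");          /* hypothetical connection string */
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    /* Start the COPY; the column list mirrors -filecolumns. */
    PGresult *res = PQexec(conn,
        "COPY myschema.table1 (warehouse_id, bin_id, cluster_name) FROM STDIN");
    if (PQresultStatus(res) != PGRES_COPY_IN) {
        fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
        PQclear(res);
        return 1;
    }
    PQclear(res);

    FILE *f = fopen("c:/temp/_Cco_.txt", "r");
    char line[8192];
    while (f && fgets(line, sizeof line, f) != NULL)
        PQputCopyData(conn, line, (int)strlen(line));   /* stream raw rows */
    if (f) fclose(f);

    PQputCopyEnd(conn, NULL);                           /* finish the COPY */
    res = PQgetResult(conn);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
    PQclear(res);
    PQfinish(conn);
    return 0;
}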
WbImport uses INSERT to load data. This is the worst way to load data into Redshift.
You should be using the COPY command for this as noted in the Redshift documentation:
"We strongly recommend using the COPY command to load large amounts of data. Using individual INSERT statements to populate a table might be prohibitively slow."

Multistatement Queries in Postgres

I'm looking to send multiple read queries to a Postgres database in order to reduce the number of trips that need to be made to a painfully remote database. Is there anything in libpq that supports this behavior?
Yes, you can use the asynchronous handling functions in libpq. The linked page says:
"Using PQsendQuery and PQgetResult solves one of PQexec's problems: If a command string contains multiple SQL commands, the results of those commands can be obtained individually. (This allows a simple form of overlapped processing, by the way: the client can be handling the results of one command while the server is still working on later queries in the same command string.)"
For example, you should be able to call PQsendQuery with a string containing multiple queries, then repeatedly call PQgetResult to get the result sets. PQgetResult returns NULL when there are no more result sets to obtain.
If desired, you can also avoid your application blocking while it waits for these queries to execute (described in more detail on the linked page).
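A minimal sketch of that pattern (the connection string and queries are invented):

/* Sketch: send several statements in one round trip, then collect the
 * result sets one by one with PQgetResult.
 */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("host=remote dbname=mydb");   /* hypothetical connection string */
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    /* One network round trip for all three queries. */
    if (!PQsendQuery(conn,
            "SELECT count(*) FROM t1; "
            "SELECT count(*) FROM t2; "
            "SELECT count(*) FROM t3;")) {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    /* PQgetResult returns one PGresult per statement, then NULL. */
    PGresult *res;
    while ((res = PQgetResult(conn)) != NULL) {
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("rows: %s\n", PQgetvalue(res, 0, 0));
        else
            fprintf(stderr, "%s", PQresultErrorMessage(res));
        PQclear(res);
    }

    PQfinish(conn);
    return 0;
}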

Behaviour of fetchrow_hashref in Perl

I am trying to execute a stored procedure from Perl and store the results in a text file (on Windows). I am using DBI's fetchrow_hashref() to fetch the result rows. The stored procedure I am executing returns more than 5 million rows. I want to know what happens behind the scenes, particularly during the fetchrow_hashref() call. For example: Perl executes the procedure, the procedure returns all the affected rows and keeps them in a pool (on the database side or on the calling machine?), and then Perl picks the rows from the result set one by one. Does it happen that way, or something else?
This is a difficult question to answer as you've not said which Perl database driver you are using. I'm assuming you are using DBD::ODBC and the MS SQL Server ODBC Driver in this answer.
When you call prepare on the SQL that invokes the procedure, the ODBC driver sends it to MS SQL Server, where the procedure is parsed. On calling execute, the procedure is started (how you progress through the procedure depends on a lot of things). Assuming the first thing in your procedure is a select, a cursor is created for the query and MS SQL Server starts sending the rows back to the ODBC driver (it uses the TDS protocol). In the meantime, DBD::ODBC makes ODBC calls to the driver which tell it there is a result-set (SQLNumResultCols returns a non-zero value). DBD::ODBC then queries the driver for the types of the columns in the result-set and binds them (SQLBindCol).
Each time you call fetchrow_hashref, DBD::ODBC will call SQLFetch, the ODBC driver will read a row from the socket and copy the data to the bound buffers.
There are important things to realise here. MS SQL Server will usually write a lot of rows to the socket up front, even though the ODBC driver is probably not reading them yet. As a result, if you close your statement early, the driver has to read a lot of rows from the socket and throw them away. If you use a non-standard cursor or enable Multiple Active Statements in the driver, then rows are sent back to the driver one at a time, so the ODBC driver can ask the server to move forward or backward in the result-set, or request a row from result-set 1 and then result-set 2.
There are other slightly unusual areas when using procedures, such as whether nocount is enabled and how you progress through the procedure's statements with SQLMoreResults (odbc_more_results). Also, a procedure's output parameters are not available until SQLMoreResults returns false.
You may find Multiple Active Statements (MAS) and DBD::ODBC of some interest, along with some of the other articles there. You may also want to read about the TDS protocol.
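If it helps to see the call sequence, here is a stripped-down equivalent in raw ODBC C, roughly what DBD::ODBC does for you behind fetchrow_hashref; the DSN, procedure name, and column layout are invented, and error checking is kept minimal.

/* Sketch: the ODBC calls DBD::ODBC makes under the hood for a
 * procedure that returns result sets.
 */
#include <stdio.h>
#include <sql.h>
#include <sqlext.h>

int main(void)
{
    SQLHENV env;  SQLHDBC dbc;  SQLHSTMT stmt;
    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
    SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
    SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);
    SQLDriverConnect(dbc, NULL, (SQLCHAR *)"DSN=mydsn;", SQL_NTS,
                     NULL, 0, NULL, SQL_DRIVER_NOPROMPT);     /* hypothetical DSN */

    SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
    SQLPrepare(stmt, (SQLCHAR *)"{call my_procedure}", SQL_NTS);  /* $dbh->prepare */
    SQLExecute(stmt);                                             /* $sth->execute */

    SQLSMALLINT ncols = 0;
    do {
        SQLNumResultCols(stmt, &ncols);          /* non-zero means a result-set follows */
        if (ncols > 0) {
            char col1[256];
            SQLLEN ind;
            SQLBindCol(stmt, 1, SQL_C_CHAR, col1, sizeof col1, &ind);
            /* each fetchrow_hashref maps to one SQLFetch */
            while (SQL_SUCCEEDED(SQLFetch(stmt)))
                printf("%s\n", ind == SQL_NULL_DATA ? "NULL" : col1);
        }
    } while (SQLMoreResults(stmt) != SQL_NO_DATA);   /* odbc_more_results */

    SQLFreeHandle(SQL_HANDLE_STMT, stmt);
    SQLDisconnect(dbc);
    SQLFreeHandle(SQL_HANDLE_DBC, dbc);
    SQLFreeHandle(SQL_HANDLE_ENV, env);
    return 0;
}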

Extract Active Directory into SQL database using VBScript

I have written a VBScript to extract data from Active Directory into a record set. I'm now wondering what the most efficient way is to transfer the data into a SQL database.
I'm torn between:
writing it to an Excel file and then firing an SSIS package to import it, or
iterating through the recordset in memory within the VBScript and submitting 3000+ INSERT commands to the SQL database.
Would the latter option result in 3000+ round trips communicating with the database and therefore be the slower of the two options?
Sending an insert row by row is always the slowest option. This is what is known as Row by Agonizing Row or RBAR. You should avoid that if possible and take advantage of set based operations.
Your other option, writing to an intermediate file, is a good one, though I agree with @Remou in the comments that you should probably pick CSV rather than Excel if you go that way.
I would propose a third option. You already have the design in VB contained in your VBScript, so you should be able to convert it easily to a Script Component in SSIS. Create an SSIS package, add a Data Flow task, add a Script Component (as a data source; example here) to the flow, write your fields out to the output buffer, and then add a SQL destination, saving yourself the step of writing to an intermediate file. This is also more secure, as your AD data is never on disk in plaintext during the process.
You don't mention how often this will run or if you have to run it within a certain time window, so it isn't clear that performance is even an issue here. "Slow" doesn't mean anything by itself: a process that runs for 30 minutes can be perfectly acceptable if the time window is one hour.
Just write the simplest, most maintainable code you can to get the job done and go from there. If it runs in an acceptable amount of time then you're done. If it doesn't, then at least you have a clean, functioning solution that you can profile and optimize.
If you already have it in a dataset and you're on SQL Server 2008+, create a user-defined table type and send the whole dataset in as an atomic unit.
And if you go the SSIS route, I have a post covering Active Directory as an SSIS Data Source.