Performance issues using psqlodbc in Excel/PowerQuery [duplicate] - postgresql

This question already exists:
psqlodbc performance issues when using functions [duplicate]
Closed 1 year ago.
Already asked the question here but it was falsely marked as closed.
I use psqlODBC and PowerQuery to load data from a PostgreSQL (12.4) to Microsoft Excel. Recently I am having performance issues I cannot explain. Some of the functions take extremely long (15 minutes and longer) to pull the data via ODBC. Some of the functions take only seconds. When running the same functions using pgAdmin, Beekeeper or psql from command line none of them takes more than 500ms for execution.
Is there anything I can do to improve the performance of psqlODBC?
I found this answer stating that it might be possible that the aggregations are done client side. That would be really bad as the tables contain several million rows and I would only want to send the aggregated columns back. Can someone give more information to that? I wasn't able to find anything about that.
What I've tried so far
I replaced a function by a view, which seems to solve the problem. However, this is not a general solution as there are some functions that contain logic that cannot be done in a view.
I used pg_stat_statements and auto_explain to find out if there is a problem in the functions. I did not find anything suspicious. As mentioned before, the problem only occurs when using psqlODBC, so I expect the problem to be anywhere there.
Modifying the psqlODBC config's parameters. No success so far.
Details
The functions in postgreSQL are usually defined like this:
CREATE FUNCTION func(parameter integer) RETURNS TABLE(<col1> <type1>, ...)
LANGUAGE 'plpgsql' COST 100 STABLE PARALLEL UNSAFE ROWS 500
AS $BODY$
BEGIN
RETURN QUERY
<SELECT ...>
END
I use functions instead of views because some of them contain logic that cannot be done in views.
The server is running PostgreSQL 12.4, client uses psqlODBC 12.02, Excel 2016 (latest).
I am thankful for every piece of advice here how I could solve my problem.
Thank you.

Related

How to go ahead with the question using SQLworkbench?

Note that MR stands for “most recent”:
PHA_NAME,
MR_INSPECTION_DATE,
MR_INSPECTION_COST,
SECOND_MR_INSPECTION_DATE,
SECOND_MR_INSPECTION_COST,
CHANGE_IN_COST
PERCENT_CHANGE_IN_COST
Management has asked that you perform this function using lead or lag functions in SQL.
However, they’re concerned that the files when imported into MySQL Workbench may not properly refer to dates using the correct format. If that is the case, they’ve asked you to investigate how best to convert dates from TEXT to Date format so that the lead/lag functions work as expected.
They’ve also asked that you filter your dataset to only those PHAs that saw an increase in $$ cost, and that you only list the PHA once with no duplicates to avoid noisy data.
Naturally, this would also require you to filter out PHAs that only performed one inspection, so they’ve asked you to remove those as well.

Unit tests producing different results when using PostgreSQL

Been working on a module that is working pretty well when using MySQL, but when I try and run the unit tests I get an error when testing under PostgreSQL (using Travis).
The module itself is here: https://github.com/silvercommerce/taxable-currency
An example failed build is here: https://travis-ci.org/silvercommerce/taxable-currency/jobs/546838724
I don't have a huge amount of experience using PostgreSQL, but I am not really sure why this might be happening? The only thing I could think that might cause this is that I am trying to manually set the ID's in my fixtures file and maybe PostgreSQL not support this?
If this is not the case, does anyone have an idea what might be causing this issue?
Edit: I have looked again into this and the errors appear to be because of this assertion, which should be finding the Tax Rate vat but instead finds the Tax Rate reduced
I am guessing there is an issue in my logic that is causing the incorrect rate to be returned, though I am unsure why...
In the end it appears that Postgres has different default sorting to MySQL (https://www.postgresql.org/docs/9.1/queries-order.html). The line of interest is:
The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on
In the end I didn't really need to test a list with multiple items, so instead I just removed the additional items.
If you are working on something that needs to support MySQL and Postgres though, you might need to consider defining a consistent sort order as part of your query.

find out postgresql query progress

I have ran an UPDATE query in postgresql. After 5 days it has not finished yet. Is it possible to check percentage of completeness of my query which I have ran previously. I want to know if it takes only 5 days later to complete for example, or 955 days!
There is no easy way to do that.
If you understand the PostgreSQL source code and you know the execution plan of your query and you have installed the debugging symbols, you can attach a debugger to the backend process, examine the stack and determine where approximately in the query execution you are.

IBM i (System i) V7R1 upgrade causes cursor not open errors on ODBC connection

For years, at least 8, our company has been running a process daily that has never failed. Nothing on the client side has changed, but we recently upgraded to V7R1 on the System i. The very first run of the old process fails with a Cursor not open message reported back to the client, and that's all that's in the job log as well. I have seen Error -501, SQLSTATE 24501 on occasions.
I got both IBM and DataDirect (provider of the ODBC driver) involved. IBM stated it was a client issue, DataDirect dug through logs and found that when requesting the next block of records from a cursor this error occurs. They saw no indication that the System i alerted the client that the cursor was closed.
In troubleshooting, I noticed that the ODBC driver has an option for WITH HOLD which by default is checked. If I uncheck it, this particular issue goes away, but it introduces another issue (infinite loops) which is even more serious.
There's no single common theme that causes these errors, the only thing that I see that causes this is doing some processing while looping through a fairly large resultset. It doesn't seem to be related to timing, or to a particular table or table type. The outside loops are sometimes large tables with many datatypes, sometimes tiny tables with nothing but CHAR(10) and CHAR(8) data types.
I don't really expect an answer on here since this is a very esoteric situation, but there's always some hope.
There were other issues that IBM has already addressed by having us apply PTFs to take us to 36 for the database level. I am by no means a System i expert, just a Java programmer who has to deal with this issue that has nothing to do with Java at all.
Thanks
This is for anyone else out there who may run across a similar issue. It turns out it was a bug in the QRWTSRVR code that caused the issue. The driver opened up several connections within a single job and used the same name for cursors in at least 2 of those connections. Once one of those cursors was closed QRWTSRVR would mistakenly attempt to use the closed cursor and return the error. Here is the description from the PTF cover letter:
DESCRIPTION OF PROBLEM FIXED FOR APAR SE62670 :
A QRWTSRVR job with 2 cursors named C01 takes a MSGSQL0501
error when trying to fetch from the one that is open. The DB2
code is trying to use the cursor which is pseudo closed.
The PTF SI57756 fixed the issue. I do not know that this PTF will be generally released, but if you find this post because of a similar issue hopefully this will assist you in getting it corrected.
This is how I fix DB problems on the iseries.
Start journaling the tables on the iseries or change the connection to the iseries to commit = *NONE.
for the journaling I recommend using two journals each with its own receiver.
one journal for tables with relatively few changes like a table of US States or a table that gets less than 10 updates a month. This is so you can determine when the data was changed for an audit. Keep all the receivers for this journal on-line for ever.
one journal for tables with many changes through out the day. Delete the receivers for these journals when you can no longer afford the space they take up.
If the journal or commit *none doesn't fix it. You'll need to look at the sysixadv table long running queries can wreck an ODBC connection.
SELECT SYS_TNAME, TBMEMBER, INDEX_TYPE, LASTADV, TIMESADV, ESTTIME,
REASON, "PAGESIZE", QUERYCOST, QUERYEST, TABLE_SIZE, NLSSNAME,
NLSSDBNAME, MTIUSED, MTICREATED, LASTMTIUSE, QRYMICRO, EVIVALS,
FIRSTADV, SYS_DNAME, MTISTATS, LASTMTISTA, DEPCNT FROM sysixadv
ORDER BY ESTTIME desc
also order by timesadv desc
fix those queries maybe create the advised index.
Which ODBC driver are you using?
If you're using the IBM i Access ODBC driver, then this problem may be fixed by APAR SE61342. The driver didn't always handle the return code from the server that indicated that the result set was closed and during the SQLCloseCursor function, the driver would send a close command to the server, which would return an error, since the server had already closed the cursor. Note, you don't have to be at SP11 to hit this condition, it just made it easier to hit, since I enabled pre-fetch in more cases in that fixpack. An easy test to see if that is the problem is to disable pre-fetch for the DSN or pass PREFETCH=0 on the connection string.
If you're using the DB2 Connect driver, I can't really offer much help, sorry.

Perl script fails when selecting data from big PostgreSQL table

I'm trying to run a SELECT statement on PostgreSQL database and save its result into a file.
The code runs on my environment but fails once I run it on a lightweight server.
I monitored it and saw that the reason it fails after several seconds is due to a lack of memory (the machine has only 512MB RAM). I didn't expect this to be a problem, as all I want to do is to save the whole result set as a JSON file on disk.
I was planning to use fetchrow_array or fetchrow_arrayref functions hoping to fetch and process only one row at a time.
Unfortunately I discovered there's no difference when it comes to the true fetch operations between the two above and fetchall_arrayref when you use DBD::Pg. My script fails at the $sth->execute() call, even before it has a chance to do call any fetch... function.
This suggests to me that the implementation of execute in DBD::Pg actually fetches ALL the rows into memory, leaving only the actual format its returned to the fetch... functions.
A quick look at the DBI documentation gives a hint:
If the driver supports a local row cache for SELECT statements, then this attribute holds the number of un-fetched rows in the cache. If the driver doesn't, then it returns undef. Note that some drivers pre-fetch rows on execute, whereas others wait till the first fetch.
So in theory I would just need to set the RowCacheSize parameter. I've tried but this feature doesn't seem to be implemented by DBD::Pg
Not used by DBD::Pg
I find this limitation a huge general problem (execute() call pre-fetches all rows?) and more inclined to believe that I'm missing something here, than that this is actually a true limitation of interacting with PostgreSQL databases using Perl.
Update (2014-03-09): My script works now thanks to using a workaround as described in my comment to Borodin's answer. The maintainer of DBD::Pg library got back to me on the issue actually saying the root cause is deeper and lies within libpq postgresql internal library (used by DBD::Pg). Also, I think very similar issue to the one described here affects pgAdmin. Being postgresql native tool it still doesn't give in the Options chance to define the default limit of the result set row size. This is probably why it makes Query tool sometimes waiting a good while before presenting results from bulky queries, potentially breaking the app in some cases too.
In the section Cursors, the documentation for the database driver says this
Therefore the "execute" method fetches all data at once into data structures located in the front-end application. This fact must to be considered when selecting large amounts of data!
So your supposition is correct. However the same section goes on to describe how you can use cursors in your Perl application to read the data in chunks. I believe this would fix your problem.
Another alternative is to use OFFSET and LIMIT clauses on your SELECT statement to emulate cursor functionality. If you write
my $select = $dbh->prepare('SELECT * FROM table OFFSET ? LIMIT 1');
then you can say something like (all of this is untested)
my $i = 0;
while ($select->execute($i++)) {
my #data = $select->fetchrow_array;
# Process data
}
to read your tables one row at a time.
You may find that you need to increase the chunk size to get an acceptable level of efficiency.