Short story: A report running against a Progress database (OpenEdge Release 10.1C03) takes hours to complete. I suspect that it does not take advantage of existing data indexes. I would like to understand how it scans the data so that I can then try to add an index that will make it run faster.
Source code of the report is not available. The code is native Progress 4GL, not SQL.
If it were an SQL database I would dump the SQL queries and work from those. With 4GL I did not find any such functionality. Is it possible to somehow peek at what gets executed at the low level?
What else can be done if there is no source code?
Thanks!
There are several things you can do:
If I recall correctly, 10.1C should have the _UserTableStat and _UserIndexStat virtual system tables available. These allow you to observe, at runtime, which tables and indexes are being accessed by a particular session. You can either write your own 4GL program to query them or you can use the PROMON screens under R&D, 3 "Other Displays", 5 "I/O Operations by User by Table" and 6 "I/O Operations by User by Index". That will show you which tables and indexes are actually in use and how much use they are getting. If the observed data seems wrong it will probably give you a clue. (If the VSTs are missing it might be because the db was upgraded from an older version -- add them with proutil dbname -C updatevsts.)
You could also use the session startup parameters -clientlog "filename" and -logentrytypes QryInfo to obtain more detailed information about the queries being executed.
Keep in mind that Progress is not SQL. Unlike most SQL databases, the 4GL uses a static, compile-time optimizer. Index selection happens when the code is compiled. So unless you can recompile (and since you do not have the source, that seems unlikely) you won't be able to improve things by adding missing indexes. You might, however, at least be able to show the person who does have the source where the problem is.
Another tool that can help is the profiler. This will identify where in the code the time is being spent. That can also be good information to provide to the original vendor if they need help finding the problem. For more information on the profiler: http://dbappraise.com/ppt/profiler.pptx
Related
I'm trying to run a SELECT statement on a PostgreSQL database and save its result into a file.
The code runs on my environment but fails once I run it on a lightweight server.
I monitored it and saw that the reason it fails after several seconds is due to a lack of memory (the machine has only 512MB RAM). I didn't expect this to be a problem, as all I want to do is to save the whole result set as a JSON file on disk.
I was planning to use fetchrow_array or fetchrow_arrayref functions hoping to fetch and process only one row at a time.
Unfortunately I discovered there's no difference when it comes to the true fetch operations between the two above and fetchall_arrayref when you use DBD::Pg. My script fails at the $sth->execute() call, even before it has a chance to call any fetch... function.
This suggests to me that the implementation of execute in DBD::Pg actually fetches ALL the rows into memory, leaving only the formatting of the returned data to the fetch... functions.
A quick look at the DBI documentation gives a hint:
If the driver supports a local row cache for SELECT statements, then this attribute holds the number of un-fetched rows in the cache. If the driver doesn't, then it returns undef. Note that some drivers pre-fetch rows on execute, whereas others wait till the first fetch.
So in theory I would just need to set the RowCacheSize parameter. I've tried, but this feature doesn't seem to be implemented by DBD::Pg:
Not used by DBD::Pg
I find this limitation a huge problem in general (the execute() call pre-fetches all rows?), and I'm more inclined to believe that I'm missing something here than that this is actually a true limitation of interacting with PostgreSQL databases using Perl.
Update (2014-03-09): My script works now thanks to the workaround described in my comment to Borodin's answer. The maintainer of the DBD::Pg library got back to me on the issue, saying the root cause is deeper and lies within libpq, the internal PostgreSQL library used by DBD::Pg. Also, I think a very similar issue to the one described here affects pgAdmin. Despite being a native PostgreSQL tool, it still doesn't offer, in its Options, a way to define a default limit on the number of result rows. This is probably why its Query tool sometimes waits a good while before presenting results from bulky queries, potentially breaking the app in some cases too.
In the section Cursors, the documentation for the database driver says this:
Therefore the "execute" method fetches all data at once into data structures located in the front-end application. This fact must to be considered when selecting large amounts of data!
So your supposition is correct. However the same section goes on to describe how you can use cursors in your Perl application to read the data in chunks. I believe this would fix your problem.
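As a rough sketch of what the cursor approach looks like at the SQL level (table name, cursor name and chunk size here are made up), each of these statements can be issued from Perl with $dbh->do(...) or prepare/execute, so only one chunk of rows is ever held in memory at a time:

BEGIN;                                            -- cursors only live inside a transaction
DECLARE big_cursor CURSOR FOR SELECT * FROM big_table;
FETCH 1000 FROM big_cursor;                       -- repeat until it returns no rows
CLOSE big_cursor;
COMMIT;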
Another alternative is to use OFFSET and LIMIT clauses on your SELECT statement to emulate cursor functionality. If you write
my $select = $dbh->prepare('SELECT * FROM mytable OFFSET ? LIMIT 1');
then you can say something like (all of this is untested)
my $i = 0;
while ($select->execute($i++)) {
    my @data = $select->fetchrow_array or last;   # stop once a pass returns no row
    # Process @data
}
to read your tables one row at a time.
You may find that you need to increase the chunk size (a larger LIMIT, stepping OFFSET by the same amount) to get an acceptable level of efficiency, and in practice you would also want an ORDER BY clause so that the row order is stable between executions.
I have just moved to PostgreSQL after having worked with Oracle for a few years.
I have been looking into some performance issues with prepared statements in the application (Java, JDBC) with the PostgreSQL database.
Oracle caches prepared statements in its SGA - the pool of prepared statements is shared across database connections.
PostgreSQL documentation does not seem to indicate this. Here's the snippet from the documentation (https://www.postgresql.org/docs/current/static/sql-prepare.html) -
Prepared statements only last for the duration of the current database session. When the session ends, the prepared statement is forgotten, so it must be recreated before being used again. This also means that a single prepared statement cannot be used by multiple simultaneous database clients; however, each client can create their own prepared statement to use.
I just want to make sure that I am understanding this right, because it seems so basic for a database to implement some sort of common pool of commonly executed prepared statements.
If PostgreSQL does not cache these that would mean every application that expects a lot of database transactions needs to develop some sort of prepared statement pool that can be re-used across connections.
If you have worked with PostgreSQL before, I would appreciate any insight into this.
Yes, your understanding is correct. Typically if you had a set of prepared queries that are that critical then you'd have the application call a custom function to set them up on connection.
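For example, a connection-setup hook could simply issue session-local PREPARE statements like the following (the statement name, parameter type and query are only illustrative):

-- Run once per connection, e.g. from the pool's connection-init hook
PREPARE get_user (integer) AS
SELECT * FROM users WHERE id = $1;
-- Then, anywhere in the same session:
EXECUTE get_user(42);

Because the prepared statement lives in the session, every new connection has to repeat the PREPARE, which is exactly the limitation described in the quoted documentation.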
There are three key reasons for this afaik:
There's a long todo list and they get done when a developer is interested/paid to tackle them. Presumably no-one has thought it worth funding yet or come up with an efficient way of doing it.
PostgreSQL runs in a much wider range of environments than Oracle. I would guess that 99% of installed systems wouldn't see much benefit from this. There are an awful lot of setups without high transaction-rate requirements, or for that matter a DBA to notice whether such a feature is needed or not.
Planned queries don't always provide a win. There's been considerable work done on delaying planning/invalidating caches to provide as good a fit as possible to the actual data and query parameters.
I'd suspect the best place to add something like this would be in one of the connection pools (pgbouncer/pgpool) but last time I checked such a feature wasn't there.
HTH
I have an application written on Play Framework 1.2.4 with Hibernate (default C3P0 connection pooling) and a PostgreSQL database (9.1).
Recently I turned on slow queries logging ( >= 100 ms) in postgresql.conf and found some issues.
But when I tried to analyze and optimize one particular query, I found that it is blazing fast in psql (0.5 - 1 ms) in comparison to 200-250 ms in the log. The same thing happened with the other queries.
The application and database server are running on the same machine and communicating over the localhost interface.
JDBC driver - postgresql-9.0-801.jdbc4
I wonder what could be wrong, because the query duration in the log is calculated from database processing time only, excluding external factors like network round-trips.
Possibility 1: If the slow queries occur occasionally or in bursts, it could be checkpoint activity. Enable checkpoint logging (log_checkpoints = on), make sure the log level (log_min_messages) is 'info' or lower, and see what turns up. Checkpoints that are taking a long time or happening too often suggest you probably need some checkpoint/WAL and bgwriter tuning. This isn't likely to be the cause if the same statements are always slow and others always perform well.
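A quick way to see how busy the checkpointer has been, without changing any settings, is the pg_stat_bgwriter view (the columns below exist in the PostgreSQL versions discussed here):

-- Counters accumulate since the statistics were last reset; a high
-- checkpoints_req count relative to checkpoints_timed suggests the
-- checkpoint/WAL settings are too tight for the write load.
SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint, buffers_backend
FROM pg_stat_bgwriter;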
Possibility 2: Your query plans are different because you're running them directly in psql while Hibernate, via PgJDBC, will at least sometimes be doing a PREPARE and EXECUTE (at the protocol level so you won't see actual statements). For this, compare query performance with PREPARE test_query(...) AS SELECT ... then EXPLAIN ANALYZE EXECUTE test_query(...). The parameters in the PREPARE are type names for the positional parameters ($1,$2,etc); the parameters in the EXECUTE are values.
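For example, with a made-up table, parameter type and query, the comparison looks like this:

-- One-off plan, as psql would produce it:
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- Plan used through a prepared statement, which is closer to what PgJDBC
-- runs once its prepare threshold is reached:
PREPARE test_query (integer) AS SELECT * FROM orders WHERE customer_id = $1;
EXPLAIN ANALYZE EXECUTE test_query(42);
DEALLOCATE test_query;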
If the prepared plan is different to the one-off plan, you can set PgJDBC's prepare threshold via connection parameters to tell it never to use server-side prepared statements.
This difference between the plans of prepared and unprepared statements should go away in PostgreSQL 9.2. It's been a long-standing wart, but Tom Lane dealt with it for the upcoming release.
It's very hard to say for sure without knowing all the details of your system, but I can think of a couple of possibilities:
The query results are cached. If you run the same query twice in a short space of time, it will almost always complete much more quickly on the second pass. PostgreSQL maintains a cache of recently retrieved data for just this purpose. If you are pulling the queries from the tail of your log and executing them immediately this could be what's happening.
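One way to check for this caching effect (with a placeholder query) is to run the statement twice and compare the buffer counts; a high "shared hit" count and a low "read" count on the second run means the data was already in cache:

-- The BUFFERS option is available in PostgreSQL 9.0 and later
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 42;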
Other processes are interfering. The execution time for a query varies depending on what else is going on in the system. If the queries are taking 100ms during peak hour on your website when a lot of users are connected but only 1ms when you try them again late at night this could be what's happening.
The point is you are correct that the query duration isn't affected by which library or application is calling it, so the difference must be coming from something else. Keep looking, good luck!
There are several possible reasons. First, if the database was very busy when the slow queries executed, they may have run more slowly. So you may need to observe the load on the OS at that moment for future analysis.
Second, the historical plan for the SQL may be different from the current session's plan. So you may need to install auto_explain to see the actual plan of the slow query.
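A minimal sketch of enabling auto_explain for a single session (run as a superuser; to enable it for all sessions, add it to shared_preload_libraries in postgresql.conf instead):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = '100ms';  -- log plans of statements slower than 100 ms
SET auto_explain.log_analyze = true;          -- include actual row counts and timings in the logged plan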
I just got into a new company and my task is to optimize the database performance. One possible (and suggested) way would be to use multiple servers instead of one. As there are many possible ways to do that, I need to analyse the DB first. Is there a tool with which I can measure how many inserts, updates and deletes are performed on each table?
I agree with Surfer513 that the DMV is going to be much better than CDC. Adding CDC is fairly complex and will add a load to the system. (See my article here for statistics.)
I suggest first setting up a SQL Server Trace to see which commands are long-running.
If your system makes heavy use of stored procedures (which hopefully it does), also check out sys.dm_exec_procedure_stats. That will help you to concentrate on the procedures/tables/views that are being used most often. Look at execution_count and total_worker_time.
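Something along these lines (the TOP count is arbitrary) surfaces the procedures doing the most work:

SELECT TOP 20
    OBJECT_NAME(object_id, database_id) AS procedure_name,
    execution_count,
    total_worker_time        -- CPU time in microseconds since the plan was cached
FROM sys.dm_exec_procedure_stats
ORDER BY total_worker_time DESC;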
The point is that you want to determine which parts of your system are slow (using Trace) so that you know where to spend your time.
One way would be to utilize Change Data Capture (CDC) or Change Tracking. Not sure how in-depth you are looking to go with this, but there are other, simpler ways to get a rough estimate (it doesn't look like you want exact figures, just ballpark numbers?).
Assuming that there are indexes on your tables, you can query sys.dm_db_index_operational_stats to get data on inserts/updates/deletes that affect the indexes. Again, this is a rough estimate but it'll give you a decent idea.
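A sketch of such a query for the current database; note that each data modification is counted once per index on the table and the counters are not persisted across server restarts, so treat this as a relative measure:

SELECT OBJECT_NAME(ios.object_id) AS table_name,
    SUM(ios.leaf_insert_count) AS inserts,
    SUM(ios.leaf_update_count) AS updates,
    SUM(ios.leaf_delete_count) AS deletes
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
GROUP BY ios.object_id
ORDER BY SUM(ios.leaf_insert_count + ios.leaf_update_count + ios.leaf_delete_count) DESC;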
I have quite slow data retrieval from a SQLite database on my iPhone, and perhaps someone has an alternative idea to explain this. From what I have tracked down so far, sqlite3_step(statement) is sometimes unusually slow. When retrieving e.g. 50 rows from the database, executing this step normally takes a few milliseconds, but sometimes it takes several seconds.
My database is not small (80MB) and my theory is that the reason is paging. But can someone else think of another reason for this?
Do you have a proper index on that table? Queries can be very slow if a full table scan is required to perform your query. See this page, for example, for some guidance on how to optimize your SQLite queries.
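If you're unsure whether a full table scan is happening, SQLite can tell you directly; with a made-up table and query it looks like this (the exact output wording varies between SQLite versions):

-- "SCAN TABLE ..." in the output means a full table scan;
-- "SEARCH TABLE ... USING INDEX ..." means an index is being used.
EXPLAIN QUERY PLAN SELECT * FROM items WHERE category_id = 3;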
You may simply need to add an index to your table. Don't forget to reindex your table after you add your index (you'll need a tool; there's a free Firefox add-on called "SQLite Manager" that does a pretty decent job of this).
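For example, if the slow query filters on a particular column, an index along these lines (table and column names are hypothetical) is usually all that's needed:

CREATE INDEX idx_items_category ON items(category_id);
-- Optionally refresh the planner statistics afterwards:
ANALYZE;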