I'm doing some tests to evaluate PostgreSQL query performance. My plan is to disable/enable different query optimizations, such as disabling (automatic) index use, disabling join reordering, etc. Is there some configuration to achieve this quickly, or what part of the source code should I revise? Thanks.
PostgreSQL provides debug parameters for exactly this.
See the enable_ params in the docs.
Please don't use them in production; it might seem like a good idea at the time, but those settings won't adapt as the data changes and grows, so what starts out as a performance win will probably become a problem later.
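For example, individual planner strategies can be discouraged per session; a minimal sketch (the table name is a placeholder, and the available enable_ parameters vary by PostgreSQL version):

-- Discourage particular plan types for the current session only
SET enable_seqscan = off;       -- steer the planner away from sequential scans
SET enable_hashjoin = off;      -- steer the planner away from hash joins
SET enable_mergejoin = off;     -- steer the planner away from merge joins
SET join_collapse_limit = 1;    -- keep explicit JOINs in their written order (no join reordering)

-- Compare the resulting plan and timing against the defaults
EXPLAIN ANALYZE SELECT * FROM my_table WHERE id = 42;   -- my_table is a placeholder

-- Put everything back afterwards
RESET ALL;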
Related
I'm studying query optimization and want to know how much each kind of optimization helps a query. Last time I got an answer, but in my experiments, disabling all the optimizations in the link gives a time complexity of roughly O(n^1.8), while enabling all of them gives roughly O(n^0.5). That is not so much of a difference. If I disable all of them, are there still other optimizations at work? How can I really have only one main optimization active at a time?
You can't.
PostgreSQL's query planner has no "turn off optimisation" flag.
It'd be interesting to add, but would make the regression tests a lot more complex, and be of very limited utility.
To do what you want, I think you'd want to modify the query planner code, recompile, and reinstall PostgreSQL for each test. Or hack it to add a bunch of custom GUCs (system variables, like enable_seqscan) to let you turn particular optimisations on and off.
I doubt any such patch would be accepted into PostgreSQL, but it'd be worth doing as a throwaway.
The only challenge is that PostgreSQL doesn't differentiate strongly between "optimisation" and "thing we do to execute the query". Sometimes parts of the planner code expect and require that a particular optimisation has been applied in order to work correctly.
Short story: A report running against a Progress database (OpenEdge Release 10.1C03) takes hours to complete. I suspect that it does not take advantage of existing data indexes. I would like to understand how it scans the data so that I can try to add an index that will make it run faster.
Source code of the report is not available. The code is native Progress 4GL, not SQL.
If it were an SQL database I would try to do a dump of SQL queries and would then go from that. With 4GL I did not find any such functionality. Is it possible to somehow peek at what gets executed at the low level?
What else can be done if there is no source code?
Thanks!
There are several things you can do:
If I recall correctly 10.1C should have the _usertablestat and _userindexstat virtual system tables available. These allow you to observe, at runtime, what tables and indexes are being accessed by a particular session. You can either write your own 4GL program to query them or you can use the screens in PROMON, R&D, 3 "Other Displays", 5 "I/O Operations by User by Table" and 6 "I/O Operations by User by Index". That will show you what tables and indexes are actually in use and how much use they are getting. If the observed data seems wrong it will probably give you a clue. (If the VSTs are missing it might be because the db was upgraded from an older version -- add them with proutil dbname -C updatevsts.)
You could also use the session startup parameters -clientlog "filename" and -logentrytypes QryInfo to obtain more detailed information about the queries being executed.
Keep in mind that Progress is not SQL. Unlike most SQL databases, the 4GL uses a static, compile-time optimizer: index selection happens when the code is compiled. So unless you can recompile (and since you don't seem to have the source, that seems unlikely) you won't be able to improve things by adding missing indexes. You might, however, at least be able to show the person who does have the source where the problem is.
Another tool that can help is the profiler. This will identify where in the code the time is being spent. That can also be good information to provide to the original vendor if they need help finding the problem. For more information on the profiler: http://dbappraise.com/ppt/profiler.pptx
I just got into a new company and my task is to optimize the database performance. One possible (and suggested) way would be to use multiple servers instead of one. As there are many possible ways to do that, I need to analyse the DB first. Is there a tool with which I can measure how many inserts, updates and deletes are performed on each table?
I agree with Surfer513 that the DMV is going to be much better than CDC. Adding CDC is fairly complex and will add a load to the system. (See my article here for statistics.)
I suggest first setting up a SQL Server Trace to see which commands are long-running.
If your system makes heavy use of stored procedures (which hopefully it does), also check out sys.dm_exec_procedure_stats. That will help you to concentrate on the procedures/tables/views that are being used most-often. Look at execution_count and total_worker_time.
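For example, something along these lines will surface the most expensive cached procedures (a sketch; the TOP value is arbitrary and total_worker_time is reported in microseconds):

SELECT TOP (20)
       OBJECT_NAME(ps.object_id, ps.database_id) AS procedure_name,
       ps.execution_count,
       ps.total_worker_time,                                    -- total CPU time
       ps.total_worker_time / ps.execution_count AS avg_worker_time
FROM sys.dm_exec_procedure_stats AS ps
ORDER BY ps.total_worker_time DESC;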
The point is that you want to determine which parts of your system are slow (using Trace) so that you know where to spend your time.
One way would be to utilize Change Data Capture (CDC) or Change Tracking. I'm not sure how in-depth you want to go with this, but there are simpler ways to get a rough estimate (it doesn't look like you want exact numbers, just ballpark figures..?).
Assuming that there are indexes on your tables, you can query sys.dm_db_index_operational_stats to get data on inserts/updates/deletes that affect the indexes. Again, this is a rough estimate but it'll give you a decent idea.
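A rough per-table tally for the current database might look something like this (a sketch; the leaf-level counters are not persisted across server restarts):

SELECT OBJECT_NAME(ios.object_id) AS table_name,
       SUM(ios.leaf_insert_count) AS leaf_inserts,
       SUM(ios.leaf_update_count) AS leaf_updates,
       SUM(ios.leaf_delete_count) AS leaf_deletes
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
GROUP BY ios.object_id
ORDER BY SUM(ios.leaf_insert_count + ios.leaf_update_count + ios.leaf_delete_count) DESC;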
Can we get a list of basic optimization techniques going (anything from data modeling to querying, creating indexes and views to query optimization)? It would be nice to have a list of these, one technique per answer. As a hobbyist I would find this very useful, thanks.
And for the sake of not being too vague, let's say we are using a mainstream DB such as MySQL or Oracle, and that the DB will contain 500,000-1m or so records across ~10 tables, some with foreign key constraints, all using the most typical storage engines (e.g. InnoDB for MySQL). And of course, the basics such as PKs are defined as well as FK constraints.
Learn about indexes, and use them properly. Generally speaking*, follow these guidelines:
Every table should have a clustered index
Fields used for filters and sorts are good candidates for indexing
More selective fields are better candidates for indexing
For best performance on crucial queries, design "covering indexes" for those queries (see the sketch after this list)
Make sure your indexes are actually being used, and remove those that aren't
If your table has 15 fields, and you make 15 indexes, each with only a single field, you're doing it wrong :)
*There are some exceptions to these rules if you know what you're doing. My experience is Microsoft SQL Server, but I would presume most of this advice would still apply to a different RDBMS.
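As an illustration of a covering index, the sketch below uses SQL Server's INCLUDE syntax against a made-up Orders table; adapt the idea to your own schema and RDBMS:

-- Query to cover:
--   SELECT OrderDate, TotalAmount FROM Orders WHERE CustomerId = @CustomerId ORDER BY OrderDate;
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_OrderDate
    ON Orders (CustomerId, OrderDate)   -- key columns: the filter column, then the sort column
    INCLUDE (TotalAmount);              -- non-key column stored at the leaf level, so the query never touches the base table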
IMO, by far the best optimization is to have the data model fit the problem domain for which it was built. When it does not, the resulting symptom is difficult-to-write or convoluted queries to get the information desired, and that typically rears its head when reports are built against the database. Thus, in designing a database it helps to have an idea of the types and nature of the information, such as reports, that users will want from the system.
When talking database design, check out database normalization, e.g. the Wikipedia article on normal forms.
If you have a good design and still need to optimize for performance, try denormalisation.
If you have specific needs which are not covered by relational model efficiently, look at other models covered by the term NoSQL.
Some query/schema optimizations:
Be mindful when using DISTINCT or GROUP BY. I find that many new developers will use DISTINCT in places where it really is not needed, or where the query could be rewritten more efficiently using an EXISTS clause or a derived query.
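For example, a de-duplicating join can often be rewritten with EXISTS (a sketch using the same Orders/Customers tables as the example below):

-- Customers who have at least one order, via DISTINCT:
Select Distinct Customers.Id
From Customers
Join Orders
On Orders.CustomerId = Customers.Id

-- The same result, usually cheaper, via EXISTS:
Select Customers.Id
From Customers
Where Exists (Select 1 From Orders Where Orders.CustomerId = Customers.Id)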
Be mindful of Left Joins. All too often I find new SQL developers will ignore the schema in place and use Left Joins where they really are not necessary. For example:
Select *
From Orders
Left Join Customers
On Customers.Id = Orders.CustomerId
If Orders.CustomerId is a required (NOT NULL) column with a foreign key to Customers, then it is not necessary to use a left join; an inner join returns the same rows.
Be a student of new features. Currently, MySQL does not support common-table expressions which means that some types of queries are cumbersome and probably slower to write than they would be if CTEs were supported. However, that will not be true forever. Keep up on new syntax features in MySQL which might be used to make existing queries more efficient.
You do not have to use surrogate keys everywhere. There might be tables better suited to an intelligent key (e.g. US State abbreviations, Currency Codes etc) which would enable developers to avoid additional joins in many cases.
If possible, find ways of archiving data to an OLAP or reporting server. The smaller you can make the production data, the faster it will run.
A design that concisely models your problem is always a good start. Overgeneralizing the data model can lead to performance problems. For example, I've heard reports of projects striving for uber-flexibility that use the RDBMS as a dumb "name/value" store - and resulting performance was appalling.
Once a good design is in place, use the tools provided by the RDBMS to help it achieve good performance: single-field PKs (no composites), with composite business keys enforced by a unique index; appropriate data types, e.g. numeric types for numeric values rather than char or similar. The physical attributes of the hardware the RDBMS runs on should also be considered, since the bulk of query time is often disk I/O - but don't take this for granted; use a profiler to find out where the time is going.
Depending upon the update/query ratio, materialized views/indexed views can be useful in improving performance for slow running queries. A poor-man's alternative is to use triggers to invoke a procedure that populates the table with a result of a slow-running, infrequently-changed view.
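As an illustration, an indexed view in SQL Server might look like this (a sketch with a hypothetical Orders table whose TotalAmount column is NOT NULL; SCHEMABINDING and COUNT_BIG are required for indexed views):

CREATE VIEW dbo.OrderTotalsByCustomer
WITH SCHEMABINDING
AS
SELECT CustomerId,
       COUNT_BIG(*) AS OrderCount,
       SUM(TotalAmount) AS TotalSpent
FROM dbo.Orders
GROUP BY CustomerId;
GO
-- The unique clustered index is what materializes the view
CREATE UNIQUE CLUSTERED INDEX IX_OrderTotalsByCustomer
    ON dbo.OrderTotalsByCustomer (CustomerId);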
Query optimization is a bit of a black art since it is often database-dependent, but some rules of thumb are given here - Optimizing SQL.
Finally, although possibly outside the intended scope of your question, use a good data access layer in your application, and avoid the temptation to roll your own - there are surely tested and performant implementations available for all major languages. Use of caching at the data access layer, middle tier and application layer can help improve performance considerably.
Do use less query whenever possible. Use "JOIN", and group your tables so that a single query gives your results.
A good example is the Modified Preorder Tree Transversal (MPTT) to get all of a tree node parents, ordered, in a single query.
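For example, with a hypothetical category table that stores lft/rgt bounds, all ancestors of a node (including the node itself) come back in one ordered query:

SELECT parent.name
FROM category AS node
JOIN category AS parent
    ON node.lft BETWEEN parent.lft AND parent.rgt   -- an ancestor's interval encloses the node's
WHERE node.name = 'Some Leaf Category'              -- placeholder for the starting node
ORDER BY parent.lft;                                -- root first, immediate parent last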
Take a holistic approach to optimization.
Consider the impact of slow disks, network latency, lack of memory, and server load.
How can I check which query has been running for a long time, and what are the steps for tuning that query? (Oracle)
Run explain plan for select .... to see what Oracle is doing with your query.
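A minimal sketch (the SELECT is a placeholder; DBMS_XPLAN.DISPLAY formats the plan that EXPLAIN PLAN writes to the plan table):

EXPLAIN PLAN FOR
    SELECT * FROM orders WHERE customer_id = :cust_id;   -- placeholder query

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);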
Post your query here so that we can look at it and help you out.
Check out the Oracle Performance Tuning FAQ for some tricks-of-the-trade, if you will.
You can capture the query by selecting from v$sql or v$sqltext.
If you are not familiar with it, look up 'Explain Plan' in the Oracle documentation. There should be plenty on it in the performance tuning guide.
Have a look at Quest Software's Toad for a third party tool that helps in this area too.
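For example, the most expensive cached statements can be pulled from v$sql along these lines (a sketch; elapsed_time is reported in microseconds):

SELECT sql_id,
       executions,
       elapsed_time,                                    -- total elapsed time across executions
       elapsed_time / NULLIF(executions, 0) AS elapsed_per_exec,
       sql_text
FROM   v$sql
ORDER  BY elapsed_time DESC;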
Unfortunately your question is not expressed clearly. The other answers have already tackled the issue of tuning a known bad query, but another interpretation is that you want to monitor your database to find poorly performing queries.
If you don't have Enterprise Edition with the Diagnostics pack - and not many of us do - your best bet is to run statspack snapshots on a regular basis. This will give you a lot of information about your system, including which queries take a long time to complete and which queries consume a lot of your system's resources. You can find out more about statspack here.
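Taking snapshots and reporting on the interval between two of them looks roughly like this, run from SQL*Plus as the PERFSTAT user (a sketch; it assumes statspack has already been installed with spcreate.sql):

EXECUTE statspack.snap;            -- take a snapshot; schedule this to run regularly
-- ...let the workload run, then take another snapshot...
EXECUTE statspack.snap;
@?/rdbms/admin/spreport.sql        -- prompts for two snapshot ids and produces the report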
If you do not want to use OEM, you can find this out by querying the dictionary and dynamic performance views.
First, find the long-running query. If it is currently executing, you can join gv$session to gv$sql to see which sessions have been running for a long time (look at the last_call_et column) and what SQL they are executing. If the SQL ran some time in the past, use the dba_hist_snapshot, dba_hist_sqlstat and dba_hist_sqltext views to find the offending SQL.
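For instance, active sessions that have been in their current call for more than ten minutes, joined to the SQL they are running (a sketch; the threshold is arbitrary and last_call_et is in seconds):

SELECT s.inst_id,
       s.sid,
       s.serial#,
       s.username,
       s.last_call_et,                  -- seconds since the current call started
       q.sql_id,
       q.sql_text
FROM   gv$session s
JOIN   gv$sql q
       ON  q.sql_id       = s.sql_id
       AND q.inst_id      = s.inst_id
       AND q.child_number = s.sql_child_number
WHERE  s.status = 'ACTIVE'
  AND  s.type   = 'USER'
  AND  s.last_call_et > 600;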
Once you have the query, check which plan it is picking: from dba_hist_sql_plan if the SQL executed in the past, or from gv$sql_plan if it is currently executing.
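For a statement still in the shared pool, DBMS_XPLAN.DISPLAY_CURSOR is a convenient wrapper over gv$sql_plan, and DBMS_XPLAN.DISPLAY_AWR does the same for plans captured in AWR (the sql_id below is a placeholder; AWR access requires the Diagnostics Pack licence):

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('0abcd1efgh234'));   -- plan from the shared pool
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_AWR('0abcd1efgh234'));      -- plan history from AWR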
Now analyze the execution plan and see whether it is using the right indexes, join methods and so on. If not, tune those.
Let me know which step you have a problem with; I can help you with it.