How to find all objects that query a table in DB2

I want to find all objects (stored procedures, views, etc.) that run queries against each table in DB2 hosted on IBM iSeries. Is there a way to track this over a time period, say one month?

You can view the plan cache. This is not exactly what you are asking for, but I believe it should give you the information you need. The plan cache is a cache of all SQL statements that have been executed, along with performance information about them. You can filter the plan cache by a number of things, including the run date and the objects referenced.
This is an interactive tool that is available in System i Navigator, or more recently in iACS (IBM i Access Client Solutions). In iACS, you can search the plan cache as follows:
Navigate to Database -> SQL Performance Center.
Under the Plan Cache tab, click the Show Statements button.
The resulting dialog has filters on the left and statements on the right, sorted by total (cumulative) processing time in descending order. As long as you don't need an automated task to handle your queries, and you are not looking for the programs involved, this should give you the information you are looking for.
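If you would rather capture this on a schedule (for example, daily over your one-month window), recent Db2 for i releases also let you snapshot the plan cache with SQL rather than through the GUI. This is only a sketch: the procedure name QSYS2.DUMP_PLAN_CACHE is an assumption to verify against your release's documentation, and MYLIB/PCSNAP1 are placeholder names.
-- Snapshot the plan cache into a file, then examine it with SQL.
-- (Procedure availability, signature, and output columns vary by release - verify first.)
CALL QSYS2.DUMP_PLAN_CACHE('MYLIB', 'PCSNAP1');
SELECT * FROM MYLIB.PCSNAP1;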
If you are looking for the programs that reference a given table, you can use the command DSPPGMREF. This command captures (to an output file, in your case) all the objects referenced by a given program. As long as you run the command over all libraries that contain programs you are interested in, you can then query the output file to find every program that references a given table (dynamic queries excepted).
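As a minimal sketch (all library, file, and table names below are placeholders, and QSYS2.QCMDEXC is just one convenient way to run the CL command from an SQL session), the collection step and the follow-up query could look something like this; the output-file field names come from the model file QSYS/QADSPPGM, so confirm them on your release:
-- Collect the references for every program in one library into an output file.
CALL QSYS2.QCMDEXC('DSPPGMREF PGM(MYLIB/*ALL) OUTPUT(*OUTFILE) OUTFILE(QTEMP/PGMREFS)');
-- Find every program that references the table MYTABLE.
-- WHLIB/WHPNAM identify the program; WHFNAM is the referenced object.
SELECT WHLIB, WHPNAM, WHFNAM
FROM QTEMP.PGMREFS
WHERE WHFNAM = 'MYTABLE';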

Related

Column level lineage without access to code

I am trying to determine column-level lineage between a target table and a number of source tables. The columns that end up in the target table come from one or more of the source tables and may have been transformed by one or more intermediate processes. Trouble is, I have no access to the intermediate processes - all I have are source tables and a target table. I am trying to find out whether there exists a class of solutions or tools for column level (fine-grained) lineage that assumes black-box processes.
Solutions that my searches turned up appear to determine lineage from code, e.g. SQL queries, which obviously requires some foresight and pre-integration. I am working with legacy systems and data from different organizations, for which getting access to the transformation processes is just not going to happen. Searches for black-box column-level lineage didn't return anything I consider useful.
I am about to sketch out a custom solution but didn't want to undertake such a huge task without making sure something already exists. It seems unlikely that no one has tackled this problem.
Separately, I would also like to know whether there exist open-source, standalone visualization tools for column-level lineage that was generated by another process.

Measure performance of DB2 view

I am trying to compare the performance of a view before and after adding an index, so I am measuring it using the query below:
create table qtemp.ffs as (select * from psavlldsvw) with data  
Statement ran successfully   (1,932 ms  =  1.932 sec)
The statement above is what I used, where psavlldsvw is the view name.
As you might guess, the idea is to measure how much time the above query takes to complete in both cases.
Can I please get some feedback on how good this method is for comparison?
The test is indeed meaningless...
First of all, the question is poorly worded: you cannot, and are not, testing a view. Views are performance-neutral on Db2 for i.
Running a statement, adding an index, and rerunning the statement is a meaningless test. Db2 for i has all kinds of tricks built in to improve the speed of a repeated statement. Among them:
Input data cached in memory
Data access paths are left open
Starting from a fresh connection, you can ensure that no data is in memory by using SETOBJACC OBJ(YOURLIB/YOURFILE) OBJTYPE(*FILE) POOL(*PURGE) for each table referenced by your statement.
Now run the statement multiple times, at least 3 if the system defaults have not been changed. You should see that the first (few) iterations are slower than the last few. This is a result of the data access path being left open for a repeated statement.
Now add your index, disconnect/reconnect, clear the object(s) from memory and run your tests again.
Depending on the use case for the statement, you may want to focus on the first iteration performance or the later iterations.
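A minimal sketch of that sequence, assuming an SQL session: MYLIB/MYTABLE stands for a table referenced by the view, and QSYS2.QCMDEXC is used here simply to run the CL command from SQL.
-- 1. From a fresh connection, purge each table referenced by the view from memory.
CALL QSYS2.QCMDEXC('SETOBJACC OBJ(MYLIB/MYTABLE) OBJTYPE(*FILE) POOL(*PURGE)');
-- 2. Run the statement under test and note the elapsed time; drop the result
--    table and repeat at least three times. The later runs benefit from cached
--    data and open access paths.
CREATE TABLE QTEMP.FFS AS (SELECT * FROM PSAVLLDSVW) WITH DATA;
DROP TABLE QTEMP.FFS;
-- 3. Add the index, disconnect and reconnect, purge again, and rerun the test.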
Mao is correct in that using Visual Explain (VE) is the best way to see if an index is being used or otherwise understanding how the query is performing.
Lastly, realize that load on the server affects how the query engine operates. The query optimizer calculates your job's "fair share" of memory, and that value affects whether or not some more efficient yet memory-intensive plans are used. So if you're testing in a non-prod environment that doesn't closely match prod in terms of resources, data size, and load, the results are likely to differ when the query is moved to prod.
Performance tuning is part art, part science. Generally, use VE to ensure that you've got a decent query to start with, then monitor actual production use to ensure that it's performing as expected.

How to get the data source query from Power BI datasets

We have hundreds of datasets in which one or more tables are sourced from Analysis Services Tabular (Import mode). On Analysis Services we have set up a log (extended events), where we can see how long each query took to process.
Now I would like to find out which Power BI report/dataset certain queries come from, so that I can point the business users at the badly performing queries and ask them to improve them. I can't find a way to do this.
Is there a way to do that? Can we list datasets together with their queries?
No, you can't do that, or at least not exactly. One option, if you have Power BI Premium, is to use the Metrics app to find the reports with the highest query wait times. Another option is to use the Scanner API, which can give you the tables used in each model.

Tableau extract vs live

I just need a bit more clarity around Tableau extract vs. live. I have 40 people who will use Tableau and a bunch of custom SQL scripts. If we go down the extract path, will the custom SQL queries only run once, with all instances of Tableau using a single result set, or will each instance of Tableau run the custom SQL separately and only cache those results locally?
There are some aspects of your configuration that aren't completely clear from your question. Tableau extracts are a useful tool - they are essentially a temporary, but persistent, cache of query results. They act similarly to a materialized view in many respects.
You will usually want to employ your extract in a central location, often on Tableau Server, so that it is shared by many users. That's typical. With some work, you can make each individual Tableau Desktop user have a copy of the extract (say by distributing packaged workbooks). That makes sense in some environments, say with remote disconnected users, but is not the norm. That use case is similar to sending out data marts to analysts each month with information drawn from a central warehouse.
So the answer to your question is that Tableau provides features that you can employ as you choose to best serve your particular use case -- either replicated or shared extracts. The trick is then just to learn how extracts work and employ them as desired.
The easiest way to have a shared extract, is to publish it to Tableau Server, either embedded in a workbook or separately as a data source (which is then referenced by workbooks). The easiest way to replicate extracts is to export your workbook as a packaged workbook, after first making an extract.
A Tableau data source is the metadata that references an original source, e.g. a CSV file or a database. A Tableau data source can optionally include an extract that shadows the original source. You can refresh or append to the extract to see new data. If published to Tableau Server, you can have the refreshes happen on a schedule.
Storing the extract centrally on Tableau Server is beneficial, especially for data that changes relatively infrequently. You can capture the query results, offload work from the database, reduce network traffic, and speed up your visualizations.
You can further improve performance by filtering (and even aggregating) extracts to include only the data needed to display your viz. This is very useful for large data sources like web server logs, where the aggregation is done once at extract creation time. Extracts can also simply capture the results of long-running SQL queries instead of repeating them at visualization time.
If you do make aggregated extracts, just be careful that any further aggregation you do in the visualization makes sense. SUMs of SUMs and MINs of MINs are well defined; averages of averages, etc., are not always meaningful.
If you use an extract, then it will behave like a materialized SQL table, so anything that changes upstream of the Tableau extract will not influence the result until the extract is refreshed.
An extract is used when the data needs to be processed very fast. In this case, a copy of the source data is stored in Tableau's in-memory data engine, so query execution is very fast compared to a live connection. The only problem with this method is that the data won't automatically update when the source data is updated.
A live connection is used when handling real-time data. Here each query runs against the source data, so performance won't be as good as with an extract.
If you need to work on a static database, use an extract; otherwise, use a live connection.
I get the feeling from your question that you are worried about performance, which is why you are wondering whether your users should use a Tableau extract or a live connection.
In my opinion, for both cases (live vs. extract) it all depends on your infrastructure and the size of the table. It makes no sense to make an extract of a huge table that would take hours to download (for example, 1 billion rows and 400 columns).
If all your users connect directly to the database (not to a Tableau server), you may run into different issues. If the tables they connect to are relatively small and your database handles multiple users well, that may be fine. But if the database has to run many resource-intensive queries in parallel, on big tables, is not optimized for many simultaneous users, and sits in a different time zone with high latency, finding a solution will be a nightmare. In the worst-case scenario you may have to change your data structure and update your infrastructure to allow 40 users to access the data simultaneously.

Query log equivalent for Progress/OpenEdge

Short story: a report running against a Progress database (OpenEdge Release 10.1C03) takes hours to complete. I suspect that it does not take advantage of existing data indexes. I would like to understand how it scans the data so I can then try to add an index that will make it run faster.
Source code of the report is not available. The code is native Progress 4GL, not SQL.
If it were an SQL database, I would try to do a dump of the SQL queries and go from there. With 4GL I did not find any such functionality. Is it possible to somehow peek at what gets executed at a low level?
What else can be done if there is no source code?
Thanks!
There are several things you can do:
If I recall correctly, 10.1C should have the _UserTableStat and _UserIndexStat virtual system tables available. These allow you to observe, at runtime, which tables and indexes are being accessed by a particular session. You can either write your own 4GL program to query them, or you can use the screens in PROMON, R&D, 3 "Other Displays", 5 "I/O Operations by User by Table" and 6 "I/O Operations by User by Index". That will show you which tables and indexes are actually in use and how much use they are getting. If the observed data seems wrong, it will probably give you a clue. (If the VSTs are missing, it might be because the db was upgraded from an older version -- add them with proutil dbname -C updatevsts.)
You could also use the session startup parameters -clientlog "filename" and -logentrytypes QryInfo to obtain more detailed information about the queries being executed.
Keep in mind that Progress is not SQL. Unlike most SQL databases, the 4GL uses a static, compile-time optimizer. Index selection happens when the code is compiled, so unless you can recompile (and since you don't have the source, that seems unlikely) you won't be able to improve things by adding missing indexes. You might, however, at least be able to show the person who does have the source where the problem is.
Another tool that can help is the profiler. This will identify where in the code the time is being spent. That can also be good information to provide to the original vendor if they need help finding the problem. For more information on the profiler: http://dbappraise.com/ppt/profiler.pptx