Crystal Reports - very large database, very long processing time

I'm really at a loss as to how to proceed.
I have a very large database, and the table I'm accessing has approx. 600,000 records. This database is accessed using an accounting application, which provides the report with the SQL query by which this report accesses the database.
My report has a linked subreport which has restrictions that are placed in the report header. When this report is run, the average time to refresh, using a very basic query, is 36 minutes. When two more items are added to the query, the report takes 2.5 hours.
Here is what I've tried:
cleaned up the report, leaving in only the absolutely necessary items - no difference
removed most formulas (removing the remaining formulas makes no time difference)
tried editing the SQL query - wasn't allowed because of the accounting application
tried flipping subreport and main report - didn't work
added other groupings - no difference
removed groupings - no difference
checked all the servers for lack of temp disc space - no issue
tried "on demand" subreport - no change
checked Parameters (discrete vs. range) and it is as it should be
tried bursting indexes, grouping on server, etc. - no difference
the report requires 2 passes. I've tried getting it down to one pass unsuccessfully.
There must be something I'm missing.
There do not appear to be any other modifications I can make to the report using regular Crystal functions. Is there any way to speed up access to the data without having to go through all 600,000 records? The SQL query that accesses this data is long and has many requests. It is not something I can change.
Can I add something (formula?) that nullifies these requests? I'm reaching now...

A couple of things we have had success with are adding indexes to the database and, instead of importing tables into the report, writing a stored procedure to retrieve the desired results.
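As a rough illustration of why an index can matter, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the real database. The `ledger` table, its columns, and the index name are hypothetical, and the accounting database will have its own tooling for viewing execution plans.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (id INTEGER PRIMARY KEY, account TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO ledger (account, amount) VALUES (?, ?)",
    [(f"ACC{i % 100:03d}", float(i)) for i in range(10_000)],
)
conn.commit()

QUERY = "SELECT amount FROM ledger WHERE account = ?"


def query_plan(conn):
    """Return SQLite's plan description for QUERY as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("ACC007",)).fetchall()
    # The fourth column of each plan row holds the human-readable detail.
    return " | ".join(row[3] for row in rows)


plan_before = query_plan(conn)  # no index on account yet: full table scan
conn.execute("CREATE INDEX idx_ledger_account ON ledger (account)")
plan_after = query_plan(conn)   # the planner can now search via the index

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing the two plans shows whether the filter column is actually served by the index. The same before/after check is worth doing on the real database for the columns the report filters and joins on.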

If indexes and stored procedures don't get you where you need to be, you have reached the "denormalise until it works" part of life with a database. You might want to look at creating an MI database with tables optimized for your reporting needs, plus some data transformation scripts that extract the data from production into your MI database. Depending on which database it is, Oracle and Microsoft have tools to help you do this.

We use Crystal Reports with a billing system, and we had queries in the database that took over 1.5 hours to complete. This doesn't even take into account the rendering/formatting of the reports.
We created Materialized Views and force the client to refresh them daily. A materialized view is basically a database view that holds the returned dataset. The dataset is not refreshed unless you explicitly tell it to refresh.
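The idea can be sketched as follows. SQLite has no materialized views, so this example emulates one with an ordinary snapshot table that is rebuilt only when a refresh function is called. The table names, columns, and data are made up for illustration; on Oracle you would use a real materialized view instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice_line (customer TEXT, amount REAL);
    INSERT INTO invoice_line VALUES ('acme', 100.0), ('acme', 50.0), ('zenith', 75.0);
    -- Snapshot table playing the role of the materialized view.
    CREATE TABLE customer_totals_mv (customer TEXT, total REAL);
""")


def refresh_customer_totals(conn):
    """Rebuild the snapshot from the (slow) base query."""
    with conn:  # DELETE and INSERT are committed together
        conn.execute("DELETE FROM customer_totals_mv")
        conn.execute("""
            INSERT INTO customer_totals_mv (customer, total)
            SELECT customer, SUM(amount) FROM invoice_line GROUP BY customer
        """)


def read_totals(conn):
    """What the report reads: the stored result, not the base tables."""
    return conn.execute(
        "SELECT customer, total FROM customer_totals_mv ORDER BY customer"
    ).fetchall()


refresh_customer_totals(conn)
snapshot_before = read_totals(conn)

# New data arrives, but the snapshot does not change until it is refreshed.
conn.execute("INSERT INTO invoice_line VALUES ('zenith', 25.0)")
conn.commit()
snapshot_stale = read_totals(conn)

refresh_customer_totals(conn)
snapshot_after = read_totals(conn)
```

The report only ever runs the cheap `read_totals` query. The trade-off is staleness: anything added since the last refresh is invisible until the next one.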

Do you know what the SQL query is? If so, you can move the report outside the accounting application and paste the query directly into the Command in the database expert. I've had to do this in a couple of cases with another application I work with.

Related

DB2 Tables Not Loading when run in Batch

I have been working on a reporting database in DB2 for a month or so, and I have it set up pretty much the way I want. However, I am noticing small inconsistencies that I have not been able to work out.
Less important, but still annoying:
1) Users claim it takes two login attempts to connect, first always fails, second is a success. (Is there a recommendation for what to check for this?)
More importantly:
2) Whenever I want to refresh the data (which will be nightly), I have a script that drops and then recreates all of the tables. There are 66 tables, each ranging from tens of records to just under 100,000 records. The data is not massive, and it takes about 2 minutes to run all 66 tables.
The issue is that once it says it has completed, there are usually at least 3-4 tables that did not load any data. So the table is dropped and then created, but is empty. The log shows that the command completed successfully, and if I run the commands independently the tables populate just fine.
If it helps, 95% of the commands are just CAST functions.
While I am sure I am not doing it the recommended way, is there a reason why a number of my tables are not populating? Are the commands executing too fast? Should I lag the Create after the DROP?
(This is DB2 Express-C 11.1 on Windows 2012 R2. The source DB is remote.)
Example of my SQL:
DROP TABLE TEST.TIMESHEET;
CREATE TABLE TEST.TIMESHEET AS (
SELECT NAME00, CAST(TIMESHEET_ID AS INTEGER(34))TIMESHEET_ID ....
.. (for 5-50 more columns)
FROM REMOTE_DB.TIMESHEET
)WITH DATA;
It is possible to configure DB2 to tolerate certain SQL errors in nested table expressions.
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyfqetnint.html
When the federated server encounters an allowable error, the server allows the error and continues processing the remainder of the query rather than returning an error for the entire query. The result set that the federated server returns can be a partial or an empty result.
However, I assume that your REMOTE_DB.TIMESHEET is simply a nickname, and not a view with nested table expressions, and so any errors when pulling data from the source should be surfaced by DB2. Taking a look at the db2diag.log is likely the way to go - you might even be hitting a Db2 issue.
It might be useful to change your script to TRUNCATE and INSERT into your local tables and see if that helps avoid the issue.
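A minimal sketch of that empty-and-reload pattern, with a row-count check so that a table that silently loads nothing is reported instead of logged as a success. It uses Python's sqlite3 module as a stand-in for DB2, so `DELETE` replaces `TRUNCATE`, and the function and table names are hypothetical.

```python
import sqlite3


def reload_table(conn, target, source):
    """Empty `target`, refill it from `source`, and return the rows loaded.

    Table names are interpolated directly, so they must come from a trusted
    list in the script, never from user input.
    Raises RuntimeError if the loaded row count differs from the source.
    """
    with conn:  # commit on success, roll back if anything raises
        conn.execute(f"DELETE FROM {target}")  # SQLite has no TRUNCATE
        conn.execute(f"INSERT INTO {target} SELECT * FROM {source}")
        loaded = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
        expected = conn.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]
        if loaded != expected:
            raise RuntimeError(f"{target}: loaded {loaded} rows, expected {expected}")
    return loaded


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE remote_timesheet (name00 TEXT, timesheet_id TEXT);
    CREATE TABLE timesheet (name00 TEXT, timesheet_id INTEGER);
    INSERT INTO remote_timesheet VALUES ('alice', '101'), ('bob', '102');
""")

rows_loaded = reload_table(conn, "timesheet", "remote_timesheet")
```

Because the local table is never dropped, a failed load rolls back and leaves the previous night's data in place rather than an empty table. Running the same check after each of the 66 loads would at least pinpoint which tables come back empty.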
As you say, you may not be doing things the most efficient way. You could consider using cache tables to take a periodic copy of your remote data: https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.data.fluidquery.doc/topics/iiyvfed_tuning_cachetbls.html

Slowness of Jasper Studio vs Jasper Server

I am facing a performance issue with Jasper Server. My query is for a crosstab. The query works fine in Toad as well as in Jasper Studio, but its execution is very slow in Jasper Server and sometimes it even fails with a connection timeout.
I can't understand the reason for this behavior. Please help me.
Thank you
Query performance in Jasper Server depends on various factors, but to get a quick idea of where the bottleneck might be in the case of a crosstab (Ad Hoc functionality), follow these steps:
Log in to JasperReports Server through the web UI (as superuser) and take a look at Manage => Server Settings => Ad Hoc Cache. Here, analyze the Query and Fetch column values.
Query (msec)
It shows the time from when the query was sent to the database until the first row was received. If this is slow, one possible improvement is to index some of the fields used in the underlying database query. If you are using derived tables, try switching to actual tables, because derived tables are sub-queries/sub-selects and are expensive performance-wise.
Fetch (msec)
Time from when first row was received until the last row was received. If this is slow there might be a network bottleneck. Try to set the fetch size in the jasperreports.properties file to modify the number of rows to fetch at a time. Optimizing this can reduce the number of trips to the underlying database.

SSIS or TSQL for SQL/MySQL table comparison

I am new to SSIS and am looking for some assistance in creating an SSIS package to do a specific task. My data is stored remotely in a MySQL database and is downloaded to a SQL Server 2014 database. I want to create a package where I can enter 2 dates that are compared against the create date/date modified per record on a number of tables. This would give me a snapshot comparing the MySQL data to the SQL Server data, so that I can see whether any rows are missing from my local SQL Server database or need to be updated. Some tables have no dates, so for those I just want a record count of what is missing, if anything, between the two. If this is better achieved through TSQL, I am happy to hear other suggestions or be pointed to sites where something similar has been done.
In relation to your query, Tab:
"Hi Tab, What happens at the moment is our master data is stored in a MySQL database, and the data was then downloaded to a SQL Server database as a one-off. I have an SSIS package that uses the MAX ID, which can be found on most of the tables, to work out which records are new and just downloads or updates them. What I want to do is run separate checks on the tables to make sure that nothing has been missed during the download and everything is in sync. In an ideal world I would like to pass a date range, say a calendar week, into an SSIS package or TSQL stored procedure, which would then check for any differences between the remote MySQL database tables and the local SQL Server tables. It does not currently have to do anything but identify issues; correcting them may come later, or changes would need to be made to the existing sync package. Hope this makes more sense."
Thanks P
To do this, you need to implement a Type 1 Slowly Changing Dimension type data flow in SSIS. There are a number of ways to do this, including a built in transformation aptly called the Slowly Changing Dimension transformation. Whilst this is easy to set up, it is a pain to maintain and it runs horrendously slowly.
There are numerous ways to set this up using other transformations or even SQL merge statements which are detailed here: https://bennyaustin.wordpress.com/2010/05/29/alternatives-to-ssis-scd-wizard-component/
I would recommend that you use Lookup transformations, as they perform better than the Slowly Changing Dimension transformation and offer better diagnostics and error handling than the better-performing SQL merge statement.
Before you do this you will need to add a Checksum or Hashbytes column to your SQL data for ease of comparison with the incoming MySQL data.
In short, calculate some sort of repeatable checksum as the data is downloaded into your SQL Server, then use this in an SSIS Lookup, matching on the row key, to check for changes. Where the checksum value is different for the same row it needs updating and where there is no matching row key in your SQL Data you need to insert the new row.
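The comparison logic described above can be sketched outside SSIS in a few lines. This is only an illustration of the technique, not the Lookup transformation itself: the helper names, the choice of SHA-256, and the sample rows are all assumptions.

```python
import hashlib


def row_checksum(row):
    """Repeatable checksum over a row's non-key column values.

    Values are joined with a separator that is unlikely to occur in data.
    NULLs are mapped to the empty string, so NULL and '' hash the same here.
    """
    joined = "\x1f".join("" if value is None else str(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def classify_rows(source_rows, local_checksums):
    """Split source keys into rows to insert and rows to update.

    source_rows:     {row key: tuple of column values} from the source system
    local_checksums: {row key: checksum stored when the row was downloaded}
    """
    to_insert, to_update = [], []
    for key, row in source_rows.items():
        stored = local_checksums.get(key)
        if stored is None:
            to_insert.append(key)   # no matching row key locally
        elif stored != row_checksum(row):
            to_update.append(key)   # same key, different content
    return sorted(to_insert), sorted(to_update)


# Checksums stored locally at download time (hypothetical data).
local = {1: row_checksum(("Widget", 9.99)), 2: row_checksum(("Gadget", 4.50))}
# Current state of the source: row 2 has changed and row 3 is new.
source = {1: ("Widget", 9.99), 2: ("Gadget", 5.00), 3: ("Gizmo", 1.25)}

to_insert, to_update = classify_rows(source, local)
```

Here `to_insert` holds the keys missing locally and `to_update` the keys whose content has drifted. The checksum must be computed the same way on both sides (same column order, same formatting of dates and decimals), otherwise every row will look changed.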

Crystal Reports Performance Options

We create several Crystal reports based on SQL Server, usually 2005 or 2008. Broadly, there are 2 kinds of reports:
a) tabular reports - which show some data in a table (for example, an invoice list)
b) document layouts - which show data in a specific format, usually from one or two main tables and several secondary tables (for example, an invoice)
We sometimes use tables directly in Crystal, or create a procedure in SQL and then use that procedure. One invoice typically refers to around 10-12 tables, most of them linked to the primary invoice table using left outer joins.
Which option is better: using tables in Crystal (and letting Crystal create and run the SQL query), or creating a query and then using that query in Crystal? Which one will give better performance?
There will be no difference in performance between a query generated by the 'Database Expert' versus the same SQL added to a Command. One caveat: ensure that the record-selection formula can be parsed and sent to the database (a filter applied WhileReadingRecords will definitely be less efficient than a pure-SQL one).
Reasons to prefer the 'Database Expert':
prior to v 2008, Command objects didn't support a multivalued parameter
easier to manage (somewhat subjective)
Reasons to prefer a Command:
you can add hints
you have more finely-grained control over the SQL (e.g. in-line views, CTEs, more-complex JOINs, subselects)
Personally, I try to avoid stored procedures, as they offer minimal performance benefits but require a more significant investment in development and maintenance.
In the end, there is no substitute for measuring performance. Try your query both ways and compare the results.
Coding it yourself will almost invariably run faster -- after all, you know what your data looks like, and Crystal doesn't. Also, there are things you can do in manual queries (windowing functions, for example) that Crystal can't.
Crystal has a tendency to do some crazy stuff behind the scenes. You can use "Show SQL Query" under the Database menu to see what it creates. I find it easier to write the query in SQL, as I can optimize it myself much more easily. I also prefer to do any calculated/formula fields in SQL too and just use Crystal as a display interface. If you do put logic in Crystal, remember that it runs that logic for every record returned, so if there are conditions that exclude a record from a formula, put them first to limit the time spent in the calculation.

SSRS 2008 - refresh report without requerying data?

I have four parameters on my report. Three of them are required for the underlying stored procedure data source, but the fourth parameter is just used to show/hide items on the report.
If the user changes the value for that fourth parameter, is there a way to refresh the report using the existing data without running the stored procedure again? The result set won't change, only the rows that are to be displayed.
Reporting Services 2008 seems to treat each combination of report parameters as a unique set, even if some of them are internal to the report only, and not related to the stored procedure. Therefore, aside from using report caching, there is no way to prevent report server from making a round trip to the database, even if only the internal parameter changes. You basically have two options:
Turn on report caching in report server, and run all combinations of the four parameters, so that the user will be accessing report server's cache when she runs any report. This avoids making a round trip to the database, but only for the parameter values you've already tried.
Write your underlying stored procedure with caching behavior so that it writes its results to a database table. Whenever the stored procedure is run, have it first check the table to see if the results for the current set of parameter values are already stored in the cache table, and if so return those rows to report server. This still requires a round trip, but it is faster than running the procedure again.
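The second option, a cache table keyed on the parameter values, can be sketched as below. This is a Python/sqlite3 stand-in for logic that would really live inside the T-SQL stored procedure. The table layout, function names, and the fake `expensive_query` are hypothetical, and cache invalidation (deciding when stored rows are too old) is deliberately left out.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report_cache (param_key TEXT PRIMARY KEY, result_json TEXT)")

query_runs = 0  # counts how often the slow path is taken


def expensive_query(region, year, status):
    """Placeholder for the slow stored-procedure body."""
    global query_runs
    query_runs += 1
    return [[region, year, status, 42]]


def get_report_rows(region, year, status):
    """Return cached rows for these parameter values, computing them once."""
    # Only the three data parameters form the key; the show/hide parameter
    # does not affect the result set, so it is not part of the lookup.
    key = json.dumps([region, year, status])
    hit = conn.execute(
        "SELECT result_json FROM report_cache WHERE param_key = ?", (key,)
    ).fetchone()
    if hit is not None:
        return json.loads(hit[0])
    rows = expensive_query(region, year, status)
    with conn:  # commit the new cache entry
        conn.execute("INSERT INTO report_cache VALUES (?, ?)", (key, json.dumps(rows)))
    return rows


first = get_report_rows("EMEA", 2008, "open")   # slow path, fills the cache
second = get_report_rows("EMEA", 2008, "open")  # served from report_cache
```

Changing only the display parameter then costs a cache lookup instead of a full rerun of the procedure.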