Is it better to do string concatenation in the SSRS report, or is it better to do it in the SQL query? - ssrs-2008

I am working on an SSRS report and I have some column values that need to be concatenated when displayed in the report. Is it advisable to do that on the report side, or should I do it in the SQL query and bind the value directly to the report?
I have 4 columns that I have to concatenate into a single column when binding to the report.
There are three different ways to do that:
1. Do it in the SQL query to get the combined column.
2. Create an expression when binding the dataset to the tablix.
3. Create a calculated field in the dataset and bind that to the tablix.
Which of the above three is advisable for better performance?

This question is very broad but let me put it this way.
If you put business rules in the database then they can be consistently reused by many things beyond SSRS, for example Excel, Power BI, and data extracts.
The downside is that it is often more technically difficult to apply rules consistently at a lower level like this. In other words, you need a SQL developer to do this properly, whereas if you did the calculation in SSRS you would just need an SSRS developer.
So if you have a team full of SSRS developers, then it's going to be easier to create and maintain rules in SSRS, but the downside is these rules can't be reused by anything else.
Short answer: do it in a view in the database unless this is going to be difficult to maintain because your team doesn't have any SQL skills.
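For illustration, here is a minimal sketch of that approach (the view, table, and column names are invented, not taken from the question):
-- Combine the four columns once, in the database, so SSRS (and anything else) can reuse the result.
create view dbo.vw_ReportSource
as
select
    t.ReportKey,
    -- ISNULL keeps a single NULL column from turning the whole combined value into NULL
    isnull(t.Col1, '') + ' ' + isnull(t.Col2, '') + ' ' + isnull(t.Col3, '') + ' ' + isnull(t.Col4, '') as CombinedValue
from dbo.SourceTable t;
The SSRS dataset then just selects CombinedValue from the view, and the report needs no expression or calculated field at all.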

How to use RLS with compound field

In Redshift we have a table (let's call it entity) which, among other columns, has two important ones: hierarchy_id and entity_timestampt. The hierarchy_id is a combination of the ids of three hierarchical dimensions (A, B, C; each one having a one-to-many relationship with the next one).
Thus: hierarchy_id == A.a_id || '-' || B.b_id || '-' || C.c_id
Additionally the table is distributed according to DISTKEY(hierarchy_id) and sorted using COMPOUND SORTKEY(hierarchy_id, entity_timestampt).
Over this table we need to generate multiple reports: some of them are fixed to the deepest level of the hierarchy, while others will be filtered by higher parts and group the results by the lower ones. However, the first layer of the hierarchy (the A dimension) is what defines our security model: users will never have access to A dimensions other than the one they belong to (this is our tenant information).
The current design proved to be useful for that matter when we were prototyping the reports in plain SQL, as we could do things like this for the deepest-level queries:
WHERE
entity.hierarchy_id = 'fixed_a_id-fixed_b_id-fixed_c_id' AND
entity.entity_timestampt BETWEEN 'start_date' AND 'end_date'
Or like this for filtering by other points of the hierarchy:
WHERE
entity.hierarchy_id LIKE 'fixed_a_id-%' AND
entity.entity_timestampt BETWEEN 'start_date' AND 'end_date'
Which would still take advantage of the DISTKEY & SORTKEY setup, even though we are filtering just for a partial path of the hierarchy.
Now we want to use QuickSight for creating and sharing those reports using the embedding capabilities. But we haven't found a way to filter the data of the analysis as we want.
We tried to use RLS by tags for anonymous users, but we have found two problems:
1. How to inject the A.a_id part of the filter into the API call that generates the embedding URL in a secure way (i.e. so that users can't change it), while still allowing users to configure the other parts of the hierarchy, and then how to combine those independent pieces in the filter without needing to generate a new URL each time users change the other parts.
(We may be able to live with this limitation, but:)
2. How to do partial filters, i.e. the ones that looked like LIKE 'fixed_a_id-fixed_b_id-%', since it seems RLS always applies an equals condition.
Is there any way to make QuickSight work as we want with our current table design, or would we need to change the design?
For the latter, we have thought of keeping the three dimension ids as separate columns; that way we could add RLS for the A.a_id column and use parameters for the other ones. The problem would be for the reports that group by lower parts of the hierarchy: it is not clear how we could define the DISTKEY and SORTKEY so that the queries are properly optimized.
COMPOUND SORTKEY(hierarchy_id, entity_timestampt)
Are you aware that you are sorting on only the first eight bytes of hierarchy_id, and that the ability of the zone map to differentiate between blocks is based purely on those first eight bytes of the string?
I suspect you would have done a lot better to have had three separate columns.
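For illustration only - the column names and types below are assumptions, not taken from your table definition - a three-column layout might look like this:
create table entity (
    a_id              varchar(32) not null,  -- tenant / security dimension
    b_id              varchar(32) not null,
    c_id              varchar(32) not null,
    entity_timestampt timestamp   not null
    -- ... remaining columns ...
)
distkey (a_id)  -- distribution is a separate trade-off; a single-column distkey on a_id can skew if tenants vary a lot in size
compound sortkey (a_id, b_id, c_id, entity_timestampt);
Each sort key column then contributes its own leading bytes to the zone maps, so the three ids no longer compete for the first eight bytes of one string, and equality predicates on a_id, b_id and c_id do not need a LIKE on a prefix.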
Which would still take advantage of the DISTKEY & SORTKEY setup, even though we are filtering just for a partial path of the hierarchy.
I may be wrong - I would need to check - but I think if you use operators of any kind (such as functions, or LIKE, or even addition or subtraction) on a sortkey, the zone map does not operate and you read all blocks.
Also in your case it may be - I've not tried using it yet - that if you have AQUA enabled, because you're using LIKE, your entire query is being processed by AQUA. The performance consequences of this, positive and/or negative, are completely unknown to me.
Have you been using the system tables to verify your expectations of what is going on with your queries when it comes to zone map use?
the problem would be for the reports that group by lower parts of the hierarchy: it is not clear how we could define the DISTKEY and SORTKEY so that the queries are properly optimized.
You are now facing the fundamental nature of sorted column-store; the sorting you choose defines the queries you can issue and so also defines the queries you cannot issue.
You either alter your data design in some way, so that what you want becomes possible, or you duplicate the table in question, where each duplicate has a different sort order.
The first is an art, the second has obvious costs.
As an aside, although I've never used Quicksight, my experience with all SQL generators has been that they are completely oblivious to sorting and so the SQL they issue cannot be used on Big Data (as sorting is the method by which Big Data can be handled in a timely manner).
If you do not have Big Data, you'll be fine, but the question then is why are you using Redshift?
If you do have Big Data, the only solution I know of is to create a single aggregate table per dashboard, about 100k rows, and have the given dashboard use and only use that one table. The dashboard should normally simply read the entire table, which is fine, and then you avoid the nightmare SQL it normally will produce.
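A minimal sketch of that pattern, with hypothetical table and column names:
-- One small, pre-aggregated table per dashboard, rebuilt on a schedule;
-- the dashboard reads this table and nothing else.
create table dashboard_entity_daily as
select
    a_id,
    b_id,
    c_id,
    date_trunc('day', entity_timestampt) as day,
    count(*) as row_count
from entity
group by 1, 2, 3, 4;
At roughly 100k rows the distribution and sort keys of the aggregate barely matter, and the dashboard can simply read the whole table.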

How to get datasource query From PowerBI Datasets

We have hundreds of datasets where one or more tables have their source in Analysis Services Tabular (Import mode). On Analysis Services we have set up a log (extended events) where we can see how long each query took to process.
Now I wonder how to find which Power BI report/dataset certain queries come from, so that I can point the business users at the badly performing queries and get them changed to better ones. I can't find a way to do this.
Is there a way to do that? Can we list a dataset with queries?
No, you can't do that, or at least not exactly. One option, if you have Power BI Premium, is to use the Metrics app to find the reports with the highest query wait times. Another option is to use the Scanner API, which can give you the tables used in each model.

OLAP Approach for Backend redshift connection

We have a system where we do some aggregations in Redshift based on some conditions. We aggregate this data with complex joins, which usually take about 10-15 minutes to complete. We then show this aggregated data in Tableau to generate our reports.
Lately, we have been getting many change requests, such as adding a new dimension (which usually requires a join with a new table) or getting data with some more specific filter. To accommodate these requests we have to change our queries every time, for each of our subprocesses.
I went through OLAP a little bit. I just want to know whether it would be better in our use case, or whether there is a better way to design our system to handle such ad hoc requests without requiring a developer to change things every time.
Thanks for the suggestions in advance.
It would work; rather, it should work. Efficiency is the key here. There are a few things which you need to strictly monitor to make sure your system (Redshift + Tableau) remains up and running.
Prefer Extract over Live Connection (in Tableau)
A live connection would query the system every time someone changes a filter or refreshes the report. Since you said the dataset is large and the queries are complex, prefer creating an extract. This will make sure the data is available upfront whenever someone accesses your dashboard. Do not forget to schedule the extract refresh, otherwise the data will be stale forever.
Write efficient queries
OLAP systems are expected to query large datasets. Make sure you write efficient queries. It's always better to first get a small dataset and then join, rather than bringing everything into memory and then joining / using a where clause to filter the result.
A query shaped like (select foo from table1 where ...) a left join (select bar from table2 where ...) b on ... can be the key at times, because you only pull out the small, relevant data and then join.
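A slightly fuller sketch of that shape (the table and column names are purely illustrative):
-- Reduce each side to the relevant rows first, then join the small results.
select a.user_id, a.foo, b.bar
from (
    select user_id, foo
    from table1
    where event_date >= current_date - 90   -- hypothetical filter
) a
left join (
    select user_id, bar
    from table2
    where status = 'active'                 -- hypothetical filter
) b on a.user_id = b.user_id;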
Do not query infinite data.
Since this is analytical and not transactional data, put an upper bound on the data that Tableau will refresh. Historical data has its importance, but not all the way back to the inception of your product. Analysing the data for the past 3, 6 or 9 months can be the key, rather than querying the universal dataset.
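For example, the extract query can carry a simple rolling window (the table and column names here are assumed):
-- Only pull the most recent six months into the extract.
select *
from fact_events
where event_timestamp >= dateadd(month, -6, current_date);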
Create aggregates and let Tableau query that table, not the raw tables
Suppose you're analysing user traits. Rather than querying a raw table that captures 100 records per user per day, design a table which has just one (or two) entries per user per day and introduce a count column which tells you the number of times the event was triggered. By doing this, you'll be querying a sufficiently smaller dataset that is still logically equivalent to what you were doing earlier.
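A sketch of that kind of daily roll-up, with made-up names:
-- One row per user per event type per day, instead of every raw event.
create table agg_user_events_daily as
select
    user_id,
    event_type,
    date_trunc('day', event_timestamp) as event_day,
    count(*) as event_count  -- number of times the event was triggered that day
from raw_user_events
group by 1, 2, 3;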
As mentioned by Mr Prashant Momaya,
"While dealing with extracts, your storage requires (size)^2 of space if your dashboard refers to data of size size."
Be very cautious with whatever design you implement, and do not forget to consider the most important factor: scalability.
This is a typical problem and we tackled it by writing SQL generators in Python. If the definition of the metric is the same (like count(*)) but you have varying dimensions and filters you can declare it as JSON and write a generator that will produce the SQL. Example with pageviews:
{
  "metric": "unique pageviews",
  "definition": "count(distinct cookie_id)",
  "source": "public.pageviews",
  "tscol": "timestamp",
  "dimensions": [
    ["day"],
    ["day", "country"]
  ]
}
can be relatively easily translated into two scripts - this:
drop table metrics_daily.pageviews;
create table metrics_daily.pageviews as
select
date_trunc('day',"timestamp") as date
,count(distinct cookie_id) as "unique_pageviews"
from public.pageviews
group by 1;
and this:
drop table metrics_daily.pageviews_by_country;
create table metrics_daily.pageviews_by_country as
select
date_trunc('day',"timestamp") as date
,country
,count(distinct cookie_id) as "unique_pageviews"
from public.pageviews
group by 1,2;
The amount of complexity in a generator required to produce such SQL from such a config is quite low, but it increases exponentially as you need to add new joins etc. It's much better to keep your dimensions in encoded form and just use a single wide table as the aggregation source, or to produce views for every join you might need and use them as sources.

SSRS rows grouping - front end or back end

On hand is a requirement for a report that needs to perform a substring operation and grouping on the chopped strings in a column. For example, consider my over-simplified scenario:
Among others, I have a column called FileName, which may have values like this
NWSTMT201308201230_STMTA
NWSTMT201308201230_STMTB
NWSTMT201308201230_STMTC
etc.
The report I'm working on should do the grouping on the values before the _ sign.
Assuming the volume of data is large, where is the best place to do the substring and grouping - in the stored procedure, or by returning the raw data and doing all the work in SSRS? The expectation is to have good performance and maintainability.
As you mention, there are a few different possibilities. There's no correct answer for this, but certainly each method has advantages and disadvantages.
My take on the options:
On the SQL Server: as a computed column in a view (a rough sketch of this option appears after the list of options).
Pro: Easy to reuse if the query will be used by multiple reports or other queries.
Con: SQL is a very poor language for string manipulation.
On the SQL Server: Query embedded in the report, calculation still in query. Similar to 1, but now you lose the advantage of reuse.
Pro: report is very portable: changes can be tested against production data without disturbing current production reports.
Con: Same as 1, string manipulation in SQL is no fun. Less centralized, so possibly harder to maintain.
In the report, in formulas where required. Many disadvantages to this method, but one advantage:
Pro: It's easy to write.
Con: Maintenance is very difficult; finding all occurrences of a formula can be a pain. Limited to VBScript-like commands. The editor in the SSRS authoring environment is no fun, and lacks many basic code editing features.
In the report, in the centralized code for the report.
Pro: VB.NET syntax, global variables, easy maintenance, with centralized code per report.
Con: VB.NET Syntax (I greatly prefer C#.) Editor is no better than the formula windows. You'll probably still end up writing this in another window and cutting and pasting to its destination.
Custom .NET assembly: compiled as a .dll and called from the report.
Pro: Use any .NET language, full Visual Studio editor support, along with easy source control and centralization of code.
Con: More finicky to get set up, report deployment will require a .dll deployed to the SSRS Server.
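As a rough sketch of option 1 - the view and table names are invented, only FileName comes from the question - the computed column could look like this:
create view dbo.vw_StatementFiles
as
select
    f.FileName,
    case
        when charindex('_', f.FileName) > 0
            then left(f.FileName, charindex('_', f.FileName) - 1)
        else f.FileName
    end as FileGroup  -- everything before the underscore, e.g. NWSTMT201308201230
from dbo.StatementFiles f;
The report then groups on FileGroup, and the same view can be reused by other reports or queries.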
So my decision process for this is something like:
Is this just a one time, easy formula? Use method 3.
Is this cleanly expressed in SQL and only used in one report? Method 2.
Cleanly expressed in SQL and used in multiple reports or queries? Method 1.
Better expressed in Visual Basic than SQL? Method 4.
Significant development effort going into this with multiple developers? Method 5.
Too often I'll start following method 3, and then realize I've used the formula too many places and I should have centralized earlier. Also, our team is pretty familiar with SQL, so that pushes us towards the first two options more than it might at some shops.
I'd put performance concerns second unless you know that you have a problem. Putting this code in SQL can sometimes pay off, but if you aren't careful, you can end up calling things excessively on results that are ultimately filtered out.

crystal report & sub report

I am developing a payroll project in VB.NET with SQL Server, and I am using Seagate Crystal Reports for reporting. If I use more than 10 subreports in a single report, will it affect my project's efficiency or take more time?
Yes, it will take more time since you're in effect probably running 10 different queries and the reporting tool is probably having to link the results of all of those queries.
I've written reports with 3 or 4 subreports, but usually more is unnecessary. I would try to think of a workaround for that many subreports - usually there's a way. (For example, use a column as a toggle for showing/hiding or grouping data.)
Actually it is hard to say how your subreports affect performance. I've designed reports where using subreports makes the entire report run faster - sometimes it is not so easy to build an underlying query that is more efficient than many simpler queries for subreports.
One example is A-B-A-C type reports, where there are many one-to-many relations from the master table/query (A) to subtables/queries (B, C) AND users want to see all B and C data at once (not on demand). A single query would have A*B*C rows to process (and require nasty logic to show and hide sections); using subreports you deal with A*(B+C) total rows to process and display. For instance, with 100 master rows and 50 B rows and 20 C rows each, that is 100*50*20 = 100,000 rows versus 100*(50+20) = 7,000.
But when you use a subreport to display only some total value, then it is often more efficient to aggregate it already in the master view - that takes less time both on the server and while transmitting the data. Crystal Reports formatting time is usually negligible compared to query execution time.
Like always, optimum strategy depends on particular report needs.