SSRS rows grouping - front end or back end - tsql

I have a requirement for a report that needs to perform a substring operation and then group on the extracted strings in a column. For example, consider my over-simplified scenario:
Among others, I have a column called FileName, which may have values like this
NWSTMT201308201230_STMTA
NWSTMT201308201230_STMTB
NWSTMT201308201230_STMTC
etc.
The report I'm working on should do the grouping on the values before the _ sign.
Assuming the volume of data is large, where is the best place to do the substring and grouping - in the stored procedure, or should I return the raw data and do all the work in SSRS? The expectation is good performance and maintainability.

As you mention, there are a few different possibilities. There's no correct answer for this, but certainly each method has advantages and disadvantages.
My take on the options:
On the SQL server: as a computed column in a view (a sketch of this option follows this list).
Pro: Easy to reuse if the query will be used by multiple reports or other queries.
Con: Very poor language for string manipulation.
On the SQL Server: Query embedded in the report, calculation still in query. Similar to 1, but now you lose the advantage of reuse.
Pro: report is very portable: changes can be tested against production data without disturbing current production reports.
Con: Same as 1, string manipulation in SQL is no fun. Less centralized, so possibly harder to maintain.
In the report, in formulas where required. Many disadvantages to this method, but one advantage:
Pro: It's easy to write.
Con: Maintenance is very difficult; finding all occurrences of a formula can be a pain. Limited to VBScript-like commands. The editor in the SSRS authoring environment is no fun, and lacks many basic code editing features.
In the report, in the centralized code for the report.
Pro: VB.NET syntax, global variables, easy maintenance, with centralized code per report.
Con: VB.NET Syntax (I greatly prefer C#.) Editor is no better than the formula windows. You'll probably still end up writing this in another window and cutting and pasting to its destination.
Custom .NET assembly: compiled as a .dll, and called from the report.
Pro: Use any .NET language, full Visual Studio editor support, along with easy source control and centralization of code.
Con: More finicky to get set up, report deployment will require a .dll deployed to the SSRS Server.
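To make option 1 concrete, here is a minimal sketch of the computed-column approach, assuming a source table called dbo.Statements holding the FileName column from the question (the table and view names are invented for illustration):

CREATE VIEW dbo.StatementFiles
AS
SELECT
    FileName,
    -- everything before the first underscore; fall back to the whole name if there is none
    CASE WHEN CHARINDEX('_', FileName) > 0
         THEN LEFT(FileName, CHARINDEX('_', FileName) - 1)
         ELSE FileName
    END AS FileGroup
FROM dbo.Statements;
GO

-- the report query (or SSRS itself) can then simply group on the computed column
SELECT FileGroup, COUNT(*) AS FileCount
FROM dbo.StatementFiles
GROUP BY FileGroup;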
So my decision process for this is something like:
Is this just a one time, easy formula? Use method 3.
Is this cleanly expressed in SQL and only used in one report? Method 2.
Cleanly expressed in SQL and used in multiple reports or queries? Method 1.
Better expressed in Visual Basic than SQL? Method 4.
Significant development effort going into this with multiple developers? Method 5.
Too often I'll start following method 3, and then realize I've used the formula too many places and I should have centralized earlier. Also, our team is pretty familiar with SQL, so that pushes towards the first two options more than some shops might be.
I'd put performance concerns second unless you know that you have a problem. Putting this code in SQL can sometimes pay off, but if you aren't careful, you can end up calling things excessively on results that are ultimately filtered out.

Related

Is it good to do string concatenation in the SSRS report, or is it better to do it in the SQL query?

I am working on an SSRS report and I have some column values that need to be concatenated when displayed in the report. Is it advisable to do that on the report side, or should I do it in the SQL query and bind that value directly to the report?
I have 4 columns that I have to concatenate into a single column when binding to the report.
There are three different ways to do that:
Do it in the SQL query, so the dataset returns the combined column.
Create an expression when binding the dataset to the tablix.
Create a calculated field in the dataset and bind that to my tablix.
Of the above three, which one is advisable for better performance?
This question is very broad but let me put it this way.
If you put business rules in the database then they can be consistently reused by many things beyond SSRS, for example Excel, Power BI, and data extracts.
The downside is that it is often more technically difficult to apply rules consistently at a lower level like this. In other words, you need a SQL developer to do this properly, whereas if you did the calculation in SSRS you would only need an SSRS developer.
So if you have a team full of SSRS developers, then it's going to be easier to create and maintain rules in SSRS, but the downside is these rules can't be reused by anything else.
Short answer: do it in a view in the database unless this is going to be difficult to maintain because your team doesn't have any SQL skills.
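As a rough sketch of that view approach (the table and column names are invented for illustration; CONCAT needs SQL Server 2012 or later, otherwise use + with ISNULL):

CREATE VIEW dbo.ReportCustomers
AS
SELECT
    CustomerID,
    -- combine the four source columns once, here, so SSRS, Excel and Power BI all see the same value
    CONCAT(Title, ' ', FirstName, ' ', MiddleName, ' ', LastName) AS DisplayName
FROM dbo.Customers;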

Why build an SSAS cube?

I was just searching for the best explanations and reasons to build an OLAP cube from relational data. Is it all about performance and query optimization?
It would be great if you could give links or point out the best explanations and reasons for building a cube, since we can do everything from the relational database that we can do from the cube, and the cube is just faster at showing results. Are there any other explanations or reasons?
There are many reasons why you should use a cube for analytical processing.
Speed. OLAP warehouses are read-only infrastructures, often providing queries 10 times faster than their OLTP counterparts. See the wiki.
Multiple data integration. With a cube you can easily use multiple data sources and, with many automated tasks (especially when you use SSIS), do minimal work to integrate them into a single analysis system. See the ETL process.
Minimum code. That is, you don't need to write queries. Even though you can write MDX - the language of cubes in SSAS - BI Studio does most of the hard work for you. On a project I am working on, at first we used SSRS to provide reports for the client. The queries were long and hard to write and took days to implement. Their SSAS equivalent reports took us half an hour to make, writing only a few simple queries to transform some data.
A cube provides reports and drill up-down-through, without the need to write additional queries. The end user can traverse the dimension automatically, as the aggregations are already stored in the warehouse. This helps as the users of the cube need only traverse its dimensions to produce their own reports without the need to write queries.
It is part of Business Intelligence. When you make a cube, it can be fed to many new technologies and help in the implementation of BI solutions.
I hope this helps.
If you want a top level view, use OLAP. Say you have millions of rows detailing product sales and you want to know your monthly sales totals.
If you want bottom-level detail, use OLTP (e.g. SQL). Say you have millions of rows detailing product sales and want to examine one store's sales on one particular day to find potential fraud.
OLAP is good for big numbers. You wouldn't use it to examine string values, really...
It's a bit like asking why use Java/C++ when we can do everything with assembly language ;-) Building a cube (apart from performance) gives you the MDX language; this language has higher-level concepts than SQL and is better suited to analytic tasks. Perhaps this question gives more info.
My 2 centavos.

When are TSQL Cursors the best or only option?

I'm having this argument about using Cursors in TSQL recently...
First of all, I'm not a cheerleader in the debate. But every time someone says cursor, there's always some knucklehead (or 50) who pounce with the obligatory 'cursors are evil' mantra. I know SQL-Server was optimized for set-based operations, and maybe cursors truly ARE evil incarnate, but if I wanted to put some objective thought behind that...
Here's where my mind is going:
Is the only difference between cursors and set operations one of performance?
Edit: There's been a good case made for it not being simply a matter of performance -- such as running a single batch over-and-over for a list of id's, or alternatively, executing actual SQL text stored in a table field row-by-row.
Follow-up: do cursors always perform worse?
EDIT: @Martin shows a good case where cursors outperform set-based operations fairly dramatically. I suspect that this wouldn't be the kind of thing you'd do too often (before you resorted to some kind of OLAP / Data Warehouse kind of solution), but nonetheless, it seems like a case where you really couldn't live without a cursor.
Reference to TPC benchmarks suggesting cursors may be more competitive than folks generally believe.
Reference to memory-usage optimizations for cursors since SQL Server 2005.
Are there any problems you can think of, that cursors are better suited to solve than set-based operations?
EDIT: Set-based operations literally cannot execute stored procedures, etc. (see the edit for item 1 above).
EDIT: In some cases - such as the running-totals / triangular-join example - set-based operations can be dramatically slower than row-by-row processing when aggregating over large data sets, with the work growing quadratically rather than linearly.
Article from MSDN explaining their perspective of the most common problems people resort to cursors for (and some explanation of set-based techniques that would work better).
Microsoft says (vaguely) in the 2008 Transact-SQL Reference on MSDN: "...there are times when the results are best processed one row at a time", but they don't give any examples as to what cases they're referring to.
Mostly, I'm of a mind to convert cursors to set-based operations in my old code if/as I do any significant upgrades to various applications, as long as there's something to be gained from it. (I tend toward laziness over purity a lot of the time -- i.e., if it ain't broke, don't fix it.)
To answer your question directly:
I have yet to encounter a situation where set operations could not do what might otherwise be done with cursors. However, there are situations where using cursors to break a large set problem down into more manageable chunks proves a better solution for purposes of code maintainability, logging, transaction control, and the like. But I doubt there are any hard-and-fast rules to tell you what types of requirements would lead to one solution or the other -- individual databases and needs are simply far too variant.
That said, I fully concur with your "if it ain't broke, don't fix it" approach. There is little to be gained by refactoring procedural code to set operations for a procedure that is working just fine. However, it is a good rule of thumb to seek first for a set-based solution and only drop into procedural code when you must. Gut feel? If you're using cursors more than 20% of the time, you're doing something wrong.
And for what I really want to say:
When I interview programmers, I always throw them a couple of moderately complex SQL questions and ask them to explain how they'd solve them. These are problems that I know can be solved with set operations, and I'm specifically looking for candidates who are able to solve them without procedural approaches (i.e., cursors).
This is not because I believe there is anything inherently good or more performant in either approach -- different situations yield different results. Rather it's because, in my experience, programmers either get the concept of set-based operations or they do not. If they do not, they will spend too much time developing complex procedural solutions for problems that can be solved far more quickly and simply with set-based operations.
Conversely, a programmer who gets set-based operations almost never has problems implementing a procedural solution when, indeed, it's absolutely necessary.
Running totals are the classic case where, as the number of rows gets larger, cursors can outperform set-based operations: despite the higher fixed cost of the cursor, the work required grows linearly, whereas it grows quadratically with the set-based "triangular join" approach.
Itzik Ben-Gan does some comparisons here.
Denali (SQL Server 2012) has more complete support for the OVER clause, however, which should make this use of cursors redundant.
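For reference, the windowed running total that replaces both the cursor and the triangular join (available from SQL Server 2012/Denali onwards; the table and column names here are illustrative only) looks something like this:

SELECT
    AccountID,
    TransactionDate,
    Amount,
    -- running total per account, no cursor and no triangular join
    SUM(Amount) OVER (PARTITION BY AccountID
                      ORDER BY TransactionDate
                      ROWS UNBOUNDED PRECEDING) AS RunningTotal
FROM dbo.Transactions;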
Since I've seen people manage to re-implement cursors (in all their varied forms) using other TSQL constructs (usually involving at least one while loop), there's nothing a cursor can achieve that can't be done using other constructs.
That's not to say the re-implementations aren't just as inefficient as the cursors that were avoided by not including the word "cursor" in the solution. Some people seem to hate the word itself, not the mechanics.
One place I've successfully argued to keep cursors was for a data transfer/transform between two different databases (we were dealing with clients here). Whilst we could have implemented this transfer in a set based manner (indeed, we previously had), there was problematic data that could cause issues for a few clients. In a set based solution, we had either to:
Continue the transfer, excluding failed client data at each table, leaving those clients partially transferred, or,
abort the entire batch
Whereas, by making the unit of transfer the individual client (using a cursor to select each client), we could make each client's transfer between the systems either work fully or be entirely rolled back (i.e. place each transfer in its own transaction).
I can't think of any situations where I've wanted to use a cursor below the "top level" of such transfers though (e.g. selecting which client to transfer next)
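A skeleton of that per-client pattern, with the transfer logic stubbed out and every object name invented for illustration, might look like this:

DECLARE @ClientID int;

DECLARE client_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT ClientID FROM dbo.Clients;

OPEN client_cursor;
FETCH NEXT FROM client_cursor INTO @ClientID;

WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        EXEC dbo.TransferClient @ClientID = @ClientID;  -- hypothetical procedure doing the actual transfer/transform
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- only this client is rolled back; the rest of the batch carries on
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    END CATCH;

    FETCH NEXT FROM client_cursor INTO @ClientID;
END;

CLOSE client_cursor;
DEALLOCATE client_cursor;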
Often when you build dynamic SQL, you have to use cursors. Imagine a script that searches through all tables in the database for the same value in different fields. The best solution will be a cursor. The question where the problem was raised is here: How to use EXEC or sp_executeSQL without looping in this case? I will be really impressed if anyone can solve that better without a cursor.
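A rough sketch of that kind of script, searching every character column for one value, with the details (data types covered, schema filtering) deliberately simplified:

DECLARE @Search nvarchar(200) = N'NWSTMT201308201230_STMTA';
DECLARE @Table nvarchar(300), @Column sysname, @Sql nvarchar(max);

DECLARE col_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME), COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE DATA_TYPE IN ('varchar', 'nvarchar', 'char', 'nchar');

OPEN col_cursor;
FETCH NEXT FROM col_cursor INTO @Table, @Column;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- the object names have to be spliced into the statement text, hence dynamic SQL
    SET @Sql = N'SELECT ''' + @Table + N'.' + @Column + N''' AS FoundIn, * FROM ' + @Table
             + N' WHERE ' + QUOTENAME(@Column) + N' = @Search;';
    EXEC sp_executesql @Sql, N'@Search nvarchar(200)', @Search = @Search;

    FETCH NEXT FROM col_cursor INTO @Table, @Column;
END;

CLOSE col_cursor;
DEALLOCATE col_cursor;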

Parallelize TSQL CLR Procedure

I'm trying to figure out how I can parallelize some procedural code to create records in a table.
Here's the situation (sorry I can't provide much in the way of actual code):
I have to predict when a vehicle service will be needed, based upon the previous service date, the current mileage, the planned daily mileage and the difference in mileage between each service.
All in all - it's very procedural. For each vehicle I need to take into account its history, its current servicing state, the daily mileage (which can change based on ranges defined in the mileage plan), and the sequence of servicing.
Currently I'm calculating all of this in PHP, and it takes about 20 seconds for 100 vehicles. Since this may in future be expanded to several thousand, 20 seconds is far too long.
So I decided to try and do it in a CLR stored procedure. At first I thought I'd try multithreading it; however, I quickly found out that's not easy to do in the TSQL host. I was recommended to let TSQL work out the parallelization itself. Yet I have no idea how. If it weren't for the fact that the code needs to create records, I could define it as a function and do:
SELECT dbo.PredictServices([FleetID]) FROM Vehicles
And TSQL should figure out it can parallelize that, but I know of no alternative for procedures.
Is there anything I can do to parallelize this?
The recommendation you received is a correct one. You simply don't have the .NET Framework facilities for parallelism available in your CLR stored procedure. Also please keep in mind that the niche for CLR stored procedures is rather narrow and they can adversely impact SQL Server's performance and scalability.
If I understand the task correctly, you need to compute a function PredictServices for some records and store the results back to the database. In that case CLR stored procedures could be an option, provided PredictServices is just data access plus a straightforward transformation of data. Best practice is to create a WF (Windows Workflow Foundation) service to perform the computations and call it from PHP. In a workflow service you can implement any solution, including one involving parallelism.
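For illustration only, here is a sketch of the set-based shape the original poster was reaching for, assuming the calculation could be exposed as a table-valued function (the function name PredictServicesTVF and the target table are hypothetical). Whether SQL Server actually chooses a parallel plan still depends on the function and the data, but the rows are at least created in a single statement:

INSERT INTO dbo.ServicePredictions (VehicleID, PredictedServiceDate)
SELECT v.VehicleID, p.PredictedServiceDate
FROM dbo.Vehicles AS v
CROSS APPLY dbo.PredictServicesTVF(v.FleetID) AS p;  -- hypothetical table-valued function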

Why are “set-based approaches” better than “procedural approaches”?

I am very eager to know the real reasons, though I have gained some knowledge from googling.
Thanks in advance.
Because SQL is a really poor language for writing procedural code, and because the SQL engine, storage, and optimizer are designed to make it efficient to assemble and join sets of records.
(Note that this isn't just applicable to SQL Server, but I'll leave your tags as they are)
Because, in general, the hundreds of man-years of development time that have gone into the database engine and optimizer, and the fact that it has access to real-time statistics about the data, have resulted in it being better than the user in working out the best way to process the data, for a given request.
Therefore, by saying what we want to achieve (with a set-based approach) and letting it decide how to do it, we generally achieve better results than by spelling out exactly how to process the data, line by line.
For example, suppose we have a simple inner join from table A to table B. At design time, we generally don't know 'which way round' will be most efficient to process: keep a list of all the values on the A side, and go through B matching them, or vice versa. But the query optimizer will know at runtime both the numbers of rows in the tables, and also the most recent statistics may provide more information about the values themselves. So this decision is obviously better made at runtime, by the optimizer.
Finally, note that I have put a number of 'generally's in this post - there will always be times when we know better than the optimizer will, and for such times we can provide hints (NOLOCK etc).
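A small, purely illustrative example of such a hint (hypothetical tables), forcing a hash join when you are confident you know better than the optimizer:

SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
INNER HASH JOIN dbo.Customers AS c  -- join hint: force a hash join instead of letting the optimizer choose
    ON c.CustomerID = o.CustomerID;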
Set-based approaches are declarative, so you don't describe the way the work will be done, only what you want the result to look like. The server can decide between several strategies for how to comply with your request, and hopefully choose one that is efficient.
If you write procedural code, that code will at best be less than optimal in some situations.
Because using a set-based approach to SQL development conforms to the design of the data model. SQL is a very set-based language, used to build sets, subsets, unions, etc, from data. Keeping that in mind while developing in TSQL will generally lead to more natural algorithms. TSQL makes many procedural commands available that don't exist in plain SQL, but don't let that switch you to a procedural methodology.
This makes me think of one of my favorite quotes from Rob Pike in Notes on Programming C:
Data dominates. If you have chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
SQL databases and the way we query them are largely set-based. Thus, so should our algorithms be.
From an even more tangible standpoint, SQL servers are optimized with set-based approaches in mind. Indexing, storage systems, query optimizers, and other optimizations made by various SQL database implementations will do a much better job if you simply tell them what data you need, through a set-based approach, rather than dictating how you want to get it procedurally. Let the SQL engine worry about the best way to get you the data; you just worry about telling it what data you want.
As everyone has explained, let the SQL engine help you; believe me, it is very smart.
If you are not used to writing set-based solutions and are used to developing procedural code, you will have to spend some time before you can write well-formed set-based solutions. This is a barrier for most people. A tip if you wish to start coding set-based solutions: stop thinking about what you can do with rows, start thinking about what you can do with columns, and practice functional languages.
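As a tiny illustration of that shift in thinking (table and column names invented), here is the same price increase written row by row and then as a single set-based statement:

-- procedural thinking: visit one row at a time
DECLARE @ID int = (SELECT MIN(ProductID) FROM dbo.Products);
WHILE @ID IS NOT NULL
BEGIN
    UPDATE dbo.Products
    SET Price = Price * 1.10
    WHERE ProductID = @ID AND Discontinued = 0;

    SET @ID = (SELECT MIN(ProductID) FROM dbo.Products WHERE ProductID > @ID);
END;

-- set-based thinking: describe the result and let the optimizer decide how to get there
UPDATE dbo.Products
SET Price = Price * 1.10
WHERE Discontinued = 0;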