SSRS and string parsing to produce a report - tsql

A friend told me that his new employer needs an SSRS report that parses a column containing n consecutive occurrences of
1) the literal "Date:"
2) An optional separator character
3) followed by a date in MM-DD-YY format (leading zeros are optional)
4) a separator space
5) A single "WORD" of data that is associated with the date. This word will have no embedded spaces.
I'll populate a Sample table with data that meets these criteria to give you a clear example:
CREATE TABLE [dbo].[Sample](
[RowNumber] [int] NOT NULL,
[DataMess] [varchar](max) NOT NULL
) ON [PRIMARY]
INSERT [dbo].[Sample] ([RowNumber], [DataMess]) VALUES (1, N'Date:12-21-13 12/13/14/15 Date:4-2-11 39/12/134/14 Date:4-1-13 19/45/5/12')
INSERT [dbo].[Sample] ([RowNumber], [DataMess]) VALUES (2, N'Date:7-21-13 12/13/14/15 Date:8-21-12 39/12/34/14 Date:12-1-13 19/4/65/12')
INSERT [dbo].[Sample] ([RowNumber], [DataMess]) VALUES (3, N'Date:3-21-13 12/11233/14/15 Date:4-28-13 39/12/34/14 Date:9-19-13 19/45/65/12')
For the first record, "12/13/14/15" is considered to be the "Word" of data that is associated with the Date 12-21-13.
He was asked to produce the following report in SSRS:
Row Number DataMess
1 Date: 12-21-13 12/13/14/15
Date: 4-1-13 19/45/5/12
Date: 4-2-11 39/12/134/14
2 Date:12-1-13 19/4/65/12
Date:7-21-13 12/13/14/15
Date:8-21-12 39/12/34/14
3 Date:9-19-13 19/45/65/12
Date:4-28-13 39/12/34/14
Date:3-21-13 12/11233/14/15
Note that the Dates for each source row number are sorted in descending order, along with the associated word of data.
I don't know SSRS, but my reaction was to recommend that he not even attempt the task and instead tell his employer that nobody should really be doing all of that ugly string parsing in T-SQL. Instead, this repeating "Date: DATA" should be stored in individual child records that are associated with a parent Row record. I believe that the code would be ugly, inefficient, brittle and hard to maintain. What are your thoughts?
Assuming that management/client is always right, or conceding that "ideally" this is correct but "for now" we need SQL that produces the report above, how would one do this? The expectation was that this can be produced quickly (in half a day, for example).

You are of course correct, it's certainly far from the best way of storing the data. Any way of extracting the data for this report will be much more complex than it would be if it was stored differently.
However, based on the data it still wouldn't be too tough to actually generate the report. Because of the table structure, generating the dataset for the report would be the hardest part.
So to generate the dataset, you need to split the data in DataMess to get one row per Date/Word, and be able to extract the date from that split data to be able to order by date as required.
Take your pick on how you want to split the data:
The question "Split function equivalent in T-SQL?" has many options, as does this link: "Best Split Function".
Here's a SQL Fiddle with one of the functions in action.
Once you've split the data, use the appropriate function to extract the date portion, i.e. between the colon and the space before the word data, then CAST this as a date.
Once you've actually got the dataset, it's the most trivial of reports - just add a row group based on RowNumber, add the split Date/Word data as a detail field and you're done. Make sure the dataset is ordered by the extracted date field, even though you don't actually display this in the report.
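To make the split, extract and sort steps concrete, here is a minimal language-neutral sketch in Python rather than T-SQL. It is not one of the linked split functions. The regular expression and the name `split_entries` are illustrative assumptions, and the date format is assumed to be M-D-YY as in the sample rows.

```python
import re
from datetime import datetime

# "Date:", an optional one-character separator, an M-D-YY date,
# whitespace, then a single word with no embedded spaces.
ENTRY = re.compile(r"Date:\D?(\d{1,2}-\d{1,2}-\d{2})\s+(\S+)")

def split_entries(data_mess):
    """Return (date, word) pairs from one DataMess value, newest date first."""
    pairs = [
        (datetime.strptime(date_text, "%m-%d-%y").date(), word)
        for date_text, word in ENTRY.findall(data_mess)
    ]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)

# First sample row from the question.
row1 = "Date:12-21-13 12/13/14/15 Date:4-2-11 39/12/134/14 Date:4-1-13 19/45/5/12"
for entry_date, word in split_entries(row1):
    print(f"Date:{entry_date.month}-{entry_date.day}-{entry_date:%y} {word}")
```

The extracted date is used only as the sort key; the displayed text is rebuilt from the date and the word, which mirrors ordering the dataset by a column that the report never shows.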
As an interim measure I would certainly expect this to be doable in half a day of work or so. So for just this report it's not too bad, but for anything else you'll probably have trouble.
For a few rows it will likely run fine, but on any non-trivial sized dataset performance will be suboptimal.

Thank you. Here's what I did for the remaining part to get the dates sorted in DESC order.
SELECT
RowNumber
,'Date: ' + ss.Item AS Data
--,cast(substring(ss.Item, 1, charindex(' ' , ss.Item) ) AS date)
FROM
Sample s
CROSS apply dbo.SplitStrings_XML(s.DataMess,
N'Date:') ss
WHERE
Item IS NOT NULL
ORDER BY
rownumber,
cast(substring(ss.Item, 1, charindex(' ' , ss.Item) ) AS date) desc
If the data fails to hold up to this expected format and we encounter a date that is not valid, then the whole report blows up.
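One way to keep a single bad value from failing the whole report is to make the date conversion return "no value" instead of raising an error, and then filter or flag those rows. The Python sketch below only illustrates that idea; the function name and the sample items are hypothetical. In T-SQL the analogous tools would be TRY_CAST or TRY_CONVERT, which return NULL on a failed conversion and are available from SQL Server 2012 onward.

```python
from datetime import datetime

def parse_entry_date(item):
    """Return the leading M-D-YY date of a split item, or None if it is not a valid date."""
    date_text = item.strip().split(" ", 1)[0]
    try:
        return datetime.strptime(date_text, "%m-%d-%y").date()
    except ValueError:
        return None

# Hypothetical split items; the second one carries an impossible date.
items = ["12-21-13 12/13/14/15", "13-45-13 bad/data", "4-2-11 39/12/134/14"]
valid = [item for item in items if parse_entry_date(item) is not None]
rejected = [item for item in items if parse_entry_date(item) is None]
```

Rows in `rejected` could be listed separately in the report rather than silently dropped, so that bad source data stays visible.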

Related

SUMX is giving same value for all the rows

I have a fact table with columns such as Quantity, Unit Price, etc. I am trying to calculate the revenue with the SUMX formula, but I am getting the same value for all the records. Because of this I am also getting a dependency error in another column. Here is the code:
SUMX(
'''Sales Details$''',
'''Sales Details$'''[Quantity]*'''Sales Details$'''[Unit Price]
)
This table has been imported from SSMS as it is, into the tabular model analysis service in VS2019.
I wish to understand a few things here:
1) Why do we have to put the table name inside 3 quotes? The DAX bar does not accept the table name unless it is wrapped in 3 quotes.
2) SUMX shouldn't evaluate to the same value for all the records, but it does here for an unknown reason.
3) If I replace [Unit Price] with [Unit Cost] in the code above, I get a dependency error in the new column. As far as I know, I am not using a CALCULATE function, which could generate a circular dependency, and SUMX doesn't put a filter on columns ([Quantity] here).
1) I think it is because the table name has spaces. When a table name contains spaces or other disallowed characters, it must be wrapped in a pair of single quotes: ' '
2) If I'm not wrong (I'm quite new to DAX too), SUMX is like SUMPRODUCT in Excel. It computes unit price * quantity per row and then sums up all the rows, so the result no longer depends on the current row. If you want to calculate the amount per row, just do price * quantity, without SUMX.
3) I don't know, sorry.
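The difference between iterating the whole table and computing a per-row product can be sketched outside DAX. The Python below is only an analogy under stated assumptions: the `sumx` helper and the sample rows are hypothetical and not part of any DAX API.

```python
# Hypothetical rows standing in for the Sales Details table.
rows = [
    {"Quantity": 2, "Unit Price": 10.0},
    {"Quantity": 1, "Unit Price": 4.5},
    {"Quantity": 3, "Unit Price": 2.0},
]

def sumx(table, expression):
    """Evaluate expression once per row of table and add up the results."""
    return sum(expression(row) for row in table)

# Iterating the whole table for every row yields one grand total, repeated.
total_in_every_row = [
    sumx(rows, lambda r: r["Quantity"] * r["Unit Price"]) for _ in rows
]

# The per-row revenue is simply the product for that row.
revenue_per_row = [row["Quantity"] * row["Unit Price"] for row in rows]
```

Here `total_in_every_row` holds the same number three times, which matches the symptom in the question, while `revenue_per_row` differs per row.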

SSRS Grouping Summary - with Max not working

This is the data that comes back from the database
Data Sample for one season (the report returns values for two):
What you can see is groupings, by Season, Theater then Performance number and lastly we have the revenue and ticket columns.
The SSRS report has three levels of groupings: Pkg (another ID that groups the levels below), venue -- the venue column, and perf_desc -- the description column linked to the perf_no.
Looks like this --
What I need to do is take the revenue column (a unique value) for each Performance and return it in a separate column -- so I use this formula.
sum(Max(Fields!perf_tix.Value, "perf_desc"))
This works great, gives me the total unique value for each performance -- and sums them up by the pkg level.
The catch is when I need to pull the data out by season.
I created a separate column that looks like this
It's yellow because it's invisible and is referenced elsewhere. The expression says that if the Season value equals the parameter (the passed season value), then pull each of the tix values and sum them up. This also works great on the lower line -- the line where the grouping exists for pkg -- light blue in my case.
=iif(Fields!season.Value = Parameters!season.Value, Sum(Max(Fields!perf_tix.Value, "perf_desc")), 0)
However, on the line above -- the parent/header line -- it's giving me the sum of the two seasons' values, basically adding it all up. This is not what I want, and why is it doing this? The season value is not equal to the passed parameter for the second season, so why is it being added to the grouped value?
How do I fix this??
Since your aggregate function is inside your IIF function, only the first record in your dataset is being evaluated. If the first one matches the parameter, all records would be included.
This might work, with the IIF moved inside the aggregates so that the condition is checked for every record:
=Sum(Max(IIF(Fields!season.Value = Parameters!season.Value, Fields!perf_tix.Value, 0), "perf_desc"))
It might be better if your report was also grouping on the Venue, otherwise your count may include all values.
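The behaviour described in this answer (a condition placed outside the aggregate is checked once, a condition placed inside it is checked per record) can be illustrated with a small Python sketch. This is an analogy, not SSRS code: the rows, the season values and the helper names are all hypothetical, and the "first record" rule simply follows the explanation given above.

```python
# Hypothetical detail rows; perf_tix repeats on every row of a performance.
rows = [
    {"season": 2013, "perf_desc": "A", "perf_tix": 100},
    {"season": 2013, "perf_desc": "A", "perf_tix": 100},
    {"season": 2013, "perf_desc": "B", "perf_tix": 50},
    {"season": 2014, "perf_desc": "C", "perf_tix": 70},
]

def sum_of_max_per_perf(records, value):
    """Sum, over performances, the maximum of value(record) within each one."""
    best = {}
    for record in records:
        key = record["perf_desc"]
        best[key] = max(best.get(key, 0), value(record))
    return sum(best.values())

def condition_outside(records, season):
    # Condition wraps the aggregate: it is tested against one record only,
    # and the aggregate then runs over every record in scope.
    if records[0]["season"] == season:
        return sum_of_max_per_perf(records, lambda r: r["perf_tix"])
    return 0

def condition_inside(records, season):
    # Condition sits inside the aggregate: non-matching records contribute 0.
    return sum_of_max_per_perf(
        records, lambda r: r["perf_tix"] if r["season"] == season else 0
    )
```

With these rows, `condition_outside(rows, 2013)` adds both seasons together, while `condition_inside(rows, 2013)` counts only the 2013 performances.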

How to get all missing days between two dates

I will try to explain the problem on an abstract level first:
I have X amount of data as input, which is always going to have a DATE field. Previously, the dates that came as input were (after some processing) put in a table as output. Now I am asked to output both the input dates and every date between the minimum date received and one year from that moment. If there was originally no input for some day between these two dates, all fields must come with 0, or an equivalent.
Example: I have two inputs, one with '18/03/2017' and the other with '18/03/2018'. I now need to create output data for all the missing dates between '18/03/2017' and '18/04/2017'. So, output '19/03/2017' with every field set to 0, and the same for the 20th, the 21st, and so on.
I know how to do this programmatically, but not in PowerCenter. I've been told to do the following (which I have done, but I would like to know of a better method):
Get the minimum date, day0. Then, with an aggregator, create 365 fields, each holding day0+1, day0+2, and so on, to create an artificial year.
After that we do several transformations, like sorting the dates and a union between them, to get the data ready for a joiner. The idea of the joiner is to do a Full Outer Join between the original data and the data that is going to have all fields set to 0, which we got from the previous aggregator.
Then a router picks, with one of its groups, the data that had actual dates (and fields without nulls), and with another group the rows where all fields are null; those fields are then given a 0 and finally written to a table.
I am wondering how this can be achieved by, for starters, removing the need to add 365 days to a date. If I were to do this same process for 10 years instead of one, the task gets ridiculous really quickly.
I was wondering about an XOR type of operation, or some other function that would cut the number of steps needed for what I (maybe wrongly) feel is a simple task. Currently I need 5 steps just to know which dates are missing between two dates, a minimum and one year from that point.
I have tried to be as clear as possible, but if I failed at any point please let me know!
I'm not sure what the aggregator is supposed to do?
The same with the 'full outer' join? A normal join on a constant port is fine :)
Can you calculate the needed number of 'duplicates' before the 'joiner'? In that case a lookup configured to return 'all rows' and a less-than-or-equal predicate can help make the mapping much more readable.
In any case you will need a helper table (or file) with a sequence of numbers between 1 and the number of potential duplicates (or more).
I use our time dimension in the warehouse, which has one row per day from 1753-01-01 through the next 200000 days, and a primary integer column with values from 1 and up ...
You've identified that you know how to do this programmatically, and to be fair this problem is more suited to that sort of solution, but that doesn't exclude PowerCenter by any means. Just feed the 2 dates into a Java transformation and apply some code that produces all dates between them, outputting a record for each. The Java transformation is ideal for record generation.
OK, so you could override your source qualifier to achieve this in the selection query itself (I am giving an Oracle-based example as it's what I'm used to, and I'm assuming your input data comes from a table). I looked up the CONNECT BY syntax here: "SQL to generate a list of numbers from 1 to 100".
SELECT (m.min_date + levquery.n - 1) AS Port1 FROM (SELECT MIN(DATEFIELD) AS min_date FROM tablea) m, (SELECT LEVEL n FROM DUAL CONNECT BY LEVEL <= 365) levquery
(Check if the query works for you - haven't access to pc to test it at the minute)
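The record-generation idea discussed in these answers (start from the minimum date, emit one row per day, default the missing days to 0) can be sketched in a few lines. This is a plain Python illustration, not PowerCenter or Java transformation code. The function name, the single-value records and the 5-day window are assumptions chosen to keep the example short.

```python
from datetime import date, timedelta

def fill_missing_days(records, days):
    """Return one (day, value) row per day for `days` days from the earliest input date.

    `records` maps a date to its value; days with no input get 0.
    Inputs that fall outside the generated window are not emitted.
    """
    day0 = min(records)
    rows = []
    for offset in range(days):
        day = day0 + timedelta(days=offset)
        rows.append((day, records.get(day, 0)))
    return rows

# Hypothetical input: two days with data and a gap in between.
inputs = {date(2017, 3, 18): 7, date(2017, 3, 21): 3}
filled = fill_missing_days(inputs, days=5)
```

Changing `days` to 365, or to ten years' worth of days, does not change the number of steps, which is the property the question is after.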

How to extract information meeting a specific criterion from a table?

I have a table with 6 columns and 140,000 rows, and I can't figure out how to extract specific information from the table. For instance, when I try to extract all the accidents that happened on a specific date, either it tells me that the row '12/05/2015' does not exist, or it doesn't let me set 'Date' as a Row Name because the dates repeat (more than one accident happens in a day), giving me the error "Duplicate row name: '01/01/2015'".
How can I pick a date and extract all of the data that corresponds to it?
P.S. Below you can see two photos, one of the table and one of the errors I get when trying to set date as a row to make everything clearer.
If I understand your problem correctly, you want to extract from the table the rows that contain Date1. If so, try this:
new_table = table(table(:,1)==Date1,:);
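The underlying idea is to filter rows on the date column rather than use the date as a unique row name. Here is a language-neutral sketch of that idea in Python, not MATLAB; the column names and records are hypothetical.

```python
from collections import defaultdict

# Hypothetical records; several accidents can share one date.
accidents = [
    {"Date": "01/01/2015", "Location": "A"},
    {"Date": "12/05/2015", "Location": "B"},
    {"Date": "12/05/2015", "Location": "C"},
]

def rows_on(records, wanted_date):
    """Return every record whose Date equals wanted_date; duplicates are expected."""
    return [record for record in records if record["Date"] == wanted_date]

# Optional: build an index once when many different dates will be looked up.
by_date = defaultdict(list)
for record in accidents:
    by_date[record["Date"]].append(record)
```

Because the date is treated as an ordinary column and compared row by row, repeated dates are not a problem.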

SSRS 2008 - Multiple Groupings For Date Range

A record in a table contains a range of valid dates, say:
tbl1.start_date and tbl1.end_date. So, to ensure I get all records that are valid for a specific date range, the selection logic is: <...> WHERE end_date >= #dtFrom AND start_date < #dtTo (the #dtTo parameter used in the SQL statement is actually the calculated next day of the #prmDt_To parameter used in the report).
Now in a report I need to count the number of records for each day within the specified date range and include the days, if any, for which there were no valid records. Thus a retrieved record may be counted in several different days. I can do it relatively easily with a recursive CTE within the data set, but my rule of thumb is to avoid unnecessary load on the SQL database and instead return just the necessary raw data and let the Report engine handle groupings. So is there a means to do this within SSRS?
Thank you,
Sergey
You might be able to do something in SSRS with custom code, but I recommend against it. The place to do this is in the dataset. SSRS is not designed to fill in groups that don't exist in the dataset. That sounds like what you are trying to do: SSRS would need to create the groups for each date whether or not that date is in the dataset.
If you don't have a number or date table in your database, I would just create a recursive CTE with a record for every date in the range that you are interested in, as you mention. Then outer join this to your table and use COUNT(tbl1.start_date) to count the valid records for each day. This shouldn't be too painful a query for SQL Server.
If you really need to avoid the CTE, then I would create a date or number table to use to generate the dates in your range.
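The logic of joining a generated calendar to the date ranges and counting matches per day can be sketched as follows. This is a language-neutral Python illustration rather than T-SQL or SSRS code. It assumes that start and end dates are whole days and both inclusive, and all names and sample values are hypothetical.

```python
from datetime import date, timedelta

def count_valid_per_day(records, dt_from, dt_to):
    """Count, for each day in [dt_from, dt_to), the records whose range covers that day.

    Each record is a (start_date, end_date) pair, both inclusive.
    Days covered by no record are still present, with a count of 0.
    """
    counts = {}
    day = dt_from
    while day < dt_to:
        counts[day] = sum(1 for start, end in records if start <= day <= end)
        day += timedelta(days=1)
    return counts

# Hypothetical records: one spans three days, one covers a single day.
records = [
    (date(2013, 1, 1), date(2013, 1, 3)),
    (date(2013, 1, 3), date(2013, 1, 3)),
]
per_day = count_valid_per_day(records, date(2013, 1, 1), date(2013, 1, 6))
```

A record spanning several days is counted once on each of them, and the generated calendar is what guarantees that the days without any valid record still appear.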