Month to month data comparison - crystal-reports

I'm trying to create a formula in Crystal Reports using the below data that will calculate the difference of DxLoaded in the current RunDt minus the previous month RunDt for each of the two Data Sources. Each month a new RunDt will be populated. When the report is refreshed I need to have the formula calculate using the most current RunDt comparing it to the previous RunDt for each Data Source.
For example, I would like to calculate 5,491,932 for the 1203 RunDt minus 3,830,842 for the 1202 RunDt. Then have the formula do that for each Data Source that might be populated. There will also be service year of 2012 for the same Data Source and would like 2011 compared only with 2011 and 2012 with 2012. I've tried using the previous function but haven't had luck with obtaining the correct results.
Yr    Data_Source  RunDt_YYMM  DxLoaded
2011  ABS          1203        5,491,932
2011  ABS          1202        3,830,842
2011  IALT         1203        9,193,144
2011  IALT         1202        6,578,678
2012  ABS          1203        1,837,900
2012  ABS          1202        1,083,124
2012  IALT         1203        4,223,111
2012  IALT         1202        2,985,543
Any help or suggestions are greatly appreciated!
Thank you!

Assuming that you want the output to appear in the order given in the question, and that you want the difference to appear next to the most recent RunDt_YYMM value for the data source and year:
Explicitly sort the dataset in the required order, i.e. Yr ascending, Data_Source ascending, RunDt_YYMM descending. (It should be possible to do this either in the query or in the Crystal Record Sort Expert.)
Enter a formula like the following (amend to match your table name):
if NextIsNull ({SummaryTable.Yr}) then 0 else
if {SummaryTable.Data_Source}=Next({SummaryTable.Data_Source}) and
{SummaryTable.Yr}=Next({SummaryTable.Yr})
then {SummaryTable.DxLoaded} - Next({SummaryTable.DxLoaded})
Drag and drop the new formula from the Field Explorer into the report detail section, next to the DxLoaded field.
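To sanity-check what the formula should produce, the same current-row-versus-next-row comparison can be sketched outside Crystal. The following is only an illustration in Python, not Crystal syntax; the function name diff_to_next is made up for this example, and the rows are the sample data from the question, already sorted as described above.

```python
# Sample rows from the question, sorted by Yr ascending, Data_Source ascending,
# RunDt_YYMM descending (the order the formula relies on).
rows = [
    (2011, "ABS", "1203", 5491932),
    (2011, "ABS", "1202", 3830842),
    (2011, "IALT", "1203", 9193144),
    (2011, "IALT", "1202", 6578678),
    (2012, "ABS", "1203", 1837900),
    (2012, "ABS", "1202", 1083124),
    (2012, "IALT", "1203", 4223111),
    (2012, "IALT", "1202", 2985543),
]


def diff_to_next(rows):
    """Hypothetical helper: current DxLoaded minus the next row's DxLoaded,
    but only when the next row has the same Yr and Data_Source; otherwise 0."""
    out = []
    for i, (yr, source, _run_dt, loaded) in enumerate(rows):
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        if nxt is not None and nxt[0] == yr and nxt[1] == source:
            out.append(loaded - nxt[3])
        else:
            out.append(0)
    return out


diffs = diff_to_next(rows)
for row, diff in zip(rows, diffs):
    print(row, diff)
```

The difference lands on the 1203 row of each Yr/Data_Source pair, and the older 1202 rows get 0 because the row after them belongs to a different group (or does not exist).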

It's very easy to get bogged down with this sort of problem. The trick is to sort the data out in the query by joining the table to itself using the RunDt_YYMM column. You can either do this in Crystal using a custom command or, as I prefer, create a view or stored procedure in SQL. For example:
SELECT *
INTO #TEMP
FROM (
SELECT 2011 Yr, 'ABS' Data_Source, '1203' RunDt_YYMM, 5491932 DxLoaded
UNION SELECT 2011 Yr, 'ABS' Data_Source, '1202' RunDt_YYMM, 3830842 DxLoaded
UNION SELECT 2011 Yr, 'IALT' Data_Source, '1203' RunDt_YYMM, 9193144 DxLoaded
UNION SELECT 2011 Yr, 'IALT' Data_Source, '1202' RunDt_YYMM, 6578678 DxLoaded
UNION SELECT 2012 Yr, 'ABS' Data_Source, '1203' RunDt_YYMM, 1837900 DxLoaded
UNION SELECT 2012 Yr, 'ABS' Data_Source, '1202' RunDt_YYMM, 1083124 DxLoaded
UNION SELECT 2012 Yr, 'IALT' Data_Source, '1203' RunDt_YYMM, 4223111 DxLoaded
UNION SELECT 2012 Yr, 'IALT' Data_Source, '1202' RunDt_YYMM, 2985543 DxLoaded
) A
SELECT * FROM #TEMP
SELECT
a.Yr, a.Data_Source, a.RunDT_YYMM, a.DxLoaded, b.DxLoaded PrevDxLoaded
FROM
#TEMP a
LEFT OUTER JOIN
#TEMP b
ON
b.Yr = a.Yr
AND
b.Data_Source = a.Data_Source
AND
b.RunDT_YYMM = CASE WHEN RIGHT(a.RunDT_YYMM,2) = '01' THEN
CAST(CAST(a.RunDT_YYMM as INT) - 89 AS VARCHAR(4))
ELSE
CAST(CAST(a.RunDT_YYMM as INT) - 1 AS VARCHAR(4))
END
Result:
Yr    Data_Source  RunDT_YYMM  DxLoaded  PrevDxLoaded
2011  ABS          1202        3830842   NULL
2011  ABS          1203        5491932   3830842
2011  IALT         1202        6578678   NULL
2011  IALT         1203        9193144   6578678
2012  ABS          1202        1083124   NULL
2012  ABS          1203        1837900   1083124
2012  IALT         1202        2985543   NULL
2012  IALT         1203        4223111   2985543
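The join condition hinges on computing the previous month from a YYMM string (subtract 1, or 89 when the month is January). A minimal Python sketch of the same arithmetic is shown here for checking edge cases; previous_yymm is a hypothetical name, and the zero-padding is an addition of this sketch, since casting a value such as '0902' to INT and back would appear to lose the leading zero.

```python
def previous_yymm(yymm: str) -> str:
    """Hypothetical helper: return the month before a 'YYMM' string.

    Mirrors the CASE expression in the query (minus 1, or minus 89 in January),
    but keeps the result zero-padded to four characters.
    """
    yy, mm = int(yymm[:2]), int(yymm[2:])
    if mm == 1:
        # January rolls back to December of the previous two-digit year.
        yy, mm = (yy - 1) % 100, 12
    else:
        mm -= 1
    return f"{yy:02d}{mm:02d}"


print(previous_yymm("1203"))
print(previous_yymm("1201"))
```

For the 12xx values in the question the result matches the integer arithmetic in the query; the padding only matters for two-digit years below 10.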

Postgres Crosstab on double columns with unknown value

I have a table like this in my Postgres v10 DB:
CREATE TABLE t1(id integer primary key, ref integer,v_id integer,total numeric, year varchar, total_lastyear numeric,lastyear varchar ) ;
INSERT INTO t1 VALUES
(1, 2077,15,10000,2020,9000,2019),
(2, 2000,13,190000,2020,189000,2019),
(3, 2065,11,10000,2020,10000,2019),
(4, 1999,14,2300,2020,9000,2019);
select * from t1 returns:
id ref v_id total year total_lastyear lastyear
1 2077 15 10000 2020 9000 2019
2 2000 13 190000 2020 189000 2019
3 2065 11 10000 2020 10000 2019
4 1999 14 2300 2020 9000 2019
Now I want to pivot this table so that I have 2020 and 2019 as columns with the total amounts as values.
My problems:
I don't know how to pivot two columns in the same query. Is that even possible, or do you have to do it in two steps?
The years 2020 and 2019 are dynamic and can change from one day to another. The year inside the column is the same on every row.
So basically I need to save the years inside lastyear and year in some variable and pass them to the crosstab query.
This is how far I got by myself, but I only managed to pivot one year, and the years 2019 and 2020 are hardcoded.
Demo
You can pivot one at a time with WITH.
WITH xd1 AS (
SELECT * FROM crosstab('SELECT ref,v_id,year,total FROM t1 ORDER BY 1,3',
'SELECT DISTINCT year FROM t1 ORDER BY 1') AS ct1(ref int,v_id int,"2020" int)
), xd2 AS (
SELECT * FROM crosstab('SELECT ref,v_id,lastyear,total_lastyear FROM t1 ORDER BY 1,3',
'SELECT DISTINCT lastyear FROM t1 ORDER BY 1') AS ct2(ref int,v_id int,"2019" int)
)
SELECT xd1.ref,xd1.v_id,xd1."2020",xxx."2019"
FROM xd1
LEFT JOIN xd2 AS xxx ON xxx.ref = xd1.ref AND xxx.v_id = xd1.v_id;
This doesn't prevent lastyear and year from colliding.
You still have to know which years the query will return, because you have to define the record type returned by crosstab.
You could wrap it in an EXECUTE format() to make it more dynamic, at the cost of some string building.
This issue was mentioned here.

What is the best way to get this TSQL Pivot to work [duplicate]

I need to do the following transpose in MS SQL
from:
Day A B
---------
Mon 1 2
Tue 3 4
Wed 5 6
Thu 7 8
Fri 9 0
To the following:
Value Mon Tue Wed Thu Fri
--------------------------
A 1 3 5 7 9
B 2 4 6 8 0
I understand how to do it with PIVOT when there is only one column (A) but I can not figure out how to do it when there are multiple columns to transpose (A,B,...)
Example code to be transposed:
select LEFT(datename(dw,datetime),3) as DateWeek,
sum(ACalls) as A,
Sum(BCalls) as B
from DataTable
group by LEFT(datename(dw,datetime),3)
Table Structure:
Column DataType
DateTime Datetime
ACalls int
BCalls int
Any help will be much appreciated.
In order to transpose the data into the result that you want, you will need to use both the UNPIVOT and the PIVOT functions.
The UNPIVOT function takes the A and B columns and converts the results into rows. Then you will use the PIVOT function to transform the day values into columns:
select *
from
(
select day, col, value
from yourtable
unpivot
(
value
for col in (A, B)
) unpiv
) src
pivot
(
max(value)
for day in (Mon, Tue, Wed, Thu, Fri)
) piv
See SQL Fiddle with Demo.
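The two-step idea (turn the A and B columns into rows, then turn the day values into columns) can also be traced by hand. The following Python sketch is only an illustration of the data flow using the sample values from the question; it is not T-SQL, and the variable names are made up for this example.

```python
# Sample data from the question: one row per day, with columns A and B.
rows = [
    {"day": "Mon", "A": 1, "B": 2},
    {"day": "Tue", "A": 3, "B": 4},
    {"day": "Wed", "A": 5, "B": 6},
    {"day": "Thu", "A": 7, "B": 8},
    {"day": "Fri", "A": 9, "B": 0},
]

# Step 1 (unpivot): one (day, col, value) row per original column.
unpivoted = [(r["day"], col, r[col]) for r in rows for col in ("A", "B")]

# Step 2 (pivot): one output row per col, with one entry per day.
# Each (col, day) pair occurs once here, so plain assignment stands in
# for the max(value) aggregate used in the query.
pivoted = {}
for day, col, value in unpivoted:
    pivoted.setdefault(col, {})[day] = value

for col, by_day in pivoted.items():
    print(col, by_day)
```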
If you are using SQL Server 2008+, then you can use CROSS APPLY with VALUES to unpivot the data. Your code would be changed to the following:
select *
from
(
select day, col, value
from yourtable
cross apply
(
values ('A', A),('B', B)
) c (col, value)
) src
pivot
(
max(value)
for day in (Mon, Tue, Wed, Thu, Fri)
) piv
See SQL Fiddle with Demo.
Edit #1, applying your current query into the above solution you will use something similar to this:
select *
from
(
select LEFT(datename(dw,datetime),3) as DateWeek,
col,
value
from DataTable
cross apply
(
values ('A', ACalls), ('B', BCalls)
) c (col, value)
) src
pivot
(
sum(value)
for dateweek in (Mon, Tue, Wed, Thu, Fri)
) piv

T-SQL - Data Islands and Gaps - How do I summarise transactional data by month?

I'm trying to query some transactional data to establish the CurrentProductionHours value for each Report at the end of each month.
Providing there has been a transaction for each report in each month, that's pretty straight-forward... I can use something along the lines of the code below to partition transactions by month and then pick out the rows where TransactionByMonth = 1 (effectively, the last transaction for each report each month).
SELECT
ReportId,
TransactionId,
CurrentProductionHours,
ROW_NUMBER() OVER (PARTITION BY [ReportId], [CalendarYear], [MonthOfYear]
ORDER BY TransactionTimestamp desc
) AS TransactionByMonth
FROM
tblSource
The problem that I have is that there will not necessarily be a transaction for every report every month... When that's the case, I need to carry forward the last known CurrentProductionHours value to the month which has no transaction as this indicates that there has been no change. Potentially, this value may need to be carried forward multiple times.
Source Data:
ReportId TransactionTimestamp CurrentProductionHours
1 2014-01-05 13:37:00 14.50
1 2014-01-20 09:15:00 15.00
1 2014-01-21 10:20:00 10.00
2 2014-01-22 09:43:00 22.00
1 2014-02-02 08:50:00 12.00
Target Results:
ReportId Month Year ProductionHours
1 1 2014 10.00
2 1 2014 22.00
1 2 2014 12.00
2 2 2014 22.00
I should also mention that I have a date table available, which can be referenced if required.
** UPDATE 05/03/2014 **
I now have a query which is generating results as shown in the example below, but I'm left with islands of data (where a transaction existed in that month) and gaps in between. My question is still similar but in some ways a little more generic: what is the best way to fill gaps between data islands if you have the dataset below as a starting point?
ReportId Month Year ProductionHours
1 1 2014 10.00
1 2 2014 12.00
1 3 2014 NULL
2 1 2014 22.00
2 2 2014 NULL
2 3 2014 NULL
Any advice about how to tackle this would be greatly appreciated!
Try this:
;with a as
(
select dateadd(m, datediff(m, 0, min(TransactionTimestamp))+1,0) minTransactionTimestamp,
max(TransactionTimestamp) maxTransactionTimestamp from tblSource
), b as
(
select minTransactionTimestamp TT, maxTransactionTimestamp
from a
union all
select dateadd(m, 1, TT), maxTransactionTimestamp
from b
where tt < maxTransactionTimestamp
), c as
(
select distinct t.ReportId, b.TT from tblSource t
cross apply b
)
select c.ReportId,
month(dateadd(m, -1, c.TT)) Month,
year(dateadd(m, -1, c.TT)) Year,
x.CurrentProductionHours
from c
cross apply
(select top 1 CurrentProductionHours from tblSource
where TransactionTimestamp < c.TT
and ReportId = c.ReportId
order by TransactionTimestamp desc) x
A similar approach, but using a Cartesian product in the first step to obtain all the combinations of report IDs and months.
A second step adds to that Cartesian product the maximum timestamp from the source table where the month is less than or equal to the month in the current row.
Finally, it joins the source table to that intermediate result by report ID and timestamp to obtain the latest source table row for every report ID/month.
;
WITH allcombinations -- Cartesian (reportid X yearmonth)
AS ( SELECT reportid ,
yearmonth
FROM ( SELECT DISTINCT
reportid
FROM tblSource
) a
JOIN ( SELECT DISTINCT
DATEPART(yy, transactionTimestamp)
* 100 + DATEPART(MM,
transactionTimestamp) yearmonth
FROM tblSource
) b ON 1 = 1
),
maxdates --add correlated max timestamp where the month is less or equal to the month in current record
AS ( SELECT a.* ,
( SELECT MAX(transactionTimestamp)
FROM tblSource t
WHERE t.reportid = a.reportid
AND DATEPART(yy, t.transactionTimestamp)
* 100 + DATEPART(MM,
t.transactionTimestamp) <= a.yearmonth
) maxtstamp
FROM allcombinations a
)
-- join previous data to the source table by reportid and timestamp
SELECT distinct m.reportid ,
m.yearmonth ,
t.CurrentProductionHours
FROM maxdates m
JOIN tblSource t ON t.transactionTimestamp = m.maxtstamp and t.reportid=m.reportid
ORDER BY m.reportid ,
m.yearmonth
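Both queries above come down to "carry the last known value forward per report". As a rough cross-check of the expected output, here is a small Python sketch that fills the gaps in the island dataset from the question's update; fill_gaps is a hypothetical name, and the sketch assumes one row per report and month, as in that dataset.

```python
# Island/gap dataset from the update: (ReportId, Year, Month, ProductionHours).
rows = [
    (1, 2014, 1, 10.00),
    (1, 2014, 2, 12.00),
    (1, 2014, 3, None),
    (2, 2014, 1, 22.00),
    (2, 2014, 2, None),
    (2, 2014, 3, None),
]


def fill_gaps(rows):
    """Hypothetical helper: replace None with the last known value per report."""
    last_seen = {}  # ReportId -> most recent non-null ProductionHours
    filled = []
    # Walk each report in chronological order so "last known" is well defined.
    for report_id, year, month, hours in sorted(rows, key=lambda r: (r[0], r[1], r[2])):
        if hours is None:
            hours = last_seen.get(report_id)  # stays None if nothing seen yet
        else:
            last_seen[report_id] = hours
        filled.append((report_id, year, month, hours))
    return filled


filled = fill_gaps(rows)
for row in filled:
    print(row)
```

A month that precedes a report's first transaction stays empty in this sketch, since there is nothing to carry forward yet.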

Problems with Group By - want to call a column without using in Group By (T-SQL, SQL Server)

I want to be able to select the top Max(HR) leaders by LgID and YearID. But I also want the Player's name column. (T-SQL, SQL Server 2012 Express)
When I query with the player name it returns '0' for every Max(HR) output. It seems SQL Server 2012 Express won't allow me to omit the PlayerID in the group by when I have it in the select statement. Is there a way to get by this? A Case when? Or something else?
Select
playerID,
yearID,
lgID,
Max(HR) HR_Leader
from batting
group by
yearID,
lgID
order by
yearID desc,
lgID,
Max(HR)
Returns this error:
Msg 8120, Level 16, State 1, Line 2
Column 'batting.playerID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
But when I comment out the PlayerID, it runs, but I have no name, as seen here:
Select
--playerID,
yearID,
lgID,
Max(HR) HR_Leader
from batting
group by
yearID,
lgID
order by
yearID desc,
lgID,
Max(HR)
yearID lgID HR_Leader
2013 AL 53
2013 NL 36
2012 AL 44
2012 NL 41
2011 AL 43
2011 NL 39
2010 AL 54
2010 NL 42
2009 AL 39
2009 NL 47
Update after 1st comment/question.
Select
playerID,
yearID,
lgID,
Max(HR) HR_Leader
from batting
group by
playerID,
yearID,
lgID
order by
yearID desc,
lgID,
Max(HR) desc
The query returns this, which doesn't look like the first output:
playerID yearID lgID HR_Leader
davisch02 2013 AL 53
cabremi01 2013 AL 44
encared01 2013 AL 36
dunnad01 2013 AL 34
trumbma01 2013 AL 34
jonesad01 2013 AL 33
longoev01 2013 AL 32
ortizda01 2013 AL 30
mossbr01 2013 AL 30
beltrad01 2013 AL 30
What I want is this:
PlayerID yearID lgID HR_Leader
Player1 2013 AL 53
Player2 2013 NL 36
Player3 2012 AL 44
Player4 2012 NL 41
Player5 2011 AL 43
Player6 2011 NL 39
Player7 2010 AL 54
Player8 2010 NL 42
Player9 2010 NL 42
Here is a simple way. Use a common table expression (CTE) to get the top HR for each league and year. Then join that back to batting to get the players who own those numbers. The sample data includes a tie, which returns both players in no particular order.
CREATE TABLE batting (playerID INT, yearID INT, lgID CHAR(2), HR INT)
INSERT INTO batting
SELECT 1, 2010, 'AL', 40 UNION
SELECT 2, 2010, 'AL', 35 UNION
SELECT 3, 2010, 'NL', 35 UNION
SELECT 4, 2010, 'NL', 30 UNION
SELECT 5, 2011, 'AL', 50 UNION
SELECT 6, 2011, 'AL', 45 UNION
SELECT 3, 2011, 'NL', 45 UNION
SELECT 7, 2011, 'NL', 45 UNION
SELECT 4, 2011, 'NL', 40
;WITH cte AS (
SELECT yearID
,lgID
,MAX(HR) HR_Leader
FROM batting
GROUP BY yearID
,lgID
)
SELECT playerID
,c.*
FROM batting b
INNER JOIN
cte c ON b.yearID=c.yearID
AND b.lgID=c.lgID
AND b.HR=c.HR_Leader
ORDER BY c.yearID DESC
,c.lgID
DROP TABLE batting
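For readers who want to verify the expected rows without a database, the same two steps (maximum per group, then join back on that maximum) can be sketched in Python with the sample data above. This is only an illustration; the variable names are made up, and the final sort merely mirrors the ORDER BY so the output is easy to compare.

```python
# Sample data from the answer: (playerID, yearID, lgID, HR).
batting = [
    (1, 2010, "AL", 40), (2, 2010, "AL", 35),
    (3, 2010, "NL", 35), (4, 2010, "NL", 30),
    (5, 2011, "AL", 50), (6, 2011, "AL", 45),
    (3, 2011, "NL", 45), (7, 2011, "NL", 45),
    (4, 2011, "NL", 40),
]

# Step 1: top HR per (yearID, lgID), the role played by the CTE.
top_hr = {}
for _player, year, league, hr in batting:
    key = (year, league)
    top_hr[key] = max(hr, top_hr.get(key, hr))

# Step 2: keep every player whose HR equals the group maximum (ties included).
leaders = [row for row in batting if row[3] == top_hr[(row[1], row[2])]]

# Order by yearID descending, then lgID; tied players keep their input order.
leaders.sort(key=lambda r: (-r[1], r[2]))

for row in leaders:
    print(row)
```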

PostgreSQL - GROUP subsequent rows

I have a table which contains some records ordered by date.
I want to get the start and end dates for each group of consecutive rows (grouped by some criterion, e.g. position).
Example:
create table tbl (id int, date timestamp without time zone,
position int);
insert into tbl values
( 1 , '2013-12-01', 1),
( 2 , '2013-12-02', 2),
( 3 , '2013-12-03', 2),
( 4 , '2013-12-04', 2),
( 5 , '2013-12-05', 3),
( 6 , '2013-12-06', 3),
( 7 , '2013-12-07', 2),
( 8 , '2013-12-08', 2)
Of course, if I simply group by position I will get the wrong result, as positions could be the same for different groups:
SELECT POSITION, min(date) MIN, max(date) MAX
FROM tbl GROUP BY POSITION
I will get:
POSITION MIN MAX
1 December, 01 2013 00:00:00+0000 December, 01 2013 00:00:00+0000
3 December, 05 2013 00:00:00+0000 December, 06 2013 00:00:00+0000
2 December, 02 2013 00:00:00+0000 December, 08 2013 00:00:00+0000
But I want:
POSITION MIN MAX
1 December, 01 2013 00:00:00+0000 December, 01 2013 00:00:00+0000
2 December, 02 2013 00:00:00+0000 December, 04 2013 00:00:00+0000
3 December, 05 2013 00:00:00+0000 December, 06 2013 00:00:00+0000
2 December, 07 2013 00:00:00+0000 December, 08 2013 00:00:00+0000
I found a solution for MySql which uses variables and I could port it but I believe PostgreSQL can do it in some smarter way using its advanced features like window functions.
I'm using PostgreSQL 9.2
There is probably a more elegant solution, but try this:
WITH tmp_tbl AS (
SELECT *,
CASE WHEN lag(position,1) OVER(ORDER BY id)=position
THEN position
ELSE ROW_NUMBER() OVER(ORDER BY id)
END AS grouping_col
FROM tbl
)
, tmp_tbl2 AS(
SELECT position,date,
CASE WHEN lag(position,1)OVER(ORDER BY id)=position
THEN lag(grouping_col,1) OVER(ORDER BY id)
ELSE ROW_NUMBER() OVER(ORDER BY id)
END AS grouping_col
FROM tmp_tbl
)
SELECT POSITION, min(date) MIN, max(date) MAX
FROM tmp_tbl2 GROUP BY grouping_col,position
There are some complete answers on Stack Overflow for that, so I won't repeat them in detail, but the principle is to group the records according to the difference between:
the row number when ordered by the date (via a window function), and
the difference between the date and a static reference date.
So you have a series such as:
rownum  datediff  diff
1       1         0     ^
2       2         0     | first group
3       3         0     v
4       5         1     ^
5       6         1     | second group
6       7         1     v
7       9         2     ^
8       10        2     v third group
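The series above can be reproduced with a few lines of Python to see why the difference stays constant within a group. This is a minimal sketch of the principle only, not a PostgreSQL query; the datediff values are the ones listed in the table, and the variable names are made up for this example.

```python
from itertools import groupby

# Day offsets from the static reference date, as in the table above.
datediffs = [1, 2, 3, 5, 6, 7, 9, 10]

# Pair each value with (datediff - rownum); rownum starts at 1.
# Consecutive dates advance in step with the row number, so the
# difference only changes when a date is skipped.
keyed = [(d - rownum, d) for rownum, d in enumerate(datediffs, start=1)]

# Group adjacent rows that share the same difference.
islands = [[d for _, d in group] for _, group in groupby(keyed, key=lambda kd: kd[0])]

print(islands)
```

In SQL the same key would typically be computed with a window function and then used in GROUP BY, with min and max of the date giving the start and end of each group.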