Calculating Running Totals - tsql

I am trying to re-calculate a few different columns of data for a particular EmployeeID.
I want the hrs_YTD column to keep a running total. What is a good way of updating this info?
HRS_YTD currently has 0.00 values. I want to achieve the results in the table below.
ID | CHEKDATE | CHEKNUMBR | HRS | HRS_YTD
EN344944 | 01/1/2014 | dd1001 | 40.00 | 40.00
EN344944 | 01/8/2014 | dd1002 | 30.00 | 70.00
EN344944 | 1/15/2014 | dd1003 | 32.50 | 102.50
etc.....
DECLARE @k_external_id varchar(32)
SET @k_external_id = 'EN344944'
SELECT * INTO #tmpA
FROM dbo.gp_check_hdr a
WHERE a.EMPLOYID = @k_external_id
SELECT a.ID, a.CHEKNMBR, a.CHEKDATE,
(SELECT CAST(SUM(a.[hours]) as decimal(18,2)) FROM #tmpA b
WHERE (b.CHEKDATE <= a.CHEKDATE and YEAR(b.CHEKDATE) = 2013)) AS hrs_ytd
FROM #tmpA a
WHERE YEAR(a.CHEKDATE) = 2013
I really don't know if I can alias a table like I did with #tmpA b, but it's worked for me in the past. That doesn't mean it's a good way of doing things, though. Can someone show me a way to achieve the results I need?

I haven't tested this, but you can give it a try:
DECLARE @k_external_id varchar(32)
SET @k_external_id = 'EN344944'
SELECT g1.primarykey, g1.ID, g1.CHEKDATE, g1.CHEKNUMBR, g1.HRS,
(SELECT SUM(g2.HRS)
FROM dbo.gp_check_hdr g2
WHERE g2.ID = @k_external_id AND
(g2.primarykey <= g1.primarykey)) AS HRS_YTD
FROM dbo.gp_check_hdr g1
WHERE g1.ID = @k_external_id
ORDER BY g1.primarykey;
http://www.codeproject.com/Articles/300785/Calculating-simple-running-totals-in-SQL-Server

The way I'd do this is a combination of a computed column and a user defined function.
The function lets you aggregate the data. In a computed column you can only work with fields of the same row, so calling a function (which is allowed) is necessary.
The computed column allows this to work continuously without any additional queries or temp tables, etc. Once it's set, you don't need to run nightly updates or triggers or anything of the sort to keep the data updated, including when records change or get deleted.
Here's my solution ... and SqlFiddle: http://www.sqlfiddle.com/#!3/cd8d6/1/0
Edit:
I've updated this to reflect your need to calculate the running totals per employee. SqlFiddle also updated.
The function:
Create Function udf_GetRunningTotals (
@CheckDate DateTime,
@EmployeeID int
)
Returns Decimal(18,2)
As
Begin
Declare @Result Decimal(18,2)
Select @Result = Cast(Sum(rt.Hrs) As Decimal(18,2))
From RunningTotals rt
Where rt.CheckDate <= @CheckDate
And Year(rt.CheckDate) = Year(@CheckDate)
And rt.EmployeeID = @EmployeeID
Return @Result
End
The Table Schema:
Create Table [dbo].[RunningTotals](
[ID] [int] Identity(1,1) NOT NULL,
[EmployeeID] [int] NOT NULL,
[CheckDate] [datetime] NOT NULL,
[CheckNumber] [int] NOT NULL,
[Hrs] [decimal](18, 2) NOT NULL,
[Hrs_Ytd] AS ([dbo].[udf_GetRunningTotals]([CheckDate],[EmployeeID])), -- must add after table creation and function creation due to inter-referencing of table and function
Constraint [PK_RunningTotals3] Primary Key Clustered (
[ID] ASC
) With (
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON
)
) On [PRIMARY]
Result will tally up the YTD hrs for each year.
Note --
You cannot create the function or the table as is since they reference each other.
First, create the table with all but the computed column;
Then, create the function.
Finally, alter the table and add the computed column.
Here's a full running test script:
-- Table schema
Create Table [dbo].[RunningTotals](
[ID] [int] Identity(1,1) NOT NULL,
[EmployeeID] [int] NOT NULL,
[CheckDate] [datetime] NOT NULL,
[CheckNumber] [int] NOT NULL,
[Hrs] [decimal](18, 2) NOT NULL,
Constraint [PK_RunningTotals3] Primary Key Clustered (
[ID] ASC
) With (
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON
)
) On [PRIMARY]
Go
-- UDF Function to compute totals
Create Function udf_GetRunningTotals (
@CheckDate DateTime,
@EmployeeID int
)
Returns Decimal(18,2)
As
Begin
Declare @Result Decimal(18,2)
Select @Result = Cast(Sum(rt.Hrs) As Decimal(18,2))
From RunningTotals rt
Where rt.CheckDate <= @CheckDate
And Year(rt.CheckDate) = Year(@CheckDate)
And rt.EmployeeID = @EmployeeID
Return @Result
End
Go
-- Add the computed column to the table
Alter Table RunningTotals Add [Hrs_Ytd] As (dbo.udf_GetRunningTotals(CheckDate, EmployeeID))
Go
-- Insert some test data
Insert into RunningTotals Values (334944, '1/1/2014', '1001', 40.00)
Insert into RunningTotals Values (334944, '1/5/2014', '1002', 30.00)
Insert into RunningTotals Values (334944, '1/15/2014', '1003', 32.50)
Insert into RunningTotals Values (334945, '1/5/2014', '1001', 10.00)
Insert into RunningTotals Values (334945, '1/6/2014', '1002', 20.00)
Insert into RunningTotals Values (334945, '1/8/2014', '1003', 12.50)
-- Test the computed column
Select * From RunningTotals

Your sub query should work just fine.
I used a table variable in place of a temp table.
I also limited the rows inserted into the table variable to 2013, which simplifies the final SELECT and keeps only the rows you need. The only other change is joining the subquery to the main query on ID, but what you have should also work, since your temp table is already limited to a specific ID.
DECLARE
@k_external_id varchar(32)
,@k_reporting_year int
SET @k_external_id = 'EN344944'
SET @k_reporting_year = 2013
DECLARE @temp TABLE(
ID NVARCHAR(32)
,CheckDate DATE
,CheckNumber NVARCHAR(6)
,HRS DECIMAL(18,2)
)
INSERT INTO @temp (
ID
,CheckDate
,CheckNumber
,HRS
)
SELECT
ID
,CHEKDATE
,CHEKNMBR
,[hours]
FROM
dbo.gp_check_hdr
WHERE
EMPLOYID = @k_external_id
AND YEAR(CHEKDATE) = @k_reporting_year
SELECT
ID
,CheckDate
,CheckNumber
,HRS
,(SELECT SUM(b.HRS) FROM @temp b WHERE a.ID = b.ID AND b.CheckDate <= a.CheckDate) AS hrs_ytd
FROM
@temp a
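On SQL Server 2012 or later, the same running total can also be sketched with a windowed SUM instead of a correlated subquery, and written straight back into HRS_YTD through an updatable CTE. This is an untested sketch that assumes the column names from the sample table in the question (ID, CHEKDATE, CHEKNUMBR, HRS, HRS_YTD); swap in EMPLOYID, CHEKNMBR and [hours] if those are the real names. The CHEKNUMBR tiebreaker is an assumption, added only so that two checks on the same date get a deterministic order.

```sql
DECLARE @k_external_id varchar(32) = 'EN344944';

-- Updatable CTE: compute the per-year running total, then write it back.
WITH totals AS (
    SELECT HRS_YTD,
           SUM(HRS) OVER (
               PARTITION BY ID, YEAR(CHEKDATE)   -- restart the total each calendar year
               ORDER BY CHEKDATE, CHEKNUMBR      -- tiebreaker column is an assumption
               ROWS UNBOUNDED PRECEDING
           ) AS running_hrs
    FROM dbo.gp_check_hdr
    WHERE ID = @k_external_id
)
UPDATE totals
SET HRS_YTD = running_hrs;
```

Dropping the WHERE clause would recalculate every employee in one pass, since the PARTITION BY already keeps each ID separate.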

Related

Is it possible to find duplicating records in two columns simultaneously in PostgreSQL?

I have the following database schema (oversimplified):
create sequence partners_partner_id_seq;
create table partners
(
partner_id integer default nextval('partners_partner_id_seq'::regclass) not null primary key,
name varchar(255) default NULL::character varying,
company_id varchar(20) default NULL::character varying,
vat_id varchar(50) default NULL::character varying,
is_deleted boolean default false not null
);
INSERT INTO partners(name, company_id, vat_id) VALUES('test1','1010109191191', 'BG1010109191192');
INSERT INTO partners(name, company_id, vat_id) VALUES('test2','1010109191191', 'BG1010109191192');
INSERT INTO partners(name, company_id, vat_id) VALUES('test3','3214567890102', 'BG1010109191192');
INSERT INTO partners(name, company_id, vat_id) VALUES('test4','9999999999999', 'GE9999999999999');
I am trying to figure out how to return test1, test2 (because the company_id column value duplicates vertically) and test3 (because the vat_id column value duplicates vertically as well).
To put it in other words - I need to find duplicating company_id and vat_id records and group them together, so that test1, test2 and test3 would be together, because they duplicate by company_id and vat_id.
So far I have the following query:
SELECT *
FROM (
SELECT *, LEAD(row, 1) OVER () AS nextrow
FROM (
SELECT *, ROW_NUMBER() OVER (w) AS row
FROM partners
WHERE is_deleted = false
AND ((company_id != '' AND company_id IS NOT null) OR (vat_id != '' AND vat_id IS NOT NULL))
WINDOW w AS (PARTITION BY company_id, vat_id ORDER BY partner_id DESC)
) x
) y
WHERE (row > 1 OR nextrow > 1)
AND is_deleted = false
This successfully shows all company_id duplicates, but does not appear to show vat_id ones - test3 row is missing. Is this possible to be done within one query?
Here is a db-fiddle with the schema, data and predefined query reproducing my result.

You can do this with recursion, but depending on the size of your data you may want to iterate, instead.
The trick is to make the name just another match key instead of treating it differently than the company_id and vat_id:
create table partners (
partner_id integer generated always as identity primary key,
name text,
company_id text,
vat_id text,
is_deleted boolean not null default false
);
insert into partners (name, company_id, vat_id) values
('test1','1010109191191', 'BG1010109191192'),
('test2','1010109191191', 'BG1010109191192'),
('test3','3214567890102', 'BG1010109191192'),
('test4','9999999999999', 'GE9999999999999'),
('test5','3214567890102', 'BG8888888888888'),
('test6','2983489023408', 'BG8888888888888')
;
I added a couple of test cases and left in the lone partner.
with recursive keys as (
select partner_id,
array['n_'||name, 'c_'||company_id, 'v_'||vat_id] as matcher,
array[partner_id] as matchlist,
1 as size
from partners
), matchers as (
select *
from keys
union all
select p.partner_id, c.matcher,
p.matchlist||c.partner_id as matchlist,
p.size + 1
from matchers p
join keys c
on c.matcher && p.matcher
and not p.matchlist @> array[c.partner_id]
), largest as (
select distinct sort(matchlist) as matchlist
from matchers m
where not exists (select 1
from matchers
where matchlist @> m.matchlist
and size > m.size)
-- and size > 1
)
select *
from largest
;
matchlist
{1,2,3,5,6}
{4}
EDIT UPDATE
Since recursion did not perform, here is an iterative example in plpgsql that uses a temporary table:
create temporary table match1 (
partner_id int not null,
group_id int not null,
matchkey uuid not null
);
create index on match1 (matchkey);
create index on match1 (group_id);
insert into match1
select partner_id, partner_id, md5('n_'||name)::uuid from partners
union all
select partner_id, partner_id, md5('c_'||company_id)::uuid from partners
union all
select partner_id, partner_id, md5('v_'||vat_id)::uuid from partners;
do $$
declare _cnt bigint;
begin
loop
with consolidate as (
select group_id,
min(group_id) over (partition by matchkey) as new_group_id
from match1
), minimize as (
select group_id, min(new_group_id) as new_group_id
from consolidate
group by group_id
), doupdate as (
update match1
set group_id = m.new_group_id
from minimize m
where m.group_id = match1.group_id
and m.new_group_id != match1.group_id
returning *
)
select count(*) into _cnt from doupdate;
if _cnt = 0 then
exit;
end if;
end loop;
end;
$$;
updated fiddle
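If only direct duplicates are needed, without chaining groups together through shared keys, a simpler sketch is to count rows per company_id and per vat_id with window functions. This is a different and narrower technique than the grouping above, and the alias names (d, company_dups, vat_dups) are made up for illustration. On the four sample rows from the question it would return test1, test2 and test3.

```sql
select partner_id, name, company_id, vat_id
from (
    select p.*,
           count(*) over (partition by company_id) as company_dups,
           count(*) over (partition by vat_id)     as vat_dups
    from partners p
    where not is_deleted
) d
-- NULL and empty keys are ignored: "key <> ''" is not true for either
where (company_id <> '' and company_dups > 1)
   or (vat_id <> '' and vat_dups > 1)
order by partner_id;
```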

How to fill column basing on two other columns

I have a table LessonHour with an empty Number column.
CREATE TABLE [dbo].[LessonHour]
(
[Id] [uniqueidentifier] NOT NULL,
[StartTime] [time](7) NOT NULL,
[EndTime] [time](7) NOT NULL,
[SchoolId] [uniqueidentifier] NOT NULL,
[Number] [int] NULL
)
How can I fill in Number for each LessonHour so that it holds the lesson hour's position in order of start time?
The LessonHours cannot overlap. Every school defines its own lesson hour schema.
Example set of data
http://pastebin.com/efWCtUbv
What I'd do:
Order by SchoolId and StartTime.
Use a cursor to write the next number into each row, starting from 1 every time the SchoolId changes.
Edit:
Solution with cursor
select -- top 20
LH.[Id],
[StartTime],
[EndTime],
[SchoolId]
into #LH
from
LessonHour as LH
join RowStatus as RS on LH.RowStatusId = RS.Id
where
RS.IsActive = 1
select * from #LH order by SchoolId, StartTime
declare @id uniqueidentifier, @st time(7), @et time(7), @sid uniqueidentifier
declare @prev_sid uniqueidentifier = NEWID()
declare @i int = 1
declare cur scroll cursor for
select * from #LH order by SchoolId, StartTime
open cur;
fetch next from cur into @id, @st, @et, @sid
while @@FETCH_STATUS = 0
begin
--print @prev_sid
if @sid <> @prev_sid
begin
set @i = 1
end
update LessonHour set Number = @i where Id = @id
print @i
set @i = @i + 1
set @prev_sid = @sid
fetch next from cur into @id, @st, @et, @sid
end;
close cur;
deallocate cur;
drop table #LH
This is the result I was after http://pastebin.com/iZ8cnA6w

Merging the information from the StackOverflow questions SQL Update with row_number() and
How do I use ROW_NUMBER()?:
with cte as (
select number, ROW_NUMBER() OVER(partition by schoolid order by starttime asc) as r from lessonhour
)
update cte
set number = r

Would this work?
CREATE TABLE [dbo].[LessonHour]
(
[Id] [uniqueidentifier] NOT NULL,
[StartTime] [time](7) NOT NULL,
[EndTime] [time](7) NOT NULL,
[SchoolId] [uniqueidentifier] NOT NULL,
[Number] AS DATEDIFF(hour,[StartTime],[EndTime])
)
So if I understand the question correctly you require a calculated column which takes in the values of [StartTime] and [EndTime] and returns the number of hours for that lesson as an int. The above table definition should do the trick.

Selecting one specific data row (required), and 3 others (specific data row must be included)

I need to select a specific row and 2 other rows that are not that specific row (a total of 3). The specific row must always be included in the 3 results. How should I go about it? I think it can be done with a UNION ALL, but do I have another choice? Thanks all! :)
Here are my scripts to create the sample tables:
create table users (
user_id serial primary key,
user_name varchar(20) not null
);
create table result_table1 (
result_id serial primary key,
user_id int4 references users(user_id),
result_1 int4 not null
);
create table result_table2 (
result_id serial primary key,
user_id int4 references users(user_id),
result_2 int4 not null
);
insert into users (user_name) values ('Kevin'),('John'),('Batman'),('Someguy');
insert into result_table1 (user_id, result_1) values (1, 20),(2, 40),(3, 70),(4, 42);
insert into result_table2 (user_id, result_2) values (1, 4),(2, 3),(3, 7),(4, 5);
Here is my UNION query:
SELECT result_table1.user_id,
result_1,
result_2
FROM result_table1
INNER JOIN (
SELECT user_id
FROM users
) users
ON users.user_id = result_table1.user_id
INNER JOIN (
SELECT result_table2.user_id,
result_2
FROM result_table2
) result_table2
ON result_table2.user_id = result_table1.user_id
WHERE users.user_id = 1
UNION ALL
SELECT result_table1.user_id,
result_1,
result_2
FROM result_table1
INNER JOIN (
SELECT user_id
FROM users
) users
ON users.user_id = result_table1.user_id
INNER JOIN (
SELECT result_table2.user_id,
result_2
FROM result_table2
) result_table2
ON result_table2.user_id = result_table1.user_id
WHERE users.user_id != 1
LIMIT 3;
Are there any options other than a UNION? The query works and does what I want for now, but will it always include user_id = 1 if I had a larger set of rows (assume that user_id = 1 will always be there)? :(
Thank you all! :)
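One possible alternative to the UNION, sketched here for PostgreSQL and not tested against the real data: sort the required row to the front and then apply the LIMIT. SQL gives no guarantee about row order without an ORDER BY, so an explicit sort is the safer way to make sure user_id = 1 survives the LIMIT. The aliases r1 and r2 are made up for illustration, and the join to users is dropped on the assumption that the foreign keys already guarantee the user exists.

```sql
select r1.user_id, r1.result_1, r2.result_2
from result_table1 r1
join result_table2 r2 on r2.user_id = r1.user_id
-- (r1.user_id = 1) is a boolean; DESC sorts true before false,
-- so the required row comes first and the rest follow by user_id
order by (r1.user_id = 1) desc, r1.user_id
limit 3;
```

With the sample data this would return users 1, 2 and 3. Replacing the second sort key with random() would pick the two other rows arbitrarily instead.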

Common Table Expression Select where last observation was at a location

I have the following tables
Location table
[ID] [int] IDENTITY(1,1) NOT NULL
Package table
[ID] [int] IDENTITY(1,1) NOT NULL
PackageObservation table
[PackageID] int
[LocationID] int
[Date] datetime
[Quantity] int
For a given location I want to select packages where the last observation of the package was at that location.
What is the Transact-SQL?
I think it involves a common table expression, but I can't figure it out.
More information.
The following almost does it, but I don't really want to assume that the identity field is in date order
select max(id) ,packageid
from packageobservation o1
where not exists (
select o2.id from packageobservation o2
where o2.[date] > o1.[date] )
group by packageid

You can use following SQL statement:
DECLARE @locationID int = 1
SELECT po.PackageID, MAX(po.[Date]) AS DateAtLocation
FROM PackageObservation po
WHERE po.LocationID=@locationID
AND NOT EXISTS (SELECT * FROM PackageObservation po2
WHERE po2.PackageID = po.PackageID AND
po2.LocationID <> po.LocationID AND
po2.[Date] >= po.[Date] )
GROUP BY po.PackageID
For better speed you can also add a combined index on [LocationID], [PackageID] and [Date].
It seems to me that a CTE is not necessary here.
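For comparison, here is a sketch of the CTE route the question had in mind, using ROW_NUMBER to pick each package's most recent observation by [Date] rather than by identity value. It is untested and assumes the column names listed in the question; the names ranked, rn and LastSeen are made up for illustration. If two observations of a package share the same [Date], which one is ranked first is arbitrary.

```sql
DECLARE @locationID int = 1;

WITH ranked AS (
    SELECT PackageID, LocationID, [Date], Quantity,
           ROW_NUMBER() OVER (PARTITION BY PackageID
                              ORDER BY [Date] DESC) AS rn   -- 1 = latest observation
    FROM PackageObservation
)
SELECT PackageID, [Date] AS LastSeen, Quantity
FROM ranked
WHERE rn = 1                      -- keep only the latest row per package
  AND LocationID = @locationID;   -- and only if it was at this location
```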

CTE Hierarchy descending but picking up child nodes not parents from ancestor

Explanation
OK, the title might be a bit much :)
I'll paste the scripts at the end.
Imagine the following n-ary tree
.
|
---1 **(25)**
|
-----1.1 **(13)**
| |
| ----1.1.1 (1)
| |
| ----1.1.2 **(7)**
| | |
| | ----1.1.2.1 (4)
| | |
| | ----1.1.2.2 (3)
| |
| ----1.1.3 (5)
|
-----1.2 (2)
|
|
-----1.3 (10)
And so on, where the root branch "." can also have a 2, 3, ... n branch, and each of those would have its own arbitrary tree form with n branches possible from any given node. The values in brackets at the end of each node are the values stored at that node. Think of them as accounts with sub-accounts, with the parent account being the sum of its child accounts.
What I'm trying to do with CTE is to retrieve all the [sub] accounts directly beneath a parent. So for providing 1.1 as the search point, it'll retrieve that whole branch of the tree. But, if I try to be smart and sum the returned values, I will be adding (for this specific example) 1.1.2 twice, once through the summation of its sub accounts, the second by the summation of the value it itself contains.
How would I go about something like this?
Thanks a zillion :)
Here are the scripts:
Scripts
Table
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Account](
[ID] [nvarchar](50) NOT NULL,
[ParentID] [nvarchar](50) NULL,
[Value] [float] NOT NULL,
[HasChild] [bit] NOT NULL,
CONSTRAINT [PK_Account] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Account] WITH CHECK ADD CONSTRAINT [FK_Account_Account] FOREIGN KEY([ParentID])
REFERENCES [dbo].[Account] ([ID])
GO
ALTER TABLE [dbo].[Account] CHECK CONSTRAINT [FK_Account_Account]
GO
ALTER TABLE [dbo].[Account] ADD CONSTRAINT [DF_Account_HasChild] DEFAULT ((0)) FOR [HasChild]
GO
CTE Script
WITH
DescendToChild([ID],ParentID,Value)
AS
(
--base case
SELECT [ID],ParentID,Value FROM Account
Where ParentID = '1.1'
UNION ALL
----recursive step
SELECT
A.[ID],A.ParentID,A.Value FROM Account as A
INNER JOIN DescendToChild D on A.ParentID = D.ID
)
select * from DescendToChild;

Here's a solution based on your sample data. It works by only summing up those nodes with no children:
DECLARE @tree TABLE
(id INT
,parentid INT
,nodeName VARCHAR(10)
,VALUE INT
)
INSERT @tree (id,parentid,nodeName,VALUE)
VALUES
(1,NULL,'.',NULL),
(2,1,'1',25),
(3,2,'1.1',13),
(4,2,'1.2',2),
(5,2,'1.3',10),
(6,3,'1.1.1',1),
(7,3,'1.1.2',7),
(8,3,'1.1.3',5),
(9,7,'1.1.2.1',4),
(10,7,'1.1.2.2',3)
;WITH recCTE
AS
(
SELECT id, parentid, nodeName, value,
CASE WHEN EXISTS (SELECT 1 FROM @tree AS t1 WHERE t1.parentid = t.id) THEN 1 ELSE 0 END AS hasChildren
FROM @tree AS t
WHERE nodeName = '1.1'
UNION ALL
SELECT t.id, t.parentid, t.nodeName, t.value,
CASE WHEN EXISTS (SELECT 1 FROM @tree AS t1 WHERE t1.parentid = t.id) THEN 1 ELSE 0 END AS hasChildren
FROM @tree AS t
JOIN recCTE AS r
ON r.id = t.parentid
)
SELECT SUM(VALUE)
FROM recCTE
WHERE hasChildren = 0
OPTION (MAXRECURSION 0)
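The same leaf-only idea can be sketched directly against the Account table from the question, reusing its HasChild column instead of an EXISTS check. This is untested and assumes HasChild is kept accurate for every row; if it is not, the EXISTS test above is the safer choice. LeafTotal is a made-up column alias.

```sql
WITH DescendToChild (ID, ParentID, Value, HasChild) AS (
    -- base case: the node we start from
    SELECT ID, ParentID, Value, HasChild
    FROM dbo.Account
    WHERE ID = N'1.1'
    UNION ALL
    -- recursive step: every account whose parent is already in the set
    SELECT A.ID, A.ParentID, A.Value, A.HasChild
    FROM dbo.Account AS A
    INNER JOIN DescendToChild AS D ON A.ParentID = D.ID
)
SELECT SUM(Value) AS LeafTotal
FROM DescendToChild
WHERE HasChild = 0;   -- count only leaves, so parent subtotals are not added twice
```

For the tree in the question, the leaves under 1.1 are 1.1.1, 1.1.2.1, 1.1.2.2 and 1.1.3, so the total should be 1 + 4 + 3 + 5 = 13, which matches the value shown on 1.1 itself.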

http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/959fe835-e43d-4995-882c-910f3aa0ff68/