CASE WHEN GROUP BY Returning Duplicate values

I am using SQL Server 2012 and am trying to use a CASE WHEN expression with GROUP BY. The results contain duplicate values in the ProdType column. I know this has to do with my GROUP BY clause. My query is:
SELECT
CASE WHEN dbo.tblPayCode.PayCode IN (10,11,12,13,14,15,16,20,21,22,23,24,25,26,30,31,32,33,34,35,36) THEN 'WORK'
WHEN dbo.tblPayCode.PayCode > 39 AND dbo.tblPayCode.PayCode NOT IN (40,43,57,58,59,67,68,72,75,78,79) THEN 'SVLA'
END AS ProdType,
dbo.tblJobsWorked.WrkDate, SUM(dbo.tblJobsWorked.Hours) AS TotalHours
FROM dbo.tblEmployees INNER JOIN
dbo.tblJobsWorked ON dbo.tblEmployees.EMP_NUMB = dbo.tblJobsWorked.EMP_NUMB INNER JOIN
dbo.tblPayCode ON dbo.tblJobsWorked.PayCode = dbo.tblPayCode.PayCode INNER JOIN
dbo.tblCostCenters ON dbo.tblEmployees.CC_ORGN_NUMB = dbo.tblCostCenters.CC_C_NB AND
dbo.tblEmployees.ORGN_DEPT_TYP_C = dbo.tblCostCenters.DEPT_TYP_C AND dbo.tblJobsWorked.CC_RSPB_NUMB = dbo.tblCostCenters.CC_C_NB AND
dbo.tblJobsWorked.RSPB_DEPT_TYP_C = dbo.tblCostCenters.DEPT_TYP_C
GROUP BY
dbo.tblPayCode.PayCode, dbo.tblJobsWorked.WrkDate
HAVING dbo.tblJobsWorked.WrkDate>'2013-04-30'
ORDER BY dbo.tblJobsWorked.WrkDate
My results are
ProdType WrkDate TotalHours
WORK 2013-05-01 00:00:00.000 58.70
WORK 2013-05-01 00:00:00.000 5.20
SVLA 2013-05-01 00:00:00.000 8.00
SVLA 2013-05-01 00:00:00.000 8.00
WORK 2013-05-01 00:00:00.000 68.00
WORK 2013-05-01 00:00:00.000 825.40
WORK 2013-05-01 00:00:00.000 8.90
SVLA 2013-05-01 00:00:00.000 21.00
SVLA 2013-05-01 00:00:00.000 8.00
SVLA 2013-05-01 00:00:00.000 8.00
WORK 2013-05-01 00:00:00.000 5.30
SVLA 2013-05-01 00:00:00.000 53.00
SVLA 2013-05-01 00:00:00.000 8.60
I expect to see two rows for 5/1/13, one for 'WORK' and one for 'SVLA', each with its total hours. Your help is much appreciated! Thanks

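Move the HAVING condition into a WHERE clause, so that rows are filtered before they are grouped:
WHERE dbo.tblJobsWorked.WrkDate > '2013-04-30'
Then compute ProdType in a derived table and group by it, rather than by PayCode, in the outer query: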
SELECT ProdType, WrkDate, SUM(Hours) AS TotalHours FROM (
SELECT
CASE WHEN dbo.tblPayCode.PayCode IN (10,11,12,13,14,15,16,20,21,22,23,24,25,26,30,31,32,33,34,35,36) THEN 'WORK'
WHEN dbo.tblPayCode.PayCode > 39 AND dbo.tblPayCode.PayCode NOT IN (40,43,57,58,59,67,68,72,75,78,79) THEN 'SVLA'
END AS ProdType,
dbo.tblJobsWorked.WrkDate,
dbo.tblJobsWorked.Hours
FROM dbo.tblEmployees INNER JOIN
dbo.tblJobsWorked ON dbo.tblEmployees.EMP_NUMB = dbo.tblJobsWorked.EMP_NUMB INNER JOIN
dbo.tblPayCode ON dbo.tblJobsWorked.PayCode = dbo.tblPayCode.PayCode INNER JOIN
dbo.tblCostCenters ON dbo.tblEmployees.CC_ORGN_NUMB = dbo.tblCostCenters.CC_C_NB AND
dbo.tblEmployees.ORGN_DEPT_TYP_C = dbo.tblCostCenters.DEPT_TYP_C AND dbo.tblJobsWorked.CC_RSPB_NUMB = dbo.tblCostCenters.CC_C_NB AND
dbo.tblJobsWorked.RSPB_DEPT_TYP_C = dbo.tblCostCenters.DEPT_TYP_C
WHERE dbo.tblJobsWorked.WrkDate > '2013-04-30'
) xx -- no ORDER BY inside the derived table: SQL Server rejects it there unless TOP is used
GROUP BY ProdType, WrkDate
ORDER BY WrkDate;

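SQL Server logically evaluates the GROUP BY clause before the SELECT list. Because your query groups by PayCode and WrkDate, it produces one row per pay code per day; the CASE expression is applied afterwards and only relabels each of those rows, so several rows end up with the same ProdType. If you want these rows grouped together, group by the same CASE expression (together with WrkDate) instead of by PayCode.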
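The difference between grouping by PayCode and grouping by the CASE expression can be reproduced with a minimal sketch in Python, using the built-in sqlite3 module as a stand-in for SQL Server. The JobsWorked table and its rows are hypothetical, and the joins to the employee and cost-center tables are left out. Note also that the question's CASE has no ELSE branch, so any PayCode matching neither condition gets a NULL ProdType and forms its own group.

```python
import sqlite3

# Hypothetical sample rows (PayCode, WrkDate, Hours); not the asker's data.
ROWS = [
    (10, "2013-05-01", 8.0),
    (11, "2013-05-01", 4.5),
    (41, "2013-05-01", 8.0),
    (44, "2013-05-01", 2.0),
    (12, "2013-04-30", 6.0),  # removed by the WHERE filter
]

# The question's CASE expression, reused in both SELECT and GROUP BY.
PROD_TYPE = """
CASE WHEN PayCode IN (10,11,12,13,14,15,16,20,21,22,23,24,25,26,30,31,32,33,34,35,36) THEN 'WORK'
     WHEN PayCode > 39 AND PayCode NOT IN (40,43,57,58,59,67,68,72,75,78,79) THEN 'SVLA'
END"""


def totals(group_by_paycode):
    """Sum hours per day, grouping either by PayCode or by the CASE result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE JobsWorked (PayCode INTEGER, WrkDate TEXT, Hours REAL)")
    con.executemany("INSERT INTO JobsWorked VALUES (?, ?, ?)", ROWS)
    group_key = "PayCode" if group_by_paycode else PROD_TYPE
    sql = f"""
        SELECT {PROD_TYPE} AS ProdType, WrkDate, SUM(Hours) AS TotalHours
        FROM JobsWorked
        WHERE WrkDate > '2013-04-30'          -- filter rows before grouping
        GROUP BY {group_key}, WrkDate
        ORDER BY WrkDate, ProdType"""
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


print(totals(group_by_paycode=True))   # one row per pay code: duplicates
print(totals(group_by_paycode=False))  # one WORK row and one SVLA row
```

Grouping by PayCode returns one row per pay code (four rows here), while grouping by the CASE expression collapses them into a single WORK total and a single SVLA total, which is what the derived-table query above also achieves.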

Related

Misaligned data and duplicate keys using Deedle? F#

I have data that has a reference date and a publish date, similar to economic reports, which are published on different dates than the dates they reference (e.g. Q4 GDP for 2014 references the date 12/31/2014 but is published the following week, on 01/07/2015). Multiple reference-date values can be published on a single publish date. I want to add together data sets that have a similar structure but misaligned and duplicate reference and publish dates.
Below is a sample of the data for Item A:
Publish_ItemA Reference_ItemA Value_ItemA
2002-01-10 00:00:00.000 2001-09-30 00:00:00.000 83
2002-02-14 00:00:00.000 2001-12-31 00:00:00.000 48
2002-05-23 00:00:00.000 2002-03-31 00:00:00.000 57
2002-08-15 00:00:00.000 2002-06-30 00:00:00.000 41
2002-12-31 00:00:00.000 2002-09-30 00:00:00.000 18
2003-02-13 00:00:00.000 2002-12-31 00:00:00.000 18
2003-05-22 00:00:00.000 2003-03-31 00:00:00.000 29
2003-08-21 00:00:00.000 2003-06-30 00:00:00.000 40
2003-12-31 00:00:00.000 2003-09-30 00:00:00.000 51
2004-12-16 00:00:00.000 2002-12-31 00:00:00.000 17
2004-12-16 00:00:00.000 2003-03-31 00:00:00.000 28
2004-12-16 00:00:00.000 2003-06-30 00:00:00.000 33
2004-12-16 00:00:00.000 2003-09-30 00:00:00.000 60
2004-12-16 00:00:00.000 2003-12-31 00:00:00.000 107
Below is a sample of the data for Item B:
Publish_ItemB Reference_ItemB Value_ItemB
2001-01-25 00:00:00.000 2000-12-31 00:00:00.000 -207
2001-04-25 00:00:00.000 2000-12-31 00:00:00.000 -195
2001-04-25 00:00:00.000 2001-03-31 00:00:00.000 43
2001-07-19 00:00:00.000 2001-06-30 00:00:00.000 61
2001-10-18 00:00:00.000 2001-09-30 00:00:00.000 66
2002-01-17 00:00:00.000 2001-12-31 00:00:00.000 38
2002-04-24 00:00:00.000 2002-03-31 00:00:00.000 40
2002-07-18 00:00:00.000 2002-06-30 00:00:00.000 32
2002-10-17 00:00:00.000 2002-09-30 00:00:00.000 -45
2003-01-16 00:00:00.000 2002-12-31 00:00:00.000 -8
2003-04-24 00:00:00.000 2003-03-31 00:00:00.000 14
2003-07-17 00:00:00.000 2003-06-30 00:00:00.000 19
2003-10-23 00:00:00.000 2003-09-30 00:00:00.000 44
2004-01-22 00:00:00.000 2003-12-31 00:00:00.000 63
I would like to be able to do alignments and arithmetic with columns of values (e.g. itemAframe?Value_ItemA + itemBframe?Value_ItemB) and return a series indexed by either the reference date or the publish date, depending on which is required.
Aligning on the reference date is easy because those dates do not overlap, so there are no duplicate keys, but returning the frame indexed by publish date is problematic because not all keys are unique.
Any suggestion would be much appreciated.
Thanks!

The answer depends on what you want to do when there are multiple values for a given (duplicate) publish day. Will there be the same number of keys in both of the frames? Do you have some way of aggregating the values (e.g. take average or sum them)?
For example, let's say that Publish and Reference are just integers:
let f =
  frame [ "Publish"   => Series.ofValues [ 1; 1; 2; 2 ]
          "Reference" => Series.ofValues [ 1; 2; 3; 4 ]
          "Value"     => Series.ofValues [ 10; 9; 11; 8 ] ]
You can get a frame with a multi-level index, grouped by publish day, like this:
f
|> Frame.groupRowsByInt "Publish"
Now your keys will be tuples: the first element is the "Publish" value and the second is the original row index (here, just an ordinal index, but you could also use the "Reference" date as the secondary part of the index). If you have some way of making the keys match at this point (e.g. there is the same number of duplicates in both frames and ordinal indexing is good enough), then you can just use the frames as they are now.
Next, you can turn this into a series of frames, one frame per group:
f
|> Frame.groupRowsByInt "Publish"
|> Frame.nest
So, for example, if you wanted to get an average value for each Publish day, you could do:
f
|> Frame.groupRowsByInt "Publish"
|> Frame.nest
|> Series.mapValues (fun df -> df?Value |> Stats.mean)
Alternatively, you can create a series that has a list of values for each "Publish" date, but this will make further calculations harder:
f
|> Frame.groupRowsByInt "Publish"
|> Frame.nest
|> Series.mapValues (fun df -> df?Value.Values |> List.ofSeq)
Fundamentally, you need some indexing scheme that will uniquely identify rows in both of the frames, so that you can align them. The key could be just the "Publish" date or the "Publish" date together with something else.
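The "aggregate the duplicates, then align on the now-unique key" idea can be illustrated outside Deedle. This is a minimal Python sketch using plain dictionaries, not Deedle's API; `aggregate_by_key` and `add_aligned` are hypothetical helpers, the second item's values are made up, and keeping only keys present in both inputs is an assumption (you might instead prefer to treat a missing key as missing or zero).

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_key(rows, agg=sum):
    """Collapse (key, value) pairs into one value per key using `agg`."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)  # duplicate keys share one bucket
    return {key: agg(values) for key, values in groups.items()}


def add_aligned(left, right):
    """Add two key -> value maps, keeping only keys present in both."""
    shared = sorted(left.keys() & right.keys())
    return {key: left[key] + right[key] for key in shared}


# The answer's toy frame as (Publish, Value) pairs; Publish 1 and 2 each repeat.
item_a = [(1, 10), (1, 9), (2, 11), (2, 8)]
# A second hypothetical item with its own duplicated publish keys.
item_b = [(1, 1), (2, 2), (2, 3), (3, 7)]

a_sum = aggregate_by_key(item_a)         # {1: 19, 2: 19}
a_mean = aggregate_by_key(item_a, mean)  # {1: 9.5, 2: 9.5}
b_sum = aggregate_by_key(item_b)         # {1: 1, 2: 5, 3: 7}
combined = add_aligned(a_sum, b_sum)     # {1: 20, 2: 24}
```

Which aggregation to use (sum, mean, latest value) is exactly the domain decision the answer asks about; the alignment step only works once each key appears once per input.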

SQL Exclude Field from GROUP BY in results but use in WHERE

Pretty simple table:
CREATE TABLE [dbo].[Recognitions](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Submitter_CH_id] [int] NULL,
[Submitter_Last_Name] [varchar](50) NULL,
[Submit_Date] [datetime] NULL,
Submitter_CH_id Submitter_Last_Name Submit_Date
50 Prokupek 2014-04-01 00:00:00.000
50 Prokupek 2014-04-07 00:00:00.000
50 Prokupek 2014-04-01 00:00:00.000
50 Prokupek 2014-04-07 00:00:00.000
215 Conklin 2014-04-07 00:00:00.000
215 Conklin 2014-04-07 00:00:00.000
130 Catron 2014-04-07 00:00:00.000
136 Jardee 2014-04-07 00:00:00.000
247 Emken 2014-04-07 00:00:00.000
What I need to do is get a count of all the submissions made within a certain date range, grouped by Submitter_CH_id. My app allows the user to enter the date range, so it needs to be part of the query results for my app to use it.
I need the results to be grouped by Submitter_CH_id. So something like this:
SELECT TOP (100) PERCENT Submitter_CH_id, Submitter_First_Name, Submitter_Last_Name, Submitter_Email, Submitter_Department,
Submit_Date AS [Last Submit], COUNT(Submitter_CH_id) AS [Total Submit]
FROM dbo.Recognitions
GROUP BY Submitter_First_Name, Submitter_Last_Name, Submitter_Email, Submitter_Department, Submitter_CH_id, Submit_Date
ORDER BY Submitter_CH_id
What I would like is the following:
Submitter_CH_ID Submitter_Last_Name Total Submissions
50 Prokupek 4
215 Conklin 2
130 Catron 1
... but because I also have to include Submit_Date in my GROUP BY the results instead show the count per ID per unique date (which it has to of course), so I get something like this:
Submitter_CH_ID Submitter_Last_Name Total Submissions
50 Prokupek 2
50 Prokupek 2
215 Conklin 1
215 Conklin 1
130 Catron 1
Any thoughts? This is MS SQL 2008. Thanks very much.

Use a subquery, like this:
select Submitter_CH_ID, Submitter_Last_Name, count(ID) AS [Total Submissions]
from (
select ID, Submitter_CH_ID, Submitter_Last_Name
from dbo.Recognitions
where Submit_Date >= @start_date and Submit_Date <= @end_date
) T
GROUP BY Submitter_CH_ID, Submitter_Last_Name
yay sub-queries!

I am guessing that you think you need Submit_Date in the GROUP BY clause because you include it in the SELECT list, so that your app can filter on it in the returned results. If that's correct, you can remove the field from both the SELECT and GROUP BY lists and filter inside the query instead:
SELECT
Submitter_CH_ID, Submitter_Last_Name, COUNT(*) AS Submissions
FROM
Recognitions
WHERE
Submit_Date BETWEEN @StartDate AND @EndDate
GROUP BY
Submitter_CH_ID, Submitter_Last_Name
ORDER BY
Submitter_CH_ID
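A quick way to check the filter-then-group approach is to run it against the sample rows from the question. This sketch uses Python's built-in sqlite3 module as a stand-in for SQL Server, keeps only the three columns shown in the sample, stores dates as ISO text, and passes the date range as query parameters in place of @StartDate and @EndDate.

```python
import sqlite3

# Sample rows from the question: (Submitter_CH_id, Submitter_Last_Name, Submit_Date).
ROWS = [
    (50, "Prokupek", "2014-04-01"),
    (50, "Prokupek", "2014-04-07"),
    (50, "Prokupek", "2014-04-01"),
    (50, "Prokupek", "2014-04-07"),
    (215, "Conklin", "2014-04-07"),
    (215, "Conklin", "2014-04-07"),
    (130, "Catron", "2014-04-07"),
    (136, "Jardee", "2014-04-07"),
    (247, "Emken", "2014-04-07"),
]


def submission_counts(start_date, end_date):
    """Count submissions per submitter inside an inclusive date range."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Recognitions "
        "(Submitter_CH_id INTEGER, Submitter_Last_Name TEXT, Submit_Date TEXT)"
    )
    con.executemany("INSERT INTO Recognitions VALUES (?, ?, ?)", ROWS)
    rows = con.execute(
        """SELECT Submitter_CH_id, Submitter_Last_Name, COUNT(*) AS Submissions
           FROM Recognitions
           WHERE Submit_Date BETWEEN ? AND ?               -- filter first...
           GROUP BY Submitter_CH_id, Submitter_Last_Name   -- ...then group without the date
           ORDER BY Submitter_CH_id""",
        (start_date, end_date),
    ).fetchall()
    con.close()
    return rows


print(submission_counts("2014-04-01", "2014-04-07"))
```

Because all sample dates are at midnight, BETWEEN's inclusive upper bound is harmless here. With SQL Server datetime values that carry a time of day, a half-open range such as Submit_Date >= @StartDate AND Submit_Date < DATEADD(day, 1, @EndDate) avoids silently dropping rows from the last day.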

Creating row in SQL query to fill missing dates

I've only been playing around with SQL for a few months so please go easy.
I currently have the following query, which returns something like this:
SELECT WorkItems.workDate
, SUM(WorkItems.hours)as [hours]
, WorkItems.techID
FROM Jobs
INNER JOIN WorkItems
ON Jobs.jobID = WorkItems.jobID
INNER JOIN Techs
ON Jobs.techID = Techs.techID
WHERE (WorkItems.workDate BETWEEN @startdate AND @enddate)
AND (WorkItems.techID = 41)
GROUP BY WorkItems.workDate
, WorkItems.techID
Returns:
2013-06-03 00:00:00.000 7.00 41
2013-06-05 00:00:00.000 7.00 41
2013-06-06 00:00:00.000 7.50 41
2013-06-07 00:00:00.000 1.00 41
I'm trying to fill the date gap left by the query to get something like this:
2013-06-03 00:00:00.000 7.00 41
2013-06-04 00:00:00.000 0 41
2013-06-05 00:00:00.000 7.00 41
2013-06-06 00:00:00.000 7.50 41
2013-06-07 00:00:00.000 1.00 41
I've tried adding the missing dates by generating every date within a defined range, but now I'm getting duplicates.
Declare @t table(Date datetime, hoursWorked int, tech int)
Declare @startdate datetime
Declare @enddate datetime
declare @techID int
Set @startdate='06/02/2013'
Set @enddate='06/07/2013'
Set @techID='41'
Select dateadd(day,number,@startdate) as workDate, 0 as [hours], @techID from master.dbo.spt_values
where master.dbo.spt_values.type='p' and dateadd(day,number,@startdate)<=@enddate
and dateadd(day,number,@startdate) not in
(select date from @t)
UNION
SELECT WorkItems.workDate, SUM(WorkItems.hours)as [hours], WorkItems.techID
FROM Jobs INNER JOIN
WorkItems ON Jobs.jobID = WorkItems.jobID INNER JOIN
Techs ON Jobs.techID = Techs.techID
WHERE (WorkItems.workDate BETWEEN @startdate AND @enddate) AND (WorkItems.techID = 41)
GROUP BY WorkItems.workDate, WorkItems.techID
Returns:
2013-06-02 00:00:00.000 0.00 41
2013-06-03 00:00:00.000 0.00 41
2013-06-04 00:00:00.000 0.00 41
2013-06-04 00:00:00.000 7.00 41
2013-06-05 00:00:00.000 0.00 41
2013-06-05 00:00:00.000 7.00 41
2013-06-06 00:00:00.000 0.00 41
2013-06-06 00:00:00.000 7.50 41
2013-06-07 00:00:00.000 0.00 41
2013-06-07 00:00:00.000 1.00 41
Sorry if this query is cringeworthy, but I'm just trying to get my head around SQL.
Thanks.
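The duplicates come from the UNION: it removes only rows that are identical in every column, so (date, 0, 41) and (date, 7.00, 41) both survive. The NOT IN filter does not help either, because @t is never populated and therefore excludes nothing. A common alternative is to generate one row per calendar day and LEFT JOIN the aggregated hours onto it, turning missing totals into 0. The sketch below shows that pattern with Python's built-in sqlite3 module standing in for SQL Server; it simplifies the schema to a single hypothetical WorkItems table (no Jobs or Techs joins) and generates the dates with a recursive CTE rather than spt_values.

```python
import sqlite3

# Hypothetical per-day totals for tech 41, mirroring the question's first result set.
WORK_ITEMS = [
    ("2013-06-03", 41, 7.0),
    ("2013-06-05", 41, 7.0),
    ("2013-06-06", 41, 7.5),
    ("2013-06-07", 41, 1.0),
]


def hours_per_day(start_date, end_date, tech_id):
    """Return (workDate, hours, techID) for every day in the range, 0 when idle."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE WorkItems (workDate TEXT, techID INTEGER, hours REAL)")
    con.executemany("INSERT INTO WorkItems VALUES (?, ?, ?)", WORK_ITEMS)
    rows = con.execute(
        """
        WITH RECURSIVE Dates(workDate) AS (        -- one row per calendar day
            SELECT date(:start)
            UNION ALL
            SELECT date(workDate, '+1 day') FROM Dates
            WHERE workDate < date(:end)
        ),
        Totals AS (                                  -- aggregate first: one row per worked day
            SELECT workDate, SUM(hours) AS hours
            FROM WorkItems
            WHERE techID = :tech AND workDate BETWEEN :start AND :end
            GROUP BY workDate
        )
        SELECT d.workDate, COALESCE(t.hours, 0.0) AS hours, :tech AS techID
        FROM Dates AS d
        LEFT JOIN Totals AS t ON t.workDate = d.workDate   -- idle days get NULL, shown as 0
        ORDER BY d.workDate
        """,
        {"start": start_date, "end": end_date, "tech": tech_id},
    ).fetchall()
    con.close()
    return rows


for row in hours_per_day("2013-06-02", "2013-06-07", 41):
    print(row)
```

The same LEFT JOIN shape carries over to T-SQL (with ISNULL or COALESCE for the zero); only the date-generation syntax differs.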

t-sql multiplying 2 tables plus join between 2 dates which comes from asp.net

I have the two tables below:
Table1
Plant
-----
TRP1
DEP1
Table2
Config
------
84ROC20
100ROC20
and two text boxes:
1. Start date (datetime): 2012-08-01 00:00:00.000
2. End date (datetime): 2012-10-01 00:00:00.000
I want the following three-column table as the result:
Plant Config Time
----- ------ -------
TRP1 84ROC20 2012-08-01 00:00:00.000
TRP1 84ROC20 2012-09-01 00:00:00.000
TRP1 84ROC20 2012-10-01 00:00:00.000
DEP1 84ROC20 2012-08-01 00:00:00.000
DEP1 84ROC20 2012-09-01 00:00:00.000
DEP1 84ROC20 2012-10-01 00:00:00.000
TRP1 100ROC20 2012-08-01 00:00:00.000
TRP1 100ROC20 2012-09-01 00:00:00.000
TRP1 100ROC20 2012-10-01 00:00:00.000
DEP1 100ROC20 2012-08-01 00:00:00.000
DEP1 100ROC20 2012-09-01 00:00:00.000
DEP1 100ROC20 2012-10-01 00:00:00.000
Can you please help me produce this table?

I'm assuming SQL Server 2005+ (using a recursive CTE for convenience), but this should help you out:
-- Get user data
declare @StartDate datetime, @EndDate datetime
set @StartDate = '2012-08-01' -- separate SET: inline DECLARE initialization needs SQL Server 2008+
set @EndDate = '2012-10-01'
-- Actual query
;with Dates as ( -- Build a date table based upon the user values
select @StartDate as DateEntry -- Start at StartDate
union all
select dateadd(m, 1, DateEntry) -- Recursively add a month
from Dates
where dateadd(m, 1, DateEntry) <= @EndDate -- Until the EndDate is reached
)
select *
from Plant, Config, Dates -- Cross-join all tables to get all possibilities
order by Config desc, Plant desc, DateEntry
This will give the output below for your test data:
Plant Config DateEntry
---------- ---------- -----------------------
TRP1 84ROC20 2012-08-01 00:00:00.000
TRP1 84ROC20 2012-09-01 00:00:00.000
TRP1 84ROC20 2012-10-01 00:00:00.000
DEP1 84ROC20 2012-08-01 00:00:00.000
DEP1 84ROC20 2012-09-01 00:00:00.000
DEP1 84ROC20 2012-10-01 00:00:00.000
TRP1 100ROC20 2012-08-01 00:00:00.000
TRP1 100ROC20 2012-09-01 00:00:00.000
TRP1 100ROC20 2012-10-01 00:00:00.000
DEP1 100ROC20 2012-08-01 00:00:00.000
DEP1 100ROC20 2012-09-01 00:00:00.000
DEP1 100ROC20 2012-10-01 00:00:00.000
Essentially, the trick here is to build the Dates table on the fly and then cross-join that with Plant and Config. You could build the Dates table in a variety of other ways, such as with a tally table, a cursor, a while-loop, in asp.net itself, etc. I like the ease of a recursive CTE here, though I'm assuming only a small number of dates need to be generated. Because the default MAXRECURSION limit is 100, a range that needs more than 100 recursive steps requires adding OPTION (MAXRECURSION n) to the query, or another method entirely if performance is a problem.
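The recursive-CTE-plus-cross-join approach can be tried out with Python's built-in sqlite3 module as a stand-in for SQL Server. In this sketch SQLite's date(x, '+1 month') plays the role of DATEADD(m, 1, x), an explicit CROSS JOIN replaces the comma-separated FROM list, and the two one-column tables hold the question's sample values.

```python
import sqlite3


def plant_config_months(start_date, end_date):
    """Every (Plant, Config, month) combination between two dates, inclusive."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Plant (Plant TEXT)")
    con.execute("CREATE TABLE Config (Config TEXT)")
    con.executemany("INSERT INTO Plant VALUES (?)", [("TRP1",), ("DEP1",)])
    con.executemany("INSERT INTO Config VALUES (?)", [("84ROC20",), ("100ROC20",)])
    rows = con.execute(
        """
        WITH RECURSIVE Dates(DateEntry) AS (
            SELECT date(:start)                               -- start at StartDate
            UNION ALL
            SELECT date(DateEntry, '+1 month')                -- recursively add a month
            FROM Dates
            WHERE date(DateEntry, '+1 month') <= date(:end)   -- until EndDate is reached
        )
        SELECT Plant, Config, DateEntry
        FROM Plant CROSS JOIN Config CROSS JOIN Dates          -- every combination
        ORDER BY Config DESC, Plant DESC, DateEntry
        """,
        {"start": start_date, "end": end_date},
    ).fetchall()
    con.close()
    return rows


for row in plant_config_months("2012-08-01", "2012-10-01"):
    print(row)
```

The ordering matches the answer's output because Config is compared as text, so '84ROC20' sorts after '100ROC20' and appears first under DESC.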

Select latest date

SELECT
distinct
HRM_Employee.EmployeeId EmployeeXId,
([HRM_Employee].[FirstName] +' '+ISNULL([HRM_Employee].[MiddleName],' ')+' '+ISNULL([HRM_Employee].[LastName],' ')) AS FirstName
-- ,[FirstName]
,[HRM_Employee].[MiddleName]
,[HRM_Employee].[LastName]
,[HRM_Employee].[Code]
,[HRM_Employee].[UserName]
,[HRM_Employee].[Password]
,[HRM_Employee].[DateOfBirth]
,[HRM_Employee].[OriginalBirthDate]
,[HRM_Employee].[Gender]
,[HRM_Employee].[BloodGroup]
,[HRM_Employee].[Height]
,[HRM_Employee].[MaritalStatus]
,[HRM_Employee].[DateOfMarriage]
,[HRM_Employee].[IdentificationMark1]
,[HRM_Employee].[IdentificationMark2]
,[HRM_Employee].[Religion]
,(SELECT [A].[FirstName] +' '+ [A].[MiddleName] +' '+ [A].[LastName]
FROM [dbo].[HRM_Employee] [A] WHERE [A].EmployeeId = [HRM_Transfer].[ReportingOfficerXId]
) [PersonInCharge]
,[HRM_Department].[Name] [DepartmentName]
,[HRM_Branch].[Name] [BranchName]
,[HRM_Division].[Name] [DivisionName]
,[HRM_Designation].[Name] [DesignationName]
,HRM_Transfer.TransferDate
from HRM_Employee
LEFT join [dbo].[HRM_Division]
ON [HRM_Employee].DivisionXId = [HRM_Division].DivisionId
JOIN [dbo].[HRM_Designation]
ON [HRM_Employee].DesignationXId = [HRM_Designation].DesignationId
JOIN [HRM_Department]
ON [HRM_Employee].[DepartmentXId] = [HRM_Department].[DepartmentId]
JOIN [HRM_Branch]
ON [HRM_Employee].[BranchXId] = [HRM_Branch].[BranchId]
INNER JOIN HRM_Transfer
ON HRM_Transfer.EmployeeXId=HRM_Employee.EmployeeId
WHERE
HRM_Transfer.TransferDate < DATEADD(day, DATEDIFF(day, 0, GETDATE()) + 1, 0) -- on or before today; style-103 (dd/mm/yyyy) strings do not compare chronologically
END
When I execute this, I get the following output:
EmployeeXId FirstName TransferDate
34 Ambarish V 2012-08-09 00:00:00.000
54 Anil N P 2012-08-09 00:00:00.000
55 Ann Rose Abraham 2012-08-08 00:00:00.000
55 Ann Rose Abraham 2012-08-09 00:00:00.000
74 Anees M S 2012-08-09 00:00:00.000
From this I want to display only the row with the latest transfer date for each employee. For example, for EmployeeXId 55 I need only the row with TransferDate 2012-08-09 00:00:00.000. What change do I need to make to the SP above to get this result? Please help me solve this.

Group by EmployeeXId and select MAX(TransferDate). Every other non-aggregated column in the SELECT list must then also appear in the GROUP BY clause.
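A minimal sketch of the GROUP BY / MAX approach, run against the five rows from the question's output. It uses Python's built-in sqlite3 module as a stand-in for SQL Server and a hypothetical Transfers table holding only the three columns shown, rather than the full set of joins in the procedure.

```python
import sqlite3

# Rows from the question's output: (EmployeeXId, FirstName, TransferDate).
TRANSFERS = [
    (34, "Ambarish V", "2012-08-09"),
    (54, "Anil N P", "2012-08-09"),
    (55, "Ann Rose Abraham", "2012-08-08"),
    (55, "Ann Rose Abraham", "2012-08-09"),
    (74, "Anees M S", "2012-08-09"),
]


def latest_transfers():
    """One row per employee, carrying that employee's most recent TransferDate."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Transfers (EmployeeXId INTEGER, FirstName TEXT, TransferDate TEXT)")
    con.executemany("INSERT INTO Transfers VALUES (?, ?, ?)", TRANSFERS)
    rows = con.execute(
        """SELECT EmployeeXId, FirstName, MAX(TransferDate) AS TransferDate
           FROM Transfers
           GROUP BY EmployeeXId, FirstName   -- every non-aggregated column is grouped
           ORDER BY EmployeeXId"""
    ).fetchall()
    con.close()
    return rows


for row in latest_transfers():
    print(row)
```

If the SELECT list also needs columns that belong to the latest transfer itself (such as the reporting officer), grouping alone is not enough; one option is to join HRM_Transfer back on EmployeeXId and the computed MAX(TransferDate).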
