DISTINCT still gives me duplicate records in results - tsql

I am getting double results for every part...so I'm obviously not using Distinct right here or need to use grouping?
example:
select DISTINCT p.PartNum,
p.PartID,
pn.Name,
d.[Description],
n.Note as PartNote
from Part p
join PartName pn on pn.PartNameID = p.PartNameID
join ApplicationPaint ap on ap.partID = p.PartID
join [Application] a on a.ApplicationID = ap.ApplicationID
join [Description] d on d.DescriptionID = ap.DescriptionID
join Note n on n.NoteID = a.NoteID
join MYConfig mmy on mmy.MMYConfigID = a.MYConfigID
join Model mo on mo.ModelID = mmy.ModelID
where mmy.ModelId = 2673
and substring(n.Note, CHARINDEX(']', n.Note) + 2, LEN(n.Note))= 'Johnson'
results:
T50015 765963 Some Part Name SomeNoteA [342] Johnson
T50015 765963 Some Part Name SomeNoteA [343] Johnson
T60024 766068 Some Part Name SomeNoteB [342] Johnson
T60024 766068 Some Part Name SomeNoteB [343] Johnson
T60231 766093 Some Part Name SomeNoteA [342] Johnson
T60231 766093 Some Part Name SomeNoteA [343] Johnson
T60232 766094 Some Part Name SomeNoteA [342] Johnson
T60232 766094 Some Part Name SomeNoteA [343] Johnson
T70134 766150 Some Part Name SomeNoteA [342] Johnson
T70134 766150 Some Part Name SomeNoteA [343] Johnson
T70230 766153 Some Part Name SomeNoteC [342] Johnson
T70230 766153 Some Part Name SomeNoteC [342] Johnson
T70230 766153 Some Part Name SomeNoteC [343] Johnson
Y50078 766253 Some Part Name SomeNoteH [342] Johnson
N30026 766352 Some Part Name SomeNoteT [342] Johnson
N30026 766352 Some Part Name SomeNoteT [343] Johnson
N50041 766465 Some Part Name SomeNoteK [342] Johnson
N50041 766465 Some Part Name SomeNoteK [343] Johnson
N60176 766499 Some Part Name SomeNoteX [342] Johnson
N60176 766499 Some Part Name SomeNoteX [343] Johnson
N60750 766503 Some Part Name SomeNoteU [342] Johnson
N60750 766503 Some Part Name SomeNoteU [343] Johnson
so I'm getting dups even triples on every PartNumber
T70230 766153 Some Part Name SomeNoteC [342] Johnson
T70230 766153 Some Part Name SomeNoteC [342] Johnson
T70230 766153 Some Part Name SomeNoteC [343] Johnson
T50015 765963 Some Part Name SomeNoteA [342] Johnson
T50015 765963 Some Part Name SomeNoteA [343] Johnson
so what I want to see is this:
T50015 765963 Some Part Name SomeNoteA [342] Johnson
T60024 766068 Some Part Name SomeNoteB [342] Johnson
T60231 766093 Some Part Name SomeNoteA [342] Johnson
T60232 766094 Some Part Name SomeNoteA [342] Johnson
T70134 766150 Some Part Name SomeNoteA [342] Johnson
T70230 766153 Some Part Name SomeNoteC [342] Johnson
Y50078 766253 Some Part Name SomeNoteH [342] Johnson
N30026 766352 Some Part Name SomeNoteT [342] Johnson
N50041 766465 Some Part Name SomeNoteK [342] Johnson
N60176 766499 Some Part Name SomeNoteX [342] Johnson
N60750 766503 Some Part Name SomeNoteU [342] Johnson
So I want only one unique row for each Unique part number, not dup part number rows showing here.
So to put it in other words for example I want this (one row only for a partID):
T70230 766153 Some Part Name SomeNoteC [342] Johnson
vs. dups:
T70230 766153 Some Part Name SomeNoteC [342] Johnson
T70230 766153 Some Part Name SomeNoteC [342] Johnson
T70230 766153 Some Part Name SomeNoteC [343] Johnson

You've omitted part names and notes from your example, but I believe DISTINCT means that it should omit rows from the results where all of the columns you've specified are duplicated, not any.
So since you've specified p.PartNum, p.PartID, pn.Name, d.[Description], and n.Note, only rows where all of those values are duplicated will be removed.
For example, you've said your results included:
T70230 766153 Some Part Name SomeNoteC [342] Johnson
T70230 766153 Some Part Name SomeNoteC [342] Johnson
T70230 766153 Some Part Name SomeNoteC [343] Johnson
If those rows were really:
T70230 766153 CoolWidget1 "So much fun!" [342] Johnson
T70230 766153 CoolWidget1 "Buy one today!" [342] Johnson
T70230 766153 CoolWidget2 "Buy one today!" [343] Johnson
Then all three rows will remain, as none have the exact same values for all five of the column names you've listed for the DISTINCT operator.
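A small sketch of the behaviour described above, using Python's built-in sqlite3 as a stand-in for SQL Server (the table and column names here are made up for illustration). DISTINCT compares every selected column, so two rows that differ only in the note both survive:

```python
import sqlite3

# Hypothetical, simplified result set: two identical rows plus one whose note differs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (part_num TEXT, name TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?)",
    [
        ("T70230", "CoolWidget1", "[342] Johnson"),
        ("T70230", "CoolWidget1", "[342] Johnson"),
        ("T70230", "CoolWidget1", "[343] Johnson"),
    ],
)

# DISTINCT over all three columns only collapses the two fully identical rows.
all_columns = conn.execute(
    "SELECT DISTINCT part_num, name, note FROM result ORDER BY note"
).fetchall()

# Leaving the note out of the select list collapses everything to one row per part.
without_note = conn.execute(
    "SELECT DISTINCT part_num, name FROM result"
).fetchall()
conn.close()

print(all_columns)
print(without_note)
```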

What's happening is that your joins return more than one row per part, and those rows differ in at least one selected column. Remember, DISTINCT applies to the entire select list, so it only removes rows that match on every selected column (read up on how DISTINCT is evaluated here: http://weblogs.sqlteam.com/jeffs/archive/2007/12/13/select-distinct-order-by-error.aspx).
There are a number of solutions for this, from:
SELECT DISTINCT * FROM (
select p.PartNum,
p.PartID,
pn.Name,
d.[Description],
n.Note as PartNote
from Part p
join PartName pn on pn.PartNameID = p.PartNameID
join ApplicationPaint ap on ap.partID = p.PartID
join [Application] a on a.ApplicationID = ap.ApplicationID
join [Description] d on d.DescriptionID = ap.DescriptionID
join Note n on n.NoteID = a.NoteID
join MYConfig mmy on mmy.MMYConfigID = a.MYConfigID
join Model mo on mo.ModelID = mmy.ModelID
where mmy.ModelId = 2673
and substring(n.Note, CHARINDEX(']', n.Note) + 2, LEN(n.Note))= 'Johnson'
) AS parts -- a derived table needs an alias in T-SQL
To using GROUP BY instead of distinct, to modifying the JOIN that is creating the duplicated rows. Something like:
select DISTINCT p.PartNum,
p.PartID,
pn.Name,
d.[Description],
n.Note as PartNote
from Part p
join (SELECT Distinct Name, PartNameID
FROM PartName) pn ON pn.PartNameId = p.PartNameID
join ApplicationPaint ap on ap.partID = p.PartID
join [Application] a on a.ApplicationID = ap.ApplicationID
join [Description] d on d.DescriptionID = ap.DescriptionID
join Note n on n.NoteID = a.NoteID
join MYConfig mmy on mmy.MMYConfigID = a.MYConfigID
join Model mo on mo.ModelID = mmy.ModelID
where mmy.ModelId = 2673
and substring(n.Note, CHARINDEX(']', n.Note) + 2, LEN(n.Note))= 'Johnson'

DISTINCT will combine rows, but they have to be exactly the same. Leaving off the PartNote field from your select will give you the unique set.
To get the PartNote as you've shown it in your example, the following should work ...
select p.PartNum,
p.PartID,
pn.Name,
d.[Description],
min(n.Note) as PartNote
....
group by p.PartNum, p.PartID, pn.Name, d.[Description]
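The GROUP BY plus MIN idea can be tried out in isolation. This is only a sketch using Python's sqlite3 module rather than SQL Server, and the part_note table and its columns are hypothetical stand-ins for the joined result:

```python
import sqlite3

def one_row_per_part(rows):
    """Return one row per (part_num, part_id), keeping the smallest note."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for the joined query result.
    conn.execute("CREATE TABLE part_note (part_num TEXT, part_id INTEGER, note TEXT)")
    conn.executemany("INSERT INTO part_note VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT part_num, part_id, MIN(note) AS part_note
        FROM part_note
        GROUP BY part_num, part_id
        ORDER BY part_num
        """
    ).fetchall()
    conn.close()
    return result

sample = [
    ("T50015", 765963, "[342] Johnson"),
    ("T50015", 765963, "[343] Johnson"),
    ("Y50078", 766253, "[342] Johnson"),
]
print(one_row_per_part(sample))
```

Every non-aggregated column in the select list appears in the GROUP BY, and the one column that varies within a part is reduced with MIN, which is why DISTINCT is no longer needed.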

Related

Merge two files using Sort in batch

I have two files that I need to merge. File A contains the key. I'm unsure how to do this using SORT in batch (JCL). I know I need to use JOINKEYS or IFTHEN. Would anyone know a solution to this?
Any help is greatly appreciated.
File A:
000001EMPLOYEE ID # 1
000002EMPLOYEE ID # 2
000003EMPLOYEE ID # 3
000004EMPLOYEE ID # 4
000005EMPLOYEE ID # 5
000006EMPLOYEE ID # 6
000007EMPLOYEE ID # 7
000008EMPLOYEE ID # 8
000009EMPLOYEE ID # 9
000010EMPLOYEE ID # 10
File B:
000001 John Doe
000002 Sam Maguire
000003 Jane Doe
000006 Jackson
000007 James Bond
000008 Spiderman
000019 Not an Employee
Desired output:
000001 EMPLOYEE ID # 1 John Doe
000002 EMPLOYEE ID # 2 Sam Maguire
000003 EMPLOYEE ID # 3 Jane Doe
000004 EMPLOYEE ID # 4
000005 EMPLOYEE ID # 5
000006 EMPLOYEE ID # 6 Jackson
000007 EMPLOYEE ID # 7 James Bond
000008 EMPLOYEE ID # 8 Spiderman
000009 EMPLOYEE ID # 9
000010 EMPLOYEE ID # 10
000019 Not an Employee
To join records in two files on common fields, you can use the DFSORT JoinKeys command.
Input file: empData.txt
000001EMPLOYEE ID # 1
000002EMPLOYEE ID # 2
000003EMPLOYEE ID # 3
000004EMPLOYEE ID # 4
000005EMPLOYEE ID # 5
000006EMPLOYEE ID # 6
000007EMPLOYEE ID # 7
000008EMPLOYEE ID # 8
000009EMPLOYEE ID # 9
000010EMPLOYEE ID # 10
Input file: empNames.txt
000001 John Doe
000002 Sam Maguire
000003 Jane Doe
000006 Jackson
000007 James Bond
000008 Spiderman
000019 Not an Employee
Input file: control.txt
* Employee Number in 1-6-EmpData.txt
JOINKEYS FILE=F1,FIELDS=(1,6,A)
* Employee Number in 1-6-EmpNames.txt
JOINKEYS FILE=F2,FIELDS=(1,6,A)
* Copy Name to EmpData
* Put file indicator (?) in column1
* This will be either 1,2 or B
REFORMAT FIELDS=(?,F1:1,26,F2:1,23)
JOIN UNPAIRED,F1,F2
* Use Change to see if record was only in file 2
* and replace employee number from file2 in output
OUTREC FIELDS=(1,1,CHANGE=(6,
C'2',28,6),NOMATCH=(2,6),
X,8,19,35,15)
END
Output File: joined.txt
000001 EMPLOYEE ID # 1 John Doe
000002 EMPLOYEE ID # 2 Sam Maguire
000003 EMPLOYEE ID # 3 Jane Doe
000004 EMPLOYEE ID # 4
000005 EMPLOYEE ID # 5
000006 EMPLOYEE ID # 6 Jackson
000007 EMPLOYEE ID # 7 James Bond
000008 EMPLOYEE ID # 8 Spiderman
000009 EMPLOYEE ID # 9
000010 EMPLOYEE ID # 10
000019 Not an Employee
The REFORMAT FIELDS ? places a '1','2' or 'B' in the output record to indicate how the joined record was built. If it is '2', then the record only occurred in the second file so we use the CHANGE function to get the Employee Id from the second file and place it in the output record.
Command Line:
ahlsort control.txt "empData.txt,dcb=(recfm=T,lrecl=100),empNames.txt,dcb=(recfm=T,lrecl=100)" "joined.txt,dcb=(recfm=T,lrecl=200)"
This was tested with AHLSORT v14r3-227 for Windows but should work the same on AHLSORT for Linux or DFSORT on the mainframe.
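For readers without a sort utility to hand, the same join logic (keep paired records plus unpaired records from both files) can be sketched in plain Python. This is not DFSORT; it is only an illustration of what JOIN UNPAIRED,F1,F2 does, assuming the key is the first six characters of each line. The function name is made up for the example:

```python
def join_unpaired(file_a_lines, file_b_lines, key_len=6):
    """Full outer join of two sets of lines on their leading key_len characters."""
    # Map key -> remainder of the record for each input.
    a = {line[:key_len]: line[key_len:].strip() for line in file_a_lines}
    b = {line[:key_len]: line[key_len:].strip() for line in file_b_lines}

    joined = []
    # The union of keys keeps records that exist in only one of the inputs.
    for key in sorted(a.keys() | b.keys()):
        parts = [key, a.get(key, ""), b.get(key, "")]
        joined.append(" ".join(p for p in parts if p))
    return joined

file_a = ["000001EMPLOYEE ID # 1", "000004EMPLOYEE ID # 4"]
file_b = ["000001 John Doe", "000019 Not an Employee"]
for record in join_unpaired(file_a, file_b):
    print(record)
```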

How do I produce a report to show the number of occurrences an employee has been absent from work

I have been asked to generate a report to show the number of occurrences an employee is absent from work sick.
If an employee is absent from work for 3 consecutive days this will be counted as 1 occurrence. If they then return to work and are then absent again for another 2 consecutive days this will be recorded as 2 occurrences.
I need to generate a report to show the number of occurrences an employee is away from work sick within a 6 month period.
I have set out an example below of the data showing an employee's absence records and how I need the report to look.
How data shows in database:
Name Absence Dates
John Smith 01-Sep-19
John Smith 02-Sep-19
John Smith 03-Sep-19
John Smith 10-Sep-19
John Smith 11-Sep-19
How I wish for the report to look:
Name Occurrences
John Smith 2
I would be grateful for any assistance with writing the code to achieve this result.
Not a full answer, as you should really do some of this yourself. However, based on what you have detailed in your question, you could use the approach below to count up any spells of absence within a 6 month period.
This assumes you are running it on SQL Server.
create table #absences (empid int, [abs date] date, [ret date] date);
create table #staff ([empid] int, [name1] nvarchar(50), [name2] nvarchar(50), [surname] nvarchar(50));
-- put some test values in the staff table to work with
insert into #staff
values
(1, 'John', 'Lewis', 'Smith'), -- using a unique ID here, in any good system this should be an incremental number for each new staff member added to the table
(2, 'James', 'Thomas', 'Brown')
-- put some test values in the absences table to work with
insert into #absences
values
(1, '2019-07-01', '2019-07-04'), -- userid, absence date & return date
(1, '2019-08-04', '2019-08-06'),
(2, '2019-07-02', '2019-07-05'),
(2, '2019-08-05', '2019-08-07')
select count(*) spellsoff, empid, name1, name2, surname, [days absent]
from
(
select
s.empid,
s.name1,
s.name2,
s.surname,
a.[abs date],
a.[ret date],
datediff(d,a.[abs date], a.[ret date]) [days absent]
from #staff s
left join #absences a
on s.empid = a.empid
where [abs date] >= DATEADD(M,-6,GETDATE()) -- pull back those employees that have been absent in the last 6 months from today's date
)doff
group by empid, name1, name2, surname, [days absent]
Gives you the following breakdown:
spellsoff empid name1 name2 surname days absent
1 1 John Lewis Smith 2
1 1 John Lewis Smith 3
1 2 James Thomas Brown 2
1 2 James Thomas Brown 3
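The data in the question holds one row per absence date rather than a start and return date, so the counting step is really a "runs of consecutive days" problem. A minimal Python sketch of that logic follows. It assumes consecutive means consecutive calendar days (weekends and holidays are not treated specially), and the function name is hypothetical:

```python
from collections import defaultdict
from datetime import date, timedelta

def count_occurrences(records):
    """Count runs of consecutive absence days per employee.

    records is an iterable of (name, absence_date) pairs.
    """
    days_by_name = defaultdict(set)
    for name, day in records:
        days_by_name[name].add(day)

    occurrences = {}
    for name, days in days_by_name.items():
        # A day starts a new occurrence when the previous calendar day
        # is not also an absence day for the same employee.
        occurrences[name] = sum(
            1 for day in days if day - timedelta(days=1) not in days
        )
    return occurrences

absences = [
    ("John Smith", date(2019, 9, 1)),
    ("John Smith", date(2019, 9, 2)),
    ("John Smith", date(2019, 9, 3)),
    ("John Smith", date(2019, 9, 10)),
    ("John Smith", date(2019, 9, 11)),
]
print(count_occurrences(absences))
```

The same "no absence on the previous day" test can be expressed in SQL with a NOT EXISTS or LAG check before counting.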

Select a specific row from a table with duplicated entries based on one field

I have a table which holds data in the following format, however I would like to be able to create a query that checks whether the reference number is duplicated and only return the entry with the latest date_issued.
ref_no name gender place date_issued
xgb/358632/p John Smith M London 02.08.2016
Xgb/358632/p John Smith M London 14.06.2017
Rtu/638932/k Jane Doe F Birmingham 04.09.2017
The result from the query should be;
ref_no name gender place date_issued
Xgb/358632/p John Smith M London 14.06.2017
Rtu/638932/k Jane Doe F Birmingham 04.09.2017
Is there a fairly straightforward solution for this?
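Assuming the date_issued column is of type date or timestamp, in PostgreSQL:
select distinct on (ref_no) * from tablename order by ref_no, date_issued desc;
This works because DISTINCT ON keeps only the first row for each distinct value of the expression in parentheses, and the ORDER BY puts the latest date_issued first within each ref_no. If ref_no values can differ only by letter case (xgb vs. Xgb, as in the sample), use lower(ref_no) in both places.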
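DISTINCT ON is specific to PostgreSQL. As a rough, portable alternative, the latest row per reference can be found by joining back to a MAX(date_issued) subquery. The sketch below runs that query through Python's sqlite3. The table name issued is hypothetical, dates are stored as ISO strings so they sort correctly, and ref_no is compared case-insensitively because the sample mixes xgb and Xgb:

```python
import sqlite3

def latest_per_ref(rows):
    """Return the row with the latest date_issued for each ref_no (case-insensitive)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE issued (ref_no TEXT, name TEXT, date_issued TEXT)")
    conn.executemany("INSERT INTO issued VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT t.ref_no, t.name, t.date_issued
        FROM issued AS t
        JOIN (SELECT lower(ref_no) AS ref_key, MAX(date_issued) AS latest
              FROM issued
              GROUP BY lower(ref_no)) AS m
          ON lower(t.ref_no) = m.ref_key AND t.date_issued = m.latest
        ORDER BY t.date_issued
        """
    ).fetchall()
    conn.close()
    return result

sample = [
    ("xgb/358632/p", "John Smith", "2016-08-02"),
    ("Xgb/358632/p", "John Smith", "2017-06-14"),
    ("Rtu/638932/k", "Jane Doe", "2017-09-04"),
]
print(latest_per_ref(sample))
```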

Return a Distinct Result Set Across Multiple Tables - DB2

My query:
SELECT DISTINCT V.COMPANY, V.VENDOR_NUM, V.VENDOR_PAYEE_NUM,
C.CONTACT_NAME, C.CONTACT_AUDIT_DATE
FROM
VENDOR_TABLE AS V INNER JOIN CONTACT_TABLE AS C ON (V.COMPANY = C.COMPANY AND DIGITS(V.VENDOR_NUM) = C.VENDOR_NUM)
WHERE DATE(INSERT(INSERT(DIGITS(V.VENDOR_AUDIT_DATE), 5, 0, '-'), 8, 0, '-')) >= DATE(VARCHAR_FORMAT(TIMESTAMP_ISO(CURRENT DATE), 'YYYY-MM-DD')) - 21 DAYS
AND V.VENDOR_AUDIT_DATE <> 0 AND (V.STATUS = ' ' OR V.STATUS IS NULL)
It returns the following result:
COMPANY VENDOR_NUM VENDOR_PAYEE_NUM V CONTACT_NAME CONTACT_AUDIT_DATE
------- ---------- ---------------- ------------------------------ ------------------
908 13514 13514 Coleen 20120427
908 34242 34242 Frank Cheese 20100120
908 60148 60148 Sarah Lee/Jonh Doe 20141121
908 60148 60148 Sarah Lee/Jonh Doe 20141121
908 60151 60151 Sarah Lee/Jonh Doe 20140919
908 60151 60151 Sarah Lee/Jonh Doe 20140919
908 60152 60152 Sarah Lee/Jonh Doe 20140919
908 60152 60152 Sarah Lee/Jonh Doe 20140919
The contact table may have multiple contacts for the same vendor. However, I only want to pull back one contact per vendor number retrieved from the vendor table. How do I join these tables, yet only select 1 or distinct vendor contact from the Contact table per vendor from the Vendor table?
I modified my query a bit. I found that there is a sequence number on the contact table that I can manipulate. I would need the max sequence in that record set. However, I keep receiving an error, "Column CZCO or expression in SELECT list not valid". I'm not sure what I might be doing wrong...Any help is greatly appreciated.
SELECT COMPANY, VENDOR_NUM, VENDOR_PAYEE_NUM,
VENDOR_NAME, COUNTRY_CODE, ADDRESS_1,
ADDRESS_2, CITY_STATE, ZIP_CODE,
PAY_ADDR_1, PAY_ADDR_2, PAY_CITY_STATE,
PAYEE_ZIP_CODE, VENDOR_AUDIT_DATE
FROM VENDOR_TABLE V
INNER JOIN
(
SELECT CONTACT_KEY, COMPANY, CONTACT_PHONE,
CONTACT_FAX, CONTACT_EMAIL, CONTACT_NAME,
CONTACT_AUDIT_DATE, MAX(SEQ_NUM) AS SEQ_NUM
FROM CONTACT_TABLE GROUP BY CONTACT_KEY
) C ON (V.COMPANY = C.COMPANY AND DIGITS(V.VENDOR_NUM) = C.CONTACT_KEY)
WHERE DATE(INSERT(INSERT(DIGITS(V.VENDOR_AUDIT_DATE), 5, 0, '-'), 8, 0, '-')) >= DATE(VARCHAR_FORMAT(TIMESTAMP_ISO(CURRENT DATE), 'YYYY-MM-DD')) - 20 DAYS
AND V.VENDOR_AUDIT_DATE <> 0 AND (V.STATUS = ' ' OR V.STATUS IS NULL)
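One way is to use the KEEP (DENSE_RANK FIRST ...) aggregate clause to group by the vendor columns and pick one contact's values per group. The example below uses Oracle syntax (KEEP and DUAL), so it may need adapting for DB2: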
WITH sample_data AS (SELECT 908 AS company,
13514 AS vendor_num,
13514 AS vendor_payee_num,
'Coleen' AS contact_name,
20120427 AS contact_audit_date
FROM DUAL
UNION ALL
SELECT 908,
34242,
34242,
'Frank Cheese',
20100120
FROM DUAL
UNION ALL
SELECT 908,
60148,
60148,
'Sarah Lee/Jonh Doe',
20141121
FROM DUAL
UNION ALL
SELECT 908,
60148,
60148,
'Sarah Lee/Jonh Doe',
20141121
FROM DUAL
UNION ALL
SELECT 908,
60151,
60151,
'Sarah Lee/Jonh Doe',
20140919
FROM DUAL
UNION ALL
SELECT 908,
60151,
60151,
'Sarah Lee/Jonh Doe',
20140919
FROM DUAL
UNION ALL
SELECT 908,
60152,
60152,
'Sarah Lee/Jonh Doe',
20140919
FROM DUAL
UNION ALL
SELECT 908,
60152,
60152,
'Sarah Lee/Jonh Doe',
20140919
FROM DUAL)
SELECT company,
vendor_num,
vendor_payee_num,
MIN (contact_name)
KEEP (DENSE_RANK FIRST ORDER BY contact_name, contact_audit_date)
contact_name,
MIN (contact_audit_date)
KEEP (DENSE_RANK FIRST ORDER BY contact_name, contact_audit_date)
contact_audit_date
FROM sample_data
GROUP BY company, vendor_num, vendor_payee_num
COMPANY VENDOR_NUM VENDOR_PAYEE_NUM CONTACT_NAME CONTACT_AUDIT_DATE
908 13514 13514 Coleen 20120427
908 34242 34242 Frank Cheese 20100120
908 60148 60148 Sarah Lee/Jonh Doe 20141121
908 60151 60151 Sarah Lee/Jonh Doe 20140919
908 60152 60152 Sarah Lee/Jonh Doe 20140919
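The "max sequence number" approach from the question can also be made to work. The error quoted there typically arises because the subquery selects columns that are neither aggregated nor listed in the GROUP BY. A sketch of the usual fix is to group only the key and MAX(SEQ_NUM), then join back to fetch the rest of that contact row. It is shown here with Python's sqlite3 and a hypothetical, cut-down contact table rather than DB2:

```python
import sqlite3

def one_contact_per_vendor(contacts):
    """Return the contact with the highest seq_num for each contact_key."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified version of the contact table.
    conn.execute(
        "CREATE TABLE contact (contact_key TEXT, seq_num INTEGER, contact_name TEXT)"
    )
    conn.executemany("INSERT INTO contact VALUES (?, ?, ?)", contacts)
    result = conn.execute(
        """
        SELECT c.contact_key, c.seq_num, c.contact_name
        FROM contact AS c
        JOIN (SELECT contact_key, MAX(seq_num) AS max_seq
              FROM contact
              GROUP BY contact_key) AS m
          ON m.contact_key = c.contact_key AND m.max_seq = c.seq_num
        ORDER BY c.contact_key
        """
    ).fetchall()
    conn.close()
    return result

sample = [
    ("60148", 1, "Sarah Lee"),
    ("60148", 2, "Jonh Doe"),
    ("13514", 1, "Coleen"),
]
print(one_contact_per_vendor(sample))
```

The resulting one-row-per-key set can then be joined to the vendor table without multiplying vendor rows.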

Dynamic SQL SELECT column names and related data using UNPIVOT

I have a table called PERSONNELSERVICELEVELS with general info such as ID and Name, but also many columns identified as ServiceLevel% as follows:
Person_ID Last_Name First_Name ServiceLevel1 ServiceLevel2 ServiceLevel3 etc.
--------- --------- ---------- ------------- ------------- -------------
222 Doe John 4 5 NULL
555 Doe Jane 2 6 9
I would like to create a SELECT statement to produce this output:
Person_ID Last_Name First_Name ServiceLevel Level
--------- --------- ---------- ------------ -----
222 Doe John ServiceLevel1 4
222 Doe John ServiceLevel2 5
222 Doe John ServiceLevel3 NULL
555 Doe Jane ServiceLevel1 2
555 Doe Jane ServiceLevel2 6
555 Doe Jane ServiceLevel3 9
Thanks.
To turn columns into rows, so to speak, you might use a UNION, as in:
select Person_ID, Last_Name, First_Name, 'ServiceLevel1' as ServiceLevel, ServiceLevel1 as [Level]
from PERSONNELSERVICELEVELS
union all
select Person_ID, Last_Name, First_Name, 'ServiceLevel2' as ServiceLevel, ServiceLevel2 as [Level]
from PERSONNELSERVICELEVELS
union all
-- etc.
I'm using union all here because union implicitly runs a distinct operation as well, and that is unnecessary here.
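You can also do this type of thing with UNPIVOT, but note that UNPIVOT drops rows where the value is NULL, so the ServiceLevel3 row for John Doe would be missing from its output.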
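Since the number of ServiceLevel columns is open-ended, the UNION ALL statement can be generated from the table's column list instead of being typed out by hand. The sketch below does this against SQLite through Python's sqlite3. The PRAGMA table_info lookup is SQLite-specific (on SQL Server the column names would come from the catalog views instead), and the function name is hypothetical:

```python
import sqlite3

def unpivot_service_levels(conn, table="PERSONNELSERVICELEVELS"):
    """Build and run one UNION ALL branch per ServiceLevel column."""
    # Column names are in position 1 of each PRAGMA table_info row.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    level_columns = [c for c in columns if c.startswith("ServiceLevel")]

    selects = [
        f"SELECT Person_ID, Last_Name, First_Name, "
        f"'{c}' AS ServiceLevel, {c} AS \"Level\" FROM {table}"
        for c in level_columns
    ]
    query = " UNION ALL ".join(selects) + " ORDER BY Person_ID, ServiceLevel"
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PERSONNELSERVICELEVELS (Person_ID INTEGER, Last_Name TEXT, "
    "First_Name TEXT, ServiceLevel1 INTEGER, ServiceLevel2 INTEGER, ServiceLevel3 INTEGER)"
)
conn.executemany(
    "INSERT INTO PERSONNELSERVICELEVELS VALUES (?, ?, ?, ?, ?, ?)",
    [(222, "Doe", "John", 4, 5, None), (555, "Doe", "Jane", 2, 6, 9)],
)
for row in unpivot_service_levels(conn):
    print(row)
```

Unlike UNPIVOT, this keeps the rows whose level is NULL. The column names are interpolated into the SQL text, so they should only ever come from the database's own metadata, never from user input.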