I am rather new in T-SQL and I have to create a view, where the output will be as shown below:
enter image description here
But my sales table doesn't have any data about sales in February and May for customer ABC and no data in January for customer XYZ, but I really want to have 0 for these months. How to do it in T-SQL?
This is great question about a very important topic that, even many experienced developers need to touch up on. Being "relatively new at SQL" I wont just offer a solution, I'll explain the key concepts involved.
The Auxiliary Table Numbers
First lets learn about what a tally table, aka numbers table is all about.
What does this do?
SELECT N = 1 ;
It returns the number 1.
N
-----
1
How about this?
SELECT N = 1 FROM (VALUES(0)) AS e(N);
Same thing:
N
-----
1
What does this return?
SELECT N = 1 FROM (VALUES(0),(0),(0),(0),(0),(0)) AS e(n);
Here I'm leveraging the VALUES table constructer which allows for a list of values to be treated like a view. This returns:
N
-------
1
1
1
1
1
We don't need the ones, we need the rows. This will make more sense in a moment. Now, what does this do?
WITH e(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = 1 FROM e e1;
It returns the same thing, five 1's, but I've wrapped the code into a CTE named e. Think of CTEs as inline unnamed views that you can reference multiple times. Now lets CROSS JOIN e to itself. This returns for 25 dummy rows (5*5).
WITH e(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = 1 FROM e e1, e e2;
Next we leverage ROW_NUMBER() over our set of dummy values.
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a;
Returns (truncated for brevity):
N
--------------------
1
2
3
...
24
25
Using as an auxiliary numbers table
#OneToTen is a table with random numbers 1 to 10. I need to count how many there are, returning 0 when there aren't any. NOTE MY COMMENTS:
;--== 2. Simple Use Case - Counting all numbers, including missing ones (missing = 0)
DECLARE #OneToTen TABLE (N INT);
INSERT #OneToTen VALUES(1),(2),(2),(2),(4),(8),(8),(10),(10),(10);
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a)
SELECT
N = i.N,
Wrong = COUNT(*), -- WRONG!!! Don't do THIS, this counts ALL rows returned
Correct = COUNT(t.N) -- Correct, this counts numbers from #OneToTen AKA "t.N"
FROM iTally AS i -- Aux Table of numbers
LEFT JOIN #OneToTen AS t -- Table to evaluate
ON i.N = t.N -- LEFT JOIN #OneToTen numbers to our Aux table of numbers
WHERE i.N <= 10 -- We only need the numbers 1 to 10
GROUP BY i.N; -- Group by with no Sort!!!
This returns:
N Wrong Correct
----- ----------- -----------
1 1 1
2 3 3
3 1 0
4 1 1
5 1 0
6 1 0
7 1 0
8 2 2
9 1 0
10 3 3
Note that I show you the wrong and right way to do this. Note how COUNT(*) is wrong for this, you need COUNT(whatever you are counting).
Auxiliary table of Dates (AKA calendar table)
My we use our numbers table to create a calendar table.
;--== 3. Auxilliary Month/Year Calendar Table
DECLARE #Start DATE = '20191001',
#End DATE = '20200301';
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a)
SELECT TOP(DATEDIFF(MONTH,#Start,#End)+1)
TheDate = f.Dt,
TheYear = YEAR(f.Dt),
TheMonth = MONTH(f.Dt),
TheWeekday = DATEPART(WEEKDAY,f.Dt),
DayOfTheYear = DATEPART(DAYOFYEAR,f.Dt),
LastDayOfMonth = EOMONTH(f.Dt)
FROM iTally AS i
CROSS APPLY (VALUES(DATEADD(MONTH, i.N-1, #Start))) AS f(Dt)
This returns:
TheDate TheYear TheMonth TheWeekday DayOfTheYear LastDayOfMonth
---------- ----------- ----------- ----------- ------------ --------------
2019-10-01 2019 10 3 274 2019-10-31
2019-11-01 2019 11 6 305 2019-11-30
2019-12-01 2019 12 1 335 2019-12-31
2020-01-01 2020 1 4 1 2020-01-31
2020-02-01 2020 2 7 32 2020-02-29
2020-03-01 2020 3 1 61 2020-03-31
You will only need the YEAR and MONTH.
The Auxiliary Customer table
Because you are performing aggregations (SUM,COUNT,etc.) against multiple customers we will also need an Auxiliary table of customers, more commonly known as a lookup or dimension.
SAMPLE DATA:
;--== Sample Data
DECLARE #sale TABLE
(
Customer VARCHAR(10),
SaleYear INT,
SaleMonth TINYINT,
SaleAmt DECIMAL(19,2),
INDEX idx_cust(Customer)
);
INSERT #sale
VALUES('ABC',2019,12,410),('ABC',2020,1,668),('ABC',2020,1,50), ('ABC',2020,3,250),
('CDF',2019,10,200),('CDF',2019,11,198),('CDF',2020,1,333),('CDF',2020,2,5000),
('CDF',2020,2,325),('CDF',2020,3,1105),('FRED',2018,11,1105);
Distinct list of customers for an "Auxilliary Table of Customers"
SELECT DISTINCT s.Customer FROM #sale AS s;
For my sample data we get:
Customer
----------
ABC
CDF
FRED
Putting it all together
Here I'm going to:
Create a numbers table
Use my numbers table to create a calendar table
Create an auxiliary Customer table from #sale
CROSS JOIN (combine) both tables for a "junk dimension"
LEFT JOIN our sales data to our calendar/customer auxiliary tables/junk dimension
Group by the auxiliary table values
SOLUTION:
;--==== SAMPLE DATA
DECLARE #sale TABLE
(
Customer VARCHAR(10),
SaleYear INT,
SaleMonth TINYINT,
SaleAmt DECIMAL(19,2),
INDEX idx_cust(Customer)
);
INSERT #sale
VALUES('ABC',2019,12,410),('ABC',2020,1,668),('ABC',2020,1,50), ('ABC',2020,3,250),
('CDF',2019,10,200),('CDF',2019,11,198),('CDF',2020,1,333),('CDF',2020,2,5000),
('CDF',2020,2,325),('CDF',2020,3,1105),('FRED',2018,11,1105);
;--==== START/END DATEs
DECLARE #Start DATE = '20191001',
#End DATE = '20200301';
;--==== FINAL SOLUTION
WITH -- 6.1. Auxilliary Table of numbers:
E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a),
-- 6.2. Use numbers table to create an "Auxilliary Date Table" (Calendar Table):
MonthYear(SaleYear,SaleMonth) AS
(
SELECT TOP(DATEDIFF(MONTH,#Start,#End)+1) YEAR(f.Dt), MONTH(f.Dt)
FROM iTally AS i
CROSS APPLY (VALUES(DATEADD(MONTH, i.N-1, #Start))) AS f(Dt)
)
SELECT
Customer = cust.Customer,
MonthYear = CONCAT(cal.SaleYear,'-',cal.SaleMonth),
Sales = ISNULL(SUM(s.SaleAmt),0)
-- Auxilliary Table of Customers
FROM (SELECT DISTINCT s.Customer FROM #sale AS s) AS cust -- 6.3. Aux Customer Table
CROSS JOIN MonthYear AS cal -- 6.4. Cross join to create Calendar/Customer Junk Dimension
LEFT JOIN #sale AS s -- 6.5. Join #sale to Junk Dimension on Year,Month and Customer
ON s.SaleYear = cal.SaleYear
AND s.SaleMonth = cal.SaleMonth
AND s.Customer = cust.Customer
GROUP BY cust.Customer, cal.SaleYear, cal.SaleMonth -- 6.6. Group by Junk Dim values
ORDER BY cust.Customer, cal.SaleYear, cal.SaleMonth; -- Order by not required
RESULTS:
Customer MonthYear Sales
---------- ------------ ------------
ABC 2019-10 0.00
ABC 2019-11 0.00
ABC 2019-12 410.00
ABC 2020-1 718.00
ABC 2020-2 0.00
ABC 2020-3 250.00
CDF 2019-10 200.00
CDF 2019-11 198.00
CDF 2019-12 0.00
CDF 2020-1 333.00
CDF 2020-2 5325.00
CDF 2020-3 1105.00
FRED 2019-10 0.00
FRED 2019-11 0.00
FRED 2019-12 0.00
FRED 2020-1 0.00
FRED 2020-2 0.00
FRED 2020-3 0.00
I have a need to select all rows from a table (main table) and join to another table (child table). In the results set, I want to include one column from the child table, that is only the first row / line number with a column value in a specified list. If there is no match for the specified list, it should be (null)
Desired Result:
ORDER_NO
ORDER_DATE
ORDER CUST
ORDER_VALUE
ITEM
1
02/14/2022
12345
$1,000.00
APPLES
2
02/13/2022
67890
$5,000.00
(null)
3
02/12/2022
45678
$100.00
PEARS
Example:
Main Table: Order Table
Order Number (Handle)
Order Date,
Order Customer,
Order Value
ORDER_NO
ORDER_DATE
ORDER CUST
ORDER_VALUE
1
02/14/2022
12345
$1,000.00
2
02/13/2022
67890
$5,000.00
3
02/12/2022
45678
$100.00
Child Table: Order Details Tbl
Order Number (Handle)
Line Number = Order Line No
Ordered Item,
Ordered Qty
ORDER_NO
LINE_NO
ITEM
1
10
APPLES
1
20
ORANGES
1
30
LETTUCE
2
10
BROCCOLI
2
20
CAULIFLOWER
2
30
LETTUCE
3
10
KALE
3
20
RADISHES
3
30
PEARS
In this example, the returned column is essentially the first line of the order that is a fruit, not a vegetable. And if the order includes no matching fruit, null is returned.
What my code is thus far:
SELECT
MAIN.ORDER_NO,
MAIN.ORDER_DATE,
MAIN.ORDER_CUST,
MAIN.ORDER_VALUE,
B.ITEM
FROM
MAIN
LEFT JOIN
(
SELECT
CHILD.ORDER_NO,
CHILD.LINE_NO,
CHILD.ITEM
FROM
CHILD
WHERE
CHILD.ORDER_NO||'_'||LINE_NO IN
(
SELECT
CHILD.ORDER_NO||'_'||MIN(LINE_NO) AS ORDER_LINE_NO
FROM
CHILD
WHERE
CHILD.ITEM IN ('APPLES','ORANGES','PEACHES','PEARS','GRAPES')
GROUP BY
CHILD.ORDER_NO
)
) B ON MAIN.ORDER_NO = B.ORDER_NO
'''
This code is of course not working as desired, as table 'B' is including all results from CHILD.
From Oracle 12, you can use:
SELECT o.*,
d.item
FROM orders o
LEFT OUTER JOIN LATERAL(
SELECT *
FROM order_details d
WHERE o.order_no = d.order_no
AND item IN ('APPLES','ORANGES','PEACHES','PEARS','GRAPES')
ORDER BY line_no ASC
FETCH FIRST ROW ONLY
) d
ON (1 = 1)
In earlier versions you can use:
SELECT o.*,
d.item
FROM orders o
LEFT OUTER JOIN(
SELECT d.*,
ROW_NUMBER() OVER (PARTITION BY order_no ORDER BY line_no ASC)
AS rn
FROM order_details d
WHERE item IN ('APPLES','ORANGES','PEACHES','PEARS','GRAPES')
) d
ON (o.order_no = d.order_no AND rn = 1)
Which, for the sample data:
CREATE TABLE orders (ORDER_NO, ORDER_DATE, ORDER_CUST, ORDER_VALUE) AS
SELECT 1, DATE '2022-02-14', 12345, 1000.00 FROM DUAL UNION ALL
SELECT 2, DATE '2022-02-13', 67890, 5000.00 FROM DUAL UNION ALL
SELECT 3, DATE '2022-02-12', 45678, 100.00 FROM DUAL;
CREATE TABLE Order_Details (ORDER_NO, LINE_NO, ITEM) AS
SELECT 1, 10, 'APPLES' FROM DUAL UNION ALL
SELECT 1, 20, 'ORANGES' FROM DUAL UNION ALL
SELECT 1, 30, 'LETTUCE' FROM DUAL UNION ALL
SELECT 2, 10, 'BROCCOLI' FROM DUAL UNION ALL
SELECT 2, 20, 'CAULIFLOWER' FROM DUAL UNION ALL
SELECT 2, 30, 'LETTUCE' FROM DUAL UNION ALL
SELECT 3, 10, 'KALE' FROM DUAL UNION ALL
SELECT 3, 20, 'RADISHES' FROM DUAL UNION ALL
SELECT 3, 30, 'PEARS' FROM DUAL;
Both output:
ORDER_NO
ORDER_DATE
ORDER_CUST
ORDER_VALUE
ITEM
1
2022-02-14 00:00:00
12345
1000
APPLES
2
2022-02-13 00:00:00
67890
5000
null
3
2022-02-12 00:00:00
45678
100
PEARS
db<>fiddle here
I have a column in the following format:
Time Value
17:27 2
17:27 3
I want to get the distinct rows based on one column: Time. So my expected result would be one result. Either 17:27 3 or 17:27 3.
Distinct
T-SQL uses distinct on multiple columns instead of one. Distinct would return two rows since the combinations of Time and Value are unique (see below).
select distinct [Time], * from SAPQMDATA
would return
Time Value
17:27 2
17:27 3
instead of
Time Value
17:27 2
Group by
Also group by does not appear to work
select * from table group by [Time]
Will result in:
Column 'Value' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Questions
How can I select all unique 'Time' columns without taking into account other columns provided in a select query?
How can I remove duplicate entries?
This is where ROW_NUMBER will be your best friend. Using this as your sample data...
time value
-------------------- -----------
17:27 2
17:27 3
11:36 9
15:14 5
15:14 6
.. below are two solutions with that you can copy/paste/run.
DECLARE #youtable TABLE ([time] VARCHAR(20), [value] INT);
INSERT #youtable VALUES ('17:27',2),('17:27',3),('11:36',9),('15:14',5),('15:14',6);
-- The most elegant way solve this
SELECT TOP (1) WITH TIES t.[time], t.[value]
FROM #youtable AS t
ORDER BY ROW_NUMBER() OVER (PARTITION BY t.[time] ORDER BY (SELECT NULL));
-- A more efficient way solve this
SELECT t.[time], t.[value]
FROM
(
SELECT t.[time], t.[value], ROW_NUMBER() OVER (PARTITION BY t.[time] ORDER BY (SELECT NULL)) AS RN
FROM #youtable AS t
) AS t
WHERE t.RN = 1;
Each returns:
time value
-------------------- -----------
11:36 9
15:14 5
17:27 2
Need help with the following query:
Current Data format:
StudentID EnrolledStartTime EnrolledEndTime
1 7/18/2011 1.00 AM 7/18/2011 1.05 AM
2 7/18/2011 1.00 AM 7/18/2011 1.09 AM
3 7/18/2011 1.20 AM 7/18/2011 1.40 AM
4 7/18/2011 1.50 AM 7/18/2011 1.59 AM
5 7/19/2011 1.00 AM 7/19/2011 1.05 AM
6 7/19/2011 1.00 AM 7/19/2011 1.09 AM
7 7/19/2011 1.20 AM 7/19/2011 1.40 AM
8 7/19/2011 1.10 AM 7/18/2011 1.59 AM
I would like to calculate the time difference between EnrolledEndTime and EnrolledStartTime and group it with 15 minutes difference and the count of students that enrolled in the time.
Expected Result :
Count(StudentID) Date 0-15Mins 16-30Mins 31-45Mins 46-60Mins
4 7/18/2011 3 1 0 0
4 7/19/2011 2 1 0 1
Can I use a combination of the PIVOT function to acheive the required result. Any pointers would be helpful.
Create a table variable/temp table that includes all the columns from the original table, plus one column that marks the row as 0, 16, 31 or 46. Then
SELECT * FROM temp table name PIVOT (Count(StudentID) FOR new column name in (0, 16, 31, 46).
That should put you pretty close.
It's possible (just see the basic pivot instructions here: http://msdn.microsoft.com/en-us/library/ms177410.aspx), but one problem you'll have using pivot is that you need to know ahead of time which columns you want to pivot into.
E.g., you mention 0-15, 16-30, etc. but actually, you have no idea how long some students might take -- some might take 24-hours, or your full session timeout, or what have you.
So to alleviate this problem, I'd suggesting having a final column as a catch-all, labeled something like '>60'.
Other than that, just do a select on this table, selecting the student ID, the date, and a CASE statement, and you'll have everything you need to work the pivot on.
CASE WHEN date2 - date1 < 15 THEN '0-15' WHEN date2-date1 < 30 THEN '16-30'...ELSE '>60' END.
I have an old version of ms sql server that doesn't support pivot. I wrote the sql for getting the data. I cant test the pivot, so I tried my best, couldn't test the pivot part. The rest of the sql will give you the exact data for the pivot table. If you accept null instead of 0, it can be written alot more simple, you can skip the "a subselect" part defined in "with a...".
declare #t table (EnrolledStartTime datetime,EnrolledEndTime datetime)
insert #t values('2011/7/18 01:00', '2011/7/18 01:05')
insert #t values('2011/7/18 01:00', '2011/7/18 01:09')
insert #t values('2011/7/18 01:20', '2011/7/18 01:40')
insert #t values('2011/7/18 01:50', '2011/7/18 01:59')
insert #t values('2011/7/19 01:00', '2011/7/19 01:05')
insert #t values('2011/7/19 01:00', '2011/7/19 01:09')
insert #t values('2011/7/19 01:20', '2011/7/19 01:40')
insert #t values('2011/7/19 01:10', '2011/7/19 01:59')
;with a
as
(select * from
(select distinct dateadd(day, cast(EnrolledStartTime as int), 0) date from #t) dates
cross join
(select '0-15Mins' t, 0 group1 union select '16-30Mins', 1 union select '31-45Mins', 2 union select '46-60Mins', 3) i)
, b as
(select (datediff(minute, EnrolledStartTime, EnrolledEndTime )-1)/15 group1, dateadd(day, cast(EnrolledStartTime as int), 0) date
from #t)
select count(b.date) count, a.date, a.t, a.group1 from a
left join b
on a.group1 = b.group1
and a.date = b.date
group by a.date, a.t, a.group1
-- PIVOT(max(date)
-- FOR group1
-- in(['0-15Mins'], ['16-30Mins'], ['31-45Mins'], ['46-60Mins'])AS p