need help writing a date sensitive T-SQL query - tsql

I need help writing a T-SQL query that will generate 52 rows of data per franchise from a table that will often contain gaps in the 52 week sequence per franchise (i.e., the franchise may have reported data bi-weekly or has not been in business for a full year).
The table I'm querying against looks something like this:
FranchiseId | Date | ContractHours | PrivateHours
and I need to join it to a table similar to this:
FranchiseId | Name
The output of the query needs to look like this:
Name | Date | ContractHours | PrivateHours
---- ---------- ------------- ------------
AZ1 08-02-2011 292 897
AZ1 07-26-2011 0 0 -- default to 0's for gaps in sequence
...
AZ1 08-03-2010 45 125 -- row 52 for AZ1
AZ2 08-02-2011 382 239
...
AZ2 07-26-2011 0 0 -- row 52 for AZ2
I need this style of output for every franchise, i.e., 52 rows of data with default rows for any gaps in the 52 week sequence, in a single result set. Thus, if there are 100 franchises, the result set should be 5200 rows.
What I've Tried
I've tried the typical suggestions of:
Create a table with all possible dates
LEFT OUTER JOIN this to the table of data needed
The problems I'm running into are
ensuring that for every franchise there are 52 rows and
filling in gaps with the franchise name and 0 for hours. I can't have the following in the result set:
Name | Date | ContractHours | PrivateHours
---- ---------- ------------- ------------
NULL 08-02-2011 NULL NULL
I don't know where to go from here. Is there an efficient way to write a T-SQL query that will produce the required output?

The bare bones is this
Generate 52 week ranges
Cross join with Franchise
LEFT JOIN the actual date
ISNULL to substitute zeroes
So, like this, untested
;WITH cDATE AS
(
SELECT
CAST('20100101' AS date /*smalldatetime*/) AS StartOfWeek,
DATEADD(DAY, 6, CAST('20100101' AS date /*smalldatetime*/)) AS EndOfWeek -- date + int is not allowed, so use DATEADD
UNION ALL
SELECT DATEADD(DAY, 7, StartOfWeek), DATEADD(DAY, 7, EndOfWeek)
FROM cDATE WHERE DATEADD(DAY, 7, StartOfWeek) < '20110101'
), Possibles AS
(
SELECT
StartOfWeek, EndOfWeek, FranchiseID -- EndOfWeek is needed by the join below
FROM
cDATE CROSS JOIN Franchise
)
SELECT
P.FranchiseID,
P.StartOfWeek,
ISNULL(SUM(O.ContractHours), 0),
ISNULL(SUM(O.PrivateHours), 0)
FROM
Possibles P
LEFT JOIN
TheOtherTable O ON P.FranchiseID = O.FranchiseID AND
O.Date BETWEEN P.StartOfWeek AND P.EndOfWeek
GROUP BY
P.FranchiseID, P.StartOfWeek
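For reference, a variant that also returns the franchise name and exactly 52 weekly rows might look like the sketch below. It assumes the Franchise (FranchiseId, Name) and TheOtherTable (FranchiseId, Date, ContractHours, PrivateHours) names used above. @LastWeek is a hypothetical anchor date, and the sketch assumes each week runs from one report date up to (but not including) the next. Like the query above, it is untested against the real tables.

```sql
-- Hypothetical anchor: the most recent weekly report date.
DECLARE @LastWeek date = '20110802';

WITH Weeks AS
(
    -- 52 week start dates, counting back from @LastWeek (WeekNo 0..51)
    SELECT 0 AS WeekNo, @LastWeek AS StartOfWeek
    UNION ALL
    SELECT WeekNo + 1, DATEADD(DAY, -7, StartOfWeek)
    FROM Weeks
    WHERE WeekNo < 51
)
SELECT
    F.Name,
    W.StartOfWeek AS [Date],
    ISNULL(SUM(O.ContractHours), 0) AS ContractHours,
    ISNULL(SUM(O.PrivateHours), 0) AS PrivateHours
FROM Weeks AS W
CROSS JOIN Franchise AS F              -- every franchise gets every week
LEFT JOIN TheOtherTable AS O           -- missing weeks stay as NULL, then become 0
    ON  O.FranchiseId = F.FranchiseId
    AND O.[Date] >= W.StartOfWeek
    AND O.[Date] <  DATEADD(DAY, 7, W.StartOfWeek)
GROUP BY F.Name, F.FranchiseId, W.StartOfWeek
ORDER BY F.Name, W.StartOfWeek DESC;
```

Because the name comes from the Franchise side of the CROSS JOIN rather than from the LEFT JOINed data, gap rows carry the franchise name instead of NULL.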

Related

T-SQL vlookup with fake calendar table?

I am rather new in T-SQL and I have to create a view, where the output will be as shown below:
(image: desired output)
But my sales table doesn't have any data about sales in February and May for customer ABC and no data in January for customer XYZ, but I really want to have 0 for these months. How to do it in T-SQL?
This is a great question about a very important topic that even many experienced developers need to brush up on. Since you are "rather new in T-SQL", I won't just offer a solution; I'll explain the key concepts involved.
The Auxiliary Table Numbers
First, let's learn what a tally table, also known as a numbers table, is all about.
What does this do?
SELECT N = 1 ;
It returns the number 1.
N
-----
1
How about this?
SELECT N = 1 FROM (VALUES(0)) AS e(N);
Same thing:
N
-----
1
What does this return?
SELECT N = 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n);
Here I'm leveraging the VALUES table constructor, which allows a list of values to be treated like a table. This returns:
N
-------
1
1
1
1
1
We don't need the ones, we need the rows. This will make more sense in a moment. Now, what does this do?
WITH e(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = 1 FROM e e1;
It returns the same thing, five 1's, but I've wrapped the code in a CTE named e. Think of CTEs as inline views that you can reference multiple times. Now let's CROSS JOIN e to itself. This returns 25 dummy rows (5*5).
WITH e(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = 1 FROM e e1, e e2;
Next we leverage ROW_NUMBER() over our set of dummy values.
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a;
Returns (truncated for brevity):
N
--------------------
1
2
3
...
24
25
Using as an auxiliary numbers table
@OneToTen is a table with random numbers 1 to 10. I need to count how many of each number there are, returning 0 when there aren't any. NOTE MY COMMENTS:
;--== 2. Simple Use Case - Counting all numbers, including missing ones (missing = 0)
DECLARE @OneToTen TABLE (N INT);
INSERT @OneToTen VALUES(1),(2),(2),(2),(4),(8),(8),(10),(10),(10);
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a)
SELECT
N = i.N,
Wrong = COUNT(*), -- WRONG!!! Don't do THIS, this counts ALL rows returned
Correct = COUNT(t.N) -- Correct, this counts numbers from @OneToTen AKA "t.N"
FROM iTally AS i -- Aux Table of numbers
LEFT JOIN @OneToTen AS t -- Table to evaluate
ON i.N = t.N -- LEFT JOIN @OneToTen numbers to our Aux table of numbers
WHERE i.N <= 10 -- We only need the numbers 1 to 10
GROUP BY i.N; -- Group by with no Sort!!!
This returns:
N Wrong Correct
----- ----------- -----------
1 1 1
2 3 3
3 1 0
4 1 1
5 1 0
6 1 0
7 1 0
8 2 2
9 1 0
10 3 3
Note that I show both the wrong and the right way to do this. COUNT(*) is wrong here because it also counts the NULL-extended row that the LEFT JOIN produces for a missing number; you need COUNT(whatever you are counting).
Auxiliary table of Dates (AKA calendar table)
Now we use our numbers table to create a calendar table.
;--== 3. Auxiliary Month/Year Calendar Table
DECLARE @Start DATE = '20191001',
@End DATE = '20200301';
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a)
SELECT TOP(DATEDIFF(MONTH,@Start,@End)+1)
TheDate = f.Dt,
TheYear = YEAR(f.Dt),
TheMonth = MONTH(f.Dt),
TheWeekday = DATEPART(WEEKDAY,f.Dt),
DayOfTheYear = DATEPART(DAYOFYEAR,f.Dt),
LastDayOfMonth = EOMONTH(f.Dt)
FROM iTally AS i
CROSS APPLY (VALUES(DATEADD(MONTH, i.N-1, @Start))) AS f(Dt);
This returns:
TheDate TheYear TheMonth TheWeekday DayOfTheYear LastDayOfMonth
---------- ----------- ----------- ----------- ------------ --------------
2019-10-01 2019 10 3 274 2019-10-31
2019-11-01 2019 11 6 305 2019-11-30
2019-12-01 2019 12 1 335 2019-12-31
2020-01-01 2020 1 4 1 2020-01-31
2020-02-01 2020 2 7 32 2020-02-29
2020-03-01 2020 3 1 61 2020-03-31
You will only need the YEAR and MONTH.
The Auxiliary Customer table
Because you are performing aggregations (SUM,COUNT,etc.) against multiple customers we will also need an Auxiliary table of customers, more commonly known as a lookup or dimension.
SAMPLE DATA:
;--== Sample Data
DECLARE @sale TABLE
(
Customer VARCHAR(10),
SaleYear INT,
SaleMonth TINYINT,
SaleAmt DECIMAL(19,2),
INDEX idx_cust(Customer)
);
INSERT @sale
VALUES('ABC',2019,12,410),('ABC',2020,1,668),('ABC',2020,1,50), ('ABC',2020,3,250),
('CDF',2019,10,200),('CDF',2019,11,198),('CDF',2020,1,333),('CDF',2020,2,5000),
('CDF',2020,2,325),('CDF',2020,3,1105),('FRED',2018,11,1105);
Distinct list of customers for an "Auxiliary Table of Customers"
SELECT DISTINCT s.Customer FROM @sale AS s;
For my sample data we get:
Customer
----------
ABC
CDF
FRED
Putting it all together
Here I'm going to:
Create a numbers table
Use my numbers table to create a calendar table
Create an auxiliary Customer table from @sale
CROSS JOIN (combine) both tables for a "junk dimension"
LEFT JOIN our sales data to our calendar/customer auxiliary tables/junk dimension
Group by the auxiliary table values
SOLUTION:
;--==== SAMPLE DATA
DECLARE @sale TABLE
(
Customer VARCHAR(10),
SaleYear INT,
SaleMonth TINYINT,
SaleAmt DECIMAL(19,2),
INDEX idx_cust(Customer)
);
INSERT @sale
VALUES('ABC',2019,12,410),('ABC',2020,1,668),('ABC',2020,1,50), ('ABC',2020,3,250),
('CDF',2019,10,200),('CDF',2019,11,198),('CDF',2020,1,333),('CDF',2020,2,5000),
('CDF',2020,2,325),('CDF',2020,3,1105),('FRED',2018,11,1105);
;--==== START/END DATEs
DECLARE @Start DATE = '20191001',
@End DATE = '20200301';
;--==== FINAL SOLUTION
WITH -- 6.1. Auxiliary Table of numbers:
E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a),
-- 6.2. Use numbers table to create an "Auxiliary Date Table" (Calendar Table):
MonthYear(SaleYear,SaleMonth) AS
(
SELECT TOP(DATEDIFF(MONTH,@Start,@End)+1) YEAR(f.Dt), MONTH(f.Dt)
FROM iTally AS i
CROSS APPLY (VALUES(DATEADD(MONTH, i.N-1, @Start))) AS f(Dt)
)
SELECT
Customer = cust.Customer,
MonthYear = CONCAT(cal.SaleYear,'-',cal.SaleMonth),
Sales = ISNULL(SUM(s.SaleAmt),0)
-- Auxiliary Table of Customers
FROM (SELECT DISTINCT s.Customer FROM @sale AS s) AS cust -- 6.3. Aux Customer Table
CROSS JOIN MonthYear AS cal -- 6.4. Cross join to create Calendar/Customer Junk Dimension
LEFT JOIN @sale AS s -- 6.5. Join @sale to Junk Dimension on Year,Month and Customer
ON s.SaleYear = cal.SaleYear
AND s.SaleMonth = cal.SaleMonth
AND s.Customer = cust.Customer
GROUP BY cust.Customer, cal.SaleYear, cal.SaleMonth -- 6.6. Group by Junk Dim values
ORDER BY cust.Customer, cal.SaleYear, cal.SaleMonth; -- Order by not required
RESULTS:
Customer MonthYear Sales
---------- ------------ ------------
ABC 2019-10 0.00
ABC 2019-11 0.00
ABC 2019-12 410.00
ABC 2020-1 718.00
ABC 2020-2 0.00
ABC 2020-3 250.00
CDF 2019-10 200.00
CDF 2019-11 198.00
CDF 2019-12 0.00
CDF 2020-1 333.00
CDF 2020-2 5325.00
CDF 2020-3 1105.00
FRED 2019-10 0.00
FRED 2019-11 0.00
FRED 2019-12 0.00
FRED 2020-1 0.00
FRED 2020-2 0.00
FRED 2020-3 0.00
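Since the question asks for a view, and a view can use neither table variables nor DECLAREd start/end dates, here is a rough sketch of how the same query might be packaged as one. It assumes a permanent table, hypothetically named dbo.Sale, with the same columns as @sale, and it derives the month range from the data itself instead of from @Start/@End. The view name is also made up for illustration.

```sql
-- Hypothetical names: dbo.Sale (same columns as @sale) and dbo.MonthlySalesWithGaps.
CREATE VIEW dbo.MonthlySalesWithGaps
AS
WITH
E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E1, E1 a), -- 100 rows, so at most 100 months
Bounds(FirstMonth, LastMonth) AS
(
    -- First and last month present in the data, as first-of-month dates
    SELECT MIN(DATEFROMPARTS(s.SaleYear, s.SaleMonth, 1)),
           MAX(DATEFROMPARTS(s.SaleYear, s.SaleMonth, 1))
    FROM dbo.Sale AS s
),
MonthYear(SaleYear, SaleMonth) AS
(
    -- One row per month between the bounds, inclusive
    SELECT YEAR(f.Dt), MONTH(f.Dt)
    FROM Bounds AS b
    JOIN iTally AS i
      ON i.N <= DATEDIFF(MONTH, b.FirstMonth, b.LastMonth) + 1
    CROSS APPLY (VALUES(DATEADD(MONTH, i.N - 1, b.FirstMonth))) AS f(Dt)
)
SELECT
    Customer  = cust.Customer,
    SaleYear  = cal.SaleYear,
    SaleMonth = cal.SaleMonth,
    Sales     = ISNULL(SUM(s.SaleAmt), 0)
FROM (SELECT DISTINCT Customer FROM dbo.Sale) AS cust
CROSS JOIN MonthYear AS cal
LEFT JOIN dbo.Sale AS s
    ON  s.SaleYear  = cal.SaleYear
    AND s.SaleMonth = cal.SaleMonth
    AND s.Customer  = cust.Customer
GROUP BY cust.Customer, cal.SaleYear, cal.SaleMonth;
```

A caller can then restrict the range with an ordinary WHERE on SaleYear and SaleMonth. If the data spans more than 100 months, the tally CTE would need another self cross join.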

Pivot table with multiple value columns in KDB+

I would like to transform the following two row table generated by:
tb: ([] time: 2010.01.01 2010.01.01; side:`Buy`Sell; price:100 101; size:30 50)
time side price size
--------------------------------
2010.01.01 Buy 100 30
2010.01.01 Sell 101 50
To the table below with single row:
tb1: ([] enlist time: 2010.01.01; enlist price_buy:100; enlist price_sell:101; enlist size_buy:30; enlist size_sell:50)
time price_buy price_sell size_buy size_sell
-----------------------------------------------------
2010.01.01 100 101 30 50
What is the most efficient way to achieve this?
(select price_buy:price, size_buy:size by time from tb where side = `Buy) lj select price_sell:price, size_sell:size by time from tb where side = `Sell
time | price_buy size_buy price_sell size_sell
----------| ---------------------------------------
2010.01.01| 100 30 101 50
If you wanted to avoid 2 select statements:
raze each select `price_buy`price_sell!(side!price)#/:`Buy`Sell, `size_buy`size_sell!(side!size)#/:`Buy`Sell by time from tb
As an additional note, having a date column labeled time can be misleading. Typical financial tables in kdb have columns in the order date, time, sym, etc.
Edit: Functional form for dynamic column generation:
{x[0] lj x[1]}[{?[`tb;enlist (=;`side;enlist `$x);(enlist `time)!enlist `time;(`$("price",x;"size",x))!(`price;`size)]} each ("Sell";"Buy")]
time | priceSell sizeSell priceBuy sizeBuy
----------| -----------------------------------
2010.01.01| 101 50 100 30
The general pivot function on the Kx website can do this, see https://code.kx.com/q/kb/pivoting-tables/
q)piv[tb;(),`time;(),`side;`price`size;{[v;P]`$raze each string raze P[;0],'/:v,/:\:P[;1]};{x,z}]
time | Buyprice Sellprice Buysize Sellsize
----------| -----------------------------------
2010.01.01| 100 101 30 50
I have a pivot function on GitHub, but it doesn't support multiple columns:
.math.st.pivot: {[t;rc;cf;ff]
P: asc distinct t cf;
Pcol: `$string[P] cross "_",/:string key ff;
t: ?[t;();rc!rc;key[ff]!{({[x;y;z] z each y#group x}[;;z];x;y)}[cf]'[key ff;value ff]];
t: ![t;();0b; Pcol! raze {((';#);x;$[-11h=type y;enlist;::] y)}'[key ff]'[P] ];
![t;();0b;key ff]
};
But you can left join to achieve expected result:
.math.st.pivot[tb;enlist`time;`side;enlist[`price]!enlist first]
lj .math.st.pivot[tb;enlist`time;`side;enlist[`size]!enlist first]
Looks like adding support for multiple columns is a good idea.

Redshift Postgres SQL comparing NULL vs NOT NULL values in a table

I am trying to create a query in Redshift DB (Postgres SQL) to do the following:
I have columns that I am checking for quality control, and I need the percentage of NULL vs. NOT NULL values for each column. I would like my output to look like the table below, which shows the totals along with a percentage column. How can I write this query?
Column NOT NULL NULL Total Records Percentage NULL
--------- ------- ------ ---------------- ---------------------
Column A 78 10 88 11.3%
Column B 68 15 83 18.0%
Column C 3 5 8 62.5%
With SQL, you can calculate the values for a specific column, like this:
select
count(a) as "NOT_NULL",
count(*) - count(a) as "NULL",
count(*) as "Total Records",
to_char(100.0 * (count(*) - count(a)) / count(*), '999.9%') as "Percentage NULL"
from stack
However, a single SELECT like this cannot display "one row per column". You would have to combine several such queries, one per column, with UNION ALL to produce that result.
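One way to get one row per column is to run the aggregate once per column and stack the results with UNION ALL. The sketch below assumes the table is named stack, as in the query above, and that it has columns a, b and c; b and c are hypothetical names used only for illustration.

```sql
-- One SELECT per column being checked; UNION ALL stacks them into one result set.
select 'a' as column_name,
       count(a)            as not_null,
       count(*) - count(a) as is_null,
       count(*)            as total_records,
       -- nullif avoids a division-by-zero error on an empty table
       round(100.0 * (count(*) - count(a)) / nullif(count(*), 0), 1) as pct_null
from stack
union all
select 'b',
       count(b),
       count(*) - count(b),
       count(*),
       round(100.0 * (count(*) - count(b)) / nullif(count(*), 0), 1)
from stack
union all
select 'c',
       count(c),
       count(*) - count(c),
       count(*),
       round(100.0 * (count(*) - count(c)) / nullif(count(*), 0), 1)
from stack;
```

The column names of the result come from the first SELECT, so the later branches do not need aliases.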

kdb: dynamically denormalize a table (convert key values to column names)

I have a table like this:
q)t:([sym:(`EURUSD`EURUSD`AUDUSD`AUDUSD);server:(`S01`S02`S01`S02)];volume:(20;10;30;50))
q)t
sym server| volume
-------------| ------
EURUSD S01 | 20
EURUSD S02 | 10
AUDUSD S01 | 30
AUDUSD S02 | 50
I need to de-normalize it to display the data nicely. The resulting table should look like this:
sym | S01 S02
------| -------
EURUSD| 20 10
AUDUSD| 30 50
How do I dynamically convert the original table using distinct values from server column as column names for the new table?
Thanks!
Basically you want a 'pivot' table. The following page has a very good solution for your problem:
http://code.kx.com/q/cookbook/pivoting-tables/
Here are the commands to get the required table:
q) P:asc exec distinct server from t
q) exec P#(server!volume) by sym:sym from t
One tricky thing about pivoting a table is that the keys of the dictionary must be of type symbol; otherwise it won't generate the pivot table structure.
E.g., in the following table, we have a column dt of type date.
t:([sym:(`EURUSD`EURUSD`AUDUSD`AUDUSD);dt:(0 1 0 1+.z.d)];volume:(20;10;30;50))
Now if we want to pivot it with dates as columns, it will generate a structure like:
q)P:asc exec distinct dt from t
q)exec P#(dt!volume) by sym:sym from t
(`s#flip (enlist `sym)!enlist `s#`AUDUSD`EURUSD)!((`s#2018.06.22 2018.06.23)!30j, 50j;(`s#2018.06.22 2018.06.23)!20j, 10j)
To get the dates as columns, the dt column has to be cast to symbol:
q)show P:asc exec distinct `$string dt from t
`s#`2018.06.22`2018.06.23
q)exec P#((`$string dt)!volume) by sym:sym from t
sym | 2018.06.22 2018.06.23
------| ---------------------
AUDUSD| 30 50
EURUSD| 20 10

Not getting desired format from Oracle query

I am trying to fetch data from the database in the format below:
Month Count
----- -----
201208 124
201209 0
201210 56
201211 25
201212 0
201301 184
201302 0
In database I have entries like,
Month Count
----- -----
201206 56
201208 124
201210 56
201211 25
201301 184
201304 49
Below is my query,
SELECT MONTH, Count
FROM TABLE_NAME
WHERE MONTH BETWEEN 201208 AND 201302
AND ID = 'X'
Output :
Month Count
----- -----
201208 124
201210 56
201211 25
201301 184
Can anyone help me get the data in the desired format?
First, you should generate the full sequence of months between these dates. You can do it with CONNECT BY LEVEL in Oracle. Then just JOIN this sequence with your table:
SELECT MonthSeq.MONTH,
NVL(Count,0) Count
FROM TABLE_NAME
RIGHT JOIN
(
SELECT
TO_CHAR(ADD_MONTHS(TO_DATE('201208','YYYYMM'),
(ROWNUM-1))
,'YYYYMM') MONTH
FROM DUAL
CONNECT BY LEVEL<=
MONTHS_BETWEEN(TO_DATE('201302','YYYYMM') ,
TO_DATE('201208','YYYYMM'))+1
) MonthSeq
ON TABLE_NAME.MONTH=MonthSeq.MONTH
ORDER BY MonthSeq.MONTH
UPD:
Your query from the comment should look like the following. You should move the WHERE condition into the JOIN's ON clause. If you keep it in WHERE, you won't get the rows with zero counts.
SELECT MonthSeq.MONTH,
NVL(SUM(TOTAL_SESSIONS),0) AS SESSIONS
FROM X
RIGHT JOIN
(
SELECT
TO_CHAR(ADD_MONTHS(TO_DATE('201208','YYYYMM'),
(ROWNUM-1))
,'YYYYMM') MONTH
FROM DUAL
CONNECT BY LEVEL<=
MONTHS_BETWEEN(TO_DATE('201302','YYYYMM') ,
TO_DATE('201208','YYYYMM'))+1
) MonthSeq
ON X.MONTH=MonthSeq.MONTH and X.acct_id = 'ABCD'
GROUP BY MonthSeq.MONTH
ORDER BY MonthSeq.MONTH
You need to use the TO_DATE function to convert the month field to DATE format. Try it like this:
SELECT TO_CHAR(TO_DATE(MONTH, 'YYYYMM'), 'YYYYMM') month, count
FROM TABLE_NAME
WHERE TO_DATE(month, 'YYYYMM') BETWEEN TO_DATE('201208', 'YYYYMM') AND TO_DATE('201302', 'YYYYMM')
AND id = 'X'
ORDER BY TO_DATE(month, 'YYYYMM');