I am rather new to T-SQL and I have to create a view where the output will be as shown in the screenshot below:
[screenshot of the desired output not reproduced here]
But my sales table doesn't have any sales data for February and May for customer ABC, and none for January for customer XYZ, and I really want to have 0 for these months. How do I do this in T-SQL?
This is a great question about a very important topic that even many experienced developers need to brush up on. Since you are relatively new to SQL, I won't just offer a solution; I'll explain the key concepts involved.
The Auxiliary Numbers Table
First, let's learn what a tally table (aka a numbers table) is all about.
What does this do?
SELECT N = 1 ;
It returns the number 1.
N
-----
1
How about this?
SELECT N = 1 FROM (VALUES(0)) AS e(N);
Same thing:
N
-----
1
What does this return?
SELECT N = 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n);
Here I'm leveraging the VALUES table constructor, which allows a list of values to be treated like a view. This returns five rows:
N
-------
1
1
1
1
1
We don't need the ones, we need the rows. This will make more sense in a moment. Now, what does this do?
WITH e(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = 1 FROM e e1;
It returns the same thing, five 1's, but I've wrapped the code in a CTE named e. Think of CTEs as inline views that you can reference multiple times in the same statement. Now let's CROSS JOIN e to itself; this returns 25 dummy rows (5*5).
WITH e(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = 1 FROM e e1, e e2;
Next we leverage ROW_NUMBER() over our set of dummy rows; ORDER BY (SELECT NULL) tells the optimizer we don't care about the order, so no sort is required.
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a;
Returns (truncated for brevity):
N
--------------------
1
2
3
...
24
25
Using it as an auxiliary numbers table
@OneToTen is a table variable containing assorted numbers from 1 to 10 (with gaps and duplicates). I need to count how many of each number there are, returning 0 when there aren't any. NOTE MY COMMENTS:
;--== 2. Simple Use Case - Counting all numbers, including missing ones (missing = 0)
DECLARE @OneToTen TABLE (N INT);
INSERT @OneToTen VALUES(1),(2),(2),(2),(4),(8),(8),(10),(10),(10);
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a)
SELECT
N = i.N,
Wrong = COUNT(*), -- WRONG!!! Don't do THIS, this counts ALL rows returned
Correct = COUNT(t.N) -- Correct, this counts numbers from @OneToTen AKA "t.N"
FROM iTally AS i -- Aux Table of numbers
LEFT JOIN @OneToTen AS t -- Table to evaluate
ON i.N = t.N -- LEFT JOIN @OneToTen numbers to our Aux table of numbers
WHERE i.N <= 10 -- We only need the numbers 1 to 10
GROUP BY i.N; -- Group by with no Sort!!!
This returns:
N Wrong Correct
----- ----------- -----------
1 1 1
2 3 3
3 1 0
4 1 1
5 1 0
6 1 0
7 1 0
8 2 2
9 1 0
10 3 3
Note that I show you both the wrong and the right way to do this. COUNT(*) is wrong here because it counts every row returned by the LEFT JOIN, including the NULL-extended rows with no match; you need COUNT(<whatever you are counting>), which ignores NULLs.
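To see the difference in isolation, here is a minimal sketch: a LEFT JOIN against an empty table variable still returns one NULL-extended row, so COUNT(*) reports 1 while COUNT(t.N) correctly reports 0.
DECLARE @t TABLE (N INT); -- deliberately empty, nothing to count
SELECT CountStar = COUNT(*),  -- counts the NULL-extended row: 1
       CountCol  = COUNT(t.N) -- counts only non-NULL matches: 0
FROM (VALUES(1)) AS i(N)
LEFT JOIN @t AS t ON t.N = i.N;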
Auxiliary table of Dates (AKA calendar table)
Now we use our numbers table to create a calendar table.
;--== 3. Auxiliary Month/Year Calendar Table
DECLARE @Start DATE = '20191001',
@End DATE = '20200301';
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a)
SELECT TOP(DATEDIFF(MONTH,@Start,@End)+1)
TheDate = f.Dt,
TheYear = YEAR(f.Dt),
TheMonth = MONTH(f.Dt),
TheWeekday = DATEPART(WEEKDAY,f.Dt),
DayOfTheYear = DATEPART(DAYOFYEAR,f.Dt),
LastDayOfMonth = EOMONTH(f.Dt)
FROM iTally AS i
CROSS APPLY (VALUES(DATEADD(MONTH, i.N-1, @Start))) AS f(Dt);
This returns:
TheDate TheYear TheMonth TheWeekday DayOfTheYear LastDayOfMonth
---------- ----------- ----------- ----------- ------------ --------------
2019-10-01 2019 10 3 274 2019-10-31
2019-11-01 2019 11 6 305 2019-11-30
2019-12-01 2019 12 1 335 2019-12-31
2020-01-01 2020 1 4 1 2020-01-31
2020-02-01 2020 2 7 32 2020-02-29
2020-03-01 2020 3 1 61 2020-03-31
You will only need the YEAR and MONTH.
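As a sketch, here is the same calendar query pared down to just those two columns (same variables and CTE pattern as above):
DECLARE @Start DATE = '20191001', @End DATE = '20200301';
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a)
SELECT TOP(DATEDIFF(MONTH,@Start,@End)+1)
TheYear = YEAR(f.Dt),   -- only the two columns the final solution needs
TheMonth = MONTH(f.Dt)
FROM iTally AS i
CROSS APPLY (VALUES(DATEADD(MONTH, i.N-1, @Start))) AS f(Dt);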
The Auxiliary Customer table
Because you are performing aggregations (SUM, COUNT, etc.) against multiple customers, we will also need an auxiliary table of customers, more commonly known as a lookup or dimension table.
SAMPLE DATA:
;--== Sample Data
DECLARE @sale TABLE
(
Customer VARCHAR(10),
SaleYear INT,
SaleMonth TINYINT,
SaleAmt DECIMAL(19,2),
INDEX idx_cust(Customer)
);
INSERT @sale
VALUES('ABC',2019,12,410),('ABC',2020,1,668),('ABC',2020,1,50), ('ABC',2020,3,250),
('CDF',2019,10,200),('CDF',2019,11,198),('CDF',2020,1,333),('CDF',2020,2,5000),
('CDF',2020,2,325),('CDF',2020,3,1105),('FRED',2018,11,1105);
A distinct list of customers gives us our "Auxiliary Table of Customers":
SELECT DISTINCT s.Customer FROM @sale AS s;
For my sample data we get:
Customer
----------
ABC
CDF
FRED
Putting it all together
Here I'm going to:
Create a numbers table
Use my numbers table to create a calendar table
Create an auxiliary Customer table from @sale
CROSS JOIN (combine) both tables for a "junk dimension"
LEFT JOIN our sales data to our calendar/customer auxiliary tables/junk dimension
Group by the auxiliary table values
SOLUTION:
;--==== SAMPLE DATA
DECLARE @sale TABLE
(
Customer VARCHAR(10),
SaleYear INT,
SaleMonth TINYINT,
SaleAmt DECIMAL(19,2),
INDEX idx_cust(Customer)
);
INSERT @sale
VALUES('ABC',2019,12,410),('ABC',2020,1,668),('ABC',2020,1,50), ('ABC',2020,3,250),
('CDF',2019,10,200),('CDF',2019,11,198),('CDF',2020,1,333),('CDF',2020,2,5000),
('CDF',2020,2,325),('CDF',2020,3,1105),('FRED',2018,11,1105);
;--==== START/END DATEs
DECLARE @Start DATE = '20191001',
@End DATE = '20200301';
;--==== FINAL SOLUTION
WITH -- 6.1. Auxiliary Table of numbers:
E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a),
-- 6.2. Use numbers table to create an "Auxiliary Date Table" (Calendar Table):
MonthYear(SaleYear,SaleMonth) AS
(
SELECT TOP(DATEDIFF(MONTH,@Start,@End)+1) YEAR(f.Dt), MONTH(f.Dt)
FROM iTally AS i
CROSS APPLY (VALUES(DATEADD(MONTH, i.N-1, @Start))) AS f(Dt)
)
SELECT
Customer = cust.Customer,
MonthYear = CONCAT(cal.SaleYear,'-',cal.SaleMonth),
Sales = ISNULL(SUM(s.SaleAmt),0)
-- Auxiliary Table of Customers
FROM (SELECT DISTINCT s.Customer FROM @sale AS s) AS cust -- 6.3. Aux Customer Table
CROSS JOIN MonthYear AS cal -- 6.4. Cross join to create Calendar/Customer Junk Dimension
LEFT JOIN @sale AS s -- 6.5. Join @sale to Junk Dimension on Year, Month and Customer
ON s.SaleYear = cal.SaleYear
AND s.SaleMonth = cal.SaleMonth
AND s.Customer = cust.Customer
GROUP BY cust.Customer, cal.SaleYear, cal.SaleMonth -- 6.6. Group by Junk Dim values
ORDER BY cust.Customer, cal.SaleYear, cal.SaleMonth; -- Order by not required
RESULTS:
Customer MonthYear Sales
---------- ------------ ------------
ABC 2019-10 0.00
ABC 2019-11 0.00
ABC 2019-12 410.00
ABC 2020-1 718.00
ABC 2020-2 0.00
ABC 2020-3 250.00
CDF 2019-10 200.00
CDF 2019-11 198.00
CDF 2019-12 0.00
CDF 2020-1 333.00
CDF 2020-2 5325.00
CDF 2020-3 1105.00
FRED 2019-10 0.00
FRED 2019-11 0.00
FRED 2019-12 0.00
FRED 2020-1 0.00
FRED 2020-2 0.00
FRED 2020-3 0.00
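Finally, since the original question asked for a view, and a view cannot contain variable declarations, here is a minimal sketch of wrapping the same logic in a view. It assumes a hypothetical permanent table dbo.Sale with the same columns as @sale, derives the month range from the data itself, and uses DATEFROMPARTS (SQL Server 2012+); the names dbo.Sale and vw_MonthlySales are mine, not from the question.
CREATE VIEW dbo.vw_MonthlySales
AS
WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a),
Bounds(StartDt,EndDt) AS
( -- first and last sale month present in the data
SELECT MIN(DATEFROMPARTS(SaleYear,SaleMonth,1)),
       MAX(DATEFROMPARTS(SaleYear,SaleMonth,1))
FROM dbo.Sale
),
MonthYear(SaleYear,SaleMonth) AS
( -- one row per month between the bounds (iTally covers up to 100 months)
SELECT YEAR(f.Dt), MONTH(f.Dt)
FROM Bounds AS b
CROSS JOIN iTally AS i
CROSS APPLY (VALUES(DATEADD(MONTH, i.N-1, b.StartDt))) AS f(Dt)
WHERE i.N <= DATEDIFF(MONTH, b.StartDt, b.EndDt) + 1
)
SELECT
Customer = cust.Customer,
MonthYear = CONCAT(cal.SaleYear,'-',cal.SaleMonth),
Sales = ISNULL(SUM(s.SaleAmt),0)
FROM (SELECT DISTINCT s.Customer FROM dbo.Sale AS s) AS cust
CROSS JOIN MonthYear AS cal
LEFT JOIN dbo.Sale AS s
ON s.SaleYear = cal.SaleYear
AND s.SaleMonth = cal.SaleMonth
AND s.Customer = cust.Customer
GROUP BY cust.Customer, cal.SaleYear, cal.SaleMonth;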
I'm looking to dynamically insert a set of columns from one table to another in PostgreSQL. What I think I'd like to do is read in a 'checklist' of column headings (those columns which exist in table 1, the storage table) and, for those that also exist in the export table (table 2), insert them all at once from table 2 into table 1. Table 2 will be variable in its columns, though: once its data is imported I'll drop it and import new data with a potentially different column structure, so I need to do the insert based on the column names.
e.g.
Table 1. - The storage table
ID NAME YEAR LITH_AGE PROV_AGE SIO2 TIO2 CAO MGO COMMENTS
1 John 1998 2000 3000 65 10 5 5 comment1
2 Mark 2005 2444 3444 63 8 2 3 comment2
3 Luke 2001 1000 1500 77 10 2 2 comment3
Table 2. - The export table
ID NAME MG# METHOD SIO2 TIO2 CAO MGO
1 Amy 4 Method1 65 10 5 5
2 Poe 3 Method2 63 8 2 3
3 Ben 2 Method3 77 10 2 2
As you can see the export table may include columns which do not exist in the storage table, so these would be ignored.
I want to insert all of these columns at once, because I've found that if I do it column by column, each insert adds a new batch of rows (maybe someone can solve that issue instead?). Currently I've written a function that checks whether a column name exists in table 2 and, if it does, inserts that column, but as said this extends the rows of the table every time and leaves the rest of the columns NULL.
The INSERT line from my function:
EXECUTE format('INSERT INTO %s (%s) (SELECT %s::%s FROM %s);',_tbl_import, _col,_col,_type,_tbl_export);
As a type of 'code example' for my question:
EXECUTE format('INSERT INTO table1 (%1$s) (SELECT %1$s FROM table2)', columns)
where 'columns' would be some variable denoting the columns that exist in the export table that need to go into the storage table. This will be variable as table 2 will be different every time.
This would ideally update Table 1 as:
ID NAME YEAR LITH_AGE PROV_AGE SIO2 TIO2 CAO MGO COMMENTS
1 John 1998 2000 3000 65 10 5 5 comment1
2 Mark 2005 2444 3444 63 8 2 3 comment2
3 Luke 2001 1000 1500 77 10 2 2 comment3
4 Amy NULL NULL NULL 65 10 5 5 NULL
5 Poe NULL NULL NULL 63 8 2 3 NULL
6 Ben NULL NULL NULL 77 10 2 2 NULL
UPDATED answer
As my original answer did not meet the requirement, and I was asked to post an alternative example of an information_schema/catalog-based solution, here it is.
I made two versions of the solution:
V1 is equivalent to the already given information_schema example (a minimal information_schema sketch follows this list), but that approach relies on table1's column DEFAULTs: if a table1 column that does not exist in table2 has a default other than NULL, it will be filled with whatever that default is.
V2 is modified to force NULL whenever the two tables' columns do not match, so it does not inherit table1's own DEFAULTs.
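For reference, here is a minimal sketch of how the same comma-separated column list could be built from information_schema instead of pg_attribute (assuming both tables live in the public schema and excluding id, as in V1):
SELECT string_agg(c1.column_name::text, ',')
FROM information_schema.columns c1
JOIN information_schema.columns c2
  ON c1.column_name = c2.column_name
 AND c2.table_schema = 'public' AND c2.table_name = 'table2'
WHERE c1.table_schema = 'public' AND c1.table_name = 'table1'
  AND c1.column_name <> 'id';
-- shared columns only, e.g. name,sio2,tio2,cao,mgo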
Version1:
CREATE OR REPLACE FUNCTION insert_into_table1_v1()
RETURNS void AS $main$
DECLARE
columns text;
BEGIN
SELECT string_agg(c1.attname, ',')
INTO columns
FROM pg_attribute c1
JOIN pg_attribute c2
ON c1.attrelid = 'public.table1'::regclass
AND c2.attrelid = 'public.table2'::regclass
AND c1.attnum > 0
AND c2.attnum > 0
AND NOT c1.attisdropped
AND NOT c2.attisdropped
AND c1.attname = c2.attname
AND c1.attname <> 'id';
-- Following is the actual result of the query above, based on the given data examples:
-- -[ RECORD 1 ]----------------------
-- string_agg | name,sio2,tio2,cao,mgo
EXECUTE format(
' INSERT INTO table1 ( %1$s )
SELECT %1$s
FROM table2
',
columns
);
END;
$main$ LANGUAGE plpgsql;
Version2:
CREATE OR REPLACE FUNCTION insert_into_table1_v2()
RETURNS void AS $main$
DECLARE
t1_cols text;
t2_cols text;
BEGIN
SELECT string_agg( c1.attname, ',' ),
string_agg( COALESCE( c2.attname, 'NULL' ), ',' )
INTO t1_cols,
t2_cols
FROM pg_attribute c1
LEFT JOIN pg_attribute c2
ON c2.attrelid = 'public.table2'::regclass
AND c2.attnum > 0
AND NOT c2.attisdropped
AND c1.attname = c2.attname
WHERE c1.attrelid = 'public.table1'::regclass
AND c1.attnum > 0
AND NOT c1.attisdropped
AND c1.attname <> 'id';
-- Following is the actual result of the query above, based on the given data examples:
-- t1_cols                                                | t2_cols
-- --------------------------------------------------------+--------------------------------------------
-- name,year,lith_age,prov_age,sio2,tio2,cao,mgo,comments | name,NULL,NULL,NULL,sio2,tio2,cao,mgo,NULL
-- (1 row)
EXECUTE format(
' INSERT INTO table1 ( %s )
SELECT %s
FROM table2
',
t1_cols,
t2_cols
);
END;
$main$ LANGUAGE plpgsql;
Also link to documentation about pg_attribute table columns if something is unclear: https://www.postgresql.org/docs/current/static/catalog-pg-attribute.html
Hopefully this helps :)
I've been banging my head against this one for a while. I'm constructing a database on Oracle 11g and am attempting to insert a record into a "registry" table whenever a record is created in a "data product" table. The registry table needs to auto-increment the product_id, and that product_id is then used as a foreign key on the data product table. Here is my trigger code:
CREATE OR REPLACE TRIGGER "TR_CAMERA_DP_DPR_CREATE"
BEFORE INSERT ON "DD1"."CAMERA_DP"
FOR EACH ROW
BEGIN
:new.product_id := ID_SEQ.NEXTVAL;
insert into dd1.dp_registry
( product_id,
fs_location,
parent_group_id,
product_name,
shortdes,
createdate,
revision )
values
( :new.product_id,
'placeholder',
0,
'_image',
'description placeholder',
sysdate,
0
);
END;
So, ideally, an insert into dd1.camera_dp without providing a product_id will first insert a record into dd1.dp_registry, and then use that incremented product_id as the key field for dd1.camera_dp.
The insert statement works when run with a hard-coded value for :new.product_id, and ID_SEQ.NEXTVAL is also working properly. I get the feeling I'm missing something obvious.
Thanks!
Your code works perfectly well for me. If you're getting an error, there must be something different about the code you are actually running compared to the code you posted.
SQL> create table CAMERA_DP(
2 product_id number,
3 name varchar2(10)
4 );
Table created.
SQL> create sequence id_seq;
Sequence created.
SQL> ed
Wrote file afiedt.buf
1 create table dp_registry
2 ( product_id number,
3 fs_location varchar2(100),
4 parent_group_id number,
5 product_name varchar2(100),
6 shortdes varchar2(100),
7 createdate date,
8* revision number)
SQL> /
Table created.
SQL> ed
Wrote file afiedt.buf
1 CREATE OR REPLACE TRIGGER "TR_CAMERA_DP_DPR_CREATE"
2 BEFORE INSERT ON "CAMERA_DP"
3 FOR EACH ROW
4 BEGIN
5 :new.product_id := ID_SEQ.NEXTVAL;
6 insert into dp_registry
7 ( product_id,
8 fs_location,
9 parent_group_id,
10 product_name,
11 shortdes,
12 createdate,
13 revision )
14 values
15 ( :new.product_id,
16 'placeholder',
17 0,
18 '_image',
19 'description placeholder',
20 sysdate,
21 0
22 );
23* END;
24 /
Trigger created.
SQL> insert into camera_dp( name ) values( 'Foo' );
1 row created.
SQL> ed
Wrote file afiedt.buf
1* select product_id from dp_registry
SQL> /
PRODUCT_ID
----------
1
If you're getting an error that a table doesn't exist, the common culprits would be:
You actually have a typo in the name of your table.
You don't have permission to insert into the table. Note that if, in your actual code, not everything is in the same schema, my guess would be that the user that owns the trigger has privileges to INSERT into the DP_REGISTRY table via a role rather than via a direct grant. Since privileges granted through a role are not available in a definer's-rights stored PL/SQL block, that would explain why you can do something at the command line but not in PL/SQL.
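A quick way to check which kind of grant you have (a sketch, run as the trigger owner; DP_REGISTRY is the table from the question):
-- Direct grants involving the current user (role-granted privileges will NOT appear here):
select grantee, privilege from user_tab_privs where table_name = 'DP_REGISTRY';
-- Privileges received only through roles:
select role, privilege from role_tab_privs where table_name = 'DP_REGISTRY';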
I am creating a Customer table and I want one of the attributes to be the expiry date of a credit card. I want the format to be 'Month Year'. What data type should I use? I want to use date, but its format is year/month/day. Is there any other way to restrict the format to only month and year?
You can constrain the date to the first day of the month:
create table customer (
cc_expire date check (cc_expire = date_trunc('month', cc_expire))
);
Now this fails:
insert into customer (cc_expire) values ('2014-12-02');
ERROR: new row for relation "customer" violates check constraint "customer_cc_expire_check"
DETAIL: Failing row contains (2014-12-02).
And this works:
insert into customer (cc_expire) values ('2014-12-01');
INSERT 0 1
But once a value is stored, the specific day no longer matters; you will only ever compare at month granularity:
select
date_trunc('month', cc_expire) > current_date as valid
from customer;
valid
-------
t
Extract year and month separately:
select extract(year from cc_expire) "year", extract(month from cc_expire) "month"
from customer
;
year | month
------+-------
2014 | 12
Or concatenated:
select to_char(cc_expire, 'YYYYMM') "month"
from customer
;
month
--------
201412
Use either
char(5) for two-digit years, or
char(7) for four-digit years.
Code below assumes two-digit years, which is the form that matches all my credit cards. First, let's create a table of valid expiration dates.
create table valid_expiration_dates (
exp_date char(5) primary key
);
Now let's populate it. This code is just for 2013. You can easily adjust the range by changing the starting date (currently '2013-01-01'), and the "number" of months (currently 11, which lets you get all of 2013 by adding from 0 to 11 months to the starting date).
with all_months as (
select '2013-01-01'::date + (n || ' months')::interval months
from generate_series(0, 11) n
)
insert into valid_expiration_dates
select to_char(months, 'MM') || '/' || to_char(months, 'YY') exp_date
from all_months;
Now, in your data table, create a char(5) column, and set a foreign key reference from it to valid_expiration_dates.exp_date.
While you're busy with this, think hard about whether "exp_month" might be a better name for that column than "exp_date". (I think it would.)
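A minimal sketch of that step, with hypothetical table and column names (using exp_month as suggested):
create table customer_card (
    id        serial primary key,                          -- hypothetical surrogate key
    exp_month char(5) not null
              references valid_expiration_dates (exp_date) -- only the populated 'MM/YY' values are allowed
);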
As another idea, you could create some brief utilities to do this for you using int[]:
CREATE OR REPLACE FUNCTION exp_valid(int[]) returns bool LANGUAGE SQL IMMUTABLE as
$$
SELECT $1[1] <= 12 AND (select count(*) = 2 FROM unnest($1));
$$;
CREATE OR REPLACE FUNCTION first_invalid_day(int[]) RETURNS date LANGUAGE SQL IMMUTABLE AS
$$
SELECT (to_date($1[2]::text || $1[1]::text, CASE WHEN $1[2] < 100 THEN 'YYMM' ELSE 'YYYYMM' END) + '1 month'::interval)::date;
$$;
These work:
postgres=# select exp_valid('{04,13}');
exp_valid
-----------
t
(1 row)
postgres=# select exp_valid('{13,04}');
exp_valid
-----------
f
(1 row)
postgres=# select exp_valid('{04,13,12}');
exp_valid
-----------
f
(1 row)
Then we can convert these into a date:
postgres=# select first_invalid_day('{04,13}');
first_invalid_day
-------------------
2013-05-01
(1 row)
This use of arrays does not violate any normalization rules because the array as a whole represents a single value in its domain. We are storing two integers representing a single date. '{12,2}' is December of 2002, while '{2,12}' is Feb of 2012. Each represents a single value of the domain and is therefore perfectly atomic.
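A short sketch of how these utilities might be wired into a table (hypothetical table/column names; exp is stored as {month,year}):
create table cc_info (
    id  serial primary key,
    exp int[] not null check (exp_valid(exp))  -- rejects malformed arrays such as '{13,04}' or '{04,13,12}'
);
-- usage:
insert into cc_info (exp) values ('{04,13}');   -- accepted
select first_invalid_day(exp) from cc_info;     -- 2013-05-01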
Requirements
I have a data table that saves data in date ranges.
Each record is allowed to overlap previous record(s) (record has a CreatedOn datetime column).
A new record can define its own date range if it needs to, and hence can overlap several older records.
Each new overlapping record overrides settings of older records that it overlaps.
Result set
What I need is per-day data for any date range, applying the overlap rules above. It should return one record per day with the data that applies to that particular day.
To convert ranges to days I was thinking of a numbers/dates table and a user-defined function (UDF) to get data for each day in the range, but I wonder whether there's any other (as in better, or even faster) way of doing this, since I'm using the latest SQL Server 2008 R2.
Stored data
Imagine my stored data looks like this
ID | RangeFrom | RangeTo | Starts | Ends | CreatedOn (not providing data)
---|-----------|----------|--------|-------|-----------
1 | 20110101 | 20110331 | 07:00 | 15:00
2 | 20110401 | 20110531 | 08:00 | 16:00
3 | 20110301 | 20110430 | 06:00 | 14:00 <- overrides both partially
Results
If I wanted to get data from 1st January 2011 to 31st May 2011, the resulting table should look like the following (obvious rows omitted):
DayDate | Starts | Ends
--------|--------|------
20110101| 07:00 | 15:00 <- defined by record ID = 1
20110102| 07:00 | 15:00 <- defined by record ID = 1
... many rows omitted for obvious reasons
20110301| 06:00 | 14:00 <- defined by record ID = 3
20110302| 06:00 | 14:00 <- defined by record ID = 3
... many rows omitted for obvious reasons
20110501| 08:00 | 16:00 <- defined by record ID = 2
20110502| 08:00 | 16:00 <- defined by record ID = 2
... many rows omitted for obvious reasons
20110531| 08:00 | 16:00 <- defined by record ID = 2
Actually, since you are working with dates, a Calendar table would be more helpful.
Declare @StartDate date
Declare @EndDate date
;With Calendar As
(
Select @StartDate As [Date]
Union All
Select DateAdd(d,1,[Date])
From Calendar
Where [Date] < @EndDate
)
Select ...
From Calendar
Left Join MyTable
On Calendar.[Date] Between MyTable.Start And MyTable.End
Option ( Maxrecursion 0 );
Addition
Missed the part about the trumping rule in your original post:
Set DateFormat MDY;
Declare @StartDate date = '20110101';
Declare @EndDate date = '20110501';
-- This first CTE is obviously to represent
-- the source table
With SampleData As
(
Select 1 As Id
, Cast('20110101' As date) As RangeFrom
, Cast('20110331' As date) As RangeTo
, Cast('07:00' As time) As Starts
, Cast('15:00' As time) As Ends
, CURRENT_TIMESTAMP As CreatedOn
Union All Select 2, '20110401', '20110531', '08:00', '16:00', DateAdd(s,1,CURRENT_TIMESTAMP )
Union All Select 3, '20110301', '20110430', '06:00', '14:00', DateAdd(s,2,CURRENT_TIMESTAMP )
)
, Calendar As
(
Select @StartDate As [Date]
Union All
Select DateAdd(d,1,[Date])
From Calendar
Where [Date] < @EndDate
)
, RankedData As
(
Select C.[Date]
, S.Id
, S.RangeFrom, S.RangeTo, S.Starts, S.Ends
, Row_Number() Over( Partition By C.[Date] Order By S.CreatedOn Desc ) As Num
From Calendar As C
Join SampleData As S
On C.[Date] Between S.RangeFrom And S.RangeTo
)
Select [Date], Id, RangeFrom, RangeTo, Starts, Ends
From RankedData
Where Num = 1
Option ( Maxrecursion 0 );
In short, I rank all the sample data preferring the newer rows that overlap the same date.
Why do it all in the DB when you can do it better in memory?
This is the solution I eventually used, and it seemed the most reasonable in terms of data transferred, speed and resources:
get the actual range definitions from the DB to the mid tier (a smaller amount of data)
generate an in-memory calendar for the required date range (faster than in the DB)
apply those DB range definitions to the in-memory calendar (much easier and faster than in the DB)
And that's it. I realised that complicating certain things in the DB is not worth it when you have executable in-memory code that can do the same manipulation faster and more efficiently.