I'm new to SQL and have proceeded mostly by trial and error, plus searching books and the internet. I have to repeat a query for the sum over monthly data for five years, and I'd like to insert the result for every month as a column in a table. I tried adding a new column for every month (ALTER TABLE ... ADD COLUMN, INSERT, etc.), but I can't get it right. Here's the code I used for jan07 and feb07:
CREATE TABLE "TVD_db"."lebendetiere"
(nuar text,
ak text,
sex text,
jan07 text,
feb07 text,
märz07 text,
april07 text,
mai07 text,
juni07 text,
juli07 text,
aug07 text,
sept07 text,
okt07 text,
nov07 text,
dez07 text,
jan08 text,
....
dez11 text);
INSERT INTO "TVD_db"."lebendetiere" (nuar, ak, sex, jan07)
SELECT
"AUFENTHALTE"."nuar",
CASE WHEN DATE ('2007-01-01')- DATE ("AUFENTHALTE"."gebdat") < 365 THEN '1' WHEN DATE('2007-01-01')- DATE ("AUFENTHALTE"."gebdat") > 730 THEN 3 ELSE 2 END AS AK,
CASE WHEN "AUFENTHALTE"."isweiblich" = 'T' THEN 'female' ELSE 'male' END AS sex,
COUNT("AUFENTHALTE"."tierid")
FROM "TVD_db"."AUFENTHALTE"
WHERE DATE("AUFENTHALTE"."gueltigvon") <= DATE('2007-01-01')
AND DATE("AUFENTHALTE"."gueltigbis") >= DATE('2007-01-01')
GROUP BY "AUFENTHALTE"."nuar",
CASE WHEN DATE ('2007-01-01')- DATE ("AUFENTHALTE"."gebdat") < 365 THEN '1' WHEN DATE ('2007-01-01')- DATE ("AUFENTHALTE"."gebdat") > 730 THEN 3 ELSE 2 END,
CASE WHEN "AUFENTHALTE"."isweiblich" = 'T' THEN 'female' ELSE 'male' END
ORDER BY "AUFENTHALTE"."nuar",
CASE WHEN DATE ('2007-01-01')- DATE ("AUFENTHALTE"."gebdat") < 365 THEN '1' WHEN DATE ('2007-01-01')- DATE ("AUFENTHALTE"."gebdat") > 730 THEN 3 ELSE 2 END,
CASE WHEN "AUFENTHALTE"."isweiblich" = 'T' THEN 'female' ELSE 'male' END
;
--until here it works fine
UPDATE "TVD_db"."lebendetiere" SET "feb07"= --this is the part I cant get right...
(SELECT
COUNT("AUFENTHALTE"."tierid")
FROM "TVD_db"."AUFENTHALTE"
WHERE DATE("AUFENTHALTE"."gueltigvon") <= DATE('2007-02-01')
AND DATE("AUFENTHALTE"."gueltigbis") >= DATE('2007-02-01')
GROUP BY "AUFENTHALTE"."nuar",
CASE WHEN DATE ('2007-02-01')- DATE ("AUFENTHALTE"."gebdat") < 365 THEN '1' WHEN DATE ('2007-02-01')- DATE ("AUFENTHALTE"."gebdat") > 730 THEN 3 ELSE 2 END,
CASE WHEN "AUFENTHALTE"."isweiblich" = 'T' THEN 'female' ELSE 'male' END
ORDER BY "AUFENTHALTE"."nuar",
CASE WHEN DATE ('2007-01-01')- DATE ("AUFENTHALTE"."gebdat") < 365 THEN '1' WHEN DATE ('2007-01-01')- DATE ("AUFENTHALTE"."gebdat") > 730 THEN 3 ELSE 2 END,
CASE WHEN "AUFENTHALTE"."isweiblich" = 'T' THEN 'female' ELSE 'male' END);
Has anyone a solution or do I have to make a table for every month and then join the results?
After reading your post thoroughly, here is a complete redesign that should hold some insight for beginners in the field of SQL / PostgreSQL.
I would advise not to use mixed case identifiers in PostgreSQL. Use lower case exclusively, then you don't have to double-quote them and your code is much easier to read. You also avoid a lot of possible confusion.
Use table aliases to make your code more readable.
Column names in the SELECT statement for the INSERT are irrelevant. That's why I commented them out (avoids possible naming conflicts).
Use ordinal numbers in GROUP BY and ORDER BY to further simplify.
Don't use a separate column for every new month. Use a column identifying the month and add a row per month.
If you actually need the design with one column per month, then you need a large CASE statement or a pivot query. Refer to the tablefunc extension. But this is complicated stuff for an SQL newbie. I really think you want a row per month.
I use generate_series() to generate one row per month between Jan 2007 and Dec 2011.
With my changed design, you don't need extra UPDATEs. It's all done in one INSERT.
I simplified quite a few other things. Here is what I would propose instead:
CREATE TABLE tvd_db.lebendetiere(
nuar text
,alterskat integer
,sex text
,datum date
,anzahl integer
);
INSERT INTO tvd_db.lebendetiere (nuar, alterskat, sex, datum, anzahl)
SELECT a.nuar
,CASE WHEN a.gebdat >= m.m - interval '1 year' THEN 1 -- use >= ! Age is measured against each month m.m
WHEN a.gebdat < m.m - interval '2 years' THEN 3
ELSE 2 END -- AS alterskat
,CASE WHEN a.isweiblich = 'T' THEN 'female' ELSE 'male' END -- AS sex
,m.m
,count(*) -- AS anzahl
FROM tvd_db.aufenthalte a
CROSS JOIN (
SELECT generate_series('2007-01-01'::date
,'2011-12-01'::date, interval '1 month')::date
) m(m)
WHERE a.gueltigvon <= m.m
AND a.gueltigbis >= m.m
GROUP BY a.nuar, 2, 3, m.m
ORDER BY a.nuar, 2, 3, m.m;
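To see what the generate_series() cross join does, here is a minimal Python sketch (not part of the original answer) that builds the same list of month starts and counts, for each one, the stays that cover it. The sample stays and the helper names month_starts and count_alive are invented for illustration.

```python
from datetime import date

def month_starts(first, last):
    """Yield the first day of every month from `first` to `last`, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1

def count_alive(stays, months):
    """Map each month start to the number of stays with valid_from <= month start <= valid_to."""
    return {m: sum(1 for start, end in stays if start <= m <= end) for m in months}

# Hypothetical (gueltigvon, gueltigbis) pairs, made up for this example.
stays = [
    (date(2006, 11, 15), date(2007, 2, 10)),
    (date(2007, 1, 1), date(2007, 1, 1)),
    (date(2007, 2, 1), date(2011, 12, 31)),
]
months = list(month_starts(date(2007, 1, 1), date(2011, 12, 1)))
counts = count_alive(stays, months)
```

Each entry of counts corresponds to one row of the proposed table (one row per month) rather than to one column per month.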
I have a lot of data with several date columns (date begin, date end, date activation, etc.). I would like to select the rows that were active at each month end and return that month end as a period column.
I want only results where:
Date Activation <= (last day of the month) and Date End > (last day of the month)
+ return the column containing the period
If I select a unique period:
select "client Name","Program" from "database"."schema"."table"
WHERE "Date Activation" <= '2020-12-31' AND "Date End" > '2020-12-31'
The aim is to retrieve results like this (I want it for all periods in my table):
client Name | Program   | period
------------+-----------+------------
client 1    | program 1 | 2020/11/30
client 2    | program 2 | 2020/12/31
client 3    | program 3 | 2020/12/31
client 3    | program 3 | 2021/01/31
client 1    | program 1 | 2021/01/31
client 2    | program 4 | 2021/02/28
This should achieve what you want:
-- set parameter to be used as generator "constant" including the start day
-- set start and end dates to match the date range you want to report on
set num_days = (Select datediff(day, TO_DATE('2020-01-01','YYYY-MM-DD'), current_date()+1));
-- generate all the dates between the start and end dates
with date_list as (
select
dateadd(
day,
'-' || row_number() over (order by null),
dateadd(day, '+1', current_date())
) as date_val
from table (generator(rowcount => ($num_days)))
),
-- create a distinct list of month-end dates from the list of dates
month_list as (
select distinct last_day(date_val) as month_end
from date_list
)
-- Join the list of month-end dates to your data table
select
cpd.client_name
,cpd.program
,ml.month_end
from month_list ml
inner join client_project_data cpd on cpd.Date_Activation <= ml.month_end and cpd.Date_End > ml.month_end;
-- clean up previously set variable
-- unset num_days;
I believe this is how you would get the first two columns:
SELECT DISTINCT "client Name", "Program"
FROM "database"."schema"."table"
WHERE "Date Activation" < "Date End" AND LAST_DAY("Date Activation") <> LAST_DAY("Date End")
But with the third one you will have to get creative.
If the difference between "Date Activation" and "Date End" can only be one month, then LAST_DAY("Date Activation") would do it.
But if the difference is bigger, then you will probably need to list two or more month ends. You should form some kind of array of the month ends that fall between "Date Activation" and "Date End", and then you would need to make separate rows out of that array.
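The "array of month ends" idea can be sketched outside the database like this. It is only an illustration in Python, assuming the condition from the question (Date Activation <= month end and Date End > month end); the function names are hypothetical.

```python
import calendar
from datetime import date

def month_end(year, month):
    """Last calendar day of the given month."""
    return date(year, month, calendar.monthrange(year, month)[1])

def month_ends_between(activation, end):
    """Return every month end m with activation <= m < end, in ascending order."""
    result = []
    year, month = activation.year, activation.month
    while True:
        m = month_end(year, month)
        if m >= end:
            break
        # m is the end of the activation month or a later month, so activation <= m holds.
        result.append(m)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return result

periods = month_ends_between(date(2020, 11, 15), date(2021, 2, 10))
```

Each element of periods would become one output row for that client and program.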
I hope someone can help with a calculation that I am having trouble developing.
I am developing a report in a DB2 database that I need to add "X" number of days to a "RECEIVED" date/time when an order comes in between X and Y; but exclude Weekends and Holidays to add to the received date. I have created a [TBLCALENDAR] that lists the Weekends and Holidays (Example below); and from this, I want to ADD X number of days to a "DUEDATE"
[tblCalendar]
DATE       DAYOFWK  DAY       HOLIDAY
1/19/2019  7        Saturday
1/20/2019  1        Sunday
1/21/2019  2        Monday    YES
So, for example 1, if I have an order that is placed on 1/18/2019 at 4:01pm; the due date should be 1/23/2019 at 11:00am.
Example 2: if I have an order that is placed on 1/18/2019 at
Conditions are:
Previous Date 4:01pm to Current Date 11:00am = Due Date should be + "X" business days by 11:00am
If order received Current day by 4:00pm = Due Date should be + "X" business days by 4:00pm
I have tried to reference the tblCalendar to get the [Received] date/time and add X number of days based off of an order, but it's not functioning the way I have hoped.
I have used the following code...but it doesn't exclude Weekends or Holidays when adding the specified number of days or have my order time requirement to take into account previous day after 4:00pm to current date of 11:00am:
RECEIVEDDATETIME + 2 days as DUEDATE;
I have also used the below code to reference TBLCALENDAR to find the # of holidays and weekend days in a date range:
( SELECT COUNT (*) FROM TBLCALENDAR AS C WHERE C.HOLIDAY = 'YES'
AND C.DATE BETWEEN TBLORDERS.RECEIVEDDATETIME
AND TBLORDERS.DUEDATETIME) +
(SELECT COUNT (*) FROM TBLCALENDAR
WHERE DAYOFWK IN (1,7)
AND DATE BETWEEN TBLORDERS.RECEIVEDDATETIME
AND TBLORDERS.UPLOADTIME) AS NONWORKINGDAYS
Expected field output
If order was received between 1/17/2019 4:01pm to 1/18/2019 10:59am = 1/23/2019 11:00am
If order received Current day by 4:00pm 1/18/2019 3:59am= 1/23/2019 by 4:00pm.
RECEIVEDDATETIME DUEDATE
1/17/2019 4:01pm 1/23/2019 11:00am
1/18/2019 10:00am 1/23/2019 4:00pm
Here is a solution without the time logic.
with tblCalendar(DATE, DAYOFWK, DAY, HOLIDAY) as (values
(date('2019-01-19'), 7, 'Saturday', '')
, (date('2019-01-20'), 1, 'Sunday', '')
, (date('2019-01-21'), 2, 'Monday', 'YES')
, (date('2019-01-22'), 3, 'Tuesday', '')
, (date('2019-01-23'), 4, 'Wednesday', 'YES')
, (date('2019-01-24'), 5, 'Thursday', '')
, (date('2019-01-25'), 6, 'Friday', '')
, (date('2019-01-26'), 7, 'Saturday', '')
)
, mytab (RECEIVEDDATE, DAYS2ADD) as (values
(date('2019-01-19'), 2)
, (date('2019-01-20'), 2)
, (date('2019-01-21'), 2)
, (date('2019-01-22'), 2)
)
select m.*, t.date as DUEDATE
--, dayofweek(date) as DAYOFWK, dayname(date) as DAY
from mytab m
, table
(
select date
from table
(
select
date
, sum(case when HOLIDAY='YES' or dayofweek(date) in (7,1) then 0 else 1 end) over (order by date) as dn_
from tblCalendar t
where t.date > m.RECEIVEDDATE
)
where dn_ = m.DAYS2ADD
fetch first 1 row only
) t;
The idea is to enumerate each day of the calendar after the RECEIVEDDATE (1-st parameter) starting from 1 with the following logic: the number of each day increases by 1 if it's non-holiday non-weekend day (the sum(...) over(...) expression).
Finally, we select a date with the corresponding number of days needed to add (2-nd parameter).
Solution idea:
Your tblCalendar is a good idea, but I recommend adding the working-day information instead of (only) flagging the holidays and weekends. The problem with counting "off days" is that once you have figured out how many of them lie between your receive date and the receive date + X days, you cannot simply add them, because the extended period could contain further "off days".
By numbering all the work days you can identify the work day that is closest to the receive date (equal or later). Retrieve its number and add the X days to that number. Then retrieve the date that has this work day number, and you are done.
The time logic should be applied before all of that, because it could add another day to the X days.
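Here is a rough Python sketch of the work-day numbering idea, without the time logic. The calendar range, the single holiday and the function names are assumptions made for the example; in the database, the numbering would be an extra column in tblCalendar.

```python
from datetime import date, timedelta

def number_work_days(first, last, holidays):
    """Map each working day in [first, last] to a running work-day number (1, 2, 3, ...)."""
    numbers = {}
    n = 0
    day = first
    while day <= last:
        # weekday() is 0..4 for Monday..Friday
        if day.weekday() < 5 and day not in holidays:
            n += 1
            numbers[day] = n
        day += timedelta(days=1)
    return numbers

def add_business_days(received, days_to_add, numbers):
    """Take the closest work day on or after `received`, then move `days_to_add` work days ahead."""
    by_number = {n: d for d, n in numbers.items()}
    start = min(d for d in numbers if d >= received)
    return by_number[numbers[start] + days_to_add]

# Hypothetical calendar: two and a half weeks in January 2019 with one holiday.
numbers = number_work_days(date(2019, 1, 14), date(2019, 1, 31), {date(2019, 1, 21)})
due_from_friday = add_business_days(date(2019, 1, 18), 2, numbers)
due_from_saturday = add_business_days(date(2019, 1, 19), 2, numbers)
```

Because the lookup goes through the work-day number, weekends and holidays inside the added period are skipped automatically, however many there are.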
I have a student_table with a column student_financial_aid_type followed by a column date_, so for example student_financial_aid_type = 'direct' and date_ = 1/04/2018. I use CTEs, and I have a parameter date at the beginning of the code so that I get the number of students as of that day, e.g. my parameter date is 20/04/2019.
My financial year runs from April to March, e.g. 1/04/18 - 31/3/19.
My question: where the data indicates that the student received some form of financial aid in the financial year, I want an output column that says either 'Y' or 'N'. Using the example above, the date 1/04/2018 is not in the financial year of the parameter date (20/04/19); it is in the previous financial year (1/04/18 - 31/3/19). So I would want 'N' in the output column, because the student did not receive any financial aid in the financial year of the parameter date. However, if I change the parameter date to 2/06/18, then the date the student received the financial aid (1/04/18) is in the same financial year as the parameter date, so the output column should show 'Y'. Whatever I do has to be dynamic and respond to the parameter date, since that is the value I as the user will be changing as and when needed.
I have tried using date_part and I have managed to have the month number of the date that the student received the payout, from this point on I was thinking of using the month number as an indicator to what FY year it falls in, but I am not sure how to go about this.
WITH
parameter_date as (
select '2019-04-26':: date p_date),
student_cohort as (select * from (
SELECT Distinct
ms.studentid,ss.student_admission_date,ms.graduation_date
FROM master_student_table ms
left join student_semeter ss on ms.student_id=ss.student_id
cross join parameter_date p
WHERE ss.student_admission_date <= p.p_date -- i.e. began studies on or before p_date
AND (ms.graduation_date is null or ms.graduation_date > p.p_date) -- i.e. student finished studies after p_date, or has not finished (NULL)
)x ),
student_finance as (select * from ( select date_part('month', st.date_::date) date_part,
st.date_, st.studentid,st.student_financial_aid_type
from student_table st
left join student_cohort s on st.studentid = s.studentid
where st.student_financial_aid_type in ('direct' , 'indirect')
) x )
select distinct
s.student_id,
s.graduation_date,
s.admissiondate_date,
sf.date_,
-- this is what I would like it to be -- case when sf.date is in the same
--financial year as the parameter_date
--then 'Y' else 'N' end was_financial_aid_received_in_the_fy,
sf.date_part
from
student_cohort s
left join student_finance sf on s.student_id = sf.student_id and
sf.student_financial_aid_type = 'direct'
left join student_finance sf1 on s.student_id = sf1.student_id and
sf1.student_financial_aid_type = 'indirect'
I would love for the output column 'was_financial_aid_received_in_the_fy' from the case statement, to have 'Y' if the sf.date_ that the student received financial aid is in the same FY year as the parameter_date and 'N' if this isn't the case
Thank you very much for all your help
I think this question basically boils down to the following:
Given a parameter date, figure out the financial year for that date.
Figure out if other dates fall in this financial year.
This is a great place to use dateranges, one of my favorite types. We can figure out the financial year from the parameter date and use a daterange to represent it. If the parameter date is before April, the financial year runs from April 1 of the previous year (inclusive) to April 1 of this year (exclusive). If the parameter date is in April or later, the financial year runs from April 1 of this year (inclusive) to April 1 of next year (exclusive).
Here's a query that should demonstrate how to do this:
WITH parameter_date as (
select '2019-04-26'::date p_date
), fiscal_year as (
select daterange(
make_date(case when date_part('month', p_date)<4
THEN date_part('year', p_date)::int-1
ELSE date_part('year', p_date)::int END,
4, 1),
make_date(case when date_part('month', p_date)<4
THEN date_part('year', p_date)::int
ELSE date_part('year', p_date)::int+1 END,
4, 1),
'[)') as f_year
FROM parameter_date
),
test_data as (
select test_date::date from (values
('2019-04-01'),
('2018-04-01'),
('2019-03-02'),
('2020-12-01'),
('2017-05-26'),
('2020-02-27'),
('2020-04-01')
) v(test_date)
)
select test_date,
CASE WHEN test_date <# fiscal_year.f_year THEN 'Y' ELSE 'N' END as in_f_year
from test_data, fiscal_year;
test_date | in_f_year
------------+-----------
2019-04-01 | Y
2018-04-01 | N
2019-03-02 | N
2020-12-01 | N
2017-05-26 | N
2020-02-27 | Y
2020-04-01 | N
(7 rows)
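The same rule can be checked outside the database. This is a small Python sketch of the half-open range logic, assuming an April-to-March financial year as in the question; fiscal_year and in_fiscal_year are illustrative names, not anything from PostgreSQL.

```python
from datetime import date

def fiscal_year(p_date):
    """Return (start, end) of the April-to-March financial year containing p_date; end is exclusive."""
    start_year = p_date.year - 1 if p_date.month < 4 else p_date.year
    return date(start_year, 4, 1), date(start_year + 1, 4, 1)

def in_fiscal_year(d, p_date):
    """'Y' if d falls in the financial year of p_date, else 'N'."""
    start, end = fiscal_year(p_date)
    return "Y" if start <= d < end else "N"

# The example from the question: aid received on 1/04/2018.
flag_2019 = in_fiscal_year(date(2018, 4, 1), date(2019, 4, 20))
flag_2018 = in_fiscal_year(date(2018, 4, 1), date(2018, 6, 2))
```

The comparison start <= d < end plays the role of the '[)' bounds of the daterange.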
I'm migrating a query of Oracle pivot to PostgreSQL crosstab.
create table x_c(cntry numeric,week numeric,year numeric,days text,day text);
insert into x_c values(1,15,2015,'DAY1','MON');
...
insert into x_c values(1,15,2015,'DAY7','SUN');
insert into x_c values(2,15,2015,'DAY1','MON');
...
insert into x_c values(4,15,2015,'DAY7','SUN');
I have 4 weeks with 28 rows like this in a table. My Oracle query looks like this:
SELECT * FROM(select * from x_c)
PIVOT (MIN(DAY) FOR (DAYS) IN
('DAY1' AS DAY1 ,'DAY2' DAY2,'DAY3' DAY3,'DAY4' DAY4,'DAY5' DAY5,'DAY6' DAY6,'DAY7' DAY7 ));
Result:
cntry|week|year|day1|day2|day3|day4|day5|day6|day7|
---------------------------------------------------
1 | 15 |2015| MON| TUE| WED| THU| FRI| SAT| SUN|
...
4 | 18 |2015| MON| ...
Now I have written a Postgres crosstab query like this:
select *
from crosstab('select cntry,week,year,days,min(day) as day
from x_c
group by cntry,week,year,days'
,'select distinct days from x_c order by 1'
) as (cntry numeric,week numeric,year numeric
,day1 text,day2 text,day3 text,day4 text, day5 text,day6 text,day7 text);
I'm getting only one row as output:
1|17|2015|MON|TUE| ... -- only this row is coming
What am I doing wrong?
ORDER BY was missing in your original query. The manual:
In practice the SQL query should always specify ORDER BY 1,2 to ensure that the input rows are properly ordered, that is, values with the same row_name are brought together and correctly ordered within the row.
More importantly (and more tricky), crosstab() requires exactly one row_name column. Detailed explanation in this closely related answer:
Crosstab splitting results due to presence of unrelated field
The solution you found is to nest multiple columns in an array and later unnest again. That's needlessly expensive, error prone and limited (only works for columns with identical data types or you need to cast and possibly lose proper sort order).
Instead, generate a surrogate row_name column with rank() or dense_rank() (rnk in my example):
SELECT cntry, week, year, day1, day2, day3, day4, day5, day6, day7
FROM crosstab (
'SELECT dense_rank() OVER (ORDER BY cntry, week, year)::int AS rnk
, cntry, week, year, days, day
FROM x_c
ORDER BY rnk, days'
, $$SELECT unnest('{DAY1,DAY2,DAY3,DAY4,DAY5,DAY6,DAY7}'::text[])$$
) AS ct (rnk int, cntry int, week int, year int
, day1 text, day2 text, day3 text, day4 text, day5 text, day6 text, day7 text)
ORDER BY rnk;
I use the data type integer for the output columns cntry, week, year because that seems to be the appropriate (and cheaper) type. You can also use numeric like you had it.
Basics for crosstab queries here:
PostgreSQL Crosstab Query
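To illustrate why the row name has to identify the whole (cntry, week, year) combination, here is a plain Python sketch of the pivot. It is not how crosstab() is implemented, just the same grouping idea on made-up rows, with a hypothetical function name.

```python
def pivot(rows, categories):
    """Pivot (cntry, week, year, days, day) rows into one output row per (cntry, week, year)."""
    table = {}
    for cntry, week, year, days, day in rows:
        # The full tuple is the row key, like the surrogate rnk column in the query above.
        row = table.setdefault((cntry, week, year), dict.fromkeys(categories))
        current = row[days]
        row[days] = day if current is None else min(current, day)
    return [key + tuple(row[c] for c in categories) for key, row in sorted(table.items())]

# Made-up sample rows in the shape of x_c.
rows = [
    (1, 15, 2015, "DAY1", "MON"),
    (1, 15, 2015, "DAY2", "TUE"),
    (2, 15, 2015, "DAY1", "MON"),
]
result = pivot(rows, ["DAY1", "DAY2"])
```

If only cntry were used as the key, rows from different weeks would be merged into one output row.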
I got this figured out from http://www.postgresonline.com/journal/categories/24-tablefunc
select year_wk_cntry.t[1],year_wk_cntry.t[2],year_wk_cntry.t[3],day1,day2,day3,day4,day5,day6,day7
from crosstab('select ARRAY[cntry :: numeric,week,year] as t,days,min(day) as day
from x_c group by cntry,week,year,days order by 1,2
','select distinct days from x_c order by 1')
as year_wk_cntry (t numeric[],day1 text,day2 text,day3 text,
day4 text, day5 text,day6 text,day7 text);
thanks!!
I am creating a Customer table and I want one of the attributes to be the expiry date of a credit card. I want the format to be 'Month Year'. What data type should I use? I want to use date, but the format is year/month/day. Is there any other way to restrict the format to only month and year?
You can constrain the date to the first day of the month:
create table customer (
cc_expire date check (cc_expire = date_trunc('month', cc_expire))
);
Now this fails:
insert into customer (cc_expire) values ('2014-12-02');
ERROR: new row for relation "customer" violates check constraint "customer_cc_expire_check"
DETAIL: Failing row contains (2014-12-02).
And this works:
insert into customer (cc_expire) values ('2014-12-01');
INSERT 0 1
But it does not matter what day is entered. You will only check the month:
select
date_trunc('month', cc_expire) > current_date as valid
from customer;
valid
-------
t
Extract year and month separately:
select extract(year from cc_expire) "year", extract(month from cc_expire) "month"
from customer
;
year | month
------+-------
2014 | 12
Or concatenated:
select to_char(cc_expire, 'YYYYMM') "month"
from customer
;
month
--------
201412
Use either
char(5) for two-digit years, or
char(7) for four-digit years.
Code below assumes two-digit years, which is the form that matches all my credit cards. First, let's create a table of valid expiration dates.
create table valid_expiration_dates (
exp_date char(5) primary key
);
Now let's populate it. This code is just for 2013. You can easily adjust the range by changing the starting date (currently '2013-01-01'), and the "number" of months (currently 11, which lets you get all of 2013 by adding from 0 to 11 months to the starting date).
with all_months as (
select '2013-01-01'::date + (n || ' months')::interval months
from generate_series(0, 11) n
)
insert into valid_expiration_dates
select to_char(months, 'MM') || '/' || to_char(months, 'YY') exp_date
from all_months;
Now, in your data table, create a char(5) column, and set a foreign key reference from it to valid_expiration_dates.exp_date.
While you're busy with this, think hard about whether "exp_month" might be a better name for that column than "exp_date". (I think it would.)
As another idea you could essentially create some brief utilities to do this for you using int[]:
CREATE OR REPLACE FUNCTION exp_valid(int[]) returns bool LANGUAGE SQL IMMUTABLE as
$$
SELECT ($1[1] BETWEEN 1 AND 12) AND (select count(*) = 2 FROM unnest($1)); -- month must be 1..12, and the array must have exactly two elements
$$;
CREATE OR REPLACE FUNCTION first_invalid_day(int[]) RETURNS date LANGUAGE SQL IMMUTABLE AS
$$
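-- zero-pad year and month so that e.g. {12,2} is read as '0212', not '212'
SELECT (to_date(to_char($1[2], CASE WHEN $1[2] < 100 THEN 'FM00' ELSE 'FM0000' END) || to_char($1[1], 'FM00'),
CASE WHEN $1[2] < 100 THEN 'YYMM' ELSE 'YYYYMM' END) + '1 month'::interval)::date;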
$$;
These work:
postgres=# select exp_valid('{04,13}');
exp_valid
-----------
t
(1 row)
postgres=# select exp_valid('{13,04}');
exp_valid
-----------
f
(1 row)
postgres=# select exp_valid('{04,13,12}');
exp_valid
-----------
f
(1 row)
Then we can convert these into a date:
postgres=# select first_invalid_day('{04,13}');
first_invalid_day
-------------------
2013-05-01
(1 row)
This use of arrays does not violate any normalization rules because the array as a whole represents a single value in its domain. We are storing two integers representing a single date. '{12,2}' is December of 2002, while '{2,12}' is Feb of 2012. Each represents a single value of the domain and is therefore perfectly atomic.
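For comparison, the same two checks can be sketched in Python on a (month, year) pair. This is only an illustration of the logic; mapping two-digit years to 20xx is an assumption of the sketch, not the exact to_date() rule.

```python
from datetime import date

def exp_valid(pair):
    """True if pair holds exactly two integers and the first is a month number 1..12."""
    return len(pair) == 2 and 1 <= pair[0] <= 12

def first_invalid_day(pair):
    """First day after the expiry month of a (month, year) pair."""
    month, year = pair
    if year < 100:
        year += 2000  # assumption: two-digit years mean 20xx
    if month == 12:
        return date(year + 1, 1, 1)
    return date(year, month + 1, 1)

april_2013 = first_invalid_day((4, 13))
december_2002 = first_invalid_day((12, 2))
```

The pair is treated as one value here as well: (12, 2) and (2, 12) are different expiry months, not two independent attributes.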