I have problem select data between two dates if the only start_date is available.
The example I want to see is what discount_nr was active between 2020-07-01 and 2020-07-15 or only one day 2020-07-14. I tried different solutions, date range, generate series, and so on, but was still not able to get it to work.
Table only have start dates, no end dates
Example:
discount_nr, start_date
1, 2020-06-30
2, 2020-07-03
3, 2020-07-10
4, 2020-07-15
You can get the end dates by looking at the start date of the next row. This is done with lead. lead(start_date) over(order by start_date asc) will get you the start_date of the next row. If we take 1 day from that we'll get the inclusive end date.
Rather than separate start/end columns, a single daterange column is easier to work with. You can use that as a CTE or create a view.
create view discount_durations as
select
id,
daterange(
start_date,
lead(start_date) over(order by start_date asc)
) as duration
from discounts
Now querying it is easy using range operators. #> to check if the range contains a date.
select *
from discount_durations
where duration #> '2020-07-14'::date
And use && to see if they have any overlap.
select *
from discount_durations
where duration && daterange('2020-07-01', '2020-07-15');
Demonstration
Related
In my structure I have the following, I would like to keep (yellow) the most recent dates and delete the remaining? I don't necessary know the most recent date (ie 17/4/2021 and 10/2/2021 in my example) for each stock_id but I know I want to keep only the two most recent items.
Is that possible?
Thank you
Note: this assumes that dates do not repeat within each stock_id group in your table, so top two dates are always unique.
You can assign rank to each row within stock_id after ordering by date and delete rows where rank is greater than 2.
DELETE FROM mytable
WHERE (stock_id, date) NOT IN (
SELECT
stock_id,
date
FROM (
SELECT
stock_id,
date,
row_number() over (partition by stock_id order by date desc) as rank
FROM mytable
) ranks
WHERE rank <= 2
)
I have a SELECT-query which gives me the aggregated sum(minutes_per_hour_used) of some stuff. Grouped by id, weekday and observed hour.
SELECT id,
extract(dow from observed_date) AS weekday, ( --observed_date is type date
observed_hour, -- is type timestamp without timezone, every full hour 00:00:00, 01:00:00, ...
sum(minutes_per_hour_used)
FROM base_table
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;
The result looks nice, but now I would like to store that in a self-maintained view, which only considers/aggregates the last 8 weeks. I thought contiouus aggregates are the right way, but I can't make it work (https://blog.timescale.com/blog/continuous-aggregates-faster-queries-with-automatically-maintained-materialized-views/). It seems I need to somehow use the time_bucket-function, but actually I don't know how. Any ideas/hints?
I am using postgres with timescaledb.
EDIT: This gives me the desired output, but I can't put it in a continouus aggregate
SELECT id,
extract(dow from observed_date) AS weekday,
observed_hour,
sum(minutes_per_hour_used)
FROM base_table
WHERE observed_date >= now() - interval '8 weeks'
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;
EDIT: Prepend this with
CREATE VIEW my_view
WITH (timescaledb.continuous) AS
gives me [0A000] ERROR: invalid SELECT query for continuous aggregate
Continuous aggregates require grouping by time_bucket:
SELECT <grouping_exprs>, <aggregate_functions>
FROM <hypertable>
[WHERE ... ]
GROUP BY time_bucket( <const_value>, <partition_col_of_hypertable> ),
[ optional grouping exprs>]
[HAVING ...]
It should be applied to a partitioned column, which is usually the time dimension column used in the hypertable creation. Also ORDER BY is not supported.
In the case of the aggregate query in the question no time column is used for grouping. Neither weekday nor observed_hour are time valid columns, since they don't increase as time, instead their values are repeat regularly. weekday repeats every 7 days and observed_hour repeats every 24 hours. This breaks requirements for continuous aggregates.
Since there is no ready solution for this use case, one approach is to use a continuous aggregate to reduce the amount of data for the targeted query, e.g., by bucketing by day:
CREATE MATERIALIZED VIEW daily
WITH (timescaledb.continuous) AS
SELECT id,
time_bucket('1day', observed_date) AS day,
observed_hour,
sum(minutes_per_hour_used)
FROM base_table
GROUP BY 1, 2, 3;
Then execute the targeted aggregate query on top of it:
SELECT id,
extract(dow from day) AS weekday,
observed_hour,
sum(minutes_per_hour_used)
FROM daily
WHERE day >= now() - interval '8 weeks'
GROUP BY id, weekday, observed_hour
ORDER BY id, weekday, observed_hour;
Another approach is to use PostgreSQL's materialized views and refresh it on regular basis with help of custom jobs, which is run by the job scheduling framework of TimescaleDB. Note that the refresh will re-calculate entire view, which in the example case covers 8 weeks of data. The materialized view can be written in terms of the original table base_table or in terms of the continuous aggregate suggested above.
I need to find a date that is 11 business days after a date.
I did not have a date table. Requested one, long lead time for one.
Used a CTE to produce results that have a datekey, 1 if weekday, and 1 if holiday, else 0. Put those results into a Table Variable, now Business_Day is (weekday-holiday). Much Googling has already happened.
select dt.Datekey,
(dt.Weekdaycount - dt.HolidayCount) as Business_day
from #DateTable dt[enter image description here][1]
UPDATE, I've figured it out in Excel. Running count of business days, a column of business day count + 11, then a Vlookup finding the +11 date . Now how do I do that in SQL?
Results like this
Datekey
2019-01-01
Business_day 0
Datekey
2019-01-02
Business_day
1
I will assume you want to set your weekdays, and you can enter the holidays in a variable table, so you can do the below:-
here set the weekend names
Declare #WeekDayName1 varchar(50)='Saturday'
Declare #WeekDayName2 varchar(50)='Sunday'
Set the holiday table variable, you may have it as a specific table your database
Declare #Holidays table (
[Date] date,
HolidayName varchar(250)
)
Lets insert a a day or two to test it.
insert into #Holidays values (cast('2019-01-01' as date),'New Year')
insert into #Holidays values (cast('2019-01-08' as date),'some other holiday in your country')
lets say your date you want to start from is action date and you need 11 business days after it
Declare #ActionDate date='2018-12-28'
declare #BusinessDays int=11
A recursive CTE to count the days till you get the correct one.
;with cte([date],BusinessDay) as (
select #ActionDate [date],cast(0 as int) BusinessDay
union all
select dateadd(day,1,cte.[date]),
case
when DATENAME(WEEKDAY,dateadd(day,1,cte.[date]))=#WeekDayName1
OR DATENAME(WEEKDAY,dateadd(day,1,cte.[date]))=#WeekDayName2
OR (select 1 from #Holidays h where h.Date=dateadd(day,1,cte.[date])) is not null
then cte.BusinessDay
else cte.BusinessDay+1
end BusinessDay
From cte where BusinessDay<#BusinessDays
)
--to see the all the dates till business day + 11
--select * from cte option (maxrecursion 0)
--to get the required date
select MAX([date]) from cte option (maxrecursion 0)
In my example the date I get is as below:-
ActionDate =2018-12-28
After 11 business days :2019-01-16
Hope this helps
1st step was to create a date table. Figuring out weekday verse weekends is easy. Weekdays are 1, weekends are 0. Borrowed someone else's holiday calendar, if holiday 1 else 0. Then Business day is Weekday-Holiday = Business Day. Next was to create a running total of business days. That allows you to move from whatever running total day you're current on to where you want to be in the future, say plus 10 business days. Hard coded key milestones in the date table for 2 and 10 business days.
Then JOIN your date table with your transaction table on your zero day and date key.
Finally this allows you to make solid calculations of business days.
WHERE CONVERT(date, D.DTRESOLVED) <= CONVERT(date, [10th_Bus_Day])
I'm creating a subscription management system, and need to generate a list of upcoming billing date for the next 2 years. I've been able to use generate_series to get the appropriate dates as such:
SELECT i::DATE
FROM generate_series('2015-08-01', '2017-08-01', '1 month'::INTERVAL) i
The last step I need to take is exclude specific date ranges from the calculation. These excluded date ranges may be any range of time. Additionally, they should not be factored into the time range for the generate_series.
For example, say we have a date range exclusion from '2015-08-27' to '2015-09-03'. The resulting generate_series should exclude the date that week from the calculation, and basically push all future month billing dates one week to the future:
2015-08-01
2015-09-10
2015-10-10
2015-11-10
2015-12-10
First you create a time series of dates over the next two years, EXCEPT your blackout dates:
SELECT dt
FROM generate_series('2015-08-01'::date, '2017-08-01'::date, interval '1 day') AS s(dt)
EXCEPT
SELECT dt
FROM generate_series('2015-08-27'::date, '2015-09-03'::date, interval '1 day') as ex1(dt)
Note that you can have as many EXCEPT clauses as you need. For individual blackout days (as opposed to ranges) you could use a VALUES clause instead of a SELECT.
Then you window over that time-series to generate row numbers of billable days:
SELECT row_number() OVER (ORDER BY dt) AS rn, dt
FROM (<query above>) x
Then you select those days where you want to bill:
SELECT dt
FROM (<query above>) y
WHERE rn % 30 = 1; -- billing on the first day of the period
(This latter query following Craig's advice of billing by 30 days)
Yields:
SELECT dt
FROM (
SELECT row_number() OVER (ORDER BY dt) AS rn, dt
FROM (
SELECT dt
FROM generate_series('2015-08-01'::date, '2017-08-01'::date, interval '1 day') AS s(dt)
EXCEPT
SELECT dt
FROM generate_series('2015-08-27'::date, '2015-09-03'::date, interval '1 day') as ex1(dt)
) x
) y
WHERE rn % 30 = 1;
You will have to split the call to generate series for exclusions. Some thing like this:
Union of 3 queries
First query pulls dates from start to exclusion range from
Second query pulls dates between exclusion range to and your end date
Third query pulls dates when none of your series dates cross exclusion range
Note: You still need a way to loop through exclusion list (if you have one). Also this query may not be very efficient as such scenarios can be better handled through functions or procedural code.
I have a table with a row for each sale of a product. These rows include a date. I want to know the number of products sold for each distinct day in a range (user specifies the begin and end dates.) There is a distinct row for each sale, so on days where several products were sold, several rows with this date exist. I want to count the number of these rows, with the same date. How might this be done efficiently in postgresql?
Example
2015-01-02: 0
2015-01-03: 7
2015-01-04: 2
Assuming the date column is stored as a datetime, something like this should get you in the right direction:
SELECT date_trunc('day', TIMESTAMP sales.date) as day, COUNT(*)
FROM sales
WHERE day >= start and <= end
GROUP BY day;
where start and end are filled in by the user.
More documentation for postgres's date features found here:
http://www.postgresql.org/docs/9.4/static/functions-datetime.html
If they are not datetime, you can use the to_date function with the appropriate format string to convert to datetime and the use the solution above:
SELECT date_trunc('day', to_date('2015-01-02', 'YYYY-MM-DD'));
http://www.postgresql.org/docs/9.4/static/functions-formatting.html