Calculating the last 5 years from the current date in Hive

I need to calculate a count over a given time frame: the dates between the current date and five years before it.
select count(*) from table where (year(current_date) -year('2015-12-01')) < 5 ;
The query above gives counts for the last 5 years, but it compares only the year part. I need exact counts that take days into account, so if I write
select count(*) from table where datediff(current_date,final_dt) <= 1825 ;
it does not account for any leap years in the last 5 years.
Is there a function in Hive that calculates the exact difference between two dates and handles cases such as leap years?

Use add_months function (assuming the dates should go back to 2013-05-25 with the current date being 2018-05-25).
select count(*)
from table
where final_dt >= add_months(current_date,-60) and final_dt <= current_date

I think you are trying to count all records between current_date and a date five years before it. In that case you can do something like this (BETWEEN expects the lower bound first, so the earlier date goes first):
SELECT count(*) FROM table_1 WHERE date_column BETWEEN to_date(CONCAT(YEAR(current_date) - 5, '-', MONTH(current_date), '-', DAY(current_date))) AND current_date;
And
SELECT datediff(current_date, to_date(CONCAT(YEAR(current_date) - 5, '-', MONTH(current_date), '-', DAY(current_date))));
gives you 1826 for a current date such as 2018-05-25 (the span includes the leap day in 2016).
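As a cross-check on the leap-year arithmetic above, here is a minimal Python sketch (not Hive) that steps back five calendar years and counts the days in between. The helper name years_back and the fallback from Feb 29 to Feb 28 are assumptions made for illustration; Hive's own handling of that edge case may differ.

```python
from datetime import date

def years_back(d: date, years: int) -> date:
    """Return the same calendar day `years` earlier (hypothetical helper)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return d.replace(year=d.year - years, day=28)

today = date(2018, 5, 25)
start = years_back(today, 5)
# The day count includes the leap day of 2016, so it is not 5 * 365.
print(start, (today - start).days)
```

A fixed 1825-day window would start one day too late here, which is why comparing against a calendar-based start date is the safer filter.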

Related

Grouping data by quarter intervals (or any time interval) with a defined starting basis in postgresql

Let's say I have a table orders with columns amount and order_date.
I want to be able to group this data by quarters and aggregate the amount, the catch however is that the quarters do not start on January 1st but on any given arbitrary date, say July 12th. These quarters are also split in 13 week intervals. From what I see using something like date_trunc such as:
SELECT SUM(orders.amount), DATE_TRUNC('quarter', orders.order_date) AS interval FROM orders WHERE orders.order_date BETWEEN [date_start] AND [date_end] GROUP BY interval
is out of the question as this forces quarters to start on Jan 1st and it has 'hardcoded' quarter starting dates (Apr 1st, Jul 1st, etc).
I have tried using something like:
SELECT SUM(orders.amount),
to_timestamp(floor(extract('epoch' from orders.order_date) / 7862400) * 7862400) AT TIME ZONE 'UTC' AS interval
FROM orders
WHERE orders.order_date BETWEEN [date_start] AND [date_end]
GROUP BY interval
(where 7862400 is the time interval that I want)
But with this method I cannot figure out how to set the offset for the initial grouping date, in my example I would like it to start from July 12th of each year (then count 13 weeks and start the next quarter, and so on). Hope I was clear and I would appreciate any help!
You can use generate_series() to create the first day of each quarter, join it and group by it.
SELECT quarters.first_day,
quarters.first_day + '13 weeks'::interval last_day,
sum(orders.amount) amount
FROM orders
LEFT JOIN generate_series('2019-07-12'::timestamp,
'2020-07-10'::timestamp,
'13 weeks'::interval) quarters (first_day)
ON quarters.first_day <= orders.order_date
AND quarters.first_day + '13 weeks'::interval > orders.order_date
WHERE orders.order_date BETWEEN [date_start]
AND [date_end]
GROUP BY quarters.first_day,
quarters.first_day + '13 weeks'::interval;
You just need to make sure that the boundary dates you give generate_series() cover the whole period you want to query, so they depend on your [date_start] and [date_end].
You can generate your own 'quarterly calendar' and use that in place of the Postgres 'quarter' date extraction.
create or replace function quarterly_calendar(annual_date text default extract('YEAR' from current_date)::text)
returns table( quarter integer
, quarter_start_date date
, quarter_end_date date
)
language sql stable strict leakproof
as $$
with RECURSIVE quarters as
(select 1 qtr, qdt::date q_start_dt, (qdt + interval '90 day' )::date q_end_dt, (qdt+interval '1 year' - interval '1 day')::date last_dt
from ( select date_trunc('year',current_date) + interval '6 month 11 day' qdt) q
union all
select qtr+1, (q_end_dt + interval '1 day')::date, least ((q_end_dt + interval '91 day')::date,last_dt), last_dt
from quarters
where qtr+1 <=5
)
select qtr, q_start_dt, q_end_dt
from quarters;
$$;
-- test
select * from quarterly_calendar();
This actually creates 5 quarters, because a year is not a multiple of 13 weeks (91 days, or 7862400 seconds). Your example year, from 12-July-2019 through 11-July-2020, has 366 days in total, which is 2 days more than 4 times that interval (364 days). You'll have to decide how to handle that 5th quarter; it occurs every year and has either 1 or 2 days. Hope this helps.
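To make the 13-week bucketing concrete, here is a small Python sketch (not PostgreSQL) that maps a date to the first day of its quarter, with the year anchored on July 12. The function names and the default anchor arguments are assumptions for illustration; the 1 or 2 leftover days at the end of each year fall into a fifth bucket, as described above.

```python
from datetime import date, timedelta

QUARTER_DAYS = 91  # 13 weeks

def fiscal_year_start(d: date, month: int = 7, day: int = 12) -> date:
    """Most recent anchor date (July 12 by default) on or before d."""
    start = date(d.year, month, day)
    return start if d >= start else date(d.year - 1, month, day)

def quarter_start(d: date) -> date:
    """First day of the 13-week bucket that contains d."""
    fy = fiscal_year_start(d)
    # Index 0..3 are full quarters; index 4 is the 1-2 day remainder.
    index = (d - fy).days // QUARTER_DAYS
    return fy + timedelta(days=index * QUARTER_DAYS)

for d in (date(2019, 7, 12), date(2019, 10, 11), date(2020, 7, 11)):
    print(d, "->", quarter_start(d))
```

Grouping orders by quarter_start(order_date) would then give the same buckets as joining against a generated series of quarter start dates.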

PostgreSQL Selecting The Closest Previous Month of June

I am trying to write a piece for a query that grabs the closest, past June 1st. For example, today is 10/2/2018. If I run the query today, I need it to use the date 6/1/2018. If I run it on 5/29/2019, it still needs to grab 6/1/2018. If I run it on 6/2/2019, it should then grab 6/1/2019. If I run it on 6/2/2022, it should then grab 6/1/2022 and so on.
I believe I need to start with something like this:
SELECT CASE WHEN EXTRACT(MONTH FROM NOW())>=6 THEN 'CURRENT' ELSE 'RF LAST' END AS X
--If month is greater than or equal to 6, you are in the CURRENT YEAR (6/1/CURRENT YEAR)
--If month is less than 6, then reference back to the last year (YEAR MINUS ONE)
And I believe I need to truncate the date and then perform an operation. I am unsure which approach to take: whether I should add a year to a timestamp such as '6/1/1900', or try to disassemble the date parts to perform an operation. I keep getting errors in my attempts such as "operator does not exist". Things I have tried include:
SELECT (CURRENT_DATE- (CURRENT_DATE-INTERVAL '7 months'))
--This does not work as it just gives me a count of days.
SELECT (DATE_TRUNC('month',NOW())+TIMESTAMP'1900-01-01 00:00:00')
--Variations of this just don't work and generally error out.
Use a case expression to determine if you need to use the current year, or, the previous year (months 1 to 5)
case when extract(month from current_date) >= 6 then 0 else -1 end
then add that to the year extracted from current_date, e.g. using to_date()
select to_Date('06' || (extract(year from current_date)::int + case when extract(month from current_date) >= 6 then 0 else -1 end)::varchar, 'mmYYYY');
You could also use make_date(year int, month int, day int) in Postgres 9.4+ (the extracted year has to be cast to int):
select make_date(extract(year from current_date)::int + case when extract(month from current_date) >= 6 then 0 else -1 end, 6, 1) ;
If the month is lower than 6, truncate to the start of the year and subtract 7 months (June 1st of the previous year).
Otherwise, truncate to the start of the year and add 5 months (June 1st of the current year).
set datestyle to SQL,MDY;
select
case when (extract( month from (date::date)))<6 then date_trunc('year',date)-'7 months'::interval
else date_trunc('year',date)+'5 months'::interval
end as closest_prev_june,
another_column,
another_column2
from mytable;
This assumes the default format and that you have a column named date.
If you want to do this with the current time instead, replace the date column with the now() function.
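The rule discussed above (use the current year from June onward, otherwise the previous year) can be checked against the dates given in the question with a short Python sketch. This is only an illustration of the logic, not PostgreSQL code, and the function name is made up for the example.

```python
from datetime import date

def closest_previous_june_first(d: date) -> date:
    """Return the most recent June 1st on or before d."""
    # From June onward the current year applies; before June, the previous one.
    year = d.year if d.month >= 6 else d.year - 1
    return date(year, 6, 1)

for d in (date(2018, 10, 2), date(2019, 5, 29), date(2019, 6, 2)):
    print(d, "->", closest_previous_june_first(d))
```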

Create buffers postgis table from points using date filter

I'm trying to create a view or table of polygons generated from point buffers, with all polygons grouped by year.
Data fields from source table:
point_id(int), point_created(date), geom geometry(Point,3301)
1, 2014-05-09, point
2, 2015-01-01, point
2, 2015-02-05, point
3, 2016-02-05, point
4, 2017-02-10, point
I was able to create a table where I grouped all points by year and generated the buffer as a multipolygon, but what I need is to group points cumulatively across years (each year's group also includes the points from earlier years), so the table must look like:
polygon(geom), nr_of_features(int), year(string)
1, 1, 2014(all point from 2014)
2, 3, 2015(all points from 2014 to 2015)
3, 4, 2016(all points from 2014 to 2016)
4, 5, 2017(all points from 2014 to 2017)
The script I'm using right now:
CREATE TABLE my_new_table as
SELECT ST_Union(ST_Buffer(geom,10))::geometry(MultiPolygon,3301) as polygon,count(point_id)::integer as nr_of_features,
extract(year from point_created) as year
FROM my_table
group by year;
Any help is welcome.
If I understand you correctly, you need something like this:
-- just to generate a few years
WITH RECURSIVE dates AS (
SELECT '2014-12-31'::timestamp years
UNION ALL
SELECT years + interval '1 year' FROM dates WHERE years < '2019-12-31'
)
SELECT
EXTRACT(YEAR FROM d.years) years,
p.*
FROM
points p
INNER JOIN
dates d ON p.date <= d.years
;
Now you can use the years column to aggregate the points.
I had to modify this script a little because the point dates range from 1994 to the current date, so I changed your sample value '2014-12-31' to '1993-12-31' and '2019-12-31' to current_date. It seems to be working now, thank you. I have 13000 points, and it takes 173 msec to create the table using this query:
WITH RECURSIVE dates AS (
SELECT '1993-12-31'::timestamp years
UNION ALL
SELECT years + interval '1 year' FROM dates WHERE years < current_date
)
SELECT
EXTRACT(YEAR FROM d.years) years,
ST_Multi(ST_Collect(geom))::geometry(MultiPoint,3301) as polygon, count(years)::int as nr_features
FROM
my_table p
INNER JOIN
dates d ON p.point_created <= d.years
group by years
order by years asc;
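The cumulative grouping can be sanity-checked outside the database. The Python sketch below uses the five sample dates from the question and computes, for each year, how many points were created up to and including that year; the function name is an assumption for illustration, and no geometry is involved.

```python
from collections import Counter
from datetime import date
from itertools import accumulate

def cumulative_counts_by_year(dates):
    """Map each year to the number of dates in that year or any earlier one."""
    per_year = Counter(d.year for d in dates)
    years = range(min(per_year), max(per_year) + 1)
    # Running total over consecutive years, including years with no points.
    running = accumulate(per_year.get(y, 0) for y in years)
    return dict(zip(years, running))

points = [
    date(2014, 5, 9),
    date(2015, 1, 1),
    date(2015, 2, 5),
    date(2016, 2, 5),
    date(2017, 2, 10),
]
print(cumulative_counts_by_year(points))
```

The counts for the sample data match the nr_of_features column the question asks for (1, 3, 4, 5).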

Monthly count of objects with start and end date using TSQL

I would like to create a bar chart displaying the number of objects that were available each month. All rows have a start and end date. I know how to do the count for a single month:
SELECT COUNT(*) As NumberOfItems
FROM Items
WHERE DATEPART(MONTH, Items.StartDate) <= @monthNumber
AND DATEPART(MONTH, Items.EndDate) >= @monthNumber
Now I would like to create the SQL to get the month number and the number of items using a single SELECT statement.
Is there any elegant way of accomplishing this? I am aware I have to take the year number into account.
Assuming Sql Server 2005 or newer.
The CTE returns month numbers spanning the years between @startDate and @endDate. The main query joins those month numbers to the items, applying the same conversion to Items.StartDate and Items.EndDate, and groups by the generated month so that each month counts every item active in it.
; with months (month) as (
select datediff (m, 0, @startDate)
union all
select month + 1
from months
where month < datediff (m, 0, @endDate)
)
select year (dateadd (m, months.month, 0)) Year,
month (dateadd (m, months.month, 0)) Month,
count (*) NumberOfItems
from months
inner join Items
on datediff (m, 0, Items.StartDate) <= months.month
and datediff (m, 0, Items.EndDate) >= months.month
group by
months.month
Note: if you intend to span more than a hundred months, you will need option (maxrecursion 0) at the end of the query.
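The month-number idea (turn each date into a running month index, then test whether the index falls between an item's start and end) can be sketched in Python as follows. The function names and the sample items are hypothetical and serve only to illustrate the counting logic.

```python
from datetime import date

def month_index(d: date) -> int:
    """Running month number, so that consecutive months differ by 1."""
    return d.year * 12 + d.month - 1

def active_items_per_month(items, start: date, end: date):
    """Count items whose [start, end] range overlaps each month in the window."""
    counts = {}
    for m in range(month_index(start), month_index(end) + 1):
        year, month = divmod(m, 12)
        counts[(year, month + 1)] = sum(
            1 for s, e in items if month_index(s) <= m <= month_index(e)
        )
    return counts

# Hypothetical sample data: (start date, end date) per item.
items = [
    (date(2020, 11, 15), date(2021, 1, 10)),
    (date(2020, 12, 1), date(2020, 12, 31)),
]
print(active_items_per_month(items, date(2020, 11, 1), date(2021, 2, 1)))
```

Unlike an inner join, this sketch also reports months in which no item was active, which may be convenient for a bar chart.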

Creating a date series in postgresql 8.3

I am trying to create a series of dates from a fixed date in the past to the current date, in month increments. I know this is possible in 8.4 with a new feature, but I am stuck with 8.3 for now.
I feel I am going down a rabbit hole here as I have this sql to get me monthly increments
SELECT date('2008-01-01') + (to_char(a,'99')||' month')::interval as date FROM generate_series(0,20) as a;
I am then trying to extract months and years from the interval of current date - fixed date
SELECT extract( month from interval (age(current_date, date('2008-01-01'))) );
but I'm beginning to think this is a silly way to get the desired date series.
Could work like this:
SELECT ('2008-01-01 0:0'::timestamp
+ interval '1 month' * generate_series(0, months))::date
FROM (
SELECT (extract(year from intv) * 12
+ extract(month from intv))::int4 AS months
FROM (SELECT age(now(), '2008-01-01 0:0'::timestamp) as intv) x
) y
In case someone needs, e.g., a 3-hour interval inside a given date range:
SELECT ('2013-01-01 0:0'::timestamp
+ interval '1 hour' * generate_series(0, ('2013-02-01'::date - '2013-01-01'::date)*24, 3))::timestamp;
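For comparison, the same monthly series can be produced in a few lines of Python. This is a sketch under the assumption that the series should contain the first day of every month from the start date's month through the end date's month; the function name is made up for the example.

```python
from datetime import date

def month_series(start: date, end: date):
    """First day of each month from start's month through end's month."""
    first = start.replace(day=1)
    # Number of whole month steps between the two months.
    total = (end.year - first.year) * 12 + (end.month - first.month)
    series = []
    for n in range(total + 1):
        years, month0 = divmod(first.month - 1 + n, 12)
        series.append(date(first.year + years, month0 + 1, 1))
    return series

print(month_series(date(2008, 1, 1), date(2008, 4, 15)))
```

Passing date.today() as the end value would give the series up to the current month.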