Selecting data based on a rolling quarter value - PostgreSQL

I need to create a condition that will filter for dates in the next 8 quarters worth of data, based on the current quarter. The issue is this needs to be rolling. For example, as I'm writing this, it is 2/2/2023. The data condition would need to be such that it takes the data in the current quarter (i.e., Q1), and subsequently the current quarter +1, + 2, etc. The issue is when I get to a new year, as PostgreSQL does not allow you to do datefield + interval '1 quarter'. So, when I need an automated way of pulling the data for Q1 in 2023, I can't simply use the interval. This is also an issue when, say, I get to Q4. I can't do datefield + interval '1 quarter', because that gives Q5, which does not exist.
Any tips for getting this taken care of? My current thinking is that I need to create conditional logic where, if the current quarter is Q4, I filter for data in Q1 of the current year + 1, but I am wondering if there are more efficient ways of doing this.
My current (and incomplete) solution is as follows:
select *
from mytable
where extract(quarter from datefield) = extract(quarter from current_date + interval '1 quarter')
and datefield >= concat(extract(year from current_date), '-01-01')::date
and datefield <= current_date + interval '2 years'
Thanks!

After another hour or so of digging, I found another post which answered this question:
see here
The solution relies on date_trunc, which makes lots of sense.
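For reference, a minimal sketch of what that date_trunc-based filter might look like against the table above (mytable/datefield are the names from the question); anchoring both ends of the range to the start of the current quarter keeps it rolling across year boundaries with no Q4/Q5 special-casing:
select *
from mytable
where datefield >= date_trunc('quarter', current_date)
  and datefield < date_trunc('quarter', current_date) + interval '2 years';
The half-open range covers the current quarter plus the next seven, i.e., 8 quarters in total.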

Related

Difference between two timestamps as timestamp across multiple days

I have two timestamps and I would like to have a result with the difference between them. I found a similar question asked here but I have noticed that:
select
to_char(column1::timestamp - column2::timestamp, 'HH:MS:SS')
from
table
Gives me an incorrect return if these timestamps cross multiple days. I know that I can use EPOCH to work out the number of hours/days/minutes/seconds etc but my use case requires the result as a timestamp (or a string...anything not an interval!).
In the case of multiple days I would like to continue counting the hours, even if it should go past 24. This would allow results like:
36:55:01
I'd use the built-in date_part function (as previously described in an older thread: How to convert an interval like "1 day 01:30:00" into "25:30:00"?) but finally cast the result to the type you desire:
SELECT
  from_date,
  to_date,
  to_date - from_date as date_diff_interval,
  (date_part('epoch', to_date - from_date) * INTERVAL '1 second')::text as date_diff_text
from (
  select
    '2018-01-01 04:03:06'::timestamp as from_date,
    '2018-01-02 16:58:07'::timestamp as to_date
) as dates;
This results in the following:
      from_date      |       to_date       | date_diff_interval | date_diff_text
---------------------+---------------------+--------------------+----------------
 2018-01-01 04:03:06 | 2018-01-02 16:58:07 | 1 day 12:55:01     | 36:55:01
I'm currently unaware of any way to convert this interval into a timestamp and also not sure whether there is a use for it. You're still dealing with an interval and you'd need a point of reference in time to transform that interval into an actual timestamp.
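Applied to the original column1/column2 pair, the same epoch trick might look like this (mytable is only a stand-in for the asker's table name):
select (date_part('epoch', column1::timestamp - column2::timestamp) * interval '1 second')::text as diff
from mytable;
The final cast to text yields the '36:55:01'-style string rather than an interval.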

Tricking Weekofyear in Hive by shifting the week, for counting

I've been working on this problem for a while now. Basically I have a simple set of data with UserId and TimeStamp. I want to know how many distinct UserIds appear each week; the catch is that my week runs Sunday-Saturday, NOT Monday-Sunday, which is what weekofyear() uses.
Right now I'm hardcoding each week and running the query:
SELECT
count(distinct UserId)
FROM data.table
where from_unixtime((CAST(timestamp as BIGINT)))
between TO_DATE("2016-06-05") AND TO_DATE("2016-06-12")
I'm trying to find a way to shift the timestamp back a day to trick weekofyear into thinking my Sunday is actually a Monday, but have not been successful. My latest futile attempt looked like:
SELECT
count(distinct UserId), weekofyear(date_sub(from_unixtime(CAST(timestamp as BIGINT)),1))
FROM table.data
where from_unixtime((CAST(timestamp as BIGINT)))
between TO_DATE("2016-06-01") AND TO_DATE("2016-06-30")
group by weekofyear(date_sub(from_unixtime(CAST(timestamp as BIGINT)),1))
This results in the same numbers as if I didn't subtract a day. I'm not sure why this isn't working. I feel like there should be a way to manage this. Right now, if I wanted to pull all the data by week WHERE X is true, I'd have to do each week manually, which won't be sustainable. Any suggestions on how to work smarter?
Thank you.
Simple Solution
You can simply create your own formula instead of going with the pre-defined "week of the year" function.
Advantage: you will be able to take any set of 7 days as a week.
In your case, since you want the week to run Sunday-Saturday, we just need the date of the first Sunday of the year.
e.g., in 2016 the first Sunday falls on '2016-01-03', i.e., the 3rd of Jan 2016.
-- assumption: the timestamp column is in the format 'yyyy-mm-dd'
SELECT
count(distinct UserId), floor(datediff(timestamp,'2016-01-03') / 7) + 1 as week_of_the_year
FROM table.data
where timestamp>='2016-01-03'
group by floor(datediff(timestamp,'2016-01-03') / 7) + 1;
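If hardcoding the first Sunday of each year is a concern, one possible refinement (a sketch, assuming Hive 1.2+ where next_day() is available) is to derive it instead of looking it up by hand:
-- first Sunday of 2016: step back one day from Jan 1st, then take the next Sunday
SELECT next_day(date_sub('2016-01-01', 1), 'SU');  -- returns 2016-01-03
Stepping back one day first ensures that a January 1st which is itself a Sunday is not skipped; the result can then replace the hardcoded '2016-01-03' in the query above.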

Query only records that fall within a date range with PostgreSQL (Redshift)

I am trying to grab all records from today's date to 14 weeks prior. I also need that same week but a year ago. I have the following:
WHERE date(date) >= date(dateadd(week,-14, current_date))
OR date(date) >= date(dateadd(week,-52, current_date))
OR date(date) <= date(dateadd(week,-53, current_date))
It doesn't seem to be working properly.
You could look into the use of INTERVAL to subtract from GETDATE() -- it may make things easier to read/reason about, but DATEADD should work in theory (although it isn't standard Postgres -- INTERVAL is).
One thing that immediately stands out from your stated requirements versus your code is that as written, from a boolean-logic perspective, the code doesn't match your requirements.
You said you need all records from between today and 14 weeks ago, and also from the same week a year ago.
What your code says is: give me all records where the date is within the past 14 weeks, or the date is greater than 52 weeks ago or the date is less than 53 weeks ago.
From a purely boolean perspective, what you want is more like something like this:
WHERE date(date) >= date(dateadd(week,-14, current_date))
OR (date(date) >= date(dateadd(week,-52, current_date))
AND date(date) <= date(dateadd(week,-53, current_date))
)
Note the additional set of parentheses, and the switch within that additional set from OR to AND.
Additionally, I think you also may want to, within that additional set of parens, reverse the >= and <=, since in this case it's the larger negative number that represents the earlier date. This will all depend on exactly how DATEADD works, which I'm not familiar with since I use Postgres instead of Redshift (Redshift is a fork of Postgres 8.0), but based on how the Redshift doc seems to say it works, I believe you do want them reversed.
So then it becomes:
WHERE date(date) >= date(dateadd(week,-14, current_date))
OR (date(date) <= date(dateadd(week,-52, current_date))
AND date(date) >= date(dateadd(week,-53, current_date))
)
This is what I meant by it might be easier to reason about the logic when using INTERVAL -- using these large negative offsets, while they can certainly be made to work, just aren't as intuitive to reason about, IMO. It may also be worth switching the two sides of the inner AND, so the -53, the earlier value, is on the left side (won't change the functionality, but may make the use of DATEADD easier to reason about within this code).
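For illustration only, the INTERVAL version hinted at above might read something like this (a sketch, keeping the question's date(date) wrapper and untested on Redshift, which as noted below doesn't support the interval type itself):
WHERE date(date) >= current_date - interval '14 weeks'
   OR (date(date) >= current_date - interval '53 weeks'
       AND date(date) <= current_date - interval '52 weeks')
Here the earlier bound (53 weeks ago) naturally sits on the >= side, which is what makes the logic a little easier to reason about.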
Amazon Redshift is based on PostgreSQL 8.0. It doesn't support the "interval" data type, but docs say it does support interval arithmetic. I don't see any hint that it supports dateadd(), but I might have missed it.
This query will work as written on current versions of PostgreSQL and on SQLFiddle for testing and verification. I don't think it will work as written on Redshift, but the important part for you is just the WHERE clause. The common table expression is just to give us a calendar to do interval arithmetic against. You won't need it on Redshift.
It's not clear what you mean by "that same week but a year ago". You don't mention any particular week; you seem to be interested in a 14-week period. This query returns rows from 2014-09-23 (14 weeks ago) to 2014-12-30 (the current date), and the same dates in 2013.
with calendar as (
select (generate_series((current_date - interval '18 months'), current_date, '1 day'))::date cal_date
)
select cal_date
from calendar
where cal_date between (current_date - interval '14 weeks')::date
and current_date
or cal_date between ((current_date - interval '1 year')::date - interval '14 weeks')::date
and (current_date - interval '1 year')::date
order by cal_date;
cal_date
--
2013-09-23
...
2013-12-30
2014-09-23
...
2014-12-30
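On Redshift itself, a rough translation of that WHERE clause back into the DATEADD style used in the question might be (again just a sketch, not something I've run):
WHERE date(date) BETWEEN dateadd(week, -14, current_date) AND current_date
   OR date(date) BETWEEN dateadd(week, -14, dateadd(year, -1, current_date))
                     AND dateadd(year, -1, current_date)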

How do I do a running year for a calendar report?

Here is the Query I am using:
SELECT * FROM ra_ProjectCalendar
WHERE MonthNumber between #Month and #Month + 12 and FullYear = #Year
It works great for this year, but stops at December of this year. How do I get it to show a running year?
How is your date stored?
From your SQL, I guess that you have one field for month, one for year and one for day. I'd recommend having one single datetime field instead, and then you can use the DateAdd() method to add 12 months (or any other interval).
EDIT: it was pointed out to me in a comment that one thing you gain with this is performance - which is more or less important depending on the scale of your application, but always nice to be aware of. If you run this query in a stored procedure, here's what you'll do:
DECLARE #oneYearAhead datetime;
SET #oneYearAhead = DateAdd(m, 12, #PassedDate)
SELECT * FROM ra_ProjectCalendar
WHERE #PassedDate <= [Date] AND [Date] <= #oneYearAhead;
With the above code, you will get all entries between the date you pass and one year ahead of it. (I am not 100% sure on the syntax to declare and set the #oneYearAhead variable, but you get the idea.) Note that the [] I use around the column name Date allows me to use reserved words in column names - I have made it a habit, instead of memorizing which words are reserved...
Try this:
SELECT * FROM ra_ProjectCalendar
WHERE
(MonthNumber >= #Month AND FullYear = #Year)
OR (MonthNumber < #Month AND FullYear = #Year + 1)
Something like this might work better - but it's not clear what your intent is:
select *
from ra_ProjectCalendar
where DATEDIFF(month, #startdate, #enddate) <= 12
Edit: This is SQl server syntax
Actually pass in and store the date rather than month and year, and compare with the current date using DateDiff:
WHERE DATEDIFF(m, #PassedDate, StoredDateColumn) < 12
Note that I used 12 months, rather than 1 year, because DATEDIFF counts the number of boundaries crossed. So in December, using DATEDIFF with the year datepart would return one after only one month. You really want a 12-month span.
Depending on how your information is stored (it's a little difficult to tell from your query), either DATEADD or DATEDIFF will probably do you right. There's a list of the available date functions on MSDN.
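Pulling the DATEDIFF suggestion together into a full query, a minimal sketch (assuming the date really is stored in a single [Date] column, and keeping the thread's # placeholder convention for the passed-in date) might be:
SELECT *
FROM ra_ProjectCalendar
WHERE DATEDIFF(m, #PassedDate, [Date]) >= 0
  AND DATEDIFF(m, #PassedDate, [Date]) < 12
The >= 0 excludes months before the passed date, and < 12 caps the window at a 12-month span, per the boundary-counting note above.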

How to convert an interval like "1 day 01:30:00" into "25:30:00"?

I need to add some intervals and use the result in Excel.
Since
sum(time.endtime-time.starttime)
returns the interval as "1 day 01:30:00" and this format breaks my Excel sheet, I thought it'd be nice to have the output like "25:30:00" but found no way to do it in the PostgreSQL documentation.
Can anyone here help me out?
Since there is not an exact solution for the topic:
=> SELECT date_part('epoch', INTERVAL '1 day 01:30:00') * INTERVAL '1 second' hours;
hours
-----------
25:30:00
(1 row)
Source: Documentation
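Applied to the asker's sum, that might look like the following (a sketch reusing the time table and columns from the question):
SELECT (date_part('epoch', sum(endtime - starttime)) * INTERVAL '1 second')::text AS hours
FROM "time";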
The only thing I can come up with (besides parsing the number of days and adding 24 to the hours every time) is:
mat=> select date_part('epoch', '01 day 1:30:00'::interval);
date_part
-----------
91800
(1 row)
It will give you the number of seconds, which may be OK for Excel.
You could use EXTRACT to convert the interval into seconds.
SELECT EXTRACT(EPOCH FROM INTERVAL '5 days 3 hours');
Result: 442800
Then you would need to do your own maths (or let Excel do it).
Note that '1 day' is not necessarily equivalent to '24 hours' - PostgreSQL handles things like an interval that spans a DST transition.
If you wanted postgres to handle the HH:MM:SS formatting for you, take the difference in epoch seconds and convert it to an interval scaled in seconds:
SELECT SUM(EXTRACT(EPOCH FROM time.endtime) - EXTRACT(EPOCH FROM time.starttime))
* INTERVAL '1 SECOND' AS hhmmss
In standard SQL, you want to represent the type as INTERVAL HOUR TO SECOND, but you have a value of type INTERVAL DAY TO SECOND. Can you not use a CAST to get to your required result? In Informix, the notation would be either of:
SUM(time.endtime - time.starttime)::INTERVAL HOUR(3) TO SECOND
CAST(SUM(time.endtime - time.starttime) AS INTERVAL HOUR(3) TO SECOND)
The former is, AFAIK, Informix-specific notation (or, at least, not standard); the latter is, I believe, SQL standard notation.
It can be done, but I believe that the only way is through the following monstrosity (assuming your time interval column name is "ti"):
select
to_char(floor(extract(epoch from ti)/3600),'FM00')
|| ':' || to_char(floor(cast(extract(epoch from ti) as integer) % 3600 / 60), 'FM00')
|| ':' || to_char(cast(extract(epoch from ti) as integer) % 60,'FM00')
as hourstamp
from whatever;
See? I told you it was horrible :)
It would have been nice to think that
select to_char(ti,'HH24:MI:SS') as hourstamp from t
would have worked, but alas, the HH24 format doesn't "absorb" the overflow beyond 24. The above comes (reconstructed from memory) from some code I once wrote. To avoid offending those of delicate constitution, I encapsulated the above shenanigans in a view...