Tricking Weekofyear in Hive by shifting the week, for counting - date

I've been working on this problem for a while now. Basically I have a simple set of data with UserId, and TimeStamp. I want to know how many distinct UserId's appear each week, the catch is my week is measured in Sunday-Saturday, NOT Monday - Sunday, which is what Weekofyear() uses.
Right now I'm hardcoding each week and running the query:
SELECT
count(distinct UserId)
FROM data.table
where from_unixtime((CAST(timestamp as BIGINT)))
between TO_DATE("2016-06-05") AND TO_DATE("2016-06-12")
I'm trying to find a way to shift the timestamp back a day to trick weekofyear into thinking my Sunday is actually a Monday, but have not been successful. My latest futile attempt looked like:
SELECT
count(distinct UserId), weekofyear(date_sub(from_unixtime(CAST(timestamp as BIGINT)),1))
FROM table.data
where from_unixtime((CAST(timestamp as BIGINT)))
between TO_DATE("2016-06-01") AND TO_DATE("2016-06-30")
group by weekofyear(date_sub(from_unixtime(CAST(timestamp as BIGINT)),1))
This results in the same numbers as if I didn't subtract a day. I'm not sure why this isn't working. I feel like there should be a way to manage this. Right now, if I wanted to pull all the data by week WHERE X is true, I'd have to do each week manually, and that won't be sustainable. Any suggestions on how to work smarter?
Thank you.

Simple Solution
You can simply create your own formula instead of using the predefined "week of the year" function.
Advantage: you can treat any set of 7 consecutive days as a week.
In your case, since you want the week to run from Sunday to Saturday, we just need the date of the first Sunday of the year.
E.g., in 2016 the first Sunday is '2016-01-03' (3 January 2016).
--assumption considering the timestamp column in the format 'yyyy-mm-dd'
SELECT
count(distinct UserId), floor(datediff(timestamp,'2016-01-03') / 7) + 1 as week_of_the_year
FROM table.data
where timestamp>='2016-01-03'
group by floor(datediff(timestamp,'2016-01-03') / 7) + 1;
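The arithmetic behind that formula can be checked outside Hive. Below is a minimal Python sketch, not part of the original answer: `sunday_week_index` mirrors `floor(datediff(d, anchor) / 7) + 1`, and the hypothetical helper `shifted_iso_week` shows the alternative of shifting the date before taking an ISO week number. Note that mapping a Sunday onto the following Monday means moving one day forward, not back. The two numberings happen to agree for the 2016 dates used here, but need not agree in other years.

```python
from datetime import date, timedelta

# Anchor taken from the answer above: the first Sunday of 2016.
FIRST_SUNDAY_2016 = date(2016, 1, 3)

def sunday_week_index(d, anchor=FIRST_SUNDAY_2016):
    """Mirror floor(datediff(d, anchor) / 7) + 1 for dates on or after the anchor."""
    return (d - anchor).days // 7 + 1

def shifted_iso_week(d):
    """ISO week number of the following day, so a Sunday is numbered like the Monday after it."""
    return (d + timedelta(days=1)).isocalendar()[1]

# Saturday, Sunday, Saturday, Sunday around the week the question hardcodes.
for d in (date(2016, 6, 4), date(2016, 6, 5), date(2016, 6, 11), date(2016, 6, 12)):
    print(d, sunday_week_index(d), shifted_iso_week(d))
```

Both helpers put 2016-06-05 (a Sunday) and 2016-06-11 (the following Saturday) in the same week.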

Related

SQL - Getting day for the whole week

I want to get every day of the week that contains a given date. My query works and returns the result I want, but when the date is a Sunday the result changes.
My week runs from Monday to Sunday.
Examples below:
My Code:
SELECT UserID,Scdl_TkIN as TimeIn, Scdl_TkOut as [TimeOut]
FROM EmployeeTimekeeping
WHERE CONVERT(DATE,Scdl_TkIN) >= dateadd(day, 2-datepart(dw, '2022-04-23'),CONVERT(date,'2022-04-23'))
AND CONVERT(DATE,Scdl_TkIN) < dateadd(day, 9-datepart(dw, '2022-04-23'), CONVERT(date,'2022-04-23'))AND UserID ='15020009'
ORDER BY CONVERT(DATE,Scdl_TkIN)
The 1st display is correct, but when I change the value to '2022-04-24', the result becomes the second picture. I want the result to stay the same as in the 1st picture.
If I got it right you want the whole week of data given a single date.
I'm not 100% sure about your date logic, and I'd rather use WEEK as a filter since it seems clearer. That said, the issue you have is the value of @@DATEFIRST (check it with SELECT @@DATEFIRST).
By default its value is 7, meaning that Sunday is considered the first day of the week; that's why you get that "unexpected" result.
Here is my solution, but just setting SET DATEFIRST 1; should give you the expected result.
SET DATEFIRST 1;
SELECT
UserID
,Scdl_TkIN as TimeIn
,Scdl_TkOut as TimeOut
FROM EmployeeTimekeeping
WHERE
DATEPART(WEEK,Scdl_TkIN) = DATEPART(WEEK,'2022-04-23')
AND YEAR(Scdl_TkIN) = YEAR('2022-04-23')
AND UserID ='15020009'
ORDER BY
Scdl_TkIN
Note: if you decide to use WEEK for filtering you will have to choose between WEEK and ISO_WEEK
Edit: when using week you must also consider the year in the filter
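To see why the Sunday input jumps to the next week, here is a small Python sketch of the arithmetic, offered as an illustration rather than as T-SQL. `datepart_dw` is a hypothetical emulation of `DATEPART(dw, ...)` under a given DATEFIRST value, `question_week_start` reproduces the lower bound from the question, and `monday_week_start` is a setting-independent alternative.

```python
from datetime import date, timedelta

def datepart_dw(d, datefirst=7):
    """Emulate DATEPART(dw, d): 1 is the weekday named by DATEFIRST (1 = Monday ... 7 = Sunday)."""
    return (d.isoweekday() - datefirst) % 7 + 1

def question_week_start(d, datefirst=7):
    """Lower bound used in the question: dateadd(day, 2 - datepart(dw, d), d)."""
    return d + timedelta(days=2 - datepart_dw(d, datefirst))

def monday_week_start(d):
    """Monday on or before d, independent of any first-day-of-week setting."""
    return d - timedelta(days=d.weekday())

saturday, sunday = date(2022, 4, 23), date(2022, 4, 24)
print(question_week_start(saturday), question_week_start(sunday))
print(monday_week_start(saturday), monday_week_start(sunday))
```

With DATEFIRST 7, Sunday is day 1, so `2 - dw` becomes `+1` and the window starts on the following Monday instead of the preceding one.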

How can I always get the full period when grouping by week in PostgreSQL?

I'm used to using the following syntax when analysing weekly data:
select week(creation_date)::date as week,
count(*) as n
from table_1
where creation_date > current_date - 30
group by 1
However, by doing this I will get just part of the first week.
Is there any smart way to always get a whole week at the beginning?
For example, by starting from the first day of the week that would otherwise be cut in half.
First off, you need to define what you mean by "week". This is more difficult than it appears. While humans have an intuitive sense of a week, computers are just not that smart. There are 2 common conventions: the ISO-8601 standard and, for lack of a better term, Traditional. ISO-8601 defines a week as always beginning on Monday and always containing 7 days. Traditional weeks begin on Sunday (usually) but may have fewer than 7 days. This results from having the 1st week of the year begin on 1-Jan regardless of the day of the week; thus the 1st and/or last weeks may have fewer than 7 days. ISO-8601 throws its own curve into the mix: the 1st week of the year is the week containing 4-Jan. Thus the last days of Dec may fall in week 1 of the next year, and the first days of Jan may fall in week 52/53 of the prior year.
All of the below assumes ISO-8601.
Secondly, there is no week function in Postgres. Instead you need the extract function. So for this particular case:
select extract(week from creation_date)::integer as week, ...
Finally, your predicate (current_date - 30) means you will usually not begin on the 1st day of a week. To get the correct date, take that result back 1 week, then go forward to the next Monday.
with days_to_monday (day_adj) as
( values ('{7,6,5,4,3,2,1}'::int[]) )
select current_date - 30
, current_date - 30 - 7 + day_adj[extract (isodow from current_date - 30 )]
from table_1 cross join days_to_monday;
The CTE establishes an array which, for a given day of the week, contains the number of days needed to reach the next Monday. The main query extracts the ISO day of week of current_date - 30 and uses it to index the array. The corresponding value is added to get the proper date.
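The array lookup can be sanity-checked with a short Python sketch of the same arithmetic (an illustration only; the names `DAYS_TO_MONDAY` and `aligned_monday` are made up here). Postgres arrays are 1-based while Python lists are 0-based, hence the `- 1` on the index.

```python
from datetime import date, timedelta

# Same values as '{7,6,5,4,3,2,1}'::int[]; index 0 corresponds to isodow 1 (Monday).
DAYS_TO_MONDAY = [7, 6, 5, 4, 3, 2, 1]

def aligned_monday(d):
    """Go back one week, then forward by the array entry for d's ISO weekday."""
    return d - timedelta(days=7) + timedelta(days=DAYS_TO_MONDAY[d.isoweekday() - 1])

# A Monday, a Friday and a Sunday from the same ISO week.
for d in (date(2015, 1, 12), date(2015, 1, 16), date(2015, 1, 18)):
    print(d, "->", aligned_monday(d))
```

The net effect is the Monday on or before the input date, so a window starting there always begins with a complete ISO week.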
Putting that together with your original query to arrive at:
with next_week (monday) as
( values (current_date - 30 - 7
+ ('{7,6,5,4,3,2,1}'::int[])[extract (isodow from current_date - 30 )])
)
select extract(week from creation_date) as week,
count(*) as n
from table_1
where creation_date >= (select monday from next_week)
group by 1
order by 1;
For full example see fiddle.

how to find number of days since 28th of last month till 27th of current month in db2

I need to generate a report on the 28th of every month.
For that I need to run an Autosys job.
In it I have a query with the condition
validation_date >= (number of days since last run)
Could you please help me with this? How can I achieve this condition in DB2?
This is a monthly job, so I don't want to hard-code my previous run date in the query. At the same time, I need a condition that works for every month.
Note:
If the query runs on Feb 28th, then Feb 28th is not included. I need to get data from January 28th (included) to Feb 27th (included).
Similarly, for the March 28th run, I need to get data from Feb 28th (included) to March 27th (included). Thanks in advance.
Consider putting your report generation in a procedure, and parameterizing the start and end dates. In other words, have something like this:
create procedure monthly_report(
start_date date,
end_date date
)
language sql
begin
... report queries here ...
end
Now you potentially have something much more flexible (depending on the report requirements). If, in the future, you want to run a report on a different day, or for a different length of time, you will be able to do that.
Once you design it this way, it also may be easier to set the dates in your job scheduling script, rather than in SQL. If you did it in SQL, you could do something like this:
call monthly_report(
(select
year(current timestamp - 2 months) ||'-'||
month(current timestamp - 2 months) ||'-'||
'28' from sysibm.sysdummy1
),
(select
year(current timestamp - 1 month) ||'-'||
month(current timestamp - 1 month) ||'-'||
'27' from sysibm.sysdummy1
)
)
You may need to tweak it to handle some edge cases (I'm not exactly sure if you care what happens if it runs on the 29th of the month, and if so, how to handle it). But you get the basic approach.
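If the dates are computed in the scheduling script rather than in SQL, the window described in the question (28th of the previous month through the 27th of the run month, both inclusive) could be derived along these lines. This is a minimal Python sketch under that reading of the requirement; `report_window` is a hypothetical helper, not part of DB2 or Autosys.

```python
from datetime import date

def report_window(run_date):
    """Return (start, end), both inclusive: the 28th of the previous month
    through the 27th of run_date's month."""
    year, month = run_date.year, run_date.month
    # Step back one month, wrapping from January to December of the prior year.
    prev_year, prev_month = (year - 1, 12) if month == 1 else (year, month - 1)
    return date(prev_year, prev_month, 28), date(year, month, 27)

for run in (date(2017, 1, 28), date(2017, 2, 28), date(2017, 3, 28)):
    print(run, report_window(run))
```

Because every month has a 27th and a 28th, no month-length special cases are needed. The sketch uses only the year and month of the run date, so a late run on the 29th would still produce the same window.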
You can use the DAY() function, which extracts the day of the month from a date, to trigger the job, for example: where day(param) = 28.
The other two parameters can be computed with date arithmetic. Here is an example showing the trigger value, the date_to value and the date_from value:
select day(timestamp_format(20170228,'yyyyMMdd') ),timestamp_format(20170228,'yyyyMMdd')- 1 DAY,timestamp_format(20170228,'yyyyMMdd') -1 month from sysibm.sysdummy1;
If your parameter/column is already a date/timestamp, you can drop the timestamp_format(20170228,'yyyyMMdd') call and use your column/parameter directly.

How do I find users created exactly multiple whole months/quarters ago in Postgres SQL?

How do I find all the user records created exactly multiple whole months/quarters ago in Postgres SQL? Is it possible?
In Postgres, you can find a date which is exactly 1 month away from a datetime field, like:
select created_at + interval '1 month' from users limit 1;
?column?
----------------------------
2016-10-05 17:05:14.811537
(1 row)
If today is 9/27/2016, I want to find all the user records created on 8/27/2016, 7/27/2016, 6/27/2016, ... 1/27/2015, etc, etc. How do I do this in SQL?
Note this isn't simply a comparison of date due to the fact that some months have 31 days while others have 28, 29, 30 days.
If today is 2/28/2015 (non-leap year), I want all the users created on the following dates: 1/28/2015, 1/29/2015, 1/30/2015, 1/31/2015, 12/28/2014, 12/29/2014, 12/30/2014, 12/31/2014, etc etc.
If today is 2/28/2016 (leap year), I want all the users created on the following dates: 1/28/2015, 12/28/2014, 11/28/2014, etc etc. (But not on 1/29/2015, 1/30/2015, 1/31/2015, as those users will be picked up the next day, see below).
If today is 2/29/2016 (leap year), I want all the users created on the following dates: 1/29/2015, 1/30/2015, 1/31/2015, 12/29/2014, 12/30/2014, 12/31/2014, etc etc.
If today is 3/31/2016, I want all the users created on 12/31/2015, 1/31/2016, but not anyone created in February 2016 because they would have been picked on previous days of March 2016.
How do I do the above with quarters instead of months?
Another question related to this is performance. If I create an index on created_at, would a whole table scan be avoided if I do this type of queries?
Thank you.
Well... If You think of it, You want to compare day number in reality, right? So You should do just that:
select
*
from
users
where
date_part('day', created_at) = date_part('day', current_timestamp)
OR
(
-- Check if this is the last day of month
extract(month from current_timestamp + '1day'::interval) <> extract(month from current_timestamp)
AND
date_part('day', created_at) > date_part('day', current_timestamp)
)
limit
1
;
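And regarding Your index question: yes, an index can help. Note, though, that a plain index on created_at cannot be used for a condition on date_part('day', created_at); that needs an index on the expression itself.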
Inspiration for checking last day of a month taken from here.
Basically, if I still do not get Your requirement, You should be able to easily modify my code to meet it, if You understand it. :)
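The matching rule in that query can be tried against the dates listed in the question with a small Python sketch (an illustration of the same logic, not Postgres code; `is_month_anniversary` is a made-up name).

```python
from datetime import date, timedelta

def is_month_anniversary(created, today):
    """Same day of month, or today is the last day of its month and the
    creation day is later than any day this month has."""
    if created.day == today.day:
        return True
    # Tomorrow falling in a different month means today is the month's last day.
    is_last_day = (today + timedelta(days=1)).month != today.month
    return is_last_day and created.day > today.day

today = date(2015, 2, 28)  # last day of a non-leap February
for created in (date(2015, 1, 27), date(2015, 1, 28), date(2015, 1, 31)):
    print(created, is_month_anniversary(created, today))
```

Like the SQL, this compares only the day of month, so it does not by itself exclude rows created today or restrict matches to quarter boundaries.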

Get this week's monday's date in Postgres?

How can I get this week's monday's date in PostgreSQL?
For example, today is 01/16/15 (Friday). This week's monday date is 01/12/15.
You can use date_trunc() for this:
select date_trunc('week', current_date);
More details in the manual:
http://www.postgresql.org/docs/current/static/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC
If "today" is Monday it will return today's date.
SELECT current_date + cast(abs(extract(dow FROM current_date) - 7) + 1 AS int);
works, although there might be more elegant ways of doing it.
The general idea is to get the current day of the week, dow, subtract 7, and take the abs, which will give you the number of days till the end of the week, and add 1, to get to Monday. This gives you next Monday.
EDIT: having completely misread the question, to get the prior Monday, is much simpler:
SELECT current_date - ((6 + cast(extract(dow FROM current_date) AS int)) % 7)
i.e., (6 + dow) % 7 is the number of days that have passed since Monday (dow is 0 for Sunday, 1 for Monday, and so on), so subtracting it from today's date gets back to Monday.
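The modular arithmetic is easy to verify with a short Python sketch (illustration only; `this_weeks_monday` is a made-up name). Python's `isoweekday()` numbers Sunday as 7, so `% 7` converts it to the Postgres `dow` convention where Sunday is 0.

```python
from datetime import date, timedelta

def this_weeks_monday(d):
    """Monday on or before d, using the same (6 + dow) % 7 offset as the SQL."""
    dow = d.isoweekday() % 7  # Postgres dow convention: Sunday = 0 ... Saturday = 6
    return d - timedelta(days=(6 + dow) % 7)

# Friday from the question, then a Monday and a Sunday of the same week.
for d in (date(2015, 1, 16), date(2015, 1, 12), date(2015, 1, 18)):
    print(d, "->", this_weeks_monday(d))
```

All three inputs map to 2015-01-12, the Monday given in the question.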
And for other mondays:
Next Monday:
date_trunc('week', now())+ INTERVAL '7days'
Last week's monday:
date_trunc('week', now())- INTERVAL '7days'
etc. :)
I usually use a calendar table. There are two main advantages.
Simple. Junior devs can query it correctly with little training.
Obvious. Correct queries are obviously correct.
Assuming that "this week's Monday" means the Monday before today, unless today is Monday then . . .
select max(cal_date) as previous_monday
from calendar
where day_of_week = 'Mon'
and cal_date <= current_date;