How can I compute week dates in Hive? - date

Background
PostgreSQL has the handy function date_trunc(), which makes it easy to compute the date on which a week starts. This is useful for week-level aggregations, e.g.:
SELECT
date_trunc('week', create_date),
count(*)
FROM ...
GROUP BY 1;
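Here is a minimal Python sketch of what date_trunc('week', ...) computes for a timestamp without time zone. PostgreSQL weeks start on Monday (ISO 8601), so the time of day is dropped and the date is moved back to that week's Monday. The function name date_trunc_week is hypothetical, and time zones are ignored here.

```python
from datetime import datetime, timedelta

def date_trunc_week(ts: datetime) -> datetime:
    """Approximate PostgreSQL date_trunc('week', ts) for a naive timestamp."""
    # Drop the time-of-day part (truncate to midnight).
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    # weekday() is 0 for Monday ... 6 for Sunday, so this steps back to Monday.
    return midnight - timedelta(days=midnight.weekday())

print(date_trunc_week(datetime(2024, 5, 16, 13, 45)))  # 2024-05-13 00:00:00
```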
HiveQL has the function WEEKOFYEAR(), which returns the week number. Combined with YEAR(), it lets you build the same kind of aggregates as in PostgreSQL:
SELECT
YEAR(create_date),
WEEKOFYEAR(create_date),
count(*)
FROM ...
GROUP BY YEAR(create_date), WEEKOFYEAR(create_date);
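One caveat with this grouping, assuming WEEKOFYEAR numbers weeks the ISO 8601 way (weeks start on Monday, and week 1 is the week containing January 4th), as Python's date.isocalendar() does: a date in late December can belong to week 1 of the following year, so pairing it with the calendar YEAR() puts it in the same group as the first week of the wrong year. The helper names below are hypothetical.

```python
from datetime import date

def naive_week_key(d: date) -> tuple[int, int]:
    # Mirrors GROUP BY YEAR(d), WEEKOFYEAR(d): calendar year + ISO week number.
    return (d.year, d.isocalendar()[1])

def iso_week_key(d: date) -> tuple[int, int]:
    # Uses the ISO year that the ISO week number actually belongs to.
    iso = d.isocalendar()
    return (iso[0], iso[1])

# Monday 2018-12-31 is in ISO week 1 of 2019 ...
print(naive_week_key(date(2018, 12, 31)))  # (2018, 1)
print(iso_week_key(date(2018, 12, 31)))    # (2019, 1)
# ... so the naive key groups it with early January 2018.
print(naive_week_key(date(2018, 1, 2)))    # (2018, 1)
```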
This works, but what if I want the actual start date of each week?
Question
How can I compute the week's start date in HiveQL, either from a year and week number or directly from a timestamp?

Well, Hive does not have many built-in date functions, but it does support custom UDFs (user-defined functions): you write your own function and register it with Hive.
Here are some UDFs that might be helpful:
1.) Link 1
2.) Link 2
Hope this helps!
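Besides writing a UDF, the week start can be derived with plain date arithmetic: pick a reference date known to be a Monday (1900-01-01 was one), take the number of days since it modulo 7, and subtract that many days. In Hive this would presumably be written with the built-ins datediff, pmod and date_sub, e.g. date_sub(create_date, pmod(datediff(create_date, '1900-01-01'), 7)); treat that as an untested sketch. The Python below checks the same arithmetic and also goes the other way, from an ISO year and week number back to the Monday, using date.fromisocalendar (Python 3.8+). The function names are hypothetical.

```python
from datetime import date, timedelta

REFERENCE_MONDAY = date(1900, 1, 1)  # 1900-01-01 fell on a Monday

def week_start(d: date) -> date:
    # Days since the reference Monday, modulo 7, is the offset into the week.
    # Python's % is non-negative for a positive divisor, like Hive's pmod().
    offset = (d - REFERENCE_MONDAY).days % 7
    return d - timedelta(days=offset)

def week_start_from_iso(year: int, week: int) -> date:
    # Day 1 of an ISO week is its Monday.
    return date.fromisocalendar(year, week, 1)

print(week_start(date(2024, 5, 16)))  # 2024-05-13
print(week_start_from_iso(2019, 1))   # 2018-12-31
```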

Related

Get date-truncated data from a table - PostgreSQL

I want to get date-truncated data for the last month. My times are Unix timestamps in milliseconds, and I need the data from the last 30 days, aggregated per day.
The data is in the following form:
{
"id":"648637",
"exchange_name":"BYBIT",
"exchange_icon_url":"https://cryptowisdom.com.au/wp-content/uploads/2022/01/Bybit-colored-logo.png",
"trade_time":"1675262081986",
"price_in_quote_asset":23057.5,
"price_in_usd":1,
"trade_value":60180.075,
"base_asset_icon":"https://assets.coingecko.com/coins/images/1/large/bitcoin.png?1547033579",
"qty":2.61,
"quoteqty":60180.075,
"is_buyer_maker":true,
"pair":"BTCUSDT",
"base_asset_trade":"BTC",
"quote_asset_trade":"USDT"
}
I need to truncate the data based on trade_time.
How do I write the query?
The secret sauce is the date_trunc function, which takes a timestamp with time zone and truncates it to a given precision (hour, day, week, etc.). You can then group by this value.
In your case, the trade_time values are JavaScript-style Unix timestamps (milliseconds since the epoch), so we first convert them to timestamp with time zone with to_timestamp, which expects seconds, hence the division by 1000. It's still a fairly simple query:
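SELECT
date_trunc('day', to_timestamp(trade_time::bigint / 1000.0)) AS trade_day,
COUNT(1)
FROM pings_raw
-- keep only the last 30 days; ::bigint also covers trade_time stored as text, as in the JSON sample
WHERE to_timestamp(trade_time::bigint / 1000.0) >= now() - interval '30 days'
GROUP BY date_trunc('day', to_timestamp(trade_time::bigint / 1000.0))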
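A minimal Python sketch of the same per-day counting, including the 30-day window the question asks for, can help check what happens to one of the sample rows: the millisecond trade_time is divided by 1000, read as seconds since the Unix epoch, and reduced to its date. This sketch assumes UTC, whereas PostgreSQL's date_trunc on a timestamp with time zone uses the session's TimeZone setting. The helper names and the fixed "now" are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def to_utc(trade_time_ms: str) -> datetime:
    # trade_time is a string of milliseconds since the Unix epoch (JavaScript style).
    return datetime.fromtimestamp(int(trade_time_ms) / 1000.0, tz=timezone.utc)

def daily_counts(trade_times, now: datetime, days: int = 30) -> Counter:
    # Keep rows from the last `days` days, then count them per UTC calendar day.
    cutoff = now - timedelta(days=days)
    stamps = (to_utc(t) for t in trade_times)
    return Counter(ts.date() for ts in stamps if ts >= cutoff)

now = datetime(2023, 2, 10, tzinfo=timezone.utc)
print(to_utc("1675262081986").date())  # 2023-02-01
print(daily_counts(["1675262081986"], now))
```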
Another approach is to keep everything as numbers, which might be marginally faster, though I find it less readable:
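SELECT
-- bigint arithmetic: a millisecond epoch value (~1.7e12) overflows a 32-bit int
(trade_time::bigint / (1000*60*60*24)) * (1000*60*60*24) AS day_start_ms,
COUNT(1)
FROM pings_raw
GROUP BY trade_time::bigint / (1000*60*60*24)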
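The numeric variant works because a Unix-time day is exactly 86,400,000 ms (1000*60*60*24; Unix time ignores leap seconds), so integer division followed by multiplication rounds a timestamp down to midnight UTC. The result is a millisecond epoch value far beyond the 32-bit int range, so in PostgreSQL the multiplication needs to happen in bigint. Unlike the date_trunc version, this always buckets by UTC days regardless of the session time zone. A small Python check of the arithmetic, with a hypothetical helper name:

```python
from datetime import datetime, timezone

MS_PER_DAY = 1000 * 60 * 60 * 24  # 86_400_000

def day_start_ms(trade_time_ms: int) -> int:
    # Floor division drops the time of day; multiplying back gives midnight UTC in ms.
    return (trade_time_ms // MS_PER_DAY) * MS_PER_DAY

start = day_start_ms(1675262081986)
print(start)  # 1675209600000
print(datetime.fromtimestamp(start / 1000, tz=timezone.utc))  # 2023-02-01 00:00:00+00:00
print(start > 2**31 - 1)  # True: too large for a 32-bit int
```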

Difference between two timestamps as timestamp across multiple days

I have two timestamps and would like a result showing the difference between them. I found a similar question asked here, but I noticed that:
select
to_char(column1::timestamp - column2::timestamp, 'HH:MI:SS')
from
table
Gives an incorrect result when the timestamps are more than a day apart. I know I can use EPOCH to work out the number of hours/days/minutes/seconds, etc., but my use case requires the result as a timestamp (or a string... anything but an interval!).
When the difference spans multiple days, I would like the hours to keep counting past 24, which would allow results like:
36:55:01
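I'd use the built-in date_part function (as previously described in an older thread: How to convert an interval like "1 day 01:30:00" into "25:30:00"?) to get the total number of seconds, turn that back into an interval made only of seconds (which PostgreSQL displays as total hours, without a day part), and finally cast the result to the type you want: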
SELECT
from_date,
to_date,
to_date - from_date as date_diff_interval,
(date_part('epoch', to_date - from_date) * INTERVAL '1 second')::text as date_diff_text
from (
(select
'2018-01-01 04:03:06'::timestamp as from_date,
'2018-01-02 16:58:07'::timestamp as to_date)
) as dates;
For the sample timestamps, date_diff_interval is 1 day 12:55:01, while date_diff_text is 36:55:01.
I'm not aware of any way to convert this interval into a timestamp, and I'm not sure there would be a use for it: the result is still an interval, and you would need a reference point in time to turn it into an actual timestamp.
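The epoch-based approach amounts to taking the total number of seconds in the difference and formatting the hours without wrapping at 24. A minimal Python sketch of that formatting, using the timestamps from the query above (the function name is hypothetical):

```python
from datetime import datetime, timedelta

def format_total_hours(delta: timedelta) -> str:
    # total_seconds() includes whole days, so hours keep counting past 24.
    # int() truncates any fractional seconds.
    total = int(delta.total_seconds())
    sign = "-" if total < 0 else ""
    hours, rest = divmod(abs(total), 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{sign}{hours:02d}:{minutes:02d}:{seconds:02d}"

diff = datetime(2018, 1, 2, 16, 58, 7) - datetime(2018, 1, 1, 4, 3, 6)
print(diff)                      # 1 day, 12:55:01
print(format_total_hours(diff))  # 36:55:01
```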

Newbie to HQL - date conversions in Hive - extract year from free-flow text varchar(100)

I have a simple HQL statement that works. I want to count the occurrences by year or by month. The data quality is not good: the incorporationdate column is stored as a varchar(100) and contains free-flowing text and nulls.
So I cannot simply use substring or YEAR; I need to extract the year or month only from values in the format mm/dd/yyyy. Ideally, I would like to create a view with two new columns, one showing the year and one showing the month.
select
incorporationdate, count(incorporationdate) from default.chjp2
group by companynumber,incorporationdate
Regards
JP
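You could use if() to test any condition you want before applying substr() (the [0-9] character classes sidestep backslash escaping inside Hive string literals):
select if(incorporationdate is not null and incorporationdate rlike '^[0-9]{2}/[0-9]{2}/[0-9]{4}$', substr(incorporationdate, 7, 4), null) as inc_year
from default.chjp2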
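The same guard-then-substring idea, extended to the two view columns the question asks for (year and month), can be sketched in Python. The pattern accepts only strings shaped exactly like mm/dd/yyyy and yields nothing for free text or nulls. The names and sample values are hypothetical.

```python
import re
from collections import Counter
from typing import Optional, Tuple

# Only strings shaped exactly like mm/dd/yyyy; groups capture month and year.
MMDDYYYY = re.compile(r"([0-9]{2})/[0-9]{2}/([0-9]{4})")

def year_and_month(value: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    # Nulls and free text that is not mm/dd/yyyy yield (None, None).
    if value is None:
        return (None, None)
    match = MMDDYYYY.fullmatch(value)
    if match is None:
        return (None, None)
    month, year = match.group(1), match.group(2)
    return (year, month)

rows = ["03/15/2015", "12/01/2014", "incorporated abroad", None, "03/02/2015"]
by_year = Counter(year for year, _ in map(year_and_month, rows) if year is not None)
print(by_year)  # Counter({'2015': 2, '2014': 1})
```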

PostgreSQL 8.2 extract week number from a date field

This might be a simple one, but I haven't found a solution yet. I have a create_date field of type date, and a revenue number. I want to see a weekly breakdown of revenue.
I can get the numbers easily in Tableau thanks to its built-in functionality, but I need some help doing it in PostgreSQL.
If you want the revenue by week, you'll need to group and aggregate:
select extract (week from create_date) as week, sum(revenue) from table group by week
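A caveat with grouping by the week number alone: the same week number from different years lands in one group, so over multi-year data you would normally group by the year as well (newer PostgreSQL versions also offer extract(isoyear from ...), the year that matches the ISO week number). A Python sketch of the difference, with made-up revenue rows and a hypothetical function name:

```python
from collections import defaultdict
from datetime import date

def revenue_by_week(rows, include_year: bool):
    # rows: iterable of (create_date, revenue) pairs.
    totals = defaultdict(float)
    for created, revenue in rows:
        iso_year, iso_week, _ = created.isocalendar()
        key = (iso_year, iso_week) if include_year else iso_week
        totals[key] += revenue
    return dict(totals)

# All three dates fall in ISO week 2, but of two different years.
rows = [(date(2015, 1, 7), 100.0), (date(2016, 1, 13), 50.0), (date(2016, 1, 14), 25.0)]
print(revenue_by_week(rows, include_year=False))  # {2: 175.0}
print(revenue_by_week(rows, include_year=True))   # {(2015, 2): 100.0, (2016, 2): 75.0}
```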

Postgres - Convert Date Range to Individual Month

I have found similar questions, but their issues were more complex, and I am only good with the basics of SQL, so I am striking out here. I get a handful of columns (a, b, c, startdate, enddate) and need to split each row into multiple rows, one per month within the range.
E.g. a,b,c,1/1/2015,3/15/2015 would become:
a,b,c,1/1/2015,value_here_doesnt_matter
a,b,c,2/1/2015,value_here_doesnt_matter
a,b,c,3/1/2015,value_here_doesnt_matter
It does not matter whether the start or end date falls on a specific day; only the month and year matter. So if the range includes any day of a given month, I want to output a row for that month, with the 1st as the default day.
Could I get any advice on where to begin? I'm attempting generate_series, but I am unsure whether it is the right approach, or how to make it work while keeping the data in the first few arbitrary columns consistent.
I think generate_series is the way to go. Without knowing what the rest of your data looks like, I would start with something like this:
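select
a, b, c,
-- truncate both bounds to the 1st so every month touched by the range is included
generate_series(date_trunc('month', startdate), date_trunc('month', enddate), interval '1 month')::date as month_start
from
my_table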
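A minimal Python sketch of the same expansion, assuming startdate and enddate are already dates: both bounds are snapped to the first of their month before stepping one month at a time. Snapping the start matters, because stepping a month at a time from a mid-month start (say January 15) yields February 15 and then March 15, which lies past an end date of March 10, so March would be dropped. The names are hypothetical.

```python
from datetime import date

def month_starts(start: date, end: date) -> list:
    # Snap both bounds to the 1st so every month touched by [start, end] is included.
    current = start.replace(day=1)
    last = end.replace(day=1)
    months = []
    while current <= last:
        months.append(current)
        # Step to the first day of the next month, rolling over December.
        if current.month == 12:
            current = current.replace(year=current.year + 1, month=1)
        else:
            current = current.replace(month=current.month + 1)
    return months

row = ("a", "b", "c", date(2015, 1, 1), date(2015, 3, 15))
for month in month_starts(row[3], row[4]):
    # Repeat the arbitrary leading columns on every generated row.
    print(*row[:3], f"{month.month}/{month.day}/{month.year}", sep=",")
```

For the example row this prints a,b,c,1/1/2015, then a,b,c,2/1/2015, then a,b,c,3/1/2015.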