How to get the date 20 days before a date in YYYYMMDD format

How can I get the date 20 days before a date in YYYYMMDD format? The function date_sub() does not seem to work.
For example, I want the date 20 days before '20180912' in Hive.
I am using date_sub() to join two tables by date:
select a.*,b.*
from table1 a
left join
table2 b
on
from_unixtime(unix_timestamp(date1,'yyyymmdd'))=date_sub(date2,20)
This returns nothing.

The format you are using is wrong. Case matters: in these patterns 'mm' means minutes and 'MM' means month, so the correct format is 'yyyyMMdd'.
date_sub requires yyyy-MM-dd to work correctly, so convert if necessary.
select date_sub(from_unixtime(unix_timestamp('20180912','yyyyMMdd')),20) ;
OK
2018-08-23
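To show why the case of the pattern matters, here is a small Python sketch. It does not run Hive; it mirrors the two patterns instead. Python's `%m` means month and `%M` means minute, much like `MM` and `mm` in the Java-style patterns Hive uses. The helper name `days_before` is hypothetical.

```python
from datetime import datetime, timedelta

def days_before(yyyymmdd: str, days: int) -> str:
    """Return the date `days` days before a 'yyyyMMdd' string, formatted 'yyyy-MM-dd'."""
    parsed = datetime.strptime(yyyymmdd, "%Y%m%d").date()  # %m = month
    return (parsed - timedelta(days=days)).isoformat()

# Correct pattern: the month is parsed as a month.
print(days_before("20180912", 20))  # 2018-08-23

# Wrong pattern: '09' is read as minutes and the month defaults to January,
# analogous to what 'yyyymmdd' does in Hive.
print(datetime.strptime("20180912", "%Y%M%d"))  # 2018-01-12 00:09:00
```

With the minutes pattern, the parsed value lands in January, so it can never equal the result of date_sub and the join finds no matches.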
Casting to timestamp produces a wrong result (year 1970). It looks as if the bigint is interpreted as milliseconds rather than seconds; this may be an issue in my Hive version (1.2.1):
select cast(unix_timestamp('20180912','yyyyMMdd') as timestamp);
OK
1970-01-18 18:51:50.4
Use from_unixtime(unix_timestamp('20180912','yyyyMMdd')) for conversion, it works fine.
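The 1970 result matches what you get if the bigint is read as milliseconds instead of seconds. The Python sketch below reproduces both readings of the same value. It assumes the session time zone is UTC, which is what makes the numbers line up exactly with the Hive output above.

```python
from datetime import datetime, timezone

# unix_timestamp('20180912', 'yyyyMMdd') in seconds, assuming a UTC session time zone
secs = int(datetime(2018, 9, 12, tzinfo=timezone.utc).timestamp())
print(secs)  # 1536710400

# Read as seconds: the intended date
as_seconds = datetime.fromtimestamp(secs, tz=timezone.utc)
print(as_seconds)  # 2018-09-12 00:00:00+00:00

# Read as milliseconds: about 17.8 days after the epoch
as_millis = datetime.fromtimestamp(secs / 1000, tz=timezone.utc)
print(as_millis)  # 1970-01-18 18:51:50.400000+00:00
```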

date_sub can do what you need, like this:
select date_sub(CAST(unix_timestamp('20180912','yyyyMMdd') AS TIMESTAMP), 20) as date;
+-------------+--+
| date |
+-------------+--+
| 2018-08-23 |
+-------------+--+
And your join could be written like this:
select
a.*,
b.*
from
table1 a
left join
table2 b on date1 = regexp_replace(cast(to_date(date_sub(CAST(unix_timestamp(cast(date2 as string),'yyyyMMdd') AS TIMESTAMP), 20)) as string),'-','')
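The join condition converts date2 to a date, subtracts 20 days and strips the dashes, so the result can be compared with date1 again. Below is a Python sketch of that key calculation and of the left join. It assumes both columns hold 'yyyyMMdd' strings; the sample rows and the name `shifted_key` are made up for illustration.

```python
from datetime import datetime, timedelta

def shifted_key(date2: str, days: int = 20) -> str:
    """date2 ('yyyyMMdd') minus `days` days, formatted back as 'yyyyMMdd'."""
    shifted = datetime.strptime(date2, "%Y%m%d") - timedelta(days=days)
    return shifted.strftime("%Y%m%d")  # same effect as regexp_replace(..., '-', '')

# Hypothetical rows
table1 = [{"date1": "20180823", "a_val": 1}, {"date1": "20180101", "a_val": 2}]
table2 = [{"date2": "20180912", "b_val": "x"}]

# Index table2 by its shifted key
by_key = {}
for b in table2:
    by_key.setdefault(shifted_key(b["date2"]), []).append(b)

# Left join: every table1 row is kept; unmatched rows get None for table2's side
joined = []
for a in table1:
    for b in by_key.get(a["date1"], [None]):
        joined.append((a, b))
```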

Related

Netezza - find the first date of the prior quarter

I'm trying to get the first day of the previous quarter from today's date, but I can't find the logic for it in Netezza SQL.
For SQL Server I could use the following:
select dateadd(quarter, datediff(quarter, 0, getdate()) - 1, 0)
There doesn't appear to be an equivalent of datediff in Netezza. Any suggestions would be greatly appreciated.
=> select now(), (date_trunc('quarter', now()) - interval ('3 months'))::date as result;
NOW | RESULT
---------------------+------------
2022-09-02 13:09:05 | 2022-04-01
(1 row)
Based on a similar answer, I was able to adapt my code to the following:
WHERE
TABLE_A.DATE_FIELD >= (SELECT TO_DATE(TO_CHAR(TO_DATE(TO_CHAR(CURRENT_DATE, 'YYYYQ'),'YYYYQ')-1,'YYYYQ'),'YYYYQ'))
AND TABLE_A.DATE_FIELD < (SELECT TO_DATE(TO_CHAR(CURRENT_DATE,'YYYYQ'),'YYYYQ'))
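Both answers compute the same two boundaries: the first day of the current quarter and, three months earlier, the first day of the previous quarter. Here is a Python sketch of that arithmetic; the function name is hypothetical. It returns the bounds as a half-open range [start, end), so a filter built from them does not include the first day of the current quarter.

```python
from datetime import date
from typing import Tuple

def previous_quarter_bounds(today: date) -> Tuple[date, date]:
    """Return (first day of previous quarter, first day of current quarter)."""
    # Equivalent of date_trunc('quarter', today): months 1, 4, 7 and 10 start a quarter
    current_start = date(today.year, 3 * ((today.month - 1) // 3) + 1, 1)
    # Subtract three months, wrapping into the previous year when in Q1
    if current_start.month == 1:
        previous_start = date(current_start.year - 1, 10, 1)
    else:
        previous_start = date(current_start.year, current_start.month - 3, 1)
    return previous_start, current_start

print(previous_quarter_bounds(date(2022, 9, 2)))  # (datetime.date(2022, 4, 1), datetime.date(2022, 7, 1))
```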

How to form a dynamic pivot table or return multiple values from GROUP BY subquery

I'm having some major issues with the following query formation:
I have projects with start and end dates
Name Start End
---------------------------------------
Project 1 2020-08-01 2020-09-10
Project 2 2020-01-01 2025-01-01
and I'm trying to count the monthly working days within each project with the following subquery
select date_trunc('month', days) as d_month, count(days) as d_count
from generate_series(greatest('2020-08-01'::date, p.start), least('2020-09-14'::date, p.end), '1 day'::interval) days
where extract(DOW from days) not IN (0, 6)
group by d_month
where p.start comes from the aliased main query and the dates are hard-coded for now. This correctly gives me the following result:
{"d_month"=>2020-08-01 00:00:00 +0000, "d_count"=>21}
{"d_month"=>2020-09-01 00:00:00 +0000, "d_count"=>10}
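The subquery's logic can be checked with a short Python sketch: clamp the project to the reporting window, keep Monday to Friday, then group by month. PostgreSQL's extract(DOW ...) numbers Sunday as 0 and Saturday as 6, while Python's weekday() numbers Monday as 0 and Sunday as 6. So "not in (0, 6)" becomes "weekday() < 5". The function name is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

def monthly_work_days(start: date, end: date, window_start: date, window_end: date) -> dict:
    """Count Monday-Friday days per month inside both the project and the window."""
    lo = max(window_start, start)  # greatest('2020-08-01', p.start)
    hi = min(window_end, end)      # least('2020-09-14', p.end)
    counts = Counter()
    d = lo
    while d <= hi:                 # generate_series(lo, hi, '1 day'), inclusive
        if d.weekday() < 5:        # Mon=0 ... Fri=4
            counts[d.replace(day=1)] += 1  # date_trunc('month', d)
        d += timedelta(days=1)
    return dict(counts)

window = (date(2020, 8, 1), date(2020, 9, 14))
print(monthly_work_days(date(2020, 8, 1), date(2020, 9, 10), *window))  # 21 in 2020-08, 8 in 2020-09
print(monthly_work_days(date(2020, 1, 1), date(2025, 1, 1), *window))   # 21 in 2020-08, 10 in 2020-09
```

These are the same per-month counts as in the desired output table.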
However, a scalar subquery can't return multiple rows. The date range for the query is dynamic, so I would either need to somehow return the result as:
Name Start End 2020-08-01 2020-09-01 ...
-------------------------------------------------------------------------
Project 1 2020-08-01 2020-09-10 21 8
Project 2 2020-01-01 2025-01-01 21 10
Or simply return the whole subquery result as JSON, but that doesn't seem to work either.
Any idea on how to achieve this or whether there are simpler solutions for this?
The most correct solution would be to create an actual calendar table that holds every possible day of interest to your business and, at a minimum for your purpose here, marks work days.
Ideally you would have columns to hold fiscal quarters, periods, and weeks to match your industry. You would also mark holidays. Joining to this table makes these kinds of calculations a snap.
create table calendar (
ddate date not null primary key,
is_work_day boolean default true
);
insert into calendar
select ts::date as ddate,
extract(dow from ts) not in (0,6) as is_work_day
from generate_series(
'2000-01-01'::timestamp,
'2099-12-31'::timestamp,
interval '1 day'
) as gs(ts);
Assuming a calendar table is not within scope, you can do this:
with bounds as (
select min(start) as first_start, max("end") as last_end
from my_projects
), cal as (
select ts::date as ddate,
extract(dow from ts) not in (0,6) as is_work_day
from bounds
cross join generate_series(
first_start,
last_end,
interval '1 day'
) as gs(ts)
), bymonth as (
select p.name, p.start, p.end,
date_trunc('month', c.ddate) as month_start,
count(*) as work_days
from my_projects p
join cal c on c.ddate between p.start and p.end
where c.is_work_day
group by p.name, p.start, p.end, month_start
)
select jsonb_object_agg(to_char(month_start, 'YYYY-MM-DD'), work_days)
|| jsonb_object_agg('name', name)
|| jsonb_object_agg('start', start)
|| jsonb_object_agg('end', "end") as result
from bymonth
group by name;
Pivoting rows into a dynamic set of columns in SQL is usually a bad idea, because the result columns must be known before the query runs. This query produces JSON for you instead.
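For comparison, here is a Python sketch of the shape the final query returns: one JSON object per project, with name, start, end and one key per month. It uses the two sample projects from the question over their full date ranges, with no reporting window. Like the SQL, it counts only Monday to Friday and ignores holidays.

```python
import json
from collections import Counter
from datetime import date, timedelta

projects = [
    ("Project 1", date(2020, 8, 1), date(2020, 9, 10)),
    ("Project 2", date(2020, 1, 1), date(2025, 1, 1)),
]

def project_json(name: str, start: date, end: date) -> str:
    """Build {month_start: work_days, ..., name, start, end} for one project."""
    months = Counter()
    d = start
    while d <= end:          # join cal c on c.ddate between p.start and p.end
        if d.weekday() < 5:  # is_work_day (Mon-Fri)
            months[d.replace(day=1).isoformat()] += 1
        d += timedelta(days=1)
    obj = dict(months)
    obj.update(name=name, start=start.isoformat(), end=end.isoformat())
    return json.dumps(obj)

results = {name: json.loads(project_json(name, s, e)) for name, s, e in projects}
print(results["Project 1"])
```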

BigQuery - DATE_TRUNC error

I'm trying to get monthly aggregated data from a legacy table, in which the date columns are strings:
amount date_create
100 2018-01-05
200 2018-02-03
300 2018-01-22
However, the command
Select DATE_TRUNC(DATE date_create, MONTH) as month,
sum(amount) as amount_m
from table
group by 1
Returns the following error:
Error: Syntax error: Expected ")" but got identifier "date_create"
Why does this query not run and what can be done to avoid the issue?
Thanks
It looks like you meant to cast date_create instead of using the DATE keyword (which is how you construct a literal value) there. Try this instead:
Select DATE_TRUNC(DATE(date_create), MONTH) as month,
sum(amount) as amount_m
from table
GROUP BY 1
I figured it out:
date_trunc(cast(date_create as date), MONTH) as Month
Another option for BigQuery Standard SQL is the PARSE_DATE function:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 100 amount, '2018-01-05' date_create UNION ALL
SELECT 200, '2018-02-03' UNION ALL
SELECT 300, '2018-01-22'
)
SELECT
DATE_TRUNC(PARSE_DATE('%Y-%m-%d', date_create), MONTH) AS month,
SUM(amount) AS amount_m
FROM `project.dataset.table`
GROUP BY 1
with the result:
Row month amount_m
1 2018-01-01 400
2 2018-02-01 200
In practice, I prefer PARSE_DATE over CAST because the former documents the expected data format.
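The PARSE_DATE plus DATE_TRUNC combination can be mirrored in Python to check the expected totals. strptime with the same '%Y-%m-%d' format string fails on input that does not match, which is the benefit of spelling out the format. Replacing the day with 1 truncates a date to its month.

```python
from collections import defaultdict
from datetime import date, datetime

rows = [(100, "2018-01-05"), (200, "2018-02-03"), (300, "2018-01-22")]

def month_start(d: date) -> date:
    """Equivalent of DATE_TRUNC(d, MONTH)."""
    return d.replace(day=1)

amount_m = defaultdict(int)
for amount, date_create in rows:
    parsed = datetime.strptime(date_create, "%Y-%m-%d").date()  # PARSE_DATE('%Y-%m-%d', date_create)
    amount_m[month_start(parsed)] += amount                     # GROUP BY month, SUM(amount)
print(dict(amount_m))

# A string in another format is rejected instead of being silently misread
try:
    datetime.strptime("05/01/2018", "%Y-%m-%d")
except ValueError as err:
    print("rejected:", err)
```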
Try adding quotes around date_create:
Select DATE_TRUNC('date_create', MONTH) as month,
sum(amount) as amount_m
from table
group by 1

CROSSTAB PostgreSQL - Alternative for PIVOT in Oracle

I'm migrating an Oracle PIVOT query to a PostgreSQL crosstab query.
create table x_c(cntry numeric,week numeric,year numeric,days text,day text);
insert into x_c values(1,15,2015,'DAY1','MON');
...
insert into x_c values(1,15,2015,'DAY7','SUN');
insert into x_c values(2,15,2015,'DAY1','MON');
...
insert into x_c values(4,15,2015,'DAY7','SUN');
I have 4 weeks with 28 rows like this in a table. My Oracle query looks like this:
SELECT * FROM(select * from x_c)
PIVOT (MIN(DAY) FOR (DAYS) IN
('DAY1' AS DAY1 ,'DAY2' DAY2,'DAY3' DAY3,'DAY4' DAY4,'DAY5' DAY5,'DAY6' DAY6,'DAY7' DAY7 ));
Result:
cntry|week|year|day1|day2|day3|day4|day5|day6|day7|
---------------------------------------------------
1 | 15 |2015| MON| TUE| WED| THU| FRI| SAT| SUN|
...
4 | 18 |2015| MON| ...
Now I have written a Postgres crosstab query like this:
select *
from crosstab('select cntry,week,year,days,min(day) as day
from x_c
group by cntry,week,year,days'
,'select distinct days from x_c order by 1'
) as (cntry numeric,week numeric,year numeric
,day1 text,day2 text,day3 text,day4 text, day5 text,day6 text,day7 text);
I'm getting only one row as output:
1|17|2015|MON|TUE| ... -- only this row is coming
What am I doing wrong?
ORDER BY was missing in your original query. The manual:
In practice the SQL query should always specify ORDER BY 1,2 to ensure that the input rows are properly ordered, that is, values with the same row_name are brought together and correctly ordered within the row.
More importantly (and trickier), crosstab() requires exactly one row_name column. Detailed explanation in this closely related answer:
Crosstab splitting results due to presence of unrelated field
The solution you found is to nest multiple columns in an array and later unnest again. That's needlessly expensive, error prone and limited (only works for columns with identical data types or you need to cast and possibly lose proper sort order).
Instead, generate a surrogate row_name column with rank() or dense_rank() (rnk in my example):
SELECT cntry, week, year, day1, day2, day3, day4, day5, day6, day7
FROM crosstab (
'SELECT dense_rank() OVER (ORDER BY cntry, week, year)::int AS rnk
, cntry, week, year, days, day
FROM x_c
ORDER BY rnk, days'
, $$SELECT unnest('{DAY1,DAY2,DAY3,DAY4,DAY5,DAY6,DAY7}'::text[])$$
) AS ct (rnk int, cntry int, week int, year int
, day1 text, day2 text, day3 text, day4 text, day5 text, day6 text, day7 text)
ORDER BY rnk;
I use the data type integer for the output columns cntry, week, year because that seems to be the (cheaper) appropriate type. You can also use numeric like you had it.
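To see why both the ORDER BY and a single row_name column matter, here is a simplified Python model of how crosstab(text, text) consumes its input rows in order. It illustrates the behavior described in the manual quote and is not the tablefunc implementation. The function name and sample data are made up.

```python
from typing import Any, Hashable, Iterable, List, Sequence, Tuple

Row = Tuple[Hashable, tuple, str, Any]  # (row_name, extra columns, category, value)

def crosstab_model(rows: Iterable[Row], categories: Sequence[str]) -> List[tuple]:
    """Simplified model of crosstab(source_sql, category_sql).

    A new output row starts whenever row_name differs from the previous
    input row; extra columns are taken from the first input row of each run.
    """
    out: List[list] = []
    for row_name, extras, category, value in rows:
        if not out or out[-1][0] != row_name:
            out.append([row_name, extras, dict.fromkeys(categories)])
        slots = out[-1][2]
        if category in slots:  # categories not in the list are ignored
            slots[category] = value
    return [(name, extras, tuple(slots[c] for c in categories)) for name, extras, slots in out]

days = ["DAY1", "DAY2", "DAY3", "DAY4", "DAY5", "DAY6", "DAY7"]
names = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]
x_c = [(cntry, 15, 2015, d, n) for cntry in (1, 2) for d, n in zip(days, names)]

# dense_rank() OVER (ORDER BY cntry, week, year) serves as the single row_name
keys = sorted({(c, w, y) for c, w, y, _, _ in x_c})
rnk = {k: i + 1 for i, k in enumerate(keys)}
source = [(rnk[(c, w, y)], (c, w, y), d, n) for c, w, y, d, n in x_c]

ordered = crosstab_model(sorted(source, key=lambda r: (r[0], r[2])), days)    # ORDER BY rnk, days
unordered = crosstab_model(sorted(source, key=lambda r: (r[2], r[0])), days)  # row_names interleaved
print(len(ordered), len(unordered))  # 2 14
```

With the rows grouped by row_name, each (cntry, week, year) becomes exactly one output row. When the rows are interleaved, every change of row_name starts a new, mostly empty row.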
Basics for crosstab queries here:
PostgreSQL Crosstab Query
I figured this out with the help of http://www.postgresonline.com/journal/categories/24-tablefunc
select year_wk_cntry.t[1],year_wk_cntry.t[2],year_wk_cntry.t[3],day1,day2,day3,day4,day5,day6,day7
from crosstab('select ARRAY[cntry :: numeric,week,year] as t,days,min(day) as day
from x_c group by cntry,week,year,days order by 1,2
','select distinct days from x_c order by 1')
as year_wk_cntry (t numeric[],day1 text,day2 text,day3 text,
day4 text, day5 text,day6 text,day7 text);
thanks!!

Select lines whose date-field is in a given month and year

My SQL table looks like this:
id (int) | date (date) | text1 (varchar) | text2 (varchar)
I want to select the rows whose date falls within a given month and year, regardless of the day.
Both month and year are given in the select-statement as integers.
So the missing thing is the where-clause. Perhaps extract() is the thing I'm looking for, but I don't know how to use it with the two integers, e.g. 2011 and 02.
You can use extract:
SELECT * FROM yourtable
WHERE EXTRACT(month FROM "date") = 2
AND EXTRACT(year FROM "date") = 2011
But in this case you could also use a range condition, which, unlike wrapping the column in EXTRACT, can use an index on "date":
SELECT * FROM yourtable
WHERE "date" >= '2011-02-01' AND "date" < '2011-03-01'
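You can turn the two integers into that half-open range in application code and bind the bounds as parameters. Below is a Python sketch using an in-memory SQLite table; the function name and sample rows are hypothetical. SQLite has no native date type, so the dates are stored as ISO text, which compares correctly. The month arithmetic also handles December rolling over into the next year.

```python
import sqlite3
from datetime import date
from typing import Tuple

def month_range(year: int, month: int) -> Tuple[date, date]:
    """Return [first day of month, first day of next month) for a half-open filter."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE yourtable (id INTEGER, "date" TEXT, text1 TEXT, text2 TEXT)')
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?)", [
    (1, "2011-01-31", "a", "b"),
    (2, "2011-02-01", "c", "d"),
    (3, "2011-02-28", "e", "f"),
    (4, "2011-03-01", "g", "h"),
])

start, end = month_range(2011, 2)
ids = [r[0] for r in conn.execute(
    'SELECT id FROM yourtable WHERE "date" >= ? AND "date" < ? ORDER BY id',
    (start.isoformat(), end.isoformat()),
)]
print(ids)  # [2, 3]
```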