Specific number of quarters from date using HiveQL - hiveql

I am trying to bring back a specific number (8) of quarters using a transaction date from a table. The date format is YYYYMMDD.
I could write a select using case to display a specific quarter depending on the current month.
I could find the beginning of the month using the trunc function, but could not find logic to bring back the last 8 quarters of data.

Convert the date to the Hive date format (yyyy-MM-dd) first.
Then use DENSE_RANK() to number rows by quarter (order by year desc and quarter desc), then filter by rnk<=8:
select * from
(
  --calculate DENSE_RANK
  select s.*, DENSE_RANK() over(order by year(your_date) desc, quarter(your_date) desc) as rnk
  from
  (
    --convert date from yyyyMMdd to yyyy-MM-dd format
    --original_date is a placeholder for your yyyyMMdd column (cast it to string if it is stored as INT)
    select t.*, from_unixtime(unix_timestamp(t.original_date,'yyyyMMdd'),'yyyy-MM-dd') your_date
    from table_name t
    --Also restrict your dataset to select only few last years here
    --because you do not need to scan all data
    --so add WHERE clause here
  ) s
) r where rnk<=8;
See manual on functions here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
You can optimize this query knowing your data and how your table is partitioned, restrict the dataset. Also add partition by clause to the over() if you need to query last 8 quarters for each key.
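To sanity-check the ranking logic outside Hive, here is a minimal Python sketch of the same idea. It assumes the dates are YYYYMMDD strings; the function names `quarter_key` and `last_n_quarters` are made up for illustration and are not part of Hive.

```python
from datetime import datetime


def quarter_key(yyyymmdd):
    """Return (year, quarter) for a date string in YYYYMMDD format."""
    d = datetime.strptime(yyyymmdd, "%Y%m%d")
    return d.year, (d.month - 1) // 3 + 1


def last_n_quarters(dates, n=8):
    """Keep dates that fall in the n most recent quarters present in the data.

    Mirrors DENSE_RANK() ordered by year desc, quarter desc, filtered by rnk <= n.
    """
    # Distinct quarters, newest first; position in this list is rank - 1.
    quarters = sorted({quarter_key(d) for d in dates}, reverse=True)
    keep = set(quarters[:n])
    return [d for d in dates if quarter_key(d) in keep]
```

Like DENSE_RANK(), this counts quarters that actually appear in the data, so a quarter with no rows does not use up one of the 8 slots.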

Related

delete all but two sorted items postgresql

In my structure I have the following. I would like to keep the most recent dates (highlighted in yellow) and delete the rest. I don't necessarily know the most recent dates (i.e. 17/4/2021 and 10/2/2021 in my example) for each stock_id, but I know I want to keep only the two most recent items.
Is that possible?
Thank you
Note: this assumes that dates do not repeat within each stock_id group in your table, so top two dates are always unique.
You can assign rank to each row within stock_id after ordering by date and delete rows where rank is greater than 2.
DELETE FROM mytable
WHERE (stock_id, date) NOT IN (
SELECT
stock_id,
date
FROM (
SELECT
stock_id,
date,
row_number() over (partition by stock_id order by date desc) as rank
FROM mytable
) ranks
WHERE rank <= 2
)
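One way to try this DELETE safely before running it on real data is against a throwaway table. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption: it needs a SQLite build with window functions, 3.25 or later), with made-up sample rows and the alias `rn` in place of `rank`.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (stock_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        (1, "2021-04-17"), (1, "2021-02-10"), (1, "2020-12-01"),
        (2, "2021-03-05"), (2, "2021-01-20"), (2, "2020-11-11"), (2, "2020-10-01"),
        (3, "2021-05-01"),
    ],
)

# Delete every row that is not among the two newest dates of its stock_id.
conn.execute(
    """
    DELETE FROM mytable
    WHERE (stock_id, date) NOT IN (
        SELECT stock_id, date
        FROM (
            SELECT stock_id, date,
                   row_number() OVER (PARTITION BY stock_id ORDER BY date DESC) AS rn
            FROM mytable
        ) ranks
        WHERE rn <= 2
    )
    """
)

remaining = conn.execute(
    "SELECT stock_id, date FROM mytable ORDER BY stock_id, date DESC"
).fetchall()
```

Groups with fewer than two rows (stock_id 3 here) are left untouched, since all of their rows rank within the top two.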

Get the first live record on each quarter

We have a pricing table and I need to get the first live record in each quarter. The table structure is like this:
record_id (int)
start_date (date)
price (decimal)
live (boolean)
I need to be able to get the first "live" record on each quarter.
So far, I've been able to do this:
SELECT DISTINCT EXTRACT(QUARTER FROM start_date::TIMESTAMP) as quarter,
EXTRACT(YEAR FROM start_date::TIMESTAMP) as year,
distinct start_date,
live
FROM record_pricing rp
group by year, quarter,record_instance_uid
order by year,quarter;
I get this:
As you can see there are live and not live records there in the results, I just need the first live record on each Q, as highlighted in the picture above as an example.
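You can use a ROW_NUMBER() per year and quarter, computed over live records only, and filter on it in an outer query (a window function alias cannot be filtered in the same SELECT that defines it):
SELECT *
FROM (
  SELECT tab.*, ROW_NUMBER() OVER (PARTITION BY year, quarter ORDER BY start_date ASC) AS rn
  FROM (
    SELECT EXTRACT(QUARTER FROM start_date::TIMESTAMP) AS quarter,
           EXTRACT(YEAR FROM start_date::TIMESTAMP) AS year,
           record_instance_uid, live, start_date
    FROM record_pricing rp
    WHERE live -- keep only live records before ranking
  ) tab
) ranked
WHERE rn = 1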
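The goal stated in the question (ignore non-live rows, then take the earliest start_date in each year and quarter) can be sketched in plain Python to check expected results on a small sample. The function name `first_live_per_quarter` and the tuple layout are assumptions based on the table structure listed in the question.

```python
from datetime import date


def first_live_per_quarter(rows):
    """rows: iterable of (record_id, start_date, price, live) tuples.

    Returns {(year, quarter): row} holding the earliest live row of each quarter.
    """
    first = {}
    for row in rows:
        record_id, start_date, price, live = row
        if not live:
            continue  # ignore non-live records before picking the first one
        key = (start_date.year, (start_date.month - 1) // 3 + 1)
        # Keep the row with the smallest start_date per (year, quarter).
        if key not in first or start_date < first[key][1]:
            first[key] = row
    return first
```

Filtering on live before choosing the first row matters: if the earliest record of a quarter is not live, filtering afterwards would drop that quarter entirely.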

HQL: Max date of previous month

Good morning,
I have a problem I've been trying to solve but am getting nowhere.
I need to find the max date of the previous month. Normally I would just use the following to find the last day of the previous month: last_day(add_months(current_date, -1))
However, this particular data set doesn't always have data for the last day. E.g. the last day in the data for May was May 30th. Obviously, if I try using the syntax above it would return no data because it would be looking for 5/31.
So is there a way to find the "max" day available in the data of the previous month? Or the month prior, etc.?
For example like this (two scans of table: one in subquery to find max date and one in main query):
select *
from mytable
where as_of_date in (select max(as_of_date) from mytable where as_of_date between trunc(add_months(current_date, -1), 'MM') and last_day(add_months(current_date, -1)))
Or (single scan + analytic function) like this
select col1 ... colN
from
(
select t.*, rank() over (partition by month (t.as_of_date) order by t.as_of_date desc) rnk
from mytable t
where --If you have partition on date, this WHERE may improve performance
t.as_of_date between trunc(add_months(current_date, -1), 'MM') and last_day(add_months(current_date, -1))
)s
where rnk=1
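The date arithmetic behind both queries can be checked with a small Python sketch: compute the bounds of the previous month, then take the latest date that actually exists in the data within those bounds. The function names are hypothetical and `today` is passed in explicitly so the result is reproducible.

```python
from datetime import date, timedelta


def previous_month_bounds(today):
    """Return (first_day, last_day) of the month before `today`."""
    # Stepping back one day from the 1st of this month lands on the
    # last day of the previous month, whatever its length.
    last_day = today.replace(day=1) - timedelta(days=1)
    return last_day.replace(day=1), last_day


def max_date_in_previous_month(dates, today):
    """Latest date in `dates` that falls in the previous month, or None."""
    first_day, last_day = previous_month_bounds(today)
    in_range = [d for d in dates if first_day <= d <= last_day]
    return max(in_range) if in_range else None
```

With data that stops at May 30th and a run date in June, this returns May 30th rather than looking for May 31st.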

SQL SSRS aggregate functions

I am trying to figure out the aggregate functions in SQL SSRS to give me the sum of total sales for the given information by YEAR. I need to combine the year and the months within that year, and provide the total sum of sales for that year. For example: for 2018 I need to combine months 2-12 and provide the total sum, for 2019 combine months 1-12 and provide the total sum, and so on.
I'm not sure where to begin on this one as I am new to SQL SSRS. Any help would be appreciated!
UPDATE:
Ideally I want this to be the end result:
id Year Price
102140 2019 ($XXXXX.XX)
102140 2018 ($XXXXX.XX)
102140 2017 ($XXXXX.XX)
And so on.
your query:
Select customer_id
, year_ordered
--, month_ordered
--, extended_price
--, SUM(extended_price) OVER (PARTITION BY year_ordered) AS year_total
, SUM(extended_price) AS year_total
From customer_order_history
Where customer_id = '101646'
Group By
customer_id
, year_ordered
, extended_price
--, month_ordered
It returns multiple rows per year_ordered, because extended_price is still in the GROUP BY, so each month's price ends up in its own group with its own SUM.
There are two approaches.
Do this in your dataset query:
SELECT Customer_id, year_ordered, SUM(extended_price) AS Price
FROM myTable
GROUP BY Customer_id, year_ordered
This option is best when you will never need the month values themselves in the report (i.e. you don't intend to have a drill down to the month data)
Do this in SSRS
By default you will get a row group called "Details" (look under the main design area and you will see row groups and column groups).
You can right-click this and add grouping for both customer_id and year_ordered. You can then change the extended_price textbox's value property to =SUM(Fields!extended_price.Value)
You could use a window function in your SQL:
select [year], [month], [price], SUM(PRICE) OVER (PARTITION BY year) as yearTotal
from myTable
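The effect of the GROUP BY columns can be seen on a tiny sample. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server, with invented rows in a table shaped like customer_order_history; it compares grouping by customer and year only against a grouping that also includes extended_price.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_order_history "
    "(customer_id TEXT, year_ordered INTEGER, month_ordered INTEGER, extended_price REAL)"
)
conn.executemany(
    "INSERT INTO customer_order_history VALUES (?, ?, ?, ?)",
    [
        ("101646", 2018, 2, 100.0),
        ("101646", 2018, 3, 50.0),
        ("101646", 2019, 1, 25.0),
        ("101646", 2019, 2, 75.0),
        ("101646", 2019, 3, 10.0),
    ],
)

# Group only by the columns that define a group: one row per customer and year.
yearly = conn.execute(
    """
    SELECT customer_id, year_ordered, SUM(extended_price) AS year_total
    FROM customer_order_history
    GROUP BY customer_id, year_ordered
    ORDER BY year_ordered DESC
    """
).fetchall()

# Adding extended_price to GROUP BY splits each year into one group per distinct price.
split = conn.execute(
    """
    SELECT customer_id, year_ordered, SUM(extended_price) AS year_total
    FROM customer_order_history
    GROUP BY customer_id, year_ordered, extended_price
    """
).fetchall()
```

Here `yearly` has one row per year, while `split` has one row per distinct price, which is the repeated-year output described in the question.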

How to reference a date range from another CTE in a where clause without joining to it?

I'm trying to write a query for Hive that uses the system date to determine both yesterday's date as well as the date 30 days ago. This will provide me with a rolling 30 days without the need to manually feed the date range to the query every time I run it.
I have that code working fine in a CTE. The problem I'm having is in referencing those dates in another CTE without joining the CTEs together, which I can't do since there's not a common field to join on.
I've tried various approaches but I get a "ParseException" every time.
WITH
date_range AS (
SELECT
CAST(from_unixtime(unix_timestamp()-30*60*60*24,'yyyyMMdd') AS INT) AS start_date,
CAST(from_unixtime(unix_timestamp()-1*60*60*24,'yyyyMMdd') AS INT) AS end_date
)
SELECT * FROM myTable
WHERE date_id BETWEEN (SELECT start_date FROM date_range) AND (SELECT end_date FROM date_range)
The intended result is the set of records from myTable that have a date_id between the start_date and end_date as found in the CTE date_range. Perhaps I'm going about this all wrong?
You can do a CROSS JOIN; it does not require an ON condition. Your date_range dataset is one row only, so you can CROSS JOIN it with your table and it will be converted to a map-join (the small dataset is broadcast to all the mappers and loaded into each mapper's memory, which works very fast). Check the EXPLAIN output to make sure it is a map-join:
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=250000000;
WITH
date_range AS (
SELECT
CAST(from_unixtime(unix_timestamp()-30*60*60*24,'yyyyMMdd') AS INT) AS start_date,
CAST(from_unixtime(unix_timestamp()-1*60*60*24,'yyyyMMdd') AS INT) AS end_date
)
SELECT t.*
FROM myTable t
CROSS JOIN date_range d
WHERE t.date_id BETWEEN d.start_date AND d.end_date
Also, instead of this, you can calculate the dates directly in the WHERE clause (no CTE or join is needed):
SELECT t.*
FROM myTable t
WHERE t.date_id
BETWEEN CAST(from_unixtime(unix_timestamp()-30*60*60*24,'yyyyMMdd') AS INT)
AND CAST(from_unixtime(unix_timestamp()-1*60*60*24,'yyyyMMdd') AS INT)
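To double-check which date_id values the expressions above should produce, here is a minimal Python sketch of the same rolling window. The function name `rolling_range` is hypothetical, and `today` is passed in explicitly rather than read from the system clock so the result is reproducible.

```python
from datetime import date, timedelta


def rolling_range(today, days_back=30):
    """Return (start_id, end_id) as YYYYMMDD integers.

    The range runs from `days_back` days ago through yesterday, inclusive.
    """
    start = today - timedelta(days=days_back)
    end = today - timedelta(days=1)
    return int(start.strftime("%Y%m%d")), int(end.strftime("%Y%m%d"))
```

For a run date of 2019-07-01 this gives 20190601 through 20190630. Integer YYYYMMDD values sort in the same order as the dates they encode, which is why BETWEEN works on date_id.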