HQL: Max date of previous month - date

Good morning,
I have a problem I've been trying to solve but am getting nowhere.
I need to find the max date of the previous month. Normally I would just use the following to find the last day of the previous month: last_day(add_months(current_date, -1))
However, this particular data set doesn't always have data for the last day of the month. E.g. the last day in the data for May was May 30th. Obviously, if I use the syntax above it returns no data because it is looking for 5/31.
So is there a way to find the "max" day available in the data of the previous month? Or the month prior etc.?

For example like this (two scans of the table: one in a subquery to find the max date and one in the main query):
select *
from mytable
where as_of_date in (
    select max(as_of_date)
    from mytable
    -- Hive has no first_day(); trunc(date, 'MM') returns the first day of the month
    where as_of_date between trunc(add_months(current_date, -1), 'MM')
                         and last_day(add_months(current_date, -1))
)
Or like this (a single scan plus an analytic function):
select col1 ... colN
from
(
    select t.*, rank() over (partition by month(t.as_of_date) order by t.as_of_date desc) rnk
    from mytable t
    -- If the table is partitioned on date, this WHERE clause may improve performance
    where t.as_of_date between trunc(add_months(current_date, -1), 'MM')
                           and last_day(add_months(current_date, -1))
) s
where rnk = 1
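If the boundary expressions feel fiddly, an equivalent previous-month filter can compare truncated months directly. A minimal sketch, assuming Hive 1.2+ where trunc() accepts a date and the 'MM' format (note that this variant does not prune date partitions as readily as the BETWEEN filter above):
select *
from mytable
where as_of_date in (
    select max(as_of_date)
    from mytable
    -- any row whose month equals the previous calendar month
    where trunc(as_of_date, 'MM') = trunc(add_months(current_date, -1), 'MM')
)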

Related

postgresql get weekly average of cases with daily data

I have a table called Table1. I am trying to get the weekly average, but I only have daily data. My table contains the following attributes: caseID, date, status and some other (irrelevant) attributes. With the following query, I made the following table, which comes close to what I want:
However, I would like to add an average per week of the number of cases. I have looked everywhere, but I am not sure how to include that. Does anybody have any clues on how to add that?
Thanks.
To expand on #luuk's answer...
SELECT
date,
COUNT(id) as countcase,
EXTRACT(WEEK FROM date) AS weeknbr,
AVG(COUNT(id)) OVER (PARTITION BY EXTRACT(WEEK FROM date)) as weeklyavg
FROM table1
GROUP BY date, weeknbr
ORDER BY date, weeknbr
This is possible because the aggregation / GROUP BY is applied before the window/analytic function. If the daily counts are already available as a column (say countcase, as in your intermediate result), the window function alone is enough:
select
date,
countcase,
extract(week from date) as weeknbr,
avg(countcase) over (partition by extract(week from date)) as weeklyavg
from table1;
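For readers who prefer to see the two phases separately, here is a minimal sketch under the same assumptions (a table1 with caseID and date columns): the CTE performs the daily GROUP BY first, and the weekly window average runs over its output.
-- a sketch: daily aggregation first, then the weekly average as a window over those rows
WITH daily AS (
    SELECT date, COUNT(caseID) AS countcase
    FROM table1
    GROUP BY date
)
SELECT
    date,
    countcase,
    EXTRACT(WEEK FROM date) AS weeknbr,
    AVG(countcase) OVER (PARTITION BY EXTRACT(WEEK FROM date)) AS weeklyavg
FROM daily
ORDER BY date;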

delete all but two sorted items postgresql

In my structure I have the following; I would like to keep the most recent dates (highlighted in yellow) and delete the remaining. I don't necessarily know the most recent dates (i.e. 17/4/2021 and 10/2/2021 in my example) for each stock_id, but I know I want to keep only the two most recent items.
Is that possible?
Thank you
Note: this assumes that dates do not repeat within each stock_id group in your table, so the top two dates are always unique.
You can assign a rank to each row within its stock_id after ordering by date, and delete the rows where the rank is greater than 2.
DELETE FROM mytable
WHERE (stock_id, date) NOT IN (
    SELECT
        stock_id,
        date
    FROM (
        SELECT
            stock_id,
            date,
            row_number() over (partition by stock_id order by date desc) as rank
        FROM mytable
    ) ranks
    WHERE rank <= 2
)
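If dates can repeat within a stock_id (the case the note above rules out), one option is to key the delete on the physical row identifier instead of (stock_id, date). A sketch, assuming a plain PostgreSQL table where the ctid system column is available:
DELETE FROM mytable
WHERE ctid NOT IN (
    SELECT ctid
    FROM (
        SELECT
            ctid,
            row_number() OVER (PARTITION BY stock_id ORDER BY date DESC) AS rn
        FROM mytable
    ) ranked
    WHERE rn <= 2
);
With duplicate dates this keeps exactly two physical rows per stock_id, breaking ties arbitrarily.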

Find time difference between two most recent orders

I am trying to estimate the time of a new order from repeat customers by finding the time difference between the most recent order and the second most recent order, and then adding that difference to the most recent order.
I have been trying limit and offset, but this returns the same date for every row. I am thinking I need to do a lateral join, but I am not sure how to implement it correctly. When I try to do it, I receive no output.
select public.orders.customer_id,
max(public.orders.created_at) as last_order_date,
(select created_at from public.orders group by created_at order by created_at desc limit 1 offset 1) as second_last
from public.orders
inner join
(select
customer_id, count(*)
from public.orders
where status = 'fulfilled'
group by public.orders.customer_id
having count(customer_id) >1) repeat_customers
on public.orders.customer_id = repeat_customers.customer_id
group by public.orders.customer_id;
I wanted the second_last field to be populated by the second most recent date for each customer_id, but the output is the second most recent date for the entire table, resulting in the same date for every entry.
For your second_last column you're not limiting it per customer, so it will indeed find the same date for everyone, just like the results you've seen. See the WHERE clause in the example below, which should solve this:
(SELECT
    created_at
FROM
    public.orders po
WHERE
    po.customer_id = o.customer_id   -- correlate with the outer query's customer
ORDER BY
    created_at DESC                  -- most recent first, so OFFSET 1 skips to the second most recent
LIMIT 1 OFFSET 1) AS second_last
I've also aliased the tables (po inside the subquery, and the outer public.orders as o) because the same table is mentioned in the main select, and the correlation has to refer unambiguously to the outer row rather than back to po.
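Putting it together, a sketch of the full query with the outer table aliased as o; the table and column names are taken from the question, and the repeat-customer join is kept as in your attempt:
SELECT
    o.customer_id,
    MAX(o.created_at) AS last_order_date,
    (SELECT po.created_at
     FROM public.orders po
     WHERE po.customer_id = o.customer_id
     ORDER BY po.created_at DESC
     LIMIT 1 OFFSET 1) AS second_last
FROM public.orders o
INNER JOIN (
    SELECT customer_id
    FROM public.orders
    WHERE status = 'fulfilled'
    GROUP BY customer_id
    HAVING COUNT(*) > 1
) repeat_customers ON o.customer_id = repeat_customers.customer_id
GROUP BY o.customer_id;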

Calculate best sale between several sellers

I'm using Postgres.
Let's say there are 5 sellers.
Each month's sales are recorded inside the database like this: (userId: 6, january: 10000$, february: 20000$, march: 10000$ ..., december: 50000$, year: 2018).
I need to calculate, possibly with only one query, the best sale of each month in one array of this format: (january: 15000$, february: 30000$, march: 40000$, year: 2018). I don't need the userId; I simply need to compare the sales per month and display the best amount.
For now, I've got this code, which works well, giving me user 6's sales per month for a given year:
SELECT date_trunc('month', date_vente) AS txn_month, sum(prix_vente) as monthly_sum,count(prix_vente) AS monthly_count
FROM crm_vente
WHERE 1=1
AND date_part('year', date_vente) = 2018
AND id_user = 6
GROUP BY txn_month ORDER BY txn_month
I wonder if somebody could tell me what kind of approach I could use to get the best sale of each of the 12 months among the 5 sellers.
Could I use a view? Or should I do a for loop in PHP over each user's sales per month and then build a kind of comparative array?
No need to give me a full solution, but maybe some advice on how to do it directly in Postgres? My only solution for now is to use PHP and write some not-so-nice code.
Have a nice day, I'll check back on Monday.
Sorry for my English.
WITH monthly_sales AS (
    SELECT
        date_trunc('month', date_vente) AS txn_month,
        id_user,
        sum(prix_vente) AS monthly_sum
    FROM crm_vente
    WHERE 1=1
        AND date_part('year', date_vente) = 2018
    GROUP BY txn_month, id_user
),
rank_monthly_sales_by_user_id AS (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY txn_month ORDER BY monthly_sum DESC) AS rank
    FROM monthly_sales
)
SELECT
    txn_month,
    monthly_sum
FROM rank_monthly_sales_by_user_id
WHERE rank = 1
ORDER BY txn_month ASC;
First, get the totals per month for each user. This is the first CTE, called monthly_sales; it sums each user's sales by month.
Next, to get the top user for each month in terms of their total sales, you have to rank the rows returned by the previous CTE. This is done by ROW_NUMBER().
ROW_NUMBER() numbers the rows within a specified window; in this case it orders the rows from monthly_sales within each month (the numbering restarts at 1 for each month). The PARTITION BY clause defines the window in which the rows are counted, here the month, since we want to rank each user's sales per month. The ORDER BY clause says how to order the rows from 1 to n; we're using monthly_sum in descending order, so the highest monthly sum gets 1 and the lowest gets 5 (one row per seller).
The final query selects only the rows from rank_monthly_sales_by_user_id that are the top sales for the month (WHERE rank = 1).
This leaves us with an output where each row is a month, with the highest sale for that month.
Let me know if that was what you needed help with.
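Since the userId is not needed in the result at all, a plain aggregate over the same monthly totals also works. A sketch, using the question's crm_vente columns:
-- best amount per month, no window function needed
SELECT
    txn_month,
    MAX(monthly_sum) AS best_monthly_sum
FROM (
    SELECT
        date_trunc('month', date_vente) AS txn_month,
        id_user,
        SUM(prix_vente) AS monthly_sum
    FROM crm_vente
    WHERE date_part('year', date_vente) = 2018
    GROUP BY txn_month, id_user
) monthly_sales
GROUP BY txn_month
ORDER BY txn_month;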

Specific number of quarters from date using HiveQL

I am trying to bring back a specific number (8) of quarters using a transaction date from the table. The date format is YYYYMMDD.
I could write a select using CASE to display a specific quarter depending on the current month.
I could find the beginning of the month using the trunc function, but could not work out the logic to bring back the last 8 quarters of data.
Convert the date to Hive format first.
Then use DENSE_RANK() to number rows by quarter (ordering by year desc and quarter desc), and filter by rnk <= 8:
select * from
(
    --calculate DENSE_RANK
    select s.*, DENSE_RANK() over(order by year(your_date) desc, quarter(your_date) desc) as rnk
    from
    (
        --convert the YYYYMMDD string to Hive's yyyy-MM-dd format
        --(txn_date here stands for your date column)
        select t.*, from_unixtime(unix_timestamp(t.txn_date, 'yyyyMMdd'), 'yyyy-MM-dd') as your_date
        from table_name t
        --Also restrict your dataset here to only the last few years,
        --because you do not need to scan all the data:
        --add a WHERE clause here
    ) s
) s
where rnk <= 8;
See manual on functions here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
You can optimize this query if you know your data and how the table is partitioned: restrict the dataset accordingly. Also add a PARTITION BY clause to the OVER() if you need the last 8 quarters for each key.
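For the per-key variant, only the window definition changes. A sketch of just that part (account_id is a hypothetical key column, not named in the question):
--rnk restarts for every account_id, so rnk <= 8 keeps the last 8 quarters per key
select s.*,
       DENSE_RANK() over(partition by s.account_id
                         order by year(your_date) desc, quarter(your_date) desc) as rnk
from ( ... same converted dataset as above ... ) s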