I'd like to calculate an average amount of money spent per day for the past 7, 30, 90 and 180 days. I know how to do it using PL/pgSQL, but I'd prefer to do it in one query if possible. Something like this:
SELECT SUM(amount)/days
FROM transactions
WHERE created_at > CURRENT_DATE - ((days || ' day')::INTERVAL)
AND days = ANY(ARRAY[7,30,90,180]);
ERROR: column "days" does not exist
You can use unnest to convert the array into a table and then use a correlated subquery to calculate the average:
SELECT
    days,
    (SELECT SUM(amount)/days
     FROM transactions
     WHERE created_at > CURRENT_DATE - ((days || ' day')::INTERVAL)
    ) AS average
FROM unnest(ARRAY[7,30,90,180]) t(days)
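If you prefer to avoid the correlated subquery, a rough equivalent (an untested sketch, assuming the same transactions(amount, created_at) table and PostgreSQL 9.4+ for the FILTER clause) is a single grouped pass:
-- Sketch: cross-join the four window lengths against transactions
-- and aggregate per window with FILTER. CURRENT_DATE - d.days works
-- because subtracting an integer from a date yields a date.
SELECT
    d.days,
    SUM(t.amount) FILTER (WHERE t.created_at > CURRENT_DATE - d.days) / d.days AS average
FROM unnest(ARRAY[7,30,90,180]) AS d(days)
CROSS JOIN transactions AS t
GROUP BY d.days
ORDER BY d.days;
Whether this is actually faster than the correlated subquery depends on your data; it trades four window scans for one fanned-out pass.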
I have a dataset of sales. To summarize, the structure is
client_id
date_purchase
There might be several purchases done by the same customer on different dates. There can also be several purchases done on the same date (by different or the same customer).
My goal is to get the number of customers, for any given day, that made 2 or more purchases between that day and 90 days prior.
That is, the expected output is
date_purchase | number_of_customers
2022-12-19    | 200
2022-12-18    | 194
(...)
Please note this calculates, for any given date, the number of customers with 2+ purchases between that date and 90 days prior.
I know it has something to do with a window function. But so far I have not found a way to calculate, for every window of 90 days, how many customers have done 2+ purchases.
I've tried several window functions with no success:
partition by date_purchase
range between interval '90 days' preceding and current row
So far I haven't been able to calculate the correct number for each date.
A window function doesn't seem relevant here, because there is no relationship between the rows of the same window. A simple query or a self-join query should provide the expected result.
Assuming that client_id and date_purchase are two columns of my_table:
1. Query for a given date reference_date:
SELECT a.reference_date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT reference_date, client_id
       FROM my_table
       WHERE date_purchase <= reference_date
         AND date_purchase >= reference_date - INTERVAL '90 days'
       GROUP BY client_id
       HAVING count(*) >= 2
     ) AS a
GROUP BY a.reference_date
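For instance, with a hypothetical literal reference date of 2022-12-19 substituted for the placeholder, the date can simply be a constant in the outer SELECT:
SELECT DATE '2022-12-19' AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT client_id
       FROM my_table
       WHERE date_purchase <= DATE '2022-12-19'
         AND date_purchase >= DATE '2022-12-19' - INTERVAL '90 days'
       GROUP BY client_id
       HAVING count(*) >= 2
     ) AS a;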
2. Query for a given interval of dates reference_date => reference_date + INTERVAL '20 days':
SELECT a.date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT ref.date, t.client_id
       FROM my_table AS t
       INNER JOIN generate_series(reference_date, reference_date + INTERVAL '20 days', '1 day') AS ref(date)
               ON t.date_purchase <= ref.date
              AND t.date_purchase >= ref.date - INTERVAL '90 days'
       GROUP BY ref.date, t.client_id
       HAVING count(*) >= 2
     ) AS a
GROUP BY a.date
ORDER BY a.date
3. Query for all the date_purchase values in my_table:
SELECT a.date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT ref.date, t.client_id
       FROM my_table AS t
       INNER JOIN (SELECT DISTINCT date_purchase AS date FROM my_table) AS ref
               ON t.date_purchase <= ref.date
              AND t.date_purchase >= ref.date - INTERVAL '90 days'
       GROUP BY ref.date, t.client_id
       HAVING count(*) >= 2
     ) AS a
GROUP BY a.date
ORDER BY a.date
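If you also need rows for calendar days on which nothing was purchased (query 3 only reports dates that actually appear in my_table), a possible variant, sketched under the same column assumptions and untested, is to generate every day between the first and last purchase and count per day with a LATERAL subquery:
SELECT d.day::date AS date_purchase, c.number_of_customers
FROM generate_series((SELECT min(date_purchase) FROM my_table),
                     (SELECT max(date_purchase) FROM my_table),
                     INTERVAL '1 day') AS d(day)
LEFT JOIN LATERAL (
    -- one row per calendar day; 0 when no customer has 2+ purchases in the window
    SELECT count(*) AS number_of_customers
    FROM ( SELECT client_id
           FROM my_table
           WHERE date_purchase <= d.day
             AND date_purchase >= d.day - INTERVAL '90 days'
           GROUP BY client_id
           HAVING count(*) >= 2
         ) AS per_client
) AS c ON true
ORDER BY d.day;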
I've been able to get a SQL query running that grabs the count of all records from the day before.
SELECT count(*)
FROM mytable
WHERE date(ingest_time) >= (DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND
date(ingest_time) < (CURRENT_DATE());
Building on the SQL above in BigQuery, how do I generate a date column next to the count that shows that these records are from yesterday?
Something like this:
1) 3000390 | 2019-11-13
Instead of SELECT count(*) use SELECT count(*), DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
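Put together with the filter from the question, the full query would look roughly like this (record_count and record_date are just illustrative aliases):
SELECT
  count(*) AS record_count,
  DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) AS record_date
FROM mytable
WHERE date(ingest_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  AND date(ingest_time) < CURRENT_DATE();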
I need to calculate a count based on a given time frame: I need to consider the dates between the current date and 5 years ago.
select count(*) from table where (year(current_date) -year('2015-12-01')) < 5 ;
The above query will give counts for the last 5 years; however, it considers only the year part, and I need exact counts down to the day. If I write
select count(*) from table where datediff(current_date,final_dt) <= 1825 ;
it won't account for any leap years in the last 5 years.
So is there any function in Hive to calculate the exact difference between two dates, taking scenarios like leap years into account?
Use the add_months function (assuming the dates should go back to 2013-05-25, with the current date being 2018-05-25).
select count(*)
from table
where final_dt >= add_months(current_date,-60) and final_dt <= current_date
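As a quick check of that assumption, the lower bound can be computed on its own:
select add_months('2018-05-25', -60);   -- returns 2013-05-25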
I think you are trying to count all records between current_date and the date 5 years before current_date. In that case, you can do something like this:
SELECT count(*) FROM table_1 WHERE date_column BETWEEN to_date(CONCAT(YEAR(current_date) - 5, '-', MONTH(current_date), '-', DAY(current_date))) AND current_date;
And SELECT datediff(current_date(), to_date(CONCAT(YEAR(current_date) - 5, '-', MONTH(current_date), '-', DAY(current_date))));
gives you 1826 (considering the fact that 2016 is a leap year).
I'm trying to calculate monthly retention rate in Amazon Redshift and have come up with the following query:
Query 1
SELECT EXTRACT(year FROM activity.created_at) AS Year,
EXTRACT(month FROM activity.created_at) AS Month,
COUNT(DISTINCT activity.member_id) AS active_users,
COUNT(DISTINCT future_activity.member_id) AS retained_users,
COUNT(DISTINCT future_activity.member_id) / COUNT(DISTINCT activity.member_id)::float AS retention
FROM ads.fbs_page_view_staging activity
LEFT JOIN ads.fbs_page_view_staging AS future_activity
ON activity.mongo_id = future_activity.mongo_id
AND datediff ('month',activity.created_at,future_activity.created_at) = 1
GROUP BY Year,
Month
ORDER BY Year,
Month
For some reason this query returns zero retained_users and zero retention. I'd appreciate any help regarding why this may be happening, or a completely different query for monthly retention that works.
I modified the query as per another SO post and here it goes:
Query 2
WITH t AS (
SELECT member_id
,date_trunc('month', created_at) AS month
,count(*) AS item_transactions
,lag(date_trunc('month', created_at)) OVER (PARTITION BY member_id
ORDER BY date_trunc('month', created_at))
= date_trunc('month', created_at) - interval '1 month'
OR NULL AS repeat_transaction
FROM ads.fbs_page_view_staging
WHERE created_at >= '2016-01-01'::date
AND created_at < '2016-04-01'::date -- time range of interest.
GROUP BY 1, 2
)
SELECT month
,sum(item_transactions) AS num_trans
,count(*) AS num_buyers
,count(repeat_transaction) AS repeat_buyers
,round(
CASE WHEN sum(item_transactions) > 0
THEN count(repeat_transaction) / sum(item_transactions) * 100
ELSE 0
END, 2) AS buyer_retention
FROM t
GROUP BY 1
ORDER BY 1;
This query gives me the following error:
An error occurred when executing the SQL command:
WITH t AS (
SELECT member_id
,date_trunc('month', created_at) AS month
,count(*) AS item_transactions
,lag(date_trunc('m...
[Amazon](500310) Invalid operation: Interval values with month or year parts are not supported
Details:
-----------------------------------------------
error: Interval values with month or year parts are not supported
code: 8001
context: interval months: "1"
query: 616822
location: cg_constmanager.cpp:145
process: padbmaster [pid=15116]
-----------------------------------------------;
I have a feeling that Query 2 would fare better than Query 1, so I'd prefer to fix the error on that.
Any help would be much appreciated.
Query 1 looks good. I tried a similar one; see below. You are self-joining the table (ads.fbs_page_view_staging) on mongo_id and comparing the same column (created_at). Assuming mongo_id is unique, each row only joins to itself, so datediff('month', activity.created_at, future_activity.created_at) will always return 0 and the condition datediff('month', activity.created_at, future_activity.created_at) = 1 will always be false.
-- Count distinct events of join_col_id that have lapsed for one month.
SELECT count(distinct E.join_col_id) dist_ct
FROM public.fact_events E
JOIN public.dim_table Z
ON E.join_col_id = Z.join_col_id
WHERE datediff('month', event_time, sysdate) = 1;
-- 2771654 -- dist_ct
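As for the error in Query 2: Redshift rejects intervals with month or year parts, so one possible workaround (an untested sketch, not something I have run against your data) is to express the one-month offset with DATEADD instead of interval arithmetic, keeping the rest of the query unchanged:
WITH t AS (
    SELECT member_id
          ,date_trunc('month', created_at) AS month
          ,count(*) AS item_transactions
          -- DATEADD(month, -1, ...) replaces "- interval '1 month'",
          -- which Redshift does not support.
          ,lag(date_trunc('month', created_at)) OVER (PARTITION BY member_id
                      ORDER BY date_trunc('month', created_at))
           = dateadd(month, -1, date_trunc('month', created_at))
           OR NULL AS repeat_transaction
    FROM ads.fbs_page_view_staging
    WHERE created_at >= '2016-01-01'::date
    AND created_at < '2016-04-01'::date
    GROUP BY 1, 2
)
-- the SELECT over t stays exactly as in Query 2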
I am using a PostgreSQL database. I have two queries, but I don't want to run multiple queries, so is it possible to manage this with a single query?
Query 1:
select coalesce(sum("dummy"),0) as sum
from generate_series('2014-09-09 00:00:00'::timestamp, '2014-09-09 23:59:59', '1 minute') minutes(minute)
LEFT JOIN report
       ON minutes.minute = date_trunc('minute', report.fetchdate)
      AND fetchdate >= '2014-09-09 00:00:00' AND fetchdate <= '2014-09-09 23:59:00'
      AND entity_id = '0'
group by minute
order by minute
OUTPUT:
The total count of the dummy field for each minute of the day, i.e. 24*60 = 1440 records per day.
Note: this query is used for a single day.
Query 2:
select date(day) as day, coalesce(sum("dummy"),0) as sum
from generate_series('2014-09-06 00:00:01'::date, '2014-09-12 23:59:59'::date, '1 day'::interval) days(day)
LEFT JOIN report ON days.day = date_trunc('day', report.fetchdate) AND entity_id = '0'
group by day
order by day
OUTPUT:
The total count of the dummy field for each day between 2014-09-06 and 2014-09-12, i.e. 7 records in total (dates 6, 7, 8, 9, 10, 11, 12).
Note: this query is used for more than one day.
Required output:
1) The total count of the dummy field for each day in the specified date range (the output of the 2nd query).
2) The maximum per-minute count ("maximum call") for each day.
Ex:
Suppose I search across two days; the range should be broken down into individual dates, the data aggregated per minute for each date, and the output should show, for each day, the maximum of those per-minute counts of the dummy field.
-- Outer query: roll the per-minute sums up to one row per day,
-- giving the daily total and the maximum per-minute count.
select
    date_trunc('day', minute) as day,
    sum(minute_sum) as day_sum,
    max(minute_sum) as max_minute_sum
from (
    -- Inner query: per-minute totals of "dummy" over the whole range
    -- (the same per-minute aggregation as Query 1, for every day at once).
    select
        minute,
        coalesce(sum("dummy"),0) as minute_sum
    from
        generate_series(
            '2014-09-06'::timestamp,
            '2014-09-13'::timestamp - interval '1 minute',
            '1 minute'
        ) minutes(minute)
    left join
        report on
        minutes.minute = date_trunc('minute', report.fetchdate)
        and entity_id = '0'
    group by minute
) s
group by 1
order by 1