How can I make the denominator a constant for each of the numbers in the same row in SQL? - amazon-redshift

I am trying to create a table with the average amount of sales divided by the size of a cohort of users who signed up for an account in a certain month. However, I can only figure out how to divide by the number of people who made a purchase in that specific month, which is lower than the total size of the cohort. How do I change the query below so that each of the Avg_Successful_Transacted amounts is divided by cohort 0 for each month?
Thank you.
select sum(t.amount_in_dollars) / count(distinct u.id) as Avg_Successful_Transacted
     , datediff(month, [u.created:month], [t.createdon:month]) as Cohort
     , [u.created:month] as Months
     , count(distinct u.id) as Users
from [transaction_cache as t]
left join [user_cache as u] on t.owner = u.id
where t.type = 'savings'
  and t.status = 'successful'
  and [u.created:year] > ['2017-01-01':date:year]
group by cohort, months
order by Cohort, Months

You will need to break out the cohort sizing into its own subquery or CTE in order to calculate the total number of distinct users who were created during the month which matches the cohort's basis month.
I approached this by bucketing users by the month they were created, using the date_trunc('Month', <date>) function, but you may wish to approach it differently based on the specific business logic that generates your cohorts.
I don't work with Periscope, so the example query below is structured for pure Redshift, but hopefully it is easy to translate the syntax into Periscope's expected format:
WITH cohort_sizes AS (
    SELECT date_trunc('Month', created)::DATE AS cohort_month
         , COUNT(DISTINCT id) AS cohort_size
    FROM user_cache u
    GROUP BY 1
),
cohort_transactions AS (
    -- columns qualified to avoid ambiguous references across the two tables
    SELECT date_trunc('Month', u.created)::DATE AS cohort_month
         , t.createdon
         , t.owner
         , t.type
         , t.status
         , t.amount_in_dollars
         , u.id
         , u.created
    FROM transaction_cache t
    LEFT JOIN user_cache u ON t.owner = u.id
    WHERE t.type = 'savings'
      AND t.status = 'successful'
      AND u.created > '2017-01-01'
)
SELECT SUM(t.amount_in_dollars) / s.cohort_size AS Avg_Successful_Transacted
     , datediff(MONTH, u.created, t.createdon) AS Cohort
     , u.created AS Months
     , COUNT(DISTINCT u.id) AS Users
FROM cohort_transactions t
JOIN cohort_sizes s ON t.cohort_month = s.cohort_month
LEFT JOIN user_cache AS u ON t.owner = u.id
GROUP BY s.cohort_size, Cohort, Months
ORDER BY Cohort, Months
;
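For what it's worth, the same constant denominator can also be attached with a window function instead of a separate join. The sketch below assumes the same user_cache and transaction_cache tables as above; note that Redshift does not allow COUNT(DISTINCT ...) as a window function, hence the deduplication CTE:
WITH cohort_users AS (
    -- one row per user, tagged with the month the account was created
    SELECT id, date_trunc('Month', created)::DATE AS cohort_month
    FROM user_cache
),
sized_users AS (
    -- the window COUNT stamps every user's row with the full cohort size,
    -- so it travels along as a constant per-row denominator
    SELECT id, cohort_month,
           COUNT(*) OVER (PARTITION BY cohort_month) AS cohort_size
    FROM cohort_users
)
SELECT u.cohort_month AS months,
       SUM(t.amount_in_dollars) / u.cohort_size AS avg_successful_transacted
FROM sized_users u
JOIN transaction_cache t ON t.owner = u.id
WHERE t.type = 'savings'
  AND t.status = 'successful'
GROUP BY u.cohort_month, u.cohort_size;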

Related

PostgreSQL - SQL function to loop through all months of the year and pull 10 random records from each

I am attempting to pull 10 random records from each month of this year using the query below, but I get the error ERROR: relation "c1" does not exist.
Not sure where I'm going wrong. I think it may be that I'm using MySQL syntax instead, but how do I resolve this?
My desired output is like this:
Month     Email
2021-01   random email 1
2021-01   random email 2
Ten random emails from January in total, then ten more for each month this year (through November, of course, as December hasn't happened yet).
With CTE AS
(
    Select month,
           email,
           Row_Number() Over (Partition By month Order By FLOOR(RANDOM()*(1-1000000+1))) AS RN
    From (
        SELECT
            DISTINCT(TO_CHAR(DATE_TRUNC('month', timestamp), 'YYYY-MM')) AS month
            ,CASE
                WHEN JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text(form_data, 0), 'name') = 'email'
                THEN JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text(form_data, 0), 'value')
             END AS email
        FROM form_submits_y2 fs
        WHERE fs.website_id IN (791)
        AND month LIKE '2021%'
        GROUP BY 1,2
        ORDER BY 1 ASC
    )
)
SELECT *
FROM CTE C1
LEFT JOIN
    (SELECT RN
           ,month
           ,email
     FROM CTE C2
     WHERE C2.month = C1.month
     ORDER BY RANDOM() LIMIT 10) C3
ON C1.RN = C3.RN
ORDER BY month ASC
You can't reference an outer table inside a derived table with a regular join. You need LEFT JOIN LATERAL to make that work.
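For illustration, the lateral version of that final join might look like the sketch below, assuming this actually runs on PostgreSQL proper (the JSON functions above look like Redshift's, and Redshift does not support LATERAL as far as I know):
SELECT C1.month, C3.email
FROM CTE C1
LEFT JOIN LATERAL
    -- the subquery may now legally reference C1 from the outer query
    (SELECT RN, month, email
     FROM CTE C2
     WHERE C2.month = C1.month
     ORDER BY RANDOM() LIMIT 10) C3
ON C1.RN = C3.RN
ORDER BY C1.month ASC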
I did end up finding a more elegant solution to my query via this source from GitHub:
SELECT
    month
    ,email
FROM
(
    Select month,
           email,
           Row_Number() Over (Partition By month Order By FLOOR(RANDOM()*(1-1000000+1))) AS RN
    From (
        SELECT
            TO_CHAR(DATE_TRUNC('month', timestamp), 'YYYY-MM') AS month
            ,CASE
                WHEN JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text(form_data, 0), 'name') = 'email'
                THEN JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text(form_data, 0), 'value')
             END AS email
        FROM form_submits_y2 fs
        WHERE fs.website_id IN (791)
        AND month LIKE '2021%'
        GROUP BY 1,2
        ORDER BY 1 ASC
    )
) q
WHERE RN <= 10
ORDER BY month ASC
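This works because Row_Number() assigns 1..N within each month partition in a random order, so keeping RN <= 10 selects ten random emails per month in a single pass, avoiding the outer-reference problem of the original join entirely.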

Get an average monthly view of active members (Postgresql)

I am working with members data. I have the responsible coach, the coachee, the entry and exit statuses, and their dates. Because some coachees might graduate or leave during a month, I want to calculate a daily number and then get a monthly average of active members for each coach. That means I need to take into account all coachees from previous months who are still active in the current month. This is my data:
I am thinking of creating a variable first where I can get the daily active member count for each coach. This is my first approach:
with all_years as (
    select y.year, m.month, d.day
    from generate_series(2019, 2022) as y(year)
    cross join generate_series(1, 12) as m(month)
    cross join generate_series(1, 31) as d(day) -- not sure how to adjust for months with fewer than 31 days??
)
select ay.*, coach, coachee, entry_status, entry_date, exit_reason, exit_date,
       sum(count) over (partition by ay.coach order by ay.year, ay.month, ay.day)
from all_years ay
left join table t
    on -- ... not sure what I can join on in this case;
I am open to an easier approach, this logic is just an idea.
You can cross join the list of distinct coaches with all dates to generate combinations, then bring in the table with a left join:
select d.dt, c.coach, count(t.coach) no_coachees
from (select distinct coach from mytable) c
cross join generate_series('2019-01-01'::date, '2022-12-31'::date, '1 day'::interval) d(dt)
left join mytable t on t.coach = c.coach and t.entry_date <= d.dt and t.exit_date > d.dt
group by d.dt, c.coach
Then you can use another level of aggregation to get the monthly average:
select date_trunc('month', t.dt) d_month, coach, avg(no_coachees) avg_coachees
from (
    select d.dt, c.coach, count(t.coach) no_coachees
    from (select distinct coach from mytable) c
    cross join generate_series('2019-01-01'::date, '2022-12-31'::date, '1 day'::interval) d(dt)
    left join mytable t on t.coach = c.coach and t.entry_date <= d.dt and t.exit_date > d.dt
    group by d.dt, c.coach
) t
group by date_trunc('month', t.dt), coach
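One assumption worth flagging: if coachees who are still active have a NULL exit_date, the join condition above will exclude them from the daily counts. In that case (hypothetically, depending on how the table encodes "still active") the join would need a null check:
left join mytable t
    on t.coach = c.coach
   and t.entry_date <= d.dt
   -- treat a missing exit_date as "still active on every day"
   and (t.exit_date > d.dt or t.exit_date is null)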

subquery problem - need to get avg of a sum

I have 2 tables:
sales table: weekly sales, store, date
store table: store, type, size
My sales table has multiple years, multiple stores, and multiple types. I'm trying to get the average sales per square foot for each store type per year. I have a subquery that shows the sales per square foot for each store, but I'm having trouble rolling it up into my main query to get the average by type.
Does anything jump out in my final query?
SELECT
    date_part('year', sales.date) AS year,
    stores.type,
    AVG(sales_by_sqft)
FROM
    (SELECT
         SUM((sales.weekly_sales) / stores.size) AS sales_by_sqft
     FROM sales
     INNER JOIN stores ON sales.store = stores.store
     GROUP BY sales.store) AS sq
FROM sales
INNER JOIN stores ON sales.store = stores.store
WHERE date_part('year', date) = 2012
GROUP BY year, stores.type;
I'm getting a syntax error on the second FROM statement.
I figured it out. AVG doesn't work on money. Once I changed that data type to integer, it all fell into place:
SELECT
    year,
    type,
    ROUND(AVG(sales_by_sqft), 2) AS avg_sales_by_sqft
FROM
    (SELECT
         date_part('year', sales.date) AS year,
         stores.type,
         sales.store,
         stores.size,
         SUM(sales.weekly_sales) AS total_sales,
         SUM(sales.weekly_sales) / AVG(stores.size) AS sales_by_sqft
     FROM sales
     INNER JOIN stores ON sales.store = stores.store
     GROUP BY year, stores.type, sales.store, stores.size) AS sq
GROUP BY 1, 2
ORDER BY 1, 3 DESC;
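For reference, PostgreSQL's avg() aggregate is not defined for the money type (sum() is, but avg() is not), which is why the type change helped. An explicit cast inside the query would also have worked, assuming weekly_sales was the money column:
-- hypothetical: cast money to numeric before aggregating
SUM(sales.weekly_sales::numeric) / AVG(stores.size) AS sales_by_sqft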

Correlated subquery in Postgres

I have a query like the one below to find the stock details of certain products. The query works, but I don't think it is efficient or fast enough (DB: PostgreSQL version 11).
There is a CTE in this code (named "final" below) where I need to find the quantity of a product ordered (qty_last_7d_from_oos_date) during the 7 days leading up to the out-of-stock date. In the same way, I also have to find the revenue.
So what I did was write the same subquery two times, one outputting the revenue and the other the quantity, which is not efficient. Does anyone have suggestions on how to rewrite this and make the code more efficient?
WITH final AS
(
    SELECT product_id, product_name, item_sku, out_of_stock_at
         , out_of_stock_at - INTERVAL '7 days' AS previous_7_days
         , back_in_stock_at
    FROM oos_base
)
SELECT product_id, product_name, item_sku, out_of_stock_at, previous_7_days
     , back_in_stock_at
     , (SELECT coalesce(sum(i.qty_ordered), 0) AS qty_last_7d_from_oos_date
        FROM ol.orders o
        LEFT JOIN ol.items i ON i.order_id = o.order_id
        LEFT JOIN ol.products p ON p.product_id = i.product_id AND i.store_id = p.store_id
        WHERE o.order_state_2 IN ('complete','processing')
        AND f.product_id = p.product_id
        AND o.created_at_order::DATE BETWEEN f.previous_7_days::DATE AND COALESCE(f.out_of_stock_at::DATE, current_date)
       )
     , (SELECT coalesce(sum(i.row_amount_minus_discount_order), 0) AS rev_last_7d_from_oos_date
        FROM ol.orders o
        LEFT JOIN ol.items i ON i.order_id = o.order_id
        LEFT JOIN ol.products p ON p.product_id = i.product_id AND i.store_id = p.store_id
        WHERE o.order_state_2 IN ('complete','processing')
        AND f.product_id = p.product_id
        AND o.created_at_order::DATE BETWEEN f.previous_7_days::DATE AND COALESCE(f.out_of_stock_at::DATE, current_date)
       )
FROM final f
In the above code, the CTE "final" gives you two dates: "out_of_stock_at" and "previous_7_days". I want to find the quantity and revenue of a product based on these 2 dates, i.e. between "previous_7_days" and "out_of_stock_at".
The query below gives the quantity and revenue of the products for the period between "previous_7_days" and "out_of_stock_at" from the above CTE.
As of now I have used this code two times to obtain the revenue and quantity information.
SELECT coalesce(sum(i.qty_ordered), 0) AS qty
     , coalesce(sum(i.row_amount_minus_discount_order), 0)
FROM ol.orders o
LEFT JOIN ol.items i ON i.order_id = o.order_id
LEFT JOIN ol.products p ON p.product_id = i.product_id AND i.store_id = p.store_id
WHERE o.order_state_2 IN ('complete','processing')
AND f.product_id = p.product_id
AND o.created_at_order::DATE BETWEEN f.previous_7_days::DATE AND COALESCE(f.out_of_stock_at::DATE, current_date)
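Since that combined query already computes both aggregates, one way to avoid running it twice (a sketch, untested against the real schema) is to attach it once per product row with LEFT JOIN LATERAL, which PostgreSQL 11 supports:
WITH final AS
(
    SELECT product_id, product_name, item_sku, out_of_stock_at
         , out_of_stock_at - INTERVAL '7 days' AS previous_7_days
         , back_in_stock_at
    FROM oos_base
)
SELECT f.*
     , agg.qty_last_7d_from_oos_date
     , agg.rev_last_7d_from_oos_date
FROM final f
LEFT JOIN LATERAL (
    -- the correlated aggregate runs once per row and returns both measures
    SELECT coalesce(sum(i.qty_ordered), 0) AS qty_last_7d_from_oos_date
         , coalesce(sum(i.row_amount_minus_discount_order), 0) AS rev_last_7d_from_oos_date
    FROM ol.orders o
    LEFT JOIN ol.items i ON i.order_id = o.order_id
    LEFT JOIN ol.products p ON p.product_id = i.product_id AND i.store_id = p.store_id
    WHERE o.order_state_2 IN ('complete','processing')
    AND f.product_id = p.product_id
    AND o.created_at_order::DATE BETWEEN f.previous_7_days::DATE AND COALESCE(f.out_of_stock_at::DATE, current_date)
) agg ON true;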

How do you organize this query by week

Here is my query so far:
select one.week, total, comeback, round(comeback)::numeric / total::numeric * 100 as comeback_percent
FROM
(
    SELECT count(username) as total, week
    FROM
    (
        select row_number() over (partition by u.id order by creation_date) as row,
               username,
               date_trunc('month', creation_date)::date AS week
        FROM users u
        left join entries e on u.id = e.user_id
        where ((entry_type = 0 and distance >= 1) or (entry_type = 1 and seconds_running >= 600))
    ) x
    where row = 1
    group by week
    order by week asc
) one
join
(
    SELECT count(username) as comeback, week
    FROM
    (
        select row_number() over (partition by u.id order by creation_date) as row,
               username,
               runs_completed,
               date_trunc('month', creation_date)::date AS week
        FROM entries e
        left join users u on e.user_id = u.id
        where ((entry_type = 0 and distance >= 1) or (entry_type = 1 and seconds_running >= 600))
    ) y
    where runs_completed > 1 and row = 1
    group by week
    order by week asc
) two
on one.week = two.week
What I want to accomplish is to return a line graph of users who have completed one run with us, grouped by week, and to show, for each week, the percentage of those users who have completed a second run EVER, not just within that week. Our funnel has improved by a factor of 5 since we started, yet the line graph this produces does not show similar results.
I could be joining the two halves together incorrectly, or there may be a cleaner way to write this with CTEs or window functions; I am open to any and all suggestions. Thanks!
If you need tables or further information, let me know. I'm happy to provide anything that may be needed.
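One detail that stands out (an observation, not a verified fix): both inner subqueries bucket with date_trunc('month', creation_date) while aliasing the result as week, so the output is actually grouped monthly. If weekly grouping is the intent, the bucket expression would be:
-- truncate to the ISO week instead of the calendar month
date_trunc('week', creation_date)::date AS week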