I need to generate a series of months within a select


I work with PostgreSQL integrated into Redash, and I currently need to build the following report:
- Generate an active_month for each subscription in the database, following this rule: for each of the last 12 months, if that month falls between the subscription's creation and cancellation dates, create a row with that month as 'active_month'.
- The report should contain the count of subscriptions for each active_month, broken down by unit.
My current code uses a nested select and looks like this:
select
unit,
active_month,
subscriptions
from
(
select
unit_id unit,
to_char(
date_trunc('month', created_at) :: timestamp :: date,
'YYYY-MM'
) active_month,
count(id) subscriptions
from
[DATABASE HERE]
where
created_at >= (timestamp '{{ today }}' - interval '12 months')
group by
created_at,
unit
order by
active_month,
unit_id
) inner_query
group by
active_month,
unit,
subscriptions
order by
active_month,
unit
However, this doesn't work, because:
- It only takes the creation month into consideration, not every month in which the subscription was active.
- It seems to treat different dates within the same month as separate groups, since there are multiple rows with the same unit and month.
Any suggestions on how I can fix this?

Generate the date range for the months you are interested in (with a CTE or sub-select), then use the range "element is contained by" operator to select the appropriate dates. See the example below. Note that I only ran it against randomly generated data, and it depends on my having understood what you are asking, which is questionable!
with calendar (date_range) as
( select daterange ( (current_date-interval '12 months')::date
, current_date
, '[]'
)
) -- select * from calendar;
select
unit,
active_month,
subscriptions
from
(
select
unit_id unit,
to_char(
date_trunc('month', created_at) :: timestamp :: date,
'YYYY-MM'
) active_month,
count(id) subscriptions
from
DATABASE_HERE
where
created_at::date <@ (select date_range from calendar) --<<<< change
group by
created_at,
unit
order by
active_month,
unit_id
) inner_query
group by
active_month,
unit,
subscriptions
order by
active_month,
unit;
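For the first point in the question (one row per month in which a subscription was active), another option is to build the list of months with generate_series and join each subscription to every month it overlaps. This is only a minimal sketch under assumptions: the table name subscriptions and the column canceled_at are hypothetical (the question does not name them), and a NULL canceled_at is taken to mean the subscription is still active.

```sql
-- Hypothetical table: subscriptions(id, unit_id, created_at, canceled_at)
with months as (
    -- first day of each of the last 12 months, current month included
    select generate_series(
               date_trunc('month', current_date) - interval '11 months',
               date_trunc('month', current_date),
               interval '1 month'
           )::date as month_start
)
select s.unit_id                         as unit,
       to_char(m.month_start, 'YYYY-MM') as active_month,
       count(s.id)                       as subscriptions
from months m
join subscriptions s
  -- created before the month ends ...
  on s.created_at < m.month_start + interval '1 month'
  -- ... and not canceled before the month starts (assumption: NULL = still active)
 and (s.canceled_at is null or s.canceled_at >= m.month_start)
group by s.unit_id, m.month_start
order by m.month_start, s.unit_id;
```

Grouping by the month start instead of the raw created_at value should also avoid the repeated unit/month rows mentioned in the question, since every timestamp within a month collapses into a single group.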

Related

PostgreSQL - SQL function to loop through all months of the year and pull 10 random records from each

I am attempting to pull 10 random records from each month of this year using the query below, but I get the error: ERROR: relation "c1" does not exist
I'm not sure where I'm going wrong. I think I may be using MySQL syntax instead, but how do I resolve this?
My desired output looks like this:
Month    Another header
2021-01  random email 1
2021-01  random email 2
That is, a total of ten random emails from January, then ten more for each following month of this year (up to November, of course, as December is yet to happen).
With CTE AS
(
Select month,
email,
Row_Number() Over (Partition By month Order By FLOOR(RANDOM()*(1-1000000+1))) AS RN
From (
SELECT
DISTINCT(TO_CHAR(DATE_TRUNC('month', timestamp ), 'YYYY-MM')) AS month
,CASE
WHEN
JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (form_data,0),'name') = 'email'
THEN
JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (form_data,0),'value')
END AS email
FROM form_submits_y2 fs
WHERE fs.website_id IN (791)
AND month LIKE '2021%'
GROUP BY 1,2
ORDER BY 1 ASC
)
)
SELECT *
FROM CTE C1
LEFT JOIN
(SELECT RN
,month
,email
FROM CTE C2
WHERE C2.month = C1.month
ORDER BY RANDOM() LIMIT 10) C3
ON C1.RN = C3.RN
ORDER By month ASC

You can't reference an outer table inside a derived table with a regular join. You need to use LEFT JOIN LATERAL to make that work.
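A minimal sketch of the lateral form, assuming PostgreSQL 9.3 or later (where LATERAL is available). The table submissions(month, email) is hypothetical and stands in for the flattened result of the inner query in the question.

```sql
-- Hypothetical table: submissions(month text, email text)
select m.month, pick.email
from (select distinct month from submissions) m
left join lateral (
    select s.email
    from submissions s
    where s.month = m.month   -- reference to the outer row, allowed because of LATERAL
    order by random()
    limit 10                  -- at most ten random rows per month
) pick on true
order by m.month;
```

The subquery is evaluated once per row of m, so each month gets its own random sample. The join condition is simply "on true" because the correlation already happens inside the subquery.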

I ended up finding a more elegant solution to my query via a source on GitHub:
SELECT
month
,email
FROM
(
Select month,
email,
Row_Number() Over (Partition By month Order By FLOOR(RANDOM()*(1-1000000+1))) AS RN
From (
SELECT
TO_CHAR(DATE_TRUNC('month', timestamp ), 'YYYY-MM') AS month
,CASE
WHEN JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (form_data,0),'name') = 'email'
THEN JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (form_data,0),'value')
END AS email
FROM form_submits_y2 fs
WHERE fs.website_id IN (791)
AND month LIKE '2021%'
GROUP BY 1,2
ORDER BY 1 ASC
)
) q
WHERE
RN <=10
ORDER BY month ASC

Is there a SQL code for cumulative count of SaaS customer over months?

I have a table with:
ID (client id), date_start (start of the SaaS subscription), date_end (either a date value or NULL).
I need a cumulative count of active clients, month by month.
Any idea how to write that in Postgres to achieve this result?
I started from this, but I don't know how to proceed:
select
date_trunc('month', c.date_start)::date,
count(*)
from customer

Please check the following solution:
select
subscrubed_date,
subscrubed_customers,
unsubscrubed_customers,
coalesce(subscrubed_customers, 0) - coalesce(unsubscrubed_customers, 0) cumulative
from (
select distinct
date_trunc('month', c.date_start)::date subscrubed_date,
sum(1) over (order by date_trunc('month', c.date_start)) subscrubed_customers
from customer c
order by subscrubed_date
) subscribed
left join (
select distinct
date_trunc('month', c.date_end)::date unsubscrubed_date,
sum(1) over (order by date_trunc('month', c.date_end)) unsubscrubed_customers
from customer c
where date_end is not null
order by unsubscrubed_date
) unsubscribed on subscribed.subscrubed_date = unsubscribed.unsubscrubed_date;

You have a table of customers, each with a start date and sometimes an end date. As you want to group by date, but there are two dates in the table, you need to split these first.
Then, you may have months in which customers only arrived and others in which customers only left. So you'll want a full outer join of the two sets.
For a cumulative sum (also called a running total), use SUM OVER.
with came as
(
select date_trunc('month', date_start) as month, count(*) as cnt
from customer
group by date_trunc('month', date_start)
)
, went as
(
select date_trunc('month', date_end) as month, count(*) as cnt
from customer
where date_end is not null
group by date_trunc('month', date_end)
)
select
month,
came.cnt as cust_new,
went.cnt as cust_gone,
sum(coalesce(came.cnt, 0) - coalesce(went.cnt, 0)) over (order by month) as cust_active
from came full outer join went using (month)
order by month;
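Months in which nobody arrived or left produce no row at all in a join of the two sets. If every month should appear, one option is to drive the query from a generated calendar instead. This is a sketch under assumptions: the column names id, date_start and date_end are taken from the question, the series runs up to the current month, and "active" is interpreted here as active at any point during the month, which is a slightly different definition from a running total of arrivals minus departures.

```sql
with bounds as (
    -- first month with a subscription, and the current month
    select date_trunc('month', min(date_start)) as first_month,
           date_trunc('month', current_date)    as last_month
    from customer
),
months as (
    select generate_series(first_month, last_month, interval '1 month') as month
    from bounds
)
select m.month::date as month,
       count(c.id)   as cust_active   -- 0 for months with no active customer
from months m
left join customer c
  on c.date_start < m.month + interval '1 month'          -- started before month end
 and (c.date_end is null or c.date_end >= m.month)        -- not ended before month start
group by m.month
order by m.month;
```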

Filter duplicates on row_number results

I'm trying to write a PostgreSQL query that gives me the 10 jobs that take the most time each month (excluding the current month). I have this query so far, but it returns duplicate job names. How can I filter these out?
SELECT job, month, duration
FROM (
SELECT
month,
job,
duration,
ROW_NUMBER() OVER (PARTITION BY month ORDER BY duration DESC) AS RN
FROM
run_history
WHERE
owner = 'john'
) x
WHERE RN <= 10
AND month < TO_CHAR(CURRENT_DATE, 'yyyymm')

Sounds like there can be multiple rows per (owner, month, job) and you want to work with the maximum duration per month for each job.
If so, aggregate computing max(duration) first, then use row_number() on top of it:
SELECT job, month, max_duration
FROM (
SELECT month, job, max(duration) AS max_duration
, row_number() OVER (PARTITION BY month ORDER BY max(duration) DESC NULLS LAST) AS rn
FROM run_history
WHERE owner = 'john'
AND month < to_char(CURRENT_DATE, 'yyyymm')
GROUP BY month, job
) sub
WHERE rn <= 10
ORDER BY month DESC, rn;
Aside: consider integer or date instead of text for the column month: cleaner and more efficient.

PostgreSQL SELECT date before max(DATE)

I need to select the rows for which the difference between max(date) and the date just before max(date) is smaller than 366 days. I know I can use SELECT MAX(date) FROM table to get the most recent date, but how can I get the date before it?
I need a query along these lines:
SELECT code, MAX(date) - before_date FROM troncon WHERE MAX(date) - before_date < 366 ;
NB: before_date does not refer to anything; it is a placeholder to be replaced by something that works.
Edit: Example of the table I'm testing it on:
CREATE TABLE troncon (code TEXT, ope_date DATE) ;
INSERT INTO troncon (code, ope_date) VALUES
('C086000-T10001', '2014-11-11'),
('C086000-T10001', '2014-11-11'),
('C086000-T10002', '2014-12-03'),
('C086000-T10002', '2014-01-03'),
('C086000-T10003', '2014-08-11'),
('C086000-T10003', '2014-03-03'),
('C086000-T10003', '2012-02-27'),
('C086000-T10004', '2014-08-11'),
('C086000-T10004', '2013-12-30'),
('C086000-T10004', '2013-06-01'),
('C086000-T10004', '2012-07-31'),
('C086000-T10005', '2013-10-01'),
('C086000-T10005', '2012-11-01'),
('C086000-T10006', '2014-04-01'),
('C086000-T10006', '2014-05-15'),
('C086000-T10001', '2014-07-05'),
('C086000-T10003', '2014-03-03');
Many thanks!

The subquery returns every row together with the maximum date for its code; the outer query then keeps only the rows whose difference from that maximum date is smaller than 366 days:
select * from
(
SELECT code, ope_date, max(ope_date) over(partition by code) max_date FROM troncon
) A
where max_date - ope_date < 366
PS: As @a_horse_with_no_name said, you can partition by code to get the maximum date for each code.
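To compare the maximum date only with the date immediately before it, as the question asks, the lag() window function is one possibility. A minimal sketch, assuming the troncon(code, ope_date) table from the question and assuming that repeated identical dates for a code should count as a single date:

```sql
select code,
       ope_date             as max_date,
       prev_date,
       ope_date - prev_date as gap_days   -- date - date yields an integer number of days
from (
    select code,
           ope_date,
           -- the previous distinct date for the same code
           lag(ope_date) over (partition by code order by ope_date) as prev_date,
           -- rn = 1 marks the most recent date of each code
           row_number() over (partition by code order by ope_date desc) as rn
    from (select distinct code, ope_date from troncon) d
) t
where rn = 1
  and ope_date - prev_date < 366;   -- codes with a single date have a NULL gap and drop out
```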

Join on generate_series and count

I'm trying to find the number of users who did action A or action B on a monthly basis.
Table: User
- id
- "creationDate"
Table: action_A
- user_id (= user.id)
- "creationDate"
Table: action_B
- user_id (= user.id)
- "creationDate"
The general idea of what I was trying to do was that I'd find the list of users who did action A in Month X and the list of users who did action B in Month X, then count how many ids are there for every month based on a generate_series of monthly dates.
I tried the following, however, the query times out when running and I'm not sure if there's any way to optimize it (or if it is even correct).
SELECT monthseries."Month", count(*)
FROM
(SELECT to_char(DAY::date, 'YYYY-MM') AS "Month"
FROM generate_series('2014-01-01'::date, CURRENT_DATE, '1 month') DAY) monthseries
LEFT JOIN
(SELECT to_char("creationDate", 'YYYY-MM') AS "Month",
id
FROM action_A) did_action_A ON monthseries."Month" = did_action_A."Month"
LEFT JOIN
(SELECT to_char("creationDate", 'YYYY-MM') AS "Month",
id
FROM action_B) did_action_B ON monthseries."Month" = did_action_B."Month"
GROUP BY monthseries."Month"
Any comments or help would be greatly appreciated!

If you want to count distinct users:
-- count(user_id) rather than count(*), so months without any action count as 0
select to_char(month, 'YYYY-MM') as "Month", count(user_id)
from
generate_series(
'2014-01-01'::date, current_date, '1 month'
) monthseries (month)
left join (
(
-- user_id is the user column listed for the action tables in the question
select distinct date_trunc('month', "creationDate") as month, user_id
from action_a
) a
full outer join (
select distinct date_trunc('month', "creationDate") as month, user_id
from action_b
) b using (month, user_id)
) s using (month)
group by 1
order by 1