PostgreSQL - SQL function to loop through all months of the year and pull 10 random records from each - postgresql

I am attempting to pull 10 random records from each month of this year using this query here but I get an error "ERROR: relation "c1" does not exist
"
Not sure where I'm going wrong - I think it may be I'm using Mysql syntax instead, but how do I resolve this?
My desired output is like this
Month
Another header
2021-01
random email 1
2021-01
random email 2
total of ten random emails from January, then ten more for each month this year (til November of course as Dec yet to happen)..
With CTE AS
(
Select month,
email,
Row_Number() Over (Partition By month Order By FLOOR(RANDOM()*(1-1000000+1))) AS RN
From (
SELECT
DISTINCT(TO_CHAR(DATE_TRUNC('month', timestamp ), 'YYYY-MM')) AS month
,CASE
WHEN
JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (form_data,0),'name') = 'email'
THEN
JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (form_data,0),'value')
END AS email
FROM form_submits_y2 fs
WHERE fs.website_id IN (791)
AND month LIKE '2021%'
GROUP BY 1,2
ORDER BY 1 ASC
)
)
SELECT *
FROM CTE C1
LEFT JOIN
(SELECT RN
,month
,email
FROM CTE C2
WHERE C2.month = C1.month
ORDER BY RANDOM() LIMIT 10) C3
ON C1.RN = C3.RN
ORDER By month ASC```

You can't reference an outer table inside a derived table with a regular join. You need to use left join lateral to make that work

I did end up finding a more elegant solution to my query here via this source from github :
SELECT
month
,email
FROM
(
Select month,
email,
Row_Number() Over (Partition By month Order By FLOOR(RANDOM()*(1-1000000+1))) AS RN
From (
SELECT
TO_CHAR(DATE_TRUNC('month', timestamp ), 'YYYY-MM') AS month
,CASE
WHEN JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (form_data,0),'name') = 'email'
THEN JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (form_data,0),'value')
END AS email
FROM form_submits_y2 fs
WHERE fs.website_id IN (791)
AND month LIKE '2021%'
GROUP BY 1,2
ORDER BY 1 ASC
)
) q
WHERE
RN <=10
ORDER BY month ASC

Related

Is there a SQL code for cumulative count of SaaS customer over months?

I have a table with:
ID (id client), date_start (subscription of SaaS), date_end (could be a date value or be NULL).
So I need a cumulative count of active clients month by month.
any idea on how to write that in Postgres and achieve this result?
Starting from this, but I don't know how to proceed
select
date_trunc('month', c.date_start)::date,
count(*)
from customer
Please check next solution:
select
subscrubed_date,
subscrubed_customers,
unsubscrubed_customers,
coalesce(subscrubed_customers, 0) - coalesce(unsubscrubed_customers, 0) cumulative
from (
select distinct
date_trunc('month', c.date_start)::date subscrubed_date,
sum(1) over (order by date_trunc('month', c.date_start)) subscrubed_customers
from customer c
order by subscrubed_date
) subscribed
left join (
select distinct
date_trunc('month', c.date_end)::date unsubscrubed_date,
sum(1) over (order by date_trunc('month', c.date_end)) unsubscrubed_customers
from customer c
where date_end is not null
order by unsubscrubed_date
) unsubscribed on subscribed.subscrubed_date = unsubscribed.unsubscrubed_date;
share SQL query
You have a table of customers. With a start date and sometimes an end date. As you want to group by date, but there are two dates in the table, you need to split these first.
Then, you may have months where only customers came and others where only customers left. So, you'll want a full outer join of the two sets.
For a cumulative sum (also called a running total), use SUM OVER.
with came as
(
select date_trunc('month', date_start) as month, count(*) as cnt
from customer
group by date_trunc('month', date_start)
)
, went as
(
select date_trunc('month', date_end) as month, count(*) as cnt
from customer
where date_end is not null
group by date_trunc('month', date_end)
)
select
month,
came.cnt as cust_new,
went.cnt as cust_gone,
sum(came.cnt - went.cnt) over (order by month) as cust_active
from came full outer join went using (month)
order by month;

Count distinct loop in sql

I am trying to pull unique active users before a date.
So specifically, I have a date range (let's say August - November) where I want to know the cumulative unique active users on or before a day within a month.
So, the pseudocode would look something like this:
SELECT COUNT(DISTINCT USERS) FROM USER_DB
WHERE
Month = [loop through months 8-11]
AND
DAY <= [day in loop of 1:31]
The output I desire is something Like this
step-by-step demo: db<>fiddle
SELECT
mydate,
SUM( -- 3
COUNT(DISTINCT username) -- 1, 2
) OVER (ORDER BY mydate) -- 3
FROM t
GROUP BY mydate -- 2
GROUP BY your date and count the users
Because you don't want to count ALL user accesses, but only one access per user and day, you need to add the DISTINCT
This is a window function. This one aggregates all counts which where previously done cumulatively.
If you want to get unique user over ALL days (count a user only on its first access) you can filter the users with a DISTINCT ON clause first:
demo: db<>fiddle
SELECT DISTINCT ON (username)
*
FROM t
ORDER BY username, mydate
This yields:
SELECT
mydate,
SUM(
COUNT(*)
) OVER (ORDER BY mydate)
FROM (
SELECT DISTINCT ON (username)
*
FROM t
ORDER BY username, mydate
) s
GROUP BY mydate

Filter duplicates on row_number results

I'm trying to make a query on PostgreSQL that gives me the top 10 jobs that take more time each month (excluding current month), I have made this query so far but it gives me duplicates on the job name. How can I filter these?
SELECT job, month, duration
FROM (
SELECT
month,
job,
duration,
ROW_NUMBER() OVER (PARTITION BY month ORDER BY duration DESC) AS RN
FROM
run_history
WHERE
owner = 'john'
) x
WHERE RN <= 10
AND month < TO_CHAR(CURRENT_DATE, 'yyyymm')
Sounds like there can be multiple rows per (owner, month, job) and you want to work with the maximum duration per month for each job.
If so, aggregate computing max(duration) first, then use row_number() on top of it:
SELECT job, month, max_duration
FROM (
SELECT month, job, max(duration) AS max_duration
, row_number() OVER (PARTITION BY month ORDER BY max(duration) DESC NULLS LAST) AS rn
FROM run_history
WHERE owner = 'john'
AND month < to_char(CURRENT_DATE, 'yyyymm')
GROUP BY month, job
) sub
WHERE rn <= 10
ORDER BY month DESC, rn;
Aside: consider integer or date instead of text for the column month: cleaner and more efficient.

How can I make the denominator a constant for each of the numbers in the same row in SQL?

I am trying to create a table with the average amount of sales divided by a cohort of users that signed up for an account in a certain month, however, I can only figure out to divide by the number of people that made a purchase in that specific month which is lower than the total amount of the cohort. How do I change the query below to make each of the avg_sucessful_transacted amounts divide by cohort 0 for each month?
thank you.
select sum (t.amount_in_dollars)/ count (distinct u.id) as Avg_Successful_Transacted, (datediff(month,[u.created:month],[t.createdon:month])) as Cohort, [u.created:month] as Months,
count (distinct u.id) as Users
from [transaction_cache as t]
left join [user_cache as u] on t.owner = u.id
where t.type = 'savings' and t.status = 'successful' and [u.created:year] > ['2017-01-01':date:year]
group by cohort, months
order by Cohort, Months
You will need to break out the cohort sizing into its own subquery or CTE in order to calculate the total number of distinct users who were created during the month which matches the cohort's basis month.
I approached this by bucketing users by the month they were created using the date_trunc('Month', <date>, <date>) function, but you may wish to approach it differently based on the specific business logic that generates your cohorts.
I don't work with Periscope, so the example query below is structured for pure Redshift, but hopefully it is easy to translate the syntax into Periscope's expected format:
WITH cohort_sizes AS (
SELECT date_trunc('Month', created)::DATE AS cohort_month
, COUNT(DISTINCT(id)) AS cohort_size
FROM user_cache u
GROUP BY 1
),
cohort_transactions AS (
SELECT date_trunc('Month', created)::DATE AS cohort_month
, createdon
, owner
, type
, status
, amount_in_dollars
, id
, created
FROM transaction_cache t
LEFT JOIN user_cache u ON t.owner = u.id
WHERE t.type = 'savings'
AND t.status = 'successful'
AND u.created > '2017-01-01'
)
SELECT SUM(t.amount_in_dollars) / s.cohort_size AS Avg_Successful_Transacted
, (datediff(MONTH, u.created, t.createdon)) AS Cohort
, u.created AS Months
, count(DISTINCT u.id) AS Users
FROM cohort_transactions t
JOIN cohort_sizes s ON t.cohort_month = s.cohort_month
LEFT JOIN user_cache AS u ON t.owner = u.id
GROUP BY s.cohort_size, Cohort, Months
ORDER BY Cohort, Months
;

Grouping SQL results by continous time intervals (oracle sql)

I have following data in the table as below and I am looking for a way to group the continuous time intervals for each id to return:
CREATE TABLE DUMMY
(
ID VARCHAR2(10 BYTE),
TIME_STAMP VARCHAR2(8 BYTE),
NAME VARCHAR2(255 BYTE)
);
SELECT ID, min(TIME_STAMP) "startDate", max(TIME_STAMP) "endDate", NAME
GROUP BY ID , NAME
something like
100 20011128 20011203 David
100 20011204 20011207 Unknown
100 20011208 20011215 David
100 20011216 20011220 Sara
and so on ...
ps. I have a sample script, but i don't know how to attach my file.
Hi every one here is more input:
There is only one record with time_stamp for a specific ID.
Users can be different, for example for day 1 David, day 2 unknown, day 3 David and so on.
So there is one row for every day of year for each ID but with different users.
Now, i want to see the break point, differences base on time_stamp intervals from day one
until last day for a specific ID in day order from begin day until last day.
Query Result should be :
ID NAME MIN_DATE MAX_DATE
100 David 20011128 20050407
100 Sara 20050408 20050417
100 David 20050418 20080416
100 Unknown 20080417 20080507
100 David 20080508 20080508
100 Unknown 20080509 20080607
100 David 20080608 20080608
100 Unknown 20080609 20080921
100 David 20080922 20080922
100 Unknown 20080923 20081231
100 David 20090101 20090405
thanks
Hi again, many thanks to everyone, i have solved the problem, here is the solution:
select id, min(time_stamp), max(time_stamp), name
from ( select id, time_stamp, name,
max(rn) over (order by time_stamp) grp
from ( select id, time_stamp, name,
case
when lag(name) over (order by time_stamp) <> name or
row_number() over (order by time_stamp) = 1
then row_number() over (order by time_stamp)
end rn
from dummy
)
)
group by id, grp, name
order by 1
Select
ID,
Name,
min(time_stamp) min_date,
max(time_stamp) max_date
from
Dummy
group by
Id,
Name
That should work.
IF you want the date range for each Id, but all the names you can do:
Select
d.Id,
d.Name,
dr.min_date,
dr.max_date
from
Dummy d
JOIN
(Select
Id,
min(time_stamp) min_date,
max(time_stamp) max_date
from
Dummy
group by
Id
) dr
on ( dr.Id = d.Id)