Join and aggregate on two columns - for every month even months with no data? - tsql

Using SQL Server 2005.
I have a table with calendar months
Month, fiscalorder
june,1
july,2
..
may,12
And another table with employees and a repeating monthly amount
employee, month, amount
john, july, 10
john, july, 3
john, august,2
mary, june, 2
mary, feb, 5
I need to join and aggregate these by month, reporting every month (even months without data) for every employee, ordered by employee and then by fiscal order.
Output:
june, john, 0
july, john, 13
august,john,2
sept, john, 0
..
june,mary,2

Assuming SQL Server 2008+ (multi-row VALUES lists in INSERT are not available in 2005):
Declare @CalenderMonths Table ([Month] Varchar(20), FiscalOrder Int)
Insert Into @CalenderMonths Values
('June',1),('July',2),('August',3),('September',4),('October',5),('November',6),
('December',7),('January',8),('February',9),('March',10),('April',11),('May', 12)
Declare @Employee Table (employee varchar(50), [month] Varchar(20), amount int)
Insert Into @Employee Values ('john', 'July', 10),('john', 'July', 3),('john', 'August', 2),('mary', 'June', 2),('mary', 'February', 5)
;with cte as
(
-- one row per employee and month
Select employee, [month], TotalAmount = sum(amount)
from @Employee
group by employee, [month]
)
select x.[Month], x.employee, amount = coalesce(t.TotalAmount, 0)
from (
-- every month for every employee
select c.[Month], c.FiscalOrder, e.employee
from @CalenderMonths c
cross join (select distinct employee from cte) e
) x
left join cte t on x.[Month] = t.[Month] and x.employee = t.employee
order by x.employee, x.FiscalOrder
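A quick way to sanity-check this pattern without SQL Server is a minimal sketch with Python's built-in sqlite3 module. This is an assumption of the sketch, not part of the answer: SQLite stands in for T-SQL, so plain tables replace the table variables and COALESCE replaces ISNULL. The query has the same shape: aggregate per employee and month, cross join every month with every employee, then LEFT JOIN the totals so empty months come back as 0.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CalenderMonths (Month TEXT, FiscalOrder INTEGER);
CREATE TABLE Employee (employee TEXT, month TEXT, amount INTEGER);
""")
months = ["June", "July", "August", "September", "October", "November",
          "December", "January", "February", "March", "April", "May"]
conn.executemany("INSERT INTO CalenderMonths VALUES (?, ?)",
                 [(m, i) for i, m in enumerate(months, start=1)])
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    ("john", "July", 10), ("john", "July", 3), ("john", "August", 2),
    ("mary", "June", 2), ("mary", "February", 5),
])

rows = conn.execute("""
WITH totals AS (                                          -- one row per employee and month
    SELECT employee, month, SUM(amount) AS total
    FROM Employee
    GROUP BY employee, month
)
SELECT c.Month, e.employee, COALESCE(t.total, 0) AS amount
FROM CalenderMonths AS c
CROSS JOIN (SELECT DISTINCT employee FROM Employee) AS e  -- every month x every employee
LEFT JOIN totals AS t
       ON t.month = c.Month AND t.employee = e.employee   -- months without data stay NULL
ORDER BY e.employee, c.FiscalOrder
""").fetchall()

for row in rows[:4]:
    print(row)
# ('June', 'john', 0)
# ('July', 'john', 13)
# ('August', 'john', 2)
# ('September', 'john', 0)
```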

SELECT month, employee, SUM(amount) AS amount
FROM (
SELECT m.month, m.fiscalorder, e.employee, ISNULL(s.amount, 0) AS amount
FROM dbo.months AS m
CROSS JOIN (SELECT DISTINCT employee FROM dbo.sales) AS e
LEFT JOIN dbo.sales AS s
ON s.employee = e.employee
AND m.month = s.month
) X
GROUP BY month, fiscalorder, employee
ORDER BY employee, fiscalorder

Related

PostgreSQL - SQL function to loop through all months of the year and pull 10 random records from each

I am attempting to pull 10 random records from each month of this year using the query below, but I get the error: ERROR: relation "c1" does not exist
I'm not sure where I'm going wrong. I think I may be using MySQL syntax instead, but how do I resolve this?
My desired output is like this:
Month    Email
2021-01  random email 1
2021-01  random email 2
That is, ten random emails from January, then ten more for each month this year (up to November, since December hasn't happened yet).
With CTE AS
(
Select month,
email,
Row_Number() Over (Partition By month Order By FLOOR(RANDOM()*(1-1000000+1))) AS RN
From (
SELECT
DISTINCT(TO_CHAR(DATE_TRUNC('month', timestamp ), 'YYYY-MM')) AS month
,CASE
WHEN
JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (form_data,0),'name') = 'email'
THEN
JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (form_data,0),'value')
END AS email
FROM form_submits_y2 fs
WHERE fs.website_id IN (791)
AND month LIKE '2021%'
GROUP BY 1,2
ORDER BY 1 ASC
)
)
SELECT *
FROM CTE C1
LEFT JOIN
(SELECT RN
,month
,email
FROM CTE C2
WHERE C2.month = C1.month
ORDER BY RANDOM() LIMIT 10) C3
ON C1.RN = C3.RN
ORDER BY month ASC
You can't reference an outer table inside a derived table with a regular join. You need to use LEFT JOIN LATERAL to make that work.
I ended up finding a more elegant solution to my query via this source from GitHub:
SELECT
month
,email
FROM
(
Select month,
email,
Row_Number() Over (Partition By month Order By RANDOM()) AS RN
From (
SELECT
TO_CHAR(DATE_TRUNC('month', timestamp ), 'YYYY-MM') AS month
,CASE
WHEN JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (form_data,0),'name') = 'email'
THEN JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (form_data,0),'value')
END AS email
FROM form_submits_y2 fs
WHERE fs.website_id IN (791)
AND month LIKE '2021%'
GROUP BY 1,2
ORDER BY 1 ASC
)
) q
WHERE
RN <=10
ORDER BY month ASC
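The ROW_NUMBER-per-partition filter can be tried locally with a small sqlite3 sketch. Assumptions: SQLite 3.25+ for window functions (recent Python builds bundle a newer version), and a hypothetical `submissions` table with invented data in place of form_submits_y2 and the JSON extraction. It also shows that a month with fewer than ten rows simply returns all of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for form_submits_y2 after the email extraction.
conn.execute("CREATE TABLE submissions (month TEXT, email TEXT)")

# Invented sample data: 25 emails in January, only 3 in February.
data = [("2021-01", f"jan{i}@example.com") for i in range(25)]
data += [("2021-02", f"feb{i}@example.com") for i in range(3)]
conn.executemany("INSERT INTO submissions VALUES (?, ?)", data)

rows = conn.execute("""
SELECT month, email
FROM (
    SELECT month, email,
           ROW_NUMBER() OVER (PARTITION BY month ORDER BY random()) AS rn  -- random rank per month
    FROM submissions
) AS numbered
WHERE rn <= 10          -- keep at most 10 random rows per month
ORDER BY month
""").fetchall()

for month, email in rows:
    print(month, email)
```

Because the ranking restarts in every partition, no self-join or LATERAL subquery is needed.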

Is there a SQL code for cumulative count of SaaS customer over months?

I have a table with:
ID (client id), date_start (start of the SaaS subscription), date_end (a date value, or NULL).
I need a cumulative count of active clients, month by month.
Any idea how to write that in Postgres and achieve this result?
I started from this, but I don't know how to proceed:
select
date_trunc('month', c.date_start)::date,
count(*)
from customer c
group by 1
Please check the following solution:
select
s.subscribed_date,
s.subscribed_customers,
u.unsubscribed_customers,
s.subscribed_customers - u.unsubscribed_customers as cumulative
from (
select distinct
date_trunc('month', c.date_start)::date as subscribed_date,
sum(1) over (order by date_trunc('month', c.date_start)) as subscribed_customers
from customer c
) s
cross join lateral (
-- customers whose subscription ended in or before this month
select count(*) as unsubscribed_customers
from customer c
where c.date_end is not null
and date_trunc('month', c.date_end) <= s.subscribed_date
) u
order by s.subscribed_date;
You have a table of customers with a start date and sometimes an end date. Since you want to group by month but the table has two dates, you need to split them first.
Then, you may have months where only customers came and others where only customers left. So, you'll want a full outer join of the two sets.
For a cumulative sum (also called a running total), use SUM OVER.
with came as
(
select date_trunc('month', date_start) as month, count(*) as cnt
from customer
group by date_trunc('month', date_start)
)
, went as
(
select date_trunc('month', date_end) as month, count(*) as cnt
from customer
where date_end is not null
group by date_trunc('month', date_end)
)
select
month,
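coalesce(came.cnt, 0) as cust_new,
coalesce(went.cnt, 0) as cust_gone,
-- coalesce: a month may have only arrivals or only departures, and NULL would drop out of the sum
sum(coalesce(came.cnt, 0) - coalesce(went.cnt, 0)) over (order by month) as cust_active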
from came full outer join went using (month)
order by month;
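A variant of the same idea, sketched with Python's sqlite3 so it can be run locally: instead of a full outer join, each customer contributes a +1 event in its start month and a -1 event in its end month, and a running SUM over the per-month net change gives the active count. This swaps the FULL OUTER JOIN for UNION ALL (SQLite only gained FULL OUTER JOIN in 3.39), strftime('%Y-%m', ...) stands in for date_trunc, and the sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, date_start TEXT, date_end TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    (1, "2021-01-05", None),          # still active
    (2, "2021-01-20", "2021-03-10"),  # leaves in March
    (3, "2021-02-14", None),
    (4, "2021-04-01", "2021-04-30"),  # joins and leaves in April
])

rows = conn.execute("""
WITH events AS (                      -- +1 when a customer came, -1 when it went
    SELECT strftime('%Y-%m', date_start) AS month, 1 AS delta FROM customer
    UNION ALL
    SELECT strftime('%Y-%m', date_end), -1 FROM customer WHERE date_end IS NOT NULL
),
per_month AS (
    SELECT month,
           SUM(CASE WHEN delta = 1 THEN 1 ELSE 0 END)  AS cust_new,
           SUM(CASE WHEN delta = -1 THEN 1 ELSE 0 END) AS cust_gone,
           SUM(delta) AS net
    FROM events
    GROUP BY month
)
SELECT month, cust_new, cust_gone,
       SUM(net) OVER (ORDER BY month) AS cust_active   -- running total of net change
FROM per_month
ORDER BY month
""").fetchall()

for row in rows:
    print(row)
# ('2021-01', 2, 0, 2)
# ('2021-02', 1, 0, 3)
# ('2021-03', 0, 1, 2)
# ('2021-04', 1, 1, 2)
```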

How to find the first and last date prior to a particular date in Postgresql?

I am a SQL beginner, and I'm having trouble answering this question:
For each customer_id who made an order on January 1, 2006, what were their historical (prior to January 1, 2006) first and last order dates?
I've tried to solve it using a subquery, but I don't know how to find the first and last order dates prior to Jan 1.
Columns of table A:
customer_id
order_id
order_date
revenue
product_id
Columns of table B:
product_id
category_id
SELECT customer_id, order_date FROM A
(
SELECT customer_id FROM A
WHERE order_date = '2006-01-01'
)
WHERE ...
There are actually two subqueries: the first for "For each customer_id who made an order on January 1, 2006" and the second for "their historical (prior to January 1, 2006) first and last order dates".
So, first:
select customer_id from A where order_date = '2006-01-01';
and second:
select customer_id, min(order_date) as first_date, max(order_date) as last_date
from A
where order_date < '2006-01-01' group by customer_id;
Finally, you need only those customers from the second subquery who exist in the first one:
select customer_id, min(order_date) as first_date, max(order_date) as last_date
from A as t1
where
order_date < '2006-01-01' and
customer_id in (
select customer_id from A where order_date = '2006-01-01')
group by customer_id;
Or, possibly more efficient:
select customer_id, min(order_date) as first_date, max(order_date) as last_date
from A as t1
where
order_date < '2006-01-01' and
exists (
select 1 from A as t2
where t1.customer_id = t2.customer_id and t2.order_date = '2006-01-01')
group by customer_id;
You can also filter with a subquery in the WHERE clause and aggregate the remaining rows directly:
SELECT customer_id, MIN(order_date) AS first, MAX(order_date) AS last FROM A
WHERE customer_id IN (SELECT customer_id FROM A WHERE order_date = '2006-01-01') AND order_date < '2006-01-01'
GROUP BY customer_id;
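The EXISTS version can be checked with a small sqlite3 sketch. Assumptions: table A is reduced to the three columns the query uses, the sample rows are invented, and dates are stored as ISO-8601 text so that string comparison orders them like real dates. Customer 4 below ordered only on 2006-01-01; with no earlier orders it has no rows to aggregate and drops out, which is also true of the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (customer_id INTEGER, order_id INTEGER, order_date TEXT)")
conn.executemany("INSERT INTO A VALUES (?, ?, ?)", [
    (1, 101, "2005-03-01"), (1, 102, "2005-11-15"), (1, 103, "2006-01-01"),
    (2, 201, "2005-06-30"),                             # no order on 2006-01-01
    (3, 301, "2004-12-24"), (3, 302, "2006-01-01"),
    (4, 401, "2006-01-01"),                             # no history before 2006-01-01
])

rows = conn.execute("""
SELECT customer_id,
       MIN(order_date) AS first_date,
       MAX(order_date) AS last_date
FROM A AS t1
WHERE order_date < '2006-01-01'                  -- history only
  AND EXISTS (SELECT 1 FROM A AS t2              -- ...of customers who ordered on 2006-01-01
              WHERE t2.customer_id = t1.customer_id
                AND t2.order_date = '2006-01-01')
GROUP BY customer_id
ORDER BY customer_id
""").fetchall()

print(rows)  # [(1, '2005-03-01', '2005-11-15'), (3, '2004-12-24', '2004-12-24')]
```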

Cumulative sum with group by and join

I'm struggling a little to find a clean way to do this. Assume I have the following records in my table named Records:
Name  InsertDate  Size
john  30.06.2015  1
john  10.01.2016  10
john  12.01.2016  100
john  05.03.2016  1000
doe   01.01.2016  1
How do I get the records for 2016 where the month is less than or equal to 3, grouped by month (including months with no records, e.g. month 2 in this case), with a cumulative sum of Size up to and including that month? I want the following result:
Name  Month  Size
john  1      111
john  2      111
john  3      1111
doe   1      1
As other commenters have already stated, you simply need a table of dates to join from; it supplies the dates your source table has no records for:
-- Build the source data table.
declare @t table(Name nvarchar(10)
,InsertDate date
,Size int
);
insert into @t values
('john','20150630',1 )
,('john','20160110',10 )
,('john','20160112',100 )
,('john','20160305',1000)
,('doe' ,'20160101',1 );
-- Specify the year you want to search for by storing the first day here.
declare @year date = '20160101';
-- This recursive CTE builds a set of dates that you can join from.
-- LEFT JOINing from here is what gives you rows for months without records in your source data.
with Dates
as
(
select @year as MonthStart
,dateadd(day,-1,dateadd(month,1,@year)) as MonthEnd
union all
select dateadd(month,1,MonthStart)
,dateadd(day,-1,dateadd(month,2,MonthStart))
from Dates
where dateadd(month,1,MonthStart) < dateadd(yyyy,1,@year)
)
select t.Name
,d.MonthStart
,sum(t.Size) as Size
from Dates d
left join @t t
on(t.InsertDate <= d.MonthEnd)
where d.MonthStart <= '20160301' -- Without knowing what your logic is for specifying values only up to March, I have left this part for you to automate.
group by t.Name
,d.MonthStart
order by t.Name
,d.MonthStart;
If you have a static date reference table in your database, you don't need to build the derived table and can simply do:
select d.DateValue
,<Other columns>
from DatesReferenceTable d
left join <Other Tables> o
on(d.DateValue = o.AnyDateColumn)
etc
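For a runnable illustration of the recursive calendar CTE, here is a sketch in Python's sqlite3. Assumptions: SQLite's date() modifiers replace DATEADD, the table is named `t`, dates are ISO-8601 text, and the March cut-off is hard-coded in the recursion just as the answer hard-codes it in its WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Name TEXT, InsertDate TEXT, Size INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("john", "2015-06-30", 1), ("john", "2016-01-10", 10),
    ("john", "2016-01-12", 100), ("john", "2016-03-05", 1000),
    ("doe", "2016-01-01", 1),
])

rows = conn.execute("""
WITH RECURSIVE Dates(MonthStart, MonthEnd) AS (
    SELECT '2016-01-01', date('2016-01-01', '+1 month', '-1 day')       -- anchor: January
    UNION ALL
    SELECT date(MonthStart, '+1 month'), date(MonthStart, '+2 months', '-1 day')
    FROM Dates
    WHERE MonthStart < '2016-03-01'                                     -- stop after March
)
SELECT t.Name,
       CAST(strftime('%m', d.MonthStart) AS INTEGER) AS Month,
       SUM(t.Size) AS Size
FROM Dates AS d
LEFT JOIN t ON t.InsertDate <= d.MonthEnd      -- everything up to the end of the month
GROUP BY t.Name, d.MonthStart
ORDER BY t.Name, d.MonthStart
""").fetchall()

for row in rows:
    print(row)
```

Because the join condition is "inserted on or before the month end", the cumulative sum falls out of a plain SUM, and February still gets a row even though nothing was inserted that month.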
Here's another approach that utilizes a tally table (aka numbers table) to create the date table. Note my comments.
-- Build the source data table.
declare @t table(Name nvarchar(10), InsertDate date, Size int);
insert into @t values
('john','20150630',1 )
,('john','20160110',10 )
,('john','20160112',100 )
,('john','20160305',1000)
,('doe' ,'20160101',1 );
-- A year is fine, don't need a date data type
declare @year smallint = 2016;
WITH -- dummy rows for a tally table:
E AS (SELECT E FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t(e)),
dateRange(totalDays, mn, mx) AS -- Get the range and number of months to create
(
SELECT DATEDIFF(MONTH, MIN(InsertDate), MAX(InsertDate)), MIN(InsertDate), MAX(InsertDate)
FROM @t
),
iTally(N) AS -- Tally Oh! Create an inline Tally (aka numbers) table starting with 0
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1))-1
FROM E a CROSS JOIN E b CROSS JOIN E c CROSS JOIN E d
),
RunningTotal AS -- perform a running total by year/month for each person (Name)
(
SELECT
yr = YEAR(DATEADD(MONTH, n, mn)),
mo = MONTH(DATEADD(MONTH, n, mn)),
Name,
Size = SUM(Size) OVER
(PARTITION BY Name ORDER BY YEAR(DATEADD(MONTH, n, mn)), MONTH(DATEADD(MONTH, n, mn)))
FROM iTally
CROSS JOIN dateRange
-- match on year and month, so the same month in different years is not mixed up
LEFT JOIN @t ON YEAR(InsertDate) = YEAR(DATEADD(MONTH, n, mn))
AND MONTH(InsertDate) = MONTH(DATEADD(MONTH, n, mn))
WHERE N <= totalDays
) -- Final output will only return rows where the year matches @year (LAG needs SQL Server 2012+):
SELECT
name = ISNULL(name, LAG(Name, 1) OVER (ORDER BY yr, mo)),
yr, mo,
size = ISNULL(Size, LAG(Size, 1) OVER (ORDER BY yr, mo))
FROM RunningTotal
WHERE yr = @year
GROUP BY yr, mo, name, size;
Results:
name yr mo size
---------- ----------- ----------- -----------
doe 2016 1 1
john 2016 1 111
john 2016 2 111
john 2016 3 1111
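Here is the tally-table idea in isolation: cross-joining a ten-row VALUES list with itself yields the numbers 0 to 99, and adding each number as a month offset to the first month in the data produces every month in range, including February 2016, which has no rows. This minimal sketch uses Python's sqlite3, with SQLite's date() modifiers assumed in place of DATEADD. Normalizing to 'start of month' first matters in SQLite, because adding months to a day such as the 30th rolls over into the next month when that day does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Name TEXT, InsertDate TEXT, Size INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("john", "2015-06-30", 1), ("john", "2016-01-10", 10),
    ("john", "2016-01-12", 100), ("john", "2016-03-05", 1000),
    ("doe", "2016-01-01", 1),
])

months = [r[0] for r in conn.execute("""
WITH E(n) AS (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)),
iTally(n) AS (SELECT a.n * 10 + b.n FROM E AS a CROSS JOIN E AS b),   -- numbers 0..99
dateRange(mn, mx) AS (                                               -- first and last month in the data
    SELECT date(MIN(InsertDate), 'start of month'),
           date(MAX(InsertDate), 'start of month')
    FROM t
)
SELECT date(mn, '+' || n || ' months')                               -- month offset n from the first month
FROM iTally CROSS JOIN dateRange
WHERE date(mn, '+' || n || ' months') <= mx
ORDER BY n
""")]

print(months)
```

The resulting month list can then be left-joined to the data in the same way as the answer's RunningTotal CTE.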

Insert 'Dummy Rows' In SQL with aggregate functions

I have a table that includes the following fields:
Company
fruit
date
qty
price
total_price
I have a query that summarizes the data:
SELECT
company,
extract(year from date) as year,
case
when EXTRACT(MONTH FROM DATE) <= 3
THEN 'Q1'
WHEN EXTRACT(MONTH FROM DATE) BETWEEN 4 AND 6
THEN 'Q2'
WHEN EXTRACT(MONTH FROM DATE) BETWEEN 7 AND 9
THEN 'Q3'
WHEN EXTRACT(MONTH FROM DATE) BETWEEN 10 AND 12
THEN 'Q4'
ELSE 'UNKNOWN'
END AS QUARTER,
COUNT(*) AS purchases,
SUM(QTY) AS total_fruit_purchased,
SUM(total_price) AS COMISSION
FROM fees.fruit_paid
where extract(year from date) >= 2015
GROUP BY
company,
extract(year from date),
case
when EXTRACT(MONTH FROM DATE) <= 3
THEN 'Q1'
WHEN EXTRACT(MONTH FROM DATE) BETWEEN 4 AND 6
THEN 'Q2'
WHEN EXTRACT(MONTH FROM DATE) BETWEEN 7 AND 9
THEN 'Q3'
WHEN EXTRACT(MONTH FROM DATE) BETWEEN 10 AND 12
THEN 'Q4'
ELSE 'UNKNOWN'
END
ORDER BY
company,
YEAR,
QUARTER
How would I go about including dummy rows in the output when a company has no data for a given year and quarter in my time frame?
It's a little complicated, so let me step you through it. First, you need a list of companies, including companies that don't have purchases. You can get that with:
SELECT DISTINCT company
FROM fruit_paid;
And you could just use that to get a summary for all companies and all time, with something like this:
WITH
company_names as
(SELECT DISTINCT company
FROM fruit_paid)
SELECT company_names.company, sum(fruit_paid.qty) as total_fruit_purchased
FROM company_names
LEFT OUTER JOIN fruit_paid
ON company_names.company = fruit_paid.company
GROUP BY company_names.company;
Two things are going on there:
1. I'm taking the earlier query ("SELECT DISTINCT ...") and turning it into a common table expression (CTE), which just means I can use that query like a table.
2. I'm using a LEFT OUTER JOIN, which uses the left table to define the rows and joins in matching rows from the right table where it can.
That's the core of what's needed to get this done. You can get a list of years with the generate_series function. Something like:
SELECT extract(YEAR from generate_series) as year
FROM generate_series('2015-01-01'::date,
'2016-01-01'::date,
'1 year');
You can then add that to the previous query:
WITH
company_names as
(SELECT DISTINCT company
FROM fruit_paid),
years as
(SELECT extract(YEAR from generate_series) as year
FROM generate_series('2015-01-01'::date,
'2016-01-01'::date,
'1 year'))
SELECT company_names.company,
years.year,
sum(fruit_paid.qty) as total_fruit_purchased
FROM company_names
CROSS JOIN years
LEFT OUTER JOIN fruit_paid
ON company_names.company = fruit_paid.company
AND extract(YEAR from fruit_paid.date) = years.year
GROUP BY company_names.company, years.year;
Using a CROSS JOIN there gives you one row per combination of the left table (company_names) and the right table (years); in other words, one row for each company and year, regardless of purchases. You then LEFT JOIN that with fruit_paid to get the purchase data.
The last piece is the quarters. There's an easy way to get a list of all quarters:
VALUES ('Q1'), ('Q2'), ('Q3'), ('Q4');
So let's add that in:
WITH
company_names as
(SELECT DISTINCT company
FROM fruit_paid),
years as
(SELECT extract(YEAR from generate_series) as year
FROM generate_series('2015-01-01'::date,
'2016-01-01'::date,
'1 year')),
quarters (quarter) as
(VALUES ('Q1'), ('Q2'), ('Q3'), ('Q4'))
SELECT company_names.company,
years.year,
quarters.quarter,
sum(fruit_paid.qty) as total_fruit_purchased
FROM company_names
CROSS JOIN years
CROSS JOIN quarters
LEFT OUTER JOIN fruit_paid
ON company_names.company = fruit_paid.company
AND extract(YEAR from fruit_paid.date) = years.year
AND quarters.quarter = case
WHEN extract(MONTH FROM date) <= 3
THEN 'Q1'
WHEN extract(MONTH FROM date) BETWEEN 4 AND 6
THEN 'Q2'
WHEN extract(MONTH FROM date) BETWEEN 7 AND 9
THEN 'Q3'
WHEN extract(MONTH FROM date) BETWEEN 10 AND 12
THEN 'Q4'
ELSE 'UNKNOWN'
END
GROUP BY company_names.company, years.year, quarters.quarter;
The only new thing here is a bit more definition for the quarters common table expression: we added a column name. For pseudo-queries like VALUES, it's handy to be able to name the returned columns, and it makes the query easier to understand later (otherwise we'd be joining on quarters.column1, PostgreSQL's default name for the first VALUES column, which is hard to read).
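The whole pattern can be exercised without a PostgreSQL server in a small sqlite3 sketch. Assumptions: SQLite has no generate_series by default, so the years come from a VALUES list; strftime() replaces extract(); the quarter is computed arithmetically as (month + 2) / 3 instead of the CASE expression; COALESCE turns the NULL sums of empty quarters into 0; and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruit_paid (company TEXT, date TEXT, qty INTEGER)")
conn.executemany("INSERT INTO fruit_paid VALUES (?, ?, ?)", [
    ("acme", "2015-02-10", 5), ("acme", "2015-02-20", 1),
    ("acme", "2016-11-03", 2), ("zest", "2015-08-15", 7),
])

rows = conn.execute("""
WITH company_names AS (SELECT DISTINCT company FROM fruit_paid),
years(year) AS (VALUES (2015), (2016)),                -- stands in for generate_series
quarters(quarter) AS (VALUES ('Q1'), ('Q2'), ('Q3'), ('Q4'))
SELECT c.company, y.year, q.quarter,
       COALESCE(SUM(f.qty), 0) AS total_fruit_purchased
FROM company_names AS c
CROSS JOIN years AS y                                  -- every company x year
CROSS JOIN quarters AS q                               -- ... x quarter
LEFT JOIN fruit_paid AS f
       ON f.company = c.company
      AND CAST(strftime('%Y', f.date) AS INTEGER) = y.year
      AND 'Q' || ((CAST(strftime('%m', f.date) AS INTEGER) + 2) / 3) = q.quarter
GROUP BY c.company, y.year, q.quarter
ORDER BY c.company, y.year, q.quarter
""").fetchall()

for row in rows:
    print(row)
```

Every company gets all eight year/quarter rows; quarters without purchases show 0 instead of being missing.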
Hope that helps.
Here's some of the relevant documentation:
Table Expressions, which goes over using query results as tables, and also touches on the different join types.
WITH queries, which specifically talks about common table expressions.
VALUES, which I include for completeness.
Set-Returning Functions, which describes all of the ways to use generate_series.