How to find the first and last date prior to a particular date in PostgreSQL?

I am a SQL beginner and I'm having trouble answering this question:
For each customer_id who made an order on January 1, 2006, what were their historical (prior to January 1, 2006) first and last order dates?
I've tried to solve it using a subquery, but I don't know how to find the first and last order dates prior to January 1.
Columns of table A:
customer_id
order_id
order_date
revenue
product_id
Columns of table B:
product_id
category_id
SELECT customer_id, order_date FROM A
(
SELECT customer_id FROM A
WHERE order_date = '2006-01-01'
)
WHERE ...

There are two subqueries actually. First for "For each customer_id who made an order on January 1, 2006" and second for "their historical (prior to January 1, 2006) first and last order dates"
So, first:
select customer_id from A where order_date = '2006-01-01';
and second:
select customer_id, min(order_date) as first_date, max(order_date) as last_date
from A
where order_date < '2006-01-01' group by customer_id;
Finally, keep only those customers from the second query who also appear in the first one:
select customer_id, min(order_date) as first_date, max(order_date) as last_date
from A as t1
where
order_date < '2006-01-01' and
customer_id in (
select customer_id from A where order_date = '2006-01-01')
group by customer_id;
Or, possibly more efficiently, using EXISTS:
select customer_id, min(order_date) as first_date, max(order_date) as last_date
from A as t1
where
order_date < '2006-01-01' and
exists (
select 1 from A as t2
where t1.customer_id = t2.customer_id and t2.order_date = '2006-01-01')
group by customer_id;

You can also filter with an IN subquery in the WHERE clause and then aggregate:
SELECT customer_id, MIN(order_date) AS first, MAX(order_date) AS last FROM A
WHERE customer_id IN (SELECT customer_id FROM A WHERE order_date = '2006-01-01') AND order_date < '2006-01-01'
GROUP BY customer_id;
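One thing to be aware of: because the queries above filter on order_date < '2006-01-01' in the WHERE clause, a customer who ordered on January 1, 2006 but has no earlier orders is dropped entirely. If such customers should still appear (with NULL dates), a possible sketch puts the condition inside the aggregates with FILTER (PostgreSQL 9.4+) and moves the January 1 check to HAVING. It assumes order_date is of type date, as the comparisons above do.

```sql
-- Sketch only: assumes table A(customer_id, order_date) with order_date of type date.
SELECT customer_id,
       -- aggregate only the rows dated before January 1, 2006
       MIN(order_date) FILTER (WHERE order_date < DATE '2006-01-01') AS first_date,
       MAX(order_date) FILTER (WHERE order_date < DATE '2006-01-01') AS last_date
FROM A
GROUP BY customer_id
-- keep only customers with at least one order on January 1, 2006
HAVING bool_or(order_date = DATE '2006-01-01');
```

Customers without any earlier order get NULL in both date columns instead of disappearing from the result.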

Related

Is there a SQL code for cumulative count of SaaS customer over months?

I have a table with:
ID (client id), date_start (start of the SaaS subscription), date_end (either a date or NULL).
I need a cumulative count of active clients, month by month.
Any idea how to write that in Postgres and achieve this result?
I started from this, but I don't know how to proceed:
select
date_trunc('month', c.date_start)::date,
count(*)
from customer
Please check the following solution:
select
subscribed_date,
subscribed_customers,
unsubscribed_customers,
subscribed_customers - unsubscribed_customers cumulative
from (
-- running total of customers who subscribed up to and including each month
select distinct
date_trunc('month', c.date_start)::date subscribed_date,
sum(1) over (order by date_trunc('month', c.date_start)) subscribed_customers
from customer c
) subscribed
cross join lateral (
-- customers whose subscription ended in or before that month;
-- a plain join on the month would miss months without any unsubscription
select count(*) unsubscribed_customers
from customer c
where date_trunc('month', c.date_end)::date <= subscribed.subscribed_date
) unsubscribed
order by subscribed_date;
You have a table of customers with a start date and sometimes an end date. As you want to group by date, but there are two dates in the table, you need to split these first.
Then, you may have months where only customers came and others where only customers left. So, you'll want a full outer join of the two sets.
For a cumulative sum (also called a running total), use SUM OVER.
with came as
(
select date_trunc('month', date_start) as month, count(*) as cnt
from customer
group by date_trunc('month', date_start)
)
, went as
(
select date_trunc('month', date_end) as month, count(*) as cnt
from customer
where date_end is not null
group by date_trunc('month', date_end)
)
select
month,
came.cnt as cust_new,
went.cnt as cust_gone,
sum(coalesce(came.cnt, 0) - coalesce(went.cnt, 0)) over (order by month) as cust_active
from came full outer join went using (month)
order by month;

Days since last purchase postgres (for each purchase)

Just have a standard orders table:
order_id
order_date
customer_id
order_total
Trying to write a query that generates a column that shows the days since the last purchase, for each customer. If the customer had no prior orders, the value would be zero.
I have tried something like this:
WITH user_data AS (
SELECT customer_id, order_total, order_date::DATE,
ROW_NUMBER() OVER (
PARTITION BY customer_id ORDER BY order_date::DATE DESC
)
AS order_count
FROM transactions
WHERE STATUS = 100 AND order_total > 0
)
SELECT * FROM user_data WHERE order_count < 3;
Which I could feed into tableau, then use some table calculations to wrangle the data, but I really would like to understand the SQL approach. My approach also only analyzes the most recent 2 transactions, which is a drawback.
Thanks
You should use the lag() window function:
select *,
lag(order_date) over (partition by customer_id order by order_date)
as prior_order_date
from transactions
order by order_id
To have the number of days since last order, just subtract the prior order date from the current order date:
select *,
order_date- lag(order_date) over (partition by customer_id order by order_date)
as days_since_last_order
from transactions
order by order_id
The query selects null if there is no prior order. You can use coalesce() to change it to zero.
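The coalesce() step can be sketched as follows. It assumes, as the question's own query suggests, that order_date may be a timestamp, so it is cast to date; subtracting two dates in PostgreSQL yields a whole number of days rather than an interval.

```sql
-- Sketch only: table and column names are taken from the question.
SELECT *,
       COALESCE(
           order_date::date
             - lag(order_date::date) OVER (PARTITION BY customer_id
                                           ORDER BY order_date),
           0  -- first order of a customer: no prior order, so report zero
       ) AS days_since_last_order
FROM transactions
ORDER BY order_id;
```

If order_date is already of type date, the casts are harmless and can be dropped.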
You said that you need the number of days since the last purchase:
"Trying to write a query that generates a column that shows the days since the last purchase"
So, basically, you need the difference between today and the last purchase date for each customer. The query can look like this:
-- test DDL
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
order_date DATE,
customer_id INTEGER,
order_total INTEGER
);
INSERT INTO orders(order_date, customer_id, order_total) VALUES
('01-01-2015'::DATE,1,2),
('01-02-2015'::DATE,1,3),
('02-01-2015'::DATE,2,4),
('02-02-2015'::DATE,2,5),
('03-01-2015'::DATE,3,6),
('03-02-2015'::DATE,3,7);
WITH orderdata AS (
SELECT customer_id,order_total,order_date,
(now()::DATE - max(order_date) OVER (PARTITION BY customer_id)) as days_since_purchase
FROM orders
WHERE order_total > 0
)
SELECT DISTINCT customer_id ,days_since_purchase FROM orderdata ORDER BY customer_id;

postgres - get top category purchased by customer

I have a denormalized table with the columns:
buyer_id
order_id
item_id
item_price
item_category
I would like to return something that returns 1 row per buyer_id
buyer_id, sum(item_price), item_category
-- but ONLY for the category with the highest rank of sales along that specific buyer_id.
I can't get row_number() or partition to work because I need to order by the sum of item_price relative to item_category relative to buyer. Am I overlooking anything obvious?
You need a few layers of fudging here:
SELECT buyer_id, item_sum, item_category
FROM (
SELECT buyer_id,
rank() OVER (PARTITION BY buyer_id ORDER BY item_sum DESC) AS rnk,
item_sum, item_category
FROM (
SELECT buyer_id, sum(item_price) AS item_sum, item_category
FROM my_table
GROUP BY 1, 3) AS sub2) AS sub
WHERE rnk = 1;
In sub2 you calculate the sum of 'item_price' for each 'item_category' for each 'buyer_id'. In sub you rank these with a window function by 'buyer_id', ordering by 'item_sum' in descending order (so the highest 'item_sum' comes first). In the main query you select those rows where rnk = 1.
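Note that rank() gives tied categories the same rank, so a buyer whose top two categories have identical sums would get two rows. If exactly one row per buyer is required, one possible variant uses row_number() with a tie-breaker. Breaking ties alphabetically by item_category is an assumption here, not something the question specifies.

```sql
-- Sketch only: same table and columns as the query above.
SELECT buyer_id, item_sum, item_category
FROM (
    SELECT buyer_id,
           item_category,
           sum(item_price) AS item_sum,
           -- window functions run after GROUP BY, so they can order by the aggregate
           row_number() OVER (
               PARTITION BY buyer_id
               ORDER BY sum(item_price) DESC, item_category  -- assumed tie-breaker
           ) AS rn
    FROM my_table
    GROUP BY buyer_id, item_category
) AS ranked
WHERE rn = 1;
```

This also folds the two inner subqueries into one, since the window function can be evaluated over the grouped rows directly.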

Join and aggregate on two columns - for every month even months with no data?

Using SQL Server 2005.
I have a table with calendar months
Month, fiscalorder
june,1
july,2
..
may,12
And another table with employees and a repeating monthly amount
employee, month, amount
john, july, 10
john, july, 3
john, august,2
mary, june, 2
mary, feb, 5
I need to join and aggregate these by month, reporting every month (even months without data) for every employee, ordered by employee and then by fiscal order.
Output:
june, john, 0
july, john, 13
august,john,2
sept, john, 0
..
june,mary,2
Assuming Sql Server 2005+
Declare @CalenderMonths Table ([Month] Varchar(20),FiscalOrder Int)
Insert Into @CalenderMonths Values
('June',1),('July',2),('August',3),('September',4),('October',5),('November',6),
('December',7),('January',8),('February',9),('March',10),('April',11),('May', 12)
Declare @Employee Table(employee varchar(50), [month] Varchar(20), amount int )
Insert Into @Employee Values('john', 'July', 10),('john', 'July',3),('john','August',2),('mary','June',2),('mary', 'February',5)
;with cte as
(
Select employee,[month],TotalAmount = sum(amount)
from @Employee
group by employee,[month]
)
select x.[Month],x.employee,amount = coalesce(c.TotalAmount,0)
from (
select distinct c.[Month],c.FiscalOrder,e.employee
from @CalenderMonths c cross join cte e)x
left join cte c on x.[Month] = c.[Month] and x.employee = c.employee
-- employee first, then fiscal order, as the question asks
order by x.employee, x.FiscalOrder
SELECT month,employee,SUM(amount) amount
FROM(
SELECT m.month, e.employee, ISNULL(s.amount, 0) AS amount
FROM dbo.months AS m
CROSS JOIN (SELECT DISTINCT employee FROM dbo.sales) AS e
LEFT JOIN dbo.sales AS s
ON s.employee = e.employee
AND m.month = s.month
)X
GROUP BY month, employee

Subtract the previous row of data where the id is the same as the row above

I have been trying all afternoon to try and achieve this with no success.
I have a database with information on customers and the dates on which they purchased products from the store. It is grouped by a batch ID, which I have converted into a date format.
So in my table I now have:
CustomerID|Date
1234 |2011-10-18
1234 |2011-10-22
1235 |2011-11-16
1235 |2011-11-17
What I want to achieve is to see the number of days between each purchase and the customer's previous purchase.
For example:
CustomerID|Date |Outcome
1234 |2011-10-18 |
1234 |2011-10-22 | 4
1235 |2011-11-16 |
1235 |2011-11-17 | 1
I have tried joining the table to itself, but the problem is that I end up with the same rows joined to each other. I then tried a join condition that returns only rows where the dates do not match (<>).
Hope this makes sense, any help appreciated. I have searched all the relevant topics on here.
Will there be multiple groups of CustomerID? Or only and always grouped together?
DECLARE @myTable TABLE
(
CustomerID INT,
Date DATETIME
)
INSERT INTO @myTable
SELECT 1234, '2011-10-14' UNION ALL
SELECT 1234, '2011-10-18' UNION ALL
SELECT 1234, '2011-10-22' UNION ALL
SELECT 1234, '2011-10-26' UNION ALL
SELECT 1235, '2011-11-16' UNION ALL
SELECT 1235, '2011-11-17' UNION ALL
SELECT 1235, '2011-11-18' UNION ALL
SELECT 1235, '2011-11-19'
-- days between the first and last purchase per customer
SELECT CustomerID,
MIN(date),
MAX(date),
DATEDIFF(day,MIN(date),MAX(date)) Outcome
FROM @myTable
GROUP BY CustomerID
-- days between each purchase and the previous one for the same customer
SELECT a.CustomerID,
a.[Date],
ISNULL(DATEDIFF(DAY, b.[Date], a.[Date]),0) Outcome
FROM
(
SELECT ROW_NUMBER() OVER(PARTITION BY [CustomerID] ORDER BY date) Row,
CustomerID,
Date
FROM @myTable
) A
LEFT JOIN
(
SELECT ROW_NUMBER() OVER(PARTITION BY [CustomerID] ORDER BY date) Row,
CustomerID,
Date
FROM @myTable
) B ON a.CustomerID = b.CustomerID AND A.Row = B.Row + 1