Days since last purchase postgres (for each purchase) - postgresql

Just have a standard orders table:
order_id
order_date
customer_id
order_total
Trying to write a query that generates a column that shows the days since the last purchase, for each customer. If the customer had no prior orders, the value would be zero.
I have tried something like this:
WITH user_data AS (
SELECT customer_id, order_total, order_date::DATE,
ROW_NUMBER() OVER (
PARTITION BY customer_id ORDER BY order_date::DATE DESC
)
AS order_count
FROM transactions
WHERE STATUS = 100 AND order_total > 0
)
SELECT * FROM user_data WHERE order_count < 3;
I could feed this into Tableau and then use some table calculations to wrangle the data, but I would really like to understand the SQL approach. My approach also only analyzes the 2 most recent transactions, which is a drawback.
Thanks

You should use the lag() window function:
select *,
lag(order_date) over (partition by customer_id order by order_date)
as prior_order_date
from transactions
order by order_id
To have the number of days since last order, just subtract the prior order date from the current order date:
select *,
order_date - lag(order_date) over (partition by customer_id order by order_date)
as days_since_last_order
from transactions
order by order_id
The query selects null if there is no prior order. You can use coalesce() to change it to zero.
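The lag() plus coalesce() idea can be tried end to end with a small script. This is only a sketch: it assumes SQLite (3.25 or newer, for window functions) as a stand-in for PostgreSQL, the sample rows are made up, and julianday() replaces PostgreSQL's direct date subtraction because SQLite has no date type.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        order_date TEXT,
        customer_id INTEGER,
        order_total INTEGER
    );
    INSERT INTO orders (order_date, customer_id, order_total) VALUES
        ('2015-01-01', 1, 20),
        ('2015-01-15', 1, 35),
        ('2015-02-01', 2, 40),
        ('2015-03-03', 1, 10);
""")

# LAG() fetches the previous order date of the same customer;
# COALESCE() turns the NULL of a first order into 0.
query = """
    SELECT order_id,
           customer_id,
           order_date,
           COALESCE(
               CAST(julianday(order_date)
                    - julianday(LAG(order_date) OVER (
                          PARTITION BY customer_id ORDER BY order_date))
                    AS INTEGER),
               0) AS days_since_last_order
    FROM orders
    ORDER BY order_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In PostgreSQL itself the julianday() calls and the cast are unnecessary, since subtracting one date from another already yields an integer number of days.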

You indicated that you need to calculate the number of days since the last purchase:
"Trying to write a query that generates a column that shows the days since the last purchase"
So, basically, you need to get the difference between today and the last purchase date for each customer. The query can be the following:
-- test DDL
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
order_date DATE,
customer_id INTEGER,
order_total INTEGER
);
INSERT INTO orders(order_date, customer_id, order_total) VALUES
('01-01-2015'::DATE,1,2),
('01-02-2015'::DATE,1,3),
('02-01-2015'::DATE,2,4),
('02-02-2015'::DATE,2,5),
('03-01-2015'::DATE,3,6),
('03-02-2015'::DATE,3,7);
WITH orderdata AS (
SELECT customer_id,order_total,order_date,
(now()::DATE - max(order_date) OVER (PARTITION BY customer_id)) as days_since_purchase
FROM orders
WHERE order_total > 0
)
SELECT DISTINCT customer_id ,days_since_purchase FROM orderdata ORDER BY customer_id;

Related

How to find the first and last date prior to a particular date in Postgresql?

I am a SQL beginner and I am having trouble answering this question:
For each customer_id who made an order on January 1, 2006, what was their historical (prior to January 1, 2006) first and last order dates?
I've tried to solve it using a subquery. But I don't know how to find the first and last order dates prior to Jan 1.
Columns of table A:
customer_id
order_id
order_date
revenue
product_id
Columns of table B:
product_id
category_id
SELECT customer_id, order_date FROM A
(
SELECT customer_id FROM A
WHERE order_date = '2006-01-01'
)
WHERE ...
There are two subqueries actually. First for "For each customer_id who made an order on January 1, 2006" and second for "their historical (prior to January 1, 2006) first and last order dates"
So, first:
select customer_id from A where order_date = '2006-01-01';
and second:
select customer_id, min(order_date) as first_date, max(order_date) as last_date
from A
where order_date < '2006-01-01' group by customer_id;
Finally, you need to keep only those customers from the second subquery who also appear in the first one:
select customer_id, min(order_date) as first_date, max(order_date) as last_date
from A as t1
where
order_date < '2006-01-01' and
customer_id in (
select customer_id from A where order_date = '2006-01-01')
group by customer_id;
or, which could be more efficient:
select customer_id, min(order_date) as first_date, max(order_date) as last_date
from A as t1
where
order_date < '2006-01-01' and
exists (
select 1 from A as t2
where t1.customer_id = t2.customer_id and t2.order_date = '2006-01-01')
group by customer_id;
You can also filter on both conditions in the WHERE clause and then aggregate:
SELECT customer_id, MIN(order_date) AS first, MAX(order_date) AS last FROM A
WHERE customer_id IN (SELECT customer_id FROM A WHERE order_date = '2006-01-01') AND order_date < '2006-01-01'
GROUP BY customer_id;
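To see the EXISTS variant behave as described, here is a small runnable sketch. It assumes SQLite in place of PostgreSQL and invents a handful of rows for table A; customer 3 has history but no order on 2006-01-01, and customer 4 ordered on that day but has no history, so neither should appear.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the rows in A are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        order_date TEXT
    );
    INSERT INTO A (customer_id, order_date) VALUES
        (1, '2005-03-01'), (1, '2005-11-20'), (1, '2006-01-01'),
        (2, '2005-07-04'), (2, '2006-01-01'),
        (3, '2005-05-05'), (3, '2005-12-31'),
        (4, '2006-01-01');
""")

# Keep only orders before the cutoff, and only for customers
# who also have an order exactly on the cutoff date.
query = """
    SELECT customer_id,
           MIN(order_date) AS first_date,
           MAX(order_date) AS last_date
    FROM A AS t1
    WHERE order_date < '2006-01-01'
      AND EXISTS (
          SELECT 1 FROM A AS t2
          WHERE t2.customer_id = t1.customer_id
            AND t2.order_date = '2006-01-01')
    GROUP BY customer_id
    ORDER BY customer_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Storing dates as ISO-8601 text makes plain string comparison order them correctly in SQLite; with a real date column in PostgreSQL the same predicates apply unchanged.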

How to join on closest date in Postgresql

Suppose, I have following tables
product_prices
product|price|date
-------+-----+----------
apple |10 |2014-03-01
-------+-----+----------
apple |20 |2014-05-02
-------+-----+----------
egg |2 |2014-03-03
-------+-----+----------
egg |4 |2015-10-12
purchases:
user|product|date
----+-------+----------
John|apple |2014-03-02
----+-------+----------
John|apple |2014-06-03
----+-------+----------
John|egg |2014-08-13
----+-------+----------
John|egg |2016-08-13
What I need is table similar to this:
name|product|purchase date |price date|price
----+-------+--------------+----------+-----
John|apple |2014-03-02 |2014-03-01|10
----+-------+--------------+----------+-----
John|apple |2014-06-03 |2014-05-02|20
----+-------+--------------+----------+-----
John|egg |2014-08-13 |2014-03-03|2
----+-------+--------------+----------+-----
John|egg |2016-08-13 |2015-10-12|4
In other words: "what was the price of the product on the day of the purchase?", where the price is looked up by date in the product_prices table.
On real DB I tried to use something similar to:
SELECT name, product, pu.date, pp.date, pp.price
FROM purchases AS pu
LEFT JOIN product_prices AS pp
ON pu.date = (
SELECT date
FROM product_prices
ORDER BY date DESC LIMIT 1);
But I keep either getting only left part of table (with (null) instead of product dates and prices) or many rows with all the combinations of prices and dates.
I would suggest changing product_prices table to use a daterange column instead (or at least a start_date and an end_date).
You can use an exclusion constraint to make sure you never have overlapping ranges for one product and an insert trigger that "closes" the "current" prices and creates a new unbounded range for the newly inserted price.
A daterange can efficiently be indexed and with that in place the query gets as easy as:
SELECT name, product, pu.date, pp.valid_during, pp.price
FROM purchases AS pu
LEFT JOIN product_prices AS pp ON pu.date <@ pp.valid_during
(assuming the range column is named valid_during)
The exclusion constraint would only work however if the product was an integer (not a varchar) - but I guess your real product_prices table uses a foreign key to some product table anyway (which is an integer).
The new table definition could look something like this:
create table product_prices
(
product_id integer not null references products,
price numeric(16,4) not null,
valid_during daterange not null
);
And the constraint that prevents overlapping ranges:
alter table product_prices
add constraint check_price_range
exclude using gist (product_id with =, valid_during with &&);
The constraint needs the btree_gist extension.
As always, improving query speed comes at a price, and in this case it's the higher maintenance cost of the GiST index. You would need to run some tests to see if the easier (and most probably much faster) query outweighs the slower insert performance on product_prices.
Look at your scalar sub-query very closely. It is not correlated back to the outer query. In other words, it will return the same result every time: the latest date in the product_prices table. Period. Think about the query out of context:
SELECT date
FROM product_prices
ORDER BY date DESC LIMIT 1
There are two problems with it:
1. It will return 2015-10-12 for every row in the join and, ultimately, nothing was purchased on that date, hence null.
2. Your approximation of closest is that the dates are equal. Unless you have a product_prices row for every product for every single date, you'll always have misses. "Closest" implies distance and ranking.
WITH close_prices_by_purchase AS (
SELECT
p."user",
p.product,
p.date AS purchase_date,
pp.date AS price_date,
pp.price,
-- rank the candidate prices of each purchase, most recent price date first
row_number() over (partition by p."user", p.product, p.date order by pp.date desc) as distance
FROM purchases AS p
INNER JOIN product_prices AS pp on pp.product = p.product
WHERE pp.date <= p.date -- only prices already in effect on the purchase date
)
SELECT "user" as name, product, purchase_date, price_date, price
FROM close_prices_by_purchase
WHERE distance = 1; -- shortest distance
You can try something like this, although I am sure there's a better way:
with diffs as (
select
a.*,
b."date" as bdate,
b.price,
b."date" - a."date" as diffdays,
row_number() over (
partition by "user", a."product", a."date"
order by "user", a."product", a."date", b."date" - a."date" desc
) as sr
from purchases a
inner join product_prices b on a.product = b.product
where b."date" - a."date" < 1
)
select
"user" as "name",
product,
"date" as "purchase date",
bdate as "price date",
price
from diffs
where sr = 1
Example: https://www.db-fiddle.com/f/dwQ9EXmp1SdpNpxyV1wc6M/0
Explanation
I attempted to join both tables and find the difference between dates of purchase and price, and ranked them by closest date prior to the purchase. Rank of 1 will go to the closest date. Then, data with rank of 1 was extracted.
This is a great place to use date ranges! We know the start date of the price range and we can use a window function to get the next date. At that point, it's really easy to figure out the price on any day.
with price_ranges as
(select product,
price,
date as price_date,
daterange(date, lead(date, 1)
OVER (partition by product order by date), '[)'
) as valid_price_range from product_prices
)
select "user" as name,
purchases.product,
purchases.date,
price_date,
price
from purchases
join price_ranges on purchases.product = price_ranges.product
and purchases.date <@ price_ranges.valid_price_range
order by purchases.date;
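The same "price valid from this date until the next price date" idea can be checked without the daterange type. The sketch below is an assumption-laden stand-in: it uses SQLite instead of PostgreSQL, so lead() produces an explicit next_price_date column (a name introduced here for illustration) and the join compares against that half-open interval by hand. The rows are the ones from the question.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL, so the range is two plain columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_prices (product TEXT, price INTEGER, "date" TEXT);
    CREATE TABLE purchases ("user" TEXT, product TEXT, "date" TEXT);
    INSERT INTO product_prices VALUES
        ('apple', 10, '2014-03-01'), ('apple', 20, '2014-05-02'),
        ('egg', 2, '2014-03-03'), ('egg', 4, '2015-10-12');
    INSERT INTO purchases VALUES
        ('John', 'apple', '2014-03-02'), ('John', 'apple', '2014-06-03'),
        ('John', 'egg', '2014-08-13'), ('John', 'egg', '2016-08-13');
""")

# LEAD() gives the date on which the next price of the same product starts;
# a price is valid on [price_date, next_price_date), or open-ended if it is the last one.
query = """
    WITH price_ranges AS (
        SELECT product,
               price,
               "date" AS price_date,
               LEAD("date") OVER (PARTITION BY product ORDER BY "date")
                   AS next_price_date
        FROM product_prices
    )
    SELECT pu."user" AS name, pu.product, pu."date", pr.price_date, pr.price
    FROM purchases AS pu
    JOIN price_ranges AS pr
      ON pr.product = pu.product
     AND pu."date" >= pr.price_date
     AND (pr.next_price_date IS NULL OR pu."date" < pr.next_price_date)
    ORDER BY pu."date"
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In PostgreSQL the two comparisons collapse into a single range containment test, which is what the daterange version above expresses.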

How to select corresponding record alongside aggregate function with having clause

Let's say I have an orders table with customer_id, order_total, and order_date columns. I'd like to build a report that shows all customers who haven't placed an order in the last 30 days, with a column for the total amount their last order was.
This gets all of the customers who should be on the report:
select customer, max(order_date), (select order_total from orders o2 where o2.customer = orders.customer order by order_date desc limit 1)
from orders
group by 1
having max(order_date) < NOW() - '30 days'::interval
Is there a better way to do this that doesn't require a subquery but instead uses a window function or other more efficient method in order to access the total amount from the most recent order? The techniques from How to select id with max date group by category in PostgreSQL? are related, but the extra having restriction seems to stop me from using something like DISTINCT ON.
Solution with row_number window function (https://www.postgresql.org/docs/current/static/tutorial-window.html)
SELECT
customer, order_date, order_total
FROM (
SELECT
*,
first_value(order_date) OVER w as last_order,
first_value(order_total) OVER w as last_total,
row_number() OVER w as row_count
FROM orders
WINDOW w AS (PARTITION BY customer ORDER BY order_date DESC)
) s
WHERE row_count = 1 AND order_date < CURRENT_DATE - 30
Solution with DISTINCT ON (https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-DISTINCT):
SELECT
customer, order_date, order_total
FROM (
SELECT DISTINCT ON (customer)
*,
first_value(order_date) OVER w as last_order,
first_value(order_total) OVER w as last_total
FROM orders
WINDOW w AS (PARTITION BY customer ORDER BY order_date DESC)
ORDER BY customer, order_date DESC
) s
WHERE order_date < CURRENT_DATE - 30
Explanation:
In both solutions I am working with the first_value window function. The window's partition is defined by customer. The rows within each customer's partition are ordered descending by date, which puts the latest row first (last_value does not always work as expected, because the default window frame ends at the current row). So it is possible to get the last order_date and the last order_total of this order.
The difference between the two solutions is the filtering. I showed both versions because sometimes one of them is significantly faster.
The window function style creates a row count within each partition, so every first row can be filtered later. This is done by adding a row_number window function. The benefit of this solution shows when you want to keep the first two or three rows per customer: you simply change the filter from WHERE row_count = 1 to WHERE row_count <= 2.
But if you want only one single row per group, you just need to ensure that the expected row per group is ordered to be the first row in the group. Then the DISTINCT ON clause can drop all following rows. DISTINCT ON (customer) gives the first (ordered) row per customer group.
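The row_number filtering can be exercised with a short script. This is a sketch under a few assumptions: SQLite replaces PostgreSQL (so there is no DISTINCT ON variant), the rows are invented, and "today" is passed in as a fixed parameter instead of CURRENT_DATE so the outcome is reproducible.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; rows and the reference date are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, order_date TEXT, order_total INTEGER);
    INSERT INTO orders VALUES
        ('alice', '2024-01-05', 50), ('alice', '2024-03-20', 75),
        ('bob',   '2024-01-10', 20), ('bob',   '2024-02-01', 30),
        ('carol', '2024-03-25', 90);
""")

# Number each customer's orders from newest to oldest, keep the newest one,
# then keep only customers whose newest order is older than 30 days.
query = """
    SELECT customer, order_date, order_total
    FROM (
        SELECT customer, order_date, order_total,
               ROW_NUMBER() OVER (
                   PARTITION BY customer ORDER BY order_date DESC) AS row_count
        FROM orders
    ) AS s
    WHERE row_count = 1
      AND order_date < date(:today, '-30 days')
    ORDER BY customer
"""
rows = conn.execute(query, {"today": "2024-04-01"}).fetchall()
for row in rows:
    print(row)
```

Only bob qualifies here: alice and carol both ordered within the 30 days before the reference date, so their latest rows are filtered out.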
Try joining the table to an aggregated copy of itself:
select o1.customer, m.last_order_date, o1.order_total
from orders o1
join (
select customer, max(order_date) as last_order_date
from orders
group by customer
having max(order_date) < NOW() - '30 days'::interval
) m on m.customer = o1.customer and m.last_order_date = o1.order_date
A correlated subquery in the select list can be a bad idea, because the database may execute it once for each row.
If you use Postgres you can also write the same thing with a CTE:
https://www.postgresql.org/docs/9.6/static/queries-with.html
WITH t as (
select customer, max(order_date) as last_order_date
from orders
group by customer
having max(order_date) < NOW() - '30 days'::interval
)
select o1.customer, t.last_order_date, o1.order_total
from orders o1
join t on t.customer = o1.customer and t.last_order_date = o1.order_date

PostgreSQL get results that have been created 24 hours from now

I have two tables that I am joining together. I want to filter the results based on whether or not it had been created 24 hours prior. Here are my tables.
table user_infos (
id integer,
date_created timestamp with time zone,
name varchar(40)
);
table user_data (
id integer,
team_name varchar(40)
);
This is my query that I am using to join them together and hopefully filter them:
SELECT timestampdiff(HOUR, user_infos.date_created, now()) as hours_since,
user_data.id, user_data.team_name,
user_infos.name, user_infos.date_created
FROM user_data
JOIN user_infos
ON user_infos.id=user_data.id
WHERE timestampdiff(HOUR, user_infos.date_created, now()) < 24
ORDER BY name ASC, id ASC
LIMIT 50 OFFSET 0
What I am trying to do is join the two tables so that id, team_name, name, and date_created are treated as one table.
Then I would like to filter it so that I only get the results that were created within the last 24 hours. This is what I am using timestampdiff for.
Then I order them by name and id in ascending order,
then limit the results to 50.
Everything looks good except that it doesn't work. When I run this query it tells me that the "hour" column does not exist.
Clearly there is something subtle here that is messing everything up. Does anyone have any suggestions?
Alternatively, I've tried this, but it tells me that there is a syntax error at 1;
SELECT
user_data.id, user_data.team_name,
user_infos.name, user_infos.date_created
FROM user_data
JOIN user_infos
ON user_infos.id=user_data.id
WHERE user_infos.date_created
BETWEEN DATE( DATE_SUB( NOW() , INTERVAL 1 DAY ) ) AND
DATE ( NOW() )
ORDER BY name ASC, id ASC
LIMIT 50 OFFSET 0
I think your problem is with your data types. You are checking whether a timestamp column lies between two values that have been cast to date (which removes the time of day). NOW() is different from DATE(NOW()).
So you have 2 options. You can either remove the DATE() casting and it should work, or you can cast date_created to a date. Note also that timestampdiff() and DATE_SUB() are MySQL functions; in PostgreSQL you subtract an interval from NOW() instead. The ORDER BY also has to say which table's id it means, since both tables have one.
SELECT
user_data.id, user_data.team_name,
user_infos.name, user_infos.date_created
FROM user_data
JOIN user_infos
ON user_infos.id=user_data.id
WHERE user_infos.date_created
BETWEEN NOW() - INTERVAL '1 day' AND
NOW()
ORDER BY name ASC, user_data.id ASC
LIMIT 50 OFFSET 0
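The "created within the last 24 hours" filter can be tried out as follows. Treat it as a rough sketch: SQLite stands in for PostgreSQL, timestamps are stored as text, the rows are made up, and the current time is passed in as a fixed parameter (instead of NOW()) so the result does not depend on when the script runs.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; rows and the "now" value are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_infos (id INTEGER, date_created TEXT, name TEXT);
    CREATE TABLE user_data (id INTEGER, team_name TEXT);
    INSERT INTO user_infos VALUES
        (1, '2024-04-01 09:00:00', 'Ann'),
        (2, '2024-03-31 13:30:00', 'Bea'),
        (3, '2024-03-30 08:00:00', 'Cy');
    INSERT INTO user_data VALUES (1, 'Red'), (2, 'Blue'), (3, 'Green');
""")

# Keep rows created after (now - 1 day) and not later than now.
# The PostgreSQL counterpart of datetime(:now, '-1 day') is NOW() - INTERVAL '1 day'.
query = """
    SELECT ud.id, ud.team_name, ui.name, ui.date_created
    FROM user_data AS ud
    JOIN user_infos AS ui ON ui.id = ud.id
    WHERE ui.date_created > datetime(:now, '-1 day')
      AND ui.date_created <= :now
    ORDER BY ui.name ASC, ud.id ASC
    LIMIT 50 OFFSET 0
"""
rows = conn.execute(query, {"now": "2024-04-01 12:00:00"}).fetchall()
for row in rows:
    print(row)
```

Comparing against a value computed from the current time, instead of wrapping date_created in a function, also leaves the database free to use an index on date_created.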

Date Range between the Last 2 Records Sql Server 2008

Hi. Given that I have a table with 2 columns:
Table: Booking
Columns: Amount, TransactionDate
I need the total Amount between the last 2 transaction dates.
How do you do that? How do you get the last transaction but one?
Any suggestions?
You can use a common table expression (CTE) to assign a sequence number to each row in descending order of transaction date, and then filter on that number to get the last 2 rows.
This query displays the last two transactions in the table:
WITH BookingCTE AS (
SELECT ROW_NUMBER() OVER (ORDER BY TransactionDate DESC) as Sequence,
Amount, TransactionDate
FROM Booking
)
SELECT Sequence, Amount, TransactionDate
FROM BookingCTE
WHERE Sequence <= 2
;
This query gives you the total amount for the last two transactions:
WITH BookingCTE AS (
SELECT ROW_NUMBER() OVER (ORDER BY TransactionDate DESC) as Sequence, Amount, TransactionDate
FROM Booking
)
SELECT SUM(Amount) AS TotalAmount
FROM BookingCTE
WHERE Sequence <= 2
;