Calculate previous order date and status in Postgres - postgresql

I have a simple table of orders, and I need to calculate some stats for each order. Essentially I have a Postgres db with fields:
Order_ID (unique), User_ID, Created_at (date), City, Total
I want to write a query that will generate, for each Order_ID:
1) the Created_at date of the user's most recent order prior to the current Order_ID (so if a customer placed an order with Order_ID=200005b on 9/20/14, what is the date of that user's most recent previous order?)
2) another field showing a user's "Status" based on this date, given the following cases:
-- if this is user's first order, Status="new";
-- if most recent previous order date <= 60 days before the given/current order, Status="active";
-- if most recent previous order date > 60 days before the given/current order, Status="reactivated"
I think there's a way to write this query using some nested SELECTS, and maybe a self-join, but I don't know PostgreSQL well enough to understand the ordering of queries. I have been able to generate an "Order_N" field using the following query that I could use to lookup (Order_N)-1 to find the date, but I get stuck once trying to use that in nesting.
SELECT
user_id,
order_id,
created_at,
row_number() over (partition by user_id order by created_at) as order_n
from orders -- table name not given in the question; "orders" is assumed
order by user_id, created_at;
Does anyone have any ideas?
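One way to approach this without nesting lookups on Order_N is the lag() window function, which returns a value from the previous row in the same partition. The following is only a sketch under stated assumptions: the table name orders is hypothetical (the question does not give one), created_at is assumed to be of type date as described, and order_id is used purely as a tie-breaker for orders placed on the same day.

```sql
-- Sketch only: "orders" is an assumed table name; columns follow the question.
SELECT
    order_id,
    user_id,
    created_at,
    prev_created_at,
    CASE
        WHEN prev_created_at IS NULL THEN 'new'                 -- user's first order
        WHEN created_at - prev_created_at <= 60 THEN 'active'   -- date - date yields whole days
        ELSE 'reactivated'
    END AS status
FROM (
    SELECT
        order_id,
        user_id,
        created_at,
        -- Date of the same user's previous order, NULL when there is none.
        lag(created_at) OVER (
            PARTITION BY user_id
            ORDER BY created_at, order_id
        ) AS prev_created_at
    FROM orders
) o
ORDER BY user_id, created_at, order_id;
```

If created_at were a timestamp rather than a date, the subtraction would yield an interval, and the comparison would need to be written as created_at - prev_created_at <= interval '60 days' instead.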

Related

How to get latest data for a column when using grouping in postgres

I am using Postgres alongside Sequelize. I have encountered a case where I need to write a custom query that groups the records by a particular field. I know that for the remaining columns that are not used for grouping, I need to use an aggregate function like SUM. But the problem is that for some columns I need to get the latest value (sorted DESC by created_at). I see no function in SQL to do so. Is my only option to write subqueries, or is there a better way? Thanks.
For better understanding, if you look at the picture below, I want to group the records by address. So after the query there should only be two records, one with sydney and the other with new york. But when it comes to the distance, I want the result of the query to contain the distance from the row that was most recently created, i.e. the one with the latest created_at.
So the final two query results should be:
sydney 100 2022-09-05 18:14:53.492131+05:45
new york 40 2022-09-05 18:14:46.23328+05:45
select address, distance, created_at
from(
select address, distance, created_at, row_number() over(partition by address order by created_at DESC) as rn
from table) x
where rn = 1

Find time difference between two most recent orders

I am trying to estimate the time of a new order from repeat customers by finding the time difference between the most recent order and the second most recent order, and then adding that difference to the most recent order.
I have been trying limit and offset, but this returns a blanket date for every row. I am thinking I need to do a lateral join, but not sure how to implement it correctly. When I try to do it, I receive no output.
select public.orders.customer_id,
max(public.orders.created_at) as last_order_date,
(select created_at from public.orders group by created_at order by created_at desc limit 1 offset 1) as second_last
from public.orders
inner join
(select
customer_id, count(*)
from public.orders
where status = 'fulfilled'
group by public.orders.customer_id
having count(customer_id) >1) repeat_customers
on public.orders.customer_id = repeat_customers.customer_id
group by public.orders.customer_id;
I wanted the second_last field to be populated by the second most recent date for each customer_id, but the output is the second most recent date for the entire table, resulting in the same date for every entry.
For your second_last column you're not restricting the subquery to the current customer, so it returns the second most recent date of the entire table, just like the results you've seen. Correlate it with the outer query in a WHERE clause and sort in descending order, as in the example below, which should solve this:
(SELECT
created_at
FROM
public.orders po
WHERE
po.customer_id = public.orders.customer_id
ORDER BY
created_at DESC
LIMIT 1 OFFSET 1) AS second_last
I've also aliased the inner table as po so that the comparison is unambiguous: an unqualified customer_id inside the subquery would resolve to the inner table, and the condition would then be true for every row.
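An alternative to a correlated subquery is to rank each customer's orders with row_number() and then pick out the first and second rows. This is only a sketch under assumptions: it restricts everything to fulfilled orders (the original query only applied that filter when counting repeat customers), and the estimated_next column is a hypothetical name for the projection described in the question.

```sql
-- Sketch only: assumes public.orders(customer_id, created_at, status) as in the question.
SELECT
    customer_id,
    max(CASE WHEN rn = 1 THEN created_at END) AS last_order_date,
    max(CASE WHEN rn = 2 THEN created_at END) AS second_last,
    -- Hypothetical projection: last order plus the gap between the last two orders.
    max(CASE WHEN rn = 1 THEN created_at END)
      + (max(CASE WHEN rn = 1 THEN created_at END)
         - max(CASE WHEN rn = 2 THEN created_at END)) AS estimated_next
FROM (
    SELECT
        customer_id,
        created_at,
        -- rn = 1 is the most recent order per customer, rn = 2 the one before it.
        row_number() OVER (
            PARTITION BY customer_id
            ORDER BY created_at DESC
        ) AS rn
    FROM public.orders
    WHERE status = 'fulfilled'
) ranked
WHERE rn <= 2
GROUP BY customer_id
HAVING count(*) = 2;  -- keep only customers with at least two fulfilled orders
```

Because the ranking is computed once per customer, the table is scanned a single time rather than once per output row.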

Condition and max reference in redshift window function

I have a list of dates, accounts, and sources of data. I'm taking the latest max date for each account and using that number in my window reference.
In my window reference, I'm using row_number () to assign unique rows to each account and sources of data that we're receiving and sorting it by the max date for each account and source of data. The end result should list out one row for each unique account + source of data combination, with the max date available in that combination. The record with the highest date will have 1 listed.
I'm trying to set a condition on my window function where only rows that populate with 1 are listed in the query, while the other ones are not shown at all. This is what I have below and where I get stuck:
SELECT
date,
account,
data source,
MAX(date) max_date,
ROW_NUMBER () OVER (PARTITION BY account ORDER BY max_date) ROWNUM
FROM table
GROUP BY
date,
account,
data source
Any help is greatly appreciated. I can elaborate on anything if necessary
If I understood your question correctly, this SQL would do the trick. The window is ordered by date descending so that the latest row for each account gets ROWNUM 1, and the outer query keeps only those rows:
SELECT
date,
account,
data_source,
max_date
FROM (
SELECT
date,
account,
data_source,
MAX(date) AS max_date,
ROW_NUMBER () OVER (PARTITION BY account ORDER BY date DESC) AS rownum
FROM table
GROUP BY
date,
account,
data_source
) ranked
WHERE rownum = 1
If you do not need the row number for anything other than uniqueness then a query like this should work:
select distinct t.account, data_source, date
from table t
join (select account, max(date) max_date from table group by account) m
on t.account=m.account and t.date=m.max_date
This can still generate two records for one account if two records for different data sources have the identical date. If that is a possibility then mdem7's approach is probably best.
It's a bit unclear from the question, but if you want each combination of account and data_source with its max date, making sure there are no duplicates, then a GROUP BY should be enough (it already returns one row per combination, so DISTINCT is not needed):
select account, data_source, max(date) max_date
from table t
group by account, data_source

PostgreSQL DISTINCT problem: works locally but not on server

I've come across a vexing problem with a PostgreSQL query. This works in my local development environment:
SELECT distinct (user_id) user_id, created_at, is_goodday
FROM table
WHERE ((created_at >= '2011-07-01 00:00:00') AND user_id = 95
AND (created_at < '2011-08-01 00:00:00'))
ORDER BY user_id, created_at ASC;
...but gives the following error on my QA server (which is on Heroku):
PGError: ERROR: syntax error at or near "user_id"
LINE 1: SELECT distinct (user_id) user_id, created_at,
^
Why could this be?
Other possibly relevant info:
I have tried single-quoting and double-quoting the field names
It's a Rails 3 app, but I'm using this SQL raw, i.e. no ActiveRecord magic
My local version of Postgres is 9.0.4 on Mac, but I have no idea what version Heroku is using
As per your comment, the standard PostgreSQL version of that query would be:
SELECT user_id, created_at, is_goodday
FROM table
WHERE created_at >= '2011-07-01 00:00:00'
AND created_at < '2011-08-01 00:00:00'
AND user_id = 95
ORDER BY created_at DESC, id DESC
LIMIT 1
You don't need user_id in the ORDER BY because you have user_id = 95. You want created_at DESC in the ORDER BY to put the most recent created_at at the top, and then LIMIT 1 to slice off just the first row of the result set. GROUP BY can be used to enforce uniqueness, or to group rows for an aggregate function, but you don't need it for either purpose here: ORDER BY and LIMIT give you uniqueness, and the ORDER BY does the work of the aggregate (i.e. you don't need MAX because ORDER BY does that for you).
Since you have user_id = 95 in your WHERE, you don't need user_id in the SELECT but you can leave it in if that makes it easier for you in Ruby-land.
It is possible that you could have multiple entries with the same created_at so I added an id DESC to the ORDER BY to force PostgreSQL to choose the one with the highest id. There's nothing wrong with being paranoid when they really are out to get you and bugs definitely are out to get you.
Also, you want DESC in your ORDER BY to get the highest values at the top, ASC puts the lowest values at the top. The more recent timestamps will be the higher ones.
In general, the GROUP BY and SELECT have to match up because:
When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.
But that doesn't matter here because you don't need a GROUP BY at all. I linked to the 8.3 version of the documentation to match the PostgreSQL version you're using.
There are probably various other ways to do this, but this one is probably as straightforward and clear as you're going to get.
Put quotes around the value, like user_id = '95'. Your query should be:
SELECT distinct (user_id) as uid, created_at, is_goodday FROM table WHERE
((created_at >= '2011-07-01 00:00:00') AND user_id = '95' AND (created_at < '2011-08-01 00:00:00')) ORDER BY user_id, created_at ASC;
You're using DISTINCT ON (without writing the ON). Perhaps you should write the ON. Perhaps your postgres server dates from before the feature was implemented (which is pretty old by now).
If all else fails, you can always do that with some GROUP BY...
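For reference, the explicit DISTINCT ON form of the original query might look like the sketch below. The table name my_table is a hypothetical stand-in for the placeholder used in the question. PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions, and the remaining ORDER BY columns decide which row is kept for each user_id.

```sql
-- Sketch only: "my_table" is a hypothetical name standing in for the question's table.
SELECT DISTINCT ON (user_id)
    user_id,
    created_at,
    is_goodday
FROM my_table
WHERE user_id = 95
  AND created_at >= '2011-07-01 00:00:00'
  AND created_at <  '2011-08-01 00:00:00'
-- user_id must come first to match DISTINCT ON; created_at ASC keeps the earliest row.
ORDER BY user_id, created_at ASC;
```

Switching the last line to created_at DESC would keep the most recent row per user_id instead.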

How do I order my query by a field and still group by a subset of that field in db2?

Sorry if the title is confusing. Here is the query I have
Select MONTH(DATE(TIMESTAMP)), SUM(FIELD1), SUM(FIELD2) from TABLE WHERE TIMESTAMP BETWEEN '2009-07-26 00:00:00' AND '2010-02-24 23:59:59' GROUP BY MONTH(DATE(TIMESTAMP))
This will let me get the month number out of the query. The problem is that right now it is sorting the months 1,2,3,4.... when it spans two separate years. I need to be able to sort this query by year then month.
If I add "ORDER BY TIMESTAMP" at the end of my query I get this error:
Column TIMESTAMP or expression in SELECT list not valid. SQLCODE=-122
Also I changed the field names for this question to keep it clear the field isn't actually called TIMESTAMP
You need to group by year, then month:
SELECT YEAR(YourField),
Month(YourField),
SUM(Field1),
SUM(Field2)
FROM Table
WHERE...
GROUP BY
YEAR(YourField),
Month(YourField)
ORDER BY
YEAR(YourField),
Month(YourField)