I have a table where each row is a unique order with a unique order_id, but users can have multiple rows/orders.
Orders table -> order_id, user_id, orderedat (datetime), sales
I am trying to return a query that calculates, for each order_id, how many previous orders the associated user has made. (Essentially, does this row represent the user's first order? 5th order? 20th order? etc.)
I'm trying something like this nested select, but am getting an error "more than one row returned by a subquery used as an expression."
SELECT
    order_id,
    user_id,
    COUNT(order_id) AS order_n
FROM
    orders
WHERE orderedat >= (
    SELECT
        orderedat
    FROM
        fulfillments
    GROUP BY
        order_id
)
GROUP BY
    order_id
Any thoughts on how to achieve this in postgres?
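For what it's worth, the error appears because the subquery (grouped by order_id) returns more than one row. A minimal sketch of one workaround, assuming the count should come from the same orders table rather than fulfillments, is a correlated subquery that returns a single value per row:
-- hedged sketch: count the user's earlier orders row by row (correlated, so only one value comes back)
SELECT o.order_id,
       o.user_id,
       (SELECT COUNT(*)
          FROM orders prev
         WHERE prev.user_id = o.user_id
           AND prev.orderedat < o.orderedat) AS previous_orders
FROM orders o;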
///////////////
Further complication: with another column called "status", how do I count only rows with specific values in status? I'd like orders to be skipped in the numbering unless they have a status of "paid" or "placed". For example:
data:
order_id  user_id  orderedat  status
001       max      10/1/14    paid
002       max      10/20/14   placed
003       max      10/21/14   cancelled
004       bill     10/5/14    deleted
005       max      10/31/14   paid
006       bill     10/24/14   placed
results:
order_id  user_id  orderedat  orders_so_far
001       max      10/1/14    1
002       max      10/20/14   2
003       max      10/21/14   null
005       max      10/31/14   3
004       bill     10/5/14    null
006       bill     10/24/14   1
This can be done using a window function:
SELECT order_id,
       user_id,
       orderedat,
       row_number() over (partition by user_id order by orderedat) as orders_so_far
FROM orders
order by user_id, orderedat
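To fold in the status filter described above (a sketch only; it assumes exactly the statuses 'paid' and 'placed' should be numbered and everything else should show null), you can partition on the status test as well, so skipped orders never consume a number:
SELECT order_id,
       user_id,
       orderedat,
       -- only 'paid'/'placed' rows get a number; other statuses fall into a separate
       -- partition and are nulled out by the CASE
       CASE WHEN status IN ('paid', 'placed')
            THEN row_number() over (partition by user_id, (status IN ('paid', 'placed'))
                                    order by orderedat)
       END AS orders_so_far
FROM orders
order by user_id, orderedat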
Related
I have a table with duplicate entries and the objective is to get the distinct entries based on the latest time stamp.
In my case 'serial_no' will have duplicate entries but I select unique entries based on the latest time stamp.
The query below gives me the unique results with the latest timestamp.
But my concern is that I need to get the total number of unique entries.
For example, assume my table has 40 entries overall. With the query below I am able to get 20 unique rows based on the serial number.
But the 'total' is returned as 40 instead of 20.
Any help on this, please?
SELECT
    *
FROM
    (
        SELECT DISTINCT ON (serial_no)
            id,
            serial_no,
            name,
            timestamp,
            COUNT(*) OVER() as total
        FROM
            product_info
            INNER JOIN my.account ON id = accountid
        WHERE
            lower(name) = 'hello'
        ORDER BY
            serial_no,
            timestamp DESC
        OFFSET 0
        LIMIT 10
    ) AS my_info
ORDER BY
    serial_no asc
The product_info table initially has this data:
serial_no  name    timestamp
11212      pulp12  2018-06-01 20:00:01
11213      mango   2018-06-01 17:00:01
11214      grapes  2018-06-02 04:00:01
11215      orange  2018-06-02 07:05:30
11212      pulp12  2018-06-03 14:00:01
11213      mango   2018-06-03 13:00:00
After the distinct query I got all unique results based on the latest timestamp:
serial_no  name    timestamp            total
11212      pulp12  2018-06-03 14:00:01  6
11213      mango   2018-06-03 13:00:00  6
11214      grapes  2018-06-02 04:00:01  6
11215      orange  2018-06-02 07:05:30  6
But the total appears as 6. I wanted the total to be 4, since there are only 4 unique entries.
I am not sure how to modify my existing query to get this desired result.
Postgres supports COUNT(DISTINCT column_name), so if I have understood your request, using that instead of COUNT(*) will work, and you can drop the OVER.
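If that reading is right, a minimal sketch (reusing the same tables and filter from the question) would compute the distinct total as a plain aggregate, separately from the per-row listing:
-- hedged sketch: count distinct serial numbers once, instead of a per-row window total
SELECT COUNT(DISTINCT serial_no) AS total
FROM product_info
INNER JOIN my.account ON id = accountid
WHERE lower(name) = 'hello';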
What you could do is move the window function to a higher-level SELECT statement. This is because the window function is evaluated before the DISTINCT ON and LIMIT clauses are applied. Also, you cannot include the DISTINCT keyword within window functions; it has not been implemented yet (as of Postgres 9.6).
SELECT
    *,
    COUNT(*) OVER() as total -- here
FROM
    (
        SELECT DISTINCT ON (serial_no)
            id,
            serial_no,
            name,
            timestamp
        FROM
            product_info
            INNER JOIN my.account ON id = accountid
        WHERE
            lower(name) = 'hello'
        ORDER BY
            serial_no,
            timestamp DESC
        LIMIT 10
    ) AS my_info
Additionally, the OFFSET is not required there, and the extra outer sort is superfluous; I've removed both.
Another way would be to include a computed column in the select clause, but this would not be as fast, as it would require one more scan of the table. This obviously assumes that your total is strictly tied to your result set, and not to whatever else is stored in the table but gets filtered out.
select count(*), serial_no from product_info group by serial_no
will give you the number of duplicates for each serial number
The most mindless way of incorporating that information would be to join in a subquery:
SELECT
    *
FROM
    (
        SELECT DISTINCT ON (serial_no)
            id,
            serial_no,
            name,
            timestamp,
            COUNT(*) OVER() as total
        FROM
            product_info
            INNER JOIN my.account ON id = accountid
        WHERE
            lower(name) = 'hello'
        ORDER BY
            serial_no,
            timestamp DESC
        OFFSET 0
        LIMIT 10
    ) AS my_info
    JOIN (
        SELECT count(*) AS counts, serial_no
        FROM product_info
        GROUP BY serial_no
    ) AS X ON X.serial_no = my_info.serial_no
ORDER BY
    my_info.serial_no asc
Just have a standard orders table:
order_id
order_date
customer_id
order_total
I'm trying to write a query that generates a column showing the days since the last purchase, for each customer. If the customer had no prior orders, the value would be zero.
I have tried something like this:
WITH user_data AS (
SELECT customer_id, order_total, order_date::DATE,
ROW_NUMBER() OVER (
PARTITION BY customer_id ORDER BY order_date::DATE DESC
)
AS order_count
FROM transactions
WHERE STATUS = 100 AND order_total > 0
)
SELECT * FROM user_data WHERE order_count < 3;
I could feed this into Tableau and then use some table calculations to wrangle the data, but I would really like to understand the SQL approach. My approach also only analyzes the most recent 2 transactions, which is a drawback.
Thanks
You should use the lag() function:
select *,
lag(order_date) over (partition by customer_id order by order_date)
as prior_order_date
from transactions
order by order_id
To get the number of days since the last order, just subtract the prior order date from the current order date:
select *,
order_date - lag(order_date) over (partition by customer_id order by order_date)
as days_since_last_order
from transactions
order by order_id
The query returns null if there is no prior order. You can use coalesce() to change it to zero.
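For example, a minimal sketch with coalesce() (assuming order_date is a DATE, so the subtraction yields an integer number of days):
select *,
       -- null for the customer's first order becomes 0
       coalesce(order_date - lag(order_date) over (partition by customer_id
                                                   order by order_date), 0)
         as days_since_last_order
from transactions
order by order_id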
You indicated that you need to calculate the number of days since the last purchase:
..Trying to write a query that generates a column that shows the days
since the last purchase
So, basically, you need to get the difference between now and the last purchase date for each client. The query can be the following:
-- test DDL
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
order_date DATE,
customer_id INTEGER,
order_total INTEGER
);
INSERT INTO orders(order_date, customer_id, order_total) VALUES
('01-01-2015'::DATE,1,2),
('01-02-2015'::DATE,1,3),
('02-01-2015'::DATE,2,4),
('02-02-2015'::DATE,2,5),
('03-01-2015'::DATE,3,6),
('03-02-2015'::DATE,3,7);
WITH orderdata AS (
    SELECT customer_id, order_total, order_date,
           (now()::DATE - max(order_date) OVER (PARTITION BY customer_id)) AS days_since_purchase
    FROM orders
    WHERE order_total > 0
)
SELECT DISTINCT customer_id, days_since_purchase
FROM orderdata
ORDER BY customer_id;
Let's assume I have this PostgreSQL table:
ProductStore
product_id
store_id
price
How do I query "which products don't have the same price in every store"?
I mean, the same value of product_id but distinct values of price.
SELECT product_id
FROM "ProductStore"
GROUP BY product_id
HAVING COUNT(DISTINCT price) > 1
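If you also need to see the differing rows themselves, one option (a sketch against the same table) is to join that result back:
-- list every store price for the products whose price varies
SELECT ps.product_id, ps.store_id, ps.price
FROM "ProductStore" ps
JOIN (
    SELECT product_id
    FROM "ProductStore"
    GROUP BY product_id
    HAVING COUNT(DISTINCT price) > 1
) diff ON diff.product_id = ps.product_id
ORDER BY ps.product_id, ps.store_id;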
Hi. Given that I have a table with 2 columns:
Table: Booking
Columns: Amount, TransactionDate
Get me the total Amount between the last 2 TransactionDates.
How do you do that? And how do you get the last transaction but one?
Any suggestions?
You can use a common table expression (CTE) to assign a sequence number to each row based on the descending order of the transaction date, and then select the rows with a filter to get the last 2 rows.
This query displays the last two transactions in the table:
WITH BookingCTE AS (
SELECT ROW_NUMBER() OVER (ORDER BY TransactionDate DESC) as Sequence,
Amount, TransactionDate
FROM Booking
)
SELECT Sequence, Amount, TransactionDate
FROM BookingCTE
WHERE Sequence <= 2
;
This query gives you the total amount for the last two transactions:
WITH BookingCTE AS (
SELECT ROW_NUMBER() OVER (ORDER BY TransactionDate DESC) as Sequence, Amount, TransactionDate
FROM Booking
)
SELECT SUM(Amount) AS TotalAmount
FROM BookingCTE
WHERE Sequence <= 2
;
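And if you literally want the last transaction but one, the same CTE works; just filter on Sequence = 2 (a sketch along the same lines):
WITH BookingCTE AS (
    SELECT ROW_NUMBER() OVER (ORDER BY TransactionDate DESC) as Sequence,
           Amount, TransactionDate
    FROM Booking
)
SELECT Amount, TransactionDate
FROM BookingCTE
WHERE Sequence = 2 -- 1 = most recent, 2 = the one before it
;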
I have the following data in the table below, and I am looking for a way to group the continuous time intervals for each ID.
CREATE TABLE DUMMY
(
ID VARCHAR2(10 BYTE),
TIME_STAMP VARCHAR2(8 BYTE),
NAME VARCHAR2(255 BYTE)
);
SELECT ID, min(TIME_STAMP) "startDate", max(TIME_STAMP) "endDate", NAME
FROM DUMMY
GROUP BY ID, NAME
The result should be something like:
100 20011128 20011203 David
100 20011204 20011207 Unknown
100 20011208 20011215 David
100 20011216 20011220 Sara
and so on ...
PS: I have a sample script, but I don't know how to attach my file.
Hi everyone, here is more input:
There is only one record per time_stamp for a specific ID.
Users can be different; for example, day 1 David, day 2 Unknown, day 3 David, and so on.
So there is one row for every day of the year for each ID, but with different users.
Now, I want to see the break points, based on the time_stamp intervals, for a specific ID, in day order from the first day until the last day.
Query result should be:
ID   NAME     MIN_DATE  MAX_DATE
100  David    20011128  20050407
100  Sara     20050408  20050417
100  David    20050418  20080416
100  Unknown  20080417  20080507
100  David    20080508  20080508
100  Unknown  20080509  20080607
100  David    20080608  20080608
100  Unknown  20080609  20080921
100  David    20080922  20080922
100  Unknown  20080923  20081231
100  David    20090101  20090405
thanks
Hi again, many thanks to everyone. I have solved the problem; here is the solution:
select id, min(time_stamp), max(time_stamp), name
from ( -- assign a group number to each unbroken run of the same name
       select id, time_stamp, name,
              max(rn) over (order by time_stamp) grp
       from ( -- mark the first row and every row where the name changes
              select id, time_stamp, name,
                     case
                       when lag(name) over (order by time_stamp) <> name or
                            row_number() over (order by time_stamp) = 1
                       then row_number() over (order by time_stamp)
                     end rn
              from dummy
            )
     )
group by id, grp, name
order by 1
Select
    ID,
    Name,
    min(time_stamp) min_date,
    max(time_stamp) max_date
from
    Dummy
group by
    Id,
    Name
That should work.
If you want the date range for each Id, but all the names, you can do:
Select
    d.Id,
    d.Name,
    dr.min_date,
    dr.max_date
from
    Dummy d
    JOIN (
        Select
            Id,
            min(time_stamp) min_date,
            max(time_stamp) max_date
        from
            Dummy
        group by
            Id
    ) dr on (dr.Id = d.Id)