Max function in Postgres does not give the max value - postgresql

I am writing a simple SQL query to get the latest record for every customer and, if a customer has multiple records with the same timestamp, the max of device_count. However, the max function doesn't seem to return the max value. Any help would be appreciated.
My SQL query:
select sub.customerid, max(sub.device_count) from(
SELECT customerid, device_count,
RANK() OVER
(
PARTITION by customerid
ORDER BY date_time desc
) AS rownum
FROM tableA) sub
WHERE rownum = 1
group by 1
Sample data:
customerid device_count date_time
A 3573 2021-07-26 02:15:09-05:00
A 4 2021-07-26 02:15:13-05:00
A 16988 2021-07-26 02:15:13-05:00
A 20696 2021-07-26 02:15:13-05:00
A 24655 2021-07-26 02:15:13-05:00
The desired output is the row with the max device_count, which is 24655, but I get 16988 instead.

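Try this:
Sort your table using ORDER BY customerid, device_count.
Then apply the LAST_VALUE(device_count) window function over the customerid partition. Since the rows are sorted ascending, the last device_count value is the max.
Note that with an ORDER BY in the window, the default frame ends at the current row's peers, so specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING for LAST_VALUE to see the whole partition.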

You need to put device_count into the window function's order by and take out the aggregation:
select sub.customerid, device_count from(
SELECT customerid, device_count,
RANK() OVER
(
PARTITION by customerid
ORDER BY date_time desc, device_count desc
) AS rownum
FROM tableA) sub where rownum=1;
But if the top row for a customerid has ties (in both the date_time and device_count fields), it will return all such ties. So it is better to replace RANK() with ROW_NUMBER().
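The ROW_NUMBER() variant can be sketched as follows. This is only an illustration, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in for Postgres and storing date_time as text; the table and column names come from the question, everything else is scaffolding.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres; date_time is stored as
# text, which sorts correctly here because all rows share the same UTC offset.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableA (customerid TEXT, device_count INTEGER, date_time TEXT)"
)
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    [
        ("A", 3573, "2021-07-26 02:15:09-05:00"),
        ("A", 4, "2021-07-26 02:15:13-05:00"),
        ("A", 16988, "2021-07-26 02:15:13-05:00"),
        ("A", 20696, "2021-07-26 02:15:13-05:00"),
        ("A", 24655, "2021-07-26 02:15:13-05:00"),
    ],
)

# Latest timestamp first, then highest device_count; ROW_NUMBER() keeps
# exactly one row per customer even when both columns tie.
query = """
SELECT customerid, device_count
FROM (
    SELECT customerid, device_count,
           ROW_NUMBER() OVER (
               PARTITION BY customerid
               ORDER BY date_time DESC, device_count DESC
           ) AS rownum
    FROM tableA
) sub
WHERE rownum = 1
"""
latest_rows = conn.execute(query).fetchall()
print(latest_rows)  # [('A', 24655)]
```

With the sample data from the question, the single row returned for customer A carries device_count 24655.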

Related

simpler query for counting total row in the column

I am not sure whether my query below is right. I am trying to do this with just one query in Oracle SQL:
select distinct (master_id) , last_name from (
select q2.* , max (count_a) over (partition by master_id)
count_b from ( select q1.* , count (*) over (partition by
master_id order by purchased_date desc ) count_a from profile q1)
q2) where count_b > 2
I am trying to reduce the execution time by reducing the number of subqueries.
For example, the query above has two:
max (count_a) over (partition by master_id) count_b
count (*) over (partition by master_id order by purchased_date desc ) count_a
so I played around until I arrived at this expression:
max (count (*)) over (partition by master_id) count
Full SQL query:
select * from profile a
join( select * from (
select master_id, max (count(*)) over (partition by
master_id) count from profile) where count >2) b
ON a.master_id = b.master_id
Thank you in advance for your help

Fetch minimum value for each NTILE bucket in Hive

I am trying to partition the data into percentiles (100 equal buckets) using NTILE window function for each merchant_id ordered by score column. The output of the query will contain merchant_id, score, and percentile for every record in the source table. (Sample code below)
CREATE TABLE merchant_score_ntiles
AS
SELECT merchant_id, score, NTILE(100) OVER (PARTITION BY merchant_id ORDER BY score DESC) as percentile
FROM merch_table
This will return sample output as follows:
merchant_id,score,percentile
1001,900,1
1001,800,1
1001,760,1
1002,900,2
1002,800,2
1002,750,2
Is there a way we can return only the minimum score for each merchant_id based on percentile column such as below?
merchant_id,score,percentile
1001,760,1
1002,750,2
You can try using the ROW_NUMBER window function in a subquery before applying the NTILE window function:
SELECT merchant_id,
score,
NTILE(100) OVER (PARTITION BY merchant_id ORDER BY score DESC) as percentile
FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY merchant_id ORDER BY score) rn
FROM merch_table
) t1
WHERE rn = 1
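Another way to get the minimum score per bucket is to compute NTILE first and then aggregate with MIN, grouped by merchant and bucket. The sketch below is a hedged illustration, not the answerer's query: it assumes SQLite (through Python's sqlite3) instead of Hive, uses NTILE(2) instead of NTILE(100) so the buckets are visible with a handful of rows, and the scores are made-up sample values.

```python
import sqlite3

# Assumption: SQLite stands in for Hive; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE merch_table (merchant_id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO merch_table VALUES (?, ?)",
    [
        (1001, 900), (1001, 800), (1001, 760), (1001, 700),
        (1002, 900), (1002, 800), (1002, 750), (1002, 600),
    ],
)

# Step 1 (inner query): assign each row to a bucket per merchant.
# Step 2 (outer query): keep the minimum score of every bucket.
# NTILE(2) is used only to keep the demo small; the question uses NTILE(100).
query = """
SELECT merchant_id, MIN(score) AS min_score, percentile
FROM (
    SELECT merchant_id, score,
           NTILE(2) OVER (PARTITION BY merchant_id ORDER BY score DESC) AS percentile
    FROM merch_table
) t
GROUP BY merchant_id, percentile
ORDER BY merchant_id, percentile
"""
bucket_minimums = conn.execute(query).fetchall()
print(bucket_minimums)
# [(1001, 800, 1), (1001, 700, 2), (1002, 800, 1), (1002, 600, 2)]
```

The same subquery-plus-GROUP BY shape should carry over to Hive, since it only relies on NTILE and MIN.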

PostgreSQL: use the result of the first query in a second query, written as a single query

SELECT partner_id
FROM trip_delivery_sales ts
WHERE ts.route_id='152'
GROUP BY ts.partner_id
From this query we get the partner IDs. Using those partner IDs, we want to look in the trip_delivery_sales_lines table and find, for each customer, the sum of the product quantity of the last two sales. If the last two sales have product quantities 2 and 5, we want the result as partner_id | count, e.g. Mn2333 - 7.
Here, for example, I take partner ID 34806, but I want to check every partner_id obtained from the previous query:
SELECT product_qty
FROM trip_delivery_sales_lines td
WHERE td.partner_id='34806'
AND td.route_id='152'
AND td.product_id='432'
ORDER BY td.order_date DESC
LIMIT 2
You can run this query
SELECT td.partner_id,sum(product_qty)
FROM trip_delivery_sales_lines td,
(SELECT partner_id FROM trip_delivery_sales ts WHERE ts.route_id='152') as ts
WHERE td.partner_id=ts.partner_id
AND td.product_id='432'
GROUP BY td.partner_id
ORDER BY td.order_date DESC
LIMIT 2
Or this one
with ts as (SELECT distinct partner_id FROM trip_delivery_sales WHERE route_id='152')
SELECT td.partner_id,sum(product_qty)
FROM trip_delivery_sales_lines td,ts
WHERE td.partner_id=ts.partner_id
AND td.product_id='432'
GROUP BY td.partner_id
ORDER BY td.order_date DESC
LIMIT 2
You might be looking for
SELECT DISTINCT ts.partner_id, ARRAY(
SELECT product_qty
FROM trip_delivery_sales_lines td
WHERE td.partner_id=ts.partner_id
AND td.product_id='432'
ORDER BY td.order_date DESC
LIMIT 2
) AS product_qty_arr
FROM trip_delivery_sales ts
WHERE ts.route_id='152'
or just
SELECT
partner_id,
array_agg(product_qty ORDER BY order_date DESC) as product_qty_arr
FROM (
SELECT
td.partner_id,
td.product_qty,
td.order_date,
row_number() OVER (PARTITION BY td.partner_id ORDER BY td.order_date DESC)
FROM trip_delivery_sales_lines td
JOIN trip_delivery_sales ts USING (partner_id)
WHERE ts.route_id='152'
AND td.product_id='432'
) AS enumerated
WHERE row_number <= 2
GROUP BY partner_id
See also PostgreSQL: top n entries per item in same table or Optimize GROUP BY query to retrieve latest row per user
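If the goal is the sum of the last two quantities per partner (rather than an array), the row_number() approach can be combined with SUM. The following is a minimal sketch under stated assumptions: SQLite via Python's sqlite3 replaces PostgreSQL, the table and column names follow the question, and all rows, dates, and the partner X999 are invented for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; all rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trip_delivery_sales (partner_id TEXT, route_id TEXT)")
conn.execute(
    "CREATE TABLE trip_delivery_sales_lines "
    "(partner_id TEXT, product_id TEXT, product_qty INTEGER, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO trip_delivery_sales VALUES (?, ?)",
    [("Mn2333", "152"), ("Mn2333", "152"), ("34806", "152"), ("X999", "153")],
)
conn.executemany(
    "INSERT INTO trip_delivery_sales_lines VALUES (?, ?, ?, ?)",
    [
        ("Mn2333", "432", 9, "2021-01-01"),  # older sale, should be ignored
        ("Mn2333", "432", 2, "2021-01-02"),
        ("Mn2333", "432", 5, "2021-01-03"),
        ("34806", "432", 3, "2021-01-02"),   # only one sale for this partner
        ("X999", "432", 10, "2021-01-03"),   # partner on another route
    ],
)

# Number each partner's sales from newest to oldest, keep the newest two,
# then sum them. IN (...) avoids duplicate rows from repeated partner entries.
query = """
SELECT partner_id, SUM(product_qty) AS last_two_qty
FROM (
    SELECT td.partner_id, td.product_qty,
           ROW_NUMBER() OVER (
               PARTITION BY td.partner_id ORDER BY td.order_date DESC
           ) AS rn
    FROM trip_delivery_sales_lines td
    WHERE td.product_id = '432'
      AND td.partner_id IN (
          SELECT partner_id FROM trip_delivery_sales WHERE route_id = '152'
      )
) enumerated
WHERE rn <= 2
GROUP BY partner_id
ORDER BY partner_id
"""
last_two_sums = conn.execute(query).fetchall()
print(last_two_sums)  # [('34806', 3), ('Mn2333', 7)]
```

Partner Mn2333 yields 7 (2 + 5), matching the example given in the question.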

How to select corresponding record alongside aggregate function with having clause

Let's say I have an orders table with customer_id, order_total, and order_date columns. I'd like to build a report that shows all customers who haven't placed an order in the last 30 days, with a column for the total amount their last order was.
This gets all of the customers who should be on the report:
select customer, max(order_date), (select order_total from orders o2 where o2.customer = orders.customer order by order_date desc limit 1)
from orders
group by 1
having max(order_date) < NOW() - '30 days'::interval
Is there a better way to do this that doesn't require a subquery but instead uses a window function or other more efficient method in order to access the total amount from the most recent order? The techniques from How to select id with max date group by category in PostgreSQL? are related, but the extra having restriction seems to stop me from using something like DISTINCT ON.
Solution with row_number window function (https://www.postgresql.org/docs/current/static/tutorial-window.html):
SELECT
customer, order_date, order_total
FROM (
SELECT
*,
first_value(order_date) OVER w as last_order,
first_value(order_total) OVER w as last_total,
row_number() OVER w as row_count
FROM orders
WINDOW w AS (PARTITION BY customer ORDER BY order_date DESC)
) s
WHERE row_count = 1 AND order_date < CURRENT_DATE - 30
Solution with DISTINCT ON (https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-DISTINCT):
SELECT
customer, order_date, order_total
FROM (
SELECT DISTINCT ON (customer)
*,
first_value(order_date) OVER w as last_order,
first_value(order_total) OVER w as last_total
FROM orders
WINDOW w AS (PARTITION BY customer ORDER BY order_date DESC)
ORDER BY customer, order_date DESC
) s
WHERE order_date < CURRENT_DATE - 30
Explanation:
In both solutions I am working with the first_value window function. The window's partition is defined by customer. The rows within each customer's partition are ordered descending by date, which puts the latest row first (last_value does not always work as expected, because with an ORDER BY the default frame ends at the current row). So it is possible to get the last order_date and the order_total of that order.
The difference between the two solutions is the filtering. I showed both versions because sometimes one of them is significantly faster.
The window function style creates a row count within each partition, so the first row of each can be filtered later. This is done by adding a row_number window function. The benefit of this solution shows when you want the first two or three rows per group: you simply change the filter from WHERE row_count = 1 to WHERE row_count <= 2.
But if you want only a single row per group, you just need to ensure that the expected row is ordered to be the first row in the group. Then the DISTINCT ON clause discards all following rows: DISTINCT ON (customer) gives the first (ordered) row per customer group.
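The row_number filtering described above can be tried on a small data set. This is a rough sketch only: it assumes SQLite through Python's sqlite3 rather than PostgreSQL, stores dates as ISO text, and passes a fixed cutoff date as a parameter in place of CURRENT_DATE - 30 so the result is reproducible. The customers and amounts are invented.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; rows are hypothetical and
# dates are ISO-8601 text so that string comparison matches date order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer TEXT, order_total INTEGER, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", 50, "2023-10-01"),
        ("alice", 75, "2023-11-15"),  # alice's latest order, before the cutoff
        ("bob", 20, "2023-12-01"),
        ("bob", 40, "2024-01-10"),    # bob's latest order, after the cutoff
    ],
)

# Fixed cutoff used instead of CURRENT_DATE - 30 (an assumption for the demo).
cutoff = "2024-01-01"

# Number each customer's orders newest first, keep only the newest one,
# and report it when it is older than the cutoff.
query = """
SELECT customer, order_date, order_total
FROM (
    SELECT customer, order_date, order_total,
           ROW_NUMBER() OVER (
               PARTITION BY customer ORDER BY order_date DESC
           ) AS row_count
    FROM orders
) s
WHERE row_count = 1 AND order_date < ?
"""
inactive_customers = conn.execute(query, (cutoff,)).fetchall()
print(inactive_customers)  # [('alice', '2023-11-15', 75)]
```

Only alice appears, with the total of her most recent order; bob is excluded because his latest order falls after the cutoff.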
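Try joining the table to itself:
select o1.customer, o1.order_date, o1.order_total
from orders o1
join (select customer, max(order_date) as last_order from orders group by customer) o2
on o2.customer = o1.customer and o2.last_order = o1.order_date
where o2.last_order < NOW() - '30 days'::interval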
Subqueries in the select list can be a bad idea, because the database may execute the subquery once for each row.
If you use Postgres, you can also try a CTE:
https://www.postgresql.org/docs/9.6/static/queries-with.html
WITH t as (
select customer, max(order_date) as last_order from orders
group by customer
) select o1.customer, o1.order_date, o1.order_total
from orders o1
join t on t.customer = o1.customer and t.last_order = o1.order_date
where t.last_order < NOW() - '30 days'::interval

Postgres : Need distinct records count

I have a table with duplicate entries and the objective is to get the distinct entries based on the latest time stamp.
In my case 'serial_no' will have duplicate entries but I select unique entries based on the latest time stamp.
The query below gives me the unique results with the latest time stamp.
But I also need to get the total number of unique entries.
For example, assume my table has 40 entries overall. With the query below I am able to get 20 unique rows based on the serial number.
But the 'total' is returned as 40 instead of 20.
Any help with this, please?
SELECT
*
FROM
(
SELECT
DISTINCT ON (serial_no) id,
serial_no,
name,
timestamp,
COUNT(*) OVER() as total
FROM
product_info
INNER JOIN my.account ON id = accountid
WHERE
lower(name) = 'hello'
ORDER BY
serial_no,
timestamp DESC OFFSET 0
LIMIT
10
) AS my_info
ORDER BY
serial_no asc
The product_info table initially has this data:
serial_no name timestamp
11212 pulp12 2018-06-01 20:00:01
11213 mango 2018-06-01 17:00:01
11214 grapes 2018-06-02 04:00:01
11215 orange 2018-06-02 07:05:30
11212 pulp12 2018-06-03 14:00:01
11213 mango 2018-06-03 13:00:00
After the distinct query I got all unique results based on the latest timestamp:
serial_no name timestamp total
11212 pulp12 2018-06-03 14:00:01 6
11213 mango 2018-06-03 13:00:00 6
11214 grapes 2018-06-02 04:00:01 6
11215 orange 2018-06-02 07:05:30 6
But total is appearing as 6. I wanted the total to be 4 since it has only 4 unique entries.
I am not sure how to modify my existing query to get this desired result.
Postgres supports COUNT(DISTINCT column_name), so if I have understood your request, using that instead of COUNT(*) will work, and you can drop the OVER.
What you could do is move the window function to a higher-level select statement. This is because window functions are evaluated before the DISTINCT ON and LIMIT clauses are applied. Also, you cannot use the DISTINCT keyword within window functions; it has not been implemented yet (as of Postgres 9.6).
SELECT
*,
COUNT(*) OVER() as total -- here
FROM
(
SELECT
DISTINCT ON (serial_no) id,
serial_no,
name,
timestamp
FROM
product_info
INNER JOIN my.account ON id = accountid
WHERE
lower(name) = 'hello'
ORDER BY
serial_no,
timestamp DESC
LIMIT
10
) AS my_info
Additionally, offset is not required there and one more sorting is also superfluous. I've removed these.
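The effect of counting in the outer select can be checked against the sample data. The sketch below is illustrative only and rests on several assumptions: SQLite (via Python's sqlite3) replaces Postgres, and since SQLite has no DISTINCT ON, the latest row per serial_no is picked with ROW_NUMBER() instead; the join to my.account and the name filter are left out because the sample data does not include them.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres. DISTINCT ON is not available
# in SQLite, so ROW_NUMBER() is used to pick the latest row per serial_no.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_info (serial_no INTEGER, name TEXT, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO product_info VALUES (?, ?, ?)",
    [
        (11212, "pulp12", "2018-06-01 20:00:01"),
        (11213, "mango", "2018-06-01 17:00:01"),
        (11214, "grapes", "2018-06-02 04:00:01"),
        (11215, "orange", "2018-06-02 07:05:30"),
        (11212, "pulp12", "2018-06-03 14:00:01"),
        (11213, "mango", "2018-06-03 13:00:00"),
    ],
)

# The inner query de-duplicates; COUNT(*) OVER () runs in the outer query,
# after the rn = 1 filter, so it counts only the de-duplicated rows.
query = """
SELECT serial_no, name, timestamp, COUNT(*) OVER () AS total
FROM (
    SELECT serial_no, name, timestamp,
           ROW_NUMBER() OVER (
               PARTITION BY serial_no ORDER BY timestamp DESC
           ) AS rn
    FROM product_info
) latest
WHERE rn = 1
ORDER BY serial_no
"""
unique_rows = conn.execute(query).fetchall()
for row in unique_rows:
    print(row)
```

Each of the four rows returned carries total = 4, rather than the 6 reported when the count is computed before de-duplication.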
Another way would be to include a computed column in the select clause but this would not be as fast as it would require one more scan of the table. This is obviously assuming that your total is strictly connected to your resultset and not what's beyond that being stored in the table, but gets filtered out.
select count(*), serial_no from product_info group by serial_no
will give you the number of rows for each serial number.
The simplest way of incorporating that information would be to join in a subquery:
SELECT
*
FROM
(
SELECT
DISTINCT ON (serial_no) id,
serial_no,
name,
timestamp,
COUNT(*) OVER() as total
FROM
product_info
INNER JOIN my.account ON id = accountid
WHERE
lower(name) = 'hello'
ORDER BY
serial_no,
timestamp DESC OFFSET 0
LIMIT
10
) AS my_info
join (select count(*) as counts, serial_no from product_info group by serial_no) as X
on X.serial_no = my_info.serial_no
ORDER BY
my_info.serial_no asc