How to get latest data for a column when using grouping in postgres - postgresql

I am using postgres alongside sequelize. I have encountered a case where I need to write a coustom query which groups the records are a particular field. I know for the remaning columns that are not used for grouping, I need to use a aggregate function like SUM. But the problem is that for some columns I need to get the one what is the latest one (DESC sorted by created_at). I see no function in sql to do so. Is my only option to write subqueries or is there a better way? Thanks?
For better understanding, If you look at the below picture, I want the group the records with address. So after the query there should only be two records, one with sydney and the other with new york. But when it comes to the distance, I want the result of the query to contain the distance form the row that was most recently created, i.e with the latest created_at.
so the final two query results should be:
sydney 100 2022-09-05 18:14:53.492131+05:45
new york 40 2022-09-05 18:14:46.23328+05:45

select address, distance, created_at
from(
select address, distance, created_at, row_number() over(partition by address order by created_at DESC) as rn
from table) x
where rn = 1

Related

PostgreSQL how to GROUP BY single field from returned table

So I have complicated query, to simplify let it be like
SELECT
t.*,
SUM(a.hours) AS spent_hours
FROM (
SELECT
person.id,
person.name,
person.age,
SUM(contacts.id) AS contact_count
FROM
person
JOIN contacts ON contacts.person_id = person.id
) AS t
JOIN activities AS a ON a.person_id = t.id
GROUP BY t.id
Such query works fine in MySQL, but Postgres needs to know that GROUP BY field is unique, and despite it actually is, in this case I need to GROUP BY all returned fields from returned t table.
I can do that, but I don't believe that will work efficiently with big data.
I can't JOIN with activities directly in first query, as person can have several contacts which will lead query counting hours of activity several time for every joined contact.
Is there a Postgres way to make this query work? Maybe force to treat Postgres t.id as unique or some other solution that will make same in Postgres way?
This query will not work on both database system, there is an aggregate function in the inner query but you are not grouping it(unless you use window functions). Of course there is a special case for MySQL, you can use it with disabling "sql_mode=only_full_group_by". So, MySQL allows this usage because of it' s database engine parameter, but you cannot do that in PostgreSQL.
I knew MySQL allowed indeterminate grouping, but I honestly never knew how it implemented it... it always seemed imprecise to me, conceptually.
So depending on what that means (I'm too lazy to look it up), you might need one of two possible solutions, or maybe a third.
If you intent is to see all rows (perform the aggregate function but not consolidate/group rows), then you want a windowing function, invoked by partition by. Here is a really dumbed down version in your query:
.
SELECT
t.*,
SUM (a.hours) over (partition by t.id) AS spent_hours
FROM t
JOIN activities AS a ON a.person_id = t.id
This means you want all records in table t, not one record per t.id. But each row will also contain a sum of the hours for all values that value of id.
For example the sum column would look like this:
Name Hours Sum Hours
----- ----- ---------
Smith 20 120
Jones 30 30
Smith 100 120
Whereas a group by would have had Smith once and could not have displayed the hours column in detail.
If you really did only want one row per t.id, then Postgres will require you to tell it how to determine which row. In the example above for Smith, do you want to see the 20 or the 100?
There is another possibility, but I think I'll let you reply first. My gut tells me option 1 is what you're after and you want the analytic function.

How to limit to just one result per condition when looking through multiple OR/IN conditions in the WHERE clause (Postgresql)

For Example:
SELECT * FROM Customers
WHERE Country IN ('Germany', 'France', 'UK')
I want to LIMIT 1 for each of the countries in my IN clause so I only see a total of 3 rows: One customer for per country (1 German, 1 France, 1 UK). Is there a simple way to do that?
Normally, a simple GROUP BY would suffice for this type of solution, however as you have specified that you want to include ALL of the columns in the result, then we can use the ROW_NUMBER() window function to provide a value to filter on.
As a general rule it is important to specify the column to sort on (ORDER BY) for all windowing or paged queries to make the result repeatable.
As no schema has been supplied, I have used Name as the field to sort on for the window, please update that (or the question) with any other field you would like, the PK is a good candidate if you have nothing else to go on.
SELECT * FROM
(
SELECT *
, ROW_NUMBER() OVER(PARTITION BY Country ORDER BY Name) AS _rn
FROM Customers
WHERE Country IN ('Germany', 'France', 'UK')
)
WHERE _rn = 1
The PARTITION BY forces the ROW_NUMBER to be counted across all records with the same Country value, starting at 1, so in this case we only select the rows that get a row number (aliased as _rn) of 1.
The WHERE clause could have been in the outer query if you really want to, but ROW_NUMBER() can only be specified in the SELECT or ORDER BY clauses of the query, so to use it as a filter criteria we are forced to wrap the results in some way.

In TSQL, How do I add a count column that counts the number of rows in my query?

This can be done a number of ways, which I will explain at the end. For now, I have been given a work assignment that includes the following (simplified):
"Create a record each week to track the current status that has the following: account numbers (unique within each report), a random number (provided), their status (Green, Orange, or Blue), and make sure the record also has a column which tells me how many records their are this week."
I do not need code to generate a random number.
Columns: Account, RanNum, Status, NumberOfRowsThisWeek
How do I handle adding a column that determines the number of rows in my query and produces that number, static, within each row of that column?
I may try to tweak the request and apply a rising number. How would I go about doing it in this case?
Edit: SQL Server 2014
You are not telling us which database you are using.
In SQL Server, the newer versions at least, you have windowing function or analytical functions available, and they are also available in most other popular RDBMS
You could do what you want in SQL Server by adding this to your select
,count(*) over (partition by 1) as [NrOfRows]
An analytical function does the "standard" query, and then performs the windowing function on the result set.
The count above, counts the rows in the result set, partitioned by the constant 1, which is of course stable across all rows, so it gives the full rowcount.
It is perhaps not standard in all databases to allow a constant in that way, perhaps this would give a better result in some, I know it works in SQL Server:
,count(*) over (partition by (select 1 n)) as [NrOfRows]
it sounds like you want to do some kind of simple count() / group by query
select Account, RanNum, Status, count(*) as NumberOfRowsThisWeek
from tablename
group by Account, RanNum, Status
you my need to do
select Account, RanNum, Status, NumberOfRowsThisWeek
from (
select Account, Status, count(*) as NumberOfRowsThisWeek
from tablename
group by Account, Status
)
because the random number will confuse the group by by making every row unique.

Calculate previous order date and status in Postgres

I have a simple table of orders, and I need to calculate some stats for each order. Essentially I have a Postgres db with fields:
Order_ID (unique), User_ID, Created_at (date), City, Total
I want to write a query that will generate, for each Order_ID:
1) the Created_at date of the user's most recent order prior to the current Order_ID (so if a customer placed order with Order_ID=200005b on 9/20/14, what is the date of that user's most recent previous order?)
2) another field showing a user's "Status" based on this date, given the following cases:
-- if this is user's first order, Status="new";
-- if most recent previous order date <= 60 days before the given/current order, Status="active";
-- if most recent previous order date > 60 days before the given/current order, Status="reactivated"
I think there's a way to write this query using some nested SELECTS, and maybe a self-join, but I don't know PostgreSQL well enough to understand the ordering of queries. I have been able to generate an "Order_N" field using the following query that I could use to lookup (Order_N)-1 to find the date, but I get stuck once trying to use that in nesting.
SELECT
user_id,
order_id,
created_at,
row_number() over (partition by user_id order by created_at ) as order_n
order by user_id, created_at;
Does anyone have any ideas?

Calculate Mode - "Highest frequency row" DB2

What would be the most efficient way to calculating the mode across tables with joins in DB2..
I am trying to get the value with the most frequency(count) for a given column(ID - candidate key for joined table) on a given date.
The idea is to get the most common (value) from the table which has different (value)s for some accounts (for the same ID and date). We need to make it unique for use in another table.
You can use common table expressions [CTE's], indicated by WITH, to break the logic down into logical steps. First we'll build the summary rows, then we'll assign a ranking to the rows within each group, then pick out the ones that with the highest count of records.
Let's say we want to know which flavor of each item sells the most frequently on each date (perhaps assuming a record is quantity one).
WITH s as
(
SELECT itemID, saleDate, flavor, count(*) as tally
FROM sales
GROUP BY itemID, saleDate, flavor
), r as
(
SELECT itemID, saleDate, flavor, tally,
RANK() OVER (PARTITION BY itemID, saleDate ORDER BY tally desc) as pri
FROM s
)
SELECT itemID, saleDate, flavor, tally
FROM r
WHERE pri = 1
Here the names "s" and "r" refer to the result set from their respective CTE's. These names can then be used as to represent a table in another part of the statement.
The pri column will have the RANK() of tally value on the summary row from the first section "s" within the window of itemID and saleDate. Tally is descending, because we want the largest value first, which will get a RANK() of 1. Then in the main SELECT we simply pick those summary records which were first in their partition.
By using RANK() or DENSE_RANK() we could get back multiple flavors for an itemID, saleDate, if they are tied for first place. This could be eliminated by replacing RANK() with ROW_NUMBER(), but it would arbitrarily pick one of the tied flavors as a winner, and this may not be correct answer for the problem at hand.
If we had a sales quantity column in the table, we could replace COUNT(*) with SUM(salesqty) and find what had sold the most units.