Column name seems to be used instead of alias in GROUP BY - PostgreSQL

For a project, I'm looking to get all results grouped by day.
Here is my query:
SELECT MAX(id) AS id,
SUM(value) AS value,
country,
cast(TO_CHAR(date, 'dd/mm/yyyy') AS DATE) AS date
FROM records
GROUP BY date, country
My problem is that records are not grouped correctly when I use my "date" alias; instead, it seems to group by the field name.
Results with group by alias
It works if I use indices instead of the alias, but I'd like to use the column's name in my query:
SELECT MAX(id) AS id,
SUM(value) AS value,
country,
cast(TO_CHAR(date, 'dd/mm/yyyy') AS DATE) AS date
FROM records
GROUP BY 3, 4
Results with group by indices
Does anyone have an idea why it works this way?

Quote from the manual
An expression used inside a grouping_element can be an input column name, or the name or ordinal number of an output column (SELECT list item), or an arbitrary expression formed from input-column values. In case of ambiguity, a GROUP BY name will be interpreted as an input-column name rather than an output column name
(emphasis mine)
So the (input) column names always have precedence over column aliases.
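As a rough sketch of how the query from the question could sidestep that precedence rule, either repeat the full expression in GROUP BY or give the output column an alias that does not collide with an input column. The alias record_day below is made up for illustration, and it is assumed that records has no column of that name.

```sql
-- Option 1: group by the expression itself, so there is no name to resolve.
SELECT MAX(id) AS id,
       SUM(value) AS value,
       country,
       cast(TO_CHAR(date, 'dd/mm/yyyy') AS DATE) AS date
FROM records
GROUP BY cast(TO_CHAR(date, 'dd/mm/yyyy') AS DATE), country;

-- Option 2: use an alias that is not also an input column name
-- (record_day is a hypothetical name), so GROUP BY can only mean the output column.
SELECT MAX(id) AS id,
       SUM(value) AS value,
       country,
       cast(TO_CHAR(date, 'dd/mm/yyyy') AS DATE) AS record_day
FROM records
GROUP BY record_day, country;
```

Option 1 keeps the output column named date, at the cost of writing the expression twice.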

The two GROUP BY clauses are not equivalent.
In both, the SELECT clause is:
SELECT
MAX(id) AS id,
SUM(value) AS value,
country,
cast(TO_CHAR(date, 'dd/mm/yyyy') AS DATE) AS date
So the columns will be (id, value, country, date).
The first query groups by date then country:
GROUP BY date, country
The second groups by country then date:
GROUP BY 3, 4
With different hierarchy of GROUP BY you'll get different results, such as what you show.

Related

How to limit to just one result per condition when looking through multiple OR/IN conditions in the WHERE clause (Postgresql)

For Example:
SELECT * FROM Customers
WHERE Country IN ('Germany', 'France', 'UK')
I want to LIMIT 1 for each of the countries in my IN clause so I only see a total of 3 rows: one customer per country (1 German, 1 French, 1 UK). Is there a simple way to do that?
Normally, a simple GROUP BY would suffice for this type of solution, however as you have specified that you want to include ALL of the columns in the result, then we can use the ROW_NUMBER() window function to provide a value to filter on.
As a general rule it is important to specify the column to sort on (ORDER BY) for all windowing or paged queries to make the result repeatable.
As no schema has been supplied, I have used Name as the field to sort on for the window. Please update that (or the question) with any other field you would like; the PK is a good candidate if you have nothing else to go on.
SELECT * FROM
(
SELECT *
, ROW_NUMBER() OVER(PARTITION BY Country ORDER BY Name) AS _rn
FROM Customers
WHERE Country IN ('Germany', 'France', 'UK')
) AS ranked -- PostgreSQL versions before 16 require an alias on a subquery in FROM
WHERE _rn = 1
The PARTITION BY forces the ROW_NUMBER to be counted across all records with the same Country value, starting at 1, so in this case we only select the rows that get a row number (aliased as _rn) of 1.
The WHERE clause on Country could also go in the outer query if you prefer, but ROW_NUMBER() can only be specified in the SELECT or ORDER BY clauses of the query, so to use it as a filter criterion we are forced to wrap the results in some way.
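Since the question is about PostgreSQL, a shorter alternative may be DISTINCT ON, which keeps the first row of each group according to the ORDER BY. The sketch below makes the same assumption as above, namely that Name is an acceptable column to sort on.

```sql
-- Keep one row per Country: the first one when sorted by Name.
-- The DISTINCT ON expression must match the leftmost ORDER BY expression.
SELECT DISTINCT ON (Country) *
FROM Customers
WHERE Country IN ('Germany', 'France', 'UK')
ORDER BY Country, Name;
```

Unlike the ROW_NUMBER() version, this adds no helper column to the result, but DISTINCT ON is a PostgreSQL extension and is not portable to other databases.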

Sort in non-alphabetical order in postgresql

I'm automating a process at work where the output needs to be in a certain non-alphabetical order depending on a name (internal_product, type text) in addition to a number (type text). First I'm running a subquery where I collect information from four slightly different tables using joins. I then append the result with a union before the outer group by sums units and amounts. The pseudo-query is as follows:
select name, number, internal_product, sum(units), sum(amount) from (
select fields, sum(x)
from t1
join join-conditions
join join-conditions
group by name, number, internal_product
union
.....
select fields, sum(x)
from t5
join join-conditions
join join-conditions
group by name, number, internal_product
) as foo
group by name, number, internal_product
order by number, name;
I tried to change a column in a helper table used in one of the joins to an enum type since it is used in the outer group by (SO-thread) but the column type of course needs to be the same in the join-condition so the modified query was not valid. There are 30 product names so I would like to avoid using a CASE name as suggested by gbn and Guffa.
Are there other ways to apply a certain order in an ORDER BY?
It might be overkill or complicated for your case, but you could create a custom collation in postgres to sort the way you want. Have a look at the documentation.
https://www.postgresql.org/docs/11/collation.html
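Another option that avoids both CASE and an enum column, offered here only as a sketch, is to sort by the position of the name in an explicit array using array_position (available from PostgreSQL 9.5). The product names and sample rows below are invented for illustration; the real list would hold the 30 names in the desired order.

```sql
-- Hypothetical data standing in for the result of the union subquery.
SELECT name, number, internal_product
FROM (VALUES
        ('n1', '10', 'gamma'),
        ('n2', '20', 'alpha'),
        ('n3', '30', 'beta')
     ) AS foo(name, number, internal_product)
-- Rows sort by where internal_product appears in the array, then by number.
-- A product missing from the array yields NULL, which sorts last in ascending order.
ORDER BY array_position(ARRAY['beta', 'gamma', 'alpha'], internal_product), number;
```

For a list that changes often, a small lookup table holding each product name and a sort position, joined in the outer query, would serve the same purpose.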

PostgreSQL order by month name on distinction

I have this query
SELECT DISTINCT ON (tours.departure.departure_month)
tours.departure.departure_month
FROM tours.departure
But I want to order the distinct months by month name. I've tried this from a similar question to_date(tours.departure.departure_month, 'Month'),
but I cannot get it to work with DISTINCT ON.
What is the column type of departure_month: is it a date, or does it hold month names? If it is a date column, you can try the following; note that the DISTINCT ON expression must match the leftmost ORDER BY expression:
SELECT DISTINCT ON (EXTRACT(MONTH FROM tours.departure.departure_month))
tours.departure.departure_month
FROM tours.departure
ORDER BY EXTRACT(MONTH FROM tours.departure.departure_month) DESC;
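If the column instead holds month names as text, which is what the to_date(..., 'Month') attempt in the question suggests, one possible sketch is to deduplicate in a subquery first and sort in the outer query. This assumes the values are full English month names; the subquery alias months is made up for illustration.

```sql
-- Deduplicate first, then sort the distinct names in calendar order.
SELECT departure_month
FROM (
    SELECT DISTINCT departure_month
    FROM tours.departure
) AS months
ORDER BY to_date(departure_month, 'Month');
```

Splitting the work this way avoids the DISTINCT ON restriction, because the ordering expression no longer has to match the deduplication expression.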

Postgres groupby query

I need to group by the product name and by the date.
So from the above example, I am expecting to see the result as below. Is it possible? Can someone help me out, please? Thanks!
Maybe I'm missing something in your question, but this is as simple as:
select product_name,
date,
count(*) as cnt
from the_table
group by product_name,
date
order by product_name
Btw: date is a horrible name for a column. First because it's a reserved word but more importantly because it does not document what you store in the column. It could be a "purchase_date", a "sold_date", an "expiry_date", ... ?

Postgres: Distinct but only for one column

I have a table in PostgreSQL with names (more than 1 million rows), but it also has many duplicates. I select 3 fields: id, name, metadata.
I want to select them randomly with ORDER BY RANDOM() and LIMIT 1000, so I do this in many steps to save some memory in my PHP script.
But how can I do that so it only gives me a list with no duplicate names?
For example [1,"Michael Fox","2003-03-03,34,M,4545"] will be returned but not [2,"Michael Fox","1989-02-23,M,5633"]. The name field is the most important and must be unique in the list every time I do the select, and it must be random.
I tried with GROUP BY name, but then it expects me to have id and metadata in the GROUP BY as well or in an aggregate function, and I don't want to have them filtered somehow.
Does anyone know how to fetch many columns but do a distinct on only one column?
To do a distinct on only one (or n) column(s):
select distinct on (name)
name, col1, col2
from names
This will return any of the rows containing the name. If you want to control which of the rows will be returned you need to order:
select distinct on (name)
name, col1, col2
from names
order by name, col1
Will return the first row when ordered by col1.
distinct on:
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.
The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.
Does anyone know how to fetch many columns but do a distinct on only one column?
You want the DISTINCT ON clause.
You didn't provide sample data or a complete query so I don't have anything to show you. You want to write something like:
SELECT DISTINCT ON (name) fields, id, name, metadata FROM the_table;
This will return an unpredictable (but not "random") set of rows. If you want to make it predictable add an ORDER BY per Clodaldo's answer. If you want to make it truly random, you'll want to ORDER BY random().
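Because the DISTINCT ON expression has to lead the ORDER BY, a random sample of unique names takes two levels. The following is a sketch under the question's assumptions (columns id, name, metadata and a limit of 1000); the_table and the alias unique_names are placeholders.

```sql
SELECT id, name, metadata
FROM (
    -- Inner query: one row per name, chosen at random among its duplicates.
    SELECT DISTINCT ON (name) id, name, metadata
    FROM the_table
    ORDER BY name, random()
) AS unique_names
-- Outer query: shuffle the unique names and keep 1000 of them.
ORDER BY random()
LIMIT 1000;
```

On a table with more than a million rows, both sorts touch every row, so this may be slow without further tuning.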
To do a distinct on n columns:
select distinct on (col1, col2) col1, col2, col3, col4 from names
SELECT NAME,MAX(ID) as ID,MAX(METADATA) as METADATA
from SOMETABLE
GROUP BY NAME