Selecting more info from the same row after using GROUP BY - postgresql

I have a table containing, for example, this data:
id | value | name | date
1 | 1 | 'one' | 2015-01-02
2 | 1 | 'two' | 2015-02-03
3 | 2 | 'three'| 2014-01-03
4 | 2 | 'four' | 2014-01-02
I want for each distinct value, the name of the row with the latest date. So:
value | name | date
1 | 'two' | 2015-02-03
2 | 'three'| 2014-01-03
I currently have this query: SELECT value, MAX(date) FROM table GROUP BY value, which gives me the value and date columns I'm looking for. How do I modify the query to add the name field? Simply adding it to the SELECT clause won't work, as Postgres will (understandably) complain I have to add it to the GROUP BY clause. But doing so will add it to the uniqueness check, and my query will return all 4 rows. All I need is the name of the row where it found the latest date.

distinct on () is usually the most efficient way to do this in Postgres. Sort by date descending so that the first row kept for each value is the latest one:
select distinct on (value) id, value, name, date
from the_table
order by value, date desc;
SQLFiddle example: http://sqlfiddle.com/#!15/dff68/1
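To make the DISTINCT ON behaviour concrete, here is a minimal sketch that can be pasted into psql. It assumes the sample rows from the question and the table name the_table used in the answer; the column types are my guess.

```sql
-- Sample data from the question (column types are assumed).
CREATE TEMP TABLE the_table (id int, value int, name text, date date);
INSERT INTO the_table (id, value, name, date) VALUES
    (1, 1, 'one',   '2015-01-02'),
    (2, 1, 'two',   '2015-02-03'),
    (3, 2, 'three', '2014-01-03'),
    (4, 2, 'four',  '2014-01-02');

-- DISTINCT ON keeps the first row of each "value" group in ORDER BY order,
-- so sorting by date DESC makes that first row the latest one.
SELECT DISTINCT ON (value) value, name, date
FROM the_table
ORDER BY value, date DESC;
```

With this data the query returns 'two' for value 1 and 'three' for value 2, which is the output asked for. If two rows of the same value can share the latest date, adding a tie-breaker such as id to the ORDER BY makes the choice deterministic.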

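This will give you all required fields. Join on both value and date, otherwise a date that is the maximum for one value would also match rows of other values:
select t1.* from the_table t1
inner join (
SELECT value, MAX(date) as date FROM the_table GROUP BY value
) t2 on t1.value = t2.value and t1.date = t2.date;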
SQL Fiddle: http://sqlfiddle.com/#!15/9491f/2
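A different technique than the join above, in case exactly one row per value is needed even when dates tie: a window function. This is only a sketch; sample_rows is a hypothetical table holding the question's data.

```sql
-- Hypothetical demo table with the rows from the question.
CREATE TEMP TABLE sample_rows (id int, value int, name text, date date);
INSERT INTO sample_rows (id, value, name, date) VALUES
    (1, 1, 'one',   '2015-01-02'),
    (2, 1, 'two',   '2015-02-03'),
    (3, 2, 'three', '2014-01-03'),
    (4, 2, 'four',  '2014-01-02');

-- Number the rows of each value from newest to oldest, then keep number 1.
SELECT value, name, date
FROM (
    SELECT value, name, date,
           row_number() OVER (PARTITION BY value ORDER BY date DESC) AS rn
    FROM sample_rows
) ranked
WHERE rn = 1
ORDER BY value;
```

The join on MAX(date) returns every row that shares the latest date, while row_number() picks a single one of them. Which behaviour is right depends on the data.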

Related

tsql - How to convert multiples rows and columns into one row

id | acct_num | name | orderdt
1 1006A Joe Doe 1/1/2021
2 1006A Joe Doe 1/5/2021
EXPECTED OUTPUT
id | acct_num | name | orderdt | id1 | acct_num1 | NAME1 | orderdt1
1 1006A Joe Doe 1/1/2021 2 1006A Joe Doe 1/5/2021
My query is the following:
Select id,
acct_num,
name,
orderdt
from order_tbl
where acct_num = '1006A'
and orderdt >= '1/1/2021'
If you always have one or two rows you could do it like this (I'm assuming the latest version of SQL Server because you said TSQL):
NOTE: If you have a known maximum (e.g. 4), this solution can be extended by changing the modulus, adding more columns, and adding another join per extra row.
WITH order_table_numbered as
(
SELECT ID, ACCT_NUM, NAME, ORDERDT,
ROW_NUMBER() OVER (PARTITION BY ACCT_NUM ORDER BY ORDERDT) as RN
FROM order_tbl
)
SELECT first.id as id, first.acct_num as acct_num, first.name as name, first.orderdt as orderdt,
second.id as id1, second.acct_num as acct_num1, second.name as name1, second.orderdt as orderdt1
FROM order_table_numbered first
LEFT JOIN order_table_numbered second ON first.ACCT_NUM = second.ACCT_NUM and (second.RN % 2 = 0)
WHERE first.RN % 2 = 1
If you have an unknown number of rows I think you should solve this on the client OR convert the groups to XML -- the XML support in SQL Server is not bad.
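For a known maximum number of rows per account, conditional aggregation is another option that avoids one self-join per extra row. The following is a T-SQL sketch, not the answerer's method; #order_tbl is a hypothetical temp copy of the question's table and the column types are assumed.

```sql
-- Hypothetical temp table with the two rows from the question.
CREATE TABLE #order_tbl (id INT, acct_num VARCHAR(10), name VARCHAR(50), orderdt DATE);
INSERT INTO #order_tbl (id, acct_num, name, orderdt) VALUES
    (1, '1006A', 'Joe Doe', '2021-01-01'),
    (2, '1006A', 'Joe Doe', '2021-01-05');

WITH numbered AS (
    -- Number each account's orders from oldest to newest.
    SELECT id, acct_num, name, orderdt,
           ROW_NUMBER() OVER (PARTITION BY acct_num ORDER BY orderdt) AS rn
    FROM #order_tbl
)
-- One output row per account; each CASE picks the values of one numbered row.
SELECT MAX(CASE WHEN rn = 1 THEN id END)       AS id,
       MAX(CASE WHEN rn = 1 THEN acct_num END) AS acct_num,
       MAX(CASE WHEN rn = 1 THEN name END)     AS name,
       MAX(CASE WHEN rn = 1 THEN orderdt END)  AS orderdt,
       MAX(CASE WHEN rn = 2 THEN id END)       AS id1,
       MAX(CASE WHEN rn = 2 THEN acct_num END) AS acct_num1,
       MAX(CASE WHEN rn = 2 THEN name END)     AS name1,
       MAX(CASE WHEN rn = 2 THEN orderdt END)  AS orderdt1
FROM numbered
GROUP BY acct_num;
```

Supporting a third or fourth row means adding another group of CASE expressions for rn = 3 and rn = 4. Accounts with fewer rows simply get NULL in the unused columns.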

Postgres : Get multiple columns with group by

Table
select * from hello;
id | name
----+------
1 | abc
2 | xyz
3 | abc
4 | dfg
5 | abc
(5 rows)
Query
select name,count(*) from hello where name in ('abc', 'dfg') group by name;
name | count
------+-------
dfg | 1
abc | 3
(2 rows)
In the above query, I am trying to get the count of the rows whose name is in the tuple. However, I also want to get the id along with the count for each name. Is there a way to achieve this? Thanks
If you want to return the "id" values, then you can use a window function:
select id, name, count(*) over(PARTITION BY name)
from hello
where name in ('abc', 'dfg');
This will return the id values along with the count of rows per name.
If you want to see all IDs for each name, you need to aggregate them:
select name, count(*), array_agg(id) as ids
from hello
where name in ('abc', 'dfg')
group by name;
This returns something like this:
name | count | ids
-----+-------+--------
abc | 3 | {1,3,5}
dfg | 1 | {4}

Postgresql: get first item of an ordered group not working [duplicate]

I have this data:
| id | person_id | date |
|--------|-----------|---------------------|
| 313962 | 1111111 | 2016-04-14 16:00:00 | --> this row
| 313946 | 2222222 | 2015-03-13 15:00:00 | --> this row
| 313937 | 1111111 | 2014-02-12 14:00:00 |
| 313944 | 1111111 | 2013-01-11 13:00:00 |
| ...... | ....... | ................... |
- What I would like to select are the indicated rows, i.e. the rows with the most recent date for each person_id.
- Also, the output format for the date must be dd-mm-YYYY.
So far I was trying with this:
SELECT
l.person_id,
to_char(DATE(l.date), 'dd-mm-YYYY') AS user_date
FROM login l
group by l.person_id
order by l.date desc
I tried different approaches, but I get all kinds of aggregation error messages, such as:
for select distinct order by expressions must appear
And
must appear in the GROUP BY clause or be used in an aggregate function
Any idea?
There are several ways, but the simplest way (and perhaps more efficient - but not SQL standard) is to rely on Postgresql's DISTINCT ON:
SELECT DISTINCT ON (person_id )
id, person_id , date
FROM login
ORDER BY person_id , date desc
The date formatting (do you really want that?) can be done in an outer select:
SELECT id,person_id, to_char(DATE(date), 'dd-mm-YYYY') as date
FROM (
SELECT DISTINCT ON (person_id )
id, person_id , date
FROM login
ORDER BY person_id, date desc )
AS XXX;
You can do it with a subquery, something like this:
SELECT
l.person_id,
to_char(DATE(l.date), 'dd-mm-YYYY') AS user_date
FROM login l
where l.date = (select max(date) from login where person_id = l.person_id)
order by l.person_id
You need something like the following to know which date to grab for each person. Joining on the date as well keeps only the latest row per person instead of one row per login:
select l.person_id, to_char(DATE(d.maxdate), 'dd-mm-YYYY') as user_date
from login l
inner join
(select person_id, max(date) as maxdate
from login group by person_id) d on l.person_id = d.person_id and l.date = d.maxdate
order by d.maxdate desc

Remove duplicate with lower Date from SQL result

I have following table:
CREATE TABLE Kundendaten (
beschreiben_knr INTEGER REFERENCES Kunde(knr) DEFERRABLE INITIALLY DEFERRED,
erstelldatum DATE,
anschrift VARCHAR(40),
sonderrabat INTEGER,
PRIMARY KEY (erstelldatum, beschreiben_knr)
);
If I run this query:
select * from Kundendaten ORDER BY erstelldatum DESC;
I get:
beschreiben_knr | erstelldatum | anschrift | sonderrabat
-----------------+--------------+---------------+-------------
1 | 2015-11-01 | Winkelgasse 5 | 0
2 | 2015-11-01 | Badeteich 7 | 10
3 | 2015-11-01 | Senfgasse 7 | 15
1 | 2015-10-30 | Sonnenweg 3 | 5
But I need to get only the entry with the highest date if there is more than one. In this case the last row should not appear.
How can I achieve this in PostgreSQL?
You want something like WHERE erstelldatum = MAX(erstelldatum), but that doesn't work because aggregate functions are not allowed in a WHERE clause. You can use a sub-query to get the newest date.
SELECT *
FROM Kundendaten
WHERE erstelldatum = (
SELECT MAX(erstelldatum) FROM Kundendaten
);
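Postgres will optimize that subquery so it is only run once, but you'll want to make sure erstelldatum is indexed. Here the primary key (erstelldatum, beschreiben_knr) already has erstelldatum as its leading column, so its index can serve the lookup.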
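The sub-query above keeps only the rows with the overall newest date. If the intent is instead the newest entry per customer (beschreiben_knr), which is my reading of "the last row should not appear", DISTINCT ON is one way to sketch it. kundendaten_demo is a hypothetical copy of the table without the foreign key.

```sql
-- Hypothetical demo copy of Kundendaten with the rows from the question.
CREATE TEMP TABLE kundendaten_demo (
    beschreiben_knr integer,
    erstelldatum    date,
    anschrift       varchar(40),
    sonderrabat     integer,
    PRIMARY KEY (erstelldatum, beschreiben_knr)
);
INSERT INTO kundendaten_demo VALUES
    (1, '2015-11-01', 'Winkelgasse 5', 0),
    (2, '2015-11-01', 'Badeteich 7',   10),
    (3, '2015-11-01', 'Senfgasse 7',   15),
    (1, '2015-10-30', 'Sonnenweg 3',   5);

-- Keep the first row per customer after sorting each customer's rows newest first.
SELECT DISTINCT ON (beschreiben_knr) *
FROM kundendaten_demo
ORDER BY beschreiben_knr, erstelldatum DESC;
```

On the sample data both queries return the same three rows. They differ as soon as some customer's newest entry is older than the overall newest date.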

adding missing date in a table in PostgreSQL

I have a table that contains data for every day in 2002, but it has some missing dates. Namely, 354 records for 2002 (instead of 365). For my calculations, I need to have the missing data in the table with Null values
+-----+------------+------------+
| ID | rainfall | date |
+-----+------------+------------+
| 100 | 110.2 | 2002-05-06 |
| 101 | 56.6 | 2002-05-07 |
| 102 | 65.6 | 2002-05-09 |
| 103 | 75.9 | 2002-05-10 |
+-----+------------+------------+
You can see that 2002-05-08 is missing. I want my final table to look like this:
+-----+------------+------------+
| ID | rainfall | date |
+-----+------------+------------+
| 100 | 110.2 | 2002-05-06 |
| 101 | 56.6 | 2002-05-07 |
| 102 | | 2002-05-08 |
| 103 | 65.6 | 2002-05-09 |
| 104 | 75.9 | 2002-05-10 |
+-----+------------+------------+
Is there a way to do that in PostgreSQL?
It doesn't matter if I have the result just as a query result (not necessarily an updated table)
date is a reserved word in standard SQL and the name of a data type in PostgreSQL. PostgreSQL allows it as identifier, but that doesn't make it a good idea. I use thedate as column name instead.
Don't rely on the absence of gaps in a surrogate ID. That's almost always a bad idea. Treat such an ID as a unique number without meaning, even if it seems to carry certain other attributes most of the time.
In this particular case, as @Clodoaldo commented, thedate seems to be a perfect primary key and the column id is just cruft, which I removed:
CREATE TEMP TABLE tbl (thedate date PRIMARY KEY, rainfall numeric);
INSERT INTO tbl(thedate, rainfall) VALUES
('2002-05-06', 110.2)
, ('2002-05-07', 56.6)
, ('2002-05-09', 65.6)
, ('2002-05-10', 75.9);
Query
Full table by query:
SELECT x.thedate, t.rainfall -- rainfall automatically NULL for missing rows
FROM (
SELECT generate_series(min(thedate), max(thedate), '1d')::date AS thedate
FROM tbl
) x
LEFT JOIN tbl t USING (thedate)
ORDER BY x.thedate
Similar to what @a_horse_with_no_name posted, but simplified and ignoring the pruned id.
Fills in gaps between first and last date found in the table. If there can be leading / lagging gaps, extend accordingly. You can use date_trunc() like @Clodoaldo demonstrated - but his query suffers from syntax errors and can be simpler.
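As a sketch of "extend accordingly": if the result should cover the whole of 2002 regardless of the first and last date stored, the series can be generated from fixed bounds. rain_demo is a hypothetical stand-in for tbl, and the year bounds are taken from the question.

```sql
-- Hypothetical demo table mirroring tbl from this answer.
CREATE TEMP TABLE rain_demo (thedate date PRIMARY KEY, rainfall numeric);
INSERT INTO rain_demo (thedate, rainfall) VALUES
    ('2002-05-06', 110.2),
    ('2002-05-07', 56.6),
    ('2002-05-09', 65.6),
    ('2002-05-10', 75.9);

-- One row per day of 2002; rainfall is NULL wherever no row exists.
SELECT x.thedate::date AS thedate, t.rainfall
FROM generate_series(timestamp '2002-01-01',
                     timestamp '2002-12-31',
                     interval '1 day') AS x(thedate)
LEFT JOIN rain_demo t ON t.thedate = x.thedate::date
ORDER BY x.thedate;
```

This returns 365 rows, one per day of 2002, so gaps before the first and after the last stored date are covered too.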
INSERT missing rows
The fastest and most readable way to do it is a NOT EXISTS anti-semi-join.
INSERT INTO tbl (thedate, rainfall)
SELECT x.thedate, NULL
FROM (
SELECT generate_series(min(thedate), max(thedate), '1d')::date AS thedate
FROM tbl
) x
WHERE NOT EXISTS (SELECT 1 FROM tbl t WHERE t.thedate = x.thedate)
Just do an outer join against a query that returns all dates in 2002:
with all_dates as (
select date '2002-01-01' + i as date_col
from generate_series(0, extract(doy from date '2002-12-31')::int - 1) as i
)
select row_number() over (order by ad.date_col) as id,
t.rainfall,
ad.date_col as date
from all_dates ad
left join your_table t on ad.date_col = t.date
order by ad.date_col;
This will not change your table, it will just produce the result as desired.
Note that the generated id column will not contain the same values as the ID column in your table as it is merely a counter in the result set.
You could also replace the row_number() function with extract(doy from ad.date_col)
To fill the gaps from the start of the year up to the latest date in the table. This will not reorder the IDs:
insert into t (rainfall, "date")
select null, s.d::date
from t
right join generate_series(
(select date_trunc('year', min("date")) from t)::timestamp,
(select max("date") from t)::timestamp,
'1 day'
) s(d) on t."date" = s.d::date
where t."date" is null;
You have to fully re-create your table, as the indexes have to change.
The better way to do it is to use your preferred database interface language: loop over the calendar, ignore the old ID, and put the values in a new table with newly generated serial IDs.
for day in (whole needed calendar)
value = select rainfall from oldbrokentable where date = day
insert into newcleanedtable date=day, rainfall=value, id=serialized
(That's not real code! It is just a concept to be adapted to your preferred scripting language.)
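The same idea can also be sketched in plain SQL instead of a client-side loop. This is only an illustration: oldbrokentable and newcleanedtable are the placeholder names from the pseudocode above, the column types are assumed, and the new IDs simply continue from the smallest old ID.

```sql
-- Placeholder table with the sample rows from the question.
CREATE TEMP TABLE oldbrokentable (id int, rainfall numeric, "date" date);
INSERT INTO oldbrokentable (id, rainfall, "date") VALUES
    (100, 110.2, '2002-05-06'),
    (101, 56.6,  '2002-05-07'),
    (102, 65.6,  '2002-05-09'),
    (103, 75.9,  '2002-05-10');

-- Build the new table: one row per calendar day, IDs renumbered in date order.
CREATE TEMP TABLE newcleanedtable AS
SELECT (SELECT min(id) FROM oldbrokentable) - 1
         + row_number() OVER (ORDER BY d.day) AS id,
       o.rainfall,
       d.day::date AS "date"
FROM generate_series(
         (SELECT min("date") FROM oldbrokentable)::timestamp,
         (SELECT max("date") FROM oldbrokentable)::timestamp,
         interval '1 day') AS d(day)
LEFT JOIN oldbrokentable o ON o."date" = d.day::date;

SELECT * FROM newcleanedtable ORDER BY id;
```

On the sample rows this yields IDs 100 to 104 with a NULL rainfall for 2002-05-08, matching the table the question asks for.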