how to make array_agg() work like group_concat() from mySQL - postgresql

So I have this table:
create table test (
id integer,
rank integer,
image varchar(30)
);
Then some values:
id | rank | image
---+------+-------
1 | 2 | bbb
1 | 3 | ccc
1 | 1 | aaa
2 | 3 | c
2 | 1 | a
2 | 2 | b
I want to group them by id and concatenate the image names in the order given by rank. In MySQL I can do this:
select id,
group_concat( image order by rank asc separator ',' )
from test
group by id;
And the output would be:
1 aaa,bbb,ccc
2 a,b,c
Is there a way to get this in PostgreSQL?
If I try to use array_agg(), the names do not come out in the correct order, and I was not able to find a way to sort them. (I am using PostgreSQL 8.4.)

In PostgreSQL 8.4 you cannot explicitly order the input to array_agg, but you can work around it by sorting the rows in a subquery before they reach the aggregate (this usually works, although the order is not strictly guaranteed):
SELECT id, array_to_string(array_agg(image), ',')
FROM (SELECT * FROM test ORDER BY id, rank) x
GROUP BY id;
In PostgreSQL 9.0 and later, aggregate expressions can have an ORDER BY clause:
SELECT id, array_to_string(array_agg(image ORDER BY rank), ',')
FROM test
GROUP BY id;
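As a shorter variant of the 9.0 query above, string_agg can build the delimited string directly, without going through an array. This is only a sketch: it assumes PostgreSQL 9.0 or later and the test table from the question, and the column alias images is made up for illustration.

```sql
-- Assumes PostgreSQL 9.0+ and the "test" table defined in the question.
-- string_agg(value, delimiter ORDER BY ...) concatenates the values of each
-- group in the requested order.
SELECT id,
       string_agg(image, ',' ORDER BY rank) AS images  -- alias is illustrative
FROM test
GROUP BY id
ORDER BY id;
```

With the sample rows this should give the same two lines as the MySQL group_concat query (aaa,bbb,ccc for id 1 and a,b,c for id 2).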

How to get id of the row which was selected by aggregate function? [duplicate]

I have the following data:
id | name | amount | datefrom
---------------------------
3 | a | 8 | 2018-01-01
4 | a | 3 | 2018-01-15 10:00
5 | b | 1 | 2018-02-20
I can group the result with the following query:
select name, max(amount) from table group by name
But I need the id of the selected row too, so I tried:
select max(id), name, max(amount) from table group by name
As expected, it returns:
id | name | amount
-----------
4 | a | 8
5 | b | 1
But I need the id to be 3 for the amount of 8:
id | name | amount
-----------
3 | a | 8
5 | b | 1
Is this possible?
PS. This is required for a billing task. On 2018-01-15 the configuration of a was changed: the user consumed the resource for 10 hours at an amount of 8 and for the remaining 14 hours at an amount of 3. I need to bill such a day at the maximum value, so the row with id = 4 is ignored for 2018-01-15. (For the next day, 2018-01-16, I will bill the amount of 3.)
So I take for billing the row:
3 | a | 8 | 2018-01-01
If something is wrong with it, I must report that the row with id = 3 is wrong.
But when I use an aggregate function, the information about the id is lost.
Would be awesome if this is possible:
select current(id), name, max(amount) from table group by name
select aggregated_row(id), name, max(amount) from table group by name
Here aggregated_row refers to the row that was selected by the aggregate function max.
Update:
I solved the task as follows:
SELECT
(
SELECT id FROM t2
WHERE id = ANY ( ARRAY_AGG( tf.id ) ) AND amount = MAX( tf.amount )
) id,
name,
MAX(amount) ma,
SUM( ratio )
FROM t2 tf
GROUP BY name
Update:
It would be much better to use window functions.
There are at least three ways to do this; see below:
CREATE TEMP TABLE test (
id integer, name text, amount numeric, datefrom timestamptz
);
COPY test FROM STDIN (FORMAT csv);
3,a,8,2018-01-01
4,a,3,2018-01-15 10:00
5,b,1,2018-02-20
6,b,1,2019-01-01
\.
Method 1. using DISTINCT ON (PostgreSQL-specific)
SELECT DISTINCT ON (name)
id, name, amount
FROM test
ORDER BY name, amount DESC, datefrom ASC;
Method 2. using window functions
SELECT id, name, amount FROM (
SELECT *, row_number() OVER (
PARTITION BY name
ORDER BY amount DESC, datefrom ASC) AS __rn
FROM test) AS x
WHERE x.__rn = 1;
Method 3. using a correlated subquery
SELECT id, name, amount FROM test
WHERE id = (
SELECT id FROM test AS t2
WHERE t2.name = test.name
ORDER BY amount DESC, datefrom ASC
LIMIT 1
);
You need DISTINCT ON, which keeps only the first row of each group according to the ORDER BY clause:
SELECT DISTINCT ON (name)
*
FROM table
ORDER BY name, amount DESC
You need an inner join against an aggregated subquery. Try this:
SELECT T.id, T2.name, T2.amount
FROM TABLE T
INNER JOIN (SELECT name, MAX(amount) amount
            FROM TABLE
            GROUP BY name) T2
ON T.name = T2.name      -- match on the group as well, not only the amount
AND T.amount = T2.amount

PostgreSQL Group By not working as expected - wants too many inclusions

I have a simple PostgreSQL table that I'm trying to query. Imagine a table like this...
| ID | Account_ID | Iteration |
|----|------------|-----------|
| 1 | 100 | 1 |
| 2 | 101 | 1 |
| 3 | 100 | 2 |
I need to get the ID column for each Account_ID where Iteration is at its maximum value. So, you'd think something like this would work
SELECT "ID", "Account_ID", MAX("Iteration")
FROM "Table_Name"
GROUP BY "Account_ID"
And I expect to get:
| ID | Account_ID | MAX(Iteration) |
|----|------------|----------------|
| 2 | 101 | 1 |
| 3 | 100 | 2 |
But when I do this, Postgres complains:
ERROR: column "ID" must appear in the GROUP BY clause or be used in an aggregate function
When I do that, it destroys the grouping altogether and gives me the whole table!
Is the following the best way to approach this?
SELECT DISTINCT ON ("Account_ID") "ID", "Account_ID", "Iteration"
FROM "Table_Name"
ORDER BY "Account_ID" ASC, "Iteration" DESC;
The GROUP BY statement aggregates rows with the same values in the columns included in the group by into a single row. Because this row isn't the same as the original row, you can't have a column that is not in the group by or in an aggregate function. To get what you want, you will probably have to select without the ID column, then join the result to the original table. I don't know PostgreSQL syntax, but I assume it would be something like the following.
SELECT t."ID", agg."Account_ID", agg."MIteration"
FROM (SELECT "Account_ID", MAX("Iteration") AS "MIteration"
      FROM "Table_Name"
      GROUP BY "Account_ID") agg
-- join back to the original table to recover the ID of the matching row
LEFT JOIN "Table_Name" t ON agg."Account_ID" = t."Account_ID" AND
                            agg."MIteration" = t."Iteration"
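As an alternative sketch, a window function can pick one row per account without joining the table to itself. This assumes the quoted, mixed-case identifiers from the question ("Table_Name", "ID", "Account_ID", "Iteration"); the alias ranked, the column rn, and the tie-break on "ID" are assumptions added for illustration.

```sql
-- Number the rows of each account from the highest iteration downwards,
-- then keep only the first row per account.
SELECT "ID", "Account_ID", "Iteration"
FROM (
    SELECT "ID", "Account_ID", "Iteration",
           row_number() OVER (PARTITION BY "Account_ID"
                              ORDER BY "Iteration" DESC, "ID") AS rn  -- "ID" breaks ties (assumption)
    FROM "Table_Name"
) ranked
WHERE rn = 1;
```

Unlike a join on the maximum value, this returns exactly one row per account even when two rows share the same maximum "Iteration".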

Find all multipolygons from one table within another

So, I've got two tables - PLUTO (pieces of land), and NYZMA (rezoning boundaries). They look like:
pluto nyzma
id | geom name | geom
-------------------- -------------------
1 | MULTIPOLYGON(x) A | MULTIPOLYGON(a)
2 | MULTIPOLYGON(y) B | MULTIPOLYGON(b)
And I want it to spit out something like this, assuming that PLUTO record 1 is in multipolygons A and B, and PLUTO record 2 is in neither:
pluto_id | nyzma_id
-------------------
1 | [A, B]
2 |
How do I, for every PLUTO record's corresponding geometry, cycle through each NYZMA record, and print the names of any whose geometry matches?
Join the two tables using the spatial function ST_Contains. Then use GROUP BY and ARRAY_AGG in the main query:
WITH subquery AS (
SELECT pluto.id, nyzma.name
FROM pluto LEFT OUTER JOIN nyzma
ON ST_Contains(nyzma.geom, pluto.geom)
)
SELECT id, array_agg(name) FROM subquery GROUP BY id;
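One detail of the LEFT OUTER JOIN approach: a PLUTO row that matches no NYZMA polygon still contributes one row with a NULL name, so array_agg returns an array containing a single NULL rather than an empty result. A possible way around this, sketched under the assumption of PostgreSQL 9.4 or later (for the FILTER clause) and PostGIS, is shown below; the output column names are taken from the desired output in the question.

```sql
-- Assumes PostgreSQL 9.4+ (aggregate FILTER clause) and PostGIS (ST_Contains).
-- Rows without a containing NYZMA polygon are filtered out of the aggregate,
-- so nyzma_id is NULL for them instead of {NULL}.
SELECT pluto.id AS pluto_id,
       array_agg(nyzma.name ORDER BY nyzma.name)
           FILTER (WHERE nyzma.name IS NOT NULL) AS nyzma_id
FROM pluto
LEFT JOIN nyzma
       ON ST_Contains(nyzma.geom, pluto.geom)
GROUP BY pluto.id
ORDER BY pluto.id;
```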

Update Count column in Postgresql

I have a single table laid out as such:
id | name | count
1 | John |
2 | Jim |
3 | John |
4 | Tim |
I need to fill out the count column such that the result is the number of times the specific name shows up in the column name.
The result should be:
id | name | count
1 | John | 2
2 | Jim | 1
3 | John | 2
4 | Tim | 1
I can get the count of occurrences of unique names easily using:
SELECT COUNT(name)
FROM table
GROUP BY name
But that doesn't fit into an UPDATE statement due to it returning multiple rows.
I can also get it narrowed down to a single row by doing this:
SELECT COUNT(name)
FROM table
WHERE name = 'John'
GROUP BY name
But that doesn't allow me to fill out the entire column, just the 'John' rows.
You can do that with a common table expression:
with counted as (
select name, count(*) as name_count
from the_table
group by name
)
update the_table
set "count" = c.name_count
from counted c
where c.name = the_table.name;
Another (slower) option would be to use a correlated subquery:
update the_table
set "count" = (select count(*)
from the_table t2
where t2.name = the_table.name);
But in general it is a bad idea to store values that can easily be calculated on the fly:
select id,
name,
count(*) over (partition by name) as name_count
from the_table;
Another method: using a derived table
UPDATE tb
SET count = t.count
FROM (
SELECT count(NAME)
,NAME
FROM tb
GROUP BY 2
) t
WHERE t.NAME = tb.NAME

T SQL Question: How to apply changes to similar rows based on one row's value

In my SQL Server 2008 DB, I have a table with records sort of like this:
ID 1 | Group1 | \ftp\path\group1\file1.txt
ID 2 | Group1 | C:\local\file\path\group1\file1.txt
ID 3 | Group1 | C:\local\file\path\group1\file1.txt
ID 4 | Group1 | C:\local\file\path\group1\file1.txt
ID 5 | Group2 | \ftp\path\group2\file1.txt
ID 6 | Group2 | C:\local\file\path\group2\file1.txt
ID 7 | Group2 | C:\local\file\path\group2\file1.txt
I need to update the table to look like this:
ID 1 | Group1 | \ftp\path\group1\file1.txt
ID 2 | Group1 | \ftp\path\group1\file1.txt
ID 3 | Group1 | \ftp\path\group1\file1.txt
ID 4 | Group1 | \ftp\path\group1\file1.txt
ID 5 | Group2 | \ftp\path\group2\file1.txt
ID 6 | Group2 | \ftp\path\group2\file1.txt
ID 7 | Group2 | \ftp\path\group2\file1.txt
I just don't know how to start this. It's easy for me to find the values in the third column, because they match this wildcard: %:\%.
So, I'm trying to replace the value in those fields that match that wildcard with the correct value in a record that does not match that wildcard. Damn, it's so hard to explain it.
I'm probably doing a poor job of explaining this issue but the right words are eluding me at the moment.
Any ideas? I appreciate the help.
This gets the results you show, but I don't think the rules I applied match the way you described how you got here. You're talking about a wildcard '%:\%' but I see nothing in any of the data that looks anything like that.
-- Table variables are declared with @, not #
DECLARE @foo TABLE
(
    ID VARCHAR(32) PRIMARY KEY,
    [Group] VARCHAR(32),
    Val VARCHAR(32)
);
INSERT @foo SELECT 'ID 1','Group1','Value 1'
UNION ALL SELECT 'ID 2','Group1','Value 2'
UNION ALL SELECT 'ID 3','Group1','Value 3'
UNION ALL SELECT 'ID 4','Group1','Value 4'
UNION ALL SELECT 'ID 5','Group2','A Different Value 1'
UNION ALL SELECT 'ID 6','Group2','A Different Value 2'
UNION ALL SELECT 'ID 7','Group2','A Different Value 3';
-- Before the update
SELECT ID, [Group], Val FROM @foo;
WITH x AS
(
    SELECT
        ID, [Group], Val,
        rn = ROW_NUMBER() OVER (PARTITION BY [Group] ORDER BY Val)
    FROM @foo
)
-- Copy the first value of each group (rn = 1) onto the group's other rows
UPDATE x
SET x.Val = y.Val
FROM x
INNER JOIN x AS y
    ON x.[Group] = y.[Group]
WHERE y.rn = 1 AND x.rn > 1;
-- After the update
SELECT ID, [Group], Val FROM @foo;
Something like this maybe?
-- IT = rows holding a local path (match '%:\%'), CT = the row with the correct value
UPDATE IT
SET IT.valueColumn = CT.valueColumn
FROM [table] AS IT
INNER JOIN [table] AS CT ON IT.[group] = CT.[group] AND CT.valueColumn NOT LIKE '%:\%'
WHERE IT.valueColumn LIKE '%:\%'
I don't have Management Studio on this machine, so I'm not sure it's syntactically correct.
Hope this helps some.