Preserve the order of distinct inside string_agg - postgresql

My SQL function:
with recursive locpais as (
select l.id, l.nome, l.tipo tid, lp.pai
from loc l
left join locpai lp on lp.loc = l.id
where l.id = 12554
union
select l.id, l.nome, l.tipo tid, lp.pai
from loc l
left join locpai lp on lp.loc = l.id
join locpais p on (l.id = p.pai)
)
select * from locpais
gives me
12554 | PARNA Pico da Neblina | 9 | 1564
12554 | PARNA Pico da Neblina | 9 | 1547
1547 | São Gabriel da Cachoeira | 8 | 1400
1564 | Santa Isabel do Rio Negro | 8 | 1400
1400 | RIO NEGRO | 7 | 908
908 | NORTE AMAZONENSE | 6 | 234
234 | Amazonas | 5 | 229
229 | Norte | 4 | 30
30 | Brasil | 3 |
which is a hierarchy of places. "PARNA" stands for "National Park", and this one covers two cities: São Gabriel da Cachoeira and Santa Isabel do Rio Negro. Thus it's appearing twice.
If I change the last line to
select string_agg(nome,', ') from locpais
I get
"PARNA Pico da Neblina, PARNA Pico da Neblina, São Gabriel da
Cachoeira, Santa Isabel do Rio Negro, RIO NEGRO, NORTE AMAZONENSE,
Amazonas, Norte, Brasil"
Which is almost fine, except for the double "PARNA Pico da Neblina". So I tried:
select string_agg(distinct nome, ', ') from locpais
but now I get
"Amazonas, Brasil, Norte, NORTE AMAZONENSE, PARNA Pico da Neblina, RIO
NEGRO, Santa Isabel do Rio Negro, São Gabriel da Cachoeira"
Which is out of order. I'm trying to add an order by inside the string_agg, but couldn't make it work yet. The table definitions were given here.

As you've found out, you cannot combine DISTINCT and ORDER BY if you don't order by the distinct expression first:
neither in aggregates:
If DISTINCT is specified in addition to an order_by_clause, then all the ORDER BY expressions must match regular arguments of the aggregate; that is, you cannot sort on an expression that is not included in the DISTINCT list.
nor in SELECT:
The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).
However, you could use something like
array_to_string(arry_uniq_stable(array_agg(nome ORDER BY tid DESC)), ', ')
with the help of a function arry_uniq_stable that removes duplicates from an array without altering its order, like the one I gave as an example in https://stackoverflow.com/a/42399297/5805552
Please take care to use an ORDER BY expression that actually gives you a deterministic result. With the example you have given, tid alone would not be enough, as there are duplicate values (8) with different nome.
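For illustration, here is a minimal sketch of what an order-preserving de-duplication function could look like. It is a hypothetical stand-in written for this example, not necessarily the implementation from the linked answer, and it assumes PostgreSQL 9.4 or later for WITH ORDINALITY.

```sql
-- Hypothetical stand-in for arry_uniq_stable: keeps the first occurrence
-- of every element and preserves the original order of those occurrences.
CREATE OR REPLACE FUNCTION arry_uniq_stable(anyarray)
RETURNS anyarray
LANGUAGE sql IMMUTABLE AS
$$
    SELECT array_agg(elem ORDER BY first_pos)
    FROM (
        -- pos is the 1-based position of each element in the input array
        SELECT elem, min(pos) AS first_pos
        FROM unnest($1) WITH ORDINALITY AS t(elem, pos)
        GROUP BY elem
    ) AS firsts
$$;

-- Usage: duplicates are dropped, first-seen order is kept.
SELECT array_to_string(
           arry_uniq_stable(ARRAY['b', 'a', 'b', 'c', 'a']),
           ', ');
-- returns 'b, a, c'
```

Note that this sketch returns NULL for an empty input array, because array_agg over zero rows yields NULL.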

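-- tid has to be part of the DISTINCT list to be usable for ordering;
-- nome is added as a tie-breaker so the result is deterministic
select string_agg(nome, ', ' order by tid desc, nome)
from (
select distinct nome, tid
from locpais
) s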

Related

How to find which posts have the highest comments and which posts have the fewest comments?

I am very new to PostgreSQL, SQL and databases, and I hope you can help me with this. I want to know which posts have the most comments and which have the fewest, and the users need to be shown too.
CREATE SCHEMA perf_demo;
SET search_path TO perf_demo;
-- Tables
CREATE TABLE users(
id SERIAL -- PRIMARY KEY
, email VARCHAR(40) NOT NULL UNIQUE
);
CREATE TABLE posts(
id SERIAL -- PRIMARY KEY
, user_id INTEGER NOT NULL -- REFERENCES users(id)
, title VARCHAR(100) NOT NULL UNIQUE
);
CREATE TABLE comments(
id SERIAL -- PRIMARY KEY
, user_id INTEGER NOT NULL -- REFERENCES users(id)
, post_id INTEGER NOT NULL -- REFERENCES posts(id)
, body VARCHAR(500) NOT NULL
);
-- Generate approx. N users
-- Note: NULL values might lead to fewer rows than the N value.
INSERT INTO users(email)
WITH query AS (
SELECT 'user_' || seq || '#'
|| ( CASE (random() * 5)::INT
WHEN 0 THEN 'my'
WHEN 1 THEN 'your'
WHEN 2 THEN 'his'
WHEN 3 THEN 'her'
WHEN 4 THEN 'our'
END )
|| '.mail' AS email
FROM generate_series(1, 5) seq -- Important: Replace N with a useful value
)
SELECT email
FROM query
WHERE email IS NOT NULL;
-- Generate N posts
INSERT INTO posts(user_id, title)
WITH expanded AS (
SELECT random(), seq, u.id AS user_id
FROM generate_series(1, 8) seq, users u -- Important: Replace N with a useful value
),
shuffled AS (
SELECT e.*
FROM expanded e
INNER JOIN (
SELECT ei.seq, min(ei.random) FROM expanded ei GROUP BY ei.seq
) em ON (e.seq = em.seq AND e.random = em.min)
ORDER BY e.seq
)
-- Top 20 programming languages: https://www.tiobe.com/tiobe-index/
SELECT s.user_id,
'Let''s talk about (' || s.seq || ') '
|| ( CASE (random() * 19 + 1)::INT
WHEN 1 THEN 'C'
WHEN 2 THEN 'Python'
WHEN 3 THEN 'Java'
WHEN 4 THEN 'C++'
WHEN 5 THEN 'C#'
WHEN 6 THEN 'Visual Basic'
WHEN 7 THEN 'JavaScript'
WHEN 8 THEN 'Assembly language'
WHEN 9 THEN 'PHP'
WHEN 10 THEN 'SQL'
WHEN 11 THEN 'Ruby'
WHEN 12 THEN 'Classic Visual Basic'
WHEN 13 THEN 'R'
WHEN 14 THEN 'Groovy'
WHEN 15 THEN 'MATLAB'
WHEN 16 THEN 'Go'
WHEN 17 THEN 'Delphi/Object Pascal'
WHEN 18 THEN 'Swift'
WHEN 19 THEN 'Perl'
WHEN 20 THEN 'Fortran'
END ) AS title
FROM shuffled s;
-- Generate N comments
-- Note: The cross-join is a performance killer.
-- Try the SELECT without INSERT with small N values to get an estimation of the execution time.
-- With these values you can extrapolate the execution time for a bigger N value.
INSERT INTO comments(user_id, post_id, body)
WITH expanded AS (
SELECT random(), seq, u.id AS user_id, p.id AS post_id
FROM generate_series(1, 10) seq, users u, posts p -- Important: Replace N with a useful value
),
shuffled AS (
SELECT e.*
FROM expanded e
INNER JOIN ( SELECT ei.seq, min(ei.random) FROM expanded ei GROUP BY ei.seq ) em ON (e.seq = em.seq AND e.random = em.min)
ORDER BY e.seq
)
SELECT s.user_id, s.post_id, 'Here some comment: ' || md5(random()::text) AS body
FROM shuffled s;
Could someone please show me how this could be done? I am new to SQL/Postgres, so any help would be much appreciated. An example would be very helpful too.
Good effort in pasting the whole dataset creation procedure; that is exactly what needs to be included to make the example reproducible.
Let's start with how to join several tables: your posts table contains the user_id, which we can use to join with users as follows.
SELECT email,
users.id user_id,
posts.id post_id,
title
from posts join users
on posts.user_id=users.id;
This will list the posts together with the authors. Check the joining condition (after the ON) stating the fields we're using. The result should be similar to the below
email | user_id | post_id | title
------------------+---------+---------+----------------------------------------
user_1#her.mail | 1 | 5 | Let's talk about (5) Visual Basic
user_1#her.mail | 1 | 2 | Let's talk about (2) Assembly language
user_3#her.mail | 3 | 8 | Let's talk about (8) R
user_3#her.mail | 3 | 7 | Let's talk about (7) Perl
user_4#her.mail | 4 | 6 | Let's talk about (6) Visual Basic
user_5#your.mail | 5 | 4 | Let's talk about (4) R
user_5#your.mail | 5 | 3 | Let's talk about (3) C
user_5#your.mail | 5 | 1 | Let's talk about (1) Ruby
(8 rows)
Now it's time to join this result with the comments table. Since a post may or may not have comments, and you want to show all posts even if they have none, you should use a LEFT OUTER JOIN (more info about join types here).
So let's rewrite the above to include comments
SELECT email,
users.id user_id,
posts.id post_id,
title,
comments.body
from posts
join users
on posts.user_id=users.id
left outer join comments
on posts.id = comments.post_id
;
Check out the join between posts and comments based on post_id.
The result of the query is the list of posts, related author and comments, similar to the below
email | user_id | post_id | title | body
------------------+---------+---------+----------------------------------------+-----------------------------------------------------
user_1#her.mail | 1 | 5 | Let's talk about (5) Visual Basic |
user_1#her.mail | 1 | 2 | Let's talk about (2) Assembly language |
user_3#her.mail | 3 | 8 | Let's talk about (8) R | Here some comment: 200bb07acfbac893aed60e018b47b92b
user_3#her.mail | 3 | 8 | Let's talk about (8) R | Here some comment: 66159adaed11404b1c88ca23b6a689ef
user_3#her.mail | 3 | 8 | Let's talk about (8) R | Here some comment: e5cc1f7c10bb6103053bf281d3cadb60
user_3#her.mail | 3 | 8 | Let's talk about (8) R | Here some comment: 5ae8674c2ef819af0b1a93398efd9418
user_3#her.mail | 3 | 7 | Let's talk about (7) Perl | Here some comment: 5b818da691c1570dcf732ed8f6b718b3
user_3#her.mail | 3 | 7 | Let's talk about (7) Perl | Here some comment: 88a990e9495841f8ed628cdce576a766
user_4#her.mail | 4 | 6 | Let's talk about (6) Visual Basic |
user_5#your.mail | 5 | 4 | Let's talk about (4) R | Here some comment: ed19bb476eb220d6618e224a0ac2910d
user_5#your.mail | 5 | 3 | Let's talk about (3) C | Here some comment: 23cd43836a44aeba47ad212985f210a7
user_5#your.mail | 5 | 1 | Let's talk about (1) Ruby | Here some comment: b83999120bd2bb09d71aa0c6c83a05dd
user_5#your.mail | 5 | 1 | Let's talk about (1) Ruby | Here some comment: b4895f4e0aa0e0106b5d3834af80275e
(13 rows)
Now you can start aggregating and counting the comments for each post. PostgreSQL offers several aggregate functions; we'll use COUNT here.
SELECT email,
users.id user_id,
posts.id post_id,
title,
count(comments.id) nr_comments
from posts
join users
on posts.user_id=users.id
left outer join comments
on posts.id = comments.post_id
group by email,
users.id,
posts.id,
title
;
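Note that we count the comments.id field rather than using count(*): because of the LEFT OUTER JOIN, a post without comments still produces one row (with NULL comment columns), so count(*) would report 1 for it, whereas count(comments.id) ignores NULLs and reports 0. Also note that we group the results by email, users.id, posts.id and title, the columns we show alongside the count.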
The result should be similar to
email | user_id | post_id | title | nr_comments
------------------+---------+---------+----------------------------------------+-------------
user_3#her.mail | 3 | 7 | Let's talk about (7) Perl | 2
user_5#your.mail | 5 | 3 | Let's talk about (3) C | 1
user_5#your.mail | 5 | 1 | Let's talk about (1) Ruby | 2
user_3#her.mail | 3 | 8 | Let's talk about (8) R | 4
user_1#her.mail | 1 | 5 | Let's talk about (5) Visual Basic | 0
user_5#your.mail | 5 | 4 | Let's talk about (4) R | 1
user_4#her.mail | 4 | 6 | Let's talk about (6) Visual Basic | 0
user_1#her.mail | 1 | 2 | Let's talk about (2) Assembly language | 0
(8 rows)
This should be the result you're looking for. To see the posts with the most comments first, append ORDER BY nr_comments DESC (or ASC for the fewest). Just bear in mind that you're showing the user from users who wrote the post, not the one who commented. To view who commented, you'll need to change the joining conditions.
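To narrow the counts down to only the posts with the most and the fewest comments, one possible approach is to wrap the aggregation in a CTE and compare against its maximum and minimum. This is a sketch against the users, posts and comments tables from the question; the CTE name counts is made up for the example.

```sql
-- Sketch: keep only the posts tied for the highest or the lowest comment count.
WITH counts AS (
    SELECT u.email,
           u.id        AS user_id,
           p.id        AS post_id,
           p.title,
           count(c.id) AS nr_comments   -- 0 for posts without comments
    FROM posts p
    JOIN users u ON u.id = p.user_id
    LEFT OUTER JOIN comments c ON c.post_id = p.id
    GROUP BY u.email, u.id, p.id, p.title
)
SELECT *
FROM counts
WHERE nr_comments = (SELECT max(nr_comments) FROM counts)
   OR nr_comments = (SELECT min(nr_comments) FROM counts)
ORDER BY nr_comments DESC, post_id;
```

If several posts tie for the highest or lowest count, all of them are returned.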

How to get dynamic number of columns in Postgresql crosstab

I'm new to the postgresql crosstab function and have tried out a few solutions here on SO but still stuck. So basically I have a query that result in an output like the one below:
|student_name|subject_name|marks|
|------------|------------|-----|
|John Doe |ENGLISH |65 |
|John Doe |MATHEMATICS |72 |
|Mary Jane |ENGLISH |74 |
|Mary Jane |MATHEMATICS |70 |
|------------|------------|-----|
And the output I'm aiming for with crosstab is:
|student_name| ENGLISH | MATHEMATICS |
|------------|---------|-------------|
|John Doe | 65 | 72 |
|Mary Jane | 74 | 70 |
|------------|---------|-------------|
My query that returns the first table (without crosstab) is:
SELECT student_name, subject_name, sum(marks) as marks FROM (
SELECT student_id, student_name, class_name, exam_type, subject_name, total_mark as marks, total_grade_weight as out_of, percentage, grade, sort_order
FROM(
SELECT student_id, student_name, class_name, exam_type, subject_name, total_mark, total_grade_weight, ceil(total_mark::float/total_grade_weight::float*100) as percentage,
(select grade from app.grading where (total_mark::float/total_grade_weight::float)*100 between min_mark and max_mark) as grade, sort_order
FROM (
SELECT --big query with lots of JOINS
) q ORDER BY sort_order
)v GROUP BY v.student_id, v.student_name, v.class_name, v.exam_type, v.subject_name, v.total_mark, v.total_grade_weight, v.percentage, v.grade, v.sort_order
ORDER BY student_name ASC, sort_order ASC
)a
GROUP BY student_name, subject_name
ORDER BY student_name
And for the crosstab, this is where I get stuck with the columns.
SELECT * FROM
crosstab(' //the query above here ',
$$VALUES ('MATHEMATICS'::text), ('marks')$$
) AS ct
(student_name text, subject_name character varying, marks numeric);
If I run it as shown above, this is what I end up with:
|student_name|subject_name|marks|
|------------|------------|-----|
|John Doe | 65 | |
|Mary Jane | 74 | |
|____________|____________|_____|
As in it says subject_name not ENGLISH or MATHEMATICS. Obviously now I see I don't need the marks column but how can I get it to pull all the subject names as the column names? They could be two, they could be 12.
Solved it, but I would have preferred a much more dynamic solution. I replaced this:
$$VALUES ('MATHEMATICS'::text), ('marks')$$
with this:
'SELECT subject_name FROM app.subjects WHERE ... ORDER BY ...'
The downside to my solution is that the last part changes to
(student_name text, english bigint, mathematics bigint, physics bigint, biology bigint, chemistry bigint, history bigint, ...);
That is, I have to list all the subjects manually and exactly in the order in which the above select returns them. I don't find this very convenient, but it works.
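As a self-contained illustration of the two-argument crosstab form described above, here is a sketch using a hypothetical student_marks table in place of the big query. It assumes the tablefunc extension can be installed. The output column list still has to be written out by hand; the last statement shows one possible workaround that returns the subjects as keys of a single jsonb column instead of as separate columns.

```sql
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Hypothetical sample data standing in for the result of the big query.
CREATE TEMP TABLE student_marks (
    student_name text,
    subject_name text,
    marks        numeric
);
INSERT INTO student_marks VALUES
    ('John Doe',  'ENGLISH',     65),
    ('John Doe',  'MATHEMATICS', 72),
    ('Mary Jane', 'ENGLISH',     74),
    ('Mary Jane', 'MATHEMATICS', 70);

-- First argument: row name, category, value (ordered by row name).
-- Second argument: the list of categories, one per output column.
SELECT *
FROM crosstab(
    $$ SELECT student_name, subject_name, marks
       FROM student_marks ORDER BY 1, 2 $$,
    $$ SELECT DISTINCT subject_name
       FROM student_marks ORDER BY 1 $$
) AS ct (student_name text, "ENGLISH" numeric, "MATHEMATICS" numeric);

-- Alternative without a fixed column list: one jsonb object per student.
SELECT student_name,
       jsonb_object_agg(subject_name, marks) AS marks_by_subject
FROM student_marks
GROUP BY student_name
ORDER BY student_name;
```

The jsonb variant works for any number of subjects, at the cost of leaving it to the client to turn the keys into columns.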

GROUP BY name and ORDER BY point & time MYSQLi

I'm new to this forum and I hope to find a solution to my problem.
I have this table :
name | time | points | car | date
Daniel | 55s | 210 | red |20/01/2018
Daniel | 45s | 250 | green |21/01/2018
Julie | 54s | 220 | red |19/01/2018
Julie | 33s | 150 | yellow|22/01/2018
and I wish to sort it like this
name | time | points | car | date
Daniel | 45s | 250 |green |21/01/2018
Julie | 54s | 220 |red |19/01/2018
that is, sorted first by points, then by time, and grouped by name (optionally with the count).
I use this
SELECT NAME, MAX(POINTS) POINTS, MAX(TIME) TIME, MAX(CAR) CAR, MAX(DATE) DATE
FROM ( SELECT A.* FROM test A LEFT OUTER JOIN test B ON A.NAME=B.NAME AND
A.POINTS<B.POINTS AND A.TIME>B.TIME WHERE B.NAME IS NULL ) as sub GROUP BY NAME
and I get this :
name POINTS TIME CAR DATE
Daniel 250 45 green 2018-01-21
Julie 220 54 yellow 2018-01-22
Julie should have car=red and date 2018-01-19.
For Daniel it looks good.
How can I get these values (car and date)?
Thanks
Nico
You could give this a shot. It joins the table against itself and keeps only the records with the highest points and lowest time.
SELECT NAME, MAX(POINTS) POINTS, MAX(TIME) TIME
FROM
(
SELECT A.* FROM test A
LEFT OUTER JOIN test B ON A.NAME=B.NAME AND A.POINTS<B.POINTS AND A.TIME>B.TIME
WHERE B.NAME IS NULL
) sub GROUP BY NAME
Alternatively, try this:
SELECT * FROM
(
SELECT OUTERTEST.*,
@row_num := IF(@prev_value=OUTERTEST.name,@row_num+1,1) AS RowNumber,
@prev_value := OUTERTEST.name
FROM (SELECT * FROM TEST ORDER BY NAME, TEST.POINTS DESC, TEST.TIME ASC) OUTERTEST, (SELECT @row_num := 1, @prev_value := '') x
) A
WHERE A.ROWNUMBER=1
I ran more tests with this table
id name time points
1 Daniel 55 1140
2 Judie 54 1144
3 Judie 33 1028
4 Daniel 45 1180
5 Judie 53 1148
I ran this query
SELECT NAME, MAX(POINTS) POINTS, sub.TIME FROM (SELECT * FROM Testpoint ORDER BY POINTS DESC, TIME ASC) AS sub
GROUP BY sub.name
I get the max points for each name, but the time is not the right one
name POINTS time
Daniel 1180 55
Judie 1148 54
Judie should have 53 for time and not 54.
What did I do wrong?
Thank you
Nico
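Regarding the follow-up above: when a column is neither aggregated nor listed in GROUP BY, MySQL is free to pick its value from any row of the group, and the ORDER BY inside the derived table is not guaranteed to influence that choice. One way to get the whole best row per name is a window function. This is a sketch that assumes MySQL 8.0 or later and the Testpoint table shown above; the aliases ranked and rn are made up for the example.

```sql
-- Sketch: pick, for every name, the row with the highest points
-- (ties broken by the lowest time). Requires MySQL 8.0+.
SELECT name, points, time
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY points DESC, time ASC) AS rn
    FROM Testpoint t
) ranked
WHERE rn = 1;
```

With the sample rows above this yields Daniel with 1180 points and time 45, and Judie with 1148 points and time 53. Because the complete row is selected, further columns such as car and date can simply be added to the outer select list.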

Combining three very similar queries? (Postgres)

So I have three queries. I'm trying to combine them all into one query. Here they are with their outputs:
Query 1:
SELECT distinct on (name) name, count(distinct board_id)
FROM tablea
INNER JOIN table_b on tablea.id = table_b.id
GROUP BY name
ORDER BY name ASC
Output:
A | 15
B | 26
C | 24
D | 11
E | 31
F | 32
G | 16
Query 2:
SELECT distinct on (name) name, count(board_id) as total
FROM tablea
INNER JOIN table_b on tablea.id = table_b.id
GROUP BY 1, board_id
ORDER BY name, total DESC
Output:
A | 435
B | 246
C | 611
D | 121
E | 436
F | 723
G | 293
Finally, the last query:
SELECT distinct on (name) name, count(board_id) as total
FROM tablea
INNER JOIN table_b on tablea.id = table_b.id
GROUP BY 1
ORDER BY name, total DESC
Output:
A | 14667
B | 65123
C | 87426
D | 55198
E | 80612
F | 31485
G | 43392
Is it possible to format it to be like this:
A | 15 | 435 | 14667
B | 26 | 246 | 65123
C | 24 | 611 | 87426
D | 11 | 121 | 55198
E | 31 | 436 | 80612
F | 32 | 723 | 31485
G | 16 | 293 | 43392
EDIT:
With @Clodoaldo Neto's help, I combined the first and the third queries with this:
SELECT name, count(distinct board_id), count(board_id) as total
FROM tablea
INNER JOIN table_b on tablea.id = table_b.id
GROUP BY 1
ORDER BY name ASC
The only thing preventing me from combining the second query with this new one is the GROUP BY clause needing board_id to be in it. Any thoughts from here?
This is hard to get right without test data. But here is my try:
with s as (
select name, grouping(name, board_id) as grp,
count(distinct board_id) as dist_total,
count(*) as name_total,
count(*) as name_board_total
from
tablea
inner join
table_b on tablea.id = table_b.id
group by grouping sets ((name), (name, board_id))
)
select name, dist_total, name_total, name_board_total
from
(
select name, dist_total, name_total
from s
where grp = 1
) r
inner join
(
select name, max(name_board_total) as name_board_total
from s
where grp = 0
group by name
) q using (name)
order by name
https://www.postgresql.org/docs/current/static/queries-table-expressions.html#QUERIES-GROUPING-SETS
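If grouping sets are not available, a two-level aggregation might be another way to get all three numbers in one pass. This is an untested sketch under the same assumptions as the question (tablea and table_b joined on id, with name and board_id being unambiguous column names); the alias per_board and the output column names are made up.

```sql
-- Inner level: one row per (name, board_id) with its row count.
-- Outer level: derive all three figures per name from those rows.
SELECT name,
       count(DISTINCT board_id) AS dist_total,        -- query 1
       max(board_total)         AS name_board_total,  -- query 2: largest single board
       sum(board_total)         AS name_total         -- query 3: all rows for the name
FROM (
    SELECT name, board_id, count(board_id) AS board_total
    FROM tablea
    INNER JOIN table_b ON tablea.id = table_b.id
    GROUP BY name, board_id
) per_board
GROUP BY name
ORDER BY name;
```

The max() in the outer query plays the role of the DISTINCT ON (name) ... ORDER BY name, total DESC in the second query, which keeps the highest per-board count for each name.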

Grouped LIMIT 10 in Postgresql

I have a query:
select
a.kli,
b.term_desc,
count(distinct(a.adic)) as count,
a.partner_id
from
ad_delivery.sgmt_kli_adic a
join wand.wandterms b on a.kli = b.term_code
join wand.wandterms c on b.term_desc=c.term_desc
join dwh.sgmt_clients e on a.partner_id::varchar = e.partner_id
join dwh.schema_names f on e.partner_id::integer = f.partner_id::integer
where
a.partner_id::integer in (f.partner_id)
and c.class_code = 969
group by a.partner_id, b.term_desc, a.kli
order by partner_id, count desc;
which brings back counts for certain terms per partner_id. I want to show the top 10 terms for each of the ~40 partner_id values, ordered by count descending.
The query results look like
db=# SELECT * FROM xxx;
pid | term_desc | count
----+------------+------
4 | termdesc1 | 3434
4 | termdesc2 | 235
4 | termdesc3 | 367
4 | termdesc4 | 4533
5 | termdesc1 | 235
5 | termdesc2 | 567
5 | termdesc3 | 344
5 | termdesc4 | 56
(10k+ rows)
You could add a rank column and then filter the result by the rank in an outer query:
select kli, term_desc, count, partner_id
from (
select
a.kli,
b.term_desc,
count(distinct(a.adic)) as count,
a.partner_id,
RANK() OVER (PARTITION BY a.partner_id ORDER BY count(distinct(a.adic)) DESC) AS r
from
ad_delivery.sgmt_kli_adic a
join wand.wandterms b on a.kli = b.term_code
join wand.wandterms c on b.term_desc=c.term_desc
join dwh.sgmt_clients e on a.partner_id::varchar = e.partner_id
join dwh.schema_names f on e.partner_id::integer = f.partner_id::integer
where
a.partner_id::integer in (f.partner_id)
and c.class_code = 969
group by a.partner_id, b.term_desc, a.kli
) ranked
where r < 11
order by partner_id, count desc;
I have not tested the code; however, the trick is to rank each row of the GROUP BY within its partner_id by count, and then filter the result set in an outer query, keeping only rows with a rank lower than 11. A window function cannot be referenced in the HAVING (or WHERE) clause of the same query level, hence the subquery. You get 10 items per group, or a few more if counts tie at the cutoff; use ROW_NUMBER() instead of RANK() if you need exactly 10.