PostgreSQL distinct and group on different fields

With the following query I can get the list of project members from the memberships table, UNIONed with the project owners (who may not have an entry in the memberships table):
select sub.user, sub.project, sub.role, sub.order, sub.name from
(SELECT
memberships."user",
memberships.project,
memberships.role,
roles."order",
roles.name
FROM memberships
JOIN roles ON roles.id = memberships.role
UNION
SELECT projects.owner AS "user",
projects.id AS project,
1 AS role,
0 AS "order",
'admin'::text AS name
FROM projects
) as sub
The above query yields the following result set.
8 2 1 0 "admin"
8 1 3 2 "contributor" (added through memberships table)
6 1 1 0 "admin"
8 4 1 0 "admin"
8 1 1 0 "admin" (duplicate because user #8 is the owner of project #1)
Now I want to remove the duplicate entries by keeping the row that has the lowest order. Using distinct on (sub.order) does not return all the rows:
select distinct on (sub.order) * from
-- the same subquery
order by sub.order
The above yields
8 2 1 0 "admin"
8 1 3 2 "contributor"
Using group by sub.user, sub.project and aggregating min(sub.order) works, but the other two fields, role and name, are left out:
select sub.user, sub.project, min(sub.order) from
-- the same subquery
group by sub.user, sub.project
I want the role, name and order of the row that has the minimum order when grouped with user, project pair

DISTINCT ON must list the "grouping" columns. The ORDER BY clause must then start with those same columns, followed by the column(s) that decide which row of each group is kept.
You probably want:
select distinct on (t.user, t.project) *
from (
-- the same subquery --
) t
order by t.user, t.project, t.order
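As a minimal sketch, the same query can be tried on the five rows from the question by replacing the subquery with a VALUES list. The inline CTE named sub is only a stand-in for the real subquery, not part of the original schema.

```sql
-- Stand-in for the subquery: the five rows shown in the question.
WITH sub("user", project, role, "order", name) AS (
    VALUES
        (8, 2, 1, 0, 'admin'),
        (8, 1, 3, 2, 'contributor'),
        (6, 1, 1, 0, 'admin'),
        (8, 4, 1, 0, 'admin'),
        (8, 1, 1, 0, 'admin')
)
-- One row per (user, project); within each pair the lowest "order" sorts first and is kept.
SELECT DISTINCT ON (sub."user", sub.project) *
FROM sub
ORDER BY sub."user", sub.project, sub."order";
```

With this data each (user, project) pair appears once, and for user 8 in project 1 the row that is kept is the admin row with order 0.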

Related

PostgreSQL: Merging sets of rows which text fields are contained in other sets of rows

Given the following table, I need to merge the rows of different "id"s only if they are of the same type (person or dog), and only if the value of every field of one "id" is contained in the values of the other "id".
id | being | feature | values
1 | person | name | John;Paul
1 | person | surname | Smith
2 | dog | name | Ringo
3 | dog | name | Snowy
4 | person | name | John
4 | person | surname |
5 | person | name | John;Ringo
5 | person | surname | Smith
In this example, the merge results should be as follows:
1 and 4 can be merged (since 4's name is present in 1's name and 4's surname is empty)
1 and 5 cannot be merged (the name field shows different values)
4 and 5 can be merged
2 and 3 (dogs) cannot be merged. They have only the field "name" and they do not share values.
2 and 3 cannot be merged with 1, 4, 5 since they have different values in "being".
id | being | feature | values
1 | person | name | John;Paul
1 | person | surname | Smith
2 | dog | name | Ringo
3 | dog | name | Snowy
5 | person | name | John;Ringo
5 | person | surname | Smith
I have tried this:
UPDATE table a
SET values = (SELECT array_to_string(array_agg(distinct values),';') AS values FROM table b
WHERE a.being= b.being
AND a.feature= b.feature
AND a.id<> b.id
AND a.values LIKE '%'||a.values||'%'
)
WHERE (select count (*) FROM (SELECT DISTINCT c.being, c.id from table c where a.being=c.being) as temp) >1
;
This doesn't work well because it will merge, for example, 1 and 5. Besides, it duplicates values when merging that field.
One option is to aggregate the name and surname values per "id" and "being". Once you have a single string per "id", a self join can find the cases where one full name is completely contained in another (with the same "being" for both ids). The id with the shorter, contained full name is then the candidate for deletion:
WITH cte AS (
SELECT id,
being,
STRING_AGG(values, ';') AS fullname
FROM tab
GROUP BY id,
being
)
DELETE FROM tab
WHERE id IN (SELECT t2.id
FROM cte t1
INNER JOIN cte t2
ON t1.being = t2.being
AND t1.id > t2.id
AND t1.fullname LIKE CONCAT('%',t2.fullname,'%'));

How can I join table to delete duplicates entry in PostgreSQL

I currently have this query, which works:
DELETE FROM "WU_MatchingUsers"
WHERE "id" IN (
    SELECT "id"
    FROM (
        SELECT "id",
               ROW_NUMBER() OVER (PARTITION BY "IDWU_User1", "IDWU_User2" ORDER BY "id" ASC) AS row_num
        FROM "WU_MatchingUsers"
    ) t
    WHERE t.row_num > 1
);
This deletes all duplicate entries in "WU_MatchingUsers", keeping the row with the lowest id for each user pair. Now I have another table, "WU_UsersSpheres", which associates Sphere ids with user ids.
I would like my query to filter and delete only users from a specific Sphere.
"WU_UsersSpheres" looks like this:
id | idSpheres | IDUser
1 | 1 | 1
2 | 1 | 2
3 | 2 | 3
4 | 2 | 4
5 | 2 | 5
So the goal is to delete only the duplicate matchings whose users belong to a specific Sphere.
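One possible sketch, under the assumption that a matching counts as belonging to a Sphere only when both "IDWU_User1" and "IDWU_User2" are listed for it in "WU_UsersSpheres". The Sphere id 1 is a hypothetical example value, and the column names "idSpheres" and "IDUser" are taken from the table layout above.

```sql
DELETE FROM "WU_MatchingUsers"
WHERE "id" IN (
    SELECT t."id"
    FROM (
        SELECT m."id",
               ROW_NUMBER() OVER (
                   PARTITION BY m."IDWU_User1", m."IDWU_User2"
                   ORDER BY m."id" ASC
               ) AS row_num
        FROM "WU_MatchingUsers" m
        -- Assumption: both users of the matching must be in Sphere 1.
        WHERE EXISTS (SELECT 1
                      FROM "WU_UsersSpheres" s
                      WHERE s."IDUser" = m."IDWU_User1"
                        AND s."idSpheres" = 1)
          AND EXISTS (SELECT 1
                      FROM "WU_UsersSpheres" s
                      WHERE s."IDUser" = m."IDWU_User2"
                        AND s."idSpheres" = 1)
    ) t
    WHERE t.row_num > 1   -- keep the lowest id of each pair, delete the rest
);
```

If only one of the two users needs to be in the Sphere, the AND between the two EXISTS conditions would become an OR.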

how to list records that conform to a sequentially incrementing id in postgres

Is there a way to select records whose ids are sequentially incremented?
For example, given the list of records
id 0
id 1
id 3
id 4
id 5
id 8
a command like:
select id incrementally from 3
Will return values 3,4 and 5. It won't return 8 because it's not sequentially incrementing from 5.
WITH groups AS ( -- 2
SELECT
*,
id - row_number() OVER (ORDER BY id) as group_id -- 1
FROM mytable
)
SELECT
*
FROM groups
WHERE group_id = ( -- 4
SELECT group_id FROM groups WHERE id = 3 -- 3
)
1. The row_number() window function creates a consecutive row count. Subtracting it from id gives a value that stays constant within each run of consecutive records (id values increasing by 1), so it can serve as a group identifier.
2. The query is put into a WITH clause because its result is used twice in the next steps.
3. Select the group_id of the row you want to start from (here id = 3).
4. Filter the table for this group.
Additionally: if you want the output to start at id = 4, for example, you need to add an AND id >= 4 filter to the WHERE clause.
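To make the grouping trick concrete, here is a self-contained sketch that runs the same query against the six ids from the question. The inline mytable CTE is only a stand-in for the real table.

```sql
WITH mytable(id) AS (
    VALUES (0), (1), (3), (4), (5), (8)
),
groups AS (
    SELECT id,
           -- row_number() is 1..6, so the difference is -1, -1, 0, 0, 0, 2
           id - row_number() OVER (ORDER BY id) AS group_id
    FROM mytable
)
SELECT id, group_id
FROM groups
WHERE group_id = (SELECT group_id FROM groups WHERE id = 3)
ORDER BY id;
```

The ids 3, 4 and 5 share group_id 0 and are returned; id 8 has group_id 2 and is filtered out.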

Postgres - Using window function in grouped rows

The Postgres documentation at https://www.postgresql.org/docs/9.4/queries-table-expressions.html#QUERIES-WINDOW states:
If the query contains any window functions (...), these functions are evaluated after any grouping, aggregation, and HAVING filtering is performed. That is, if the query uses any aggregates, GROUP BY, or HAVING, then the rows seen by the window functions are the group rows instead of the original table rows from FROM/WHERE.
I don't understand the part "then the rows seen by the window functions are the group rows instead of the original table rows from FROM/WHERE". Let me use an example to explain my question.
Take this ready-to-run example:
with cte as (
select 1 as primary_id, 1 as foreign_id, 10 as begins
union
select 2 as primary_id, 1 as foreign_id, 20 as begins
union
select 3 as primary_id, 1 as foreign_id, 30 as begins
union
select 4 as primary_id, 2 as foreign_id, 40 as begins
)
select foreign_id, count(*) over () as window_rows_count, count(*) as grouped_rows_count
from cte
group by foreign_id
You may notice that the result is
foreign_id | window_rows_count | grouped_rows_count
1 | 2 | 3
2 | 2 | 1
So if "the rows seen by the window functions are the group rows", why is window_rows_count returning a different value from grouped_rows_count?
If you remove the window function from the query:
select foreign_id, count(*) as grouped_rows_count
from cte
group by foreign_id
the result, as expected is this:
> foreign_id | grouped_rows_count
> ---------: | -----------------:
> 1 | 3
> 2 | 1
On this result, which has 2 rows, the window function count(*) over () returns 2: because the over clause is empty, with no partition, it counts all the rows of the result set.
You should follow the last comment on your post.
For further analysis, you can run the following query:
with cte as (
select 1 as primary_id, 1 as foreign_id, 10 as begins
union
select 2 as primary_id, 1 as foreign_id, 20 as begins
union
select 3 as primary_id, 1 as foreign_id, 30 as begins
union
select 4 as primary_id, 2 as foreign_id, 40 as begins
)
select foreign_id, count(*) over (PARTITION BY foreign_id) as window_rows_count, count(*) as grouped_rows_count
from cte
group by foreign_id ;
This time window_rows_count is 1 for each foreign_id, because each partition contains exactly one grouped row.
Check out the PostgreSQL documentation at this URL:
https://www.postgresql.org/docs/13/tutorial-window.html
The window function is applied to the whole set obtained by the former query.
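A further sketch that may help: because window functions run over the grouped rows, a window function can also take an aggregate as its argument. The total_source_rows column below is a name introduced here for illustration only.

```sql
WITH cte(primary_id, foreign_id, begins) AS (
    VALUES (1, 1, 10), (2, 1, 20), (3, 1, 30), (4, 2, 40)
)
SELECT foreign_id,
       count(*)              AS grouped_rows_count,  -- rows per group: 3 and 1
       count(*) OVER ()      AS window_rows_count,   -- number of group rows: 2
       sum(count(*)) OVER () AS total_source_rows    -- sum of the group counts: 4
FROM cte
GROUP BY foreign_id
ORDER BY foreign_id;
```

Here count(*) OVER () sees the two group rows, while sum(count(*)) OVER () adds up the per-group counts and so recovers the number of original rows.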

How to normalize group by count results?

How can the results of a "group by" count be normalized by the count's sum?
For example, given:
User Rating (1-5)
----------------------
1 3
1 4
1 2
3 5
4 3
3 2
2 3
The result will be:
User Count Percentage
---------------------------
1 3 .42 (=3/7)
2 1 .14 (=1/7)
3 2 .28 (...)
4 1 .14
So for each user the number of ratings they provided is given as the percentage of the total ratings provided by everyone.
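-- "ratings" is a placeholder table name; "user" is quoted because it is a reserved word
SELECT DISTINCT ON ("user") "user", count(*) OVER (PARTITION BY "user") AS cnt,
count(*) OVER (PARTITION BY "user")::numeric / count(*) OVER () AS percentage -- cast avoids integer division
FROM ratings;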
The count(*) OVER (PARTITION BY user) is a so-called window function. Window functions let you perform some operation over a "window" created by some "partition" which is here made over the user id. In plain and simple English: the partitioned count(*) is calculated for each distinct user value, so in effect it counts the number of rows for each user value.
Without using a window function or variables, you need to cross join the grouped subquery with a second subquery that sums the counts, then select from the combined result.
SELECT
B.UserID,
B.UserCount,
C.CountAll
FROM
(
-- C: a single row holding the total of all per-user counts
SELECT
SUM(UserCount) AS CountAll
FROM
(
SELECT
COUNT(*) AS UserCount
FROM
MyTable
GROUP BY
UserID
) AS A
) AS C
CROSS JOIN (
-- B: one row per user with that user's count
SELECT
UserID,
COUNT(*) AS UserCount
FROM
MyTable
GROUP BY
UserID
) AS B
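For comparison, a shorter sketch that combines GROUP BY with a window function over the aggregate, using the seven ratings from the question. The ratings CTE and the column names user_id and rating are assumptions made for this example.

```sql
WITH ratings(user_id, rating) AS (
    VALUES (1, 3), (1, 4), (1, 2), (3, 5), (4, 3), (3, 2), (2, 3)
)
SELECT user_id,
       count(*) AS cnt,
       -- sum(count(*)) OVER () is the total over all groups (7 here);
       -- the cast to numeric avoids integer division.
       round(count(*)::numeric / sum(count(*)) OVER (), 2) AS percentage
FROM ratings
GROUP BY user_id
ORDER BY user_id;
```

Note that round() gives 0.43 and 0.29 for users 1 and 3, whereas the figures in the question (.42 and .28) are truncated.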