PostgreSQL: Merging sets of rows whose text fields are contained in other sets of rows - postgresql

Given the following table, I need to merge the rows of different "id"s, but only if they have the same type in "being" (person or dog), and only if the value of every field of one "id" is contained in the values of the same field of another "id".
id  being   feature  values
--  ------  -------  ----------
1   person  name     John;Paul
1   person  surname  Smith
2   dog     name     Ringo
3   dog     name     Snowy
4   person  name     John
4   person  surname
5   person  name     John;Ringo
5   person  surname  Smith
In this example, the merge results should be as follows:
1 and 4 can be merged (4's name is contained in 1's name, and 4's surname is empty).
1 and 5 cannot be merged (their name fields contain different values).
4 and 5 can be merged (for the same reason: 4's name is contained in 5's name, and 4's surname is empty).
2 and 3 (dogs) cannot be merged: they only have the field "name", and they share no values.
2 and 3 cannot be merged with 1, 4, or 5, because they have a different value in "being".
The expected result is:
id  being   feature  values
--  ------  -------  ----------
1   person  name     John;Paul
1   person  surname  Smith
2   dog     name     Ringo
3   dog     name     Snowy
5   person  name     John;Ringo
5   person  surname  Smith
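As a cross-check of the merge rule itself (not of any SQL), here is a minimal Python sketch. It assumes that values are split on ';', that a blank field counts as an empty set, that a feature one id lacks counts as empty, and that when two ids are identical the lower id is kept; none of these tie-breaks are stated in the question. Because an absorbed id adds no new values, merging it amounts to deleting its rows, which is what the expected result shows.

```python
from collections import defaultdict

# Sample table from the question: (id, being, feature, values).
# Assumption: the blank surname of id 4 is an empty string.
ROWS = [
    (1, "person", "name", "John;Paul"),
    (1, "person", "surname", "Smith"),
    (2, "dog", "name", "Ringo"),
    (3, "dog", "name", "Snowy"),
    (4, "person", "name", "John"),
    (4, "person", "surname", ""),
    (5, "person", "name", "John;Ringo"),
    (5, "person", "surname", "Smith"),
]


def split_values(text):
    """'John;Paul' -> {'John', 'Paul'}; an empty field becomes an empty set."""
    return {v for v in text.split(";") if v}


def build_entities(rows):
    """Map id -> (being, {feature: set of values})."""
    entities = {}
    for id_, being, feature, values in rows:
        if id_ not in entities:
            entities[id_] = (being, defaultdict(set))
        entities[id_][1][feature] |= split_values(values)
    return entities


def is_contained(small, big):
    """True if both have the same being and every value set of `small` is a subset of `big`'s."""
    if small[0] != big[0]:
        return False
    return all(vals <= big[1].get(feat, set()) for feat, vals in small[1].items())


def absorbed_ids(rows):
    """Ids whose values are fully contained in another id of the same being."""
    ents = build_entities(rows)
    absorbed = set()
    for a in ents:
        for b in ents:
            if a == b or not is_contained(ents[a], ents[b]):
                continue
            # Identical ids contain each other; keep the lower id in that case.
            if is_contained(ents[b], ents[a]) and a < b:
                continue
            absorbed.add(a)
    return absorbed


gone = absorbed_ids(ROWS)
merged = [row for row in ROWS if row[0] not in gone]
```

Working on sets rather than on the raw strings means that "John" inside "John;Paul" is matched as a whole value, independent of the order in which the values were written.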
I have tried this:
UPDATE table a
SET values = (SELECT array_to_string(array_agg(distinct values), ';') AS values
              FROM table b
              WHERE a.being = b.being
                AND a.feature = b.feature
                AND a.id <> b.id
                AND a.values LIKE '%' || a.values || '%'
             )
WHERE (select count(*) FROM (SELECT DISTINCT c.being, c.id from table c where a.being = c.being) as temp) > 1
;
This doesn't work well because it would merge, for example, 1 and 5. It also duplicates values when merging a field.

One option is to aggregate the names and surnames by "id" and "being". Once you have a single string per "id", a self join can find where one full name is completely contained in another (with the same "being" for both "id"s); the "id" whose full name is contained in the other is then the candidate for deletion:
WITH cte AS (
    SELECT id,
           being,
           -- ORDER BY gives every id the same field order, so the strings are comparable
           STRING_AGG(values, ';' ORDER BY feature) AS fullname
    FROM tab
    GROUP BY id,
             being
)
DELETE FROM tab
WHERE id IN (SELECT t2.id
             FROM cte t1
             INNER JOIN cte t2
                     ON t1.being = t2.being
                    AND t1.id > t2.id
                    AND t1.fullname LIKE CONCAT('%', t2.fullname, '%'));
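To see what this query does on the sample data, here is a sketch that runs it in SQLite through Python's sqlite3 module, as a stand-in for PostgreSQL. group_concat replaces STRING_AGG (without an ORDER BY, so the concatenation order is unspecified), || replaces CONCAT, and the blank surname of id 4 is assumed to be NULL.

```python
import sqlite3

# SQLite stands in for PostgreSQL here. "values" is quoted because
# VALUES is an SQL keyword.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tab (id INTEGER, being TEXT, feature TEXT, "values" TEXT)')
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [
        (1, "person", "name", "John;Paul"),
        (1, "person", "surname", "Smith"),
        (2, "dog", "name", "Ringo"),
        (3, "dog", "name", "Snowy"),
        (4, "person", "name", "John"),
        (4, "person", "surname", None),  # assumption: the blank surname is NULL
        (5, "person", "name", "John;Ringo"),
        (5, "person", "surname", "Smith"),
    ],
)
conn.execute("""
    WITH cte AS (
        SELECT id, being, group_concat("values", ';') AS fullname
        FROM tab
        GROUP BY id, being
    )
    DELETE FROM tab
    WHERE id IN (SELECT t2.id
                 FROM cte t1
                 JOIN cte t2
                   ON t1.being = t2.being
                  AND t1.id > t2.id
                  AND t1.fullname LIKE '%' || t2.fullname || '%')
""")
remaining = [row[0] for row in conn.execute("SELECT DISTINCT id FROM tab ORDER BY id")]
```

Two things are worth noting. Id 4 is deleted only because id 5 contains it: with t1.id > t2.id, the contained id must be the lower one of the pair, so the pair where id 1 contains id 4 is never checked. And LIKE is a substring test, so a name such as 'John' would also match inside 'Johnny'.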

Related

PostgreSQL - How to display a corresponding string on every entry in string_agg()?

I have 2 tables:
Employee
ID Name
1 John
2 Ben
3 Adam
Employer
ID Name
1 James
2 Rob
3 Paul
I want to string_agg() and concatenate the two tables into one record as a single column. I also want another column that shows "Employee" if the string comes from the "Employee" table and "Employer" if it comes from the "Employer" table.
Here's my code for displaying the table:
SELECT string_agg(e.Name, CHR(10)) || CHR(10) || string_agg(er.Name, CHR(10)), PERSON_STATUS
FROM Employee e, Employer er
Here's my expected output:
ID Name PERSON_STATUS
1 John Employee
Ben Employee
Adam Employee
James Employer
Rob Employer
Paul Employer
NOTE: I know this can be done by adding another column to the table, but that's not an option in this scenario. This is just an example to illustrate my problem.
Based on your sample, I'd say that you need UNION ALL rather than an aggregate:
SELECT id, name, 'Employee'::text AS person_status
FROM employee
UNION ALL
SELECT id, name, 'Employer'::text
from employer;
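A quick check of the UNION ALL query, sketched in SQLite via Python's sqlite3 module as a stand-in for PostgreSQL. The ::text cast is dropped because SQLite does not support that syntax, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER, name TEXT);
    CREATE TABLE employer (id INTEGER, name TEXT);
    INSERT INTO employee VALUES (1, 'John'), (2, 'Ben'), (3, 'Adam');
    INSERT INTO employer VALUES (1, 'James'), (2, 'Rob'), (3, 'Paul');
""")
# Each branch labels its rows with a literal, so the source table becomes a column.
rows = conn.execute("""
    SELECT id, name, 'Employee' AS person_status FROM employee
    UNION ALL
    SELECT id, name, 'Employer' FROM employer
    ORDER BY person_status, id
""").fetchall()
```

UNION ALL is the right choice here because it keeps every row as-is; plain UNION would also spend time removing duplicates.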
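Alternatively, to get everything in a single row, aggregate the union. Using the same ORDER BY in both STRING_AGG calls keeps each name next to its status:
SELECT 1 AS id,
       STRING_AGG(name, E'\r\n' ORDER BY person_status, src_id) AS name,
       STRING_AGG(person_status, E'\r\n' ORDER BY person_status, src_id) AS person_status
FROM (
    SELECT id AS src_id, name, 'Employee' AS person_status
    FROM employee
    UNION ALL
    SELECT id, name, 'Employer'
    FROM employer
) data;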
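When name and person_status are aggregated side by side, both aggregates must see the rows in the same order, or a name can end up next to the wrong status. This plain-Python sketch (lists only, no database) shows the idea: sort once by status and then by id, and build both columns from that single ordering.

```python
# (id, name, person_status) rows, as the UNION ALL subquery would produce them.
rows = [
    (1, "James", "Employer"), (2, "Rob", "Employer"), (3, "Paul", "Employer"),
    (1, "John", "Employee"), (2, "Ben", "Employee"), (3, "Adam", "Employee"),
]

# One ordering shared by both columns: 'Employee' sorts before 'Employer'.
ordered = sorted(rows, key=lambda r: (r[2], r[0]))
name_col = "\r\n".join(name for _, name, _ in ordered)
status_col = "\r\n".join(status for _, _, status in ordered)
```

Line k of name_col and line k of status_col always come from the same row, which is the alignment the expected output needs.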
OK, so first we merge our two tables into three columns. This also lets us select an arbitrary literal value, such as 'Employee', as a column.
select
"ID", -- Double quotes are necessary for capitalised identifiers
"Name",
'Employee' as "PERSON_STATUS"
from
employee
union
select
"ID",
"Name",
'Employer'
from
employer
We then subquery this and perform our string operations as required.
select
string_agg(concat(people."Name", ' ', people."PERSON_STATUS"), chr(10))
from
(
select
"ID",
"Name",
'Employee' as "PERSON_STATUS"
from
employee
union
select
"ID",
"Name",
'Employer'
from
employer
) as people
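The same subquery-then-aggregate pattern, sketched in SQLite via Python's sqlite3 as a stand-in for PostgreSQL. SQLite's group_concat replaces string_agg, || replaces concat(), and char(10) is the newline. Since no ordering is given, the line order of the result is unspecified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee ("ID" INTEGER, "Name" TEXT);
    CREATE TABLE employer ("ID" INTEGER, "Name" TEXT);
    INSERT INTO employee VALUES (1, 'John'), (2, 'Ben'), (3, 'Adam');
    INSERT INTO employer VALUES (1, 'James'), (2, 'Rob'), (3, 'Paul');
""")
# Build "Name STATUS" per row inside the aggregate, one line per person.
(combined,) = conn.execute("""
    SELECT group_concat(people."Name" || ' ' || people."PERSON_STATUS", char(10))
    FROM (
        SELECT "ID", "Name", 'Employee' AS "PERSON_STATUS" FROM employee
        UNION
        SELECT "ID", "Name", 'Employer' FROM employer
    ) AS people
""").fetchone()
```

Concatenating before aggregating yields a single column, which avoids the alignment problem of aggregating two columns separately.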

Subsetting records that contain multiple values in one column

In my Postgres table, I have two columns of interest: id and name. My goal is to keep only the records whose id has more than one value in name. In other words, I would like to keep all records of ids that have multiple values, where at least one of those values is B.
UPDATE: I have tried adding WHERE EXISTS to the queries below but this does not work
The sample data would look like this:
> test
id name
1 1 A
2 2 A
3 3 A
4 4 A
5 5 A
6 6 A
7 7 A
8 2 B
9 1 B
10 2 B
and the output would look like this:
> output
id name
1 1 A
2 2 A
8 2 B
9 1 B
10 2 B
How would one write a query to select only these kinds of records?
Based on your description you would seem to want:
select id, name
from (select t.*, min(name) over (partition by id) as min_name,
max(name) over (partition by id) as max_name
from t
) t
where min_name < max_name;
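A sketch of the window-function answer on the question's sample data, run in SQLite through Python's sqlite3 as a stand-in for PostgreSQL (window functions need SQLite 3.25 or later). Note that min_name < max_name only checks for at least two distinct names per id; it does not check that one of them is 'B'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (1, "A"), (2, "A"), (3, "A"), (4, "A"), (5, "A"), (6, "A"), (7, "A"),
    (2, "B"), (1, "B"), (2, "B"),
])
# An id has two or more distinct names exactly when its smallest and
# largest name differ.
rows = conn.execute("""
    SELECT id, name
    FROM (SELECT t.*,
                 min(name) OVER (PARTITION BY id) AS min_name,
                 max(name) OVER (PARTITION BY id) AS max_name
          FROM t) AS s
    WHERE min_name < max_name
""").fetchall()
```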
This can be done using EXISTS:
select id, name
from test t1
where exists (select *
from test t2
where t1.id = t2.id
and t1.name <> t2.name) -- this will select those with multiple names for the id
and exists (select *
from test t3
where t1.id = t3.id
and t3.name = 'B') -- this will select those with at least one b for that id
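The EXISTS version on the same data, again in SQLite via sqlite3 as a stand-in. A hypothetical id 8 with names 'A' and 'C' is added to show the second EXISTS at work: it has two distinct names but no 'B', so it is filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?)", [
    (1, "A"), (2, "A"), (3, "A"), (4, "A"), (5, "A"), (6, "A"), (7, "A"),
    (2, "B"), (1, "B"), (2, "B"),
    (8, "A"), (8, "C"),  # hypothetical extra id: two names, but no 'B'
])
rows = conn.execute("""
    SELECT id, name
    FROM test t1
    WHERE EXISTS (SELECT 1 FROM test t2            -- another name for the same id
                  WHERE t1.id = t2.id AND t1.name <> t2.name)
      AND EXISTS (SELECT 1 FROM test t3            -- at least one 'B' for the id
                  WHERE t1.id = t3.id AND t3.name = 'B')
""").fetchall()
```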
So you want the records whose id shows up with more than one name, right?
This could be formulated in "SQL" as follows:
select * from test t1
where id in (
    select id
    from test t2
    group by id
    having count(distinct name) > 1) -- distinct names, not just rows

Using a WHERE EXISTS statement for two contradictory conditions in Redshift/Postgres?

I'm looking to write a query that, given a table with an id, a first name, and a last name, returns every ID that has at least one row with first name 'Steve' and last name 'Smith', and also at least one row with first name 'Steve' and a last name other than 'Smith'. I tried the query below, but it returns 0 rows.
SELECT DISTINCT id FROM t
WHERE EXISTS (SELECT 1 FROM t WHERE first_name = 'Steve' AND last_name != 'Smith')
AND NOT EXISTS (SELECT 1 FROM t WHERE first_name = 'Steve' AND last_name = 'Smith')
I suspect it's because within a single row, both conditions cannot simultaneously be true, even though they can both be true across multiple rows for the same ID.
How should I modify or rewrite this query to return the IDs of interest?
You could rephrase this: give me all IDs where there's at least one Steve Smith, but not all of the rows are Steve Smiths.
select id
from t
group by id
having count(case when first_name = 'Steve' and last_name = 'Smith' then 1 end)
between 1 and count(last_name) - 1
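The question asks for two conditions: at least one Steve Smith, and at least one Steve whose last name is not Smith. Here is a sketch that counts each condition per id with conditional aggregation, run in SQLite via sqlite3 on hypothetical sample rows, next to the answer's query for comparison. Id 4 (a Steve Smith plus a Bob Brown) is returned by the answer's query, because not all of its rows are Steve Smiths, but not by the two-condition version, because it has no non-Smith Steve. Rows with a NULL last name never count toward last_name <> 'Smith'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [  # hypothetical rows
    (1, "Steve", "Smith"), (1, "Steve", "Jones"),  # both conditions hold
    (2, "Steve", "Smith"),                         # only a Steve Smith
    (3, "Steve", "Jones"),                         # no Steve Smith
    (4, "Steve", "Smith"), (4, "Bob", "Brown"),    # no non-Smith Steve
])

# Both conditions from the question, each counted per id.
both = [r[0] for r in conn.execute("""
    SELECT id FROM t
    GROUP BY id
    HAVING SUM(CASE WHEN first_name = 'Steve' AND last_name = 'Smith' THEN 1 ELSE 0 END) >= 1
       AND SUM(CASE WHEN first_name = 'Steve' AND last_name <> 'Smith' THEN 1 ELSE 0 END) >= 1
    ORDER BY id
""")]

# The answer's query, for comparison.
answer = [r[0] for r in conn.execute("""
    SELECT id FROM t
    GROUP BY id
    HAVING count(CASE WHEN first_name = 'Steve' AND last_name = 'Smith' THEN 1 END)
           BETWEEN 1 AND count(last_name) - 1
    ORDER BY id
""")]
```

Grouping by id is what makes the two conditions compatible: each one is evaluated across all rows of the id rather than within a single row.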

Limit for inner Join Table

I have a scenario where I am joining three tables and getting the results.
My problem is that I have to apply a limit to the joined table.
Take the example below: I have three tables, 1) Books, 2) Customer, and 3) Authors. I need to find the list of books sold today with the author and customer names, but I only need the last N customers for each book, not all of them, for the book IDs I pass in.
Books
Id  Name  AID
--  ----  ---
1
2
3

Customer
Id  BID  Name  Date
--  ---  ----  ----
1   1    ABC
2   1    CED
3   2    DFG

Authors
AID  Name
---  ----
1    A1
2    A2
How can we achieve this?
You are looking for LATERAL.
Sample:
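SELECT B.Id, A.Name AS author, C.Name AS customer
FROM Books B
JOIN Authors A ON A.AID = B.AID
CROSS JOIN LATERAL (SELECT Customer.Name
                    FROM Customer
                    WHERE Customer.BID = B.Id AND Customer.Date = CURRENT_DATE  -- filter before LIMIT
                    ORDER BY Customer.Id DESC
                    LIMIT N) C                  -- N: how many of the latest customers per book
WHERE B.Id = ANY(ids);                          -- ids: array of book ids passed in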
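SQLite has no LATERAL, so this sketch (run through Python's sqlite3) swaps in a different technique with the same effect: number each book's customers with ROW_NUMBER() and keep the first n. The book names, author links and sale dates are hypothetical, since the question leaves them blank, and last_customers is a hypothetical helper. As in the LATERAL version, the date filter is applied before the rows are numbered, so the limit counts only that day's sales.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Books (Id INTEGER, Name TEXT, AID INTEGER);
    CREATE TABLE Customer (Id INTEGER, BID INTEGER, Name TEXT, Date TEXT);
    CREATE TABLE Authors (AID INTEGER, Name TEXT);
    -- Hypothetical book names, author links and dates; the question leaves them blank.
    INSERT INTO Books VALUES (1, 'Book 1', 1), (2, 'Book 2', 2), (3, 'Book 3', 1);
    INSERT INTO Customer VALUES
        (1, 1, 'ABC', '2024-01-01'),
        (2, 1, 'CED', '2024-01-01'),
        (3, 2, 'DFG', '2024-01-01');
    INSERT INTO Authors VALUES (1, 'A1'), (2, 'A2');
""")


def last_customers(book_ids, day, n):
    """Last n customers (highest Customer.Id) per book sold on `day`."""
    placeholders = ",".join("?" * len(book_ids))
    sql = f"""
        SELECT Id, Author, Customer FROM (
            SELECT B.Id, A.Name AS Author, C.Name AS Customer,
                   ROW_NUMBER() OVER (PARTITION BY B.Id ORDER BY C.Id DESC) AS rn
            FROM Books B
            JOIN Authors A ON A.AID = B.AID
            JOIN Customer C ON C.BID = B.Id AND C.Date = ?
            WHERE B.Id IN ({placeholders})
        ) AS ranked
        WHERE rn <= ?
        ORDER BY Id, rn
    """
    # Parameters bind in textual order: the date, then the ids, then n.
    return conn.execute(sql, [day, *book_ids, n]).fetchall()
```

For example, last_customers([1, 2], "2024-01-01", 1) keeps only the most recent customer of each book. The same ROW_NUMBER() query also runs in PostgreSQL, but LATERAL with LIMIT is usually the more direct way to express it there.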

How to get records from table based on two other tables JPA

I'm trying to do something simple in JPA.
I have a table Businesses:
BusinessId name
------------ ------
1 Joe
2 bob
And table Products:
productID name
------------ ------
1 Pen
2 paper
Because they are related as many-to-many, I created another table businessesHasProductID:
BusinessId productID
------------ -----------
1 1
1 2
2 2
Now I want to select BusinessId and productID from businessesHasProductID where the name of the business is 'x' and the name of the product is 'y'.
I built the tables and then created the entity classes (with the NetBeans wizard). I know how to get the rows of "Businesses" where Businesses.name = 'x', and I know how to get the rows of "Products" where Products.name = 'y', but I want to combine these results and get the IDs.
I tried to do :
Query query = getEntityManager().createQuery("
SELECT b FROM businessesHasProductID WHERE b.BusinessId IN
(SELECT t0.BusinessId FROM Businesses t0 WHERE t0.BusinessId = 'x')
AND b.productID IN
(SELECT t1.productID FROM Products t1 WHERE t1.productID = 'y')
");
That didn't work. It complains that the IN clause contains invalid data.
If I understand correctly, you want to get all the [bId, pId] tuples that exist in the join table and for which the name of the business identified by bId is 'x' and the name of the product identified by pId is 'y'.
If so, the following query should do what you want:
select business.businessId, product.productId
from Business business
inner join business.products product
where business.name = 'x'
and product.name = 'y'
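For comparison, here is a plain-SQL sketch of the join that this JPQL query expresses, run in SQLite via Python's sqlite3 on the question's tables. It is an assumption about the shape of the query, not the SQL that any particular JPA provider generates, and ids_for is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Businesses (BusinessId INTEGER, name TEXT);
    CREATE TABLE Products (productID INTEGER, name TEXT);
    CREATE TABLE businessesHasProductID (BusinessId INTEGER, productID INTEGER);
    INSERT INTO Businesses VALUES (1, 'Joe'), (2, 'bob');
    INSERT INTO Products VALUES (1, 'Pen'), (2, 'paper');
    INSERT INTO businessesHasProductID VALUES (1, 1), (1, 2), (2, 2);
""")


def ids_for(business_name, product_name):
    """[(BusinessId, productID)] pairs from the join table, filtered by name."""
    return conn.execute("""
        SELECT bp.BusinessId, bp.productID
        FROM businessesHasProductID bp
        JOIN Businesses b ON b.BusinessId = bp.BusinessId  -- resolve the business name
        JOIN Products p ON p.productID = bp.productID      -- resolve the product name
        WHERE b.name = ? AND p.name = ?
    """, (business_name, product_name)).fetchall()
```

The filters go on the names, not on the ids; comparing an id with 'x', as in the original attempt, is what makes the IN subqueries fail.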