show records that have only one matchin row in another table - postgresql

I need to write a sql code that probably is very simple but I am very new to it.
I need to find all the records from one table that have matching id (but no more than one) from the other table. eg. one table contains records of the employees and the second one with employees' telephone numbers. i need to find all employees with only one telephone no

Sample data would be nice. In absence of:
SELECT
employees.employee_id
FROM
employees
LEFT JOIN
(SELECT distinct on(employee_id) employee_id FROM emp_phone) AS phone
ON
employees.employee_id = phone.employee_id
WHERE
phone.employee_id IS NOT NULL;

You need a join of the 2 tables, group by employee and the condition in the having clause:
SELECT e.employee_id, e.name
FROM employees e INNER JOIN numbers n
ON e.employee_id = n.employee_id
GROUP BY e.employee_id, e.name
HAVING COUNT(*) = 1;

If there can be more than a few numbers per employee in the table with the employees' telephone numbers (calling it tel), then it's cheaper to avoid GROUP BY and HAVING which has to process all rows. Find employees with "unique" numbers using a self-anti-join with NOT EXISTS.
While you don't need more than the employee_id and their unique phone number, you don't even have to involve the employee table at all:
SELECT *
FROM tel t
WHERE NOT EXISTS (
SELECT FROM tel
WHERE employee_id = t.employee_id
AND tel_number <> t.tel_number -- or use PK column
);
If you need additional columns from the employee table:
SELECT * -- or any columns you need
FROM (
SELECT employee_id AS id, tel_number -- or any columns you need
FROM tel t
WHERE NOT EXISTS (
SELECT FROM tel
WHERE employee_id = t.employee_id
AND tel_number <> t.tel_number -- or use PK column
)
) t
JOIN employee e USING (id);
The column alias in the subquery (employee_id AS id) is just for convenience. Then the outer join condition can be USING (id), and the ID column is only included once in the result, even with SELECT * ...
Simpler with a smart naming convention that uses employee_id for the employee ID everywhere. But it's a widespread anti-pattern to use employee.id instead.
Related:
JOIN table if condition is satisfied, else perform no join

Related

How to update duplicate rows in a table n postgresql

I have created synthetic data for a typical call center.
Below is the screenshot of the table I have created.
Table 1:
Problem statement: Since this is completely random data, I noticed that there are some customers who are being assigned to the same agents whenever they call again.
So using this query I was able to test such a case and count the number of times agents are being repeated for each customer.
select agentid, customerid, count(customerid) from aa_dev.calls group by agentid, customerid having count(customerid) > 1 ;
Table 2
I have a separate agents table to called aa_dev.agents in which the agent's ids are stored
Now I want to replace the agentid for such cases, such that if agentid is repeated 6 times for a single customer then 5 of the times the agent id should be updated with any other agentid from the table but call time shouldn't be overlapping That means the agent we are replacing with should not be busy on the time the call is going one.
I have assigned row numbers to each repeated ones.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY agentid, customerid ORDER BY random()) rn,
COUNT(*) OVER (PARTITION BY agentid, customerid) cnt
FROM aa_dev.calls
)
SELECT agentid, customerid, rn
FROM cte
WHERE cnt > 1;
This way I could visualize the repetition clearly.
So I don't want to update row 1 but the rest.
Is there any way I can acheive this? Can I use the row number and write a query according to the row number to update rownum 2 onwards row one by one with each row having a unique agent?
If you don't want duplicates in your artificial data, it's probably better to not generate them.
But if you already have a table with duplicates and want to work on the duplicates, either updating them or deleting, here is the easy way:
You need a unique ID for each updated row. If you don't have it,
add it temporarily. Then you can use this pattern to update all duplicates
except the first one:
To add artificial id column to preexisting table, use:
ALTER TABLE calls ADD id serial;
In my case I generated a test table with 100 random rows:
CREATE TEMP TABLE calls (id serial, agentid int, customerid int);
INSERT INTO calls (agentid, customerid)
SELECT (random()*10)::int, (random()*10)::int
FROM generate_series(1, 100) n;
Define what constitutes a duplicate and find duplicates in data:
SELECT agentid, customerid, count(*), array_agg(id) id
FROM calls
GROUP BY 1,2 HAVING count(*)>1
ORDER BY 1,2;
Update all the duplicate rows except first one with NULLs:
UPDATE calls SET agentid = whatever_needed
FROM (
SELECT array_agg(id) id, min(id) idmin FROM calls
GROUP BY agentid, customerid HAVING count(*)>1
) AS dup
WHERE calls.id = ANY(dup.id) AND calls.id <> dup.idmin;
Alternatively, remove all duplicates except first one:
DELETE FROM calls
USING (
SELECT array_agg(id) id, min(id) idmin FROM calls
GROUP BY agentid, customerid HAVING count(*)>1
) AS dup
WHERE calls.id = ANY(dup.id) AND calls.id <> dup.idmin;

Add Column in table with value partition by group

My table is somethingg like
CREATE TABLE table1
(
_id text,
name text,
data_type int,
data_value int,
data_date timestamp -- insertion time
);
Now due to a system bug, many duplicate entries are created and I need to remove those duplicated and keep only unique entries excluding data_date because it is a system generated date.
My query to do that is something like:
DELETE FROM table1 A
USING ( SELECT _id, name, data_type, data_value, MIN(data_date) min_date
FROM table1
GROUP BY _id, name, data_type, data_value
HAVING count(data_date) > 1) B
WHERE A._id = B._id
AND A.name = B.name
AND A.data_type = B.data_type
AND A.data_value = B.data_value
AND A.data_date != B.min_date;
However this query works, having millions of records in the table, I want a faster way for it. My idea is to create a new column with value as partition by [_id, name, data_type, data_value] or columns which are in group by. However, I could not find the way to create such column.
I would appretiate if any one may suggest a way to create such column.
Edit 1:
There is another thing to add, I don't want to use CTE or subquery for updating this new column because it will be same as my existing query.
The best way is simply creating a new table without duplicated records:
CREATE...
SELECT _id, name, data_type, data_value, MIN(data_date) min_date
FROM table1
GROUP BY _id, name, data_type, data_value;
Alternatively, you can create a rank and then filter, but a subquery is needed.
RANK() OVER (PARTITION BY your_variables ORDER BY data_date ASC) r
And then filter r=1.

Returning rows with distinct column value with data jpa named query

Assuming I have a table with 3 columns, ID, Name, City and I want to use named query to return rows with unique city..can it be done?
Are you asking whether it is possible to write a query that will return the cities that appear in exactly one row, in a table that has ID/Name/City triplets where there could be multiple rows for the same city but with different names?
If so, it would depend on the database engine behind the scenes - but you could try things like:
with candidates (city, num) as (
select city, count(*) from table
group by city
)
select city from candidates where num = 1
Or
select t1.city from table t1
where not exists (
select * from table t2
where t2.city = t1.city and t2.id <> t1.id
)
where table is your table with these triplets.

Query-Sql Developer

I am creating some queries for my project, but I face some difficulties with the follow ones:
A SELECT statement containing a subquery to retrieve a list of Locations (location id and street_address) that have employees with higher salary than the average of their department. The list must contain the number of those employees and their total salary per location. Name these aggregates respectively "emp" and "totalsalary". The locations in the list must be ordered by location_id.
Select LOCATION_ID, STREET_ADDRESS
from HR.LOCATIONS IN
(Select Employee_id
from HR.Employees
Where Salary > round(avg(SALARY)))
order by location_id;
error: SQL command not properly ended
and the second query is the following
The JOB_HISTORY table can contain more than one entries for an employee who was hired more than once. Create a query to retrieve a list of Employees that were hired more than once. Include the columns EMPLOYEE_ID, LAST_NAME, FIRST_NAME and the aggregate "Times Hired".
SELECT FIRST_NAME,LAST_NAME,EMPLOYEE_ID,
count (*)as TIMES_HIRED
from HR.JOB_HISTORY, HR.EMPLOYEES
where EMPLOYEE_ID= LAST_NAME
having COUNT(*) >1;
error: not a single-group
Try these hope they help. I am making an assumption that employee table has Location_Id column. I am adding Employee_id to Group by to make sure you get correct TotalSalary:
Select LOCATION_ID, STREET_ADDRESS, Count(Employee_id) AS emp, SUM(salary) AS totalsalary
from HR.LOCATIONS INNER JOIN
(Select Employee_id, salary
from HR.Employees
Having Salary > round(avg(SALARY), 0)) AS Emp ON HR.LOCATION_ID = Emp.Location_ID
Group By LOCATION_ID, STREET_ADDRESS, Employee_id
order by location_id;
For the second question:
SELECT FIRST_NAME,LAST_NAME,EMPLOYEE_ID,
count(Employee_id) as TIMES_HIRED
from HR.JOB_HISTORY inner join HR.EMPLOYEES On JOB_HISTORY.Employee_id = Employees.Employee_id
Group By FIRST_NAME,LAST_NAME,EMPLOYEE_ID
Having count(Employee_id) >1;

Removing duplicate rows from relation

I have the following code which produces a relation:
SELECT book_id, shipments.customer_id
FROM shipments
LEFT JOIN editions ON (shipments.isbn = editions.isbn)
LEFT JOIN customers ON (shipments.customer_id = customers.customer_id)
In this relation, there are customer_ids as well as book_ids of books they have bought. My goal is to create a relation with each book in it and then how many unique customers bought it. I assume one way to achieve this is to eliminate all duplicate rows in the relation and then counting the instances of each book_id.
So my question is: How can I delete all duplicate rows from this relation?
Thanks!
EDIT: So what I mean is that I want all the rows in the relation to be unique. If there are three identical rows for example, two of them should be removed.
This will give you all the {customer,edition} pairs for which an order exists:
SELECT *
FROM customers c
JOIN editions e ON (
SELECT * FROM shipments s
WHERE s.isbn = e.isbn
AND s.customer_id = c.customer_id
);
The duplicates are in table shipments. You can remove these with a DISTINCT clause and then count them in an outer query GROUP BY isbn:
SELECT isbn, count(customer_id) AS unique_buyers
FROM (
SELECT DISTINCT isbn, customer_id FROM shipments) book_buyer
GROUP BY isbn;
If you want a list of all books, even where no purchases were made, you should LEFT JOIN the above to the list of all books:
SELECT isbn, coalesce(unique_buyers, 0) AS books_sold_to_unique_buyers
FROM editions
LEFT JOIN (
SELECT isbn, count(customer_id) AS unique_buyers
FROM (
SELECT DISTINCT isbn, customer_id FROM shipments) book_buyer
GROUP BY isbn) books_bought USING (isbn)
ORDER BY isbn;
You can write this more succinctly by joining before counting:
SELECT isbn, count(customer_id) AS books_sold_to_unique_buyers
FROM editions
LEFT JOIN (
SELECT DISTINCT isbn, customer_id FROM shipments) book_buyer USING (isbn)
GROUP BY isbn
ORDER BY isbn;