Get one record from left join if count is 1, else null

I have a table Employee with primary key Employee_id, and another table of email addresses that can hold multiple email addresses for one employee. I need to get all employees from Employee with this logic: if exactly one email address exists, it is pulled from the "Email" table; if more than one exists, return null.
I tried the query below and it is not working. Can anyone please help? All your inputs are greatly appreciated.
SELECT employee_id,
CASE WHEN count(adr.email_addr) =1 and count(dupes.email_addr)=1 and adr.email_addr=dupes.email_addr
then adr.email_addr
else null END email_address
FROM Employee Emp
LEFT OUTER JOIN
(SELECT employee_id, email_addr, count(*) qty
FROM Emailadress
HAVING count(*) > 1)
dupes
ON EMP.employee_id = dupes.employee_id
group by emp.employee_id, adr.email_addr
I am using the Impala interface to execute it.
Thanks,
Manju
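A minimal sketch of one way the logic described above could be written. It is not from the thread: it uses Python's sqlite3 module as a stand-in for Impala, the sample rows are invented, and the alias single_email and the function names are hypothetical. The idea is to pre-aggregate the email table down to employees with exactly one address, then LEFT JOIN that result, so every other employee gets NULL.

```python
import sqlite3

def single_email_per_employee(conn):
    """Return (employee_id, email_addr); email_addr is None unless exactly one exists."""
    query = """
        SELECT emp.employee_id, single_email.email_addr
        FROM Employee emp
        LEFT JOIN (
            -- one row per employee that has exactly one address;
            -- MIN() just picks that single address out of the group
            SELECT employee_id, MIN(email_addr) AS email_addr
            FROM Emailadress
            GROUP BY employee_id
            HAVING COUNT(*) = 1
        ) single_email ON single_email.employee_id = emp.employee_id
        ORDER BY emp.employee_id
    """
    return conn.execute(query).fetchall()

def demo():
    # Invented sample data: employee 1 has one address, 2 has two, 3 has none.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Employee (employee_id INTEGER PRIMARY KEY);
        CREATE TABLE Emailadress (employee_id INTEGER, email_addr TEXT);
        INSERT INTO Employee VALUES (1), (2), (3);
        INSERT INTO Emailadress VALUES
            (1, 'a@example.com'),
            (2, 'b1@example.com'), (2, 'b2@example.com');
    """)
    return single_email_per_employee(conn)
```

With this shape, employees with no address at all also come back with NULL, because the LEFT JOIN finds no matching row for them.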

Related

How can I improve this query in PostgreSQL? It's been running for more than 48 hours already

I have the following query, which I'm running against a PostgreSQL database with more than 10M entries in table account_message and 1M entries in table message.
The PostgreSQL version is PostgreSQL 11.12, compiled by Visual C++ build 1914, 64-bit.
Is there any way to make this query faster? It has been running for more than 2 days and has not finished yet.
DELETE FROM account_message WHERE message_id in
(SELECT t2.id FROM message t2 WHERE NOT EXISTS
(SELECT 1 FROM customer t1 WHERE
t1.username = t2.username));
Table account_message has the following columns:
id (bigint)(primary key)
user_id (bigint)
message_id (bigint)
isRead (boolean)
isDeleted (boolean)
Table message has the following columns:
id (bigint)(primary key)
username (character varying)(255)
text (character varying)(10000)
details(character varying)(1000)
status(integer)
Table customer has the following columns:
username (character varying)(255)(primary key)
type(character varying)(500)
details(character varying)(10000)
status(integer)
active(boolean)
This did the trick for me and also makes it much faster.
DELETE FROM account_message WHERE message_id IN (
SELECT m.id FROM message m
LEFT JOIN customer c ON m.username = c.username
WHERE c.username IS NULL LIMIT 1000)
You may be able to improve this by
getting rid of your dependent subquery, and
doing it in batches.
Try this to get a batch of one thousand message ids to delete. LEFT JOIN ... WHERE col IS NULL is a way to write WHERE NOT EXISTS without a dependent subquery.
SELECT m.id
FROM message m
LEFT JOIN customer c ON m.username = c.username
WHERE c.username IS NULL
LIMIT 1000
Then, use the subquery in a statement. Repeat the statement until it deletes no rows.
DELETE
FROM account_message
WHERE message_id IN (
SELECT m.id
FROM message m
LEFT JOIN customer c ON m.username = c.username
WHERE c.username IS NULL
LIMIT 1000)
Doing this in batches of 1000 helps performance: it splits your operation into multiple reasonably sized database transactions.
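A hedged sketch of the "repeat until it deletes no rows" loop, using Python's sqlite3 as a stand-in for PostgreSQL with invented sample rows. One caveat worth checking first: the messages themselves are never deleted, so a subquery that limits on message ids can keep returning the same 1000 ids, delete nothing on the second pass, and stop the loop early. The variation below therefore batches on account_message.id instead, so each batch only selects rows that still exist. The function names and the batch_size parameter are hypothetical.

```python
import sqlite3

# Variation on the statement above: select account_message rows (not message ids),
# so a batch can never be made up entirely of already-cleaned messages.
BATCH_DELETE = """
    DELETE FROM account_message
    WHERE id IN (
        SELECT am.id
        FROM account_message am
        JOIN message m ON m.id = am.message_id
        LEFT JOIN customer c ON c.username = m.username
        WHERE c.username IS NULL
        LIMIT ?)
"""

def delete_orphans_in_batches(conn, batch_size=1000):
    """Run the batch delete until a pass removes nothing; return total rows removed."""
    total = 0
    while True:
        cur = conn.execute(BATCH_DELETE, (batch_size,))
        conn.commit()  # one transaction per batch
        if cur.rowcount == 0:
            return total
        total += cur.rowcount

def demo():
    # Invented data: only 'alice' is a customer, so messages 2 and 3 are orphans.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (username TEXT PRIMARY KEY);
        CREATE TABLE message (id INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE account_message (
            id INTEGER PRIMARY KEY, user_id INTEGER, message_id INTEGER);
        INSERT INTO customer VALUES ('alice');
        INSERT INTO message VALUES (1, 'alice'), (2, 'ghost'), (3, 'gone');
        INSERT INTO account_message VALUES
            (1, 10, 1), (2, 10, 2), (3, 11, 2), (4, 10, 3), (5, 12, 3);
    """)
    deleted = delete_orphans_in_batches(conn, batch_size=2)
    remaining = conn.execute("SELECT COUNT(*) FROM account_message").fetchone()[0]
    return deleted, remaining
```

On a real table, an index on account_message(message_id) would likely matter more than the batch size, but that is an assumption to verify with EXPLAIN.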
First, try to optimize the select inside the brackets. Something like:
DELETE FROM account_message WHERE message_id in
(
select t2.id from message t2
left join customer t1 on (t1.username = t2.username)
where t1.username is NULL
)

How do I make my RANK () OVER query work in select?

I have this table that I need to sort in the following way:
need to rank Departments by Salary;
need to show if Salary = NULL - 'No data to be shown' message
need to add total salary paid to the department
need to count people in the department
SELECT RANK() OVER (
ORDER BY Salary DESC
)
,CASE
WHEN Salary IS NULL
THEN 'NO DATA TO BE SHOWN'
ELSE Salary
,Count(Fname)
,Total(Salary) FROM dbo.Employees
I get an error saying:
Column 'dbo.Employees.Salary' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Why so?
Column 'dbo.Employees.Salary' is invalid in the select list because it
is not contained in either an aggregate function or the GROUP BY
clause.
Why so?
The aggregate functions return a single value for the whole table, so it doesn't make sense to SELECT a plain column alongside them. Say you have a students table and apply Sum(marks) to the whole table, and you also select studentname in the same query. Which student's name should the database engine return? It is ambiguous, which is why the column must either be aggregated or appear in GROUP BY.
Column "invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause"
I tried this, using an inner query:
SELECT RANK() OVER (ORDER BY SAL DESC) AS SAL_RANK, EMP_COUNT, DEPARTMENT,
CASE
WHEN SAL IS NULL THEN 'NO DATA TO BE SHOWN'
ELSE CAST(SAL AS VARCHAR(20)) -- both CASE branches must be strings
END AS TOTAL_SALARY
FROM
(SELECT COUNT(FNAME) EMP_COUNT, SUM(SALARY) SAL, DEPARTMENT
FROM TESTEMPLOYEE
GROUP BY DEPARTMENT) t
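To check the aggregate-then-rank idea end to end, here is a small sketch using Python's sqlite3 in place of SQL Server (window functions need SQLite 3.25 or newer). The table contents, the lower-case column names, and the helper name are invented for illustration. The inner query aggregates per department first, and the outer query ranks those totals, which avoids mixing plain columns with aggregates.

```python
import sqlite3

RANK_QUERY = """
    SELECT RANK() OVER (ORDER BY t.total_salary DESC) AS salary_rank,
           t.department,
           t.headcount,
           -- numbers are cast to text so both outcomes have the same type
           COALESCE(CAST(t.total_salary AS TEXT), 'NO DATA TO BE SHOWN') AS shown_salary
    FROM (
        SELECT department,
               COUNT(fname) AS headcount,
               SUM(salary)  AS total_salary   -- NULL when every salary is NULL
        FROM employees
        GROUP BY department
    ) t
    ORDER BY salary_rank, t.department
"""

def ranked_departments():
    # Invented sample data; Ops has no salary recorded at all.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employees (fname TEXT, department TEXT, salary INTEGER);
        INSERT INTO employees VALUES
            ('Ann', 'IT', 5000), ('Bob', 'IT', 4000),
            ('Cid', 'HR', 3000), ('Dee', 'Ops', NULL);
    """)
    return conn.execute(RANK_QUERY).fetchall()
```

In SQLite, NULL totals sort last under ORDER BY ... DESC; SQL Server treats NULL as the lowest value as well, but it is worth confirming the ordering on the actual engine.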

Show records that have only one matching row in another table

I need to write SQL that is probably very simple, but I am very new to it.
I need to find all the records from one table that have exactly one matching id in another table. E.g. one table contains employee records and the second one contains the employees' telephone numbers. I need to find all employees with only one telephone number.
Sample data would be nice. In its absence:
SELECT
employees.employee_id
FROM
employees
LEFT JOIN
(SELECT employee_id FROM emp_phone GROUP BY employee_id HAVING count(*) = 1) AS phone -- keep only employees with exactly one phone row
ON
employees.employee_id = phone.employee_id
WHERE
phone.employee_id IS NOT NULL;
You need a join of the 2 tables, group by employee and the condition in the having clause:
SELECT e.employee_id, e.name
FROM employees e INNER JOIN numbers n
ON e.employee_id = n.employee_id
GROUP BY e.employee_id, e.name
HAVING COUNT(*) = 1;
If there can be more than a few numbers per employee in the table with the employees' telephone numbers (calling it tel), then it's cheaper to avoid GROUP BY and HAVING which has to process all rows. Find employees with "unique" numbers using a self-anti-join with NOT EXISTS.
While you don't need more than the employee_id and their unique phone number, you don't even have to involve the employee table at all:
SELECT *
FROM tel t
WHERE NOT EXISTS (
SELECT FROM tel
WHERE employee_id = t.employee_id
AND tel_number <> t.tel_number -- or use PK column
);
If you need additional columns from the employee table:
SELECT * -- or any columns you need
FROM (
SELECT employee_id AS id, tel_number -- or any columns you need
FROM tel t
WHERE NOT EXISTS (
SELECT FROM tel
WHERE employee_id = t.employee_id
AND tel_number <> t.tel_number -- or use PK column
)
) t
JOIN employee e USING (id);
The column alias in the subquery (employee_id AS id) is just for convenience. Then the outer join condition can be USING (id), and the ID column is only included once in the result, even with SELECT * ...
Simpler with a smart naming convention that uses employee_id for the employee ID everywhere. But it's a widespread anti-pattern to use employee.id instead.
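As a quick sanity check, the GROUP BY / HAVING form and the NOT EXISTS form can be run side by side. This is only a sketch: it uses Python's sqlite3 rather than PostgreSQL, the rows are invented, and since SQLite does not accept an empty select list, the inner query is written as SELECT 1 instead of SELECT FROM.

```python
import sqlite3

HAVING_QUERY = """
    SELECT e.employee_id
    FROM employees e
    JOIN tel t ON t.employee_id = e.employee_id
    GROUP BY e.employee_id
    HAVING COUNT(*) = 1
    ORDER BY e.employee_id
"""

NOT_EXISTS_QUERY = """
    SELECT t.employee_id
    FROM tel t
    WHERE NOT EXISTS (
        SELECT 1 FROM tel t2
        WHERE t2.employee_id = t.employee_id
          AND t2.tel_number <> t.tel_number)  -- any *other* number for this employee
    ORDER BY t.employee_id
"""

def compare_queries():
    # Invented data: employee 1 has one number, 2 has two, 3 has none.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE tel (employee_id INTEGER, tel_number TEXT);
        INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
        INSERT INTO tel VALUES (1, '111'), (2, '221'), (2, '222');
    """)
    by_having = conn.execute(HAVING_QUERY).fetchall()
    by_not_exists = conn.execute(NOT_EXISTS_QUERY).fetchall()
    return by_having, by_not_exists
```

The two forms only diverge if the same number is stored twice for one employee: COUNT(*) sees two rows, while the <> comparison sees no different number. Comparing on a primary key column instead, as the comment in the answer suggests, avoids that.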
Related:
JOIN table if condition is satisfied, else perform no join

Find all companies where all employees are in a specific state

I have a table employees with columns:
company_id,
id,
opt_state (ceased_membership, ignition, opted_out, opted_in),
opt_out_on.
I want to query all companies where every employee's opt_state is in ('ceased_membership', 'ignition', 'opted_out'), together with the opt_out_on date when the last employee left.
I have tried this but it didn't work:
select company_id from employees where id=all(select id from
employees
where opt_state in ('ceased_membership', 'ignition','opted_out')
Then I wrote this query below, which worked very well and gave me the resolution I was looking for. However, I'd like to ask here if this can be done differently, more elegantly.
SELECT
e.company_id
, max_opt_out
FROM (
SELECT DISTINCT
company_id
, count(id)
OVER (
PARTITION BY company_id ) opt_out
FROM employees
WHERE opt_state IN ('ceased_membership', 'ignition', 'opted_out')) e
LEFT JOIN (
SELECT
company_id
, count(id) opt_in
, max(opt_out_on) max_opt_out
FROM employees
GROUP BY company_id) S
ON e.company_id = s.company_id
WHERE e.opt_out = s.opt_in;
This seems like a good time to use the HAVING clause
SELECT company_id, max(opt_out_on)
FROM employees e
GROUP BY company_id
HAVING bool_and( opt_state in ('ceased_membership', 'ignition','opted_out'));
HAVING is a bit like WHERE, but the condition applies to whole groups.
bool_and is an aggregate function that is true only when the condition is true for every record in the group.
I'd say that you want the maximum opt_out_on for each company that only has employees in a given set of states, which means companies that do not have any employee outside that set of states.
So, translated to SQL:
select company_id, max(opt_out_on)
from employees e
where not exists(
select 1 from employees
where company_id=e.company_id
and opt_state not in ('ceased_membership', 'ignition','opted_out')
)
group by company_id;
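Both approaches can be tried on a few invented rows. The sketch below uses Python's sqlite3, which has no bool_and, so the HAVING version is emulated with MIN() over the 0/1 result of the IN test; that substitution is my assumption for the demo, not PostgreSQL syntax. The data and the function name are hypothetical.

```python
import sqlite3

STATES = "('ceased_membership', 'ignition', 'opted_out')"

# SQLite stand-in for: HAVING bool_and(opt_state IN (...))
HAVING_QUERY = f"""
    SELECT company_id, MAX(opt_out_on)
    FROM employees
    GROUP BY company_id
    HAVING MIN(opt_state IN {STATES}) = 1   -- 1 only if the test holds for every row
    ORDER BY company_id
"""

NOT_EXISTS_QUERY = f"""
    SELECT e.company_id, MAX(e.opt_out_on)
    FROM employees e
    WHERE NOT EXISTS (
        SELECT 1 FROM employees x
        WHERE x.company_id = e.company_id
          AND x.opt_state NOT IN {STATES})
    GROUP BY e.company_id
    ORDER BY e.company_id
"""

def fully_opted_out_companies():
    # Invented data: company 2 still has one opted_in employee.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employees (
            company_id INTEGER, id INTEGER PRIMARY KEY,
            opt_state TEXT, opt_out_on TEXT);
        INSERT INTO employees VALUES
            (1, 1, 'opted_out', '2020-01-15'),
            (1, 2, 'ceased_membership', '2020-05-01'),
            (2, 3, 'opted_out', '2019-03-03'),
            (2, 4, 'opted_in', NULL),
            (3, 5, 'ignition', NULL);
    """)
    return (conn.execute(HAVING_QUERY).fetchall(),
            conn.execute(NOT_EXISTS_QUERY).fetchall())
```

If opt_state can be NULL, the two forms may differ, since both IN and NOT IN yield NULL for a NULL state; the sample data avoids that case.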

Query-Sql Developer

I am creating some queries for my project, but I am having difficulties with the following ones:
A SELECT statement containing a subquery to retrieve a list of Locations (location id and street_address) that have employees with higher salary than the average of their department. The list must contain the number of those employees and their total salary per location. Name these aggregates respectively "emp" and "totalsalary". The locations in the list must be ordered by location_id.
Select LOCATION_ID, STREET_ADDRESS
from HR.LOCATIONS IN
(Select Employee_id
from HR.Employees
Where Salary > round(avg(SALARY)))
order by location_id;
error: SQL command not properly ended
and the second query is the following
The JOB_HISTORY table can contain more than one entry for an employee who was hired more than once. Create a query to retrieve a list of Employees that were hired more than once. Include the columns EMPLOYEE_ID, LAST_NAME, FIRST_NAME and the aggregate "Times Hired".
SELECT FIRST_NAME,LAST_NAME,EMPLOYEE_ID,
count (*)as TIMES_HIRED
from HR.JOB_HISTORY, HR.EMPLOYEES
where EMPLOYEE_ID= LAST_NAME
having COUNT(*) >1;
error: not a single-group
Try these, hope they help. I am making an assumption that the employee table has a Location_Id column. The department average is computed in a correlated subquery, and the result is grouped by location only, so that emp and totalsalary are totals per location:
Select l.LOCATION_ID, l.STREET_ADDRESS, Count(e.Employee_id) AS emp, SUM(e.salary) AS totalsalary
from HR.LOCATIONS l INNER JOIN
(Select e1.Employee_id, e1.salary, e1.Location_ID
from HR.Employees e1
Where e1.Salary > (Select avg(e2.SALARY) from HR.Employees e2 Where e2.Department_id = e1.Department_id)) e ON l.LOCATION_ID = e.Location_ID
Group By l.LOCATION_ID, l.STREET_ADDRESS
order by l.location_id;
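A small runnable sketch of the "above the department average, totalled per location" query, using Python's sqlite3 instead of Oracle. The rows are invented, and the location_id column on employees is the same assumption made in the answer; the function name is hypothetical.

```python
import sqlite3

ABOVE_AVG_BY_LOCATION = """
    SELECT l.location_id, l.street_address,
           COUNT(e.employee_id) AS emp,
           SUM(e.salary)        AS totalsalary
    FROM locations l
    JOIN employees e ON e.location_id = l.location_id
    WHERE e.salary > (
        -- average salary of this employee's own department
        SELECT AVG(e2.salary)
        FROM employees e2
        WHERE e2.department_id = e.department_id)
    GROUP BY l.location_id, l.street_address
    ORDER BY l.location_id
"""

def above_average_by_location():
    # Invented data. Department averages: dept 10 = 2000, dept 20 = 7000, dept 30 = 3000.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE locations (location_id INTEGER PRIMARY KEY, street_address TEXT);
        CREATE TABLE employees (
            employee_id INTEGER PRIMARY KEY, department_id INTEGER,
            location_id INTEGER, salary INTEGER);
        INSERT INTO locations VALUES (1000, '1 Main St'), (2000, '2 High St');
        INSERT INTO employees VALUES
            (1, 10, 1000, 1000), (2, 10, 1000, 3000),
            (3, 20, 2000, 5000), (4, 20, 2000, 7000), (5, 20, 1000, 9000),
            (6, 30, 2000, 4000), (7, 30, 2000, 2000);
    """)
    return conn.execute(ABOVE_AVG_BY_LOCATION).fetchall()
```

Employees 2 and 5 end up under location 1000 and employee 6 under location 2000; employee 4 earns exactly the department average and is therefore not counted.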
For the second question:
SELECT e.FIRST_NAME, e.LAST_NAME, e.EMPLOYEE_ID,
count(jh.Employee_id) as TIMES_HIRED
from HR.JOB_HISTORY jh inner join HR.EMPLOYEES e On jh.Employee_id = e.Employee_id
Group By e.FIRST_NAME, e.LAST_NAME, e.EMPLOYEE_ID
Having count(jh.Employee_id) > 1;