How to reuse Postgres variables declared using 'WITH' operator - postgresql

I need to delete records from two tables, but I cannot do it one statement after the other, because once the rows are deleted from the first table there is no data left to tell me what to delete from the second.
I have tried following:
WITH person_ids AS(select person_id from application_person
where application_id in (select DISTINCT duplicate.id
from application duplicate inner join application application
on duplicate.document_id = application.document_id
where duplicate.modify_date < application.modify_date)),
delete from application_person where application_person.person_id in (select person_id from person_ids);
delete from person where id in (select person_id from person_ids);
For the second reference to person_ids I get: Query failed: ERROR: relation "person_ids" does not exist
What am I doing wrong?
Thanks.

What am I doing wrong?
You have two separate statements. The person_ids is only in scope within the first one, which lasts until the semicolon.
You'll want to use
WITH duplicate_applications AS (
  select DISTINCT duplicate.id
  from application duplicate
  inner join application application using (document_id)
  where duplicate.modify_date < application.modify_date
), deleted_persons AS (
  delete from application_person
  where application_id in (select id from duplicate_applications)
  returning person_id
)
delete from person
where id in (select person_id from deleted_persons);
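If the id list really has to be reused across several separate statements, one alternative is to materialise it in a temporary table instead of a CTE. The following is only a sketch under the assumptions of the question (tables application, application_person and person with the columns shown there); the alias newer and the choice of ON COMMIT DROP are illustrative, not part of the original posts.

```sql
BEGIN;

-- Unlike a CTE, a temp table stays visible to every later statement
-- in the session; ON COMMIT DROP removes it when the transaction ends.
CREATE TEMP TABLE person_ids ON COMMIT DROP AS
SELECT ap.person_id
FROM application_person ap
WHERE ap.application_id IN (
    SELECT duplicate.id
    FROM application duplicate
    JOIN application newer USING (document_id)
    WHERE duplicate.modify_date < newer.modify_date
);

-- Both deletes can now read the same saved list.
DELETE FROM application_person
WHERE person_id IN (SELECT person_id FROM person_ids);

DELETE FROM person
WHERE id IN (SELECT person_id FROM person_ids);

COMMIT;
```

Wrapping the statements in one transaction keeps the two deletes atomic, which the single-statement CTE form gives you implicitly.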

Related

How to update duplicate rows in a table in postgresql

I have created synthetic data for a typical call center.
Below is the screenshot of the table I have created.
Table 1:
Problem statement: Since this is completely random data, I noticed that there are some customers who are being assigned to the same agents whenever they call again.
So using this query I was able to test such a case and count the number of times agents are being repeated for each customer.
select agentid, customerid, count(customerid) from aa_dev.calls group by agentid, customerid having count(customerid) > 1 ;
Table 2
I have a separate agents table called aa_dev.agents in which the agents' ids are stored.
Now I want to replace the agentid in such cases: if an agentid is repeated 6 times for a single customer, then in 5 of those calls the agent id should be replaced with another agentid from the agents table. Call times must not overlap, meaning the replacement agent must not be busy while the call is going on.
I have assigned row numbers to each repeated ones.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY agentid, customerid ORDER BY random()) rn,
COUNT(*) OVER (PARTITION BY agentid, customerid) cnt
FROM aa_dev.calls
)
SELECT agentid, customerid, rn
FROM cte
WHERE cnt > 1;
This way I could visualize the repetition clearly.
So I don't want to update row 1 but the rest.
Is there any way I can achieve this? Can I use the row number and write a query that updates the rows from row number 2 onwards, one by one, giving each row a unique agent?
If you don't want duplicates in your artificial data, it's probably better to not generate them.
But if you already have a table with duplicates and want to work on the duplicates, either updating them or deleting, here is the easy way:
You need a unique ID for each updated row. If you don't have it,
add it temporarily. Then you can use this pattern to update all duplicates
except the first one:
To add an artificial id column to a preexisting table, use:
ALTER TABLE calls ADD id serial;
In my case I generated a test table with 100 random rows:
CREATE TEMP TABLE calls (id serial, agentid int, customerid int);
INSERT INTO calls (agentid, customerid)
SELECT (random()*10)::int, (random()*10)::int
FROM generate_series(1, 100) n;
Define what constitutes a duplicate and find duplicates in data:
SELECT agentid, customerid, count(*), array_agg(id) id
FROM calls
GROUP BY 1,2 HAVING count(*)>1
ORDER BY 1,2;
Update all the duplicate rows except the first one (whatever_needed is a placeholder for the new value, for example NULL):
UPDATE calls SET agentid = whatever_needed
FROM (
SELECT array_agg(id) id, min(id) idmin FROM calls
GROUP BY agentid, customerid HAVING count(*)>1
) AS dup
WHERE calls.id = ANY(dup.id) AND calls.id <> dup.idmin;
Alternatively, remove all duplicates except first one:
DELETE FROM calls
USING (
SELECT array_agg(id) id, min(id) idmin FROM calls
GROUP BY agentid, customerid HAVING count(*)>1
) AS dup
WHERE calls.id = ANY(dup.id) AND calls.id <> dup.idmin;
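
The question asked whether the row number can drive the update directly. A minimal sketch of that variant, assuming the calls table with the unique id column shown above; NULL is only a placeholder for whatever replacement agent is chosen, and the check that the new agent is not busy is not covered here:

```sql
-- Number the rows inside each (agentid, customerid) group,
-- then touch every row except the first of its group.
WITH ranked AS (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY agentid, customerid
                              ORDER BY id) AS rn
    FROM calls
)
UPDATE calls c
SET agentid = NULL          -- placeholder for the replacement agent
FROM ranked r
WHERE c.id = r.id
  AND r.rn > 1;
```

Ordering by id keeps the row with the smallest id unchanged, which matches the min(id) rule used in the array_agg version.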

Show records that have only one matching row in another table

I need to write some SQL that is probably very simple, but I am very new to it.
I need to find all the records from one table that have exactly one matching id in another table. For example, one table contains employee records and the second contains the employees' telephone numbers. I need to find all employees with only one telephone number.
Sample data would be nice. In the absence of it:
SELECT
employees.employee_id
FROM
employees
LEFT JOIN
(SELECT employee_id FROM emp_phone GROUP BY employee_id HAVING count(*) = 1) AS phone
ON
employees.employee_id = phone.employee_id
WHERE
phone.employee_id IS NOT NULL;
You need a join of the 2 tables, group by employee and the condition in the having clause:
SELECT e.employee_id, e.name
FROM employees e INNER JOIN numbers n
ON e.employee_id = n.employee_id
GROUP BY e.employee_id, e.name
HAVING COUNT(*) = 1;
If there can be more than a few numbers per employee in the table with the employees' telephone numbers (calling it tel), then it's cheaper to avoid GROUP BY and HAVING which has to process all rows. Find employees with "unique" numbers using a self-anti-join with NOT EXISTS.
While you don't need more than the employee_id and their unique phone number, you don't even have to involve the employee table at all:
SELECT *
FROM tel t
WHERE NOT EXISTS (
SELECT FROM tel
WHERE employee_id = t.employee_id
AND tel_number <> t.tel_number -- or use PK column
);
If you need additional columns from the employee table:
SELECT * -- or any columns you need
FROM (
SELECT employee_id AS id, tel_number -- or any columns you need
FROM tel t
WHERE NOT EXISTS (
SELECT FROM tel
WHERE employee_id = t.employee_id
AND tel_number <> t.tel_number -- or use PK column
)
) t
JOIN employee e USING (id);
The column alias in the subquery (employee_id AS id) is just for convenience. Then the join condition in the outer query can be USING (id), and the ID column is only included once in the result, even with SELECT * ...
Simpler with a smart naming convention that uses employee_id for the employee ID everywhere. But it's a widespread anti-pattern to use employee.id instead.
Related:
JOIN table if condition is satisfied, else perform no join

PostgreSQL - Delete duplicated records - ERROR: too many range table entries

I have a HyperTable (TimescaleDB extension) called "conferimenti"
I am trying to delete about 2500 duplicated rows
DELETE FROM conferimenti
WHERE id IN
(SELECT id
FROM
(SELECT id,
ROW_NUMBER() OVER( PARTITION BY dataora, idcomune, codicestazione, tiporifiuto, codicetag
ORDER BY id ) AS row_num
FROM conferimenti ) t
WHERE t.row_num > 1);
throws an error
ERROR: too many range table entries
SQL state: 54000
Executing this query on its own returns a single column "id" with all the ids:
SELECT id
FROM
(SELECT id,
ROW_NUMBER() OVER( PARTITION BY dataora, idcomune, codicestazione, tiporifiuto, codicetag
ORDER BY id ) AS row_num
FROM conferimenti ) t
WHERE t.row_num > 1
I cannot disable triggers
SQLSTATE 54000 stands for "program limit exceeded", but there is no code specifically for "ERROR: too many range table entries". Further, you indicate that you "cannot disable triggers", which suggests this may be an error raised by the application rather than by Postgres itself. In that case it seems someone has established a business rule limiting the number of deletes. You need to investigate and determine that limit, then revise your DELETE statement to delete no more than that value:
DELETE FROM conferimenti
WHERE id IN
(SELECT id
FROM
(SELECT id,
ROW_NUMBER() OVER( PARTITION BY dataora, idcomune, codicestazione, tiporifiuto, codicetag
ORDER BY id ) AS row_num
FROM conferimenti ) t
WHERE t.row_num > 1
LIMIT <business_rule_max> );
Then run it multiple times as needed.

Hive - top n records within a group

I am currently using Hive and I have a table with the fields user_id and value. I want to order the values in descending order within each user_id and then only emit the top 100 records for each user_id. This is the code I am attempting to use:
DROP TABLE IF EXISTS mytable2
CREATE TABLE mytable2 AS
SELECT * FROM
(SELECT *, rank (user_id) as rank
FROM
(SELECT * from mytable
DISTRIBUTE BY user_id
SORT BY user_id, value DESC)a )b
WHERE rank<101
ORDER BY rank;
However when I run this query, I get the following error:
Error while compiling statement: FAILED: SemanticException [Error 10247]: Missing over clause for function : rank [ERROR_STATUS]
FYI - My UserIds are alpha-numeric.
Can anyone help?
Thanks in advance.
As the error message says, the rank function is missing its OVER clause. Used as a window function, rank() takes no arguments; the grouping and ordering go into the window definition:
....
(SELECT *, rank() over (partition by user_id order by value desc) as rank
....
For further information on how to use the rank function, you could refer to this documentation.
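
Put into the full statement from the question, a top-100-per-user query could look roughly like the sketch below. It assumes mytable has only the two columns user_id and value mentioned in the question; the alias rnk and the derived-table name ranked are illustrative.

```sql
DROP TABLE IF EXISTS mytable2;

CREATE TABLE mytable2 AS
SELECT user_id, value
FROM (
    SELECT user_id,
           value,
           -- restart the ranking for every user, highest value first
           rank() OVER (PARTITION BY user_id ORDER BY value DESC) AS rnk
    FROM mytable
) ranked
WHERE rnk <= 100;
```

With rank(), rows tied on value share a rank, so a user can end up with more than 100 rows; row_number() in the same position caps the result at 100 per user.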

Firebird 2.5 Removing Rows with Duplicate Fields

I am trying to remove duplicate values which, for some reason, were imported into a specific table.
There is no Primary Key in this table.
There are 27797 unique records.
Select distinct txdate, plunumber from itemaudit
gives me the correct records, but of course only displays txdate and plunumber.
If it were possible to select all the fields but apply the distinct only to txdate, plunumber, I could export the values, delete the duplicated ones and re-import.
Or if it's possible to delete the duplicated values from the entire table.
If you select the distinct of all fields, the result is incorrect.
To get all information on the duplicates, you simply need to query all information for the duplicate rows using a JOIN:
SELECT b.*
FROM (SELECT COUNT(*) as cnt, txdate, plunumber
FROM itemaudit
GROUP BY txdate, plunumber
HAVING COUNT(*) > 1) a
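INNER JOIN itemaudit b ON a.txdate = b.txdate AND a.plunumber = b.plunumber;
To delete the duplicates while keeping one row per (txdate, plunumber), compare the rows' RDB$DB_KEY values:
DELETE FROM itemaudit t1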
WHERE EXISTS (
SELECT 1 FROM itemaudit t2
WHERE t1.txdate = t2.txdate and t1.plunumber = t2.plunumber
AND t1.RDB$DB_KEY < t2.RDB$DB_KEY
);
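
Before running the DELETE it can be worth checking what it will do. The two counts below are a sketch of such a check, using only the table and columns from the question; the column aliases and the derived-table alias d are illustrative.

```sql
-- Rows the DELETE would remove: every row that has a "later" twin.
SELECT COUNT(*) AS rows_to_delete
FROM itemaudit t1
WHERE EXISTS (
    SELECT 1 FROM itemaudit t2
    WHERE t1.txdate = t2.txdate AND t1.plunumber = t2.plunumber
      AND t1.RDB$DB_KEY < t2.RDB$DB_KEY
);

-- Rows that should remain: one per distinct (txdate, plunumber).
SELECT COUNT(*) AS rows_to_keep
FROM (SELECT DISTINCT txdate, plunumber FROM itemaudit) d;
```

Assuming neither txdate nor plunumber contains NULL, the two counts should add up to the total row count of itemaudit, and rows_to_keep should match the 27797 unique records mentioned in the question.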