Select from a DELETE subquery returning values - PostgreSQL

I'm trying to combine two steps into a single query: remove rows with a particular store ID from one table, then deactivate employees in another table if they no longer have any matching rows in the first table. Here's what I've got:
UPDATE business.employee
SET active = FALSE
WHERE employee_id IN
(SELECT employee_id FROM (DELETE FROM business.employeeStore
WHERE store_id = 1000
RETURNING employee_id) Deleted
LEFT JOIN business.employeeStore EmployeeStore
ON Deleted.employee_id = EmployeeStore.employee_id
WHERE EmployeeStore.store_id IS NULL)
Logically, I think what I've written is sound, but syntactically it's not quite there. It seems like this should be possible, since the DELETE FROM subquery returns a single-column table of results, and that subquery works fine by itself. But PostgreSQL reports a syntax error at or near FROM. Even if I drop the UPDATE portion and run just the interior SELECT, I get the same error.
UPDATE: I tried using a WITH clause to get around the syntax problem as follows:
WITH Deleted AS (DELETE FROM business.employeeStore
WHERE store_id = 1000
RETURNING employee_id)
UPDATE business.employee
SET active = FALSE
WHERE employee_id IN
(SELECT employee_id FROM Deleted
LEFT JOIN business.employeeStore EmployeeStore
ON Deleted.employee_id = EmployeeStore.employee_id
WHERE EmployeeStore.store_id IS NULL)
This doesn't produce any errors, but after playing around with the code for a while, I've determined that while it does get the RETURNING results from the WITH part, the rest of the statement still sees the table as it was before the DELETE. So the LEFT JOIN never finds missing rows, and the SELECT subquery returns no results.
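That matches PostgreSQL's documented behavior: all sub-statements of a single query run against the same snapshot, so the effects of a data-modifying CTE are not visible to the rest of that statement. A quick illustration, against a hypothetical table t:
WITH del AS (DELETE FROM t RETURNING *)
SELECT count(*) FROM t; -- still counts the rows being deleted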

I finally worked out how to do this using the WITH clause. The main issue was needing to write the conditions against the table in its pre-DELETE state. I've kept it all in one query like so:
WITH Deleted AS
(DELETE FROM business.employeeStore
WHERE store_id = 1000
RETURNING employee_id)
UPDATE business.employee
SET active = FALSE
WHERE employee_id IN
(SELECT employee_id FROM Deleted)
AND employee_id NOT IN
(SELECT employee_id FROM Deleted
JOIN business.employeeStore EmployeeStore
ON Deleted.employee_id = EmployeeStore.employee_id
WHERE EmployeeStore.store_id != 1000)
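For what it's worth, the same condition can also be written with NOT EXISTS, which sidesteps the surprising behavior of NOT IN when the subquery can return NULLs; a sketch against the same schema:
WITH Deleted AS
(DELETE FROM business.employeeStore
WHERE store_id = 1000
RETURNING employee_id)
UPDATE business.employee Employee
SET active = FALSE
WHERE employee_id IN (SELECT employee_id FROM Deleted)
AND NOT EXISTS
-- employeeStore is still seen in its pre-DELETE state here,
-- so rows for store 1000 must be excluded explicitly
(SELECT 1 FROM business.employeeStore EmployeeStore
WHERE EmployeeStore.employee_id = Employee.employee_id
AND EmployeeStore.store_id != 1000);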

Related

The last updated data shows first in the Postgres select query?

I have a simple query that takes some results from the User model.
Query 1:
SELECT users.id, users.name, company_id, updated_at
FROM "users"
WHERE (TRIM(telephone) = '8973847' AND company_id = 90)
LIMIT 20 OFFSET 0;
(result screenshot omitted)
Then I updated customer 341683 and ran the same query again; this time the result is different: the last-updated row shows first. Is Postgres ordering by the last update by default, or is something else happening here?
Without an ORDER BY clause, the database is free to return rows in any order, and will usually just return them in whichever way is fastest. In PostgreSQL, an UPDATE also writes a new version of the row at a (possibly different) physical location in the table, so the updated row can simply turn up in a different position in the scan.
If you need to rely on the order of the returned rows, you need to explicitly state it, e.g.:
SELECT users.id, users.name, company_id, updated_at
FROM "users"
WHERE (TRIM(telephone) = '8973847' AND company_id = 90)
ORDER BY id -- Here!
LIMIT 20 OFFSET 0

PostgreSQL - order randomly, but with NULLs first

I have a query that takes all rows from a table and joins with another table I am updating. The other table has some items that have already been checked (these have a value) and some which are not yet checked. I want to process all records, but make sure the NULL (unchecked) ones come up as early as possible. I have the following query:
SELECT * FROM posts
LEFT JOIN post_stats
ON post_stats.post_id = posts.id
ORDER BY RANDOM() NULLS FIRST LIMIT 10
However, this orders everything randomly. Is there a way to order everything randomly, but with any NULLs shown first?
Note that your query doesn't even specify which column can contain NULLs; that's an indicator that something is going wrong.
The following query should do what you want; replace <YOUR_COLUMN> with the nullable column (here, presumably post_stats.post_id). It works because a boolean sorts false before true, so the rows where the column IS NULL come first.
SELECT *
FROM posts
LEFT JOIN post_stats ON post_stats.post_id = posts.id
ORDER BY <YOUR_COLUMN> IS NOT NULL, RANDOM()
LIMIT 10;

Postgres: remove records with duplicate control_id [duplicate]

I have a table in a PostgreSQL 8.3.8 database, which has no keys/constraints on it, and has multiple rows with exactly the same values.
I would like to remove all duplicates and keep only 1 copy of each row.
There is one column in particular (named "key") which may be used to identify duplicates, i.e. there should only exist one entry for each distinct "key".
How can I do this? (Ideally, with a single SQL command.)
Speed is not a problem in this case (there are only a few rows).
A faster solution is
DELETE FROM dups a USING (
SELECT MIN(ctid) as ctid, key
FROM dups
GROUP BY key HAVING COUNT(*) > 1
) b
WHERE a.key = b.key
AND a.ctid <> b.ctid
This correlated subquery keeps the row with the smallest ctid per key:
DELETE FROM dupes a
WHERE a.ctid <> (SELECT min(b.ctid)
FROM dupes b
WHERE a.key = b.key);
This is fast and concise:
DELETE FROM dupes T1
USING dupes T2
WHERE T1.ctid < T2.ctid -- delete the older versions
AND T1.key = T2.key; -- add more columns if needed
See also my answer at How to delete duplicate rows without unique identifier which includes more information.
EXISTS is simple and among the fastest for most data distributions:
DELETE FROM dupes d
WHERE EXISTS (
SELECT FROM dupes
WHERE key = d.key
AND ctid < d.ctid
);
From each set of duplicate rows (defined by identical key), this keeps the one row with the minimum ctid.
Result is identical to the currently accepted answer by a_horse. Just faster, because EXISTS can stop evaluating as soon as the first offending row is found, while the alternative with min() has to consider all rows per group to compute the minimum. Speed is of no concern to this question, but why not take it?
You may want to add a UNIQUE constraint after cleaning up, to prevent duplicates from creeping back in:
ALTER TABLE dupes ADD CONSTRAINT constraint_name_here UNIQUE (key);
About the system column ctid:
Is the system column “ctid” legitimate for identifying rows to delete?
If there is any other UNIQUE NOT NULL column in the table (like a PRIMARY KEY), then, by all means, use it instead of ctid.
If key can be NULL and you only want one of those, too, use IS NOT DISTINCT FROM instead of =. See:
How do I (or can I) SELECT DISTINCT on multiple columns?
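That variant might look like this (a sketch; note that IS NOT DISTINCT FROM typically cannot use a plain index on (key)):
DELETE FROM dupes d
WHERE EXISTS (
SELECT FROM dupes
WHERE key IS NOT DISTINCT FROM d.key
AND ctid < d.ctid
);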
As that's slower, you might instead run the above query as is, and this in addition:
DELETE FROM dupes d
WHERE key IS NULL
AND EXISTS (
SELECT FROM dupes
WHERE key IS NULL
AND ctid < d.ctid
);
And consider:
Create unique constraint with null columns
For small tables, indexes generally do not help performance. And we need not look further.
For big tables and few duplicates, an existing index on (key) can help (a lot).
For mostly duplicates, an index may add more cost than benefit, as it has to be kept up to date concurrently. Finding duplicates without index becomes faster anyway because there are so many and EXISTS only needs to find one. But consider a completely different approach if you can afford it (i.e. concurrent access allows it): Write the few surviving rows to a new table. That also removes table (and index) bloat in the process. See:
How to delete duplicate entries?
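A minimal sketch of that rewrite approach, assuming no concurrent writes and ignoring any indexes, constraints, and privileges that would have to be recreated on the new table:
BEGIN;
CREATE TABLE dupes_clean AS
SELECT DISTINCT ON (key) *
FROM dupes
ORDER BY key, ctid; -- keeps the row with the minimum ctid per key
DROP TABLE dupes;
ALTER TABLE dupes_clean RENAME TO dupes;
COMMIT;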
I tried this:
DELETE FROM tablename
WHERE id IN (SELECT id
FROM (SELECT id,
ROW_NUMBER() OVER (PARTITION BY column1, column2, column3 ORDER BY id) AS rnum
FROM tablename) t
WHERE t.rnum > 1);
provided by Postgres wiki:
https://wiki.postgresql.org/wiki/Deleting_duplicates
I would use a temporary table:
create table tab_temp as
select distinct f1, f2, f3, fn
from tab;
Then drop tab and rename tab_temp to tab.
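Those two steps might look like this (a sketch; any indexes or constraints on tab would need to be recreated):
drop table tab;
alter table tab_temp rename to tab;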
I had to create my own version. The version written by @a_horse_with_no_name is way too slow on my table (21M rows), and @rapimo's simply doesn't delete the dups.
Here is what I use on PostgreSQL 9.5
DELETE FROM your_table
WHERE ctid IN (
SELECT unnest(array_remove(all_ctids, actid))
FROM (
SELECT
min(b.ctid) AS actid,
array_agg(ctid) AS all_ctids
FROM your_table b
GROUP BY key1, key2, key3, key4
HAVING count(*) > 1) c);
Another approach (which works only if you have a unique field, like id, in your table): find one id per distinct combination of columns and remove all ids that are not in that list. Note that without an ORDER BY, DISTINCT ON keeps an arbitrary row from each group.
DELETE
FROM users
WHERE users.id NOT IN (SELECT DISTINCT ON (username, email) id FROM users);
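To make the kept row deterministic, you can add an ORDER BY so DISTINCT ON picks, say, the smallest id per group (a sketch):
DELETE
FROM users
WHERE users.id NOT IN (SELECT DISTINCT ON (username, email) id
FROM users
ORDER BY username, email, id);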
PostgreSQL has window functions; you can use rank() to achieve your goal. Sample:
WITH ranked AS (
SELECT
id, column1,
rank() OVER (
PARTITION BY column1
ORDER BY id -- order by a unique column, so exactly one row per partition gets rank 1
) AS r
FROM
table1
)
DELETE FROM table1 t1
USING ranked
WHERE t1.id = ranked.id AND ranked.r > 1;
Here is another solution that worked for me.
delete from table_name a using table_name b
where a.id < b.id
and a.column1 = b.column1;
How about:
WITH
u AS (SELECT DISTINCT * FROM your_table),
x AS (DELETE FROM your_table)
INSERT INTO your_table SELECT * FROM u;
I had been concerned about execution order (would the DELETE happen before the SELECT DISTINCT?), but it works fine: within a single statement, both CTEs see the same snapshot of the table, so the DELETE cannot hide rows from the SELECT DISTINCT.
And has the added bonus of not needing any knowledge about the table structure.
Here is a solution using PARTITION BY and the virtual ctid column, which works like a primary key, at least within a single session:
DELETE FROM dups
USING (
SELECT
ctid,
(
ctid != min(ctid) OVER (PARTITION BY key_column1, key_column2 [...])
) AS is_duplicate
FROM dups
) dups_find_duplicates
WHERE dups.ctid = dups_find_duplicates.ctid
AND dups_find_duplicates.is_duplicate
A subquery is used to mark all rows as duplicates or not, based on whether they share the same "key columns", but not the same ctid, as the "first" one found in the "partition" of rows sharing the same keys.
In other words, "first" is defined as:
min(ctid) OVER (PARTITION BY key_column1, key_column2 [...])
Then, all rows where is_duplicate is true are deleted by their ctid.
From the documentation, ctid represents:
The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier. A primary key should be used to identify logical rows.
None of these solutions works if the id itself is duplicated, which is my use case. In that case the solution is simple:
myTable:
id name
0 value
0 value
0 value
1 value1
1 value1
create table dedupMyTable as select distinct * from myTable;
delete from myTable;
insert into myTable select * from dedupMyTable;
select * from myTable;
id name
0 value
1 value1
Note that you shouldn't have duplicate ids in your table at all, unless it has no PK constraint or the platform simply doesn't support one (such as Hive / data lake tables).
Better to pay attention when loading your data, to avoid duplicate ids in the first place.
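In PostgreSQL you can guard against this at load time with an upsert clause; a sketch, assuming a unique or primary key constraint exists on id:
INSERT INTO myTable (id, name)
VALUES (0, 'value')
ON CONFLICT (id) DO NOTHING; -- silently skip rows whose id already exists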
DELETE FROM tracking_order
WHERE
mvd_id IN ( -- the column you need to remove duplicates from
SELECT
mvd_id
FROM (
SELECT
mvd_id,thoi_gian_gui,
ROW_NUMBER() OVER (
PARTITION BY mvd_id
ORDER BY thoi_gian_gui desc) AS row_num
FROM
tracking_order
) s_alias
WHERE row_num > 1)
AND thoi_gian_gui in ( -- the column used to decide which duplicate to keep, e.g. last update time
SELECT
thoi_gian_gui
FROM (
SELECT
thoi_gian_gui,
ROW_NUMBER() OVER (
PARTITION BY mvd_id
ORDER BY thoi_gian_gui desc) AS row_num
FROM
tracking_order
) s_alias
WHERE row_num > 1)
With this code I removed all 7,800,445 duplicate rows, keeping only one copy of each, in 7 min 28 secs.
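For what it's worth, the two IN conditions can be collapsed into a single row-value comparison, which also avoids matching a thoi_gian_gui that belongs to a different mvd_id (a sketch, assuming thoi_gian_gui is unique within each mvd_id):
DELETE FROM tracking_order
WHERE (mvd_id, thoi_gian_gui) IN (
SELECT mvd_id, thoi_gian_gui
FROM (
SELECT
mvd_id, thoi_gian_gui,
ROW_NUMBER() OVER (
PARTITION BY mvd_id
ORDER BY thoi_gian_gui desc) AS row_num
FROM
tracking_order
) s_alias
WHERE row_num > 1);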
This worked well for me. I had a table, terms, that contained duplicate values. I ran a query to populate a temp table with the ids of all the duplicate rows, then ran a delete statement using those ids. value is the column that contained the duplicates.
CREATE TEMP TABLE dupids AS
select id from (
select value, id, row_number()
over (partition by value order by value)
as rownum from terms
) tmp
where rownum >= 2;
delete from terms where id in (select id from dupids);

How to use results from the first query in the second query

I've been reading about mysqli multi_query and couldn't find a way to do this (if it's possible):
$db->multi_query("SELECT id FROM table WHERE session='1';
UPDATE table SET last_login=NOW() WHERE id=table.id");
It doesn't seem to work. I am trying to use the ids from the first query to update in the second. Is this possible?
UPDATE table
SET last_login = NOW()
WHERE id IN (SELECT id
FROM table2
WHERE session = '1')
That will update all your records with session = '1', assuming of course that the subquery returns at least one row, which from what I can see it will.
This also allows you to drop the multi_query() method, as it's just a single query.
In response to the comment:
According to http://lists.mysql.com/mysql/219882 this doesn't appear to be possible with MySQL. Although I suppose you could go for something like:
$db->multi_query(
"UPDATE table
SET last_login = NOW()
WHERE id IN (SELECT id
FROM table2
WHERE session = '1');
SELECT id
FROM table2
WHERE session = '1';"
);
Which is ugly, performing the same query twice, but should do what you want.

How to prune a table down to the first 5000 records of 50000

I have a rather large table of 50,000 records, and I want to cut it down to 5,000. How would I write an SQL query to delete the other 45,000 records? The basic table structure includes a datetime column.
A rough idea of the query I want is the following:
DELETE FROM mytable WHERE countexceeded(5000) ORDER BY filedate DESC;
I could write this in C# by somehow grabbing the row index number and working around that; however, is there a tidy way to do this in SQL?
The answer you have accepted is not valid syntax as DELETE does not allow an ORDER BY clause. You can use
;WITH T AS
(
SELECT TOP 45000 *
FROM mytable
ORDER BY filedate
)
DELETE FROM T
DELETE TOP(45000) FROM mytable ORDER BY filedate ASC;
Change the order by to ascending to get the rows in reverse order and then delete the top 45000.
Hope this helps.
Edit:-
I apologize for the invalid syntax. Here is my second attempt.
DELETE a FROM myTable a INNER JOIN
(SELECT TOP(45000) * FROM myTable ORDER BY fileDate ASC) b ON a.id = b.id
If you do not have a unique column then please use Martin Smith's CTE answer.
if the table happens to be stored in the right order (oldest rows first):
DELETE FROM mytable LIMIT 45000;
if not, and the table has a correctly ordered auto_increment index, get the cutoff row:
SELECT id, filedate FROM mytable ORDER BY id LIMIT 45000, 1;
save that id and then delete everything older:
DELETE FROM mytable WHERE id < #id;
if it is not ordered correctly, you could use filedate instead of id, but if it's a date without a time component you could end up deleting rows you wanted to keep from the cutoff date itself, so be careful with a filedate-based deletion
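If you are on a MySQL version that rejects LIMIT inside an IN subquery, wrapping it in a derived table is the usual workaround; a sketch, assuming id is the primary key and you want to keep the 5,000 newest rows:
DELETE FROM mytable
WHERE id NOT IN (
SELECT id FROM (
SELECT id FROM mytable
ORDER BY filedate DESC
LIMIT 5000
) keep_rows -- derived table materializes the ids, avoiding the self-reference error
);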