Deleting records using select statement - postgresql

I have to delete records whose count is > 1. As a first step, I need to pick the rec_id values from the custd table whose count is greater than 1, and then delete the rows for each such rec_id except the one with the highest id value.
select rec_id , field_id, count(*)
from mst.custom_data cd
group by rec_id, field_id
having count(*) > 1;
The output looks like :
rec_id field_id count
141761; 3; 2
117460; 7; 2
141970; 2; 2
select * from mst.custom_data where rec_id = '141761' and field_id=3
id field_id rec_id
200; 3; 141761
53791; 3; 141761
So, of the rows above, the one with the lowest id (200) should be deleted.

We can try using a correlated subquery here:
DELETE
FROM mst.custom_data m1
WHERE EXISTS (SELECT 1 FROM mst.custom_data m2
WHERE m1.rec_id = m2.rec_id AND m1.field_id = m2.field_id
GROUP BY rec_id, field_id
HAVING COUNT(*) > 1 AND MAX(m2.id) > m1.id);
The correlated subquery returns a record for a given (rec_id, field_id) group if the outer id value being considered for deletion is strictly less than the max id for that group. This is the logic you requested.
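As a quick sanity check of that logic, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table has no schema prefix, the third row is made up for illustration, and the subquery is reduced to a plain "a larger id exists in the same group" test, which is assumed to be equivalent to the GROUP BY / HAVING form above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE custom_data (id INTEGER PRIMARY KEY, field_id INTEGER, rec_id INTEGER)"
)
conn.executemany(
    "INSERT INTO custom_data (id, field_id, rec_id) VALUES (?, ?, ?)",
    [
        (200, 3, 141761),    # duplicate pair taken from the question
        (53791, 3, 141761),
        (300, 7, 999999),    # hypothetical row without a duplicate
    ],
)

# Delete a row when another row in the same (rec_id, field_id) group has a larger id.
conn.execute(
    """
    DELETE FROM custom_data
    WHERE EXISTS (
        SELECT 1
        FROM custom_data m2
        WHERE m2.rec_id = custom_data.rec_id
          AND m2.field_id = custom_data.field_id
          AND m2.id > custom_data.id
    )
    """
)

remaining = [row[0] for row in conn.execute("SELECT id FROM custom_data ORDER BY id")]
print(remaining)
```

Only the row with id 200 is removed here. Running the matching SELECT first, before the DELETE, is a cheap way to preview what would be deleted on real data.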

Related

how to list records that conform to a sequentially incrementing id in postgres

Is there a way to select records that are sequentially incremented?
for example, for a list of records
id 0
id 1
id 3
id 4
id 5
id 8
a command like:
select id incrementally from 3
Will return values 3,4 and 5. It won't return 8 because it's not sequentially incrementing from 5.
Step-by-step demo: db<>fiddle
WITH groups AS ( -- 2
SELECT
*,
id - row_number() OVER (ORDER BY id) as group_id -- 1
FROM mytable
)
SELECT
*
FROM groups
WHERE group_id = ( -- 4
SELECT group_id FROM groups WHERE id = 3 -- 3
)
1. The row_number() window function creates a consecutive row count. Subtracting it from id gives the same value for every run of consecutive records (id values increasing by 1), so the difference works as a group identifier.
2. This query is put into a WITH clause because its result is used twice in the next steps.
3. Select the group_id of the starting record (id = 3).
4. Filter the table for this group.
Additionally: if you want to start your output at id = 4, for example, you need to add an AND id >= 4 filter to the WHERE clause.
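To see the intermediate group_id values, the same idea can be run against the sample ids from the question. This is only a sketch using Python's sqlite3 module instead of PostgreSQL (it assumes SQLite 3.25 or newer for window functions), and the CTE is named grouped here, a name chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO mytable (id) VALUES (?)", [(i,) for i in (0, 1, 3, 4, 5, 8)])

# id minus its row number is constant within a run of consecutive ids.
cte = """
WITH grouped AS (
    SELECT id, id - row_number() OVER (ORDER BY id) AS group_id
    FROM mytable
)
"""

groups = conn.execute(cte + "SELECT id, group_id FROM grouped ORDER BY id").fetchall()
print(groups)

# Keep only the run that contains the starting id, beginning at that id.
start = 3
run = [
    row[0]
    for row in conn.execute(
        cte
        + """
        SELECT id FROM grouped
        WHERE group_id = (SELECT group_id FROM grouped WHERE id = ?)
          AND id >= ?
        ORDER BY id
        """,
        (start, start),
    )
]
print(run)
```

The ids 3, 4 and 5 all get group_id 0, while 8 gets group_id 2, which is why it is excluded.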

Inner join removed from the SQL query

I have the SQL query below, which fetches three records for notification purposes.
SELECT orders.msg
FROM orders
INNER JOIN
(
SELECT id
FROM orders
WHERE type_id = 12
ORDER BY id DESC LIMIT 3 OFFSET 0
) AS items
ON orders.id = items.id;
While trying to optimize the query, I changed it as follows.
SELECT orders.msg
FROM orders
WHERE type_id = 12
ORDER BY id DESC LIMIT 3 OFFSET 0;
Does the modified query look OK, or did I miss anything? Is there any other way of doing this?
The simplified version on the bottom looks logically identical, to me, to the one on top:
SELECT msg
FROM orders
WHERE type_id = 12
ORDER BY id DESC LIMIT 3;
Note that the above query could benefit from the following index:
CREATE INDEX idx ON orders (type_id, id, msg);
This index would completely cover the WHERE, ORDER BY, and SELECT clauses.
You can try this also:
SELECT orders.msg
FROM orders
WHERE orders.id
IN (
SELECT id
FROM orders
WHERE type_id = 12
ORDER BY id
DESC LIMIT 3 OFFSET 0
)
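The three query shapes in this thread can be compared on a small table. The sketch below uses Python's sqlite3 module with made-up rows, so it only illustrates that the variants select the same rows; it says nothing about performance on the asker's database, and not every engine accepts LIMIT inside an IN subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, type_id INTEGER, msg TEXT)")
conn.executemany(
    "INSERT INTO orders (id, type_id, msg) VALUES (?, ?, ?)",
    [(1, 12, "a"), (2, 5, "b"), (3, 12, "c"), (4, 12, "d"), (5, 7, "e"), (6, 12, "f")],
)

join_query = """
SELECT orders.msg FROM orders
INNER JOIN (
    SELECT id FROM orders WHERE type_id = 12 ORDER BY id DESC LIMIT 3 OFFSET 0
) AS items ON orders.id = items.id
"""
simple_query = "SELECT msg FROM orders WHERE type_id = 12 ORDER BY id DESC LIMIT 3"
in_query = """
SELECT orders.msg FROM orders
WHERE orders.id IN (
    SELECT id FROM orders WHERE type_id = 12 ORDER BY id DESC LIMIT 3 OFFSET 0
)
"""

join_msgs = [row[0] for row in conn.execute(join_query)]
simple_msgs = [row[0] for row in conn.execute(simple_query)]
in_msgs = [row[0] for row in conn.execute(in_query)]

print(simple_msgs)
# The join and IN variants have no outer ORDER BY, so compare them as sets of rows.
print(sorted(join_msgs) == sorted(simple_msgs) == sorted(in_msgs))
```

One difference worth keeping in mind: only the simplified query has an ORDER BY at the outer level, so it is the only one of the three whose output order is guaranteed.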

How to optimize selecting one random row from a set acquired by JOIN

Query in English:
Retrieve a random row from stuff.
row is not mentioned in done.
row belongs to the highest* scored friend.
*If no rows belonging to the highest-scored friend are found, take the next friend, and so on.
My current query takes too long to complete, because it is randomly ordering all stuff, while it should randomly order batch after batch.
Here is an sqlfiddle with tables and data.
My query:
WITH ordered_friends AS (SELECT *
FROM friends
ORDER BY score DESC)
SELECT s.stuff_id
FROM ordered_friends
INNER JOIN (SELECT *
FROM stuff
ORDER BY random()) AS s ON s.owner = ordered_friends.friend
WHERE NOT EXISTS(
SELECT 1
FROM done
WHERE done.me = 42
AND done.friend = s.owner
AND done.stuff_id = s.stuff_id
)
-- but it should keep the order of ordered_friends (score)
-- it does not have to reorder all stuff
-- one batch for each friend is enough until a satisfying row is found.
LIMIT 1;
How about this?
SELECT s.stuff_id
FROM friends
CROSS JOIN LATERAL (SELECT stuff_id
FROM stuff
WHERE stuff.owner = friends.friend
AND NOT EXISTS(SELECT 1
FROM done
WHERE done.me = 42
AND done.friend = stuff.owner
AND done.stuff_id = stuff.stuff_id
)
ORDER BY random()
LIMIT 1
) s
ORDER BY friends.score DESC
LIMIT 1;
The following indexes would make it fast:
CREATE INDEX ON friends(score); -- for sorting
CREATE INDEX ON stuff(owner); -- for the nested loop
CREATE INDEX ON done(stuff_id, friend); -- for NOT EXISTS
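For readers who find the LATERAL form hard to follow, the selection rule it is meant to implement can be written out procedurally. This is a plain Python sketch of the intended semantics only, not of how PostgreSQL executes the query; the function name pick_stuff and the sample data are made up for illustration.

```python
import random

def pick_stuff(friends, stuff, done, me, rng=random):
    """Return a random not-yet-done stuff_id of the best-scored friend who has one.

    friends: list of (friend, score)
    stuff:   list of (stuff_id, owner)
    done:    set of (me, friend, stuff_id)
    """
    # Walk friends from highest to lowest score, like ORDER BY friends.score DESC.
    for friend, _score in sorted(friends, key=lambda f: f[1], reverse=True):
        # The per-friend batch, like the LATERAL subquery with NOT EXISTS.
        candidates = [
            stuff_id
            for stuff_id, owner in stuff
            if owner == friend and (me, friend, stuff_id) not in done
        ]
        if candidates:
            # Random pick within the batch only, like ORDER BY random() LIMIT 1.
            return rng.choice(candidates)
    return None

# Hypothetical data: everything owned by the top friend is already done.
friends = [("alice", 90), ("bob", 50)]
stuff = [(1, "alice"), (2, "alice"), (3, "bob")]
done = {(42, "alice", 1), (42, "alice", 2)}

picked = pick_stuff(friends, stuff, done, me=42)
print(picked)
```

Because alice has nothing left, the pick falls through to bob, which matches the "take the next friend" rule from the question.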

After doing CTE Select Order By and then Update, Update results are not ordered the same (TSQL)

The code is roughly like this:
WITH cte AS
(
SELECT TOP 4 id, due_date, check
FROM table_a a
INNER JOIN table_b b ON a.linkid = b.linkid
WHERE
b.status = 1
AND due_date > GetDate()
ORDER BY due_date, id
)
UPDATE cte
SET check = 1
OUTPUT
INSERTED.id,
INSERTED.due_date
Note: the actual data has same due_date.
When I ran the SELECT statement only inside the cte, I could get the result, for ex: 1, 2, 3, 4.
But after the UPDATE statement, the updated results are: 4, 1, 2, 3
Why is this (order-change) happening?
How to keep or re-order the results back to 1,2,3,4 in this same 1 query?
In MSDN https://msdn.microsoft.com/pl-pl/library/ms177564(v=sql.110).aspx you can read that
There is no guarantee that the order in which the changes are applied
to the table and the order in which the rows are inserted into the
output table or table variable will correspond.
That means you can't solve your problem with a single query, but you can still do what you need in one batch. Because OUTPUT doesn't guarantee the order, you have to save the output in another table and order it after the update. This code returns the output values in the order you expect:
-- table variables are declared with @, not #
declare @outputTable table (id int, due_date date);
with cte as (
-- check is a reserved word in T-SQL, so it is bracketed here
select top 4 id, due_date, [check]
from table_a a
inner join table_b b on a.linkid = b.linkid
where b.status = 1
and due_date > GetDate()
order by due_date, id
)
update cte
set [check] = 1
output inserted.id, inserted.due_date
into @outputTable;
-- the order is only guaranteed by this final ORDER BY
select *
from @outputTable
order by due_date, id;

TSQL - Mapping one table to another without using cursor

I have tables with following structure
create table Doc(
id int identity(1, 1) primary key,
DocumentStartValue varchar(100)
)
create table Metadata (
DocumentValue varchar(100),
StartDesignation char(1),
PageNumber int
)
GO
Doc contains
id DocumentStartValue
1000 ID-1
1100 ID-5
2000 ID-8
3000 ID-9
Metadata contains
Documentvalue StartDesignation PageNumber
ID-1 D 0
ID-2 NULL 1
ID-3 NULL 2
ID-4 NULL 3
ID-5 D 0
ID-6 NULL 1
ID-7 NULL 2
ID-8 D 0
ID-9 D 0
What I need is to map Metadata.DocumentValue to Doc.id.
So the result I need is something like
id DocumentValue PageNumber
1000 ID-1 0
1000 ID-2 1
1000 ID-3 2
1000 ID-4 3
1100 ID-5 0
1100 ID-6 1
1100 ID-7 2
2000 ID-8 0
3000 ID-9 0
Can it be achieved without the use of cursor?
Something like this (sorry, I can't test it):
;WITH RowList AS
( --assign RowNums to each row...
SELECT
ROW_NUMBER() OVER (ORDER BY id) AS RowNum,
id, DocumentStartValue
FROM
Doc
), RowPairs AS
( --pair each row with the next row to create ranges;
--LEFT JOIN keeps the last Doc row, whose range has no upper bound
SELECT
R.DocumentStartValue AS RangeStart, R.id,
R1.DocumentStartValue AS RangeEnd --End is a reserved word, so it is avoided as an alias
FROM
RowList R LEFT JOIN RowList R1 ON R.RowNum + 1 = R1.RowNum
)
--use ranges to join back and get the data
SELECT
RP.id, M.DocumentValue, M.PageNumber
FROM
RowPairs RP
JOIN
Metadata M ON RP.RangeStart <= M.DocumentValue
AND (RP.RangeEnd IS NULL OR M.DocumentValue < RP.RangeEnd)
Edit: This assumes that you can rely on the ID-x values matching and being ascending. If so, StartDesignation is superfluous/redundant and may conflict with the Doc table DocumentStartValue
with rm as
(
select DocumentValue
,PageNumber
,case when StartDesignation = 'D' then 1 else 0 end as IsStart
,row_number() over (order by DocumentValue) as RowNumber
from Metadata
)
,gm as
(
select
DocumentValue as DocumentGroup
,DocumentValue
,PageNumber
,RowNumber
from rm
where RowNumber = 1
union all
select
case when rm.IsStart = 1 then rm.DocumentValue else gm.DocumentGroup end
,rm.DocumentValue
,rm.PageNumber
,rm.RowNumber
from gm
inner join rm on rm.RowNumber = (gm.RowNumber + 1)
)
select d.id, gm.DocumentValue, gm.PageNumber
from Doc d
inner join gm on d.DocumentStartValue = gm.DocumentGroup
Try the query above (you may also need to add option (maxrecursion ...)) and add an index on DocumentValue for the Metadata table. Also, if possible, it would be better to store the appropriate group on the Metadata rows when inserting them.
UPD: I've tested it and fixed the errors in my query; now it works and gives the result shown in the initial question.
UPD2: And recommended indexes:
create clustered index IX_Metadata on Metadata (DocumentValue)
create nonclustered index IX_Doc_StartValue on Doc (DocumentStartValue)
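The recursive approach can be tried on the sample data from the question without a SQL Server instance. The sketch below ports the rm/gm query to SQLite through Python's sqlite3 module (this assumes SQLite 3.25 or newer for row_number(), and SQLite requires the RECURSIVE keyword), so it checks the logic rather than T-SQL syntax. Like the original, it relies on DocumentValue sorting correctly as a string, which holds for ID-1 to ID-9 but would not for ID-10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Doc (id INTEGER PRIMARY KEY, DocumentStartValue TEXT)")
conn.execute(
    "CREATE TABLE Metadata (DocumentValue TEXT, StartDesignation TEXT, PageNumber INTEGER)"
)
conn.executemany(
    "INSERT INTO Doc VALUES (?, ?)",
    [(1000, "ID-1"), (1100, "ID-5"), (2000, "ID-8"), (3000, "ID-9")],
)
conn.executemany(
    "INSERT INTO Metadata VALUES (?, ?, ?)",
    [
        ("ID-1", "D", 0), ("ID-2", None, 1), ("ID-3", None, 2), ("ID-4", None, 3),
        ("ID-5", "D", 0), ("ID-6", None, 1), ("ID-7", None, 2),
        ("ID-8", "D", 0), ("ID-9", "D", 0),
    ],
)

query = """
WITH RECURSIVE rm AS (
    SELECT DocumentValue, PageNumber,
           CASE WHEN StartDesignation = 'D' THEN 1 ELSE 0 END AS IsStart,
           row_number() OVER (ORDER BY DocumentValue) AS RowNumber
    FROM Metadata
), gm AS (
    -- anchor: the first row starts the first group
    SELECT DocumentValue AS DocumentGroup, DocumentValue, PageNumber, RowNumber
    FROM rm WHERE RowNumber = 1
    UNION ALL
    -- step: a 'D' row starts a new group, any other row inherits the current one
    SELECT CASE WHEN rm.IsStart = 1 THEN rm.DocumentValue ELSE gm.DocumentGroup END,
           rm.DocumentValue, rm.PageNumber, rm.RowNumber
    FROM gm JOIN rm ON rm.RowNumber = gm.RowNumber + 1
)
SELECT d.id, gm.DocumentValue, gm.PageNumber
FROM Doc d JOIN gm ON d.DocumentStartValue = gm.DocumentGroup
ORDER BY gm.RowNumber
"""

mapped = conn.execute(query).fetchall()
for row in mapped:
    print(row)
```

On this data the output matches the result table requested in the question, including the single-page documents 2000 and 3000.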