Firebird 2.5 Removing Rows with Duplicate Fields - firebird

I am trying to removing duplicate values which, for some reason, was imported in a specific Table.
There is no Primary Key in this table.
There is 27797 unique records.
Select distinct txdate, plunumber from itemaudit
Give me the correct records, but only displays the txdate, plunumber of course.
If it was possible to select all the fields but only select the distinct of txdate,plunumber I could export the values, delete the duplicated ones and re-import.
Or if its possible to delete the distinct values from the entire table.
If you select the distinct of all fields the value is incorrect.

To get all information on the duplicates, you simply need to query all information for the duplicate rows using a JOIN:
SELECT b.*
FROM (SELECT COUNT(*) as cnt, txdate, plunumber
FROM itemaudit
GROUP BY txdate, plunumber
HAVING COUNT(*) > 1) a
INNER JOIN itemaudit b ON a.txdate = b.txdate AND a.plunumber = b.plunumber

DELETE FROM itemaudit t1
WHERE EXISTS (
SELECT 1 FROM itemaudit t2
WHERE t1.txdate = t2.txdate and t1.plunumber = t2.plunumber
AND t1.RDB$DB_KEY < t2.RDB$DB_KEY
);

Related

Using two COUNT in SELECT returns the same values

SELECT user_posts.id,
COUNT(user_post_comments.post_id) as number_of_comments,
COUNT(user_post_reactions.post_id) as number_of_reactions
FROM user_posts
LEFT JOIN user_post_comments
ON (user_posts.id = user_post_comments.post_id)
LEFT JOIN user_post_reactions
ON (user_posts.id = user_post_reactions.post_id)
WHERE user_posts.user_id = '850e6511-2f30-472d-95a1-59a02308b46a'
group by user_posts.id
I have this query for getting the number of comments and reactions from another table by post_id
current output screenshot
To caluclate number of comments and reactions just use subqueries. No need to join and group by.
SELECT user_posts.id,
( select COUNT(*) from user_post_comments
where user_posts.id = user_post_comments.post_id
) as number_of_comments,
( select COUNT(*) from user_post_reactions
where user_posts.id = user_post_reactions.post_id
) as number_of_reactions
FROM user_posts
WHERE user_posts.user_id = '850e6511-2f30-472d-95a1-59a02308b46a'
If both joins return non-null rows, what you get for each count is the product of the number of rows from each for a given user_posts.id. One way you could fix that is by counting distinct identifiers for each table, e.g.
COUNT(DISTINCT user_post_comments.id) as number_of_comments
(Assuming "id" exists as a primary key on that table). This may not be spectacularly efficient, but is relatively simple.

PostgreSQL group by all fields

I have a query like this:
SELECT
table1.*,
sum(table2.amount) as totalamount
FROM table1
join table2 on table1.key = table2.key
GROUP BY table1.*;
I got the error: column "table1.key" must appear in the GROUP BY clause or be used in an aggregate function.
Are there any way to group "all" field?
There is no shortcut syntax for grouping by all columns, but it's probably not necessary in the described case. If the key column is a primary key, it's enough when you use it:
GROUP BY table1.key;
You have to specify all the column names in group by that are selected and are not part of aggregate function ( SUM/COUNT etc)
select c1,c2,c4,sum(c3) FROM totalamount
group by c1,c2,c4;
A shortcut to avoid writing the columns again in group by would be to specify them as numbers.
select c1,c2,c4,sum(c3) FROM t
group by 1,2,3;
I found another way to solve, not perfect but maybe it's useful:
SELECT string_agg(column_name::character varying, ',') as columns
FROM information_schema.columns
WHERE table_schema = 'your_schema'
AND table_name = 'your_table
Then apply this select result to main query like this:
$columns = $result[0]["columns"];
SELECT
table1.*,
sum(table2.amount) as totalamount
FROM table1
join table2 on table1.key = table2.key
GROUP BY $columns;

How to remove all the duplicate records except the last occurence in a table

I have a interim table without any primary key and identity. I need to check one of the columns (branch_ref) value for duplicate entries and should mark the flag as 'D' if the branch_ref is same for more than one record except the last occurrence in the table. How can we do this?
Actual data as stored in table.
select branch_name,branch_reference,address_1,zip_cd,null as flag_val FROM Branch_Master
As per above table, I need all flag to be updated as ā€˜Dā€™ except for 6th (brach_reference=9910) and 16th record (branch_reference=99100 and zip_cd=612).
When I use row_number function to identify the duplicates order gets changed.
SELECT branch_name,branch_reference,address_1,zip_cd,flag_val, ROW_NUMBER() OVER(PARTITION BY branch_reference ORDER BY branch_reference) RID
FROM Branch_Master
Am using below query to update flag_val and its updating wrong records.
;WITH CTE AS
(
SELECT branch_name,branch_reference,address_1,zip_cd,flag_val, ROW_NUMBER() OVER(PARTITION BY branch_reference ORDER BY branch_reference) RID
FROM Branch_Master
WHERE flag_val IS NULL
)
UPDATE C1 SET flag_val = 'D'
FROM CTE C1
LEFT OUTER JOIN (SELECT branch_reference, max(RID) MRID FROM CTE GROUP BY branch_reference) C2
ON C1.branch_reference=C2.branch_reference and C1.RID=C2.MRID
WHERE C2.branch_reference IS NULL

PostgreSQL Removing duplicates

I am working on postgres query to remove duplicates from a table. The following table is dynamically generated and I want to write a select query which will remove the record if the first row has duplicate values.
The table looks something like this
Ist col 2nd col
4 62
6 34
5 26
5 12
I want to write a select query which remove either row 3 or 4.
There is no need for an intermediate table:
delete from df1
where ctid not in (select min(ctid)
from df1
group by first_column);
If you are deleting many rows from a large table, the approach with an intermediate table is probably faster.
If you just want to get unique values for one column, you can use:
select distinct on (first_column) *
from the_table
order by first_column;
Or simply
select first_column, min(second_column)
from the_table
group by first_column;
select count(first) as cnt, first, second
from df1
group by first
having(count(first) = 1)
if you want to keep one of the rows (sorry, I initially missed it if you wanted that):
select first, min(second)
from df1
group by first
Where the table's name is df1 and the columns are named first and second.
You can actually leave off the count(first) as cnt if you want.
At the risk of stating the obvious, once you know how to select the data you want (or don't want) the delete the records any of a dozen ways is simple.
If you want to replace the table or make a new table you can just use create table as for the deletion:
create table tmp as
select count(first) as cnt, first, second
from df1
group by first
having(count(first) = 1);
drop table df1;
create table df1 as select * from tmp;
or using DELETE FROM:
DELETE FROM df1 WHERE first NOT IN (SELECT first FROM tmp);
You could also use select into, etc, etc.
if you want to SELECT unique rows:
SELECT * FROM ztable u
WHERE NOT EXISTS ( -- There is no other record
SELECT * FROM ztable x
WHERE x.id = u.id -- with the same id
AND x.ctid < u.ctid -- , but with a different(lower) "internal" rowid
); -- so u.* must be unique
if you want to SELECT the other rows, which were suppressed in the previous query:
SELECT * FROM ztable nu
WHERE EXISTS ( -- another record exists
SELECT * FROM ztable x
WHERE x.id = nu.id -- with the same id
AND x.ctid < nu.ctid -- , but with a different(lower) "internal" rowid
);
if you want to DELETE records, making the table unique (but keeping one record per id):
DELETE FROM ztable d
WHERE EXISTS ( -- another record exists
SELECT * FROM ztable x
WHERE x.id = d.id -- with the same id
AND x.ctid < d.ctid -- , but with a different(lower) "internal" rowid
);
So basically I did this
create temp t1 as
select first, min (second) as second
from df1
group by first
select * from df1
inner join t1 on t1.first = df1.first and t1.second = df1.second
Its a satisfactory answer. Thanks for your help #Hack-R

Simple SELECT, but adding JOIN returns too many rows

The query below returns 9,817 records. Now, I want to SELECT one more field from another table. See the 2 lines that are commented out, where I've simply selected this additional field and added a JOIN statement to bind this new columns. With these lines added, the query now returns 649,200 records and I can't figure out why! I guess something is wrong with my WHERE criteria in conjunction with the JOIN statement. Please help, thanks.
SELECT DISTINCT dbo.IMPORT_DOCUMENTS.ITEMID, BEGDOC, BATCHID
--, dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS.CATEGORY_ID
FROM IMPORT_DOCUMENTS
--JOIN dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS ON
dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS.ITEMID = dbo.IMPORT_DOCUMENTS.ITEMID
WHERE (BATCHID LIKE 'IC0%' OR BATCHID LIKE 'LP0%')
AND dbo.IMPORT_DOCUMENTS.ITEMID IN
(SELECT dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS.ITEMID FROM
CATEGORY_COLLECTION_CATEGORY_RESULTS
WHERE SCORE >= .7 AND SCORE <= .75 AND CATEGORY_ID IN(
SELECT CATEGORY_ID FROM CATEGORY_COLLECTION_CATS WHERE COLLECTION_ID IN (11,16))
AND Sample_Id > 0)
AND dbo.IMPORT_DOCUMENTS.ITEMID NOT IN
(SELECT ASSIGNMENT_FOLDER_DOCUMENTS.Item_Id FROM ASSIGNMENT_FOLDER_DOCUMENTS)
One possible reason is because one of your tables contains data at lower level, lower than your join key. For example, there may be multiple records per item id. The same item id is repeated X number of times. I would fix the query like the below. Without data knowledge, Try running the below modified query.... If output is not what you're looking for, convert it into SELECT Within a Select...
Hope this helps....
Try this SQL: SELECT DISTINCT a.ITEMID, a.BEGDOC, a.BATCHID, b.CATEGORY_ID FROM IMPORT_DOCUMENTS a JOIN (SELECT DISTINCT ITEMID FROM CATEGORY_COLLECTION_CATEGORY_RESULTS WHERE SCORE >= .7 AND SCORE <= .75 AND CATEGORY_ID IN (SELECT DISTINCT CATEGORY_ID FROM CATEGORY_COLLECTION_CATS WHERE COLLECTION_ID IN (11,16)) AND Sample_Id > 0) B ON a.ITEMID =b.ITEMID WHERE a.(a.BATCHID LIKE 'IC0%' OR a.BATCHID LIKE 'LP0%') AND a.ITEMID NOT IN (SELECT DIDTINCT Item_Id FROM ASSIGNMENT_FOLDER_DOCUMENTS)