Remove repetitive rows by checking different columns - postgresql

I have a data sample such as the one below:
How can PostgreSQL eliminate duplicate rows by checking column1 and column2 across different rows? I have tried a normal comparison but with no luck so far. Hope someone may share the idea.

SELECT d1.*
FROM distance d1
WHERE NOT EXISTS (SELECT 1
                  FROM distance d2
                  WHERE d1."from" = d2."to"
                    AND d1."to" = d2."from"
                    AND d2."from" < d2."to"
                 );
If there is a “duplicate”, this query will only pick the row where "from" < "to".
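An alternative sketch (assuming the same distance table with "from" and "to" columns) normalizes each pair with LEAST/GREATEST and keeps exactly one row per unordered pair, whichever direction it was stored in:
SELECT DISTINCT ON (LEAST("from", "to"), GREATEST("from", "to")) d.*
FROM distance d
ORDER BY LEAST("from", "to"), GREATEST("from", "to"), "from";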

Related

Repeat the value from left table if date from right table falls between start date and end date

I would like to join my two tables on [Group] and the [YearMonth] dates: where [YRMO_NB] from Table 2 falls between [ENR_START] and [ENR_END] from Table 1, repeat the value of column [PHASE] for each related row (just like the last column of the second picture, = [PHASE]) and leave unmatched rows blank.
I did this, which only gives me exact matches:
ON A.GROUP = PHASE.GROUP
AND A.YRMO_NB = PHASE.ENR_START
Table 1
Table 2
Is there an easy way to do this?
Thank you!
I figured it out:
ON A.GROUP = PHASE.GROUP
AND A.YRMO_NB >= PHASE.ENR_START
AND A.YRMO_NB <= PHASE.ENR_END
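Put together, the full statement would look roughly like this (a sketch only: the Table1/Table2 names and the LEFT JOIN are assumptions based on the description, and [GROUP] is bracketed because GROUP is a reserved word):
SELECT A.*, PHASE.[PHASE]
FROM Table2 AS A
LEFT JOIN Table1 AS PHASE
       ON A.[GROUP] = PHASE.[GROUP]
      AND A.YRMO_NB BETWEEN PHASE.ENR_START AND PHASE.ENR_END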

How to do a one-to-many join with conditions in postgresql

Forgive me, I don't know how to ask this question or google for an answer. It may have already been answered elsewhere on Stack; let me know if it has.
I want to use postgresql to join Table A with Table B in a one-to-many fashion: each row in Table A should be joined to (and multiplied by) every row in Table B whose values in a set of columns fall within the range defined by the corresponding columns in Table A.
Basically:
Where Start_A >= Start_B AND End_A <= End_B
Like so:
I think this can help you. But where you say "Basically: Where Start_A >= Start_B AND End_A <= End_B", I think that is a mistake, because in your expected result I see Start_A <= Start_B AND End_A >= End_B. I would write the query for you like this:
SELECT *
FROM a
LEFT JOIN b
  ON a.startA <= b.startB
 AND a.endA >= b.endB;
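For concreteness, a small self-contained sketch of that range-containment join (table and column names here are illustrative, not from the original post):
CREATE TABLE a (id int, start_a int, end_a int);
CREATE TABLE b (id int, start_b int, end_b int);

INSERT INTO a VALUES (1, 10, 50), (2, 60, 70);
INSERT INTO b VALUES (1, 20, 30), (2, 40, 45), (3, 80, 90);

-- each row of a is paired with every row of b whose range lies inside it;
-- a-rows with no contained b range still appear once, with NULLs for b
SELECT a.id AS a_id, b.id AS b_id
FROM a
LEFT JOIN b
  ON a.start_a <= b.start_b
 AND a.end_a >= b.end_b;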

Finding if values in two columns exist

I have two columns of dates and I want to run a query that returns TRUE if a date exists in the first column and a date exists in the second column.
I know how to do it when I'm looking for a match (if the data entry in column A is the SAME as the entry in column B), but I don't know how to check whether the data entries in columns A and B both exist.
Does anyone know how to do this? Thanks!
If data in a column is present, it IS NOT NULL. You can query for that on both columns, with an AND clause, to get your result:
SELECT (date1 IS NOT NULL AND date2 IS NOT NULL) AS both_dates
FROM mytable;
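If instead you want only the rows where both dates are present, the same checks can go into the WHERE clause (a sketch using the same mytable/date1/date2 names):
SELECT *
FROM mytable
WHERE date1 IS NOT NULL
  AND date2 IS NOT NULL;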
So, rephrasing:
For any two entries in table x with date columns a and b, is there some pair of rows x1 and x2 where x1.a = x2.b?
If that's what you're trying to do, you want a self-join, e.g., presuming the presence of a single key column named id:
SELECT x1.id, x2.id, x1.a AS x1_a_x2_b
FROM mytable x1
INNER JOIN mytable x2 ON (x1.a = x2.b);
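And if all you need is a single TRUE/FALSE answer for the whole table, the self-join can be wrapped in EXISTS (same assumed names as above):
SELECT EXISTS (
    SELECT 1
    FROM mytable x1
    JOIN mytable x2 ON x1.a = x2.b
) AS any_pair;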

Long-running query on a self-joined table

I'm trying to improve the performance of a query that updates a column on each row of a table by comparing that row's values with all other rows in the same table. Here is the query:
update F set
    PartOfPairRC = 1
from RangeChange F
where Reject = 0
  and exists (
      select 1 from RangeChange S
      where F.StoreID = S.StoreID
        and F.ItemNo = S.ItemNo
        and F.Reject = S.Reject
        and F.ChangeDateEnd = S.ChangeDate - 1)
The query's performance degrades rapidly as the number of rows in the table increases. I have 50 million rows in the table.
Is there a better way to do this? Would SSIS be able to handle such an operation better?
Any help much appreciated, thanks Robert
You can try to create an index on that table:
create index idx_test on RangeChange(StoreID, ItemNo, Reject, ChangeDateEnd) where reject = 0
-- when you are not using SQL Server Enterprise, get rid of the where condition in the index and put the Reject column in as an included column instead
-- make sure you already have a clustered index on the table (if not, you can create the index above as clustered)
-- I would write the query as a join:
update F set
    F.PartOfPairRC = 1
from RangeChange F
join RangeChange S
  on F.StoreID = S.StoreID
 and F.ItemNo = S.ItemNo
 and F.Reject = S.Reject
 and F.ChangeDateEnd = S.ChangeDate - 1
where F.Reject = 0 and S.Reject = 0
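For reference, the index variant described in the comments above (no filter, with Reject as an included column) would look roughly like this; idx_test2 is just a placeholder name:
create index idx_test2 on RangeChange(StoreID, ItemNo, ChangeDateEnd) include (Reject)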

PostgreSQL and pl/pgsql SYNTAX to update fields based on SELECT and FUNCTION (while loop, DISTINCT COUNT)

I have a large database in which I want to apply some logic to populate new fields.
The primary key of the table harvard_assignees is id.
The LOGIC GOES LIKE THIS
Select all of the records based on id
For each record (WHILE), if (state is NOT NULL && country is NULL), update country_out = "US" ELSE update country_out=country
I see step 1 as a PostgreSQL query and step 2 as a function. Just trying to figure out the easiest way to implement natively with the exact syntax.
====
The second function is a little more interesting, requiring (I believe) DISTINCT:
Find all DISTINCT foreign_keys (a bivariate key of pat_type,patent)
Count Records that contain that value (e.g., n=3 records have fkey "D","388585")
Update those 3 records to identify percent as 1/n (e.g., UPDATE 3 records, set percent = 1/3)
For the first one:
UPDATE harvard_assignees
SET country_out = (CASE
        WHEN (state IS NOT NULL AND country IS NULL) THEN 'US'
        ELSE country
    END);
At first it had condition "id = ..." but I removed that because I believe you actually want to update all records.
And for the second one:
UPDATE example_table
SET percent = (
    SELECT 1.0 / cnt
    FROM (
        SELECT count(*) AS cnt
        FROM example_table AS x
        WHERE x.fn_key_1 = example_table.fn_key_1
          AND x.fn_key_2 = example_table.fn_key_2
    ) AS tmp
    WHERE cnt > 0
);
That one will be kind of slow though.
I'm thinking of a solution based on window functions; you may want to explore those too.
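A hedged sketch of that window-function idea, reusing the illustrative example_table / fn_key_1 / fn_key_2 names from above (ctid is used only as a convenient row identifier for a one-off update):
UPDATE example_table AS t
SET percent = sub.pct
FROM (
    SELECT ctid,
           1.0 / count(*) OVER (PARTITION BY fn_key_1, fn_key_2) AS pct
    FROM example_table
) AS sub
WHERE t.ctid = sub.ctid;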