Remove repetitive rows by checking different columns - postgresql

I have a data sample such as the one below:
How can PostgreSQL eliminate duplicate rows by checking column1 and column2 across different rows? I have tried a normal comparison but with no luck so far. Hope someone may share the idea.

SELECT d1.*
FROM distance d1
WHERE NOT EXISTS (SELECT 1
                  FROM distance d2
                  WHERE d1."from" = d2."to"
                    AND d1."to" = d2."from"
                    AND d2."from" < d2."to"
                 );
If there is a “duplicate”, this query will only pick the row where "from" < "to".
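An alternative sketch (assuming the same distance table with "from" and "to" columns) normalizes each pair with LEAST/GREATEST and keeps exactly one row per unordered pair, whichever direction it was stored in:
SELECT DISTINCT ON (LEAST("from", "to"), GREATEST("from", "to")) d.*
FROM distance d
ORDER BY LEAST("from", "to"), GREATEST("from", "to"), "from";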

Related

Repeat the value from left table if date from right table falls between start date and end date

I would like to join my two tables on [Group] and the [YearMonth] dates: where [YRMO_NB] from Table 2 falls between [ENR_START] and [ENR_END] from Table 1, repeat the value of column [PHASE] for each related row (just like the last column of the second picture, = [PHASE]) and leave unmatched rows blank.
I did this, which only gives me exact matches:
ON A.GROUP = PHASE.GROUP
AND A.YRMO_NB = PHASE.ENR_START
Table 1
Table 2
Is there an easy way to do this?
Thank you!
I figured it out:
ON A.GROUP = PHASE.GROUP
AND A.YRMO_NB >= PHASE.ENR_START
AND A.YRMO_NB <= PHASE.ENR_END
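Put together, the full statement would look roughly like this (a sketch only: the Table1/Table2 names and the LEFT JOIN are assumptions based on the description, and [GROUP] is bracketed because GROUP is a reserved word):
SELECT A.*, PHASE.[PHASE]
FROM Table2 AS A
LEFT JOIN Table1 AS PHASE
       ON A.[GROUP] = PHASE.[GROUP]
      AND A.YRMO_NB BETWEEN PHASE.ENR_START AND PHASE.ENR_END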

How to do a one-to-many join with conditions in postgresql

Forgive me, I don't know how to ask this question or google for an answer. It may have already been answered elsewhere on Stack; let me know if it has.
I want to use postgresql to join Table A with Table B in a one-to-many fashion: each row in Table A should be joined to (and multiplied by) every row in Table B whose values in a set of columns fall within the range defined by the corresponding columns in Table A.
Basically:
Where Start_A >= Start_B AND End_A <= End_B
Like so:
I think this can help you. But where you say "Basically: Where Start_A >= Start_B AND End_A <= End_B", I think that is a mistake, because in your expected result I see Start_A <= Start_B AND End_A >= End_B. I would write the query for you like this:
SELECT *
FROM a
LEFT JOIN b
  ON a.startA <= b.startB
 AND a.endA >= b.endB;
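For concreteness, a small self-contained sketch of that range-containment join (table and column names here are illustrative, not from the original post):
CREATE TABLE a (id int, start_a int, end_a int);
CREATE TABLE b (id int, start_b int, end_b int);

INSERT INTO a VALUES (1, 10, 50), (2, 60, 70);
INSERT INTO b VALUES (1, 20, 30), (2, 40, 45), (3, 80, 90);

-- each row of a is paired with every row of b whose range lies inside it;
-- a-rows with no contained b range still appear once, with NULLs for b
SELECT a.id AS a_id, b.id AS b_id
FROM a
LEFT JOIN b
  ON a.start_a <= b.start_b
 AND a.end_a >= b.end_b;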

Finding if values in two columns exist

I have two columns of dates and I want to run a query that returns TRUE if a date exists in the first column and a date exists in the second column.
I know how to do it when I'm looking for a match (if the data entry in column A is the SAME as the entry in column B), but I don't know how to check whether the data entries in columns A and B both exist.
Does anyone know how to do this? Thanks!
If data in a column is present, it IS NOT NULL. You can query for that on both columns, with an AND clause, to get your result:
SELECT (date1 IS NOT NULL AND date2 IS NOT NULL) AS both_dates
FROM mytable;
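If instead you want only the rows where both dates are present, the same checks can go into the WHERE clause (a sketch using the same mytable/date1/date2 names):
SELECT *
FROM mytable
WHERE date1 IS NOT NULL
  AND date2 IS NOT NULL;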
So, rephrasing:
For any two entries in table x with date columns a and b, is there some pair of rows x1 and x2 where x1.a = x2.b?
If that's what you're trying to do, you want a self-join, e.g., presuming the presence of a single key column named id:
SELECT x1.id, x2.id, x1.a AS x1_a_x2_b
FROM mytable x1
INNER JOIN mytable x2 ON (x1.a = x2.b);
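And if all you need is a single TRUE/FALSE answer for the whole table, the self-join can be wrapped in EXISTS (same assumed names as above):
SELECT EXISTS (
    SELECT 1
    FROM mytable x1
    JOIN mytable x2 ON x1.a = x2.b
) AS any_pair;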

Long-running query on a self-joined table

I'm trying to improve the performance of a query that updates a column on each row of a table by comparing that row's values with all other rows in the same table. Here is the query:
update F set
    PartOfPairRC = 1
from RangeChange F
where Reject = 0
  and exists (
      select 1 from RangeChange S
      where F.StoreID = S.StoreID
        and F.ItemNo = S.ItemNo
        and F.Reject = S.Reject
        and F.ChangeDateEnd = S.ChangeDate - 1)
The query's performance degrades rapidly as the number of rows in the table increases. I have 50 million rows in the table.
Is there a better way to do this? Would SSIS be able to handle such an operation better?
Any help much appreciated, thanks Robert
You can try to create an index on that table:
create index idx_test on RangeChange(StoreID, ItemNo, Reject, ChangeDateEnd) where reject = 0
-- when you are not using SQL Server Enterprise, get rid of the where condition in the index and put the Reject column in as an included column instead
-- make sure you already have a clustered index on the table (if not, you can create the index above as clustered)
-- I would write the query as a join:
update F set
    F.PartOfPairRC = 1
from RangeChange F
join RangeChange S
  on F.StoreID = S.StoreID
 and F.ItemNo = S.ItemNo
 and F.Reject = S.Reject
 and F.ChangeDateEnd = S.ChangeDate - 1
where F.Reject = 0 and S.Reject = 0
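For reference, the index variant described in the comments above (no filter, with Reject as an included column) would look roughly like this; idx_test2 is just a placeholder name:
create index idx_test2 on RangeChange(StoreID, ItemNo, ChangeDateEnd) include (Reject)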

PostgreSQL and pl/pgsql SYNTAX to update fields based on SELECT and FUNCTION (while loop, DISTINCT COUNT)

I have a large database in which I want to apply some logic to populate new fields.
The primary key of the table harvard_assignees is id.
The LOGIC GOES LIKE THIS
Select all of the records based on id
For each record (WHILE), if (state is NOT NULL && country is NULL), update country_out = "US" ELSE update country_out=country
I see step 1 as a PostgreSQL query and step 2 as a function. Just trying to figure out the easiest way to implement natively with the exact syntax.
====
The second function is a little more interesting, requiring (I believe) DISTINCT:
Find all DISTINCT foreign_keys (a bivariate key of pat_type,patent)
Count Records that contain that value (e.g., n=3 records have fkey "D","388585")
Update those 3 records to identify percent as 1/n (e.g., UPDATE 3 records, set percent = 1/3)
For the first one:
UPDATE harvard_assignees
SET country_out = (CASE
        WHEN (state IS NOT NULL AND country IS NULL) THEN 'US'
        ELSE country
    END);
At first it had condition "id = ..." but I removed that because I believe you actually want to update all records.
And for the second one:
UPDATE example_table
SET percent = (
    SELECT 1.0 / cnt
    FROM (
        SELECT count(*) AS cnt
        FROM example_table AS x
        WHERE x.fn_key_1 = example_table.fn_key_1
          AND x.fn_key_2 = example_table.fn_key_2
    ) AS tmp
    WHERE cnt > 0
);
That one will be kind of slow though.
I'm thinking of a solution based on window functions; you may want to explore those too.
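A hedged sketch of that window-function idea, reusing the illustrative example_table / fn_key_1 / fn_key_2 names from above (ctid is used only as a convenient row identifier for a one-off update):
UPDATE example_table AS t
SET percent = sub.pct
FROM (
    SELECT ctid,
           1.0 / count(*) OVER (PARTITION BY fn_key_1, fn_key_2) AS pct
    FROM example_table
) AS sub
WHERE t.ctid = sub.ctid;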