How to insert only new rows into tables without primary or foreign keys? - tsql

Scenario: I have two tables, Table A and Table B, which have exactly the same columns. My task is to combine them into a master table. I need to ensure the master table contains no duplicates; only genuinely new records should be added.
Problem: whoever built the tables did not assign a primary key to either one.
Attempts: I tried running an INSERT INTO ... WHERE NOT EXISTS query (below as an example, not the actual query I ran).
Question: the portion of the query below, WHERE t2.id = t1.id, confuses me. My table has a multitude of columns, and there is no id column; as I said, it has no primary key to anchor the match. So, in a scenario where all I have are values without primary keys, how can I append only new records? Also, perhaps I am going about this the wrong way, but are there any other functions or options in T-SQL worth considering? Maybe not an INSERT INTO statement, or perhaps something else? My SQL skills aren't that advanced yet, so I am not asking for a solution, just ideas or other methods worth considering. Any ideas are welcome.
INSERT INTO TABLE_2 (id, name)
SELECT t1.id,
       t1.name
FROM TABLE_1 t1
WHERE NOT EXISTS (SELECT id
                  FROM TABLE_2 t2
                  WHERE t2.id = t1.id)

If I understand your question correctly, you would need to amend the SQL sample you posted by changing the condition t2.id = t1.id to whatever columns you do have.
Say your two tables have name and brand columns and you don't want duplicates; just change the sample to:
WHERE t2.name = t1.name
AND t2.brand = t1.brand
This will ensure you don't insert any rows from table 1 into table 2 that are duplicates. You would have to make sure the WHERE condition compares all the columns (you said the table schemas are identical).
Also, the above code sample copies everything into table 2 - but you said you want a master table - so you'd have to change it to insert into the master table, not table 2.
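If listing every column in the WHERE clause gets unwieldy, T-SQL's EXCEPT operator is worth considering: it compares entire rows and, unlike explicit = comparisons, treats two NULLs as equal. A minimal sketch, reusing the hypothetical name and brand columns, with MasterTable standing in for your master table:
-- Sketch only: EXCEPT compares every selected column at once and
-- matches NULL to NULL, which the = comparisons above do not.
INSERT INTO MasterTable (name, brand)
SELECT t1.name, t1.brand
FROM TABLE_1 AS t1
EXCEPT
SELECT m.name, m.brand
FROM MasterTable AS m;
As a side effect, EXCEPT also removes duplicates within TABLE_1 itself, which suits the goal of a duplicate-free master table.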

redshift nulling columns when joining another table

table_1 has 35 columns, table_2 has 20 columns
query is:
SELECT table1.*,
       table2.f1,
       ...
       table2.f20
FROM public.table_1 AS table1
LEFT JOIN public.table_2 AS table2
    ON table1.id = table2.id
    AND table1.arrival_time::date <= table2.end_date::date
    AND table2.activity_date < table2.end_date
;
This works in the sense that I expect 469 rows to be returned, and that's what I get. However, several fields from table_1 come back as null instead of the values actually in the table.
These fields are NOT part of the join.
Due to IP concerns I can't provide the full details of the tables. Every field in table_1 and table_2 is varchar (don't ask me why a timestamp is stored as a varchar; it's a long story that I have no control over).
This query WORKS in RDS PostgreSQL!
Any ideas why it has a problem in Redshift?
Well, I was very confused.
table_1 is data from two sources joined together - I didn't even think to look at the sources. Turns out the linked source had no data for one value.
Just goes to show that when looking at just a piece of the data you need to look HARD at all the data.
Now I'm off to find a better source for the missing data.
Thanks for your time!
James
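(A postscript for future readers: when fields unexpectedly come back null, one quick sanity check is to count the nulls directly in the source table before blaming the join. some_field here is just a stand-in for whichever column looks wrong:)
-- Hypothetical diagnostic: if this returns more than zero, the nulls
-- were already in table_1 and the LEFT JOIN is not the culprit.
SELECT COUNT(*) AS null_rows
FROM public.table_1
WHERE some_field IS NULL;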

Update or insert with outer join in postgres

Is it possible to add a new column to an existing table from another table using INSERT or UPDATE in conjunction with a FULL OUTER JOIN?
In my main table I am missing some records in one column; the other table has all those records, and I want to bring the full record set into the main table. Something like this:
UPDATE maintable
SET all_records= othertable.records
FROM
FULL JOIN othertable on maintable.col = othertable.records;
where maintable.col has the same id as othertable.records.
I know I could simply join the tables, but I have a lot of comments in maintable that I don't want to have to copy and paste back in if possible. As I understand it, using WHERE like this is the equivalent of an inner join, so it won't show me what I'm missing.
EDIT:
What I want is effectively a new maintable.col with all the records, which I can then pare down based on the presence of records in other columns from other tables.
Try this:
UPDATE maintable
SET all_records = o.records
FROM othertable o
WHERE maintable.col = o.records;
This is the general syntax to use in postgres when updating via a join.
HTH
EDIT
BTW you will need to change this - I used your example, but you are updating the maintable with the column used for the join! Your set needs to be something like SET missingcol = o.extracol
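Putting that together (missingcol and extracol being the stand-in names from above), the corrected statement would look something like:
-- Sketch: join on the shared key, update the column that is actually missing
UPDATE maintable
SET missingcol = o.extracol
FROM othertable o
WHERE maintable.col = o.records;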
AMENDED GENERALISED ANSWER (following off-line chat)
To take a simplified example, suppose that you have two tables maintable and subtable, each with the same columns, but where the subtable has extra records. For both tables id is the primary key. To fill maintable with the missing records, for pre 9.5 versions of Postgres you must use the following syntax:
INSERT INTO maintable (SELECT * FROM subtable s WHERE NOT EXISTS
(SELECT 1 FROM maintable m WHERE m.id = s.id));
Since 9.5 there is a (preferred) alternative:
INSERT INTO maintable (SELECT * FROM subtable) ON CONFLICT DO NOTHING;
This is preferred because (apart from being simpler) it avoids the situation that has been known to arise in the former, where a race condition is created between the INSERT and the sub-SELECT.
Obviously when the columns are different, you need to specify in the INSERT statement which columns are inserted from which. Something like:
INSERT INTO maintable (id, ColA, ColB)
(SELECT id, ColE, ColG FROM subtable ....)
Similarly the common field might not be id in both tables. However, the simplified example should be enough to point you in the right direction.
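For completeness, here is the whole pattern end to end as a runnable sketch, using the hypothetical tables and columns from above (Postgres 9.5+):
-- Hypothetical schema, for illustration only
CREATE TABLE maintable (id int PRIMARY KEY, ColA text, ColB text);
CREATE TABLE subtable  (id int PRIMARY KEY, ColE text, ColG text);

-- Copy only the rows whose id is not already present; ON CONFLICT
-- needs a unique constraint (here the primary key) to detect clashes
INSERT INTO maintable (id, ColA, ColB)
SELECT id, ColE, ColG
FROM subtable
ON CONFLICT (id) DO NOTHING;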

Postgres subquery has access to column in a higher level table. Is this a bug? or a feature I don't understand?

I don't understand why the following doesn't fail. How does the subquery have access to a column from a different table at the higher level?
drop table if exists temp_a;
create temp table temp_a as
(
select 1 as col_a
);
drop table if exists temp_b;
create temp table temp_b as
(
select 2 as col_b
);
select col_a from temp_a where col_a in (select col_a from temp_b);
/*why doesn't this fail?*/
The following fail, as I would expect them to.
select col_a from temp_b;
/*ERROR: column "col_a" does not exist*/
select * from temp_a cross join (select col_a from temp_b) as sq;
/*ERROR: column "col_a" does not exist
*HINT: There is a column named "col_a" in table "temp_a", but it cannot be referenced from this part of the query.*/
I know about the LATERAL keyword (link, link) but I'm not using LATERAL here. Also, this query succeeds even in pre-9.3 versions of Postgres (when the LATERAL keyword was introduced.)
Here's a sqlfiddle: http://sqlfiddle.com/#!10/09f62/5/0
Thank you for any insights.
Although this feature might be confusing, without it, several types of queries would be more difficult, slower, or impossible to write in SQL. This feature is called a "correlated subquery", and the correlation can serve a similar function to a join.
For example: Consider this statement
select first_name, last_name from users u
where exists (select * from orders o where o.user_id=u.user_id)
This query will get the names of all the users who have ever placed an order. Now, I know, you can get that info using a join to the orders table, but you'd also have to use DISTINCT, which internally requires a sort and would likely perform a tad worse than this query. You could also produce a similar query with a GROUP BY.
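For comparison, a join version of the same query (same hypothetical users and orders tables) needs DISTINCT to collapse the one-row-per-order result back down to one row per user:
-- Join equivalent: without DISTINCT, a user with three orders
-- would appear three times in the result
SELECT DISTINCT u.first_name, u.last_name
FROM users u
INNER JOIN orders o ON o.user_id = u.user_id;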
Here's a better example that's pretty practical, and not just for performance reasons. Suppose you want to delete all users who have no orders and no tickets.
delete from users u where
not exists (select * from orders o where o.user_id = u.user_id)
and not exists (select * from tickets t where t.user_id = u.user_id)
One very important thing to note is that you should fully qualify or alias your table names when doing this or you might wind up with a typo that completely messes up the query and silently "just works" while returning bad data.
The following is an example of what NOT to do.
select * from users
where exists (select * from product where last_updated_by=user_id)
This looks just fine until you look at the tables and realize that the table "product" has no "last_updated_by" field while the users table does, so the condition silently compares users.last_updated_by to users.user_id and returns the wrong data. Add the alias and the query will fail, because no "last_updated_by" column exists in product.
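For illustration, here is the aliased version, which fails fast instead of silently returning every user:
-- With the alias, the typo surfaces immediately instead of quietly
-- resolving to users.last_updated_by:
--   ERROR: column p.last_updated_by does not exist
SELECT * FROM users u
WHERE EXISTS (SELECT * FROM product p WHERE p.last_updated_by = u.user_id);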
I hope this has given you some examples that show you how to use this feature. I use correlated subqueries all the time in UPDATE and DELETE statements (as well as in SELECTs, but it's in UPDATEs and DELETEs that I most often find an absolute need for them).

Easy way of re-indexing my table

Good Day,
I'm currently using PostgreSQL as my backend and I have to make huge changes to my table fields.
I will be using two tables.
Table 1          Table 2
---------        ---------
Old Index        New Index
Product Id       Old Index
Address          Product Id
Contact no       Address
                 Contact no
                 Email
I have to migrate all details from Table 1 to Table 2. I'm using a different index for Table 2.
For my other tables to recognize my old index, I used this query:
UPDATE Table2 SET OldIndex = Table2.Index
FROM (SELECT OldIndex FROM Table1) AS new, Table1
WHERE Table1.ProductId = Table2.ProductId
I have other tables related to Table 1, so my goal is to replace the old index with the new index and have those other tables pick up the change too.
But I'm not sure I'm doing this right; my query is slow. I hope someone can test my query and point me in the right direction if I'm doing it all wrong. Thank you in advance.
Would you mind trying MERGE?
MERGE INTO Table2 AS b
USING Table1 AS p
    ON p.product_id = b.product_id
WHEN MATCHED THEN
    UPDATE SET b.OldIndex = p.OldIndex;
I do not know how it works for PostgreSQL, but you can find some samples here: https://wiki.postgresql.org/wiki/MergeTestExamples
The way to do this in PostgreSQL is to use a writable CTE (available in 9.1 and later).
In this way you would do something like:
WITH up AS (
    UPDATE table2
    SET ....
    FROM table1 t1
    WHERE t1.product_id = table2.product_id
    RETURNING product_id
)
INSERT INTO table2 (...)
SELECT ...
FROM table1
WHERE product_id NOT IN (SELECT product_id FROM up);
You can find some examples here.
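Applied to the tables in the question, the pattern might look like the sketch below. The column names are guesses based on the layout above, and new_index and email are left out of the INSERT on the assumption that something else generates them:
WITH up AS (
    -- Refresh the rows that already exist in table2
    UPDATE table2
    SET old_index  = t1.old_index,
        address    = t1.address,
        contact_no = t1.contact_no
    FROM table1 t1
    WHERE t1.product_id = table2.product_id
    RETURNING table2.product_id
)
-- Then insert the rows that were not updated
INSERT INTO table2 (old_index, product_id, address, contact_no)
SELECT old_index, product_id, address, contact_no
FROM table1
WHERE product_id NOT IN (SELECT product_id FROM up);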

When is a Deadlock not a Deadlock?

I'm asking this question because I'm getting a deadlock from time to time that I don't understand.
This is the scenario:
Stored Procedure that updates table A:
UPDATE A
SET A.[Column] = @SomeValue
WHERE A.ID = @ID
Stored Procedure that inserts into a temp table #temp:
INSERT INTO #temp (Column1,Column2)
SELECT B.Column1, A.Column2
FROM B
INNER JOIN A
ON A.ID = B.ID
WHERE B.Code IN ('Something','SomethingElse')
I see that there could possibly be a lock wait, but I fail to see how a deadlock would occur. Am I missing something obvious?
EDIT:
The SPs that I typed here are obviously simplified versions but I'm using the columns involved. The structure of both tables would be:
CREATE TABLE A (ID INT IDENTITY PRIMARY KEY,
                [Column] VARCHAR(100))
CREATE TABLE B (ID INT IDENTITY PRIMARY KEY,
                Code VARCHAR(100))
Try this: since it's causing locks, specify the table hint keyword for the table names:
WITH (NOLOCK)
So something like this for your scenario:
INSERT INTO #temp (Column1, Column2)
SELECT B.Column1, A.Column2
FROM B WITH (NOLOCK)
INNER JOIN A WITH (NOLOCK)
    ON A.ID = B.ID
WHERE B.Code IN ('Something', 'SomethingElse')
See how you go then.
You can look up the other table hints for T-SQL / SQL Server to see which one suits you best. The one I specified, NOLOCK, does not take shared locks when reading, which means it can read uncommitted (dirty) data from rows another process is modifying; if what you actually want is to skip locked rows, READPAST is the hint that does that. So if you don't care about those effects, you can use it.
I am not sure about temp tables, but you can also use table hints with INSERT: INSERT INTO table WITH (TABLE_HINT).
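For example, TABLOCK is one hint commonly used on the INSERT side. A sketch for this scenario (whether any of these hints actually make the deadlock go away depends on your workload):
-- Sketch only: TABLOCK takes one table-level lock on the target instead
-- of many row locks; NOLOCK reads without shared locks and may return
-- uncommitted data, so use both with care.
INSERT INTO #temp WITH (TABLOCK) (Column1, Column2)
SELECT B.Column1, A.Column2
FROM B WITH (NOLOCK)
INNER JOIN A WITH (NOLOCK)
    ON A.ID = B.ID
WHERE B.Code IN ('Something', 'SomethingElse');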