Insert data into table effeciently, postgresql - postgresql

I am new to postgresql (and databases in general) and was hoping to get some pointers on improving the efficiency of the following statement.
I am inserting data from one table to another, and do not want to insert duplicate values. I have a rid (unique identifier in each table) that are indexed and are Primary Keys.
I am currently using the following statement:
INSERT INTO table1 SELECT * FROM table2 WHERE rid NOT IN (SELECT rid FROM table1).
As of now the table one is 200,000 records, table2 is 20,000 records. Table1 is going to keep growing (probably to around 2,000,000) and table2 will stay around 20,000 records. As of now the statement takes about 15 minutes to run. I am concerned that as Table1 grows this is going to take way to long. Any suggestions?

This should be more efficient than your current query:
INSERT INTO table1
SELECT *
FROM table2
WHERE NOT EXISTS (
SELECT 1 FROM table1 WHERE table1.rid = table2.rid
);

insert into table1
select t2.*
from
table2 t2
left join
table1 t1 on t1.rid = t2.rid
where t1.rid is null

Related

Postgres select queries inside a transaction is slow

What could be the reason that select queries performed inside a transaction is really slow
even though the queries are with just their primary key?
I have two tables after updating one row in table1 the query in table2 within the same transaction is really slow
even though the query on table2 is just
select * from table2 where id = 10
table1 and table2 have a lot of rows

How to update joined table using condition

I'm having an issue with a simple update statement. I'm new to postgresql and I'm still stuck on MS Sql Server syntax.
What I want to do is to update all records from table1 which are not present / don't exist in table2. Table1 and Table2 are having an 1 to 1 relation. The join column is "colx" from my example
On Ms SQL Server I would have something like this:
UPDATE table1 set col1='some value' from table1 t1 LEFT JOIN table2 t2 on t1.colx=t2.colx WHERE t2.colx IS NULL
or
UPDATE table1 set col1='some value' from table1 t1 where not exists (select 1 from table2 t2 where t1.colx=t2.colx)
My issue is when performing the same on PostgreSql it updates all records from table1, not only the records matching the condition (e.g. I was expecting 4 records to be updated, but all records from table1 are updated instead).
I checked using a select statement the join condition for all possible approaches and I have the expected result (e.g. 4 records).
Is there anything I'm missing?
Your question is not very clear about the requirement.
What I understood is you want to update the value of col1 in table1 for those records which are not present in the table2.
You can try it this way in Postgresql:
UPDATE table1 t1 set col1='some value' where not exists(select 1 from table2 where colx=t1.colx)
DEMO

Does SQL execute subqueries fully?

Imagine I have this SQL query and the table2 is HUGE.
select product_id, count(product_id)
from table1
where table2_ptr_id in (select id
from table2
where author is not null)
Will SQL first execute the subquery and load all the table2 into memory? like if table1 has 10 rows and table2 has 10 million rows will it be better to join first and then filter? Or DB is smart enough to optimize this query as it is written.
You have to EXPLAIN the query to know what it is doing.
However, your query will likely perform better in PostgreSQL if you rewrite it to
SELECT product_id
FROM table1
WHERE EXISTS (SELECT 1
FROM table2
WHERE table2.id = table1.table2_ptr_id
AND table2.author IS NOT NULL);
Then PostgreSQL can use an anti-join, which will probably perform much better with a huge table2.
Remark: the count in your query doesn't make any sense to me.

PostgreSQL 9.4.5: Limit number of results on INNER JOIN

I'm trying to implement a many-to-many relationship using PostgreSQL's Array type, because it scales better for my use case than a join table would. I have two tables: table1 and table2. table1 is the parent in the relationship, having the column child_ids bigint[] default array[]::bigint[]. A single row in table1 can have upwards of tens of thousands of references to table2 in the table1.child_ids column, therefore I want to try to limit the amount returned by my query to a maximum of 10. How would I structure this query?
My query to dereference the child ids is SELECT *, json_agg(table2.*) as children FROM table1 INNER JOIN table2 ON table2 = ANY(table1.child_ids). I don't see a way I could set a limit without limiting the entire response as a whole. Is there a way to either limit this INNER JOIN, or at least utilize a subquery to that I can use LIMIT to restrict the amount of results from table2?
This would have been dead simple with properly normalized tables, but here goes with arrays:
SELECT *
FROM table1 t1, LATERAL (
SELECT json_agg(*) AS children
FROM table2
WHERE id = ANY (t1.child_ids)
LIMIT 10) t2;
Of course, you have no influence over which 10 rows per id of table2 will be selected.

Select from view and join from one table or another if no record available in first

I have a view and two tables. Tables one and two have the same columns, but table one is has as small number of records, and table two has old data and a huge number of records.
I have to join a view with these two tables to get the latest data from table one; if a record from the view is not available in table one then I have to select the record from table two.
How can i achieve this with MySQL?
I came to know by doing some research in internet that we can't apply full join and sub query in from clause.
Just do a simple UNION of the results excluding the records in table2 that are already mentioned in table1:
SELECT * FROM table1
UNION
SELECT * FROM table2
WHERE NOT EXISTS (SELECT * FROM table1 WHERE table2.id = table1.id)
Something like this.
SELECT *
FROM view1 V
INNER JOIN (SELECT COALESCE(a.commoncol, b.commoncol) AS commoncol
FROM table1 A
FULL OUTER JOIN table2 B
ON A.commoncol = B.commoncol) C
ON v.viewcol = c.commoncol
If you are using Mysql then check here to simulate Full Outer Join in MySQL
are you trying to update the view from two tables where old record in view needs to be overwritten by latest/updated record from table1 and non existant records from table1 to be appended from table2?
, or are you creating a view from two tables?