PostgreSQL 9.4.5: Limit number of results on INNER JOIN - postgresql

I'm trying to implement a many-to-many relationship using PostgreSQL's Array type, because it scales better for my use case than a join table would. I have two tables: table1 and table2. table1 is the parent in the relationship, having the column child_ids bigint[] default array[]::bigint[]. A single row in table1 can have upwards of tens of thousands of references to table2 in the table1.child_ids column, therefore I want to try to limit the amount returned by my query to a maximum of 10. How would I structure this query?
My query to dereference the child ids is SELECT *, json_agg(table2.*) as children FROM table1 INNER JOIN table2 ON table2 = ANY(table1.child_ids). I don't see a way I could set a limit without limiting the entire response as a whole. Is there a way to either limit this INNER JOIN, or at least utilize a subquery to that I can use LIMIT to restrict the amount of results from table2?

This would have been dead simple with properly normalized tables, but here goes with arrays:
SELECT *
FROM table1 t1, LATERAL (
SELECT json_agg(*) AS children
FROM table2
WHERE id = ANY (t1.child_ids)
LIMIT 10) t2;
Of course, you have no influence over which 10 rows per id of table2 will be selected.

Related

If two PostgreSQL table has gist index, can their union be considered as an indexed table?

I have three tables, table_a, table_b, table_c. All of them has gist index.
I would like to perform a left join between table_c and the UNION of table_a and table_b.
Can the UNION be considered "indexed"? I assume it would better to create new table as the UNION, but these tables are huge so I try to avoid this kind of redundancy.
In terms of SQL, my question:
Is this
SELECT * FROM myschema.table_c AS a
LEFT JOIN
(SELECT col_1,col_2,the_geom FROM myschema.table_a
UNION
SELECT col_1,col_2,the_geom FROM myschema.table_b) AS b
ON ST_Intersects(a.the_geom,b.the_geom);
equal to this?
CREATE TABLE myschema.table_d AS
SELECT col_1,col_2,the_geom FROM myschema.table_a
UNION
SELECT col_1,col_2,the_geom FROM myschema.table_b;
CREATE INDEX idx_table_d_the_geom
ON myschema.table_d USING gist
(the_geom)
TABLESPACE mydb;
SELECT * FROM myschema.table_c AS a
LEFT JOIN myschema.table_d AS b
ON ST_Intersects(a.the_geom,b.the_geom);
You can look at the execution plan with EXPLAIN, but I doubt that it will use the indexes.
Rather than performing a left join between one table and the union of three other tables, you should perform the union of the left joins between the one table and each of the three tables in turn. That will be a longer statement, but PostgreSQL will be sure to use the index if that can speed up the left joins.
Be sure to use UNION ALL rather than UNION unless you really have to remove duplicates.

Select from view and join from one table or another if no record available in first

I have a view and two tables. Tables one and two have the same columns, but table one is has as small number of records, and table two has old data and a huge number of records.
I have to join a view with these two tables to get the latest data from table one; if a record from the view is not available in table one then I have to select the record from table two.
How can i achieve this with MySQL?
I came to know by doing some research in internet that we can't apply full join and sub query in from clause.
Just do a simple UNION of the results excluding the records in table2 that are already mentioned in table1:
SELECT * FROM table1
UNION
SELECT * FROM table2
WHERE NOT EXISTS (SELECT * FROM table1 WHERE table2.id = table1.id)
Something like this.
SELECT *
FROM view1 V
INNER JOIN (SELECT COALESCE(a.commoncol, b.commoncol) AS commoncol
FROM table1 A
FULL OUTER JOIN table2 B
ON A.commoncol = B.commoncol) C
ON v.viewcol = c.commoncol
If you are using Mysql then check here to simulate Full Outer Join in MySQL
are you trying to update the view from two tables where old record in view needs to be overwritten by latest/updated record from table1 and non existant records from table1 to be appended from table2?
, or are you creating a view from two tables?

How does COUNT(*) behave in an inner join

Take this query:
SELECT c.CustomerID, c.AccountNumber, COUNT(*) AS CountOfOrders,
SUM(s.TotalDue) AS SumOfTotalDue
FROM Sales.Customer AS c
INNER JOIN Sales.SalesOrderheader AS s ON c.CustomerID = s.CustomerID
GROUP BY c.CustomerID, c.AccountNumber
ORDER BY c.CustomerID;
I expected COUNT(*) to count the rows in Sales.Customer but to my surprise it counts the number of rows in the joined table.
Any idea why this is? Also, is there a way to be explicit in specifying which table COUNT() should operate on?
Query Processing Order...
The FROM clause is processed before the SELECT clause -- which is to say -- by the time SELECT comes into play, there is only one (virtual) table it is selecting from -- namely, the individual tables after their joined (JOIN), filtered (WHERE), etc.
If you just want to count over the one table, then you might try a couple of things...
COUNT(DISTINCT table1.id)
Or turn the table you want to count into a sub-query with count() inside of it

Insert data into table effeciently, postgresql

I am new to postgresql (and databases in general) and was hoping to get some pointers on improving the efficiency of the following statement.
I am inserting data from one table to another, and do not want to insert duplicate values. I have a rid (unique identifier in each table) that are indexed and are Primary Keys.
I am currently using the following statement:
INSERT INTO table1 SELECT * FROM table2 WHERE rid NOT IN (SELECT rid FROM table1).
As of now the table one is 200,000 records, table2 is 20,000 records. Table1 is going to keep growing (probably to around 2,000,000) and table2 will stay around 20,000 records. As of now the statement takes about 15 minutes to run. I am concerned that as Table1 grows this is going to take way to long. Any suggestions?
This should be more efficient than your current query:
INSERT INTO table1
SELECT *
FROM table2
WHERE NOT EXISTS (
SELECT 1 FROM table1 WHERE table1.rid = table2.rid
);
insert into table1
select t2.*
from
table2 t2
left join
table1 t1 on t1.rid = t2.rid
where t1.rid is null

Why does SQL JOIN allow duplicates but IN does not

Example scenario:
TABLE_A contains a column called ID and also contains duplicate rows. There is another table called ID_TABLE that contains IDs. Assuming no duplicates in ID_TABLE -
If I do:
SELECT * FROM TABLE_A
INNER JOIN ID_TABLE ON ID_TABLE.ID = TABLE_A.ID
There will be duplicates in the result set. However, if I do:
SELECT * FROM TABLE_A
WHERE TABLE_A.ID IN (SELECT ID_TABLE.ID FROM ID_TABLE)
There will not be any duplicates in the result set.
Does anyone know why the JOIN clause allows duplicates while the IN clause does not? I had thought they did the same thing.
Thanks
It's not that it's allowing duplicates. By joining the two tables, you are creating a product from table 1 and table 2, so if TABLE_A has two records for ID=1 and ID_Table has 1 record, the resulting product is two records. Using IN doesn't cause a multiplication of records, even if the value is listed in the IN clause multiple times as you are only getting the unique records matching the values within the IN clause.