Why does a SQL JOIN return duplicates but IN does not? (T-SQL)

Example scenario:
TABLE_A contains a column called ID and also contains duplicate rows. There is another table called ID_TABLE that contains IDs. Assuming there are no duplicates in ID_TABLE, if I do:
SELECT * FROM TABLE_A
INNER JOIN ID_TABLE ON ID_TABLE.ID = TABLE_A.ID
There will be duplicates in the result set. However, if I do:
SELECT * FROM TABLE_A
WHERE TABLE_A.ID IN (SELECT ID_TABLE.ID FROM ID_TABLE)
There will not be any duplicates in the result set.
Does anyone know why the JOIN clause allows duplicates while the IN clause does not? I had thought they did the same thing.
Thanks

It's not that JOIN is "allowing" duplicates. A join returns one row for every matching pair of rows from the two tables. So if TABLE_A has one record with ID=1 and ID_TABLE has two records with ID=1, the join produces two rows for that ID. IN doesn't multiply records. It only checks whether each TABLE_A row's ID appears in the list, so each TABLE_A row is returned at most once, even if the value appears in the IN list several times. Note that IN does not remove duplicates that already exist in TABLE_A itself; it just never adds new ones.
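To make the row counts concrete, here is a small sketch that runs both queries from the question. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, which is an assumption: the JOIN and IN semantics shown are standard SQL, but this is not T-SQL itself. The sample data is made up so that ID 1 appears twice in ID_TABLE.

```python
import sqlite3


def compare_join_and_in():
    """Run the JOIN and IN queries against a tiny in-memory database."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE TABLE_A (ID INTEGER, NAME TEXT)")
    cur.execute("CREATE TABLE ID_TABLE (ID INTEGER)")
    # TABLE_A holds one row per ID ...
    cur.executemany("INSERT INTO TABLE_A VALUES (?, ?)",
                    [(1, "a"), (2, "b"), (3, "c")])
    # ... while ID_TABLE lists ID 1 twice (hypothetical sample data).
    cur.executemany("INSERT INTO ID_TABLE VALUES (?)", [(1,), (1,), (2,)])

    # JOIN: one output row per matching (TABLE_A, ID_TABLE) pair.
    join_rows = cur.execute(
        "SELECT TABLE_A.ID, TABLE_A.NAME FROM TABLE_A "
        "INNER JOIN ID_TABLE ON ID_TABLE.ID = TABLE_A.ID "
        "ORDER BY TABLE_A.ID"
    ).fetchall()

    # IN (a semi-join): each TABLE_A row is kept at most once.
    in_rows = cur.execute(
        "SELECT ID, NAME FROM TABLE_A "
        "WHERE ID IN (SELECT ID FROM ID_TABLE) "
        "ORDER BY ID"
    ).fetchall()
    conn.close()
    return join_rows, in_rows


join_rows, in_rows = compare_join_and_in()
print(join_rows)  # [(1, 'a'), (1, 'a'), (2, 'b')]
print(in_rows)    # [(1, 'a'), (2, 'b')]
```

The JOIN returns the TABLE_A row for ID 1 twice because it matches two ID_TABLE rows. The IN version returns it once.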

Related

How to do a SELECT * followed by a JOIN in SeaORM

I want to do a join with another table. I followed the tutorial on the site, and my code compiles, but it doesn't seem to perform the join; it just selects from the first table.
SELECT
"table1"."col1",
"table1"."col2",
"table1"."col3"
FROM
"table1"
JOIN "table2" ON "table1"."col1" = "table2"."col1"
LIMIT
1
It is only returning the data from table1 and not concatenating the columns where the condition for table1 and table2 is met.
I execute the query using the following code:
Entity::find()
.from_raw_sql(Statement::from_string(DatabaseBackend::Postgres, query.to_owned()))
.all(&self.connection)
.await?
That returns a Vec<Model>. Is this the correct way? Also, how can I build a SQL statement, using an Entity as the base, that looks like SELECT * FROM "table1"?
Between SELECT and FROM you specify which columns to include in the output, and your query selects only three columns, all from table1. The join itself is performed. Add the table2 columns you want to that list, and you should get the results you want.
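A quick way to see that the join runs either way, and that only the select list decides which columns come back, is this sketch. It uses sqlite3 in place of PostgreSQL, and the column `col4` in table2 is hypothetical. It covers only the SQL side. How extra columns get mapped back into SeaORM's Rust types is a separate question that this sketch does not address.

```python
import sqlite3


def run_select_lists():
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical schemas: table2 has an extra column col4 not present in table1.
    cur.execute('CREATE TABLE "table1" (col1 INTEGER, col2 TEXT, col3 TEXT)')
    cur.execute('CREATE TABLE "table2" (col1 INTEGER, col4 TEXT)')
    cur.execute('INSERT INTO "table1" VALUES (?, ?, ?)', (1, "a", "b"))
    cur.execute('INSERT INTO "table2" VALUES (?, ?)', (1, "from table2"))

    from_clause = (' FROM "table1"'
                   ' JOIN "table2" ON "table1"."col1" = "table2"."col1"'
                   ' LIMIT 1')
    # Same join, but only table1 columns are listed before FROM.
    only_table1 = cur.execute(
        'SELECT "table1"."col1", "table1"."col2", "table1"."col3"' + from_clause
    ).fetchone()
    # Same join, with the wanted table2 column added to the select list.
    with_table2 = cur.execute(
        'SELECT "table1"."col1", "table1"."col2", "table1"."col3", "table2"."col4"'
        + from_clause
    ).fetchone()
    conn.close()
    return only_table1, with_table2


only_table1, with_table2 = run_select_lists()
print(only_table1)  # (1, 'a', 'b')
print(with_table2)  # (1, 'a', 'b', 'from table2')
```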

Merging in Power Query

Although I selected a full outer join, I couldn't get all the rows from both tables.
How can I get all rows from both tables (all 12093 rows)?
Maybe another join type would help?
let
Source = Table.NestedJoin(#"Beton Irsaliye Kumulatif",{"Proje No & Adi", "Firma Kodu"},#"Beton Muhasebe Kumulatif",{"Proje No & Adi", "Hesap No"},"Beton Muhasebe Kumulatif",JoinKind.FullOuter)
in
Source
Your merge is accounting for all your rows. It's just that 4 of the rows in the first table don't have matches in the second table.
Here's a simple example of what is happening. Here, I have two tables: Table1 and Table2. Both have 10 rows. In fact, both are exactly the same.
If I choose to do a Full Outer join with these, using Col1 and Col2 for matching, I'll see this:
It tells me that 10 of the rows from the first table (Table1) match rows of the second table (Table2).
Now, if I change the last two rows of Table1 (specifically, the last two rows of Col2 of Table1) like this:
Then when I try to do a Full Outer join the same way, I'll see this:
Only 8 of the rows from the first table (Table1) match rows of the second table (Table2).
But when I continue with the merge, I'll see Table1's information in a table with Table2's matching information as embedded tables in column "NewColumn" of that table:
When I then expand "NewColumn", I see all the info from Table1, as before, and all matching info from Table2, as well as rows that don't have matches between the two tables.
All rows of both tables are accounted for.
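The same counting can be reproduced outside Power Query with a small Python sketch of a full outer join. This is a plain reimplementation for illustration, not how Table.NestedJoin works internally. The table contents are made up to mirror the 10-row example above, with Col1 and Col2 as the matching key.

```python
from collections import defaultdict


def full_outer_join(left, right, key):
    """Pair rows whose key matches; keep unmatched rows from both sides with None."""
    positions = defaultdict(list)
    for j, row in enumerate(right):
        positions[key(row)].append(j)

    matched_right = set()
    result = []
    for row in left:
        hits = positions.get(key(row), [])
        if hits:
            for j in hits:
                result.append((row, right[j]))
                matched_right.add(j)
        else:
            result.append((row, None))  # left row with no match
    for j, row in enumerate(right):
        if j not in matched_right:
            result.append((None, row))  # right row with no match
    return result


# Two identical 10-row tables, then change Col2 in the last two rows of Table1.
table2 = [(i, f"v{i}") for i in range(1, 11)]
table1 = table2[:8] + [(9, "changed9"), (10, "changed10")]

joined = full_outer_join(table1, table2, key=lambda r: (r[0], r[1]))
matched = sum(1 for l, r in joined if l is not None and r is not None)
left_only = sum(1 for l, r in joined if r is None)
right_only = sum(1 for l, r in joined if l is None)
print(matched, left_only, right_only, len(joined))  # 8 2 2 12
```

So a full outer join of two 10-row tables gives 12 rows here, not 10 or 20. Assuming keys are unique on each side, every row of both tables appears exactly once, either paired with its match or padded with nulls on the other side.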

PostgreSQL 9.4.5: Limit number of results on INNER JOIN

I'm trying to implement a many-to-many relationship using PostgreSQL's array type, because it scales better for my use case than a join table would. I have two tables: table1 and table2. table1 is the parent in the relationship and has the column child_ids bigint[] default array[]::bigint[]. A single row in table1 can reference tens of thousands of table2 rows in the table1.child_ids column, so I want to limit the number of children returned by my query to a maximum of 10. How would I structure this query?
My query to dereference the child ids is SELECT *, json_agg(table2.*) as children FROM table1 INNER JOIN table2 ON table2 = ANY(table1.child_ids). I don't see a way to set a limit without limiting the entire response. Is there a way to limit this INNER JOIN, or at least to use a subquery so that I can use LIMIT to restrict the number of rows from table2?
This would have been dead simple with properly normalized tables, but here goes with arrays:
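SELECT *
FROM table1 t1, LATERAL (
  SELECT json_agg(c) AS children
  FROM (
    SELECT * FROM table2
    WHERE id = ANY (t1.child_ids)
    LIMIT 10
  ) c
) t2;
The LIMIT has to go in the inner subquery, before json_agg runs. An aggregate without GROUP BY always returns exactly one row, so a LIMIT placed after it would not reduce the number of children. Without an ORDER BY in the inner subquery, you have no control over which 10 rows of table2 are picked for each table1 row; add one (for example ORDER BY id) if that matters.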
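Why the LIMIT must sit below the aggregate can be checked with a small sketch. It uses sqlite3 in place of PostgreSQL, with group_concat standing in for json_agg; both are aggregates that collapse their input into one row. The 25-row table2 is made-up data for a single parent.

```python
import sqlite3


def limit_placement_demo():
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY)")
    cur.executemany("INSERT INTO table2 (id) VALUES (?)",
                    [(i,) for i in range(1, 26)])

    # LIMIT applied after aggregation: the aggregate yields a single row,
    # so LIMIT 10 has nothing to cut and all 25 ids end up in the list.
    (after,) = cur.execute(
        "SELECT group_concat(id) FROM table2 LIMIT 10"
    ).fetchone()

    # LIMIT applied in a subquery before aggregation: only 10 ids are aggregated.
    (before,) = cur.execute(
        "SELECT group_concat(id) FROM "
        "(SELECT id FROM table2 ORDER BY id LIMIT 10) AS t"
    ).fetchone()
    conn.close()
    return len(after.split(",")), len(before.split(","))


print(limit_placement_demo())  # (25, 10)
```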

Most efficient way to insert multiple rows of integers

I have two simple SELECT queries that each return a list of IDs. The first one returns, say, 5 IDs:
1, 2, 5, 10, 23
The second one returns a list of 50 IDs in no particular order.
What is the most efficient way to write a query that maps each ID from the first table to every ID from the second table?
Edit: sorry, here is more info.
If table 1 has a result of ids = 1, 2, 5, 10, 23
and table 2 has a list of ids = 123, 234, 345, 456, 567
I would like to write an insert that would insert into table 3 these values
Table1ID | Table2ID
1|123
1|234
1|345
1|456
1|567
2|123
2|234
2|345
2|456
2|567
and so on.
It seems like what you are looking for is a Cartesian product.
You can get it by joining the two tables with no join condition, which is exactly what CROSS JOIN does.
INSERT dbo.TableC (AID, BID)
SELECT A.ID, B.ID
FROM
dbo.TableA A
CROSS JOIN dbo.TableB B
;
Here is an image with a visualization of a Cartesian product. The inputs are small, just the column of symbols on the left corresponding to the first table, and the column on the right being the second table. Upon performing a JOIN with no conditions, you get one row per connecting line in the middle.
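The CROSS JOIN insert can be tried end to end with this sketch. It uses sqlite3 instead of SQL Server, which is an assumption; the `dbo.` schema prefix is dropped because SQLite has no such schemas. The IDs are the ones from the question's edit, so 5 × 5 = 25 rows are expected. With 5 and 50 IDs it would be 250.

```python
import sqlite3


def cross_join_insert(ids_a, ids_b):
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE TableA (ID INTEGER)")
    cur.execute("CREATE TABLE TableB (ID INTEGER)")
    cur.execute("CREATE TABLE TableC (AID INTEGER, BID INTEGER)")
    cur.executemany("INSERT INTO TableA (ID) VALUES (?)", [(i,) for i in ids_a])
    cur.executemany("INSERT INTO TableB (ID) VALUES (?)", [(i,) for i in ids_b])
    # One INSERT ... SELECT: every TableA row paired with every TableB row.
    cur.execute(
        "INSERT INTO TableC (AID, BID) "
        "SELECT A.ID, B.ID FROM TableA A CROSS JOIN TableB B"
    )
    conn.commit()
    rows = cur.execute("SELECT AID, BID FROM TableC ORDER BY AID, BID").fetchall()
    conn.close()
    return rows


rows = cross_join_insert([1, 2, 5, 10, 23], [123, 234, 345, 456, 567])
print(len(rows))  # 25
print(rows[:5])   # [(1, 123), (1, 234), (1, 345), (1, 456), (1, 567)]
```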
Use an INSERT INTO ... SELECT statement with a CROSS JOIN:
INSERT INTO TableC (ID1, ID2)
SELECT A.ID AS ID1, B.ID AS ID2 FROM TableA A CROSS JOIN TableB B;
INSERT INTO…SELECT is described on MSDN: INSERT (Transact-SQL)
You can use INSERT INTO <target_table> SELECT <columns> FROM <source_table> to efficiently transfer a large number of rows from one table, such as a staging table, to another table with minimal logging. Minimal logging can improve the performance of the statement and reduce the possibility of the operation filling the available transaction log space during the transaction.

sql query to retrieve DISTINCT rows on left join

I am writing a T-SQL query that left joins two tables. When I select records only from Table A, I get 2 records, but when I left join it to Table B, I get 4 records. How can I reduce this to just 2 records?
One problem, though, is that I only know of one PK/FK pair that links these two tables.
The value you are joining on must appear in more than one row of Table B; this is why multiple rows are returned by the join. To reduce the row count, you will have to either add more columns to the join condition or add a WHERE clause to filter out the rows you don't need.
Alternatively, you could use GROUP BY to group the rows, but this may not be what you need.
Remember that a LEFT JOIN returns NULLs in the Table B columns for any Table A row that has no match.
You can also use SELECT DISTINCT, but I can't quite see your issue. Can you give us more details?
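Here is a sketch of the 2-vs-4 row situation and two of the fixes mentioned above. It uses sqlite3 in place of SQL Server. TableA, TableB, A_ID and Val are hypothetical names, and each Table A row is given two matching Table B rows. DISTINCT only collapses rows that are identical in every selected column, so it helps only when no differing Table B column is selected. GROUP BY keeps one row per Table A row and aggregates the Table B values (MAX is just one possible choice).

```python
import sqlite3


def left_join_counts():
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical tables: each TableA row has two matching rows in TableB.
    cur.execute("CREATE TABLE TableA (ID INTEGER PRIMARY KEY, Name TEXT)")
    cur.execute("CREATE TABLE TableB (A_ID INTEGER, Val TEXT)")
    cur.executemany("INSERT INTO TableA VALUES (?, ?)", [(1, "x"), (2, "y")])
    cur.executemany("INSERT INTO TableB VALUES (?, ?)",
                    [(1, "b1"), (1, "b2"), (2, "b3"), (2, "b4")])

    # Plain LEFT JOIN: one row per matching TableB row, so 4 rows.
    plain = cur.execute(
        "SELECT A.ID, A.Name, B.Val FROM TableA A "
        "LEFT JOIN TableB B ON B.A_ID = A.ID"
    ).fetchall()

    # DISTINCT over TableA columns only: duplicates collapse to 2 rows.
    distinct = cur.execute(
        "SELECT DISTINCT A.ID, A.Name FROM TableA A "
        "LEFT JOIN TableB B ON B.A_ID = A.ID"
    ).fetchall()

    # GROUP BY: one row per TableA row, TableB values aggregated with MAX.
    grouped = cur.execute(
        "SELECT A.ID, A.Name, MAX(B.Val) FROM TableA A "
        "LEFT JOIN TableB B ON B.A_ID = A.ID "
        "GROUP BY A.ID, A.Name ORDER BY A.ID"
    ).fetchall()
    conn.close()
    return plain, distinct, grouped


plain, distinct, grouped = left_join_counts()
print(len(plain), len(distinct), len(grouped))  # 4 2 2
print(grouped)  # [(1, 'x', 'b2'), (2, 'y', 'b4')]
```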