sql query to retrieve DISTINCT rows on left join - tsql

I am developing a t-sql query to return left join of two tables, but when I just select records from Table A, it gives me only 2 records. The problem though is when I left join it Table B, it gives me 4 records. How can I reduce this to just 2 records?
One problem though is that I am only aware of one PK/FK to link these two tables.

The field you are using for the join must exist more than once in table B - this is why multiple rows are being returned in the join. In order to reduce the row count you will have to either add further fields to the join, or add a where clause to filter out rows not required.
Alternatively you could use a GROUP BY statement to group the rows up, but this may not be what you need.

Remember that the left join brings you null fields from joined table.
Also you can use select(distinct), but i can't see well you issue. Can you give us more details?

Related

Why does Postgres choose different data solely based on columns selected?

I'm running two different queries with two unions each inside a subquery:
So the structure is:
SELECT *
FROM (subquery_1
UNION SELECT subquery_2)
Now, if I perform the query on the left, I get this result:
However, the query on the right returns this result:
How are the results differing even though the conditions have not changed in either query, and the only difference was one of the selected columns in a subquery?
This is very counter-intuitive.
The operator UNION removes duplicate rows from the returned resultset.
Removing a column from the SELECT statement may produce duplicate rows that would not exist if the removed column was there.
Try UNION ALL instead, which will return in any case all the rows of the unioned queries.
See a simplified demo.

Quicksight query two tables has problem with left join

I have a bad result with using query in Quicksight.
I have one table with campaign, if I query just on this table, it is ok; I have all campaigns in my list. But when I use left join, i have just the results which match with join table.
Is this normal?
I have already tried all join possibilities and it is the same.

LEFT JOIN returns incorrect result in PostgreSQL

I have two tables: A (525,968 records) and B (517,831 records). I want to generate a table with all the rows from A and the matched records from B. Both tables has column "id" and column "year". The combination of id and year in table A is unique, but not in table B. So I wrote the following query:
SELECT
A.id,
A.year,
A.v1,
B.x1,
B.e1
FROM
A
LEFT JOIN B ON (A.id = B.id AND A.year = B.year);
I thought the result should contain the same total number of records in A, but it only returns about 517,950 records. I'm wondering what the possible cause may be.
Thanks!
First of all, I understand that this is an example, but postgres may hava an issues with capital letters in the table names.
Secondly, it may be a good idea to check how exactly you calculated 525,968 records. The thing is - if you use sime kind of client of database administration / queries - it may show you different / technical information about tables (there may be internal row counters in postgres that may actually differ from the number of records).
And finally to check yourself do something like
SELECT
count("A".id)
FROM
"A"

Why does SQL JOIN allow duplicates but IN does not

Example scenario:
TABLE_A contains a column called ID and also contains duplicate rows. There is another table called ID_TABLE that contains IDs. Assuming no duplicates in ID_TABLE -
If I do:
SELECT * FROM TABLE_A
INNER JOIN ID_TABLE ON ID_TABLE.ID = TABLE_A.ID
There will be duplicates in the result set. However, if I do:
SELECT * FROM TABLE_A
WHERE TABLE_A.ID IN (SELECT ID_TABLE.ID FROM ID_TABLE)
There will not be any duplicates in the result set.
Does anyone know why the JOIN clause allows duplicates while the IN clause does not? I had thought they did the same thing.
Thanks
It's not that it's allowing duplicates. By joining the two tables, you are creating a product from table 1 and table 2, so if TABLE_A has two records for ID=1 and ID_Table has 1 record, the resulting product is two records. Using IN doesn't cause a multiplication of records, even if the value is listed in the IN clause multiple times as you are only getting the unique records matching the values within the IN clause.

Cannot sort a row of size 8130, which is greater than the allowable maximum of 8094

SELECT DISTINCT tblJobReq.JobReqId
, tblJobReq.JobStatusId
, tblJobClass.JobClassId
, tblJobClass.Title
, tblJobReq.JobClassSubTitle
, tblJobAnnouncement.JobClassDesc
, tblJobAnnouncement.EndDate
, blJobAnnouncement.AgencyMktgVerbage
, tblJobAnnouncement.SpecInfo
, tblJobAnnouncement.Benefits
, tblSalary.MinRateSal
, tblSalary.MaxRateSal
, tblSalary.MinRateHour
, tblSalary.MaxRateHour
, tblJobClass.StatementEval
, tblJobReq.ApprovalDate
, tblJobReq.RecruiterId
, tblJobReq.AgencyId
FROM ((tblJobReq
LEFT JOIN tblJobAnnouncement ON tblJobReq.JobReqId = tblJobAnnouncement.JobReqId)
INNER JOIN tblJobClass ON tblJobReq.JobClassId = tblJobClass.JobClassId)
LEFT JOIN tblSalary ON tblJobClass.SalaryCode = tblSalary.SalaryCode
WHERE (tblJobReq.JobClassId in (SELECT JobClassId
from tblJobClass
WHERE tblJobClass.Title like '%Family Therapist%'))
When i try to execute the query it results in the following error.
Cannot sort a row of size 8130, which is greater than the allowable maximum of 8094
I checked and didn't find any solution. The only way is to truncate (substring())the "tblJobAnnouncement.JobClassDesc" in the query which has column size of around 8000.
Do we have any work around so that i need not truncate the values. Or Can this query be optimised? Any setting in SQL Server 2000?
The [non obvious] reason why SQL needs to SORT is the DISTINCT keyword.
Depending on the data and underlying table structures, you may be able to do away with this DISTINCT, and hence not trigger this error.
You readily found the alternative solution which is to truncate some of the fields in the SELECT list.
Edit: Answering "Can you please explain how DISTINCT would be the reason here?"
Generally, the fashion in which the DISTINCT requirement is satisfied varies with
the data context (expected number of rows, presence/absence of index, size of row...)
the version/make of the SQL implementation (the query optimizer in particular receives new or modified heuristics with each new version, sometimes resulting in alternate query plans for various constructs in various contexts)
Yet, all the possible plans associated with a "DISTINCT query" involve *some form* of sorting of the qualifying records. In its simplest form, the plan "fist" produces the list of qualifying rows (records) (the list of records which satisfy the WHERE/JOINs/etc. parts of the query) and then sorts this list (which possibly includes some duplicates), only retaining the very first occurrence of each distinct row. In other cases, for example when only a few columns are selected and when some index(es) covering these columns is(are) available, no explicit sorting step is used in the query plan but the reliance on an index implicitly implies the "sortability" of the underlying columns. In other cases yet, steps involving various forms of merging or hashing are selected by the query optimizer, and these too, eventually, imply the ability of comparing two rows.
Bottom line: DISTINCT implies some sorting.
In the specific case of the question, the error reported by SQL Server and preventing the completion of the query is that "Sorting is not possible on rows bigger than..." AND, the DISTINCT keyword is the only apparent reason for the query to require any sorting (BTW many other SQL constructs imply sorting: for example UNION) hence the idea of removing the DISTINCT (if it is logically possible).
In fact you should remove it, for test purposes, to assert that, without DISTINCT, the query completes OK (if only including some duplicates). Once this fact is confirmed, and if effectively the query could produce duplicate rows, look into ways of producing a duplicate-free query without the DISTINCT keyword; constructs involving subqueries can sometimes be used for this purpose.
An unrelated hint, is to use table aliases, using a short string to avoid repeating these long table names. For example (only did a few tables, but you get the idea...)
SELECT DISTINCT JR.JobReqId, JR.JobStatusId,
tblJobClass.JobClassId, tblJobClass.Title,
JR.JobClassSubTitle, JA.JobClassDesc, JA.EndDate, JA.AgencyMktgVerbage,
JA.SpecInfo, JA.Benefits,
S.MinRateSal, S.MaxRateSal, S.MinRateHour, S.MaxRateHour,
tblJobClass.StatementEval,
JR.ApprovalDate, JR.RecruiterId, JR.AgencyId
FROM (
(tblJobReq AS JR
LEFT JOIN tblJobAnnouncement AS JA ON JR.JobReqId = JA.JobReqId)
INNER JOIN tblJobClass ON tblJobReq.JobClassId = tblJobClass.JobClassId)
LEFT JOIN tblSalary AS S ON tblJobClass.SalaryCode = S.SalaryCode
WHERE (JR.JobClassId in
(SELECT JobClassId from tblJobClass
WHERE tblJobClass.Title like '%Family Therapist%'))
FYI, running this SQL command on your DB can fix the problem if it is caused by space that needs to be reclaimed after dropping variable length columns:
DBCC CLEANTABLE (0,[dbo.TableName])
See: http://msdn.microsoft.com/en-us/library/ms174418.aspx
This is a limitation of SQL Server 2000. You can:
Split it into two queries and combine elsewhere
SELECT ID, ColumnA, ColumnB FROM TableA JOIN TableB
SELECT ID, ColumnC, ColumnD FROM TableA JOIN TableB
Truncate the columns appropriately
SELECT LEFT(LongColumn,2000)...
Remove any redundant columns from the SELECT
SELECT ColumnA, ColumnB, --IDColumnNotUsedInOutput
FROM TableA
Migrate off of SQL Server 2000