Left Join Hangs

Left Join Hangs - tsql

I am trying to figure out what could be causing a left join to hang. I've narrowed a problem down to a specific table but I can't for the life of me figure out what might be going on. Basically, I have two tables, lets call them table A and table B. When I left join table A to table B (its a 1 to 1 relationship with table B not always having a related record to table A) the query hangs. When I inner join table A to table B, it runs in about a half second returning about 27,000 records. Why is it when I run a left join, which should take a bit longer but not by much, it hangs? Could I have bad data in table B? The fields I'm joining are bigint's. I'm stumped on this one. Any help would be much appreciated.
Here is my sql:
select
RegMemberTrip.idmember,
RegParent1.idMember_Parent1,
regparent1.idParent1
from
regmembertrip
left join
regparent1 on RegMemberTrip.idmember = regparent1.idMember_Parent1
where
regmembertrip.IDRound = 25
RegParent1 is a view
If I change the where criteria to '= 24' it works fine. IDRound = 25 is fairly new data. And like I said, if I keep this the way it is (idround = 25) with an inner join it works fine.
Thanks,
Ben

Have you tried the execution path tool in the Management Console? Are you sure your left join is not in fact doing a giant cartesian product across A and B?

Related

Can I apply predicates to the same columns against multiple tables in a JOIN only once?

I want to join two tables together and add additional information from two other tables to the same columns in both queried tables. I've come up with the below code, which works, but I don't feel comfortable about having to add another JOIN clause for each table, as it would make the query substantially long if I wanted to join/add more things.
Is there a way to combine it, so that I can join additional tables only once (just use S and E aliases every time)?
SELECT
J.JobId,
J.StandardJobId,
S.JobName,
J.EngineerId,
E.EngineerName,
JF.JobId AS FollowUpJobId,
JF.StandardJobId AS FollowUpStandardJobId,
SF.JobName AS FollowUpJobName,
JF.EngineerId AS FollowUpEngineerId,
EF.EngineerName AS FollowUpEngineerName
FROM
Jobs J
INNER JOIN
Jobs JF
ON
J.FollowUpJobId = JF.JobId
INNER JOIN
StandardJobs S
ON
J.StandardJobId = S.StandardJobId
INNER JOIN
Engineers E
ON
E.EngineerId = J.EngineerId
INNER JOIN
StandardJobs SF
ON
SF.StandardJobId = JF.StandardJobId
INNER JOIN
Engineers EF
ON
EF.EngineerId = JF.EngineerId

One approach would be to use a Common Table Expression (CTE) - something like:
with cte as
(SELECT J.JobId,
J.StandardJobId,
S.JobName,
J.EngineerId,
E.EngineerName,
J.FollowUpJobId
FROM Jobs J
INNER JOIN StandardJobs S ON J.StandardJobId = S.StandardJobId
INNER JOIN Engineers E ON E.EngineerId = J.EngineerId)
SELECT O.*,
F.StandardJobId AS FollowUpStandardJobId,
F.JobName AS FollowUpJobName,
F.EngineerId AS FollowUpEngineerId,
F.EngineerName AS FollowUpEngineerName
FROM CTE AS O
JOIN CTE AS F ON O.FollowUpJobId = F.JobId

You can sort of do this with either a CTE (Common Table Expressions, the WITH clause) or a View:
;WITH Jobs_Extended As
(
SELECT j.*,
s.JobName,
E.EngineerName
FROM Jobs As j
JOIN StandardJobs As s ON s.StandardJobId = j.StandardJobId
JOIN Engineer As e ON e.EngineerId = j.EngineerId
)
SELECT
J.JobId,
J.StandardJobId,
J.JobName,
J.EngineerId,
J.EngineerName,
JF.JobId AS FollowUpJobId,
JF.StandardJobId AS FollowUpStandardJobId,
JF.JobName AS FollowUpJobName,
JF.EngineerId AS FollowUpEngineerId,
JF.EngineerName AS FollowUpEngineerName
FROM Jobs_Extended J
JOIN Jobs_Extended JF ON J.FollowUpJobId = JF.JobId
In this example the CTE Jobs_Extended becomes a defined alias for the relationship between the Jobs, Engineers and StandardJobs tables. Then once defined, you can use it multiple times in the query without having to redefine those interior relations.
You can do the same thing by change the WITH to a View, which will make the defined alias permannet in your database.

No, you cannot avoid JOINing related tables each time a separate reference is needed. The issue is that you are not working with the tables in a general sense but instead working with the specific rows of each table, even more specifically, just those rows that match the JOIN and WHERE conditions.
There is no way to specify the references to either StandardJobs or Engineers only once because you are needing to work with two rows from each table at the same time, at least in the given example.
However, depending on which direction you are wanting to go with "additional tables" (more references to Jobs or more lookups like StandardJobs and Engineers for the given 2 references of Jobs), the CTE construct shown by Mark is the probably the easiest / best way to abstract it. I posted this answer mainly to explain the issue at hand.

T-SQL different JOIN approaches, same results, which one would you prefer?

these are 3 approaches how to make a join. I would like to hear some word on perforance of these 3 queries.
Thank you
SELECT * FROM
tableA A LEFT JOIN tableB B
INNER JOIN tableC C
ON C.ColumnC = B.ColumnB
ON B.ColumnB = A.ColumnB
WHERE ColumnX = 'XY'
Versus
SELECT * FROM
tableA A LEFT JOIN tableB B
ON B.ColumnB = A.ColumnB
INNER JOIN tableC C
ON C.ColumnC = B.ColumnB
WHERE ColumnX = 'XY'
Versus Common Table Expression
WITH T...

It does not matter.
SQL Server has a cost-based optimizer (as opposed to a rule-based optimizer). That means that the engine is able to figure out that both of your first two options are identical. Run your estimated and actual execution plans and you will see that this is the case.
The only reason you would choose one option over the other is for readability's sake. I go with your second option, because it's a lot easier to read when there are a great many joins involved. ON clauses in reverse order become quite difficult to track.

In my experience, any of the above could be quicker depending on your tables.
As you're setting up joins, you want to start with the most restrictive as possible (without negatively affecting your end result, obviously). This same logic also applies to the Where clause for the same reason. By starting with the most restrictive, you're limiting the number of rows that are being joined and thus evaluated by the Where clause and then returned/manipulated in the select clause. For my answers below regarding the three specific scenarios, I'm assuming a sufficiently complicated query that is doing more than just looking to combine data from multiple tables (i.e., queries answering specific questions).
If Table A is huge and Tables B & C are smaller and more directly related to the data you're trying to isolate, then the first option would likely be fastest.
If Table B or C are huge and Table A is more related to your desired data, the second option would likely be fastest.
As far as option 3 goes, I love CTEs but I try to only use them when I need to do so. Using a CTE will speed up your overall query if the data joined, manipulated, and returned by the CTE is only related to the rest of the query in a limited fashion. Including tables that are only partially related to your end result in your primary string of joins is going to needlessly slow down your query. If you can parse out that data into a CTE, it can run quickly by itself and then be incorporated back into the main query at the end.

View doesn't increase performance of correlated subquery?

I have table User and table Order, now I want to find the names of user who made more than 100 orders, I can do a query like below:
SELECT U.name
FROM User U
WHERE 100 < (
SELECT COUNT(*) FROM Orders O
WHERE O.uid=U.uid
)
This is slow because of correlated subquery.
Therefore I think I can optimize it by creating a view which contains how many order each user has made like below
View UserOrderCount
uid orderCount
0 11
1 108
2 100
3 99
4 32
5 67
Then the query is much simpler:
SELECT U.name
FROM User U, UserOrderCount C
WHERE 100 < C.orderCount And U.uid=C.cid;
But this turn out to take more time, I can't figure out why...
Please shed some light on this, thanks in advance!
EDIT:
Here is how the view is created:
CREATE VIEW UserOrderCount
AS
select U.uid, count(*) AS orderCount
from User U, orders O
group by U.uid;

Creating a view would not be expected to make this faster. The same amount of work still needs to be done behind the scenes. The view just makes the text of your query simpler to look at; it doesn't make it simpler to execute. (But note that you are using two different queries, one is a subquery and one is a join. This is orthogonal to the use of a view: you could have used the view in a subquery, or you could have done a join without the view. Mixing these two things together is likely to cause confusion.)
With Materialized Views, new in 9.3, it would actually store the computed results of the counting, so the execution would be faster. But the price you pay for this is that you would need to refresh the materialized view periodically, and you would be using out-of-date counts in the mean time.

SELECT U.name,count(*)
FROM User U
JOIN Orders o on o.uid=u.uid
WHERE COUNT(*) > 100
GROUP BY U.NAME
You do not need a view for that...

Using OR in WHERE produces unexpected results

In a query today I wanted to select rows that had a particular company name in either of two columns from two different tables. I did it like
WHERE (N.COMPANY LIKE '%ACME%' OR C.COMPANY LIKE '%ACME%')
First, it took 20 minutes to run, which was odd because if I only filtered on one of the columns it took about 10 seconds. Second, when it finished, values for some columns in some rows were NULL when I know for a fact there are values in the database for those columns in those records. So what is going on here? Why can't SQL do the OR on two columns from two tables?
(I worked around it with a UNION - I ran it for C.COMPANY LIKE '%ACME%' and UNIONED it with a SELECT... WHERE N.COMPANY LIKE '%ACME%')

It is bad practice to mix up the JOIN and the WHERE keyword in one single query, because WHERE is an INNER JOIN behind the scenes. From my experience this often results in uncontrolled joins which one don't want.
Try to place your LIKE's after the join of the corresponding tables directly like this:
...
JOIN N ON <Your existing relationship> AND N.COMPANY LIKE '%ACME%'
...
JOIN C ON <Your existing relationship> AND C.COMPANY LIKE '%ACME%'
...

PostgreSQL slow COUNT() - is trigger the only solution?

I have a table with posts, which are categorized by:
type
tag
language
All of those "categories" are stored in next tables (posts_types) and connected via next tables (posts_types_assignment).
COUNTing in PostgreSQL is really slow (i have more than 500k records in that table) and i need to get the number of posts categorized by any combination of type/tag/lang.
If i would solve it through triggers, it would be full of many multi-level loops, which really doesn't look like nice and is hard to maintenance.
Is there any other solution how to effectively get actual number of posts categorized in any type/tag/language?

Let me get this straight.
You have a table posts. You have a table posts_types. The two have a many to many join on posts_types_assignment. And you have some query like this that is slow:
SELECT count(*)
FROM posts p
JOIN posts_types_assigment pta1
ON p.id = pta1.post_id
JOIN posts_types pt1
ON pt1.id = pta1.post_type_id
AND pt1.type = 'language'
AND pt1.name = 'English'
JOIN posts_types_assigment pta2
ON p.id = pta2.post_id
JOIN posts_types pt2
ON pt2.id = pta2.post_type_id
AND pt2.type = 'tag'
AND pt2.name = 'awesome'
And you would like to know why it is painfully slow.
My first note is that PostgreSQL would have to do a lot less work if you had the identifiers in the posts table rather than in the joins. But that is a moot issue, the decision has been made.
My more useful note is that I believe that PostgreSQL has a similar query optimizer to Oracle. In which case to limit the combinatorial explosion of possible query plans that it has to consider, it only considers plans that start with some table, and then repeatedly joins on one more data set at a time. However no such query plan will work here. You can start with pt1, get 1 record, then go to pta1, get a bunch of records, join p, wind up with the same number of records, then join pta2, and now you get a huge number of records, then join to pt2, get just a few records. Joining to pta2 is the slow step, because the database has no idea which records you want, and therefore has to create a temporary result set for every combination of a post and a piece of metadata (type, language or tag) on it.
If this is indeed your problem, then the right plan looks like this. Join pt1 to pta1, put an index on it. Join pt2 to pta2, then join to the result of the first query, then join to p. Then count. This means that we don't get huge result sets.
If this the case, there is no way to tell the query optimizer that this once you want it to think up a new type of execution plan. But there is a way to force it.
CREATE TEMPORARY TABLE t1
AS
SELECT pta*
FROM posts_types pt
JOIN posts_types_assignment pta
ON pt.id = pta.post_type_id
WHERE pt.type = 'language'
AND pt.name = 'English';
CREATE INDEX idx1 ON t1 (post_id);
CREATE TEMPORARY TABLE t2
AS
SELECT pta*
FROM posts_types pt
JOIN posts_types_assignment pta
ON pt.id = pta.post_type_id
JOIN t1
ON t1.post_id = pta.post_id
WHERE pt.type = 'language'
AND pt.name = 'English';
SELECT COUNT(*)
FROM posts p
JOIN t1
ON p.id = t1.post_id;
Barring random typos, etc, this is likely to perform somewhat better. If it doesn't, double check the indexes on your tables.

As btilly notes, and if he has correctly guessed the schema, the table design does not help - it seems (at first sight, at least) that, for example, to have three tables posts_tag(post_id,tag) post_lang(post_id,lang) post_type(post_id,type) would be more natural and much more efficient.
Apart from that (or in addition to that), one could think of a table or materialized view that summarizes all the possible countings, with columns (lang,type,tag,nposts). Of course, to compute this in full would be VERY slow, but (apart from the first time) it can be done either in full "in background", at some intervals (if the data does not vary much, and if you don't require exact counts), or eagerly with triggers.
See for example here

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Left Join Hangs - tsql

Have you tried the execution path tool in the Management Console? Are you sure your left join is not in fact doing a giant cartesian product across A and B?

Related

Can I apply predicates to the same columns against multiple tables in a JOIN only once?

T-SQL different JOIN approaches, same results, which one would you prefer?

View doesn't increase performance of correlated subquery?

Using OR in WHERE produces unexpected results

PostgreSQL slow COUNT() - is trigger the only solution?

Categories

Resources