join multiple fields to one table - select

I have 2 tables (there are more, but they're unrelated to the question): optionValue and productStock.
I want to get the option names from the optionValue table for each of option1, option2, option3 (the query below should help this make more sense).
Below is my attempt. The current query only works if all options are set, but it returns null if any option is not set:
SELECT s.option1, n1.name s.optionName1,
s.option2, n2.name s.optionName2,
s.option3, n3.name s.optionName3
FROM productStock as s
INNER JOIN optionValue n1 on s.option1 = v1.optionValueID
INNER JOIN optionValue n2 on s.option2 = v2.optionValueID
INNER JOIN optionValue n3 on s.option3 = v3.optionValueID
WHERE s.productStockID = 1
I understand why it doesn't work: when the option is null there are no matches to the optionValue table. But I'm not sure how to fix it (if it is fixable).
I read in a couple of places about using IN or COALESCE, but I don't understand how to use them.

It seems like some of your syntax is a little incorrect.
Apart from that, you want LEFT OUTER JOIN instead of INNER JOIN.
Visual explanation of SQL joins.
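A corrected version might look like the following. This is only a sketch: I'm assuming the v1/v2/v3 in your ON clauses were typos for n1/n2/n3, and I've used AS for the column aliases. The LEFT OUTER JOINs keep the productStock row even when an option is NULL, and COALESCE (which you asked about) simply returns its first non-NULL argument, so you can use it to get empty strings instead of NULLs for missing names:
SELECT s.option1, COALESCE(n1.name, '') AS optionName1,
       s.option2, COALESCE(n2.name, '') AS optionName2,
       s.option3, COALESCE(n3.name, '') AS optionName3
FROM productStock AS s
LEFT OUTER JOIN optionValue n1 ON s.option1 = n1.optionValueID
LEFT OUTER JOIN optionValue n2 ON s.option2 = n2.optionValueID
LEFT OUTER JOIN optionValue n3 ON s.option3 = n3.optionValueID
WHERE s.productStockID = 1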

What you really need to do first is correct your database design. Any time you have fields like this:
s.option1,s.option2, s.option3
Then what you really need is a child table to store the information. What happens when you need 6 options, or 25? This is a very bad database design and will cause no end of problems, including the inefficient query you now have to write. This is a cancer at the heart of your system and needs to be fixed before anything else is done.
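For example, a child table along these lines (all names here are invented for illustration, not taken from your schema) removes the fixed limit of three options entirely:
CREATE TABLE productStockOption (
    productStockID INT NOT NULL REFERENCES productStock (productStockID),
    optionValueID  INT NOT NULL REFERENCES optionValue (optionValueID),
    PRIMARY KEY (productStockID, optionValueID)
);
-- one row per option, however many a product has:
SELECT v.name
FROM productStockOption o
JOIN optionValue v ON v.optionValueID = o.optionValueID
WHERE o.productStockID = 1;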


PostgreSQL Select across tables

I have very little experience with PostgreSQL and am looking for a way, using pgAdmin, to list out the rows across tables with foreign keys so that I can quickly update some records without doing this via a webapp.
All my tables are prefixed with app
I can select * from one table, but don't know what I'm doing beyond that to get the related table data.
SELECT app_choice.choice_text, app_choice.question_id, app_choice_choice_value
FROM app_choice;
INNER JOIN app_projectquestionnairequestion
ON app.questionnaire.id = app_projectquestionnairequestion.id;
So my choice table is linked to my projectquestionnairequestion table. So basically every question in projectquestionnairequestion has a related choice in the choice table.
But not all questions have a choice available, so I need a way to list all the questions so I can add a choice to them.
Sorry for the bad explanation. It's hard to explain when I don't know the terms.
Thanks
This is not an answer but a comment that doesn't fit in the comments section.
Your syntax is a bit wrong, but you are not far off. You should also use "table aliases" (c and q below). Your query could look like:
SELECT
c.choice_text, c.question_id, c.choice_value,
q.name -- this would be the "name" column of the second table
FROM app_choice c
INNER JOIN app_projectquestionnairequestion q
ON q.id = c.question_id;
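And since not all of your questions have a choice yet, you can flip the query around and drive it from the question table with a LEFT JOIN; that lists every question, with NULLs where no choice exists (a sketch only; the column names are guesses from your post):
SELECT
  q.id,
  c.choice_text  -- NULL for questions with no choice yet
FROM app_projectquestionnairequestion q
LEFT JOIN app_choice c
  ON c.question_id = q.id
WHERE c.question_id IS NULL;  -- optional: keep only questions still missing a choice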

T-SQL different JOIN approaches, same results, which one would you prefer?

These are 3 approaches to writing a join. I would like to hear a few words on the performance of these 3 queries.
Thank you
SELECT * FROM
tableA A LEFT JOIN tableB B
INNER JOIN tableC C
ON C.ColumnC = B.ColumnB
ON B.ColumnB = A.ColumnB
WHERE ColumnX = 'XY'
Versus
SELECT * FROM
tableA A LEFT JOIN tableB B
ON B.ColumnB = A.ColumnB
INNER JOIN tableC C
ON C.ColumnC = B.ColumnB
WHERE ColumnX = 'XY'
Versus Common Table Expression
WITH T...
It does not matter.
SQL Server has a cost-based optimizer (as opposed to a rule-based optimizer). That means that the engine is able to figure out that both of your first two options are identical. Run your estimated and actual execution plans and you will see that this is the case.
The only reason you would choose one option over the other is for readability's sake. I go with your second option, because it's a lot easier to read when there are a great many joins involved. ON clauses in reverse order become quite difficult to track.
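If you want to verify this for yourself (assuming SQL Server Management Studio), a quick sketch of the check:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- run each variant here and compare the output, or use
-- "Include Actual Execution Plan" (Ctrl+M) and confirm the plans match.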
In my experience, any of the above could be quicker depending on your tables.
As you're setting up joins, you want to start with the most restrictive join possible (without negatively affecting your end result, obviously). The same logic also applies to the WHERE clause, for the same reason. By starting with the most restrictive, you're limiting the number of rows that are being joined, and thus evaluated by the WHERE clause and then returned/manipulated in the SELECT clause. For my answers below regarding the three specific scenarios, I'm assuming a sufficiently complicated query that is doing more than just combining data from multiple tables (i.e., queries answering specific questions).
If Table A is huge and Tables B & C are smaller and more directly related to the data you're trying to isolate, then the first option would likely be fastest.
If Table B or C are huge and Table A is more related to your desired data, the second option would likely be fastest.
As far as option 3 goes, I love CTEs but I try to only use them when I need to do so. Using a CTE will speed up your overall query if the data joined, manipulated, and returned by the CTE is only related to the rest of the query in a limited fashion. Including tables that are only partially related to your end result in your primary string of joins is going to needlessly slow down your query. If you can parse out that data into a CTE, it can run quickly by itself and then be incorporated back into the main query at the end.
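For reference, since the question only sketched the third option as WITH T..., a CTE version of the same join might look something like this (my guess at the intent, reusing the question's table and column names):
WITH T AS
(
    SELECT B.ColumnB
    FROM tableB B
    INNER JOIN tableC C ON C.ColumnC = B.ColumnB
)
SELECT *
FROM tableA A
LEFT JOIN T ON T.ColumnB = A.ColumnB
WHERE ColumnX = 'XY';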

Left Join Hangs

I am trying to figure out what could be causing a left join to hang. I've narrowed the problem down to a specific table, but I can't for the life of me figure out what might be going on. Basically, I have two tables; let's call them table A and table B. When I left join table A to table B (it's a 1-to-1 relationship, with table B not always having a record related to table A), the query hangs. When I inner join table A to table B, it runs in about half a second, returning about 27,000 records. Why is it that when I run a left join, which should take a bit longer but not by much, it hangs? Could I have bad data in table B? The fields I'm joining are bigints. I'm stumped on this one. Any help would be much appreciated.
Here is my sql:
select
    RegMemberTrip.idmember,
    RegParent1.idMember_Parent1,
    RegParent1.idParent1
from
    RegMemberTrip
left join
    RegParent1 on RegMemberTrip.idmember = RegParent1.idMember_Parent1
where
    RegMemberTrip.IDRound = 25
RegParent1 is a view
If I change the where criteria to '= 24' it works fine. IDRound = 25 is fairly new data. And like I said, if I keep this the way it is (idround = 25) with an inner join it works fine.
Thanks,
Ben
Have you tried the execution plan tool in Management Studio? Are you sure your left join is not in fact doing a giant Cartesian product across A and B?
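One cheap way to test the fan-out theory is to check whether the view really is unique on the join key; any rows returned by the sketch below mean the 1-to-1 assumption is broken and the left join can multiply rows:
SELECT idMember_Parent1, COUNT(*) AS dupes
FROM regparent1
GROUP BY idMember_Parent1
HAVING COUNT(*) > 1;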

PostgreSQL slow COUNT() - is trigger the only solution?

I have a table with posts, which are categorized by:
type
tag
language
All of those "categories" are stored in separate tables (posts_types) and connected via link tables (posts_types_assignment).
COUNTing in PostgreSQL is really slow (I have more than 500k records in that table) and I need to get the number of posts categorized by any combination of type/tag/lang.
If I were to solve it with triggers, the code would be full of multi-level loops, which really doesn't look nice and is hard to maintain.
Is there any other way to efficiently get the actual number of posts categorized under any type/tag/language?
Let me get this straight.
You have a table posts. You have a table posts_types. The two have a many to many join on posts_types_assignment. And you have some query like this that is slow:
SELECT count(*)
FROM posts p
JOIN posts_types_assignment pta1
ON p.id = pta1.post_id
JOIN posts_types pt1
ON pt1.id = pta1.post_type_id
AND pt1.type = 'language'
AND pt1.name = 'English'
JOIN posts_types_assignment pta2
ON p.id = pta2.post_id
JOIN posts_types pt2
ON pt2.id = pta2.post_type_id
AND pt2.type = 'tag'
AND pt2.name = 'awesome'
And you would like to know why it is painfully slow.
My first note is that PostgreSQL would have to do a lot less work if you had the identifiers in the posts table rather than in the joins. But that is a moot issue, the decision has been made.
My more useful note is that I believe that PostgreSQL has a similar query optimizer to Oracle. In which case to limit the combinatorial explosion of possible query plans that it has to consider, it only considers plans that start with some table, and then repeatedly joins on one more data set at a time. However no such query plan will work here. You can start with pt1, get 1 record, then go to pta1, get a bunch of records, join p, wind up with the same number of records, then join pta2, and now you get a huge number of records, then join to pt2, get just a few records. Joining to pta2 is the slow step, because the database has no idea which records you want, and therefore has to create a temporary result set for every combination of a post and a piece of metadata (type, language or tag) on it.
If this is indeed your problem, then the right plan looks like this. Join pt1 to pta1, put an index on it. Join pt2 to pta2, then join to the result of the first query, then join to p. Then count. This means that we don't get huge result sets.
If this is the case, there is no way to tell the query optimizer that just this once you want it to think up a new type of execution plan. But there is a way to force it.
CREATE TEMPORARY TABLE t1
AS
SELECT pta.*
FROM posts_types pt
JOIN posts_types_assignment pta
ON pt.id = pta.post_type_id
WHERE pt.type = 'language'
AND pt.name = 'English';
CREATE INDEX idx1 ON t1 (post_id);
CREATE TEMPORARY TABLE t2
AS
SELECT pta.*
FROM posts_types pt
JOIN posts_types_assignment pta
ON pt.id = pta.post_type_id
JOIN t1
ON t1.post_id = pta.post_id
WHERE pt.type = 'tag'
AND pt.name = 'awesome';
SELECT COUNT(*)
FROM posts p
JOIN t2
ON p.id = t2.post_id;
Barring random typos, etc, this is likely to perform somewhat better. If it doesn't, double check the indexes on your tables.
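In particular, for this shape of query the indexes that matter most are on the link table; something along these lines, if they don't already exist (the index names here are made up):
CREATE INDEX idx_pta_type_post ON posts_types_assignment (post_type_id, post_id);
CREATE INDEX idx_pta_post_type ON posts_types_assignment (post_id, post_type_id);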
As btilly notes, and if he has correctly guessed the schema, the table design does not help - it seems (at first sight, at least) that, for example, to have three tables posts_tag(post_id,tag) post_lang(post_id,lang) post_type(post_id,type) would be more natural and much more efficient.
Apart from that (or in addition to that), one could think of a table or materialized view that summarizes all the possible counts, with columns (lang, type, tag, nposts). Of course, computing this in full would be VERY slow, but (apart from the first time) it can be done either in full "in the background" at some interval (if the data does not vary much and you don't require exact counts), or eagerly with triggers.
See for example here
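As a very rough sketch of that summary idea, assuming the three-table layout guessed above (so every name here is an assumption) and a PostgreSQL version that supports materialized views:
CREATE MATERIALIZED VIEW post_counts AS
SELECT l.lang, t.type, g.tag, COUNT(*) AS nposts
FROM post_lang l
JOIN post_type t ON t.post_id = l.post_id
JOIN posts_tag g ON g.post_id = l.post_id
GROUP BY l.lang, t.type, g.tag;
-- refresh "in the background" at whatever interval the data allows:
REFRESH MATERIALIZED VIEW post_counts;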

Can UNION ALL be faster than JOINs or do my JOINs just suck?

I have a Notes table with a uniqueidentifier column that I use as a FK for a variety of other tables in the database (don't worry, the uniqueidentifier columns on the other tables aren't clustered PKs). These other tables represent something of a hierarchy of business objects. As a simple representation, let's say I have two other tables:
Leads (PK LeadID)
Quotes (PK QuoteID, FK LeadID)
In the display of a Lead in the application, I need to show all notes related to the lead, including those tagged to any Quote that belongs to that lead. I have two options as far as I can see — either a UNION ALL or several LEFT JOIN statements. Here's how they'd look:
SELECT N.*
FROM Notes N
JOIN Leads L ON N.TargetUniqueID = L.UniqueID
WHERE L.LeadID = #LeadID
UNION ALL
SELECT N.*
FROM Notes N
JOIN Quotes Q ON N.TargetUniqueID = Q.UniqueID
WHERE Q.LeadID = #LeadID
Or...
SELECT N.*
FROM Notes N
LEFT JOIN Leads L ON N.TargetUniqueID = L.UniqueID
LEFT JOIN Quotes Q ON N.TargetUniqueID = Q.UniqueID
WHERE L.LeadID = #LeadID OR Q.LeadID = #LeadID
In real life I have a total of five tables that the notes could be attached to, and that number could grow as the application grows. I already have non-clustered indexes set up on the uniqueidentifier columns I'm using, and SQL Profiler says I can't make any more improvements, but when I do a performance test on a realistically-sized test data set, I get the following numbers:
UNION ALL — 0.010 sec
LEFT JOIN — 0.744 sec
I had always heard that using UNION was bad, and that UNION ALL was only marginally better, but the performance numbers don't seem to bear that out. Granted, the UNION ALL SQL code might be more of a pain to maintain, but at that kind of performance difference it's probably worth it.
So is UNION ALL really better here or am I missing something on the LEFT JOIN code that is slowing things down?
The UNION ALL version would probably be satisfied quite easily by 2 index seeks. OR can lead to scans. What do the execution plans look like?
Also have you tried this to avoid accessing Notes twice?
;WITH J AS
(
SELECT UniqueID FROM Leads WHERE LeadID = #LeadID
UNION ALL
SELECT UniqueID FROM Quotes WHERE LeadID = #LeadID
)
SELECT N.* /*Don't use * though!*/
FROM Notes N
JOIN J ON N.TargetUniqueID = J.UniqueID
I may be wrong, but I think that you will get better performance if you rewrite your JOIN version to:
SELECT N.*
FROM Notes N
LEFT JOIN Leads L ON N.TargetUniqueID = L.UniqueID AND L.LeadID = #LeadID
LEFT JOIN Quotes Q ON N.TargetUniqueID = Q.UniqueID AND Q.LeadID = #LeadID
WHERE Q.LeadID IS NOT NULL OR L.LeadID IS NOT NULL
In my experience, SQL Server is really bad with join conditions containing OR. I also use UNIONs in that case, and I got results similar to yours (maybe half a second instead of 20).
Who said UNIONs are bad? If you use UNION ALL there should not be a performance hit; it's plain UNION that has to go through the result to keep only unique records (effectively doing something like DISTINCT or GROUP BY).
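A trivial illustration of the difference:
SELECT 1 UNION SELECT 1;      -- one row:  duplicates removed
SELECT 1 UNION ALL SELECT 1;  -- two rows: no de-duplication step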
Your second query wouldn't even give correct results, as it would convert the left joins to inner joins; see here for an explanation of why your syntax is bad:
http://wiki.lessthandot.com/index.php/WHERE_conditions_on_a_LEFT_JOIN
UNION is slower, but UNION ALL should be pretty quick, right?