If you've got a very complicated SELECT statement and some records aren't included because of a join, what is the easiest way to debug this and find the reasons why?
Change the JOINS / INNER JOINS to OUTER JOINS and look for NULLs where they shouldn't be.
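As a minimal sketch of that idea, the snippet below uses Python's built-in sqlite3 module with two hypothetical tables (customers and orders, both made up for illustration). Switching the INNER JOIN to a LEFT OUTER JOIN and filtering on a NULL key from the joined table lists exactly the rows the inner join was dropping.

```python
import sqlite3

# Hypothetical schema: order 12 points at a customer that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99);
""")

# The original inner join silently drops order 12.
inner_rows = conn.execute("""
    SELECT o.id, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
""").fetchall()

# Same join as an outer join: rows with no match show NULL on the customer side.
missing = conn.execute("""
    SELECT o.id, o.customer_id
    FROM orders o
    LEFT OUTER JOIN customers c ON c.id = o.customer_id
    WHERE c.id IS NULL
""").fetchall()

print(len(inner_rows), missing)
```

The rows in missing are the ones to investigate; here the cause is a customer_id with no matching customer.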
You could move your ON logic into the WHERE clause, phrased like this:
WHERE 1=1
AND...
AND...
and then comment out terms one by one until you isolate the unexpected behaviour.
I don't know if this will help you, but when I find myself with a complex SELECT that I'm having a hard time maintaining or debugging, I'll break it up into separate common table expressions (CTEs). I've found this makes many of my queries much easier to understand and maintain.
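A small sketch of what that can look like, run through Python's sqlite3 module against a hypothetical sales table (the table, CTE names, and threshold are assumptions for illustration). Each CTE is one named step, so while debugging you can point the final SELECT at any intermediate step and inspect it on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('north', 10), ('north', 30), ('south', 5);
""")

# Step 1 and step 2 are written once and shared by both queries below.
ctes = """
WITH region_totals AS (
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
),
big_regions AS (
    SELECT region, total
    FROM region_totals
    WHERE total > 20
)
"""

# The full query reads the last step.
rows = conn.execute(
    ctes + "SELECT region, total FROM big_regions ORDER BY region"
).fetchall()

# While debugging, read an earlier step instead to see what it produced.
step1 = conn.execute(
    ctes + "SELECT region, total FROM region_totals ORDER BY region"
).fetchall()

print(rows, step1)
```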
Binary search style:
If you have 10 joins, comment out the last 5.
Still have the problem? Comment out the last 2 or 3 of the joins that are still uncommented.
Still have the problem? Comment out the last 1 or 2 of the joins that are still uncommented.
Repeat until the query works; the problem then lies in the joins you commented out last.
Yes, you could do them one at a time, but this is usually quicker.
Obviously you will have to comment out any columns in the select list that come from the joins you removed, but I normally just put /* */ around all the columns, then put a * instead.
Just look at the number of results returned.
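The halving procedure above can also be automated. The following is a sketch under stated assumptions, not a general tool: the tables a, b, c, d and the helper names row_survives and first_bad_join are hypothetical, the joins are all inner joins, and you already know the key of one record that goes missing. Instead of comparing total row counts, it checks whether that one record is still returned, because an inner join can drop a row but never bring it back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (a_id INTEGER);
    CREATE TABLE c (a_id INTEGER);
    CREATE TABLE d (a_id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1), (2);
    INSERT INTO c VALUES (1);        -- no row for a.id = 2
    INSERT INTO d VALUES (1), (2);
""")

# Joins in the order they appear in the query being debugged.
joins = [
    "JOIN b ON b.a_id = a.id",
    "JOIN c ON c.a_id = a.id",
    "JOIN d ON d.a_id = a.id",
]

def row_survives(k, missing_id):
    """Return True if the record is still returned using only the first k joins."""
    sql = "SELECT COUNT(*) FROM a " + " ".join(joins[:k]) + " WHERE a.id = ?"
    return conn.execute(sql, (missing_id,)).fetchone()[0] > 0

def first_bad_join(missing_id):
    """Binary search for the first join that drops the record.

    Assumes the record exists in the base table and is missing
    once all joins are applied.
    """
    lo, hi = 0, len(joins)  # survives with lo joins, dropped with hi joins
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if row_survives(mid, missing_id):
            lo = mid
        else:
            hi = mid
    return joins[hi - 1]

culprit = first_bad_join(2)
print(culprit)
```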
First of all, English is not my first language, so feel free to edit my question. I'm sorry for any mistakes and for not explaining the problem clearly.
I have a few SQL queries with lots of joins. These joins are based on clustered indexes (no worries about that). Some of the joins are there only to respect normalization and because they make the code intuitive to maintain, but sometimes it's possible to skip some of them. It's not clear to me what to do about these joins in terms of best practices.
Edit:
A simple example:
select *
from things
join things_categories on
things_categories.id_thing = things.id_thing
join categories on
categories.id_category = things_categories.id_category
join categories_properties on
categories_properties.id_category = categories.id_category
where
categories_properties.bo_default = 1
But it's possible to do:
select *
from things
join things_categories on
things_categories.id_thing = things.id_thing
join categories_properties on
categories_properties.id_category = things_categories.id_category
where
categories_properties.bo_default = 1
The second join is not necessary (I do have integrity at the database level); it's there only because it makes the code more intuitive and respects the database normalization. I'm not sure whether I should follow the shortest, most efficient path or leave the unnecessary joins in to respect normalization and keep the code intuitive.
Any tips?
All the best.
It depends on whether you already have referential integrity or not.
On the one hand, if the categories_properties table has a foreign key on the id_category column, then the integrity exists and you don't need the join with the categories table.
On the other hand, if the integrity might not exist (i.e., there are id_category values in the categories_properties table that are not defined in the categories table), then you should make the join.
The join:
join categories on
categories.id_category = things_categories.id_category
is very necessary, since the categories table is used in the next join:
join categories_properties on
categories_properties.id_category = categories.id_category
So it's definitely required there: a table has to be joined into the query before a later ON clause can reference its columns.
What is very painful, however, is the select *.
You probably don't need all that data, since * returns every column from every joined table.
Perhaps you could specify what you need from each table or, at worst, use things.* to specify all columns of a specific table.
If you do not need a join, do not use it. You are taking a totally unneeded performance hit. Don't force the database to do work it doesn't need to do because you think it looks more complete; you should consider performance ahead of readability in a query. After all, once you start writing performant SQL code, it will become more readable to you. However, before eliminating the join, make sure you actually don't need it by checking that both versions of the query return the same result set.
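One way to run that check is to diff the two queries with EXCEPT in both directions. The sketch below uses Python's sqlite3 module and the table names from the question; the sample rows and the difference helper are made up for illustration, and the select list is narrowed to two columns so the two result sets are comparable. Because EXCEPT compares distinct rows, the row counts are compared as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE things (id_thing INTEGER PRIMARY KEY);
    CREATE TABLE categories (id_category INTEGER PRIMARY KEY);
    CREATE TABLE things_categories (id_thing INTEGER, id_category INTEGER);
    CREATE TABLE categories_properties (id_category INTEGER, bo_default INTEGER);
    INSERT INTO things VALUES (1), (2);
    INSERT INTO categories VALUES (10), (20);
    INSERT INTO things_categories VALUES (1, 10), (2, 20);
    INSERT INTO categories_properties VALUES (10, 1), (20, 0);
""")

with_join = """
    SELECT things.id_thing, categories_properties.id_category
    FROM things
    JOIN things_categories ON things_categories.id_thing = things.id_thing
    JOIN categories ON categories.id_category = things_categories.id_category
    JOIN categories_properties ON categories_properties.id_category = categories.id_category
    WHERE categories_properties.bo_default = 1
"""

without_join = """
    SELECT things.id_thing, categories_properties.id_category
    FROM things
    JOIN things_categories ON things_categories.id_thing = things.id_thing
    JOIN categories_properties ON categories_properties.id_category = things_categories.id_category
    WHERE categories_properties.bo_default = 1
"""

def difference(q1, q2):
    """Rows returned by q1 that q2 does not return (distinct rows only)."""
    return conn.execute(
        f"SELECT * FROM ({q1}) EXCEPT SELECT * FROM ({q2})"
    ).fetchall()

def count(q):
    """Total number of rows returned by q, duplicates included."""
    return conn.execute(f"SELECT COUNT(*) FROM ({q})").fetchone()[0]

only_in_first = difference(with_join, without_join)
only_in_second = difference(without_join, with_join)
same_count = count(with_join) == count(without_join)

print(only_in_first, only_in_second, same_count)
```

Two empty differences plus equal counts on representative data suggest the join can go; they do not prove it for data that violates the foreign key.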
I designed a set of tables in pgAdmin. I gave names like Products and ProductRID. I was very surprised though when I went to query this table only to find a query like this yielded unknown relation:
select * from Products
Apparently the proper way to access this is
select * from "Products"
which is very ugly. I can rename the tables to all lower case to query without quotes, but then it looks ugly. Is there any kind of a setting so that it will retain the case, but behave without case sensitivity?
No, there is no magic setting. The best way to deal with case sensitivity is to not quote your relations when you are creating them. If you are early on in schema design, go ahead and rename them (and the column names) to lower case. The "looks ugly" problem will go away, because unquoted names are folded to lower case, so in your queries you can still do
SELECT * FROM Products
and it will work fine.
You may check the relevant wiki entry to get the precise answer:
Why are my table and column names not recognized in my query? Why is capitalization not preserved?
Hope it clarifies.
I am receiving a "wildcard query expansion resulted in too many terms" error when executing a query similar to the following:
SELECT *
FROM table_a
WHERE contains(clob_field, '%a%') > 0;
Does anyone know a workaround/solution to this problem?
According to this, you may need to increase the wildcard_maxterms parameter, or take further steps. See the link for details (I'm not an expert in Oracle Text though).
Why can't one use an output column in the HAVING clause in PostgreSQL? Allowing it wouldn't change the expressivity of the language at all; as it stands, people just have to repeat the output column definition in the HAVING clause. Is there a way to avoid that, apart from putting the whole query as a subquery in SELECT * FROM (...) AS t WHERE condition?
Because it's not implemented? And if you're asking why it wasn't implemented, I see 2 possible explanations:
the standard doesn't require it
nobody had time to spend on it
If you'd like to have it, mail the -hackers list (pgsql-hackers), discuss it, and then implement it.
Frankly I don't see it as a big problem - it's not like you have 1000 characters to retype.
I have a select statement and a cursor to iterate over the rows I get. The problem is that I have many columns (more than 500), so "fetch .. into #variable" is impossible for me. How can I iterate over the columns (one by one, I need to process the data)?
Thanks in advance,
n.b
Two choices.
1/ Use SSIS or ADO.Net to step through your dataset row by row.
2/ Consider what you're actually needing to achieve and find a set-based approach.
My preference is for option 2. Let us know what you need done and we'll find a way.
Rob
You can build a SQL string using sys.columns or INFORMATION_SCHEMA queries. Here's a post I wrote on that.
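To illustrate the general idea of generating SQL from column metadata, here is a sketch that runs on SQLite through Python's sqlite3 module. It is a stand-in, not the approach from the linked post: SQLite has no sys.columns or INFORMATION_SCHEMA, so PRAGMA table_info plays that role here, and the wide table and its columns are hypothetical. The generated statement turns each column into its own (id, column_name, value) row, so the columns can be processed one at a time without naming 500 variables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wide (id INTEGER PRIMARY KEY, col_a TEXT, col_b TEXT);
    INSERT INTO wide VALUES (1, 'x', 'y');
""")

# Read the column names from metadata instead of typing them by hand.
# In PRAGMA table_info output, index 1 of each row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(wide)")]
value_columns = [c for c in columns if c != "id"]

# Build one SELECT per column and glue them together with UNION ALL.
# The names come from the catalog, not from user input.
parts = [
    f"SELECT id, '{c}' AS column_name, \"{c}\" AS value FROM wide"
    for c in value_columns
]
sql = " UNION ALL ".join(parts) + " ORDER BY id, column_name"

rows = conn.execute(sql).fetchall()
for row_id, column_name, value in rows:
    print(row_id, column_name, value)
```

On SQL Server the same string could be assembled from the metadata views mentioned above and then executed as dynamic SQL.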