Threats / drawbacks behind ISNULL(), NULLIF(), and COALESCE() - tsql

I am working on Microsoft SQL Server 2005 with Transact-SQL.
I am trying to concatenate string values from different columns of the same table while handling NULL values.
Say, for example, the table is Person and the columns are FirstName, SurnamePrefix, and LegalSurname.
Concatenating a string value with a NULL value (coming from two different columns) returns NULL.
I tried different scenarios to prevent NULL values in output:
Starting from:
Person.FirstName + ' ' + COALESCE(RTRIM(LTRIM(Person.SurnamePrefix)) + ' ', '') + Person.LegalSurname
I changed my statement to:
COALESCE(Person.FirstName + ' ', '') + COALESCE(Person.SurnamePrefix, '') + COALESCE(' ' + Person.LegalSurname, '')
Then I came across functions like ISNULL(), NULLIF() etc.
Which is the best and most efficient approach to show empty strings in the output rather than NULL values?
Is the solution affected by the version of the SQL Server? (i.e. 2005, 2008, etc.)

ISNULL is good for default values, as you are doing. COALESCE has the advantage of accepting more than two arguments. NULLIF is quite different as it returns a NULL if the arguments are equal.
You can benchmark them for performance. I suspect that the difference is negligible and that it is far more important to opt for clarity in your code.
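As a rough illustration of how the three functions differ, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in, since it runs without a server. The assumptions are significant: SQLite concatenates with || instead of +, spells ISNULL as IFNULL, and the sample rows are invented. Only the NULL-propagation behaviour is meant to carry over to T-SQL.

```python
import sqlite3

# In-memory stand-in for the Person table (sample data is invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Person (FirstName TEXT, SurnamePrefix TEXT, LegalSurname TEXT)"
)
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [("Ludwig", "van", "Beethoven"), ("Ada", None, "Lovelace")],
)

rows = conn.execute(
    """
    SELECT
        -- naive concatenation: one NULL operand makes the whole result NULL
        FirstName || ' ' || SurnamePrefix || ' ' || LegalSurname,
        -- COALESCE swallows the NULL produced by the optional prefix
        FirstName || ' ' || COALESCE(TRIM(SurnamePrefix) || ' ', '') || LegalSurname,
        -- IFNULL is SQLite's two-argument counterpart of ISNULL
        IFNULL(SurnamePrefix, ''),
        -- NULLIF goes the other way: it returns NULL when both arguments are equal
        NULLIF(SurnamePrefix, 'van')
    FROM Person
    ORDER BY rowid
    """
).fetchall()

for row in rows:
    print(row)
```

The second row shows the point of the question: the naive expression collapses to NULL (None in Python), while the COALESCE version still yields a full name.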

This isn't a direct answer to your question, but SQL Server 2005 allows you to set CONCAT_NULL_YIELDS_NULL off at connection level, although the OFF setting is deprecated and slated for removal in a future version of SQL Server. (It's also possible to set it at database level using an ALTER DATABASE command, but this is likely to affect the behaviour of existing queries.)
You could set this before running your queries:
SET CONCAT_NULL_YIELDS_NULL OFF
SELECT 'a' + NULL
yields the result
a
From the perspective of maintainability it might be better to avoid doing this - it will confuse the unwary - but it is another alternative to what you're doing now.

Aaron Bertrand compared COALESCE with ISNULL and found no significant performance difference between the two.


Alphanumeric sorting without any pattern on the strings [duplicate]

I've got a Postgres ORDER BY issue with the following table:
em_code name
EM001 AAA
EM999 BBB
EM1000 CCC
To insert a new record into the table,
I select the last record with SELECT * FROM employees ORDER BY em_code DESC
Strip the letters from em_code using a regular expression and store them in ec_alpha
Cast the remaining part to an integer, ec_num
Increment ec_num by one (ec_num++)
Pad with sufficient zeros and prefix ec_alpha again
When em_code reaches EM1000, the above algorithm fails.
The first step returns EM999 instead of EM1000, so it generates EM1000 as the new em_code again, breaking the unique key constraint.
Any idea how to select EM1000?
Since Postgres 10, it is possible to specify an ICU collation which will sort columns with numbers naturally.
https://www.postgresql.org/docs/10/collation.html
-- First create a collation with numeric sorting
CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');
-- Alter table to use the collation
ALTER TABLE "employees" ALTER COLUMN "em_code" type TEXT COLLATE numeric;
Now just query as you would otherwise.
SELECT * FROM employees ORDER BY em_code
On my data, I get results in this order (note that it also sorts foreign numerals):
Value
0
0001
001
1
06
6
13
۱۳
14
One approach you can take is to create a naturalsort function for this. Here's an example, written by Postgres legend RhodiumToad.
create or replace function naturalsort(text)
returns bytea language sql immutable strict as $f$
select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
from regexp_matches($1, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;
Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql
To use it simply call the function in your order by:
SELECT * FROM employees ORDER BY naturalsort(em_code) DESC
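To see why the encoding in naturalsort orders numbers correctly, here is a hedged Python port of the same idea. It is a sketch for illustration, not RhodiumToad's code: each digit run (leading zeros dropped) is prefixed with its length, and that length is itself prefixed with its own length, so a shorter number always compares lower byte-wise. UTF-8 stands in for the SQL_ASCII conversion here, which is an assumption.

```python
import re

def naturalsort(value):
    """Build a byte string whose plain byte order matches natural order."""
    parts = []
    for match in re.finditer(r"0*([0-9]+)|([^0-9]+)", value):
        digits, text = match.group(1), match.group(2)
        if text is not None:
            # Non-digit runs are kept unchanged.
            parts.append(text.encode("utf-8"))
        else:
            # '999' becomes '13999': one-digit length, length 3, then the digits.
            length = str(len(digits))
            parts.append((str(len(length)) + length + digits).encode("utf-8"))
    # A zero byte separates the pieces, mirroring string_agg(..., '\x00').
    return b"\x00".join(parts)

codes = ["EM1000", "EM001", "EM999"]
ordered = sorted(codes, key=naturalsort)
print(ordered)
```

With this key, EM999 encodes as EM, a zero byte, then 13999, while EM1000 encodes as EM, a zero byte, then 141000, so the comparison is decided at the length digit rather than at the first digit of the number.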
The reason is that the string sorts alphabetically (instead of numerically like you would want it) and 1 sorts before 9.
You could solve it like this:
SELECT * FROM employees
ORDER BY substring(em_code, 3)::int DESC;
It would be more efficient to drop the redundant 'EM' from your em_code - if you can - and save an integer number to begin with.
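The difference between text order and numeric order, and its effect on the "generate the next code" steps from the question, can be sketched in a few lines of Python. The helper names split_code and next_code and the minimum width of 3 are assumptions made for this illustration only.

```python
import re

def split_code(code):
    """Split a code such as 'EM999' into its prefix and numeric part."""
    match = re.fullmatch(r"(\D*)(\d+)", code)
    return match.group(1), int(match.group(2))

def next_code(codes, min_width=3):
    """Pick the highest code by its numeric part and return its successor."""
    last = max(codes, key=lambda code: split_code(code)[1])
    prefix, number = split_code(last)
    # zfill pads up to min_width but never truncates longer numbers
    return prefix + str(number + 1).zfill(min_width)

codes = ["EM001", "EM999", "EM1000"]

# Plain text comparison: '9' sorts after '1', so EM999 looks like the maximum.
last_by_text = max(codes)

# Comparing on the integer part finds the real last code.
last_by_number = max(codes, key=lambda code: split_code(code)[1])

print(last_by_text, last_by_number, next_code(codes))
```

Note that in a real system two concurrent inserts could still compute the same next code; a sequence or a unique constraint with retry would be needed, which is outside the scope of this sketch.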
Answer to question in comment
To strip any and all non-digits from a string:
SELECT regexp_replace(em_code, E'\\D','','g')
FROM employees;
\D is the regular expression class-shorthand for "non-digits".
'g' as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.
After replacing every non-digit with the empty string, only digits remain.
This always comes up in questions and in my own development and I finally tired of tricky ways of doing this. I finally broke down and implemented it as a PostgreSQL extension:
https://github.com/Bjond/pg_natural_sort_order
It's free to use, MIT license.
Basically it just normalizes the numerics (zero pre-pending numerics) within strings such that you can create an index column for full-speed sorting au naturel. The readme explains.
The advantage is you can have a trigger do the work and not your application code. It will be calculated at machine-speed on the PostgreSQL server and migrations adding columns become simple and fast.
You can use just this line:
"ORDER BY length(substring(em_code FROM '[0-9]+')), em_code"
I wrote about this in detail in this related question:
Humanized or natural number sorting of mixed word-and-number strings
(I'm posting this answer as a useful cross-reference only, so it's community wiki).
I came up with something slightly different.
The basic idea is to create an array of tuples (integer, string) and then order by these. The magic number 2147483647 is int32_max, used so that strings are sorted after numbers.
ORDER BY ARRAY(
SELECT ROW(
CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
match[2]
)
FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
AS match
)
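The same tuple idea can be sketched in Python to make the ordering rule concrete. This is an illustration under assumptions, not a translation of the query: it uses the simpler pattern (\d+)|(\D+) so that no empty matches need handling, and the name tuple_key is hypothetical.

```python
import re

INT32_MAX = 2147483647  # placeholder number for text runs, so text sorts after numbers

def tuple_key(value):
    """Turn 'a10' into [(INT32_MAX, 'a'), (10, '')] for element-wise comparison."""
    return [
        (int(digits) if digits else INT32_MAX, text)
        for digits, text in re.findall(r"(\d+)|(\D+)", value)
    ]

values = ["a10", "ab", "a2", "a"]
ordered = sorted(values, key=tuple_key)
print(ordered)
```

Each run contributes one tuple; digit runs compare by their integer value, and text runs all share the same placeholder number so they fall back to comparing the text itself. Unlike the SQL version, Python integers do not overflow, so digit runs larger than INT32_MAX would sort after text here.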
I thought of another way of doing this that uses less database storage than padding and is faster than calculating on the fly.
https://stackoverflow.com/a/47522040/935122
I've also put it on GitHub
https://github.com/ccsalway/dbNaturalSort
The following solution is a combination of various ideas presented in another question, as well as some ideas from the classic solution:
create function natsort(s text) returns text immutable language sql as $$
select string_agg(r[1] || E'\x01' || lpad(r[2], 20, '0'), '')
from regexp_matches(s, '(\D*)(\d*)', 'g') r;
$$;
The design goals of this function were simplicity and pure string operations (no custom types and no arrays), so it can easily be used as a drop-in solution and is trivial to index.
Note: If you expect numbers with more than 20 digits, you'll have to replace the hard-coded maximum length 20 in the function with a suitable larger length. Note that this will directly affect the length of the resulting strings, so don't make that value larger than needed.
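For readers who want to check the padding idea outside the database, here is a rough Python equivalent of natsort. It is a sketch under assumptions: empty trailing matches are skipped explicitly, and zfill never truncates, whereas lpad in Postgres cuts a longer digit run down to the given width.

```python
import re

def natsort_key(value, width=20):
    """Zero-pad every digit run to a fixed width so plain string order works."""
    parts = []
    for text, digits in re.findall(r"(\D*)(\d*)", value):
        if not text and not digits:
            continue  # skip the empty match the pattern produces at the end
        # \x01 separates the text from the padded number, as in the SQL function
        parts.append(text + "\x01" + digits.zfill(width))
    return "".join(parts)

codes = ["EM1000", "EM001", "EM999"]
ordered = sorted(codes, key=natsort_key)
print(ordered)
```

Because the result is an ordinary string, the same value could be stored in a column or an expression index and compared with the default string ordering.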

When using 'Replace' built in function in T-SQL within a Select Query, does the data on the table get modified?

I have the following query
SELECT
[DocID],
[Docunum],
[Comments] = REPLACE(REPLACE([Comments], CHAR(13), ''), CHAR(10), '')
FROM
[Billy].[dbo].[order]
WHERE
DocDate = '2017-12-20 00:00:00.000'
I was wondering whether the REPLACE function actually changes the value in the database. My concern is that this is an ERP system and I do not want referential integrity problems. I only want to strip the carriage returns and line feeds from the NVARCHAR column to avoid spacing issues while pasting into Excel. I do not want any values changed in the database.
Any feedback would be appreciated. I have searched and did not find anything that answered this specifically. If I missed something please post link for reference if possible.
Because you are using REPLACE inside a SELECT query, it does not modify the data stored in the table. It only changes the values in the result set returned by this query, so you are safe. Only a data-modifying statement such as UPDATE would change the stored values.
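This is easy to verify on a scratch table. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; the table name orders and the sample comment are invented), but REPLACE in a SELECT list behaves the same way in both: it computes a new value for the result set only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (DocID INTEGER, Comments TEXT)")
conn.execute("INSERT INTO orders VALUES (1, ?)", ("line one\r\nline two",))

# REPLACE is evaluated per row while building the result set.
cleaned = conn.execute(
    "SELECT REPLACE(REPLACE(Comments, CHAR(13), ''), CHAR(10), '') FROM orders"
).fetchone()[0]

# Reading the column again shows the stored value is untouched.
stored = conn.execute("SELECT Comments FROM orders").fetchone()[0]

print(repr(cleaned))
print(repr(stored))
```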

Why do I get a DATETIME conversion error in TSQL?

I know there are numerous questions about this topic, even one I asked myself a while ago (here). Now I ran into a different problem, and neither myself nor my colleagues know what the reason for the strange behaviour is.
We've got a relatively simple SQL statement quite like this:
SELECT
CONVERT(DATETIME, SUBSTRING(MyText, CHARINDEX('Date:', MyText) + 8, 16)) AS MyDate,
SomeOtherColumn,
...
FROM
MyTable
INNER JOIN MyOtherTable
ON MyTable.ID = MyOtherTable.MyTableID
WHERE
MyTable.ID > SomeValue AND
MyText LIKE 'Date: %'
This is not my database and also not my SQL statement, and I didn't create the great schema to store datetime values in varchar columns, so please ignore that bit.
The problem we are facing right now is a SQL conversion error 241 ("Conversion failed when converting date and/or time from character string.").
Now I know that the query optimiser may change the execution plan so that the WHERE clause is used to filter results after the conversion is attempted, but the really strange thing is that I don't get any errors when I delete all of the WHERE clause.
I also don't get any errors when I add a single line to the statement above as follows:
SELECT
MyText, -- This is the added line
CONVERT(DATETIME, SUBSTRING(MyText, CHARINDEX('Date:', MyText) + 8, 16)) AS MyDate,
...
As soon as I remove it I get the conversion error again. Manually checking the values in the MyText column without trying to convert them does not show that there are any records which might cause a problem.
What is the reason for the conversion error? Why do I not run into it when I also select the column as part of the SELECT statement?
Update
Here is the execution plan, although I don't think it's going to help.
Sometimes, SQL Server aggressively optimizes by pushing conversion operations earlier in the process than they would otherwise need to be. (It shouldn't. See SQL Server should not raise illogical errors on Connect, as an example).
When you just select:
CONVERT(DATETIME, SUBSTRING(MyText, CHARINDEX('Date:', MyText) + 8, 16))
Then the optimizer decides it can perform this conversion as part of the table/index scan or seek - right at the point at which it's reading the data from the table (and, importantly, before, or at the same time, as the WHERE clause filter). The rest of the query can then just use the converted value.
When you select:
MyText, -- This is the added line
CONVERT(DATETIME, SUBSTRING(MyText, CHARINDEX('Date:', MyText) + 8, 16))
It decides to let the conversion happen later. Importantly, the conversion now (by happenstance) happens later than the WHERE clause filter which should, by rights, be filtering all rows before the conversion is attempted.
The only safe way to deal with this is to force the filtering to definitely occur before the conversion is attempted. If you're not dealing with aggregates, a CASE expression may be safe enough:
SELECT CASE WHEN MyText LIKE 'Date: %' THEN CONVERT(DATETIME, SUBSTRING(MyText, CHARINDEX('Date:', MyText) + 8, 16)) END
Otherwise, the even safer option is to split the query into two separate queries, and store the intermediate results in a temp table or table variable (views, CTEs and subqueries don't count, because the optimizer can "see through" such constructs)
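The effect of evaluation order can be modelled outside the database. The following Python sketch is only an analogy, not how SQL Server works internally: the sample strings, the date format, and all function names are assumptions. It shows why converting before filtering fails on a row the filter would have removed, and why a per-row guard (the CASE expression) is safe in either order.

```python
from datetime import datetime

# Invented sample data: the middle row would be removed by the WHERE filter.
ROWS = ["Date:   2017-12-20 10:30", "no date here", "Date:   2018-01-05 08:00"]

def convert(text):
    """Mimic CONVERT(DATETIME, SUBSTRING(t, CHARINDEX('Date:', t) + 8, 16))."""
    pos = text.find("Date:") + 1          # CHARINDEX is 1-based, 0 if not found
    start = pos + 8                        # 1-based start of the substring
    return datetime.strptime(text[start - 1:start + 15], "%Y-%m-%d %H:%M")

def matches(text):
    """Mimic MyText LIKE 'Date: %'."""
    return text.startswith("Date: ")

def convert_then_filter(rows):
    converted = [(text, convert(text)) for text in rows]  # fails on the bad row
    return [value for text, value in converted if matches(text)]

def filter_then_convert(rows):
    return [convert(text) for text in rows if matches(text)]

def guarded_convert(text):
    """Mimic CASE WHEN ... THEN CONVERT(...) END: NULL for non-matching rows."""
    return convert(text) if matches(text) else None

try:
    convert_then_filter(ROWS)
    early_conversion_failed = False
except ValueError:
    early_conversion_failed = True

late = filter_then_convert(ROWS)
guarded = [guarded_convert(text) for text in ROWS]
print(early_conversion_failed, late, guarded)
```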

syb_describe in DBD::Sybase

I am looking to extract Sybase datatype for all the columns in a table. When I try to achieve this using $sth->{TYPE}, I get a numeric version of the datatype (i.e. instead of sybase datatype varchar, I get 0).
From the DBD::Sybase documentation, I noticed that the SYBTYPE attribute of the syb_describe function might be able to produce what I am looking for. But it seems I have misunderstood it: SYBTYPE also returns the datatype in numeric form only.
Is there any way to fetch the textual representation of actual Sybase datatype (instead of the number)?
It sounds like you wish to reverse engineer the create table definition. Here is an SQL script you can use for Sybase or SQL Server tables.
select c.name,
"type(size)"=case
when t.name in ("char", "varchar") then
t.name + "(" + rtrim(convert(char(3), c.length)) + ")"
else t.name
end,
"null"=case
when convert(bit, (c.status & 8)) = 0 then "NOT NULL"
else "NULL"
end
from syscolumns c, systypes t
where c.id = object_id("my_table_name")
and c.usertype *= t.usertype
order by c.colid
go
Note: This could still be edited with a nawk script to create a real SQL schema file.
The nawk script would strip the header, add "create table my_table_name", add commas, strip the footer and add a "go".
Good SQL, good night!
I found a workaround (Note: This does not answer the question though):
I simply joined the sysobjects, systypes and syscolumns system tables.

Is it possible to use CASE with IN?

I'm trying to construct a T-SQL statement with a WHERE clause determined by an input parameter. Something like:
SELECT * FROM table
WHERE id IN
CASE WHEN @param THEN
(1,2,4,5,8)
ELSE
(9,7,3)
END
I've tried all combination of moving the IN, CASE etc around that I can think of. Is this (or something like it) possible?
Try this:
SELECT * FROM table
WHERE (@param='??' AND id IN (1,2,4,5,8))
OR (@param!='??' AND id IN (9,7,3))
This will have a problem using an index.
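The OR rewrite above can be tried out with a small, self-contained Python script. sqlite3 stands in for SQL Server here, and the table name t, the parameter value 'first', and the ids 1 to 9 are all assumptions for the sake of a runnable example; it demonstrates the logic only and says nothing about index usage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 10)])

# The parameter selects which IN list is active; exactly one branch can be true.
QUERY = """
SELECT id FROM t
WHERE (:param = 'first'  AND id IN (1, 2, 4, 5, 8))
   OR (:param <> 'first' AND id IN (9, 7, 3))
ORDER BY id
"""

def ids_for(param):
    return [row[0] for row in conn.execute(QUERY, {"param": param})]

print(ids_for("first"))
print(ids_for("other"))
```

One caveat: if the parameter is NULL, neither comparison is true and the query returns no rows, so a NULL parameter needs its own branch if it should select the second list.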
The key with dynamic search conditions is to make sure an index is used, rather than to focus on reusing code easily, eliminating duplication in a query, or doing everything with the same query. Here is a very comprehensive article on how to handle this topic:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
It covers all the issues and methods of trying to write queries with multiple optional search conditions. The main thing you need to be concerned with is not the duplication of code, but the use of an index. If your query fails to use an index, it will perform poorly. There are several techniques that can be used, which may or may not allow an index to be used.
Here is the table of contents:
Introduction
The Case Study: Searching Orders
The Northgale Database
Dynamic SQL
Introduction
Using sp_executesql
Using the CLR
Using EXEC()
When Caching Is Not Really What You Want
Static SQL
Introduction
x = @x OR @x IS NULL
Using IF statements
Umachandar's Bag of Tricks
Using Temp Tables
x = @x AND @x IS NOT NULL
Handling Complex Conditions
Hybrid Solutions – Using both Static and Dynamic SQL
Using Views
Using Inline Table Functions
Conclusion
Feedback and Acknowledgements
Revision History
If you are on the proper version of SQL Server 2008, there is an additional technique that can be used, see: Dynamic Search Conditions in T-SQL Version for SQL 2008 (SP1 CU5 and later)
If you are on that proper release of SQL Server 2008, you can just add OPTION (RECOMPILE) to the query and the local variable's value at run time is used for the optimizations.
Consider this: OPTION (RECOMPILE) will take this code (where no index can be used with this mess of ORs):
WHERE
(@Search1 IS NULL or Column1=@Search1)
AND (@Search2 IS NULL or Column2=@Search2)
AND (@Search3 IS NULL or Column3=@Search3)
and optimize it at run time to be (provided that only @Search2 was passed in with a value):
WHERE
Column2=@Search2
and an index can be used (if you have one defined on Column2)
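To make the two shapes of WHERE clause concrete, here is a hedged Python sketch that runs the catch-all static form and a dynamically built form against the same data and shows that they return the same rows. It uses sqlite3 as a stand-in (so OPTION (RECOMPILE) itself is not demonstrated), and the table items, its rows, and the function names are assumptions. The dynamic builder emits only the predicates for supplied values, which is the query the text describes the optimizer reducing to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (Column1 TEXT, Column2 TEXT, Column3 TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("a", "x", "1"), ("b", "x", "2"), ("a", "y", "3")],
)

# Static form: every optional filter is always present in the statement.
STATIC = """
SELECT Column1, Column2, Column3 FROM items
WHERE (:s1 IS NULL OR Column1 = :s1)
  AND (:s2 IS NULL OR Column2 = :s2)
  AND (:s3 IS NULL OR Column3 = :s3)
ORDER BY Column3
"""

def search_static(s1=None, s2=None, s3=None):
    return conn.execute(STATIC, {"s1": s1, "s2": s2, "s3": s3}).fetchall()

def search_dynamic(s1=None, s2=None, s3=None):
    """Build the WHERE clause from the supplied filters only."""
    clauses, params = [], []
    # Column names come from this fixed list, never from user input.
    for column, value in (("Column1", s1), ("Column2", s2), ("Column3", s3)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)  # values are still passed as bound parameters
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = "SELECT Column1, Column2, Column3 FROM items" + where + " ORDER BY Column3"
    return conn.execute(sql, params).fetchall()

print(search_static(s2="x"))
print(search_dynamic(s2="x"))
```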
if @param = 'whatever'
select * from tbl where id in (1,2,4,5,8)
else
select * from tbl where id in (9,7,3)