I know there are numerous questions about this topic, including one I asked myself a while ago (here). Now I've run into a different problem, and neither I nor my colleagues know the reason for the strange behaviour.
We've got a relatively simple SQL statement much like this:
SELECT
CONVERT(DATETIME, SUBSTRING(MyText, CHARINDEX('Date:', MyText) + 8, 16)) AS MyDate,
SomeOtherColumn,
...
FROM
MyTable
INNER JOIN MyOtherTable
ON MyTable.ID = MyOtherTable.MyTableID
WHERE
MyTable.ID > SomeValue AND
MyText LIKE 'Date: %'
This is not my database and also not my SQL statement, and I didn't create the great schema to store datetime values in varchar columns, so please ignore that bit.
The problem we are facing right now is a SQL conversion error 241 ("Conversion failed when converting date and/or time from character string.").
Now I know that the query optimiser may change the execution plan so that the WHERE clause is applied only after the conversion has been attempted, but the really strange thing is that I don't get any errors when I remove the WHERE clause entirely.
I also don't get any errors when I add a single line to the statement above as follows:
SELECT
MyText, -- This is the added line
CONVERT(DATETIME, SUBSTRING(MyText, CHARINDEX('Date:', MyText) + 8, 16)) AS MyDate,
...
As soon as I remove it, I get the conversion error again. Manually checking the values in the MyText column (without trying to convert them) doesn't reveal any records that might cause a problem.
What is the reason for the conversion error? Why do I not run into it when I also select the column as part of the SELECT statement?
Update
Here is the execution plan, although I don't think it's going to help.
Sometimes, SQL Server aggressively optimizes by pushing conversion operations earlier in the process than they would otherwise need to be. (It shouldn't. See SQL Server should not raise illogical errors on Connect, as an example).
When you just select:
CONVERT(DATETIME, SUBSTRING(MyText, CHARINDEX('Date:', MyText) + 8, 16))
Then the optimizer decides it can perform this conversion as part of the table/index scan or seek - right at the point at which it's reading the data from the table (and, importantly, before or at the same time as the WHERE clause filter). The rest of the query can then just use the converted value.
When you select:
MyText, -- This is the added line
CONVERT(DATETIME, SUBSTRING(MyText, CHARINDEX('Date:', MyText) + 8, 16))
It decides to let the conversion happen later. Importantly, the conversion now (by happenstance) happens later than the WHERE clause filter which should, by rights, be filtering all rows before the conversion is attempted.
The only safe way to deal with this is to force the filtering to definitely occur before the conversion is attempted. If you're not dealing with aggregates, a CASE expression may be safe enough:
SELECT CASE WHEN MyText LIKE 'Date: %' THEN CONVERT(DATETIME, SUBSTRING(MyText, CHARINDEX('Date:', MyText) + 8, 16)) END
Otherwise, the even safer option is to split the query into two separate queries and store the intermediate results in a temp table or table variable (views, CTEs and subqueries don't count, because the optimizer can "see through" such constructs).
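For example, something along these lines (just a sketch, reusing the names from the query above and keeping only the columns the outer query needs):
SELECT MyText, SomeOtherColumn
INTO #Filtered
FROM MyTable
INNER JOIN MyOtherTable
    ON MyTable.ID = MyOtherTable.MyTableID
WHERE MyTable.ID > SomeValue
  AND MyText LIKE 'Date: %'

SELECT
    CONVERT(DATETIME, SUBSTRING(MyText, CHARINDEX('Date:', MyText) + 8, 16)) AS MyDate,
    SomeOtherColumn
FROM #Filtered
Because the temp table is populated in a separate statement, the conversion in the second query can only ever see rows that have already passed the LIKE filter.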
Related
I'm trying to pull data for certain dates out of a staging table where the offshore developers imported everything in the file, so I need to filter out the "non-data" rows and convert the remaining strings to datetime.
Which should be simple enough but... I keep getting this error:
The conversion of a varchar data type to a datetime data type resulted in an out-of-range value.
I've taken the query and pulled it apart, made sure there are no invalid strings left and even tried a few different configurations of the query. Here's what I've got now:
SELECT *
FROM
(
select cdt = CAST(cmplt_date as DateTime), *
from stage_hist
WHERE cmplt_date NOT LIKE '(%'
AND ltrim(rtrim(cmplt_date)) NOT LIKE ''
AND cmplt_date NOT LIKE '--%'
) f
WHERE f.cdt BETWEEN '2017-09-01' AND '2017-10-01'
To make sure the conversion is working at least, I can run the inner query and the cast actually works for all rows. I get a valid data set for the rows and no errors, so the actual cast is working.
The BETWEEN comparison must be throwing the error then, right? But I've cast both strings I use for that successfully, and I've even taken a value out of the table and run a test query with it, which also works successfully:
select 1
WHERE CAST(' 2017-09-26' as DateTime) BETWEEN '2017-09-01' AND '2017-10-01'
So if all the casts work individually, how come I'm getting an out-of-range error when running the real query?
I am guessing that this is because your cmplt_date field contains values which are not valid dates. Yes, I know you are filtering them using a WHERE clause, but be aware that the logical processing order of a SELECT statement is not always the actual execution order. What this means is that sometimes the SQL engine may start performing your CAST operation before it has finished the filtering.
You are using SQL Server 2012, so you can simply use TRY_CAST:
SELECT *
FROM
(
select cdt = TRY_CAST(cmplt_date as DateTime), *
from stage_hist
WHERE cmplt_date NOT LIKE '(%'
AND ltrim(rtrim(cmplt_date)) NOT LIKE ''
AND cmplt_date NOT LIKE '--%'
) f
WHERE f.cdt BETWEEN '2017-09-01' AND '2017-10-01'
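TRY_CAST returns NULL instead of raising an error when the string cannot be converted to the target type, so any stray rows that slip past the LIKE filters simply fall out of the outer BETWEEN comparison. A quick illustration:
SELECT TRY_CAST('2017-09-26' AS datetime) AS good_value, -- 2017-09-26 00:00:00.000
       TRY_CAST('--header--' AS datetime) AS bad_value   -- NULL, no error raised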
In my table, results from the work_time column (interval type) display as 200:00:00. Is it possible to cut off the seconds part, so that it is displayed as 200:00? Or, even better: 200h00min (I've seen it accepts the h unit on insert, so why not output it like this?).
Preferably, by altering work_time column, not by changing the select query.
This is not something you should do by altering a column but by changing the select query in some way. If you change the column you are changing storage and functional uses, and that's not good. To change it on output, you need to modify how it is retrieved.
You have two basic options. The first is to modify your select queries directly, using to_char(myintervalcol, 'HH24:MI')
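For example (work_time is the column from your question; the table name is just a placeholder). For intervals, HH24 is not wrapped at 24 hours, and literal text can be embedded in double quotes, so both of the formats you asked about are possible:
SELECT to_char(work_time, 'HH24:MI')        AS work_time_hhmm, -- 200:00
       to_char(work_time, 'HH24"h"MI"min"') AS work_time_units -- 200h00min
FROM mytable;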
However if your issue is that you have a common format you want to have universal access to in your select query, PostgreSQL has a neat trick I call "table methods." You can attach a function to a table in such a way that you can call it in a similar (but not quite identical) syntax to a new column. In this case you would do something like:
CREATE OR REPLACE FUNCTION myinterval_nosecs(mytable) RETURNS text LANGUAGE SQL
IMMUTABLE AS
$$
SELECT to_char($1.myintervalcol, 'HH24:MI');
$$;
This works on the row input, not on the underlying table. As it always returns the same information for the same input, you can mark it IMMUTABLE and even index the output (meaning it can be evaluated at plan time and an index on it can be used).
To call this, you'd do something like:
SELECT myinterval_nosecs(m) FROM mytable m;
But you can then use the special syntax above to rewrite that as:
SELECT m.myinterval_nosecs FROM mytable m;
Note that since myinterval_nosecs is a function you cannot omit the m. at the beginning. This is because the query planner will rewrite the query in the former syntax and will not guess as to which relation you mean to run it against.
I am working on a Microsoft SQL Server 2005 with Transact-SQL.
I am trying to concatenate string values coming from different columns of the same table dealing with NULL values.
Say for example the table is Person and the columns are FirstName, SurnamePrefix, LegalSurname
It turned out that concatenating a string value with a NULL value (coming from two different columns) returns a NULL value.
I tried different scenarios to prevent NULL values in output:
Starting from:
Person.FirstName + ' ' + COALESCE(RTRIM(LTRIM(Person.SurnamePrefix)) + ' ', '') + Person.LegalSurname
I changed my statement to:
COALESCE(Person.FirstName + ' ', '') + COALESCE(Person.SurnamePrefix, '') + COALESCE(' ' + Person.LegalSurname, '')
Then I came across functions like ISNULL(), NULLIF() etc.
What is the best and most efficient approach to output empty strings rather than NULL values?
Is the solution affected by the version of the SQL Server? (i.e. 2005, 2008, etc.)
ISNULL is good for default values, as you are doing. COALESCE has the advantage of accepting more than two arguments. NULLIF is quite different as it returns a NULL if the arguments are equal.
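A quick sketch illustrating the difference between the three:
SELECT ISNULL(NULL, '')          AS isnull_result,   -- ''  : two arguments, returns the replacement when the first is NULL
       COALESCE(NULL, NULL, 'x') AS coalesce_result, -- 'x' : any number of arguments, returns the first non-NULL
       NULLIF('a', 'a')          AS nullif_result    -- NULL: the two arguments are equal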
You can benchmark them for performance. I suspect that the difference is negligible and that it is far more important to opt for clarity in your code.
This isn't a direct answer to your question, but although it is deprecated in future versions of SQL Server, SQL 2005 allows you to set CONCAT_NULL_YIELDS_NULL off at connection level. (It's also possible to set it at database level using an ALTER DATABASE command, but this is likely to affect the behaviour of existing queries).
You could set this before running your queries:
SET CONCAT_NULL_YIELDS_NULL OFF
SELECT 'a' + NULL
yields the result
a
From the perspective of maintainability it might be better to avoid doing this - it will confuse the unwary - but it is another alternative to what you're doing now.
Aaron Bertrand compared COALESCE with ISNULL and found no significant performance difference between the two.
I have a complex query that requires a rank in it. I've learned that the standard way of doing that is by using the technique found on this page: http://thinkdiff.net/mysql/how-to-get-rank-using-mysql-query/. I'm using Infobright as the back end and it doesn't work quite as expected. That is, while a standard MySQL engine would show the rank as 1, 2, 3, 4, etc., Brighthouse (Infobright's engine) returns 1, 1, 1, 1, etc. So I came up with a strategy of setting a variable, creating a function, and then executing it in the query. Here's a proof-of-concept query that does just that:
SET @rank = 0;
DROP FUNCTION IF EXISTS __GetRank;
DELIMITER $$
CREATE FUNCTION __GetRank() RETURNS INT
BEGIN
SET @rank = @rank + 1;
return @rank;
END$$
DELIMITER ;
select __GetRank() AS rank, id from accounts;
I then copied and pasted the function into JasperReports' iReport and compiled my report. After executing it, I get syntax errors. So I thought that perhaps the ; was throwing it off, and put DELIMITER ; at the top of the query. This did not work either.
Is what I'm wanting to do even possible? If so, how? And if there's an Infobright way of getting a rank without writing a function, I'd be open to that too.
Infobright does not support functions.
From the site: http://www.infobright.org/forums/viewthread/1871/#7485
Indeed, IB supports stored procedures, but does not support stored functions nor user defined functions.
select if(@rank is null, @rank := 0, @rank := @rank + 1) as rank, id from accounts
Does not work, because you cannot write to @vars in queries.
This:
SELECT
(SELECT COUNT(*)
FROM mytable t1
WHERE t1.rankedcolumn > t2.rankedcolumn) AS rank,
t2.rankedcolumn
FROM mytable t2 WHERE ...;
will work, but is very slow of course.
Disclaimer, not my code, but Jakub Wroblewski's (Infobright founder)
Hope this helps...
Here's how I solved this. I had my server side program execute a mysql script. I then took the output and converted it to a CSV. I then used this as the input data for my report. A little convoluted, but it works.
I think this is best asked in the form of a simple example. The following chunk of SQL causes a "DB-Library Error:20049 Severity:4 Message:Data-conversion resulted in overflow" message, but how come?
declare @a numeric(18,6), @b numeric(18,6), @c numeric(18,6)
select @a = 1.000000, @b = 1.000000, @c = 1.000000
select @a/(@b/@c)
go
How is this any different to:
select 1.000000/(1.000000/1.000000)
go
which works fine?
I ran into the same problem the last time I tried to use Sybase (many years ago). Coming from a SQL Server mindset, I didn't realize that Sybase would attempt to coerce the decimals out -- which, mathematically, is what it should do. :)
From the Sybase manual:
Arithmetic overflow errors occur when the new type has too few decimal places to accommodate the results.
And further down:
During implicit conversions to numeric or decimal types, loss of scale generates a scale error. Use the arithabort numeric_truncation option to determine how serious such an error is considered. The default setting, arithabort numeric_truncation on, aborts the statement that causes the error but continues to process other statements in the transaction or batch. If you set arithabort numeric_truncation off, Adaptive Server truncates the query results and continues processing.
So assuming that the loss of precision is acceptable in your scenario, you probably want the following at the beginning of your transaction:
SET ARITHABORT NUMERIC_TRUNCATION OFF
And then at the end of your transaction:
SET ARITHABORT NUMERIC_TRUNCATION ON
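Applied to the example in the question, that would look roughly like this (a sketch from memory; I no longer have an ASE instance to verify against):
SET ARITHABORT NUMERIC_TRUNCATION OFF

declare @a numeric(18,6), @b numeric(18,6), @c numeric(18,6)
select @a = 1.000000, @b = 1.000000, @c = 1.000000
select @a/(@b/@c)   -- result is truncated to fit instead of aborting the statement

SET ARITHABORT NUMERIC_TRUNCATION ON
go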
This is what solved it for me those many years ago ...
This is just speculation, but could it be that the DBMS doesn't look at the runtime values of your variables, only at their potential values? Thus, a six-decimal numeric divided by a six-decimal numeric could result in a twelve-decimal numeric, whereas in the literal division the DBMS knows there is no overflow. Still not sure why the DBMS would care, though -- shouldn't it be able to return the result of two six-decimal divisions as up to an 18-decimal numeric?
Because you have declared the variables in the first example, the result is expected to have the same declaration (i.e. numeric(18,6)), but it does not.
I have to say that the first one did work in SQL Server 2005, though (it returned 1.000000, the same declared type), while the second one returned 1.00000000000000000000000, a completely different declaration.
Not directly related, but could possibly save someone some time with the Arithmetic overflow errors using Sybase ASE (12.5.0.3).
I was setting a few default values in a temporary table which I intended to update later on, and stumbled on to an Arithmetic overflow error.
declare @a numeric(6,3)
select 0.000 as thenumber into #test --indirect declare
select @a = ( select thenumber + 100 from #test )
update #test set thenumber = @a
select * from #test
Shows the error:
Arithmetic overflow during implicit conversion of NUMERIC value '100.000' to a NUMERIC field.
Which in my head should work, but doesn't, as the thenumber column wasn't explicitly declared (it was indirectly declared as decimal(4,3)). So you have to indirectly declare the temp table column with the precision and scale of the format you want, which in my case was 000.000.
select 000.000 as thenumber into #test --this solved it
Hopefully that saves someone some time :)