Returning values based on delimited string entries

In TSQL, the string in the database record is 'A/A/A' or 'A/B/A' (examples). I want to parse the string and for the first instance return '1'; in the 2nd instance, return '2'. That is, if all the values between the separators are the same, return a value; otherwise return another value. What is the best way to do this?

A somewhat blind answer:
Read the whole value into a variable, and its first part into another:
declare @entire nvarchar(max), @single nvarchar(max)
select/set @entire=....
set @single=left(@entire,charindex('/',@entire)-1)
Then remove the slashes from @entire and compare it with @single replicated to the same length:
set @entire=replace(@entire,'/','')
select case when replicate(@single,len(@entire)/len(@single))=@entire
then 1 else 0 end as [What you want]

Something like this should work:
SELECT
x.*,
CASE
WHEN N > 1 THEN 0
ELSE 1
END Result
FROM (
SELECT
t.Column1,
t.Column2,
t.Column3,
t.SomeColumn,
COUNT(DISTINCT s.value) N
FROM dbo.YourTable t
OUTER APPLY STRING_SPLIT(t.SomeColumn,'/') s
GROUP BY
t.Column1,
t.Column2,
t.Column3,
t.SomeColumn
) x
;

Based on your simple example (no edge cases accounted for) the following should work for you:
select string, iif(replace(s,v,'')='',1,0) as Result
from t
cross apply (
values(left(string,charindex('/', string)-1),(replace(string,'/','')))
)s(v,s);
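For anyone who wants to try the distinct-count idea end to end, here is a minimal sketch. The table variable @t, its column name and the sample rows are assumptions made up for illustration, and STRING_SPLIT needs SQL Server 2016 or later with database compatibility level 130 or higher. It returns 1 when all parts are identical and 2 otherwise, as the question asks.

```sql
-- Hypothetical sample data; @t and its rows are illustrative only.
DECLARE @t TABLE (string varchar(50));
INSERT INTO @t (string)
VALUES ('A/A/A'), ('A/B/A'), ('AB/A/B'), ('A');

-- Split each string on '/', then count the distinct parts per string.
SELECT t.string,
       CASE WHEN COUNT(DISTINCT s.value) = 1 THEN 1 ELSE 2 END AS Result
FROM @t AS t
CROSS APPLY STRING_SPLIT(t.string, '/') AS s
GROUP BY t.string;
```

Counting distinct parts also copes with a value such as 'AB/A/B', which a comparison based on REPLACE of the first part would report as "all the same", and with a value that has no separator at all.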

Related

T-SQL Join on foreign key that has leading zero

I need to link various tables that each have a common key (a serial number in this case). In some tables the key has a leading zero e.g. '037443' and on others it doesn't e.g. '37443'. In both cases the serial refers to the same product. To confound things serial 'numbers' are not always just numeric e.g. may be "BDO1234", in these cases there is never a leading zero.
I'd prefer to use the WHERE statement (WHERE a.key = b.key) but could use joins if required. Is there any way to do this?
I'm still learning so please keep it simple if possible. Many thanks.

Based on the accepted answer in this link, I've written a small tsql sample to show you what I meant by 'the right direction':
Create the test table:
CREATE TABLE tblTempTest
(
keyCol varchar(20)
)
GO
Populate it:
INSERT INTO tblTempTest VALUES
('1234'), ('01234'), ('10234'), ('0k234'), ('k2304'), ('00034')
Select values:
SELECT keyCol,
SUBSTRING(keyCol, PATINDEX('%[^0]%', keyCol + '.'), LEN(keyCol)) As trimmed
FROM tblTempTest
Results:
keyCol trimmed
-------------------- --------------------
1234 1234
01234 1234
10234 10234
0k234 k234
k2304 k2304
00034 34
Cleanup:
DROP TABLE tblTempTest
Note that the values are alpha-numeric, and only leading zeroes are trimmed.
One possible drawback is that if there is a 0 after a white space it will not be trimmed, but that's an easy fix - just add ltrim:
SUBSTRING(LTRIM(keyCol), PATINDEX('%[^0]%', LTRIM(keyCol + '.')), LEN(keyCol)) As trimmed
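To connect this back to the original question, the same expression can be used on both sides of a join condition. The following is only a sketch: the table variables @a and @b and the column name serial are hypothetical stand-ins for the real tables.

```sql
-- Hypothetical tables; names and rows are illustrative only.
DECLARE @a TABLE (serial varchar(20));
DECLARE @b TABLE (serial varchar(20));
INSERT INTO @a (serial) VALUES ('037443'), ('BDO1234');
INSERT INTO @b (serial) VALUES ('37443'),  ('BDO1234');

-- Strip leading zeroes on both sides before comparing.
SELECT a.serial AS serialA, b.serial AS serialB
FROM @a AS a
JOIN @b AS b
  ON SUBSTRING(a.serial, PATINDEX('%[^0]%', a.serial + '.'), LEN(a.serial))
   = SUBSTRING(b.serial, PATINDEX('%[^0]%', b.serial + '.'), LEN(b.serial));
```

Because the comparison is on an expression rather than on the raw column, it will generally prevent index seeks on the key, so on large tables it may be worth storing the trimmed value in its own column.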

You need to create a function
CREATE FUNCTION CompareSerialNumbers(@SerialA varchar(max), @SerialB varchar(max))
RETURNS bit
AS
BEGIN
DECLARE @ReturnValue AS bit
IF (ISNUMERIC(@SerialA) = 1 AND ISNUMERIC(@SerialB) = 1)
SELECT @ReturnValue =
CASE
WHEN CAST(@SerialA AS int) = CAST(@SerialB AS int) THEN 1
ELSE 0
END
ELSE
SELECT @ReturnValue =
CASE
WHEN @SerialA = @SerialB THEN 1
ELSE 0
END
RETURN @ReturnValue
END;
GO
If both are numeric then it compares them as integers otherwise it compares them as strings.

sp_executesql vs user defined scalar function

In the table below I am storing some conditions like this:
Then, generally, in second table, I am having the following records:
What I need is to compare these values using the right condition and store the result (say, '0' for false and '1' for true) in an additional column.
I am going to do this in a stored procedure, and I will be comparing anywhere from several to hundreds of records.
One possible solution is to use sp_executesql for each row, building dynamic statements; the other is to create my own scalar function and call it for each row using CROSS APPLY.
Could anyone tell me which is the more efficient way?
Note: I know that the best way to answer this is to build both solutions and test them, but I am hoping there is an answer based on other factors, such as caching and SQL Server's internal optimizations, which would save me a lot of time because this is only part of a bigger problem.

I don't see the need for sp_executesql in this case. You can obtain the result for all records at once in a single statement:
select Result = case
when ct.Abbreviation='=' and t.ValueOne=t.ValueTwo then 1
when ct.Abbreviation='>' and t.ValueOne>t.ValueTwo then 1
when ct.Abbreviation='>=' and t.ValueOne>=t.ValueTwo then 1
when ct.Abbreviation='<=' and t.ValueOne<=t.ValueTwo then 1
when ct.Abbreviation='<>' and t.ValueOne<>t.ValueTwo then 1
when ct.Abbreviation='<' and t.ValueOne<t.ValueTwo then 1
else 0 end
from YourTable t
join ConditionType ct on ct.ID = t.ConditionTypeID
and update additional column with something like:
;with cte as (
select t.AdditionalColumn, Result = case
when ct.Abbreviation='=' and t.ValueOne=t.ValueTwo then 1
when ct.Abbreviation='>' and t.ValueOne>t.ValueTwo then 1
when ct.Abbreviation='>=' and t.ValueOne>=t.ValueTwo then 1
when ct.Abbreviation='<=' and t.ValueOne<=t.ValueTwo then 1
when ct.Abbreviation='<>' and t.ValueOne<>t.ValueTwo then 1
when ct.Abbreviation='<' and t.ValueOne<t.ValueTwo then 1
else 0 end
from YourTable t
join ConditionType ct on ct.ID = t.ConditionTypeID
)
update cte
set AdditionalColumn = Result
If the logic above needs to be applied in many places, not just over one table, then yes, you might consider a function. I would use an inline table-valued function rather than a scalar one, though, because user-defined scalar functions impose call-and-return overhead, and the more rows there are to process, the more time is wasted.
create function ftComparison
(
@v1 float,
@v2 float,
@cType int
)
returns table
as return
select
Result = case
when ct.Abbreviation='=' and @v1=@v2 then 1
when ct.Abbreviation='>' and @v1>@v2 then 1
when ct.Abbreviation='>=' and @v1>=@v2 then 1
when ct.Abbreviation='<=' and @v1<=@v2 then 1
when ct.Abbreviation='<>' and @v1<>@v2 then 1
when ct.Abbreviation='<' and @v1<@v2 then 1
else 0
end
from ConditionType ct
where ct.ID = @cType
which can be applied then as:
select f.Result
from YourTable t
cross apply ftComparison(ValueOne, ValueTwo, t.ConditionTypeID) f
or
select f.Result
from YourAnotherTable t
cross apply ftComparison(SomeValueColumn, SomeOtherValueColumn, @someConditionType) f
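The inline function can also drive the update of the additional column directly. This is a sketch only, assuming the function was created in the dbo schema and that YourTable has the AdditionalColumn, ValueOne, ValueTwo and ConditionTypeID columns used in the statements above.

```sql
-- Assumes dbo.ftComparison exists as defined above and that
-- YourTable.AdditionalColumn is the column that stores the 0/1 result.
UPDATE t
SET    AdditionalColumn = f.Result
FROM   YourTable AS t
CROSS APPLY dbo.ftComparison(t.ValueOne, t.ValueTwo, t.ConditionTypeID) AS f;
```

Rows whose ConditionTypeID has no match in ConditionType produce no row from the function, so CROSS APPLY leaves them untouched; OUTER APPLY with a fallback value would be needed if they should be set to 0.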

Recursive replace from a table of characters

In short, I am looking for a single recursive query that can perform multiple replaces over one string. I have a notion it can be done, but am failing to wrap my head around it.
Granted, I'd prefer the biz-layer of the application, or even the CLR, to do the replacing, but these are not options in this case.
More specifically, I want to replace the below mess - which is C&P in 8 different stored procedures - with a TVF.
SET @temp = REPLACE(RTRIM(@target), '~', '-')
SET @temp = REPLACE(@temp, '''', '-')
SET @temp = REPLACE(@temp, '!', '-')
SET @temp = REPLACE(@temp, '#', '-')
SET @temp = REPLACE(@temp, '#', '-')
-- 23 additional lines redacted
SET @target = @temp
Here is where I've started:
-- I have a split string TVF called tvf_SplitString that takes a string
-- and a splitter, and returns a table with one row for each element.
-- EDIT: tvf_SplitString returns a two-column table: pos, element, of which
-- pos is simply the row_number of the element.
SELECT REPLACE('A~B!C#D#C!B~A', MM.ELEMENT, '-') TGT
FROM dbo.tvf_SplitString('~-''-!-#-#', '-') MM
Notice I've joined all the offending characters into a single string separated by '-' (knowing that '-' will never be one of the offending characters), which is then split. The result from this query looks like:
TGT
------------
A-B!C#D#C!B-A
A~B!C#D#C!B~A
A~B-C#D#C-B~A
A~B!C-D-C!B~A
A~B!C#D#C!B~A
So, the replace clearly works, but now I want it to be recursive so I can pull the top 1 and eventually come out with:
TGT
------------
A-B-C-D-C-B-A
Any ideas on how to accomplish this with one query?
EDIT: Well, actual recursion isn't necessary if there's another way. I'm pondering the use of a table of numbers here, too.

You can use this in a scalar function. I use it to remove all control characters from some external input.
SELECT @target = REPLACE(@target, invalidChar, '-')
FROM (VALUES ('~'),(''''),('!'),('#'),('#')) AS T(invalidChar)
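Since every offending character is replaced by the same single character, another option worth considering is TRANSLATE, which does all the replacements in one call. This is a different technique from the ones discussed in this thread, and it assumes SQL Server 2017 or later; the sample input and the character list below are made up for illustration.

```sql
-- Assumption: SQL Server 2017 or later (TRANSLATE is not available before that).
DECLARE @target  varchar(100) = 'A~B!C@D#C!B~A';  -- sample input, illustrative only
DECLARE @invalid varchar(30)  = '~''!@#';         -- hypothetical list of characters to replace

-- TRANSLATE requires the replacement string to have the same length
-- as the character list, so build one '-' per listed character.
SELECT TRANSLATE(RTRIM(@target), @invalid, REPLICATE('-', LEN(@invalid))) AS TGT;
```

LEN ignores trailing spaces, so if the character list ever ends with a space the replacement string would come out one character short and TRANSLATE would raise an error; in that case build the replacement string from DATALENGTH instead.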

I figured it out. I failed to mention that the tvf_SplitString function returns a row number as "pos" (although a subquery assigning row_number could also have worked). With that fact, I could control cross join between the recursive call and the split.
-- the cast to varchar(max) matches the output of the TVF, otherwise error.
-- The iteration counter is joined to the row number value from the split string
-- function to ensure each iteration only replaces on one character.
WITH XX AS (SELECT CAST('A~B!C#D#C!B~A' AS VARCHAR(MAX)) TGT, 1 RN
UNION ALL
SELECT REPLACE(XX.TGT, MM.ELEMENT, '-'), RN + 1 RN
FROM XX, dbo.tvf_SplitString('~-''-!-#-#', '-') MM
WHERE XX.RN = MM.pos)
SELECT TOP 1 XX.TGT
FROM XX
ORDER BY RN DESC
Still, I'm open to other suggestions.

SQL invalid conversion return null instead of throwing error

I have a table with a varchar column, and I want to find values that match a certain number. So lets say that column contains the following entries (except with millions of rows in real life):
123456789012
2345678
3456
23 45
713?2
00123456789012
So I decide I want all the rows which are numerically 123456789012, and write a statement that looks something like this:
SELECT * FROM MyTable WHERE CAST(MyColumn as bigint) = 123456789012
It should return the first and last row, but instead the whole query blows up because it can't convert the "23 45" and "713?2" to bigint.
Is there another way to do the conversion that will return NULL for values that can't convert?

SQL Server does NOT guarantee boolean operator short-circuit; see On SQL Server boolean operator short-circuit. So all solutions using ISNUMERIC(...) AND CAST(...) are fundamentally flawed (they may work, but they can arbitrarily fail later depending on the generated plan). A better solution is to use CASE, as Thomas suggests: CASE ISNUMERIC(...) WHEN 1 THEN CAST(...) ELSE NULL END. But, as gbn pointed out, ISNUMERIC is notoriously finicky about what 'numeric' means, and in many cases where one would expect it to return 0 it returns 1. So mix the CASE with LIKE:
CASE WHEN MyRow NOT LIKE '%[^0-9]%' THEN CAST(MyRow as bigint) ELSE NULL END
But the real problem is that if you have millions of rows and you have to search them like this, you'll always end up scanning end-to-end, since the expression is not SARG-able (no matter how we rewrite it). The real issue here is data purity, and it should be addressed at the appropriate level, where the data is populated. Another thing to consider is whether it is possible to create a persisted computed column with this expression and create a filtered index on it which eliminates NULL (i.e. non-numeric) values. That would speed things up a little.

If you are using SQL Server 2012 you can use the 2 new methods:
TRY_CAST()
TRY_CONVERT()
Both methods are equivalent. They return the value cast to the specified data type if the cast succeeds; otherwise they return NULL. The only difference is that CONVERT is SQL Server specific while CAST is ANSI. Using CAST will make your code more portable (although I'm not sure whether any other database provider implements TRY_CAST).
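Applied to the sample values from the question, this could look like the sketch below. The table variable @MyTable is a hypothetical stand-in for the real table, and TRY_CAST needs SQL Server 2012 or later.

```sql
-- Hypothetical stand-in for MyTable, loaded with the values from the question.
DECLARE @MyTable TABLE (MyColumn varchar(50));
INSERT INTO @MyTable (MyColumn)
VALUES ('123456789012'), ('2345678'), ('3456'),
       ('23 45'), ('713?2'), ('00123456789012');

-- TRY_CAST yields NULL for '23 45' and '713?2' instead of raising an error,
-- and NULL = 123456789012 is not true, so those rows are simply filtered out.
SELECT MyColumn
FROM @MyTable
WHERE TRY_CAST(MyColumn AS bigint) = 123456789012;
```

This should return the first and last rows, which is the result the question was after.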

ISNUMERIC will accept empty string and values like 1.23 or 5E-04 so could be unreliable.
And you don't know what order things will be evaluated in so it could still fail (SQL is declarative, not procedural, so the WHERE clause probably won't be evaluated left to right)
So:
you want to accept values that consist only of the characters 0-9
you need to materialise the "number" filter so it's applied before CAST
Something like:
SELECT
*
FROM
(
SELECT TOP 2000000000 *
FROM MyTable
WHERE MyColumn NOT LIKE '%[^0-9]%' --double negative rejects anything except 0-9
ORDER BY MyColumn
) foo
WHERE
CAST(MyColumn as bigint) = 123456789012 --applied after number check
Edit: quick example that fails.
CREATE TABLE #foo (bigintstring varchar(100))
INSERT #foo (bigintstring )VALUES ('1.23')
INSERT #foo (bigintstring )VALUES ('1 23')
INSERT #foo (bigintstring )VALUES ('123')
SELECT * FROM #foo
WHERE
ISNUMERIC(bigintstring) = 1
AND
CAST(bigintstring AS bigint) = 123

SELECT *
FROM MyTable
WHERE ISNUMERIC(MyRow) = 1
AND CAST(MyRow as float) = 123456789012

The ISNUMERIC() function should give you what you need.
SELECT * FROM MyTable
WHERE ISNUMERIC(MyRow) = 1
AND CAST(MyRow as bigint) = 123456789012
And to add a case statement like Thomas suggested:
SELECT * FROM MyTable
WHERE CASE ISNUMERIC(MyRow)
WHEN 1 THEN CAST(MyRow as bigint)
ELSE NULL
END = 123456789012
http://msdn.microsoft.com/en-us/library/ms186272.aspx

SELECT *
FROM MyTable
WHERE (ISNUMERIC(MyColumn) = 1) AND (CAST(MyColumn as bigint) = 123456789012)
Additionally you can use a CASE statement in order to get null values.
SELECT
CASE
WHEN (ISNUMERIC(MyColumn) = 1) THEN CAST(MyColumn as bigint)
ELSE NULL
END AS 'MyColumnAsBigInt'
FROM tableName
If you require additional filtering, for numerics which are not valid to be cast to bigint, you can use the following instead of ISNUMERIC:
PATINDEX('%[^0-9]%',MyColumn) = 0
If you need decimal values instead of integers, cast to float instead and change the pattern to '%[^0-9.]%'

How can I query 'between' numeric data on a not numeric field?

I've got a query that I've just found in the database that is failing causing a report to fall over. The basic gist of the query:
Select *
From table
Where IsNull(myField, '') <> ''
And IsNumeric(myField) = 1
And Convert(int, myField) Between @StartRange And @EndRange
Now, myField doesn't contain numeric data in all the rows [it is of nvarchar type]... but this query was obviously designed such that it only cares about rows where the data in this field is numeric.
The problem with this is that T-SQL (as near as I understand) doesn't short-circuit the Where clause, so it fails on records where the data is not numeric, with the exception:
Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the nvarchar value '/A' to data type int.
Short of dumping all the rows where myField is numeric into a temporary table and then querying that for rows where the field is in the specified range, what can I do that is optimal?
My first parse purely to attempt to analyse the returned data and see what was going on was:
Select *
From (
Select *
From table
Where IsNull(myField, '') <> ''
And IsNumeric(myField) = 1
) t0
Where Convert(int, myField) Between @StartRange And @EndRange
But I get the same error I did for the first query which I'm not sure I understand as I'm not converting any data that shouldn't be numeric at this point. The subquery should only have returned rows where myField contains numeric data.
Maybe I need my morning tea, but does this make sense to anyone? Another set of eyes would help.
Thanks in advance

IsNumeric only tells you that the string can be converted to one of the numeric types in SQL Server. It may be able to convert it to money, or to a float, but may not be able to convert it to an int.
Change your
IsNumeric(myField) = 1
to be:
not myField like '%[^0-9]%' and LEN(myField) < 9
(that is, you want myField to contain only digits, and fit in an int)
Edit examples:
select ISNUMERIC('.'),ISNUMERIC('£'),ISNUMERIC('1d9')
result:
----------- ----------- -----------
1 1 1
(1 row(s) affected)
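Putting the digits-only check together with CASE (so that the conversion is only attempted on values that passed the check) gives something like the sketch below. The table variable @TestData and its rows are made up for illustration; the length check here allows up to 9 digits, since any 9-digit number fits in an int.

```sql
-- Hypothetical sample data; the table variable and its rows are illustrative only.
DECLARE @StartRange int = 1, @EndRange int = 3;
DECLARE @TestData TABLE (myField nvarchar(20));
INSERT INTO @TestData (myField)
VALUES (N'1'), (N'2'), (N'5'), (N'/A'), (N'1d9'), (N''), (NULL);

-- Convert only values that are non-empty, digits-only and at most 9 digits long.
-- Everything else makes the CASE return NULL, which fails the BETWEEN test.
SELECT myField
FROM @TestData
WHERE CASE
          WHEN myField <> N''
           AND myField NOT LIKE N'%[^0-9]%'
           AND LEN(myField) < 10
          THEN CONVERT(int, myField)
      END BETWEEN @StartRange AND @EndRange;
```

With the sample rows above, only '1' and '2' should come back, and '/A' and '1d9' no longer raise a conversion error.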

You'd have to force SQL to evaluate the expressions in a certain order.
Here is one solution
Select *
From (
Select TOP 2000000000 *
From table
Where IsNumeric(myField) = 1
And IsNull(myField, '') <> ''
ORDER BY Key
) t0
Where Convert(int, myField) Between @StartRange And @EndRange
and another
Select *
From table
Where
CASE
WHEN IsNumeric(myField) = 1 And IsNull(myField, '') <> ''
THEN Convert(int, myField) ELSE @StartRange-1
END Between @StartRange And @EndRange
The first technique is "intermediate materialisation": it forces a sort on a working table.
The second relies on the guaranteed evaluation order of CASE.
Neither is pretty or whizzy
SQL is declarative: you tell the optimiser what you want, not how to do it. The tricks above force things to be done in a certain order.

Not sure if this helps you, but I did read somewhere that incorrect conversion using CONVERT will always generate error in SQL. So I think it would be better to use CASE in where clause to avoid having CONVERT to run on all rows

Use a CASE statement.
declare @StartRange int
declare @EndRange int
set @StartRange = 1
set @EndRange = 3
select *
from TestData
WHERE Case WHEN ISNUMERIC(Value) = 0 THEN 0
WHEN Value IS NULL THEN 0
WHEN Value = '' THEN 0
WHEN CONVERT(int, Value) BETWEEN @StartRange AND @EndRange THEN 1
END = 1
