I have a t-sql query that looks like this:
select * from (
SELECT [Id], replace(ca.[AKey], '-', '') as [AKey1], rtrim(replace(replace(replace(lower([Name]), '#', ''), '(1.0)', ''), '(2.5)', '')) as [Name], [Key], dw.[AKey], replace(lower(trim([wName])), '#', '') as [wName]
FROM [dbo].[wTable] ca
FULL JOIN (select * from [dw].[wTable]) dw on
rtrim(left( replace(replace(replace(lower(dw.[wName]), '(1.0)', ''), '(2.5)', ''), '#', ''), 5))+'%'
like
rtrim(left( replace(replace(replace(lower(ca.[Name] ), '(1.0)', ''), '(2.5)', ''), '#', ''), 5))+'%'
and
right(rtrim(replace(replace(replace(lower(dw.[wName]), '(1.0)', ''), '(2.5)', ''), '#', '')), 2)
like
right(rtrim(replace(replace(replace(lower(ca.[Name] ), '(1.0)', ''), '(2.5)', ''), '#', '')), 2)
) tp
As you can see, during the JOIN, it's removing some fuzzy characters that may or may not exist, and it's checking to see if the first 5 characters in the wName column match with the first 5 characters in the Name column, then doing the same for the last 2 characters in the columns.
So essentially, it's matching on the first 5 characters AND last 2 characters.
What I'm trying to add is an additional column that will tell me if the resulting columns are an exact match or if they are fuzzy. In other words, if they are an exact match it should say 'True' or something like that, and if they are a fuzzy match I would ideally like it to tell me how far off they are. For example, how many characters do not match.
As JNevil mentioned you could use Levenshtein. You can also use Damarau-Levenshtein or the Longest Common Substring depending on how accurate you want to get and what your performance expectations are.
Below are two solutions. The first is a Levenshtein solution using a copy I grabbed from Phil Factor here. The Longest Common Substring solution uses my version of the Longest Common Substring which is fastest available for SQL Server (by far).
-- sample data
declare #t1 table (string1 varchar(100));
declare #t2 table (string2 varchar(100));
insert #t1 values ('abc'),('xxyz'),('1234'),('9923');
insert #t2 values ('abcd'),('xyz'),('2345'),('zzz');
-- Levenshtein
select string1, string2, Ld
from
(
select *, Ld = dbo.LEVENSHTEIN(t1.string1, t2.string2)
from #t1 t1
cross join #t2 t2
) compare
where ld <= 2;
-- Longest Common Substring
select string1, string2, lcss = item, lcssLen = itemlen, diff = mx.L-itemLen
from #t1 t1
cross join #t2 t2
cross apply dbo.lcssWindowAB(t1.string1, t2.string2, 20)
cross apply (values (IIF(len(string1) > len(string2), len(string1),len(string2)))) mx(L)
where mx.L-itemLen <= 2;
RESULTS
string1 string2 Ld
-------- -------- -----
abc abcd 1
xxyz xyz 1
1234 2345 2
string1 string2 lcss lcssLen diff
-------- -------- ----- ----------- -----------
abc abcd abc 3 1
xxyz xyz xyz 3 1
1234 2345 234 3 1
9923 2345 23 2 2
This does not answer your question but should get you started.
P.S. The Levenshtein function I posted does have a small bug, it says the distance between "9923" and "2345" is 4, the correct answer would be two. There's other Levenshtein functions out there though.
Related
I can't replace every 2 characters of a string with a '.'
select STUFF('abcdefghi', 3, 1, '.') c3,STUFF('abcdefghi', 5, 1,
'.') c5,STUFF('abcdefghi', 7, 1, '.') c7,STUFF('abcdefghi', 9, 1, '.')
c9
if I use STUFF I should subsequently overlap the strings c3, c5, c7 and c9. but I can't find a method
can you help me?
initial string:
abcdefghi
the result I would like is
ab.de.gh.
the string can be up to 50 characters
Create a numbers / tally / digits table, if you don't have one already, then you can use this to target each character position:
with digits as ( /* This would be a real table, here it's just to test */
select n from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10))x(n)
), t as (
select 'abcdefghi' as s
)
select String_Agg( case when d.n%3 = 0 then '.' else Substring(t.s, d.n, 1) end, '')
from t
cross apply digits d
where d.n <Len(t.s)
Using for xml with existing table
with digits as (
select n from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10))x(n)
),
r as (
select t.id, case when d.n%3=0 then '.' else Substring(t.s, d.n, 1) end ch
from t
cross apply digits d
where d.n <Len(t.s)
)
select result=(select '' + ch
from r r2
where r2.id=r.id
for xml path('')
)
from r
group by r.id
You can try it like this:
Easiest might be a quirky update ike here:
DECLARE #string VARCHAR(100)='abcdefghijklmnopqrstuvwxyz';
SELECT #string = STUFF(#string,3*A.pos,1,'.')
FROM (SELECT TOP(LEN(#string)/3) ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM master..spt_values) A(pos);
SELECT #string;
Better/Cleaner/Prettier was a recursive CTE:
We use a declared table to have some tabular sample data
DECLARE #tbl TABLE(ID INT IDENTITY, SomeString VARCHAR(200));
INSERT INTO #tbl VALUES('')
,('a')
,('ab')
,('abc')
,('abcd')
,('abcde')
,('abcdefghijklmnopqrstuvwxyz');
--the query
WITH recCTE AS
(
SELECT ID
,SomeString
,(LEN(SomeString)+1)/3 AS CountDots
,1 AS OccuranceOfDot
,SUBSTRING(SomeString,4,LEN(SomeString)) AS RestString
,CAST(LEFT(SomeString,2) AS VARCHAR(MAX)) AS Growing
FROM #tbl
UNION ALL
SELECT t.ID
,r.SomeString
,r.CountDots
,r.OccuranceOfDot+2
,SUBSTRING(RestString,4,LEN(RestString))
,CONCAT(Growing,'.',LEFT(r.RestString,2))
FROM #tbl t
INNER JOIN recCTE r ON t.ID=r.ID
WHERE r.OccuranceOfDot/2<r.CountDots-1
)
SELECT TOP 1 WITH TIES ID,Growing
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY ID ORDER BY OccuranceOfDot DESC);
--the result
1
2 a
3 ab
4 ab
5 ab
6 ab.de
7 ab.de.gh.jk.mn.pq.st.vw.yz
The idea in short
We use a recursive CTE to walk along the string
we add the needed portion together with a dot
We stop, when the remaining length is to short to continue
a little magic is the ORDER BY ROW_NUMBER() OVER() together with TOP 1 WITH TIES. This will allow all first rows (frist per ID) to appear.
I have a table with unstructured data I am trying to analyze to try to build a relational lookup. I do not have use of word cloud software.
I really have no idea how to solve this problem. Searching for solutions has lead me to tools that might do this for me that cost money, not coded solutions.
Basically my data looks like this:
CK1 CK2 Comment
--------------------------------------------------------------
1 A This is a comment.
2 A Another comment here.
And this is what I need to create:
CK1 CK2 Words
--------------------------------------------------------------
1 A This
1 A is
1 A a
1 A comment.
2 A Another
2 A comment
2 A here.
What you are trying to do is tokenize a string using a space as a Delimiter. In the SQL world people often refer to functions that do this as a "Splitter". The potential pitfall of using a splitter for this type of thing is how words can be separated by multiple spaces, tabs, CHAR(10)'s, CHAR(13)'s, CHAR()'s, etc. Poor grammar, such as not adding a space after a period results in this:
" End of sentence.Next sentence"
sentence.Next is returned as a word.
The way I like to tokenize human text is to:
Replace any text that isn't a character with a space
Replace duplicate spaces
Trim the string
Split the newly transformed string using a space as the delimiter.
Below is my solution followed by the DDL to create the functions used.
-- Sample Data
DECLARE #yourtable TABLE (CK1 INT, CK2 CHAR(1), Comment VARCHAR(8000));
INSERT #yourtable (CK1, CK2, Comment)
VALUES
(1,'A','This is a typical comment...Follewed by another...'),
(2,'A','This comment has double spaces and tabs and even carriage
returns!');
-- Solution
SELECT t.CK1, t.CK2, split.itemNumber, split.itemIndex, split.itemLength, split.item
FROM #yourtable AS t
CROSS APPLY samd.patReplace(t.Comment,'[^a-zA-Z ]',' ') AS c1
CROSS APPLY samd.removeDupChar8K(c1.newString,' ') AS c2
CROSS APPLY samd.delimitedSplitAB8K(LTRIM(RTRIM(c2.NewString)),' ') AS split;
Results (truncated for brevity):
CK1 CK2 itemNumber itemIndex itemLength item
----------- ---- -------------------- ----------- ----------- --------------
1 A 1 1 4 This
1 A 2 6 2 is
1 A 3 9 1 a
1 A 4 11 7 typical
1 A 5 19 7 comment
...
2 A 1 1 4 This
2 A 2 6 7 comment
2 A 3 14 3 has
2 A 4 18 6 double
...
Note that the splitter I'm using is based of Jeff Moden's Delimited Split8K with a couple tweeks.
Functions used:
CREATE FUNCTION dbo.rangeAB
(
#low bigint,
#high bigint,
#gap bigint,
#row1 bit
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH L1(N) AS
(
SELECT 1
FROM (VALUES
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0)) T(N) -- 90 values
),
L2(N) AS (SELECT 1 FROM L1 a CROSS JOIN L1 b CROSS JOIN L1 c),
iTally AS (SELECT rn = ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM L2 a CROSS JOIN L2 b)
SELECT r.RN, r.OP, r.N1, r.N2
FROM
(
SELECT
RN = 0,
OP = (#high-#low)/#gap,
N1 = #low,
N2 = #gap+#low
WHERE #row1 = 0
UNION ALL -- COALESCE required in the TOP statement below for error handling purposes
SELECT TOP (ABS((COALESCE(#high,0)-COALESCE(#low,0))/COALESCE(#gap,0)+COALESCE(#row1,1)))
RN = i.rn,
OP = (#high-#low)/#gap+(2*#row1)-i.rn,
N1 = (i.rn-#row1)*#gap+#low,
N2 = (i.rn-(#row1-1))*#gap+#low
FROM iTally AS i
ORDER BY i.rn
) AS r
WHERE #high&#low&#gap&#row1 IS NOT NULL AND #high >= #low AND #gap > 0;
GO
CREATE FUNCTION samd.NGrams8k
(
#string VARCHAR(8000), -- Input string
#N INT -- requested token size
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT
position = r.RN,
token = SUBSTRING(#string, CHECKSUM(r.RN), #N)
FROM dbo.rangeAB(1, LEN(#string)+1-#N,1,1) AS r
WHERE #N > 0 AND #N <= LEN(#string);
GO
CREATE FUNCTION samd.patReplace8K
(
#string VARCHAR(8000),
#pattern VARCHAR(50),
#replace VARCHAR(20)
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT newString =
(
SELECT CASE WHEN #string = CAST('' AS VARCHAR(8000)) THEN CAST('' AS VARCHAR(8000))
WHEN #pattern+#replace+#string IS NOT NULL THEN
CASE WHEN PATINDEX(#pattern,token COLLATE Latin1_General_BIN)=0
THEN ng.token ELSE #replace END END
FROM samd.NGrams8K(#string, 1) AS ng
ORDER BY ng.position
FOR XML PATH(''),TYPE
).value('text()[1]', 'VARCHAR(8000)');
GO
CREATE FUNCTION samd.delimitedSplitAB8K
(
#string VARCHAR(8000), -- input string
#delimiter CHAR(1) -- delimiter
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT
itemNumber = ROW_NUMBER() OVER (ORDER BY d.p),
itemIndex = CHECKSUM(ISNULL(NULLIF(d.p+1, 0),1)),
itemLength = CHECKSUM(item.ln),
item = SUBSTRING(#string, d.p+1, item.ln)
FROM (VALUES (DATALENGTH(#string))) AS l(s) -- length of the string
CROSS APPLY
(
SELECT 0 UNION ALL -- for handling leading delimiters
SELECT ng.position
FROM samd.NGrams8K(#string, 1) AS ng
WHERE token = #delimiter
) AS d(p) -- delimiter.position
CROSS APPLY (VALUES( --LEAD(d.p, 1, l.s+l.d) OVER (ORDER BY d.p) - (d.p+l.d)
ISNULL(NULLIF(CHARINDEX(#delimiter,#string,d.p+1),0)-(d.p+1), l.s-d.p))) AS item(ln);
GO
CREATE FUNCTION dbo.RemoveDupChar8K(#string varchar(8000), #char char(1))
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT NewString =
replace(replace(replace(replace(replace(replace(replace(
#string COLLATE LATIN1_GENERAL_BIN,
replicate(#char,33), #char), --33
replicate(#char,17), #char), --17
replicate(#char,9 ), #char), -- 9
replicate(#char,5 ), #char), -- 5
replicate(#char,3 ), #char), -- 3
replicate(#char,2 ), #char), -- 2
replicate(#char,2 ), #char); -- 2
GO
1) If we are using SQL Server 2016 and above then we should probably
use the built-in function STRING_SPLIT
-- SQL 2016and above
DECLARE #txt NVARCHAR(100) = N'This is a comment.'
select [value] from STRING_SPLIT(#txt, ' ')
2) Only if 1 does not fit, then if the number of separation (the space in our case) is less then 3 which fit your sample data, then we should probably use PARSENAME
-- BEFORE SQL 2016 if we have less than 4 parts
DECLARE #txt NVARCHAR(100) = N'This is a comment.'
DECLARE #Temp NVARCHAR(200) = REPLACE (#txt,'.','#')
SELECT t FROM (VALUES(1),(2),(3),(4))T1(n)
CROSS APPLY (SELECT REPLACE(PARSENAME(REPLACE(#Temp,' ','.'),T1.n), '#','.'))T2(t)
3) Only if the 1 and 2 does not fit, then we should use SQLCLR function
http://dataeducation.com/sqlclr-string-splitting-part-2-even-faster-even-more-scalable/
4) Only if we cannot use 1,2 and we cannot use SQLCLR (which implies a real problematic administration and has nothing with security since you can have all the SQLCLR function in a read-only database for the use of all users, as I explain in my lectures), then you can use T-SQL and create UDF.
https://sqlperformance.com/2012/07/t-sql-queries/split-strings
This question already has answers here:
Turning a Comma Separated string into individual rows
(16 answers)
Closed 5 years ago.
Suppose I have a table like this with an undetermined number of comma-delimited values in one column:
thingID personID
1 123,234,345
2 456,567
and I want to get it into a form like this:
thingID personID
1 123
1 234
1 345
2 456
2 567
What is my best option for doing this?
Oh I should mention the data is in a SQL 2008 R2 database so I may not be able to use the very latest functionality.
Use CROSS APPLY with a string splitting function.
To find the string splitting function that works best for you, read Aaron Bertrand's Split strings the right way – or the next best way.
For this demonstration I've chosen to use the SplitStrings_XML function, simply because it's the first pure t-sql function in the article:
CREATE FUNCTION dbo.SplitStrings_XML
(
#List NVARCHAR(MAX),
#Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT Item = y.i.value('(./text())[1]', 'nvarchar(4000)')
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(#List, #Delimiter, '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
);
GO
Now that we have a string splitting function, create and populate the sample table (Please save us this step in your future questions):
DECLARE #T AS TABLE
(
thingID int,
personID varchar(max)
)
INSERT INTO #T VALUES
(1, '123,234,345'),
(2, '456,567')
The query:
SELECT thingId, Item
FROM #T
CROSS APPLY dbo.SplitStrings_XML(personID, ',')
Results:
thingId Item
1 123
1 234
1 345
2 456
2 567
You can see a live demo on rextester.
There are several ways to do that. Here are two methods for SQL Server 2008:
XML-Method: requires the string to allow for the xml-trick (no invalid XML chars)
SELECT a.thingID, Split.a.value('.', 'VARCHAR(100)') AS Data
FROM (SELECT OtherID,
CAST('<M>' + REPLACE(personID, ',', '</M><M>') + '</M>' AS XML) AS Data
FROM table1) AS A CROSS APPLY Data.nodes ('/M') AS Split(a);
Recursive method:
;WITH tmp(thingID, DataItem, Data) AS (
SELECT thingID, LEFT(personID, CHARINDEX(',', personID + ',') - 1),
STUFF(personID, 1, CHARINDEX(',', personID + ','), '')
FROM table1
UNION ALL
SELECT thingID, LEFT(personID, CHARINDEX(',', personID + ',') - 1),
STUFF(personID, 1, CHARINDEX(',', personID + ','), '')
FROM tmp
WHERE Data > ''
)
SELECT thingID, DataItem AS personID
FROM tmp
I have say a table with as under
Values
--------
12Null345XXX23456
6712356
Expected Output
----------------
23456
123
How can I do this using TSql(prefereable set based) ?
Thanks
So, you want the longest substring of consecutive numeric digits? Maybe something like this (t here is recreating your original table; if it already exists, skip that CTE and use your table)?:
;with ten as (
select i from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) x(i)
), subs as (
select SUBSTRING('0123456789',s.i,l.i) as sub, l.i as run
from ten as s
join ten as l on s.i+l.i<=11
), t as (
select s
from (
values ('12Null345XXX23456'),('6712356'))x(s)
)
select x.sub
from t
cross apply (select top 1 subs.sub
from subs
where CHARINDEX(subs.sub,t.s)>0
order by subs.run desc) x
I've found a small annoyance that I was wondering how to get around...
In a simplified example, say I need to return "TEST B-19" and "TEST B-20"
I have a where clause that looks like:
where [Name] LIKE 'TEST B-[12][90]'
and it works... unless there's a "TEST B-10" or "TEST-B29" value that I don't want.
I'd rather not resort to doing both cases, because in more complex situations that would become prohibitive.
I tried:
where [Name] LIKE 'TEST B-[19-20]'
but of course that doesn't work because it is looking for single characters...
Thoughts? Again, this is a very simple example, I'd be looking for ways to grab ranges from 16 to 32 or 234 to 459 without grabbing all the extra values that could be created.
EDITED to include test examples...
You might see "TEXAS 22" or "THX 99-20-110-B6" or "E-19" or "SOUTHERN B" or "122 FLOWERS" in that field. The presense of digits is common, but not a steadfast rule, and there are absolutely no general patterns for hypens, digits, characters, order, etc.
I would divide the Name column into the text parts and the number parts, and convert the number parts into an integer, and then check if that one was between the values. Something like:
where cast(substring([Name], 7, 2) as integer) between 19 and 20
And, of course, if the possible structure of [Name] is much more complex, you'd have to calculate the values for 7 and 2, not hardcode them....
EDIT: If you want to filter out the ones not conforming to the pattern first, do the following:
where [Name] LIKE '%TEST B-__%'
and cast(substring([Name], CHARINDEX('TEST B-', [Name]) + LEN('TEST B-'), 2) as integer) between 19 and 20
Maybe it's faster using CHARINDEX in place of the LIKE in the topmost line two, especially if you put an index on the computed value, but... That is only optimization... :)
EDIT: Tested the procedure. Given the following data:
jajajajajajajTEST B-100
jajajajajajajTEST B-85
jajajajjTEST B-100
jajjajajTEST B-100
jajajajajajajTEST B-00
jajajajaTEST B-100
jajajajajajajEST B-99
jajajajajajajTEST B-100
jajajajajajajTEST B-19
jajajajjTEST B-100
jajjajajTEST B-120
jajajajajajajTEST B-00
jajajajaTEST B-150
jajajajajajajEST B-20
TEST B-20asdfh asdfkh
The query returns the following rows:
jajajajajajajTEST B-19
TEST B-20asdfh asdfkh
Wildcards or no, you still have to edit the query every time you want to change the range definition. If you're always dealing with a range (and it's not always the same range), you might use parameters. For example:
note: for some reason (this has happened in many other posts as well), when I try to post code beginning with 'declare', SO hangs and times-out. I reported it on meta already, but nobody could reproduce it (including me). Here it's happening again, so I took the 'D' off, and now it works. I'll come back tomorrow, and it will let me put the 'D' back on.
DECLARE #min varchar(5)
DECLARE #max varchar(5)
SET #min = 'B-19'
SET #max = 'B-20'
SELECT
...
WHERE NAME BETWEEN #min AND #max
You should avoid formatting [NAME] as others have suggested (using function on it) -- this way, your search can benefit from an index on it.
In any case -- you might re-consider your table structure. It sounds like 'TEST B-19' is a composite (non-normalized) value of category ('TEST') + sub-category ('B') + instance ('19'). Put it in a lookup table with 4 columns (id being the first), and then join it by id in whatever query needs to output the composite value. This will make searching and indexing much easier and faster.
In the absence of test data, I generated my own. I just removed the Test B- prefix, converted to int and did a Between
With Numerals As
(
Select top 100 row_number() over (order by name) TestNumeral
from sys.columns
),
TestNumbers AS
(
Select 'TEST B-' + Convert (VarChar, TestNumeral) TestNumber
From Numerals
)
Select *
From TestNumbers
Where Cast (Replace (TestNumber, 'TEST B-', '') as Integer) between 1 and 16
This gave me
TestNumber
-------------------------------------
TEST B-1
TEST B-2
TEST B-3
TEST B-4
TEST B-5
TEST B-6
TEST B-7
TEST B-8
TEST B-9
TEST B-10
TEST B-11
TEST B-12
TEST B-13
TEST B-14
TEST B-15
TEST B-16
This means, however, that if you have different strategies for naming tests, you would have to remove all different kinds of prefixes.
Now, on the other hand, if your Test numbers are in the TEST-Space-TestType-Hyphen-TestNumber format, you could use PatIndex and SubString
With Numerals As
(
Select top 100 row_number() over (order by name) TestNumeral
from sys.columns
),
TestNumbers AS
(
Select 'TEST B-' + Convert (VarChar, TestNumeral) TestNumber
From Numerals
Where TestNumeral Between 10 and 19
UNION
Select 'TEST A-' + Convert (VarChar, TestNumeral) TestNumber
From Numerals
Where TestNumeral Between 20 and 29
)
Select *
From TestNumbers
Where Cast (SubString (TestNumber, PATINDEX ('%-%', TestNumber)+1, Len (TestNumber) - PATINDEX ('%-%', TestNumber)) as Integer) between 16 and 26
That should yield the following
TestNumber
-------------------------------------
TEST A-20
TEST A-21
TEST A-22
TEST A-23
TEST A-24
TEST A-25
TEST A-26
TEST B-16
TEST B-17
TEST B-18
TEST B-19
All of your examples seem to have the test numbers at the end. So if you can create a table of patterns and then JOIN using a LIKE statement, you may be able make it work. Here is an example:
;
With TestNumbers As
(
select 'E-1' TestNumber
union select 'E-2'
union select 'E-3'
union select 'E-4'
union select 'E-5'
union select 'E-6'
union select 'E-7'
union select 'SOUTHERN B1'
union select 'SOUTHERN B2'
union select 'SOUTHERN B3'
union select 'SOUTHERN B4'
union select 'SOUTHERN B5'
union select 'SOUTHERN B6'
union select 'SOUTHERN B7'
union select 'Southern CC'
union select 'Southern DD'
union select 'Southern EE'
union select 'TEST B-1'
union select 'TEST B-2'
union select 'TEST B-3'
union select 'TEST B-4'
union select 'TEST B-5'
union select 'TEST B-6'
union select 'TEST B-7'
union select 'TEXAS 1'
union select 'TEXAS 2'
union select 'TEXAS 3'
union select 'TEXAS 4'
union select 'TEXAS 5'
union select 'TEXAS 6'
union select 'TEXAS 7'
union select 'THX 99-20-110-B1'
union select 'THX 99-20-110-B2'
union select 'THX 99-20-110-B3'
union select 'THX 99-20-110-B4'
union select 'THX 99-20-110-B5'
union select 'THX 99-20-110-B6'
union select 'THX 99-20-110-B7'
union select 'Southern AA'
union select 'Southern CC'
union select 'Southern DD'
union select 'Southern EE'
),
Prefixes as
(
Select 'TEXAS ' TestPrefix
Union Select 'THX 99-20-110-B'
Union Select 'E-'
Union Select 'SOUTHERN B'
Union Select 'TEST B-'
)
Select TN.TestNumber
From TestNumbers TN, Prefixes P
Where 1=1
And TN.TestNumber Like '%' + P.TestPrefix + '%'
And Cast (REPLACE (Tn.TestNumber, p.TestPrefix, '') AS INTEGER) between 4 and 6
This will give you
TestNumber
----------------
E-4
E-5
E-6
SOUTHERN B4
SOUTHERN B5
SOUTHERN B6
TEST B-4
TEST B-5
TEST B-6
TEXAS 4
TEXAS 5
TEXAS 6
THX 99-20-110-B4
THX 99-20-110-B5
THX 99-20-110-B6
(15 row(s) affected)
Is this acceptable:
WHERE [Name] IN ( 'TEST B-19', 'TEST B-20' )
The list of values can come from a subquery, e.g.:
WHERE [Name] IN ( SELECT [Name] FROM Elsewhere WHERE ... )