TSQL Fuzzy address matching grouping, 2019 Edition

TSQL Fuzzy address matching grouping, 2019 Edition - tsql

I have this situation where people asked to group on bad addresses. And I need to work on the tools/env I have, I don't have choice for Google API or 3rd party Data Science tools. I also did my HW, see posts several years old, so still want to check all if any updates available.
In my scenario people want to group IDs 1-6 into single, rest I added for neg test.
SELECT * INTO #t FROM ( --test data: select * from #t drop table #t
SELECT 1 Id, '1 CROLANA HEIGHTS' Adr UNION -- A vs O
SELECT 2 Id, '1 CROLONA HEIGHTS' Adr union
SELECT 3 Id, '1 CROLONA HEIGHT DRIVE' Adr union
SELECT 4 Id,'1 CROLONA HEIGHTS DR' Adr union
SELECT 5 Id, '1 CROLONA HGHTS DR' Adr union
SELECT 6 Id, '1 CROLONA HTS DR' Adr UNION
---------------------------------------- rest should not match
SELECT 7 Id, '1 CORWING DR' Adr UNION
SELECT 8 Id, '1 SUNNYHILL DRIVE' Adr UNION
SELECT 9 Id, '1 CROWN HILL DR' Adr UNION
SELECT 10 Id, '1 ADDISON DRv' Adr ) a
------------------- and below is my fuzzy working script which can be improved)
SELECT id, adr, LEAD(adr,1) OVER ( ORDER BY adr ) adr_lead,
SOUNDEX(adr) Sdx, DIFFERENCE(adr, LEAD(adr,1) OVER ( ORDER BY adr )) diff
--- SOUNDEX(adr), COUNT(*) c
FROM #t
--GROUP BY SOUNDEX(adr)
WHERE SOUNDEX(adr) = SOUNDEX('1 CROLANA HEIGHTS')

There is suggestions which I gladly take. I'm using intell replace at the end of string and standalone words to improve data.
DECLARE #st VARCHAR(100) = 'La_Beg_10 La_midleMacy La' --replace et the end of string
SELECT 'ryba', #st, '-->' f, CASE WHEN #st LIKE '%' + ' La'
THEN SUBSTRING(#st,1,LEN(#st) - LEN('La')) + 'Lane' ELSE #st END N

Related

How to collapse overlapping date periods with acceptable gaps using T-SQL?

We want to group our members' enrollments into "continuous enrollments," allowing for a gap of up to 45 days. I know how to use LEAD to determine if an enrollment should be grouped with the next, but I don't know how to group them. Would it be more appropriate to add 45 to the term date and subtract 45 from the effective date, then check for overlapping date periods? My goal is to have a SQL view that returns the results similar to the final query below. Thank you for your help.
SELECT '101' AS MemID, '2021-01-01' AS EffDate, '2021-01-31' AS TermDate INTO #T1 UNION
SELECT '101', '2021-02-01', '2021-02-28' UNION
SELECT '101', '2021-03-01', '2021-03-31' UNION
SELECT '101', '2021-06-01', '2021-06-30' UNION
SELECT '999', '2021-01-01', '2021-01-15' UNION
SELECT '999', '2021-09-01', '2021-09-28' UNION
SELECT '999', '2021-10-01', '2021-10-31'
SELECT *
, LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate) AS LeadEffDate
, DATEDIFF(DAY, TermDate, (LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate))) AS DaysToNextEnrollment
, CASE WHEN (DATEDIFF(DAY, TermDate, (LEAD(EffDate) OVER (PARTITION BY MemID ORDER BY EffDate)))) <= 45 THEN 1 ELSE 0 END AS CombineWithNextRecord
FROM #T1
-- result objective
SELECT 101 AS MemID, '2021-01-01' AS EffDate, '2021-03-31' AS TermDate UNION
SELECT 101, '2021-06-01', '2021-06-30' UNION
SELECT 999, '2021-01-01', '2021-01-15' UNION
SELECT 999, '2021-09-01', '2021-10-31'

I think you are really close. Your question is very similar to
TSQL - creating from-to date table while ignoring in-between steps with conditions with a logic difference on what you want to consider to be the same group.
My basic approach is to use the LAG() function to figure out the previous values for MemID and TermDate and combine that with your 45 day rule to define a group. And finally get the first and last values of each group.
Here is my response to that question modified to your situation.
SELECT
a4.MemID
, CONVERT (DATE, a4.First_EffDate) AS [EffDate]
, CONVERT (DATE, a4.TermDate) AS [TermDate]
FROM (
SELECT
a3.MemID
, a3.EffDate
, a3.TermDate
, a3.MemID_group
, FIRST_VALUE (a3.EffDate) OVER (PARTITION BY a3.MemID_group ORDER BY a3.EffDate) AS [First_EffDate]
, ROW_NUMBER () OVER (PARTITION BY a3.MemID_group ORDER BY a3.EffDate DESC) AS [Row_number]
FROM (
SELECT
a2.MemID
, a2.EffDate
, a2.TermDate
, a2.Previous_MemID
, a2.Previous_TermDate
, a2.New_group
, SUM (a2.New_group) OVER (ORDER BY a2.MemID, a2.EffDate) AS [MemID_group]
FROM (
SELECT
a1.MemID
, a1.EffDate
, a1.TermDate
, a1.Previous_MemID
, a1.Previous_TermDate
---------------------------------------------------------------------------------
-- new group if the MemID is different from the previous row OR
-- if the MemID is the same as the previous row AND it has been more than 45 days
-- between the TermDate of the previous row and the EffDate of the current row
,
IIF((a1.MemID <> a1.Previous_MemID)
OR (
a1.MemID = a1.Previous_MemID
AND DATEDIFF (DAY, a1.Previous_TermDate, a1.EffDate) > 45
)
, 1
, 0) AS [New_group]
---------------------------------------------------------------------------------
FROM (
SELECT
MemID
, EffDate
, TermDate
, LAG (MemID) OVER (ORDER BY MemID) AS [Previous_MemID]
, LAG (TermDate) OVER (PARTITION BY MemID ORDER BY EffDate) AS [Previous_TermDate]
FROM #T1
) a1
) a2
) a3
) a4
WHERE a4.[Row_number] = 1;
Here is the dbfiddle.

TSQL - in a string, replace a character with a fixed one every 2 characters

I can't replace every 2 characters of a string with a '.'
select STUFF('abcdefghi', 3, 1, '.') c3,STUFF('abcdefghi', 5, 1,
'.') c5,STUFF('abcdefghi', 7, 1, '.') c7,STUFF('abcdefghi', 9, 1, '.')
c9
if I use STUFF I should subsequently overlap the strings c3, c5, c7 and c9. but I can't find a method
can you help me?
initial string:
abcdefghi
the result I would like is
ab.de.gh.
the string can be up to 50 characters

Create a numbers / tally / digits table, if you don't have one already, then you can use this to target each character position:
with digits as ( /* This would be a real table, here it's just to test */
select n from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10))x(n)
), t as (
select 'abcdefghi' as s
)
select String_Agg( case when d.n%3 = 0 then '.' else Substring(t.s, d.n, 1) end, '')
from t
cross apply digits d
where d.n <Len(t.s)
Using for xml with existing table
with digits as (
select n from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10))x(n)
),
r as (
select t.id, case when d.n%3=0 then '.' else Substring(t.s, d.n, 1) end ch
from t
cross apply digits d
where d.n <Len(t.s)
)
select result=(select '' + ch
from r r2
where r2.id=r.id
for xml path('')
)
from r
group by r.id

You can try it like this:
Easiest might be a quirky update ike here:
DECLARE #string VARCHAR(100)='abcdefghijklmnopqrstuvwxyz';
SELECT #string = STUFF(#string,3*A.pos,1,'.')
FROM (SELECT TOP(LEN(#string)/3) ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM master..spt_values) A(pos);
SELECT #string;
Better/Cleaner/Prettier was a recursive CTE:
We use a declared table to have some tabular sample data
DECLARE #tbl TABLE(ID INT IDENTITY, SomeString VARCHAR(200));
INSERT INTO #tbl VALUES('')
,('a')
,('ab')
,('abc')
,('abcd')
,('abcde')
,('abcdefghijklmnopqrstuvwxyz');
--the query
WITH recCTE AS
(
SELECT ID
,SomeString
,(LEN(SomeString)+1)/3 AS CountDots
,1 AS OccuranceOfDot
,SUBSTRING(SomeString,4,LEN(SomeString)) AS RestString
,CAST(LEFT(SomeString,2) AS VARCHAR(MAX)) AS Growing
FROM #tbl
UNION ALL
SELECT t.ID
,r.SomeString
,r.CountDots
,r.OccuranceOfDot+2
,SUBSTRING(RestString,4,LEN(RestString))
,CONCAT(Growing,'.',LEFT(r.RestString,2))
FROM #tbl t
INNER JOIN recCTE r ON t.ID=r.ID
WHERE r.OccuranceOfDot/2<r.CountDots-1
)
SELECT TOP 1 WITH TIES ID,Growing
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY ID ORDER BY OccuranceOfDot DESC);
--the result
1
2 a
3 ab
4 ab
5 ab
6 ab.de
7 ab.de.gh.jk.mn.pq.st.vw.yz
The idea in short
We use a recursive CTE to walk along the string
we add the needed portion together with a dot
We stop, when the remaining length is to short to continue
a little magic is the ORDER BY ROW_NUMBER() OVER() together with TOP 1 WITH TIES. This will allow all first rows (frist per ID) to appear.

Checking Slowly Changing Dimension 2

I have a table that looks like this:
A slowly changing dimension type 2, according to Kimball.
Key is just a surrogate key, a key to make rows unique.
As you can see there are three rows for product A.
Timelines for this product are ok. During time the description of the product changes.
From 1-1-2020 up until 4-1-2020 the description of this product was ProdA1.
From 5-1-2020 up until 12-2-2020 the description of this product was ProdA2 etc.
If you look at product B, you see there are gaps in the timeline.
We use DB2 V12 z/Os. How can I check if there are gaps in the timelines for each and every product?
Tried this, but doesn't work
with selectie (key, tel) as
(select product, count(*)
from PROD_TAB
group by product
having count(*) > 1)
Select * from
PROD_TAB A
inner join selectie B
on A.product = B.product
Where not exists
(SELECT 1 from PROD_TAB C
WHERE A.product = C.product
AND A.END_DATE + 1 DAY = C.START_DATE
)
Does anyone know the answer?

The following query returns all gaps for all products.
The idea is to enumerate (RN column) all periods inside each product by START_DATE and join each record with its next period record.
WITH
/*
MYTAB (PRODUCT, DESCRIPTION, START_DATE, END_DATE) AS
(
SELECT 'A', 'ProdA1', DATE('2020-01-01'), DATE('2020-01-04') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'A', 'ProdA2', DATE('2020-01-05'), DATE('2020-02-12') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'A', 'ProdA3', DATE('2020-02-13'), DATE('2020-12-31') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB1', DATE('2020-01-05'), DATE('2020-01-09') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB2', DATE('2020-01-12'), DATE('2020-03-14') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB3', DATE('2020-03-15'), DATE('2020-04-18') FROM SYSIBM.SYSDUMMY1
UNION ALL SELECT 'B', 'ProdB4', DATE('2020-04-16'), DATE('2020-05-03') FROM SYSIBM.SYSDUMMY1
)
,
*/
MYTAB_ENUM AS
(
SELECT
T.*
, ROWNUMBER() OVER (PARTITION BY PRODUCT ORDER BY START_DATE) RN
FROM MYTAB T
)
SELECT A.PRODUCT, A.END_DATE + 1 START_DT, B.START_DATE - 1 END_DT
FROM MYTAB_ENUM A
JOIN MYTAB_ENUM B ON B.PRODUCT = A.PRODUCT AND B.RN = A.RN + 1
WHERE A.END_DATE + 1 <> B.START_DATE
AND A.END_DATE < B.START_DATE;
The result is:
|PRODUCT|START_DT |END_DT |
|-------|----------|----------|
|B |2020-01-10|2020-01-11|
May be more efficient way:
WITH MYTAB2 AS
(
SELECT
T.*
, LAG(END_DATE) OVER (PARTITION BY PRODUCT ORDER BY START_DATE) END_DATE_PREV
FROM MYTAB T
)
SELECT PRODUCT, END_DATE_PREV + 1 START_DATE, START_DATE - 1 END_DATE
FROM MYTAB2
WHERE END_DATE_PREV + 1 <> START_DATE
AND END_DATE_PREV < START_DATE;

Thnx Mark, will try this one of these days.
Never heard of LAG in DB2 V12 for z/Os
Will read about it
Thnx

Select Date and Count, Group By Date -- How to show Dates with NULL Counts?

SELECT
CAST(c.DT AS DATE) AS 'Date'
, COUNT(p.PatternID) AS 'Count'
FROM CalendarMain c
LEFT OUTER JOIN Pattern p
ON c.DT = p.PatternDate
INNER JOIN Result r
ON p.PatternID = r.PatternID
INNER JOIN Detail d
ON p.PatternID = d.PatternID
WHERE r.Type = 7
AND d.Panel = 501
AND CAST(c.DT AS DATE)
BETWEEN '20190101' AND '20190201'
GROUP BY CAST(c.DT AS DATE)
ORDER BY CAST(c.DT AS DATE)
The query above isn't working for me. It still skips days where the COUNT is NULL for it's c.DT.
c.DT and p.PatternDate are both time DateTime, although c.DT can't be NULL. It is actually the PK for the table. It is populated as DateTimes for every single day from 2015 to 2049, so the records for those days exist.
Another weird thing I noticed is that nothing returns at all when I join C.DT = p.PatternDate without a CAST or CONVERT to a Date style. Not sure why when they are both DateTimes.

There are a few things to talk about here. At this stage it's not clear what you're actually trying to count. If it's the number of "patterns" per day for the month of Jan 2019, then:
Your BETWEEN will also count any activity occurring on 1 Feb,
It looks like one pattern could have multiple results, potentially causing a miscount
It looks like one pattern could have multiple details, potentially causing a miscount
If one pattern has say 3 eligible results, and also 4 details, you'll get the cross product of them. Your count will be 12.
I'm going to assume:
you only want the distinct number of patterns, regardless of the number of details and results.
You only want January's activity
--Set up some dummy data
DROP TABLE IF EXISTS #CalendarMain
SELECT cast('20190101' as datetime) as DT
INTO #CalendarMain
UNION ALL SELECT '20190102' as DT
UNION ALL SELECT '20190103' as DT
UNION ALL SELECT '20190104' as DT
UNION ALL SELECT '20190105' as DT --etc etc
;
DROP TABLE IF EXISTS #Pattern
SELECT cast('1'as int) as PatternID
,cast('20190101 13:00' as datetime) as PatternDate
INTO #Pattern
UNION ALL SELECT 2,'20190101 14:00'
UNION ALL SELECT 3,'20190101 15:00'
UNION ALL SELECT 4,'20190104 11:00'
UNION ALL SELECT 5,'20190104 14:00'
;
DROP TABLE IF EXISTS #Result
SELECT cast(100 as int) as ResultID
,cast(1 as int) as PatternID
,cast(7 as int) as [Type]
INTO #Result
UNION ALL SELECT 101,1,7
UNION ALL SELECT 102,1,8
UNION ALL SELECT 103,1,9
UNION ALL SELECT 104,2,8
UNION ALL SELECT 105,2,7
UNION ALL SELECT 106,3,7
UNION ALL SELECT 107,3,8
UNION ALL SELECT 108,4,7
UNION ALL SELECT 109,5,7
UNION ALL SELECT 110,5,8
;
DROP TABLE IF EXISTS #Detail
SELECT cast(201 as int) as DetailID
,cast(1 as int) as PatternID
,cast(501 as int) as Panel
INTO #Detail
UNION ALL SELECT 202,1,502
UNION ALL SELECT 203,1,503
UNION ALL SELECT 204,1,502
UNION ALL SELECT 205,1,502
UNION ALL SELECT 206,1,502
UNION ALL SELECT 207,2,502
UNION ALL SELECT 208,2,503
UNION ALL SELECT 209,2,502
UNION ALL SELECT 210,4,502
UNION ALL SELECT 211,4,501
;
-- create some variables
DECLARE #start_date as date = '20190101';
DECLARE #end_date as date = '20190201'; --I assume this is an exclusive end date
SELECT cal.DT
,isnull(patterns.[count],0) as [Count]
FROM #CalendarMain cal
LEFT JOIN ( SELECT cast(p.PatternDate as date) as PatternDate
,COUNT(DISTINCT p.PatternID) as [Count]
FROM #Pattern p
JOIN #Result r ON p.PatternID = r.PatternID
JOIN #Detail d ON p.PatternID = d.PatternID
WHERE r.[Type] = 7
and d.Panel = 501
GROUP BY cast(p.PatternDate as date)
) patterns ON cal.DT = patterns.patternDate
WHERE cal.DT >= #start_date
and cal.DT < #end_date --Your code would have included 1 Feb, which I assume was unintentional.
ORDER BY cal.DT
;

t-sql "LIKE" and Pattern Matching

I've found a small annoyance that I was wondering how to get around...
In a simplified example, say I need to return "TEST B-19" and "TEST B-20"
I have a where clause that looks like:
where [Name] LIKE 'TEST B-[12][90]'
and it works... unless there's a "TEST B-10" or "TEST-B29" value that I don't want.
I'd rather not resort to doing both cases, because in more complex situations that would become prohibitive.
I tried:
where [Name] LIKE 'TEST B-[19-20]'
but of course that doesn't work because it is looking for single characters...
Thoughts? Again, this is a very simple example, I'd be looking for ways to grab ranges from 16 to 32 or 234 to 459 without grabbing all the extra values that could be created.
EDITED to include test examples...
You might see "TEXAS 22" or "THX 99-20-110-B6" or "E-19" or "SOUTHERN B" or "122 FLOWERS" in that field. The presense of digits is common, but not a steadfast rule, and there are absolutely no general patterns for hypens, digits, characters, order, etc.

I would divide the Name column into the text parts and the number parts, and convert the number parts into an integer, and then check if that one was between the values. Something like:
where cast(substring([Name], 7, 2) as integer) between 19 and 20
And, of course, if the possible structure of [Name] is much more complex, you'd have to calculate the values for 7 and 2, not hardcode them....
EDIT: If you want to filter out the ones not conforming to the pattern first, do the following:
where [Name] LIKE '%TEST B-__%'
and cast(substring([Name], CHARINDEX('TEST B-', [Name]) + LEN('TEST B-'), 2) as integer) between 19 and 20
Maybe it's faster using CHARINDEX in place of the LIKE in the topmost line two, especially if you put an index on the computed value, but... That is only optimization... :)
EDIT: Tested the procedure. Given the following data:
jajajajajajajTEST B-100
jajajajajajajTEST B-85
jajajajjTEST B-100
jajjajajTEST B-100
jajajajajajajTEST B-00
jajajajaTEST B-100
jajajajajajajEST B-99
jajajajajajajTEST B-100
jajajajajajajTEST B-19
jajajajjTEST B-100
jajjajajTEST B-120
jajajajajajajTEST B-00
jajajajaTEST B-150
jajajajajajajEST B-20
TEST B-20asdfh asdfkh
The query returns the following rows:
jajajajajajajTEST B-19
TEST B-20asdfh asdfkh

Wildcards or no, you still have to edit the query every time you want to change the range definition. If you're always dealing with a range (and it's not always the same range), you might use parameters. For example:
note: for some reason (this has happened in many other posts as well), when I try to post code beginning with 'declare', SO hangs and times-out. I reported it on meta already, but nobody could reproduce it (including me). Here it's happening again, so I took the 'D' off, and now it works. I'll come back tomorrow, and it will let me put the 'D' back on.
DECLARE #min varchar(5)
DECLARE #max varchar(5)
SET #min = 'B-19'
SET #max = 'B-20'
SELECT
...
WHERE NAME BETWEEN #min AND #max
You should avoid formatting [NAME] as others have suggested (using function on it) -- this way, your search can benefit from an index on it.
In any case -- you might re-consider your table structure. It sounds like 'TEST B-19' is a composite (non-normalized) value of category ('TEST') + sub-category ('B') + instance ('19'). Put it in a lookup table with 4 columns (id being the first), and then join it by id in whatever query needs to output the composite value. This will make searching and indexing much easier and faster.

In the absence of test data, I generated my own. I just removed the Test B- prefix, converted to int and did a Between
With Numerals As
(
Select top 100 row_number() over (order by name) TestNumeral
from sys.columns
),
TestNumbers AS
(
Select 'TEST B-' + Convert (VarChar, TestNumeral) TestNumber
From Numerals
)
Select *
From TestNumbers
Where Cast (Replace (TestNumber, 'TEST B-', '') as Integer) between 1 and 16
This gave me
TestNumber
-------------------------------------
TEST B-1
TEST B-2
TEST B-3
TEST B-4
TEST B-5
TEST B-6
TEST B-7
TEST B-8
TEST B-9
TEST B-10
TEST B-11
TEST B-12
TEST B-13
TEST B-14
TEST B-15
TEST B-16
This means, however, that if you have different strategies for naming tests, you would have to remove all different kinds of prefixes.
Now, on the other hand, if your Test numbers are in the TEST-Space-TestType-Hyphen-TestNumber format, you could use PatIndex and SubString
With Numerals As
(
Select top 100 row_number() over (order by name) TestNumeral
from sys.columns
),
TestNumbers AS
(
Select 'TEST B-' + Convert (VarChar, TestNumeral) TestNumber
From Numerals
Where TestNumeral Between 10 and 19
UNION
Select 'TEST A-' + Convert (VarChar, TestNumeral) TestNumber
From Numerals
Where TestNumeral Between 20 and 29
)
Select *
From TestNumbers
Where Cast (SubString (TestNumber, PATINDEX ('%-%', TestNumber)+1, Len (TestNumber) - PATINDEX ('%-%', TestNumber)) as Integer) between 16 and 26
That should yield the following
TestNumber
-------------------------------------
TEST A-20
TEST A-21
TEST A-22
TEST A-23
TEST A-24
TEST A-25
TEST A-26
TEST B-16
TEST B-17
TEST B-18
TEST B-19
All of your examples seem to have the test numbers at the end. So if you can create a table of patterns and then JOIN using a LIKE statement, you may be able make it work. Here is an example:
;
With TestNumbers As
(
select 'E-1' TestNumber
union select 'E-2'
union select 'E-3'
union select 'E-4'
union select 'E-5'
union select 'E-6'
union select 'E-7'
union select 'SOUTHERN B1'
union select 'SOUTHERN B2'
union select 'SOUTHERN B3'
union select 'SOUTHERN B4'
union select 'SOUTHERN B5'
union select 'SOUTHERN B6'
union select 'SOUTHERN B7'
union select 'Southern CC'
union select 'Southern DD'
union select 'Southern EE'
union select 'TEST B-1'
union select 'TEST B-2'
union select 'TEST B-3'
union select 'TEST B-4'
union select 'TEST B-5'
union select 'TEST B-6'
union select 'TEST B-7'
union select 'TEXAS 1'
union select 'TEXAS 2'
union select 'TEXAS 3'
union select 'TEXAS 4'
union select 'TEXAS 5'
union select 'TEXAS 6'
union select 'TEXAS 7'
union select 'THX 99-20-110-B1'
union select 'THX 99-20-110-B2'
union select 'THX 99-20-110-B3'
union select 'THX 99-20-110-B4'
union select 'THX 99-20-110-B5'
union select 'THX 99-20-110-B6'
union select 'THX 99-20-110-B7'
union select 'Southern AA'
union select 'Southern CC'
union select 'Southern DD'
union select 'Southern EE'
),
Prefixes as
(
Select 'TEXAS ' TestPrefix
Union Select 'THX 99-20-110-B'
Union Select 'E-'
Union Select 'SOUTHERN B'
Union Select 'TEST B-'
)
Select TN.TestNumber
From TestNumbers TN, Prefixes P
Where 1=1
And TN.TestNumber Like '%' + P.TestPrefix + '%'
And Cast (REPLACE (Tn.TestNumber, p.TestPrefix, '') AS INTEGER) between 4 and 6
This will give you
TestNumber
----------------
E-4
E-5
E-6
SOUTHERN B4
SOUTHERN B5
SOUTHERN B6
TEST B-4
TEST B-5
TEST B-6
TEXAS 4
TEXAS 5
TEXAS 6
THX 99-20-110-B4
THX 99-20-110-B5
THX 99-20-110-B6
(15 row(s) affected)

Is this acceptable:
WHERE [Name] IN ( 'TEST B-19', 'TEST B-20' )
The list of values can come from a subquery, e.g.:
WHERE [Name] IN ( SELECT [Name] FROM Elsewhere WHERE ... )