Oracle string replacement using lookup tables - oracle10g

Can someone please help me with this issue?
I have my client addresses stored in a table (table CT) as strings that can contain abbreviations, for example 123 RD. I have another lookup table (table LK) that contains the full street words for the abbreviations, e.g. RD = ROAD.
I would like to create a new table which has the complete address for this string.
Input:
table CT:
Add1     Add2   Add3
-------  -----  ------
123 RD   APT 2  BLDG 1
test DR  null   null
main RD  null   BLDG2

table LK:
abbreviation  completestreet
------------  --------------
RD            road
APT           apartment
BLDG          building
DR            drive

I would like to join these two tables to achieve the following:
Add1        Add2         Add3
----------  -----------  ----------
123 ROAD    APARTMENT 2  BUILDING 1
test DRIVE  null         null
main ROAD   null         BUILDING 2

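You probably want a solution that uses regexp_replace. Oracle's regular expressions do not support the \b word-boundary escape, so word boundaries are emulated with the groups (^| ) and ( |$): start of string or a space before the abbreviation, and a space or end of string after it. The recursive WITH clause below replaces one abbreviation per recursion level. Note that recursive WITH requires Oracle 11g Release 2 or later, so it will not run on 10g: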
with ct as (
select '123 RD' add1, 'APT 2' add2, 'BLDG 1' add3 from dual union all
select '123 3RD street' add1, 'APT 2' add2, 'BLDG 1' add3 from dual union all
select 'test DR' add1, '' add2, '' add3 from dual union all
select 'main RD' add1, '' add2, 'BLDG2' add3 from dual
),
lk as (
select 'RD' abbreviation, 'road' completestreet from dual union all
select 'APT' abbreviation, 'apartment' completestreet from dual union all
select 'BLDG' abbreviation, 'building' completestreet from dual union all
select 'DR' abbreviation, 'drive' completestreet from dual
),
recursion (add1, add2, add3, l) as (
select add1, add2, add3, 1 l from ct union all
select regexp_replace(add1, '(^| )' ||abbreviation || '( |$)', '\1' || upper(completestreet) || '\2'),
regexp_replace(add2, '(^| )' ||abbreviation || '( |$)', '\1' || upper(completestreet) || '\2'),
regexp_replace(add3, '(^| )' ||abbreviation || '( |$)', '\1' || upper(completestreet) || '\2'),
l+1
from recursion join (
select
row_number() over (order by abbreviation) r,
abbreviation,
completestreet
from lk
) lk
on l=r
)
select substrb(add1, 1, 30) as add1,
substrb(add2, 1, 30) as add2,
substrb(add3, 1, 30) as add3
from recursion
where l=(select count(*)+1 from lk);
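The recursion above applies one LK row per level: at level n it rewrites every whole-word occurrence of the n-th abbreviation. The same idea is sketched below in Python, purely to illustrate the logic outside the database. LOOKUP and expand are hypothetical names, and upper-casing the replacement mirrors the desired output rather than anything stored in LK.

```python
import re

# Contents of table LK from the question (abbreviation -> completestreet).
LOOKUP = {"RD": "road", "APT": "apartment", "BLDG": "building", "DR": "drive"}


def expand(address, lookup=LOOKUP):
    """Expand whole-word abbreviations one lookup row at a time,
    like one level of the recursive WITH clause per LK row."""
    if address is None:  # Oracle stores '' as NULL; pass NULLs through unchanged
        return None
    # Same order as ROW_NUMBER() OVER (ORDER BY abbreviation).
    for abbreviation in sorted(lookup):
        full = lookup[abbreviation].upper()
        # (^| ) and ( |$) stand in for word boundaries, as in the Oracle pattern.
        pattern = r"(^| )" + re.escape(abbreviation) + r"( |$)"
        address = re.sub(pattern, lambda m, full=full: m.group(1) + full + m.group(2), address)
    return address


rows = [("123 RD", "APT 2", "BLDG 1"), ("test DR", None, None), ("main RD", None, "BLDG2")]
for row in rows:
    print(tuple(expand(col) for col in row))
```

Note that BLDG2 is left alone because the abbreviation is not followed by a space or the end of the string; the Oracle pattern behaves the same way, so producing "BUILDING 2" from it would need a different pattern.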

Related

How to use REGEXP_SUBSTR for searches between two tables with Snowflake?

I am currently using the following code to do an intelligent text search between two tables; the tables have no relationship that would let me join them.
The goal of my SQL is to be able to find the text of table_A that is in table_b, regardless of whether it is accompanied by special characters.
With the following code, which uses REGEXP_SUBSTR, I have two problems:
1.) Performance degrades drastically as the number of records to match grows. Is there a better way?
2.) When the text contains a special character, for example '.', it doesn't work.
Thank you
--Create test tables
CREATE OR REPLACE TEMPORARY TABLE TABLE_A
AS
SELECT 'heLLO' AS CHAINE
,'ENGLISH' AS TYPE
UNION
SELECT 'HI' AS CHAINE
,'ENGLISH' AS TYPE
UNION
SELECT 'bONJOUR' AS CHAINE
,'FRENCH' AS TYPE
UNION
SELECT 'hOLa' AS CHAINE
,'SPANISH' AS TYPE
;
CREATE OR REPLACE TEMPORARY TABLE TABLE_B
AS
SELECT 'HELLO *' AS CHAINE
UNION
SELECT 'HI.' AS CHAINE
UNION
SELECT 'BONJOUR -' AS CHAINE
UNION
SELECT 'hOLa' AS CHAINE
;
Here is the query that matches the two tables:
SELECT TABLE_A.* ,TABLE_B.*
FROM TABLE_A
INNER JOIN TABLE_B ON
(
TABLE_A.TYPE ='ENGLISH'
AND REGEXP_SUBSTR (TABLE_A.CHAINE
,'.*\\b' || REPLACE(TABLE_B.CHAINE,'.','.\\') || '\\b.*'
,1
,1
,'i') IS NOT NULL
)
The current result is partly right: 'heLLO' is found regardless of case, but 'HI' is not found because the TABLE_B value has a trailing dot.
It's slow because you are breaking the golden rule "never use functions in your WHERE clause"; an ON clause is just a special WHERE, so the rule still holds. If you want good performance in any database, you want equi-joins (a = b).
If we look at what this toy example is doing, you can see your join is really a CROSS JOIN with some expensive WHERE clauses:
with TABLE_A(chaine, type) as (
select * from values
('heLLO','ENGLISH'),
('HI','ENGLISH'),
('bONJOUR','FRENCH'),
('hOLa','SPANISH')
), TABLE_B(CHAINE) as (
select * from values
('HELLO *'),
('HI.'),
('BONJOUR -'),
('hOLa')
), pre_tb as (
select
*
,REPLACE(CHAINE,'.','') as rep
,'.*\\b' || rep || '\\b.*' as pata
,'\\b' || rep || '\\b' as patb
from table_b
)
SELECT ta.*
,tb.*
,REGEXP_SUBSTr (ta.CHAINE , tb.pata ,1 ,1 ,'i') IS NOT NULL as reg_a
,REGEXP_SUBSTr (ta.CHAINE , tb.patb ,1 ,1 ,'i') IS NOT NULL as reg_b
FROM TABLE_A as ta
JOIN pre_tb as tb
ON ta.TYPE ='ENGLISH'
--AND REGEXP_SUBSTR (ta.CHAINE , tb.patb ,1 ,1 ,'i') IS NOT NULL
I applied the suggested change to the REPLACE (removing the '.' instead of trying to escape it), and the query above gives:
CHAINE  TYPE     CHAINE_2   REP        PATA               PATB           REG_A  REG_B
------  -------  ---------  ---------  -----------------  -------------  -----  -----
heLLO   ENGLISH  HELLO *    HELLO *    .*\bHELLO *\b.*    \bHELLO *\b    TRUE   TRUE
heLLO   ENGLISH  HI.        HI         .*\bHI\b.*         \bHI\b         FALSE  FALSE
heLLO   ENGLISH  BONJOUR -  BONJOUR -  .*\bBONJOUR -\b.*  \bBONJOUR -\b  FALSE  FALSE
heLLO   ENGLISH  hOLa       hOLa       .*\bhOLa\b.*       \bhOLa\b       FALSE  FALSE
HI      ENGLISH  HELLO *    HELLO *    .*\bHELLO *\b.*    \bHELLO *\b    FALSE  FALSE
HI      ENGLISH  HI.        HI         .*\bHI\b.*         \bHI\b         TRUE   TRUE
HI      ENGLISH  BONJOUR -  BONJOUR -  .*\bBONJOUR -\b.*  \bBONJOUR -\b  FALSE  FALSE
HI      ENGLISH  hOLa       hOLa       .*\bhOLa\b.*       \bhOLa\b       FALSE  FALSE
So what this shows is that you don't need the wildcard .* at the start and end of the REGEXP_SUBSTR pattern, as REG_A and REG_B give the same results. It also shows the start of pre-processing your data.
Also of note: HELLO * only matches because * is valid regex syntax (' *' means "zero or more spaces"), which makes this code rather dangerous. Such characters should really be stripped off, or correctly escaped.
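To see why splicing raw data into a pattern is risky, here is a small Python sketch. Python's re module stands in for Snowflake's regex engine; for these simple patterns, * and . carry the same meaning in both, and \b is a word boundary. naive_pattern, escaped_pattern and found are illustrative helpers, not part of the original post.

```python
import re


def naive_pattern(text):
    # What the original query does: put the raw value between \b anchors.
    return r"\b" + text + r"\b"


def escaped_pattern(text):
    # re.escape turns every regex metacharacter into a literal character.
    return r"\b" + re.escape(text) + r"\b"


def found(pattern, haystack):
    return re.search(pattern, haystack, re.IGNORECASE) is not None


# ' *' in 'HELLO *' means "zero or more spaces", so this only matches by accident.
print(found(naive_pattern("HELLO *"), "heLLO"))    # True
print(found(escaped_pattern("HELLO *"), "heLLO"))  # False: needs a literal ' *'
# An unescaped '.' matches any character, so 'HI.' also hits 'HIT'.
print(found(naive_pattern("HI."), "HIT"))          # True
print(found(escaped_pattern("HI."), "HIT"))        # False
```

Stripping the punctuation before building any pattern, as the pre-processing steps here do, sidesteps the problem entirely.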
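The REGEXP_SUBSTR can be replaced with REGEXP_LIKE, because you are only asking whether there is a match, not for the matched text; don't ask for more than you need. Note that Snowflake's REGEXP_LIKE implicitly anchors the pattern at both ends, so the pattern has to match the whole string.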
Thus I would be inclined to use code like this:
with TABLE_A(chaine, type) as (
select * from values
('heLLO','ENGLISH'),
('HI','ENGLISH'),
('bONJOUR','FRENCH'),
('hOLa','SPANISH')
), TABLE_B(CHAINE) as (
select * from values
('HELLO *'),
('HI.'),
('BONJOUR -'),
('hOLa')
), pre_ta as (
select chaine, lower(chaine) as l_chaine, type
from table_a
), pre_tb as (
select
*
,regexp_substr(lower(CHAINE), '[a-z]+') as rep
,'\\b' || rep || '\\b' as pat
from table_b
)
SELECT ta.*
,tb.*
,REGEXP_LIKE (ta.l_chaine , tb.pat) as reg
FROM pre_ta as ta
JOIN pre_tb as tb
ON ta.TYPE ='ENGLISH'
And if you just need the first word of each string to match:
with TABLE_A(chaine, type) as (
select * from values
('heLLO','ENGLISH'),
('HI','ENGLISH'),
('bONJOUR','FRENCH'),
('hOLa','SPANISH')
), TABLE_B(CHAINE) as (
select * from values
('HELLO *'),
('HI.'),
('BONJOUR -'),
('hOLa')
), pre_ta as (
select chaine, type,
regexp_substr(lower(chaine), '[a-z]+') as rep
from table_a
), pre_tb as (
select
*
,regexp_substr(lower(chaine), '[a-z]+') as rep
from table_b
)
SELECT ta.*
,tb.*
,ta.rep = tb.rep
FROM pre_ta as ta
JOIN pre_tb as tb
ON ta.TYPE ='ENGLISH'
And if multiple words per string need to match, I would use SPLIT_TO_TABLE:
with TABLE_A(chaine, type) as (
select * from values
('heLLO','ENGLISH'),
('bob heLLO','ENGLISH'),
('HI','ENGLISH'),
('bONJOUR','FRENCH'),
('hOLa','SPANISH')
), TABLE_B(CHAINE) as (
select * from values
('HELLO *'),
('HI. cat*'),
('BONJOUR -'),
('hOLa')
), pre_ta as (
select t.chaine, t.type,
regexp_substr(lower(trim(s.value)), '[a-z]+') as rep
from table_a as t, table(split_to_table(chaine, ' ')) s
where rep <> ''
), pre_tb as (
select
*
,regexp_substr(lower(trim(s.value)), '[a-z]+') as rep
from table_b as t, table(split_to_table(chaine, ' ')) s
where rep <> ''
)
SELECT ta.*
,tb.*
,ta.rep = tb.rep
FROM pre_ta as ta
JOIN pre_tb as tb
ON ta.TYPE ='ENGLISH'
That comparison is now an equi-join, so it can go back into the ON clause:
with TABLE_A(chaine, type) as (
select * from values
('heLLO','ENGLISH'),
('bob heLLO','ENGLISH'),
('HI','ENGLISH'),
('bONJOUR','FRENCH'),
('hOLa','SPANISH')
), TABLE_B(CHAINE) as (
select * from values
('HELLO *'),
('HI. cat*'),
('BONJOUR -'),
('hOLa')
), pre_ta as (
select t.chaine, t.type,
regexp_substr(lower(trim(s.value)), '[a-z]+') as rep
from table_a as t, table(split_to_table(chaine, ' ')) s
where rep <> ''
), pre_tb as (
select
*
,regexp_substr(lower(trim(s.value)), '[a-z]+') as rep
from table_b as t, table(split_to_table(chaine, ' ')) s
where rep <> ''
)
SELECT ta.chaine,
ta.type,
tb.chaine
FROM pre_ta as ta
JOIN pre_tb as tb
ON ta.TYPE ='ENGLISH'
AND ta.rep = tb.rep
gives:
CHAINE     TYPE     CHAINE_2
---------  -------  --------
heLLO      ENGLISH  HELLO *
bob heLLO  ENGLISH  HELLO *
HI         ENGLISH  HI. cat*
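The equi-join version scales because the regex work happens once per row during pre-processing; after that, matching is a plain equality lookup that the database can typically run as a hash join instead of evaluating a regex for every pair of rows. A rough Python analogue of that plan on the same toy data (tokens and match_tokens are hypothetical names; the tokenizer mimics split_to_table plus regexp_substr(lower(trim(...)), '[a-z]+')):

```python
import re
from collections import defaultdict

TABLE_A = [("heLLO", "ENGLISH"), ("bob heLLO", "ENGLISH"), ("HI", "ENGLISH"),
           ("bONJOUR", "FRENCH"), ("hOLa", "SPANISH")]
TABLE_B = ["HELLO *", "HI. cat*", "BONJOUR -", "hOLa"]


def tokens(text):
    """First run of lowercase letters in each space-separated piece."""
    found = []
    for piece in text.split(" "):
        m = re.search(r"[a-z]+", piece.strip().lower())
        if m:  # the SQL drops rows where rep is '' or NULL
            found.append(m.group())
    return found


def match_tokens(table_a, table_b):
    # Build side: token -> TABLE_B strings, computed once.
    index = defaultdict(list)
    for chaine_b in table_b:
        for tok in tokens(chaine_b):
            index[tok].append(chaine_b)
    # Probe side: one dictionary lookup per token instead of one regex per pair.
    result = []
    for chaine_a, type_a in table_a:
        if type_a != "ENGLISH":
            continue
        for tok in tokens(chaine_a):
            for chaine_b in index.get(tok, []):
                result.append((chaine_a, type_a, chaine_b))
    return result


for row in match_tokens(TABLE_A, TABLE_B):
    print(row)
```

It yields the same three rows as the final query.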

RedShift: troubles with regexp_substr

I have this JSON in Redshift: {"skippable": true, "unit": true}
I want to get only the words between double quotes that are JSON keys, for example "skippable", "unit", etc.
I use this query:
SELECT regexp_substr(REPLACE(REPLACE(attributes, '{', ''), '}', '')::VARCHAR, '\S+:') AS regexp, JSON_PARSE(attributes) AS attributes_super
FROM source.table
WHERE prompttype != 'input';
But the "regexp" column comes back empty.
The solution is:
-- numbers table: 1..n, one row per comma-separated piece (n is capped at 30 by the LIMIT)
SELECT
n::int
INTO TEMP numbers
FROM
(SELECT
row_number() over (order by true) as n
FROM table limit 30) AS seq
CROSS JOIN
(SELECT
max(regexp_count(attributes, '[,]')) as max_num
FROM table limit 30) AS mx
WHERE
n <= max_num + 1;
WITH all_values AS (
SELECT c.id, c.attributes, c.attributes_super.prompt, c.attributes_super.description,
c.attributes_super.topic, c.attributes_super.context,
c.attributes_super.use_case, c.attributes_super.subtitle, c.attributes_super.txValues, c.attributes_super.flashmode,
c.attributes_super.skippable, c.attributes_super.videoMaxDuration, c.attributes_super.defaultCameraFacing, c.attributes_super.locationRequired
FROM (
SELECT *, JSON_PARSE(attributes) AS attributes_super
FROM table
WHERE prompttype != 'input'
) AS c
ORDER BY created DESC
limit 1
), list_of_attr AS (
SELECT *, regexp_substr(split_part(attributes,',',n), '\"[0-9a-zA-Z]+\"') as others_attrs
FROM
all_values
CROSS JOIN
numbers
WHERE
split_part(attributes,',',n) is not null
AND split_part(attributes,',',n) != ''
), combine_attrs AS (
SELECT id, attributes, prompt, description,
topic, context, use_case, subtitle, txvalues, flashmode,
skippable, videomaxduration, defaultcamerafacing, locationrequired, LISTAGG(others_attrs, ',') AS others_attrs
FROM list_of_attr
GROUP BY id, attributes, prompt, description, topic,
context, use_case, subtitle, txvalues, flashmode,
skippable, videomaxduration, defaultcamerafacing, locationrequired)
SELECT * FROM combine_attrs;
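The solution relies on two steps: split the attribute string on commas (split_part joined to the numbers table), then take the first double-quoted token of each piece with regexp_substr. A Python sketch of that same logic, handy for checking the pattern outside Redshift (json_keys is a hypothetical name; the pattern is the one from the query). Keep in mind that splitting on commas assumes no key or value contains a comma in a way that shifts the pieces.

```python
import json
import re

# Same pattern as the Redshift query: a double-quoted run of letters/digits.
KEY_PATTERN = re.compile(r'"[0-9a-zA-Z]+"')


def json_keys(attributes):
    """Mirror split_part(attributes, ',', n) + regexp_substr(..., '"[0-9a-zA-Z]+"')."""
    keys = []
    for piece in attributes.split(","):
        if piece == "":  # the query skips empty split_part results
            continue
        match = KEY_PATTERN.search(piece)
        if match:
            keys.append(match.group())
    return keys


doc = '{"skippable": true, "unit": true}'
print(json_keys(doc))  # ['"skippable"', '"unit"']
# Cross-check against a real JSON parser.
print(['"%s"' % key for key in json.loads(doc)])
```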

TSQL how to produce UNION result without actual UNION command

Can I produce results like in my example below without an actual UNION command? In my real scenario I have 1000 cat(egories), and I would like to save typing and learn a smarter way to do it without hard-coding the WHERE clauses. I'd appreciate your hints; I'm not sure whether PIVOT would work here. Thanks, M
My setup: SQL Server 2017 (RTM-CU22)
My reproducible test data, and the sample code I'd like to modify:
/*
SELECT * INTO #t FROM (
SELECT 'A ' Cat, 101 Score UNION ALL
SELECT 'A ' Cat, 102 Score UNION ALL
SELECT 'A ' Cat, 103 Score UNION ALL
SELECT 'BB' Cat, 2001 Score UNION ALL
SELECT 'BB' Cat, 2002 Score UNION ALL
SELECT 'CCC' Cat, 3333 Score
) b --- select * from #t
*/
-- this is desired output made with UNION.
SELECT 'A ' Cat, COUNT(1) CCount FROM #t WHERE Cat = 'A' UNION
SELECT 'BB ' Cat, COUNT(1) CCount FROM #t WHERE Cat = 'BB' UNION
SELECT 'CCC' Cat, COUNT(1) CCount FROM #t WHERE Cat NOT IN ('A','BB')
and this is my desired output:
If all you are looking for is a count of each Cat you can do the following:
SELECT Cat, COUNT(*) CCount FROM [#t]
GROUP BY [Cat]
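The GROUP BY answer works for any number of categories. If you also need the question's catch-all bucket (the third UNION branch counts everything that is not A or BB as CCC), a CASE expression in the GROUP BY does it in one pass; SQL Server does not accept a SELECT alias in GROUP BY, so the expression is repeated. Below is a runnable sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. SQLite does not ignore trailing spaces in comparisons the way SQL Server does, so the sample uses 'A' instead of 'A ', and the 'DD' row is a hypothetical extra category added only to show the bucketing.

```python
import sqlite3

# In-memory stand-in for #t; ('DD', 4444) is hypothetical, not from the question.
rows = [("A", 101), ("A", 102), ("A", 103), ("BB", 2001), ("BB", 2002),
        ("CCC", 3333), ("DD", 4444)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Cat TEXT, Score INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# One row per category, replacing one UNION branch per category.
per_cat = conn.execute(
    "SELECT Cat, COUNT(*) AS CCount FROM t GROUP BY Cat ORDER BY Cat"
).fetchall()

# Catch-all bucket: everything except A and BB is counted as 'CCC'.
bucket = "CASE WHEN Cat IN ('A', 'BB') THEN Cat ELSE 'CCC' END"
bucketed = conn.execute(
    f"SELECT {bucket} AS Cat, COUNT(*) AS CCount FROM t GROUP BY {bucket} ORDER BY 1"
).fetchall()

print(per_cat)   # [('A', 3), ('BB', 2), ('CCC', 1), ('DD', 1)]
print(bucketed)  # [('A', 3), ('BB', 2), ('CCC', 2)]
```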

T-SQL split string by - and space

I'm having a difficult time with T-SQL and was wondering if somebody could point me in the right direction.
I have the following variable called @input:
DECLARE @input nvarchar(100);
SET @input= '27364 - John Smith';
-- SET @input= '27364 - John Andrew Smith';
I need to split this string into 3 parts (ID, FirstName and LastName), or 4 if the string contains a MiddleName. For security reasons I cannot use functions.
My approach was to use SUBSTRING and CHARINDEX:
SET @Id = SUBSTRING(@input, 1, CASE CHARINDEX('-', @input)
WHEN 0
THEN LEN(@input)
ELSE
CHARINDEX('-', @input) - 2
END);
SET @FirstName = SUBSTRING(@input, CASE CHARINDEX(' ', @input)
WHEN 0
THEN LEN(@input) + 1
ELSE
CHARINDEX(' ', @input) + 1
END, 1000);
SET @LastName = SUBSTRING(@input, CASE CHARINDEX(' ', @input)
WHEN 0
THEN LEN(@input) + 1
ELSE
CHARINDEX('0', @input) + 1
END, 1000);
Select @PartyCode,@FirstName,@LastName
I am stuck because I don't know how to proceed, and the code also has to be smart enough to add a fourth part if a middle name exists.
Any thoughts?
Thanks in advance
Hopefully this is part of a normalization project. This data is breaking 1NF and one really should avoid that...
Try it like this. The advantages:
- type-safe values
- ad-hoc SQL
- set-based
If you want, you can use a CASE WHEN to check whether the last part is NULL and, in that case, move Part2 into Part3.
DECLARE @input table(teststring nvarchar(100));
INSERT INTO @input VALUES
(N'27364 - John Smith'),(N'27364 - John Andrew Smith');
WITH Splitted AS
(
SELECT CAST(N'<x>' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(teststring,N' - ',N' '),N'&',N'&amp;'),N'<',N'&lt;'),N'>',N'&gt;'),N' ',N'</x><x>') + N'</x>' AS XML) testXML
FROM @input
)
SELECT testXML.value('/x[1]','int') AS Number
,testXML.value('/x[2]','nvarchar(max)') AS Part1
,testXML.value('/x[3]','nvarchar(max)') AS Part2
,testXML.value('/x[4]','nvarchar(max)') AS Part3
FROM Splitted
The result
Number  Part1  Part2   Part3
------  -----  ------  -----
27364   John   Smith   NULL
27364   John   Andrew  Smith
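The CASE WHEN suggestion boils down to a simple rule: the first word is the first name, the last word is always the last name, and anything in between is a middle name. A small Python sketch of that rule, useful for checking expected results (split_name is a hypothetical helper, and it assumes ' - ' always separates the ID from the name, as in the sample rows):

```python
def split_name(text):
    """Split 'ID - First [Middle ...] Last' into (id, first, middle, last)."""
    id_part, _, name_part = text.partition(" - ")
    words = name_part.split()
    first = words[0] if words else None
    last = words[-1] if len(words) > 1 else None  # last word always goes to LastName
    middle = " ".join(words[1:-1]) or None         # whatever sits in between, if any
    return int(id_part), first, middle, last


print(split_name("27364 - John Smith"))         # (27364, 'John', None, 'Smith')
print(split_name("27364 - John Andrew Smith"))  # (27364, 'John', 'Andrew', 'Smith')
```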
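SQL Server 2016 introduced a built-in function called STRING_SPLIT(). Be aware that STRING_SPLIT does not guarantee that its rows come back in the order of the substrings in the input, so ordinals generated with ROW_NUMBER() ... ORDER BY (SELECT 1) are not guaranteed either (SQL Server 2022 adds an enable_ordinal argument for this).
Assuming you are allowed to create T-SQL functions, but not CLR functions: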
CREATE FUNCTION dbo.WORD_SPLIT
(
@String AS nvarchar(4000)
)
RETURNS TABLE
AS
RETURN
(
WITH Spaces AS
(
SELECT Spaced.[value], ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY (SELECT 1)) AS ordinal
FROM STRING_SPLIT(@String, ' ') AS Spaced
)
, Tabs AS
(
SELECT Tabbed.[value], ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY s.ordinal, (SELECT 1)) AS ordinal
FROM Spaces AS s
CROSS APPLY STRING_SPLIT(s.[value], CHAR(9)) AS Tabbed
)
, NewLines1 AS
(
SELECT NewLined1.[value], ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY t.ordinal, (SELECT 1)) AS ordinal
FROM Tabs AS t
CROSS APPLY STRING_SPLIT(t.[value], CHAR(13)) AS NewLined1
)
, NewLines2 AS
(
SELECT NewLined2.[value], ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY nl1.ordinal, (SELECT 1)) AS ordinal
FROM NewLines1 AS nl1
CROSS APPLY STRING_SPLIT(nl1.[value], CHAR(10)) AS NewLined2
)
SELECT LTRIM(RTRIM(nl2.[value])) AS [value], ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY nl2.ordinal, (SELECT 1)) AS ordinal
FROM NewLines2 AS nl2
WHERE LTRIM(RTRIM(nl2.[value])) <> ''
)
GO
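For reference, this is what WORD_SPLIT is meant to compute, sketched in Python: split on spaces, tabs, carriage returns and line feeds, drop empty pieces (what LTRIM/RTRIM plus <> '' achieve in the T-SQL), and number the rest from 1. word_split is a hypothetical name; unlike the T-SQL version, the order here is guaranteed because re.split preserves it.

```python
import re


def word_split(text):
    """Return (value, ordinal) pairs: split on space, tab, CR and LF,
    skip empty pieces, number the remaining pieces from 1."""
    pieces = [p for p in re.split(r"[ \t\r\n]", text) if p != ""]
    return [(p, ordinal) for ordinal, p in enumerate(pieces, start=1)]


print(word_split("27364 - John Andrew Smith"))
# [('27364', 1), ('-', 2), ('John', 3), ('Andrew', 4), ('Smith', 5)]
```

Note that the ' - ' separator becomes a token of its own, so a name with a middle name produces five parts, one more than the part1 to part4 columns in the PIVOT below; removing ' - ' first, as the XML answer does, keeps everything within four columns.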
Usage:
-- Not Normalized
SELECT i.*, split.[value], split.[ordinal]
FROM @input AS i
CROSS APPLY dbo.WORD_SPLIT(i.teststring) AS split
-- Normalized (the " - " separator is removed first, so each string fits in part1 to part4)
;WITH Splitted AS
(
SELECT i.teststring, split.[value], split.[ordinal]
FROM @input AS i
CROSS APPLY dbo.WORD_SPLIT(REPLACE(i.teststring, N' - ', N' ')) AS split
)
SELECT *
FROM (SELECT teststring, [value], 'part' + CONVERT(nvarchar(20), [ordinal]) AS [parts] FROM Splitted) AS s
PIVOT (MAX([value]) FOR [parts] IN ([part1], [part2], [part3], [part4])) AS pvt
Or, assuming that for security reasons you are not allowed to make schema changes (such as creating the function above):
WITH Splitting AS
(
SELECT teststring, REPLACE(teststring, N' - ', N' ') AS [value] -- drop the " - " separator, as in the XML approach
FROM @input
)
, Spaces AS
(
SELECT sp.teststring, Spaced.[value], ROW_NUMBER() OVER (PARTITION BY sp.teststring ORDER BY (SELECT 1)) AS ordinal
FROM Splitting AS sp
CROSS APPLY STRING_SPLIT(sp.[value], ' ') AS Spaced
)
, Tabs AS
(
SELECT s.teststring, Tabbed.[value], ROW_NUMBER() OVER (PARTITION BY s.teststring ORDER BY s.ordinal, (SELECT 1)) AS ordinal
FROM Spaces AS s
CROSS APPLY STRING_SPLIT(s.[value], CHAR(9)) AS Tabbed
)
, NewLines1 AS
(
SELECT t.teststring, NewLined1.[value], ROW_NUMBER() OVER (PARTITION BY t.teststring ORDER BY t.ordinal, (SELECT 1)) AS ordinal
FROM Tabs AS t
CROSS APPLY STRING_SPLIT(t.[value], CHAR(13)) AS NewLined1
)
, NewLines2 AS
(
SELECT nl1.teststring, NewLined2.[value], ROW_NUMBER() OVER (PARTITION BY nl1.teststring ORDER BY nl1.ordinal, (SELECT 1)) AS ordinal
FROM NewLines1 AS nl1
CROSS APPLY STRING_SPLIT(nl1.[value], CHAR(10)) AS NewLined2
)
, Splitted AS
(
SELECT nl2.teststring, LTRIM(RTRIM(nl2.[value])) AS [value], ROW_NUMBER() OVER (PARTITION BY nl2.teststring ORDER BY nl2.ordinal, (SELECT 1)) AS ordinal
FROM NewLines2 AS nl2
WHERE LTRIM(RTRIM(nl2.[value])) <> ''
)
SELECT *
FROM (SELECT teststring, [value], 'part' + CONVERT(nvarchar(20), [ordinal]) AS [parts] FROM Splitted) AS s
PIVOT (MAX([value]) FOR [parts] IN ([part1], [part2], [part3], [part4])) AS pvt
Hopefully helpful!

t-sql "LIKE" and Pattern Matching

I've found a small annoyance that I was wondering how to get around...
In a simplified example, say I need to return "TEST B-19" and "TEST B-20"
I have a where clause that looks like:
where [Name] LIKE 'TEST B-[12][90]'
and it works... unless there's a "TEST B-10" or "TEST B-29" value, which I don't want.
I'd rather not resort to listing every case separately, because in more complex situations that would become prohibitive.
I tried:
where [Name] LIKE 'TEST B-[19-20]'
but of course that doesn't work because it is looking for single characters...
Thoughts? Again, this is a very simple example; I'm looking for ways to grab ranges such as 16 to 32 or 234 to 459 without also grabbing the extra values a character-class pattern would match.
EDITED to include test examples...
You might see "TEXAS 22" or "THX 99-20-110-B6" or "E-19" or "SOUTHERN B" or "122 FLOWERS" in that field. The presence of digits is common, but not a steadfast rule, and there are absolutely no general patterns for hyphens, digits, characters, order, etc.
I would divide the Name column into the text parts and the number parts, convert the number parts to an integer, and then check whether the result lies between the values. Since 'TEST B-' is 7 characters long, the number starts at position 8. Something like:
where cast(substring([Name], 8, 2) as integer) between 19 and 20
And, of course, if the possible structure of [Name] is more complex, you'd have to calculate the values for 8 and 2 instead of hardcoding them...
EDIT: If you want to filter out the ones not conforming to the pattern first, do the following:
where [Name] LIKE '%TEST B-__%'
and cast(substring([Name], CHARINDEX('TEST B-', [Name]) + LEN('TEST B-'), 2) as integer) between 19 and 20
Using CHARINDEX instead of the LIKE in the first line may be faster, especially if you put an index on the computed value, but that is only optimization... :)
EDIT: Tested the procedure. Given the following data:
jajajajajajajTEST B-100
jajajajajajajTEST B-85
jajajajjTEST B-100
jajjajajTEST B-100
jajajajajajajTEST B-00
jajajajaTEST B-100
jajajajajajajEST B-99
jajajajajajajTEST B-100
jajajajajajajTEST B-19
jajajajjTEST B-100
jajjajajTEST B-120
jajajajajajajTEST B-00
jajajajaTEST B-150
jajajajajajajEST B-20
TEST B-20asdfh asdfkh
The query returns the following rows:
jajajajajajajTEST B-19
TEST B-20asdfh asdfkh
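The CHARINDEX/SUBSTRING filter above can be replayed in Python against the same test rows to confirm the result. in_range is a hypothetical helper; it returns False where the T-SQL CAST would raise an error, and Python's str.find is case-sensitive, whereas CHARINDEX follows the column's collation. As in the T-SQL, reading a fixed two characters means a value like 'TEST B-199' would be read as 19.

```python
def in_range(name, prefix, low, high, width=2):
    """Read `width` characters right after `prefix` and test them as an integer."""
    start = name.find(prefix)  # -1 when absent (CHARINDEX would return 0)
    if start < 0:
        return False
    digits = name[start + len(prefix):start + len(prefix) + width]
    if len(digits) < width or not digits.isdigit():
        return False  # the LIKE '%TEST B-__%' filter, or a CAST that would fail
    return low <= int(digits) <= high


data = [
    "jajajajajajajTEST B-100", "jajajajajajajTEST B-85", "jajajajjTEST B-100",
    "jajjajajTEST B-100", "jajajajajajajTEST B-00", "jajajajaTEST B-100",
    "jajajajajajajEST B-99", "jajajajajajajTEST B-100", "jajajajajajajTEST B-19",
    "jajajajjTEST B-100", "jajjajajTEST B-120", "jajajajajajajTEST B-00",
    "jajajajaTEST B-150", "jajajajajajajEST B-20", "TEST B-20asdfh asdfkh",
]
print([name for name in data if in_range(name, "TEST B-", 19, 20)])
# ['jajajajajajajTEST B-19', 'TEST B-20asdfh asdfkh']
```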
Wildcards or no, you still have to edit the query every time you want to change the range definition. If you're always dealing with a range (and it's not always the same range), you might use parameters. For example:
DECLARE @min varchar(5)
DECLARE @max varchar(5)
SET @min = 'B-19'
SET @max = 'B-20'
SELECT
...
WHERE NAME BETWEEN @min AND @max
You should avoid applying functions to [NAME], as others have suggested; leaving the column as is lets your search benefit from an index on it.
In any case, you might reconsider your table structure. It sounds like 'TEST B-19' is a composite (non-normalized) value of category ('TEST') + sub-category ('B') + instance ('19'). Put it in a lookup table with 4 columns (id being the first), and then join to it by id in whatever query needs to output the composite value. This will make searching and indexing much easier and faster.
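One caveat with BETWEEN on strings: the comparison is character by character, not numeric, so 'B-2' and 'B-199' both fall between 'B-19' and 'B-20'. The Python sketch below shows the effect. Python compares strings by code point; SQL Server's exact ordering depends on the column's collation, but digit strings are still compared position by position. numeric_suffix is a hypothetical helper.

```python
def numeric_suffix(code, prefix="B-"):
    # Hypothetical helper: the number after the prefix, as an integer.
    return int(code[len(prefix):])


low, high = "B-19", "B-20"
for code in ["B-19", "B-2", "B-199", "B-20", "B-3"]:
    as_text = low <= code <= high
    as_number = numeric_suffix(low) <= numeric_suffix(code) <= numeric_suffix(high)
    print(code, as_text, as_number)
# B-19 True True
# B-2 True False
# B-199 True False
# B-20 True True
# B-3 False False
```

Zero-padding the numbers to a fixed width (B-019, B-020, B-199) makes string order agree with numeric order, which keeps the index-friendly BETWEEN correct.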
In the absence of test data, I generated my own. I removed the 'TEST B-' prefix, converted the rest to int and used BETWEEN:
With Numerals As
(
Select top 100 row_number() over (order by name) TestNumeral
from sys.columns
),
TestNumbers AS
(
Select 'TEST B-' + Convert (VarChar, TestNumeral) TestNumber
From Numerals
)
Select *
From TestNumbers
Where Cast (Replace (TestNumber, 'TEST B-', '') as Integer) between 1 and 16
This gave me
TestNumber
-------------------------------------
TEST B-1
TEST B-2
TEST B-3
TEST B-4
TEST B-5
TEST B-6
TEST B-7
TEST B-8
TEST B-9
TEST B-10
TEST B-11
TEST B-12
TEST B-13
TEST B-14
TEST B-15
TEST B-16
This means, however, that if you have different strategies for naming tests, you would have to remove all different kinds of prefixes.
Now, on the other hand, if your test numbers are in the TEST-Space-TestType-Hyphen-TestNumber format, you could use PATINDEX and SUBSTRING:
With Numerals As
(
Select top 100 row_number() over (order by name) TestNumeral
from sys.columns
),
TestNumbers AS
(
Select 'TEST B-' + Convert (VarChar, TestNumeral) TestNumber
From Numerals
Where TestNumeral Between 10 and 19
UNION
Select 'TEST A-' + Convert (VarChar, TestNumeral) TestNumber
From Numerals
Where TestNumeral Between 20 and 29
)
Select *
From TestNumbers
Where Cast (SubString (TestNumber, PATINDEX ('%-%', TestNumber)+1, Len (TestNumber) - PATINDEX ('%-%', TestNumber)) as Integer) between 16 and 26
That should yield the following
TestNumber
-------------------------------------
TEST A-20
TEST A-21
TEST A-22
TEST A-23
TEST A-24
TEST A-25
TEST A-26
TEST B-16
TEST B-17
TEST B-18
TEST B-19
All of your examples seem to have the test numbers at the end. So if you can create a table of prefixes and then JOIN using a LIKE condition, you may be able to make it work. Here is an example:
;
With TestNumbers As
(
select 'E-1' TestNumber
union select 'E-2'
union select 'E-3'
union select 'E-4'
union select 'E-5'
union select 'E-6'
union select 'E-7'
union select 'SOUTHERN B1'
union select 'SOUTHERN B2'
union select 'SOUTHERN B3'
union select 'SOUTHERN B4'
union select 'SOUTHERN B5'
union select 'SOUTHERN B6'
union select 'SOUTHERN B7'
union select 'Southern CC'
union select 'Southern DD'
union select 'Southern EE'
union select 'TEST B-1'
union select 'TEST B-2'
union select 'TEST B-3'
union select 'TEST B-4'
union select 'TEST B-5'
union select 'TEST B-6'
union select 'TEST B-7'
union select 'TEXAS 1'
union select 'TEXAS 2'
union select 'TEXAS 3'
union select 'TEXAS 4'
union select 'TEXAS 5'
union select 'TEXAS 6'
union select 'TEXAS 7'
union select 'THX 99-20-110-B1'
union select 'THX 99-20-110-B2'
union select 'THX 99-20-110-B3'
union select 'THX 99-20-110-B4'
union select 'THX 99-20-110-B5'
union select 'THX 99-20-110-B6'
union select 'THX 99-20-110-B7'
union select 'Southern AA'
union select 'Southern CC'
union select 'Southern DD'
union select 'Southern EE'
),
Prefixes as
(
Select 'TEXAS ' TestPrefix
Union Select 'THX 99-20-110-B'
Union Select 'E-'
Union Select 'SOUTHERN B'
Union Select 'TEST B-'
)
Select TN.TestNumber
From TestNumbers TN, Prefixes P
Where 1=1
And TN.TestNumber Like '%' + P.TestPrefix + '%'
And Cast (REPLACE (Tn.TestNumber, p.TestPrefix, '') AS INTEGER) between 4 and 6
This will give you
TestNumber
----------------
E-4
E-5
E-6
SOUTHERN B4
SOUTHERN B5
SOUTHERN B6
TEST B-4
TEST B-5
TEST B-6
TEXAS 4
TEXAS 5
TEXAS 6
THX 99-20-110-B4
THX 99-20-110-B5
THX 99-20-110-B6
(15 row(s) affected)
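The prefix-table join can be replayed in Python to check the 15-row result. The sample data is regenerated programmatically (each prefix followed by 1 to 7, plus the non-numeric 'Southern ..' rows), and in_prefixed_range is a hypothetical helper. The containment test and the removal are case-insensitive to mimic a case-insensitive collation. The sketch skips values that do not convert, the way TRY_CAST (SQL Server 2012 and later) would; with a plain CAST, SQL Server does not promise to evaluate the LIKE filter first, so a row such as 'Southern CC' can raise a conversion error.

```python
import re

PREFIXES = ["TEXAS ", "THX 99-20-110-B", "E-", "SOUTHERN B", "TEST B-"]
names = [p + str(i) for p in PREFIXES for i in range(1, 8)]
names += ["Southern AA", "Southern CC", "Southern DD", "Southern EE"]


def in_prefixed_range(names, prefixes, low, high):
    hits = []
    for name in names:
        for prefix in prefixes:
            # LIKE '%' + prefix + '%' under a case-insensitive collation.
            if prefix.lower() not in name.lower():
                continue
            # REPLACE(name, prefix, ''), also case-insensitive.
            rest = re.sub(re.escape(prefix), "", name, flags=re.IGNORECASE)
            try:
                number = int(rest)  # like TRY_CAST: non-numbers are skipped
            except ValueError:
                continue
            if low <= number <= high:
                hits.append(name)
    return sorted(hits)


for name in in_prefixed_range(names, PREFIXES, 4, 6):
    print(name)
```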
Is this acceptable:
WHERE [Name] IN ( 'TEST B-19', 'TEST B-20' )
The list of values can come from a subquery, e.g.:
WHERE [Name] IN ( SELECT [Name] FROM Elsewhere WHERE ... )