t-sql "LIKE" and Pattern Matching - tsql

I've found a small annoyance that I was wondering how to get around...
In a simplified example, say I need to return "TEST B-19" and "TEST B-20"
I have a where clause that looks like:
where [Name] LIKE 'TEST B-[12][90]'
and it works... unless there's a "TEST B-10" or "TEST B-29" value that I don't want.
I'd rather not resort to doing both cases, because in more complex situations that would become prohibitive.
I tried:
where [Name] LIKE 'TEST B-[19-20]'
but of course that doesn't work because it is looking for single characters...
Thoughts? Again, this is a very simple example, I'd be looking for ways to grab ranges from 16 to 32 or 234 to 459 without grabbing all the extra values that could be created.
EDITED to include test examples...
You might see "TEXAS 22" or "THX 99-20-110-B6" or "E-19" or "SOUTHERN B" or "122 FLOWERS" in that field. The presence of digits is common, but not a steadfast rule, and there are absolutely no general patterns for hyphens, digits, characters, order, etc.

I would divide the Name column into the text parts and the number parts, and convert the number parts into an integer, and then check if that one was between the values. Something like:
where cast(substring([Name], 7, 2) as integer) between 19 and 20
And, of course, if the possible structure of [Name] is much more complex, you'd have to calculate the values for 7 and 2, not hardcode them....
EDIT: If you want to filter out the ones not conforming to the pattern first, do the following:
where [Name] LIKE '%TEST B-__%'
and cast(substring([Name], CHARINDEX('TEST B-', [Name]) + LEN('TEST B-'), 2) as integer) between 19 and 20
It may be faster to use CHARINDEX in place of the LIKE in the first line, especially if you put an index on the computed value, but that is only an optimization... :)
EDIT: Tested the procedure. Given the following data:
jajajajajajajTEST B-100
jajajajajajajTEST B-85
jajajajjTEST B-100
jajjajajTEST B-100
jajajajajajajTEST B-00
jajajajaTEST B-100
jajajajajajajEST B-99
jajajajajajajTEST B-100
jajajajajajajTEST B-19
jajajajjTEST B-100
jajjajajTEST B-120
jajajajajajajTEST B-00
jajajajaTEST B-150
jajajajajajajEST B-20
TEST B-20asdfh asdfkh
The query returns the following rows:
jajajajajajajTEST B-19
TEST B-20asdfh asdfkh
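The CAST above raises a conversion error as soon as the extracted characters are not digits, and a fixed length of 2 cannot cover ranges such as 234 to 459. A minimal sketch of one way around both, assuming SQL Server 2012 or later (for TRY_CAST) and assuming the number runs to the end of the string; the @Names table variable and its rows are hypothetical sample data:

```sql
-- Hypothetical sample data; @Names stands in for the real table.
DECLARE @Names TABLE ([Name] varchar(50));
INSERT INTO @Names ([Name]) VALUES
('TEST B-15'), ('TEST B-16'), ('TEST B-32'), ('TEST B-33'),
('TEST B-1x'), ('SOUTHERN B'), ('TEXAS 22');

-- TRY_CAST returns NULL instead of raising an error for non-numeric text,
-- so rows like 'TEST B-1x' simply drop out of the BETWEEN comparison.
SELECT [Name]
FROM @Names
WHERE [Name] LIKE 'TEST B-%'
  AND TRY_CAST(SUBSTRING([Name], LEN('TEST B-') + 1, 10) AS int) BETWEEN 16 AND 32;
```

With this sample data only 'TEST B-16' and 'TEST B-32' come back. The trade-off is that a value with trailing text after the number (like 'TEST B-20asdfh asdfkh' above) is no longer matched.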

Wildcards or no, you still have to edit the query every time you want to change the range definition. If you're always dealing with a range (and it's not always the same range), you might use parameters. For example:
DECLARE @min varchar(5)
DECLARE @max varchar(5)
SET @min = 'B-19'
SET @max = 'B-20'
SELECT
...
WHERE NAME BETWEEN @min AND @max
You should avoid formatting [NAME] as others have suggested (using function on it) -- this way, your search can benefit from an index on it.
In any case -- you might re-consider your table structure. It sounds like 'TEST B-19' is a composite (non-normalized) value of category ('TEST') + sub-category ('B') + instance ('19'). Put it in a lookup table with 4 columns (id being the first), and then join it by id in whatever query needs to output the composite value. This will make searching and indexing much easier and faster.
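A rough sketch of what that normalized structure could look like. All table, column and index names here are hypothetical, and the split into category, sub-category and instance is only the assumption made in the paragraph above:

```sql
-- Hypothetical lookup table: one column per part of the composite name.
CREATE TABLE dbo.TestItem (
    Id          int IDENTITY(1,1) PRIMARY KEY,
    Category    varchar(20) NOT NULL,  -- e.g. 'TEST'
    SubCategory varchar(20) NOT NULL,  -- e.g. 'B'
    Instance    int         NOT NULL   -- e.g. 19
);

-- An index on the parts lets the range predicate below use a seek.
CREATE INDEX IX_TestItem_Lookup
    ON dbo.TestItem (Category, SubCategory, Instance);

-- The range becomes a plain integer comparison; the composite name is
-- rebuilt only for display.
SELECT Id,
       Category + ' ' + SubCategory + '-' + CAST(Instance AS varchar(10)) AS [Name]
FROM dbo.TestItem
WHERE Category = 'TEST'
  AND SubCategory = 'B'
  AND Instance BETWEEN 16 AND 32;
```

Names that do not follow the pattern at all (such as "122 FLOWERS") would need their own handling, for example a nullable Instance column.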

In the absence of test data, I generated my own. I just removed the Test B- prefix, converted to int and did a Between
With Numerals As
(
Select top 100 row_number() over (order by name) TestNumeral
from sys.columns
),
TestNumbers AS
(
Select 'TEST B-' + Convert (VarChar, TestNumeral) TestNumber
From Numerals
)
Select *
From TestNumbers
Where Cast (Replace (TestNumber, 'TEST B-', '') as Integer) between 1 and 16
This gave me
TestNumber
-------------------------------------
TEST B-1
TEST B-2
TEST B-3
TEST B-4
TEST B-5
TEST B-6
TEST B-7
TEST B-8
TEST B-9
TEST B-10
TEST B-11
TEST B-12
TEST B-13
TEST B-14
TEST B-15
TEST B-16
This means, however, that if you have different strategies for naming tests, you would have to remove all different kinds of prefixes.
Now, on the other hand, if your Test numbers are in the TEST-Space-TestType-Hyphen-TestNumber format, you could use PatIndex and SubString
With Numerals As
(
Select top 100 row_number() over (order by name) TestNumeral
from sys.columns
),
TestNumbers AS
(
Select 'TEST B-' + Convert (VarChar, TestNumeral) TestNumber
From Numerals
Where TestNumeral Between 10 and 19
UNION
Select 'TEST A-' + Convert (VarChar, TestNumeral) TestNumber
From Numerals
Where TestNumeral Between 20 and 29
)
Select *
From TestNumbers
Where Cast (SubString (TestNumber, PATINDEX ('%-%', TestNumber)+1, Len (TestNumber) - PATINDEX ('%-%', TestNumber)) as Integer) between 16 and 26
That should yield the following
TestNumber
-------------------------------------
TEST A-20
TEST A-21
TEST A-22
TEST A-23
TEST A-24
TEST A-25
TEST A-26
TEST B-16
TEST B-17
TEST B-18
TEST B-19
All of your examples seem to have the test numbers at the end. So if you can create a table of patterns and then JOIN using a LIKE statement, you may be able to make it work. Here is an example:
;
With TestNumbers As
(
select 'E-1' TestNumber
union select 'E-2'
union select 'E-3'
union select 'E-4'
union select 'E-5'
union select 'E-6'
union select 'E-7'
union select 'SOUTHERN B1'
union select 'SOUTHERN B2'
union select 'SOUTHERN B3'
union select 'SOUTHERN B4'
union select 'SOUTHERN B5'
union select 'SOUTHERN B6'
union select 'SOUTHERN B7'
union select 'Southern CC'
union select 'Southern DD'
union select 'Southern EE'
union select 'TEST B-1'
union select 'TEST B-2'
union select 'TEST B-3'
union select 'TEST B-4'
union select 'TEST B-5'
union select 'TEST B-6'
union select 'TEST B-7'
union select 'TEXAS 1'
union select 'TEXAS 2'
union select 'TEXAS 3'
union select 'TEXAS 4'
union select 'TEXAS 5'
union select 'TEXAS 6'
union select 'TEXAS 7'
union select 'THX 99-20-110-B1'
union select 'THX 99-20-110-B2'
union select 'THX 99-20-110-B3'
union select 'THX 99-20-110-B4'
union select 'THX 99-20-110-B5'
union select 'THX 99-20-110-B6'
union select 'THX 99-20-110-B7'
union select 'Southern AA'
union select 'Southern CC'
union select 'Southern DD'
union select 'Southern EE'
),
Prefixes as
(
Select 'TEXAS ' TestPrefix
Union Select 'THX 99-20-110-B'
Union Select 'E-'
Union Select 'SOUTHERN B'
Union Select 'TEST B-'
)
Select TN.TestNumber
From TestNumbers TN, Prefixes P
Where 1=1
And TN.TestNumber Like '%' + P.TestPrefix + '%'
And Cast (REPLACE (Tn.TestNumber, p.TestPrefix, '') AS INTEGER) between 4 and 6
This will give you
TestNumber
----------------
E-4
E-5
E-6
SOUTHERN B4
SOUTHERN B5
SOUTHERN B6
TEST B-4
TEST B-5
TEST B-6
TEXAS 4
TEXAS 5
TEXAS 6
THX 99-20-110-B4
THX 99-20-110-B5
THX 99-20-110-B6
(15 row(s) affected)

Is this acceptable:
WHERE [Name] IN ( 'TEST B-19', 'TEST B-20' )
The list of values can come from a subquery, e.g.:
WHERE [Name] IN ( SELECT [Name] FROM Elsewhere WHERE ... )
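For a numeric range, that subquery could generate the candidate names itself. The following is only a sketch: the @Names table variable is hypothetical sample data, and sys.all_columns is used purely as a convenient row source for numbers 1 to 1000 (a real numbers table would do the same job):

```sql
-- Hypothetical sample data; @Names stands in for the real table.
DECLARE @Names TABLE ([Name] varchar(50));
INSERT INTO @Names ([Name]) VALUES
('TEST B-10'), ('TEST B-19'), ('TEST B-20'), ('TEST B-29'), ('E-19');

-- Build 'TEST B-19', 'TEST B-20', ... from a number range and match on the
-- unmodified [Name] column, so an index on [Name] can still be used.
WITH Numbers AS (
    SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_columns
)
SELECT t.[Name]
FROM @Names AS t
WHERE t.[Name] IN (
    SELECT 'TEST B-' + CAST(n AS varchar(10))
    FROM Numbers
    WHERE n BETWEEN 19 AND 20
);
```

With the sample rows above this returns 'TEST B-19' and 'TEST B-20' only. Changing the range means changing the two numbers in the BETWEEN, not the pattern.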

Related

TSQL - in a string, replace a character with a fixed one every 2 characters

I can't work out how to replace the character after every 2 characters of a string with a '.'
select STUFF('abcdefghi', 3, 1, '.') c3, STUFF('abcdefghi', 5, 1, '.') c5, STUFF('abcdefghi', 7, 1, '.') c7, STUFF('abcdefghi', 9, 1, '.') c9
If I use STUFF, I then have to combine the strings c3, c5, c7 and c9 somehow, but I can't find a way to do that.
Can you help me?
initial string:
abcdefghi
the result I would like is
ab.de.gh.
the string can be up to 50 characters
Create a numbers / tally / digits table, if you don't have one already, then you can use this to target each character position:
with digits as ( /* This would be a real table, here it's just to test */
select n from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10))x(n)
), t as (
select 'abcdefghi' as s
)
select String_Agg( case when d.n%3 = 0 then '.' else Substring(t.s, d.n, 1) end, '') within group (order by d.n)
from t
cross apply digits d
where d.n <= Len(t.s) /* <= so that the last character is included */
Using for xml with existing table
with digits as (
select n from (values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10))x(n)
),
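r as (
select t.id, d.n, case when d.n%3=0 then '.' else Substring(t.s, d.n, 1) end ch
from t
cross apply digits d
where d.n <= Len(t.s) /* <= so that the last character is included */
)
select result=(select '' + ch
from r r2
where r2.id=r.id
order by r2.n /* keep the characters in their original order */
for xml path('')
)
from r
group by r.id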
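Since the string can be up to 50 characters, the ten-row digits list above is too short for real data. A minimal sketch that generates 50 positions on the fly, assuming SQL Server 2017 or later (for STRING_AGG); the variable @s and the use of sys.all_columns as a row source are illustrative choices, not part of the original answer:

```sql
DECLARE @s varchar(50) = 'abcdefghijklmnop';

-- Generate positions 1..50; any table with at least 50 rows works as a source.
WITH digits AS (
    SELECT TOP (50) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_columns
)
-- Every third position becomes '.', all others keep their character.
-- WITHIN GROUP makes the concatenation order explicit.
SELECT STRING_AGG(CASE WHEN n % 3 = 0 THEN '.' ELSE SUBSTRING(@s, n, 1) END, '')
       WITHIN GROUP (ORDER BY n) AS result
FROM digits
WHERE n <= LEN(@s);
```

For the 16-character sample string this gives ab.de.gh.jk.mn.p.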
You can try it like this:
Easiest might be a quirky update like here:
DECLARE @string VARCHAR(100)='abcdefghijklmnopqrstuvwxyz';
SELECT @string = STUFF(@string,3*A.pos,1,'.')
FROM (SELECT TOP(LEN(@string)/3) ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM master..spt_values) A(pos);
SELECT @string;
Better/Cleaner/Prettier would be a recursive CTE:
We use a declared table to have some tabular sample data
DECLARE @tbl TABLE(ID INT IDENTITY, SomeString VARCHAR(200));
INSERT INTO @tbl VALUES('')
,('a')
,('ab')
,('abc')
,('abcd')
,('abcde')
,('abcdefghijklmnopqrstuvwxyz');
--the query
WITH recCTE AS
(
SELECT ID
,SomeString
,(LEN(SomeString)+1)/3 AS CountDots
,1 AS OccuranceOfDot
,SUBSTRING(SomeString,4,LEN(SomeString)) AS RestString
,CAST(LEFT(SomeString,2) AS VARCHAR(MAX)) AS Growing
FROM @tbl
UNION ALL
SELECT t.ID
,r.SomeString
,r.CountDots
,r.OccuranceOfDot+2
,SUBSTRING(RestString,4,LEN(RestString))
,CONCAT(Growing,'.',LEFT(r.RestString,2))
FROM @tbl t
INNER JOIN recCTE r ON t.ID=r.ID
WHERE r.OccuranceOfDot/2<r.CountDots-1
)
SELECT TOP 1 WITH TIES ID,Growing
FROM recCTE
ORDER BY ROW_NUMBER() OVER(PARTITION BY ID ORDER BY OccuranceOfDot DESC);
--the result
1
2 a
3 ab
4 ab
5 ab
6 ab.de
7 ab.de.gh.jk.mn.pq.st.vw.yz
The idea in short:
We use a recursive CTE to walk along the string.
We add the needed portion together with a dot.
We stop when the remaining length is too short to continue.
A little magic is the ORDER BY ROW_NUMBER() OVER() together with TOP 1 WITH TIES. This will allow all first rows (first per ID) to appear.

TSQL how to produce UNION result without actual UNION command

Can I produce results like in my example below without an actual UNION command? In my real scenario I have 1000 cat(egories) and would like to save typing and learn how to make it smarter without hard-coding the WHERE clauses. I'd appreciate your hints; I'm not sure whether PIVOT would work here. Thanks M
My setup: SQL Server 2017 (RTM-CU22)
My reproducible test source and sample code, which I'd like to modify:
/*
SELECT * INTO #t FROM (
SELECT 'A ' Cat, 101 Score UNION ALL
SELECT 'A ' Cat, 102 Score UNION ALL
SELECT 'A ' Cat, 103 Score UNION ALL
SELECT 'BB' Cat, 2001 Score UNION ALL
SELECT 'BB' Cat, 2002 Score UNION ALL
SELECT 'CCC' Cat, 3333 Score
) b --- select * from #t
*/
-- this is desired output made with UNION.
SELECT 'A ' Cat, COUNT(1) CCount FROM #t WHERE Cat = 'A' UNION
SELECT 'BB ' Cat, COUNT(1) CCount FROM #t WHERE Cat = 'BB' UNION
SELECT 'CCC' Cat, COUNT(1) CCount FROM #t WHERE Cat NOT IN ('A','BB')
If all you are looking for is a count of each Cat you can do the following:
SELECT Cat, COUNT(*) CCount FROM [#t]
GROUP BY [Cat]
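The UNION in the question also lumps every category other than 'A' and 'BB' into a single row. If that bucketing is wanted, a CASE expression can be grouped on instead of the raw column. This is only a sketch: the #CatScores table, its rows and the 'Other' label are hypothetical stand-ins for the question's #t:

```sql
-- Hypothetical stand-in for the question's #t table.
CREATE TABLE #CatScores (Cat varchar(3), Score int);
INSERT INTO #CatScores (Cat, Score) VALUES
('A', 101), ('A', 102), ('A', 103),
('BB', 2001), ('BB', 2002),
('CCC', 3333), ('DD', 4);

-- CROSS APPLY computes the bucket once per row so the CASE expression
-- does not have to be repeated in both SELECT and GROUP BY.
SELECT b.Bucket AS Cat, COUNT(*) AS CCount
FROM #CatScores AS t
CROSS APPLY (
    SELECT CASE WHEN t.Cat IN ('A', 'BB') THEN t.Cat ELSE 'Other' END AS Bucket
) AS b
GROUP BY b.Bucket;

DROP TABLE #CatScores;
```

With the sample rows this counts 3 for 'A', 2 for 'BB' and 2 for 'Other'.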

TSQL Fuzzy address matching grouping, 2019 Edition

I have a situation where people have asked me to group records by badly entered addresses. I have to work with the tools and environment I have; the Google API and third-party data science tools are not an option. I did my homework and found posts that are several years old, so I still want to check whether anything newer is available.
In my scenario, people want IDs 1-6 grouped into a single address; I added the rest as negative tests.
SELECT * INTO #t FROM ( --test data: select * from #t drop table #t
SELECT 1 Id, '1 CROLANA HEIGHTS' Adr UNION -- A vs O
SELECT 2 Id, '1 CROLONA HEIGHTS' Adr union
SELECT 3 Id, '1 CROLONA HEIGHT DRIVE' Adr union
SELECT 4 Id,'1 CROLONA HEIGHTS DR' Adr union
SELECT 5 Id, '1 CROLONA HGHTS DR' Adr union
SELECT 6 Id, '1 CROLONA HTS DR' Adr UNION
---------------------------------------- rest should not match
SELECT 7 Id, '1 CORWING DR' Adr UNION
SELECT 8 Id, '1 SUNNYHILL DRIVE' Adr UNION
SELECT 9 Id, '1 CROWN HILL DR' Adr UNION
SELECT 10 Id, '1 ADDISON DRv' Adr ) a
------------------- and below is my working fuzzy script (which can be improved)
SELECT id, adr, LEAD(adr,1) OVER ( ORDER BY adr ) adr_lead,
SOUNDEX(adr) Sdx, DIFFERENCE(adr, LEAD(adr,1) OVER ( ORDER BY adr )) diff
--- SOUNDEX(adr), COUNT(*) c
FROM #t
--GROUP BY SOUNDEX(adr)
WHERE SOUNDEX(adr) = SOUNDEX('1 CROLANA HEIGHTS')
Any suggestions are welcome. I'm using an intelligent replace at the end of the string and on standalone words to improve the data.
DECLARE @st VARCHAR(100) = 'La_Beg_10 La_midleMacy La' --replace at the end of string
SELECT 'ryba', @st, '-->' f, CASE WHEN @st LIKE '%' + ' La'
THEN SUBSTRING(@st,1,LEN(@st) - LEN('La')) + 'Lane' ELSE @st END N

Postgres ORDER BY a CASE with DISTINCT in select

I have a query like below getting the error - 'SELECT DISTINCT, ORDER BY expressions must appear in select list'
select distinct name
from fruits
order by case
when name = 'mango' then 1
else 2
end
This should return 4 records, say
apple, mango, pear and grape
How can I make sure I get Mango as the first record always and the rest follow. I tried using the case statement, but not able to get the desired results. Any ideas will be appreciated.
I believe this should accomplish what you describe as needing.
select distinct
name,
case name when 'mango' then 1 else 2 end as fruitOrder
from fruits
order by
fruitOrder
If you need to always have 'mango' in first position, no matter the other rows, this could be a way:
with fruits(name) as (
values ('apple'), ('mango'), ('pear'), ('grape')
)
select name
from fruits
order by case
when name = 'mango' then 1
else 2
end
If you need to add a DISTINCT, this should work:
select distinct name,
case
when name = 'mango' then 1
else 2
end orderCol
from fruits
order by orderCol
This will give you 'Mango' followed by the others in order:
WITH get_rows AS
(SELECT DISTINCT item_type
FROM the_item)
SELECT item_type
FROM
(SELECT 1 as seq, item_type
FROM get_rows
WHERE item_type = 'Mango'
UNION ALL
SELECT 2 as seq, item_type
FROM get_rows
WHERE item_type <> 'Mango') AS ordered
ORDER BY seq, item_type
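Another option that avoids both the extra output column and the UNION is to deduplicate with GROUP BY instead of DISTINCT; the ORDER BY may then use any expression over the grouped column. A minimal sketch in PostgreSQL syntax, with a hypothetical inline fruits list (including a duplicate) standing in for the real table:

```sql
-- Hypothetical inline data; the duplicate 'mango' shows the deduplication.
WITH fruits(name) AS (
    VALUES ('apple'), ('mango'), ('pear'), ('grape'), ('mango')
)
SELECT name
FROM fruits
GROUP BY name  -- plays the role of DISTINCT
ORDER BY CASE WHEN name = 'mango' THEN 1 ELSE 2 END, name;
```

This returns mango first, then apple, grape and pear in alphabetical order.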

TSQL Sort Union based on column not being returned

(Please note, I require a SQL Server 2005 solution)
I have a UNION query, where the first part returns multiple rows in a particular order, and the second part returns a single row which MUST be the last row of the result set.
The easiest way I've found so far is to include an extra "sort" column, BUT I do not want this column to be returned with the data set.
Please note, this example has a single column, but the real query has many columns, built via dynamic query...
SELECT [TITLE],
(SELECT COUNT(*) FROM dbo.[OTHERTABLE] WHERE ...) AS [VALUE],
0 AS [EXTRAORDER]
FROM dbo.[LOOKUPTABLE]
UNION
SELECT 'Total',
(SELECT COUNT(*) FROM dbo.[OTHERTABLE]),
1 AS [EXTRAORDER]
ORDER BY [EXTRAORDER], [TITLE]
How can I write this so that all the columns excluding EXTRAORDER are returned (preferably without manually listing all the desired columns)?
Unless anybody can come up with a better solution, I have currently settled for the following...
(I was heading down the same route as SQLhint.com was in their answer. Unfortunately their answer - at the time of writing - is still incorrect, and therefore I cannot upvote it. The Total row will still be ordered within the results of the main SELECT, rather than be "appended" to the end.)
Ideally I wanted a solution that didn't require the replication of all the columns required in the final data set. Unfortunately this solution does NOT satisfy this requirement, but at least it works!
The solution was to use CTE...
; WITH [DATA] AS (
SELECT [TITLE],
(SELECT COUNT(*) FROM dbo.[OTHERTABLE] WHERE ...) AS [VALUE],
0 AS [EXTRAORDER]
FROM dbo.[LOOKUPTABLE]
UNION
SELECT 'Total',
(SELECT COUNT(*) FROM dbo.[OTHERTABLE]),
1
)
SELECT [TITLE], [VALUE]
FROM [DATA]
ORDER BY [EXTRAORDER], [TITLE]
I think the best way is to return 2 result sets, but to respond strictly to your question:
SELECT [title], [value]
FROM
(SELECT [TITLE],
(SELECT COUNT(*) FROM dbo.[OTHERTABLE] WHERE ...) AS [VALUE],
0 AS [EXTRAORDER]
FROM dbo.[LOOKUPTABLE]
UNION
SELECT 'Total',
(SELECT COUNT(*) FROM dbo.[OTHERTABLE]),
1 AS [EXTRAORDER]) as A
ORDER BY CASE WHEN [title] = 'Total' THEN 'zzz' ELSE [title] END
How about this?
SELECT [TITLE], [VALUE]
FROM (
SELECT [TITLE],
(SELECT COUNT(*) FROM dbo.[OTHERTABLE] WHERE ...) AS [VALUE]
FROM dbo.[LOOKUPTABLE]
UNION
SELECT 'Total',
(SELECT COUNT(*) FROM dbo.[OTHERTABLE])
) [DATA]
ORDER BY (case when [TITLE] = 'Total' then 1 else 0 end), [TITLE]
This removes the [EXTRAORDER] column but still orders by [TITLE], treating 'Total' as the last item.