Extract data from sql string - tsql

I have the following data in a column. I want to extract the 'matching details' score just showing as 542. The problem is the matching score can also be more than 3 characters long. Can someone help?
MatchingDetails score="542" maxScore="-96" matchRule="abcdef"><rule name="Person_Forename" score="279" /><rule name="Person_Surname" score="263"

One way is to use a combination of charindex, patindex, and substring:
DECLARE #S varchar(100) = 'MatchingDetails score="542" maxScore="-96" matchRule="abcdef">'
SELECT SUBSTRING(#S,
patindex('% score="%', #S) + 8,
charindex('"', #S, patindex('% score="%', #S) + 9) - patindex('% score="%', #S) - 8)
Result:
542

If your data is an XML string, perhaps something like this
Example (corrected xml)
Declare #S varchar(max) = '
<MatchingDetails score="542" maxScore="-96" matchRule="abcdef" >
<rule name="Person_Forename" score="279"></rule>
<rule name="Person_Surname" score="263"></rule>
</MatchingDetails>
'
Select convert(xml,#S).value('MatchingDetails[1]/#score','int')
Returns
542

Related

Trying to extract text using CHARINDEX ()- 1 but getting an error

I have a column with Names, and I am trying to split the column into First and Last Name using Text functions such as LEFT/SUBSTRING/CHARINDEX.
Data in the column:
Name
Yang, Jon
Huang, Eugene
Torres, Ruben
Zhu, Christy
Johnson, Elizabeth
Everything works fine as long as I use this code:
SELECT
[Name]
--,LEFT([Name], CHARINDEX(' ', [Name])) AS FirstName
,SUBSTRING([Name], 1, CHARINDEX(' ', [Name] )) AS FirstName
FROM
DataModeling.Customer
But the problem arises when I try to subtract 1 from CHARINDEX to exclude the Comma from the result and it throws this error:
I have done this operation many times in Excel so trying to replicate it with TSQL. Any suggestion on what I am doing wrong is helpful.
You get that error when CHARINDEX(' ', [Name] ) return 0. So minus 1 will make it negative and it is invalid value for substring()
You can use CASE expression to check the return value from CHARINDEX() and return the correct value to substring()
Or, you can "cheat" by using
CHARINDEX( ' ', [Name] + ' ' )
So CHARINDEX() will always return a value that is more than 0

TSQL query to extract a value between to char where a specific set of characters is there

I have a problem I can't seem to figure out. I am trying to extract capacity from a product description. It is always between two values, "," and "oz." however there could be other commas included in the description that are not part of what I'm trying to extract. Example value is , 15 oz., or , 2 oz.,
I'm trying to find values that have the oz in them and are between two commas and I have been completely unsuccessfully. I've tried many things, but here is the latest that I have tried today and I'm just getting an error.
SELECT SUBSTRING(
FullDescription,
CHARINDEX(',', FullDescription),
CHARINDEX('oz.',FullDescription)
- CHARINDEX(',', FullDescription)
+ Len('oz.')
)
from CatalogManagement.Product
Since the backwards pattern ,.zo is more recognisable, I'd go with the REVERSE function
Sample values:
"something, something more, 18oz., complete"
"shorter, 12oz., remainder"
"there is no capacity, in this, value"
"a bit more, 14oz, and some followups, maybe"
SELECT REVERSE(
SUBSTRING (
REVERSE(FullDescription),
CHARINDEX(',.zo', REVERSE(FullDescription)) + 1,
CHARINDEX(',', REVERSE(FullDescription), CHARINDEX(',.zo', REVERSE(FullDescription)) + 1) - CHARINDEX(',.zo', REVERSE(FullDescription)) - 1
)
)
FROM CatalogManagement.Product
WHERE FullDescription LIKE '%oz.,%'
You might use XML-splitting together with a XQuery predicate:
DECLARE #tbl TABLE(ID INT IDENTITY, YourString VARCHAR(MAX));
INSERT INTO #tbl VALUES('Here is one with an amount, 1 oz., some more text')
,('Here is one with no amount, some more text')
,('a, 10 oz.')
,('b, 20oz., no blank between oz and the number')
,('30oz., starts with the pattern, no leading comma');
SELECT t.*
,A.oz.value('.','nvarchar(max)') oz
FROM #tbl t
CROSS APPLY(SELECT CAST('<x>' + REPLACE((SELECT t.YourString AS [*] FOR XML PATH('')),',','</x><x>') + '</x>' AS XML)
.query('/x[contains(text()[1],"oz.")]')) A(oz);
The idea in short:
We use some string methods to replace commas with XML tags and to cast your string to XML. each fragment is placed within a decent <x> element.
We use a predicate to return just the fragments containing "oz.".
You can filter easily with
WHERE LEN(A.oz.value('.','nvarchar(max)'))>0

How to find/replace weird whitespace in string

I find in my sql database string whit weird whitespace which cannot be replace like REPLACE(string, ' ', '') RTRIM and cant it even find with string = '% %'. This space is even transfered to new table when using SELECT string INTO
If i select this string in managment studio and copy that is seems is normal space and when everything is works but cant do nothing directly from database. What else can i do? Its some kind of error or can i try some special character for this?
First, you must identify the character.
You can do that by using a tally table (or a cte) and the Unicode function:
The following script will return a table with two columns: one contains a char and the other it's unicode value:
DECLARE #Str nvarchar(100) = N'This is a string containing 1 number and some words.';
with Tally(n) as
(
SELECT TOP(LEN(#str)) ROW_NUMBER() OVER(ORDER BY ##SPID)
FROM sys.objects a
--CROSS JOIN sys.objects b -- (unremark if there are not enough rows in the tally cte)
)
SELECT SUBSTRING(#str, n, 1) As TheChar,
UNICODE(SUBSTRING(#str, n, 1)) As TheCode
FROM Tally
WHERE n <= LEN(#str)
You can also add a condition to the where clause to only include "special" chars:
AND SUBSTRING(#str, n, 1) NOT LIKE '[a-zA-Z0-9]'
Then you can replace it using it's unicode value using nchar (I've used 32 in this example since it's unicode "regular" space:
SELECT REPLACE(#str, NCHAR(32), '|')
Result:
This|is|a|string|containing|1|number|and|some|words.

Sum like operation for strings in t-sql

I already have query to concatenate
DECLARE #ids VARCHAR(8000)
SELECT #ids = COALESCE(#ids + ', ', '') + concatenatedid
FROM #HH
but if I have to do it inline how can I do that? Any help please.
SELECT sum(quantity), COALESCE(#ids + ', ', '') + concatenatedid from #HH
Thanks.
Use the XML PATH trick. You may need a CAST
SELECT
SUBSTRING(
(
SELECT
',' + concatenatedid
FROM
#HH
FOR XML PATH ('')
)
, 2, 7999)
Also:
Join characters using SET BASED APPROACH (Sql Server 2005)
Subquery returned more than 1 value

Extract the first word of a string in a SQL Server query

What's the best way to extract the first word of a string in sql server query?
SELECT CASE CHARINDEX(' ', #Foo, 1)
WHEN 0 THEN #Foo -- empty or single word
ELSE SUBSTRING(#Foo, 1, CHARINDEX(' ', #Foo, 1) - 1) -- multi-word
END
You could perhaps use this in a UDF:
CREATE FUNCTION [dbo].[FirstWord] (#value varchar(max))
RETURNS varchar(max)
AS
BEGIN
RETURN CASE CHARINDEX(' ', #value, 1)
WHEN 0 THEN #value
ELSE SUBSTRING(#value, 1, CHARINDEX(' ', #value, 1) - 1) END
END
GO -- test:
SELECT dbo.FirstWord(NULL)
SELECT dbo.FirstWord('')
SELECT dbo.FirstWord('abc')
SELECT dbo.FirstWord('abc def')
SELECT dbo.FirstWord('abc def ghi')
I wanted to do something like this without making a separate function, and came up with this simple one-line approach:
DECLARE #test NVARCHAR(255)
SET #test = 'First Second'
SELECT SUBSTRING(#test,1,(CHARINDEX(' ',#test + ' ')-1))
This would return the result "First"
It's short, just not as robust, as it assumes your string doesn't start with a space. It will handle one-word inputs, multi-word inputs, and empty string inputs.
Enhancement of Ben Brandt's answer to compensate even if the string starts with space by applying LTRIM(). Tried to edit his answer but rejected, so I am now posting it here separately.
DECLARE #test NVARCHAR(255)
SET #test = 'First Second'
SELECT SUBSTRING(LTRIM(#test),1,(CHARINDEX(' ',LTRIM(#test) + ' ')-1))
Adding the following before the RETURN statement would solve for the cases where a leading space was included in the field:
SET #Value = LTRIM(RTRIM(#Value))
Marc's answer got me most of the way to what I needed, but I had to go with patIndex rather than charIndex because sometimes characters other than spaces mark the ends of my data's words. Here I'm using '%[ /-]%' to look for space, slash, or dash.
Select race_id, race_description
, Case patIndex ('%[ /-]%', LTrim (race_description))
When 0 Then LTrim (race_description)
Else substring (LTrim (race_description), 1, patIndex ('%[ /-]%', LTrim (race_description)) - 1)
End race_abbreviation
from tbl_races
Results...
race_id race_description race_abbreviation
------- ------------------------- -----------------
1 White White
2 Black or African American Black
3 Hispanic/Latino Hispanic
Caveat: this is for a small data set (US federal race reporting categories); I don't know what would happen to performance when scaled up to huge numbers.
DECLARE #string NVARCHAR(50)
SET #string = 'CUT STRING'
SELECT LEFT(#string,(PATINDEX('% %',#string)))
Extract the first word from the indicated field:
SELECT SUBSTRING(field1, 1, CHARINDEX(' ', field1)) FROM table1;
Extract the second and successive words from the indicated field:
SELECT SUBSTRING(field1, CHARINDEX(' ', field1)+1, LEN (field1)-CHARINDEX(' ', field1)) FROM table1;
A slight tweak to the function returns the next word from a start point in the entry
CREATE FUNCTION [dbo].[GetWord]
(
#value varchar(max)
, #startLocation int
)
RETURNS varchar(max)
AS
BEGIN
SET #value = LTRIM(RTRIM(#Value))
SELECT #startLocation =
CASE
WHEN #startLocation > Len(#value) THEN LEN(#value)
ELSE #startLocation
END
SELECT #value =
CASE
WHEN #startLocation > 1
THEN LTRIM(RTRIM(RIGHT(#value, LEN(#value) - #startLocation)))
ELSE #value
END
RETURN CASE CHARINDEX(' ', #value, 1)
WHEN 0 THEN #value
ELSE SUBSTRING(#value, 1, CHARINDEX(' ', #value, 1) - 1)
END
END
GO
SELECT dbo.GetWord(NULL, 1)
SELECT dbo.GetWord('', 1)
SELECT dbo.GetWord('abc', 1)
SELECT dbo.GetWord('abc def', 4)
SELECT dbo.GetWord('abc def ghi', 20)
Try This:
Select race_id, race_description
, Case patIndex ('%[ /-]%', LTrim (race_description))
When 0 Then LTrim (race_description)
Else substring (LTrim (race_description), 1, patIndex ('%[ /-]%', LTrim (race_description)) - 1)
End race_abbreviation
from tbl_races