Return substring before and after a certain word

I have a column with large volumes of text in each record. I need to search for a certain word and return the 50 characters before the word and 50 characters after the word. I'm using substring to return the characters after, but can't find a way to return the characters before the word.
SUBSTRING(RepText, PATINDEX('%certainword%', RepText), 50) works great for returning text after 'certainword', but not before the word.
Any suggestions would be great!

Here is an example that prints the 2 characters after and the 2 characters before the word sub:
declare @st varchar(100) = 'test subis good'
select PATINDEX('%sub%', @st) -- position where 'sub' starts
select SUBSTRING(@st, PATINDEX('%sub%', @st) + LEN('sub'), 2), -- 2 characters after
SUBSTRING(@st, PATINDEX('%sub%', @st) - ( case when len(@st) < 2 then len(@st) else 2 end), 2) -- 2 characters before
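Applied to the original question, the same idea might look like the sketch below. It leans on documented SUBSTRING behaviour: when the start position is less than 1, SQL Server returns only the characters that fall at position 1 or later, so a match near the beginning of the text gives a shorter prefix rather than an error. The table variable @Reports, its sample rows, and the names @word and @n are made up for illustration; only RepText and 'certainword' come from the question. CHARINDEX is used here instead of PATINDEX because the search term is a literal word rather than a pattern.

```sql
-- Hypothetical sample data; only the column name RepText comes from the question.
DECLARE @Reports TABLE (RepText varchar(max));
INSERT INTO @Reports (RepText) VALUES
    ('some leading text certainword some trailing text'),
    ('certainword right at the start'),
    ('no match in this row');

DECLARE @word varchar(100) = 'certainword';
DECLARE @n int = 50;  -- characters wanted on each side (assumed from the question)

SELECT
    -- The start may drop below 1; SUBSTRING then simply returns fewer characters.
    SUBSTRING(r.RepText, p.pos - @n, @n)         AS before_text,
    SUBSTRING(r.RepText, p.pos + LEN(@word), @n) AS after_text
FROM @Reports AS r
CROSS APPLY (SELECT CHARINDEX(@word, r.RepText) AS pos) AS p
WHERE p.pos > 0;  -- skip rows that do not contain the word
```

Computing the position once in CROSS APPLY avoids repeating the CHARINDEX call in every expression.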

How about this? If this helps, please up vote my answer so I can get lots of points and look cool :)
if schema_id(N'utility') is null
execute (N'create schema utility');
go
if object_id(N'[utility].[get_some]'
, N'TF') is not null
drop function [utility].[get_some];
go
/*
select [lead], [lag] from [utility].[get_some] (N'return_abcdeandabcde_this', N'and', 5, 5);
select [lead], [lag] from [utility].[get_some] (N'return_a c eanda c e_this', N'and', 5, 5);
select [lead], [lag] from [utility].[get_some] (N'return_ bcd and bcd _this', N'and', 5, 5);
*/
create function [utility].[get_some] (
  @input [nvarchar](max)
  , @search [nvarchar](max)
  , @lead_length [int]
  , @lag_length [int])
returns @data table (
  [lead] [nvarchar](max)
  , [lag] [nvarchar](max))
as
begin
  insert into @data
    ([lead],[lag])
  select substring(@input
      , charindex(@search
        , @input) - @lead_length
      , @lead_length) as [lead]
    , substring(@input
      , charindex(@search, @input) + len(@search)
      , @lag_length) as [lag];
  return;
end;
go

Related

TSQL query to extract a value between two characters where a specific set of characters is present

I have a problem I can't seem to figure out. I am trying to extract the capacity from a product description. It always sits between "," and "oz.", but the description can contain other commas that are not part of what I'm trying to extract. Example values are ", 15 oz.," and ", 2 oz.,".
I'm trying to find values that contain oz and sit between two commas, and I have been completely unsuccessful. I've tried many things; here is the latest attempt from today, which just gives me an error.
SELECT SUBSTRING(
FullDescription,
CHARINDEX(',', FullDescription),
CHARINDEX('oz.',FullDescription)
- CHARINDEX(',', FullDescription)
+ Len('oz.')
)
from CatalogManagement.Product
Since the backwards pattern ,.zo is more recognisable, I'd go with the REVERSE function
Sample values:
"something, something more, 18oz., complete"
"shorter, 12oz., remainder"
"there is no capacity, in this, value"
"a bit more, 14oz, and some followups, maybe"
SELECT REVERSE(
SUBSTRING (
REVERSE(FullDescription),
CHARINDEX(',.zo', REVERSE(FullDescription)) + 1,
CHARINDEX(',', REVERSE(FullDescription), CHARINDEX(',.zo', REVERSE(FullDescription)) + 1) - CHARINDEX(',.zo', REVERSE(FullDescription)) - 1
)
)
FROM CatalogManagement.Product
WHERE FullDescription LIKE '%oz.,%'
You might use XML-splitting together with an XQuery predicate:
DECLARE @tbl TABLE(ID INT IDENTITY, YourString VARCHAR(MAX));
INSERT INTO @tbl VALUES('Here is one with an amount, 1 oz., some more text')
,('Here is one with no amount, some more text')
,('a, 10 oz.')
,('b, 20oz., no blank between oz and the number')
,('30oz., starts with the pattern, no leading comma');
SELECT t.*
,A.oz.value('.','nvarchar(max)') oz
FROM @tbl t
CROSS APPLY(SELECT CAST('<x>' + REPLACE((SELECT t.YourString AS [*] FOR XML PATH('')),',','</x><x>') + '</x>' AS XML)
.query('/x[contains(text()[1],"oz.")]')) A(oz);
The idea in short:
We use some string methods to replace commas with XML tags and to cast your string to XML. Each fragment is placed within its own <x> element.
We use a predicate to return just the fragments containing "oz.".
You can filter easily with
WHERE LEN(A.oz.value('.','nvarchar(max)'))>0
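To make the two steps easier to follow, they can be run separately on a single string, roughly as in this sketch. The variable names @s and @x are illustrative and not part of the answer; the sample string is the first row from the table above.

```sql
DECLARE @s varchar(max) = 'Here is one with an amount, 1 oz., some more text';
DECLARE @x xml;

-- Step 1: FOR XML PATH('') escapes characters such as & and <,
-- then every comma becomes an element boundary and the result is cast to XML.
SET @x = CAST('<x>'
            + REPLACE((SELECT @s AS [*] FOR XML PATH('')), ',', '</x><x>')
            + '</x>' AS xml);

SELECT @x AS fragments;  -- one <x> element per comma-separated fragment

-- Step 2: the XQuery predicate keeps only the fragments whose text contains "oz."
SELECT @x.query('/x[contains(text()[1],"oz.")]') AS oz_fragments;
```

Looking at the intermediate XML first is a quick way to check that the splitting behaves as expected before adding the filter.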

Get Substrings From DB2 Column

I have this data: AAAA/DATA1/Data2;xyx;pqr
I want only: DATA1 and Data2
If this is for a specific row, maybe use SUBSTR? Something like
SELECT
SUBSTR(column, 6, 5) AS col1
, SUBSTR(column, 12, 5) AS col2
FROM table
Here is something else you can do. It gets fairly complicated and isn't the exact answer you are looking for, but it should get you started. Hope this helps:
WITH test AS (
SELECT characters
FROM ( VALUES
( 'AAAA/DATA1/Data2;xyx;pqr'
) )
AS testing(characters)
)
SELECT
SUBSTR(characters, 1, LOCATE('/', characters) - 1) AS FIRST_PART
, SUBSTR(characters, LOCATE('/', characters) + 1) AS SECOND_PART
, SUBSTR(characters, LOCATE('/', characters, LOCATE('/', characters) + 1) + 1)
AS THIRD_PART
FROM test
;
DB2 does not have a single function for this, unfortunately. Check out this answer here: How to split a string value based on a delimiter in DB2

Extract data from sql string

I have the following data in a column. I want to extract the 'matching details' score just showing as 542. The problem is the matching score can also be more than 3 characters long. Can someone help?
MatchingDetails score="542" maxScore="-96" matchRule="abcdef"><rule name="Person_Forename" score="279" /><rule name="Person_Surname" score="263"
One way is to use a combination of charindex, patindex, and substring:
DECLARE @S varchar(100) = 'MatchingDetails score="542" maxScore="-96" matchRule="abcdef">'
SELECT SUBSTRING(@S,
patindex('% score="%', @S) + 8,
charindex('"', @S, patindex('% score="%', @S) + 9) - patindex('% score="%', @S) - 8)
Result:
542
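Because the length is derived from the position of the closing quote, the same expression should also cope with scores that are shorter or longer than three characters. A small sketch to check that, assuming every row actually contains score=" (the second and third sample rows are invented for illustration, as are the aliases v, s, p and start_pos):

```sql
SELECT v.s,
       SUBSTRING(v.s,
                 p.start_pos,
                 CHARINDEX('"', v.s, p.start_pos) - p.start_pos) AS score
FROM (VALUES
        ('MatchingDetails score="542" maxScore="-96" matchRule="abcdef">'),
        ('MatchingDetails score="12345" maxScore="-96" matchRule="abcdef">'),  -- invented row
        ('MatchingDetails score="7" maxScore="-96" matchRule="abcdef">')       -- invented row
     ) AS v(s)
-- ' score="' is 8 characters long, so the value starts 8 characters after the match.
CROSS APPLY (SELECT PATINDEX('% score="%', v.s) + 8 AS start_pos) AS p;
```

Rows without score=" would need an extra filter, since PATINDEX returns 0 for them and the arithmetic would then point at unrelated text.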
If your data is an XML string, perhaps something like this:
Example (corrected xml)
Declare @S varchar(max) = '
<MatchingDetails score="542" maxScore="-96" matchRule="abcdef" >
<rule name="Person_Forename" score="279"></rule>
<rule name="Person_Surname" score="263"></rule>
</MatchingDetails>
'
Select convert(xml,@S).value('MatchingDetails[1]/@score','int')
Returns
542

How to extract letters from a string using Firebird SQL

I want to implement a stored procedure that extracts letters from a varchar in Firebird.
Example:
v_accountno is of type varchar(50) and has the following values:
accountno 1 - 000023208821
accountno 2 - 390026826850868140H
accountno 3 - 0700765001003267KAH
I want to extract the letters from v_accountno and output it in o_letter.
In my example: o_letter will store H for accountno 2 and KAH for accountno 3.
I tried the following stored procedure, which obviously won't work for accountno 3. (Please help).
CREATE OR ALTER PROCEDURE SP_EXTRACT_LETTER
returns (
o_letter varchar(50))
as
declare variable v_accountno varchar(50);
begin
v_accountno = '390026826850868140H';
if (not (:v_accountno similar to '[[:DIGIT:]]*')) then
begin
-- My SP won't work in for accountno 3 '0700765001003267KAH'
v_accountno = longsubstr(v_accountno, strlen(v_accountno), strlen(v_accountno));
o_letter = v_accountno;
end
suspend;
end
One solution would be to replace every digit with an empty string, like:
o_letter = REPLACE(v_accountno, '0', '');
o_letter = REPLACE(o_letter, '1', '');
o_letter = REPLACE(o_letter, '2', '');
...
Since Firebird 3, you can use substring for this, using its regex facility (using the similar clause):
substring(v_accountno similar '[[:digit:]]*#"[[:alpha:]]*#"' escape '#')
See also this dbfiddle.

Extract the first word of a string in a SQL Server query

What's the best way to extract the first word of a string in sql server query?
SELECT CASE CHARINDEX(' ', @Foo, 1)
WHEN 0 THEN @Foo -- empty or single word
ELSE SUBSTRING(@Foo, 1, CHARINDEX(' ', @Foo, 1) - 1) -- multi-word
END
You could perhaps use this in a UDF:
CREATE FUNCTION [dbo].[FirstWord] (@value varchar(max))
RETURNS varchar(max)
AS
BEGIN
RETURN CASE CHARINDEX(' ', @value, 1)
WHEN 0 THEN @value
ELSE SUBSTRING(@value, 1, CHARINDEX(' ', @value, 1) - 1) END
END
GO -- test:
SELECT dbo.FirstWord(NULL)
SELECT dbo.FirstWord('')
SELECT dbo.FirstWord('abc')
SELECT dbo.FirstWord('abc def')
SELECT dbo.FirstWord('abc def ghi')
I wanted to do something like this without making a separate function, and came up with this simple one-line approach:
DECLARE @test NVARCHAR(255)
SET @test = 'First Second'
SELECT SUBSTRING(@test,1,(CHARINDEX(' ',@test + ' ')-1))
This would return the result "First"
It's short, just not as robust, as it assumes your string doesn't start with a space. It will handle one-word inputs, multi-word inputs, and empty string inputs.
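To see how the one-liner behaves on those cases, a quick check along these lines can help. The sample values and the aliases v and test are illustrative; the last row shows the leading-space limitation mentioned above.

```sql
SELECT v.test,
       SUBSTRING(v.test, 1, CHARINDEX(' ', v.test + ' ') - 1) AS first_word
FROM (VALUES
        (N'First Second'),   -- multi-word input
        (N'First'),          -- single word: the appended space supplies the end marker
        (N''),               -- empty string
        (N' Leading space')  -- starts with a space: yields an empty string
     ) AS v(test);
```

Appending a space before calling CHARINDEX is what removes the need for a CASE expression, since a delimiter is then always found.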
This is an enhancement of Ben Brandt's answer that handles strings starting with a space by applying LTRIM(). I tried to edit his answer but the edit was rejected, so I am posting it here separately.
DECLARE @test NVARCHAR(255)
SET @test = 'First Second'
SELECT SUBSTRING(LTRIM(@test),1,(CHARINDEX(' ',LTRIM(@test) + ' ')-1))
Adding the following before the RETURN statement would solve for the cases where a leading space was included in the field:
SET @Value = LTRIM(RTRIM(@Value))
Marc's answer got me most of the way to what I needed, but I had to go with patIndex rather than charIndex because sometimes characters other than spaces mark the ends of my data's words. Here I'm using '%[ /-]%' to look for space, slash, or dash.
Select race_id, race_description
, Case patIndex ('%[ /-]%', LTrim (race_description))
When 0 Then LTrim (race_description)
Else substring (LTrim (race_description), 1, patIndex ('%[ /-]%', LTrim (race_description)) - 1)
End race_abbreviation
from tbl_races
Results...
race_id  race_description           race_abbreviation
-------  -------------------------  -----------------
1        White                      White
2        Black or African American  Black
3        Hispanic/Latino            Hispanic
Caveat: this is for a small data set (US federal race reporting categories); I don't know what would happen to performance when scaled up to huge numbers.
DECLARE @string NVARCHAR(50)
SET @string = 'CUT STRING'
SELECT LEFT(@string,(PATINDEX('% %',@string)))
Extract the first word from the indicated field:
SELECT SUBSTRING(field1, 1, CHARINDEX(' ', field1)) FROM table1;
Extract the second and successive words from the indicated field:
SELECT SUBSTRING(field1, CHARINDEX(' ', field1)+1, LEN (field1)-CHARINDEX(' ', field1)) FROM table1;
A slight tweak to the function returns the next word from a start point in the entry
CREATE FUNCTION [dbo].[GetWord]
(
  @value varchar(max)
  , @startLocation int
)
RETURNS varchar(max)
AS
BEGIN
  SET @value = LTRIM(RTRIM(@value))
  SELECT @startLocation =
    CASE
      WHEN @startLocation > LEN(@value) THEN LEN(@value)
      ELSE @startLocation
    END
  SELECT @value =
    CASE
      WHEN @startLocation > 1
        THEN LTRIM(RTRIM(RIGHT(@value, LEN(@value) - @startLocation)))
      ELSE @value
    END
  RETURN CASE CHARINDEX(' ', @value, 1)
    WHEN 0 THEN @value
    ELSE SUBSTRING(@value, 1, CHARINDEX(' ', @value, 1) - 1)
  END
END
GO
SELECT dbo.GetWord(NULL, 1)
SELECT dbo.GetWord('', 1)
SELECT dbo.GetWord('abc', 1)
SELECT dbo.GetWord('abc def', 4)
SELECT dbo.GetWord('abc def ghi', 20)