Extract the first word of a string in a SQL Server query - tsql

What's the best way to extract the first word of a string in a SQL Server query?

SELECT CASE CHARINDEX(' ', @Foo, 1)
WHEN 0 THEN @Foo -- empty or single word
ELSE SUBSTRING(@Foo, 1, CHARINDEX(' ', @Foo, 1) - 1) -- multi-word
END
You could perhaps use this in a UDF:
CREATE FUNCTION [dbo].[FirstWord] (@value varchar(max))
RETURNS varchar(max)
AS
BEGIN
RETURN CASE CHARINDEX(' ', @value, 1)
WHEN 0 THEN @value
ELSE SUBSTRING(@value, 1, CHARINDEX(' ', @value, 1) - 1) END
END
GO -- test:
SELECT dbo.FirstWord(NULL)
SELECT dbo.FirstWord('')
SELECT dbo.FirstWord('abc')
SELECT dbo.FirstWord('abc def')
SELECT dbo.FirstWord('abc def ghi')

I wanted to do something like this without making a separate function, and came up with this simple one-line approach:
DECLARE @test NVARCHAR(255)
SET @test = 'First Second'
SELECT SUBSTRING(@test,1,(CHARINDEX(' ',@test + ' ')-1))
This would return the result "First"
It's short, just not as robust, as it assumes your string doesn't start with a space. It will handle one-word inputs, multi-word inputs, and empty string inputs.
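To see how the one-liner behaves on the cases mentioned above, here is a small sketch that runs it over a few sample inputs. The derived table v and its sample values are made up for illustration, and the VALUES constructor assumes SQL Server 2008 or later.

```sql
-- Apply the one-line expression to several hypothetical inputs.
SELECT v.s AS Input,
       SUBSTRING(v.s, 1, CHARINDEX(' ', v.s + ' ') - 1) AS FirstWord
FROM (VALUES (N'First Second'),   -- FirstWord = 'First'
             (N'Single'),         -- FirstWord = 'Single'
             (N''),               -- FirstWord = ''
             (N' Leading')        -- FirstWord = '' (the leading space is found at position 1)
     ) AS v(s);
```

The last row is the case the answer warns about: without a trim, a leading space yields an empty first word.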

This enhances Ben Brandt's answer to handle strings that start with a space by applying LTRIM(). My edit to his answer was rejected, so I am posting it here separately.
DECLARE @test NVARCHAR(255)
SET @test = 'First Second'
SELECT SUBSTRING(LTRIM(@test),1,(CHARINDEX(' ',LTRIM(@test) + ' ')-1))

Adding the following before the RETURN statement would solve for the cases where a leading space was included in the field:
SET @value = LTRIM(RTRIM(@value))

Marc's answer got me most of the way to what I needed, but I had to go with patIndex rather than charIndex because sometimes characters other than spaces mark the ends of my data's words. Here I'm using '%[ /-]%' to look for space, slash, or dash.
Select race_id, race_description
, Case patIndex ('%[ /-]%', LTrim (race_description))
When 0 Then LTrim (race_description)
Else substring (LTrim (race_description), 1, patIndex ('%[ /-]%', LTrim (race_description)) - 1)
End race_abbreviation
from tbl_races
Results...
race_id race_description          race_abbreviation
------- ------------------------- -----------------
1       White                     White
2       Black or African American Black
3       Hispanic/Latino           Hispanic
Caveat: this is for a small data set (US federal race reporting categories); I don't know what would happen to performance when scaled up to huge numbers.

DECLARE @string NVARCHAR(50)
SET @string = 'CUT STRING'
SELECT LEFT(@string,(PATINDEX('% %',@string))) -- returns 'CUT ' (trailing space included); returns '' when there is no space

Extract the first word from the indicated field:
SELECT SUBSTRING(field1, 1, CHARINDEX(' ', field1)) FROM table1;
Extract the second and successive words from the indicated field:
SELECT SUBSTRING(field1, CHARINDEX(' ', field1)+1, LEN (field1)-CHARINDEX(' ', field1)) FROM table1;

A slight tweak to the function returns the next word from a start point in the entry
CREATE FUNCTION [dbo].[GetWord]
(
@value varchar(max)
, @startLocation int
)
RETURNS varchar(max)
AS
BEGIN
SET @value = LTRIM(RTRIM(@value))
SELECT @startLocation =
CASE
WHEN @startLocation > LEN(@value) THEN LEN(@value)
ELSE @startLocation
END
SELECT @value =
CASE
WHEN @startLocation > 1
THEN LTRIM(RTRIM(RIGHT(@value, LEN(@value) - @startLocation)))
ELSE @value
END
RETURN CASE CHARINDEX(' ', @value, 1)
WHEN 0 THEN @value
ELSE SUBSTRING(@value, 1, CHARINDEX(' ', @value, 1) - 1)
END
END
GO
SELECT dbo.GetWord(NULL, 1)
SELECT dbo.GetWord('', 1)
SELECT dbo.GetWord('abc', 1)
SELECT dbo.GetWord('abc def', 4)
SELECT dbo.GetWord('abc def ghi', 20)

Related

How to find/replace weird whitespace in string

I found strings in my SQL database that contain a weird whitespace character. It cannot be removed with REPLACE(string, ' ', '') or RTRIM, and I can't even find it with string = '% %'. The character is even carried over to a new table when I use SELECT string INTO.
If I select the string in Management Studio and copy it, it looks like a normal space and everything works, but I can't do anything with it directly in the database. What else can I try? Is this some kind of error, or is there a special character I should look for?
First, you must identify the character.
You can do that by using a tally table (or a CTE) and the UNICODE function.
The following script returns a table with two columns: one contains a character and the other its Unicode value:
DECLARE @str nvarchar(100) = N'This is a string containing 1 number and some words.';
with Tally(n) as
(
SELECT TOP(LEN(@str)) ROW_NUMBER() OVER(ORDER BY @@SPID)
FROM sys.objects a
--CROSS JOIN sys.objects b -- (unremark if there are not enough rows in the tally cte)
)
SELECT SUBSTRING(@str, n, 1) As TheChar,
UNICODE(SUBSTRING(@str, n, 1)) As TheCode
FROM Tally
WHERE n <= LEN(@str)
You can also add a condition to the WHERE clause to only include "special" chars:
AND SUBSTRING(@str, n, 1) NOT LIKE '[a-zA-Z0-9]'
Then you can replace the character by its Unicode value using NCHAR (I've used 32 in this example since it's the Unicode code for a "regular" space):
SELECT REPLACE(@str, NCHAR(32), '|')
Result:
This|is|a|string|containing|1|number|and|some|words.
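If the inspection above reports a code other than 32, the same REPLACE works with that code. Here is a minimal sketch, assuming the culprit turns out to be a non-breaking space (code 160). That code and the sample string are only assumptions for illustration, so substitute whatever value the tally query actually reports.

```sql
-- Hypothetical sample: 'abc' and 'def' separated by a non-breaking space.
DECLARE @dirty nvarchar(50) = N'abc' + NCHAR(160) + N'def';

SELECT UNICODE(SUBSTRING(@dirty, 4, 1)) AS CodeFound,  -- 160
       REPLACE(@dirty, NCHAR(160), N'') AS Cleaned;    -- abcdef
```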

PostgreSQL return last n words

How can I return the last n words of a string in Postgres?
I have tried using the LEFT function:
SELECT DISTINCT LEFT(name, -4) FROM my_table;
but it works on characters, and I want to return the last 3 words.
You can do this using the SUBSTRING() function and regular expressions:
SELECT
SUBSTRING(name FROM '((\S+\s+){0,3}\S+$)')
FROM my_table
This has been explained here: How can I match the last two words in a sentence in PostgreSQL?
\S+ is a run of non-whitespace characters
\s+ is a run of whitespace characters (e.g. one space)
(\S+\s+){0,3} matches zero to three words, each followed by whitespace
\S+$ matches one word at the end of the text.
Together this yields up to 4 words (fewer if the text is shorter); use {0,2} instead to get the last 3 words.
One way is to use regexp_split_to_array() to split the string into the words it contains and then put a string back together using the last 3 words in that array.
SELECT coalesce(w.words[array_length(w.words, 1) - 2] || ' ', '')
|| coalesce(w.words[array_length(w.words, 1) - 1] || ' ', '')
|| coalesce(w.words[array_length(w.words, 1)], '')
FROM mytable t
CROSS JOIN LATERAL (SELECT regexp_split_to_array(t."name", ' ') words) w;
RIGHT() should do
SELECT RIGHT('MYCOLUMN', 4); -- returns LUMN
Update:
You can convert to array and then back to string
SELECT array_to_string(sentence[(array_length(sentence,1)-3):(array_length(sentence,1))],' ','*')
FROM
(
SELECT regexp_split_to_array('this is the one of the way to get the last four words of the string', E'\\s+') AS sentence
) foo;

How to write recursive TSQL replace query?

I am using SSMS 2008 and am trying to write a recursive replace statement. I have a good start on this, but it is not working fully yet. I want to replace every occurrence of XML tags occurring in one column with empty string. So I want to replace the whole range from "<" to ">" for each record. Here is what I have:
DECLARE @I INTEGER
SET @I = 3
while
@I > 0
--(select [note_text] from #TEMP_PN where [note_text] LIKE '%<%')
BEGIN
UPDATE #TEMP_PN
SET [note_text] = replace([note_text],substring([note_text],CHARINDEX('<',[note_text]),CHARINDEX('>',[note_text])),'')
from #TEMP_PN
where [note_text] LIKE '%Microsoft-com%'
SET @I = @I - 1
END
SELECT * FROM #TEMP_PN
The problem with this code is I hardcoded @I to be 3. However, I want to make it continue replacing from "<" to ">" with empty string for each record until there are no more "<" chars. So I tried the commented out line above but this gives me an error on more than one record / subquery. How can I achieve this recursive functionality? Also, my Replace statement above only replaced "<" chars for some records, strangely enough.
I tried your sample code, but it still does not replace all instances of this text per record and for some records it does not replace any text although there is "<" in these records. Here is a record where your script does not replace any substrings. Maybe this is a special character problem?
<DIV class=gc-message-sms-row><SPAN class=gc-message-sms-from>TLS: </SPAN><SPAN class=gc-message-sms-text>Hi Reggie... I'm on my way to Lynn.. see you soon</SPAN> <SPAN class=gc-message-sms-time>3:09 PM </SPAN></DIV>
You were pretty close... the problem is that the SUBSTRING's third parameter is a length, not the position to stop at.
DECLARE @RowsUpdated INT
SET @RowsUpdated = 1
WHILE (@RowsUpdated > 0)
BEGIN
UPDATE #TEMP_PN
SET [note_text] =
REPLACE(
[note_text],
substring(
[note_text],
CHARINDEX('<',[note_text]),
CHARINDEX(
'>',
SUBSTRING([note_text], CHARINDEX('<',[note_text]), 1 + LEN([note_text]) - CHARINDEX('<',[note_text]))
)
),
'')
from #TEMP_PN
where [note_text] LIKE '%Microsoft-com%' and [note_text] like '%<%>%'
SET @RowsUpdated = @@ROWCOUNT
END
SELECT * FROM #TEMP_PN
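The length-versus-position point can be checked on a single value. This is only a sketch: the variable names and the sample string are made up for illustration, and initializing variables in DECLARE assumes SQL Server 2008 or later.

```sql
DECLARE @s varchar(50) = 'ab<tag>cd';
DECLARE @open  int = CHARINDEX('<', @s);          -- 3
DECLARE @close int = CHARINDEX('>', @s, @open);   -- 7

SELECT SUBSTRING(@s, @open, @close)             AS PositionUsedAsLength, -- '<tag>cd' (7 characters from position 3)
       SUBSTRING(@s, @open, @close - @open + 1) AS TagOnly;              -- '<tag>'
```

Passing the position of ">" as the third argument grabs too much whenever the tag does not start at position 1, which matches the partial replacements described in the question.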
SECOND EDIT:
OK, I've updated both queries; this code should now handle the leading > before the first tag... which I think could have been the issue.
DECLARE @TestString VARCHAR(MAX)
SELECT @TestString = '><DIV class=gc-message-sms-row><SPAN class=gc-message-sms-from>TLS: </SPAN><SPAN class=gc-message-sms-text>Hi Reggie... I''m on my way to Lynn.. see you soon</SPAN> <SPAN class=gc-message-sms-time>3:09 PM </SPAN></DIV>'
DECLARE @RowsUpdated INT
SET @RowsUpdated = 1
WHILE (@RowsUpdated > 0)
BEGIN
SELECT
@TestString =
REPLACE(
@TestString,
substring(
@TestString,
CHARINDEX('<',@TestString),
CHARINDEX(
'>',
SUBSTRING(@TestString, CHARINDEX('<',@TestString), 1 + LEN(@TestString) - CHARINDEX('<',@TestString))
)
),
'')
WHERE @TestString LIKE '%<%>%'
SET @RowsUpdated = @@ROWCOUNT
END
SELECT @TestString
Could it be because that note doesn't meet the [note_text] LIKE '%Microsoft-com%' criteria?

T-SQL: How to obtain the exact length of a string in characters?

I'm generating T-SQL SELECT statements for tables for which I have no data type information up-front. In these statements, I need to perform string manipulation operations that depend on the length of the original value of the tables' columns.
One example (but not the only one) is to insert some text at a specific position in a string, including the option to insert it at the end:
SELECT
CASE WHEN (LEN ([t0].[Product]) = 8)
THEN [t0].[Product] + 'test'
ELSE STUFF ([t0].[Product], 8, 0, 'test')
END
FROM [OrderItem] [t0]
(The CASE WHEN + LEN is required because STUFF doesn't allow me to insert text at the end of a string.)
The problem is that LEN excludes trailing blanks, which will ruin the calculation.
I know I can use DATALENGTH, which does not exclude trailing blanks, but I can't convert the bytes returned by DATALENGTH to the characters required by STUFF because I don't know whether the Product column is of type varchar or nvarchar.
So, how can I generate a SQL statement that depends on the exact length of a string in characters without up-front information about the string data type being used?
Here's what I ended up using:
SELECT
CASE WHEN ((LEN ([t0].[Product] + '#') - 1) = 8)
THEN [t0].[Product] + 'test'
ELSE STUFF ([t0].[Product], 8, 0, 'test')
END
FROM [OrderItem] [t0]
Measurements indicate that the LEN (... + '#') - 1 trick is about the same speed as LEN (...) alone.
Thanks for all the good answers!
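A quick way to see the difference between the length functions on values with trailing blanks. The variables and sample values below are made up for illustration; each holds 'abc' followed by three spaces.

```sql
DECLARE @v varchar(20)  = 'abc   ';
DECLARE @n nvarchar(20) = N'abc   ';

SELECT LEN(@v)           AS LenV,         -- 3  (trailing blanks ignored)
       LEN(@v + '#') - 1 AS ExactLenV,    -- 6
       DATALENGTH(@v)    AS BytesV,       -- 6  (1 byte per character here)
       LEN(@n + '#') - 1 AS ExactLenN,    -- 6
       DATALENGTH(@n)    AS BytesN;       -- 12 (2 bytes per character)
```

The appended '#' makes the blanks non-trailing, so the expression gives the same character count for both types without knowing which one the column uses.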
Try this:
SELECT
CASE WHEN (LEN (REPLACE([t0].[Product],' ', '#')) = 8)
THEN [t0].[Product] + 'test'
ELSE STUFF ([t0].[Product], 8, 0, 'test')
END
FROM [OrderItem] [t0]
Can't you look up the type information for the columns in the system tables?
If not then to determine whether or not a column is varchar or nvarchar this would do it.
create table #test
(
c varchar(50),
n nvarchar(50)
)
insert into #test values ('1,2,3,4 ',N'1,2,3,4,5 ')
SELECT
CASE
WHEN datalength(CAST(c AS nvarchar(MAX))) = datalength(c)
THEN 'c is nvarchar'
ELSE 'c is char'
END,
CASE
WHEN datalength(CAST(n AS nvarchar(MAX))) = datalength(n)
THEN 'n is nvarchar'
ELSE 'n is char'
END
FROM #test
Use DATALENGTH and SQL_VARIANT_PROPERTY:
SELECT
CASE
WHEN 8
= DATALENGTH([t0].[Product])
/ CASE SQL_VARIANT_PROPERTY([t0].[Product],'BaseType') WHEN 'nvarchar' THEN 2 ELSE 1 END
THEN [t0].[Product] + 'test'
ELSE STUFF ([t0].[Product], 8, 0, 'test')
END
FROM [OrderItem] [t0]
If there are no leading blanks, len(reverse(column_name)) will give you the column length.

DESCENDING/ASCENDING Parameter to a stored procedure

I have the following SP
CREATE PROCEDURE GetAllHouses
set @webRegionID = 2
set @sortBy = 'case_no'
set @sortDirection = 'ASC'
AS
BEGIN
Select
tbl_houses.*
from tbl_houses
where
postal in (select zipcode from crm_zipcodes where web_region_id = @webRegionID)
ORDER BY
CASE UPPER(#sortBy)
when 'CASE_NO' then case_no
when 'AREA' then area
when 'FURNISHED' then furnished
when 'TYPE' then [type]
when 'SQUAREFEETS' then squarefeets
when 'BEDROOMS' then bedrooms
when 'LIVINGROOMS' then livingrooms
when 'BATHROOMS' then bathrooms
when 'LEASE_FROM' then lease_from
when 'RENT' then rent
else case_no
END
END
GO
Now everything in that SP works, but I want to be able to choose whether to sort ASCENDING or DESCENDING.
I really can't find a solution for that using SQL, and I can't find anything on Google.
As you can see, I have the parameter sortDirection and I have tried using it in multiple ways, but always with errors. I tried CASE statements, IF statements and so on, but it is complicated by the fact that I want to insert a keyword.
Help will be very much appreciated. I have tried most of the things that come to mind but haven't been able to get it right.
You could use two order by fields:
CASE @sortDir WHEN 'ASC' THEN
CASE UPPER(@sortBy)
...
END
END ASC,
CASE @sortDir WHEN 'DESC' THEN
CASE UPPER(@sortBy)
...
END
END DESC
A CASE will evaluate as NULL if none of the WHEN clauses match, so that causes one of the two fields to evaluate to NULL for every row (not affecting the sort order) and the other has the appropriate direction.
One drawback, though, is that you'd need to duplicate your @sortBy CASE statement. You could achieve the same thing using dynamic SQL with sp_executesql and writing a 'ASC' or 'DESC' literal depending on the parameter.
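The two-expression ORDER BY can be tried in isolation. This is a small sketch only: the table variable @houses, its two columns and its rows are made up for illustration, and the multi-row VALUES list assumes SQL Server 2008 or later.

```sql
-- Hypothetical sample data standing in for tbl_houses.
DECLARE @houses TABLE (case_no int, rent int);
INSERT INTO @houses (case_no, rent) VALUES (1, 900), (2, 700), (3, 800);

DECLARE @sortBy varchar(20) = 'RENT', @sortDir varchar(4) = 'DESC';

SELECT case_no, rent
FROM @houses
ORDER BY
    -- NULL for every row unless @sortDir = 'ASC', so it has no effect here
    CASE WHEN @sortDir = 'ASC'
         THEN CASE UPPER(@sortBy) WHEN 'RENT' THEN rent ELSE case_no END
    END ASC,
    -- the active sort key when @sortDir = 'DESC'
    CASE WHEN @sortDir = 'DESC'
         THEN CASE UPPER(@sortBy) WHEN 'RENT' THEN rent ELSE case_no END
    END DESC;
-- Rows come back with rent 900, 800, 700.
```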
That code is going to get very unmanageable very quickly, as you'll need to double-nest your CASE WHENs: one set for the column to order by, and a nested set for whether it's ASC or DESC.
Might be better to consider using Dynamic SQL here...
DECLARE @sql nvarchar(max)
-- CONVERT is needed because an integer parameter cannot be concatenated to a string directly
SET @sql = '
Select
tbl_houses.*
from tbl_houses
where
postal in (select zipcode from crm_zipcodes where web_region_id = ' + CONVERT(nvarchar(10), @webRegionID) + ') ORDER BY '
SET @sql = @sql + ' ' + @sortBy + ' ' + @sortDirection
EXEC (@sql)
You could do it with some dynamic SQL and calling it with an EXEC. Beware SQL injection though if the user has any control over the parameters.
CREATE PROCEDURE GetAllHouses
set @webRegionID = 2
set @sortBy = 'case_no'
set @sortDirection = 'ASC'
AS
BEGIN
DECLARE @dynamicSQL NVARCHAR(MAX)
SET @dynamicSQL =
'
SELECT
tbl_houses.*
FROM
tbl_houses
WHERE
postal
IN
(
SELECT
zipcode
FROM
crm_zipcodes
WHERE
web_region_id = ' + CONVERT(nvarchar(10), @webRegionID) + '
)
ORDER BY
' + @sortBy + ' ' + @sortDirection
EXEC(@dynamicSQL)
END
GO
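Regarding the SQL injection warning, one possible way to harden the dynamic version is to accept only known column names and direction keywords, quote the column with QUOTENAME, and pass the region ID as a real parameter through sp_executesql. This is a sketch under assumptions, not the original procedure: the name GetAllHousesSorted, the parameter types and the default values are hypothetical, and the column list is copied from the question's CASE expression.

```sql
CREATE PROCEDURE dbo.GetAllHousesSorted          -- hypothetical name
    @webRegionID   int        = 2,               -- parameter types are assumptions
    @sortBy        sysname    = N'case_no',
    @sortDirection varchar(4) = 'ASC'
AS
BEGIN
    SET NOCOUNT ON;

    -- Accept only known column names; anything else (including NULL) falls back to case_no.
    SET @sortBy = CASE WHEN @sortBy IN (N'case_no', N'area', N'furnished', N'type',
                                        N'squarefeets', N'bedrooms', N'livingrooms',
                                        N'bathrooms', N'lease_from', N'rent')
                       THEN @sortBy ELSE N'case_no' END;

    -- Accept only the two direction keywords; anything else becomes ASC.
    SET @sortDirection = CASE WHEN UPPER(@sortDirection) = 'DESC' THEN 'DESC' ELSE 'ASC' END;

    DECLARE @sql nvarchar(max) =
        N'SELECT h.* FROM tbl_houses AS h
          WHERE h.postal IN (SELECT zipcode FROM crm_zipcodes WHERE web_region_id = @regionID)
          ORDER BY ' + QUOTENAME(@sortBy) + N' ' + @sortDirection + N';';

    -- The region ID travels as a parameter instead of being concatenated into the text.
    EXEC sp_executesql @sql, N'@regionID int', @regionID = @webRegionID;
END
GO
```

Only the validated column name and direction keyword are concatenated into the statement, so user input never reaches the SQL text unchecked.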