PostgreSQL return last n words - postgresql

How can I return the last n words of a string in Postgres?
I have tried using the LEFT function:
SELECT DISTINCT LEFT(name, -4) FROM my_table;
but it works on characters, not words. I want to return the last 3 words.

You can do this using the SUBSTRING() function and a regular expression:
SELECT SUBSTRING(name FROM '((\S+\s+){0,3}\S+$)')
FROM my_table;
This has been explained here: How can I match the last two words in a sentence in PostgreSQL?
\S+ is a string of non-whitespace characters
\s+ is a string of whitespace characters (e.g. one space)
(\S+\s+){0,3} is zero to three words, each followed by whitespace
\S+$ is one word at the end of the text.
-> This returns the last 4 words (or fewer if the text is shorter). For the last 3 words, use {0,2} instead.
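To check the pattern outside the database, the same expression can be tried with Python's re module. This is only a sketch: last_words is a hypothetical helper name, and it assumes Python treats \S, \s and $ closely enough to PostgreSQL for simple single-line input.

```python
import re
from typing import Optional


def last_words(text: str, n: int) -> Optional[str]:
    """Return the last n whitespace-separated words, or None if nothing matches."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Up to n-1 "word + whitespace" groups, followed by the final word at the end.
    pattern = r"((?:\S+\s+){0,%d}\S+)$" % (n - 1)
    match = re.search(pattern, text)
    return match.group(1) if match else None


print(last_words("this is the one of the way", 3))
print(last_words("one two", 3))
```

Like the SQL version, the sketch keeps the original whitespace between the words and finds nothing when the text ends in a space, because \S+$ requires a non-whitespace character at the very end.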

One way is to use regexp_split_to_array() to split the string into the words it contains and then put a string back together using the last 3 words in that array.
SELECT coalesce(w.words[array_length(w.words, 1) - 2] || ' ', '')
|| coalesce(w.words[array_length(w.words, 1) - 1] || ' ', '')
|| coalesce(w.words[array_length(w.words, 1)], '')
FROM mytable t
CROSS JOIN LATERAL (SELECT regexp_split_to_array(t."name", ' ') words) w;

RIGHT() should do
You can convert the string to an array and then back to a string:
SELECT array_to_string(sentence[(array_length(sentence,1)-3):(array_length(sentence,1))],' ','*')
FROM (
SELECT regexp_split_to_array('this is the one of the way to get the last four words of the string', E'\\s+') AS sentence
) foo;
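The array-based answers above split the string into words, keep the tail of the array, and join it back together. A minimal Python sketch of that idea follows. The name last_words_by_split is hypothetical, and unlike regexp_split_to_array() the sketch drops the empty elements produced by leading or trailing whitespace, which is an assumption about the desired behaviour.

```python
import re


def last_words_by_split(text: str, n: int) -> str:
    """Split on runs of whitespace, keep the last n words, join with single spaces."""
    if n <= 0:
        return ""
    # Drop empty strings that appear when the text starts or ends with whitespace.
    words = [word for word in re.split(r"\s+", text) if word]
    return " ".join(words[-n:])


sentence = "this is the one of the way to get the last four words of the string"
print(last_words_by_split(sentence, 4))
```

Slicing with words[-n:] simply returns the whole list when there are fewer than n words, so no bounds check is needed.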


I know that I can do a text search in Postgres with TextSearch and get some result with
select ts_headline('german',content, tq, 'MaxFragments=4, MinWords=5, MaxWords=12,
ShortWord=3, StartSel = <strong>, StopSel = </strong>') as highlight, ...
FROM to_tsquery('german', 'test') tq ...
Is there a similar way to apply the same limits to content directly, i.e. to get up to 12 words from the column content?
You could use regular expressions:
SELECT (regexp_match(
regexp_replace(content, '[^\w\s]+', ' ', 'g'),
'^\s*((\w+\s+){0,9}\w+)' -- assumed pattern (missing from the original): up to 10 words
))[1] FROM ...
That will first replace everything that is not whitespace or a word character with a space and then return the first 10 words.
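The two steps described here (strip punctuation, then take the leading words) can be sketched in Python as follows. The function name first_words and the exact matching pattern are assumptions made for illustration, not the answer's original expression.

```python
import re
from typing import Optional


def first_words(content: str, n: int = 10) -> Optional[str]:
    """Replace punctuation with spaces, then return up to the first n words."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Step 1: anything that is neither a word character nor whitespace becomes a space.
    cleaned = re.sub(r"[^\w\s]+", " ", content)
    # Step 2: optional leading whitespace, then up to n words.
    match = re.match(r"\s*((?:\w+\s+){0,%d}\w+)" % (n - 1), cleaned)
    return match.group(1) if match else None


print(first_words("Hello, world! This is a test.", 3))
```

The result keeps whatever whitespace is left after the replacement, so two words that were separated by punctuation plus a space end up separated by two spaces.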

I found a string in my SQL database with a weird whitespace character that cannot be replaced with REPLACE(string, ' ', '') or RTRIM, and cannot even be found with string = '% %'. This space is even carried over to a new table when using SELECT string INTO.
If I select this string in Management Studio and copy it, it seems to be a normal space and everything works, but I can't do anything with it directly in the database. What else can I do? Is this some kind of error, or is there a special character I can try?
First, you must identify the character.
You can do that by using a tally table (or a CTE) and the UNICODE function.
The following script will return a table with two columns: one contains a character and the other its Unicode value:
DECLARE @Str nvarchar(100) = N'This is a string containing 1 number and some words.';
WITH Tally(n) AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.objects a
    --CROSS JOIN sys.objects b -- (uncomment if there are not enough rows in the tally CTE)
)
SELECT SUBSTRING(@Str, n, 1) AS TheChar,
    UNICODE(SUBSTRING(@Str, n, 1)) AS TheCode
FROM Tally
WHERE n <= LEN(@Str)
You can also add a condition to the WHERE clause to only include "special" characters:
AND SUBSTRING(@Str, n, 1) NOT LIKE '[a-zA-Z0-9]'
Then you can replace the character by its Unicode value using NCHAR (I've used 32 in this example since it's the Unicode value of a "regular" space):
SELECT REPLACE(@Str, NCHAR(32), '|')
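The same identify-then-replace idea can be tried on a copied value outside the database. The Python sketch below assumes the culprit is a non-breaking space (code point 160). That is only a guess used for illustration, and the helper names are hypothetical.

```python
def char_codes(text: str) -> list:
    """Pair every character with its Unicode code point."""
    return [(ch, ord(ch)) for ch in text]


def special_chars(text: str) -> list:
    """List (1-based position, character, code point) for non-[a-zA-Z0-9] characters."""
    return [
        (pos, ch, ord(ch))
        for pos, ch in enumerate(text, start=1)
        if not (ch.isascii() and ch.isalnum())
    ]


# Assumed sample value: the "space" here is really U+00A0 (non-breaking space).
sample = "weird\u00a0space"
print(special_chars(sample))

# Once the code point is known, replace it with a regular space.
print(sample.replace(chr(160), " "))
```

Once the code point is known, it can be passed to NCHAR() in the REPLACE call on the database side.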

I have a table mytable that has a column ngram which is a VARCHAR2. I want to SELECT only those rows where ngram does not contain any whitespace (tabs, spaces, EOLs, etc.). What should I replace <COND> below with?
SELECT ngram FROM mytable WHERE <COND>;
You could use regexp_instr (or regexp_like, or other regexp functions):
where regexp_instr(ngram, '[ '|| CHR(10) || CHR(13) || CHR(9) ||']') = 0
The space character is covered by the '[ ' part
chr(10) = line feed
chr(13) = carriage return
chr(9) = tab
You can use the CHR and INSTR functions with the ASCII code of each character you want to filter out. For example, your WHERE clause can look like this for a special character:
INSTR(ngram,CHR(the ASCII code of the special char))=0
or the condition can be like this:
and ngram not like '%'||CHR(0)||'%' -- for null
and ngram not like '%'||CHR(31)||'%' -- for unit separator
and ngram not like '%'||CHR(127)||'%'-- for delete
This should match ngram where it contains no whitespace characters by using the \s shorthand for all whitespace characters. I only tested by inserting a TAB into a string in a VARCHAR2 column and it was then excluded:
where regexp_instr(ngram, '\s') = 0;
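As a quick illustration of the "no whitespace anywhere" condition, the Python sketch below filters a list with the same \s shorthand. The function name is hypothetical, and Python's \s covers more Unicode whitespace than Oracle's may, so treat it as an approximation of the SQL condition rather than an exact equivalent.

```python
import re

# One compiled pattern: any single whitespace character.
WHITESPACE = re.compile(r"\s")


def without_whitespace(ngrams: list) -> list:
    """Keep only the strings that contain no whitespace character at all."""
    return [ngram for ngram in ngrams if WHITESPACE.search(ngram) is None]


print(without_whitespace(["abc", "a b", "a\tb", "x\ny", "def"]))
```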

I'm trying to strip the leading zeros from part of an address. Example:
1 06TH ST
12 02ND AVE
123 001St CT
expected output
1 6TH ST
12 2ND AVE
123 1St CT
Here is what I have:
update table
set address = regexp_replace(address,'(0\d+(ST|ND|TH))','?????? need help here')
where address ~ '\s0\d+(ST|ND|TH)\s';
many thanks in advance
Assuming that the address always starts with some number/letter part (1234, 1a, 33B), followed by one or more spaces, followed by the part you want to strip leading zeroes from:
select substr(address, 1, strpos(address, ' ')) || ltrim(substr(address, strpos(address, ' ')), ' 0') from table;
or, to update the table:
update table set address = substr(address, 1, strpos(address, ' ')) || ltrim(substr(address, strpos(address, ' ')), ' 0');
What you are looking for is back-references in regular expressions:
UPDATE table
SET address = regexp_replace(address, '\m0+(\d+\w+)', '\1', 'g')
WHERE address ~ '\m0+(\d+\w+)'
\m matches the beginning of a word (to avoid replacing inside words, e.g. in 101Th)
0+ matches the leading zeros (it sits outside the capturing parentheses, so the zeros are dropped)
\d+ captures the remaining digits
\w+ captures the remaining word characters
A word character can be any alphanumeric character or the underscore _.
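The back-reference approach can be tested against the sample addresses from the question with a short Python sketch. Python has no \m, so the sketch substitutes \b (a word boundary), which is assumed to behave the same way in front of a digit. The function name is hypothetical.

```python
import re

# \b stands in for PostgreSQL's \m; the zeros sit outside the capture group.
LEADING_ZEROS = re.compile(r"\b0+(\d+\w+)")


def strip_leading_zeros(address: str) -> str:
    """Drop zeros at the start of a word, keeping the captured remainder."""
    return LEADING_ZEROS.sub(r"\1", address)


for address in ["1 06TH ST", "12 02ND AVE", "123 001St CT", "5 101Th ST"]:
    print(address, "->", strip_leading_zeros(address))
```

The last address is left unchanged because the zero in 101Th is not at the start of a word.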

What's the best way to extract the first word of a string in a SQL Server query?
SELECT CASE CHARINDEX(' ', @Foo, 1)
WHEN 0 THEN @Foo -- empty or single word
ELSE SUBSTRING(@Foo, 1, CHARINDEX(' ', @Foo, 1) - 1) -- multi-word
END
You could perhaps use this in a UDF:
CREATE FUNCTION [dbo].[FirstWord] (@value varchar(max))
RETURNS varchar(max)
AS
BEGIN
    RETURN CASE CHARINDEX(' ', @value, 1)
        WHEN 0 THEN @value
        ELSE SUBSTRING(@value, 1, CHARINDEX(' ', @value, 1) - 1) END
END
GO -- test:
SELECT dbo.FirstWord(NULL)
SELECT dbo.FirstWord('')
SELECT dbo.FirstWord('abc')
SELECT dbo.FirstWord('abc def')
SELECT dbo.FirstWord('abc def ghi')
I wanted to do something like this without making a separate function, and came up with this simple one-line approach:
DECLARE @test varchar(max)
SET @test = 'First Second'
SELECT SUBSTRING(@test,1,(CHARINDEX(' ',@test + ' ')-1))
This would return the result "First"
It's short, just not as robust, as it assumes your string doesn't start with a space. It will handle one-word inputs, multi-word inputs, and empty string inputs.
This is an enhancement of Ben Brandt's answer that also handles a string starting with a space, by applying LTRIM(). I tried to edit his answer but the edit was rejected, so I am posting it here separately.
DECLARE @test varchar(max)
SET @test = 'First Second'
SELECT SUBSTRING(LTRIM(@test),1,(CHARINDEX(' ',LTRIM(@test) + ' ')-1))
Adding the following before the RETURN statement would solve for the cases where a leading space was included in the field:
SET @value = LTRIM(RTRIM(@value))
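The append-a-space trick used in these one-liners guarantees that a space is always found, so no CASE branch is needed for single-word input. A minimal Python sketch of the same logic, with a hypothetical function name:

```python
def first_word(text: str) -> str:
    """Return the first space-delimited word, ignoring leading spaces."""
    trimmed = text.lstrip(" ")          # plays the role of LTRIM()
    # Appending a space means index() always finds one, even for one-word input.
    end = (trimmed + " ").index(" ")
    return trimmed[:end]


print(first_word("First Second"))
print(first_word("  First Second"))
```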
Marc's answer got me most of the way to what I needed, but I had to go with patIndex rather than charIndex because sometimes characters other than spaces mark the ends of my data's words. Here I'm using '%[ /-]%' to look for space, slash, or dash.
Select race_id, race_description
, Case patIndex ('%[ /-]%', LTrim (race_description))
When 0 Then LTrim (race_description)
Else substring (LTrim (race_description), 1, patIndex ('%[ /-]%', LTrim (race_description)) - 1)
End race_abbreviation
from tbl_races
race_id  race_description           race_abbreviation
-------  -------------------------  -----------------
1        White                      White
2        Black or African American  Black
3        Hispanic/Latino            Hispanic
Caveat: this is for a small data set (US federal race reporting categories); I don't know what would happen to performance when scaled up to huge numbers.
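The PATINDEX variant generalises the first-word logic from a single delimiter to a character class. The Python sketch below mirrors it with the same [ /-] set and the sample rows from the output above. The function name is hypothetical.

```python
import re

# Same delimiter set as the PATINDEX pattern: space, slash, or dash.
DELIMITER = re.compile(r"[ /-]")


def abbreviation(description: str) -> str:
    """Return the text before the first space, slash, or dash."""
    trimmed = description.lstrip(" ")   # plays the role of LTrim()
    match = DELIMITER.search(trimmed)
    return trimmed if match is None else trimmed[: match.start()]


for row in ["White", "Black or African American", "Hispanic/Latino"]:
    print(row, "->", abbreviation(row))
```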
DECLARE @string varchar(max)
SET @string = 'CUT STRING'
SELECT LEFT(@string,(PATINDEX('% %',@string)))
Extract the first word from the indicated field:
SELECT SUBSTRING(field1, 1, CHARINDEX(' ', field1)) FROM table1;
Extract the second and successive words from the indicated field:
SELECT SUBSTRING(field1, CHARINDEX(' ', field1)+1, LEN (field1)-CHARINDEX(' ', field1)) FROM table1;
A slight tweak to the function returns the next word from a start point in the entry:
CREATE FUNCTION [dbo].[GetWord]
(
    @value varchar(max)
    , @startLocation int
)
RETURNS varchar(max)
AS
BEGIN
    SET @value = LTRIM(RTRIM(@value))
    SELECT @startLocation =
        CASE
            WHEN @startLocation > LEN(@value) THEN LEN(@value)
            ELSE @startLocation
        END
    SELECT @value =
        CASE
            WHEN @startLocation > 1
                THEN LTRIM(RTRIM(RIGHT(@value, LEN(@value) - @startLocation)))
            ELSE @value
        END
    RETURN CASE CHARINDEX(' ', @value, 1)
        WHEN 0 THEN @value
        ELSE SUBSTRING(@value, 1, CHARINDEX(' ', @value, 1) - 1)
    END
END
GO
SELECT dbo.GetWord(NULL, 1)
SELECT dbo.GetWord('', 1)
SELECT dbo.GetWord('abc', 1)
SELECT dbo.GetWord('abc def', 4)
SELECT dbo.GetWord('abc def ghi', 20)
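To see what the test calls above are expected to produce, the function's steps can be traced in a small Python port. This is a sketch that assumes the T-SQL logic is as shown (trim, clamp the start, cut, take the first word). get_word is a hypothetical name and None stands in for NULL.

```python
from typing import Optional


def get_word(value: Optional[str], start: int) -> Optional[str]:
    """Return the first word found after position `start` of the trimmed value."""
    if value is None:
        return None                      # NULL in, NULL out
    value = value.strip(" ")
    start = min(start, len(value))       # clamp the start to the string length
    if start > 1:
        # Equivalent of RIGHT(value, LEN(value) - start), trimmed again.
        value = value[start:].strip(" ")
    end = value.find(" ")
    return value if end == -1 else value[:end]


print(get_word("abc def", 4))
print(get_word("abc def ghi", 20))
```

A start position beyond the end of the string is clamped to the length, which leaves nothing to the right of it, so the last call returns an empty string.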