PostgreSQL return last n words - postgresql

How to return last n words using Postgres.
I have tried using LEFT method.
SELECT DISTINCT LEFT(name, -4) FROM my_table;
but it return last 4 characters ,i want to return last 3 words.

demo:db<>fiddle
You can do this using a the SUBSTRING() function and regular expressions:
SELECT
SUBSTRING(name FROM '((\S+\s+){0,3}\S+$)')
FROM my_table
This has been explained here: How can I match the last two words in a sentence in PostgreSQL?
\S+ is a string of non-whitespace characters
\s+ is a string of whitespace characters (e.g. one space)
(\S+\s+){0,3} Zero to three words separated by a space
\S+$ one word at the end of the text.
-> creates 4 words (or less if there are no more).

One way is to use regexp_split_to_array() to split the string into the words it contains and then put a string back together using the last 3 words in that array.
SELECT coalesce(w.words[array_length(w.words, 1) - 2] || ' ', '')
|| coalesce(w.words[array_length(w.words, 1) - 1] || ' ', '')
|| coalesce(w.words[array_length(w.words, 1)], '')
FROM mytable t
CROSS JOIN LATERAL (SELECT regexp_split_to_array(t."name", ' ') words) w;
db<>fiddle

RIGHT() should do
SELECT RIGHT('MYCOLUMN', 4); -- returns LUMN
UPD
You can convert to array and then back to string
SELECT array_to_string(sentence[(array_length(sentence,1)-3):(array_length(sentence,1))],' ','*')
FROM
(
SELECT regexp_split_to_array('this is the one of the way to get the last four words of the string', E'\\s+') AS sentence
) foo;
DEMO HERE

Related

Postgres: Retrieve first n words from column

I know that I can do a text search in Postgres with TextSearch and get some result with
select ts_headline('german',content, tq, 'MaxFragments=4, MinWords=5, MaxWords=12,
ShortWord=3, StartSel = <strong>, StopSel = </strong>') as highlight, ...
FROM to_tsquery('german', 'test') tq ...
Is there a similar way to apply to content the same limitations? i.e. to get directly up to 12 words from the column content.
You could use regular expressions:
SELECT (regexp_match(
regexp_replace(content, '[^\w\s]+', ' ', 'g'),
'^\s*((?:\w+\s+){9}\w+)'
))[1] FROM ...
That will first replace everything that is not a space or alphanumerical character with a space and then return the first 10 words.

How to find/replace weird whitespace in string

I find in my sql database string whit weird whitespace which cannot be replace like REPLACE(string, ' ', '') RTRIM and cant it even find with string = '% %'. This space is even transfered to new table when using SELECT string INTO
If i select this string in managment studio and copy that is seems is normal space and when everything is works but cant do nothing directly from database. What else can i do? Its some kind of error or can i try some special character for this?
First, you must identify the character.
You can do that by using a tally table (or a cte) and the Unicode function:
The following script will return a table with two columns: one contains a char and the other it's unicode value:
DECLARE #Str nvarchar(100) = N'This is a string containing 1 number and some words.';
with Tally(n) as
(
SELECT TOP(LEN(#str)) ROW_NUMBER() OVER(ORDER BY ##SPID)
FROM sys.objects a
--CROSS JOIN sys.objects b -- (unremark if there are not enough rows in the tally cte)
)
SELECT SUBSTRING(#str, n, 1) As TheChar,
UNICODE(SUBSTRING(#str, n, 1)) As TheCode
FROM Tally
WHERE n <= LEN(#str)
You can also add a condition to the where clause to only include "special" chars:
AND SUBSTRING(#str, n, 1) NOT LIKE '[a-zA-Z0-9]'
Then you can replace it using it's unicode value using nchar (I've used 32 in this example since it's unicode "regular" space:
SELECT REPLACE(#str, NCHAR(32), '|')
Result:
This|is|a|string|containing|1|number|and|some|words.

Oracle PL/SQL: How do I filter out whitespaces in SELECT?

I have a table mytable that has a column ngram which is a VARCHAR2. I want to SELECT only those rows where ngram does not contain any whitespaces (tabs, spaces, EOLs etc). What should I replace <COND> below with?
SELECT ngram FROM mytable WHERE <COND>;
Thanks!
You could use regexp_instr (or regexp_like, or other regexp functions), see here for example
where regexp_instr(ngram, '[ '|| CHR(10) || CHR(13) || CHR(9) ||']') = 0
the white space is managed here '[ '
chr(10) = line feed
chr(13) = carriage return
chr(9) = tab
you can use CHR and INSTR function ASCII code of the characters you want to filter for example your where clause can be like this for an special character:
INSTR(ngram,CHR(the ASCI CODE of special char))=0
or the condition can be like this:
where
and ngram not like '%'||CHR(0)||'%' -- for null
.
.
.
and ngram not like '%'||CHR(31)||'%' -- for unit separator
and ngram not like '%'||CHR(127)||'%'-- for delete
here you can get all codes http://www.theasciicode.com.ar/extended-ascii-code/non-breaking-space-no-break-space-ascii-code-255.html
This should match ngram where it contains no whitespace characters by using the \s shorthand for all whitespace characters. I only tested by inserting a TAB into a string in a VARCHAR2 column and it was then excluded:
where regexp_instr(ngram, '\s') = 0;

Truncating leading zero from the string in postgresql

I'm trying to truncate leading zero from the address. example:
input
1 06TH ST
12 02ND AVE
123 001St CT
expected output
1 6TH ST
12 2ND AVE
123 1St CT
Here is what i have:
update table
set address = regexp_replace(address,'(0\d+(ST|ND|TH))','?????? need help here')
where address ~ '\s0\d+(ST|ND|TH)\s';
many thanks in advance
assuming that the address always has some number/letter address (1234, 1a, 33B) followed by a sequence of 1 or more spaces followed by the part you want to strip leading zeroes...
select substr(address, 1, strpos(address, ' ')) || ltrim(substr(address, strpos(address, ' ')), ' 0') from table;
or, to update the table:
update table set address = substr(address, 1, strpos(address, ' ')) || ltrim(substr(address, strpos(address, ' ')), ' 0');
-g
What you are looking for is the back references in the regular expressions:
UPDATE table
SET address = regexp_replace(address, '\m0+(\d+\w+)', '\1', 'g')
WHERE address ~ '\m0+(\d+\w+)'
Also:
\m used to match the beginning of a word (to avoid replacing inside words (f.ex. in 101Th)
0+ truncates all zeros (does not included in the capturing parenthesis)
\d+ used to capture the remaining numbers
\w+ used to capture the remaining word characters
a word caracter can be any alphanumeric character, and the underscore _.

Extract the first word of a string in a SQL Server query

What's the best way to extract the first word of a string in sql server query?
SELECT CASE CHARINDEX(' ', #Foo, 1)
WHEN 0 THEN #Foo -- empty or single word
ELSE SUBSTRING(#Foo, 1, CHARINDEX(' ', #Foo, 1) - 1) -- multi-word
END
You could perhaps use this in a UDF:
CREATE FUNCTION [dbo].[FirstWord] (#value varchar(max))
RETURNS varchar(max)
AS
BEGIN
RETURN CASE CHARINDEX(' ', #value, 1)
WHEN 0 THEN #value
ELSE SUBSTRING(#value, 1, CHARINDEX(' ', #value, 1) - 1) END
END
GO -- test:
SELECT dbo.FirstWord(NULL)
SELECT dbo.FirstWord('')
SELECT dbo.FirstWord('abc')
SELECT dbo.FirstWord('abc def')
SELECT dbo.FirstWord('abc def ghi')
I wanted to do something like this without making a separate function, and came up with this simple one-line approach:
DECLARE #test NVARCHAR(255)
SET #test = 'First Second'
SELECT SUBSTRING(#test,1,(CHARINDEX(' ',#test + ' ')-1))
This would return the result "First"
It's short, just not as robust, as it assumes your string doesn't start with a space. It will handle one-word inputs, multi-word inputs, and empty string inputs.
Enhancement of Ben Brandt's answer to compensate even if the string starts with space by applying LTRIM(). Tried to edit his answer but rejected, so I am now posting it here separately.
DECLARE #test NVARCHAR(255)
SET #test = 'First Second'
SELECT SUBSTRING(LTRIM(#test),1,(CHARINDEX(' ',LTRIM(#test) + ' ')-1))
Adding the following before the RETURN statement would solve for the cases where a leading space was included in the field:
SET #Value = LTRIM(RTRIM(#Value))
Marc's answer got me most of the way to what I needed, but I had to go with patIndex rather than charIndex because sometimes characters other than spaces mark the ends of my data's words. Here I'm using '%[ /-]%' to look for space, slash, or dash.
Select race_id, race_description
, Case patIndex ('%[ /-]%', LTrim (race_description))
When 0 Then LTrim (race_description)
Else substring (LTrim (race_description), 1, patIndex ('%[ /-]%', LTrim (race_description)) - 1)
End race_abbreviation
from tbl_races
Results...
race_id race_description race_abbreviation
------- ------------------------- -----------------
1 White White
2 Black or African American Black
3 Hispanic/Latino Hispanic
Caveat: this is for a small data set (US federal race reporting categories); I don't know what would happen to performance when scaled up to huge numbers.
DECLARE #string NVARCHAR(50)
SET #string = 'CUT STRING'
SELECT LEFT(#string,(PATINDEX('% %',#string)))
Extract the first word from the indicated field:
SELECT SUBSTRING(field1, 1, CHARINDEX(' ', field1)) FROM table1;
Extract the second and successive words from the indicated field:
SELECT SUBSTRING(field1, CHARINDEX(' ', field1)+1, LEN (field1)-CHARINDEX(' ', field1)) FROM table1;
A slight tweak to the function returns the next word from a start point in the entry
CREATE FUNCTION [dbo].[GetWord]
(
#value varchar(max)
, #startLocation int
)
RETURNS varchar(max)
AS
BEGIN
SET #value = LTRIM(RTRIM(#Value))
SELECT #startLocation =
CASE
WHEN #startLocation > Len(#value) THEN LEN(#value)
ELSE #startLocation
END
SELECT #value =
CASE
WHEN #startLocation > 1
THEN LTRIM(RTRIM(RIGHT(#value, LEN(#value) - #startLocation)))
ELSE #value
END
RETURN CASE CHARINDEX(' ', #value, 1)
WHEN 0 THEN #value
ELSE SUBSTRING(#value, 1, CHARINDEX(' ', #value, 1) - 1)
END
END
GO
SELECT dbo.GetWord(NULL, 1)
SELECT dbo.GetWord('', 1)
SELECT dbo.GetWord('abc', 1)
SELECT dbo.GetWord('abc def', 4)
SELECT dbo.GetWord('abc def ghi', 20)
Try This:
Select race_id, race_description
, Case patIndex ('%[ /-]%', LTrim (race_description))
When 0 Then LTrim (race_description)
Else substring (LTrim (race_description), 1, patIndex ('%[ /-]%', LTrim (race_description)) - 1)
End race_abbreviation
from tbl_races