Truncating leading zeros from a string in PostgreSQL

I'm trying to truncate the leading zeros from an address. Example:
input
1 06TH ST
12 02ND AVE
123 001St CT
expected output
1 6TH ST
12 2ND AVE
123 1St CT
Here is what I have:
update table
set address = regexp_replace(address,'(0\d+(ST|ND|TH))','?????? need help here')
where address ~ '\s0\d+(ST|ND|TH)\s';
many thanks in advance

Assuming that the address always starts with some number/letter part (1234, 1a, 33B), followed by one or more spaces, followed by the part you want to strip leading zeros from:
select substr(address, 1, strpos(address, ' ')) || ltrim(substr(address, strpos(address, ' ')), ' 0') from table;
or, to update the table:
update table set address = substr(address, 1, strpos(address, ' ')) || ltrim(substr(address, strpos(address, ' ')), ' 0');

What you are looking for are back-references in the regular expression:
UPDATE table
SET address = regexp_replace(address, '\m0+(\d+\w+)', '\1', 'g')
WHERE address ~ '\m0+(\d+\w+)'
Notes:
\m matches the beginning of a word (to avoid replacing inside words, e.g. in 101Th)
0+ matches all leading zeros (it is outside the capturing parentheses, so the zeros are dropped)
\d+ captures the remaining digits
\w+ captures the remaining word characters
A word character is any alphanumeric character or the underscore _.
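The effect of the back-reference can be checked outside the database. The following is a minimal Python sketch, not the PostgreSQL query itself: Python's re module has no \m, so \b (word boundary) stands in for it here, and the function name strip_leading_zeros is made up for illustration.

```python
import re

# Python has no \m (start of word); \b followed by a zero is used instead.
PATTERN = re.compile(r"\b0+(\d+\w+)")

def strip_leading_zeros(address: str) -> str:
    """Drop zeros at the start of a word, keeping the captured remainder."""
    return PATTERN.sub(r"\1", address)

for sample in ["1 06TH ST", "12 02ND AVE", "123 001St CT", "5 101TH ST"]:
    print(sample, "->", strip_leading_zeros(sample))
```

The last sample has no zero at the start of a word, so it should come back unchanged.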

Related

DB2 remove empty lines

I have strings like this
#
word_1
word_2
#
word_3
#
#
where # represents empty lines.
I'd like to remove those empty lines, for getting
word_1
word_2
word_3
I've tried replacing CHR(10) and CHR(13) with '' but then I get
word_1word_2word_3
I've seen I can remove the first empty line using LTRIM, but how to get rid of all of them?
You need to remove every new-line sequence that is immediately followed by another new-line sequence, plus a single new-line sequence at the start and at the end of the string. All of these replacements can be done with a single expression.
Starting from v11.1
select regexp_replace (s, '\r\n(?=\r\n)|^\r\n|\r\n$', '')
from (values x'0d0a' || 'abc' || x'0d0a0d0a'|| 'def' || x'0d0a') t (s)
Note that your new-line may be encoded as x'0a' instead of x'0d0a'. In that case, remove all the \r characters from the expression above.
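To see what the three alternatives in that pattern do, here is a small Python sketch that applies the same expression to the same sample value. It only illustrates the logic; DB2's regular expression engine and Python's re may differ in details, and the function name drop_empty_lines is an assumption for this example.

```python
import re

# Same pattern as the v11.1 query: a CRLF followed by another CRLF,
# or a CRLF at the very start, or a CRLF at the very end.
EMPTY_LINE = re.compile(r"\r\n(?=\r\n)|^\r\n|\r\n$")

def drop_empty_lines(s: str) -> str:
    """Remove the CRLF sequences that produce empty lines."""
    return EMPTY_LINE.sub("", s)

sample = "\r\nabc\r\n\r\ndef\r\n"
print(repr(drop_empty_lines(sample)))  # 'abc\r\ndef'
```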
Starting from v9.7
select xmlcast (xmlquery ('replace (replace ($d, "^\r\n|\r\n$", ""), "(\r\n){2,}", "$1")' passing s as "d") as varchar (100))
from (values x'0d0a' || 'abc' || x'0d0a0d0a'|| 'def' || x'0d0a') t (s)

Postgres: Retrieve first n words from column

I know that I can do a text search in Postgres with TextSearch and get some result with
select ts_headline('german',content, tq, 'MaxFragments=4, MinWords=5, MaxWords=12,
ShortWord=3, StartSel = <strong>, StopSel = </strong>') as highlight, ...
FROM to_tsquery('german', 'test') tq ...
Is there a similar way to apply the same limits to content directly, i.e. to get up to 12 words from the column content?
You could use regular expressions:
SELECT (regexp_match(
regexp_replace(content, '[^\w\s]+', ' ', 'g'),
'^\s*((?:\w+\s+){9}\w+)'
))[1] FROM ...
That first replaces everything that is not whitespace or a word character with a space and then returns the first 10 words. If content has fewer than 10 words, the pattern does not match and the result is NULL.
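The two steps (clean up punctuation, then match a fixed number of words) can be tried out in Python. This is a sketch of the same idea under the assumption that the word count should be a parameter; first_words and n are names invented here, not part of the answer.

```python
import re
from typing import Optional

def first_words(content: str, n: int = 10) -> Optional[str]:
    """Return the first n words of content, or None if there are fewer."""
    # Step 1: turn each run of punctuation into a single space.
    cleaned = re.sub(r"[^\w\s]+", " ", content)
    # Step 2: n-1 words each followed by whitespace, then one more word.
    match = re.match(r"\s*((?:\w+\s+){%d}\w+)" % (n - 1), cleaned)
    return match.group(1) if match else None

text = "Eins, zwei; drei vier fünf sechs sieben acht neun zehn elf zwölf."
print(first_words(text))
print(first_words("too short"))  # None: fewer than 10 words
```

Because punctuation is replaced rather than removed, the result can contain double spaces where a comma or semicolon used to be.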

PostgreSQL return last n words

How can I return the last n words of a string in Postgres?
I have tried using the LEFT function.
SELECT DISTINCT LEFT(name, -4) FROM my_table;
but that operates on characters; I want to return the last 3 words.
You can do this using the SUBSTRING() function and regular expressions:
SELECT
SUBSTRING(name FROM '((\S+\s+){0,3}\S+$)')
FROM my_table
This has been explained here: How can I match the last two words in a sentence in PostgreSQL?
\S+ is a run of non-whitespace characters
\s+ is a run of whitespace characters (e.g. one space)
(\S+\s+){0,3} matches zero to three words, each followed by whitespace
\S+$ matches one word at the end of the text.
-> This returns the last 4 words (or fewer if the text is shorter). For the last 3 words, use {0,2} instead.
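The relationship between the repetition count and the number of words returned can be checked with a short Python sketch. It assumes the count should be a parameter; last_words and n are illustrative names, and Python's re is used here only as a stand-in for the Postgres regex engine.

```python
import re
from typing import Optional

def last_words(name: str, n: int = 3) -> Optional[str]:
    """Return up to the last n whitespace-separated words of name."""
    # Up to n-1 "word + whitespace" groups, then a final word at the end.
    match = re.search(r"((?:\S+\s+){0,%d}\S+$)" % (n - 1), name)
    return match.group(1) if match else None

print(last_words("one two three four five"))     # three four five
print(last_words("one two three four five", 4))  # two three four five
print(last_words("solo"))                        # solo
```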
One way is to use regexp_split_to_array() to split the string into the words it contains and then put a string back together using the last 3 words in that array.
SELECT coalesce(w.words[array_length(w.words, 1) - 2] || ' ', '')
|| coalesce(w.words[array_length(w.words, 1) - 1] || ' ', '')
|| coalesce(w.words[array_length(w.words, 1)], '')
FROM mytable t
CROSS JOIN LATERAL (SELECT regexp_split_to_array(t."name", ' ') words) w;
RIGHT() should do
SELECT RIGHT('MYCOLUMN', 4); -- returns LUMN
Update:
You can convert to array and then back to string
SELECT array_to_string(sentence[(array_length(sentence,1)-3):(array_length(sentence,1))],' ','*')
FROM
(
SELECT regexp_split_to_array('this is the one of the way to get the last four words of the string', E'\\s+') AS sentence
) foo;

Padding Fields With White Space

I have the following piece of code in my SELECT statement -
SELECT convert(varchar (24),ra.Reference)
If a result is - R0_2, so 4 characters, how do you go about padding the trailing space (to the right) with the remaining 20 characters to make up 24?
Similar in that if I have a figure of say 18.00 what I want is to add a # to the front, which I know I can achieve with a CONCAT.
However this field I want to be 16 characters and any leading space to be filled with white space, so this example would look like -
'xxxxxxxxxx#18.00' (where x is a blank space)
Thank you for any advice.
One trick you can use is to just concatenate to the string an amount of padding which is guaranteed to fill the missing spaces. For the case of a string 24 characters long, in your first example, we can concatenate 24 spaces to the end of that string. Then, take the first 24 characters from the left, and the resulting string should be right padded by spaces. Similar logic applies to the other case.
First query:
SELECT LEFT(CONVERT(varchar(24), ra.Reference) + SPACE(24), 24)
FROM yourTable
Second query:
SELECT RIGHT(SPACE(16) + '#' + CONVERT(varchar(16), ra.TotalValue), 16)
FROM yourTable
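The pad-then-cut trick is easy to verify on its own. Below is a minimal Python sketch of the same idea, using the example values from the question; pad_right and pad_left are hypothetical helper names, and string slicing plays the role of LEFT and RIGHT.

```python
def pad_right(value: str, width: int) -> str:
    """Append `width` spaces, then keep the leftmost `width` characters."""
    return (value + " " * width)[:width]

def pad_left(value: str, width: int) -> str:
    """Prepend `width` spaces, then keep the rightmost `width` characters."""
    return (" " * width + value)[-width:]

print(repr(pad_right("R0_2", 24)))        # 'R0_2' followed by 20 spaces
print(repr(pad_left("#" + "18.00", 16)))  # 10 spaces followed by '#18.00'
```

As with LEFT and RIGHT, a value longer than the target width is cut down to that width rather than left intact.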
You could also use REPLICATE to accurately pad based on the length of text for each cell to ensure it's always 24 characters:
DECLARE @Test1 VARCHAR(24) = 'Test',
@Test2 VARCHAR(24) = 'Longer String'
SELECT CONCAT(@Test1, REPLICATE(N' ', 24 - LEN(@Test1))),
CONCAT(@Test2, REPLICATE(N' ', 24 - LEN(@Test2)))
And for the #....
DECLARE @Number DECIMAL(4,2) = 18.00
SELECT CONCAT(REPLICATE(' ', 15 - LEN(CONVERT(VARCHAR(16), @Number))), '#', @Number)
I used 15 here despite it being 16 characters to account for the addition of the #

How to replace captured group with evaluated expression (adding an integer value to capture group)

I need to convert some strings with this format:
B12F34
to something like that:
Building 12 - Floor 34
but I have to add a value, say 10, to the second capture group so the new string would be as:
Building 12 - Floor 44
I can use this Postgres statement to get almost everything done, but I don't know how to add the value to the second capture group.
SELECT regexp_replace('B12F34', 'B(\d+)F(\d+)', 'Building \1 - Floor \2', 'g');
I have been searching for a way to add a value to \2 but all I have found is that I can use 'E'-modifier, and then \1 and \2 need to be \\1 and \\2:
SELECT regexp_replace('B12F34', 'B(\d+)F(\d+)', E'Building \\1 - Floor \\2', 'g')
I need a statement like this one:
SELECT regexp_replace('B12F34', 'B(\d+)F(\d+)', E'Building \\1 - Floor \\2+10', 'g')
to get ........ Floor 44 instead of ........ Floor 34+10
You cannot do this with a regular expression alone, because regular expressions do not support arithmetic on captured groups, even if they consist only of digits. So you have to extract the group that represents the floor number, do the math, and splice the result back in:
SELECT regexp_replace('B12F34', 'B(\d+)F(\d+)', 'Building \1 - Floor ') ||
((regexp_matches('B12F34', '[0-9]+$'))[1]::int + 10)::text;
Not very efficient because of the two regexp calls. Another option is to just get the two numbers in a sub-query and assemble the string in the main query:
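-- %s inserts the plain value; %L would wrap it in quotes as an SQL literal.
SELECT format('Building %s - Floor %s', m.num[1], (m.num[2])::int + 10)
FROM (
-- Without the 'g' flag, both captured groups come back in one array.
SELECT regexp_matches('B12F34', 'B(\d+)F(\d+)') AS num) m;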
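For comparison, the same "capture, compute, splice back" idea looks like this in a language whose regex replace accepts a function. This is a Python sketch, not a Postgres solution; relabel and floor_offset are names chosen for the example.

```python
import re

def relabel(code: str, floor_offset: int = 10) -> str:
    """Rewrite 'B12F34' as 'Building 12 - Floor 44' (floor + floor_offset)."""
    def build(match):
        building = match.group(1)
        # Convert the captured digits to an integer before adding.
        floor = int(match.group(2)) + floor_offset
        return f"Building {building} - Floor {floor}"
    return re.sub(r"B(\d+)F(\d+)", build, code)

print(relabel("B12F34"))  # Building 12 - Floor 44
```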