How to find text pattern in string for postgresql SQL - postgresql

For this sample string: "... Key Match extra text..."
How do I get the value "Match", which is the string between blank spaces after "Key"?
is there a better way than:
Find position of "Key "->pos1, find position of first blank space after p1 -> p2, substring(string, p1,p2)?
This is not working as I expected
Select substring('Key Match extra text', 'Key (.+) ');
---
Match extra

You can make the regex be "non-greedy", so that .+ matches as few as possible:
Select substring('Key Match extra text', 'Key (.+?) ');
Or you can change . to something that won't match spaces:
Select substring('Key Match extra text', 'Key (\S+) ');

Related

DB2 remove empty lines

I have strings like this
#
word_1
word_2
#
word_3
#
#
where # represents empty lines.
I'd like to remove those empty lines, for getting
word_1
word_2
word_3
I've tried replacing CHR(10) and CHR(13) with '' but then I get
word_1word_2word_3
I've seen I can remove the first empty line using LTRIM, but how to get rid of all of them?
You must remove all new-line characters followed by new-line character, and a single new-line character at the start and the end of a string. All these replacements can be done with a single expression.
Starting from v11.1
select regexp_replace (s, '\r\n(?=\r\n)|^\r\n|\r\n$', '')
from (values x'0d0a' || 'abc' || x'0d0a0d0a'|| 'def' || x'0d0a') t (s)
Note, that you may have a new-line character encoded as x'0a' instead of x'0d0a'. Remove all the \r characters in this case from the expression above.
dbfiddle link.
Starting from v9.7
select xmlcast (xmlquery ('replace (replace ($d, "^\r\n|\r\n$", ""), "(\r\n){2,}", "$1")' passing s as "d") as varchar (100))
from (values x'0d0a' || 'abc' || x'0d0a0d0a'|| 'def' || x'0d0a') t (s)
dbfiddle link.

Postgres: Retrieve first n words from column

I know that I can do a text search in Postgres with TextSearch and get some result with
select ts_headline('german',content, tq, 'MaxFragments=4, MinWords=5, MaxWords=12,
ShortWord=3, StartSel = <strong>, StopSel = </strong>') as highlight, ...
FROM to_tsquery('german', 'test') tq ...
Is there a similar way to apply to content the same limitations? i.e. to get directly up to 12 words from the column content.
You could use regular expressions:
SELECT (regexp_match(
regexp_replace(content, '[^\w\s]+', ' ', 'g'),
'^\s*((?:\w+\s+){9}\w+)'
))[1] FROM ...
That will first replace everything that is not a space or alphanumerical character with a space and then return the first 10 words.

PostgreSQL return last n words

How to return last n words using Postgres.
I have tried using LEFT method.
SELECT DISTINCT LEFT(name, -4) FROM my_table;
but it return last 4 characters ,i want to return last 3 words.
demo:db<>fiddle
You can do this using a the SUBSTRING() function and regular expressions:
SELECT
SUBSTRING(name FROM '((\S+\s+){0,3}\S+$)')
FROM my_table
This has been explained here: How can I match the last two words in a sentence in PostgreSQL?
\S+ is a string of non-whitespace characters
\s+ is a string of whitespace characters (e.g. one space)
(\S+\s+){0,3} Zero to three words separated by a space
\S+$ one word at the end of the text.
-> creates 4 words (or less if there are no more).
One way is to use regexp_split_to_array() to split the string into the words it contains and then put a string back together using the last 3 words in that array.
SELECT coalesce(w.words[array_length(w.words, 1) - 2] || ' ', '')
|| coalesce(w.words[array_length(w.words, 1) - 1] || ' ', '')
|| coalesce(w.words[array_length(w.words, 1)], '')
FROM mytable t
CROSS JOIN LATERAL (SELECT regexp_split_to_array(t."name", ' ') words) w;
db<>fiddle
RIGHT() should do
SELECT RIGHT('MYCOLUMN', 4); -- returns LUMN
UPD
You can convert to array and then back to string
SELECT array_to_string(sentence[(array_length(sentence,1)-3):(array_length(sentence,1))],' ','*')
FROM
(
SELECT regexp_split_to_array('this is the one of the way to get the last four words of the string', E'\\s+') AS sentence
) foo;
DEMO HERE

Postgresql: Remove hyphens and whitespaces

I am currently working on DB data which contains whitespaces and hyphens. I searched over the net and found this Remove/replace special characters in column values? . I tried to follow the answer but I am still getting hyphens. I tried playing around with it, I can only remove the whitespace
conn_p = p.connect("dbname='p_test' user='postgres' password='postgres' host='localhost'")
conn_t = p.connect("dbname='t_mig1' user='postgres' password='postgres' host='localhost'")
cur_p = conn_p.cursor()
cur_t = conn_t.cursor()
cur_t.execute("SELECT CAST(REGEXP_REPLACE(studentnumber, ' ', '') as integer), firstname, middlename, lastname FROM sprofile")
rows = cur_t.fetchall()
for row in rows:
print "Inserting ", row[0], row[1], row[2], row[3]
cur_p.execute(""" INSERT INTO "a_recipient" (id, first_name, middle_name, last_name) VALUES ('%s', '%s', '%s', '%s') """ % (row[0], row[1], row[2], row[3]))
cur_p.commit()
cur_pl.close()
cur_t.close()
What I would like to achieve is if I got a studentnumber of 001-2012-1456, it will be displayed as 000120121456.
To wipe out all characters in a set efficiently use translate. It takes a set of characters to translate into another set of characters. If the other set is empty it deletes them.
test=> select translate('001-2012-145 6', '- ', '');
translate
-------------
00120121456
While translate is simpler and faster for this particular job, it's important to know how to use regexes for others. To do it with regexp_replace there's two changes you need to make.
First, you have to match the set of - and as [- ].
Then, you have to specify to replace all occurrences, otherwise it will stop after the first one. That's done with the g flag.
test=> select regexp_replace('001-2012-145 6', '[- ]', '', 'g');
regexp_replace
----------------
00120121456
Here's a tutorial on POSIX regular expressions and character sets.
Its very simple to use inbuilt translate function.
Example:
select translate('001-2012-145 6', '- ', '');
Output of above command :
00120121456

Truncating leading zero from the string in postgresql

I'm trying to truncate leading zero from the address. example:
input
1 06TH ST
12 02ND AVE
123 001St CT
expected output
1 6TH ST
12 2ND AVE
123 1St CT
Here is what i have:
update table
set address = regexp_replace(address,'(0\d+(ST|ND|TH))','?????? need help here')
where address ~ '\s0\d+(ST|ND|TH)\s';
many thanks in advance
assuming that the address always has some number/letter address (1234, 1a, 33B) followed by a sequence of 1 or more spaces followed by the part you want to strip leading zeroes...
select substr(address, 1, strpos(address, ' ')) || ltrim(substr(address, strpos(address, ' ')), ' 0') from table;
or, to update the table:
update table set address = substr(address, 1, strpos(address, ' ')) || ltrim(substr(address, strpos(address, ' ')), ' 0');
-g
What you are looking for is the back references in the regular expressions:
UPDATE table
SET address = regexp_replace(address, '\m0+(\d+\w+)', '\1', 'g')
WHERE address ~ '\m0+(\d+\w+)'
Also:
\m used to match the beginning of a word (to avoid replacing inside words (f.ex. in 101Th)
0+ truncates all zeros (does not included in the capturing parenthesis)
\d+ used to capture the remaining numbers
\w+ used to capture the remaining word characters
a word caracter can be any alphanumeric character, and the underscore _.