Postgres regex stop creating double space - postgresql

I have a query that looks like this:
select regexp_replace('john (junior) jones','\([^)]*\)','','g');
regexp_replace
------------------
john  jones
As you can see, this query removes the values in brackets but it results in a double space remaining.
Is there an easy way around this?
So far I have this, which works to an extent:
select regexp_replace((regexp_replace('john (junior) jones','\([^)]*\)','','g')),'\s','');
regexp_replace
------------------
john jones
The above works but not when I pass through something like this:
select regexp_replace((regexp_replace('john (junior) jones (hughes) smith','\([^)]*\)','','g')),'\s','');
regexp_replace
---------------------
john jones  smith

SELECT regexp_replace(
'john (junior) jones (hughes) smith',
' *\([^)]*\) *',
' ',
'g'
);
regexp_replace
══════════════════
john jones smith
(1 row)
To explain the regular expression:
an arbitrary number of spaces, followed by an opening parenthesis ( *\()
an arbitrary number of characters that are not a closing parenthesis ([^)]*)
a closing parenthesis and arbitrarily many spaces (\) *)
That is replaced with a single space.
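For readers who want to try the pattern outside the database, here is a minimal sketch using Python's re module. The function name drop_parenthesised and the final strip() call are my own additions, not part of the answer above; strip() only tidies the space left behind when a bracketed group sits at the very start or end of the string.

```python
import re

# Same pattern as the SQL above: optional spaces, a parenthesised group,
# optional spaces. re.sub replaces every match by default, which
# corresponds to the 'g' flag of regexp_replace.
PAREN_GROUP = re.compile(r" *\([^)]*\) *")


def drop_parenthesised(text: str) -> str:
    """Replace each parenthesised group and its surrounding spaces with one space."""
    # strip() is an addition that is not in the SQL answer: it removes the
    # single space left over when a group is at the start or end of the text.
    return PAREN_GROUP.sub(" ", text).strip()


print(drop_parenthesised("john (junior) jones (hughes) smith"))  # john jones smith
print(drop_parenthesised("john jones (junior)"))                 # john jones
```

In Postgres itself, wrapping the regexp_replace call in btrim() should give the same trimming effect.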

Related

Get only character records from column

Table:
create table tbl_prefix
(
col_pre varchar
);
Records:
insert into tbl_prefix values
('Mr.'),('Mrs.'),('Ms.'),('Dr.'),
('Jr.'),('Sr.'),('II'),('III'),
('IV'),('V'),('VI'),('VII'),
('VIII'),('I'),('IX'),('X'),
('Officer'),('Judge'),('Master');
Expected output:
col_pre
----------
Mr.
Mrs.
Ms.
Dr.
Jr.
Sr.
Officer
Judge
Master
My attempt:
select *
from tbl_prefix
where col_pre ~ '[^a-zA-Z]'
But I am getting:
col_pre
----------
Mr.
Mrs.
Ms.
Dr.
Jr.
Sr.
One approach here might be to match any prefix which is not a Roman numeral:
SELECT *
FROM tbl_prefix
WHERE col_pre !~ '^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$';
The regex pattern used here for Roman numerals was gratefully taken from this SO question:
How do you match only valid roman numerals with a regular expression?
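The same filter can be checked quickly in Python. This is only a sketch: the list PREFIXES copies the rows from the question, and the helper name non_roman is hypothetical. One thing to be aware of is that every part of the pattern is optional, so it also matches the empty string, which means an empty col_pre would be treated as a Roman numeral and filtered out.

```python
import re

# Roman numeral pattern copied from the answer above.
ROMAN = re.compile(r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")

# Rows from the question's INSERT statement.
PREFIXES = [
    "Mr.", "Mrs.", "Ms.", "Dr.", "Jr.", "Sr.", "II", "III",
    "IV", "V", "VI", "VII", "VIII", "I", "IX", "X",
    "Officer", "Judge", "Master",
]


def non_roman(values):
    """Keep only the values that are not Roman numerals (the !~ in the SQL)."""
    return [v for v in values if not ROMAN.match(v)]


for prefix in non_roman(PREFIXES):
    print(prefix)
```

This prints the nine rows listed under "Expected output" in the question, in insertion order.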

How to find non-ASCII symbols in a string in DB2?

How can we find non-ASCII symbols in a string? (We are using DB2.)
We have tried the following SELECT statement, but it is not working.
SELECT columnname
FROM tablename
WHERE columnname LIKE '%[' + CHAR(127) + '-' + CHAR(255) + ']%'
COLLATE Latin1_General_100_BIN2
I guess you were trying to use the CHR() function rather than CHAR(), which is a data type.
If you are using a newer DB2 version that has the REGEXP functions, you can try the REGEXP_LIKE() function.
Here is an example from the sample database:
SELECT EMPNO, LASTNAME FROM EMPLOYEE WHERE REGEXP_LIKE(LASTNAME,'[E-H]')
EMPNO LASTNAME
------ ---------------
000010 HAAS
000020 THOMPSON
000050 GEYER
000060 STERN
000090 HENDERSON
000100 SPENSER
000110 LUCCHESSI
000120 O'CONNELL
000140 NICHOLLS
000170 YOSHIMURA
000180 SCOUTTEN
000190 WALKER
000210 JONES
000230 JEFFERSON
000250 SMITH
000260 JOHNSON
000270 PEREZ
000280 SCHNEIDER
000290 PARKER
000300 SMITH
000310 SETRIGHT
000320 MEHTA
000330 LEE
000340 GOUNOT
200010 HEMMINGER
200220 JOHN
200240 MONTEVERDE
200280 SCHWARTZ
200310 SPRINGER
200330 WONG
30 record(s) selected.
All the selected names contain letters from E to H, as specified by the search pattern.
As I didn't have any row containing characters in the 127-255 range, I updated one of the rows, appending chars 169 and 174 to it.
Update employee set LASTNAME = ('LEE' || chr(169) || chr(174)) WHERE LASTNAME = 'LEE'
Then, using the REGEXP_LIKE function:
SELECT EMPNO, LASTNAME FROM EMPLOYEE WHERE REGEXP_LIKE(LASTNAME , '[' || CHR(127) || '-' || CHR(255) || ']')
EMPNO LASTNAME
------ ---------------
000330 LEE©®
1 record(s) selected.
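To see what that character class does and does not catch, here is a small Python sketch of the same idea. The helper names are hypothetical. Two assumptions are worth stating: code point 127 (DEL) is technically still ASCII, and a range that stops at 255 will miss anything above it, such as the euro sign. For a strict "non-ASCII" test in Python, str.isascii() (Python 3.7+) is the simpler check.

```python
import re

# Same range as the SQL above: code points 127 through 255.
HIGH_RANGE = re.compile(r"[\x7f-\xff]")


def has_high_chars(text: str) -> bool:
    """True if text contains a character with code point 127..255."""
    return HIGH_RANGE.search(text) is not None


def has_non_ascii(text: str) -> bool:
    """Stricter variant: True if any character is outside 0..127."""
    return not text.isascii()


names = ["HAAS", "LEE" + chr(169) + chr(174), "O'CONNELL"]
print([n for n in names if has_high_chars(n)])  # only the modified LEE row
```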

Regexp replace country code and blank spaces from phone number

Please help me with this issue. I'm very bad at regex things. I need to remove country code and blank spaces at once from phone number. Something like:
'+12 345 678' to '345678'. Thanks for any help!
Assuming the country code is always the first three characters:
SELECT replace(right('+12 345 678', -3), ' ', '')
right('xyz', -3) removes the first three characters
replace('xyz', ' ', '') removes the spaces.
More generally, applied to a table column:
SELECT
replace(right(numbers, -3), ' ', '')
FROM
phone
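The fixed-width assumption breaks as soon as a country code has a different length. As a sketch only, here is the same logic in Python next to a regex variant; both function names are hypothetical. The regex variant assumes the number starts with "+" and that whitespace separates the country code from the rest; without that separator there is no way to tell where the code ends.

```python
import re


def strip_fixed_prefix(number: str, width: int = 3) -> str:
    """Mirror of replace(right(numbers, -3), ' ', ''): cut `width` chars, drop spaces."""
    return number[width:].replace(" ", "")


def strip_country_code(number: str) -> str:
    """Remove a leading '+digits' group and all whitespace in one pass."""
    return re.sub(r"^\+\d+|\s", "", number)


print(strip_fixed_prefix("+12 345 678"))   # 345678
print(strip_country_code("+12 345 678"))   # 345678
print(strip_fixed_prefix("+420 345 678"))  # 0345678 (fixed width cuts too little)
print(strip_country_code("+420 345 678"))  # 345678
```

The regex should carry over to Postgres as regexp_replace(numbers, '^\+\d+|\s', '', 'g'), under the same assumptions.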

Postgresql: Remove hyphens and whitespaces

I am currently working on DB data that contains whitespace and hyphens. I searched the net and found the question "Remove/replace special characters in column values?". I tried to follow the answer there, but I am still getting hyphens. After playing around with it, I can only remove the whitespace:
import psycopg2 as p  # assumption: the original snippet does not show how `p` was imported

conn_p = p.connect("dbname='p_test' user='postgres' password='postgres' host='localhost'")
conn_t = p.connect("dbname='t_mig1' user='postgres' password='postgres' host='localhost'")
cur_p = conn_p.cursor()
cur_t = conn_t.cursor()
cur_t.execute("SELECT CAST(REGEXP_REPLACE(studentnumber, ' ', '') as integer), firstname, middlename, lastname FROM sprofile")
rows = cur_t.fetchall()
for row in rows:
    print "Inserting ", row[0], row[1], row[2], row[3]
    cur_p.execute(""" INSERT INTO "a_recipient" (id, first_name, middle_name, last_name) VALUES ('%s', '%s', '%s', '%s') """ % (row[0], row[1], row[2], row[3]))
# commit() belongs to the connection, not the cursor
conn_p.commit()
cur_p.close()
cur_t.close()
What I would like to achieve is if I got a studentnumber of 001-2012-1456, it will be displayed as 000120121456.
To wipe out all characters in a set efficiently use translate. It takes a set of characters to translate into another set of characters. If the other set is empty it deletes them.
test=> select translate('001-2012-145 6', '- ', '');
translate
-------------
00120121456
While translate is simpler and faster for this particular job, it's important to know how to use regexes for others. To do it with regexp_replace, there are two changes you need to make.
First, you have to match the set of hyphen and space as [- ].
Then, you have to specify that all occurrences should be replaced, otherwise it will stop after the first one. That's done with the g flag.
test=> select regexp_replace('001-2012-145 6', '[- ]', '', 'g');
regexp_replace
----------------
00120121456
Here's a tutorial on POSIX regular expressions and character sets.
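Since the question's script is written in Python, here is a sketch of the same two approaches on the client side, using str.translate and re.sub. The function names are hypothetical. The last line illustrates a separate point: converting the cleaned value to an integer, as the CAST in the question does, drops the leading zeros, so the value has to stay a string if they matter.

```python
import re

# Translation table that deletes hyphens and spaces (third argument = chars to drop).
DELETE_HYPHEN_SPACE = str.maketrans("", "", "- ")


def clean_translate(value: str) -> str:
    """Counterpart of translate(value, '- ', '')."""
    return value.translate(DELETE_HYPHEN_SPACE)


def clean_regex(value: str) -> str:
    """Counterpart of regexp_replace(value, '[- ]', '', 'g')."""
    return re.sub(r"[- ]", "", value)


print(clean_translate("001-2012-145 6"))  # 00120121456
print(clean_regex("001-2012-145 6"))      # 00120121456
print(int(clean_regex("001-2012-145 6"))) # 120121456, leading zeros are gone
```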
It's very simple with the built-in translate function.
Example:
select translate('001-2012-145 6', '- ', '');
Output of above command :
00120121456

Return first and last words in a person name - postgres

I have a list of names and I want to separate the first and last words in a person's name.
I was trying to use the "trim" function without success.
Can someone explain how could I do it?
table:
Names
Mary Johnson Angel Smith
Dinah Robertson Donald
Paul Blank Power Silver
Then I want to have as a result:
Names
Mary Smith
Dinah Donald
Paul Silver
Thanks,
You can do it simply with regular expressions, like:
substring(trim(name) FROM '^([^ ]+)') || ' ' || substring(trim(name) FROM '([^ ]+)$')
Of course, this only works if you are 100% sure that at least a first and a last name are always supplied. I'm not 100% sure that is the case for everybody in the world. For instance, would it work for names in Chinese? I'm not sure, and I avoid making any assumptions about people's names. The best option is simply to ask the user for two fields, one for the "name" and another for "How would you like to be called?".
Another approach, which takes advantage of Postgres string processing built-in functions:
SELECT split_part(name, ' ', 1) as first_token,
split_part(name, ' ', array_length(regexp_split_to_array(name, ' '), 1)) as last_token
FROM mytable
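A quick way to sanity-check the expected results is a few lines of Python. This is a sketch with a hypothetical function name; unlike the SQL expressions above, it returns a single-word name unchanged instead of repeating it, and it treats any run of whitespace as one separator.

```python
def first_and_last(name: str) -> str:
    """Return the first and last whitespace-separated words of a name."""
    words = name.split()  # split() with no argument also trims and collapses spaces
    if not words:
        return ""
    if len(words) == 1:
        return words[0]   # design choice: do not duplicate a single-word name
    return f"{words[0]} {words[-1]}"


for full_name in ["Mary Johnson Angel Smith", "Dinah Robertson Donald", "Paul Blank Power Silver"]:
    print(first_and_last(full_name))
```

For the three rows in the question this prints Mary Smith, Dinah Donald and Paul Silver.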
Here's how I extracted full names from emails with a dot in them, e.g. Jeremy.Thompson#abc.com:
SELECT split_part(email, '.', 1) || ' ' || replace(split_part(email, '.', 2), '#abc','')
FROM people
Result:
Jeremy | Thompson
If the parts are separated by a space rather than a dot, you can simply use a space as the delimiter:
SELECT split_part(email, ' ', 1) || ' ' || replace(split_part(email, ' ', 2), '#abc','')
FROM people
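To follow what split_part does step by step, here is a rough Python stand-in. It is a sketch, not the Postgres implementation: the helper only handles positive field numbers, and it returns an empty string when the requested field does not exist, which is how I understand split_part to behave. The sample address is the one from the answer above, written as it appears there.

```python
def split_part(text: str, delimiter: str, n: int) -> str:
    """Rough stand-in for Postgres split_part (1-based, positive n only)."""
    parts = text.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


email = "Jeremy.Thompson#abc.com"

# Field 1 is the first name; field 2 still carries the '#abc' part,
# which replace() removes, as in the SQL.
full_name = split_part(email, ".", 1) + " " + split_part(email, ".", 2).replace("#abc", "")
print(full_name)  # Jeremy Thompson
```

Note that the hard-coded '#abc' only works for that one domain, so addresses from other domains would need a more general rule.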