regexp_replace in PostgreSQL

I want to turn the string E-*-[_]F* into E-*-\_F*. The code I am using is below.
select regexp_replace('E-*-[_]F*','-[\[(.)\]]', E'\\', 'g'); -- E-*\_]F*
I am not able to remove the closing bracket.

Assuming you want the character inside the brackets to be placed after a backslash:
jasen=# select regexp_replace('E-*-[_]F*','-\[(.)\]', '\\\1', 'g');
regexp_replace
----------------
E-*\_F*
(1 row)
The pattern looks for any single character (.) between -[ and ].
The parentheses capture that character so it can be reused in the replacement.
The whole match, including the leading hyphen, is replaced with a backslash (written \\ in the replacement) followed by the first (and only) captured group, \1.
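
To see why the original attempt left the closing bracket behind, it may help to run the variants side by side. This is a minimal sketch that assumes standard_conforming_strings is on (the default in current PostgreSQL releases), so backslashes in plain '...' literals reach the regex engine unchanged. The third query is not from the answer; it is an assumed variant for the case where the hyphen should be kept.

```sql
-- Original attempt: [\[(.)\]] is ONE bracket expression, so it matches a single
-- character out of [ ( . ) ] and never reaches the closing bracket.
SELECT regexp_replace('E-*-[_]F*', '-[\[(.)\]]', E'\\', 'g');  -- E-*\_]F*

-- Answer's pattern: the escaped brackets are literals, (.) captures what is between them.
SELECT regexp_replace('E-*-[_]F*', '-\[(.)\]', '\\\1', 'g');   -- E-*\_F*

-- Assumed variant: leave the hyphen out of the pattern so it survives.
SELECT regexp_replace('E-*-[_]F*', '\[(.)\]', '\\\1', 'g');    -- E-*-\_F*
```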

Related

Postgres escape double quotes

I am working with a malformed database which seems to have double quotes as part of the column names.
Example:
|"Market" |
|---------|
|Japan |
|UK |
|USA |
I want to select it like this:
SELECT "\"Market\"" FROM mytable; /* Does not work */
How does one select such a thing?

The documentation says:
[A] delimited identifier or quoted identifier […] is formed by enclosing an arbitrary sequence of characters in double-quotes ("). […]
Quoted identifiers can contain any character, except the character with code zero. (To include a double quote, write two double quotes.)
So you'll want to use
SELECT """Market""" AS "Market" FROM mytable;
An alternative is the Unicode-escape form of quoted identifiers, which the documentation describes as follows:
A variant of quoted identifiers allows including escaped Unicode characters identified by their code points. This variant starts with U& (upper or lower case U followed by ampersand) immediately before the opening double quote, without any spaces in between, for example U&"foo". […] Inside the quotes, Unicode characters can be specified in escaped form by writing a backslash followed by the four-digit hexadecimal code point number or alternatively a backslash followed by a plus sign followed by a six-digit hexadecimal code point number.
which in your case would mean
SELECT U&"\0022Market\0022" AS "Market" FROM mytable;
SELECT U&"\+000022Market\+000022" AS "Market" FROM mytable;
Disclaimer: your database may not actually have double quotes as part of the name itself. As mentioned in the comments, this might just be how the tool you are using displays a column named Market (not market), since
Quoting an identifier also makes it case-sensitive
So all you might need is
SELECT "Market" FROM mytable;
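
To find out whether the quotes really are part of the column name, one option is to ask the catalog rather than trust the client tool's display. This is a sketch, assuming the table is called mytable as in the question; quote_ident is a built-in function that returns an identifier the way it has to be written in SQL.

```sql
-- List the raw column names next to their correctly quoted form.
SELECT column_name,
       quote_ident(column_name::text) AS how_to_write_it
FROM information_schema.columns
WHERE table_name = 'mytable';

-- quote_ident doubles embedded double quotes and only quotes when needed:
SELECT quote_ident('"Market"');  -- """Market"""
SELECT quote_ident('Market');    -- "Market"
SELECT quote_ident('market');    -- market
```

If the first query shows the name without embedded quotes, the plain "Market" form is enough.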

How to replace a chain of the same characters of unknown length in a string with a single occurrence of it?

I have the following string in my DB, in a column we will name 'info', in a table we can call 'conversation':
Hello, I''''''''''''''''m Brian and I''''m looking for the kitchen
I would like to know if it's possible to collapse each run of '''''' into a single occurrence of the character in PostgreSQL.
So:
Hello, I'm Brian and I'm looking for the kitchen

You can use regexp_replace for that:
select regexp_replace(info, '''+', '''', 'g')
from conversation;
The regular expression looks a bit weird due to the escaping of the single quotes, but it essentially is '+, which means "one single quote followed by any number of further single quotes". The replacement value (third parameter) is just one single quote.
Online example: https://rextester.com/HGWDZ41975
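
The same call may be easier to read with dollar quoting, where single quotes need no doubling. A minimal sketch that uses a literal in place of the info column:

```sql
-- $$'+$$ is the pattern '+ and $$'$$ is the replacement ' (no escaping needed).
SELECT regexp_replace(
         $$Hello, I''''m Brian and I''m looking for the kitchen$$,
         $$'+$$, $$'$$, 'g');
-- Hello, I'm Brian and I'm looking for the kitchen
```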

How to replace a single quote character with two single quote characters

I want to replace all instances of a single quote (') with two single quotes ('') in a string.
Let's say e'QmfgLu/]sf]/sd is a string and I want to replace ' with ''.
The result must be e''OmfgLu/]
I tried this query:
update customer set name=REGEXP_REPLACE(name, E'\'', '''');
also
update customer set name=REPLACE(name, E'\'', '''');
Neither query works properly. What is the right way to write it?

You may replace a single occurrence of single quotes with 2 quotes using this regexp.
update customer set name=REGEXP_REPLACE(name, $$([^'])'([^'])$$, $$\1''\2$$ ,'g');
$$([^'])'([^'])$$ - represents a sequence of any character other than a single quote followed by a quote and then a non-quote character.
I'm using the dollar quoting to avoid confusing quotes.
EDIT
As @edruid pointed out, to handle quotes at the start and end of the string, use:
REGEXP_REPLACE(name, $$([^']|^)'(?!')$$, $$\1''$$ ,'g')
This uses a negative lookahead, (?!'), so that the matched quote is not followed by another single quote.

In Postgres, a single quote inside a string literal is written as '' (the quote acts as its own escape character), so your replacement would be:
update customer set name=REGEXP_REPLACE(name, E'''', '''''', 'g');
(skip the final 'g' if you only want to replace the first ')
or without resorting to regexp:
update customer set name=REPLACE(name, '''', '''''');
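
The two attempts in the question fail because E'\'' and '''' are both spellings of the same one-character string, so each quote was replaced by itself. The sketch below makes that visible and then compares the approaches on a made-up input, a'b'c. With that input the first regex leaves the second quote untouched, because its match consumes the neighbouring character, while plain replace and the lookahead form double both quotes.

```sql
-- Both literals are a single quote character.
SELECT length(E'\'') AS e_len, length('''') AS plain_len;              -- 1, 1

-- Dollar quoting keeps the quote counts readable.
SELECT replace($$a'b'c$$, $$'$$, $$''$$);                              -- a''b''c
SELECT regexp_replace($$a'b'c$$, $$([^'])'([^'])$$, $$\1''\2$$, 'g');  -- a''b'c
SELECT regexp_replace($$a'b'c$$, $$([^']|^)'(?!')$$, $$\1''$$, 'g');   -- a''b''c
```

Note that replace doubles every quote, including quotes that are already doubled, whereas the lookahead pattern skips a quote that is followed by another quote.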

Do Postgres string functions (including regexp_*) consider tabs as spaces?

I am running some SQL on a Postgres 9.5 server.
A field sometimes has leading whitespace, including literal spaces and tab characters ('\t').
In many programming languages this is easy to strip with a regexp replace, like this in JavaScript:
> ' \tafsdfwef\t \n'.replace(/\s+/g, '')
'afsdfwef'
Then I found that PostgreSQL also has a regexp_replace function, and that it also supports \s as a shorthand for [[:space:]]:
https://www.postgresql.org/docs/10/functions-matching.html#FUNCTIONS-POSIX-REGEXP
But \s seems to recognize only literal spaces (' '). Does the PostgreSQL regex engine treat \s as covering all kinds of whitespace (tabs, newlines)?
db=> SELECT regexp_replace('\tafsdfwef', '\s+', '');
regexp_replace
----------------
\tafsdfwef
(1 row)
db=> SELECT regexp_matches('\tafsdfwef', '\s+');
regexp_matches
----------------
(0 rows)
Then I tested whether the trim function recognizes the other whitespace characters. It seems it does not either:
db=> SELECT trim('\tafsdfwef\t');
btrim
--------------
\tafsdfwef\t
(1 row)
db=> SELECT trim(' \tafsdfwef\t');
btrim
--------------
\tafsdfwef\t
(1 row)
db=> SELECT trim(' \tafsdfwef\t \n ');
btrim
------------------
\tafsdfwef\t \n
(1 row)
So, does PostgreSQL have an easy function that strips all kinds of whitespace at the start, in the middle, and at the end of a string?
EDIT: My complaint is also about the PostgreSQL documentation: it maps \s to [[:space:]], but that does not seem to cover all the kinds of whitespace most programmers expect, and it mentions POSIX regex but does not seem to be really POSIX. Does anyone know the right place to file a bug?
https://www.postgresql.org/docs/10/functions-matching.html#FUNCTIONS-POSIX-REGEXP
EDIT: here is what the Mozilla JavaScript documentation says \s means:
a single white space character, including space, tab, form feed, line feed and other Unicode spaces. Equivalent to [ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp

Yes, Postgres regexp functions do treat a tab as whitespace. However, the text '\tafsdfwef' does not actually contain a tab character. You have to write the letter E (upper or lower case) just before the opening single quote for \t (and other escape sequences) to be interpreted:
SELECT regexp_replace(E'\ta\nb\fc\rd', '\s', '', 'g');
regexp_replace
----------------
abcd
(1 row)
Read about string constants in the documentation.
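
A short sketch of the difference, assuming standard_conforming_strings is on (the default since PostgreSQL 9.1): in a plain literal \t is two characters, while in an E'' literal it is a tab. The remaining queries show some ways to strip whitespace once the input really contains it; the sample strings are made up for illustration.

```sql
SELECT length('\tab')  AS plain_len,    -- 4: backslash, t, a, b
       length(E'\tab') AS escaped_len;  -- 3: tab, a, b

-- Remove every whitespace character, wherever it occurs.
SELECT regexp_replace(E' \tafsdfwef\t \n', '\s+', '', 'g');   -- afsdfwef

-- Remove only leading and trailing whitespace.
SELECT regexp_replace(E' \ta b\t \n', '^\s+|\s+$', '', 'g');  -- a b

-- trim/btrim strip only spaces by default, but accept a list of characters.
SELECT btrim(E' \ta b\t \n', E' \t\n\r');                     -- a b
```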

PostgreSQL regexp_replace with matched expression

I am using the PostgreSQL regexp_replace function to escape square brackets, parentheses and backslashes in a string so that I can use that string as a regex pattern itself (there are other manipulations done on this string before using it, but they are outside the scope of this question). The idea is to replace:
[ with \[
] with \]
( with \(
) with \)
\ with \\
Postgres documentation page on regular expressions states the following:
The replacement string can contain \n, where n is 1 through 9, to
indicate that the source substring matching the n'th parenthesized
subexpression of the pattern should be inserted, and it can contain \&
to indicate that the substring matching the entire pattern should be
inserted. Write \\ if you need to put a literal backslash in the
replacement text.
However regexp_replace('abc [def]', '([\[\]\(\)\\])', E'\\\1', 'g'); produces abc \ def\.
Further down on that same page, an example is given, which uses \\1 notation - so I tried that.
Yet, regexp_replace('abc [def]', '([\[\]\(\)\\])', E'\\\\1', 'g'); produces abc \1def\1.
I would guess this is expected, but regexp_replace('abc [def]', '([\[\]\(\)\\])', E'.\\1', 'g'); produces abc .[def.]. That is, escaping works with characters other than the standard backslash.
At this point I don't know how to proceed. What can I do to actually give me the replacement I want?

OK, found the answer. Apparently, I need to double-escape the backslash in the replacement. Also, I need to E-prefix and double-escape backslashes in the search pattern on older versions of postgres (8.3 in my case). The final code looks like this:
regexp_replace('abc [def]', E'([\\[\\]\\(\\)\\\\\?\\|_%])', E'\\\\\\1', 'g')
Yes, it looks horrible, but it works :)

The simplest way:
select regexp_replace('abc [def]', '([\[\]\(\)\\])', '\\\1', 'g');
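
How many backslashes are needed depends on which kind of string literal carries the pattern and the replacement. The sketch below, assuming standard_conforming_strings is on, writes the same replacement text (\\\1, that is, a literal backslash followed by group 1) in three ways. The sample input with parentheses and a backslash is made up for illustration.

```sql
-- Plain literal: backslashes reach the regex engine exactly as typed.
SELECT regexp_replace('abc [def] (x) \y', '([\[\]\(\)\\])', '\\\1', 'g');
-- abc \[def\] \(x\) \\y

-- E'' literal: each backslash is doubled once more for the string parser.
SELECT regexp_replace('abc [def] (x) \y', E'([\\[\\]\\(\\)\\\\])', E'\\\\\\1', 'g');
-- same result

-- Dollar quoting: same text as the plain literal, and quote characters need no escaping.
SELECT regexp_replace('abc [def] (x) \y', $$([\[\]\(\)\\])$$, $$\\\1$$, 'g');
-- same result
```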
