PostgreSQL: regexp_replace to remove special characters

I need to remove all special characters in PostgreSQL, such as these:
' " , . / \ | ] [ { } & * - % ^ ! # #
I tried this:
SELECT regexp_replace('Test.010. " # $ %. تجربه', '[^\w\s\u0600-\u06FF]', ' ', 'g');
and the result is:
Test 010
The Arabic characters were removed!
I need to remove only the special characters and keep Arabic letters, English letters, and digits.

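You can use translate to convert those specific characters to spaces. The replacement string must be as long as the list of characters: any listed character without a counterpart in the replacement string is deleted rather than replaced, so repeat(' ', 19) supplies one space for each of the 19 listed characters:
select translate('Test.010. " # $ %. تجربه', '''",./\|][{}&*-%^!##', repeat(' ', 19));
translate
--------------------------
Test 010 $ تجربه
(Runs of spaces are shown collapsed here. The $ is kept because it is not in the list of characters.)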
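To check the character mapping outside the database, here is a minimal Python sketch of the same idea. It assumes Python's str.translate as a stand-in for PostgreSQL's translate; the names SPECIAL, TABLE and blank_special are illustrative and not part of the original answer.

```python
# Characters to blank out, copied from the question (the duplicate '#' is dropped).
SPECIAL = '\'",./\\|][{}&*-%^!#'

# Map every listed character to a space. str.maketrans requires both
# strings to have the same length, so one space is supplied per character.
TABLE = str.maketrans(SPECIAL, ' ' * len(SPECIAL))

def blank_special(text: str) -> str:
    """Replace each listed special character with a single space."""
    return text.translate(TABLE)

original = 'Test.010. " # $ %. تجربه'
cleaned = blank_special(original)

# Collapse runs of spaces for display only; '$' survives because it is not listed.
print(' '.join(cleaned.split()))
```

Because every listed character maps to exactly one space, the cleaned string has the same length as the input; collapsing the runs of spaces is a separate, optional step.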

Related

DB2 remove empty lines

I have strings like this
#
word_1
word_2
#
word_3
#
#
where # represents empty lines.
I'd like to remove those empty lines, for getting
word_1
word_2
word_3
I've tried replacing CHR(10) and CHR(13) with '' but then I get
word_1word_2word_3
I've seen I can remove the first empty line using LTRIM, but how to get rid of all of them?
You need to remove every newline sequence that is immediately followed by another newline sequence, plus a single newline sequence at the start and at the end of the string. All of these replacements can be done with a single expression.
Starting from v11.1
select regexp_replace (s, '\r\n(?=\r\n)|^\r\n|\r\n$', '')
from (values x'0d0a' || 'abc' || x'0d0a0d0a'|| 'def' || x'0d0a') t (s)
Note that your data may encode a newline as x'0a' instead of x'0d0a'. In that case, remove all the \r characters from the expression above.
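As a rough way to see what the three alternatives in that pattern do, the following Python sketch applies the same regular expression to the same sample value. It assumes Python's re module as a stand-in for Db2's regexp_replace; the function and variable names are illustrative.

```python
import re

# Same pattern as the v11.1 expression: a CRLF followed by another CRLF,
# a CRLF at the very start, or a CRLF at the very end of the string.
EMPTY_LINE = re.compile(r'\r\n(?=\r\n)|^\r\n|\r\n$')

def drop_empty_lines(text: str) -> str:
    """Remove the CRLF sequences matched by the pattern above."""
    return EMPTY_LINE.sub('', text)

# Mirrors x'0d0a' || 'abc' || x'0d0a0d0a' || 'def' || x'0d0a'
sample = '\r\nabc\r\n\r\ndef\r\n'
result = drop_empty_lines(sample)
print(repr(result))  # 'abc\r\ndef'
```

The lookahead (?=\r\n) keeps the last CRLF of each run, which is why the two words stay on separate lines instead of being glued together.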
Starting from v9.7
select xmlcast (xmlquery ('replace (replace ($d, "^\r\n|\r\n$", ""), "(\r\n){2,}", "$1")' passing s as "d") as varchar (100))
from (values x'0d0a' || 'abc' || x'0d0a0d0a'|| 'def' || x'0d0a') t (s)

Redshift: how to remove all newline characters in a field

I am wondering how I can remove all newline characters in Redshift from a field. I tried something like this:
replace(replace(body, '\n', ' '), '\r', ' ')
and
regexp_replace(body, '[\n\r]+', ' ')
But it didn't work. Please share if you know how to do this.
Use chr(10) instead of \n, for example:
select replace(CONCAT('Text 1' , chr(10), 'Text 2'), chr(10), '-') as txt
This should help
regexp_replace(column, '\r|\n', '')
To remove line breaks:
SELECT REPLACE('This line has
a line break', CHR(10), '');
This gives output: This line hasa line break. You can see more ASCII or CHR() codes here: https://www.petefreitag.com/cheatsheets/ascii-codes/
To remove sequences like \r, \n, \t that are stored as literal backslash text:
Assume col1 contains text like This line has\r\n special characters.
Using replace()
SELECT REPLACE(REPLACE(col1, '\\r', ''), '\\n', '');
We need to escape \ because backslash is a special character in SQL string literals (used to escape quotes, etc.).
Using regexp_replace()
SELECT REGEXP_REPLACE(col1, '(\\\\r|\\\\n)', '');
We need to escape \ because it is a special character in SQL, and we need to escape the resulting backslashes again because backslash is a special character in regex as well.
Both replace() and regexp_replace() give the output: This line has special characters.
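The answers above cover two different situations: a value that holds a real line feed character, and a value that holds the literal characters backslash plus r or n. This Python sketch contrasts the two. It only illustrates the distinction and does not emulate Redshift's escaping rules; the variable names are illustrative.

```python
import re

# Case 1: the value contains a real line feed character (code point 10).
with_newline = 'Text 1' + chr(10) + 'Text 2'
joined = with_newline.replace(chr(10), '-')
print(joined)  # Text 1-Text 2

# Case 2: the value contains the literal two-character sequences \r and \n.
# The raw string keeps the backslashes as ordinary characters.
with_literal = r'This line has\r\n special characters.'
stripped = re.sub(r'\\r|\\n', '', with_literal)
print(stripped)  # This line has special characters.
```

If a replacement seems to do nothing, checking which of the two cases the data is in is usually the first step.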

ANTLR matches a similar rule (but fails on the parts that differ)

I'm creating an Xtext plugin and for some reason, the following line incorrectly matches the StringStatement rule when it should match the UnstringStatement rule:
UNSTRING test2 DELIMITED BY " " INTO test2 END-UNSTRING
Here is my grammar:
Program:
(elements+=Elemental)*
(s+=Statement)*
;
Variable_Name:
varName=ID ("-" ID)*
;
Variable_Reference:
varRef=ID ("-" ID)*
;
Elemental:
'VAR' var=Variable_Name
;
Statement:
(us=UnstringStatement|s=StringStatement)
;
StringParam:
Variable_Reference | STRING
;
StringStatement:
'STRING' in=StringParam 'DELIMITED BY SIZE INTO' out=Variable_Reference 'END-STRING'
;
UnstringStatement:
'UNSTRING' in=StringParam 'DELIMITED BY' string2=STRING 'INTO' (outs+=Variable_Reference)* 'END-UNSTRING'
;
When I run the project as an Eclipse Application, the 'UNSTRING' token is highlighted (correctly), but the rest of the line has the error "Mismatched character '"' expecting 'S'." The 'S' that the error refers to is from 'SIZE'.
Any idea why the two rules overlap like this?
EDIT, forgot the STRING rule:
terminal STRING :
'"' ( '\\' . /* 'b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\' */ | !('\\'|'"') )* '"' |
"'" ( '\\' . /* 'b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\' */ | !('\\'|"'") )* "'"
;
EDIT 2:
After stepping through some of the Lexer code, I discovered that the token "DELIMITED BY" is incorrectly matched to "DELIMITED BY SIZE INTO", which then fails.
EDIT 3 FIXED:
I fixed this, but have no idea why it works. I just added a terminal DELIMITED_BY:
terminal DELIMITED_BY: 'DELIMITED BY';
StringStatement:
'STRING' in=StringParam DELIMITED_BY 'SIZE INTO' out=Variable_Reference 'END-STRING'
;
UnstringStatement:
'UNSTRING' in=StringParam DELIMITED_BY string2=STRING 'INTO' (outs+=Variable_Reference)* 'END-UNSTRING'
;
The STRING token looks too greedy. In ANTLR, the expression should be:
terminal STRING :
'"' ( '\\' . | !('\\'|'"') )*? '"' |
"'" ( '\\' . | !('\\'|"'") )*? "'"
;

Oracle PL/SQL: How do I filter out whitespaces in SELECT?

I have a table mytable that has a column ngram which is a VARCHAR2. I want to SELECT only those rows where ngram does not contain any whitespace (tabs, spaces, EOLs, etc.). What should I replace <COND> below with?
SELECT ngram FROM mytable WHERE <COND>;
Thanks!
You could use regexp_instr (or regexp_like, or another regexp function), for example:
where regexp_instr(ngram, '[ '|| CHR(10) || CHR(13) || CHR(9) ||']') = 0
The plain space is handled by the blank in '[ '; the other characters are:
chr(10) = line feed
chr(13) = carriage return
chr(9) = tab
You can also use the CHR and INSTR functions with the ASCII code of each character you want to filter out. For example, the WHERE clause for a single special character can look like this:
INSTR(ngram, CHR(<ASCII code of the special character>)) = 0
or the condition can be like this:
where ngram not like '%'||CHR(0)||'%' -- for null
.
.
.
and ngram not like '%'||CHR(31)||'%' -- for unit separator
and ngram not like '%'||CHR(127)||'%' -- for delete
You can find all the codes here: http://www.theasciicode.com.ar/extended-ascii-code/non-breaking-space-no-break-space-ascii-code-255.html
This should match rows where ngram contains no whitespace characters, using the \s shorthand for all whitespace characters. I only tested it by inserting a TAB into a string in a VARCHAR2 column, and that row was then excluded:
where regexp_instr(ngram, '\s') = 0;
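The filter itself is simple: keep a value only if a search for a whitespace character finds nothing. Here is a small Python sketch of that logic, assuming Python's \s behaves closely enough to Oracle's for spaces, tabs and line breaks. The sample list and function name are made up for illustration.

```python
import re

# \s matches any whitespace character (space, tab, line feed, carriage return, ...).
WHITESPACE = re.compile(r'\s')

def without_whitespace(values):
    """Keep only the values that contain no whitespace character at all."""
    return [v for v in values if WHITESPACE.search(v) is None]

# Hypothetical ngram values standing in for the rows of mytable.
ngrams = ['abc', 'a b', 'tab\there', 'line\nbreak', 'x_y']
kept = without_whitespace(ngrams)
print(kept)  # ['abc', 'x_y']
```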

Truncating leading zeros from a string in PostgreSQL

I'm trying to strip the leading zeros from part of an address. Example:
input
1 06TH ST
12 02ND AVE
123 001St CT
expected output
1 6TH ST
12 2ND AVE
123 1St CT
Here is what I have:
update table
set address = regexp_replace(address,'(0\d+(ST|ND|TH))','?????? need help here')
where address ~ '\s0\d+(ST|ND|TH)\s';
Many thanks in advance.
Assuming that the address always starts with some number/letter part (1234, 1a, 33B), followed by one or more spaces, followed by the part whose leading zeros you want to strip:
select substr(address, 1, strpos(address, ' ')) || ltrim(substr(address, strpos(address, ' ')), ' 0') from table;
or, to update the table:
update table set address = substr(address, 1, strpos(address, ' ')) || ltrim(substr(address, strpos(address, ' ')), ' 0');
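The substr/strpos/ltrim combination splits the address at its first space and strips leading spaces and zeros from the remainder. Here is a minimal Python sketch of that logic, assuming every address contains at least one space; the function name is illustrative.

```python
def strip_zeros_after_first_space(address: str) -> str:
    """Keep everything up to the first space, then strip leading spaces/zeros."""
    # partition splits at the first space: head, the space itself, and the rest.
    head, sep, tail = address.partition(' ')
    # lstrip(' 0') removes any leading run of spaces and zeros, like ltrim(..., ' 0').
    return head + sep + tail.lstrip(' 0')

addresses = ['1 06TH ST', '12 02ND AVE', '123 001St CT']
fixed = [strip_zeros_after_first_space(a) for a in addresses]
print(fixed)  # ['1 6TH ST', '12 2ND AVE', '123 1St CT']
```

Note that this approach also collapses several spaces after the house number into one, since the extra spaces are stripped together with the zeros.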
What you are looking for is back references in regular expressions:
UPDATE table
SET address = regexp_replace(address, '\m0+(\d+\w+)', '\1', 'g')
WHERE address ~ '\m0+(\d+\w+)'
Also:
\m matches the beginning of a word (to avoid replacing inside words, e.g. in 101Th)
0+ matches all the leading zeros (it is outside the capturing parentheses, so they are dropped)
\d+ captures the remaining digits
\w+ captures the remaining word characters
A word character can be any alphanumeric character or the underscore _.
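To try the back-reference idea outside the database, here is a Python sketch of the same substitution. Python has no \m, so it assumes \b (word boundary) as a substitute; because the next character must be the word character 0, the boundary can only fall at the start of a word here. The names are illustrative.

```python
import re

# \b0+ matches the leading zeros at the start of a word; (\d+\w+) captures
# the digits and word characters that should be kept.
LEADING_ZEROS = re.compile(r'\b0+(\d+\w+)')

def strip_leading_zeros(address: str) -> str:
    """Replace each match with its captured group, dropping the zeros."""
    return LEADING_ZEROS.sub(r'\1', address)

samples = ['1 06TH ST', '12 02ND AVE', '123 001St CT', '5 101Th PL']
cleaned = [strip_leading_zeros(s) for s in samples]
print(cleaned)  # ['1 6TH ST', '12 2ND AVE', '123 1St CT', '5 101Th PL']
```

The last sample shows why the word-start anchor matters: the zero inside 101Th is left alone.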