REGEXP_REPLACE replace spaces between two symbols - postgresql

I need to replace all spaces with one % between two specific symbols (# and &); like followings:
'this # is test &that did not #turn& out well'
should be converted to
'this #%is%test%&that did not #turn& out well'
and
'#pattern matching& is my number one enemy'
to
'#pattern%matching& is my number one enemy'
I almost read all related questions in stackoverflow and other sites but couldn't get a helpful answer.

One (inefficient) way of doing this is by doing multiple REGEXP_REPLACE calls.
For example, lets look at the following plpgsql function.
CREATE OR REPLACE FUNCTION replaceSpacesBetweenTwoSymbols(startChar TEXT, endChar TEXT, textToParse TEXT)
RETURNS TEXT
AS $$
DECLARE resultText TEXT := textToParse;
DECLARE tempText TEXT := textToParse;
BEGIN
WHILE TRUE LOOP
tempText = REGEXP_REPLACE(resultText,
'(' || startChar || '[^' || endChar || ']*)' || '( )(.*' || endChar || ')',
'\1%\3');
IF tempText = resultText
THEN RETURN resultText;
END IF;
resultText := tempText;
END LOOP;
RETURN resultText;
END;
$$
LANGUAGE 'plpgsql';
We create a function that takes three arguments, the startChar, the endChar and the textToParse which holds the text that will be trimmed.
We start by creating a a regular expression based on the startChar and endChar. If the value of startChar is # and the value of endChar is & we will get the following regular expression:
(#[^&]*)( )(.*&)
This regular expression is consisted of three groups:
(#[^&]*) - This group matches the text that is between the # and an an empty space character - ' ';
( ) - This group matches a single space character.
(.*&) - This group matches the text that is between a space character and the & character.
In order to replace the space (group 2), we use the following REGEXP_REPLACE call:
REGEXP_REPLACE(resultText,' (#[^&]*)( )(.*&)', '\1%\3')
From that expression you can see that we are replacing the second group (which is a space) with the % character.
This way, we will only replace one space per one REGEXP_REPLACE execution.
Once we find that there are no more spaces that need to be replaced, we return the modified TEXT.
At this moment, the spaces are replaced with % characters. One last thing we need to do is to replace the multiple consecutive % characters with a single %.
That can be done with another REGEXP_REPLACE call at the end.
So for example:
SELECT REGEXP_REPLACE(replaceSpacesBetweenTwoSymbols('#','&','this # is test &that did not #turn& out well'),'%{2,}','%');
Will return
this #%is%test%&that did not #turn& out well
as a result, while this
SELECT REGEXP_REPLACE(replaceSpacesBetweenTwoSymbols('#','&','this is #a more complex& task #test a a & w'),'%{2,}','%');
will return
this is #a%more%complex& task #test%a%a%& w
as a result.

Related

DB2 remove empty lines

I have strings like this
#
word_1
word_2
#
word_3
#
#
where # represents empty lines.
I'd like to remove those empty lines, for getting
word_1
word_2
word_3
I've tried replacing CHR(10) and CHR(13) with '' but then I get
word_1word_2word_3
I've seen I can remove the first empty line using LTRIM, but how to get rid of all of them?
You must remove all new-line characters followed by new-line character, and a single new-line character at the start and the end of a string. All these replacements can be done with a single expression.
Starting from v11.1
select regexp_replace (s, '\r\n(?=\r\n)|^\r\n|\r\n$', '')
from (values x'0d0a' || 'abc' || x'0d0a0d0a'|| 'def' || x'0d0a') t (s)
Note, that you may have a new-line character encoded as x'0a' instead of x'0d0a'. Remove all the \r characters in this case from the expression above.
dbfiddle link.
Starting from v9.7
select xmlcast (xmlquery ('replace (replace ($d, "^\r\n|\r\n$", ""), "(\r\n){2,}", "$1")' passing s as "d") as varchar (100))
from (values x'0d0a' || 'abc' || x'0d0a0d0a'|| 'def' || x'0d0a') t (s)
dbfiddle link.

postgresql replace symbol for store function variable?

I find '#' can replace some string in posgresql like:
SELECT
'Record #' || i
FROM
generate_series(1, 1000) i;
The symbol '#' can replace the by i, in this SQL commands.
I want to do similar like this for function like
SELECT lo_unlink('#') || i
From
SELECT picture i FROM image where belong_to = 254;
Of course '#' there is grammar error around '#', I just want to know any placehold grammar or symbol can work in function parameter like it replace string mark '#'
The symbol # is not replaced, you are concatenating a number to a string containing this symbol.
This concatenation can occur anywhere, including when building a function parameter:
SELECT lo_unlink('#' || i)
From
SELECT picture i FROM image where belong_to = 254;
which, if i is number 1, will create (and call) lo_unlink('#1')
so you may want to use i directly:
SELECT lo_unlink(i)
From
SELECT picture i FROM image where belong_to = 254;
which, if i is number 1, will create (and call) lo_unlink(1)

PostgreSQL return last n words

How to return last n words using Postgres.
I have tried using LEFT method.
SELECT DISTINCT LEFT(name, -4) FROM my_table;
but it return last 4 characters ,i want to return last 3 words.
demo:db<>fiddle
You can do this using a the SUBSTRING() function and regular expressions:
SELECT
SUBSTRING(name FROM '((\S+\s+){0,3}\S+$)')
FROM my_table
This has been explained here: How can I match the last two words in a sentence in PostgreSQL?
\S+ is a string of non-whitespace characters
\s+ is a string of whitespace characters (e.g. one space)
(\S+\s+){0,3} Zero to three words separated by a space
\S+$ one word at the end of the text.
-> creates 4 words (or less if there are no more).
One way is to use regexp_split_to_array() to split the string into the words it contains and then put a string back together using the last 3 words in that array.
SELECT coalesce(w.words[array_length(w.words, 1) - 2] || ' ', '')
|| coalesce(w.words[array_length(w.words, 1) - 1] || ' ', '')
|| coalesce(w.words[array_length(w.words, 1)], '')
FROM mytable t
CROSS JOIN LATERAL (SELECT regexp_split_to_array(t."name", ' ') words) w;
db<>fiddle
RIGHT() should do
SELECT RIGHT('MYCOLUMN', 4); -- returns LUMN
UPD
You can convert to array and then back to string
SELECT array_to_string(sentence[(array_length(sentence,1)-3):(array_length(sentence,1))],' ','*')
FROM
(
SELECT regexp_split_to_array('this is the one of the way to get the last four words of the string', E'\\s+') AS sentence
) foo;
DEMO HERE

Replacing Turkish characters with English characters

I have a table which has 120 columns and some of them is including Turkish characters (for example "ç","ğ","ı","ö"). So i want to replace this Turkish characters with English characters (for example "c","g","i","o"). When i use "TRANWRD Function" it could be really hard because i should write the function 120 times and sometimes hte column names could be change so always i have to check the code one by one because of that.
Is there a simple macro which replaces this characters in all columns .
EDIT
In retrospect, this is an overly complicated solution... The translate() function should be used, as pointed by another user. It could be integrated in a SAS function defined with PROC FCMP when used repeatedly.
A combination of regular expressions and a DO loop can achieve that.
Step 1: Build a conversion table in the following manner
Accentuated letters that resolve to the same replacement character are put on a single line, separated by the | symbol.
data conversions;
infile datalines dsd;
input orig $ repl $;
datalines;
ç,c
ğ,g
ı,l
ö|ò|ó,o
ë|è,e
;
Step 2: Store original and replacement strings in macro variables
proc sql noprint;
select orig, repl, count(*)
into :orig separated by ";",
:repl separated by ";",
:nrepl
from conversions;
quit;
Step 3: Do the actual conversion
Just to show how it works, let's deal with just one column.
data convert(drop=i re);
myString = "ç ğı òö ë, è";
do i = 1 to &nrepl;
re = prxparse("s/" || scan("&orig",i,";") || "/" || scan("&repl",i,";") || "/");
myString = prxchange(re,-1,myString);
end;
run;
Resulting myString: "c gl oo e, e"
To process all character columns, we use an array
Say your table is named mySource and you want all character variables to be processed; we'll create a vector called cols for that.
data convert(drop=i re);
set mySource;
array cols(*) _character_;
do c = 1 to dim(cols);
do i = 1 to &nrepl;
re = prxparse("s/" || scan("&orig",i,";") || "/" || scan("&repl",i,";") || "/");
cols(c) = prxchange(re,-1,cols(c));
end;
end;
run;
When changing single characters TRANSLATE is the proper function, it will be one line of code.
translated = translate(string,"cgio","çğıö");
First get all your columns from dictionary, and then replace the values of all of them in a macro do loop.
You can try a program like this (Replace MYTABLE with your table name):
proc sql;
select name , count(*) into :columns separated by ' ', :count
from dictionary.columns
where memname = 'MYTABLE';
quit;
%macro m;
data mytable;
set mytable;
%do i=1 %to &count;
%scan(&columns ,&i) = tranwrd(%scan(&columns ,&i),"ç","c");
%scan(&columns ,&i) = tranwrd(%scan(&columns ,&i),"ğ","g");
...
%end;
%mend;
%m;

Oracle PL/SQL: How do I filter out whitespaces in SELECT?

I have a table mytable that has a column ngram which is a VARCHAR2. I want to SELECT only those rows where ngram does not contain any whitespaces (tabs, spaces, EOLs etc). What should I replace <COND> below with?
SELECT ngram FROM mytable WHERE <COND>;
Thanks!
You could use regexp_instr (or regexp_like, or other regexp functions), see here for example
where regexp_instr(ngram, '[ '|| CHR(10) || CHR(13) || CHR(9) ||']') = 0
the white space is managed here '[ '
chr(10) = line feed
chr(13) = carriage return
chr(9) = tab
you can use CHR and INSTR function ASCII code of the characters you want to filter for example your where clause can be like this for an special character:
INSTR(ngram,CHR(the ASCI CODE of special char))=0
or the condition can be like this:
where
and ngram not like '%'||CHR(0)||'%' -- for null
.
.
.
and ngram not like '%'||CHR(31)||'%' -- for unit separator
and ngram not like '%'||CHR(127)||'%'-- for delete
here you can get all codes http://www.theasciicode.com.ar/extended-ascii-code/non-breaking-space-no-break-space-ascii-code-255.html
This should match ngram where it contains no whitespace characters by using the \s shorthand for all whitespace characters. I only tested by inserting a TAB into a string in a VARCHAR2 column and it was then excluded:
where regexp_instr(ngram, '\s') = 0;