I have strings like this
#
word_1
word_2
#
word_3
#
#
where # represents empty lines.
I'd like to remove those empty lines, for getting
word_1
word_2
word_3
I've tried replacing CHR(10) and CHR(13) with '' but then I get
word_1word_2word_3
I've seen I can remove the first empty line using LTRIM, but how to get rid of all of them?
You must remove all new-line characters followed by new-line character, and a single new-line character at the start and the end of a string. All these replacements can be done with a single expression.
Starting from v11.1
select regexp_replace (s, '\r\n(?=\r\n)|^\r\n|\r\n$', '')
from (values x'0d0a' || 'abc' || x'0d0a0d0a'|| 'def' || x'0d0a') t (s)
Note, that you may have a new-line character encoded as x'0a' instead of x'0d0a'. Remove all the \r characters in this case from the expression above.
dbfiddle link.
Starting from v9.7
select xmlcast (xmlquery ('replace (replace ($d, "^\r\n|\r\n$", ""), "(\r\n){2,}", "$1")' passing s as "d") as varchar (100))
from (values x'0d0a' || 'abc' || x'0d0a0d0a'|| 'def' || x'0d0a') t (s)
dbfiddle link.
Related
How to return last n words using Postgres.
I have tried using LEFT method.
SELECT DISTINCT LEFT(name, -4) FROM my_table;
but it return last 4 characters ,i want to return last 3 words.
demo:db<>fiddle
You can do this using a the SUBSTRING() function and regular expressions:
SELECT
SUBSTRING(name FROM '((\S+\s+){0,3}\S+$)')
FROM my_table
This has been explained here: How can I match the last two words in a sentence in PostgreSQL?
\S+ is a string of non-whitespace characters
\s+ is a string of whitespace characters (e.g. one space)
(\S+\s+){0,3} Zero to three words separated by a space
\S+$ one word at the end of the text.
-> creates 4 words (or less if there are no more).
One way is to use regexp_split_to_array() to split the string into the words it contains and then put a string back together using the last 3 words in that array.
SELECT coalesce(w.words[array_length(w.words, 1) - 2] || ' ', '')
|| coalesce(w.words[array_length(w.words, 1) - 1] || ' ', '')
|| coalesce(w.words[array_length(w.words, 1)], '')
FROM mytable t
CROSS JOIN LATERAL (SELECT regexp_split_to_array(t."name", ' ') words) w;
db<>fiddle
RIGHT() should do
SELECT RIGHT('MYCOLUMN', 4); -- returns LUMN
UPD
You can convert to array and then back to string
SELECT array_to_string(sentence[(array_length(sentence,1)-3):(array_length(sentence,1))],' ','*')
FROM
(
SELECT regexp_split_to_array('this is the one of the way to get the last four words of the string', E'\\s+') AS sentence
) foo;
DEMO HERE
I need to replace all spaces with one % between two specific symbols (# and &); like followings:
'this # is test &that did not #turn& out well'
should be converted to
'this #%is%test%&that did not #turn& out well'
and
'#pattern matching& is my number one enemy'
to
'#pattern%matching& is my number one enemy'
I almost read all related questions in stackoverflow and other sites but couldn't get a helpful answer.
One (inefficient) way of doing this is by doing multiple REGEXP_REPLACE calls.
For example, lets look at the following plpgsql function.
CREATE OR REPLACE FUNCTION replaceSpacesBetweenTwoSymbols(startChar TEXT, endChar TEXT, textToParse TEXT)
RETURNS TEXT
AS $$
DECLARE resultText TEXT := textToParse;
DECLARE tempText TEXT := textToParse;
BEGIN
WHILE TRUE LOOP
tempText = REGEXP_REPLACE(resultText,
'(' || startChar || '[^' || endChar || ']*)' || '( )(.*' || endChar || ')',
'\1%\3');
IF tempText = resultText
THEN RETURN resultText;
END IF;
resultText := tempText;
END LOOP;
RETURN resultText;
END;
$$
LANGUAGE 'plpgsql';
We create a function that takes three arguments, the startChar, the endChar and the textToParse which holds the text that will be trimmed.
We start by creating a a regular expression based on the startChar and endChar. If the value of startChar is # and the value of endChar is & we will get the following regular expression:
(#[^&]*)( )(.*&)
This regular expression is consisted of three groups:
(#[^&]*) - This group matches the text that is between the # and an an empty space character - ' ';
( ) - This group matches a single space character.
(.*&) - This group matches the text that is between a space character and the & character.
In order to replace the space (group 2), we use the following REGEXP_REPLACE call:
REGEXP_REPLACE(resultText,' (#[^&]*)( )(.*&)', '\1%\3')
From that expression you can see that we are replacing the second group (which is a space) with the % character.
This way, we will only replace one space per one REGEXP_REPLACE execution.
Once we find that there are no more spaces that need to be replaced, we return the modified TEXT.
At this moment, the spaces are replaced with % characters. One last thing we need to do is to replace the multiple consecutive % characters with a single %.
That can be done with another REGEXP_REPLACE call at the end.
So for example:
SELECT REGEXP_REPLACE(replaceSpacesBetweenTwoSymbols('#','&','this # is test &that did not #turn& out well'),'%{2,}','%');
Will return
this #%is%test%&that did not #turn& out well
as a result, while this
SELECT REGEXP_REPLACE(replaceSpacesBetweenTwoSymbols('#','&','this is #a more complex& task #test a a & w'),'%{2,}','%');
will return
this is #a%more%complex& task #test%a%a%& w
as a result.
I am wondering how I can remove all newline characters in Redshift from a field. I tried something like this:
replace(replace(body, '\n', ' '), '\r', ' ')
and
regexp_replace(body, '[\n\r]+', ' ')
But it didn't work. Please share if you know how to do this.
Use chr(10) instead of \n
example:
select replace(CONCAT('Text 1' , chr(10), 'Text 2'), chr(10), '-') as txt
This should help
regexp_replace(column, '\r|\n', '')
To remove line breaks:
SELECT REPLACE('This line has
a line break', CHR(10), '');
This gives output: This line hasa line break. You can see more ASCII or CHR() codes here: https://www.petefreitag.com/cheatsheets/ascii-codes/
To remove special characters like \r, \n, \t
Assuming col1 has text like This line has\r\n special characters.
Using replace()
SELECT REPLACE(REPLACE(col1, '\\r', ''), '\\n', '');
We need to escape \ because backslash is special character in SQL (used to escape double quotes, etc...)
Using regexp_replace()
SELECT REGEXP_REPLACE(col1, '(\\\\r|\\\\n)', '');
We need to escape \ because it is a special character in SQL and we need to escape the resulting backslashes again because backslash is a special character in regex as well.
Both replace() and regexp_replace() gives output: This line has special characters.
I have a table mytable that has a column ngram which is a VARCHAR2. I want to SELECT only those rows where ngram does not contain any whitespaces (tabs, spaces, EOLs etc). What should I replace <COND> below with?
SELECT ngram FROM mytable WHERE <COND>;
Thanks!
You could use regexp_instr (or regexp_like, or other regexp functions), see here for example
where regexp_instr(ngram, '[ '|| CHR(10) || CHR(13) || CHR(9) ||']') = 0
the white space is managed here '[ '
chr(10) = line feed
chr(13) = carriage return
chr(9) = tab
you can use CHR and INSTR function ASCII code of the characters you want to filter for example your where clause can be like this for an special character:
INSTR(ngram,CHR(the ASCI CODE of special char))=0
or the condition can be like this:
where
and ngram not like '%'||CHR(0)||'%' -- for null
.
.
.
and ngram not like '%'||CHR(31)||'%' -- for unit separator
and ngram not like '%'||CHR(127)||'%'-- for delete
here you can get all codes http://www.theasciicode.com.ar/extended-ascii-code/non-breaking-space-no-break-space-ascii-code-255.html
This should match ngram where it contains no whitespace characters by using the \s shorthand for all whitespace characters. I only tested by inserting a TAB into a string in a VARCHAR2 column and it was then excluded:
where regexp_instr(ngram, '\s') = 0;
I'm trying to truncate leading zero from the address. example:
input
1 06TH ST
12 02ND AVE
123 001St CT
expected output
1 6TH ST
12 2ND AVE
123 1St CT
Here is what i have:
update table
set address = regexp_replace(address,'(0\d+(ST|ND|TH))','?????? need help here')
where address ~ '\s0\d+(ST|ND|TH)\s';
many thanks in advance
assuming that the address always has some number/letter address (1234, 1a, 33B) followed by a sequence of 1 or more spaces followed by the part you want to strip leading zeroes...
select substr(address, 1, strpos(address, ' ')) || ltrim(substr(address, strpos(address, ' ')), ' 0') from table;
or, to update the table:
update table set address = substr(address, 1, strpos(address, ' ')) || ltrim(substr(address, strpos(address, ' ')), ' 0');
-g
What you are looking for is the back references in the regular expressions:
UPDATE table
SET address = regexp_replace(address, '\m0+(\d+\w+)', '\1', 'g')
WHERE address ~ '\m0+(\d+\w+)'
Also:
\m used to match the beginning of a word (to avoid replacing inside words (f.ex. in 101Th)
0+ truncates all zeros (does not included in the capturing parenthesis)
\d+ used to capture the remaining numbers
\w+ used to capture the remaining word characters
a word caracter can be any alphanumeric character, and the underscore _.