Redshift: how to remove all newline characters in a field - amazon-redshift

I am wondering how I can remove all newline characters in Redshift from a field. I tried something like this:
replace(replace(body, '\n', ' '), '\r', ' ')
and
regexp_replace(body, '[\n\r]+', ' ')
But it didn't work. Please share if you know how to do this.

Use chr(10) instead of \n
example:
select replace(CONCAT('Text 1' , chr(10), 'Text 2'), chr(10), '-') as txt

This should help
regexp_replace(column, '\r|\n', '')

To remove line breaks:
SELECT REPLACE('This line has
a line break', CHR(10), '');
This gives output: This line hasa line break. You can see more ASCII or CHR() codes here: https://www.petefreitag.com/cheatsheets/ascii-codes/
To remove special characters like \r, \n, \t
Assuming col1 has text like This line has\r\n special characters.
Using replace()
SELECT REPLACE(REPLACE(col1, '\\r', ''), '\\n', '');
We need to escape \ because the backslash is a special character in SQL string literals (it is used to escape quotes, etc.).
Using regexp_replace()
SELECT REGEXP_REPLACE(col1, '(\\\\r|\\\\n)', '');
We need to escape \ because it is a special character in SQL and we need to escape the resulting backslashes again because backslash is a special character in regex as well.
Both replace() and regexp_replace() give the output: This line has special characters.
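The answers above deal with two different situations that are easy to mix up: a real line-feed or carriage-return character (CHR(10), CHR(13)) versus the two-character text made of a backslash followed by n or r. The following is only a Python analogy of that distinction, not Redshift code; the function names and sample strings are made up for illustration, and str.replace stands in for REPLACE.

```python
def strip_real_newlines(text: str) -> str:
    """Remove actual CR (chr(13)) and LF (chr(10)) characters."""
    return text.replace(chr(13), "").replace(chr(10), "")


def strip_literal_escapes(text: str) -> str:
    """Remove the two-character sequences backslash+r and backslash+n."""
    return text.replace("\\r", "").replace("\\n", "")


# A string with a real line feed in it (like the CHR(10) example).
real = "This line has" + chr(10) + "a line break"
# A string that contains the visible text \r\n (like the col1 example).
literal = "This line has\\r\\n special characters"

print(strip_real_newlines(real))        # This line hasa line break
print(strip_literal_escapes(literal))   # This line has special characters
# Searching for the literal text does nothing to a real line feed.
print(strip_literal_escapes(real) == real)  # True
```

If a replacement seems to have no effect, checking which of the two forms the column actually holds is usually the first step.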

Related

DB2 remove empty lines

I have strings like this
#
word_1
word_2
#
word_3
#
#
where # represents empty lines.
I'd like to remove those empty lines, for getting
word_1
word_2
word_3
I've tried replacing CHR(10) and CHR(13) with '' but then I get
word_1word_2word_3
I've seen I can remove the first empty line using LTRIM, but how to get rid of all of them?
You must remove every new-line sequence that is followed by another new-line sequence, plus a single new-line sequence at the start and at the end of the string. All these replacements can be done with a single expression.
Starting from v11.1
select regexp_replace (s, '\r\n(?=\r\n)|^\r\n|\r\n$', '')
from (values x'0d0a' || 'abc' || x'0d0a0d0a'|| 'def' || x'0d0a') t (s)
Note that you may have a new-line character encoded as x'0a' instead of x'0d0a'. In this case, remove all the \r characters from the expression above.
Starting from v9.7
select xmlcast (xmlquery ('replace (replace ($d, "^\r\n|\r\n$", ""), "(\r\n){2,}", "$1")' passing s as "d") as varchar (100))
from (values x'0d0a' || 'abc' || x'0d0a0d0a'|| 'def' || x'0d0a') t (s)
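To see what the v11.1 pattern does without a DB2 instance, here is a rough Python sketch that applies the same regular expression with the re module. It is an illustration only: the function name is hypothetical, and Python's regex engine is assumed to behave like DB2's for this particular pattern.

```python
import re

# Same pattern as the v11.1 expression: a CRLF that is followed by another
# CRLF, or a CRLF at the very start, or a CRLF at the very end.
BLANK_LINE = re.compile(r"\r\n(?=\r\n)|^\r\n|\r\n$")


def drop_empty_lines(s: str) -> str:
    """Collapse runs of CRLF and trim one leading/trailing CRLF."""
    return BLANK_LINE.sub("", s)


# Same test value as in the answer: x'0d0a' abc x'0d0a0d0a' def x'0d0a'
sample = "\r\nabc\r\n\r\ndef\r\n"
print(repr(drop_empty_lines(sample)))  # 'abc\r\ndef'
```

The lookahead removes every CRLF except the last one in a run, which is why the words stay on separate lines instead of being glued together.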

Regexp replace country code and blank spaces from phone number

Please help me with this issue. I'm very bad at regex things. I need to remove country code and blank spaces at once from phone number. Something like:
'+12 345 678' to '345678'. Thanks for any help!
Assuming the country code always is the first three characters:
SELECT replace(right('+12 345 678', -3), ' ', '')
right('xyz', -3) removes the first three characters
replace('xyz', ' ', '') removes the spaces.
More general:
SELECT
replace(right(numbers, -3), ' ', '')
FROM
phone
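The same two steps (drop a fixed-length prefix, then drop the spaces) can be sketched in Python as a quick sanity check. The function names are hypothetical. The second function is an assumption that goes beyond the answer: it uses a regex so the country code may have any number of digits, provided it starts with + and is followed by a space.

```python
import re


def strip_country_code(number: str, prefix_len: int = 3) -> str:
    """Mirror replace(right(number, -3), ' ', ''): cut the prefix, drop spaces."""
    return number[prefix_len:].replace(" ", "")


def strip_country_code_regex(number: str) -> str:
    """Assumed variant: remove a leading '+digits' group and all whitespace."""
    return re.sub(r"^\+\d+|\s", "", number)


print(strip_country_code("+12 345 678"))        # 345678
print(strip_country_code_regex("+12 345 678"))  # 345678
```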

Postgresql: Remove hyphens and whitespaces

I am currently working on DB data which contains whitespaces and hyphens. I searched over the net and found this Remove/replace special characters in column values? . I tried to follow the answer but I am still getting hyphens. I tried playing around with it, I can only remove the whitespace
import psycopg2 as p  # assumption: p is the psycopg2 module

conn_p = p.connect("dbname='p_test' user='postgres' password='postgres' host='localhost'")
conn_t = p.connect("dbname='t_mig1' user='postgres' password='postgres' host='localhost'")
cur_p = conn_p.cursor()
cur_t = conn_t.cursor()
cur_t.execute("SELECT CAST(REGEXP_REPLACE(studentnumber, ' ', '') as integer), firstname, middlename, lastname FROM sprofile")
rows = cur_t.fetchall()
for row in rows:
    print("Inserting ", row[0], row[1], row[2], row[3])
    # Pass the values as parameters so the driver quotes them
    cur_p.execute(""" INSERT INTO "a_recipient" (id, first_name, middle_name, last_name) VALUES (%s, %s, %s, %s) """, (row[0], row[1], row[2], row[3]))
conn_p.commit()  # commit is a method of the connection, not the cursor
cur_p.close()
cur_t.close()
What I would like to achieve is if I got a studentnumber of 001-2012-1456, it will be displayed as 00120121456.
To wipe out all characters in a set efficiently use translate. It takes a set of characters to translate into another set of characters. If the other set is empty it deletes them.
test=> select translate('001-2012-145 6', '- ', '');
translate
-------------
00120121456
While translate is simpler and faster for this particular job, it's important to know how to use regexes for others. To do it with regexp_replace, there are two changes you need to make.
First, you have to match the set of - and space as [- ].
Then, you have to specify to replace all occurrences, otherwise it will stop after the first one. That's done with the g flag.
test=> select regexp_replace('001-2012-145 6', '[- ]', '', 'g');
regexp_replace
----------------
00120121456
Here's a tutorial on POSIX regular expressions and character sets.
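For comparison, the two approaches in this answer have close Python counterparts, sketched below. This is an analogy rather than PostgreSQL code, and the function names are made up. One difference worth noting: Python's re.sub replaces every match by default, so the count=1 call is only there to imitate regexp_replace without the g flag.

```python
import re

# str.maketrans("", "", chars) builds a table that deletes every char in chars,
# which is what translate(value, '- ', '') does in the answer above.
DELETE_HYPHEN_SPACE = str.maketrans("", "", "- ")


def clean_with_translate(value: str) -> str:
    return value.translate(DELETE_HYPHEN_SPACE)


def clean_with_regex(value: str) -> str:
    # Character set [- ] matches a hyphen or a space; all matches are removed.
    return re.sub(r"[- ]", "", value)


student = "001-2012-145 6"
print(clean_with_translate(student))          # 00120121456
print(clean_with_regex(student))              # 00120121456
# Stopping after the first match, like regexp_replace without 'g':
print(re.sub(r"[- ]", "", student, count=1))  # 0012012-145 6
```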
It's very simple to do with the built-in translate function.
Example:
select translate('001-2012-145 6', '- ', '');
Output of above command :
00120121456

Oracle PL/SQL: How do I filter out whitespaces in SELECT?

I have a table mytable that has a column ngram which is a VARCHAR2. I want to SELECT only those rows where ngram does not contain any whitespaces (tabs, spaces, EOLs etc). What should I replace <COND> below with?
SELECT ngram FROM mytable WHERE <COND>;
Thanks!
You could use regexp_instr (or regexp_like, or other regexp functions), see here for example
where regexp_instr(ngram, '[ '|| CHR(10) || CHR(13) || CHR(9) ||']') = 0
the white space is managed here '[ '
chr(10) = line feed
chr(13) = carriage return
chr(9) = tab
You can use the CHR and INSTR functions with the ASCII code of each character you want to filter out. For example, the WHERE clause for one special character can look like this:
INSTR(ngram, CHR(<ASCII code of the special character>)) = 0
or the condition can be like this:
where ngram not like '%'||CHR(0)||'%' -- for null
.
.
.
and ngram not like '%'||CHR(31)||'%' -- for unit separator
and ngram not like '%'||CHR(127)||'%'-- for delete
here you can get all codes http://www.theasciicode.com.ar/extended-ascii-code/non-breaking-space-no-break-space-ascii-code-255.html
This should match ngram where it contains no whitespace characters by using the \s shorthand for all whitespace characters. I only tested by inserting a TAB into a string in a VARCHAR2 column and it was then excluded:
where regexp_instr(ngram, '\s') = 0;
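The filtering logic of the \s condition can be tried out in Python, where \s also covers spaces, tabs, and line breaks. This is only a sketch of the idea with made-up sample values and a hypothetical function name; it does not run against Oracle.

```python
import re

WHITESPACE = re.compile(r"\s")  # space, tab, CR, LF and other whitespace


def without_whitespace(ngrams):
    """Keep only the values in which no whitespace character is found,
    like WHERE regexp_instr(ngram, '\\s') = 0."""
    return [n for n in ngrams if WHITESPACE.search(n) is None]


rows = ["abc", "a b", "tab\there", "line\nbreak", "x-y"]
print(without_whitespace(rows))  # ['abc', 'x-y']
```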

Why is the T-SQL "LIKE" operator not evaluating this expression like I think it should?

I am attempting to error trap a T-SQL variable name by making sure that the value of the variable is prefixed with a bracket "[".
Here's an example of how I am trying to do this:
DECLARE @thing nvarchar(20)
SET @thing = '[55555'
IF(@thing NOT LIKE '[' + '%') --If the value does not start with [ then add it
BEGIN
SET @thing = '[' + @thing
END
PRINT @thing
The example above prints [[55555
Notice that the original value of @thing was prefixed with the bracket "[". I was expecting the IF condition would have returned false since "[55555" is LIKE '[' + '%'
Why is the IF condition not returning false? And, more importantly I suppose, what is the correct syntax to check for the existence of a string that occurs at the beginning of a variable string value?
EDIT
It appears as though there is something special about the bracket "[". When I run LIKE on a bracket it doesn't do what I expect, but when I don't use a bracket, LIKE works the way I expect.
Check out these examples:
IF('[' NOT LIKE '[')
BEGIN
PRINT '[ is NOT LIKE ['
END
ELSE
BEGIN
PRINT '[ is LIKE ['
END
IF('StackO' NOT LIKE 'StackO')
BEGIN
PRINT 'STACKO is NOT LIKE StackO'
END
ELSE
BEGIN
PRINT 'STACKO is LIKE StackO'
END
Here's the output of the two conditions:
[ is NOT LIKE [
STACKO is LIKE StackO
I believe it may be because '[' is actually part of the LIKE operator's syntax, as defined here: http://msdn.microsoft.com/en-us/library/ms179859.aspx
You need to define an escape character to escape the [, like this:
DECLARE @thing nvarchar(20)
SET @thing = '[55555'
IF(@thing NOT LIKE '\[%' ESCAPE '\' )
BEGIN
SET @thing = '[' + @thing
END
PRINT @thing
An alternative solution would be the following:
DECLARE @thing nvarchar(20)
SET @thing = '[55555'
IF(LEFT(@thing,1) <> '[') --If the value does not start with [ then add it
BEGIN
SET @thing = '[' + @thing
END
PRINT @thing
To get it working change your check to
IF (SUBSTRING(@thing, 1,1) != '[')
LIKE is not working here because [ is a special character in LIKE patterns, just as % is. See here
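The LEFT and SUBSTRING variants avoid pattern matching entirely by comparing the first character directly. The same check looks like this in Python, as a small sketch with a hypothetical function name (this is an analogy, not T-SQL):

```python
def ensure_bracket_prefix(thing: str) -> str:
    """Add a leading '[' only when the value does not already start with one."""
    # A plain prefix comparison has no wildcard syntax, so '[' needs no escaping.
    return thing if thing.startswith("[") else "[" + thing


print(ensure_bracket_prefix("[55555"))  # [55555
print(ensure_bracket_prefix("55555"))   # [55555
```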
Bracket characters ([ and ]) are special wildcard characters in T-SQL. To search for those literal characters, you'll want to escape those characters (indicate that you want to search for those literal characters, rather than employing them as wildcards). Use ESCAPE to do this, like so:
DECLARE @thing nvarchar(20)
SET @thing = '[55555'
-- pick an escape character you won't see in your content
IF(@thing NOT LIKE '![' + '%' ESCAPE '!')
BEGIN
SET @thing = '[' + @thing
END
PRINT @thing
This prints [55555.
From MSDN:
You can search for character strings that include one or more of the special wildcard characters... To search for the percent sign as a character instead of as a wildcard character, the ESCAPE keyword and escape character must be provided. For example, a sample database contains a column named comment that contains the text 30%. To search for any rows that contain the string 30% anywhere in the comment column, specify a WHERE clause such as WHERE comment LIKE '%30!%%' ESCAPE '!'.
You have to escape special characters (brackets, single quotes, etc.). In this case, you could enclose the opening bracket in a bracket set, like this:
LIKE '[[]%'
EDIT:
PS -- [ is a special character because it can be used for wildcards, like this: LIKE '[0-9]' to do pattern-matching. (In this case, the match is like a regex -- any digit between 0 and 9.)