PostgreSQL 9.3: Pass more than 100 arguments to `REPLACE` function

I am passing around 1000 arguments to the REPLACE function.
Example:
The string contains some values:
Declare
str1 varchar = '1,2,3,4.................1000';
Now I want to replace each , with "," and I am using the following
script:
SELECT REPLACE(str1,',','","');
But I get an error:
Error Detail:
cannot pass more than 100 arguments to a function

The replace function can search for only one string to replace. You can look for multiple strings with the regexp_replace function. This example replaces both a and c with nothing:
select regexp_replace('abc', '(a)|(c)', '', 'g');
-->
b
The g option stands for global, which allows multiple replacements. Note that regexp_replace can look for multiple strings, but is still limited to one replacement string.

Related

Postgres replacing 'text' with e'text'

I inserted a bunch of rows with a text field like content='...\n...\n...'.
I didn't put e in front, as in content=e'...\n...\n...', so now \n is not displayed as a newline; it is printed as literal text.
How do I fix this, i.e. how to change every row's content field from '...' to e'...'?
The syntax variant E'string' makes Postgres interpret the given string as a POSIX escape string. \n encoding a newline is only one of many interpreted escape sequences (even if the most common one). See:
Insert text with single quotes in PostgreSQL
To "re-evaluate" your POSIX escape string, you could use a simple function with dynamic SQL like this:
CREATE OR REPLACE FUNCTION f_eval_posix_escapes(INOUT _string text)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE 'SELECT E''' || _string || '''' INTO _string;
END
$func$;
WARNING 1: This is inherently unsafe! We have to evaluate input strings dynamically without quoting and escaping, which allows SQL injection. Only use this in a safe environment.
WARNING 2: Don't apply repeatedly. Or it will misinterpret your actual string with genuine \ characters, etc.
WARNING 3: This simple function is imperfect as it cannot cope with nested single quotes properly. If you have some of those, consider instead:
Unescape a string with escaped newlines and carriage returns
Apply:
UPDATE tbl
SET content = f_eval_posix_escapes(content)
WHERE content IS DISTINCT FROM f_eval_posix_escapes(content);
Note the added WHERE clause to skip updates that would not change anything. See:
How do I (or can I) SELECT DISTINCT on multiple columns?
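Use REPLACE in an update query. Something like this (I'm on mobile, so please excuse any typos or syntax errors):
-- tbl and content are placeholder names for your table and column.
-- With standard_conforming_strings on (the default since 9.1), '\n' is a literal backslash followed by n.
UPDATE tbl
SET content = REPLACE(content, '\n', e'\n');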
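If the rows can be rewritten from application code instead, the same "re-evaluation" can be sketched outside the database. The Python snippet below is only an illustration, not the method from either answer: the helper name unescape_basic and the small set of handled escapes (\n, \r, \t and \\) are assumptions chosen for the example. It works in a single left-to-right pass, so a doubled backslash followed by n stays a literal backslash plus n instead of becoming a newline.

```python
import re

# Hypothetical helper: maps the character after a backslash to its meaning.
# Only a small subset of escape sequences is covered on purpose.
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\"}


def unescape_basic(text):
    """Interpret \\n, \\r, \\t and \\\\ in one left-to-right pass."""
    return re.sub(r"\\([nrt\\])", lambda m: _ESCAPES[m.group(1)], text)


if __name__ == "__main__":
    stored = r"first line\nsecond line"
    print(unescape_basic(stored))
```

As with the SQL function, applying this twice to the same value is not safe, because text that legitimately contains a backslash would be interpreted again.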

postgres regexp_matches strange behavior

Following the short docs on regexp_matches:
Return all captured substrings resulting from matching a POSIX regular expression against the string.
Example: regexp_matches('foobarbequebaz', '(bar)(beque)') returns {bar,beque}
With that in mind, I'd expect the result of regexp_matches('barbarbar', '(bar)') to be {bar,bar,bar}
However, only {bar} is returned.
Is this the expected behavior? Am I missing something?
Note:
calling regexp_matches('barbarbar', '(bar)', 'g') does return all 3 bars, but in table form:
regexp_matches text[]
{bar}
{bar}
{bar}
This behavior is described in more detail in section 9.7.3, POSIX Regular Expressions:
The regexp_matches function returns a set of text arrays of captured
substring(s) resulting from matching a POSIX regular expression
pattern to a string. It has the same syntax as regexp_match. This
function returns no rows if there is no match, one row if there is a
match and the g flag is not given, or N rows if there are N matches
and the g flag is given. Each returned row is a text array containing
the whole matched substring or the substrings matching parenthesized
subexpressions of the pattern, just as described above for
regexp_match. regexp_matches accepts all the flags shown in Table
9.24, plus the g flag which commands it to return all matches, not just the first one.
This is expected behavior. The function returns a set of text[] which means that multiple matches are presented in multiple rows. Why is it organized this way? The goal is to make it possible to find more than one token from a single match. In this case, they are presented in the form of an array. The documentation delivers a telling example:
SELECT regexp_matches('foobarbequebazilbarfbonk', '(b[^b]+)(b[^b]+)', 'g');
regexp_matches
----------------
{bar,beque}
{bazil,barf}
(2 rows)
The query returns two matches, each of them containing two tokens found.
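The "one row per match, one array element per captured group" shape can also be seen with a rough Python analogy. This is not PostgreSQL code, and Python's re module is not the same regex engine; the variable names are made up for the illustration. It only mirrors the structure of the result.

```python
import re

text = "foobarbequebazilbarfbonk"
pattern = r"(b[^b]+)(b[^b]+)"

# First match only: comparable to regexp_matches without the g flag (one row).
first = re.search(pattern, text)
first_row = list(first.groups()) if first else None

# Every match: comparable to the g flag. One entry ("row") per match,
# each entry holding the captured groups of that match.
all_rows = [list(m.groups()) for m in re.finditer(pattern, text)]

print(first_row)  # ['bar', 'beque']
print(all_rows)   # [['bar', 'beque'], ['bazil', 'barf']]
```

With a single group, as in '(bar)' against 'barbarbar', each "row" holds just one element, which is why the g flag yields three one-element arrays rather than one three-element array.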

PostgreSQL regexp.replace all unwanted chars

I have registration codes in my PostgreSQL table which are written inconsistently, like MU-321-AB, MU/321/AB, MU 321-AB and so forth...
I need to clean all of these up to get MU321AB.
For this I use the following expression:
SELECT DISTINCT regexp_replace(ccode, '([^A-Za-z0-9])', ''), ...
This expression works as expected in .NET but not in PostgreSQL, where it removes only the first occurrence of an unwanted character.
How should I modify the regular expression so that it replaces all unwanted characters in the string and leaves a clean code with only letters and numbers?
Use the global flag, but without any capture groups:
SELECT DISTINCT regexp_replace(ccode, '[^A-Za-z0-9]', '', 'g'), ...
Note that PostgreSQL's regexp_replace replaces only the first match unless the g flag is given, whereas .NET's Regex.Replace replaces every match by default, which explains the difference you saw. Also, since you do not want anything extracted from the string (you just want to remove some characters), you do not need the capture group ().
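The difference between replacing the first match and replacing all matches can be reproduced in a small Python sketch. This is an analogy only: Python's re.sub replaces every match unless count is given, so count=1 stands in here for calling regexp_replace without the g flag. The sample codes are the ones from the question.

```python
import re

codes = ["MU-321-AB", "MU/321/AB", "MU 321-AB"]
pattern = r"[^A-Za-z0-9]"

# count=1 stops after the first match, like regexp_replace without 'g'.
first_only = [re.sub(pattern, "", code, count=1) for code in codes]

# No count: every match is replaced, like regexp_replace with 'g'.
all_matches = [re.sub(pattern, "", code) for code in codes]

print(first_only)   # ['MU321-AB', 'MU321/AB', 'MU321-AB']
print(all_matches)  # ['MU321AB', 'MU321AB', 'MU321AB']
```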

Convert all hex in a string to its char value in Redshift

In Redshift, I'm trying to convert strings like this:
http%3A%2F%2Fwww.amazon.com%2FTest%3Fname%3DGary%26Bob
To look like this:
http://www.amazon.com/Test?name=Gary&Bob
Basically I need to convert all of the hex in a string to its char value. The only way I can think of is to use a regex function. I tried to do it in two different ways and received error messages for both:
SELECT REGEXP_REPLACE(hex_string, '%([[:xdigit:]][[:xdigit:]])', CHR(x'\\1'::int))
ERROR: 22P02: "\" is not a valid hexadecimal digit
SELECT REGEXP_REPLACE(hex_string, '%([[:xdigit:]][[:xdigit:]])',CHR(STRTOL('0x'||'\\1', 16)::int))
ERROR: 22023: The input 0x\1 is not valid to be converted to base 16
The CHR and STRTOL functions work on their own. For example:
SELECT CHR(x'3A'::int)
SELECT CHR(STRTOL('0x3A', 16)::int)
both return
:
And if I run the same pattern using a different function (other than CHR and STRTOL), it works:
REGEXP_REPLACE(hex_string, '%([[:xdigit:]][[:xdigit:]])', LOWER('{H}'||'\\1'||'{/H}'))
returns
http{h}3A{/h}{h}2F{/h}{h}2F{/h}www.amazon.com{h}2F{/h}Test{h}3F{/h}name{h}3D{/h}Gary{h}26{/h}Bob
But for some reason those functions won't recognize the regex matching group.
Any tips on how I can do this?
I guess the other solution is to use nested REPLACE() functions for all of the special hex characters, but that's probably a very last resort.
What you want to do is called "URL decode".
Currently there is no built-in function for doing this, but you can create a custom User-Defined Function (make sure you have the required privileges):
CREATE FUNCTION urldecode(url VARCHAR)
RETURNS varchar
IMMUTABLE AS $$
import urllib
return urllib.unquote(url).decode('utf8') # or 'latin-1', depending on how the text is encoded
$$ LANGUAGE plpythonu;
Example query:
SELECT urldecode('http%3A%2F%2Fwww.amazon.com%2FTest%3Fname%3DGary%26Bob');
Result:
http://www.amazon.com/Test?name=Gary&Bob
I tried @hiddenbit's answer in Redshift, but Python 3 isn't supported. The following Python 2 code did work for me, however:
DROP FUNCTION urldecode(varchar);
CREATE FUNCTION urldecode(url VARCHAR)
RETURNS varchar
IMMUTABLE AS $$
import urllib
return urllib.unquote(url)
$$ LANGUAGE plpythonu;
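To check the expected result locally before creating the UDF, the same decoding can be tried in plain Python 3, where the function lives in urllib.parse rather than directly in urllib. This is a standalone sketch run outside Redshift, not a replacement for the UDF bodies above.

```python
from urllib.parse import unquote

encoded = "http%3A%2F%2Fwww.amazon.com%2FTest%3Fname%3DGary%26Bob"

# unquote() turns each %XX sequence back into its character (UTF-8 by default).
decoded = unquote(encoded)

print(decoded)  # http://www.amazon.com/Test?name=Gary&Bob
```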

Use multiple words in FullText Search input string

I have a basic stored procedure that performs a full-text search against 3 columns in a table by passing in a @Keyword parameter. It works fine with one word but falls over when I try to pass in more than one word. I'm not sure why. The error says:
Syntax error near 'search item' in the full-text search condition 'this is a search item'
SELECT S.[SeriesID],
S.[Name] as 'SeriesName',
P.[PackageID],
P.[Name]
FROM [Series] S
INNER JOIN [PackageSeries] PS ON S.[SeriesID] = PS.[PackageID]
INNER JOIN [Package] P ON PS.[PackageID] = P.[PackageID]
WHERE CONTAINS ((S.[Name],S.[Description], S.[Keywords]),@Keywords)
AND (S.[IsActive] = 1) AND (P.[IsActive] = 1)
ORDER BY [Name] ASC
You will have to do some pre-processing on your @Keyword parameter before passing it into the SQL statement. SQL Server expects the search terms to be combined with boolean operators or enclosed in quotes. So, if you are searching for the phrase, it will have to be in quotes:
SET @Keyword = '"this is a search item"'
If you want to search for all the words then you'll need something like
SET @Keyword = '"this" AND "is" AND "a" AND "search" AND "item"'
For more information, see the T-SQL CONTAINS syntax, looking in particular at the Examples section.
As an additional note, be sure to replace the double-quote character (with a space) so you don't mess up your full-text query. See this question for details on how to do that: SQL Server Full Text Search Escape Characters?
Further to Aaron's answer, provided you are using SQL Server 2017 or later, you could use the built-in string functions to pre-process your input string (STRING_SPLIT needs compatibility level 130 or higher, and STRING_AGG was introduced in SQL Server 2017). E.g.
SELECT
@QueryString = ISNULL(STRING_AGG('"' + value + '*"', ' AND '), '""')
FROM
STRING_SPLIT(@Keywords, ' ');
Which will produce a query string you can pass to CONTAINS or FREETEXT that looks like this:
'"this*" AND "is*" AND "a*" AND "search*" AND "item*"'
or, when @Keywords is null:
""
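If the pre-processing is easier to do in the calling application than in T-SQL, the same string can be built before the query is sent. The Python function below is a minimal sketch under that assumption: the name build_contains_query is hypothetical, it follows the prefix-term format shown above, and it replaces double quotes with spaces as suggested earlier. The result should still be passed to the procedure as a parameter, not concatenated into the SQL text.

```python
def build_contains_query(keywords):
    """Turn free-text input into an AND-ed list of quoted prefix terms."""
    if not keywords:
        return '""'
    # Replace double quotes with spaces so input cannot break out of a term,
    # then split on any run of whitespace (this also skips empty terms).
    terms = keywords.replace('"', " ").split()
    if not terms:
        return '""'
    return " AND ".join('"{}*"'.format(term) for term in terms)


if __name__ == "__main__":
    print(build_contains_query("this is a search item"))
    # "this*" AND "is*" AND "a*" AND "search*" AND "item*"
```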