ERROR: requested character too large for encoding: 14844072 - postgresql

I am converting the following line of code from Oracle to PostgreSQL.
In Oracle:
select CHR(14844072) from dual
Output:
"
"
In PostgreSQL:
select CHR(14844072);
I get an error:
SQL Error [54000]: ERROR: requested character too large for encoding:
14844072

The function behaves differently in Oracle and PostgreSQL.
In Oracle the statement is valid. So is, for example:
select CHR(0) from dual;
In PostgreSQL, however, you can't SELECT CHR(0):
chr(0) is disallowed because text data types cannot store that
character.
Source: https://www.postgresql.org/docs/14/functions-string.html
This is just an example. More specifically: what do you expect from the value 14844072? An empty string makes no sense in PostgreSQL.
In Oracle you have this situation:
For single-byte character sets, if n > 256, then Oracle Database returns the binary equivalent of n mod 256
For multibyte character sets, n must resolve to one entire code point
But:
Invalid code points are not validated, and the result of specifying
invalid code points is indeterminate.
Source: https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions019.htm
In PostgreSQL the function depends on the encoding but, assuming you use UTF8:
In UTF8 encoding the argument is treated as a Unicode code point. In
other multibyte encodings the argument must designate an ASCII
character
Short answer: you need to change the application code OR build your own function, something like this (just an example):
CREATE OR REPLACE FUNCTION myCHR(integer) RETURNS TEXT
AS $$
BEGIN
    IF $1 = 0 THEN
        RETURN '';
    ELSE
        IF $1 <= 1114111 THEN --replace the number according to your encoding
            RETURN CHR($1);
        ELSE
            RETURN '';
        END IF;
    END IF;
END;
$$ LANGUAGE plpgsql;

In Oracle, this function expects a UTF-8 encoded value. 14844072 in hex is E280A8, which is the UTF-8 encoding of the Unicode code point U+2028, the "line separator" character.
In PostgreSQL, chr() expects the code point as its argument. So feed it the decimal value that corresponds to hex 2028:
SELECT chr(8232);
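
The conversion described above can be checked outside the database. The following Python snippet is only a sketch: the helper name oracle_chr_to_code_point is hypothetical, and it assumes the Oracle argument is the big-endian integer form of exactly one UTF-8 encoded character, as in the E280A8 example.

```python
def oracle_chr_to_code_point(n: int) -> int:
    """Interpret n as the UTF-8 bytes of one character and return its code point."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Number of bytes needed to hold n (at least one, so that 0 maps to b"\x00").
    length = max(1, (n.bit_length() + 7) // 8)
    # decode() raises UnicodeDecodeError if the bytes are not valid UTF-8.
    text = n.to_bytes(length, "big").decode("utf-8")
    if len(text) != 1:
        raise ValueError("value does not encode exactly one character")
    return ord(text)


code_point = oracle_chr_to_code_point(14844072)
print(code_point, hex(code_point))  # 8232 0x2028
```

The returned number is what PostgreSQL's chr() would take in a UTF8 database.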

Related

How to deal with input parameter containing single quote in between value?

I have a function that takes a character varying value from the frontend and returns certain computed values. The issue I am facing is that when the input value for that parameter contains a single quote, it throws an error like "procedure does not exist".
CREATE OR REPLACE PROCEDURE compute(p_company_name character varying DEFAULT NULL::character, INOUT response double precision DEFAULT NULL::double precision)
LANGUAGE plpgsql
AS $procedure$
begin
select estimate into response from tableA
where comp = p_company_name;
exception
when others then select -1 into response;---other error
end
$procedure$
;
It works fine for all input values without a quote. When the parameter value is like p_company_name = samsung's, it throws an error.
Please help, thanks.
Your code is broken: you escape parameters incorrectly (or not at all). Every input should be sanitized by quote escaping:
Input: "Pavel's book" -> Output "Pavel''s book"
select foo('samsung's'); -- syntax error
select foo('samsung''s'); -- ok
or you can use a dollar-quoted string:
select foo($$samsung's$$); -- ok
You should read up on SQL injection, because if you are seeing the described problem, your application is vulnerable to SQL injection.
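
An alternative to escaping by hand is to pass the value as a bound parameter from the client. The sketch below uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a server; this is an assumption for illustration, not the asker's setup. The table and column names come from the question, the value 42.0 is made up, and lookup_estimate is a hypothetical helper. A PostgreSQL driver such as psycopg follows the same pattern but uses %s instead of ? as the placeholder.

```python
import sqlite3

# In-memory database standing in for the real PostgreSQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (comp TEXT, estimate REAL)")
conn.execute("INSERT INTO tableA VALUES (?, ?)", ("samsung's", 42.0))


def lookup_estimate(company_name):
    # The value travels separately from the SQL text, so the quote in
    # "samsung's" never has to be escaped and cannot alter the statement.
    row = conn.execute(
        "SELECT estimate FROM tableA WHERE comp = ?", (company_name,)
    ).fetchone()
    return row[0] if row else None


print(lookup_estimate("samsung's"))  # 42.0
```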

Convert all hex in a string to its char value in Redshift

In Redshift, I'm trying to convert strings like this:
http%3A%2F%2Fwww.amazon.com%2FTest%3Fname%3DGary%26Bob
To look like this:
http://www.amazon.com/Test?name=Gary&Bob
Basically I need to convert all of the hex in a string to its char value. The only way I can think of is to use a regex function. I tried to do it in two different ways and received error messages for both:
SELECT REGEXP_REPLACE(hex_string, '%([[:xdigit:]][[:xdigit:]])', CHR(x'\\1'::int))
ERROR: 22P02: "\" is not a valid hexadecimal digit
SELECT REGEXP_REPLACE(hex_string, '%([[:xdigit:]][[:xdigit:]])',CHR(STRTOL('0x'||'\\1', 16)::int))
ERROR: 22023: The input 0x\1 is not valid to be converted to base 16
The CHR and STRTOL functions work on their own. For example:
SELECT CHR(x'3A'::int)
SELECT CHR(STRTOL('0x3A', 16)::int)
both return
:
And if I run the same pattern using a different function (other than CHR and STRTOL), it works:
REGEXP_REPLACE(hex_string, '%([[:xdigit:]][[:xdigit:]])', LOWER('{H}'||'\\1'||'{/H}'))
returns
http{h}3A{/h}{h}2F{/h}{h}2F{/h}www.amazon.com{h}2F{/h}Test{h}3F{/h}name{h}3D{/h}Gary{h}26{/h}Bob
But for some reason those functions won't recognize the regex matching group.
Any tips on how I can do this?
I guess the other solution is to use nested REPLACE() functions for all of the special hex characters, but that's probably a very last resort.
What you want to do is called "URL decode".
Currently there is no built-in function for doing this, but you can create a custom User-Defined Function (make sure you have the required privileges):
CREATE FUNCTION urldecode(url VARCHAR)
RETURNS varchar
IMMUTABLE AS $$
import urllib
return urllib.unquote(url).decode('utf8') # or 'latin-1', depending on how the text is encoded
$$ LANGUAGE plpythonu;
Example query:
SELECT urldecode('http%3A%2F%2Fwww.amazon.com%2FTest%3Fname%3DGary%26Bob');
Result:
http://www.amazon.com/Test?name=Gary&Bob
I tried @hiddenbit's answer in Redshift, but Python 3 isn't supported. The following Python 2 code did work for me, however:
DROP FUNCTION urldecode(varchar);
CREATE FUNCTION urldecode(url VARCHAR)
RETURNS varchar
IMMUTABLE AS $$
import urllib
return urllib.unquote(url)
$$ LANGUAGE plpythonu;
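
The error messages in the question suggest why the REGEXP_REPLACE attempts fail: CHR(...) is evaluated on the literal text '\\1' before the regex ever runs, so it never sees the captured digits. A per-match callback avoids that. The following is a minimal Python 3 sketch meant to run outside Redshift; decode_hex_escapes is a hypothetical name, and the per-byte chr() approach is only correct for escapes that encode single-byte (ASCII or Latin-1) characters.

```python
import re
from urllib.parse import unquote

encoded = "http%3A%2F%2Fwww.amazon.com%2FTest%3Fname%3DGary%26Bob"


def decode_hex_escapes(s):
    # The lambda runs once per match, after the two hex digits were captured.
    return re.sub(r"%([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), s)


print(decode_hex_escapes(encoded))  # http://www.amazon.com/Test?name=Gary&Bob
print(unquote(encoded))             # same result, also handles multi-byte UTF-8
```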

Replacing nonbreaking spaces (%A0) in Postgres

I've got some values in a varchar column that are separated by nonbreaking spaces (urlencoded %A0 instead of %20). I'm trying to replace them with spaces, but can't seem to get the syntax right:
select regexp_replace('hello world', E'\xa0', ' ')
What is the correct way to encode the character in a Postgres regexp_replace function? Or, is there a better way to do the replacement?
Replacing '\xa0' didn't work for me, possibly because my strings were in UTF-8 rather than Latin1 or other where the character is encoded directly as A0. (U+A0 is encoded with bytes C2 A0 in UTF-8)
I found it more practical to replace it as a code point (U+A0) rather than as the encoded bytes (C2 A0 or A0):
select replace('456321 ', E'\u00a0', '') -- value is E'456321\u00a0'
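
The difference between the code point and its encoded bytes can be seen in a short Python sketch. It is an illustration only and assumes the urlencoded %A0 came from Latin-1 text, as the question's description implies.

```python
from urllib.parse import unquote

nbsp = "\u00a0"  # the non-breaking space as a code point
print(nbsp.encode("utf-8"))    # b'\xc2\xa0'
print(nbsp.encode("latin-1"))  # b'\xa0'

# %A0 on its own is only a complete character when read as Latin-1.
value = unquote("hello%A0world", encoding="latin-1")
cleaned = value.replace(nbsp, " ")
print(cleaned)  # hello world
```

Matching on the code point, as in the E'\u00a0' query above, works regardless of which byte encoding the database uses.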
This may help you
select replace('Hello world', '\xa0', '')
Ref Postgresql (Current) Section 9.4. String Functions and Operators

How to XOR md5 hash values and cast them to HEX in postgresql

What I have tried so far
SELECT md5(text) returns text (a hex string).
After that, we need to XOR them:
SELECT x'hex_string' # x'hex_string';
But the above results in binary values.
How do I convert them back into a hex string?
Is there any way to XOR md5 values in PostgreSQL and convert the result into a hexadecimal value again?
Those binary values are in fact of type bit varying, which differs significantly from bytea.
bit varying comes with built-in support for XOR and such, but PostgreSQL doesn't provide a cast from bit varying to bytea.
You could write a function that does the cast, but it's not trivial and probably not the most efficient way in your case.
It would make more sense to XOR the md5 digests directly. PostgreSQL doesn't provide the XOR operator for bytea either, but it can easily be written as a function, especially if we assume that the operands have equal length (16 bytes in the case of md5 digests):
CREATE FUNCTION xor_digests(_in1 bytea, _in2 bytea) RETURNS bytea
AS $$
DECLARE
    o int; -- offset
BEGIN
    FOR o IN 0..octet_length(_in1)-1 LOOP
        _in1 := set_byte(_in1, o, get_byte(_in1, o) # get_byte(_in2, o));
    END LOOP;
    RETURN _in1;
END;
$$ language plpgsql;
Now the built-in PostgreSQL md5 function, which produces a hex string, is not the best fit for post-processing either. The pgcrypto module provides this function instead:
digest(data text, type text) returns bytea
Using this function and getting the final result as a hex string:
select encode(
    xor_digests(digest('first string', 'md5'),
                digest('second string', 'md5')),
    'hex');
produces the result: c1bd61a3c411bc0127c6d7ab1238c4bd with type text.
If pgcrypto can't be installed and only the built-in md5 function is available, you could still combine encode and decode to achieve the result like this:
select
    encode(
        xor_digests(
            decode(md5('first string'), 'hex'),
            decode(md5('second string'), 'hex')
        ),
        'hex'
    );
Result:
c1bd61a3c411bc0127c6d7ab1238c4bd
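
To cross-check a result outside the database, the same byte-wise XOR can be sketched in Python with hashlib. The function name xor_md5_hex is hypothetical, and the sketch assumes the strings are hashed as UTF-8 bytes.

```python
import hashlib


def xor_md5_hex(a: str, b: str) -> str:
    """XOR the 16-byte md5 digests of a and b and return the result as hex."""
    d1 = hashlib.md5(a.encode("utf-8")).digest()
    d2 = hashlib.md5(b.encode("utf-8")).digest()
    # Both digests are 16 bytes long, so zip() pairs every byte.
    return bytes(x ^ y for x, y in zip(d1, d2)).hex()


print(xor_md5_hex("first string", "second string"))
```

XOR-ing the output with one of the two digests again yields the other digest, which is a quick sanity check for either implementation.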
You may want to check
https://github.com/artejera/text_xor_agg/
I recently wrote this postgres script for such a use case.

How to convert literal \u sequences into UTF-8? [duplicate]

This question already has an answer here:
Convert escaped Unicode character back to actual character in PostgreSQL
(1 answer)
Closed 8 years ago.
I am loading data dump from external source and some strings contain \uXXXX sequences for the UTF8 chars, like this one:
\u017D\u010F\u00E1r nad S\u00E1zavou
I can check the contents by using E'' constant in psql, but cannot find any function/operator to return me proper value.
I'd like to ask, if it's possible to convert this string with unicode escapes into normal UTF8 without using PL/pgSQL functions?
I don't think there is a built in method for that. Easiest way I can think of is the plpgsql function you wanted to avoid:
CREATE OR REPLACE FUNCTION str_eval(text, OUT t text)
LANGUAGE plpgsql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
BEGIN
EXECUTE 'SELECT E''' || replace($1, '''', '''''') || ''''
USING $1
INTO t;
END
$func$;
The updated version safeguards against SQL injection and is faster, too.
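
If the dump can be pre-processed before loading, the escapes can also be decoded without evaluating any SQL. The following Python sketch is one possible approach, not part of the answer above; decode_u_escapes is a hypothetical name, and it assumes every escape is exactly four hex digits and does not combine surrogate pairs.

```python
import re


def decode_u_escapes(s: str) -> str:
    # Replace each literal \uXXXX with the character for that code point.
    return re.sub(r"\\u([0-9A-Fa-f]{4})", lambda m: chr(int(m.group(1), 16)), s)


raw = r"\u017D\u010F\u00E1r nad S\u00E1zavou"  # raw string: backslashes stay literal
print(decode_u_escapes(raw))  # Žďár nad Sázavou
```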