How to generate a random 6-digit integer that's cryptographically strong? - postgresql

We want a function that given an argument that is three bytes of type bytea (generated by the function gen_random_bytes of the pgcrypto extension), the function returns a random 6-digit integer (between 0 & 999999 inclusive). The 6-digit integer should preserve the randomness given by the argument passed to the function.

I might be overcomplicating, but I need to make sure that I have exactly n numeric digits in the string. Repetition of numbers is allowed.
SELECT string_agg(shuffle('0123456789')::char, '')
FROM generate_series(1, 6);
With the shuffle function provided in another answer that I copied here for convenience
create or replace function shuffle(text)
returns text language sql as $$
select string_agg(ch, '')
from (
select substr($1, i, 1) ch
from generate_series(1, length($1)) i
order by random()
) s
$$;

In case of 3 bytes, take 6 last chars, prepend 'x', convert to bitstring and then to int:
select ('x' || right(gen_random_bytes(3)::text, 6))::bit(24)::int;
More details are in similar question: What's the easiest way to represent a bytea as a single integer in PostgreSQL?

Related

error 'function sum(character varying) does not exist' using sum with SQL Alchemy and Postgres [duplicate]

I want to calculate the average number from a column in PostgreSQL
SELECT AVG(col_name)
From TableName
It gives me this error:
ERROR: function avg (character varying) does not exist
LINE 1: SELECT AVG(col_name)
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
Store numbers in a numeric field, an integer, decimal or whatever. But not in a text/varchar field.
Check the manual for all numeric data types.
Nasty workaound: CAST some records to a numeric value and keep others as text. Example:
/*
create temp table foo AS
SELECT x FROM (VALUES('x'),('1'),('2')) sub(x);
*/
WITH cte AS (
SELECT
CASE
WHEN x ~ '[0-9]' THEN CAST(x AS decimal) -- cast to numeric field
END AS num,
CASE
WHEN x ~ '[a-zA-Z]' THEN x
END AS a
FROM foo
)
SELECT AVG(num), COUNT(a) FROM cte;

Typecast string to integer

I am importing data from a table which has raw feeds in Varchar, I need to import a column in varchar into a string column. I tried using the <column_name>::integer as well as to_number(<column_name>,'9999999') but I am getting errors, as there are a few empty fields, I need to retrieve them as empty or null into the new table.
Wild guess: If your value is an empty string, you can use NULLIF to replace it for a NULL:
SELECT
NULLIF(your_value, '')::int
You can even go one further and restrict on this coalesced field such as, for example:-
SELECT CAST(coalesce(<column>, '0') AS integer) as new_field
from <table>
where CAST(coalesce(<column>, '0') AS integer) >= 10;
If you need to treat empty columns as NULLs, try this:
SELECT CAST(nullif(<column>, '') AS integer);
On the other hand, if you do have NULL values that you need to avoid, try:
SELECT CAST(coalesce(<column>, '0') AS integer);
I do agree, error message would help a lot.
The only way I succeed to not having an error because of NULL, or special characters or empty string is by doing this:
SELECT REGEXP_REPLACE(COALESCE(<column>::character varying, '0'), '[^0-9]*' ,'0')::integer FROM table
I'm not able to comment (too little reputation? I'm pretty new) on Lukas' post.
On my PG setup to_number(NULL) does not work, so my solution would be:
SELECT CASE WHEN column = NULL THEN NULL ELSE column :: Integer END
FROM table
If the value contains non-numeric characters, you can convert the value to an integer as follows:
SELECT CASE WHEN <column>~E'^\\d+$' THEN CAST (<column> AS INTEGER) ELSE 0 END FROM table;
The CASE operator checks the < column>, if it matches the integer pattern, it converts the rate into an integer, otherwise it returns 0
Common issue
Naively type casting any string into an integer like so
SELECT ''::integer
Often results to the famous error:
Query failed: ERROR: invalid input syntax for integer: ""
Problem
PostgreSQL has no pre-defined function for safely type casting any string into an integer.
Solution
Create a user-defined function inspired by PHP's intval() function.
CREATE FUNCTION intval(character varying) RETURNS integer AS $$
SELECT
CASE
WHEN length(btrim(regexp_replace($1, '[^0-9]', '','g')))>0 THEN btrim(regexp_replace($1, '[^0-9]', '','g'))::integer
ELSE 0
END AS intval;
$$
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT;
Usage
/* Example 1 */
SELECT intval('9000');
-- output: 9000
/* Example 2 */
SELECT intval('9gag');
-- output: 9
/* Example 3 */
SELECT intval('the quick brown fox jumps over the lazy dog');
-- output: 0
you can use this query
SUM(NULLIF(conversion_units, '')::numeric)
And if your column has decimal points
select NULLIF('105.0', '')::decimal
This works for me:
select (left(regexp_replace(coalesce('<column_name>', '0') || '', '[^0-9]', '', 'g'), 8) || '0')::integer
For easy view:
select (
left(
regexp_replace(
-- if null then '0', and convert to string for regexp
coalesce('<column_name>', '0') || '',
'[^0-9]',
'',
'g'
), -- remove everything except numbers
8 -- ensure ::integer doesn't overload
) || '0' -- ensure not empty string gets to ::integer
)::integer
The perfect solution for me is to use nullif and regexp_replace
SELECT NULLIF(REGEXP_REPLACE('98123162t3712t37', '[^0-9]', '', 'g'), '')::bigint;
Above solution consider the following edge cases.
String and Number: only the regexp_replace function perfectly converts into integers.
SELECT NULLIF(REGEXP_REPLACE('string and 12345', '[^0-9]', '', 'g'), '')::bigint;
Only string: regexp_replace converts non-string characters to empty strings; which can't cast directly to integer so use nullif to convert to null
SELECT NULLIF(REGEXP_REPLACE('only string', '[^0-9]', '', 'g'), '')::bigint;
Integer range: Converting a string into integer may cause out of range for type integer error. So use bigint instead
SELECT NULLIF(REGEXP_REPLACE('98123162t3712t37', '[^0-9]', '', 'g'), '')::bigint;

PostreSQL ERROR: Function AVG (character varying) does not exist

I want to calculate the average number from a column in PostgreSQL
SELECT AVG(col_name)
From TableName
It gives me this error:
ERROR: function avg (character varying) does not exist
LINE 1: SELECT AVG(col_name)
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
Store numbers in a numeric field, an integer, decimal or whatever. But not in a text/varchar field.
Check the manual for all numeric data types.
Nasty workaound: CAST some records to a numeric value and keep others as text. Example:
/*
create temp table foo AS
SELECT x FROM (VALUES('x'),('1'),('2')) sub(x);
*/
WITH cte AS (
SELECT
CASE
WHEN x ~ '[0-9]' THEN CAST(x AS decimal) -- cast to numeric field
END AS num,
CASE
WHEN x ~ '[a-zA-Z]' THEN x
END AS a
FROM foo
)
SELECT AVG(num), COUNT(a) FROM cte;

Convert hex in text representation to decimal number

I am trying to convert hex to decimal using PostgreSQL 9.1
with this query:
SELECT to_number('DEADBEEF', 'FMXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX');
I get the following error:
ERROR: invalid input syntax for type numeric: " "
What am I doing wrong?
Ways without dynamic SQL
There is no cast from hex numbers in text representation to a numeric type, but we can use bit(n) as waypoint. There are undocumented casts from bit strings (bit(n)) to integer types (int2, int4, int8) - the internal representation is binary compatible. Quoting Tom Lane:
This is relying on some undocumented behavior of the bit-type input
converter, but I see no reason to expect that would break. A possibly
bigger issue is that it requires PG >= 8.3 since there wasn't a text
to bit cast before that.
integer for max. 8 hex digits
Up to 8 hex digits can be converted to bit(32) and then coerced to integer (standard 4-byte integer):
SELECT ('x' || lpad(hex, 8, '0'))::bit(32)::int AS int_val
FROM (
VALUES
('1'::text)
, ('f')
, ('100')
, ('7fffffff')
, ('80000000') -- overflow into negative number
, ('deadbeef')
, ('ffffffff')
, ('ffffffff123') -- too long
) AS t(hex);
int_val
------------
1
15
256
2147483647
-2147483648
-559038737
-1
Postgres uses a signed integer type, so hex numbers above '7fffffff' overflow into negative integer numbers. This is still a valid, unique representation but the meaning is different. If that matters, switch to bigint; see below.
For more than 8 hex digits the least significant characters (excess to the right) get truncated.
4 bits in a bit string encode 1 hex digit. Hex numbers of known length can be cast to the respective bit(n) directly. Alternatively, pad hex numbers of unknown length with leading zeros (0) as demonstrated and cast to bit(32). Example with 7 hex digits and int or 8 digits and bigint:
SELECT ('x'|| 'deafbee')::bit(28)::int
, ('x'|| 'deadbeef')::bit(32)::bigint;
int4 | int8
-----------+------------
233503726 | 3735928559
bigint for max. 16 hex digits
Up to 16 hex digits can be converted to bit(64) and then coerced to bigint (int8, 8-byte integer) - overflowing into negative numbers in the upper half again:
SELECT ('x' || lpad(hex, 16, '0'))::bit(64)::bigint AS int8_val
FROM (
VALUES
('ff'::text)
, ('7fffffff')
, ('80000000')
, ('deadbeef')
, ('7fffffffffffffff')
, ('8000000000000000') -- overflow into negative number
, ('ffffffffffffffff')
, ('ffffffffffffffff123') -- too long
) t(hex);
int8_val
---------------------
255
2147483647
2147483648
3735928559
9223372036854775807
-9223372036854775808
-1
-1
uuid for max. 32 hex digits
The Postgres uuid data type is not a numeric type. But it's the most efficient type in standard Postgres to store up to 32 hex digits, only occupying 16 bytes of storage. There is a direct cast from text to uuid (no need for bit(n) as waypoint), but exactly 32 hex digits are required.
SELECT lpad(hex, 32, '0')::uuid AS uuid_val
FROM (
VALUES ('ff'::text)
, ('deadbeef')
, ('ffffffffffffffff')
, ('ffffffffffffffffffffffffffffffff')
, ('ffffffffffffffffffffffffffffffff123') -- too long
) t(hex);
uuid_val
--------------------------------------
00000000-0000-0000-0000-0000000000ff
00000000-0000-0000-0000-0000deadbeef
00000000-0000-0000-ffff-ffffffffffff
ffffffff-ffff-ffff-ffff-ffffffffffff
ffffffff-ffff-ffff-ffff-ffffffffffff
As you can see, standard output is a string of hex digits with typical separators for UUID.
md5 hash
This is particularly useful to store md5 hashes:
SELECT md5('Store hash for long string, maybe for index?')::uuid AS md5_hash;
md5_hash
--------------------------------------
02e10e94-e895-616e-8e23-bb7f8025da42
See:
What is the optimal data type for an MD5 field?
You have two immediate problems:
to_number doesn't understand hexadecimal.
X doesn't have any meaning in a to_number format string and anything without a meaning apparently means "skip a character".
I don't have an authoritative justification for (2), just empirical evidence:
=> SELECT to_number('123', 'X999');
to_number
-----------
23
(1 row)
=> SELECT to_number('123', 'XX999');
to_number
-----------
3
The documentation mentions how double quoted patterns are supposed to behave:
In to_date, to_number, and to_timestamp, double-quoted strings skip the number of input characters contained in the string, e.g. "XX" skips two input characters.
but the behavior of non-quoted characters that are not formatting characters appears to be unspecified.
In any case, to_number isn't the right tool for converting hex to numbers, you want to say something like this:
select x'deadbeef'::int;
so perhaps this function will work better for you:
CREATE OR REPLACE FUNCTION hex_to_int(hexval varchar) RETURNS integer AS $$
DECLARE
result int;
BEGIN
EXECUTE 'SELECT x' || quote_literal(hexval) || '::int' INTO result;
RETURN result;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
Then:
=> select hex_to_int('DEADBEEF');
hex_to_int
------------
-559038737 **
(1 row)
** To avoid negative numbers like this from integer overflow error, use bigint instead of int to accommodate larger hex numbers (like IP addresses).
pg-bignum
Internally, pg-bignum uses the SSL library for big numbers. This method has none of the drawbacks mentioned in the other answers with numeric. Nor is it slowed down by plpgsql. It's fast and it works with a number of any size. Test case taken from Erwin's answer for comparison,
CREATE EXTENSION bignum;
SELECT hex, bn_in_hex(hex::cstring)
FROM (
VALUES ('ff'::text)
, ('7fffffff')
, ('80000000')
, ('deadbeef')
, ('7fffffffffffffff')
, ('8000000000000000')
, ('ffffffffffffffff')
, ('ffffffffffffffff123')
) t(hex);
hex | bn_in_hex
---------------------+-------------------------
ff | 255
7fffffff | 2147483647
80000000 | 2147483648
deadbeef | 3735928559
7fffffffffffffff | 9223372036854775807
8000000000000000 | 9223372036854775808
ffffffffffffffff | 18446744073709551615
ffffffffffffffff123 | 75557863725914323415331
(8 rows)
You can get the type to numeric using bn_in_hex('deadbeef')::text::numeric.
If anybody else is stuck with PG8.2, here is another way to do it.
bigint version:
create or replace function hex_to_bigint(hexval text) returns bigint as $$
select
(get_byte(x,0)::int8<<(7*8)) |
(get_byte(x,1)::int8<<(6*8)) |
(get_byte(x,2)::int8<<(5*8)) |
(get_byte(x,3)::int8<<(4*8)) |
(get_byte(x,4)::int8<<(3*8)) |
(get_byte(x,5)::int8<<(2*8)) |
(get_byte(x,6)::int8<<(1*8)) |
(get_byte(x,7)::int8)
from (
select decode(lpad($1, 16, '0'), 'hex') as x
) as a;
$$
language sql strict immutable;
int version:
create or replace function hex_to_int(hexval text) returns int as $$
select
(get_byte(x,0)::int<<(3*8)) |
(get_byte(x,1)::int<<(2*8)) |
(get_byte(x,2)::int<<(1*8)) |
(get_byte(x,3)::int)
from (
select decode(lpad($1, 8, '0'), 'hex') as x
) as a;
$$
language sql strict immutable;
Here is a version which uses numeric, so it can handle arbitrarily large hex strings:
create function hex_to_decimal(hex_string text)
returns text
language plpgsql immutable as $pgsql$
declare
bits bit varying;
result numeric := 0;
exponent numeric := 0;
chunk_size integer := 31;
start integer;
begin
execute 'SELECT x' || quote_literal(hex_string) INTO bits;
while length(bits) > 0 loop
start := greatest(1, length(bits) - chunk_size);
result := result + (substring(bits from start for chunk_size)::bigint)::numeric * pow(2::numeric, exponent);
exponent := exponent + chunk_size;
bits := substring(bits from 1 for greatest(0, length(bits) - chunk_size));
end loop;
return trunc(result, 0);
end
$pgsql$;
For example:
=# select hex_to_decimal('ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff');
32592575621351777380295131014550050576823494298654980010178247189670100796213387298934358015
Here is a poper way to convert hex to string... then you can check whether it's a numric type or not
SELECT convert_from('\x7468697320697320612076657279206C6F6E672068657820737472696E67','utf8')
returns
this is a very long hex string
Here is a other version which uses numeric, so it can handle arbitrarily large hex strings:
create OR REPLACE function hex_to_decimal2(hex_string text)
returns text
language plpgsql immutable as $pgsql$
declare
bits bit varying;
result numeric := 0;
begin
execute 'SELECT x' || quote_literal(hex_string) INTO bits;
while length(bits) > 0 loop
result := result + (substring(bits from 1 for 1)::bigint)::numeric * pow(2::numeric, length(bits) - 1);
bits := substring(bits from 2 for length(bits) - 1);
end loop;
return trunc(result, 0);
end
$pgsql$;
For example:
=# select hex_to_decimal('ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff');
32592575621351777380295131014550050576823494298654980010178247189670100796213387298934358015
For example:
=# select hex_to_decimal('5f68e8131ecf80000');
110000000000000000000
Here is another implementation:
CREATE OR REPLACE FUNCTION hex_to_decimal3(hex_string text)
RETURNS numeric
LANGUAGE plpgsql
IMMUTABLE
AS $function$
declare
hex_string_lower text := lower(hex_string);
i int;
digit int;
s numeric := 0;
begin
for i in 1 .. length(hex_string) loop
digit := position(substr(hex_string_lower, i, 1) in '0123456789abcdef') - 1;
if digit < 0 then
raise '"%" is not a valid hexadecimal digit', substr(hex_string_lower, i, 1) using errcode = '22P02';
end if;
s := s * 16 + digit;
end loop;
return s;
end
$function$;
It is a straightforward one that works digit by digit, using the position() function to compute the numeric value of each character in the input string. Its benefit over hex_to_decimal2() is that it seems to be much faster (4x or so for md5()-generated hex strings).
CREATE OR REPLACE FUNCTION numeric_from_bytes(bytea)
RETURNS numeric
LANGUAGE plpgsql
AS $$
declare
bits bit varying;
result numeric := 0;
exponent numeric := 0;
bit_pos integer;
begin
execute 'SELECT x' || quote_literal(substr($1::text,3)) into bits;
bit_pos := length(bits) + 1;
exponent := 0;
while bit_pos >= 56 loop
bit_pos := bit_pos - 56;
result := result + substring(bits from bit_pos for 56)::bigint::numeric * pow(2::numeric, exponent);
exponent := exponent + 56;
end loop;
while bit_pos >= 8 loop
bit_pos := bit_pos - 8;
result := result + substring(bits from bit_pos for 8)::bigint::numeric * pow(2::numeric, exponent);
exponent := exponent + 8;
end loop;
return trunc(result);
end;
$$;
In a future PostgreSQL version, when/if Dean Rasheed's patch 0001-Add-non-decimal-integer-support-to-type-numeric.patch gets committed, this can be simplified:
CREATE OR REPLACE FUNCTION numeric_from_bytes(bytea)
RETURNS numeric
LANGUAGE sql
AS $$
SELECT ('0'||right($1::text,-1))::numeric
$$;

How to find the first and last occurrences of a specific character inside a string in PostgreSQL

I want to find the first and the last occurrences of a specific character inside a string. As an example, consider a string named "2010-####-3434", and suppose the character to be searched for is "#". The first occurrence of hash inside the string is at 6-th position, and the last occurrence is at 9-th position.
Well...
Select position('#' in '2010-####-3434');
will give you the first.
If you want the last, just run that again with the reverse of your string. A pl/pgsql string reverse can be found here.
Select length('2010-####-3434') - position('#' in reverse_string('2010-####-3434')) + 1;
My example:
reverse(substr(reverse(newvalue),0,strpos(reverse(newvalue),',')))
Reverse all string
Substring string
Reverse result
In the case where char = '.', an escape is needed. So the function can be written:
CREATE OR REPLACE FUNCTION last_post(text,char)
RETURNS integer LANGUAGE SQL AS $$
select length($1)- length(regexp_replace($1, E'.*\\' || $2,''));
$$;
9.5+ with array_positions
Using basic PostgreSQL array functions we call string_to_array(), and then feed that to array_positions() like this array_positions(string_to_array(str,null), c)
SELECT
arrpos[array_lower(arrpos,1)] AS first,
arrpos[array_upper(arrpos,1)] AS last
FROM ( VALUES
('2010-####-3434', '#')
) AS t(str,c)
CROSS JOIN LATERAL array_positions(string_to_array(str,null), c)
AS arrpos;
I do not know how to do that, but the regular expression functions like regexp_matches, regexp_replace, and regexp_split_to_array may be an alternative route to solving your problem.
This pure SQL function will provide the last position of a char inside the string, counting from 1. It returns 0 if not found ... But (big disclaimer) it breaks if the character is some regex metacharacter ( .$^()[]*+ )
CREATE FUNCTION last_post(text,char) RETURNS integer AS $$
select length($1)- length(regexp_replace($1, '.*' || $2,''));
$$ LANGUAGE SQL IMMUTABLE;
test=# select last_post('hi#-#-#byte','#');
last_post
-----------
7
test=# select last_post('hi#-#-#byte','a');
last_post
-----------
0
A more robust solution would involve pl/pgSQL, as rfusca's answer.
Another way to count last position is to slit string to array by delimeter equals to needed character and then substract length of characters
for the last element from the length of whole string
CREATE FUNCTION last_pos(char, text) RETURNS INTEGER AS
$$
select length($2) - length(a.arr[array_length(a.arr,1)])
from (select string_to_array($2, $1) as arr) as a
$$ LANGUAGE SQL;
For the first position it is easier to use
select position('#' in '2010-####-3434');