PostgreSQL: Treat TEXT column as hexadecimal number [duplicate]

This question already has answers here:
Convert hex in text representation to decimal number
(9 answers)
Closed 5 years ago.
I have a table that has a TEXT column which contains hexadecimal numbers.
I now need to represent them as the integers that they really are, but found no
way of doing that (something like to_hex() in reverse).
I am aware that I can convert literal hexadecimal values like this:
SELECT x'DEADBEEF';
But how do I apply something like this if the value to be converted comes from a
column? Concatenating 'x' to the column name obviously doesn't work, because then
it is no longer a string literal.
I found a very ugly function in the PostgreSQL mailing lists which
pieces together a query string such that the function argument is then again a
literal, and then executes that function, but the approach is just downright
perverse---there has to be a better way. At least I hope so, given that the
message is almost ten years old…
Of course, I know that having the value in question stored as an integer in the
database in the first place would be the way to go. But this is not possible in
this case, so I'm stuck with trying to decipher those strings…

The following functions are roughly the same as the function in that post from the mailing list. In fact, I took them from the mailing list too, but from a newer post. I can't see anything wrong with them. I've used them just once, while migrating a small dataset.
Feel free to downvote if you can point out anything "perverse" that could result from their use.
With INTEGER datatype:
CREATE OR REPLACE FUNCTION hex_to_int(hexval varchar) RETURNS integer AS $$
DECLARE
result int;
BEGIN
EXECUTE 'SELECT x''' || hexval || '''::int' INTO result;
RETURN result;
END;
$$
LANGUAGE 'plpgsql' IMMUTABLE STRICT;
With BIGINT datatype:
CREATE OR REPLACE FUNCTION hex_to_bigint(hexval varchar) RETURNS bigint AS $$
DECLARE
result bigint;
BEGIN
EXECUTE 'SELECT x''' || hexval || '''::bigint' INTO result;
RETURN result;
END;
$$
LANGUAGE 'plpgsql' IMMUTABLE STRICT;

Hm, there might be a simpler way to do it.
CREATE FUNCTION from_hex(text) RETURNS integer AS $$
DECLARE
x bytea;
BEGIN
x := decode($1, 'hex');
return (get_byte(x, 0) << 24) | (get_byte(x, 1) << 16) |
(get_byte(x, 2) << 8) | get_byte(x, 3);
END
$$ LANGUAGE plpgsql;
Note that as written, this only works for 8-digit hexadecimal numbers.
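One way around the 8-digit restriction, sketched here as an assumption rather than as part of the answer above, is to left-pad the string and cast it through bit(32). The cast from text with a leading 'x' relies on the bit-string input routine accepting hex notation, which is widely used but not prominently documented. The VALUES list and the column name hex are made-up sample data.

```sql
-- Pad to 8 hex digits, prefix with 'x', then cast text -> bit(32) -> integer.
SELECT hex,
       ('x' || lpad(hex, 8, '0'))::bit(32)::int AS int_val
FROM (VALUES ('ff'), ('1A2B'), ('DEADBEEF')) AS t(hex);

-- For values that do not fit a signed 32-bit integer, the same idea with
-- 16 digits and bit(64) yields a bigint instead.
SELECT ('x' || lpad('DEADBEEF', 16, '0'))::bit(64)::bigint AS bigint_val;
```

Note that lpad() truncates input longer than the requested width, so strings with more digits than the target type allows would need to be checked separately.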

Related

Postgres function to turn regexp_matches SETOF result into an ARRAY

I am having a go at pgSQL for the first time.
Many thanks in advance indeed - Eugen
The problem I like to solve
I like to return an array of words with a minimum length from a text. Picked regexp_matches which returns SETOF text[] (not array). There may be better ways but this is what I picked.
regexp_matches ( string text, pattern text [, flags text ] ) → setof text[]
Postgres Reference for String functions
The Issue
When trying to use individual results from the result set (seemingly arrays), they are in fact of type text but include the curly brackets of a Postgres array.
Example
Functionally the below example works because I added a workaround removing curly brackets with the trim function.
The output for the function described below
# select lower_words_arr('there is an answer','\w{3,}') AS words;
NOTICE: match = {there}
NOTICE: match = {answer}
words
----------------
{there,answer}
(1 row)
My Question
Why is it that individual regexp_matches entries match and return data as "{txt}" but behave as type text and not ARRAY?
Is there a way without my workaround?
Code
CREATE OR REPLACE FUNCTION lower_words_arr(input_str text, match_expr text) RETURNS text[] AS $$
DECLARE
words_arr text[];
one_match text;
BEGIN
FOR one_match IN SELECT regexp_matches(lower(input_str), match_expr, 'g')
AS match
LOOP
RAISE NOTICE 'match = %', one_match;
-- But if I write:
--
-- RAISE NOTICE 'match = %', one_match[0];
--
-- fails with:
---
-- ERROR: cannot subscript type text because it is not an array
-- CONTEXT: SQL statement "SELECT one_match[0]"
-- PL/pgSQL function lower_words_arr(text,text) line 10 at RAISE
--
-- even though this is the result returned
---
-- NOTICE: match = {there}
-- NOTICE: match = {answer}
words_arr := array_append(words_arr,trim(BOTH '{}' FROM one_match));
END LOOP;
RETURN words_arr;
END;
$$ LANGUAGE plpgsql;

I have now solved the issue without the trim workaround. The solution came right from the Postgres documentation. Look for SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1]; on that page. With that knowledge in hand I changed the function to
Improved Solution
CREATE OR REPLACE FUNCTION lower_words_arr(input_str text, match_expr text) RETURNS text[] AS $$
DECLARE
words_arr text[];
one_match text;
BEGIN
FOR one_match IN SELECT(regexp_matches(lower(input_str), match_expr, 'g'))[1] AS match
LOOP
RAISE NOTICE 'match = %', one_match;
words_arr := array_append(words_arr,one_match);
END LOOP;
RETURN words_arr;
END;
$$ LANGUAGE plpgsql STABLE;
Output
select lower_words_arr('there is an answer - you will find it in the Postgres documentation.','\w{4,}') AS words;
NOTICE: match = there
NOTICE: match = answer
NOTICE: match = will
NOTICE: match = find
NOTICE: match = postgres
NOTICE: match = documentation
words
-------------------------------------------------
{there,answer,will,find,postgres,documentation}
(1 row)
Conclusion
My original problem has been solved, but one piece of understanding is still missing (not the subject of my initial question):
Why does the one_match variable not behave like an array when [1] is removed from the SELECT?
Maybe I should open a question specific to that behaviour ...
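A likely explanation, offered as a sketch rather than a confirmed answer: one_match is declared as text, so each text[] row produced by regexp_matches is coerced to its text form (such as '{there}') when it is assigned to the variable. Declaring the loop variable as text[] keeps it an array, and array_agg over the set avoids the loop altogether. The alias m and the anonymous DO block are illustrative only.

```sql
-- Variant 1: declare the loop variable as text[] so it can be subscripted.
DO $$
DECLARE
    words_arr text[];
    one_match text[];   -- an array this time, not text
BEGIN
    FOR one_match IN
        SELECT regexp_matches(lower('there is an answer'), '\w{3,}', 'g')
    LOOP
        -- Postgres arrays are 1-based, so the first capture is element 1.
        words_arr := array_append(words_arr, one_match[1]);
    END LOOP;
    RAISE NOTICE 'words = %', words_arr;
END
$$;

-- Variant 2: no loop; aggregate the first element of every match row.
SELECT array_agg(m[1]) AS words
FROM regexp_matches(lower('there is an answer'), '\w{3,}', 'g') AS m;
```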

Understanding difference between int literal vs int parameter in PL/pgSQL function

I have a function to left pad bit strings in PostgreSQL 9.5:
CREATE OR REPLACE FUNCTION lpad_bits(val bit varying)
RETURNS bit varying as
$BODY$
BEGIN return val::bit(32) >> (32-length(val));
END;
$BODY$
LANGUAGE plpgsql IMMUTABLE;
which works fine:
# select lpad_bits(b'1001100111000');
lpad_bits
----------------------------------
00000000000000000001001100111000
(1 row)
My problem is when I try to add a parameter to change the amount of padding:
CREATE OR REPLACE FUNCTION lpad_bits(val bit varying, sz integer default 1024)
RETURNS bit varying as
$BODY$
BEGIN return val::bit(sz) >> (sz-length(val));
END;
$BODY$
LANGUAGE plpgsql IMMUTABLE;
The function is now broken:
# select lpad_bits(b'1001100111000', 32);
ERROR: invalid input syntax for integer: "sz"
LINE 1: SELECT val::bit(sz) >> (sz-length(val))
^
QUERY: SELECT val::bit(sz) >> (sz-length(val))
CONTEXT: PL/pgSQL function lpad_bits(bit varying,integer) line 2 at RETURN
I have stared at the bitstring documentation and the PL/pgSQL function documentation, and I am simply not seeing what is fundamentally different between these two implementations.

Why?
PL/pgSQL executes SQL queries like prepared statements. The manual about parameter substitution:
Prepared statements can take parameters: values that are substituted
into the statement when it is executed.
Note the term values. Only actual values can be parameterized, but not key words, identifiers or type names. 32 in bit(32) looks like a value, but the modifier of a data type is only a "value" internally and can't be parameterized. SQL needs to know data types at the planning stage; it cannot wait for the execution stage.
You could achieve your goal with dynamic SQL and EXECUTE. As proof of concept:
CREATE OR REPLACE FUNCTION lpad_bits(val varbit, sz int = 32, OUT outval varbit) AS
$func$
BEGIN
EXECUTE format('SELECT $1::bit(%s) >> $2', sz) -- literal
USING val, sz - length(val) -- values
INTO outval;
END
$func$ LANGUAGE plpgsql IMMUTABLE;
Call:
SELECT lpad_bits(b'1001100111000', 32);
Note the distinction between sz being used as literal to build the statement and its second occurrence where it's used as value, that can be passed as parameter.
Faster alternatives
A superior solution for this particular task is to just use lpad() like @Abelisto suggested:
CREATE OR REPLACE FUNCTION lpad_bits2(val varbit, sz int = 32)
RETURNS varbit AS
$func$
SELECT lpad(val::text, sz, '0')::varbit;
$func$ LANGUAGE sql IMMUTABLE;
(Simpler as plain SQL function, which also allows function inlining in the context of outer queries.)
Several times faster than the above function. A minor flaw: we have to cast to text and back to varbit. Unfortunately, lpad() is not currently implemented for varbit. The manual:
The following SQL-standard functions work on bit strings as well as
character strings: length, bit_length, octet_length, position, substring, overlay.
Since overlay() is available, we can have a cheaper function:
CREATE OR REPLACE FUNCTION lpad_bits3(val varbit, base varbit = '00000000000000000000000000000000')
RETURNS varbit AS
$func$
SELECT overlay(base PLACING val FROM bit_length(base) - bit_length(val) + 1) -- positions are 1-based
$func$ LANGUAGE sql IMMUTABLE;
Faster if you can work with varbit values to begin with. (The advantage is (partly) voided, if you have to cast text to varbit anyway.)
Call:
SELECT lpad_bits3(b'1001100111000', '00000000000000000000000000000000');
SELECT lpad_bits3(b'1001100111000', repeat('0', 32)::varbit);
We might overload the function with a variant taking an integer to generate base itself:
CREATE OR REPLACE FUNCTION lpad_bits3(val varbit, sz int = 32)
RETURNS varbit AS
$func$
SELECT overlay(repeat('0', sz)::varbit PLACING val FROM sz - bit_length(val) + 1) -- positions are 1-based
$func$ LANGUAGE sql IMMUTABLE;
Call:
SELECT lpad_bits3(b'1001100111000', 32);
Related:
Postgresql Convert bit varying to integer
Convert hex in text representation to decimal number

The parser does not allow a variable at that place. The alternative is to use a constant and trim it:
select right((val::bit(128) >> (128 -length(val)))::text, sz)::bit(sz)
from (values (b'1001100111000', 32)) s(val,sz)
;
right
----------------------------------
00000000000000000001001100111000
Or the lpad function as suggested in the comments.

Defining global constants in Postgresql stored function/procedures?

I have a set of functions I have created in PostgreSql. I would like to be able to configure some behavior and limits with global constants.
So far, I have implemented functions like these, which my functions call to retrieve the constant value:
CREATE OR REPLACE FUNCTION slice_length()
RETURNS integer AS $$
BEGIN
RETURN 50;
END; $$
LANGUAGE plpgsql IMMUTABLE;
I was wondering, is there a better/smarter way to achieve this?

I had a similar problem: store some sort of configuration and access it from my function.
To solve the problem I created a function which returns a constant JSONB containing my configuration.
The function looks like this:
create or replace function config()
returns jsonb
language plpgsql
immutable
as $BODY$
declare
job_manager_config constant jsonb := '{
"notify": {
"channels": {
"all_available": "all.available",
"all_status": "all.status",
"task_available": "task.%s.available",
"task_status": "task.%s.status",
"job": "job.%s"
}
},
"allowed_pets": {
"dogs": 6,
"cats": 6,
"rabbits": 3,
"lions": 2,
"sweet_piranha": 8
}
}';
begin
return job_manager_config;
end;
$BODY$;
To access configuration elements quickly I defined a second function to query the configuration. This function accepts a path as a list of strings and returns the value (or JSON object) found at that path.
Note that the returned value is text but you can easily cast it to the actual type.
create or replace function job_manager.config(
variadic path_array text[])
returns text
language plpgsql
immutable
as $BODY$
begin
-- Return selected object as text (instead of json or jsonb)
return jsonb_extract_path_text(job_manager.config(), variadic path_array);
end;
$BODY$;
This is a usage example:
test=# select job_manager.config('notify', 'channels', 'job');
config
--------------------
job_manager.job.%s
(1 row)
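As a small illustration of the cast mentioned above, and assuming both config functions are installed in the job_manager schema with the configuration shown, a numeric setting could be read like this:

```sql
-- jsonb_extract_path_text returns text, so cast it to the type you need.
SELECT job_manager.config('allowed_pets', 'dogs')::int AS max_dogs;
```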

I would create a table for that:
create table constant (c1 int, c2 numeric);
insert into constant (c1, c2) values (100, 33.2);
The function, SQL not PL/pgSQL, would retrieve the single row:
create or replace function get_constants()
returns constant as $$
select *
from constant;
$$ language sql immutable;
And would be called for each constant:
select (get_constants()).c1, (get_constants()).c2;
All data would be in a single place and retrieved with a single function.
If a table is really that bad then place all the values in a single function:
create or replace function get_constants (
c1 out int, c2 out numeric
) returns record as $$
select 100, 33.5;
$$ language sql immutable;
And use it as above.

Take a look at this other answer. It uses the same approach that you do.
Create constant string for entire database

You can declare a table named constants for that purpose, as mentioned in that answer, where each column corresponds to one setting.
To ensure that no more than one row can be added to the table, this answer might be helpful.
As some comments have pointed to the increased I/O when constants are stored like this as opposed to storing constants as functions, this might be good reading.
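A common way to restrict such a table to a single row is a boolean primary key that can only be true. The following is a minimal sketch of that idiom, not necessarily what the linked answer does; the column names only_row and slice_length are hypothetical.

```sql
CREATE TABLE constants (
    -- The key can only ever hold the value true, so a second row is rejected
    -- by the primary key and any other value is rejected by the check.
    only_row     boolean PRIMARY KEY DEFAULT true CHECK (only_row),
    slice_length integer NOT NULL
);

INSERT INTO constants (slice_length) VALUES (50);

-- Read a setting.
SELECT slice_length FROM constants;
```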

Convert bigint to bytea, but swap the byte order

I have a PostgreSQL table in which I want to change a column from bigint to bytea so it can hold more data. I am thinking of using the following sequence:
ALTER TABLE mytable ADD COLUMN new_column bytea;
UPDATE mytable SET new_column = int8send(old_column);
ALTER TABLE mytable DROP COLUMN old_column;
ALTER TABLE mytable RENAME COLUMN new_column TO old_column;
The above sequence works, the only problem is that I want the byte sequence in the bytea to be reversed. For example, if a value in old_column is 0x1234567890abcdef, the above sequence would generate \0224Vx\220\253\315\357, but I want it to be
\357\315\253\220xV4\022. Seems like the resulting bytea uses the big-endian order from originating bigint.
Is there an easy way to do that without writing a program? I was looking for a swap64() sort of function in PostgreSQL but failed to find one.

Here's a pure-SQL function I wrote to reverse the byte-order of a bytea-type value:
CREATE OR REPLACE FUNCTION reverse_bytes_iter(bytes bytea, length int, midpoint int, index int)
RETURNS bytea AS
$$
SELECT CASE WHEN index >= midpoint THEN bytes ELSE
reverse_bytes_iter(
set_byte(
set_byte(bytes, index, get_byte(bytes, length-index)),
length-index, get_byte(bytes, index)
),
length, midpoint, index + 1
)
END;
$$ LANGUAGE SQL IMMUTABLE;
CREATE OR REPLACE FUNCTION reverse_bytes(bytes bytea) RETURNS bytea AS
'SELECT reverse_bytes_iter(bytes, octet_length(bytes)-1, octet_length(bytes)/2, 0)'
LANGUAGE SQL IMMUTABLE;
I just wrote it yesterday, so it's not particularly well-tested nor optimized, but it seems to work, at least on byte strings up to 1k in length.

It is possible to byte-swap without plpgsql code using regexp extractions on the hexadecimal representation.
Here's an example to swap a bigint constant, assuming standard_conforming_strings is set to ON (the default since PG 9.1):
select regexp_replace( lpad(to_hex(x'123456789abcd'::bigint),16,'0'),
'(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)',
'\8\7\6\5\4\3\2\1');
It returns cdab896745230100. Then apply decode(value, 'hex') to convert that to a bytea.
The whole type conversion could actually be done in a single SQL statement:
ALTER TABLE mytable ALTER COLUMN old_column TYPE bytea
USING decode(
regexp_replace( lpad(to_hex(old_column), 16,'0'),
'(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)(\w\w)',
'\8\7\6\5\4\3\2\1')
, 'hex');

I am playing with the pageinspect module now, and I was also curious how to change the byte order of an existing bytea value, which pretty much matches your case.
I've come up with the following function:
CREATE OR REPLACE FUNCTION reverse(bytea) RETURNS bytea AS $reverse$
SELECT string_agg(byte,''::bytea)
FROM (
SELECT substr($1,i,1) byte
FROM generate_series(length($1),1,-1) i) s
$reverse$ LANGUAGE sql;
It's pretty straightforward and works similarly to the textual reverse() function:
WITH v(val) AS (
VALUES ('\xaabbccdd'::bytea),('\x0123456789abcd'::bytea)
)
SELECT val, reverse(val)
FROM v;

This function, while not exactly what you're looking for, should help you get on your way.
The source code from that function is reproduced verbatim below.
CREATE OR REPLACE FUNCTION utils.int_littleendian(v_number integer)
RETURNS bytea AS
$BODY$
DECLARE
v_textresult bytea;
v_temp int;
v_int int;
v_i int = 0;
BEGIN
v_int = v_number;
v_textresult = '1234';
WHILE(v_i < 4) LOOP
raise notice 'loop %',v_int;
v_temp := v_int%256;
v_int := v_int - v_temp;
v_int := v_int / 256;
SELECT set_byte(v_textresult,v_i,v_temp) INTO v_textresult;
v_i := v_i + 1;
END LOOP;
return v_textresult;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE
COST 100;

Select multiple ids from a PostgreSQL sequence

Is there a concise way to select the nextval for a PostgreSQL sequence multiple times in 1 query? This would be the only value being returned.
For example, I would like to do something really short and sweet like:
SELECT NEXTVAL('mytable_seq', 3) AS id;
And get:
id
-----
118
119
120
(3 rows)

select nextval('mytable_seq') from generate_series(1,3);
generate_series is a function which returns many rows with sequential numbers, configured by its arguments.
In the above example, we don't care about the value in each row; we just use generate_series as a row generator. For each row we can call nextval. In this case it returns 3 numbers (nextvals).
You can wrap this into a function, but I'm not sure if it's really sensible given how short the query is.
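If a wrapper is wanted anyway, a plain SQL function is enough. This is a minimal sketch; the name next_ids and its parameters are hypothetical, and referring to parameters by name in an SQL function assumes PostgreSQL 9.2 or later.

```sql
CREATE OR REPLACE FUNCTION next_ids(seq regclass, n integer)
RETURNS SETOF bigint AS $$
    -- One nextval() call per generated row.
    SELECT nextval(seq) FROM generate_series(1, n);
$$ LANGUAGE sql VOLATILE;

-- Usage: fetch three ids from the sequence in the question.
SELECT next_ids('mytable_seq', 3) AS id;
```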

There is a great article about this exact problem: "getting multiple values from sequences".
If performance is not an issue, for instance when using the sequence values dwarfs the time used to get them or n is small, then the SELECT nextval('seq') FROM generate_series(1,n) approach is the simplest and most appropriate.
But when preparing data for bulk loads the last approach from the article of incrementing the sequence by n from within a lock is appropriate.

CREATE OR REPLACE FUNCTION foo() RETURNS SETOF INT AS $$
DECLARE
seqval int; x int;
BEGIN
x := 0;
WHILE x < 100 LOOP
SELECT into seqval nextval('f_id_seq');
RETURN NEXT seqval;
x := x+1;
END LOOP;
RETURN;
END;
$$ LANGUAGE plpgsql STRICT;
Of course, if all you're trying to do is advance the sequence, there's setval().
You could also have the function take a parameter for how many times to loop:
CREATE OR REPLACE FUNCTION foo(loopcnt int) RETURNS SETOF INT AS $$
DECLARE
seqval int;
x int;
BEGIN
x := 0;
WHILE x < loopcnt LOOP
SELECT into seqval nextval('f_id_seq');
RETURN NEXT seqval;
x := x+1;
END LOOP;
RETURN;
END;
$$ LANGUAGE plpgsql STRICT;

Unless you really want three rows returned, I would set the sequence to 'INCREMENT BY 3' for each select. Then you can simply add 1 and 2 to the result and have your three sequence numbers.
I tried to add a link to the PostgreSQL docs, but apparently I am not allowed to post links.
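A rough sketch of that idea, assuming the sequence from the question and assuming every consumer of the sequence agrees to treat each value as a block of three:

```sql
-- Each nextval() now reserves three consecutive numbers.
ALTER SEQUENCE mytable_seq INCREMENT BY 3;

-- nextval() is called once in FROM; the two following ids are derived from it.
SELECT v AS id1, v + 1 AS id2, v + 2 AS id3
FROM nextval('mytable_seq') AS v;
```

The change applies to the sequence as a whole, so column defaults that use the same sequence would also advance in steps of three.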

My current best solution is:
SELECT NEXTVAL('mytable_seq') AS id
UNION ALL
SELECT NEXTVAL('mytable_seq') AS id
UNION ALL
SELECT NEXTVAL('mytable_seq') AS id;
Which will correctly return 3 rows... but I would like something that is minimal SQL for even as much as 100 or more NEXTVAL invocations.
