Is there a good reason to insert HEX as BYTEA in PostgreSQL?

I found this from the documentation:
... binary strings specifically allow storing octets of value zero and
... octets outside the range 32 to 126 ...
To me that sounds like there is no reason to use BYTEA to store a HEX value. Still, a lot of people seem to use BYTEA for something like this:
013d7d16d7ad4fefb61bd95b765c8ceb
007687fc64b746569616414b78c81ef1
Is there a good reason to do so?

There are three good reasons:
1. It will require less storage space, since two hexadecimal digits are stored as one byte.
2. It will automatically check the value for correctness:
SELECT decode('0102ABCDNONSENSE', 'hex');
ERROR: invalid hexadecimal digit: "N"
3. You can store and retrieve binary data without converting it to and from text, if your API supports it.
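For example, here is a quick way to see the storage difference (the value is taken from above; the exact numbers are illustrative and can vary, but pg_column_size() reports the size of a value as stored):
SELECT pg_column_size('013d7d16d7ad4fefb61bd95b765c8ceb'::text)          AS as_text,
       pg_column_size(decode('013d7d16d7ad4fefb61bd95b765c8ceb', 'hex')) AS as_bytea;
-- as_text | as_bytea
-- --------+---------
--      36 |       20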

Related

Is it safe to use "char" (note the quotes) data type in PostgreSQL for production application?

I have a column that stores a numeric value ranging from 1 to 5.
I considered using smallint and character(1), but I feel like there might be a better data type: "char" (note the quotes).
"char" requires only 1 byte, which led me to think there might be a performance gain over smallint and character(1).
However, in PostgreSQL's own documentation, it says that "char" is not intended for general-purpose use, only for use in the internal system catalogs.
Does this mean I shouldn't use "char" data type for my production application?
If so, which data type would you recommend to store a numeric value ranging from 1 to 5?
The data type "char" is a Postgres type that's not in standard SQL. It's "safe" and you can use it freely, even if "not intended for general-purpose use". There are no restrictions, other than it being non-standard. The type is used across many system catalogs, so it's not going away. If you ever need to migrate to another RDBMS, make the target varchar(1) or similar.
"char" lends itself to 1-letter enumeration types that never grow beyond a hand full of distinct plain ASCII letters. That's how I use it - including productive DBs. (You may want to add a CHECK or FOREIGN KEY constraint to columns enforcing valid values.)
For a "numeric value ranging from 1 to 5" I would still prefer smallint, seems more appropriate for numeric data.
A storage benefit (1 byte vs. 2 bytes) only kicks in for multiple columns - and/or (more importantly!) indexes that can retain a smaller footprint after applying alignment padding.
Notable updates in Postgres 15, quoting the release notes:
Change the I/O format of type "char" for non-ASCII characters (Tom Lane)
Bytes with the high bit set are now output as a backslash and three
octal digits, to avoid encoding issues.
And:
Create a new pg_type.typcategory value for "char" (Tom Lane)
Some other internal-use-only types have also been assigned to this
category.
(No effect on enumeration with plain ASCII letters.)
Related:
Shall I use enum when are too many "categories" with PostgreSQL?
How to store one-byte integer in PostgreSQL?
Calculating and saving space in PostgreSQL

Update bytea data that was first encoded as a hex char string

Given a table with a "blob" column of bytea type, that has data that was first encoded as a char string of 'hex' format by the toString method of Node's Buffer API ... yes, not the best idea ...
Is it possible to update the data so that the data is decoded from 'hex' and returned to raw bytes?
decode(blob,'hex') is not going to work as the blob is still bytea, not text.
Looking for a possibly 'pure' Postgres (> v12) solution without going back to Node's Buffer API first, but I'll accept the punishment of having to export the data, transform it, and update from there.
Fear not:
UPDATE blobs
SET b = decode(encode(b, 'escape'), 'hex');
With the escape format, the hex digits you stored by mistake will be output as characters, and the hex format will interpret pairs of these hex digits as a single byte.
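A quick round trip demonstrating the idea on a made-up value: casting 'deadbeef' to bytea stores the ASCII hex digits, mimicking the original mistake.
SELECT 'deadbeef'::bytea;                                  -- \x6465616462656566 (ASCII digits)
SELECT decode(encode('deadbeef'::bytea, 'escape'), 'hex'); -- \xdeadbeef (raw bytes)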

PostgreSQL - Converting Binary data to Varchar

We are working on migrating databases from MSSQL to PostgreSQL. During this process we came across a situation where a table contains a password field of NVARCHAR type whose value was originally of VARBINARY type and was converted and stored as NVARCHAR.
For example: if I execute
SELECT HASHBYTES('SHA1','Password')
then it returns 0x8BE3C943B1609FFFBFC51AAD666D0A04ADF83C9D, and if this value is converted into NVARCHAR, it returns text in the format "䏉悱゚얿괚浦Њ鴼".
Since PostgreSQL doesn't support VARBINARY, we have used BYTEA instead, and it returns binary data. But when we try to convert this binary data into VARCHAR type, it returns the hex format.
For example: if the same statement is executed in PostgreSQL
SELECT ENCODE(DIGEST('Password','SHA1'),'hex')
then it returns
8be3c943b1609fffbfc51aad666d0a04adf83c9d.
When we try to convert this encoded text into VARCHAR type, it returns the same result: 8be3c943b1609fffbfc51aad666d0a04adf83c9d.
Is it possible to get the same result we retrieved from MSSQL Server? As these are password fields, we do not intend to change the values. Please suggest what needs to be done.
It sounds like you're taking a byte array containing a cryptographic hash and you want to convert it to a string to do a string comparison. This is a strange way to do hash comparisons but it might be possible depending on which encoding you were using on the MSSQL side.
If you have a byte array that can be converted to a string in the encoding you're using (i.e. one that doesn't contain any invalid code points or sequences for that encoding), you can convert the byte array to a string as follows:
SELECT CONVERT_FROM(DIGEST('Password','SHA1'), 'latin1') AS hash_string;
hash_string
-----------------------------
\u008BãÉC±`\u009Fÿ¿Å\x1A­fm+
\x04­ø<\u009D
If you're using Unicode, this approach won't work at all, since arbitrary binary data can't be converted to Unicode: certain byte sequences are always invalid. You'll get an error like the following:
# SELECT CONVERT_FROM(DIGEST('Password','SHA1'), 'utf-8');
ERROR: invalid byte sequence for encoding "UTF8": 0x8b
Here's a list of valid string encodings in PostgreSQL. Find out which encoding you're using on the MSSQL side and try to match it to PostgreSQL. If you can, I'd recommend changing your business logic to compare byte arrays directly, since this will be less error-prone and should be significantly faster, as sketched below.
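A minimal sketch of such a direct comparison, assuming a hypothetical users table with a bytea column password_hash; digest() requires the pgcrypto extension:
SELECT *
FROM   users
WHERE  password_hash = digest('Password', 'sha1');  -- bytea = bytea, no text conversion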
then it returns 0x8BE3C943B1609FFFBFC51AAD666D0A04ADF83C9D and in turn
if this value is converted into NVARCHAR then it is returning a text
in the format "䏉悱゚얿괚浦Њ鴼"
Based on that, MSSQL interprets these bytes as text encoded in UTF-16LE.
With PostgreSQL and using only built-in functions, you cannot obtain that result, because PostgreSQL doesn't use or support UTF-16 at all, for anything.
It also doesn't support nul bytes in strings, and UTF-16 text routinely contains nul bytes.
This Q/A: UTF16 hex to text suggests several solutions.
Changing your business logic not to depend on UTF-16 would be your best long-term option, though. The hexadecimal representation, for instance, is simpler and much more portable.

PostgreSql storing a value as integer vs varchar

I want to store a 15 digit number in a table.
In terms of lookup speed should I use bigint or varchar?
Additionally, if it's populated with millions of records will the different data types have any impact on storage?
In terms of space, a bigint takes 8 bytes, while a 15-character string will take up to 19 bytes (a 4-byte header plus up to another 15 bytes for the data), depending on its value. Also, generally speaking, equality checks should be slightly faster for numeric types. Other than that, it depends largely on what you intend to do with this data. If you intend to use numerical/mathematical operations or query by ranges, you should use a bigint.
You can store them as BIGINT, as comparisons on integer types are faster than on varchar. I would also advise creating an index on the column if you are expecting millions of records, to make data retrieval faster.
You can also check this forum:
Indexing is fastest on integer types. Char types are probably stored
underneath as integers (guessing.) VARCHARs are variable allocation
and when you change the string length it may have to reallocate.
Generally integers are fastest, then single characters then CHAR[x]
types and then VARCHARS.
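A minimal sketch, assuming a hypothetical table, of storing the value as bigint with an index for fast lookups:
CREATE TABLE records (
    id     bigserial PRIMARY KEY,
    number bigint NOT NULL  -- 15 digits fit comfortably (bigint max is about 9.2 * 10^18)
);
CREATE INDEX records_number_idx ON records (number);

SELECT * FROM records WHERE number = 123456789012345;  -- equality and range lookups can use the index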

How to return strings that are trimmed automatically by ServiceStack.OrmLite.PostgreSQL?

I've noticed that when my entities are returned, the string values are padded to the number of characters of the field definition in the database. Is there a setting to control this behavior?
Thank you,
Stephen
PostGreSQL 9.3
ServiceStack.OrmLite.PostgreSQL.4.0.30
PostgreSQL only pads strings for fixed-size CHAR(N) columns. If you don't want this behavior, use VARCHAR(N) columns instead, which is the recommended default for strings.
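For illustration, a quick way to see the padding behavior on the SQL side (the values are made up):
SELECT 'abc'::char(10) AS fixed, 'abc'::varchar(10) AS variable;
--    fixed    | variable
-- ------------+----------
--  abc        | abc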
OrmLite also uses VARCHAR for strings when it's used to generate your table schemas, e.g.:
db.CreateTable<Table>();
If dealing with legacy DBs, you can get OrmLite to automatically trim padded strings returned from the RDBMS with:
OrmLiteConfig.StringFilter = s => s.TrimEnd();