PostgreSQL - Converting Binary data to Varchar - postgresql

We are working towards migration of databases from MSSQL to PostgreSQL database. During this process we came across a situation where a table contains password field which is of NVARCHAR type and this field value got converted from VARBINARY type and stored as NVARCHAR type.
For example: if I execute
SELECT HASHBYTES('SHA1','Password')`
then it returns 0x8BE3C943B1609FFFBFC51AAD666D0A04ADF83C9D and in turn if this value is converted into NVARCHAR then it is returning a text in the format "䏉悱゚얿괚浦Њ鴼"
As we know that PostgreSQL doesn't support VARBINARY so we have used BYTEA instead and it is returning binary data. But when we try to convert this binary data into VARCHAR type it is returning hex format
For example: if the same statement is executed in PostgreSQL
SELECT ENCODE(DIGEST('Password','SHA1'),'hex')
then it returns
8be3c943b1609fffbfc51aad666d0a04adf83c9d.
When we try to convert this encoded text into VARCHAR type it is returning the same result as 8be3c943b1609fffbfc51aad666d0a04adf83c9d
Is it possible to get the same result what we retrieved from MSSQL server? As these are related to password fields we are not intended to change the values. Please suggest on what needs to be done

It sounds like you're taking a byte array containing a cryptographic hash and you want to convert it to a string to do a string comparison. This is a strange way to do hash comparisons but it might be possible depending on which encoding you were using on the MSSQL side.
If you have a byte array that can be converted to string in the encoding you're using (e.g. doesn't contain any invalid code points or sequences for that encoding) you can convert the byte array to string as follows:
SELECT CONVERT_FROM(DIGEST('Password','SHA1'), 'latin1') AS hash_string;
hash_string
-----------------------------
\u008BãÉC±`\u009Fÿ¿Å\x1A­fm+
\x04­ø<\u009D
If you're using Unicode this approach won't work at all since random binary arrays can't be converted to Unicode because there are certain sequences that are always invalid. You'll get an error like follows:
# SELECT CONVERT_FROM(DIGEST('Password','SHA1'), 'utf-8');
ERROR: invalid byte sequence for encoding "UTF8": 0x8b
Here's a list of valid string encodings in PostgreSQL. Find out which encoding you're using on the MSSQL side and try to match it to PostgreSQL. If you can I'd recommend changing your business logic to compare byte arrays directly since this will be less error prone and should be significantly faster.

then it returns 0x8BE3C943B1609FFFBFC51AAD666D0A04ADF83C9D and in turn
if this value is converted into NVARCHAR then it is returning a text
in the format "䏉悱゚얿괚浦Њ鴼"
Based on that, MSSQL interprets these bytes as a text encoded in UTF-16LE.
With PostgreSQL and using only built-in functions, you cannot obtain that result because PostgreSQL doesn't use or support UTF-16 at all, for anything.
It also doesn't support nul bytes in strings, and there are nul bytes in UTF-16.
This Q/A: UTF16 hex to text suggests several solutions.
Changing your business logic not to depend on UTF-16 would be your best long-term option, though. The hexadecimal representation, for instance, is simpler and much more portable.

Related

Update bytea data that was first encoded as a hex char string

Given a table with a "blob" column of bytea type, that has data that was first encoded as a char string of 'hex' format by the toString method of Node's Buffer API ... yes, not the best idea ...
Is it possible to update the data so that the data is decoded from 'hex' and returned to raw bytes?
decode(blob,'hex') is not going to work as the blob is still bytea, not text.
Looking for a possibly 'pure' Postgres (> v12) solution without going back to Node's Buffer API first, but I'll accept the punishment of having to export the data, transform it, and update from there.
Fear not:
UPDATE blobs
SET b = decode(encode(b, 'escape'), 'hex');
With the escape format, the hex digits you stored by mistake will be output as characters, and the hex format will interpret pairs of these hex digits as a single byte.

Dealing with parsing oids in Postgres

I'm currently improving a library client for Postgresql, the library already has working communication protocol including DataRow and RowDescription.
The problem I'm facing right now is how to deal with values.
Returning plain string with array of integers for example is kind of pointless.
By my research I found that some other libraries (like for Python) either return is as unmodified string or convert primitive types including arrays.
What I mean by conversion is making Postgres DataRow raw data as Python-type value: Postgres integer is parsed as python number, Postgres booleans as python booleans, etc.
Should I make second query to get information column type and use its converters or should I leave it plain?
You could opt to get the array values in the internal format by setting the corresponding "result-column format code" in the Bind message to 1, but that is typically a bad choice, since the internal format varies from type to type and may even depend on the server's architecture.
So your best option is probably to parse the string representation of the array on the client side, including all the escape characters.
When it comes to finding the base type for an array type, there is no other option than querying pg_type like
SELECT typelem::regtype FROM pg_type WHERE oid = 1007;
typelem
---------
integer
(1 row)
You could cache these values on the client side so that you don't have to query more than once per type and database session.

Is there a good reason to insert HEX as BYTEA in PostgreSQL?

I found this from the documentation:
... binary strings specifically allow storing octets of value zero and
... octets outside the range 32 to 126 ...
To me that sounds like there is no reason to use BYTEA to store a HEX value? Still a lot of people seem to use BYTEA for sth. like this:
013d7d16d7ad4fefb61bd95b765c8ceb
007687fc64b746569616414b78c81ef1
Is there a good reason to do so?
There are three good reasons:
It will require less storage space, since two hexadecimal digits are stored as one byte.
It will automatically check the value for correctness:
SELECT decode('0102ABCDNONSENSE', 'hex');
ERROR: invalid hexadecimal digit: "N"
you can store and retrieve binary data without converting them from and to text if your API supports it.

How to return strings that are trimmed automatically by ServiceStack.OrmLite.PostgreSQL?

I've noticed that when my entities are returned, the string values are padded to the number of characters of the field definition in the database, is the a setting to control this behavior?
Thank you,
Stephen
PostGreSQL 9.3
ServiceStack.OrmLite.PostgreSQL.4.0.30
PostgreSQL only pads strings for fixed-size CHAR(N) columns, if you don't want this behavior you should instead use VARCHAR(N) columns instead - the recommended default for strings.
OrmLite also uses VARCHAR for strings when it's used to generate your table schema's, e.g:
db.CreateTable<Table>();
If dealing with legacy DB's, you can get OrmLite to automatically trim padded strings returned from RDBMS's with:
OrmLiteConfig.StringFilter = s => s.TrimEnd();

Write query for inserting varbinary value in PostgreSQL

What is the syntax in PostgreSQL for inserting varbinary values?
SQL Server's syntax using a constant like 0xFFFF, it didn't work.
Given there's no "varbinary" data type in Postgres I believe you mean "bytea". Take a look at the docs about the way to specify "bytea" literals.
Depending on the language and the bindings you use there could be more sophisticated ways for transferring binary data - you could find a .Net/C#/Npgsql example here (under "Working with binary data and bytea datatype").