Hash Conversion from Postgres to Rails - postgresql

I am not sure what kind of encoding I am getting back from a Postgres database: the value Rails returns looks different from what the database shows when queried directly.
The question is...
How do I convert this hash (as it is returned from Rails):
\x158\x06\xDB\xCD\x13M\xDE\xE6\x9A\x8CR\x04\xE3\x8A\xC8\x04H\xF6#B\xF8\xC2<\xFEK~\xDF
into this (as it shows inside the postgres database):
\x153806dbcd134ddee69a8c5204e38ac80448f62342f8c23cfe4b7edf

The first hash (as you say, coming from Rails) is a byte array, in which any printable character is left as is instead of being converted to hex: \x158 is really two characters: '\x15' and '\x38' ('8').
In the Postgres table the byte array is the same, but the display format hex-encodes every byte, printable or not.
So:
\x158\x06\xDB\xCD\x13M... is really \x15,8,\x06,\xDB,\xCD,\x13,M
-- becomes
\x153806dbcd134d... ('8':\x38, 'M':\x4d)
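The equivalence can be checked outside the database. This is a minimal Python sketch, assuming the Rails value is available as raw bytes; the helper name to_pg_hex is made up for illustration.

```python
# The value as Rails prints it: printable bytes appear as characters,
# every other byte as a \xNN escape.
raw = b"\x158\x06\xDB\xCD\x13M\xDE\xE6\x9A\x8CR\x04\xE3\x8A\xC8\x04H\xF6#B\xF8\xC2<\xFEK~\xDF"


def to_pg_hex(data: bytes) -> str:
    """Render bytes in the style of PostgreSQL's hex bytea output."""
    return "\\x" + data.hex()


print(to_pg_hex(raw))
```

In Ruby itself, unpacking the string with the 'H*' directive should give the same hex digits.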

Related

Dealing with parsing oids in Postgres

I'm currently improving a client library for PostgreSQL. The library already has a working communication protocol, including DataRow and RowDescription.
The problem I'm facing right now is how to deal with values.
Returning a plain string for an array of integers, for example, is kind of pointless.
From my research, some other libraries (for Python, for example) either return it as an unmodified string or convert primitive types, including arrays.
By conversion I mean turning the raw Postgres DataRow data into a Python-typed value: a Postgres integer is parsed as a Python number, Postgres booleans as Python booleans, etc.
Should I make a second query to get the column type information and use its converters, or should I leave the values as plain strings?
You could opt to get the array values in the internal format by setting the corresponding "result-column format code" in the Bind message to 1, but that is typically a bad choice, since the internal format varies from type to type and may even depend on the server's architecture.
So your best option is probably to parse the string representation of the array on the client side, including all the escape characters.
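A minimal sketch of such client-side parsing in Python, assuming a one-dimensional array in the default text output format (braces, comma delimiter, double-quoted elements with backslash escapes, unquoted NULL). The name parse_pg_array and its convert parameter are hypothetical; nested arrays, explicit dimension prefixes and non-comma delimiters are not handled.

```python
def parse_pg_array(text, convert=str):
    """Parse a one-dimensional PostgreSQL array literal such as '{1,2,NULL}'."""
    if len(text) < 2 or text[0] != "{" or text[-1] != "}":
        raise ValueError("not a one-dimensional array literal: %r" % (text,))
    end = len(text) - 1
    if end == 1:                      # '{}' is the empty array
        return []
    items, buf = [], []
    in_quotes = False                 # currently inside "..."
    quoted = False                    # current element used quotes at all
    i = 1
    while i <= end:
        ch = text[i]
        if in_quotes:
            if ch == "\\":            # backslash escapes the next character
                i += 1
                buf.append(text[i])
            elif ch == '"':
                in_quotes = False
            else:
                buf.append(ch)
        elif ch == '"':
            in_quotes = True
            quoted = True
        elif ch == "," or i == end:   # element separator or closing brace
            raw = "".join(buf)
            is_null = raw == "NULL" and not quoted
            items.append(None if is_null else convert(raw))
            buf, quoted = [], False
        else:
            buf.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted element")
    return items


print(parse_pg_array("{1,2,3}", int))
```

The element converter is passed in by the caller, so the same parser can serve integer, text or boolean arrays once the element type is known.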
When it comes to finding the base type for an array type, there is no other option than querying pg_type like
SELECT typelem::regtype FROM pg_type WHERE oid = 1007;
typelem
---------
integer
(1 row)
You could cache these values on the client side so that you don't have to query more than once per type and database session.
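The caching idea could look roughly like this in Python. The class ElementTypeCache and the lookup callable are assumptions for illustration; lookup stands in for whatever function in your client runs the pg_type query above.

```python
class ElementTypeCache:
    """Remember the element type of each array type OID for one session."""

    def __init__(self, lookup):
        # lookup: hypothetical callable taking an array type OID and
        # returning the element type name, e.g. by querying pg_type.
        self._lookup = lookup
        self._cache = {}

    def element_type(self, array_oid):
        if array_oid not in self._cache:
            self._cache[array_oid] = self._lookup(array_oid)
        return self._cache[array_oid]


# Usage with a stand-in lookup that records how often it is called.
calls = []

def fake_lookup(oid):
    calls.append(oid)
    return {1007: "integer"}[oid]

cache = ElementTypeCache(fake_lookup)
first = cache.element_type(1007)
second = cache.element_type(1007)   # served from the cache, no second query
```

The cache should be dropped when the connection is closed, since OIDs of user-defined types can differ between databases.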

Is there a good reason to insert HEX as BYTEA in PostgreSQL?

I found this from the documentation:
... binary strings specifically allow storing octets of value zero and
... octets outside the range 32 to 126 ...
To me that sounds like there is no reason to use BYTEA to store a HEX value. Still, a lot of people seem to use BYTEA for something like this:
013d7d16d7ad4fefb61bd95b765c8ceb
007687fc64b746569616414b78c81ef1
Is there a good reason to do so?
There are three good reasons:
It will require less storage space, since two hexadecimal digits are stored as one byte.
It will automatically check the value for correctness:
SELECT decode('0102ABCDNONSENSE', 'hex');
ERROR: invalid hexadecimal digit: "N"
You can store and retrieve binary data without converting it from and to text, if your API supports it.
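The first two points can be illustrated on the client side as well. This is a small Python sketch, not PostgreSQL's own implementation; is_valid_hex is a made-up helper name.

```python
hex_text = "013d7d16d7ad4fefb61bd95b765c8ceb"
as_bytes = bytes.fromhex(hex_text)

# Two hexadecimal digits collapse into one byte.
print(len(hex_text), len(as_bytes))


def is_valid_hex(value: str) -> bool:
    """Return True if value decodes cleanly as hexadecimal."""
    try:
        bytes.fromhex(value)
        return True
    except ValueError:
        return False


print(is_valid_hex("0102ABCDNONSENSE"))
```

Note that bytes.fromhex tolerates whitespace between byte pairs, so it is slightly more lenient than a strict check would be.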

PostgreSQL - Converting Binary data to Varchar

We are migrating databases from MSSQL to PostgreSQL. During this process we came across a table with a password field of type NVARCHAR whose value was converted from VARBINARY before being stored.
For example: if I execute
SELECT HASHBYTES('SHA1','Password')
then it returns 0x8BE3C943B1609FFFBFC51AAD666D0A04ADF83C9D and in turn if this value is converted into NVARCHAR then it is returning a text in the format "䏉悱゚얿괚浦Њ鴼"
PostgreSQL doesn't support VARBINARY, so we have used BYTEA instead, which returns binary data. But when we try to convert this binary data to VARCHAR, it comes back in hex format.
For example: if the same statement is executed in PostgreSQL
SELECT ENCODE(DIGEST('Password','SHA1'),'hex')
then it returns
8be3c943b1609fffbfc51aad666d0a04adf83c9d.
When we try to convert this encoded text into VARCHAR type it is returning the same result as 8be3c943b1609fffbfc51aad666d0a04adf83c9d
Is it possible to get the same result that we retrieved from the MSSQL server? As these are password fields, we do not intend to change the values. Please suggest what needs to be done.
It sounds like you're taking a byte array containing a cryptographic hash and you want to convert it to a string to do a string comparison. This is a strange way to do hash comparisons but it might be possible depending on which encoding you were using on the MSSQL side.
If you have a byte array that can be converted to string in the encoding you're using (e.g. doesn't contain any invalid code points or sequences for that encoding) you can convert the byte array to string as follows:
SELECT CONVERT_FROM(DIGEST('Password','SHA1'), 'latin1') AS hash_string;
hash_string
-----------------------------
\u008BãÉC±`\u009Fÿ¿Å\x1A­fm+
\x04­ø<\u009D
If you're using a Unicode encoding such as UTF-8, this approach won't work reliably, since arbitrary byte arrays are not always valid in that encoding: certain byte sequences are always invalid. You'll get an error like the following:
# SELECT CONVERT_FROM(DIGEST('Password','SHA1'), 'utf-8');
ERROR: invalid byte sequence for encoding "UTF8": 0x8b
Here's a list of valid string encodings in PostgreSQL. Find out which encoding you're using on the MSSQL side and try to match it to PostgreSQL. If you can, I'd recommend changing your business logic to compare byte arrays directly, since this will be less error-prone and should be significantly faster.
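The same behaviour can be reproduced in Python, which may help when checking values during the migration. This is a sketch using the hash value from the question; hashes_match is a made-up helper showing the direct byte comparison.

```python
import hmac

digest = bytes.fromhex("8be3c943b1609fffbfc51aad666d0a04adf83c9d")

# Latin-1 maps every byte value to a character, so this always succeeds.
as_latin1 = digest.decode("latin-1")

# UTF-8 rejects the digest because 0x8b cannot start a character.
try:
    digest.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False


def hashes_match(stored: bytes, candidate: bytes) -> bool:
    """Compare two digests as bytes, without any text conversion."""
    return hmac.compare_digest(stored, candidate)
```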
then it returns 0x8BE3C943B1609FFFBFC51AAD666D0A04ADF83C9D and in turn
if this value is converted into NVARCHAR then it is returning a text
in the format "䏉悱゚얿괚浦Њ鴼"
Based on that, MSSQL interprets these bytes as a text encoded in UTF-16LE.
With PostgreSQL and using only built-in functions, you cannot obtain that result because PostgreSQL doesn't use or support UTF-16 at all, for anything.
It also doesn't support nul bytes in strings, and there are nul bytes in UTF-16.
This Q/A: UTF16 hex to text suggests several solutions.
Changing your business logic not to depend on UTF-16 would be your best long-term option, though. The hexadecimal representation, for instance, is simpler and much more portable.
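If the conversion has to be reproduced during the migration, it can be done on the client instead. This is a minimal Python sketch, assuming MSSQL reinterpreted the hash bytes as UTF-16LE as described above; the function name mssql_nvarchar_view is hypothetical.

```python
def mssql_nvarchar_view(data: bytes) -> str:
    """Reinterpret raw bytes as UTF-16LE text, two bytes per code unit."""
    return data.decode("utf-16-le")


digest = bytes.fromhex("8BE3C943B1609FFFBFC51AAD666D0A04ADF83C9D")
as_text = mssql_nvarchar_view(digest)

# 20 bytes become 10 code units; the first and ninth fall in the
# private use area, so they are typically not displayed.
print([hex(ord(c)) for c in as_text])
```

This only works for byte sequences that happen to be valid UTF-16; a hash containing an unpaired surrogate may fail to decode, which is one more reason to prefer the hexadecimal representation.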

deserialize cassandra row key

I'm trying to use the sstablekeys utility for Cassandra to retrieve all the keys currently in the sstables for a Cassandra cluster. When I run sstablekeys for a single sstable, the keys come back in what appears to be a serialized format. Does anyone know how to deserialize the keys or get them back into their original format? They were inserted into Cassandra using astyanax, where the serialized type is a tuple in Scala. The key_validation_class for the column family is a CompositeType.
Thanks to the comment by Thomas Stets, I was able to figure out that the keys are actually just converted to hex and printed out. See here for a way of getting them back to their original format.
For the specific problem of figuring out the format of a CompositeType row key and unhexifying it, see the Cassandra source which describes the format of a CompositeType key that gets output by sstablekeys. With CompositeType(UTF8Type, UTF8Type, Int32Type), the UTF8Type treats bytes as ASCII characters (so the function in the link above works in this case), but with Int32Type, you must interpret the bytes as one number.
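A minimal Python sketch of the unhexifying step, assuming the component layout described in the Cassandra source for CompositeType: a 2-byte big-endian length, the value bytes, then one end-of-component byte. The names decode_composite_key, utf8 and int32 are made up for illustration, and the sample key is constructed here rather than taken from a real sstable.

```python
import struct


def utf8(value: bytes) -> str:
    return value.decode("utf-8")


def int32(value: bytes) -> int:
    # Int32Type: the four bytes form one big-endian signed integer.
    return struct.unpack(">i", value)[0]


def decode_composite_key(hex_key: str, decoders):
    """Split a hex-printed CompositeType key into its typed components."""
    data = bytes.fromhex(hex_key)
    values = []
    pos = 0
    for decode in decoders:
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2
        values.append(decode(data[pos:pos + length]))
        pos += length + 1          # skip the end-of-component byte
    return values


# Sample key for CompositeType(UTF8Type, UTF8Type, Int32Type).
sample = "0003666f6f00" + "000362617200" + "00040000002a00"
print(decode_composite_key(sample, [utf8, utf8, int32]))
```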

How to return strings that are trimmed automatically by ServiceStack.OrmLite.PostgreSQL?

I've noticed that when my entities are returned, the string values are padded to the number of characters in the field definition in the database. Is there a setting to control this behavior?
Thank you,
Stephen
PostgreSQL 9.3
ServiceStack.OrmLite.PostgreSQL.4.0.30
PostgreSQL only pads strings for fixed-size CHAR(N) columns. If you don't want this behavior, use VARCHAR(N) columns instead, which are the recommended default for strings.
OrmLite also uses VARCHAR for strings when it's used to generate your table schemas, e.g.:
db.CreateTable<Table>();
If dealing with legacy DBs, you can get OrmLite to automatically trim padded strings returned from the RDBMS with:
OrmLiteConfig.StringFilter = s => s.TrimEnd();