Converting MS SQL Hash Text to Postgres Hash Text - postgresql

I am working on migrating all our databases from MS SQL Server to Postgres. As part of this process, I am writing equivalent code in Postgres to yield the same hashed text obtained in MS SQL.
Following is my code in MS SQL:
DECLARE @HashedText nvarchar(50)
DECLARE @InputText nvarchar(50) = 'password'
DECLARE @HashedBytes varbinary(20) -- maximum size of SHA1 output
SELECT @HashedBytes = HASHBYTES('SHA1', @InputText)
SET @HashedText = CONVERT(nvarchar(50), @HashedBytes, 2)
SELECT @HashedText
This is yielding the value E8F97FBA9104D1EA5047948E6DFB67FACD9F5B73
Following is equivalent code written in Postgres:
DO
$$
DECLARE
    v_InputText VARCHAR := 'password';
    v_HashedText VARCHAR;
    v_HashedBytes BYTEA;
BEGIN
    v_HashedBytes := DIGEST(v_InputText, 'sha1');
    v_HashedText := ENCODE(v_HashedBytes, 'hex');
    RAISE INFO 'Hashed Text: %', v_HashedText;
END;
$$;
This yields the value 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8.
After spending some time I understood that declaring the input as VARCHAR instead of NVARCHAR in MS SQL yields the same result as Postgres.
Now the problem is that in MS SQL we already have passwords hashed and stored in the database as shown above. I am unable to convert the hashed text from MS SQL to the Postgres form, and I am also unable to generate the same hashed text in Postgres, because Postgres doesn't support UTF-16 Unicode.
So I just want to know if either of the following solutions is possible:
1. Convert the hexadecimal value generated in MS SQL to the hex value equivalent to that generated using the VARCHAR datatype (which is the same value as in Postgres).
2. Convert UTF-8 text to UTF-16 text in Postgres (even by any kind of extension) and generate hex values which would be equivalent to the values generated in MS SQL.

Let's look at your suggestions in turn:
Convert the hexadecimal value generated in MS SQL to the hex value equivalent to that generated using the VARCHAR datatype (which is the same value as in Postgres)
This comes down to converting the user's password from UTF-16 to UTF-8 (or to some other encoding) and re-hashing it. To do this, you need to know the user's password, which, theoretically, you don't - that's the point of hashing it in the first place.
In practice, you're using unsalted SHA1 hashes, for which large pre-computed tables exist, and for which brute force is feasible with a GPU-optimised algorithm. So a "grey hat" option would be to crack all your users' passwords and re-hash them.
If you do so, it would probably be sensible to re-hash them using a salt and better hash function, as well as converting them to UTF-8.
Convert UTF-8 text to UTF-16 text in Postgres (even by any kind of extension) and generate hex values which would be equivalent to the values generated in MS SQL
This, theoretically, is simpler: you just need a routine to do the string conversion. However, as you've found, there is no built-in support for this in Postgres.
For any string composed entirely of ASCII characters, the conversion is trivial: SQL Server hashes nvarchar as UTF-16LE, so you append a NULL byte (hex 00) after every byte of the string. But this will break any password that used a character outside the ASCII range.
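For the ASCII-only case, here is a minimal PL/pgSQL sketch (assuming the pgcrypto extension is installed) that rebuilds the UTF-16LE byte sequence by hand; it should reproduce the MS SQL value from the question:
DO
$$
DECLARE
    v_input   text  := 'password';
    v_utf16le bytea := ''::bytea;
BEGIN
    -- ASCII only: UTF-16LE is each character byte followed by a 0x00 byte
    FOR i IN 1 .. length(v_input) LOOP
        v_utf16le := v_utf16le ||
            decode(lpad(to_hex(ascii(substr(v_input, i, 1))), 2, '0') || '00', 'hex');
    END LOOP;
    -- digest() comes from pgcrypto; upper() matches MS SQL's upper-case hex output
    RAISE INFO '%', upper(encode(digest(v_utf16le, 'sha1'), 'hex'));
END;
$$;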
An alternative would be to move the responsibility for generating the hash out of the database:
Retrieve the hash from the DB.
Ensure the user's input is represented as UTF-16 in your application, calculate its hash, and compare it against the stored value.
If valid, you can now generate a new hash (because you know the password the user just typed), using a better algorithm, and store that in the DB instead of the old hash (a sketch follows below).
Once all active users have logged in at least once, you will have no SHA1 hashes left, and can remove the support for them completely.
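For the last step, a minimal sketch of the re-hash, assuming pgcrypto and a hypothetical users table with a password_hash column (crypt() with a bcrypt salt standing in for "a better algorithm"):
-- store a new salted bcrypt hash once the cleartext password is known
UPDATE users
SET password_hash = crypt('password-the-user-just-typed', gen_salt('bf'))
WHERE user_id = 42;
-- verify a later login attempt against the stored hash
SELECT password_hash = crypt('login-attempt', password_hash) AS valid
FROM users
WHERE user_id = 42;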

Related

Convert a BLOB to VARCHAR instead of VARCHAR FOR BIT

I have a BLOB field in a table that I am selecting. This field data consists only of JSON data.
If I do the following:
Select CAST(JSONBLOB as VARCHAR(2000)) from MyTable
--> this returns the value in VARCHAR FOR BIT DATA format.
I just want it as a standard string or varcher - not in bit format.
That is because I need to use JSON2BSON function to convert the JSON to BSON. JSON2BSON accepts a string but it will not accept a VarChar for BIT DATA type...
This conversion should be easy.
I am able to do the select as a VARCHAR FOR BIT DATA, manually copy it using the UI, paste it into a select literal, and convert that to BSON. But I need to migrate a bunch of data in this BLOB column from JSON to BSON, and doing it manually won't be fast. I just want to explain how simple a use case this should be.
What is the select command to essentially get this to work:
Select JSON2BSON(CAST(JSONBLOB as VARCHAR(2000))) from MyTable
--> Currently this fails because the CAST converts this (even though it contains only text characters) to the VARCHAR FOR BIT DATA type and not standard VARCHAR.
What is the suggestion to fix this?
DB2 11 on Windows.
If the data is JSON, then the table column should be CLOB in the first place...
Having the table column a BLOB might make sense if the data is actually already BSON.
You could change the BLOB into a CLOB using the CONVERTTOCLOB procedure, and then you should be OK:
https://www.ibm.com/support/knowledgecenter/SSEPGG_11.5.0/com.ibm.db2.luw.apdv.sqlpl.doc/doc/r0055119.html
You can use this function to remove the "FOR BIT DATA" flag on a column
CREATE OR REPLACE FUNCTION DB_BINARY_TO_CHARACTER(A VARCHAR(32672 OCTETS) FOR BIT DATA)
    RETURNS VARCHAR(32672 OCTETS)
    NO EXTERNAL ACTION
    DETERMINISTIC
BEGIN ATOMIC
    RETURN A;
END
or if you are on Db2 11.5 the function SYSIBMADM.UTL_RAW.CAST_TO_VARCHAR2 will also work
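With that helper in place, the failing statement from the question would become something like this (an untested sketch; on many installations JSON2BSON lives in the SYSTOOLS schema):
SELECT JSON2BSON(DB_BINARY_TO_CHARACTER(CAST(JSONBLOB AS VARCHAR(2000)))) FROM MyTable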

DB2 DBCLOB data INSERT with Unicode data

The problem at hand is to insert data into a DB2 table which has a DBCLOB column. The table's encoding is Unicode. The subsystem is a MIXED YES with Japanese CCSID set of (290, 930, 300). The application is bound ENCODING CCSID.
I was successful in FETCHING the DBCLOB's data in Unicode, no problem there. But when I turn around and try to INSERT it back, the inserted data is being interpreted as not being Unicode; it seems DB2 thinks it's EBCDIC DBCS/GRAPHIC, and the inserted row shows Unicode 0xFEFE. When I manually update the data being inserted to valid DBCS, the data inserts OK and shows the expected Unicode DBCS values.
To insert the data I am using a dynamically prepared INSERT statement with a placeholder for the DBCLOB column. The SQLVAR entry associated with the placeholder is a DBCLOB_LOCATOR with the CCSID set to 1200.
A DBCLOB locator is being created doing a SET dbclobloc = SUBSTR(dbclob, 1, length). The created locator is being put into SQLDA. Then the prepared INSERT is being executed.
It seems DB2 is ignoring the 1200 CCSID associated with the DBCLOB_LOCATOR SQLVAR. Attempts to put a CAST(? AS DBCLOB CCSID UNICODE) on the placeholder in the INSERT do not help because at that time DB2 seems to have made up its mind about the encoding of the data to be inserted.
I am stuck :( Any ideas?
Greg
I think I figured it out and it is not good: the SET statement for the DBCLOB_LOCATOR is static SQL and the DBRM is bound ENCODING EBCDIC. Hence DB2 has no choice but to assume the data is in the CCSID of the plan.
I also tried what the books suggest and used a SELECT ... FROM SYSIBM.SYSDUMMYU to set the DBCLOB_LOCATOR. This should have told DB2 that the data was coming in Unicode. But it failed again, with symptoms indicating it still assumed the DBCS EBCDIC CCSID.
Not good.

Create big integer from the big end of a uuid in PostgreSQL

I have a third-party application connecting to a view in my PostgreSQL database. It requires the view to have a primary key but can't handle the UUID type (which is the primary key for the view). It also can't handle the UUID as the primary key if it is served as text from the view.
What I'd like to do is convert the UUID to a number and use that as the primary key instead. However,
SELECT x'14607158d3b14ac0b0d82a9a5a9e8f6e'::bigint
fails because the number is out of range.
So instead, I want to use SQL to take the big end of the UUID and create an int8 / bigint. I should clarify that maintaining order is 'desirable' but I understand that some of the order will change by doing this.
I tried:
SELECT x(substring(UUID::text from 1 for 16))::bigint
but the x operator for converting hex doesn't seem to like brackets. I abstracted it into a function but
SELECT hex_to_int(substring(UUID::text from 1 for 16))::bigint
still fails.
How can I get a bigint from the 'big end' half of a UUID?
Fast and without dynamic SQL
Cast the leading 16 hex digits of a UUID in text representation as bitstring bit(64) and cast that to bigint. See:
Convert hex in text representation to decimal number
Conveniently, excess hex digits to the right are truncated in the cast to bit(64) automatically - exactly what we need.
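A quick way to see that truncation in action (deadbeefdeadbeefcafe is an arbitrary 20-digit hex string; per the behavior described above, the trailing cafe should simply be ignored):
SELECT ('x' || 'deadbeefdeadbeefcafe')::bit(64)::bigint; -- excess digits dropped
SELECT ('x' || 'deadbeefdeadbeef')::bit(64)::bigint;     -- same result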
Postgres accepts various formats for input. Your given string literal is one of them:
14607158d3b14ac0b0d82a9a5a9e8f6e
The default text representation of a UUID (and the text output in Postgres for data type uuid) adds hyphens at predefined places:
14607158-d3b1-4ac0-b0d8-2a9a5a9e8f6e
The manual:
A UUID is written as a sequence of lower-case hexadecimal digits, in several groups separated by hyphens, specifically a group of 8 digits followed by three groups of 4 digits followed by a group of 12 digits, for a total of 32 digits representing the 128 bits.
If input format can vary, strip hyphens first to be sure:
SELECT ('x' || translate(uuid_as_string, '-', ''))::bit(64)::bigint;
Cast actual uuid input with uuid::text.
db<>fiddle here
Note that Postgres uses signed integer, so the bigint overflows to negative numbers in the upper half - which should be irrelevant for this purpose.
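Applied to an actual uuid column - say, a hypothetical table tbl with a uuid primary key id - the expression for the view would look like:
SELECT ('x' || translate(id::text, '-', ''))::bit(64)::bigint AS id_bigint
FROM tbl;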
DB design
If at all possible add a bigserial column to the underlying table and use that instead.
This is all very shaky, both the problem and the solution you describe in your self-answer.
First, a mismatch between a database design and a third-party application is always possible, but usually indicative of a deeper problem. Why does your database use the uuid data type as a PK in the first place? They are not very efficient compared to a serial or a bigserial. Typically you would use a UUID if you are working in a distributed environment where you need to "guarantee" uniqueness over multiple installations.
Secondly, why does the application require the PK to begin with (incidentally: views do not have a PK, the underlying tables do)? If it is only to view the data then a PK is rather useless, particularly if it is based on a UUID (and there is thus no conceivable relationship between the PK and the rest of the tuple). If it is used to refer to other data in the same database or to do updates or deletes of existing data, then you need the exact UUID and not some extract of it, because the underlying table or other relations in your database would have the exact UUID. Of course you can convert all UUIDs with the same hex_to_int() function, but that leads straight back to my point above: why use UUIDs in the first place?
Thirdly, do not mess around with things you have little or no knowledge of. This is not intended to be offensive; take it as well-meant advice (look around on the internet for programmers who tried to improve on cryptographic algorithms or random number generation by adding their own twists of obfuscation; quite entertaining reads).
There are 5 algorithms for generating UUIDs in the uuid-ossp package, and while you know or can easily find out which algorithm is used in your database (the uuid_generate_vX() functions in your table definitions, most likely), do you know how the algorithm works? The claim of practical uniqueness of a UUID is based on its 128 bits, not a 64-bit extract of it. Are you certain that the high 64 bits are random? My guess is that 64 consecutive bits are less random than the "square root of the randomness" (for lack of a better way to phrase the theoretical drop in periodicity of a 64-bit number compared to a 128-bit number) of the full UUID. Why? Because all but one of the algorithms are made up of randomized blocks of otherwise non-random input (such as the MAC address of a network interface, which is always the same on a machine generating millions of UUIDs). Had 64 bits been enough for randomized value uniqueness, then a UUID would have been that long.
What a better solution would be in your case is hard to say, because it is unclear what the third-party application does with the data from your database and how dependent it is on the uniqueness of the "PK" column in the view. An approach that is likely to work, if the application does more than trivially display the data without any further use of the "PK", would be to associate a bigint with every retrieved uuid in your database in a (temporary) table and include that bigint in your view by linking on the uuids in your (temporary) tables. Since you cannot trigger on SELECT statements, you would need a function to generate the bigint for every uuid the application retrieves. On updates or deletes on the underlying tables of the view, or upon selecting data from related tables, you look up the uuid corresponding to the bigint passed in from the application. The lookup table and function would look somewhat like this:
CREATE TEMPORARY TABLE temp_table(
    tempint bigserial PRIMARY KEY,
    internal_uuid uuid);
CREATE INDEX ON temp_table(internal_uuid);

CREATE FUNCTION temp_int_for_uuid(pk uuid) RETURNS bigint AS $$
DECLARE
    id bigint;
BEGIN
    SELECT tempint INTO id FROM temp_table WHERE internal_uuid = pk;
    IF NOT FOUND THEN
        INSERT INTO temp_table(internal_uuid) VALUES (pk)
        RETURNING tempint INTO id;
    END IF;
    RETURN id;
END; $$ LANGUAGE plpgsql STRICT;
Not pretty, not efficient, but fool-proof.
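The view served to the application would then wrap the function around the uuid; a sketch with a hypothetical underlying table some_table (uuid primary key id, data column payload):
CREATE VIEW app_view AS
SELECT temp_int_for_uuid(s.id) AS pk,
       s.payload
FROM some_table s;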
Use a bit(64) cast to parse a bigint from a hex literal built from a substr of the UUID (the string must not contain hyphens):
select ('x'||substr(UUID, 1, 16))::bit(64)::bigint
See SQLFiddle
Solution found.
UUID::text will return a string with hyphens. In order for substring(UUID::text from 1 for 16) to create a string that x can parse as hex, the hyphens need to be stripped first.
The final query looks like:
SELECT hex_to_int(substring((select replace(id::text,'-','')) from 1 for 16))::bigint FROM table
The hex_to_int function needs to be able to handle a bigint, not just an int. It looks like:
CREATE OR REPLACE FUNCTION hex_to_int(hexval character varying)
  RETURNS bigint AS
$BODY$
DECLARE
    result bigint;
BEGIN
    -- dynamic SQL: build a bit-string literal like x'1460...' and cast it to bigint
    EXECUTE 'SELECT x''' || hexval || '''::bigint' INTO result;
    RETURN result;
END;
$BODY$
LANGUAGE plpgsql;

Can I use nvarchar data type in plpgsql?

Can I use nvarchar data type in plpgsql?
If so, do I have to follow the same syntax?
If not, then what is the alternative?
Just use the text or varchar types.
There's no need for, or support for, nvarchar. All PostgreSQL text-like types are always in the database's text encoding, you can't choose a 1-byte or 2-byte type like in MS SQL Server. Generally you just use a UTF-8 encoded DB, and everything works with no hassle.
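For illustration, where T-SQL would declare an nvarchar variable, PL/pgSQL just declares text; a minimal sketch:
DO
$$
DECLARE
    v_name text := 'Grüße, 世界'; -- any characters the database encoding (ideally UTF-8) supports
BEGIN
    RAISE INFO 'value: %', v_name;
END;
$$;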

How to insert (raw bytes from file data) using a plain text script

Database: Postgres 9.1
I have a table called logos defined like this:
create type image_type as enum ('png');

create table logos (
    id UUID primary key,
    bytes bytea not null,
    type image_type not null,
    created timestamp with time zone default current_timestamp not null
);

create index logo_id_idx on logos(id);
I want to be able to insert records into this table in 2 ways.
The first (and most common) way rows will be inserted in the table will be that a user will provide a PNG image file via an HTML file upload form. The code processing the request on the server will receive a byte array containing the data in the PNG image file and insert a record in the table using something very similar to what is explained here. There are plenty of examples of how to insert byte arrays into a PostgreSQL field of type bytea on the internet. This is an easy exercise. An example of the insert code would look like this:
insert into logos (id, bytes, type, created) values (?, ?, ?, now())
And the bytes would be set with something like:
...
byte[] bytes = ... // read PNG file into a byte array.
...
ps.setBytes(2, bytes);
...
The second way rows will be inserted in the table will be from a plain text file script. The reason this is needed is only to populate test data into the table for automated tests, or to initialize the database with a few records for a remote development environment.
Regardless of how the data is entered in the table, the application will obviously need to be able to select the bytea data from the table and convert it back into a PNG image.
Question
How does one properly encode a byte array, to be able to insert the data from within a script, in such a way that only the original bytes contained in the file are stored in the database?
I can write code to read the file and spit out insert statements to populate the script. But I don't know how to encode the byte array for the plain text script such that, when running the script from psql, the image data will be the same as if the file had been inserted using the setBytes JDBC code.
I would like to run the script with something like this:
psql -U username -d dataBase -a -f test_data.sql
The easiest way, IMO, to represent bytea data in an SQL file is to use the hex format:
8.4.1. bytea Hex Format
The "hex" format encodes binary data as 2 hexadecimal digits per byte, most significant nibble first. The entire string is preceded by the sequence \x (to distinguish it from the escape format). In some contexts, the initial backslash may need to be escaped by doubling it, in the same cases in which backslashes have to be doubled in escape format; details appear below. The hexadecimal digits can be either upper or lower case, and whitespace is permitted between digit pairs (but not within a digit pair nor in the starting \x sequence). The hex format is compatible with a wide range of external applications and protocols, and it tends to be faster to convert than the escape format, so its use is preferred.
Example:
SELECT E'\\xDEADBEEF';
Converting an array of bytes to hex should be trivial in any language that a sane person (such as yourself) would use to write the SQL file generator.
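For example, a generated insert for the logos table above could look like this (the bytea literal is truncated to just the 8-byte PNG file signature for brevity, and the UUID is an arbitrary example):
insert into logos (id, bytes, type, created)
values (
    'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11', -- arbitrary example UUID
    E'\\x89504E470D0A1A0A',                 -- hex bytea: the PNG signature bytes
    'png',
    now()
);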