Convert a UTF-8 varchar to bytea with latin1 encoding in PostgreSQL - postgresql

I'm porting a proprietary (and old) checksum algorithm from a MySQL procedure to PostgreSQL 8.4.
The whole database is UTF-8, but for this algorithm I need to convert the UTF-8 input to a bytea value with latin1 encoding. In MySQL, variables can have different encodings and conversion is performed on the fly. Is there any function in PostgreSQL to do such a conversion?
The only alternative I see is to write a custom utf8_convert() C function which returns a bytea value and internally uses iconv() to convert the input to latin1. But I want to avoid such C functions.

From String Functions and Operators:
convert_to(text_in_database,'LATIN1')
But you have to be sure that the text can be encoded in Latin1 — you'll get an exception otherwise.
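The failure mode can be illustrated outside the database. This is a minimal Python sketch, not PostgreSQL code: it assumes Python's "latin-1" codec (ISO 8859-1) as a stand-in for the LATIN1 target encoding, and the helper name to_latin1_bytes is hypothetical.

```python
def to_latin1_bytes(text: str) -> bytes:
    """Encode text as Latin-1; raises UnicodeEncodeError if a character does not fit."""
    return text.encode("latin-1")


# "é" exists in Latin-1, so it becomes the single byte 0xE9.
print(to_latin1_bytes("café"))

# The euro sign (U+20AC) has no Latin-1 code point, so encoding fails,
# which is analogous to the exception mentioned above.
try:
    to_latin1_bytes("€uro")
except UnicodeEncodeError as exc:
    print("not representable in Latin-1:", exc.reason)
```

If the checksum input can contain such characters, it has to be validated or cleaned before conversion.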

Related

Converting MS SQL Hash Text to Postgres Hash Text

I am working on migrating all our databases from MS SQL server to Postgres. In this process, I am working on writing equivalent code in Postgres to yield the same hashed texts obtained in MS SQL.
Following is my code in MS SQL:
DECLARE @HashedText nvarchar(50)
DECLARE @InputText nvarchar(50) = 'password'
DECLARE @HashedBytes varbinary(20) -- maximum size of SHA1 output
SELECT @HashedBytes = HASHBYTES('SHA1', @InputText)
SET @HashedText = CONVERT(nvarchar(50), @HashedBytes, 2)
SELECT @HashedText
This is yielding the value E8F97FBA9104D1EA5047948E6DFB67FACD9F5B73
Following is equivalent code written in Postgres:
DO
$$
DECLARE
    v_InputText   VARCHAR := 'password';
    v_HashedText  VARCHAR;
    v_HashedBytes BYTEA;
BEGIN
    -- DIGEST() is provided by the pgcrypto extension and returns bytea
    v_HashedBytes := DIGEST(v_InputText, 'SHA1');
    -- Render the raw hash bytes as a hexadecimal string
    v_HashedText := ENCODE(v_HashedBytes, 'hex');
    RAISE INFO 'Hashed Text: %', v_HashedText;
END;
$$;
This yields the value 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8.
After spending some time I understood that replacing the datatype 'NVARCHAR' with 'VARCHAR' in MS SQL yields the same result as Postgres.
Now the problem is that in MS SQL we already have passwords hashed and stored in the database as shown above. I am unable to convert the hashed text from MS SQL for use in Postgres, and also unable to generate the same hashed text in Postgres, as Postgres doesn't support UTF-16 Unicode.
So, I just want to know whether either of the following solutions is possible:
Convert hexadecimal value generated in MS SQL to hex value equivalent to that generated by using VARCHAR datatype (which is same value in Postgres)
Convert UTF8 texts to UTF16 texts in Postgres (even by any kind of extensions) and generate hex values which would be equivalent to values generated in MS SQL
Let's look at your suggestions in turn:
Convert hexadecimal value generated in MS SQL to hex value equivalent to that generated by using VARCHAR datatype (which is same value in Postgres)
This comes down to converting the user's password from UTF-16 to UTF-8 (or to some other encoding) and re-hashing it. To do this, you need to know the user's password, which theoretically, you don't - that's the point of hashing it in the first place.
In practice, you're using unsalted SHA1 hashes, for which large pre-computed tables exist, and for which brute force is feasible with a GPU-optimised algorithm. So a "grey hat" option would be to crack all your users' passwords, and re-hash them.
If you do so, it would probably be sensible to re-hash them using a salt and better hash function, as well as converting them to UTF-8.
Convert UTF8 texts to UTF16 texts in Postgres (even by any kind of extensions) and generate hex values which would be equivalent to values generated in MS SQL
This, theoretically, is simpler: you just need a routine to do the string conversion. However, as you've found, there is no built-in support for this in Postgres.
For any string composed entirely of ASCII characters, the conversion is trivial: SQL Server stores nvarchar as little-endian UTF-16, so append a NUL byte (hex 00) after every byte of the string. But this will break any password that uses a character outside the ASCII range.
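The ASCII-only trick can be sketched in Python to show what the byte layout looks like. This is an illustration under stated assumptions, not a Postgres routine: it assumes SQL Server's nvarchar is UTF-16 little-endian (so the zero byte follows each ASCII byte), and the function names ascii_to_utf16le and sha1_hex_like_mssql are hypothetical.

```python
import hashlib


def ascii_to_utf16le(text: str) -> bytes:
    """Build UTF-16LE bytes for an ASCII-only string by adding a zero byte after each byte."""
    raw = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    out = bytearray()
    for byte in raw:
        out.append(byte)  # low byte: the ASCII code
        out.append(0)     # high byte: always zero for ASCII
    return bytes(out)


def sha1_hex_like_mssql(text: str) -> str:
    """Uppercase hex SHA1 over the UTF-16LE bytes (assumed to mirror HASHBYTES on nvarchar)."""
    return hashlib.sha1(ascii_to_utf16le(text)).hexdigest().upper()


print(ascii_to_utf16le("pw").hex())     # interleaved layout
print(sha1_hex_like_mssql("password"))  # compare with the MS SQL value in the question
print(hashlib.sha1(b"password").hexdigest())  # the UTF-8 / VARCHAR variant
```

For ASCII input the hand-built bytes are identical to what "password".encode("utf-16-le") produces, which is why the shortcut only holds inside that range.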
An alternative would be to move the responsibility for generating the hash out of the database:
Retrieve the hash from the DB.
Ensure the user's input is represented as UTF-16 in your application, and calculate its hash.
If valid, you can now generate a new hash (because you know the password the user just typed), using a better algorithm, and store that in the DB instead of the old hash.
Once all active users have logged in at least once, you will have no SHA1 hashes left, and can remove the support for them completely.
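The steps above could be sketched in application code roughly as follows. This is a hedged Python outline, not a production design: the stored-hash format ("pbkdf2$salt$hash"), the iteration count, the choice of PBKDF2 as the "better algorithm", and all function names are assumptions made for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor; tune for your hardware


def legacy_hash(password: str) -> str:
    """Old scheme: unsalted SHA1 over the UTF-16LE bytes, as uppercase hex."""
    return hashlib.sha1(password.encode("utf-16-le")).hexdigest().upper()


def new_hash(password: str, salt: Optional[bytes] = None) -> str:
    """New scheme (assumed): salted PBKDF2-HMAC-SHA256 over the UTF-8 bytes."""
    salt = salt if salt is not None else os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return "pbkdf2$" + salt.hex() + "$" + derived.hex()


def verify_new(password: str, stored: str) -> bool:
    """Recompute the new-style hash with the stored salt and compare."""
    _, salt_hex, _ = stored.split("$")
    return hmac.compare_digest(new_hash(password, bytes.fromhex(salt_hex)), stored)


def check_and_upgrade(password: str, stored: str) -> Tuple[bool, str]:
    """Return (is_valid, hash_to_store); upgrades a legacy hash on successful login."""
    if stored.startswith("pbkdf2$"):
        return verify_new(password, stored), stored
    if hmac.compare_digest(legacy_hash(password), stored.upper()):
        return True, new_hash(password)  # caller writes this back to the DB
    return False, stored
```

The caller would fetch the stored value, call check_and_upgrade with the submitted password, and write the second element back whenever it differs from what was stored.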

Get the encoding of a char column through SQL query in db2

How do I get the encoding of a char column in db2 through a query? I'm using Db2 i 7.2. Basically, I need to check if the encoding of a char column is "for bit data".
FOR BIT DATA is not an encoding, it is an attribute of character data types that have no encoding.
Each column's data type and, for character types, its encoding can be found in the catalog view QSYS2.SYSCOLUMNS, in the columns DATA_TYPE and CCSID respectively. I guess you'll be looking for 'BINARY' and 'VARBIN' values.
More information in the manual.

Can I use nvarchar data type in plpgsql?

Can I use nvarchar data type in plpgsql?
If so, do I have to follow the same syntax?
If not, then what is the alternative?
Just use the text or varchar types.
There's no need for, or support for, nvarchar. All PostgreSQL text-like types are always in the database's text encoding; you can't choose a 1-byte or 2-byte type like in MS SQL Server. Generally you just use a UTF-8 encoded DB, and everything works with no hassle.

How to insert (raw bytes from file data) using a plain text script

Database: Postgres 9.1
I have a table called logos defined like this:
create type image_type as enum ('png');
create table logos (
id UUID primary key,
bytes bytea not null,
type image_type not null,
created timestamp with time zone default current_timestamp not null
);
create index logo_id_idx on logos(id);
I want to be able to insert records into this table in 2 ways.
The first (and most common) way rows will be inserted in the table is that a user provides a PNG image file via an HTML file upload form. The code processing the request on the server will receive a byte array containing the data in the PNG image file and insert a record in the table using something very similar to what is explained here. There are plenty of examples on the internet of how to insert byte arrays into a PostgreSQL field of type bytea. This is an easy exercise. An example of the insert code would look like this:
insert into logos (id, bytes, type, created) values (?, ?, ?, now())
And the bytes would be set with something like:
...
byte[] bytes = ... // read PNG file into a byte array.
...
ps.setBytes(2, bytes);
...
The second way rows will be inserted in the table will be from a plain text file script. The reason this is needed is only to populate test data into the table for automated tests, or to initialize the database with a few records for a remote development environment.
Regardless of how the data is entered in the table, the application will obviously need to be able to select the bytea data from the table and convert it back into a PNG image.
Question
How does one properly encode a byte array, to be able to insert the data from within a script, in such a way that only the original bytes contained in the file are stored in the database?
I can write code to read the file and spit out insert statements to populate the script. But I don't know how to encode the byte array for the plain text script such that when running the script from psql the image data will be the same as if the file was inserted using the setBytes jdbc code.
I would like to run the script with something like this:
psql -U username -d dataBase -a -f test_data.sql
The easiest way, IMO, to represent bytea data in an SQL file is to use the hex format:
8.4.1. bytea Hex Format
The "hex" format encodes binary data as 2 hexadecimal digits per byte, most significant nibble first. The entire string is preceded by the sequence \x (to distinguish it from the escape format). In some contexts, the initial backslash may need to be escaped by doubling it, in the same cases in which backslashes have to be doubled in escape format; details appear below. The hexadecimal digits can be either upper or lower case, and whitespace is permitted between digit pairs (but not within a digit pair nor in the starting \x sequence). The hex format is compatible with a wide range of external applications and protocols, and it tends to be faster to convert than the escape format, so its use is preferred.
Example:
SELECT E'\\xDEADBEEF';
Converting an array of bytes to hex should be trivial in any language that a sane person (such as yourself) would use to write the SQL file generator.
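As a rough illustration of such a generator, here is a minimal Python sketch. It assumes the logos table from the question and the E'\\x...' literal form shown in the example above; the function name logo_insert_statement is hypothetical, and the four sample bytes stand in for real PNG data.

```python
import uuid


def logo_insert_statement(png_bytes: bytes, logo_id: uuid.UUID) -> str:
    """Build one INSERT for the logos table with the bytes as a hex-format bytea literal."""
    # In an E'' string the backslash is doubled, so the SQL text contains E'\\x<hex digits>'.
    hex_literal = "E'\\\\x" + png_bytes.hex() + "'"
    return (
        "insert into logos (id, bytes, type, created) "
        f"values ('{logo_id}', {hex_literal}, 'png', now());"
    )


# Sample bytes only; a real generator would read them from the image file.
print(logo_insert_statement(b"\xde\xad\xbe\xef", uuid.uuid4()))
```

In a real script the bytes would come from the file itself, for example via open(path, "rb").read(), and each generated line would be appended to test_data.sql.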

How to convert oid datatype to varchar during updation

I want to update a column image_name, whose data type is oid,
using a column image1, whose data type is varchar.
During the update I get an error like this:
update mst_memberdetail set image_name=image1;
column "image" is of type oid but expression is of type character varying
Can you please tell me how to convert the data type explicitly?
I assume that image1 is an OID that refers to an LOB? If so, you probably don't want to update that directly to varchar since you will get escaped binary (possibly hexadecimal). This is something you want to do in a client library, not in the db backend, if that is your use case.
Alternatively you could write a PL/Perl function to unescape the binary, and use the lo_* functions to read it, pass it to PL/Perl and update the varchar, but if you have encoding issues that will fail, and it will be slow.
If you are using lobs I would just recommend leaving them as is.