Update bytea data that was first encoded as a hex char string

Update bytea data that was first encoded as a hex char string - postgresql

Given a table with a "blob" column of bytea type, that has data that was first encoded as a char string of 'hex' format by the toString method of Node's Buffer API ... yes, not the best idea ...
Is it possible to update the data so that the data is decoded from 'hex' and returned to raw bytes?
decode(blob,'hex') is not going to work as the blob is still bytea, not text.
Looking for a possibly 'pure' Postgres (> v12) solution without going back to Node's Buffer API first, but I'll accept the punishment of having to export the data, transform it, and update from there.

Fear not:
UPDATE blobs
SET b = decode(encode(b, 'escape'), 'hex');
With the escape format, the hex digits you stored by mistake will be output as characters, and the hex format will interpret pairs of these hex digits as a single byte.

Related

Decimal number from JSON truncated when read from Mapping Data Flow

When reading a decimal number from JSON data files using a Mapping Data Flow, the decimal digits are truncated.
[Source data]
{
"value": 1123456789.12345678912345678912
}
In Data Factory, the source dataset is configured with no schema. The Mapping Data Flow projection defines a decimal data type with sufficient precision and scale.
[Mapping Data Flow script]
source(output(
value as decimal(35,20)
),
...
However, when viewing the value in the 'Data preview' window, or reviewing pipeline output, the value is truncated.
[Output]
1123456789.12345670000000000000
This issue doesn't occur with other file formats, such as delimited text.
When previewing the data from the source dataset, the decimal digits are truncated in the same way. This occurs whether or not a schema is set. If a schema is set, the data type is number rather than decimal since it's JSON. Mozilla Developer Network documentation calls out the varied number of decimal digits supported by browsers, so I wonder if this is down to the JSON parser being used.
Is this expected behaviour? Can Data Factory be configured to support a the full number of decimal places when working with JSON? Unfortunately this is calling into question whether it's viable to perform aggregate calculations in Data Factory.

I've created a same test as yours and got the same result as follows:
After I changed the soure data, I put double quotes on the value:
Then I use a toDecimal(Value,35,20) to convert the string type to decimal type:
It seems work well. So we can get conclusion:
Don't let ADF do default data type conversion, it will truncate the length of the value.
This issue doesn't occur with other file formats, such as delimited text. Because the default value is the string type.

Its common issue with JSON parser, FloatParseHandling setting is available in .net library but not in ADF.
FloatParseHandling can set to Decimal while parsing the file through .net.
Until the setting is made available in ADF need to try the workaround - using quotes ["] at both end to make the value string and get it converted after loading.

Is there a good reason to insert HEX as BYTEA in PostgreSQL?

I found this from the documentation:
... binary strings specifically allow storing octets of value zero and
... octets outside the range 32 to 126 ...
To me that sounds like there is no reason to use BYTEA to store a HEX value? Still a lot of people seem to use BYTEA for sth. like this:
013d7d16d7ad4fefb61bd95b765c8ceb
007687fc64b746569616414b78c81ef1
Is there a good reason to do so?

There are three good reasons:
It will require less storage space, since two hexadecimal digits are stored as one byte.
It will automatically check the value for correctness:
SELECT decode('0102ABCDNONSENSE', 'hex');
ERROR: invalid hexadecimal digit: "N"
you can store and retrieve binary data without converting them from and to text if your API supports it.

PostgreSQL - Converting Binary data to Varchar

We are working towards migration of databases from MSSQL to PostgreSQL database. During this process we came across a situation where a table contains password field which is of NVARCHAR type and this field value got converted from VARBINARY type and stored as NVARCHAR type.
For example: if I execute
SELECT HASHBYTES('SHA1','Password')`
then it returns 0x8BE3C943B1609FFFBFC51AAD666D0A04ADF83C9D and in turn if this value is converted into NVARCHAR then it is returning a text in the format "䏉悱ﾟ얿괚浦Њ鴼"
As we know that PostgreSQL doesn't support VARBINARY so we have used BYTEA instead and it is returning binary data. But when we try to convert this binary data into VARCHAR type it is returning hex format
For example: if the same statement is executed in PostgreSQL
SELECT ENCODE(DIGEST('Password','SHA1'),'hex')
then it returns
8be3c943b1609fffbfc51aad666d0a04adf83c9d.
When we try to convert this encoded text into VARCHAR type it is returning the same result as 8be3c943b1609fffbfc51aad666d0a04adf83c9d
Is it possible to get the same result what we retrieved from MSSQL server? As these are related to password fields we are not intended to change the values. Please suggest on what needs to be done

It sounds like you're taking a byte array containing a cryptographic hash and you want to convert it to a string to do a string comparison. This is a strange way to do hash comparisons but it might be possible depending on which encoding you were using on the MSSQL side.
If you have a byte array that can be converted to string in the encoding you're using (e.g. doesn't contain any invalid code points or sequences for that encoding) you can convert the byte array to string as follows:
SELECT CONVERT_FROM(DIGEST('Password','SHA1'), 'latin1') AS hash_string;
hash_string
-----------------------------
\u008BãÉC±`\u009Fÿ¿Å\x1Afm+
\x04ø<\u009D
If you're using Unicode this approach won't work at all since random binary arrays can't be converted to Unicode because there are certain sequences that are always invalid. You'll get an error like follows:
# SELECT CONVERT_FROM(DIGEST('Password','SHA1'), 'utf-8');
ERROR: invalid byte sequence for encoding "UTF8": 0x8b
Here's a list of valid string encodings in PostgreSQL. Find out which encoding you're using on the MSSQL side and try to match it to PostgreSQL. If you can I'd recommend changing your business logic to compare byte arrays directly since this will be less error prone and should be significantly faster.

then it returns 0x8BE3C943B1609FFFBFC51AAD666D0A04ADF83C9D and in turn
if this value is converted into NVARCHAR then it is returning a text
in the format "䏉悱ﾟ얿괚浦Њ鴼"
Based on that, MSSQL interprets these bytes as a text encoded in UTF-16LE.
With PostgreSQL and using only built-in functions, you cannot obtain that result because PostgreSQL doesn't use or support UTF-16 at all, for anything.
It also doesn't support nul bytes in strings, and there are nul bytes in UTF-16.
This Q/A: UTF16 hex to text suggests several solutions.
Changing your business logic not to depend on UTF-16 would be your best long-term option, though. The hexadecimal representation, for instance, is simpler and much more portable.

BLOB to String conversion - DB2

A table in DB2 contains BLOB data. I need to convert it into String so that it can be viewed in a readable format. I tried options like
getting blob object and converting to byte array
string buffer reader
sqoop import using --map-column-java and --map-column-hive options.
After these conversions also i am not able to view the data in readable format. Its in an unreadable format like 1f8b0000..
Please suggest a solution on how to handle this scenario.

I think you need to look at the CAST function.
SELECT CAST(BLOB_VAR as VARCHAR(SIZE) CCSID UNICODE) as CHAR_FLD
Also, be advised that max value of SIZE is 32K.
Let me know if you tried this.

1f8b0000 indicates the data in gzip form and therefore you have to unzip it.

Inserting string in FOR BIT DATA column in DB2

When i try inserting a string value in a column in DB2, which is defined as CHAR(15) FOR BIT DATA, i notice that it gets converted into some other format, probably hexadecimal.
On retrieving the data, i get a byte array, and on trying to convert it back to ASCII, using System.Text.Encoding.ASCII.GetString, i get back a string of some junk characters.
Anyone faced this issue?
Any resolution?
Thanks in advance.

For Bit Data prevents the code page convertion between the client and the server. This is normally used to insert binary data, instead of strings.
Please, take a look at this forum, where many cases are proposed and solved: http://www.dbforums.com/db2/1663992-bit-data.html
You could eventually make a cast to database page (it depends you platform)
CAST(c1 AS CHAR(n) FOR SBCS DATA)
CAST (<forbitdataexpression> AS [VARCHAR|CHAR][(<n>)] FOR [SBCS|DBCS] DATA)
References
http://bytes.com/topic/db2/answers/180874-problem-db2-field-type-char-n-bit-data
http://bytes.com/topic/db2/answers/182124-function-convert-bit-data-column-string
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.sql.ref.doc/doc/r0023459.html

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Update bytea data that was first encoded as a hex char string - postgresql

Fear not: UPDATE blobs SET b = decode(encode(b, 'escape'), 'hex'); With the escape format, the hex digits you stored by mistake will be output as characters, and the hex format will interpret pairs of these hex digits as a single byte.

Related

Decimal number from JSON truncated when read from Mapping Data Flow

Is there a good reason to insert HEX as BYTEA in PostgreSQL?

PostgreSQL - Converting Binary data to Varchar

BLOB to String conversion - DB2

Inserting string in FOR BIT DATA column in DB2

Categories

Resources