Unicode conversion issue with DMU in Oracle 12c - oracle12c

Why do some fields still need expanding after byte-to-char conversion?
E.g. if a field is varchar2(30 byte) before the conversion, and we convert byte-to-char so that the field is now varchar2(30 char), why can some fields no longer hold the data that was in them previously, so that they have to be expanded to varchar2(30+x char)?

Related

Creating a postgres column which allows all datatypes

I want to create a logging table which tracks changes in a certain table, like so:
CREATE TABLE logging.zaak_history (
    event_id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tstamp     timestamp DEFAULT now(),
    schemaname text,
    tabname    text,
    columnname text,
    operation  text,
    who        text DEFAULT current_user,
    new_val    <any_type>,
    old_val    <any_type>
);
However, the column that I want to track can take different datatypes, such as text, boolean and numeric. Is there a datatype that supports this?
Currently I am thinking about storing it as jsonb, as this will deal with the datatype in the JSON formatting, but I was wondering if there is a better way.
There is no Postgres data type that isn't strongly typed; the "any" data type that is available as a pseudo-type cannot be used as a column type (it can be used in functions, etc.).
You could store the binary representation of your data, because every type does have a binary representation.
Your approach of using JSON seems more flexible, as you can also store metadata (such as type information).
However, I recommend looking at how other people have solved the same issue for alternative ideas. For example, most wikis store a copy of the entire record for history, which is easy to reconstruct, can be referenced independently, and has no typing issues.
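That said, a minimal sketch of the jsonb variant (the trigger wiring is omitted; the INSERT merely illustrates how to_jsonb() keeps type information in the JSON representation, and the values are made up):

CREATE TABLE logging.zaak_history (
    event_id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tstamp     timestamp DEFAULT now(),
    schemaname text,
    tabname    text,
    columnname text,
    operation  text,
    who        text DEFAULT current_user,
    new_val    jsonb,
    old_val    jsonb
);

-- to_jsonb() maps each source type to the matching JSON type
-- (text -> string, numeric -> number, boolean -> true/false):
INSERT INTO logging.zaak_history (schemaname, tabname, columnname, operation, old_val, new_val)
VALUES ('public', 'zaak', 'status', 'UPDATE', to_jsonb(false), to_jsonb('open'::text));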

Npgsql.PostgresException: Column cannot be cast automatically to type bytea

Using EF Core with PostgreSQL, I have an entity with a field of type byte, but decided to change it to byte[]. When I applied the generated migration file, it threw the following exception:
Npgsql.PostgresException (0x80004005): 42804: column "Logo" cannot be cast automatically to type bytea
I have searched the internet for a solution, but all I found were similar problems with other datatypes, not byte arrays. Please help.
The error says exactly what is happening... In some cases PostgreSQL allows column type changes (e.g. int -> bigint), but in many cases where such a change is non-trivial or potentially destructive, it refuses to do so automatically. In this specific case, that happens because Npgsql maps your CLR byte field to PostgreSQL smallint (a 2-byte field), since PostgreSQL lacks a 1-byte data type. So PostgreSQL refuses to cast from smallint to bytea, which makes sense.
However, you can still do the migration by writing the data conversion yourself, from smallint to bytea. To do so, edit the generated migration, find the ALTER COLUMN ... TYPE statement and add a USING clause. As the PostgreSQL docs say, this allows you to provide the new value for the column based on the existing column (or even other columns). Specifically, to convert an int (or smallint) to a bytea, use the following:
ALTER TABLE tab ALTER COLUMN col TYPE BYTEA USING set_byte(E'0', 0, col);
If your existing column happens to contain more than a single byte's worth of data (which should not be an issue for you), it should get truncated. Obviously, test the data coming out of this carefully.
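To see what that USING expression produces, you can run set_byte() on its own (the values here are illustrative):

SELECT set_byte(E'0'::bytea, 0, 65);
-- Returns \x41: a one-byte bytea whose single byte now holds the value 65.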

Converting MS SQL Hash Text to Postgres Hash Text

I am working on migrating all our databases from MS SQL Server to Postgres. As part of this, I am writing equivalent code in Postgres to yield the same hashed texts obtained in MS SQL.
Following is my code in MS SQL:
DECLARE @HashedText nvarchar(50)
DECLARE @InputText nvarchar(50) = 'password'
DECLARE @HashedBytes varbinary(20) -- maximum size of SHA1 output
SELECT @HashedBytes = HASHBYTES('SHA1', @InputText)
SET @HashedText = CONVERT(nvarchar(50), @HashedBytes, 2)
SELECT @HashedText
This is yielding the value E8F97FBA9104D1EA5047948E6DFB67FACD9F5B73
Following is equivalent code written in Postgres:
DO
$$
DECLARE
    v_InputText   VARCHAR := 'password';
    v_HashedBytes BYTEA;
    v_HashedText  VARCHAR;
BEGIN
    -- DIGEST() comes from the pgcrypto extension
    v_HashedBytes := DIGEST(v_InputText, 'sha1');
    v_HashedText  := ENCODE(v_HashedBytes, 'hex');
    RAISE INFO 'Hashed Text: %', v_HashedText;
END;
$$;
This yields the value 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8.
After spending some time I understood that replacing the datatype NVARCHAR with VARCHAR in MS SQL yields the same result as Postgres.
Now the problem is that in MS SQL we already have passwords hashed and stored in the database as shown above. I am unable to convert the hashed text in MS SQL to Postgres, and also unable to generate the same hashed text in Postgres, as Postgres doesn't support UTF-16 Unicode.
So, I just want to know if either of the following solutions is possible:
1. Convert the hexadecimal value generated in MS SQL to the hex value equivalent to that generated by using the VARCHAR datatype (which is the same value as in Postgres).
2. Convert UTF-8 texts to UTF-16 texts in Postgres (even by any kind of extension) and generate hex values equivalent to the values generated in MS SQL.
Let's look at your suggestions in turn:
Convert the hexadecimal value generated in MS SQL to the hex value equivalent to that generated by using the VARCHAR datatype (which is the same value as in Postgres)
This comes down to converting the user's password from UTF-16 to UTF-8 (or to some other encoding) and re-hashing it. To do this, you need to know the user's password, which theoretically, you don't - that's the point of hashing it in the first place.
In practice, you're using unsalted SHA1 hashes, for which large pre-computed tables exist, and for which brute force is feasible with a GPU-optimised algorithm. So a "grey hat" option would be to crack all your users' passwords and re-hash them.
If you do so, it would probably be sensible to re-hash them using a salt and a better hash function, as well as converting them to UTF-8.
Convert UTF-8 texts to UTF-16 texts in Postgres (even by any kind of extension) and generate hex values equivalent to the values generated in MS SQL
This is theoretically simpler: you just need a routine to do the string conversion. However, as you've found, there is no built-in support for this in Postgres.
For any string composed entirely of ASCII characters, the conversion is trivial: SQL Server stores nvarchar as UTF-16LE, so you insert a zero byte (hex 00) after every byte of the string. But this will break any password that used a character outside that range.
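For ASCII-only input, a hedged sketch of that byte expansion, using the pgcrypto extension (the password literal is just an example):

SELECT upper(encode(digest(decode(
           string_agg(lpad(to_hex(ascii(ch)), 2, '0') || '00', '' ORDER BY i),
           'hex'), 'sha1'), 'hex')) AS mssql_style_sha1
FROM   regexp_split_to_table('password', '') WITH ORDINALITY AS t(ch, i);
-- Expands each ASCII byte to its UTF-16LE pair (e.g. 'p' -> 70 00) before hashing;
-- for 'password' this yields E8F97FBA9104D1EA5047948E6DFB67FACD9F5B73, matching HASHBYTES.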
An alternative would be to move the responsibility for generating the hash out of the database:
1. Retrieve the hash from the DB.
2. Ensure the user's input is represented as UTF-16 in your application, calculate its hash, and compare it against the stored value.
3. If it matches, you can now generate a new hash (because you know the password the user just typed), using a better algorithm, and store that in the DB instead of the old hash.
Once all active users have logged in at least once, you will have no SHA1 hashes left and can remove support for them completely.

Create big integer from the big end of a uuid in PostgreSQL

I have a third-party application connecting to a view in my PostgreSQL database. It requires the view to have a primary key but can't handle the UUID type (which is the primary key for the view). It also can't handle the UUID as the primary key if it is served as text from the view.
What I'd like to do is convert the UUID to a number and use that as the primary key instead. However,
SELECT x'14607158d3b14ac0b0d82a9a5a9e8f6e'::bigint
fails because the number is out of range.
So instead, I want to use SQL to take the big end of the UUID and create an int8 / bigint. I should clarify that maintaining order is 'desirable' but I understand that some of the order will change by doing this.
I tried:
SELECT x(substring(UUID::text from 1 for 16))::bigint
but the x operator for converting hex doesn't seem to like brackets. I abstracted it into a function but
SELECT hex_to_int(substring(UUID::text from 1 for 16))::bigint
still fails.
How can I get a bigint from the 'big end' half of a UUID?
Fast and without dynamic SQL
Cast the leading 16 hex digits of a UUID in text representation as bitstring bit(64) and cast that to bigint. See:
Convert hex in text representation to decimal number
Conveniently, excess hex digits to the right are truncated in the cast to bit(64) automatically - exactly what we need.
Postgres accepts various formats for input. Your given string literal is one of them:
14607158d3b14ac0b0d82a9a5a9e8f6e
The default text representation of a UUID (and the text output in Postgres for data type uuid) adds hyphens at predefined places:
14607158-d3b1-4ac0-b0d8-2a9a5a9e8f6e
The manual:
A UUID is written as a sequence of lower-case hexadecimal digits, in several groups separated by hyphens, specifically a group of 8 digits followed by three groups of 4 digits followed by a group of 12 digits, for a total of 32 digits representing the 128 bits.
If input format can vary, strip hyphens first to be sure:
SELECT ('x' || translate(uuid_as_string, '-', ''))::bit(64)::bigint;
Cast actual uuid input with uuid::text.
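Applied to an actual uuid column, using a hypothetical table tbl with a uuid column id:

SELECT ('x' || translate(id::text, '-', ''))::bit(64)::bigint AS big_end
FROM   tbl;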
Note that Postgres uses signed integers, so the bigint overflows to negative numbers in the upper half - which should be irrelevant for this purpose.
DB design
If at all possible, add a bigserial column to the underlying table and use that instead.
This is all very shaky, both the problem and the solution you describe in your self-answer.
First, a mismatch between a database design and a third-party application is always possible, but usually indicative of a deeper problem. Why does your database use the uuid data type as a PK in the first place? They are not very efficient compared to a serial or a bigserial. Typically you would use a UUID if you are working in a distributed environment where you need to "guarantee" uniqueness over multiple installations.
Secondly, why does the application require the PK to begin with (incidentally: views do not have a PK, the underlying tables do)? If it is only to view the data then a PK is rather useless, particularly if it is based on a UUID (and there is thus no conceivable relationship between the PK and the rest of the tuple). If it is used to refer to other data in the same database or to do updates or deletes of existing data, then you need the exact UUID and not some extract of it, because the underlying table or other relations in your database would have the exact UUID. Of course you can convert all UUIDs with the same hex_to_int() function, but that leads straight back to my point above: why use UUIDs in the first place?
Thirdly, do not mess around with things you have little or no knowledge of. This is not intended to be offensive; take it as well-meant advice (look around on the internet for programmers who tried to improve on cryptographic algorithms or random number generation by adding their own twists of obfuscation; quite entertaining reads).
There are 5 algorithms for generating UUIDs in the uuid-ossp package, and while you know or can easily find out which algorithm is used in your database (the uuid_generate_vX() functions in your table definitions, most likely), do you know how the algorithm works? The claim of practical uniqueness of a UUID is based on its 128 bits, not on a 64-bit extract of it. Are you certain that the high 64 bits are random? My guess is that 64 consecutive bits are less random than the "square root of the randomness" (for lack of a better way to phrase the theoretical drop in periodicity of a 64-bit number compared to a 128-bit number) of the full UUID. Why? Because all but one of the algorithms are made up of randomized blocks of otherwise non-random input (such as the MAC address of a network interface, which is always the same on a machine generating millions of UUIDs). Had 64 bits been enough for randomized value uniqueness, then a UUID would have been that long.
What a better solution would be in your case is hard to say, because it is unclear what the third-party application does with the data from your database and how dependent it is on the uniqueness of the "PK" column in the view. An approach that is likely to work, if the application does more than trivially display the data without any further use of the "PK", would be to associate a bigint with every retrieved uuid in your database in a (temporary) table and include that bigint in your view by linking on the uuids in your (temporary) tables. Since you cannot trigger on SELECT statements, you would need a function to generate the bigint for every uuid the application retrieves. On updates or deletes on the underlying tables of the view, or upon selecting data from related tables, you look up the uuid corresponding to the bigint passed in from the application. The lookup table and function would look somewhat like this:
CREATE TEMPORARY TABLE temp_table (
    tempint       bigserial PRIMARY KEY,
    internal_uuid uuid
);
CREATE INDEX ON temp_table (internal_uuid);

-- Return the bigint associated with a uuid, creating the mapping on first use
CREATE FUNCTION temp_int_for_uuid(pk uuid) RETURNS bigint AS $$
DECLARE
    id bigint;
BEGIN
    SELECT tempint INTO id FROM temp_table WHERE internal_uuid = pk;
    IF NOT FOUND THEN
        -- First encounter of this uuid: allocate the next bigint for it
        INSERT INTO temp_table (internal_uuid) VALUES (pk)
        RETURNING tempint INTO id;
    END IF;
    RETURN id;
END; $$ LANGUAGE plpgsql STRICT;
Not pretty, not efficient, but fool-proof.
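To sketch how this might be wired into the view (table and column names here are illustrative, not from the original schema):

CREATE VIEW app_view AS
SELECT temp_int_for_uuid(t.id) AS pk,  -- stable bigint handed to the application
       t.*
FROM   underlying_table t;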
Use a cast to bit(64) to parse a decimal number from a hex literal built from a substr of the UUID:
select ('x'||substr(UUID, 1, 16))::bit(64)::bigint
Solution found.
UUID::text will return a string with hyphens. In order for substring(UUID::text from 1 for 16) to create a string that x can parse as hex, the hyphens need to be stripped first.
The final query looks like:
SELECT hex_to_int(substring((select replace(id::text,'-','')) from 1 for 16))::bigint FROM table
The hex_to_int function needs to be able to handle a bigint, not just an int. It looks like:
CREATE OR REPLACE FUNCTION hex_to_int(hexval character varying)
  RETURNS bigint AS
$BODY$
DECLARE
    result bigint;
BEGIN
    -- Build a bit-string literal (x'...') from the hex text and cast it to bigint
    EXECUTE 'SELECT x''' || hexval || '''::bigint' INTO result;
    RETURN result;
END;
$BODY$
LANGUAGE plpgsql;
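For example, applied to the big end of the UUID from the question (assuming the function above has been created):

SELECT hex_to_int('14607158d3b14ac0');
-- Returns the top 64 bits of the example UUID as a bigint.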

Can I use nvarchar data type in plpgsql?

Can I use nvarchar data type in plpgsql?
If so, do I have to follow the same syntax?
If not, then what is the alternative?
Just use the text or varchar types.
There's no need for, or support for, nvarchar. All PostgreSQL text-like types are always in the database's text encoding; you can't choose a 1-byte or 2-byte type like in MS SQL Server. Generally you just use a UTF-8 encoded DB, and everything works with no hassle.
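As a small illustration (the variable names are made up), text and varchar work directly in plpgsql, and non-ASCII characters are simply handled by the database encoding:

DO $$
DECLARE
    v_name  text        := 'José';   -- stored in the DB encoding, typically UTF-8
    v_label varchar(10) := 'naïve';
BEGIN
    RAISE INFO '% / %', v_name, v_label;
END $$;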