PostgreSQL: varchar(1) and Umlaut

I have a VARCHAR(1) field in postgresql.
Now I export data from a PostgreSQL 9.4 server with pg_dump
and import it into a PostgreSQL 9.5 server with psql.
When I import it, I get an error:
ERROR: value too long for type character varying(1) COPY XXX "Ö"
That means the table contains the value "Ö", which takes two bytes (in UTF-8) instead of one.
Do I have to increase the column to VARCHAR(2)?
Is there another way to keep VARCHAR(1) and use a locale etc.?
How could this data ever have been stored there in the first place?
Thanks for your help!

Easy fix:
The encoding of the target database was wrong and had to be set to UTF8. PostgreSQL's varchar(1) limit counts characters, not bytes, so "Ö" fits once the encoding is correct.
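A minimal sketch of that fix (the database name is a placeholder): create the target database with UTF8 encoding before restoring the dump. In a UTF8 database the varchar(1) limit counts characters, not bytes, so "Ö" fits:
-- create the target database with UTF8 encoding (template0 allows choosing the encoding;
-- you may also need LC_COLLATE/LC_CTYPE values compatible with UTF8)
CREATE DATABASE target_db ENCODING 'UTF8' TEMPLATE template0;
-- quick check in the new database: "Ö" is 2 bytes in UTF-8 but only 1 character
SELECT octet_length('Ö') AS bytes, char_length('Ö') AS chars;  -- returns 2, 1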

Related

Clob(length 1048576) DB2 to PostgreSQL best datatype?

We had a table with a Clob(length 1048576) column that stored search text to help with searching. When I transferred it from DB2 to Postgres during our migration, I found that it wasn't working as well. So I was going to try text or varchar, but long text entries took so much longer to insert that my local WildFly window would time out when trying to run it.
What is the equivalent datatype in Postgres that accepts long text and that I should use to replace a Clob of length 1048576 in DB2? It might be that I was using the right datatypes but didn't have the right corresponding size.
Use text. That is the only reasonable data type for long character strings.
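As a minimal sketch (table and column names are invented here), the migrated column can simply be declared as text; in PostgreSQL, text and varchar are stored the same way, so there is no performance penalty for the unlimited type:
CREATE TABLE search_helper (
    id          bigint PRIMARY KEY,
    search_body text              -- replaces the DB2 Clob(1048576); no length limit needed
);
-- or convert an existing varchar column in place:
ALTER TABLE search_helper ALTER COLUMN search_body TYPE text;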

Converting MS SQL Hash Text to Postgres Hash Text

I am working on migrating all our databases from MS SQL server to Postgres. In this process, I am working on writing equivalent code in Postgres to yield the same hashed texts obtained in MS SQL.
Following is my code in MS SQL:
DECLARE @HashedText nvarchar(50)
DECLARE @InputText nvarchar(50) = 'password'
DECLARE @HashedBytes varbinary(20) -- maximum size of SHA1 output
SELECT @HashedBytes = HASHBYTES('SHA1', @InputText)
SET @HashedText = CONVERT(nvarchar(50), @HashedBytes, 2)
SELECT @HashedText
This is yielding the value E8F97FBA9104D1EA5047948E6DFB67FACD9F5B73
Following is equivalent code written in Postgres:
DO
$$
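-- note: DIGEST() is provided by the pgcrypto extension (CREATE EXTENSION pgcrypto;)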
DECLARE v_InputText VARCHAR = 'password';
DECLARE v_HashedText VARCHAR;
DECLARE v_HashedBytes BYTEA;
BEGIN
SELECT
ENCODE(DIGEST(v_InputText, 'SHA1'), 'hex')
INTO
v_HashedBytes;
v_HashedText := CAST(v_HashedBytes AS VARCHAR);
RAISE INFO 'Hashed Text: %', v_HashedText;
END;
$$;
This yields the value 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8.
After spending some time I understood that replacing the datatype NVARCHAR with VARCHAR in MS SQL yields the same result as Postgres.
Now the problem is that in MS SQL we already have passwords hashed and stored in the database as shown above. I am unable to convert the hashed text from MS SQL to Postgres, and I am also unable to generate the same hashed text in Postgres, as Postgres doesn't support the UTF-16 encoding.
So, I just want to know if there is any possibility of either of the following solutions:
1. Convert the hexadecimal value generated in MS SQL to the hex value equivalent to that generated by using the VARCHAR datatype (which is the same value as in Postgres).
2. Convert UTF-8 texts to UTF-16 texts in Postgres (even by any kind of extension) and generate hex values which would be equivalent to the values generated in MS SQL.
Let's look at your suggestions in turn:
Convert hexadecimal value generated in MS SQL to hex value equivalent to that generated by using VARCHAR datatype (which is same value in Postgres)
This comes down to converting the user's password from UTF-16 to UTF-8 (or to some other encoding) and re-hashing it. To do this, you need to know the user's password, which, theoretically, you don't - that's the point of hashing it in the first place.
In practice, you're using unsalted SHA1 hashes, for which large pre-computed tables exist, and for which brute force is feasible with a GPU-optimised algorithm. So a "grey hat" option would be to crack all your users' passwords and re-hash them.
If you do so, it would probably be sensible to re-hash them using a salt and better hash function, as well as converting them to UTF-8.
Convert UTF8 texts to UTF16 texts in Postgres (even by any kind of extensions) and generate hex values which would be equivalent to values generated in MS SQL
This, theoretically, is simpler: you just need a routine to do the string conversion. However, as you've found, there is no built-in support for this in Postgres.
For any string composed entirely of ASCII characters, the conversion is trivial: append a NULL byte (hex 00) after every byte of the string, since SQL Server stores NVARCHAR as UTF-16LE. But this will break any password that used a character outside that range.
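A hedged sketch of that trivial case, assuming the pgcrypto extension is installed and the password is pure ASCII: rebuild SQL Server's UTF-16LE byte sequence by appending a zero byte to each character, then hash it. For 'password' this reproduces the MS SQL value quoted above:
-- ASCII-only emulation of HASHBYTES('SHA1', N'...'): build the UTF-16LE bytes by hand
SELECT encode(
         digest(
           decode(string_agg(lpad(to_hex(ascii(c)), 2, '0') || '00', '' ORDER BY ord), 'hex'),
           'sha1'),
         'hex') AS mssql_style_sha1
FROM unnest(string_to_array('password', NULL)) WITH ORDINALITY AS t(c, ord);
-- mssql_style_sha1 = e8f97fba9104d1ea5047948e6dfb67facd9f5b73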
An alternative would be to move the responsibility for generating the hash out of the database:
Retrieve the hash from the DB.
Ensure the user's input is represented as UTF-16 in your application, calculate its hash, and compare it against the value retrieved from the DB.
If it matches, you now know the password the user just typed, so you can generate a new hash using a better algorithm (as sketched below) and store that in the DB instead of the old hash.
Once all active users have logged in at least once, you will have no SHA1 hashes left, and can remove the support for them completely.
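If you want to generate the replacement hashes inside Postgres rather than in the application, one option (just a sketch with invented table and column names, not something prescribed above) is pgcrypto's crypt() with a bcrypt salt:
-- store a salted bcrypt hash instead of an unsalted SHA1 (requires pgcrypto)
UPDATE app_user
SET    password_hash = crypt('password-the-user-just-typed', gen_salt('bf'))
WHERE  user_id = 42;
-- later, verify a login attempt against the stored hash:
SELECT password_hash = crypt('login-attempt', password_hash) AS password_ok
FROM   app_user
WHERE  user_id = 42;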

ORA-12899 - value too large for column when upgrading to Oracle 12C

My project is going through a tech upgrade, so we are upgrading the Oracle DB from 11g to 12c. SAP DataServices is upgraded to version 14.2.7.1156.
The tables in Oracle 12c default to VARCHAR (BYTE) when they should be VARCHAR (CHAR). I understand this is normal. So I altered the session for each datastore, running
`ALTER session SET nls_length_semantics=CHAR;`
When I create a new table with VARCHAR (1), I am able to load Unicode characters such as Chinese characters (e.g. 东) into the new table from Oracle.
However, when I try to load the same Unicode character into the same table via SAP DS, it throws the error 'ORA-12899 - value too large for column'. My datastore settings are:
Locale
Language: default
Code Page: utf-8
Server code page: utf-8
Additional session parameters:
ALTER session SET nls_length_semantics=CHAR
I would really appreciate knowing what settings I need to change in my SAP BODS, since Oracle itself seems to be working fine.
I think you should consider modifying the table column from varchar2(x BYTE) to varchar2(x CHAR) to allow Unicode (UTF-8) data and avoid ORA-12899.
create table test1 (name varchar2(100));
insert into test1 values ('east');
insert into test1 values ('东');
alter table test1 modify name varchar2(100 char);
-- You can check CHAR_USED for each column (B = byte semantics, C = character semantics):
select column_name, data_type, char_used from user_tab_columns where table_name='TEST1';

DB2 DBCLOB data INSERT with Unicode data

The problem at hand is to insert data into a DB2 table which has a DBCLOB column. The table's encoding is Unicode. The subsystem is a MIXED YES with Japanese CCSID set of (290, 930, 300). The application is bound ENCODING CCSID.
I was successful in FETCHING the DBCLOB's data in Unicode, no problem there. But when I turn around and try to INSERT it back, the inserted data is being interpreted as not being Unicode; it seems DB2 thinks it's EBCDIC DBCS/GRAPHIC, and the inserted row shows Unicode 0xFEFE. When I manually update the data being inserted to valid DBCS, the data inserts OK and shows the expected Unicode DBCS values.
To insert the data I am using a dynamically prepared INSERT statement with a placeholder for the DBCLOB column. The SQLVAR entry associated with the placeholder is a DBCLOB_LOCATOR with the CCSID set to 1200.
A DBCLOB locator is being created doing a SET dbclobloc = SUBSTR(dbclob, 1, length). The created locator is being put into SQLDA. Then the prepared INSERT is being executed.
It seems DB2 is ignoring the 1200 CCSID associated with the DBCLOB_LOCATOR SQLVAR. Attempts to put a CAST(? AS DBCLOB CCSID UNICODE) on the placeholder in the INSERT do not help because at that time DB2 seems to have made up its mind about the encoding of the data to be inserted.
I am stuck :( Any ideas?
Greg
I think I figured it out and it is not good: the SET statement for the DBCLOB_LOCATOR is static SQL and the DBRM is bound ENCODING EBCDIC. Hence DB2 has no choice but to assume the data is in the CCSID of the plan.
I also tried what the books suggest and used a SELECT ... FROM SYSIBM.SYSDUMMYU to set the DBCLOB_LOCATOR. This should have told DB2 that the data was coming in Unicode. But it failed again, with symptoms indicating it still assumed the DBCS EBCDIC CCSID.
Not good.

How to change Oracle 10gr2 express edition's default character set

I installed Oracle 10gR2 Express Edition on my laptop.
When I import a .dmp file generated by Oracle 10gR2 Enterprise Edition, an error occurs.
The database server that generated the .dmp file runs with the GBK character set, but my Oracle Express server runs with UTF-8.
SQL> select userenv('language') from dual;
USERENV('LANGUAGE')
--------------------------------------------------------------------------------
SIMPLIFIED CHINESE_CHINA.AL32UTF8
How can I configure my own Oracle server to import the .dmp file?
Edit:
My own Oracle Express server:
SQL> select * from v$nls_parameters where parameter like '%CHARACTERSET';
PARAMETER
--------------------------------------------------------------------------------
VALUE
--------------------------------------------------------------------------------
NLS_CHARACTERSET
AL32UTF8
NLS_NCHAR_CHARACTERSET
AL16UTF16
The new character set requires up to 4 bytes per character while the old one only required up to 2 bytes. So, due to the character set change, some character fields will require more space than before. Obviously, some of them have now hit the column length limit.
To resolve it, you'll have to increase the length of the affected columns or change the length semantics so the length is interpreted in characters (and not in bytes, which is the default).
If your dump file contains both the schema definition and the data, you'll have to work in phases: first import the schema only, then increase the column lengths, and finally import the data.
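A minimal sketch of those two options on an affected column (table and column names are placeholders):
-- option 1: make the column longer (still measured in bytes)
ALTER TABLE example MODIFY (name VARCHAR2(400));
-- option 2: keep the length but count it in characters instead of bytes
ALTER TABLE example MODIFY (name VARCHAR2(100 CHAR));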
I have no experience with the length semantics. I usually specify it explicitly. See the documentation about the NLS_LENGTH_SEMANTICS parameter for more information. It affects how the number 100 in the following statement is interpreted:
CREATE TABLE example (
id NUMBER,
name VARCHAR(100)
);
Usually, it's better to be explicit and specify the unit directly:
CREATE TABLE example (
id NUMBER,
name VARCHAR(100 CHAR)
);
The dump file contains a whole schema, so altering column lengths is not a good option for me.
Oracle Express Edition uses UTF-8 by default; after googling around, I found a way to alter the database character set.
In my case:
UTF-8 --> GBK
I connected as user SYS AS SYSDBA in SQL*Plus, then executed the following commands:
shutdown immediate
startup mount
alter system enable restricted session ;
alter system set JOB_QUEUE_PROCESSES=0;
alter system set AQ_TM_PROCESSES=0;
alter database open;
alter database character set internal_use ZHS16GBK ;
shutdown immediate
startup
I don't know what these commands did to my database, but it works.