As part of a requirement, we have to insert special-character data into a PostgreSQL table using C# queries (ORM). We get the following error while committing the transaction:
"invalid byte sequence for encoding "UTF8": 0xc3 0x5f"
Special characters: Ã_Ã
Query:
INSERT INTO XXXXXXX(key, file_name, file_path, source_id) VALUES (E'XXXX', E'Ã_Ã', 'Ã_Ã.xlsx', XXX5)
.NET Framework 4.6.1
PostgreSQL 11
NHibernate
C#
Please add "Unicode=true;" to your connection string. Refer to https://www.devart.com/dotconnect/postgresql/docs/?Devart.Data.PostgreSql~Devart.Data.PostgreSql.PgSqlConnection~ConnectionString.html.
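For example, a Devart dotConnect connection string with the option enabled might look like this (host, database, and credentials are placeholders, not values from the question):
User Id=postgres;Password=XXXX;Host=localhost;Port=5432;Database=mydb;Unicode=true;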
I've been having some problems trying to save a word in a column limited to varchar(9).
create database big_text
LOCALE 'en_US.utf8'
ENCODING UTF8
create table big_text(
description VARCHAR(9) not null
)
-- OK
insert into big_text (description) values ('sintético')
-- this one fails with: value too long for type character varying(9)
insert into big_text (description) values ('sintético')
I already know the problem: one word uses 'é' (Latin Small Letter E with Acute, a single codepoint) while the other uses 'é' (Latin Small Letter E followed by Combining Acute Accent, two codepoints).
How can I store the same word in both representations in a limited varchar(9)? Is there some configuration that makes the database understand both forms? I thought the database being UTF8 would be enough, but apparently not.
I appreciate any explanation that could help me understand where I am wrong. Thank you!
Edit: actually, I would like to know if there is any way for Postgres to normalize automatically for me.
A possible workaround: use a CHECK constraint to enforce the character length after normalization.
show lc_ctype;
lc_ctype
-------------
en_US.UTF-8
create table big_text(
description VARCHAR not null CHECK (length(normalize(description)) <= 9)
)
-- Note shortened string. Explanation below.
select 'sintético'::varchar(9);
varchar
----------
sintétic
insert into big_text values ('sintético');
INSERT 0 1
select description, length(description) from big_text;
description | length
-------------+--------
sintético | 10
insert into big_text values ('sintético test');
ERROR: new row for relation "big_text" violates check constraint "big_text_description_check"
DETAIL: Failing row contains (sintético test).
From the Character Types page of the documentation (https://www.postgresql.org/docs/current/datatype-character.html), here is the explanation for the string truncation vs. the error you got when inserting:
An attempt to store a longer string into a column of these types will result in an error, unless the excess characters are all spaces, in which case the string will be truncated to the maximum length. (This somewhat bizarre exception is required by the SQL standard.)
If one explicitly casts a value to character varying(n) or character(n), then an over-length value will be truncated to n characters without raising an error. (This too is required by the SQL standard.)
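Regarding the edit (automatic normalization): Postgres will not normalize for you by itself, but a BEFORE trigger on the unconstrained VARCHAR table above can rewrite every incoming value to NFC. A minimal sketch, assuming PostgreSQL 13 or newer for normalize(); the function and trigger names are illustrative:
create or replace function nfc_normalize_description() returns trigger as $$
begin
  -- rewrite the incoming value to NFC so both spellings are stored identically
  new.description := normalize(new.description, NFC);
  return new;
end;
$$ language plpgsql;

create trigger big_text_nfc
  before insert or update on big_text
  for each row execute function nfc_normalize_description();
With this in place, both insert statements from the question store the same 9-codepoint value, and the CHECK constraint sees the normalized form (BEFORE triggers fire before CHECK constraints are evaluated).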
We are migrating DB2 data to PostgreSQL 11.x using AWS DMS. We have varchar fields in DB2 with trailing spaces, and without any TRIM these fields work fine in a WHERE clause; DB2 apparently ignores the trailing spaces in these comparisons. After moving to PostgreSQL, the same fields do not match without TRIM, and sometimes give unexpected results even with TRIM. Below is the detailed problem.
Source: DB2 - RECIP_NUM -- VARCHAR(10) -- 'ST001 '
select RECIP_NUMBER, SERV_TYPE, LENGTH(SERV_TYPE) AS before_trim_COL_LENGTH, LENGTH(trim(SERV_TYPE)) AS after_trim_COL_LENGTH
from serv_type rst
WHERE SERV_TYPE = 'ST001' -- THIS WORKS FINE WITHOUT TRIM
Output: (screenshot of the DB2 result)
Target: PGSQL -- RECIP_NUM -- VARCHAR(10) -- 'ST001 '
select RECIP_NUMBER, SERV_TYPE, LENGTH(SERV_TYPE) AS COL_LENGTH
from serv_type rst
WHERE trim(SERV_TYPE) = 'ST001' -- WITHOUT trim() THIS RETURNS NO ROWS
Output: (screenshot of the PostgreSQL result)
Is there any way we can tell PostgreSQL to ignore the trailing spaces of a VARCHAR column?
Postgres doesn't follow the SQL standard, which requires the shorter string to be padded when comparing VARCHAR or TEXT strings; it only pads CHAR strings. Therefore you can cast both sides to the blank-padded type to simulate the Db2 behaviour: ...WHERE SERV_TYPE::bpchar = 'ST001'::bpchar. (Use bpchar, the blank-padded type without a length; a bare ::char means char(1) and would silently truncate both sides to one character.) Note though that this will preclude the use of an index on SERV_TYPE, same as when using trim(SERV_TYPE).
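A quick way to see the difference in psql:
select 'ST001 '::varchar = 'ST001'::varchar;  -- false: varchar/text comparisons are not padded
select 'ST001 '::bpchar  = 'ST001'::bpchar;   -- true: bpchar comparison ignores trailing blanks
select 'ST001 '::char    = 'ST001'::char;     -- true, but only because both sides were truncated to 'S'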
I have a database with one column of the type nvarchar. If I write
INSERT INTO table VALUES ("玄真")
It shows ¿¿ in the table. What should I do?
I'm using SQL Developer.
Use single quotes, rather than double quotes, to create a text literal; for an NVARCHAR2/NCHAR text literal you also need to prefix it with N.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( value NVARCHAR2(20) );
INSERT INTO table_name VALUES (N'玄真');
Query 1:
SELECT * FROM table_name
Results:
| VALUE |
|-------|
| 玄真 |
First, using NVARCHAR2 might not even be necessary.
The 'N' character data types are for storing data that doesn't fit in the database's defined character set. They use an auxiliary character set, defined as the NCHAR character set. It's kind of a band-aid: once you create a database, it can be difficult to change its character set. The moral of this story: take great care in choosing the character set when creating your database, and do not just accept the defaults.
Here's a scenario (LiveSQL) where we're storing a Chinese string in both NVARCHAR2 and VARCHAR2.
CREATE TABLE SO_CHINESE ( value1 NVARCHAR2(20), value2 varchar2(20 char));
INSERT INTO SO_CHINESE VALUES (N'玄真', '我很高興谷歌翻譯。' )
select * from SO_CHINESE;
Note that both character sets are in the Unicode family. Note also that I told my VARCHAR2 column to hold 20 characters (20 char). That's because some characters may require up to 4 bytes of storage; with the default byte semantics, a definition of (20) would leave room for as few as 5 of those characters.
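To make the byte-vs-character semantics concrete, a small sketch (the table name is illustrative):
CREATE TABLE semantics_demo (
  by_bytes VARCHAR2(20),        -- 20 bytes: multi-byte characters eat into this quickly
  by_chars VARCHAR2(20 CHAR)    -- 20 characters, whatever their byte width
);
-- 9 Chinese characters take about 27 bytes in AL32UTF8:
INSERT INTO semantics_demo (by_chars) VALUES ('我很高興谷歌翻譯。');  -- fits: 9 <= 20 characters
INSERT INTO semantics_demo (by_bytes) VALUES ('我很高興谷歌翻譯。');  -- fails with ORA-12899: 27 bytes > 20 bytes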
Let's look at the same scenario using SQL Developer and my local database.
And to confirm the character sets:
SQL> clear screen
SQL> set echo on
SQL> set sqlformat ansiconsole
SQL> select *
2 from database_properties
3 where PROPERTY_NAME in
4 ('NLS_CHARACTERSET',
5 'NLS_NCHAR_CHARACTERSET');
PROPERTY_NAME              PROPERTY_VALUE   DESCRIPTION
NLS_NCHAR_CHARACTERSET     AL16UTF16        NCHAR Character set
NLS_CHARACTERSET           AL32UTF8         Character set
First of all, you should establish a Chinese-capable character encoding/collation on your database, for example:
UTF-8, Chinese_Hong_Kong_Stroke_90_BIN, Chinese_PRC_90_BIN, Chinese_Simplified_Pinyin_100_BIN, ...
I'll show an example with SQL Server 2008 (Management Studio), which includes all of these collations; you can find equivalent character encodings in other databases (MySQL, SQLite, MongoDB, MariaDB, ...).
Create the database with Chinese_PRC_90_BIN (you can choose another collation):
Select a Page (left header) > Options > Collation > choose the collation
Create a table with the same collation:
Execute the insert statement:
INSERT INTO ChineseTable VALUES ('玄真');
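As an alternative (my addition, not part of the answer above): on SQL Server you can avoid depending on the database collation for storage by using NVARCHAR with an N'' literal, which stores the text as UTF-16 regardless of collation. The table name here is illustrative:
CREATE TABLE ChineseTableN (value NVARCHAR(20));
INSERT INTO ChineseTableN VALUES (N'玄真');  -- the N prefix keeps the literal Unicode end-to-end
SELECT value FROM ChineseTableN;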
Server: IBM i Series AS/400 running DB2
Client: Linux using unixodbc
I have a table in a DB2 database with a column of data using CCSID 836 (Simplified Chinese EBCDIC). I want to get results in UTF-16 so they work on other systems, but I'm having a hard time finding the right way to convert.
When I try:
SELECT CAST(MYCOLNAME AS VARCHAR(100) CCSID 13491) FROM MY.TABLE
I get the error:
SQL State: 22522
Vendor Code: -189
Message: [SQL0189] Coded Character Set Identifier 13491 not valid. Cause: Coded Character Set Identifier (CCSID) 13491 is not valid for one of the following reasons: -- The CCSID is not EBCDIC. -- The CCSID is not supported by the system. -- The CCSID is not valid for the data type. -- If the CCSID is specified for graphic data, then the CCSID must be a DBCS CCSID. -- If the CCSID is specified for UCS-2 or UTF-16 data, then the CCSID must be a UCS-2 or UTF-16 CCSID. -- If the CCSID is specified for XML data, then the CCSID must be SBCS or Unicode. It must not be DBCS or 65535.
How can I convert the data from CCSID 836 into UTF-16? I've been equally unsuccessful with UNICODE_STR().
I can't explain why, but here's what works:
SELECT CAST(MYCOLNAME AS VARCHAR(100) CCSID 935) FROM MY.TABLE
The native CCSID for the column in question is 836, which seems very similar to 935, so I don't understand the difference. But 935 works for me.
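A likely explanation (my addition, hedged): CCSID 836 is the single-byte component of the Simplified Chinese EBCDIC family, and CCSID 935 is the mixed single-/double-byte CCSID built from SBCS 836 plus DBCS 837, which would be why the system accepts 935 where 836 alone behaves oddly. To actually reach UTF-16, casting the 935 result to a graphic type with CCSID 1200 (UTF-16 on IBM i) should work; this is an unverified sketch, not something tested on your system:
-- unverified: convert via the mixed CCSID, then to UTF-16 graphic data
SELECT CAST(CAST(MYCOLNAME AS VARCHAR(100) CCSID 935) AS VARGRAPHIC(100) CCSID 1200)
FROM MY.TABLE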
Good day,
I have a Sybase ASE 12.5 database on a Windows NT server.
I need to know the character set of some Arabic data stored in the database.
I checked the database default character set: it is "cp850".
But the stored data are Arabic, so they were stored using another character set.
I tried checking the master..syscharsets table; I can't find any popular Arabic charsets in it.
Command: select id, csid, name, description from master..syscharsets
Result: http://dc414.2shared.com/download/CCfkf_RW/syscharsets_cropped.jpg?tsid=20140507-130321-3ade23f2
Any ideas how to determine the character set of the data?
I think it uses cp850 (multilingual). Try sp_configure "enable unicode conversions" on the server, and also try sp_help tableName on the table in question.
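For reference, here is how those checks would be run from isql, plus sp_helpsort, which reports the server's default character set and sort order (the table name is a placeholder):
sp_configure "enable unicode conversions"
go
sp_help arabic_data_table
go
sp_helpsort
go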