Scala Slick, Trouble with inserting Unicode into database

When I create a new row whose columns may contain Unicode, the columns that do contain Unicode are corrupted.
However, if I insert that data directly using the mysql CLI, Slick retrieves the Unicode data fine.
Is there anything I should add to my table class to tell Slick that a column may hold a Unicode string?

I found the problem: I have to set the character encoding for the connection.
db.default.url="jdbc:mysql://localhost/your_db_name?characterEncoding=UTF-8"
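To see why the connection encoding matters, here is a minimal Python sketch of the kind of corruption an encoding mismatch produces. It is only an illustration: it does not use Slick or the MySQL driver, and the choice of Latin-1 as the mismatched charset is an assumption, since the thread does not say what the connection defaulted to.

```python
# Simulate a client and server that disagree about the connection charset:
# the text is sent as UTF-8 bytes but decoded as Latin-1 on the other side.
original = "Zürich café"

utf8_bytes = original.encode("utf-8")

# Mismatch: every UTF-8 byte is read as one Latin-1 character (mojibake).
corrupted = utf8_bytes.decode("latin-1")

# Agreement: both sides use UTF-8, which is what characterEncoding=UTF-8 asks for.
round_tripped = utf8_bytes.decode("utf-8")

print(corrupted)
print(round_tripped)
```

Data inserted through a client that already uses the right charset is stored correctly, which would be consistent with the mysql CLI rows reading back fine.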

You probably need to configure that on the db schema side by setting the right collation.

Related

UTF-8 table with lowercase and uppercase and relation between them

I'm looking for a UTF-8 table / tables etc. with all the lowercase (small) and uppercase (capital) characters in (hexa-)decimal form and with a relation between the elements.
So far I found:
https://www.fileformat.info/info/unicode/category/Ll/list.htm
https://www.fileformat.info/info/unicode/category/Lu/list.htm
These are nice lists, though:
there is no relation between the two tables (I could create one with some scripting that builds a new table)
the values are Unicode code points and not UTF-8 (hexa-)decimal values.
Any suggestions?

My advice is to use an existing library that can do case conversion for you. You mentioned you were targeting C++, so the ICU library is one option.
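If a generated table is still wanted, a library's Unicode data can produce it. The sketch below swaps in Python's built-in Unicode tables instead of ICU, purely for illustration. The helper names `case_pairs` and `utf8_hex` and the `limit` cut-off are hypothetical, and it only keeps letters whose uppercase form is a single, different code point.

```python
import unicodedata

def case_pairs(limit=0x250):
    """Yield (lower, upper) for each lowercase letter below `limit`
    whose uppercase form is a single, different code point."""
    for cp in range(limit):
        ch = chr(cp)
        if unicodedata.category(ch) != "Ll":
            continue
        up = ch.upper()
        if len(up) == 1 and up != ch:
            yield ch, up

def utf8_hex(ch):
    """Return the UTF-8 encoding of a character as a hex string."""
    return ch.encode("utf-8").hex()

# Relation between the two lists, keyed by UTF-8 hex of the lowercase letter.
table = {utf8_hex(lo): utf8_hex(up) for lo, up in case_pairs()}

for lo, up in list(case_pairs())[:5]:
    print(f"U+{ord(lo):04X} {utf8_hex(lo)} -> U+{ord(up):04X} {utf8_hex(up)}")
```

Letters such as ß, whose uppercase form is more than one character, are skipped here, which is one reason a plain one-to-one table cannot cover every case.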

How to return strings that are trimmed automatically by ServiceStack.OrmLite.PostgreSQL?

I've noticed that when my entities are returned, the string values are padded to the number of characters in the field definition in the database. Is there a setting to control this behavior?
Thank you,
Stephen
PostgreSQL 9.3
ServiceStack.OrmLite.PostgreSQL.4.0.30

PostgreSQL only pads strings for fixed-size CHAR(N) columns. If you don't want this behavior, use VARCHAR(N) columns instead, the recommended default for strings.
OrmLite also uses VARCHAR for strings when it's used to generate your table schemas, e.g.:
db.CreateTable<Table>();
If dealing with legacy DBs, you can get OrmLite to automatically trim padded strings returned from the RDBMS with:
OrmLiteConfig.StringFilter = s => s.TrimEnd();
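The effect of such a filter can be sketched outside OrmLite. The Python below is a hypothetical stand-in, not OrmLite code: `string_filter` plays the role of the configured filter, and the padded row imitates what a CHAR(10) column would return.

```python
def string_filter(value):
    """Strip trailing whitespace, similar in spirit to s.TrimEnd()."""
    return value.rstrip()

# A row as it might come back from a CHAR(10) column: padded with spaces.
padded_row = {"code": "AB".ljust(10), "name": "Stephen"}

# Apply the filter to every string value in the row.
trimmed = {key: string_filter(val) for key, val in padded_row.items()}

print(repr(padded_row["code"]))
print(repr(trimmed["code"]))
```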

SSIS Convert Between Unicode and Non-Unicode Error

I have an SSIS package where I am using an OLE DB source linked to a SQL Server 2005 table. All columns except a date column are NVARCHAR(255). I am using an Excel destination and a SQL statement to create the sheet in the Excel workbook. The SQL is in the Excel connection manager (effectively a CREATE TABLE statement that creates a sheet) and is derived from the mapping of the columns from the DB.
No matter what I do, I keep getting this Unicode to non-Unicode conversion error between my source and destination. I tried a conversion to string [DT_STR] between source and destination, removed it, changed the SQL table from VARCHAR to NVARCHAR, and still get this flippin' error.
Because I am creating the sheet in Excel with a SQL statement, I do not see any way to pre-define the data types of the columns in the Excel sheet. I imagine it would use default metadata, but I do not know.
So between my SQL table and the creation of my Excel sheet with this SSIS SQL statement, how can I stop this error from coming up?
My error is:
Error at Data Flow Task [OLE DB Source [1]]: Column "MyColumn" cannot convert between unicode and non-unicode string data types.
And for all nvarchar columns.
Appreciate any help
Thanks
Andrew

The steps below worked for me:
Right-click the source task.
Click "Show Advanced Editor".
Go to the "Input and Output Properties" tab.
Select the output column for which you are getting the error.
Its data type will be "String [DT_STR]".
Change that data type to "Unicode String [DT_WSTR]".
Save and close.

Add Data Conversion transformations to convert string columns from non-Unicode (DT_STR) to Unicode (DT_WSTR) strings.
You need to do this for all the string columns...

The missing piece here is the Data Conversion object. It should sit between the OLE DB Source and the Destination object.

First, add a Data Conversion block to your data flow diagram.
Open the Data Conversion block and tick the column for which the error is showing. Below, change its data type to Unicode string (DT_WSTR), or whatever data type is expected, and save.
Go to the destination block, open Mappings, map the newly created column to its corresponding destination column, and save.
Right-click your project in Solution Explorer and select Properties. Select Configuration Properties, then Debugging. There, set the Run64BitRuntime option to false (Excel does not handle 64-bit applications very well).

Instead of adding the Data Conversion suggested earlier, you can cast the nvarchar column to a varchar column. This avoids an unnecessary step and performs better than the alternative.
In the SELECT of your SQL statement, replace date with CAST(date AS varchar([size])). For some reason this does not yet change the output data type. To change it, do the following:
Right click your OLE DB Source step and open the advanced editor.
Go to Input and Output Properties
Select Output Columns
Select your column
Under Data Type Properties change DataType to string [DT_STR]
Change Length to the length you specified in your CAST statement
After doing this your source data will be output as a varchar and your error will disappear.

I have been having the same issue and tried everything written here but it was still giving me the same error.
Turned out to be NULL value in the column which I was trying to convert.
Removing the NULL value solved my issue.
Cheers,
Ahmed

No one seems to mention this, but converting varchar to nvarchar in the source query also solves the issue.

With the above example I kept losing the values. I think that delaying the validation allows the new data types to be saved as part of the metadata.
On the connection manager for 'Excel Connection Manager', set Delay Validation to False in the Properties.
Then, on the data flow destination task for Excel, set ValidateExternalMetadata to False, again in the Properties.
This will now allow you to right-click the Excel Destination task and go to Advanced Editor for Excel Destination --> far right tab - Input and Output Properties. In the External Columns folder you will now be able to change the data types and length values of the problematic columns, and the changes can be saved.
Good Luck!

I experienced this condition when I had the 32-bit Oracle 12 client installed, connected to an Oracle 12 server running on Windows.
Although neither the Oracle source nor the SQL Server destination is Unicode, I kept getting this message, as if the Oracle columns were Unicode.
I solved the problem by inserting a Data Conversion box and
selecting type DT_STR (non-Unicode) for varchar2 fields and DT_WSTR (Unicode) for numeric fields. Then I dropped the 'COPY OF' from the output field name.
Note that I kept getting the error because I had connected the source box arrow to the conversion box BEFORE setting the conversion types. So I had to switch the source box, and this cleared all the errors in the destination box.

When creating the table in SQL Server, make your table columns NVARCHAR instead of VARCHAR.

I think people are missing this. In my case I had 100 character columns to convert between Oracle and MS SQL. All this stuff about Data Conversion and the Advanced Editor is incredibly tedious if you have 100 separate character columns to assign. Plus, SSIS being SSIS, it will sometimes reset all 100 of your Advanced Editor changes even if you set ValidateExternalMetadata to false, which is incredibly obnoxious. I wouldn't mind doing the Data Conversion if there were some value to it, but 20 years ago ETL tools used to take Oracle character columns to MS SQL character columns without fussing. What Bakalolo and Zafer say is the answer if you have a lot of character columns and can live with nvarchar: just declare all your output MS SQL columns as nvarchar, and your data task will automatically assign your Oracle fields to MS SQL fields with no manual overrides. I have also found that the new Oracle Source (2021) doesn't complain about a Unicode conversion to varchar in MS SQL. A colleague just told me that the SSIS wizard (it may be only in VS 2019+) for assigning Oracle character columns to MS SQL varchar will do the assignments automatically with no override, but I haven't tried that personally.
2022 update: I think this applies only to packages created in VS 2019 and later. An ADO.NET task reading a varchar MS SQL table and writing to an OLE DB (and, I think, ADO.NET) MS SQL varchar destination will throw the Unicode error. If you switch the input task to OLE DB reading the MS SQL varchar table, you won't have to do the Advanced Editor overrides for the varchar fields. If you don't want to do Advanced Editor overrides (who does?), try different tasks and more OLE DB tasks.

I just encountered the same issue and solved it in my SQL query by using CONVERT directly:
CONVERT(NVARCHAR(50),'') AS MyVarName
I needed to put an empty (or fixed-size) string into the Excel file. The conversion forces the type of MyVarName from DT_STR to DT_WSTR (Unicode).

I know this is a very old post, but I ran into the same issue and found that I had to manually select the conversion component's output alias as the mapping in the Excel destination component. Since the names of the OLE DB Source columns match the Excel column names, it was mapping to the OLE DB columns and not to the output alias. For example, the SourceID column from the OLE DB component is named Copy of SourceID after conversion. I don't see the original question saying they specifically selected the new alias name, just that they mapped to DB columns. @Serge Voloshenko's post comes the closest but also does not mention making sure the mapping happens. A new SSIS user might overlook this.

Storing Unicode characters in a PostgreSQL 8.4 table

I want to store Unicode characters in one of the columns of a PostgreSQL 8.4 database table. I want to store non-English data, for example Indic language text. I have achieved the same in Oracle XE by converting the text into Unicode and storing it in the table using the nvarchar2 column data type.
In the same way, I want to store Unicode characters of Indic languages (say Tamil, Hindi) in one of the columns of a table. How can I achieve that, and what data type should I use?
Please guide me, thanks in advance

Just make sure the database is initialized with encoding utf8. This applies to the whole database for 8.4; later versions are more sophisticated. You might want to check the locale settings too. See the manual for details, particularly around matching with LIKE and text pattern ops.
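As a rough illustration of why the database encoding matters, the Python sketch below checks which encodings can represent Tamil and Hindi text at all. It does not talk to PostgreSQL; the `fits` helper and the sample words are assumptions made for the example.

```python
def fits(text, encoding):
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

samples = {"Tamil": "தமிழ்", "Hindi": "हिन्दी"}

for name, text in samples.items():
    encoded = text.encode("utf-8")
    # UTF-8 round-trips the text; a single-byte encoding cannot hold it.
    print(name, len(text), "code points,", len(encoded), "bytes in UTF-8,",
          "latin-1 ok:", fits(text, "latin-1"))
```

Each code point in these scripts takes three bytes in UTF-8, which is worth remembering when comparing character counts with stored byte sizes.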

Postgresql: auto lowercase text while (or before) inserting to a column

I want to achieve case insensitive uniqueness in a varchar column. But, there is no case insensitive text data type in Postgres. Since original case of text is not important, it will be a good idea to convert all to lowercase/uppercase before inserting in a column with UNIQUE constraint. Also, it will require one INDEX for quick search.
Is there any way in Postgres to manipulate data before insertion?
I looked at this other question: How to automatically convert a MySQL column to lowercase.
It suggests using triggers on insert/update to lowercase text or to use views with lowercased text. But, none of the suggested methods ensure uniqueness.
Also, since this data will be read/written by various applications, lowercasing data in every individual application is not a good idea.

ALTER TABLE your_table
ADD CONSTRAINT your_table_the_column_lowercase_ck
CHECK (the_column = lower(the_column));
From the manual:
The use of indexes to enforce unique constraints could be considered
an implementation detail that should not be accessed directly.
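A quick way to try the idea is sketched below, using Python's sqlite3 module as a stand-in for PostgreSQL (an assumption for the sake of a self-contained example; the syntax and error types differ in Postgres). The UNIQUE constraint on the column and the `try_insert` helper are added here for illustration. Note that SQLite's built-in lower() only folds ASCII letters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The CHECK rejects any value that is not already lowercase;
# the UNIQUE constraint then only ever sees lowercase values.
conn.execute("""
    CREATE TABLE your_table (
        the_column TEXT UNIQUE,
        CONSTRAINT your_table_the_column_lowercase_ck
            CHECK (the_column = lower(the_column))
    )
""")

def try_insert(value):
    """Insert a value and report whether a constraint rejected it."""
    try:
        conn.execute("INSERT INTO your_table VALUES (?)", (value,))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = [try_insert(v) for v in ("alice", "Alice", "alice", "bob")]
print(results)
```

With this approach the applications still have to lowercase the value themselves; the constraint only guarantees that mixed-case data never gets in.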

You don't need a case-insensitive data type (although there is one)
CREATE UNIQUE INDEX idx_lower_unique
ON your_table (lower(the_column));
That way you don't even have to mess around with the original data.
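The behavior of such an expression index can be tried out with the sketch below. It uses Python's sqlite3 module as a stand-in for PostgreSQL (an assumption to keep the example self-contained; SQLite supports indexes on expressions from version 3.9, and its lower() only folds ASCII letters).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (the_column TEXT)")

# Uniqueness is enforced on the lowercased value, not on the stored text.
conn.execute(
    "CREATE UNIQUE INDEX idx_lower_unique ON your_table (lower(the_column))"
)

conn.execute("INSERT INTO your_table VALUES ('Alice')")

try:
    conn.execute("INSERT INTO your_table VALUES ('ALICE')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The original casing of the first row is preserved.
stored = [row[0] for row in conn.execute("SELECT the_column FROM your_table")]
print(duplicate_rejected, stored)
```

Queries that filter on lower(the_column) can typically use the same index, so no second index is needed for case-insensitive lookups.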
