Teradata replacing hyphens with boxes in SQL Assistant?

I have been trying to load data from Teradata tables into Excel files, but the problem I'm encountering is that special characters get replaced by boxes. As can be seen in the image below, some hyphens and Turkish characters are replaced by boxes.
[Image: special characters replaced by boxes in Teradata SQL Assistant]
The connection uses the ASCII character set by default, and I tried the code below to translate the string to Unicode (since I eventually want UTF-16 encoding):
TRANSLATE(_UNICODE "Column Name" USING UNICODE_TO_UNICODE_NFKD WITH ERROR)
I am not quite sure what exactly the problem is here. What could I possibly be doing wrong?
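For what it's worth, a server-side cast alone usually cannot fix this: with an ASCII session character set, the connection itself has no way to carry Turkish characters back to the client, regardless of what the query returns. A minimal sketch of the usual two-part approach, assuming the column is stored as LATIN (the database and table names below are hypothetical), is to switch the Session Character Set of the connection (in the ODBC DSN or SQL Assistant connection settings) to UTF8 or UTF16, and then cast on the server if needed:

-- WITH ERROR substitutes an error character for untranslatable bytes instead of failing
SELECT TRANSLATE("Column Name" USING LATIN_TO_UNICODE WITH ERROR) AS unicode_col
FROM MyDatabase.MyTable;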

Related

Convert Emoji UTF8 to Unicode in Powershell

I need to persist data in a database with PowerShell. Occasionally the data contains an emoji.
DB: On the DB side everything should be fine. The columns are set to NVARCHAR, which makes it possible to persist emojis. When I inserted the data manually, the emojis displayed correctly when I queried them (🤝💰).
I tested it with example data in SSMS and it worked perfectly.
PowerShell: When preparing the SQL statement in PowerShell I noticed that the emojis are interpreted as UTF-8 (ðŸ¤ðŸ’°). Basically gibberish.
Is a conversion from UTF-8 to Unicode even necessary? How can I persist the emojis as 🤝💰 and not as ðŸ¤ðŸ’° / 1f600?
My colleague had the correct answer to this problem.
To persist emojis in an MS SQL database you need to declare the column as nvarchar(max) (max is not strictly necessary), which I already did.
I tried to persist example data which I had hardcoded in my PS script like this:
@{ description = "Example Description😊😊" }
Apparently VS Code adds some kind of encoding on top of the data (our guess).
What basically solved the issue was simply requesting the data from the API and persisting it into the database with a string literal prefixed with N, matching the nvarchar(max) column datatype.
Example:
SET @displayNameFormatted = N'"+$displayName+"'
And then include that variable in my insert statement.
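For illustration, the difference the N prefix makes can be reduced to a minimal T-SQL sketch (the table and column names are hypothetical):

-- Without N the literal is VARCHAR, so the emojis are lost (stored as question marks)
INSERT INTO dbo.Example (description) VALUES ('Example Description 🤝💰');
-- With N the literal is NVARCHAR (UTF-16) and the emojis round-trip intact
INSERT INTO dbo.Example (description) VALUES (N'Example Description 🤝💰');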
Does this answer your question? Add emoji / emoticon to SQL Server table ("Use NVARCHAR(size) datatype and prefix string literal with N").
One emoji in PowerShell is two UTF-16 surrogate characters, since its code point is too high for 16 bits. See Surrogates and Supplementary Characters:
'🤝'.Length
2
Microsoft has made "Unicode" a confusing term, since that's what they call the UTF-16 LE encoding.
PowerShell 5.1 doesn't automatically recognize UTF-8-encoded scripts that have no BOM.
We don't know what your command line actually is, but see also: Unicode support for Invoke-Sqlcmd in PowerShell

PostgreSQL Escape Microsoft Special Characters In Select Query

PostgreSQL, DBvisualizer and Salesforce
I'm selecting records from a database table and exporting them to a CSV file: comma-separated and UTF-8 encoded. I send the file to a user who uploads the data into Salesforce. I do not know Salesforce, so I'm totally ignorant on that side of this. She reports that some data in the file shows up as gibberish (non-UTF-8) characters (see below).
It seems that some of our users are copy/pasting emails into a web form, which then inserts them into our db. Dates from the email headers (I believe) are the text that shows up as gibberish.
11‎/‎17‎/‎2015‎ ‎7‎:‎26‎:‎26‎ ‎AM
becomes
‎11‎/‎17‎/‎2015‎ ‎7‎:‎26‎:‎26‎ ‎AM
The text in the db field looks normal; it's only when it is exported to a CSV file and that file is viewed in a text editor like WordPad, or in Salesforce, that she sees the odd characters.
This only happens with dates from the text that is copy/pasted into the form/db. I have no idea how, or whether there is a way, to remove these "unseen" characters.
It's the same three characters each time: ‎. I did a regexp_replace() on these to strip them out, but it doesn't work. I think that, since they are not visible in the db field, the regex does not see them.
It seems that even though I cannot see these characters, they must be there in some form that makes them show up in text editors like WordPad, or in the Salesforce client, after the export to CSV.
I can probably do a mass search/find/replace in the text editor, but it would be nice to do this in the SQL and avoid the extra step each time.
Hoping someone has seen this and knows an easy fix.
Thanks for any ideas or pointers that may help.
The sequence ‎ is a left-to-right mark, encoded in UTF-8 (as 0xE2 0x80 0x8E) but read as if it were Windows-1252.
A left-to-right mark is invisible, so the fact that you can't see it in the database suggests that it's encoded correctly, but without knowing precisely what path the data took after that, it's hard to guess exactly where it was misinterpreted.
In any case, you should be able to replace the character in your Postgres query by using its Unicode escape sequence: E'\u200E'
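A minimal sketch of such a cleanup query, assuming the affected column is a hypothetical note column in a hypothetical form_submissions table:

-- Strip every left-to-right mark (U+200E) before exporting to CSV
SELECT replace(note, E'\u200E', '') AS note_clean
FROM form_submissions;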

Unicode characters in ColdFusion CLIENT scope

Is it possible to store Unicode (UTF-8) characters in ColdFusion's CLIENT scope? (CF9)
If I set a CLIENT scope variable and dump it immediately, it looks fine. But on the next page load (i.e., when the CLIENT scope is read back from storage) I just see question marks in place of the Unicode characters.
I'm using a database for persistence, and the data column in the CDATA table has been set to ntext.
Looking directly in the database I can see that the records have not been written correctly (again, just question marks showing for the Unicode characters).
(From the comments)
Have you checked/enabled the "String Format -- Enable High ASCII characters and Unicode ..." option in your client datasource?
From the docs:
Enable this option if your application uses Unicode data in
DBMS-specific Unicode data types, such as National Character or nchar.

Reading special characters from FoxPro using OLEDB

I'm using the FoxPro OLE DB provider (VFPOLEDB.1) to connect to a DBF using ADO.NET. The problem I am having is that some characters don't come across correctly; for example, the '²' character comes out as '_'.
I have tried issuing the SET ANSI OFF command, to no avail.
I have found that the DBF uses code page 850.
Does anyone know what is going on?
FoxPro doesn't support Unicode, if that is what you appear to be getting; it only works with single-byte (0-255) character sets. Code page 850 is, I believe, the MS-DOS Latin-1 code page. There is a CPCONVERT() function for code-page conversion, but I don't know whether it is usable through the OLE DB provider.
It turns out that I had to add CodePage=850 to the connection string so that it matched the DBF's code page.
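For illustration, the resulting connection string would look something like this (the data path is hypothetical):

Provider=VFPOLEDB.1;Data Source=C:\Data\MyTables;CodePage=850;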

Toad unicode input problem

In Toad, I can see Unicode characters that are coming from the Oracle db. But when I click one of the fields in the data grid to enter edit mode, the Unicode characters are converted to meaningless symbols; this, however, is not the big issue.
While editing the field, the Unicode characters are displayed correctly as I type. But as soon as I press Enter and exit edit mode, they are converted to the nearest (most similar) non-Unicode character. So I cannot type Unicode characters in data grids. Copying and pasting one of the Unicode characters does not work either.
How can I solve this?
Edit: I am using Toad 9.0.0.160.
We never found a solution to the same problems with Toad. In the end most people used Enterprise Manager to get around the issues. Sorry I couldn't be more help.
Quest officially states that they do not currently fully support Unicode, but they promise a full Unicode version of Toad in 2009: http://www.quest.com/public-sector/UTF8-for-Toad-for-Oracle.aspx
An excerpt from the known issues with Toad 9.6:
Toad's data layer does not support UTF8 / Unicode data. Most non-ASCII characters will display as question marks in the data grid and should not produce any conversion errors except in Toad Reports. Toad Reports will produce errors and will not run on UTF8 / Unicode databases. It is therefore not advisable to edit non-ASCII Unicode data in Toad's data grids. Also, some users are still receiving "ORA-01026: multiple buffers of size > 4000 in the bind list" messages, which also seem to be related to Unicode data.