Convert Emoji UTF-8 to Unicode in PowerShell

I need to persist data in a database with PowerShell. Occasionally the data contains an emoji.
DB: On the DB side everything should be fine. The columns are set to NVARCHAR, which makes it possible to persist emojis. When I inserted the data manually, the emojis displayed correctly when I queried them (🤝💰).
I tested it with example data in SSMS and it worked perfectly.
PowerShell: When preparing the SQL statement in PowerShell I noticed that the emojis come out as UTF-8 bytes decoded in the wrong code page (ðŸ¤ðŸ’°). Basically gibberish.
Is a conversion from UTF-8 to Unicode even necessary? How can I persist the emojis as 🤝💰 and not as ðŸ¤ðŸ’° / 1f600?

My colleague had the correct answer to this problem.
To persist emojis in an MS SQL database you need to declare the column as nvarchar(max) (max is not strictly necessary), which I had already done.
I had tried to persist example data hardcoded in my PS script like this:
@{ description = "Example Description😊😊" }
Apparently VS Code adds some kind of encoding on top of the data (our guess).
What actually solved the issue was requesting the data from the API and persisting it into the database with the string literal prefixed with N, in combination with the nvarchar(max) column datatype.
Example:
SET @displayNameFormatted = N'" + $displayName + "'
And then I included that variable in my insert statement.
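As a follow-up: building the SQL by string concatenation also invites quoting and injection problems. Below is a minimal sketch of the same insert with a parameterized command via System.Data.SqlClient; the connection string, the dbo.Profiles table, and the Description column are placeholders, not from the original post. An NVarChar parameter is sent to SQL Server as UTF-16, so no N prefix or conversion is needed:
# Sketch only: connection string, table, and column names are placeholders.
$connStr = "Server=.;Database=MyDb;Integrated Security=True"
$conn = New-Object System.Data.SqlClient.SqlConnection $connStr
$cmd = $conn.CreateCommand()
$cmd.CommandText = "INSERT INTO dbo.Profiles (Description) VALUES (@description)"
# Size -1 maps to nvarchar(max); the value travels as UTF-16, emojis intact.
$null = $cmd.Parameters.Add("@description", [System.Data.SqlDbType]::NVarChar, -1)
$cmd.Parameters["@description"].Value = "Example Description😊😊"
$conn.Open()
$null = $cmd.ExecuteNonQuery()
$conn.Close()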

Does this answer your question? Add emoji / emoticon to SQL Server table ("Use NVARCHAR(size) datatype and prefix string literal with N").
One emoji in PowerShell is two UTF-16 surrogate characters, since its code point is too high to fit in 16 bits. See Surrogates and Supplementary Characters.
'🤝'.length
2
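A short sketch that makes the surrogate pair itself visible, using stock .NET methods (the specific numbers hold for U+1F91D, the handshake emoji):
[int]'🤝'[0]                       # 55358 (0xD83E): the high surrogate
[int]'🤝'[1]                       # 56605 (0xDD1D): the low surrogate
[char]::ConvertToUtf32('🤝', 0)    # 129309 (0x1F91D): the actual code point
[char]::ConvertFromUtf32(0x1F91D)  # "🤝" again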
Microsoft has made "Unicode" a confusing term, since that is what they call the UTF-16 LE encoding.
PowerShell 5.1 doesn't automatically recognize scripts saved as UTF-8 without a BOM; it falls back to the system ANSI code page, which mangles emoji literals.
We don't know what your command line actually is, but see also: Unicode support for Invoke-Sqlcmd in PowerShell
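If a BOM-less script is the cause, one hedged fix is to re-save it with a BOM; in Windows PowerShell 5.1, Set-Content -Encoding UTF8 writes one (the script path here is a placeholder):
$code = Get-Content -Raw -Encoding UTF8 -Path .\script.ps1   # read the file explicitly as UTF-8
Set-Content -Path .\script.ps1 -Value $code -Encoding UTF8   # UTF8 means UTF-8 with BOM in 5.1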

Related

Teradata replacing hyphens with boxes in SQL Assistant?

I have been trying to load data from Teradata tables into Excel files, but the problem I'm encountering is that special characters get replaced by boxes. Some hyphens and Turkish characters are affected (image: special characters replaced by boxes in Teradata SQL Assistant).
The connection uses the ASCII character set by default, and I tried the code below to translate the string to Unicode (since I eventually want to use UTF-16 encoding):
TRANSLATE(_UNICODE "Column Name" USING UNICODE_TO_UNICODE_NFKD WITH ERROR)
I am not quite sure what exactly the problem is here. What could I possibly be doing wrong?

PostgreSQL Escape Microsoft Special Characters In Select Query

PostgreSQL, DBvisualizer and Salesforce
I'm selecting records from a database table and exporting them to a CSV file: comma-separated and UTF-8 encoded. I send the file to a user who uploads the data into Salesforce. I do not know Salesforce, so I'm totally ignorant on that side of this. She is reporting that some data in the file shows up as gibberish (non-UTF-8) characters (see below).
It seems that some of our users are copy/pasting emails into a web form, which then inserts them into our db. Dates from the email headers (I believe) are the text that shows up as gibberish.
11‎/‎17‎/‎2015‎ ‎7‎:‎26‎:‎26‎ ‎AM
becomes
â€Ž11â€Ž/â€Ž16â€Ž/â€Ž2015â€Ž â€Ž07â€Ž:â€Ž26â€Ž:â€Ž26â€Ž â€ŽAM
The text in the db field looks normal. It's only when it is exported to a CSV file and that file is viewed in a text editor like WordPad, or in Salesforce, that she sees the odd characters.
This only happens with dates from the text that is copy/pasted into the form/db. I have no idea how, or whether there is a way, to remove these "unseen" characters.
It's the same three characters each time: â€Ž. I did a regexp_replace() on these to strip them out, but it doesn't work; I think that since they are not visible in the db field, the regex doesn't see them.
It seems that even though I cannot see these characters, they must be there in some form that makes them show in text editors like WordPad, or in the Salesforce client, after being exported to CSV.
I can probably do a mass search/find/replace in the text editor, but it would be nice to do this in the sql and avoid the extra step each time.
Hoping someone has seen this and knows an easy fix.
Thanks for any ideas or pointers that may help.
The sequence â€Ž is a left-to-right mark, encoded in UTF-8 (as 0xE2 0x80 0x8E) but being read as if it were Windows-1252.
A left-to-right mark is invisible, so the fact that you can't see it in the database suggests that it's encoded correctly, but without knowing precisely what path the data took after that, it's hard to guess exactly where it was misinterpreted.
In any case, you should be able to replace the character in your Postgres query by using its Unicode escape sequence: E'\u200E'
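A hedged sketch of that replacement, run through psql from PowerShell (the database, table, and column names are placeholders):
# Strip the left-to-right mark in the export query itself; E'\u200E' is expanded server-side.
& psql -d mydb -c "SELECT replace(comments, E'\u200E', '') AS comments FROM billing;"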

Hungarian characters in Firebird database

I cannot seem to get Hungarian accented characters to store properly in my Firebird database despite using ISO8859_2 character set and ISO_HUN collation.
This string for example:
Magyar Képzőművészeti Egyetem, Festő szak, mester: Klimó Károly
gets displayed as
Magyar Képzomuvészeti Egyetem, Festo szak, mester: Klimo Karoly
What am I doing wrong?
Your string is UTF-8 encoded. It works fine with IBExpert and a UTF8 database. Make sure that you're using the correct character set everywhere: the DB connection, the DB column, and the string itself.
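As a hedged illustration of the connection-side character set, Firebird's isql lets you state it explicitly (the server, path, and credentials here are placeholders):
# Force the client connection character set to UTF8.
& isql -ch UTF8 -user SYSDBA -password masterkey "localhost:C:\data\mydb.fdb"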

Unicode characters in ColdFusion CLIENT scope

Is it possible to store unicode (utf-8) characters in ColdFusion's CLIENT scope? (CF9)
If I set a CLIENT scope variable and dump it immediately, it looks fine. But on the next page load (i.e., when the CLIENT scope is read back from storage) I just see question marks for the Unicode characters.
I'm using a database for persistence, and the data column in the CDATA table has been set to ntext.
Looking directly in the database I can see that the records have not been written correctly (again, just question marks showing for the Unicode characters).
(From the comments)
Have you checked/enabled the "String Format -- Enable High ASCII characters and Unicode ..." option in your client datasource?
From the docs:
Enable this option if your application uses Unicode data in DBMS-specific Unicode data types, such as National Character or nchar.

Character with encoding UTF8 has no equivalent in WIN1252

I am getting the following exception:
Caused by: org.postgresql.util.PSQLException: ERROR: character 0xefbfbd of encoding "UTF8" has no equivalent in "WIN1252"
Is there a way to eradicate such characters, either via SQL or programmatically?
(SQL solution should be preferred).
I was thinking of connecting to the DB using WIN1252, but it would give the same problem.
I had a similar issue, and I solved it by setting the encoding to UTF8 with \encoding UTF8 in the client before attempting INSERT INTO foo (SELECT * FROM bar WHERE x=y). My client was using WIN1252 encoding but the database was in UTF8, hence the error.
More info is available on the PostgreSQL wiki under Character Set Support (devel docs).
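A minimal sketch of that fix, driving psql from PowerShell (the database name is a placeholder; foo and bar stand in for the real tables, as in the answer above):
# Force the client encoding for this session, then run the insert.
& psql -d mydb -c "SET client_encoding TO 'UTF8'; INSERT INTO foo SELECT * FROM bar WHERE x = y;"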
What do you do when you get this message? Do you import a file into Postgres? As devstuff said, a BOM may be involved. The BOM is a character Windows writes first to a text file saved in UTF-8 encoding; it is an invisible, zero-width character, so you will not see it when you open the file in a text editor.
Try opening the file in, for example, Notepad, saving it as ANSI encoding, and then adding a set client_encoding to 'WIN1252' line (or replacing a similar one) in your file.
Don't eradicate the characters; they're real and there for good reasons. Instead, eradicate Win1252.
I had a very similar issue. I had a linked server from SQL Server to a PostgreSQL database, and some data in the table I was selecting from with an openquery statement had a character with no equivalent in Win1252. The problem was that the System DSN entry (found under the ODBC Data Source Administrator) I had used for the connection was configured to use PostgreSQL ANSI(x64) rather than PostgreSQL Unicode(x64). Creating a new data source with Unicode support, creating a new modified linked server, and referencing the new linked server in the openquery resolved the issue for me. Happy days.
The byte sequence 0xEF 0xBF 0xBD is actually the UTF-8 encoding of the Unicode replacement character U+FFFD, which decoders substitute for bytes they cannot interpret, so its presence usually means the text was already corrupted by an earlier bad encoding conversion. It is easy to mistake for the byte-order-mark (BOM) 0xFEFF, whose UTF-8 form is the similar-looking 0xEF 0xBB 0xBF; the BOM is normally used only for encoding detection at the beginning of an input stream and is usually not returned as part of the result.
In any case, your exception is due to this code point not having a mapping in the Win1252 code page. This will occur with most other non-Latin characters too, such as those used in Asian scripts.
Can you change the database encoding to UTF8 instead of WIN1252? That will allow your columns to contain almost any character.
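Note that PostgreSQL cannot change an existing database's encoding in place; the usual route is to create a new UTF8 database and dump/restore into it. A hedged sketch from PowerShell (the database names are placeholders):
& createdb -E UTF8 -T template0 mydb_utf8   # new, empty database with UTF8 encoding
& pg_dump -f dump.sql mydb                  # dump the old database to a script file
& psql -d mydb_utf8 -f dump.sql             # restore into the UTF8 database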
I was able to get around it by using Postgres' substring function and selecting that instead:
select substring(comments from 1 for 200) from billing
The comment noting that the special character appeared at the start of each field was a great help in finally resolving it.
This problem appeared for us around 19/11/2016 with our old Access 97 app accessing a PostgreSQL 9.1 DB.
It was solved by changing the driver to UNICODE instead of ANSI (see plang's comment).
Here's what worked for me:
1. Enable ad-hoc queries in sp_configure.
2. Add an ODBC DSN for your linked PostgreSQL server.
3. Make sure you have both the ANSI and Unicode (x64) drivers (try with both).
4. Run a query like the one below, changing the UID, server IP, database name, and password.
5. Keep the query in the last line in PostgreSQL format.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO
EXEC sp_configure 'ad hoc distributed queries', 1
RECONFIGURE
GO
SELECT * FROM OPENROWSET('MSDASQL',
'Driver=PostgreSQL Unicode(x64);
uid=loginid;
Server=1.2.3.41;
port=5432;
database=dbname;
pwd=password',
'select * FROM table_name limit 10;')
I faced this issue when my Windows 10 was using Mandarin (China) as the default language. The problem occurred because I tried to import a database with UTF-8; checking via psql with \l showed that the collate and ctype were Mandarin (China).
The solution: reset the OS language back to US English and re-install PostgreSQL. Once the collation is back to UTF-8, you can set your OS language back again.
I wrote up the full context and solution here: https://www.yodiw.com/fix-utf8-encoding-win1252-cputf8-postgresql-windows-10/