How can I stop TinyMCE from converting HTML entities back to special characters? - tinymce

When I put something like £ or © into my TinyMCE editor and save the text, then when I load the text back from the database, it appears to turn into the actual £ and © characters. I don't want that. When I check my database, I see that these symbols are stored as £ and ©, so it's not a storage problem. Also if I try to store the symbols as £ and ©, that doesn't work either, as the character sequences are still converted into original symbols.
What can I do to fix this?
Thanks.

You may insert a zero-width-no-break-space between & and pound. This way the editor won't recognise it as an entity/character.

It turns out that I can just htmlspecialchars() the text in my PHP script, and it correctly handles both the formatting and the sequences that are entered. This is rather unexpected, but it actually works out to do the right thing.

Related

PostgreSQL Escape Microsoft Special Characters In Select Query

PostgreSQL, DBvisualizer and Salesforce
I'm selecting records from a database table and exporting them to a csv file: comma-separated and UTF8 encoded. I send the file to a user who is uploading the data into Saleforce. I do not know Salesforce, so I'm totally ignorant on that side of this. She is reporting that some data in the file is showing up as gibberish (non UTF8) characters (see below).
It seems that some of our users are copy/pasting emails into a web form which then inserts them into our db. Dates from the email headers (I believe) are the text that are showing as gibberish.
11‎/‎17‎/‎2015‎ ‎7‎:‎26‎:‎26‎ ‎AM
becomes
‎11‎/‎16‎/‎2015‎ ‎07‎:‎26‎:‎26‎ ‎AM
The text in the db field looks normal. It's when it is exported to a csv file and then that file is viewed in a text-editor like Wordpad or Salesforce. Then she sees the odd characters.
This only happens with dates from the text that is copy/pasted into the form/db. I have no idea how, or if there is a way, remove these "unseen" characters.
It's the same three-characters each time: ‎ I did a regex_replace() on these to strip them out, but it doesn't work. I think since they are not seen in the db field, the regex does see them.
It seems like even though I cannot see these characters, they must be there in some form that is making them show in text-editors like Wordpad or the Salesforce client after being exported to csv.
I can probably do a mass search/find/replace in the text editor, but it would be nice to do this in the sql and avoid the extra step each time.
Hoping someone has seen this and knows an easy fix.
Thanks for any ideas or pointers that may help.
The sequence ‎ is a left-to-right mark, encoded in UTF-8 (as 0xE2 0x80 0x8E), but being read as if it were in Windows-1252.
A left-to-right mark is invisible, so the fact that you can't see it in the database suggests that it's encoded correctly, but without knowing precisely what path the data took after that, it's hard to guess exactly where it was misinterpreted.
In any case, you should be able to replace the character in your Postgres query by using its Unicode escape sequence: E'\u200E'

Unicode converted text isn't shown properly in MS-Word

In a mapping editor, the display is correct after the legacy to unicode conversion for DEVANAGARI text shown using a unicode font (Arial Unicode MS). However, in MS-WORD, the display isn't as expected for the same unicode text in the unicode font (Arial Unicode MS) or any other Devanagari unicode fonts. The expected sequence of unicodes are provided as per the documentation. The sequence can be seen on the left-hand side table.
Please let me know where I am going wrong.
Thanks for your help!
Does your map have to insert the zero_width_joiner? The halant (virama) by itself is enough to get the half-consonant (for some combinations) and in particular, it may be that Word is using the presence of the ZWJ to keep them separate.
If getting rid of the ZWJ doesn't help, another possibility is that Word may be treating the individual characters of the text string as individual "runs" of text.
If those first 4 characters are not in a single run, this can happen.
[aside: the way to tell if it's being treated as a single run, is to save the document as an xml file and then open it with something like notepad++ and look at the xml "w:t" element (IIRC) associated with these characters. If they're all in separate w:t elements, it means they're in separate runs. In that case, you might need to copy the text from Word to some other tool (e.g. Notepad++) and then copy it from there and paste it back in Word -- that might cause it to be imported into Word in a single run.

MS Access Convert from Unicode when Reading from Text File

So, I have an Access database where I import data from a text file. The file is semi-colon delimited. Occasionally (and will become more frequent) I receive a file from one of our affiliates from Russia. The file has unicode (I think) characters like "Ìèðîøíèêîâ" instead of "Мирошников". Ultimately, I'd like to translate those into English upon import, but for now, I'll accept the Russian characters.
How should I go about doing this? Currently, I'm reading each line of the file, using the SPLIT function to separate each field by the ";" separator into an array, and sticking each array element into a table. Would changing the system Keyboard Layout to Russian prior to this work, or is it more complicated than that.
Does any of this make sense, or should I just bag it and go grab a beer (or some Vodka)?
Thanks!
You should be able to create an "Import Specification" that will tell Access how to convert the character data. Follow the procedure here...
Importing a text with separators using VBA
...and choose the appropriate character set from the "Code Page" combo box.
If you need to perform the imports from VBA code then you can save the specification (using the "Save As..." button) and then re-use that specification in a DoCmd.TransferText statement.

How to insert asian characters using Squirrel Sql

I'm running Squirrel-SQL on Ubuntu.
I cannot write chinese characters on Squirrel, but I can write them in another text editor and copy+paste into squirrel. However, when I run the update and select the data I just inserted, the characters I write show up as question marks.
When I insert the data from a web interface, or when I right click on results and choose "make editable", I can paste in the data which will show up fine when I select again.
This tells me that the database saves the characters fine. Squirrel is capable of displaying the characters fine. The problem seems to be in the sql text editor.
Anyone have this problem before?
I finally found the answer! Looks like hibernate was doing some extra work for me (via web interface or squirrel's "make editable" option on results) that I wasn't aware was necessary. Looks like the problem was actually a syntactical mistake for Microsoft SQL Server. I needed to prepend the letter 'N' right before the characters I wish to insert.
For example:
update title_product
set synopsis = N'我很高兴 test'
where title_product_id = 26
This converts chinese and english characters correctly. Yay.
Although I still cannot write chinese characters directly into Squirrel, I have to copy+paste from another editor.

Apostrophe issue in RTF

I have a function within a custom CRM web application (old VB.Net circa 2003) that takes a set of fields from a database and merges them with palceholders in a set of RTF based template documents. These generate merged letters and documentation. The code essentially loops through each line of the RTF template file and replaces any instances of the placeholder values with text from a database record. The issue I'm having is that users have pasted a certain type of apostrophe into the web app (and therefore into the database) that is not rendering correctly in the resulting RTF file. It is rendering like this - ’.
I need a way to spot this invalid apostrophe in the code and replace it with a valid one. Unfortunately when I paste the invalid apostrophe into the Visual Studio editor it gets converted into the correct one. So I need another way to express this invalid apostrophe's value. Unfortunately I do not know a great deal about unicode and other encodings so I'm calling out for help with this.
Any ideas?
If you really just want to figure out what the character is you might want to try and paste it into a text editor like ultraedit. It has a hex mode that you can flip to to see the actual underlying bytes.
In order to do the replace once you've figured out the character you'd do something like this in Vb,
text.Replace(ChrW(2001), "'")
Note that you might not be able to figure it out easily using the text editor because it might also get mangled by paste from the clipboard. You might want to either print some debug of the ascii values from code. You can use the AscW function to do that.
I can't help but think that it may actually simply be a case of specifying the correct encoding to use when you write out the stream though. Assuming you're using a StreamWriter you can specify it on the constructor. I'm guessing you actually want ASCII given your requirement.
oWriter = New System.IO.StreamWriter(path, False, System.Text.Encoding.ASCII)
It looks like you probably want to encode characters out of the 8 bit range (>255).
You can do that using \uNNNN according to the wikipedia article.