Edit: My assumptions about encoding were incorrect. I'm leaving the question as originally asked in case others come here with the same misunderstanding.
When I include a link in some text in the editor that includes a querystring, then view the source code, I can see that it's converted any & characters in the href to &amp;, which breaks the links.
A link
becomes
A link
and if I change it back to just & in the source, click Ok on the view source dialog, then immediately view source again, it's already worked its charms and encoded the & once again.
Is there a way to cue the editor to go ahead and convert those outside tag attributes, but not mess with those in attributes?
Using an older version (4.0.12), but I see the behavior on the current live sample right on tinymce.com, so if it's a bug it looks like it hasn't been fixed. But I am wondering if it's just a setting I'm missing.
Relevant questions:
Do I encode ampersands in <a href...>?
Do ampersands still need to be encoded in URLs in HTML5?
The HTML spec actually states that ampersands in HTML attributes have to be encoded so TinyMCE is working 100% as it should. If your server side code is not handling that correctly that is an issue with the server side code.
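To illustrate why the encoded form does not break the link, here is a minimal Python sketch using only the standard library. The URL page.php?a=1&b=2 and the class name HrefCollector are made up for the example. It escapes a URL the way an HTML serializer would, then parses the markup back:

```python
from html import escape
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href values of <a> tags as an HTML parser sees them."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")


# Hypothetical URL with a querystring, as typed by the user.
raw_url = "page.php?a=1&b=2"

# What a serializer writes into the source: & becomes &amp; in the attribute.
markup = '<a href="%s">A link</a>' % escape(raw_url, quote=True)

# What a parser reads back: the character reference is decoded again.
collector = HrefCollector()
collector.feed(markup)
collector.close()

print(markup)
print(collector.hrefs)
```

The &amp; only exists in the markup. Anything that parses the HTML, a browser included, gets the plain & back, so the link target is unchanged.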
I am writing a Flutter web application that needs to have a customizable template for printing reports: inside the template there will be some placeholders that must be replaced with data at the moment of printing. The "printing" itself will be done by having the user download and open a PDF file, then print it through the browser, the OS or anything else, but that's beyond the scope, at the moment.
The default template would be something like this, where <BUYER_DATA> and <TRANSACTION_DATA> will be replaced with the data of the transaction the user is printing, along with some other "technical" tags (i.e., the page number and the pages count):
Header with site name
727 Chester Rd New Trafford, Stretford
Manchester
<BUYER_DATA>
01-12-2022 14:40
<TRANSACTION_DATA>
AppName - TM 2022
Page <PAGE_NO> of <PAGES_COUNT>
The user is allowed to edit this template in any aspect (boldness, size, colors, etc), provided that the tags related to the data are not removed from it.
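As a rough sketch of the "tags must not be removed" rule and of the replacement step, here is a small Python example. It is not tied to Flutter or to any PDF package; the function names missing_tags and render are hypothetical, and it assumes the tags are stored literally in the template text (an HTML editor may store them escaped instead):

```python
import re

# The data tags named in the template above.
REQUIRED_TAGS = ("<BUYER_DATA>", "<TRANSACTION_DATA>", "<PAGE_NO>", "<PAGES_COUNT>")

# One pattern matching any of the tags, so replacement happens in a single pass.
TAG_PATTERN = re.compile("|".join(re.escape(tag) for tag in REQUIRED_TAGS))


def missing_tags(template):
    """Return the required tags that the user removed from the template."""
    return [tag for tag in REQUIRED_TAGS if tag not in template]


def render(template, values):
    """Replace every tag with its value; refuse templates with missing tags."""
    missing = missing_tags(template)
    if missing:
        raise ValueError("template is missing tags: " + ", ".join(missing))
    return TAG_PATTERN.sub(lambda match: values[match.group(0)], template)


template = "<BUYER_DATA>\n<TRANSACTION_DATA>\nPage <PAGE_NO> of <PAGES_COUNT>"
values = {
    "<BUYER_DATA>": "Jane Doe",
    "<TRANSACTION_DATA>": "1 x Widget",
    "<PAGE_NO>": "1",
    "<PAGES_COUNT>": "3",
}
print(render(template, values))
```

Replacing in a single pass means a value that happens to contain tag-like text is not substituted a second time.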
So, to achieve this, I added a WYSIWYG html editor inside a page in order to save the template as an HTML-formatted string. And this works fine: only then I realized that the well-known flutter printing library doesn't support conversion from HTML directly to PDF on web, and all my plans began to crumble.
I then tried to discover if there's some other way to achieve the same by replacing the HTML template with something else, like markdown, but it seems that there's nothing that could help me.
The question is: does anybody know of a package capable of converting from HTML, markdown or such, directly into PDF?
I just need to know so I can stop googling around and decide to write my own parser for the HTML and convert it into a series of Widgets of the aforementioned printing package.
I'm having a hard time trying to properly display Vietnamese text in ColdFusion. I've set the charset to UTF-8 but still no luck. The same text works fine in an HTML page. What else am I missing? Any suggestion would be much appreciated.
Html:
ColdFusion:
Thanks!
There are two things you need to watch out for, as far as I recall off the top of my head.
The first is to ensure that the .cfm file itself is saved as UTF-8 - this is a file system option, and will probably be settable in your editor. This ensures that the UTF-8 characters are correctly preserved when saving the file.
The other is that every .cfm file that includes any UTF-8 text should start with:
<cfprocessingdirective pageencoding="utf-8" />
This tells ColdFusion to read the page source as UTF-8, so the characters are interpreted correctly before the page is delivered to the browser.
Just to be sure, when you display your working HTML, can you check the page encoding used by your browser (e.g. in Firefox you can right-click and choose Page Info)? Maybe your text is not UTF-8 encoded, which could explain the problem...
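For anyone unsure what a mismatch between the saved encoding and the assumed encoding looks like, here is a small Python sketch. It is not ColdFusion, and the sample string is just an illustration. It encodes Vietnamese text as UTF-8 and then decodes the same bytes as Latin-1, which is roughly what happens when a UTF-8 file is read under the wrong encoding:

```python
# "Tieng Viet" with its diacritics, written with escapes so this file's own
# encoding cannot get in the way.
text = "Ti\u1ebfng Vi\u1ec7t"

# Saved correctly as UTF-8: each accented letter takes three bytes.
utf8_bytes = text.encode("utf-8")

# Read back under the wrong assumption (Latin-1): every byte becomes its own
# character, so each accented letter turns into three junk characters.
garbled = utf8_bytes.decode("latin-1")

# The bytes themselves are intact; decoding them as UTF-8 recovers the text.
recovered = garbled.encode("latin-1").decode("utf-8")

print(len(text), len(utf8_bytes), len(garbled))
print(recovered == text)
```

The point is that nothing is lost on disk. The text only looks broken because one step in the chain assumes the wrong encoding.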
I have a WordPress installation that clients can edit, and all characters display OK there. On the main homepage I query the same database for the same title and post content, but some characters don't display correctly: they show up as question marks.
I have tried sending the UTF-8 headers manually, through .htaccess and through meta tags. I have also run SET NAMES utf8 (which turns the characters into the diamond symbol with a question mark inside).
I genuinely can't figure out what it could be now and I really need these characters to display correctly.
Here's the homepage. You can see in the Sounddhism 6 preview that there are lots of question marks; if you click on it you will see what they are meant to look like:
http://nottingham.subverb.net
I have passed it through the validator and it gives me this error:
Sorry, I am unable to validate this document because on line 373 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
The error was: utf8 "\xA0" does not map to Unicode
Which, I appreciate, is supposed to help me, but I don't know what to do about it, especially since the character generating the error on that line is supposed to be a space and comes AFTER the offending question marks.
Can anyone help?
Compare the encoding of both the back-end scripts in WordPress and also your homepage script. If you're using IE, right-click the page and check the encoding. Sometimes it's set to "Auto-detect" and IE will often detect a different encoding for different pages, causing strange issues like this.
If you're not using IE, try using a tool like Fiddler to see exactly what encoding is declared and what bytes are being sent back and forth, both in the back-end and in your homepage script.
If forcing UTF-8 on your homepage script doesn't work, I would guess that the back-end is not using UTF-8.
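The validator message and the two symptoms in the question can be reproduced in a few lines of Python. This is only a sketch of the byte-level behaviour, not a diagnosis of the actual site. The byte 0xA0 is a non-breaking space in Latin-1, but on its own it is not valid UTF-8:

```python
# A non-breaking space as stored in a Latin-1 (ISO-8859-1) context.
nbsp_latin1 = b"\xa0"

# Strict UTF-8 decoding rejects the lone byte, which is what the validator reports.
try:
    nbsp_latin1.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# A lenient decoder substitutes U+FFFD, the "diamond with a question mark".
diamond = nbsp_latin1.decode("utf-8", errors="replace")

# The same character encoded properly as UTF-8 takes two bytes.
nbsp_utf8 = nbsp_latin1.decode("latin-1").encode("utf-8")

# A plain "?" appears when a character cannot be represented in the target
# charset at all and the encoder substitutes a placeholder.
question_mark = "\u00e9".encode("ascii", errors="replace")

print(is_valid_utf8, repr(diamond), nbsp_utf8, question_mark)
```

So plain question marks suggest the data was converted to a charset that cannot hold the characters, while the diamond suggests Latin-1 bytes being read as UTF-8.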
So I started developing my firefox addon.
Most of the work is performed by a referenced javascript file.
Problem is that when I edit some of the HTML elements on the page and, say, set their text, it's written as pure gibberish. I am writing the text in Hebrew. I can't for the life of me figure out the reason.
Any ideas?
Javascript strings are already Unicode at runtime. However, you have to make sure that your files are encoded correctly.
Always use utf-8 (without BOM) file encoding for all your js, XUL, DTD, properties files to be sure.
Firefox might try to guess the file character set incorrectly otherwise, and even worse some stuff might not even try guessing the encoding and instead simply always assume utf-8.
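If you want to check a file rather than trust the editor's status bar, a minimal Python sketch like the one below can help. The function name strip_utf8_bom is made up, and the Hebrew sample word is only an illustration. It detects a UTF-8 byte order mark at the start of the data and removes it:

```python
import codecs


def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hebrew sample text ("shalom"), written with escapes for safety.
sample = "\u05e9\u05dc\u05d5\u05dd"

# Simulate file contents saved as "UTF-8 with BOM".
with_bom = codecs.BOM_UTF8 + sample.encode("utf-8")

cleaned = strip_utf8_bom(with_bom)
print(cleaned.decode("utf-8") == sample)

# The "utf-8-sig" codec also skips a BOM when decoding.
print(with_bom.decode("utf-8-sig") == sample)
```

In practice you would read the file in binary mode, pass the bytes through a check like this, and write them back only if something changed.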
Better yet, do not hard-code strings in js/xul, but use DTD/properties files for localization (XUL tutorial, XUL School).
This snippet, for example, works pretty well for me (on this very page):
document.getElementsByTagName("h1")[0].textContent="русский язык";
(Just fire up the Firefox Web Console)
"Inline" Hebrew embedded in js files might create additional problems because it is right-to-left and bidi sucks, so the localization approach should be preferred.
I have a great concern in deploying the TinyMCE editor on a website. Looking at the code parsed by the editor it does a great job, and I leave the HTML button off the toolbar configuration so users can not inject their own source.
However, from what I read in the TinyMCE docs, it claims to degrade nicely to a regular textarea should JavaScript be disabled in a user's browser... and therein lies my concern. If it does revert to a normal textarea, the user is then able to easily inject their own HTML, and this leaves me with a security concern.
I just pass through data created with TinyMCE, and it is used within another page created by my script, so it poses no security risk to my server. The security concern arises over what malicious data may be passed to another user viewing the generated page.
I know many of you will tell me to just use regexes, or parse this data, but that itself could be a nightmare, as I would be trying to either...
a.) Use regexes to try and clean up the HTML without breaking the generated page, and it is better to parse the data for that anyway.
b.) Reparsing data that has already been parsed by the RTF editor, which also would probably end up breaking the generated page.
Anyone with any previous experience with this type of scenario, I would really appreciate a 'heads-up' as to any other risks that using an RTF editor for user data could entail.
I would really like to provide this as a user option, but not if the risks outweigh the benefits, that is, if it gives a user of the RTF editor a chance to take a whack at another user viewing the page that is generated by the script.
My gut feeling is to give the RTF editor a wide berth at this point.
Thanks for any direction you can give me with your own experiences.
You cannot have client-side security on the web. You simply can't trust the browser, because it's easy for a malicious user to substitute a replacement browser that does whatever he wants.
If you accept HTML from users (using TinyMCE or through any other method) and display it to other users, you must sanitize or validate the HTML in some way on the server. If you're using Perl, the leading package seems to be HTML::Scrubber (along with various other modules that help you plug it in to various frameworks). I haven't had occasion to try it myself.
The TinyMCE Security page mentions some ways to make it harder for people to submit arbitrary HTML, but you still need server-side checks.
Regex is generally not considered good for parsing HTML (see "RegEx match open tags except XHTML self-contained tags"), but I have noted the "perl" tag :)
My advice when taking markup from users is to always parse it through something that can accept malformed HTML and return well-formed HTML. These parsers generally produce something that can be queried and updated with some form of XPath.
In Python there is a module called BeautifulSoup, Ruby has Nokogiri, and in ASP.NET there is a project called HtmlAgilityPack; they all do this sort of thing. I'm not sure what library Perl has, but I'm sure there would be something.
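To give an idea of what "parse it, then keep only what you allow" looks like on the server, here is a rough sketch using only Python's standard html.parser. The allow-list, the class name AllowListSanitizer and the function sanitize are all assumptions made for the example. It is meant to show the approach, not to replace a maintained sanitizing library:

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed allow-list for the example; adjust to what your editor may produce.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "a", "ul", "ol", "li", "br"}
VOID_TAGS = {"br"}
DROP_WITH_CONTENT = {"script", "style"}


class AllowListSanitizer(HTMLParser):
    """Rebuilds the markup from parser events, keeping only allowed pieces."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = ""
        if tag == "a":
            # Keep only absolute http(s) links; every other attribute is dropped.
            for name, value in attrs:
                if name == "href" and value and urlparse(value).scheme in ("http", "https"):
                    kept = ' href="%s"' % escape(value, quote=True)
        self.out.append("<%s%s>" % (tag, kept))

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text is always re-escaped, so stray < or & cannot form new markup.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = AllowListSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>'))
```

The design choice is to rebuild the output from the parser's events rather than delete "bad" parts from the input, so anything the code does not explicitly emit never reaches the other user. This sketch does not balance unclosed tags, which a real library would also handle.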