Silverstripe CMS wysiwyg incorrectly decoding/encoding quotes

I've encountered an issue that I've never come across before with Silverstripe when saving content in the CMS.
When saving in the Content wysiwyg (or any other field I've added), it escapes the quotes and apostrophes. For example, applying a style to a piece of content through the style dropdown produces this underlying HTML:
<p class="my-style">lorem ipsum</p>
When I press save, the page reloads and the style is not shown. When inspecting the HTML put back into the wysiwyg, I am getting:
<p class="\"my-style\"">lorem ipsum</p>
Initially, my thoughts were that the content field was maybe set to Text rather than HTMLText but I've checked and found this not to be the case.
Anyone have any ideas? I've built numerous sites in Silverstripe previously and this is the first time I've encountered this behaviour.
I'm using Silverstripe 3.1.0.
Cheers

As I've mentioned, I think this is a PHP issue with the escaping of double/single quotes. It's a symptom of magic quotes.
Magic Quotes was an "(annoying) security feature" that was enabled by default in older PHP versions, deprecated as of PHP 5.3.0 and removed in PHP 5.4.0.
In a nutshell, here's what magic quotes does (taken from the PHP website):
When on, all ' (single-quote), " (double quote), \ (backslash) and NULL characters are escaped with a backslash automatically. This is identical to what addslashes() does.
This may be what you are experiencing.
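To make that concrete, here is a minimal sketch in Python (not PHP, purely for illustration) that mimics the documented behaviour of addslashes() on the markup from the question. The helper name php_addslashes is hypothetical, and the sketch assumes the CMS stores the posted value without stripping the added backslashes.

```python
def php_addslashes(text: str) -> str:
    """Hypothetical stand-in for PHP's addslashes(): escape ', ", backslash and NUL."""
    out = []
    for ch in text:
        if ch in ("'", '"', "\\"):
            out.append("\\" + ch)   # prefix the character with a backslash
        elif ch == "\0":
            out.append("\\0")       # NUL becomes the two characters \0
        else:
            out.append(ch)
    return "".join(out)


submitted = '<p class="my-style">lorem ipsum</p>'
print(php_addslashes(submitted))
# <p class=\"my-style\">lorem ipsum</p>
```

Once backslashes like these reach the stored HTML, the editor has to make sense of an attribute value that starts with a backslash, which would be consistent with the broken class attribute shown in the question.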
Disabling Magic Quotes
On to the solution.
If you have access to your main php.ini, just turn it off like so:
; Magic quotes
;
; Magic quotes for incoming GET/POST/Cookie data.
magic_quotes_gpc = Off
; Magic quotes for runtime-generated data, e.g. data from SQL, from exec(), etc.
magic_quotes_runtime = Off
; Use Sybase-style magic quotes (escape ' with '' instead of \').
magic_quotes_sybase = Off
If you don't have access to the main php.ini:
add this line to the root .htaccess file (if you're using Apache with mod_php)
php_flag magic_quotes_gpc Off
or, if you're running PHP as CGI, create a php.ini file in your document root and put the php.ini settings shown earlier into it.
Hope this helps!

Very peculiar...
I read the symptoms as a doubled double quote: if you parse
<p class="\"my-style\"">lorem ipsum</p>
it's going to appear as
<p class=""my-style"">lorem ipsum</p>
I usually define my styles in the typography.css file and it automatically appears in the "Styles" drop-down in the WYSIWYG editor.
Can you try this out and let me know if it helps?
Thanks!

Related

HTML entities in attributes with tinymce

When I have double quotation marks encoded as entities in my HTML attributes, TinyMCE breaks those attributes.
For example:
data-value="ab&quote;----&quote;">
will be seen in the source code as:
<div data-type="more-posts" data-value="ab">Hello</div>
http://codepen.io/anon/pen/MKYrbJ
How can I fix this?
If you had real double quotes there, your HTML would no longer be valid, because double quotes delimit the attribute value.
It is best to handle those when you save the content to your database.
You could replace them with single quotes; those wouldn't break the markup.

How can I stop TinyMCE from converting HTML entities back to special characters?

When I put something like &pound; or &copy; into my TinyMCE editor and save the text, then when I load the text back from the database, it appears to turn into the actual £ and © characters. I don't want that. When I check my database, I see that these symbols are stored as &pound; and &copy;, so it's not a storage problem. Also, if I try to store the symbols as &amp;pound; and &amp;copy;, that doesn't work either, as the character sequences are still converted into the original symbols.
What can I do to fix this?
Thanks.
You may insert a zero-width-no-break-space between & and pound. This way the editor won't recognise it as an entity/character.
It turns out that I can just htmlspecialchars() the text in my PHP script, and it correctly handles both the formatting and the sequences that are entered. This is rather unexpected, but it actually works out to do the right thing.
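A plausible reading of why this works, sketched in Python rather than PHP: escaping the text adds one level of encoding, and the one level of decoding that happens when the text is loaded then removes it again, leaving the literal entity text. Here html.escape stands in for htmlspecialchars() and html.unescape stands in for whatever decoding the editor does; both substitutions are assumptions made for illustration.

```python
import html

typed = "&pound; and &copy;"    # the literal text the user wants to see

stored = html.escape(typed)     # analogous to PHP's htmlspecialchars()
shown = html.unescape(stored)   # one level of decoding on load

print(stored)  # &amp;pound; and &amp;copy;
print(shown)   # &pound; and &copy;
```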

Minify HTML files in text/html templates

I use mustache/handlebar templates.
eg:
<script id="contact-detail-template" type="text/html">
<div>... content to be compressed </div>
</script>
I am looking to compress/minify my HTML files in the templates for the best compression.
YUI Compressor and Closure do not work: they treat the content as script and give me script errors.
HTMLCompressor does not touch the templates either, because it also treats them as script.
How do I minify the content in the script tags with type text/html?
Can I use a library?
If not, is sed or egrep a preferable way? Do you have sed/egrep syntax to remove empty lines (with just spaces or tabs), remove all tabs, trim extra spaces?
Thanks.
sed -e "s/^[ \t]*//g" -e "/^$/d" yourfile
This removes all the leading spaces and tabs from each line and deletes all empty lines.
sed -e ':a;N;$!ba;s/^[ \t]*//;s/\n[ \t]*//g' yourfile
This removes all the leading spaces and tabs from each line and joins all your code into a single line.
Sorry if i missed something.
Use sed ':a;N;$!ba;s/>\s*</></g' file; it lets you remove whitespace and newlines where they are not needed. Unlike ghaschel's example, this doesn't remove the useful whitespace at the beginning of a line, so the text inside <pre> and <p> tags is preserved.
This is useful because whitespace between > and < is a common source of bloat in an HTML file. The same approach also works for XML files such as Atom and RSS feeds.
I personally use this as a pipe stage in my site generator; it reduces the size of a typical file and can be used in conjunction with gzip.
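For readers who would rather not rely on sed, the same substitution can be sketched in Python. The function name strip_between_tags and the sample template are made up for illustration; the regular expression is the one from the sed command above.

```python
import re


def strip_between_tags(markup: str) -> str:
    """Remove whitespace (including newlines) that sits directly between two tags."""
    return re.sub(r">\s*<", "><", markup)


template = """<div>
    <p> keep  inner   spacing </p>

    <pre>  indented text  </pre>
</div>"""

print(strip_between_tags(template))
# <div><p> keep  inner   spacing </p><pre>  indented text  </pre></div>
```

One caveat: the pattern also removes a single space between adjacent inline tags (for example between </b> and <i>), which can change how the text renders.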
Try using Pretty Diff to minify this kind of code. It will only assume the stuff inside script tags is JavaScript if there is no mime type or if the type is one of the various JavaScript types. It is also intelligent enough to know which white space is okay to remove without corrupting the output of content or the recursive beautification of code later.

How to avoid malformed URI sequence error?

I'm working with Perl. I have data saved in the database as “
and I want to escape those characters to avoid a malformed URI sequence error on the client side. This error seems to happen in Firefox only. The fix I found while googling is not to use decodeURI, yet I need it for other characters to be displayed correctly.
Any help? uri_escape does not seem to be enough on the server side.
Thanks in advance.
Details:
In Perl I'm doing the following:
print "<div style='display:none;' id='summary_".$note_count."_note'>".uri_escape($summary)."</div>";
and on the JavaScript side I want to read from this div and put its content somewhere else, like this:
getObj('summary_div').innerHTML= unescape(decodeURI(note_obj.innerHTML));
where note_obj is the hidden div that holds the summary written by Perl.
When I remove decodeURI the problem is solved: I no longer get the malformed URI sequence error in JavaScript. Yet I need to use decodeURI for other characters.
This issue can be reproduced in Firefox and IE7.
You can try the CGI module and call
$uri = CGI::escape($uri);
It may depend on the context in which you are trying to escape the URI.
This worked fine for me in a CGI context.
Now that you have added details, I can suggest:
print "<div style='display:none;' id='summary_".$note_count."_note'>".CGI::escape($summary)."</div>";
URL escaping won't help you here -- that's for escaping URLs, not escaping text in HTML. What you really want is to encode the string when you output it. See the Encode.pm built-in library. Make sure that you get your charset statements right in the HTTP headers: "Content-Type: text/html; charset=UTF-8" or something like that.
If you're unlucky, you may also have to decode the string as it comes out of the database. That depends on the database driver and the encoding...
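One plausible cause, stated as an assumption because the question does not show the bytes involved, is that the curly quote was percent-escaped from a single-byte encoding such as Windows-1252 rather than from UTF-8. JavaScript's decodeURI expects the escapes to form valid UTF-8 and throws a URIError otherwise. The Python sketch below (Python only stands in for the Perl and JavaScript sides) shows the difference between the two escaped forms, using strict UTF-8 decoding as an analogue of decodeURI.

```python
from urllib.parse import quote, unquote

curly = "\u201c"  # the left double quotation mark from the question

as_utf8 = quote(curly.encode("utf-8"))      # escapes the three UTF-8 bytes
as_cp1252 = quote(curly.encode("cp1252"))   # escapes one Windows-1252 byte

print(as_utf8)    # %E2%80%9C
print(as_cp1252)  # %93

# Strict UTF-8 decoding accepts the first form ...
print(unquote(as_utf8, errors="strict") == curly)  # True

# ... and rejects the second, much like decodeURI raising URIError.
try:
    unquote(as_cp1252, errors="strict")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)  # False
```

If this is what is happening, escaping the UTF-8 encoded form of the string on the Perl side (URI::Escape provides uri_escape_utf8 for that) would be one thing to try.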

How do I protect against cross-site scripting?

I am using PHP and MySQL with Smarty, and I have places where users can post comments, etc. I've already escaped characters before inserting into the database to prevent SQL injection. What else do I need to do?
XSS is mostly about the HTML-escaping(*). Any time you take a string of plain text and put it into an HTML page, whether that text is from the database, directly from user input, from a file, or from somewhere else entirely, you need to escape it.
The minimal HTML escape is to convert all the & symbols to &amp; and all the < symbols to &lt;. When you're putting something into an attribute value you would also need to escape the quote character being used to delimit the attribute, usually " to &quot;. It does no harm to always escape both quotes (" and the single quote apostrophe '), and some people also escape > to &gt;, though this is only necessary for one corner case in XHTML.
Any good web-oriented language should provide a function to do this for you. For example in PHP it's htmlspecialchars():
<p> Hello, <?php echo htmlspecialchars($name); ?>! </p>
and in Smarty templates it's the escape modifier:
<p> Hello, {$name|escape:'html'}! </p>
Really, since HTML-escaping is what you want 95% of the time (it's relatively rare to want to allow raw HTML markup to be included), this should have been the default. Newer templating languages have learned that making HTML-escaping opt-in is a huge mistake that causes endless XSS holes, so they HTML-escape by default.
You can make Smarty behave like this by changing the default modifiers to html. (Don't use htmlall as they suggest there unless you really know what you're doing, or it'll likely screw up all your non-ASCII characters.)
Whatever you do, don't fall into the common PHP mistake of HTML-escaping or “sanitising” for HTML on the input, before it gets processed or put in the database. This is the wrong place to be performing an output-stage encoding and will give you all sort of problems. If you want to validate your input to make sure it's what the particular application expects, then fine, but weeding out or escaping “special” characters at this stage is inappropriate.
*: Other aspects of XSS are present when (a) you actually want to allow users to post HTML, in which case you have to whittle it down to acceptable elements and attributes, which is a complicated process usually done by a library like HTML Purifier, and even then there have been holes. Alternative, simpler markup schemes may help. And (b) when you allow users to upload files, which is something very difficult to make secure.
In regards to SQL Injection, escaping is not enough - you should use data access libraries where possible and parameterized queries.
For XSS (cross site scripting), start with html encoding outputted data. Again, anti XSS libraries are your friend.
One current approach is to only allow a very limited number of tags in and sanitize those in the process (whitelist + cleanup).
You'll want to make sure people can't post JavaScript code or scary HTML in their comments. I suggest you disallow anything but very basic markup.
If comments are not supposed to contain any markup, doing a
echo htmlspecialchars($commentText);
should suffice, but it's very crude. Better would be to sanitize all input before even putting it in your database. The PHP strip_tags() function could get you started.
If you want to allow HTML comments, but be safe, you could give HTML Purifier a go.
You should not modify data that is entered by the user before putting it into the database. The modification should take place as you're outputting it to the website. You don't want to lose the original data.
As you're spitting it out to the website, you want to escape the special characters into HTML codes using something like htmlspecialchars("my output & stuff", ENT_QUOTES, 'UTF-8') -- make sure to specify the charset you are using. This string will be translated into my output &amp; stuff for the browser to read.
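The same store-raw, escape-on-output idea can be sketched in Python, where html.escape with quote=True plays roughly the role of htmlspecialchars() with ENT_QUOTES (the exact entity used for the apostrophe differs between the two). The comment text is made up for illustration.

```python
import html

# Raw text exactly as it would be stored in the database.
comment = "<script>alert('xss')</script> my output & stuff"

# Escape only at the moment the text is written into an HTML page.
safe = html.escape(comment, quote=True)

print(safe)
# &lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt; my output &amp; stuff
```

The stored value stays untouched, so it can still be exported, searched or re-escaped for a different output format later.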
The best way to prevent SQL injection is simply not to use dynamic SQL that accepts user input. Instead, pass the input in as parameters; that way it will be strongly typed and can't inject code.
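As a small illustration of passing input as a parameter, here is a sketch using Python's built-in sqlite3 module instead of PHP and MySQL. The table name, column name and hostile comment text are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")

# Text that would break a query built by string concatenation.
hostile = "nice post'); DROP TABLE comments; --"

# The ? placeholder sends the value separately from the SQL text,
# so it is treated as data and never parsed as SQL.
conn.execute("INSERT INTO comments (body) VALUES (?)", (hostile,))

stored = conn.execute("SELECT body FROM comments").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]

print(stored == hostile)  # True
print(row_count)          # 1
```

The value is stored byte for byte as the user typed it, which also fits the advice above about not modifying data before it goes into the database.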