Can I manage content structure with a rich text editor?

I'm developing a CMS, and I'm trying to figure out which rich text editor (if any) I want to use.
The content is stored in a structured form on the server. Let's call it the "canonical form". It is not a simple HTML or markdown page, but a multi-part structure where each part is stored as individual records in the database.
The server reads the canonical form and sends it to the client. The client transforms the canonical form into HTML. I now want to let the user edit the content, and save it back to the server in canonical form.
I'm not sure a rich text editor will do the trick. It seems most RTEs give you HTML, leaving it up to you to parse the HTML and save it. The problem is that the conversion from canonical form to HTML is one-way. The canonical form is different enough from HTML that the transformation can't be readily reversed.
So I need some kind of intimate interaction with the editor. I need to track all the things the editor does (select, copy, paste, drag-n-drop, splitting blocks, merging blocks, etc.) as the editor is doing it, so that I can maintain the canonical form in parallel with the displayed HTML.
Is there anything out there that will do this? I'm looking at TinyMCE, CKEditor, etc.

It sounds like you're probably going to need logic that converts content into canonical form on an editor get operation, and the inverse on an editor set operation.
Textbox.io supports the idea of filters for content. You could possibly tie this in with something like Markdown-js to get your canonical format.
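A minimal sketch of such a get/set pair, written in Python for brevity and not tied to any editor's API. It assumes (hypothetically) that each canonical record has an `id`, a `kind`, and plain `text`, and that the editor preserves a `data-record-id` attribute on block elements. The names `Record`, `to_html`, and `from_html` are made up for illustration, and inline formatting is ignored.

```python
from dataclasses import dataclass
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping between canonical record kinds and HTML block tags.
KIND_TO_TAG = {"heading": "h2", "paragraph": "p"}
TAG_TO_KIND = {tag: kind for kind, tag in KIND_TO_TAG.items()}


@dataclass
class Record:
    id: str    # database id; empty for blocks the user created in the editor
    kind: str
    text: str


def to_html(records):
    """'Set' direction: render canonical records as HTML for the editor."""
    parts = []
    for r in records:
        tag = KIND_TO_TAG[r.kind]
        parts.append(f'<{tag} data-record-id="{escape(r.id)}">{escape(r.text)}</{tag}>')
    return "".join(parts)


class _BlockCollector(HTMLParser):
    """Collects one Record per known block element; nested markup is flattened to text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.records = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in TAG_TO_KIND and self._current is None:
            record_id = dict(attrs).get("data-record-id") or ""
            self._current = Record(record_id, TAG_TO_KIND[tag], "")

    def handle_data(self, data):
        if self._current is not None:
            self._current.text += data

    def handle_endtag(self, tag):
        if self._current is not None and TAG_TO_KIND.get(tag) == self._current.kind:
            self.records.append(self._current)
            self._current = None


def from_html(html):
    """'Get' direction: turn the editor's HTML back into canonical records."""
    parser = _BlockCollector()
    parser.feed(html)
    parser.close()
    return parser.records
```

Blocks that come back with an empty `id` would be new records (for example after a split), and ids missing from the result would be deletions (for example after a merge). Whether a given editor keeps custom `data-` attributes through paste and split operations is something to verify for the editor you choose.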

TinyMCE autocloses HTML tags - How to disable?

Same question as here
I have two TinyMCE editors, one for a header and the other for a footer (this is for an email template).
For example, I want to enter
<div>abra in the header editor. After saving, it becomes <div>abra</div> (the tag is closed).
And
cadabra</div> in the footer editor. After saving, it becomes cadabra (the tag is removed).
I want to end up with <div>abracadabra</div> when the two are combined.
How can I disable this behavior?
You cannot stop TinyMCE from trying to create valid, well-formed HTML. The engine that drives TinyMCE is designed to ensure that the content in any one editor is valid and well-formed. You know that the data is only meant to be correct once the two editors are combined, but TinyMCE has no way to allow for that. You can, however, post-process the data after extracting it from TinyMCE to get your desired end result.
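One possible shape for that post-processing step, sketched in Python. It assumes the header comes back with an auto-added closing tag at the very end (as in `<div>abra</div>`) and that this last closing tag is the one meant to span both fragments; `join_split_element` is a hypothetical helper, not part of TinyMCE.

```python
import re

# Matches a single closing tag at the very end of the fragment, e.g. "</div>".
_TRAILING_CLOSE = re.compile(r"</([a-zA-Z][a-zA-Z0-9]*)>\s*$")


def join_split_element(header_html, footer_html):
    """Move the closing tag at the end of the header to after the footer."""
    match = _TRAILING_CLOSE.search(header_html)
    if match is None:
        # Nothing to move: just concatenate the two fragments.
        return header_html + footer_html
    closing_tag = match.group(0).strip()
    return header_html[:match.start()] + footer_html + closing_tag


combined = join_split_element("<div>abra</div>", "cadabra")
```

This breaks down if the header legitimately ends with a complete element of its own. A more robust design is to keep the shared wrapper out of both editors and add it when the template is assembled.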

Sanitizing inputs with AEM

We have various people updating our AEM website. However, when they copy and paste from Word or from web pages, the pasted content retains its HTML. I'm wondering if AEM has any built-in way of sanitizing the input so I don't need to build one.
If you are using Rich Text Editor field in the dialog then the text will be parsed and some tags will be stripped. Take a look here for more information about how to configure it and how it works.
We had a rich text editor component with the same issue: authors were able to put HTML styling into the RTE, and those styles collided with the application styles and broke components. The fix was to strip out all HTML styling using the jsoup API before rendering the content back on screen.
The usual approach in AEM is to protect the user on output (i.e. take the input as-is and use the built-in XSS API when rendering that input).
https://docs.adobe.com/docs/en/cq/5-6-1/deploying/security_checklist.html#Protect%20against%20Cross-Site%20Scripting%20%28XSS%29
https://docs.adobe.com/content/docs/en/cq/5-6-1/developing/securitychecklist/_jcr_content/par/download/file.res/xss_cheat_sheet.pdf
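The encode-on-output idea itself can be illustrated independently of AEM. The sketch below is plain Python using the standard library's `html.escape`, not AEM's XSS API; the function names are hypothetical.

```python
from html import escape


def render_text(untrusted):
    """Encode untrusted input for use as HTML text content."""
    return f"<p>{escape(untrusted)}</p>"


def render_title_attr(untrusted):
    """Encode untrusted input for use inside a double-quoted attribute value."""
    return f'<span title="{escape(untrusted, quote=True)}">?</span>'
```

The stored value stays exactly as the author entered it. Only the rendering step decides, per output context, how it has to be encoded.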

Does Orbeon Forms support the Thai language?

I am new to Orbeon Forms and would like to use it. I tried the form examples on the Orbeon Forms website and entered some data in Thai. The fields accept Thai input, but when I generate a PDF, the Thai data is not displayed.
Can Orbeon Forms support Thai for data entered in the fields?
Do I need the “Professional version” to get Thai to work and display in the generated PDF?
Can Orbeon Forms save data locally on the workstation (in case a form is complicated and takes several sessions to fill in)?
This is probably due to the fact that the PDF is lacking an adequate font. Since September 2011 builds, there are properties to specify font embedding, for example:
<property as="xs:string"
name="oxf.fr.pdf.font.path.vera"
value="/path/to/DejaVuSans.ttf"/>
For more information, see the documentation. Embedding a specific font with Thai characters might do the trick, although to be fair I haven't tried Thai specifically.
This should work equally well with both Orbeon Forms CE and PE.
You can do this by adding the "Save locally" button to your forms, which is done by setting a property in your properties-local.xml. This will enable users to save an HTML file on their local machine. The HTML file contains all the information they entered so far: when they reopen it, they are taken back to the form on your side, with the data they entered so far "pre-filled".
You can go through this link for internationalization of Orbeon Forms:
http://wiki.orbeon.com/forms/how-to/logic/i18n
There is also an example there showing multiple languages on the same form.

How safe is the data being parsed by RTF editors like TinyMCE?

I have a great concern about deploying the TinyMCE editor on a website. Looking at the code produced by the editor, it does a great job, and I leave the HTML button off the toolbar configuration so users cannot inject their own source.
However, from what I read in the TinyMCE docs, it claims to degrade nicely to a regular textarea should JavaScript be disabled in a user's browser... and therein lies my concern. If it does revert to a normal textarea, the user can easily inject their own HTML, and this leaves me with a security concern.
I just pass through data created with TinyMCE, and it is used within another page created by my script, so it poses no security risk to my server. The security concern arises over what malicious data may be passed to another user viewing the generated page.
I know many of you will tell me to just use regexes, or parse this data, but that itself could be a nightmare, as I would be either:
a.) Using regexes to try to clean up the HTML without breaking the generated page, when it is better to parse the data for that anyway.
b.) Reparsing data that has already been parsed by the RTF editor, which would probably also end up breaking the generated page.
Anyone with any previous experience with this type of scenario, I would really appreciate a 'heads-up' as to any other risks that using an RTF editor for user data could entail.
I would really like to provide this as a user option, but not if the benefit is outweighed by the risk of giving someone using the RTF editor a chance to take a whack at another user viewing the page generated by the script.
My gut feeling is to give the RTF editor a wide berth at this point.
Thanks for any direction you can give me with your own experiences.
You cannot have client-side security on the web. You simply can't trust the browser, because it's easy for a malicious user to substitute a replacement browser that does whatever he wants.
If you accept HTML from users (using TinyMCE or through any other method) and display it to other users, you must sanitize or validate the HTML in some way on the server. If you're using Perl, the leading package seems to be HTML::Scrubber (along with various other modules that help you plug it in to various frameworks). I haven't had occasion to try it myself.
The TinyMCE Security page mentions some ways to make it harder for people to submit arbitrary HTML, but you still need server-side checks.
Regex is generally not considered good for parsing HTML (see "RegEx match open tags except XHTML self-contained tags"), but I have noted the "perl" tag :)
My advice when taking markup from users is to always pass it through a parser that can accept malformed HTML and return well-formed HTML. These parsers generally produce something that can be queried and updated with some form of XPath.
In Python there is a module called BeautifulSoup, Ruby has Nokogiri, and for ASP.NET there is a project called HtmlAgilityPack; all of them do this sort of thing. I'm not sure what library Perl has, but I'm sure there is something.
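To make the parse-then-rebuild idea concrete, here is a rough allowlist sketch in Python using only the standard library's `html.parser`. It is an illustration under stated assumptions, not a vetted sanitizer: the tag list, the scheme list, and the names `AllowlistSanitizer` and `sanitize` are all made up for this example, and a maintained library is the safer choice in production.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li", "a"}
VOID_TAGS = {"br"}
DROP_CONTENT_TAGS = {"script", "style"}
SAFE_SCHEMES = {"http", "https", "mailto"}


def _safe_href(value):
    """Return the URL if it has an allowed scheme, else None (relative URLs are dropped too)."""
    value = (value or "").strip()
    try:
        scheme = urlsplit(value).scheme.lower()
    except ValueError:
        return None
    return value if scheme in SAFE_SCHEMES else None


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._open = []       # allowed tags currently open
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self._skip_depth += 1
            return
        if self._skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped; their text content is kept
        rendered_attrs = ""
        if tag == "a":
            href = _safe_href(dict(attrs).get("href"))
            if href is not None:
                rendered_attrs = f' href="{escape(href)}"'
        # All other attributes (onclick, style, ...) are discarded.
        self.out.append(f"<{tag}{rendered_attrs}>")
        if tag not in VOID_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
            return
        if self._skip_depth or tag not in self._open:
            return
        # Close anything still open inside this tag so the output stays well-formed.
        while self._open:
            open_tag = self._open.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(escape(data))

    def result(self):
        # Close whatever the input left open.
        return "".join(self.out) + "".join(f"</{t}>" for t in reversed(self._open))


def sanitize(html):
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()
    return parser.result()
```

The point of the design is that the output is rebuilt from parser events instead of being edited in place, so anything not explicitly allowed never reaches the page. This has to run on the server, whatever the editor did on the client.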

How to build an inline translation system similar to Magento's

I am working on an enterprise website project built on Zend Framework (MVC). I would like to develop a friendly translation system that can translate each word according to its context (sometimes the same word has different translations).
Zend Framework uses Zend_Translate for i18n and localization. We have also seen Magento's (which uses ZF) inline translation system, where users can translate pages directly.
We want to know how this inline translation system works, so that we can build a similar system with improvements.
Where are translations stored: in the database or in CSV files?
How does the system know which translation to fetch when the same word has been translated differently by users on different pages?
How should we build a page to support inline translation?
How does the system handle static text vs. dynamic (database-driven) text?
Inline translation seems like it would make the site very slow. How does Magento solve this problem?
If there are other points that should be explained, please add them. Thanks.
Starting from the beginning here (in the future, this is probably more than one logical question):
Magento stores basic translations (provided by the programmer) in CSV files, but inline translations are stored in the database.
Magento's translations operate on entire strings, not words. By providing an entire sentence worth of context for translations, idiomatic translations are achievable. The tradeoff is obviously that every sentence must be translated, rather than every word.
Magento's answer to this is to wrap all localizable strings in a call to the localizer. Magento templates usually look something like this (the double-underscore function maps to the "translate into the current locale" function):
print $this->__("Please translate this string");
Dynamic text (as in product descriptions) in Magento is often not translated, but if you want to do so, it's as simple as passing the right string to the translator, like this:
print $this->__($someString);
It's unlikely that translation will make or break your site (look to your DB queries for most performance problems), but this is a legitimate question nonetheless. Magento does a few things to help. First, it stores serialized versions of the CSV files in a cache, so that reading CSVs is made more efficient. Secondly, Magento offers page caching so that an entire page's output can be stored (assuming that it will render identically), as well as block-level caching for smaller bits of a page. Between these you're in good shape for the most part.
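The string-level lookup described above can be sketched outside of Magento. The following Python is a simplified illustration, not Magento's implementation: the `Translator` class is hypothetical, a dict stands in for the database table of inline translations, and it assumes inline entries take precedence over the CSV ones.

```python
import csv
import io


class Translator:
    def __init__(self, csv_text, inline_overrides=None):
        # Base translations supplied by the programmer, one row per string:
        # "source string","translation"
        self._base = {
            row[0]: row[1]
            for row in csv.reader(io.StringIO(csv_text))
            if len(row) >= 2
        }
        # Inline translations; a dict stands in for the database here.
        self._inline = dict(inline_overrides or {})

    def translate(self, text):
        """Look up the whole string; fall back to the original if untranslated."""
        if text in self._inline:
            return self._inline[text]
        return self._base.get(text, text)


base_csv = '"Add to Cart","Ajouter au panier"\n"Checkout","Commander"\n'
translator = Translator(base_csv, {"Checkout": "Passer la commande"})
```

Because both tables are loaded into dictionaries once, each lookup is a single hash access, which is why caching the parsed CSV matters more than the lookup itself.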
Hope that helps!
Thanks,
Joe