How to handle text copy/pasted from Word - forms

We have a textbox where users will be entering reviews. They should be able to do simple formatting such as bold, italics, lists, headers, etc. Our problem is that the majority of our users will most likely create their reviews in MS Word, then copy/paste the text into our form. As you know, this can (and mostly will) cause problems when displaying the data. What is the best way to provide the simple formatting functionality without having problems when they copy/paste from Word?
The best solution would be some type of filter that takes text from Word and strips out everything unneeded or illegal.

You could write an add-in that exposes a single button with the text "Copy for Review". Your add-in would do all the cleanup that you'd want.
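The filter idea from the question can also be applied on the server, after the paste has been submitted. The following is only a minimal sketch using Python's standard `html.parser`; the allowlist, the class name `WordPasteFilter`, and the function name `clean_pasted_html` are assumptions made for illustration, not part of any existing tool. It keeps a small set of formatting tags, drops every attribute (which is where Word puts most of its styling), and discards `style`/`script` content and comments.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: the simple formatting the review form should keep.
ALLOWED_TAGS = {"p", "br", "b", "strong", "i", "em", "ul", "ol", "li",
                "h1", "h2", "h3", "h4"}
# Elements whose content is discarded entirely.
DROP_CONTENT = {"style", "script", "head", "xml"}
VOID_TAGS = {"br"}


class WordPasteFilter(HTMLParser):
    """Keeps allowlisted tags (without attributes) and escaped text only."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # attributes are dropped on purpose

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif (self.skip_depth == 0 and tag in ALLOWED_TAGS
              and tag not in VOID_TAGS):
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data))

    # Comments (including Word's conditional comments) never reach the
    # output because handle_comment is not overridden.


def clean_pasted_html(raw):
    parser = WordPasteFilter()
    parser.feed(raw)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

This sketch does not repair unbalanced tags, so a production setup would more likely rely on a maintained sanitizing library.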

Related

Change fields depending on drop down selection in Microsoft Word design mode

Can anyone point me to info about how to create a Microsoft Word document that changes text input fields depending on what the user selects in a drop-down menu?
I'm using Word, Developer toolbar, Design mode, and have gotten as far as how to create the drop down selection box, and add text input fields below that on the page, but I need to know how to change what fields appear depending on what the selection is. I'm sure it's possible, I just don't know how to go about it.
I'm pretty good with this sort of thing in HTML with JavaScript and jQuery, but Word is its own little world.
I tried the "structured" tab, but it suggests selecting XML add-ins, and none appear in the list to select.
One option is to use a template approach in combination with a third-party toolkit and an external application. The external application takes care of the user interface, where the user selects a template and sets a filter for data retrieval. The application then reads the data, generates a new document based on the template, and populates it with data.
You don't have to mess with MS Word macros, and this solution can survive Office upgrades very smoothly.
Template design is done in MS Word. We are using a third-party toolkit (Docentric Toolkit) for populating Word documents with data.

Space length limitation

I have a Word document file which is a form.
I am trying to complete it. Here is a screenshot of what it looks like.
When I type in the grey box there is a limitation in length and when I reach it, it won't let me type more.
I am not sure of what it is, however I want to insert an image or a table but I can't.
How can I make it?
The field you are trying to enter information into is a Legacy Text Form Field in Word 2010. In order to have a data entry area within the form that will accept text, tables, and images, delete this field and replace it with a Rich Text Content Control. This control is found on Word's Developer Tab:
Instructions for Displaying Word Developer Tab (if needed)
Like the legacy form fields, content controls allow manual or programmatic entry of data as well as the ability to restrict editing of the data within the content control. Gregory K. Maxey has posted an incredibly detailed tutorial on creating forms with content controls, programming the content entry via VBA (Visual Basic for Applications) and restricting editing of the control's contents (all of which is available using the Rich Text Content Control):
Create Forms with Content Controls by Gregory K. Maxey
The same author also has an additional posting on content controls where he provides links to and offers explanations of more advanced content control abilities such as data mapping:
Content Controls (Additional Information) by Gregory K. Maxey
Lastly, Microsoft also provides some guidance on programming content controls via .NET (which I think may be beyond the scope of your question, but which I include for future readers):
MSDN: How to Add Content Controls to Word Documents

E-mail like rich text input

How do I implement an e-mail like text area input? Where the users can upload images, format text, etc.
Like when you ask a question here, it gives you option to format text, input HTML or add images.
You will probably want to use one of the many freely available WYSIWYG editors.
A couple of the more popular ones are:
CKEditor: http://ckeditor.com/
TinyMCE: http://www.tinymce.com/
You can explore alternatives using AlternativeTo.
If your users are more technical, they may prefer a Markdown syntax.

How safe is the data being parsed by RTF editors like TinyMCE?

I have a great concern about deploying the TinyMCE editor on a website. Looking at the code parsed by the editor, it does a great job, and I leave the HTML button off the toolbar configuration so users cannot inject their own source.
However, from what I read in the TinyMCE docs, it claims to degrade nicely to a regular textarea should JavaScript be disabled in a user's browser... and therein lies my concern. If it does revert to a normal textarea, the user is then able to easily inject their own HTML, and this leaves me with a security concern.
I just pass through data created with TinyMCE, and it is used within another page created by my script, so it poses no security risk to my server. The security concern arises over what malicious data may be passed to another user viewing the generated page.
I know many of you will tell me to just use regexes, or parse this data, but that itself could be a nightmare, as I would be trying to either...
a.) Use regexes to try and clean up the HTML without breaking the generated page, and it is better to parse the data for that anyway.
b.) Reparse data that has already been parsed by the RTF editor, which would probably also end up breaking the generated page.
Anyone with any previous experience with this type of scenario, I would really appreciate a 'heads-up' as to any other risks that using an RTF editor for user data could entail.
I would really like to provide this as a user option, but not if the risks outweigh the benefits by giving the user of the RTF a chance to take a whack at another user viewing the page that is generated by the script.
My gut feeling is to give the RTF a wide berth at this point.
Thanks for any direction you can give me with your own experiences.
You cannot have client-side security on the web. You simply can't trust the browser, because it's easy for a malicious user to substitute a replacement browser that does whatever he wants.
If you accept HTML from users (using TinyMCE or through any other method) and display it to other users, you must sanitize or validate the HTML in some way on the server. If you're using Perl, the leading package seems to be HTML::Scrubber (along with various other modules that help you plug it into various frameworks). I haven't had occasion to try it myself.
The TinyMCE Security page mentions some ways to make it harder for people to submit arbitrary HTML, but you still need server-side checks.
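To make the "validate on the server" option concrete, here is a rough sketch in Python (not Perl, and not HTML::Scrubber) using only the standard library. The allowlist, the permitted URL schemes, and the name `validate_submission` are all assumptions chosen for illustration. It rejects a submission instead of rewriting it, regardless of whether the text came from TinyMCE or from a plain textarea.

```python
from html.parser import HTMLParser

# Hypothetical policy: tag name -> attribute names allowed on that tag.
ALLOWED = {
    "p": set(), "br": set(), "strong": set(), "em": set(),
    "ul": set(), "ol": set(), "li": set(),
    "a": {"href"},
}
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class _Validator(HTMLParser):
    """Collects a message for every tag or attribute outside the policy."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            self.problems.append(f"tag <{tag}> is not allowed")
            return
        for name, value in attrs:
            if name not in ALLOWED[tag]:
                self.problems.append(
                    f"attribute {name!r} is not allowed on <{tag}>")
            elif name == "href":
                url = (value or "").strip().lower()
                if not url.startswith(SAFE_SCHEMES):
                    self.problems.append(f"unsafe link target: {value!r}")


def validate_submission(html_text):
    """Returns a list of problems; an empty list means the input passed."""
    validator = _Validator()
    validator.feed(html_text)
    validator.close()
    return validator.problems
```

A server would call `validate_submission` on every posted review and refuse to store or display anything that returns a non-empty list. Browsers parse broken markup differently from this parser, so a maintained sanitizer remains the safer choice for real deployments.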
Regex is generally not considered good for parsing HTML (see "RegEx match open tags except XHTML self-contained tags"), but I have noted the "perl" tag :)
My advice when taking markup from users is to always pass it through something that can accept malformed HTML and return well-formed HTML. These parsers generally produce something that can be queried and updated with some form of XPath.
In Python there is a module called BeautifulSoup, Ruby has Nokogiri, and in ASP.NET there is a project called HtmlAgilityPack; all of them do this sort of thing. I'm not sure what library Perl has, but I'm sure there would be something.
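As a rough illustration of "accept malformed HTML, return well-formed HTML", the sketch below uses Python's standard `html.parser` as a stand-in for the libraries named above; it is not how BeautifulSoup or the others work internally. The names `Balancer` and `balance` are made up for this example. It closes tags that were left open (such as an unclosed underline tag) and drops closing tags that have no matching opener.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID = {"br", "hr", "img", "input", "meta", "link"}


class Balancer(HTMLParser):
    """Re-serializes markup while keeping a stack of open elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        parts = []
        for name, value in attrs:
            safe_value = escape(value or "")
            parts.append(f' {name}="{safe_value}"')
        self.out.append(f"<{tag}{''.join(parts)}>")
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag: drop it
        # Close everything opened after the matching tag, then the tag itself.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def balance(html_text):
    parser = Balancer()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    while parser.open_tags:  # close whatever is still open at the end
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)
```

The sketch only balances tags; it does not decide which tags are safe, so it complements a sanitizing step and does not replace one.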

How to enable copy/paste formatted text from Lotus Notes to TinyMCE?

This question was previously posted to the TinyMCE HowTo Forum with no responses. Here's hoping that someone out there has encountered (and solved) this issue.
The question: Is there some way to enable correct copy/paste of formatted text from a Lotus Notes email directly into TinyMCE?
The scenario: A rolling comments system on a web site, into which users occasionally need to paste rich text from an email viewed in Lotus Notes.
The details:
I have tried copying some formatted text from emails viewed in Lotus Notes (7.0.4, Windows XP) and pasting it into the "Full featured example" implementation of TinyMCE at http://tinymce.moxiecode.com/examples/full.php and found that it generally fails to maintain the formatting. In fact, of the browsers I tested, IE6 fared the best, and the more modern W3C standards compliant browsers were the worst.
Some text formatting I tested was:
larger text
underline
italics
bold
numbered list
bullet list
indented text
permanent pen
font family: arial
font family: times new roman
Results:
-Firefox (3.6.8), Vista or XP: all formatting lost
-Chrome (5.0.375.125), Vista or XP: all formatting lost, including line breaks
-IE6 (XP): some formatting is maintained (fails to copy numbers and bullets for lists, but indents lists properly)
-IETester (IE6) Vista: some formatting is maintained (fails to format lists at all, and the underline tag is not closed)
-IE7 (XP): some formatting is maintained (fails to format lists at all, and the underline tag is not closed)
-IE8 (Vista): some formatting is maintained (fails to format lists at all, and the underline tag is not closed)
If I first paste the clipboard from Lotus Notes into MS Word 2003 (11.5604.5606) it shows perfectly in Word, and if I then copy/paste it from there into TinyMCE it generally works well enough to be usable, although it still loses some formatting, even when using the "Paste from Word" button in TinyMCE. Not surprisingly, if I open my Lotus Notes mail in a web mail client, the HTML mail copies and pastes perfectly into TinyMCE.
Since it shows perfectly in my Domino web client, and pastes perfectly into MS Word, it is obviously possible to copy/paste Lotus Notes formatting.
If anyone has had success with this please mention your Notes and browser versions, and any modifications you had to make to the TinyMCE config.
If you check what's pasted from Word, you'll find that it's pretty much what you'd get if you had done a File->Save As->Web Page in Word: a whole bunch of Word-specific HTML attributes and CSS. Essentially, it's Word's ability to be coerced into exporting HTML that does the trick; Word's rich text alone won't do the job. The Notes clipboard (which is different from the system clipboard) can export RTF to the system clipboard, which then pastes (with limitations) to Word (which can interpret RTF), but a JavaScript widget in the browser doesn't understand RTF.
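Because Word's exported HTML carries recognizable Word-specific attributes and CSS, pasted markup can be checked for those markers before deciding whether a Word cleanup pass is needed. The following Python sketch is an assumption-laden heuristic, not something TinyMCE provides; the function name `looks_like_word_html` and the exact marker list are chosen for illustration.

```python
import re

# Markers commonly seen in HTML that Word puts on the clipboard.
WORD_MARKERS = re.compile(
    r"urn:schemas-microsoft-com:office"  # Office XML namespaces
    r"|class=[\"']?Mso"                  # e.g. class=MsoNormal
    r"|mso-[a-z-]+\s*:"                  # mso-* CSS properties
    r"|<o:p\b",                          # Office paragraph marker element
    re.IGNORECASE,
)


def looks_like_word_html(html_text: str) -> bool:
    """Heuristic: True if the markup contains Word-specific markers."""
    return WORD_MARKERS.search(html_text) is not None
```

A heuristic like this can produce false negatives for other Office versions or for text that has passed through another editor first, so it should only steer optional cleanup, never security decisions.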
You can use the Win32 API to do your formatted copy (e.g. make a special copy button in LotusScript and call it). I have actually done this, and it works fine.
However, will TinyMCE handle the paste operation well? That I cannot tell you.
I have logged this as a bug against TinyMCE.
OK, then either you will need to deactivate the paste plugin and write a plugin of your own, or you will have to configure/change the paste plugin to suit your needs.
"If I first paste the clipboard from Lotus Notes into MS Word 2003 (11.5604.5606) it shows perfectly in Word, and if I then copy/paste it from there into TinyMCE it generally works well enough to be usable,"
The thing is that your OS detects (at least sometimes) which kind of context (plain text, HTML, ...) the copy/paste comes from. That is probably the reason why copying it into Word first helps a bit.