How to build an inline translation system similar to Magento's - zend-framework

I am working on a Zend Framework, MVC, enterprise website project. I would like to develop a friendly translation system with the ability to translate each word according its context (sometimes same word have different translation).
Zend Framework uses Zend_Translate for i18n and localization. We have also seen Magento's (which uses ZF) inline translation system, where users can translate pages directly.
We want to know how this inline translation system works, so that we can build a similar system with improvements.
Where are translations stored: in the database or in CSV files?
How does the system know to fetch translations for the same word when tranlsated differently by the user on different pages?
How should we build a page to support inline translation?
How does the system handle static text vs. dynamic (database-driven) text?
Inline translation seems like it would make the site very slow. How does Magento solve this problem?
Please if you have more points that should be explained, write them. Thanks

Starting from the beginning here (in the future, this is probably more than one logical question):
Magento stores basic translations (provided by the programmer) in CSV files, but inline translations are stored in the database.
Magento's translations operate on entire strings, not words. By providing an entire sentence worth of context for translations, idiomatic translations are achievable. The tradeoff is obviously that every sentence must be translated, rather than every word.
Magento's answer to this is to wrap all localizable strings in a call to the localizer. Magento templates usually look something like this (the double-underscore function maps to the "translate into the current locale" function):
print $this->__("Please translate this string");
Dynamic text (as in product descriptions) in Magento is often not translated, but if you want to do so, it's as simple as passing the right string to the translator, like this:
print $this->__($someString);
It's unlikely that translation will make or break your site (look to your DB queries for most performance problems), but this is a legitimate question nonetheless. Magento does a few things to help. First, it stores serialized versions of the CSV files in a cache, so that reading CSVs is made more efficient. Secondly, Magento offers page caching so that an entire page's output can be stored (assuming that it will render identically), as well as block-level caching for smaller bits of a page. Between these you're in good shape for the most part.
Hope that helps!
Thanks,
Joe

Related

PDF generation from templated Word documents

I have a Word document(some template format) where it containing some placeholders for the data to be filled in and there are several Word documents like this which lies in some directory. When data comes I will be choosing different templates (based on some criteria) and fill the data and the documents have to be converted to PDF format.
I have been investigating Apache POI for this. If anyone has a good suggestion, it would be much appreciated.
As mbeckish mentioned you should indicate how you are going to run/automate this. For example is it one-off, run by hand or part of another program (and if so what programming languages do you use)?
If you are trying to automate it JODReports and Docmosis are tools that can use templates like you require and can produce PDF. JODReports is free. Docmosis is not but has several APIs. Please note I work for the company that develops Docmosis.
Hope that helps.
I've just uploaded this presentation, which presents three approaches for doing this.
Why not use any of existing PDF virtual printers?

How safe is the data being parsed by RTF editors like TinyMCE?

I have a great concern in deploying the TinyMCE editor on a website. Looking at the code parsed by the editor it does a great job, and I leave the HTML button off the toolbar configuration so users can not inject their own source.
However, from what I read in the TinyMCE docs, it claims to degrade nicely to a regular textarea should javascript be disabled on a users browser... and therein lies my concern. If it does revert to a normal textarea, then the user is then able to easily inject their own HTML, and this leaves me with a security concern.
I just pass through data created with TinyMCE, and it is used within another page created by my script, so it poses no security risk to my server. The security concern arises over what malicious data may be passed to another user viewing the generated page.
I know many of you will tell me to just use regexes, or parse this data, but that itself could be a nightmare, as I would be trying to either...
a.) Use regexes to try and clean up the HTML without breaking the generated page,
and it is better to parse the data for that anyway.
b.) Reparsing data that has already been parsed by the RTF editor, which also
would probably end up breaking the generated page.
Anyone with any previous experience with this type of scenario, I would really appreciate a 'heads-up' as to any other risks that using an RTF editor for user data could entail.
I would really like to provide this as a user option, but not if the risks outweigh giving the user using the RTF a chance to take a wack at another user viewing the page that is generated by the script.
My gut feeling is to steer a wide berth around use of the RTF at this point.
Thanks for any direction you can give me with your own experiences.
You cannot have client-side security on the web. You simply can't trust the browser, because it's easy for a malicious user to substitute a replacement browser that does whatever he wants.
If you accept HTML from users (using TinyMCE or through any other method) and display it to other users, you must sanitize or validate the HTML in some way on the server. If you're using Perl, the leading package seems to be HTML::Scrubber (along with various other modules that help you plug it in to various frameworks). I haven't had occasion to try it myself.
The TinyMCE Security page mentions some ways to make it harder for people to submit arbitrary HTML, but you still need server-side checks.
Regex is generally not considered good for parsing HTML
RegEx match open tags except XHTML self-contained tags but I have noted the "perl" tag :)
My advice when taking markup from users is to always parse it through something that can accept mal-formed HTML and return well formed HTML. These parses generally produce something that can be queried and updated with some form of XPath.
In Python there is a module called BeautifulSoup, Ruby has Nokogiri and in ASP.NET there is a project called HtmlAgilityPack that all do this sort of thing. I'm not sure what library perl has, but I'm sure there would be something.

Is there an alternative to open-xml sdk to generate word documents

I'm trying to generate word documents using open xml sdk. When the documents are small this is no problem (and rather easy). When the documents become larger (+500 pages) I notice the peformance (duration, memory usage, ...) goes down significantly.
Googling this problem I came across some posts that point out the same problem. For excel there is a solution with spreadsheetgear.
I would like to know if there is a word alternative to this or if there are other solutions to generate word documents?
Thanks,
Jelle
I've written a blog post series on generating Open XML WordprocessingML documents. The approach that I take is that you create a template Word document, insert content controls, and then write XPath expressions in those content controls to specify the XML to pull from a source XML data file. I've also explored another approach where you write C# code in Open XML content controls. That approach also works.
http://ericwhite.com/blog/map/generating-open-xml-wordprocessingml-documents-blog-post-series/
-Eric
You might look at http://docx.codeplex.com/
On Java, you could use docx4j. If you were brave, you could create DLLs for it via IKVM...
I decided to go with Aspose Words. It is really fast and not very demanding on resources (CPU, memory). It has the disadvantage that it is quite expensive. I also investigated Softartisans Office writer. The posibilities are the same but due to fact that the company I'm currently working for already used other Aspose components we decided to go with Aspose Word.

Is there a Platform-independent Web-based replacement for Word Templates?

The above Title is my Manager's words, not mine. :)
This is a follow-up to a question that I posted previously. After reading my assessment on the impacts of converting Word Templates from PC to Mac, I have now been asked to investigate whether Word Templates can be replaced with a "Platform-independent Web-based solution" (her words, not mine). She has suggested using Adobe Forms (ie. Adobe Designer).
Personally, I think the only truly platform-independent web-based solution is text files or html forms. What do other people think?
It's called WordprocessingML (aka. WordXML, WordML)...
Overview of WordprocessingML [Word 2003 XML Reference] at http://msdn.microsoft.com/en-us/library/aa212812(office.11).aspx.
MSDN Search for "WordML" at http://social.msdn.microsoft.com/Search/en-US?query=WordML&ac=3
It could be called XForms...
The Web was suppose to be platform-independent electronic documents. In other words, if you truly want platform-independence, then I agree with you and your forms should be in HTML. Yet, HTML forms are really not a good development platform. That is why Adobe, Microsoft, and others provide "form" solutions. XForms is an attempt to make developing and using HTML forms more flexible, overcome its limitations, and provide a platform-independent object model for completing HTML forms. You might want to look at XForms at http://www.w3.org/MarkUp/Forms/.
But, I wouldn't call it PDF
In my opinion, working with PDF files is difficult. I have not looked at the file format specification, but I heard it is not trivial. Moreover, you need a custom editor and you are locked into one vendor, which is Adobe. (Yet, there are other open-source and vendors who support the file format.) Adobe is not know for creating programs that are easy to use.
My Suggestion
If you are already using Word, then moving to WordML should be fairly easy. You can easily convert your existing Word documents into WordML by simply saving them as XML from the Save Dialog; therefore, you can automate this process through code. In addition, I believe WordML supports form templates (the actual form) and data documents (the actual data for a form).
It's called PDF...
At the core (and without the million of extra unnecessary features" that's exactly the niche that Adobe PDFs were designed to fill.
I'd suggest you look more into Adobe Acrobat Professional for more info. Although, I don't think there's any good way to directly convert Word docs to PDF format.
Note: This question should be moved to Super User since it's not really programming related
Google Docs meets those requirements of a Platform-independent Web-based solution. Your mileage will vary with Google Docs though - if you just want to use it for letters, it's good. Much beyond that, it's rather limited. Unless you get the Premier (read: Corporate) version which you have to pay for, you won't be able to programmatically fiddle with the templates.
If you want a "Platform-independent solution", go with ODF or OOXML. You can make either "web-based" to your hearts content - maybe with HTML5 or another solution such as Flash or Silverlight.

Interactive PDF Creation Alternatives to Acrobat?

Are there any good alternatives to Adobe Acrobat for creating interactive PDFs? The terminology is a little fuzzy here - by interactive, I mean "able to be filled in", and not necessarily "scriptable". So this form would be for data collection, rather than report generation which seems to be the common scenario for pdf-related questions on SO.
The trick is that they need to be fillable using Adobe Reader. For those who have not experienced the many frustrations of Acrobat - by default, Reader cannot fill in a form unless it was created using Acrobat Pro >8.0 and has specifically enabled usage rights. That's fine and it basically works (except then Pro users can't save their data - WTF?).
Because I am getting frustrated, I would ideally like to avoid Adobe products altogether (that is on the design side, for the users Reader is still a necessity or I would just do it as a db-backed web form). I'm wondering if anyone has has good experiences with alternatives? Either software libraries or products?
Thanks!
EDIT - Thanks, matt b - I'd seen iText before but didn't know it could create forms. Unfortunately, it looks like Reader cannot save filled-in data to the forms generated by iText (or generated by OO Writer). I've got the nasty feeling that what I want is fundamentally impossible except using Adobe's own rights management tools. If there are other ideas. I'd love to hear them.
You can create fillable form PDFs using OpenOffice.org as well as LibreOffice.
To create the initial form elements in the *.odt documents, enable the View --> Toolbars --> Form Controls tools, which allow you to add clickable checkboxes + radiobuttons, fillable text fields, pushbuttons and some more to the page(s).
When you're finished with your document, use File --> Export as PDF with the checkbox Create PDF form enabled.
Now your PDF form will be editable (and saveable!) with any non-Adobe PDF viewer.
NOTE, however: Adobe uses an own proprietary way to create and fill PDF forms. Adobe Reader does only support to fill PDF forms which were created by an Adobe product (and which have been assigned 'extended rights' so Reader can indeed save the formdata alongside the document).
Adobe Reader will not work with PDF forms you created with OpenOffice.org or LibreOffice ('work' in the sense of: 'allows you to fill+save the form data'.). The technical mechanism behind this is that Adobe digitally sign their form documents with their own key (which is known to the Adobe Reader, and which you agreed to not reverse engineer when you accepted the Adobe Reader EULA...). --
This means:
Non-Adobe PDF Readers will not be able to 'fill+save' forms created with Adobe products (they can 'fill+print' them however).
Adobe PDF readers will refuse to 'fill+save' forms created with non-Adobe products (they will 'fill+print' them however).
The latter two points will be true for all the tools and utilities mentioned in the other answers to this question. If I'm mistaken here, please let me know in a comment...
iText is pretty much the standard in the java-world for generating PDF files programmatically. Perhaps it can also be used to create PDFs with forms in them as you would like?
The open source page layout tool Scribus has a bunch of features oriented to creating interactive PDF forms. I haven't personally used them, but they appear reasonably complete and are covered by the tutorial.
Scribus is worth knowing about if you ever need to do serious page layout in any case.
XSL FO is some thing we used to create PDF files out of existing form data. Unless you want the fillable pdf to be sent out the client, this is a valid option.
IText lets you create Annotations (there are essentially 3 types of 'interactive' components - forms (old style FDF and new XFA) and Annotations. Acrobat and lots of third party tools should let you modify the Annotations values.
There is also a DotNet version of IText called ISharp - both are freeand extremely powerful.
CutePDF Pro allows you to turn a PDF into an interactive form.
Foxit reader allows you to save any pdf with the filled in fields.
I recently dabbled with Scribus. I found it to be an excellent tool if one has enough time to configure and play around with it. I highly recommend it. Wufoo is also very good.
I am not a fan of Acrobat / Adobe. A software should make my life easier not challenge me at every step.
If you search the net with these keywords - FREE FORM CREATOR and you can add the word HTML5.
You will find an array of sites where you can log online and all your clients can have their separate login, fill in data and the form remains in the Cloud and declutter your hard drive. All stakeholders can access the form and edit at anytime. The account can be used as a folder for your business. These forms can be accessed on any device and any platform.
Many of these forms are HTML5 driven, they are so beautiful and fluid. Keep away from macros, they carry viruses.
www.homebasedofficeservices.com