Create a docx (Word) document by using Perl (module) - perl

I have been looking for some time now, and I decided to try some crowd sourcing.
I have searched (Googled) for the answer and looked through Stack Overflow for some time now, and I cannot find a proper and relatively easy way of creating DOCX documents via Perl.
I want to create a DOC file, and since DOCX is XML based, I was guessing that would be an easier way to achieve this.
I located an RTF::Writer module, but it's very limited in its capabilities.
There is more than one such library for PHP and other languages, but I cannot use those, unfortunately.
I am not running in a Windows environment, so I cannot use anything that would integrate with Office. In addition, I don't want to start bundling Office with my product.
I am open to suggestions, but please provide sensible ones :) i.e. no, you are scr*wed DOCX is impossible.
Here is what I tried:
1) Took an existing DOCX and modified the XML directly. All I achieved was causing Word to crash :) Apparently Word is very sensitive about its attribute order
2) Googled for answers and found some, like Win32::Word::Writer, which only works on Windows and requires OLE and Office
3) Found a lot of posts from 2010 that say it's impossible. Almost 4 years have passed, so something that can do it is probably out there by now
4) Looked for commercial solutions and couldn't find one. I found FOP, which is able to create RTF, which is pretty close, but it lacks a lot of the styling I would like to use
5) Found a lot of things (code and modules) that allow extracting data from DOCX, but nothing that can create one, which is weird
6) Found abandoned code like OpenOffice::OODoc, which stopped being developed in 2010, of course requires OpenOffice to be installed, and potentially also requires a non-headless system (i.e. one with a GUI)
Thanks guys for any answers :}
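To make the "DOCX is XML based" idea concrete: a .docx file is a ZIP archive holding a few XML parts. The sketch below is in Python purely to illustrate the layout (it is not a Perl solution), and the function names `document_xml` and `write_docx` are made up for this example. It writes the three parts a minimal package needs and nothing else (no styles, no images), so treat it as a starting point rather than a complete generator.

```python
import zipfile
from xml.sax.saxutils import escape

# Declares the content type of each part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Points the package at its main document part.
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def document_xml(paragraphs):
    """Return the main document part with one plain paragraph per string."""
    body = "".join(
        '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>' % escape(text)
        for text in paragraphs
    )
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>' % (W_NS, body)
    )


def write_docx(path, paragraphs):
    """Write the three required parts into a ZIP archive at `path`."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        archive.writestr("_rels/.rels", RELS)
        archive.writestr("word/document.xml", document_xml(paragraphs))
```

The same three parts could be written from Perl with any ZIP module; generating the XML from scratch, instead of editing XML that Word saved, also avoids having to match Word's own output.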

One cheat that I've used in the past is to output HTML with a ".doc" file name.
This gives you less fine-grained control over the document formatting, but may be sufficient for your use case.
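A minimal sketch of that cheat, in Python for illustration only. The helper names `html_as_doc` and `write_doc` are hypothetical; the whole trick is that the file contains ordinary HTML and simply carries a ".doc" name.

```python
from html import escape


def html_as_doc(title, paragraphs):
    """Build a small HTML page; formatting is limited to what HTML offers."""
    body = "\n".join("<p>%s</p>" % escape(p) for p in paragraphs)
    return (
        '<html><head><meta charset="utf-8"><title>%s</title></head>\n'
        "<body>\n<h1>%s</h1>\n%s\n</body></html>\n"
        % (escape(title), escape(title), body)
    )


def write_doc(path, title, paragraphs):
    """Save the HTML under whatever name is given, e.g. 'report.doc'."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(html_as_doc(title, paragraphs))
```

Calling `write_doc("report.doc", "Report", ["First paragraph."])` produces a file that is plain HTML inside.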

The closest I've ever managed is to generate an OpenOffice document and then use that to export as .docx (in headless mode).
You need some fonts installed, but no GUI for this. I use OpenOffice::OODoc, and it's enough to let me open up an existing document and add text/pictures.
The OpenOffice (LibreOffice) export process is not 100% reliable, but I've never been able to get a simple, repeatable test case to reproduce it - just hangs occasionally. I add a timer to kill the process and let it retry.
Not a perfect situation, I'm afraid and I hope someone has a better solution.
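The "timer to kill the process and retry" part can be sketched as follows. This is Python for illustration, not the code actually used in this answer; `run_with_retry` and `convert_command` are hypothetical names, and the command line assumes a LibreOffice install that provides the `soffice` binary with its `--headless` and `--convert-to` options.

```python
import subprocess


def run_with_retry(command, timeout_seconds, attempts=3):
    """Run a command, killing it if it hangs, and retry a few times."""
    for _ in range(attempts):
        try:
            result = subprocess.run(
                command,
                timeout=timeout_seconds,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed the hung child; try again.
            continue
        if result.returncode == 0:
            return True
    return False


def convert_command(source_path, out_dir):
    """Assumed headless export command; adjust to the local install."""
    return ["soffice", "--headless", "--convert-to", "docx",
            "--outdir", out_dir, source_path]
```

A call such as `run_with_retry(convert_command("report.odt", "out"), 60)` returns False if every attempt hangs or fails, so the caller can report the problem instead of waiting forever.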

Related

Need a solution for file conversion in Progress 4GL from .htm to .xlsx format

I have to load a .htm file and save it in .xlsx format, automated from Progress. Need a solution to solve this!!
AFAIK there is nothing in Progress that will help you do this. If I had to do this, I'd look to Apache POI, which is capable of creating .xlsx fairly cleanly and has a reasonable learning curve, although it is picky about data coming into it and its error messages are typically obtuse. Using Progress directly, you could parse the .html (painfully), but creating .xlsx yourself is probably unrealistic. So, I'd also hunt around for any tools that can do this directly. Good luck.
What O/S and Progress version are you using?
If the HTML is well-formatted, perhaps you could use XML parsing to assist. I have never tried this, though.
As for writing an Excel document, there are many approaches. If you are running on a Windows box that has Excel on it, there are solutions that allow you to call the Excel libraries from inside of Progress. If you need something portable, your choices are fewer. We use ABL_xks.i, which works under both Linux and Windows. It uses the native libraries under Windows, and produces an Excel XML spreadsheet under Linux.
As I recall, there exists a library that allows you to edit an OpenOffice template (Word or Excel) from within Progress. (I would have to go searching for this, but a good place to start looking might be the OpenEdge Hive). And there are several commercial packages (especially report generators) that range from using the OO template technique to full automation of Excel output.
If this doesn't point you in the right direction, fill us in with some more details about what you want to do.
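As a rough illustration of the parsing half of the job, here is a sketch that pulls table rows out of HTML. It is written in Python rather than Progress, uses only the standard-library HTML parser, and the names `TableRows` and `extract_rows` are invented for this example. It assumes the .htm file holds a simple, non-nested table, and it stops at a list of rows; writing the .xlsx itself is left to a library such as the ones mentioned above.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html_text):
    """Return the table contents as a list of rows of cell strings."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```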

Gulp Inject / WireDep for Microsoft Word?

OK, so this is a bit out there, but as little as 5 years ago a minified js file was an oddity. Today it is common and expected.
So when you look at how we compile js files into one large one, in the correct order, wiring up dependencies and all of that, how come we don't have anything like this for MS Word?
My vision is this:
40 chapter book, each chapter in its own file. Pictures in their own file, and a Table of Contents that is automatically generated on "build". A glossary that is automatically generated on "build". Templates are used to enforce conformity even though multiple authors contribute. Clickable references resolved (think Chapter 1 Heading X as being resolved).
Anyone? How would I even search for that in Google?
EDIT:
I have solved this problem in the past using home made software and RTF. Even in the early 2000's using XML and XSLTs. Pretty neat, but really hard to maintain. With large documents never going away, how do the big boys handle this? I can't imagine everyone has self written software to do this, or worse, letting MS Word handle this entirely.
Using the information about TeX I found a compelling project that seems to be exactly what I'm looking for:
Pandoc-Seed-Project
It uses Gulp, Pandoc, and a very similar interface for us web developers.
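The "build" step described in the vision above can be sketched in a few lines. This is not how Pandoc-Seed-Project works internally; it is a hypothetical Python illustration that assumes each chapter is a Markdown string whose first `# ` line is its title, and the names `chapter_title` and `build_book` are made up.

```python
def chapter_title(markdown_text):
    """Return the first level-1 heading of a chapter, or a placeholder."""
    for line in markdown_text.splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return "Untitled"


def build_book(chapters):
    """Join chapters in order and prepend a generated table of contents."""
    toc = [
        "%d. %s" % (number, chapter_title(text))
        for number, text in enumerate(chapters, start=1)
    ]
    parts = ["Table of Contents", "\n".join(toc)]
    parts += [text.strip() for text in chapters]
    return "\n\n".join(parts) + "\n"
```

Because the table of contents is derived from the chapter files on every build, adding or reordering chapters never requires editing it by hand.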

Any way to synchronise between Redmine (or other issue trackers) and a plain text todo list?

I would like to access a Redmine taskbase via a simple text based interface - wondering what the shortest path would be (minimum investment/development).
Right now, this boils down to 2 use cases/phases:
Import a batch of tasks into Redmine from a simple, wiki-based, bulleted TODO list, i.e. plain text content. This is more of a one-off task, so a quick and dirty solution would be fine.
Later, some smooth two-way synchronisation would be great. E.g. edit loads of tasks via some friendly plain text (or XML) in an editor, or by scripting, where I could manipulate all of them with simple text processing; then synchronise with Redmine and commit them back.
Any ideas on the easiest way to achieve these?
I'd prefer an external solution (i.e not touching the server), especially for the one-off import case; something like a neat IDE/editor/client, or a standalone Ruby script (e.g using the RM API).
If an appropriate RM plugin would be available, I would not resist giving it a try (can get root access from our lovely IT support:)..
Current ideas:
Emacs/Org-mode looks like a great combination of a cool task manager UI and full plain text power. It seems rich enough to capture tags and states as well. This article looks promising: Orgmode and Roundup: Bridging public bugtrackers and local tasklists, although it is not exactly a perfect match.
An org-mode parser in Ruby could be used in a script with redmine-api access, or, worst case (for me, right now), in a newly developed RM plugin. This looks like a good start: org-ruby
export RM->XML, process file, import XML->RM... not sure if this is supported?
I guess it's always possible to talk to the DB directly, but I'd prefer to avoid that.
Actually, I'm also interested in a similar solution for Bugzilla.
At the simplest level, you could write a RM/Rails plugin that parses an Org-Mode task list, updating corresponding issues in the RM Model.
Equally, you can build a view for Redmine (again as a Rails plugin) to generate an org list of the current (or subset of) issues.
For Bugzilla I think you would be best off using the XML-RPC interface to do your issue comparison/update sync, so you'd have to take a very different approach from Redmine.
If you have any specific questions, please update your question, it's quite broad at the moment.
Update
At the moment, there are a few plugins which will probably help you figure out your solution, for example Nick Bolton's xml import and Martin Liu's Redmine CSV Import Plugin, but neither of these is going to completely solve the problem for you; they just give you a useful starting point.
On the other hand, if you write a script that interacts with Redmine's REST API, you don't need it to be in any specific language. In fact you could do it in Emacs Lisp: if the target users of the script are all Emacs aware, then this might well be the best way to do the job. (It would certainly be the most appealing option to me.)
Maybe this can be useful: https://github.com/fukamachi/redmine-el
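For the one-off import case, a script against the REST API could look roughly like this. It is a sketch in Python (any language would do, as noted above), with hypothetical helper names `parse_todo` and `issue_request`. It assumes Redmine's REST API is enabled, that issues are created by POSTing JSON to `/issues.json` with an `X-Redmine-API-Key` header, and that the TODO list uses `-` or `*` bullets. The request is only built here, not sent.

```python
import json
import re
import urllib.request

# A bullet line: optional indent, "-" or "*", then the task text.
BULLET = re.compile(r"^\s*[-*]\s+(.*\S)\s*$")


def parse_todo(text):
    """Return the task text of every bulleted line, in order."""
    tasks = []
    for line in text.splitlines():
        match = BULLET.match(line)
        if match:
            tasks.append(match.group(1))
    return tasks


def issue_request(base_url, api_key, project_id, subject):
    """Build (but do not send) the request that would create one issue."""
    payload = {"issue": {"project_id": project_id, "subject": subject}}
    return urllib.request.Request(
        base_url.rstrip("/") + "/issues.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Redmine-API-Key": api_key,
        },
        method="POST",
    )
```

Passing each request to `urllib.request.urlopen` would perform the import; two-way synchronisation needs more than this, since it has to match existing issues to list entries.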

What is the best Perl module to use for creating a .pdf from scratch?

There are quite a number of modules on CPAN relating to the creation and manipulation of .pdf files, and I'm hoping this community can save me some time going down blind alleys.
I am looking to create .pdf files from scratch, with only simple formatting such as bold/italic and left/right/center justify. Being able to use a template file would be nice, from an MVC perspective, but if the best module doesn't support that, I'm ok. I want the best module for my narrow problem set.
Edit: let's add the constraint that it does have to be a Perl module, if not a pure-perl solution. Thanks for answers thus far!
Update: PDF creation is one difficult problem to decide how to approach. In addition to the good suggestions here, there seems to be about 1,000 different ways to solve this, and knowing which solution(s) to invest your time in is a real challenge. It is easy to acquire dependencies on outside executables in the process of building this solution, which is why I have been favoring doing everything in Perl if possible.
I went down the road of trying to use PDF::Create but found it too limiting. You have to give coordinates to place each string of text and there is no built-in concept of text wrapping... this is all work you have to do. Impossible amount of overhead for my task.
I am now using PDF::API2, which is much more powerful than PDF::Create, but still demands the PDF be assembled at a troublingly low level. Luckily, there is some help online. See Rick Measham's excellent PDF::API2 tutorial with accompanying text_block() subroutine, which thankfully does the heavy lifting on the text wrap problem.
Unless you see another update here, this is the solution that ended up working for me.
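To show what "doing the text wrap yourself" involves, here is a sketch of greedy word wrapping followed by line placement. It is written in Python for illustration and is not the text_block() subroutine from the tutorial. The names `wrap_text` and `layout_lines` are hypothetical, and `measure` defaults to counting characters as a stand-in for a real font-width lookup.

```python
def wrap_text(text, max_width, measure=len):
    """Greedily pack words into lines no wider than max_width."""
    lines = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and measure(candidate) > max_width:
            # The word does not fit: close the line and start a new one.
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def layout_lines(lines, x, top_y, leading):
    """Assign each line a coordinate, moving down the page by `leading`."""
    return [(x, top_y - index * leading, line)
            for index, line in enumerate(lines)]
```

The y coordinate decreases from line to line because PDF page coordinates start at the bottom-left corner. A word wider than `max_width` is placed on a line of its own rather than split.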
I'm the author of the CPAN module CAM::PDF which is definitely not the best tool for this job -- it's designed for high-performance editing, not creating.
Among free PDF creation libraries, I like PDF::API2 the best. It has a very rich feature set and good encryption support (inspired by CAM::PDF I might add!) The author, Alfredo, manages a popular email list. People sometimes complain about documentation, but I've found it to be adequate.
Among commercial libraries, I've had good experiences with pdflib.
Three modules for creating PDF come to mind (in no particular order)....
PDF::API2
PDF::Create
PDF::Template
PDF::Template gives you that template option you may be hankering for. PDF::Create seems more straightforward (at least from the docs) and may meet your "simple formatting" requirement more adequately.
However if you want to know what the "community" thinks then only PDF::API2 gets a rating on CPAN Ratings coming in with 4 out of 5 stars overall score.
Hope that helps.
PS. Disclaimer: I've not used any of these modules. In past I've always gone for XML/XSLT/XSL-FO using Apache FOP with Perl being used to create the initial XML data. This can be an overkill for something small and not always ideal if you want to embed PDF generation into your Perl app.
PPS. So I'll also be looking at these CPAN PDF modules at some point in near future!
Does it have to be a Perl module? You could always use LaTeX and convert that to PDF. Not quite as straight-forward, but it is another option.
G'day Marcus,
Glad you found the tutorial. I do a lot of work in PDF::API2, so if there's anything I can help with, just let me know.
Naturally, I recommend PDF::API2!
There's a guy Jay Hannah, who's currently turning the text block into a module for CPAN that does exactly what you want: bold, italic, etc. If you check the mailing list, you'll see his posts at the top.
Cheers!
Rick Measham
Yeah, tough to answer without knowing exactly what your constraints are. If pure-Perl is not a necessity, I'd be inclined towards DocBook.
The initial markup you'll generate can be very simple XML; and the transformation requires just an XSL processor and shelling out to something like Apache's FOP.
How to save an online PDF file using Perl?
http://www.nwcc.bc.ca/FNC/pdfs/Stepping%20Stones%20to%20improved%20Relationships%20-%20web.pdf
I am using File::Download, but the problem is that it's not downloading URLs that contain URL-encoded strings.
sharma

MS Word is evil! Is there a good alternative? [closed]

As a developer I really don't like writing documentation but when I have to I'd like to make the process as painless as possible.
The problem with Word is that it constantly gets in my way. I worry more about the layout than about the actual content ... that's why I'd like to get rid of Word.
Ideally I'd like to write my content and then 'compile' it into a document.
I've heard of LaTeX but I don't have any experience with it whatsoever. Would this be the right technology for the job? What editor (Windows) should I use? Is it a good idea to start with LyX?
EDIT: I'm not asking about documenting code (I use Sandcastle for that).
Update 2014:
We have now switched to GFM (GitHub Flavored Markdown).
It's really easy to work with.
Write code & documentation in the same IDE!
Everything can be versioned!
Get great output either as raw txt, html or pdf!
My solution to this was to invest some time in creating a decent Word Template for myself.
The important thing to do is make sure you have a Style defined for everything you can put in the document.
Once you have all the Styles defined and all of the document content tagged with the correct Style instead of formatted in an ad hoc fashion, you'll be surprised how easy it is to produce good looking Word documents quickly every time.
The wider problem here is that everyone spends hours in Word and yet it is very rare for companies to invest in Word training. At some point you have to bite the bullet and take the time to teach yourself how to use it properly, just like you would with any other tool.
Anything you can do with LyX you can do with LaTeX. LaTeX is suitable for all sorts of things; it has been used for everything from manuals to lecture slides to novels.
I think LaTeX is probably worth looking into as an option; if you've ever wanted to "code" for your word processor, LaTeX is for you. At the simplest level you can define new commands to do things for you, but there's a lot of power there. And the output looks really neat.
In my opinion, LyX is fantastic in certain circumstances, handy in others, and occasionally just gets in your way. I think it should be seen as a productivity booster for LaTeX. In other words, learn to use LaTeX before trying LyX. Both are of course free and available for Windows, though the learning curve is quite steep compared with MS Word. For long documents, or plenty of similar documents, LaTeX/LyX is probably a worthwhile investment.
I've found that wikis can be good for this. Find a wiki you like that lets you do a bit of formatting, but nothing really heavy. Ideally it should let you format code easily too - to be honest, the markdown available on SO is probably a good start.
That way:
You have change tracking built-in (assuming a decent wiki)
You can edit from anywhere
Everyone always sees the same documentation (instant distribution)
You can concentrate on content instead of formatting
You could write your documentation using your own XML format and then transform it into any format with XSL (e.g. PDF via FOP+XSL-FO ).
See also the DocBook XML format.
LaTeX is an extremely powerful tool and might well be overkill here as it is designed for scientific/mathematical literature. It has a (relatively) steep learning curve and can be tricky to coax to do exactly as you want if you're new to it. I LOVE LaTeX, but it is not really a general purpose word processor.
Have you considered OpenOffice instead?
LaTeX is really a very powerful language if you need to write documents.
Perhaps you can try texmaker, a cross-platform LaTeX editor:
Texmaker is a clean, highly configurable LaTeX editor with good hot key support and extensive Latex documentation. Texmaker integrates many tools needed to develop documents with LaTeX, in just one application. It has some nice features such as syntax highlighting, insertion of 370 mathematical symbols with only one click, and "structure view" of the document for easier navigation.
What about using HTML? This way you could then publish the documentation if there will be need for many people to access it from many places.
Despite all efforts and reasonable expectation I don't think Word Processing has been "solved" yet.
My response to what I also personally find a deeply frustrating experience with MS Word is to avoid it altogether and use an auto-documenting tool like GhostDoc to generate XML from what I've already written in the code (DRY!) and deal with the XML from an XSLT based intranet site or similar later.
Are you talking about documenting your actual code? If so, I recommend Doxygen for unmanaged code and Sandcastle for managed code. Both will compile your help or build it as a website for you.
Both applications will read special tags above functions / classes / variables and compile that into the help.
Well I've never found anything wrong with MS-Word in the first place. (i.e if you take the time to know how to use it effectively). OpenOffice indeed is an amazing & credible free alternative - but then if you hate MS Word for layout related problems, the same problem is gonna occur with OpenOffice too.
Never tried the LaTeX system myself, but have heard it's good for scientific work. I think using some HTML WYSIWYG editor would be best for you, if you want to just focus on the content.
I considered a wiki, but I decided to go with a modified Markdown notation, for the simple reason, that a wiki's content isn't easily exported and distributed outside of the wiki itself, while the Markdown can be rendered into HTML.
Answer to chris' question about my workflow: I write the documentation with a Notepad-like application (TextWrangler, only because of its word-wrapping feature) in its raw Markdown format. Then I have a small localhost documentation website with my modified Markdown parser (extended for a few features and a bit more HTML-oriented functionality) that checks for the timestamps for the documentation files - if a file has been updated, it parses that file into HTML, and stores the file in a cache.
This way I'm able to edit the source documentation on my desktop, and just press F5 in my browser to see the results immediately.
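The timestamp check at the heart of that workflow can be sketched as below. This is a Python illustration, not the actual site code; `is_stale` and `cached_render` are hypothetical names, and `render` stands for whatever Markdown-to-HTML function is in use.

```python
import os


def is_stale(source_path, cache_path):
    """True if the cached HTML is missing or older than its source."""
    if not os.path.exists(cache_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(cache_path)


def cached_render(source_path, cache_path, render):
    """Re-render only when the source changed; otherwise serve the cache."""
    if is_stale(source_path, cache_path):
        with open(source_path, encoding="utf-8") as source:
            html_text = render(source.read())
        with open(cache_path, "w", encoding="utf-8") as cache:
            cache.write(html_text)
    with open(cache_path, encoding="utf-8") as cache:
        return cache.read()
```

Each page request calls `cached_render`, so an unchanged file costs only two timestamp lookups and one read.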
I haven't got around to trying it yet, but I've always thought AsciiDoc would be good for this kind of thing.
If you want something simpler than LaTeX, you can have a look at ReStructured Text
Read this book: http://en.wikipedia.org/wiki/The_Pragmatic_Programmer . One of its central ideas is that documentation should be built automatically. Think about using your IDE for this, or look for some additional tools. Most modern languages support generating documentation as you write the code, which keeps your docs in step with the latest changes in the code.
I prefer to use an RTF editor, which is a lot less clunky than Word. This way the formatting and all the headers/footers nonsense will not take up half your time. WordPad has worked for me on several occasions. I'm stuck with Word for now though :(
there are a lot of possible ways:
embedded documentation, e.g. javadoc: good for describing APIs, not so good for the "big picture"
plain html: can be checked in under version control, a definite plus
a wiki, e.g. confluence -- great for collaboration, but has version control different from your source
LaTeX or somesuch: better suited for books or papers than typical documentation; support for graphics is cumbersome
an Office clone, e.g. OpenOffice: mostly the same as Word+Visio, but open source, with a nicer document format
I usually document the software structure (the "metaphors" of a project, component interrelations, external systems) up front, using Visio, in "freeform" UML. These are then embedded in confluence, which can be converted to PDF if someone wants a printout.
LyX
LyX is a WYSIWYM front end to LaTeX: You get the convenience of a document processor (somewhat similar to Word) with the consistency and power of LaTeX: It doesn't get in your way and can do a lot of things that professional writers need.
Note: The correct answer for you really depends on your way of thinking --- we can't decide this for you. This answer simply shows an excellent choice if you think of documentations as documents and want something similar to Word (where Word is good) that doesn't suck as Word (where Word is bad for programmers).
But many programmers think of documentation differently and hence prefer different metaphors. I myself had the same problem years ago, worked with LaTeX (as I am a mathematician), found LyX and finally settled on a Wiki/Source system that I wrote myself.
Vim is the solution for anything that means writing plain text in the most efficient possible way. If you need formatting, then use XML, Latex or something similar (in Vim).
Vim changed my life!
Simple answer: LaTeX sounds like just what you are looking for.
I use it for writing documentation myself. I will never go back to Word if I have the option.
At phc, we started with latex, then moved to docbook, and have settled (permanently I hope) on Restructured Text/Sphinx.
Latex was chosen because we are academics, and latex is the tool of choice. I believe it didn't generate good enough HTML.
Docbook was chosen for power, but it was very unwieldy. It put us off writing any documentation: code had to be manually formatted, we kept forgetting the syntax, and it was difficult to read. The learning curve was also steep.
Finally, we moved to reST, using Sphinx, and that was a great decision. Documentation is now very easy to write, and both PDF and HTML versions look beautiful (though the PDF could do with some customization). It's very easy to customize too.
The best bit about reST, though, is that it's human readable in source form. That is a wonderful advantage. I've switched to using reST for all my stuff now, especially anything over the web (except of course academic papers, where one would be foolish to use anything but LaTeX).
You may want to look into doxygen at http://www.doxygen.nl/, see their nice examples. In this case, the documentation is presented by tags in comments in the source.
Another option would be to use an online system like trac from http://trac.edgewall.org/ which is a wiki/doc/issuetracking system that lives on top of subversion.