docx - markup / markup - docx conversion - ms-word

I have to store some documents in the docx format, but I can't stand using MS Word. I would like to edit some kind of plain-text markup (anything except XML-based formats, which I don't like either) and convert between that markup and docx in both directions.
Are there any options for this?
EDIT: since people think this is not programming related, I'll extend my question. What libraries do you suggest for writing a complete tex-docx/docx-tex converter?

If you're talking about .NET, I'd check out the OpenXML toolkit first. There are lots of "libraries" on the internet to do this, but they all seem to be thin wrappers around the OpenXML stuff.
You might also check out
http://openxmldeveloper.org/
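To make the "thin wrappers" point concrete: a .docx file is a ZIP package of XML parts, so very simple documents can be produced without any library at all. The following is a minimal sketch in Python (my choice of language, not something the answer above uses). The function name build_docx and the rule "a blank line separates paragraphs" are assumptions made for illustration, and the output contains unstyled paragraphs only.

```python
import io
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Declares the content type of each part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Points the package at its main document part.
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)


def build_docx(text):
    """Return the bytes of a .docx with one paragraph per blank-line-separated block."""
    blocks = [b for b in text.split("\n\n") if b.strip()]
    body = "".join(
        # Collapse line breaks inside a block and escape &, < and >.
        "<w:p><w:r><w:t>%s</w:t></w:r></w:p>" % escape(" ".join(b.split()))
        for b in blocks
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>' % (W_NS, body)
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("[Content_Types].xml", CONTENT_TYPES)
        z.writestr("_rels/.rels", RELS)
        z.writestr("word/document.xml", document)
    return buf.getvalue()
```

Writing the returned bytes to a file with a .docx extension should give something a word processor can open. Headings, lists, styles and the reverse direction (docx back to markup) are where the real work starts, which is why a full converter is usually built on an existing toolkit.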

Aspose.Words for .NET allows you to create DOCX files from scratch using text or other content and then convert DOCX files to text etc. It doesn't require MS Office to be installed on the system. And the component is a simple .NET assembly with an easy to learn and implement API. Please try and see if it helps in your scenario.
Disclosure: I work as developer evangelist at Aspose.

You can try the DocxEditorKit http://java-sl.com/docx_editor_kit.html
Set the editor kit to JEditorPane, add styled text and store the document in docx format.

Related

Can I convert .docx Word documents using the DocX .NET Library?

I am currently attempting to convert a couple of .NET desktop applications that I have developed into a web application harnessing AngularJS and RESTful services.
One of the key components of these applications is in their ability to generate Word documents on the fly using a .dotx Word template. I am currently exploring the possibility of using a third party library called DocX to generate these Word documents without resorting to using a template.
I guess my question is: Can I use this library to read an existing Word document in .docx format and generate a source code representation of the document? If this is possible could someone point me in the direction of any code samples that I could use? I have looked around and have been unable to find anything that could help me get started.
Generating a code representation of the document and using it with DocX seems like a time-consuming effort to me. Why not use a template instead and fill it with data at runtime?
I have some experience with Docentric, which is a third-party OpenXML toolkit. It features a Word add-in for template design and libraries for document generation and manipulation. It took me less than a week to generate pretty complex documents. If I were in your shoes, I would definitely try some third-party toolkits. They cost money but save time, so do the math and see if they can be useful for you.
It is possible to read an existing Word document in .docx format with the following code:
DocX document = DocX.Load(filename);
However, the library cannot generate a source code representation of a document.
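For anyone who still wants to try a do-it-yourself generator, the first step would be walking the document's paragraphs. Here is a rough sketch of that step in Python using only the standard library, not DocX. The function name docx_paragraphs is hypothetical, and the sketch ignores formatting, tables, headers and everything else a real generator would have to handle.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace in ElementTree's {uri}tag notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(data):
    """Return the plain text of every paragraph in the main document part.

    `data` is the raw bytes of a .docx file.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):
        # A paragraph's text is split across one or more <w:t> elements.
        paragraphs.append("".join(t.text or "" for t in p.iter(W + "t")))
    return paragraphs
```

Usage would look like docx_paragraphs(open(filename, "rb").read()). Turning each paragraph into a line of DocX code from there is possible in principle, but reproducing styles and layout is the time-consuming part the template-based answer warns about.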

How can I open a .mediawiki file in Word so that Word will interpret it as a MediaWiki file rather than a TXT file?

I've recently installed the Microsoft Office Word Add-in For MediaWiki (http://www.microsoft.com/en-us/download/details.aspx?id=12298) and I'm able to save MediaWiki files just fine but I can't open them (they are opened as plain text).
How can I force MS Word to make the correct association for MediaWiki files?
You can't. "Interpreting" the MediaWiki markup, i.e. wikitext, is called parsing. Writing a MediaWiki parser is a pain and there is no single parser which fully works yet, other than MediaWiki itself. LibreOffice's wiki-publisher plugin and those which copied it are able to produce good wikitext from their well-formed data format, but making this bidirectional is another matter.
Parsoid is almost perfect now and produces standard HTML, but it's a rather heavy application, so you can't expect it to be embedded in Word. Maybe someone can write a LibreOffice plugin for Parsoid, though! Would be scary, but who knows.
See https://www.mediawiki.org/wiki/Alternative_parsers for more information. Many tried and got hurt. :)

Is there currently a way to get Emacs muse-mode to output rtf,odt or doc format?

Muse is a special mode in emacs that can be used as a wiki. It has multiple output formats like static HTML pages, LaTeX, PDF etc.
But sometimes I need to output something that less tech-savvy people can edit/correct and send back to me.
I think either RTF, ODT or DOC would do the trick.
My problem is that muse only supports HTML, LaTeX, Texinfo and XML out of the box.
Implementing my own output format is currently not an option, as I cannot program in elisp and learning it would take too much time.
I searched for a way to convert to or use markdown as pandoc can convert to RTF. But I found only the following discussion that does not solve my problem.
My last resort would be to convert to HTML and then to RTF, ODT or DOC but AFAIK the results are far from great.
I would appreciate a solution that can be automated (with custom scripts).
I think importing the HTML into MS Word (or a compatible word processor) should work. As I remember, OpenOffice has some scripting support, so you can launch it and run commands inside it.
Another way is to write an RTF export backend. It shouldn't be too complicated, although there may be a lot of details to take into account. If you go this way, please write to the muse mailing list, and I'll try to help you.
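Since the question mentions pandoc and asks for something scriptable, the HTML route can also be automated without opening a word processor. Below is a sketch that assumes pandoc is installed and that muse has already published an HTML file. The function names and the list of accepted formats are my own choices for illustration, and the quality of the result will depend on the HTML that muse produces.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(html_path, target_format):
    """Build the pandoc argument list for converting an HTML file."""
    if target_format not in {"rtf", "odt", "docx"}:
        raise ValueError("unsupported target format: %s" % target_format)
    # Write the output next to the input, with the new extension.
    out_path = Path(html_path).with_suffix("." + target_format)
    return [
        "pandoc",
        "--standalone",  # emit a complete document, not a fragment
        "--from", "html",
        "--to", target_format,
        "--output", str(out_path),
        str(html_path),
    ]


def convert(html_path, target_format):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(pandoc_command(html_path, target_format), check=True)
```

Calling convert("page.html", "odt") after each publish step would leave page.odt next to the HTML file, which can then be sent to people who edit in a word processor.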

Is there a Platform-independent Web-based replacement for Word Templates?

The above Title is my Manager's words, not mine. :)
This is a follow-up to a question that I posted previously. After reading my assessment on the impacts of converting Word Templates from PC to Mac, I have now been asked to investigate whether Word Templates can be replaced with a "Platform-independent Web-based solution" (her words, not mine). She has suggested using Adobe Forms (ie. Adobe Designer).
Personally, I think the only truly platform-independent web-based solution is text files or html forms. What do other people think?
It's called WordprocessingML (aka. WordXML, WordML)...
Overview of WordprocessingML [Word 2003 XML Reference] at http://msdn.microsoft.com/en-us/library/aa212812(office.11).aspx.
MSDN Search for "WordML" at http://social.msdn.microsoft.com/Search/en-US?query=WordML&ac=3
It could be called XForms...
The Web was supposed to be platform-independent electronic documents. In other words, if you truly want platform independence, then I agree with you and your forms should be in HTML. Yet, HTML forms are really not a good development platform. That is why Adobe, Microsoft, and others provide "form" solutions. XForms is an attempt to make developing and using HTML forms more flexible, overcome their limitations, and provide a platform-independent object model for completing HTML forms. You might want to look at XForms at http://www.w3.org/MarkUp/Forms/.
But, I wouldn't call it PDF
In my opinion, working with PDF files is difficult. I have not looked at the file format specification, but I have heard it is not trivial. Moreover, you need a custom editor and you are locked into one vendor, Adobe. (That said, there are open-source projects and other vendors that support the file format.) Adobe is not known for creating programs that are easy to use.
My Suggestion
If you are already using Word, then moving to WordML should be fairly easy. You can easily convert your existing Word documents into WordML by simply saving them as XML from the Save Dialog; therefore, you can automate this process through code. In addition, I believe WordML supports form templates (the actual form) and data documents (the actual data for a form).
It's called PDF...
At the core (and without the millions of extra unnecessary "features"), that's exactly the niche that Adobe PDFs were designed to fill.
I'd suggest you look more into Adobe Acrobat Professional for more info. Although, I don't think there's any good way to directly convert Word docs to PDF format.
Note: This question should be moved to Super User since it's not really programming related
Google Docs meets those requirements of a Platform-independent Web-based solution. Your mileage will vary with Google Docs though - if you just want to use it for letters, it's good. Much beyond that, it's rather limited. Unless you get the Premier (read: Corporate) version which you have to pay for, you won't be able to programmatically fiddle with the templates.
If you want a "Platform-independent solution", go with ODF or OOXML. You can make either "web-based" to your heart's content, maybe with HTML5 or another solution such as Flash or Silverlight.

What's the easiest way to generate DOC files?

Right now I'm generating HTML with a Perl script, and then manually converting it to DOC in OpenOffice. Actually, I have to copy, create a new "Text document", paste, and save, because it treats HTML and DOC as separate file types, but that's beside the point. It's very inconvenient.
Is there any automated way I can convert HTML to a decent DOC, or some other nice format like HTML that I can generate textually and convert to DOC in an automated way?
(I'm on OSX)
I can't help you get to .doc, but have you seen the Open XML Format SDK from Microsoft? This will allow you to generate Office 2007 format documents (.docx, .xlsx etc) from .NET code.
Theoretically you may have some luck with this under Mono on OS X, as it doesn't require an installation of Office 2007 (for Windows) to function.
Not sure if this is what you want, but you can fairly easily generate WordML documents with code. WordML is the Word 2003 XML file format. It's NOT the same thing as the Office 2007 Open XML formats. WordML is just one file that's not too hard to create if you're just doing fairly basic formatting. You could generate it directly rather than creating the HTML first. You can name the files with a .DOC extension and Word 2003 and later will open them just fine. You can resave them as real .DOC files if you want.
Here's the on-line WordML reference. I can send you some sample code if you'd like.
http://msdn.microsoft.com/en-us/library/aa212812(office.11).aspx
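As a rough idea of how little is needed for basic text, here is a sketch of a WordML generator. It is written in Python rather than the asker's Perl, the function name wordml_document is hypothetical, and it only emits plain paragraphs. Anything beyond that (bold, tables, styles) should be checked against the reference above.

```python
from xml.sax.saxutils import escape

# Namespace of the Word 2003 XML format (WordML).
WORDML_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def wordml_document(paragraphs):
    """Return a single-file WordML document with one <w:p> per input string."""
    body = "".join(
        # escape() protects &, < and > in the paragraph text.
        "<w:p><w:r><w:t>%s</w:t></w:r></w:p>" % escape(p)
        for p in paragraphs
    )
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        # Processing instruction that associates the XML file with Word.
        '<?mso-application progid="Word.Document"?>\n'
        '<w:wordDocument xmlns:w="%s"><w:body>%s</w:body></w:wordDocument>'
        % (WORDML_NS, body)
    )
```

Saving the returned string as UTF-8 text gives the single file described above. The same string-building approach carries over to Perl with little change.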
If you really want to create a general file format that could be converted into other formats, creating an XSL-FO file might be the way to go. There are a number of products out there that can take XSL-FO and transform it into other file formats, such as Word and PDF.
We do use the components of Aspose that are available for .NET and Java. With Java you should be able to use them on OS X, too.
You have to purchase the components (i.e. they are not free), but aside from this, they are really great.