Programmatically convert DITA to FrameMaker - dita

Is there a toolkit available (paid or otherwise) to help with programmatically converting a DITA document to a FrameMaker one?
I'm attempting to make an application that converts to multiple formats from DITA. I know I can use the DITA Open Toolkit for most of my needs, but I need to be able to create a native FrameMaker document as well.
The programming language doesn't matter, although I prefer Java as my application will be web-based.

Arbortext import-export is industrial strength and very flexible. You could also try MIFtoGo. Conversion is tricky because source documents are rarely consistent. Conversion without cleanup before and after is next to impossible.

DITA-FMx is what you need:
http://leximation.com/dita-fmx/
Using DITA-FMx, you generate a FrameMaker book from your ditamap (and then save the FM book as PDF).
There is a movie on YouTube that shows you how the process goes. Just search for "PDF Publishing with DITA-FMx 1.1" (Stack Overflow does not allow me to post a second URL here yet)
If you'd like to see an example, just send me a small sample ditamap and I'll generate a FrameMaker book for you.

The disclaimer is that if you're converting a lot of documents, you'll be better off with a supported, well-integrated solution that fully uses FrameMaker's DITA support. If you're looking to do it on the cheap, though (and who isn't), you can do this conversion using plain XSLT and FrameMaker templates.
First create the FrameMaker template to handle the appearance of the document, then use XSLT to map your DITA content to the content tags you've used in FrameMaker.
You can use the free Saxon XSLT processor to do the actual conversion.
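To automate the Saxon step, something like the following could work. It is only a minimal sketch: the jar location, the stylesheet name `dita2fm.xsl` and the file names are placeholders I made up, and the stylesheet itself (the actual DITA-to-FrameMaker mapping) is something you still have to write. The `-s:`, `-xsl:` and `-o:` options are Saxon's standard command-line options.

```python
import subprocess
from pathlib import Path

# Assumption: location of a Saxon-HE jar. The real file name depends on the
# version you download.
SAXON_JAR = Path("lib/saxon-he.jar")


def build_saxon_command(source: Path, stylesheet: Path, output: Path,
                        saxon_jar: Path = SAXON_JAR) -> list[str]:
    """Return the argument list for one Saxon transformation."""
    return [
        "java", "-jar", str(saxon_jar),
        f"-s:{source}",        # DITA topic to transform
        f"-xsl:{stylesheet}",  # your DITA-to-FrameMaker mapping stylesheet
        f"-o:{output}",        # XML that uses the tags of your FrameMaker template
    ]


def transform(source: Path, stylesheet: Path, output: Path) -> None:
    """Run Saxon; raises CalledProcessError if the transformation fails."""
    subprocess.run(build_saxon_command(source, stylesheet, output), check=True)
```

You would call it as `transform(Path("topic.dita"), Path("dita2fm.xsl"), Path("topic.xml"))` and then open the result in FrameMaker with your template.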
Here is some of Adobe's reference material on authoring new DITA documents:
http://help.adobe.com/en_US/FrameMaker/8.0/05h/dita.html
Info on FrameMaker's native XML support is here:
http://www.adobe.com/products/framemaker/pdfs/xml_fm7.pdf
The FrameMaker manuals also cover the subject quite extensively. Hope that helps.

Indeed, FM supports loading DITA files (I tried FM10 and newer), but to automate conversion to .fm format you would have to use the internal scripting mechanism, which still involves some manual work.
I have found a free utility that can handle the basic operations: opening a file, saving it in another format, and closing it.
The tool is called DZBatcher.
Example DZBatcher batch file:
Open "c:\My Dita Files\Doc1.dita"
SaveAs -d "c:\My FM Files\Doc1.fm"
Close "c:\My FM Files\Doc1.fm"
Open "c:\My Dita Files\Doc2.dita"
SaveAs -d "c:\My FM Files\Doc2.fm"
Close "c:\My FM Files\Doc2.fm"
Exit
Adobe has a framework called RoboHelp which is probably the infrastructure for this, but I didn't dig deeper as this utility did the job perfectly.
In my software flow, I generated this batch file with a Python script that scans all the documents in an input directory and adds three lines per file, as shown above.
I used FM2015 for this task.
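A script along those lines might look like the sketch below. The `Open` / `SaveAs -d` / `Close` / `Exit` commands are copied from the example batch file above and not checked against the DZBatcher documentation; the function names and directory arguments are hypothetical.

```python
from pathlib import Path, PureWindowsPath


def build_batch_lines(dita_files, output_dir):
    """Return DZBatcher commands: Open / SaveAs -d / Close per file, then Exit."""
    lines = []
    for dita in dita_files:
        src = PureWindowsPath(str(dita))
        # Same base name as the source, .fm extension, in the output directory.
        dst = PureWindowsPath(str(output_dir)) / (src.stem + ".fm")
        lines.append(f'Open "{src}"')
        lines.append(f'SaveAs -d "{dst}"')
        lines.append(f'Close "{dst}"')
    lines.append("Exit")
    return lines


def write_batch_file(input_dir, output_dir, batch_path):
    """Scan input_dir for .dita files and write the batch file to batch_path."""
    dita_files = sorted(Path(input_dir).glob("*.dita"))
    text = "\n".join(build_batch_lines(dita_files, output_dir)) + "\n"
    Path(batch_path).write_text(text)
```

The paths are formatted Windows-style because FrameMaker and DZBatcher run on Windows; the resulting file is then handed to DZBatcher.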

Bryan, after a decade's experience converting Frame, Word, Interleaf, etc to XML, I'll tell you that Adobe doesn't have it fully covered. The DITA support in FrameMaker works best if you also have the Leximation plug-in or know how to program the Adobe proprietary EDD. You can't do DITA specialization without programming the EDD in FrameMaker.

FrameMaker has excellent support for DITA. You can open DITA topics, and save them (if you wish) as .fm files. You could also open DITAMAPS, and save them as FrameMaker book files, or as composite (monolithic) .fm files. There is no need to write a parser.
PS: I am talking about FM 12.

Related

Is there currently a way to get Emacs muse-mode to output RTF, ODT or DOC format?

Muse is a special mode in Emacs that can be used as a wiki. It has multiple output formats such as static HTML pages, LaTeX, PDF, etc.
But sometimes I need to output something that less tech-savvy people can edit or correct and send back to me.
I think either RTF, ODT or DOC would do the trick.
My problem is that Muse only supports HTML, LaTeX, Texinfo and XML out of the box.
Implementing my own output format is currently not an option, as I cannot program in Elisp and learning it would take too much time.
I searched for a way to convert to or use Markdown, as pandoc can convert to RTF. But I found only the following discussion that does not solve my problem.
My last resort would be to convert to HTML and then to RTF, ODT or DOC, but AFAIK the results are far from great.
I would appreciate a solution that can be automated (with custom scripts).
I think importing HTML into MS Word (or a compatible word processor) should work. As I remember, OpenOffice has some scripting support, so you can launch it and run commands inside it.
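If the HTML route is acceptable, it can also be scripted without a word processor, since pandoc reads HTML as well as Markdown. The sketch below assumes Muse has already published the page as HTML and that `pandoc` is on the PATH; the function names are made up for illustration. Note that pandoc writes RTF, ODT and DOCX, but not the old binary DOC format.

```python
import subprocess
from pathlib import Path

# Output formats from the question that pandoc can write (DOCX instead of DOC).
SUPPORTED_TARGETS = {"rtf", "odt", "docx"}


def build_pandoc_command(html_path: str, target_format: str) -> list[str]:
    """Return the pandoc argument list for converting one HTML file."""
    if target_format not in SUPPORTED_TARGETS:
        raise ValueError(f"unsupported target format: {target_format}")
    output_path = Path(html_path).with_suffix("." + target_format)
    return [
        "pandoc",
        "--from", "html",
        "--to", target_format,
        "--standalone",  # needed for a complete RTF file rather than a fragment
        "--output", str(output_path),
        html_path,
    ]


def convert(html_path: str, target_format: str) -> None:
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pandoc_command(html_path, target_format), check=True)
```

How good the result looks depends on the HTML that Muse produces, so it is worth testing with one of your real pages first.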
Another way is to write an RTF export backend. It shouldn't be too complicated, although there could be a lot of details to take into account. If you go this way, please write to the Muse mailing list and I'll try to help you.

docx - markup / markup - docx conversion

I have to store some documents in the docx format, but can't stand using MS Word: I would like to edit some kind of plain-text markup, anything except stuff based on XML (I don't like that either), and convert between that and docx.
Are there any options for this?
EDIT: since people think this is not programming related, I'll extend my question. What libraries do you suggest for writing a complete tex-docx/docx-tex converter?
If you're talking .NET, I'd check out the OpenXML toolkit first. There are lots of "libraries" on the internet to do this, but they all seem to be just thin wrappers around the OpenXML stuff.
You might also check out
http://openxmldeveloper.org/
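To get a feel for what any docx converter has to deal with: a .docx file is a ZIP package, and the main text lives in `word/document.xml`. The sketch below (standard library only, function name made up) pulls out the plain text of each paragraph. It ignores styles, tables, footnotes and everything else, so it is a starting point for the docx-to-markup direction rather than a converter.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used inside .docx packages.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(path):
    """Return the plain text of each paragraph in the main document part."""
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph is split into runs; join the text of all of them.
        text = "".join(t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs
```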
Aspose.Words for .NET allows you to create DOCX files from scratch using text or other content and then convert DOCX files to text etc. It doesn't require MS Office to be installed on the system. And the component is a simple .NET assembly with an easy to learn and implement API. Please try and see if it helps in your scenario.
Disclosure: I work as developer evangelist at Aspose.
You can try the DocxEditorKit http://java-sl.com/docx_editor_kit.html
Set the editor kit to JEditorPane, add styled text and store the document in docx format.

How can previous versions of Office files be accessed using Open XML SDK 2.0?

I suppose Office files from before MS Office 2007 cannot be accessed using Open XML SDK 2.0, or programmatically accessed through the Open XML format at all.
So is there any way to work with these older files, or can I view the XML content of these files?
Or is it that the Open XML SDK isn't designed for that purpose?
See the answer to a similar question I asked when I just started learning this SDK.
No, but the open-source project POI provides an API for most of the old formats. Warning: POI is a bit (not a lot) buggy, does not fully implement the specs, and support is catch-as-catch-can (i.e. it's open source).
You can use the Office file converters to convert the files to Open XML formats and then process them.
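If you have a mixed pile of files, you can tell the two families apart before choosing a library. The old binary formats (.doc, .xls, .ppt) are OLE2 compound files, while the 2007+ formats are ZIP packages, and each starts with a well-known signature. A minimal sketch, with made-up function names and return labels:

```python
# First 8 bytes of an OLE2 compound file (legacy .doc/.xls/.ppt).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# First 4 bytes of a ZIP archive (.docx/.xlsx/.pptx are ZIP packages).
ZIP_MAGIC = b"PK\x03\x04"


def sniff_office_format(header: bytes) -> str:
    """Classify a file by the first bytes of its content."""
    if header.startswith(OLE2_MAGIC):
        return "legacy-binary"  # not readable with the Open XML SDK
    if header.startswith(ZIP_MAGIC):
        return "zip-package"    # possibly Open XML, but could be any ZIP
    return "unknown"


def sniff_office_file(path) -> str:
    """Read the first 8 bytes of the file at path and classify them."""
    with open(path, "rb") as f:
        return sniff_office_format(f.read(8))
```

Files that come back as "legacy-binary" would go to POI or a converter; "zip-package" only says the container is a ZIP, so a real check would also look inside it.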
See here:
http://technet.microsoft.com/en-us/library/cc179019(v=office.14).aspx
I'm using this for my application. This works for me.
Hope this helps.

Is there a Platform-independent Web-based replacement for Word Templates?

The above Title is my Manager's words, not mine. :)
This is a follow-up to a question that I posted previously. After reading my assessment on the impacts of converting Word Templates from PC to Mac, I have now been asked to investigate whether Word Templates can be replaced with a "Platform-independent Web-based solution" (her words, not mine). She has suggested using Adobe Forms (ie. Adobe Designer).
Personally, I think the only truly platform-independent web-based solution is text files or HTML forms. What do other people think?
It's called WordprocessingML (aka. WordXML, WordML)...
Overview of WordprocessingML [Word 2003 XML Reference] at http://msdn.microsoft.com/en-us/library/aa212812(office.11).aspx.
MSDN Search for "WordML" at http://social.msdn.microsoft.com/Search/en-US?query=WordML&ac=3
It could be called XForms...
The Web was supposed to be platform-independent electronic documents. In other words, if you truly want platform independence, then I agree with you and your forms should be in HTML. Yet HTML forms are really not a good development platform. That is why Adobe, Microsoft, and others provide "form" solutions. XForms is an attempt to make developing and using HTML forms more flexible, overcome their limitations, and provide a platform-independent object model for completing HTML forms. You might want to look at XForms at http://www.w3.org/MarkUp/Forms/.
But, I wouldn't call it PDF
In my opinion, working with PDF files is difficult. I have not looked at the file format specification, but I heard it is not trivial. Moreover, you need a custom editor and you are locked into one vendor, which is Adobe. (Yet there are open-source projects and other vendors that support the file format.) Adobe is not known for creating programs that are easy to use.
My Suggestion
If you are already using Word, then moving to WordML should be fairly easy. You can easily convert your existing Word documents into WordML by simply saving them as XML from the Save Dialog; therefore, you can automate this process through code. In addition, I believe WordML supports form templates (the actual form) and data documents (the actual data for a form).
It's called PDF...
At the core (and without the millions of extra "unnecessary features"), that's exactly the niche that Adobe PDFs were designed to fill.
I'd suggest you look more into Adobe Acrobat Professional for more info. Although, I don't think there's any good way to directly convert Word docs to PDF format.
Note: This question should be moved to Super User since it's not really programming related
Google Docs meets those requirements of a Platform-independent Web-based solution. Your mileage will vary with Google Docs though - if you just want to use it for letters, it's good. Much beyond that, it's rather limited. Unless you get the Premier (read: Corporate) version which you have to pay for, you won't be able to programmatically fiddle with the templates.
If you want a "Platform-independent solution", go with ODF or OOXML. You can make either "web-based" to your heart's content, maybe with HTML5 or another solution such as Flash or Silverlight.

What's the easiest way to generate DOC files?

Right now I'm generating HTML with a Perl script and then manually converting it to DOC in OpenOffice. Actually I have to copy, create a new "Text document", paste, and save, because it treats HTML and DOC as separate file types, but that detail is beside the point. It's very inconvenient.
Is there any automated way I can convert HTML to a decent DOC, or some other nice format like HTML that I can generate as text and convert to DOC in an automated way?
(I'm on OS X)
I can't help you get to .doc, but have you seen the Open XML Format SDK from Microsoft? This will allow you to generate Office 2007 format documents (.docx, .xlsx etc) from .NET code.
Theoretically you may have some luck with this under Mono on OS X, as it doesn't require an installation of Office 2007 (for Windows) to function.
Not sure if this is what you want, but you can fairly easily generate WordML documents with code. WordML is the Word 2003 XML file format. It's NOT the same thing as the Office 2007 Open XML formats. WordML is just one file that's not too hard to create if you're just doing fairly basic formatting. You could generate it directly rather than creating the HTML first. You can name the files with a .DOC extension and Word 2003 and later will open them just fine. You can resave them as real .DOC files if you want.
Here's the on-line WordML reference. I can send you some sample code if you'd like.
http://msdn.microsoft.com/en-us/library/aa212812(office.11).aspx
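To give an idea of how little is needed, here is a minimal sketch that writes plain paragraphs as WordML. It uses Python rather than Perl only for illustration; the function name is made up, and there is no formatting beyond one run of text per paragraph. The namespace and the `mso-application` processing instruction are the standard Word 2003 XML ones.

```python
from xml.sax.saxutils import escape

# Word 2003 XML (WordML) namespace.
WORDML_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def build_wordml(paragraphs):
    """Return a WordML document with one plain-text paragraph per item."""
    body = "".join(
        # escape() protects &, < and > in the text.
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>"
        for text in paragraphs
    )
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        # Tells Windows to open the file with Word.
        '<?mso-application progid="Word.Document"?>\n'
        f'<w:wordDocument xmlns:w="{WORDML_NS}">'
        f"<w:body>{body}</w:body>"
        "</w:wordDocument>\n"
    )
```

Write the returned string to a file encoded as UTF-8, for example `open("report.doc", "w", encoding="utf-8").write(build_wordml(["Hello", "World"]))`, and open it in Word as described above.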
If you really want to create a general file format that can be converted into other formats, creating an XSL-FO file might be the way to go. There are a number of products out there that can take XSL-FO and transform it into other file formats, such as Word and PDF.
We do use the components of Aspose that are available for .NET and Java. With Java you should be able to use them on OS X, too.
You have to purchase the components (i.e. they are not free), but aside from this, they are really great.