Right now I'm generating HTML with a Perl script and then manually converting it to DOC in OpenOffice. (Actually I have to copy, create a new "Text document", paste, and save, because it treats HTML and DOC as separate file types, but that detail isn't essential.) It's very inconvenient.
Is there any automated way to convert HTML to a decent DOC, or some other nice format like HTML that I can generate as text and convert to DOC automatically?
(I'm on OSX)
I can't help you get to .doc, but have you seen the Open XML Format SDK from Microsoft? This will allow you to generate Office 2007 format documents (.docx, .xlsx etc) from .NET code.
Theoretically you may have some luck with this under Mono on OS X, as it doesn't require an installation of Office 2007 (for Windows) to function.
Not sure if this is what you want, but you can fairly easily generate WordML documents with code. WordML is the Word 2003 XML file format. It's NOT the same thing as the Office 2007 Open XML formats. WordML is just one file, and it's not too hard to create if you're only doing fairly basic formatting. You could generate it directly rather than creating the HTML first. You can name the files with a .DOC extension and Word 2003 and later will open them just fine. You can resave them as real .DOC files if you want.
Here's the on-line WordML reference. I can send you some sample code if you'd like.
http://msdn.microsoft.com/en-us/library/aa212812(office.11).aspx
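To give an idea of how little is needed, here is a minimal sketch in Python (the original poster uses Perl, but the string building ports directly). It assumes only basic paragraphs with optional bold; the helper name make_wordml is made up for illustration, and the output is only checked here for being well-formed XML, not against Word itself.

```python
from xml.sax.saxutils import escape

# Namespace of the Word 2003 XML format (WordprocessingML 2003).
WORDML_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def make_wordml(paragraphs):
    """Build a single-file Word 2003 XML document as a string.

    `paragraphs` is a list of (text, bold) tuples. This hypothetical helper
    covers only plain and bold runs, one run per paragraph.
    """
    parts = [
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>',
        # Processing instruction that lets Windows associate the file with Word.
        '<?mso-application progid="Word.Document"?>',
        '<w:wordDocument xmlns:w="%s">' % WORDML_NS,
        "<w:body>",
    ]
    for text, bold in paragraphs:
        run_props = "<w:rPr><w:b/></w:rPr>" if bold else ""
        # escape() protects &, < and > in the paragraph text.
        parts.append(
            "<w:p><w:r>%s<w:t>%s</w:t></w:r></w:p>" % (run_props, escape(text))
        )
    parts.append("</w:body>")
    parts.append("</w:wordDocument>")
    return "\n".join(parts)
```

Writing the returned string to a file as UTF-8 (for example report.xml, or report.doc as described above) is all that is left to do.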
If you really want to create a general file format that could be converted into other formats, creating an XSL-FO file might be the way to go. There are a number of products out there that can take XSL-FO and transform it into other formats, such as Word and PDF.
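For reference, a formatting-objects file (XSL-FO, XSL Formatting Objects) is plain XML, so it can be generated from any scripting language. The following is a rough sketch in Python using only the standard library; the function name make_xsl_fo, the A4 page size and the margins are arbitrary choices for illustration, and the result still has to be fed to a separate FO processor to become Word or PDF.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialize with the conventional fo: prefix


def fo(tag):
    """Return the namespace-qualified name for an fo: element."""
    return "{%s}%s" % (FO_NS, tag)


def make_xsl_fo(paragraphs):
    """Build a one-page-master XSL-FO document with one block per paragraph."""
    root = ET.Element(fo("root"))

    # Page geometry: a single master called "page" (A4, 2cm margins, assumed).
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(
        masters,
        fo("simple-page-master"),
        {
            "master-name": "page",
            "page-height": "29.7cm",
            "page-width": "21cm",
            "margin": "2cm",
        },
    )
    ET.SubElement(page, fo("region-body"))

    # Content: a page sequence flowing text into the body region.
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "page"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    for text in paragraphs:
        block = ET.SubElement(flow, fo("block"), {"space-after": "6pt"})
        block.text = text  # ElementTree escapes special characters on output

    return ET.tostring(root, encoding="unicode")
```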
We use the Aspose components, which are available for .NET and Java. With the Java version you should be able to use them on OS X, too.
You have to purchase the components (i.e. they are not free), but aside from that, they are really great.
I am currently attempting to convert a couple of .NET desktop applications that I have developed into a web application harnessing AngularJS and RESTful services.
One of the key components of these applications is in their ability to generate Word documents on the fly using a .dotx Word template. I am currently exploring the possibility of using a third party library called DocX to generate these Word documents without resorting to using a template.
I guess my question is: Can I use this library to read an existing Word document in .docx format and generate a source code representation of the document? If this is possible could someone point me in the direction of any code samples that I could use? I have looked around and have been unable to find anything that could help me get started.
Generating a code representation of the document and using it with DocX seems like a time-consuming effort to me. Why not use a template instead and fill it with data at runtime?
I have some experience with Docentric, which is a 3rd party OpenXML toolkit. It features a Word add-in for template design and libraries for document generation and manipulation. It took me less than a week to generate pretty complex documents. If I were in your shoes I would definitely try some 3rd party toolkits. They cost money but save time, so do the math and see if they can be useful for you.
It is possible to read an existing Word document in .docx format with the following code:
DocX document = DocX.Load(filename);
However, it is not possible to generate a source code representation of a document.
We are developing a Java application that needs to programmatically convert .rtf, .doc and .docx files to PDF files.
Formatting is important to us, so we need the page numbers to be the same between a source file and a target PDF file, and the contents of each page being the same as the original file.
We have tried out open source solutions, such as JODConverter (to invoke a LibreOffice or OpenOffice installation), Docx4j and XDocReport. The best formatting was achieved with LibreOffice. However, even in that case, the page counts differed (for example, an 87-page .rtf file results in an 80-page PDF file).
So, we think that the ideal way to make the conversion would be to somehow invoke Microsoft Word through our Java application, and make the conversion with it. That would produce PDF files that have the same formatting as the original files.
Is this possible in any of the following ways:
An API that is directly invokable from Java?
An API that is invokable from a .NET language, which we would use with something like JACOB?
A 3rd party library that uses a Microsoft Word installation under the hood (something like JODConverter for Word)?
A command-line interface supported by Word (relevant question)?
Something else?
I have to store some documents in the docx format, but can't stand using msword: I would like to edit some kind of plain text markup, anything except stuff based on XML (I don't like that either) and convert from/to that to/from docx.
Are there any options for this?
EDIT: since people think this is not programming related, I'll extend my question. What libraries do you suggest for writing a complete tex-docx/docx-tex converter?
If you're talking .NET, I'd check out the OpenXML toolkit first. There are lots of "libraries" on the internet to do this, but they all seem to just be thin wrappers around the OpenXML stuff.
You might also check out
http://openxmldeveloper.org/
Aspose.Words for .NET allows you to create DOCX files from scratch using text or other content and then convert DOCX files to text etc. It doesn't require MS Office to be installed on the system. And the component is a simple .NET assembly with an easy to learn and implement API. Please try and see if it helps in your scenario.
Disclosure: I work as a developer evangelist at Aspose.
You can try the DocxEditorKit http://java-sl.com/docx_editor_kit.html
Set the editor kit to JEditorPane, add styled text and store the document in docx format.
I would like to create a .docx file within an iPad application. The file would be created within the app (the user would create/edit it like in Word--preferably with the same "feel" of Word) and then it would be saved as a .docx file.
So, is it possible to do this? If so, how? What other alternative file formats are there?
Thanks,
John
You can easily generate RTF corresponding to most typical features of a word processor. It will not cover the vastness of available DOCX features, but I'm not certain a complete port of Microsoft Word to the iPhone would be practical, so most of these features would be unavailable anyway.
RTF is fully (read-write) supported by Microsoft Office and several other editors.
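As a rough illustration of how simple basic RTF is, here is a sketch in Python (the same string handling carries over to Objective-C or Swift in an iOS app). The function names rtf_escape and make_rtf are made up for this example, and the font and size in the header are arbitrary. The only subtle parts are escaping the three special characters and writing non-ASCII text as \uN escapes.

```python
def rtf_escape(text):
    """Escape plain text so it can be placed inside an RTF document."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax and must be escaped.
            out.append("\\" + ch)
        elif ch == "\n":
            # Line break inside a paragraph.
            out.append("\\line ")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Non-ASCII: emit each UTF-16 code unit as a signed 16-bit \uN,
            # followed by '?' as the fallback for readers without \u support.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append("\\u%d?" % unit)
    return "".join(out)


def make_rtf(paragraphs):
    """Return a minimal RTF document with one paragraph per list item."""
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Helvetica;}}\\f0\\fs24 "
    body = "".join(rtf_escape(p) + "\\par\n" for p in paragraphs)
    return header + body + "}"
```

Because everything outside ASCII is escaped, the result can be written to disk as plain ASCII bytes, which sidesteps most encoding trouble.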
I need to write a pure MS Word file in Objective-C.
I have been writing .txt files until now, but when I write a .doc file I run into encoding issues with every encoding scheme.
Microsoft provides a library in Visual Studio for working with .doc files, but it is not available in Xcode.
So is there any way to make it happen?
You can extract the code from, e.g., the OpenOffice project. This is C++ code, but you can put a wrapper around it. This will be a lot of work.
If it's an option for you:
On the server, use Microsoft VSTO to create a document (doc, xls, ppt...) using the ServerDocument class, passing in the data that you collect on the client. Then download that file to your phone.
Sounds like you are trying to port MS Word. Basically you need to write XML files and zip them in a particular manner. Check out the markup specification here
http://msdn.microsoft.com/en-us/library/cc313105%28office.12%29.aspx
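To make "XML files zipped in a particular manner" concrete, here is a minimal sketch in Python using only the standard library; the same three parts would have to be produced from Objective-C with whatever zip library is at hand. The function name write_docx is invented for this example, and it handles plain paragraphs only (no styles, headers or images).

```python
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Declares the content type of each part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

# Package-level relationship pointing at the main document part.
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)


def write_docx(target, paragraphs):
    """Write a bare-bones .docx to `target` (a path or a binary file object)."""
    body = "".join(
        '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>' % escape(p)
        for p in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>' % (W_NS, body)
    )
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", RELS)
        package.writestr("word/document.xml", document)
```

Calling write_docx("hello.docx", ["Hello", "World"]) produces a package with one content-types part, one relationships part and the document body.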
If it's .docx, you can compose the XML parts, zip them, and give the archive a .docx extension. With .doc you may have to do it with your own wrapper.