Can I convert .docx Word documents using the DocX .NET Library? - ms-word

I am currently attempting to convert a couple of .NET desktop applications that I have developed into a web application harnessing AngularJS and RESTful services.
One of the key components of these applications is their ability to generate Word documents on the fly using a .dotx Word template. I am currently exploring the possibility of using a third-party library called DocX to generate these Word documents without resorting to a template.
I guess my question is: Can I use this library to read an existing Word document in .docx format and generate a source code representation of the document? If this is possible, could someone point me in the direction of any code samples that I could use? I have looked around and have been unable to find anything that could help me get started.

Generating a code representation of the document and using it with DocX seems like a time-consuming effort to me. Why not use a template instead and fill it with data at runtime?
I have some experience with Docentric, which is a third-party OpenXML toolkit. It features a Word add-in for template design and libraries for document generation and manipulation. It took me less than a week to generate pretty complex documents. If I were in your shoes, I would definitely try some third-party toolkits. They cost money but save time, so do the math and see if they can be useful for you.
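To make the "fill a template at runtime" idea concrete, here is a minimal sketch in Python using only the standard library. It is not Docentric's or DocX's API. It assumes a hypothetical `{{name}}` marker convention and relies on the fact that a .docx file is a ZIP archive whose main text lives in `word/document.xml`. Word often splits typed text across several runs, so a marker is only found if it is stored contiguously in the XML; a real toolkit handles that case for you.

```python
import zipfile
from xml.sax.saxutils import escape


def fill_template(template, output, values):
    """Copy a .docx archive, replacing {{name}} markers in word/document.xml.

    `template` and `output` may be file paths or binary file objects.
    The {{name}} marker syntax is an assumption made for this sketch.
    """
    with zipfile.ZipFile(template) as src, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                # Assumes the part is UTF-8 encoded, which is the usual case.
                xml = data.decode("utf-8")
                for name, value in values.items():
                    # Escape the value so characters like & or < keep the XML valid.
                    xml = xml.replace("{{" + name + "}}", escape(str(value)))
                data = xml.encode("utf-8")
            # Every other part (styles, images, relationships) is copied unchanged.
            dst.writestr(item, data)
```

A call might look like `fill_template("invoice_template.docx", "invoice.docx", {"customer": "ACME"})`, where both file names are hypothetical.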

It is possible to read an existing Word document in .docx format with the following code:
DocX document = DocX.Load(filename);
However, it is not possible to generate a source code representation of a document.
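For anyone who wants to inspect what a loaded document actually contains before deciding how to recreate it in code, the following is a rough sketch in Python (standard library only, not the DocX API). It assumes the usual .docx layout, a ZIP archive with the main part at `word/document.xml`, and it only collects plain paragraph text. Formatting, tabs, line breaks, images and nested structures such as text boxes are ignored.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_paragraphs(docx):
    """Return the plain text of each paragraph in a .docx path or file object."""
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):
        # A paragraph's text is spread over one or more w:t elements.
        paragraphs.append("".join(t.text or "" for t in p.iter(W + "t")))
    return paragraphs
```

Walking the same XML tree is where a hand-written "document to source code" generator would have to start, which gives an idea of how much work that would be.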

Related

PDF generation from templated Word documents

I have a Word document (in some template format) containing placeholders for the data to be filled in, and there are several Word documents like this in a directory. When data comes in, I will choose a template (based on some criteria), fill in the data, and convert the resulting document to PDF format.
I have been investigating Apache POI for this. If anyone has a good suggestion, it would be much appreciated.
As mbeckish mentioned, you should indicate how you are going to run/automate this. For example, is it one-off, run by hand, or part of another program (and if so, what programming languages do you use)?
If you are trying to automate it, JODReports and Docmosis are tools that can use templates like you require and can produce PDF. JODReports is free. Docmosis is not but has several APIs. Please note I work for the company that develops Docmosis.
Hope that helps.
I've just uploaded this presentation, which presents three approaches for doing this.
Why not use one of the existing PDF virtual printers?

asp.net web application to convert pdf to word

Is there any clear and proper process to convert a pdf file into a word file with all formatting and images in asp.net web application?
The best way to do that is by using OCR. It will recognize the text and the images in the PDF file, and then you can save the result to a DOC file. I know a third-party toolkit named LEADTOOLS that should help you meet your requirements, since it supports the ASP.NET environment. You can check their Online OCR Demo.
Also, you can check their website for more information, or contact their support team.
PDF is a presentational format where all the content is placed at absolute positions. There are no paragraphs or other structured elements (unless it is a Tagged PDF). Technically, you can output every word character by character in any order, and visually it would still look like normal text. Thus, a proper conversion to Word requires content recognition or some kind of OCR (e.g. ABBYY FineReader).
There are some paid components on the market that can do text extraction, and some can convert pages to images (obviously, this is not a desirable approach for converting into Word).

docx - markup / markup - docx conversion

I have to store some documents in the docx format, but can't stand using msword: I would like to edit some kind of plain text markup, anything except stuff based on XML (I don't like that either) and convert from/to that to/from docx.
Are there any options for this?
EDIT: since people think this is not programming related, I'll extend my question. What libraries do you suggest for writing a complete tex-docx/docx-tex converter?
If you're talking .net, I'd check out the OpenXML toolkit first. There are lots of "libraries" on the internet to do this, but they all seem to just be thin wrappers around the OpenXML stuff.
You might also check out
http://openxmldeveloper.org/
Aspose.Words for .NET allows you to create DOCX files from scratch using text or other content and then convert DOCX files to text etc. It doesn't require MS Office to be installed on the system. And the component is a simple .NET assembly with an easy to learn and implement API. Please try and see if it helps in your scenario.
Disclosure: I work as developer evangelist at Aspose.
You can try the DocxEditorKit http://java-sl.com/docx_editor_kit.html
Set the editor kit to JEditorPane, add styled text and store the document in docx format.

Is there an alternative to open-xml sdk to generate word documents

I'm trying to generate Word documents using the Open XML SDK. When the documents are small this is no problem (and rather easy). When the documents become larger (500+ pages) I notice the performance (duration, memory usage, ...) goes down significantly.
Googling this problem I came across some posts that point out the same problem. For Excel there is a solution with SpreadsheetGear.
I would like to know if there is a word alternative to this or if there are other solutions to generate word documents?
Thanks,
Jelle
I've written a blog post series on generating Open XML WordprocessingML documents. The approach that I take is that you create a template Word document, insert content controls, and then write XPath expressions in those content controls to specify the XML to pull from a source XML data file. I've also explored another approach where you write C# code in Open XML content controls. That approach also works.
http://ericwhite.com/blog/map/generating-open-xml-wordprocessingml-documents-blog-post-series/
-Eric
You might look at http://docx.codeplex.com/
On Java, you could use docx4j. If you were brave, you could create DLLs for it via IKVM...
I decided to go with Aspose.Words. It is really fast and not very demanding on resources (CPU, memory). It has the disadvantage that it is quite expensive. I also investigated SoftArtisans OfficeWriter. The possibilities are the same, but because the company I'm currently working for already used other Aspose components, we decided to go with Aspose.Words.

What's the easiest way to generate DOC files?

Right now I'm generating HTML with a Perl script and then manually converting it to DOC in OpenOffice. Actually, I have to copy, create a new "Text document", paste, and save, because it treats HTML and DOC as separate file types, but that's beside the point. It's very inconvenient.
Is there any automated way to convert HTML to a decent DOC, or some other nice format like HTML that I can generate textually and convert to DOC in an automated way?
(I'm on OSX)
I can't help you get to .doc, but have you seen the Open XML Format SDK from Microsoft? This will allow you to generate Office 2007 format documents (.docx, .xlsx etc) from .NET code.
Theoretically you may have some luck with this under Mono on OS X, as it doesn't require an installation of Office 2007 (for Windows) to function.
Not sure if this is what you want, but you can fairly easily generate WordML documents with code. WordML is the Word 2003 XML file format. It's NOT the same thing as the Office 2007 Open XML formats. WordML is just one file that's not too hard to create if you're just doing fairly basic formatting. You could generate it directly rather than creating the HTML first. You can name the files with a .DOC extension and Word 2003 and later will open them just fine. You can resave them as real .DOC files if you want.
Here's the on-line WordML reference. I can send you some sample code if you'd like.
http://msdn.microsoft.com/en-us/library/aa212812(office.11).aspx
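As a rough illustration of how little is needed for basic WordML, here is a minimal sketch in Python (standard library only). The function name `make_wordml` is made up for this example, and it emits nothing but plain paragraphs, with no styles, fonts or tables.

```python
from xml.sax.saxutils import escape

# Namespace of the Word 2003 XML (WordML) format.
WORDML_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def make_wordml(paragraphs):
    """Build a minimal Word 2003 XML document from a list of strings."""
    body = "".join(
        # One paragraph (w:p) holding one run (w:r) with one text node (w:t).
        "<w:p><w:r><w:t>{}</w:t></w:r></w:p>".format(escape(text))
        for text in paragraphs
    )
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        # This processing instruction marks the file as a Word document.
        '<?mso-application progid="Word.Document"?>\n'
        '<w:wordDocument xmlns:w="{}"><w:body>{}</w:body></w:wordDocument>'
    ).format(WORDML_NS, body)
```

Writing the returned string to a UTF-8 file, for example a hypothetical `report.doc`, should give something Word 2003 and later can open, as described above.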
If you really want to create a general file format that could be converted into other formats, creating an XSL-FO file might be the way to go. There are a number of products out there that can take XSL-FO and transform it into other files, such as Word and PDF.
We do use the components of Aspose that are available for .NET and Java. With Java you should be able to use them on OS X, too.
You have to purchase the components (i.e. they are not free), but aside from this, they are really great.