How to programmatically open and modify ODT file from Calc Macro - libreoffice

I am writing a macro in a LibreOffice Calc spreadsheet. This macro should do the following (among other things):
open an existing ODT text document as a template
search and replace some strings with new values
save a copy of this as a new file
generate and open a PDF version
Is this possible using LibreOffice Basic only? I have found nothing in the LibreOffice docs and examples, only this slightly related answer here on SO: How to programatically modify Open/Libre Office odt document?
Thanks!

open an existing ODT text document as a template
The method to open a document is loadComponentFromURL; see for example https://help.libreoffice.org/6.4/en-US/text/sbasic/shared/stardesktop.html. Opening a Writer document from a Calc macro is not especially difficult, because the LibreOffice components are well integrated. Remember to work with the object returned by loadComponentFromURL instead of ThisComponent.
search and replace some strings with new values
Andrew's Macro Document, section 7.14 (Search And Replace), shows some ways to do this.
save a copy of this as a new file
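The command is storeAsURL, which works like "Save As": after the call, the open document refers to the new file. Do not confuse it with storeToURL, which writes a copy to the given location but leaves the open document attached to its original file, or with store, which would overwrite the existing file. See https://wiki.openoffice.org/wiki/Saving_a_document.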
generate and open a PDF version
Generating a PDF is like any other save. The only difference is that the export filter writer_pdf_Export must be specified. An example is at https://ask.libreoffice.org/en/question/178818/how-i-export-pdf-using-macro/.
As for opening the PDF, which application should open it? LibreOffice Draw can open a PDF, although it is not a normal PDF viewer. The Basic Shell function can launch the viewer of your choice.
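The four steps above can be sketched as one routine. The sketch below is written in Python against the same UNO calls the answer names (loadComponentFromURL, a replace descriptor, storeAsURL, and storeToURL with the writer_pdf_Export filter); it is not the asker's Basic macro, and the helper names fill_template, _file_url and _prop are hypothetical. It assumes it runs under LibreOffice's own Python, where the com.sun.star modules are importable, and that the caller already holds a desktop object.

```python
from pathlib import Path


def _file_url(path):
    """Turn a filesystem path into the file:// URL that UNO expects."""
    return Path(path).resolve().as_uri()


def _prop(name, value):
    """Build a PropertyValue; importable only under LibreOffice's Python."""
    from com.sun.star.beans import PropertyValue
    prop = PropertyValue()
    prop.Name, prop.Value = name, value
    return prop


def fill_template(desktop, template, out_odt, out_pdf, replacements):
    """Open template, replace strings, save an ODT copy, export a PDF."""
    # Step 1: open the template in a new frame and keep the returned object.
    doc = desktop.loadComponentFromURL(_file_url(template), "_blank", 0, ())
    if doc is None:
        raise RuntimeError(f"could not open {template}")
    try:
        # Step 2: one replace descriptor per search string.
        for old, new in replacements.items():
            descriptor = doc.createReplaceDescriptor()
            descriptor.SearchString = old
            descriptor.ReplaceString = new
            doc.replaceAll(descriptor)
        # Step 3: "Save As"; the open document now points at out_odt,
        # so the template on disk is left untouched.
        doc.storeAsURL(_file_url(out_odt), ())
        # Step 4: export a PDF copy through the Writer PDF filter.
        pdf_filter = (_prop("FilterName", "writer_pdf_Export"),)
        doc.storeToURL(_file_url(out_pdf), pdf_filter)
    finally:
        doc.close(True)
    return Path(out_pdf)
```

In a LibreOffice Python macro the desktop would typically come from XSCRIPTCONTEXT.getDesktop(). Each call here corresponds to a call of the same name in Basic on StarDesktop and on the returned document, so the sketch should translate fairly directly.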

Related

Can Perl's MsOffice::Word::HTML::Writer edit existing Word documents?

I have a Word document and want to add a logo image to it. I found the CPAN module MsOffice::Word::HTML::Writer,
which can build new Word documents, for example writing an image:
$doc->write("<img src='files/my_image.gif'>");
My question: how can I open an existing Word document with MsOffice::Word::HTML::Writer? The new method seems to create a new Word document, and there is no option to open a document that already exists.
The examples in the docs show that you have to attach an image first, then reference it:
$doc->attach("my_image.gif", $path_to_my_image);
$doc->write("<img src='files/my_image.gif'>");
But the module docs say that it's for making Word docs, not editing them:
The present module is one way to programatically generate documents targeted for Microsoft Word (MsWord). It doesn't need MsWord to be installed, and doesn't even require a Win32 machine (which is why the module is not in the Win32 namespace).

Microsoft Word to Org-mode

I am trying to bring a Microsoft Word document into Emacs using org-mode. I have copied the Word document and pasted it into Emacs. I would like to get headings like 7.1.2.4 in org-mode format
and then link the TOC to the appropriate headings. How can I do that? Any suggestions? Has this been done in a programming language such as Perl?
Thanks.
There is ODT2ORG (https://bitbucket.org/josemaria.alkala/odt2org/wiki/Home) which lets you import odt files in org-mode.
Use Openoffice/Libreoffice to produce an .odt from your .doc.
Use odt2org to get an .org.
About the headings: I am not entirely sure I understand you.
There is org-toc.el, included in org-mode, which provides a separate buffer with a TOC of your current document (like in Reftex). All the entries there are already links to the individual headings. Also, an exported document will have a TOC included by default without your intervention.
Org-mode does not support automatically numbered headings (yet). However, if you export your document to HTML, DocBook, LaTeX, or PDF, your headings will appear numbered and nested (you can tweak the settings quite a lot).
I doubt that you will get your intended result purely automatically but it should work 70% automatically, especially if you have latex installed and simply want to have a good-looking pdf in the end. Convert doc to odt, convert odt to org, open and type "C-c C-e d".
Another option: Save as an HTML file, then use Pandoc to convert the HTML to an .org file.
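The Pandoc step could look roughly like the sketch below. It assumes pandoc is installed and on the PATH; the function names and file names are hypothetical, and the command is built separately from running it so it can be inspected first.

```python
import subprocess


def html_to_org_command(html_path, org_path):
    """Build the pandoc command line that converts HTML to Org."""
    return [
        "pandoc",
        "--from", "html",
        "--to", "org",
        "--output", str(org_path),
        str(html_path),
    ]


def html_to_org(html_path, org_path):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(html_to_org_command(html_path, org_path), check=True)
```

For example, html_to_org("report.html", "report.org") would convert a file that Word saved as HTML. How clean the resulting headings are depends on the HTML Word produces.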
I've converted loads of Word documents into Org files. It takes minutes to do it by hand.
If you want cross-references, use internal links (4.2 in the current manual).
The * and ** style headings are always likely to be there in Org. Think of the use case where exports are compiled from #+INCLUDEd files, or you have done a selective export using tags. Any kind of single sourcing technology isn't going to display the numbering.
There is a Ruby gem which converts doc to md. With pandoc you can then convert the md to org.
https://github.com/benbalter/word-to-markdown

Merging multiple MS Word documents already saved in one docx file with OpenXML

I merged multiple documents into one single document (Test.docx) with FeedData, and it works fine.
When I open the merged document Test.docx with WinZip, the content looks like this:
File1.docx, File2.docx, File3.docx, i.e. all merged documents are stored inside Test.docx as separate embedded files.
Is it possible to create a single Test.docx that holds the whole content inline, instead of storing the source files inside it as described above? That would help me a lot with search and replace, because at the moment I have to open the embedded files one by one.
Note: if I open Test.docx in MS Word and press "Save", MS Word does the job, but I would like to produce the same result via code.
Thank you in advance.
Best
Tod
Take a look at this article, and see if this is what you're looking for:
http://blogs.msdn.com/b/brian_jones/archive/2008/12/08/the-easy-way-to-assemble-multiple-word-documents.aspx
Another way to merge multiple Open XML DOCX files into a single file is the DocumentBuilder module that is part of Open-Xml-PowerTools, an open source library on GitHub.
https://github.com/OfficeDev/Open-Xml-PowerTools
more info about DocumentBuilder: http://openxmldeveloper.org/wiki/w/wiki/documentbuilder.aspx
Given that you want to do search and replace, check out OpenXmlRegex, also part of Open-Xml-PowerTools:
http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2014/07/22/search-and-replace-content-in-docx-pptx-using-regular-expressions.aspx
All open source, all free (both as in beer and speech).
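To check from code whether a merged file still contains the source documents as embedded parts, as the question saw in WinZip, one can list the entries of the .docx package, which is an ordinary zip archive. The Python sketch below is only a diagnostic, not a merge; the function names are hypothetical. It also assumes the merge produced altChunk elements written with the conventional w: prefix, which the question does not confirm.

```python
import zipfile


def embedded_documents(docx_path):
    """Return the names of .docx parts stored inside a .docx package."""
    with zipfile.ZipFile(docx_path) as package:
        return [name for name in package.namelist()
                if name.lower().endswith(".docx")]


def uses_alt_chunks(docx_path):
    """Report whether the main document part references altChunk content.

    Assumes the usual "w:" namespace prefix; a package written with a
    different prefix would need a real XML parse instead of a text search.
    """
    with zipfile.ZipFile(docx_path) as package:
        body = package.read("word/document.xml").decode("utf-8")
    return "<w:altChunk" in body
```

If embedded_documents returns an empty list after running a tool such as DocumentBuilder, the content has been inlined into the main document part, and search and replace only has to look at that one part.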

How to most effectively automate repetitive Excel task?

I want to automate Excel using Perl to do the following task(s):
For a list of Excel .xls files, do the following:
Open the file
Set Format to CSV
Save the file under the original filename and directory, but replace the extension "xls" with "csv"
Close the file
End
I found how to open files, even how to save them. I did not find how to change the fileformat/save as a different format. There shall be no user dialogs popping up, it should be fully automated. The Excel file list I can generate myself, a parameterized "find" or maybe "dir" should suffice.
If you are using Excel automation, Excel itself is a great help. Use the VBA environment (Alt+F11) to look up the Excel objects you want to use.
The object browser (F2) is very valuable. The method you need is Workbook.SaveAs, whose FileFormat argument selects the output format:
Workbook.SaveAs([Filename], [FileFormat], [Password], [WriteResPassword], [ReadOnlyRecommended], [CreateBackup], [AccessMode As XlSaveAsAccessMode = xlNoChange], [ConflictResolution], [AddToMru], [TextCodepage], [TextVisualLayout], [Local])
Searching for CSV in the object browser will show the relevant Excel constants together with their numeric values. You need the numeric values because you probably cannot use the named Excel constants in Perl.
See Spreadsheet::ParseExcel and xls2csv, they will help you.
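The automation loop from the question could be sketched as follows. This is Python rather than the Perl the question asks for, and it assumes Windows with Excel and the pywin32 package installed; the function names are hypothetical. The value 6 is the numeric value of the xlCSV constant that the object browser shows.

```python
from pathlib import Path

XL_CSV = 6  # numeric value of Excel's xlCSV file format constant


def csv_target(xls_path):
    """Same directory and base name, with the extension replaced by .csv."""
    return Path(xls_path).with_suffix(".csv")


def convert_all(xls_paths):
    """Open each workbook in Excel and save it as CSV without dialogs."""
    # Assumption: pywin32 is installed; it is imported here so the rest of
    # the module still loads on machines without Excel.
    import win32com.client

    excel = win32com.client.Dispatch("Excel.Application")
    excel.DisplayAlerts = False  # suppress overwrite and format prompts
    try:
        for path in xls_paths:
            source = Path(path).resolve()
            workbook = excel.Workbooks.Open(str(source))
            # Second positional argument of SaveAs is FileFormat.
            workbook.SaveAs(str(csv_target(source)), XL_CSV)
            workbook.Close(False)  # do not prompt to save changes
    finally:
        excel.Quit()
```

Setting DisplayAlerts to False is what keeps the run free of user dialogs. Note that saving as CSV writes only the active sheet of each workbook.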

How to generate Microsoft Word documents using Sphinx

Sphinx supports a few output formats:
Multiple HTML files (with html or dirhtml)
LaTeX, which is useful for creating .pdf or .ps
text
How can I obtain output in a Microsoft Word file instead?
With another doc generator I managed to generate a single html output file and then convert it to Microsoft Word format using the Word application.
Unfortunately I don't know a way to generate either Word or the HTML single-page format.
The solution I use is the singlehtml builder, as andho mentioned in the comments, followed by converting the HTML to docx with pandoc.
The following sample assumes the generated html would be located at _build/singlehtml/index.html
make singlehtml
cd _build/singlehtml/
pandoc -o index.docx index.html
There is a Sphinx extension for generating docx format (which I haven't tested) and a newer one (which I also haven't tested, but looks like it is more actively maintained)
To convert reStructuredText files to MS Word .doc, I use rst2odt followed by unoconv. See the following script:
#!/bin/sh
# Convert a reStructuredText file ($1) to .doc via an intermediate .odt.
rst2odt "$1" "$1.odt"
unoconv -f doc "$1.odt"
rm "$1.odt"
With rst2odt you can use your own stylesheet. unoconv, which works through OpenOffice, can also apply an OpenOffice style (template) during the conversion. Simply edit a converted document, change styles, add headers and footers, save that as an ODF Text Document Template (OTT), and use it as part of the conversion, like:
unoconv -f doc -t template.ott "$1.odt"
to use that template for various conversions later on.
I realize this is an old question, but I found that LibreOffice supports the following way of doing conversion (assuming soffice.exe is in your path):
soffice.exe --invisible --convert-to doc myInputFile.odt
Some things I have read say to use the --headless option rather than --invisible. Both seem to work on Windows.
You can start with the rst2odt.py script and then do the above to convert to an MS Word document.
Here is a link with additional start up options for LibreOffice:
http://help.libreoffice.org/Common/Starting_the_Software_With_Parameters
Here is a link with file types supported by OpenOffice which, I believe, LibreOffice should also support:
http://wiki.services.openoffice.org/wiki/Framework/Article/Filter/FilterList_OOo_3_0
This answer is not a command-line solution and is obviously not the best one, but it works for me and saves me time. After generating the HTML file, open it in a browser, copy the entire page (Ctrl+A, then Ctrl+C), then start Microsoft Word (or use the online version if, like me, you don't have Microsoft Windows) and paste (Ctrl+V) into it.
The best option might be rst -> odt -> doc:
Convert the Sphinx documents into OpenOffice format.
Then open the odt with OpenOffice and save it as Word. But I don't know how to do this automatically.
This is a workaround using Calibre (https://calibre-ebook.com), which includes a powerful converter. It worked well and most of the formatting is preserved:
Generate epub output in Sphinx with make epub.
Import the epub output into Calibre, then convert epub to docx using the built-in ebook converter.
Answer is too late for the original question, but people looking at the same problem may find this useful.
I don't know what Sphinx is, but you could create an RTF file, an HTML file, or something similar.
See the following blogpost for more information/approaches : OFFICE AUTOMATION
and from there : How to use ASP to generate a Rich Text Format (RTF) document to stream to Microsoft Word
This article describes how you can generate Rich Text Format (RTF) files with ASP script and then stream those files to Microsoft Word. This technique provides an alternative to server-side Automation of Microsoft Word for run-time document generation.
You don't use ASP script (who does? :-) ), but the article is useful for the idea.