OO:Doc -perl module for Openoffic - perl

I want to automate some writer tasks. I need to create a .odt writer
document with oo:doc using methods such as create paragraph and append
paragraph. The problem is that append paragraph and create paragraph does not
allow text to start at middle of page or at a certain column, ie
Name Surname Address
When I unzip the "master" document I want to to create, when I inspect the content.xml file i see the xml equivalent is
" <text:p text:style-name="Text_20_body"><text:s text:c="115"/><text:span text:style-name="T1"><text:s/>Hallo how are you today</text:span></text:p><text:p text:style-name="P1"><text:s text:c="116"/>I hope you are well also</text:p><text:p text:style-name="P1""
How do I set the text:c and text:s element(s) from within oo::doc
Question2:
How do i set the formatting of a paragraph
to only extend from ie column 20 to column 80
thanks

Those elements are for runs of non-breaking spaces. the text:c attribute says how many spaces there are.
That doesn't strike me as a solution to what you want, which is to change the margins and position of a paragraph, yes?
Do you have a document that you want to use as a template, where the text will be inserted? Or ar you trying to create the entire page from scratch?
I think you want to use OpenOffice.org to create a Writer document that has the structure you want, then look at the XML to see what the markup is that accomplishes that. Look at paragraph-level styles or even frames if that is what is used. You might be able to create insertion points for your generated content by then adding magic-text phrases that you can scan for.
Then figure out how to get that done with the perl module.

Related

The auto numbering/bullets of the Source MS Word document do not automatically align with those of the Destination MS Word document

It happens when a user inserts a Word Document(Source) into another Word Document(Destination) - auto numbering/bullets didn't work...
I wont to insert text here, and continue auto-numbering of source as it in destination:
Step 1
But it inserting above without auto-numbering:
Step 2
I have an answer from MS that it's a feature of MS Word :(.
So, the question is - is it possible to automatically align the above either by use Word Automation ( via C# , .NET Interop Word APIs )?
I mean to align the source document contents as per the destination document’s Auto-numbering. The same is about Bullets...
Please help - we are open to any suggestions/ recommendations.
Both documents need to have the same style definitions for both the Paragraph Style of the text being copied and also the List Style that organizes the numbering or bullets. Using automation, you can copy styles between the files before actually moving the numbered or bulleted text.
I agree with John.
The basic idea is to use the Define New MultiLevel List Dialog to define a new multilevel list in which each level you would be using is attached to an existing paragraph style. A paragraph style can be attached to only one level in one list. Then use the styles to apply the numbering.
Once you have them set up, you apply the numbering by using the styles, not the numbering controls.
Setting up the numbering linked to styles can seem a bit convoluted. Step-by-step instructions for doing it in Windows can be found here:
http://www.shaunakelly.com/word/numbering/numbering20072010.html
Backup: http://web.archive.org/web/20130510174814/http://www.shaunakelly.com/word/numbering/numbering20072010.html
For a Mac, John has a page showing the Mac controls to accomplish it.
http://www.brandwares.com/bestpractices/2016/06/outline-numbering-in-word-for-os-x/
Backup link: http://web.archive.org/web/20200912134758/http://www.brandwares.com/bestpractices/2016/06/outline-numbering-in-word-for-os-x/
These describe the only known ways to have consistent numbering in Word in heavily-edited / co-authored documents.

iText - Manipulate existing PDF - add dashes to end of each paragraph

I need to manipulate existing PDF in iText to add dashes to the end of each paragraph. Something like this:
I would make this in Word with tab leaders.
Is this possible to do with iText on an existing document.
Any help would be greatly appreciated.
Thanks!
Edit for clarifications
iText version is 5.5.x, but I guess we can upgrade it if the task would be easier with newer version.
There could be some paragraph that do not need dashes, but I have some control of the original PDF. It is assembled from different system and I could add some kind of markers to the paragraphs that need leaders (ie. I can add text like "~tab~" at the end of such paragraphs).
At the moment the documents that need this kind of editing have headers and footer, nothing but the text and one column with justified alignment.
Edit for even more clarification
I can even (by configuration) set where the dashes has to end (ie. at 10px) for specific document. We know every document type (and its structure) that needs to be manipulated this way.
This is insanely hard.
You should think of a PDF document as a container of instructions, rather than a WYSIWYG format. So finding out where lines are (let alone paragraphs) is very hard.
High level plan:
use IEventListener to process events from the PDF being parsed
look out for TextRenderInfo events, store them
sort TextRenderInfo events to ensure your list of events is in logical reading order.
merge items in your list if they appear on the same line and are less than a certain distance apart (for instance the distance of 3 spaces in the font specified by TextRenderInfo)
Now you should have lines
Merge lines if they appear in close vertical proximity of eachother and they overlap horizontally. How close they should be, and how much they overlap is something you'll have to figure out, and might differ from page to page, and document to document.
now you should have paragraphs
Figure out the bounding box of each paragraph. Or more accurately, the convex hull. There is a good algorithm for this called the gift-wrapping algorithm.
Now you can simply insert lines by inspecting your convex hull. This is the easy step.
If you can insert markers, you can easily do this using iText7. iText7 has an implementation of IEventListener that allows you to look for regular expressions within a PDF document. It returns the locations where the regular expression was found. If you can ensure your markers always satisfy some kind of regular expression, you can easily look for them, get their coordinates, and insert a line at the calculated position.
Of course, then you need to get rid of the marker text.
For that you can use pdfSweep.

Is there an option to control output page orientation (using knitr->pander->pandoc->docx)

I am playing with Tal's intro to producing word tables with as little overhead as possible in real world situations. (Please see for reproducible examples there - Thanks, Tal!) In real application, tables are to wide to print them on a portrait-oriented page, but you might not want to split them.
Sorry if I have overlooked this in the pandoc or pander documentation, but how do I control page orientation (portrait/landscape) when writing from R to a Word .docx file?
I maybe should add tat I started using knitr+markdown, and I am not yet familiar with LaTex syntax. But I'm trying to pick up as much as possible while getting my stuff done.
I am pretty sure the docx writer has no section breaks implemented, also as far as I understand --reference-docx allows for customizing styles and not the page layout (but I might also be wrong here), this is from pandocs guide on --reference-docx:
--reference-docx=FILE
Use the specified file as a style reference in producing a docx file.
For best results, the reference docx should be a modified version of a
docx file produced using pandoc. The contents of the reference docx
are ignored, but its stylesheets are used in the new docx. If no
reference docx is specified on the command line, pandoc will look for
a file reference.docx in the user data directory (see --data-dir). If
this is not found either, sensible defaults will be used. The
following styles are used by pandoc: [paragraph] Normal, Title,
Authors, Date, Heading 1, Heading 2, Heading 3, Heading 4, Heading 5,
Block Quote, Definition Term, Definition, Body Text, Table Caption,
Image Caption; [character] Default Paragraph Font, Body Text Char,
Verbatim Char, Footnote Ref, Link.
Which are styles that are saved in the /word/styles.xml component of the docx document.
The page layout on the other hand is saved in the /word/document.xml component in the <w:sectPr> tag, but pandoc's docx writer ignores this part as far as I can tell.
The docx writer builds by default a continuous document, with elements such as headers, paragraphs, simple tables and so on ... much like a html output.
Option #1 (doesn't solve the page orientation problem):
The only page layout option that you can define through styles is the pageBreakBefore which will add a page break before a certain style
Option #2 (seems elegant but hasn't been tested):
Recently the custom writer has been added that allows for a custom lua script, where you should be able to define how certain Pandoc blocks will be written into the output file ... meaning you could potentially define section breaks and page layout for a specific block inserting the sectPr tag into the document. I haven't tried this out but it would be worth investigating. On pandoc github you can check out a sample lua script file for custom html output.
However, this means, you have to have lua installed, learn the language, and it is up to you if you think its worth the time investment.
Optin #3 (a couple of clicks in Word might just do):
As you will probably spend quite some time setting up how to insert sections and what would be the right size, margins, and figuring how to fit the table to such a layout ... I recommend that you use pandoc to put write your document.docx, that you open in Word, and do the layout by hand:
select the table you want on the landscape page
go to Layout > Margins
> select Apply to: Selected text
> choose Page Setup > select Landscape
Now a new section with a landscape orientation should surround your table.
What you would anyway also probably want to do is styling the table and table caption a little (font-size,...), to achieve the best result (all text styling can be already applied with pandoc where --reference-docx comes handy).
Option #4 (in situation when you can just use pdf instead of docx):
As far as I could figure out is that with pandoc does a good job with tables in md -> docx (alignment, style, ... ), in tex -> docx it had some trouble sometimes. However if your option allows for a pdf output latex will be your greatest friend. For example your problem is solved as easily as just using
\usepackage{pdflscape}
and adding this around your table
\begin{landscape}
...
\end{landscape}
This are the options that I could think of so far.
I would always recommend using the pdf format for reports, as you can style it to your liking with latex and the layout will stay the way you want it to be.
However, I also know that for various reasons word documents are still the main way of reviewing manuscripts in many fields ... so i would most likely just go with my suggested option 3, mostly cause it is a lazy and quick solution and because I usually don't have many documents with tons of giant tables with awkward placement and styling.
Good luck ;-)
Based on Taleb's answer here and some officer package functions, I created a little gist that one can use like this:
---
title: "Example"
author: "Dan Chaltiel"
output:
word_document:
pandoc_args:
'--lua-filter=page-break.lua'
---
I'm in portrait
\endLandscape
I'm in landscape
\endPortrait
I'm in portrait again
With page-breaks.lua being the file hosted here: https://gist.github.com/DanChaltiel/e7505e62341093cfdc489265963b6c8f
This is far from perfect (for instance it won't work without the last portrait section), but it is quite useful sometimes.

Is there a way to detect when a field mark with 'can grow' has truncated the field data?

I do not normally work with crystal, but I have spent nearly 2 days looking for a way to do this.
The problem is that I have a number of lines of text that need to show on a report, but need to cut off after 8 lines and show a 'more' prompt to inform the user that they need to go look at the rest of the details online. This was originally handled by storing the data as individual lines already wrap to size and counting the lines with a formula and conditionally showing a separate 'more' field. They have since added the ability to use html to the text, but this made the current way of doing things wrap incorrectly and show the html mark up.
I wrote a database function to combine the text into a single field and use the HTML text interpretation to display it correctly on 7 other reports that do not limit the text length, and the max line count works great for limiting the text size, I just can't figure out how to show the 'more' prompt when needed.
Any suggestions would be greatly appreciated.
GrumpyGeek,
If your database function now combines the text into a single field does this mean the original way, with the separated lines, is still stored? If so, why not add another calculated field called 'line-count' that tallies the old line-based data?
So you'd still have your new combined HTML field and this new field that you could use to show the 'more' button when 'line-count > x'?
Alternatively, another option might work, but would be a bit touchy. That is to make the formula that shows the more button trigger when the field length exceeds x. The catch is that html mark-up isn't displayed, and heavy use of it would skew the amount of text required before you should show the 'more' button. Put another way, a field with very heavy use of mark-up ( and tags) might force the 'more' button earlier than it should. Unless you could somehow make either your 'line-count' calculated field exclude the mark-up OR make the length calculation do the same.
This would be possible if MSSQL or Crystal Reports could run regex to strip the mark-up.
If NONE of the above works, the only other thing I can suggest is to look into UDFs. Crystal allows you to load an external library that you write. These will read functions you write and show them in the function list inside Crystal. If you do this, then you could easily write a routine that strips the HTML and calculates when the more button should be shown.
Good luck with it.
Ideally, there would be a property of the DB field that would return its displayed line count. Unfortunately, there is no such property.
You could try counting the # of line ending characters (e.g. carriage return, line feed). If they are > 7 then show the hyperlink. In a HTML situation, you have to count ending elements (e.g. ). You could make use of a RegEx UFL to make it easier to identify the elements.
Probably the easiest route is to the DB to calculate the # of lines and return that as another field. Use this field to hide/show the hyperlink.

Is it possible to show the contents of a text file in Crystal Reports

I have a crystal report which contains a list of absolutely referenced text files. There is one text file referenced in each body line.
e.g.
line1 c:\file1.txt
line2 c:\file2.txt
Is there any way to display the contents of these files in Crystal?
i.e. I would like each crystal body line to show the text from the referenced text file.
I'm using Crystal reports 11 with a non-standard database connector (dataflex).
You would need to set up a file dsn (in XP it's under Control Panel/Administrative Tools/Datasources (ODBC)) and then use the file dsn (Microsoft Text Driver) for the datasource as an ODBC(RDO) connection.
I set this test scenario up on mine like the following:
**File 1**
column1
1row1
1row2
1row3
**File 2**
column1
2row1
2row2
2row3
I set up the file dsn to point to the c drive and in the datasource screen I added file1.txt and file2.txt to the selected tables. Then the easiest thing to do is clear the links of the tables so that it pulls every row. It will warn you that there are multiple starting points. I don't generally recomend this, but it will work in this case and since it's not reporting off a database it probably isn't the end of the world. If you disregard the starting point message then add the fields to the report, when you run it you should get the following output:
1row1 2row1
1row1 2row2
1row1 2row3
1row2 2row1
1row2 2row2
1row2 2row3
1row3 2row1
1row3 2row2
1row3 2row3
From this you can change your grouping to get the output that you need.
You can also use this same connect against subreports instead of doing this linking where you have the main report pull the info from file1.txt and then put a subreport in the report footer that pulls from file2.txt. This option won't have the text collated, but you'd still have it in the same report.
Hope this helps some.
It's easier than you think. I just set up one myself before I wrote this to make sure I was giving you the right steps. Using CR version XI and a .txt file, I followed these steps:
For each text file you want to import, make a subsection in your report (i.e. DetailsA, DetailsB, etc.). If your list of text files is constantly changing (and I don't think it is, based on your description), you'll need another method.
Make sure your text file is comma delimited and the first row contains field names. If these text files are actually text (i.e. not tables), then just put a dummy variable name in the first row so Crystal will see the text as a table of data with just 1 row.
For each text file you want to display, create a new Subreport (Insert->Subreport)
In the database selection menu, go to "Create New Connection"->"Access/Excel (DAO)"
Under 'database type', you'll see a 'text' option at the bottom of the screen.
Choose your file.
Relax! (I'm in a good mood this morning, don't know why)
I guess if you have a function that takes a file name as an argument and returns the contents of that file - you could use that function in a Crystal Report formula.
I am not familiar with the current CR, it has been years since I last used it (I last used version 8). In the versions I did use, such a function was not built in. What you would have to do back then, was to create a UFL (user function library) containing the functions you needed. If I remember correctly, you had to do this using COM.
In this day and age, I guess you can extend CR using some other mechanism, perhaps writing .NET code?
I suggest you search the CR documentation for the term UFL.
Another suggestion, then:
Create a new table FILECONTENTS (filename varchar primary key, contents blob)
Create a script that on a schedule populates this table with the filenames and contents of all the files (assuming that there is a finite number of files, and that you have a way of knowing about them)
Modify the report datasource query to join it with the FILECONTENTS table, and add the contents field to the report.
You could setup a file dsn. But this is geared toward tabular file data, not text.
How big are these text files? You want to display the entire contents of each file?
There is probably no easy way to dynamically read in a file from within crystal. You will most likely have to push a dataset to the report which contains the file contents.