Disable Automerge of Tables in Rational Publishing Engine - ms-word

We are using Rational Publishing Engine to generate documents from IBM Doors.
I want to create a 2x2 table for each requirement in the Doors database, e.g.:
ID SRS-1234
Req The system shall so some magick
However, if I export multiple Doors objects to MS Word, the 2x2 tables are merged into one big table inside the Word document. This means, for example, that a 10x2 table will be generated if 5 subsequent Doors objects are being exported.
Does anybody know a trick to hinder RPE/Word from merging the tables?
I would like to avoid adding additional paragraphs as paddings between the tables. Therefore, a setting to disable this behaviour would be my preferred solution.

The way to keep two adjacent tables from merging into one is to place a paragraph mark (ANSI 13) between them. Word requires a paragraph mark at the end of a table - it uses that structure to store information about the table, relative to the surrounding text.
If the paragraph mark should be less visible it can be formatted with a small font size (minimum: 1 pt). It might also be necessary to check the paragraph Before/After spacing and possibly the line spacing.
Should you need this a lot in a document it's a good idea to create a Style for this set of formatting and apply the style, as needed.

Related

How can I perform automated tests against MS Word documents using PowerShell?

We regularly need to perform a handful of relatively simple tests against a bunch of MS Word documents. As these checks are currently done manually, I am striving for a way to automate this. For example:
Check if every page actually has a page number and verify that it is correct.
Verify that a version identifier in the page header is identical across all pages.
Check if the document has a table of contents.
Check if the document has a table of figures.
Check if every figure has a caption.
et cetera. Is this reasonably feasible using PowerShell in conjunction with a Word API?
Powershell can access Word via its object model/Interop (on Windows, at any rate) and AIUI can also work with the Office Open XML OOXML) API, so really you should be able to write any checks you want on the document content. What is slightly less obvious is how you verify that the document content will result in a particular "printed appearance". I'm going to start with some comments on the details first.
Just bear in mind that in the following notes I'm just pointing out a few things that you might have to deal with. If you're examining documents produced by an organisation where people are already broadly speaking following the same standards, it may be easier.
Of the 5 examples you give, without checking the details I couldn't say exactly how you would do them, and there could be difficulties with all of them, but for example
Check if every page actually has a page number and verify that it is correct.
Difficult using either OOXML or the object model, because what you would really be checking is that the header for a particular section had a visible { PAGE } field code. Because that field code might be nested inside other fields that say "if don't display this field code", it's not so easy to be sure that there would be a page number.
Which is what I mean by checking the document's "printed appearance" - if, for example, you can use the object model to print to PDF and have some mechanism that lets PS inspect the PDF's content, that might be a better approach.
Verify that a version identifier in the page header is identical across all pages.
Similar problem to the above, IMO. It depends partly on how the version identifier might be inserted. Is it just a piece of text? Could it be constructed from a number of fields? Might it reference Document Properties or Variables, or Custom XML content?
Check if the document has a table of contents.
Perhaps enough to look for a TOC field that does not have certain options, such as a \c option that a Table of Figures would contain.
Check if the document has a table of figures.
Perhaps enough to check for a TOC field that does have a \c option, perhaps with a specific parameter such as "Figure"
Check if every figure has a caption.
Not sure that you can tell whether a particular image is "a Figure". But if you mean "verify that every graphic object has a caption", you could probably iterate through the inline and floating graphics in the document and verify that there was something that looked like a Word standard caption paragraph within a certain distance of that object. Word has two standard field code patterns for captions AFAIK (one where the chapter number is included and one where it isn't), so you could look for those. You could measure a distance between the image and the caption by ensuring that they were no more than a predefined number of paragraphs apart, or in the case of a floating image, perhaps that the paragraph anchoring the image was no more than so many paragraphs away from the caption.
A couple of more general problems that you might have to deal with:
- just because a document contains a certain feature, such as a ToC field, does not mean that it is visible. A TOC field might have been formatted as not visible. Even harder to detect, it could have been formatted as colored white.
- change tracking. You might have to use the Word object model to "accept changes" before checking whether any given feature is actually there or not. Unless you can find existing code that would help you do that using the OOXML representation of the document, that's probably a strong case for doing checks via the object model.
Some final observations
for future checks, perhaps worth noting that in principle you could create a "DocumentInspector" that users could call from Word BackStage to perform checks on a document. Not sure you can force users to run it, or that you could create it in PS, but perhaps a useful tool.
longer term, if you are doing a very large number of checks, perhaps worth considering whether you could train a ML model to try to detect problems.

Where are List informations stored in ApachePoi's XWPFDocument?

I want to merge two or more docx files (append them after each other) or move one part of the document(XWPFParagraph) to other place.
The problem is that listings always breaks after such an operation. Say we have a listing in a document which has sequence numbers then we have other listing in another document which has bullets or letters. Than after the copy all of the bullets becomes numbers (or worse numbers which starts from where the previous listing has been ended).
I have tried several solutions :
-traversing BodyElements and copying Paragraphs and Tables by hand like here.
-attaching a newBody into an existing one like here here
Aside from page scoped styles they work well. But the listings never. Is that means the listing symbols are stored as page scoped information (otherwise it would be copyied successfully with the XWPFParagraph)? If yes than why and where?
I have dig myself into the javadoc: https://poi.apache.org/apidocs/dev/org/apache/poi/xwpf/usermodel/XWPFDocument.html
But couldn't find anything about the listings.
The Word numberings (numbered lists but bullet lists also) in Office Open XML file format are stored in /word/numbering.xml of the *.docx ZIP archive. There are abstractNum elements describing the list format and num elements referencing the abstractNum. The numId of the numelements are referenced in paragraphs of /word/document.xml to set which numbering formats shall be used in that paragraph. Paragraphs referencing the same numId are in the same list.
Paragraphs referencing different numId are in different lists.
In apache poi there are XWPFNumbering representing the document part /word/numbering.xml and XWPFAbstractNum representing the abstractNum.
Until now there is no way creating XWPFAbstractNum from scratch without using the low level ooxml-schemas classes.
Also, as far as I know, there is no simple way to merge /word/numbering.xml document parts of different Word documents because of the need handling the different Ids in /word/numbering.xml as well as their occurrences in /word/document.xml. This is very complex and I do not know any free library which can do this properly.
In general, as far as I know, there is no simple way to merge different Word documents together because of the complex storage in Word file formats. All provided possibilities using free code are only halfway useful (traversing and copying), if not wrong and useless (simply attaching multiple document bodys one after the other) at all.

How to check if a table will fit in current page or get split up in two pages in itext7

I am trying to create a table in a PDF document with itext7. But, if the content before the table is too big, the table gets split up between current page and next page. I want to insert a -
document.Add(new AreaBreak())
if there is not enough space left in the current page for the table to be inserted completely. However, I have no idea on how to calculate the available space.
Any help or pointers will be highly appreciated.
From your requirement to avoid page-break inside the table, I assume Table#setKeepTogether(boolean) is exactly what you need.
This property ensures that, if it's possible, elements with this property are pushed to the next area if they are split between areas.
This is not exactly what you have asked, however it seems it's what you want to achieve. Manually checking for this use case might be tricky. You would need to have a look at renderers mechanism and inner processing of iText layout in order to get available space left and space needed for the table. You would also need to take care of the cases like if table is to big to be fit on single page. Also #setKeepTogether(boolean) works when elements are nested inside each other.

Binding Text Blocks in a Master to Shape Data on the Master

I'm returning to Visio after being a power user in the 2000's. A lot of what I'd do back in the day was create custom masters and associate data with the shape with individual labels etc. on those masters. Sort of a multi-part shape bound to the shape data on a given master, with fine-tuned arrangement.
The shapesheet seems entirely gone in 2016 Pro and now we have the data graphic features, which are nice and interesting, but they don't give you the same degree of fine-tuning and baked-in support that my old approach of building custom masters did.
How would I go about taking a text block on a master and binding it inside that master to the master's shape data for a given property? I'm betting it's a custom expression, but I'm not sure what the syntax would be.
Oh, my overall use case here: I want to have a shape with fine-tuned fields that are always visible, but appear in different compartments on the shape. I want to link external data into the shape and have the text blocks pull the value out of the shape data and render it for the area in question. I may use Data Graphics for ancillary things on a case by case basis, but at a core, I know I want certain features to always be present in a master and styled in certain ways.
to display the property of another shape you need to reference it in the form:
sheet!N.prop.X
N being the ID of the other shape, in your case the parent.
Store this value in a intermediate field, the use insert/field.
Here's a tool to do this automatically: http://visguy.com/vgforum/index.php?topic=6318.0
To handle input options to custom properties I recommend the following
1) set up a custom property of the page as semi-colon separated list for holding the desired values. eg: prop.myOption = "A;B;C"
2) in the shape needing this option, set up am according field as fixed list. In the format cell write: thePage!prop.options.
That's it. This way you can edit the list in one central place and have all the shapes updated.

iTextSharp comparing 2 PDFs for equality

I am generating and storing PDFs in a database.
The pdf data is stored in a text field using Convert.ToBase64String(pdf.ByteArray)
If I generate the same exact PDF that already exists in the database, and compare the 2 base64strings, they are not the same. A big portion is the same, but it appears about 5-10% of the text is different each time.
What would make 2 pdfs different if both were generated using the same method?
This is a problem because I can't tell if the PDF was modified since it was last saved to the db.
Edit: The 2 pdfs visually appear exactly the same when viewing the actual pdf, but the base64string of the bytes are different
Two PDFs that look 100% the same visually can be completely different under the covers. PDF producing programs are free to write the word "hello" as a single word or as five individual letters written in any order. They are also free to draw the lines of a table first followed by the cell contents, or the cell contents first, or any combination of these such as one cell at a time.
If you are actually programmatically creating the PDFs and you create two PDFs using completely identical code you still won't get files that are 100% identical. There's a couple of reasons for this, the most obvious is that PDFs support creation and modification dates. These will obviously change depending on when they are created. You can override these (and confuse everyone else so I don't recommend this) using something like this:
var info = writer.Info;
info.Put(PdfName.CREATIONDATE, new PdfDate(new DateTime(2001,01,01)));
info.Put(PdfName.MODDATE, new PdfDate(new DateTime(2001,01,01)));
However, PDFs also support a unique identifier in the trailer's /ID entry. To the best of my knowledge iText has no support for overriding this parameter. You could duplicate your PDF, change this manually and then calculate your differences and you might get closer to a comparison.
Then there's fonts. When subsetting fonts, producers create a unique internal name based on the original name and an arbitrary selection of six uppercase ASCII letters. So for the font Calibri the font's name could be JLXWHD+Calibri one time and SDGDJT+Calibri another time. iText doesn't support overriding of this because you'd probably do more harm than good. These internal names are used to avoid font subset collisions.
So the short answer is that unless you are comparing two files that are physical duplicates of each other you can't perform a direct comparison on their binary contents. The long answer is that you can tweak some of the PDF entries to remove unique parts for comparison only but you'd probably be doing more work than it would take to just re-store the file in the database.