How to get MS Word total page count using Open XML SDK?

I am using the code below to get the page count, but it does not return the actual page count. What is a better way to get the total page count?
var pageCount = doc.ExtendedFilePropertiesPart.Properties.Pages.Text.Trim();
Note: we cannot use the Office Primary Interop Assemblies in our Azure web app service.
Thanks in advance.

In theory, the following property can return that information from the Word Open XML file, using the Open XML SDK:
int pageCount = int.Parse(document.ExtendedFilePropertiesPart.Properties.Pages.Text);
In practice, however, this isn't reliable. It might work, but it might not: it all depends on (1) what Word managed to save in the file before it was closed and (2) what kind of editing may have been done to the file after it was closed.
The only sure way to get a page number or a page count is to open the document in the Word application. Page numbers and the page count are calculated dynamically by Word during editing. Once the document is closed, the stored value is static and not necessarily what it will be when the document is opened or printed.
See also https://github.com/OfficeDev/Open-XML-SDK/issues/22 for confirmation.
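To see why the value can be stale, it helps to know where it lives: in a typical .docx, the SDK's ExtendedFilePropertiesPart is the ZIP entry docProps/app.xml, and Properties.Pages is its Pages element. The Java sketch below reads that element using only the JDK. It assumes the conventional docProps/app.xml path rather than resolving the part through _rels/.rels, and it simply returns whatever number the last saving application wrote; nothing is paginated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class CachedPageCount {

    /**
     * Returns the Pages value cached in docProps/app.xml, or -1 if it is missing.
     * Assumption: the extended properties part sits at the conventional path.
     * The number is only what the last saving application stored.
     */
    public static int readCachedPages(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals("docProps/app.xml")) {
                    continue;
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Read the entry into memory first so the parser cannot close the zip stream.
                Document xml = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(zip.readAllBytes()));
                NodeList pages = xml.getElementsByTagNameNS("*", "Pages");
                if (pages.getLength() == 0) {
                    return -1;
                }
                return Integer.parseInt(pages.item(0).getTextContent().trim());
            }
        }
        return -1;
    }

    /** Builds a tiny in-memory zip that holds only docProps/app.xml, for demonstration. */
    public static byte[] fakeDocx(String appXml) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("docProps/app.xml"));
            zip.write(appXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String appXml = "<Properties xmlns=\"http://schemas.openxmlformats.org/officeDocument/2006/extended-properties\">"
                + "<Pages>3</Pages></Properties>";
        System.out.println(readCachedPages(new ByteArrayInputStream(fakeDocx(appXml)))); // 3
    }
}
```

Editing the Pages text in such a file (or editing the body without updating app.xml) changes nothing else. That is the sense in which the answer calls the property unreliable.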

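This code worked for me. It adds "Page X of Y" to the document by inserting PAGE and NUMPAGES fields, which Word evaluates when it lays out the document. It does not give your code the page count as a number.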
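using DocumentFormat.OpenXml.Wordprocessing;

// w:fldSimple is paragraph-level content, so each field sits beside the runs rather than inside one.
var para = new Paragraph(
new Run(new Text() { Text = "Page ", Space = SpaceProcessingModeValues.Preserve }),
new SimpleField() { Instruction = "PAGE" },
new Run(new Text() { Text = " of ", Space = SpaceProcessingModeValues.Preserve }),
new SimpleField() { Instruction = "NUMPAGES \\*MERGEFORMAT" });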

Related

How to properly close Word documents after Documents.Open

I have the following code for a C# console app. It parses a Word document for textboxes and inserts the same text into the document at the textbox anchor point with markup. This is so I can convert to Markdown using pandoc, including textbox content which is not available due to https://github.com/jgm/pandoc/issues/3086. I can then replace my custom markup with markdown after conversion.
The console app is called in a PowerShell loop for all documents in a target list.
When I first run the PowerShell script, all documents are opened and saved (with a new name) without error. But the next time I run it, I occasionally get a popup error:
The last time you opened '' it caused a serious error. Do you still want to open it?
I can get past this by selecting Yes on every popup, but this requires manual intervention and is tedious and slow. Why does this code cause this problem?
string path = args[0];
Console.WriteLine($"Parsing {path}");
Application word = new Application();
Document doc = word.Documents.Open(path);
try
{
foreach (Shape shp in doc.Shapes)
{
if (shp.TextFrame.HasText != 0)
{
string text = shp.TextFrame.TextRange.Text;
int page = shp.Anchor.Information[WdInformation.wdActiveEndPageNumber];
string summary = Regex.Replace(text, @"\r\n?|\n", " ");
Console.WriteLine($"++++textbox++++ Page {page}: {summary.Substring(0, Math.Min(summary.Length, 40))}");
string newtext = $@"{Environment.NewLine}TEXTBOX START%%%{text}%%%TEXTBOX END{Environment.NewLine}";
var range = shp.Anchor;
range.InsertBefore(Environment.NewLine);
range.Collapse();
range.Text = newtext;
range.set_Style(WdBuiltinStyle.wdStyleNormal);
}
}
string newFile = Path.GetFullPath(path) + ".notb.docx";
doc.SaveAs2(newFile);
}
finally
{
doc.Close();
word.Quit();
}
Quoting the question: "The console app is called in a PowerShell loop for all documents in a target list."
You can automate Word directly from your PowerShell script without involving any other dependencies. At the very least, that lets you keep a single Word instance instead of creating a new Word Application instance for every document, as these lines currently do:
Application word = new Application();
Document doc = word.Documents.Open(path);
In the loop, you can then just open each document, process it, and close it. That should improve the overall performance of your solution.
When you are done processing a document, close it by calling the Close method, which closes the specified document.
Also, when you create a new Word Application instance, don't forget to shut it down by calling the Quit method, which quits Microsoft Word and optionally saves or routes the open documents (VBA syntax shown):
Application.Quit SaveChanges:=wdSaveChanges, OriginalFormat:=wdWordDocument
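The structure this answer recommends (one Application for the whole batch, Close for every document, a single Quit at the end) does not depend on COM, so it can be sketched in isolation. In the Java below, FakeWordApp and FakeDocument are hypothetical stand-ins that only log calls; they are not part of any Word API. The nested try/finally blocks are one way to guarantee that each document is closed and that Quit runs exactly once, even if processing a document throws.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleInstanceBatch {

    /** Hypothetical stand-in for a Word document; it only records what happens to it. */
    static class FakeDocument {
        private final String path;
        private final List<String> log;

        FakeDocument(String path, List<String> log) {
            this.path = path;
            this.log = log;
            log.add("open " + path);
        }

        void saveAs(String newPath) { log.add("save " + newPath); }

        void close() { log.add("close " + path); }
    }

    /** Hypothetical stand-in for Word's Application object. */
    static class FakeWordApp {
        private final List<String> log;

        FakeWordApp(List<String> log) {
            this.log = log;
            log.add("start");
        }

        FakeDocument open(String path) { return new FakeDocument(path, log); }

        void quit() { log.add("quit"); }
    }

    /** Processes every path with ONE application instance and returns the call log. */
    public static List<String> processAll(List<String> paths) {
        List<String> log = new ArrayList<>();
        FakeWordApp word = new FakeWordApp(log);   // started once for the whole batch
        try {
            for (String path : paths) {
                FakeDocument doc = word.open(path);
                try {
                    // ... edit the document here ...
                    doc.saveAs(path + ".notb.docx");
                } finally {
                    doc.close();                   // each document is always closed
                }
            }
        } finally {
            word.quit();                           // quit exactly once, even if a document fails
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of("a.docx", "b.docx")));
        // [start, open a.docx, save a.docx.notb.docx, close a.docx, open b.docx, save b.docx.notb.docx, close b.docx, quit]
    }
}
```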

Getting page margins of a Word document using OpenXML

I tried different versions of Word documents with different page margins in them, but nothing I found on the web did the job.
Here's my attempt at reading page margins from a .docx file.
var document = WordprocessingDocument.Open(sourceDocxPath, true);
var myMargins = document.MainDocumentPart.Document.GetFirstChild<PageMargin>();
// always null
In the truest programmer fashion, I figured out the answer 5 minutes after posting.
// The body-level sectPr holds the last section's page settings; margin values are in twips (1/20 pt).
var leftMargin = document.MainDocumentPart.Document.Body
.GetFirstChild<SectionProperties>()
.GetFirstChild<PageMargin>().Left;
I figured it out by inspecting the outerXml of the document and seeing where things are positioned.
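The path in this answer (Body, then SectionProperties, then PageMargin) mirrors the XML. w:pgMar lives inside the body-level w:sectPr, never directly under w:document, which is why the first attempt always returned null. As a JDK-only sketch, the Java below walks that same path through the XML of word/document.xml. The XML is passed in as a string, and extracting it from the .docx is left out. The code then reads w:left, which is expressed in twips (1440 per inch). The firstChild helper is a hypothetical analogue of GetFirstChild<T>().

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class BodyMargins {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Like GetFirstChild<T>(): first direct child element with the given WordprocessingML local name. */
    static Element firstChild(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && W.equals(n.getNamespaceURI()) && localName.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Left margin from w:body/w:sectPr/w:pgMar in twips, or -1 if any step is missing. */
    public static int leftMarginTwips(String documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));
        Element body = firstChild(doc.getDocumentElement(), "body");
        Element sectPr = body == null ? null : firstChild(body, "sectPr");
        Element pgMar = sectPr == null ? null : firstChild(sectPr, "pgMar");
        if (pgMar == null || !pgMar.hasAttributeNS(W, "left")) {
            return -1;
        }
        // Assumes a plain twips number such as "1800", not a value with units like "1.25in".
        return Integer.parseInt(pgMar.getAttributeNS(W, "left"));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:document xmlns:w=\"" + W + "\"><w:body><w:p/>"
                + "<w:sectPr><w:pgMar w:top=\"1440\" w:right=\"1440\" w:bottom=\"1440\" w:left=\"1800\"/></w:sectPr>"
                + "</w:body></w:document>";
        int twips = leftMarginTwips(xml);
        System.out.println(twips + " twips = " + (twips / 1440.0) + " in"); // 1800 twips = 1.25 in
    }
}
```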

Issue while adding/updating TOC in MS Word using OpenXML

I need to add or update a TOC (table of contents) in an MS Word document using OpenXML, and I am having trouble doing so. I am using MS Office 2016.
I have tried all the options from this post:
How to generate Table Of Contents using OpenXML SDK 2.0?
I have also gone through Eric White's videos.
I am trying to use the UpdateField option, and I am able to add an empty TOC by following the sample code from the post above.
However, when I open the document, I do not get the pop-up dialog asking me to update the TOC.
Here is the sample code:
var sdtBlock = new SdtBlock();
sdtBlock.InnerXml = GetTOC(); //TOC Xml
document.MainDocumentPart.Document.Body.AppendChild(sdtBlock);
DocumentFormat.OpenXml.Wordprocessing.SimpleField f;
f = new SimpleField();
f.Instruction = "sdtContent";
f.Dirty = true;
document.MainDocumentPart.Document.Body.Append(f);
var setting = document.MainDocumentPart.DocumentSettingsPart;
if (setting != null)
{
document.MainDocumentPart.DocumentSettingsPart.Settings.Append(new UpdateFieldsOnOpen() { Val = new DocumentFormat.OpenXml.OnOffValue(true)});
document.MainDocumentPart.DocumentSettingsPart.Settings.Save();
}
It displays the default message, even though the document has valid entries (headings).
Is this due to MS Office 2016? Why does the update-fields pop-up not appear?
I don't want to use the following options:
Word Automation - because it requires Microsoft.Office.Interop.Word
Word Automation Services - because it requires SharePoint
Adding a macro - because Word then asks to save the document every time I open it
Please also let me know if there is a better way to create or update the TOC.
Any answer or comment would be very helpful.
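For reference, the WordprocessingML that this approach aims at can be sketched as plain strings. Two details differ from the code in the question and are worth checking against the spec. First, a w:fldSimple is paragraph content, so it belongs inside a w:p, whereas the code appends it directly to the body. Second, its instruction should be a field code such as TOC \o "1-3" \h \z \u (the switches Word commonly uses for a default TOC), not "sdtContent". The Java below only builds the XML and parses it back with the JDK. The method names and placeholder text are hypothetical, and the sketch does not by itself guarantee that Word shows the update prompt.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class TocFieldXml {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /**
     * A body paragraph holding a TOC simple field marked dirty. The field sits inside w:p,
     * not directly under w:body. The placeholder text is hypothetical; Word replaces it
     * when the field is updated.
     */
    public static String tocParagraph(int fromLevel, int toLevel) {
        String instr = "TOC \\o &quot;" + fromLevel + "-" + toLevel + "&quot; \\h \\z \\u";
        return "<w:p><w:fldSimple w:instr=\"" + instr + "\" w:dirty=\"true\">"
                + "<w:r><w:t>Update this field to build the table of contents.</w:t></w:r>"
                + "</w:fldSimple></w:p>";
    }

    /** The w:settings child that the SDK's UpdateFieldsOnOpen class serializes to. */
    public static String updateFieldsSetting() {
        return "<w:updateFields w:val=\"true\"/>";
    }

    /** Parses a paragraph built above and returns the field instruction as Word would read it. */
    public static String parsedInstruction(String paragraphXml) throws Exception {
        String wrapped = "<w:body xmlns:w=\"" + W + "\">" + paragraphXml + "</w:body>";
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(wrapped.getBytes(StandardCharsets.UTF_8)));
        Element field = (Element) doc.getElementsByTagNameNS(W, "fldSimple").item(0);
        return field.getAttributeNS(W, "instr");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parsedInstruction(tocParagraph(1, 3)));
        System.out.println(updateFieldsSetting());
    }
}
```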

How to start a new line in SAPUI5

//create page 4 content of Iviews
var page4 = new sap.m.Page("page4",{
title : "Iviews",
content : new sap.m.Text({
text : "An iView (integrated view) is a logical portal content building block representing a visual application or part thereof. iViews let you extend the reach of your portal to any available information resource, regardless of where it may be stored. The underlying architecture of iViews enables them to return up-to-the-minute information each time they are launched, from data sources as varied as:"
})
});
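I want to start a new line in the text. How can I do it?
Use \n (a backslash followed by n; /n is just two ordinary characters) in your text string, for example text : "First line\nSecond line". sap.m.Text shows such line breaks as long as its wrapping property is left at its default of true. Try it and let me know how it goes.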

iTextSharp insert image PushbuttonField not working

I have searched and searched and cannot find the answer to my problem. I've tried many different approaches in my code, but I've hit a wall and I'm not sure where to go from here. I seem to want to do the same thing as these two threads:
Trying to insert an image into a pdf‏ in c#
Add image in an existing PDF with itextsharp
They are very similar and the answer is the same. However, when I use that exact code, the result is a PDF without an image. Here is my code:
using (var existingFileStream = new FileStream(fileNameExisting, FileMode.Open))
using (var newFileStream = new FileStream(fileNameNew, FileMode.Create))
{
var pdfReader = new PdfReader(existingFileStream);
var stamper = new PdfStamper(pdfReader, newFileStream, '\0', true);
var form = stamper.AcroFields;
var fieldKeys = form.Fields.Keys;
foreach (var field in form.Fields)
{
if (field.Key == "form1[0].ec_Bldg_Photo_1[0].ImageField2[0]")
{
PushbuttonField imageField = form.GetNewPushbuttonFromField(field.Key);
imageField.Layout = PushbuttonField.LAYOUT_ICON_ONLY;
imageField.IconReference = null;
imageField.ProportionalIcon = true;
imageField.Image = Image.GetInstance(@"PATH_TO_IMAGE\front.jpg");
form.ReplacePushbuttonField(field.Key, imageField.Field);
}
}
stamper.FormFlattening = false;
stamper.Close();
pdfReader.Close();
}
I have tried to rule out all of the obvious things. My path to the image is correct, the field is indeed a PushbuttonField when I read the existing PDF field and get the field type. If I open the PDF in Adobe Reader and click on the placeholder for the image, it allows me to pick a file from my PC. When I place an image in the file, save, and then read in that PDF, I can then change my code to this:
imageField.ProportionalIcon = false;
And now, all of a sudden, the image is stretched in the saved copy. So I can see that this setting has an effect, but only when I insert the image manually in Adobe Reader. When I read the field after setting that image in Adobe Reader (where it displays correctly), I see a couple of interesting things: the field.Image property IS NULL and field.IconReference is NOT NULL. When I use the original code to try to insert the image, it is reversed: Image is NOT NULL but IconReference IS NULL.
Any help would be greatly appreciated, thank you!!
EDIT 1: OK, so I didn't see it the first time, but I went back, checked more thoroughly, and did find that key. (The screenshot showing it is not preserved.)
Several things are at play here.
Usage Rights:
The PDF is digitally signed with a private key owned by Adobe.
You can see this using RUPS (in your screenshot, you didn't go deep enough).
This has two implications:
The signature unlocks special permissions in Adobe Reader, such as the permission to save a filled out form locally.
Making any changes to the original PDF breaks the signature and removes the special permissions leading to an ugly error message in Adobe Reader.
This functionality is deprecated in (and even removed from) PDF 2.0. It's old technology that became obsolete with the emergence of PDF viewers other than Adobe Reader.
My suggestion: remove the usage rights to avoid breaking the signature. See the FAQ entry "Why do I get an error saying that "use of extended features is no longer available"?" iText 7 / iText 5
This is the iText 7 code:
public void removeUsageRights(PdfDocument pdfDoc) {
PdfDictionary perms = pdfDoc.getCatalog().getPdfObject().getAsDictionary(PdfName.Perms);
if (perms == null) {
return;
}
perms.remove(new PdfName("UR"));
perms.remove(PdfName.UR3);
if (perms.size() == 0) {
pdfDoc.getCatalog().remove(PdfName.Perms);
}
}
This is the iText 5 code:
PdfReader reader = new PdfReader(old_file);
if (reader.hasUsageRights()) {
reader.removeUsageRights();
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(new_file));
stamper.close();
}
reader.close();
The rest of this answer uses iText 5.
Hybrid Form:
If you click on the /AcroForm entry, you see this:
There is a /Fields array with references to field dictionaries that are also widget annotations. That means that the document has an AcroForm form inside. However, there is also an /XFA entry with a series of XML snippets. That means that the document has an XFA form inside.
In other words: the same form is described twice inside the document. You are changing a button in one form (the AcroForm part) but not in the other (the XFA form), and that leads to inconsistencies.
XFA has been deprecated in PDF 2.0 because there weren't many vendors supporting that technology. It's kind of frustrating to be confronted with forms that use deprecated technology.
My suggestion: I would remove the XFA part. See the FAQ entry "Is it safe to remove XFA?" iText 5 / iText 7
In iText 5, removing XFA is done like this:
AcroFields form = stamper.getAcroFields();
form.removeXfa();
Important: my suggestion is to remove all the deprecated functionality from the PDF, but if the government expects that functionality to be present, then you're out of luck. In that case, you will need to use Adobe software to process the form. If that's the case, you could complain to the government that their requirements lead to a de facto vendor lock-in. By the way: iText Software is also a vendor. It's an open source company that offers open source software under the AGPL license. The AGPL license allows free use under certain circumstances (see How do I make sure my software complies with AGPL: How can I use iText for free?) If you don't meet those requirements, you will have to purchase a commercial license for your use of iText.