OpenXML Merging Documents - How to See Content using Productivity Tool

I'm fairly new to OpenXML and struggling with it. We have a Word template, and depending on the number of records the user selects on our website, we pull those records from the database and use the template to create a Word document for each one. We then assemble all these documents into one master Word document, which is served to the user. We use OpenXML and altChunks to do this:
    string altChunkId = "AltChunkId" + itemNo.ToString();
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
        AlternativeFormatImportPartType.WordprocessingML, altChunkId);
    using (FileStream fileStream = File.Open(createdFileName, FileMode.Open))
    {
        chunk.FeedData(fileStream);
        fileStream.Close();
    }
    AltChunk altChunk = new AltChunk();
    AltChunkProperties altChunkProperties = new AltChunkProperties();
    MatchSource matchSrc = new MatchSource();
    matchSrc.Val = true;
    altChunkProperties.Append(matchSrc);
    altChunk.AppendChild(altChunkProperties);
    altChunk.Id = altChunkId;
    mainPart.Document.Body.Append(altChunk);
    mainPart.Document.Save();
When I open one of the template documents in the Productivity tool I can see all the elements which I'd expect:
However, when I look at the 'master' document, all I can get down to is the MatchSource:
What I don't understand is why I can't see the paragraph tags etc in the master document that's produced. Can anybody help me understand how I would see this information? Is there something wrong with the document structure?

That's because of the way altChunks are merged into the master document. An altChunk doesn't change the master document's OpenXML markup; it adds the source file to the package as an embedded part, and the body only holds a reference to it. Think of it as attaching a file to an email rather than pasting its content into the body text. Word only converts the chunk into ordinary paragraphs when it opens the document and saves it again. There are third-party solutions such as Document Builder that merge documents by modifying the master document's markup, in which case you will see what you are expecting in the Productivity Tool.
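To check this on your own file, the following is a minimal sketch, assuming the Open XML SDK is referenced. The class name and the default file name master.docx are hypothetical. It walks the altChunk elements in the body, resolves each one to the embedded part it references, and opens that part as a document in its own right to count its paragraphs.

```csharp
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

class AltChunkInspector
{
    static void Main(string[] args)
    {
        // Hypothetical default; pass the master document path as the first argument.
        string masterPath = args.Length > 0 ? args[0] : "master.docx";

        using (WordprocessingDocument master = WordprocessingDocument.Open(masterPath, false))
        {
            MainDocumentPart mainPart = master.MainDocumentPart;

            // Each altChunk in the body is only a reference (by relationship id).
            foreach (AltChunk altChunk in mainPart.Document.Body.Elements<AltChunk>())
            {
                string id = altChunk.Id;
                var chunkPart = (AlternativeFormatImportPart)mainPart.GetPartById(id);
                Console.WriteLine($"{id} -> {chunkPart.Uri} ({chunkPart.ContentType})");

                // Copy the part into memory so it can be opened from a seekable stream.
                using (Stream partStream = chunkPart.GetStream(FileMode.Open, FileAccess.Read))
                using (var buffer = new MemoryStream())
                {
                    partStream.CopyTo(buffer);
                    buffer.Position = 0;

                    // The paragraphs live inside the embedded document, not the master body.
                    using (WordprocessingDocument embedded = WordprocessingDocument.Open(buffer, false))
                    {
                        int paragraphs = embedded.MainDocumentPart.Document.Body
                            .Descendants<Paragraph>().Count();
                        Console.WriteLine($"  embedded document has {paragraphs} paragraph(s)");
                    }
                }
            }
        }
    }
}
```

The sketch assumes the chunks were added as WordprocessingML, as in the question; a chunk of another type (HTML, RTF) could not be opened as a WordprocessingDocument.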

Related

How to properly close Word documents after Documents.Open

I have the following code for a C# console app. It parses a Word document for textboxes and inserts the same text into the document at the textbox anchor point with markup. This is so I can convert to Markdown using pandoc, including textbox content which is not available due to https://github.com/jgm/pandoc/issues/3086. I can then replace my custom markup with markdown after conversion.
The console app is called in a PowerShell loop for all documents in a target list.
When I first run the PowerShell script, all documents are opened and saved (with a new name) without error. But the next time I run it, I get an occasional popup error:
The last time you opened '' it caused a serious error. Do you still want to open it?
I can get through this by selecting Yes on every popup, but this requires intervention and is tedious and slow. Why does this code cause this problem?
    string path = args[0];
    Console.WriteLine($"Parsing {path}");
    Application word = new Application();
    Document doc = word.Documents.Open(path);
    try
    {
        foreach (Shape shp in doc.Shapes)
        {
            if (shp.TextFrame.HasText != 0)
            {
                string text = shp.TextFrame.TextRange.Text;
                int page = shp.Anchor.Information[WdInformation.wdActiveEndPageNumber];
                string summary = Regex.Replace(text, @"\r\n?|\n", " ");
                Console.WriteLine($"++++textbox++++ Page {page}: {summary.Substring(0, Math.Min(summary.Length, 40))}");
                string newtext = $@"{Environment.NewLine}TEXTBOX START%%%{text}%%%TEXTBOX END{Environment.NewLine}";
                var range = shp.Anchor;
                range.InsertBefore(Environment.NewLine);
                range.Collapse();
                range.Text = newtext;
                range.set_Style(WdBuiltinStyle.wdStyleNormal);
            }
        }
        string newFile = Path.GetFullPath(path) + ".notb.docx";
        doc.SaveAs2(newFile);
    }
    finally
    {
        doc.Close();
        word.Quit();
    }
"The console app is called in a PowerShell loop for all documents in a target list."
You can automate Word directly from your PowerShell script without any other dependencies. At the very least, that lets you keep a single Word instance alive instead of creating a new Word Application instance for each document, as these lines do:
Application word = new Application();
Document doc = word.Documents.Open(path);
In the loop you can then just open each document, process it, and close it, which should improve the overall performance of your solution.
When you are done processing a document, close it by calling the Close method.
Likewise, whenever you create a new Word Application instance, don't forget to shut it down by calling the Quit method, which quits Microsoft Word and optionally saves or routes the open documents:
Application.Quit SaveChanges:=wdSaveChanges, OriginalFormat:=wdWordDocument
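If you would rather keep the processing in C#, the same idea (one Word instance, many documents) could be sketched as follows. This is an illustration under assumptions, not a confirmed fix for the "serious error" popup: the class name is hypothetical, the document paths are taken from the command line, and the textbox processing from the question is elided.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

class BatchProcessor
{
    static void Main(string[] args)
    {
        // One Word instance shared by every document in the batch.
        Word.Application word = new Word.Application();
        try
        {
            foreach (string arg in args)
            {
                string path = Path.GetFullPath(arg);
                Console.WriteLine($"Parsing {path}");
                Word.Document doc = word.Documents.Open(path);
                try
                {
                    // ... textbox processing from the question goes here ...
                    doc.SaveAs2(path + ".notb.docx");
                }
                finally
                {
                    // Close each document explicitly, even if processing threw.
                    doc.Close(Word.WdSaveOptions.wdDoNotSaveChanges);
                    Marshal.ReleaseComObject(doc);
                }
            }
        }
        finally
        {
            // Quit once, after the whole batch, and release the COM reference.
            word.Quit();
            Marshal.ReleaseComObject(word);
        }
    }
}
```

Opening each document inside its own try/finally means a failure in one file still closes that document and lets Word quit cleanly at the end of the batch.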

Issue while adding/updating TOC in MS Word using OpenXML

I have a requirement to add or update a table of contents (TOC) in a Word document using OpenXML, and I am struggling to get it working. I am using MS Office 2016.
I have tried all the options from this post:
How to generate Table Of Contents using OpenXML SDK 2.0?
I have also gone through Eric White's videos.
I am trying to use the UpdateFieldsOnOpen setting, and I am able to add an empty TOC by following the sample code from the above link.
However, when I open the document, I don't get the pop-up dialog asking whether to update the TOC.
Here is the sample code:
    var sdtBlock = new SdtBlock();
    sdtBlock.InnerXml = GetTOC(); //TOC Xml
    document.MainDocumentPart.Document.Body.AppendChild(sdtBlock);
    DocumentFormat.OpenXml.Wordprocessing.SimpleField f;
    f = new SimpleField();
    f.Instruction = "sdtContent";
    f.Dirty = true;
    document.MainDocumentPart.Document.Body.Append(f);
    var setting = document.MainDocumentPart.DocumentSettingsPart;
    if (setting != null)
    {
        document.MainDocumentPart.DocumentSettingsPart.Settings.Append(
            new UpdateFieldsOnOpen() { Val = new DocumentFormat.OpenXml.OnOffValue(true) });
        document.MainDocumentPart.DocumentSettingsPart.Settings.Save();
    }
The TOC displays the default message, even though the document has valid entries (headings).
Is this due to MS Office 2016? Is that why the update-fields pop-up does not appear?
I don't want to go with the options below:
Word Automation - because it requires Microsoft.Office.Interop.Word
Word Automation Services - because it requires SharePoint
Adding a macro - because Word then asks to save the document every time I open it
Also, let me know if there is a better way to create or update a TOC.
Any answer or comment would be very helpful.

UCM service for searching in all revisions

I would like to search for text in an attribute of UCM content using RIDC. If I use the GET_SEARCH_RESULTS service, I get only the latest revision of each matching document. But I want to get all the revisions that match the given search criteria. Is there any way to do this?
Sample code is here:
    String whereClause = "UPPER(XCOMMENTS) LIKE '%VALUE%'";
    dataBinder.putLocal("IdcService", "GET_DATARESULTSET");
    dataBinder.putLocal("dataSource", "Documents");
    dataBinder.putLocal("whereClause", whereClause);
    dataBinder.putLocal("resultName", "YourResult");
    ServiceResponse response = idcClient.sendRequest(userContext, dataBinder);
    System.out.println(response.toString());
    DataBinder serverBinder = response.getResponseAsBinder();
    DataResultSet resultSet = serverBinder.getResultSet("YourResult");
Do you want to search against full-text or metadata?
If metadata, you should be able to use service GET_DATARESULTSET and dataSource RevisionIDs.
If full-text, you might need to roll your own.

Combine two PDF-a documents using ITextSharp

I'm hoping that someone can see the flaw in my code to merge two PDF/A documents using iTextSharp. Currently it complains about missing metadata, which PDF/A requires.
    Document document = new Document();
    MemoryStream ms = new MemoryStream();
    using (PdfACopy pdfaCopy = new PdfACopy(document, ms, PdfAConformanceLevel.PDF_A_1A))
    {
        document.Open();
        using (PdfReader reader = new PdfReader("Doc1.pdf"))
        {
            pdfaCopy.AddDocument(reader);
        }
        using (PdfReader reader = new PdfReader("doc2.pdf"))
        {
            pdfaCopy.AddDocument(reader);
        }
    }
The exact error received is
Unhandled Exception: iTextSharp.text.pdf.PdfAConformanceException: The document catalog dictionary of a PDF/A conforming file shall contain the Metadata key
I was hoping that the 'document catalog dictionary' would be copied as well, but I guess the 'new Document()' creates an empty non-conforming document or something.
Thanks! Hope you can help
Wouter
You need to add this line (copy here is the PdfACopy instance, named pdfaCopy in your code):
copy.CreateXmpMetadata();
This will create some default XMP metadata. Of course: if you want to create your own XMP file containing info about the documents you're about to merge, you can also use:
copy.XmpMetadata = myMetaData;
where myMetaData is a byte array containing a correct XMP stream.
I hope you understand that iText can't automatically create the correct metadata. Providing metadata is something that needs human attention.
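Put together with the code from the question, the suggestion might look like the sketch below. The class and method names are hypothetical, and calling CreateXmpMetadata before document.Open() is an assumption; the answer only says the line needs to be added. The default metadata may also not be enough for every input, for the reason given above.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

class PdfAMerger
{
    // Merges the given PDF/A files and returns the merged document as bytes.
    static byte[] Merge(params string[] inputPaths)
    {
        Document document = new Document();
        MemoryStream ms = new MemoryStream();
        using (PdfACopy pdfaCopy = new PdfACopy(document, ms, PdfAConformanceLevel.PDF_A_1A))
        {
            // Generate default XMP metadata so the catalog gets its Metadata key.
            pdfaCopy.CreateXmpMetadata();
            document.Open();

            foreach (string inputPath in inputPaths)
            {
                using (PdfReader reader = new PdfReader(inputPath))
                {
                    pdfaCopy.AddDocument(reader);
                }
            }
        }
        // The copy has been closed at this point; the buffer can still be read.
        return ms.ToArray();
    }

    static void Main()
    {
        // File names taken from the question.
        byte[] merged = Merge("Doc1.pdf", "doc2.pdf");
        File.WriteAllBytes("merged.pdf", merged);
    }
}
```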

Serializing OpenXML Parts Elements Storing in VARBINARY SQL 2005

I am building a solution that allows users to pick and choose sections from a Word template, populate those sections with content from a database, and assemble the new data into a new .docx document.
So far, I have successful methodologies for locating content and transplanting that content into a new document. I am using the OpenXML SDK 2.0 to locate content by Styles and Content Controls. I am able to create IEnumerable objects containing elements such as Paragraphs, SdtBlocks, Run, etc.
I need to find an elegant way to serialize these element blocks so I can store them as whole blocks of type VARBINARY in a SQL 2005 database. Can someone please point me to a viable example for serializing these OpenXML parts/elements?
I am working on Excel at the moment but I think your problem is similar in nature.
From the code below I can extract the XML code of the row and then store it.
    private string GetContents(uint rowIndex)
    {
        return GetExistingRow(rowIndex).OuterXml;
    }

    private Row GetExistingRow(uint rowIndex)
    {
        return SheetData.
            Elements<Row>().
            Where(r => r.RowIndex == rowIndex).
            FirstOrDefault();
    }
Please note that the SheetData object is extracted as:
this.SheetData = WorksheetPart.Worksheet.GetFirstChild<SheetData>()
I hope this helps.
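The same OuterXml approach carries over to Word elements. As a minimal sketch, assuming the Open XML SDK is referenced and that UTF-8 bytes are an acceptable VARBINARY payload (the helper names are hypothetical), an element can be turned into a byte array and rebuilt later through the constructor that accepts outer XML:

```csharp
using System;
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

static class ElementSerializer
{
    // Serialize any Open XML element to bytes suitable for a VARBINARY column.
    public static byte[] ToBytes(OpenXmlElement element)
    {
        return Encoding.UTF8.GetBytes(element.OuterXml);
    }

    // Rebuild a paragraph from bytes previously produced by ToBytes.
    public static Paragraph ParagraphFromBytes(byte[] data)
    {
        return new Paragraph(Encoding.UTF8.GetString(data));
    }
}

class Demo
{
    static void Main()
    {
        var original = new Paragraph(new Run(new Text("Hello from the database")));

        byte[] blob = ElementSerializer.ToBytes(original);
        Paragraph restored = ElementSerializer.ParagraphFromBytes(blob);

        // Prints the text of the restored paragraph.
        Console.WriteLine(restored.InnerText);
    }
}
```

Other element types (Run, SdtBlock, and so on) have the same kind of outer-XML constructor, so storing the element's type name next to the blob would let you pick the right one when reading it back. Note that content depending on other parts (images, styles, numbering) is not captured by the element's XML alone.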