Adding Runs to Paragraphs - ms-word

I'm trying to convert xml formatted with tags to a DOCX file. I'm not generating a new document, but inserting text in a template document.
<p id="_fab91699-6d85-4ce5-b0b5-a17197520a7f">This document is amongst a series of International Standards dealing with the conversion of systems of writing produced by Technical Committee ISO/TC 46, <em>Information and documentation</em>, WG 3 <em>Conversion of written languages</em>.</p>
I collected the text fragments in an array, then tried to process them with code like this:
foreach (var bkmkStart in wordDoc.MainDocumentPart.RootElement.Descendants<BookmarkStart>())
{
if (bkmkStart.Name == "ForewordText")
{
forewordbkmkParent = bkmkStart.Parent;
for (var y = 0; y <= ForewordArray.Length / (double)2 - 1; y++)
{
if (ForewordArray[0, y] == "Normal")
{
if (y < ForewordArray.Length / (double)2 - 1)
{
if (ForewordArray[0, y + 1] == "Normal")
{
forewordbkmkParent.InsertBeforeSelf(new Paragraph(new Run(new Text(ForewordArray[1, y]))));
}
else
{
fPara = forewordbkmkParent.InsertBeforeSelf(new Paragraph(new Run(new Text(ForewordArray[1, y]))));
}
}
else
{
fPara.InsertAfter(new Run(new Text(ForewordArray[1, y])), fPara.GetFirstChild<Run>());
}
}
else
{
NewRun = forewordbkmkParent.InsertBeforeSelf(new Run());
NewRunProps = new RunProperties();
NewRunProps.AppendChild<Italic>(new Italic());
NewRun.AppendChild<RunProperties>(NewRunProps);
NewRun.AppendChild(new Text(ForewordArray[1, y]));
}
}
}
}
but I end up with malformed XML because the runs are inserted after the paragraphs instead of inside them:
<w:p>
<w:r>
<w:t>This document is amongst a series of International Standards dealing with the conversion of systems of writing produced by Technical Committee ISO/TC 46, </w:t>
</w:r>
</w:p>
<w:r>
<w:rPr>
<w:i />
</w:rPr>
<w:t>Information and documentation</w:t>
</w:r>
<w:p>
<w:r>
<w:t>, WG 3 </w:t>
</w:r>
<w:r>
<w:t>.</w:t>
</w:r>
</w:p>
<w:r>
<w:rPr>
<w:i />
</w:rPr>
<w:t>Conversion of written languages</w:t>
</w:r>
Doing this the right way, using the SDK, would be best. As an alternative, I was able to create a string with all the correct XML and text using regexes, but I can't find a WordprocessingDocument method to turn that into an XML fragment that I can insert.

The solution for this kind of problem is to perform a pure functional transformation, as shown in the following code example.
The code example uses the sample XML element <p> given in the question (see Xml constant below). It transforms it into a corresponding Open XML w:p element, i.e., a Paragraph instance in terms of the strongly-typed classes provided by the Open XML SDK. The expected outer XML of that w:p or Paragraph is defined by the OuterXml constant.
using System;
using System.Linq;
using System.Xml.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;
using Xunit;
namespace CodeSnippets.Tests.OpenXml.Wordprocessing
{
public class XmlTransformationTests
{
private const string Xml =
@"<p id=""_fab91699-6d85-4ce5-b0b5-a17197520a7f"">" +
@"This document is amongst a series of International Standards dealing with the conversion of systems of writing produced by Technical Committee ISO/TC 46, " +
@"<em>Information and documentation</em>" +
@", WG 3 " +
@"<em>Conversion of written languages</em>" +
@"." +
@"</p>";
private const string OuterXml =
@"<w:p xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">" +
@"<w:r><w:t xml:space=""preserve"">This document is amongst a series of International Standards dealing with the conversion of systems of writing produced by Technical Committee ISO/TC 46, </w:t></w:r>" +
@"<w:r><w:rPr><w:i /></w:rPr><w:t>Information and documentation</w:t></w:r>" +
@"<w:r><w:t xml:space=""preserve"">, WG 3 </w:t></w:r>" +
@"<w:r><w:rPr><w:i /></w:rPr><w:t>Conversion of written languages</w:t></w:r>" +
@"<w:r><w:t>.</w:t></w:r>" +
@"</w:p>";
[Fact]
public void CanTransformXmlToOpenXml()
{
// Arrange, creating an XElement based on the given XML.
var xmlParagraph = XElement.Parse(Xml);
// Act, transforming the XML into Open XML.
var paragraph = (Paragraph) TransformElementToOpenXml(xmlParagraph);
// Assert, demonstrating that we have indeed created an Open XML Paragraph instance.
Assert.Equal(OuterXml, paragraph.OuterXml);
}
private static OpenXmlElement TransformElementToOpenXml(XElement element)
{
return element.Name.LocalName switch
{
"p" => new Paragraph(element.Nodes().Select(TransformNodeToOpenXml)),
"em" => new Run(new RunProperties(new Italic()), CreateText(element.Value)),
"b" => new Run(new RunProperties(new Bold()), CreateText(element.Value)),
_ => throw new ArgumentOutOfRangeException()
};
}
private static OpenXmlElement TransformNodeToOpenXml(XNode node)
{
return node switch
{
XElement element => TransformElementToOpenXml(element),
XText text => new Run(CreateText(text.Value)),
_ => throw new ArgumentOutOfRangeException()
};
}
private static Text CreateText(string text)
{
return new Text(text)
{
Space = text.Length > 0 && (char.IsWhiteSpace(text[0]) || char.IsWhiteSpace(text[^1]))
? new EnumValue<SpaceProcessingModeValues>(SpaceProcessingModeValues.Preserve)
: null
};
}
}
}
The above sample deals with <p> (paragraph), <em> (emphasis / italic), and <b> (bold) elements. Adding further formatting elements (e.g., underlining) is easy.
Note that the sample code makes the simplifying assumption that <em>, <b>, and potentially further formatting elements are not nested. Adding the capability to nest those elements would make the sample code a little more complicated (but it's obviously possible).
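As a rough illustration of how nesting could be handled, the following sketch (not part of the original sample) walks the XML tree recursively and carries the current bold/italic/underline state down to each text node, so that every text node becomes one run with the combined formatting. The class name NestedFormattingTransformer and its method names are made up for this illustration, and the <u> (underline) case is an assumed extension. For simplicity, it marks every w:t with xml:space="preserve".

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper for illustration; not part of the Open XML SDK.
public static class NestedFormattingTransformer
{
    // Transforms a <p> element into a w:p, flattening nested <b>, <em>, and <u>
    // elements into a sequence of runs.
    public static Paragraph TransformParagraph(XElement p)
    {
        return new Paragraph(CreateRuns(p, bold: false, italic: false, underline: false));
    }

    private static IEnumerable<OpenXmlElement> CreateRuns(
        XElement parent, bool bold, bool italic, bool underline)
    {
        foreach (XNode node in parent.Nodes())
        {
            if (node is XText text)
            {
                // A text node becomes one run with the formatting inherited so far.
                yield return CreateRun(text.Value, bold, italic, underline);
            }
            else if (node is XElement element)
            {
                string name = element.Name.LocalName;
                if (name != "b" && name != "em" && name != "u")
                {
                    throw new ArgumentOutOfRangeException(nameof(parent), name, "Unsupported element.");
                }

                // Recurse, switching on the formatting represented by this element.
                foreach (OpenXmlElement run in CreateRuns(
                    element, bold || name == "b", italic || name == "em", underline || name == "u"))
                {
                    yield return run;
                }
            }
        }
    }

    private static Run CreateRun(string value, bool bold, bool italic, bool underline)
    {
        var run = new Run();
        if (bold || italic || underline)
        {
            // Children are appended in schema order: w:b, w:i, w:u.
            var properties = new RunProperties();
            if (bold) properties.AppendChild(new Bold());
            if (italic) properties.AppendChild(new Italic());
            if (underline) properties.AppendChild(new Underline { Val = UnderlineValues.Single });
            run.AppendChild(properties);
        }

        run.AppendChild(new Text(value) { Space = SpaceProcessingModeValues.Preserve });
        return run;
    }
}
```

With this sketch, an input such as <p>a<em>b<b>c</b></em>d</p> should yield four runs, the third of which is both bold and italic.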

Related

How to merge word documents with different headers using openxml?

I am trying to merge multiple documents into a single one by following examples as posted in this other post.
I am using AltChunk altChunk = new AltChunk(). When documents are merged, the result does not seem to retain the separate headers of each document. The merged document contains the headers of the first document in the merge. If the first document contains no headers, then the rest of the merged document contains no headers either, and vice versa.
My question is, how can I preserve different headers of the documents being merged?
Merge multiple word documents into one Open Xml
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
namespace WordMergeProject
{
public class Program
{
private static void Main(string[] args)
{
byte[] word1 = File.ReadAllBytes(@"..\..\word1.docx");
byte[] word2 = File.ReadAllBytes(@"..\..\word2.docx");
byte[] result = Merge(word1, word2);
File.WriteAllBytes(@"..\..\word3.docx", result);
}
private static byte[] Merge(byte[] dest, byte[] src)
{
string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString();
var memoryStreamDest = new MemoryStream();
memoryStreamDest.Write(dest, 0, dest.Length);
memoryStreamDest.Seek(0, SeekOrigin.Begin);
var memoryStreamSrc = new MemoryStream(src);
using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStreamDest, true))
{
MainDocumentPart mainPart = doc.MainDocumentPart;
AlternativeFormatImportPart altPart =
mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId);
altPart.FeedData(memoryStreamSrc);
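var altChunk = new AltChunk();
altChunk.Id = altChunkId;
OpenXmlElement lastElem = mainPart.Document.Body.Elements<AltChunk>().LastOrDefault();
if(lastElem == null)
{
lastElem = mainPart.Document.Body.Elements<Paragraph>().Last();
}
// Insert a page break
Paragraph pageBreakP = new Paragraph();
Run pageBreakR = new Run();
Break pageBreakBr = new Break() { Type = BreakValues.Page };
pageBreakP.Append(pageBreakR);
pageBreakR.Append(pageBreakBr);
// Insert the page break after the last element, then the chunk after the page break
lastElem.InsertAfterSelf(pageBreakP);
pageBreakP.InsertAfterSelf(altChunk);
mainPart.Document.Save();
}
// Read the stream only after the package has been disposed, so that all changes are flushed
return memoryStreamDest.ToArray();
}
}
}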
I encountered this question a few years ago and spent quite some time on it; I eventually wrote a blog article that links to a sample file. Integrating files with headers and footers using AltChunk is not straightforward. I'll try to cover the essentials here. Depending on what kinds of content the headers and footers contain (and assuming Microsoft has not addressed any of the problems I originally ran into), it may not be possible to rely solely on AltChunk.
(Note also that there may be Tools/APIs that can handle this - I don't know and asking that on this site would be off-topic.)
Background
Before attacking the problem, it helps to understand how Word handles different headers and footers. To get a feel for it, start Word...
Section Breaks / Unlinking headers/footers
Type some text on the page and insert a header
Move the focus to the end of the page and go to the Page Layout tab in the Ribbon
Page Setup/Breaks/Next Page section break
Go into the Header area for this page and note the information in the blue "tags": you'll see a section identifier on the left and "Same as Previous" on the right. "Same as Previous" is the default; to create a different Header, click the "Link to Previous" button in the Header
So, the rule is:
a section break is required, with unlinked headers (and/or footers),
in order to have different header/footer content within a document.
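In Open XML terms, this rule can be sketched roughly as follows. A w:sectPr placed inside the w:pPr of a paragraph ends a section at that paragraph, and giving that w:sectPr its own w:headerReference is what "unlinks" the header; a section without a header reference of a given type inherits the header of the previous section. The class and method names below are made up for this illustration, and the sketch only deals with the default header type.

```csharp
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper for illustration; not part of the Open XML SDK.
public static class SectionBreakSketch
{
    // Creates an (otherwise empty) paragraph that ends a section and gives
    // that section its own default header with the given text.
    public static Paragraph CreateSectionBreakParagraph(MainDocumentPart mainPart, string headerText)
    {
        // Every distinct header lives in its own HeaderPart.
        HeaderPart headerPart = mainPart.AddNewPart<HeaderPart>();
        headerPart.Header = new Header(new Paragraph(new Run(new Text(headerText))));
        string relationshipId = mainPart.GetIdOfPart(headerPart);

        // The section break is a w:sectPr inside the paragraph's w:pPr.
        // Its w:headerReference points to the new HeaderPart ("unlinked" header).
        return new Paragraph(
            new ParagraphProperties(
                new SectionProperties(
                    new HeaderReference { Type = HeaderFooterValues.Default, Id = relationshipId },
                    new SectionType { Val = SectionMarkValues.NextPage })));
    }
}
```

Appending the returned paragraph after the content of one document, before the content of the next one, would give the preceding content its own header. The last section of a document is described by the w:sectPr that is a direct child of w:body rather than by a paragraph.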
Master/Sub-documents
Word has an (in)famous functionality called "Master Document" that enables linking outside ("sub") documents into a "master" document. Doing so automatically adds the necessary section breaks and unlinks the headers/footers so that the originals are retained.
Go to Word's Outline view
Click "Show Document"
Use "Insert" to insert other files
Notice that two section breaks are inserted, one of the type "Next page" and the other "Continuous". The first is inserted in the file coming in; the second in the "master" file.
Two section breaks are necessary when inserting a file because the last paragraph mark (which contains the section break for the end of the document) is not carried over to the target document. The section break in the target document carries the information to unlink the in-coming header from those already in the target document.
When the master is saved, closed and re-opened, the sub-documents are in a "collapsed" state (file names as hyperlinks instead of the content). They can be expanded by going back to the Outline view and clicking the "Expand" button. To fully incorporate a sub-document into the document, click on the icon at the top left next to the sub-document, then click "Unlink".
Merging Word Open XML files
This, then, is the type of environment the Open XML SDK needs to create when merging files whose headers and footers need to be retained. Theoretically, either approach should work. Practically, I had problems with using only section breaks; I've never tested using the Master Document feature in Word Open XML.
Inserting section breaks
Here's the basic code for inserting a section break and unlinking headers before bringing in a file using AltChunk. Looking at my old posts and articles, as long as there's no complex page numbering involved, it works:
private void btnMergeWordDocs_Click(object sender, EventArgs e)
{
string sourceFolder = @"C:\Test\MergeDocs\";
string targetFolder = @"C:\Test\";
string altChunkIdBase = "acID";
int altChunkCounter = 1;
string altChunkId = altChunkIdBase + altChunkCounter.ToString();
MainDocumentPart wdDocTargetMainPart = null;
Document docTarget = null;
AlternativeFormatImportPartType afType;
AlternativeFormatImportPart chunk = null;
AltChunk ac = null;
using (WordprocessingDocument wdPkgTarget = WordprocessingDocument.Create(targetFolder + "mergedDoc.docx", DocumentFormat.OpenXml.WordprocessingDocumentType.Document, true))
{
//Will create document in 2007 Compatibility Mode.
//In order to make it 2010 a Settings part must be created and a CompatMode element for the Office version set.
wdDocTargetMainPart = wdPkgTarget.MainDocumentPart;
if (wdDocTargetMainPart == null)
{
wdDocTargetMainPart = wdPkgTarget.AddMainDocumentPart();
Document wdDoc = new Document(
new Body(
new Paragraph(
new Run(new Text() { Text = "First Para" })),
new Paragraph(new Run(new Text() { Text = "Second para" })),
new SectionProperties(
new SectionType() { Val = SectionMarkValues.NextPage },
new PageSize() { Code = 9 },
new PageMargin() { Gutter = 0, Bottom = 1134, Top = 1134, Left = 1318, Right = 1318, Footer = 709, Header = 709 },
new Columns() { Space = "708" },
new TitlePage())));
wdDocTargetMainPart.Document = wdDoc;
}
docTarget = wdDocTargetMainPart.Document;
SectionProperties secPropLast = docTarget.Body.Descendants<SectionProperties>().Last();
SectionProperties secPropNew = (SectionProperties)secPropLast.CloneNode(true);
//A section break must be in a ParagraphProperty
Paragraph lastParaTarget = (Paragraph)docTarget.Body.Descendants<Paragraph>().Last();
ParagraphProperties paraPropTarget = lastParaTarget.ParagraphProperties;
if (paraPropTarget == null)
{
paraPropTarget = new ParagraphProperties();
}
paraPropTarget.Append(secPropNew);
Run paraRun = lastParaTarget.Descendants<Run>().FirstOrDefault();
//lastParaTarget.InsertBefore(paraPropTarget, paraRun);
lastParaTarget.InsertAt(paraPropTarget, 0);
//Process the individual files in the source folder.
//Note that this process will permanently change the files by adding a section break.
System.IO.DirectoryInfo di = new System.IO.DirectoryInfo(sourceFolder);
IEnumerable<System.IO.FileInfo> docFiles = di.EnumerateFiles();
foreach (System.IO.FileInfo fi in docFiles)
{
using (WordprocessingDocument pkgSourceDoc = WordprocessingDocument.Open(fi.FullName, true))
{
IEnumerable<HeaderPart> partsHeader = pkgSourceDoc.MainDocumentPart.GetPartsOfType<HeaderPart>();
IEnumerable<FooterPart> partsFooter = pkgSourceDoc.MainDocumentPart.GetPartsOfType<FooterPart>();
//If the source document has headers or footers we want to retain them.
//This requires inserting a section break at the end of the document.
if (partsHeader.Count() > 0 || partsFooter.Count() > 0)
{
Body sourceBody = pkgSourceDoc.MainDocumentPart.Document.Body;
SectionProperties docSectionBreak = sourceBody.Descendants<SectionProperties>().Last();
//Make a copy of the document section break as this won't be imported into the target document.
//It needs to be appended to the last paragraph of the document
SectionProperties copySectionBreak = (SectionProperties)docSectionBreak.CloneNode(true);
Paragraph lastpara = sourceBody.Descendants<Paragraph>().Last();
ParagraphProperties paraProps = lastpara.ParagraphProperties;
if (paraProps == null)
{
paraProps = new ParagraphProperties();
//ParagraphProperties must be the first child of the paragraph
lastpara.PrependChild(paraProps);
}
paraProps.Append(copySectionBreak);
}
pkgSourceDoc.MainDocumentPart.Document.Save();
}
//Insert the source file into the target file using AltChunk
afType = AlternativeFormatImportPartType.WordprocessingML;
chunk = wdDocTargetMainPart.AddAlternativeFormatImportPart(afType, altChunkId);
//Dispose the stream after feeding the data so the source file is not left locked
using (System.IO.FileStream fsSourceDocument = new System.IO.FileStream(fi.FullName, System.IO.FileMode.Open))
{
chunk.FeedData(fsSourceDocument);
}
//Create the chunk
ac = new AltChunk();
//Link it to the part
ac.Id = altChunkId;
docTarget.Body.InsertAfter(ac, docTarget.Body.Descendants<Paragraph>().Last());
docTarget.Save();
altChunkCounter += 1;
altChunkId = altChunkIdBase + altChunkCounter.ToString();
chunk = null;
ac = null;
}
}
}
If there's complex page numbering (quoted from my blog article):
Unfortunately, there’s a bug in the Word application when integrating
Word document “chunks” into the main document. The process has the
nasty habit of not retaining a number of SectionProperties, among them
the one that sets whether a section has a Different First Page
(w:titlePg) and the one to restart Page Numbering (w:pgNumType) in a section. As long as your documents don’t need to
manage these kinds of headers and footers you can probably use the
“altChunk” approach.
But if you do need to handle complex headers and footers the only
method currently available to you is to copy in each document in
its entirety, part-by-part. This is a non-trivial undertaking, as
there are numerous possible types of Parts that can be associated not
only with the main document body, but also with each header and footer
part.
...or try the Master/Sub Document approach.
Master/Sub Document
This approach will certainly maintain all information. It will open as a Master document, however, and the Word API (driven either by the user or by automation code) is required to "unlink" the sub-documents and turn the file into a single, integrated document.
Opening a Master Document file in the Open XML SDK Productivity Tool shows that inserting sub documents into the master document is a fairly straight-forward procedure:
The underlying Word Open XML for the document with one sub-document:
<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:p>
<w:pPr>
<w:pStyle w:val="Heading1" />
</w:pPr>
<w:subDoc r:id="rId6" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" />
</w:p>
<w:sectPr>
<w:headerReference w:type="default" r:id="rId7" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" />
<w:type w:val="continuous" />
<w:pgSz w:w="11906" w:h="16838" />
<w:pgMar w:top="1417" w:right="1417" w:bottom="1134" w:left="1417" w:header="708" w:footer="708" w:gutter="0" />
<w:cols w:space="708" />
<w:docGrid w:linePitch="360" />
</w:sectPr>
</w:body>
and the code:
public class GeneratedClass
{
// Creates a Body instance and adds its children.
public Body GenerateBody()
{
Body body1 = new Body();
Paragraph paragraph1 = new Paragraph();
ParagraphProperties paragraphProperties1 = new ParagraphProperties();
ParagraphStyleId paragraphStyleId1 = new ParagraphStyleId(){ Val = "Heading1" };
paragraphProperties1.Append(paragraphStyleId1);
SubDocumentReference subDocumentReference1 = new SubDocumentReference(){ Id = "rId6" };
paragraph1.Append(paragraphProperties1);
paragraph1.Append(subDocumentReference1);
SectionProperties sectionProperties1 = new SectionProperties();
HeaderReference headerReference1 = new HeaderReference(){ Type = HeaderFooterValues.Default, Id = "rId7" };
SectionType sectionType1 = new SectionType(){ Val = SectionMarkValues.Continuous };
PageSize pageSize1 = new PageSize(){ Width = (UInt32Value)11906U, Height = (UInt32Value)16838U };
PageMargin pageMargin1 = new PageMargin(){ Top = 1417, Right = (UInt32Value)1417U, Bottom = 1134, Left = (UInt32Value)1417U, Header = (UInt32Value)708U, Footer = (UInt32Value)708U, Gutter = (UInt32Value)0U };
Columns columns1 = new Columns(){ Space = "708" };
DocGrid docGrid1 = new DocGrid(){ LinePitch = 360 };
sectionProperties1.Append(headerReference1);
sectionProperties1.Append(sectionType1);
sectionProperties1.Append(pageSize1);
sectionProperties1.Append(pageMargin1);
sectionProperties1.Append(columns1);
sectionProperties1.Append(docGrid1);
body1.Append(paragraph1);
body1.Append(sectionProperties1);
return body1;
}
}

Update Word custom XML part using the Open XML API, but can't update "document.xml" in the main document

We update the custom XML part using C# code. The document is updated successfully on Windows, but when we open it in a Linux environment, the value has not changed.
How can we change both the custom XML part and document.xml in the word folder?
Using a slightly enhanced example from another question, let's assume you have a MainDocumentPart (/word/document.xml) with a data-bound w:sdt (noting that this is a block-level structured document tag (SDT) containing a w:p in this example; it could also be an inline-level SDT contained in a w:p).
<?xml version="1.0" encoding="utf-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:body>
<w:sdt>
<w:sdtPr>
<w:tag w:val="Node" />
<w:dataBinding w:prefixMappings="xmlns:ex='http://example.com'"
w:xpath="ex:Root[1]/ex:Node[1]"
w:storeItemID="{C152D0E4-7C03-4CFA-97E6-721B2DCB5C7B}" />
</w:sdtPr>
<w:sdtContent>
<w:p>
<w:r>
<w:t>VALUE1</w:t>
</w:r>
</w:p>
</w:sdtContent>
</w:sdt>
</w:body>
</w:document>
Our MainDocumentPart references the following CustomXmlPart (/customXML/item.xml):
<?xml version="1.0" encoding="utf-8"?>
<ex:Root xmlns:ex="http://example.com">
<ex:Node>VALUE1</ex:Node>
</ex:Root>
The above CustomXmlPart then references the following CustomXmlPropertiesPart (/customXML/itemProps1.xml):
<?xml version="1.0" encoding="utf-8"?>
<ds:datastoreItem xmlns:ds="http://schemas.openxmlformats.org/officeDocument/2006/customXml"
ds:itemID="{C152D0E4-7C03-4CFA-97E6-721B2DCB5C7B}">
<ds:schemaRefs>
<ds:schemaRef ds:uri="http://example.com" />
</ds:schemaRefs>
</ds:datastoreItem>
The "enhancement" in this case is the additional w:tag element in the MainDocumentPart at the top. This w:tag element is one way to create an easy-to-use link between a w:sdt element and the custom XML element to which it is bound (e.g., ex:Node). In this example, the value Node happens to be the local name of the ex:Node element.
Finally, here is a working code example that shows how you can update both the CustomXmlPart and the MainDocumentPart. This uses the Open-XML-PowerTools.
[Fact]
public void CanUpdateCustomXmlAndMainDocumentPart()
{
// Define the initial and updated values of our custom XML element and
// the data-bound w:sdt element.
const string initialValue = "VALUE1";
const string updatedValue = "value2";
// Create the root element of the custom XML part with the initial value.
var customXmlRoot =
new XElement(Ns + "Root",
new XAttribute(XNamespace.Xmlns + NsPrefix, NsName),
new XElement(Ns + "Node", initialValue));
// Create the w:sdtContent child element of our w:sdt with the initial value.
var sdtContent =
new XElement(W.sdtContent,
new XElement(W.p,
new XElement(W.r,
new XElement(W.t, initialValue))));
// Create a WordprocessingDocument with the initial values.
using var stream = new MemoryStream();
using (WordprocessingDocument wordDocument = WordprocessingDocument.Create(stream, Type))
{
InitializeWordprocessingDocument(wordDocument, customXmlRoot, sdtContent);
}
// Assert the WordprocessingDocument has the expected, initial values.
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, true))
{
AssertValuesAreAsExpected(wordDocument, initialValue);
}
// Update the WordprocessingDocument, using the updated value.
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, true))
{
MainDocumentPart mainDocumentPart = wordDocument.MainDocumentPart;
// Change custom XML element, again using the simplifying assumption
// that we only have a single custom XML part and a single ex:Node
// element.
CustomXmlPart customXmlPart = mainDocumentPart.CustomXmlParts.Single();
XElement root = customXmlPart.GetXElement();
XElement node = root.Elements(Ns + "Node").Single();
node.Value = updatedValue;
customXmlPart.PutXDocument();
// Change the w:sdt contained in the MainDocumentPart.
XElement document = mainDocumentPart.GetXElement();
XElement sdt = FindSdtWithTag("Node", document);
sdtContent = sdt.Elements(W.sdtContent).Single();
sdtContent.ReplaceAll(
new XElement(W.p,
new XElement(W.r,
new XElement(W.t, updatedValue))));
mainDocumentPart.PutXDocument();
}
// Assert the WordprocessingDocument has the expected, updated values.
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(stream, true))
{
AssertValuesAreAsExpected(wordDocument, updatedValue);
}
}
private static void InitializeWordprocessingDocument(
WordprocessingDocument wordDocument,
XElement customXmlRoot,
XElement sdtContent)
{
MainDocumentPart mainDocumentPart = wordDocument.AddMainDocumentPart();
string storeItemId = CreateCustomXmlPart(mainDocumentPart, customXmlRoot);
mainDocumentPart.PutXDocument(new XDocument(
new XElement(W.document,
new XAttribute(XNamespace.Xmlns + "w", W.w.NamespaceName),
new XElement(W.body,
new XElement(W.sdt,
new XElement(W.sdtPr,
new XElement(W.tag, new XAttribute(W.val, "Node")),
new XElement(W.dataBinding,
new XAttribute(W.prefixMappings, $"xmlns:{NsPrefix}='{NsName}'"),
new XAttribute(W.xpath, $"{NsPrefix}:Root[1]/{NsPrefix}:Node[1]"),
new XAttribute(W.storeItemID, storeItemId))),
sdtContent)))));
}
private static void AssertValuesAreAsExpected(
WordprocessingDocument wordDocument,
string expectedValue)
{
// Retrieve inner text of w:sdt element.
MainDocumentPart mainDocumentPart = wordDocument.MainDocumentPart;
XElement sdt = FindSdtWithTag("Node", mainDocumentPart.GetXElement());
string sdtInnerText = GetInnerText(sdt);
// Retrieve inner text of custom XML element, making the simplifying
// assumption that we only have a single custom XML part. In reality,
// we would have to find the custom XML part to which our w:sdt elements
// are bound among any number of custom XML parts. Further, in our
// simplified example, we also assume there is a single ex:Node element.
CustomXmlPart customXmlPart = mainDocumentPart.CustomXmlParts.Single();
XElement root = customXmlPart.GetXElement();
XElement node = root.Elements(Ns + "Node").Single();
string nodeInnerText = node.Value;
// Assert that both inner texts are equal to the expected value.
Assert.Equal(expectedValue, sdtInnerText);
Assert.Equal(expectedValue, nodeInnerText);
}
private static XElement FindSdtWithTag(string tagValue, XElement openXmlCompositeElement)
{
return openXmlCompositeElement
.Descendants(W.sdt)
.FirstOrDefault(e => e
.Elements(W.sdtPr)
.Elements(W.tag)
.Any(tag => (string) tag.Attribute(W.val) == tagValue));
}
private static string GetInnerText(XElement openXmlElement)
{
return openXmlElement
.DescendantsAndSelf(W.r)
.Select(UnicodeMapper.RunToString)
.StringConcatenate();
}
The complete source code is contained in my CodeSnippets GitHub repository. Look for the DataBoundContentControlTests class.
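If you would rather not depend on the Open-XML-PowerTools, the same two-step update can be sketched with the Open XML SDK and LINQ to XML alone. This is only a minimal sketch under the same simplifying assumptions as above (a single custom XML part, a single bound element, and a w:sdt that is identified by its w:tag). The class name DataBoundSdtUpdater and its parameters are made up for this illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper for illustration; not part of the Open XML SDK.
public static class DataBoundSdtUpdater
{
    public static void Update(
        WordprocessingDocument wordDocument, XName nodeName, string tagValue, string newValue)
    {
        MainDocumentPart mainPart = wordDocument.MainDocumentPart;

        // Step 1: Update the custom XML part (assumes exactly one such part
        // and exactly one element with the given name).
        CustomXmlPart customXmlPart = mainPart.CustomXmlParts.Single();
        XDocument customXml;
        using (Stream stream = customXmlPart.GetStream(FileMode.Open, FileAccess.Read))
        {
            customXml = XDocument.Load(stream);
        }

        customXml.Descendants(nodeName).Single().Value = newValue;
        using (Stream stream = customXmlPart.GetStream(FileMode.Create, FileAccess.Write))
        {
            customXml.Save(stream);
        }

        // Step 2: Update the cached value in document.xml for every w:sdt
        // carrying the given w:tag.
        IEnumerable<SdtElement> sdts = mainPart.Document
            .Descendants<SdtElement>()
            .Where(sdt => sdt.SdtProperties?.GetFirstChild<Tag>()?.Val?.Value == tagValue);

        foreach (SdtElement sdt in sdts)
        {
            // Materialize the list first because elements are removed below.
            List<Text> texts = sdt.Descendants<Text>().ToList();
            if (texts.Count == 0) continue; // Nothing to update in this sketch.

            texts[0].Text = newValue;
            foreach (Text extra in texts.Skip(1)) extra.Remove();
        }

        mainPart.Document.Save();
    }
}
```

For the sample document above, the call would look like DataBoundSdtUpdater.Update(wordDocument, XName.Get("Node", "http://example.com"), "Node", "value2"). Updating both places should matter for applications that, presumably unlike Word, do not refresh data-bound content controls from the custom XML part when opening the document.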

Insert multiple lines of text into a Rich Text content control with OpenXML

I'm having difficulty getting a content control to follow multi-line formatting. It seems to interpret everything I'm giving it literally. I am new to OpenXML and I feel like I must be missing something simple.
I am converting my multi-line string using this function.
private static void parseTextForOpenXML(Run run, string text)
{
string[] newLineArray = { Environment.NewLine, "<br/>", "<br />", "\r\n" };
string[] textArray = text.Split(newLineArray, StringSplitOptions.None);
bool first = true;
foreach (string line in textArray)
{
if (!first)
{
run.Append(new Break());
}
first = false;
Text txt = new Text { Text = line };
run.Append(txt);
}
}
I insert it into the control with this
public static WordprocessingDocument InsertText(this WordprocessingDocument doc, string contentControlTag, string text)
{
SdtElement element = doc.MainDocumentPart.Document.Body.Descendants<SdtElement>().FirstOrDefault(sdt => sdt.SdtProperties.GetFirstChild<Tag>().Val == contentControlTag);
if (element == null)
throw new ArgumentException("ContentControlTag " + contentControlTag + " doesn't exist.");
element.Descendants<Text>().First().Text = text;
element.Descendants<Text>().Skip(1).ToList().ForEach(t => t.Remove());
return doc;
}
I call it with something like...
doc.InsertText("Primary", primaryRun.InnerText);
Although I've tried InnerXML and OuterXML as well. The results look something like
Example AttnExample CompanyExample AddressNew York, NY 12345 or
<w:r xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:t>Example Attn</w:t><w:br /><w:t>Example Company</w:t><w:br /><w:t>Example Address</w:t><w:br /><w:t>New York, NY 12345</w:t></w:r>
The method works fine for simple text insertion. It's just when I need it to interpret the XML that it doesn't work for me.
I feel like I must be super close to getting what I need, but my fiddling is getting me nowhere. Any thoughts? Thank you.
I believe the way I was trying to do it was doomed to fail. Setting the Text attribute of an element is always going to be interpreted as text to be displayed it seems. I ended up having to take a slightly different tack. I created a new insert method.
public static WordprocessingDocument InsertText(this WordprocessingDocument doc, string contentControlTag, Paragraph paragraph)
{
SdtElement element = doc.MainDocumentPart.Document.Body.Descendants<SdtElement>().FirstOrDefault(sdt => sdt.SdtProperties.GetFirstChild<Tag>().Val == contentControlTag);
if (element == null)
throw new ArgumentException("ContentControlTag " + contentControlTag + " doesn't exist.");
OpenXmlElement cc = element.Descendants<Text>().First().Parent;
cc.RemoveAllChildren();
cc.Append(paragraph);
return doc;
}
It starts the same and gets the Content Control by searching for its Tag. But then I get its parent, remove the Content Control elements that were there, and replace them with a paragraph element.
It's not exactly what I had envisioned, but it seems to work for my needs.
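One caveat, as far as I can tell: the parent of a Text element is normally a Run, so appending a Paragraph to it nests a w:p inside a w:r, which the schema does not allow even if Word happens to tolerate it. A possible alternative is to keep the run (and its formatting) and fill it with Text and Break elements directly. The following is only a sketch; the class and method names are made up for this illustration, and it assumes the content control contains at least one run.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper for illustration; not part of the Open XML SDK.
public static class ContentControlText
{
    // Replaces the text of the first run in the content control with the given
    // text, turning each line break into a w:br element.
    public static void SetMultilineText(SdtElement contentControl, string text)
    {
        // Assumption: the content control contains at least one run.
        Run run = contentControl.Descendants<Run>().First();

        // Remove the old text and breaks but keep the run properties (formatting).
        run.RemoveAllChildren<Text>();
        run.RemoveAllChildren<Break>();

        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        for (int i = 0; i < lines.Length; i++)
        {
            if (i > 0)
            {
                run.AppendChild(new Break());
            }

            run.AppendChild(new Text(lines[i]) { Space = SpaceProcessingModeValues.Preserve });
        }
    }
}
```

Any further runs in the content control are left untouched by this sketch and would have to be removed separately.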

Remove Content controls after adding text using open xml

With the help of some very kind community members here, I managed to write a function that programmatically replaces text inside content controls in a Word document using Open XML. However, once the document is generated, the formatting of the replaced text is lost.
Any ideas on how I can keep the formatting in Word and remove the content control tags?
This is my code:
using (var wordDoc = WordprocessingDocument.Open(mem, true))
{
var mainPart = wordDoc.MainDocumentPart;
ReplaceTags(mainPart, "FirstName", _firstName);
ReplaceTags(mainPart, "LastName", _lastName);
ReplaceTags(mainPart, "WorkPhoe", _workPhone);
ReplaceTags(mainPart, "JobTitle", _jobTitle);
mainPart.Document.Save();
SaveFile(mem);
}
private static void ReplaceTags(MainDocumentPart mainPart, string tagName, string tagValue)
{
//grab all the tag fields
IEnumerable<SdtBlock> tagFields = mainPart.Document.Body.Descendants<SdtBlock>().Where
(r => r.SdtProperties.GetFirstChild<Tag>().Val == tagName);
foreach (var field in tagFields)
{
//remove all paragraphs from the content block
field.SdtContentBlock.RemoveAllChildren<Paragraph>();
//create a new paragraph containing a run and a text element
Paragraph newParagraph = new Paragraph();
Run newRun = new Run();
Text newText = new Text(tagValue);
newRun.Append(newText);
newParagraph.Append(newRun);
//add the new paragraph to the content block
field.SdtContentBlock.Append(newParagraph);
}
}
Keeping the style is a tricky problem as there could be more than one style applied to the text you are trying to replace. What should you do in that scenario?
Assuming a simple case of one style (but potentially over many Paragraphs, Runs and Texts) you could keep the first Text element you come across per SdtBlock and place your required value in that element then delete any further Text elements from the SdtBlock. The formatting from the first Text element will then be maintained. Obviously you can apply this theory to any of the Text blocks; you don't have to necessarily use the first. The following code should show what I mean:
private static void ReplaceTags(MainDocumentPart mainPart, string tagName, string tagValue)
{
IEnumerable<SdtBlock> tagFields = mainPart.Document.Body.Descendants<SdtBlock>().Where
(r => r.SdtProperties.GetFirstChild<Tag>().Val == tagName);
foreach (var field in tagFields)
{
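//materialize the list first; removing elements while lazily re-enumerating Descendants would skip some of them
List<Text> texts = field.SdtContentBlock.Descendants<Text>().ToList();
for (int i = 0; i < texts.Count; i++)
{
Text text = texts[i];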
if (i == 0)
{
text.Text = tagValue;
}
else
{
text.Remove();
}
}
}
}
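To also remove the content control itself while keeping its (formatted) content, one option is to move the children of the w:sdtContent element in front of the w:sdt element and then delete the w:sdt. The following is only a sketch of that idea; the class and method names are made up for this illustration, and it assumes the content control is still attached to a parent element.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper for illustration; not part of the Open XML SDK.
public static class ContentControlRemover
{
    // Replaces the content control (w:sdt) with its own content.
    public static void Unwrap(SdtElement contentControl)
    {
        // The content is the w:sdtContent child, whatever the level
        // (block, run, cell, ...) of the content control.
        OpenXmlElement content = contentControl.ChildElements
            .FirstOrDefault(e => e.LocalName == "sdtContent");

        if (content != null)
        {
            // Materialize the children first because they are moved below.
            foreach (OpenXmlElement child in content.ChildElements.ToList())
            {
                child.Remove();
                contentControl.InsertBeforeSelf(child);
            }
        }

        contentControl.Remove();
    }
}
```

In the ReplaceTags method, this could be called for each field after the text has been replaced, provided the tagFields sequence has been materialized (e.g., with ToList()) before the loop.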

Bolding with Rich Text Values in iTextSharp

Is it possible to bold a single word within a sentence with iTextSharp? I'm working with large paragraphs of text coming from xml, and I am trying to bold several individual words without having to break the string into individual phrases.
Eg:
document.Add(new Paragraph("this is <b>bold</b> text"));
should output...
this is bold text
As @kuujinbo pointed out there is the XMLWorker object which is where most of the new HTML parsing work is being done. But if you've just got simple commands like bold or italic you can use the native iTextSharp.text.html.simpleparser.HTMLWorker class. You could wrap it into a helper method such as:
private Paragraph CreateSimpleHtmlParagraph(String text) {
//Our return object
Paragraph p = new Paragraph();
//ParseToList requires a TextReader (here a StringReader) instead of just text
using (StringReader sr = new StringReader(text)) {
//Parse and get a collection of elements
List<IElement> elements = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(sr, null);
foreach (IElement e in elements) {
//Add those elements to the paragraph
p.Add(e);
}
}
//Return the paragraph
return p;
}
Then instead of this:
document.Add(new Paragraph("this is <b>bold</b> text"));
You could use this:
document.Add(CreateSimpleHtmlParagraph("this is <b>bold</b> text"));
document.Add(CreateSimpleHtmlParagraph("this is <i>italic</i> text"));
document.Add(CreateSimpleHtmlParagraph("this is <b><i>bold and italic</i></b> text"));
I know that this is an old question, but I could not get the other examples here to work for me. But adding the text in Chunks with different fonts did.
//define a bold font to be used
Font boldFont = FontFactory.GetFont(FontFactory.HELVETICA_BOLD, 12);
//add a phrase and add Chunks to it
var phrase2 = new Phrase();
phrase2.Add(new Chunk("this is "));
phrase2.Add(new Chunk("bold", boldFont));
phrase2.Add(new Chunk(" text"));
document.Add(phrase2);
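If splitting the string by hand is the part you want to avoid, the Chunk approach can be automated for very simple, well-formed markup. The following is only a sketch, assuming the text contains nothing but (possibly nested) <b> and <i> elements; the class name SimpleMarkupToPhrase is made up for this illustration, and the markup is parsed with LINQ to XML rather than with any iTextSharp parser.

```csharp
using System.Xml.Linq;
using iTextSharp.text;

// Hypothetical helper for illustration; not part of iTextSharp.
public static class SimpleMarkupToPhrase
{
    // Turns text such as "this is <b>bold</b> text" into a Phrase made of Chunks.
    public static Phrase Create(string markup, float size = 12f)
    {
        // Wrap the text in a root element so that it is well-formed XML.
        XElement root = XElement.Parse("<root>" + markup + "</root>", LoadOptions.PreserveWhitespace);
        var phrase = new Phrase();
        AddChunks(phrase, root, Font.NORMAL, size);
        return phrase;
    }

    private static void AddChunks(Phrase phrase, XElement parent, int style, float size)
    {
        foreach (XNode node in parent.Nodes())
        {
            if (node is XText text)
            {
                // Each text node becomes a Chunk with the style collected so far.
                Font font = FontFactory.GetFont(FontFactory.HELVETICA, size, style);
                phrase.Add(new Chunk(text.Value, font));
            }
            else if (node is XElement element)
            {
                // Combine the inherited style with the style of this element.
                int childStyle = style;
                if (element.Name.LocalName == "b") childStyle |= Font.BOLD;
                if (element.Name.LocalName == "i") childStyle |= Font.ITALIC;
                AddChunks(phrase, element, childStyle, size);
            }
        }
    }
}
```

Usage would then be document.Add(SimpleMarkupToPhrase.Create("this is <b>bold</b> text"));. Elements other than <b> and <i> are traversed without changing the style.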
Not sure how complex your Xml is, but try XMLWorker. Here's a working example with an ASP.NET HTTP handler:
<%@ WebHandler Language="C#" Class="boldText" %>
using System;
using System.IO;
using System.Web;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.xml;
using iTextSharp.tool.xml;
public class boldText : IHttpHandler {
public void ProcessRequest (HttpContext context) {
HttpResponse Response = context.Response;
Response.ContentType = "application/pdf";
StringReader xmlSnippet = new StringReader(
"<p>This is <b>bold</b> text</p>"
);
using (Document document = new Document()) {
PdfWriter writer = PdfWriter.GetInstance(
document, Response.OutputStream
);
document.Open();
XMLWorkerHelper.GetInstance().ParseXHtml(
writer, document, xmlSnippet
);
}
}
public bool IsReusable { get { return false; } }
}
You may have to pre-process your Xml before sending it to XMLWorker (notice the snippet is a bit different from yours). Support for parsing HTML/Xml was released relatively recently, so your mileage may vary.
Here is another XMLWorker example that uses a different overload of ParseXHtml and returns a Phrase instead of writing it directly to the document.
private static Phrase CreateSimpleHtmlParagraph(String text)
{
var p = new Phrase();
var mh = new MyElementHandler();
using (TextReader sr = new StringReader("<html><body><p>" + text + "</p></body></html>"))
{
XMLWorkerHelper.GetInstance().ParseXHtml(mh, sr);
}
foreach (var element in mh.elements)
{
foreach (var chunk in element.Chunks)
{
p.Add(chunk);
}
}
return p;
}
private class MyElementHandler : IElementHandler
{
public List<IElement> elements = new List<IElement>();
public void Add(IWritable w)
{
if (w is iTextSharp.tool.xml.pipeline.WritableElement)
{
elements.AddRange(((iTextSharp.tool.xml.pipeline.WritableElement)w).Elements());
}
}
}