Insert multiple lines of text into a Rich Text content control with OpenXML

I'm having difficulty getting a content control to follow multi-line formatting. It seems to interpret everything I'm giving it literally. I am new to OpenXML and I feel like I must be missing something simple.
I am converting my multi-line string using this function.
private static void parseTextForOpenXML(Run run, string text)
{
string[] newLineArray = { Environment.NewLine, "<br/>", "<br />", "\r\n" };
string[] textArray = text.Split(newLineArray, StringSplitOptions.None);
bool first = true;
foreach (string line in textArray)
{
if (!first)
{
run.Append(new Break());
}
first = false;
Text txt = new Text { Text = line };
run.Append(txt);
}
}
I insert it into the control with this
public static WordprocessingDocument InsertText(this WordprocessingDocument doc, string contentControlTag, string text)
{
SdtElement element = doc.MainDocumentPart.Document.Body.Descendants<SdtElement>().FirstOrDefault(sdt => sdt.SdtProperties.GetFirstChild<Tag>().Val == contentControlTag);
if (element == null)
throw new ArgumentException("ContentControlTag " + contentControlTag + " doesn't exist.");
element.Descendants<Text>().First().Text = text;
element.Descendants<Text>().Skip(1).ToList().ForEach(t => t.Remove());
return doc;
}
I call it with something like...
doc.InsertText("Primary", primaryRun.InnerText);
Although I've tried InnerXML and OuterXML as well. The results look something like
Example AttnExample CompanyExample AddressNew York, NY 12345 or
<w:r xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:t>Example Attn</w:t><w:br /><w:t>Example Company</w:t><w:br /><w:t>Example Address</w:t><w:br /><w:t>New York, NY 12345</w:t></w:r>
The method works fine for simple text insertion. It's just when I need it to interpret the XML that it doesn't work for me.
I feel like I must be super close to getting what I need, but my fiddling is getting me nowhere. Any thoughts? Thank you.

I believe the way I was trying to do it was doomed to fail. It seems that whatever is assigned to the Text property of a Text element is always treated as literal text to display, so the markup is never interpreted. I ended up taking a slightly different tack and created a new insert method.
public static WordprocessingDocument InsertText(this WordprocessingDocument doc, string contentControlTag, Paragraph paragraph)
{
SdtElement element = doc.MainDocumentPart.Document.Body.Descendants<SdtElement>().FirstOrDefault(sdt => sdt.SdtProperties.GetFirstChild<Tag>().Val == contentControlTag);
if (element == null)
throw new ArgumentException("ContentControlTag " + contentControlTag + " doesn't exist.");
OpenXmlElement cc = element.Descendants<Text>().First().Parent;
cc.RemoveAllChildren();
cc.Append(paragraph);
return doc;
}
It starts the same way and finds the Content Control by searching for its Tag. Then I take the parent of the control's first Text element, remove the children that were there, and append the paragraph element in their place.
It's not exactly what I had envisioned, but it seems to work for my needs.
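A possible alternative, sketched here under assumptions rather than taken from the original post: keep the control's existing run and append Text and Break elements to it, in the same spirit as parseTextForOpenXML from the question. The helper name InsertMultilineText is hypothetical, the sketch assumes the content control already holds at least one Run, and any additional runs in the control are left untouched.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ContentControlExtensions
{
    // Hypothetical helper (not from the original post): writes multi-line text
    // into the first run of the content control that carries the given tag.
    public static WordprocessingDocument InsertMultilineText(
        this WordprocessingDocument doc, string contentControlTag, string text)
    {
        // Null-conditional access skips content controls that have no Tag element.
        SdtElement element = doc.MainDocumentPart.Document.Body
            .Descendants<SdtElement>()
            .FirstOrDefault(sdt =>
                sdt.SdtProperties?.GetFirstChild<Tag>()?.Val?.Value == contentControlTag);
        if (element == null)
            throw new ArgumentException("ContentControlTag " + contentControlTag + " doesn't exist.");

        // Assumption: the control contains at least one run to write into.
        Run run = element.Descendants<Run>().FirstOrDefault();
        if (run == null)
            throw new InvalidOperationException("The content control contains no run.");

        // Drop the old text and breaks but keep the run properties (formatting).
        run.RemoveAllChildren<Text>();
        run.RemoveAllChildren<Break>();

        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        for (int i = 0; i < lines.Length; i++)
        {
            if (i > 0)
                run.AppendChild(new Break()); // line break between consecutive lines
            run.AppendChild(new Text(lines[i]) { Space = SpaceProcessingModeValues.Preserve });
        }
        return doc;
    }
}
```

It would be called with the raw string, for example doc.InsertMultilineText("Primary", "Example Attn\r\nExample Company"), so no Run has to be built up front.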


How to catch *all* characters sent to Text field from a HID barcode scanner?

I need to capture input from a barcode scanner. Up until now the input has been just simple alphanum text which I have captured in one Text field. I added a ModifyListener to the Text field and am able to see the input arrive. That has worked fine.
I now need to handle a more complex matrix code which contains values for multiple fields. The values are separated by non-printable characters such as RS, GS and EOT (0x1E, 0x1D, 0x04). The complete data stream has a well-defined header and an EOT at the end, so I am hoping that I can detect barcode input as opposed to manual input.
When a barcode is detected, I can use the record separators RS to split the message and insert the values into the relevant Text fields.
However, the standard key handler on the Text controls ignores these non-printable characters, so they do not appear in the control's text. This makes it impossible to proceed as planned.
How could I modify these Text fields to accept and store all characters? Or is there an alternative approach I could use?
This is the code I used to handle the barcode stream.
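import org.eclipse.swt.SWT;
import org.eclipse.swt.layout.FillLayout;
import org.eclipse.swt.widgets.Display;
import org.eclipse.swt.widgets.Event;
import org.eclipse.swt.widgets.Listener;
import org.eclipse.swt.widgets.Shell;
import org.eclipse.swt.widgets.Text;
public class Main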
{
static StringBuilder sb = new StringBuilder();
public static void main(String[] args)
{
Display d = new Display();
Shell shell = new Shell(d);
shell.setLayout(new FillLayout());
Text text = new Text(shell, 0);
text.addListener(SWT.KeyDown, new Listener()
{
@Override
public void handleEvent(Event e)
{
// only accept real characters
if (e.character != 0 && e.keyCode < 0x1000000)
{
sb.append(e.character);
String s = sb.toString();
// have start and end idents in buffer?
int i = s.indexOf("[)>");
if (i > -1)
{
int eot = s.indexOf("\u0004", i);
if (eot > -1)
{
String message = s.substring(i, eot + 1);
handleMessageHere(message);
// get ready for next message
sb = new StringBuilder();
}
}
}
}
});
shell.open();
while (!shell.isDisposed())
{
if (!d.readAndDispatch())
d.sleep();
}
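}
// Placeholder (not in the original post): split the message on RS/GS and fill the fields here.
static void handleMessageHere(String message)
{
}
}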

Retrieve reference to comment OpenXML

I am trying to pull out the text from a Word document that is referenced by a comment in OpenXML. I can easily get the text of a comment, but not the paragraph text in the document that the comment is referencing.
The image I attached shows a comment and the related text. I am having a lot of trouble finding an example of how to get the referenced text. How can I get this text?
The solution is to get the Id of the comment, which, as you said, you already know how to retrieve, and then search the document for a CommentRangeStart element with the same Id. When you have found it, you can loop over .NextSibling() until you hit a CommentRangeEnd element.
The elements between CommentRangeStart and CommentRangeEnd are the referenced part, which can be multiple runs, paragraphs, images, or anything else. So you will have to handle the collected elements somehow afterwards.
I made a test document looking like this:
I've made this code to test it:
using (var wordDoc = WordprocessingDocument.Open(@"c:\test\test.docx", true))
{
MainDocumentPart mainPart = wordDoc.MainDocumentPart;
var document = mainPart.Document;
var comments = mainPart.WordprocessingCommentsPart.Comments.ChildElements;
foreach(Comment comment in comments)
{
string commentId = comment.Id;
string commentText = comment.InnerText;
OpenXmlElement rangeStart = document.Descendants<CommentRangeStart>().Where(c => c.Id == commentId).FirstOrDefault();
List<OpenXmlElement> referenced = new List<OpenXmlElement>();
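// Skip comments that have no anchored range in the document.
if (rangeStart == null) continue;
rangeStart = rangeStart.NextSibling();
// Stop at the range end, or when the siblings run out (the range ends in another paragraph).
while (rangeStart != null && !(rangeStart is CommentRangeEnd))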
{
referenced.Add(rangeStart);
rangeStart = rangeStart.NextSibling();
}
Console.WriteLine("Comment Id " + commentId + " with text \"" + " " + commentText + "\" references =>");
foreach (var ele in referenced)
{
if(!string.IsNullOrWhiteSpace(ele.InnerText))
{
Console.WriteLine(" " + ele.InnerText);
}
}
}
Console.ReadKey();
}
Which produces this output
I hope it helps!
I could not get your solution to work. However I found a workaround.
OpenXmlElement rangeStart = document.Descendants<CommentRangeStart>().Where(c => c.Id == commentId).FirstOrDefault();
bool breakLoop = false;
rangeStart = rangeStart.Parent;
while (rangeStart != null && !breakLoop) // Walk the paragraphs, starting at the one that holds CommentRangeStart.
{
foreach (var ele in rangeStart.ChildElements)
{
if (ele is CommentRangeEnd)
{
// Reached the end of the commented range: stop both loops.
breakLoop = true;
break;
}
if (!string.IsNullOrWhiteSpace(ele.InnerText))
{
referenced.Add(ele);
}
}
rangeStart = rangeStart.NextSibling();
}
So instead of looping only through the paragraph in which the CommentRangeStart exists, I step up to the parent node and move from paragraph to paragraph, since one commented range may span several paragraphs. Finally, when I reach the CommentRangeEnd, I can break out of the loop and process the data as required.
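Another way to cope with ranges that span several paragraphs, offered only as a sketch: walk every element of the document in document order, skip up to the matching CommentRangeStart, and collect Text elements until the matching CommentRangeEnd. The helper name GetReferencedText is hypothetical. It returns an empty list when no start marker with that id exists, and it reads to the end of the document if the end marker is missing.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class CommentRangeReader
{
    // Hypothetical helper (not from the original answers): returns the text pieces
    // that lie between the CommentRangeStart and CommentRangeEnd with the given id.
    public static List<string> GetReferencedText(Document document, string commentId)
    {
        return document.Descendants()   // all elements, in document order
            // Skip everything before the start marker of this comment.
            .SkipWhile(e => !(e is CommentRangeStart start && start.Id?.Value == commentId))
            // Keep going until the end marker of the same comment.
            .TakeWhile(e => !(e is CommentRangeEnd end && end.Id?.Value == commentId))
            // Only the Text elements carry the visible text.
            .OfType<Text>()
            .Select(t => t.Text)
            .ToList();
    }
}
```

Because the walk ignores paragraph boundaries, the pieces come back as a flat list; joining them with a separator is left to the caller.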

Remove Content controls after adding text using open xml

With the help of some very kind community members here, I managed to write a function that programmatically replaces text inside content controls in a Word document using Open XML. However, replacing the text removes its formatting in the generated document.
Any ideas on how I can keep the formatting in Word and remove the content control tags?
This is my code:
using (var wordDoc = WordprocessingDocument.Open(mem, true))
{
var mainPart = wordDoc.MainDocumentPart;
ReplaceTags(mainPart, "FirstName", _firstName);
ReplaceTags(mainPart, "LastName", _lastName);
ReplaceTags(mainPart, "WorkPhoe", _workPhone);
ReplaceTags(mainPart, "JobTitle", _jobTitle);
mainPart.Document.Save();
SaveFile(mem);
}
private static void ReplaceTags(MainDocumentPart mainPart, string tagName, string tagValue)
{
//grab all the tag fields
IEnumerable<SdtBlock> tagFields = mainPart.Document.Body.Descendants<SdtBlock>().Where
(r => r.SdtProperties.GetFirstChild<Tag>().Val == tagName);
foreach (var field in tagFields)
{
//remove all paragraphs from the content block
field.SdtContentBlock.RemoveAllChildren<Paragraph>();
//create a new paragraph containing a run and a text element
Paragraph newParagraph = new Paragraph();
Run newRun = new Run();
Text newText = new Text(tagValue);
newRun.Append(newText);
newParagraph.Append(newRun);
//add the new paragraph to the content block
field.SdtContentBlock.Append(newParagraph);
}
}
Keeping the style is a tricky problem as there could be more than one style applied to the text you are trying to replace. What should you do in that scenario?
Assuming a simple case of one style (but potentially over many Paragraphs, Runs and Texts) you could keep the first Text element you come across per SdtBlock and place your required value in that element then delete any further Text elements from the SdtBlock. The formatting from the first Text element will then be maintained. Obviously you can apply this theory to any of the Text blocks; you don't have to necessarily use the first. The following code should show what I mean:
private static void ReplaceTags(MainDocumentPart mainPart, string tagName, string tagValue)
{
IEnumerable<SdtBlock> tagFields = mainPart.Document.Body.Descendants<SdtBlock>().Where
(r => r.SdtProperties.GetFirstChild<Tag>().Val == tagName);
foreach (var field in tagFields)
{
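// Snapshot the Text elements first: removing items while re-enumerating
// the lazy Descendants<Text>() sequence would skip every other element.
List<Text> texts = field.SdtContentBlock.Descendants<Text>().ToList();
for (int i = 0; i < texts.Count; i++)
{
Text text = texts[i];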
if (i == 0)
{
text.Text = tagValue;
}
else
{
text.Remove();
}
}
}
}
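The question also asks how to remove the content control tags themselves, which the code above does not do. One possible approach, shown as a sketch under assumptions: move the children of each SdtContentBlock in front of its SdtBlock and then delete the now-empty control. The helper name UnwrapTags is hypothetical, and the sketch only handles block-level controls (SdtBlock), not inline ones (SdtRun).

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ContentControlUnwrapper
{
    // Hypothetical helper (not from the original answer): removes block-level
    // content controls with the given tag but keeps their content in place.
    public static void UnwrapTags(MainDocumentPart mainPart, string tagName)
    {
        // ToList() snapshots the matches so the tree can be modified while looping.
        var blocks = mainPart.Document.Body.Descendants<SdtBlock>()
            .Where(b => b.SdtProperties?.GetFirstChild<Tag>()?.Val?.Value == tagName)
            .ToList();

        foreach (SdtBlock block in blocks)
        {
            SdtContentBlock content = block.SdtContentBlock;
            if (content != null)
            {
                // Move each paragraph (with its runs and formatting) out of the
                // control, preserving the original order.
                foreach (OpenXmlElement child in content.ChildElements.ToList())
                {
                    child.Remove();
                    block.InsertBeforeSelf(child);
                }
            }
            // The control is now empty and can be dropped.
            block.Remove();
        }
    }
}
```

It would be called after the text has been replaced, for example UnwrapTags(mainPart, "FirstName"), and before mainPart.Document.Save().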

Bolding with Rich Text Values in iTextSharp

Is it possible to bold a single word within a sentence with iTextSharp? I'm working with large paragraphs of text coming from xml, and I am trying to bold several individual words without having to break the string into individual phrases.
Eg:
document.Add(new Paragraph("this is <b>bold</b> text"));
should output...
this is bold text
As @kuujinbo pointed out there is the XMLWorker object which is where most of the new HTML parsing work is being done. But if you've just got simple commands like bold or italic you can use the native iTextSharp.text.html.simpleparser.HTMLWorker class. You could wrap it into a helper method such as:
private Paragraph CreateSimpleHtmlParagraph(String text) {
//Our return object
Paragraph p = new Paragraph();
//ParseToList requires a TextReader (here a StringReader) instead of just text
using (StringReader sr = new StringReader(text)) {
//Parse and get a collection of elements
List<IElement> elements = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(sr, null);
foreach (IElement e in elements) {
//Add those elements to the paragraph
p.Add(e);
}
}
//Return the paragraph
return p;
}
Then instead of this:
document.Add(new Paragraph("this is <b>bold</b> text"));
You could use this:
document.Add(CreateSimpleHtmlParagraph("this is <b>bold</b> text"));
document.Add(CreateSimpleHtmlParagraph("this is <i>italic</i> text"));
document.Add(CreateSimpleHtmlParagraph("this is <b><i>bold and italic</i></b> text"));
I know that this is an old question, but I could not get the other examples here to work for me. Adding the text in Chunks with different fonts did work.
//define a bold font to be used
Font boldFont = FontFactory.GetFont(FontFactory.HELVETICA_BOLD, 12);
//add a phrase and add Chunks to it
var phrase2 = new Phrase();
phrase2.Add(new Chunk("this is "));
phrase2.Add(new Chunk("bold", boldFont));
phrase2.Add(new Chunk(" text"));
document.Add(phrase2);
Not sure how complex your Xml is, but try XMLWorker. Here's a working example with an ASP.NET HTTP handler:
<%@ WebHandler Language="C#" Class="boldText" %>
using System;
using System.IO;
using System.Web;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.xml;
using iTextSharp.tool.xml;
public class boldText : IHttpHandler {
public void ProcessRequest (HttpContext context) {
HttpResponse Response = context.Response;
Response.ContentType = "application/pdf";
StringReader xmlSnippet = new StringReader(
"<p>This is <b>bold</b> text</p>"
);
using (Document document = new Document()) {
PdfWriter writer = PdfWriter.GetInstance(
document, Response.OutputStream
);
document.Open();
XMLWorkerHelper.GetInstance().ParseXHtml(
writer, document, xmlSnippet
);
}
}
public bool IsReusable { get { return false; } }
}
You may have to pre-process your Xml before sending it to XMLWorker. (notice the snippet is a bit different from yours) Support for parsing HTML/Xml was released relatively recently, so your mileage may vary.
Here is another XMLWorker example that uses a different overload of ParseXHtml and returns a Phrase instead of writing it directly to the document.
private static Phrase CreateSimpleHtmlParagraph(String text)
{
var p = new Phrase();
var mh = new MyElementHandler();
using (TextReader sr = new StringReader("<html><body><p>" + text + "</p></body></html>"))
{
XMLWorkerHelper.GetInstance().ParseXHtml(mh, sr);
}
foreach (var element in mh.elements)
{
foreach (var chunk in element.Chunks)
{
p.Add(chunk);
}
}
return p;
}
private class MyElementHandler : IElementHandler
{
public List<IElement> elements = new List<IElement>();
public void Add(IWritable w)
{
if (w is iTextSharp.tool.xml.pipeline.WritableElement)
{
elements.AddRange(((iTextSharp.tool.xml.pipeline.WritableElement)w).Elements());
}
}
}

Word/Office Automation - How to retrieve selected value from a Drop-down form field

I am trying to retrieve the value of all fields in a word document via office automation using c#. The code is shown below however if the field is a drop-down then the value of the range text is always empty even though I know it is populated. If it is a simple text field then I can see the range text. How do I get the selected drop down item? I feel there must be something quite simple that I'm doing wrong...
private void OpenWordDoc(string filename) {
Microsoft.Office.Interop.Word.Application app = new Microsoft.Office.Interop.Word.Application();
Document doc = app.Documents.Open(filename, ReadOnly: true, Visible: false);
foreach (Field f in doc.Fields) {
string bookmarkName = "??";
if (f.Code.Bookmarks.Count > 0) {
bookmarkName = f.Code.Bookmarks[1].Name; // have to start at 1 because it is vb style!
}
Debug.WriteLine(bookmarkName);
Debug.WriteLine(f.Result.Text); // This is empty when it is a drop down field
}
doc.Close();
app.Quit();
}
Aha - If I scan through FormFields instead of Fields then all is good...
foreach (FormField ff in doc.FormFields) {
string bookmarkName = "??";
if (ff.Range.Bookmarks.Count > 0) {
bookmarkName = ff.Range.Bookmarks[1].Name; // have to start at 1 because it is vb style!
}
Debug.WriteLine(bookmarkName);
Debug.WriteLine(ff.Result); // This now holds the selected item for a drop-down field
}
Problem solved. Phew.