Replacing content in Word 2010 Textboxes using OpenXML - ms-word

Using the Open XML SDK, I have been able to programmatically find bookmarks or text strings in a Word document and insert new content. I use OpenXmlPowerTools.SearchAndReplacer for the text search and replace, and the answer to "Replace bookmark text in Word file using Open XML SDK" for the bookmarks.
This all fails when the bookmark or the text I am trying to replace is located inside a textbox.
Why does neither approach work within a textbox? The Word documents I need to modify use textboxes for layout, and I can't work out what the problem is.
Does anybody have suggestions as to what might be the problem? Thanks

I did this. It works on text boxes as long as the text is not split across multiple runs (for example, when one word is bolded):
' Materialize the matches first: the query is lazy, so editing Text
' while indexing into it would shift the results and skip elements.
Dim matches = (From tx In mainPart.Document.Body.Descendants(Of Text)()
               Where tx.Text.Contains(replaceData.OldText)).ToList()
For Each tx In matches
    tx.Text = tx.Text.Replace(replaceData.OldText, replaceData.NewText)
Next

Here is the XML for a simple textbox with the word test in it:
<w:pict xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<v:shapetype id="_x0000_t202" coordsize="21600,21600" o:spt="202" path="m,l,21600r21600,l21600,xe" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml">
<v:stroke joinstyle="miter" />
<v:path gradientshapeok="t" o:connecttype="rect" />
</v:shapetype>
<v:shape id="_x0000_s1027" style="position:absolute;margin-left:0;margin-top:0;width:186.35pt;height:110.6pt;z-index:251660288;mso-width-percent:400;mso-height-percent:200;mso-position-horizontal:center;mso-width-percent:400;mso-height-percent:200;mso-width-relative:margin;mso-height-relative:margin" type="#_x0000_t202" xmlns:v="urn:schemas-microsoft-com:vml">
<v:textbox style="mso-fit-shape-to-text:t">
<w:txbxContent>
<w:p w:rsidR="00B558B5" w:rsidRDefault="00B558B5">
<w:proofErr w:type="gramStart" />
<w:r>
<w:t>test</w:t>
</w:r>
<w:proofErr w:type="gramEnd" />
</w:p>
</w:txbxContent>
</v:textbox>
</v:shape>
</w:pict>
You can see that the structure is different than when searching for text within a bookmark, since a textbox is actually stored as a picture (a w:pict element wrapping VML), with its text nested under w:txbxContent. If you adjust your search algorithm to deal with this structure, you should be able to find the text and replace it.
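As a rough illustration of adjusting the search to this structure, here is a minimal Python sketch using only the standard library. It assumes the document XML has already been extracted from the .docx; the function name replace_in_textboxes and the trimmed SAMPLE markup are hypothetical, and, like the VB snippet above, it does not handle text that is split across several runs.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
V = "urn:schemas-microsoft-com:vml"

def replace_in_textboxes(root, old, new):
    """Replace `old` with `new` in every w:t found under a w:txbxContent element.

    Returns the number of w:t elements that were changed.
    """
    changed = 0
    for box in root.iter(f"{{{W}}}txbxContent"):
        for t in box.iter(f"{{{W}}}t"):
            if t.text and old in t.text:
                t.text = t.text.replace(old, new)
                changed += 1
    return changed

# Trimmed-down version of the textbox XML shown above (hypothetical sample).
SAMPLE = (
    f'<w:pict xmlns:w="{W}" xmlns:v="{V}">'
    '<v:shape id="_x0000_s1027"><v:textbox><w:txbxContent>'
    '<w:p><w:r><w:t>test</w:t></w:r></w:p>'
    '</w:txbxContent></v:textbox></v:shape></w:pict>'
)

root = ET.fromstring(SAMPLE)
changed = replace_in_textboxes(root, "test", "replaced")
new_text = root.find(f".//{{{W}}}t").text
```

In a real document the same walk would be applied to the root of word/document.xml rather than to an isolated w:pict fragment.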

Related

How to insert a new Word style into a RMarkdown file

Is there a simple way to do this using knitr without using Pandoc? I tried adding some HTML, <div class="newStyle">&nbsp;</div>, to an .Rmd file, but the style didn't show up in the generated Word .docx.
Thanks, Sue.
My setup: Office 365 Pro Plus, RStudio 1.0.143.
I was able to get both the <div> and <span> syntax to work. First I created a new style in my reference.docx document with the same name I intended to use in the Pandoc markup tags. Be careful what you name the style: this worked when I used the name "SpanAdd" but did not work with the name "Span_Add". The <div> tag should be used when you want to specify a paragraph style, and the default "Linked Paragraph and Character" style type in Word works fine for this. The <span> tag is more finicky, and I was only able to get it to work with a "Character" style type. I based my character styles on "Default Paragraph Font".
Once I had created the new style in the reference document and saved it, I was able to use these tags within an .Rmd file to generate marked-up text.

OpenXML freeze-fixed ID for w:tags

I have a Java program that searches for rsidR="00CA303F" inside document.xml (unzipped from the DOCX).
<w:sdtContent>
<w:r w:rsidR="00CA303F">
<w:rPr>
<w:rFonts w:cs="Arial"/>
<w:b/>
<w:sz w:val="18"/>
<w:szCs w:val="18"/>
<w:lang w:val="en-US"/>
</w:rPr>
<w:t>17-Jan-14</w:t>
</w:r>
</w:sdtContent>
The problem: if I change something in the docx, such as the date, and then save the file, this rsidR changes, and my program can no longer find it.
How can I keep it fixed? Or which other stable element can I add to the w:r so that I can find it after the file is saved?
Solutions I tried that did not work: I added other attributes to this w:r, hoping they would not change, for example w:rsidRDefault, w:id, w:val and w:rsidRPr, but then Word could not open the docx file.
Neither Word nor the Open XML file format offers a direct way to add an ID to an element that persists when the document is edited.
As a workaround, you can create a character style which you then apply to the run of text you are interested in. Then you can search for the w:rStyle element with the correct character style in the w:val attribute:
<w:r w:rsidRPr="00E05157">
<w:rPr>
<w:rStyle w:val="MyCharacterStyle"/>
</w:rPr>
<w:t>17-Jan-14</w:t>
</w:r>
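The lookup side of this workaround might look like the following Python sketch (standard library only). The function name runs_with_style and the SAMPLE paragraph are hypothetical; the style name MyCharacterStyle is taken from the snippet above.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def runs_with_style(root, style_id):
    """Return all w:r elements whose w:rPr/w:rStyle has w:val equal to style_id."""
    hits = []
    for run in root.iter(f"{{{W}}}r"):
        style = run.find(f"{{{W}}}rPr/{{{W}}}rStyle")
        if style is not None and style.get(f"{{{W}}}val") == style_id:
            hits.append(run)
    return hits

# Hypothetical paragraph: one plain run and one run tagged with the character style.
SAMPLE = (
    f'<w:p xmlns:w="{W}">'
    '<w:r><w:t>Date: </w:t></w:r>'
    '<w:r w:rsidRPr="00E05157"><w:rPr><w:rStyle w:val="MyCharacterStyle"/></w:rPr>'
    '<w:t>17-Jan-14</w:t></w:r>'
    '</w:p>'
)

root = ET.fromstring(SAMPLE)
texts = [r.findtext(f"{{{W}}}t") for r in runs_with_style(root, "MyCharacterStyle")]
```

Because the match is on the style rather than on an rsid attribute, it does not depend on values that Word rewrites when the document is saved.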
It should be possible to assign a unique id to the containing w:sdt (in the descendant w:sdtPr/w:id/@w:val). See for example the docx4java documentation for sdtPr.
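A minimal Python sketch of looking up a content control by that id, assuming the w:sdt carries a w:sdtPr/w:id child. The function name find_sdt_by_id, the id value 12345 and the SAMPLE markup are made up for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def find_sdt_by_id(root, sdt_id):
    """Return the first w:sdt whose w:sdtPr/w:id has w:val equal to sdt_id, or None."""
    for sdt in root.iter(f"{{{W}}}sdt"):
        id_el = sdt.find(f"{{{W}}}sdtPr/{{{W}}}id")
        if id_el is not None and id_el.get(f"{{{W}}}val") == str(sdt_id):
            return sdt
    return None

# Hypothetical content control wrapping the run from the question.
SAMPLE = (
    f'<w:body xmlns:w="{W}"><w:sdt>'
    '<w:sdtPr><w:id w:val="12345"/></w:sdtPr>'
    '<w:sdtContent><w:r w:rsidR="00CA303F"><w:t>17-Jan-14</w:t></w:r></w:sdtContent>'
    '</w:sdt></w:body>'
)

root = ET.fromstring(SAMPLE)
sdt = find_sdt_by_id(root, 12345)
sdt_text = "".join(t.text or "" for t in sdt.iter(f"{{{W}}}t"))
```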
A good explanation of rsids, and how they are used by MS Word, is in "What's up with all those rsid's". In many applications it is harmless to ignore them completely.

table caption tag for docx

According to http://officeopenxml.com/WPtableCaption.php,
<w:tblCaption w:val="caption text"/>
is the tag for a table caption, but when I add it to the XML I get an error and the caption is not shown.
When I add the caption directly from Word, it is added as:
<w:p w:rsidR="00346450" w:rsidRDefault="00346450" w:rsidP="00346450">
<w:pPr>
<w:pStyle w:val="Caption"/>
<w:keepNext/>
</w:pPr>
<w:r>
<w:t>caption text</w:t>
</w:r>
</w:p>
I use Word 2010. Can someone explain this? Maybe w:tblCaption isn't used anymore and officeopenxml.com was not updated?
Take another look at the page you link to: the tblCaption tag is a child element of tblPr (table properties).
What this page does not tell you is that this is not a "caption" in the sense of the term Word users understand it. It's actually the "Alt-text" for a web-page, in case the Word document is saved as a web page. So it's never going to be visible in the Word document. You can see the option in the UI by selecting the table, going to the "Properties" dialog and choosing the "Alt Text" tab.
A "real" caption is the Word Open XML you show in your second code snippet. What marks it as a caption is the style applied to it. It can be positioned anywhere in the document, although Word's built-in tool to insert a caption will offer to place it above or below the object it's for.
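To make the distinction concrete, here is a small Python sketch (standard library only) that reads both kinds of "caption" out of a document fragment. The helper names caption_paragraphs and table_alt_captions and the SAMPLE markup are hypothetical; the style ID Caption is the one shown in the question.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def caption_paragraphs(root):
    """Text of every paragraph whose paragraph style ID is 'Caption' (visible captions)."""
    out = []
    for p in root.iter(f"{{{W}}}p"):
        style = p.find(f"{{{W}}}pPr/{{{W}}}pStyle")
        if style is not None and style.get(f"{{{W}}}val") == "Caption":
            out.append("".join(t.text or "" for t in p.iter(f"{{{W}}}t")))
    return out

def table_alt_captions(root):
    """Values of w:tblPr/w:tblCaption for each table (alt text, not shown in the page)."""
    return [c.get(f"{{{W}}}val") for c in root.iter(f"{{{W}}}tblCaption")]

# Hypothetical body: a caption paragraph followed by a table carrying alt text.
SAMPLE = (
    f'<w:body xmlns:w="{W}">'
    '<w:p><w:pPr><w:pStyle w:val="Caption"/><w:keepNext/></w:pPr>'
    '<w:r><w:t>caption text</w:t></w:r></w:p>'
    '<w:tbl><w:tblPr><w:tblCaption w:val="alt text"/></w:tblPr></w:tbl>'
    '</w:body>'
)

root = ET.fromstring(SAMPLE)
visible = caption_paragraphs(root)
alt = table_alt_captions(root)
```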

OpenXML xml snippet for a bulleted list

I am using PHPWord to generate Word elements. I want to insert a bulleted list using setValue in my template.
I tried inserting this snippet
<w:p>
<w:pPr>
<w:pStyle w:val="ListParagraph"/>
<w:numPr>
<w:ilvl w:val="0"/>
<w:numId w:val="1"/>
</w:numPr>
</w:pPr>
<w:r>
<w:t>One</w:t>
</w:r>
</w:p>
But somehow I am missing the style. Where do I need to insert the style, and which style?
I worked from this page: https://msdn.microsoft.com/de-de/library/office/ee922775%28v=office.14%29.aspx
The <w:pStyle> element takes the style ID as its w:val attribute, so in this case it refers to the 'List Paragraph' style. List Paragraph does not have a bullet. If you want a bullet, you'll need to use List Bullet instead.
Note that this will only actually work if the style in question is explicitly defined in the /word/styles.xml part. The List Bullet style is a so-called 'built-in' style, and is not written by Word into the styles.xml part until it is used for the first time.
So it's possible you may need to add it yourself. The Word behavior when a paragraph is assigned an undefined style is simply to use the default paragraph style, probably Normal.
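One way to check whether a style is actually defined is to collect the style IDs present in the styles part, as in this Python sketch (standard library only). The function name defined_style_ids and the two-style SAMPLE are hypothetical; a real check would parse /word/styles.xml from the package, and the ID ListBullet is assumed here to be the style ID of List Bullet.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def defined_style_ids(styles_root):
    """Return the set of w:styleId values declared in a styles part."""
    return {s.get(f"{{{W}}}styleId") for s in styles_root.iter(f"{{{W}}}style")}

# Hypothetical, heavily trimmed styles.xml in which List Bullet has not been written yet.
SAMPLE = (
    f'<w:styles xmlns:w="{W}">'
    '<w:style w:type="paragraph" w:default="1" w:styleId="Normal">'
    '<w:name w:val="Normal"/></w:style>'
    '<w:style w:type="paragraph" w:styleId="ListParagraph">'
    '<w:name w:val="List Paragraph"/></w:style>'
    '</w:styles>'
)

ids = defined_style_ids(ET.fromstring(SAMPLE))
needs_list_bullet = "ListBullet" not in ids
```

If needs_list_bullet is true, the style definition would have to be added to the styles part before the paragraph can pick it up.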

Word by word animation

I want to do word-by-word animation on a document. My document may be a .doc, .html or .ppt file. I think a macro in a .doc may do a better job.
When the document is displayed, I want to animate it word by word by highlighting each word.
As I speak, the current word should be highlighted and the highlight should then move to the next word. This way I can sync my voice with the words.
I tried animation in PowerPoint, but it reveals the text word by word and does not allow the whole text to appear first with the highlight then moving across the words.
Linking an animation such as highlighting of text to a sound file is allowed in epub 3.0; details can be found in the IDPF's spec, in the Media Overlay section. The first thing that you will need to do is mark up the text at your required level of granularity--by word, it sounds like. So the xhtml should look like:
<p><span id="word1">This</span> <span id="word2">is</span> <span id="word3">a</span> <span id="word4">sample</span>.</p>
You'll also need the audio file in the epub, of course, and then a .smil file to link the two together. The .smil file looks like:
<par id="first">
<text src="book.xhtml#word1"/>
<audio src="audio/audio.mp3" clipBegin="0s" clipEnd="0.65s"/>
</par>
<par id="second">
<text src="book.xhtml#word2"/>
<audio src="audio/audio.mp3" clipBegin="0.66s" clipEnd="1.4s"/>
</par>
...
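Writing one par element per word by hand gets tedious, so the entries could be generated from a list of timings. The Python sketch below is only an illustration: the function name build_pars, the par ids (par1, par2, ...) and the timing values are assumptions, while the file names and the word1, word2 fragment ids follow the example above.

```python
def build_pars(timings, text_href="book.xhtml", audio_src="audio/audio.mp3"):
    """Build one SMIL <par> per (begin, end) pair, in seconds, for word1, word2, ..."""
    lines = []
    for n, (begin, end) in enumerate(timings, start=1):
        lines.append(f'<par id="par{n}">')
        lines.append(f'<text src="{text_href}#word{n}"/>')
        lines.append(f'<audio src="{audio_src}" clipBegin="{begin}s" clipEnd="{end}s"/>')
        lines.append("</par>")
    return "\n".join(lines)

# Hypothetical timings matching the two entries shown above.
smil_body = build_pars([(0, 0.65), (0.66, 1.4)])
```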
You'll have to include the media-overlay attribute for the xhtml file in the manifest in the content.opf as well:
<manifest>
<item id="book" href="book.xhtml" media-type="application/xhtml+xml" media-overlay="smil-file"/>
<item id="smil-file" href="book_audio.smil" media-type="application/smil+xml"/>