Manipulate status of links in Word document with OpenXML SDK - ms-word

I have a Word document with some links to cells in Excel files. In Word, a context menu leads to a dialog listing all the links in the document, where I can view and change their properties.
Among other things, there is a section "Update method for selected link" (the wording may differ; I translated it from the German version) with two radio buttons, "Automatic" and "Manual", and a checkbox "Locked".
I want to modify these properties (especially the Locked checkbox) with OpenXML, but I could not find where in the model this information is stored. I printed the OuterXML for a link with Locked checked and for one with Locked unchecked, but found no difference in the field parameters (\a \f 5 \h \* MERGEFORMAT for both).
Does anyone know how I can modify this with the OpenXML SDK?
Thanks in advance,
Frank

Word has different ways to represent the LINK in Office Open XML depending partly on the format of the link (e.g. whether you Paste Link to an object or to plain text).
For example, if you paste a link to a "Microsoft Excel Worksheet Object", although Word displays a LINK field in the document, the XML does not actually record the field code using either the simple or more complex encoding for field codes. It actually encodes the object in a <w:object> element that records information about a "shape", with the shape type in <v:shapetype>, the shape itself in <v:shape>, and information about the OLE link in <o:OLEObject>
In that case, Automatic link updating is recorded using
<o:OLEObject UpdateMode='Always'> for automatic links
and
<o:OLEObject UpdateMode='OnCall'> for manual links.
Whether or not the link is Locked is recorded in
<o:OLEObject><o:LockedField></o:LockedField></o:OLEObject>
(either as "false" or "" AFAICS).
Word reconstructs the LINK field code from the w:object information when it displays the document.
However, if you paste the link as text, the XML Word records will contain a complex field code construction, starting with a <w:fldChar w:fldCharType='begin' /> element.
In that case, a locked link is indicated by a value of '1' in the w:fldLock attribute, and probably by the absence of that attribute if it is not locked, e.g.
<w:fldChar w:fldCharType='begin' w:fldLock='1' />
In either case, an automatic link is indicated by the presence of the \a switch in the field code (reconstructed in the case of the first example). If there is no \a switch, it's not an automatic link.
That may not cover all the possible cases but should give you some clues about where to look in the XML.
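As a rough illustration of where those attributes live, here is a minimal Python sketch that edits the XML of the main document part directly instead of going through the OpenXML SDK. The function names are made up for this example, and it touches every complex field and every OLE object it finds, not only LINK fields, so a real tool would need to filter on the field code first. A real document.xml also declares many more namespaces, and round-tripping it through ElementTree may rename prefixes, so treat this as a way to locate the attributes rather than as production code.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
O = "urn:schemas-microsoft-com:office:office"
ET.register_namespace("w", W)
ET.register_namespace("o", O)


def set_field_lock(document_xml: str, locked: bool) -> str:
    """Set or clear w:fldLock on every 'begin' fldChar (links pasted as text)."""
    root = ET.fromstring(document_xml)
    lock_attr = f"{{{W}}}fldLock"
    for fld_char in root.iter(f"{{{W}}}fldChar"):
        if fld_char.get(f"{{{W}}}fldCharType") != "begin":
            continue
        if locked:
            fld_char.set(lock_attr, "1")
        else:
            # Assumption from the answer: unlocked means the attribute is absent.
            fld_char.attrib.pop(lock_attr, None)
    return ET.tostring(root, encoding="unicode")


def set_update_mode(document_xml: str, automatic: bool) -> str:
    """Set UpdateMode on every o:OLEObject (links pasted as objects)."""
    root = ET.fromstring(document_xml)
    for ole in root.iter(f"{{{O}}}OLEObject"):
        ole.set("UpdateMode", "Always" if automatic else "OnCall")
    return ET.tostring(root, encoding="unicode")
```

To use it on a real file you would read word/document.xml out of the .docx (a zip archive), pass the text through one of these functions, and write the part back.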

Related

pandoc markdown to docx - keep list on one page

I have a markdown list like so:
* Question A
- Answer 1
- Answer 2
- Answer 3
I need to ensure that all the answers (1 - 3) appear on the same page as Question A when I convert the markdown document to docx using pandoc. How can I do this?
Use custom styles in your Markdown and then define those styles in a custom docx template.
It's important to note that Pandoc's documentation states (emphasis added):
Because pandoc’s intermediate representation of a document is less
expressive than many of the formats it converts between, one should
not expect perfect conversions between every format and every other.
Pandoc attempts to preserve the structural elements of a document, but
not formatting details...
Of course, Markdown has no concept of "pages" or "page breaks," so that is not something Pandoc can handle by default. However, Pandoc is aware of docx styles. As the documentation explains:
By default, pandoc’s docx output applies a predefined set of styles
for blocks such as paragraphs and block quotes, and uses largely
default formatting (italics, bold) for inlines. This will work for
most purposes, especially alongside a reference.docx file. However, if
you need to apply your own styles to blocks, or match a preexisting
set of styles, pandoc allows you to define custom styles for blocks
and text using divs and spans, respectively.
If you define a div or span with the attribute custom-style, pandoc
will apply your specified style to the contained elements. So, for
example using the bracketed_spans syntax,
[Get out]{custom-style="Emphatically"}, he said.
would produce a docx file with “Get out” styled with character style
Emphatically. Similarly, using the fenced_divs syntax,
Dickinson starts the poem simply:
::: {custom-style="Poetry"}
| A Bird came down the Walk---
| He did not know I saw---
:::
would style the two contained lines with the Poetry paragraph style.
If the styles are not yet in your reference.docx, they will be defined
in the output file as inheriting from normal text. If they are already
defined, pandoc will not alter the definition.
If you don't want to define the style manually, but would like it applied to every list automatically (or perhaps to every list which follows a specific pattern), you could define a custom filter which applied the style(s) to every matching element in the document.
Of course, that only adds the style names to the output. You still need to define the styles (tell Word how to display elements assigned those styles). As the documentation for the --reference-doc option explains :
For best results, the reference docx should be a modified version of a
docx file produced using pandoc. The contents of the reference docx
are ignored, but its stylesheets and document properties (including
margins, page size, header, and footer) are used in the new docx. If
no reference docx is specified on the command line, pandoc will look
for a file reference.docx in the user data directory (see --data-dir).
If this is not found either, sensible defaults will be used.
To produce a custom reference.docx, first get a copy of the default
reference.docx: pandoc --print-default-data-file reference.docx >
custom-reference.docx. Then open custom-reference.docx in Word, modify
the styles as you wish, and save the file.
Of course, when modifying the custom-reference.docx in Word, you can add the new custom styles you have used in your Markdown. As @CindyMeister points out in a comment:
Word would handle this using styles, where the Question style would
have the paragraph setting "Keep with Next". the Answer style would
have this as well. A third style, for the last entry, would NOT have
the setting activated. In addition, all three styles would have the
paragraph setting "Keep together" activated.
Finally, when using pandoc to convert your Markdown to a Word docx file, use the option --reference-doc=custom-reference.docx and your custom style definitions will be included in the generated docx file. As long as you also properly identify which elements in the Markdown document get which styles, you should have a list which doesn't get broken across a page break, as long as the entire list fits on one page.
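If the questions and answers come from data rather than hand-written Markdown, the fenced divs can be generated. The following is only a sketch: the style names Question, Answer and LastAnswer are hypothetical and must match whatever you define in custom-reference.docx, and each item is emitted as a plain paragraph inside its own div rather than as a list item, which sidesteps the question of how styles apply inside lists.

```python
def styled_block(text: str, style: str) -> str:
    """Wrap one paragraph in a pandoc fenced div carrying a custom-style."""
    return f'::: {{custom-style="{style}"}}\n{text}\n:::\n'


def question_to_markdown(question: str, answers: list[str]) -> str:
    """Emit a question and its answers, giving the last answer its own style.

    'Question' and 'Answer' are meant to have "Keep with next" switched on
    in the reference docx; 'LastAnswer' is meant to have it switched off.
    """
    blocks = [styled_block(question, "Question")]
    for index, answer in enumerate(answers):
        is_last = index == len(answers) - 1
        blocks.append(styled_block(answer, "LastAnswer" if is_last else "Answer"))
    return "\n".join(blocks)


if __name__ == "__main__":
    print(question_to_markdown("Question A", ["Answer 1", "Answer 2", "Answer 3"]))
```

The printed Markdown can then be converted with pandoc using the --reference-doc option described above.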

Word/Publisher email merge issues

I've been searching for days for an answer to this issue. I'm trying to append an Access field to a base URL to customize each email in my merge like so: http://www.example.com/myItems.asp?ItemID={field}.
I tried several approaches in Word 2007, then gave up and finally tried Publisher after coming across this post - MS Word: Mailmerge hyperlinks with query get URL string with a MERGEFIELD.
In Publisher, I got everything to merge properly including the custom links (according to preview), but when I hit "send email" it wasn't passing the emails to Outlook - said 0 message(s) sent. I tried again, using a blank email template and got it to pass the email, but the email showed field names rather than the merged data.
Coming across this article regarding the field names - http://msgroups.net/microsoft.public.publisher/emailmerge-not-working-in-publishe/213664 - I clicked outside the text box as suggested before sending email but still, the field names show and not the merged data.
I'm super frustrated and exhausted. This shouldn't be this difficult! Any ideas or suggestions would be appreciated.
This shouldn't be this difficult!
I agree. I can't help on the Publisher front, but this link should help for Windows Word.
To summarise, when you insert the HYPERLINK field, do it this way:
Use ctrl-F9 to insert a field code brace pair { }
Type HYPERLINK between the braces
Select the field and update it once (F9)
Do not update this field code again. If you do, Word will always insert the same link text (i.e. the hyperlink target). People working with fields often press F9 quite a lot just to make sure things are up to date, so you have to try not to do that.
If you press Alt-F9, you should see that the display text is an error message (starting with "E" in the English language version of Word).
Move the insertion point so it is immediately after the E. Type the display text that you want (or, if you want a variable display text built from text + MERGE fields etc., enter that text and those codes).
Carefully remove the "E" and the other part of the error text.
Use Alt-F9 again to display the HYPERLINK field code. Click after the K, type a space, then enter the following fields and text, assuming your variable text is coming from a MERGE field called fieldname:
"{ SET X 1 }http://www.example.com/myitems.asp?ItemID={ MERGEFIELD fieldname }"
(The SET field is there to stop Word doing something else wrong. If you have more than one HYPERLINK field, you will need to SET a different variable name (X1, X2 etc.) in each HYPERLINK). This is discussed in more detail here - interestingly enough, that question was also about merge to HTML email, but I think you also have to do the additional stuff I mention above to make it all work.

How can I use the DocX library to change the font globally, remove superfluous spaces, and remove or add extra line breaks?

Using the DocX library [https://docx.codeplex.com/], I want to convert a .docx document to use a different font. Does anybody know how to do that? The sample projects are very sparse, and the documentation is nonexistent.
I also find that there are often extraneous spaces in documents, and I want to iterate over all of them until there are never two contiguous spaces. I can do this in a loop, I guess, replacing "  " (2 spaces) with " " (1 space) until "  " (2 spaces) is no longer found.
However, I also want to remove superfluous line breaks that sometimes occur when copying-and-pasting text into a document. I can do it "manually" (in Libre Office, not sure how it's done in MS Word), as I got an answer to this question:
(select "Regular Expressions" and then replace "$" (without the quotes) with a space)
...but how programmatically, with DocX?
Additionally, in some cases I want to ADD line breaks/"paragraph returns" where there are legitimate line breaks between the end of one paragraph and the start of another, but no extra line to separate them visually. According to this:
...I can add a paragraph/line break to a legitimate line break by searching for "$" and replacing that with "\n\n"
This does work, too (manually, in Libre Office); but again...how to do this with the DocX library?
It appears that not all of this is possible with the current version of the DocX library you are using. If it is not exposed in documentation, the functions might as well not exist, and you should not be using undocumented features.
There is a much more mature library available, however, called the "Open XML SDK", that can do everything you need.
The correct way to change a font, regardless of whether you are doing it in the document editor or writing a program to manipulate these files, is to change the style applied to the text in question, or to change the definition of the style in use.
You should never, ever, ever, ever directly change the font of any text. Personally, I think that the 'font type' and 'font size' menus should be removed entirely from word/libreoffice/etc, and only be accessible inside a 'change style properties' dialog; the only reason to directly apply a font is if you are actually providing an example of particular typeface under discussion!
See How to: Replace the styles parts in a word processing document (Open XML SDK) from the MSDN documentation for a description of the way that works.
To search and replace text, the applicable MSDN page is How to: Search and replace text in a document part (Open XML SDK). For specifically replacing multiple spaces with a single space, there are numerous results on Google that should all work to at least some degree.
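The text clean-up itself is independent of the library. Here is a small Python sketch of the two operations from the question, working on plain text that has already been extracted; the function names are made up for this example. It does not read or write .docx files, and in a real document a run of spaces or a line break may be split across several runs or paragraphs, so this only shows the pattern matching.

```python
import re


def collapse_spaces(text: str) -> str:
    """Replace any run of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", text)


def join_wrapped_lines(text: str) -> str:
    """Turn single line breaks into spaces, keeping blank-line paragraph gaps.

    A newline is replaced only if it is neither preceded nor followed by
    another newline, so "\n\n" between paragraphs is left untouched.
    """
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


if __name__ == "__main__":
    pasted = "First  line\nwrapped    here.\n\nSecond paragraph."
    print(join_wrapped_lines(collapse_spaces(pasted)))
```

A single regular expression replaces the "loop until no double spaces remain" approach, because {2,} matches a run of any length in one pass.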

How can I identify an OpenXml Paragraph as one I programmatically inserted?

I am programmatically adding an OpenXML paragraph to a Word Document and I need to be able to identify that paragraph as mine later on. Any ideas on how to do this? I have tried inserting an XML comment and extended attributes but when you save the document in word it removes all unknown xml. It doesn't matter if it is an attribute in the paragraph or the run, or an element before the paragraph, just some way I can identify it later on. Also, I do not want this identifier visible in the word document.
Examples of what I could use:
<paragraph id="myParagraph"></paragraph>
<otherelement>myparagraph</otherelement>
<paragraph></paragraph>
Any help would be AWESOME because my head is hurting from the brick wall I have been running into.
Thanks!
Give the paragraph a w:rsidR attribute and assign a unique value to it; if there is no value present when Word saves the document, it will randomly assign its own 8-digit hexadecimal value anyway. (The value is not limited to 8 digits or hexadecimal characters. Word will not modify existing RSIDs.)
That being said -- make sure to keep RSID values unique and do NOT modify existing RSID attributes -- they are the unique ID for that paragraph, and if the document splits into multiple versions and a user tries to merge them back together those RSIDs are used to determine what paragraphs have changed.
(Also note that runs have RSIDs as well.)
If the user modifies the paragraph, the RSID of that paragraph may change.
The alternate option is to use Custom XML: http://msdn.microsoft.com/en-us/library/bb608618.aspx
Use a style name in the paragraph properties,
or try this one:
http://msdn.microsoft.com/en-us/library/office/hh674468.aspx
Hope this helps.
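For the w:rsidR suggestion, finding the paragraph again later could look roughly like the sketch below. It reads the document XML directly with Python's ElementTree instead of the OpenXML SDK, and both the function name and the marker value 00C0FFEE are made up for illustration. The marker is kept to eight hexadecimal digits here as the conservative choice.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraphs_with_rsid(document_xml: str, rsid: str) -> list[str]:
    """Return the text of every w:p whose w:rsidR equals the given marker."""
    root = ET.fromstring(document_xml)
    found = []
    for paragraph in root.iter(f"{{{W}}}p"):
        # Compare case-insensitively, since hex digits may be stored in either case.
        if paragraph.get(f"{{{W}}}rsidR", "").upper() == rsid.upper():
            text = "".join(t.text or "" for t in paragraph.iter(f"{{{W}}}t"))
            found.append(text)
    return found
```

As noted above, the RSID may change once a user edits the paragraph, so a lookup like this is only reliable for paragraphs that have not been touched since they were inserted.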

Only display one paragraph of text

You can set what the Facebook Share preview says. I would like it to be the first paragraph of my movable type entry. The people who make entries sometimes use
<p>
tags or they use the rich editor which puts in two
<br /><br />
tags to separate paragraphs.
Is there a way I can have Movable Type detect where the first paragraph ends and display only that paragraph? I would like to add that to my entry template so it adds some information to my head section.
EntryBody has a lot of attributes to help format the output of the tag. You can use those to change the content so it shows up correctly in HTML, JavaScript, PHP, XML or other forms of output.
If you understand how to use regular expressions, you can use that and an additional language, say PHP, to break the body up into an array and only output the first paragraph or element of the array.
The simplest thing, though, I would think, would be to do something like
<mt:EntryBody words="100">
That will cut off the entry body after the first 100 words. You could also require users to upload an excerpt with the entry and use the entry excerpt for Facebook, instead.
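For the regular-expression route, the splitting logic might look like this Python sketch (the answer mentions PHP; the idea carries over). The function name is made up, and it assumes the body either wraps paragraphs in <p> tags or separates them with two or more consecutive <br> tags, as described in the question. Regular expressions are not a general HTML parser, so unusual markup may need more care.

```python
import re


def first_paragraph(body: str) -> str:
    """Return the first paragraph of an entry body.

    Handles the two cases from the question: paragraphs wrapped in <p> tags,
    or paragraphs separated by two or more consecutive <br> tags.
    """
    body = body.strip()
    # Case 1: take the contents of the first <p>...</p> pair, if there is one.
    match = re.search(r"<p\b[^>]*>(.*?)</p>", body, flags=re.IGNORECASE | re.DOTALL)
    if match:
        return match.group(1).strip()
    # Case 2: split at the first run of two or more <br> / <br /> tags.
    parts = re.split(r"(?:<br\s*/?>\s*){2,}", body, maxsplit=1, flags=re.IGNORECASE)
    return parts[0].strip()
```

If neither pattern is found, the whole body is returned unchanged, so a word limit like the one above is still a useful fallback.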