How to extract all footnotes from a MS Word file? - ms-word

Is there a generic (Macro, XML Parser, ...) way to extract all footnotes in a MS Word file and also keep the corresponding number from the original text?

You might use Elod Pal Czirmaz's macro as a starting point. I suspect that footnotes don't carry their numbers with them. When the document is rendered, the numbers are just assigned in order.

Related

Is it possible to ignore paragraph marks when using getTextRanges() in word add in?

I am currently developing a word addin using the office js library. I need to get all sentences in the word document as individual ranges. For this I used getTextRanges() on the body of the document with "." as the delimiter. However, it also separates on paragraph mark which is not ideal for my use case. All I want is for the document to be divvied up into ranges where the only delimiter is "." - regardless of whether the ranges will then expand across paragraphs.
Is there a way to ignore paragraph marks with getTextRanges(), or is there another method entirely that I seem to have overlooked?
Thanks.
I have been unable to resolve it.

Unicode converted text isn't shown properly in MS-Word

In a mapping editor, the display is correct after the legacy to unicode conversion for DEVANAGARI text shown using a unicode font (Arial Unicode MS). However, in MS-WORD, the display isn't as expected for the same unicode text in the unicode font (Arial Unicode MS) or any other Devanagari unicode fonts. The expected sequence of unicodes are provided as per the documentation. The sequence can be seen on the left-hand side table.
Please let me know where I am going wrong.
Thanks for your help!
Does your map have to insert the zero_width_joiner? The halant (virama) by itself is enough to get the half-consonant (for some combinations) and in particular, it may be that Word is using the presence of the ZWJ to keep them separate.
If getting rid of the ZWJ doesn't help, another possibility is that Word may be treating the individual characters of the text string as individual "runs" of text.
If those first 4 characters are not in a single run, this can happen.
[aside: the way to tell if it's being treated as a single run, is to save the document as an xml file and then open it with something like notepad++ and look at the xml "w:t" element (IIRC) associated with these characters. If they're all in separate w:t elements, it means they're in separate runs. In that case, you might need to copy the text from Word to some other tool (e.g. Notepad++) and then copy it from there and paste it back in Word -- that might cause it to be imported into Word in a single run.

MS Word: Carriage Returns in numbering format

in MS Word 2010, is it possible to include a manual line breaks in the formatting for a numbered list?
What I mean is I'm creating a style that includes numbering in a list. I'd like the list to appear like this:
Section 1[MLB]
Benefits
Section 2[MLB]
Drawbacks
etc.
I'm in the Define New Number Format dialog box, trying to find a way to include a manual line break in the Number Format field. I've got the word "Section" in there, but the line break is a problem so far. I've tried ^|, which is the search-and-replace code for manual line breaks. But that includes a literal carat followed by a pipe. Is there some other way of including things like paragraph breaks or line breaks in numbered lists? Thanks everyone.

How to fill a field with spaces until a length in Notepad++

I've prepared a macro in Notepad++ to transform a ldif file in a csv file with a few fields. Everything is OK but I have a final problem: I have to have 2 fields with a specific length and in this moment I cannot ensure that length because in the source file they are not coming so
For instance, I generate this line:
12345,namenamename,123456
And I have to ensure that the 2nd and 3rd fields have 30 (filling with spaces at right side) and 9 (filling with zeros at left) characters, so in this case I should generate:
12345,namenamename ,000123456
I haven't found how Notepad++ could match a pattern in order to add spaces/zeros, so I have though in to add 1 space/zero to the proper field and repeat this step so many times as needed to ensure the lengths (this is, 29 and 8, because they cannot come empty) and search with the length in the regex (for instance: \d{1,8} for the third field)
My question is: can I repeat only one step of the macro several times (and the rest of the macro only 1 repetition)?
I've read the wiki related to this point (http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Editing_Configuration_Files#.3CMacros.3E) and I don't found anything neither
If not possible, how could be a good solution? Create another 2 different macros and after execute the main one, execute this new 2 macros several times?
Thanks in advance!
A two pass solution with Notepad++ is possible. Find a pair of characters or two short sequence of characters that never occurs in your data file. I will use =#<= and =>#= here.
First pass, generate or convert the input text into the form 12345,=#<=namenamename______________________________,000000000123456=>#=. Ie add 30 spaces after the name and nine zeroes before the number (underscores used here just to make things clearer).
Second pass, do a regular expression search for =#<=(.{30})_*,0*(\d{9})=>#= and replace with \1,\2.
I have just suggested a similar solution in special timestamp format of csv

How can I identify an OpenXml Paragraph as one I programmatically inserted?

I am programmatically adding an OpenXML paragraph to a Word Document and I need to be able to identify that paragraph as mine later on. Any ideas on how to do this? I have tried inserting an XML comment and extended attributes but when you save the document in word it removes all unknown xml. It doesn't matter if it is an attribute in the paragraph or the run, or an element before the paragraph, just some way I can identify it later on. Also, I do not want this identifier visible in the word document.
Examples of what I could use:
<paragraph id="myParagraph"></paragraph>
<otherelement>myparagraph</otherelement>
<paragraph></paragraph>
Any help would be AWESOME because my head it hurting from the brick wall I have been running into.
Thanks!
Give the paragraph a w:rsidR attribute and assign a unique value to it; if there is no value present when word saves the document it will randomly assign it's own 8-digit hexadecimal value anyway. (The value is not limited to 8 digits or hexadecimal characters. Word will not modify existing RSIDs.)
That being said -- make sure to keep RSID values unique and do NOT modify existing RSID attributes -- they are the unique ID for that paragraph, and if the document splits into multiple versions and a user tries to merge them back together those RSIDs are used to determine what paragraphs have changed.
(Also note that runs have RSIDs as well.)
If the user modifies the paragraph, the RSID of that paragraph may change.
The alternate option is to use Custom XML: http://msdn.microsoft.com/en-us/library/bb608618.aspx
Use stylename in paragraph properties.
or try this one
http://msdn.microsoft.com/en-us/library/office/hh674468.aspx
Hope this helps.