How to read paragraph titles and paragraphs with tracked changes (insertions and deletions) in a Word document using python-docx - ms-word

I am working on a project with a legal Word document in the format below. I want to extract each paragraph title along with its paragraph and check whether that paragraph has any tracked changes such as insertions or deletions.
Heading
Paragraph 1 Text
Heading
Paragraph 2 Text
etc.,
I can extract the paragraphs individually and get the tracked changes from word/document.xml; however, python-docx also returns each heading as an ordinary paragraph.
The headings are, however, in bold text.
When paragraph 1 has tracked changes, is there a way to extract heading 1 and store it in a dictionary along with the paragraph text and the tracked changes?
I checked whether the headings use a particular style, but apparently they do not, and the code below gave wrong results.
from docx import Document
# from docx.shared import Inches
document = Document("DOC_Example.docx")
headings = []
texts = []
for paragraph in document.paragraphs:
    if paragraph.style.name == "Heading 2":
        headings.append(paragraph.text)
    elif paragraph.style.name == "Normal":
        texts.append(paragraph.text)
for h, t in zip(headings, texts):
    print(h, t)
I need to correlate each heading with its paragraph and capture both, together with the tracked changes, in the same dictionary.
I am a beginner in Python and would appreciate any input.
Thanks in Advance.
Best Regards
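
One possible approach, offered as a minimal sketch rather than a definitive answer: since the question notes that the tracked changes are visible in word/document.xml and that headings are only distinguishable by bold text, the raw XML can be parsed with the standard library instead of python-docx. The sketch assumes that a heading is a paragraph in which every non-empty run carries direct bold formatting (bold inherited from a style would not be detected), and that each body paragraph belongs to the most recent heading. The function names sections_with_changes and read_document_xml are hypothetical helpers introduced for illustration.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _tag(name):
    return f"{{{W}}}{name}"


def _text(node, name="t"):
    """Concatenate the text of all <w:t> (or <w:delText>) elements under node."""
    return "".join(t.text or "" for t in node.iter(_tag(name)))


def _is_bold(run):
    """True if the run has a <w:b> property that is not switched off."""
    b = run.find(f"{_tag('rPr')}/{_tag('b')}")
    return b is not None and b.get(_tag("val"), "true") not in ("0", "false")


def _is_heading(par):
    """Assumption: a heading is a paragraph whose non-empty runs are all bold."""
    runs = [r for r in par.iter(_tag("r")) if _text(r).strip()]
    return bool(runs) and all(_is_bold(r) for r in runs)


def sections_with_changes(document_xml):
    """Return one dict per heading with its paragraphs and tracked changes."""
    sections, current = [], None
    for par in ET.fromstring(document_xml).iter(_tag("p")):
        if _is_heading(par):
            current = {"heading": _text(par), "paragraphs": [],
                       "inserted": [], "deleted": []}
            sections.append(current)
        elif current is not None:  # body text before the first heading is skipped
            text = _text(par)  # includes inserted text, excludes deleted text
            if text:
                current["paragraphs"].append(text)
            # <w:ins> wraps inserted runs; <w:del> wraps runs holding <w:delText>
            current["inserted"] += [s for s in (_text(i) for i in par.iter(_tag("ins"))) if s]
            current["deleted"] += [s for s in (_text(d, "delText") for d in par.iter(_tag("del"))) if s]
    return sections


def read_document_xml(path):
    """Read the main document part out of a .docx file (a zip archive)."""
    with zipfile.ZipFile(path) as archive:
        return archive.read("word/document.xml")
```

With the file from the question, this could be called as sections_with_changes(read_document_xml("DOC_Example.docx")). Sections whose "inserted" and "deleted" lists are both empty have no tracked changes.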

Related

Can't see the pages in Word

I was working on a document in Word. I accidentally pressed something and can't see the content anymore; I see only one empty sheet.
I want to get the contents of the sheet back. All headings and word counts are still preserved in the document.

Replace contents between headings

I want to fetch data from a Word document heading by heading via docx4j and replace the content under those headings. I do not want to change the heading text; I just want to remove everything under a heading and replace it with something else. I have searched for four days without finding a working way to do it. I could fetch the text line by line manually, but that would not be efficient. Is there a way to get the data for a heading directly, remove every child under that heading, and append another child under the same heading?
In the Word document (shown in the attached image), I want to remove everything under the Heading1 "2. Functional Requirements" without changing the heading or its hierarchy. That is, I want to remove 2.1.1 and its sub-data, 2.1, BR_1, etc., and append new data in the same way as children of "2. Functional Requirements" (style Heading1).
Please help; any help is highly appreciated.
Thank you, have a great day.
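
The idea in the question (delete everything between one Heading1 paragraph and the next, then append new content) can be sketched on the raw document XML. This is not docx4j code: it is a Python illustration using only the standard library, and it assumes that headings are body-level paragraphs whose paragraph style ID is "Heading1" and that the heading text excludes any automatically generated numbering such as "2.". The function name replace_section is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
P, R, T = (f"{{{W}}}{n}" for n in ("p", "r", "t"))


def _style(par):
    """Return the paragraph style ID (e.g. "Heading1"), or None if unset."""
    style = par.find(f"{{{W}}}pPr/{{{W}}}pStyle")
    return None if style is None else style.get(f"{{{W}}}val")


def _text(par):
    return "".join(t.text or "" for t in par.iter(T))


def replace_section(body, heading_text, new_lines, heading_style="Heading1"):
    """Replace everything under the matching heading with plain paragraphs.

    body is the <w:body> element. Returns False if the heading is not found.
    """
    children = list(body)

    def is_heading(child):
        return child.tag == P and _style(child) == heading_style

    start = next((i for i, c in enumerate(children)
                  if is_heading(c) and _text(c) == heading_text), None)
    if start is None:
        return False
    # The section ends at the next heading of the same level, at the final
    # <w:sectPr>, or at the end of the body, whichever comes first.
    end = next((i for i in range(start + 1, len(children))
                if is_heading(children[i]) or children[i].tag == f"{{{W}}}sectPr"),
               len(children))
    for child in children[start + 1:end]:
        body.remove(child)  # removes sub-headings, paragraphs and tables alike
    for offset, line in enumerate(new_lines, start=1):
        par = ET.Element(P)
        ET.SubElement(ET.SubElement(par, R), T).text = line
        body.insert(start + offset, par)
    return True
```

The same traversal should carry over to docx4j's body content list, but that mapping is left as an assumption here.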

How to start a Paragraph always in a new page?

Let's say I have 3 paragraphs on one page. If I add a few new sentences to the first paragraph, the two paragraphs below it move downwards continuously.
However, I want each of these 3 paragraphs to start on a new page: the first paragraph on page #1, the second on page #2, and the third on page #3. If I then add new lines to the first paragraph, the other two paragraphs should stay on their pages until the first paragraph becomes large enough to occupy the first 2 pages. When that happens, the second paragraph should move to page #3 and the third to page #4, i.e. the paragraphs move in discrete steps so that each always starts on a new page.
Is there any way to achieve this in LibreOffice?
Any pointers will be highly appreciated.
To make LibreOffice insert a page break before every paragraph, you can modify the paragraph style (or the paragraph properties) accordingly.
To modify your paragraphs 1-3 as described, select them, then select menu Format -> Paragraph to open the paragraph options. Select the "Text Flow" tab and under "Breaks", select "Insert", Type "Page", Position "Before".
To modify the default paragraph style accordingly, hit F11 to open the styles list, right-click on "Default style" and select "Modify". Again, a paragraph properties window will appear, but the modifications will affect every paragraph in your document, so this approach may lead to strange results.
The best way would be to define a custom paragraph style with the Page Break set as described, and assign that style to your paragraphs 1-3.
This is called a Page Break and the easiest way to insert it is CTRL+Enter.

Copying PowerPoints Notes Pane to Word Gives Inconsistent Font Sizes

I have a Word VSTO add-in which copies the Notes page for each slide in a PowerPoint file into a Word document. There are only certain lines in the Notes pane that I need. My code loops through each slide, looks for any notes, then checks for the tags and copies the text between the start and end tags. It is important to maintain the formatting in the Notes page. However, when pasting into Word, the font copies over, but the size changes. For instance, the font in PowerPoint might be Times New Roman 12, but in Word it will sometimes randomly change to Times New Roman 14. It appears that even when pasting a single range of text, the font size may change between paragraphs. Also, certain blocks of text in Word will have the Normal style, but others will have Normal (Web), which affects line spacing.
I tried using a method to retain the original source formatting (commented out below), but that sometimes creates bullets before the text, besides changing the font size.
Does anyone have an idea how to resolve this?
The abbreviated example code is below.
Dim notesRng, fndRng, endRng, copyRng As PowerPoint.TextRange
Dim oSlide As PowerPoint.Slide
Dim iStart, charLen As Integer
' Slides is a 1-based collection
For i As Integer = 1 To pwrPointApp.ActivePresentation.Slides.Count
    oSlide = pwrPointApp.ActivePresentation.Slides(i)
    If oSlide.NotesPage.Shapes.Placeholders(2).TextFrame.TextRange.Text.Length > 0 Then
        notesRng = oSlide.NotesPage.Shapes.Placeholders(2).TextFrame.TextRange
        fndRng = notesRng.Find(FindWhat:="START:")
        If Not fndRng Is Nothing Then
            iStart = notesRng.Text.IndexOf("START:") + 6 ' 0-based index just after the colon
            endRng = notesRng.Find(FindWhat:="END:", After:=iStart)
            If Not endRng Is Nothing Then
                charLen = notesRng.Text.IndexOf("END:", iStart) - iStart
                ' Characters() counts from 1, so shift the 0-based index by one
                copyRng = oSlide.NotesPage.Shapes.Placeholders(2).TextFrame.TextRange.Characters(iStart + 1, charLen)
                If Not copyRng Is Nothing Then
                    copyRng.Copy()
                    ThisAddIn.WordApp.Selection.Paste()
                    'ThisAddIn.WordApp.Selection.PasteAndFormat(Word.WdRecoveryType.wdFormatOriginalFormatting)
                    ThisAddIn.WordApp.Selection.Collapse()
                End If
            End If
        End If
    End If
Next i

Applescript for inserting hyperlink into MSWord comment

I'm trying to add a hyperlink object inside a Word comment.
To create a new comment in the active document I'm using this piece of script:
tell application "Microsoft Word"
    set tempString to "lorem ipsum"
    make new Word comment at selection with properties {comment text:tempString}
end tell
but now I'm not able to get a reference to the newly created comment in order to use it with the command "make new hyperlink object".
Thanks for any suggestions.
Riccardo
I don't think you can work with the object returned by make new Word comment (at least not in this case), so you have to insert a unique, findable string and then iterate through the comments:
tell application "Microsoft Word"
    -- insert a unique string (comment creation taken from the question's script)
    set tempString to (ASCII character 127)
    make new Word comment at selection with properties {comment text:tempString}
    set theComments to the Word comments of the active document
    repeat with theComment in theComments
        if the content of the comment text of theComment = tempString then
            set theRange to the comment text of theComment
            -- you do not have to "set theHyperlink". "make new" is enough
            set theHyperlink to make new hyperlink object at theRange with properties {text range:theRange, hyperlink address:"http://www.google.com", text to display:"HERE", screen tip:"click to search Google"}
            insert text "You can search the web " at theRange
            exit repeat
        end if
    end repeat
end tell
(Edited to insert some text before the hyperlink. If you want to insert text after the hyperlink, you can also use 'insert text "the text" at end of theRange'.)
So for adding text, it was enough to use "the obvious" after all.
[
For anyone else finding this Answer. The basic problem with working with Word ranges in Applescript is that every attempt to redefine a range in the Comments story results in a range that is in the main document story. OK, I may not have tried every possible method, but e.g., collapsing the range, moving the start of range and so on cause that problem. In the past, I have noticed that with other story ranges as well, but have not investigated as far as this.
Also, I suspect that the reason why you cannot set a range to the Word comment that you just created is that the properties of the comment specify a range object of some kind, which I think is a temporary object that may be destroyed immediately after creation. So trying to reference the object that you just created just doesn't work.
This part of the Answer is modified...
Finally, the only other way I found to populate a comment with "rich content" was to insert the content in a document at a known place, then copy its formatted text to the comment. For example, if the "known place" is the selection, you can set the content of theComment via
set the formatted text of the comment text of theComment to the formatted text of the text object of the selection
If you are using a version of Word that supports VBA as well as Applescript, I don't really see any technical reason why you shouldn't invoke VBA to do some of these trickier things, even if you need the main code to be Applescript.
]
Finally I got a solution here:
https://discussions.apple.com/message/24628799#24628799
which allowed me to attach the hyperlink to part of the comment text with the following lines, in case somebody searches for the same thing in the future:
tell application "Microsoft Word"
    set wc to make new Word comment at end of document 1 with properties {comment text:"some text"}
    set ct to comment text of wc
    set lastChar to last character of ct
    make new hyperlink object at end of document 1 with properties {hyperlink address:"http://www.example.com", text object:lastChar}
end tell