Returning Word search results using PowerShell

I would like to extract formatting information from Word documents with PowerShell. In Word you can search for text by its formatting, and Word highlights the parts that satisfy the criterion (e.g. green underlined text). With the following I can find italic text from PowerShell as well:
$objWord = New-Object -Com Word.Application
$myWordFile = 'C:\My\Word\File.docx'
$objDocument = $objWord.Documents.Open($myWordFile)
$objDocument.Paragraphs[0].Range.Find.Font.Italic = $true
$objDocument.Paragraphs[0].Range.Find.Execute()
However, I want to retrieve the italic text itself, similar to how $matches holds the result of -match.

Here is an example of what you are trying to do …
This example does a find and replace, so ignore the replace part if that is not your end goal. It finds words and then applies italics to them, but the same approach can be used to just find all italicized words.
$application = New-Object -comobject word.application
$application.visible = $true
$document = $application.documents.open("C:\fso\Test.docx")
$selection = $application.Selection
$words = "exchange","sql"
$matchCase = $false
$matchWholeWord = $true
$matchWildCards = $false
$matchSoundsLike = $false
$matchAllWordForms = $false
$forward = $true
$wrap = 1
$format = $true
$replace = 2
Foreach ($word in $words)
{
    $findText = $word
    $replaceWith = $word
    $selection.find.replacement.font.italic = $true
    $exeRTN = $selection.find.execute($findText,$matchCase,
        $matchWholeWord,$matchWildCards,$matchSoundsLike,
        $matchAllWordForms,$forward,$wrap,$format,$replaceWith,
        $replace)
}
… as documented here:
Hey, Scripting Guy! How Can I Italicize Specific Words in a Microsoft Word Document?
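To get at the matched text itself, one option is to search through a Range instead of the Selection: after a successful Execute, Word redefines the Range to the match, so its Text property holds the found run. The following is a minimal sketch under that assumption, not a tested solution; Get-ItalicText is a hypothetical helper name, and the guard against repeated matches is a precaution I added.

```powershell
# Sketch: collect every italic run in a document via Range.Find.
# Get-ItalicText is a hypothetical helper name, not part of Word or PowerShell.
function Get-ItalicText {
    param($Document)

    $range = $Document.Content       # main text story of the document
    $find  = $range.Find
    $find.ClearFormatting()
    $find.Text        = ''           # empty text: match on formatting only
    $find.Format      = $true
    $find.Forward     = $true
    $find.Wrap        = 0            # wdFindStop: stop at the end of the range
    $find.Font.Italic = $true

    $found   = @()
    $lastEnd = -1
    # After a successful Execute, $range is redefined to the match.
    while ($find.Execute()) {
        if ($range.End -le $lastEnd) { break }   # guard against re-matching the same run
        $found  += $range.Text
        $lastEnd = $range.End
    }
    return $found
}

# Usage (requires Word; path taken from the question):
# $objWord     = New-Object -ComObject Word.Application
# $objDocument = $objWord.Documents.Open('C:\My\Word\File.docx')
# Get-ItalicText $objDocument
# $objWord.Quit()
```

The returned strings play roughly the role that $matches plays for -match, one entry per italic run.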

Related

Replacing multiple strings in a word doc in PowerShell

I'm trying to replace multiple strings in a word document using PowerShell, but only one string is replaced when running the code below:
#Includes
Add-Type -AssemblyName System.Windows.Forms
#Functions
#Function to find and replace in a word document
function FindAndReplace($objSelection, $findText,$replaceWith){
$matchCase = $true
$matchWholeWord = $true
$matchWildcards = $false
$matchSoundsLike = $false
$matchAllWordForms = $false
$forward = $true
$wrap = [Microsoft.Office.Interop.Word.WdFindWrap]::wdReplaceAll
$format = $false
$replace = [Microsoft.Office.Interop.Word.WdFindWrap]::wdFindContinue
$objSelection.Find.Execute($findText,$matchCase,$matchWholeWord,$matchWildcards,$matchSoundsLike,$matchAllWordForms,$forward,$wrap,$format,$replaceWith, $replace) > $null
}
$item1 = "Should"
$item2 = "this"
$item3 = "work"
$item4 = "?"
$fileName = "NewFile"
#Opens a file browser to select a Word document
$FileBrowser = New-Object System.Windows.Forms.OpenFileDialog -Property @{
InitialDirectory = [Environment]::GetFolderPath('Desktop')
Filter = 'Documents (*.docx)|*.docx'
}
Write-Host "Select word template file"
$FileBrowser.ShowDialog()
$templateFile = $FileBrowser.FileName
$word = New-Object -comobject Word.Application
$word.Visible = $false
$template = $word.Documents.Open($templateFile)
$selection = $template.ActiveWindow.Selection
FindAndReplace $selection '#ITEM1#' $item1
FindAndReplace $selection '#ITEM2#' $item2
FindAndReplace $selection '#ITEM3#' $item3
FindAndReplace $selection '#ITEM4#' $item4
$fileName = $fileName
$template.SaveAs($fileName)
$word.Quit()
If I comment out individual FindAndReplace calls, whichever call runs first works, but subsequent calls do not.
For example, running this as is results in:
Input    Output
#ITEM1#  Should
#ITEM2#  #ITEM2#
#ITEM3#  #ITEM3#
#ITEM4#  #ITEM4#
I'm not sure what I'm missing; any help would be appreciated.
As was suggested, it appears that the cursor was not returning to the beginning of the document. I added the following code:
Set-Variable -Name wdGoToLine -Value 3 -Option Constant
Set-Variable -Name wdGoToAbsolute -Value 1 -Option Constant
to the beginning of my script, and:
$objSelection.GoTo($wdGoToLine, $wdGoToAbsolute, 1) > $null
as the first line in my FindAndReplace function, and now it works as expected.
There may be a more elegant solution, but this works for me.
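One possibly more elegant variant avoids the cursor altogether by running the replacement on a Range such as the document's Content, so nothing depends on where the Selection currently sits. It may also be worth noting that wdReplaceAll belongs to the WdReplace enumeration (value 2) and wdFindContinue to WdFindWrap (value 1), which suggests the two were swapped between $wrap and $replace in the question. The sketch below assumes those numeric values; Invoke-ReplaceAll is a hypothetical helper name.

```powershell
# Sketch: replace every occurrence of a placeholder in a Word Range.
# Invoke-ReplaceAll is a hypothetical helper name.
function Invoke-ReplaceAll {
    param($Range, [string]$FindText, [string]$ReplaceWith)

    $wdFindContinue = 1   # WdFindWrap: continue the search past the end
    $wdReplaceAll   = 2   # WdReplace: replace all occurrences

    # Argument order: FindText, MatchCase, MatchWholeWord, MatchWildcards,
    # MatchSoundsLike, MatchAllWordForms, Forward, Wrap, Format, ReplaceWith, Replace
    $Range.Find.Execute($FindText, $true, $true, $false, $false, $false,
        $true, $wdFindContinue, $false, $ReplaceWith, $wdReplaceAll)
}

# Usage with the variables from the question (requires Word):
# Invoke-ReplaceAll $template.Content '#ITEM1#' $item1 > $null
# Invoke-ReplaceAll $template.Content '#ITEM2#' $item2 > $null
```

The function returns whatever Execute returns, so callers can check whether anything was found or discard the value as the question does.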

Find and replace in Text Box in Header of Word Doc using Powershell

I am trying to use PowerShell to find and replace some text within a text box in the header of a Word document (.docx). I got it working for text outside the header, but not for text within it. I think it is failing because I am not accessing the contents of the text box correctly, so I added the Write-Host line (just before the save and quit) to see what the text was, but it printed blank for each of the three items in my header. This is my first time using PowerShell, and I think I have perhaps spent more time learning and writing this than I will save by using it...
The relevant snippet of the script is below:
$word = New-Object -COM "Word.Application";
$word.Visible = $false;
$doc = $word.Documents.Open($FullPath);
$selection = $word.Selection;
$section = $doc.sections.item(1);
$header = $section.headers.Item(3);
$FindText = "Cnnn";
$MatchCase = $False;
$MatchWholeWord = $False;
$MatchWildcards = $False;
$MatchSoundsLike = $False;
$MatchAllWordForms = $False;
$Forward = $True;
$wdFindContinue = 1;
$Wrap = $wdFindContinue;
$Format = $False;
$wdReplaceNone = 0;
$ReplaceAll = 2;
$ReplaceWith = "C" + $newString;
$a = $header.Find.Execute($FindText,$MatchCase,$MatchWholeWord, `
$MatchWildcards,$MatchSoundsLike,$MatchAllWordForms,$Forward,`
$Wrap,$Format,$ReplaceWith, $ReplaceAll);
Write-Host ("Header is: " + $header.Text);
$doc.Save();
$word.Quit();
You need to apply your search to the .TextFrame.TextRange.Find object of the text box, or of any shape containing text.
You could try something like this ($header is a HeaderFooter object, so its shapes are reached through $header.Range.ShapeRange):
If ($header.Range.ShapeRange.Count) {
    ForEach ($shp in $header.Range.ShapeRange) {
        If ($shp.TextFrame.HasText) {
            $obj = $shp.TextFrame.TextRange.Find
            $a = $obj.Execute($FindText,$MatchCase,$MatchWholeWord,`
                $MatchWildcards,$MatchSoundsLike,$MatchAllWordForms,$Forward,`
                $Wrap,$Format,$ReplaceWith,$ReplaceAll)
        }
    }
}

Programmatically remove all hidden text in a Word document

Using PowerShell, I need to write a script that removes all hidden text from a Word document.
Here is what I have so far:
$WordDocument = Get-Item "C:\MyWordDocument.docx"
$word_app = New-Object -ComObject Word.Application
$word_app.Visible = $false
$document = $word_app.Documents.Open($WordDocument.FullName)
$objSelection = $word_app.Selection
$objSelection.Font.Hidden = $True
$FindText = "" # search on formatting only (according to MS doc)
$wdFindContinue = 1
$ReplaceAll = 2
$MatchCase = $False
$MatchWholeWord = $False
$MatchWildcards = $False
$MatchSoundsLike = $False
$MatchAllWordForms = $False
$Forward = $True
$Wrap = $wdFindContinue
$Format = $True # ?
$ReplaceWith = ""
$a = $objSelection.Find.Execute($FindText,$MatchCase,$MatchWholeWord, `
$MatchWildcards,$MatchSoundsLike,$MatchAllWordForms,$Forward,`
$Wrap,$Format,$ReplaceWith,$ReplaceAll)
$document.Save()
$document.Close()
$word_app.Quit()
It does not work, and I cannot figure out why.
Any ideas?
The mistake is where you set the search filter to find hidden text. Instead of $objSelection.Font.Hidden = $True (this actually hides the currently selected text) you need to set the property on the $objSelection.Find object:
$objSelection.Find.Font.Hidden = $True

Powershell - Trim word document

I have a Word document that I read with the following code:
$objWord = New-Object -Com Word.Application
$objWord.Visible = $false
$objDocument = $objWord.Documents.Open($Formularpath, $false, $true)
$documenttext = $objDocument.wordopenxml
Now I have my document, but there is a lot of text I don't need. How can I cut the text, e.g. everything up to one specific word? I know about Split(';'), but I need to split on a whole word...
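One way to approach this is with a regular expression that matches the word on word boundaries and then keeps everything from that match onward. The sketch below works on any string; Get-TextFromWord is a hypothetical helper name, and it assumes the input is plain text (for example $objDocument.Content.Text) rather than the XML returned by WordOpenXML.

```powershell
# Sketch: drop everything before the first whole-word occurrence of $Word.
# Get-TextFromWord is a hypothetical helper name.
function Get-TextFromWord {
    param([string]$Text, [string]$Word)

    # \b anchors the match to word boundaries; Escape protects special characters.
    $pattern = '\b' + [regex]::Escape($Word) + '\b'
    $match   = [regex]::Match($Text, $pattern)

    # If the word is not present, return the text unchanged.
    if (-not $match.Success) { return $Text }

    return $Text.Substring($match.Index)
}

# Usage:
Get-TextFromWord -Text 'alpha beta; gamma delta' -Word 'gamma'   # -> 'gamma delta'
```

The match is case-sensitive as written; to also drop the word itself, the Substring call could start at $match.Index + $match.Length instead.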

Powershell word header replace

I want to replace the header in a Word document.
$pth = "d:\test\test.docx"
$objWord = New-Object -ComObject word.application
$objWord.Visible = $True
$objDoc = $objWord.Documents.Open($pth)
$objSelection = $objWord.Selection
$Section = $objDoc.Sections.Item(1)
$header = $Section.Headers.Item(1)
This returns plain text:
Write-Host $header.Range.Text
But my header has an image and a table. Can I replace a string in the header without destroying the header? Replacing strings in the body of the Word document works great; my only problem is the header.
A link to a screenshot of the example Word document header is below.
http://zapodaj.net/223c522426648.png.html
Try this:
$replaceWith = "New Text !"
$replace = [Microsoft.Office.Interop.Word.WdReplace]::wdReplaceAll
$findWrap = [Microsoft.Office.Interop.Word.WdFindWrap]::wdFindContinue
$find = $header.Range.find
$find.Execute($header.Range.Text,
$false, #match case
$false, #match whole word
$false, #match wildcards
$false, #match soundslike
$false, #match all word forms
$true, #forward
$findWrap,
$null, #format
$replaceWith,
$replace)
The pictures and tables should remain untouched.
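If only one string inside the header needs to change, the same Find.Execute call can be given that string instead of the whole header text, and it can be applied to each header type in each section. The following is a sketch under the assumption that the WdHeaderFooterIndex values are 1 (primary), 2 (first page) and 3 (even pages); Update-HeaderText is a hypothetical helper name.

```powershell
# Sketch: replace a string in every existing header of every section.
# Update-HeaderText is a hypothetical helper name.
function Update-HeaderText {
    param($Document, [string]$FindText, [string]$ReplaceWith)

    $wdFindContinue = 1   # WdFindWrap
    $wdReplaceAll   = 2   # WdReplace

    foreach ($section in $Document.Sections) {
        # WdHeaderFooterIndex: 1 = primary, 2 = first page, 3 = even pages
        foreach ($index in 1..3) {
            $header = $section.Headers.Item($index)
            if (-not $header.Exists) { continue }
            # Only the matched text is replaced; other header content is not addressed.
            $null = $header.Range.Find.Execute($FindText, $false, $false, $false,
                $false, $false, $true, $wdFindContinue, $false, $ReplaceWith, $wdReplaceAll)
        }
    }
}

# Usage with the variable from the question (requires Word):
# Update-HeaderText $objDoc 'Old text' 'New Text !'
```

Text inside a text box in the header is not part of the header's Range story, so a shape-based search would still be needed for that case.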
I don't know a pure PowerShell solution, but I use a VBA macro run from PowerShell.
(The code has been changed, so let me know if it doesn't work.)
$objWord = New-Object -ComObject word.application
$objWord.Visible = $True # don't have to be true
$pathToFile = "d:\Delivery_Templates\filename.docx" #path to your file
$objDoc = $objWord.Documents.Open($pathToFile)
$objSelection = $objWord.Selection
$objWord.Run('myReplace', [ref] $currentVersion); # myReplace - macro name, currentVersion - macro parameter