Document manipulation: powershell

I am attempting to take a large document, search for a "^m" (page break) and create a new text file for each page break I find.
Using:
$SearchText = "^m"
$word = new-object -ComObject "word.application"
$path = "C:\Users\me\Documents\Test.doc"
$doc = $word.documents.open("$path")
$doc.content.find.execute("$SearchText")
I am able to find text, but how do I save the text before the page break into a new file? In VBScript, I would just do a readline and save it to a buffer, but powershell is much different.
EDIT:
$text = $word.Selection.MoveUntil (cset:="^m")
returns an error:
Missing ')' in method call.

I think my solution is kinda stupid, but here is my own solution (please help me find a better one):
Param(
[string]$file
)
#$file = "C:\scripts\docSplit\test.docx"
$word = New-Object -ComObject "word.application"
$doc=$word.documents.open($file)
$txtPageBreak = "<!--PAGE BREAK--!>"
$fileInfo = Get-ChildItem $file
$folder = $fileInfo.directoryName
$fileName = $fileInfo.name
$newFileName = $fileName.replace(".", "")
#$findtext = "^m"
#$replaceText = $txtPageBreak
function Replace-Word ([string]$Document,[string]$FindText,[string]$ReplaceText) {
#Variables used to Match And Replace
$ReplaceAll = 2
$FindContinue = 1
$MatchCase = $False
$MatchWholeWord = $True
$MatchWildcards = $False
$MatchSoundsLike = $False
$MatchAllWordForms = $False
$Forward = $True
$Wrap = $FindContinue
$Format = $False
$Selection = $Word.Selection
$Selection.Find.Execute(
$FindText,
$MatchCase,
$MatchWholeWord,
$MatchWildcards,
$MatchSoundsLike,
$MatchAllWordForms,
$Forward,
$Wrap,
$Format,
$ReplaceText,
$ReplaceAll
)
#assign at script scope so the code after the function sees the full path
$script:newFileName = "$folder\$newFileName.txt"
$Doc.saveAs([ref]"$newFileName",[ref]2)
$doc.close()
}
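#PowerShell functions take space-separated arguments; parentheses and commas would pass a single array
Replace-Word $file "^m" $txtPageBreak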
$word.quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($word)
Remove-Variable word
#begin txt file manipulation
#add end of file marker
$eof = "`n<!--END OF FILE!-->"
Add-Content $newFileName $eof
$masterTextFile = Get-Content $newFileName
$buffer = ""
$page = 1
foreach($line in $masterTextFile){
#assumes each marker sits on a line of its own
if(($line -eq $eof.Trim()) -or ($line -eq $txtPageBreak)){
#end of a page or of the file: save the buffer to a new file (page1.txt, page2.txt, ... are illustrative names)
Set-Content "$folder\page$page.txt" $buffer
$page++
$buffer = ""
}
else {
$buffer = "$buffer$line`n"
}
}
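A possibly simpler route, offered as a sketch rather than a tested solution: Word normally exposes a manual page break as the form-feed character (code 12) in a range's Text, so the text could be split in memory without the intermediate marker file. Split-OnPageBreak is a hypothetical helper name, the sample string stands in for $doc.Content.Text, and section breaks may show up as the same character.

```powershell
# Hypothetical helper: split document text at form-feed characters.
# Assumption: manual page breaks appear as [char]12 in $doc.Content.Text.
function Split-OnPageBreak {
    param([string]$Text)
    # String.Split treats the separator literally, unlike the regex-based -split
    $Text.Split([char]12) | Where-Object { $_.Trim() -ne '' }
}

# Stand-in for $doc.Content.Text so the sketch runs without Word
$sample = "Page one`fPage two`fPage three"
$pages  = @(Split-OnPageBreak $sample)

# Write each page to its own file (the file names are illustrative)
$outDir = [System.IO.Path]::GetTempPath()
$n = 1
foreach ($page in $pages) {
    Set-Content -Path (Join-Path $outDir "page$n.txt") -Value $page
    $n++
}
```

With a real document, the call would be Split-OnPageBreak $doc.Content.Text, made before closing the document.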

Related

Replacing multiple strings in a word doc in PowerShell

I'm trying to replace multiple strings in a word document using PowerShell, but only one string is replaced when running the code below:
#Includes
Add-Type -AssemblyName System.Windows.Forms
#Functions
#Function to find and replace in a word document
function FindAndReplace($objSelection, $findText,$replaceWith){
$matchCase = $true
$matchWholeWord = $true
$matchWildcards = $false
$matchSoundsLike = $false
$matchAllWordForms = $false
$forward = $true
$wrap = [Microsoft.Office.Interop.Word.WdFindWrap]::wdReplaceAll
$format = $false
$replace = [Microsoft.Office.Interop.Word.WdFindWrap]::wdFindContinue
$objSelection.Find.Execute($findText,$matchCase,$matchWholeWord,$matchWildcards,$matchSoundsLike,$matchAllWordForms,$forward,$wrap,$format,$replaceWith, $replace) > $null
}
$item1 = "Should"
$item2 = "this"
$item3 = "work"
$item4 = "?"
$fileName = "NewFile"
#Opens a file browsers to select a word document
$FileBrowser = New-Object System.Windows.Forms.OpenFileDialog -Property @{
InitialDirectory = [Environment]::GetFolderPath('Desktop')
Filter = 'Documents (*.docx)|*.docx'
}
Write-Host "Select word template file"
$FileBrowser.ShowDialog()
$templateFile = $FileBrowser.FileName
$word = New-Object -comobject Word.Application
$word.Visible = $false
$template = $word.Documents.Open($templateFile)
$selection = $template.ActiveWindow.Selection
FindAndReplace $selection '#ITEM1#' $item1
FindAndReplace $selection '#ITEM2#' $item2
FindAndReplace $selection '#ITEM3#' $item3
FindAndReplace $selection '#ITEM4#' $item4
$fileName = $fileName
$template.SaveAs($fileName)
$word.Quit()
If I comment out individual FindAndReplace calls, the first one that runs works, but subsequent calls do not.
For example running this as is results in:
Input     Output
#ITEM1#   Should
#ITEM2#   #ITEM2#
#ITEM3#   #ITEM3#
#ITEM4#   #ITEM4#
I'm not sure what I'm missing, any help would be appreciated
As was suggested it appears that the cursor was not returning to the beginning of the document. I added the following code:
Set-Variable -Name wdGoToLine -Value 3 -Option Constant
Set-Variable -Name wdGoToAbsolute -Value 1 -Option Constant
To the beginning of my script and:
$objSelection.GoTo($wdGoToLine, $wdGoToAbsolute, 1) > $null
as the first line in my FindAndReplace function, and now it works as expected.
There may be a more elegant solution, but this works for me
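Another option, sketched here under assumptions: run the replacement on a Range's Find object (for example $template.Content.Find) rather than on the Selection, so the cursor position should no longer matter. The integers are the documented values of WdFindWrap.wdFindContinue (1) and WdReplace.wdReplaceAll (2); Invoke-ReplaceAll is a hypothetical helper name.

```powershell
# Documented Word constants, passed as plain integers
$wdFindContinue = 1   # WdFindWrap.wdFindContinue
$wdReplaceAll   = 2   # WdReplace.wdReplaceAll

# Hypothetical helper: replace every match through the given Find object
function Invoke-ReplaceAll {
    param($Find, [string]$FindText, [string]$ReplaceWith)
    # Argument order: FindText, MatchCase, MatchWholeWord, MatchWildcards,
    # MatchSoundsLike, MatchAllWordForms, Forward, Wrap, Format, ReplaceWith, Replace
    $Find.Execute($FindText, $true, $true, $false, $false, $false,
                  $true, $wdFindContinue, $false, $ReplaceWith, $wdReplaceAll)
}

# Intended use with Word (not run here):
#   $null = Invoke-ReplaceAll $template.Content.Find '#ITEM1#' $item1
#   $null = Invoke-ReplaceAll $template.Content.Find '#ITEM2#' $item2
```

Passing the wrap and replace values as named integers also makes it easier to see that each one lands in the right argument slot.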

Windows Power Shell rename files

I am sort of new to scripting and here's my task:
A folder with X files. Each file contains some Word documents, Excel sheets, etc. In these files, there is a client name and I need to assign an ID number.
This change will affect all the files in this folder that contain this client's name.
How can I do this using Windows PowerShell?
$configFiles = Get-ChildItem . *.config -rec
foreach ($file in $configFiles)
{
(Get-Content $file.PSPath) |
Foreach-Object { $_ -replace " JOHN ", "123" } |
Set-Content $file.PSPath
}
Is this the right approach ?
As @lee_Daily pointed out, you would need different code to perform a find and replace in different file types. Here is an example of how you could go about doing that:
$objWord = New-Object -comobject Word.Application
$objWord.Visible = $false
foreach ( $file in (Get-ChildItem . -r ) ) {
Switch ( $file.Extension ) {
".config" {
(Get-Content $file.FullName) |
Foreach-Object { $_ -replace " JOHN ", "123" } |
Set-Content $file.FullName
}
{ $_ -in '.doc', '.docx' } {
### Replace in word document using $file.fullname as the target
}
'.xlsx' {
### Replace in spreadsheet using $file.fullname as the target
}
}
}
For the actual code to perform the find and replace, I would suggest COM objects for both.
Example of Word find and replace: https://codereview.stackexchange.com/questions/174455/powershell-script-to-find-and-replace-in-word-document-including-header-footer
Example of Excel find and replace: "Search & Replace in Excel without looping?"
I would suggest learning the ImportExcel module too; it is a great tool which I use a lot.
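The dispatch on file extension can be sketched on its own, without touching any files. In a switch, a script-block condition has to test $_ explicitly; a bare non-empty string inside braces is always true, so every such branch would run. Get-ReplaceKind and the labels it returns are hypothetical names used only for illustration.

```powershell
# Hypothetical helper: map a file extension to the kind of replacement it needs
function Get-ReplaceKind {
    param([string]$Extension)
    switch ($Extension.ToLower()) {
        '.config'                  { 'text';  break }
        { $_ -in '.doc', '.docx' } { 'word';  break }
        '.xlsx'                    { 'excel'; break }
        default                    { 'skip' }
    }
}

Get-ReplaceKind '.DOCX'   # word
Get-ReplaceKind '.png'    # skip
```

The break statements stop the switch after the first match, so a file is never handled by more than one branch.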
For Word documents, this is what I'm using. I just can't figure out how this script could also change the header and footer in a Word document:
$objWord = New-Object -comobject Word.Application
$objWord.Visible = $false
$list = Get-ChildItem "C:\Users\*.*" -Include *.doc*
foreach($item in $list){
$objDoc = $objWord.Documents.Open($item.FullName,$true)
$objSelection = $objWord.Selection
$wdFindContinue = 1
$FindText = " BLAH "
$MatchCase = $False
$MatchWholeWord = $true
$MatchWildcards = $False
$MatchSoundsLike = $False
$MatchAllWordForms = $False
$Forward = $True
$Wrap = $wdFindContinue
$Format = $False
$wdReplaceNone = 0
$ReplaceWith = "help "
$wdFindContinue = 1
$ReplaceAll = 2
$a = $objSelection.Find.Execute($FindText,$MatchCase,$MatchWholeWord, `
$MatchWildcards,$MatchSoundsLike,$MatchAllWordForms,$Forward,`
$Wrap,$Format,$ReplaceWith,$ReplaceAll)
$objDoc.Save()
$objDoc.Close()
}
$objWord.Quit()
What if I try to run this in C#? Is anything else missing?
}
string rootfolder = @"C:\Temp";
string[] files = Directory.GetFiles(rootfolder, "*.*",SearchOption.AllDirectories);
foreach (string file in files)
{ try
{ string contents = File.ReadAllText(file);
contents = contents.Replace(@"Text to find", @"Replacement text");
// Make files writable
File.SetAttributes(file, FileAttributes.Normal);
File.WriteAllText(file, contents);
}
catch (Exception ex)
{ Console.WriteLine(ex.Message);
}
}

replace a string with an hyperlink in a file Word powershell

I want to replace a string with a hyperlink.
I tried something like this:
Update:
$FindText = "[E-mail]"
$email ="asdadasd@asdada.com"
$a=$objSelection.Find.Execute($FindText)
$newaddress = $objSelection.Hyperlinks.Add($objSelection.Range,$email)
but this inserts the email at the beginning of the Word file instead of replacing the string "[E-mail]".
Add-Type -AssemblyName "Microsoft.Office.Interop.Word"
$wdunits = "Microsoft.Office.Interop.Word.wdunits" -as [type]
$objWord = New-Object -ComObject Word.Application
$objWord.Visible = $false
$findText = "[E-mail]"
$emailAddress = "someemail@example.com"
$mailTo = "mailto:"+$emailAddress
$objDoc = $objWord.Documents.Open("Path\to\input.docx")
$saveAs = "Path\to\output.docx"
$range = $objDoc.Content
$null = $range.movestart($wdunits::wdword,$range.start)
$objSelection = $objWord.Selection
$matchCase = $false
$matchWholeWord = $true
$matchWildcards = $false
$matchSoundsLike = $false
$matchAllWordForms = $false
$forward = $true
$wrap = 1
$format = $False
$wdReplaceNone = 0
$wdFindContinue = 1
$wdReplaceAll = 2
$wordFound = $range.find.execute($findText,$matchCase,$matchWholeWord,$matchWildCards,$matchSoundsLike,$matchAllWordForms,$forward,$wrap)
if($wordFound)
{
if ($range.style.namelocal -eq "normal")
{
$null = $objDoc.Hyperlinks.Add($range,$mailTo,$null,$null,$emailAddress)
}
}
$objDoc.SaveAs($saveAs)
$objDoc.Close()
$objWord.Quit()
Remove-Variable -Name objWord
[gc]::Collect()
[gc]::WaitForPendingFinalizers()
Kinda ugly, but this script will do what you need. It loads the .docx opened as $objDoc, finds $findText, replaces it with a mailto link for $emailAddress, and then saves the changes to $saveAs. As written, Execute is called once, so only the first match is converted.
Most of this is based on a "Hey, Scripting Guy!" article.

Extract sections from range found in word document

The code below works fine: we get the start and end points of the section that needs to be extracted, but I'm not able to get Range.SetRange/Select to work.
I'm able to get the positions from the code below; I just need to extract the range and save it to a CSV file.
$found = $paras2.Range.SetRange($startPosition, $endPosition) - this piece doesn't work.
$file = "D:\Files\Scan.doc"
$SearchKeyword1 = 'Keyword1'
$SearchKeyword2 = 'Keyword2'
$word = New-Object -ComObject Word.Application
$word.Visible = $false
$doc = $word.Documents.Open($file,$false,$true)
$sel = $word.Selection
$paras = $doc.Paragraphs
$paras1 = $doc.Paragraphs
$paras2 = $doc.Paragraphs
foreach ($para in $paras)
{
if ($para.Range.Text -match $SearchKeyword1)
{
Write-Host $para.Range.Text
$startPosition = $para.Range.Start
}
}
foreach ($para in $paras1)
{
if ($para.Range.Text -match $SearchKeyword2)
{
Write-Host $para.Range.Text
$endPosition = $para.Range.Start
}
}
Write-Host $startPosition
Write-Host $endPosition
$found = $paras2.Range.SetRange($startPosition, $endPosition)
# cleanup com objects
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($doc) | Out-Null
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($word) | Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
This line of code is the problem
$found = $paras2.Range.SetRange($startPosition, $endPosition)
When designating a Range by its start and end positions, it's necessary to do so relative to the document. The code above refers to a Paragraphs collection. In addition, it uses SetRange, but should only use the document's Range method. So:
$found = $doc.Range($startPosition, $endPosition)
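For the remaining step of saving the extracted section to CSV, a minimal sketch follows. It assumes one row per section with the two positions and the text; ConvertTo-SectionCsv, the column names, and the output file name are hypothetical, and the sample string stands in for $found.Text so the code runs without Word.

```powershell
# Hypothetical helper: turn one extracted section into CSV lines
function ConvertTo-SectionCsv {
    param([int]$Start, [int]$End, [string]$Text)
    [pscustomobject]@{
        Start = $Start
        End   = $End
        Text  = $Text.Trim()   # drop the trailing paragraph mark
    } | ConvertTo-Csv -NoTypeInformation
}

# Stand-in values; with Word these would come from
#   $found = $doc.Range($startPosition, $endPosition)
#   ConvertTo-SectionCsv $startPosition $endPosition $found.Text
$csv = ConvertTo-SectionCsv 10 42 "Keyword1 some text`r"
$csv | Set-Content (Join-Path ([System.IO.Path]::GetTempPath()) 'section.csv')
```

The Range text has to be read before the COM objects are released, so this call belongs above the cleanup lines.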

PowerShell workaround for the blank Header/Footer bug when Find and Replace in a whole Word document

I am trying to put together a PowerShell script to do multiple find and replace throughout a whole Word Document, that is including Headers, Footers and any Shape potentially displaying text.
There are plenty of VBA examples around so it's not too difficult, but there is a known bug that is circumvented in VBA with a solution dubbed "Peter Hewett's VBA trickery". See this example and also this one.
I have tried to address this bug in a similar fashion in PowerShell but it is not working as expected. Some TextBoxes in Header or Footer are still being ignored.
I noticed, however, that running my script twice does end up working.
Any idea as to a solution to this problem would be greatly appreciated.
$folderPath = "C:\Users\user\folder\*" # multi-folders: "C:\fso1*", "C:\fso2*"
$fileType = "*.doc" # *.doc will take all .doc* files
$textToReplace = @{
# "TextToFind" = "TextToReplaceWith"
"This1" = "That1"
"This2" = "That2"
"This3" = "That3"
}
$word = New-Object -ComObject Word.Application
$word.Visible = $false
$storyTypes = [Microsoft.Office.Interop.Word.WdStoryType]
#Val, Name
# 1, wdMainTextStory
# 2, wdFootnotesStory
# 3, wdEndnotesStory
# 4, wdCommentsStory
# 5, wdTextFrameStory
# 6, wdEvenPagesHeaderStory
# 7, wdPrimaryHeaderStory
# 8, wdEvenPagesFooterStory
# 9, wdPrimaryFooterStory
# 10, wdFirstPageHeaderStory
# 11, wdFirstPageFooterStory
# 12, wdFootnoteSeparatorStory
# 13, wdFootnoteContinuationSeparatorStory
# 14, wdFootnoteContinuationNoticeStory
# 15, wdEndnoteSeparatorStory
# 16, wdEndnoteContinuationSeparatorStory
# 17, wdEndnoteContinuationNoticeStory
Function findAndReplace($objFind, $FindText, $ReplaceWith) {
#simple Find and Replace to execute on a Find object
$matchCase = $true
$matchWholeWord = $true
$matchWildcards = $false
$matchSoundsLike = $false
$matchAllWordForms = $false
$forward = $true
$findWrap = [Microsoft.Office.Interop.Word.WdReplace]::wdReplaceAll
$format = $false
$replace = [Microsoft.Office.Interop.Word.WdFindWrap]::wdFindContinue
$objFind.Execute($FindText, $matchCase, $matchWholeWord, $matchWildCards, $matchSoundsLike, $matchAllWordForms, `
$forward, $findWrap, $format, $ReplaceWith, $replace) > $null
}
Function findAndReplaceAll($objFind, $FindText, $ReplaceWith) {
findAndReplace $objFind $FindText $ReplaceWith
While ($objFind.Found) {
findAndReplace $objFind $FindText $ReplaceWith
}
}
Function findAndReplaceMultiple($objFind, $lookupTable) {
#apply multiple Find and Replace on the same Find object
$lookupTable.GetEnumerator() | ForEach-Object {
findAndReplaceAll $objFind $_.Key $_.Value
}
}
Function findAndReplaceMultipleWholeDoc($Document, $lookupTable) {
ForEach ($storyRge in $Document.StoryRanges) {
#Loop through each StoryRange
Do {
findAndReplaceMultiple $storyRge.Find $lookupTable
#check if the StoryRange has shapes (we check only StoryTypes 6 to 11, basically Headers and Footers)
# as the Shapes inside the wdMainTextStory will be checked
# see http://wordmvp.com/FAQs/Customization/ReplaceAnywhere.htm
# and http://gregmaxey.com/using_a_macro_to_replace_text_wherever_it_appears_in_a_document.html
If (($storyRge.StoryType -ge $storyTypes::wdEvenPagesHeaderStory) -and `
($storyRge.StoryType -le $storyTypes::wdFirstPageFooterStory)) {
If ($storyRge.ShapeRange.Count) { #non-zero is True
ForEach ($shp in $storyRge.ShapeRange) {
If ($shp.TextFrame.HasText) { #non-zero is True, in case of text .HasText = -1
findAndReplaceMultiple $shp.TextFrame.TextRange.Find $lookupTable
}
}
}
}
#check for linked Ranges
$storyRge = $storyRge.NextStoryRange
} Until (!$storyRge) #non-null is True
}
}
Function processDoc {
$doc = $word.Documents.Open($_.FullName)
# The "VBA trickery" translated to PowerShell...
$junk = $doc.Sections.Item(1).Headers.Item(1).Range.StoryType
#... but not working
findAndReplaceMultipleWholeDoc $doc $textToReplace
$doc.Close([ref]$true)
}
$sw = [Diagnostics.Stopwatch]::StartNew()
$countf = 0
Get-ChildItem -Path $folderPath -Recurse -Filter $fileType | ForEach-Object {
Write-Host "Processing `"$($_.Name)`"..."
processDoc
$countf++
}
$sw.Stop()
$elapsed = $sw.Elapsed.toString()
Write-Host "Done. $countf files processed in $elapsed"
$word.Quit()
$word = $null
[gc]::collect()
[gc]::WaitForPendingFinalizers()
I checked the Microsoft documentation here, and I think the code below can do it.
$word = New-Object -ComObject Word.Application
$word.visible=$false
$files = Get-ChildItem "C:\Users\Ali\Desktop\Test" -Filter *.docx
$find="Hello"
$replace="Bye"
$wdHeaderFooterPrimary = 1
$ReplaceAll = 2
$FindContinue = 1
$MatchCase = $false
$MatchWholeWord = $false
$MatchWildcards = $false
$MatchSoundsLike = $false
$MatchAllWordForms = $false
$Forward = $true
$Wrap = $findContinue
$Format = $false
for ($i=0; $i -lt $files.Count; $i++) {
$filename = $files[$i].FullName
$doc = $word.Documents.Open($filename)
ForEach ($StoryRange In $doc.StoryRanges){
$StoryRange.Find.Execute($find,$MatchCase,
$MatchWholeWord,$MatchWildcards,$MatchSoundsLike,
$MatchAllWordForms,$Forward,$Wrap,$Format,
$replace,$ReplaceAll)
While ($StoryRange.find.Found){
$StoryRange.Find.Execute($find,$MatchCase,
$MatchWholeWord,$MatchWildcards,$MatchSoundsLike,
$MatchAllWordForms,$Forward,$Wrap,$Format,
$replace,$ReplaceAll)
}
While (-Not($StoryRange.NextStoryRange -eq $null)){
$StoryRange = $StoryRange.NextStoryRange
$StoryRange.Find.Execute($find,$MatchCase,
$MatchWholeWord,$MatchWildcards,$MatchSoundsLike,
$MatchAllWordForms,$Forward,$Wrap,$Format,
$replace,$ReplaceAll)
While ($StoryRange.find.Found){
$StoryRange.Find.Execute($find,$MatchCase,
$MatchWholeWord,$MatchWildcards,$MatchSoundsLike,
$MatchAllWordForms,$Forward,$Wrap,$Format,
$replace,$ReplaceAll)
}
}
}
#shapes in footers and headers
for ($j=1; $j -le $doc.Sections.Count; $j++) {
$FooterShapesCount = $doc.Sections($j).Footers($wdHeaderFooterPrimary).Shapes.Count
$HeaderShapesCount = $doc.Sections($j).Headers($wdHeaderFooterPrimary).Shapes.Count
#use $k for the shape loops: reusing $i would overwrite the outer file index
for ($k=1; $k -le $FooterShapesCount; $k++) {
$TextRange = $doc.Sections($j).Footers($wdHeaderFooterPrimary).Shapes($k).TextFrame.TextRange
$TextRange.Find.Execute($find,$MatchCase,
$MatchWholeWord,$MatchWildcards,$MatchSoundsLike,
$MatchAllWordForms,$Forward,$Wrap,$Format,
$replace,$ReplaceAll)
}
for ($k=1; $k -le $HeaderShapesCount; $k++) {
$TextRange = $doc.Sections($j).Headers($wdHeaderFooterPrimary).Shapes($k).TextFrame.TextRange
$TextRange.Find.Execute($find,$MatchCase,
$MatchWholeWord,$MatchWildcards,$MatchSoundsLike,
$MatchAllWordForms,$Forward,$Wrap,$Format,
$replace,$ReplaceAll)
}
}
$doc.Save()
$doc.close()
}
$word.quit()
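The repeated Execute calls for linked story ranges could be folded into one loop by first walking the NextStoryRange chain. The sketch below shows only that traversal and makes no claim about the header/footer bug itself; Get-StoryChain is a hypothetical helper name, and it relies solely on the NextStoryRange property used in the code above.

```powershell
# Hypothetical helper: emit a story range and every range linked to it
function Get-StoryChain {
    param($StoryRange)
    while ($null -ne $StoryRange) {
        $StoryRange                                  # output the current range
        $StoryRange = $StoryRange.NextStoryRange     # move to the linked one, if any
    }
}

# Intended use with Word (not run here), with the same Execute arguments as above:
#   foreach ($story in $doc.StoryRanges) {
#       foreach ($range in (Get-StoryChain $story)) {
#           $null = $range.Find.Execute($find, $MatchCase, $MatchWholeWord,
#               $MatchWildcards, $MatchSoundsLike, $MatchAllWordForms,
#               $Forward, $Wrap, $Format, $replace, $ReplaceAll)
#       }
#   }
```

Because the helper only reads one property, it can be tried out on plain objects before pointing it at a document.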