Extract sections from a range found in a Word document - PowerShell

The code below works: it finds the start and end points of the section to extract, but I'm not able to get Range.SetRange/Select to work.
I can get the positions with the code below; I just need to extract that text and save it to a CSV file.
This piece doesn't work: $found = $paras2.Range.SetRange($startPosition, $endPosition)
$file = "D:\Files\Scan.doc"
$SearchKeyword1 = 'Keyword1'
$SearchKeyword2 = 'Keyword2'
$word = New-Object -ComObject Word.Application
$word.Visible = $false
$doc = $word.Documents.Open($file,$false,$true)
$sel = $word.Selection
$paras = $doc.Paragraphs
$paras1 = $doc.Paragraphs
$paras2 = $doc.Paragraphs
foreach ($para in $paras)
{
if ($para.Range.Text -match $SearchKeyword1)
{
Write-Host $para.Range.Text
$startPosition = $para.Range.Start
}
}
foreach ($para in $paras1)
{
if ($para.Range.Text -match $SearchKeyword2)
{
Write-Host $para.Range.Text
$endPosition = $para.Range.Start
}
}
Write-Host $startPosition
Write-Host $endPosition
$found = $paras2.Range.SetRange($startPosition, $endPosition)
# cleanup com objects
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($doc) | Out-Null
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($word) | Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()

This line of code is the problem
$found = $paras2.Range.SetRange($startPosition, $endPosition)
When you designate a Range by its start and end positions, you have to do so relative to the document, but the code above refers to a Paragraphs collection. It also uses SetRange, when it should use only the document's Range method. So:
$found = $doc.Range($startPosition, $endPosition)
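With the range built on the document, its Text property holds the section, and Export-Csv can write it out. A minimal sketch, assuming one CSV row per extracted section is what's wanted; the function name Export-SectionToCsv and the Keyword/Text column names are hypothetical:

```powershell
# Hypothetical helper: read the text between two character positions of a
# Word document and append it to a CSV file as one row.
function Export-SectionToCsv {
    param(
        [Parameter(Mandatory)] $Document,           # Word.Document COM object
        [Parameter(Mandatory)] [int] $Start,        # start position (characters)
        [Parameter(Mandatory)] [int] $End,          # end position (characters)
        [Parameter(Mandatory)] [string] $Keyword,   # label stored with the row
        [Parameter(Mandatory)] [string] $CsvPath
    )
    # Document.Range(Start, End) returns a new Range; .Text is its plain text.
    $text = $Document.Range($Start, $End).Text

    # Export-Csv takes care of quoting commas, quotes and line breaks.
    [pscustomobject]@{
        Keyword = $Keyword
        Text    = $text
    } | Export-Csv -Path $CsvPath -NoTypeInformation -Append

    return $text
}
```

For example, `Export-SectionToCsv -Document $doc -Start $startPosition -End $endPosition -Keyword $SearchKeyword1 -CsvPath "D:\Files\Sections.csv"` (the CSV path is made up). Calling `$doc.Close()` and `$word.Quit()` before releasing the COM objects also keeps Word from lingering in the background, since ReleaseComObject on its own does not close Word.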

Related

PowerShell function not receiving parameters

I have a PowerShell script. My ultimate goal is to compare two Excel files and highlight the differences between the two versions. Part of my "preparatory code" is this:
function DefineVars () {
Clear-Host
# Define some basic variables
$Directory = Split-Path -Parent $PSCommandPath
$FilePath = $Directory + "\xlsx\"
$FileName1 = $FilePath + "Firewallv2.xlsx"
$FileName2 = $FilePath + "Firewallv3.xlsx"
$OutFile1 = $FilePath + "file1_raw.csv"
$OutFile2 = $FilePath + "file2_raw.csv"
# Create an Object Excel.Application using Com interface
$Excel = New-Object -ComObject Excel.Application
$Excel.Visible = $false
$Excel.DisplayAlerts = $false
# Generate the Workbook Objects
$WorkBook1 = $Excel.Workbooks.Open($FileName1)
$WorkBook2 = $Excel.Workbooks.Open($FileName2)
return $Directory, $FilePath, $FileName1, $FileName2, $OutFile1, $OutFile2, $Excel, $WorkBook1, $WorkBook2
}
function GenerateData ($WorkBook, $OutFile) {
$Results = @()
Write-Host $OutFile
foreach ($CurrentWorkSheet in $WorkBook.Worksheets) {
$CurrentWorkSheetName = $CurrentWorkSheet.Name
$CurrentWorkSheetRows = $CurrentWorkSheet.UsedRange.Rows.Count
$CurrentWorkSheetColumns = $CurrentWorkSheet.UsedRange.Columns.Count
$CurrentWorkSheet.Activate()
for ($CurrentColumn = 1; $CurrentColumn -le $CurrentWorkSheetColumns; $CurrentColumn++) {
for ($CurrentRow = 1; $CurrentRow -le $CurrentWorkSheetRows; $CurrentRow++) {
$CurrentCell = $CurrentWorksheet.Cells.Item($CurrentRow, $CurrentColumn)
$CurrentCellContent = $CurrentCell.Text
if ([System.IO.File]::Exists($OutFile)) {
Write-Host "true"
#","+$CurrentCellContent | Out-File $OutFile -Append
} else {
Write-Host "false"
#$CurrentCellContent | Out-File $OutFile
}
}
}
}
return $Results
}
function CloseExcel () {
$WorkBook1.Close($true)
$WorkBook2.Close($true)
$Excel.Quit()
spps -n Excel
}
$Directory, $FilePath, $FileName1, $FileName2, $OutFile1, $OutFile2, $Excel, $WorkBook1, $WorkBook2 = DefineVars
$ResultsFile1 = GenerateData($WorkBook1, $OutFile1)
$ResultsFile2 = GenerateData($WorkBook2, $OutFile2)
CloseExcel
My problem is that the $OutFile argument never reaches the GenerateData function. All the other parameters appear to be passed successfully, e.g. the workbooks. But if I insert a Write-Host $OutFile at the beginning of the GenerateData function, the string is empty (which means it doesn't get passed, if I am not mistaken).
I am sure this is easily explained, but I just can't seem to figure this one out.
Thanks and best
Simon
I got it. My problem was the syntax in the main script body. Being used to other languages, I thought I needed parentheses and commas to pass arguments. It's much simpler in PowerShell:
$ResultsFile1 = GenerateData $WorkBook1 $OutFile1
$ResultsFile2 = GenerateData $WorkBook2 $OutFile2
CompareObjects $ResultsFile1 $ResultsFile2
CloseExcel
This did the trick! The only surprising thing is that PowerShell doesn't throw an error if you stick to the parentheses-and-comma style: GenerateData($WorkBook1, $OutFile1) passes one array, ($WorkBook1, $OutFile1), to the first parameter, so $OutFile simply stays empty.
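The behaviour is easy to reproduce with a small hypothetical function. In command mode, `Show-Args('wb', 'out.csv')` is a call with a single argument, the parenthesized array, which binds to the first parameter and leaves the second one empty:

```powershell
# Hypothetical function with two positional parameters.
function Show-Args ($First, $Second) {
    $firstType = if ($null -eq $First) { 'null' } else { $First.GetType().Name }
    [pscustomobject]@{
        FirstType   = $firstType
        FirstCount  = @($First).Count
        SecondIsSet = $null -ne $Second
    }
}

# Method-call style: ('wb', 'out.csv') is ONE argument, an array, bound to $First.
$wrong = Show-Args('wb', 'out.csv')

# Command style: space-separated arguments, one per parameter.
$right = Show-Args 'wb' 'out.csv'

$wrong   # FirstType = Object[], FirstCount = 2, SecondIsSet = False
$right   # FirstType = String,   FirstCount = 1, SecondIsSet = True
```

Running a script under `Set-StrictMode -Version 2.0` or later turns the first call style into an error instead of a silent mis-binding.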

I want to replace text in FieldCodes (Word document). How can I use variables for that?

This is for a Word document with links to other Word documents (IncludeText links). When I change the links one by one without a variable, it works. When I use variables for it, it doesn't.
$Desktop = [Environment]::GetFolderPath("Desktop")
$Word = New-Object -ComObject Word.Application
$Document = $Word.Documents.Open("$Desktop\Test.docx")
$Test = 147 # ($Test.GetType() = Int32)
$Document.Fields(147) #Works
$Document.Fields($Test) #Works
$Document.Fields($Test).LinkFormat.SourceFullName = "" #Works
$TextLinks = $Document.Fields | Where-Object Type -eq "68" | Select -expand Index
#TextLinks contains value 147 and 149
$Test = $TextLinks[0] # is also 147 ($Test.GetType() = Int32)
$Document.Fields($Test) #Doesn't work (runs indefinitely)
$Document.Fields($Test).LinkFormat.SourceFullName = "" #Doesn't work (runs indefinitely)
147..149 | Foreach { $Document.Fields($_).LinkFormat.SourceFullName } #Doesn't work (runs indefinitely)
Update:
Now it runs with $Test = [INT]$Textlinks[0]. Thanks Cindy!
But when I try a loop, it hangs on the second value:
$Desktop = [Environment]::GetFolderPath("Desktop")
$Word = New-Object -ComObject Word.Application
$Document = $Word.Documents.Open("$Desktop\Test.docx")
$TextLinks = $Document.Fields | Where-Object Type -eq "68" | Select -expand Index
$ItemNumber = 0
$End = $TextLinks.Count
Do {
$Item = [INT]$Textlinks[$ItemNumber]
if ($Document.Fields($Item).LinkFormat.SourceFullName -match "Test") {
$Link = $Document.Fields($Item).LinkFormat.SourceFullName -replace "Test", "TestTest"
$Document.Fields($Item).LinkFormat.SourceFullName = $Link
$Document.Fields($Item).LinkFormat.AutoUpdate = "True"
$ItemNumber += 1
}
} Until ($ItemNumber -eq $End)
$Document.Save()
$Word.Quit()
$OUT=[System.Runtime.InteropServices.Marshal]::ReleaseComObject($Word)
Update 2:
The code below runs fine, but I don't understand why the code above doesn't:
$Desktop = [Environment]::GetFolderPath("Desktop")
$Word = New-Object -ComObject Word.Application
$Document = $Word.Documents.Open("$Desktop\Test.docx")
$TextLinks = $Document.Fields | Where-Object Type -eq "68" | Select -expand Index
$ItemNumber = ($TextLinks.Count)-1
$End = -1
Do {
$Item = [INT]$Textlinks[$ItemNumber]
if ($Document.Fields($Item).LinkFormat.SourceFullName -match "Test") {
$Link = $Document.Fields($Item).LinkFormat.SourceFullName -replace "Test", "TestTest"
$Document.Fields($Item).LinkFormat.SourceFullName = $Link
$Document.Fields($Item).LinkFormat.AutoUpdate = "True"
$ItemNumber -= 1
}
} Until ($ItemNumber -eq $End)
$Document.Save()
$Word.Quit()
$OUT=[System.Runtime.InteropServices.Marshal]::ReleaseComObject($Word)
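A likely explanation, not confirmed in the thread: in the first loop `$ItemNumber += 1` sits inside the `if`, so once a field's source does not contain "Test" the counter stops moving and `Until` is never satisfied. Setting `SourceFullName` also updates the field, which can change the number of fields after it, so indices collected up front may point at different fields afterwards; walking from the last index to the first, as Update 2 does, leaves the not-yet-processed indices untouched. A sketch that combines both points, with hypothetical function names Get-LinkUpdatePlan and Update-IncludeTextLinks:

```powershell
# Pure part: decide which links to change, last field index first, so edits to
# later fields cannot shift the indices of fields still to be processed.
function Get-LinkUpdatePlan {
    param([object[]] $Links, [string] $Pattern, [string] $Replacement)
    $Links |
        Sort-Object -Property Index -Descending |
        Where-Object { $_.Source -match $Pattern } |
        ForEach-Object {
            [pscustomobject]@{
                Index     = [int]$_.Index
                NewSource = $_.Source -replace $Pattern, $Replacement
            }
        }
}

# COM part (needs Word): read every INCLUDETEXT field once, then apply the plan.
# Non-matching fields are simply left out of the plan, so nothing can stall.
function Update-IncludeTextLinks {
    param($Document, [string] $Pattern, [string] $Replacement)
    $links = foreach ($field in $Document.Fields) {
        if ($field.Type -eq 68) {   # 68 = wdFieldIncludeText, as in the question
            [pscustomobject]@{
                Index  = [int]$field.Index
                Source = $field.LinkFormat.SourceFullName
            }
        }
    }
    $plan = @(Get-LinkUpdatePlan -Links $links -Pattern $Pattern -Replacement $Replacement)
    foreach ($edit in $plan) {
        $target = $Document.Fields([int]$edit.Index)   # plain [int], as in the first update
        $target.LinkFormat.SourceFullName = $edit.NewSource
        $target.LinkFormat.AutoUpdate = $true
    }
    return $plan.Count
}
```

Usage would be `Update-IncludeTextLinks -Document $Document -Pattern 'Test' -Replacement 'TestTest'` followed by `$Document.Save()`. Note that `-replace` treats the pattern as a regular expression.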

Windows PowerShell rename files

I am sort of new to scripting and here's my task:
I have a folder with X files: Word documents, Excel sheets, etc. These files contain a client name that I need to replace with an ID number.
This change will affect all the files in this folder that contain this client's name.
How can I do this using Windows PowerShell?
$configFiles = Get-ChildItem . *.config -rec
foreach ($file in $configFiles)
{
(Get-Content $file.PSPath) |
Foreach-Object { $_ -replace " JOHN ", "123" } |
Set-Content $file.PSPath
}
Is this the right approach?
As @lee_Daily pointed out, you would need different code to perform a find and replace in different file types. Here is an example of how you could go about doing that:
$objWord = New-Object -comobject Word.Application
$objWord.Visible = $false
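foreach ( $file in (Get-ChildItem . -Recurse -File) ) {
Switch ( $file.Extension ) {
".config" {
(Get-Content $file.FullName) |
Foreach-Object { $_ -replace " JOHN ", "123" } |
Set-Content $file.FullName
}
# Conditions in braces are script blocks evaluated with $_ set to the extension;
# {('.doc') -or ('.docx')} and {'.xlsx'} were true for every file.
{ $_ -in '.doc', '.docx' } {
### Replace in word document using $file.fullname as the target
}
'.xlsx' {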
### Replace in spreadsheet using $file.fullname as the target
}
}
}
For the actual code to perform the find and replace, I would suggest COM objects for both.
Example of word find and replace https://codereview.stackexchange.com/questions/174455/powershell-script-to-find-and-replace-in-word-document-including-header-footer
Example of excel find and replace Search & Replace in Excel without looping?
I would suggest learning the ImportExcel module too, it is a great tool which i use a lot.
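For the spreadsheet branch, Excel's Range.Replace method does the replacement for a whole range in one call, so there is no need to loop over cells. A sketch, assuming a workbook already opened through the Excel.Application COM object; Invoke-WorkbookReplace is a hypothetical name. LookAt, SearchOrder and MatchCase are passed explicitly because Excel remembers those settings from the previous find or replace:

```powershell
# Hypothetical helper: replace text in every worksheet of an open workbook.
# $Workbook is an Excel.Workbook COM object.
function Invoke-WorkbookReplace {
    param(
        [Parameter(Mandatory)] $Workbook,
        [Parameter(Mandatory)] [string] $FindText,
        [Parameter(Mandatory)] [string] $ReplaceWith
    )
    $xlPart   = 2   # XlLookAt: match any part of the cell contents
    $xlByRows = 1   # XlSearchOrder
    foreach ($sheet in $Workbook.Worksheets) {
        # Range.Replace(What, Replacement, LookAt, SearchOrder, MatchCase)
        # returns a Boolean, which is discarded here.
        $null = $sheet.UsedRange.Replace($FindText, $ReplaceWith, $xlPart, $xlByRows, $false)
    }
}
```

Afterwards the workbook still needs `$workbook.Save()` and `$workbook.Close()`.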
For Word documents, this is what I'm using. I just can't figure out how to make this script also change the headers and footers in a Word document:
$objWord = New-Object -comobject Word.Application
$objWord.Visible = $false
$list = Get-ChildItem "C:\Users\*.*" -Include *.doc*
foreach($item in $list){
$objDoc = $objWord.Documents.Open($item.FullName,$true)
$objSelection = $objWord.Selection
$wdFindContinue = 1
$FindText = " BLAH "
$MatchCase = $False
$MatchWholeWord = $true
$MatchWildcards = $False
$MatchSoundsLike = $False
$MatchAllWordForms = $False
$Forward = $True
$Wrap = $wdFindContinue
$Format = $False
$ReplaceWith = "help "
$ReplaceAll = 2
$a = $objSelection.Find.Execute($FindText,$MatchCase,$MatchWholeWord, `
$MatchWildcards,$MatchSoundsLike,$MatchAllWordForms,$Forward,`
$Wrap,$Format,$ReplaceWith,$ReplaceAll)
$objDoc.Save()
$objDoc.Close()
}
$objWord.Quit()
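Selection.Find only searches the story the selection is in, which is the main text here. One common way to reach headers, footers and text boxes as well is to loop over the document's StoryRanges and follow each story's NextStoryRange chain, which links, for example, the headers of later sections. A sketch under that assumption, using the same Find.Execute argument order as the script above; Invoke-ReplaceInAllStories is a hypothetical name, and Wrap is wdFindStop (0) because the search runs on a Range:

```powershell
# Hypothetical helper: run a replace-all in every story of a Word document.
function Invoke-ReplaceInAllStories {
    param(
        [Parameter(Mandatory)] $Document,          # Word.Document COM object
        [Parameter(Mandatory)] [string] $FindText,
        [Parameter(Mandatory)] [string] $ReplaceWith
    )
    $wdFindStop   = 0
    $wdReplaceAll = 2
    $searched = 0
    foreach ($story in $Document.StoryRanges) {
        $range = $story
        while ($null -ne $range) {
            # FindText, MatchCase, MatchWholeWord, MatchWildcards, MatchSoundsLike,
            # MatchAllWordForms, Forward, Wrap, Format, ReplaceWith, Replace
            $null = $range.Find.Execute($FindText, $false, $true, $false, $false,
                $false, $true, $wdFindStop, $false, $ReplaceWith, $wdReplaceAll)
            $searched++
            # Further stories of the same type (e.g. other sections' footers).
            $range = $range.NextStoryRange
        }
    }
    return $searched   # number of story ranges searched
}
```

Inside the existing loop it would take the place of the `$objSelection.Find.Execute(...)` call, as `Invoke-ReplaceInAllStories -Document $objDoc -FindText $FindText -ReplaceWith $ReplaceWith`, before `$objDoc.Save()`.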
What if I try to run this in C#? Is anything else missing?
using System;
using System.IO;

string rootfolder = @"C:\Temp";
string[] files = Directory.GetFiles(rootfolder, "*.*", SearchOption.AllDirectories);
foreach (string file in files)
{
    try
    {
        // Plain-text files only: .docx/.xlsx are zip packages and .doc/.xls are
        // binary, so reading and rewriting them as text corrupts them.
        string contents = File.ReadAllText(file);
        contents = contents.Replace(@"Text to find", @"Replacement text");
        // Make files writable
        File.SetAttributes(file, FileAttributes.Normal);
        File.WriteAllText(file, contents);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
}

Optimize code to search for multiple keywords in a Word document

The code below works fine and extracts the required sections from a Word document. We have to search the document for nearly 50 keywords, one at a time. With the current approach we search for one keyword, then open the document again and search again, which takes a very long time to run. Any suggestions for optimizing this?
The code below extracts the text between two headings. How can we optimize it to search for all 50 keywords while opening the document only once and scanning it to the end?
function ExtractSectionsFromWordDoc{
Param([string]$SourceFile, [string]$Category, [string]$SearchKeyword1, [string]$SearchKeyword2, [string]$TableName)
$word = New-Object -ComObject Word.Application
$word.Visible = $false
$doc = $word.Documents.Open($SourceFile,$false,$true)
$sel = $word.Selection
$paras = $doc.Paragraphs
foreach ($para in $paras)
{
$style = $para.Style
If ($style.NameLocal -eq "Heading 2")
{
if ($para.Range.Text -match $SearchKeyword1)
{
$startPosition = $para.Range.Start
Write-Host $startPosition
}
if ($para.Range.Text -match $SearchKeyword2)
{
$endPosition = $para.Range.Start
Write-Host $endPosition
break
}
}
}
$content = $doc.Range($startPosition, $endPosition).Text
$content = $content -replace "'", ""
# cleanup com objects
$doc.Close()
$word.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($doc) | Out-Null
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($word) | Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
# Return the extracted section to the caller
return $content
}
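One way to avoid reopening the document for every keyword: open it once, walk the paragraphs once to record each Heading 2 with its start position, and then resolve all keyword pairs against that in-memory list, reading each section with `$doc.Range(start, end).Text`. A sketch under those assumptions; Get-HeadingIndex and Get-SectionText are hypothetical names, and each pair is assumed to be an object with StartKeyword and EndKeyword properties. Unlike the loop above, which keeps the last Keyword1 heading before the first Keyword2 heading, this takes the first matching start heading and the first matching end heading after it:

```powershell
# One pass over the paragraphs: remember each "Heading 2" paragraph's text and start.
function Get-HeadingIndex {
    param([Parameter(Mandatory)] $Document)   # Word.Document COM object
    foreach ($para in $Document.Paragraphs) {
        if ($para.Style.NameLocal -eq 'Heading 2') {
            [pscustomobject]@{ Text = $para.Range.Text; Start = $para.Range.Start }
        }
    }
}

# Resolve many keyword pairs against the cached headings; no further paragraph scans.
function Get-SectionText {
    param(
        [Parameter(Mandatory)] $Document,
        [object[]] $Headings,       # output of Get-HeadingIndex
        [object[]] $KeywordPairs    # objects with StartKeyword / EndKeyword
    )
    foreach ($pair in $KeywordPairs) {
        $startHeading = $Headings |
            Where-Object { $_.Text -match $pair.StartKeyword } |
            Select-Object -First 1
        if (-not $startHeading) { continue }
        $endHeading = $Headings |
            Where-Object { $_.Start -gt $startHeading.Start -and $_.Text -match $pair.EndKeyword } |
            Select-Object -First 1
        if (-not $endHeading) { continue }
        [pscustomobject]@{
            StartKeyword = $pair.StartKeyword
            # Same apostrophe stripping as the original function.
            Text = $Document.Range($startHeading.Start, $endHeading.Start).Text -replace "'", ''
        }
    }
}
```

With the document opened once, `$headings = @(Get-HeadingIndex -Document $doc)` followed by `Get-SectionText -Document $doc -Headings $headings -KeywordPairs $pairs` returns one object per section; the pairs could, for example, come from `Import-Csv` on a hypothetical file with StartKeyword and EndKeyword columns.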

Document manipulation: PowerShell

I am attempting to take a large document, search for a "^m" (page break) and create a new text file for each page break I find.
Using:
$SearchText = "^m"
$word = new-object -ComObject "word.application"
$path = "C:\Users\me\Documents\Test.doc"
$doc = $word.documents.open("$path")
$doc.content.find.execute("$SearchText")
I am able to find text, but how do I save the text before the page break into a new file? In VBScript, I would just do a readline and save it to a buffer, but PowerShell is very different.
EDIT:
$text = $word.Selection.MoveUntil (cset:="^m")
returns an error:
Missing ')' in method call.
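PowerShell has no equivalent of VBA's `name:=value` named arguments, which is why the parser stops with Missing ')'; COM method arguments have to be passed by position. For splitting on page breaks it may be simpler to skip the Selection altogether: in a Range's Text, Word represents a manual page break (what `^m` finds) as the form-feed character, `[char]12`. Section breaks use the same character, so they would split the text too. A sketch under that assumption; Split-PageText and Save-Pages are hypothetical names:

```powershell
# Split a document's text into pages at form-feed characters ([char]12),
# which is how Word's Range.Text represents manual page and section breaks.
function Split-PageText {
    param([string] $Text)
    # -split takes a regex; a lone form-feed character matches itself.
    $Text -split [string][char]12
}

# Write each page to <Folder>\<BaseName>_<n>.txt and return the paths.
function Save-Pages {
    param([string[]] $Pages, [string] $Folder, [string] $BaseName)
    for ($i = 0; $i -lt $Pages.Count; $i++) {
        $path = Join-Path $Folder ('{0}_{1}.txt' -f $BaseName, ($i + 1))
        # Word ends paragraphs with a bare `r; drop the trailing one, use CRLF elsewhere.
        $content = $Pages[$i].TrimEnd([char]13) -replace "`r", "`r`n"
        Set-Content -Path $path -Value $content
        $path
    }
}

# With Word (not executed here; $word and $path as in the question):
#   $doc   = $word.Documents.Open($path)
#   $pages = Split-PageText -Text $doc.Content.Text
#   Save-Pages -Pages $pages -Folder (Split-Path $path) -BaseName 'Test'
#   $doc.Close()
```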
I think my solution is kinda stupid, but here is my own solution (please help me find a better one):
Param(
[string]$file
)
#$file = "C:\scripts\docSplit\test.docx"
$word = New-Object -ComObject "word.application"
$doc=$word.documents.open($file)
$txtPageBreak = "<!--PAGE BREAK--!>"
$fileInfo = Get-ChildItem $file
$folder = $fileInfo.directoryName
$fileName = $fileInfo.name
$newFileName = $fileName.replace(".", "")
#$findtext = "^m"
#$replaceText = $txtPageBreak
function Replace-Word ([string]$Document,[string]$FindText,[string]$ReplaceText) {
#Variables used to Match And Replace
$ReplaceAll = 2
$FindContinue = 1
$MatchCase = $False
$MatchWholeWord = $True
$MatchWildcards = $False
$MatchSoundsLike = $False
$MatchAllWordForms = $False
$Forward = $True
$Wrap = $FindContinue
$Format = $False
$Selection = $Word.Selection
$Selection.Find.Execute(
$FindText,
$MatchCase,
$MatchWholeWord,
$MatchWildcards,
$MatchSoundsLike,
$MatchAllWordForms,
$Forward,
$Wrap,
$Format,
$ReplaceText,
$ReplaceAll
)
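# Assign in script scope so the Add-Content/Get-Content calls below see the .txt path
$script:newFileName = "$folder\$newFileName.txt"
$Doc.saveAs([ref]"$script:newFileName",[ref]2)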
$doc.close()
}
# Space-separated arguments; Replace-Word($a, $b, $c) would pass one array to $Document
Replace-Word $file "^m" $txtPageBreak
$word.quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($word)
Remove-Variable word
#begin txt file manipulation
#add end of file marker
$eof = "<!--END OF FILE!-->"
# Put the marker on its own line so it can be compared with -eq below
Add-Content $newfileName "`n$eof"
$masterTextFile = Get-Content $newFileName
$buffer = ""
foreach($line in $masterTextFile){
if($line -eq $eof){
#end of file, save buffer to new file, be done
}
else {
# CompareTo returns -1, 0 or 1, so "-eq 1" dropped every line that sorts before the marker
if ($line -ne $txtPageBreak) {
$buffer = "$buffer $line `n"
}
else {
#save the buffer to a new file (still have to write this part)
}
}
}
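To fill in the two "still have to write this part" spots, the buffer can be written to a numbered file each time the marker line is reached, and once more when the input runs out, which also makes the END OF FILE marker unnecessary. A sketch, assuming the marker ends up on a line of its own in the saved .txt file; Split-TextFileOnMarker is a hypothetical name:

```powershell
# Hypothetical helper: write the lines between page-break markers to
# <Folder>\<BaseName>_<n>.txt and return the paths it wrote.
function Split-TextFileOnMarker {
    param(
        [string[]] $Lines,
        [string] $Marker,
        [string] $Folder,
        [string] $BaseName
    )
    $buffer = [System.Collections.Generic.List[string]]::new()
    $part = 0
    # Run one step past the last line so the end of input flushes the final page.
    for ($i = 0; $i -le $Lines.Count; $i++) {
        $atEnd = $i -eq $Lines.Count
        if ($atEnd -or $Lines[$i].Trim() -eq $Marker) {
            if ($buffer.Count -gt 0) {      # skip empty pages (two breaks in a row)
                $part++
                $path = Join-Path $Folder ('{0}_{1}.txt' -f $BaseName, $part)
                Set-Content -Path $path -Value $buffer.ToArray()
                $path                       # emit the path to the caller
                $buffer.Clear()
            }
        }
        else {
            $buffer.Add($Lines[$i])
        }
    }
}
```

For example: `Split-TextFileOnMarker -Lines (Get-Content $newFileName) -Marker $txtPageBreak -Folder $folder -BaseName 'Test'`.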