Opening Large Set of Word Documents With Powershell

I am in the process of adding a footer containing the current file path to hundreds of Word documents. Here is my code, which does the job:
I plan to set $Word.Visible to false, but it is true for now for debugging purposes.
This gets all the Word documents in a directory, adds a footer with their file path, then saves and closes them.
I am trying to handle a case like this:
I just want to skip such a file, or possibly force it open and continue. I'm not sure of the best way to go about this, however, and am seeking some help.
Thanks,
Elijah
Set-ExecutionPolicy bypass;
$path = 'somepath';
$documents = Get-ChildItem -Path $path *.docx -Recurse -Force
$filepaths = foreach ($document in $documents) {$document.fullname}
$Word = New-Object -ComObject Word.application;
$Word.Visible = $true;
foreach ($filepath in $filepaths){
    $Doc = $Word.Documents.OpenNoRepairDialog($filepath);
    $Doc.Unprotect();
    $Selection = $Word.Selection;
    # 4 = wdSeekPrimaryFooter: move the selection into the footer
    $Doc.ActiveWindow.ActivePane.View.SeekView = 4;
    # 1 = wdAlignParagraphCenter
    $Selection.ParagraphFormat.Alignment = 1;
    $Selection.TypeText($filepath);
    $Doc.Save();
    $Doc.Close();
}
$Word.Quit();
Edit1:
I've changed the script so that it inserts a dynamic field for the file path rather than typing the path as text. That way, if you move the file, the footer can be updated to the new path. You will have to select the footer in Word and press F9 to update it, but this is the best you can do without writing a macro and saving the file as a .docm.
Here is the amended code:
$documents = Get-ChildItem -path *docx -recurse -force
$filepaths = foreach($document in $documents){$document.FullName}
Set-Variable -Name wdFieldFileName -Value 29 -Option constant -Force -ErrorAction SilentlyContinue
$word = New-Object -ComObject Word.Application
#$word.Visible = $true
foreach($filepath in $filepaths){
$doc = $word.Documents.Open($filepath)
$sections = $doc.Sections
$item1 = $sections.Item(1)
$footer = $item1.Footers.Item(1)
$range = $footer.Range
$doc.Fields.Add($range, $wdFieldFileName, '\p')
$doc.Save()
$doc.Close()
}
$word.Quit()
I am still running into the error window when trying to open documents that Word diagnoses as corrupted or "in need of repair".
Passing multiple arguments to the Open() method does not work as expected. Here is an example:
Exception calling "Open" with "16" argument(s): "Type mismatch. (Exception from HRESULT: 0x80020005 (DISP_E_TYPEMISMATCH))"
At line:1 char:1
+ $doc = $word.Documents.Open($filepath, $False, $False, $False, $null, ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : ComMethodTargetInvocation
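One way to keep the batch moving, sketched under assumptions: wrap the open call in try/catch and skip any file for which Word throws. Open-WordDocumentOrSkip is a hypothetical helper name, not part of Word or PowerShell. This only helps when Open actually raises an error; a modal repair dialog may still block. Setting $word.DisplayAlerts = 0 (wdAlertsNone) before the loop might suppress some prompts, but whether it covers the repair dialog is not confirmed here. As for the type mismatch above, a common cause is passing $null for optional COM parameters, and [Type]::Missing is the usual placeholder; that is also untested against this case.

```powershell
# Hypothetical helper (not part of Word or PowerShell): returns the opened
# document, or $null when the Open call throws.
function Open-WordDocumentOrSkip {
    param(
        [Parameter(Mandatory = $true)] $Word,           # Word.Application COM object
        [Parameter(Mandatory = $true)] [string] $Path   # full path of the document
    )
    try {
        return $Word.Documents.Open($Path)
    }
    catch {
        # Report the failure and let the caller move on to the next file
        Write-Warning "Skipping '$Path': $($_.Exception.Message)"
        return $null
    }
}
```

Inside the loop this would be used as $doc = Open-WordDocumentOrSkip -Word $word -Path $filepath, followed by if ($null -eq $doc) { continue } before the footer code.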

Related

Powershell Loop to Write Password Protected Files

I'm trying to read excel files into Powershell, open, password protect them and write them back. I can do it individually but within a loop the script fails:
#working individually
$f = ("C:my\path\Out Files\1234dv.xlsx")
$outfile = $f.FullName + "out"
$xlNormal = -4143
$xl = new-object -comobject excel.application
$xl.Visible = $True
$xl.DisplayAlerts = $False
$wb = $xl.Workbooks.Open($f)
$a = $wb.SaveAs("C:my\path\Out Files\test.xls",$xlNormal,"test")
$a = $xl.Quit()
$a = Release-Ref($ws)
$a = Release-Ref($wb)
$a = Release-Ref($xl)
#not working in loop, error after
function Release-Ref ($ref) {
([System.Runtime.InteropServices.Marshal]::ReleaseComObject(
[System.__ComObject]$ref) -gt 0)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
}
foreach ($f in Get-ChildItem "C:\my\path\Out Files"){
$ff = $f
$outfile = $f.FullName + "out"
$xlNormal = -4143
$xl = new-object -comobject excel.application
$xl.Visible = $True
$xl.DisplayAlerts = $False
$wb = $xl.Workbooks.Open($ff)
$a = $wb.SaveAs("C:\my\path\Out Files\test.xls",$xlNormal,"test")
$a = $xl.Quit()
$a = Release-Ref($ws)
$a = Release-Ref($wb)
$a = Release-Ref($xl)
}
Sorry, we couldn't find 1234dv.xlsx. Is it possible it was moved,
renamed or deleted?
At line:16 char:5
+ $wb = $xl.Workbooks.Open($ff)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OperationStopped: (:) [], COMException
+ FullyQualifiedErrorId : System.Runtime.InteropServices.COMException
COM object that has been separated from its underlying RCW cannot be used.
At line:17 char:5
+ $a = $wb.SaveAs("C:\my\path ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OperationStopped: (:) [], InvalidComObjectException
+ FullyQualifiedErrorId : System.Runtime.InteropServices.InvalidComObjectException
That error repeats for all four test files I'm working with.
I'm not really familiar with Powershell so I relied on MS docs, and I couldn't password protect the files in python so thought this would be easier. I know this doesn't address the password yet either but trying to get the loop to work first. Any help would be greatly appreciated. Thank you.
You should use
$wb = $xl.Workbooks.Open($ff.FullName)
To give Excel the full file path. Otherwise, $ff is a FileInfo object where a string (path) is required
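A minimal sketch of how the loop body could look with that change, assuming one Excel instance is created before the loop and quit after it. Save-ProtectedCopy is a hypothetical helper name, and the per-file output name in the usage comments is an assumption (the question saves every file to the same test.xls, which would overwrite itself in a loop).

```powershell
# Hypothetical helper: opens one workbook by its full path and saves a
# password-protected copy. $Excel is an Excel.Application COM object.
function Save-ProtectedCopy {
    param(
        [Parameter(Mandatory = $true)] $Excel,
        [Parameter(Mandatory = $true)] [System.IO.FileInfo] $File,
        [Parameter(Mandatory = $true)] [string] $OutputPath,
        [Parameter(Mandatory = $true)] [string] $Password
    )
    $xlNormal = -4143   # same file-format constant as in the question
    # Pass the full path string, not the FileInfo object itself
    $wb = $Excel.Workbooks.Open($File.FullName)
    try {
        $wb.SaveAs($OutputPath, $xlNormal, $Password)
    }
    finally {
        $wb.Close($false)   # close without saving the source again
    }
}

# Usage sketch (needs Excel installed, so it is left commented out):
# $xl = New-Object -ComObject Excel.Application
# $xl.DisplayAlerts = $false
# foreach ($f in Get-ChildItem 'C:\my\path\Out Files' -Filter *.xlsx) {
#     $out = Join-Path $f.DirectoryName ($f.BaseName + '_protected.xls')   # assumed naming
#     Save-ProtectedCopy -Excel $xl -File $f -OutputPath $out -Password 'test'
# }
# $xl.Quit()
```

Creating Excel once rather than per file also avoids starting and quitting the application on every iteration, as the question's loop does.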
Slightly off topic for your question, but not for your intent:
From a security perspective, using .xls passwords is not security; it is merely an annoyance.
If you need security, then I suggest you use something like Azure Information Protection, which allows you to encrypt the file and share it securely only with those who need access.
You still need to create your .xls or .xlsx files (or any other file for that matter); then you can simply loop over them in PowerShell:
PS C:\>foreach ($file in (Get-ChildItem -Path \\server1\Docs -Recurse -Force |
    where {!$_.PSIsContainer} |
    Where-Object {$_.Extension -eq ".xls"})) {
    Protect-RMSFile -File $file.PSPath -InPlace -DoNotPersistEncryptionKey All -TemplateID "e6ee2481-26b9-45e5-b34a-f744eacd53b0" -OwnerEmail "IT@Contoso.com"
}
https://learn.microsoft.com/en-us/powershell/module/azureinformationprotection/protect-rmsfile?view=azureipps

Using SaveAs on a Word object from PowerShell fails with [ref] argument error

I'm using Word to convert a docx to PDF from PowerShell by opening the document and writing it by using SaveAs().
My code:
# got that hint from http://stackoverflow.com/questions/36487507/troubles-using-powershell-and-word-saveas
[Reflection.Assembly]::LoadWithPartialName("Microsoft.Office.Interop.Word") | Out-Null
$Word = New-Object -ComObject "Word.Application"
$Word.Visible = $False
foreach ($transf_file in $doc_path) {
$transf_pdf_file = $transf_file -replace "`.docx?$", ".pdf"
$rel_path = Split-Path -Parent $transf_file
Copy-Item -Path "$src_pp_dir\$transf_file" -Destination "$dest_pp_dir\$rel_path" -Force
$word_doc = $Word.Documents.Open( "$dest_pp_dir\$transf_file" )
# Hess / Herlet workaround: uncomment the next line and comment out the line after it
#$word_doc.SaveAs( "$dest_pp_dir\$transf_pdf_file" ,17 )
$word_doc.SaveAs( [ref] [system.object] "$dest_pp_dir\$transf_pdf_file" ,[ref]17)
$word_doc.Close()
Remove-Item -Force -Path "$dest_pp_dir\$transf_file"
}
# this line avoids saving of normal.dot what is normally requested when another
# Word is open in parallel
# the rest is necessary to kill this word process (otherwise they sum up in the
# system and after a while it doesn't work anymore)
$Word.NormalTemplate.Saved = $true
$Word.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($Word) > $null
Remove-Variable Word
The code runs on my computer as expected.
On a colleague's computer, the line:
$word_doc.SaveAs( [ref] [system.object] "$dest_pp_dir\$transf_pdf_file" ,[ref]17)
throws an error:
Argument: '1' should not be a System.Management.Automation.PSReference. Do not
use [ref].
At D:\DMG\NX_Projekte\handle\PP-Encode.ps1:1015 char:7
+ $word_doc.SaveAs( [ref] [system.object] "$dest_pp_dir\$transf_pdf_file" ,[ ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodException
+ FullyQualifiedErrorId : RefArgumentToNonRefParameterMsg
On his machine, the preceding line (currently commented out, without [ref]) works fine.
What I checked:
We are using identical versions of PowerShell, Word, the Word COM object, and the .NET Framework.
I went through the hints on this site and in particular improved how the script exits Word (making sure that all Word processes are finished, which is the case on my computer).
I couldn't find any hints about this specific problem, either here or elsewhere.
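Until the cause is known, one script can serve both machines by trying the [ref] form and falling back to the plain form only when PowerShell raises the exact error quoted above. This is a sketch, not a diagnosis: Save-WordDocumentAs is a hypothetical helper name. A commonly suggested explanation is that the behaviour depends on whether the call is bound through the interop assembly or through late-bound COM, but that is not verified for these two machines.

```powershell
# Hypothetical helper: try the [ref] call first; if PowerShell rejects [ref]
# (the RefArgumentToNonRefParameterMsg error quoted above), retry without it.
function Save-WordDocumentAs {
    param(
        [Parameter(Mandatory = $true)] $Document,       # an open Word document
        [Parameter(Mandatory = $true)] [string] $Path,  # target file name
        [int] $Format = 17                              # 17 = PDF, as in the script above
    )
    try {
        $Document.SaveAs([ref] [object] $Path, [ref] [object] $Format)
    }
    catch {
        if ($_.FullyQualifiedErrorId -ne 'RefArgumentToNonRefParameterMsg') {
            throw   # some other failure (locked file, bad path): do not hide it
        }
        $Document.SaveAs($Path, $Format)
    }
}
```

In the loop, the two SaveAs lines would then be replaced by Save-WordDocumentAs -Document $word_doc -Path "$dest_pp_dir\$transf_pdf_file".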

Getting error on converting ppt to PDF

Can someone please tell me what is going wrong with the code? I need to convert a PowerPoint file to a PDF file.
Code:
#Convert Powerpoint formats to pdf
Param(
[string]$inputPath,
[string]$outputPath
)
Add-Type -AssemblyName Office
Add-Type -AssemblyName Microsoft.Office.Interop.PowerPoint
$ppFormatPDF = 32
$ppQualityStandard = 0
$pp = New-Object -ComObject PowerPoint.Application
# TODO: Why this property does not work
#$pp.visible = [Microsoft.Office.Core.MsoTriState]::msoFalse
$ppt = $pp.Presentations.Open($inputPath)
$ppt.SaveAs($outputPath, $ppFormatPDF) # 32 is for PDF
$ppt.Close()
$pp.Quit()
$pp = $null
[gc]::Collect()
[gc]::WaitForPendingFinalizers()
Error:
Exception calling "SaveAs" with "2" argument(s): "Presentation.SaveAs :
PowerPoint can't save ^0 to ^1."
At D:\AllAquent\Rambo\Digo\war\WEB-INF\classes\resources\pptToPdf.ps1:17 char:12
+ $ppt.SaveAs <<<< ($outputPath, $opt) # 32 is for PDF
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : ComMethodTargetInvocation
Used this script a couple of months ago, removed my own modifications before posting here. Hope this might help you!
Note that this script is designed to convert all Powerpoint presentations in specified directory to PDFs.
Function Convert-PptxToPDF {
[CmdletBinding()]
Param(
$File,
$OutputFile
)
# add key assemblies
Add-type -AssemblyName office -ErrorAction SilentlyContinue
Add-Type -AssemblyName microsoft.office.interop.powerpoint -ErrorAction SilentlyContinue
# Open PowerPoint
$ppt = new-object -com powerpoint.application
$ppt.visible = [Microsoft.Office.Core.MsoTriState]::msoFalse
# Open the $File presentation
$pres = $ppt.Presentations.Open($file)
# Now save it away as PDF
$opt= [Microsoft.Office.Interop.PowerPoint.PpSaveAsFileType]::ppSaveAsPDF
$pres.SaveAs($OutputFile,$opt)
# and Tidy-up
$pres.Close()
$ppt.Quit()
$ppt=$null
}
# Folder containing the presentations; each PDF is saved next to its source file
$InputFolder = "C:\Temp\"
# File-extension could be changed to .pptx if needed
Foreach ($File in $(ls $InputFolder -Filter "*.ppt")) {
    # Build name of output file: same folder, same base name, .pdf extension
    $OutputFile = Join-Path $File.DirectoryName ($File.BaseName + ".pdf")
    # Convert _this_ file to PDF (pass the full path, not the FileInfo object)
    Convert-PptxToPDF -file $File.FullName -OutputFile $OutputFile
}
I take no credit for this script.

Basic Powershell - batch convert Word Docx to PDF

I am trying to use PowerShell to do a batch conversion of Word Docx to PDF - using a script found on this site:
http://blogs.technet.com/b/heyscriptingguy/archive/2013/03/24/weekend-scripter-convert-word-documents-to-pdf-files-with-powershell.aspx
# Acquire a list of DOCX files in a folder
$Files=GET-CHILDITEM "C:\docx2pdf\*.DOCX"
$Word=NEW-OBJECT -COMOBJECT WORD.APPLICATION
Foreach ($File in $Files) {
# open a Word document, filename from the directory
$Doc=$Word.Documents.Open($File.fullname)
# Swap out DOCX with PDF in the Filename
$Name=($Doc.Fullname).replace("docx","pdf")
# Save this File as a PDF in Word 2010/2013
$Doc.saveas([ref] $Name, [ref] 17)
$Doc.close()
}
And I keep on getting this error and can't figure out why:
PS C:\docx2pdf> .\docx2pdf.ps1
Exception calling "SaveAs" with "16" argument(s): "Command failed"
At C:\docx2pdf\docx2pdf.ps1:13 char:13
+ $Doc.saveas <<<< ([ref] $Name, [ref] 17)
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : DotNetMethodException
Any ideas?
Also - how would I need to change it to also convert doc (not docX) files, as well as use the local files (files in same location as the script location)?
Sorry - never done PowerShell scripting...
This will work for doc as well as docx files.
$documents_path = 'c:\doc2pdf'
$word_app = New-Object -ComObject Word.Application
# This filter will find .doc as well as .docx documents
Get-ChildItem -Path $documents_path -Filter *.doc? | ForEach-Object {
$document = $word_app.Documents.Open($_.FullName)
$pdf_filename = "$($_.DirectoryName)\$($_.BaseName).pdf"
$document.SaveAs([ref] $pdf_filename, [ref] 17)
$document.Close()
}
$word_app.Quit()
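A note on building the output name: .replace("docx","pdf") as used in the question replaces every occurrence of "docx" in the path, not just the extension. With the folder from the question, C:\docx2pdf turns into C:\pdf2pdf, which may or may not be what is behind the "Command failed" error. A small sketch of an alternative, where Get-PdfPath is a hypothetical helper name wrapping the .NET method [System.IO.Path]::ChangeExtension:

```powershell
# Hypothetical helper: swap only the file extension for .pdf
function Get-PdfPath {
    param([Parameter(Mandatory = $true)] [string] $SourcePath)
    return [System.IO.Path]::ChangeExtension($SourcePath, '.pdf')
}

# String.Replace touches every occurrence, including the folder name
$replaced = 'C:\docx2pdf\report.docx'.Replace('docx', 'pdf')   # C:\pdf2pdf\report.pdf
# ChangeExtension only touches the extension
$changed = Get-PdfPath 'C:\docx2pdf\report.docx'               # C:\docx2pdf\report.pdf
```

The same call works for .doc files, so it fits the *.doc? filter used above.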
The above answers all fell short for me, as I was doing a batch job converting around 70,000 Word documents this way. As it turns out, doing this repeatedly eventually leads to Word crashing, presumably due to memory issues (the error was some COMException that I didn't know how to parse). So, my hack to get it to proceed was to kill and restart Word every 100 docs (an arbitrarily chosen number).
Additionally, when it did occasionally crash, it left behind malformed PDFs, each of which was generally 1-2 KB in size. So, when skipping already generated PDFs, I make sure they are larger than 3 KB. If you don't want to skip already generated PDFs, you can delete that if statement.
Excuse me if my code doesn't look good; I don't generally use Windows and this was a one-off hack. So, here's the resulting code:
$Files = Get-ChildItem -Path '.\path\to\docs' -Recurse -Include "*.doc*"
$counter = 0
$filesProcessed = 0
$Word = New-Object -ComObject Word.Application
Foreach ($File in $Files) {
    # Output name: same path with the extension replaced by .pdf
    $Name = "$(($File.FullName).Substring(0, $File.FullName.LastIndexOf("."))).pdf"
    # Skip PDFs that already exist, unless they look truncated (3 KB or less)
    if ((Test-Path $Name) -And (Get-Item $Name).Length -gt 3kb) {
        echo "skipping $($Name), already exists"
        continue
    }
    echo "$($filesProcessed): processing $($File.FullName)"
    $Doc = $Word.Documents.Open($File.FullName)
    $Doc.SaveAs($Name, 17)
    $Doc.Close()
    # Restart Word every 100 documents or so to avoid the crash described above
    if ($counter -gt 100) {
        $counter = 0
        $Word.Quit()
        [System.Runtime.Interopservices.Marshal]::ReleaseComObject($Word) | Out-Null
        $Word = New-Object -ComObject Word.Application
    }
    $counter = $counter + 1
    $filesProcessed = $filesProcessed + 1
}
# Close the last Word instance
$Word.Quit()
This works for me (Word 2007):
$wdFormatPDF = 17
$word = New-Object -ComObject Word.Application
$word.visible = $false
$folderpath = Split-Path -parent $MyInvocation.MyCommand.Path
Get-ChildItem -path $folderpath -recurse -include "*.doc" | % {
$path = ($_.fullname).substring(0,($_.FullName).lastindexOf("."))
$doc = $word.documents.open($_.fullname)
$doc.saveas($path, $wdFormatPDF)
$doc.close()
}
$word.Quit()
None of the solutions posted here worked for me on Windows 8.1 (by the way, I'm using Office 365). My PowerShell somehow does not like the [ref] arguments (I don't know why; I use PowerShell very rarely).
This is the solution that worked for me:
$Files=Get-ChildItem 'C:\path\to\files\*.docx'
$Word = New-Object -ComObject Word.Application
Foreach ($File in $Files) {
$Doc = $Word.Documents.Open($File.FullName)
$Name=($Doc.FullName).replace('docx', 'pdf')
$Doc.SaveAs($Name, 17)
$Doc.Close()
}
I've updated this one to work with the latest Office:
# Get invocation path
$curr_path = Split-Path -parent $MyInvocation.MyCommand.Path
# Create a PowerPoint object
$ppt_app = New-Object -ComObject PowerPoint.Application
#$ppt.visible = $false
# Get all objects of type .ppt? in $curr_path and its subfolders
Get-ChildItem -Path $curr_path -Recurse -Filter *.ppt? | ForEach-Object {
Write-Host "Processing" $_.FullName "..."
# Open it in PowerPoint
$document = $ppt_app.Presentations.Open($_.FullName,0,0,0)
# Create a name for the PDF document; they are stored in the invocation folder!
# If you want them to be created locally in the folders containing the source PowerPoint file, replace $curr_path with $_.DirectoryName
$pdf_filename = "$($curr_path)\$($_.BaseName).pdf"
# Save as PDF -- 32 is the literal value of `ppSaveAsPDF`
#$opt= [Microsoft.Office.Interop.PowerPoint.PpSaveAsFileType]::ppSaveAsPDF
$document.SaveAs($pdf_filename,32)
# Close PowerPoint file
$document.Close()
}
# Exit and release the PowerPoint object
$ppt_app.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($ppt_app)

Powershell - SaveAs function when file already exists

I'm trying to run some code that looks for all .doc & .docx files in a directory & sub-directories and then converts each one to PDF format.
The code below works only if there are no instances of the pdf in these directories i.e. it only works first time. Every subsequent time it fails with:
Exception calling "SaveAs" with "2" argument(s): "Command failed"
At C:\convert\convertword.ps1:12 char:13
+ $doc.saveas <<<< ($path, $wdFormatPDF)
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : ComMethodTargetInvocation
When I delete the previously created PDFs and re-run the PS it works fine. Therefore I can only assume there is a switch or parameter that I'm missing from my SaveAs function which somehow forces the overwrite?
$wdFormatPDF = 17
$word = New-Object -ComObject word.application
$word.visible = $false
$folderpath = "c:\convert\*"
$fileTypes = "*.docx","*doc"
Get-ChildItem -path $folderpath -recurse -include $fileTypes |
foreach-object `
{
$path = ($_.fullname).substring(0,($_.FullName).lastindexOf("."))
"Converting $path to pdf ..."
$doc = $word.documents.open($_.fullname)
$doc.saveas($path, $wdFormatPDF)
$doc.close()
}
$word.Quit()
OK, I think I've finally tracked down the problem. It's the Windows Explorer Preview Pane, which is locking the file. I had the preview pane turned on in the directory where the files were being created and converted. This must have been creating a file lock on the PDFs, so the script could not save the new PDF. I turned off the preview pane in Windows Explorer and the script now works repeatedly! So there was nothing wrong with the PowerShell script, but thanks for all the input, guys. Here's a link to the closest MS KB article that I could find on the subject: http://support.microsoft.com/kb/942146
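If a lock like this is suspected, the script could check the target before saving instead of failing inside SaveAs. A rough sketch, assuming a full path is passed in; Test-FileLocked is a hypothetical helper name, and it only detects locks that make an exclusive open fail with an IOException:

```powershell
# Hypothetical helper: $true if the file exists but cannot be opened exclusively.
# Pass a full path: .NET resolves relative paths against the process directory,
# which is not necessarily the current PowerShell location.
function Test-FileLocked {
    param([Parameter(Mandatory = $true)] [string] $Path)
    if (-not (Test-Path -LiteralPath $Path)) { return $false }
    try {
        $stream = [System.IO.File]::Open($Path, [System.IO.FileMode]::Open, [System.IO.FileAccess]::ReadWrite, [System.IO.FileShare]::None)
        $stream.Close()
        return $false
    }
    catch [System.IO.IOException] {
        # Another process (for example a preview handler) holds the file open
        return $true
    }
}
```

In the loop above, if (Test-FileLocked "$path.pdf") { Write-Warning "locked: $path.pdf"; return } placed before the saveas call would skip that file; other failures, such as a read-only target, are deliberately not caught.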
try this:
$word.displayalerts = $false
$doc.saveas($path, $wdFormatPDF) # with Word2010 I've to use $doc.saveas([ref]$path, [ref]$wdFormatPDF)
$word.displayalerts = $true
No error is raised, but I'm using Word 2010 and can't test it with other versions.
There's no flag to overwrite according to the documentation for SaveAs and SaveAs2. So you could just remove the existing PDF before saving with something like this:
# $path has no extension in the question's script, so the existing PDF is "$path.pdf"
Remove-Item -Path "$path.pdf" -Force -ErrorAction SilentlyContinue
$doc.saveas($path, $wdFormatPDF)