PowerShell Scripting -- Docx to PDF

I have a script (below) that converts docx to PDF. However, after it gets to file 204 or 205, I get a memory exceeded message and the process stops. I have about 40,000 docx that need to be converted. Can someone help with making this more efficient or possibly add a loop that closes the application after every 150 documents then re-opens the application and continues? Any help would be appreciated.
$documents_path = 'C:\Users\jgentile\Desktop\Purdue\DocX\All'
$word_app = New-Object -ComObject Word.Application
$i=0
Get-ChildItem -Path $documents_path -Filter *.doc? | ForEach-Object {
If( $i%150 ) { $word_app.Quit(); $word_app = New-Object -ComObject Word.Application }
$document = $word_app.Documents.Open($_.FullName)
$pdf_filename = "C:\Users\jgentile\Desktop\Purdue\PDFs\$($_.BaseName)_Discipline.pdf"
$document.SaveAs([ref] $pdf_filename, [ref] 17)
$document.Close()
$i++
}
$word_app.Quit()

The way your if statement is set up, the block runs on every iteration except when $i is a multiple of 150 (including 0), which I'm guessing is not what you intended. Also, you should release the COM object after quitting Word to avoid the out-of-memory exception.
If(!($i%150)) {
$word_app.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($word_app)
Remove-Variable word_app
$word_app = New-Object -ComObject Word.Application
}
http://technet.microsoft.com/en-us/library/ff730962.aspx

An if-test needs a true or false value. 1%150 returns 1, 2%150 returns 2, etc., so the value is non-zero and is converted to true. Your if-test will therefore be true on every iteration except when the remainder is 0, which is converted to false. So your script actually works the opposite way of how you want it.
You should test if $i%150 -eq 0. That will only happen when $i is 0, 150, 300, 450 etc. If you don't want it to run on 0, start your $i on 1. :)
Try
If($i%150 -eq 0) { $word_app.Quit(); $word_app = New-Object -ComObject Word.Application }
You could also just flip the if-test(turning false to true) using if(-not($i%150)), but I think -eq 0 is more readable.
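The difference between the three forms of the test can be checked without starting Word at all. This is just a small sketch over the counter values 0 to 450; the variable names are made up for the illustration.

```powershell
# Collect the counter values at which each form of the test fires (0..450).
$truthy  = 0..450 | Where-Object { $_ % 150 }          # original test: any non-zero remainder
$equals0 = 0..450 | Where-Object { $_ % 150 -eq 0 }    # corrected test
$negated = 0..450 | Where-Object { -not ($_ % 150) }   # equivalent negated form

"Original test fires $($truthy.Count) times"        # 447 of the 451 values
"Corrected test fires at: $($equals0 -join ', ')"   # 0, 150, 300, 450
"Negated test fires at: $($negated -join ', ')"     # 0, 150, 300, 450
```

So the original condition restarts Word on almost every document, while the corrected one restarts it once per batch of 150.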
UPDATE: As @Cole9350 pointed out, you should also release the COM object after calling Quit() to free up the handle on the resource (process) so that Word closes properly. Like:
If($i%150 -eq 0) {
$word_app.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($word_app)
$word_app = New-Object -ComObject Word.Application
}
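Putting both answers together, the whole conversion loop might look roughly like the sketch below. It is untested against 40,000 files and makes some assumptions: the function name Convert-WordToPdf and its parameters are invented for this example, the restart is skipped on the very first file, and each document object is released as well as the application (which the answers above do not mention but which follows the same idea).

```powershell
function Convert-WordToPdf {
    param(
        [Parameter(Mandatory)] [string] $SourceDir,   # folder with the .doc/.docx files
        [Parameter(Mandatory)] [string] $TargetDir,   # folder that receives the PDFs
        [int] $BatchSize = 150                        # restart Word after this many files
    )

    $wdFormatPDF = 17   # same value the question passes to SaveAs
    $word = New-Object -ComObject Word.Application
    $i = 0
    try {
        Get-ChildItem -Path $SourceDir -Filter '*.doc?' -File | ForEach-Object {
            # Recycle Word after every full batch, but not before the first file
            if ($i -gt 0 -and $i % $BatchSize -eq 0) {
                $word.Quit()
                $null = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($word)
                $word = New-Object -ComObject Word.Application
            }

            $pdf = Join-Path $TargetDir "$($_.BaseName)_Discipline.pdf"
            $document = $word.Documents.Open($_.FullName)
            try {
                $document.SaveAs([ref] $pdf, [ref] $wdFormatPDF)
            }
            finally {
                # Close without saving changes to the source file, then release the wrapper
                $document.Close($false)
                $null = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($document)
            }
            $i++
        }
    }
    finally {
        # Runs even if a conversion throws, so Word is not left running
        $word.Quit()
        $null = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($word)
    }
}

# Example call, using the paths from the question:
# Convert-WordToPdf -SourceDir 'C:\Users\jgentile\Desktop\Purdue\DocX\All' -TargetDir 'C:\Users\jgentile\Desktop\Purdue\PDFs'
```

The try/finally blocks are a design choice rather than a requirement: they make sure a single bad document does not leave a hidden WINWORD.EXE behind.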

Related

Removing passwords from .Docx files using Powershell

I'm very new to Powershell and been banging my head against this for a while, hopefully someone can point me towards where I am going wrong. I am trying to use Powershell to remove the opening passwords from multiple .docx files in a folder. I can get it to change the password to something else but cannot get it to remove entirely, the part in Bold below is where I am getting tripped up and the error code details are at the bottom, appreciate any help with this!
$path = ("FilePath")
$passwd = ("CurrentPassword")
$counter=1
$WordObj = New-Object -ComObject Word.Application
foreach ($file in $count=Get-ChildItem $path -Filter *.docx) {
$WordObj.Visible = $true
$WordDoc = $WordObj.Documents.Open($file.FullName, $null, $false, $null, $passwd)
$WordDoc.Activate()
$WordDoc.Password=$null
$WordDoc.Close()
Write-Host("Finished: "+$counter+" of "+$count.Length)
$counter++
}
$WordObj.Application.Quit()
Error details - Object reference not set to an instance of an object. At line: 14 char: 5
+ $WordDoc.Password=$Null
+ Category info: Operations Stopped: (:) [], NullReferenceException
+ FullyQualifiedErrorId: System.NullReferenceException
I got an answer elsewhere to try using .unprotect instead but not sure how to insert this into my code!
$path = 'X:\TheFolderWhereTheProtectedDocumentsAre'
$passwd = 'CurrentPassword'
$counter = 0
$WordObj = New-Object -ComObject Word.Application
$WordObj.Visible = $false
# get the .docx files. Make sure this is an array using @()
$documentFiles = @(Get-ChildItem -Path $path -Filter '*.docx' -File)
foreach ($file in $documentFiles) {
try {
# add password twice, first for the document, next for the documents template
$WordDoc = $WordObj.Documents.Open($file.FullName, $null, $false, $null, $passwd, $passwd)
$WordDoc.Activate()
$WordDoc.Password = $null
$WordDoc.Close(-1) # wdSaveChanges, see https://learn.microsoft.com/en-us/office/vba/api/word.wdsaveoptions
$counter++
}
catch {
Write-Warning "Could not open file $($file.FullName):`r`n$($_.Exception.Message)"
}
}
Write-Host "Finished: $counter documents of $($documentFiles.Count)"
# quit Word and dispose of the used COM objects in memory
$WordObj.Quit()
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($WordDoc)
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($WordObj)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
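The reason for wrapping Get-ChildItem in the array subexpression operator @() is that a command returning a single item hands back a scalar, not an array. A small sketch of the effect, using two made-up functions in place of a real folder search:

```powershell
function Get-Nothing { }             # hypothetical stand-in for a search with no matches
function Get-One { 'a.docx' }        # hypothetical stand-in for a search with one match

$bare    = Get-One
$wrapped = @(Get-One)
$empty   = @(Get-Nothing)

$bare    -is [array]   # False: a single result comes back as a scalar
$wrapped -is [array]   # True
$wrapped.Count         # 1
$empty.Count           # 0
```

With the wrapper in place, the "Finished: x documents of y" message gets a sensible count whether the folder holds zero, one or many files.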

Powershell Word SaveAs command errors when run using a service account

This has been driving me nuts for days.... I have a powershell script that converts all .doc files in a target directory to PDF's using Word SaveAs interop.
The script works fine when run within context of the logged in user, but errors with "You cannot call a method on a null-valued expression." when I try to execute the script using a service account (via task scheduler, run as another user)... service account has local admin rights.
The exception occurs at this line: $Doc.SaveAs([ref]$Name.value,[ref]17)
My code is as follows. I'm not the best coder in the world, so any advice would be gratefully received.
thanks.
try
{
$FileSource = 'D:\PROCESSOR\NewArrivals\*.doc'
$SuccessPath = 'D:\PROCESSOR\Success\'
$docextn='.doc'
$Files=Get-ChildItem -path $FileSource
$counter = 0
$filesProcessed = 0
$Word = New-Object -ComObject Word.Application
#check files exist to be processed.
$WordFileCount = Get-ChildItem $FileSource -Filter *$docextn -File| Measure-Object | %{$_.Count} -ErrorAction Stop
If ($WordFileCount -gt 0) {
Foreach ($File in $Files) {
$Name="$(($File.FullName).substring(0, $File.FullName.lastIndexOf("."))).pdf"
$Doc = $Word.Documents.Open($File.FullName)
$Doc.SaveAs([ref]$Name.value,[ref]17)
$Doc.Close()
if ($counter -gt 100) {
$counter = 0
$Word.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($Word)
$Word = New-Object -ComObject Word.Application
}
$counter = $counter + 1
$filesProcessed = $filesProcessed + 1
}
}
$Word.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($Word)
}
catch
{
}
finally
{
}
If you are certain the service account has access to Word, then I think the exception you encounter is in the [ref] while doing the SaveAs().
AFAIK only Office versions below 2010 need [ref], versions above do not.
Next, I think your code can be tidied up somewhat, for instance by releasing the COM objects ($Doc and $Word) inside the finally block, as that is always executed.
Also, there is no need to perform a Get-ChildItem twice.
Something like this:
$SuccessPath = 'D:\PROCESSOR\Success'
$FileSource = 'D:\PROCESSOR\NewArrivals'
$filesProcessed = 0
try {
$Word = New-Object -ComObject Word.Application
$Word.Visible = $false
# get a list of FileInfo objects that have the .doc extension and loop through
Get-ChildItem -Path $FileSource -Filter '*.doc' -File | ForEach-Object {
# change the extension to pdf for the output file
$pdf = [System.IO.Path]::ChangeExtension($_.FullName, '.pdf')
$Doc = $Word.Documents.Open($_.FullName)
# Check version of Office installed. Pre 2010 versions need the [ref]
# Cast to [version] so the comparison is numeric rather than alphabetical
if ([version]$Word.Version -gt [version]'14.0') {
$Doc.SaveAs($pdf, 17)
}
else {
$Doc.SaveAs([ref]$pdf,[ref]17)
}
$Doc.Close($false)
$filesProcessed++
}
}
finally {
# cleanup code
if ($Word) {
$Word.Quit()
$null = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($Doc)
$null = [System.Runtime.InteropServices.Marshal]::ReleaseComObject($Word)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
$Word = $null
}
}
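A side note on version checks of this kind: the result depends on whether the operands are compared as strings or as [version] values. The version strings below are just sample inputs for illustration; for the Office releases most people still run the two approaches should agree, but they diverge for single-digit major versions.

```powershell
'9.0' -gt '14.0'                       # True: string comparison, '9' sorts after '1'
[version]'9.0' -gt [version]'14.0'     # False: compared numerically as major.minor
[version]'16.0' -gt [version]'14.0'    # True
```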
Then, there is the question of $SuccessPath. You never use it. Is it your intention to save the PDF files in that path? If so, change the line
$pdf = [System.IO.Path]::ChangeExtension($_.FullName, '.pdf')
into
$pdf = Join-Path -Path $SuccessPath -ChildPath ([System.IO.Path]::ChangeExtension($_.Name, '.pdf'))
Hope that helps

Powershell Internet Explorer Automation

Trying to get powershell to start different websites at some time intervals.
Here is a script that works:
function IEWeb {
$ie = New-Object -Comobject 'InternetExplorer.Application'
$ie.visible=$true
Do
{
$ie.navigate('http://p-captas02.int.addom.dk/cap-tas-views/Queue.aspx')
start-sleep 15
$ie.navigate('https://oneview.int.addom.dk/dashboard?dashboard_id=1')
start-sleep 15
$ie.navigate('https://oneview.int.addom.dk/dashboard?time=0&scroll_value=15&dashboard_id=10')
start-sleep 15
}
While ($ie.name -contains 'Internet Explorer')
}#Function
The problem is that it does not work every time.
Does anyone know another way of doing it?
It is important that the websites are opened in the same tab.
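I think it would be better to check whether IE has been closed by the user at some point before trying to navigate to the next url. Also, $ie.name is a String, and -contains tests whether a collection holds a given element rather than whether a string holds a substring, so $ie.name -contains 'Internet Explorer' would be wrong.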
Maybe this works better for you.
function IEWeb {
# create an array with the urls you want to revolve
$urls = 'http://p-captas02.int.addom.dk/cap-tas-views/Queue.aspx',
'https://oneview.int.addom.dk/dashboard?dashboard_id=1',
'https://oneview.int.addom.dk/dashboard?time=0&scroll_value=15&dashboard_id=10'
$ie = New-Object -Comobject 'InternetExplorer.Application'
$ie.visible=$true
$index = 0
while ($ie.HWND) { # for as long as the user does not close IE
$ie.navigate($urls[$index])
Start-Sleep 15
# increment the array counter, and revert to index 0 if $urls length is reached
$index = ($index + 1) % $urls.Count
}
try {
# close and release the Com object from memory
$ie.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($ie) | Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
}
catch {}
}
IEWeb
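Both points from this answer can be tried out without opening a browser. In the sketch below the window name is only an assumed example value (the real $ie.Name may differ between IE versions), and the three letters stand in for the URLs.

```powershell
$name = 'Windows Internet Explorer'   # hypothetical value of $ie.Name

$name -contains 'Internet Explorer'   # False: -contains tests collection membership, not substrings
$name -like '*Internet Explorer*'     # True: wildcard substring match
$name -match 'Internet Explorer'      # True: regex match

# The index arithmetic used above to cycle through the URLs
$urls  = 'a', 'b', 'c'                # placeholder values
$index = 0
$visited = foreach ($step in 1..5) {
    $urls[$index]
    $index = ($index + 1) % $urls.Count
}
$visited -join ' '                    # a b c a b
```

If the name happens to be exactly 'Internet Explorer', -contains does return True, which may explain why the original loop worked some of the time.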

Powershell with Word CheckSpelling and selected dictionary

I would like to spell check some strings using the Microsoft Word API in Powershell and a specific dictionary ("English (US)").
I use the following code to do the checking but it does not seem to take into account the dictionary I want. Any ideas what is wrong? Also, command "New-Object -COM Word.Dictionary" seems to fail.
$word = New-Object -COM Word.Application
$dictionary = New-Object -COM Word.Dictionary
foreach ($language in $word.Languages) {
if ($language.Name.Contains("English (US)")) {
$dictionary = $language.ActiveSpellingDictionary;
break;
}
}
Write-Host $dictionary.Name
$check = $word.CheckSpelling("Color", [ref]$null, [ref]$null, [ref]$dictionary)
if(!$check) {
Write-Host "Spelling Error!"
}
$word.quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($word) | Out-Null
Remove-Variable word
The COM object Word.Dictionary does not exist (at least not on my machine). Here is what worked for me in the short test I did:
$dic = New-Object -COM Scripting.Dictionary #credits to MickyB
$w = New-Object -COM Word.Application
$w.Languages | % {if($_.Name -eq "English (US)"){$dic=$_.ActiveSpellingDictionary}}
$w.checkSpelling("Color", [ref]$null, [ref]$null, [ref]$dic)
Another possibility in addition to Paul's:
$dictionary = New-Object -COM Scripting.Dictionary
I was experiencing the same problem, that the Word.Application.checkSpelling() method seems to ignore any dictionary passed to it. I worked around the issue creating a Word document, defining a text range, changing the LanguageID of this range to the language I want to proof read against, and then inspecting detected spelling errors. Here is the code:
<#Function which helps to pick the language#>
function FindLanguage($language_name){
foreach($element in $Word.Languages){
if($element.Name -eq $language_name){
$element.Name
return $element
}
}
}
$Proofread_text = "The lazie frog jumpss over over the small dog."
$Word = New-Object -COM Word.Application
$Document = $Word.Documents.Add()
$Textrange = $Document.Range(0)
$english = FindLanguage("English (US)")
$Textrange.LanguageID = $english.ID
$Textrange.InsertAfter($Proofread_text)
<#Handle misspelled words here#>
foreach($spell_error in $textrange.SpellingErrors){
Write-Host $spell_error.Text
}
$Document.Close(0)
$Word.Quit()
The output will be:
>>lazie
>>jumpss
>>over
I found it helpful to disable the language autodetection in word before starting the script. Especially when you plan to switch the language.
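One thing to be aware of in the FindLanguage helper above: in PowerShell everything a function emits becomes part of its return value, so the bare $element.Name line is returned together with $element. The script presumably still works because property access on the resulting two-element array skips the string, which has no ID property. A sketch with plain objects instead of Word's language list (the function names and the sample objects are invented for the illustration):

```powershell
function Find-Noisy {
    param($Items, $Name)
    foreach ($item in $Items) {
        if ($item.Name -eq $Name) {
            $item.Name      # emitted to the output stream as well
            return $item
        }
    }
}

function Find-Quiet {
    param($Items, $Name)
    foreach ($item in $Items) {
        if ($item.Name -eq $Name) { return $item }
    }
}

# Stand-ins for the entries of $Word.Languages
$langs = [pscustomobject]@{ Name = 'English (US)'; ID = 1033 },
         [pscustomobject]@{ Name = 'German';       ID = 1031 }

@(Find-Noisy $langs 'English (US)').Count   # 2: the name string plus the object
@(Find-Quiet $langs 'English (US)').Count   # 1: just the object
(Find-Noisy $langs 'English (US)').ID       # 1033: only the object has an ID property
```

Dropping the extra line, or piping it to Write-Host if it was meant as a progress message, keeps the return value a single object.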

How to Open; SaveAs; then Close an Excel 2013 (macro-enabled) workbook from PowerShell4

Doing a search on the above Com operations yields links dating to '09 and even earlier. Perhaps it hasn't changed but I find myself bumping up against errors where 'it is being used by another process.' - even though no Excel app is open on my desktop. I have to reboot to resume.
To be clear - I'm trying to open an existing file; immediately SaveAs() (that much works), add a sheet, Save(); Close() - and then, importantly, repeat that cycle. In effect, I'm creating a few dozen new sheets within a loop that executes the above 'Open Master; SaveAs(); Edit Stuff; Save; Close;' sequence.
From the examples I've seen this is not a typical workflow for PowerShell. Pasted at the very bottom is my provisional script - pretty rough and incomplete but things are opening what they need to open and adding sheet also works - until I know I have the right way to cleanly close stuff out I'm not worried about the iterations.
I've found a couple examples that address closing:
From http://theolddogscriptingblog.wordpress.com/2012/06/07/get-rid-of-the-excel-com-object-once-and-for-all/
$x = New-Object -com Excel.Application
$x.Visible = $True
Start-Sleep 5
$x.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($x)
Remove-Variable x
And from http://social.technet.microsoft.com/Forums/windowsserver/en-US/24e57b61-e792-40c1-8aff-b0a8205f48ab/updated-opened-excel-using-powershell?forum=winserverpowershell
Set-ItemProperty $path -name IsReadOnly -value $false
$Excel.ActiveWorkBook.Save()
$openfile.Close()
$openfile = $null
$Excel.Quit()
$Excel = $null
[GC]::Collect()
function MakeNewBook($theWeek, $AffID){
$ExcelFile = "C:\csv\InvoiceTemplate.xlsm"
$Excel = New-Object -Com Excel.Application
$Excel.Visible = $True
$Workbook = $Excel.Workbooks.Open($ExcelFile)
$theWeek = $theWeek -replace "C:\\csv\\", ""
$theWeek = $theWeek -replace "\.csv", ""
$theWeek = "c:\csv\Invoices\" +$AffID +"_" + $theWeek + ".xlsm"
$SummaryWorksheet = $Workbook.worksheets.Item(1)
$Workbook.SaveAs($theWeek)
return $Excel
}
function MakeNewSheet($myBook, $ClassID){
$SheetName = "w"+$ClassID
#$Excel = New-Object -Com Excel.Application
#$Excel.Visible = $True
$wSheet = $myBook.WorkSheets.Add()
}
function SaveSheet ($myExcel)
{
#$WorkBook.EntireColumn.AutoFit()
#Set-ItemProperty $path -name IsReadOnly -value $false
$myExcel.ActiveWorkBook.Save()
$openfile= $myExcel.ActiveWorkBook
$openfile.Close()
$openfile = $null
$myExcel.Quit()
$myExcel = $null
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($myExcel)
Remove-Variable $myExcel
[GC]::Collect()
}
$theWeek = "C:\csv\wkStart2013-11-04.csv"
$x = Import-Csv $theWeek
foreach ($xLine in $x){
if ($x[0]){
$AffID = $x[1].idAffiliate
$myExcel = MakeNewBook $theWeek $x[1].idAffiliate
$ClassID = $x[1].idClass
MakeNewSheet $myExcel $ClassID
continue
}
SaveSheet($myExcel)
$AffID = $_.$AffID
$wID = $xLine.idClass
#MakeNewSheet($wID)
Echo "$wID"
}
As a follow up after playing around with this issue myself. I geared my solution around Ron Thompson's comment minus the function calls:
# collect excel process ids before and after new excel background process is started
$priorExcelProcesses = Get-Process -name "*Excel*" | % { $_.Id }
$Excel = New-Object -ComObject Excel.Application
$postExcelProcesses = Get-Process -name "*Excel*" | % { $_.Id }
# your code here (probably looping through the Excel document in some way)
# try to gently quit
$Excel.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($Excel)
# otherwise allow stop-process to clean up
$postExcelProcesses | ? { $priorExcelProcesses -eq $null -or $priorExcelProcesses -notcontains $_ } | % { Stop-Process -Id $_ }
My experience has been that the Quit method doesn't work well, especially when looping. When you get the error, instead of rebooting, open up Task Manager and look at the Processes tab. I'm willing to bet you'll see Excel still open -- maybe even multiple instances of it. I solved this problem by using Get-Process to find all instances of Excel and piping them to Stop-Process. Doesn't seem like that should be necessary, but it did the trick for me.
You should not have to keep track of processes and kill them off.
My experience has been that to properly and completely close Excel (including in loops), you also need to release COM references. In my own testing I have found that removing the variable for Excel also ensures that no references remain which would keep Excel.exe open (like if you are debugging in the ISE).
Without performing the above, if you look in Task Manager, you may see Excel still running...in some cases, many copies.
This has to do with how the COM object is wrapped in a "runtime callable wrapper".
Here is the skeleton code that should be used:
$excel = New-Object -ComObject Excel.Application
$excel.Visible = $true
$workbook = $excel.Workbooks.Add()
# or $workbook = $excel.Workbooks.Open($xlsxPath)
# do work with Excel...
$workbook.SaveAs($xlsxPath)
$excel.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($excel)
# no $ needed on variable name in Remove-Variable call
Remove-Variable excel
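The comment about the missing $ matters more than it looks: the script in the question calls Remove-Variable $myExcel, which passes the variable's value as the name to remove. A quick sketch of the difference, with a plain string standing in for the COM object:

```powershell
$excel = 'placeholder'              # stands in for the COM object in this sketch

Remove-Variable excel               # correct: pass the name, without the $
Test-Path variable:excel            # False: the variable is gone

$excel = 'placeholder'
try {
    # Wrong: this passes the variable's *value* as the name to remove
    Remove-Variable $excel -ErrorAction Stop
}
catch {
    "Failed: $($_.Exception.Message)"
}
Test-Path variable:excel            # True: $excel itself was never removed
```

Note also that the question's SaveSheet function sets $myExcel to $null before calling ReleaseComObject on it, so by then there is nothing left to release; the release call should come first, as in the skeleton above.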
Try this
$filePath = "E:\TSMBackup\TSMDATAPull\ODCLXTSM01_VM.xlsx"
$excelObj = New-Object -ComObject Excel.Application
$excelObj.Visible = $true
$workBook = $excelObj.Workbooks.Open($filePath)
$workSheet = $workBook.Sheets.Item("Sheet1")
$workSheet.Select()
$workBook.RefreshAll()
$workBook.Save()
$excelObj.Quit()