Objective: find all Word documents in a specific folder, change the date inside each of them, and then convert all of the files to PDF.
(Background: I have about 100 files containing dates that all need to be made uniform. I then need to convert all of the files to PDF.)
Here's what I have so far:
# Folder to scan: here, the folder that contains this script
$documents_path = Split-Path -Parent $MyInvocation.MyCommand.Path
$word_app = New-Object -ComObject Word.Application
# This filter will find .doc as well .docx documents
Get-ChildItem -Path $documents_path -Filter *.doc? | ForEach-Object {
$document = $word_app.Documents.Open($_.FullName)
$pdf_filename = "$($_.DirectoryName)\$($_.BaseName).pdf"
$document.SaveAs([ref] $pdf_filename, [ref] 17)
$document.Close()
}
$word_app.Quit()
Related
How do I open multiple HTML files (tabular format) and save them in Excel XLSX format from Windows PowerShell ISE? Simply renaming the file extension removes all the formatting. The code below worked with a single file; I need help with the looping part.
$FolderPath = 'C:\Users\abcd\Desktop\New folder'
$FilePaths = get-childitem $FolderPath -recurse | where {$_.extension -eq ".html"}
foreach($FilePath in $FilePaths)
{
$Workbook = $Excel.Workbooks.Open($FilePath)
$Excel.Visible = $true
$Excel.DisplayAlerts = $False
$OutFile = 'C:\Users\abcd\Desktop\New folder\...xlsx' #Need same file names
$xlSLSXType = 51
$workBook.SaveAs("$OutFile",$xlSLSXType)
}
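One way to handle the looping part, as a rough sketch: derive each output name from the input file, so every .html file gets an .xlsx file with the same base name in the same folder. The function names Get-XlsxPath and Convert-HtmlToXlsx are hypothetical, the COM portion assumes Excel is installed, and 51 is the same file-format value already used in the question.

```powershell
# Hypothetical helper: same folder and base name, but with an .xlsx extension.
function Get-XlsxPath {
    param([Parameter(Mandatory)][string]$Path)
    [System.IO.Path]::ChangeExtension($Path, '.xlsx')
}

# Hypothetical wrapper: opens every .html file under $FolderPath in Excel
# and saves it next to the original as .xlsx.
function Convert-HtmlToXlsx {
    param([Parameter(Mandatory)][string]$FolderPath)

    $excel = New-Object -ComObject Excel.Application
    $excel.Visible = $false
    $excel.DisplayAlerts = $false
    try {
        Get-ChildItem -Path $FolderPath -Recurse -Filter '*.html' | ForEach-Object {
            # Pass the full path string, not the FileInfo object
            $workbook = $excel.Workbooks.Open($_.FullName)
            $workbook.SaveAs((Get-XlsxPath -Path $_.FullName), 51)
            $workbook.Close($false)
        }
    }
    finally {
        # Always shut Excel down, even if one file fails
        $excel.Quit()
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($excel)
    }
}
```

It could then be called as Convert-HtmlToXlsx -FolderPath 'C:\Users\abcd\Desktop\New folder'.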
I found this script (https://gist.github.com/mp4096/1a2279ec7b3dfec659f58e378ddd9aee), which batch-converts PowerPoint files to PDFs and saves them in the folder the script is run from.
However, what if I want to save them in a matching directory tree, with the parent folder 'Powerpoint' swapped for 'PDF'?
Suppose the tree of dirs looks something like this:
/Parent_dir/Powerpoint/A_1/B/p1.pptx
/Parent_dir/Powerpoint/A/p1.pptx
I then want to save them into the same tree, but under the folder "PDF" instead (all the directories already exist but are currently empty):
/Parent_dir/PDF/A_1/B/p1.pdf
/Parent_dir/PDF/A/p1.pdf
I tried playing around with $curr_path, but I would have to build the path inside the Get-ChildItem loop and I'm not sure how to do that.
# Batch convert all .ppt/.pptx files encountered in folder and all its subfolders
# The produced PDF files are stored in the invocation folder
#
# Adapted from http://stackoverflow.com/questions/16534292/basic-powershell-batch-convert-word-docx-to-pdf
# Thanks to MFT, takabanana, ComFreek
#
# If PowerShell exits with an error, check if unsigned scripts are allowed in your system.
# You can allow them by calling PowerShell as an Administrator and typing
# ```
# Set-ExecutionPolicy Unrestricted
# ```
# Get invocation path
$curr_path = Split-Path -parent $MyInvocation.MyCommand.Path
# Create a PowerPoint object
$ppt_app = New-Object -ComObject PowerPoint.Application
# Get all objects of type .ppt? in $curr_path and its subfolders
Get-ChildItem -Path $curr_path -Recurse -Filter *.ppt? | ForEach-Object {
Write-Host "Processing" $_.FullName "..."
# Open it in PowerPoint
$document = $ppt_app.Presentations.Open($_.FullName)
# Create a name for the PDF document; they are stored in the invocation folder!
# If you want them to be created locally in the folders containing the source PowerPoint file, replace $curr_path with $_.DirectoryName
$pdf_filename = "$($curr_path)\$($_.BaseName).pdf"
# Save as PDF -- `ppSaveAsPDF` has the literal value 32 (17 is Word's `wdFormatPDF`)
$opt= [Microsoft.Office.Interop.PowerPoint.PpSaveAsFileType]::ppSaveAsPDF
$document.SaveAs($pdf_filename, $opt)
# Close PowerPoint file
$document.Close()
}
# Exit and release the PowerPoint object
$ppt_app.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($ppt_app)
There are of course several ways to handle your use case. The below is just one example.
$curr_path = Split-Path -parent $MyInvocation.MyCommand.Path
$ValidatePath = If (-Not (Test-Path -Path $curr_path))
{(New-Item -Path $curr_path -ItemType Directory).FullName}
Else {$curr_path}
$ppt_app = New-Object -ComObject PowerPoint.Application
Get-ChildItem -Path $ValidatePath -Recurse -Filter '*.ppt?' |
ForEach-Object {
Write-Host "Processing $($PSItem.FullName) '...'"
$document = $ppt_app.Presentations.Open($PSItem.FullName)
$pdf_filename = "$($curr_path)\$($PSItem.BaseName).pdf"
$opt= [Microsoft.Office.Interop.PowerPoint.PpSaveAsFileType]::ppSaveAsPDF
$document.SaveAs($pdf_filename, $opt)
$document.Close()
}
$ppt_app.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($ppt_app)
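To mirror the directory tree under a different parent folder, the target path has to be built per file inside the loop. The following is a minimal sketch of that idea, not a tested drop-in: Get-PdfTargetPath and Convert-PptTreeToPdf are hypothetical names, both roots are assumed to be full paths, the source file is assumed to live under the source root, and the target directories are assumed to exist already, as stated in the question.

```powershell
# Hypothetical helper: maps <SourceRoot>\sub\dir\file.pptx to <TargetRoot>\sub\dir\file.pdf
function Get-PdfTargetPath {
    param(
        [Parameter(Mandatory)][string]$SourceFile,
        [Parameter(Mandatory)][string]$SourceRoot,
        [Parameter(Mandatory)][string]$TargetRoot
    )
    # Part of the path below the source root, e.g. 'A_1\B\p1.pptx'
    $root = $SourceRoot.TrimEnd('\', '/')
    $relative = $SourceFile.Substring($root.Length).TrimStart('\', '/')
    # Re-root it under the target folder and swap the extension
    $target = Join-Path -Path $TargetRoot -ChildPath $relative
    [System.IO.Path]::ChangeExtension($target, '.pdf')
}

# Hypothetical wrapper around the conversion loop from the script above.
function Convert-PptTreeToPdf {
    param(
        [Parameter(Mandatory)][string]$SourceRoot,
        [Parameter(Mandatory)][string]$TargetRoot
    )
    $ppt_app = New-Object -ComObject PowerPoint.Application
    try {
        Get-ChildItem -Path $SourceRoot -Recurse -Filter '*.ppt?' | ForEach-Object {
            $pdf_filename = Get-PdfTargetPath -SourceFile $_.FullName `
                -SourceRoot $SourceRoot -TargetRoot $TargetRoot
            $document = $ppt_app.Presentations.Open($_.FullName)
            $document.SaveAs($pdf_filename, 32)   # 32 = ppSaveAsPDF
            $document.Close()
        }
    }
    finally {
        $ppt_app.Quit()
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($ppt_app)
    }
}
```

With the tree from the question, the call would look something like Convert-PptTreeToPdf -SourceRoot 'C:\Parent_dir\Powerpoint' -TargetRoot 'C:\Parent_dir\PDF', where the drive letter is only an example.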
I am trying to create a PowerShell script that performs a few steps:
I put an .xlsx file in a specific folder, and the script converts it to CSV. So far I have this:
$ErrorActionPreference = 'Stop'
Function Convert-CsvInBatch
{
[CmdletBinding()]
Param
(
[Parameter(Mandatory=$true)][String]$Folder
)
$ExcelFiles = Get-ChildItem -Path $Folder -Filter *.xlsx -Recurse
$excelApp = New-Object -ComObject Excel.Application
$excelApp.DisplayAlerts = $false
$ExcelFiles | ForEach-Object {
$workbook = $excelApp.Workbooks.Open($_.FullName)
$csvFilePath = $_.FullName -replace "\.xlsx$", ".csv"
$workbook.SaveAs($csvFilePath, [Microsoft.Office.Interop.Excel.XlFileFormat]::xlCSV)
$workbook.Close()
}
# Release Excel Com Object resource
$excelApp.Workbooks.Close()
$excelApp.Visible = $true
Start-Sleep 5
$excelApp.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($excelApp) | Out-Null
}
#
# 0. Prepare the folder path which contains all excel files
$FolderPath = "C:\exacthpath"
Convert-CsvInBatch -Folder $FolderPath
The columns are still present in the resulting file, so I want to remove them and separate the values with a ';' instead, like this:
H;1;43;185;
At this point I'm stuck. I can import the file into PowerShell like this:
Import-Csv -Path 'C:\folder\filename.csv' | ForEach-Object {
$_
}
This is the output I get. The most important change is needed in the first row only:
H;1;43;185;
This should be modified into:
H;01;43;185
The rest should be left untouched.
Afterwards I need to export it back into a CSV file, like this:
Export-Csv -Path 'C:\folder\modified_filename.csv'
But this whole process should be combined into one single PowerShell script that performs the above steps on its own. So in short, the script:
identifies any .xlsx file, regardless of its name
converts it into .csv
modifies the layout of the document so that the columns are separated with a ";"
modifies the first line to read 'H;01;43;185' (this is a static line, it will always look like this)
saves the created file as a final .csv file
Can you help me combine and optimize the above scripts so that PowerShell performs the modification too? Here is example content of such a file in its final form (it usually contains more than 1000 lines):
H;01;43;185
D;111;3;1042;2
D;222;3;1055;3
D;333;3;1085;1
T;3;;;
Any help is highly appreciated.
Regards,
Armin
If, as you say in your comment, your Excel already creates a CSV with the semicolon as delimiter, you can do this inside the loop, just below $workbook.Close():
# read the file created by Excel as string array
$data = Get-Content $csvFilePath
# overwrite the file with just the new header
Set-Content -Path $csvFilePath -Value 'H;01;43;185'
# add the rest of the data to the file
# (Select-Object -Skip 1 also behaves correctly when the file has only one line)
$data | Select-Object -Skip 1 | Add-Content -Path $csvFilePath
P.S. I would delete the lines
$excelApp.Visible = $true
Start-Sleep 5
because I don't see the need to have Excel show itself and pause the function for 5 seconds. Instead, keep Excel hidden so that it works a lot faster, by adding
$excelApp.Visible = $false
right after you have created $excelApp.
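The header replacement described above can also be wrapped in a small reusable function. This is only a sketch: the name Set-FirstLine is hypothetical, and it assumes the file is small enough to read into memory in one go.

```powershell
# Hypothetical helper: replaces the first line of a text file and keeps the rest.
function Set-FirstLine {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$Header
    )
    # Read everything first so the file can be overwritten safely afterwards
    $lines = @(Get-Content -Path $Path)
    # New header followed by every original line except the first one
    $output = @($Header) + @($lines | Select-Object -Skip 1)
    Set-Content -Path $Path -Value $output
}
```

Inside the conversion loop it could be called as Set-FirstLine -Path $csvFilePath -Header 'H;01;43;185' right after the workbook has been closed.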
I have to examine all of the .docx files in a folder and display the names of the files that contain a word I pass in as a parameter. How can I do this in PowerShell?
Try something like this:
#Instance of word
$Word = New-Object -ComObject Word.Application
$Word.visible = $False
#take list of .docx
Get-ChildItem "c:\temp" -file -Filter "*.docx" | %{
$Filename=$_.FullName
#open file and take content of word file
$Document=$Word.documents.open($Filename, $false, $true)
$range = $document.content
#if content have your word, print path of word file
If($range.Text -like "*tot*"){
$Filename
}
$word.Documents.Close($false)
}
#quit Word so that no hidden WINWORD process is left running
$Word.Quit()
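Since the question asks for the search word to be a parameter, here is a rough sketch that takes one. Note that it swaps in a different technique from the answer above: instead of automating Word, it reads word/document.xml straight out of the .docx file (which is a zip archive) using .NET's System.IO.Compression, so Word does not need to be installed. The function name Find-DocxContainingWord is hypothetical. The match is a simple case-insensitive substring search on the text with XML tags stripped, so XML entities are not decoded and paragraph boundaries are ignored.

```powershell
# Needed on Windows PowerShell 5.1 to make the ZipFile class available
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: outputs the full path of every .docx in $Folder
# whose main document text contains $Word.
function Find-DocxContainingWord {
    param(
        [Parameter(Mandatory)][string]$Folder,
        [Parameter(Mandatory)][string]$Word
    )
    Get-ChildItem -Path $Folder -File -Filter '*.docx' | ForEach-Object {
        $zip = [System.IO.Compression.ZipFile]::OpenRead($_.FullName)
        try {
            $entry = $zip.GetEntry('word/document.xml')
            if ($null -eq $entry) { return }   # not a regular Word document
            $reader = New-Object System.IO.StreamReader($entry.Open())
            try { $xml = $reader.ReadToEnd() } finally { $reader.Dispose() }
        }
        finally {
            $zip.Dispose()
        }
        # Strip XML tags so a word split across formatting runs still matches
        $text = $xml -replace '<[^>]+>', ''
        if ($text.IndexOf($Word, [System.StringComparison]::OrdinalIgnoreCase) -ge 0) {
            $_.FullName
        }
    }
}
```

For the example above, the call would be Find-DocxContainingWord -Folder 'c:\temp' -Word 'tot'.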
I am using PowerShell to loop through designated folders in Outlook and save the attachments in a tree-like structure. This works wonders, but now management has requested that the email itself be saved as a PDF as well. I found the PrintOut method on the object, but that prompts for a file name. I haven't been able to figure out what to pass to it so that it automatically saves to a specific file name. I looked at the MSDN page, but it was a bit too advanced for my current level.
I am using the Outlook.Application COM object.
Short of saving all of the emails to a temp file and using a third-party method, are there parameters I can pass to PrintOut? Or is there another way to accomplish this?
Here is the base of the code that gets the emails. I loop through $Emails:
$Outlook = New-Object -comobject outlook.application
$Connection = $Outlook.GetNamespace("MAPI")
#Prompt which folder to process
$Folder = $Connection.PickFolder()
$Outlook_Folder_Path = ($Folder.FullFolderPath).Split("\",4)[3]
$BaseFolder += $Outlook_Folder_Path + "\"
$Emails = $Folder.Items
It looks like there are no built-in methods, but if you're willing to use a third-party binary, wkhtmltopdf can be used.
Get the precompiled binary (use the MinGW 32-bit build for maximum compatibility).
Install it, or extract the installer with 7-Zip, and copy wkhtmltopdf.exe to your script directory. It has no external dependencies and can be redistributed with your script, so you don't have to install a PDF printer on all PCs.
Use the HTMLBody property of the MailItem object in your script for the PDF conversion.
Here is an example:
# Get path to wkhtmltopdf.exe
$ExePath = Join-Path -Path (
Split-Path -Path $Script:MyInvocation.MyCommand.Path
) -ChildPath 'wkhtmltopdf.exe'
# Set PDF path
$OutFile = Join-Path -Path 'c:\path\to\emails' -ChildPath ($Email.Subject + '.pdf')
# Convert HTML string to PDF file
$ret = $Email.HTMLBody | & $ExePath @('--quiet', '-', $OutFile) 2>&1
# Check for errors
if ($LASTEXITCODE) {
Write-Error $ret
}
Please note that I have no experience with Outlook and used MSDN to find the relevant properties of the object, so the code might need some tweaking.
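One tweak that is likely to be needed: email subjects often contain characters that are not allowed in file names (for example the ':' in 'RE: ...' on Windows), which would break the $OutFile path built from $Email.Subject. A minimal sketch of a sanitizing step follows. The function name ConvertTo-SafeFileName, the '_' replacement character and the 'untitled' fallback for empty subjects are all assumptions made for illustration.

```powershell
# Hypothetical helper: replaces characters that are invalid in file names.
function ConvertTo-SafeFileName {
    param(
        [Parameter(Mandatory)][AllowEmptyString()][string]$Name,
        [string]$Replacement = '_'
    )
    $safe = $Name
    # The set of invalid characters depends on the operating system
    foreach ($c in [System.IO.Path]::GetInvalidFileNameChars()) {
        $safe = $safe.Replace([string]$c, $Replacement)
    }
    # Assumed fallback for emails without a subject
    if ([string]::IsNullOrWhiteSpace($safe)) { $safe = 'untitled' }
    $safe
}
```

The PDF path could then be built with -ChildPath ((ConvertTo-SafeFileName -Name $Email.Subject) + '.pdf').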
I had this same issue. This is what I did to fix it, in case anybody else is trying to do something similar.
You could start by converting your .msg file to .doc and then converting the .doc file to PDF.
$outlook = New-Object -ComObject Outlook.Application
$word = New-Object -ComObject Word.Application
Get-ChildItem -Path $folderPath -Filter *.msg | ForEach-Object {
$msgFullName = $_.FullName
$docFullName = $msgFullName -replace '\.msg$', '.doc'
$pdfFullName = $msgFullName -replace '\.msg$', '.pdf'
$msg = $outlook.CreateItemFromTemplate($msgFullName)
$msg.SaveAs($docFullName, 4)
$doc = $word.Documents.Open($docFullName)
$doc.SaveAs([ref] $pdfFullName, [ref] 17)
$doc.Close()
}
Then just clean up the unwanted .doc files afterwards.