Save email body to HTML file with PowerShell

I am trying to convert a folder full of MSG files to HTML files. I have a script that gets most of the way there, but instead of displaying the text in PowerShell I need it to save each message as an individual HTML file. For some reason I can't get the save to work. I've tried various options like Out-File and $body.SaveAs([ref][system.object]$name, [ref]$saveFormat).
$saveFormat = [Microsoft.Office.Interop.Outlook.olSaveAsType]::olFormatHTML
Get-ChildItem "C:\MSG\" -Filter *.msg |
ForEach-Object {
$body = ""
$outlook = New-Object -comobject outlook.application
$msg = $outlook.Session.OpenSharedItem($_.FullName)
$body = $msg | Select body | ft -AutoSize
}
Any advice on how to save this as individual files would be great.

To start with, you should not capture the output of a Format-* cmdlet in a variable. Those are designed to output to something (screen, file, etc).
Ok, that aside, you are already opening the msg files, so you just need to determine a name and then output the HTMLBody property for each file. Easiest way would be to just tack .htm to the end of the existing name.
# Create the Outlook COM object once, outside the loop
$outlook = New-Object -ComObject Outlook.Application
Get-ChildItem "C:\MSG\*" -Filter *.msg |
    ForEach-Object {
        $msg = $outlook.Session.OpenSharedItem($_.FullName)
        # Append .htm to the existing name, e.g. mail.msg -> mail.msg.htm
        $OutputFile = $_.FullName + '.htm'
        $msg.HTMLBody | Set-Content $OutputFile
    }
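Appending .htm leaves names such as mail.msg.htm. If a plain mail.htm is preferred, the .NET method [System.IO.Path]::ChangeExtension can swap the extension instead. A small sketch; the sample path is made up for illustration:

```powershell
# Hypothetical sample path, used only for illustration
$msgPath = 'C:\MSG\mail.msg'

# Replace the final extension instead of appending a second one
$htmlPath = [System.IO.Path]::ChangeExtension($msgPath, '.htm')
$htmlPath
```

Inside the loop this would become $OutputFile = [System.IO.Path]::ChangeExtension($_.FullName, '.htm').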

Related

Powershell - Retrieving content from a website that contains non-English characters and writing to a file results in incorrect characters

There is a news website I frequent that has a series of headlines on their main page. Clicking the headline takes you to the individual story. I am trying to write a Powershell script that will loop through all the headlines on the main page and write each story to a text file.
The problem I am having is the stories are in Spanish and the Spanish characters with accent marks do not show up properly in my text file (actually the weird thing is, sometimes they do, but the majority of the time they don't). I've checked the headers of each story and the charset is set to UTF8 so I think the web pages themselves are formatted correctly. I've tried every way I know of to set the output file as UTF8 as well, but I can't seem to get it fixed.
Anyone have any ideas? Here is the code:
$ie = New-Object -ComObject 'InternetExplorer.Application'
$url = "https://www3.nhk.or.jp/nhkworld/es/news/"
#$ie.Visible = $true
$ie.Navigate($url)
while($ie.busy) {Start-Sleep 1}
$file = "C:\temp\nhk.txt"
if(Test-Path $file) { Remove-Item $file }
$lastLink = $null
foreach($link in $ie.Document.getElementsByTagName("a")) {
if($link.href -match "\d{6}") { #the links to the stories we want are numbered with 6 digits
if(-not($link.href -eq $lastLink)) {
$uri = $link.href
$w = Invoke-WebRequest -Uri $uri
ForEach($element in $w.AllElements | where tagname -eq "p") {
$text = $element | select -expand innerText
$text = $text + "`r`n"
Add-Content -Path $file -Value $text
}
$lastLink = $link.href
}
}
}
I think it's the same basic problem as this question:
PowerShell Invoke-RestMethod Umlauts issues with UTF-8 and Windows-1252
The issue is the server is sending a response which is encoded using UTF8, but it's not correctly setting the Content-Type header to tell the client it's doing that, so the client is assuming it's encoded with the default ISO-8859-1 encoding.
This means, for example, that the character ó is sent by the server as the UTF-8 byte sequence C3 B3, but the client decodes those bytes as ISO-8859-1, which yields Ã³.
Since you presumably can't control the server's behaviour, you might need to do some processing on the mangled text to recover the original version. I posted one way of doing that in an answer to the question above (see https://stackoverflow.com/a/58542493/3156906), but here it is again:
PS> $text = "Ã³"
PS> $bytes = [System.Text.Encoding]::GetEncoding("ISO-8859-1").GetBytes($text)
PS> $text = [System.Text.Encoding]::UTF8.GetString($bytes)
PS> write-host $text
ó
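If this has to be applied to every paragraph, the same two steps can be wrapped in a function. This is only a sketch: the name Repair-Utf8Mojibake is made up, and it assumes the input really is UTF-8 that was wrongly decoded as ISO-8859-1. Text that was decoded correctly should not be passed through it, because characters outside ISO-8859-1 would be lost. The mangled sample is built from character codes so the example does not depend on how the script file itself is encoded.

```powershell
function Repair-Utf8Mojibake {
    param(
        [Parameter(Mandatory)]
        [string]$Text
    )
    # Turn the wrongly decoded characters back into the original bytes...
    $latin1 = [System.Text.Encoding]::GetEncoding('ISO-8859-1')
    $bytes  = $latin1.GetBytes($Text)
    # ...and decode those bytes as UTF-8, which is what the server actually sent
    [System.Text.Encoding]::UTF8.GetString($bytes)
}

# U+00C3 U+00B3 is what the UTF-8 bytes C3 B3 look like when read as ISO-8859-1
$mangled = -join [char[]](0xC3, 0xB3)
$fixed   = Repair-Utf8Mojibake -Text $mangled
$fixed
```

In the question's loop, this would be called on $text before Add-Content.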
Did some more experimenting, and apparently, if you send the result of the Invoke-WebRequest call straight to a file using the -OutFile parameter, this file gets written in UTF8.
This should then (hopefully) do it:
# create a temporary file
$tempFile = (New-TemporaryFile).FullName
if(-not($link.href -eq $lastLink)) {
$uri = $link.href
# download the page straight to the temp file (a single request is enough)
Invoke-WebRequest -Uri $uri -OutFile $tempFile
# read the file with encoding UTF8
$content = Get-Content -Path $tempFile -Raw -Encoding UTF8
# parse the html
$html = New-Object -Com "HTMLFile"
$html.IHTMLDocument2_write($content)
# and append the innerText to your file "C:\temp\nhk.txt"
Add-Content -Path $file -Value $html.body.innerText
Add-Content -Path $file -Value '' # add extra empty line
# clean up
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($html) | Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
$html = $null
$lastLink = $link.href
}
# remove the temp file
Remove-Item -Path $tempFile -Force

How to get text from a specific cell in a PDF file in Powershell

So I have a program that is supposed to open a PDF in word, get the text from a specific cell, and export it into an excel sheet.
Set-StrictMode -Version latest
$file = "C:\PathToPDF.pdf"
$output = "C:\PathToCSV.csv"
$application = New-Object -comobject word.application
$application.visible = $False
$results = @{}
Function GetWordTable
{
$document = $application.documents.open($file,$false,$true)
$objTable = $document.Tables.Item(1)
$properties = @{
Data = $objTable.Cell(5, 5).Range.Text
}
$results = New-Object -TypeName PsCustomObject -Property $properties
$results | Export-Csv $output -NoTypeInformation
$document.close()
$application.quit()
}
GetWordTable
I keep getting an error at the line that populates properties, even though I successfully tested it in a function that seeks out string matches in a word file and exports to excel.
What should I try?
I just realized that the PDF I was using, which I thought was one continuous table, technically contains several distinct tables. So the code does work, as long as a valid cell is selected.

How to change script to save attachments by file name instead of by sender name?

I'm using the script below. It works well to save attachments by sender name in a specified folder. However, if the sender names are constant, it only saves 1 of the attachments vs. all of the attachments. I'm assuming it's a write error. How do I update the script below to save all attachments meeting the filtered criteria by their actual attachment name instead of sender name.
$o = New-Object -comobject outlook.application
$n = $o.GetNamespace("MAPI")
$f = $n.PickFolder()
$filepath = "c:\test"
$f.Items| foreach {$SendName = $_.Sendername
$_.attachments|foreach {
$_.filename
If ($_.filename.Contains("pdf")) {
$_.saveasfile((Join-Path $filepath "$SendName.pdf"))}}}
Any ideas would be greatly appreciated.
So let's follow the rabbit hole here.
We can go to the Outlook Object Model, and look for Attachments Object because we see you are iterating over attachments:
$_.attachments|foreach
We see in the page:
Contains a set of Attachment objects
So we go look at the Attachment Object Page, look at Properties and we see there is a property for FileName
So to save by attachment name we can do this:
$o = New-Object -comobject outlook.application
$n = $o.GetNamespace("MAPI")
$f = $n.PickFolder()
$filepath = "c:\test"
$f.Items | foreach {
    $_.attachments | foreach {
        # FileName is a property of each Attachment, not of the mail item
        $FileName = $_.FileName
        $FileName
        If ($FileName.Contains("pdf")) {
            $_.saveasfile((Join-Path $filepath $FileName))
        }
    }
}
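Note that saving by attachment name can still overwrite a file when two messages carry attachments with the same name. One way around that, as a rough sketch: Get-UniquePath below is a made-up helper (not part of Outlook or PowerShell) that appends a counter until the name is free. The demo uses a throwaway temp folder and a hypothetical file name.

```powershell
function Get-UniquePath {
    param(
        [Parameter(Mandatory)][string]$Directory,
        [Parameter(Mandatory)][string]$FileName
    )
    $base = [System.IO.Path]::GetFileNameWithoutExtension($FileName)
    $ext  = [System.IO.Path]::GetExtension($FileName)
    $candidate = Join-Path $Directory $FileName
    $i = 1
    # Keep trying "name (1).ext", "name (2).ext", ... until nothing exists there
    while (Test-Path -LiteralPath $candidate) {
        $candidate = Join-Path $Directory ('{0} ({1}){2}' -f $base, $i, $ext)
        $i++
    }
    $candidate
}

# Demo in a throwaway temp folder
$demoDir = Join-Path ([System.IO.Path]::GetTempPath()) ('attdemo_' + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $demoDir | Out-Null
$first = Get-UniquePath -Directory $demoDir -FileName 'report.pdf'
New-Item -ItemType File -Path $first | Out-Null
$second = Get-UniquePath -Directory $demoDir -FileName 'report.pdf'
$first
$second
```

In the attachment loop the call would look like $_.SaveAsFile((Get-UniquePath -Directory $filepath -FileName $_.FileName)).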

Export or Print Outlook Emails to PDF

I am using PowerShell to loop through designated folders in Outlook and save the attachments in a tree-like structure. This works wonders, but now management has requested that the email itself be saved as a PDF as well. I found the PrintOut method on the object, but that prompts for a file name. I haven't been able to figure out what to pass to it to have it automatically save to a specific filename. I looked at the MSDN page and it was a bit too high for my current level.
I am using the com object of outlook.application.
Short of saving all of the emails to a temp file and using a third-party method, are there parameters I can pass to PrintOut? Or is there another way to accomplish this?
Here is the base of the code to get the emails. I loop through $Emails
$Outlook = New-Object -comobject outlook.application
$Connection = $Outlook.GetNamespace("MAPI")
#Prompt which folder to process
$Folder = $Connection.PickFolder()
$Outlook_Folder_Path = ($Folder.FullFolderPath).Split("\",4)[3]
$BaseFolder += $Outlook_Folder_Path + "\"
$Emails = $Folder.Items
Looks like there are no built-in methods, but if you're willing to use third-party binary, wkhtmltopdf can be used.
Get precompiled binary (use MinGW 32-bit for maximum compatibility).
Install or extract installer with 7Zip and copy wkhtmltopdf.exe to your script directory. It has no external dependencies and can be redistributed with your script, so you don't have to install PDF printer on all PCs.
Use HTMLBody property of MailItem object in your script for PDF conversion.
Here is an example:
# Get path to wkhtmltopdf.exe
$ExePath = Join-Path -Path (
Split-Path -Path $Script:MyInvocation.MyCommand.Path
) -ChildPath 'wkhtmltopdf.exe'
# Set PDF path
$OutFile = Join-Path -Path 'c:\path\to\emails' -ChildPath ($Email.Subject + '.pdf')
# Convert HTML string to PDF file
$ret = $Email.HTMLBody | & $ExePath @('--quiet', '-', $OutFile) 2>&1
# Check for errors
if ($LASTEXITCODE) {
Write-Error $ret
}
Please note, that I've no experience with Outlook and used MSDN to get relevant properties for object, so the code might need some tweaking.
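One thing to watch when building the PDF name from $Email.Subject: subjects often contain characters such as : or / that Windows does not allow in file names. A minimal sketch of cleaning the subject first; ConvertTo-SafeFileName is a made-up helper name and the sample subject is invented. An empty subject would need its own fallback, which is not handled here.

```powershell
function ConvertTo-SafeFileName {
    param(
        [Parameter(Mandatory)]
        [string]$Name
    )
    # Replace the characters Windows rejects in file names: \ / : * ? " < > |
    ($Name -replace '[\\/:*?"<>|]', '_').Trim()
}

# Hypothetical subject line, used only for illustration
$subject = 'RE: Budget 2024/25?'
$safe = ConvertTo-SafeFileName -Name $subject
$safe
```

The result could then be used as -ChildPath ($safe + '.pdf') when building $OutFile.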
Had this same issue. This is what I did to fix it if anybody else is trying to do something similar.
You could start by taking your msg file and converting it to doc then converting the doc file to pdf.
$outlook = New-Object -ComObject Outlook.Application
$word = New-Object -ComObject Word.Application
Get-ChildItem -Path $folderPath -Filter *.msg? | ForEach-Object {
$msgFullName = $_.FullName
$docFullName = $msgFullName -replace '\.msg$', '.doc'
$pdfFullName = $msgFullName -replace '\.msg$', '.pdf'
$msg = $outlook.CreateItemFromTemplate($msgFullName)
$msg.SaveAs($docFullName, 4)
$doc = $word.Documents.Open($docFullName)
$doc.SaveAs([ref] $pdfFullName, [ref] 17)
$doc.Close()
}
Then, just clean up the unwanted files after

Attachments.Add wildcard with Powershell

I have a ZIP file generated with dynamic information (Report_ PC Name-Date_User). However when I go to attach the file I'm unable to use a wildcard. There is only one ZIP file in this directory so using a wildcard will not attach any other ZIP files.
#Directory storage
$DIR = "$ENV:TEMP"
#Max number of recent screen captures
$MAX = "100"
#Captures Screen Shots from the recording
$SC = "1"
#Turn GUI mode on or off
$GUI = "0"
#Captures the current computer name
$PCName = "$ENV:COMPUTERNAME"
#Use either the local name or domain name
#$User = "$ENV:UserDomainName"
$User = "$ENV:UserName"
#Timestamp
$Date = Get-Date -UFormat %Y-%b-%d_%H%M
#Computer Information
$MAC = ipconfig /all | Select-String Physical
$IP = ipconfig /all | Select-String IPv4
$DNS = ipconfig /all | Select-String "DNS Servers"
#Needed to add space after user input information
$EMPT = "`n"
#Quick capture of the computer information
$Info = @"
$EMPT
*** COMPUTER INFORMATION ***
$PCName
$IP
$MAC
$DNS
"@
# Used to attach to the outlook program
$File = Get-ChildItem -Path $Dir -Filter "*.zip" | Select -Last 1 -ExpandProperty Fullname
$Start_Click = {
psr.exe /start /output $DIR\$Date-$PCName-$User.zip /maxsc $MAX /sc $SC /gui $GUI
}
$Stop_Click={
psr.exe /stop
}
$Email_Click = {
$Outlook = New-Object -Com Outlook.Application
$Mail = $Outlook.CreateItem(0)
$Mail.To = "deaconf19@gmail.com"
$Mail.Subject = "Capture Report from " + $PCName + " " + $User + " " + $Date
$Mail.Body = $Problem.text + $Info
$Mail.Attachments.Add($File)
$Mail.Send()
}
I no longer get an error but the file will not attach the first time around. The second time it will attach but it does the previous .zip not the most recent. I added my entire code
The MSDN article for Attachments.Add describes what the source needs to be:
The source of the attachment. This can be a file (represented by the
full file system path with a file name) or an Outlook item that
constitutes the attachment.
This means that it does not accept wildcards. To get around this you should instead use Get-ChildItem to return the name of your zip.
$File = Get-ChildItem -Path $Dir -Filter "*.zip" | Select -First 1 -ExpandProperty Fullname
That should return the full path to the first zip. Since Get-ChildItem returns an object, we use -ExpandProperty on FullName so that only the full path is returned, as a string. -First 1 is not strictly required if you really only have the one file, but including it makes sure that only one file is attached if there happen to be more.
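If more than one zip can end up in the folder, "first" and "last" depend on the order Get-ChildItem happens to return. Sorting by LastWriteTime makes the choice explicit. A sketch under that assumption; the demo folder and file names are invented so the example runs on its own:

```powershell
# Hypothetical demo folder; in the real script this would be $Dir
$demoDir = Join-Path ([System.IO.Path]::GetTempPath()) ('zipdemo_' + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $demoDir | Out-Null

# Two empty placeholder files with different timestamps
$old = New-Item -ItemType File -Path (Join-Path $demoDir 'old.zip')
$new = New-Item -ItemType File -Path (Join-Path $demoDir 'new.zip')
$old.LastWriteTime = (Get-Date).AddHours(-2)
$new.LastWriteTime = (Get-Date).AddHours(-1)

# Sort newest first, then take a single full path as a string
$File = Get-ChildItem -Path $demoDir -Filter '*.zip' |
    Sort-Object -Property LastWriteTime -Descending |
    Select-Object -First 1 -ExpandProperty FullName
$File
```

Running this lookup at the moment the file is needed, rather than once at the top of the script, means it sees whatever zip exists at that point.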
Update from comments
I see that you are having issues with attaching a file still. My code would still stand however you might be having an issue with your .zip file or $dir. After where $file is declared I would suggest something like this:
If (-not (Test-Path $file)) { Write-Host "$file is not a valid zip file" }
If you would prefer, since I don't know if you see your console when you are running your code, you could use a popup