How can I convert an RTF document to docx with PowerShell?

I found something similar on here, but when I try running it I get errors.
I was therefore wondering whether it would be possible to write a PowerShell script that takes .rtf documents and converts them all to .docx documents?

Use this to convert rtf to docx (adapted from the gist linked below, with the save format set to 16, wdFormatDocumentDefault, so that Word writes .docx):
Function Convert-Dir($path){
    $Files=Get-ChildItem "$($path)\*.rtf" -Recurse
    $Word=New-Object -ComObject Word.Application
    foreach ($File in $Files) {
        # Open the RTF document by its full path
        $Doc=$Word.Documents.Open($File.FullName)
        # Swap only the extension: .rtf becomes .docx
        $Name=[System.IO.Path]::ChangeExtension($File.FullName, ".docx")
        if (Test-Path -LiteralPath $Name){
            # The target already exists, so skip saving
        } else {
            # Save this file as .docx (16 = wdFormatDocumentDefault)
            Write-Host $Name
            $Doc.SaveAs([ref][System.Object]$Name, [ref]16)
        }
        # Close the document whether or not it was saved
        $Doc.Close()
    }
    # Quit Word so no hidden WINWORD process is left running
    $Word.Quit()
}
Convert-Dir "RtfFilePath";
Code source and attribution: https://gist.github.com/rensatsu/0a66a65c3a508ecfd491#file-rtfdocxtodoc-ps1
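One thing to watch when deriving the output name: a plain string replace of "rtf" touches every occurrence in the path, not just the extension. The sketch below contrasts that with [System.IO.Path]::ChangeExtension. The helper name Get-DocxTargetPath and the example path are made up for illustration.

```powershell
# Hypothetical helper: derive the .docx target path for an .rtf source file.
function Get-DocxTargetPath {
    param([Parameter(Mandatory)][string]$SourcePath)
    # ChangeExtension rewrites only the final extension, not the rest of the path
    [System.IO.Path]::ChangeExtension($SourcePath, '.docx')
}

$source = 'C:\drafts\rtf-notes\report.rtf'   # example path, not from the original post

# A plain string replace also rewrites the folder name "rtf-notes"
$source.Replace('rtf', 'docx')

# Only the extension changes here
Get-DocxTargetPath -SourcePath $source
```

With a folder or file name that happens to contain "rtf", the first form would point at a path that may not exist, so the extension-only form is the safer default.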

Related

Get a specific content from xml and compare

I have many compressed files, each containing an XML file with entries of the form Package ID = Guid.
Each file should contain more than one Package ID. Some of them will be 000000000, and those should be ignored. The goal is to collect all the Package IDs from every compressed file and compare them: if two files share a Package ID, I want to know which files they are.
So far I have a script that opens all the compressed files, reads the text file inside, and replaces a string (treating the content as plain text, not as XML):
$fileToEdit = "vip.manifest"
$toReplace = '<Prop Name="WarningDuringUpgrade" Value="True"'
$replaceWith = '<Prop Name="WarningDuringUpgrade" Value="False"'
function ModifyFiles($Manifests)
{
foreach ($file in $Manifests) {
try {
$zip = [System.IO.Compression.ZipFile]::Open($file, "Update")
}
catch {
Write-Warning $_.Exception.Message
continue
}
$entries = $zip.Entries.Where({ $_.Name -like $fileToEdit })
foreach($entry in $entries) {
$reader = [System.IO.StreamReader]::new($entry.Open())
$content = $reader.ReadToEnd().Replace($toReplace, $replaceWith)
$reader.Close()
$reader.Dispose()
$writer = [System.IO.StreamWriter]::new($entry.Open())
$writer.BaseStream.SetLength(0)
$writer.Write($content)
$writer.Flush()
$writer.Close()
$writer.Dispose()
}
$zip.Dispose()
Write-Host "$entry for $file was updated"
}
}
The tag is called Package, with ID=Guid.
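A possible starting point for the comparison, as a rough sketch only. It assumes the manifest is well-formed XML, that the IDs sit in an ID attribute on elements named Package, and that placeholder IDs consist only of zeros (optionally with dashes or braces). The function name Find-DuplicatePackageId is hypothetical; the default manifest name vip.manifest is taken from the script above.

```powershell
Add-Type -AssemblyName System.IO.Compression
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: report every Package ID that occurs in more than one archive.
# Assumes each archive holds a manifest with <Package ID="..."> elements.
function Find-DuplicatePackageId {
    param(
        [Parameter(Mandatory)][string[]]$ArchivePath,
        [string]$ManifestName = 'vip.manifest'
    )
    $found = foreach ($file in $ArchivePath) {
        $zip = [System.IO.Compression.ZipFile]::OpenRead($file)
        try {
            foreach ($entry in ($zip.Entries | Where-Object { $_.Name -like $ManifestName })) {
                $reader = [System.IO.StreamReader]::new($entry.Open())
                try { $xml = [xml]$reader.ReadToEnd() } finally { $reader.Dispose() }
                # Select Package elements at any depth, ignoring XML namespaces
                foreach ($node in $xml.SelectNodes("//*[local-name()='Package']")) {
                    $id = $node.GetAttribute('ID')
                    # Skip missing IDs and all-zero placeholders
                    if ($id -and $id -notmatch '^[0\-{}]+$') {
                        [pscustomobject]@{ PackageId = $id; File = $file }
                    }
                }
            }
        }
        finally { $zip.Dispose() }
    }
    # Keep only the IDs that appear in two or more distinct files
    $found | Group-Object PackageId | ForEach-Object {
        $files = @($_.Group.File | Sort-Object -Unique)
        if ($files.Count -gt 1) {
            [pscustomobject]@{ PackageId = $_.Name; Files = $files }
        }
    }
}
```

Called as Find-DuplicatePackageId -ArchivePath (Get-ChildItem *.zip).FullName, it would emit one object per shared ID together with the files that contain it. The archives are opened read-only, so nothing is modified.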

Powershell Writing to .XLSX is Corrupting the Files

I have a PowerShell script that loops through .xlsx files in a folder and password-protects them with the file name (for now). I have no problem looping through and writing to .xls, but when I try to open an .xlsx file after writing it with PowerShell, I get the error:
Excel cannot open the file 'abcd.xlsx' because the file format or file
extension is not valid. Verify that the file has not been corrupted
and that the file extension matches the format of the file.
Here's the script:
function Release-Ref ($ref) {
([System.Runtime.InteropServices.Marshal]::ReleaseComObject(
[System.__ComObject]$ref) -gt 0)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
}
$e = $ErrorActionPreference
$ErrorActionPreference="continue"
foreach ($f in Get-ChildItem "C:"){
try{
$ff = $f
$xlNormal = -4143
$s = [System.IO.Path]::GetFileNameWithoutExtension($f)
$xl = new-object -comobject excel.application
$xl.Visible = $False
$xl.DisplayAlerts = $False
$wb = $xl.Workbooks.Open($ff.FullName)
$wb.sheets(1).columns("A:S").entirecolumn.AutoFit()
$wb.sheets(1).columns("N").NumberFormat = "0.0%"
$a = $wb.SaveAs("C:\Out\" + $s + ".xls",$xlNormal,$s) #works
#$a = $wb.SaveAs("C:\Out\" + $s + ".xlsx",$xlNormal,$s) #doesn't work
$a = $xl.Quit()
$a = Release-Ref($ws)
$a = Release-Ref($wb)
$a = Release-Ref($xl)
}
catch {
Write-Output "Exception"
$ErrorActionPreference=$e;
}
}
I've searched other questions but can't find any other examples of the same issues writing from Powershell. Thank you.
The problem is that Xls is a different format from Xlsx. Excel versions before 2007 used binary formats. Office 2007 introduced new formats called Office Open XML, which Xlsx uses.
Excel is smart enough to check both the file extension and the file format. Saving a binary file under the newer format's extension creates a conflict, and the error message hints at this possibility too:
and that the file extension matches the format of the file.
Why doesn't Excel just open the file anyway? I guess it's a security feature that prevents unintentional opening of Office documents. Back in the day, Office macro viruses were the bane of many offices. One of the main infection vectors was tricking users into opening files without precautions. Unlike classic viruses, macro viruses infected application data (including default template files) instead of OS binaries, but that's another story.
Anyway, to save in the proper format, use the proper file format value. That would be -4143 for Xls and 51 for Xlsx. What's more, Get-ChildItem returns a collection of FileInfo objects, and the file extension is available in the Extension property. Like so:
# Define Xls and Xlsx file format values
$typeXls = -4143
$typeXlsx = 51
foreach ($f in Get-ChildItem "C:"){
    try{
        $ff = $f
        ...
        # Select the SaveAs type to match the original file extension
        if($f.Extension -eq '.xls') { $fType = $typeXls }
        elseif($f.Extension -eq '.xlsx') { $fType = $typeXlsx }
        $a = $wb.SaveAs("C:\Out\" + $s + $f.Extension, $fType, $s)
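The extension check can also be isolated into a small function, which makes it easy to fail loudly on an unexpected file type instead of reusing a stale $fType. This is only a sketch: Get-XlFileFormat is a hypothetical name, and the two values are the ones quoted in the answer.

```powershell
# Hypothetical helper: map a workbook extension to the SaveAs file format value
# quoted in the answer (-4143 for .xls, 51 for .xlsx).
function Get-XlFileFormat {
    param([Parameter(Mandatory)][string]$Extension)
    switch ($Extension.ToLowerInvariant()) {
        '.xls'  { -4143 }
        '.xlsx' { 51 }
        default { throw "Unsupported extension: $Extension" }
    }
}

Get-XlFileFormat -Extension '.xlsx'
Get-XlFileFormat -Extension ([System.IO.Path]::GetExtension('C:\Out\abcd.XLS'))
```

Inside the loop, the result would replace $fType in the SaveAs call, for example $wb.SaveAs($target, (Get-XlFileFormat $f.Extension), $s), where $target is whatever output path the script builds.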
Working with COM objects for Excel can get complicated. I recommend the ImportExcel module instead.
Install-Module -Name ImportExcel
Then you can do something like this, where $file is the folder or file pattern to process:
$e = $ErrorActionPreference
$ErrorActionPreference="continue"
foreach ($f in Get-ChildItem $file){
    try{
        # Use the file name without its extension as the password, as in the question
        $filePass = $f.BaseName
        $path = Split-Path $f.FullName
        $newFile = $path + "\" + $f.BaseName + "-protected.xlsx"
        # Import-Excel reads the first worksheet by default; cell formatting is not carried over
        Import-Excel $f.FullName | Export-Excel $newFile -Password $filePass -NoNumberConversion * -AutoSize
    }
    catch {
        Write-Output "Exception"
        $ErrorActionPreference=$e;
    }
}

Bulk File Renaming Format Change

I'm attempting a proof of concept for my department that extracts attached files from .msg files located in a set of folders. I'm still getting up to speed with PowerShell, especially with modules and rename features.
I found a module online that does pretty much everything I need, except that I need a slightly different format for the new attachment filename. In other words, I'm not sure how to modify this line:
$attFn = $msgFn -replace '\.msg$', " - Attachment - $($_.FileName)"
The code below extracts the attached files and names them as follows.
For an email file MessageFilename.msg with an attachment AttachmentFilename.pdf, the attachment is extracted to MessageFilename - Attachment - AttachmentFilename.pdf
I really need the attachment to be extracted as AttachmentFilename.pdf only. The problem is that I keep losing the path to the .msg file, so I get errors when attempting the rename to a path that doesn't exist. I've tried a few options in debug mode but keep losing the path context when attempting the 'replace'.
Any help appreciated...
The borrowed code is...
##
## Source: https://chris.dziemborowicz.com/blog/2013/05/18/how-to-batch-extract-attachments-from-msg-files-using-powershell/
##
## Usage: Expand-MsgAttachment *
##
##
function Expand-MsgAttachment
{
[CmdletBinding()]
Param
(
[Parameter(ParameterSetName="Path", Position=0, Mandatory=$True)]
[String]$Path,
[Parameter(ParameterSetName="LiteralPath", Mandatory=$True)]
[String]$LiteralPath,
[Parameter(ParameterSetName="FileInfo", Mandatory=$True, ValueFromPipeline=$True)]
[System.IO.FileInfo]$Item
)
Begin
{
# Load application
Write-Verbose "Loading Microsoft Outlook..."
$outlook = New-Object -ComObject Outlook.Application
}
Process
{
switch ($PSCmdlet.ParameterSetName)
{
"Path" { $files = Get-ChildItem -Path $Path }
"LiteralPath" { $files = Get-ChildItem -LiteralPath $LiteralPath }
"FileInfo" { $files = $Item }
}
$files | % {
# Work out file names
$msgFn = $_.FullName
$msgFnbase = $_.BaseName
# Skip non-.msg files
if ($msgFn -notlike "*.msg") {
Write-Verbose "Skipping $_ (not an .msg file)..."
return
}
# Extract message body
Write-Verbose "Extracting attachments from $_..."
$msg = $outlook.CreateItemFromTemplate($msgFn)
$msg.Attachments | % {
# Work out attachment file name
$attFn = $msgFn -replace '\.msg$', " - Attachment - $($_.FileName)"
# Do not try to overwrite existing files
if (Test-Path -literalPath $attFn) {
Write-Verbose "Skipping $($_.FileName) (file already exists)..."
return
}
# Save attachment
Write-Verbose "Saving $($_.FileName)..."
$_.SaveAsFile($attFn)
# Output to pipeline
Get-ChildItem -LiteralPath $attFn
}
}
}
End
{
Write-Verbose "Done."
}
}
The line
$msgFn = $_.FullName
means that $msgFn holds a full path of the form c:\path\to\file.msg.
So inside the attachment loop, where $_ is the attachment, you can use:
# extract the folder, e.g. 'c:\path\to'
$msgPath = Split-Path -Path $msgFn
# make new path, e.g. 'c:\path\to\attachment.pdf'
$newFn = Join-Path -Path $msgPath -ChildPath ($_.FileName)
Then pass $newFn to Test-Path and SaveAsFile in place of $attFn.
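The same two calls can be tried on their own, without Outlook, to confirm the resulting path. In this sketch the helper name Get-AttachmentTargetPath and the file names are illustrative only.

```powershell
# Hypothetical helper: place an attachment next to its .msg file while keeping
# the attachment's own file name.
function Get-AttachmentTargetPath {
    param(
        [Parameter(Mandatory)][string]$MessagePath,
        [Parameter(Mandatory)][string]$AttachmentName
    )
    # Folder that contains the .msg file
    $folder = Split-Path -Path $MessagePath
    # Same folder, attachment's file name
    Join-Path -Path $folder -ChildPath $AttachmentName
}

# Example values, not taken from a real mailbox
$msgFn = Join-Path ([System.IO.Path]::GetTempPath()) 'MessageFilename.msg'
$newFn = Get-AttachmentTargetPath -MessagePath $msgFn -AttachmentName 'AttachmentFilename.pdf'
$newFn
```

Because the message name no longer appears in the result, two messages in the same folder with identically named attachments map to the same path. The existing Test-Path check in the borrowed code would then skip the second one instead of overwriting the first.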

How to search a word in a docx file with powershell?

I have to examine all the .docx files in a folder and display the names of the files that contain a word I pass as a parameter. How can I do this in PowerShell?
Try something like this:
# Create a hidden Word instance
$Word=New-Object -ComObject Word.Application
$Word.Visible = $False
# Take the list of .docx files
Get-ChildItem "c:\temp" -File -Filter "*.docx" | %{
    $Filename=$_.FullName
    # Open the file read-only and take the content of the Word file
    $Document=$Word.Documents.Open($Filename, $false, $true)
    $range = $Document.Content
    # If the content contains your word, print the path of the Word file
    If($range.Text -like "*tot*"){
        $Filename
    }
    # Close the open document without saving changes
    $Word.Documents.Close($false)
}
# Quit Word so no hidden WINWORD process is left running
$Word.Quit()
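The answer above drives Word through COM with a hardcoded search term. As an alternative sketch, and a different technique from the COM approach, a .docx file can be opened as a zip archive and its word/document.xml part searched directly, which takes the word as a parameter and does not need Word installed. The function name Find-DocxContainingWord is hypothetical, and the match is a case-insensitive substring test on the main body text only (headers, footers, and footnotes live in other parts and are not checked).

```powershell
Add-Type -AssemblyName System.IO.Compression
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: list .docx files whose main document part contains a word.
function Find-DocxContainingWord {
    param(
        [Parameter(Mandatory)][string]$Folder,
        [Parameter(Mandatory)][string]$Word
    )
    Get-ChildItem -LiteralPath $Folder -File -Filter '*.docx' | ForEach-Object {
        $file = $_
        $zip = [System.IO.Compression.ZipFile]::OpenRead($file.FullName)
        try {
            # The body text of a .docx lives in the word/document.xml part
            $entry = $zip.GetEntry('word/document.xml')
            if (-not $entry) { return }
            $reader = [System.IO.StreamReader]::new($entry.Open())
            try { $xml = [xml]$reader.ReadToEnd() } finally { $reader.Dispose() }
            # InnerText concatenates all text nodes and drops the markup
            $text = $xml.DocumentElement.InnerText
            if ($text.IndexOf($Word, [System.StringComparison]::OrdinalIgnoreCase) -ge 0) {
                $file.FullName
            }
        }
        finally { $zip.Dispose() }
    }
}
```

For example, Find-DocxContainingWord -Folder 'c:\temp' -Word 'tot' would print the full path of each matching file. Text from adjacent paragraphs is joined without a separator, so a match could in rare cases span a paragraph boundary.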

Powershell automated deletion of specified SharePoint documents

We have a csv file with approximately 8,000 SharePoint document file URLs - the files in question they refer to have to be downloaded to a file share location, then deleted from the SharePoint. The files are not located in the same sites, but across several hundred in a server farm. We are looking to remove only the specified files - NOT the entire library.
We have the following script to effect the download, which creates the folder structure so that the downloaded files are separated.
param (
[Parameter(Mandatory=$True)]
[string]$base = "C:\Export\",
[Parameter(Mandatory=$True)]
[string]$csvFile = "c:\export.csv"
)
write-host "Commencing Download"
$date = Get-Date
add-content C:\Export\Log.txt "Commencing Download at $date":
$webclient = New-Object System.Net.WebClient
$webclient.UseDefaultCredentials = $true
$files = (import-csv $csvFile | Where-Object {$_.Name -ne ""})
$line=1
Foreach ($file in $files) {
$line = $line + 1
if (($file.SpURL -ne "") -and ($file.path -ne "")) {
$lastBackslash = $file.SpURL.LastIndexOf("/")
if ($lastBackslash -ne -1) {
$fileName = $file.SpURL.substring(1 + $lastBackslash)
$filePath = $base + $file.path.replace("/", "\")
New-Item -ItemType Directory -Force -Path $filePath.substring(0, $filePath.length - 1)
$webclient.DownloadFile($file.SpURL, $filePath + $fileName)
$url=$file.SpURL
add-content C:\Export\Log.txt "INFO: Processing line $line in $csvFile, writing $url to $filePath$fileName"
} else {
$host.ui.WriteErrorLine("Exception: URL has no backslash on $line for filename $csvFile")
}
} else {
$host.ui.WriteErrorLine("Exception: URL or Path is empty on line $line for filename $csvFile")
}
}
write-Host "Download Complete"
Is there a way we could get the versions for each file?
I have been looking for a means to carry out the deletion, using the same csv file as reference - all of the code I have seen refers to deleting entire libraries, which is not desired.
I am very new to PowerShell and am getting lost. Can anyone shed some light?
Many thanks.
This looks like it might be useful. It's a different approach and would need to be modified to pull in the file list from your CSV, but it looks like it generally accomplishes what you are looking to do.
https://sharepoint.stackexchange.com/questions/6511/download-and-delete-documents-using-powershell