Blank file after using PowerShell to convert a Word file? - powershell

I have been trying to use PowerShell to convert some .docx files to .docm. I'm able to convert the file, but it's blank every time I open it.
This is the code I have been using:
Get-ChildItem *.docx | Rename-Item -NewName { $_.name -replace '\.docx$','.docm' }

Adding this here per other comments regarding it.
.DOCM is just a Word doc with embedded macros.
What do you expect to see?
In most cases, Word security blocks macro-enabled documents from opening unless you tell Word you accept the macro risk, or you've already disabled that security prompt.
So, if these are not Word documents with macros, I am not sure what your plan was here.
If you went into Windows Explorer, took a non-macro .docx file, manually renamed it to .docm, and then tried to open it, you'd get the same result.
So, this is not a PS or PS-specific code issue. Changing the extension does not make it a true .docm; the document must be saved in that format by Word.
FYI, there are online tools for this conversion, though I've never used or needed to use them. So, just a heads up.
However, here is more info from my old notes, if the goal is to automate this via PS.
If you really wanted to do this in PS, you would use PS to open a .docx via the MS Office COM object model, add the VBA/macro code to the document, and then save it as a macro-enabled file.
For example, here is an article regarding [Converting Word document format with PowerShell][2]:
$path = "c:\olddocuments\"
$word_app = New-Object -ComObject Word.Application
$Format = [Microsoft.Office.Interop.Word.WdSaveFormat]::wdFormatXMLDocument
Get-ChildItem -Path $path -Filter '*.doc' |
ForEach-Object {
$document = $word_app.Documents.Open($_.FullName)
$docx_filename = "$($_.DirectoryName)\$($_.BaseName).docx"
$document.SaveAs([ref] $docx_filename, [ref]$Format)
$document.Close()
}
$word_app.Quit()
If you need to convert the documents to PDF, make the following change
to the “SaveAs” line in the script. 17 corresponds to the PDF file
format when doing a Save As in Microsoft Word.
$document.SaveAs([ref] $docx_filename, [ref]17)
Microsoft Word file format tech doc is here:
[WdSaveFormat enumeration (Word)][3]
https://learn.microsoft.com/en-us/office/vba/api/Word.WdSaveFormat
wdFormatFlatXMLMacroEnabled (20): Open XML file format with macros enabled, saved as a single XML file.
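Putting that together for the original .docx-to-.docm question, here is a minimal sketch of the COM approach (untested; the path is a placeholder, and 13 is the documented WdSaveFormat value wdFormatXMLDocumentMacroEnabled, i.e. a macro-enabled .docm):

$path = 'C:\olddocuments\'
$word_app = New-Object -ComObject Word.Application

Get-ChildItem -Path $path -Filter '*.docx' | ForEach-Object {
    # Open the source .docx and re-save it in the macro-enabled Open XML format.
    $document = $word_app.Documents.Open($_.FullName)
    $docm_filename = "$($_.DirectoryName)\$($_.BaseName).docm"
    $document.SaveAs([ref] $docm_filename, [ref] 13)   # 13 = wdFormatXMLDocumentMacroEnabled
    $document.Close()
}

$word_app.Quit()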

Related

Choose which CSV to import when running a PowerShell script

I get a CSV every week that our finance team puts in a shared drive. I have a script for that CSV that I run once I get it.
The first command of the script is of course Import-Csv.
The problem is, the finance team insists on naming the file differently each time, plus they don't always put it in the same location within the drive.
As a result, I have to first hunt for the file, put it into the directory that the script points to, and then rename the file.
I've tried talking to the team about putting it in the same location and keeping the filename the same, but they only follow the instructions for a couple of weeks before just doing whatever.
Ideally, I'd like it so that when I run the script, a popup would ask me to pick a CSV (similar to how it looks when you do "Save As" on an Office document).
Is there any way for this to be done within PowerShell?
You can access .NET classes and interface with the Windows Forms library to instantiate and take input from the standard file-open dialog. Something like below:
using namespace System.Windows.Forms

Add-Type -AssemblyName System.Windows.Forms   # load the WinForms assembly so OpenFileDialog is available

$FileBrowser = [OpenFileDialog]::new()
$FileBrowser.InitialDirectory = 'c:\temp'
$FileBrowser.Filter = 'Comma Separated Values (*.csv)|*.csv'
[Void]$FileBrowser.ShowDialog()
$CsvFile = $FileBrowser.FileName
Then use $CsvFile in the Import-Csv command.
You can change the .InitialDirectory property to make navigating a little more convenient.
Use the .Filter property to limit the file-open display to CSV files, to make things that much more convenient.
Also, use the [Void] cast to prevent the status return (usually 'OK' or 'Cancel') from echoing to the screen.
Note: A simple Google search will turn up many examples. I refined some of the work from here. That will also document some of the other properties if you want to explore etc.
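For instance, a minimal usage sketch once the dialog returns (guarding against the case where it was cancelled):

if ($CsvFile) {
    # $CsvFile holds the full path picked in the dialog above
    $data = Import-Csv -Path $CsvFile
    $data | Format-Table -AutoSize
}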
If you are willing to settle for a selection box that doesn't look as nice as the Save As dialog, you can use Out-Gridview. Something along these lines might help.
$filenames = @(
    Get-ChildItem -Path C:\temp -Recurse -Filter *.csv |
        Sort-Object LastWriteTime -Descending |
        Out-GridView -Title 'Choose a file' -PassThru
)
$csvfile = $filenames[0].FullName
Import-Csv $csvfile | More
The -Path specifies a directory that contains all the locations where your CSV file might be delivered. The sort is just to put the recently written files at the top of the grid. This supposedly makes selection easier. The @() wrapper merely makes sure the result stored in $filenames is an array.
You would do something else with the results of Import-Csv.
Steven's response certainly satisfies your original question, but an alternative would be to let PowerShell do the work. If you know the drive and you know the name of the file this week, you can pass the name to your script and let it search the drive, filtering on the specific CSV file you need. Make it recursive, and open the only file that matches. Sorry, I didn't have time yesterday to include code. Here's a function that returns the full file path when provided with a top-level search path and a filename with possible wildcards.
function gfp { $result=gci $args[0] -recurse -include $args[1]; return ($result.DirectoryName + "\" + $result.Name) }
Example: gfp "d:\rootfolder" "thisweeksfilename.csv"

PowerShell: Go through all files (PDFs) in a directory and move them based on what's written in the first 6 bytes

I am currently trying to write a PowerShell script that does the following:
Go through all PDF files in the directory the script is in
Check the first few bytes of those PDF files
If those bytes say something along the lines of "PK", move them to a different location
If the bytes say something else (e.g. PDF1.4), don't move them at all and go to the next one.
Context: We have around 70k PDF files that can't be opened. After checking them with a certain tool, it looks like around 99% of those are damaged and the remaining 1% are zip files.
The first bytes of a zipped PDF file start with "PK"; the first bytes of a broken PDF file start with, for example, PDF1.4.
I need to unzip all the zip files and relocate them. Going through 70k PDF files by hand is kind of painful, so I'm looking for a way to automate it.
I know I'm supposed to provide a code sample, but the truth is that I am absolutely lost. I have written a few PowerShell scripts before, but I have no idea how to do something like this.
So, if anyone could kindly point me in the right direction or give me a useful function, I would really appreciate it a lot.
You can use Get-Content to get your first 6 bytes as you asked.
We can then tie that into a loop on all the documents and configure a simple if statement to decide what to do next, e.g. move the file to another dir
EDITED BASED ON YOUR COMMENT:
$pdfDirectory = 'C:\Temp\struktur_id_1225\ext_dok'
$newLocation = 'C:\Path\To\New\Folder'

Get-ChildItem "$pdfDirectory" -Filter "*.pdf" | ForEach-Object {
    if ((Get-Content $_.FullName | Select-Object -First 1) -like "%PDF-1.5*") {
        $HL7 = $_.FullName.Replace("ext_dok", "MDM")
        $HL7 = $HL7.Replace(".pdf", ".hl7")
        Move-Item $_.FullName $newLocation
        Move-Item $HL7 $newLocation
    }
}
Try using the above, which is also a bit easier to edit.
$pdfDirectory will need to be set to the folder containing the PDF Files
$newLocation will obviously be the new directory!
And you will still need to change the -like "%PDF-1.5*" to suit your search!
It should do the rest for you, give it a shot
Another Edit
I have mimicked your folder structure on my computer, and placed a few PDF files and matching HL7 files and the script is working perfectly.
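If you would rather test the raw leading bytes (the "PK" ZIP signature from the question) instead of the text of the first line, here is a minimal sketch under the same folder assumptions; the paths are placeholders:

$pdfDirectory = 'C:\Temp\struktur_id_1225\ext_dok'
$zipLocation = 'C:\Path\To\Zipped'   # hypothetical target for the ZIP-disguised files

Get-ChildItem -Path $pdfDirectory -Filter '*.pdf' | ForEach-Object {
    # Read only the first two bytes of the file.
    $bytes = [byte[]]::new(2)
    $stream = [System.IO.File]::OpenRead($_.FullName)
    try { [void]$stream.Read($bytes, 0, 2) }
    finally { $stream.Close() }

    # 0x50 0x4B = 'PK', the ZIP container signature; real PDFs start with '%PDF'.
    if ($bytes[0] -eq 0x50 -and $bytes[1] -eq 0x4B) {
        Move-Item -Path $_.FullName -Destination $zipLocation
    }
}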
Get-Content is not suited for PDFs; you'd want to use iTextSharp to read PDFs.
Download iTextSharp (found in its releases) and put the itextsharp.dll somewhere easy to find (i.e. the folder your script is located in).
You can install the .nupkg by using Install-Package, or simply use an archive tool to extract the contents of the .nupkg file (it's basically a .zip file).
The code below adds every word on page 1 of each PDF, separated by whitespace, to an array. You can then test whether the array contains your keyword.
Add-Type -Path "C:\path\to\itextsharp.dll"

$pdfs = Get-ChildItem "C:\path\to\pdfs" *.pdf
foreach ($pdf in $pdfs) {
    $reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $pdf.FullName
    $text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, 1).Split("")
    foreach ($line in $text) {
        # do your test here
    }
}

Edit powershell script to merge 2 docx into one PDF

I have found this script online. It converts .docx files to PDF. The thing is, it creates one PDF for each .docx. I need to edit this script to merge 2 .docx files into one single PDF file. I have zero knowledge of PowerShell, but I know shell scripting on Linux.
$documents_path = Split-Path -Parent $MyInvocation.MyCommand.Path
$word_app = New-Object -ComObject Word.Application

Get-ChildItem -Path $documents_path -Filter *.doc? | ForEach-Object {
    $document = $word_app.Documents.Open($_.FullName)
    $pdf_filename = "$($_.DirectoryName)\$($_.BaseName).pdf"
    $document.SaveAs([ref] $pdf_filename, [ref] 17)
    $document.Close()
}
$word_app.Quit()
This is the design of the script you are using: one PDF per document.
A more direct approach is to merge the .docx files first, then convert the result to PDF. This means you have to understand the MS Word object model and how to code against it. You're going to have to pick a starting .docx and then append the other Word document's content to the end.
So, do a search for how to merge Word files. Get that worked out, and then you can use PowerShell to turn the merged document into a .pdf, as sketched below.
With zero knowledge of PowerShell, you should really take a few quick online training sessions to get a handle on it all before you get yourself into a very frustrating position.
Go to the Microsoft Virtual Academy and YouTube and do a search for 'beginning PowerShell'.
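To give you a starting point, here is a minimal, untested sketch of that idea using the Word COM object: open the first document, append the second at the end, and save the combined result as one PDF. The file paths are placeholders, and the numeric constants are the documented Word enum values.

$first = 'C:\docs\part1.docx'
$second = 'C:\docs\part2.docx'
$output = 'C:\docs\merged.pdf'

$word = New-Object -ComObject Word.Application
$doc = $word.Documents.Open($first)

$selection = $word.Selection
$selection.EndKey(6)        # 6 = wdStory, jump to the end of the first document
$selection.InsertBreak(7)   # 7 = wdPageBreak, optional separator between the two parts
$selection.InsertFile($second)

$doc.SaveAs([ref] $output, [ref] 17)   # 17 = wdFormatPDF
$doc.Close()
$word.Quit()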

Using Powershell to Strip Content from PDF

Using Powershell to Strip Content from PDF While Keeping PDF Format.
My Task:
I have been attempting to perform what would be a simple task if the documents were not in PDF format. I have a bunch of PDFs that have unwanted data before the bulk of usable data starts; this is anything that comes before '%PDF' in the documents. A script that pulls all the desired data and exports it to a new file was needed. That part was super easy.
The Problem:
The data that is exported appears to be formatted correctly, except it doesn't open as a PDF anymore. I can open it in Notepad++ and it looks identical to one that was cleaned manually and works. Examining the raw code of the PowerShell-altered PDF, it appears that the 'lines' are much shorter than they should be.
$Path = 'C:\FileLocation'
$Output = '.\MyFile.pdf'
$LineArr = @()

$Target = Get-ChildItem -Path $Path -Filter *.pdf -Recurse -ErrorAction SilentlyContinue | Get-Content -Encoding default | Out-String -stream

$Target.Where({ $_ -like '*%PDF*' }, 'SkipUntil') | ForEach-Object {
    If ($_.Contains('%PDF')) {
        $LineArr += "%" + $_.Split('%')[1]
    }
    else {
        $LineArr += $_
    }
}

$LineArr | Out-File -Encoding Default -FilePath $Output
I understand the PDF format doesn't really use lines, so that might be where the problem is being created: either when the data is initially put into an array, or when it's written out, the PDF format is probably being broken. Is there a way to retain the format of the PDF while it is modified and then saved? It's probably the case that I'm missing something simple.
So I was about to start looking at iTextSharp and decided to give an older language a try first: WinBatch. (Bleh!) I almost made a screen scraper to do the work, but the shame of taking that route got the better of me. So, the function library was the next stop.
This is just a little blurb I spit out with no error checking or logging going on at this point. All that will be added in, along with file searches, later. All in all, it manages to clear all the unwanted extras in the PDF while keeping the exact format that is required by PDFs.
strPDFdoco = "C:\TestPDFs\Test.pdf"
strPDFString = "%%PDF"
strPDFendString = "%%%%END"
If FileExist(strPDFdoco)
strPDFName = ItemExtract(-1, strPDFdoco, "\")
strFixedPDFFullPath = ("C:\TestPDF\Fixed\": strPDFName)
strCurrentPDFFileSize = FileSize(strPDFdoco) ; Get size of PDF file
hndOldPDFFile = BinaryAlloc(strCurrentPDFFileSize) ; Allocate memory for reading PDF file
BinaryRead(hndOldPDFFile, strPDFdoco) ; Read PDF file
strStartIndex = BinaryIndexEx(hndOldPDFFile, 0, strPDFString, #FWDSCAN, #FALSE) ; Find start point for copy
strEndIndex = BinaryEodGet(hndOldPDFFile) ; find eof
strCount = strEndIndex - strStartIndex
strWritePDF = BinaryWriteEx( hndOldPDFFile, strStartIndex, strFixedPDFFullPath, 0, strCount)
BinaryFree(hndOldPDFFile)
ENDIF
Now that I have an idea how this works, making a tool to do this in PS sounds more doable. There's a PS function out there in the wild called Get-HexDump that might be a good base to educate myself on bits and hex in PS. Since this works in WinBatch, I assume there is some sort of equivalent in AutoIt, and it could be reproduced in most basic languages.
There appear to be a lot of people out there trying to clear crud from before the header and after the end of their PDF docos, so hopefully this helps. I've got half a million to hit with whatever script I morph this into. I might update with a PS version if I decide to go that route again, and if I remember.
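For anyone who wants the PS version now, here is a rough, untested sketch of the same binary approach using .NET file APIs (paths are placeholders): read the file as raw bytes, find the offset of the '%PDF' marker, and write everything from that offset on to a new file.

$source = 'C:\TestPDFs\Test.pdf'
$target = 'C:\TestPDFs\Fixed\Test.pdf'

$bytes = [System.IO.File]::ReadAllBytes($source)
$marker = [System.Text.Encoding]::ASCII.GetBytes('%PDF')

# Scan for the first occurrence of the '%PDF' marker.
$offset = -1
for ($i = 0; $i -le $bytes.Length - $marker.Length; $i++) {
    $match = $true
    for ($j = 0; $j -lt $marker.Length; $j++) {
        if ($bytes[$i + $j] -ne $marker[$j]) { $match = $false; break }
    }
    if ($match) { $offset = $i; break }
}

if ($offset -ge 0) {
    # Keep everything from '%PDF' to the end of the file, byte for byte.
    [System.IO.File]::WriteAllBytes($target, $bytes[$offset..($bytes.Length - 1)])
}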

Working with Word templates with Powershell

I am writing a function that is part of a much larger script that will take input from a web form, check to see if that user exists in either our AD or Linux systems, create the account if it doesn't, email the user when it's done, then create a Word document that we can print out and give them with their credentials (sans temp password), email address, and basic information about our IT services.

I have been beating my head against the wall with the Word integration. There is almost ZERO PowerShell documentation online for Word integration. I've been having to translate what I can from C# and VB, and even half of that isn't translatable. I've got it mostly working now, but I'm having problems getting PS to put my text in the correct location in the Word template.

I have a Word template with 4 bookmarks where I am inserting the user's name, username, email address, and account expiration. The problem is, PS is placing all of the text at the same bookmark. I've found that if I put the info in the script statically it will work (i.e. $FillName.Text = 'John Doe'), but if I use a variable it will just stick all of them at the first bookmark. Here is my code:
Function createWordDocument($fullname, $sam, $mailaddress, $Expiration)
{
    $word = New-Object -ComObject "Word.Application"
    $doc = $word.Documents.Add("C:\Users\smiths\Documents\Powershell Scripts\webformCreateUsers\welcome2.dotx")

    $FillName = $doc.Bookmarks.Item("Name").Range
    $FillName.Text = "$fullname "
    $FillUser = $doc.Bookmarks.Item("Username").Range
    $FillUser.Text = "$sam"
    $FillMail = $doc.Bookmarks.Item("Email").Range
    $FillMail.Text = "$mailaddress"
    $FillExpiration = $doc.Bookmarks.Item("Expiration").Range
    $FillExpiration.Text = "$Expiration"

    $file = "C:\Users\smiths\Documents\Powershell Scripts\webformCreateUsers\test1.docx"
    $doc.SaveAs([ref]$file)
    $Word.Quit()
}
The function is receiving parameters that originated from an Import-Csv. $fullname, $sam, and potentially $mailaddress have all been modified from their original inputs. $Expiration comes from the Import-Csv raw. Any help would be appreciated. This seems to be the most relevant info I could find, and as far as I can tell I've got the same code, but it won't work for multiple bookmarks.
Ok, like I suggested, you can set up a mail merge base that you can use to create docs for people. It does mean that you would need to output your data to a CSV file, but that is pretty trivial.
Start by setting up a test CSV with the data that you want to include. For simplicity you may want to place it with the Word doc that references it. We'll call it mailmerge.csv for now, but you can name it whatever you want. It looks like Name, UserName, Email, and Expiration are the fields you would want. You can use dummy data in those fields for the time being.
Then set up your mail merge in Word, and save it someplace. We'll call it Welcome3.docx, and stash it in the same place as your last doc. Then, once it's set up to reference your CSV file, and saved, you can launch Word, open the master document, and perform the merge, then just save the file, and away you go.
I'll just use a modified version of your function which will create the CSV from the parameters provided, open the merge doc, execute the merge, save the new file, and close Word. Then it'll pass a FileInfo object back so you can use that to send the email, or whatever.
Function createWordDocument($fullname, $sam, $mailaddress, $Expiration)
{
    [PSCustomObject]@{Name=$fullname; Username=$sam; Email=$mailaddress; Expiration=$Expiration} |
        Export-Csv "C:\Users\smiths\Documents\Powershell Scripts\webformCreateUsers\mailmerge.csv" -NoTypeInformation -Force

    $word = New-Object -ComObject "Word.Application"
    $doc = $word.Documents.Open("C:\Users\smiths\Documents\Powershell Scripts\webformCreateUsers\welcome3.dotx")
    $doc.MailMerge.Execute()

    $file = "C:\Users\smiths\Documents\Powershell Scripts\webformCreateUsers\$fullname.docx"
    ($word.Documents | ?{ $_.Name -Match "Letters1" }).SaveAs([ref]$file)
    $Word.Quit()

    [System.IO.FileInfo]$file
}
TheMadTechnician put me on the right track, but I had to do some tweaking. Here is what I wound up with:
Function createWordDocument($fullname)
{
    $word = New-Object -ComObject "Word.Application"
    $doc = $word.Documents.Add("C:\Users\smiths\Documents\Powershell Scripts\webformCreateUsers\welcome_letter.docx")
    $doc.MailMerge.Execute()

    $file = "C:\Users\smiths\Documents\Powershell Scripts\webformCreateUsers\$fullname.docx"
    ($word.Documents | ?{ $_.Name -Match "Letters1" }).SaveAs([ref]$file)

    $quitFormat = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveOptions], "wdDoNotSaveChanges")
    $Word.Quit([ref]$quitFormat)
}
Instead of passing the arguments to the function, I had the main function create the mailmerge.csv file for me and just have the Word template connect to it. I'm still passing $fullname since that's what I'm naming the file in the end. The two major hiccups in the end were that every time a mail merge document is opened, Word asks if you want to connect back to the source data. This means that when PowerShell was trying to open it, Word was waiting for interaction, and then PS would close it when it thought it was done. Of course, this meant that nothing got done. I found that there is a registry value that you must create to let Word skip the SQL security check. For posterity's sake, you must create it here:
HKCU\Software\Microsoft\Office\14.0\Word\Options, a DWORD value named SQLSecurityCheck set to 0. That allowed Word to properly open the template and manipulate the files. The last bit of trouble I had was that Word wanted to re-save the original file each time it ran and would leave a dialogue box open, which would leave Word open and in memory. The last 2 lines force Word to close without saving.
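For reference, a small sketch of creating that value from PowerShell (the 14.0 segment depends on the installed Office version, so adjust as needed):

$key = 'HKCU:\Software\Microsoft\Office\14.0\Word\Options'
New-ItemProperty -Path $key -Name 'SQLSecurityCheck' -Value 0 -PropertyType DWord -Force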