I have an application that opens a WinForms form and asks the user to select a PDF file. Because I can't easily read the strings in a PDF file, I need to convert it to a .txt file. When the user clicks OK, the application runs the script below.
The problem is using the resulting .txt file and passing it to another command without knowing its name. When I try to pipe it to another command, it fails because I don't have the path. I think this is because the output of the conversion is the string "OK" rather than the .txt file itself.
How can I convert the PDFs to text (I'm using Xpdf) and pass the converted file down the pipeline for further processing?
If my approach is the problem, how else can I accomplish this task?
Add-Type -AssemblyName System.Windows.Forms
$form = New-Object System.Windows.Forms.Form
$form.StartPosition = 'CenterScreen'
$button = New-Object System.Windows.Forms.Button
$form.Controls.Add($button)
$button.Text = 'Get file'
$button.Location = '10,10'
$button.Add_Click({
$ofd = New-Object system.windows.forms.Openfiledialog
$ofd.Filter = 'PDFs (*.pdf)|*.pdf'
$script:filename = 'Not found'
if ($ofd.ShowDialog() -eq 'Ok') {
$script:filename = $textbox.Text = $ofd.FileName
}
})
$buttonOK = New-Object System.Windows.Forms.Button
$form.Controls.Add($buttonOK)
$buttonOK.Text = 'Ok'
$buttonOK.Location = '10,40'
$buttonOK.DialogResult = 'OK'
$textbox = New-Object System.Windows.Forms.TextBox
$form.Controls.Add($textbox)
$textbox.Location = '100,10'
$textbox.Width += 50
$form.ShowDialog()
$output = & "C:\Users\eakinsa\Desktop\Style Guide Report\Includes\bin32\pdftotext" $filename
$output |
Get-Location -OutVariable textFile |
Select-String -Path $textFile -Pattern ed
Update, per Ansgar's answer:
I changed the last few lines to keep pdftotext's default behaviour for now, where it creates the output file in the same directory with the same name. As he suggested, I can then simply replace .pdf with .txt at the end of the file path, which lets me pass the correct path to subsequent functions. With that change, I was able to search the text file.
& "C:\users\eakinsa\Desktop\Style Guide Report\Includes\bin32\pdftotext" $filename
$pdf = Get-Item $filename
$textfile = $filename -replace '\.pdf$', '.txt'
Select-String -Path $textfile -Pattern ed
When you run pdftotext with only the input PDF as its argument, it creates the output text file in the same directory, with the same base name and the extension .txt.
& pdftotext C:\temp\foo.pdf # creates C:\temp\foo.txt
So you can build the text file path like this:
$pdf = Get-Item $filename
$textfile = Join-Path $pdf.DirectoryName ($pdf.BaseName + '.txt')
or like this:
$textfile = $filename -replace '\.pdf$', '.txt'
Alternatively, you can tell pdftotext where to create the output file:
$textfile = 'C:\some\where\bar.txt'
& pdftotext $filename $textfile # creates C:\some\where\bar.txt
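To get the converted file into the pipeline, you can turn the path-building step into a small pipeline-aware function. This is a minimal sketch: the name Get-PdfTextPath is hypothetical, it only computes the default output path (it does not run pdftotext), and it uses [System.IO.Path]::ChangeExtension instead of the -replace shown above.

```powershell
# Hypothetical helper: maps each PDF path to the .txt path that pdftotext
# creates when it is called with only the input file (same directory,
# same base name, .txt extension). It does not run pdftotext itself.
function Get-PdfTextPath {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [string] $PdfPath
    )
    process {
        # ChangeExtension swaps the existing extension for .txt
        [System.IO.Path]::ChangeExtension($PdfPath, '.txt')
    }
}

# Example usage (assumes pdftotext is on the PATH):
# & pdftotext $filename
# $filename | Get-PdfTextPath | ForEach-Object { Select-String -Path $_ -Pattern ed }
```

The ForEach-Object step matters: a plain string piped straight into Select-String binds to -InputObject, so it would search the path text itself rather than the file it names.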
Related
I am setting up a PowerShell script that, given the path to a directory, checks whether any of the strings in an Excel file appear in the .doc or .docx files in that directory. If so, the colour of that string has to be changed in the doc/docx file.
I've been experimenting with Word's COM object for a while and haven't been able to get it to work.
Here is what I have so far:
#Open a Folder Browser to select the folder to process
Add-Type -AssemblyName System.Windows.Forms
$fwd = New-Object System.Windows.Forms.FolderBrowserDialog
$null = $fwd.ShowDialog()
$path = $fwd.SelectedPath
$files = Get-ChildItem -Path $path | Where-Object -Property Extension -Match '^\.docx?$'
#Load the Excel file data (Import-Excel comes from the ImportExcel module) into an array of objects for later iteration
$forbiddenWordsFilePath = "<<Path_To_XLSX>>"
$forbiddenWords= Import-Excel -Path $forbiddenWordsFilePath
$WordDoc = New-Object -ComObject Word.Application
$WordDoc.visible=$false
#Iterate over the files in the folder
foreach ($file in $files) {
#Back up the doc/docx file before working with it
$fileBackupPath= $file.FullName+"_bck"
Copy-Item $file.FullName -Destination $fileBackupPath
#Open the doc/docx file in the background (the third argument, $true, is ReadOnly)
$doc = $WordDoc.Documents.Open($file.FullName,[type]::Missing,$true)
$MatchCase = $false
$MatchWholeWord = $true
$MatchWildcards = $false
$MatchSoundsLike = $false
$MatchAllWordForms = $false
$Forward = $false
$Wrap = 1
$Format = $true
$wdReplaceAll = 2
#Iterate over the words list and search for them in the doc/docx file
foreach ($word in $forbiddenWords){
$Replace=$word.Search_Pattern
$doc.Content.Find.Execute($word.Search_Pattern, $MatchCase, $MatchWholeWord, $MatchWildcards, $MatchSoundsLike, $MatchAllWordForms, $Forward, $Wrap, $Format, $Replace, $wdReplaceAll)
}
}
#Close Word and release the COM object so no Word process is left running
$WordDoc.Quit()
$null = [System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$WordDoc)
[gc]::Collect()
[gc]::WaitForPendingFinalizers()
Remove-Variable WordDoc
I know the colour change belongs in the inner foreach loop, but I can't find the code that does it.
I'm a bit of a novice with the Word COM object in PowerShell.
The Excel file has the following headers:
Control_Type Control Comments Search_Pattern
I've included these in case it's unclear where the Search_Pattern property comes from.
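One way to do the colour change is a find-and-replace that changes only formatting: set Find.Replacement.Font.Color, pass Format = $true, and replace each match with itself. The sketch below assumes a Word Document COM object opened read/write (the script above passes $true as the ReadOnly argument of Documents.Open, so changes could not be saved back to the same file). The function name Set-WordMatchColor is hypothetical; 255 is the value of Word's wdColorRed constant, and '^&' is Word's replace code for "the found text".

```powershell
# Sketch: colour every whole-word occurrence of each pattern in an open Word
# document. $Document is assumed to be a Word Document COM object opened
# read/write; 255 is the value of Word's wdColorRed constant.
function Set-WordMatchColor {
    param(
        [Parameter(Mandatory)] $Document,
        [Parameter(Mandatory)] [string[]] $Pattern,
        [int] $Color = 255
    )
    $wdFindContinue = 1
    $wdReplaceAll   = 2
    foreach ($text in $Pattern) {
        # Fresh Find object per pattern, scoped to the whole document body
        $find = $Document.Content.Find
        $find.ClearFormatting()
        $find.Replacement.ClearFormatting()
        # The formatting applied to every match
        $find.Replacement.Font.Color = $Color
        # Arguments: FindText, MatchCase, MatchWholeWord, MatchWildcards,
        # MatchSoundsLike, MatchAllWordForms, Forward, Wrap, Format,
        # ReplaceWith ('^&' = the found text itself), Replace
        [void]$find.Execute($text, $false, $true, $false, $false, $false, $true, $wdFindContinue, $true, '^&', $wdReplaceAll)
    }
}

# Example usage inside the file loop (requires Word):
# $doc = $WordDoc.Documents.Open($file.FullName)   # no ReadOnly flag
# Set-WordMatchColor -Document $doc -Pattern $forbiddenWords.Search_Pattern
# $doc.Save(); $doc.Close()
```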
I have added spell checking to a PowerShell script that reads the contents of a file.
This script does the job, but I'd like to know whether there are any external packages or modules that do the same thing, since that would make the work easier.
$file = Get-ChildItem ./Code-Duplication/master.md
$Proofread_text = Get-Content $file.FullName
$Word = New-Object -COM Word.Application
$Document = $Word.Documents.Add()
$Textrange = $Document.Range(0)
#$english = FindLanguage("English (US)")
#$Textrange.LanguageID = $english.ID
$Textrange.InsertAfter($Proofread_text)
<#Handle misspelled words here#>
$file.Name
Write-Output "---------------"
foreach($spell_error in $textrange.SpellingErrors){
Write-Host $spell_error.Text
}
$Document.Close(0)
$Word.Quit()
I'm trying to insert a .txt file into a Word document with
$doc.selection.InlineShapes.AddOLEObject($txtFile)
But when I run it, I get an error message:
What could be the problem? Thanks.
The code:
$Global:path = "C:\Users\user\desktop"
$txtFile = New-Item $Global:path -Name TxtFile.txt
$docx = New-Object -ComObject Word.Application
$docx.Visible = $true
$docxFileName = $docx.Documents.add()
$docx.Selection.range.InsertAfter("hello")
$docx.Selection.InlineShapes.AddOLEObject($txtFile)
$docxFileName.SaveAs([ref]$Global:path,[ref]$SaveFormat::wdFormatDocument)
$docx.Quit()
Use this instead, and read the in-code comments.
Check out the documentation for AddOLEObject for more info.
$Global:path = "C:\Users\user\desktop"
# I've added the -Force flag because, if the file already exists,
# New-Item fails and $txtFile won't hold the file object you expect.
# $Global:path -> $path: you don't need the scope modifier to read the variable here.
$txtFile = New-Item $path -Name TxtFile.txt -Force
$docx = New-Object -ComObject Word.Application
$docx.Visible = $true
$docxFileName = $docx.Documents.add()
$docx.Selection.range.InsertAfter("hello")
# This line should contain an empty argument for ClassType, then the file path for FileName
$docx.Selection.InlineShapes.AddOLEObject("",$txtFile.FullName)
# again, $global:path is not required.
# you can specify a path and the docx extension will be added automagically
$docxFileName.SaveAs("$path\somefilename")
$docx.Quit()
I'm struggling to write a text file and read its content back exactly as the user entered it. I tried to find an answer, but all I could find was how to turn a multiline string into a single line, which is the opposite of what I need.
How do I make sure multiline input from the user isn't collapsed to a single line in PowerShell?
Details:
In the input box, I enter the details on multiple lines:
PowerShell saves the output in one line:
I want PowerShell to save multiline input and then show multiline output.
Here is the code:
function Notes
{
$var = $Users.Text
$var | Out-File C:\Users\A570654\Desktop\users.txt
$var = Get-Content C:\Users\A570654\Desktop\users.txt
Set-Content -Path C:\Users\A570654\Desktop\notes.txt "*************************************************
$var
*************************************************"
Invoke-Item -Path C:\Users\A570654\Desktop\notes.txt
}
############################## FORMS #######################################
Add-Type -AssemblyName System.Windows.Forms, System.Drawing
$Users = New-Object System.Windows.Forms.RichTextBox
$Users.Size = New-Object System.Drawing.size(470,85)
$Users.Location = New-Object System.Drawing.Size(10,100)
$Users.MultiLine = $true
$Users.AutoSize = $true
$Users.ScrollBars = "Vertical"
$Users.AcceptsTab = $true
$form = New-Object system.windows.forms.form
$form.size = New-Object system.drawing.size(600,600)
$button = New-Object system.windows.forms.button
$button.text = "Get Notes"
$button.Add_Click({Notes})
$form.Controls.Add($Users)
$form.Controls.Add($button)
$form.Showdialog()
I only save the string to a file and read it back because that at least separates the names with spaces, as in picture 2. If I skip the file and use the string directly, it appears as JohnTomKate.
You save the user input to a file and then reload it. Get-Content returns an array of lines, and when an array is embedded in a string, PowerShell joins its elements with spaces, which produces the output you're seeing. So just use the variable without reassigning it:
$var = $Users.Text
$var | Out-File C:\Users\A570654\Desktop\users.txt
Set-Content -Path C:\Users\A570654\Desktop\notes.txt "*************************************************
$var
*************************************************"
Output:
*************************************************
John
Tom
Kate
*************************************************
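If the file still shows the names run together (the JohnTomKate case), a likely cause is the line endings: RichTextBox.Text typically separates lines with a bare LF, which older versions of Notepad display as a single line. A minimal sketch that normalizes the endings to CRLF before writing; the function name Format-Notes is hypothetical, and $Users.Lines -join "`r`n" would be an alternative way to get the same text.

```powershell
# Sketch: build the notes text from multiline input. RichTextBox.Text
# typically separates lines with a bare LF, so the line endings are
# normalized to CRLF for editors that expect Windows line breaks.
function Format-Notes {
    param([string] $Text)
    $banner = '*' * 49
    # Turn every LF (with or without a preceding CR) into CRLF
    $body = $Text -replace "`r?`n", "`r`n"
    # Join with CRLF; embedding an array in a string would join it with spaces
    @($banner, $body, $banner) -join "`r`n"
}

# Example usage inside the Notes function:
# Set-Content -Path C:\Users\A570654\Desktop\notes.txt -Value (Format-Notes $Users.Text)
```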
The following program runs an Excel VBA macro named "Macro1" from PowerShell on a group of files in the folder "c:\mfolder". How can I do the same for a Word VBA macro?
*****runexcel.ps1 ******
$excel = new-object -comobject excel.application
$excelFiles = Get-ChildItem -Path C:\mfolder -Include *.xls -Recurse
Foreach($file in $excelFiles)
{
$workbook = $excel.workbooks.open($file.fullname)
$worksheet = $workbook.worksheets.item(1)
$excel.Run("Macro1")
$workbook.save()
$workbook.close()
}
$excel.quit()
To open Microsoft Word from PowerShell, use the following command:
$word = New-Object -ComObject Word.Application
Within your loop, use this to open each file:
$doc = $word.documents.open($file.fullname)
You should be able to adapt the rest from the script you provided.
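A minimal sketch of the adapted loop, assuming .doc/.docx files under C:\mfolder and a macro named Macro1 that Word can reach from the opened document (for example, one stored in Normal.dotm). The function name Invoke-WordMacro is hypothetical; it takes the Word application object as a parameter so the loop mirrors the Excel version: open, run, save, close.

```powershell
# Sketch: run a Word VBA macro against a list of documents.
# $Word is expected to be a Word.Application COM object; the macro must be
# reachable from the opened document (e.g. stored in Normal.dotm).
function Invoke-WordMacro {
    param(
        [Parameter(Mandatory)] $Word,
        [Parameter(Mandatory)] [string[]] $Path,
        [Parameter(Mandatory)] [string] $MacroName
    )
    foreach ($file in $Path) {
        $doc = $Word.Documents.Open($file)   # open the document
        [void]$Word.Run($MacroName)          # run the macro, discard any return value
        $doc.Save()                          # keep the macro's changes
        $doc.Close()
    }
}

# Example usage (requires Word):
# $word = New-Object -ComObject Word.Application
# $files = (Get-ChildItem -Path C:\mfolder -Include *.doc, *.docx -Recurse).FullName
# Invoke-WordMacro -Word $word -Path $files -MacroName 'Macro1'
# $word.Quit()
```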