Print PDF files on different printers depending on their content - PowerShell

I want to print .pdf files on different printers, depending on their content.
How can I check whether a specific single word is present in a file?
To work through a folder's content I've built the following so far:
Unblock-File -Path S:\test\itextsharp.dll
Add-Type -Path S:\test\itextsharp.dll
$files = Get-ChildItem S:\test\*.pdf
$adobe='C:\Program Files (x86)\Adobe\Acrobat DC\Acrobat\Acrobat.exe'
foreach ($file in $files) {
$reader = [iTextSharp.text.pdf.parser.PdfTextExtractor]
$Extract = $reader::GetTextFromPage($File.FullName,1)
if ($Extract -Contains 'Lieferschein') {
Write-Host -ForegroundColor Yellow "Lieferschein"
$printername='XX1'
$drivername='XX1'
$portname='192.168.X.41'
} else {
Write-Host -ForegroundColor Yellow "Etikett"
$printername='XX2'
$drivername='XX2'
$portname='192.168.X.42'
}
$arglist = '/S /T "' + $file.FullName + '" "' + $printername + '" "' + $drivername + '" "' + $portname + '"'
start-process $adobe -argumentlist $arglist -wait
Start-Sleep -Seconds 15
Remove-Item $file.FullName
}
For now I have two problems with it:
1st: Add-Type -Path itextsharp.dll gives me an error:
Add-Type: One or more types in the assembly cannot be loaded. Get the LoaderExceptions property for more information. In line: 2 character: 1
I've read that it might be due to the file being blocked, but there is no information about that in the file's properties, and the Unblock-File command at the start doesn't change/solve anything.
After using $error[0].exception.loaderexceptions[0] I get the information that BouncyCastle.Crypto, Version=1.8.6.0 is missing. Unfortunately I can't find a source for that assembly yet.
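A compact way to surface those LoaderExceptions is a try/catch around the Add-Type call; a small sketch, equivalent to the $error lookup above:
try {
    Add-Type -Path 'S:\test\itextsharp.dll' -ErrorAction Stop
} catch {
    # Each LoaderException names a dependent assembly that failed to load,
    # e.g. 'BouncyCastle.Crypto, Version=1.8.6.0 ...'
    $_.Exception.LoaderExceptions | ForEach-Object { $_.Message }
}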
2nd: Will if ($Extract -Contains 'Lieferschein') work as I intend? Will it check for the phrase once the Add-Type loads successfully?
Alternatively: there's also the possibility to make it depend on the content's format. One type of file is DIN A4 sized, for example; the other is smaller than that. If there's an easier way to check for that, you'd make me happy as well.
Thank you in advance!

Searching for a keyword in a PDF using PowerShell and iTextSharp.dll is a very common task. You then just use your conditional logic to send the file to whatever printer you choose.
So, something like this should do:
Add-Type -Path 'C:\path_to_dll\itextsharp.dll'
$pdfs = Get-ChildItem 'C:\path_to_pdfs' -Filter '*.pdf'
$export = 'D:\Temp\PdfExport.csv'
$results = @()
$keywords = @('Keyword1')
foreach ($pdf in $pdfs)
{
"processing - $($pdf.FullName)"
$reader = New-Object iTextSharp.text.pdf.pdfreader -ArgumentList $pdf.FullName
for ($page = 1; $page -le $reader.NumberOfPages; $page++)
{
$pageText = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, $page).Split([char]0x000A)
foreach ($keyword in $keywords)
{
if ($pageText -match $keyword)
{
$response = @{
keyword = $keyword
file = $pdf.FullName
page = $page
}
$results += New-Object PSObject -Property $response
}
}
}
$reader.Close()
}
"`ndone"
$results |
Export-Csv $export -NoTypeInformation
Update
As per your comment, regarding your error:
Again, iTextSharp is legacy, and you really need to move to iText 7.
Nonetheless, that is not a PowerShell code issue. It is an iTextSharp.dll missing dependency. Even with iText 7, you need to ensure you have all the dependencies on your machine, properly loaded.
As noted in this SO Q&A:
How to use Itext7 in powershell V5, Exception when loading pdfWriter
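As for the alternative in the question (branching on the page format instead of the content): iTextSharp's PdfReader exposes the page dimensions in points (1 pt = 1/72 inch; DIN A4 is roughly 595 x 842 points), so a sketch along these lines should work. The printer names are the placeholders from the question, and $file comes from the question's loop:
$reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $file.FullName
# GetPageSizeWithRotation returns the rectangle of page 1 in points,
# honoring the page's rotation flag:
$size = $reader.GetPageSizeWithRotation(1)
if ([math]::Round($size.Width) -ge 595 -and [math]::Round($size.Height) -ge 842) {
    $printername = 'XX1'   # A4-sized document ("Lieferschein")
} else {
    $printername = 'XX2'   # smaller label ("Etikett")
}
$reader.Close()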

1st:
After finding the correct version (1.8.6) on nuget.org, the Add-Type commands work perfectly. As expected, I didn't even need the unblock command, as the file was not marked as blocked in its properties. Now the script starts with (BouncyCastle first, since itextsharp.dll depends on it):
Add-Type -Path 'c:\BouncyCastle.Crypto.dll'
Add-Type -Path 'c:\itextsharp.dll'
2nd:
Regarding the content check: I just had to replace -contains with -match in my if clause:
if ($Extract -match 'Lieferschein')
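The distinction matters because -contains tests whole-element equality against a collection, while -match runs a regular expression against the string (or each element of an array) and so finds substrings. A quick illustration with a hypothetical extracted text:
$Extract = 'Lieferschein Nr. 4711'
$Extract -contains 'Lieferschein'    # False: compares whole elements only
$Extract -match 'Lieferschein'       # True: regex match anywhere in the string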


Word COM object failing

SCRIPT PURPOSE
The idea behind the script is to recursively extract the text from a large number of documents and update a field in an Azure SQL database with the extracted text. Basically, we are moving away from Windows Search of document contents to SQL full-text search to improve the speed.
ISSUE
When the script encounters an issue opening the file such as it being password protected, it fails for every single document that follows. Here is the section of the script that processes the files:
foreach ($list in (Get-ChildItem ( join-path $PSScriptRoot "\FileLists\*" ) -include *.txt )) {
## Word object
$word = New-Object -ComObject word.application
$word.Visible = $false
$saveFormat = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], "wdFormatText")
$word.DisplayAlerts = 0
Write-Output ""
Write-Output "################# Parsing $list"
Write-Output ""
$query = "INSERT INTO tmp_CachedText (tCachedText, tOID)
VALUES "
foreach ($file in (Get-Content $list)) {
if ($file -like "*-*" -and $file -notlike "*~*") {
Write-Output "Processing: $($file)"
Try {
$doc = $word.Documents.OpenNoRepairDialog($file, $false, $false, $false, "ttt")
if ($doc) {
$fileName = [io.path]::GetFileNameWithoutExtension($file)
$fileName = $filename + ".txt"
$doc.SaveAs("$env:TEMP\$fileName", [ref]$saveFormat)
$doc.Close()
$4ID = $fileName.split('-')[-1].replace(' ', '').replace(".txt", "")
$text = Get-Content -raw "$env:TEMP\$fileName"
$text = $text.replace("'", "''")
$query += "
('$text', $4ID),"
Remove-Item -Force "$env:TEMP\$fileName"
<# Upload to azure #>
$query = $query.Substring(0,$query.Length-1)
$query += ";"
Invoke-Sqlcmd @params -Query $Query -ErrorAction "SilentlyContinue"
$query = "INSERT INTO tmp_CachedText (tCachedText, tOID)
VALUES "
}
}
Catch {
Write-Host "$($file) failed to process" -ForegroundColor RED;
continue
}
}
}
Remove-Item -Force $list.FullName
Write-Output ""
Write-Output "Uploading to azure"
Write-Output ""
<# Upload to azure #>
Invoke-Sqlcmd @params -Query $setQuery -ErrorAction "SilentlyContinue"
$word.Quit()
TASKKILL /F /IM WINWORD.EXE
}
Basically, it parses a folder of .txt files that each contain some number of document paths, builds a T-SQL INSERT statement, and runs it against an Azure SQL database after each list file is fully parsed. The lists are generated with the following:
if (!($continue)) {
if ($pdf){
$files = (Get-ChildItem -force -recurse $documentFolder -include *.pdf).fullname
}
else {
$files = (Get-ChildItem -force -recurse $documentFolder -include *.doc, *.docx).fullname
}
$files | Out-File (Join-Path $PSScriptRoot "\documents.txt")
$i=0; Get-Content $documentFile -ReadCount $interval | %{$i++; $_ | Out-File (Join-Path $PSScriptRoot "\FileLists\documents_$i.txt")}
}
The $interval variable defines how many files are extracted for each upload to Azure. Initially I had the Word object created outside the loop and never closed until the end. Unfortunately this doesn't seem to work: every time the script hits a file it cannot open, every file that follows fails until it reaches the end of the inner loop, foreach ($file in (Get-Content $list)).
This means that to get the expected outcome I have to run this with an interval of 1, which takes far too long.
This is a shot in the dark, but to me it sounds like the reason it's failing is that the Word COM object is prompting you for some action because it cannot open the file, so all following items in the loop also fail. This might explain why it works when you set $interval to 1: in that case the COM object is closed and reopened every time, and that takes forever (I ran into the same behavior with Excel).
What you can do is, in your catch statement, close the Word COM object and open a new one, which should let you continue with the loop (though it will be a bit slower if it needs to reopen the COM object a lot).
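A minimal sketch of that idea, reusing the variable names from the question (note the fresh instance needs the same Visible/DisplayAlerts setup again):
Catch {
    Write-Host "$($file) failed to process" -ForegroundColor Red
    # Tear down the (possibly hung) Word instance and start a clean one
    # so the remaining files in this list can still be processed:
    try { $word.Quit() } catch { }
    $word = New-Object -ComObject word.application
    $word.Visible = $false
    $word.DisplayAlerts = 0
    continue
}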
If you want to debug the problem further, set the COM object to be visible and slowly step through your program without interacting with Word. This will show you what is happening in Word and whether any prompts are causing the application to hang.
Of course, if you want to run it at full speed, you will need to detect which documents you can't open beforehand, or you could multithread it by opening several Word COM objects, which would let you load several documents at a time.
As for...
ISSUE
When the script encounters an issue opening the file such as it being password protected, it fails for every single document that follows.
... then test for this as noted here...
How to check if a word file has a password?
$filename = "C:\path\to\your.doc"
$wd = New-Object -COM "Word.Application"
try {
$doc = $wd.Documents.Open($filename, $null, $null, $null, "")
} catch {
Write-Host "$filename is password-protected!"
}
... and skip the file to avoid the failure of the remaining files.

Extracting text from Word documents

TASK
Extract text from .doc, .docx and .pdf files and upload the content to an Azure SQL database. It needs to be fast, as it's running over millions of documents.
ISSUES
The script starts to fail if one of the documents has an issue. Some that I have come across are:
This file failed to open last time you tried - Open readonly
File is corrupt
SCRIPT
Firstly, I generate lists of files, each containing 100 file paths. This is so I can continue execution if I need to stop it and/or it errors out:
## Word object
if (!($continue)) {
$files = (Get-ChildItem -force -recurse $documentFolder -include *.doc, *.docx).fullname
$files | Out-File (Join-Path $PSScriptRoot "\documents.txt")
$i=0; Get-Content $documentFile -ReadCount 100 | %{$i++; $_ | Out-File (Join-Path $PSScriptRoot "\FileLists\documents_$i.txt")}
}
Then I create the COM object with the DisplayAlerts flag set to 0 (I thought this would fix it; it didn't):
$word = New-Object -ComObject word.application
$word.Visible = $false
$saveFormat = [Enum]::Parse([Microsoft.Office.Interop.Word.WdSaveFormat], "wdFormatText")
$word.DisplayAlerts = 0
After this, I loop through each file in each list, save the file as .txt to the temp folder, extract the text and generate a SQL INSERT statement:
foreach ($file in (Get-Content $list)) {
Try {
if ($file -like "*-*") {
Write-Output "Processing: $($file)"
$doc = $word.Documents.Open($file)
$fileName = [io.path]::GetFileNameWithoutExtension($file)
$fileName = $filename + ".txt"
$doc.SaveAs("$env:TEMP\$fileName", [ref]$saveFormat)
$doc.Close()
$4ID = $fileName.split('-')[-1].replace(' ', '').replace(".txt", "")
$text = Get-Content -raw "$env:TEMP\$fileName"
$text = $text.replace("'", "")
$query += "
('$text', $4ID),"
Remove-Item -Force "$env:TEMP\$fileName"
}
<# Upload to azure #>
$query = $query.Substring(0,$query.Length-1)
$query += ";"
$params = @{
'Database' = $TRIS5DATABASENAME
'ServerInstance' = $($AzureServerInstance.FullyQualifiedDomainName)
'Username' = $AdminLogin
'Password' = $InsecurePassword
'query' = $query
}
Invoke-Sqlcmd @params -ErrorAction "continue"
$query = "INSERT INTO tmp_CachedText (tCachedText, tOID)
VALUES "
}
Catch {
Write-Host "$($file) failed to process" -ForegroundColor RED;
}
}
Remove-Item -Force $list.FullName
ISSUES
As stated above, if something is wrong with one of the files, or a document failed to open properly on a previous run, the script starts failing. Everything in the loop throws errors, starting with:
You cannot call a method on a null-valued expression.
At D:\OneDrive\Scripts\Microsoft Cloud\CachedText-Extraction\CachedText-Extraction.ps1:226 char:13
+ $doc = $word.Documents.Open($file)
Basically, what I want is a way to stop those errors from appearing by simply skipping the file if there is an error with the document. Alternatively, if there is a better way to extract text from document files using PowerShell without using Word, that would be good too.
An example of one of the error messages is a dialog (screenshot omitted) that causes the file to be locked and execution to pause. The only way to get around it is to kill Word, which then causes the rest of the script to fail.

Compare directories exactly - including moved files

My aim is to compare two directories exactly - including the structure of the directories and sub-directories.
I need this because I want to monitor whether something in the folder E:\path2 was changed. For this case a copy of the full folder is in C:\path1. If someone changes something, it has to be done in both directories.
It is important for us because, if something is changed in the directory (accidentally or not), it could break other functions in our infrastructure.
This is the script I've already written:
# Compare files for "copy default folder"
# This Script compares the files and folders which are synced to every client.
# Source: https://mcpmag.com/articles/2016/04/14/contents-of-two-folders-with-powershell.aspx
# 1. Compare content and Name of every file recursively
$SourceDocsHash = Get-ChildItem -recurse -Path C:\path1 | foreach {Get-FileHash -Path $_.FullName}
$DestDocsHash = Get-ChildItem -recurse -Path E:\path2 | foreach {Get-FileHash -Path $_.FullName}
$ResultDocsHash = (Compare-Object -ReferenceObject $SourceDocsHash -DifferenceObject $DestDocsHash -Property hash -PassThru).Path
# 2. Compare name of every folder recursively
$SourceFolders = Get-ChildItem -recurse -Path C:\path1 #| where {!$_.PSIsContainer}
$DestFolders = Get-ChildItem -recurse -Path E:\path2 #| where {!$_.PSIsContainer}
$CompareFolders = Compare-Object -ReferenceObject $SourceFolders -DifferenceObject $DestFolders -PassThru -Property Name
$ResultFolders = $CompareFolders | Select-Object FullName
# 3. Check if UNC-Path is reachable
# Source: https://stackoverflow.com/questions/8095638/how-do-i-negate-a-condition-in-powershell
# Printout, if UNC-Path is not available.
if(-Not (Test-Path \\bb-srv-025.ftscu.be\DIP$\Settings\ftsCube\default-folder-on-client\00_ftsCube)){
$UNCpathReachable = "UNC-Path not reachable and maybe"
}
# 4. Count files for statistics
# Source: https://stackoverflow.com/questions/14714284/count-items-in-a-folder-with-powershell
$count = (Get-ChildItem -recurse -Path E:\path2 | Measure-Object).Count;
# FINAL: Print out result for check_mk
if($ResultDocsHash -Or $ResultFolders -Or $UNCpathReachable){
echo "2 copy-default-folders-C-00_ftsCube files-and-folders-count=$count CRITIAL - $UNCpathReachable the following files or folders has been changed: $ResultDocs $ResultFolders (none if empty after ':')"
}
else{
echo "0 copy-default-folders-C-00_ftsCube files-and-folders-count=$count OK - no files has changed"
}
I know the output is not perfectly formatted, but it's OK. :-)
This script spots the following changes successfully:
create new folder or new file
rename folder or file -> it is shown as an error, but the output is empty. I can live with that, but maybe someone sees the reason. :-)
delete folder or file
change file content
This script does NOT spot the following changes:
move a folder or file to another sub-folder. The script still says "everything OK"
I've been trying a lot of things but could not solve this.
Can anyone help me extend the script to spot a moved folder or file?
I think your best bet is to use the .NET FileSystemWatcher class. It's not trivial to implement an advanced function that uses it, but I think it will simplify things for you.
I used the article Tracking Changes to a Folder Using PowerShell when I was learning this class. The author's code is below. I cleaned it up as little as I could stand. (That publishing platform's code formatting hurts my eyes.)
I think you want to run it like this.
New-FileSystemWatcher -Path E:\path2 -Recurse
I could be wrong.
Function New-FileSystemWatcher {
[cmdletbinding()]
Param (
[parameter()]
[string]$Path,
[parameter()]
[ValidateSet('Changed', 'Created', 'Deleted', 'Renamed')]
[string[]]$EventName,
[parameter()]
[string]$Filter,
[parameter()]
[System.IO.NotifyFilters]$NotifyFilter,
[parameter()]
[switch]$Recurse,
[parameter()]
[scriptblock]$Action
)
$FileSystemWatcher = New-Object System.IO.FileSystemWatcher
If (-NOT $PSBoundParameters.ContainsKey('Path')){
$Path = $PWD
}
$FileSystemWatcher.Path = $Path
If ($PSBoundParameters.ContainsKey('Filter')) {
$FileSystemWatcher.Filter = $Filter
}
If ($PSBoundParameters.ContainsKey('NotifyFilter')) {
$FileSystemWatcher.NotifyFilter = $NotifyFilter
}
If ($PSBoundParameters.ContainsKey('Recurse')) {
$FileSystemWatcher.IncludeSubdirectories = $True
}
If (-NOT $PSBoundParameters.ContainsKey('EventName')){
$EventName = 'Changed','Created','Deleted','Renamed'
}
If (-NOT $PSBoundParameters.ContainsKey('Action')){
$Action = {
Switch ($Event.SourceEventArgs.ChangeType) {
'Renamed' {
$Object = "{0} was {1} to {2} at {3}" -f $Event.SourceArgs[-1].OldFullPath,
$Event.SourceEventArgs.ChangeType,
$Event.SourceArgs[-1].FullPath,
$Event.TimeGenerated
}
Default {
$Object = "{0} was {1} at {2}" -f $Event.SourceEventArgs.FullPath,
$Event.SourceEventArgs.ChangeType,
$Event.TimeGenerated
}
}
$WriteHostParams = @{
ForegroundColor = 'Green'
BackgroundColor = 'Black'
Object = $Object
}
Write-Host @WriteHostParams
}
}
$ObjectEventParams = @{
InputObject = $FileSystemWatcher
Action = $Action
}
ForEach ($Item in $EventName) {
$ObjectEventParams.EventName = $Item
$ObjectEventParams.SourceIdentifier = "File.$($Item)"
Write-Verbose "Starting watcher for Event: $($Item)"
$Null = Register-ObjectEvent @ObjectEventParams
}
}
I don't think any example I've found online tells you how to stop watching the filesystem. The simplest way is to just close your PowerShell window, but I always seem to have 15 tabs open in each of five PowerShell windows, and closing one of them is a nuisance.
Instead, you can use Get-Job to get the Id of the registered events, then use Unregister-Event -SubscriptionId n to, well, unregister the event, where n is the number you find in the Id property of Get-Job's output.
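For example, with the watcher above (the function registers SourceIdentifiers of the form File.<EventName>):
# List the event subscriptions the watcher created:
Get-EventSubscriber

# Remove them by SourceIdentifier:
'Changed', 'Created', 'Deleted', 'Renamed' | ForEach-Object {
    Unregister-Event -SourceIdentifier "File.$_" -ErrorAction SilentlyContinue
}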
So basically you want to synchronize the two folders and note all the changes made along the way:
I would suggest you use the
Sync-Folder script
or
FreeFileSync.
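Alternatively, the original hash comparison can be extended to spot moves. Compare-Object in the question compares only the hash property, and a moved file keeps its hash, which is why moves slip through; comparing on the relative path together with the hash flags them as differences. A sketch, assuming the same roots as in the question:
$SourceDocsHash = Get-ChildItem -Recurse -File -Path C:\path1 | ForEach-Object {
    [pscustomobject]@{
        RelPath = $_.FullName.Substring('C:\path1'.Length)
        Hash    = (Get-FileHash -Path $_.FullName).Hash
    }
}
$DestDocsHash = Get-ChildItem -Recurse -File -Path E:\path2 | ForEach-Object {
    [pscustomobject]@{
        RelPath = $_.FullName.Substring('E:\path2'.Length)
        Hash    = (Get-FileHash -Path $_.FullName).Hash
    }
}
# A file moved to another sub-folder now shows up, because its RelPath changed:
Compare-Object $SourceDocsHash $DestDocsHash -Property RelPath, Hash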

PowerShell output formatting?

I have a script that scans for a specific folder in users' AppData folders. If it finds the folder, it writes the path to a txt file, so we can see the computer name and username where it was found.
I would like to format what is actually written to the text file, so that it removes everything from the path except the computer and user names.
Script:
foreach($computer in $computers){
$BetterNet = "\\$computer\c$\users\*\AppData\Local\Google\Chrome\User Data\Default\Extensions\gjknjjomckknofjidppipffbpoekiipm"
Get-ChildItem $BetterNet | ForEach-Object {
$count++
$betternetCount++
write-host BetterNet found on: $computer
Add-Content "\\SERVERNAME\PowershellScans\$date\$time\BetterNet.txt" $_`n
write-host
}
}
The text files contain information like this
\\computer-11-1004S10\c$\users\turtle\AppData\Local\Google\Chrome\User Data\Default\Extensions\gjknjjomckknofjidppipffbpoekiipm
\\computer-1004-24S\c$\users\camel\AppData\Local\Google\Chrome\User Data\Default\Extensions\gjknjjomckknofjidppipffbpoekiipm
\\computer-1004-23S\c$\users\rabbit\AppData\Local\Google\Chrome\User Data\Default\Extensions\gjknjjomckknofjidppipffbpoekiipm
If you have each line in the form of a string $string_containing_path, it is easy to split it with the Split method and then take the indexes you need. Note that the leading \\ produces two empty entries, so the computer name lands at index 2 and the user name at index 5:
$afterSplit = $string_containing_path.Split('\')
$stringThatYouNeed = $afterSplit[2] + " " + $afterSplit[5]
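For example, with the first line from the log above:
$string_containing_path = '\\computer-11-1004S10\c$\users\turtle\AppData\Local\Google\Chrome\User Data\Default\Extensions\gjknjjomckknofjidppipffbpoekiipm'
$afterSplit = $string_containing_path.Split('\')
$afterSplit[2] + " " + $afterSplit[5]   # -> computer-11-1004S10 turtle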
You can also use a simple script that will fix your current logs:
$path_in = "C:\temp\list.txt"
$path_out= "C:\temp\output.txt"
$reader = [System.IO.File]::OpenText($path_in)
try {
while($true){
$line = $reader.ReadLine()
if ($line -eq $null) { break }
$line_after_split_method = $line.Split('\')
$stringToOutput = $line_after_split_method[2] + " " + $line_after_split_method[5] + "`r`n"
add-content $path_out $stringToOutput
}
add-content $path_out "End"
}
finally {
$reader.Close()
}
If you split your loop into two foreach loops, one over computers and one over user directories, it would be easier to output the name of the user directory.
$output = foreach($computer in $computers){
$UserDirectories = Get-ChildItem "\\$computer\c$\users\" -Directory
foreach ($Directory in $UserDirectories) {
$BetterNet = Get-ChildItem (Join-Path $Directory.FullName "AppData\Local\Google\Chrome\User Data\Default\Extensions\gjknjjomckknofjidppipffbpoekiipm") -ErrorAction SilentlyContinue
if ($BetterNet) {
Add-Content "\\SERVERNAME\PowershellScans\$date\$time\BetterNet.txt" "$computer $($Directory.name)`r`n"
write-host BetterNet found on: $computer
$BetterNet
}
}
}
$output.count

Backing up .thumbnails - PowerShell

I have written a backup script which backs up files and logs errors. It works fine, except for some .thumbnails files; many other .thumbnails do get copied!
Of 54000 files copied, the same 480 .thumbnails never get copied or logged. I will be checking the attributes, however I feel the Copy-Item function should have done the job. Any other recommendations are welcome as well, but please stay on topic, thanks!
Here is my backUP function:
Function backUP{ Param ([string]$destination1 ,$list1)
$destination2 = $destination1
#extract new made string for backuplog
$index = $destination2.LastIndexOf("\")
$count = $destination2.length - $index
$source1 = $destination2.Substring($index, $count)
$finalstr2 = $logdrive + $source1
Foreach($item in $list1){
Copy-Item -Container: $true -Recurse -Force -Path $item -Destination $destination1 -erroraction Continue
if(-not $?)
{
write-output "ERROR de copiado : " $error| format-list | out-file -Append "$finalstr2\GCI-ERRORS-backup.txt"
Foreach($erritem in $error){
write-output "Error Data:" $erritem.TargetObject | out-file -Append "$finalstr2\GCI-ERRORS-backup.txt"
}
$error.Clear()
}
}
}
Are you sure your backUP function is receiving the .thumbnails files in $list1? If the files are hidden, Get-ChildItem will only return them if the -Force switch is used.
As for other recommendations, Robocopy.exe is a good dedicated tool for performing file synchronization.
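A sketch of an equivalent Robocopy call, reusing the variables from the backUP function and assuming each $item is a folder (the switches are a reasonable starting point, not taken from your script):
# /E        copy subdirectories, including empty ones
# /COPY:DAT copy data, attributes and timestamps
# /R:1 /W:1 retry once, waiting 1 second, instead of the default million retries
# /LOG+:    append to a log file instead of overwriting it
robocopy $item $destination1 /E /COPY:DAT /R:1 /W:1 /LOG+:"$finalstr2\robocopy-backup.log"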
Apparently I did not have permissions to the .thumbnails folder.
With that set, the script worked fine!
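For anyone hitting the same wall, a quick way to find unreadable files up front (a sketch; $item stands for a source folder, as in the backUP function):
# Try to open each file for reading; anything that throws is a file the
# current account cannot read, which explains silently missing copies.
Get-ChildItem -Path $item -Recurse -Force -File | ForEach-Object {
    try { ([System.IO.File]::OpenRead($_.FullName)).Close() }
    catch { Write-Warning "No read access: $($_.FullName)" }
}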