How can I verify that a PDF file is "good"? - powershell

I have a process that compresses PDF files that our secretaries create by scanning signed documents at a multi-function printer.
On rare occasions, these files cannot be opened in Acrobat Reader after being compressed. I don't know why this happens only occasionally, so I'd like to be able to test each PDF post-compression and see if it is "good".
I am trying to use itextsharp 5.1.1 to accomplish this, but it happily loads the PDF anyway. My best guess is that Acrobat Reader fails when it tries to display the picture.
Any ideas on how I can tell if the PDF will render?

In similar situations in the past I have successfully used the PDF Toolkit (a/k/a pdftk) to repair bad PDFs with a command like this: pdftk broken.pdf output fixed.pdf.
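To automate that over a whole folder, here is a minimal PowerShell sketch; it assumes pdftk is on PATH, that the "fixed_" output naming suits you, and that pdftk returns a non-zero exit code when it cannot process a file:
Get-ChildItem -Filter *.pdf | ForEach-Object {
    $fixed = Join-Path $_.DirectoryName ("fixed_" + $_.Name)
    pdftk $_.FullName output $fixed
    # Exit-code handling is an assumption; pdftk generally returns non-zero on failure
    if ($LASTEXITCODE -ne 0) { Write-Warning "pdftk could not repair $($_.Name)" }
}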

OK, what I ended up doing was using itextsharp to loop through all of the stream objects and check their length. The error condition I had was that the length would be zero. This test seems quite reliable. It may not work for everyone, but it worked in this particular situation.
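For anyone who wants to reproduce that test, here is a minimal sketch using iTextSharp 5.x from PowerShell; the DLL path and file name are placeholders, and only the zero-length-stream condition described above is checked:
Add-Type -Path 'C:\libs\itextsharp.dll'   # path to the iTextSharp 5.x assembly is an assumption
$reader = New-Object iTextSharp.text.pdf.PdfReader('scan.pdf')
$bad = $false
for ($i = 1; $i -lt $reader.XrefSize; $i++) {
    $obj = $reader.GetPdfObject($i)
    if ($obj -and $obj.IsStream()) {
        # A zero-length stream was the error condition in this particular situation
        $bytes = [iTextSharp.text.pdf.PdfReader]::GetStreamBytesRaw([iTextSharp.text.pdf.PRStream]$obj)
        if ($bytes.Length -eq 0) { $bad = $true; break }
    }
}
$reader.Close()
if ($bad) { Write-Warning 'scan.pdf contains an empty stream object' }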

pdfcpu works great. Relaxed example (relaxed is the default mode):
pdfcpu validate goggles.pdf
Strict example:
pdfcpu validate -m strict goggles.pdf
https://pdfcpu.io/core/validate

qpdf will be of great help for your needs:
apt-get install qpdf
qpdf --check filename.pdf
example output:
checking filename.pdf
PDF Version: 1.4
File is not encrypted
File is not linearized
WARNING: filename.pdf: file is damaged
WARNING: filename.pdf (object 185 0, file position 1235875): expected n n obj
WARNING: filename.pdf: Attempting to reconstruct cross-reference table
WARNING: filename.pdf: object 185 0 not found in file after regenerating cross reference table
operation for Dictionary object attempted on object of wrong type
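Because --check also sets the process exit status, you can triage a whole folder from PowerShell. A small sketch, relying on qpdf's documented exit codes (0 = clean, 2 = errors, 3 = warnings only):
Get-ChildItem -Filter *.pdf | ForEach-Object {
    qpdf --check $_.FullName *> $null
    switch ($LASTEXITCODE) {
        0       { "$($_.Name): OK" }
        3       { "$($_.Name): warnings" }
        default { "$($_.Name): damaged" }
    }
}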

I've used "pdfinfo.exe" from xpdfbin-win package and cpdf.exe to check PDF files for corruption, but didn't want to involve a binary if it wasn't necessary.
I read that newer PDF formats have a readable xml data catalog at the end, so I opened the PDF with regular windows NOTEPAD.exe and scrolled down past the unreadable data to the end and saw several readable keys. I only needed one key, but chose to use both CreationDate and ModDate.
The following Powershell (PS) script will check ALL the PDF files in the current directory and output the status of each into a text file (!RESULTS.log). It took about 2 minutes to run this against 35,000 PDF files. I tried to add comments for those who are new to PS. Hope this saves someone some time. There's probably a better way to do this, but this works flawlessly for my purposes and handles errors silently. You might need to define the following at the beginning: $ErrorActionPreference = "SilentlyContinue" if you see errors on screen.
Copy the following into a text file and name it appropriately (ex: CheckPDF.ps1) or open PS and browse to the directory containing the PDF files to check and paste it in the console.
#
# PowerShell v4.0
#
# Get all PDF files in current directory
#
$items = Get-ChildItem | Where-Object {$_.Extension -eq ".pdf"}
$logFile = "!RESULTS.log"
$badCounter = 0
$goodCounter = 0
$msg = "`n`nProcessing " + $items.count + " files... "
Write-Host -nonewline -foregroundcolor Yellow $msg
foreach ($item in $items)
{
    #
    # Suppress error messages
    #
    trap { Write-Output "Error trapped"; continue; }
    #
    # Read raw PDF data
    #
    $pdfText = Get-Content $item -raw
    #
    # Find strings (near end of PDF file); if the file is BAD, IndexOf returns -1
    #
    $ptr1 = $pdfText.IndexOf("CreationDate")
    $ptr2 = $pdfText.IndexOf("ModDate")
    #
    # Grab raw dates from file - SubString will throw if a ptr is -1
    #
    try { $cDate = $pdfText.SubString($ptr1, 37); $mDate = $pdfText.SubString($ptr2, 31); }
    #
    # Append filename and bad status to logfile and increment a counter.
    # The catch block is also where you would rename, move, or delete bad files.
    #
    catch { "*** $item is Broken ***" >> $logFile; $badCounter += 1; continue; }
    #
    # Append filename and good status to logfile
    #
    Write-Output "$item - OK" -EA "Stop" >> $logFile
    #
    # Increment a counter
    #
    $goodCounter += 1
}
#
# Calculate total
#
$totalCounter = $badCounter + $goodCounter
#
# Append 3 blank lines to end of logfile
#
1..3 | %{ Write-Output "" >> $logFile }
#
# Append statistics to end of logfile
#
Write-Output "Total: $totalCounter / BAD: $badCounter / GOOD: $goodCounter" >> $logFile
Write-Output "DONE!`n`n"

Related

How to export certs with SAN extensions?

I have this PowerShell command that exports for me all issued certificates into a .csv file:
$Local = "$PSScriptRoot"
$File = "$Local\IssuedCerts.csv"
$Header = "Request ID,Requester Name,Certificate Template,Serial Number,Certificate Effective Date,Certificate Expiration Date,Issued Country/Region,Issued Organization,Issued Organization Unit,Issued Common Name,Issued City,Issued State,Issued Email Address"
certutil -view -out $Header csv > $File
This works fine. By the way, I would like to format the output in a more readable manner, if that's somehow possible, so please let me know, too.
The point is I need to export all certificates which will expire soon, but I also need the data from the SAN extensions of each certificate to be exported as well.
Perhaps getting the certificates directly from the CertificateAuthority X509Store and reading the certificate extensions (one of which is the Subject Alternative Name) using the AsnEncodedData class would do the trick?
Example code below on reading certificates from the given store and printing out their extensions:
using namespace System.Security.Cryptography.X509Certificates
$caStore = [X509Store]::new([StoreName]::CertificateAuthority, [StoreLocation]::LocalMachine)
$caStore.Open([OpenFlags]::ReadOnly)
foreach ($certificate in $caStore.Certificates) {
    foreach ($extension in $certificate.Extensions) {
        $asnData = [System.Security.Cryptography.AsnEncodedData]::new($extension.Oid, $extension.RawData)
        Write-Host "Extension Friendly Name: $($extension.Oid.FriendlyName)"
        Write-Host "Extension OID: $($asnData.Oid.Value)"
        Write-Host "Extension Value: $($asnData.Format($true))"
    }
}
$caStore.Close()
You can open a different store by passing a different value for the [StoreName]::CertificateAuthority argument.
Disclaimer: I haven't been able to test this code in production, so I'm not 100% certain that all the fields you require are exposed, but it may serve as a good starting point.
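To tie this back to the question, here is a hedged follow-up sketch that filters to certificates expiring soon and prints only the Subject Alternative Name extension (OID 2.5.29.17); the store choice and the 30-day window are assumptions:
$store = New-Object System.Security.Cryptography.X509Certificates.X509Store('CertificateAuthority', 'LocalMachine')
$store.Open('ReadOnly')
$cutoff = (Get-Date).AddDays(30)   # "expiring soon" window is an assumption
foreach ($cert in $store.Certificates | Where-Object { $_.NotAfter -le $cutoff }) {
    $san = $cert.Extensions | Where-Object { $_.Oid.Value -eq '2.5.29.17' }   # SAN OID
    if ($san) {
        $asn = New-Object System.Security.Cryptography.AsnEncodedData($san.Oid, $san.RawData)
        '{0} expires {1}: {2}' -f $cert.Subject, $cert.NotAfter, $asn.Format($false)
    }
}
$store.Close()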

CMake collect sources from folders using batch script (without GLOB)

I want to collect all the source or header files from a specified folder that also match a certain naming convention. I don't want to use GLOBbing, and I also couldn't find any examples of an approach using only CMake.
One answer to this question suggests using ls *.cpp inside CMakeLists.txt, so I thought of getting the list of sources by invoking a batch script from CMakeLists.
But something is wrong. Though the output looks totally correct, CMake cannot find those files. The path is (visually) correct: if I manually type it into add_executable, generating will succeed.
While I still want to know how to achieve the initial intent, I am extremely confused about why two seemingly identical strings compare as not equal:
CMake log:
-- Manually-typed: C:/Repos/cmake-scanner/src/main.cpp
-- Recieved-batch: C:/Repos/cmake-scanner/src/main.cpp
-- Path strings not identical
CollectSources.bat
@echo off
set arg1=%1
set arg2=%2
powershell -Command "$path = '%1'.Replace('\','/'); $headers = New-Object Collections.Generic.List[string]; ls -Name $path/*.%2 | foreach-object{ $headers.Add($path + '/' + $_)}; $headers"
CMakeLists.txt
cmake_minimum_required(VERSION 3.12 FATAL_ERROR)
project(Auto-scanner)
set(HEADERS)
set(SOURCES)
if(WIN32)
    execute_process(
        COMMAND CMD /c ${CMAKE_CURRENT_SOURCE_DIR}/CollectSources.bat ${CMAKE_CURRENT_SOURCE_DIR}/include h
        OUTPUT_VARIABLE res
    )
    message(STATUS "Found headers: ${res}")
    execute_process(
        COMMAND CMD /c ${CMAKE_CURRENT_SOURCE_DIR}/CollectSources.bat ${CMAKE_CURRENT_SOURCE_DIR}/src cpp
        OUTPUT_VARIABLE res2
    )
    message(STATUS "Found sources: ${res2}")
    set(${HEADERS} ${res})
endif(WIN32)
message(STATUS "Collected headers: ${HEADERS}")
message(STATUS "Manually-typed: C:/Repos/cmake-scanner/src/main.cpp")
message(STATUS "Recieved-batch: ${res2}")
if(NOT "C:/Repos/cmake-scanner/src/main.cpp" STREQUAL "${res2}")
    message(STATUS "Path strings not identical")
else()
    message(STATUS "Path strings are identical")
endif()
add_executable(${PROJECT_NAME}
    ${res}
    ${res2}
)
target_include_directories(${PROJECT_NAME}
    PRIVATE
        ${CMAKE_CURRENT_SOURCE_DIR}/include
        ${CMAKE_CURRENT_SOURCE_DIR}/src
)
and project tree:
cmake-scanner
|-include
| |-IPublicA.h
| |-IPublicB.h
| |-IPublicC.h
| |-IPublicD.h
|-src
|-main.cpp
https://github.com/ElDesalmado/cmake-scanner.git
UPDATE
Comparing the strings by length yielded different results, so I thought there might be some trailing characters in the output of execute_process.
So I replaced the newlines that might actually prevent CMake from finding the source files:
string(REGEX REPLACE "\n$" "" ...)
After that the strings compare equal, yet the files still could not be located by CMake.
I had some luck using OUTPUT_STRIP_TRAILING_WHITESPACE in execute_process: main.cpp was finally located and the project generated. But when there are 2 or more sources this doesn't help.
I'm going to try outputting the sources' names on a single line and see what happens...
I have solved the issue.
CMake accepts lists of sources, which must be formatted so that the source paths are separated by semicolons.
So the solution was to modify the batch script to output a single line of semicolon-separated file names. Later I will update the repo and provide the batch code.
In order for CMake to recognize the output of the batch script as a list of source/header files, it must not contain any trailing symbols such as whitespace or newlines, and the file paths must be separated by semicolons:
path-to-headerA.h;path-to-headerB.h;path-to-headerC.h;
(It is OK if there is a semicolon at the end of the line; CMake accepts that.)
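If you'd rather keep the batch output newline-separated, an alternative is to convert it into a list on the CMake side. A minimal sketch, assuming res holds the raw output of execute_process:
# Strip trailing whitespace, then turn newlines into CMake's list separator
string(STRIP "${res}" res)
string(REPLACE "\n" ";" res "${res}")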
Working solution
CollectSources.bat
@echo off
set arg1=%1
set arg2=%2
powershell -Command "$path = '%1'.Replace('\','/'); $headers = ''; get-childitem $path/*.%2 | select-object -expandProperty Name | foreach-object{ $headers += ($path + '/' + $_ + ';')}; Write-output $headers"
CollectSources.cmake
#Collect source files from a given folder
set(DIR_OF_CollectSources_CMAKE ${CMAKE_CURRENT_LIST_DIR})
function(CollectSources path ext ret)
message(STATUS "Collecting sources *.${ext} from ${path}")
execute_process(
COMMAND CMD /c ${DIR_OF_CollectSources_CMAKE}/CollectSources.bat ${path} ${ext}
OUTPUT_VARIABLE res
OUTPUT_STRIP_TRAILING_WHITESPACE
)
message(STATUS "Sources collected:")
foreach(src ${res})
message(${src})
endforeach()
set(${ret} "${res}" PARENT_SCOPE)
endfunction()
usage in CMakeLists.txt:
include(CollectSources)
CollectSources(${CMAKE_CURRENT_SOURCE_DIR}/include h HEADERS)
Example:
https://github.com/ElDesalmado/cmake-scanner.git
CMake output:
-- Collecting sources *.h from C:/Repos/cmake-scanner/include
-- Sources collected:
C:/Repos/cmake-scanner/include/IPublicA.h
C:/Repos/cmake-scanner/include/IPublicB.h
C:/Repos/cmake-scanner/include/IPublicC.h
C:/Repos/cmake-scanner/include/IPublicD.h
-- Collecting sources *.cpp from C:/Repos/cmake-scanner/src
-- Sources collected:
C:/Repos/cmake-scanner/src/lib.cpp
C:/Repos/cmake-scanner/src/main.cpp

How to get the current directory and use a wildcard to find any csv file there?

How can I use a relative path and relative (wildcard) file names in PowerShell?
I have tried the following, but it didn't work:
Powershell -c
$file= Get-ChildItem -Filter *.csv
[System.IO.File]::ReadAllText .\$file .replace('Buyer','Alıcı').
replace('Supplier','Tedarikçi').
replace('FI','Banka').
replace('Supplier Reference','Tedarikçi Referansı').
replace('Buyer Program','Alıcı Programı').
replace('Buy Offer','Teklif Al').
replace('Payment Obligation Id','Ödeme Yükümlülüğü Kimliği').
replace('Buyer Unique Doc Id','Alıcı Benzersiz Doküman Kimliği').
replace('Trade Date','Ticaret Tarihi').
replace('Due Date','ödeme tarihi').
replace('Maturity Date','Vade Tarihi').
replace('Currency','Para Birimi').
replace('Certified Value','Sertifikalı Değer').
replace('Buyer Payment Amount','Alıcıya Ödeme Tutarı').
replace('Buyer Fee','Alıcı Ücreti').
replace('Supplier Interest Fees','Tedarikçi Faiz Ücretleri').
replace('Supplier Funds Received','Alınan Tedarikçi Fonu')|sc c:\pstest\test2.csv
I want the script to fetch any csv file in the same directory the script is run from. Any ideas?
There are a lot of suggestions for how to get the path of the running script.
I would do it like this:
$dir = $MyInvocation.MyCommand.Path
$dir = $dir.SubString(0, $dir.LastIndexOf("\") + 1)
Then you have the path the script runs from, and what you need is just to call your script:
. $dir"MyScript.ps1"

Get a variable value from a file and replace it with a new one

I have a file which has a variable:
$versionNumber = "1.0.0"
I need to change the variable's value explicitly, so that the line becomes $versionNumber = "user_choice".
#------------------------------------------------
# Pack parameters used to create the .nupkg file.
#------------------------------------------------
# Specify the Version Number to use for the NuGet package. If not specified, the version number of the assembly being packed will be used.
# NuGet version number guidance: https://docs.nuget.org/docs/reference/versioning and the Semantic Versioning spec: http://semver.org/
# e.g. "" (use assembly's version), "1.2.3" (stable version), "1.2.3-alpha" (prerelease version).
$versionNumber = "1.0.0"
# Specify any Release Notes for this package.
# These will only be included in the package if you have a .nuspec file for the project in the same directory as the project file.
$releaseNotes = ""
# Specify a specific Configuration and/or Platform to only create a NuGet package when building the project with this Configuration and/or Platform.
# e.g. $configuration = "Release"
# $platform = "AnyCPU"
$configuration = ""
$platform = ""
Any possible approach is welcome.
Use the Get-Content cmdlet to read the file, find the versionNumber variable using a positive lookbehind regex and replace it. Finally, use the Set-Content cmdlet to write it back:
(Get-Content 'yourFile.nupkg' -raw) -replace '(?<=\$versionNumber\s=).+', '"user_choice"' |
Set-Content 'yourFile.nupkg'
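If the new value should come from the user, the same one-liner can be parameterized; the file name is a placeholder:
$newVersion = Read-Host 'New version number'
(Get-Content 'yourFile.nupkg' -raw) -replace '(?<=\$versionNumber\s=\s).+', ('"{0}"' -f $newVersion) |
Set-Content 'yourFile.nupkg'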
Another solution, without regex:
(get-content "C:\temp\nuckpkg.txt").Replace('$releaseNotes = ""', '$releaseNotes = "user_choice"') |
set-content "C:\temp\nuckpkg.txt"
You haven't properly used the code suggested by Martin.
You are using "$env:Version", which will just be interpolated to the variable's value; the '"' characters are part of the replace syntax.
You should use it this way: "'$env:Version'".
Regards,
Kvprasoon

From Msi , how to get the list of files packed in each feature?

We have used WiX to create MSIs. Each MSI will have 1, 2, or 3 features, such as an Appserver feature, a Webserver feature, and a DB server feature.
Now I was asked to get the list of config files present in each feature.
It is tough to find the list of web.config files associated with each feature through the wxs file.
Is it possible to find the list of files associated with a feature using a particular search pattern?
For example: find all the web.config files packed in the Appserver feature.
Is there an easy way (querying, or some other automated script such as PowerShell) to get the list?
WiX comes with a .NET SDK referred to as the DTF ("deployment tools foundation"). It wraps the Windows msi.dll among other things. You can find these .NET Microsoft.Deployment.*.dll assemblies in the SDK subdirectory of the WiX Toolset installation directory. The documentation is in dtf.chm and dtfapi.chm in the doc subdirectory.
As shown in the documentation, you can use this SDK to write code which queries the msi database with SQL. You will be interested in the Feature, FeatureComponents and File tables.
If you haven't explored the internals of an MSI before, you can open it with Orca to get a feel for it.
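For illustration, here is a hedged PowerShell sketch of that query via the DTF assembly; the assembly path, MSI name, and feature name are assumptions, and error handling is omitted:
Add-Type -Path 'C:\Program Files (x86)\WiX Toolset v3.11\SDK\Microsoft.Deployment.WindowsInstaller.dll'   # path is an assumption
$db = New-Object Microsoft.Deployment.WindowsInstaller.Database('product.msi')
$view = $db.OpenView('SELECT FeatureComponents.Feature_, File.FileName FROM FeatureComponents, File WHERE File.Component_ = FeatureComponents.Component_')
$view.Execute()
while ($rec = $view.Fetch()) {
    # FileName may be stored as "shortname|longname"; keep the long form
    $name = ($rec.GetString(2) -split '\|')[-1]
    # 'AppServerFeature' is a hypothetical feature name; adjust the pattern as needed
    if ($rec.GetString(1) -eq 'AppServerFeature' -and $name -like '*web.config') { $name }
}
$db.Close()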
You can do it by making slight modifications to the Get-MsiProperties function described in this PowerShell article.
Please read the original article and create the prescribed comObject.types.ps1xml file.
function global:Get-MsiFeatures {
    PARAM (
        [Parameter(Mandatory=$true,ValueFromPipelineByPropertyName=$true,HelpMessage="MSI Database Filename",ValueFromPipeline=$true)]
        [Alias("Filename","Path","Database","Msi")]
        $msiDbName
    )
    # A quick check to see if the file exists
    if(!(Test-Path $msiDbName)){
        throw "Could not find " + $msiDbName
    }
    # Create an empty hashtable to store the features in
    $msiFeatures = @{}
    # Create the WI object and load the MSI database
    $wiObject = New-Object -com WindowsInstaller.Installer
    $wiDatabase = $wiObject.InvokeMethod("OpenDatabase", (Resolve-Path $msiDbName).Path, 0)
    # Open the Feature view
    $view = $wiDatabase.InvokeMethod("OpenView", "SELECT * FROM Feature")
    $view.InvokeMethod("Execute")
    # Loop through the table
    $r = $view.InvokeMethod("Fetch")
    while($r -ne $null) {
        # Add feature name and title to the hash table
        $msiFeatures[$r.InvokeParamProperty("StringData",1)] = $r.InvokeParamProperty("StringData",2)
        # Fetch the next row
        $r = $view.InvokeMethod("Fetch")
    }
    $view.InvokeMethod("Close")
    # Return the hash table
    return $msiFeatures
}
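Usage would then look something like this (the path is a placeholder):
Get-MsiFeatures 'C:\installers\product.msi'
Note that this returns the features themselves; to map files to features you would still join the Feature, FeatureComponents and File tables as described in the other answer.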