PowerShell - Extracting the metadata of files and grid viewing it

Please see the following code:
# import .NET 4.5 compression utilities
Add-Type -As System.IO.Compression.FileSystem;
$zipArchives = Get-ChildItem "*.zip";
foreach($zipArchive in $zipArchives)
{
$archivePath = $zipArchive.FullName;
$archive = [System.IO.Compression.ZipFile]::OpenRead($archivePath);
try
{
foreach($archiveEntry in $archive.Entries)
{
if($archiveEntry.FullName -notmatch '/$')
{
$tempFile = [System.IO.Path]::GetTempFileName();
try
{
[System.IO.Compression.ZipFileExtensions]::ExtractToFile($archiveEntry, $tempFile, $true);
$windowsStyleArchiveEntryName = $archiveEntry.FullName.Replace('/', '\');
Select-String -pattern "<dc:title>.*</dc:title>" -path (Get-ChildItem $tempFile) | Select-Object @{Name="Path";Expression={Join-Path $archivePath (Split-Path $windowsStyleArchiveEntryName -Parent)}}
#Select-String -pattern "<dc:title>.*</dc:title>" -path (Get-ChildItem $tempFile) | Select-Object Matches
#Select-String -pattern "<dc:subject>.*</dc:subject>" -path (Get-ChildItem $tempFile) | Select-Object Matches
#Select-String -pattern "<dc:date>.*</dc:date>" -path (Get-ChildItem $tempFile) | Select-Object Matches
}
finally
{
Remove-Item $tempFile;
}
}
}
}
finally
{
$archive.Dispose();
}
}
It's a modified version of code that I found on the internet and helped me to find strings inside zip files.
My intention now is to extract metadata from zip files using this code.
I don't understand how I can display the two types of information on separate lines. If you run the script with only one Select-String... pipeline line active, the code works as expected. If you activate (uncomment) the second Select-String... pipeline line, the second type of information (the <dc:title> value) is not displayed and there is a blank line instead.
Please help me:
1) How can I also display the dc:title value using the Select-String | Select-Object mechanism that I used in the code.
2) How can I output all the data in a table format, so the table would look something like this:
* ZIP Filename * DC Title *
* zipfile01.zip * Bla Bla 01 *
* zipfile02.zip * Bla Bla 02 *
* zipfile03.zip * Bla Bla 03 *
This format of output would be the most usable for me.

The console "view" for pipeline objects is created based on the first object (which only has a Path property). The second object is missing a Path property, which is why you see a blank line. If you had commented out the first Select-String ... line (the one that shows Path), then the second line would work.
Objects sent through the pipeline should have the same set of properties so avoid using select-object with different property-sets. Ex:
.....
$tempFile = [System.IO.Path]::GetTempFileName();
try
{
[System.IO.Compression.ZipFileExtensions]::ExtractToFile($archiveEntry, $tempFile, $true);
$windowsStyleArchiveEntryName = $archiveEntry.FullName.Replace('/', '\');
Select-String -pattern "<dc:title>(.*)</dc:title>" -path (Get-ChildItem $tempFile) | Select-Object @{n="Zip FileName";e={$zipArchive.Name}}, @{Name="DC Title";Expression={ $_.Matches.Groups[1].Value}}
}
finally
{
Remove-Item $tempFile;
}
.....
To output all the metadata, you should create an object that includes all the values. Ex:
$tempFile = [System.IO.Path]::GetTempFileName();
try
{
[System.IO.Compression.ZipFileExtensions]::ExtractToFile($archiveEntry, $tempFile, $true);
$windowsStyleArchiveEntryName = $archiveEntry.FullName.Replace('/', '\');
# Avoid multiple reads; -Raw returns one string, so -match fills $Matches
$content = Get-Content $tempFile -Raw
New-Object -TypeName psobject -Property @{
"Zip Filename" = $zipArchive.Name
"DC Title" = if($content -match '<dc:title>(.*)</dc:title>') { $Matches[1] } else { $null }
"DC Subject" = if($content -match '<dc:subject>(.*)</dc:subject>') { $Matches[1] } else { $null }
"DC Date" = if($content -match '<dc:date>(.*)</dc:date>') { $Matches[1] } else { $null }
}
}
finally
{
Remove-Item $tempFile;
}
....
Ex. output
Zip Filename DC Subject DC Title DC Date
------------ ---------- -------- -------
test.zip Subject O M G 5/18/2016
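The repeated capture-group lookups can be factored into one small helper. This is only a sketch: the function name Get-DcField and the sample XML are made up for illustration, and it assumes each element sits on a single line and that the file was read as one string (for example with Get-Content -Raw).

```powershell
# Hypothetical helper: returns the text inside <dc:NAME>...</dc:NAME>, or $null.
function Get-DcField {
    param(
        [string]$Content,   # the whole file as one string
        [string]$Name       # element name without the dc: prefix, e.g. 'title'
    )
    # (.*?) is a lazy capture group, so it stops at the first closing tag
    if ($Content -match "<dc:$Name>(.*?)</dc:$Name>") { $Matches[1] } else { $null }
}

# Made-up sample standing in for an extracted metadata file
$sample = "<metadata>`n<dc:title>Bla Bla 01</dc:title>`n<dc:date>5/18/2016</dc:date>`n</metadata>"

# One object per file, always with the same set of properties
$row = [pscustomobject]@{
    'Zip Filename' = 'zipfile01.zip'
    'DC Title'     = Get-DcField -Content $sample -Name 'title'
    'DC Subject'   = Get-DcField -Content $sample -Name 'subject'
    'DC Date'      = Get-DcField -Content $sample -Name 'date'
}
$row
```

Because every row has the same properties, a collection of such rows can be piped to Format-Table or, on Windows, to Out-GridView.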
If you really want to force separate views (it will get ugly), then you need to send the objects to | Out-Default to create a new view every time, ex:
Select-String -pattern "<dc:title>.*</dc:title>" -path (Get-ChildItem $tempFile) | Select-Object @{Name="Path";Expression={Join-Path $archivePath (Split-Path $windowsStyleArchiveEntryName -Parent)}} | Out-Default

I know it's not the answer you were looking for, but as a temporary workaround, you may be able to combine the two commands into one like this:
Select-String -pattern "<dc:title>.*</dc:title>" -path (Get-ChildItem $tempFile) | Select-Object Matches, @{Name="Path";Expression={Join-Path $archivePath (Split-Path $windowsStyleArchiveEntryName -Parent)}}
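The reason a single Select-Object call helps is that every object then carries the same property set. A minimal sketch with made-up values (the variable names and the sample path are illustrative only):

```powershell
# Two objects with different property sets, as two separate Select-Object calls would produce
$pathOnly  = [pscustomobject]@{ Path = 'C:\zips\zipfile01.zip' }
$titleOnly = [pscustomobject]@{ Title = 'Bla Bla 01' }

# Selecting the same property list from every object gives each one both columns;
# a property that is missing on the input simply comes out as $null.
$uniform = $pathOnly, $titleOnly | Select-Object Path, Title
$uniform
```

With uniform objects the table view built from the first object fits all later ones, so no row is rendered blank.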

Related

Nested zip contents listing

I've been working on a little side project of listing files compressed in nested zip files.
I've cooked up a script that does just that, but only if the depth of zip files is known.
In the example below, the zip file has additional zips in it, and then another zip inside one of them.
Add-Type -AssemblyName System.IO.Compression.Filesystem
$path = "PATH"
$CSV_Path = "CSV_PATH"
$zipFile = Get-ChildItem $path -recurse -Filter "*.zip"
$rootArchive = [System.IO.Compression.zipfile]::OpenRead($zipFile.fullname)
$rootArchive.Entries | Select @{l = 'Source Zip'; e = {} }, @{l = "FullName"; e = { $_.FullName.Substring(0, $rootArchive.Fullname.Lastindexof('\')) } }, Name | Export-csv $CSV_Path -notypeinformation
$archivesLevel2 = $rootArchive.Entries | Where { $_.Name -like "*.zip" }
foreach ($archive in $archivesLevel2)
{
(New-object System.IO.Compression.ZipArchive ($archive.Open())).Entries | Select @{l = 'Source Zip'; e = { $archive.name } }, @{l = "FullName"; e = { $archive.FullName.Substring(0, $_.Fullname.Lastindexof('\')) } }, Name | Export-Csv $CSV_Path -NoTypeInformation -append;
New-object System.IO.Compression.ZipArchive($archive.Open()) -OutVariable +lastArchiveLevel2
}
$archivesLevel3 = $lastArchiveLevel2.entries | Where { $_.Name -like "*.zip" }
foreach ($archive in $archivesLevel3)
{
(New-Object System.IO.Compression.ZipArchive ($archive.Open())).Entries | Select @{l = 'Source Zip'; e = { $archive.name } }, @{l = "FullName"; e = { $archive.FullName.Substring(0, $_.Fullname.Lastindexof('\')) } }, Name | Export-Csv $CSV_Path -NoTypeInformation -append
}
What I ask of you is to help me modify this to accommodate an unknown depth of inner zip files. Is that even possible?
Here's an example of how to do it using a Queue object, which allows you to go through all depths of your zip file in one go.
As requested, here are some comments to explain what is going on.
Add-Type -AssemblyName System.IO.Compression.Filesystem
$path = "PATH"
$CSV_Path = "CSV_PATH"
$Queue = [System.Collections.Queue]::New()
$zipFiles = Get-ChildItem $path -recurse -Filter "*.zip"
# All records will be stored here
$Output = [System.Collections.Generic.List[PSObject]]::new()
# Main logic. Used when looking at the root zip and any zip entries.
# ScriptBlock is used to prevent code duplication.
$ProcessEntries = {
Param($Entries)
$Entries | % {
# Put all zips in the queue for future processing
if ([System.IO.Path]::GetExtension($_.FullName) -eq '.zip') { $Queue.Enqueue($_) }
# Add a 'Source Zip' property with the parent zip, since we want this information in the CSV export and it is not available otherwise.
$_ | Add-Member -MemberType NoteProperty -Name 'Source Zip' -Value $zip.name
# Every entry, zip or not, needs to be part of the output
$output.Add($_)
}
}
# Your initial Get-ChildItem to find zip files implies there could be multiple root zip files, so a loop is required.
Foreach ($zip in $zipFiles) {
$archive = [System.IO.Compression.zipfile]::OpenRead($zip.fullname)
# The $ProcessEntries scriptblock is invoked to fill the Queue and the output.
. $ProcessEntries $archive.Entries
# Should the Zip file have no zip entries, this loop will never be entered.
# Otherwise, the loop will continue as long as zip entries are detected while processing any child zip.
while ($Queue.Count -gt 0) {
# Removing item from the queue to avoid reprocessing it again.
$Item = $Queue.Dequeue()
$archive = New-object System.IO.Compression.ZipArchive ($Item.open())
# We call the main scriptblock again to fill the queue and the output.
. $ProcessEntries $archive.Entries
}
}
$Output | Select 'Source Zip', FullName, Name | Export-Csv $CSV_Path -NoTypeInformation
References
Queue
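The queue pattern itself can be tried without any real archives. The sketch below is an assumption-laden stand-in: nested hashtables play the role of zip files, and all names in it are invented. Unlike the script above, it records the immediate parent rather than the root zip as 'Source Zip'.

```powershell
# Stand-in for nested archives: each hashtable has a Name and a list of Entries.
# An entry is either a file name (string) or another "archive" (hashtable).
$root = @{
    Name    = 'root.zip'
    Entries = @(
        'readme.txt'
        @{ Name = 'inner.zip'; Entries = @(
            'a.txt'
            @{ Name = 'deep.zip'; Entries = @('b.txt') }
        ) }
    )
}

$queue  = [System.Collections.Queue]::new()
$output = [System.Collections.Generic.List[object]]::new()
$queue.Enqueue($root)

# Keep going until no unvisited archive is left, whatever the nesting depth
while ($queue.Count -gt 0) {
    $archive = $queue.Dequeue()
    foreach ($entry in $archive.Entries) {
        if ($entry -is [hashtable]) {
            # Nested archive: record it and queue it for a later pass
            $output.Add([pscustomobject]@{ 'Source Zip' = $archive.Name; Name = $entry.Name })
            $queue.Enqueue($entry)
        }
        else {
            $output.Add([pscustomobject]@{ 'Source Zip' = $archive.Name; Name = $entry })
        }
    }
}
$output
```

The loop visits one level at a time (breadth-first), so no recursion and no fixed depth are needed.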
Here is a little example of what recursion would look like. Basically, you loop over the .Entries property of the archive and check whether the extension of each item is .zip; if it is, you pass that entry to your function.
EDIT: Un-deleting this answer mainly to show how this could be approached using a recursive function, my previous answer was inaccurate. I was using [ZipFile]::OpenRead(..) to read the nested .zip files which seemed to work correctly on Linux (.NET Core) however it clearly does not work when using Windows PowerShell. The correct approach would be to use [ZipArchive]::new($nestedZip.Open()) as Sage Pourpre's helpful answer shows.
using namespace System.IO
using namespace System.IO.Compression
function Get-ZipFile {
[cmdletbinding()]
param(
[parameter(ValueFromPipeline)]
[object]$Path,
[parameter(DontShow)]
[int]$Nesting = -1
)
begin { $Nesting++ }
process {
try
{
$zip = if(-not $Nesting) {
[ZipFile]::OpenRead($Path)
}
else {
[ZipArchive]::new($Path.Open())
}
foreach($entry in $zip.Entries) {
[pscustomobject]@{
Nesting = $Nesting
Parent = $Path.Name
Contents = $entry.FullName
}
if([Path]::GetExtension($entry) -eq '.zip') {
Get-ZipFile -Path $entry -Nesting $Nesting
}
}
}
catch
{
$PSCmdlet.WriteError($_)
}
finally
{
if($null -ne $zip) {
$zip.Dispose()
}
}
}
}
Get-ChildItem *.zip | Get-ZipFile

How to split through the whole list using PowerShell

In my CSV file I have a "SharePoint Site" column and a few other columns. I'm trying to split the ID out of the "SharePoint Site" column and put it into a new column called "SharePoint ID", but I'm not sure how to do it, so I'd really appreciate any help or suggestions.
$downloadFile = Import-Csv "C:\AuditLogSearch\New folder\Modified-Audit-Log-Records.csv"
(($downloadFile -split "/") -split "_") | Select-Object -Index 5
CSV file
SharePoint Site
Include:[https://companyname-my.sharepoint.com/personal/elksn7_nam_corp_kl_com]
Include:[https://companyname-my.sharepoint.com/personal/tzksn_nam_corp_kl_com]
Include:[https://companyname.sharepoint.com/sites/msteams_c578f2/Shared%20Documents/Forms/AllItems.aspx?id=%2Fsites%2Fmsteams%5Fc578f2%2FShared%20Documents%2FBittner%2DWilfong%20%2D%20Litigation%20Hold%2FWork%20History&viewid=b3e993a1%2De0dc%2D4d33%2D8220%2D5dd778853184]
Include:[https://companyname.sharepoint.com/sites/msteams_c578f2/Shared%20Documents/Forms/AllItems.aspx?id=%2Fsites%2Fmsteams%5Fc578f2%2FShared%20Documents%2FBittner%2DWilfong%20%2D%20Litigation%20Hold%2FWork%20History&viewid=b3e993a1%2De0dc%2D4d33%2D8220%2D5dd778853184]
Include:[All]
After splitting, the result should appear under a new column called "SharePoint ID":
SharePoint ID
2. elksn
3. tzksn
4. msteams_c578f2
5. msteams_c578f2
6. All
Try this:
# Import csv into an array
$Sites = (Import-Csv C:\temp\Modified-Audit-Log-Records.csv).'SharePoint Site'
# Create Export variable
$Export = @()
# ForEach loop that goes through the SharePoint sites one at a time
ForEach($Site in $Sites){
# Clean up the input to leave only the hyperlink
$Site = $Site.replace('Include:[','')
$Site = $Site.replace(']','')
# Split the hyperlink on slashes and take the fifth piece (array indexes are zero-based, so [4] is the fifth element)
$SiteID = $Site.split('/')[4]
# The 'SharePoint Site' Include:[All] entry will be empty after doing the split, because it has no 4th slash.
# This If statement will detect if the $Site is 'All' and set the $SiteID as that.
if($Site -eq 'All'){
$SiteID = $Site
}
# Create variable to export Site ID
$SiteExport = @()
$SiteExport = [pscustomobject]@{
'SharePoint ID' = $SiteID
}
# Add each SiteExport to the Export array
$Export += $SiteExport
}
# Write out the export
$Export
A concise solution that appends a Sharepoint ID column to the existing columns by way of a calculated property:
Import-Csv 'C:\AuditLogSearch\New folder\Modified-Audit-Log-Records.csv' |
Select-Object *, @{
Name = 'SharePoint ID'
Expression = {
$tokens = $_.'SharePoint Site' -split '[][/]'
if ($tokens.Count -eq 3) { $tokens[1] } # matches 'Include:[All]'
else { $tokens[5] -replace '_nam_corp_kl_com$' }
}
}
Note:
To see all resulting column values, pipe the above to Format-List.
To re-export the results to a CSV file, pipe to Export-Csv
You have 3 distinct patterns you are trying to extract data from. I believe regex would be an appropriate tool.
If you are wanting the new csv to just have the single ID column.
$file = "C:\AuditLogSearch\New folder\Modified-Audit-Log-Records.csv"
$IdList = switch -Regex -File ($file){
'Include:.+(?=/(\w+?)_)(?<=personal)' {$matches.1}
'Include:(?=\[(\w+)\])' {$matches.1}
'Include:.+(?=/(\w+?)/)(?<=sites)' {$matches.1}
}
$IdList |
ConvertFrom-Csv -Header "Sharepoint ID" |
Export-Csv -Path $newfile -NoTypeInformation
If you want to add a column to your existing CSV
$file = "C:\AuditLogSearch\New folder\Modified-Audit-Log-Records.csv"
$properties = '*', @{
Name = 'Sharepoint ID'
Expression = {
switch -Regex ($_.'sharepoint Site'){
'Include:.+(?=/(\w+?)_)(?<=personal)' {$matches.1}
'Include:(?=\[(\w+)\])' {$matches.1}
'Include:.+(?=/(\w+?)/)(?<=sites)' {$matches.1}
}
}
}
Import-Csv -Path $file |
Select-Object $properties |
Export-Csv -Path $newfile -NoTypeInformation
Regex details
.+ Match any amount of any character
(?=...) Positive look ahead
(...) Capture group
\w+ Match one or more word characters
? Lazy quantifier
(?<=...) Positive look behind
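To see what the lazy quantifier and the capture group do in practice, here is a small sketch on strings taken from the sample data. The variable names are illustrative only.

```powershell
$segment = 'elksn7_nam_corp_kl_com'

# Lazy: \w+? grabs as little as possible, so the capture ends at the first underscore
$null = $segment -match '^(\w+?)_'
$lazy = $Matches[1]      # elksn7

# Greedy: \w also matches '_', so \w+ runs on to the last underscore
$null = $segment -match '^(\w+)_'
$greedy = $Matches[1]    # elksn7_nam_corp_kl

# Look-ahead containing a capture group, as in the 'Include:[All]' pattern
$null = 'Include:[All]' -match 'Include:(?=\[(\w+)\])'
$all = $Matches[1]       # All

$lazy, $greedy, $all
```

A successful -match fills the automatic $Matches variable, which is why each value is copied out right after its own match.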
This would require more testing to see if it works well, but with the input we have it works, the main concept is to use System.Uri to parse the strings. From what I'm seeing, the segment you are looking for is always the third one [2] and depending on the previous segments, perform a split on _ or trim the trailing / or leave the string as is if IsAbsoluteUri is $false.
$csv = Import-Csv path/to/test.csv
$result = foreach($line in $csv)
{
$uri = [uri]($line.'SharePoint Site' -replace '^Include:\[|]$')
$id = switch($uri)
{
{-not $_.IsAbsoluteUri} {
$_
break
}
{ $_.Segments[1] -eq 'personal/' } {
$_.Segments[2].Split('_')[0]
break
}
{ $_.Segments[1] -eq 'sites/' } {
$_.Segments[2].TrimEnd('/')
}
}
[pscustomobject]@{
'SharePoint Site' = $line.'SharePoint Site'
'SharePoint ID' = $id
}
}
$result | Format-List
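For readers unfamiliar with System.Uri, this sketch shows what the Segments and IsAbsoluteUri properties hold for two of the sample values. It passes UriKind explicitly to make the relative case obvious; the variable names are made up.

```powershell
# An absolute URL: Segments splits the path and keeps the trailing slashes
$personal = [uri]'https://companyname-my.sharepoint.com/personal/elksn7_nam_corp_kl_com'
$segments = $personal.Segments     # '/', 'personal/', 'elksn7_nam_corp_kl_com'

# The ID is the part of the third segment before the first underscore
$personalId = ($segments[2] -split '_')[0]

# 'All' is not a URL; parsed as RelativeOrAbsolute it becomes a relative Uri
$all = [uri]::new('All', [System.UriKind]::RelativeOrAbsolute)
$isAbsolute = $all.IsAbsoluteUri

$segments
$personalId
$isAbsolute
```

Segments is only available on absolute URIs, which is why the IsAbsoluteUri check has to come first.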

How to add MD5 hash to a PowerShell file dump

Looking for a line to add that pulls the file information as below but includes an MD5 hash
It can be from certutil, but there is not a means to download that module so looking for a means that uses PowerShell without an additional update of PowerShell.
We are looking to compare two disks for missing files even when the file might be located in an alternate location.
cls
$filPath="G:/"
Set-Location -path $filPath
Get-ChildItem -Path $filPath -recurse |`
foreach-object{
$Item=$_
$Path =$_.FullName
$ParentS=($_.FullName).split("/")
$Parent=$ParentS[@($ParentS.Length-2)]
$Folder=$_.PSIsContainer
#$Age=$_.CreationTime
#$Age=$_.ModifiedDate
$Modified=$_.LastWriteTime
$Type=$_.Extension
$Path | Select-Object `
@{n="Name";e={$Item}},`
@{n="LastModified";e={$Modified}},`
@{n="Extension";e={$Type}},`
@{n="FolderName";e={if($Parent){$Parent}else{$Parent}}},`
@{n="filePath";e={$Path}}`
} | Export-csv Q:/lpdi/fileDump.csv -NoTypeInformation
Possible answer here: (Thanks Guenther)
@{name="Hash";expression={(Get-FileHash -Algorithm MD5 -Path $Path).hash}}
In this script it meets the filehash condition along with the name of the file which allows a way to find the file on the folder and know it matches another one in another location based on the hash.
I'm not sure what happens on the file hash itself. If it includes the name of the file, the hash will be different. If it is only the file itself and the path doesn't matter, it should meet the requirement. I'm not sure how to include it in the code above however
Your code could be simplified so you don't need all those 'in-between' variables.
Also, the path separator character in Windows is a backslash (\), not a forward slash (/), so this part of your code, $ParentS=($_.FullName).split("/"), does not do what you expect.
Try
$SourcePath = 'G:\'
Get-ChildItem -Path $SourcePath -File -Recurse | ForEach-Object {
# remove the next line if you do not want console output
Write-Host "Processing file '$($_.FullName)'.."
$md5 = ($_ | Get-FileHash -Algorithm MD5).Hash
$_ | Select-Object @{Name = 'Name'; Expression = { $_.Name }},
@{Name = 'LastModified'; Expression = { $_.LastWriteTime }},
@{Name = 'Extension'; Expression = { $_.Extension }},
@{Name = 'FolderName'; Expression = { $_.Directory.Name }},
@{Name = 'FilePath'; Expression = { $_.FullName }},
@{Name = 'FileHash'; Expression = { $md5 }}
} | Export-Csv -Path 'Q:/lpdi/fileDump.csv' -NoTypeInformation
Because getting hash values is a time consuming process I've added a Write-Host line, so you know the script did not 'hang'..
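On the question of whether the file name affects the hash: Get-FileHash hashes the file's contents only. A quick way to confirm this is the sketch below, which uses throwaway files in the temp folder; the file names are made up.

```powershell
# Two files with identical content but different names, in a scratch folder
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -Path $dir -ItemType Directory | Out-Null
$first  = Join-Path $dir 'report.txt'
$second = Join-Path $dir 'copy-with-another-name.txt'
Set-Content -Path $first  -Value 'same bytes'
Set-Content -Path $second -Value 'same bytes'

$hashFirst  = (Get-FileHash -Path $first  -Algorithm MD5).Hash
$hashSecond = (Get-FileHash -Path $second -Algorithm MD5).Hash

# Only the content is hashed, so the two values are equal
$sameHash = $hashFirst -eq $hashSecond

Remove-Item -Path $dir -Recurse
$sameHash
```

So a file that was moved or renamed on the second disk keeps its hash, and grouping the exported rows by the hash column should pair it with its original.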
Edit: Okay so, here is my workaround as promised.
Before we start, requirements are:
Have python 3.8 or above installed and registered in windows PATH
edit the ps1 file variables accordingly
edit the python file variables accordingly
bypass powershell script execution policies
There are 4 files in the working directory (different from your target directory):
addMD5.ps1 (static)
addMD5.py (static)
fileDump-original.csv (auto-generated)
fileDump-modified.csv (auto-generated)
Here are the contents of those 4 files:
addMD5.ps1
$targetDir="C:\Users\USERname4\Desktop\myGdrive"
$workingDir="C:\Users\USERname4\Desktop\myWorkingDir"
$pythonName="addMD5.py"
$exportName = "fileDump-original.csv"
Set-Location -path $workingDir
if (Test-Path $exportName)
{
Remove-Item $exportName
}
Get-ChildItem -Path $targetDir -recurse |`
foreach-object{
$Item=$_
$Path =$_.FullName
$ParentS=($_.FullName).split("/")
$Parent=$ParentS[@($ParentS.Length-2)]
$Folder=$_.PSIsContainer
#$Age=$_.CreationTime
#$Age=$_.ModifiedDate
$Modified=$_.LastWriteTime
$Type=$_.Extension
$Path | Select-Object `
@{n="Name";e={$Item}},`
@{n="LastModified";e={$Modified}},`
@{n="Extension";e={$Type}},`
@{n="FolderName";e={if($Parent){$Parent}else{$Parent}}},`
@{n="filePath";e={$Path}}`
} | Export-csv $exportName -NoTypeInformation
python $pythonName
addMD5.py
import os, hashlib
def file_len(fname):
    with open(fname) as fp:
        for i, line in enumerate(fp):
            pass
    return i + 1
def read_nth(fname,intNth):
    with open(fname) as fp:
        for i, line in enumerate(fp):
            if i == (intNth-1):
                return line
def getMd5(fname):
    file_hash = hashlib.md5()
    with open(fname, "rb") as f:
        chunk = f.read(8192)
        while chunk:
            file_hash.update(chunk)
            chunk = f.read(8192)
    return file_hash.hexdigest()
file1name = "fileDump-original.csv"
file2name = "fileDump-modified.csv"
try:
    os.remove(file2name)
except:
    pass
file2 = open(file2name , "w")
for linenum in range(file_len(file1name)):
    if (linenum+1) == 1:
        file2.write(read_nth(file1name,linenum+1).strip()+',"md5"\n')
    else:
        innerfilename = read_nth(file1name,linenum+1).split(",")[4].strip()[1:-1]
        file2.write(read_nth(file1name,linenum+1).strip()+',"'+getMd5(innerfilename)+'"\n')
file2.close()
fileDump-original.csv
"Name","LastModified","Extension","FolderName","filePath"
"test1.txt","20-Jun-21 12:50:44 PM",".txt","C:\Users\USERname4\Desktop\myGdrive\test1.txt","C:\Users\USERname4\Desktop\myGdrive\test1.txt"
"test2.txt","20-Jun-21 12:50:37 PM",".txt","C:\Users\USERname4\Desktop\myGdrive\test2.txt","C:\Users\USERname4\Desktop\myGdrive\test2.txt"
fileDump-modified.csv
"Name","LastModified","Extension","FolderName","filePath","md5"
"test1.txt","20-Jun-21 12:50:44 PM",".txt","C:\Users\USERname4\Desktop\myGdrive\test1.txt","C:\Users\USERname4\Desktop\myGdrive\test1.txt","d659c1bc0a3010b0bdd45d9a8fee3196"
"test2.txt","20-Jun-21 12:50:37 PM",".txt","C:\Users\USERname4\Desktop\myGdrive\test2.txt","C:\Users\USERname4\Desktop\myGdrive\test2.txt","d55749658669d28f8549d94cd01b72ba"

Trouble with Log progression

I'm trying to copy logs in numerical order, and I want my Output.txt to record the last file copied. However, I'm running into a problem: when my script goes from log_9.txt to log_10.txt, the value written to my text file stays at log_9.txt even though all the files are copied.
dir c:\PS1 *.bat | ForEach {
$variable = "$($_.Name) 'n$(Get-content $_.FullName)"
Set-Content -Value $variable -Path c:\PS1\Output.txt
$pull = Get-Content C:\PS1\Output.txt
copy-item $source\$pull -Destination $dest -Verbose
}
The following command shows you how you sort the base name (file name without extension) of your input files first lexically, by the text before the _, and then numerically, by the number following the _:
# The input simulates dir (Get-ChildItem) output.
@{ BaseName = 'log_10' }, @{ BaseName ='log_9' }, @{ BaseName = 'log_2' } |
Sort-Object { ($_.BaseName -split '_')[0] }, { [int] ($_.BaseName -split '_')[-1] }
The above yields the following - note the correct numerical sorting:
Name Value
---- -----
BaseName log_2
BaseName log_9
BaseName log_10
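Applied to file names rather than hashtables, the same idea might look like the sketch below. The names are invented stand-ins for Get-ChildItem output, and $lastCopied is a hypothetical variable for the value you would write to Output.txt.

```powershell
# Made-up file names standing in for Get-ChildItem output
$names = 'log_10.txt', 'log_9.txt', 'log_2.txt'

$sorted = $names | Sort-Object {
    # Number after the underscore, compared as an integer rather than as text
    [int](([System.IO.Path]::GetFileNameWithoutExtension($_) -split '_')[-1])
}

# After a numeric sort, the last element is the highest-numbered log
$lastCopied = $sorted[-1]
$sorted
$lastCopied
```

A plain text sort places log_10 before log_9 because the character '1' sorts before '9'; casting to [int] is what restores the expected order.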

Comparing filehash and outputting files

I am new to PowerShell and am writing a script to get the hash of a directory and store it in a .txt file.
I then want to compare it to an earlier version and check for changes. If there are changes, I want a new .txt or .html file containing which line items have changed, with last modified dates.
So far, I've gotten the comparison to work, and the resulting steps based upon the pass/fail work fine.
What I need help with is outputting the results into a .txt file that lists only the files that have changed, with fields of Algorithm, Hash, Filename, Last edit time. I know I can use
(Get-Item $source).LastWriteTime
To fetch the write time, but I need to do it for every file in the directory, not just the .txt file that contains the hash.
# Variables
$Hashstore = "d:\baseline.txt"
$HashCompare = "d:\hashcompare.txt"
$HashTemp = "d:\hashtemp.txt"
$FileDir = "d:\New2"
$DateTime = Get-Date -format M.d.yyyy.hh.mm.ss
# Email Variables
$smtp_server = '<yourSMTPServer>'
$to_email = '<email>'
$from_email = '<email>'
$dns_server = "<yourExternalDNSServer>"
$domain = "<yourDomain>"
# Check if Baseline.txt Exists
If (Test-Path $Hashstore)
# // File exists
{}
Else {
# // File does not exist - Should never happen!
$RefreshHash = dir $FileDir | Get-FileHash -Algorithm MD5
$RefreshHash | Out-File $Hashstore
}
# Generate new Compare Hash.txt
$HashNew = dir $FileDir -Recurse | Get-FileHash -Algorithm MD5
$HashNew | Out-File $HashCompare
# Get Hash of baseline.txt
$HashBaseline = Get-FileHash -Path d:\baseline.txt -Algorithm MD5
#Get Hash of hashcompare.txt
$HashDiff = Get-FileHash -Path d:\hashcompare.txt -Algorithm MD5
#If changed, output hash to storage, and flag changes
If ($HashBaseline.hash -eq $HashDiff.hash)
{
Add-Content -Path d:\success.$DateTime.txt -Value " Source Files ARE EQUAL </p>"
}
else
{
Add-Content -Path d:\failure.$DateTime.html -Value "Source Files NOT EQUAL </p>"
$HashNew | Out-File $HashTemp
}
# Compare two logs, send email if there is a change
If ($diff_results)
{
#$evt_message = Get-Content .\domain.new.txt | Out-String
#Write-EventLog -LogName Application -EventId 9000 -EntryType Error -Source "Maximo Validation Script" -Message $evt_message
#Send-MailMessage -To $to_email -From $from_email -SmtpServer $smtp_server -Attachments .\domain.new.txt -Subject "ALERT! Change in Records" -Body "A change has been detected in the Maximo system files.`n`n`tACTION REQUIRED!`n`nVerify that this change was authorized."
}
If ($HashNew.HashString -eq $Hashstore.HashString)
{
}
else
{
$HashTemp | Out-File $HashStore
}
I know that Add-Content may not be the best way to write to this log I'm creating. What would be the best way to add the last write time for every file that is read?
Here is a clean way to output the information you need (Algorithm, Hash, Filename, Last edit time) for each file that has changed:
$Hashstore = "d:\baseline.txt"
$HashCompare = "d:\hashcompare.txt"
$HashTemp = "d:\hashtemp.txt"
$FileDir = "d:\New2"
$DateTime = Get-Date -format M.d.yyyy.hh.mm.ss
# Check if Baseline.txt Exists
If (Test-Path $Hashstore)
# // File exists
{
}
Else {
# // File does not exist - Should never happen!
$RefreshHash = dir $FileDir -Recurse | Get-FileHash -Algorithm MD5
$RefreshHash | Export-Csv -Path $Hashstore -NoTypeInformation -Force
}
# Generate new Compare Hash.txt
$HashNew = dir $FileDir -Recurse | Get-FileHash -Algorithm MD5
$HashNew | Export-Csv -Path $HashCompare -NoTypeInformation -Force
# Get Hash of baseline.txt
$HashBaseline = Get-FileHash -Path $Hashstore -Algorithm MD5
#Get Hash of hashcompare.txt
$HashDiff = Get-FileHash -Path $HashCompare -Algorithm MD5
#If changed, output hash to storage, and flag changes
If ($HashBaseline.hash -eq $HashDiff.hash) {
Add-Content -Path D:\success.$DateTime.txt -Value " Source Files ARE EQUAL </p>"
}
Else {
Add-Content -Path D:\failure.$DateTime.txt -Value "Source Files NOT EQUAL </p>"
$HashNew | Export-Csv -Path $HashTemp -NoTypeInformation -Force
# Storing a collection of differences in $Diffs
$Diffs = Compare-Object -ReferenceObject (Import-Csv $Hashstore) -DifferenceObject (Import-Csv $HashCompare)
Foreach ($Diff in $Diffs) {
$DiffHashInfo = $Diff | Select-Object -ExpandProperty InputObject
$DiffFileInfo = Get-ChildItem -Path $DiffHashInfo.Path
# Creating a list of properties for the information you need
$DiffObjProperties = [ordered]@{'Algorithm'=$DiffHashInfo.Algorithm
'Hash'=$DiffHashInfo.Hash
'Filename'=$DiffFileInfo.Name
'Last edit time'=$DiffFileInfo.LastWriteTime
}
# Building a custom object from the list of properties in $DiffObjProperties
$DiffObj = New-Object -TypeName psobject -Property $DiffObjProperties
$DiffObj
}
}
Before creating the files $Hashstore and $HashCompare, I convert the information they contain to CSV format, rather than plain text.
It makes their content much easier to manipulate later, using Import-Csv.
This makes proper objects with properties I can use.
This also makes them easier to compare, and the result of this comparison ($Diffs) is a collection of these proper objects.
So $Diffs contains all the files that have changed and I loop through each of them in a Foreach statement.
This allows you to create a custom object ($DiffObj) with exactly the information you need ($DiffObjProperties) for each of the files that have changed.
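How Compare-Object reports a changed file can be seen on a tiny in-memory example. This is a sketch with made-up paths and placeholder hash strings; it names the compared properties explicitly with -Property, which is a choice made here for clarity rather than something the script above does.

```powershell
# Placeholder hash values; real ones would come from Get-FileHash
$baseline = @(
    [pscustomobject]@{ Path = 'd:\New2\a.txt'; Hash = 'AAA' }
    [pscustomobject]@{ Path = 'd:\New2\b.txt'; Hash = 'BBB' }
)
$current = @(
    [pscustomobject]@{ Path = 'd:\New2\a.txt'; Hash = 'AAA' }
    [pscustomobject]@{ Path = 'd:\New2\b.txt'; Hash = 'CCC' }
)

# Objects that differ in Path or Hash are returned with a SideIndicator:
# '<=' means only in the baseline, '=>' means only in the current set
$diffs = Compare-Object -ReferenceObject $baseline -DifferenceObject $current -Property Path, Hash

# Keep the current-side rows: these are the files whose hash is new
$changed = $diffs | Where-Object SideIndicator -eq '=>'
$changed
```

A modified file shows up twice in the raw result, once per side, so filtering on SideIndicator avoids listing it two times in the report.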
PowerShell v3+ Recursive Directory Diff Using MD5 Hashing
I use this pure PowerShell (no dependencies) recursive file content diff. It calculates the MD5 hash in memory (the algorithm is configurable) for each directory's file contents and gives results in standard PowerShell Compare-Object format.
It can optionally export to CSV files along with a summary text file. You can either drop the rdiff.ps1 file into your path or copy the contents into your script.
USAGE: rdiff path/to/left,path/to/right [-s path/to/summary/dir]
Here is the gist. I copied below for reference but I recommend using the gist version as I will be adding new features to it over time.
#########################################################################
### USAGE: rdiff path/to/left,path/to/right [-s path/to/summary/dir] ###
### ADD LOCATION OF THIS SCRIPT TO PATH ###
#########################################################################
[CmdletBinding()]
param (
[parameter(HelpMessage="Stores the execution working directory.")]
[string]$ExecutionDirectory=$PWD,
[parameter(Position=0,HelpMessage="Compare two directories recursively for differences.")]
[alias("c")]
[string[]]$Compare,
[parameter(HelpMessage="Export a summary to path.")]
[alias("s")]
[string]$ExportSummary
)
### FUNCTION DEFINITIONS ###
# SETS WORKING DIRECTORY FOR .NET #
function SetWorkDir($PathName, $TestPath) {
$AbsPath = NormalizePath $PathName $TestPath
Set-Location $AbsPath
[System.IO.Directory]::SetCurrentDirectory($AbsPath)
}
# RESTORES THE EXECUTION WORKING DIRECTORY AND EXITS #
function SafeExit() {
SetWorkDir /path/to/execution/directory $ExecutionDirectory
Exit
}
function Print {
[CmdletBinding()]
param (
[parameter(Mandatory=$TRUE,Position=0,HelpMessage="Message to print.")]
[string]$Message,
[parameter(HelpMessage="Specifies a success.")]
[alias("s")]
[switch]$SuccessFlag,
[parameter(HelpMessage="Specifies a warning.")]
[alias("w")]
[switch]$WarningFlag,
[parameter(HelpMessage="Specifies an error.")]
[alias("e")]
[switch]$ErrorFlag,
[parameter(HelpMessage="Specifies a fatal error.")]
[alias("f")]
[switch]$FatalFlag,
[parameter(HelpMessage="Specifies a info message.")]
[alias("i")]
[switch]$InfoFlag = !$SuccessFlag -and !$WarningFlag -and !$ErrorFlag -and !$FatalFlag,
[parameter(HelpMessage="Specifies blank lines to print before.")]
[alias("b")]
[int]$LinesBefore=0,
[parameter(HelpMessage="Specifies blank lines to print after.")]
[alias("a")]
[int]$LinesAfter=0,
[parameter(HelpMessage="Specifies if program should exit.")]
[alias("x")]
[switch]$ExitAfter
)
PROCESS {
if($LinesBefore -gt 0) {
# 1..N prints exactly N blank lines (0..N would print one too many)
foreach($i in 1..$LinesBefore) { Write-Host "" }
}
if($InfoFlag) { Write-Host "$Message" }
if($SuccessFlag) { Write-Host "$Message" -ForegroundColor "Green" }
# "Orange" is not a valid console color; "Yellow" is the closest available
if($WarningFlag) { Write-Host "$Message" -ForegroundColor "Yellow" }
if($ErrorFlag) { Write-Host "$Message" -ForegroundColor "Red" }
if($FatalFlag) { Write-Host "$Message" -ForegroundColor "Red" -BackgroundColor "Black" }
if($LinesAfter -gt 0) {
foreach($i in 1..$LinesAfter) { Write-Host "" }
}
if($ExitAfter) { SafeExit }
}
}
# VALIDATES STRING MIGHT BE A PATH #
function ValidatePath($PathName, $TestPath) {
If([string]::IsNullOrWhiteSpace($TestPath)) {
Print -x -f "$PathName is not a path"
}
}
# NORMALIZES RELATIVE OR ABSOLUTE PATH TO ABSOLUTE PATH #
function NormalizePath($PathName, $TestPath) {
ValidatePath "$PathName" "$TestPath"
$TestPath = [System.IO.Path]::Combine((pwd).Path, $TestPath)
$NormalizedPath = [System.IO.Path]::GetFullPath($TestPath)
return $NormalizedPath
}
# VALIDATES STRING MIGHT BE A PATH AND RETURNS ABSOLUTE PATH #
function ResolvePath($PathName, $TestPath) {
ValidatePath "$PathName" "$TestPath"
$ResolvedPath = NormalizePath $PathName $TestPath
return $ResolvedPath
}
# VALIDATES STRING RESOLVES TO A PATH AND RETURNS ABSOLUTE PATH #
function RequirePath($PathName, $TestPath, $PathType) {
ValidatePath $PathName $TestPath
If(!(Test-Path $TestPath -PathType $PathType)) {
Print -x -f "$PathName ($TestPath) does not exist as a $PathType"
}
$ResolvedPath = Resolve-Path $TestPath
return $ResolvedPath
}
# Like mkdir -p -> creates a directory recursively if it doesn't exist #
function MakeDirP {
[CmdletBinding()]
param (
[parameter(Mandatory=$TRUE,Position=0,HelpMessage="Path create.")]
[string]$Path
)
PROCESS {
New-Item -path $Path -itemtype Directory -force | Out-Null
}
}
# GETS ALL FILES IN A PATH RECURSIVELY #
function GetFiles {
[CmdletBinding()]
param (
[parameter(Mandatory=$TRUE,Position=0,HelpMessage="Path to get files for.")]
[string]$Path
)
PROCESS {
ls $Path -r | where { !$_.PSIsContainer }
}
}
# GETS ALL FILES WITH CALCULATED HASH PROPERTY RELATIVE TO A ROOT DIRECTORY RECURSIVELY #
# RETURNS LIST OF @{RelativePath, Hash, FullName}
function GetFilesWithHash {
[CmdletBinding()]
param (
[parameter(Mandatory=$TRUE,Position=0,HelpMessage="Path to get directories for.")]
[string]$Path,
[parameter(HelpMessage="The hash algorithm to use.")]
[string]$Algorithm="MD5"
)
PROCESS {
$OriginalPath = $PWD
SetWorkDir path/to/diff $Path
GetFiles $Path | select @{N="RelativePath";E={$_.FullName | Resolve-Path -Relative}},
@{N="Hash";E={(Get-FileHash $_.FullName -Algorithm $Algorithm | select Hash).Hash}},
FullName
SetWorkDir path/to/original $OriginalPath
}
}
# COMPARE TWO DIRECTORIES RECURSIVELY #
# RETURNS LIST OF @{RelativePath, Hash, SideIndicator}
function DiffDirectories {
[CmdletBinding()]
param (
[parameter(Mandatory=$TRUE,Position=0,HelpMessage="Directory to compare left.")]
[alias("l")]
[string]$LeftPath,
[parameter(Mandatory=$TRUE,Position=1,HelpMessage="Directory to compare right.")]
[alias("r")]
[string]$RightPath
)
PROCESS {
$LeftHash = GetFilesWithHash $LeftPath
$RightHash = GetFilesWithHash $RightPath
diff -ReferenceObject $LeftHash -DifferenceObject $RightHash -Property RelativePath,Hash
}
}
### END FUNCTION DEFINITIONS ###
### PROGRAM LOGIC ###
if($Compare.length -ne 2) {
Print -x "Compare requires passing exactly 2 path parameters separated by comma, you passed $($Compare.length)." -f
}
Print "Comparing $($Compare[0]) to $($Compare[1])..." -a 1
$LeftPath = RequirePath path/to/left $Compare[0] container
$RightPath = RequirePath path/to/right $Compare[1] container
$Diff = DiffDirectories $LeftPath $RightPath
$LeftDiff = $Diff | where {$_.SideIndicator -eq "<="} | select RelativePath,Hash
$RightDiff = $Diff | where {$_.SideIndicator -eq "=>"} | select RelativePath,Hash
if($ExportSummary) {
$ExportSummary = ResolvePath path/to/summary/dir $ExportSummary
MakeDirP $ExportSummary
$SummaryPath = Join-Path $ExportSummary summary.txt
$LeftCsvPath = Join-Path $ExportSummary left.csv
$RightCsvPath = Join-Path $ExportSummary right.csv
$LeftMeasure = $LeftDiff | measure
$RightMeasure = $RightDiff | measure
"== DIFF SUMMARY ==" > $SummaryPath
"" >> $SummaryPath
"-- DIRECTORIES --" >> $SummaryPath
"`tLEFT -> $LeftPath" >> $SummaryPath
"`tRIGHT -> $RightPath" >> $SummaryPath
"" >> $SummaryPath
"-- DIFF COUNT --" >> $SummaryPath
"`tLEFT -> $($LeftMeasure.Count)" >> $SummaryPath
"`tRIGHT -> $($RightMeasure.Count)" >> $SummaryPath
"" >> $SummaryPath
$Diff | Format-Table >> $SummaryPath
$LeftDiff | Export-Csv $LeftCsvPath -f
$RightDiff | Export-Csv $RightCsvPath -f
}
$Diff
SafeExit
Here is another version of mine, but without date/time.
# Check images. Display if differ
#
$file_path = "C:\Files"
$last_state = "last_state.json"
# Check last_state.json. If false - create new empty file.
If (!(Test-Path $last_state)) {
New-Item $last_state -ItemType file | Out-Null
}
$last_state_obj = Get-Content $last_state | ConvertFrom-Json
# Get files list and hash. Also you can use -Recurse option
Get-ChildItem $file_path -Filter *.* |
Foreach-Object {
if (!$_.PSIsContainer) {
$current_state += @($_ | Get-FileHash -Algorithm MD5)
}
}
# Compare hash
ForEach ($current_file in $current_state) {
if (($last_state_obj | where {$current_file.Path -eq $_.Path}).Hash -ne $current_file.Hash) {
$changed += @($current_file)
}
}
# Display changed files
$changed
# Save new hash to last_state.json
$current_state | ConvertTo-JSON | Out-File $last_state