Compare Directory and File removing equals from Directory - powershell

This is the code I am trying to execute
$objDir = Get-ChildItem "C:\Users\Bruce\Serena\"
$objFile = Get-Content "C:\Users\Bruce\process.txt"
$matches = (Compare-Object -ReferenceObject $objFile -DifferenceObject $objDir -Property Name,Length -excludedifferent -includeequal)
foreach ($file in $matches)
{
    Remove-Item C:\Users\Bruce\Serena\$($file.Name)
}
I want to delete from the directory every item that appears both in the directory and in the text file. Will this code do that?

It's hard to tell what you should do without seeing the format of the data in process.txt, but I can tell you definitively that invoking Compare-Object on the results of Get-ChildItem and Get-Content can't possibly work, because the former returns an array of FileInfo objects and the latter returns an array of strings (or just a string, if the file has only one line).
Compare-Object is intended primarily to compare sets of objects of the same type, though it can be used to compare sets of objects that have common property names. However, in the latter case the properties need to have the same type of information, not just the same names, in order for the comparison to be meaningful.
There's no way for it to guess what content in the strings in $objFile to compare to properties of the FileInfo objects in $objDir. The only property name these object types have in common is Length, but any matches on that property would be meaningless (and very unlikely) coincidences, because they have completely different meanings—the number of characters in the string, and the size of the file in bytes, respectively.
How you should do it depends on what kind of data you have in process.txt. If it's just a list of filenames, then it's as simple as
foreach ($file in (Get-Content 'C:\Users\Bruce\process.txt')) {
    Remove-Item -ErrorAction SilentlyContinue "C:\Users\Bruce\Serena\$file"
}
If it's a CSV file containing the name and size of each file, then you'd use Import-Csv rather than Get-Content, to import the data into an array of objects with properties you can compare to the directory listing, but I'd need to see some sample data before getting specific.
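As a rough sketch of the CSV case, assuming the list file has Name and Length columns: the column names, the scratch directory, and the sample files below are all hypothetical, since the real format of process.txt is unknown.

```powershell
# Hypothetical setup: a scratch directory standing in for C:\Users\Bruce\Serena.
$tempRoot = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$dir = (New-Item -ItemType Directory -Path $tempRoot).FullName
[System.IO.File]::WriteAllText((Join-Path $dir 'a.txt'), 'alpha')    # 5 bytes
[System.IO.File]::WriteAllText((Join-Path $dir 'b.txt'), 'bravo!')   # 6 bytes

# Hypothetical list file: a.txt matches in name and size, b.txt only in name.
$listPath = Join-Path $dir 'process.csv'
Set-Content -LiteralPath $listPath -Value 'Name,Length', 'a.txt,5', 'b.txt,999'

foreach ($row in (Import-Csv -LiteralPath $listPath)) {
    $item = Get-Item -LiteralPath (Join-Path $dir $row.Name) -ErrorAction SilentlyContinue
    # Import-Csv yields strings, so convert the size before comparing it.
    if ($item -and $item.Length -eq [long]$row.Length) {
        Remove-Item -LiteralPath $item.FullName
    }
}
```

Only files that match on both name and size are removed; a name match with a different size is left alone.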

Related

How to compare CSV file in powershell but exclude some fields of the dataset within the compare

I'm looking for a way to compare two CSV files with powershell and output only the data sets from the first given CSV file which are different.
It should also be possible to exclude some fields of the data set (provided via the header field name of the CSV).
CSV example (First CSV file)
FirstName;LastName;LastUpdate;Mail;City;PostalCode
Max;Mustermann;01.01.2023;test@test.de;Musterstadt;12345
Maxi;Musterfrau;01.01.2022;maxi@test.de;Musterstadt;12345
CSV example (second CSV file)
FirstName;LastName;LastUpdate;Mail;City;PostalCode
Max;Mustermann;24.12.2023;test@test.de;Musterdorf;54321
Maxi;Musterfrau;12.12.2022;maxi@test.de;Musterstadt;12345
As shown in the CSV examples, the first dataset in CSV file 1 and 2 are different.
Now, within the compare process, the field 'LastUpdate' should be ignored, so that only the fields 'FirstName;LastName;Mail;City;PostalCode' will be used for comparing the data.
The compare should return, as an array, only the full datasets from file one that are different.
I tried several different things, but nothing works as expected.
Here is a sample of my attempts:
# Define the file paths
$file1 = "..\file1.csv"
$file2 = "..\file2.csv"
# Read the first file into a variable
$data1 = Import-Csv $file1
# Read the second file into a variable
$data2 = Import-Csv $file2
# Compare the files, ignoring data from the 'LastUpdate' field
$differences = Compare-Object -ReferenceObject $data1 -DifferenceObject $data2 -IncludeEqual -ExcludeDifferent -Property 'LastUpdate' | Where-Object {$_.SideIndicator -eq '<='}
# export differences to a CSV file
$differences | Export-Csv -Path "..\Compare_result.csv" -Delimiter ";" -NoTypeInformation
I hope you guys can help me out.
I thank you in advance
Compare-Object doesn't allow you to exclude properties to compare by - any properties you want to compare by must be expressed positively, as an array of names passed to -Property.
The purpose of the -ExcludeDifferent switch is to exclude objects that compare differently and only makes sense in combination with -IncludeEqual, given that objects that compare as equal are not included by default (in PowerShell (Core) 7+, use of -ExcludeDifferent now implies -IncludeEqual).
If -Property is used, the [pscustomobject] output objects have only the specified properties. To pass the input objects through as-is, the -PassThru switch must be used.
The passed-through objects are decorated with an ETS (Extended Type System) .SideIndicator property, so that filtering based on what side they were unique to is still possible.
Caveat: if -IncludeEqual is also present, for a given pair of input objects that compare equal, it is (only) the -ReferenceObject collection's object that is passed through - even though the -DifferenceObject object may have differing values in properties that aren't being compared.
Therefore:
# Compare the files, ignoring data from the 'LastUpdate' field
$differences =
    Compare-Object -ReferenceObject $data1 `
                   -DifferenceObject $data2 `
                   -Property FirstName, LastName, Mail, City, PostalCode `
                   -PassThru |
    Where-Object SideIndicator -eq '<='
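To see the effect on the sample rows from the question, here is a small in-memory sketch. It assumes the rows are parsed with ConvertFrom-Csv -Delimiter ';' instead of being read from the files, and the $compareArgs splat is just an illustrative way to avoid line continuations.

```powershell
# Sample rows from the question, parsed in memory with the ';' delimiter.
$data1 = 'FirstName;LastName;LastUpdate;Mail;City;PostalCode',
         'Max;Mustermann;01.01.2023;test@test.de;Musterstadt;12345',
         'Maxi;Musterfrau;01.01.2022;maxi@test.de;Musterstadt;12345' |
    ConvertFrom-Csv -Delimiter ';'

$data2 = 'FirstName;LastName;LastUpdate;Mail;City;PostalCode',
         'Max;Mustermann;24.12.2023;test@test.de;Musterdorf;54321',
         'Maxi;Musterfrau;12.12.2022;maxi@test.de;Musterstadt;12345' |
    ConvertFrom-Csv -Delimiter ';'

# Compare on every column except LastUpdate; -PassThru keeps the full rows.
$compareArgs = @{
    ReferenceObject  = $data1
    DifferenceObject = $data2
    Property         = 'FirstName', 'LastName', 'Mail', 'City', 'PostalCode'
    PassThru         = $true
}
$differences = @(Compare-Object @compareArgs | Where-Object SideIndicator -eq '<=')
```

With this data, $differences should hold only the Max row from the first set, still carrying its LastUpdate value, because the Maxi rows differ in LastUpdate alone.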
I found my issue.
I forgot the Delimiter param for the two Import-csv, since the Delimiter in my CSV is a ;
So I changed from this code
# Read the first file into a variable
$data1 = Import-Csv $file1
# Read the second file into a variable
$data2 = Import-Csv $file2
to this code
# Read the first file into a variable
$data1 = Import-Csv $file1 -Delimiter ";"
# Read the second file into a variable
$data2 = Import-Csv $file2 -Delimiter ";"
After that I got the expected results from the compare.
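A quick way to see why the missing delimiter mattered is to parse the same lines both ways. This sketch uses ConvertFrom-Csv as an in-memory stand-in for Import-Csv and a shortened, illustrative header:

```powershell
$lines = 'FirstName;LastName;LastUpdate', 'Max;Mustermann;01.01.2023'

# Default delimiter is a comma, so each whole line becomes a single column.
$wrong = $lines | ConvertFrom-Csv
# With the right delimiter, each field becomes its own property.
$right = $lines | ConvertFrom-Csv -Delimiter ';'

$wrongNames = @($wrong.psobject.Properties.Name)
$rightNames = @($right.psobject.Properties.Name)
```

Here $wrongNames contains one property named after the entire header line, while $rightNames contains FirstName, LastName, and LastUpdate. Without the delimiter, there is no FirstName or City property for Compare-Object to compare.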

PowerShell, can't get LastWriteTime

I have this working, but need LastWriteTime and can't get it.
Get-ChildItem -Recurse | Select-String -Pattern "CYCLE" | Select-Object Path, Line, LastWriteTime
I get an empty column and zero Date-Time data
Select-String's output objects, which are of type Microsoft.PowerShell.Commands.MatchInfo, only contain the input file path (string), no other metadata such as LastWriteTime.
To obtain it, use a calculated property, combined with the common -PipelineVariable parameter,
which allows you to reference the input file at hand in the calculated property's expression script block as a System.IO.FileInfo instance as output by Get-ChildItem, whose .LastWriteTime property value you can return:
Get-ChildItem -File -Recurse -PipelineVariable file |
  Select-String -Pattern "CYCLE" |
  Select-Object Path,
                Line,
                @{
                  Name='LastWriteTime';
                  Expression={ $file.LastWriteTime }
                }
Note how the pipeline variable, $file, must be passed without the leading $ (i.e. as file) as the -PipelineVariable argument. -PipelineVariable can be abbreviated to -pv.
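For a self-contained way to try this, the following sketch runs the same pipeline against a scratch directory. The directory, file names, and file contents are made up for illustration.

```powershell
# Hypothetical scratch directory with one matching and one non-matching file.
$tempRoot = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$dir = (New-Item -ItemType Directory -Path $tempRoot).FullName
Set-Content -LiteralPath (Join-Path $dir 'job.log') -Value 'start', 'CYCLE 1 done'
Set-Content -LiteralPath (Join-Path $dir 'other.log') -Value 'nothing here'

$results = @(
    Get-ChildItem -File -Recurse -LiteralPath $dir -PipelineVariable file |
        Select-String -Pattern 'CYCLE' |
        Select-Object Path, Line, @{
            Name       = 'LastWriteTime'
            # $file is the FileInfo currently flowing through the pipeline.
            Expression = { $file.LastWriteTime }
        }
)
```

$results should contain one object for the matching line in job.log, with LastWriteTime populated from the file rather than left empty.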
LastWriteTime is a property of System.IO.FileSystemInfo, which is the base type of the items Get-ChildItem returns for the Filesystem provider (which is System.IO.FileInfo for files). Path and Line are properties of Microsoft.PowerShell.Commands.MatchInfo, which contains information about the match, not the file you passed in. Select-Object operates on the information piped into it, which comes from the previous expression in the pipeline, your Select-String in this case.
You can't do this as a (well-written) one-liner if you want the file name, line match, and the last write time of the actual file to be returned. I recommend using an intermediary PSCustomObject for this and we can loop over the found files and matches individually:
# Use -File to only get file objects
$foundMatchesInFiles = Get-ChildItem -Recurse -File | ForEach-Object {
  # Assign $PSItem/$_ to $file since we will need it in the second loop
  $file = $_
  # Run Select-String on each found file
  $file | Select-String -Pattern CYCLE | ForEach-Object {
    [PSCustomObject]@{
      Path = $_.Path
      Line = $_.Line
      FileLastWriteTime = $file.LastWriteTime
    }
  }
}
Note: I used a slightly altered name of FileLastWriteTime to exemplify that this comes from the returned file and not the match provided by Select-String, but you could use LastWriteTime if you wish to retain the original property name.
Now $foundMatchesInFiles will be a collection of files which have CYCLE occurring within them, the path of the file itself (as returned by Select-String), and the last write time of the file itself as was returned by the initial Get-ChildItem.
Additional considerations
You could also use Select-Object and computed properties but IMO the above is a more concise approach when merging properties from unrelated objects together. While not a poor approach, Select-Object outputs data with a type containing the original object type name (e.g. Selected.Microsoft.PowerShell.Commands.MatchInfo). The code may work fine but can cause some confusion when others who may consume this object in the future inspect the output members. LastWriteTime, for example, belongs to FileSystemInfo, not MatchInfo. Another developer may not understand where the property came from at first if it has the MatchInfo type referenced. It is generally a better design to create a new object with the merged properties.
That said, this is a minor issue which largely comes down to stylistic preference and whether this object might be consumed by others aside from you. I write modules and scripts that many other teams in my organization consume, so this is a concern for me. It may not be for you. @mklement0's answer is an excellent example of how to use computed properties with Select-Object to achieve the same functional result as this answer.
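The type-name difference described above can be checked directly. This is a minimal sketch using a literal string as Select-String input, so no files are needed:

```powershell
# Produce a MatchInfo object from a plain string.
$match = 'one CYCLE here' | Select-String -Pattern 'CYCLE'

# Projection via Select-Object keeps a trace of the source type in its type names.
$viaSelect = $match | Select-Object Line
# A freshly constructed custom object carries no reference to MatchInfo.
$viaCustom = [pscustomobject]@{ Line = $match.Line }

$selectTypeName = $viaSelect.pstypenames[0]
$customTypeName = $viaCustom.pstypenames[0]
```

$selectTypeName is Selected.Microsoft.PowerShell.Commands.MatchInfo, while $customTypeName is System.Management.Automation.PSCustomObject. That is the distinction someone inspecting the output with Get-Member would see.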

Compare folder to a hash file

There are a lot of questions and answers about comparing the hashes of two folders for integrity, like this one. Assume I have a folder that I copied to a backup medium (external drive, flash, optical disc) and would like to delete the original to save space.
What is the best way to save the original folder's hashes (before deletion), perhaps in a text file, and check the backup's integrity against that file much later?
Note that if you delete the originals first and later find that the backup lacks integrity, so to speak, all you'll know is that something went wrong; the non-corrupted data will be gone.
You can build a table with two columns, RelativePath (the file path relative to the input directory) and Hash, and save it to a CSV file with Export-Csv:
$inputDir = 'C:\path\to\dir' # Note: specify a *full* path.
$prefixLen = $inputDir.Length + 1
Get-ChildItem -File -Recurse -LiteralPath $inputDir |
  Get-FileHash |
  Select-Object @{
                  Name='RelativePath'
                  Expression={ $_.Path.Substring($prefixLen) }
                },
                Hash |
  Export-Csv originalHashes.csv -NoTypeInformation -Encoding utf8
Note: In PowerShell (Core) 7+, neither -NoTypeInformation nor -Encoding utf8 are needed, though note that the file will have no UTF-8 BOM; use -Encoding utf8bom if you want one; conversely, in Windows PowerShell you invariably get a BOM.
Note:
The Microsoft.PowerShell.Commands.FileHashInfo instances output by Get-FileHash also have an .Algorithm property naming the hashing algorithm that was used ('SHA256' by default, or as specified via the -Algorithm parameter).
If you want this property included (whose value will be the same for all CSV rows), you simply add Algorithm to the array of properties passed to Select-Object above.
Note how a hashtable (@{ ... }) passed as the first property argument to Select-Object serves as a calculated property that derives the relative path from each .Path property value (which contains the full path).
You can later apply the same command to the backup directory tree, saving to, say, backupHashes.csv, and compare the two CSV files with Compare-Object:
Compare-Object (Import-Csv -LiteralPath originalHashes.csv) `
               (Import-Csv -LiteralPath backupHashes.csv) `
               -Property RelativePath, Hash
Note: There's no strict need to involve files in the operation - one or both output collections can be captured in memory and can be used directly in the comparison - just omit the Export-Csv call in the command above and save to a variable ($originalHashes = Get-ChildItem ...)
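As a sketch of the in-memory variant, the following wraps the hashing pipeline in a helper and compares two scratch directories directly. The function name Get-RelativeHash and the directory layout are hypothetical and only serve the illustration.

```powershell
# Hypothetical helper: hash every file under $Dir, keyed by path relative to $Dir.
function Get-RelativeHash {
    param([Parameter(Mandatory)][string]$Dir)
    $prefixLen = $Dir.Length + 1
    Get-ChildItem -File -Recurse -LiteralPath $Dir |
        Get-FileHash |
        Select-Object @{ Name = 'RelativePath'; Expression = { $_.Path.Substring($prefixLen) } }, Hash
}

# Scratch "original" and "backup" trees; b.txt is deliberately different in the backup.
$root = (New-Item -ItemType Directory -Path (Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString()))).FullName
$original = (New-Item -ItemType Directory -Path (Join-Path $root 'original')).FullName
$backup = (New-Item -ItemType Directory -Path (Join-Path $root 'backup')).FullName
[System.IO.File]::WriteAllText((Join-Path $original 'a.txt'), 'same')
[System.IO.File]::WriteAllText((Join-Path $backup 'a.txt'), 'same')
[System.IO.File]::WriteAllText((Join-Path $original 'b.txt'), 'intact')
[System.IO.File]::WriteAllText((Join-Path $backup 'b.txt'), 'corrupted')

$originalHashes = Get-RelativeHash -Dir $original
$backupHashes = Get-RelativeHash -Dir $backup
$diff = @(Compare-Object $originalHashes $backupHashes -Property RelativePath, Hash)
```

A file whose content differs shows up twice in $diff, once per side (SideIndicator <= and =>); identical files are omitted. An empty $diff means the two trees match.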

Compare multiple folders for file differences

I began to compare 2 folder structures to find files that did not match by date and size, but the requirement has been changed to 4 folders and I am stuck.
So here is what I am trying to do:
We upload several hundred folders\files to 4 different servers. The files must all match. Sometimes a file will not copy properly. So I need a script to read all four directories and compare all the files to make sure they match by size and date.
Output should only be a simple list that shows me the files that didn't match.
Any ideas?
Thanks.
I can do two folders but am lost on four. Also, this output is confusing. Not sure how to only list those that don't match.
$path1 = "\\path\folder"
$path2 = "\\path\folder1"
$dif = Compare-Object -ReferenceObject $path1 -DifferenceObject $path2 -Property FullName, Length, LastWriteTime
$dif | ft -AutoSize
I'd go about it with a hash-based approach, and possibly use a database table somewhere to help yourself out. BTW, PSCX has the Get-Hash cmdlet, which will help you do this.
Basic approach
Traverse each server's desired folder tree (you want to do this on the servers involved for performance reasons, not over a network share!) and generate a hash for each file you find. Store the hash, the full path, and the server name somewhere, preferably in a database table accessible from all four servers; it'll make processing much easier.
Then, if you've used a database table, write a few simple queries:
1. Find any hash where there are fewer than 4 instances of the hash.
2. Find any file path (you may have to process the path string to get it to the same relative root for each server) where there are differing hashes (although this might be covered by 1. above).
All of this can be done from within PS, of course.
Why this way of doing things may be helpful
You don't have to run a four-way Compare-Object. The hashes serve as your point of comparison.
Your PowerShell code to generate the hashes is one identical function that gets run on each server.
It scales. You could easily do this for 100 folders.
You end up with something easily manipulated and "distributed", i.e. accessible to the servers involved: the database table.
Downside
PSCX Get-Hash isn't very fast. This can easily be remedied by having PS fire some faster hash generating command, such as this one, md5sums.
How to do without using a database table
1. Write the hashes, file paths, and server names to files on each server as you are processing folders for hashes, and bring those files back when done.
2. Process the files into a hash table that keys on the hash codes and counts each hash code.
3. You can have a parallel hash table (built at the same time as 2. while you pass through the result files) that keys on each hash code to an array of paths/servers for that hash code.
4. Look for hash codes in hash table 1 with a count of less than 4. Use parallel hash table 2 to look up the hash codes found with a count of less than 4, to find out what the file path(s) and server(s) were.
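Steps 2 to 4 could be sketched roughly as follows. The records are hypothetical in-memory sample data standing in for the per-server result files, with placeholder hash strings (H1, H2, H9) instead of real hashes.

```powershell
# Hypothetical records, as if read back from the four servers' result files.
$records = @(
    foreach ($server in 'srv1', 'srv2', 'srv3', 'srv4') {
        [pscustomobject]@{ Server = $server; Path = 'a.txt'; Hash = 'H1' }
        # b.txt is "corrupted" on srv4, so it gets a different placeholder hash there.
        $bHash = if ($server -eq 'srv4') { 'H9' } else { 'H2' }
        [pscustomobject]@{ Server = $server; Path = 'b.txt'; Hash = $bHash }
    }
)

$countByHash = @{}   # hash table 1: hash code -> number of occurrences
$whereByHash = @{}   # hash table 2: hash code -> array of server:path entries
foreach ($r in $records) {
    if (-not $countByHash.ContainsKey($r.Hash)) {
        $countByHash[$r.Hash] = 0
        $whereByHash[$r.Hash] = @()
    }
    $countByHash[$r.Hash]++
    $whereByHash[$r.Hash] += "$($r.Server):$($r.Path)"
}

# Any hash seen fewer than 4 times points at a file that is not identical everywhere.
$expectedCopies = 4
$suspects = @(
    $countByHash.GetEnumerator() |
        Where-Object { $_.Value -lt $expectedCopies } |
        ForEach-Object { [pscustomobject]@{ Hash = $_.Key; Locations = $whereByHash[$_.Key] } }
)
```

With this sample data, $suspects lists H2 (three servers) and H9 (srv4 only), which together identify b.txt as the mismatched file. Keying on the hash alone assumes no two different paths share identical content; keying on path plus hash would avoid that ambiguity.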
Try this:
Remember that the PrimaryPath has to be a master location (one whose contents are correct). Also, be consistent in how you write the paths (whether or not you include the trailing \). For example, use either c:\folders\folder1\ for all paths or c:\folders\folder1.
Compare.ps1
Param(
    [parameter(Mandatory=$true)] [alias("p")] [string]$PrimaryPath,
    [parameter(Mandatory=$true)] [alias("c")] [string[]]$ComparePath
)
#Get filelist with relativepath property
function Get-FilesWithRelativePath ($Path) {
    Get-ChildItem $Path -Recurse | ? { !$_.PSIsContainer } | % {
        Add-Member -InputObject $_ -MemberType NoteProperty -Name RelativePath -Value $_.FullName.Substring($Path.Length)
        $_
    }
}
#If path exists and is folder
if (Test-Path $PrimaryPath -PathType Container) {
    #Get master fileslist
    $Masterfiles = Get-FilesWithRelativePath (Resolve-Path $PrimaryPath).Path
    #Compare folders
    foreach ($Folder in $ComparePath) {
        if (Test-Path $Folder -PathType Container) {
            #Getting filelist and adding relative-path property to files
            $ResolvedFolder = (Resolve-Path $Folder).Path
            $Files = Get-FilesWithRelativePath $ResolvedFolder
            #Compare and output filepath to missing or old file
            Compare-Object -ReferenceObject $Masterfiles -DifferenceObject $Files -Property RelativePath, Length, LastWriteTime | ? { $_.SideIndicator -eq "<=" } | Select @{n="FilePath";e={Join-Path $ResolvedFolder $_.RelativePath}}
        } else { Write-Error "$Folder is not a valid foldername. Foldertype: Compare" }
    }
} else { Write-Error "$PrimaryPath is not a valid foldername. Foldertype: Master" }

How to get Select-Object to return a raw type (e.g. String) rather than PSCustomObject?

The following code gives me an array of PSCustomObjects, how can I get it to return an array of Strings?
$files = Get-ChildItem $directory -Recurse | Select-Object FullName | Where-Object {!($_.psiscontainer)}
(As a secondary question, what's the psiscontainer part for? I copied that from an example online)
Post-Accept Edit: Two great answers, wish I could mark both of them. Have awarded the original answer.
You just need to pick out the property you want from the objects. FullName in this case.
$files = Get-ChildItem $directory -Recurse | Select-Object FullName | Where-Object {!($_.psiscontainer)} | foreach {$_.FullName}
Edit: Explanation for Mark, who asks, "What does the foreach do? What is that enumerating over?"
Sung Meister's explanation is very good, but I'll add a walkthrough here because it could be helpful.
The key concept is the pipeline. Picture a series of pingpong balls rolling down a narrow tube one after the other. These are the objects in the pipeline. Each stage of the pipeline (the code segments separated by pipe (|) characters) has a pipe going into it and a pipe going out of it. The output of one stage is connected to the input of the next stage. Each stage takes the objects as they arrive, does things to them, and sends them back out into the output pipeline or sends out new, replacement objects.
Get-ChildItem $directory -Recurse
Get-ChildItem walks through the filesystem creating FileSystemInfo objects that represent each file and directory it encounters, and puts them into the pipeline.
Select-Object FullName
Select-Object takes each FileSystemInfo object as it arrives, grabs the FullName property from it (which is a path in this case), puts that property into a brand new custom object it has created, and puts that custom object out into the pipeline.
Where-Object {!($_.psiscontainer)}
This is a filter. It takes each object, examines it, and sends it back out or discards it depending on some condition. Your code here has a bug, by the way. The custom objects that arrive here don't have a psiscontainer property. This stage doesn't actually do anything. Sung Meister's code is better.
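The no-op filter is easy to confirm on a scratch directory. In this sketch the directory layout is made up: one subfolder containing one file.

```powershell
# Hypothetical scratch tree: one subdirectory containing one file.
$tempRoot = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$dir = (New-Item -ItemType Directory -Path $tempRoot).FullName
$sub = (New-Item -ItemType Directory -Path (Join-Path $dir 'sub')).FullName
Set-Content -LiteralPath (Join-Path $sub 'f.txt') -Value 'x'

# Filter placed after Select-Object: psiscontainer is gone, so nothing is filtered out.
$buggy = @(Get-ChildItem $dir -Recurse | Select-Object FullName | Where-Object { !($_.psiscontainer) })
# Filter placed before Select-Object: directories are dropped as intended.
$fixed = @(Get-ChildItem $dir -Recurse | Where-Object { !($_.psiscontainer) } | Select-Object FullName)
```

$buggy ends up with two entries (the directory and the file), while $fixed contains only the file.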
foreach {$_.FullName}
Foreach, whose long name is ForEach-Object, grabs each object as it arrives, and here, grabs the FullName property, a string, from it. Now, here is the subtle part: Any value that isn't consumed, that is, isn't captured by a variable or suppressed in some way, is put into the output pipeline. As an experiment, try replacing that stage with this:
foreach {'hello'; $_.FullName; 1; 2; 3}
Actually try it out and examine the output. There are five values in that code block. None of them are consumed. Notice that they all appear in the output. Now try this:
foreach {'hello'; $_.FullName; $x = 1; 2; 3}
Notice that one of the values is being captured by a variable. It doesn't appear in the output pipeline.
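The same experiment can be run without touching the filesystem. This sketch feeds two plain strings through the stage instead of file objects:

```powershell
# Each input item emits 'hello', the item itself, 2 and 3; the assignment emits nothing.
$out = 'a', 'b' | ForEach-Object { 'hello'; $_; $x = 1; 2; 3 }

$out.Count   # 8: four values per input item
```

The value 1 never reaches $out because it was captured by $x.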
To get the string for the file name you can use
$files = Get-ChildItem $directory -Recurse | Where-Object {!($_.psiscontainer)} | Select-Object -ExpandProperty FullName
The -ExpandProperty parameter allows you to get back an object based on the type of the property specified.
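To compare the two forms side by side, here is a small sketch against a scratch directory (the directory and file are made up for the example):

```powershell
# Hypothetical scratch directory with a single file.
$tempRoot = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$dir = (New-Item -ItemType Directory -Path $tempRoot).FullName
Set-Content -LiteralPath (Join-Path $dir 'f.txt') -Value 'x'

# Plain projection wraps the value in a custom object with a FullName property.
$objects = @(Get-ChildItem $dir | Select-Object FullName)
# -ExpandProperty returns the property value itself.
$strings = @(Get-ChildItem $dir | Select-Object -ExpandProperty FullName)

$objects[0].GetType().Name   # PSCustomObject
$strings[0].GetType().Name   # String
```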
Further testing shows that this did not work with V1, but that functionality is fixed as of the V2 CTP3.
For Question #1
I have removed the Select-Object portion, which is redundant, and, unlike dangph's answer, moved the Where-Object filter before foreach: filter as early as possible so that the next stage of the pipeline only has to deal with a subset of the objects.
$files = Get-ChildItem $directory -Recurse | Where-Object {!$_.PsIsContainer} | foreach {$_.FullName}
That code snippet essentially reads
Get all files and directories recursively (Get-ChildItem $directory -Recurse)
Filter out directories (Where-Object {!$_.PsIsContainer})
Return full file name only (foreach {$_.FullName})
Save all file names into $files
Note that for foreach {$_.FullName}, in PowerShell a script block ({...}) outputs whatever its statements emit without being captured; in this case that is $_.FullName, of type string.
If you really need to get a raw object, you don't need to do anything after getting rid of "select-object". If you were to use Select-Object but want to access raw object, use "PsBase", which is a totally different question(topic) - Refer to "What's up with PSBASE, PSEXTENDED, PSADAPTED, and PSOBJECT?" for more information on that subject
For Question #2
Filtering by !$_.PsIsContainer means that you are excluding container-level objects. In your case, you are running Get-ChildItem on the FileSystem provider (you can list PowerShell providers with Get-PSProvider), so a container is a DirectoryInfo (folder).
PsIsContainer means different things under different PowerShell providers;
e.g., for the Registry provider, a container is of type Microsoft.Win32.RegistryKey.
Try this:
>pushd HKLM:\SOFTWARE
>ls | gm
[UPDATE] to following question: What does the foreach do? What is that enumerating over?
To clarify, "foreach" is an alias for "Foreach-Object"
You can find out through,
get-help foreach
-- or --
get-alias foreach
Now in my answer, "foreach" is enumerating each object instance of type FileInfo returned from previous pipe (which has filtered directories). FileInfo has a property called FullName and that is what "foreach" is enumerating over.
And you reference object passed through pipeline through a special pipeline variable called "$_" which is of type FileInfo within the script block context of "foreach".
For V1, add the following filter to your profile:
filter Get-PropertyValue([string]$name) { $_.$name }
Then you can do this:
gci . -r | ?{!$_.psiscontainer} | Get-PropertyValue fullname
BTW, if you are using the PowerShell Community Extensions you already have this.
Regarding the ability to use Select-Object -Expand in V2, it is a cute trick but not obvious and really isn't what Select-Object nor -Expand was meant for. -Expand is all about flattening like LINQ's SelectMany and Select-Object is about projection of multiple properties onto a custom object.