Removing items from one CSV based on items in another CSV file - PowerShell

I have a script that will generate a CSV file. The purpose of the script is to verify if a certain file is missing. For example, let's say I have the following files:
1.jpg
2.jpg
3.jpg
1.gif
3.gif
2.txt
3.txt
Once the script is run, it will generate a report so I can visually see what file is missing. The report looks like:
JPG Files   GIF Files   TXT Files
1.jpg       1.gif
2.jpg                   2.txt
3.jpg       3.gif       3.txt
So you can see, I'm missing 1.txt and 2.gif.
Here's where my problem comes in....
I now have a SECOND CSV file that has a list of files that MUST be kept in the FIRST CSV. Anything that is NOT in the SECOND CSV file must now be removed from my FIRST CSV. For example:
My FIRST CSV contains:
1.jpg
2.jpg
3.jpg
1.gif
3.gif
2.txt
3.txt
The SECOND CSV says that the following files need to remain:
1.jpg
3.jpg
1.gif
2.txt
Therefore, anything that does not appear in the SECOND CSV file needs to be removed from the FIRST CSV while retaining the same format, meaning that if 1.jpg is missing (it is still listed in the SECOND CSV but does not exist in the C:\JPG folder), it must show a blank space in the FIRST CSV.
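For example, with the data above, the filtered FIRST CSV would keep only 1.jpg, 3.jpg, 1.gif and 2.txt, and look like this:
JPG Files   GIF Files   TXT Files
1.jpg       1.gif
                        2.txt
3.jpg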
I hope this makes sense. Please ask me if you have any questions or need clarification.
Below is the portion of code from my script that generates the FIRST CSV:
# Get dirs
$dirJPG = "C:\JPG"
$dirGIF = "C:\GIF"
$dirTXT = "C:\TXT"
$files = @()
$files += Get-ChildItem -Path $dirJPG -Filter "*.jpg"
$files += Get-ChildItem -Path $dirGIF -Filter "*.gif"
$files += Get-ChildItem -Path $dirTXT -Filter "*.txt"
# Write a datetime stamped CSV file
$datetime = Get-Date -Format "MM_dd_yyyy_hhmm"
$files | Sort-Object -Property { $_.Name } | Group-Object -Property {
    [System.IO.Path]::GetFileNameWithoutExtension($_.Name) } | % {
    New-Object psobject -Property @{
        "JPG Files" = $_.Group | ? { $_.Extension -eq ".jpg" } | % { $_.Name }
        "GIF Files" = $_.Group | ? { $_.Extension -eq ".gif" } | % { $_.Name }
        "TXT Files" = $_.Group | ? { $_.Extension -eq ".txt" } | % { $_.Name }
    } } | Export-Csv -Path "$datetime.csv" -NoTypeInformation
Thanks in advance for your assistance! :D

It is possible to use arrays, but it will probably be more efficient to use hashtables. You can iterate (foreach) through the first CSV's items and check whether each file is in CSV1 but not in CSV2:
# Get the files by directory for each file type
function Get-FilesByType() {
    param ([hashtable]$filters)
    $result = @{}
    foreach ($filter in $filters.Keys) {
        $path = $filters[$filter]
        Get-ChildItem -Path $path -Filter $filter | % {
            $result.Add($_.Name, $_)
        }
    }
    return $result
}
# Assume CSV1 hashtable already exists and is loaded
# Get the hashtable of files for CSV2
$csv2 = Get-FilesByType @{ "*.jpg" = "C:\JPG"; "*.gif" = "C:\GIF"; "*.txt" = "C:\TXT" }
# Remove items from CSV1 that do not exist in CSV2
# NOTE: You cannot remove items from the hashtable while
# iterating through the collection, so use a copy of the
# keys to iterate.
$keys = @()
$keys += $csv1.Keys
$keys | % {
    if ( ! $csv2.ContainsKey($_) ) {
        Write-Host "Removing $_"
        $csv1.Remove($_)
    }
}
# Write a datetime stamped CSV file
$datetime = Get-Date -Format "MM_dd_yyyy_hhmm"
$csv1.Values | Sort-Object -Property { $_.Name } | Group-Object -Property {
    [System.IO.Path]::GetFileNameWithoutExtension($_.Name)
} | % {
    New-Object psobject -Property @{
        "JPG Files" = $_.Group | ? { $_.Extension -eq ".jpg" } | % { $_.Name }
        "GIF Files" = $_.Group | ? { $_.Extension -eq ".gif" } | % { $_.Name }
        "TXT Files" = $_.Group | ? { $_.Extension -eq ".txt" } | % { $_.Name }
    }
} | Export-Csv -Path "$datetime.csv" -NoTypeInformation
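The $csv1 hashtable assumed above might be loaded from the keep-list CSV along these lines (a minimal sketch; the keeplist.csv file name and the Name column header are assumptions, not from the question):
# Sketch (assumed file/column names): key a hashtable by filename so the
# ContainsKey/Remove filtering above works; only the keys matter for that step.
# If you also reuse $csv1.Values in the export pipeline, store FileInfo objects
# (e.g. from Get-FilesByType) as the values instead of CSV rows.
$csv1 = @{}
Import-Csv -Path "keeplist.csv" | ForEach-Object {
    $csv1.Add($_.Name, $_)
}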

Don't use an array - use the hashtable like Ryan said. An array is not a good choice when you want to remove elements from it.
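As a quick illustration of the difference (a minimal sketch with made-up file names):
# "Removing" from an array means building a new array every time (O(n) per removal).
$array = @("1.jpg", "2.jpg", "3.jpg")
$array = $array | Where-Object { $_ -ne "2.jpg" }

# A hashtable supports in-place removal and O(1) key lookup.
$table = @{ "1.jpg" = $true; "2.jpg" = $true; "3.jpg" = $true }
$table.Remove("2.jpg")
$table.ContainsKey("1.jpg")   # True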

Found what my problem was... I was matching the files that needed to be kept and then removing those. I simply needed to add a not condition:
$keys = @()
$keys += $currentFiles.Keys
$keys | % {
    if (! $filesToKeep.ContainsKey($_)) {
        Write-Host "Removing $_"
        $currentFiles.Remove($_)
    }
}

Related

Powershell script to compare two directories (including sub directories and contents) that are supposed to be identical but on different servers

I would like to run a powershell script that can be supplied a directory name by the user and then it will check the directory, sub directories, and all file contents of those directories to compare if they are identical to each other. There are 8 servers that should all have identical files and contents. The below code does not appear to be doing what I intended. I have seen the use of Compare-Object, Get-ChildItem, and Get-FileHash but have not found the right combo that I am certain is actually accomplishing the task. Any and all help is appreciated!
$35 = "\\server1\"
$36 = "\\server2\"
$37 = "\\server3\"
$38 = "\\server4\"
$45 = "\\server5\"
$46 = "\\server6\"
$47 = "\\server7\"
$48 = "\\server8\"
do{
    Write-Host "|1 : New   |"
    Write-Host "|2 : Repeat|"
    Write-Host "|3 : Exit  |"
    $choice = Read-Host -Prompt "Please make a selection"
    switch ($choice){
        1{
            $App = Read-Host -Prompt "Input Directory Application"
        }
        2{
            #rerun
        }
        3{
            exit;
        }
    }
    $c35 = $35 + "$App" + "\*"
    $c36 = $36 + "$App" + "\*"
    $c37 = $37 + "$App" + "\*"
    $c38 = $38 + "$App" + "\*"
    $c45 = $45 + "$App" + "\*"
    $c46 = $46 + "$App" + "\*"
    $c47 = $47 + "$App" + "\*"
    $c48 = $48 + "$App" + "\*"
    Write-Host "Comparing Server1 -> Server2"
    if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c36 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
    Write-Host "Comparing Server1 -> Server3"
    if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c37 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
    Write-Host "Comparing Server1 -> Server4"
    if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c38 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
    Write-Host "Comparing Server1 -> Server5"
    if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c45 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
    Write-Host "Comparing Server1 -> Server6"
    if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c46 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
    Write-Host "Comparing Server1 -> Server7"
    if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c47 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
    Write-Host "Comparing Server1 -> Server8"
    if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c48 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
} until ($choice -eq 3)
Here is an example function that tries to compare one reference directory against multiple difference directories efficiently. It does so by comparing the most easily available information first and stopping at the first difference.
Get all relevant information about the files in the reference directory once, including hashes (though this could be optimized further by computing hashes only when necessary).
For each difference directory, compare in this order:
file count - if different, then obviously directories are different
relative file paths - if not all paths from difference directory can be found in reference directory, then directories are different
file sizes - should be obvious
file hashes - hashes only need to be calculated if files have equal size
Function Compare-MultipleDirectories {
    param(
        [Parameter(Mandatory)] [string] $ReferencePath,
        [Parameter(Mandatory)] [string[]] $DifferencePath
    )

    # Get basic file information recursively by calling Get-ChildItem with the addition of the relative file path
    Function Get-ChildItemRelative {
        param( [Parameter(Mandatory)] [string] $Path )
        Push-Location $Path # Base path for Get-ChildItem and Resolve-Path
        try {
            Get-ChildItem -File -Recurse |
                Select-Object FullName, Length, @{ n = 'RelativePath'; e = { Resolve-Path $_.FullName -Relative } }
        } finally {
            Pop-Location
        }
    }

    Write-Verbose "Reading reference directory '$ReferencePath'"

    # Create hashtable with all infos of reference directory
    $refFiles = @{}
    Get-ChildItemRelative $ReferencePath |
        Select-Object *, @{ n = 'Hash'; e = { (Get-FileHash $_.FullName -Algorithm MD5).Hash } } |
        ForEach-Object { $refFiles[ $_.RelativePath ] = $_ }

    # Compare content of each directory of $DifferencePath with $ReferencePath
    foreach( $diffPath in $DifferencePath ) {
        Write-Verbose "Comparing directory '$diffPath' with '$ReferencePath'"

        $areDirectoriesEqual = $false
        $differenceType = $null
        $diffFiles = Get-ChildItemRelative $diffPath

        # Directories must have same number of files
        if( $diffFiles.Count -eq $refFiles.Count ) {
            # Find first different path (if any)
            $firstDifferentPath = $diffFiles | Where-Object { -not $refFiles.ContainsKey( $_.RelativePath ) } |
                Select-Object -First 1
            if( -not $firstDifferentPath ) {
                # Find first different content (if any) by file size comparison
                $firstDifferentFileSize = $diffFiles |
                    Where-Object { $refFiles[ $_.RelativePath ].Length -ne $_.Length } |
                    Select-Object -First 1
                if( -not $firstDifferentFileSize ) {
                    # Find first different content (if any) by hash comparison
                    $firstDifferentContent = $diffFiles |
                        Where-Object { $refFiles[ $_.RelativePath ].Hash -ne (Get-FileHash $_.FullName -Algorithm MD5).Hash } |
                        Select-Object -First 1
                    if( -not $firstDifferentContent ) {
                        $areDirectoriesEqual = $true
                    }
                    else {
                        $differenceType = 'Content'
                    }
                }
                else {
                    $differenceType = 'FileSize'
                }
            }
            else {
                $differenceType = 'Path'
            }
        }
        else {
            $differenceType = 'FileCount'
        }

        # Output comparison result
        [PSCustomObject]@{
            ReferencePath  = $ReferencePath
            DifferencePath = $diffPath
            Equal          = $areDirectoriesEqual
            DiffCause      = $differenceType
        }
    }
}
Usage example:
# compare each of directories B, C, D, E, F against A
Compare-MultipleDirectories -ReferencePath 'A' -DifferencePath 'B', 'C', 'D', 'E', 'F' -Verbose
Output example:
ReferencePath DifferencePath Equal DiffCause
------------- -------------- ----- ---------
A B True
A C False FileCount
A D False Path
A E False FileSize
A F False Content
The DiffCause column tells you why the function considers the directories different.
Note:
Select-Object -First 1 is a neat trick to stop searching after we have found the first result. It is efficient because it doesn't process all input and then drop everything except the first item; instead it actually cancels the pipeline after the 1st item has been found (see the small demo after these notes).
A hashtable keyed by RelativePath ($refFiles) stores the reference directory's file information so it can be looked up quickly by relative path.
Empty sub directories are ignored, because the function only looks at files. E.g. if the reference path contains some empty directories but the difference path does not, and the files in all other directories are equal, the function treats the directories as equal.
I've chosen the MD5 algorithm because it is faster than the default SHA-256 algorithm used by Get-FileHash, but it is insecure. Someone could deliberately craft a different file that has the same MD5 hash as the original. In a trusted environment this won't matter though. Remove -Algorithm MD5 if you need a more secure comparison.
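To see the pipeline cancellation from the first note in action, here is a tiny self-contained demo (the Write-Host tracing is only for illustration):
# Only one element is actually processed; Select-Object -First 1 stops the
# upstream pipeline as soon as it has received its single item.
1..1000000 | ForEach-Object { Write-Host "processing $_"; $_ } | Select-Object -First 1
# Prints "processing 1" once and returns 1 - it does not iterate a million times.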
A simple place to start:
compare (dir -r dir1) (dir -r dir2) -Property name,length,lastwritetime
You can also add -PassThru to see the original objects, or -IncludeEqual to see the equal elements. The order of each array doesn't matter without -SyncWindow. I'm assuming all the LastWriteTime values are in sync, to the millisecond. Don't assume you can skip specifying the properties to compare. See also Comparing folders and content with PowerShell.
I was looking into calculated properties, e.g. for the relative path, but it looks like you can't name them, even in PowerShell 7. I'm chopping off the first four path elements, 0..3.
compare (dir -r foo1) (dir -r foo2) -Property length,lastwritetime,@{e={($_.fullname -split '\\')[4..$_.fullname.length] -join '\'}}
length lastwritetime ($_.fullname -split '\\')[4..$_.fullname.length] -join '\' SideIndicator
------ ------------- ---------------------------------------------------------- -------------
16 11/12/2022 11:30:20 AM foo2\file2 =>
18 11/12/2022 11:30:20 AM foo1\file2 <=

How to delete duplicate files with similar name

I'm fairly new to PowerShell and I've not been able to find a definitive answer for my problem. I have a bunch of Excel files in different folders which are duplicates but have varying file names due to them being updated.
e.g.
015 Approved warranty - Turkey - Case-2019 08-1437015 (issue 3),
015 Approved warranty - Turkey - Case-2019 08-1437015 (final issue)
015 Approved warranty - Turkey - Case-2019 08-1437015
015 Approved warranty - Turkey - Case-2019 08-1437015 amended
I've tried different things, and now I know the easiest approach is to filter the files, but I don't know the syntax. The anchor point will be the case number just after the date. I want to compare the case numbers against each other, keep only the newest file (by date modified) for each, and delete the rest. Any guidance is appreciated.
#take files from folder
$dupesource = 'C:\Users\W_Brooker\Documents\Destination\2019\08'
#filter files by case number (7 digit number after date)
$files = Get-ChildItem $dupesource -Filter "08-aaaaaaa"
#If case number is the same keep newest file delete rest
foreach ($file in $files){
    $file | Delete-Item - sort -property Datemodified |select -Last 1
}
A PowerShell-idiomatic solution is to combine multiple cmdlets in a single pipeline, in which Group-Object provides the core functionality of grouping duplicate files by the shared case number in the file name:
# Define the regex that matches a case number:
# A 7-digit number embedded in filenames that duplicates share.
$regex = '\b\d{7}\b'

# Enumerate all files and select only those whose name contains a case number.
Get-ChildItem -File $dupesource | Where-Object { $_.BaseName -match $regex } |
    # Group the resulting files by shared embedded case number.
    Group-Object -Property { [regex]::Match($_.BaseName, $regex).Value } |
    # Process each group:
    ForEach-Object {
        # In each group, sort files by most recently updated first.
        $_.Group | Sort-Object -Descending LastWriteTimeUtc |
            # Skip the most recent file and delete the older ones.
            Select-Object -Skip 1 | Remove-Item -WhatIf
    }
The -WhatIf common parameter previews the operation. Remove it once you're sure it will do what you want.
This should do the trick:
$files = Get-ChildItem 'C:\Users\W_Brooker\Documents\Destination\2019\08' -Recurse
# create datatable to store file Information in it
$dt = New-Object system.Data.DataTable
[void]$dt.Columns.Add('FileName',[string]::Empty.GetType() )
[void]$dt.Columns.Add('CaseNumber',[string]::Empty.GetType() )
[void]$dt.Columns.Add('FileTimeStamp',[DateTime]::MinValue.GetType() )
[void]$dt.Columns.Add('DeleteFlag',[byte]::MinValue.GetType() )
# Step 1: Make inventory
foreach( $file in $files ) {
if( !$file.PSIsContainer -and $file.Extension -like '.xls*' -and $file.Name -match '^.*\-\d+ *[\(\.].*$' ) {
$row = $dt.NewRow()
$row.FileName = $file.FullName
$row.CaseNumber = $file.Name -replace '^.*\-(\d+) *[\(\.].*$', '$1'
$row.FileTimeStamp = $file.LastWriteTime
$row.DeleteFlag = 0
[void]$dt.Rows.Add( $row )
}
}
# Step 2: Mark files to delete
$rows = $dt.Select('', 'CaseNumber, FileTimeStamp DESC')
$caseNumber = ''
foreach( $row in $rows ) {
if( $row.CaseNumber -ne $caseNumber ) {
$caseNumber = $row.CaseNumber
Continue
}
$row.DeleteFlag = 1
[void]$dt.AcceptChanges()
}
# Step 3: Delete files
$rows = $dt.Select('DeleteFlag = 1', 'FileTimeStamp DESC')
foreach( $row in $rows ) {
$fileName = $row.FileName
Remove-Item -Path $fileName -Force | Out-Null
}
Here's an alternative that leverages the PowerShell Group-Object cmdlet.
It uses a regex to match files on the case number, ignoring those that don't have one. See the screen shot at the bottom that shows the test data (a collection of test xlsx files).
cls
#Assume that each file has an xlsx extension.
#Assume that a case number always looks like this: "Case-YYYY~XX-Z" where YYYY is 4 digits, ~ is a single space, XX is two digits, and Z is one-to-many digits

#make a list of xlsx files (recursive)
$files = Get-ChildItem -LiteralPath .\ExcelFiles -Recurse -Include *.xlsx

#$file is a System.IO.FileInfo object. Parse out the Case number and add it to the $file object as CaseNumber property
foreach ($file in $files)
{
    $Matches = $null
    $file.Name -match "(^.*)(Case-\d{4}\s{1}\d{2}-\d{1,})(.*\.xlsx$)" | out-null
    if ($Matches.Count -eq 4)
    {
        $caseNumber = $Matches[2]
        $file | Add-Member -NotePropertyName CaseNumber -NotePropertyValue $caseNumber
    }
    Else
    {
        #child folders will end up in this group too
        $file | Add-Member -NotePropertyName CaseNumber -NotePropertyValue "NoCaseNumber"
    }
}

#group the files by CaseNumber
$files | Group-Object -Property CaseNumber -OutVariable fileGroups | out-null

foreach ($fileGroup in $fileGroups)
{
    #skip folders and files that don't have a valid case #
    if ($fileGroup.Name -eq "NoCaseNumber")
    {
        continue
    }
    #for each group: sort files descending by LastWriteTime. Newest file will be first, so skip 1st file and remove the rest
    $fileGroup.Group | sort -Descending -Property LastWriteTime | select -skip 1 | foreach {Remove-Item -LiteralPath $_.FullName -Force}
}
Test Data (screenshot of the sample .xlsx files not reproduced here)

Using Array and get-childitem to find filenames with specific ids

In the most basic sense, I have a SQL query which returns an array of IDs, which I've stored in a variable $ID. I then want to perform a Get-ChildItem on a specific folder for any filenames that contain any of the IDs in said variable ($ID). There are three possible filenames that could exist:
$ID.xml
$ID_input.xml
$ID_output.xml
Once I have the results of get-childitem, I want to output this as a text file and delete the files from the folder. The part I'm having trouble with is filtering the results of get-childitem to define the filenames I'm looking for, so that only files that contain the IDs from the SQL output are displayed in my get-childitem results.
I found another way of doing this, which works fine, by using for-each ($i in $id), then building the desired filenames from that and performing a remove item on them:
# Build list of XML files
$XMLFile = foreach ($I in $ID)
{
    "$XMLPath\$I.xml", "$XMLPath\$I`_output.xml", "$XMLPath\$I`_input.xml"
}
# Delete XML files
$XMLFile | Remove-Item -Force
However, this produces a lot of errors in the shell, as it tries to delete files that don't exist, but whose IDs do exist in the database. I also can't figure out how to produce a text output of the files that were actually deleted, doing it this way, so I'd like to get back to the get-childitem approach, if possible.
Any ideas would be greatly appreciated. If you require more info, just ask.
You can find all *.xml files with Get-ChildItem to minimize the number of files to test, and then use a regex to match the filenames. It's faster than a loop with multiple tests, but harder to read if you're not familiar with regex.
$id = 123,111
#Create regex-pattern (search-pattern)
$regex = "^($(($id | ForEach-Object { [regex]::Escape($_) }) -join '|'))(?:_input|_output)?$"
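# For example, for $id = 123,111 the pattern becomes: ^(123|111)(?:_input|_output)?$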
$filesToDelete = Get-ChildItem -Path "c:\users\frode\Desktop\test" -Filter "*.xml" | Where-Object { $_.BaseName -match $regex }
#Save list of files
$filesToDelete | Select-Object -ExpandProperty FullName | Out-File "deletedfiles.txt" -Append
#Remove files (remove -WhatIf when ready)
$filesToDelete | Remove-Item -Force -WhatIf
Regex demo: https://regex101.com/r/dS2dJ5/2
Try this:
clear
$ID = "a", "b", "c"
$filesToDelete = New-Object System.Collections.ArrayList
$files = Get-ChildItem e:\
foreach ($I in $ID)
{
    ($files | Where-object { $_.Name -eq "$I.xml" }).FullName | ForEach-Object { [void]$filesToDelete.Add($_) }
    ($files | Where-object { $_.Name -eq "${I}_input.xml" }).FullName | ForEach-Object { [void]$filesToDelete.Add($_) }
    ($files | Where-object { $_.Name -eq "${I}_output.xml" }).FullName | ForEach-Object { [void]$filesToDelete.Add($_) }
}
$filesToDelete | select-object -Unique | ForEach-Object { Remove-Item $_ -Force }

Need to add the full path of where the text was referenced from

So far I have a hash table with 2 values in it. Right now the code below exports all the unique lines and gives me a count of how many times each line was referenced across 100's of xml files. This is one part.
I now need to find out which subfolder had the xml file in it that contains each unique line referenced in the hash table. Is this possible?
$ht = @{}
Get-ChildItem -recurse -Filter *.xml | Get-Content | %{ $ht[$_] = $ht[$_] + 1 }
$ht
# To export to CSV:
$ht.GetEnumerator() | select key, value | Export-Csv D:\output.csv
To get the file path into your output, you need to capture it in a variable in the first pipeline stage.
Is this something similar to what you need?
$ht = @{}
Get-ChildItem -recurse -Filter *.xml | %{ $path = $_.FullName; Get-Content $path } | % { $ht[$_] = $ht[$_] + $path + ";" }
The code above returns a hashtable that maps each config line to a semicolon-separated list of the files it appears in.
EDIT:
If you need to return three elements (unique line, count, and an array of the paths where it was found), it gets more complicated. Here is code that returns an array of PSObjects, each containing the info for one unique line from the XML files.
$ht = @()
$files = Get-ChildItem -recurse -Filter *.xml
foreach ($file in $files) {
    $path = $file.FullName
    $lines = Get-Content $path
    foreach ($line in $lines) {
        if ($match = $ht | where { $_.line -EQ $line }) {
            $match.count = $match.count + 1
            $match.Paths += $path
        } else {
            $ht += new-object PSObject -Property @{
                Count = 1
                Paths = @(, $path)
                Line  = $line }
        }
    }
}
$ht
I'm sure it can be shortened and optimized, but hopefully it is enough to get you started.
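For example, a hashtable keyed by the line text avoids the linear $ht | where scan for every line read, which gets slow when there are many unique lines. A sketch of one possible rewrite (same output shape, not from the original answer):
# Index by line text for O(1) lookups instead of scanning an array per line.
$index = @{}
Get-ChildItem -recurse -Filter *.xml | ForEach-Object {
    $path = $_.FullName
    foreach ($line in (Get-Content $path)) {
        if (-not $index.ContainsKey($line)) {
            $index[$line] = [PSCustomObject]@{ Count = 0; Paths = @(); Line = $line }
        }
        $index[$line].Count += 1
        $index[$line].Paths += $path
    }
}
$index.Values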

Comparing csv files with -like in Powershell

I have two csv files, each that contain a PATH column. For example:
CSV1.csv
PATH,Data,NF
\\server1\folderA,1,1
\\server1\folderB,1,1
\\server2\folderA,1,1
\\server2\folderB,1,1
CSV2.csv
PATH,User,Access,Size
\\server1\folderA\file1,don,1
\\server1\folderA\file2,don,1
\\server1\folderA\file3,sue,1
\\server2\folderB\file1,don,1
What I'm attempting to do is create a script that will result in separate csv exports based on the paths in CSV1 such that the new files contain file values from CSV2 that match. For example, from the above, I'd end up with 2 results:
result1.csv
\\server1\folderA\file1,don,1
\\server1\folderA\file2,don,1
\\server1\folderA\file3,sue,1
result2.csv
\\server2\folderB\file1,don,1
Previously I've used a script like this when the two values are exact:
$reportfile = import-csv $apireportoutputfile -delimiter ';' -encoding unicode
$masterlist = import-csv $pathlistfile
foreach ($record in $masterlist)
{
    $path = $record.Path
    $filename = $path -replace '\\', '_'
    $filename = '.\Working\sharefiles\' + $filename + '.csv'
    $reportfile | where-object { $_.path -eq $path } | select FilePath, UserName, LastAccessDate, LogicalSize | export-csv -path $filename
    write-host " Creating files list for $path" -foregroundcolor red -backgroundcolor white
}
However, since the two path values are not the same, it returns nothing. I found a -like operator but am not sure how to use it in this code to get the results I want: Where-Object is a filter, while -like ends up returning true/false. Am I on the right track? Any ideas for a solution?
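For reference, -like does work inside a Where-Object script block, since the block only needs to evaluate to true or false for each record. A minimal sketch using the sample data above (the trailing \* matches any file beneath the folder):
# Keep only CSV2 rows whose PATH falls under a given CSV1 folder path.
$folder = '\\server1\folderA'
Import-Csv csv2.csv | Where-Object { $_.PATH -like "$folder\*" }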
Something like this, maybe?
$ht = @{}
Import-Csv csv1.csv |
    foreach { $ht[$_.path] = New-Object collections.arraylist }
Import-Csv csv2.csv |
    foreach {
        $path = $_.path | Split-Path -Parent
        $ht[$path].Add($_) > $null
    }
$i = 1
$ht.Values |
    foreach { if ($_.count)
        {
            $_ | Export-Csv "result$i.csv" -NoTypeInformation
            $i++
        }
    }
My suggestion (ipcsv, epcsv and diff are built-in aliases for Import-Csv, Export-Csv and Compare-Object):
$1 = ipcsv .\csv1.CSV
$2 = ipcsv .\csv2.CSV
$equal = diff ($2 | select @{n='PATH';e={Split-Path $_.PATH}}) $1 -Property PATH -IncludeEqual -ExcludeDifferent -PassThru
0..(-1 + $equal.Count) | %{ %{$i = $_}{
    $2 | ?{ (Split-Path $_.PATH) -eq $equal[$i].PATH } | epcsv ".\Result$i.CSV"
}}