Using Group-Object : How to Keep x number of each directory and delete all others [duplicate] - powershell

I have a network share with 20.000 XML files in the format
username-computername.xml
There are duplicate entries in the form of (when a user received a new comptuer)
user1-computer1.xml
user1-computer2.xml
or
BLRPPR-SKB52084.xml
BLRSIA-SKB50871.xml
S028DS-SKB51334.xml
s028ds-SKB52424.xml
S02FL6-SKB51644.xml
S02FL6-SKB52197.xml
S02VUD-SKB52083.xml
Since im going to manipulate the XMLs later I can't just dismiss properties of the array as at the very least I need the full path. The aim is, if a duplicate is found, the one with the newer timestamp is being used.
Here is a snipet of the code where I need that logic
$xmlfiles = Get-ChildItem "network share"
Here I'm just doing a foreach loop:
foreach ($xmlfile in $xmlfiles) {
[xml]$xmlcontent = Get-Content -Path $xmlfile.FullName -Encoding UTF8
Select-Xml -Xml $xmlcontent -Xpath " "
# create [pscustomobject] etc...
}
Essentially what I need is
if ($xmlfiles.Name.Split("-")[0]) - duplicate) {
# select the one with higher $xmlfiles.LastWriteTime and store either
# the full object or the $xmlfiles.FullName
}
Ideally that should be part of the foreach loop to not to have to loop through twice.

You can use Group-Object to group files by a custom attribute:
$xmlfiles | Group-Object { $_.Name.Split('-')[0] }
The above statement will produce a result like this:
Count Name Group
----- ---- -----
1 BLRPPR {BLRPPR-SKB52084.xml}
1 BLRSIA {BLRSIA-SKB50871.xml}
2 S028DS {S028DS-SKB51334.xml, s028ds-SKB52424.xml}
2 S02FL6 {S02FL6-SKB51644.xml, S02FL6-SKB52197.xml}
1 S02VUD {S02VUD-SKB52083.xml}
where the Group property contains the original FileInfo objects.
Expand the groups in a ForEach-Object loop, sort each group by LastWriteTime, and select the most recent file from it:
... | ForEach-Object {
$_.Group | Sort-Object LastWriteTime -Desc | Select-Object -First 1
}

Related

Is there a way to display the latest file of multiple paths with information in a table format?

I check every day, whether a CSV-File has been exported to a specific folder (path). At the moment there are 14 different paths with 14 different files to check. The files are being stored in the folder and are not deleted. So i have to differ between a lot of files with "lastwritetime". I would like a code to display the results in table format. I would be happy with something like this:
Name LastWriteTime Length
ExportCSV1 21.09.2022 00:50 185
ExportCSV2 21.09.2022 00:51 155
My code looks like this:
$Paths = #('Path1', 'Path2', 'Path3', 'Path4', 'Path5', 'Path6', 'Path7', 'Path8', 'Path9', 'Path10', 'Path11', 'Path12', 'Path13', 'Path13')
foreach ($Path in $Paths){
Get-ChildItem $path | Where-Object {$_.LastWriteTime}|
select -last 1
Write-host $Path
}
pause
This way i want to make sure, that the files are being sent each day.
I get the results that i want, but it is not easy to look at the results individually.
I am new to powershell and would very much appreciate your help. Thank you in advance.
Continuing from my comments, here is how you could do this:
$Paths = #('Path1', 'Path2', 'Path3', 'Path4', 'Path5', 'Path6', 'Path7', 'Path8', 'Path9', 'Path10', 'Path11', 'Path12', 'Path13', 'Path13')
$Paths | ForEach-Object {
Get-ChildItem $_ | Where-Object {$_.LastWriteTime} | Select-Object -Last 1
} | Format-Table -Property Name, LastWriteTime, Length
If you want to keep using foreach() instead, you have to wrap it in a scriptblock {…} to be able to chain everything to Format-Table:
. {
foreach ($Path in $Paths){
Get-ChildItem $path | Where-Object {$_.LastWriteTime} | Select-Object -Last 1
}
} | Format-Table -Property Name, LastWriteTime, Length
Here the . operator is used to run the scriptblock immediately, without creating a new scope. If you want to create a new scope (e. g. to define temporary variables that exist only within the scriptblock), you could use the call operator & instead.

How to merge 2 x CSVs with the same column but overwrite not append?

I've got this one that has been baffling me all day, and I can't seem to find any search results that match exactly what I am trying to do.
I have 2 CSV files, both of which have the same columns and headers. They look like this (shortened for the purpose of this post):
"plate","labid","well"
"1013740016604537004556","none46","F006"
"1013740016604537004556","none47","G006"
"1013740016604537004556","none48","H006"
"1013740016604537004556","3835265","A007"
"1013740016604537004556","3835269","B007"
"1013740016604537004556","3835271","C007"
Each of the 2 CSVs only have some actual Lab IDs, and the 'nonexx' are just fillers for the importing software. There is no duplication ie each 'well' is only referenced once across the 2 files.
What I need to do is merge the 2 CSVs, for example the second CSV might have a Lab ID for well H006 but the first will not. I need the lab ID from the second CSV imported into the first, overwriting the 'nonexx' currently in that column.
Here is my current code:
$CSVB = Import-CSV "$RootDir\SymphonyOutputPending\$plateID`A_Header.csv"
Import-CSV "$RootDir\SymphonyOutputPending\$plateID`_Header.csv" | ForEach-Object {
$CSVData = [PSCustomObject]#{
labid = $_.labid
well = $_.well
}
If ($CSVB.well -match $CSVData.wellID) {
write-host "I MATCH"
($CSVB | Where-Object {$_.well -eq $CSVData.well}).labid = $CSVData.labid
}
$CSVB | Export-CSV "$RootDir\SymphonyOutputPending\$plateID`_final.csv" -NoTypeInformation
}
The code runs but doesn't 'merge' the data, the final CSV output is just a replication of the first input file. I am definitely getting a match as the string "I MATCH" appears several times when debugging as expected.
Based on the responses in the comments of your question, I believe this is what you are looking for. This assumes that the both CSVs contain the exact same data with labid being the only difference.
There is no need to modify csv2 if we are just grabbing the labid to overwrite the row in csv1.
$csv1 = Import-Csv C:\temp\LabCSV1.csv
$csv2 = Import-Csv C:\temp\LabCSV2.csv
# Loop through csv1 rows
Foreach($line in $csv1) {
# If Labid contains "none"
If($line.labid -like "none*") {
# Set rows labid to the labid from csv2 row that matches plate/well
# May be able to remove the plate section if well is a unique value
$line.labid = ($csv2 | Where {$_.well -eq $line.well -and $_.plate -eq $line.plate}).labid
}
}
# Export to CSV - not overwrite - to confirm results
$csv1 | export-csv C:\Temp\LabCSV1Adjusted.csv -NoTypeInformation
Since you need to do a bi-directional comparison of the 2 Csvs you could create a new array of both and then group the objects by their well property, for this you can use Group-Object, then filter each group if their Count is equal to 2 where their labid property does not start with none else return the object as-is.
Using the following Csvs for demonstration purposes:
Csv1
"plate","labid","well"
"1013740016604537004556","none46","F006"
"1013740016604537004556","none47","G006"
"1013740016604537004556","3835265","A007"
"newrowuniquecsv1","none123","X001"
Csv2
"plate","labid","well"
"1013740016604537004556","none48","A007"
"1013740016604537004556","3835269","F006"
"1013740016604537004556","3835271","G006"
"newrowuniquecsv2","none123","X002"
Code
Note that this code assumes there will be a maximum of 2 objects with the same well property and, if there are 2 objects with the same well, one of them must have a value not starting with none.
$mergedCsv = #(
Import-Csv pathtocsv1.csv
Import-Csv pathtocsv2.csv
)
$mergedCsv | Group-Object well | ForEach-Object {
if($_.Count -eq 2) {
return $_.Group.Where{ -not $_.labid.StartsWith('none') }
}
$_.Group
} | Export-Csv pathtomerged.csv -NoTypeInformation
Output
plate labid well
----- ----- ----
1013740016604537004556 3835265 A007
1013740016604537004556 3835269 F006
1013740016604537004556 3835271 G006
newrowuniquecsv1 none123 X001
newrowuniquecsv2 none123 X002
If the lists are large, performance might be an issue as Where-Object (or any other where method) and Group-Object do not perform very well for embedded loops.
By indexing the second csv file (aka creating a hashtable), you have quicker access to the required objects. Indexing upon two (or more) items (plate and well) is issued here: Does there exist a designated (sub)index delimiter? and resolved by #mklement0 and zett42 with a nice CaseInsensitiveArrayEqualityComparer class.
To apply this class on Drew's helpful answer:
$csv1 = Import-Csv C:\temp\LabCSV1.csv
$csv2 = Import-Csv C:\temp\LabCSV2.csv
$dict = [hashtable]::new([CaseInsensitiveArrayEqualityComparer]::new())
$csv2.ForEach{ $dict.($_.plate, $_.well) = $_ }
Foreach($line in $csv1) {
If($line.labid -like "none*") {
$line.labid = $dict.($line.plate, $line.well).labid
}
}
$csv1 | export-csv C:\Temp\LabCSV1Adjusted.csv -NoTypeInformation

PowerShell: Find unique values from multiple CSV files

let's say that I have several CSV files and I need to check a specific column and find values that exist in one file, but not in any of the others. I'm having a bit of trouble coming up with the best way to go about it as I wanted to use Compare-Object and possibly keep all columns and not just the one that contains the values I'm checking.
So I do indeed have several CSV files and they all have a Service Code column, and I'm trying to create a list for each Service Code that only appears in one file. So I would have "Service Codes only in CSV1", "Service Codes only in CSV2", etc.
Based on some testing and a semi-related question, I've come up with a workable solution, but with all of the nesting and For loops, I'm wondering if there is a more elegant method out there.
Here's what I do have:
$files = Get-ChildItem -LiteralPath "C:\temp\ItemCompare" -Include "*.csv"
$HashList = [System.Collections.Generic.List[System.Collections.Generic.HashSet[String]]]::New()
For ($i = 0; $i -lt $files.Count; $i++){
$TempHashSet = [System.Collections.Generic.HashSet[String]]::New([String[]](Import-Csv $files[$i])."Service Code")
$HashList.Add($TempHashSet)
}
$FinalHashList = [System.Collections.Generic.List[System.Collections.Generic.HashSet[String]]]::New()
For ($i = 0; $i -lt $HashList.Count; $i++){
$UniqueHS = [System.Collections.Generic.HashSet[String]]::New($HashList[$i])
For ($j = 0; $j -lt $HashList.Count; $j++){
#Skip the check when the HashSet would be compared to itself
If ($j -eq $i){Continue}
$UniqueHS.ExceptWith($HashList[$j])
}
$FinalHashList.Add($UniqueHS)
}
It seems a bit messy to me using so many different .NET references, and I know I could make it cleaner with a tag to say using namespace System.Collections.Generic, but I'm wondering if there is a way to make it work using Compare-Object which was my first attempt, or even just a simpler/more efficient method to filter each file.
I believe I found an "elegant" solution based on Group-Object, using only a single pipeline:
# Import all CSV files.
Get-ChildItem $PSScriptRoot\csv\*.csv -File -PipelineVariable file | Import-Csv |
# Add new column "FileName" to distinguish the files.
Select-Object *, #{ label = 'FileName'; expression = { $file.Name } } |
# Group by ServiceCode to get a list of files per distinct value.
Group-Object ServiceCode |
# Filter by ServiceCode values that exist only in a single file.
# Sort-Object -Unique takes care of possible duplicates within a single file.
Where-Object { ( $_.Group.FileName | Sort-Object -Unique ).Count -eq 1 } |
# Expand the groups so we get the original object structure back.
ForEach-Object Group |
# Format-Table requires sorting by FileName, for -GroupBy.
Sort-Object FileName |
# Finally pretty-print the result.
Format-Table -Property ServiceCode, Foo -GroupBy FileName
Test Input
a.csv:
ServiceCode,Foo
1,fop
2,fip
3,fap
b.csv:
ServiceCode,Foo
6,bar
6,baz
3,bam
2,bir
4,biz
c.csv:
ServiceCode,Foo
2,bla
5,blu
1,bli
Output
FileName: b.csv
ServiceCode Foo
----------- ---
4 biz
6 bar
6 baz
FileName: c.csv
ServiceCode Foo
----------- ---
5 blu
Looks correct to me. The values 1, 2 and 3 are duplicated between multiple files, so they are excluded. 4, 5 and 6 exist only in single files, while 6 is a duplicate value only within a single file.
Understanding the code
Maybe it is easier to understand how this code works, by looking at the intermediate output of the pipeline produced by the Group-Object line:
Count Name Group
----- ---- -----
2 1 {#{ServiceCode=1; Foo=fop; FileName=a.csv}, #{ServiceCode=1; Foo=bli; FileName=c.csv}}
3 2 {#{ServiceCode=2; Foo=fip; FileName=a.csv}, #{ServiceCode=2; Foo=bir; FileName=b.csv}, #{ServiceCode=2; Foo=bla; FileName=c.csv}}
2 3 {#{ServiceCode=3; Foo=fap; FileName=a.csv}, #{ServiceCode=3; Foo=bam; FileName=b.csv}}
1 4 {#{ServiceCode=4; Foo=biz; FileName=b.csv}}
1 5 {#{ServiceCode=5; Foo=blu; FileName=c.csv}}
2 6 {#{ServiceCode=6; Foo=bar; FileName=b.csv}, #{ServiceCode=6; Foo=baz; FileName=b.csv}}
Here the Name contains the unique ServiceCode values, while Group "links" the data to the files.
From here it should already be clear how to find values that exist only in single files. If duplicate ServiceCode values within a single file wouldn't be allowed, we could even simplify the filter to Where-Object Count -eq 1. Since it was stated that dupes within single files may exist, we need the Sort-Object -Unique to count multiple equal file names within a group as only one.
It is not completely clear what you expect as an output.
If this is just the ServiceCodes that intersect then this is actually a duplicate with:
Comparing two arrays & get the values which are not common
Union and Intersection in PowerShell?
But taking that you actually want the related object and files, you might use this approach:
$HashTable = #{}
ForEach ($File in Get-ChildItem .\*.csv) {
ForEach ($Object in (Import-Csv $File)) {
$HashTable[$Object.ServiceCode] = $Object |Select-Object *,
#{ n='File'; e={ $File.Name } },
#{ n='Count'; e={ $HashTable[$Object.ServiceCode].Count + 1 } }
}
}
$HashTable.Values |Where-Object Count -eq 1
Here is my take on this fun exercise, I'm using a similar approach as yours with the HashSet but adding [System.StringComparer]::OrdinalIgnoreCase to leverage the .Contains(..) method:
using namespace System.Collections.Generic
# Generate Random CSVs:
$charset = 'abABcdCD0123xXyYzZ'
$ran = [random]::new()
$csvs = #{}
foreach($i in 1..50) # Create 50 CSVs for testing
{
$csvs["csv$i"] = foreach($z in 1..50) # With 50 Rows
{
$index = (0..2).ForEach({ $ran.Next($charset.Length) })
[pscustomobject]#{
ServiceCode = [string]::new($charset[$index])
Data = $ran.Next()
}
}
}
# Get Unique 'ServiceCode' per CSV:
$result = #{}
foreach($key in $csvs.Keys)
{
# Get all unique `ServiceCode` from the other CSVs
$tempHash = [HashSet[string]]::new(
[string[]]($csvs[$csvs.Keys -ne $key].ServiceCode),
[System.StringComparer]::OrdinalIgnoreCase
)
# Filter the unique `ServiceCode`
$result[$key] = foreach($line in $csvs[$key])
{
if(-not $tempHash.Contains($line.ServiceCode))
{
$line
}
}
}
# Test if the code worked,
# If something is returned from here means it didn't work
foreach($key in $result.Keys)
{
$tmp = $result[$result.Keys -ne $key].ServiceCode
foreach($val in $result[$key])
{
if($val.ServiceCode -in $tmp)
{
$val
}
}
}
i was able to get unique items as follow
# Get all items of CSVs in a single variable with adding the file name at the last column
$CSVs = Get-ChildItem "C:\temp\ItemCompare\*.csv" | ForEach-Object {
$CSV = Import-CSV -Path $_.FullName
$FileName = $_.Name
$CSV | Select-Object *,#{N='Filename';E={$FileName}}
}
Foreach($line in $CSVs){
$ServiceCode = $line.ServiceCode
$file = $line.Filename
if (!($CSVs | where {$_.ServiceCode -eq $ServiceCode -and $_.filename -ne $file})){
$line
}
}

How to get one row of info from .Csv file with Powershell?

The powershell script:
$test = import-csv “C:\CSVFiles\test.csv”
ForEach ($item in $test)
{
$Name = $item.(“Name”)
$property = $item.("property")
$location = $item.(“location”)
Write-Output "$Name=$Name"
Write-Output "Property=$property"
Write-Output "Location=$location"
}
This script shows all the data for name,property and location for each row. I want the results to only show the data of one row;for example the row: n1;13;computer
The Cvs file =
Name;property;location
n1;13;computer
n2;65;computer
n3;12;tablet
n4;234;phone
n5;123;phone
n6;125;phone
What the current script spits out:
Name=n1
Property=13
Location=computer
Name=n2
Property=65
Location=computer
Name= n3
Property=12
Location=tablet
Name=n4
Property=234
Location=phone
Name=n5
Property=123
Location=phone
Name=n6
Property=125
Location=phone
There are many ways to select a Row of a csv and to present the data
For demonstration I use an inline csv with a here string.
$Test = #"
Name;property;location
n1;13;computer
n2;65;computer
n3;12;tablet
n4;234;phone
n5;123;phone
n6;125;phone
"# | ConvertFrom-Csv -delimiter ';'
> $test[0]
Name property location
---- -------- --------
n1 13 computer
> $test | where-object Name -eq 'n1'
Name property location
---- -------- --------
n1 13 computer
> $test | where-object Name -eq 'n1' | Select-Object Name,property
Name property
---- --------
n1 13
> $test | where-object Name -eq 'n1' | ForEach-Object {"Name:{0} has property: {1}" -f $_.Name,$_.property}
Name:n1 has property: 13
Once imported the csv rows contents are converted to objects
If you want to get the original row of the csv matching a criteria don't import but:
> Get-Content "C:\CSVFiles\test.csv" | Select-String '^n1'
n1;13;computer
^n1 is a regular expression anchoring the pattern at line begin.
Select-String -Path "C:\CSVFiles\test.csv" -Pattern '^n1'
Is the same without a pipe
So your current output is having 3 objects that are having 3 headers majorly.
One is Name; Second one is Property and the third one is Location.
As part of solution, you can either pull the records by specifying the index value or you can use the .Headername to pull all the sets of same object. Like:
Avoiding Foreach and accessing with $test[0] or $test[1]
Or you can use like: $test.Name directly to have all of the names from the csv

Filtering files by partial name match

I have a network share with 20.000 XML files in the format
username-computername.xml
There are duplicate entries in the form of (when a user received a new comptuer)
user1-computer1.xml
user1-computer2.xml
or
BLRPPR-SKB52084.xml
BLRSIA-SKB50871.xml
S028DS-SKB51334.xml
s028ds-SKB52424.xml
S02FL6-SKB51644.xml
S02FL6-SKB52197.xml
S02VUD-SKB52083.xml
Since im going to manipulate the XMLs later I can't just dismiss properties of the array as at the very least I need the full path. The aim is, if a duplicate is found, the one with the newer timestamp is being used.
Here is a snipet of the code where I need that logic
$xmlfiles = Get-ChildItem "network share"
Here I'm just doing a foreach loop:
foreach ($xmlfile in $xmlfiles) {
[xml]$xmlcontent = Get-Content -Path $xmlfile.FullName -Encoding UTF8
Select-Xml -Xml $xmlcontent -Xpath " "
# create [pscustomobject] etc...
}
Essentially what I need is
if ($xmlfiles.Name.Split("-")[0]) - duplicate) {
# select the one with higher $xmlfiles.LastWriteTime and store either
# the full object or the $xmlfiles.FullName
}
Ideally that should be part of the foreach loop to not to have to loop through twice.
You can use Group-Object to group files by a custom attribute:
$xmlfiles | Group-Object { $_.Name.Split('-')[0] }
The above statement will produce a result like this:
Count Name Group
----- ---- -----
1 BLRPPR {BLRPPR-SKB52084.xml}
1 BLRSIA {BLRSIA-SKB50871.xml}
2 S028DS {S028DS-SKB51334.xml, s028ds-SKB52424.xml}
2 S02FL6 {S02FL6-SKB51644.xml, S02FL6-SKB52197.xml}
1 S02VUD {S02VUD-SKB52083.xml}
where the Group property contains the original FileInfo objects.
Expand the groups in a ForEach-Object loop, sort each group by LastWriteTime, and select the most recent file from it:
... | ForEach-Object {
$_.Group | Sort-Object LastWriteTime -Desc | Select-Object -First 1
}