How can I correct my reconciliation of .csv files to remove dupes/nulls - PowerShell

I have been using code from this answer to check for additions/changes to class rosters from MS Teams:
$set = [System.Collections.Generic.HashSet[string]]::new(
    [string[]] (Import-CSV -Path stundent.csv).UserPrincipalName,
    [System.StringComparer]::InvariantCultureIgnoreCase
)
Import-Csv ad.csv | Where-Object { $set.Add($_.UserPrincipalName) } |
    Export-Csv path\to\output.csv -NoTypeInformation
Ideally, I want to be able to check if there have been removals when compared to a new file, swap the import file positions, and check for additions. If my files look like Source1 and Source2 (below), the check for removals would return Export1, and the check for additions would return Export2.
Since there will be multiple instances of students across multiple classes, I want to include TeamDesc in the filter query to make sure only the specific instance of that student with that class is returned.
Source1.csv
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 1   student1#domain.com john smith
Team 1   student2#domain.com nancy drew
Team 2   student3#domain.com harvey dent
Team 3   student1#domain.com john smith
Source2.csv
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 1   student2#domain.com nancy drew
Team 2   student3#domain.com harvey dent
Team 2   student4#domain.com tim tams
Team 3   student1#domain.com john smith
Export1.csv
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 1   student1#domain.com john smith
Export2.csv
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 2   student4#domain.com tim tams

Try the following, which uses Compare-Object to compare the CSV files by two column values, simply by passing the property (column) names of interest to -Property; the resulting output is split into two collections based on which input side a differing property combination is unique to, using the intrinsic .Where() method:
$removed, $added = (
    Compare-Object (Import-Csv Source1.csv) (Import-Csv Source2.csv) -PassThru `
        -Property TeamDesc, UserPrincipalName
).Where({ $_.SideIndicator -eq '<=' }, 'Split') # '<=' = only in Source1.csv, i.e. removed
# -Property * ensures -ExcludeProperty also takes effect in Windows PowerShell
$removed |
    Select-Object -Property * -ExcludeProperty SideIndicator |
    Export-Csv -NoTypeInformation Export1.csv
$added |
    Select-Object -Property * -ExcludeProperty SideIndicator |
    Export-Csv -NoTypeInformation Export2.csv
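To see how the 'Split' mode distributes the rows, here is a minimal sketch that feeds the question's sample data through ConvertFrom-Csv instead of reading real files. The variable names $source1, $source2 and $diff are assumptions made for illustration only.

```powershell
# Inline stand-ins for Source1.csv and Source2.csv (sample data from the question)
$source1 = @'
TeamDesc,UserPrincipalName,Name
Team 1,student1#domain.com,john smith
Team 1,student2#domain.com,nancy drew
Team 2,student3#domain.com,harvey dent
Team 3,student1#domain.com,john smith
'@ | ConvertFrom-Csv

$source2 = @'
TeamDesc,UserPrincipalName,Name
Team 1,student2#domain.com,nancy drew
Team 2,student3#domain.com,harvey dent
Team 2,student4#domain.com,tim tams
Team 3,student1#domain.com,john smith
'@ | ConvertFrom-Csv

# '<=' marks rows found only in the first (reference) input,
# '=>' marks rows found only in the second (difference) input.
$diff = @(Compare-Object $source1 $source2 -PassThru -Property TeamDesc, UserPrincipalName)

# 'Split' returns two collections: the rows that pass the filter, then all the rest.
$removed, $added = $diff.Where({ $_.SideIndicator -eq '<=' }, 'Split')

$removed | Select-Object -Property * -ExcludeProperty SideIndicator
$added   | Select-Object -Property * -ExcludeProperty SideIndicator
```

With this data, $removed should hold the Team 1 row for student1 and $added the Team 2 row for student4.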

Assuming both CSVs are stored in memory, with Source1.csv as $csv1 and Source2.csv as $csv2, you already have the logic for Export2.csv using the HashSet<T>:
$set = [System.Collections.Generic.HashSet[string]]::new(
[string[]] $csv1.UserPrincipalName,
[System.StringComparer]::InvariantCultureIgnoreCase
)
$csv2 | Where-Object { $set.Add($_.UserPrincipalName) }
Outputs:
TeamDesc UserPrincipalName Name
-------- ----------------- ----
Team 2 student4#domain.com tim tams
For the first requirement, Export1.csv, the reference object would be $csv2, and instead of a HashSet<T> you could use a hash table; Group-Object -AsHashTable makes this really easy:
$map = $csv2 | Group-Object UserPrincipalName -AsHashTable -AsString
# if Csv2 has unique values for `UserPrincipalName`
$csv1 | Where-Object { $map[$_.UserPrincipalName].TeamDesc -ne $_.TeamDesc }
# if Csv2 has duplicated values for `UserPrincipalName`
$csv1 | Where-Object { $_.TeamDesc -notin $map[$_.UserPrincipalName].TeamDesc }
Outputs:
TeamDesc UserPrincipalName Name
-------- ----------------- ----
Team 1 student1#domain.com john smith
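The HashSet<T> lookup above is keyed on UserPrincipalName alone. A minimal sketch of how the same idea could cover both TeamDesc and UserPrincipalName, in either direction, is shown below. The helper names Get-RosterKey and Get-RosterDifference and the "|" separator are hypothetical and not part of the original answer.

```powershell
$csv1 = @'
TeamDesc,UserPrincipalName,Name
Team 1,student1#domain.com,john smith
Team 1,student2#domain.com,nancy drew
Team 2,student3#domain.com,harvey dent
Team 3,student1#domain.com,john smith
'@ | ConvertFrom-Csv

$csv2 = @'
TeamDesc,UserPrincipalName,Name
Team 1,student2#domain.com,nancy drew
Team 2,student3#domain.com,harvey dent
Team 2,student4#domain.com,tim tams
Team 3,student1#domain.com,john smith
'@ | ConvertFrom-Csv

# Hypothetical helper: builds a composite key from the two columns of interest.
# The '|' separator is an assumption; use a character that cannot occur in the data.
function Get-RosterKey {
    param($Row)
    '{0}|{1}' -f $Row.TeamDesc, $Row.UserPrincipalName
}

# Hypothetical helper: returns the rows of $Difference whose key is absent from $Reference.
function Get-RosterDifference {
    param([object[]] $Reference, [object[]] $Difference)

    $keys = [System.Collections.Generic.HashSet[string]]::new(
        [System.StringComparer]::OrdinalIgnoreCase
    )
    foreach ($row in $Reference) { $null = $keys.Add((Get-RosterKey $row)) }

    $Difference | Where-Object { -not $keys.Contains((Get-RosterKey $_)) }
}

$removed = Get-RosterDifference -Reference $csv2 -Difference $csv1   # in Source1 only
$added   = Get-RosterDifference -Reference $csv1 -Difference $csv2   # in Source2 only
```

Swapping the two arguments is all that is needed to switch between the removal check and the addition check.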

Using this Join-Object script/Join-Object Module (see also: How to compare two CSV files and output the rows that are just in either of the file but not in both and In Powershell, what's the best way to join two tables into one?):
Loading your sample data:
(In your case you probably want to use Import-Csv to import your data)
Install-Script -Name Read-HtmlTable
$Csv1 = Read-HtmlTable https://stackoverflow.com/q/74452725 -Table 0 # Import-Csv .\Source1.csv
$Csv2 = Read-HtmlTable https://stackoverflow.com/q/74452725 -Table 1 # Import-Csv .\Source2.csv
Install-Module -Name JoinModule
$Csv1 |OuterJoin $Csv2 -On TeamDesc, UserPrincipalName -Name Out,In
TeamDesc UserPrincipalName OutName InName
-------- ----------------- ------- ------
Team 1 student1#domain.com john smith
Team 2 student4#domain.com tim tams
You might use the (single) result file as is. If you really want to work with two different files, you might split the results as in the nice answer from mklement0.

Related

How do I find a user by DisplayName in one CSV file given a DistinguishedName is another CSV file?

I have two CSV files with tab-separated values.
CSV1:
GivenName Surname DisplayName Manager
John Smith John Smith Oliver Twist
CSV2:
SAMAccountName mail DistinguishedName
John Smith johns#example.com CN=John Smith,CN=Users,DC=example,DC=com
Oliver Twist olivert#example.com CN=Oliver Twist,CN=Users,DC=example,DC=com
What I've wanted to do is to compare the CSV1 with CSV2 and match the manager name in Manager column in CSV1 file with CSV2 DistinguishedName column value (i.e. "CN=Oliver Twist") as in this example and export to a new CSV file like this:
SAMAccountName mail Manager
John Smith johns#example.com Oliver Twist
There might be hundreds of distinguished names in CSV2 file as mentioned and all I want is output from comparing and matching the DistinguishedName column value "CN=whatever name " with the Manager column in CSV1.
Here is the script I tried:
$csv1 = Import-Csv -Path 'C:\temp\AD.csv'
$csv2 = Import-Csv -Path 'C:\temp\AD_ExportDN.csv'
$csv3 = foreach ($record in $csv2) {
    $matchingRecord = $csv1 | Where-Object { $_.manager -eq $record.manager }
    # if your goal is to ONLY export records with a matching email address,
    # uncomment the if ($matchingRecord) condition
    #if ($matchingRecord) {
    $record | Select-Object *, @{Name = 'manager'; Expression = {$matchingRecord.DistinguishedName}}
    #}
}
$csv3 | Export-Csv -NoTypeInformation -Force -Path 'C:\temp\merged_for_AD2.csv'
Your question inspired me to add a new feature to the Join-Object script/Join-Object Module I am maintaining (see also: In Powershell, what's the best way to join two tables into one?).
The feature is explained in issue: #29 key expressions.
In your case the required command will be something like:
$Csv1 |Join $Csv2 -On Manager -Equals { [RegEx]::Match($_.DistinguishedName, '(?<=CN=).*(?=,CN=Users)') }
Which returns:
GivenName : John
Surname : Smith
DisplayName : John Smith
Manager : Oliver Twist
SAMAccountName : Oliver Twist
mail : olivert#example.com
DistinguishedName : CN=Oliver Twist,CN=Users,DC=example,DC=com
For the specific regular expression '(?<=CN=).*(?=,CN=Users)':
.* matches any sequence of characters that is
preceded by CN= ((?<=CN=), a regex lookbehind)
and followed by ,CN=Users ((?=,CN=Users), a regex lookahead)
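The pattern can be tried on its own, outside of the join. The following sketch applies it to the sample DistinguishedName from the question and to a second, made-up value that sits in an OU rather than in CN=Users (that second value is an assumption added for illustration).

```powershell
$pattern = '(?<=CN=).*(?=,CN=Users)'

# Sample value from the question
$dn = 'CN=Oliver Twist,CN=Users,DC=example,DC=com'
$match = [regex]::Match($dn, $pattern)
$match.Success   # True
$match.Value     # Oliver Twist

# Hypothetical value outside CN=Users: the lookahead cannot be satisfied
$otherDn = 'CN=Jane Roe,OU=Staff,DC=example,DC=com'
$noMatch = [regex]::Match($otherDn, $pattern)
$noMatch.Success # False
```

Note that [regex]::Match() returns a Match object, so the extracted name is in its Value property.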

Compare multiple elements in an object against multiple elements in another object of a different array

Say [hypothetically], I have two .CSVs I'm comparing to try and see which of my current members are original members... I wrote a nested ForEach-Object comparing every $name and $memberNumber from each object against every other object. It works fine, but is taking way too long, especially since each CSV has tens of thousands of objects. Is there another way I should approach this?
Original_Members.csv
Name, Member_Number
Alice, 1234
Jim , 4567
Current_Members.csv
Alice, 4599
Jim, 4567
Import-Csv "$home\Desktop\current_members.csv" |
    ForEach-Object {
        $name = $_.Name
        $memNum = $_."Member Number"
        # Assume "No" until a matching original member is found
        $ogMember = "No"
        Import-Csv "$home\Desktop\original_members.csv" |
            ForEach-Object {
                If ($_.Name -eq $name -and $_."Member Number" -eq $memNum) {
                    $ogMember = "Yes"
                }
            }
        [pscustomobject]@{
            "Name"=$name
            "Member Number"=$memNum
            "Original Member?"=$ogMember
        }
    } |
    Select-Object "Name","Member Number","Original Member?" |
    Export-Csv "$home\Desktop\OG_Compare_$(get-date -uformat "%d%b%Y").csv" -Append -NoTypeInformation
Assuming both of your files are like the below:
Original_Members.csv
Name, Member_Number
Alice, 1234
Jim, 4567
Current_Members.csv
Name, Member_Number
Alice, 4599
Jim, 4567
You could store the original member names in a System.Collections.Generic.HashSet<T> for constant time lookups, instead of doing a linear search for each name. We can use System.Linq.Enumerable.ToHashSet to create a hashset of string[] names.
We can then use Where-Object to filter current names by checking if the hashset contains the original name with System.Collections.Generic.HashSet<T>.Contains(T), which is an O(1) method.
$originalMembers = Import-Csv -Path .\Original_Members.csv
$currentMembers = Import-Csv -Path .\Current_Members.csv
$originalMembersLookup = [Linq.Enumerable]::ToHashSet(
[string[]]$originalMembers.Name,
[StringComparer]::CurrentCultureIgnoreCase
)
$currentMembers |
Where-Object {$originalMembersLookup.Contains($_.Name)}
Which will output the current members that were original members:
Name Member_Number
---- -------------
Alice 4599
Jim 4567
Update
As requested in the comments, if we want to check both Name and Member_Number, we can concatenate both strings to use for lookups:
$originalMembers = Import-Csv -Path .\Original_Members.csv
$currentMembers = Import-Csv -Path .\Current_Members.csv
$originalMembersLookup = [Linq.Enumerable]::ToHashSet(
[string[]]($originalMembers |
ForEach-Object {
$_.Name + $_.Member_Number
}),
[StringComparer]::CurrentCultureIgnoreCase
)
$currentMembers |
Where-Object {$originalMembersLookup.Contains($_.Name + $_.Member_Number)}
Which will now only return:
Name Member_Number
---- -------------
Jim 4567
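One caveat with plain concatenation is that different pairs can produce the same key, for example "Al" + "1234" and "Al1" + "234". A minimal sketch of a variant that puts a separator between the two values is shown below. The "|" separator and the extra Al/Al1 rows are assumptions added to illustrate the collision, and the set is filled with a plain loop rather than ToHashSet.

```powershell
$originalMembers = @'
Name,Member_Number
Alice,1234
Jim,4567
Al,1234
'@ | ConvertFrom-Csv

$currentMembers = @'
Name,Member_Number
Alice,4599
Jim,4567
Al1,234
'@ | ConvertFrom-Csv

$lookup = [System.Collections.Generic.HashSet[string]]::new(
    [System.StringComparer]::CurrentCultureIgnoreCase
)
foreach ($member in $originalMembers) {
    # 'Al|1234' and 'Al1|234' stay distinct, unlike 'Al1234' and 'Al1234'
    $null = $lookup.Add($member.Name + '|' + $member.Member_Number)
}

$matched = $currentMembers |
    Where-Object { $lookup.Contains($_.Name + '|' + $_.Member_Number) }
$matched
```

With this data only the Jim row should be returned; without the separator the Al1 row would be reported as an original member as well.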

Select-Object -ExcludeProperty based on the property's value

I have an object that has a large amount of properties. I want to return several of these properties, whose names may not always be consistent. I want to EXCLUDE properties that have or contain a particular value.
$notneeded = @('array of properties that I do not wish to select')
$csvPath = "$Log\$Summary"
$csvData = Get-Content -Path $csvPath | Select-Object -Skip 1 | Out-String | ConvertFrom-Csv # the first line is extra (not a header) and needs to be skipped
$csvData | Select-Object -Property * -ExcludeProperty $notneeded
If the list of properties to exclude was static, then I could use this. But I want to exclude properties from view that contain a particular value.
INPUT CSV
John,Doe,120 jefferson st.,Riverside, NJ, 08075
Jack,McGinnis,220 hobo Av.,Phila, PA,09119
"John ""Da Man""",Repici,120 Jefferson St.,Riverside, NJ,08075
Stephen,Tyler,"7452 Terrace ""At the Plaza"" road",SomeTown,SD, 91234
,Blankman,,SomeTown, SD, 00298
"Joan ""the bone"", Anne",Jet,"9th, at Terrace plc",Desert City,CO,00123
SCRIPT
$csvData = Invoke-WebRequest -Uri "https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv" | ConvertFrom-Csv -Header "Name", "Surname", "Address", "City", "State", "Zip"
$particularValue = "*120*"
$notneeded = @()
$csvData | Foreach-Object { $notneeded += $_.PSObject.Properties | Where-Object Value -like $particularValue | Select-Object Name }
$notneeded = $notneeded | Select-Object -Unique -ExpandProperty Name
$csvData | Select-Object * -ExcludeProperty $notneeded | Format-Table
Note that in this example I want to exclude any column in which 120 appears. I also named the columns myself in the script.
OUTPUT (See the address column is missing)
Name Surname City State Zip
---- ------- ---- ----- ---
John Doe Riverside NJ 08075
Jack McGinnis Phila PA 09119
John "Da Man" Repici Riverside NJ 08075
Stephen Tyler SomeTown SD 91234
Blankman SomeTown SD 00298
Joan "the bone", Anne Jet Desert City CO 00123
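The same idea can be sketched without the array += accumulation by letting the pipeline collect the matching property names. The sample below uses a shortened inline version of the data, and the variable names $rows, $pattern, $exclude and $filtered are assumptions for illustration.

```powershell
$rows = @'
Name,Surname,Address,City
John,Doe,120 jefferson st.,Riverside
Jack,McGinnis,220 hobo Av.,Phila
'@ | ConvertFrom-Csv

$pattern = '*120*'

# Collect the name of every property whose value matches the pattern in any row
$exclude = $rows |
    ForEach-Object { $_.PSObject.Properties } |
    Where-Object { $_.Value -like $pattern } |
    Select-Object -ExpandProperty Name -Unique

# Drop those properties from every row
$filtered = $rows | Select-Object -Property * -ExcludeProperty $exclude
$filtered | Format-Table
```

Here only the Address column contains 120, so the remaining columns should be Name, Surname and City.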

Outputting sorted sections of sorted data to a variable?

I have the following PowerShell code:
[xml]$xml = Get-Content uccxResourceList_forReference.xml
$xml.resources.resource |
    Select-Object firstname, lastname, extension,
        @{Name="Team"; Expression={($_.team.name)}} |
    Sort Team |
    Format-Table
which produces a table like this:
firstName lastName extension Team
--------- -------- --------- ----------------
Homer Simpson 1000 SafetyInspectors
Frank Grimes 1001 SafetyInspectors
Lenford Leonard 1002 SafetyInspectors
Carlton Carlson 1003 SafetyInspectors
Montgomery Burns 2000 Executives
Waylon Smithers 2001 Executives
What I would like to do is output each team into its own file. So not just a simple | Out-File teamlist.txt at the end, but I would like to output a text file containing all of the "SafetyInspectors" and another with all of the "Executives".
I know I could get this done with a subsequent foreach loop but I feel it could also be done in the pipeline and I just don't know how to do it.
I'd prefer to output to a CSV file (which can easily be imported again), so:
[xml]$xml = Get-Content uccxResourceList_forReference.xml
$xml.resources.resource |
    Select-Object firstname, lastname, extension,
        @{Name="Team"; Expression={($_.team.name)}} |
    Group-Object Team | ForEach-Object {
        $_.Group | Export-Csv ("{0}.csv" -f $_.Name) -NoTypeInformation
    }
Should return something like this:
> gc .\Executives.csv
"firstName","lastName","extension","Team"
"Montgomery","Burns","2000","Executives"
"Waylon","Smithers","2001","Executives"
> gc .\SafetyInspectors.csv
"firstName","lastName","extension","Team"
"Homer","Simpson","1000","SafetyInspectors"
"Frank","Grimes","1001","SafetyInspectors"
"Lenford","Leonard","1002","SafetyInspectors"
"Carlton","Carlson","1003","SafetyInspectors"

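The Group-Object approach from the answer above can be tried without the XML source. The following sketch uses a few inline rows and writes the per-team files to a folder under the temp directory. The folder name team-export-demo and the variable names are assumptions for illustration.

```powershell
$people = @'
firstName,lastName,extension,Team
Homer,Simpson,1000,SafetyInspectors
Montgomery,Burns,2000,Executives
Waylon,Smithers,2001,Executives
'@ | ConvertFrom-Csv

# Hypothetical output folder under the temp directory
$outDir = Join-Path ([System.IO.Path]::GetTempPath()) 'team-export-demo'
$null = New-Item -ItemType Directory -Path $outDir -Force

# One group per distinct Team value; each group is written to its own file
$people | Group-Object Team | ForEach-Object {
    $path = Join-Path $outDir ('{0}.csv' -f $_.Name)
    $_.Group | Export-Csv -Path $path -NoTypeInformation
}

$files = Get-ChildItem $outDir -Filter *.csv |
    Sort-Object Name |
    Select-Object -ExpandProperty Name
$files
```

Because the team name becomes the file name, team names containing characters that are not valid in file names would need to be sanitized first.
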
Compare-Object and include properties not being compared in output

I'm comparing two CSV files that come from different sources (different column/property names) using the Compare-Object cmdlet. How can I include properties that are in either CSV file in the output without including them in the comparison?
Example CSV data
users1.csv
e-mail-address,name,side
luke#sw.com,Luke,light
users2.csv
e-mail-address,hiredate,hobbies
lando#sw.com,5/2/17,Sabacc
The following gives me a column with the e-mail address and side indicator, but how can I get $Users1.name and $Users2.hiredate without using them in the comparison?
$Users1 = Import-Csv users1.csv
$Users2 = Import-Csv users2.csv
Compare-Object $Users1 $Users2 -Property "E-mail-Address"
I'd like output similar to:
e-mail-address | SideIndicator | name | hiredate
---------------|---------------|------|----------
luke#sw.com | <= | Luke |
lando#sw.com | => | | 5/2/17
Add the PassThru parameter to have Compare-Object return all the properties, then use Select-Object to grab the name and hiredate properties:
Compare-Object $users1 $users2 -Property e-mail-address -PassThru|Select-Object e-mail-address,SideIndicator,name,hiredate
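A minimal sketch of the same command on the question's sample rows, using inline data via ConvertFrom-Csv instead of the CSV files (the variable name $result is an assumption for illustration):

```powershell
$users1 = @'
e-mail-address,name,side
luke#sw.com,Luke,light
'@ | ConvertFrom-Csv

$users2 = @'
e-mail-address,hiredate,hobbies
lando#sw.com,5/2/17,Sabacc
'@ | ConvertFrom-Csv

# -PassThru returns the original objects (with SideIndicator attached), so
# properties that were not compared, such as name and hiredate, stay available.
$result = Compare-Object $users1 $users2 -Property 'e-mail-address' -PassThru |
    Select-Object 'e-mail-address', SideIndicator, name, hiredate
$result | Format-Table
```

Properties that exist on only one side (name or hiredate) come out empty for rows from the other side.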