PowerShell display duplicates in CSV - powershell

I have a CSV with headers like this:
ID,Name,IP,Details
Some values are duplicated, like this:
1,John,192.168.1.1,Details1
2,Mary,192.168.1.2,Details2
3,John,192.168.1.3,Details3
4,Dick,192.168.1.1,Details4
5,Kent,192.168.1.4,Details5
Is there any way I can select all lines with duplicated values?
Desired Output:
1,John,192.168.1.1,Details1
3,John,192.168.1.3,Details3
4,Dick,192.168.1.1,Details4
So far I have tried
Import-csv file | group | sort -des | select -f 10
but the result only groups rows whose whole lines match.
I would appreciate it if anyone could lend a hand or point me in the right direction. Thanks in advance for any reply.

I'm not sure I've fully understood your problem, but as far as I can tell you want something like this:
# ID Name IP Details
#-- ---- -- -------
#1 John 192.168.1.1 Details1
#2 Mary 192.168.1.2 Details2
#3 John 192.168.1.3 Details3
#4 Dick 192.168.1.1 Details4
#5 Kent 192.168.1.4 Details5
# $input is an automatic variable in PowerShell, so the rows are stored in $csv instead
$csv = Import-csv .\input.csv
$csv
$uniqueNames = $csv.Name | select -Unique
$duplicateNames = Compare-Object -ReferenceObject $csv.Name -DifferenceObject $uniqueNames | ? { $_.SideIndicator -like "<=" } | select -ExpandProperty InputObject
$uniqueIps = $csv.IP | select -Unique
$duplicateIps = Compare-Object -ReferenceObject $csv.IP -DifferenceObject $uniqueIps | ? { $_.SideIndicator -like "<=" } | select -ExpandProperty InputObject
Write-Output ""
Write-Output "========================="
Write-Output "Duplicate Names: $duplicateNames"
Write-Output "Duplicate IPs: $duplicateIps"
Output:
ID Name IP Details
-- ---- -- -------
1 John 192.168.1.1 Details1
2 Mary 192.168.1.2 Details2
3 John 192.168.1.3 Details3
4 Dick 192.168.1.1 Details4
5 Kent 192.168.1.4 Details5
=========================
Duplicate Names: John
Duplicate IPs: 192.168.1.1
Hope that helps.
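The script above reports the duplicated values, while the question asks for the full rows. A minimal sketch of one way to get those rows, assuming Group-Object is acceptable and with the sample data parsed inline (the variable names are illustrative, not from the original answer):

```powershell
# Sample rows from the question, parsed inline so the sketch is self-contained.
$rows = @'
ID,Name,IP,Details
1,John,192.168.1.1,Details1
2,Mary,192.168.1.2,Details2
3,John,192.168.1.3,Details3
4,Dick,192.168.1.1,Details4
5,Kent,192.168.1.4,Details5
'@ | ConvertFrom-Csv

# Values that occur more than once in each column.
$dupNames = @($rows | Group-Object Name | Where-Object Count -gt 1 | ForEach-Object Name)
$dupIps   = @($rows | Group-Object IP   | Where-Object Count -gt 1 | ForEach-Object Name)

# Keep every row whose Name or IP is one of the duplicated values.
$duplicates = @($rows | Where-Object { $_.Name -in $dupNames -or $_.IP -in $dupIps })
$duplicates | Format-Table
```

With the sample data this keeps the rows with ID 1, 3 and 4, which matches the desired output in the question.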

Related

how can I correct my reconciliation of .csv files to remove dupes/nulls

I have been using code from this answer to check for additions/changes to class rosters from MS Teams:
$set = [System.Collections.Generic.HashSet[string]]::new(
[string[]] (Import-CSV -Path stundent.csv).UserPrincipalName,
[System.StringComparer]::InvariantCultureIgnoreCase
)
Import-Csv ad.csv | Where-Object { $set.Add($_.UserPrincipalName) } |
Export-Csv path\to\output.csv -NoTypeInformation
Ideally, I want to be able to check if there have been removals when compared to a new file, swap the import file positions, and check for additions. If my files look like Source1 and Source2 (below), the check for removals would return Export1, and the check for additions would return Export2.
Since there will be multiple instances of students across multiple classes, I want to include TeamDesc in the filter query to make sure only the specific instance of that student with that class is returned.
Source1.csv
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 1   student1@domain.com john smith
Team 1   student2@domain.com nancy drew
Team 2   student3@domain.com harvey dent
Team 3   student1@domain.com john smith
Source2.csv
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 1   student2@domain.com nancy drew
Team 2   student3@domain.com harvey dent
Team 2   student4@domain.com tim tams
Team 3   student1@domain.com john smith
Export1.csv
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 1   student1@domain.com john smith
Export2.csv
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 2   student4@domain.com tim tams
Try the following, which uses Compare-Object to compare the CSV files by two column values, simply by passing the property (column) names of interest to -Property; the resulting output is split into two collections based on which input side a differing property combination is unique to, using the intrinsic .Where() method:
$removed, $added = (
Compare-Object (Import-Csv Source1.csv) (Import-Csv Source2.csv) -PassThru `
-Property TeamDesc, UserPrincipalName
).Where({ $_.SideIndicator -eq '<=' }, 'Split') # '<=' = only in Source1.csv (removed); the rest ('=>') go to $added
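# -Property * keeps -ExcludeProperty effective in Windows PowerShell 5.1 as well
$removed |
Select-Object -Property * -ExcludeProperty SideIndicator |
Export-Csv -NoTypeInformation Export1.csv
$added |
Select-Object -Property * -ExcludeProperty SideIndicator |
Export-Csv -NoTypeInformation Export2.csv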
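To check which side ends up in which variable, here is a small self-contained sketch of the same Compare-Object split, with the sample rows parsed inline instead of read from the files (an assumption made only so the snippet runs on its own). In it the filter tests for '<=', which marks rows found only in the first input, so the first collection returned by 'Split' holds the removals:

```powershell
# Inline stand-ins for Source1.csv and Source2.csv.
$source1 = @'
TeamDesc,UserPrincipalName,Name
Team 1,student1@domain.com,john smith
Team 1,student2@domain.com,nancy drew
Team 2,student3@domain.com,harvey dent
Team 3,student1@domain.com,john smith
'@ | ConvertFrom-Csv

$source2 = @'
TeamDesc,UserPrincipalName,Name
Team 1,student2@domain.com,nancy drew
Team 2,student3@domain.com,harvey dent
Team 2,student4@domain.com,tim tams
Team 3,student1@domain.com,john smith
'@ | ConvertFrom-Csv

# Compare on both columns; -PassThru returns the original rows plus SideIndicator.
$diff = @(Compare-Object $source1 $source2 -Property TeamDesc, UserPrincipalName -PassThru)

# 'Split' returns the rows that pass the filter first, then all remaining rows.
$removed, $added = $diff.Where({ $_.SideIndicator -eq '<=' }, 'Split')

$removed | Format-Table TeamDesc, UserPrincipalName, Name
$added   | Format-Table TeamDesc, UserPrincipalName, Name
```

With the sample rows, $removed holds the Team 1 entry for student1@domain.com and $added holds the Team 2 entry for student4@domain.com.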
Assuming both Csvs are stored in memory, Source1.csv is $csv1 and Source2.csv is $csv2, you already have the logic for Export2.csv using the HashSet<T>:
$set = [System.Collections.Generic.HashSet[string]]::new(
[string[]] $csv1.UserPrincipalName,
[System.StringComparer]::InvariantCultureIgnoreCase
)
$csv2 | Where-Object { $set.Add($_.UserPrincipalName) }
Outputs:
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 2   student4@domain.com tim tams
For the first requirement, Export1.csv, the reference object would be $csv2 and instead of a HashSet<T> you could use a hash table, Group-Object -AsHashTable makes it really easy in this case:
$map = $csv2 | Group-Object UserPrincipalName -AsHashTable -AsString
# if Csv2 has unique values for `UserPrincipalName`
$csv1 | Where-Object { $map[$_.UserPrincipalName].TeamDesc -ne $_.TeamDesc }
# if Csv2 has duplicated values for `UserPrincipalName`
$csv1 | Where-Object { $_.TeamDesc -notin $map[$_.UserPrincipalName].TeamDesc }
Outputs:
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 1   student1@domain.com john smith
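Since the question asks for TeamDesc to be part of the match, one option is to key a HashSet on both columns at once. The following is only a sketch under assumptions: Get-RowKey and Get-MissingRow are hypothetical helper names, '|' is assumed never to occur in the data, and the sample rows are parsed inline instead of imported:

```powershell
# Inline copies of the sample files; with real data use Import-Csv instead.
$csv1 = @'
TeamDesc,UserPrincipalName,Name
Team 1,student1@domain.com,john smith
Team 1,student2@domain.com,nancy drew
Team 2,student3@domain.com,harvey dent
Team 3,student1@domain.com,john smith
'@ | ConvertFrom-Csv

$csv2 = @'
TeamDesc,UserPrincipalName,Name
Team 1,student2@domain.com,nancy drew
Team 2,student3@domain.com,harvey dent
Team 2,student4@domain.com,tim tams
Team 3,student1@domain.com,john smith
'@ | ConvertFrom-Csv

# Hypothetical helper: builds one lookup key from both columns.
function Get-RowKey {
    param($Row)
    '{0}|{1}' -f $Row.TeamDesc, $Row.UserPrincipalName
}

# Hypothetical helper: returns the rows of $Left whose key is absent from $Right.
function Get-MissingRow {
    param([object[]]$Left, [object[]]$Right)
    $keys = [System.Collections.Generic.HashSet[string]]::new(
        [System.StringComparer]::OrdinalIgnoreCase
    )
    foreach ($row in $Right) { [void]$keys.Add((Get-RowKey $row)) }
    $Left | Where-Object { -not $keys.Contains((Get-RowKey $_)) }
}

$removed = @(Get-MissingRow -Left $csv1 -Right $csv2)   # rows only in Source1
$added   = @(Get-MissingRow -Left $csv2 -Right $csv1)   # rows only in Source2
```

Swapping the two arguments switches between the removals check and the additions check, which is the "swap the import file positions" step described in the question.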
Using this Join-Object script/Join-Object Module (see also: How to compare two CSV files and output the rows that are just in either of the file but not in both and In Powershell, what's the best way to join two tables into one?):
Loading your sample data:
(In your case you probably want to use Import-Csv to import your data)
Install-Script -Name Read-HtmlTable
$Csv1 = Read-HtmlTable https://stackoverflow.com/q/74452725 -Table 0 # Import-Csv .\Source1.csv
$Csv2 = Read-HtmlTable https://stackoverflow.com/q/74452725 -Table 1 # Import-Csv .\Source2.csv
Install-Module -Name JoinModule
$Csv1 |OuterJoin $Csv2 -On TeamDesc, UserPrincipalName -Name Out,In
TeamDesc UserPrincipalName   OutName    InName
-------- -----------------   -------    ------
Team 1   student1@domain.com john smith
Team 2   student4@domain.com            tim tams
You might use the (single) result file as is. If you really want to work with two different files, you might split the results as in the nice answer from mklement0.

Powershell Foreach where-object -eq value

If I import a csv of computer names $computers that looks like this
Host
----
computer5
Computer20
Computer1
and import another csv of data $data that looks like this
Host OS HD Capacity Owner
---- -- -- ------ -----
Computer20 Windows7 C 80 Becky
Computer1 Windows10 C 80 Tom
Computer1 Windows10 D 100 Tom
computer5 Windows8 C 100 sara
computer5 Windows8 D 1000 sara
computer5 Windows8 E 1000 sara
I'm trying to do this:
foreach ($pc in $computers){
$hostname = $pc.host
foreach ($entry in $data | where {$enty.host -eq $hostname})
{write-host "$entry.Owner"}
}
The foreach statement isn't finding any matches. How can I fix this?
I'm trying to do more than just write the owner name, but this is the basic premise.
Thanks!
Steve
Replace $enty with $_ inside the Where-Object filter block:
foreach ($entry in $data |Where-Object {$_.host -eq $hostname}){
Write-Host $entry.Owner
}
Here's another way to do it, using an array $computers.host on the left side of the -eq operator:
$computers = 'Host
computer5
Computer20
Computer1' | convertfrom-csv
$data = 'Host,OS,HD,Capacity,Owner
Computer20,Windows7,C,80,Becky
Computer1,Windows10,C,80,Tom
Computer1,Windows10,D,100,Tom
computer5,Windows8,C,100,sara
computer5,Windows8,D,1000,sara
computer5,Windows8,E,1000,sara' | convertfrom-csv
$data | where { $computers.host -eq $_.host } | ft
Host OS HD Capacity Owner
---- -- -- -------- -----
Computer20 Windows7 C 80 Becky
Computer1 Windows10 C 80 Tom
Computer1 Windows10 D 100 Tom
computer5 Windows8 C 100 sara
computer5 Windows8 D 1000 sara
computer5 Windows8 E 1000 sara
Or
$data | where host -in $computers.host
Or
foreach ( $entry in $data | where host -in $computers.host | foreach owner ) {
$entry }
Becky
Tom
Tom
sara
sara
sara
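The first variant above relies on how -eq behaves when its left operand is an array: it returns the matching elements rather than a single Boolean, and Where-Object then treats a non-empty result as true. A minimal sketch of that behaviour, using illustrative host names from the sample data:

```powershell
$hostNames = 'computer5', 'Computer20', 'Computer1'

# Array on the left: -eq filters the array (case-insensitively by default).
$hit  = @($hostNames -eq 'computer1')    # one element: 'Computer1'
$miss = @($hostNames -eq 'Computer99')   # no elements

# In a Boolean context a non-empty result counts as true, an empty one as false.
[bool]$hit
[bool]$miss
```

The -in operator used in the later variants expresses the same membership test more directly and always returns a single Boolean.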
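Even with $entry spelled correctly, this syntax does not work.
The loop variable $entry is only assigned inside the foreach body, so it is still empty while the filter in $data | where {$entry.host -eq $hostname} runs; inside the Where-Object block the current item is $_. Good question though. You can indeed use a pipeline in the parens after foreach. The following shows that $entry is empty inside the filter: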
foreach ($pc in $computers){
$hostname = $pc.host
foreach ($entry in $data | where {write-host "entry is $entry";
$entry.host -eq $hostname}){
write-host "$entry.Owner"
}
}
entry is
entry is
entry is
entry is
entry is
entry is
entry is
entry is
entry is
entry is
entry is
entry is
entry is
entry is
entry is
entry is
entry is
entry is
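For comparison, here is a sketch of the loop with the filter written against $_, plus a second detail worth checking: inside double quotes, "$entry.Owner" expands $entry as a whole and then appends the literal text .Owner, so a subexpression $(...) is needed to read a property. The sample data is parsed inline and the $owners, $wrong and $right names are illustrative:

```powershell
$computers = @'
Host
computer5
Computer20
Computer1
'@ | ConvertFrom-Csv

$data = @'
Host,OS,HD,Capacity,Owner
Computer20,Windows7,C,80,Becky
Computer1,Windows10,C,80,Tom
Computer1,Windows10,D,100,Tom
computer5,Windows8,C,100,sara
computer5,Windows8,D,1000,sara
computer5,Windows8,E,1000,sara
'@ | ConvertFrom-Csv

# $_ is the current pipeline item inside Where-Object; $entry exists only in the loop body.
$owners = foreach ($pc in $computers) {
    $hostname = $pc.Host
    foreach ($entry in $data | Where-Object { $_.Host -eq $hostname }) {
        "$($entry.Host): $($entry.Owner)"
    }
}
$owners

# String expansion: property access inside quotes needs $( ).
$entry = $data[0]
$wrong = "$entry.Owner"      # whole object rendered as text, followed by ".Owner"
$right = "$($entry.Owner)"   # just the Owner value
```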

Compare multiple elements in an object against multiple elements in another object of a different array

Say [hypothetically], I have two .CSVs I'm comparing to try and see which of my current members are original members... I wrote a nested ForEach-Object comparing every $name and $memberNumber from each object against every other object. It works fine, but is taking way too long, especially since each CSV has tens of thousands of objects. Is there another way I should approach this?
Original_Members.csv
Name, Member_Number
Alice, 1234
Jim , 4567
Current_Members.csv
Alice, 4599
Jim, 4567
$currentMembers = import-csv $home\Desktop\current_members.csv |
ForEach-Object {
$name = $_.Name
$memNum = $_."Member Number"
$ogMembers = import-csv $home\Desktop\original_members.csv |
ForEach-Object {
If ($ogMembers.Name -eq $name -and $ogMembers."Member Number" -eq $memNum) {
$ogMember = "Yes"
}
Else {
$ogMember = "No"
}
}
[pscustomobject]@{
"Name"=$name
"Member Number"=$memNum
"Original Member?"=$ogMember
}
} |
select "Name","Member Number","Original Member?" |
Export-CSV "$home\Desktop\OG_Compare_$(get-date -uformat "%d%b%Y").csv" -Append -NoTypeInformation
Assuming both of your files are like the below:
Original_Members.csv
Name, Member_Number
Alice, 1234
Jim, 4567
Current_Members.csv
Name, Member_Number
Alice, 4599
Jim, 4567
You could store the original member names in a System.Collections.Generic.HashSet<T> for constant time lookups, instead of doing a linear search for each name. We can use System.Linq.Enumerable.ToHashSet to create a hashset of string[] names.
We can then use Where-Object to filter current names by checking if the hashset contains the original name with System.Collections.Generic.HashSet<T>.Contains(T), which is an O(1) method.
$originalMembers = Import-Csv -Path .\Original_Members.csv
$currentMembers = Import-Csv -Path .\Current_Members.csv
$originalMembersLookup = [Linq.Enumerable]::ToHashSet(
[string[]]$originalMembers.Name,
[StringComparer]::CurrentCultureIgnoreCase
)
$currentMembers |
Where-Object {$originalMembersLookup.Contains($_.Name)}
Which will output the current members that were original members:
Name Member_Number
---- -------------
Alice 4599
Jim 4567
Update
As requested in the comments, if we want to check both Name and Member_Number, we can concatenate both strings to use for lookups:
$originalMembers = Import-Csv -Path .\Original_Members.csv
$currentMembers = Import-Csv -Path .\Current_Members.csv
$originalMembersLookup = [Linq.Enumerable]::ToHashSet(
[string[]]($originalMembers |
ForEach-Object {
$_.Name + $_.Member_Number
}),
[StringComparer]::CurrentCultureIgnoreCase
)
$currentMembers |
Where-Object {$originalMembersLookup.Contains($_.Name + $_.Member_Number)}
Which will now only return:
Name Member_Number
---- -------------
Jim 4567
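One caveat with plain concatenation is that different pairs can produce the same key: "Jim1" + "234" and "Jim" + "1234" both yield Jim1234. A sketch of a variant that joins the fields with a delimiter and also emits the Yes/No column from the question follows. The '|' delimiter is an assumption (it must not occur in the data), and the sample rows are parsed inline:

```powershell
$originalMembers = @'
Name,Member_Number
Alice,1234
Jim,4567
'@ | ConvertFrom-Csv

$currentMembers = @'
Name,Member_Number
Alice,4599
Jim,4567
'@ | ConvertFrom-Csv

# Build "Name|Member_Number" keys; '|' is an assumed delimiter absent from the data.
$lookup = [System.Collections.Generic.HashSet[string]]::new(
    [string[]]($originalMembers | ForEach-Object { $_.Name + '|' + $_.Member_Number }),
    [System.StringComparer]::CurrentCultureIgnoreCase
)

# Add a calculated "Original Member?" column instead of filtering rows out.
$report = @($currentMembers | Select-Object Name, Member_Number, @{
    n = 'Original Member?'
    e = { if ($lookup.Contains($_.Name + '|' + $_.Member_Number)) { 'Yes' } else { 'No' } }
})
$report | Format-Table
```

The result can be piped to Export-Csv in place of Format-Table if a report file is wanted, as in the question.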

How to use powershell to determine the frequency of objects in a collection based on a specific member

I am querying the Security event log on our PDC to watch for trends that might indicate compromised hosts or usernames. I have got the code to gather the info and clean it up...
$TargetEvents ends up like this:
(I can't figure out how to format a normal looking table in my post)
Host User
---- ----
host1 user1
host2 user2
host1 user3
host1 user4
host2 user4
$Events= Get-WinEvent -ComputerName MYPDC -FilterHashtable @{Logname='Security';id=4740} -MaxEvents 10
$TargetEvents=@()
foreach ($Event in $Events)
{
$obj=[PSCustomObject]@{
Host=$Event.Properties[1].value.ToString()
User=$Event.Properties[0].value.ToString()
}
$TargetEvents+=$obj
}
I'd like to be able create a summary but I'm just genuinely stuck. I don't program professionally, just tools to help my work.
This is what I'm TRYING to create:
Host Frequency
---- ---------
host1 3
host2 2
User Frequency
---- ---------
user1 1
user2 1
user3 1
user4 2
You can directly emit the [PSCustomObject] to a variable gathering the whole foreach output:
$Events= Get-WinEvent -ComputerName MYPDC -FilterHashtable @{Logname='Security';id=4740} -MaxEvents 10
$TargetEvents= foreach ($Event in $Events){
[PSCustomObject]@{
Host=$Event.Properties[1].value.ToString()
User=$Event.Properties[0].value.ToString()
}
}
To get the counts you can use Group-Object with -NoElement, since only the count (Frequency) and the Host/User values are relevant:
> $TargetEvents | Group-Object Host -NoElement
Count Name
----- ----
3 host1
2 host2
To get your exact desired output use a Select-Object to rename Count to Frequency with a calculated property.
$TargetEvents | Group-Object Host -NoElement | Select-Object @{n='Host';e={$_.Name}},@{n='Frequency';e={$_.Count}}
Host Frequency
---- ---------
host1 3
host2 2
$TargetEvents | Group-Object User -NoElement | Select-Object @{n='User';e={$_.Name}},@{n='Frequency';e={$_.Count}}
User Frequency
---- ---------
user1 1
user2 1
user3 1
user4 2
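Because the two commands differ only in the property name, they can be wrapped in a small function. This is a sketch rather than part of the original answer: Get-Frequency is a hypothetical name, the sample events are parsed inline, and the descending sort is an added assumption so the most frequent values come first:

```powershell
$TargetEvents = @'
Host,User
host1,user1
host2,user2
host1,user3
host1,user4
host2,user4
'@ | ConvertFrom-Csv

# Hypothetical helper: counts occurrences of one property, most frequent first.
function Get-Frequency {
    param([object[]]$InputObject, [string]$Property)
    $InputObject |
        Group-Object -Property $Property -NoElement |
        Select-Object @{ n = $Property; e = { $_.Name } }, @{ n = 'Frequency'; e = { $_.Count } } |
        Sort-Object Frequency -Descending
}

$hostFreq = @(Get-Frequency -InputObject $TargetEvents -Property Host)
$userFreq = @(Get-Frequency -InputObject $TargetEvents -Property User)

$hostFreq | Format-Table
$userFreq | Format-Table
```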
The Group-Object cmdlet is your friend here:
# rebuilding your example data
$items = @("host1 user1",
"host2 user2",
"host1 user3",
"host1 user4",
"host2 user4")
$arraylist = New-Object System.Collections.ArrayList
#loop over the objects, split at the space and assign variable values
foreach($item in $items) {
$thehost = $item.split(' ')[0]
$theuser = $item.split(' ')[1]
# including | Out-Null just keeps the Add() method from outputting the number of the item
# it does not nullify your data
$arraylist.Add([PSCustomObject]@{Host=$thehost;User=$theuser}) | Out-null
}
# group by host property
$hostFrequency = $arraylist | Group-Object -Property Host
# group by user property
$userFrequency = $arraylist | Group-Object -Property User
# results arrays
$hostResults = New-Object System.Collections.ArrayList
$userResults = New-Object System.Collections.ArrayList
# group the count of hosts along with corresponding name
foreach($item in $hostFrequency)
{
$hostResults.Add([PSCustomObject]@{Host = $item.Name; Frequency = $item.Count}) | Out-Null
}
# group the count of users along with corresponding name
foreach($item in $userFrequency)
{
$userResults.Add([PSCustomObject]@{User = $item.Name; Frequency = $item.Count}) | Out-Null
}
# output all the things
$hostResults | Format-Table
$userResults | Format-Table
Output:
Host Frequency
---- ---------
host1 3
host2 2
User Frequency
---- ---------
user1 1
user2 1
user3 1
user4 2

Powershell Group Parameter - Count over 100 adds value to variable

I found the below article to help me group a list of IP addresses.
Using Powershell, how can i count the occurrence of each element in an array?
The command I am currently using is:
get-content c:\temp\temp3.txt | group
This would get the following output:
Count Name Group
----- ---- -----
3 192.168.1.1 {192.168.1.1, 192.168.1.1, 192.168.1.1}
3 192.168.1.2 {192.168.1.2, 192.168.1.2, 192.168.1.2}
What command can I use to find all IPs with a count over 5?
I imagine I would need to store my original command in a variable like below:
$groupedoutput = get-content c:\temp\temp3.txt | group
Unsure where to go from there. Any help would be appreciated.
Thanks,
S
In full:
Get-Content -Path c:\temp\temp3.txt |
Group-Object -NoElement |
Where-Object { $_.Count -gt 5 } |
Select-Object -ExpandProperty Name
In short:
gc c:\temp\temp3.txt | group |? count -gt 5 |% Name
The -NoElement switch means the groups don't collect their members (the {192.168.1.1, 192.168.1.1, 192.168.1.1} part). It's not strictly necessary here, but it saves memory if you aren't going to use the grouped items.
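If the counts themselves are also wanted, a variation can keep them and sort by frequency. The sketch below uses an in-memory list in place of c:\temp\temp3.txt, and the $threshold and $frequent names are illustrative:

```powershell
# Stand-in for Get-Content c:\temp\temp3.txt: 6, 3 and 7 occurrences of three addresses.
$ips = @('192.168.1.1') * 6 + @('192.168.1.2') * 3 + @('192.168.1.3') * 7

$threshold = 5

# Keep only addresses seen more than $threshold times, most frequent first.
$frequent = @(
    $ips |
        Group-Object -NoElement |
        Where-Object Count -gt $threshold |
        Sort-Object Count -Descending |
        Select-Object Count, Name
)
$frequent | Format-Table
```

Here 192.168.1.3 and 192.168.1.1 pass the threshold, while 192.168.1.2 (three occurrences) is dropped.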