Grouping duplicate values in a PowerShell custom object without deleting them

I have a PS custom object that is in this format:
|Name|Number|Email|
|----|------|-----|
|Bob| 23| bob.bob@test.com|
|Tom|124|tom.tom@test.com|
|Jeff|125|jeff.jeff@test.com|
|Jeff|127|jeff.jeff@test.com|
|Jeff|129|jeff.jeff@test.com|
|Jessica|126|jessica.jessica@test.com|
|Jessica|132|jessica.jessica@test.com|
I'd like to group together the rows that share the same name and email, combining their numbers. I.e.:
|Name|Number|Email|
|----|------|-----|
|Bob|123|bob.bob@test.com|
|Tom|124|tom.tom@test.com|
|Jeff|125,127,129|jeff.jeff@test.com|
|Jessica|126,132|jessica.jessica@test.com|
I've tried various combinations of Compare-Object, Sort-Object, creating a new array, etc., but I can't seem to get it.
Any ideas?

Group-Object is perfectly suited for that task. With the help of calculated properties, you can create a select statement that will produce the desired output in one sweep.
$data = @'
Name|Number|Email
Bob| 23| bob.bob@test.com
Tom|124|tom.tom@test.com
Jeff|125|jeff.jeff@test.com
Jeff|127|jeff.jeff@test.com
Jeff|129|jeff.jeff@test.com
Jessica|126|jessica.jessica@test.com
Jessica|132|jessica.jessica@test.com
'@ | ConvertFrom-Csv -Delimiter '|'
# Group by name and email
$data | Group-Object -Property Name, Email |
    Select-Object @{'Name' = 'Name' ; 'Expression' = { $_.Group[0].Name } },
        @{'Name' = 'Number' ; 'Expression' = { $_.Group.Number } },
        @{'Name' = 'Email' ; 'Expression' = { $_.Group[0].Email } }
Output
Name Number Email
---- ------ -----
Bob 23 bob.bob@test.com
Tom 124 tom.tom@test.com
Jeff {125, 127, 129} jeff.jeff@test.com
Jessica {126, 132} jessica.jessica@test.com
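The answer above returns Number as an array, which PowerShell displays as {125, 127, 129}. If you want the comma-separated text from the question's desired table (for example before Export-Csv, which would otherwise write System.Object[] into the cell), a minimal variant joins each group's numbers with -join. The sample rows are taken from the question's table with the @ restored in the addresses.

```powershell
# Sample rows taken from the question's table (with the @ in the e-mail addresses restored).
$data = @(
    [pscustomobject]@{ Name = 'Bob';     Number = '23';  Email = 'bob.bob@test.com' }
    [pscustomobject]@{ Name = 'Tom';     Number = '124'; Email = 'tom.tom@test.com' }
    [pscustomobject]@{ Name = 'Jeff';    Number = '125'; Email = 'jeff.jeff@test.com' }
    [pscustomobject]@{ Name = 'Jeff';    Number = '127'; Email = 'jeff.jeff@test.com' }
    [pscustomobject]@{ Name = 'Jeff';    Number = '129'; Email = 'jeff.jeff@test.com' }
    [pscustomobject]@{ Name = 'Jessica'; Number = '126'; Email = 'jessica.jessica@test.com' }
    [pscustomobject]@{ Name = 'Jessica'; Number = '132'; Email = 'jessica.jessica@test.com' }
)

# One output row per distinct Name/Email pair; each group's numbers are joined into a single string.
$grouped = $data | Group-Object -Property Name, Email | ForEach-Object {
    [pscustomobject]@{
        Name   = $_.Group[0].Name
        Number = $_.Group.Number -join ','
        Email  = $_.Group[0].Email
    }
}

$grouped | Format-Table -AutoSize
```

Because Number is now a plain string, the objects can go straight to Export-Csv.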
References
Group-Object
about calculated properties

Related

How to use PS calculated properties to collate data into a cell?

I'm stumped on this one. I have a PS custom object with strings only and I want to build a report where I'm outputting strings of data into a new pipeline output object.
$myObjectTable |
Select-Object @{
    n = "OldData";
    e = {
        $_ | Select-Object name, *_old | Format-List | Out-String
    }
},
@{
    n = "NewData";
    e = {
        $_ | Select-Object name, *_new | Format-List | Out-String
    }
}
Running this produces blank output.
I tried running the code above with just the $_ object in the expressions, but I only got ... as the output. Wrapping the expressions in parentheses did not change the output.
The ... as a property value means that the value is either a multi-line string or simply didn't fit in the tabular format. See Using Format commands to change output view for more details.
You can remove the empty lines that Out-String adds by calling Trim on its output. Then, to properly display an object whose property values span multiple lines, you will need Format-Table -Wrap.
Here is a little example:
[pscustomobject]@{
    Name = 'foo'
    Something_Old = 123
    Something_New = 456
} | ForEach-Object {
    [pscustomobject]@{
        OldData = $_ | Format-List name, *_old | Out-String | ForEach-Object Trim
        NewData = $_ | Format-List name, *_new | Out-String | ForEach-Object Trim
    }
} | Format-Table -Wrap
Resulting object would become:
OldData NewData
------- -------
Name : foo Name : foo
Something_Old : 123 Something_New : 456
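Format-List pads property names so the colons line up, and Out-String adds blank lines that then have to be trimmed. If you would rather control the cell text directly, a hedged alternative is to build each multi-line value from PSObject.Properties yourself. Get-CellText is a hypothetical helper name, and the "Name : Value" layout is just one possible choice.

```powershell
# Hypothetical helper: returns "Name : Value" lines for every property whose name matches any wildcard pattern.
function Get-CellText {
    param([psobject] $InputObject, [string[]] $Pattern)
    $lines = foreach ($prop in $InputObject.PSObject.Properties) {
        # Keep the property if its name matches at least one of the patterns.
        $matched = $Pattern | Where-Object { $prop.Name -like $_ }
        if ($matched) { '{0} : {1}' -f $prop.Name, $prop.Value }
    }
    # No leading/trailing blank lines to trim, unlike Out-String.
    $lines -join [Environment]::NewLine
}

$obj = [pscustomobject]@{ Name = 'foo'; Something_Old = 123; Something_New = 456 }

$report = [pscustomobject]@{
    OldData = Get-CellText -InputObject $obj -Pattern 'Name', '*_Old'
    NewData = Get-CellText -InputObject $obj -Pattern 'Name', '*_New'
}

# -Wrap is still needed so the multi-line cells are displayed in full.
$report | Format-Table -Wrap
```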

PowerShell: Expression only with Last item of an Array

I've been stuck on this for a little while. I've got an array of people, and I'm trying to take the last person and put them in a separate column on their own.
I've played around with @{Name = 'Name'; Expression = {}} in Select-Object, but I don't really know how to tackle it.
Current:
| Employee |
|---------------|
| John Doe |
| Jane West |
| Jordan Row |
| Paul Willson |
| Andrew Wright |
Desired Result:
| Employee | Employee2 |
|--------------|---------------|
| John Doe | |
| Jane West | |
| Jordan Row | |
| Paul Willson | Andrew Wright |
TIA!
What I decided to do here is create two groups: one contains all of the values except the last two, and the other contains those last two values.
# create the sample array
$employees = @(
    'John Doe'
    'Jane West'
    'Jordan Row'
    'Paul Willson'
    'Andrew Wright'
)
$employees |
    # Separate objects into 2 groups: those contained in the last 2 values and those not contained in the last 2 values
    Group-Object {$_ -in ($employees | Select-Object -Last 2)} |
    ForEach-Object {
        switch ($_) {
            {$_.name -eq 'False'} { # 'False' Name of group where values are not one of the last 2
                # Iterate through all the values and assign them to Employee property. Leave Employee2 property blank
                $_.group | ForEach-Object {
                    [PSCustomObject]@{
                        Employee = $_
                        Employee2 = ''
                    }
                }
            }
            {$_.name -eq 'True'} { # 'True' Name of group where values are those of the last 2
                # Create an object that assigns the values to Employee and Employee2
                [PSCustomObject]@{
                    Employee = $_.group[0]
                    Employee2 = $_.group[1]
                }
            }
        }
    }
Output
Employee Employee2
-------- ---------
John Doe
Jane West
Jordan Row
Paul Willson Andrew Wright
Edit
Here is another way you can do it:
$employees[0..($employees.Count-3)] | ForEach-Object {
    [PSCustomObject]@{
        Employee = $_
        Employee2 = ''
    }
}
[PSCustomObject]@{
    Employee = $employees[-2]
    Employee2 = $employees[-1]
}
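Both approaches above assume there are at least three names. With only two, $employees[0..($employees.Count-3)] becomes $employees[0..-1], which selects the first and the last element instead of nothing, and the -in test in the first approach would misplace a name that also occurs earlier in the list. A sketch that walks the array by index avoids both, under the assumption that Employee2 should simply stay empty when there is only one name.

```powershell
# The same sample array as in the answer above.
$employees = @(
    'John Doe'
    'Jane West'
    'Jordan Row'
    'Paul Willson'
    'Andrew Wright'
)

$rows = if ($employees.Count -lt 2) {
    # Zero or one name: nothing to pair, so Employee2 stays empty.
    $employees | ForEach-Object {
        [pscustomobject]@{ Employee = $_; Employee2 = '' }
    }
} else {
    # One row per name except the last; the second-to-last row also receives the last name.
    for ($i = 0; $i -le $employees.Count - 2; $i++) {
        [pscustomobject]@{
            Employee  = $employees[$i]
            Employee2 = if ($i -eq $employees.Count - 2) { $employees[-1] } else { '' }
        }
    }
}

$rows | Format-Table
```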

Compare multiple elements in an object against multiple elements in another object of a different array

Say (hypothetically) I have two .CSVs that I'm comparing to see which of my current members are original members. I wrote a nested ForEach-Object comparing every $name and $memberNumber from each object against every other object. It works fine, but it is taking way too long, especially since each CSV has tens of thousands of objects. Is there another way I should approach this?
Original_Members.csv
Name, Member_Number
Alice, 1234
Jim , 4567
Current_Members.csv
Alice, 4599
Jim, 4567
$currentMembers = import-csv $home\Desktop\current_members.csv |
ForEach-Object {
    $name = $_.Name
    $memNum = $_.Member_Number
    # Assume "No" until a matching original member is found
    $ogMember = "No"
    import-csv $home\Desktop\original_members.csv |
    ForEach-Object {
        If ($_.Name -eq $name -and $_.Member_Number -eq $memNum) {
            $ogMember = "Yes"
        }
    }
    [pscustomobject]@{
        "Name"=$name
        "Member Number"=$memNum
        "Original Member?"=$ogMember
    }
} |
select "Name","Member Number","Original Member?" |
Export-CSV "$home\Desktop\OG_Compare_$(get-date -uformat "%d%b%Y").csv" -Append -NoTypeInformation
Assuming both of your files are like the below:
Original_Members.csv
Name, Member_Number
Alice, 1234
Jim, 4567
Current_Members.csv
Name, Member_Number
Alice, 4599
Jim, 4567
You could store the original member names in a System.Collections.Generic.HashSet<T> for constant-time lookups, instead of doing a linear search for each name. We can use System.Linq.Enumerable.ToHashSet to create a hash set from the string[] of names.
We can then use Where-Object to filter the current members, checking whether the hash set contains each current member's name with System.Collections.Generic.HashSet<T>.Contains(T), which is an O(1) operation.
$originalMembers = Import-Csv -Path .\Original_Members.csv
$currentMembers = Import-Csv -Path .\Current_Members.csv
$originalMembersLookup = [Linq.Enumerable]::ToHashSet(
[string[]]$originalMembers.Name,
[StringComparer]::CurrentCultureIgnoreCase
)
$currentMembers |
Where-Object {$originalMembersLookup.Contains($_.Name)}
Which will output the current members that were original members:
Name Member_Number
---- -------------
Alice 4599
Jim 4567
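Enumerable.ToHashSet only exists on .NET Framework 4.7.2+ and .NET Core 2.0+, so on an older Windows PowerShell host the call above may fail. A sketch of the same lookup that uses the HashSet<T> constructor taking a collection and a comparer instead, which has been part of HashSet<T> since .NET Framework 3.5. The inline CSV text stands in for the two files, and OrdinalIgnoreCase is a swapped-in choice of comparer.

```powershell
# Inline stand-ins for Original_Members.csv and Current_Members.csv.
$originalMembers = @'
Name,Member_Number
Alice,1234
Jim,4567
'@ | ConvertFrom-Csv

$currentMembers = @'
Name,Member_Number
Alice,4599
Jim,4567
'@ | ConvertFrom-Csv

# Build the set through the constructor instead of Enumerable.ToHashSet.
$originalMembersLookup = [System.Collections.Generic.HashSet[string]]::new(
    [string[]]$originalMembers.Name,
    [StringComparer]::OrdinalIgnoreCase
)

# Contains() is an average O(1) lookup, so the filter stays linear in the number of current members.
$result = $currentMembers | Where-Object { $originalMembersLookup.Contains($_.Name) }
$result
```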
Update
As requested in the comments, if we want to check both Name and Member_Number, we can concatenate both strings and use the result for lookups:
$originalMembers = Import-Csv -Path .\Original_Members.csv
$currentMembers = Import-Csv -Path .\Current_Members.csv
$originalMembersLookup = [Linq.Enumerable]::ToHashSet(
[string[]]($originalMembers |
ForEach-Object {
$_.Name + $_.Member_Number
}),
[StringComparer]::CurrentCultureIgnoreCase
)
$currentMembers |
Where-Object {$originalMembersLookup.Contains($_.Name + $_.Member_Number)}
Which will now only return:
Name Member_Number
---- -------------
Jim 4567
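Concatenating Name and Member_Number without a separator can make two different pairs produce the same key (for example, 'Al' + '11234' and 'Al1' + '1234' both give 'Al11234'). A minimal sketch that adds a separator and, instead of filtering, emits the Yes/No "Original Member?" column from the question's report. The '|' separator is an assumption and should be a character that cannot occur in a name, and Get-MemberKey is a hypothetical helper.

```powershell
# Inline stand-ins for the two CSV files (column names as in the answer above).
$originalMembers = @'
Name,Member_Number
Alice,1234
Jim,4567
'@ | ConvertFrom-Csv

$currentMembers = @'
Name,Member_Number
Alice,4599
Jim,4567
'@ | ConvertFrom-Csv

# Hypothetical helper: joins the two fields with a separator so different pairs cannot produce the same key.
function Get-MemberKey([psobject] $Member) {
    '{0}|{1}' -f $Member.Name, $Member.Member_Number
}

$lookup = [System.Collections.Generic.HashSet[string]]::new([StringComparer]::OrdinalIgnoreCase)
foreach ($member in $originalMembers) {
    [void]$lookup.Add((Get-MemberKey $member))
}

# One output row per current member, with the Yes/No column from the question's report.
$report = foreach ($member in $currentMembers) {
    [pscustomobject]@{
        'Name'             = $member.Name
        'Member Number'    = $member.Member_Number
        'Original Member?' = if ($lookup.Contains((Get-MemberKey $member))) { 'Yes' } else { 'No' }
    }
}

$report | Format-Table -AutoSize
```

To produce the file from the question, $report can be piped to Export-Csv -NoTypeInformation.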

Select-Object -ExcludeProperty based on the property's value

I have an object with a large number of properties. I want to return several of these properties, whose names may not always be consistent, and I want to EXCLUDE properties that have or contain a particular value.
$notneeded = @('array of properties that I do not wish to select')
$csvPath = "$Log\$Summary"
$csvData = Get-Content -Path $csvPath | Select-Object -Skip 1 | Out-String | ConvertFrom-Csv # the first line is extra (not a header) and needs to be skipped
$csvData | Select-Object -Property * -ExcludeProperty $notneeded
If the list of properties to exclude were static, I could use this. But I want to exclude properties from view that contain a particular value.
INPUT CSV
John,Doe,120 jefferson st.,Riverside, NJ, 08075
Jack,McGinnis,220 hobo Av.,Phila, PA,09119
"John ""Da Man""",Repici,120 Jefferson St.,Riverside, NJ,08075
Stephen,Tyler,"7452 Terrace ""At the Plaza"" road",SomeTown,SD, 91234
,Blankman,,SomeTown, SD, 00298
"Joan ""the bone"", Anne",Jet,"9th, at Terrace plc",Desert City,CO,00123
SCRIPT
$csvData = Invoke-WebRequest -Uri "https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv" | ConvertFrom-Csv -Header "Name", "Surname", "Address", "City", "State", "Zip"
$particularValue = "*120*"
$notneeded = @()
$csvData | Foreach-Object { $notneeded += $_.PSObject.Properties | Where-Object Value -like $particularValue | Select-Object Name }
$notneeded = $notneeded | Select-Object -Unique -ExpandProperty Name
$csvData | Select-Object * -ExcludeProperty $notneeded | Format-Table
Note: in this example I want to exclude any column in which 120 appears. Also note that I named the columns myself in the script (via -Header).
OUTPUT (note that the Address column is missing)
Name Surname City State Zip
---- ------- ---- ----- ---
John Doe Riverside NJ 08075
Jack McGinnis Phila PA 09119
John "Da Man" Repici Riverside NJ 08075
Stephen Tyler SomeTown SD 91234
Blankman SomeTown SD 00298
Joan "the bone", Anne Jet Desert City CO 00123
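The same idea can be tried without the download and without growing $notneeded with += (each += copies the whole array). A self-contained sketch using the first two rows of the sample CSV inline; -Header supplies the column names, as in the script above.

```powershell
# First two rows of the sample CSV, inline instead of downloaded.
$csvData = @'
John,Doe,120 jefferson st.,Riverside, NJ, 08075
Jack,McGinnis,220 hobo Av.,Phila, PA,09119
'@ | ConvertFrom-Csv -Header 'Name', 'Surname', 'Address', 'City', 'State', 'Zip'

$particularValue = '*120*'

# Collect the name of every property whose value matches the wildcard in any row.
$notNeeded = @(
    $csvData |
        ForEach-Object { $_.PSObject.Properties } |
        Where-Object { $_.Value -like $particularValue } |
        ForEach-Object { $_.Name } |
        Select-Object -Unique
)

$result = $csvData | Select-Object -Property * -ExcludeProperty $notNeeded
$result | Format-Table
```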

PowerShell - filtering for unique values

I have an input CSV file with a column containing information similar to the sample below:
805265
995874
805674
984654
332574
339852
I'd like to extract the unique values of the leading two characters into an array, so using the above sample my result would be:
80, 99, 98, 33
How might I achieve this using PowerShell?
Use Select-Object with the -Unique parameter:
$values =
'805265',
'995874',
'805674',
'984654',
'332574',
'339852'
$values |
Foreach-Object { $_.Substring(0,2) } |
Select-Object -unique
If conversion to int is needed, then just cast it to [int]:
$ints =
$values |
Foreach-Object { [int]$_.Substring(0,2) } |
Select-Object -unique
I'd use the Group-Object cmdlet (alias group) for this:
Import-Csv foo.csv | group {$_.ColumnName.Substring(0, 2)}
Count Name Group
----- ---- -----
2 80 {805265, 805674}
1 99 {995874}
1 98 {984654}
2 33 {332574, 339852}
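Group-Object gives you the counts as a bonus, but the question asked for a plain array. A minimal sketch that keeps only each group's Name (the prefix itself); the values are the question's sample, and Sort-Object is added only to make the order predictable.

```powershell
$values = '805265', '995874', '805674', '984654', '332574', '339852'

# Group on the two-character prefix, then keep only each group's Name.
$prefixes = @(
    $values |
        Group-Object { $_.Substring(0, 2) } |
        ForEach-Object { $_.Name } |
        Sort-Object
)

$prefixes -join ', '
```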
You might use a hash table:
$values = @(805265, 995874, 805674, 984654, 332574, 339852)
$ht = @{}
$values | foreach {$ht[$_ -replace '^(..).+','$1']++}
$ht.keys
99
98
33
80
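A regular hashtable does not guarantee any key order, which is why the keys above do not come back in input order. If you want the prefixes in first-seen order, a sketch using an [ordered] dictionary (available since PowerShell 3.0) keeps the same counting idea.

```powershell
$values = @(805265, 995874, 805674, 984654, 332574, 339852)

# [ordered] creates an OrderedDictionary, which remembers the order in which keys were first added.
$counts = [ordered]@{}
foreach ($value in $values) {
    # Use string keys so the dictionary's key indexer (not its positional int indexer) is used.
    $prefix = ([string]$value).Substring(0, 2)
    # A missing key returns $null, and [int]$null is 0, so the first hit stores 1.
    $counts[$prefix] = 1 + [int]$counts[$prefix]
}

$counts.Keys
```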
You could make a new array with items containing the first two characters and then use Select-Object to give you the unique items like this:
$newArray = @()
$csv = Import-Csv -Path C:\your.csv
$csv | % {
$newArray += $_.YourColumn.Substring(0, 2)
}
$newArray | Select-Object -Unique
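Each += on a PowerShell array allocates a new array and copies the existing elements, which gets slow for large CSVs. A sketch that lets the foreach statement collect the results instead; YourColumn is the same placeholder column name as above, and the inline CSV text stands in for C:\your.csv.

```powershell
# Inline stand-in for C:\your.csv; 'YourColumn' is a placeholder header.
$csv = @'
YourColumn
805265
995874
805674
984654
332574
339852
'@ | ConvertFrom-Csv

# The foreach statement's output is collected into $prefixes in one go (no += copying).
$prefixes = foreach ($row in $csv) {
    $row.YourColumn.Substring(0, 2)
}

# Select-Object -Unique keeps the first occurrence of each prefix, in input order.
$unique = @($prefixes | Select-Object -Unique)
$unique -join ', '
```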
Another option, instead of Select-Object -Unique, is the Get-Unique cmdlet (alias gu). Note that Get-Unique only removes adjacent duplicates, so the input has to be sorted first:
$values = @(805265, 995874, 805674, 984654, 332574, 339852)
$values | % { $_.ToString().Substring(0,2) } | Sort-Object | Get-Unique
# Or the same using the alias
$values | % { $_.ToString().Substring(0,2) } | Sort-Object | gu
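Get-Unique only compares each item with the one immediately before it, so with unsorted input it lets non-adjacent duplicates through. A small demonstration with the question's values:

```powershell
$prefixes = @(805265, 995874, 805674, 984654, 332574, 339852) |
    ForEach-Object { $_.ToString().Substring(0, 2) }

# Unsorted: the two '80' entries are not adjacent, so both survive.
$unsorted = @($prefixes | Get-Unique)

# Sorted first: duplicates become adjacent and Get-Unique removes them.
$sorted = @($prefixes | Sort-Object | Get-Unique)

"Unsorted: $($unsorted -join ', ')"
"Sorted:   $($sorted -join ', ')"
```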