How to find duplicate values in a PowerShell hashtable

Imagine the following hash:
$h=@{}
$h.Add(1,'a')
$h.Add(2,'b')
$h.Add(3,'c')
$h.Add(4,'d')
$h.Add(5,'a')
$h.Add(6,'c')
What query would return the two duplicate values, 'a' and 'c'?
Basically, I am looking for the PowerShell equivalent of the following SQL query (assuming the table h(c1,c2)):
select c1
from h
group by c1
having count(*) > 1

You could try this:
$h.GetEnumerator() | Group-Object Value | ? { $_.Count -gt 1 }
Count Name Group
----- ---- -----
2 c {System.Collections.DictionaryEntry, System.Collections.DictionaryEntry}
2 a {System.Collections.DictionaryEntry, System.Collections.DictionaryEntry}
If you store the results, you can dig into each group to get the key names of the duplicate entries. For example:
$a = $h.GetEnumerator() | Group-Object Value | ? { $_.Count -gt 1 }
# Check the first group (the one with 'c' as the value)
$a[0].Group
Name Value
---- -----
6 c
3 c
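The SQL query in the question returns only the grouped column. A minimal sketch of that in PowerShell, assuming you want the duplicated values themselves and, separately, the keys behind each one; the variable names $dupValues and $dupKeys are illustrative. Because hashtable enumeration order is not guaranteed, the order of the results may differ from run to run.

```powershell
# Sample hashtable from the question
$h = @{}
$h.Add(1, 'a'); $h.Add(2, 'b'); $h.Add(3, 'c')
$h.Add(4, 'd'); $h.Add(5, 'a'); $h.Add(6, 'c')

# GROUP BY Value HAVING COUNT(*) > 1, keeping only the grouped value (the group's Name)
$dupValues = $h.GetEnumerator() |
    Group-Object Value |
    Where-Object { $_.Count -gt 1 } |
    Select-Object -ExpandProperty Name

# The same groups, reshaped so each duplicated value lists the keys that share it
$dupKeys = $h.GetEnumerator() |
    Group-Object Value |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object {
        [pscustomobject]@{
            Value = $_.Name
            Keys  = @($_.Group | ForEach-Object Key)
        }
    }

$dupValues
$dupKeys
```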

You can use another hash table:
$h=@{}
$h.Add(1,'a')
$h.Add(2,'b')
$h.Add(3,'c')
$h.Add(4,'d')
$h.Add(5,'a')
$h.Add(6,'c')
$h1=@{}
$h.GetEnumerator() | foreach { $h1[$_.Value] += @($_.name) }
$h1.GetEnumerator() | where { $_.value.count -gt 1}
Name Value
---- -----
c {6, 3}
a {5, 1}
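The @() around $_.name in that answer is what turns each entry into an array. A small demonstration of what += does with and without it when the keys are integers; the two hashtables $withArray and $withoutArray are throwaway names chosen here for illustration:

```powershell
$h = @{ 1 = 'a'; 2 = 'b'; 3 = 'c'; 4 = 'd'; 5 = 'a'; 6 = 'c' }

$withArray = @{}
$withoutArray = @{}
$h.GetEnumerator() | ForEach-Object {
    # $null + @(key) yields a one-element array, so later += calls append keys
    $withArray[$_.Value] += @($_.Name)
    # Without @(), $null + key yields the number itself, so later += calls add the keys up
    $withoutArray[$_.Value] += $_.Name
}

$withArray['a']      # the two keys 1 and 5 (order may vary)
$withoutArray['a']   # 6, i.e. 1 + 5
```

With string keys, the same mistake would concatenate the keys instead of summing them.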

A slightly different question:
How to list the duplicate items of a PowerShell array
The solution there is similar to the one from Frode F:
$Duplicates = $Array | Group | ? {$_.Count -gt 1} | Select -ExpandProperty Name
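One detail worth knowing with that one-liner: Group-Object compares strings case-insensitively by default. A sketch on a hypothetical sample array, showing the default behavior next to the -CaseSensitive switch:

```powershell
# Hypothetical sample array
$Array = 'a', 'b', 'A', 'c', 'c', 'd'

# Default grouping is case-insensitive, so 'a' and 'A' form one group
$Duplicates = $Array | Group-Object | Where-Object { $_.Count -gt 1 } |
    Select-Object -ExpandProperty Name

# -CaseSensitive keeps 'a' and 'A' apart, so only 'c' is duplicated
$DuplicatesCaseSensitive = $Array | Group-Object -CaseSensitive | Where-Object { $_.Count -gt 1 } |
    Select-Object -ExpandProperty Name

$Duplicates
$DuplicatesCaseSensitive
```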

Related

PowerShell: Get Hashtable Values

I've got this hashtable with these values:
Name Value
---- -----
Bayas_palm_stem_A0311.jpg 1
Bayas_palm_stem_A0312.jpg 2
Bukit_Bangkong_area.tiff 1
BY_and_siblings_A0259.jpg 5
Cassava_camp_A0275.jpg 1
Children_A0115.jpg 6
cip_barau_kubak.jpg 1
How can I get only the Name and Value where Value is greater than 1?
I'm trying with this code, but I'm doing something wrong:
$RT | Group-Object Name , Value | Where-Object {$RT.Values -gt 1}
Thanks a lot for any help.
Use $hashtable.GetEnumerator() to enumerate the individual name-value pairs in the hashtable:
$RT.GetEnumerator() |Where-Object {$_.Value -gt 1}
Beware that if you assign the resulting pairs to a variable, it's no longer a hashtable; it's just an array of individual name-value pairs.
To create a new hashtable with only the name-value pairs that filter through, do:
$RTFiltered = @{}
$RT.GetEnumerator() |Where-Object {$_.Value -gt 1} |ForEach-Object {$RTFiltered.Add($_.Name, $_.Value)}
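To make the difference between the filtered pairs and a rebuilt hashtable concrete, here is a small sketch; the keys 'a.jpg', 'b.jpg' and 'c.jpg' are hypothetical sample data:

```powershell
# Hypothetical sample data
$RT = @{ 'a.jpg' = 1; 'b.jpg' = 2; 'c.jpg' = 5 }

# Filtering the enumerator yields DictionaryEntry objects, not a hashtable
$pairs = @($RT.GetEnumerator() | Where-Object { $_.Value -gt 1 })

# Rebuilding a hashtable restores lookup by key
$RTFiltered = @{}
$pairs | ForEach-Object { $RTFiltered[$_.Key] = $_.Value }

$pairs[0].GetType().FullName   # System.Collections.DictionaryEntry
$RTFiltered['c.jpg']           # 5
```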
Like this:
$RT.getenumerator() | Where-Object {$_.Value -gt 1}
Assuming you have a hashtable defined like this:
$rt = @{
"Bayas_palm_stem_A0311.jpg" = 1
"Bayas_palm_stem_A0312.jpg" = 2
"Bukit_Bangkong_area.tiff" = 1
"BY_and_siblings_A0259.jpg" = 5
"Cassava_camp_A0275.jpg" = 1
"Children_A0115.jpg" = 6
"cip_barau_kubak.jpg" = 1
}
You can call GetEnumerator(), which allows you to iterate through the hashtable.
Once you've got an enumeration of the members, then your normal PowerShell value comparisons will work.
You can get values greater than 1 like this:
$rt.GetEnumerator() | ? Value -gt 1
Name Value
---- -----
BY_and_siblings_A0259.jpg 5
Children_A0115.jpg 6
Bayas_palm_stem_A0312.jpg 2
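The order of that output reflects the hashtable's internal ordering, which is not guaranteed. If order matters, a sketch that sorts the filtered pairs by value and then keeps only the names (the variable names $big and $bigNames are illustrative):

```powershell
$rt = @{
    'Bayas_palm_stem_A0311.jpg' = 1
    'Bayas_palm_stem_A0312.jpg' = 2
    'Bukit_Bangkong_area.tiff'  = 1
    'BY_and_siblings_A0259.jpg' = 5
    'Cassava_camp_A0275.jpg'    = 1
    'Children_A0115.jpg'        = 6
    'cip_barau_kubak.jpg'       = 1
}

# Filter, then sort explicitly instead of relying on hashtable order
$big = $rt.GetEnumerator() | Where-Object Value -gt 1 | Sort-Object Value -Descending

# Just the names (keys), largest value first
$bigNames = $big | ForEach-Object Key

$big
$bigNames
```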

Add up the data if the reference from another file is correct

I have two CSV files that look like this:
test.csv:
"Col1","Col2"
"1111","1"
"1122","2"
"1111","3"
"1121","2"
"1121","2"
"1133","2"
"1133","2"
The second looks like this:
test2.csv:
"Number","signs"
"1111","ABC"
"1122","DEF"
"1111","ABC"
"1121","ABC"
"1133","GHI"
Now the goal is to get a summary of all points from test.csv assigned to the "signs" of test2.csv. The numbers are the reference, as you can see.
The result should be something like this:
ABC = 8
DEF = 2
GHI = 4
I have tried to work this out but cannot reach the goal. What I have so far is:
$var = "C:\PathToCSV"
$csv1 = Import-Csv "$var\test.csv"
$csv2 = Import-Csv "$var\test2.csv"
# Process: group by 'Col1', then sum 'Col2' for each group
# and create output objects on the fly
$test1 = $csv1 | Group-Object Col1 | ForEach-Object {
New-Object psobject -Property @{
Col1 = $_.Name
Sum = ($_.Group | Measure-Object Col2 -Sum).Sum
}
}
But this gives me back the following output:
PS> $test1
Sum Col1
--- ----
4 1111
2 1122
4 1121
4 1133
I am not able to get the summary and the mapping of the signs.
Not sure if I understand your question correctly, but I'm going to assume that for each value from the column "signs" you want to look up the values from the column "Number" in the second CSV and then calculate the sum of the column "Col2" for all matches.
For that I'd build a hashtable with the pre-calculated sums for the unique values from "Col1":
$h1 = @{}
$csv1 | ForEach-Object {
$h1[$_.Col1] += [int]$_.Col2
}
and then build a second hashtable to sum up the lookup results for the values from the second CSV:
$h2 = @{}
$csv2 | ForEach-Object {
$h2[$_.signs] += $h1[$_.Number]
}
However, that produced a different value for "ABC" than what you stated as the desired result in your question when I processed your sample data:
Name Value
---- -----
ABC 12
GHI 4
DEF 2
Or did you mean you want to sum up the corresponding values for the unique numbers for each sign? For that you'd change the second code snippet to something like this:
$h2 = @{}
$csv2 | Group-Object signs | ForEach-Object {
$name = $_.Name
$_.Group | Select-Object -Unique -Expand Number | ForEach-Object {
$h2[$name] += $h1[$_]
}
}
That would produce the desired result from your question:
Name Value
---- -----
ABC 8
GHI 4
DEF 2
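Putting the answer's steps together as one self-contained script may make the difference between the two interpretations easier to see. This sketch inlines the sample data with ConvertFrom-Csv (an assumption for self-containment; with real files you would keep Import-Csv), uses the illustrative names $perRow and $perUniqueNumber, and deduplicates the numbers with Sort-Object -Unique rather than Select-Object -Unique:

```powershell
# The question's sample data, inlined instead of read with Import-Csv
$csv1 = @'
"Col1","Col2"
"1111","1"
"1122","2"
"1111","3"
"1121","2"
"1121","2"
"1133","2"
"1133","2"
'@ | ConvertFrom-Csv

$csv2 = @'
"Number","signs"
"1111","ABC"
"1122","DEF"
"1111","ABC"
"1121","ABC"
"1133","GHI"
'@ | ConvertFrom-Csv

# Sum of Col2 per Col1 number; [int] makes += add numbers instead of concatenating strings
$h1 = @{}
$csv1 | ForEach-Object { $h1[$_.Col1] += [int]$_.Col2 }

# Interpretation 1: every row of test2.csv counts, so the repeated 1111/ABC row is added twice
$perRow = @{}
$csv2 | ForEach-Object { $perRow[$_.signs] += $h1[$_.Number] }

# Interpretation 2: each distinct Number counts once per sign
$perUniqueNumber = @{}
$csv2 | Group-Object signs | ForEach-Object {
    $sign = $_.Name
    $_.Group | ForEach-Object Number | Sort-Object -Unique | ForEach-Object {
        $perUniqueNumber[$sign] += $h1[$_]
    }
}

$perRow
$perUniqueNumber
```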

How to get the value only from the Hashtable in PowerShell?

If I have a hashtable $states = @{ 1 = 15; 2 = 5; 3 = 41 }, the result shows:
Name Value
---- -----
3 41
2 5
1 15
I used $states.GetEnumerator() | sort value -Descending | select -Last 1 to find the minimum value that I need.
The result is:
Name Value
---- -----
2 5
However, I cannot use the value (5) as a new variable to do a calculation, because the result contains both the name and the value. Is there any method to get only the minimum value from the result?
Use the .Values property from the beginning:
$states.Values | Sort-Object -Descending | Select-Object -Last 1
Or expand the .Value property:
$states.GetEnumerator() | sort value -Descending | select -Last 1 -ExpandProperty Value
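A short sketch showing that the extracted value is a plain number usable in arithmetic, with Measure-Object as an alternative way to get the minimum (Measure-Object is swapped in here; it is not part of the answer above):

```powershell
$states = @{ 1 = 15; 2 = 5; 3 = 41 }

# Sort the bare values ascending and take the first: a number, not a Name/Value pair
$min = $states.Values | Sort-Object | Select-Object -First 1

# Alternatively, let Measure-Object compute the minimum
$minMeasured = ($states.Values | Measure-Object -Minimum).Minimum

# Either result can be used in a calculation directly
$total = $min + 10
"Minimum: $min, plus 10: $total"
```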

Count the commas in each line and show the line numbers in a text file

I'm using the following script to get the comma counts.
Get-Content .\myFile |
% { ($_ | Select-String `, -all).matches | measure | select count } |
group -Property count
It returns:
Count Name Group
----- ---- -----
131 85 {@{Count=85}, @{Count=85}, @{Count=85}, @{Count=85}...}
3 86 {@{Count=86}, @{Count=86}, @{Count=86}}
Can I show the line numbers in the Group column instead of @{Count=86}, ...?
The files will have a lot of lines, and the majority of the lines have the same number of commas. I want to group them so that the output is shorter.
Can you use something like this?
$s = @"
this,is,a
test,,
with,
multiple, commas, to, count,
"@
# Convert to a string array, one element per line (what Get-Content returns for a file)
$s = $s -split "`n"
$s | Select-String `, -AllMatches | Select-Object LineNumber, @{n="Count"; e={$_.Matches.Count}} | Group-Object Count
Count Name Group
----- ---- -----
2 2 {@{LineNumber=1; Count=2}, @{LineNumber=2; Count=2}}
1 1 {@{LineNumber=3; Count=1}}
1 4 {@{LineNumber=4; Count=4}}
If you don't want the "Count" property repeated inside each group, you need custom objects, like this:
$s | Select-String `, -AllMatches | Select-Object LineNumber, @{n="Count"; e={$_.Matches.Count}} | Group-Object Count | % {
New-Object psobject -Property @{
"Count" = $_.Name
"LineNumbers" = ($_.Group | Select-Object -ExpandProperty LineNumber)
}
}
Output:
Count LineNumbers
----- -----------
2 {1, 2}
1 3
4 4
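Select-String only emits lines that contain at least one match, so lines without any comma drop out of the grouping above. A sketch that counts the commas per line itself, so zero-comma lines are grouped too; the array stands in for Get-Content .\myFile, and its fifth line is added here for illustration:

```powershell
# Stand-in for: $lines = Get-Content .\myFile
$lines = @(
    'this,is,a'
    'test,,'
    'with,'
    'multiple, commas, to, count,'
    'no commas here'
)

$lineNumber = 0
$grouped = $lines | ForEach-Object {
    $lineNumber++
    # Splitting on ',' yields one more piece than there are commas
    [pscustomobject]@{
        LineNumber = $lineNumber
        Count      = $_.Split(',').Count - 1
    }
} | Group-Object Count | ForEach-Object {
    [pscustomobject]@{
        Count       = [int]$_.Name
        LineNumbers = @($_.Group | ForEach-Object LineNumber)
    }
}

$grouped
```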

PS: Filter selected rows with only max values as output?

I have a variable, $result, containing several rows of data (objects) like this:
PS> $result | ft -auto;
name value
---- -----
a 1
a 2
b 30
b 20
....
What I need is to get, for each name, only the row with max(value), like this filtered output:
PS> $result | ? |ft -auto
name value
---- -----
a 2
b 30
....
I'm not sure which command or filter to use (in place of the ? above) so that I get each name with only its max value.
$result | group name | select name,@{n='value';e={ ($_.group | measure value -max).maximum}}
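A runnable version of that one-liner with full cmdlet names, on hypothetical sample rows shaped like the question's $result:

```powershell
# Hypothetical sample rows
$result = @(
    [pscustomobject]@{ name = 'a'; value = 1 }
    [pscustomobject]@{ name = 'a'; value = 2 }
    [pscustomobject]@{ name = 'b'; value = 30 }
    [pscustomobject]@{ name = 'b'; value = 20 }
)

# One output row per name, holding the largest value in that name's group
$maxPerName = $result |
    Group-Object name |
    Select-Object Name, @{ n = 'value'; e = { ($_.Group | Measure-Object value -Maximum).Maximum } }

$maxPerName | Format-Table -AutoSize
```

Note that this builds new objects with only the name and the maximum; any other properties of the original rows are not carried over.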
This should do the trick:
PS> $result | Foreach {$ht=@{}} `
{if ($_.Value -gt $ht[$_.name].Value) {$ht[$_.Name]=$_}} `
{$ht.Values}
This is essentially using the Begin/Process/End scriptblock parameters of the Foreach-Object cmdlet to stash input objects with a max value based on a key into a hashtable.
Note: watch out for extra spaces after the line continuation character (`); there must not be any.
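The same Begin/Process/End logic can be written with the named -Begin, -Process and -End parameters, which sidesteps the backtick pitfall because each line ends inside an open script block. A sketch on hypothetical sample rows:

```powershell
# Hypothetical sample rows
$result = @(
    [pscustomobject]@{ name = 'a'; value = 1 }
    [pscustomobject]@{ name = 'a'; value = 2 }
    [pscustomobject]@{ name = 'b'; value = 30 }
    [pscustomobject]@{ name = 'b'; value = 20 }
)

$maxRows = $result | ForEach-Object -Begin { $ht = @{} } -Process {
    # The first time a name is seen, $ht[$_.Name] is $null, so .Value is $null too;
    # the sample's positive values compare greater than $null
    if ($_.Value -gt $ht[$_.Name].Value) { $ht[$_.Name] = $_ }
} -End { $ht.Values }

$maxRows | Format-Table -AutoSize
```

Unlike the Group-Object version, this keeps the original row objects, so any extra properties they carry survive the filtering. The order of the output follows the hashtable and is not guaranteed.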