PowerShell - counting the number of times words from a list appear in a CSV column - zero value important

I am trying to count the number of times each word from a list appears in a CSV column, where a zero count is important.
This code only returns a value if the word is in the CSV column; I also want a 0 if the word is not in the column:
Import-Csv C:\Users\Work_PC\Documents\TAT\July\Stock.csv -Header Animal | where {$_.Animal -in $searchTerms} | Group-Object Animal -NoElement
Example Stock.csv:
Animal,Someothervalue,anothervalue,
Cow,1,2,
Sheep,1,3
Pig,1,4
Cow,1,2,
Sheep,1,3
Pig,1,4
Cow,1,2,
Cow,1,2,
Sheep,1,3
Pig,1,4
Cow,1,2,
Example $searchTerms:
Cow
Sheep
Pig
Horse
Donkey
Using the above code returns this value
Count Name
----- ----
5 Cow
3 Sheep
3 Pig
I would like it to return
Count Name
----- ----
5 Cow
3 Sheep
3 Pig
0 Horse
0 Donkey
Thanks

Group-Object will not list items it is unaware of.
Personally, I would use the list of search terms to create the output objects and assign the counts to them. This also allows additional data to be included if required:
$SearchTerms = "Cow","Sheep","Pig","Horse","Donkey"
$GrpObjResults = Import-Csv C:\Users\Work_PC\Documents\TAT\July\Stock.csv -Header Animal | where {$_.Animal -in $searchTerms} | Group-Object Animal -NoElement
$ObjProps = @{
    Name  = [String]
    Count = [int]
}
$Results = @()
Foreach($Term in $SearchTerms){
    $ListEntry = New-Object -TypeName PSObject -Property $ObjProps
    $ListEntry.Name = $Term
    $ListEntry.Count = ($GrpObjResults | Where {$_.Name -eq $Term}).Count
    $Results += $ListEntry
}
$Results
This will give the desired output of:
Count Name
----- ----
5 Cow
3 Sheep
3 Pig
0 Horse
0 Donkey

Judging from your example, the CSV file already has headers, so you could simply iterate over the search terms and use Select-Object with calculated properties to get what you seek:
$searchTerms = 'Cow','Sheep','Pig','Horse','Donkey'
$csv = Import-Csv 'C:\Users\Work_PC\Documents\TAT\July\Stock.csv'
foreach ($term in $searchTerms) {
    # @(...) forces an array, so .Count is reliable even for 0 or 1 matching rows
    '' | Select-Object @{Name = 'Count'; Expression = {@($csv | Where-Object {$_.Animal -eq $term}).Count}},
                       @{Name = 'Name'; Expression = {$term}}
}
Output:
Count Name
----- ----
5 Cow
3 Sheep
3 Pig
0 Horse
0 Donkey
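
As a further sketch (not taken from either answer above), the zero counts can also be produced by seeding a counter for every search term and then making a single pass over the rows. The inline sample data and the names $counts and $results are assumptions for illustration; [ordered] and [pscustomobject] need PowerShell 3 or later.

```powershell
# Inline stand-in for Stock.csv (assumption: the first column is named Animal).
$csv = @'
Animal,Someothervalue,anothervalue
Cow,1,2
Sheep,1,3
Pig,1,4
Cow,1,2
Sheep,1,3
Pig,1,4
Cow,1,2
Cow,1,2
Sheep,1,3
Pig,1,4
Cow,1,2
'@ | ConvertFrom-Csv

$searchTerms = 'Cow', 'Sheep', 'Pig', 'Horse', 'Donkey'

# Seed every search term with 0 so missing animals still appear in the result.
$counts = [ordered]@{}
foreach ($term in $searchTerms) { $counts[$term] = 0 }

# Single pass over the rows: only animals that are search terms are counted.
foreach ($row in $csv) {
    if ($counts.Contains($row.Animal)) { $counts[$row.Animal] += 1 }
}

# Emit one Count/Name object per search term, in list order.
$results = foreach ($term in $searchTerms) {
    [pscustomobject]@{ Count = $counts[$term]; Name = $term }
}
$results
```

Unlike the per-term Where-Object scans, this reads the CSV rows once, which may matter for large files.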

Related

Powershell Get Hashtable Values

I've got this Hashtable with these values:
Name Value
---- -----
Bayas_palm_stem_A0311.jpg 1
Bayas_palm_stem_A0312.jpg 2
Bukit_Bangkong_area.tiff 1
BY_and_siblings_A0259.jpg 5
Cassava_camp_A0275.jpg 1
Children_A0115.jpg 6
cip_barau_kubak.jpg 1
How can I get only the Name and Value where Value is greater than 1?
I'm trying with this code, but I'm doing something wrong:
$RT | Group-Object Name , Value | Where-Object {$RT.Values -gt 1}
Thanks a lot for any help.
Use $hashtable.GetEnumerator() to enumerate the individual name-value pairs in the hashtable:
$RT.GetEnumerator() |Where-Object {$_.Value -gt 1}
Beware that if you assign the resulting pairs to a variable, it's no longer a hashtable - it's just an array of individual name-value pairs.
To create a new hashtable with only the name-value pairs that filter through, do:
$RTFiltered = @{}
$RT.GetEnumerator() |Where-Object {$_.Value -gt 1} |ForEach-Object {$RTFiltered.Add($_.Name, $_.Value)}
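
The difference between the filtered pairs and a real hashtable can be checked with a small sketch. The three-entry $RT below is hypothetical sample data, and $pairs is an illustrative name:

```powershell
# Hypothetical sample hashtable (not the asker's full data).
$RT = @{ 'a.jpg' = 1; 'b.jpg' = 2; 'c.jpg' = 5 }

# Filtering the enumerator yields individual DictionaryEntry objects.
$pairs = $RT.GetEnumerator() | Where-Object { $_.Value -gt 1 }
$pairs.GetType().Name      # Object[] (two pairs pass the filter)
$pairs[0].GetType().Name   # DictionaryEntry

# Rebuild a hashtable from the surviving pairs.
$RTFiltered = @{}
$pairs | ForEach-Object { $RTFiltered[$_.Key] = $_.Value }
$RTFiltered.GetType().Name # Hashtable
```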
Like this:
$RT.getenumerator() | Where-Object {$_.Value -gt 1}
Assuming you have a Hashtable defined like this.
$rt = @{
    "Bayas_palm_stem_A0311.jpg" = 1
    "Bayas_palm_stem_A0312.jpg" = 2
    "Bukit_Bangkong_area.tiff"  = 1
    "BY_and_siblings_A0259.jpg" = 5
    "Cassava_camp_A0275.jpg"    = 1
    "Children_A0115.jpg"        = 6
    "cip_barau_kubak.jpg"       = 1
}
You can call GetEnumerator() which allows you to iterate through the Hashtable.
Once you've got an enumeration of the members, then your normal PowerShell value comparisons will work.
You can get values greater than 1 like this:
$rt.GetEnumerator() | ? Value -gt 1
Name Value
---- -----
BY_and_siblings_A0259.jpg 5
Children_A0115.jpg 6
Bayas_palm_stem_A0312.jpg 2

Powershell Get-Random with Constraints

I'm currently using the Get-Random function of Powershell to randomly pull a set number of rows from a csv. I need to create a constraint that says if one id is pulled, find the other ids that match it and pull their value.
Here is what I currently have:
$chosenOnes = Import-CSV C:\Temp\pk2.csv | sort{Get-Random} | Select -first 6
$i = 1
$count = $chosenOnes | Group-Object householdID
foreach ($row in $count)
{
    if ($row.count -gt 1)
    {
        $students = $row.Group.Student
        foreach ($student in $students)
        {
            $name = $student.tostring()
            #...do something
            $i = $i + 1
        }
    }
    else
    {
        $name = $row.Group.Student
        if($i -le 5)
        {
            #...do something
        }
        else
        {
            #...do something
        }
        $i = $i + 1
    }
}
Example dataset
ID,name
165,Ernest Hemingway
1204,Mark Twain
1578,Stephen King
1634,Charles Dickens
1726,George Orwell
7751,John Doe
7751,Tim Doe
In this example, there are 7 rows but I'm randomly selecting 6 in my code. What needs to happen is that when ID=7751 is selected, I must return both rows where ID=7751. The IDs cannot be statically set in the code.
Use Get-Random directly, with -Count, to extract a given number of random elements from a collection.
$allRows = Import-CSV C:\Temp\pk2.csv
$chosenHouseholdIDs = ($allRows | Get-Random -Count 6).householdID
Then filter all rows by whether their householdID column contains one of the 6 randomly selected rows' householdID values (PSv3+ syntax), using the -in array-containment operator:
$allRows | Where-Object householdID -in $chosenHouseholdIDs
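
A minimal self-contained sketch of these two steps, using the question's sample data inline. Note the assumptions: the column is named ID here (as in the sample data) rather than householdID, and the sample size of 3 is arbitrary.

```powershell
# The question's sample data, inlined so the sketch runs without a file.
$allRows = @'
ID,name
165,Ernest Hemingway
1204,Mark Twain
1578,Stephen King
1634,Charles Dickens
1726,George Orwell
7751,John Doe
7751,Tim Doe
'@ | ConvertFrom-Csv

# Draw 3 random rows and keep their de-duplicated IDs.
$chosenIDs = @(($allRows | Get-Random -Count 3).ID | Select-Object -Unique)

# One pass over all rows: keep every row whose ID was drawn, so both
# 7751 rows are returned whenever either of them was selected.
$result = @($allRows | Where-Object ID -in $chosenIDs)
$result
```

Because rows sharing an ID are pulled in afterwards, the result can contain more rows than were drawn.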
Optional reading: performance considerations:
$allRows | Get-Random -Count 6 is not only conceptually simpler, but also much faster than $allRows | Sort-Object { Get-Random } | Select-Object -First 6
Using the Time-Command function to compare the performance of two approaches, using a 1000-row test file with 10 columns yields the following sample timings on my Windows 10 VM in Windows PowerShell - note that the Sort-Object { Get-Random }-based solution is more than 15(!) times slower:
Factor Secs (100-run avg.) Command                                                        TimeSpan
------ ------------------- -------                                                        --------
1.00   0.007               $allRows | Get-Random -Count 6                                 00:00:00.0072520
15.65  0.113               $allRows | Sort-Object { Get-Random } | Select-Object -First 6 00:00:00.1134909
Similarly, a single pass through all rows to find matching IDs via array-containment operator -in performs much better than looping over the randomly selected IDs and searching all rows for each.
I tried sticking with your beginning and came up with this.
$Array = Import-CSV C:\test\StudtentTest.csv
$Array | Sort{Get-Random} | select -first 2 | %{
    $id = $_.id
    $Array | ?{$_.id -eq $id} | %{
        $_
    }
}
$Array holds your parsed CSV.
We pipe it in, sort it randomly, and select the first 2 rows (in this case).
We save the ID of each selected row in $id, then search the array for that ID and display every row that matches.
If the same ID matches more than one row, you end up with something like:
ID name
-- ----
7751 John Doe
7751 Tim Doe
1634 Charles Dickens

Add up the data if the reference from another file is correct

I have two CSV Files which look like this:
test.csv:
"Col1","Col2"
"1111","1"
"1122","2"
"1111","3"
"1121","2"
"1121","2"
"1133","2"
"1133","2"
The second looks like this:
test2.csv:
"Number","signs"
"1111","ABC"
"1122","DEF"
"1111","ABC"
"1121","ABC"
"1133","GHI"
Now the goal is to get a sum of all points from test.csv for each of the "signs" in test2.csv. The numbers serve as the reference between the two files, as you can see.
Should be something like this:
ABC = 8
DEF = 2
GHI = 4
I have tried to work this out but cannot reach the goal. What I have so far is:
$var = "C:\PathToCSV"
$csv1 = Import-Csv "$var\test.csv"
$csv2 = Import-Csv "$var\test2.csv"
# Process: group by 'Col1', then sum 'Col2' for each group
# and create output objects on the fly
$test1 = $csv1 | Group-Object Col1 | ForEach-Object {
    New-Object psobject -Property @{
        Col1 = $_.Name
        Sum  = ($_.Group | Measure-Object Col2 -Sum).Sum
    }
}
But this gives me back the following output:
Ps> $test1
Sum Col1
--- ----
4 1111
2 1122
4 1121
4 1133
I am not able to get the summary and the mapping of the signs.
Not sure if I understand your question correctly, but I'm going to assume that for each value from the column "signs" you want to look up the values from the column "Number" in the second CSV and then calculate the sum of the column "Col2" for all matches.
For that I'd build a hashtable with the pre-calculated sums for the unique values from "Col1":
$h1 = @{}
$csv1 | ForEach-Object {
    $h1[$_.Col1] += [int]$_.Col2
}
and then build a second hashtable to sum up the lookup results for the values from the second CSV:
$h2 = @{}
$csv2 | ForEach-Object {
    $h2[$_.signs] += $h1[$_.Number]
}
However, that produced a different value for "ABC" than what you stated as the desired result in your question when I processed your sample data:
Name Value
---- -----
ABC 12
GHI 4
DEF 2
Or did you mean you want to sum up the corresponding values for the unique numbers for each sign? For that you'd change the second code snippet to something like this:
$h2 = @{}
$csv2 | Group-Object signs | ForEach-Object {
    $name = $_.Name
    $_.Group | Select-Object -Unique -Expand Number | ForEach-Object {
        $h2[$name] += $h1[$_]
    }
}
That would produce the desired result from your question:
Name Value
---- -----
ABC 8
GHI 4
DEF 2
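
The same totals can also be reached from the other direction. As a sketch, assuming every Number maps to exactly one sign (true for the sample data, but an assumption in general), build a Number-to-sign lookup first and then add each Col2 value to its sign's total in a single pass. The names $signOf and $totals are illustrative:

```powershell
# The question's two files, inlined so the sketch runs without them.
$csv1 = @'
"Col1","Col2"
"1111","1"
"1122","2"
"1111","3"
"1121","2"
"1121","2"
"1133","2"
"1133","2"
'@ | ConvertFrom-Csv

$csv2 = @'
"Number","signs"
"1111","ABC"
"1122","DEF"
"1111","ABC"
"1121","ABC"
"1133","GHI"
'@ | ConvertFrom-Csv

# Number -> sign lookup; repeated rows simply overwrite the same entry.
$signOf = @{}
foreach ($row in $csv2) { $signOf[$row.Number] = $row.signs }

# Single pass over test.csv: add each Col2 value to its sign's total.
$totals = @{}
foreach ($row in $csv1) {
    $sign = $signOf[$row.Col1]
    if ($null -ne $sign) { $totals[$sign] += [int]$row.Col2 }
}
$totals
```

Rows in test.csv whose number has no entry in test2.csv are skipped by the $null check.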

powershell compare two files and list their columns with side indicator as match/mismatch

I have seen a PowerShell script that does what I have in mind. What I would like to add is another column that translates the side indicators ("==", "<=", "=>") into a status: MATCH (if "==") and MISMATCH (if "<=" or "=>").
Any advice on how I would do this?
Here is the link of the script (Credits to Florent Courtay)
How can i reorganise powershell's compare-object output?
$a = Compare-Object (Import-Csv 'C:\temp\f1.csv') (Import-Csv 'C:\temp\f2.csv') -property Header,Value
$a | Group-Object -Property Header | % { New-Object -TypeName psobject -Property @{Header=$_.name;newValue=$_.group[0].Value;oldValue=$_.group[1].Value}}
========================================================================
The output I have in mind:
Header1 Old Value New Value STATUS
------ --------- --------- -----------
String1 Value 1 Value 2 MATCH
String2 Value 3 Value 4 MATCH
String3 NA Value 5 MISMATCH
String4 Value 6 NA MISMATCH
Here's a self-contained solution; simply replace the ConvertFrom-Csv calls with your Import-Csv calls:
# Sample CSV input.
$csv1 = @'
Header,Value
a,1
b,2
c,3
'@
$csv2 = @'
Header,Value
a,1a
b,2
d,4
'@
Compare-Object (ConvertFrom-Csv $csv1) (ConvertFrom-Csv $csv2) -Property Header, Value |
  Group-Object Header | Sort-Object Name | ForEach-Object {
    $newValIndex, $oldValIndex = ((1, 0), (0, 1))[$_.Group[0].SideIndicator -eq '=>']
    [pscustomobject] @{
      Header   = $_.Name
      OldValue = ('NA', $_.Group[$oldValIndex].Value)[$null -ne $_.Group[$oldValIndex].Value]
      NewValue = ('NA', $_.Group[$newValIndex].Value)[$null -ne $_.Group[$newValIndex].Value]
      Status   = ('MISMATCH', 'MATCH')[$_.Group.Count -gt 1]
    }
  }
The above yields:
Header OldValue NewValue Status
------ -------- -------- ------
a 1 1a MATCH
c 3 NA MISMATCH
d NA 4 MISMATCH
Note:
The assumption is that a given Header column value appears at most once in each input file.
The Sort-Object Name call is needed to sort the output by Header values (thanks, LotPings), because, due to how Compare-Object orders its output (right-side-only items first), the order of groups created by Group-Object would not automatically reflect the 1st CSV's order of header values (d would appear before c).
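
If the goal is a literal mapping of side indicators, as the question describes, one possible variation is to add -IncludeEqual, so that identical rows are emitted with "==", and then translate SideIndicator row by row. This is a sketch under that reading of the question, not the method of the answer above: it produces one output row per compared row rather than one per Header, and the names $old, $new and $rows are illustrative.

```powershell
# Same sample input as above, parsed up front.
$old = @'
Header,Value
a,1
b,2
c,3
'@ | ConvertFrom-Csv

$new = @'
Header,Value
a,1a
b,2
d,4
'@ | ConvertFrom-Csv

# -IncludeEqual also emits rows present on both sides, marked "==".
$rows = Compare-Object $old $new -Property Header, Value -IncludeEqual |
    Sort-Object Header |
    Select-Object Header, Value, SideIndicator,
        @{ Name = 'Status'; Expression = {
            if ($_.SideIndicator -eq '==') { 'MATCH' } else { 'MISMATCH' }
        } }
$rows
```

With this input, only the b row is identical in both files, so it is the only row with Status MATCH.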

Pivot an array of array?

The following code defines a matrix.
$a = @('a','b','x',10),
     @('a','b','y',20),
     @('c','e','x',50),
     @('c','e','y',30)
$a | % { "[$_]"}
I want to pivot the array by x and y. The expected result array should be
[a b 10 20]
[c e 50 30]
- - -- --
x y
I think it needs Group-Object and then mapping. How do I use Group-Object on an array?
(BTW, why has the question been downvoted twice?)
You can't use Group-Object with an array (at least not the way you want) since Group-Object works on object properties. A workaround is to organize your rows into a label that you want to group on, followed by the values to assign to the group. Then you can group on the label:
$a | %{
    new-object PsObject -prop @{"label" = "$($_[0]),$($_[1])"; value=@{ $_[2]=$_[3]}}
} | Group-Object label
So, then you have a group with your entries stored as an array of hashtables within each group:
Count Name Group
----- ---- -----
2 a,b {@{value=System.Collections.Hashtable; label=a,b}, @{value=System.Collections.Hashtable; label=a,b}}
2 c,e {@{value=System.Collections.Hashtable; label=c,e}, @{value=System.Collections.Hashtable; label=c,e}}
You can then expand out each row to get the info you desire:
$a | %{
    new-object PsObject -prop @{"label" = "$($_[0]),$($_[1])"; value=@{ $_[2]=$_[3]}}
} |
group label | % {
    "[$(@($_.Name -split ",") + @($_.Group.value.values))]"
}
which gives:
[a b 10 20]
[c e 50 30]
To answer your second comment: no, the above won't guarantee the order. To guarantee it, you'll have to be explicit:
$a | %{
    new-object PsObject -prop @{"label" = "$($_[0]),$($_[1])"; value=@{ $_[2]=$_[3]}}
} |
group label | % {
    "[$(@($_.Name -split ",") + @($_.Group.value.x, $_.Group.value.y))]"
}
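
As an alternative sketch, Group-Object also accepts a script block as its grouping expression, so the rows can be grouped on a key computed from their first two elements without wrapping them in objects first. The names $byKey and $pivoted are illustrative, and the sketch assumes the pivot keys are always 'x' and 'y':

```powershell
$a = @('a','b','x',10),
     @('a','b','y',20),
     @('c','e','x',50),
     @('c','e','y',30)

# Group on a computed key; each group's .Group holds the original row arrays.
$pivoted = $a | Group-Object { "$($_[0]),$($_[1])" } | ForEach-Object {
    # Collect this group's x/y values by their key.
    $byKey = @{}
    foreach ($row in $_.Group) { $byKey[$row[2]] = $row[3] }

    # The leading comma emits the row as one array instead of unrolling it.
    , @($_.Group[0][0], $_.Group[0][1], $byKey['x'], $byKey['y'])
}

$pivoted | ForEach-Object { "[$_]" }
```

Reading the values with $byKey['x'] and $byKey['y'] fixes the column order explicitly, in the same spirit as the last snippet above.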