Group and Sum CSV with unknown number of columns - powershell

Wondering if someone would be able to help me. I'm trying to import, group, sum and then export a CSV. The problem is that my CSV has an unknown number of columns in the following format.
GroupA,GroupB,GroupC,ValueA,ValueB,ValueC,ValueD...
GroupA, B and C are constant and the fields I want to group by - I know the names of these fields in advance. The problem is there are an unknown number of Value columns - all of which I want to Sum (and don't know the names of in advance.)
I'm comfortable getting this code working if I know the name of the Value fields and have a fixed number of Value Fields. But I'm struggling to get code for unknown names and number of columns.
$csvImport = import-csv 'C:\input.csv'
$csvGrouped = $csvImport | Group-Object -property GroupA,GroupB,GroupC
$csvGroupedFinal = $csvGrouped | Select-Object @{Name = 'GroupA';Expression={$_.Values[0]}},
    @{Name = 'GroupB';Expression={$_.Values[1]}},
    @{Name = 'GroupC';Expression={$_.Values[2]}},
    @{Name = 'ValueA' ;Expression={
        ($_.Group|Measure-Object 'ValueA' -Sum).Sum
    }}
$csvGroupedFinal | Export-Csv 'C:\output.csv' -NoTypeInformation
Example Input Data -
GroupA, GroupB, Value A
Sam, Apple, 10
Sam, Apple, 20
Sam, Orange, 50
Ian, Apple, 15
Output Data -
GroupA, GroupB, Value A
Sam, Apple, 30
Sam, Orange, 50
Ian, Apple, 15

The following script should work. Pay attention to the $FixedNames variable:
$csvImport = @"
Group A,Group B,Value A
sam,apple,10
sam,apple,20
sam,orange,50
ian,apple,15
"@ | ConvertFrom-Csv
$FixedNames = @('Group A', 'Group B', 'Group C')
# $aux = ($csvImport|Get-Member -MemberType NoteProperty).Name ### sorted (wrong)
$aux = ($csvImport[0].psobject.Properties).Name ### not sorted
$auxGrpNames = @( $aux | Where-Object {$_ -in $FixedNames})
$auxValNames = @( $aux | Where-Object {$_ -notin $FixedNames})
$csvGrouped = $csvImport | Group-Object -property $auxGrpNames
$csvGroupedFinal = $csvGrouped |
ForEach-Object {
($_.Name.Replace(', ',','), (($_.Group |
Measure-Object -Property $auxValNames -Sum
).Sum -join ',')) -join ','
} | ConvertFrom-Csv -Header $aux
$csvGroupedFinal
Tested likewise for
$csvImport = @"
Group A,Group B,Value A,Value B
sam,apple,10,1
sam,apple,20,
sam,orange,50,5
ian,apple,15,51
"@ | ConvertFrom-Csv
as well as for more complex data of Group A,Group B,Group C,Value A,Value B header.
Edit: updated according to LotPings' helpful comment.

After importing, this script splits the properties (columns) into groups and values.
It groups dynamically and sums only the value fields, independent of their number.
The input ordering is maintained with a final Select-Object.
## Q:\Test\2019\01\17\SO_54237887.ps1
$csvImport = Import-Csv '.\input.csv'
$Cols = ($csvImport[0].psobject.Properties).Name
# get list of group columns by name and wildcard
$GroupCols = $Cols | Where-Object {$_ -like 'Group*'}
# a different approach would be to select a number of leading columns
# $GroupCols = $Cols[0..1]
$ValueCols = $Cols | Where-Object {$_ -notin $GroupCols}
$OutCols = ,'Groups' + $ValueCols
$csvGrouped = $csvImport | Group-Object $GroupCols | ForEach-Object{
$Props = @{Groups=$_.Name}
ForEach ($ValCol in $ValueCols){
$Props.Add($ValCol,($_.Group|Measure-Object $ValCol -Sum).Sum)
}
[PSCustomObject]$Props
}
$csvGrouped | Select-Object $OutCols
With this sample input file
GroupA GroupB ValueA ValueB
------ ------ ------ ------
Sam    Apple  10     15
Sam    Apple  20     25
Sam    Orange 50     75
Ian    Apple  15     20
Sample output for any number of Groups and values
Groups      ValueA ValueB
------      ------ ------
Sam, Apple      30     40
Sam, Orange     50     75
Ian, Apple      15     20
Without any change in code, it also processes the data from Hassan's answer:
Groups      ValueA ValueB ValueC
------      ------ ------ ------
Sam, Apple      30      4     20
Sam, Orange     50      4      5
Ian, Apple      15      3      3
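If the group columns should stay as separate columns in the output (as in the question's desired output) rather than being joined into one Groups string, a possible variation is sketched below. It assumes the group column names are known in advance, as the question states, and it uses inline sample data in place of Import-Csv. The names $GroupCols, $ValueCols and $result are illustrative only.

```powershell
# Inline sample data; in practice this would come from Import-Csv.
$csvImport = @"
GroupA,GroupB,ValueA,ValueB
Sam,Apple,10,15
Sam,Apple,20,25
Sam,Orange,50,75
Ian,Apple,15,20
"@ | ConvertFrom-Csv

# Assumption: the group column names are known up front.
$GroupCols = @('GroupA', 'GroupB')
# Every other column, in file order, is treated as a value column to sum.
$ValueCols = @($csvImport[0].psobject.Properties.Name |
    Where-Object { $_ -notin $GroupCols })

$result = $csvImport | Group-Object -Property $GroupCols | ForEach-Object {
    $group = $_
    $props = [ordered]@{}
    # Copy the group keys from the first row of the group.
    foreach ($col in $GroupCols) { $props[$col] = $group.Group[0].$col }
    # Sum each value column separately.
    foreach ($col in $ValueCols) {
        $props[$col] = ($group.Group | Measure-Object -Property $col -Sum).Sum
    }
    [PSCustomObject]$props
}

$result | Format-Table
```

Because the property bag is an [ordered] dictionary, the output columns follow the file's column order without a final Select-Object, and $result can be piped straight to Export-Csv.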

script1.ps1
Import-Csv 'input.csv' | `
Group-Object -Property GroupA,GroupB | `
% {$b=$_.name -split ', ';$c=($_.group | `
Measure-Object -Property Value* -Sum).Sum;
[PScustomobject]@{GroupA=$b[0];
GroupB=$b[1];
Sum=($c | Measure-Object -Sum).Sum }}
input.csv
GroupA, GroupB, ValueA, ValueB, ValueC
Sam, Apple, 10, 1, 10
Sam, Apple, 20, 3, 10
Sam, Orange, 50, 4, 5
Ian, Apple, 15, 3, 3
OUTPUT
PS D:\coding> .\script1.ps1
GroupA GroupB Sum
------ ------ ---
Sam    Apple   54
Sam    Orange  59
Ian    Apple   21

Related

Add up the data if the reference from another file is correct

I have two CSV Files which look like this:
test.csv:
"Col1","Col2"
"1111","1"
"1122","2"
"1111","3"
"1121","2"
"1121","2"
"1133","2"
"1133","2"
The second looks like this:
test2.csv:
"Number","signs"
"1111","ABC"
"1122","DEF"
"1111","ABC"
"1121","ABC"
"1133","GHI"
The goal is to sum the points from test.csv for each of the "signs" in test2.csv. The numbers are the link between the two files, as you can see.
Should be something like this:
ABC = 8
DEF = 2
GHI = 4
I have tried this but cannot reach the goal. What I have so far is:
$var = "C:\PathToCSV"
$csv1 = Import-Csv "$var\test.csv"
$csv2 = Import-Csv "$var\test2.csv"
# Process: group by 'Item' then sum 'Average' for each group
# and create output objects on the fly
$test1 = $csv1 | Group-Object Col1 | ForEach-Object {
New-Object psobject -Property @{
Col1 = $_.Name
Sum = ($_.Group | Measure-Object Col2 -Sum).Sum
}
}
But this gives me back the following output:
Ps> $test1
Sum Col1
--- ----
4 1111
2 1122
4 1121
4 1133
I am not able to get the summary and the mapping of the signs.
Not sure if I understand your question correctly, but I'm going to assume that for each value from the column "signs" you want to lookup the values from the column "Number" in the second CSV and then calculate the sum of the column "Col2" for all matches.
For that I'd build a hashtable with the pre-calculated sums for the unique values from "Col1":
$h1 = @{}
$csv1 | ForEach-Object {
$h1[$_.Col1] += [int]$_.Col2
}
and then build a second hashtable to sum up the lookup results for the values from the second CSV:
$h2 = @{}
$csv2 | ForEach-Object {
$h2[$_.signs] += $h1[$_.Number]
}
However, that produced a different value for "ABC" than what you stated as the desired result in your question when I processed your sample data:
Name Value
---- -----
ABC 12
GHI 4
DEF 2
Or did you mean you want to sum up the corresponding values for the unique numbers for each sign? For that you'd change the second code snippet to something like this:
$h2 = @{}
$csv2 | Group-Object signs | ForEach-Object {
$name = $_.Name
$_.Group | Select-Object -Unique -Expand Number | ForEach-Object {
$h2[$name] += $h1[$_]
}
}
That would produce the desired result from your question:
Name Value
---- -----
ABC 8
GHI 4
DEF 2
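For comparison, both variants can be run side by side on the sample data. The following is only a sketch: it replaces the Import-Csv calls with inline data, and the names $sumByNumber, $perRow and $perUnique are made up for the illustration.

```powershell
$csv1 = @"
Col1,Col2
1111,1
1122,2
1111,3
1121,2
1121,2
1133,2
1133,2
"@ | ConvertFrom-Csv

$csv2 = @"
Number,signs
1111,ABC
1122,DEF
1111,ABC
1121,ABC
1133,GHI
"@ | ConvertFrom-Csv

# Sum of Col2 per unique Col1 value.
$sumByNumber = @{}
foreach ($row in $csv1) { $sumByNumber[$row.Col1] += [int]$row.Col2 }

# Variant 1: every row of the second file contributes (duplicate numbers count again).
$perRow = @{}
foreach ($row in $csv2) { $perRow[$row.signs] += $sumByNumber[$row.Number] }

# Variant 2: each distinct Number contributes only once per sign.
$perUnique = @{}
foreach ($g in ($csv2 | Group-Object signs)) {
    foreach ($n in ($g.Group.Number | Select-Object -Unique)) {
        $perUnique[$g.Name] += $sumByNumber[$n]
    }
}

$perRow['ABC']     # 12
$perUnique['ABC']  # 8

# Print the second variant in the "sign = total" form used in the question.
$perUnique.GetEnumerator() | Sort-Object Key |
    ForEach-Object { '{0} = {1}' -f $_.Key, $_.Value }
```

The difference comes only from the duplicated "1111","ABC" row in test2.csv: counted per row it adds the total for 1111 twice, counted per unique number it adds it once.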

powershell compare two files and list their columns with side indicator as match/mismatch

I have seen a PowerShell script that does most of what I have in mind. What I would like to add is another column that translates the side indicators ("==", "<=", "=>") into a status: MATCH (if "==") and MISMATCH (if "<=" or "=>").
Any advice on how I would do this?
Here is the link of the script (Credits to Florent Courtay)
How can i reorganise powershell's compare-object output?
$a = Compare-Object (Import-Csv 'C:\temp\f1.csv') (Import-Csv 'C:\temp\f2.csv') -property Header,Value
$a | Group-Object -Property Header | % { New-Object -TypeName psobject -Property @{Header=$_.name;newValue=$_.group[0].Value;oldValue=$_.group[1].Value}}
========================================================================
The output I have in mind:
Header1 Old Value New Value STATUS
------ --------- --------- -----------
String1 Value 1 Value 2 MATCH
String2 Value 3 Value 4 MATCH
String3 NA Value 5 MISMATCH
String4 Value 6 NA MISMATCH
Here's a self-contained solution; simply replace the ConvertFrom-Csv calls with your Import-Csv calls:
# Sample CSV input.
$csv1 = @'
Header,Value
a,1
b,2
c,3
'@
$csv2 = @'
Header,Value
a,1a
b,2
d,4
'@
Compare-Object (ConvertFrom-Csv $csv1) (ConvertFrom-Csv $csv2) -Property Header, Value |
Group-Object Header | Sort-Object Name | ForEach-Object {
$newValIndex, $oldValIndex = ((1, 0), (0, 1))[$_.Group[0].SideIndicator -eq '=>']
[pscustomobject] @{
Header = $_.Name
OldValue = ('NA', $_.Group[$oldValIndex].Value)[$null -ne $_.Group[$oldValIndex].Value]
NewValue = ('NA', $_.Group[$newValIndex].Value)[$null -ne $_.Group[$newValIndex].Value]
Status = ('MISMATCH', 'MATCH')[$_.Group.Count -gt 1]
}
}
The above yields:
Header OldValue NewValue Status
------ -------- -------- ------
a 1 1a MATCH
c 3 NA MISMATCH
d NA 4 MISMATCH
Note:
The assumption is that a given Header column value appears at most once in each input file.
The Sort-Object Name call is needed to sort the output by Header values (thanks, LotPings), because, due to how Compare-Object orders its output (right-side-only items first), the order of groups created by Group-Object would not automatically reflect the 1st CSV's order of header values (d would appear before c).
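In the output above, Status is MATCH whenever a header appears in both files, even when the values differ (as for a), and rows whose header and value are identical in both files (b) are not listed at all. If the status should instead say whether the two values are equal, one alternative that does not use Compare-Object is to index both files by Header and compare the values directly. This is a sketch of a different technique, not the answer's method; the names $old, $new, $oldByHeader, $newByHeader and $report are illustrative.

```powershell
$old = @'
Header,Value
a,1
b,2
c,3
'@ | ConvertFrom-Csv

$new = @'
Header,Value
a,1a
b,2
d,4
'@ | ConvertFrom-Csv

# Index each file by Header (assumes each Header occurs at most once per file).
$oldByHeader = @{}
foreach ($row in $old) { $oldByHeader[$row.Header] = $row.Value }
$newByHeader = @{}
foreach ($row in $new) { $newByHeader[$row.Header] = $row.Value }

# Walk the sorted union of headers and compare the two values.
$report = @($oldByHeader.Keys) + @($newByHeader.Keys) | Sort-Object -Unique | ForEach-Object {
    $o = $oldByHeader[$_]
    $n = $newByHeader[$_]
    [pscustomobject]@{
        Header   = $_
        OldValue = if ($null -ne $o) { $o } else { 'NA' }
        NewValue = if ($null -ne $n) { $n } else { 'NA' }
        Status   = if ($o -eq $n) { 'MATCH' } else { 'MISMATCH' }
    }
}

$report | Format-Table
```

With the sample data this lists all four headers: b is a MATCH, while a, c and d are MISMATCH. Note that -eq compares strings case-insensitively; -ceq would make the comparison case-sensitive.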

Conditional criteria in powershell group measure-object?

I have data in this shape:
externalName,day,workingHours,hoursAndMinutes
PRJF,1,11,11:00
PRJF,2,11,11:00
PRJF,3,0,0:00
PRJF,4,0,0:00
CFAW,1,11,11:00
CFAW,2,11,11:00
CFAW,3,11,11:00
CFAW,4,11,11:00
CFAW,5,0,0:00
CFAW,6,0,0:00
and so far code is
$gdata = Import-csv $filepath\$filename | Group-Object -Property Externalname;
$test = @()
$test += foreach($rostername in $gdata) {
$rostername.Group | Select -Unique externalName,
@{Name = 'AllDays';Expression = {(($rostername.Group) | measure -Property day).count}},
}
$test;
What I can't work out is how to do a conditional count of the lines where day is non-zero.
The aim is to produce two lines:
PRJF, 4, 2, 11
CFAW, 6, 4, 11
i.e. Roster name, roster length, days on, average hours worked per day on.
You need a Where-Object to filter for non-zero workingHours.
I'd use a [PSCustomObject] to generate a new table.
EDIT: a bit more efficient with only one Measure-Object.
## Q:\Test\2018\08\06\SO_51700660.ps1
$filepath = 'Q:\Test\2018\08\06'
$filename = 'SO_S1700660.csv'
$gdata = Import-Csv (Join-Path $filepath $filename) | Group-Object -Property Externalname
$test = ForEach($Roster in $gdata) {
$WH = ($Roster.Group.Workinghours|Where-Object {$_ -ne 0}|Measure-Object -Ave -Sum)
[PSCustomObject]@{
RosterName = $Roster.Name
RosterLength = $Roster.Count
DaysOn = $WH.count
AvgHours = $WH.Average
TotalHours = $WH.Sum
}
}
$test | Format-Table
Sample output:
> .\SO_51700660.ps1
RosterName RosterLength DaysOn AvgHours TotalHours
---------- ------------ ------ -------- ----------
PRJF 4 2 11 22
CFAW 6 4 11 44

Group, get subtotal and output the final result as a table?

I have following powershell script.
$groups = gc $fn |
Select -Property @{name='G1'; expression={$_.SubString(340, 7)}},
@{name='G2'; expression={$_.SubString(32, 2)}},
@{name='V1'; expression={$_.SubString(420, 8)}},
@{name='V2'; expression={$_.SubString(43, 11)}} |
group G1,G2
$groups | % {
$g = $_.Group | ? { [float]($_.V1) -ne 0 } | measure V1 -Sum #V2???
$_.Name, $g
}
However I want the result as following. How to include the sum of V2? How to generate the object list for the final result?
Name SumOfV1 SumOfV2
G1, G2 3243243 2432432
.....
Workable example:
ps | select -Property @{name='g1'; expression = {$_.Name}},
#@{name='g2'; expression = {$_.Id}},
@{name='v1'; expression = {$_.PM}},
@{name='v2'; expression = {$_.WS}} |
group g1 | % {
[pscustomobject] @{
Name = $_.Name
Sum = $_.Group | Measure V1,V2 -Sum | select -ExpandProperty Sum
}
}
Piece of cake, don't even need to see your data to be honest. In your ForEach loop simply create a PSCustomObject to output down the pipe with 3 properties, the Name of the group, the sum of V1, and the sum of V2. It can be done as such:
$groups = gc $fn |
Select -Property @{name='G1'; expression={$_.SubString(340, 7)}},
@{name='G2'; expression={$_.SubString(32, 2)}},
@{name='V1'; expression={$_.SubString(420, 8)}},
@{name='V2'; expression={$_.SubString(43, 11)}} |
group G1,G2
$groups|ForEach{
[pscustomobject]@{
'Name'=$_.name
'V1Sum'=$_.group|measure V1 -sum|select -expand sum
'V2Sum'=$_.group|measure V2 -sum|select -expand sum
}
}
I fabricated data for testing (yay get-random!):
G1     G2    V1 V2
--     --    -- --
Happy  Meals 97 20
Happy  Meals 71 21
Happy  Meals 24 54
Tickle Fight 87 19
Tickle Fight 14 18
Tickle Fight 25  0
Tickle Fight 78 51
This provided the output of:
Name          V1Sum V2Sum
----          ----- -----
Happy, Meals    192    95
Tickle, Fight   204    88
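The question's own "workable example" passes both properties to a single Measure-Object call. That also works here, provided the results are matched back to their property names. The sketch below assumes the fabricated test data as inline CSV; $rows, $sums and $summary are illustrative names.

```powershell
$rows = @'
G1,G2,V1,V2
Happy,Meals,97,20
Happy,Meals,71,21
Happy,Meals,24,54
Tickle,Fight,87,19
Tickle,Fight,14,18
Tickle,Fight,25,0
Tickle,Fight,78,51
'@ | ConvertFrom-Csv

$summary = $rows | Group-Object G1, G2 | ForEach-Object {
    $g = $_
    # One Measure-Object call returns one result object per measured property;
    # index the sums by the Property name of each result.
    $sums = @{}
    foreach ($m in ($g.Group | Measure-Object -Property V1, V2 -Sum)) {
        $sums[$m.Property] = $m.Sum
    }
    [pscustomobject]@{
        Name  = $g.Name
        V1Sum = $sums['V1']
        V2Sum = $sums['V2']
    }
}

$summary | Format-Table
```

This walks each group once instead of once per value column, which may matter for large groups; for a handful of columns the two-call version above is just as reasonable.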

Count the comma in each line and show the line numbers in a text file

I'm using the following script to get the comma counts.
Get-Content .\myFile |
% { ($_ | Select-String `, -all).matches | measure | select count } |
group -Property count
It returns,
Count Name Group
----- ---- -----
131 85 {@{Count=85}, @{Count=85}, @{Count=85}, @{Count=85}...}
3 86 {@{Count=86}, @{Count=86}, @{Count=86}}
Can I show the line number in the Group column instead of @{Count=86}, ...?
The files will have a lot of lines, and the majority of the lines have the same number of commas. I want to group them so that the output is shorter.
Can you use something like this?
$s = @"
this,is,a
test,,
with,
multiple, commas, to, count,
"@
#convert to string-array(like you normally have with multiline strings)
$s = $s -split "`n"
$s | Select-String `, -AllMatches | Select-Object LineNumber, @{n="Count"; e={$_.Matches.Count}} | Group-Object Count
Count Name Group
----- ---- -----
2 2 {@{LineNumber=1; Count=2}, @{LineNumber=2; Count=2}}
1 1 {@{LineNumber=3; Count=1}}
1 4 {@{LineNumber=4; Count=4}}
If you don't want the "count" property multiple times in the group, you need custom objects. Like this:
$s | Select-String `, -AllMatches | Select-Object LineNumber, @{n="Count"; e={$_.Matches.Count}} | Group-Object Count | % {
New-Object psobject -Property @{
"Count" = $_.Name
"LineNumbers" = ($_.Group | Select-Object -ExpandProperty LineNumber)
}
}
Output:
Count LineNumbers
----- -----------
2 {1, 2}
1 3
4 4
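One thing to keep in mind is that Select-String only emits lines that match, so lines without any comma do not show up in the result. If those lines should be reported with a count of 0, a possible alternative is to count the commas per line without Select-String. This is a sketch only; $lines, $lineNo and $report are illustrative names, and the sample lines are made up.

```powershell
$lines = 'this,is,a', 'test,,', 'no commas here', 'multiple, commas, to, count,'

$lineNo = 0
$report = $lines | ForEach-Object {
    $lineNo++
    [pscustomobject]@{
        LineNumber = $lineNo
        # Splitting on ',' yields one more part than there are commas.
        Count      = $_.Split(',').Count - 1
    }
} | Group-Object Count | ForEach-Object {
    [pscustomobject]@{
        Count       = [int]$_.Name
        LineNumbers = $_.Group.LineNumber
    }
}

$report | Format-Table
```

For a real file, $lines could be replaced with Get-Content .\myFile, which emits one string per line.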