Group, get subtotal and output the final result as a table? - powershell

I have the following PowerShell script.
$groups = gc $fn |
    Select -Property @{name='G1'; expression={$_.SubString(340, 7)}},
                     @{name='G2'; expression={$_.SubString(32, 2)}},
                     @{name='V1'; expression={$_.SubString(420, 8)}},
                     @{name='V2'; expression={$_.SubString(43, 11)}} |
    group G1,G2
$groups | % {
    $g = $_.Group | ? { [float]($_.V1) -ne 0 } | measure V1 -Sum #V2???
    $_.Name, $g
}
However, I want the result in the following form. How do I include the sum of V2? How do I generate the object list for the final result?
Name SumOfV1 SumOfV2
G1, G2 3243243 2432432
.....
Workable example:
ps | select -Property @{name='g1'; expression = {$_.Name}},
    #@{name='g2'; expression = {$_.Id}},
    @{name='v1'; expression = {$_.PM}},
    @{name='v2'; expression = {$_.WS}} |
    group g1 | % {
        [pscustomobject] @{
            Name = $_.Name
            Sum = $_.Group | Measure V1,V2 -Sum | select -ExpandProperty Sum
        }
    }

Piece of cake, don't even need to see your data to be honest. In your ForEach loop simply create a PSCustomObject to output down the pipe with 3 properties, the Name of the group, the sum of V1, and the sum of V2. It can be done as such:
$groups = gc $fn |
    Select -Property @{name='G1'; expression={$_.SubString(340, 7)}},
                     @{name='G2'; expression={$_.SubString(32, 2)}},
                     @{name='V1'; expression={$_.SubString(420, 8)}},
                     @{name='V2'; expression={$_.SubString(43, 11)}} |
    group G1,G2
$groups|ForEach{
    [pscustomobject]@{
        'Name'=$_.name
        'V1Sum'=$_.group|measure V1 -sum|select -expand sum
        'V2Sum'=$_.group|measure V2 -sum|select -expand sum
    }
}
I fabricated data for testing (yay get-random!):
G1 G2 V1 V2
-- -- -- --
Happy Meals 97 20
Happy Meals 71 21
Happy Meals 24 54
Tickle Fight 87 19
Tickle Fight 14 18
Tickle Fight 25 0
Tickle Fight 78 51
This provided the output of:
Name V1Sum V2Sum
---- ----- -----
Happy, Meals 192 95
Tickle, Fight 204 88
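Both sums can also come out of a single Measure-Object call, because -Property accepts more than one name. A minimal sketch, assuming the fabricated rows above are typed in by hand as objects ($rows and $result are illustrative names, not part of the original script):

```powershell
# Fabricated test rows mirroring the table above (not real data).
$rows = @(
    [pscustomobject]@{ G1 = 'Happy';  G2 = 'Meals'; V1 = 97; V2 = 20 }
    [pscustomobject]@{ G1 = 'Happy';  G2 = 'Meals'; V1 = 71; V2 = 21 }
    [pscustomobject]@{ G1 = 'Happy';  G2 = 'Meals'; V1 = 24; V2 = 54 }
    [pscustomobject]@{ G1 = 'Tickle'; G2 = 'Fight'; V1 = 87; V2 = 19 }
    [pscustomobject]@{ G1 = 'Tickle'; G2 = 'Fight'; V1 = 14; V2 = 18 }
    [pscustomobject]@{ G1 = 'Tickle'; G2 = 'Fight'; V1 = 25; V2 = 0 }
    [pscustomobject]@{ G1 = 'Tickle'; G2 = 'Fight'; V1 = 78; V2 = 51 }
)

$result = $rows | Group-Object G1, G2 | ForEach-Object {
    # One Measure-Object call yields one result object per measured property.
    $sums = $_.Group | Measure-Object -Property V1, V2 -Sum
    [pscustomobject]@{
        Name  = $_.Name
        V1Sum = ($sums | Where-Object Property -eq 'V1').Sum
        V2Sum = ($sums | Where-Object Property -eq 'V2').Sum
    }
}
$result
```

Looking each sum up by its Property name avoids relying on the order in which Measure-Object emits its results.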

Related

Powershell Get-Random with Constraints

I'm currently using the Get-Random function of Powershell to randomly pull a set number of rows from a csv. I need to create a constraint that says if one id is pulled, find the other ids that match it and pull their value.
Here is what I currently have:
$chosenOnes = Import-CSV C:\Temp\pk2.csv | sort{Get-Random} | Select -first 6
$i = 1
$count = $chosenOnes | Group-Object householdID
foreach ($row in $count)
{
    if ($row.count -gt 1)
    {
        $students = $row.Group.Student
        foreach ($student in $students)
        {
            $name = $student.tostring()
            #...do something
            $i = $i + 1
        }
    }
    else
    {
        $name = $row.Group.Student
        if($i -le 5)
        {
            #...do something
        }
        else
        {
            #...do something
        }
        $i = $i + 1
    }
}
Example dataset
ID,name
165,Ernest Hemingway
1204,Mark Twain
1578,Stephen King
1634,Charles Dickens
1726,George Orwell
7751,John Doe
7751,Tim Doe
In this example there are 7 rows, but my code randomly selects 6. When ID=7751 is selected, both rows with ID=7751 must be returned. The IDs cannot be statically set in the code.
Use Get-Random directly, with -Count, to extract a given number of random elements from a collection.
$allRows = Import-CSV C:\Temp\pk2.csv
$chosenHouseholdIDs = ($allRows | Get-Random -Count 6).householdID
Then filter all rows by whether their householdID column contains one of the 6 randomly selected rows' householdID values (PSv3+ syntax), using the -in array-containment operator:
$allRows | Where-Object householdID -in $chosenHouseholdIDs
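The two steps can be tried without a file on disk. The following is only a sketch: the inline rows stand in for pk2.csv, and the column names householdID and Student are assumed from the question's code rather than from its sample dataset.

```powershell
# Hypothetical rows; Import-Csv would yield objects of the same shape (all values are strings).
$allRows = @"
householdID,Student
165,Ernest Hemingway
1204,Mark Twain
1578,Stephen King
1634,Charles Dickens
1726,George Orwell
7751,John Doe
7751,Tim Doe
"@ | ConvertFrom-Csv

# Pick 3 random rows, then keep only the distinct household IDs among them.
$chosenHouseholdIDs = ($allRows | Get-Random -Count 3).householdID | Select-Object -Unique

# A single pass over all rows returns every member of each chosen household.
$selected = $allRows | Where-Object householdID -in $chosenHouseholdIDs
$selected
```

Note that the result can contain more rows than were drawn: whenever one of the 7751 rows is picked, its sibling row is returned as well.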
Optional reading: performance considerations:
$allRows | Get-Random -Count 6 is not only conceptually simpler, but also much faster than $allRows | Sort-Object { Get-Random } | Select-Object -First 6
Using the Time-Command function to compare the performance of two approaches, using a 1000-row test file with 10 columns yields the following sample timings on my Windows 10 VM in Windows PowerShell - note that the Sort-Object { Get-Random }-based solution is more than 15(!) times slower:
Factor Secs (100-run avg.) Command                                                        TimeSpan
------ ------------------- -------                                                        --------
1.00   0.007               $allRows | Get-Random -Count 6                                 00:00:00.0072520
15.65  0.113               $allRows | Sort-Object { Get-Random } | Select-Object -First 6 00:00:00.1134909
Similarly, a single pass through all rows to find matching IDs via array-containment operator -in performs much better than looping over the randomly selected IDs and searching all rows for each.
I tried sticking with your beginning and came up with this.
$Array = Import-CSV C:\test\StudtentTest.csv
$Array | Sort{Get-Random} | select -first 2 | %{
    $id = $_.id
    $Array | ?{$_.id -eq $id} | %{
        $_
    }
}
$Array will be your parsed CSV.
We pipe it in, sort randomly, and select the first 2 (in this case).
We save the ID of each object into $id, then search the array for that ID and display every row that matches.
If several rows share a selected ID, you end up with something like
ID name
-- ----
7751 John Doe
7751 Tim Doe
1634 Charles Dickens

Group and Sum CSV with unknown number of columns

Wondering if someone would be able to help me. I'm trying to import, group, sum and then export a CSV. The problem is that my CSV has an unknown number of columns in the following format.
GroupA,GroupB,GroupC,ValueA,ValueB,ValueC,ValueD...
GroupA, B and C are constant and the fields I want to group by - I know the names of these fields in advance. The problem is there are an unknown number of Value columns - all of which I want to Sum (and don't know the names of in advance.)
I'm comfortable getting this code working if I know the name of the Value fields and have a fixed number of Value Fields. But I'm struggling to get code for unknown names and number of columns.
$csvImport = import-csv 'C:\input.csv'
$csvGrouped = $csvImport | Group-Object -property GroupA,GroupB,GroupC
$csvGroupedFinal = $csvGrouped | Select-Object @{Name = 'GroupA';Expression={$_.Values[0]}},
    @{Name = 'GroupB';Expression={$_.Values[1]}},
    @{Name = 'GroupC';Expression={$_.Values[2]}},
    @{Name = 'ValueA' ;Expression={
        ($_.Group|Measure-Object 'ValueA' -Sum).Sum
    }}
$csvGroupedFinal | Export-Csv 'C:\output.csv' -NoTypeInformation
Example Input Data -
GroupA, GroupB, Value A
Sam, Apple, 10
Sam, Apple, 20
Sam, Orange, 50
Ian, Apple, 15
Output Data -
GroupA, GroupB, Value A
Sam, Apple, 30
Sam, Orange, 50
Ian, Apple, 15
The following script should work. Pay attention to the $FixedNames variable:
$csvImport = @"
Group A,Group B,Value A
sam,apple,10
sam,apple,20
sam,orange,50
ian,apple,15
"@ | ConvertFrom-Csv
$FixedNames = @('Group A', 'Group B', 'Group C')
# $aux = ($csvImport|Get-Member -MemberType NoteProperty).Name ### sorted (wrong)
$aux = ($csvImport[0].psobject.Properties).Name ### not sorted
$auxGrpNames = @( $aux | Where-Object {$_ -in $FixedNames})
$auxValNames = @( $aux | Where-Object {$_ -notin $FixedNames})
$csvGrouped = $csvImport | Group-Object -property $auxGrpNames
$csvGroupedFinal = $csvGrouped |
    ForEach-Object {
        ($_.Name.Replace(', ',','), (($_.Group |
            Measure-Object -Property $auxValNames -Sum
        ).Sum -join ',')) -join ','
    } | ConvertFrom-Csv -Header $aux
$csvGroupedFinal
Tested likewise for
$csvImport = @"
Group A,Group B,Value A,Value B
sam,apple,10,1
sam,apple,20,
sam,orange,50,5
ian,apple,15,51
"@ | ConvertFrom-Csv
as well as for more complex data of Group A,Group B,Group C,Value A,Value B header.
Edit: updated according to LotPings' helpful comment.
After importing, this script splits the properties (columns) into groups and values.
It groups dynamically and sums only the value fields, however many there are.
The input ordering is maintained with a final Select-Object.
## Q:\Test\2019\01\17\SO_54237887.ps1
$csvImport = Import-Csv '.\input.csv'
$Cols = ($csvImport[0].psobject.Properties).Name
# get list of group columns by name and wildcard
$GroupCols = $Cols | Where-Object {$_ -like 'Group*'}
# a different approach would be to select a number of leading columns
# $GroupCols = $Cols[0..1]
$ValueCols = $Cols | Where-Object {$_ -notin $GroupCols}
$OutCols = ,'Groups' + $ValueCols
$csvGrouped = $csvImport | Group-Object $GroupCols | ForEach-Object{
    $Props = @{Groups=$_.Name}
    ForEach ($ValCol in $ValueCols){
        $Props.Add($ValCol,($_.Group|Measure-Object $ValCol -Sum).Sum)
    }
    [PSCustomObject]$Props
}
$csvGrouped | Select-Object $OutCols
With this sample input file
GroupA GroupB ValueA ValueB
------ ------ ------ ------
Sam Apple 10 15
Sam Apple 20 25
Sam Orange 50 75
Ian Apple 15 20
Sample output for any number of Groups and values
Groups ValueA ValueB
------ ------ ------
Sam, Apple 30 40
Sam, Orange 50 75
Ian, Apple 15 20
Without any change in code, it also processes the data from Hassan's answer:
Groups ValueA ValueB ValueC
------ ------ ------ ------
Sam, Apple 30 4 20
Sam, Orange 50 4 5
Ian, Apple 15 3 3
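If the group columns should stay separate in the output, rather than being merged into one "Groups" string, an ordered dictionary can carry both the key values and the sums. This is only a sketch under assumptions: the inline data is made up, group columns are recognised by the 'Group*' wildcard, and $summary is an illustrative name.

```powershell
# Hypothetical sample data; the Value* column names are never referenced below.
$csvImport = @"
GroupA,GroupB,ValueA,ValueB
Sam,Apple,10,15
Sam,Apple,20,25
Sam,Orange,50,75
Ian,Apple,15,20
"@ | ConvertFrom-Csv

$cols      = $csvImport[0].PSObject.Properties.Name
$groupCols = @($cols | Where-Object { $_ -like 'Group*' })
$valueCols = @($cols | Where-Object { $_ -notin $groupCols })

$summary = $csvImport | Group-Object $groupCols | ForEach-Object {
    $first = $_.Group[0]
    $props = [ordered]@{}
    # Every row in a group shares the same key values, so copy them from the first row.
    foreach ($col in $groupCols) { $props[$col] = $first.$col }
    # Sum each value column separately for this group.
    foreach ($col in $valueCols) {
        $props[$col] = ($_.Group | Measure-Object $col -Sum).Sum
    }
    [pscustomobject]$props
}
$summary
```

Because the dictionary is ordered, the output columns follow the input column order without a trailing Select-Object, and the result can be piped straight to Export-Csv.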
script1.ps1
Import-Csv 'input.csv' |
    Group-Object -Property GroupA,GroupB |
    % {
        $b = $_.name -split ', '
        $c = ($_.group | Measure-Object -Property Value* -Sum).Sum
        [PScustomobject]@{
            GroupA = $b[0]
            GroupB = $b[1]
            Sum    = ($c | Measure-Object -Sum).Sum
        }
    }
input.csv
GroupA, GroupB, ValueA, ValueB, ValueC
Sam, Apple, 10, 1, 10
Sam, Apple, 20, 3, 10
Sam, Orange, 50, 4, 5
Ian, Apple, 15, 3, 3
OUTPUT
PS D:\coding> .\script1.ps1
GroupA GroupB Sum
------ ------ ---
Sam Apple 54
Sam Orange 59
Ian Apple 21

re-arrange and combine powershell custom objects

I have a system that currently reads data from a CSV file produced by a separate system that is going to be replaced.
The imported CSV file looks like this
PS> Import-Csv .\SalesValues.csv
Sale Values AA BB
----------- -- --
10 6 5
5 3 4
3 1 9
To replace this process I hope to produce an object that looks identical to the CSV above, but I do not want to continue to use a CSV file.
I already have a script that reads data in from our database and extracts the data that I need to use. I won't detail the fairly long script that precedes this point, but in effect it looks like this:
$SQLData = Custom-SQLFunction "SELECT * FROM SALES_DATA WHERE LIST_ID = $LISTID"
$SQLData will contain ~5000+ DataRow objects that I need to query.
One of those DataRow objects looks something like this:
lead_id : 123456789
entry_date : 26/10/2018 16:51:16
modify_date : 01/11/2018 01:00:02
status : WRONG
user : mrexample
vendor_lead_code : TH1S15L0NGC0D3
source_id : A543212
list_id : 333004
list_name : AA Some Text
gmt_offset_now : 0.00
SaleValue : 10
list_name is going to be prefixed with AA or BB.
SaleValue can be any integer 3 and up, however realistically extremely unlikely to be higher than 100 (as this is a monthly donation) and will be one of 3,5,10 in the vast majority of occurrences.
I already have a script that takes the content of list_name, then creates and populates two separate psobjects ($AASalesValues and $BBSalesValues) that collate the number of occurrences of each 'SaleValue' across the data set.
Because I cannot reliably anticipate the value of any SaleValue, I have to create the psobjects' properties dynamically, like this:
foreach ($record in $SQLData) {
    if ($record.list_name -match "BB") {
        if ($record.SaleValue -gt 0) {
            if ($BBSalesValues | Get-Member -Name $($record.SaleValue) -MemberType Properties) {
                $BBSalesValues.$($record.SaleValue) = $BBSalesValues.$($record.SaleValue)+1
            } else {
                $BBSalesValues | Add-Member -Name $($record.SaleValue) -MemberType NoteProperty -Value 1
            }
        }
    }
}
The two resultant objects look like this:
PS> $AASalesValues
10 5 3 50
-- - - --
17 14 3 1
PS> $BBSalesvalues
3 10 5 4
- -- - -
36 12 11 1
I now have the data that I need, however I need to format it in a way that replicates the format of the CSV so I can pass it directly to another existing powershell script that is configured to expect the data in the format that the CSV is in, but I do not want to write the data to a file.
I'd prefer to pass this directly to the next part of the script.
Ultimately what I want to do is to produce a new object/some output that looks like the output from Import-Csv command at the top of this post.
I'd like a new object, say $OverallSalesValues, to look like this:
PS>$overallSalesValues
Sale Values AA BB
50 1 0
10 17 12
5 14 11
4 0 1
3 3 36
In the above example, the values from $AASalesValues are listed under the AA column and the values from $BBSalesValues are listed under the BB column, with the rows matching the headers of the two original objects.
I did try this with hashtables but I was unable to work out how to both create them from dynamic values and format them to how I needed them to look.
Finally got there.
$TotalList = @()
foreach($n in 3..200){
    if($AASalesValues.$n -or $BBSalesValues.$n){
        $AACount = $AASalesValues.$n
        $BBcount = $BBSalesValues.$n
        $values = [PSCustomObject]@{
            'Sale Value'= $n
            AA = $AACount
            BB = $BBcount
        }
        $TotalList += $values
    }
}
$TotalList
produces an output of
Sale Value AA BB
---------- -- --
         3  3 36
         4     2
         5 14 11
        10 18 12
        50  1
Just need to add a bit to include '0' values instead of $null.
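One way to get 0 instead of $null is to cast each lookup to [int], since [int]$null evaluates to 0. A minimal sketch, assuming strict mode is off (so a missing property returns $null) and using hand-built stand-ins for the two count objects; $saleValues is an illustrative name:

```powershell
# Hypothetical stand-ins for the two count objects built earlier.
$AASalesValues = [pscustomobject]@{ '10' = 17; '5' = 14; '3' = 3; '50' = 1 }
$BBSalesValues = [pscustomobject]@{ '3' = 36; '10' = 12; '5' = 11; '4' = 1 }

# Collect every sale value that occurs in either object, highest first.
$saleValues = @($AASalesValues.PSObject.Properties.Name) +
              @($BBSalesValues.PSObject.Properties.Name) |
    ForEach-Object { [int]$_ } |
    Sort-Object -Unique -Descending

$overallSalesValues = foreach ($n in $saleValues) {
    [pscustomobject]@{
        'Sale Values' = $n
        # A missing property evaluates to $null, and [int]$null is 0.
        AA = [int]$AASalesValues."$n"
        BB = [int]$BBSalesValues."$n"
    }
}
$overallSalesValues
```

Enumerating the property names that actually exist also removes the need to guess an upper bound such as 200.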
I'm going to assume that $record contains a list of the database results for either $AASalesValues or $BBSalesValues, not both, otherwise you'd need some kind of selector to avoid counting records of one group with the other group.
Group the records by their SaleValue property as LotPings suggested:
$BBSalesValues = $record | Group-Object SaleValue -NoElement
That will give you a list of the SaleValue values with their respective count.
PS> $BBSalesValues
Count Name
----- ----
36 3
12 10
11 5
1 4
You can then update your CSV data with these values like this:
$file = 'C:\path\to\data.csv'
# read CSV into a hashtable mapping the sale value to the complete record
# (so that we can lookup the record by sale value)
$csv = @{}
Import-Csv $file | ForEach-Object {
    $csv[$_.'Sale Values'] = $_
}
# Add records for missing sale values
$($AASalesValues; $BBSalesValues) | Select-Object -Expand Name -Unique | ForEach-Object {
    if (-not $csv.ContainsKey($_)) {
        $csv[$_] = New-Object -Type PSObject -Property @{
            'Sale Values' = $_
            'AA' = 0
            'BB' = 0
        }
    }
}
# update records with values from $AASalesValues
$AASalesValues | ForEach-Object {
    [int]$csv[$_.Name].AA += $_.Count
}
# update records with values from $BBSalesValues
$BBSalesValues | ForEach-Object {
    [int]$csv[$_.Name].BB += $_.Count
}
# write updated records back to file
$csv.Values | Export-Csv $file -NoType
Even with your updated question the approach would be pretty much the same, you'd just add another level of grouping for collecting the sales numbers:
$sales = @{}
$record | Group-Object {$_.list_name.Split()[0]} | ForEach-Object {
    $sales[$_.Name] = $_.Group | Group-Object SaleValue -NoElement
}
and then adjust the merging to something like this:
$file = 'C:\path\to\data.csv'
# read CSV into a hashtable mapping the sale value to the complete record
# (so that we can lookup the record by sale value)
$csv = @{}
Import-Csv $file | ForEach-Object {
    $csv[$_.'Sale Values'] = $_
}
# Add records for missing sale values
$sales.Values | Select-Object -Expand Name -Unique | ForEach-Object {
    if (-not $csv.ContainsKey($_)) {
        $prop = @{'Sale Values' = $_}
        $sales.Keys | ForEach-Object {
            $prop[$_] = 0
        }
        $csv[$_] = New-Object -Type PSObject -Property $prop
    }
}
# update records with values from $sales
$sales.GetEnumerator() | ForEach-Object {
    $name = $_.Key
    $_.Value | ForEach-Object {
        [int]$csv[$_.Name].$name += $_.Count
    }
}
# write updated records back to file
$csv.Values | Export-Csv $file -NoType

Add up the data if the reference from another file is correct

I have two CSV Files which look like this:
test.csv:
"Col1","Col2"
"1111","1"
"1122","2"
"1111","3"
"1121","2"
"1121","2"
"1133","2"
"1133","2"
The second looks like this:
test2.csv:
"Number","signs"
"1111","ABC"
"1122","DEF"
"1111","ABC"
"1121","ABC"
"1133","GHI"
Now the goal is to get a summary of all points from test.csv assigned to the "signs" of test2.csv. Reference are the numbers, as you may see.
Should be something like this:
ABC = 8
DEF = 2
GHI = 4
I have tried to test this out but cannot get the goal. What I have so far is:
$var = "C:\PathToCSV"
$csv1 = Import-Csv "$var\test.csv"
$csv2 = Import-Csv "$var\test2.csv"
# Process: group by 'Item' then sum 'Average' for each group
# and create output objects on the fly
$test1 = $csv1 | Group-Object Col1 | ForEach-Object {
    New-Object psobject -Property @{
        Col1 = $_.Name
        Sum = ($_.Group | Measure-Object Col2 -Sum).Sum
    }
}
But this gives me back the following output:
Ps> $test1
Sum Col1
--- ----
4 1111
2 1122
4 1121
4 1133
I am not able to get the summary and the mapping of the signs.
Not sure if I understand your question correctly, but I'm going to assume that for each value from the column "signs" you want to lookup the values from the column "Number" in the second CSV and then calculate the sum of the column "Col2" for all matches.
For that I'd build a hashtable with the pre-calculated sums for the unique values from "Col1":
$h1 = @{}
$csv1 | ForEach-Object {
    $h1[$_.Col1] += [int]$_.Col2
}
and then build a second hashtable to sum up the lookup results for the values from the second CSV:
$h2 = @{}
$csv2 | ForEach-Object {
    $h2[$_.signs] += $h1[$_.Number]
}
However, that produced a different value for "ABC" than what you stated as the desired result in your question when I processed your sample data:
Name Value
---- -----
ABC 12
GHI 4
DEF 2
Or did you mean you want to sum up the corresponding values for the unique numbers for each sign? For that you'd change the second code snippet to something like this:
$h2 = @{}
$csv2 | Group-Object signs | ForEach-Object {
    $name = $_.Name
    $_.Group | Select-Object -Unique -Expand Number | ForEach-Object {
        $h2[$name] += $h1[$_]
    }
}
That would produce the desired result from your question:
Name Value
---- -----
ABC 8
GHI 4
DEF 2
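For reference, the two hashtable steps can be run end to end without files on disk. This is only a sketch: the sample rows from the question are inlined with ConvertFrom-Csv, and foreach statements replace the pipeline loops purely for readability.

```powershell
# Sample data from the question, inlined instead of read from disk.
$csv1 = @"
"Col1","Col2"
"1111","1"
"1122","2"
"1111","3"
"1121","2"
"1121","2"
"1133","2"
"1133","2"
"@ | ConvertFrom-Csv

$csv2 = @"
"Number","signs"
"1111","ABC"
"1122","DEF"
"1111","ABC"
"1121","ABC"
"1133","GHI"
"@ | ConvertFrom-Csv

# Step 1: sum Col2 per number ($null + [int] yields the int, so no initialisation is needed).
$h1 = @{}
foreach ($row in $csv1) { $h1[$row.Col1] += [int]$row.Col2 }

# Step 2: add each distinct number's sum to its sign exactly once.
$h2 = @{}
foreach ($grp in ($csv2 | Group-Object signs)) {
    foreach ($number in ($grp.Group.Number | Select-Object -Unique)) {
        $h2[$grp.Name] += $h1[$number]
    }
}
$h2
```

The Select-Object -Unique step is what keeps the duplicated 1111/ABC row from being counted twice.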

Conditional criteria in powershell group measure-object?

I have data in this shape:
externalName,day,workingHours,hoursAndMinutes
PRJF,1,11,11:00
PRJF,2,11,11:00
PRJF,3,0,0:00
PRJF,4,0,0:00
CFAW,1,11,11:00
CFAW,2,11,11:00
CFAW,3,11,11:00
CFAW,4,11,11:00
CFAW,5,0,0:00
CFAW,6,0,0:00
and so far code is
$gdata = Import-csv $filepath\$filename | Group-Object -Property Externalname;
$test = @()
$test += foreach($rostername in $gdata) {
    $rostername.Group | Select -Unique externalName,
        @{Name = 'AllDays';Expression = {(($rostername.Group) | measure -Property day).count}}
}
$test;
What I can't work out is how to do a conditional count of the lines where workingHours is non-zero.
The aim is to produce two lines:
PRJF, 4, 2, 11
CFAW, 6, 4, 11
i.e. Roster name, roster length, days on, average hours worked per day on.
You need a Where-Object to filter for non-zero workingHours.
I'd use a [PSCustomObject] to generate a new table.
EDIT: a bit more efficient with only one Measure-Object.
## Q:\Test\2018\08\06\SO_51700660.ps1
$filepath = 'Q:\Test\2018\08\06'
$filename = 'SO_S1700660.csv'
$gdata = Import-Csv (Join-Path $filepath $filename) | Group-Object -Property Externalname
$test = ForEach($Roster in $gdata) {
    $WH = ($Roster.Group.Workinghours|Where-Object {$_ -ne 0}|Measure-Object -Ave -Sum)
    [PSCustomObject]@{
        RosterName = $Roster.Name
        RosterLength = $Roster.Count
        DaysOn = $WH.count
        AvgHours = $WH.Average
        TotalHours = $WH.Sum
    }
}
$test | Format-Table
Sample output:
> .\SO_51700660.ps1
RosterName RosterLength DaysOn AvgHours TotalHours
---------- ------------ ------ -------- ----------
PRJF 4 2 11 22
CFAW 6 4 11 44