Sum various columns to get a subtotal depending on a criterion from a row using PowerShell

I have a CSV file that contains the following data:
Pages,Pages BN,Pages Color,Customer
145,117,28,Report_Alexis
46,31,15,Report_Alexis
75,27,48,Report_Alexis
145,117,28,Report_Jack
46,31,15,Report_Jack
75,27,48,Report_Jack
145,117,28,Report_Amy
46,31,15,Report_Amy
75,27,48,Report_Amy
So what I need to do is sum each column per report name and then export the result to another CSV file, like this:
Pages,Pages BN,Pages Color,Customer
266,175,91,Report_Alexis
266,175,91,Report_Jack
266,175,91,Report_Amy
How can I do this?
I tried this:
$Countpages = Import-Csv "C:\temp\testcount\final file2.csv" |where {$_.Filename -eq 'Report_Jack'} | Measure-Object -Property Pages -Sum
then
$Countpages.Sum | Set-Content -Path "C:\temp\testcount\final file3.csv"
But this handles just one report, and I don't know how to continue from there.
Can you please help me?

Working code
$IdentityColumns = @('Customer')
$ColumnsToSum = @('Pages', 'Pages BN', 'Pages Color')
$CSVFileInput = 'S:\SCRIPTS\1.csv'
Import-Csv -Path $CSVFileInput |
    Group-Object -Property $IdentityColumns |
    ForEach-Object {
        # Result hashtable (key-value collection); the sums are added on the next lines.
        $resultHT = @{ Customer = $_.Name }
        # Calculate the sum of every column in $ColumnsToSum in a single pass
        @($_.Group | Measure-Object -Property $ColumnsToSum -Sum) |
            ForEach-Object { $resultHT[$_.Property] = $_.Sum } # Store each sum under its column name
        return [PSCustomObject]$resultHT # Convert the hashtable to a PSCustomObject for output
    } | # End of ForEach-Object over the groups
    Select-Object @($ColumnsToSum + $IdentityColumns) | # Sets the column order, which a plain hashtable does not guarantee
    Out-GridView # Or replace with Export-Csv
#Export-Csv ...
Explanation:
Use Group-Object to build a collection of groups. Each group has 4 properties:
Name - Name of the group, equal to the stringified value(s) of the property (or properties) you are grouping by
Values - Collection of the values of the properties you are grouping by (not stringified)
Count - Number of elements grouped into this group
Group - The elements that were grouped into this group
When grouping by a single string property (as in this case), you can safely use the group's Name; otherwise, always use Values.
So after Group-Object, you iterate not over the rows of the CSV, but over a collection of groups, each holding the rows that share the same key.
Measure-Object can process more than one property in a single pass (without mixing values from different properties), and the code relies on this. The result is an array of objects, each with a Property attribute equal to the name passed to Measure-Object and the calculated value (Sum in our case). We move those Property=Sum pairs into the hashtable.
[PSCustomObject] converts the hashtable to an object. Objects are generally the better choice for output, because cmdlets such as Select-Object and Export-Csv work on object properties.
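To make the four group properties concrete, here is a minimal sketch that inspects them on a tiny in-memory sample. The sample rows are hypothetical and only mirror two columns of the question's CSV; ConvertFrom-Csv stands in for Import-Csv so that no file is needed.

```powershell
# Hypothetical sample data; column names mirror the question's CSV.
$rows = @'
Pages,Customer
145,Report_Alexis
46,Report_Alexis
75,Report_Jack
'@ | ConvertFrom-Csv

# One group per distinct Customer value
$groups = $rows | Group-Object -Property Customer

$lines = foreach ($g in $groups) {
    # Name is the stringified key, Count the number of rows, Group the rows themselves
    $sum = ($g.Group | Measure-Object -Property Pages -Sum).Sum
    '{0}: {1} row(s), {2} pages' -f $g.Name, $g.Count, $sum
}
$lines
```

Each element of $groups is one group, so the loop runs once per customer rather than once per CSV row.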

Related

How to merge 2 x CSVs with the same column but overwrite not append?

I've got this one that has been baffling me all day, and I can't seem to find any search results that match exactly what I am trying to do.
I have 2 CSV files, both of which have the same columns and headers. They look like this (shortened for the purpose of this post):
"plate","labid","well"
"1013740016604537004556","none46","F006"
"1013740016604537004556","none47","G006"
"1013740016604537004556","none48","H006"
"1013740016604537004556","3835265","A007"
"1013740016604537004556","3835269","B007"
"1013740016604537004556","3835271","C007"
Each of the 2 CSVs has only some actual Lab IDs, and the 'nonexx' values are just fillers for the importing software. There is no duplication, i.e., each 'well' has a real Lab ID in at most one of the 2 files.
What I need to do is merge the 2 CSVs, for example the second CSV might have a Lab ID for well H006 but the first will not. I need the lab ID from the second CSV imported into the first, overwriting the 'nonexx' currently in that column.
Here is my current code:
$CSVB = Import-CSV "$RootDir\SymphonyOutputPending\$plateID`A_Header.csv"
Import-CSV "$RootDir\SymphonyOutputPending\$plateID`_Header.csv" | ForEach-Object {
    $CSVData = [PSCustomObject]@{
        labid = $_.labid
        well = $_.well
    }
    If ($CSVB.well -match $CSVData.wellID) {
        write-host "I MATCH"
        ($CSVB | Where-Object {$_.well -eq $CSVData.well}).labid = $CSVData.labid
    }
    $CSVB | Export-CSV "$RootDir\SymphonyOutputPending\$plateID`_final.csv" -NoTypeInformation
}
The code runs but doesn't 'merge' the data, the final CSV output is just a replication of the first input file. I am definitely getting a match as the string "I MATCH" appears several times when debugging as expected.
Based on the responses in the comments on your question, I believe this is what you are looking for. This assumes that both CSVs contain exactly the same data, with labid being the only difference.
There is no need to modify csv2 if we are just grabbing the labid to overwrite the row in csv1.
$csv1 = Import-Csv C:\temp\LabCSV1.csv
$csv2 = Import-Csv C:\temp\LabCSV2.csv
# Loop through csv1 rows
Foreach($line in $csv1) {
    # If labid starts with "none"
    If($line.labid -like "none*") {
        # Set the row's labid to the labid from the csv2 row that matches plate/well
        # May be able to remove the plate section if well is a unique value
        $line.labid = ($csv2 | Where {$_.well -eq $line.well -and $_.plate -eq $line.plate}).labid
    }
}
# Export to a new CSV (not overwriting the original) to confirm results
$csv1 | export-csv C:\Temp\LabCSV1Adjusted.csv -NoTypeInformation
Since you need to do a bi-directional comparison of the 2 CSVs, you could combine both into a single array and group the objects by their well property using Group-Object. Then, for each group whose Count equals 2, keep only the object whose labid does not start with none; otherwise, return the object as-is.
Using the following Csvs for demonstration purposes:
Csv1
"plate","labid","well"
"1013740016604537004556","none46","F006"
"1013740016604537004556","none47","G006"
"1013740016604537004556","3835265","A007"
"newrowuniquecsv1","none123","X001"
Csv2
"plate","labid","well"
"1013740016604537004556","none48","A007"
"1013740016604537004556","3835269","F006"
"1013740016604537004556","3835271","G006"
"newrowuniquecsv2","none123","X002"
Code
Note that this code assumes there will be a maximum of 2 objects with the same well property and, if there are 2 objects with the same well, one of them must have a value not starting with none.
$mergedCsv = @(
    Import-Csv pathtocsv1.csv
    Import-Csv pathtocsv2.csv
)
$mergedCsv | Group-Object well | ForEach-Object {
    if($_.Count -eq 2) {
        return $_.Group.Where{ -not $_.labid.StartsWith('none') }
    }
    $_.Group
} | Export-Csv pathtomerged.csv -NoTypeInformation
Output
plate                  labid   well
-----                  -----   ----
1013740016604537004556 3835265 A007
1013740016604537004556 3835269 F006
1013740016604537004556 3835271 G006
newrowuniquecsv1       none123 X001
newrowuniquecsv2       none123 X002
If the lists are large, performance might become an issue, because Where-Object (or any other where method) and Group-Object do not perform well inside nested loops.
By indexing the second CSV file (i.e., creating a hashtable), you get quicker access to the required objects. Indexing on two (or more) items (plate and well) is discussed here: Does there exist a designated (sub)index delimiter? and was resolved by @mklement0 and zett42 with a nice CaseInsensitiveArrayEqualityComparer class.
To apply this class to Drew's helpful answer:
$csv1 = Import-Csv C:\temp\LabCSV1.csv
$csv2 = Import-Csv C:\temp\LabCSV2.csv
$dict = [hashtable]::new([CaseInsensitiveArrayEqualityComparer]::new())
$csv2.ForEach{ $dict.($_.plate, $_.well) = $_ }
Foreach($line in $csv1) {
    If($line.labid -like "none*") {
        $line.labid = $dict.($line.plate, $line.well).labid
    }
}
$csv1 | export-csv C:\Temp\LabCSV1Adjusted.csv -NoTypeInformation
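If the comparer class is not available, a simpler variation on the same indexing idea is to join plate and well into a single string key. This is only a sketch under assumptions that are not in the original answers: the sample rows are hypothetical, and the '|' delimiter is assumed never to occur in the plate or well values.

```powershell
# Hypothetical in-memory stand-ins for the two imported CSV files
$csv1 = @'
plate,labid,well
P1,none46,F006
P1,3835265,A007
'@ | ConvertFrom-Csv
$csv2 = @'
plate,labid,well
P1,3835269,F006
P1,none48,A007
'@ | ConvertFrom-Csv

# Build the index once; the key joins plate and well with '|' (assumed absent from the data)
$index = @{}
foreach ($row in $csv2) { $index["$($row.plate)|$($row.well)"] = $row }

foreach ($line in $csv1) {
    $match = $index["$($line.plate)|$($line.well)"]
    # Only overwrite filler IDs, and only when csv2 has a row for that plate/well
    if ($line.labid -like 'none*' -and $match) { $line.labid = $match.labid }
}
$csv1
```

Each lookup is a single hashtable access, so the cost no longer grows with the size of the second file for every row of the first.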

Powershell - group array objects by properties and sum

I am working on getting some data out of a CSV file with a script and have no idea how to solve the most important part. I have an array with a few hundred lines; there are about 50 Ids in those lines, and each Id has a few different services attached to it. Each line has a price attached.
I want to group lines by Id and Service, and I want each of those groups in some sort of variable so I can sum the prices. I filter out the unique Ids and Services earlier in the script because they are different every time.
Some example data:
$data = @(
    [pscustomobject]@{Id='1';Service='Service1';Propertyx=1;Price='5'}
    [pscustomobject]@{Id='1';Service='Service2';Propertyx=1;Price='4'}
    [pscustomobject]@{Id='2';Service='Service1';Propertyx=1;Price='17'}
    [pscustomobject]@{Id='3';Service='Service1';Propertyx=1;Price='3'}
    [pscustomobject]@{Id='2';Service='Service2';Propertyx=1;Price='11'}
    [pscustomobject]@{Id='4';Service='Service1';Propertyx=1;Price='7'}
    [pscustomobject]@{Id='2';Service='Service3';Propertyx=1;Price='5'}
    [pscustomobject]@{Id='3';Service='Service2';Propertyx=1;Price='4'}
    [pscustomobject]@{Id='4';Service='Service2';Propertyx=1;Price='12'}
    [pscustomobject]@{Id='1';Service='Service3';Propertyx=1;Price='8'})
$ident = $data.Id | select -unique | sort
$Serv = $data.Service | select -unique | sort
All help will be appreciated!
Use Group-Object to group objects by common values across one or more properties.
For example, to calculate the sum per Id, do:
$data |Group-Object Id |ForEach-Object {
    [pscustomobject]@{
        Id = $_.Name
        Sum = $_.Group |Measure-Object Price -Sum |ForEach-Object Sum
    }
}
Which should yield output like:
Id Sum
-- ---
1 17
2 33
3 7
4 19
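Since the question asks for groups by both Id and Service, the same pattern can be extended to two grouping properties. The following is a sketch on a shortened, hypothetical sample (not the question's full data set); it reads the key values from the group's Values property instead of parsing the combined Name string.

```powershell
# Shortened, hypothetical sample in the shape of the question's data
$data = @(
    [pscustomobject]@{ Id = '1'; Service = 'Service1'; Price = '5' }
    [pscustomobject]@{ Id = '1'; Service = 'Service1'; Price = '4' }
    [pscustomobject]@{ Id = '2'; Service = 'Service1'; Price = '17' }
)

$totals = $data | Group-Object -Property Id, Service | ForEach-Object {
    [pscustomobject]@{
        # Values holds the key values in the order the properties were passed
        Id      = $_.Values[0]
        Service = $_.Values[1]
        Sum     = ($_.Group | Measure-Object -Property Price -Sum).Sum
    }
}
$totals
```

$totals then holds one object per Id/Service combination, which can be filtered or exported like any other collection.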

Powershell - Import-CSV Group-Object SUM a number from grouped objects and then combine all grouped objects to single rows

I have a question similar to this one but with a twist:
Powershell Group Object in CSV and exporting it
My file has 42 existing headers. The delimiter is a standard comma, and there are no quotation marks in this file.
master_account_number,sub,txn,cur,last,first,address,address2,city,state,zip,ssn,credit,email,phone,cell,workphn,dob,chrgnum,cred,max,allow,neg,plan,downpayment,pmt2,min,clid,cliname,owner,merch,legal,is_active,apply,ag,offer,settle_perc,min_pay,plan2,lstpmt,orig,placedate
The file's data (the first 6 columns) looks like this:
master_account_number,sub,txn,cur,last,first
001,12,35,50.25,BIRD, BIG
001,34,47,100.10,BIRD, BIG
002,56,9,10.50,BUNNY, BUGS
002,78,3,20,BUNNY, BUGS
003,54,7,250,DUCK, DAFFY
004,44,88,25,MOUSE, JERRY
I am only working with the first column master_account_number and the 4th column cur.
I want to check for duplicates in the "master_account_number" column. If any are found, I want to add up the values from the 4th column "cur" for only those duplicates and then combine the rows that were just summed into one. The summed value from the duplicates should replace the cur value in the combined row.
With that said, our output should look like this:
master_account_number,sub,txn,cur,last,first
001,12,35,150.35,BIRD, BIG
002,56,9,30.50,BUNNY, BUGS
003,54,7,250,DUCK, DAFFY
004,44,88,25,MOUSE, JERRY
Now that we have that out of the way, here is how this question differs. I want to keep all 42 columns intact in the output file. In the other question I referenced above, the input was 5 columns and the output was 4 columns, and this is not what I'm trying to achieve. I have so many more headers that I'd hate to have to specify all 42 columns individually. That seems inefficient anyhow.
As for what I have so far for code... not much.
$revNB = "\\server\path\example.csv"
$global:revCSV = import-csv -Path $revNB | ? {$_.is_active -eq "Y"}
$dupesGrouped = $revCSV | Group-Object master_account_number | Select-Object @{Expression={ ($_.Group|Measure-Object cur -Sum).Sum }}
Ultimately I want the output to look identical to the input, only the output should merge duplicate account numbers rows, and add all the "cur" values, where the merged row contains the sum of the grouped cur values, in the cur field.
Last Update: Tried Rich's solution and got an error. Modified what he had to this: $dupesGrouped = $revCSV | Group-Object master_account_number | Select-Object Name, @{Name='curSum'; Expression={ ($_.Group | Measure-Object cur -Sum).Sum}}
This gets me exactly what my own code got me, so I am still looking for a solution. I need to output this CSV with all 42 headers, even for items with no duplicates.
Other things I've tried:
This doesn't give me the data I need in the columns, the columns are there but they are blank.
$dupesGrouped = $revCSV | Group-Object master_account_number | Select-Object @{ expression={$_.Name}; label='master_account_number' },
sub_account_number,
charge_txn,
@{Name='current_balance'; Expression={ ($_.Group | Measure-Object current_balance -Sum).Sum },
last,
}
You're pretty close, but you used current_balance where you probably meant cur.
Here's a start:
$dupesGrouped = $revCSV | Group-Object master_account_number |
    Select-Object Name, @{N='curSum'; E={ ($_.Group | Measure-Object cur -Sum).Sum } },
        @{N='last'; E={ ($_.Group | Select-Object last -First 1).last } }
You can add the other fields by adding Name;Expression hashtables for each of the fields you want to summarize. I assumed you would want to select the first occurrence of repeated last name for the same master_account_number. The output will be incorrect if the last name differs for the same master_account_number.
If you only need to change part of the data, you can also copy the first row of each group and overwrite just the summed column. This keeps all the other columns intact:
$dupesGrouped = $revCSV | Group-Object master_account_number | ForEach-Object {
    # copy the first row of the group so the original data is not changed
    $new = $_.Group[0].psobject.Copy()
    # update the value of the cur property with the group's sum
    $new.cur = ($_.Group | Measure-Object cur -Sum).Sum
    # output
    $new
}

Powershell capturing group in pipeline select-object with calculated property

Given the following input XML sample (assume multiple LogMessage entries):
<LogMessages>
<LogMessage time="2017-12-08 11:44:05.202" messageID="A10">
<![CDATA[Long non-xml string here containing <TS "2017120811431218"> somewhere in the body"]]>
</LogMessage>
</LogMessages>
I am using the following code to capture the values of attributes time, messageID, and capture a group in the CDATA.
[xml]$xml = Get-Content input.xml
$xml.LogMessages.LogMessage | Where-Object {$_.messageID -eq "A10"} | Select-Object -Property time,messageID,@{Name="A10Timestamp"; Expression={$_."#cdata-section" -match '<TS "(?<group>[0-9]{16})">' | Select-Object $Matches['group'] }} `
| Export-Csv output.csv -NoTypeInformation
Output looks like:
time                    messageID Group
----                    --------- ---------------
2017-12-08 11:43:12.183 S6F1      @{2017120811431218=}
The @{ and } wrapping the captured group value is undesired. I am concerned about this particular use of the $Matches variable. I think what gets printed is a match object and not the group string that it matched on, or something like that.
What is going on, and how do I get the entries in the Group column to appear as 2017120811431218 and not @{2017120811431218=}?
The -match operator returns a Boolean and populates the automatic $Matches variable with the match results.
In other words, you should discard what the -match operator returns (and not pipe it) and then simply return $Matches['group'] from the Expression script block:
Expression={$Void = $_."#cdata-section" -match '<TS "(?<group>[0-9]{16})">'; $Matches['group']}
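As a self-contained illustration of this pattern, the sketch below uses hypothetical objects with a Body property in place of the XML nodes and their #cdata-section. It also guards the lookup with if, because a failed -match leaves $Matches holding the result of an earlier successful match.

```powershell
# Hypothetical input objects standing in for the LogMessage nodes
$messages = @(
    [pscustomobject]@{ messageID = 'A10'; Body = 'text <TS "2017120811431218"> more text' }
    [pscustomobject]@{ messageID = 'A10'; Body = 'no timestamp here' }
)

$result = $messages | Select-Object messageID, @{
    Name       = 'A10Timestamp'
    Expression = {
        # -match yields $true/$false; read $Matches only after a successful match
        if ($_.Body -match '<TS "(?<group>[0-9]{16})">') { $Matches['group'] }
    }
}
$result
```

The first object gets the captured 16-digit string as a plain value; the second gets an empty A10Timestamp instead of a stale value from the previous row.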

Select-Object of multiple properties

I am trying to find an elegant way to put the metadata of a table of type System.Data.DataTable into a multi-dimensional array for easy reference in my program. My approach to the issue so far seems tedious.
Assuming $DataTable being the DataTable in question
What I tried to do so far was:
$Types = $DataTable.Columns | Select-Object -Property DataType
$Columns= $DataTable.Columns | Select-Object -Property ColumnName
$Index = $DataTable.Columns | Select-Object -Property ordinal
$AllowNull = $DataTable.Columns | Select-Object -Property AllowDbNull
Then I painfully go through each array, pick out individual items, and put them in my multi-dimensional array $TableMetaData.
I read the documentation of Select-Object, and it seems to me that only 1 property can be selected at a time? I think I should be able to do all of the above more elegantly and store the information in $TableMetaData.
Is there a way to easily pick up multiple properties and put them in a multi-dimensional array in one go?
I read the documentation of Select-Object and it seems to me that only 1 property can be selected at 1 time?
This is not true: Select-Object accepts any number of property names for the -Property parameter:
$ColumnInfo = $DataTable.Columns | Select-Object -Property DataType,ColumnName,ordinal,AllowDbNull
Now $ColumnInfo will contain one object for each column, having all 4 properties.
Rather than using a multi-dimensional array, you should consider using a hashtable (@{}, an unordered dictionary):
$ColumnInfo = $DataTable.Columns | ForEach-Object -Begin { $ht = @{} } -Process {
    $ht[$_.ColumnName] = $_
} -End { return $ht }
Here, we create an empty hashtable $ht (the -Begin block runs just once), then store each column object in $ht using the ColumnName as the key, and finally return $ht, storing it in $ColumnInfo.
Now you can reference metadata about each column by Name:
$ColumnInfo.Column2
# or
$ColumnInfo["Column2"]
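To try this without a database, the following sketch builds a small DataTable in memory and indexes only the four selected properties per column. The table name and the Id and Name columns are hypothetical and exist purely for illustration.

```powershell
# Hypothetical table with two columns, built in memory for illustration
$DataTable = [System.Data.DataTable]::new('Demo')
[void]$DataTable.Columns.Add('Id', [int])
[void]$DataTable.Columns.Add('Name', [string])
$DataTable.Columns['Id'].AllowDBNull = $false

# Index the selected metadata by column name
$ColumnInfo = @{}
foreach ($col in $DataTable.Columns) {
    $ColumnInfo[$col.ColumnName] = $col | Select-Object DataType, ColumnName, Ordinal, AllowDBNull
}

$ColumnInfo['Name'].DataType.Name
$ColumnInfo['Id'].AllowDBNull
```

Storing the Select-Object result rather than the DataColumn itself keeps each entry a small snapshot of just the properties of interest.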
One easy way to do this is to create an "empty" object with Select-Object. Here is a sample command:
$DataTableReport = "" | Select-Object -Property DataType, ColumnName, ordinal, AllowDbNull
Then, fill each property of $DataTableReport from the matching property of the table's columns, as shown below:
$DataTableReport.DataType = $DataTable.Columns.DataType
$DataTableReport.ColumnName = $DataTable.Columns.ColumnName
$DataTableReport.ordinal = $DataTable.Columns.Ordinal
$DataTableReport.AllowDbNull = $DataTable.Columns.AllowDBNull
Finally, output the $DataTableReport variable.
$DataTableReport # displays all four properties in tabular form