How to summarize value rows of one column with reference to another column in PowerShell object

I'm learning to work with the ImportExcel module and have successfully imported the data from a sample.xlsx file. I need to total the values of one column based on the values of another column. Basically, I want a grouped view that stores the sum of the amounts next to each type. Here's the sample data:
Type Amount
level 1 $1.00
level 1 $2.00
level 2 $3.00
level 3 $4.00
level 3 $5.00
To import it, I'm just using this simple code:
$fileName = "C:\SampleData.xlsx"
$data = Import-Excel -Path $fileName
#extracting distinct type values
$distinctTypes = $importedExcelRows | Select-Object -ExpandProperty "Type" -Unique
#looping through distinct types and storing it in the output
$output = foreach ($type in $distinctTypes)
{
$data | Group-Object $type | %{
New-Object psobject -Property @{
Type = $_.Name
Amt = ($_.Group | Measure-Object 'Amount' -Sum).Sum
}
}
}
$output
The output I'm looking for looks something like this:
Type Amount
level 1 $3.00
level 2 $3.00
level 3 $9.00
However, I'm getting nothing in the output; I think it's $null. Any help is appreciated. I think I'm missing something in the loop.

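You're halfway there by using Group-Object for this scenario, kudos on that part. Your loop produces nothing because $importedExcelRows is never assigned (the rows are in $data), so $distinctTypes is $null and the foreach body never runs; in addition, Group-Object $type would group by a property named after the type value (for example "level 1"), which doesn't exist. Luckily, you can group by Type directly on the imported rows and then measure the sum: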
$fileName = "C:\SampleData.xlsx"
Import-Excel -Path $fileName | Group-Object -Property Type | % {
$group = $_.Group | % {
$_.Amount = $_.Amount -replace '[^0-9.]'
$_
} | Measure-Object -Property Amount -Sum
[pscustomobject]@{
Type = $_.Name
Amount = "{0:C2}" -f $group.Sum
}
}
Since Measure-Object can't sum amounts in currency format, you can remove the dollar sign with the regex [^0-9.], which removes everything that is not a digit or a period, or you could use ^\$ instead. This allows the amounts to be measured, and you can then format the sum back to currency with the string format operator: '{0:C2}' -f $group.Sum.
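The same idea can be tried without an Excel file. In the sketch below, hard-coded [pscustomobject] rows stand in for the Import-Excel output, and Amount is assumed to arrive as text such as '$1.00'. As a design variation (not the answer's exact code), it casts a stripped copy of each amount to [decimal] instead of overwriting $_.Amount, so the imported rows stay untouched.

```powershell
# Hypothetical rows standing in for Import-Excel output; Amount is assumed to be text like '$1.00'
$rows = @(
    [pscustomobject]@{ Type = 'level 1'; Amount = '$1.00' }
    [pscustomobject]@{ Type = 'level 1'; Amount = '$2.00' }
    [pscustomobject]@{ Type = 'level 2'; Amount = '$3.00' }
    [pscustomobject]@{ Type = 'level 3'; Amount = '$4.00' }
    [pscustomobject]@{ Type = 'level 3'; Amount = '$5.00' }
)

$summary = $rows | Group-Object -Property Type | ForEach-Object {
    # Strip everything except digits and '.', cast to [decimal], then sum; the source rows are not modified
    $total = ($_.Group | ForEach-Object { [decimal]($_.Amount -replace '[^0-9.]') } | Measure-Object -Sum).Sum
    [pscustomobject]@{
        Type   = $_.Name
        Amount = '{0:C2}' -f $total   # currency text; symbol and separators follow the current culture
    }
}

$summary | Format-Table -AutoSize
```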

I'm not sure what your exact issue is, but as long as the dollar signs are not part of the data you pull from the Excel sheet, it should work as expected:
$InputCsvData = @'
Type,Amount
level 1,1.00
level 1,2.00
level 2,3.00
level 3,4.00
level 3,5.00
'@ |
ConvertFrom-Csv
$InputCsvData |
Group-Object -Property Type |
ForEach-Object {
[PSCustomObject]@{
Type = $_.Name
Amt = '${0:n2}' -f ($_.Group | Measure-Object -Property Amount -Sum).Sum
}
}
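The output looks like this (the comma decimal separator reflects the answerer's regional settings, because n2 formats with the current culture):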
Type Amt
---- ---
level 1 $3,00
level 2 $3,00
level 3 $9,00
Otherwise, remove the dollar signs before you try to sum the numbers.
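The $3,00 above is a culture effect: both the n2 format and the format operator use the current culture. If the report should look the same on every machine, one option is to pass an explicit CultureInfo to ToString. This is a sketch; en-US is an assumed choice of culture, and the sample data is the same as in the answer.

```powershell
# Same sample data as above, without dollar signs
$InputCsvData = @'
Type,Amount
level 1,1.00
level 1,2.00
level 2,3.00
level 3,4.00
level 3,5.00
'@ | ConvertFrom-Csv

# Assumption: the report should always use US currency formatting
$enUS = [cultureinfo]::GetCultureInfo('en-US')

$report = $InputCsvData | Group-Object -Property Type | ForEach-Object {
    # Measure-Object converts the text amounts to numbers before summing
    $sum = ($_.Group | Measure-Object -Property Amount -Sum).Sum
    [pscustomobject]@{
        Type = $_.Name
        Amt  = $sum.ToString('C2', $enUS)   # formatted with en-US regardless of the machine's culture
    }
}

$report | Format-Table -AutoSize
```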

Related

Group csv column data and display count using powershell

I have the following data in my CSV:
"Path_Name","Lun_Number","status"
"vmhba0:C2:T0:L1","1","active"
"vmhba0:C1:T0:L1","1","active"
"vmhba1:C0:T7:L230","230","active"
"vmhba1:C0:T7:L231","231","active"
"vmhba1:C0:T7:L232","230","active"
"vmhba1:C0:T7:L235","231","active"
"vmhba1:C0:T7:L236","230","active"
I need to group the data by Lun_Number and add a column with the count of each Lun_Number.
Expected output:
"Path_Name","Lun_Number","status","Count"
"vmhba0:C2:T0:L1","1","active", 2
"vmhba0:C1:T0:L1","1","active",
"vmhba1:C0:T7:L230","230","active",3
"vmhba1:C0:T7:L231","230","active",
"vmhba1:C0:T7:L232","230","active",
"vmhba1:C0:T7:L235","231","active",2
"vmhba1:C0:T7:L236","231","active",
Please let me know how I can do that. I tried Group-Object and Sort-Object, but it doesn't seem to work.
Below is the code that generates the CSV above:
$status_csv = Import-Csv -Path E:\pathstate.csv
$path_csv = Import-Csv -Path E:\PathInfo.csv
foreach($row in $path_csv)
{
$path_1 = $row.Path_Name
$path_2 = $status_csv | where{$_.Name -match "^$path_1$" }
[PsCustomObject]@{
Path_Name = $path_1
Lun_Number = $row.Lun_Number
status = $path_2.PathState
} | Export-Csv -Path E:\FinalReport.csv -NoTypeInformation -Append | Group-Object Lun_Number
}

I can see the approach you're trying to take, and I think something like the following could be useful:
$path_csv = Import-Csv 'sample.csv'
$Unique_Counts = $path_csv.Lun_Number | Group-Object | Select-Object Name,Count
This gives you a lookup that you can use later when mapping each row, by matching it against the Lun_Number of the row you're checking. You can read it in a loop; for example, $Unique_Counts[0].Count returns the count for the Lun_Number stored in $Unique_Counts[0].Name (the Lun_Number is listed as Name in the array).
Name Count
---- -----
1 2
230 3
231 2
If you're okay with having the count for each row, you can then use something like what you have:
foreach($row in $path_csv)
{
$path_1 = $row.Path_Name
[PsCustomObject]@{
Path_Name = $path_1
Lun_Number = $row.Lun_Number
status = $row.status
count = $Unique_Counts | Where-Object {$_.name -eq $row.Lun_Number} | Select-Object -ExpandProperty Count
} | Export-Csv $finalReport -NoTypeInformation -Append
}
This then provides me with the following outcome:
"Path_Name","Lun_Number","status","count"
"vmhba0:C2:T0:L1","1","active","2"
"vmhba0:C1:T0:L1","1","active","2"
"vmhba1:C0:T7:L230","230","active","3"
"vmhba1:C0:T7:L231","231","active","2"
"vmhba1:C0:T7:L232","230","active","3"
"vmhba1:C0:T7:L235","231","active","2"
"vmhba1:C0:T7:L236","230","active","3"
Hope this helps. Knowing more about the use case would help, but this at least gives you the count for every Lun_Number. The difference from your expected output is that the count appears in every row, not just in the first row of each group.
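The Where-Object lookup inside the loop rescans $Unique_Counts for every row. A variation (a sketch, not the answer's exact code) builds a hashtable keyed by Lun_Number once and appends the Count column with a calculated property. The inline CSV stands in for sample.csv, and the export path from the question is left commented out.

```powershell
# Inline copy of the sample data; in practice: $path_csv = Import-Csv 'sample.csv'
$path_csv = @'
"Path_Name","Lun_Number","status"
"vmhba0:C2:T0:L1","1","active"
"vmhba0:C1:T0:L1","1","active"
"vmhba1:C0:T7:L230","230","active"
"vmhba1:C0:T7:L231","231","active"
"vmhba1:C0:T7:L232","230","active"
"vmhba1:C0:T7:L235","231","active"
"vmhba1:C0:T7:L236","230","active"
'@ | ConvertFrom-Csv

# Lun_Number -> number of rows with that Lun_Number, computed once
$countByLun = @{}
$path_csv | Group-Object -Property Lun_Number | ForEach-Object { $countByLun[$_.Name] = $_.Count }

# Keep every original column and append Count via a hashtable lookup per row
$report = $path_csv | Select-Object -Property *, @{ Name = 'Count'; Expression = { $countByLun[$_.Lun_Number] } }

$report | Format-Table -AutoSize
# $report | Export-Csv -Path E:\FinalReport.csv -NoTypeInformation
```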

Find the min element of Hashtable (Values are - DateTime) on PowerShell

There is a hashtable of the form:
key (string) - value (DateTime)
I have to find the minimum among the values (DateTime objects).
I can't find a generic method to find such a value. The only way I've found is something like:
$first_file_date = $dates_hash.Values | Measure-Object -Minimum -Maximum
Get-Date ($first_file_date);
Although the result ($first_file_date) looks right, the actual value is of type GenericObjectMeasureInfo, and I can't cast it back to DateTime to keep working with it.
Any ideas?

The value you're interested in is stored in the Minimum and Maximum properties of the object returned by Measure-Object:
$measurement = $dates_hash.Values | Measure-Object -Minimum -Maximum
# Minimum/oldest datetime value is stored here
$measurement.Minimum
# Maximum/newest datetime value is stored here
$measurement.Maximum
Use ForEach-Object or Select-Object if you want the raw value in a single pipeline:
$oldest = $dates_hash.Values | Measure-Object -Minimum | ForEach-Object -MemberName Minimum
# or
$oldest = $dates_hash.Values | Measure-Object -Minimum | Select-Object -ExpandProperty Minimum

To complement Mathias R. Jessen's helpful answer with an alternative solution based on LINQ:
# Sample hashtable.
$hash = @{
foo = (Get-Date)
bar = (Get-Date).AddDays(-1)
}
# Note that the minimum is sought among the hash's *values* ([datetime] instances)
# The [datetime[]] cast is required to find the appropriate generic overload.
[Linq.Enumerable]::Min([datetime[]] $hash.Values)
Use of LINQ from PowerShell is generally cumbersome, unfortunately (see this answer). GitHub proposal #2226 proposes improvements.

Just use Sort-Object for this:
$dates_hash = @{
"a" = (Get-Date).AddMinutes(4)
"b" = (Get-Date).AddMinutes(5)
"c" = (Get-Date).AddMinutes(2)
"d" = (Get-Date).AddMinutes(5)
"e" = (Get-Date).AddMinutes(1)
"f" = (Get-Date).AddMinutes(6)
"g" = (Get-Date).AddMinutes(8)
}
$first_file_date = $dates_hash.Values | Sort-Object | Select-Object -First 1
Or, if you want the whole key/value entry:
$first_file = $dates_hash.GetEnumerator() | Sort-Object -Property "Value" | Select-Object -First 1
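Sorting does more work than needed when only the smallest entry is wanted. A single pass over the keys (a swapped-in alternative, not from the answers above, shown with made-up file names and dates) finds the oldest value and remembers which key it belongs to:

```powershell
# Hypothetical sample: file name -> date
$dates_hash = @{
    'a.log' = [datetime]'2021-03-05'
    'b.log' = [datetime]'2021-03-01'
    'c.log' = [datetime]'2021-03-09'
}

$oldestKey = $null
foreach ($key in $dates_hash.Keys) {
    # Keep the key whose [datetime] value is the smallest seen so far
    if ($null -eq $oldestKey -or $dates_hash[$key] -lt $dates_hash[$oldestKey]) {
        $oldestKey = $key
    }
}
$oldestDate = $dates_hash[$oldestKey]   # still a [datetime], no cast needed

"$oldestKey -> $($oldestDate.ToString('yyyy-MM-dd'))"
```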

Powershell - Import-CSV Group-Object SUM a number from grouped objects and then combine all grouped objects to single rows

I have a question similar to this one but with a twist:
Powershell Group Object in CSV and exporting it
My file has 42 existing headers. The delimiter is a standard comma, and there are no quotation marks in this file.
master_account_number,sub,txn,cur,last,first,address,address2,city,state,zip,ssn,credit,email,phone,cell,workphn,dob,chrgnum,cred,max,allow,neg,plan,downpayment,pmt2,min,clid,cliname,owner,merch,legal,is_active,apply,ag,offer,settle_perc,min_pay,plan2,lstpmt,orig,placedate
The file's data (the first 6 columns) looks like this:
master_account_number,sub,txn,cur,last,first
001,12,35,50.25,BIRD, BIG
001,34,47,100.10,BIRD, BIG
002,56,9,10.50,BUNNY, BUGS
002,78,3,20,BUNNY, BUGS
003,54,7,250,DUCK, DAFFY
004,44,88,25,MOUSE, JERRY
I am only working with the first column master_account_number and the 4th column cur.
I want to check for duplicates in the "master_account_number" column; if any are found, add up the values of the 4th column, "cur", for only those duplicates, and then combine the rows that were just summed into a single row. The summed value from the duplicates should replace the cur value in the combined row.
With that said, the output should look like this:
master_account_number,sub,txn,cur,last,first
001,12,35,150.35,BIRD, BIG
002,56,9,30.50,BUNNY, BUGS
003,54,7,250,DUCK, DAFFY
004,44,88,25,MOUSE, JERRY
Now that we have that out of the way, here is how this question differs. I want to keep all 42 columns intact in the output file. In the other question I referenced above, the input was 5 columns and the output was 4 columns, and that is not what I'm trying to achieve. I have so many more headers that I'd hate to have to specify all 42 columns individually. That seems inefficient anyhow.
As for what I have so far for code... not much.
$revNB = "\\server\path\example.csv"
$global:revCSV = import-csv -Path $revNB | ? {$_.is_active -eq "Y"}
$dupesGrouped = $revCSV | Group-Object master_account_number | Select-Object @{Expression={ ($_.Group|Measure-Object cur -Sum).Sum }}
Ultimately I want the output to look identical to the input, only the output should merge duplicate account numbers rows, and add all the "cur" values, where the merged row contains the sum of the grouped cur values, in the cur field.
Last Update: Tried Rich's solution and got an error. Modified what he had to this: $dupesGrouped = $revCSV | Group-Object master_account_number | Select-Object Name, @{Name='curSum'; Expression={ ($_.Group | Measure-Object cur -Sum).Sum}}
And this gets me exactly what my own code got me so I am still looking for a solution. I need to output this CSV with all 42 headers. Even for items with no duplicates.
Other things I've tried:
This doesn't give me the data I need in the columns; the columns are there, but they are blank.
$dupesGrouped = $revCSV | Group-Object master_account_number | Select-Object @{ expression={$_.Name}; label='master_account_number' },
sub_account_number,
charge_txn,
@{Name='current_balance'; Expression={ ($_.Group | Measure-Object current_balance -Sum).Sum }},
last

You're pretty close, but you used current_balance where you probably meant cur.
Here's a start:
$dupesGrouped = $revCSV | Group-Object master_account_number |
Select-Object Name, @{N='curSum'; E={ ($_.Group | Measure-Object cur -Sum).Sum }},
@{N='last'; E={ ($_.Group | Select-Object last -first 1).last} }
You can add the other fields by adding Name;Expression hashtables for each of the fields you want to summarize. I assumed you would want to select the first occurrence of repeated last name for the same master_account_number. The output will be incorrect if the last name differs for the same master_account_number.
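The question's goal is to avoid typing all 42 column names. One way to keep this Select-Object approach without that (a sketch, using a trimmed four-column sample rather than the real file) is to generate the Name/Expression hashtables from the first row's property names: cur is summed, and every other column takes the value from the first row of the group.

```powershell
# Trimmed sample; the real file has 42 columns
$revCSV = @'
master_account_number,sub,txn,cur
001,12,35,50.25
001,34,47,100.10
002,56,9,10.50
'@ | ConvertFrom-Csv

# Take the column names from the first row instead of typing all of them
$columns = $revCSV[0].psobject.Properties.Name

# One calculated property per column: sum 'cur', keep the first row's value for everything else
$selectProps = foreach ($col in $columns) {
    if ($col -eq 'cur') {
        @{ Name = $col; Expression = { ($_.Group | Measure-Object -Property cur -Sum).Sum } }
    }
    else {
        # GetNewClosure() captures the current $col; without it every expression would see the last column name
        @{ Name = $col; Expression = { $_.Group[0].$col }.GetNewClosure() }
    }
}

$merged = $revCSV | Group-Object -Property master_account_number | Select-Object -Property $selectProps
$merged | Format-Table -AutoSize
```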

If you only need to change part of the data, there is also the following approach, which keeps every other column of the first row in each group:
$dupesGrouped = $revCSV | Group-Object master_account_number | ForEach-Object {
# copy the first row of the group so the original data is not changed
$new = $_.Group[0].psobject.Copy()
# update the value of cur property
$new.cur = ($_.Group | Measure-Object cur -Sum).Sum
# output
$new
}
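The copy-and-update approach can be checked end to end on the question's sample rows. The sketch below is a variation: it sums cur as [decimal] rather than with Measure-Object (whose Sum is a floating-point value), so 10.50 + 20 stays "30.50" as in the expected output. The export path is hypothetical and left commented out.

```powershell
# The question's sample rows (first 6 columns only)
$revCSV = @'
master_account_number,sub,txn,cur,last,first
001,12,35,50.25,BIRD, BIG
001,34,47,100.10,BIRD, BIG
002,56,9,10.50,BUNNY, BUGS
002,78,3,20,BUNNY, BUGS
003,54,7,250,DUCK, DAFFY
004,44,88,25,MOUSE, JERRY
'@ | ConvertFrom-Csv

$merged = $revCSV | Group-Object -Property master_account_number | ForEach-Object {
    # Copy the first row so every column is kept and the input objects are untouched
    $row = $_.Group[0].psobject.Copy()
    # Decimal arithmetic keeps values such as 150.35 and 30.50 exact
    $sum = [decimal]0
    foreach ($r in $_.Group) { $sum += [decimal]$r.cur }
    $row.cur = $sum
    $row
}

$merged | ConvertTo-Csv -NoTypeInformation
# $merged | Export-Csv -Path '\\server\path\merged.csv' -NoTypeInformation   # hypothetical output path
```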

Powershell Compare-object IF different then ONLY list items from one file, not both

I have deleted my original question because I believe I have a more efficient way to run my script, thus I'm changing my question.
$scrubFileOneDelim = "|"
$scrubFileTwoDelim = "|"
$scrubFileOneBal = 2
$scrubFileTwoBal = 56
$scrubFileOneAcctNum = 0
$scrubFileTwoAcctNum = 0
$ColumnsF1 = Get-Content $scrubFileOne | ForEach-Object{($_.split($scrubFileOneDelim)).Count} | Measure-Object -Maximum | Select-Object -ExpandProperty Maximum
$ColumnsF2 = Get-Content $scrubFileTwo | ForEach-Object{($_.split($scrubFileTwoDelim)).Count} | Measure-Object -Maximum | Select-Object -ExpandProperty Maximum
$useColumnsF1 = $ColumnsF1-1;
$useColumnsF2 = $ColumnsF2-1;
$fileOne = import-csv "$scrubFileOne" -Delimiter "$scrubFileOneDelim" -Header (0..$useColumnsF1) | select -Property @{label="BALANCE";expression={$($_.$scrubFileOneBal)}},@{label="ACCTNUM";expression={$($_.$scrubFileOneAcctNum)}}
$fileTwo = import-csv "$scrubFileTwo" -Delimiter "$scrubFileTwoDelim" -Header (0..$useColumnsF2) | select -Property @{label="BALANCE";expression={$($_.$scrubFileTwoBal)}},@{label="ACCTNUM";expression={$($_.$scrubFileTwoAcctNum)}}
$hash = @{}
$hashTwo = @{}
$fileOne | foreach { $hash.add($_.ACCTNUM, $_.BALANCE) }
$fileTwo | foreach { $hashTwo.add($_.ACCTNUM, $_.BALANCE) }
In this script I count the columns in each file and use that count in a range operator to generate the headers dynamically for later manipulation. Then I import the 2 CSV files and load each of them into its own hashtable.
Just for an idea of what I'm trying to do from here...
CSV1 (as a hashtable) looks like this:
Name Value
---- -----
000000000001 000000285+
000000000002 000031000+
000000000003 000004685+
000000000004 000025877+
000000000005 000000001+
000000000006 000031000+
000000000007 000018137+
000000000008 000000000+
CSV2 (as a hashtable) looks like this:
Name Value
---- -----
000000000001 000008411+
000000000003 000018137+
000000000007 000042865+
000000000008 000009761+
I would like to create a third hash table. It will have all the "NAME" items from CSV2, but instead of the "VALUE"s from CSV2, I want it to have the "VALUE"s that CSV1 has. So the end result would look like this:
Name Value
---- -----
000000000001 000000285+
000000000003 000004685+
000000000007 000018137+
000000000008 000000000+
Ultimately I want this to be exported as a csv.
I have tried doing this with just Compare-Object, without the hashtables, using the code below, but I abandoned that approach because file 1 may have 100,000 "accounts" while file 2 has only 200, and my result listed close to 100,000 accounts that I didn't want in it. They had the right balances, but I want a file that contains balances only for the accounts listed in file 2. The code below isn't really part of my question; it just shows something I've tried. I think this is much easier and faster with a hashtable, so I would like to go that route.
#Find and Rename the BALANCE and ACCOUNT NUMBER columns in both files.
$fileOne = import-csv "$scrubFileOne" -Delimiter "$scrubFileOneDelim" -Header (0..$useColumnsF1) | select -Property @{label="BALANCE";expression={$($_.$scrubFileOneBal)}},@{label="ACCT-NUM";expression={$($_.$scrubFileOneAcctNum)}}
$fileTwo = import-csv "$scrubFileTwo" -Delimiter "$scrubFileTwoDelim" -Header (0..$useColumnsF2) | select -Property @{label="BALANCE";expression={$($_.$scrubFileTwoBal)}},@{label="ACCT-NUM";expression={$($_.$scrubFileTwoAcctNum)}}
Compare-Object $fileOne $fileTwo -Property 'BALANCE','ACCTNUM' -IncludeEqual -PassThru | Where-Object{$_.sideIndicator -eq "<="} | select * -Exclude SideIndicator | export-csv -notype "C:\test\f1.txt"

What you are after is filtering the output of the Compare-Object cmdlet on its SideIndicator property, so that only one side of the result is shown. You need to place this filter before you exclude that property for it to work:
| Where-Object{$_.sideIndicator -eq "<="} |

Assuming that you have the following hash tables:
$hash = @{
'000000000001' = '000000285+';
'000000000002' = '000031000+';
'000000000003' = '000004685+';
'000000000004' = '000025877+';
'000000000005' = '000000001+';
'000000000006' = '000031000+';
'000000000007' = '000018137+';
'000000000008' = '000000000+';
}
$hashTwo = @{
'000000000001' = '000008411+';
'000000000003' = '000018137+';
'000000000007' = '000042865+';
'000000000008' = '000009761+';
}
you can create the third hash table by iterating over the keys of the second hash table and assigning each key the corresponding value from the first hash table.
$hashThree = @{}
ForEach ($key In $hashTwo.Keys) {
$hashThree["$key"] = $hash["$key"]
}
$hashThree
The output of $hashThree is:
Name Value
---- -----
000000000007 000018137+
000000000001 000000285+
000000000008 000000000+
000000000003 000004685+
If you want the order of the keys maintained, you can use [ordered]@{} when creating the hash tables; ordered dictionaries have been available since PowerShell 3.0, so PowerShell 6 Core is not required.
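Since the goal is a CSV, the lookup can also produce objects directly instead of a third hash table. In this sketch (an adaptation, not the answer's code), a ContainsKey check skips accounts that appear in file 2 but not in file 1, which is an assumption about how such accounts should be handled, and sorting the keys gives a stable row order. The extra account 000000000009 is hypothetical, added only to show the skip; the output path from the question is left commented out.

```powershell
# Trimmed sample hash tables: account number -> balance
$hash = @{
    '000000000001' = '000000285+'
    '000000000002' = '000031000+'
    '000000000003' = '000004685+'
    '000000000007' = '000018137+'
    '000000000008' = '000000000+'
}
$hashTwo = @{
    '000000000001' = '000008411+'
    '000000000003' = '000018137+'
    '000000000007' = '000042865+'
    '000000000008' = '000009761+'
    '000000000009' = '000000100+'   # hypothetical account that file 1 does not contain
}

# Accounts from file 2, balances from file 1; accounts missing from $hash are skipped
$result = foreach ($key in ($hashTwo.Keys | Sort-Object)) {
    if ($hash.ContainsKey($key)) {
        [pscustomobject]@{ ACCTNUM = $key; BALANCE = $hash[$key] }
    }
}

$result | Format-Table -AutoSize
# $result | Export-Csv -Path 'C:\test\f1.txt' -NoTypeInformation
```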

Powershell csv row column transpose and manipulation

I'm a newbie in PowerShell. I'm trying to transpose rows and columns in a medium-sized CSV file (around 10000 rows). The original CSV consists of around 10000 rows with 3 columns ("Time","Id","IOT"), as below:
"Time","Id","IOT"
"00:03:56","23","26"
"00:03:56","24","0"
"00:03:56","25","0"
"00:03:56","26","1"
"00:03:56","27","0"
"00:03:56","28","0"
"00:03:56","29","0"
"00:03:56","30","1953"
"00:03:56","31","22"
"00:03:56","32","39"
"00:03:56","33","8"
"00:03:56","34","5"
"00:03:56","35","269"
"00:03:56","36","5"
"00:03:56","37","0"
"00:03:56","38","0"
"00:03:56","39","0"
"00:03:56","40","1251"
"00:03:56","41","103"
"00:03:56","42","0"
"00:03:56","43","0"
"00:03:56","44","0"
"00:03:56","45","0"
"00:03:56","46","38"
"00:03:56","47","14"
"00:03:56","48","0"
"00:03:56","49","0"
"00:03:56","2013","0"
"00:03:56","2378","0"
"00:03:56","2380","32"
"00:03:56","2758","0"
"00:03:56","3127","0"
"00:03:56","3128","0"
"00:09:16","23","22"
"00:09:16","24","0"
"00:09:16","25","0"
"00:09:16","26","2"
"00:09:16","27","0"
"00:09:16","28","0"
"00:09:16","29","21"
"00:09:16","30","48"
"00:09:16","31","0"
"00:09:16","32","4"
"00:09:16","33","4"
"00:09:16","34","7"
"00:09:16","35","382"
"00:09:16","36","12"
"00:09:16","37","0"
"00:09:16","38","0"
"00:09:16","39","0"
"00:09:16","40","1882"
"00:09:16","41","42"
"00:09:16","42","0"
"00:09:16","43","3"
"00:09:16","44","0"
"00:09:16","45","0"
"00:09:16","46","24"
"00:09:16","47","22"
"00:09:16","48","0"
"00:09:16","49","0"
"00:09:16","2013","0"
"00:09:16","2378","0"
"00:09:16","2380","19"
"00:09:16","2758","0"
"00:09:16","3127","0"
"00:09:16","3128","0"
...
...
...
I tried to do the transpose using code based on a PowerShell script downloaded from https://gallery.technet.microsoft.com/scriptcenter/Powershell-Script-to-7c8368be
My PowerShell code is basically as follows:
$b = @()
foreach ($Time in $a.Time | Select -Unique) {
$Props = [ordered]@{ Time = $time }
foreach ($Id in $a.Id | Select -Unique){
$IOT = ($a.where({ $_.Id -eq $Id -and $_.time -eq $time })).IOT
$Props += @{ $Id = $IOT }
}
$b += New-Object -TypeName PSObject -Property $Props
}
$b | FT -AutoSize
$b | Out-GridView
The code above gives me the result I expect: all "Id" values become column headers, each unique "Time" value becomes a row, and the "IOT" values fill the intersections of "Id" x "Time", as below:
"Time","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","2013","2378","2380","2758","3127","3128"
"00:03:56","26","0","0","1","0","0","0","1953","22","39","8","5","269","5","0","0","0","1251","103","0","0","0","0","38","14","0","0","0","0","32","0","0","0"
"00:09:16","22","0","0","2","0","0","21","48","0","4","4","7","382","12","0","0","0","1882","42","0","3","0","0","24","22","0","0","0","0","19","0","0","0"
With only a few hundred rows, the result comes out quickly as expected, but when processing the whole CSV file with 10000 rows, the script above keeps executing for a long time (hours) without finishing or producing any results.
Could some PowerShell experts from Stack Overflow help assess the code above and perhaps modify it to speed it up?
Many thanks for the advice.

10000 records is a lot, but I don't think it is enough to justify a StreamReader* and parsing the CSV manually. The biggest thing working against you, though, is the following line:
$b += New-Object -TypeName PSObject -Property $Props
Because arrays have a fixed size, PowerShell creates a new array on every +=, copies the existing elements into it, and then appends the new element. This is a memory-intensive operation that you repeat for every object you add. The better thing to do in this case is to use the pipeline to your advantage.
$data = Import-Csv -Path "D:\temp\data.csv"
$headers = $data.ID | Sort-Object {[int]$_} -Unique
$data | Group-Object Time | ForEach-Object{
$props = [ordered]@{Time = $_.Name}
foreach($header in $headers){
$props."$header" = ($_.Group | Where-Object{$_.ID -eq $header}).IOT
}
[pscustomobject]$props
} | export-csv d:\temp\testing.csv -NoTypeInformation
$data holds your entire file in memory as objects. $headers collects all the distinct IDs, which become the column headers.
The data is then grouped by Time. Inside each group, we get the IOT value for every ID; if an ID does not occur at that time, its entry is $null.
This is not the best way, but it should be faster than yours. I ran 10000 records in under a minute (51-second average over 3 passes). I'll benchmark it to show you if I can.
I just ran your code once with my own data and it took 13 minutes. I think it is safe to say that mine performs faster.
FYI, the dummy data was generated with this logic:
1..100 | %{
$time = get-date -Format "hh:mm:ss"
sleep -Seconds 1
1..100 | % {
[pscustomobject][ordered]@{
time = $time
id = $_
iot = Get-Random -Minimum 0 -Maximum 7
}
}
} | Export-Csv d:\temp\data.csv -notypeinformation
* Not a stellar example of StreamReader for your case; I'm just pointing out that it is the better way to read large files. You just need to parse each line as a string.
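In the question's code, most of the time likely goes into the nested .where() call, which scans every row of $a once per Time/Id cell; the answer's version still runs Where-Object over each group once per ID. A further variation, sketched below and not taken from either post, indexes the rows in one pass (by Time, then by Id) so that filling each cell is a hashtable lookup. The inline CSV uses rows from the question, with some omitted so the missing-Id case shows up.

```powershell
# A few rows from the question; 35 is missing at 00:03:56 and 24 at 00:09:16
$data = @'
"Time","Id","IOT"
"00:03:56","23","26"
"00:03:56","24","0"
"00:03:56","30","1953"
"00:09:16","23","22"
"00:09:16","30","48"
"00:09:16","35","382"
'@ | ConvertFrom-Csv

# Column order: distinct Ids sorted numerically (they stay strings)
$ids = $data.Id | Sort-Object { [int]$_ } -Unique

# One pass over the data: Time -> (Id -> IOT), keeping Times in order of first appearance
$byTime = [ordered]@{}
foreach ($row in $data) {
    if (-not $byTime.Contains($row.Time)) { $byTime[$row.Time] = @{} }
    $byTime[$row.Time][$row.Id] = $row.IOT
}

# Build one output object per Time; Ids that never occur at that Time become $null
$result = foreach ($time in $byTime.Keys) {
    $props = [ordered]@{ Time = $time }
    foreach ($id in $ids) { $props[$id] = $byTime[$time][$id] }
    [pscustomobject]$props
}

$result | Format-Table -AutoSize
```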
