Powershell - Group-Object Expand and Group again

I have a .csv file which I'm grouping on two properties, 'DN' and 'SKU', and then performing a sum on another property, 'DeliQty'.
This works fine and the sum is reflected back to the group.
However, I then need to re-group just on 'DN' and write out to separate files.
I've tried Select-Object -Expand Group, but this reverts to the original contents without the summed lines.
Is there a way to ungroup, preserving the summed lines, and then group again?
$CSVFiles = Get-ChildItem -Path C:\Scripts\INVENTORY\ASN\IMPORT\ -Filter *.csv
foreach ($csv in $CSVFiles) {
    $group = Import-Csv $csv.FullName | Group-Object DN, SKU
    $group | Where Count -gt 1 | ForEach-Object {
        $_.Group[0].'DeliQty' = ($_.Group | Measure-Object DeliQty -Sum).Sum
    }
}

You may do the following:
$CSVFiles = Get-ChildItem -Path C:\Scripts\INVENTORY\ASN\IMPORT\ -Filter *.csv
foreach ($csv in $CSVFiles) {
    $group = Import-Csv $csv.FullName | Group-Object DN, SKU | Foreach-Object {
        if ($_.Count -gt 1) {
            $_.Group[0].DeliQty = ($_.Group | Measure-Object DeliQty -Sum).Sum
        }
        $_.Group[0]
    }
    # outputs summed and single group objects
    $group
}
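To cover the second half of the question, the summed rows in $group can then be re-grouped on DN alone (inside the same foreach loop) and written out one file per DN. A minimal sketch, with a hypothetical export folder:
$group | Group-Object DN | ForEach-Object {
    # $_.Name is the DN value for this group; the export folder is hypothetical
    $_.Group | Export-Csv "C:\Scripts\INVENTORY\ASN\EXPORT\$($_.Name).csv" -NoTypeInformation
}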

Related

Powershell import multiple csv and group by name and total up wins

I am trying to import multiple CSV files, each with Name and Wins columns, and output a total score; I don't want to create another CSV for the output.
I want to group by Name and total the Wins. Please see the code below that I have tried:
get-item -Path "File Path" |
ForEach-Object {
    import-csv $_ |
    Group-Object Name
    Select-Object Name, @{ n='Wins'; e={ ($_.Group | Measure-Object Wins -Sum).Sum } }
}
I was hoping for an outcome with one row per name and the total wins, but for some reason the current code is still not grouping on Name. Any help would be awesome.
This will give you the output you are expecting, with the names and total wins for each player.
$csv1 = Import-Csv "File path of CSV 1"
$csv2 = Import-Csv "File path of CSV 2"
$allRecords = $csv1 + $csv2
$allRecords | Group-Object Name | Select-Object Name, @{ n='Wins'; e={ ($_.Group | Measure-Object Wins -Sum).Sum } }
Update
With multiple CSV files:
$allRecords = @()
$directory = "Path of the directory containing the CSV files"
$filePaths = Get-ChildItem -Path $directory -Filter "*.csv"
foreach ($filePath in $filePaths) {
    $csv = Import-Csv $filePath
    $allRecords += $csv
}
$allRecords | Group-Object Name | Select-Object Name, @{ n='Wins'; e={ ($_.Group | Measure-Object Wins -Sum).Sum } }
If you have a very high number of csv files, you'll find something like this much faster:
$CombinedRecords = Get-ChildItem -Filter *.csv -Path C:\temp | Select-Object -ExpandProperty FullName | Import-Csv
$CombinedRecords | Group-Object Name | Select-Object Name, @{ n='Wins'; e={ ($_.Group | Measure-Object Wins -Sum).Sum } }
It can even be a one-liner:
Get-ChildItem -Filter *.csv -Path C:\temp | Select-Object -ExpandProperty FullName | Import-Csv | Group-Object Name | Select-Object Name, @{ n='Wins'; e={ ($_.Group | Measure-Object Wins -Sum).Sum } }
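To make the expected behaviour concrete, here is the same grouping run against small, hypothetical stand-ins for the two CSV files (the names and numbers are invented for illustration):
# Hypothetical sample data standing in for the two CSV files
$csv1 = @'
Name,Wins
Alice,3
Bob,2
'@ | ConvertFrom-Csv
$csv2 = @'
Name,Wins
Alice,1
Bob,4
'@ | ConvertFrom-Csv
($csv1 + $csv2) | Group-Object Name |
    Select-Object Name, @{ n='Wins'; e={ ($_.Group | Measure-Object Wins -Sum).Sum } }
# Name  Wins
# ----  ----
# Alice    4
# Bob      6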

Import-csv Multiple files

Simple Task. Just trying to loop through a folder of .csv files and import them. I'm a bit confused as this does work on other scripts, but I must be missing something very obvious.
$CSVFiles correctly finds 3 files in the folder.
However $csvOutput only contains data from the last file in the folder.
The rest of the script is working fine.
$CSVFolder = 'D:\XXXX\2021_MAY\'
$OutputFile = 'D:\XXXX\2021_MAY\merged_MAY.csv'
$CSVFiles = Get-ChildItem -Path $CSVFolder -Filter *.csv
foreach ($csv in $CSVFiles) {
    $csvOutput = Import-Csv $csv.FullName
    $group = $csvOutput | Select-Object 'Channel Reference',
        'Received Date',
        'Currency',
        'Payment Method',
        'Channel Payment Reference',
        'Line Total Excluding Tax',
        'Order Tax', 'Order Total' | Group-Object {$_.'Channel Reference'}
    $group | Where Count -gt 1 | Foreach-Object {
        $_.Group[0].'Line Total Excluding Tax' = ($_.Group | Measure-Object 'Line Total Excluding Tax' -Sum).Sum
    }
    $group | Foreach-Object { $_.Group[0] } | Export-Csv $OutputFile -NoType
}
I've now fixed this by adding -Append to the Export-Csv:
$CSVFolder = 'D:\XXXX\2021_MAY\'
$OutputFile = 'D:\XXXX\2021_MAY\merged_MAY.csv'
$CSVFiles = Get-ChildItem -Path $CSVFolder -Filter *.csv
foreach ($csv in $CSVFiles) {
    $group = Import-Csv $csv.FullName | Select-Object 'Channel Reference','Received Date','Currency','Payment Method','Channel Payment Reference','Line Total Excluding Tax','Order Tax','Order Total' | Group-Object {$_.'Channel Reference'} |
        Foreach-Object {
            if ($_.Count -gt 1) {
                $_.Group[0].'Line Total Excluding Tax' = ($_.Group | Measure-Object 'Line Total Excluding Tax' -Sum).Sum
            }
            $_.Group[0]
        }
    $group | Export-Csv $OutputFile -NoType -Append
}
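One caveat with -Append: if merged_MAY.csv is left over from a previous run, the loop will keep appending to the stale file. A minimal guard, reusing the same $OutputFile variable, is to remove it before the loop:
# Delete any output left over from a previous run before the loop starts appending
if (Test-Path $OutputFile) {
    Remove-Item $OutputFile
}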

Powershell - Replacing rows in .csv with grouped items

I have the following scenario:
A .csv file containing orders, where orders with multiple items are on separate rows.
I'm grouping rows by order id and sku to perform sums on some columns prior to exporting to .csv.
I have the code below, which performs the grouping and sums and can write out the results to a separate .csv.
What I need to do is update the original .csv file, replacing the original rows with the summed rows.
Any help greatly appreciated.
$output = @()
# Import .CSV and group on amazon-order-id and sku
# Filter group to only give lines with multiple occurrences of each sku per order
Import-Csv D:\Exports\Test\AMAZON\*.csv | Group-Object amazon-order-id, sku | Where-Object {$_.Count -gt 1} |
# Loop through group object. Take first line of each group and place in $new variable
# Using dot notation, sum required columns and add rows to $output variable
ForEach-Object {
    $new = $_.Group[0].psobject.Copy()
    $new.'quantity-shipped' = ($_.Group | Measure-Object quantity-shipped -Sum).Sum
    $new.'item-price' = ($_.Group | Measure-Object item-price -Sum).Sum
    $new.'item-tax' = ($_.Group | Measure-Object item-tax -Sum).Sum
    $new.'shipping-price' = ($_.Group | Measure-Object shipping-price -Sum).Sum
    $new.'shipping-tax' = ($_.Group | Measure-Object shipping-tax -Sum).Sum
    $new.'gift-wrap-price' = ($_.Group | Measure-Object gift-wrap-price -Sum).Sum
    $new.'gift-wrap-tax' = ($_.Group | Measure-Object gift-wrap-tax -Sum).Sum
    $new.'item-promotion-discount' = ($_.Group | Measure-Object item-promotion-discount -Sum).Sum
    $new.'ship-promotion-discount' = ($_.Group | Measure-Object ship-promotion-discount -Sum).Sum
    $output += $new
}
# Select all group members and export to .csv file
$output | select * | Export-Csv D:\Exports\Test\AMAZON\Import_Me.csv -NoTypeInformation
I am assuming that you want to group common rows into a single row and only be left with those grouped single rows plus uncommon rows.
$CSVFiles = Get-ChildItem -Path D:\Exports\Test\AMAZON -File -Filter '*.csv' | Where Extension -eq '.csv'
$output = foreach ($csv in $CSVFiles) {
    $csvOutput = Import-Csv $csv.FullName
    $group = $csvOutput | Group-Object amazon-order-id, sku
    $group | Where Count -gt 1 | Foreach-Object {
        $_.Group[0].'quantity-shipped' = ($_.Group | Measure-Object quantity-shipped -Sum).Sum
        $_.Group[0].'item-price' = ($_.Group | Measure-Object item-price -Sum).Sum
        $_.Group[0].'item-tax' = ($_.Group | Measure-Object item-tax -Sum).Sum
        $_.Group[0].'shipping-price' = ($_.Group | Measure-Object shipping-price -Sum).Sum
        $_.Group[0].'shipping-tax' = ($_.Group | Measure-Object shipping-tax -Sum).Sum
        $_.Group[0].'gift-wrap-price' = ($_.Group | Measure-Object gift-wrap-price -Sum).Sum
        $_.Group[0].'gift-wrap-tax' = ($_.Group | Measure-Object gift-wrap-tax -Sum).Sum
        $_.Group[0].'item-promotion-discount' = ($_.Group | Measure-Object item-promotion-discount -Sum).Sum
        $_.Group[0].'ship-promotion-discount' = ($_.Group | Measure-Object ship-promotion-discount -Sum).Sum
    }
    $group | Foreach-Object { $_.Group[0] } # Output to variable only
    $group | Foreach-Object { $_.Group[0] } | Export-Csv $csv.FullName -NoType
}
$output
Explanation:
$CSVFiles is a collection of CSV files. We need to be able to target them one-by-one to know which file to update. Each file is targeted using a foreach loop with the current file being $csv.
Since $csvOutput is the contents of a CSV file as an array of PSCustomObjects, we can update each object with a Foreach-Object and it will be reflected back in $csvOutput.
$group is assigned the Group-Object output of each CSV file. Using a variable here minimizes the grouping action to only once per file. First, perform the modifications on each group where there are multiple matches. Using your logic, the first object in a grouping is modified. A Foreach-Object is used to go through all the groupings.
Once all modifications are done for one CSV file, $group is output (this includes multiple and single groupings) and only the first object in a grouping is selected ($_.Group[0]) using another Foreach-Object, which works for the single object groupings as well. That output is passed into Export-Csv to update the appropriate file.
$output lists all CSV contents after modifications.
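As an aside, the nine near-identical Measure-Object assignments can be collapsed by looping over the column names. A minimal sketch of the same in-place update, assuming the column names from the question:
# Columns whose values should be summed into the first row of each group
$sumColumns = 'quantity-shipped', 'item-price', 'item-tax', 'shipping-price',
    'shipping-tax', 'gift-wrap-price', 'gift-wrap-tax',
    'item-promotion-discount', 'ship-promotion-discount'
$group | Where-Object Count -gt 1 | ForEach-Object {
    $g = $_
    foreach ($col in $sumColumns) {
        # dynamic property access: $obj.$col reads and writes the named column
        $g.Group[0].$col = ($g.Group | Measure-Object $col -Sum).Sum
    }
}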

Powershell extraction memory consuming

I have worked on a little project to extract some information out of a file server. To perform that project I created a script that outputs all the information to a .csv file. The problem is that PowerShell eats up all my computer's RAM during the process, because there are hundreds of GB of data to parse.
Here is my script:
$folder = Get-ChildItem -Recurse 'Complete_Path' | select FullName, @{Name="Owner";Expression={(Get-Acl $_.FullName).Owner}}, CreationTime, LastWriteTime, LastAccessTime, PSIsContainer | sort FullName
$output = @()
$folder | foreach {
    $type =
        if ($_.PSIsContainer -eq "True") {
            Write-Output "Folder"
        }
        else {
            Write-Output "File"
        }
    $size =
        if ($_.PSIsContainer -eq "True") {
            Get-ChildItem -Recurse $_.FullName | measure -Property Length -Sum -ErrorAction SilentlyContinue | select -ExpandProperty Sum
        }
        else {
            Get-Item $_.FullName | measure -Property Length -Sum -ErrorAction SilentlyContinue | select -ExpandProperty Sum
        }
    $hash = @{
        FullName = $_.FullName
        Owner = $_.Owner
        CreationTime = $_.CreationTime
        LastWriteTime = $_.LastWriteTime
        LastAccessTime = $_.LastAccessTime
        Type = $type
        'Size in MB' = [math]::Round($($size/1Mb),2)
    }
    $output += New-Object PSObject -Property $hash
}
$output | select FullName, Owner, CreationTime, LastWriteTime, LastAccessTime, Type, 'Size in MB' | Export-Csv C:\myDOCS.csv -Delimiter ";" -NoTypeInformation -Encoding UTF8
Do you guys have any idea how I can get the job done faster and with less RAM consumption? The extraction can take days.
Thank you in advance.
Replace your PowerShell array $output = @() with a .NET list, $output = [System.Collections.Generic.List[psobject]]::new(), and use the .Add method of that object to add your items.
For a small list you won't notice, but using a PowerShell array with the += operator is a big performance sink: each time you do +=, the array is recreated entirely with one more item.
Include Length in your initial Get-ChildItem statement. Later on, you can measure the sum without going through Get-ChildItem again all the time.
The pipeline plays nice with memory but is slower overall. I tend to prefer not using the pipeline when performance becomes an issue.
Something like this should already be significantly faster:
$folder = Get-ChildItem -Recurse "$($env:USERPROFILE)\Downloads" | select FullName, @{Name = "Owner"; Expression = { (Get-Acl $_.FullName).Owner } }, CreationTime, LastWriteTime, LastAccessTime, PSIsContainer, Length | sort FullName
$output = [System.Collections.Generic.List[psobject]]::new()
foreach ($Item in $folder) {
    if ($Item.PSIsContainer) {
        $Type = 'Folder'
        # sum the Length of everything under this folder from the list we already have
        $size = $folder.Where({ $_.FullName -like "$($Item.FullName)\*" }) | measure -Property Length -Sum -ErrorAction SilentlyContinue | select -ExpandProperty Sum
    }
    else {
        $Type = 'File'
        $size = $Item.Length
    }
    $size = [math]::Round($($size / 1Mb), 2)
    $hash = @{
        FullName = $Item.FullName
        Owner = $Item.Owner
        CreationTime = $Item.CreationTime
        LastWriteTime = $Item.LastWriteTime
        LastAccessTime = $Item.LastAccessTime
        Type = $Type
        'Size in MB' = $size
    }
    [void]($output.Add((New-Object PSObject -Property $hash)))
}
$output | select FullName, Owner, CreationTime, LastWriteTime, LastAccessTime, Type, 'Size in MB' | Export-Csv C:\myDOCS.csv -Delimiter ";" -NoTypeInformation -Encoding UTF8
You could still improve the size calculation so the deepest folders are measured first; each parent folder could then sum its children's totals instead of recalculating over all the files.
Another thought would be to not do the Get-Acl immediately (I suspect this one is slow to perform): get your items, do the rest, then parallelize the Get-Acl so you can fetch the values on a number of parallel threads, adding each value to your list as you get it.
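On PowerShell 7+, that idea can be sketched with ForEach-Object -Parallel (the throttle value of 8 is arbitrary, and 'Complete_Path' is the placeholder from the question; Windows PowerShell 5.1 would need runspaces or jobs instead):
# Gather the items without Get-Acl first, then resolve owners on several
# threads and attach them to each object afterwards (PowerShell 7+ only)
$items = Get-ChildItem -Recurse 'Complete_Path' |
    Select-Object FullName, CreationTime, LastWriteTime, LastAccessTime, PSIsContainer, Length
$items = $items | ForEach-Object -Parallel {
    $_ | Add-Member -NotePropertyName Owner `
                    -NotePropertyValue (Get-Acl -LiteralPath $_.FullName).Owner -PassThru
} -ThrottleLimit 8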
Think about testing your code on smaller batches, and use Measure-Command to determine where the slowest operations in your code are.
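For example, Measure-Command makes the cost of += visible on a small batch:
# Time array += against List.Add over the same number of iterations
(Measure-Command {
    $a = @()
    1..5000 | ForEach-Object { $a += $_ }
}).TotalMilliseconds
(Measure-Command {
    $l = [System.Collections.Generic.List[int]]::new()
    1..5000 | ForEach-Object { $l.Add($_) }
}).TotalMilliseconds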
I recommend taking a look at some more advanced topics on the subject.
Here's a good article to get you started: Slow Code: Top 5 ways to make your Powershell scripts run faster
Is this better with the whole thing in one pipeline?
Get-ChildItem -Recurse |
    select FullName, @{Name="Owner";Expression={(Get-Acl $_.FullName).Owner}},
        CreationTime, LastWriteTime, LastAccessTime, PSIsContainer | sort FullName |
    foreach {
        $type =
            if ($_.PSIsContainer -eq "True") {
                Write-Output "Folder"
            }
            else {
                Write-Output "File"
            }
        $size =
            if ($_.PSIsContainer -eq "True") {
                Get-ChildItem -Recurse $_.FullName |
                    measure -Property Length -Sum -ErrorAction SilentlyContinue |
                    select -ExpandProperty Sum
            }
            else {
                Get-Item $_.FullName |
                    measure -Property Length -Sum -ErrorAction SilentlyContinue |
                    select -ExpandProperty Sum
            }
        $hash = @{
            FullName = $_.FullName
            Owner = $_.Owner
            CreationTime = $_.CreationTime
            LastWriteTime = $_.LastWriteTime
            LastAccessTime = $_.LastAccessTime
            Type = $type
            'Size in MB' = [math]::Round($($size/1Mb),2)
        }
        New-Object PSObject -Property $hash
    } | select FullName, Owner, CreationTime, LastWriteTime, LastAccessTime,
        Type, 'Size in MB' |
    Export-Csv myDOCS.csv -Delimiter ";" -NoTypeInformation -Encoding UTF8

Comparing two files: Single column in FirstFile - Multiple columns in SecondFile

I've figured out how to compare single columns in two files, but I can't figure out how to compare two files with one column in the first and multiple columns in the second file. Both contain emails.
First file.csv (contains a single column of emails):
john@email.com
jack@email.com
jill@email.com
Second file.csv (contains multiple columns of emails):
john@email.nl,john@email.eu,john@email.com
jill@email.se,jill@email.com,jill@email.us
By comparing them I would like to output the difference, which would result in:
Output.csv
jack@email.com
Anyone able to help me? :)
Single-column comparison and output of the difference:
#Line extracts emails from list
$SubscribedMails = import-csv .\subscribed.csv | Select-Object -Property email
#Line extracts emails from list
$ValidEmails = import-csv .\users-emails.csv | Select-Object -Property email
$compare = Compare-Object $SubscribedMails $ValidEmails -property email -IncludeEqual | where-object {$_.SideIndicator -eq "<="} | Export-csv .\nonvalid-emails.csv -NoTypeInformation
(Get-Content .\nonvalid-emails.csv) | ForEach-Object { $_ -replace ',"<="' } > .\nonvalid-emails.csv
Since the first file already contains one email address per line, you can import it right away.
Take the second file and split the strings containing several addresses; a new array with separate addresses will be generated.
Judging from your output, you only want addresses that are in the first csv but not in the second.
Your code could look like this:
$firstFile = Get-Content 'FirstFile.csv'
$secondFile = (Get-Content 'SecondFile.csv').Split(',')
foreach ($item in $firstFile) {
    if ($item -notin $secondFile) {
        # $item is a plain string, so write it out directly; piping a string
        # to Export-Csv would export the string's properties (e.g. Length)
        $item | Add-Content 'output.csv'
    }
}
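If the files are large, a HashSet turns each membership test into a constant-time lookup; the same comparison (same file names) could be sketched as:
# Load the second file's addresses into a set once, then keep only the
# first-file addresses that are not in the set
$known = [System.Collections.Generic.HashSet[string]]::new(
    [string[]](Get-Content 'SecondFile.csv').Split(','))
Get-Content 'FirstFile.csv' |
    Where-Object { -not $known.Contains($_) } |
    Set-Content 'output.csv'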
If you want to keep your current approach, you can consider a script like:
#Line extracts emails from list
$SubscribedMails = import-csv .\subscribed.csv | Select-Object -Property email
Rename-Item .\users-emails.csv users-emails.csv.bk
(Get-Content .\users-emails.csv.bk).replace(',', "`r`n") | Set-Content .\users-emails.csv
#Line extracts emails from list
$ValidEmails = import-csv .\users-emails.csv | Select-Object -Property email
$compare = Compare-Object $SubscribedMails $ValidEmails -property email -IncludeEqual | where-object {$_.SideIndicator -eq "<="} | Export-csv .\nonvalid-emails.csv -NoTypeInformation
(Get-Content .\nonvalid-emails.csv) | ForEach-Object { $_ -replace ',"<="' } > .\nonvalid-emails.csv
Remove-Item .\users-emails.csv
Rename-Item .\users-emails.csv.bk users-emails.csv
or, more simply:
#Line extracts emails from list
$SubscribedMails = import-csv .\subscribed.csv | Select-Object -Property email
(Get-Content .\users-emails.csv).replace(',', "`r`n") | Set-Content .\users-emails.csv.bk
#Line extracts emails from list
$ValidEmails = import-csv .\users-emails.csv.bk | Select-Object -Property email
$compare = Compare-Object $SubscribedMails $ValidEmails -property email -IncludeEqual | where-object {$_.SideIndicator -eq "<="} | Export-csv .\nonvalid-emails.csv -NoTypeInformation
(Get-Content .\nonvalid-emails.csv) | ForEach-Object { $_ -replace ',"<="' } > .\nonvalid-emails.csv
Remove-Item .\users-emails.csv.bk
None of the suggestions so far work :(
Still hoping :)
Will delete comment when happy :p
Can you try this?
$One = (Get-Content .\FirstFile.csv).Split(',')
$Two = (Get-Content .\SecondFile.csv).Split(',')
$CsvPath = '.\Output.csv'
$Diff = @()
(Compare-Object ($One | Sort-Object) ($Two | Sort-Object) |
    Where-Object {$_.SideIndicator -eq '<='}).InputObject |
    ForEach-Object {$Diff += New-Object PSObject -Property @{email=$_}}
$Diff | Export-Csv -Path $CsvPath -NoTypeInformation
Output.csv will contain the entries that exist in FirstFile but not in SecondFile.
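For what it's worth, a slightly shorter variant of the same idea builds the output objects inline (assuming the same file layout):
# Emit only the values present in FirstFile.csv but not in SecondFile.csv
Compare-Object ((Get-Content .\FirstFile.csv).Split(',') | Sort-Object) `
               ((Get-Content .\SecondFile.csv).Split(',') | Sort-Object) |
    Where-Object SideIndicator -eq '<=' |
    ForEach-Object { [pscustomobject]@{ email = $_.InputObject } } |
    Export-Csv .\Output.csv -NoTypeInformation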