Compare two different CSV files using PowerShell

I'm looking for a way to compare two .csv files and report the differences.
The first .csv file contains one month's backup size in KB per client name. The second .csv file contains the next month's backup size in KB per client name.
Column A lists all the client names. Column B has the corresponding policy name of the client, and the last column has the backup size in KB (e.g. 487402463).
If the difference between a client's sizes (1638838488 - 1238838488 = 0.37 in TB) is greater than 0.10 TB, the result should be written in TB to a CSV file like the one below.
Also, a client may be related to multiple policy names.
In addition, there may sometimes be duplicate client and policy names, such as hostnameXX,Company_Policy_XXX, or the same name in a different case, such as HOSTNAMEXX,Company_Policy_XXX.
Additionally, if, say, hostnameYY,Company_Policy_XXX,41806794 does not exist in CSV2, then I want to display it as negative, like below.
I used the Join-Object module.
Example CSVFile1.csv
Client Name,Policy Name,KB Size
hostname1,Company_Policy,487402463
hostname2,Company_Policy,227850336
hostname3,Company_Policy_11,8360960
hostname4,Company_Policy_11,1238838488
hostname1,Company_Policy_55,521423110
hostname10,Company_Policy,28508975
hostname3,Company_Policy_66,295925
hostname5,Company_Policy_22,82001824
hostname2,Company_Policy_33,26176885
hostnameXX,Company_Policy_XXX,0
hostnameXX,Company_Policy_XXX,41806794
hostnameYY,Company_Policy_XXX,41806794
Example CSVFile2.csv
Client Name,Policy Name,KB Size
hostname1,Company_Policy,487402555
hostname2,Company_Policy,227850666
hostname3,Company_Policy_11,8361200
hostname4,Company_Policy_11,1638838488
hostname1,Company_Policy_55,621423110
hostname10,Company_Policy,28908975
hostname3,Company_Policy_66,295928
hostname5,Company_Policy_22,92001824
hostname2,Company_Policy_33,36176885
hostname22,Company_Policy,291768854
hostname23,Company_Policy,291768854
Desired Output :
Client Name,Policy Name,TB Size
hostname4,Company_Policy_11,0.37
hostname22,Company_Policy,0.27
hostname23,Company_Policy,0.27
hostnameYY,Company_Policy_XXX,-0.03
hostnameXX,Company_Policy_XXX,-0.04

Using this Join-Object cmdlet (see also: what's the best way to join two tables into one?):
$CSV2 | FullJoin $CSV1 `
    -On 'Client Name','Policy Name' `
    -Property 'Client Name',
              'Policy Name',
              @{'TB Size' = {[math]::Round(($Left.'KB Size' - $Right.'KB Size') * 1KB / 1TB, 2)}} |
    Where-Object {[math]::Abs($_.'TB Size') -gt 0.01}
Result:
Client Name Policy Name        TB Size
----------- -----------        -------
hostname4   Company_Policy_11    -0.37
hostname1   Company_Policy_55    -0.09
hostnameXX  Company_Policy_XXX    0.04
hostnameYY  Company_Policy_XXX    0.04
hostname22  Company_Policy       -0.27
hostname23  Company_Policy       -0.27
Update 2019-11-24
Improved -Where parameter which will now also apply to outer joins.
You can now use the -Where parameter instead of the Where-Object cmdlet for this type of query, e.g.:
$Actual = $CSV2 | FullJoin $CSV1 `
    -On 'Client Name','Policy Name' `
    -Property 'Client Name',
              'Policy Name',
              @{'TB Size' = {[math]::Round(($Left.'KB Size' - $Right.'KB Size') / 1GB, 2)}} `
    -Where {[math]::Abs($Left.'KB Size' - $Right.'KB Size') -gt 100MB}
The advantage of using the -Where parameter is that there is a slight performance improvement as some output objects aren't required to be created at all.
Note 1: The -Where parameter applies to the $Left and $Right objects, which represent each $LeftInput and $RightInput object respectively, and not to the output object. In other words, you can't use e.g. the calculated TB Size property in the -Where expression for this example.
Note 2: The $Right object always exists in a left join or full join, even if there is no relation. If there is no relation, all properties of the $Right object will be set to $Null. The same applies to the $Left object in a right join or full join.
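Note 2 is also why unmatched rows still yield a signed size: PowerShell treats $null as 0 in a subtraction. The following is a minimal sketch in plain PowerShell, without Join-Object; the $Left and $Right variables here are ordinary variables that only stand in for the objects the join would supply, which is an assumption made for illustration.

```powershell
# Stand-ins for the objects a full join would hand to the expression.
# $Right has no matching row, so its property is $null.
$Left  = [pscustomobject]@{ 'KB Size' = '291768854' }
$Right = [pscustomobject]@{ 'KB Size' = $null }

# $null counts as 0 in a subtraction, so the whole left-hand size remains.
$diffKB = $Left.'KB Size' - $Right.'KB Size'

# A size in KB divided by 1GB is the same size expressed in TB.
$diffTB = [math]::Round($diffKB / 1GB, 2)
```

With the operands swapped, the same row would come out negative, so the order of the two inputs decides the sign of the report.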

I have never used the Join-Object module, so I wrote it using standard cmdlets.
$data1 = Import-Csv "CSVFile1.csv"
$data1 | ForEach-Object { $_."KB Size" = -1 * $_."KB Size" } # Convert to negative value
$data2 = Import-Csv "CSVFile2.csv"
@($data2; $data1) | Group-Object "Client Name","Policy Name" | ForEach-Object {
    $size = [Math]::Round(($_.Group | Measure-Object "KB Size" -Sum).Sum * 1KB / 1TB, 2)
    if ($size -ge 0 -and $size -lt 0.1) { return }
    [pscustomobject]@{
        "Client Name" = $_.Group[0]."Client Name"
        "Policy Name" = $_.Group[0]."Policy Name"
        "TB Size" = $size
    }
}
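To see how negating one file and summing per group behaves with duplicate and mixed-case client names, here is a small sketch on inline data. The rows are a hypothetical subset of the sample files (HOSTNAMEXX in upper case is made up to show the case handling), and the size filter is left out so that every group is visible.

```powershell
# Hypothetical inline sample data standing in for the two CSV files.
$old = @'
Client Name,Policy Name,KB Size
hostname4,Company_Policy_11,1238838488
hostnameXX,Company_Policy_XXX,0
HOSTNAMEXX,Company_Policy_XXX,41806794
'@ | ConvertFrom-Csv

$new = @'
Client Name,Policy Name,KB Size
hostname4,Company_Policy_11,1638838488
'@ | ConvertFrom-Csv

# Negate the old sizes so that a plain sum per group yields new minus old.
$old | ForEach-Object { $_.'KB Size' = -1 * $_.'KB Size' }

# Group-Object compares keys case-insensitively unless -CaseSensitive is given,
# so hostnameXX and HOSTNAMEXX end up in the same group.
$result = @($new; $old) |
    Group-Object 'Client Name', 'Policy Name' |
    ForEach-Object {
        $sumKB = ($_.Group | Measure-Object 'KB Size' -Sum).Sum
        [pscustomobject]@{
            'Client Name' = $_.Group[0].'Client Name'
            'Policy Name' = $_.Group[0].'Policy Name'
            'TB Size'     = [math]::Round($sumKB * 1KB / 1TB, 2)
        }
    }
```

Here hostname4 comes out as 0.37 and the merged hostnameXX group as -0.04, matching the corresponding rows of the desired output.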

Related

How to summarize value rows of one column with reference to another column in PowerShell object

I'm learning to work with the import-excel module and have successfully imported the data from a sample.xlsx file. I need to extract the total amount based on the values of another column. Basically, I want to just create a grouped data view where I can store the sum of values next to each type. Here's the sample data view.
Type Amount
level 1 $1.00
level 1 $2.00
level 2 $3.00
level 3 $4.00
level 3 $5.00
Now to import I'm just using the simple code
$fileName = "C:\SampleData.xlsx"
$data = Import-Excel -Path $fileName
#extracting distinct type values
$distinctTypes = $importedExcelRows | Select-Object -ExpandProperty "Type" -Unique
#looping through distinct types and storing it in the output
$output = foreach ($type in $distinctTypes)
{
    $data | Group-Object $type | %{
        New-Object psobject -Property @{
            Type = $_.Name
            Amt = ($_.Group | Measure-Object 'Amount' -Sum).Sum
        }
    }
}
$output
The output I'm looking for looks somewhat like:
Type Amount
level 1 $3.00
level 2 $3.00
level 3 $9.00
However, I'm getting nothing in the output. It's $null, I think. Any help is appreciated; I think I'm missing something in the looping.
You're halfway there by using Group-Object for this scenario, kudos on that part. Luckily, you can group by the type at your import and then measure the sum:
$fileName = "C:\SampleData.xlsx"
Import-Excel -Path $fileName | Group-Object -Property Type | % {
    $group = $_.Group | % {
        $_.Amount = $_.Amount -replace '[^0-9.]'
        $_
    } | Measure-Object -Property Amount -Sum
    [pscustomobject]@{
        Type = $_.Name
        Amount = "{0:C2}" -f $group.Sum
    }
}
Since you can't measure the amount while it is in currency format, you can remove the dollar sign with the regex [^0-9.], which removes everything that is not a digit or a dot (.); you could use ^\$ instead as well. This allows the amount to be measured, and you can then format the sum back to currency format using the string format operator: '{0:C2}' -f ...
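The strip-then-sum step can be tried without an Excel file. The sketch below assumes the Amount cells arrive as strings with a leading dollar sign, which is a guess about what Import-Excel returns for this sheet; the rows are built by hand for illustration, and the conversion is done explicitly rather than through Measure-Object.

```powershell
# Hypothetical rows mimicking what the import might return for the sheet.
$rows = @(
    [pscustomobject]@{ Type = 'level 1'; Amount = '$1.00' }
    [pscustomobject]@{ Type = 'level 1'; Amount = '$2.00' }
    [pscustomobject]@{ Type = 'level 2'; Amount = '$3.00' }
)

$totals = $rows | Group-Object -Property Type | ForEach-Object {
    # Strip everything except digits and the dot, then parse with the
    # invariant culture so the dot is always read as the decimal separator.
    $sum = 0
    foreach ($row in $_.Group) {
        $clean = $row.Amount -replace '[^0-9.]'
        $sum  += [decimal]::Parse($clean, [cultureinfo]::InvariantCulture)
    }
    [pscustomobject]@{ Type = $_.Name; Amount = $sum }
}
```

Keeping Amount numeric until the very end means the totals can still be sorted or compared; '{0:C2}' -f $_.Amount can be applied at display time, and its result depends on the current culture.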
I don't know what your issue is but when the dollar signs are not part of the data you pull from the Excel sheet it should work as expected ...
$InputCsvData = @'
Type,Amount
level 1,1.00
level 1,2.00
level 2,3.00
level 3,4.00
level 3,5.00
'@ |
    ConvertFrom-Csv
$InputCsvData |
    Group-Object -Property Type |
    ForEach-Object {
        [PSCustomObject]@{
            Type = $_.Name
            Amt = '${0:n2}'-f ($_.Group | Measure-Object -Property Amount -Sum).Sum
        }
    }
The output looks like this:
Type Amt
---- ---
level 1 $3,00
level 2 $3,00
level 3 $9,00
Otherwise you may remove the dollar signs before you try to summarize the numbers.

How to calculate the avg of a column from a csv which does not have headers with PowerShell?

The data array inside the csv, which does not have headers (should be: pkg, pp0, pp1, dram, time):
37.0036,27.553,0,0,0.100111
35.622,26.1947,0,0,0.200702
34.931,25.5656,0,0,0.300765
34.814,25.4795,0,0,0.400826
34.924,25.5676,0,0,0.500888
34.8971,25.5443,0,0,0.600903
If I want to get the avg value of the columns and make the output look like:
The avg of Pkg: xxx
The avg of pp0: xxx
The avg of pp1: xxx
The avg of time: xxx
how can I do that?
When you use Import-Csv, PowerShell treats the first row as the header row. The error you're getting,
import-csv : The member "0" is already present.
occurs because the first row, which is used as the header row, contains the value 0 more than once, so two columns would end up with the same name. To assign your own column names instead, use the -Header parameter of Import-Csv.
From there, you can use Measure-Object to determine the averages:
$myData = Import-Csv .\a.csv -Header pkg,pp0,pp1,dram,time
Write-Host "The avg of Pkg: $(($myData | Measure-Object -Property pkg -Average).Average)"
Write-Host "The avg of pp0: $(($myData | Measure-Object -Property pp0 -Average).Average)"
Write-Host "The avg of pp1: $(($myData | Measure-Object -Property pp1 -Average).Average)"
Write-Host "The avg of time: $(($myData | Measure-Object -Property time -Average).Average)"
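The four near-identical lines can also be driven from the list of column names. This is only a sketch: it feeds two of the sample rows through ConvertFrom-Csv -Header instead of reading a.csv, and the $columns and $averages variables are names introduced here for illustration.

```powershell
# Two of the sample rows from the question, supplied inline.
$raw = @'
37.0036,27.553,0,0,0.100111
35.622,26.1947,0,0,0.200702
'@

# Column names to assign, since the data has no header row.
$columns = 'pkg', 'pp0', 'pp1', 'dram', 'time'
$myData  = $raw | ConvertFrom-Csv -Header $columns

# One Measure-Object call per column, collected into an ordered lookup table.
$averages = [ordered]@{}
foreach ($column in $columns) {
    $averages[$column] = ($myData | Measure-Object -Property $column -Average).Average
}

# Emit one line per column; ${column} keeps the colon out of the variable name.
foreach ($column in $columns) {
    "The avg of ${column}: $($averages[$column])"
}
```

To skip a column such as dram in the report, remove it from the list used in the last loop while keeping it in the -Header list, so the remaining columns still line up.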

How to merge two tables that have the same table structure in PowerShell?

In the following script, the outputs are displayed as two separate tables (each with two columns). How can I display both tables in a table with three columns?
# Create a hash table for File QTY
$Qty = @{}
# Create a hash table for file size
$Size = @{}
# Import all files into one $Files
$Files = Get-ChildItem 'D:\' -Filter *.* -Recurse -File
# Create a loop to check $file in $Files
foreach ($file in $Files) {
    # Count files based on Extension
    $Qty[$file.Extension] += 1
    # Summarize file sizes based on their format
    $Size[$file.Extension] += $file.Length / 1GB
}
# Show File QTY table
$Qty
# Show file size table
$Size
Like this:
Type   Size   Qty
-----  -----  -----
.jpg   10 GB  10000
.png   30 GB  30000
.tif   40 GB  40000
Extract the keys (the file extensions) from one table and use that as a "driver" to construct a new set of objects with the size and quantity values from the tables:
$Qty.psbase.Keys |Select-Object @{Name='Type';Expression={$_}},@{Name='Size';Expression={$Size[$_]}},@{Name='Qty'; Expression={$Qty[$_]}}
While you already got an answer that does exactly what you asked for, I would take a different approach, that doesn't need hashtables:
# Import all files into one $Files
$Files = Get-ChildItem 'D:\' -Filter *.* -Recurse -File
# Group files by extension and collect the per-extension information
$Files | Group-Object -Property Extension |
    Select-Object @{ n = 'Type'; e = 'Name' },
                  @{ n = 'Size'; e = { '{0:0.00} GB' -f (($_.Group | Measure-Object Length -Sum).Sum / 1GB) } },
                  @{ n = 'Qty'; e = 'Count' }
Output is like:
Type Size Qty
----- ----- -----
.jpg 10.23 GB 10000
.png 30.07 GB 30000
.tif 40.52 GB 40000
Group-Object produces an object per unique file extension. It has a property Count, which is the number of grouped items, so we can use that directly. Its property Name contains the file extension. Its Group property contains all FileInfo objects from Get-ChildItem that have the same extension.
Using Select-Object we rename the Name property to Type and rename the Count property to Qty. We still need to calculate the total size per file type, which is done using Measure-Object -Sum. Using the format operator -f we pretty-print the result.
Replace {0:0.00} by {0:0} to remove the fractional digits from the Size column.
The syntax @{ n = ...; e = ... } is a shortcut for @{ name = ...; expression = ... } to create a calculated property.
You may defer the formatting of the Size column to Format-Table, to be able to do additional processing based on the numeric value, e. g. sorting:
$files | Group-Object -Property Extension |
    Select-Object @{ n = 'Type'; e = 'Name' },
                  @{ n = 'Size'; e = { ($_.Group | Measure-Object Length -Sum).Sum } },
                  @{ n = 'Qty'; e = 'Count' } |
    Sort-Object Size -Descending |
    Format-Table 'Type', @{ n = 'Size'; e = { '{0:0} GB' -f ($_.Size / 1GB) } }, 'Qty'
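The grouping and the calculated properties can be tried without scanning a drive. The sketch below uses hand-made objects with only Extension and Length properties as hypothetical stand-ins for the FileInfo objects, and names the numeric column SizeGB (a name chosen here) to make clear it is not yet formatted.

```powershell
# Hypothetical stand-ins for FileInfo objects; only the two properties used.
$files = @(
    [pscustomobject]@{ Extension = '.jpg'; Length = 3GB }
    [pscustomobject]@{ Extension = '.jpg'; Length = 1GB }
    [pscustomobject]@{ Extension = '.png'; Length = 2GB }
)

# One output object per extension: name, summed size in GB, and item count.
$summary = $files | Group-Object -Property Extension |
    Select-Object @{ n = 'Type'; e = 'Name' },
                  @{ n = 'SizeGB'; e = { ($_.Group | Measure-Object Length -Sum).Sum / 1GB } },
                  @{ n = 'Qty'; e = 'Count' } |
    Sort-Object SizeGB -Descending
```

Because SizeGB stays numeric, the sort orders by actual size rather than by the text of a formatted string.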

Join-Object two different csv files using PowerShell

The first .csv file contains one month's backup size in KB per client name. The second .csv file contains the next month's backup size in KB per client name.
Column A lists all the client names. Column B has the corresponding policy name of the client, and the last column has the backup size in KB (e.g. 487402463).
If the difference between a client's sizes (1638838488 - 1238838488 = 0.37 in TB) is greater than 0.10 TB, the result should be written in TB to a CSV file like the one below.
Also, a client may be related to multiple policy names.
My question: I want to handle a few more cases.
The backup size may decrease in the next month, as for hostname15,Company_Policy_11.
Also, a client such as hostname55,Company_Policy_XXX may have a different policy name in the next month.
There may be duplicate client and policy names, such as hostnameXX,Company_Policy_XXX,0 and hostnameXX,Company_Policy_XXX,41806794. If such a client does not exist in CSV2, then I want to display it as negative (-0.14), like below. The duplicates may also exist in CSV2, as for hostnameZZ,Company_Policy_XXX.
Lastly, a client may exist only in CSV2, such as hostnameSS,Company_Policy_XXX.
I used the Join-Object module. https://github.com/ili101/Join-Object
Example CSVFile1.csv
Client Name,Policy Name,KB Size
hostname1,Company_Policy,487402463
hostname2,Company_Policy,227850336
hostname3,Company_Policy_11,8360960
hostname4,Company_Policy_11,1238838488
hostname15,Company_Policy_11,3238838488
hostname1,Company_Policy_55,521423110
hostname10,Company_Policy,28508975
hostname3,Company_Policy_66,295925
hostname5,Company_Policy_22,82001824
hostname2,Company_Policy_33,26176885
hostnameXX,Company_Policy_XXX,0
hostnameXX,Company_Policy_XXX,141806794
hostnameYY,Company_Policy_XXX,121806794
hostname55,Company_Policy_XXX,41806794
hostnameZZ,Company_Policy_XXX,0
hostnameZZ,Company_Policy_XXX,141806794
Example CSVFile2.csv
Client Name,Policy Name,KB Size
hostname1,Company_Policy,487402555
hostname2,Company_Policy,227850666
hostname3,Company_Policy_11,8361200
hostname4,Company_Policy_11,1638838488
hostname1,Company_Policy_55,621423110
hostname15,Company_Policy_11,1238838488
hostname10,Company_Policy,28908975
hostname3,Company_Policy_66,295928
hostname5,Company_Policy_22,92001824
hostname2,Company_Policy_33,36176885
hostname22,Company_Policy,291768854
hostname23,Company_Policy,291768854
hostname55,Company_Policy_BBB,191806794
hostnameZZ,Company_Policy_XXX,0
hostnameZZ,Company_Policy_XXX,291806794
hostnameSS,Company_Policy_XXX,0
hostnameSS,Company_Policy_XXX,291806794
Desired Output :
Client Name,Policy Name,TB Size
hostname4,Company_Policy_11,0.37
hostname22,Company_Policy,0.27
hostname23,Company_Policy,0.27
hostnameYY,Company_Policy_XXX,-0.12
hostnameXX,Company_Policy_XXX,-0.14
hostname15,Company_Policy_11,-2
hostname55,Company_Policy_BBB,0.15
hostnameZZ,Company_Policy_XXX,0.15
hostnameSS,Company_Policy_XXX,0.29
Here is my script so far :
$CSV2 | FullJoin $CSV1 `
    -On 'Client Name','Policy Name' `
    -Property 'Client Name',
              'Policy Name',
              @{'TB Size' = {[math]::Round(($Left.'KB Size' - $Right.'KB Size') * 1KB / 1TB, 2)}} |
    Where-Object {[math]::Abs($_.'TB Size') -gt 0.10} | Export-Csv C:\Toolbox\DataReport.csv -NoTypeInformation
You could do something similar to the following. This assumes you want to subtract CSV1 values from CSV2 values.
# Read CSV files and make CSV1 sizes negative. Makes summing totals simpler.
$1 = Import-Csv CSVFile1.csv | Foreach-Object { $_.'KB Size' = -$_.'KB Size'; $_ }
$2 = Import-Csv CSVFile2.csv
# Calculated Properties to be used with Select-Object
$CalculatedProperties = @{n='Client Name';e={$_.Group.'Client Name' | Get-Unique}},
    @{n='Policy Name';e={$_.Group.'Policy Name' | Get-Unique}},
    @{n='TB Size';e={[math]::Round(($_.Group.'KB Size' | Measure -Sum).Sum*1KB/1TB,2)}}
# Grouping objects based on unique client and policy name combinations
$1 + $2 | Group-Object 'Client Name','Policy Name' |
Select-object $CalculatedProperties |
Where {[math]::Abs($_.'TB Size') -gt 0.10}

Filter logfile to create a csv report using PowerShell

I have a NetApp log output in a log file which is the below format.
DeviceDetails.log file content
/vol/DBCXARCHIVE002_E_Q22014_journal/DBCXARCHIVE002_E_Q22014_journal 1.0t (1149038714880) (r/w, online, mapped)
Comment: " "
Serial#: e3eOF4y4SRrc
Share: none
Space Reservation: enabled (not honored by containing Aggregate)
Multiprotocol Type: windows_2008
Maps: DBCXARCHIVE003=33
Occupied Size: 1004.0g (1077986099200)
Creation Time: Wed Apr 30 20:14:51 IST 2014
Cluster Shared Volume Information: 0x0
Read-Only: disabled
/vol/DBCXARCHIVE002_E_Q32014_journal/DBCXARCHIVE002_E_Q32014_journal 900.1g (966429273600) (r/w, online, mapped)
Comment: " "
Serial#: e3eOF507DSuU
Share: none
Space Reservation: enabled (not honored by containing Aggregate)
Multiprotocol Type: windows_2008
Maps: DBCXARCHIVE003=34
Occupied Size: 716.7g (769556951040)
Creation Time: Tue Aug 12 20:24:14 IST 2014
Cluster Shared Volume Information: 0x0
Read-Only: disabled
The output above shows only 2 devices; the actual log file contains many more devices appended in the same format.
I just need 4 details from each device block.
The first line contains 3 of the needed details:
Device Name : /vol/DBCXARCHIVE002_E_Q22014_journal/DBCXARCHIVE002_E_Q22014_journal
Total Capacity : 1.0t (1149038714880)
Status : (r/w, online, mapped)
And the 4th Detail I need is Occupied Size: 1004.0g (1077986099200)
So the CSV output should look like below :
I am just a beginner at coding and am trying to achieve this with the code below, but it does not help much :/
$logfile = Get-Content .\DeviceDetails.log
$l1 = $logfile | select-string "/vol"
$l2 = $logfile | select-string "Occupied Size: "
$objs =@()
$l1 | ForEach {
    $o = $_
    $l2 | ForEach {
        $o1 = $_
        $Object22 = New-Object PSObject -Property @{
            'LUN Name , Total Space, Status, Occupied Size' = "$o"
            'Occupied Size' = "$o1"
        }
    }
    $objs += $Object22
}
$objs
$obj = $null # variable to store each output object temporarily
Get-Content .\t.txt | ForEach-Object { # loop over input lines
    if ($_ -match '^\s*(/vol.+?)\s+(.+? \(.+?\))\s+(\(.+?\))') {
        # Create a custom object with all properties of interest,
        # and store it in the $obj variable created above.
        # What the regex's capture groups - (...) - captured is available in
        # the automatic $Matches variable via indices starting at 1.
        $obj = [pscustomobject] @{
            'Device Name' = $Matches[1]
            'Total Space' = $Matches[2]
            'Status' = $Matches[3]
            'Occupied Size' = $null # filled below
        }
    } elseif ($_ -match '\bOccupied Size: (.*)') {
        # Set the 'Occupied Size' property value...
        $obj.'Occupied Size' = $Matches[1]
        # ... and output the complete object.
        $obj
    }
} | Export-Csv -NoTypeInformation out.csv
- Note that in Windows PowerShell, Export-Csv defaults to ASCII output encoding (PowerShell 6 and later default to UTF-8 without a BOM); change that with the -Encoding parameter.
- To extract only the numbers inside (...) for the Total Space and Occupied Size columns, use
$_ -match '^\s*(/vol.+?)\s+.+?\s+\((.+?)\)\s+(\(.+?\))' and
$_ -match '\bOccupied Size: .+? \((.*)\)' instead.
Note how this solution processes the input file line by line, which keeps memory use down, though generally at the expense of performance.
As for what you tried:
- You collect the entire input file as an array in memory ($logfile = Get-Content .\DeviceDetails.log).
- You then filter this array twice into parallel arrays containing the corresponding lines of interest.
- Things go wrong when you attempt to nest the processing of these 2 arrays. Instead of nesting, you must enumerate them in parallel, as their corresponding indices contain matching entries.
Additionally:
- A line such as 'LUN Name , Total Space, Status, Occupied Size' = "$o" creates a single property named LUN Name , Total Space, Status, Occupied Size, which is not the intent.
- In order to create distinct properties (to be reflected as distinct columns in the CSV output), you must create them as such, which requires parsing the input into distinct values accordingly.
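Enumerating the two filtered arrays in parallel, as described above, could look like the following sketch. It works on a trimmed inline copy of the log rather than the file, and it assumes every /vol line has exactly one Occupied Size line, so that equal indices belong to the same device.

```powershell
# Two trimmed-down device blocks from the question, supplied inline.
$logfile = @'
/vol/DBCXARCHIVE002_E_Q22014_journal/DBCXARCHIVE002_E_Q22014_journal 1.0t (1149038714880) (r/w, online, mapped)
Occupied Size: 1004.0g (1077986099200)
/vol/DBCXARCHIVE002_E_Q32014_journal/DBCXARCHIVE002_E_Q32014_journal 900.1g (966429273600) (r/w, online, mapped)
Occupied Size: 716.7g (769556951040)
'@ -split '\r?\n'

# The same two filters as in the question; @() guarantees arrays.
$l1 = @($logfile | Select-String '/vol')
$l2 = @($logfile | Select-String 'Occupied Size: ')

# Walk both arrays with one shared index instead of nesting the loops.
$objs = for ($i = 0; $i -lt $l1.Count; $i++) {
    if ($l1[$i].Line -match '^\s*(/vol.+?)\s+(.+? \(.+?\))\s+(\(.+?\))') {
        [pscustomobject]@{
            'Device Name'   = $Matches[1]
            'Total Space'   = $Matches[2]
            'Status'        = $Matches[3]
            'Occupied Size' = ($l2[$i].Line -replace '^.*Occupied Size:\s*').Trim()
        }
    }
}
```

Piping $objs to Export-Csv -NoTypeInformation then produces one row per device with four distinct columns. If a device block could ever lack its Occupied Size line, the indices would drift apart, and the line-by-line approach is the safer choice.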