Multiple Criteria Matching in PowerShell - powershell

Hello PowerShell Scriptwriters,
I got an objective to count rows, based on the multiple criteria matching. My PowerShell script can able to fetch me the end result, but it consumes too much time[when the rows are more, the time it consumes becomes even more]. Is there a way to optimism my existing code? I've shared my code for your reference.
$csvfile = Import-csv "D:\file\filename.csv"
$name_unique = $csvfile | ForEach-Object {$_.Name} | Select-Object -Unique
$region_unique = $csvfile | ForEach-Object {$_."Region Location"} | Select-Object -Unique
$cost_unique = $csvfile | ForEach-Object {$_."Product Cost"} | Select-Object -Unique
Write-host "Save Time on Report" $csvfile.Length
foreach($nu in $name_unique)
{
$inc = 1
foreach($au in $region_unique)
{
foreach($tu in $cost_unique)
{
foreach ($mainfile in $csvfile)
{
if (($mainfile."Region Location" -eq $au) -and ($mainfile.'Product Cost' -eq $tu) -and ($mainfile.Name -eq $nu))
{
$inc++ #Matching Counter
}
}
}
}
$inc #expected to display Row values with the total count.And export the result as csv
}

You can do this quite simply using the Group option on a Powershell object.
$csvfile = Import-csv "D:\file\filename.csv"
$csvfile | Group Name,"Region Location","Product Cost" | Select Name, Count
This gives output something like the below
Name Count
---- ------
f1, syd, 10 2
f2, syd, 10 1
f3, syd, 20 1
f4, melb, 10 2
f2, syd, 40 1
P.S. the code you provided above is not matching all of the fields, it is simply checking the Name parameter (looping through the other parameters needlessly).

Related

Powershell--split CSV into 2 files based on column value

I have several large CSV files that I need to split based on a match in one column.
The column is called "Grading Period Title" and there are up to 10 different values. I want to separate all values of "Overall" into overall.CSV file, and all other values to term.CSV and preserve all the other columns of data.
Grading Period Title
Score
Overall
5
22-23 MC T2
6
Overall
7
22-23 T2
1
I found this code to group and split by all the values, but I can't get it to split overall and all other values into 2 files
#Splitting a CSV file into multiple files based on column value
$groups = Import-Csv -Path "csvfile.csv" | Group-Object 'Grading Period Title' -eq '*Overall*'
$groups | ForEach-Object {$_.Group | Export-Csv "$($_.Name).csv" -NoTypeInformation}
Count Name Group
278 22-23 MC T2
71657 Overall
71275 22-23 T2
104 22-23 8th Blk Q2
So they are grouped, but I don't know how to select the Overall group as one file, and the rest as the 2nd file and name them.
thanks!
To just split it, you can filter with Where-Object, for example:
# overall group
Import-Csv -Path "csvfile.csv" |
Where-Object { $_.'Grading Period Title' -Like '*Overall*' } |
Export-CSV -Path "overall.csv" -NoTypeInformation
# looks like
Grading Period Title Score
-------------------- -----
Overall 5
Overall 7
# term group
Import-Csv -Path "csvfile.csv" |
Where-Object { $_.'Grading Period Title' -NotLike '*Overall*' } |
Export-CSV -Path "term.csv" -NoTypeInformation
# looks like
Grading Period Title Score
-------------------- -----
22-23 MC T2 6
22-23 T2 1
To complement the clear answer from #Cpt.Whale and do this is one iteration using the Steppable Pipeline:
Import-Csv .\csvfile.csv |
ForEach-Object -Begin {
$Overall = { Export-CSV -notype -Path .\Overall.csv }.GetSteppablePipeline()
$Term = { Export-CSV -notype -Path .\Term.csv }.GetSteppablePipeline()
$Overall.Begin($True)
$Term.Begin($True)
} -Process {
if ( $_.'Grading Period Title' -Like '*Overall*' ) {
$Overall.Process($_)
}
else {
$Term.Process($_)
}
} -End {
$Overall.End()
$Term.End()
}
For details, see: Mastering the (steppable) pipeline.

Group csv column data and display count using powershell

I am having below data in my csv
"Path_Name","Lun_Number","status"
"vmhba0:C2:T0:L1","1","active"
"vmhba0:C1:T0:L1","1","active"
"vmhba1:C0:T7:L230","230","active"
"vmhba1:C0:T7:L231","231","active"
"vmhba1:C0:T7:L232","230","active"
"vmhba1:C0:T7:L235","231","active"
"vmhba1:C0:T7:L236","230","active"
I need to group the data based on Lun_Number and create a column to get the count of those Lun_Number
expected output
"Path_Name","Lun_Number","status","Count"
"vmhba0:C2:T0:L1","1","active", 2
"vmhba0:C1:T0:L1","1","active",
"vmhba1:C0:T7:L230","230","active",3
"vmhba1:C0:T7:L231","230","active",
"vmhba1:C0:T7:L232","230","active",
"vmhba1:C0:T7:L235","231","active",2
"vmhba1:C0:T7:L236","231","active",
Please let me know how can I do that. I tried group-object, sort-object but it doesn't seems to be working
Below is the code which is generating the above csv
$status_csv = Import-Csv -Path E:\pathstate.csv
$path_csv = Import-Csv -Path E:\PathInfo.csv
foreach($row in $path_csv)
{
$path_1 = $row.Path_Name
$path_2 = $status_csv | where{$_.Name -match "^$path_1$" }
[PsCustomObject]#{
Path_Name = $path_1
Lun_Number = $row.Lun_Number
status = $path_2.PathState
} | Export-Csv -Path E:\FinalReport.csv -NoTypeInformation -Append | Group-Object Lun_Number
}
I can see the approach you're trying to take, and I think something like the following could be useful:
$path_csv = Import-Csv 'sample.csv'
$Unique_Counts = $path_csv.Lun_Number | Group-Object | Select-Object Name,Count
This will help give you the output that you can use as part of your mapping for later, where you can make it dynamic to match with the row you're checking. Which you can use to pull out through a loop, such as $Unique_Counts.
Meaning that it you do something like $Unique_Counts[0].Count, will be able to grab the Lun_Number associated to it (listed as Name in the array).
Name Count
---- -----
1 2
230 3
231 2
If you're okay with having the count for each row, you can then use something like what you have:
foreach($row in $path_csv)
{
$path_1 = $row.Path_Name
[PsCustomObject]#{
Path_Name = $path_1
Lun_Number = $row.Lun_Number
status = $row.status
count = $Unique_Counts | Where-Object {$_.name -eq $row.Lun_Number} | Select-Object -ExpandProperty Count
} | Export-Csv $finalReport -NoTypeInformation -Append
}
This then provides me with the following outcome:
"Path_Name","Lun_Number","status","count"
"vmhba0:C2:T0:L1","1","active","2"
"vmhba0:C1:T0:L1","1","active","2"
"vmhba1:C0:T7:L230","230","active","3"
"vmhba1:C0:T7:L231","231","active","2"
"vmhba1:C0:T7:L232","230","active","3"
"vmhba1:C0:T7:L235","231","active","2"
"vmhba1:C0:T7:L236","230","active","3"
Hope this helps, it may be useful to understand more with the use case, but at least you can grab the unique count for all Lun_numbers. Just has to put it in all rows too.

Get results of For-Each arrays and display in a table with column headers one line per results

I am trying to get a list of files and a count of the number of rows in each file displayed in a table consisting of two columns, Name and Lines.
I have tried using format table but I don't think the problem is with the format of the table and more to do with my results being separate results. See below
#Get a list of files in the filepath location
$files = Get-ChildItem $filepath
$files | ForEach-Object { $_ ; $_ | Get-Content | Measure-Object -Line} | Format-Table Name,Lines
Expected results
Name Lines
File A
9
File B
89
Actual Results
Name Lines
File A
9
File B
89
Another approach how to make a custom object like this: Using PowerShell's Calculated Properties:
$files | Select-Object -Property #{ N = 'Name' ; E = { $_.Name} },
#{ N = 'Lines'; E = { ($_ | Get-Content | Measure-Object -Line).Lines } }
Name Lines
---- -----
dotNetEnumClass.ps1 232
DotNetVersions.ps1 9
dotNETversionTable.ps1 64
Typically you would make a custom object like this, instead of outputting two different kinds of objects.
$files | ForEach-Object {
$lines = $_ | Get-Content | Measure-Object -Line
[pscustomobject]#{name = $_.name
lines = $lines.lines}
}
name lines
---- -----
rof.ps1 11
rof.ps1~ 7
wai.ps1 2
wai.ps1~ 1

CSV file - count distinct, group by, sum

I have a file that looks like the following;
- Visitor ID,Revenue,Channel,Flight
- 1234,100,Email,BA123
- 2345,200,PPC,BA112
- 456,150,Email,BA456
I need to produce a file that contains;
The count of distinct Visitor IDs (3)
The total revenue (450)
The count of each Channel
Email 2
PPC 2
The count of each Flight
BA123 1
BA112 1
BA456 1
So far I have the following code, however when executing this on the 350MB file, it takes too long and in some cases breaks the memory limit. As I have to run this function on multiple columns, it is going through the file many times. I ideally need to do this in one file pass.
$file = 'log.txt'
function GroupBy($columnName)
{
$objects = Import-Csv -Delimiter "`t" $file | Group-Object $columnName |
Select-Object #{n=$columnName;e={$_.Group[0].$columnName}}, Count
for($i=0;$i -lt $objects.count;$I++) {
$line += $columnName +"|"+$objects[$I]."$columnName" +"|Count|"+ $objects[$I].'Count' + $OFS
}
return $line
}
$finalOutput += GroupBy "Channel"
$finalOutput += GroupBy "Flight"
Write-Host $finalOutput
Any help would be much appreciated.
Thanks,
Craig
The fact that your are importing the CSV again for each column is what is killing your script. Try to do the loading once, then re-use the data. For example:
$data = Import-Csv .\data.csv
$flights = $data | Group-Object Flight -NoElement | ForEach-Object {[PsCustomObject]#{Flight=$_.Name;Count=$_.Count}}
$visitors = ($data | Group-Object "Visitor ID" | Measure-Object).Count
$revenue = ($data | Measure-Object Revenue -Sum).Sum
$channel = $data | Group-Object Channel -NoElement | ForEach-Object {[PsCustomObject]#{Channel=$_.Name;Count=$_.Count}}
You can display the data like this:
"Revenue : $revenue"
"Visitors: $visitors"
$flights | Format-Table -AutoSize
$channel | Format-Table -AutoSize
This will probably work - using hashmaps.
Pros: It will be faster/use less memory.
Cons: It is less readable
by far than Group-Object, and requires more code.
Make it even less memory-hungry: Read the CSV-file line by line
$data = Import-CSV -Path "C:\temp\data.csv" -Delimiter ","
$DistinctVisitors = #{}
$TotalRevenue = 0
$ChannelCount = #{}
$FlightCount = #{}
$data | ForEach-Object {
$DistinctVisitors[$_.'Visitor ID'] = $true
$TotalRevenue += $_.Revenue
if (-not $ChannelCount.ContainsKey($_.Channel)) {
$ChannelCount[$_.Channel] = 0
}
$ChannelCount[$_.Channel] += 1
if (-not $FlightCount.ContainsKey($_.Flight)) {
$FlightCount[$_.Flight] = 0
}
$FlightCount[$_.Flight] += 1
}
$DistinctVisitorsCount = $DistinctVisitors.Keys | Measure-Object | Select-Object -ExpandProperty Count
Write-Output "The count of distinc Visitor IDs $DistinctVisitorsCount"
Write-Output "The total revenue $TotalRevenue"
Write-Output "The Count of each Channel"
$ChannelCount.Keys | ForEach-Object {
Write-Output "$_ $($ChannelCount[$_])"
}
Write-Output "The count of each Flight"
$FlightCount.Keys | ForEach-Object {
Write-Output "$_ $($FlightCount[$_])"
}

compare columns in two csv files

With all of the examples out there you would think I could have found my solution. :-)
Anyway, I have two csv files; one with two columns, one with 4. I need to compare one column from each one using powershell. I thought I had it figured out but when I did a compare of my results, it comes back as false when I know it should be true. Here's what I have so far:
$newemp = Import-Csv -Path "C:\Temp\newemp.csv" -Header login_id, lastname, firstname, other | Select-Object "login_id"
$ps = Import-Csv -Path "C:\Temp\Emplid_LoginID.csv" | Select-Object "login id"
If ($newemp -eq $ps)
{
write-host "IDs match" -forgroundcolor green
}
Else
{
write-host "Not all IDs match" -backgroundcolor yellow -foregroundcolor black
}
I had to specifiy headers for the first file because it doesn't have any. What's weird is that I can call each variable to see what it holds and they end up with the same info but for some reason still comes up as false. This occurs even if there is only one row (not counting the header row).
I started to parse them as arrays but wasn't quite sure that was the right thing. What's important is that I compare row1 of the first file with with row1 of the second file. I can't just do a simple -match or -contains.
EDIT: One annoying thing is that the variables seem to hold the header row as well. When I call each one, the header is shown. But if I call both variables, I only see one header but two rows.
I just added the following check but getting the same results (False for everything):
$results = Compare-Object -ReferenceObject $newemp -DifferenceObject $ps -PassThru | ForEach-Object { $_.InputObject }
Using latkin's answer from here I think this would give you the result set you're looking for. As per latkin's comment, the property comparison is redundant for your purposes but I left it in as it's good to know. Additionally the header is specified even for the csv with headers to prevent the header row being included in the comparison.
$newemp = Import-Csv -Path "C:\Temp\_sotemp\Book1.csv" -Header loginid |
Select-Object "loginid"
$ps = Import-Csv -Path "C:\Temp\_sotemp\Book2.csv" -Header loginid |
Select-Object "loginid"
#get list of (imported) CSV properties
$props1 = $newemp | gm -MemberType NoteProperty | select -expand Name | sort
$props2 = $ps | gm -MemberType NoteProperty | select -expand Name | sort
#first check that properties match
#omit this step if you know for sure they will be
if(Compare-Object $props1 $props2){
throw "Properties are not the same! [$props1] [$props2]"
}
#pass properties list to Compare-Object
else{
Compare-Object $newemp $ps -Property $props1
}
In the second line, I see there a space "login id" and the first line doesn't have it. Could that be an issue. Try having the same name for the headers in the .csv files itself. And it works for without providing header or select statements. Below is my experiment based upon your input.
emp.csv
loginid firstname lastname
------------------------------
abc123 John patel
zxy321 Kohn smith
sdf120 Maun scott
tiy123 Dham rye
k2340 Naam mason
lk10j5 Shaan kelso
303sk Doug smith
empids.csv
loginid
-------
abc123
zxy321
sdf120
tiy123
PS C:\>$newemp = Import-csv C:\scripts\emp.csv
PS C:\>$ps = Import-CSV C:\scripts\empids.csv
PS C:\>$results = Compare-Object -ReferenceObject $newemp -DifferenceObject $ps | foreach { $_.InputObject}
Shows the difference objects that are not in $ps
loginid firstname lastname SideIndicator
------- --------- -------- -------------
k2340 Naam mason <=
lk10j5 Shaan kelso <=
303sk Doug smith <=
I am not sure if this is what you are looking for but i have used the PowerShell to do some CSV formatting for myself.
$test = Import-Csv .\Desktop\Vmtools-compare.csv
foreach ($i in $test) {
foreach ($n in $i.name) {
foreach ($m in $test) {
$check = "yes"
if ($n -eq $m.prod) {
$check = "no"
break
}
}
if ($check -ne "no") {$n}
}
}
this is how my excel csv file looks like:
prod name
1 3
2 5
3 8
4 2
5 0
and script outputs this:
8
0
so basically script takes each number under Name column and then checks it against prod column. If the number is there then it won't display else it will display that number.
I have also done it the opposite way:
$test = Import-Csv c:\test.csv
foreach ($i in $test) {
foreach ($n in $i.name) {
foreach ($m in $test) {
$check = "yes"
if ($n -eq $m.prod) {echo $n}
}
}
}
this is how my excel csv looks like:
prod name
1 3
2 5
3 8
4 2
5 0
and script outputs this:
3
5
2
so script shows the matching entries only.
You can play around with the code to look at different columns.