PowerShell: split CSV into 2 files based on column value

I have several large CSV files that I need to split based on a match in one column.
The column is called "Grading Period Title" and there are up to 10 different values. I want to send all rows with the value "Overall" to an overall.csv file and all rows with any other value to term.csv, preserving all the other columns of data.
Grading Period Title Score
-------------------- -----
Overall              5
22-23 MC T2          6
Overall              7
22-23 T2             1
I found this code to group and split by all the values, but I can't get it to split the Overall rows and all the other values into 2 files:
# Splitting a CSV file into multiple files based on column value
# (one group, and one output file, per distinct 'Grading Period Title' value)
$groups = Import-Csv -Path "csvfile.csv" | Group-Object 'Grading Period Title'
$groups | ForEach-Object { $_.Group | Export-Csv "$($_.Name).csv" -NoTypeInformation }
Count Name                 Group
----- ----                 -----
  278 22-23 MC T2
71657 Overall
71275 22-23 T2
  104 22-23 8th Blk Q2
So the rows are grouped, but I don't know how to write the Overall group to one file and all the remaining groups to a second file, and give the two files names.
thanks!

To just split it, you can filter with Where-Object, for example:
# overall group
Import-Csv -Path "csvfile.csv" |
Where-Object { $_.'Grading Period Title' -Like '*Overall*' } |
Export-CSV -Path "overall.csv" -NoTypeInformation
# looks like
Grading Period Title Score
-------------------- -----
Overall 5
Overall 7
# term group
Import-Csv -Path "csvfile.csv" |
Where-Object { $_.'Grading Period Title' -NotLike '*Overall*' } |
Export-CSV -Path "term.csv" -NoTypeInformation
# looks like
Grading Period Title Score
-------------------- -----
22-23 MC T2 6
22-23 T2 1
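The asker's Group-Object idea can also be made to work: instead of grouping on the raw column value, group on a script block that returns one of two labels, so there are at most two groups whose names double as file names. A minimal sketch, assuming the inline sample rows stand in for csvfile.csv and that writing to the system temp folder is acceptable for illustration; the labels 'overall' and 'term' are illustrative choices.

```powershell
# Sketch: split rows into two files by grouping on a computed key.
# Assumptions: the inline sample stands in for csvfile.csv, and the output
# folder is the system temp directory rather than a real target folder.
$rows = @'
Grading Period Title,Score
Overall,5
22-23 MC T2,6
Overall,7
22-23 T2,1
'@ | ConvertFrom-Csv

$outDir = [System.IO.Path]::GetTempPath()

# The script block returns 'overall' or 'term', so there are at most two groups
$groups = $rows | Group-Object {
    if ($_.'Grading Period Title' -like '*Overall*') { 'overall' } else { 'term' }
}

# Each group's Name ('overall' or 'term') becomes the output file name
foreach ($g in $groups) {
    $g.Group | Export-Csv -Path (Join-Path $outDir "$($g.Name).csv") -NoTypeInformation
}
```

Unlike the Where-Object answer, this reads the input once, but Group-Object holds all rows in memory until grouping is done.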

To complement the clear answer from @Cpt.Whale and do this in one iteration, using the steppable pipeline:
Import-Csv .\csvfile.csv |
ForEach-Object -Begin {
    $Overall = { Export-Csv -NoTypeInformation -Path .\Overall.csv }.GetSteppablePipeline()
    $Term = { Export-Csv -NoTypeInformation -Path .\Term.csv }.GetSteppablePipeline()
    $Overall.Begin($True)
    $Term.Begin($True)
} -Process {
    if ( $_.'Grading Period Title' -Like '*Overall*' ) {
        $Overall.Process($_)
    }
    else {
        $Term.Process($_)
    }
} -End {
    $Overall.End()
    $Term.End()
}
For details, see: Mastering the (steppable) pipeline.
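If the file fits comfortably in memory, another single-pass option is the intrinsic .Where() method (PowerShell 4.0 and later) in 'Split' mode, which returns the matching and non-matching items as two separate collections. This is a sketch, not a replacement for the steppable pipeline: it loads every row before writing anything, so it is less suitable for very large files. The inline sample data and the temp-folder output are assumptions for illustration.

```powershell
# Sketch: one pass with .Where(..., 'Split'); assumes all rows fit in memory.
# The inline sample stands in for csvfile.csv.
$rows = @'
Grading Period Title,Score
Overall,5
22-23 MC T2,6
Overall,7
22-23 T2,1
'@ | ConvertFrom-Csv

# 'Split' returns two collections: items matching the condition, then the rest
$overallRows, $termRows = $rows.Where({ $_.'Grading Period Title' -like '*Overall*' }, 'Split')

$outDir = [System.IO.Path]::GetTempPath()
$overallRows | Export-Csv -Path (Join-Path $outDir 'overall.csv') -NoTypeInformation
$termRows    | Export-Csv -Path (Join-Path $outDir 'term.csv') -NoTypeInformation
```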

Related

Get results of For-Each arrays and display in a table with column headers one line per results

I am trying to get a list of files and a count of the number of rows in each file displayed in a table consisting of two columns, Name and Lines.
I have tried using Format-Table, but I don't think the problem is the table formatting; it has more to do with my results being output as separate objects. See below:
#Get a list of files in the filepath location
$files = Get-ChildItem $filepath
$files | ForEach-Object { $_ ; $_ | Get-Content | Measure-Object -Line} | Format-Table Name,Lines
Expected results
Name   Lines
----   -----
File A     9
File B    89
Actual results
Name   Lines
----   -----
File A
           9
File B
          89

Another approach is to build the custom object with PowerShell's calculated properties:
$files | Select-Object -Property @{ N = 'Name' ; E = { $_.Name } },
    @{ N = 'Lines'; E = { ($_ | Get-Content | Measure-Object -Line).Lines } }
Name Lines
---- -----
dotNetEnumClass.ps1 232
DotNetVersions.ps1 9
dotNETversionTable.ps1 64

Typically you would make a custom object like this, instead of outputting two different kinds of objects.
$files | ForEach-Object {
    $lines = $_ | Get-Content | Measure-Object -Line
    [pscustomobject]@{
        name  = $_.name
        lines = $lines.lines
    }
}
name lines
---- -----
rof.ps1 11
rof.ps1~ 7
wai.ps1 2
wai.ps1~ 1
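Both answers rest on the same idea: emit exactly one object type carrying both Name and Lines, so Format-Table has a single set of columns. A self-contained sketch of that idea, assuming two throwaway files in a new temp folder stand in for the files under $filepath (the names FileA.txt and FileB.txt are hypothetical):

```powershell
# Sketch: one [pscustomobject] per file, so Format-Table sees a single object type.
# Assumption: two throwaway files in a fresh temp folder stand in for $filepath.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'FileA.txt') -Value 'one', 'two', 'three'
Set-Content -Path (Join-Path $dir 'FileB.txt') -Value 'alpha', 'beta'

$report = Get-ChildItem -Path $dir -File | ForEach-Object {
    [pscustomobject]@{
        Name  = $_.Name
        # Measure-Object -Line reports the line count of this file's content
        Lines = (Get-Content -LiteralPath $_.FullName | Measure-Object -Line).Lines
    }
}

$report | Format-Table Name, Lines

# Clean up the temporary folder
Remove-Item -LiteralPath $dir -Recurse -Force
```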

Extract Columns based on Row data from .CSV

Total newbie with PowerShell, but I used WSH with .vbs back in the day, so hopefully I can structure this question correctly.
I would like to extract x number of columns from a .csv file, only if the row data equals a certain value - and then send the filtered data to a new .csv in another destination.
So taking a saved Windows event log as an example, I would like to extract Columns A-F but only on rows where column 'A' equals 'Error' - and then send that output to a new .csv in a child directory.
I think I am pretty close, but can only get it to save columns A-F but no rows with the data I need!
Can anyone help me figure this out or show me where I am going wrong please?
$folderPath = 'C:\DLA\'
$folderPathDest = 'C:\DLA\OUT\'
$desiredColumns = 'A','B','C','D','E','F'
$topics.Where({$desiredColumns.play -eq 'Error'}).topic
Get-ChildItem $folderPath -Name |
ForEach-Object {
    $filePath = $folderPath + $_
    $filePathdest = $folderPathDest + $_
    Import-Csv $filePath | Select $desiredColumns | Select $topics |
        Export-Csv -Path $filePathDest -NoTypeInformation
}

Just put the filter directly in your pipeline:
Import-Csv $filePath | Select $desiredColumns | where {$_.A -eq 'Error'} | Export-Csv -Path $filePathDest -NoTypeInformation

The command below worked for me to extract all the columns to a new file; it can be modified to select only the desired columns:
import-csv $filePath | ? { $_.columnName -eq 'Error' } | export-csv $filePathDest -NoTypeInformation
columnName is the header of the column you want to filter on.
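A self-contained sketch of the whole filter-then-select pipeline, assuming an inline sample with an extra column G stands in for one exported event-log CSV and that the output goes to the temp folder instead of C:\DLA\OUT\ (the name errors.csv is hypothetical). Filtering before Select-Object lets the Where-Object condition use any column, including ones that are dropped afterwards.

```powershell
# Sketch: keep only columns A-F, and only rows where column A is 'Error'.
# Assumptions: the inline sample stands in for one event-log CSV, and the
# output is written to the temp folder rather than C:\DLA\OUT\.
$desiredColumns = 'A', 'B', 'C', 'D', 'E', 'F'

$rows = @'
A,B,C,D,E,F,G
Error,b1,c1,d1,e1,f1,g1
Information,b2,c2,d2,e2,f2,g2
Error,b3,c3,d3,e3,f3,g3
'@ | ConvertFrom-Csv

$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'errors.csv'

# Filter rows first, then project the wanted columns, then export
$rows |
    Where-Object { $_.A -eq 'Error' } |
    Select-Object -Property $desiredColumns |
    Export-Csv -Path $outFile -NoTypeInformation
```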

Multiple Criteria Matching in PowerShell

Hello PowerShell Scriptwriters,
I need to count rows based on matching multiple criteria. My PowerShell script produces the end result, but it takes too much time (the more rows there are, the longer it takes). Is there a way to optimize my existing code? I've shared my code for your reference.
$csvfile = Import-csv "D:\file\filename.csv"
$name_unique = $csvfile | ForEach-Object {$_.Name} | Select-Object -Unique
$region_unique = $csvfile | ForEach-Object {$_."Region Location"} | Select-Object -Unique
$cost_unique = $csvfile | ForEach-Object {$_."Product Cost"} | Select-Object -Unique
Write-host "Save Time on Report" $csvfile.Length
foreach ($nu in $name_unique)
{
    $inc = 1
    foreach ($au in $region_unique)
    {
        foreach ($tu in $cost_unique)
        {
            foreach ($mainfile in $csvfile)
            {
                if (($mainfile."Region Location" -eq $au) -and ($mainfile.'Product Cost' -eq $tu) -and ($mainfile.Name -eq $nu))
                {
                    $inc++ # Matching counter
                }
            }
        }
    }
    $inc # expected to display row values with the total count, and export the result as CSV
}

You can do this quite simply with Group-Object, grouping on all three properties at once:
$csvfile = Import-csv "D:\file\filename.csv"
$csvfile | Group Name,"Region Location","Product Cost" | Select Name, Count
This gives output something like the below
Name Count
---- ------
f1, syd, 10 2
f2, syd, 10 1
f3, syd, 20 1
f4, melb, 10 2
f2, syd, 40 1
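P.S. The code you provided above effectively counts only by Name: $inc is reset and printed once per Name, so the region and cost loops just add every row with that Name back into the same total (plus one, because $inc starts at 1), looping through the other parameters needlessly.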
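The question also asks for the result to be exported as CSV. The combined Name column from Group-Object (for example "f1, syd, 10") is awkward to reuse, but each group's Values property holds the individual grouping values, so calculated properties can split them back into separate columns. A sketch, assuming the inline sample stands in for D:\file\filename.csv and that counts.csv in the temp folder is a hypothetical output path:

```powershell
# Sketch: count rows per (Name, Region Location, Product Cost) in one pass,
# then export with one column per grouping property.
# Assumption: the inline sample stands in for D:\file\filename.csv.
$csvfile = @'
Name,Region Location,Product Cost
f1,syd,10
f1,syd,10
f2,syd,10
f4,melb,10
f4,melb,10
'@ | ConvertFrom-Csv

# GroupInfo.Values holds the grouping values in the order of -Property
$counts = $csvfile |
    Group-Object -Property Name, 'Region Location', 'Product Cost' |
    Select-Object @{ Name = 'Name';            Expression = { $_.Values[0] } },
                  @{ Name = 'Region Location'; Expression = { $_.Values[1] } },
                  @{ Name = 'Product Cost';    Expression = { $_.Values[2] } },
                  Count

$counts | Export-Csv -Path (Join-Path ([System.IO.Path]::GetTempPath()) 'counts.csv') -NoTypeInformation
```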

Filtering files by partial name match

I have a network share with 20,000 XML files named in the format
username-computername.xml
There are duplicate entries of the following form (when a user received a new computer):
user1-computer1.xml
user1-computer2.xml
or
BLRPPR-SKB52084.xml
BLRSIA-SKB50871.xml
S028DS-SKB51334.xml
s028ds-SKB52424.xml
S02FL6-SKB51644.xml
S02FL6-SKB52197.xml
S02VUD-SKB52083.xml
Since I'm going to manipulate the XMLs later, I can't just discard properties of the array; at the very least I need the full path. The aim is that, if a duplicate is found, the one with the newer timestamp is used.
Here is a snippet of the code where I need that logic:
$xmlfiles = Get-ChildItem "network share"
Here I'm just doing a foreach loop:
foreach ($xmlfile in $xmlfiles) {
    [xml]$xmlcontent = Get-Content -Path $xmlfile.FullName -Encoding UTF8
    Select-Xml -Xml $xmlcontent -Xpath " "
    # create [pscustomobject] etc...
}
Essentially what I need is
if ($xmlfiles.Name.Split("-")[0]) - duplicate) {
# select the one with higher $xmlfiles.LastWriteTime and store either
# the full object or the $xmlfiles.FullName
}
Ideally that should be part of the foreach loop, so that I don't have to loop through twice.

You can use Group-Object to group files by a custom attribute:
$xmlfiles | Group-Object { $_.Name.Split('-')[0] }
The above statement will produce a result like this:
Count Name Group
----- ---- -----
1 BLRPPR {BLRPPR-SKB52084.xml}
1 BLRSIA {BLRSIA-SKB50871.xml}
2 S028DS {S028DS-SKB51334.xml, s028ds-SKB52424.xml}
2 S02FL6 {S02FL6-SKB51644.xml, S02FL6-SKB52197.xml}
1 S02VUD {S02VUD-SKB52083.xml}
where the Group property contains the original FileInfo objects.
Expand the groups in a ForEach-Object loop, sort each group by LastWriteTime, and select the most recent file from it:
... | ForEach-Object {
$_.Group | Sort-Object LastWriteTime -Desc | Select-Object -First 1
}
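Putting both steps together, a self-contained sketch, assuming a new temp folder with empty files stands in for the network share and that LastWriteTime is set by hand so the result is predictable. Because Group-Object compares keys case-insensitively unless -CaseSensitive is given, S028DS and s028ds end up in the same group, as in the output above.

```powershell
# Sketch: keep only the newest file per user prefix (the part before '-').
# Assumptions: empty files in a fresh temp folder stand in for the network
# share, and LastWriteTime is set by hand so the outcome is predictable.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null

$samples = @{
    'S028DS-SKB51334.xml' = '2022-01-01'
    's028ds-SKB52424.xml' = '2023-01-01'
    'BLRPPR-SKB52084.xml' = '2022-06-01'
}
foreach ($entry in $samples.GetEnumerator()) {
    $file = New-Item -ItemType File -Path (Join-Path $dir $entry.Key)
    $file.LastWriteTime = [datetime]$entry.Value
}

# Group by prefix (case-insensitive by default), then keep the newest file per group
$latest = Get-ChildItem -Path $dir -Filter *.xml |
    Group-Object { $_.Name.Split('-')[0] } |
    ForEach-Object { $_.Group | Sort-Object LastWriteTime -Descending | Select-Object -First 1 }

$latest | Select-Object Name, LastWriteTime

# Clean up the temporary folder
Remove-Item -LiteralPath $dir -Recurse -Force
```

In the real script, $latest (built from Get-ChildItem "network share") could replace $xmlfiles in the existing foreach loop, so each kept file is read only once.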

Count tabs per line and return the lines with too many tabs

Looking for a PowerShell script that looks in a text file for rows that have too many (or too few) tabs.
I found this PowerShell script that does exactly what I want (almost).
This counts the number of tabs per row:
Get-Content test.txt | ForEach-Object {
($_ | Select-String `t -all).matches | Measure-Object | Select-Object count
}
Can someone extend/modify/re-write this to return only the rows (with row numbers) that have more than, or less than, X number of tabs per row?

Don't use Get-Content before piping to Select-String, you'll lose contextual information about each line.
Instead, use the -Path parameter with Select-String:
$Tabs = Select-String -Path .\test.txt -Pattern "`t" -AllMatches
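$Tabs |Select-Object LineNumber,Line,@{Name='TabCount';Expression={ $_.Matches.Count }}
To return only the ones where the number of tabs is greater than $x, filter on the match count with Where-Object (the TabCount property above exists only in the selected output, not on the objects stored in $Tabs):
$x = 3
$Tabs |Where-Object { $_.Matches.Count -gt $x } | Select-Object -ExpandProperty Line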
If you just want a quick overview of the distribution, you could also use Group-Object:
Get-Content .\test.txt | Group-Object { "{0} tabs" -f [regex]::Matches($_,"`t").Count }
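One caveat with Select-String -Pattern "`t": a line with no tabs at all produces no match, so it never appears in $Tabs, even though the question also asks for lines with too few tabs. A sketch that checks every line against a range, assuming $min and $max are illustrative limits and the inline $lines array stands in for Get-Content .\test.txt:

```powershell
# Sketch: report lines whose tab count is outside [$min, $max], including
# lines with zero tabs. Assumption: $lines stands in for Get-Content .\test.txt.
$lines = @("a`tb`tc", "a`tb", "no tabs here", "a`tb`tc`td`te")
$min = 2
$max = 3

$lineNumber = 0
$report = foreach ($line in $lines) {
    $lineNumber++
    # Count tab characters with the .NET regex engine
    $tabCount = [regex]::Matches($line, "`t").Count
    if ($tabCount -lt $min -or $tabCount -gt $max) {
        [pscustomobject]@{
            LineNumber = $lineNumber
            TabCount   = $tabCount
            Line       = $line
        }
    }
}

$report | Format-Table -AutoSize
```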

Lots of ways to do this. Get-Content works just fine for me and we create a custom object that you can then filter as desired.
Get-Content test.txt | ForEach-Object {
    New-Object PSObject -Property @{
        Line         = $_
        LineNumber   = $_.ReadCount
        NumberofTabs = [regex]::matches($_, "`t").count
    }
}
This uses the .NET [regex]::Matches method to count the tabs in each line and stores the result in the NumberofTabs property.
NumberofTabs LineNumber Line
------------ ---------- ----
           8          1 ;lkjasfdsa
           8          2 asdfasdf
           4          3 asdfasdfasdfa
           2          4 fasdfjasdlfjas;l
Now you can use PowerShell to filter as you see fit, for example by piping the output of the ForEach-Object block to Where-Object:
} | Where-Object { $_.NumberofTabs -ne 4 }
So if 4 was the perfect number, then line 3 would be omitted from the results.
