I have several large CSV files that I need to split based on a match in one column.
The column is called "Grading Period Title" and there are up to 10 different values. I want to separate all values of "Overall" into overall.CSV file, and all other values to term.CSV and preserve all the other columns of data.
Grading Period Title    Score
Overall                 5
22-23 MC T2             6
Overall                 7
22-23 T2                1
I found this code to group and split by all the values, but I can't get it to split overall and all other values into 2 files
#Splitting a CSV file into multiple files based on column value
$groups = Import-Csv -Path "csvfile.csv" | Group-Object 'Grading Period Title' -eq '*Overall*'
$groups | ForEach-Object {$_.Group | Export-Csv "$($_.Name).csv" -NoTypeInformation}
Count Name              Group
278   22-23 MC T2
71657 Overall
71275 22-23 T2
104   22-23 8th Blk Q2
So they are grouped, but I don't know how to select the Overall group as one file, and the rest as the 2nd file and name them.
thanks!
To just split it, you can filter with Where-Object, for example:
# overall group
Import-Csv -Path "csvfile.csv" |
Where-Object { $_.'Grading Period Title' -Like '*Overall*' } |
Export-CSV -Path "overall.csv" -NoTypeInformation
# looks like
Grading Period Title Score
-------------------- -----
Overall              5
Overall              7
# term group
Import-Csv -Path "csvfile.csv" |
Where-Object { $_.'Grading Period Title' -NotLike '*Overall*' } |
Export-CSV -Path "term.csv" -NoTypeInformation
# looks like
Grading Period Title Score
-------------------- -----
22-23 MC T2          6
22-23 T2             1
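If the file fits in memory, the two filters above can also be collapsed into a single read. The following is only a sketch, assuming PowerShell 4.0 or later (for the `.Where()` collection method and its `Split` mode); the temp-folder path and the sample rows are made up to mirror the question's data.

```powershell
# Build a small sample file matching the question's data (hypothetical temp-folder path).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'csv-split-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
$csvPath = Join-Path $dir 'csvfile.csv'
@'
"Grading Period Title","Score"
"Overall","5"
"22-23 MC T2","6"
"Overall","7"
"22-23 T2","1"
'@ | Set-Content -Path $csvPath

# 'Split' returns two collections: the rows that pass the filter, then all the others.
$rows = @(Import-Csv -Path $csvPath)
$overall, $term = $rows.Where({ $_.'Grading Period Title' -like '*Overall*' }, 'Split')

# All other columns are preserved because whole row objects are exported.
$overall | Export-Csv -Path (Join-Path $dir 'overall.csv') -NoTypeInformation
$term    | Export-Csv -Path (Join-Path $dir 'term.csv')    -NoTypeInformation
```

Unlike the streaming Where-Object pipelines, this holds every row in memory at once, so for very large files the two-pass version may still be the safer choice.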
To complement the clear answer from @Cpt.Whale, here is how to do this in one iteration using a steppable pipeline:
Import-Csv .\csvfile.csv |
ForEach-Object -Begin {
    $Overall = { Export-CSV -notype -Path .\Overall.csv }.GetSteppablePipeline()
    $Term = { Export-CSV -notype -Path .\Term.csv }.GetSteppablePipeline()
    $Overall.Begin($True)
    $Term.Begin($True)
} -Process {
    if ( $_.'Grading Period Title' -Like '*Overall*' ) {
        $Overall.Process($_)
    }
    else {
        $Term.Process($_)
    }
} -End {
    $Overall.End()
    $Term.End()
}
For details, see: Mastering the (steppable) pipeline.
I am trying to get a list of files and a count of the number of rows in each file displayed in a table consisting of two columns, Name and Lines.
I have tried using Format-Table, but I think the problem is less the format of the table and more that my results come out as separate objects. See below:
#Get a list of files in the filepath location
$files = Get-ChildItem $filepath
$files | ForEach-Object { $_ ; $_ | Get-Content | Measure-Object -Line} | Format-Table Name,Lines
Expected results
Name      Lines
File A    9
File B    89
Actual results
Name      Lines
File A
          9
File B
          89
Another approach to making a custom object like this is to use PowerShell's calculated properties:
$files | Select-Object -Property @{ N = 'Name' ; E = { $_.Name } },
                                 @{ N = 'Lines'; E = { ($_ | Get-Content | Measure-Object -Line).Lines } }
Name                   Lines
----                   -----
dotNetEnumClass.ps1      232
DotNetVersions.ps1         9
dotNETversionTable.ps1    64
Typically you would make a custom object like this, instead of outputting two different kinds of objects.
$files | ForEach-Object {
    $lines = $_ | Get-Content | Measure-Object -Line
    [pscustomobject]@{
        name  = $_.name
        lines = $lines.lines
    }
}
name     lines
----     -----
rof.ps1     11
rof.ps1~     7
wai.ps1      2
wai.ps1~     1
Total Newbie with PowerShell, but used to use WSH with .vbs back in the day - so hopefully can structure this question correctly.
I would like to extract x number of columns from a .csv file, only if the row data equals a certain value - and then send the filtered data to a new .csv in another destination.
So taking a saved Windows event log as an example, I would like to extract Columns A-F but only on rows where column 'A' equals 'Error' - and then send that output to a new .csv in a child directory.
I think I am pretty close, but can only get it to save columns A-F but no rows with the data I need!
Can anyone help me figure this out or show me where I am going wrong please?
$folderPath = 'C:\DLA\'
$folderPathDest = 'C:\DLA\OUT\'
$desiredColumns = 'A','B','C','D','E','F'
$topics.Where({$desiredColumns.play -eq 'Error'}).topic
Get-ChildItem $folderPath -Name |
    ForEach-Object {
        $filePath = $folderPath + $_
        $filePathdest = $folderPathDest + $_
        Import-Csv $filePath | Select $desiredColumns | Select $topics |
            Export-Csv -Path $filePathDest -NoTypeInformation
    }
Just put the filter directly in your pipeline:
Import-Csv $filePath | Select $desiredColumns | where {$_.A -eq 'Error'} | Export-Csv -Path $filePathDest -NoTypeInformation
The command below worked for me to export all columns of the matching rows to a new file. It can be modified to select only the desired columns:
import-csv $filePath | ? { $_.columnName -eq 'Error' } | export-csv $filePathDest -NoTypeInformation
Here, columnName is the header of the column you want to filter on.
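Putting the pieces together, the whole loop might look like the sketch below. It filters the rows first and selects the columns afterwards, so the filter column would not even need to be among the exported ones. The temp-folder paths, the sample data, and the column names A to G are assumptions for illustration only.

```powershell
# Hypothetical source and destination folders under the temp directory.
$folderPath     = Join-Path ([System.IO.Path]::GetTempPath()) 'DLA-demo'
$folderPathDest = Join-Path $folderPath 'OUT'
New-Item -ItemType Directory -Path $folderPathDest -Force | Out-Null

# Sample input: column A holds the level; G is an extra column that should be dropped.
@'
A,B,C,D,E,F,G
Error,1,2,3,4,5,drop
Information,1,2,3,4,5,drop
Error,6,7,8,9,10,drop
'@ | Set-Content -Path (Join-Path $folderPath 'events.csv')

$desiredColumns = 'A', 'B', 'C', 'D', 'E', 'F'

# Only files directly in the source folder are processed (no -Recurse, so OUT is untouched).
Get-ChildItem -Path $folderPath -Filter '*.csv' -File | ForEach-Object {
    $destination = Join-Path $folderPathDest $_.Name
    Import-Csv -Path $_.FullName |
        Where-Object { $_.A -eq 'Error' } |
        Select-Object -Property $desiredColumns |
        Export-Csv -Path $destination -NoTypeInformation
}
```

Join-Path is used instead of string concatenation so that a missing trailing backslash in the folder variables does not produce a broken path.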
Hello PowerShell Scriptwriters,
My objective is to count rows that match multiple criteria. My PowerShell script gets me the end result, but it takes too much time (the more rows there are, the longer it takes). Is there a way to optimize my existing code? I've shared my code for your reference.
$csvfile = Import-csv "D:\file\filename.csv"
$name_unique = $csvfile | ForEach-Object {$_.Name} | Select-Object -Unique
$region_unique = $csvfile | ForEach-Object {$_."Region Location"} | Select-Object -Unique
$cost_unique = $csvfile | ForEach-Object {$_."Product Cost"} | Select-Object -Unique
Write-host "Save Time on Report" $csvfile.Length
foreach($nu in $name_unique)
{
    $inc = 1
    foreach($au in $region_unique)
    {
        foreach($tu in $cost_unique)
        {
            foreach ($mainfile in $csvfile)
            {
                if (($mainfile."Region Location" -eq $au) -and ($mainfile.'Product Cost' -eq $tu) -and ($mainfile.Name -eq $nu))
                {
                    $inc++ #Matching Counter
                }
            }
        }
    }
    $inc #expected to display Row values with the total count. And export the result as csv
}
You can do this quite simply with Group-Object (alias Group), which accepts several properties at once.
$csvfile = Import-csv "D:\file\filename.csv"
$csvfile | Group Name,"Region Location","Product Cost" | Select Name, Count
This gives output something like the below
Name         Count
----         -----
f1, syd, 10      2
f2, syd, 10      1
f3, syd, 20      1
f4, melb, 10     2
f2, syd, 40      1
P.S. The code you provided does not produce one count per combination of fields: $inc is initialised and output once per Name, so the result is effectively a per-Name count, and the loops over the other two columns are needless.
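Since the question also asks to export the result as CSV, the grouped output can be flattened back into separate columns first. This is a minimal sketch; the sample rows and the `counts.csv` file name in the temp folder are hypothetical.

```powershell
# Hypothetical sample rows standing in for the Import-Csv output.
$csvfile = @'
Name,Region Location,Product Cost
f1,syd,10
f1,syd,10
f2,syd,10
f2,syd,40
'@ | ConvertFrom-Csv

# Group on all three columns, then rebuild one flat row per group.
$report = $csvfile |
    Group-Object -Property Name, 'Region Location', 'Product Cost' |
    ForEach-Object {
        $first = $_.Group[0]   # every row in a group shares the same key values
        [pscustomobject]@{
            Name              = $first.Name
            'Region Location' = $first.'Region Location'
            'Product Cost'    = $first.'Product Cost'
            Count             = $_.Count
        }
    }

$reportPath = Join-Path ([System.IO.Path]::GetTempPath()) 'counts.csv'
$report | Export-Csv -Path $reportPath -NoTypeInformation
```

Each group is read once, so the work grows with the number of rows rather than with the product of the unique values in each column.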
I have a network share with 20.000 XML files in the format
username-computername.xml
There are duplicate entries (created when a user received a new computer), in the form of
user1-computer1.xml
user1-computer2.xml
or
BLRPPR-SKB52084.xml
BLRSIA-SKB50871.xml
S028DS-SKB51334.xml
s028ds-SKB52424.xml
S02FL6-SKB51644.xml
S02FL6-SKB52197.xml
S02VUD-SKB52083.xml
Since I'm going to manipulate the XMLs later, I can't just discard properties of the file objects; at the very least I need the full path. The aim is that, if a duplicate is found, the file with the newer timestamp is used.
Here is a snippet of the code where I need that logic:
$xmlfiles = Get-ChildItem "network share"
Here I'm just doing a foreach loop:
foreach ($xmlfile in $xmlfiles) {
[xml]$xmlcontent = Get-Content -Path $xmlfile.FullName -Encoding UTF8
Select-Xml -Xml $xmlcontent -Xpath " "
# create [pscustomobject] etc...
}
Essentially what I need is
if ($xmlfiles.Name.Split("-")[0]) - duplicate) {
# select the one with higher $xmlfiles.LastWriteTime and store either
# the full object or the $xmlfiles.FullName
}
Ideally that should be part of the foreach loop, so that I don't have to loop through twice.
You can use Group-Object to group files by a custom attribute:
$xmlfiles | Group-Object { $_.Name.Split('-')[0] }
The above statement will produce a result like this:
Count Name   Group
----- ----   -----
    1 BLRPPR {BLRPPR-SKB52084.xml}
    1 BLRSIA {BLRSIA-SKB50871.xml}
    2 S028DS {S028DS-SKB51334.xml, s028ds-SKB52424.xml}
    2 S02FL6 {S02FL6-SKB51644.xml, S02FL6-SKB52197.xml}
    1 S02VUD {S02VUD-SKB52083.xml}
where the Group property contains the original FileInfo objects.
Expand the groups in a ForEach-Object loop, sort each group by LastWriteTime, and select the most recent file from it:
... | ForEach-Object {
$_.Group | Sort-Object LastWriteTime -Desc | Select-Object -First 1
}
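The grouping and selection steps can be combined into one pipeline, roughly as sketched here. The temp folder standing in for the network share, the three file names, and the timestamps are assumptions made up for the demo.

```powershell
# Hypothetical stand-in for the share: a temp folder with three files, two for the same user.
$share = Join-Path ([System.IO.Path]::GetTempPath()) 'xml-dedupe-demo'
New-Item -ItemType Directory -Path $share -Force | Out-Null
$stamps = [ordered]@{
    'S028DS-SKB51334.xml' = (Get-Date '2023-01-01')
    's028ds-SKB52424.xml' = (Get-Date '2023-06-01')
    'BLRPPR-SKB52084.xml' = (Get-Date '2023-03-01')
}
foreach ($name in $stamps.Keys) {
    $file = New-Item -ItemType File -Path (Join-Path $share $name) -Force
    $file.LastWriteTime = $stamps[$name]
}

# Group by the user part of the name (Group-Object ignores case by default),
# then keep only the most recently written file of each group.
$latest = Get-ChildItem -Path $share -Filter '*.xml' -File |
    Group-Object { $_.Name.Split('-')[0] } |
    ForEach-Object {
        $_.Group | Sort-Object LastWriteTime -Descending | Select-Object -First 1
    }

$latest | Select-Object Name, FullName, LastWriteTime
```

`$latest` still holds the full FileInfo objects, so it can be fed straight into the existing foreach loop in place of `$xmlfiles`.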
Looking for a PowerShell script that looks in a text file for rows that have too many (or too few) tabs.
I found this PowerShell script that does exactly what I want (almost).
This counts the number of tabs per row:
Get-Content test.txt | ForEach-Object {
($_ | Select-String `t -all).matches | Measure-Object | Select-Object count
}
Can someone extend/modify/re-write this to return only the rows (with row numbers) that have more than, or less than, X number of tabs per row?
Don't use Get-Content before piping to Select-String, you'll lose contextual information about each line.
Instead, use the -Path parameter with Select-String:
$Tabs = Select-String -Path .\test.txt -Pattern "`t" -AllMatches
$Tabs | Select-Object LineNumber,Line,@{Name='TabCount';Expression={ $_.Matches.Count }}
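To return only the lines where the number of tabs is at least $x, use Where-Object (swap -ge for -gt, -lt, or -ne as needed). Note that lines with no tabs at all never match the pattern, so they do not appear in $Tabs:
$x = 3
$Tabs | Where-Object { $_.Matches.Count -ge $x } | Select-Object -ExpandProperty Line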
If you just want a quick overview of the distribution, you could also use Group-Object:
Get-Content .\test.txt | Group-Object { "{0} tabs" -f [regex]::Matches($_,"`t").Count }
Lots of ways to do this. Get-Content works just fine for me and we create a custom object that you can then filter as desired.
Get-Content test.txt | ForEach-Object{
    New-Object PSObject -Property @{
        Line = $_
        LineNumber = $_.ReadCount
        NumberofTabs = [regex]::matches($_,"`t").count
    }
}
This uses the .NET regex method to count the tabs in each line and stores the result in a property.
NumberofTabs LineNumber Line
------------ ---------- ----
           8          1 ;lkjasfdsa
           8          2 asdfasdf
           4          3 asdfasdfasdfa
           2          4 fasdfjasdlfjas;l
Now you can use PowerShell to filter as you see fit.
} | Where-Object { $_.NumberofTabs -ne 4}
So if 4 was the perfect number, then line 3 would be omitted from the results.
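Assembled into one piece, the approach might look like the following sketch. The sample file in the temp folder and the expected tab count of 2 are assumptions for the demo; `[pscustomobject]` is used instead of New-Object so that the properties keep the order in which they are written.

```powershell
# Hypothetical sample file: lines 1 and 4 have two tabs, line 2 has one, line 3 has none.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'tab-demo.txt'
"a`tb`tc", "a`tb", "abc", "a`tb`tc" | Set-Content -Path $path

$expected = 2   # assumed "correct" number of tabs per row

# Build one object per line, then keep only the lines with too many or too few tabs.
$bad = Get-Content -Path $path | ForEach-Object {
    [pscustomobject]@{
        LineNumber   = $_.ReadCount   # added by Get-Content to each line it emits
        NumberOfTabs = [regex]::Matches($_, "`t").Count
        Line         = $_
    }
} | Where-Object { $_.NumberOfTabs -ne $expected }

$bad | Format-Table -AutoSize
```

Because every line is counted, including those with zero tabs, this also catches rows that a Select-String match on the tab character would never return.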