Line Count ForEach-Object on multiple files - powershell

I'm trying to figure out how to incorporate a line count that gets added to each file in a loop. The count needs to be put into the footer of each file as it is processed, and it needs to include the header and footer lines themselves (i.e. 8 lines + 1 header + 1 footer = 10). My current code is below. I know the lines can be counted with Get-Content $mypath | Measure-Object -Line, but I don't know how to properly incorporate it. Any suggestions?
Get-ChildItem $destinationfolderpath -Recurse -Filter *.txt | ForEach-Object -Begin { $seq = 0 } -Process {
    $seq++
    $seq1 = "{0:D4}" -f $seq; $header = "File Sequence Number $seq1"
    $footer = "File Sequence Number $seq1 and Line Count $looplinecount"
    $header + "`n" + (Get-Content $_.FullName | Out-String) + $footer | Set-Content -Path $_.FullName
}

So load the content of the file into a variable within the loop, run Measure-Object -Line on that variable, add 2 (one for the header line, one for the footer line), and drop that into a subexpression in the footer:
Get-ChildItem $destinationfolderpath -Recurse -Filter *.txt | ForEach-Object -Begin { $seq = 0 } -Process {
    $seq++
    $seq1 = "{0:D4}" -f $seq
    $header = "File Sequence Number $seq1"
    $Content = Get-Content $_.FullName | Out-String
    $footer = "File Sequence Number $seq1 and Line Count $(($Content | Measure-Object -Line | Select-Object -ExpandProperty Lines) + 2)"
    "$header`n$Content$footer" | Set-Content -Path $_.FullName
}
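If you'd rather work with an array of lines than with Out-String, the count falls out of the array's Count property. A minimal sketch of that variant, using the same variables as above:
# Sketch: line-based variant of the loop above
Get-ChildItem $destinationfolderpath -Recurse -Filter *.txt | ForEach-Object -Begin { $seq = 0 } -Process {
    $seq++
    $seq1 = "{0:D4}" -f $seq
    # @() guards against one-line files, where Get-Content returns a single string
    $lines = @(Get-Content $_.FullName)
    $linecount = $lines.Count + 2   # +1 for the header, +1 for the footer
    $header = "File Sequence Number $seq1"
    $footer = "File Sequence Number $seq1 and Line Count $linecount"
    @($header) + $lines + $footer | Set-Content -Path $_.FullName
}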

How to select specific lines from a file?

For example, my file contains the strings
string1
string2
string3
string4
and I want to get
string2
string4
I tried it this way:
Get-Content -Path "E:\myfile.txt" | Select-String
but I don't know how to make this work with Select-String.
If you literally want to select these two lines, then I guess this is the shortest way to do that:
(Get-Content -Path "E:\myfile.txt")[1,3]
or
Get-Content -Path "E:\myfile.txt" | Select-Object -Index 1,3
However, if you mean you want to select only the even numbered lines from the file, you could do this:
# return only the even lines (for odd lines, start at $i = 0)
$text = Get-Content -Path "E:\myfile.txt"; for ($i = 1; $i -lt @($text).Count; $i += 2) { $text[$i] }
Or by using Select-String
# return only the even lines (for odd lines, remove the ! exclamation mark)
(Select-String -Path "E:\myfile.txt" -Pattern '.*' | Where-Object {!($_.LineNumber % 2)}).Line
Get-Content -Path "~\Desktop\strings.txt" | Select-String -Pattern "string2|string4"
You can use the Where-Object cmdlet to filter a stream of objects (strings in this case):
Get-Content -Path "E:\myfile.txt" | Where-Object {$_ -match '[24]$'}
# or
Get-Content -Path "E:\myfile.txt" | Where-Object {$_ -like '*[24]'}
# or
Get-Content -Path "E:\myfile.txt" | Where-Object {$_.EndsWith('2') -or $_.EndsWith('4')'}
If you want only even-numbered lines from the file:
Get-Content -Path "E:\myfile.txt" | Where-Object {$_.ReadCount % 2 -eq 0}

Splitting file into smaller files, working script, but need some tweaks

I have a script here that looks for a delimiter in a text file that contains several reports. The script saves each individual report as its own text document. The tweaks I'm trying to achieve are:
In the middle of the data on each page there is - SPEC #: RX:<string>. I want that string to be saved as the filename.
It currently saves from the delimiter down to the next one, which ignores the first report and grabs every one after it. I want it to go from the delimiter UP to the next one, but I haven't figured out how to achieve that.
$InPC = "C:\Users\path"
Get-ChildItem -Path $InPC -Filter *.txt | ForEach-Object -Process {
$basename= $_.BaseName
$m = ( ( Get-Content $_.FullName | Where { $_ | Select-String "END OF
REPORT" -Quiet } | Measure-Object | ForEach-Object { $_.Count } ) -ge 2)
$a = 1
if ($m) {
Get-Content $_.FullName | % {
If ($_ -match "END OF REPORT") {
$OutputFile = "$InPC\$basename _$a.txt"
$a++
}
Add-Content $OutputFile $_
}
Remove-Item $_.FullName
}
}
This works as stated: it outputs each file with END OF REPORT on top, and the first report in the file gets omitted because it has no END OF REPORT above it.
Edited code:
$InPC = 'C:\Path'
ForEach($File in Get-ChildItem -Path $InPC -Filter *.txt){
    $RepNum = 0
    ForEach($Report in ([IO.File]::ReadAllText($File.FullName) -split 'END OF REPORT\r?\n?' -ne '')){
        if ($Report -match 'SPEC #: RX:(?<ReportFile>.*?)\.'){
            $ReportFile = $Matches.ReportFile
        }
        $OutputFile = "{0}\{1}_{2}_{3}.txt" -f $InPC, $File.BaseName, $ReportFile, ++$RepNum
        $Report | Add-Content $OutputFile
    }
    # Remove-Item $File.FullName
}
I suggest using regular expressions:
read in the file with the -Raw parameter,
split the file at the marker END OF REPORT into sections, and
use the pattern 'SPEC #: RX:(?<ReportFile>.*?)\.' with a named capture group to extract the string.
Edit: adapted to PowerShell v2
## Q:\Test\2019\09\12\SO_57911471.ps1
$InPC = 'C:\Users\path' # 'Q:\Test\2019\09\12\' #
ForEach($File in Get-ChildItem -Path $InPC -Filter *.txt){
    $RepNum = 0
    ForEach($Report in (((Get-Content $File.FullName) -join "`n") -split 'END OF REPORT\r?\n?' -ne '')){
        if ($Report -match 'SPEC #: RX:(?<ReportFile>.*?)\.'){
            $ReportFile = $Matches.ReportFile
        }
        $OutputFile = "{0}\{1}_{2}_{3}.txt" -f $InPC, $File.BaseName, $ReportFile, ++$RepNum
        $Report | Add-Content $OutputFile
    }
    # Remove-Item $File.FullName
}
This contrived sample text:
## Q:\Test\2019\09\12\SO_57911471.txt
I have a script here that looks for a delimiter in a text file with several reports in it.
In the middle of the data of each page there is -
SPEC #: RX:string1.
I want that string to be saved as the filename.
END OF REPORT
I have a script here that looks for a delimiter in a text file with several reports in it.
In the middle of the data of each page there is -
SPEC #: RX:string2.
I want that string to be saved as the filename.
END OF REPORT
yields:
> Get-ChildItem *string* -name
SO_57911471_string1_1.txt
SO_57911471_string2_2.txt
The appended $RepNum is just a precaution in case the string could not be extracted.
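One extra safeguard worth considering: if the captured SPEC string can ever contain characters that are illegal in file names, sanitizing it before building $OutputFile avoids a runtime error. A minimal sketch, built from .NET's own invalid-character list:
# Replace any character that is invalid in a file name with '_'
$invalid = [regex]::Escape(-join [IO.Path]::GetInvalidFileNameChars())
$ReportFile = $ReportFile -replace "[$invalid]", '_'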

How to parse csv file, look for trigger and split into new files with powershell

I have a CSV file which is structured like this:
"SA1";"21020180123155514000000000000000002"
"SA2";"21020180123155514000000000000000002";"210"
"SA4";"21020180123155514000000000000000002";"210";"200000001"
"SA5";"21020180123155514000000000000000002";"210";"200000001";"140000001";"ZZ"
"SA1";"21020180123155522000000000000000002"
"SA2";"21020180123155522000000000000000002";"210"
"SA4";"21020180123155522000000000000000002";"210";"200000001"
"SA5";"21020180123155522000000000000000002";"210";"200000001";"140000671";"ZZ"
"SA1";"21020180123155567000000000000000002"
"SA2";"21020180123155567000000000000000002";"210"
"SA4";"21020180123155567000000000000000002";"210";"200000001"
"SA5";"21020180123155567000000000000000002";"210";"200000001";"140000001";"ZZ"
So the value in the second field (separator ';') marks the data that belongs together, and the value 140000001 or 140000671 in the fifth field of the SA5 row is the trigger.
So the result should be:
1st file: 140000001.txt
"SA1";"21020180123155514000000000000000002"
"SA2";"21020180123155514000000000000000002";"210"
"SA4";"21020180123155514000000000000000002";"210";"200000001"
"SA5";"21020180123155514000000000000000002";"210";"200000001";"140000001";"ZZ"
"SA1";"21020180123155567000000000000000002"
"SA2";"21020180123155567000000000000000002";"210"
"SA4";"21020180123155567000000000000000002";"210";"200000001"
"SA5";"21020180123155567000000000000000002";"210";"200000001";"140000001";"ZZ"
2nd file: 140000671.txt
"SA1";"21020180123155522000000000000000002"
"SA2";"21020180123155522000000000000000002";"210"
"SA4";"21020180123155522000000000000000002";"210";"200000001"
"SA5";"21020180123155522000000000000000002";"210";"200000001";"140000671";"ZZ"
So far I have found a snippet that splits the big file by the second field:
$src = "C:\temp\ORD001.txt"
$dstDir = "C:\temp\files\"
Remove-Item -Path "$dstDir\\*"
$header = Get-Content -Path $src | select -First 1
Get-Content -Path $src | select -Skip 1 | foreach {
$file = "$(($_ -split ";")[1]).txt"
Write-Verbose "Wrting to $file"
$file = $file.Replace('"',"")
if (-not (Test-Path -Path $dstDir\$file))
{
Out-File -FilePath $dstDir\$file -InputObject $header -Encoding ascii
}
$file -replace '"', ""
Out-File -FilePath $dstDir\$file -InputObject $_ -Encoding ascii -Append
}
For the rest I'm in the dark.
Please help.
The Import-Csv cmdlet will work here, if you don't already know about it. I would use it, since it returns all the rows as separate objects in an array, with the properties being the column values, and you don't have to manually remove the quotes. Assuming the second column is a datetime value that is unique for each group of 4 consecutive rows, this will work:
$src = "C:\temp\ORD001.txt"
$dstDir = "C:\temp\files\"
Remove-Item -Path "$dstDir\*"
$csv = Import-CSV $src -Delimiter ';'
$DateTimeGroups = $csv | Group-Object -Property 'ColumnTwoHeader'
foreach ($group in $DateTimeGroups) {
    $filename = $group.Group.'ColumnFiveHeader' | select -Unique
    $group.Group | Export-CSV "$dstDir\$filename.txt" -Append -NoTypeInformation
}
However, this will break if two of those "groups of 4 consecutive rows" have the same value for the second column and the fifth column. There isn't a way to fix this unless you are certain that there will always be 4 consecutive rows in each time group. In which case:
$src = "C:\temp\ORD001.txt"
$dstDir = "C:\temp\files\"
Remove-Item -Path "$dstDir\*"
$csv = Import-CSV $src -Delimiter ';'
if ($csv.Count % 4 -ne 0) {
    Write-Error "CSV does not have a proper number of rows. Attempting to continue will be bad :)"
    return
}
for ($i = 0; $i -lt $csv.Count; $i += 4) {
    # $i..($i+3) selects exactly 4 rows; row index 3 is the SA5 line
    $group = $csv[$i..($i + 3)]
    $group | Export-Csv "$dstDir\$($group[3].'ColumnFiveHeader').txt" -Append -NoTypeInformation
}
Just be sure to replace ColumnTwoHeader and ColumnFiveHeader with the appropriate values.
If performance is not a concern, combining Import-Csv / Export-Csv with Group-Object allows the most concise, direct expression of your intent, using PowerShell's ability to convert CSV to objects and back:
$src = "C:\temp\ORD001.txt" # Input CSV file
$dstDir = "C:\temp\files" # Output directory
# Delete previous output files, if necessary.
Remove-Item -Path "$dstDir\*" -WhatIf
# Import the source CSV into custom objects with properties named for the columns.
# Note: The assumption is that your CSV header line defines columns "Col1", "Col2", ...
Import-Csv $src -Delimiter ';' |
  # Group the resulting objects by column 2.
  Group-Object -Property Col2 |
    ForEach-Object { # Process each resulting group.
      # Determine the output filename via the group's last row's column 5 value.
      $outFile = '{0}\{1}.txt' -f $dstDir, $_.Group[-1].Col5
      # Append the group at hand to the target file.
      $_.Group | Export-Csv -Append -Encoding Ascii $outFile -Delimiter ';' -NoTypeInformation
    }
Note:
The assumption - in line with your sample data - is that it is always the last row in a group of lines sharing the same column-2 value whose column 5 contains the root of the output filename (e.g., 140000001)
Sorry, but I don't have a header column. It's a semicolon-separated txt file for an interface.
You can simply read the file with Get-Content and then search for the trigger in each line.
I hope this small example can help:
$file = Get-Content CSV_File.txt
$140000001 = @()
$140000671 = @()
$bTrig = @()
foreach($line in $file){
    $bTrig += $line
    if($line -match ';"140000001";'){
        $140000001 += $bTrig
        $bTrig = @()
    }
    elseif($line -match ';"140000671";'){
        $140000671 += $bTrig
        $bTrig = @()
    }
}
if($bTrig.Count -ne 0){Write-Warning "No trigger for $bTrig"}
$140000001 | Out-File 140000001.txt -Encoding ascii
$140000671 | Out-File 140000671.txt -Encoding ascii
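If you'd still rather use the Import-Csv/Group-Object answers above despite the missing header row, note that Import-Csv can supply synthetic column names via its -Header parameter. A sketch under that assumption (at most six fields per row, as in the sample; $src and $dstDir as defined earlier):
# Supply synthetic headers so the grouping approach works on a headerless file
$csv = Import-Csv $src -Delimiter ';' -Header Col1,Col2,Col3,Col4,Col5,Col6
$csv | Group-Object -Property Col2 | ForEach-Object {
    # Column 5 of the last row in each group (the SA5 line) holds the trigger value
    $outFile = Join-Path $dstDir ('{0}.txt' -f $_.Group[-1].Col5)
    $_.Group | Export-Csv $outFile -Delimiter ';' -Append -NoTypeInformation
}
Keep in mind that Export-Csv will then emit a Col1..Col6 header line in each output file and pad every row to six fields, which you would have to strip if the consuming interface cannot tolerate that.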

PowerShell Get-Content with basic manipulations so slow

I am merging a lot of large CSV files, e.g. while skipping the leading junk and appending the filename to each line:
Get-ChildItem . | Where Name -match "Q[0-4]20[0-1][0-9].csv" |
    ForEach-Object {
        $file = $_.BaseName
        Get-Content $_.FullName | Select-Object -Skip 3 | % {
            "$_,${file}" | Out-File -Append temp.csv -Encoding ASCII
        }
    }
In PowerShell this is incredibly slow even on an i7/16GB machine (~5 megabyte/minute). Can I make it more efficient or should I just switch to e.g. Python?
Get-Content / Set-Content are terrible with larger files. Streams are a good alternative when performance is key, so with that in mind let's use one to read in each file and another to write out the results.
$rootPath = "C:\temp"
$outputPath = "C:\test\somewherenotintemp.csv"
$streamWriter = [System.IO.StreamWriter]$outputPath
Get-ChildItem $rootPath -Filter "*.csv" -File | ForEach-Object{
    $file = $_.BaseName
    [System.IO.File]::ReadAllLines($_.FullName) |
        Select-Object -Skip 3 | ForEach-Object{
            $streamWriter.WriteLine(('{0},"{1}"' -f $_, $file))
        }
}
$streamWriter.Close(); $streamWriter.Dispose()
We create a writing stream, $streamWriter, to output the edited lines of each file. We could read and write in larger batches, which would be faster, but we need to ignore a few lines and change each remaining one, so processing line by line is simpler. Avoid writing anything to the console during this time, as it will just slow everything down.
What '{0},"{1}"' -f $_,$file does is quote that last "column" that is added in case the basename contains spaces.
Measure-Command -Expression {
    Get-ChildItem C:\temp | Where Name -like "*.csv" | ForEach-Object {
        $file = $_.BaseName
        Get-Content $_.FullName | Select-Object -Skip 3 | ForEach-Object {
            "$_,$($file)" | Out-File -Append C:\temp\t\tempe1.csv -Encoding ASCII -Force
        }
    }
} # TotalSeconds : 12,0526802 for 11415 lines
If you first put everything into an array in memory, things go a lot faster:
Measure-Command -Expression {
    $arr = @()
    Get-ChildItem C:\temp | Where Name -like "*.csv" | ForEach-Object {
        $file = $_.BaseName
        $arr += Get-Content $_.FullName | Select-Object -Skip 3 | ForEach-Object {
            "$_,$($file)"
        }
    }
    $arr | Out-File -Append C:\temp\t\tempe2.csv -Encoding ASCII -Force
} # TotalSeconds : 0,8197193 for 11415 lines
EDIT: Fixed it so that your filename was added to each row.
To keep -Append from ruining the performance of your script, you could use a buffer array variable:
# Initialize buffer
$csvBuffer = @()
Get-ChildItem *.csv | ForEach-Object {
    $file = $_.BaseName
    $content = Get-Content $_.FullName | Select-Object -Skip 3 | % {
        "$_,${file}"
    }
    # Populate buffer
    $csvBuffer += $content
    # Write buffer to disk if it contains 5000 lines or more
    $csvBufferCount = $csvBuffer | Measure-Object | Select-Object -ExpandProperty Count
    if( $csvBufferCount -ge 5000 )
    {
        $csvBuffer | Out-File -FilePath temp.csv -Encoding ASCII -Append
        $csvBuffer = @()
    }
}
# Important: flush the buffer remainder
if( $csvBuffer.Count -gt 0 )
{
    $csvBuffer | Out-File -FilePath temp.csv -Encoding ASCII -Append
    $csvBuffer = @()
}
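One more note on the buffer: += on a PowerShell array copies the entire array on every append, so the cost grows with buffer size. On PowerShell 5+ a generic List keeps appends cheap; a sketch of the same buffering scheme:
# Same scheme, but List[string].Add() avoids re-allocating the buffer on each line
$csvBuffer = [System.Collections.Generic.List[string]]::new()
Get-ChildItem *.csv | ForEach-Object {
    $file = $_.BaseName
    Get-Content $_.FullName | Select-Object -Skip 3 | ForEach-Object {
        $csvBuffer.Add("$_,$file")
    }
    if ($csvBuffer.Count -ge 5000) {
        $csvBuffer | Out-File -FilePath temp.csv -Encoding ASCII -Append
        $csvBuffer.Clear()
    }
}
# Flush whatever remains after the last file
if ($csvBuffer.Count -gt 0) {
    $csvBuffer | Out-File -FilePath temp.csv -Encoding ASCII -Append
}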

Need to output multiple rows to CSV file

I am using the following script to iterate through hundreds of text files looking for specific instances of the regex pattern within. I need to add a second data point to the array that tells me which file the pattern matched in.
In the below script the [Regex]::Matches($str, $Pattern) | % { $_.Value } piece returns multiple rows per file, which cannot be easily output to a file.
What I would like to know is, how would I output a 2 column CSV file, one column with the file name (which should be $_.FullName), and one column with the regex results? The code of where I am at now is below.
$FolderPath = "C:\Test"
$Pattern = "(?i)(?<=\b^test\b)\s+(\w+)\S+"
$Lines = @()
Get-ChildItem -Recurse $FolderPath -File | ForEach-Object {
    $_.FullName
    $str = Get-Content $_.FullName
    $Lines += [Regex]::Matches($str, $Pattern) |
        % { $_.Value } |
        Sort-Object |
        Get-Unique
}
$Lines = $Lines.Trim().ToUpper() -replace '[\r\n]+', ' ' -replace ";", '' |
Sort-Object |
Get-Unique # Cleaning up data in array
I can think of two ways; the simplest is to use a hashtable. The other is to create PSObjects to fill your $Lines variable. I'll go with the simple way, so you only need one variable: the hashtable.
$FolderPath = "C:\Test"
$Pattern = "(?i)(?<=\b^test\b)\s+(\w+)\S+"
$Results =#{}
Get-ChildItem -Recurse $FolderPath -File |
ForEach-Object {
$str = Get-Content $_.FullName
$Line = [regex]::matches($str,$Pattern) | % { $_.Value } | Sort-Object | Get-Unique
$Line = $Line.Trim().ToUpper() -Replace '[\r\n]+', ' ' -Replace ";",'' | Sort-Object | Get-Unique # Cleaning up data in array
$Results[$_.FullName] = $Line
}
$Results.GetEnumerator() | Select #{L="Folder";E={$_.Key}}, #{L="Matches";E={$_.Value}} | Export-Csv -NoType -Path <Path to save CSV>
Your results will be in $Results. $Results.Keys contains the file paths, and $Results.Values has the matches. You can reference the results for a particular file by its key, $Results["file path"]; of course it will return nothing if the key does not exist.
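For completeness, here is a minimal sketch of the PSObject alternative mentioned at the top of this answer: emit one object per file and let Export-Csv produce the two-column CSV directly (the output path is hypothetical):
$FolderPath = "C:\Test"
$Pattern = "(?i)(?<=\b^test\b)\s+(\w+)\S+"
Get-ChildItem -Recurse $FolderPath -File | ForEach-Object {
    $str = Get-Content $_.FullName
    $found = [regex]::Matches($str, $Pattern) | % { $_.Value } | Sort-Object | Get-Unique
    # One object per file; the property names become the CSV column headers
    [PSCustomObject]@{
        File    = $_.FullName
        Matches = ($found -join ' ')
    }
} | Export-Csv -NoTypeInformation -Path "C:\Test\results.csv"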