Splitting file into smaller files, working script, but need some tweaks - powershell

I have a script here that looks for a delimiter in a text file with several reports in it. The script saves each individual report as its own text document. The tweaks I'm trying to achieve are:
In the middle of the data of each page there is - SPEC #: RX:<string>. I want that string to be saved as the filename.
It currently saves from the delimiter down to the next one. This ignores the first report and grabs every one after. I want it to go from the delimiter UP to the next one, but I haven't figured out how to achieve that.
$InPC = "C:\Users\path"
Get-ChildItem -Path $InPC -Filter *.txt | ForEach-Object -Process {
$basename= $_.BaseName
$m = ( ( Get-Content $_.FullName | Where { $_ | Select-String "END OF REPORT" -Quiet } | Measure-Object | ForEach-Object { $_.Count } ) -ge 2)
$a = 1
if ($m) {
Get-Content $_.FullName | % {
If ($_ -match "END OF REPORT") {
$OutputFile = "$InPC\$basename _$a.txt"
$a++
}
Add-Content $OutputFile $_
}
Remove-Item $_.FullName
}
}
This works, but as stated it writes each file with END OF REPORT on top, and the first report in the file is omitted because it does not have END OF REPORT above it.
Edited code:
$InPC = 'C:\Path' #
ForEach($File in Get-ChildItem -Path $InPC -Filter *.txt){
$RepNum=0
ForEach($Report in (([IO.File]::ReadAllText($File.FullName) -split 'END OF REPORT\r?\n?' -ne '')){
if ($Report -match 'SPEC #: RX:(?<ReportFile>.*?)\.'){
$ReportFile=$Matches.ReportFile
}
$OutputFile = "{0}\{1}_{2}_{3}.txt" -f $InPC,$File.BaseName,$ReportFile,++$RepNum
$Report | Add-Content $OutputFile
}
# Remove-Item $File.FullName
}

I suggest using regular expressions to:
read in the file as a single string (the -Raw parameter of Get-Content, or an equivalent)
split the file at the marker END OF REPORT into sections
use 'SPEC #: RX:(?<ReportFile>.*?)\.' with a named capture group to extract the string
Edit adapted to PowerShell v2
## Q:\Test\2019\09\12\SO_57911471.ps1
$InPC = 'C:\Users\path' # 'Q:\Test\2019\09\12\' #
ForEach($File in Get-ChildItem -Path $InPC -Filter *.txt){
$RepNum=0
ForEach($Report in (((Get-Content $File.FullName) -join "`n") -split 'END OF REPORT\r?\n?' -ne '')){
if ($Report -match 'SPEC #: RX:(?<ReportFile>.*?)\.'){
$ReportFile=$Matches.ReportFile
}
$OutputFile = "{0}\{1}_{2}_{3}.txt" -f $InPC,$File.BaseName,$ReportFile,++$RepNum
$Report | Add-Content $OutputFile
}
# Remove-Item $File.FullName
}
This constructed sample text:
## Q:\Test\2019\09\12\SO_57911471.txt
I have a script here that looks for a delimiter in a text file with several reports in it.
In the middle of the data of each page there is -
SPEC #: RX:string1.
I want that string to be saved as the filename.
END OF REPORT
I have a script here that looks for a delimiter in a text file with several reports in it.
In the middle of the data of each page there is -
SPEC #: RX:string2.
I want that string to be saved as the filename.
END OF REPORT
yields:
> Get-ChildItem *string* -name
SO_57911471_string1_1.txt
SO_57911471_string2_2.txt
The added report number ($RepNum) is just a precaution in case the string could not be matched.
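The split and named-capture steps can also be tried in isolation on an in-memory string before touching any files. This is only a minimal sketch: the sample text and the variable names $text, $reports and $names are made up for illustration.

```powershell
# Hypothetical sample standing in for the content of one input file
$text = "line A`nSPEC #: RX:string1.`nEND OF REPORT`nline B`nSPEC #: RX:string2.`nEND OF REPORT`n"

# Split at the marker; -ne '' drops the empty element left after the last marker
$reports = $text -split 'END OF REPORT\r?\n?' -ne ''

# Collect the captured name of every report that contains a SPEC line
$names = foreach ($report in $reports) {
    if ($report -match 'SPEC #: RX:(?<ReportFile>.*?)\.') {
        $Matches.ReportFile
    }
}
$names
```

Because each section ends where the marker was, the first report is kept and nothing is lost before the first END OF REPORT.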

Related

delete double quotes in an export-csv result using powershell [duplicate]

I would like to remove all quotation marks from my exported CSV file. It's very annoying that every time I generate a new CSV file I have to manually remove all the quotation marks around the strings. Could anyone provide a PowerShell script to overcome this problem? Thanks.
$File = "c:\programfiles\programx\file.csv"
(Get-Content $File) | Foreach-Object {
$_ -replace """, ""
} | Set-Content $File
Next time you make one, export-csv in powershell 7 has a new option you may like:
export-csv -UseQuotes AsNeeded
It seems many of us have already explained that quotes are sometimes needed in CSV files. This is the case when:
the value contains a double quote
the value contains the delimiter character
the value contains newlines or has whitespace at the beginning or the end of the string
With PS version 7 you have the option to use parameter -UseQuotes AsNeeded.
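As a rough illustration of that option, the sketch below uses ConvertTo-Csv, which accepts the same -UseQuotes parameter in PowerShell 7, so the result can be inspected without writing a file. It assumes PowerShell 7 or later; the sample object and the $csv variable are made up.

```powershell
# Requires PowerShell 7+; the sample row is hypothetical
$row = [pscustomobject]@{ Site = 'Main'; Dept = 'aaa,bbb,ccc' }

# Only the value that contains the delimiter keeps its quotes
$csv = $row | ConvertTo-Csv -UseQuotes AsNeeded
$csv
```

The Dept value stays quoted because it contains commas, while Site and the header names are written bare.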
For older versions I made this helper function to convert to CSV using only quotes when needed:
function ConvertTo-CsvNoQuotes {
# returns a csv delimited string array with values unquoted unless needed
[OutputType('System.Object[]')]
[CmdletBinding(DefaultParameterSetName = 'ByDelimiter')]
param (
[Parameter(Mandatory = $true, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true, Position = 0)]
[PSObject]$InputObject,
[Parameter(Position = 1, ParameterSetName = 'ByDelimiter')]
[char]$Delimiter = ',',
[Parameter(ParameterSetName = 'ByCulture')]
[switch]$UseCulture,
[switch]$NoHeaders,
[switch]$IncludeTypeInformation # by default, this function does NOT include type information
)
begin {
if ($UseCulture) { $Delimiter = (Get-Culture).TextInfo.ListSeparator }
# regex to test if a string contains a double quote, the delimiter character,
# newlines or has whitespace at the beginning or the end of the string.
# if that is the case, the value needs to be quoted.
$needQuotes = '^\s|["{0}\r\n]|\s$' -f [regex]::Escape($Delimiter)
# a boolean to check if we have output the headers or not from the object(s)
# and another to check if we have output type information or not
$doneHeaders = $doneTypeInfo = $false
}
process {
foreach($item in $InputObject) {
if (!$doneTypeInfo -and $IncludeTypeInformation) {
'#TYPE {0}' -f $item.GetType().FullName
$doneTypeInfo = $true
}
if (!$doneHeaders -and !$NoHeaders) {
$row = $item.PsObject.Properties | ForEach-Object {
# if needed, wrap the value in quotes and double any quotes inside
if ($_.Name -match $needQuotes) { '"{0}"' -f ($_.Name -replace '"', '""') } else { $_.Name }
}
$row -join $Delimiter
$doneHeaders = $true
}
$item | ForEach-Object {
$row = $_.PsObject.Properties | ForEach-Object {
# if needed, wrap the value in quotes and double any quotes inside
if ($_.Value -match $needQuotes) { '"{0}"' -f ($_.Value -replace '"', '""') } else { $_.Value }
}
$row -join $Delimiter
}
}
}
}
Using your example to remove the unnecessary quotes in an existing CSV file:
$File = "c:\programfiles\programx\file.csv"
(Import-Csv $File) | ConvertTo-CsvNoQuotes | Set-Content $File
keeping in mind that this may trash your data if you have embedded double quotes in your data, here is yet another variation on the idea ... [grin]
what it does ...
defines the input & output full file names
grabs the *.tmp files from the temp dir
filters for the 1st three files & only three basic properties
creates the file to work with
loads the file content
replaces the double quotes with nothing
saves the cleaned file to the 2nd file name
displays the original & cleaned versions of the file
the code ...
$TestCSV = "$env:TEMP\Ted.Xiong_-_Test.csv"
$CleanedTestCSV = $TestCSV -replace 'Test', 'CleanedTest'
Get-ChildItem -LiteralPath $env:TEMP -Filter '*.tmp' -File |
Select-Object -Property Name, LastWriteTime, Length -First 3 |
Export-Csv -LiteralPath $TestCSV -NoTypeInformation
(Get-Content -LiteralPath $TestCSV) -replace '"', '' |
Set-Content -LiteralPath $CleanedTestCSV
Get-Content -LiteralPath $TestCSV
'=' * 30
Get-Content -LiteralPath $CleanedTestCSV
output ...
"Name","LastWriteTime","Length"
"hd4130E.tmp","2020-03-13 5:23:06 PM","0"
"hd418D4.tmp","2020-03-12 11:47:59 PM","0"
"hd41F7D.tmp","2020-03-13 5:23:09 PM","0"
==============================
Name,LastWriteTime,Length
hd4130E.tmp,2020-03-13 5:23:06 PM,0
hd418D4.tmp,2020-03-12 11:47:59 PM,0
hd41F7D.tmp,2020-03-13 5:23:09 PM,0
As above, the quotation marks are valid for CSV, but to remove them you need to escape the quote mark in the replace operation, as it is a special character:
$File = "c:\programfiles\programx\file.csv"
(Get-Content $File) | Foreach-Object {
$_ -replace "`"", ""
} | Set-Content $File
Why are you manually reading CSV files in a text editor?
You exported them to that format for a reason. To read them, just import them back in and view them on screen, or read them back in and send the output to Notepad for reading.
Export-Csv -Path D:\temp\book1.csv
Import-Csv -Path D:\temp\book1.csv |
Clip |
Notepad # then press Ctrl+V, then save the Notepad file with a new name.
If you don't want CSV, then don't export as CSV; just output a flat file using Out-File instead.
Update
Your last comment to me indicated your final use case. CSV into SQL is a very common thing. A quick web search will show you how, and will even provide you with a script. You should also be looking at the PowerShell dbatools module.
How to import data from .csv in SQL Server using PowerShell?
Importing CSV files into a Microsoft SQL DB using PowerShell
ImportingCSVsIntoSQLv1.zip
Four Easy Ways to Import CSV Files to SQL Server with PowerShell
Find-Module -Name '*dba*'
<#
Version Name Repository Description
------- ---- ---------- -----------
1.0.101 dbatools PSGallery The community module that enables SQL Server Pros to automate database development and server administration
...
#>
Update
You mean this...
Get-Content 'D:\temp\book1.csv'
<#
# Results
"Site","Dept"
"Main","aaa,bbb,ccc"
"Branch1","ddd,eee,fff"
"Branch2","ggg,hhh,iii"
#>
Get-ChildItem -Path 'D:\temp' -Filter 'book1.csv' |
ForEach {
$NewFile = New-Item -Path 'D:\Temp' -Name "$($PSItem.BaseName).txt"
Get-Content -Path $PSItem.FullName |
ForEach-Object {
Add-Content -Path $NewFile -Value ($PSItem -replace '"') -WhatIf
}
}
<#
What if: Performing the operation "Add Content" on target "Path: D:\Temp\book1.txt".
What if: Performing the operation "Add Content" on target "Path: D:\Temp\book1.txt".
What if: Performing the operation "Add Content" on target "Path: D:\Temp\book1.txt".
What if: Performing the operation "Add Content" on target "Path: D:\Temp\book1.txt"
#>
Get-ChildItem -Path 'D:\temp' -Filter 'book1.csv' |
ForEach {
$NewFile = New-Item -Path 'D:\Temp' -Name "$($PSItem.BaseName).txt"
Get-Content -Path $PSItem.FullName |
ForEach-Object {
Add-Content -Path $NewFile -Value ($PSItem -replace '"')
}
}
Get-Content 'D:\temp\book1.txt'
<#
# Results
Site,Dept
Main,aaa,bbb,ccc
Branch1,ddd,eee,fff
Branch2,ggg,hhh,iii
#>
Of course, you need to use a wildcard for the CSV files, use -Recurse to get all directories, and add an error handler to make sure you don't have file name collisions.
One solution that does not remove the double quotes inside the quoted strings:
$delimiter=","
$InputFile="c:\programfiles\programx\file.csv"
$OutputFile="c:\programfiles\programx\resultfile.csv"
#import file into a variable (if your file is big, skip this and repeat the import wherever $ContentFile is used)
$ContentFile=import-csv $InputFile -Delimiter $delimiter -Encoding utf8
#list of property of csv file
$properties=($ContentFile | select -First 1 | Get-Member -MemberType NoteProperty).Name
#write header into new file
$properties -join $delimiter | Out-File $OutputFile -Encoding utf8
#write data into new file
$ContentFile | %{
$RowObject=$_ #==> get row object
$Line=@() #==> create array
$properties | %{$Line+=$RowObject."$_"} #==> loop over every property, take its value (without quotes) from the row object
$Line -join $delimiter #==> join the array into a delimited line and send it to standard output
} | Out-File $OutputFile -Encoding utf8 -Append #==> export result to output file
An extra double quote can be used to escape a double quote in a string:
$File = "c:\programfiles\programx\file.csv"
(Get-Content $File) | Foreach-Object { $_ -replace """", "" } | Set-Content $File
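The different ways of writing a literal double quote that appear in these answers all produce the same one-character string. The following sketch checks that on a sample line; the names $variants, $line and $results are only for illustration.

```powershell
# Three spellings of a string holding one double quote:
# single-quoted, backtick-escaped, and doubled inside double quotes
$variants = @('"', "`"", """")

# Hypothetical CSV line
$line = '"Main","aaa"'

# Each variant removes every quote from the line
$results = $variants | ForEach-Object { $line -replace $_, '' }
$results
```

The single-quoted form is usually the easiest to read, since nothing inside it needs escaping.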
After you have exported the CSV file with Export-CSV, you can use Get-Content to load the CSV file into an array of strings, then use Set-Content and replace to remove the quotation marks:
Set-Content -Path sample.csv -Value ((Get-Content -Path sample.csv) -replace '"')
As mklement0 helpfully pointed out, this could potentially corrupt the CSV if some lines need quoting. This solution simply goes through the whole file and replaces every quote with ''.
You could also speed this up with using the -Raw switch with Get-Content, which returns a whole string with the newlines preserved, instead of an array of newline delimited strings:
Set-Content -NoNewline -Path sample.csv -Value ((Get-Content -Raw -Path sample.csv) -replace '"')
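The corruption risk mentioned above is easy to reproduce in memory. This is a small sketch with made-up sample rows; it parses the same data before and after stripping the quotes and compares what ends up in the Dept column.

```powershell
# Hypothetical rows where one value contains the delimiter
$original = '"Site","Dept"', '"Main","aaa,bbb,ccc"'

# Blindly strip every double quote, as the one-liners above do
$stripped = $original -replace '"'

$before = $original | ConvertFrom-Csv
$after  = $stripped | ConvertFrom-Csv

$before.Dept   # the full value, commas included
$after.Dept    # no longer the same value, because the commas now act as delimiters
```

If the data can contain the delimiter, quotes, or line breaks, removing only the unnecessary quotes is the safer route.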

Replace the text for all files in a Directory

I have written the conditional script below to go through the files in a directory and replace one piece of text in all files, but only if the file contains the word 'Health'.
cd -Path "\\shlhfilprd08\Direct Credits\Temp2"
ForEach ($file in (Get-ChildItem -Path "\\shlhfilprd08\Direct Credits\Temp2"))
{
$filecontent = Get-Content -path $file -First 1
if($filecontent -like '*Health*'){$filecontent = $filecontent -replace 'TEACHERF','UniHlth '}
Set-Content $file.PSpath -Value $filecontent
}
I have come across two issues:
If ($filecontent -like '*Health*') is true, it replaces the word in the first row but deletes all the other rows along with the replace. I do not want that to happen.
I'm getting a Set-Content "path is denied" error message for files whose content does not contain the Health text.
Can you try with this
cd -Path "\\shlhfilprd08\Direct Credits\Temp2"
$configFiles = Get-ChildItem . *.config -rec
foreach ($file in $configFiles)
{
(Get-Content $file.PSPath) |
Foreach-Object { $_ -replace "TEACHERF", "UniHlth " } |
Set-Content $file.PSPath
}
I would try this; it worked for me in a little file
(make a small copy of a few data into a new folder and test it there)
$path = "\\shlhfilprd08\Direct Credits\Temp2"
$replace ="TEACHERF" #word to be replaced
$by = "UniHlth " #by this word (change $replace by $by)
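gci $path -file | %{
$content = Get-Content $_.FullName
# only touch files that actually contain the word
if($content -like "*$replace*"){
# replace in every line and write the whole file back once, so no lines are lost
$newcontent = $content | %{ $_.Replace($replace,$by) }
Set-Content $_.FullName $newcontent
}
}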
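Both issues from the question come from reading only the first line (-First 1) and from calling Set-Content for every file, matching or not. A minimal sketch of one way around that is shown below. It runs against a throwaway temp folder with made-up files (a.txt, b.txt) rather than the real share, so the path and file names are assumptions.

```powershell
# Hypothetical test folder; point $path at the real share once the result looks right
$path = Join-Path ([IO.Path]::GetTempPath()) ("ReplaceDemo_" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $path | Out-Null
Set-Content (Join-Path $path 'a.txt') 'Health TEACHERF', 'row 2 TEACHERF'
Set-Content (Join-Path $path 'b.txt') 'Other TEACHERF'

Get-ChildItem -Path $path -File | ForEach-Object {
    # Read every line, not just the first one
    $content = Get-Content -LiteralPath $_.FullName
    # Write back only when the file mentions Health; other files are left untouched
    if ($content -match 'Health') {
        ($content -replace 'TEACHERF', 'UniHlth ') | Set-Content -LiteralPath $_.FullName
    }
}
```

Moving Set-Content inside the if block means files without the Health text are never opened for writing.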

Filtering sections of data including the starting and ending lines- PowerShell

I have a text file that looks like this:
Data I'm NOT looking for
More data that doesn't matter
Even more data that I don't
&Start/Finally the data I'm looking for
&Data/More data that I need
&Stop/I need this too
&Start/Second batch of data I need
&Data/I need this too
&Stop/Okay now I'm done
Ending that I don't need
Here is what the output needs to be:
File1.txt
&Start/Finally the data I'm looking for
&Data/More data that I need
&Stop/I need this too
File2.txt
&Start/Second batch of data I need
&Data/I need this too
&Stop/Okay now I'm done
I need to do this for every file in a folder (sometimes there will be multiple files that will need to be filtered.) The files names can be incrementing: ex. File1.txt, File2.txt, File3.txt.
This is what I have tried with no luck:
ForEach-Object{
$text -join "`n" -split '(?ms)(?=^&START)' -match '^&START' |
Out-File B:\PowerShell\$filename}
Thanks!
Looks like you were pretty close: your code correctly extracted the paragraphs of interest, but intra-paragraph out-filtering of non-&-starting lines was missing, and you needed to write to paragraph-specific output files:
$text -join "`n" -split '(?m)(?=^&Start)' -match '^&Start' |
ForEach-Object { $ndx=0 } { $_ -split '\n' -match '^&' | Out-File "File$((++$ndx)).txt" }
This creates sequentially numbered files starting with File1.txt for every paragraph of interest.
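To see what the lookahead split produces before any files are written, the same pipeline can be run on an in-memory sample. This is only a sketch; the sample lines and the $blocks variable are made up, and the unary comma is there just to keep each paragraph together as its own array.

```powershell
# Hypothetical input lines, shaped like the text file in the question
$text = @(
    'Data I am NOT looking for'
    '&Start/first batch'
    '&Data/more data'
    '&Stop/end of first'
    '&Start/second batch'
    '&Stop/end of second'
    'Ending that is not needed'
)

# Split in front of every line starting with &Start, keep only the chunks that begin with it,
# then keep only the &-prefixed lines of each chunk
$blocks = $text -join "`n" -split '(?m)(?=^&Start)' -match '^&Start' |
    ForEach-Object { , ($_ -split '\n' -match '^&') }

$blocks.Count
```

The lookahead (?=^&Start) matches a position rather than text, so the &Start line itself stays at the top of each chunk instead of being consumed by the split.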
To do it for every file in a folder, with output filenames using fixed naming scheme File<n> across all input files (and thus cumulative numbering):
Get-ChildItem -File . | ForEach-Object -Begin { $ndx=0 } -Process {
(Get-Content -Raw $_) -split '(?m)(?=^&Start)' -match '^&Start' |
ForEach-Object { $_ -split '\n' -match '^&' | Out-File "File$((++$ndx)).txt" }
}
To do it for every file in a folder, with output filenames based on the input filenames and numbering per input file (PSv4+, due to use of -PipelineVariable):
Get-ChildItem -File . -PipelineVariable File | ForEach-Object {
(Get-Content -Raw $_) -split '(?m)(?=^&Start)' -match '^&Start' |
ForEach-Object {$ndx=0} { $_ -split '\n' -match '^&' | Out-File "$($File.Name)$((++$ndx)).txt" }
}
You posted a second question (against the rules) and it was deleted, but here is my quick answer to it. I hope it helps and gives you a better sense of how PS works:
$InputFile = "C:\temp\test\New folder (3)\File1.txt"
# get file content
$a=Get-Content $InputFile
# loop for every line in range 2 to last but one
for ($i=1; $i -lt ($a.count-1); $i++)
{
#getting the string part between & and /, and constructing the output file name
$OutFile = "$(Split-Path $InputFile)\$(($a[$i] -split '/')[0] -replace '&','').txt"
$a[0]| Out-File $OutFile #creating output file and write first line in it
$a[$i]| Out-File $OutFile -Append #write info line
$a[-1]| Out-File $OutFile -Append #write last line
}
Something like this?
$i=0
gci -path "C:\temp\ExplodeDir" -file | %{ (get-content -path $_.FullName -Raw).Replace("`r`n`r`n", ";").Replace("`r`n", "~").Split(";") | %{if ($_ -like "*Start*") {$i++; ($_ -split "~") | out-file "C:\temp\ResultFile\File$i.txt" }} }

Using Powershell to replace multiple strings in multiple files & folders

I have a list of strings in a CSV file. The format is:
OldValue,NewValue
223134,875621
321321,876330
....
and the file contains a few hundred rows (each OldValue is unique). I need to process changes over a number of text files in a number of folders & subfolders. My best guess of the number of folders, files, and lines of text are - 15 folders, around 150 text files in each folder, with approximately 65,000 lines of text in each folder (between 400-500 lines per text file).
I will make 2 passes at the data, unless I can do it in one. First pass is to generate a text file I will use as a check list to review my changes. Second pass is to actually make the change in the file. Also, I only want to change the text files where the string occurs (not every file).
I'm using the following Powershell script to go through the files & produce a list of the changes needed. The script runs, but is beyond slow. I haven't worked on the replace logic yet, but I assume it will be similar to what I've got.
# replace a string in a file with powershell
[reflection.assembly]::loadwithpartialname("Microsoft.VisualBasic") | Out-Null
Function Search {
# Parameters $Path and $SearchString
param ([Parameter(Mandatory=$true, ValueFromPipeline = $true)][string]$Path,
[Parameter(Mandatory=$true)][string]$SearchString
)
try {
#.NET FindInFiles Method to Look for file
[Microsoft.VisualBasic.FileIO.FileSystem]::GetFiles(
$Path,
[Microsoft.VisualBasic.FileIO.SearchOption]::SearchAllSubDirectories,
$SearchString
)
} catch { $_ }
}
if (Test-Path "C:\Work\ListofAllFilenamesToSearch.txt") { # if file exists
Remove-Item "C:\Work\ListofAllFilenamesToSearch.txt"
}
if (Test-Path "C:\Work\FilesThatNeedToBeChanged.txt") { # if file exists
Remove-Item "C:\Work\FilesThatNeedToBeChanged.txt"
}
$filefolder1 = "C:\TestFolder\WorkFiles"
$ftype = "*.txt"
$filenames1 = Search $filefolder1 $ftype
$filenames1 | Out-File "C:\Work\ListofAllFilenamesToSearch.txt" -Width 2000
if (Test-Path "C:\Work\FilesThatNeedToBeChanged.txt") { # if file exists
Remove-Item "C:\Work\FilesThatNeedToBeChanged.txt"
}
(Get-Content "C:\Work\NumberXrefList.CSV" |where {$_.readcount -gt 1}) | foreach{
$OldFieldValue, $NewFieldValue = $_.Split("|")
$filenamelist = (Get-Content "C:\Work\ListofAllFilenamesToSearch.txt" -ReadCount 5) #|
foreach ($j in $filenamelist) {
#$testvar = (Get-Content $j )
#$testvar = (Get-Content $j -ReadCount 100)
$testvar = (Get-Content $j -Delimiter "\n")
Foreach ($i in $testvar)
{
if ($i -imatch $OldFieldValue) {
$j + "|" + $OldFieldValue + "|" + $NewFieldValue | Out-File "C:\Work\FilesThatNeedToBeChanged.txt" -Width 2000 -Append
}
}
}
}
$FileFolder = (Get-Content "C:\Work\FilesThatNeedToBeChanged.txt" -ReadCount 5)
Get-ChildItem $FileFolder -Recurse |
select -ExpandProperty fullname |
foreach {
if (Select-String -Path $_ -SimpleMatch $OldFieldValue -Debug -Quiet) {
(Get-Content $_) |
ForEach-Object {$_ -replace $OldFieldValue, $NewFieldValue }|
Set-Content $_ -WhatIf
}
}
In the code above, I've tried several things with Get-Content - default, with -ReadCount, and -Delimiter - in an attempt to avoid an out of memory error.
The only thing I have control over is the length of the old & new replacement strings file. Is there a way to do this in Powershell? Is there a better option/solution? I'm running Windows 7, Powershell version 3.0.
Your main problem is that you're reading the file over and over again to change each of the terms. You need to invert the looping of the replace terms and looping of the files. Also, pre-load the csv. Something like:
$filefolder1 = "C:\TestFolder\WorkFiles"
$ftype = "*.txt"
$filenames = gci -Path $filefolder1 -Filter $ftype -Recurse
$replaceValues = Import-Csv -Path "C:\Work\NumberXrefList.CSV"
foreach ($file in $filenames) {
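# use FullName so files found in subfolders resolve regardless of the current directory
$contents = Get-Content -Path $file.FullName
foreach ($replaceValue in $replaceValues) {
$contents = $contents -replace $replaceValue.OldValue, $replaceValue.NewValue
}
Copy-Item $file.FullName "$($file.FullName).old"
Set-Content -Path $file.FullName -Value $contents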
}
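The same inverted loop can serve the first pass from the question, the checklist of files that need changes, by reading each file once and testing every old value against it. The sketch below is an assumption-laden illustration: it builds a throwaway temp folder with two made-up files and an in-memory copy of the cross-reference list, and the file|old|new output format simply mirrors the one used in the question.

```powershell
# Hypothetical test data in a temp folder
$root = Join-Path ([IO.Path]::GetTempPath()) ("XrefDemo_" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $root | Out-Null
Set-Content (Join-Path $root 'one.txt') 'id 223134 here'
Set-Content (Join-Path $root 'two.txt') 'nothing to change'

# In-memory stand-in for NumberXrefList.CSV
$replaceValues = 'OldValue,NewValue', '223134,875621', '321321,876330' | ConvertFrom-Csv

# Read each file once, then test every old value against the loaded text
$checklist = foreach ($file in Get-ChildItem -Path $root -Filter *.txt -Recurse) {
    $text = [IO.File]::ReadAllText($file.FullName)
    foreach ($pair in $replaceValues) {
        if ($text.Contains($pair.OldValue)) {
            '{0}|{1}|{2}' -f $file.FullName, $pair.OldValue, $pair.NewValue
        }
    }
}
$checklist
```

String.Contains does a literal comparison, which avoids regex surprises if an old value ever contains special characters.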