Replace content of multiple files with PowerShell

I'm trying to replace a certain line in multiple logon scripts (>2000 scripts).
The script works in its current form, but it writes every file to disk even when no changes are made, and I don't want this behaviour. It should only write to disk if changes were actually made.
This is what I already have:
$varFiles = Get-ChildItem $varPath*.$varEnding
foreach ($file in $varFiles)
{
    (Get-Content $file) |
        Foreach-Object { $_ -replace [regex]::Escape("$varFind"), "$varReplace" } |
        Set-Content $file
}
And this is what I already tried, but it seems that it is not possible to use if in piped commands:
$varFiles = Get-ChildItem $varPath*.$varEnding
foreach ($file in $varFiles)
{
    $control = $file
    (Get-Content $file) |
        Foreach-Object { $_ -replace [regex]::Escape("$varFind"), "$varReplace" } |
        If($control -ne $file){Set-Content $file}
}
The variables $varPath, $varEnding, $varFind and $varReplace are defined by a few Read-Host commands at the start of the script.
I hope you guys can help me :)

For simplicity and speed - although at the expense of memory use - I'd simply cache and operate on whole input files (requires PowerShell v3+, due to use of -Raw[1]); since logon scripts are generally small, this should be acceptable:
$varFindEscaped = [regex]::Escape($varFind)
$varReplaceEscaped = $varReplace -replace '\$', '$$$$'
foreach ($file in Get-ChildItem $varPath*$varEnding) {
    $contentBefore = Get-Content -Raw $file
    $contentAfter = $contentBefore -replace $varFindEscaped, $varReplaceEscaped
    if ($contentBefore -ne $contentAfter) {
        Set-Content $file $contentAfter
    }
}
To improve performance, I've moved the escaping of the -replace operands outside the loop.
Note that I'm also escaping $ instances in the replacement value to prevent their interpretation as references to what was matched, such as $& to refer to the entire match.
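For example, with hypothetical throwaway values (not from the original script), the effect of an unescaped $ in the replacement operand can be seen like this:
'price' -replace 'ice', '$0.50'    # -> 'price.50' ($0 expands to the whole match, 'ice')
'price' -replace 'ice', '$$0.50'   # -> 'pr$0.50'  ($$ is an escaped, literal $)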
Note that in Windows PowerShell Set-Content by default uses the system's default (ANSI) code page rather than UTF-8 encoding; use the -Encoding parameter if you need a specific encoding.
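For example, a sketch of the same if block with an explicit encoding (assuming UTF-8 is what your logon scripts should use):
if ($contentBefore -ne $contentAfter) {
    # -Encoding UTF8 writes UTF-8 (with a BOM, in Windows PowerShell)
    Set-Content -Encoding UTF8 $file $contentAfter
}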
[1]
In PS v2, you may omit -Raw, which turns $contentBefore into an array of strings (lines) on whose elements -replace then operates individually (as in the OP's approach). While probably slightly slower, it does have the advantage of performing the substitution on individual lines only rather than potentially across multiple lines.

PowerShell: Replace string in all .txt files within directory

I am trying to replace every instance of a string within a directory. However my code is not replacing anything.
What I have so far:
Test Folder contains multiple files and folders containing content that I need to change.
The folders contain .txt documents, and the .txt documents contain strings like this: Content reference="../../../PartOfPath/EN/EndofPath/Caution.txt" that I need to change into this: Content reference="../../../PartOfPath/FR/EndofPath/Caution.txt"
Before this question comes up, yes it has to be done this way, as there are other similar strings that I don't want to edit. So I cannot just replace all instances of EN with FR.
$DirectoryPath = "C:\TestFolder"
$Parts = @(
    @{PartOne="/PartOfPath";PartTwo="EndofPath/Caution.txt"},
    @{PartOne="/OtherPartOfPath";PartTwo="EndofPath/Note.txt"},
    @{PartOne="/ThirdPartOfPath";PartTwo="OtherEndofPath/Warning.txt"}) | % { New-Object object | Add-Member -NotePropertyMembers $_ -PassThru }
Get-ChildItem $DirectoryPath | ForEach {
    foreach($n in $Parts){
        [string]$PartOne = $n.PartOne
        [string]$PartTwo = $n.PartTwo
        $ReplaceThis = "$PartOne/EN/$PartTwo"
        $WithThis = "$PartOne/FR/$PartTwo"
        (Get-Content $_) | ForEach {$_ -Replace $ReplaceThis, $WithThis} | Set-Content $_
    }
}
The code will run and overwrite files, however no edits will have been made.
While troubleshooting I came across this potential cause:
This test worked:
$FilePath = "C:\TestFolder\Test.txt"
$ReplaceThis = "/PartOfPath/EN/Notes/Note.txt"
$WithThis = "/PartOfPath/FR/Notes/Note.txt"
(Get-Content -Path $FilePath) -replace $ReplaceThis, $WithThis | Set-Content $FilePath
But this test did not:
$FilePath = "C:\TestFolder\Test.txt"
foreach($n in $Parts){
    [string]$PartOne = $n.PartOne
    [string]$PartTwo = $n.PartTwo
    [string]$ReplaceThis = "$PartOne/EN/$PartTwo"
    [string]$WithThis = "$PartOne/FR/$PartTwo"
    (Get-Content -Path $FilePath) -replace $ReplaceThis, $WithThis | Set-Content $FilePath
}
If you can help me understand what is wrong here I would greatly appreciate it.
Thanks to @TessellatingHeckler's comments I revised my code and found this solution:
$DirectoryPath = "C:\TestFolder"
$Parts = @(
    @{PartOne="/PartOfPath";PartTwo="EndofPath/Caution.txt"},
    @{PartOne="/OtherPartOfPath";PartTwo="EndofPath/Note.txt"},
    @{PartOne="/ThirdPartOfPath";PartTwo="OtherEndofPath/Warning.txt"}) | % { New-Object object | Add-Member -NotePropertyMembers $_ -PassThru }
Get-ChildItem $DirectoryPath -Filter "*.txt" -Recurse | ForEach {
    foreach($n in $Parts){
        [string]$PartOne = $n.PartOne
        [string]$PartTwo = $n.PartTwo
        $ReplaceThis = "$PartOne/EN/$PartTwo"
        $WithThis = "$PartOne/FR/$PartTwo"
        (Get-Content $_) | ForEach {$_.Replace($ReplaceThis, $WithThis)} | Set-Content $_
    }
}
There were two problems:
-Replace was not working as I intended, so I had to use .Replace() instead.
The original Get-ChildItem was not returning any values and had to be replaced with the above version.
PowerShell's -replace operator is regex-based and case-insensitive by default:
To perform literal replacements, \-escape metacharacters in the pattern or call [regex]::Escape().
By contrast, the [string] type's .Replace() method performs literal replacement and is case-sensitive, invariably in Windows PowerShell, by default in PowerShell (Core) 7+ (see this answer for more information).
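A quick demonstration of the contrast, using throwaway sample values:
'Hello' -replace 'h', 'J'     # -> 'Jello' (regex-based, case-INsensitive)
'Hello'.Replace('h', 'J')     # -> 'Hello' (literal, case-sensitive: no match)
'Hello' -creplace 'h', 'J'    # -> 'Hello' (-creplace is the case-sensitive variant)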
Therefore:
As TessellatingHeckler points out, given that your search strings seem to contain no regex metacharacters (such as . or \) that would require escaping, there is no obvious reason why your original approach didn't work.
Given that you're looking for literal substring replacements, the [string] type's .Replace() is generally the simpler and faster option if case-SENSITIVITY is desired / acceptable (invariably so in Windows PowerShell; as noted, in PowerShell (Core) 7+, you have the option of making .Replace() case-insensitive too).
However, since you need to perform multiple replacements, a more concise, single-pass -replace solution is possible (though whether it actually performs better would have to be tested; if you need case-sensitivity, use -creplace in lieu of -replace):
$oldLang = 'EN'
$newLang = 'FR'
$regex = @(
    "(?<prefix>/PartOfPath/)$oldLang(?<suffix>/EndofPath/Caution.txt)",
    "(?<prefix>/OtherPartOfPath/)$oldLang(?<suffix>/EndofPath/Note.txt)",
    "(?<prefix>/ThirdPartOfPath/)$oldLang(?<suffix>/OtherEndofPath/Warning.txt)"
) -join '|'
Get-ChildItem C:\TestFolder -Filter *.txt -Recurse | ForEach-Object {
    ($_ | Get-Content -Raw) -replace $regex, "`${prefix}$newLang`${suffix}" |
        Set-Content -LiteralPath $_.FullName
}
See this regex101.com page for an explanation of the regex and the ability to experiment with it.
The expression used as the replacement operand, "`${prefix}$newLang`${suffix}", mixes PowerShell's up-front string interpolation ($newLang, which could also be written as ${newLang}) with placeholders referring to the named capture groups (e.g. (?<prefix>...)) in the regex. The latter only coincidentally use the same notation as PowerShell variables: enclosing the name in {...} is required, and here the $ chars. must be `-escaped to prevent PowerShell's string interpolation from interpreting them; see this answer for background information.
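To make that concrete, here is the same pattern applied to a single hypothetical input string; with a single-quoted replacement operand no `-escaping is needed, because PowerShell performs no interpolation there:
'/PartOfPath/EN/EndofPath/Caution.txt' -replace '(?<prefix>/PartOfPath/)EN(?<suffix>/EndofPath/)', '${prefix}FR${suffix}'
# -> /PartOfPath/FR/EndofPath/Caution.txt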
Note the use of -Raw with Get-Content, which reads a text file as a whole into memory, as a single, multi-line string. Given that you don't need line-by-line processing in this case, this greatly speeds up the processing of a given file.
As a general tip: you may need to use the -Encoding parameter with Set-Content to ensure the desired character encoding, given that PowerShell never preserves a file's original encoding when reading it. By default, you'll get ANSI-encoded files in Windows PowerShell, and BOM-less UTF-8 files in PowerShell (Core) 7+.

Slowness removing columns 3, 7 and 9 from a |-separated txt file using PowerShell

I have a pipe-separated data file with a lot of data and I want to remove columns 3, 7 and 9.
The script below works 100% fine, but it's too slow: it takes 5 minutes for a 22 MB file.
Adeel|01|test|1234589|date|amount|00|123345678890|test|all|01|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|05|test|1234589|date|amount|00|123345678890|test|all|05|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|09|test|1234589|date|amount|00|123345678890|test|all|09|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|00|test|1234589|date|amount|00|123345678890|test|all|00|
Adeel|12|test|1234589|date|amount|00|123345678890|test|all|12|
param
(
    # Input data file
    [string]$Path = 'O:\Temp\test.txt',
    # Columns to be removed, any order, dupes are allowed
    [int[]]$Remove = (3,6)
)

# sort indexes descending and remove dupes
$Remove = $Remove | Sort-Object -Unique -Descending

# read input lines
Get-Content $Path | .{process{
    # split and add to ArrayList which allows to remove items
    $list = [Collections.ArrayList]($_ -split '\|')
    # remove data at the indexes (from tail to head due to descending order)
    foreach($i in $Remove) {
        $list.RemoveAt($i)
    }
    # join and output
    #$list -join '|'
    $contentUpdate = $list -join '|'
    Add-Content "O:\Temp\testoutput.txt" $contentUpdate
}}
Get-Content is comparatively slow. Use of the pipeline adds additional overhead.
When performance matters, StreamReader and StreamWriter can be a better choice:
param (
    # Input data file
    [string] $InputPath = 'input.txt',
    # Output data file
    [string] $OutputPath = 'output.txt',
    # Columns to be removed, any order, dupes are allowed
    [int[]] $Remove = (1, 2, 2),
    # Column separator
    [string] $Separator = '|',
    # Input file encoding
    [Text.Encoding] $Encoding = [Text.Encoding]::Default
)

$ErrorActionPreference = 'Stop'

# Gets rid of dupes and provides fast lookup ability
$removeSet = [Collections.Generic.HashSet[int]] $Remove

$reader = $writer = $null
try {
    $reader = [IO.StreamReader]::new(( Convert-Path -LiteralPath $InputPath ), $Encoding )
    $null = New-Item $OutputPath -ItemType File -Force # as Convert-Path requires existing path

    # Explicit $null test, so that empty lines don't end the loop prematurely.
    while( $null -ne ($line = $reader.ReadLine()) ) {
        if( -not $writer ) {
            # Construct writer only after first line has been read, so $reader.CurrentEncoding is available
            $writer = [IO.StreamWriter]::new(( Convert-Path -LiteralPath $OutputPath ), $false, $reader.CurrentEncoding )
        }
        $columns = $line.Split( $Separator )
        $isAppend = $false
        for( $i = 0; $i -lt $columns.Length; $i++ ) {
            if( -not $removeSet.Contains( $i ) ) {
                if( $isAppend ) { $writer.Write( $Separator ) }
                $writer.Write( $columns[ $i ] )
                $isAppend = $true
            }
        }
        $writer.WriteLine() # Write (CR)LF
    }
}
finally {
    # Make sure to dispose the reader and writer so files get closed.
    if( $writer ) { $writer.Dispose() }
    if( $reader ) { $reader.Dispose() }
}
Convert-Path is used because .NET has a different current directory than PowerShell, so it's best practice to pass absolute paths to .NET APIs.
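A simple way to observe that mismatch in an interactive session:
Set-Location C:\Windows
[Environment]::CurrentDirectory          # may still be the directory the session started in
Convert-Path -LiteralPath .\System32     # -> C:\Windows\System32, resolved against PowerShell's location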
If this still isn't fast enough, consider writing this in C# instead. Especially with such "low level" code, C# tends to be faster. You may embed C# code in PowerShell using Add-Type -TypeDefinition $csCode.
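A minimal sketch of that embedding technique (the class and method names here are made up for illustration):
Add-Type -TypeDefinition @'
public static class ColumnStripper
{
    // Real column-removal logic would go here; this stub just echoes its input.
    public static string Strip(string line) { return line; }
}
'@
[ColumnStripper]::Strip('a|b|c')   # call the compiled C# from PowerShell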
As another optimization, instead of using String.Split(), which creates more substrings than actually needed, you may use String.IndexOf() and String.Substring() to extract only the necessary columns.
Last but not least, you may experiment with the StreamReader and StreamWriter constructors that let you specify a buffer size.
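A minimal sketch of the String.IndexOf() idea, copying only the kept columns into a StringBuilder; the sample line and removed-column set are hypothetical stand-ins for the answer's per-line input and $removeSet:
$line = 'a|b|c|d'
$removeSet = [Collections.Generic.HashSet[int]] @(1)
$sb = [Text.StringBuilder]::new()
$start = 0; $col = 0; $isAppend = $false
while ($start -le $line.Length) {
    # find the end of the current column (or end of line)
    $end = $line.IndexOf('|', $start)
    if ($end -lt 0) { $end = $line.Length }
    if (-not $removeSet.Contains($col)) {
        if ($isAppend) { [void] $sb.Append('|') }
        # append the column directly from $line, without allocating a substring
        [void] $sb.Append($line, $start, $end - $start)
        $isAppend = $true
    }
    $start = $end + 1
    $col++
}
$sb.ToString()   # -> 'a|c|d'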
Just a more native PowerShell solution/syntax:
Import-Csv .\Test.txt -Delimiter "|" -Header @(1..12) |
    Select-Object -ExcludeProperty $Remove |
    ConvertTo-Csv -Delimiter "|" -UseQuotes Never |
    Select-Object -Skip 1 |
    Set-Content -Path .\testoutput.txt
For general performance hints, see: PowerShell scripting performance considerations. Meaning that the answer from @zett42 probably holds the fastest solution. But there are a few reasons you might want to deviate from that solution, if only because of this note:
⚠ Note
Many of the techniques described here are not idiomatic PowerShell and may reduce the readability of a PowerShell script. Script authors are advised to use idiomatic PowerShell unless performance dictates otherwise.
(Correctly) using the PowerShell pipeline might save a lot of memory (as every item is immediately processed and released from memory at the end of the stream, e.g. when sent to disk), whereas .Net solutions generally require loading everything into memory. Meaning that, at the moment your PC runs out of physical memory and memory pages are swapped to disk, PowerShell might even outperform .Net solutions.
As the helpful comment from @mklement0 implies ("note that calling Add-Content in every iteration is slow, because the file has to be opened and closed every time. Instead, add another pipeline segment with a single Set-Content call"): the Set-Content cmdlet should be at the end of the pipeline, after the (last) pipe (|) character.
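Applied to the code in question, that might look like the following sketch (same splitting logic as the original loop, with one Set-Content at the end of the pipeline instead of an Add-Content per line):
Get-Content $Path | ForEach-Object {
    $list = [Collections.ArrayList]($_ -split '\|')
    # $Remove is sorted descending, as in the original script
    foreach ($i in $Remove) { $list.RemoveAt($i) }
    $list -join '|'
} | Set-Content 'O:\Temp\testoutput.txt'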
The syntax ... .{process{ ... is probably an attempt at Speeding Up the Pipeline. This might indeed improve PowerShell performance, but if you want to implement this properly you probably don't want to dot-source it; invoke it via the call operator & instead, see: #8911 Performance problem: (implicitly) dot-sourced code is slowed down by variable lookups.
Anyway, the bottleneck is likely the input/output device (disk); as long as PowerShell is able to keep up with that device, there is probably no performance improvement to be had by tweaking this.
Besides the fact that later PowerShell versions (7.2) generally perform better than Windows PowerShell (5.1), some cmdlets have been improved, like the newer ConvertTo-Csv, which has additional -UseQuotes <QuoteKind> and -QuoteFields <String[]> parameters. If you are stuck with Windows PowerShell, you might check this question: Delete duplicate lines from text file based on column
Although there is an easy way to read delimited files without headers using the Import-Csv cmdlet (with the -Header parameter), there is no equally easy way to skip the CSV header with the counterpart cmdlets Export-Csv / ConvertTo-Csv. This can be worked around with: ConvertTo-Csv | Select-Object -Skip 1 | Set-Content -Path .\output.txt, see also: #17527 Add -NoHeader switch to Export-Csv and ConvertTo-Csv
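For completeness, a hypothetical Windows PowerShell 5.1 variant of the pipeline above might look like this sketch; 5.1 lacks -UseQuotes, so the quotes that ConvertTo-Csv adds are stripped manually afterwards (only safe if the data itself never contains double quotes), and -ExcludeProperty needs an accompanying -Property *:
Import-Csv .\Test.txt -Delimiter '|' -Header @(1..12) |
    Select-Object -Property * -ExcludeProperty $Remove |
    ConvertTo-Csv -Delimiter '|' -NoTypeInformation |
    Select-Object -Skip 1 |
    ForEach-Object { $_ -replace '"' } |
    Set-Content -Path .\testoutput.txt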

Scanning log file using ForEach-Object and replacing text is taking a very long time

I have a PowerShell script that scans log files and replaces text when a match is found. The list is currently 500 lines, and I plan to double/triple this. The log files can range from 400KB to 800MB in size.
Currently, when using the below, a 42MB file takes 29 mins, and I'm looking for help if anyone can see any way to make this faster.
I tried replacing ForEach-Object with ForEach-ObjectFast, but it caused the script to take significantly longer. I also tried changing the first ForEach-Object to a for loop, but it still took ~29 mins.
$lookupTable = @{
    'aaa:bbb:123' = 'WORDA:WORDB:NUMBER1'
    'bbb:ccc:456' = 'WORDB:WORDBC:NUMBER456'
}
Get-Content -Path $inputfile | ForEach-Object {
    $line = $_
    $lookupTable.GetEnumerator() | ForEach-Object {
        if ($line -match $_.Key)
        {
            $line = $line -replace $_.Key, $_.Value
        }
    }
    $line
} | Set-Content -Path $outputfile
Since you say your input file could be 800MB in size, reading and updating the entire content in memory might not fit.
The way to go then is to use a fast line-by-line method, and the fastest I know of is the switch statement:
# hardcoded here for demo purposes.
# In real life you get/construct these from the Get-ChildItem
# cmdlet you use to iterate the log files in the root folder.
$inputfile  = 'D:\Test\test.txt'
$outputfile = 'D:\Test\test_new.txt' # absolute full file path because we use .Net here

# because we are going to Append to the output file, make sure it doesn't exist yet
if (Test-Path -Path $outputfile -PathType Leaf) { Remove-Item -Path $outputfile -Force }

$lookupTable = @{
    'aaa:bbb:123' = 'WORDA:WORDB:NUMBER1'
}

# create a regex string from the Keys of your lookup table,
# merging the strings with a pipe symbol (the regex 'OR').
# your Keys could contain characters that have special meaning in regex, so we need to escape those
$regexLookup = '({0})' -f (($lookupTable.Keys | ForEach-Object { [regex]::Escape($_) }) -join '|')

# create a StreamWriter object to write the lines to the new output file
# Note: use an ABSOLUTE full file path for this
$streamWriter = [System.IO.StreamWriter]::new($outputfile, $true) # $true for Append

switch -Regex -File $inputfile {
    $regexLookup {
        # do the replacement using the value in the lookup table.
        # because in one line there may be multiple matches to replace
        # get a System.Text.RegularExpressions.Match object to loop through all matches
        $line = $_
        $match = [regex]::Match($line, $regexLookup)
        while ($match.Success) {
            # because we escaped the keys, to find the correct entry we now need to unescape
            $line = $line -replace $match.Value, $lookupTable[[regex]::Unescape($match.Value)]
            $match = $match.NextMatch()
        }
        $streamWriter.WriteLine($line)
    }
    default { $streamWriter.WriteLine($_) } # write unchanged
}

# dispose of the StreamWriter object
$streamWriter.Dispose()

How to add a counter into PowerShell's ForEach-Object function

So I have a pipeline that will search a file for a specific string and, if found, replace it with a masked value. I am trying to have a counter for all of the times the oldValue is replaced with the newValue. It doesn't necessarily need to be a one-liner, just curious how you guys would go about this. TIA!
Get-Content -Path $filePath |
    ForEach-Object {
        $_ -replace "$oldValue", "$newValue"
    } |
    Set-Content $filePath
I suggest:
Reading the entire input file as a single string with Get-Content's -Raw switch.
Using -replace / [regex]::Replace() with a script block to determine the substitution text, which allows you to increment a counter variable every time a replacement is made.
Note: Since you're replacing the input file with the results, be sure to make a backup copy first, to be safe.
In PowerShell (Core) 7+, the -replace operator now directly accepts a script block that allows you to determine the substitution text dynamically:
$count = 0
(Get-Content -Raw $filePath) -replace $oldValue, { $newValue; ++$count } |
    Set-Content -NoNewLine $filePath
$count now contains the number of replacements, across all lines (including multiple matches on the same line), that were performed.
In Windows PowerShell, direct use of the underlying .NET API, [regex]::Replace(), is required:
$count = 0
[regex]::Replace(
    '' + (Get-Content -Raw $filePath),
    $oldValue,
    { $newValue; ++(Get-Variable count).Value }
) | Set-Content -NoNewLine $filePath
Note:
'' + ensures that the call succeeds even if file $filePath has no content at all; without it, [regex]::Replace() would complain about the argument being null.
++(Get-Variable count).Value must be used in order to increment the $count variable in the caller's scope (Get-Variable can retrieve variables defined in ancestral scopes; -Scope 1 is implied here, thanks to PowerShell's dynamic scoping). Unlike with -replace in PowerShell 7+, the script block runs in a child scope.
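A quick, throwaway way to verify that scoping behavior:
$count = 0
[regex]::Replace('aaa', 'a', { 'b'; ++(Get-Variable count).Value })   # -> 'bbb'
$count   # -> 3, because Get-Variable reached the caller's $count; a plain ++$count would only have incremented a child-scope copy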
As an aside:
For this use case, the only reason a script block is used is so that the counter variable can be incremented - the substitution text itself is static. See this answer for an example where the substitution text truly needs to be determined dynamically, by deriving it from the match at hand, as passed to the script block.
Changing my answer due to more clarifications in comments. The best way I can think of is to get the count of $oldValue matches ahead of time. Then replace!
$content = Get-Content -Path $filePath
$toBeReplaced = Select-String -InputObject $content -Pattern $oldValue -AllMatches
$replacedTotal = $toBeReplaced.Matches.Count
$content | ForEach-Object {$_ -replace "$oldValue", "$newValue"} | Set-Content $filePath

Delete lines from multiple textfiles in PowerShell

I am trying to delete lines with a defined content from multiple text files.
It works at its core, but it will rewrite every file even if no changes are made, which is not cool if you are just modifying 50 out of about 3000 logon scripts.
I even made an if statement, but it seems like it doesn't work.
Alright this is what I already have:
# Here $varFind will be escaped from potential RegEx triggers.
$varFindEscaped = [regex]::Escape($varFind)

# Here the deletion happens.
foreach ($file in Get-ChildItem $varPath*$varEnding) {
    $contentBefore = Get-Content $file
    $contentAfter = Get-Content $file | Where-Object {$_ -notmatch $varFindEscaped}
    if ($contentBefore -ne $contentAfter) {Set-Content $file $contentAfter}
}
What the variables mean:
$varPath is the path in which the logonscripts are.
$varEnding is the file ending of the files to modify.
$varFind is the string that triggers the deletion of the line.
Any help would be highly appreciated.
Greetings
Löwä Cent
You have to read the file regardless but some improvement on your change condition could help.
# Here the deletion happens.
foreach ($file in Get-ChildItem $varPath*$varEnding) {
    $data = (Get-Content $file)
    If($data -match $varFindEscaped){
        $data | Where-Object {$_ -notmatch $varFindEscaped} | Set-Content $file
    }
}
Read the file into $data. Check to see if the pattern $varFindEscaped is present in the file. If it is, then filter out the lines matching that pattern and write the result back. Otherwise we move on to the next file.