Powershell Script to replace accented letters in multiple files not working - powershell

I'm trying to replace ALL accented letters and some strings in multiple files located in one folder. The string replacement works, but the accented letters are not replaced.
I have multiple files located in "C:\FilePath".
I've created a Batch file with the following code:
@echo off
Powershell.exe -executionpolicy remotesigned -File C:\Users\User\Desktop\IFCParser.ps1
pause
And IFCParser.ps1 contains all the following lines, one after the other:
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName | Select-String -Pattern 'IFCBuilding') {(Get-Content $_ | ForEach {$_ -replace 'IFCBuilding', 'IFCBuildingElementProxy'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName | Select-String -Pattern 'IFCAnotherWord') {(Get-Content $_ | ForEach {$_ -replace 'IFCAnotherWord', 'IFCBuildingElementProxy'}) | Set-Content $_ }}
The above code DOES the job when I run the bat file, but I can't get the following part to work:
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'á' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'á', 'a'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'é' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'é', 'e'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'í' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'í', 'i'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'ó' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'ó', 'o'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'ú' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'ú', 'u'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'Á' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'Á', 'A'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'É' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'É', 'E'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'Í' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'Í', 'I'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'Ó' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'Ó', 'O'}) | Set-Content $_ }}
Get-ChildItem -Path C:\FilePath\*.* -recurse | ForEach {If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'Ú' -AllMatches) {(Get-Content $_ -Encoding UTF8 | ForEach {$_ -creplace 'Ú', 'U'}) | Set-Content $_ }}
I'm testing this on a file like this:
áéíóúÁÉÍÓÚÑñáéíóúÁ
ÉÍÓÚÑñáéíóúÁÉÍÓÚÑñá
éíóúÁÉÍÓÚÑñáéíóúÁÉÍÓÚÑñáéíó
úÁÉÍÓÚÑñáéíóúÁÉÍÓÚÑñ
And it stays the same, no accents removed.
I think I have something wrong with the encoding. I've run this with the -Encoding parameter on only the first Get-Content, on only the second one, and with no -Encoding at all.
By the way, I'm sure that there are more effective ways of doing this, but I'm just starting with this here and not finding one that works.

As for replacing the contents of the files in your folder, you should be able to do that using just one Get-ChildItem call.
Put this helper function on top of your script; it is used for replacing all the accented letters in the files:
function Replace-Diacritics {
Param(
[Parameter(Mandatory = $true, ValueFromPipeline = $true)]
[string] $Text
)
($Text.Normalize( [Text.NormalizationForm]::FormD ).ToCharArray() |
Where-Object {[Globalization.CharUnicodeInfo]::GetUnicodeCategory($_) -ne
[Globalization.UnicodeCategory]::NonSpacingMark }) -join ''
}
Now the rest of the code simplified:
Get-ChildItem -Path 'C:\FilePath\*.*' -File -Recurse | ForEach-Object {
$content = Get-Content -Path $_.FullName -Raw -Encoding UTF8 | Replace-Diacritics
$content -replace '\b(IFCBuilding|IFCAnotherWord)\b', 'IFCBuildingElementProxy' | Set-Content -Path $_.FullName -Encoding UTF8
}
Using your example file, the new content after calling `Replace-Diacritics` will be:
aeiouAEIOUNnaeiouA
EIOUNnaeiouAEIOUNna
eiouAEIOUNnaeiouAEIOUNnaeio
uAEIOUNnaeiouAEIOUNn
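To see why the helper works, here is a minimal sketch of what FormD normalization does to a single accented letter. The variable names are only for illustration, and the letter is built from its code point (U+00E1) as a precaution so the result does not depend on how the script file itself is saved.

```powershell
# Build 'á' from its code point so the script file's own encoding cannot interfere
$accented   = [string][char]0x00E1
$decomposed = $accented.Normalize([Text.NormalizationForm]::FormD)

# FormD splits the single character into a base letter plus a combining mark
$parts = $decomposed.ToCharArray() | ForEach-Object {
    'U+{0:X4} {1}' -f [int]$_, [Globalization.CharUnicodeInfo]::GetUnicodeCategory($_)
}
$parts   # U+0061 LowercaseLetter, then U+0301 NonSpacingMark

# Dropping the NonSpacingMark characters leaves the plain letter
$plain = -join ($decomposed.ToCharArray() | Where-Object {
    [Globalization.CharUnicodeInfo]::GetUnicodeCategory($_) -ne [Globalization.UnicodeCategory]::NonSpacingMark
})
$plain   # a
```

The same mechanism covers ñ, ü and other letters that decompose into a base letter and a combining mark, which is why no per-letter replace list is needed.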
Operator -replace uses regex. The pattern '\b(IFCBuilding|IFCAnotherWord)\b' means to find the words 'IFCBuilding' OR 'IFCAnotherWord' as whole words (\b is a Word Boundary) and replace these with 'IFCBuildingElementProxy'.
If you also need this to be case-sensitive, use -creplace instead of -replace
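A quick way to check the word boundary and the case-sensitivity difference is to run both operators on a sample string. The sample text below is made up for illustration (IFCBuildingStorey is just an example of a longer word that starts with the same letters).

```powershell
$pattern = '\b(IFCBuilding|IFCAnotherWord)\b'
$sample  = 'IFCBuilding IFCBuildingStorey ifcbuilding IFCAnotherWord'

# -replace ignores case; \b keeps 'IFCBuildingStorey' untouched
$insensitive = $sample -replace $pattern, 'IFCBuildingElementProxy'
$insensitive   # IFCBuildingElementProxy IFCBuildingStorey IFCBuildingElementProxy IFCBuildingElementProxy

# -creplace is case-sensitive, so the lowercase 'ifcbuilding' is left alone
$sensitive = $sample -creplace $pattern, 'IFCBuildingElementProxy'
$sensitive     # IFCBuildingElementProxy IFCBuildingStorey ifcbuilding IFCBuildingElementProxy
```

The word boundary also makes the script safe to run twice: 'IFCBuildingElementProxy' does not match the pattern, so it is not expanded again on a second pass.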
For very large files, Get-Content may not be the cmdlet you'll want to use as it reads the file in memory as a whole.
For those large files, a combination of a StreamReader and a StreamWriter is much more memory efficient (at the cost of more disk read/write actions).
Note that you cannot read a file and write to the same file simultaneously, so the code below will create a new name for the updated file by appending _New to the BaseName.
Again start with this helper function on top
function Replace-Diacritics {
Param(
[Parameter(Mandatory = $true, ValueFromPipeline = $true)]
[string] $Text
)
($Text.Normalize( [Text.NormalizationForm]::FormD ).ToCharArray() |
Where-Object {[Globalization.CharUnicodeInfo]::GetUnicodeCategory($_) -ne
[Globalization.UnicodeCategory]::NonSpacingMark }) -join ''
}
Get-ChildItem -Path 'C:\FilePath\*.*' -File -Recurse | ForEach-Object {
# create a StreamReader to read the file line-by-line
$reader = [System.IO.StreamReader]::new($_.FullName, [System.Text.Encoding]::UTF8)
# older PowerShell versions use:
# $reader = New-Object System.IO.StreamReader($_.FullName, [System.Text.Encoding]::UTF8)
# create a full path and filename for the updated output file
$outFile = Join-Path -Path $_.DirectoryName -ChildPath ('{0}_New{1}' -f $_.BaseName, $_.Extension)
# create a StreamWriter object to write the lines to the new output file
# The StreamWriter class by default writes files with UTF-8 encoding without a Byte-Order Mark (BOM)
$writer = [System.IO.StreamWriter]::new($outFile)
# loop through the lines of the file
while ($null -ne ($line = $reader.ReadLine())) {
if (![string]::IsNullOrWhiteSpace($line)) {
$line = ($line | Replace-Diacritics) -replace '\b(IFCBuilding|IFCAnotherWord)\b', 'IFCBuildingElementProxy'
}
$writer.WriteLine($line)
}
# clean up for next file
$writer.Flush()
$writer.Dispose()
$reader.Dispose()
}
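If you want the updated file to end up under the original name, one option is to swap the files once both streams have been disposed. This is only a sketch: the demo folder and file names are hypothetical stand-ins for $_.FullName and $outFile in the loop above.

```powershell
# Hypothetical demo folder and files, standing in for the original and the _New file
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'stream-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
$original = Join-Path $dir 'model.txt'
$updated  = Join-Path $dir 'model_New.txt'
Set-Content -LiteralPath $original -Value 'IFCBuilding'
Set-Content -LiteralPath $updated  -Value 'IFCBuildingElementProxy'

# Swap only after the reader and the writer have been disposed,
# otherwise the files are still locked
Remove-Item -LiteralPath $original
Move-Item -LiteralPath $updated -Destination $original

$result = Get-Content -LiteralPath $original
$result   # IFCBuildingElementProxy
```

Keeping the _New files until you have checked the results is the safer choice, since removing the original cannot be undone.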

Running a single line of code on a single file like this works as expected:
Get-ChildItem -Path C:\temp\testdata.txt | ForEach-Object {
If (Get-Content $_.FullName -Encoding UTF8 | Select-String 'á' -AllMatches) {
(Get-Content $_ -Encoding UTF8 | ForEach-Object { $_ -creplace 'á', 'a' }) | Set-Content $_ }
}
Given this, your code must be failing in the file recursion or in the execution process.
Run the script in an editor before trying to run as a batch and try adding error trapping. You can also add some logging to track down what's happening when running as batch:
Start-Transcript -Path 'c:\temp\outputlog.txt'
Try {
Get-ChildItem -Path C:\temp\testdata.txt -recurse -ErrorAction Stop | ForEach-Object {
Write-Host "Processing $_"
If (Get-Content $_.FullName -Encoding UTF8 -ErrorAction Stop | Select-String 'á' -AllMatches) {
Write-Host "Found match for á, replacing...."
(Get-Content $_ -Encoding UTF8 -ErrorAction Stop | ForEach-Object { $_ -creplace 'á', 'a' }) | Set-Content $_ -ErrorAction Stop }
}
}
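Catch {
# show the error that stopped the run
$_
}
Finally {
# stop the transcript exactly once, whether or not an error occurred
Stop-Transcript
}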

Related

How to replace text in multiple file in many folder using powershell

I have many folders (folder1, folder2, folder3, ... up to about folder100).
Each folder contains many files (1.html, 2.html, 3.html, 4.html, ... up to about 20.html).
I want to replace some text in all of those HTML files in all folders,
but the text to replace is not the same for every file.
For example, in 1.html I want to replace ./1_files/style.css with style.css, in 2.html I want to replace ./2_files/style.css with style.css, and so on.
So I tried something like this, and it works well:
Get-ChildItem "*\1.html" -Recurse | ForEach-Object -Process {
(Get-Content $_) -Replace './1_files/style.css', 'style.css' | Set-Content $_
}
Get-ChildItem "*\2.html" -Recurse | ForEach-Object -Process {
(Get-Content $_) -Replace './2_files/style.css', 'style.css' | Set-Content $_
}
Get-ChildItem "*\3.html" -Recurse | ForEach-Object -Process {
(Get-Content $_) -Replace './3_files/style.css', 'style.css' | Set-Content $_
}
Get-ChildItem "*\4.html" -Recurse | ForEach-Object -Process {
(Get-Content $_) -Replace './4_files/style.css', 'style.css' | Set-Content $_
}
But I would have to repeat that code many times, for "*\4.html", "*\5.html", "*\6.html", ...
I tried this, but it does not work:
Do {
$val++
Write-Host $val
$Fn = "$val.html"
Get-ChildItem "*\$Fn" -Recurse | ForEach-Object -Process {
(Get-Content $_) -Replace './$val_files/style.css', 'style.css' |
Set-Content $_
}
} while($val -ne 100)
Please show me the correct way to do this replacement in a loop.
Thank you.
Assuming all your subfolders can be found inside one source folder path, you can use the code below to do the replacement in all those files:
# the path where all subfolders and html files can be found
$sourcePath = 'X:\Wherever\Your\Subfolders\Are\That\Contain\The\Html\Files'
Get-ChildItem -Path $sourcePath -Filter '*.html' -Recurse -File |
# filter on html files that have a numeric basename
Where-Object {$_.BaseName -match '(\d+)'} | ForEach-Object {
# construct the string to replace and escape the regex special characters
$replace = [regex]::Escape(('./{0}_files/style.css' -f $matches[1]))
# get the content as one single multiline string so -replace works faster
(Get-Content -Path $_.FullName -Raw) -replace $replace, 'style.css' |
Set-Content -Path $_.FullName
}
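As for why the Do loop in the question found nothing to replace: the search string was in single quotes, so $val was never expanded, and even in double quotes $val_files would be read as one variable name. A small sketch of the difference, assuming strict mode is off (the default) so that an unset variable expands to an empty string:

```powershell
$val = 4

$single = './$val_files/style.css'            # single quotes: no expansion at all
$greedy = "./$val_files/style.css"            # expands the (unset) variable $val_files
$braced = "./${val}_files/style.css"          # braces end the variable name after 'val'
$format = './{0}_files/style.css' -f $val     # the -f approach used in the answer above

$single   # ./$val_files/style.css
$greedy   # .//style.css
$braced   # ./4_files/style.css
$format   # ./4_files/style.css

# -replace takes a regex, so the dots should be escaped before use
$escaped = [regex]::Escape($braced)
$escaped  # \./4_files/style\.css
$html = '<link href="./4_files/style.css">' -replace $escaped, 'style.css'
$html     # <link href="style.css">
```

Without the escaping, each dot in the search string would match any character, which can produce unintended replacements.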

Combine file IO for -replace and Set-Content in PowerShell

I have the following script:
$allFiles = Get-ChildItem "./" -Recurse | Where { ($_.Extension -eq ".ts")}
foreach($file in $allFiles)
{
# Find and replace the dash cased the contents of the files
(Get-Content $file.PSPath) |
Foreach-Object {$_ -replace "my-project-name", '$appNameDashCased$'} |
Set-Content $file.PSPath
# Find and replace the dash cased the contents of the files
(Get-Content $file.PSPath) |
Foreach-Object {$_ -replace "MyProjectName", '$appNameCamelCased$'} |
Set-Content $file.PSPath
# Find and replace the dash cased the contents of the files
(Get-Content $file.PSPath) |
Foreach-Object {$_ -replace "myProjectName", '$appNamePascalCased$'} |
Set-Content $file.PSPath
}
It takes a file and does some replacing, then saves the file. Then it takes the same file and does some more replacing then saves the file again. Then it does it one more time.
This works, but seems inefficient.
Is there a way to do all the replacing and then save the file once?
(If possible, I would prefer to keep the readable style of PowerShell.)
Sure, just chain your replaces inside the ForEach-Object block:
$allFiles = Get-ChildItem "./" -Recurse | Where { ($_.Extension -eq ".ts")}
foreach($file in $allFiles)
{
(Get-Content $file.PSPath) |
Foreach-Object {
# Find and replace the dash cased the contents of the files
$_ -replace "my-project-name", '$appNameDashCased$' `
-replace "MyProjectName", '$appNameCamelCased$' `
-replace "myProjectName", '$appNamePascalCased$'
} |
Set-Content $file.PSPath
}
This can be done, and is actually far simpler than what you're doing. You can chain the -replace operator like this:
$allFiles = Get-ChildItem "./" -Recurse | Where { ($_.Extension -eq ".ts")}
foreach($file in $allFiles)
{
# Find and replace the dash cased the contents of the files
(Get-Content $file.PSPath) -replace "my-project-name", '$appNameDashCased$' -replace "StringB", '$SecondReplacement$' -replace "StringC", '$ThirdReplacement$' | Set-Content $file.PSPath
}
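One thing to watch when chaining: -replace is case-insensitive, so a pattern like "MyProjectName" also matches "myProjectName", and the later replacement in the chain never gets a chance. A small sketch with a made-up sample line shows the difference when -creplace is used instead:

```powershell
$line = 'MyProjectName and myProjectName'

# Case-insensitive: the first replace consumes both spellings
$insensitive = $line -replace 'MyProjectName', '$appNameCamelCased$' `
                     -replace 'myProjectName', '$appNamePascalCased$'
$insensitive   # $appNameCamelCased$ and $appNameCamelCased$

# Case-sensitive: each spelling gets its own token
$sensitive = $line -creplace 'MyProjectName', '$appNameCamelCased$' `
                   -creplace 'myProjectName', '$appNamePascalCased$'
$sensitive     # $appNameCamelCased$ and $appNamePascalCased$
```

If your files contain both spellings, -creplace is probably what you want in the chained version.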

Search and replace with PowerShell

I'm using the below PowerShell script to search and replace, which works fine.
$files = Get-ChildItem 'E:\replacetest' -Include "*.txt" -Recurse | ? {Test-Path $_.FullName -PathType Leaf}
foreach($file in $files)
{
$content = Get-Content $file.FullName | Out-String
$content| Foreach-Object{$_ -replace 'hello' , 'hellonew'`
-replace 'hola' , 'hellonew' } | Out-File $file.FullName -Encoding utf8
}
The issue is that the script also modifies files that do not contain the matching text. How can we ignore the files that do not have the matching text?
You can use -match to see whether the content will actually change. Because you always wrote the result with Out-File, every file was modified.
$files = Get-ChildItem 'E:\replacetest' -Include "*.txt" -Recurse | Where-Object {Test-Path $_.FullName -PathType Leaf}
foreach( $file in $files ) {
$content = Get-Content $file.FullName | Out-String
if ( $content -match 'hello|hola' ) {
$content -replace 'hello' , 'hellonew' `
-replace 'hola' , 'hellonew' | Out-File $file.FullName -Encoding utf8
Write-Host "Replaced text in file $($file.FullName)"
}
}
You've got an extra foreach and you need an if statement:
$files = Get-ChildItem 'E:\replacetest' -Include "*.txt" -Recurse | ? {Test-Path $_.FullName -PathType Leaf}
foreach($file in $files)
{
$content = Get-Content $file.FullName | Out-String
if ($content -match 'hello' -or $content -match 'hola') {
$content -replace 'hello' , 'hellonew'`
-replace 'hola' , 'hellonew' | Out-File $file.FullName -Encoding utf8
}
}
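Another way to get the same effect is to do the replacement first and only write when the result differs from what was read. This is just a sketch; Update-FileText is a hypothetical helper name, and it assumes a PowerShell version that supports -Raw and -NoNewline.

```powershell
function Update-FileText {
    param(
        [Parameter(Mandatory = $true)]
        [string] $Path
    )

    # The [string] cast guards against an empty file yielding no output
    $content = [string](Get-Content -LiteralPath $Path -Raw)
    $updated = $content -replace 'hello|hola', 'hellonew'

    # Write only when the replacement actually changed something
    if ($updated -cne $content) {
        Set-Content -LiteralPath $Path -Value $updated -Encoding utf8 -NoNewline
        return $true
    }
    return $false
}
```

You would call it inside the existing loop, for example `if (Update-FileText -Path $file.FullName) { Write-Host "Replaced text in $($file.FullName)" }`. Files without a match keep their original timestamp because they are never written.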

Showing information after replacement

I use the code below to change strings in files:
Set-Location -Path C:\Users\Documents\corporate
foreach ($file in get-ChildItem *.rdl)
{
$_.Replace("Protection", "Converters") | Set-Content $file
$_.Replace("Drives", "Automation") | Set-Content $file
$_.Replace("MACHINES", "Generators") | Set-Content $file
$file.name
}
I want to add information about what has changed in individual files.
For example:
file 1 Protection
file 3 Protection, MACHINES
Try it this way:
Get-ChildItem -Path "C:\Users\Documents\corporate" -Filter "*.rdl" | ForEach-Object {
$Local:CurrentFileFullName = $_.FullName
((Get-Content -Path $CurrentFileFullName ) -replace "Protection", "Converters" -replace "Drives", "Automation" -replace "MACHINES", "Generators" | Set-Content $CurrentFileFullName -Force)
}
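To also report which strings were found in each file, as the question asks, something along these lines might work. It is a sketch under a few assumptions: Update-RdlFiles is a hypothetical function name, the matching is case-sensitive like the .Replace() calls in the question, and -Raw and -NoNewline are available in your PowerShell version.

```powershell
function Update-RdlFiles {
    param(
        [Parameter(Mandatory = $true)]
        [string] $Folder
    )

    # Search strings and their replacements, taken from the question
    $replacements = [ordered]@{
        'Protection' = 'Converters'
        'Drives'     = 'Automation'
        'MACHINES'   = 'Generators'
    }

    Get-ChildItem -Path $Folder -Filter '*.rdl' -File | ForEach-Object {
        $content = [string](Get-Content -LiteralPath $_.FullName -Raw)

        # Collect the search strings that occur in this file
        $found = @($replacements.Keys | Where-Object { $content.Contains($_) })

        if ($found.Count -gt 0) {
            foreach ($key in $found) {
                $content = $content.Replace($key, $replacements[$key])
            }
            Set-Content -LiteralPath $_.FullName -Value $content -NoNewline

            # Output the file name followed by the strings that were replaced
            '{0} {1}' -f $_.Name, ($found -join ', ')
        }
    }
}
```

Calling `Update-RdlFiles -Folder 'C:\Users\Documents\corporate'` then outputs one line per changed file, such as `file3.rdl Protection, MACHINES`, and skips files where nothing was found.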

PowerShell get-child item and get-content a few folders deep

My script below only works one folder deep, using "$_" in the path in front of the file name:
get-childitem E:\WebSystems\Configs\ | Foreach-Object {get-content E:\WebSystems\Configs\$_\Web.config} | foreach-object {$_ -replace "Web1", "Web2"} | set-content E:\WebSystems\Configs\$_\Web.config}
How about two folders deep? ex: E:\WebSystems\Configs\Folder1\Folder2\Web.config
The following script doesn't work.
get-childitem E:\WebSystems\Configs\ | Foreach-Object {get-content E:\WebSystems\Configs\$_\$_\Web.config} | foreach-object {$_ -replace "Web1", "Web2"} | set-content E:\WebSystems\Configs\$_\$_\Web.config}
This should work:
get-childitem -recurse -include Web.Config | foreach-object { (get-content $_.FullName) -replace "Web1", "Web2" | set-content $_.FullName }
I implore you to test this on an isolated temporary directory. You might want to try this first, which writes the result to a separate file next to each original:
get-childitem -recurse -include Web.Config | foreach-object { (get-content $_.FullName) -replace "Web1", "Web2" | set-content "$($_.FullName).modified" }