Replacing/inserting newlines using PowerShell

(I have read the other threads with similar names...)
I'm new to PowerShell. I am trying to understand how to find and replace newlines. For example, find double newlines and replace them with a single or vice versa.
I have a test document that was created using Notepad:
The quick brown fox jumped over the lazy dog
The quick brown fox jumped over the lazy dog
The quick brown fox jumped over the lazy dog
The quick brown fox jumped over the lazy dog
I am working in the PowerShell ISE for testing/learning.
When I run the following command (attempting to replace one newline with two):
((Get-Content -path $filename -raw) -replace '`n','`n`n') | Set-Content -path $filename
Get-Content -path $filename -raw
The output is unchanged. So I tried the following and it remained unchanged.
((Get-Content -path $filename -raw) -replace '`r`n','`r`n`r`n') | Set-Content -path $filename
So, knowing that PowerShell uses a back-tick rather than a backslash, but out of frustration, I tried the following command:
((Get-Content -path $filename -raw) -replace '\n','\n\n') | Set-Content -path $filename
And, surprisingly (to me), all of the newlines were replaced, but with the string literal '\n\n'. So it seems searching for a newline worked with a backslash but not with a back-tick. The replacement, unfortunately, was the literal string rather than the CRLF I need.
I'm stumped. But for what it's worth, I also tried the following and the string literal was again used for the replacement (i.e., in place of newlines, the document contained '`r`n').
((Get-Content -path $filename -raw) -replace '\n','`r`n') | Set-Content -path $filename
I have seen many posts where people were mistakenly using a backslash, but in my case it seems like a backslash is required for the search, and I don't understand what is required to replace a newline.
Thanks!

'`n' just matches the literal characters [backtick][n], which isn't what you want. You want to interpret those values. For that, you'll need to use double quotes i.e., '`n' should be "`n". According to Microsoft...
The special characters in PowerShell begin with the backtick
character, also known as the grave accent (ASCII 96). ... These
characters are case-sensitive. The escape character is only
interpreted when used within double quoted (") strings.
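
A minimal sketch of the difference, using only in-memory strings (the sample text is made up for illustration). It also shows why '\n' worked as a search pattern in the question: the left operand of -replace is a regex, while the replacement is not, so the replacement needs a real newline character from a double-quoted string.

```powershell
# Single quotes keep the backtick literal; double quotes turn `n into a line feed.
$single = '`n'
$double = "`n"
$single.Length   # 2: a backtick followed by the letter n
$double.Length   # 1: a single LF character

# The search pattern is a regex, so '\r?\n' matches LF or CRLF.
# The replacement is plain text, so it must already contain real CR/LF characters.
$text    = "line one`r`nline two"
$doubled = $text -replace '\r?\n', "`r`n`r`n"
$doubled -eq "line one`r`n`r`nline two"   # True
```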

Use double quotes. You probably also want the -nonewline option to set-content, so that another `r`n doesn't get put at the end of the file.
PS> '`n'
`n
PS> "`n"
PS> (Get-Content -path $filename -raw) -replace "`n","`n`n" |
Set-Content -path $filename -nonewline

There are several ways of doing this. Remember that on Windows machines a newline is a combination of two characters: CR ('\r', ASCII value 13) and LF ('\n', ASCII value 10).
The first way is to read the file as a single string and perform a regex -replace on it:
$filename = 'D:\test.txt'
# replace single newlines by a double newline
$replaceWith = '{0}{0}' -f [Environment]::NewLine
(Get-Content -Path $filename -Raw) -replace '\r?\n', $replaceWith | Set-Content -Path 'D:\test-to-double.txt' -Force
# replace double newlines by a single newline
$replaceWith = [Environment]::NewLine
(Get-Content -Path $filename -Raw) -replace '(\r?\n){2}', $replaceWith | Set-Content -Path 'D:\test-to-single.txt' -Force
Another way is to read in the file as string array (let PowerShell deal with single newlines):
# read the file as string array and join the elements with a double newline
$replaceWith = '{0}{0}' -f [Environment]::NewLine
(Get-Content -Path $filename) -join $replaceWith | Set-Content -Path 'D:\test-to-double.txt' -Force
# read the file as string array and join the elements with a single newline
$replaceWith = [Environment]::NewLine
(Get-Content -Path $filename) -join $replaceWith | Set-Content -Path 'D:\test-to-single.txt' -Force
The latter method is also well suited to removing empty or whitespace-only lines before you 'normalize' the newlines in the text.
In that case, just replace (Get-Content -Path $filename) with (Get-Content -Path $filename | Where-Object { $_ -match '\S' })
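
As a rough illustration of that filter-then-join approach, here is a sketch that works on a hypothetical in-memory array instead of a file (the sample lines are invented; with a real file they would come from Get-Content without -Raw):

```powershell
# Hypothetical sample lines, standing in for the output of Get-Content without -Raw.
$lines = 'first', '', '   ', 'second', 'third'

# Keep only lines that contain at least one non-whitespace character.
$nonBlank = @($lines | Where-Object { $_ -match '\S' })

# Join the surviving lines with a double newline.
$replaceWith = '{0}{0}' -f [Environment]::NewLine
$result = $nonBlank -join $replaceWith
$result
```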

Related

powershell -replace only write file on changes

I'm doing some multiple regEX replacements in powershell on a large number of files and would like to only write the file if any replacements were actually made.
For example if I do:
($_ | Get-Content -Raw) -Replace 'MAKEUPS', 'Makeup' -Replace '_MAKEUP', 'Makeup' -Replace 'Make up', 'Makeup' -Replace 'Make-up', 'Makeup' -Replace '"SELF:/', '"' |
Out-File $_.FullName -encoding ASCII
I only want to write the file if it found anything to replace. Is this possible, maybe with a count or boolean operation?
I did think maybe to check the length of the string before and after but was hoping for a more elegant solution, so I thought I'd ask the experts!
You can take advantage of the fact that PowerShell's -replace operator passes the input string through as-is if no replacements were performed:
# <# some Get-ChildItem command #> ... | ForEach-Object {
# Read the input file in full, as a single string.
$originalContent = $_ | Get-Content -Raw
# *Potentially* perform replacements, depending on whether the search patterns are found.
$potentiallyModifiedContent =
$originalContent -replace 'MAKEUPS', 'Makeup' -replace '_MAKEUP', 'Makeup' -replace 'Make up', 'Makeup' -replace 'Make-up', 'Makeup' -replace '"SELF:/', '"'
# Save, but only if modifications were made.
if (-not [object]::ReferenceEquals($originalContent, $potentiallyModifiedContent)) {
Set-Content -NoNewLine -Encoding Ascii -LiteralPath $_.FullName -Value $potentiallyModifiedContent
}
# }
[object]::ReferenceEquals() tests for reference equality, i.e. whether the two strings represent the exact same string instance, which makes the comparison very efficient (no need to look at the content of the strings).
Set-Content rather than Out-File is used to write the output file, which is preferable for performance reasons with input that is made up of strings already.
-NoNewLine is needed to prevent a trailing newline from getting appended to the output file.
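
The pass-through behavior described above can be checked with a small sketch on in-memory strings (the sample strings are hypothetical; the behavior relied on is the one stated in this answer):

```powershell
$original = 'nothing to change here'

# No match: per the answer above, -replace passes the input string through as-is.
$untouched = $original -replace 'MAKEUPS', 'Makeup'
$sameInstance = [object]::ReferenceEquals($original, $untouched)

# Match: a new string has to be built, so the references differ.
$source  = 'Make-up tips'
$changed = $source -replace 'Make-up', 'Makeup'
$differentInstance = -not [object]::ReferenceEquals($source, $changed)

$sameInstance
$differentInstance
```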
You could use the script block replacement feature added in PowerShell 6 to set a variable when a replacement takes place, and then return the replacement string.
$replaced = $false
$content = (Get-content -raw $file) -replace "(make-up|makeups|make up|...)", {
# $replaced = $true
Set-Variable replaced $true -Scope 1
return "Makeup"
} -replace '"SELF:/', {
Set-Variable replaced $true -Scope 1
# $replaced = $true
return '"'
}
If ($replaced){
Set-content -path $file -value $content
}
In older versions of PowerShell, you can compare the content lengths first and, only if they are equal, compare the strings themselves (case-sensitively, with -cne). I wouldn't run a separate match beforehand to see whether a replacement is needed; that would be a lot more expensive.
$original = (Get-content -raw $file)
$content = ($original) -replace "(make-up|makeups|make up|...)", "Makeup"
If (($original.length -ne $content.length) -or ($original -cne $content)) {
Set-content ...
}

How to read every csv file in the folder and remove quote character from the csv file?

How can I read every CSV file in a specific folder? When the script below is executed, it only removes the quote characters from one CSV file.
$file="C:\test\IV-1-2020-04-02.csv"
(GC $file) | % {$_ -replace '"', ''} > $file
Get-ChildItem -Path C:\test\ -Filter '*.csv'
This only removes the quote characters from "IV-1-2020-04-02.csv". What if I have files with different names?
You can iterate each .csv file from Get-ChildItem and replace the quotes " with '' using Set-Content.
$files = Get-ChildItem -Path "YOUR_FOLDER_PATH" -Filter *.csv
foreach ($file in $files)
{
Set-Content -Path $file.FullName -NoNewline -Value ((Get-Content -Path $file.FullName -Raw) -replace '"', '')
}
Make sure to pass your folder path to -Path, which tells Get-ChildItem to fetch every file from this folder.
It's also faster to use the -Raw switch for Get-Content, since it reads the file into one string and preserves newlines. If you omit this switch, Get-Content will by default split the content on newlines into an array of strings.
If you want to read files in deeper sub directories as well, then add the -Recurse switch to Get-ChildItem:
$files = Get-ChildItem -Path "YOUR_FOLDER_PATH" -Filter *.csv -Recurse
Additionally, you could also use ForEach-Object here:
Get-ChildItem -Path "YOUR_FOLDER_PATH" -Filter *.csv -Recurse | ForEach-Object {
Set-Content -Path $_.FullName -NoNewline -Value ((Get-Content -Path $_.FullName -Raw) -replace '"', '')
}
Furthermore, you could replace ForEach-Object with its alias %. However, if you're using VSCode and have PSScriptAnalyzer enabled, you may get this warning:
'%' is an alias of 'ForEach-Object'. Alias can introduce possible problems and make scripts hard to maintain. Please consider changing alias to its full content.
This warns against using aliases, for maintainability reasons. It's much safer and more portable to use the full version. I only use the aliases for quick command line usage, but when writing scripts I use the full versions.
Note: The above solutions could potentially corrupt the CSV if some lines need quoting. This solution simply goes through the whole file and replaces every quote with ''. PowerShell 7 offers a -UseQuotes AsNeeded option for Export-Csv, so you may look into that instead.
Don't just replace every " unless you are very certain that it's a good idea. Otherwise, remove the quotes only where it can't matter, i.e. when the field doesn't contain a comma, a double quote, or a line break. (see RFC-4180 section 2, #6 and #7)
As with any script that overwrites its working files, make sure you have backups of those files should you want an undo option later on...
$tog = $true
$sep = ':_:'
$header=@()
filter asString{
$obj=$_
if($tog){
$header=(gm -InputObject $obj -Type NoteProperty).Name
$hc = $header.Count-1
$tog=$false
$str = $header -join $sep
$str = "$sep$str" -replace '"','""'
$str = $str -replace "$sep(((?!$sep)[\s\S])*(,|""|\n)((?!$sep)[\s\S])*)",($sep+'"$1"')
($str -replace $sep,',').Substring(1)
}
$str = (0..$hc | %{$obj.($header[$_])}) -join $sep
$str = "$sep$str" -replace '"','""'
$str = $str -replace "$sep(((?!$sep)[\s\S])*(,|""|\n)((?!$sep)[\s\S])*)",($sep+'"$1"')
($str -replace $sep,',').Substring(1)
}
ls *.csv | %{$tog=$true;import-csv $_ | asString | sc "$_.new";$_.FullName} | %{if(test-path "$_.new"){mv "$_.new" $_ -force}}
Note: the CSV files are expected to contain their own headers. You could work around that if you needed to with the use of the -Header option of Import-Csv
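
The quoting rule from RFC 4180 referred to above can be sketched for a single field as follows. Format-CsvField is a hypothetical helper name introduced only for this illustration; it is not part of PowerShell or of the script above.

```powershell
# Hypothetical helper: quote a field only if it contains a comma,
# a double quote, or a line break (RFC 4180 section 2, rules 6 and 7).
function Format-CsvField {
    param([string]$Value)
    if ($Value -match '[,"\r\n]') {
        # Embedded double quotes are doubled, then the field is wrapped in quotes.
        return '"' + ($Value -replace '"', '""') + '"'
    }
    return $Value
}

Format-CsvField 'plain'       # plain
Format-CsvField 'a,b'         # "a,b"
Format-CsvField 'say "hi"'    # "say ""hi"""
```

Fields that pass through unquoted here are exactly the ones where stripping the quotes is harmless.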

Replace new lines in CLXML file

After generating a CLXML file:
[string]$myString = 'foobar' | Export-Clixml -Path C:\Files\test.clxml
I'm trying to remove line breaks after the right close anchor >. I have tried:
(Get-Content C:\Files\test.clxml) -replace "`n", "" | Set-Content C:\Files\test.clxml
Also tried using -replace r but this strips out r characters from the file.
What am I doing wrong?
Get-Content returns an array holding each single line (not containing any line feeds).
Set-Content writes your array of lines to a single file separating them with line feeds.
Meaning you should do the following to get what you want:
(Get-Content C:\Files\test.clxml) -join "" | Set-Content C:\Files\test.clxml
Your issue is that in your test there are no newlines to replace. Get-Content is returning a string array that is seen as having newlines on screen when rendered. To actually get them inside the string to be manipulated try one of these.
(Get-Content C:\Files\test.clxml -Raw) -replace "`r?`n" | Set-Content C:\Files\test.clxml
(Get-Content C:\Files\test.clxml | Out-String) -replace "`r?`n" | Set-Content C:\Files\test.clxml
The latter would be needed if you have PowerShell version 2.0, which lacks the -Raw switch.
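
To see the difference between the two read modes, here is a small sketch that writes a throwaway file to the temp folder (the file name and its two-line content are made up for this example; -Raw needs PowerShell 3.0 or later):

```powershell
# Write a small two-line sample file to a temporary path (hypothetical content).
$path = Join-Path ([System.IO.Path]::GetTempPath()) ('sample-{0}.txt' -f [guid]::NewGuid())
[System.IO.File]::WriteAllText($path, "<a>`r`n<b>`r`n")

$asLines = @(Get-Content -Path $path)      # array of lines, newlines stripped
$asText  = Get-Content -Path $path -Raw    # one string, newlines kept

$asLines.Count                 # 2
$asText.Contains("`n")         # True: only the raw string still has line feeds
$joined = $asLines -join ''
$joined                        # <a><b>

Remove-Item -Path $path
```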

Search for >$(bla)< and replace

I must replace a value in a file. This works for normal text with this command
(Get-Content $file) | Foreach-Object {$_ -replace "SEARCH", "REPLACE"} | Set-Content $file
But now, the search text is "$(SEARCH)" (without quotes). Escaping the '$' with a backtick ('`$') doesn't work:
(Get-Content $file) | Foreach-Object {$_ -replace "`$(SEARCH)", "BLA"} | Set-Content $file
Any ideas? Thank you.
The -replace operator is actually a regular expression replacement not a simple string replacement, so you've got to escape the regular expression:
(Get-Content $file) | Foreach-Object {$_ -replace '\$\(SEARCH\)', "BLA"} | Set-Content $file
Note that you can suppress string interpolation by using single quotes (') instead of double quotes (") around string literals, which I've done above.
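
If you'd rather not escape the pattern by hand, the .NET method [regex]::Escape can do it for you. A minimal sketch, with a made-up input string:

```powershell
# Let .NET escape the regex metacharacters in the literal search text.
$search  = '$(SEARCH)'
$pattern = [regex]::Escape($search)
$pattern                                   # \$\(SEARCH\)

# Hypothetical input line, used only to show the replacement.
$replaced = 'value=$(SEARCH);' -replace $pattern, 'BLA'
$replaced                                  # value=BLA;
```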
Macintron,
You can try something like below :
(Get-Content $file) | Foreach-Object {$_.Replace('$(SEARCH)', "BLA")} | Set-Content $file
Reading the file in one go with -Raw is slightly (or even considerably) faster:
sc $file ((gc $file -Raw) -replace '\$\(search\)','BLAHH')

Powershell script for going into texts files within a directory and replacing characters

Morning all,
I'm trying to work out how to go into a number of text files within a directory, and replace the characters with the following:
'BS' = '\'
'FS' = '/'
'CO' = ':'
What I managed to get to so far is:
(get-content C:\users\x\desktop\info\*.txt) | foreach-object {$_ -replace "bs", "\"} | set-content C:\Users\x\desktop\info\*.txt
I have got 6 text files, all with a line of text in them, the script above copies the line of text into all the text files. So what I end up with is 6 text files, each with 6 lines of text, I need 6 text files with 1 line of original text.
If that makes sense, does anyone have any pointers on this?
Here is another example that should do the same trick :)
$fileName = Get-ChildItem "C:\users\x\desktop\info\*.txt" -Recurse
$filename | %{
(gc $_.FullName) -replace "BS","\" -replace "FS","/" -replace "CO",":" |Set-Content $_.fullname
}
This should do the trick:
Get-ChildItem C:\users\x\desktop\info\*.txt | ForEach-Object {(Get-Content $_.PSPath) | Foreach-Object {$_ -replace "bs", "\" -replace "fs", "/" -replace "co", ":"} | Set-Content $_.PSPath}
The reason yours wasn't acting as you expected is that Get-Content with a wildcard path reads the contents of all matching files at once. The result is effectively a concatenation of all lines from all files, and that combined text is then written to every file.
You first have to get a list of files, then pipe that into a foreach to get the contents per file, to then replace what you want replaced.
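
A self-contained sketch of that per-file approach, run against a throwaway folder (the folder, file names, and file contents are invented for the demo). One assumption to flag: it uses -creplace, the case-sensitive variant, so that only the upper-case tokens are touched; plain -replace is case-insensitive and would also hit "co" inside ordinary words.

```powershell
# Build a throwaway folder with two hypothetical one-line files.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ('info-{0}' -f [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -LiteralPath (Join-Path $dir 'a.txt') -Value 'xCOBSyBSz'
Set-Content -LiteralPath (Join-Path $dir 'b.txt') -Value 'aFSbFSc'

# Read, transform, and write back each file on its own.
Get-ChildItem -Path $dir -Filter *.txt | ForEach-Object {
    $updated = (Get-Content -LiteralPath $_.FullName) -creplace 'BS', '\' -creplace 'FS', '/' -creplace 'CO', ':'
    Set-Content -LiteralPath $_.FullName -Value $updated
}

$first  = Get-Content -LiteralPath (Join-Path $dir 'a.txt')
$second = Get-Content -LiteralPath (Join-Path $dir 'b.txt')
$first     # x:\y\z
$second    # a/b/c

Remove-Item -LiteralPath $dir -Recurse
```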