Using PowerShell to replace an extended ASCII character in a text file

I need to replace a hex 93 character with a "" string in several CSV files. Below is the code I'm using, but it doesn't work. I think the reason is that the hex value is greater than 7F (decimal 127). I've tried several other methods, to no avail. Any help would be appreciated.
$q1 = [String](0x93 -as [char])
Get-ChildItem ".\*.csv" -Recurse | ForEach {
(Get-Content $_ | ForEach { $_.replace($q1, '""') }) |
Set-Content $_
}
Note: Attached is an image of the Format-Hex dump of my test file. The first character is the one I need to replace:

In Windows PowerShell, the default character encoding when reading from / writing to[1] files is "ANSI", i.e., the legacy 8-bit code page implied by the active system locale.
(By contrast, PowerShell Core defaults to UTF-8.)
For instance, the code page associated with the system locale on a US-English system is 1252, i.e., Windows-1252, where code point 0x93 is the non-ASCII “ (left double quotation mark).
However, once a text file's content has been read into memory, its characters are represented as UTF-16LE code units, i.e., as .NET [string] instances.
As a Unicode character, “ has code point U+201c, expressed as 0x201c in UTF-16LE. By contrast, [char] 0x93 is U+0093, an unrelated C1 control character, which is why the code in the question finds nothing to replace.
Therefore - because in memory all strings are UTF-16LE code units - what you need to replace is [char] 0x201c:
$q1 = [char] 0x201c # “
Get-ChildItem *.csv -Recurse | ForEach-Object {
(Get-Content $_.FullName) -replace $q1, '""' | Set-Content $_.FullName
}
Note that Set-Content too uses the default character encoding, so the rewritten files will use "ANSI" encoding too - use the -Encoding parameter to change the output encoding, if desired.
Also note the (...) around the Get-Content call, which ensures that the input file is read into memory in full up front; this is what makes it possible to write back to the same file in the same pipeline.
While this approach is convenient, note that it bears a slight risk of data loss if writing back to the input file is interrupted before completion.
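Because the default encoding differs between editions, a variant that names the code page explicitly may be more robust. The following is a sketch, assuming the files really are Windows-1252-encoded and that .NET can supply code page 1252 (always the case in Windows PowerShell; in PowerShell (Core) 7+ this relies on the legacy code-page provider being available). The file name quote-demo.csv and its two-byte content are made up for the demonstration.

```powershell
# Sketch: replace “ (byte 0x93 in Windows-1252) without relying on the edition's
# default encoding. Assumption: the file is Windows-1252-encoded.
$enc1252 = [Text.Encoding]::GetEncoding(1252)
$path = Join-Path ([IO.Path]::GetTempPath()) 'quote-demo.csv'   # made-up demo file

# Create a test file: byte 0x93 (the quotation mark in Windows-1252), then 'A'.
[IO.File]::WriteAllBytes($path, [byte[]] (0x93, 0x41))

# Decoding maps byte 0x93 to the UTF-16 code unit 0x201C in memory.
$text = [IO.File]::ReadAllText($path, $enc1252)
$text = $text.Replace([string] [char] 0x201c, '""')

# Re-encode with the same code page; '""' becomes the two bytes 0x22 0x22.
[IO.File]::WriteAllText($path, $text, $enc1252)
$bytes = [IO.File]::ReadAllBytes($path)
Remove-Item $path

[BitConverter]::ToString($bytes)   # -> '22-22-41'
```

Inside the Get-ChildItem / ForEach-Object loop shown above, the same ReadAllText / Replace / WriteAllText calls would be applied to each $_.FullName.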
Converting an "ANSI" code point to a Unicode code point
The following shows how an "ANSI" (8-bit) code point such as 0x93 can be converted to its equivalent UTF-16 code point, 0x201c:
# Convert an array of "ANSI" code points (1 byte each) to the UTF-16
# string they represent.
# Note: In Windows PowerShell, [Text.Encoding]::Default contains
# the "ANSI" encoding set by the system locale.
$str = [Text.Encoding]::Default.GetString([byte[]] 0x93) # -> '“'
# Get the UTF-16 code points of the characters making up the string.
$codePoints = [int[]] [char[]] $str
# Format the first and only code point as a hex. number.
'0x{0:x}' -f $codePoints[0] # -> '0x201c'
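Going the other way, from the in-memory UTF-16 character back to its "ANSI" byte, can be sketched similarly. This assumes Windows-1252 is the "ANSI" code page of interest and names it explicitly instead of using [Text.Encoding]::Default (which is UTF-8 in PowerShell Core); it also confirms that [char] 0x93 and [char] 0x201c are different characters.

```powershell
# Assumption: the "ANSI" code page of interest is Windows-1252.
$enc1252 = [Text.Encoding]::GetEncoding(1252)

$quote = [char] 0x201c                          # the quotation mark as a UTF-16 code unit
$ansiBytes = $enc1252.GetBytes([string] $quote) # encode the one-character string
'0x{0:x}' -f $ansiBytes[0]                      # -> '0x93'

# [char] 0x93 is U+0093, a C1 control character, not the quotation mark.
$sameChar = ([char] 0x93) -eq $quote            # -> False
```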
[1] That is, when writing files with Set-Content; using Out-File / >, by contrast, creates UTF-16LE ("Unicode") files. The cmdlets in Windows PowerShell use a bewildering array of differing default encodings: see this answer. Fortunately, PowerShell Core now consistently defaults to (BOM-less) UTF-8.

Related

Simple way to convert txt file from UTF-8 to ASCII

I am trying to convert just one file from UTF-8 to ASCII. I found the following script online; it creates the output file, but it does not change the encoding to ASCII. Why is this not working?
Get-Content -Path "File/Path/to/file.txt" | Out-File -FilePath "File/Path/to/processed.txt" -Encoding ASCII

tl;dr
-Encoding ASCII does work, though your editor's GUI may still report the resulting file as UTF-8-encoded, for the reasons explained below.
First, a general caveat:
If your input file also contains non-ASCII-range characters, they will be transliterated to verbatim ?, i.e. you'll potentially lose information.
Conversely, if your input files are UTF-8-encoded but do not contain non-ASCII characters, they in effect already are ASCII-encoded files; see below.
ASCII encoding is a subset of UTF-8 encoding (except that ASCII encoding never involves a BOM).
Therefore, any (BOM-less) file composed exclusively of bytes representing ASCII characters is by definition also a valid UTF-8 file.
Modern editors default to BOM-less UTF-8; that is, if a file doesn't start with a BOM, they assume that it is UTF-8-encoded, and that's what their GUIs reflect - even if a given file happens to be composed of ASCII characters only.
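The subset relationship can be checked directly in memory. A minimal sketch with made-up sample text: for ASCII-only text, ASCII and BOM-less UTF-8 produce identical bytes, whereas a non-ASCII character such as é (built from its code point so the script itself stays ASCII-only) becomes 0x3F (a literal ?) in ASCII but a two-byte sequence in UTF-8.

```powershell
$utf8NoBom = New-Object System.Text.UTF8Encoding $false

# ASCII-only text: both encodings yield the same bytes.
$text = 'cafe, plain ASCII'
$asciiHex  = [BitConverter]::ToString([Text.Encoding]::ASCII.GetBytes($text))
$utf8Hex   = [BitConverter]::ToString($utf8NoBom.GetBytes($text))
$identical = $asciiHex -eq $utf8Hex                    # -> True

# Non-ASCII character: the encodings diverge.
$eAcute = [string] [char] 0xe9                         # 'é'
$eAscii = [BitConverter]::ToString([Text.Encoding]::ASCII.GetBytes($eAcute))  # -> '3F'
$eUtf8  = [BitConverter]::ToString($utf8NoBom.GetBytes($eAcute))              # -> 'C3-A9'
```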
To verify that your output file is indeed only composed of ASCII characters, use the following:
# This should return $false; '\P{IsBasicLatin}' matches any NON-ASCII character.
(Get-Content -Raw File/Path/to/processed.txt) -cmatch '\P{IsBasicLatin}'
For an explanation of this test, especially with respect to needing to use -cmatch, the case-sensitive variant of the -match operator, see this answer.
A complete example:
# Write a string that contains non-ASCII characters to a
# file with -Encoding Ascii
# The resulting file will contain 1 line, with content 'caf?'
# That is, the "é" character was "lossily" transliterated to (ASCII) "?"
'café' | Out-File -Encoding Ascii temp.txt
# Examining the file for non-ASCII characters now indicates that
# there are none, i.e., $false is returned.
(Get-Content -Raw temp.txt) -cmatch '\P{IsBasicLatin}'

Replace String in a Binary Clipboard Dump from OneNote

I'm using an AHK script to dump the current clipboard contents (a copy of part of a Microsoft OneNote page) to a file.
I would like to modify this binary file to search for a specific string and be able to import it back into AHK.
I tried the following but it looks like powershell is doing something additional to the file (like changing the encoding) and the import of the file into the clipboard is failing.
$ThisFile = 'B:\Users\Desktop\onenote-new-entry.txt'
$data = Get-Content $ThisFile
$data = $data.Replace('asdf','TESTREPLACE!')
$data | Out-File -encoding utf8 $ThisFile
Any suggestions on doing a string replace to the file without changing existing encoding?
I tried manually modifying in a text editor and it works fine. Obviously though I would like to have the modifications be done in mass and automatically which is why I need a script.
The text copied from OneNote and then dumped to file via AHK looks like this:
However, note that the clipboard dump file contains a lot of other metadata, as shown below when opened in an editor:

Since your file is a mix of binary data and UTF-8 text, you cannot use text processing (as you tried with Out-File -Encoding utf8), because the binary data would invariably be interpreted as text too, resulting in its corruption.
PowerShell offers no simple method for editing binary files, but you can solve your problem via an auxiliary "hex string" representation of the file's bytes:
# To compensate for a difference between Windows PowerShell and PowerShell (Core) 7+
# with respect to how byte processing is requested: -Encoding Byte vs. -AsByteStream
$byteEncParam =
if ($IsCoreCLR) { @{ AsByteStream = $true } }
else { @{ Encoding = 'Byte' } }
# Read the file *as a byte array*.
$ThisFile = 'B:\Users\Desktop\onenote-new-entry.txt'
$data = Get-Content @byteEncParam -ReadCount 0 $ThisFile
# Convert the array to a "hex string" in the form "nn-nn-nn-...",
# where nn represents a two-digit hex representation of each byte,
# e.g. '41-42' for 0x41, 0x42, which, if interpreted as a
# single-byte encoding (ASCII), is 'AB'.
$dataAsHexString = [BitConverter]::ToString($data)
# Define the search and replace strings, and convert them into
# "hex strings" too, using their UTF-8 byte representation.
$search = 'asdf'
$replacement = 'TESTREPLACE!'
$searchAsHexString = [BitConverter]::ToString([Text.Encoding]::UTF8.GetBytes($search))
$replaceAsHexString = [BitConverter]::ToString([Text.Encoding]::UTF8.GetBytes($replacement))
# Perform the replacement.
$dataAsHexString = $dataAsHexString.Replace($searchAsHexString, $replaceAsHexString)
# Convert the modified "hex string" back to a byte[] array.
$modifiedData = [byte[]] ($dataAsHexString -split '-' -replace '^', '0x')
# Save the byte array back to the file.
Set-Content @byteEncParam $ThisFile -Value $modifiedData
Note:
As discussed in the comments, in the case at hand this can only be expected to work if the search and the replacements strings are of the same length, because the file also contains metadata denoting the position and length of the embedded text parts. A replacement string of different length would require adjusting that metadata accordingly.
The string replacement performed is (a) literal, (b) case-sensitive, and (c), for accented characters such as é, only works if the input, like string literals in .NET, uses the composed Unicode normalization form (NFC), where é is a single code point and is encoded as such (resulting in a multi-byte UTF-8 sequence).
More sophisticated replacements, such as regex-based ones, would only be possible if you knew how to split the file data into binary and textual parts, allowing you to operate on the textual parts directly.
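To see the hex-string technique in isolation, here is a sketch on a made-up in-memory byte array that mixes binary bytes (0x00, 0xFF) with the UTF-8 text 'xasdfx'. Per the first note above, it uses a same-length replacement ('qwer' is an arbitrary stand-in). Because each byte occupies exactly "HH" plus a separator, a match can only start on a byte boundary.

```powershell
# Made-up sample: binary bytes around the UTF-8 text 'xasdfx'.
$data = [byte[]] ((0x00, 0xFF) + [Text.Encoding]::UTF8.GetBytes('xasdfx') + 0x00)

# Same-length replacement, so any embedded position/length metadata stays valid.
$search      = 'asdf'
$replacement = 'qwer'

$dataAsHex  = [BitConverter]::ToString($data)     # '00-FF-78-61-73-64-66-78-00'
$searchHex  = [BitConverter]::ToString([Text.Encoding]::UTF8.GetBytes($search))
$replaceHex = [BitConverter]::ToString([Text.Encoding]::UTF8.GetBytes($replacement))

# Literal, case-sensitive replacement on the hex representation.
$modifiedHex = $dataAsHex.Replace($searchHex, $replaceHex)

# Turn 'nn' tokens into '0xnn' strings, which PowerShell converts to bytes.
$modified = [byte[]] ($modifiedHex -split '-' -replace '^', '0x')

[BitConverter]::ToString($modified)   # -> '00-FF-78-71-77-65-72-78-00'
```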
Optional reading: Modifying a UTF-8 file without incidental alterations:
Note:
The following applies to text-only files that are UTF-8-encoded.
Unless extra steps are taken, reading and re-saving such files in PowerShell can result in unwanted incidental changes to the file. Avoiding them is discussed below.
PowerShell never preserves information about the character encoding of an input file, such as one read with Get-Content. Also, unless you use -Raw, information about the specific newline format is lost, as well as whether the file had a trailing newline or not.
Assuming that you know the encoding:
Read the file with Get-Content -Raw and specify the encoding with -Encoding (if necessary). You'll receive the file's content as a single, multi-line .NET string.
Use Set-Content -NoNewLine to save the modified string back to the file, using -Encoding with the original encoding.
Caveat: In Windows PowerShell, -Encoding utf8 invariably creates a UTF-8 file with BOM, unlike in PowerShell (Core) 7+, which defaults to BOM-less UTF-8 and requires you to use -Encoding utf8BOM if you want a BOM.
If you're using Windows PowerShell and do not want a UTF-8 BOM, use $null = New-Item -Force ... as a workaround, and pass the modified string to the -Value parameter.
Therefore:
$ThisFile = 'B:\Users\Desktop\onenote-new-entry.txt'
$data = Get-Content -Raw -Encoding utf8 $ThisFile
$data = $data.Replace('asdf','TESTREPLACE!')
# !! Note the caveat re BOM mentioned above.
$data | Set-Content -NoNewLine -Encoding utf8 $ThisFile
Streamlined reformulation, in a single pipeline:
(Get-Content -Raw -Encoding utf8 $ThisFile) |
ForEach-Object Replace 'asdf', 'TESTREPLACE!' |
Set-Content -NoNewLine -Encoding utf8 $ThisFile
With the New-Item workaround, if the output file mustn't have a BOM:
(Get-Content -Raw -Encoding utf8 $ThisFile) |
ForEach-Object Replace 'asdf', 'TESTREPLACE!' |
New-Item -Force $ThisFile |
Out-Null # suppress New-Item's output (a file-info object)
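Since PowerShell doesn't remember whether the input file had a BOM, one way to sidestep the BOM caveat in both editions is to check for the BOM yourself and use .NET's [IO.File] methods directly. This is a sketch, assuming the file is UTF-8 with or without a BOM; the temp file bom-demo.txt and its content are made up.

```powershell
# Made-up demo file that starts with a UTF-8 BOM (EF BB BF).
$path = Join-Path ([IO.Path]::GetTempPath()) 'bom-demo.txt'
[IO.File]::WriteAllText($path, "line asdf`n", (New-Object System.Text.UTF8Encoding $true))

# Detect the BOM by inspecting the first three bytes.
$bytes  = [IO.File]::ReadAllBytes($path)
$hasBom = $bytes.Count -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF

# Read as one string (a leading BOM is consumed, not returned), replace,
# and write back with an encoding that emits a BOM only if the original had one.
$text = [IO.File]::ReadAllText($path, [Text.Encoding]::UTF8)
$text = $text.Replace('asdf', 'TESTREPLACE!')
[IO.File]::WriteAllText($path, $text, (New-Object System.Text.UTF8Encoding $hasBom))

$result = [IO.File]::ReadAllBytes($path)
Remove-Item $path
```

Because ReadAllText / WriteAllText pass the string through unchanged, the newline format and the presence or absence of a trailing newline are preserved as well.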

Encode File save utf8

I'm racking my brain :D
I am trying to encode a text file so that it is saved the same way Notepad saves it.
It looks exactly the same, but it isn't: it only works for me if I open the file in Notepad and save it again. What could be the problem with the encoding, and how can I solve it? Is there a command that opens a file in Notepad and saves it again?
I currently use
(Get-Content 000014.log) | Out-FileUtf8NoBom ddppyyyyy.txt
and after that
Get-ChildItem ddppyyyyy.txt | ForEach-Object {
# get the contents and replace line breaks by U+000A
$contents = [IO.File]::ReadAllText($_) -replace "`r`n?", "`n"
# create UTF-8 encoding without signature
$utf8 = New-Object System.Text.UTF8Encoding $false
# write the text back
[IO.File]::WriteAllText($_, $contents, $utf8)
}

When you open a file with notepad.exe, it autodetects the encoding (or do you open the file explicitly via File > Open... as UTF-8?). If your file is actually not UTF-8 but something else, Notepad may be able to work around this and convert it to the required encoding when the file is resaved. So, if you do not specify the correct input encoding in your PoSh script, things will go wrong.
But that's not all: Notepad also drops erroneous characters when the file is saved, in order to create a regular text file. For instance, your text file might contain a NULL character that only gets removed when you use Notepad. If this is the case, it is highly unlikely that your input file is UTF-8-encoded (unless it is broken). So it looks like your problem is that your source file is UTF-16 or similar; try to determine the right input encoding and rewrite the file accordingly, e.g. from UTF-16 to UTF-8:
Get-Content file.foo -Encoding Unicode | Set-Content -Encoding UTF8 newfile.foo
Try it like this:
Get-ChildItem ddppyyyyy.txt | ForEach-Object {
# get the contents, replace Windows line breaks by U+000A and remove NUL characters
$raw = (Get-Content -Raw $_.FullName -Encoding UTF8) -replace "`r?`n", "`n" -replace "`0", ""
# create UTF-8 encoding without BOM signature
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
# write the text back as-is (WriteAllLines would append a platform-specific newline)
[System.IO.File]::WriteAllText($_.FullName, $raw, $utf8NoBom)
}
If you are struggling with the byte-order mark (BOM), it is best to use a hex editor to check the file header manually; checking your file after saving it as shown above, then opening it with Notepad.exe and saving it under a new name, shows no difference anymore:
The hex-dumped beginning of a file with BOM looks like this instead:
Also, as noted, while your regex pattern should work, if you want to convert Windows newlines to Unix style it is much more common and safer to make the CR optional: `r?`n
As noted by mklement0, reading the file using the correct encoding is important; if your file is actually in Latin-1 or some other encoding, you will end up with a broken file if you carelessly convert it to UTF-8 in PoSh.
Thus, I have added the -Encoding UTF8 param to the Get-Content Cmdlet; adjust as needed.

Update: There is nothing wrong with the code in the question; the true problem was embedded NUL characters in the files, which caused problems in R and which opening and resaving in Notepad implicitly removed, thereby resolving the problem (assuming that simply discarding these NULs works as intended). See also wp78de's answer.
Therefore, modifying the $contents = ... line as follows should fix your problem:
$contents = [IO.File]::ReadAllText($_) -replace "`r`n", "`n" -replace "`0"
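To confirm that embedded NULs are actually present, and gone after the cleanup, a byte-level check is enough. A sketch on a made-up temp file (nul-demo.log), reusing the replacement from the line above:

```powershell
# Made-up demo file containing a NUL character and a CRLF newline.
$path = Join-Path ([IO.Path]::GetTempPath()) 'nul-demo.log'
[IO.File]::WriteAllText($path, "abc`0def`r`n")   # .NET default: BOM-less UTF-8

# -contains compares each byte with 0, i.e., looks for a NUL byte.
$hadNul = ([IO.File]::ReadAllBytes($path)) -contains 0

# Same cleanup as above: CRLF -> LF, and remove every NUL character.
$contents = [IO.File]::ReadAllText($path) -replace "`r`n", "`n" -replace "`0"
[IO.File]::WriteAllText($path, $contents)

$hasNulNow = ([IO.File]::ReadAllBytes($path)) -contains 0
Remove-Item $path
```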
Note: The code in the question uses the Out-FileUtf8NoBom function from this answer, which allows saving to BOM-less UTF-8 files in Windows PowerShell; it now supports a -UseLF switch, which would simplify the OP's command to (additional problems notwithstanding):
Get-Content 000014.log | Out-FileUtf8NoBom ddppyyyyy.txt -UseLF
There's a conceptual flaw in your regex, though it is benign in this case: instead of "`r`n?" you want "`r?`n" (or, expressed as a pure regex, '\r?\n') in order to match both CRLF ("`r`n") and LF-only ("`n") newlines.
Your regex would instead match CRLF and CR-only(!) newlines; however, as wp78de points out, if your input file contains only the usual CRLF newlines (and not also isolated CR characters), your replacement operation should still work.
In fact, you don't need a regex at all if all you need is to replace CRLF sequences with LF: -replace "`r`n", "`n"
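The difference between the patterns is easiest to see on a made-up string that contains one CRLF, one LF-only and one CR-only newline:

```powershell
$sample = "a`r`nb`nc`rd"

# Question's pattern: CR followed by an optional LF, i.e., CRLF and lone CR.
$questionPattern  = $sample -replace "`r`n?", "`n"   # -> "a`nb`nc`nd"

# Suggested pattern: optional CR followed by LF, i.e., CRLF and LF; a lone CR is kept.
$suggestedPattern = $sample -replace "`r?`n", "`n"   # -> "a`nb`nc`rd"

# No regex needed when only CRLF sequences should become LF.
$literalReplace   = $sample.Replace("`r`n", "`n")   # -> "a`nb`nc`rd"
```

With input that only ever contains CRLF newlines, all three produce the same result.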
Assuming that your original input files are ANSI-encoded, you can simplify your approach as follows, without the need to call Out-FileUtf8NoBom first (assumes Windows PowerShell):
# NO need for Out-FileUtf8NoBom - process the ANSI-encoded files directly.
Get-ChildItem *SomePattern*.txt | ForEach-Object {
# Get the contents and make sure newlines are LF-only
# [Text.Encoding]::Default is the encoding for the active ANSI code page
# in Windows PowerShell.
$contents = [IO.File]::ReadAllText(
$_.FullName,
[Text.Encoding]::Default
) -replace "`r`n", "`n"
# Write the text back with BOM-less UTF-8 (.NET's default)
[IO.File]::WriteAllText($_.FullName, $contents)
}
Note that replacing the content of files in-place bears a risk of data loss, so it's best to create backup copies of the original files first.
Note: If you wanted to perform the same operation in PowerShell [Core] v6+, which is built on .NET Core, the code must be modified slightly, because [Text.Encoding]::Default no longer reflects the active ANSI code page and instead invariably returns a BOM-less UTF-8 encoding.
Therefore, the $contents = ... statement would have to change to (note that this would work in Windows PowerShell too):
$contents = [IO.File]::ReadAllText(
$_.FullName,
[Text.Encoding]::GetEncoding(
[cultureinfo]::CurrentCulture.TextInfo.AnsiCodePage
)
) -replace "`r`n", "`n"

How to expand file content with powershell

I want to do this:
$content = get-content "test.html"
$template = get-content "template.html"
$template | out-file "out.html"
where template.html contains
<html>
<head>
</head>
<body>
$content
</body>
</html>
and test.html contains:
<h1>Test Expand</h1>
<div>Hello</div>
I get weird characters in the first 2 characters of out.html:
��
and $content is not expanded.
How can I fix this?

To complement Mathias R. Jessen's helpful answer with a solution that:
is more efficient.
ensures that the input files are read as UTF-8, even if they don't have a (pseudo-)BOM (byte-order mark).
avoids the "weird character" problem altogether by writing a UTF-8-encoded output file without that pseudo-BOM.
# Explicitly read the input files as UTF-8, as a whole.
$content = get-content -raw -encoding utf8 test.html
$template = get-content -raw -encoding utf8 template.html
# Write to output file using UTF-8 encoding *without a BOM*.
[IO.File]::WriteAllText(
"$PWD/out.html",
$ExecutionContext.InvokeCommand.ExpandString($template)
)
get-content -raw (PSv3+) reads the files in as a whole, into a single string (instead of an array of strings, line by line), which, while more memory-intensive, is faster. With HTML files, memory usage shouldn't be a concern.
An additional advantage of reading the files in full is that if the template were to contain multi-line subexpressions ($(...)), the expansion would still function correctly.
get-content -encoding utf8 ensures that the input files are interpreted as using character encoding UTF-8, as is typical in the web world nowadays.
This is crucial, given that UTF-8-encoded HTML files normally do not have the 3-byte pseudo-BOM that PowerShell needs in order to correctly identify a file as UTF-8-encoded (see below).
A single $ExecutionContext.InvokeCommand.ExpandString() call is then sufficient to perform the template expansion.
Out-File -Encoding utf8 would invariably create a file with the pseudo-BOM, which is undesired.
Instead, [IO.File]::WriteAllText() is used, taking advantage of the fact that the .NET Framework by default creates UTF-8-encoded files without the BOM.
Note the use of $PWD/ before out.html, which is needed to ensure that the file gets written in PowerShell's current location (directory); unfortunately, what the .NET Framework considers the current directory is not in sync with PowerShell.
Finally, the obligatory security warning: use this expansion technique only on input that you trust, given that arbitrary embedded commands may get executed.
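The expansion step itself can be tried without any files. A minimal sketch with made-up strings; the template is single-quoted so that $content stays unexpanded until ExpandString resolves it from the current scope:

```powershell
$content  = '<h1>Test Expand</h1>'
$template = '<body>$content</body>'   # single quotes: no expansion yet

# ExpandString applies double-quoted-string expansion to the template text.
$expanded = $ExecutionContext.InvokeCommand.ExpandString($template)
$expanded   # -> <body><h1>Test Expand</h1></body>
```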
Optional background information
PowerShell's Out-File, > and >> use UTF-16 LE character encoding with a BOM (byte-order mark) by default (the "weird characters", as mentioned).
While Out-File -Encoding utf8 allows creating UTF-8 output files instead,
PowerShell invariably prepends a 3-byte pseudo-BOM to the output file, which some utilities, notably those with Unix heritage, have problems with - so you would still get "weird characters" (albeit different ones).
If you want a more PowerShell-like way of creating BOM-less UTF-8 files,
see this answer of mine, which defines an Out-FileUtf8NoBom function that otherwise emulates the core functionality of Out-File.
Conversely, on reading files, you must use Get-Content -Encoding utf8 to ensure that BOM-less UTF-8 files are recognized as such.
In the absence of the UTF-8 pseudo-BOM, Get-Content assumes that the file uses the single-byte, extended-ASCII encoding specified by the system's legacy codepage (e.g., Windows-1252 on English-language systems, an encoding that PowerShell calls Default).
Note that while Windows-only editors such as Notepad create UTF-8 files with the pseudo-BOM (if you explicitly choose to save as UTF-8; default is the legacy codepage encoding, "ANSI"), increasingly popular cross-platform editors such as Visual Studio Code, Atom, and Sublime Text by default do not use the pseudo-BOM when they create files.
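The BOM bytes themselves can be inspected via .NET's GetPreamble() method. A sketch; the last part assumes the "weird characters" are a UTF-16 LE BOM being misread as UTF-8, in which case each of the bytes FF and FE is invalid UTF-8 and decodes to the replacement character U+FFFD, which would plausibly display as the two � seen in the question:

```powershell
$utf16Bom = [BitConverter]::ToString([Text.Encoding]::Unicode.GetPreamble())  # -> 'FF-FE'
$utf8Bom  = [BitConverter]::ToString([Text.Encoding]::UTF8.GetPreamble())     # -> 'EF-BB-BF'
$noBom    = [BitConverter]::ToString((New-Object System.Text.UTF8Encoding $false).GetPreamble())  # -> '' (empty)

# Misreading the UTF-16 LE BOM as UTF-8: each invalid byte becomes U+FFFD.
$misread = [Text.Encoding]::UTF8.GetString([byte[]] (0xFF, 0xFE))
$misread.Length   # -> 2
```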

For the "weird characters", they're probably BOMs (Byte-order marks). Specify the output encoding explicitly with the -Encoding parameter when using Out-File, for example:
$Template |Out-File out.html -Encoding UTF8
For the string expansion, you need to explicitly tell powershell to do so:
$Template = $Template |ForEach-Object {
$ExecutionContext.InvokeCommand.ExpandString($_)
}
$Template | Out-File out.html -Encoding UTF8

Powershell's Out-File adds a newline to the Top of the file - Out-File vs. Set-Content

I have the following powershell:
# Find all .csproj files
$csProjFiles = get-childitem ./ -include *.csproj -recurse
# Remove the packages.config include from the csproj files.
$csProjFiles | foreach ($_) {(get-content $_) |
select-string -pattern '<None Include="packages.config" />' -notmatch |
Out-File $_ -force}
And it seems to work fine. The line with the packages.config is not in the file after I run.
But after I run there is an extra newline at that TOP of the file. (Not the bottom.)
I am confused as to how that is getting there. What can I do to get rid of the extra newline char that this generates at the top of the file?
UPDATE:
I swapped out to a different way of doing this:
$csProjFiles | foreach ($_) {$currentFile = $_; (get-content $_) |
Where-Object {$_ -notmatch '<None Include="packages.config" />'} |
Set-Content $currentFile -force}
It works fine and does not have the extra line at the top of the file. But I wouldn't mind knowing why the top example was adding the extra line.

Out-File and redirection operators > / >> take arbitrary input objects and convert them to string representations as they would appear in the console - that is, PowerShell's default output formatting is applied - and send those string representations to the output file.
These string representations often have leading and/or trailing newlines for readability.
See Get-Help about_Format.ps1xml to learn more.
Set-Content is for input objects that are already strings or should be treated as strings.
PowerShell calls .psobject.ToString() on all input objects to obtain the string representation, which in most cases defers to the underlying .NET type's .ToString() method.
The resulting representations are typically not the same, and it's important to know when to choose which cmdlet / operator.
Additionally, the default character encodings differ:
Out-File and > / >> default to UTF-16 LE, which PowerShell calls Unicode in the context of the optional -Encoding parameter.
Set-Content defaults to your system's legacy "ANSI" code page (a single-byte, extended-ASCII code page), which PowerShell calls Default.
Note that the docs as of PSv5.1 mistakenly claim that the default is ASCII.[1]
To change the encoding:
Ad-hoc change: Use the -Encoding parameter with Out-File or Set-Content to control the output character encoding explicitly.
You cannot change the encoding used by > / >> ad-hoc, but see below.
[PSv3+] Changing the default (use with caution): Use the $PSDefaultParameterValues mechanism (see Get-Help about_Parameters_DefaultValues), which enables setting default values for parameters:
Changing the default encoding for Out-File also changes it for > / >> in PSv5.1 or above[2].
To change it to UTF-8, for instance, use:
$PSDefaultParameterValues['Out-File:Encoding']='UTF8'
Note that in PSv5.0 or below you cannot change what encoding > and >> use.
If you change the default for Set-Content, be sure to change it for Add-Content too:
$PSDefaultParameterValues['Set-Content:Encoding'] = $PSDefaultParameterValues['Add-Content:Encoding'] ='UTF8'
You can also use wildcard patterns to represent the cmdlet / advanced function name to apply the default parameter value to; for instance, if you used $PSDefaultParameterValues['*:Encoding']='UTF8', then all cmdlets that have an -Encoding parameter would default to that value, but that is ill-advised, because in some cmdlets the -Encoding refers to the input encoding.
There is no single shared prefix among cmdlets that write to files that allows you to target all output cmdlets, but you can define a pattern for each of the verbs:
$enc = 'UTF8'; $PSDefaultParameterValues += @{ 'Out-*:Encoding'=$enc; 'Set-*:Encoding'=$enc; 'Add-*:Encoding'=$enc; 'Export-*:Encoding'=$enc }
Caveat: $PSDefaultParameterValues is defined in the global scope, so any modifications you make to it take effect globally, and affect subsequent commands.
To limit changes to a script / function's scope and its descendant scopes, use a local $PSDefaultParameterValues variable, which you can either initialize to an empty hashtable to start from scratch ($PSDefaultParameterValues = @{}), or initialize to a clone of the global value ($PSDefaultParameterValues = $PSDefaultParameterValues.Clone()).
Caveats:
Using the utf8 encoding in Windows PowerShell invariably creates UTF-8 files with a BOM. (Commendably, in PowerShell [Core] v6+ it does not, and this edition even consistently defaults to BOM-less UTF-8; however, you can create a BOM on demand with utf8BOM.)
However, if you're running Windows 10 and you're willing to switch to BOM-less UTF-8 encoding system-wide - which can have side effects - even Windows PowerShell can be made to use BOM-less UTF-8 consistently - see this answer.
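The local-variable technique for $PSDefaultParameterValues can be sketched as follows. The function name Get-ScopedOutFileEncoding is hypothetical, for illustration only; the check on the global hashtable assumes no 'Out-File:Encoding' default was defined beforehand.

```powershell
# Hypothetical helper, for illustration only.
function Get-ScopedOutFileEncoding {
    # A local variable of the same name shadows the global hashtable;
    # cloning keeps any defaults that were already defined globally.
    $PSDefaultParameterValues = $PSDefaultParameterValues.Clone()
    $PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'

    # Commands called from this scope (and child scopes) would see this default.
    $PSDefaultParameterValues['Out-File:Encoding']
}

$insideFunction = Get-ScopedOutFileEncoding   # -> utf8
# The global hashtable is untouched.
$leakedToGlobal = $global:PSDefaultParameterValues.ContainsKey('Out-File:Encoding')   # -> False
```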
In the case at hand, the output objects are [Microsoft.PowerShell.Commands.MatchInfo] instances output by Select-String:
Using default formatting, as happens with Out-File, they output an empty line above, and two empty lines below (with multiple instances printing in a contiguous block between a single set of the empty lines above and below).
If you call .psobject.ToString() on them, as happens with Set-Content, they evaluate to just the matching lines (with no origin-path prefix, given that input was provided via the pipeline rather than as filenames via the -Path / -LiteralPath parameters), with no leading or trailing empty lines.
That said, had you piped to | Select-Object -ExpandProperty Line or simply | ForEach-Object Line in order to explicitly output just the matching lines as strings, both Out-File and Set-Content would have yielded the same result (except for their default encoding).
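A sketch of that approach on made-up sample lines standing in for a .csproj file. The -SimpleMatch switch is an addition not in the question: it makes the pattern literal, whereas by default the '.' in 'packages.config' is a regex metacharacter that matches any character (harmless here, but less precise).

```powershell
# Made-up sample lines standing in for a .csproj file.
$lines = '<Project>', '  <None Include="packages.config" />', '</Project>'

$kept = $lines |
    Select-String -SimpleMatch -NotMatch -Pattern '<None Include="packages.config" />' |
    ForEach-Object Line   # MatchInfo object -> its .Line string

# $kept now holds plain strings, so Out-File and Set-Content would write the same lines.
$kept
```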
P.S.: LotPing's observation is correct: You seem to be confusing the foreach statement with the ForEach-Object cmdlet (which, regrettably, is also known by built-in alias foreach, causing confusion).
The ForEach-Object cmdlet doesn't need an explicit definition for $_: in the (implied -Process) script block you pass to it, $_ is automatically defined to be the input object at hand.
Your ($_) argument to foreach (ForEach-Object) is effectively ignored: outside of special contexts, such as script blocks in the pipeline, automatic variable $_ evaluates to $null, and putting (...) around it makes no difference, so you're passing $null, which is ignored.
[1] Verify that ASCII is not the default as follows: '0x{0:x}' -f $('ä' | Set-Content t.txt; $b=[System.IO.File]::ReadAllBytes("$PWD\t.txt")[0]; ri t.txt; $b) yields 0xe4 on an en-US system, which is the Windows-1252 code point for ä (which coincides with the Unicode codepoint, but the output is a single-byte-encoded file with no BOM).
If you use -Encoding ASCII explicitly, you get 0x3f, the code point of a literal ?, because that's what ASCII encoding converts all non-ASCII characters to.
[2] PetSerAl found the source-code location that shows that > and >> are effective aliases for Out-File [-Append], and he points out that redefining Out-File therefore also redefines > / >>; similarly, specifying a default encoding via $PSDefaultParameterValues for Out-File also takes effect for > / >>.
Windows PowerShell v5.1 is the minimum version that works this way.
Tip of the hat to PetSerAl for his help.
