PowerShell mangling ASCII text

I'm getting extra characters and lines when trying to modify hosts files. For example, this Select-String command does not filter anything out, but the two files are different:
get-content -Encoding ascii C:\Windows\system32\drivers\etc\hosts |
select-string -Encoding ascii -notmatch "thereisnolinelikethis" |
out-file -Encoding ascii c:\temp\testfile
PS C:\temp> (get-filehash C:\windows\system32\drivers\etc\hosts).hash
C54C246D2941F02083B85CE2774D271BD574F905BABE030CC1BB41A479A9420E
PS C:\temp> (Get-FileHash C:\temp\testfile).hash
AC6A1134C0892AD3C5530E58759A09C73D8E0E818EC867C9203B9B54E4B83566

I can confirm that your commands inexplicably produce extra line breaks in the output file, at the start and at the end. PowerShell also converts the tabs in the original file into four spaces.
While I cannot explain why, the following commands do the same thing without these issues:
Get-Content -Path C:\Windows\System32\drivers\etc\hosts -Encoding Ascii |
Where-Object { -not $_.Contains("thereisnolinelikethis") } |
Out-File -FilePath "c:\temp\testfile" -Encoding Ascii

I think this is more of an issue with PowerShell's F&O (formatting & output) engine. Keep in mind that Select-String outputs a rich object called MatchInfo. When that object reaches the end of the pipeline, it needs to be rendered to a string, and I think it is that rendering/formatting that injects the extra lines. One of the properties on MatchInfo is the line that was matched (or not matched). If you pass just the Line property down the pipe, it seems to work better (hashes match):
Get-Content C:\Windows\system32\drivers\etc\hosts |
Select-String -notmatch "thereisnolinelikethis" |
Foreach {$_.Line} |
Out-File -Encoding ascii c:\temp\testfile
BTW you only need to specify ASCII encoding when outputting back to the file. Everywhere else in PowerShell, just let the string flow as Unicode.
All that said, I would use Where-Object instead of Select-String for this scenario. Where-Object is a filtering command, which is what you want; Select-String takes input of one form (string) and converts it to a different object (MatchInfo).
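To make the MatchInfo point concrete, here is a minimal sketch that compares Select-String plus .Line against Where-Object on the same input. The temp file name and its two sample lines are made up for illustration:

```powershell
# Hypothetical sample file standing in for the hosts file (first line contains a tab)
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'hosts-demo.txt'
Set-Content -LiteralPath $path -Value "# comment`twith tab", '127.0.0.1 localhost'

# Select-String wraps each (non-)matching line in a MatchInfo object
$matchInfos = Get-Content -LiteralPath $path |
    Select-String -NotMatch -SimpleMatch 'thereisnolinelikethis'
$matchInfos[0].GetType().FullName   # Microsoft.PowerShell.Commands.MatchInfo

# Extracting the .Line property gives back plain strings
$viaSelectString = $matchInfos | ForEach-Object Line

# Where-Object only filters, so the original strings flow through unchanged
$viaWhereObject = Get-Content -LiteralPath $path |
    Where-Object { -not $_.Contains('thereisnolinelikethis') }

Remove-Item -LiteralPath $path
```

Both variables end up holding the same strings, tab intact; the difference only shows up when the MatchInfo objects themselves reach Out-File and have to be formatted for display.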

Out-File adds a trailing newline ("`r`n") to testfile.
C:\Windows\System32\drivers\etc\hosts does not contain a trailing newline out of the box, which is why you get a different file hash.
If you open the files with a StreamReader, you'll see that the underlying streams differ in length (due to the trailing newline in the new file):
PS C:\> $Hosts = [System.IO.StreamReader]"C:\Windows\System32\drivers\etc\hosts"
PS C:\> $Tests = [System.IO.StreamReader]"C:\temp\testfile"
PS C:\> $Hosts.BaseStream.Length
822
PS C:\> $Tests.BaseStream.Length
824
PS C:\> $Tests.BaseStream.Position = 822; $Tests.Read(); $Tests.Read()
13
10
ASCII characters 13 (0x0D) and 10 (0x0A) are CR and LF, i.e. the CR+LF sequence that [System.Environment]::NewLine returns on Windows.
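A small sketch to reproduce the byte-level difference without touching the real hosts file. The two temp file names and the sample lines are hypothetical: the same lines are written once by joining them manually and using -NoNewline, and once with Out-File, and then the raw bytes are compared.

```powershell
# Hypothetical demo files in the temp directory
$dir    = [System.IO.Path]::GetTempPath()
$noNl   = Join-Path $dir 'no-trailing-newline.txt'
$withNl = Join-Path $dir 'with-trailing-newline.txt'
$lines  = '127.0.0.1 localhost', '::1 localhost'

# Join the lines ourselves and suppress the final newline (-NoNewline needs PSv5+)
($lines -join [Environment]::NewLine) |
    Set-Content -LiteralPath $noNl -Encoding ascii -NoNewline

# Out-File terminates every line, including the last one, with a newline
$lines | Out-File -LiteralPath $withNl -Encoding ascii

$a = [System.IO.File]::ReadAllBytes($noNl)
$b = [System.IO.File]::ReadAllBytes($withNl)
'{0} vs. {1} bytes' -f $a.Length, $b.Length

# The extra trailing bytes are the newline sequence
$b[$a.Length..($b.Length - 1)]
```

On Windows the tail prints as 13 and 10; PowerShell 7 on Linux or macOS writes a bare LF (10) instead, because Out-File uses [Environment]::NewLine.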

Related

How to prevent trailing newline in PowerShell script?

This code adds extra line breaks, even when using -NoNewline:
$LST1="OsName","OsVersion","TimeZone","CsName"
$LST2="CsManufacturer","CsModel","CsSystemType","BiosBIOSVersion","BiosReleaseDate"
$MEM1 = (Get-CimInstance Win32_PhysicalMemory | Measure-Object -Property capacity -Sum | Foreach {"{0:N2}" -f ([math]::round(($_.Sum / 1GB),2))})
$Pro1 = "systemname","DeviceID","numberOfCores","NumberOfLogicalProcessors"
Add-Content OutText.txt "OS Information:" -NoNewline
Get-ComputerInfo -Property $LST1 | Format-List | Out-File -Encoding ASCII -FilePath OutText.txt -Append
Add-Content OutText.txt "Hardware Information:" -NoNewline
Get-ComputerInfo -Property $LST2 | Format-List | Out-File -Encoding ASCII -FilePath OutText.txt -Append
Add-Content OutText.txt "RAM: $MEM1 GB" -NoNewline
Get-WmiObject -class win32_processor -Property $Pro1 | Select-Object -Property $Pro1 | Out-File -FilePath OutText.txt -Encoding ASCII -Append
Too many line breaks:
Theo has provided the crucial pointer in a comment:
(Get-ComputerInfo -Property $LST1 | Format-List | Out-String).Trim() |
Add-Content -Path $path -NoNewline
Let me elaborate:
To prevent leading and trailing empty lines in the output of Format-List from showing up in a file via Out-File / >, use Out-String to create an in-memory string representation of the formatted output first,
which then allows you to apply .Trim() to the resulting multi-line string in order to remove leading and trailing lines (whitespace in general) from Out-String's output.
Since Out-String itself renders the formatting instructions output by Format-List, you can then use Set-Content or Add-Content to save / append the resulting string to a file.
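A minimal sketch of the Out-String/.Trim() combination, assuming a made-up object and temp file in place of Get-ComputerInfo and OutText.txt so that it runs anywhere:

```powershell
# Hypothetical output file for the demo
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'formatted-demo.txt'

# Render the Format-List output to one string, then strip the leading and
# trailing empty lines (and any other surrounding whitespace)
$text = ([pscustomobject] @{ Name = 'demo'; Size = 42 } | Format-List | Out-String).Trim()

# Set-Content -NoNewline writes the string as-is; Add-Content -NoNewline would append instead
$text | Set-Content -LiteralPath $path -NoNewline
Get-Content -LiteralPath $path -Raw
```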
The behavior of Out-String:
Out-String produces the same for-display representation that you get by default in the console - or via other Out-* cmdlets, notably Out-File / > - as a single, multi-line string by default.
While this representation may itself contain empty lines, as is typical, Out-String additionally appends a trailing newline, even though there's no good reason to do so, as discussed in GitHub issue #14444.
In cases where you want to remove this extraneous trailing newline only, you can use the following approach, via the -replace operator (the operation works with both Windows-style CRLF newlines (\r\n) and Unix-style LF-only ones (\n)):
(... | Out-String) -replace '\r?\n\z'
Or, less efficiently, use the -Stream switch to output lines individually and then re-join them with newlines, without a trailing one ("`n" creates a LF-only newline, which PowerShell accepts interchangeably with CRLF newlines ("`r`n")):
(... | Out-String -Stream) -join "`n"
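A quick way to see both variants side by side, using 1..3 as stand-in input (any command's output should behave the same way):

```powershell
# Plain Out-String: the result ends with a newline
$plain = 1..3 | Out-String

# Variant 1: strip just the one trailing newline (CRLF or LF)
$viaReplace = (1..3 | Out-String) -replace '\r?\n\z'

# Variant 2: stream the lines and re-join them with LF, no trailing newline
$viaStream = (1..3 | Out-String -Stream) -join "`n"
```

$viaReplace keeps the platform's newlines between lines (CRLF on Windows), while $viaStream always uses LF.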
Out-String applied to output from external programs:
Out-String can also be used to capture the lines output by external programs as a single, multi-line string (by default, PowerShell captures output line by line, resulting in an array of strings when captured in a variable).
However, this use of Out-String is problematic:
There too the trailing newline that is appended can be a nuisance.
In Windows PowerShell there's an additional nuisance (which has since been corrected in PowerShell (Core) 7+): if you use 2>&1 to merge stderr output into the success output stream, the first stderr line is formatted like a PowerShell error.
Run cmd /c 'echo yes & echo no >&2' 2>&1 | Out-String to see the problem.
The following idiom avoids both problems (... represents your external-program call):
$multiLineString = [string[]] (... 2>&1) -join "`n"
Note: The above uses a LF-only newline to join the array elements, which is usually sufficient. Use "`r`n" for CRLF newlines or [Environment]::NewLine for the OS-appropriate newline sequence.
Example:
The following cmd.exe CLI call outputs both a stdout line and a stderr line, with 2>&1 on the PowerShell side merging the two into the success output stream.
PS> [string[]] (cmd /c 'echo yes & echo no >&2' 2>&1) -join "`n" |
ForEach-Object { "[$_]" } # just to visualize the string boundaries
[yes
no ]
Note: The trailing space after no is owed to the unusual behavior of cmd.exe's built-in echo command: it includes the space before the >&2 redirection in its output.
I use the following to strip the CR/LF added by Out-String:
$YourVariableHere = $YourVariableHere.Substring(0, $YourVariableHere.Length - 2)
This assumes a Windows-style CR/LF (2 characters); subtract 2 more for each additional CR/LF pair you want to remove.
HTH

Issues with specific characters in outfile

I have a script that merges files, and that works fine, but characters like åäö don't look right in the output file.
Here is the complete script:
$startOfToday = (Get-Date).Date
Get-ChildItem "C:\TEST" -Include *.* -Recurse |
Where-Object LastWriteTime -gt $startOfToday | ForEach-Object {gc $_; ""} |
Out-File "C:\$(Get-Date -Format 'yyyy-MM-dd').txt"
In the input files, it looks like this, for example:
Order ID 1
Order ID 2
This is för får
In the output, the last row comes out like this:
Order ID 1
Order ID 2
får för fär
Is there a way to make those characters appear in the output file as they appear in the input files?
The implication is that your input files are UTF-8-encoded without a BOM, which in Windows PowerShell are (mis)interpreted to be ANSI-encoded (using the system's active ANSI code page, such as Windows-1252).
The solution is to tell gc (Get-Content) explicitly what encoding to use, via the -Encoding parameter:
Get-ChildItem C:\TEST -include *.* -Recurse |
Where-Object LastWriteTime -gt $startOfToday |
ForEach-Object { Get-Content -Encoding Utf8 $_; ""} |
Out-File "C:\$(Get-Date -Format 'yyyy-MM-dd').txt"
Note that PowerShell never preserves the input encoding automatically; therefore, in the absence of -Encoding on Out-File, its default encoding is used, which is "Unicode" (UTF-16LE) in Windows PowerShell.
While PowerShell (Core) 7+ also doesn't preserve input encodings, it consistently defaults to BOM-less UTF-8, so your original code would work as-is there.
For more information about default encodings in Windows PowerShell vs. PowerShell (Core) 7+, see this answer.
Note: As AdminOfThings suggests in a comment, simply replacing Out-File with Set-Content in your original code also works in this particular case, because the same misinterpretation of the encoding is then performed on both in- and output, and the data is simply being passed through. This isn't a general solution, however, notably not if you need to process the strings in memory first, before saving them to a file.
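A self-contained sketch of the failure mode and the fix, assuming temp files with hypothetical names instead of C:\TEST. It writes a BOM-less UTF-8 file, shows what its bytes look like when decoded with a single-byte code page, then reads the file with -Encoding UTF8 and writes it back out. ISO-8859-1 stands in for Windows-1252 here because it is always available in .NET; the two decode these particular bytes identically.

```powershell
# Build the sample text from code points so this script's own encoding doesn't matter:
# U+00F6 is ö, U+00E5 is å
$text = "This is f$([char]0xF6)r f$([char]0xE5)r"

# Hypothetical input file: UTF-8 *without* a BOM, like the question's input files
$in  = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-nobom-in.txt'
$out = Join-Path ([System.IO.Path]::GetTempPath()) 'merged-out.txt'
[System.IO.File]::WriteAllText($in, $text, (New-Object System.Text.UTF8Encoding $false))

# What the misinterpretation looks like: the UTF-8 bytes decoded as ISO-8859-1
$bytes    = [System.IO.File]::ReadAllBytes($in)
$mojibake = [System.Text.Encoding]::GetEncoding(28591).GetString($bytes)
$mojibake

# Declaring the input encoding explicitly avoids the misinterpretation
Get-Content -LiteralPath $in -Encoding UTF8 | Out-File -LiteralPath $out -Encoding UTF8
$roundTrip = Get-Content -LiteralPath $out -Encoding UTF8
```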

Strip lines from text file based on content

I like to use one of the packaged HOSTS files (e.g., MVPS) to protect myself from some of the nastier domains. Unfortunately, sometimes these files are a bit overzealous for me (blocking googleadsservices is a pain sometimes). I want an easy way to strip certain lines out of these files. In Linux I use:
cat hosts |grep -v <pattern> >hosts.new
And the file is rewritten minus the lines referencing the pattern I specified in the grep. So I just set it up to replace hosts with hosts.new on reboot and I'm done.
Is there an easy way to do this in PowerShell?
In PowerShell you'd do
(Get-Content hosts) -notmatch $pattern | Out-File hosts.new
or
(cat hosts) -notmatch $pattern > hosts.new
for short.
Of course, since Out-File (and with it the redirection operator) defaults to Unicode (UTF-16LE) encoding in Windows PowerShell, you may actually want to use Set-Content instead of Out-File:
(Get-Content hosts) -notmatch $pattern | Set-Content hosts.new
or
(gc hosts) -notmatch $pattern | sc hosts.new
And since the input file is read in a grouping expression (the parentheses around Get-Content hosts) you could actually write the output back to the source file:
(Get-Content hosts) -notmatch $pattern | Set-Content hosts
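A minimal run-through of the in-memory approach, assuming a small made-up hosts-style file in the temp directory and a hypothetical pattern. Note that -notmatch treats $pattern as a regular expression, so characters such as . need escaping (for example via [regex]::Escape()) if you mean them literally.

```powershell
# Hypothetical hosts-style file and pattern, just for the demo
$hosts   = Join-Path ([System.IO.Path]::GetTempPath()) 'hosts-filter-demo.txt'
$pattern = 'googleadservices'
Set-Content -LiteralPath $hosts -Value @(
    '0.0.0.0 ads.example.com'
    '0.0.0.0 www.googleadservices.com'
    '0.0.0.0 tracker.example.net'
)

# The parentheses read the whole file into memory first, so writing
# back to the same path afterwards is safe
(Get-Content -LiteralPath $hosts) -notmatch $pattern | Set-Content -LiteralPath $hosts
Get-Content -LiteralPath $hosts
```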
To complement Ansgar Wiechers' helpful answer (which offers pragmatic and concise solutions based on reading the entire input file into memory up-front):
PowerShell's grep equivalent is the Select-String cmdlet and, just like grep, it directly accepts a filename argument (PSv3+ syntax):
Select-String -NotMatch <pattern> hosts | ForEach-Object Line | Set-Content hosts.new
Select-String -NotMatch <pattern> hosts is short for
Select-String -NotMatch -Pattern <pattern> -LiteralPath hosts and is the virtual equivalent of
grep -v <pattern> hosts
However, Select-String doesn't output strings, it outputs [Microsoft.PowerShell.Commands.MatchInfo] instances that wrap matching lines (stored in property .Line) along with metadata about the match.
ForEach-Object Line extracts just the matching lines (the value of property .Line) from these objects.
Set-Content hosts.new writes the matching lines to file hosts.new, using "ANSI" encoding in Windows PowerShell - i.e., it uses the legacy code page implied by the active system locale, typically a supranational 8-bit superset of ASCII - and UTF-8 encoding (without BOM) in PowerShell Core.
Use the -Encoding parameter to specify a different encoding.
>, by contrast (an effective alias of the Out-File cmdlet), creates:
UTF-16LE ("Unicode") files by default in Windows PowerShell.
UTF-8 files (without BOM) in PowerShell Core; in other words, in PowerShell Core, using > hosts.new in lieu of | Set-Content hosts.new will do.
Note: While both > / Out-File and Set-Content are suitable for sending string inputs to an output file, they are not generally suitable for sending other data types to a file for programmatic processing: > / Out-File output objects the way they would print to the console / terminal, which is pretty format for display, whereas Set-Content stringifies (simply put: calls .ToString() on) the input objects, which often results in loss of information.
For non-string data, consider a (more) structured data format such as XML (Export-CliXml), JSON (ConvertTo-Json) or CSV (Export-Csv).

Concatenate files using PowerShell

I am using PowerShell 3.
What is best practice for concatenating files?
file1.txt + file2.txt = file3.txt
Does PowerShell provide a facility for performing this operation directly? Or do I need to load each file's contents into local variables?
If all the files exist in the same directory and can be matched by a simple pattern, the following code will combine all files into one.
Get-Content .\File?.txt | Out-File .\Combined.txt
I would go this route:
Get-Content file1.txt, file2.txt | Set-Content file3.txt
Use the -Encoding parameter on Set-Content if you need a specific encoding; Set-Content defaults to ANSI (the system's legacy code page) in Windows PowerShell and to BOM-less UTF-8 in PowerShell (Core) 7+.
If you need more flexibility, you could use something like
Get-ChildItem -Recurse *.cs | ForEach-Object { Get-Content $_ } | Out-File -Path .\all.txt
Warning: Concatenation using a simple Get-Content (with or without the -Raw flag) only works reliably for text files; for anything else, PowerShell is too helpful:
Without -Raw, it "fixes" (i.e. breaks, pun intended) line breaks, or what PowerShell thinks is a line break.
With -Raw, you get a terminating line end (normally CR+LF) at the end of each file part, which is added at the end of the pipeline. Newer versions of Set-Content have an option (-NoNewline) to suppress it.
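A sketch showing why pairing -Raw with -NoNewline is the faithful option for text, using two hypothetical part files whose exact content is written with [System.IO.File]::WriteAllText so that the line endings are known:

```powershell
# Hypothetical part files with exact, known content (no trailing newline in part 1)
$dir   = [System.IO.Path]::GetTempPath()
$part1 = Join-Path $dir 'part1.txt'
$part2 = Join-Path $dir 'part2.txt'
$dest  = Join-Path $dir 'joined.txt'
[System.IO.File]::WriteAllText($part1, "one`r`ntwo")
[System.IO.File]::WriteAllText($part2, "three`n")

# -Raw emits each file as a single string with its line endings untouched;
# -NoNewline (PSv5+) stops Set-Content from appending a newline after each string
Get-Content -Raw -LiteralPath $part1, $part2 | Set-Content -LiteralPath $dest -NoNewline

[System.IO.File]::ReadAllText($dest)
```

The CRLF inside part 1 and the LF at the end of part 2 both survive, and no extra newline is inserted between the parts.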
To concatenate a binary file (that is, an arbitrary file that was split for some reason and needs to be put together again), use either this:
Get-Content -Raw file1, file2 | Set-Content -NoNewline destination
or something like this:
Get-Content file1 -Encoding Byte -Raw | Set-Content destination -Encoding Byte
Get-Content file2 -Encoding Byte -Raw | Add-Content destination -Encoding Byte
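The -Encoding Byte form above only exists in Windows PowerShell; PowerShell 6+ replaced it with -AsByteStream. A sketch that picks the right switch via splatting, assuming two small hypothetical binary parts in the temp directory:

```powershell
# Byte mode is spelled differently depending on the edition
if ($PSVersionTable.PSVersion.Major -ge 6) {
    $byteMode = @{ AsByteStream = $true }
} else {
    $byteMode = @{ Encoding = 'Byte' }
}

# Hypothetical binary parts with arbitrary bytes, including CR/LF values
$dir   = [System.IO.Path]::GetTempPath()
$part1 = Join-Path $dir 'part1.bin'
$part2 = Join-Path $dir 'part2.bin'
$dest  = Join-Path $dir 'joined.bin'
[System.IO.File]::WriteAllBytes($part1, [byte[]] (0, 1, 13, 10, 255))
[System.IO.File]::WriteAllBytes($part2, [byte[]] (200, 0, 10))

# With -Raw, each part arrives as one byte array and is written out verbatim
Get-Content @byteMode -Raw -LiteralPath $part1, $part2 |
    Set-Content @byteMode -LiteralPath $dest
```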
An alternative is to use the CMD shell and use
copy file1 /b + file2 /b + file3 /b + ... destinationfile
You must not overwrite any part, that is, use any of the parts as the destination. The destination file must be different from all of the parts; otherwise you're in for a surprise and will need a backup copy of the overwritten part.
A generalization based on Keith's answer:
gc <some wildcard pattern> | sc output
Here is an interesting example of how to make a zip-in-image file using PowerShell 7:
Get-Content -AsByteStream file1.png, file2.7z | Set-Content -AsByteStream file3.png
or, to append to an existing file3.png instead of overwriting it:
Get-Content -AsByteStream file1.png, file2.7z | Add-Content -AsByteStream file3.png
gc file1.txt, file2.txt > output.txt
I think this is as short as it gets.
In case you would like to ensure the concatenation is done in a specific order, pipe the files through Sort-Object -Property <Some Name>. For example, to concatenate the files sorted by name in ascending order:
Get-ChildItem -Path ./* -Include *.txt -Exclude output.txt | Sort-Object -Property Name | ForEach-Object { Get-Content $_ } | Out-File output.txt
IMPORTANT: -Exclude and Out-File MUST name the same file; otherwise, it will keep appending output.txt to itself until your disk is full.
Note that you must append a * to the end of the -Path argument because you are using -Include, as mentioned in the Get-ChildItem documentation.
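A self-contained sketch of the sorted concatenation, assuming a scratch directory with two hypothetical files created out of name order:

```powershell
# Hypothetical scratch directory with parts created out of alphabetical order
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'concat-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
Set-Content -LiteralPath (Join-Path $dir 'b.txt') -Value 'second'
Set-Content -LiteralPath (Join-Path $dir 'a.txt') -Value 'first'
Remove-Item -LiteralPath (Join-Path $dir 'output.txt') -ErrorAction Ignore

# -Include needs a wildcard path (hence the *); output.txt is excluded so the
# result file never becomes one of its own inputs
Get-ChildItem -Path (Join-Path $dir '*') -Include *.txt -Exclude output.txt |
    Sort-Object -Property Name |
    ForEach-Object { Get-Content -LiteralPath $_.FullName } |
    Out-File -LiteralPath (Join-Path $dir 'output.txt')

Get-Content -LiteralPath (Join-Path $dir 'output.txt')
```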

How do I remove carriage returns from text file using Powershell?

I'm outputting the contents of a directory to a txt file using the following command:
$SearchPath="c:\searchpath"
$Outpath="c:\outpath"
Get-ChildItem "$SearchPath" -Recurse | where {!$_.psiscontainer} | Format-Wide -Column 1 |
Out-File "$OutPath\Contents.txt" -Encoding ASCII -Width 200
What I end up with when I do this is a txt file with the information I need, but it adds numerous carriage returns I don't need, making the output harder to read.
This is what it looks like:
c:\searchpath\directory
name of file.txt
name of another file.txt
c:\searchpath\another directory
name of some file.txt
That makes a txt file that requires a lot of scrolling, but the actual information isn't that much, usually a lot less than a hundred lines.
I would like for it to look like:
c:\searchpath\directory
nameoffile.txt
c:\searchpath\another directory
another file.txt
This is what I've tried so far, without success:
$configFiles=get-childitem "c:\outpath\*.txt" -rec
foreach ($file in $configFiles)
{
(Get-Content $file.PSPath) |
Foreach-Object {$_ -replace "'n", ""} |
Set-Content $file.PSPath
}
I've also tried 'r but both options leave the file unchanged.
Another attempt:
Select-String -Pattern "\w" -Path 'c:\outpath\contents.txt' | foreach {$_.line} |
Set-Content -Path c:\outpath\contents2.txt
When I run that command without the Set-Content at the end, it appears exactly as I need it in the ISE, but as soon as I add the Set-Content at the end, it once again adds carriage returns where I don't need them.
Here's something interesting: if I create a text file with a few carriage returns and a few tabs, and then use the same -replace script I've been using, but with `t to replace the tabs, it works perfectly. But `r and `n do not work. It's almost as though it doesn't recognize them as escape characters. But if I add `r and `n in the txt file and then run the script, it still doesn't replace anything. It doesn't seem to know what to do with them.
Set-Content adds newlines by default. Replacing Set-Content with Out-File in the last attempt in your question will give you the file you want:
Select-String -Pattern "\w" -Path 'c:\outpath\contents.txt' | foreach {$_.line} |
Out-File -FilePath c:\outpath\contents2.txt
It's not 'r (apostrophe), it's a back tick: `r. That's the key above the tab key on the US keyboard layout. :)
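A minimal sketch of the difference, using a made-up two-line string. Also keep in mind that in the question's loop, Get-Content has already split the file into lines, so the individual strings passed to -replace contain no newline characters at all.

```powershell
$text = "line1`r`nline2"

# An apostrophe is not an escape character: "'n" is literally ' followed by n,
# which never occurs in $text, so nothing is replaced
$unchanged = $text -replace "'n", ''

# Backtick escapes inside double quotes produce real CR and LF characters
$joined = $text -replace "`r`n", ' '

# Alternative: pass a single-quoted regex and let -replace interpret \r and \n itself
$joined2 = $text -replace '\r?\n', ' '
```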
You can simply avoid all those empty lines by using Select-Object -ExpandProperty Name:
Get-ChildItem "$SearchPath" -Recurse |
Where { !$_.PSIsContainer } |
Select-Object -ExpandProperty Name |
Out-File "$OutPath\Contents.txt" -Encoding ASCII -Width 200
... if you don't need the folder names.