PowerShell: Out-Printer, Unicode and font sizing

powershell "get-content foo.txt|Out-Printer"
As long as foo.txt is English, everything is fine (well, mostly).
If foo.txt contains Unicode characters, e.g. राष्ट्र, then what gets printed is stuff like °à¥€ ब.
I tried passing the -Encoding option to Get-Content, but it did not change the result.
Is it possible to ensure that Unicode text gets printed properly without launching Word/IE etc. in the background to print it?
My second question is: is it possible to control which font (type and size) is used for printing by Out-Printer?

In my environment (Windows 10 2004, Build 19041.985, Japanese locale), I got the correct result in the following situations:
Save the .txt file as UTF-8 with BOM and print with Get-Content .\foo.txt | Out-Printer
Save the .txt file as UTF-8 without BOM and print with Get-Content .\foo.txt -Encoding UTF8 | Out-Printer
I got the incorrect result (like 爨ー爨セ爨キ爭財、游・財、ー) in the following situation:
Save the .txt file as UTF-8 without BOM and print with Get-Content .\foo.txt | Out-Printer
So it looks like an encoding problem. Please check what @RavenKnit said first.
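As a hedged sketch of that approach (assuming Windows PowerShell 5.1, where Set-Content -Encoding UTF8 writes UTF-8 with a BOM), you can normalize the file up front so the plain Get-Content | Out-Printer call from the question works unchanged:

# Re-encode foo.txt as UTF-8 with BOM, then print it; -Encoding UTF8 on the
# read side is needed if the source file is currently BOM-less UTF-8.
(Get-Content .\foo.txt -Encoding UTF8) | Set-Content .\foo.txt -Encoding UTF8
Get-Content .\foo.txt | Out-Printer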
Is it possible to ensure that Unicode text gets printed properly without launching Word/IE etc. in the background to print it?
I couldn't find a way to do this with Get-PrinterPort, Get-WmiObject Win32_Printer, prnmngr.vbs or prnqctl.vbs. If you just don't want to show a window while you print the content of a file, you can use the Print verb. It runs notepad.exe for .txt files, winword.exe for .docx files, etc.
Start-Process .\foo.txt -Verb Print -WindowStyle Hidden -Wait
Is it possible to control which font (type and size) is used for printing by Out-Printer?
According to the source of Out-Printer, the default font is embedded in the .resx file. So it looks like you cannot control the default font.
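If you do need control over the font, a possible workaround (not Out-Printer itself, just a sketch that assumes the default printer, a single page of text and no wrapping of long lines) is to print via .NET's System.Drawing.Printing, where font family and size are yours to choose:

Add-Type -AssemblyName System.Drawing

$text = Get-Content .\foo.txt -Encoding UTF8 -Raw
$doc = New-Object System.Drawing.Printing.PrintDocument
$doc.add_PrintPage({
    param($sender, $e)
    # 'Mangal' at 12pt is just a placeholder; pick any installed font and size.
    $font = New-Object System.Drawing.Font 'Mangal', 12
    $e.Graphics.DrawString($text, $font, [System.Drawing.Brushes]::Black, $e.MarginBounds.Left, $e.MarginBounds.Top)
    $font.Dispose()
})
$doc.Print()
$doc.Dispose()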

Related

Encode File save utf8

I'm breaking my head :D
I am trying to encode a text file so that it is saved the same way Notepad saves it.
The result looks exactly the same, but it is not the same; only if I open the file in Notepad and save it again does it work for me. What could be the problem with the encoding, or how can I solve it? Is there a command that opens Notepad and saves the file again?
What I use now is
(Get-Content 000014.log) | Out-FileUtf8NoBom ddppyyyyy.txt
and after this
Get-ChildItem ddppyyyyy.txt | ForEach-Object {
    # get the contents and replace line breaks by U+000A
    $contents = [IO.File]::ReadAllText($_) -replace "`r`n?", "`n"
    # create UTF-8 encoding without signature
    $utf8 = New-Object System.Text.UTF8Encoding $false
    # write the text back
    [IO.File]::WriteAllText($_, $contents, $utf8)
}
When you open a file with notepad.exe it auto-detects the encoding (or do you open the file explicitly via File -> Open... as UTF-8?). If your file is actually not UTF-8 but something else, Notepad may work around this and convert it to the required encoding when the file is re-saved. So, if you do not specify the correct input encoding in your PoSh script, things will go wrong.
But that's not all; Notepad also drops erroneous characters when the file is saved, to produce a regular text file. For instance, your text file might contain a NUL character that only gets removed when you use Notepad. If this is the case, it is highly unlikely that your input file is UTF-8 encoded (unless it is broken). So it looks like the problem is that your source file is UTF-16 or similar; try to find the right input encoding and re-encode it, e.g. UTF-16 to UTF-8:
Get-Content file.foo -Encoding Unicode | Set-Content -Encoding UTF8 newfile.foo
Try it like this:
Get-ChildItem ddppyyyyy.txt | ForEach-Object {
    # get the contents and replace Windows line breaks by U+000A
    $raw = (Get-Content -Raw $_ -Encoding UTF8) -replace "`r?`n", "`n" -replace "`0", ""
    # create UTF-8 encoding without BOM signature
    $utf8NoBom = New-Object System.Text.UTF8Encoding $false
    # write the text back
    [System.IO.File]::WriteAllLines($_, $raw, $utf8NoBom)
}
If you are struggling with the byte order mark, it is best to check the file header manually with a hex editor: after saving your file as shown above, opening it with Notepad.exe and saving it under a new name shows no difference anymore. A file that still carries a BOM would instead begin with the UTF-8 BOM byte sequence EF BB BF in the hex dump.
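If you don't have a hex editor handy, here is a quick sketch for inspecting the first bytes from PowerShell itself (assumes Windows PowerShell, where Get-Content supports -Encoding Byte; in PowerShell 7+ you would use -AsByteStream instead):

# Read the first three bytes; EF BB BF means the file still carries a UTF-8 BOM.
$bytes = Get-Content .\ddppyyyyy.txt -Encoding Byte -TotalCount 3
'{0:X2} {1:X2} {2:X2}' -f $bytes[0], $bytes[1], $bytes[2]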
Also, as noted, while your regex pattern should work if you want to convert Windows newlines to Unix style, it is more common and safer to make the CR optional: `r?`n
As noted by mklement0, reading the file with the correct encoding is important; if your file is actually Latin-1 or something else, you will end up with a broken file if you carelessly convert it to UTF-8 in PoSh.
That is why I added the -Encoding UTF8 parameter to the Get-Content cmdlet; adjust as needed.
Update: There is nothing wrong with the code in the question, the true problem was embedded NUL characters in the files, which caused problems in R, and which opening and resaving in Notepad implicitly removed, thereby resolving the problem (assuming that simply discarding these NULs works as intended) - see also: wp78de's answer.
Therefore, modifying the $contents = ... line as follows should fix your problem:
$contents = [IO.File]::ReadAllText($_) -replace "`r`n", "`n" -replace "`0"
Note: The code in the question uses the Out-FileUtf8NoBom function from this answer, which allows saving to BOM-less UTF-8 files in Windows PowerShell; it now supports a -UseLF switch, which would simplify the OP's command to (additional problems notwithstanding):
Get-Content 000014.log | Out-FileUtf8NoBom ddppyyyyy.txt -UseLF
There's a conceptual flaw in your regex, though it is benign in this case: instead of "`r`n?" you want "`r?`n" (or, expressed as a pure regex, '\r?\n') in order to match both CRLF ("`r`n") and LF-only ("`n") newlines.
Your regex would instead match CRLF and CR-only(!) newlines; however, as wp78de points out, if your input file contains only the usual CRLF newlines (and not also isolated CR characters), your replacement operation should still work.
In fact, you don't need a regex at all if all you need is to replace CRLF sequences with LF: -replace "`r`n", "`n"
Assuming that your original input files are ANSI-encoded, you can simplify your approach as follows, without the need to call Out-FileUtf8NoBom first (assumes Windows PowerShell):
# NO need for Out-FileUtf8NoBom - process the ANSI-encoded files directly.
Get-ChildItem *SomePattern*.txt | ForEach-Object {
    # Get the contents and make sure newlines are LF-only.
    # [Text.Encoding]::Default is the encoding for the active ANSI code page
    # in Windows PowerShell.
    $contents = [IO.File]::ReadAllText(
        $_.FullName,
        [Text.Encoding]::Default
    ) -replace "`r`n", "`n"
    # Write the text back as BOM-less UTF-8, which is .NET's default when no
    # encoding argument is passed to WriteAllText.
    [IO.File]::WriteAllText($_.FullName, $contents)
}
Note that replacing the content of files in-place bears a risk of data loss, so it's best to create backup copies of the original files first.
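A simple sketch of such a backup step, run before the in-place rewrite (the .bak suffix is just an example):

Get-ChildItem *SomePattern*.txt | ForEach-Object {
    # Keep an untouched copy next to each original before rewriting it.
    Copy-Item -LiteralPath $_.FullName -Destination ($_.FullName + '.bak')
}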
Note: If you wanted to perform the same operation in PowerShell [Core] v6+, which is built on .NET Core, the code must be modified slightly, because [Text.Encoding]::Default no longer reflects the active ANSI code page and instead invariably returns a BOM-less UTF-8 encoding.
Therefore, the $contents = ... statement would have to change to (note that this would work in Windows PowerShell too):
$contents = [IO.File]::ReadAllText(
    $_.FullName,
    [Text.Encoding]::GetEncoding(
        [cultureinfo]::CurrentCulture.TextInfo.AnsiCodePage
    )
) -replace "`r`n", "`n"

Print non-Latin characters in PowerShell

I am a newbie to this, so excuse me if my Q is "dumb".
I have a .txt file which has some non-Latin script in it (Arabic, Hindi, Japanese, etc.). These characters show up fine when I open the file in Notepad. However, if I try to print them (raw data) in a cmd prompt window or Windows PowerShell, they appear as boxes or question marks.
I have been reading some websites but am finding some conflicting info: are non-Latin scripts NOT supported in the above consoles?
You need to specify the right -Encoding on the command you use to read the *.txt file (if it supports it).
Get-Content C:\temp\test2.txt
सà¥à¤Ÿà¥ˆà¤• ओवरफ़à¥à¤²à¥‹
كومة أكثر من التدÙÙ‚
スタックオーãƒãƒ¼ãƒ•ãƒ­ãƒ¼
Get-Content C:\temp\test2.txt -Encoding UTF8
स्टैक ओवरफ़्लो
كومة أكثر من التدفق
スタックオーバーフロー
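Reading with the right encoding fixes the garbled text, but whether the characters actually display in the console also depends on the console itself. As a hedged sketch (not part of the original answer), switching the output encoding to UTF-8 and using a console and font that cover those scripts (e.g. Windows Terminal) is a common combination:

# Make the console emit UTF-8, then read the file with the matching encoding.
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
Get-Content C:\temp\test2.txt -Encoding UTF8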

PowerShell Out-File special characters

I have a script that processes data from files and, based on a condition, writes the result to a txt file. The given data are strings with words like "Distribución" or "México". When processed, special characters like "é" and "ó" come out broken (the typical white square or question mark).
How can I encode the output file to make it work with those characters? I tried encoding as UTF-8 and UTF-8 without BOM; it doesn't work. Here is the file-writing line:
...| Out-file -encoding XXX .\result.txt
For XXX I tried ASCII and UTF8; nothing works :/
Out-File will always add a BOM. It's a particularly annoying "feature" of that cmdlet. Unfortunately, to my knowledge, there is no quick way to save a file as UTF-8 WITHOUT a BOM in PowerShell. You can, however, leverage .NET to do this. This isn't really production ready, but here's a quick example:
$outputPath = "D:\temp.txt"
$data = "Distribución or México"
[System.IO.File]::WriteAllLines($outputPath, $data)
Wrap it in a cmdlet, function and/or module to make it reusable (a sketch of such a wrapper follows below). Of course, you can take more control over the file encoding with .NET too.
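A minimal sketch of such a wrapper, assuming Windows PowerShell 5.1; Out-FileNoBom is a hypothetical name, not an existing cmdlet:

function Out-FileNoBom {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(ValueFromPipeline)] $InputObject
    )
    begin   { $lines = [System.Collections.Generic.List[string]]::new() }
    process { $lines.Add([string] $InputObject) }
    end     {
        # Resolve relative paths against PowerShell's current location, since
        # .NET's notion of the current directory can differ from it.
        $full = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
        # WriteAllLines without an explicit encoding writes BOM-less UTF-8.
        [System.IO.File]::WriteAllLines($full, $lines)
    }
}

"Distribución or México" | Out-FileNoBom -Path .\result.txt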

Override PowerShell > shortcut

In PowerShell, using > is the same as using | Out-File, so I can write
"something" > file.txt and it will write 'something' into file.txt. This is what I expect of a shell. Unfortunately, PowerShell uses Unicode (UTF-16) when writing file.txt. The only way to change it to UTF-8 is to write the rather long command:
"something" | Out-File file.txt -Encoding UTF8
I want to override the > shortcut, so that it adds the UTF-8 encoding by default. Is there a way to do that?
NOT A DUPLICATE CLARIFICATION:
This is not a duplicate. As is explained clearly here, Out-File has a hard-coded default. I don't want to change Out-File's behavior, I want to change >'s behavior.
No, can't be done
Even the documentation alludes to this.
From the last paragraph of Get-Help about_Redirection:
When you are writing to files, the redirection operators use Unicode encoding. If the file has a different encoding, the output might not be formatted correctly. To redirect content to non-Unicode files, use the Out-File cmdlet with its Encoding parameter.
(emphasis added)
The output encoding can be overridden by changing the $OutputEncoding variable. However, that only works for piping output into executables; it doesn't work for the redirection operators. If you need a specific encoding for file output, you must use Out-File or Set-Content with the -Encoding parameter (or a StreamWriter).
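For completeness, a small sketch of the two routes the answer mentions; neither changes the behavior of > itself:

# Explicit cmdlet with an explicit encoding is the reliable way for files.
"something" | Out-File file.txt -Encoding UTF8
"something" | Set-Content file.txt -Encoding UTF8

# $OutputEncoding only affects data piped to native executables, e.g.:
$OutputEncoding = [System.Text.Encoding]::UTF8
"something" | findstr.exe "some"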

PowerShell generates .bat and puts in special characters

I'm currently working with PowerShell in order to create a .bat script.
I put text into the .bat script with >>.
For example,
Write "start program xxx" >> script.bat
but when I try to execute this script.bat with cmd, it says:
"■s" is not recognized ... etc.
And in PowerShell it says: 'þp' is not recognized ...
So I guess that writing with >> puts special characters at the beginning of the line. If anyone has information on this, and on what those "■s" and 'þp' are, please share.
The file redirection operators (>> etc.) write text encoded as UTF-16. If the file already contains text in a different encoding, everything will be confused (and I'm not sure cmd.exe understands UTF-16 at all).
It is easier to use Out-File with the -Encoding parameter to specify something consistent. Use the -Append switch parameter to append rather than overwrite.
E.g.
"Some text" | Out-File -Encoding ASCII -Append -FilePath 'script.bat'
(If you find yourself writing the same Out-File call and parameters over and over, put it in a helper advanced function that reads pipeline input and encapsulates the Out-File call; a sketch follows.)
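A sketch of such a helper (Add-BatLine is a hypothetical name): it accepts pipeline input and appends each line to the script as ASCII, so cmd.exe never sees a UTF-16 BOM.

function Add-BatLine {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)] [string] $Line,
        [string] $Path = 'script.bat'
    )
    process {
        # Append each incoming line to the batch file as plain ASCII.
        $Line | Out-File -FilePath $Path -Encoding ASCII -Append
    }
}

"start program xxx" | Add-BatLine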