Out-File and carriage returns - powershell

If I have a program, for example this Go program:
package main
import "fmt"
func main() {
fmt.Print("North East\n")
fmt.Print("South West\n")
}
This program produces no carriage returns at all, only newlines. However if I do this:
prog.exe > prog.txt
PowerShell takes it upon itself to add carriage returns to every line. I only want PowerShell to faithfully output what my program created, nothing more. So I tried this instead:
prog.exe | Out-File -NoNewline prog.txt
and PowerShell didn't add carriage returns, but it went ahead and removed the newlines too. How do I do what I am trying to do? Update: based on an answer, this seems to do it:
start -rso prog.txt prog.exe

It seems this is due to how PowerShell redirects output. The output is split into separate lines wherever a newline is detected, but when PowerShell joins the lines again, it uses carriage return plus newline (the Windows default).
This should not be an issue though, if you redirect the output directly. So this should give you the expected behavior:
Start-Process prog.exe -RedirectStandardOutput prog.txt
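For comparison outside PowerShell, the same idea (hand the child's stdout straight to the file, with no text layer in between) can be sketched in Python. This is only an illustration under assumptions: run_to_file is a hypothetical helper name, and the child process is a Python one-liner standing in for prog.exe.

```python
import subprocess
import sys


def run_to_file(argv, out_path):
    """Run argv and write its stdout to out_path byte for byte."""
    # Passing the open binary file as stdout connects the child's output
    # directly to the file, so no line splitting or rejoining happens.
    with open(out_path, "wb") as out:
        subprocess.run(argv, stdout=out, check=True)


# Stand-in for prog.exe: writes two LF-terminated lines as raw bytes.
CHILD = [
    sys.executable,
    "-c",
    r"import sys; sys.stdout.buffer.write(b'North East\nSouth West\n')",
]

if __name__ == "__main__":
    run_to_file(CHILD, "prog.txt")
    with open("prog.txt", "rb") as f:
        print(f.read())
```

Because the file is opened in binary mode and given to the child directly, the parent never sees the output as text, which is the same reason -RedirectStandardOutput avoids the added carriage returns.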

This works for me. Note that Out-File defaults to UTF-16 encoding, unlike Set-Content.
"hi`nthere`n" | out-file file -NoNewline
format-hex file
Path: C:\users\admin\foo\file
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 FF FE 68 00 69 00 0A 00 74 00 68 00 65 00 72 00 .þh.i...t.h.e.r.
00000010 65 00 0A 00 e...
Ok, my first Go program:
(go run .) -join "`n" | set-content file -NoNewline
The problem is that Windows programs usually output carriage return and line feed. The version below would work fine on Windows, and the version without carriage returns would work in PowerShell on Linux or macOS.
package main
import "fmt"
func main() {
fmt.Print("North East\r\n")
fmt.Print("South West\r\n")
}
go run . > file
Actually this problem is resolved in PowerShell 7. The file will end up having \r\n even if the Go code doesn't.

Related

Issue with PowerShell ConvertTo-Json adding un-printable characters to beginning of file

I am having a problem with a PowerShell ConvertTo-Json command. The resulting file has two non-printable characters as the first two characters of the file. Using Format-Hex, the return values are:
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 FF FE 5B 00 0D 00 0A 00 20 00 20 00 20 00 20 00 .þ[..... . . . .
00000010 7B 00 0D 00 0A 00 20 00 20 00 20 00 20 00 20 00 {..... . . . . .
...
The bad characters are the FF and FE in the 00 and 01 positions. The command I am using to generate the file is the following:
$search.resources.attributes |select $Object.Attributes.ID |Sort-Object -Property displayname | ConvertTo-Json | Out-File ([Environment]::GetFolderPath("Desktop")+"\RL_Identities.json")
The $search is a result of an Invoke-RestMethod call.
I am importing this file into an Oracle CLOB column. This column has a CHECK (COLUMN_NAME IS JSON) check constraint and it fails with the two un-printable characters in the file. If I open the file in Notepad++, do a select all and copy/paste to a new file, the new file loads perfectly because it doesn't have the two characters at the beginning.
Is there any reason the two characters are there? Is it "feature" of the ConvertTo-Json command or could it be coming from the data in the Invoke-RestMethod call? If there is no way to prevent these characters from being there, is there a way to programmatically remove the first two bytes of the file?
Your file uses the "Unicode" (UTF-16LE) character encoding, which is what you get by default in Windows PowerShell when you use > / Out-File.
The first two bytes, FF and FE, make up the so-called BOM (byte-order mark), aka Unicode signature, which identifies the encoding.
Instead, use Set-Content (or Out-File) with the -Encoding parameter to specify the desired encoding.
The caveat is that if you need UTF-8, -Encoding utf8 in Windows PowerShell creates UTF-8 files with a BOM, which not all consumers understand.
In PowerShell (Core) 7+, by contrast, you get BOM-less UTF-8 by default, across all cmdlets (and therefore also with >).
If you're on Windows PowerShell and need to create a BOM-less UTF-8 file, you can use the following workaround via New-Item:
# Creates out.txt with BOM-less UTF-8 encoding.
# Note that the -Value argument must be a single, potentially multi-line
# string and a trailing newline is NOT added.
$null = New-Item -Force out.txt -Value (
ConvertTo-Json 'hü'
)
The sample ConvertTo-Json call results in verbatim "hü". Passing the resulting file to Format-Hex shows that the file has no BOM and that ü (LATIN SMALL LETTER U WITH DIAERESIS, U+00FC) is correctly encoded as UTF-8 byte sequence 0xC3, 0xBC:
Path: C:\Users\jdoe\out.txt
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 22 68 C3 BC 22 "hü"
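On the question of removing the first two bytes programmatically: dropping FF FE alone would still leave UTF-16 data behind, so the file has to be re-encoded. A minimal Python sketch of that conversion follows, assuming the input is UTF-16 with a BOM as described above; the function names are hypothetical.

```python
import codecs


def utf16_to_utf8_no_bom(raw: bytes) -> bytes:
    """Re-encode UTF-16 data (with BOM) as BOM-less UTF-8."""
    # The "utf-16" codec consumes a leading BOM (FF FE or FE FF) and uses it
    # to pick the byte order, so the BOM is not part of the decoded text.
    text = raw.decode("utf-16")
    # Plain "utf-8" (unlike "utf-8-sig") never writes a BOM.
    return text.encode("utf-8")


def starts_with_utf16le_bom(raw: bytes) -> bool:
    """True if the data begins with the bytes FF FE."""
    return raw.startswith(codecs.BOM_UTF16_LE)


# The same sample text as above, encoded the way Out-File would write it.
sample = codecs.BOM_UTF16_LE + '"hü"'.encode("utf-16-le")
print(starts_with_utf16le_bom(sample))
print(utf16_to_utf8_no_bom(sample).hex(" "))
```

The second line printed is the hex form of the converted bytes, which can be compared with the Format-Hex output shown above.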

How to open a binary pipe with Powershell

As far as I have read Powershell can not redirect input streams. Instead one has to use Get-Content to pipe the result to the target program. But this seems to create text streams.
I tried to pipe binary data to plink:
Get-Content client.zip | & 'C:\Program Files (x86)\PuTTY\plink.exe' unix nop
The target system 'unix' is a Debian with a fixed command in the authorized_keys file.
These are the first bytes of the file I tried to transfer:
00000000 50 4b 03 04 0a 00 00 00 00 00 6f 4a 59 50 c8 cb |PK........oJYP..|
And this is what arrived on the target system:
00000000 50 4b 03 04 0d 0a 00 00 00 00 00 6f 4a 59 50 3f |PK.........oJYP?|
'0a' gets replaced by '0d 0a'. I am not sure, but I suppose Get-Content does this.
How to pipe binary data with Powershell?
I have already installed PowerShell 6. I have already tried the options -AsByteStream, -ReadCount and -Raw, and I get many different funny results. But nothing gives me just an exact copy of the zip file. Where is the option "--stop-doing-anything-with-my-file"?
I think I got it myself. This seems to do what I want:
Start-Process 'C:\Program Files (x86)\PuTTY\plink.exe' -ArgumentList "unix nop" -RedirectStandardInput .\client.zip -NoNewWindow -Wait
Give this a try:
# read binary
$bytes = [System.IO.File]::ReadAllBytes('client.zip')
# pipe all Bytes to external prg
$bytes | & 'C:\Program Files (x86)\PuTTY\plink.exe' unix nop
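For reference, what a byte-exact pipe looks like can be sketched in Python, where a file opened in binary mode is handed to the child process as its stdin. This is an illustration only: pipe_file_to is a hypothetical helper, and the child is a small Python command standing in for plink that echoes its input back as hex.

```python
import subprocess
import sys


def pipe_file_to(argv, path):
    """Feed the file at path to argv's stdin as raw bytes; return its stdout."""
    with open(path, "rb") as src:
        # stdin is the binary file handle itself, so no newline translation
        # or text decoding can touch the data on the way in.
        done = subprocess.run(argv, stdin=src, stdout=subprocess.PIPE, check=True)
    return done.stdout


# Stand-in for plink: echoes stdin back as hex so the bytes can be compared.
HEX_ECHO = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.buffer.read().hex())",
]
```

With a file that starts with the bytes from the question (50 4b 03 04 0a ...), the echoed hex still contains 0a on its own, not 0d 0a, because nothing between the file and the child treats the data as text.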

J1939 RTR Issue

I have an issue with RTR frames using candump and cansend.
Dumping the broadcasted data is no issue.
Architecture -
Raspberry pi with a pican shield reading data from a J1939 simulator.
I run candump to receive all messages on the bus. I then get an ack frame back from the simulator when I execute a cansend for PGN FEEC. I'm requesting a preprogrammed VIN but I get nothing back. Here is what I'm seeing from candump:
can0 18FEF500 [8] 7D FF FF 40 25 4B FF FF '}..#%K..'
can0 18FEE900 [8] D1 4B 03 00 D1 4B 03 00 '.K...K..'
can0 18FEF700 [8] FF FF FF FF E0 01 FF FF '........'
can0 18FECA00 [8] 03 FF 00 00 00 00 00 00 '........'
can0 00FEEC00 [0] remote request
can0 18E80000 [8] 01 FF FF FF FF EC FE 00 '........'
can0 0CF00300 [8] FF 7D 7D FF FF FF FF FF '.}}.....'
can0 18FE6C00 [8] FF FF FF FF FF FF 80 7D '.......}'
can0 0CF00400 [8] FF FF 7D 80 7D FF FF FF '..}.}...'
The E800 PGN is a standard ack message.
And message I am sending while candump is running:
cansend can0 00feec00#r
Basically, I'm not getting the PGN for VIN back. Any ideas?
Turns out there are a couple of issues here.
1- #r is not supported with J1939
2- you don't request pgns by asking for that pgn directly. the method is to send data to a specific pgn which handles requests. example below:
EA 00 is the PGN to send data to. Inside the data message lives the pgn we want to request (LSB) so PGN FEE5 is now E5FE. Three bytes are required which is why 00 is in the message below.
Here is the working request for Engine Hours:
cansend can0 18EA00FF#E5FE00
and the response:
21 00 00 00 8F 01 00 00
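How the request frame is put together can be sketched in Python. This is a hedged illustration, not a reference implementation: j1939_request is a hypothetical helper, and the defaults (priority 6, destination 0x00, source 0xFF) are assumptions read off the 18EA00FF identifier used above.

```python
REQUEST_PGN = 0xEA00  # the PGN that handles requests, as named in the answer


def j1939_request(pgn, dest=0x00, src=0xFF, priority=6):
    """Build a cansend-style 'ID#DATA' string requesting the given PGN."""
    # 29-bit identifier: priority | PDU format | PDU specific | source address.
    # For the request PGN the PDU-specific byte carries the destination.
    can_id = (priority << 26) | ((REQUEST_PGN >> 8) << 16) | (dest << 8) | src
    # Payload: the requested PGN as three bytes, least significant byte first.
    data = pgn.to_bytes(3, "little")
    return f"{can_id:08X}#{data.hex().upper()}"


print(j1939_request(0xFEE5))  # the Engine Hours request from the answer
print(j1939_request(0xFEEC))  # the same request shape for the VIN PGN
```

The first call reproduces the 18EA00FF#E5FE00 frame from the answer. By the same rule, the VIN PGN FEEC from the question becomes the payload ECFE00.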

Detect actual charset encoding in UTF

Need good tool to detect encoding of the strings using some kind of mapping or heuristic method.
For example String: áÞåàÐÝØÒ ÜÝÞÓÞ ßàØÛÞÖÕÝØÙ Java, ÜÞÖÝÞ ×ÐÝïâì Òáî ÔÞáâãßÝãî ßÐÜïâì
Expected: сохранив много приложений Java, можно занять всю доступную память
The encoding is "ISO8859-5". When I try to detect it with the libs below, the result is "UTF-8". Obviously the string was saved as UTF-8, but is there any heuristic way, using symbol mapping, to analyse the characters and match them with the correct encoding?
Used usual encoding detect libs:
- enca (aptitude install enca)
- chardet (aptitude install chardet)
- uchardet (aptitude install uchardet)
- http://tika.apache.org/
- http://npmjs.com/package/detect-encoding
- libencode-detect-perl
- http://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html
- http://jchardet.sourceforge.net/
- http://grepcode.com/snapshot/repo1.maven.org/maven2/com.googlecode.juniversalchardet/juniversalchardet/1.0.3/
- http://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/src/
- http://userguide.icu-project.org/
- http://site.icu-project.org
You need to unwrap the UTF-8 encoding and then pass it to a character-encoding detection library.
If random 8-bit data is encoded into UTF-8 (assuming an identity mapping, i.e. a C4 byte is assumed to represent U+00C4, as is the case with ISO-8859-1 and its superset Windows 1252), you end up with something like
Source: 8F 0A 20 FE 65
Result: C2 8F 0A 20 C3 BE 65
(because the UTF-8 encoding of U+008F is C2 8F, and U+00FE is C3 BE). You need to revert this encoding in order to obtain the source string, so that you can then identify its character encoding.
In Python, something like
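#!/usr/bin/env python3
import chardet  # third-party package: pip install chardet

mystery = 'áÞåàÐÝØÒ ÜÝÞÓÞ ßàØÛÞÖÕÝØÙ Java, ÜÞÖÝÞ ×ÐÝïâì Òáî ÔÞáâãßÝãî ßÐÜïâì'
# Undo the mistaken decoding by mapping each character back to one byte,
# then let chardet guess the encoding of the raw bytes.
print(chardet.detect(mystery.encode('cp1252')))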
Result:
{'confidence': 0.99, 'encoding': 'ISO-8859-5'}
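The unwrapping step itself, independent of chardet, can be checked on the five-byte example given earlier. The sketch below uses only the Python standard library. It uses the latin-1 codec rather than cp1252 for the identity mapping, because Python's cp1252 codec leaves byte 8F undefined.

```python
source = bytes.fromhex("8F 0A 20 FE 65")

# Wrong step: each byte is taken as the code point of the same value
# (the ISO-8859-1 identity mapping) and the text is then stored as UTF-8.
wrapped = source.decode("latin-1").encode("utf-8")

# Revert: decode the UTF-8, then map each code point back to a single byte.
unwrapped = wrapped.decode("utf-8").encode("latin-1")

print(wrapped.hex(" ").upper())
print(unwrapped == source)
```

The first line printed is the "Result" byte sequence from the example, and the comparison confirms that reverting the encoding recovers the original source bytes.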
On the Unix command line,
vnix$ echo 'áÞåàÐÝØÒ ÜÝÞÓÞ ßàØÛÞÖÕÝØÙ Java, ÜÞÖÝÞ ×ÐÝïâì Òáî ÔÞáâãßÝãî ßÐÜïâì' |
> iconv -t cp1252 | chardet
<stdin>: ISO-8859-5 (confidence: 0.99)
or iconv -t cp1252 file | chardet to decode a file and pass it to chardet.
(For this to work successfully at the command line, you need to have your environment properly set up for transparent Unicode handling. I am assuming that your shell, your terminal, and your locale are adequately configured. Try a recent Ubuntu Live CD or something if your regular environment is stuck in the 20th century.)
In the general case, you cannot know that the incorrectly applied encoding is CP 1252 but in practice, I guess it's going to be correct (as in, yield correct results for this scenario) most of the time. In the worst case, you would have to loop over all available legacy 8-bit encodings and try them all, then look at the one(s) with the highest confidence rating from chardet. Then, the example above will be more complex, too -- the mapping from legacy 8-bit data to UTF-8 will no longer be a simple identity mapping, but rather involve a translation table as well (for example, a byte F5 might correspond arbitrarily to U+0092 or whatever).
(Incidentally, iconv -l spits out a long list of aliases, so you will get a lot of fundamentally identical results if you use that as your input. But here is a quick ad-hoc attempt at fixing your slightly weird Perl script.
#!/bin/sh
iconv -l |
grep -F -v -e UTF -e EUC -e 2022 -e ISO646 -e GB2312 -e 5601 |
while read enc; do
echo 'áÞåàÐÝØÒ ÜÝÞÓÞ ßàØÛÞÖÕÝØÙ Java, ÜÞÖÝÞ ×ÐÝïâì Òáî ÔÞáâãßÝãî ßÐÜïâì' |
iconv -f utf-8 -t "${enc%//}" 2>/dev/null |
chardet | sed "s%^[^:]*%${enc%//}%"
done |
grep -Fwive ascii -e utf -e euc -e 2022 -e None |
sort -k4rn
The output still contains a lot of chaff, but once you remove that, the verdict is straightforward.
It makes no sense to try any multi-byte encodings such as UTF-16, ISO-2022, GB2312, EUC_KR etc in this scenario. If you convert a string into one of these successfully, then the result will most definitely be in that encoding. This is outside the scope of the problem outlined above: a string converted from an 8-bit encoding into UTF-8 using the wrong translation table.
The ones which returned ascii definitely did something wrong; most of them will have received an empty input, because iconv failed with an error. In a Python script, error handling would be more straightforward.)
The string
сохранив много приложений Java, можно занять всю доступную память
is encoded in ISO8859-5 as bytes
E1 DE E5 E0 D0 DD D8 D2 20 DC DD DE D3 DE 20 DF E0 D8 DB DE D6 D5 DD D8 D9 20 4A 61 76 61 2C 20 DC DE D6 DD DE 20 D7 D0 DD EF E2 EC 20 D2 E1 EE 20 D4 DE E1 E2 E3 DF DD E3 EE 20 DF D0 DC EF E2 EC
The string
áÞåàÐÝØÒ ÜÝÞÓÞ ßàØÛÞÖÕÝØÙ Java, ÜÞÖÝÞ ×ÐÝïâì Òáî ÔÞáâãßÝãî ßÐÜïâì
is encoded in ISO-8859-1 as bytes
E1 DE E5 E0 D0 DD D8 D2 20 DC DD DE D3 DE 20 DF E0 D8 DB DE D6 D5 DD D8 D9 20 4A 61 76 61 2C 20 DC DE D6 DD DE 20 D7 D0 DD EF E2 EC 20 D2 E1 EE 20 D4 DE E1 E2 E3 DF DD E3 EE 20 DF D0 DC EF E2 EC
Look familiar? They are the same bytes, just interpreted differently by different charsets.
Any tool that would look at these bytes would not be able to tell you the charset automatically, as they are perfectly valid bytes in both charsets. You would have to tell the tool which charset to use when interpreting the bytes.
Any tool that tells you this particular byte sequence is encoded as UTF-8 is wrong. These are NOT valid UTF-8 bytes.
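Both claims are easy to verify. Here is a short Python sketch using the first word of each string; is_valid_utf8 is a hypothetical helper name.

```python
cyrillic = "сохранив"
mojibake = "áÞåàÐÝØÒ"

# Encode each string with the charset it is claimed to be in.
as_iso_8859_5 = cyrillic.encode("iso-8859-5")
as_iso_8859_1 = mojibake.encode("iso-8859-1")

print(as_iso_8859_5.hex(" ").upper())
print(as_iso_8859_5 == as_iso_8859_1)


def is_valid_utf8(raw: bytes) -> bool:
    """True if raw decodes as UTF-8 without errors."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(as_iso_8859_5))
```

The two encoded byte strings compare equal, and the strict UTF-8 decode fails on them: E1 announces a three-byte sequence, but the next byte, DE, is not a continuation byte.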

Why does my Perl CGI program return a server error?

I recently got into learning cgi and I set up an Ubuntu server in vbox. The first program I wrote was in Python using vim through ssh. Then I installed Eclipse on my Windows 7 station and created the exact same Perl file; just a simple hello world deal.
I tried running it, and I was getting a 500 on it, while the Python code in the same dir (/usr/lib/cgi-bin) was showing up fine. Frustrated, I checked and triple-checked the permissions and that it began with #!/usr/bin/perl. I also checked whether or not AddHandler was set to .pl. Everything was set fine, and on a whim I decided to write the same exact code within the server using vim like I did with the Python file.
Lo and behold, it worked. I compared them, thinking I'd gone mad, and they are exactly the same. So, what's the deal? Why is a file made in Windows 7 on Eclipse different than a file made in Ubuntu server with vim? Do they have different binary headers or something? This can really affect my development environment.
#!/usr/bin/perl
print "Content-type: text/html\n\n";
print "Testing.";
Apache error log:
[Tue Aug 07 12:32:02 2012] [error] [client 192.168.1.8] (2)No such file or directory: exec of '/usr/lib/cgi-bin/test.pl' failed
[Tue Aug 07 12:32:02 2012] [error] [client 192.168.1.8] Premature end of script headers: test.pl
[Tue Aug 07 12:32:02 2012] [error] [client 192.168.1.8] File does not exist: /var/www/favicon.ico
This is the continuing error I get.
I think you have some spurious \r characters on the first line of your Perl script when you write it in Windows.
For example I created the following file on Windows:
#!/usr/bin/perl
code goes here
When viewed with hexdump it shows:
00000000 23 21 2f 75 73 72 2f 62 69 6e 2f 70 65 72 6c 0d |#!/usr/bin/perl.|
00000010 0a 0d 0a 63 6f 64 65 20 67 6f 65 73 20 68 65 72 |...code goes her|
00000020 65 0d 0a |e..|
00000023
Notice the 0d bytes (\r), in particular the one right after perl on the first line. If I try to run this using ./test.pl I get:
zsh: ./test.pl: bad interpreter: /usr/bin/perl^M: no such file or directory
Whereas if I write the same code in Vim on a UNIX machine I get:
00000000 23 21 2f 75 73 72 2f 62 69 6e 2f 70 65 72 6c 0a |#!/usr/bin/perl.|
00000010 0a 63 6f 64 65 20 67 6f 65 73 20 68 65 72 65 0a |.code goes here.|
00000020
You can fix this in one of several ways:
You can probably make your editor save "UNIX line endings" or similar.
You can run dos2unix or similar on the file after saving it.
You can use sed: sed -e 's/\r//g' or similar.
Your Apache logs should be able to confirm this (if they don't, crank up the logging a bit on your development server).
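The check and the fix described above can also be sketched in Python, working on the raw bytes of the script. The function names are hypothetical, and the sample is the same three-line file shown in the hexdump.

```python
def shebang_has_cr(script: bytes) -> bool:
    """True if the first line starts with #! and ends with a carriage return."""
    first_line = script.split(b"\n", 1)[0]
    return first_line.startswith(b"#!") and first_line.endswith(b"\r")


def to_unix_newlines(script: bytes) -> bytes:
    """Convert CRLF line endings to LF, as dos2unix does for this case."""
    return script.replace(b"\r\n", b"\n")


# The file as saved on Windows, matching the first hexdump above.
windows_copy = b"#!/usr/bin/perl\r\n\r\ncode goes here\r\n"

print(shebang_has_cr(windows_copy))
print(shebang_has_cr(to_unix_newlines(windows_copy)))
```

A stray \r on the shebang line is what makes the kernel look for an interpreter literally named /usr/bin/perl^M, which matches the "No such file or directory" in the Apache error log.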
Sure, it can.
One environment might have a module installed that the other might not.
Perl might be installed in different locations in the two environments.
The environments might have different versions of Perl.
The environments might have different operating systems.
The permissions might be set up incorrectly in one of the environments.
etc
But instead of speculating wildly like this, why don't you check the error log for what error you actually got?
No, they are just text files. Of course, it's possible to write unportable programs, trivially by using system() or other similar services which depend on the environment.