I'm using socket communication between my PC and an SPS (PLC) unit. They exchange bytes to communicate, using Python byte strings.
On the PC side I have to take numeric commands (integers), put them into the byte string and send it via the socket.
Values such as 8, 9, 12 and many more produce an error on the SPS side, because they seem to be changed by UTF-8: 9 --> \t, 10 --> \n.
How can I keep this data as plain hex bytes, without any UTF encoding?
my_b = bytearray()
my_b.append(8)  # fill with byte values
my_b.append(9)
my_b.append(10)
my_b.append(11)
my_b.append(12)
my_b.append(13)
print (my_b)
>> bytearray(b'\x08\t\n\x0b\x0c\r')
They are not being changed. \t, \n and \r are simply built-in short-hand for expressing the byte values of 9, 10, and 13. Those values are ASCII control characters (tab, line-feed and carriage-return, specifically).
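You can check this directly in Python; the escape sequences and the numeric values compare as equal because they are the same bytes:
print(b'\x09' == b'\t')     # True
print(b'\x0a' == b'\n')     # True
print(b'\x0d' == b'\r')     # True
print(bytes([9, 10, 13]))   # b'\t\n\r'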
To prove this, add these lines to the end of your code snippet:
with open("/tmp/foo", "wb") as f:
    f.write(my_b)
Then dump /tmp/foo as hex bytes:
od -tx1 /tmp/foo
0000000 08 09 0a 0b 0c 0d
0000006
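The same is true when the bytearray goes over the socket: sendall() transmits the raw byte values, regardless of how print() chooses to display them. A minimal sketch, assuming a made-up SPS host and port:
import socket

my_b = bytearray([8, 9, 10, 11, 12, 13])   # the command bytes from the question
print(my_b.hex())                          # '08090a0b0c0d' - view them as hex if needed

# hypothetical SPS host and port, purely for illustration
with socket.create_connection(('192.168.0.10', 2000)) as s:
    s.sendall(my_b)                        # exactly these six byte values are sent, no UTF-8 involved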
I will briefly explain the environment:
I need to work on a Win2k8 server with PowerShell 4.0.
I want to get some information via SNMP (specifically the printer type and the printer MAC address):
$SNMP = new-object -ComObject olePrn.OleSNMP
$SNMP.open($P_IP,"public",2,3000)
$PType = $SNMP.get(".1.3.6.1.2.1.25.3.2.1.3.1")
$PMac = $SNMP.get(".1.3.6.1.2.1.2.2.1.6.2")
echo $PType
echo $PMac
So the output looks like this (as an example):
$PType = HP Officejet Pro 251dw Printer
$PMac = ÓÁÔ*
So, first of all I checked whether I was using the right OID, using the SnmpSoft Company command-line tool. There, the output looked fine:
OID=.1.3.6.1.2.1.2.2.1.6.2
Type=OctetString
Value= A0 D3 C1 D4 2A 95 ....*.
Alright, so I checked what datatype this OID value has: it's an OctetString. In the next steps, I searched for ways to transform this octet-string value into some readable hex, so far without any progress. I tried to transform it into bytes this way:
$bytes = [System.Text.Encoding]::Unicode.GetBytes($PMac)
[System.Text.Encoding]::ASCII.GetString($bytes)
echo $bytes
But the output just confuses me:
160
0
211
0
193
0
212
0
42
0
34
32
I tried to interpret this output without any success. Google can't help me anymore, because by now I don't even understand how or what to search for.
So here I am, hoping to get some help or advice on how I can change the output of this query into something readable.
It's an encoding problem.
1.3.6.1.2.1.2.2.1.6 is the interface physical address, so I would expect the value to be the MAC address of the interface. Your command-line result begins with A0-D3-C1, which is an HP MAC address range, so it's consistent. Your printer's MAC address is presumably A0 D3 C1 D4 2A 95? You didn't state that, so you're leaving me to guess.
I suspect that $PMac is supposed to be a [byte[]] (byte array), but the output is converting it to a string and PowerShell's output system is interpreting it as characters.
Example:
PS C:\> [byte[]]$bytes = 0xa0, 0xd3, 0xc1, 0xd4, 0x2a, 0x95
PS C:\> [System.Text.Encoding]::Default.GetString($bytes)
ÓÁÔ*•
You probably need to do something like this:
$MAC = [System.Text.Encoding]::Default.GetBytes($PMac) | ForEach-Object {
    $_.ToString('X2')
}
$MAC = $MAC -join '-'
You may want to use [System.Text.Encoding]::ASCII.GetBytes($PMac) instead, since raw SNMP is supposed to use ASCII encoding. I've no idea what olePrn.OleSNMP uses.
You might also look at one of the SNMP PowerShell modules on the PowerShell Gallery. That will be much easier than dealing with COM objects in PowerShell.
I also came across this page on #SNMP's handling of OCTET STRING. #SNMP is a .Net SNMP library, and OCTET STRING appears to be what the underlying type is for this OID. The page describes some of the difficulties of working with this particular object type with .Net. You could also use this library for developing your own Cmdlets in PowerShell; it's available through NuGet.
The output you got is very nearly your expected MAC address
160 0 211 0 193 0 212 0 42 0 34 32
160 is decimal for hexadecimal 0xA0
211 is 0xD3
193 is 0xC1
The additional zeros between each byte may have been added during the Unicode.GetBytes call, which I don't think you'll need to use.
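If it helps, here is a small Python illustration of where both the interleaved zeros and the trailing 34 32 come from, assuming the COM object handed the raw bytes back as a Windows-1252 string (that part is a guess):
raw = bytes([0xA0, 0xD3, 0xC1, 0xD4, 0x2A, 0x95])   # the expected MAC address bytes
text = raw.decode('cp1252')                         # 0x95 becomes the bullet character U+2022
print(list(text.encode('utf-16-le')))               # UTF-16 is what Unicode.GetBytes produces
# [160, 0, 211, 0, 193, 0, 212, 0, 42, 0, 34, 32]  -- exactly the output in the question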
I suspect you'll need to read $PMac as an array of bytes, then do hexadecimal string conversion for each byte. This is probably not the most elegant, but may get the job done:
[byte[]] $arrayofBytes = @(160,211,193)
[string] $hexString = ''
foreach ($b in $arrayofBytes) {
    $hexString += [convert]::ToString($b,16)
    $hexString += ' '
}
Sir,
I have sent an SMS in PDU format through AT commands.
AT+CMGS=18
0011000C912933634241140000AA04D370DA0C
The message is sent successfully. But when I try to send a message with a UDH & UDHL, using the following AT command, it shows me an error:
AT+CMGS=24
0011000C912933634241140000AA05000303020104D370DA0C
What is wrong in my code? Please help me.
Is it 7-bit encoded? I'm trying to solve this problem myself, and the problem is that the UDH requires the message part (04D370DA0C) to be padded (at least in my case).
The text below is from https://en.wikipedia.org/wiki/Concatenated_SMS#PDU_Mode_SMS
the UDH is a total of (number of octets x bit size of octets) 6 x 8 = 48 bits long. Therefore, a single bit of padding has to be prepended to the message. The UDH is therefore (bits for UDH / bits per septet) = (48 + 1)/7 = 7 septets in length.
With a message of "Hello world", the [message] is encoded as
90 65 36 FB 0D BA BF E5 6C 32
as you need to prepend the least significant bits of the next 7bit
character whereas without padding, the [message] would be
C8 32 9B FD 06 DD DF 72 36 19
and the UDL is 7 (header septets) + 11 (message septets) = 18 septets.
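To make the padding concrete, here is a rough Python sketch of GSM 7-bit packing with a configurable number of fill bits. It only handles characters whose GSM septet value equals their ASCII code, which is enough to reproduce the Wikipedia example:
def pack_7bit(text, fill_bits=0):
    # treat each character's ASCII code as its GSM 7-bit septet value
    septets = [ord(c) & 0x7F for c in text]
    stream, nbits = 0, fill_bits          # the fill bits occupy the lowest bit positions
    for s in septets:
        stream |= s << nbits
        nbits += 7
    return bytes((stream >> i) & 0xFF for i in range(0, nbits, 8))

print(pack_7bit('Hello world').hex())               # c8329bfd06dddf723619 (no UDH, no padding)
print(pack_7bit('Hello world', fill_bits=1).hex())  # 906536fb0dbabfe56c32 (6-octet UDH, 1 fill bit)
So with a 6-octet UDH in front, the packed message octets themselves change, and the UDL has to count the header septets plus the message septets.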
I am REALLY confused about the definitions of pack and unpack in Perl.
Below is the excerpt from perl.doc.org
The pack function converts values to a byte sequence containing
representations according to a given specification, the so-called
"template" argument. unpack is the reverse process, deriving some values
from the contents of a string of bytes.
So I get the idea that pack takes human-readable things (such as "A") and turns them into a binary format. Am I wrong in this interpretation?
That is my interpretation, but then the same doc immediately proceeds to give this example, which suggests exactly the opposite of my understanding:
my( $hex ) = unpack( 'H*', $mem );
print "$hex\n";
What am I missing?
The pack function puts one or more things together in a single string. It represents things as octets (bytes) in a way that it can unpack reliably in some other program. That program might be far away (like, the distance to Mars far away). It doesn't matter if it starts as something human readable or not. That's not the point.
Consider some task where you have a numeric ID that's up to about 65,000 and a string that might be up to six characters.
print pack 'S A6', 137, $ARGV[0];
It's easier to see what this is doing if you run the output through a hex dumper:
$ perl pack.pl Snoopy | hexdump -C
00000000 89 00 53 6e 6f 6f 70 79 |..Snoopy|
The first column counts the position in the output so ignore that. Then the first two octets represent the S (short, 'word', whatever, but two octets) format. I gave it the number 137 and it stored that as 0x8900. Then it stored 'Snoopy' in the next six octets.
Now try it with a shorter name:
$ perl test.pl Linus | hexdump -C
00000000 89 00 4c 69 6e 75 73 20 |..Linus |
Now there's a space character at the end (0x20). The name field still takes up six octets. Try it with a longer name:
$ perl test.pl 'Peppermint Patty' | hexdump -C
00000000 89 00 50 65 70 70 65 72 |..Pepper|
Now it truncates the string to fit the six available spaces.
Consider the case where you immediately send this through a socket or some other way of communicating with something else. The thing on the other side knows it's going to get eight octets. It also knows that the first two will be the short and the next six will be the name. Suppose the other side stores that in $tidy_little_package. It gets the separate values by unpacking them:
my( $id, $name ) = unpack 'S A6', $tidy_little_package;
That's the idea. You can represent many values of different types in a binary format that's completely reversible. You send that packed string wherever it needs to be used.
I have many more examples of pack in Learning Perl and Programming Perl.
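If a comparison outside Perl helps, Python's struct module does roughly the same job. This is only an analogy: Perl's A6 pads with spaces while Python's 6s pads with NUL bytes, and 'S' here matches '<H' (little-endian unsigned 16-bit) on a typical x86 machine:
import struct

packed = struct.pack('<H6s', 137, b'Snoopy')   # roughly: pack 'S A6', 137, 'Snoopy'
print(packed)                                  # b'\x89\x00Snoopy'
print(struct.unpack('<H6s', packed))           # (137, b'Snoopy')

# and unpack 'H*' simply renders the bytes as hex digits:
print(packed.hex())                            # 8900536e6f6f7079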
I would appreciate if someone could please explain this to me.
I came across this post (not important, just for reference) and saw a base64-encoded token which the guy decoded.
EYl0htUzhivYzcIo+zrIyEFQUE1PQkk= -> t3+(:APPMOBI
I then tried to encode t3+(:APPMOBI again using base64 to see if I would get the same result, but was very surprised to get:
t3+(:APPMOBI - > dDMrKDpBUFBNT0JJ
A completely different token.
I then tried to decode the original token EYl0htUzhivYzcIo+zrIyEFQUE1PQkk= and got t3+(:APPMOBI with seemingly random characters in between. (I got ◄ëtå╒3å+╪═┬(√:╚╚APPMOBI; this could be wrong, I did it quickly off the top of my head.)
What is the reason for the difference between the tokens? Were they not supposed to be the same?
The whole purpose of base64 encoding is to encode binary data into a text representation so that it can be transmitted over the network or displayed without corruption. But corruption is ironically what happened with the original post you were referring to:
EYl0htUzhivYzcIo+zrIyEFQUE1PQkk= does NOT decode to t3+(:APPMOBI
instead, it contains some binary bytes (not random, by the way) that you correctly showed. So the problem was with the original post, where either the author or the tool/browser that they used "cleaned up", or rather corrupted, the decoded binary data.
There is always a one-to-one relationship between encoded and decoded data (provided the same "base" is used, i.e. the same set of characters is used for the encoded text).
t3+(:APPMOBI indeed will be encoded into dDMrKDpBUFBNT0JJ
The problem is in the encoding that displayed the output to you, or in the encoding that you used to input the data to base64. This is actually the problem that base64 encoding was invented to help solve.
Instead of trying to copy and paste the non-ASCII characters, save the output as a binary file, then examine it. Then, encode the binary file. You'll see the same base64 string.
c:\TEMP>type b.txt
EYl0htUzhivYzcIo+zrIyEFQUE1PQkk=
c:\TEMP>base64 -d b.txt > b.bin
c:\TEMP>od -t x1 b.bin
0000000 11 89 74 86 d5 33 86 2b d8 cd c2 28 fb 3a c8 c8
0000020 41 50 50 4d 4f 42 49
c:\TEMP>base64 -e b.bin
EYl0htUzhivYzcIo+zrIyEFQUE1PQkk=
od is a tool (octal dump) that outputs binary data using hexadecimal notation, and shows each of the bytes.
EDIT:
You asked about a different string in your comments, dDMrKDpBUFBNT0JJ, and why does that decode to the same thing? Well, it doesn't decode to the same thing. It decodes to this string of bytes: 74 33 2b 28 3a 41 50 50 4d 4f 42 49. Your original string decoded to this string of bytes: 11 89 74 86 d5 33 86 2b d8 cd c2 28 fb 3a c8 c8 41 50 50 4d 4f 42 49.
Notice the differences: your original string decoded to 23 bytes, your second string decoded to only 12 bytes. The original string included non-ASCII bytes like 11, d5, d8, cd, c2, fb, c8, c8. These bytes don't print the same way on every system. You referred to them as "random bytes", but they're not. They're part of the data, and base64 is designed to make sure they can be transmitted.
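You can verify all of this with a few lines of Python as well (illustrative only; any base64 implementation will agree):
import base64

print(base64.b64decode('EYl0htUzhivYzcIo+zrIyEFQUE1PQkk=').hex())
# '11897486d533862bd8cdc228fb3ac8c84150504d4f4249' - 23 bytes, several of them non-printable

print(base64.b64decode('dDMrKDpBUFBNT0JJ').hex())
# '74332b283a4150504d4f4249' - 12 bytes, all printable ASCII: t3+(:APPMOBI

print(base64.b64encode(b't3+(:APPMOBI'))
# b'dDMrKDpBUFBNT0JJ'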
I think to understand why these strings are different, you need to first understand the nature of character data, what base64 is, and why it exists. Remember that computers work only on numbers, but people need to work with familiar concepts like letters and digits. So ASCII was created as an "encoding" standard that represents a little number (we call this little number a "byte") as a letter or a digit, so that we humans can read it. If we line up a group of bytes, we can spell out a message. 41 50 50 4d 4f 42 49 are the bytes that represent the word APPMOBI. We call a group of bytes like this a "string".
Every letter from A-Z and every digit from 0-9 has a number specified in ASCII that represents it. But there are many extra numbers that are not in the standard, and not all of those represent visible or sensible letters or digits. We say they're non-printable. Your longer message includes many bytes that aren't printable (you called them random.)
When a computer program like email is dealing with a string, if the bytes are printable ASCII characters, it's easy. The email program knows what to do with them. But if your bytes instead represent a picture, the bytes could have values that aren't ASCII, and various email programs won't know what to do with them. Base64 was created to take all kinds of bytes, both printable and non-printable bytes, and translate them into a string of bytes representing only printable letters. Because they're all printable, a program like email or a web server can easily handle them, even if it doesn't know that they actually contain a picture.
Here's the decode of your new string:
c:\TEMP>type c.txt
dDMrKDpBUFBNT0JJ
c:\TEMP>base64 -d c.txt
t3+(:APPMOBI
c:\TEMP>base64 -d c.txt > c.bin
c:\TEMP>od -t x1 c.bin
0000000 74 33 2b 28 3a 41 50 50 4d 4f 42 49
0000014
c:\TEMP>type c.bin
t3+(:APPMOBI
c:\TEMP>
What "stack" of bad encoding would produce the following bytes of weirdness for the string "cinéma télédiffusion"? (I left out the space character, hex: 20)
cinÃ%ma
in HEX: 63 69 6E C3 83 25 6D 61
mapped: c i n ---�---- m a
tÃclÃcdiffusion
in HEX: 74 C3 83 63 6C C3 83 63 64 69 66 66 75 73 69 6F 6E
mapped: t ---�---- l ---�---- d i f f u s i o n
The ---�---- parts represent the bytes that aren't right.
I considered the idea "What if it was a messed-up transcoding? How about a double encoding?", but, looking at http://www.fileformat.info/info/unicode/char/00e9/charset_support.htm (and the code page edition, too), I noted that there are no encodings that could possibly turn é into the hex bytes 25 or 63. It doesn't even look like double-UTF-8 encoding at this point, because http://en.wikipedia.org/wiki/UTF-8 makes it clear that bytes following a C3 would need to have their first bits set to 10xxxxxx.
How could some program have turned the accented é into an "Ã followed by %" as well as "Ã followed by c"? I want to trace back the history of the misencoding so that I can try to come up with something that can take steps at repairing the mangled strings.
There also exists the possibility that the é weren't ever é to begin with, but I can't fathom what kind of typo someone could have made in the same phrase to get two different versions of é that eventually get misencoded into two completely different sets of bytes.
Extra context details: I find these mangled strings inside an XML file. The file has no <?xml version="1.0"?> header, so it's presumed to be UTF-8. There are nodes containing phrases with perfectly good é characters in them, at the same time as there are nodes containing phrases with mangled é characters.
iconv-and-family don't do anything at all to help this situation, as far as I've attempted.
A couple of trailing considerations that I now hold are: Should I suspect MySQL and its infamously lazy character set transcodings? Could it be somebody's really badly written custom encoding function as they exported the XML?
The encoding looks a bit strange:
Taking the é from cinéma, the UTF-8 encoding gives:
é = C3 A9
where you got:
C3 83 25
So if it were double-encoded, the following should happen:
c3: Ã -> c3 83
a9: © -> c2 a9
But this does not explain the 25 in your result.
25: %
So the question is: was it encoded once, then unknown characters like © were replaced by %, and then it was encoded a second time?
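Here is a small Python sketch of that suspected chain; the replacement of the "unknown" character is pure guesswork, but it does reproduce the observed bytes:
# step 1: the correct UTF-8 encoding of the accented character
b1 = 'é'.encode('utf-8')          # b'\xc3\xa9'

# step 2: those bytes get mis-read as Latin-1 / Windows-1252, producing two characters
s = b1.decode('latin-1')          # 'Ã©'

# hypothetical step: some tool substitutes the "unknown" character © with '%' (or with 'c')
s = s.replace('©', '%')

# step 3: the result is encoded to UTF-8 a second time
print(s.encode('utf-8').hex())    # 'c38325' -- the C3 83 25 seen in "cinÃ%ma"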