Decoding service responses to read Chinese or Arabic on Nokia phones using AT commands

I am developing an application for GSM modems using AT commands. I have a problem reading Unicode messages or USSD responses, for example when dcs=17 rather than 7, 15, or 72.
I have been looking for a solution for two years, to no avail. I found a partial workaround using a Chinese phone, which can read the Chinese coding, but Nokia phones do not support the Arabic or Chinese codec, and the service responses appear as incomprehensible symbols.
Example:
+CUSD: 0,"ar??c
?J <10???#d#??? #0#??#D? ?Z?xb
# $#?#?#Z##?? #-#H?#???#b##$? #3#h?P???#??(??",17
But when using the Chinese phone, the service response displays correctly 100% of the time.
How do I handle this encoding on Nokia or other phones?

The character set used for strings in AT commands is controlled by AT+CSCS. The default value is "GSM", which cannot represent anything outside a relatively limited set of characters.
In your case, to read Arabic or Chinese, "UTF-8" is probably the best choice, although "UCS-2" can also be used (it will require a little post-processing though).
Below you can see how the selected character set affects strings. I have kept the phone number of my Chinese teacher from when I lived in Taiwan, stored as "teacher" in Chinese (lǎo shī). The actual phone number is stripped out here, but otherwise the following is a verbatim copy of the responses from my phone:
$ echo at+cscs? | atinout - /dev/ttyACM0 -
+CSCS: "GSM"
OK
$ echo at+cpbr=403 | atinout - /dev/ttyACM0 -
+CPBR: 403,"",145,"??/M"
OK
$ echo at+cscs=? | atinout - /dev/ttyACM0 -
+CSCS: ("GSM","IRA","8859-1","UTF-8","UCS2")
OK
$ echo 'at+cscs="UTF-8"' | atinout - /dev/ttyACM0 -
OK
$ echo at+cscs? | atinout - /dev/ttyACM0 -
+CSCS: "UTF-8"
OK
$ echo at+cpbr=403 | atinout - /dev/ttyACM0 -
+CPBR: 403,"",145,"老師/M"
OK
$ echo 'at+cscs="UCS2"; +cpbr=403' | atinout - /dev/ttyACM0 -
+CPBR: 403,"",145,"80015E2B002F004D"
OK
$ echo 'at+cscs=?' | atinout - /dev/ttyACM0 -
+CSCS: ("00470053004D","004900520041","0038003800350039002D0031","005500540046002D0038","0055004300530032")
OK
$ echo 'at+cscs="005500540046002D0038"' | atinout - /dev/ttyACM0 -
OK
$ echo 'at+cscs=?' | atinout - /dev/ttyACM0 -
+CSCS: ("GSM","IRA","8859-1","UTF-8","UCS2")
OK
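As an aside (not part of the original transcript), the UCS2 hex string returned by +CPBR above is simply big-endian UTF-16 written as hex pairs, so it can be decoded in a couple of lines of Python:

ucs2_hex = "80015E2B002F004D"
text = bytes.fromhex(ucs2_hex).decode("utf-16-be")
print(text)  # -> 老師/M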
Update: upon checking 3GPP TS 27.007, the string for the +CUSD: <m>[,<str>,<dcs>] unsolicited result code is not a regular string but has its own encoding:
<str>: string type USSD-string (when <str> parameter is not given,
network is not interrogated):
- if <dcs> indicates that 3GPP TS 23.038 [25] 7 bit default alphabet is used:
- if TE character set other than "HEX" (refer command Select TE Character
Set +CSCS): MT/TA converts GSM alphabet into current TE character set
according to rules of 3GPP TS 27.005 [24] Annex A
- if TE character set is "HEX": MT/TA converts each 7-bit character of GSM
alphabet into two IRA character long hexadecimal number (e.g. character
Π (GSM 23) is presented as 17 (IRA 49 and 55))
- if <dcs> indicates that 8-bit data coding scheme is used: MT/TA converts each
8-bit octet into two IRA character long hexadecimal number (e.g. octet with
integer value 42 is presented to TE as two characters 2A (IRA 50 and 65))
<dcs>: 3GPP TS 23.038 [25] Cell Broadcast Data Coding Scheme in integer format
(default 0)
You therefore have to first determine if dcs is 7 or 8 bit, and then decode according to the above.
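A very simplified sketch of that dispatch in Python, assuming <str> has arrived as hex pairs (the function names are illustrative, not from the spec). Note that dcs=17 (0x11) falls into the 23.038 coding group "UCS2, preceded by language indication", which is why neither a 7-bit nor an 8-bit decode produces readable text:

def decode_ussd(hex_str, dcs):
    # Simplified DCS dispatch; 3GPP TS 23.038 section 5 has the full
    # coding-group table. dcs=17 (0x11) means "UCS2, preceded by a
    # language indication".
    data = bytes.fromhex(hex_str)
    if dcs == 0x11:
        return data.decode("utf-16-be")  # language indication not stripped here
    if (dcs >> 2) & 0x3 == 0x1:          # general coding group, 8-bit data
        return data.decode("latin-1")
    return unpack_gsm7(data)             # assume 7-bit default alphabet

def unpack_gsm7(data):
    # Unpack packed septets; mapping septet values straight to ASCII is
    # only an approximation of the full GSM 7-bit alphabet table.
    bits, nbits, out = 0, 0, []
    for byte in data:
        bits |= byte << nbits
        nbits += 8
        while nbits >= 7:
            out.append(chr(bits & 0x7F))
            bits >>= 7
            nbits -= 7
    return "".join(out)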
PS, the "USC2 0x81" format is described here. although it should not behave differently from plain UCS2 in this particular case.

Related

postgres: how to count multibyte emoji strings display length in UTF-8

Postgres (v11) counts the red heart ❤️ as two characters, and likewise for other multibyte UTF-8 chars with selector units. Does anyone know how I can get Postgres to count true characters and not the bytes?
For example, I would like both of the examples below to return 1.
select length('❤️') = 2 (Unicode: 2764 FE0F)
select length('🏃‍♂️') = 4 (Unicode: 1F3C3 200D 2642 FE0F)
UPDATE
Thanks to the folks who pointed out that Postgres is correctly counting the Unicode code points, and explained why and how this happens.
I don't see any option other than pre-processing the emoji strings against the official Unicode ranges, in Python or some such, to get the perceived length.
So one way to do this is to ignore all characters in the Variation Selector range and decrement by 1 when you hit the General Punctuation range (which contains the Zero Width Joiner).
This could be converted into a Postgres function.
"""
# For reference, these code pages apply to emojis
Name Range
Emoticons 1F600-1F64F
Supplemental_Symbols_and_Pictographs 1F900-1F9FF
Miscellaneous Symbols and Pictographs 1F300-1F5FF
General Punctuation 2000-206F
Miscellaneous Symbols 2600-26FF
Variation Selectors FE00-FE0F
Dingbats 2700-27BF
Transport and Map Symbols 1F680-1F6FF
Enclosed Alphanumeric Supplement 1F100-1F1FF
"""
emojis="🏃‍♂️🏃‍♂️🏃‍♂️🏃‍♂️🏃‍♂️🏃‍♂️🏃‍♂️" # true count is 7, postgres length() returns 28
true_count=0
for char in emojis:
d=ord(char)
char_type=None
if (d>=0x2000 and d<=0x206F) : char_type="GP" # Zero Width Joiner
elif (d>=0xFE00 and d<=0xFE0F) : char_type="VS" # Variation Selector
print(d, char_type)
if ( char_type=="GP") : true_count-=2
elif (char_type!="VS" ): true_count+=1
print(true_count)
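As a follow-up sketch (using the third-party regex module, which is not mentioned in the original post): its \X pattern matches extended grapheme clusters, i.e. perceived characters, which gives the count directly:

import regex  # pip install regex; supports \X for grapheme clusters

emojis = "🏃‍♂️" * 7
print(len(regex.findall(r"\X", emojis)))  # 7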

HTTP GET Chinese character using luasocket

I use luasocket to GET a web page which contains Chinese characters "开奖结果" (the page itself is encoded in charset="gb2312"), as below:
require "socket"
host = '61.129.89.226'
fileformat = '/fcopen/cp_kjgg_dfw.jsp?lottery_type=ssq&lottery_issue=%s'
function getlottery(num)
c = assert(socket.connect(host, 80))
c:send('GET ' .. string.format(fileformat, num) .. " HTTP/1.0\r\n\r\n")
content = c:receive('*l')
while content do
if content and content:find('开奖结果') then -- failed
print(content)
end
content = c:receive('*l')
end
c:close()
end
--http://61.129.89.226/fcopen/cp_kjgg_dfw.jsp?lottery_type=ssq&lottery_issue=2012138
getlottery('2012138')
Unfortunately, it fails to match the expected characters:
content:find('开奖结果') -- failed
I know Lua is capable of finding unicode characters:
Lua 5.1.4 Copyright (C) 1994-2008 Lua.org, PUC-Rio
> if string.find("This is 开奖结果", "开奖结果") then print("found!") end
found!
Then I guess it might be caused by how luasocket retrieves data from the web. Could anyone shed some light on this? Thanks.
If the page is encoded in GB2312 and your script (the source file itself) is encoded in UTF-8, there is no way the match will work: find() searches for the UTF-8 byte sequence of your pattern and simply slides over the characters you're looking for, because they're not encoded the same way:
        开      奖      结      果
GB      bfaa    bdb1    bde1    b9fb
UTF-16  5f00    5956    7ed3    679c
UTF-8   e5bc80  e5a596  e7bb93  e69e9c
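A quick sketch in Python (for illustration; the fix in Lua is analogous) showing the mismatch and the usual remedy, re-encoding the needle into the page's charset before searching:

# the same four characters as bytes in each encoding:
needle_utf8 = "开奖结果".encode("utf-8")
needle_gb = "开奖结果".encode("gb2312")
print(needle_utf8.hex())  # e5bc80e5a596e7bb93e69e9c
print(needle_gb.hex())    # bfaabdb1bde1b9fb

# searching the GB2312-encoded page bytes only succeeds with the
# GB2312-encoded needle:
page = b"...\xbf\xaa\xbd\xb1\xbd\xe1\xb9\xfb..."
print(needle_gb in page)    # True
print(needle_utf8 in page)  # False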

Google Calculator Thousands Separator Special Character

NOTE: For more answers related to this, please see
Special Characters in Google Calculator
I noticed when grabbing the return value for a Google Calculator calculation, the thousands place is separated by a rather odd character. It is not simply a space.
Let's take the example of converting $4,000 USD to GBP.
If you visit the following Google link:
http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp
You'll note that the response is:
{lhs: "4000 U.S. dollars",rhs: "2 497.81441 British pounds",error: "",icc: true}
This looks reasonable, and the thousands place appears to be separated by a whitespace character.
However, if you enter the following into your command line:
curl -s "http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp"
You'll note that the response is:
{lhs: "4000 U.S. dollars",rhs: "2?498.28243 British pounds",error: "",icc: true}
That question mark (?) is a replacement character. What is going on?
AppleScript returns a different replacement character:
{lhs: "4000 U.S. dollars",rhs: "2†498.28243 British pounds",error: "",icc: true}
I am also getting from other sources:
{lhs: "4000 U.S. dollars",rhs: "2�498.28243 British pounds",error: "",icc: true}
It turns out that � is the proper Unicode replacement character, U+FFFD (decimal 65533).
Can anyone give me insight into what Google is passing me?
It's a non-breaking space, U+00A0. It's to ensure that the number won't get broken at the end of a line.
Google returns the correct encoding (UTF-8) however:
Content-Type: text/html; charset=UTF-8
so ...
- If it comes out as a normal space (U+0020) instead (Firefox does that when copying, stupidly enough), then the application converts certain characters to lookalikes, maybe to fit into some restricted code page (ASCII perhaps).
- If there is a question mark, then it was correctly read as Unicode, but some part of the processing uses a legacy character set that doesn't contain the character, so it gets converted.
- If there is a replacement character � (U+FFFD), then it was likely read as UTF-8, converted into a legacy character set that contains the character (e.g. Latin 1), and then re-interpreted as UTF-8.
- If there is a totally different character, such as your dagger (†), then I'd guess the response is read correctly as Unicode, converted to a character set that contains the character, and re-interpreted in another character set. A quick look at the Mac Roman code page reveals that A0 indeed maps to † (the sketch after the PowerShell session below reproduces these conversions).
Needless to say, some parts of whatever you use to process that response are horribly broken in regard to Unicode. Something I'd hope wouldn't happen that often in this millennium, but apparently it still does.
I figured out what it was by fiddling around in PowerShell a bit:
PS Home:\> $wc = new-object net.webclient
PS Home:\> $x = $wc.downloadstring('http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp')
PS Home:\> [char[]]$x|%{"$_ - " + +$_}
...
" - 34
2 - 50
  - 160
4 - 52
9 - 57
8 - 56
. - 46
2 - 50
8 - 56
2 - 50
4 - 52
...
Also a quick look at the response headers revealed that the encoding is set correctly.
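For illustration (not from the original answer), the conversions above are easy to reproduce in Python; the byte values and code page mappings are standard:

nbsp = "\u00a0"  # the non-breaking space Google sends

# the raw byte 0xA0 displayed as Mac Roman is a dagger:
print(nbsp.encode("latin-1").decode("mac_roman"))                # †

# the byte 0xA0 re-interpreted as UTF-8 is invalid and yields U+FFFD:
print(nbsp.encode("latin-1").decode("utf-8", errors="replace"))  # �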
According to my tests with curl in the Terminal on OS X (changing the international character encoding in the Terminal preferences), the response encoding is ISO Latin 1:
When I set the encoding to UTF-8, I get "2?498.28243"
When I set the encoding to MacRoman, I get "2†498.28243"
First solution: use a user agent from any browser (Safari on OS X 10.6.8 in this example)
curl -s -A 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.48 (KHTML, like Gecko) Version/5.1 Safari/534.48' 'http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp'
Second solution: use iconv
curl -s 'http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp' | iconv -t utf8 -f iso-8859-1
Try
set myUrl to quoted form of "http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp"
set xxx to do shell script "curl " & myUrl & " | sed 's/[†]/,/'"

scapy encoding when stringed

What is this encoding below that you get when you string a packet in scapy? This is certainly not hex.
>>> str(IP())
'E\x00\x00\x14\x00\x01\x00\x00@\x00|\xe7\x7f\x00\x00\x01\x7f\x00\x00\x01'
The \x is hex notation. When you use str(IP()), you convert the raw packet bytes into a string; bytes that map to printable ASCII characters are shown as the corresponding letter, and any byte that can't be shown that way appears in escaped form such as \x14.
I think the following example will help:
- viewing the packet summary using a Scapy method
- encoding the packet data into hex format using Python methods
Welcome to Scapy (2.1.1-dev)
>>> pkt=IP()
>>> pkt.summary()
'127.0.0.1 > 127.0.0.1 ip'
>>> data=str(pkt)
>>> data.encode('hex')
'450000140001000040007ce77f0000017f000001'
>>>
Consider these points:
- in Scapy, if you create an IP layer without specifying the source and destination, the loopback address is set as the default for both, as shown in the example: '127.0.0.1 > 127.0.0.1 ip'
- .summary() is a Scapy method
- str() and .encode() are Python methods
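As a side note (assuming Python 3 and a recent Scapy, neither of which the session above uses), str(pkt) no longer yields the raw bytes and str.encode('hex') is gone; use bytes() and bytes.hex() instead:

from scapy.all import IP

pkt = IP()
data = bytes(pkt)  # the raw packet bytes
print(data.hex())  # '450000140001000040007ce77f0000017f000001'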

Easy to remember fingerprints for data?

I need to create fingerprints for RSA keys that users can memorize or at least easily recognize. The following ideas have come to mind:
Break the SHA1 hash into portions of, say 4 bits and use them as coordinates for Bezier splines. Draw the splines and use that picture as a fingerprint.
Use the SHA1 hash as input for some fractal algorithm. The result would need to be unique for a given input, i.e. the output can't be a solid square half the time.
Map the SHA1 hash to entries in a word list (as used in spell checkers or password lists). This would create a passphrase consisting of real words.
Instead of a word list, use some other large data set like Google maps (map the SHA1 hash to map coordinates and use the map region(s) as a fingerprint)
Any other ideas? I'm sure this has been implemented in one form or another.
OpenSSH contains something like that, under the name "visual host key". Try this:
ssh -o VisualHostKey=yes somesshhost
where somesshhost is some machine with an SSH server running. It will print out a "fingerprint" of the server key, both in hexadecimal and as an ASCII-art image which may look like this:
+--[ RSA 2048]----+
| .+ |
| + o |
| o o + |
| + o + |
| . o E S |
| + * . |
| X o . |
| . * o |
| .o . |
+-----------------+
Or like this:
+--[ RSA 1024]----+
| .*BB+ |
| . .++o |
| = oo. |
| . =o+.. |
| So+.. |
| ..E. |
| |
| |
| |
+-----------------+
Apparently, this is inspired by techniques described in this article. OpenSSH is open source, with a BSD-like license, so chances are that you could simply reuse their code (it seems to be in the key.c file, function key_fingerprint_randomart()).
For item 3 (entries in a word list), see RFC-1751 - A Convention for Human-Readable 128-bit Keys, which notes that
The authors of S/Key devised a system to make the 64-bit one-time
password easy for people to enter.
Their idea was to transform the password into a string of small
English words. English words are significantly easier for people to
both remember and type. The authors of S/Key started with a
dictionary of 2048 English words, ranging in length from one to four
characters. The space covered by a 64-bit key (2^64) could be covered
by six words from this dictionary (2^66) with room remaining for
parity. For example, an S/Key one-time password of hex value:
EB33 F77E E73D 4053
would become the following six English words:
TIDE ITCH SLOW REIN RULE MOT
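A minimal sketch of that idea in Python (the word list here is a placeholder; RFC 1751 defines the real 2048-word dictionary and adds parity bits):

import hashlib

WORDLIST = ["word%d" % i for i in range(2048)]  # placeholder dictionary

def fingerprint_words(key_bytes, n_words=6):
    # take 11 bits of the hash per word (2^11 = 2048 dictionary entries)
    bits = int.from_bytes(hashlib.sha1(key_bytes).digest(), "big")
    words = []
    for _ in range(n_words):
        words.append(WORDLIST[bits & 0x7FF])
        bits >>= 11
    return " ".join(words)

print(fingerprint_words(b"...RSA public key bytes..."))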
You could also use a compound fingerprint to improve memorability, like English words followed (or preceded) by one or more key-dependent images.
For generating the image, you could use things like Identicon, Wavatar, MonsterID, or RoboHash.
Example:
TIDE ITCH SLOW
REIN RULE MOT
I found something called random art which generates an image from a hash. There is a Python implementation available for download: http://www.random-art.org/about/
There is also a paper about using random art for authentication: http://sparrow.ece.cmu.edu/~adrian/projects/validation/validation.pdf
It's from 1999; I don't know if further research has been done on this.
Your first suggestion (draw the path of splines for every four bytes, then fill using the nonzero fill rule) is exactly what I use for visualization in hashblot.