Is each of these one single UTF-8 character? - unicode

I want to be able to insert these characters into my clipboard programmatically. Is each of them one single UTF-8 character? If not, what encoding are they in?
I was looking through the UTF-8 character table at http://www.utf8-chartable.de/unicode-utf8-table.pl under Latin letters but couldn't find them.
Ấ
Ầ
Ẩ
Ẫ
Ậ
Ứ
Ừ
Ử
Ữ
Ự
Ỡ
Ợ
Ở
Ề
Ể
Ễ

The character table you linked to in your question covers only the codepoints in Unicode's Basic Latin (U+0000..U+007F) and Latin-1 Supplement (U+0080..U+00FF) blocks. Each of the characters you have shown is a single codepoint in Unicode's Latin Extended Additional block (U+1E00..U+1EFF). When encoded in UTF-8, these characters take up 3 bytes each, as follows:
Ấ = U+1EA4 = E1 BA A4
Ầ = U+1EA6 = E1 BA A6
Ẩ = U+1EA8 = E1 BA A8
Ẫ = U+1EAA = E1 BA AA
Ậ = U+1EAC = E1 BA AC
Ứ = U+1EE8 = E1 BB A8
Ừ = U+1EEA = E1 BB AA
Ử = U+1EEC = E1 BB AC
Ữ = U+1EEE = E1 BB AE
Ự = U+1EF0 = E1 BB B0
Ỡ = U+1EE0 = E1 BB A0
Ợ = U+1EE2 = E1 BB A2
Ở = U+1EDE = E1 BB 9E
Ề = U+1EC0 = E1 BB 80
Ể = U+1EC2 = E1 BB 82
Ễ = U+1EC4 = E1 BB 84
Depending on your platform, you may or may not be able to store UTF-8 on the clipboard. For instance, on Windows, the standard clipboard text formats hold ANSI (CF_TEXT) or UTF-16 (CF_UNICODETEXT) text, so to hold UTF-8 you would have to register a custom clipboard format.
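The byte sequences listed above can be double-checked in a few lines. This is a minimal sketch in Python, chosen only for illustration since the question does not name a language; `describe` is a made-up helper name. It also shows one caveat worth knowing before putting such text on the clipboard: the same letter can exist in a decomposed form that is more than one codepoint.

```python
import unicodedata

# Code points taken from the list above (Latin Extended Additional block).
CODE_POINTS = [
    0x1EA4, 0x1EA6, 0x1EA8, 0x1EAA, 0x1EAC,
    0x1EE8, 0x1EEA, 0x1EEC, 0x1EEE, 0x1EF0,
    0x1EE0, 0x1EE2, 0x1EDE, 0x1EC0, 0x1EC2, 0x1EC4,
]


def describe(ch):
    """Return 'char = U+XXXX = UTF-8 bytes in hex' for a single code point."""
    hex_bytes = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    return f"{ch} = U+{ord(ch):04X} = {hex_bytes}"


for cp in CODE_POINTS:
    print(describe(chr(cp)))

# The same letter can also be written as a base letter plus combining marks.
# That decomposed (NFD) form is several code points, so normalize to NFC
# when exactly one code point per letter is required.
decomposed = unicodedata.normalize("NFD", chr(0x1EA4))
print(len(decomposed), [f"U+{ord(c):04X}" for c in decomposed])
```

If the text being copied comes from user input or a file, normalizing it with `unicodedata.normalize("NFC", text)` first is one way to make sure each letter is the single precomposed codepoint shown in the table.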

Related

Integer encoding format

I've run across some PIN encoding which I'm trying to figure out so I can improve upon a web application used at my place of work.
When I reset users' PINs (in this case, just my own for testing purposes), I'm seeing the following:
PIN VALUE
000000 = 7F55858585858585
111111 = 7F55868686868686
222222 = 7F55878787878787
999999 = 7F558E8E8E8E8E8E
000001 = 7F01313131313132
000011 = 7F55858585858686
000111 = 7F01313131323232
001111 = 7F55858586868686
011111 = 7F01313232323232
000002 = 7F02323232323234
100000 = 7F01323131313131
111112 = 7F03343434343435
123456 = 7F0738393A3B3C3D
654321 = 7F073D3C3B3A3938
1357924680 = 7F01323436383A3335373931
1111111111 = 7F5586868686868686868686
1234567890 = 7F0132333435363738393A31
It's clearly just hex, and always starts with 7F (1111111 or 127), but I'm not seeing a pattern for how the next two characters are chosen. Those two characters seem to be the determining value for converting the PIN.
For example:
000000 = 7F 55 858585858585
7F (hex) = 127 (dec) or 1111111 (bin) ## appears to not be used in the calculation?
55 (hex) = 85 (dec) or 1010101 (bin)
0 (PIN) + 85 = 85
000000 = 858585858585
111111 = 7F 55 868686868686
7F (hex) = 127 (dec) or 1111111 (bin) ## appears to not be used in the calculation?
55 (hex) = 85 (dec)
1 (PIN) + 85 = 86
111111 = 868686868686
But then also:
1357924680 = 7F 01 323436383A3335373931
01 (hex) = 31 (dec) ?
1 (PIN) + 31 = 32
1357924680 = 323436383A3335373931
Any help pointing me in the right direction would be greatly appreciated.
I don't see enough data in your example to work out how the pinshift value (the second byte, which is passed to the pin_to_hex function) is determined, so the following solution uses a random value:
import random
def hex_to_pin( pinhex: str) -> list:
    '''
    decode a PIN from a particular hexadecimal-formatted string
        hex_to_pin('7F0738393A3B3C3D')
    inverse of the "pin_to_hex" function (any of the following):
        hex_to_pin(pin_to_hex('123456', 7))
        pin_to_hex(*hex_to_pin('7F0738393A3B3C3D'))
    '''
    xxaux = bytes.fromhex(pinhex)
    return [bytes([x - xxaux[1] for x in xxaux[2:]]).decode(),
            xxaux[1]]
def pin_to_hex( pindec: str, pinshift: int, upper=False) -> str:
    '''
    encode a PIN to a particular hexadecimal-formatted string
        pin_to_hex('123456', 7)
    inverse of the "hex_to_pin" function (any of the following):
        pin_to_hex(*hex_to_pin('7F0738393A3B3C3D'),True)
        hex_to_pin(pin_to_hex('123456', 7))
    '''
    shift_ = max( 1, pinshift % 199) ## 134 for alpha-numeric PIN code
    retaux = [b'\x7F', shift_.to_bytes(1, byteorder='big')]
    for digit_ in pindec.encode():
        retaux.append( (digit_ + shift_).to_bytes(1, byteorder='big'))
    if upper:
        return (b''.join(retaux)).hex().upper()
    else:
        return (b''.join(retaux)).hex()
def get_pin_shift( pindec: str) -> int:
    '''
    determine "pinshift" parameter for the "pin_to_hex" function
    currently returns a random number
    '''
    return random.randint(1,198) ## (1,133) for alpha-numeric PIN code
hexes = [
    '7F01323436383A3335373931',
    '7F0738393A3B3C3D',
    '7F558E8E8E8E8E8E'
    ]
print("hex_to_pin:")
maxlen = len( max(hexes, key=len))
deces = []
for xshex in hexes:
    xsdec = hex_to_pin( xshex)
    print( f"{xshex:<{maxlen}} ({xsdec[1]:>3}) {xsdec[0]}")
    deces.append(xsdec[0])
print("pin_to_hex:")
for xsdec in deces:
    xsshift = get_pin_shift( xsdec)
    xshex = pin_to_hex( xsdec, xsshift)
    print( f"{xshex:<{maxlen}} ({xsshift:>3}) {xsdec}")
Output SO\71875753.py
hex_to_pin:
7F01323436383A3335373931 ( 1) 1357924680
7F0738393A3B3C3D ( 7) 123456
7F558E8E8E8E8E8E ( 85) 999999
pin_to_hex:
7f1041434547494244464840 ( 16) 1357924680
7f4e7f8081828384 ( 78) 123456
7f013a3a3a3a3a3a ( 1) 999999
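As a cross-check against the sample data in the question, the sketch below decodes every listed value. It assumes that byte 0 is a fixed 7F marker, byte 1 is the shift, and each remaining byte is the ASCII code of a PIN character plus that shift. This also explains why adding decimal 85 to the digit 0 appeared to work: 0x30 (the character '0') + 0x55 = 0x85, so the hex digits only happen to read like the decimal sum. The names `decode_pin` and `guessed_shift` are made up for illustration. `guessed_shift` tests one hypothesis that fits these samples (the shift is the XOR of the PIN's digits, with 0x55 substituted when the XOR is 0); it is an observation about this data, not a confirmed description of the application's algorithm.

```python
from functools import reduce
from operator import xor

# (PIN, encoded hex) pairs copied from the question.
SAMPLES = {
    "000000": "7F55858585858585",
    "111111": "7F55868686868686",
    "222222": "7F55878787878787",
    "999999": "7F558E8E8E8E8E8E",
    "000001": "7F01313131313132",
    "000011": "7F55858585858686",
    "000111": "7F01313131323232",
    "001111": "7F55858586868686",
    "011111": "7F01313232323232",
    "000002": "7F02323232323234",
    "100000": "7F01323131313131",
    "111112": "7F03343434343435",
    "123456": "7F0738393A3B3C3D",
    "654321": "7F073D3C3B3A3938",
    "1357924680": "7F01323436383A3335373931",
    "1111111111": "7F5586868686868686868686",
    "1234567890": "7F0132333435363738393A31",
}


def decode_pin(encoded_hex):
    """Return (pin, shift), assuming the layout: 7F marker, shift byte, shifted ASCII."""
    raw = bytes.fromhex(encoded_hex)
    if raw[0] != 0x7F:
        raise ValueError("expected a leading 7F marker byte")
    shift = raw[1]
    pin = bytes(b - shift for b in raw[2:]).decode("ascii")
    return pin, shift


def guessed_shift(pin):
    """Hypothetical rule: XOR of the decimal digits, or 0x55 when that XOR is 0."""
    digits_xor = reduce(xor, (int(d) for d in pin), 0)
    return digits_xor if digits_xor else 0x55


for pin, encoded in SAMPLES.items():
    decoded, shift = decode_pin(encoded)
    print(f"{encoded:<24} shift=0x{shift:02X} pin={decoded:<10} "
          f"decoded_ok={decoded == pin} xor_rule_ok={shift == guessed_shift(pin)}")
```

Both checks hold for all seventeen samples, but more PINs (especially ones whose digits XOR to values above 9, or PINs of odd length) would be needed before relying on the XOR rule.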

How to get the UTF-8 code from a single character in VBScript

I would like to get the UTF-8 code of a character. I have attempted to use streams, but it doesn't seem to work:
Example: פ should give 16#D7A4, according to https://en.wikipedia.org/wiki/Pe_(Semitic_letter)#Character_encodings
Const adTypeBinary = 1
Dim adoStr, bytesthroughado
Set adoStr = CreateObject("Adodb.Stream")
adoStr.Charset = "utf-8"
adoStr.Open
adoStr.WriteText labelString
adoStr.Position = 0
adoStr.Type = adTypeBinary
adoStr.Position = 3
bytesthroughado = adoStr.Read
Msgbox(LenB(bytesthroughado)) 'gives 2
adoStr.Close
Set adoStr = Nothing
MsgBox(bytesthroughado) ' gives K
Note: AscW gives Unicode - not UTF-8
bytesthroughado is a value of the Byte() subtype, that is, a byte array (see the first output line), so it cannot be displayed meaningfully with MsgBox. Read it byte by byte with LenB, MidB and AscB instead:
Option Explicit
Dim ss, xx, ii, jj, char, labelString
labelString = "ařЖפ€"
ss = ""
For ii=1 To Len( labelString)
    char = Mid( labelString, ii, 1)
    xx = BytesThroughAdo( char)
    If ss = "" Then ss = VarType(xx) & " " & TypeName( xx) & vbNewLine
    ss = ss & char & vbTab
    For jj=1 To LenB( xx)
        ss = ss & Hex( AscB( MidB( xx, jj, 1))) & " "
    Next
    ss = ss & vbNewLine
Next
Wscript.Echo ss
Function BytesThroughAdo( labelChar)
    Const adTypeBinary = 1 'Indicates binary data.
    Const adTypeText = 2 'Default. Indicates text data.
    Dim adoStream
    Set adoStream = CreateObject( "Adodb.Stream")
    adoStream.Charset = "utf-8"
    adoStream.Open
    adoStream.WriteText labelChar
    adoStream.Position = 0
    adoStream.Type = adTypeBinary
    adoStream.Position = 3 'Skip the 3-byte UTF-8 byte order mark (EF BB BF).
    BytesThroughAdo = adoStream.Read
    adoStream.Close
    Set adoStream = Nothing
End Function
Output:
cscript D:\bat\SO\61368074q.vbs
8209 Byte()
a 61
ř C5 99
Ж D0 96
פ D7 A4
€ E2 82 AC
I used characters ařЖפ€ to demonstrate the functionality of your UTF-8 encoder (the alts8.ps1 PowerShell script comes from another project):
alts8.ps1 "ařЖפ€"
Ch Unicode Dec CP IME UTF-8 ? IME 0405/cs-CZ; CP852; ANSI 1250
a U+0061 97 …97… 0x61 a Latin Small Letter A
ř U+0159 345 …89… 0xC599 Å� Latin Small Letter R With Caron
Ж U+0416 1046 …22… 0xD096 Ð� Cyrillic Capital Letter Zhe
פ U+05E4 1508 …228… 0xD7A4 פ Hebrew Letter Pe
€ U+20AC 8364 …172… 0xE282AC â�¬ Euro Sign
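For comparison, the same byte values can be reproduced outside VBScript. The sketch below uses Python purely as an illustration and is not part of the original answer; `utf8_hex` is a made-up helper name. It also prints the three-byte UTF-8 byte order mark, EF BB BF, which is what `adoStream.Position = 3` skips in the function above, on the assumption that the ADODB stream writes a BOM at the start of UTF-8 text.

```python
import codecs


def utf8_hex(text):
    """Return the UTF-8 bytes of text as space-separated hex, e.g. 'D7 A4'."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


# The test string from the answer, written with escapes: a, r-caron, Zhe, Pe, Euro.
for ch in "a\u0159\u0416\u05E4\u20AC":
    print(f"U+{ord(ch):04X}", utf8_hex(ch), sep="\t")

# Python's "utf-8-sig" codec prepends the byte order mark, which makes the
# three bytes being skipped visible.
with_bom = "\u05E4".encode("utf-8-sig")
print(" ".join(f"{b:02X}" for b in with_bom))
print(with_bom[len(codecs.BOM_UTF8):] == "\u05E4".encode("utf-8"))
```

The last line compares the bytes after the BOM with the plain UTF-8 encoding, which is the same slicing the VBScript function performs by moving the stream position.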

Read text file in matlab

I am trying to read a text file in matlab. I have done this, but I don't know how to store this value in an array.
My text file contains data like this:
01 ff 02 ff
02 ff 02 ff
03 ff 02 ff
file = fopen(fpath,'r');
allData = textscan(file, '%s', 'delimiter','\n');
for i = 1:491003
newData = allData{1,1}{i};
end
I want to store each row in separate array, something like this:
a[0] = '01 ff 02 ff'
a[1] = '02 ff 02 ff'
Once I have such arrays, I want to access each value of this arrays, something like this:
a[0][0] = 01, a[0][1] = ff, a[0][2] = 02..
a[1][0] = 02, a[1][1] = ff, a[1][2] = 02..
I am new to MATLAB and couldn't find much help myself. Please help.
allData = textscan(file, '%s %s %s %s');
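allData will be a 1-by-4 cell array, one cell per column of the file. Each cell holds that column's strings, so the field in row i, column j is allData{j}{i} (MATLAB indices start at 1).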
OK, I finally got my answer. I used the "Import Data" facility, which is available in MATLAB 2013. It lets you import the data in the layout you want.
Cheers.

the code is not returning any value and the code is being stored in user defined function instead of stored procedure

CREATE PROCEDURE bank(
in bk_cd_in CHAR( 4 ) ,
out bk_cd_out CHAR( 4 ) ,
out bk_nm CHAR( 40 ) ,
out brh_cd CHAR( 8 ) ,
out bak_hnm CHAR( 40 ) ,
out ur_cd CHAR(18),
out updt DATE,
out updt_flag CHAR( 1 ) ,
out brh_nm CHAR( 40 ) ,
out cty_nm CHAR( 40 ) )
SELECT bank_cd, bank_nm, branch_cd, bank_hnm, user_cd, update_dt, update_flag, branch_nm, city_nm
FROM bankmst
WHERE bank_cd = bk_cd_in
into bk_cd_out, bk_nm, brh_cd, bak_hnm, ur_cd, updt, updt_flag, brh_nm, cty_nm ;
The calling code is written in JSP. The OUT parameters come back empty, and running the JSP page ends with the error: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 'b1' in 'field list'
try
{
cs = con.prepareCall("{call bank(?,?,?,?,?,?,?,?,?,?)}");
String s1 = para;
cs.setString(1,para);
cs.registerOutParameter(2, java.sql.Types.VARCHAR);
cs.registerOutParameter(3, java.sql.Types.VARCHAR);
cs.registerOutParameter(4, java.sql.Types.CHAR);
cs.registerOutParameter(5, java.sql.Types.CHAR);
cs.registerOutParameter(6, java.sql.Types.CHAR);
cs.registerOutParameter(7, java.sql.Types.DATE);
cs.registerOutParameter(8, java.sql.Types.CHAR);
cs.registerOutParameter(9, java.sql.Types.CHAR);
cs.registerOutParameter(10, java.sql.Types.CHAR);
cs.registerOutParameter(11, java.sql.Types.INTEGER);
rs = cs.executeQuery();
out.println("\nexecuted 3\n");
String b1 = cs.getString(2);
String b2 = cs.getString(3);
String b3 = cs.getString(4);
String b4 = cs.getString(5);
String b5 = cs.getString(6);
java.util.Date b6 = cs.getDate(7);
String b7 = cs.getString(8);
String b8 = cs.getString(9);
String b9 = cs.getString(10);
System.out.println(b1);
System.out.println(b2);
System.out.println(b3);
System.out.println(b4);
System.out.println(b5);
System.out.println(b6);
System.out.println(b7);
System.out.println(b8);
System.out.println(b9);
}
I am using Eclipse 3.6, and the procedure is written for MySQL 5.1.

How to write a unicode symbol in lua

How can I write a Unicode symbol in Lua? For example, I have to write the symbol with code 9658.
When I write
string.char( 9658 );
I get an error. How is it possible to write such a symbol?
Lua does not look inside strings. So, you can just write
mychar = "►"
(added in 2015)
Lua 5.3 introduced support for UTF-8 escape sequences:
The UTF-8 encoding of a Unicode character can be inserted in a literal string with the escape sequence \u{XXX} (note the mandatory enclosing brackets), where XXX is a sequence of one or more hexadecimal digits representing the character code point.
You can also use utf8.char(9658).
Here is an encoder for Lua that takes a Unicode code point and produces a UTF-8 string for the corresponding character:
do
  local bytemarkers = { {0x7FF,192}, {0xFFFF,224}, {0x1FFFFF,240} }
  -- Note: in Lua 5.3+ this global function replaces the built-in utf8 library table.
  function utf8(decimal)
    if decimal<128 then return string.char(decimal) end
    local charbytes = {}
    for bytes,vals in ipairs(bytemarkers) do
      if decimal<=vals[1] then
        -- fill the continuation bytes from last to first, six bits each
        for b=bytes+1,2,-1 do
          local mod = decimal%64
          decimal = (decimal-mod)/64
          charbytes[b] = string.char(128+mod)
        end
        -- the remaining high bits go into the lead byte
        charbytes[1] = string.char(vals[2]+decimal)
        break
      end
    end
    return table.concat(charbytes)
  end
end
c=utf8(0x24) print(c.." is "..#c.." bytes.") --> $ is 1 bytes.
c=utf8(0xA2) print(c.." is "..#c.." bytes.") --> ¢ is 2 bytes.
c=utf8(0x20AC) print(c.." is "..#c.." bytes.") --> € is 3 bytes.
c=utf8(0x24B62) print(c.." is "..#c.." bytes.") --> 𤭢 is 4 bytes.
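The arithmetic in that encoder is easier to trust when it can be compared with a reference implementation. The following is a rough Python transcription of the same idea, checked against Python's built-in encoder. Python is used here only for illustration, and `encode_utf8` is a made-up name. It peels off six bits at a time for the continuation bytes and puts the remaining high bits into the lead byte.

```python
# (upper bound of the code point range, lead-byte marker), mirroring the Lua table.
BYTE_MARKERS = [(0x7FF, 0xC0), (0xFFFF, 0xE0), (0x1FFFFF, 0xF0)]


def encode_utf8(code_point):
    """Encode one code point to UTF-8 by hand (no surrogate checks)."""
    if code_point < 0x80:
        return bytes([code_point])  # ASCII: a single byte
    for extra_bytes, (limit, marker) in enumerate(BYTE_MARKERS, start=1):
        if code_point <= limit:
            tail = []
            for _ in range(extra_bytes):
                tail.append(0x80 | (code_point & 0x3F))  # low six bits
                code_point >>= 6
            # What is left of code_point fits into the lead byte.
            return bytes([marker | code_point] + tail[::-1])
    raise ValueError("code point out of range")


for cp in (0x24, 0xA2, 0x20AC, 0x24B62, 9658):
    encoded = encode_utf8(cp)
    matches = encoded == chr(cp).encode("utf-8")
    print(f"U+{cp:04X} -> {encoded.hex().upper()} ({len(encoded)} bytes), "
          f"matches built-in: {matches}")
```

The last value, 9658 (U+25BA), is the symbol from the question.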
Maybe this can help you. It goes the other way, decoding the UTF-8 sequence at a given position into a code point. It reads bytes through editor.CharAt, so it assumes a host environment that provides that object:
function FromUTF8(pos)
  local mod = math.fmod or math.mod -- math.mod was renamed to math.fmod in Lua 5.1
  local function charat(p)
    local v = editor.CharAt[p]; if v < 0 then v = v + 256 end; return v
  end
  local v, c, n = 0, charat(pos), 1
  if c < 128 then v = c
  elseif c < 192 then
    error("Byte values between 0x80 and 0xBF cannot start a multibyte sequence")
  elseif c < 224 then v = mod(c, 32); n = 2
  elseif c < 240 then v = mod(c, 16); n = 3
  elseif c < 248 then v = mod(c, 8); n = 4
  elseif c < 252 then v = mod(c, 4); n = 5
  elseif c < 254 then v = mod(c, 2); n = 6
  else
    error("Byte values between 0xFE and 0xFF cannot start a multibyte sequence")
  end
  for i = 2, n do
    pos = pos + 1; c = charat(pos)
    if c < 128 or c > 191 then
      error("Following bytes must have values between 0x80 and 0xBF")
    end
    v = v * 64 + mod(c, 64)
  end
  return v, pos, n
end
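The decoder above still accepts the 5- and 6-byte forms from the original UTF-8 definition. UTF-8 as currently specified (RFC 3629) stops at four bytes. As a hedged illustration of the same decoding steps without the editor dependency, here is a small Python sketch that works on a plain byte string. The name `decode_utf8_at` is made up, and the function does not reject overlong encodings or surrogates.

```python
def decode_utf8_at(data, pos=0):
    """Decode one code point starting at data[pos]; return (code_point, length)."""
    lead = data[pos]
    if lead < 0x80:
        return lead, 1  # ASCII: a single byte
    if lead < 0xC0:
        raise ValueError("bytes 0x80..0xBF cannot start a sequence")
    if lead < 0xE0:
        value, length = lead & 0x1F, 2
    elif lead < 0xF0:
        value, length = lead & 0x0F, 3
    elif lead < 0xF8:
        value, length = lead & 0x07, 4
    else:
        raise ValueError("lead bytes 0xF8..0xFF are not valid in current UTF-8")
    if pos + length > len(data):
        raise ValueError("truncated sequence")
    for b in data[pos + 1:pos + length]:
        if not 0x80 <= b <= 0xBF:
            raise ValueError("continuation bytes must be in 0x80..0xBF")
        value = (value << 6) | (b & 0x3F)  # append six payload bits
    return value, length


sample = "a\u25ba".encode("utf-8")   # 'a' followed by the symbol with code 9658
print(decode_utf8_at(sample, 0))
print(decode_utf8_at(sample, 1))
```

Advancing `pos` by the returned length walks through a whole byte string one code point at a time, which is how the `pos` and `n` return values of the Lua function would be used.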
To get broader support for Unicode string content, one approach is slnunicode which was developed as part of the Selene database library. It will give you a module that supports most of what the standard string library does, but with Unicode characters and UTF-8 encoding.