I want to be able to insert these characters into my clipboard programmatically. Is each of them one single UTF-8 character? If not, what encoding are they in?
I was looking through the UTF-8 character table at http://www.utf8-chartable.de/unicode-utf8-table.pl under Latin letters but couldn't find them.
Ấ
Ầ
Ẩ
Ẫ
Ậ
Ứ
Ừ
Ử
Ữ
Ự
Ỡ
Ợ
Ở
Ề
Ể
Ễ
The character table you linked to in your question covers only the code points in Unicode's Basic Latin (U+0000..U+007F) and Latin-1 Supplement (U+0080..U+00FF) blocks. Each of the characters you have shown is a single code point in Unicode's Latin Extended Additional block (U+1E00..U+1EFF). When encoded in UTF-8, each of these characters takes up 3 bytes, as follows:
Ấ = U+1EA4 = E1 BA A4
Ầ = U+1EA6 = E1 BA A6
Ẩ = U+1EA8 = E1 BA A8
Ẫ = U+1EAA = E1 BA AA
Ậ = U+1EAC = E1 BA AC
Ứ = U+1EE8 = E1 BB A8
Ừ = U+1EEA = E1 BB AA
Ử = U+1EEC = E1 BB AC
Ữ = U+1EEE = E1 BB AE
Ự = U+1EF0 = E1 BB B0
Ỡ = U+1EE0 = E1 BB A0
Ợ = U+1EE2 = E1 BB A2
Ở = U+1EDE = E1 BB 9E
Ề = U+1EC0 = E1 BB 80
Ể = U+1EC2 = E1 BB 82
Ễ = U+1EC4 = E1 BB 84
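You can check a table like this without a lookup site by letting a built-in UTF-8 encoder do the work. The following is a minimal Python sketch (Python is only an illustrative choice; utf8_row is a hypothetical helper name) that prints each character's code point and UTF-8 bytes and confirms that every one of them is three bytes long:

```python
import unicodedata

# The 16 characters from the question. NFC normalization makes sure each one is
# a single precomposed code point, even if the text was pasted in decomposed
# form (base letter followed by combining marks).
CHARS = unicodedata.normalize("NFC", "ẤẦẨẪẬỨỪỬỮỰỠỢỞỀỂỄ")


def utf8_row(ch: str) -> str:
    """Format one character as 'X = U+XXXX = B1 B2 B3'."""
    return f"{ch} = U+{ord(ch):04X} = {ch.encode('utf-8').hex(' ').upper()}"


for ch in CHARS:
    # Code points U+0800..U+FFFF always take exactly three bytes in UTF-8
    assert len(ch.encode("utf-8")) == 3
    print(utf8_row(ch), "-", unicodedata.name(ch))
```

bytes.hex() with a separator argument needs Python 3.8 or later.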
Depending on your platform, you may or may not be able to store UTF-8 on the clipboard. For instance, on Windows, you can store text only as ANSI or UTF-16 (unless you create a custom clipboard format to hold UTF-8).
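On Windows the Unicode clipboard format is CF_UNICODETEXT, which holds UTF-16LE text followed by a null terminator. As a rough sketch (cf_unicodetext_payload is a hypothetical name, and the actual clipboard calls such as OpenClipboard and SetClipboardData are deliberately not shown), this is the buffer you would hand over, compared with the UTF-8 bytes above:

```python
def cf_unicodetext_payload(text: str) -> bytes:
    """Bytes for a Windows CF_UNICODETEXT entry: UTF-16LE text plus a NUL terminator."""
    return text.encode("utf-16-le") + b"\x00\x00"


sample = "\u1ea4\u1ee8"  # Ấ and Ứ from the question
print("UTF-8:          ", sample.encode("utf-8").hex(" ").upper())
print("CF_UNICODETEXT: ", cf_unicodetext_payload(sample).hex(" ").upper())
```

Because all of these characters lie in the Basic Multilingual Plane, each one is a single 16-bit unit in UTF-16 (2 bytes) rather than 3 bytes in UTF-8.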
I've run across some PIN encoding which I'm trying to figure out so I can improve upon a web application used at my place of work.
When I reset users' PINs (in this case, just my own for testing purposes), I'm seeing the following:
PIN VALUE
000000 = 7F55858585858585
111111 = 7F55868686868686
222222 = 7F55878787878787
999999 = 7F558E8E8E8E8E8E
000001 = 7F01313131313132
000011 = 7F55858585858686
000111 = 7F01313131323232
001111 = 7F55858586868686
011111 = 7F01313232323232
000002 = 7F02323232323234
100000 = 7F01323131313131
111112 = 7F03343434343435
123456 = 7F0738393A3B3C3D
654321 = 7F073D3C3B3A3938
1357924680 = 7F01323436383A3335373931
1111111111 = 7F5586868686868686868686
1234567890 = 7F0132333435363738393A31
It's clearly just hex, and always starts with 7F (1111111 or 127), but I'm not seeing a pattern for how the next two characters are chosen. Those two characters seem to be the determining value for converting the PIN.
For example:
000000 = 7F 55 858585858585
7F (hex) = 127 (dec) or 1111111 (bin) ## appears to not be used in the calculation?
55 (hex) = 85 (dec) or 1010101 (bin)
0 (PIN) + 85 = 85
000000 = 858585858585
111111 = 7F 55 868686868686
7F (hex) = 127 (dec) or 1111111 (bin) ## appears to not be used in the calculation?
55 (hex) = 85 (dec)
1 (PIN) + 85 = 86
111111 = 868686868686
But then also:
1357924680 = 7F 01 323436383A3335373931
01 (hex) = 31 (dec) ?
1 (PIN) + 31 = 32
1357924680 = 323436383A3335373931
Any help pointing me in the right direction would be greatly appreciated.
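I don't see enough data in your minimal reproducible example to work out how the pinshift value (the second byte, supplied to the pin_to_hex function) is chosen, so the following solution uses a random value. The rest of the encoding is straightforward: the first byte is always 0x7F, and each remaining byte is the ASCII code of the corresponding PIN character plus pinshift (for example, '0' is 0x30, and 0x30 + 0x55 = 0x85).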
import random

def hex_to_pin(pinhex: str) -> list:
    '''
    decode a PIN from a particular hexadecimal-formatted string
        hex_to_pin('7F0738393A3B3C3D')
    inverse of the "pin_to_hex" function (any of the following):
        hex_to_pin(pin_to_hex('123456', 7))
        pin_to_hex(*hex_to_pin('7F0738393A3B3C3D'))
    '''
    xxaux = bytes.fromhex(pinhex)
    # byte 0 is the 0x7F marker, byte 1 is the shift, the rest are shifted ASCII codes
    return [bytes([x - xxaux[1] for x in xxaux[2:]]).decode(),
            xxaux[1]]

def pin_to_hex(pindec: str, pinshift: int, upper=False) -> str:
    '''
    encode a PIN to a particular hexadecimal-formatted string
        pin_to_hex('123456', 7)
    inverse of the "hex_to_pin" function (any of the following):
        pin_to_hex(*hex_to_pin('7F0738393A3B3C3D'), True)
        hex_to_pin(pin_to_hex('123456', 7))
    '''
    shift_ = max(1, pinshift % 199)   ## 134 for alpha-numeric PIN code
    retaux = [b'\x7F', shift_.to_bytes(1, byteorder='big')]
    for digit_ in pindec.encode():
        # each output byte is the ASCII code of the PIN character plus the shift
        retaux.append((digit_ + shift_).to_bytes(1, byteorder='big'))
    if upper:
        return (b''.join(retaux)).hex().upper()
    else:
        return (b''.join(retaux)).hex()

def get_pin_shift(pindec: str) -> int:
    '''
    determine "pinshift" parameter for the "pin_to_hex" function
    currently returns a random number
    '''
    return random.randint(1, 198)   ## (1,133) for alpha-numeric PIN code

hexes = [
    '7F01323436383A3335373931',
    '7F0738393A3B3C3D',
    '7F558E8E8E8E8E8E'
]
print("hex_to_pin:")
maxlen = len(max(hexes, key=len))
deces = []
for xshex in hexes:
    xsdec = hex_to_pin(xshex)
    print(f"{xshex:<{maxlen}} ({xsdec[1]:>3}) {xsdec[0]}")
    deces.append(xsdec[0])

print("pin_to_hex:")
for xsdec in deces:
    xsshift = get_pin_shift(xsdec)
    xshex = pin_to_hex(xsdec, xsshift)
    print(f"{xshex:<{maxlen}} ({xsshift:>3}) {xsdec}")
Output of SO\71875753.py:
hex_to_pin:
7F01323436383A3335373931 ( 1) 1357924680
7F0738393A3B3C3D ( 7) 123456
7F558E8E8E8E8E8E ( 85) 999999
pin_to_hex:
7f1041434547494244464840 ( 16) 1357924680
7f4e7f8081828384 ( 78) 123456
7f013a3a3a3a3a3a ( 1) 999999
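The script above only round-trips three of the samples. As a standalone check (decode is a hypothetical helper that mirrors hex_to_pin), the following Python sketch confirms that the same rule, where byte 1 is a shift and every following byte is the ASCII code of a PIN character plus that shift, holds for every pair listed in the question. It says nothing about how the real application picks the shift.

```python
# (PIN, encoded value) pairs exactly as listed in the question
SAMPLES = [
    ("000000", "7F55858585858585"),
    ("111111", "7F55868686868686"),
    ("222222", "7F55878787878787"),
    ("999999", "7F558E8E8E8E8E8E"),
    ("000001", "7F01313131313132"),
    ("000011", "7F55858585858686"),
    ("000111", "7F01313131323232"),
    ("001111", "7F55858586868686"),
    ("011111", "7F01313232323232"),
    ("000002", "7F02323232323234"),
    ("100000", "7F01323131313131"),
    ("111112", "7F03343434343435"),
    ("123456", "7F0738393A3B3C3D"),
    ("654321", "7F073D3C3B3A3938"),
    ("1357924680", "7F01323436383A3335373931"),
    ("1111111111", "7F5586868686868686868686"),
    ("1234567890", "7F0132333435363738393A31"),
]


def decode(pinhex: str) -> tuple:
    """Return (pin, shift): byte 0 is 0x7F, byte 1 the shift, the rest ASCII + shift."""
    raw = bytes.fromhex(pinhex)
    if raw[0] != 0x7F:
        raise ValueError(f"unexpected marker byte {raw[0]:#04x}")
    shift = raw[1]
    return bytes(b - shift for b in raw[2:]).decode("ascii"), shift


for pin, encoded in SAMPLES:
    decoded, shift = decode(encoded)
    assert decoded == pin, (pin, encoded, decoded)
    print(f"{pin:>10}  shift=0x{shift:02X}  ok")

# Largest shift that keeps the byte for '9' within 0xFF
print(0xFF - ord("9"))  # 198
```

The last line shows where the 198 upper bound in get_pin_shift comes from; the same calculation with ord("z") gives the 133 mentioned for alphanumeric PINs.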
I would like to get the UTF-8 code of a character. I have attempted to use streams, but it doesn't seem to work:
Example: פ should give 16#D7A4, according to https://en.wikipedia.org/wiki/Pe_(Semitic_letter)#Character_encodings
Const adTypeBinary = 1
Dim adoStr, bytesthroughado
Set adoStr = CreateObject("Adodb.Stream")
adoStr.Charset = "utf-8"
adoStr.Open
adoStr.WriteText labelString
adoStr.Position = 0
adoStr.Type = adTypeBinary
adoStr.Position = 3
bytesthroughado = adoStr.Read
Msgbox(LenB(bytesthroughado)) 'gives 2
adoStr.Close
Set adoStr = Nothing
MsgBox(bytesthroughado) ' gives K
Note: AscW gives the Unicode code point, not the UTF-8 bytes.
bytesthroughado holds a value of the Byte() subtype (see the first output line), so you need to handle it as a byte array, using the byte-wise functions LenB, MidB and AscB:
Option Explicit
Dim ss, xx, ii, jj, char, labelString
labelString = "ařЖפ€"
ss = ""
For ii=1 To Len( labelString)
    char = Mid( labelString, ii, 1)
    xx = BytesThroughAdo( char)
    ' first output line: variant type and type name of the returned value
    If ss = "" Then ss = VarType(xx) & " " & TypeName( xx) & vbNewLine
    ss = ss & char & vbTab
    ' xx is a Byte() array, so walk it with the byte-wise LenB/MidB/AscB
    For jj=1 To LenB( xx)
        ss = ss & Hex( AscB( MidB( xx, jj, 1))) & " "
    Next
    ss = ss & vbNewLine
Next
Wscript.Echo ss

Function BytesThroughAdo( labelChar)
    Const adTypeBinary = 1 'Indicates binary data.
    Const adTypeText = 2 'Default. Indicates text data.
    Dim adoStream
    Set adoStream = CreateObject( "Adodb.Stream")
    adoStream.Charset = "utf-8"
    adoStream.Open
    adoStream.WriteText labelChar
    adoStream.Position = 0
    adoStream.Type = adTypeBinary
    adoStream.Position = 3 ' skip the 3-byte UTF-8 BOM (EF BB BF) the stream writes first
    BytesThroughAdo = adoStream.Read
    adoStream.Close
    Set adoStream = Nothing
End Function
Output:
cscript D:\bat\SO\61368074q.vbs
8209 Byte()
a 61
ř C5 99
Ж D0 96
פ D7 A4
€ E2 82 AC
I used the characters ařЖפ€ to demonstrate the UTF-8 encoder above. For comparison, here is the output of alts8.ps1, a PowerShell script from another project:
alts8.ps1 "ařЖפ€"
Ch Unicode Dec CP IME UTF-8 ? IME 0405/cs-CZ; CP852; ANSI 1250
a U+0061 97 …97… 0x61 a Latin Small Letter A
ř U+0159 345 …89… 0xC599 Å� Latin Small Letter R With Caron
Ж U+0416 1046 …22… 0xD096 Ð� Cyrillic Capital Letter Zhe
פ U+05E4 1508 …228… 0xD7A4 פ Hebrew Letter Pe
€ U+20AC 8364 …172… 0xE282AC â�¬ Euro Sign
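If you want to cross-check these byte values outside VBScript, the same dump takes only a few lines of Python. This is an illustrative sketch (utf8_dump is a hypothetical name) that separates the two things the question mixes up: the code point (what AscW reports) and the UTF-8 byte sequence.

```python
def utf8_dump(text: str) -> list:
    """One row per character: (char, code point, UTF-8 bytes in hex)."""
    rows = []
    for ch in text:
        # ord() gives the code point; encode() gives the UTF-8 bytes
        rows.append((ch, f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" ").upper()))
    return rows


for ch, cp, hexbytes in utf8_dump("a\u0159\u0416\u05e4\u20ac"):  # ařЖפ€
    print(f"{ch}\t{cp}\t{hexbytes}")
```

For פ this gives U+05E4 and D7 A4, matching both the VBScript output and the 16#D7A4 value the question expects.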
I am trying to read a text file in MATLAB. I can read it, but I don't know how to store the values in an array.
My text file contains data like this:
01 ff 02 ff
02 ff 02 ff
03 ff 02 ff
file = fopen(fpath,'r');
allData = textscan(file, '%s', 'delimiter','\n');
for i = 1:491003
    newData = allData{1,1}{i};
end
I want to store each row in a separate array, something like this:
a[0] = '01 ff 02 ff'
a[1] = '02 ff 02 ff'
Once I have such arrays, I want to access each value of these arrays, something like this:
a[0][0] = 01, a[0][1] = ff, a[0][2] = 02..
a[1][0] = 02, a[1][1] = ff, a[1][2] = 02..
I am new to MATLAB and couldn't find much help on my own. Please help.
allData = textscan(file, '%s %s %s %s');
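allData will be a 1x4 cell array with one cell per column, so a value is reached as allData{column}{row}; for example, allData{2}{1} is 'ff', the second value on the first line.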
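To illustrate the row-by-row structure the question asks for (a[0][1] and so on), here is the same idea sketched in Python rather than MATLAB. io.StringIO stands in for the real file, and interpreting each token as a hexadecimal byte is an assumption based on the ff values in the sample.

```python
import io

# Stand-in for the question's text file (contents copied from the question)
sample_file = io.StringIO("01 ff 02 ff\n02 ff 02 ff\n03 ff 02 ff\n")

# One list of tokens per non-empty line: rows[i][j] is the j-th value on line i
rows = [line.split() for line in sample_file if line.strip()]

# Optional: interpret each token as a hexadecimal byte value
values = [[int(tok, 16) for tok in row] for row in rows]

print(rows[0][1])    # ff
print(values[1][0])  # 2
```

Note that MATLAB indexing starts at 1, so rows[0][1] here corresponds to row 1, column 2 in MATLAB.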
OK, I finally got my answer. I used the Import Data tool, which is available in MATLAB 2013. It really helps you get your data into the form you want.
Cheers.
CREATE PROCEDURE bank(
in bk_cd_in CHAR( 4 ) ,
out bk_cd_out CHAR( 4 ) ,
out bk_nm CHAR( 40 ) ,
out brh_cd CHAR( 8 ) ,
out bak_hnm CHAR( 40 ) ,
out ur_cd CHAR(18),
out updt DATE,
out updt_flag CHAR( 1 ) ,
out brh_nm CHAR( 40 ) ,
out cty_nm CHAR( 40 ) )
SELECT bank_cd, bank_nm, branch_cd, bank_hnm, user_cd, update_dt, update_flag, branch_nm, city_nm
FROM bankmst
WHERE bank_cd = bk_cd_in
into bk_cd_out, bk_nm, brh_cd, bak_hnm, ur_cd, updt, updt_flag, brh_nm, cty_nm ;
The calling code is written in JSP. The OUT parameters come back empty, and running the JSP page ends with the error: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 'b1' in 'field list'
try
{
    cs = con.prepareCall("{call bank(?,?,?,?,?,?,?,?,?,?)}");
    String s1 = para;
    cs.setString(1,para);
    cs.registerOutParameter(2, java.sql.Types.VARCHAR);
    cs.registerOutParameter(3, java.sql.Types.VARCHAR);
    cs.registerOutParameter(4, java.sql.Types.CHAR);
    cs.registerOutParameter(5, java.sql.Types.CHAR);
    cs.registerOutParameter(6, java.sql.Types.CHAR);
    cs.registerOutParameter(7, java.sql.Types.DATE);
    cs.registerOutParameter(8, java.sql.Types.CHAR);
    cs.registerOutParameter(9, java.sql.Types.CHAR);
    cs.registerOutParameter(10, java.sql.Types.CHAR);
    cs.registerOutParameter(11, java.sql.Types.INTEGER);
    rs = cs.executeQuery();
    out.println("\nexecuted 3\n");
    String b1 = cs.getString(2);
    String b2 = cs.getString(3);
    String b3 = cs.getString(4);
    String b4 = cs.getString(5);
    String b5 = cs.getString(6);
    java.util.Date b6 = cs.getDate(7);
    String b7 = cs.getString(8);
    String b8 = cs.getString(9);
    String b9 = cs.getString(10);
    System.out.println(b1);
    System.out.println(b2);
    System.out.println(b3);
    System.out.println(b4);
    System.out.println(b5);
    System.out.println(b6);
    System.out.println(b7);
    System.out.println(b8);
    System.out.println(b9);
}
I am using Eclipse 3.6, and the procedure is written for MySQL 5.1.
How can I write a Unicode symbol in Lua? For example, I have to write the symbol with code 9658.
When I write
string.char( 9658 );
I get an error. How can I write such a symbol?
Lua does not interpret the contents of string literals, so as long as your source file is saved as UTF-8 you can just write
mychar = "►"
(added in 2015)
Lua 5.3 introduced support for UTF-8 escape sequences:
The UTF-8 encoding of a Unicode character can be inserted in a literal string with the escape sequence \u{XXX} (note the mandatory enclosing brackets), where XXX is a sequence of one or more hexadecimal digits representing the character code point.
In Lua 5.3 and later you can also use utf8.char(9658).
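On Lua 5.1 or 5.2 neither \u{...} nor the utf8 library is available, but decimal byte escapes (\ddd) work in string literals in every Lua version, so you can spell out the UTF-8 bytes yourself. Code point 9658 is U+25BA (►), whose UTF-8 bytes are E2 96 BA, i.e. "\226\150\186" in Lua. A small Python helper (lua_escape is a hypothetical name) that prints such a literal for any code point:

```python
def lua_escape(codepoint: int) -> str:
    """Return a Lua string literal for the code point using decimal \\ddd escapes.

    Decimal escapes work in every Lua version, unlike \\u{...} (Lua 5.3+).
    """
    utf8_bytes = chr(codepoint).encode("utf-8")
    return '"' + "".join(f"\\{b}" for b in utf8_bytes) + '"'


print(lua_escape(9658))  # "\226\150\186"
```

Paste the printed literal into the Lua source, e.g. mychar = "\226\150\186".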
Here is an encoder for Lua that takes a Unicode code point and produces a UTF-8 string for the corresponding character:
do
  -- {largest code point for this sequence length, lead-byte marker}
  local bytemarkers = { {0x7FF,192}, {0xFFFF,224}, {0x1FFFFF,240} }
  -- note: in Lua 5.3+ this global function replaces the built-in utf8 library table
  function utf8(decimal)
    if decimal<128 then return string.char(decimal) end
    local charbytes = {}
    for bytes,vals in ipairs(bytemarkers) do
      if decimal<=vals[1] then
        -- fill continuation bytes (10xxxxxx) from the last one backwards
        for b=bytes+1,2,-1 do
          local mod = decimal%64
          decimal = (decimal-mod)/64
          charbytes[b] = string.char(128+mod)
        end
        charbytes[1] = string.char(vals[2]+decimal)
        break
      end
    end
    return table.concat(charbytes)
  end
end
c=utf8(0x24) print(c.." is "..#c.." bytes.") --> $ is 1 bytes.
c=utf8(0xA2) print(c.." is "..#c.." bytes.") --> ¢ is 2 bytes.
c=utf8(0x20AC) print(c.." is "..#c.." bytes.") --> € is 3 bytes.
c=utf8(0x24B62) print(c.." is "..#c.." bytes.") --> 𤭢 is 4 bytes.
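To make the bit arithmetic in that function easier to follow, here is a line-for-line port to Python (a sketch; utf8_encode is an illustrative name), checked against Python's built-in encoder. Each pass through the inner loop peels off the low six bits into a continuation byte (10xxxxxx), and whatever remains is added to the lead-byte marker: 192 (110xxxxx) for two-byte, 224 (1110xxxx) for three-byte and 240 (11110xxx) for four-byte sequences.

```python
# (upper limit, lead-byte marker) pairs, mirroring the Lua bytemarkers table
BYTEMARKERS = [(0x7FF, 192), (0xFFFF, 224), (0x1FFFFF, 240)]


def utf8_encode(decimal: int) -> bytes:
    """Encode one code point the same way as the Lua utf8() function."""
    if decimal < 128:
        return bytes([decimal])
    for extra, (limit, marker) in enumerate(BYTEMARKERS, start=1):
        if decimal <= limit:
            tail = []
            for _ in range(extra):
                decimal, low6 = divmod(decimal, 64)
                tail.append(128 + low6)  # continuation byte 10xxxxxx
            # continuation bytes were produced last-first, so reverse them
            return bytes([marker + decimal] + tail[::-1])
    raise ValueError("code point out of range")


for cp in (0x24, 0xA2, 0x20AC, 0x24B62):
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")
    print(f"U+{cp:04X} -> {encoded.hex(' ').upper()} ({len(encoded)} bytes)")
```

Like the Lua original, it does not reject surrogate code points (U+D800..U+DFFF), which are not valid in UTF-8.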
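Maybe this can help you. The following function decodes the UTF-8 sequence that starts at position pos in the editor buffer and returns the code point, the position of the sequence's last byte, and the number of bytes: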
function FromUTF8(pos)
  -- math.fmod in Lua 5.1 and later, math.mod in Lua 5.0
  local mod = math.fmod or math.mod
  -- editor.CharAt (SciTE's Lua API) returns signed bytes, so map them to 0..255
  local function charat(p)
    local v = editor.CharAt[p]; if v < 0 then v = v + 256 end; return v
  end
  local v, c, n = 0, charat(pos), 1
  if c < 128 then v = c
  elseif c < 192 then
    error("Byte values between 0x80 and 0xBF cannot start a multibyte sequence")
  elseif c < 224 then v = mod(c, 32); n = 2
  elseif c < 240 then v = mod(c, 16); n = 3
  elseif c < 248 then v = mod(c, 8); n = 4
  -- 5- and 6-byte forms come from the original UTF-8 design; RFC 3629 allows at most 4
  elseif c < 252 then v = mod(c, 4); n = 5
  elseif c < 254 then v = mod(c, 2); n = 6
  else
    error("Byte values between 0xFE and 0xFF cannot start a multibyte sequence")
  end
  for i = 2, n do
    pos = pos + 1; c = charat(pos)
    if c < 128 or c > 191 then
      error("Following bytes must have values between 0x80 and 0xBF")
    end
    v = v * 64 + mod(c, 64)
  end
  return v, pos, n
end
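For comparison, here are the same decoding steps as a Python sketch that works on a bytes object instead of the editor buffer (from_utf8 is an illustrative name). The lead byte's high bits give the sequence length, its remaining low bits start the code point, and each continuation byte contributes six more bits. Unlike the Lua function, this version rejects 5- and 6-byte lead bytes, because RFC 3629 restricts UTF-8 to at most four bytes; like the original, it does not check for overlong forms or surrogates.

```python
def from_utf8(data: bytes, pos: int = 0) -> tuple:
    """Decode the UTF-8 sequence starting at data[pos].

    Returns (code point, index of the sequence's last byte, sequence length),
    like the Lua FromUTF8, but with 0-based indexes and only the 1- to 4-byte
    forms that RFC 3629 allows.
    """
    c = data[pos]
    if c < 0x80:
        return c, pos, 1
    if c < 0xC0:
        raise ValueError("byte values 0x80..0xBF cannot start a sequence")
    if c < 0xE0:
        value, n = c % 32, 2   # 110xxxxx
    elif c < 0xF0:
        value, n = c % 16, 3   # 1110xxxx
    elif c < 0xF8:
        value, n = c % 8, 4    # 11110xxx
    else:
        raise ValueError("lead bytes 0xF8..0xFF are not valid in RFC 3629 UTF-8")
    for _ in range(n - 1):
        pos += 1
        c = data[pos]
        if not 0x80 <= c <= 0xBF:
            raise ValueError("continuation bytes must be in 0x80..0xBF")
        value = value * 64 + c % 64  # append six more bits
    return value, pos, n


text = "a\u20ac\U00024b62".encode("utf-8")  # a, €, 𤭢
pos = 0
while pos < len(text):
    cp, last, n = from_utf8(text, pos)
    print(f"U+{cp:04X} ({n} bytes)")
    pos = last + 1
```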
To get broader support for Unicode string content, one approach is slnunicode, which was developed as part of the Selene database library. It gives you a module that supports most of what the standard string library does, but with Unicode characters and UTF-8 encoding.