Telugu Anu Script Text - unicode

About an Indian-language script that loses characters when copied and pasted into browsers
I need to know about the character types and how to convert them to different supported formats. My question is:
I have text that was typed using Anu Script Software with an Apple keyboard.
Text typed using Anu cannot be used as input in any browser, or in WhatsApp Web.
Can anyone solve this?
The text, when copied and pasted, displays like this:

And the real text is shown in the screenshot below:
This image shows one language of India, typed using Anu Script Software

The character codes that were copied and pasted into the question are Unicode code points in the Unicode BMP (Basic Multilingual Plane) Private Use Area (PUA). The distinct points are:
U+F020, U+F026, U+F02B, U+F03C, U+F054, U+F058, U+F05C, U+F06A,
U+F073, U+F075, U+F077, U+F079, U+F080, U+F083, U+F087, U+F088,
U+F08A, U+F090, U+F091, U+F09F, U+F0B2, U+F0BC, U+F0BF, U+F0C2,
U+F0D2, U+F0D4, U+F0E1, U+F0E6, U+F0E7, U+F0EC, U+F0FB
If you go to the Unicode Charts page and enter 'F020' as the code, it gives you UE000.pdf to download, which says:
Private Use Area
Range: E000-F8FF
The Private Use Area does not contain any character assignments, consequently no character code charts or names lists are provided for this area.
What this means is that the Anu Script Software is using Unicode code points that have no internationally agreed meaning. The BMP PUA is, by definition, for 'private use', and the parties sharing data using the PUA must agree on what the code points mean and how to display them. You cannot use these code points except with software that understands the convention Anu Script Software uses.
Browsers will only understand those code points if they're made aware of where the relevant font is, which gets into intricate details and is probably platform specific. (I've no idea where to start!)
The standard Unicode range for Telugu is U+0C00..U+0C7F.
Telugu
Range: 0C00–0C7F
Your best bet is probably to analyze the similarities and differences between the code points used by Anu Script Software and the Unicode standard range for Telugu, and then use the Unicode standard codes. You might need to understand combining accents and various other aspects of Telugu.
I don't know Telugu at all, so what follows may be inaccurate, but I think it more or less makes sense of what's in the Anu Script Software output:
UTF-8 bytes PUA Telugu Glyph
0xEF 0x82 0x87 = U+F087 ==> U+0C08 ఈ
0xEF 0x80 0xA0 = U+F020 ==> U+0020 space
0xEF 0x82 0x80 = U+F080 ==> U+0C06 ఆ
0xEF 0x81 0x9C = U+F05C ==> U+0C32 ల
0xEF 0x81 0xAA = U+F06A \
0xEF 0x83 0xA1 = U+F0E1 ==> U+0C2F య (three code points for one character)
0xEF 0x81 0x94 = U+F054 /
0xEF 0x80 0xAB = U+F02B ==> U+0C66 ౦
0xEF 0x80 0xA0 = U+F020 ==> U+0020 space
0xEF 0x83 0x82 = U+F0C2
0xEF 0x81 0xB3 = U+F073
0xEF 0x80 0xAB = U+F02B
0xEF 0x80 0xA6 = U+F026
0xEF 0x82 0x83 = U+F083
0xEF 0x81 0x94 = U+F054
0xEF 0x80 0xA0 = U+F020 ==> U+0020 space
0xEF 0x80 0xBC = U+F03C
0xEF 0x82 0x8A = U+F08A
0xEF 0x81 0x98 = U+F058
0xEF 0x83 0xA6 = U+F0E6
0xEF 0x81 0xB5 = U+F075
0xEF 0x82 0xB2 = U+F0B2
0xEF 0x83 0x92 = U+F0D2
0xEF 0x81 0x9C = U+F05C
0xEF 0x80 0xA0 = U+F020 ==> U+0020 space
0xEF 0x83 0xA7 = U+F0E7 ==> U+0C46 U+0C66 ౦ె (Note 1)
0xEF 0x82 0xBF = U+F0BF
0xEF 0x83 0xAC = U+F0EC
0xEF 0x83 0x94 = U+F0D4
0xEF 0x83 0xA1 = U+F0E1
0xEF 0x80 0xAB = U+F02B
0xEF 0x80 0xA0 = U+F020 ==> U+0020 space
0xEF 0x81 0xB3 = U+F073
0xEF 0x82 0x90 = U+F090
0xEF 0x83 0xA7 = U+F0E7
0xEF 0x81 0xB7 = U+F077
0xEF 0x82 0x9F = U+F09F
0xEF 0x82 0xBC = U+F0BC
0xEF 0x80 0xA0 = U+F020 ==> U+0020 space
0xEF 0x80 0xBC = U+F03C
0xEF 0x83 0xBB = U+F0FB
0xEF 0x81 0xB9 = U+F079
0xEF 0x82 0x90 = U+F090
0xEF 0x80 0xBC = U+F03C
0xEF 0x82 0x91 = U+F091
0xEF 0x81 0xAA = U+F06A
0xEF 0x83 0xA1 = U+F0E1
0xEF 0x81 0x94 = U+F054
0xEF 0x80 0xA0 = U+F020 ==> U+0020 space
0xEF 0x80 0xBC = U+F03C
0xEF 0x82 0x8A = U+F08A
0xEF 0x81 0xB3 = U+F073
0xEF 0x82 0x90 = U+F090
0xEF 0x82 0x88 = U+F088
0xEF 0x80 0xBC = U+F03C
0xEF 0x82 0x91 = U+F091
0xEF 0x81 0xAA = U+F06A \
0xEF 0x83 0xA1 = U+F0E1 ==> U+0C2F య
0xEF 0x81 0x94 = U+F054 /
Note 1: The TELUGU VOWEL SIGN E U+0C46 should combine with TELUGU DIGIT ZERO U+0C66 — if I've identified the characters correctly, which seems improbable. I will leave off trying here; I recognize some shapes by matching what you show in the image with the Unicode chart page, but I'm not confident of the mapping to the PUA code points.
You should be able to get appropriate information from the people who provided the Anu Script Software.
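Once a mapping is known, the conversion itself is mechanical. The following is a minimal Python sketch, not a working Anu converter: the table `PUA_TO_UNICODE` holds only the few single-code-point guesses from the table above (which may well be wrong), and the names `PUA_TO_UNICODE`, `is_bmp_pua` and `convert_anu_text` are made up for illustration. It replaces the code points it knows and reports the PUA code points it could not map.

```python
# Tentative mappings copied from the table above. These are guesses made by
# matching glyph shapes, NOT an authoritative Anu Script mapping.
PUA_TO_UNICODE = {
    0xF020: "\u0020",  # space
    0xF087: "\u0C08",  # TELUGU LETTER II (guess)
    0xF080: "\u0C06",  # TELUGU LETTER AA (guess)
    0xF05C: "\u0C32",  # TELUGU LETTER LA (guess)
}


def is_bmp_pua(ch):
    """Return True if ch lies in the BMP Private Use Area (U+E000..U+F8FF)."""
    return 0xE000 <= ord(ch) <= 0xF8FF


def convert_anu_text(text):
    """Replace known PUA code points; return (converted text, unmapped PUA code points)."""
    out = []
    unmapped = set()
    for ch in text:
        cp = ord(ch)
        if cp in PUA_TO_UNICODE:
            out.append(PUA_TO_UNICODE[cp])
        else:
            if is_bmp_pua(ch):
                unmapped.add(cp)  # still needs a mapping
            out.append(ch)
    return "".join(out), sorted(unmapped)


sample = "\uf087\uf020\uf080\uf05c\uf06a"
converted, unmapped = convert_anu_text(sample)
print(converted)
print(["U+%04X" % cp for cp in unmapped])
```

A one-to-one table like this cannot handle cases such as the three PUA code points that appear to form a single Telugu character, so a real converter would also need multi-code-point rules.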


How to remove '0x' in hex and get 2 digits

I have a hex output "Res" which looks like this:
Res = 0x0 0x1a 0x9 0x14 0x13 0x0
I want to:
- remove the '0x' from the beginning of each value
- pad each value to 2 digits
- remove the spaces in between
i.e. I want Res to look like this: 001a09141300
I tried .join, but I need each value padded to 2 digits first.
This is one way to approach it:
res='0x0 0x1a 0x9 0x14 0x13 0x0'
newStr=''
for x in res.split(' '):
    x=x[2:]
    if len(x)<2:
        x='0'+x
    newStr=newStr+x
print(newStr)
Output:
001a09141300
How about this:
res = '0x0 0x1a 0x9 0x14 0x13 0x0'
li = [int(s, 16) for s in res.split()] # [0, 26, 9, 20, 19, 0]
ls = [f"{i:0>2x}" for i in li] # ['00', '1a', '09', '14', '13', '00']
result = "".join(ls)
print(result) # 001a09141300
You need Python 3.6 or higher to use f-string.
If your Python version is lower than that, you may use ls = ["{:0>2x}".format(i) for i in li] instead.
Explanation of f"{i:0>2x}":
>2: Right align with width 2
0 on the left: Fill the empty space with 0
x on the right: Represent in hexadecimal form
res='0x0 0x1a 0x9 0x14 0x13 0x0'
hex_ls=[x.replace('0x','0') if len(x)<4 else x.replace('0x','') for x in res.split(" ")]
print("".join(hex_ls))
The output is 001a09141300
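res = '0x0 0x1a 0x9 0x14 0x13 0x0'
# Strip the '0x' prefix from each token, then left-pad each token to 2 digits
res = ''.join(s.replace('0x', '').zfill(2) for s in res.split())
print(res) # 001a09141300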
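Another option, if every value is a single byte (0x00 to 0xff), is to go through a bytes object. This is just a sketch; the function name `hex_tokens_to_string` is made up for illustration, and `bytes(...)` raises ValueError if any value is larger than 255.

```python
def hex_tokens_to_string(text):
    """Turn '0x0 0x1a ...' into a zero-padded hex string with no separators."""
    # int(token, 16) accepts the '0x' prefix
    values = [int(token, 16) for token in text.split()]
    # bytes.hex() always emits exactly two lowercase hex digits per byte
    return bytes(values).hex()


res = '0x0 0x1a 0x9 0x14 0x13 0x0'
print(hex_tokens_to_string(res))  # 001a09141300
```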

convert 16 bits signed (x2) to 32bits unsigned

I've got a problem with a Modbus device:
The device sends data using the Modbus protocol.
I read 4 bytes from the Modbus communication that represent a pressure value.
I have to convert these 4 bytes to an unsigned 32-bit integer.
Here is the Modbus documentation:
COMBINING 16bit REGISTERS TO 32bit VALUE
Pressure registers 2 & 3 in SENSOR INPUT REGISTER MAP of this guide are stored as u32 (UNSIGNED 32bit INTEGER)
You can calculate pressure manually :
1) Determine what display you have - if register values are positive skip to step 3.
2) Convert negative register 2 & 3 values from Signed to Unsigned (note: 65536 = 2^16):
(reg 2 value) + 65536* = 35464 ; (reg 3 value) + 65536 = 1
3) Shift register #3 as this is the upper 16 bits: 65536 * (converted reg 3 value) = 65536
4) Put two 16bit numbers together: (converted reg 2 value) + (converted reg 3 value) = 35464 + 65536 = 101000 Pa
Pressure information is then 101000 Pascal.
I don't find it very clear... For example, we are not given the 4 bytes that produce this calculation.
So, if anybody has a formula to convert my bytes into a 32-bit unsigned int, it would be very helpful.
You should be able to read your bytes in some kind of representation (hex, dec, bin, oct...).
Let's assume you're receiving the following frame of bytes:
in hex:
0x00, 0x06, 0x68, 0xA0
in bin:
0000 0000, 0000 0110, 0110 1000, 1010 0000
These are different representations of the same 4 byte values.
Another thing that you should know is the byte order (endianness):
If your frame is transmitted in big endian, you're going to read the bytes in the order that you have them (so 0x00, 0x06, 0x68, 0xA0 is correct).
If the frame is transmitted in little endian, you need to perform the following operation:
Switch the first 2 bytes with the last 2:
0x68, 0xA0, 0x00, 0x06
and then switch the position between the first and the second byte and the third and the fourth byte:
0xA0, 0x68, 0x06, 0x00
So if your frame is in little endian, the correct frame will be 0xA0, 0x68, 0x06, 0x00.
If you don't know the endianness, assume it's big endian.
Now you simply have to 'put' your values together:
0x00, 0x06, 0x68, 0xA0 will become 0x000668A0
or
0000 0000, 0000 0110, 0110 1000, 1010 0000 will become 00000000000001100110100010100000
Once you have your hex or bin, you can convert your bin to an integer or convert your hex to an integer
Here you can find an interesting tool for converting HEX to float, uint32, int32, int16 in all endiannesses.
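As a quick check of the steps above, here is a small Python sketch using the example frame from this answer. All four routes should agree on the same integer; the variable names are only illustrative.

```python
frame = bytes([0x00, 0x06, 0x68, 0xA0])  # example frame from this answer

# Route 1: parse the concatenated hex digits
value_from_hex = int("000668A0", 16)

# Route 2: parse the concatenated binary digits (one 8-bit group per byte)
value_from_bin = int("00000000" "00000110" "01101000" "10100000", 2)

# Route 3: let Python assemble the bytes, most significant byte first
value_big = int.from_bytes(frame, byteorder="big")

# Route 4: the fully reversed frame read as little endian gives the same value
value_little = int.from_bytes(frame[::-1], byteorder="little")

print(value_from_hex, value_from_bin, value_big, value_little)
```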
TL;DR
If you can use Python, you should use struct:
import struct
frame = [0x00, 0x06, 0x68, 0xA0] # or [0, 6, 104, 160] in dec or [0b00000000, 0b00000110, 0b01101000, 0b10100000] in bin
# '>L' = big-endian unsigned 32-bit; bytes(frame) packs the list into a byte string (Python 3)
print(struct.unpack('>L', bytes(frame))[0])
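If your Modbus library hands you the two registers as signed 16-bit integers rather than raw bytes, the manual procedure quoted in the question can be sketched directly. This assumes, as the guide states, that register 3 holds the upper 16 bits; the function name `registers_to_u32` and the sample register values (chosen to reproduce the guide's 101000 Pa) are my own, not taken from the device documentation.

```python
def registers_to_u32(reg2, reg3):
    """Combine two 16-bit registers (possibly read as signed) into one unsigned 32-bit value."""
    # Masking with 0xFFFF is the same as adding 65536 to a negative 16-bit value
    low = reg2 & 0xFFFF
    high = reg3 & 0xFFFF
    # Shifting left by 16 is the same as multiplying by 65536
    return (high << 16) | low


# Assumed sample values: register 2 read as signed -30072 (35464 unsigned), register 3 = 1
print(registers_to_u32(-30072, 1))  # 101000
```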

Deciphering HCI event from BLUEnrg-ms bluetooth device

I am successfully able to communicate with my IDB05A1 from my nucleo-64 board.
I make it discoverable and pair my phone to it. However immediately after pairing, the device disconnects from my phone.
Before it disconnects I receive an HCI event I cannot decipher:
0x04 0xff 0x0b 0x01 0x0c 0x01 0x08 0x04 0x00 0x02 0x00 0x00 0x02 0x00
Please help me decipher this. datasheet with commands and events
0x04 //HCI event
0xff //Vendor specific
0x0b //Contains 0x0b (11) bytes
0x01, 0x0c //BLUEnrg event code
0x......
What event is this?
This event does not look related to the problem.
It indicates that the value of some attribute was modified.
HCI event packet:
0x04 //HCI event
0xff //Vendor specific
0x0b //Contains 0b(11) bytes
0x01 0x0c 0x01 0x08 0x04 0x00 0x02 0x00 0x00 0x02 0x00 //Event data
Event data:
0x0c01 //ACI_GATT_ATTRIBUTE_MODIFIED_EVENT
0x0801 //The connection handle which modified the attribute
0x0004 //Handle of the attribute that was modified
0x0002 //Length of the attribute data
0x00 //Offset
0x0002 //The modified value
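The fixed part of this breakdown can be reproduced with a short Python sketch. It only decodes the fields up to the attribute handle, all read as little-endian values, and returns the remaining bytes untouched; the function name `parse_vendor_event` and the dictionary keys are made up for illustration and are not part of any BlueNRG API.

```python
import struct


def parse_vendor_event(packet):
    """Split a vendor-specific HCI event into its header fields (little endian)."""
    # Packet type, event code, and parameter length are one byte each
    packet_type, event_code, param_len = struct.unpack_from("<BBB", packet, 0)
    params = packet[3:3 + param_len]
    # The event data starts with three 16-bit little-endian fields
    ecode, conn_handle, attr_handle = struct.unpack_from("<HHH", params, 0)
    return {
        "packet_type": packet_type,
        "event_code": event_code,
        "param_len": param_len,
        "ecode": ecode,
        "conn_handle": conn_handle,
        "attr_handle": attr_handle,
        "rest": params[6:],  # length/offset/value bytes, left undecoded here
    }


packet = bytes([0x04, 0xff, 0x0b, 0x01, 0x0c, 0x01, 0x08,
                0x04, 0x00, 0x02, 0x00, 0x00, 0x02, 0x00])
event = parse_vendor_event(packet)
print(hex(event["ecode"]), hex(event["conn_handle"]), hex(event["attr_handle"]))
```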

Why was the memory not filled with zeros?

Allocated array for 10000 bits = 1250 bytes(10000/8):
mov edi, 1250
call malloc
tested the pointer:
cmp rax, 0
jz .error ; error handling at label down the code
memory was allocated:
(gdb) p/x $rax
$3 = 0x6030c0
attempted to fill that allocated memory with zeros:
mov rdi, rax
xor esi, esi
mov edx, 1250 ; 10000 bits
call memset
checked first byte:
(gdb) p/x $rax
$2 = 0x6030c0
(gdb) x/xg $rax + 0
0x6030c0: 0x0000000000000000
checked last byte(0 - first byte, 1249 - last byte)
(gdb) p/x $rax + 1249
$3 = 0x6035a1
(gdb) x/xg $rax + 1249
0x6035a1: 0x6100000000000000
SOLVED QUESTION
Should have typed x/1c $rax + 1249
You interpreted the memory as a 64-bit integer, but you forgot that Intel CPUs are little endian, so the bytes are displayed in reverse order.
0x6100000000000000 is the value that the CPU reads when de-serializing the memory at this address. Since it's little endian, the 0x61 byte is last in memory (it is not very convenient to dump memory in this format, unless you have a big endian architecture).
Use x/10bx $rax + 1249 and you'll see that the byte at the correct location is zero. The rest lies past the end of your allocation and is garbage (it happens to be zero for a while, then garbage). In memory order, the eight bytes are:
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x61
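The same effect can be reproduced outside gdb. This Python sketch assumes the eight bytes starting at the last allocated byte are seven zeros followed by 0x61, which is what the x/xg output implies, and shows that reading them as one little-endian 64-bit word gives the value gdb printed.

```python
import struct

# Assumed memory contents starting at $rax + 1249: seven zero bytes, then 0x61
memory = bytes([0x00] * 7 + [0x61])

# x/xg reads 8 bytes as one 64-bit word; on x86-64 that is little endian ('<Q')
as_giant_word = struct.unpack("<Q", memory)[0]
print(hex(as_giant_word))  # 0x6100000000000000

# The byte actually at offset 1249 is the first one, and it is zero
print(memory[0])  # 0
```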

SPI fails to read first 6 bytes

I'm having a lot of issues with SPI module on my STM32F051 MCU. I've got it configured as a master to drive a slave flash memory module (that doesn't really matter).
I'm trying to read 8 bytes from memory, this is how the 'read data' message is structured:
First 4 bytes of the message are transmitted, next 8 are received.
First byte is 'read data' opcode, three following are data address and equal 0 in this case.
Code:
memset(out, 0x00, 256);
memset(in, 0x00, 256);
out[0] = OPCODE_READ;
out[1] = 0x00;
out[2] = 0x00;
out[3] = 0x00;
uint32_t len = 4 + size; // size == 8
spi_select(M25P80);
HAL_SPI_TransmitReceive(&hspi1, out, in, len, TIMEOUT);
delay_ms(BYTE_SPEED_MS * 5); // Needed because ^ finishes before physically
// transmitting the data. Nevermind the 5, it
// was picked experimentally
spi_deselect(M25P80);
Signal (yellow - clock, red - miso):
At 488 bits/s transmitting 4 bytes takes 4 * 1E3 / (488 / 8) = 65.5 ms. Then the reception starts. Memory starts transmitting [0xFF...0xFF] right away, but contents of the 'in' buffer are:
[0x00 0x00 0x00 0x00] [0x00 0x00 0x00 0x00 0x00] 0xFF 0xFF 0x00...0x00
^ zero because this   ^ should be 0xFF          ^ correct data
  is the part where
  data was being sent
  to the memory
So first six bytes of data are just lost. Am I the only one who's having such a hard time with STM's SPI module?
EDIT:
I've gotten myself a different eval board with a slightly different MCU (STM32F030) and it gets even weirder:
[0x02 0x02 0x02 0x02]
0x00 0x02 0x00 0x00 0xFF 0x00 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0x00...0x00
Although I must mention that I'm using a different compiler with this MCU.
EDIT 2:
The way I partially got it to work is using 16-bit mode with SPI. This fixed this particular bug, but there are more similar oddities with STM32's SPI.
EDIT 3:
SPI initialisation code:
void MX_SPI1_Init(void)
{
hspi1.Instance = SPI1;
hspi1.Init.Mode = SPI_MODE_MASTER;
hspi1.Init.Direction = SPI_DIRECTION_2LINES;
hspi1.Init.DataSize = SPI_DATASIZE_16BIT;
hspi1.Init.CLKPolarity = SPI_POLARITY_LOW;
hspi1.Init.CLKPhase = SPI_PHASE_1EDGE;
hspi1.Init.NSS = SPI_NSS_SOFT;
hspi1.Init.BaudRatePrescaler = SPI_BAUDRATEPRESCALER_2;
hspi1.Init.FirstBit = SPI_FIRSTBIT_MSB;
hspi1.Init.TIMode = SPI_TIMODE_DISABLED;
hspi1.Init.CRCCalculation = SPI_CRCCALCULATION_DISABLED;
hspi1.Init.NSSPMode = SPI_NSS_PULSE_DISABLED;
HAL_SPI_Init(&hspi1);
}
Are you sure that the SPI initialization is right?
Maybe the clock polarity or phase settings do not match between master and slave?
Take a look at the clock settings.
Please show your SPI initialization code!