UTF8 Hex Codepoint to Decimal Mismatch - unicode

I'm working on a program that takes the hex value of a unicode character and converts it to an integer, then to a byte array, then to a UTF-8 string. All is fine other than the fact that, for example, the hex value E2 82 AC (€ symbol) is 14 844 588 in decimal, but, if you look at the code point value of it on the web page provided below, it's 226 130 172, which is a big difference.
http://utf8-chartable.de/unicode-utf8-table.pl?start=8320&number=128&names=-
If you sort the values their by decimal, they're not just converting the hex to decimal. Obviously I don't understand encodings as well as I thought I did.
E2 82 AC maps to 226 130 172 instead of 14 844 588.
Why is this discrepancy?
Thanks in advance.

I think your statement, "the hex value E2 82 AC (€ symbol) is 14 844 588 in decimal", is incorrect.
How did you interpret the hex values E2, 82, and AC?
hex E2 = hex E * 16 + hex 2 = 14 * 16 + 2 = 226.
hex 82 = hex 8 * 16 + hex 2 = 8 * 16 + 2 = 130.
hex AC = hex A * 16 + hex C = 10 * 16 + 12 = 172.
So, the hex value E2 82 AC (€ symbol) is in fact 226 130 172 in decimal.

Related

[guid]::NewGuid().GetBytes() returns different result than [System.Text.Encoding]::UTF8.GetBytes(...)

I found this excellent approach on shortening GUIDs here on stackowerflow: .NET Short Unique Identifier
I have some other strings that I wanted to treat the same way, but I found out that in most cases the Base64String is even longer than the original string.
My question is: why does [guid]::NewGuid().ToByteArray() return a significant smaller byte array than [System.Text.Encoding]::UTF8.GetBytes([guid]::NewGuid().Guid)?
For example, let's look at the following GUID:
$guid = [guid]::NewGuid()
$guid
Guid
----
34c2b21e-18c3-46e7-bc76-966ae6aa06bc
With $guid.GetBytes(), the following is returned:
30
178
194
52
195
24
231
70
188
118
150
106
230
170
6
188
And [System.Convert]::ToBase64String($guid.ToByteArray()) generates HrLCNMMY50a8dpZq5qoGvA==
[System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($guid.Guid)), however, returns MzRjMmIyMWUtMThjMy00NmU3LWJjNzYtOTY2YWU2YWEwNmJj, with [System.Text.Encoding]::UTF8.GetBytes($guid.Guid) being:
51
52
99
50
98
50
49
101
45
49
56
99
51
45
52
54
101
55
45
98
99
55
54
45
57
54
54
97
101
54
97
97
48
54
98
99
The GUID struct is an object storing a 16 byte array that contains its value.
These are the 16 bytes you see when you perform its method .ToByteArray() method.
The 'normal' string representation is a grouped series of these bytes in hexadecimal format. (4-2-2-2-6)
As for converting to Base64, this will always return a longer string because each Base64 digit represents exactly 6 bits of data.
Therefore, every three 8-bits bytes of the input (3×8 bits = 24 bits) can be represented by four 6-bit Base64 digits (4×6 = 24 bits).
The resulting string can even be extended with = padding characters at the end of the string to always be a multiple of 4.
The result is a string of [math]::Ceiling(<original size> / 3) * 4 length.
Using [System.Text.Encoding]::UTF8.GetBytes([guid]::NewGuid().Guid) is actually first performing the GUID's .ToString() method and from that string it will return the ascii values of each character in there.
(hexadecimal representation = 2 characters per byte = 32 values + the four dashes in it leaves a 36-byte array)
[guid]::NewGuid().ToByteArray()
In the scope of this question, a GUID can be seen as a 128-bit number (actually it is a structure, but that's not relevant to the question). When converting it into a byte array, you divide 128 by 8 (bits per byte) and get an array of 16 bytes.
[System.Text.Encoding]::UTF8.GetBytes([guid]::NewGuid().Guid)
This converts the GUID to a hexadecimal string representation first. Then this string gets encoded as UTF-8.
A hex string uses two characters per input byte (one hex digit for the lower and one for the upper 4 bits). So we need at least 32 characters (16 bytes of GUID multiplied by 2). When converted to UTF-8 each character relates to exactly one byte, because all hex digits as well as the dash are in the basic ASCII range which maps 1:1 to UTF-8. So including the dashes we end up with 32 + 4 = 36 bytes.
So this is what [System.Convert]::ToBase64String() has to work with - 16 bytes of input in the first case and 36 bytes in the second case.
Each Base64 output digit represents up to 6 input bits.
16 input bytes = 128 bits, divided by 6 = 22 Base64 characters
36 input bytes = 288 bits, divided by 6 = 48 Base64 characters
That's how you end up with more than twice the number of Base64 characters when converting a GUID to hex string first.

setxor single value of cell array with whole cell array values?

My cell array,it contain values like
a=['10100011' '11000111' 00010111' 11100011 '];
I want to apply xor operation ;
I used setxor. I want to xor first value of array i.e 10100011 to all values of cell arrays,required input & output is as follows !
**setxor(10100011, 10100011) =00000000%(first value xor result with first value)
setxor(10100011 , 11000111) =1100100%(first val xor result with second value)
setxor(10100011, 00010111 )=10110100%(first val xor result with third value)
setxor(10100011 ,11100011)=1000000 %(first val xor result with forth value)**
but i don't know how to pass full cell array and single value, i tried to use cellfun but that's not working. I want something that setxor(first val of my cell array , all vals of my cell array) as my cell array is 16x16. Your help would be highly appriciated.
If you're starting from data that looks like this:
A = [163 215 9 131 248 72 246 244 179 33 21 120 153 177 175 249; ...
231 45 77 138 206 76 202 46 82 149 217 30 78 56 68 40];
And you want to XOR the bits of the first entry with every other entry, you don't have to convert A using dec2bin. You can just use bitxor, then format the result however you want (decimal or cell arrays of binary strings):
>> decOut = bitxor(A(1), A)
decOut =
0 116 170 32 91 235 85 87 16 130 182 219 58 18 12 90
68 142 238 41 109 239 105 141 241 54 122 189 237 155 231 139
>> binOut = reshape(cellstr(dec2bin(decOut)), size(A))
binOut =
2×16 cell array
Columns 1 through 10
'00000000' '01110100' '10101010' '00100000' '01011011' '11101011' '01010101' '01010111' '00010000' '10000010'
'01000100' '10001110' '11101110' '00101001' '01101101' '11101111' '01101001' '10001101' '11110001' '00110110'
Columns 11 through 16
'10110110' '11011011' '00111010' '00010010' '00001100' '01011010'
'01111010' '10111101' '11101101' '10011011' '11100111' '10001011'

Function readmtx on matlab

I want to read a matrix that is on my matlab path. I was using the function readmtx but I don't know what to put on 'precision' (mtx = readmtx(fname,nrows,ncols,precision)).
I was wondering if you could help me with that. Or suggest a better way to read the matrix
You could read a matrix from text file with load command. If the first line include text, that should be started with %.
Note that each row of the text file should be values of a row in matrix, which are separated by a space, for Example:
%C1 C2 C3
1 2 3
4 5 6
7 8 9
Then, if you use load command you can read the text file into a matrix, something like:
myMatrix = load('textFileName.txt')
Now, Let's talk about readmtx ;)
About precision as described here:
Both binary and formatted data files can be read. If the file is binary, the precision argument is a format string recognized by fread. Repetition modifiers such as '40*char' are not supported. If the file is formatted, precision is a fscanf and sscanf-style format string of the form '%nX', where n is the number of characters within which the formatted data is found, and X is the conversion character such as 'g' or 'd'. Fortran-style double-precision output such as '0.0D00' can be read using a precision string such as '%nD', where n is the number of characters per element. This is an extension to the C-style format strings accepted by sscanf. Users unfamiliar with C should note that '%d' is preferred over '%i' for formatted integers. MATLAB syntax follows C in interpreting '%i' integers with leading zeros as octal. Formatted files with line endings need to provide the number of trailing bytes per row, which can be 1 for platforms with carriage returns or linefeed (Macintosh, UNIX®), or 2 for platforms with carriage returns and linefeeds (DOS).
Check this example also:
Write and read a binary matrix file:
fid = fopen('binmat','w');
fwrite(fid,1:100,'int16');
fclose(fid);
mtx = readmtx('binmat',10,10,'int16')
mtx =
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87 88 89 90
91 92 93 94 95 96 97 98 99 100
mtx = readmtx('binmat',10,10,'int16',[2 5],3:2:9)
mtx =
13 15 17 19
23 25 27 29
33 35 37 39
43 45 47 49

how to save values to .dat file in specific format using matlab

I have a matrix of values like [150 255 25;400 80 10;240 68 190]. I want to store these values to text file in hexadecimal format such that each value in matrix is represented by 3digit hexa value (12bit). i.e
Decimal Hexa notation
150 255 25 096 0FF 019
400 80 10 -> 190 050 00A
240 68 190 0F0 044 0BE
I am using like this
`fp=fopen('represen.dat','wb');
for i=1:1:x
for j=1:1:y
fprintf(fp,"%3x\t",A(i,j));
end
fprintf(fp,"\n");
end`
It is giving result as
Decimal Hexa notation
150 255 25 96 FF 19
400 80 10 -> 190 50 0A
240 68 190 F0 44 BE
help me in this regard..
First you have to convert the data to hex:
myHexData = dec2hex(myDecimalData)
Then you can save it, as explained here and mentioned in the comments by Deve:
how-to-save-values-to-text-file-in-specific-format-using-matlab

Little Endian Encoding

The following byte sequence is encoded as Little Endian Unsigned Int.
F0 00 00 00
I just read about endianness. Just wanted to verify if it is 240 decimal.
Translating the byte sequence to bits...
[1111 0000] [0000 0000] [0000 0000] [0000 0000]
Converting the first byte to decimal...
= 0*2^0 + 0*2^1 + 0*2^2 + 0*2^3 + 1*2^4 + 1*2^5 + 1*2^6 + 1*2^7
Doing the math...
= 16 + 32 + 64 + 128 = 240
Yes, 0x000000F0 = 240.
If it were big-endian, it would be 0xF0000000 = 4026531840 (or -268435456 if signed).