Is it safe to replace CP850 with UTF-8 encoding?

I have an old project that reads files as CP850, but it handles accented characters incorrectly (e.g., Montréal becomes MontrÚal).
I want to replace CP850 with UTF-8. The question is:
Is it safe? In other words, can we assume that UTF-8 is a superset of CP850 and that it encodes the same characters the same way?
Thanks
I tried hexdump; below is a sample of my CSV file. Is it UTF-8?
000000d0 76 20 64 65 20 4d 61 72 6c 6f 77 65 2c 2c 4d 6f |v de Marlowe,,Mo|
000000e0 6e 74 72 c3 a9 61 6c 2c 51 43 2c 48 34 41 20 20 |ntr..al,QC,H4A |

If by superset you mean does UTF-8 include all the characters of CP850, then trivially yes, since UTF-8 can encode all valid Unicode code points using a variable-length encoding (1–4 bytes).
If you mean are characters encoded the same way, then as you've seen this is not the case, since é (U+00E9) is encoded as 82 in CP850 and C3 A9 in UTF-8.
I cannot find a character set or code page that encodes Ú as 82. However, the byte E9 is Ú in CP850 and é in ISO-8859-1, so it's possible you've got your conversion the wrong way around (i.e. you're converting your file from ISO-8859-1 to CP850, when you want to convert from CP850 to UTF-8).
Here's an example using hd and iconv:
hd test.cp850.txt
00000000 4d 6f 6e 74 72 82 61 6c |Montr.al|
00000008
iconv --from cp850 --to utf8 test.cp850.txt > test.utf8.txt
hd test.utf8.txt
00000000 4d 6f 6e 74 72 c3 a9 61 6c |Montr..al|
00000009
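The byte values quoted above can be checked with Python's built-in codecs. This is a minimal sketch: the helper names `encodings_of` and `looks_like_utf8` are made up for illustration, and the sample bytes are copied from the hexdump in the question.

```python
E_ACUTE = "\u00e9"  # é, U+00E9

def encodings_of(ch):
    """Return the bytes for ch under the three encodings discussed above."""
    return {name: ch.encode(name) for name in ("cp850", "latin-1", "utf-8")}

def looks_like_utf8(data):
    """True if data decodes cleanly as UTF-8 (a necessary check, not proof)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# "Montr" + c3 a9 + "al", as seen in the question's hexdump
sample = bytes.fromhex("4d6f6e7472c3a9616c")

if __name__ == "__main__":
    print(encodings_of(E_ACUTE))                     # one byte, one byte, two bytes
    print(b"\xe9".decode("cp850") == "\u00da")       # E9 read as CP850 gives Ú
    print(looks_like_utf8(sample))                   # the CSV sample is valid UTF-8
    print(looks_like_utf8(b"Montr\x82al"))           # CP850 bytes are not
```

A lone 0x82 is a UTF-8 continuation byte with no lead byte, which is why a CP850 file cannot simply be relabelled as UTF-8; it has to be converted.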

Related

Hex Encoding and Decoding

I have two modules in a third-party application (it has no documentation, and I cannot reveal its name for confidentiality reasons). One module outputs only integers and the other outputs only floating-point numbers.
The module that outputs integers uses a very simple data format: the hex representation of each number in reverse byte order, so I am able to decode it successfully. However, I am having trouble decoding the hex representation of the floating-point numbers.
The data below shows the hex dump followed by the expected converted value. The only thing I know about the representation is that the last two bytes are some sort of CRC, so each record is an 8-byte number followed by two CRC bytes.
The first 8 bytes of each dataset are the ones that need to be converted; their expected values are given below:
Dataset 1: 02 B5 E6 7B 15 C8 0C 00 0A F9 = 999359.533
Dataset 2: 7C 4C 3A 00 00 00 00 00 B7 4C = 0.001
Can anyone suggest something here? I have tried many encoding schemes, including the IEEE formats. I do not have any other relevant information to share (I know solving this will be a matter of trial and error).
Not sure if this helps but:
02 B5 E6 7B 15 C8 0C 00 = 0x000CC8157BE6B502 = 3597694319113474
7C 4C 3A 00 00 00 00 00 = 0x00000000003A4C7C = 3820668
and
3597694319113474 / 3600000000 = 999359.5331
3820668 / 3600000000 = 0.001061297
So, allowing for some rounding, maybe they are fixed-point numbers in units of 1/3600000000?
Can you get some more data points?
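The arithmetic above can be reproduced with a short Python sketch. The divisor 3600000000 is only the guess made in this answer, not a documented constant, and `decode_record` is a hypothetical helper name.

```python
SCALE = 3_600_000_000  # guessed divisor from the answer above (an assumption)

def decode_record(hex_bytes):
    """Split a 10-byte record into (raw integer, scaled value, 2 trailing bytes).

    The first 8 bytes are read as a little-endian unsigned integer; the last
    two bytes (said to be a CRC) are returned untouched.
    """
    raw = bytes.fromhex(hex_bytes)
    payload, trailer = raw[:8], raw[8:]
    value = int.from_bytes(payload, "little")
    return value, value / SCALE, trailer

if __name__ == "__main__":
    for record in ("02 B5 E6 7B 15 C8 0C 00 0A F9",
                   "7C 4C 3A 00 00 00 00 00 B7 4C"):
        value, scaled, trailer = decode_record(record)
        print(value, round(scaled, 6), trailer.hex())
```

With only two data points the scale factor remains a hypothesis; more samples would confirm or rule it out.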

Why do you wrap around in 16 bit checksum (hex used)?

I have the question:
Compute the 16-bit checksum for the data block E3 4F 23 96 44 27 99 F3. Then perform the verification calculation.
I can perform the addition and I get the overflow like:
  E3 4F
  23 96
  44 27
  99 F3
-------
1 E4 FF (overflow)
The solution then takes the overflow and adds it back in, causing E4 FF to become E5 00. Can someone explain to me why this occurs?
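The wrap-around step is what ones' complement addition does: a carry out of bit 15 is added back into bit 0 (an "end-around carry"), so that a sum plus its bitwise complement comes out as all ones. The sketch below assumes the exercise uses that Internet-checksum-style scheme, which the question does not state outright; the function names are illustrative.

```python
def ones_complement_sum16(data):
    """Add 16-bit big-endian words, folding any carry back into the low bits."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte (assumption)
    total = 0
    for i in range(0, len(data), 2):
        total += int.from_bytes(data[i:i + 2], "big")
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total

def checksum16(data):
    """Checksum = bitwise complement of the ones' complement sum."""
    return ~ones_complement_sum16(data) & 0xFFFF

if __name__ == "__main__":
    block = bytes.fromhex("E3 4F 23 96 44 27 99 F3")
    print(hex(ones_complement_sum16(block)))   # wrapped sum
    check = checksum16(block)
    print(hex(check))                          # value to transmit
    # Verification: summing data plus checksum should give all ones
    print(hex(ones_complement_sum16(block + check.to_bytes(2, "big"))))
```

Under that assumption the wrapped sum is E5 00, the checksum is its complement 1A FF, and the receiver's verification sum over data plus checksum is FF FF.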

CRC-16 in MATLAB

I have a working program in LabVIEW that I want to port to MATLAB. It takes an input number, converts it to hex, appends it to a constant string (0110 0001 0002 0400 03), calculates a CRC-16, and sends it all to a COM port. Here are two examples for 1500 and 2000 respectively.
0110 0001 0002 0400 0305 DCC0 AA
0110 0001 0002 0400 0307 D0C1 CF
I can see that dec2hex(1500) produces the 5DC and dec2hex(2000) produces the 7D0. The AA and the CF are produced by the LabVIEW CRC-16 routine; they are 170 and 207 in decimal, respectively. I understand these are some sort of checksum, but I can't find a way to reproduce them in MATLAB.
Solution found via: FEX submission
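% Message bytes as a column of two-character hex strings (value 2000 = 07 D0)
A = ['01';'10';'00';'01';'00';'02';'04';'00';'03';'07';'D0']
% append_crc comes from the FEX submission; it is not a built-in MATLAB function
dec2hex(append_crc(hex2dec(A)'))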
Returns:
01
10
00
01
00
02
04
00
03
07
D0
C1
CF
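For comparison, here is a Python sketch of the same calculation. It assumes the CRC in question is the Modbus RTU CRC-16 (initial value 0xFFFF, reflected polynomial 0xA001, low byte sent first); the question never names the variant, but that assumption reproduces both example frames. The helpers `with_crc` and `build_frame` are hypothetical names, unrelated to the FEX code.

```python
def crc16_modbus(data):
    """Bitwise CRC-16/MODBUS: init 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc

def with_crc(payload):
    """Return payload followed by its CRC, low byte first."""
    return payload + crc16_modbus(payload).to_bytes(2, "little")

def build_frame(value):
    """Constant header from the question + 16-bit big-endian value + CRC."""
    header = bytes.fromhex("01 10 00 01 00 02 04 00 03")
    return with_crc(header + value.to_bytes(2, "big"))

if __name__ == "__main__":
    print(build_frame(1500).hex(" ").upper())
    print(build_frame(2000).hex(" ").upper())
```

Note that the CRC is two bytes, so the frames end in C0 AA and C1 CF rather than in AA and CF alone.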

Manual Wilcoxon Rank-Sum Test

My statistics professor wants us to perform a manual Wilcoxon Rank-Sum Test using Matlab. Unfortunately, I have no experience with Matlab whatsoever, and I have been discovering as I go along. In short, we are given a list of 24 paired observations:
33 53 54 84 69 34 60 34 50 56 64 50 76 47 58 63 55 66 58 43 28 80 45 55
66 62 54 58 60 74 54 68 64 60 53 59 61 49 63 55 61 64 54 59 64 46 70 82
I've gotten to the point where I have a matrix with the absolute differences in the first column, the sign of the difference (indicated by a 1 for positive and -1 for negative) in the second column and the rank of the difference (1 through 24) in the third column.
I am struggling with finding a quick and efficient way to "break the ties" between the differences of equal size and allocating the average rank to each of these differences. I expect that some loops and logical statements may be required, but I am having a hard time with them as I have no prior experience.
Any suggestions on how to do this would be much appreciated.
One way to average over the ranks for entries with matching differences is as follows:
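irankavg = zeros(length(dp),1);
[dpu,ix,iclass] = unique(dp);   % iclass(k) is the index into dpu of dp(k)
for ii = 1:length(dpu)
    tied = (iclass == ii);      % all entries sharing the ii-th unique difference
    irankavg(tied) = mean(irank(tied));
end
where dp is a column array that contains the differences and irank is the column array of their ranks before averaging.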
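The same tie-averaging idea can be sketched in plain Python, without MATLAB's unique. The function name `average_ranks` is made up for illustration; it sorts the values, walks through each run of equal values, and gives every member of the run the mean of the ranks that run occupies.

```python
def average_ranks(values):
    """Return 1-based ranks for values, with tied values sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = ((start + 1) + (end + 1)) / 2  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

# The two rows of observations from the question
x = [33, 53, 54, 84, 69, 34, 60, 34, 50, 56, 64, 50,
     76, 47, 58, 63, 55, 66, 58, 43, 28, 80, 45, 55]
y = [66, 62, 54, 58, 60, 74, 54, 68, 64, 60, 53, 59,
     61, 49, 63, 55, 61, 64, 54, 59, 64, 46, 70, 82]
abs_diffs = [abs(a - b) for a, b in zip(x, y)]

if __name__ == "__main__":
    print(average_ranks([3, 1, 3, 2]))
    print(average_ranks(abs_diffs))
```

A quick sanity check is that the averaged ranks still sum to n(n+1)/2, which is 300 for the 24 pairs here.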

information about tiff image

imfinfo of my image gives
FormatSignature: [73 73 42 0]
Offset: 264322
What is the offset, and where does this value come from?
FormatSignature is the signature of a TIFF file. It is Matlab's way of saying "I endorse this file as TIFF".
Matlab detects the file type from the file contents, not from the extension. Change the extension to JPG or XYZ and it will still be opened as a TIFF file.
Edit:
For example, this is the signature for PNG:
137 80 78 71 13 10 26 10
GIF:
71 73 70 56 57 97 16 0
You can also use a hex editor: open the file, grab the first 8 bytes and convert them from hex to decimal.
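The hex-editor step can also be scripted. The sketch below reads the leading bytes of a file as decimal values and matches them against the signatures listed above; the names `leading_bytes`, `identify` and `tiff_first_ifd_offset` are hypothetical. As for Offset: in the TIFF header, the four bytes after the signature hold the position of the first image file directory, which is presumably the number Matlab reports, though that is an assumption rather than something stated in this thread.

```python
SIGNATURES = {
    "TIFF (little-endian)": bytes([73, 73, 42, 0]),   # "II" then 42
    "TIFF (big-endian)": bytes([77, 77, 0, 42]),      # "MM" then 42
    "PNG": bytes([137, 80, 78, 71, 13, 10, 26, 10]),
    "GIF": bytes([71, 73, 70, 56]),                   # "GIF8"
}

def leading_bytes(path, count=8):
    """Return the first count bytes of a file as a list of decimal values."""
    with open(path, "rb") as fh:
        return list(fh.read(count))

def identify(header):
    """Return the name of the first signature that header starts with, or None."""
    data = bytes(header)
    for name, signature in SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None

def tiff_first_ifd_offset(header):
    """Read header bytes 4..7 as the first-IFD offset, honoring the byte order."""
    data = bytes(header)
    byte_order = "little" if data[:2] == b"II" else "big"
    return int.from_bytes(data[4:8], byte_order)

if __name__ == "__main__":
    # Bytes a little-endian TIFF would start with if its first IFD were at 264322
    header = [73, 73, 42, 0, 130, 8, 4, 0]
    print(identify(header), tiff_first_ifd_offset(header))
```

Only the first four GIF bytes are matched, since the version digits that follow can differ between files.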