Why do you wrap around in a 16-bit checksum (hex used)?

I have the question:
Compute the 16-bit checksum for the data block E3 4F 23 96 44 27 99 F3. Then perform the verification calculation.
I can perform the addition, and I get an overflow:
E3 4F
23 96
44 27
99 F3
-------
1 E4 FF (overflow)
The solution then takes the overflow and adds it back in, causing E4 FF to become E5 00. Can someone explain why this occurs?
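For reference, this is the end-around carry of a one's-complement sum: a carry out of bit 16 is worth 2^16, and 2^16 mod (2^16 - 1) = 1, so adding the carry back into the low 16 bits keeps the sum correct modulo 0xFFFF. A minimal sketch in Python (whether the final checksum is the folded sum itself or its complement depends on how your course defines it):
words = [0xE34F, 0x2396, 0x4427, 0x99F3]
total = sum(words)                            # 0x1E4FF -- no longer fits in 16 bits
while total > 0xFFFF:
    total = (total & 0xFFFF) + (total >> 16)  # end-around carry: 0xE4FF + 1 = 0xE500
print(hex(total))                             # 0xe500

checksum = total ^ 0xFFFF                     # one's complement, if that is what gets sent
verify = sum(words) + checksum
while verify > 0xFFFF:
    verify = (verify & 0xFFFF) + (verify >> 16)
print(hex(verify))                            # 0xffff -> the block verifies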

Related

Is it safe to replace CP850 with UTF-8 encoding

I have an old project reading files with CP850 encoding, but it handles accented characters incorrectly (e.g., Montréal becomes MontrÚal).
I want to replace CP850 with UTF-8. The question is:
Is it safe? In other words, can we assume UTF-8 is a superset of CP850 and encodes characters the same way?
Thanks
I tried hexdump; below is a sample of my CSV file. Is it UTF-8?
000000d0 76 20 64 65 20 4d 61 72 6c 6f 77 65 2c 2c 4d 6f |v de Marlowe,,Mo|
000000e0 6e 74 72 c3 a9 61 6c 2c 51 43 2c 48 34 41 20 20 |ntr..al,QC,H4A |
If by superset you mean does UTF-8 include all the characters of CP850, then trivially yes, since UTF-8 can encode all valid Unicode code points using a variable-length encoding (1–4 bytes).
If you mean are characters encoded the same way, then as you've seen this is not the case, since é (U+00E9) is encoded as 82 in CP850 and C3 A9 in UTF-8.
I cannot see a character set / code page that encodes Ú as 82. However, Ú is encoded as E9 in CP850, and E9 is the ISO-8859-1 representation of é, so it's possible you've got your conversion the wrong way around (i.e. you're reading an ISO-8859-1 file as CP850, when you actually want to convert from CP850 to UTF-8).
Here's an example using hd and iconv:
hd test.cp850.txt
00000000 4d 6f 6e 74 72 82 61 6c |Montr.al|
00000008
iconv --from cp850 --to utf8 test.cp850.txt > test.utf8.txt
hd test.utf8.txt
00000000 4d 6f 6e 74 72 c3 a9 61 6c |Montr..al|
00000009
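To see the same thing at the byte level, here's a small Python sketch (bytes.hex with a separator needs Python 3.8+; the last line assumes the MontrÚal mojibake comes from reading ISO-8859-1 bytes as CP850):
text = "Montréal"
print(text.encode("cp850").hex(" "))    # 4d 6f 6e 74 72 82 61 6c     -> é is one byte, 82
print(text.encode("utf-8").hex(" "))    # 4d 6f 6e 74 72 c3 a9 61 6c  -> é is two bytes, C3 A9

# ISO-8859-1 encodes é as E9; decoding that byte as CP850 yields Ú:
print(text.encode("latin-1").decode("cp850"))   # MontrÚal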

CPU and Memory Friendly Solution to Merge Large Matrix

For the following typical case:
n = 1000000;
r = randi(n,n,2);
(assume about 0.05% of the numbers are shared between rows; n could even be tens of millions)
I am looking for a CPU- and memory-efficient solution to merge rows based on any common items (here, integer numbers). A list of sample codes in Python is available here, and a quick try at translating one into Matlab can be found here.
In my attempts they take ages (minutes to hours), so I am looking for a faster solution.
For the above example, the typical output should look like (cell):
{
[1 90 34 67 ... 9]
[35 89]
[45000 23 828 130 8999 45326 ... 11]
...
}
Note also that I have tried to compile this as a MEX file, but failed because Matlab Coder does not support cell arrays.
Edit: A tiny demonstration example
%---------------------------------------
clc
n = 100;
r = randi(n,n,2); % random integers in [1,n], an n-by-2 matrix
%---------------------------------------
>> r
r =
82 17 % (1) 82 17
91 13 % (2) 91 13
13 32 % (3) 91 13 32 merged with (2), common 13
82 53 % (4) 82 17 53 merged with (1), common 82
64 17 % (5) 82 17 53 64 merged with (4), common 17
...
94 45
13 31 % (77) 91 13 32 31 merged with (3), common 13
57 51
47 52
2 13 % (80) 91 13 32 31 2 merged with (77), common 13
34 80
%---------------------------------------
c = merge(r); % a CPU- and memory-friendly solution is sought
%---------------------------------------
c =
[82 17 53 64]
[91 13 32 31 2]
...
You need an index.
In Python, use a dict. In MATLAB - I'd not use MATLAB, because open-source is the future, and MATLAB is dying out.
But Python is quite slow. You can likely get a 10x speedup by using e.g. Cython to translate and optimize the code in C. Avoid Python data types such as a list of ints, because they are very memory-intensive; numpy has memory-efficient integer arrays.
If you get a new pair (a,b) you can use this dictionary to find existing items to merge. Then update the dict after the merge.
Actually for integers, you should use an array instead of a dict.
The trickiest part is handling the case when both a and b exist, but are large different groups. There are some neat optimizations possible here, if that isn't fast enough yet.
It's not clustering, but connected components.
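For what it's worth, here is a sketch of that index idea in Python, using a dict as the index and union-find (disjoint sets) for the merging; the function name merge and the output layout are illustrative, not a fixed API:
def merge(pairs):
    parent = {}                          # index: value -> representative value

    def find(x):
        root = x
        while parent[root] != root:      # walk up to the group's representative
            root = parent[root]
        while parent[x] != root:         # path compression keeps future lookups short
            parent[x], x = root, parent[x]
        return root

    for a, b in pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb              # union: the two groups become one

    groups = {}                          # root -> all values in that group
    for x in parent:
        groups.setdefault(find(x), []).append(x)
    return list(groups.values())

print(merge([(82, 17), (91, 13), (13, 32), (82, 53), (64, 17)]))
# [[82, 17, 53, 64], [91, 13, 32]]
Union by size (hanging the smaller group's root under the larger one) is the usual optimization for the case of two large groups mentioned above.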

Hex Encoding and Decoding

I have two modules in a 3rd-party application (it does not have any documentation, and I cannot reveal the application name due to confidentiality). One module outputs only integers and the other outputs only floating-point numbers.
The module that outputs integers has a very simple data format: a hex representation of the numbers in reverse byte order, so I am able to decode it successfully. But I am having issues decoding the hex representation of the floating-point numbers.
The data below shows the data dump in hex followed by the expected converted value. The little information I have about the representation is that the last two bytes are some sort of CRC, so each record is an 8-byte number followed by two CRC bytes.
I have highlighted the 8 bytes that need to be converted; their expected values are given below:
Dataset 1: 02 B5 E6 7B 15 C8 0C 00 0A F9 = 999359.533
Dataset 2: 7C 4C 3A 00 00 00 00 00 B7 4C = 0.001
Can anyone suggest something here? I have tried many encoding schemes, including IEEE formats. I do not have any other relevant information that I can share (I know it will be a trial-and-error exercise to solve this).
Not sure if this helps but:
02 B5 E6 7B 15 C8 0C 00 = 0x000CC8157BE6B502 = 3597694319113474
7C 4C 3A 00 00 00 00 00 = 0x00000000003A4C7C = 3820668
and
3597694319113474 / 3600000000 = 999359.5331
3820668 / 3600000000 = 0.001061297
So within a certain amount of rounding maybe they are fixed point numbers in fractions of 3600000000?
Can you get some more data points?
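A quick way to test that hypothesis in Python: unpack each 8-byte block as an unsigned little-endian integer and divide by 3600000000:
import struct

for dump in ("02 B5 E6 7B 15 C8 0C 00", "7C 4C 3A 00 00 00 00 00"):
    raw = bytes.fromhex(dump)               # fromhex ignores the spaces
    value = struct.unpack("<Q", raw)[0]     # "<Q" = little-endian unsigned 64-bit
    print(value, value / 3600000000)
# 3597694319113474 999359.533087...
# 3820668 0.001061296...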

Manual Wilcoxon Rank-Sum Test

My statistics professor wants us to perform a manual Wilcoxon Rank-Sum Test using Matlab. Unfortunately, I have no experience with Matlab whatsoever, and I have been discovering as I go along. In short, we are given a list of 24 paired observations:
33 53 54 84 69 34 60 34 50 56 64 50 76 47 58 63 55 66 58 43 28 80 45 55
66 62 54 58 60 74 54 68 64 60 53 59 61 49 63 55 61 64 54 59 64 46 70 82
I've gotten to the point where I have a matrix with the absolute differences in the first column, the sign of the difference (indicated by a 1 for positive and -1 for negative) in the second column and the rank of the difference (1 through 24) in the third column.
I am struggling with finding a quick and efficient way to "break the ties" between the differences of equal size and allocating the average rank to each of these differences. I expect that some loops and logical statements may be required, but I am having a hard time with them as I have no prior experience.
Any suggestions on how to do this would be much appreciated.
One way to average over the ranks for entries with matching differences is as follows:
irankavg = zeros(length(dp),1);
[dpu,~,iclass] = unique(dp);        % iclass(k) is the class index of dp(k) among the unique values
for ii = 1:length(dpu)
    sel = (iclass == ii);               % all entries tied at the same difference
    irankavg(sel) = mean(irank(sel));   % give each tied entry the average of their ranks
end
where dp is a column vector containing the absolute differences and irank contains the corresponding initial ranks.
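As a cross-check, if you have the Statistics Toolbox, tiedrank(dp) should give the same tie-adjusted ranks in one call (it assigns each group of tied values the average of their ranks).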

information about tiff image

imfinfo of my image gives
FormatSignature: [73 73 42 0]
Offset: 264322
What is the offset? How does this value come about?
It is the signature of a TIFF file: 73 73 is ASCII "II" (Intel, i.e. little-endian byte order) and 42 0 is the TIFF magic number 42 stored little-endian. It's Matlab's way of saying "I endorse this file as TIFF".
As for Offset, it is most likely the byte position in the file of the image file directory (IFD) that describes this particular image; its value simply depends on where the writer placed that directory.
Matlab detects the file type by itself, not from the extension. Change the extension to JPG or XYZ, and you'll still open it as a TIFF file.
Edit:
This is PNG, for example:
137 80 78 71 13 10 26 10
GIF (only the first 6 bytes, "GIF89a", are the signature; what follows is already image data):
71 73 70 56 57 97
You can also use a hex editor: open the file, grab the first few bytes, and convert them from hex to decimal.
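Here is what that check looks like as a small Python sketch (the signature table covers only the formats mentioned above, and the file name is hypothetical):
SIGNATURES = {
    b"II*\x00": "TIFF (little-endian)",    # 73 73 42 0, as in the question
    b"MM\x00*": "TIFF (big-endian)",
    b"\x89PNG\r\n\x1a\n": "PNG",           # 137 80 78 71 13 10 26 10
    b"GIF89a": "GIF",                      # 71 73 70 56 57 97
    b"GIF87a": "GIF",
}

def sniff(path):
    with open(path, "rb") as f:
        head = f.read(8)                   # the longest signature here is 8 bytes
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return "unknown"

print(sniff("image.tif"))                  # hypothetical file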