Detect number of bytes required for an arbitrary number in Scala - scala

I'm trying to figure out the simplest way to write a function that detects the number of bytes required for a number in Scala.
For instance the number
0 should be 0 bytes
1 should be 1 byte
127 should be 1 byte
128 should be 2 bytes
32767 should be 2 bytes
32768 should be 3 bytes
8388607 should be 3 bytes
8388608 should be 4 bytes
2147483647 should be 4 bytes
2147483648 should be 5 bytes
549755813887 should be 5 bytes
549755813888 should be 6 bytes
9223372036854775807 should be 8 bytes.
-1 should be 1 byte
-127 should be 1 byte
-128 should be 2 bytes
-32767 should be 2 bytes
-32768 should be 3 bytes
-8388607 should be 3 bytes
-8388608 should be 4 bytes
-2147483647 should be 4 bytes
-2147483648 should be 5 bytes
-549755813887 should be 5 bytes
-549755813888 should be 6 bytes
-9223372036854775807 should be 8 bytes
Is there any way to do this besides doing the math to figure out where the number sits with respect to 2^N?

After all the clarifications in the comments, I guess the algorithm for negative numbers would be: whatever the answer for their opposite would be; and Long.MinValue is not an acceptable input value.
Therefore, I suggest:
def bytes(x: Long): Int = {
val posx = x.abs
if (posx == 0L) 0
else (64 - java.lang.Long.numberOfLeadingZeros(posx)) / 8 + 1
}
Tests needed.
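As a quick cross-check of the formula above against the table in the question, here is a minimal sketch in Python rather than Scala, so it is a port and not the answer's own code. It assumes that Python's int.bit_length() plays the role of 64 - numberOfLeadingZeros; the name bytes_needed is made up for illustration.

```python
def bytes_needed(x: int) -> int:
    """Bytes needed for x, counting one extra bit for the sign (0 needs 0 bytes)."""
    magnitude = abs(x)
    if magnitude == 0:
        return 0
    # bit_length() is the number of significant bits of the magnitude;
    # integer division by 8 plus 1 leaves room for the sign bit.
    return magnitude.bit_length() // 8 + 1


# A few rows from the question's table: (value, expected bytes)
table = [
    (0, 0), (1, 1), (127, 1), (128, 2), (32767, 2), (32768, 3),
    (8388607, 3), (8388608, 4), (2147483647, 4), (2147483648, 5),
    (549755813887, 5), (549755813888, 6), (9223372036854775807, 8),
]
mismatches = [(v, e) for v, e in table
              if bytes_needed(v) != e or bytes_needed(-v) != e]
print(mismatches)
```

The list of mismatches should come out empty, for the positive values and for their negatives alike.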

As I mentioned, you're basically asking for "what's the smallest power-of-2-number larger than my number", with a bit of adjustment for the extra digit for the sign (positive or negative).
Here's my solution, although the result differs for 0 and -128, because, as Bergi commented on your question, you can't really write 0 with 0 bytes, and -128 fits in 1 byte.
import Math._
def bytes(x: Double): Int = {
val y = if (x >= 0) x + 1 else -x
ceil((log(y)/log(2) + 1)/8).toInt
}

Why Int8.max &+ Int8.max equals to "-2"?

According to the Swift Standard Library documentation, &+ discards any bits that overflow the fixed width of the integer type. I just don't understand why adding the two maximum values an 8-bit signed integer can hold results in -2:
/// Two max Int8 values (127 each, 8-bit group)
let x6 = Int8.max
let x7 = Int8.max
/// Prints `1 1 1 1 1 1 1`
String(Int8.max, radix: 2)
/// Here we get `-2` in decimal system
let x8 = x6 &+ x7
/// Prints `-1 0`
String(x8, radix: 2)
If we break down the binary calculation we will get this:
1 1 1 1 1 1 1
+ 1 1 1 1 1 1 1
-----------------------------
1 1 1 1 1 1 1 0
Which is -126, as the leftmost bit is a negative sign.
Why does Swift discard every bit except the rightmost two (1 and 0)? Did I miss some overflow rule? I've read a few things on the web, but did not get close to cracking this one.
Swift (and every other programming language I know) uses 2's complement to represent signed integers, rather than sign-and-magnitude as you seem to assume.
In the 2's complement representation, the leftmost 1 does not represent "a negative sign". You can think of it as representing -128, so the Int8 value of -2 would be represented as 1111 1110 (-128 + 64 + 32 + 16 + 8 + 4 + 2).
OTOH, -126 would be represented as 1000 0010 (-128 + 2).
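To make the 2's complement reading concrete, here is a small sketch in Python (not Swift). The helpers wrapping_add_int8 and bits8 are hypothetical names that imitate what &+ does on Int8: keep only the low 8 bits, then interpret the top bit as -128.

```python
def wrapping_add_int8(a: int, b: int) -> int:
    """Add two values, keep the low 8 bits, and read them as a signed byte."""
    total = (a + b) & 0xFF          # discard any bits that overflow 8 bits
    return total - 256 if total >= 128 else total


def bits8(value: int) -> str:
    """8-bit two's complement pattern of a value in the range -128..127."""
    return format(value & 0xFF, "08b")


result = wrapping_add_int8(127, 127)
print(result, bits8(result))        # -2 11111110
print(bits8(-126))                  # 10000010
```

So the bit pattern 1111 1110 from the hand calculation is kept in full; it is just read as -2 rather than -126.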

Error while trying to use polyval

I have the following vector
vec = [ 255 0 255 0 255 0 255 0 255 0 255 0 255 0 255 0]
vec 1x16 double
and using the following command
polyval(vec', 256);
I get
ans = 3.3896e+038
but when I try to get back my original vector
vec2 = decimal2base(ans, 256)
I get
vec2 = 255 0 255 0 255 1 0 0 0 0 0 0 0 0 0 0
and this is clearly not my original vector.
What's more, if I run polyval on this vector again
polyval(vec2', 256);
I get
ans=
3.3896e+038
I am not entirely sure what sort of mistake I am making as I know that my conversion functions are ok, so it must be a number precision thing.
Ah, large numbers. The value 3.3896e+038 is higher than the maximum integer that can be represented by a double without loss of accuracy.
That maximum number is 2^53 or
>> flintmax('double')
ans =
9.0072e+15
So you are losing accuracy and you cannot reverse the computation.
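The loss of accuracy can be reproduced outside MATLAB. The sketch below uses Python, whose integers have arbitrary precision, to compare an exact evaluation with one that passes through a double. The helper names polyval_int and to_digits are made up for illustration; they are not the asker's decimal2base.

```python
def polyval_int(coeffs, base):
    """Horner evaluation with exact integers (highest power first)."""
    acc = 0
    for c in coeffs:
        acc = acc * base + c
    return acc


def to_digits(value, base, width):
    """Split a non-negative integer into `width` digits, most significant first."""
    digits = []
    for _ in range(width):
        value, d = divmod(value, base)
        digits.append(d)
    return digits[::-1]


vec = [255, 0] * 8
exact = polyval_int(vec, 256)

roundtrip_exact = to_digits(exact, 256, len(vec))
# Forcing the value through a 64-bit double keeps only 53 significant bits.
roundtrip_double = to_digits(int(float(exact)), 256, len(vec))

print(roundtrip_exact == vec)   # True
print(roundtrip_double)
```

With exact integers the round trip recovers vec. Through a double, the low-order digits are rounded away and a carry turns the sixth digit into 1, which should match the vec2 shown in the question.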
Doing the computations with uint64 values only:
>> pows = uint64(fliplr(0:numel(vec)-1));
>> sum(uint64(vec).*(uint64(256).^pows),'native')
ans =
18446744073709551615
That's about 1.84e+19. Just a little different from what you get if you use doubles. But wait... that number looks familiar:
>> intmax('uint64')
ans =
18446744073709551615
So, you've maxed out unsigned 64-bit integers too:
>> uint64(256).^pows
ans =
Columns 1 through 5
18446744073709551615 18446744073709551615 18446744073709551615 18446744073709551615 18446744073709551615
Columns 6 through 10
18446744073709551615 18446744073709551615 18446744073709551615 72057594037927936 281474976710656
Columns 11 through 15
1099511627776 4294967296 16777216 65536 256
Column 16
1
Once you reach 256^8 (which is 2^64), you are past intmax('uint64') and the result saturates; you can't manage numbers this large, at least not with MATLAB's built-in data types.
see if this returns '1':
polyval(vec(6:end),256)==polyval(vec2(6:end),256);
If so, then it's just a property of '255+1' for that special 'vec'.

Padding in MD5 Hash Algorithm

I need to understand the MD5 hash algorithm. I was reading a document and it states
"The message is "padded" (extended) so that its length (in bits) is
congruent to 448, modulo 512. That is, the message is extended so
that it is just 64 bits shy of being a multiple of 512 bits long.
Padding is always performed, even if the length of the message is
already congruent to 448, modulo 512."
I need to understand what this means in simple terms, especially the 448 modulo 512. The word MODULO is the issue. I would appreciate some simple examples of this. Funny though, this is the first step of MD5! :)
Thanks
Modulo, or mod, is a function that tells you the remainder when one number is divided by another.
For example:
5 modulo 3:
5/3 = 1, with 2 remainder. So 5 mod 3 is 2.
10 modulo 16 = 10, because 16 does not fit into 10 even once, so all of 10 is left over.
15 modulo 5 = 0, because 5 goes into 15 exactly 3 times. 15 is a multiple of 5.
Back in school you would have learnt this as "Remainder" or "Left Over", modulo is just a fancy way to say that.
What this is saying here is that when you use MD5, one of the first things that happens is that your message is padded to a specific length. In MD5's case, the padded message must be n bits long, where n = (512*z)+448 and z is a non-negative integer.
As an example, if the file was 1400 bits long, then 72 bits of padding would be added (1400 modulo 512 = 376, and 448 - 376 = 72) before the rest of the MD5 algorithm runs. A file that was 1472 bits long already satisfies 1472 modulo 512 = 448, but because padding is always performed, it would still be padded, with a full 512 bits, up to 1984 bits.
Modulus is the remainder of a division. For example:
512 mod 448 = 64
448 mod 512 = 448
Another way to work out 512 mod 448 is to divide them: 512/448 = 1.142..
Then take the part of the result before the decimal point (1), multiply it by 448, and subtract that from 512:
512 - 448*1 == 64. That's your modulus result.
What you need to know is that 448 is 64 bits short of a multiple of 512.
But what if the length is between 448 and 512?
Normally we subtract x (the message length mod 512) from 448.
447 mod 512 = 447; 448 - 447 = 1; (all good, 1 bit to pad)
449 mod 512 = 449; 448 - 449 = -1 ???
So the solution to this problem is to take the next higher multiple of 512, still short by 64:
512*2 - 64 = 960
449 mod 512 = 449; 960 - 449 = 511;
This happens because afterwards we need to append 64 bits holding the length of the original message, and the full length has to be a multiple of 512.
960 - 449 = 511;
511 + 449 + 64 = 1024;
1024 is multiple of 512;
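The padding rule quoted in the question can be checked numerically. Below is a minimal sketch in Python; the function names are hypothetical, and it only computes lengths, it does not implement MD5. It relies on the quoted rule that padding is always performed, so between 1 and 512 bits are added.

```python
def md5_padding_bits(message_bits: int) -> int:
    """Number of padding bits (1..512) so the length becomes 448 mod 512."""
    return (447 - message_bits) % 512 + 1


def md5_padded_total_bits(message_bits: int) -> int:
    """Length after padding plus the 64-bit length field."""
    return message_bits + md5_padding_bits(message_bits) + 64


for n in (447, 448, 449, 1400, 1472):
    print(n, md5_padding_bits(n), md5_padded_total_bits(n))
```

For 449 bits this gives 511 padding bits and a total of 1024, as in the worked example above. For a length that is already 448 mod 512, such as 448 or 1472, it gives a full 512 bits of padding.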

Output in Vector form

I am getting the output of the reshape function as follows:
s1 =
11
00
10
11
01
11
10
10
10
10
10
10
10
01
10
01
How can I convert s1 to
[1 1 0 0 1 0 1 1 0 1 1 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0 1 1 0 0 1]
so that I can pick out individual bit values?
s1(1) will be 1
s1(3) will be 0
s1(5) will be 1
I have tried reshape and transpose, but I am not picking up the correct bit values. I'd appreciate any help.
I am doing the following operation with the code below.
I convert the ciphertext to bytes, then calculate an index variable (called p) with the formula MOD(number of bytes, 3). My ciphertext is 5 bytes long, so the index variable (p) is 2. The index variable will always be 0, 1 or 2, depending on the number of bytes.
Say the ciphertext is 11001011 01111010 10101010 10011001 01010101
This data is five bytes, therefore the index variable is 2
11001011 01111010 10101010 10011001 01010101
Now
for the first two bits (11), the index variable is assigned 2
for the next two bits (00), the index variable is assigned 0
for the next two bits (10), the index variable is assigned 1
for the next two bits (11), the index variable is assigned 2, and so on until the end of my bits.
Another example
Ciphertext with three bytes 11001011 01111010 10101010
Index variable (p) will be 0
for the first two bits (11), the index variable is assigned 0
for the next two bits (00), the index variable is assigned 1
for the next two bits (10), the index variable is assigned 2
for the next two bits (11), the index variable is assigned 0,
and so on until the end of my bits.
s = '11001011 01111010 10101010 10011001 01010101'
p = rem(numel(regexp(s,' [01]'))+1,3)
k = (0:2)'
s1 = reshape(regexprep(s,' ',''),2,[])'
n = size(s1,1)
N = k(:,ones(fix((n+1)/3)+1,1))
P = N(find(N(:,1) == p)+(0:n-1))'
Well s1' will give you
[11 00 10 11 ...]
but if you show us the input to your function I might be able to give you the answer you really want.
The key to this question is that the elements are of char type. Then you could use reshape like this:
reshape(s1.',1, [])
ans = 11001011...
This is a similar question.
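For comparison, here is the same idea sketched in Python instead of MATLAB: treat the data as characters, flatten the two-character rows into a single sequence of bits, and cycle the index variable starting from p. The variable names are illustrative only, and note that Python indexes from 0 where MATLAB indexes from 1.

```python
s = "11001011 01111010 10101010 10011001 01010101"

n_bytes = len(s.split())                 # 5 bytes
p = n_bytes % 3                          # starting index variable

compact = s.replace(" ", "")
# Two-character rows, like the char matrix s1 in the question
pairs = [compact[i:i + 2] for i in range(0, len(compact), 2)]

# Flatten the rows into individual numeric bits
bits = [int(ch) for row in pairs for ch in row]

# Index variable for each pair of bits: p, p+1, p+2, ... taken modulo 3
indices = [(p + i) % 3 for i in range(len(pairs))]

print(bits[0], bits[2], bits[4])         # 1 0 1
print(indices[:4])                       # [2, 0, 1, 2]
```

Here bits[0], bits[2] and bits[4] correspond to s1(1), s1(3) and s1(5) in the question.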

Matlab, Image compression

I am unsure what this is asking me to do in MATLAB. What does it mean to encode? What format should the answer be in? Can anyone help me work it out, please?
Encode the 8x8 image patch and print out the results
I have got an 8x8 image
symbols=[0 20 50 99];
p=[32 8 16 8];
p = p/sum(p);
[dict, avglen] = huffmandict(symbols, p);
A = ...
[99 99 99 99 99 99 99 99 ...
20 20 20 20 20 20 20 20 ...
0 0 0 0 0 0 0 0 ...
0 0 50 50 50 50 0 0 ...
0 0 50 50 50 50 0 0 ...
0 0 50 50 50 50 0 0 ...
0 0 50 50 50 50 0 0 ...
0 0 0 0 0 0 0 0];
comp=huffmanenco(A,dict);
ratio=(8*8*8)/length(comp)
Do you understand the principle of Huffman coding?
To put it simply, it is an algorithm used to compress data (like images in your case). This means that the input of the algorithm is an image and the output is a numeric code that is smaller in size than the input: hence the compression.
The principle of Huffman coding is (roughly) to replace symbols in the original data (in your case the value of each pixel of the image) by a numeric code that is assigned according to the probability of the symbol. The most probable (i.e. the most common) symbols are replaced by the shortest codes, which is what compresses the data.
To solve your problem, Matlab has two functions in the Communications Toolbox: huffmandict and huffmanenco.
huffmandict: this function builds a dictionary that is used to translate symbols from the original data to their numeric Huffman codewords. To build this dictionary, huffmandict needs the list of symbols used in the data and their probability of appearance, which is the number of times they are used divided by the total number of symbols in your data.
huffmanenco: this function is used to translate your original data by using the dictionary built by huffmandict. Each symbol in the original data is translated to a numeric Huffman code. To measure the gain in size of this compression method, you can compute the compression ratio, which is the ratio between the number of bits used to describe your original data and the number of bits of the corresponding Huffman code. In your case, inferring from your computation of the compression ratio, you have an 8 by 8 image using 8-bit integers to describe each pixel, and the corresponding Huffman code uses length(comp) bits.
With all this in mind, you could read your code in this way:
% Original image
A = ...
[99 99 99 99 99 99 99 99 ...
20 20 20 20 20 20 20 20 ...
0 0 0 0 0 0 0 0 ...
0 0 50 50 50 50 0 0 ...
0 0 50 50 50 50 0 0 ...
0 0 50 50 50 50 0 0 ...
0 0 50 50 50 50 0 0 ...
0 0 0 0 0 0 0 0];
% First step: extract the symbols used in the original image
% and their probability (number of occurences / number of total symbols)
symbols=[0 20 50 99];
p=[32 8 16 8];
p=p/sum(p);
% To do this you could also use the following, which automatically extracts
% the symbols and their probability (hist returns the counts first, then the bin centers)
[p,symbols]=hist(A,unique(A));
p=p/sum(p);
% Second step: build the Huffman dictionary
[dict,avglen]=huffmandict(symbols,p);
% Third step: encode your original image with the dictionary you just built
comp=huffmanenco(A,dict);
% Finally you can compute the compression ratio
ratio=(8*8*8)/length(comp)
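If you want to see what the three steps do without the Communications Toolbox, here is a rough, self-contained sketch in Python. It is not how huffmandict or huffmanenco are implemented; the function huffman_code and its tie-breaking are assumptions made for illustration, so the individual codewords may differ from MATLAB's even though the code lengths should agree.

```python
import heapq
from collections import Counter


def huffman_code(freqs):
    """Build {symbol: bit string} by repeatedly merging the two lightest nodes."""
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    if len(heap) == 1:                       # degenerate single-symbol input
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# The 8x8 patch from the question, flattened row by row
image = ([99] * 8 + [20] * 8 + [0] * 8
         + [0, 0, 50, 50, 50, 50, 0, 0] * 4 + [0] * 8)

counts = Counter(image)                      # step 1: symbols and their counts
code = huffman_code(counts)                  # step 2: dictionary
encoded = "".join(code[px] for px in image)  # step 3: encode
ratio = (8 * len(image)) / len(encoded)      # 8 bits per original pixel

print(code)
print(len(encoded), ratio)
```

With counts of 32, 8, 16 and 8, the symbol 0 gets a 1-bit code, 50 a 2-bit code, and 20 and 99 get 3-bit codes, so the encoded patch is 112 bits long and the ratio is 512/112, about 4.57.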