I am confused about BitSet. Does a BitSet data structure stores 1s and 0s?
val b = BitSet(0, 2, 3)
means store 1s for bit locations 0, 2 and 3?
If so, what are the max. no. of bits, 32 or 64?
A BitSet in Scala is implemented as an Array[Long], where each bit signals the presence of a number in the array. Long is 64 bit in Scala (on the JVM). One such Long can store values 0 to 63, the next one after it 64 to 127, on so on. This is possible since we're only talking about positive numbers and don't need to account for the sign.
Given your example:
BitSet(0, 2, 3)
We can store all these numbers inside a single Long, which in binary would be:
1101
Since we're in the range of 0 to 63, this works on a single Long value.
In general, the upper limit, or the biggest value stored inside a BitSet in Scala is Int.MaxValue, meaning 2^31-1 (2147483647). In order to store it, you'd need 2147483647 / 64 "bits" representing the number, which is ~= 33554432 longs. This is why storing large numbers in a bit set can get quite expensive, and it is usually only recommended when you're dealing with numbers in around the hundreds.
As a side note, immutable.BitSet has a special implementation in Scala (of the BitSetLike trait), namely BitSet1 and BitSet2, which are backed by a one and two longs, respectively, avoiding the need to allocate an additional array to wrap them.
From the documentation:
Bitsets are sets of non-negative integers which are represented as
variable-size arrays of bits packed into 64-bit words. The memory
footprint of a bitset is determined by the largest number stored in
it.
Given that the API deals with adding and removing Ints, then it seems reasonable to believe the maximum bit that can be set is a max integer, i.e. 2^31-1. Looking at the source for scala.collection.immutable.BitSet, there's also an assert to disallow negative integers (which makes sense according to the above description):
def + (elem: Int): BitSet = {
require(elem >= 0, "bitset element must be >= 0")
Related
Background
In the past I've written an encoder/decoder for converting an integer to/from a string using an arbitrary alphabet; namely this one:
abcdefghjkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ23456789
Lookalike characters are excluded, so 1, I, l, O, and 0 are not present in this alphabet. This was done for user convenience and to make it easier to read and to type out a value.
As mentioned above, my previous project, python-ipminify converts a 32-bit IPv4 address to a string using an alphabet similar to the above, but excluding upper-case characters. In my current undertaking, I don't have the constraint of excluding upper-case characters.
I wrote my own Python for this project using the excellent question and answer here on how to build a URL-shortener.
I have published a stand-alone example of the logic here as a Gist.
Problem
I'm now writing a performance-critical implementation of this in a compiled language, most likely Rust, but I'd need to port it to other languages as well.. I'm also having to accept an arbitrary-length array of bytes, rather than an arbitrary-width integer, as is the case in Python.
I suppose that as long as I use an unsigned integer and use consistent endianness, I could treat the byte array as one long arbitrary-precision unsigned integer and do division over it, though I'm not sure how performance will scale with that. I'd hope that arbitrary-precision unsigned integer libraries would try to use vector instructions where possible, but I'm not sure how this would work when the input length does not match a specific instruction length, i.e. when the input size in bits is not evenly divisible by supported instructions, e.g. 8, 16, 32, 64, 128, 256, 512 bits.
I have also considered breaking up the byte array into 256-bit (32 byte) blocks and using SIMD instructions (I only need to support x86_64 on recent CPUs) directly to operate on larger unsigned integers, but I'm not exactly sure how to deal with size % 32 != 0 blocks; I'd probably need to zero-pad, but I'm not clear on how I would know when to do this during decoding, i.e. when I don't know the underlying length of the source value, only that of the decoded value.
Question
If I'm going the arbitrary unsigned integer width route, I'd essentially be at the mercy of the library author, which is probably fine; I'd imagine that these libraries would be fairly optimized to vectorize as much as possible.
If I try to go the block route, I'd probably zero-pad any remaining bits in the block if the input length was not divisible by the block size during encoding. However, would it even be possible to decode such a value without knowing the decoded value size?
For a mandelbrot generator I want to used fixed point arithmetic going from 32 up to maybe 1024 bit as you zoom in.
Now normaly SSE or AVX is no help there due to the lack of add with carry and doing normal integer arithmetic is faster. But in my case I have literally millions of pixels that all need to be computed. So I have a huge vector of values that all need to go through the same iterative formula over and over a million times too.
So I'm not looking at doing a fixed point add/sub/mul on single values but doing it on huge vectors. My hope is that for such vector operations AVX/AVX2 can still be utilized to improve the performance despite the lack of native add with carry.
Anyone know of a library for fixed point arithmetic on vectors or some example code how to do emulate add with carry on AVX/AVX2.
FP extended precision gives more bits per clock cycle (because double FMA throughput is 2/clock vs. 32x32=>64-bit at 1 or 2/clock on Intel CPUs); consider using the same tricks that Prime95 uses with FMA for integer math. With care it's possible to use FPU hardware for bit-exact integer work.
For your actual question: since you want to do the same thing to multiple pixels in parallel, probably you want to do carries between corresponding elements in separate vectors, so one __m256i holds 64-bit chunks of 4 separate bigintegers, not 4 chunks of the same integer.
Register pressure is a problem for very wide integers with this strategy. Perhaps you can usefully branch on there being no carry propagation past the 4th or 6th vector of chunks, or something, by using vpmovmskb on the compare result to generate the carry-out after each add. An unsigned add has carry out of a+b < a (unsigned compare)
But AVX2 only has signed integer compares (for greater-than), not unsigned. And with carry-in, (a+b+c_in) == a is possible with b=carry_in=0 or with b=0xFFF... and carry_in=1 so generating carry-out is not simple.
To solve both those problems, consider using chunks with manual wrapping to 60-bit or 62-bit or something, so they're guaranteed to be signed-positive and so carry-out from addition appears in the high bits of the full 64-bit element. (Where you can vpsrlq ymm, 62 to extract it for addition into the vector of next higher chunks.)
Maybe even 63-bit chunks would work here so carry appears in the very top bit, and vmovmskpd can check if any element produced a carry. Otherwise vptest can do that with the right mask.
This is a handy-wavy kind of brainstorm answer; I don't have any plans to expand it into a detailed answer. If anyone wants to write actual code based on this, please post your own answer so we can upvote that (if it turns out to be a useful idea at all).
Just for kicks, without claiming that this will be actually useful, you can extract the carry bit of an addition by just looking at the upper bits of the input and output values.
unsigned result = a + b + last_carry; // add a, b and (optionally last carry)
unsigned carry = (a & b) // carry if both a AND b have the upper bit set
| // OR
((a ^ b) // upper bits of a and b are different AND
& ~r); // AND upper bit of the result is not set
carry >>= sizeof(unsigned)*8 - 1; // shift the upper bit to the lower bit
With SSE2/AVX2 this could be implemented with two additions, 4 logic operations and one shift, but works for arbitrary (supported) integer sizes (uint8, uint16, uint32, uint64). With AVX2 you'd need 7uops to get 4 64bit additions with carry-in and carry-out.
Especially since multiplying 64x64-->128 is not possible either (but would require 4 32x32-->64 products -- and some additions or 3 32x32-->64 products and even more additions, as well as special case handling), you will likely not be more efficient than with mul and adc (maybe unless register pressure is your bottleneck).As
As Peter and Mystical suggested, working with smaller limbs (still stored in 64 bits) can be beneficial. On the one hand, with some trickery, you can use FMA for 52x52-->104 products. And also, you can actually add up to 2^k-1 numbers of 64-k bits before you need to carry the upper bits of the previous limbs.
The max values of int, float and long in Scala are:
Int.MaxValue = 2147483647
Float.MaxValue = 3.4028235E38
Long.MaxValue = 9223372036854775807L
From the authors of Scala compiler, Keynote, PNW Scala 2013, slide 16 What's Int.MaxValue between friends?:
val x1: Float = Long.MaxValue
val x2: Float = Long.MaxValue - Int.MaxValue
println (x1 == x2)
// NO WONDER NOTHING WORKS
Why does this expression return true?
A Float is a 4-byte floating point value. Meanwhile a Long is an 8-byte value and an Int is also a 4-byte value. However, the way numbers are stored in 4-byte floating point values means that they have only around 8 digits of precision. Consequently, they do not have the capacity to store even the 4 most significant bytes (around 9-10 digits) of a Long regardless of the value of the least 4 significant bytes (another 9-10 digits).
Consequently, the Float representation of the two expressions is the same, because the bits that differ are below the resolution of a Float. Hence the two values compare equal.
Echoing Mike Allen's answer, but hoping to provide some additional context (would've left this as a comment rather than a separate answer, but SO's reputation feature wouldn't let me).
Integers have a maximum range of values defined as either 0 to 2^n (if it is an unsigned integer) or -2^(n-1) to 2^(n-1) (for signed integers) where n is the number of bits in the underlying implementation (n=32 in this case). If you wish to represent a number larger than 2^31 with a signed value, you can't use an int. A signed long will work up to 2^63. For anything larger than this, a signed float can go up to roughly 2^127.
One other thing to note is that these resolution issues are only in force when the value stored in the floating point number approaches the max. In this case, the subtraction operation causes a change in true value that is many orders of magnitude smaller than the first value. A float would not round off the difference between 100 and 101, but it might round off the difference between 10000000000000000000000000000 and 10000000000000000000000000001.
Same goes for small values. If you cast 0.1 to an integer, you get exactly 0. This is not generally considered a failing of the integer data type.
If you are operating on numbers that are many orders of magnitude different in size, and also not able to tolerate rounding errors, you will need data structures and algorithms that account for inherent limitations of binary data representation. One possible solution would be to use a floating point encoding with fewer bits of exponential, thereby limiting the max value but providing for greater resolution is less significant bits. For greater detail, check out:
look up the IEEE Standard 754 (which defines the floating point encoding)
http://steve.hollasch.net/cgindex/coding/ieeefloat.html
https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
Initially, I had documents in the form of: resources:{wood:123, coal:1, silver:5} and boxes:{wood:999, coal:20}. In this example, my server's code tests (quite efficiently) if there is enough space for the wood (and it is) and enough space for the coal (and it is) and enough space for the silver (there is not, if space is 0 I don't even include it in boxes) then all is well.
I want to shorten the _id value from wood, coal, silver to a numeric representation which in turns occupies less space, packets of information are smaller when communicating to and from the client / server, etc.
I am curious about using 0, 1, 2...as numbers for the _id or _0, _1, _2...
What are the advantages of using Number or String? Are Numbers faster for queries? (ignoring index speed).
I am adding these values manually btw :P
The number of bytes necessary to represent an integer can be found by taking the integer and dividing by 256. The number of bytes necessary to represent a string are the number of characters in the string.
It takes only one byte to represent the numbers 87 or 202, but it takes two and three bytes to represent the same as a string (plus one more if you use the underscore).
Integers are almost certainly what you want here. However, if you're concerned about over-the-wire size, then you might see gains by shortening your keys. Rather than using wood, coal, and silver, you could use w, c, and s, saving you 11 bytes per record pulled.
I'm testing a photo application for Facebook. I'm getting object IDs from the Facebook API, but I received some incorrect ones, which doesn't make sense - why would Facebook send wrong IDs? I investigated a bit and found out that numbers with 17 and more digits are automatically turning into even numbers!
For example, let's say the ID I'm supposed to receive from Facebook is 12345678912345679. In the debugger, I've noticed that Flash Player automatically turns it into 12345678912345678. And I even tried to manually set it back to an odd number, but it keeps changing back to even.
Is there any way to stop Flash Player from rounding the numbers? BTW the variable is defined as Object, I receive it like that from Facebook.
This is related to the implementation of data types:
int is a 32-bit number, with an even distribution of positive and
negative values, including 0. So the maximum value is
(2^32 / 2 ) - 1 == 2,147,483,647.
uint is also a 32-bit number, but it doesn't have negative values. So the
maximum value is
2^32 - 1 == 4,294,967,295.
When you use a numerical value greater than the maximum value of int or uint, it is automatically cast to Number. From the Adobe Doc:
The Number data type is useful when you need to use floating-point
values. Flash runtimes handle int and uint data types more efficiently
than Number, but Number is useful in situations where the range of
values required exceeds the valid range of the int and uint data
types. The Number class can be used to represent integer values well
beyond the valid range of the int and uint data types. The Number data
type can use up to 53 bits to represent integer values, compared to
the 32 bits available to int and uint.
53 bits have a maximum value of:
2^53 - 1 == 9,007,199,254,740,989 => 16 digits
So when you use any value greater than that, the inner workings of floating point numbers apply.
You can read about floating point numbers here, but in short, for any floating point value, the first couple of bits are used to specify a multiplication factor, which determines the location of the point. This allows for a greater range of values than are actually possible to represent with the number of bits available - at the cost of reduced precision.
When you have a value greater than the maximum possible integer value a Number could have, the least significant bit (the one representing 0 and 1) is cut off to allow for a more significant bit (the one representing 2^54) to exist => hence, you lose the odd numbers.
There is a simple way to get around this: Keep all your IDs as Strings - they can have as many digits as your system has available bytes ;) It's unlikely you're going to do any calculations with them, anyway.
By the way, if you had a value greater than 2^54-(1+2), your numbers would be rounded down to the next multiple of 4; if you had a value greater than 2^55-(1+2+4), they would be rounded down to the next multiple of 8, etc.