I have a question. Maybe it is a simple one, but I wondered: does "positive float" mean k >= 0? In other words, does "positive float" include zero?
I am asking because of hyperparameter tuning for Ridge regression.
Thank you
Depends on the context. If we are talking about numbers of type float, then a float number is positive if its sign bit is 0:
Sign bit s (bit 31). The most significant bit represents the sign of
the number (1 for negative, 0 for positive).
https://introcs.cs.princeton.edu/java/91float/
As a matter of fact, both positive zero and negative zero exist. Numerically they are the same value: IEEE-754 defines +0.0 == -0.0 to be true, so neither one is "slightly bigger" or "slightly smaller" than the other; the sign typically records the direction from which an underflowing computation approached zero. Both have all bits set to 0 except possibly the sign bit: a positive zero's sign bit is 0, a negative zero's sign bit is 1.
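A quick check in Swift (a minimal sketch; any IEEE-754 language behaves the same way):

let posZero: Float = 0.0
let negZero: Float = -0.0
print(posZero == negZero)  // true: the two zeros are the same numeric value
print(posZero.sign)        // plus
print(negZero.sign)        // minus
print(String(negZero.bitPattern, radix: 2))  // 10000000000000000000000000000000: only the sign bit is set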
However, if we talk about positive float and negative float in general, we get into scheduling concepts. In that context, a positive float means there is excess time for an activity, while a negative float means there is no excess time, so the activity has to start as soon as possible.
Related
I am writing a program in Swift that takes the multiplicative inverse of random bytes. Sometimes, the byte is 0, and when the multiplicative inverse is taken, it results in inf.
The multiplicative inverse is being determined using
powf(Float(byte), -1.0)
byte is of type UInt8. If byte is equal to 0, the result is inf as mentioned earlier. How would the multiplicative inverse of 0 be infinity? Wouldn't the multiplicative inverse also be 0 since 0/0's multiplicative inverse is 0/0?
Short answer: By definition. In Swift (and many other languages), floating point numbers are backed by IEEE-754 definition of floats, which is directly implemented by the underlying hardware in most cases and thus quite fast. And according to that standard, division by 0 for floats is defined to be Infinity, and Swift is merely returning that result back to you. (To be precise, 0/0 is defined to be NaN, any positive number divided by 0 is defined to be Infinity, and any negative number divided by 0 is defined to be -Infinity.)
An interesting question to ask might be "why?" Why does IEEE-754 define division by 0 to be Infinity for floats, where one can reasonably also expect the machine to throw an error, or maybe define it as NaN (not-a-number), or perhaps maybe even 0? For an analysis of this, you should really read Kahan's (the designer of the semantics behind IEEE-754) own notes regarding this matter. Starting on page 10 of the linked document, he discusses why the choice of Infinity is preferable for division-by-zero, which essentially boils down to efficient implementation of numerical algorithms since this convention allows skipping of expensive tests in iterative numerical analysis. Start reading on page 10, and go through the examples he discusses, which ends on top of page 14.
To sum up: Floating point division by 0 is defined to be Infinity by the IEEE-754 standard, and there are good reasons for making this choice. Of course, one can imagine different systems adopting a different answer as well, depending on their particular need or application area; but then they wouldn't be IEEE-754 compliant.
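These rules are easy to verify in Swift (a minimal sketch; Foundation is assumed for powf, as in the question):

import Foundation

let byte: UInt8 = 0
print(powf(Float(byte), -1.0))  // inf: pow(0, -1) is 1/0, which IEEE-754 defines as +Infinity

let zero: Float = 0.0
print(1.0 / zero)   // inf: positive number divided by 0
print(-1.0 / zero)  // -inf: negative number divided by 0
print(zero / zero)  // nan: 0/0 is NaN, not Infinity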
Plugging in 0 just means the value is 0 divided by some positive number, so its multiplicative inverse requires dividing by 0. As you probably know, this is undefined in mathematics, but Swift still produces a result. You can picture it as repeatedly subtracting 0 from the number and never finishing, so it outputs infinity.
Edit: As Alias pointed out, Swift does not actually go through any process of repeatedly subtracting 0. It simply returns infinity whenever it is asked to divide by 0.
The max values of int, float and long in Scala are:
Int.MaxValue = 2147483647
Float.MaxValue = 3.4028235E38
Long.MaxValue = 9223372036854775807L
From the authors of the Scala compiler (keynote, PNW Scala 2013, slide 16: What's Int.MaxValue between friends?):
val x1: Float = Long.MaxValue
val x2: Float = Long.MaxValue - Int.MaxValue
println (x1 == x2)
// NO WONDER NOTHING WORKS
Why does this expression return true?
A Float is a 4-byte floating point value with a 24-bit significand, which gives only about 7 significant decimal digits of precision. Meanwhile a Long is an 8-byte value and an Int is also a 4-byte value. Because of that 24-bit significand, a Float cannot exactly store even the 4 most significant bytes (around 9-10 digits) of a Long, regardless of the value of the 4 least significant bytes (another 9-10 digits).
Consequently, the Float representation of the two expressions is the same, because the bits that differ are below the resolution of a Float. Hence the two values compare equal.
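The same collapse can be reproduced in Swift (a sketch; Int64 and Int32 stand in for Scala's Long and Int):

let x1 = Float(Int64.max)                     // 9223372036854775807, rounded to Float
let x2 = Float(Int64.max - Int64(Int32.max))  // differs only in the low ~10 digits
print(x1 == x2)  // true: the difference is below Float's resolution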
Echoing Mike Allen's answer, but hoping to provide some additional context (would've left this as a comment rather than a separate answer, but SO's reputation feature wouldn't let me).
Integers have a range of values defined as either 0 to 2^n - 1 (if it is an unsigned integer) or -2^(n-1) to 2^(n-1) - 1 (for signed integers), where n is the number of bits in the underlying implementation (n=32 in this case). If you wish to represent a number larger than 2^31 - 1 with a signed value, you can't use an int. A signed long will work up to 2^63 - 1. For anything larger than this, a float can go up to just under 2^128 (about 3.4e38).
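For concreteness, here are the corresponding limits in Swift (a sketch; Int32 and Int64 stand in for 32- and 64-bit signed integers):

print(Int32.max)                      // 2147483647 (2^31 - 1)
print(Int64.max)                      // 9223372036854775807 (2^63 - 1)
print(Float.greatestFiniteMagnitude)  // 3.4028235e+38 (just under 2^128)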
One other thing to note is that these resolution issues are only in force when the value stored in the floating point number approaches the max. In this case, the subtraction operation causes a change in true value that is many orders of magnitude smaller than the first value. A float would not round off the difference between 100 and 101, but it might round off the difference between 10000000000000000000000000000 and 10000000000000000000000000001.
Same goes for small values. If you cast 0.1 to an integer, you get exactly 0. This is not generally considered a failing of the integer data type.
If you are operating on numbers that are many orders of magnitude apart in size, and also cannot tolerate rounding errors, you will need data structures and algorithms that account for the inherent limitations of binary data representation. One possible solution would be to use a floating point encoding with fewer exponent bits, thereby limiting the maximum value but providing greater resolution in the less significant bits. For greater detail, check out:
the IEEE Standard 754 (which defines the floating point encoding)
http://steve.hollasch.net/cgindex/coding/ieeefloat.html
https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
I know of these values:
CGFloat.infinity, which is greater than
CGFloat.greatestFiniteMagnitude, which is greater than
CGFloat.leastNormalMagnitude, which is greater than
CGFloat.leastNonzeroMagnitude, which is greater than
0
but it stops there... as far as I can tell. Where are the negative values? I imagine maybe it's as easy as placing a - before them, but then I worry that there might be strange exceptions.
How do I find the key negative numbers in CGFloat? Critically, the lowest negative non-infinite number.
Floating point numbers have a sign bit, and negating a number simply flips that bit to the opposite value (there is actually a negative zero). So just put a minus before them: negation is exact and well defined, with no strange exceptions. In particular, the lowest negative non-infinite value is -CGFloat.greatestFiniteMagnitude.
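A minimal Swift sketch (assuming an Apple platform, where CGFloat comes from CoreGraphics):

import CoreGraphics

let lowestFinite = -CGFloat.greatestFiniteMagnitude  // the lowest non-infinite value
print(lowestFinite)                      // -1.7976931348623157e+308 on 64-bit platforms
print(lowestFinite > -CGFloat.infinity)  // true: still greater than negative infinity
print(-CGFloat.leastNonzeroMagnitude)    // -5e-324 on 64-bit: the negative value closest to zero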
I just can't understand fixed point and floating point numbers, because the definitions all over Google are hard to read, and none of them gives a simple enough explanation of what these numbers really are. Can I get a plain definition with an example?
A fixed point number has a specific number of bits (or digits) reserved for the integer part (the part to the left of the decimal point) and a specific number of bits reserved for the fractional part (the part to the right of the decimal point). No matter how large or small your number is, it will always use the same number of bits for each portion. For example, if your fixed point format was in decimal IIIII.FFFFF then the largest number you could represent would be 99999.99999 and the smallest non-zero number would be 00000.00001. Every bit of code that processes such numbers has to have built-in knowledge of where the decimal point is.
A floating point number does not reserve a specific number of bits for the integer part or the fractional part. Instead it reserves a certain number of bits for the number (called the mantissa or significand) and a certain number of bits to say where within that number the decimal place sits (called the exponent). So a floating point number that took up 10 digits with 2 digits reserved for the exponent might represent a largest value of 9.9999999e+50 and a smallest non-zero value of 0.0000001e-49.
A fixed point number just means that there are a fixed number of digits after the decimal point. A floating point number allows for a varying number of digits after the decimal point.
For example, if you have a way of storing numbers that requires exactly four digits after the decimal point, then it is fixed point. Without that restriction it is floating point.
Often, when fixed point is used, the programmer actually uses an integer and then makes the assumption that some of the digits are beyond the decimal point. For example, I might want to keep two digits of precision, so a value of 100 actually means 1.00, 101 means 1.01, 12345 means 123.45, etc. A sketch of this follows below.
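For instance, money stored as integer cents, sketched in Swift (names like priceInCents are made up for illustration):

import Foundation

let priceInCents = 2337  // represents 23.37
let taxInCents = 187     // represents 1.87
let totalInCents = priceInCents + taxInCents  // exact integer arithmetic, no rounding error
print(String(format: "%d.%02d", totalInCents / 100, totalInCents % 100))  // 25.24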
Floating point numbers are more general purpose because they can represent very small or very large numbers in the same way, but there is a small penalty in having to have extra storage for where the decimal place goes.
From my understanding, fixed-point arithmetic is done using integers, where the decimal part is stored in a fixed number of bits, or equivalently the number is multiplied by 10 raised to however many digits of decimal precision are needed.
For example, if the number 12.34 needs to be stored and we only need two digits of precision after the decimal point, the number is multiplied by 100 to get 1234. When performing math on this number, we'd use this rule set: adding 5620 (that is, 56.20) to this number yields 6854 in the data, or 68.54.
If we want to extract the decimal part of a fixed-point number, we use the modulo (%) operator.
12.34, in runnable Swift rather than pseudocode:
import Foundation

let v1 = 1234 / 100  // 12: the whole-number part
let v2 = 1234 % 100  // 34: the fractional part, in 100ths of a whole
print("\(v1).\(String(format: "%02d", v2))")  // "12.34" (%02d zero-pads values like 5 to "05")
Floating point numbers are a completely different story in programming. The current standard for single-precision floating point numbers uses 23 bits for the fraction, 8 bits for the exponent, and 1 bit for the sign. See this Wikipedia link for more information on this.
The term ‘fixed point’ refers to the manner in which numbers are represented, with a fixed number of digits after, and sometimes before, the decimal point.
With floating-point representation, the placement of the decimal point can ‘float’ relative to the significant digits of the number.
For example, a fixed-point representation with a uniform decimal point placement convention can represent the numbers 123.45, 1234.56, 12345.67, etc, whereas a floating-point representation could in addition represent 1.234567, 123456.7, 0.00001234567, 1234567000000000, etc.
There's plenty here about what a fixed-point number is and how it is represented, but very little mention of what I consider the defining feature. The key difference is that floating-point numbers have a constant relative (percent) error caused by rounding or truncating. Fixed-point numbers have constant absolute error.
With 64-bit floats, you can be sure that the answer to x+y will never be off by more than one bit in the last place, but how big is that bit? Well, it depends on x and y: if the exponent of the result is 10, then rounding off the last bit represents an error 2^10 = 1024 times larger than it would with an exponent of 0.
With fixed point numbers, a bit always represents the same amount. For example, if we have 32 bits before the decimal point and 32 after, that means truncation errors will always change the answer by 2^-32 at most. This is great if you're working with numbers that are all about equal to 1, which gain a lot of precision, but bad if you're working with numbers that have different units--who cares if you calculate a distance of a googol meters, then end up with an error of 2^-32 meters?
In general, floating-point lets you represent much larger numbers, but the cost is higher (absolute) error for medium-sized numbers. Fixed points get better accuracy if you know how big of a number you'll have to represent ahead of time, so that you can put the decimal exactly where you want it for maximum accuracy. But if you don't know what units you're working with, floats are a better choice, because they represent a wide range with an accuracy that's good enough.
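Swift exposes "how big is a bit" directly through the ulp (unit in the last place) property; this minimal sketch shows the absolute size of the last bit growing with the exponent:

print(Float(1.0).ulp)   // ~1.19e-07 (2^-23)
print(Float(1024).ulp)  // ~0.000122 (2^-13: same relative error, 1024x the absolute error)
print(Float(1e38).ulp)  // ~1.01e+31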
The point is that fixed-point numbers don't only have some fixed number of digits after the point; they are mathematically represented as multiples of a fixed negative power of the base. This was very good for mechanical calculators:
e.g., the price of something is USD 23.37 (Q = 2 digits after the point). The machine knows where the point is supposed to be!
Take the number 123.456789:
As an integer, this number would be 123.
As a fixed point (2), this number would be 123.46 (assuming you rounded it up).
As a floating point, this number would be 123.456789.
Floating point lets you represent almost every number with a great deal of precision. Fixed point is less precise, but simpler for the computer.
int a = 7;
int b = 10;
float answer = (float)a / b;
answer = 0.699999988 (I expected 0.7?)
The short version is: floating point numbers are not exact. There is only a finite set of bits, and a finite set of bits cannot represent an infinite set of numbers; in particular, 0.7 has no exact binary representation, so the nearest representable value is stored instead.
The longer version is here: What Every Computer Scientist Should Know About Floating-Point Arithmetic
See also:
How is floating point stored? When does it matter?
Why is my number being rounded incorrectly?
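The same effect is easy to inspect in Swift (a sketch; Foundation is assumed for String(format:)):

import Foundation

let answer = Float(7) / Float(10)
print(answer)                          // 0.7: the shortest string that round-trips
print(String(format: "%.9f", answer))  // 0.699999988: the value actually stored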
Floating point numbers are accurate only to a certain finite number of digits of precision. You will need to do some rounding when displaying them to get round numbers like 0.7.
If you need more precision, use the double data type, or the NSDecimalNumber class (which will preserve your decimal digits at the expense of complexity).
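A sketch of the decimal route in Swift, using Foundation's NSDecimalNumber:

import Foundation

let d = NSDecimalNumber(string: "0.7")  // decimal digits are preserved exactly
print(d)                                // 0.7
print(d.adding(NSDecimalNumber(string: "0.1")))  // 0.8, with no binary rounding error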
It is because floating point calculations are not precise.
The only thing I rely on is the existence of exact small integers (namely -2, -1, 0, 1, 2, as you might use for representing [0,1] plus some special values), and some people frown on using that too.