In HLSL, how would I go about packing two floats in the range 0-1 into one float with optimal precision? This would be incredibly useful for compressing my GBuffer further.
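//Packing (assumes a and b are meant to be in [0, 1])
float a = 0.45;
float b = 0.55;
// Scale to 16-bit integers, rounding to nearest instead of truncating
uint aScaled = (uint)(saturate(a) * 65535.0f + 0.5f);
uint bScaled = (uint)(saturate(b) * 65535.0f + 0.5f);
uint abPacked = (aScaled << 16) | (bScaled & 0xFFFF);
// Reinterpret the bits as a float. Caution: some bit patterns are NaNs or
// denormals, which may not survive a float render target; a uint target
// (e.g. R32_UINT) avoids that.
float finalFloat = asfloat(abPacked);
//Unpacking
float inputFloat = finalFloat;
uint uintInput = asuint(inputFloat);
float aUnpacked = (uintInput >> 16) / 65535.0f;
float bUnpacked = (uintInput & 0xFFFF) / 65535.0f;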
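A C port of the same packing can help check the round trip on the CPU. This is a sketch, not HLSL: C has no asfloat()/asuint(), so the packed word stays a uint32_t, and the function names are made up for illustration. It rounds to nearest rather than truncating and assumes finite inputs.

```c
#include <stdint.h>

/* Quantize a value in [0, 1] to 16 bits, rounding to nearest.
   Assumes v is finite (converting NaN to an integer is undefined in C). */
static uint32_t quantize16(float v)
{
    if (v < 0.0f) v = 0.0f;   /* same clamp as HLSL saturate() */
    if (v > 1.0f) v = 1.0f;
    return (uint32_t)(v * 65535.0f + 0.5f);
}

/* Pack two [0, 1] values into one 32-bit word: a in the high half,
   b in the low half. In HLSL, asfloat()/asuint() would reinterpret
   this word as a float for storage and back again. */
uint32_t pack_unorm16x2(float a, float b)
{
    return (quantize16(a) << 16) | quantize16(b);
}

/* Recover both values as multiples of 1/65535. */
void unpack_unorm16x2(uint32_t packed, float *a, float *b)
{
    *a = (float)(packed >> 16) / 65535.0f;
    *b = (float)(packed & 0xFFFFu) / 65535.0f;
}
```

With 16 bits per value the step is 1/65535, so the worst-case rounding error is about 7.6e-6 per value.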
Converting floating-point numbers to fixed-point integers is an error-prone idea, because floats cover a much larger range of magnitudes. For example, linearizing 8-bit sRGB values without normalizing them gives values up to pow(255, 2.2) (roughly 197,000), which is larger than 0xFFFF, and robust HDR needs several times that range. Fixed-point code is generally fragile, hard to read and a nightmare to debug. People invented floats for a good reason.
There are several 16-bit float formats. The IEEE 754 half-precision format is well suited to numbers between -1.0 and 1.0, but it also supports magnitudes up to 65504 (just under 0x10000) in case you need HDR; larger values still have to be normalized to fit. Then there is bfloat16, which behaves like a normal 32-bit float (it has the same 8-bit exponent), just with less precision. IEEE 16-bit floats are widely supported by modern CPUs and GPUs, and can also be converted quickly even in software. bfloat16 is only now gaining popularity, so you will have to research whether it suits your needs. Finally, you can introduce your own 16-bit float format using an integer log2 function, which most CPUs provide as a single instruction (count leading zeros or bit scan).
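For the IEEE half format, HLSL provides the f32tof16() and f16tof32() intrinsics, which convert through the low 16 bits of a uint. bfloat16 is simple enough to sketch in a few lines of C, because it is just the top half of a 32-bit float's bit pattern. The function names below are illustrative, and this is one common conversion (round to nearest even, NaNs kept quiet), not a reference implementation.

```c
#include <stdint.h>
#include <string.h>

/* float -> bfloat16: keep the top 16 bits of the binary32 pattern
   (sign, all 8 exponent bits, 7 mantissa bits), rounding to nearest even. */
uint16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* bit cast, like asuint() */
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)    /* NaN: keep it a quiet NaN */
        return (uint16_t)((bits >> 16) | 0x0040u);
    bits += 0x7FFFu + ((bits >> 16) & 1u);     /* round to nearest even */
    return (uint16_t)(bits >> 16);
}

/* bfloat16 -> float: the dropped mantissa bits come back as zeros. */
float bf16_to_float(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);               /* bit cast, like asfloat() */
    return f;
}
```

Two such values can share one 32-bit word the same way as the 16-bit integers in the question, e.g. ((uint32_t)float_to_bf16(a) << 16) | float_to_bf16(b). Unlike IEEE half, whose largest finite value is 65504, bfloat16 keeps the full float exponent range but only about 2 to 3 significant decimal digits.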
Related
I get a slightly wrong result when multiplying a floating-point number by an integer in Dart. Does anyone know why this happens?
void main() {
double x = 37.8;
int y = 100;
var z = x * y;
print(z);
// 3779.9999999999995
}
In other languages (C#/C++) I get the result 3780.0.
This is completely expected: 37.8 (or rather, its 0.8 part) cannot be encoded exactly as a binary fraction in IEEE 754, so instead you get a close approximation whose error shows up in the least significant bits of the fraction.
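The same thing happens in C or C++, because double there is the same IEEE 754 binary64 type; what usually differs between languages is how many digits the default output shows. The C sketch below (function names are illustrative) multiplies the same values and prints the product at increasing precision.

```c
#include <stdio.h>

/* Same operation as the Dart snippet: the int is converted to double. */
double multiply(double x, int y)
{
    return x * y;
}

/* Print z with increasing numbers of significant digits to show that
   only the formatting, not the stored value, differs between languages. */
void print_digits(double z)
{
    printf("%.6g\n", z);   /* 6 significant digits, like plain %g */
    printf("%.15g\n", z);  /* 15 significant digits */
    printf("%.17g\n", z);  /* enough digits to round-trip any double */
}
```

On a typical IEEE 754 platform, print_digits(multiply(37.8, 100)) should print 3780, 3780 and then 3779.9999999999995: only the 17-digit form reveals the error that Dart's print shows by default.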
If you need numbers that are lossless (e.g. if you are handling monetary calculations) then check out the decimal package.
A simpler hack, if your floating-point type has enough fraction bits to keep the erroneous bits out of the way, is to round the result of your calculation to the number of decimal places you care about.
You can use toStringAsFixed to control fraction digits.
void main() {
double x = 37.8;
int y = 100;
var z = x * y;
print(z.toStringAsFixed(1));
// 3780.0
}
I was converting Float => CGFloat and got the following result. Why does it come out as "0.349999994039536" after conversion, when Double => CGFloat works fine?
import CoreGraphics

let float: Float = 0.35
let cgFloatFromFloat = CGFloat(float)
print(cgFloatFromFloat)
// 0.349999994039536
let double: Double = 0.35
let cgFloatFromDouble = CGFloat(double)
print(cgFloatFromDouble)
// 0.35
Both converting “.35” to float and converting “.35” to double produce a value that differs from .35, because the floating-point formats use a binary base, so the exact mathematical value must be approximated using powers of two (negative powers of two in this case).
Because the float format uses fewer bits, its result is less precise and, in this case, less accurate. The float value is 0.3499999940395355224609375, and the double value is 0.34999999999999997779553950749686919152736663818359375.
I am not completely conversant with Swift, but I suspect the algorithm it is using to convert a CGFloat to decimal (with default options) is something like:
Produce a fixed number of decimal digits, with correct rounding from the actual value of the CGFloat to that number of digits, and then suppress any trailing zeroes. For example, if the exact mathematical value is 0.34999999999999997…, and the formatting uses 15 significant digits, the intermediate result is “0.350000000000000”, and then this is shortened to “0.35”.
The way this operates with float and double is:
When converted to double, .35 becomes 0.34999999999999997779553950749686919152736663818359375. When printed using the above methods, the result is “0.35”.
When converted to float, .35 becomes 0.3499999940395355224609375. When printed using the above method, the result is “0.349999994039536”.
Thus, both the float and double values differ from .35, but the formatting for printing does not use enough digits to show the deviation for the double value, while it does use enough digits to show the deviation for the float value.
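To reproduce the effect without Swift, this C sketch formats with %.15g, which rounds to 15 significant digits and then drops trailing zeros, matching the algorithm suspected above. Whether Swift really uses 15 digits here is an assumption, and the function name is illustrative.

```c
#include <stdio.h>

/* Round to 15 significant digits; %g then drops trailing zeros.
   This mimics the formatting suspected above, not necessarily
   Swift's actual implementation.
   Returns the number of characters written (as snprintf does). */
int format15(char *buf, size_t size, double value)
{
    return snprintf(buf, size, "%.15g", value);
}
```

Passing 0.35f (promoted exactly to double) should produce "0.349999994039536", while passing 0.35 produces "0.35". With "%.17g" instead, the double would show up as 0.34999999999999998, revealing its own deviation.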
We developers very often need to calculate an angle to perform a rotation. Usually we can use the atan2() function, but sometimes we need more precision. What do you do then?
I know that in theory atan2 is precise, but on my system (iOS) it is off by about 0.05 radians, which is a big difference. It's not just my problem; I've seen others report the same.
atan2 is used to get an angle a from a vector (x,y). If you then use this angle to apply a rotation, you will use cos(a) and sin(a). You could simply compute cos and sin by normalizing (x,y) and keep them instead of the angle. Precision will be higher, and you will save a lot of cycles otherwise spent in trigonometric functions.
Edit. If you really want an angle from (x,y), it can be computed using variants of CORDIC to the precision you need.
You can use atan2l if long double has more precision than double on your system.
long double atan2l(long double y, long double x);
On iOS, I've found that the standard trigonometry operators are precise to roughly 13 or 14 decimal digits, so it sounds very odd that you're seeing errors on the order of 0.05 radians. If you can produce code and specific values that demonstrate this, please file a bug report on the behavior (and post the code here so that we can have a record of it).
That said, if you really need high precision for your trigonometry operators, I've modified a few of the routines that Dave DeLong created for his DDMathParser code. These routines use NSDecimal for performing the math, giving you up to ~34 digits of decimal precision while avoiding your standard floating point problems with representing base 10 decimals. You can download the code for these modified routines from here.
An NSDecimal version of atan() is calculated using the following code:
NSDecimal DDDecimalAtan(NSDecimal x) {
// from: http://en.wikipedia.org/wiki/Inverse_trigonometric_functions#Infinite_series
// The normal infinite series diverges if x > 1
NSDecimal one = DDDecimalOne();
NSDecimal absX = DDDecimalAbsoluteValue(x);
NSDecimal z = x;
if (NSDecimalCompare(&one, &absX) == NSOrderedAscending)
{
// y = x / (1 + sqrt(1+x^2))
// Atan(x) = 2*Atan(y)
// From: http://www.mathkb.com/Uwe/Forum.aspx/math/14680/faster-Taylor-s-series-of-Atan-x
NSDecimal interiorOfRoot;
NSDecimalMultiply(&interiorOfRoot, &x, &x, NSRoundBankers);
NSDecimalAdd(&interiorOfRoot, &one, &interiorOfRoot, NSRoundBankers);
NSDecimal denominator = DDDecimalSqrt(interiorOfRoot);
NSDecimalAdd(&denominator, &one, &denominator, NSRoundBankers);
NSDecimal y;
NSDecimalDivide(&y, &x, &denominator, NSRoundBankers);
NSDecimalMultiply(&interiorOfRoot, &y, &y, NSRoundBankers);
NSDecimalAdd(&interiorOfRoot, &one, &interiorOfRoot, NSRoundBankers);
denominator = DDDecimalSqrt(interiorOfRoot);
NSDecimalAdd(&denominator, &one, &denominator, NSRoundBankers);
NSDecimal y2;
NSDecimalDivide(&y2, &y, &denominator, NSRoundBankers);
// NSDecimal two = DDDecimalTwo();
NSDecimal four = DDDecimalFromInteger(4);
NSDecimal firstArctangent = DDDecimalAtan(y2);
NSDecimalMultiply(&z, &four, &firstArctangent, NSRoundBankers);
}
else
{
BOOL shouldSubtract = YES;
for (NSInteger n = 3; n < 150; n += 2) {
NSDecimal numerator;
if (NSDecimalPower(&numerator, &x, n, NSRoundBankers) == NSCalculationUnderflow)
{
numerator = DDDecimalZero();
n = 150;
}
NSDecimal denominator = DDDecimalFromInteger(n);
NSDecimal term;
if (NSDecimalDivide(&term, &numerator, &denominator, NSRoundBankers) == NSCalculationUnderflow)
{
term = DDDecimalZero();
n = 150;
}
if (shouldSubtract) {
NSDecimalSubtract(&z, &z, &term, NSRoundBankers);
} else {
NSDecimalAdd(&z, &z, &term, NSRoundBankers);
}
shouldSubtract = !shouldSubtract;
}
}
return z;
}
This uses a Taylor series approximation, with some shortcuts to speed convergence. I believe that the precision might not be the full 34 digits at results very close to Pi / 4 radians, so I might still need to fix that.
If you need extreme precision this is an option, but again what you're reporting shouldn't be happening with double values, so there's something odd here.
Use angles very often? No, you don't. Out of 10 times that I have seen a developer use angles, 7 times he should have used linear algebra instead and avoided trigonometric functions entirely.
A rotation is better done with a matrix, not with an angle. See also this question:
CGAffineTranformRotate atan2 inaccuration
I am developing an app and want to round off values, e.g.
if the output is 4.8, I want to display 4.8,
while if the output is 4.0, I want to display 4.
Also, it would be great if I could round values precisely: if the value is 4.34, round it to 4.3, while if it's 4.37, round it to 4.4.
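One way to round non-negative floating-point values is to add 0.5 and then truncate the value. (For negative values this fails, because truncation moves toward zero: -1.4 + 0.5 = -0.9 truncates to 0 instead of -1.)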
double valueToRound = GetTheValueFromSomewhere();
double roundedValue = (double)((int)(valueToRound + 0.5));
This will round 1.4 down to 1.0 and 1.5 up to 2.0, for example. To round to other decimal places, as you mentioned, multiply the initial value by 10, 100, etc., apply the same code, and then divide the result by the same number; that gives the same rounding at whatever decimal place you want.
Here's an example for rounding at an arbitrary precision.
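#include <math.h> // for pow()

double valueToRound = GetTheValueFromSomewhere();
int decimalPrecisionAtWhichToRound = 0;
// 10 raised to the precision (in C, ^ is bitwise XOR, not a power)
double scale = pow(10.0, decimalPrecisionAtWhichToRound);
double tmp = valueToRound * scale;
tmp = (double)((int)(tmp + 0.5)); // add 0.5, then truncate
double roundedValue = tmp / scale;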
So, if decimalPrecisionAtWhichToRound is set to 0 as in the above it'll round to the nearest whole integer. 1.4 will round to 1.0. 1.5 will round to 2.0.
If you set decimalPrecisionAtWhichToRound to 1, it would round to the nearest tenth. 1.45 would round to 1.5 and 1.43 would round to 1.4.
You need to first understand how to do rounding on paper, without someone showing you the code to do it. Write down some numbers and figure out how to round them.
To round to a specific decimal position, you add half the value of that position and then truncate. For example, 1.67 + 0.05 = 1.72, then truncate to 1.7.
But there are two tricky things in programming that aren't there when you do it on paper:
Knowing how to truncate -- There are several ways to do it while programming, but they're non-trivial.
Dealing with the fact that floating-point numbers are imprecise. That is, there is no exact representation of, say, 1.7; the two closest doubles are about 1.69999999999999996 and 1.70000000000000018.
For truncating, the trick of multiplying the number by the appropriate power of 10 to produce an integer works pretty well. For example, (1.67 + 0.05) * 10 = 17.2, then convert to int to get 17, then convert back to float and divide by 10 to get 1.7 (more or less). Or, if you're printing or displaying the value, just format the integer number with the decimal point inserted. (By formatting the integer value you don't have to deal with the problem of imprecise floating-point representations.)
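The integer-formatting variant for one decimal place could look like this C sketch (names are illustrative; it assumes a non-negative value small enough to fit in a long after scaling by 10).

```c
#include <stdio.h>

/* Scale a non-negative value to a whole number of tenths:
   add half a tenth, then truncate (the paper method). */
long to_tenths(double value)
{
    return (long)(value * 10.0 + 0.5);
}

/* Print the integer count of tenths with the decimal point inserted,
   so the inexact binary fraction itself is never formatted. */
int format_tenths(char *buf, size_t size, double value)
{
    long tenths = to_tenths(value);
    return snprintf(buf, size, "%ld.%ld", tenths / 10, tenths % 10);
}
```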
If you want to suppress trailing zeros it gets a bit trickier and you probably have to actually write some code -- format the number, then scan backwards and take off any trailing zeros up to the decimal point. (And take the decimal point too, if you wish.)
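A minimal C sketch of that format-then-trim approach follows (the function name is illustrative). It relies on printf-style %.*f for the rounding and assumes the C locale, where the decimal separator is '.'. Since %.*f typically rounds the exact binary value, an input that looks like a halfway case, such as 4.35 (stored slightly below 4.35), may come out as 4.3.

```c
#include <stdio.h>
#include <string.h>

/* Format value with at most `places` decimals, then strip trailing
   zeros and a dangling decimal point (4.0 -> "4", 4.8 -> "4.8"). */
void format_trimmed(char *buf, size_t size, double value, int places)
{
    snprintf(buf, size, "%.*f", places, value);
    if (strchr(buf, '.') == NULL)
        return;                        /* no fractional part to trim */
    char *end = buf + strlen(buf) - 1;
    while (*end == '0')
        *end-- = '\0';                 /* drop trailing zeros */
    if (*end == '.')
        *end = '\0';                   /* drop the decimal point too */
}
```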
float number = 17.125;
NSNumberFormatter *format = [[NSNumberFormatter alloc] init];
[format setNumberStyle:NSNumberFormatterDecimalStyle];
// Round halfway cases up and show at most 2 fraction digits;
// trailing zeros are omitted because no minimum fraction digits are set
[format setRoundingMode:NSNumberFormatterRoundHalfUp];
[format setMaximumFractionDigits:2];
NSString *temp = [format stringFromNumber:[NSNumber numberWithFloat:number]];
NSLog(@"%@", temp);
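double myNumber = 7.99;
// Use one decimal place unless the rounded tenths digit is 0,
// so 7.99 prints as "8" and 4.8 prints as "4.8"
NSString *formattedNumber = [NSString stringWithFormat:@"%.*f",
fmod(round(myNumber * 10), 10) ? 1 : 0, myNumber];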
How do I round a float to three decimal places?
I have this:
(round(1000.0f * currentHue) / 1000.0f)
This rounds it to 3 decimal places but leaves a bunch of 0s on the end.
Most numbers can't be represented precisely in floating point. If you need precision that badly, pick an appropriate scientific library. If you just want to print it nicely, use a format specifier:
NSLog(@"%.3f", currentHue);
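This is a fairly hairy way to do it, but it should work. Keep in mind that the result is a float again, so it holds only the nearest float to the rounded value:
float x = [[NSString stringWithFormat:@"%1.3f", (round(1000.0f * currentHue) / 1000.0f)] floatValue];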