I've been using matlab to solve some boundary value problems lately, and I've noticed an annoying quirk. Suppose I start with the interval [0,1], and I want to search inside it. Naturally, one would perform a binary search, so I would subdivide the interval into [0,0.5] and [0.5,1]. Excellent: let's now suppose we narrow down our search to [0.5,1]. Now we divide that interval into [0.5,0.75] and [0.75,1]. No apparent problem yet. However, as we keep going, the representation of powers of 2 in base 10 becomes less and less natural. For example, 2^-22 in binary is just 22 bits, while in decimal it is 16 digits. However, keep in mind that each digit of decimal is really encoding ~ 4 bits. In other words, representing these fractions as decimal is extremely inefficient.
Matlab's precision only extends to 16 digit decimal floats, so a binary search going to 2^-22 is as good as you can do. However, 2^-22 ~ 10^-7, which is much bigger than 10^-16, so the best search strategy in matlab seems to be a decimal search! In any case, this is what I have done so far: to take full advantage of the 16 digit precision, I've had to subdivide the interval [0,1] into 10 pieces.
Hopefully I've made my problem clear. So, my question is: how do I make matlab count in native binary? I want to work with 64 bit binary floats!
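For what it's worth, the premise can be checked directly. The sketch below is in Python rather than matlab, on the assumption that both store their default floating point type as an IEEE 754 64-bit binary double. The function name bisect_until_stuck and the target 0.7 are made up for illustration. It keeps halving [0,1] around the target until the midpoint is no longer a new representable number, then reports how many halvings were possible.

```python
def bisect_until_stuck(target, lo=0.0, hi=1.0):
    """Halve [lo, hi] around target until the midpoint stops being a new double."""
    steps = 0
    while True:
        mid = (lo + hi) / 2
        if mid == lo or mid == hi:
            # No representable double lies strictly between lo and hi any more.
            return steps, lo, hi
        if mid <= target:
            lo = mid
        else:
            hi = mid
        steps += 1


steps, lo, hi = bisect_until_stuck(0.7)
print(steps, hi - lo)
```

If the doubles really are binary, the halving should continue well past 22 steps and stop only when the interval is about as narrow as the 16 digit precision allows.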
I have a task to use the MARIE simulator to calculate the area of a circle, given its radius.
I know that the MARIE language has no multiplication operator, so we multiply by adding a number several times: to compute 2*3, I could write 3+3 or 2+2+2.
But the area of a circle involves pi, which is 3.14, and I can't imagine how to handle that. Can anyone give me an algorithm or code for it?
Thanks in advance.
MARIE does not have floating point support.
So, you should refer to your coursework or ask your instructors what to do, as it is not obvious.
It is, of course, possible to do floating point in software, but the complexity is extraordinary, so that is unlikely to be what they're looking for.
You could use fixed point arithmetic, fractions, or decimal.
Here's one solution that might be appropriate: multiply one of the numbers (having decimal places) by some fixed constant factor, do the arithmetic, then interpret answers accordingly. For example, let's use 100 as the factor, so 3.14 is represented by 314. Let's say r is 9, so we can square that (9x9=81), then multiply 81 x 314 = 25434. Now we know that value is 100x too large, so the real answer is 254.34. (You can choose to ignore the .34, or, round it, then ignore. 254 is still more accurate than 243 which we would get from 9x9x3.)
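To make the scaling idea concrete, here is a minimal sketch in Python rather than MARIE assembly. The names multiply_by_adding, SCALE, PI_SCALED and circle_area_scaled are invented for this example. It uses only repeated addition for the multiplications, as you would have to on MARIE, and it assumes a whole-number radius.

```python
def multiply_by_adding(a, b):
    """Multiply two non-negative integers using only addition."""
    total = 0
    for _ in range(b):
        total += a
    return total


SCALE = 100        # every scaled value is 100 times the real value
PI_SCALED = 314    # represents 3.14


def circle_area_scaled(r):
    """Area of a circle with integer radius r, returned 100 times too large."""
    r_squared = multiply_by_adding(r, r)
    return multiply_by_adding(PI_SCALED, r_squared)


area = circle_area_scaled(9)
whole, hundredths = divmod(area, SCALE)
print(area, whole, hundredths)   # 25434 254 34
```

For r = 9 the intermediate value 25434 still fits in a signed 16-bit word, but larger radii will not, so the overflow caveats below still apply.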
Fixed point multiplies all numbers by the constant (usually a power of 2, so that the binary point is in the same bit position). Additions are relatively straightforward, but multiplications must account for both operands being scaled, which means the result is doubly scaled.
If you need to measure radius also with decimal digits, e.g. 9.5, then you could scale both 9.5 and 3.14 by 100. Then we need 950x950, and multiply by 314. The answer will be 100x100x100 too large, so 1000000x too large. With this approach, 16 bits that MARIE offers will overflow, so you would need to use at least 32-bit arithmetic (not trivial on 16-bit machine).
You can use two different scaling factors, e.g. 9.5 as 95 and 3.14 as 314. Then 95x95x314 is 10000x too large, so interpret the answer accordingly. This will still overflow MARIE's 16 bits.
Fractions would maintain both a numerator and denominator for all numbers. So, 3.14 could be 314/100, and 9.5 could be 95/10, which simplify to 157/50 and 19/2. To add, you have to find a common denominator, convert, then sum the numerators. To multiply, you multiply both numerators and denominators: numerator = 19x19x157, denominator = 2x2x50. The numerator (56677) just fits in 16-bit unsigned arithmetic, but still overflows 16-bit signed arithmetic.
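The fraction arithmetic above can be checked with a short sketch. This uses Python's standard fractions module only to verify the numbers; on MARIE you would have to carry the numerator and denominator yourself.

```python
from fractions import Fraction

pi_approx = Fraction(314, 100)   # reduces to 157/50
radius = Fraction(95, 10)        # reduces to 19/2

area = pi_approx * radius * radius
print(area.numerator, area.denominator)   # 56677 200

# The numerator fits in an unsigned 16-bit word but not a signed one.
print(area.numerator <= 65535, area.numerator <= 32767)   # True False
```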
And finally binary coded decimal is more like a string format, where numbers are stored one decimal digit per byte or per nibble (packed decimal). Algorithms for addition and subtraction need to account for variable length inputs.
Big integer forms are similar to binary coded decimal, but are composed of much larger elements instead of single decimal digits.
All of these approaches require some thought, and the more limitations you want to remove, the more work is required. So, I'd suggest going back to your course material to find out what they really want.
I have a program which calculates probability values (p-values), but it is entering a very large negative number into the exp function: exp(-626294.830), which evaluates to zero instead of the very small positive number that it should be.
How can I get this to evaluate as a very small floating point number?
I have tried Math::BigFloat, bignum, and bigrat, but all have failed.
Wolfram Alpha says that exp(-626294.830) is 4.08589×10^-271997... zero is a pretty close approximation to that ;-) Although you've edited and removed the context from your question, do you really need to work with such tiny numbers, or perhaps there is some way you could optimize your algorithm or scale your numbers?
Anyway, you are correct that code like Math::BigFloat->new("-626294.830")->bexp seems to take quite some time, even with the support of use Math::BigFloat lib => 'GMP';.
The only alternative I can offer at the moment is Math::Prime::Util::GMP's expreal, although you need to specify a precision to it.
use Math::Prime::Util::GMP qw/expreal/;
use Math::BigFloat;
my $e = Math::BigFloat->new(expreal(-626294.830,272000));
print $e->bnstr,"\n";
__END__
4.086e-271997
But on my machine, even that still takes ~20s to run, which brings us back to the question of potential optimization in other places.
Floating point numbers do not have infinite precision. Assuming the number is represented as an IEEE 754 double, we have 52 bits for the fraction, 11 bits for the exponent, and one bit for the sign. Due to the way exponents are encoded, the smallest positive normal number that can be represented is 2^-1022 (subnormal numbers extend this down to 2^-1074).
If we look at your number e^-626294.830, we can do a change of base and see that it equals 2^(log_2 e · -626294.830) = 2^-903552.445, which is significantly smaller than 2^-1022. Approximating your number as zero is therefore correct.
Instead of calculating this value using arbitrary-precision numerics, you are likely better off solving the necessary equations by hand, then coding this in a way that does not require extreme precision. For example, it is unlikely that you need the exact value of e^-626294.830, but perhaps just the magnitude. Then, you can calculate the logarithm instead of using exp().
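As a rough illustration of working with the logarithm, here is a sketch in Python rather than Perl. The helper name exp_as_sci is invented for this example. It never evaluates the tiny number itself: it converts the exponent to base 10 and splits the result into a mantissa and a power of ten.

```python
import math


def exp_as_sci(x):
    """Return (mantissa, exponent) such that e**x == mantissa * 10**exponent."""
    log10_value = x / math.log(10)        # log10(e**x) = x * log10(e)
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return mantissa, exponent


mantissa, exponent = exp_as_sci(-626294.830)
print(f"{mantissa:.3f}e{exponent}")   # 4.086e-271997
print(math.exp(-626294.830))          # 0.0, the underflow from the question
```

This runs instantly, and if all you need is to compare or report p-values, keeping them as logarithms throughout is probably enough.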
A number like:
0.000000000000000000000000000000000000000123456
is difficult to store without a large performance penalty with the available numeric types in postgres. This question addresses a similar problem, but I don't feel like it came to an acceptable resolution. Currently one of my colleagues landed on rounding numbers like this to 15 decimal places and just storing them as:
0.000000000000001
So that the double precision numeric type can be used which prevents the penalty associated with moving to a decimal numeric type. Numbers that are this small for my purposes are more or less functionally equivalent, because they are both very small (and mean more or less the same thing). However, we are graphing these results and when a large portion of the data set would be rounded like this it looks exceptionally stupid (flat line on the graph).
Because we are storing tens of thousands of these numbers and operating on them, the decimal numeric type is not a good option for us as the performance penalty is too large.
I am a scientist, and my natural inclination would just be to store these types of numbers in scientific notation, but it doesn't appear that postgres has this kind of functionality. I don't actually need all of the precision in the number, I just want to preserve 4 digits or so, so I don't even need the 15 digits that the float numeric type offers. What are the advantages and disadvantages of storing these numbers in two fields like this:
1.234 (real)
-40 (smallint)
where this is equivalent to 1.234*10^-40? This would allow for ~32000 leading decimals with only 2 bytes used to store them and 4 bytes to store the real value, for a total of maximally 6 bytes per number (gives me the exact number I want to store and takes less space than the existing solution which consumes 8 bytes). It also seems like sorting these numbers would be much improved as you'd need only sort on the smallint field first followed by the real field second.
You and/or your colleague seem to be confused about what numbers can be represented using the floating point formats.
A double precision (aka float) number can store at least 15 significant digits, in the range from about 1e-307 to 1e+308. You have to think of it as scientific notation. Remove all the zeroes and move that to the exponent. If what you have, once in scientific notation, has no more than 15 digits and an exponent between -307 and +308, it can be stored as is.
That means that 0.000000000000000000000000000000000000000123456 can definitely be stored as a double precision, and you'll keep all the significant digits (123456). No need to round that to 0.000000000000001 or anything like that.
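You can convince yourself of this outside the database. The sketch below uses Python, on the assumption that its float is the same IEEE 754 double that postgres uses for double precision. It parses the number from the question and measures how far the stored value is from the decimal text.

```python
from decimal import Decimal

text = "0.000000000000000000000000000000000000000123456"

as_double = float(text)
# Decimal(as_double) is the exact value held in the double.
relative_error = abs(Decimal(as_double) - Decimal(text)) / Decimal(text)

print(f"{as_double:.5e}")        # scientific notation, all six digits intact
print(float(relative_error))     # tiny: far below one part in 10^15
```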
Floating point numbers have a well-known issue with the exact representation of decimal numbers (fractions that terminate in base 10 do not necessarily terminate in base 2), but that's probably not an issue for you (it's an issue only if you need to be able to do exact comparisons on such numbers).
What are the advantages and disadvantages of storing these numbers in two fields like this
You'll have to manage 2 columns instead of one.
Roughly, what you'll be doing is saving space by storing lower-precision floats. If you only need 4 digits of precision, you can go further and save 2 more bytes by using smallint + smallint (1000-9999 + exponent). Using that format, you could cram the two smallints into one 32-bit int (exponent*2^16 + mantissa); that should work too.
That's assuming that you need to save storage space and/or need to go beyond the roughly +/-308 decimal exponent limit of the double precision float. If that's not the case, the standard format is fine.
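Here is a minimal sketch of that packing, written in Python rather than SQL. The function names pack, unpack and to_float are made up for illustration, and it assumes positive values with the mantissa stored as a four digit integer (1234 meaning 1.234).

```python
def pack(mantissa, exponent):
    """mantissa: 1000..9999, exponent: -32768..32767 -> signed 32-bit integer."""
    return exponent * 65536 + mantissa


def unpack(packed):
    # Floor division recovers the exponent even when it is negative.
    exponent, mantissa = divmod(packed, 65536)
    return mantissa, exponent


def to_float(packed):
    mantissa, exponent = unpack(packed)
    # 1234 with exponent -40 means 1.234e-40, i.e. 1234e-43.
    return float(f"{mantissa}e{exponent - 3}")


p = pack(1234, -40)
print(unpack(p), to_float(p))
# Packed values sort in the same order as the numbers they represent.
print(pack(9999, -41) < pack(1000, -40))   # True
```

Because the exponent sits in the high bits, a plain integer sort orders the values correctly, which covers the sorting concern in the question with a single column.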
I am having an issue with rounding a float in an iPhone application.
float f = 4.845;
float s = roundf(f * 100.0) / 100;
NSLog(@"Output-1: %.2f", s);
s = roundf(484.5) / 100;
NSLog(@"Output-2: %.2f", s);
Output-1: 4.84
Output-2: 4.85
Let me know what the problem is here and how to solve it.
The problem is that you don't yet realise one of the inherent problems with floating point: the fact that most numbers cannot be represented exactly (a).
This means that 4.845 is likely to be, in reality, something like 4.8449999999999 which, when you round it, gives you 4.84 rather than what you expect, 4.85.
And what value you end up with also depends on how you calculate it, which is why you're getting a different result.
And, of course, no floating point "inaccuracy" answer would be complete on SO without the authoritative What Every Computer Scientist Should Know About Floating-Point Arithmetic.
(a) Only sums of exact powers of two, within a certain similar range, can be exactly rendered in IEEE754. So, for example, 484.5 is
256 + 128 + 64 + 32 + 4 + 0.5 (2^8 + 2^7 + 2^6 + 2^5 + 2^2 + 2^-1).
See this answer for a more detailed look into the IEEE754 format.
As to solving it, you have a few choices. One is to use double instead of float. That gives you more precision and greater range of numbers but only moves the problem further away rather than really solving it. Since 0.1 is a repeating fraction in binary, which is what IEEE754 uses, no amount of bits (short of infinity) can exactly represent it.
Another choice is to use a custom library like a big decimal type, which can represent decimals of arbitrary precision (that's not infinite precision as some people are wont to suggest, since it's limited by memory). This will reduce the errors caused by the binary/decimal mismatch.
You may also want to look into NSDecimalNumber - this doesn't give you arbitrary precision but it does give a large range with accurate decimal representation.
There'll still be numbers you can't represent, like PI or the square root of 2 or any other irrational number, but it should cover most cases. If you really need to handle those other values, you need to switch to symbolic numeric representations.
Unlike 484.5 which can be represented exactly as a float* , 4.845 is represented as 4.8449998 (see this calculator if you wish to try other numbers). Multiplying by one hundred keeps the number at 484.49998, which correctly rounds to 484.
* An exact representation is possible because its fractional part 0.5 is a power of two (i.e. 2^-1).
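If you want to reproduce this without a device, here is a sketch in Python rather than Objective-C. It uses the struct module to force values through single precision. The helper names to_float32 and round_half_away are invented for this example, and round_half_away only imitates roundf for the non-negative values used here.

```python
import math
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_half_away(x):
    """Round half up; matches roundf for the non-negative inputs below."""
    return math.floor(x + 0.5)


f = to_float32(4.845)                # what `float f = 4.845;` actually stores
product = to_float32(f * 100.0)      # the value handed to roundf
print(f < 4.845, round_half_away(product) / 100)                  # True 4.84
print(to_float32(484.5) == 484.5, round_half_away(484.5) / 100)   # True 4.85
```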
I have following code:
float totalSpent;
int intBudget;
float moneyLeft;
totalSpent += Amount;
moneyLeft = intBudget - totalSpent;
And this is how it looks in debugger: http://www.braginski.com/math.tiff
Why is moneyLeft, as calculated by the code above, .02 different from the same expression evaluated by the debugger?
The expression window is correct, yet the code above produces a result that is wrong by .02. It only happens for very large numbers (yet way below the int limit).
Thanks
A single-precision float has 24 bits of precision (23 stored explicitly plus one implicit leading bit). That means that every calculation is rounded to 24 binary digits. This means that if you have a computation that, say, adds a very small number to a very large number, rounding may give strange results.
Imagine that you are doing math in scientific notation decimal by hand, under the rule that you may only have four significant figures. Let's say I ask you to write twelve in scientific notation, with four significant figures. Remembering junior high school, you write:
1.200 × 10^1
Now I say compute the square of 12, and then add 0.5. That is easy enough:
1.440×10^2 + 0.005×10^2 = 1.445×10^2
How about twelve cubed plus 0.75:
1.728×10^3 + 0.00075×10^3 = 1.72875×10^3
But remember, I only gave you room for four significant digits, so you must round; then we get:
1.728×10^3 + 7.5×10^-1 = 1.729×10^3
See? The lack of precision can make the computation come out with unexpected results.
In your example, you've got 999999 in a calculation where you're trying to be precise to 0.01. log2(999999) = 19.93 and log2(0.01) = -6.64. The difference is about 26.6, which is more than the 24 bits of precision a single-precision float provides (23 stored plus one implicit), so a float cannot perform this calculation accurately.
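To see how coarse a float is at that magnitude, here is a sketch in Python rather than Objective-C, using the struct module to emulate single precision. The helper names to_float32 and next_float32_up are made up for illustration. It measures the gap between 999999 and the next representable float, then tries to subtract one cent.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def next_float32_up(x):
    """Next representable single-precision value above a positive float32 x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]


big = to_float32(999999.0)
gap = next_float32_up(big) - big
print(gap)                             # 0.0625
print(to_float32(999999.0 - 0.01))     # 999999.0, the cent is lost
```

Neighbouring floats near 999999 are more than six cents apart, so errors of a cent or two are to be expected.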
Because floating point mathematics rounds-off precision by its very nature, it is usually a bad choice for currency computation, where you must be accurate to the last cent. But are you really concerned with fractions of a cent in your application? If not, then why not do away with the decimal point altogether, and simply store cents (instead of dollars) in a 64-bit integer? 2^64¢ is more than the GDP of the entire planet.
Floating point will always produce strange results with money type calculations.
The golden rule is that floating point is good for things you measure (litres, yards, light-years, bushels, etc.) but not for things you count (sheep, beans, buttons, etc.).
Most money calculations are to do with counting pennies, so use integer math and you won't get the strange results. Either use a fixed decimal arithmetic library (which would probably be overkill on an iPhone) or store your amounts as whole numbers of cents and only convert to $ and cents on display.
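A minimal sketch of the whole-cents approach, in Python rather than Objective-C. The function name format_cents and the sample amounts are hypothetical. All arithmetic is done on integers, and the conversion to dollars happens only when formatting for display.

```python
def format_cents(cents):
    """Render an integer number of cents as dollars and cents for display."""
    sign = "-" if cents < 0 else ""
    dollars, remainder = divmod(abs(cents), 100)
    return f"{sign}${dollars}.{remainder:02d}"


budget_cents = 99999900            # $999999.00
spent_cents = [1, 1999, 250]       # made-up purchases, in cents

money_left = budget_cents - sum(spent_cents)
print(format_cents(money_left))    # $999976.50
```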