How probable is it that two identical calculations give different results?

I am currently working on remaking an old invoicing program that was originally written in VB6.
It has two parts: one on an Android tablet, the other on a PC. The old database stored derived values because there was a chance that the calculations would be incorrect if repeated.
For example, if one sold 5 items whose price was 10 euros, at a 10% discount and a tax rate of 23%, it would store those 4 values but also the result of the calculation (5 * (10 * 1.23)) * 0.9.
I do not really like having duplicate or derivable information in my database, but the actual sale value must be the same whether it is viewed on a tablet or a PC.
So my question is: is there a chance (even the slightest one) that the above calculation (to three decimal places of precision) would give different results on different operating systems (such as an Android device and a desktop computer)?
Thanks in advance for any help you can provide

Yes, it's possible. Floating-point arithmetic is always subject to rounding errors and different languages (and architectures) deal with those errors in different ways. There are best practices in dealing with these issues, though I don't consider myself knowledgeable enough to speak to them. But here are a couple of options for you.
Use a data type meant for exact decimal arithmetic. For example, VB6 has Single and Double types for floating point, but also a Currency type for accurate decimal math.
Scale your floating-point values to integers and perform your calculations on these integer values. You can even store the results as integers in your DB. The ERP system we use does this and includes a data dictionary that defines how each type was scaled so that it can be "unscaled" before display.
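For illustration, here is a minimal sketch of the scaled-integer idea in Java (my own example, not from the ERP system mentioned above): the price is held in cents and the tax and discount factors as per-mille integers, so every platform produces exactly the same total.

// Sketch of scaled-integer money math (hypothetical names).
// All values are integers, so the result is identical on every platform.
public class ScaledMoney {
    public static void main(String[] args) {
        long unitPriceCents   = 1000;  // 10.00 EUR in cents
        long qty              = 5;
        long taxPermille      = 1230;  // * 1.23  -> 1230 / 1000
        long discountPermille = 900;   // * 0.90  ->  900 / 1000

        // (5 * (10 * 1.23)) * 0.9, done entirely in integer arithmetic:
        // multiply first, divide last, round half up at the final step.
        long scaled     = qty * unitPriceCents * taxPermille * discountPermille;
        long totalCents = (scaled + 500_000) / 1_000_000;

        System.out.println(totalCents);  // 5535 -> 55.35 EUR everywhere
    }
}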
Hope that helps.

Related

What is the optimal method for calculating very, very large numbers, namely in the context of an exponential game?

Right now, my current method for updating very large numbers is as follows:
Keep track of the health of a given monster (the one being attacked) with both a double to keep track of the significant digits and an int to keep track of the power (exponent).
If the monster is attacked, only update those small double / exponent values.
By following the above, I am only ever dealing with calculations between small doubles between 0 and 10. However, I feel as though this is extremely complicated in comparison to simply using BigIntegers.
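To make that concrete, here is roughly what I mean, written as a simplified Java sketch (hypothetical names, no edge-case handling): the value is stored as significand * 10^exponent and kept normalized.

// Simplified significand/exponent "big number" sketch (illustrative only).
final class BigNum {
    final double significand;  // kept in [1, 10)
    final int exponent;        // value = significand * 10^exponent

    BigNum(double significand, int exponent) {
        // Normalize so the significand stays between 1 and 10.
        while (significand >= 10) { significand /= 10; exponent++; }
        while (significand > 0 && significand < 1) { significand *= 10; exponent--; }
        this.significand = significand;
        this.exponent = exponent;
    }

    BigNum multiply(BigNum other) {
        return new BigNum(significand * other.significand, exponent + other.exponent);
    }

    BigNum subtract(BigNum other) {
        // Align exponents first; a much smaller term simply vanishes,
        // which is usually fine for game damage calculations.
        double aligned = other.significand * Math.pow(10, other.exponent - exponent);
        return new BigNum(significand - aligned, exponent);
    }

    @Override public String toString() { return significand + "e" + exponent; }
}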
I have worked in Java previously, and using BigInt resulted in a HUGE performance loss when the exponential numbers became massive (e.g., 10 ^ 20000 +), if I remember correctly.
In the interest of coding a game, and not having to re-evaluate formulas later, what is the best course of action?
Am I really saving that much performance by keeping calculations between small numbers? Or is there an implementation of BigDouble / BigInt for Swift that makes any performance gain from another implementation negligible? I am well versed in how to use BigDouble / BigInt, so I am really only concerned with the performance difference between using either of those and an implementation of big numbers that splits the value into a double (to represent the significant digits) and an int (to represent the exponent).
Thank you, and if there needs to be any clarification, I can provide it.

Progress 4GL automatic round-off in decimal calculations

I am calculating this value in Progress: 31500 * (10 / 100) * (1 / 12). It gives 262.49999, but the Windows scientific calculator gives 262.5... Why is there a difference between these two values?
1 / 12 = 0.083333333333333333....
Preserving all precision the correct answer is 262.499999999999999...
(You could work it out by hand if you don't trust the computers.)
Which is why many calculations, especially those involving money, round numbers to a certain precision.
Rounding is not the same as truncation. In a business context the 4gl ROUND( expression, precision ) function is usually what you want.
Occasionally you do actually need to TRUNCATE() -- but that is rare when dealing with money. More frequent is a need for CEILING() (aka "round up") which, sadly, the 4GL does not provide. Stackoverflow, however, has a solution to that ;) How to round up in progress programming
I have lost count of the number of times I have seen homemade rounding code and had to explain why it doesn't do what the author expects. They used to teach this stuff in school but I guess they don't do that anymore?
There are other rounding rules that are used in scientific applications. But since this is a progress-4gl tagged question I doubt that they are relevant.
It's possible that your calculator is rounding up. (Which is what @jensd's DISPLAY statement is doing. Try using MESSAGE.) It's also possible that your calculator is storing values to more than nine decimal places (which is what OpenEdge uses) or has a special rounding error protection algorithm (which OpenEdge does not).
Sometimes the problem is that ABL does integer arithmetic because the values in the expression are integers; but this does not appear to be the case here. This is just a calculation that does not come out cleanly in 9dp.
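The same effect is easy to reproduce outside the 4GL. Here is a small Java sketch (not Progress code; the fixed nine-decimal-place intermediate is my assumption for illustration) that mimics the truncated 1/12 and then rounds the result:

import java.math.BigDecimal;
import java.math.RoundingMode;

public class NineDecimalPlaces {
    public static void main(String[] args) {
        // Keep 1/12 to only nine decimal places, as the 4GL effectively does.
        BigDecimal oneTwelfth = BigDecimal.ONE.divide(new BigDecimal(12), 9, RoundingMode.DOWN);

        BigDecimal result = new BigDecimal(31500)
                .multiply(new BigDecimal("0.1"))
                .multiply(oneTwelfth);

        System.out.println(result);                                    // 262.4999989500
        System.out.println(result.setScale(2, RoundingMode.HALF_UP));  // 262.50
    }
}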

How can I decide what data type I must use in any programming language?

My English is not good, so I apologize for it.
I have a little experience with Java and C++, but there is a problem: I only use int for integer numbers and double for decimal numbers. There are many other types like float, long int, etc. Is there a specific way to decide which one I must use?
It purely depends on the size of the data and, of course, its type. For example, if you have a very large number that cannot fit within the size of a machine word (typically mapped to an int[eger] type), then you would choose long, and so forth.
For a small number I would go with char (since it occupies one byte in C/C++), or an unsigned short if the number is greater than 255 but no more than 65,535, etc.
And all of these again depend on the programming language.
Be sure to check your programming language reference for the limits.
Hope that helps.
Different numerical data types are used for different value ranges. What range applies to what data type depends on the language you are using and on the operating system where the program is compiled/run.
For example, a byte data type uses 1 byte of storage and can store numbers from 0 to 255. A word data type usually uses 2 bytes of storage and can store numbers from 0 to 65,535. Then you get int - here the number of bytes varies, but often it would be 4 bytes with values of -2^31 to 2^31-1 - and so on. In C/C++ there are also the qualifiers signed and unsigned, which are not present in Java.
With float/double, not only the range of numbers but also the precision (the number of decimal places that can be stored) will be one of the deciding factors. With double you can store a lot more decimal places than with float (single precision).
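For example, in Java the limits are fixed by the language rather than the operating system, and it's easy to print them and to see the float/double precision difference; a quick sketch:

public class TypeLimits {
    public static void main(String[] args) {
        // Integer ranges are fixed by the Java language specification.
        System.out.println(Byte.MIN_VALUE + " .. " + Byte.MAX_VALUE);       // -128 .. 127
        System.out.println(Short.MIN_VALUE + " .. " + Short.MAX_VALUE);     // -32768 .. 32767
        System.out.println(Integer.MIN_VALUE + " .. " + Integer.MAX_VALUE); // -2^31 .. 2^31 - 1
        System.out.println(Long.MIN_VALUE + " .. " + Long.MAX_VALUE);       // 64-bit range

        // Precision difference between float and double.
        System.out.println(0.1f + 0.2f);  // 0.3
        System.out.println(0.1 + 0.2);    // 0.30000000000000004
    }
}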
On the whole, the decision will be based on what data you need to store in it, how much memory you're willing to allocate and what platform you're running on. Check your language documentation for more details. For example, this page describes primitive data types in java.
You must first check what type of data you want to store against the data types provided by that programming language. Then, very importantly, you must check the range of that data type...

What's the biggest number in a computer?

Just asked by my 5 year old kid: what is the biggest number in the computer?
We are not talking about the max number for a specific data type, but the biggest number that a computer can represent.
Infinity is not allowed.
UPDATE: my kid always wants to print as well, so let's say the computer needs to print this number, and the kid needs to know that it's a big number. Of course, in practice we won't print it because there aren't enough trees.
This question is actually a very interesting one which mathematicians have devoted a fair bit of thought to. You can read about it in this article, which is a fascinating and accessible read.
Briefly, a guy named Tibor Rado set out to find some really big, but still well-defined, numbers by defining a sequence called the Busy Beaver numbers. He defined BB(n) to be the largest number of steps any Turing machine with n states can take before halting (machines that run forever don't count). Note that this sequence is by its very nature not computable, so the numbers themselves, while well-defined, are very difficult to pin down. Here are the first few:
BB(1) = 1
BB(2) = 6
BB(3) = 21
BB(4) = 107
... wait for it ...
BB(5) >= 8,690,333,381,690,951
No one is sure how big exactly BB(5) is, but it is finite. And no one has any idea how big BB(6) and above are. But at least these numbers are completely well-defined mathematically, unlike "the largest number any human has ever thought of, plus one." ;)
So how about this:
The biggest number a computer can represent is the most instructions a program small enough to fit in its available memory can perform before halting.
Squared.
No, wait, cubed. No, raised to the power of itself!
Dammit!
Bits are not numbers. You, as a programmer, give them the meaning you want, possibly numbers.
Now, I decide that 1 represents "the biggest number ever thought by a human plus one".
Errr this is a five year old?
How about something along the lines of: "I'd love to tell you but the number is so big and would take so long to say, I'd die before I finished telling you".
// wait and see
#include <stdio.h>

int main(void) {
    for (;;) {
        printf("9");
    }
}
roughly 2^AVAILABLE_MEMORY_IN_BITS
EDIT: The above is for actually storing a number and treats all media (RAM, HD, cloud etc.) as memory. Subtracting the OS footprint (measured in KB) doesn't make "roughly" less accurate...
If you want to "represent" a number in a meaningful way, then you probably want to go with what the CPU provides: unsigned 32-bit integers (a maximum of roughly 4 billion) or unsigned 64-bit integers for most computers your kid will come into contact with.
NOTE for talking to 5-year-olds: Often, they just want a factoid. Give him a really big and very accurate number (lots of digits), like 4'294'967'295. Then, once the glazing leaves his eyes, try to see how far you can get with explaining how computers represent numbers.
EDIT #2: I once read this article: Who Can Name the Bigger Number that should provide a whole lot of interesting information for your kid. Obviously he's not your normal five-year-old. So this might get you started in a cool direction about numbers and computation.
The answer to life (and this kid's question): 42
That depends on the datatype you use to represent it. The computer only stores bits (0/1). We, as developers, give the bits meaning. (65 can be a number or the letter A).
For example, I can define my datatype as 1 * 10^N, where N is unsigned and represented by an array of bits of arbitrary size. The next person can come up with 10 * 10^N, which would be ten times larger than my biggest number.
Sure, there would be gaps but if you don't need them, that doesn't matter.
Therefore, the question is meaningless since it doesn't have context.
Well, I had the same question earlier today, so I thought, why not write a little C++ program to see where the computer is going to stop...
But my laptop wasn't with me in class, so I used another one. The number got very big, but it never ends; I'll run it again overnight and then share the number.
You can try it; the code is naive:
#include <stdio.h>

int main(void) {
    /* i <= i is always true, so this counts upward "forever"
       (in practice until the int overflows, which is undefined behaviour). */
    for (int i = 0; i <= i; i++) {
        printf("%i\n", i);
    }
    return 0;
}
And let it run till it stops ^^
The size will obviously be limited by the total size of hard drives you manage to put into your PC. After all, you can store a number in a text file occupying all disk space.
You can have 4 x 2 TB drives even in a simple box, so around 8 TB available. If you store the number in binary, then the biggest number is about 2^64,000,000,000,000.
If your hard drive is 1 TB (8'000'000'000'000 bits), and you would print the number that fits on it on paper as hex digits (nobody would do that, but let's assume), that's 2,000,000,000,000 hex digits.
Each page would contain 4000 hex digits (40 x 100 digits). That's 500,000,000 pages.
Now stack the pages on top of each other (let's say each page is 0.004 inches / 0.1 mm thick); the stack would then be about 50 km (about 31 miles) tall.
I'll try to give a practical answer.
Common Lisp number crunching is particularly powerful. It has something called "bignums", which are integers that can be arbitrarily large, limited only by the amount of available memory.
See: http://en.wikibooks.org/wiki/Common_Lisp/Advanced_topics/Numbers#Fixnums_and_Bignums
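Java's BigInteger behaves the same way; as a quick illustration (my example, not from the linked wikibook):

import java.math.BigInteger;

public class Bignum {
    public static void main(String[] args) {
        // 2^100000 is far beyond any fixed-width integer type;
        // the only real limit is available memory.
        BigInteger big = BigInteger.valueOf(2).pow(100_000);
        System.out.println(big.bitLength());          // 100001
        System.out.println(big.toString().length());  // 30103 decimal digits
    }
}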
Don't know much about theory, but as far as I understood from your question, it is: what is the largest number that the computer can represent (and I add: in a reasonable time, and not printing "9" until the Earth is eaten by the Sun)? So I put my PC to make one simple calculation (in PHP or whatever language): echo pow(2,1023) - resulting in 8.9884656743116E+307. So I guess this is the largest number that my PC can calculate. On the other side, I think the representation of the largest negative number can be: -0,(0)1
LE: That computed value was obtained through PHP, but I tried to figure out what's the largest number that my Windows calculator can compute, and it is pow(2, 33219) = 8.2304951207588748764521361245002E+9999. Now I guess this is the largest number my PC can handle.
I think you should be very proud that your 5 year old is already asking questions like this.
And you should continue to encourage that! This is truly amazing! With that said, I would say that ruling out Infinity is thinking incorrectly about what numbers mean in computer memory, and I feel that way of thinking is a handicap.
Mathematicians will never be able to write out ALL the digits of pi or Euler's number, BUT we FULLY understand them.
Pi, as an example, is perfectly represented by this infinite series: Pi / 4 = 1 - 1/3 + 1/5 - 1/7 + 1/9 - ...
The fact that you literally can't go to infinity, or print every single digit in a console, means nothing.
You could have printed the symbol representing pi, thereby capturing the infinite series.
Computer Algebra Systems (CAS) represent numbers symbolically all the time. Pi, for instance, may be a symbolic object in memory: the binary in memory does not DIRECTLY represent the number; it represents a mathematical algorithm for producing the answer to arbitrary precision.
Then you do some math with it, transforming from one expression to the next.
At no point in time did we fail to represent the number COMPLETELY.
At the end, you can do 2 things with this:
A) Evaluate the expression, turning it into a number of some kind (or a matrix, or whatever). BUT this number could very well be an approximation (say, 20 digits of pi).
B) Keep it in its symbolic form for reference. Obviously we don't like staring at symbols forever, because eventually we need to turn the knobs on the apparatus.
NOTE: sometimes you can get an exact (non-irrational) number perfectly represented in memory (like the number 1) by taking limits or going to infinity. Not by literally having an infinite number in memory, but by representing it symbolically.
Just throw this into Wolfram Alpha: Limit[Exp[-x], x -> Infinity]. It gives you the number 0, which is EXACT.
In short:
It was the HUMAN need to have some binary in memory that DIRECTLY represented the number that caused the number to degrade. Symbolically, it was perfectly represented. You could design an algorithm that just continues to calculate the next digits of pi or Euler's number, giving you an arbitrary amount of precision (though this is obviously not practical).
I hope this was at least somewhat useful or interesting to you, even if you disagree =)
It depends on how much the computer can handle, although there are times when the computer can handle numbers greater than 2^(bits-1) - 1... For example:
My computer is 64-bit (2^63 - 1 = 9223372036854775807); however, the calculator that comes with it can handle numbers of up to 10^9999.
Many supercomputers can exceed these limits, and the one with the most memory (bits) might as well be the one holding the record (the current largest number that can be held by a computer).
Or, if it comes to actually seeing it on a screen, you can just write a program that keeps printing 9 without a line break, forming an ever-growing string of 9s. :P
Open Chrome, click the three dots at the top, go to More tools, then Developer tools, click the Console tab, and type Number.MAX_VALUE.

Uniquely identifying URLs with one 64-bit number

This is basically a math problem, but very programing related: if I have 1 billion strings containing URLs, and I take the first 64 bits of the MD5 hash of each of them, what kind of collision frequency should I expect?
How does the answer change if I only have 100 million URLs?
It seems to me that collisions will be extremely rare, but these things tend to be confusing.
Would I be better off using something other than MD5? Mind you, I'm not looking for security, just a good fast hash function. Also, native support in MySQL is nice.
EDIT: not quite a duplicate
If the first 64 bits of the MD5 constituted a hash with ideal distribution, the birthday paradox would still mean you'd start getting collisions around every 2^32 URLs. As a rough rule of thumb, the probability of a collision is on the order of the number of URLs divided by 4,294,967,296. See http://en.wikipedia.org/wiki/Birthday_paradox#Cast_as_a_collision_problem for details.
I wouldn't feel comfortable just throwing away half the bits in MD5; it would be better to XOR the high and low 64-bit words to give them a chance to mix. Then again, MD5 is by no means fast or secure, so I wouldn't bother with it at all. If you want blinding speed with good distribution, but no pretence of security, you could try the 64-bit versions of MurmurHash. See http://en.wikipedia.org/wiki/MurmurHash for details and code.
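For what it's worth, the XOR-fold is a one-liner; here is a small Java sketch of that idea (my illustration; a 64-bit MurmurHash would need a third-party library):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class FoldedMd5 {
    // Fold the 128-bit MD5 digest into 64 bits by XOR-ing its two halves,
    // rather than just discarding the low half.
    static long hash64(String url) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(url.getBytes(StandardCharsets.UTF_8));
        ByteBuffer buf = ByteBuffer.wrap(digest);
        return buf.getLong() ^ buf.getLong();
    }

    public static void main(String[] args) throws Exception {
        System.out.printf("%016x%n", hash64("http://example.com/"));
    }
}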
You have tagged this as "birthday-paradox", I think you know the answer already.
P(Collision) = 1 - (2^64)! / ((2^64)^n * (2^64 - n)!)
where n is 1 billion in your case.
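The factorial form is not practical to evaluate directly, so here is a quick Java sketch (mine, not part of the original answer) using the standard approximation P ~ 1 - e^(-n(n-1)/2^65):

public class BirthdayBound {
    // Approximate probability of at least one collision among n random 64-bit hashes.
    static double collisionProbability(double n) {
        double buckets = Math.pow(2, 64);
        return -Math.expm1(-n * (n - 1) / (2 * buckets));  // 1 - e^(-n(n-1)/2^65)
    }

    public static void main(String[] args) {
        System.out.println(collisionProbability(1e9));  // ~0.027   (1 billion URLs)
        System.out.println(collisionProbability(1e8));  // ~0.00027 (100 million URLs)
    }
}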
You will be a bit better off using something other than MD5, because MD5 has practical collision problems.
From what I see, you need a hash function with the following requirements,
Hash arbitrary length strings to a 64-bit value
Be good -- Avoid collisions
Not necessarily one-way (security not required)
Preferably fast -- which is a necessary characteristic for a non-security application
This hash function survey may be useful for drilling down to the function most suitable for you.
I will suggest trying out multiple functions from here and characterizing them for your likely input set (pick a few billion URLs that you think you will see).
You can actually generate another column like this test survey for your test URL list to characterize and select from the existing or any new hash functions (more rows in that table) that you might want to check. They have MSVC++ source code to start with (reference to ZIP link).
Changing the hash functions to suit your output width (64-bit) will give you a more accurate characterization for your application.
If you have 2^n hash possibilities, there's roughly a 50% chance of a collision once you have about 2^(n/2) items.
E.g., if your hash is 64 bits, you have 2^64 hash possibilities, and you'd have roughly a 50% chance of a collision once you have about 2^32 items in a collection.
Just by using a hash, there is always a chance of collisions. And you don't know beforehand whether collisions will happen once or twice, or even hundreds or thousands of times, in your list of URLs.
The probability is still just a probability. It's like throwing a die 10 or 100 times: what are the chances of getting all sixes? The probability says it is low, but it still can happen. Maybe even many times in a row...
So while the birthday paradox shows you how to calculate the probabilities, you still need to decide if collisions are acceptable or not.
...and collisions are acceptable, and hashes are still the right way to go; find a 64-bit hashing algorithm instead of relying on "half an MD5" having a good distribution. (Though it probably does...)