I'm running the following code in Emacs Lisp Interaction mode:
(defun square (x) (* x x))
(square (square (square 1001)))
which is giving me 1114476179152563777. However, ((1001^2)^2)^2 is actually 1008028056070056028008001.
How is this possible?
@Barmar's answer is accurate for Emacs versions < 27.
In Emacs 27 bignum support has been added. NEWS says:
** Emacs Lisp integers can now be of arbitrary size.
Emacs uses the GNU Multiple Precision (GMP) library to support
integers whose size is too large to support natively. The integers
supported natively are known as "fixnums", while the larger ones are
"bignums". The new predicates 'bignump' and 'fixnump' can be used to
distinguish between these two types of integers.
All the arithmetic, comparison, and logical (a.k.a. "bitwise")
operations where bignums make sense now support both fixnums and
bignums. However, note that unlike fixnums, bignums will not compare
equal with 'eq', you must use 'eql' instead. (Numerical comparison
with '=' works on both, of course.)
Since large bignums consume a lot of memory, Emacs limits the size of
the largest bignum a Lisp program is allowed to create. The
nonnegative value of the new variable 'integer-width' specifies the
maximum number of bits allowed in a bignum. Emacs signals an integer
overflow error if this limit is exceeded.
Several primitive functions formerly returned floats or lists of
integers to represent integers that did not fit into fixnums. These
functions now simply return integers instead. Affected functions
include functions like 'encode-char' that compute code-points, functions
like 'file-attributes' that compute file sizes and other attributes,
functions like 'process-id' that compute process IDs, and functions like
'user-uid' and 'group-gid' that compute user and group IDs.
and indeed using my 27.0.50 build:
(defun square (x) (* x x))
square
(square (square (square 1001)))
1008028056070056028008001
Emacs Lisp doesn't implement bignums; it uses the machine's integer type. The range of integers it supports is between most-negative-fixnum and most-positive-fixnum. On a 64-bit system, most-positive-fixnum will be 2^61 - 1, which has about 19 decimal digits.
See Integer Basics in the Elisp manual.
The correct result of your calculation is 25 digits long, which is much larger than this. The calculation overflows and wraps around. It should be correct modulo 2^62.
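You can verify the wraparound from any Lisp with bignums; here is a minimal Common Lisp sketch (the same arithmetic works in Emacs 27+) reducing the true value modulo 2^62:
(mod (expt 1001 8) (expt 2 62))
;; => 1114476179152563777, the wrapped value observed in the question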
You could use floating point instead. It has a much larger range, although very large numbers lose precision.
(square (square (square 1001.0)))
1.008028056070056e+24
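To see the precision loss at this magnitude, a minimal Common Lisp sketch (a double carries only about 16 significant decimal digits, so a 25-digit integer cannot survive the round trip):
(= (expt 1001 8)
   (round (coerce (expt 1001 8) 'double-float)))
;; => NIL: the float is close to 1001^8, but not equal to it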
I am trying to retrieve the minimum integer value in Lisp. I found the
most-negative-fixnum
variable that should represent the lowest possible number. Whatever I try doing with it throws the error
Variable `MOST-NEGATIVE-FIXNUM' is unbound.
Is there a specific way to get the value of variables in Lisp? My research on this has turned up nothing.
Thanks
It should work as follows, using for example SBCL as an implementation of Common Lisp:
CL-USER> (lisp-implementation-type)
"SBCL"
CL-USER> (lisp-implementation-version)
"1.3.12.51-868cff4"
CL-USER> most-negative-fixnum
-4611686018427387904
[...] variable that should represent the lowest possible number
That would be the lowest possible fixnum. You have big numbers too:
CL-USER> (* 10000 most-negative-fixnum)
-46116860184273879040000
Since Common Lisp has bignums as part of its specification, there is no limit to how low a numeric value you can have, from the language's point of view.
Since machines have finite memory, there is a practical limit to how low a number your machine can represent. It's not possible for an implementation to know this number or show it to you without actually trying to construct it, though the implementation may have some idea of the overhead involved. Ignoring overhead, you can use your available memory in bytes as an estimate, at 8 bits per byte. E.g., my machine has about 11 GB available, so I guess I can use 10 GB for the actual storage plus sign, i.e., 80G bits for the number itself: -2^(80G) + 1, or roughly -10^26000000000. You won't be able to print it, since you have no memory left.
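As a rough sanity check on that digit count, a minimal Common Lisp sketch (assuming 80 gibibits of storage, and using log base 10 to convert bits to decimal digits):
(round (* 80 (expt 2 30) (log 2 10)))
;; => roughly 26 billion decimal digits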
I am reading this in a Lisp textbook:
Lisp can perform some amazing feats with numbers, especially when compared with most other languages. For instance, here we’re using the function expt to calculate the fifty-third power of 53:
CL> (expt 53 53)
24356848165022712132477606520104725518533453128685640844505130879576720609150223301256150373
Most languages would choke on a calculation involving such a large number.
Yes, that's cool, but the author fails to explain why Lisp can do this more easily and more quickly than other languages.
Surely there is a simple reason, can anyone explain?
This is a good example that "worse is not always better".
The New Jersey approach
The "traditional" languages, like C/C++/Java, have limited range integer arithmetics based on the hardware capabilities, e.g., int32_t - signed 32-bit numbers which silently overflow when the result does not fit into 32 bits. This is very fast and often seems good enough for practical purposes, but causes subtle hard to find bugs.
The MIT/Stanford style
Lisp took a different approach.
It has a "small" unboxed integer type fixnum, and when the result of fixnum arithmetics does not fit into a fixnum, it is automatically and transparently promoted to an arbitrary size bignum, so you always get mathematically correct results. This means that, unless the compiler can prove that the result is a fixnum, it has to add code which will check whether a bignum has to be allocated. This, actually, should have a 0 cost on modern architecture, but was a non-trivial decision when it was made 4+ decades ago.
The "traditional" languages, when they offer bignum arithmetics, do that in a "library" way, i.e.,
it has to be explicitly requested by the user;
the bignums are operated upon in a clumsy way: a.add(b) instead of a + b;
the cost is incurred even when the actual number is small and would fit into a machine int.
Note that the Lisp approach is quite in line with the Lisp tradition of doing the right thing at the possible cost of some extra complexity. The same is manifested in automated memory management, which is now mainstream but was viciously attacked in the past. The Lisp approach to integer arithmetic has since been adopted by other languages (e.g., Python), so progress is happening!
To add to what wvxvw wrote, it's easier in Lisp because bignums are built into the language. You can juggle large numbers around just as you would, say, ints in C or Java.
I'm trying to figure out the best programming language for an analytical model I'm building. Primary consideration is speed at which it will run FOR loops.
Some detail:
The model needs to perform numerous (~30 per entry, over 12 cycles) operations on a set of elements from an array -- there are ~300k rows, and ~150 columns in the array. Most of these operations are logical in nature, e.g., if place(i) = 1, then j(i) = 2.
I've built an earlier version of this model using Octave -- a run takes ~55 hours on an Amazon EC2 m2.xlarge instance (and uses ~10 GB of memory, but I'm perfectly happy to throw more memory at it). Octave/Matlab won't do elementwise logical operations, so a large number of for loops are needed -- I'm relatively certain I've vectorized as much as possible; the loops that remain are necessary. I've gotten octave-multicore to work with this code, which gives some improvement (~30% run-time reduction on 8 EC2 cores), but it ends up being unstable with file locking, etc.
I'm really looking for a step change in run-time -- I know from looking at some benchmarks that actually using Matlab might get me as much as a 50% improvement, but that is cost-prohibitive. The original plan was to run a Monte Carlo simulation with this model, but at 55 hours a run that's completely impractical.
The next version of this is going to be a complete rebuild from the ground up (for IP reasons I won't get into, if nothing else), so I'm completely open to any programming language. I'm most familiar with Octave/Matlab, but have dabbled in R, C, C++, Java. I'm also proficient w/ SQL if the solution involves storing the data in a database. I'll learn any language for this -- this isn't complicated functionality we're looking for, no interfacing with other programs, etc., so I'm not too concerned about the learning curve.
So with all that said, what's the fastest programming language specifically for FOR loops? From a search of SO and Google, Fortran and C bubble to the top, but looking for some more advice before diving in to one or the other.
Thanks!
A for loop looks no more complex than this by the time it hits the CPU:
for(int i = 0; i != 1024; i++) translates to
mov r0, 0 ;; start the counter
top:
;; some processing
add r0, r0, 1 ;; increment the counter by 1
jne top: r0, 1024 ;; jump back to the loop top if we haven't hit the limit of the for loop (1024 iterations)
;; continue on
As you can tell, this is sufficiently simple that you can't really optimize it very well[1]... Refocus your efforts at the algorithm level.
The first cut at the problem is to look at cache locality. Look up the classic example of matrix multiplication and swapping the i and j indexes.
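To illustrate the locality point, a minimal Common Lisp sketch (sum-elements is an illustrative name; CL arrays are stored in row-major order, so the inner loop should vary the last index):
(defun sum-elements (a)
  "Sum a 2D array of numbers in cache-friendly row-major order."
  (destructuring-bind (rows cols) (array-dimensions a)
    (let ((sum 0.0d0))
      (dotimes (i rows sum)
        (dotimes (j cols)            ; j innermost: consecutive memory accesses
          (incf sum (aref a i j)))))))
Swapping the i and j loops would touch memory with a stride of cols elements instead, which is exactly the cache-unfriendly order the matrix-multiplication example warns about.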
edit: As a second cut, I would suggest evaluating the algorithm for data-dependencies between iterations and data dependency between localities in your 'matrix' of data. It may be a good candidate for parallelization.
[1] There are some micro-optimizations possible, but those will not produce the speedups you're looking for.
~300k * ~150 * ~30 * ~12 = ~16G iterations, right?
This number of primitive operations should complete in a matter of minutes (if not seconds) in any compiled language on any decent CPU.
Fortran, C/C++ should do it almost equally well. Even managed languages, such as Java and C#, would only fall behind by a small margin (if at all).
If you have a problem of ~16G iterations running 55 hours, this means that they are very far from being primitive (80k per second? this is ridiculous), so maybe we should know more. (As was already suggested: is disk access limiting performance? Is it network access?)
As @Rotsor said, 16G operations / 55 hours is about 80,000 operations per second, or one operation every 12.5 microseconds. That's a lot of time per operation.
That means your loops are not the cause of poor performance, it's what's in the innermost loop that's taking the time. And Octave is an interpreted language. That alone means an order of magnitude slowdown.
If you want speed, you first need to be in a compiled language. Then you need to do performance tuning (aka profiling), or just single-step it in a debugger at the instruction level. That will tell you what it is actually doing in its heart of hearts. Once you've got it to where it's not wasting cycles, fancier hardware, cores, CUDA, etc. will give you a parallelism speedup. But it's silly to do that if your code is taking unnecessarily many cycles. (Here's an example of performance tuning - a 43x speedup just by trimming the fat.)
I can't believe the number of responders talking about matlab, APL, and other vectorized languages. Those are interpreters. They give you concise source code, which is not at all the same thing as fast execution. When it comes down to the bare metal, they are stuck with the same hardware as every other language.
Added: to show you what I mean, I just ran this C++ code, which does 16G operations, on this moldy old laptop, and it took 94 seconds, or about 6ns per iteration. (I can't believe you baby-sat that thing for 2 whole days.)
void doit(){
    double sum = 0;
    // 1000 * 16,000,000 = 16G iterations of one multiply and one add each
    for (int i = 0; i < 1000; i++){
        for (int j = 0; j < 16000000; j++){
            sum += j * 3.1415926;
        }
    }
}
In terms of absolute speed, probably Fortran, followed by C, followed by C++. In practical application, well-written code in any of the three, compiled with a decent compiler, should be plenty fast.
Edit: generally you are going to see much better performance with any kind of looped or branching (e.g., if statements) code in a compiled language versus an interpreted language.
To give an example, on a recent project I was working on, the data sizes and operations were about 3/4 the size of what you're talking about here, but like your code, had very little room for vectorization, and required significant looping. After converting the code from matlab to C++, runtimes went from 16-18 hours, down to around 25 minutes.
For what you're discussing, Fortran is probably your first choice. The closest second place is probably C++. Some C++ libraries use "expression templates" to gain some speed over C for this kind of task. It's not entirely certain those will do you any good, but C++ can be at least as fast as C, and possibly somewhat faster.
At least in theory, there's no reason Ada couldn't be competitive as well, but it's been so long since I used it for anything like this that I hesitate to recommend it -- not because it isn't good, but because I just haven't kept track of current Ada compilers well enough to comment on them intelligently.
Any compiled language should perform the loop itself on roughly equal terms.
If you can formulate your problem in its terms, you might want to look at CUDA or OpenCL and run your matrix code on the GPU - though this is less good for code with lots of conditionals.
If you want to stay on conventional CPUs, you may be able to formulate your problem in terms of SSE scatter/gather and bitmask operations.
Probably the assembly language for whatever your platform is. But compilers (especially special-purpose ones that specifically target a single platform (e.g., Analog Devices or TI DSPs)) are often as good as or better than humans. Also, compilers often know about tricks that you don't. For example, the aforementioned DSPs support zero-overhead loops and the compiler will know how to optimize code to use those loops.
Matlab will do element-wise logical operations and they are generally quite fast.
Here is a quick example on my computer (AMD Athlon 2.3GHz w/ 3GB):
>> d=rand(300000,150);
>> d=floor(d*10);
>> numel(d(d==1))
ans =
4501524
>> tic;d(d==1)=10;toc;
Elapsed time is 0.754711 seconds.
>> numel(d(d==1))
ans =
0
>> numel(d(d==10))
ans =
4501524
In general I've found matlab's operators are very speedy, you just have to find ways to express your algorithms directly in terms of matrix operators.
C++ is not fast when doing matrixy things with for loops. C is, in fact, specifically bad at it. See Good Math, Bad Math.
I hear that C99 has restrict pointers that help, but don't know much about them.
Fortran is still the go-to language for numerical computing.
How is the data stored? Your execution time is probably affected more by I/O (especially disk or, worse, network) than by your language.
Assuming operations on rows are orthogonal, I would go with C# and use PLINQ to exploit all the parallelism I could.
Might you not be best with a hand-coded assembler insert? Assuming, of course, that you don't need portability.
That and an optimized algorithm should help (and perhaps restructuring the data?).
You might also want to try multiple algorithms and profile them.
APL.
Even though it's interpreted, its primitive operators all operate on arrays natively, therefore you rarely need any explicit loops. You write the same code, whether the data is scalar or array, and the interpreter takes care of any looping needed internally, and thus with the minimum overhead - the loops themselves are in a compiled language, and will have been heavily optimised for the specific architecture of the CPU it's running on.
Here's an example of the simplicity of array handling in APL:
A ← 2 3 4 5 6 8 10
((2|A)/A) ← 0
A
2 0 4 0 6 8 10
The first line sets A to a vector of numbers.
The second line replaces all the odd numbers in the vector with zeroes.
The third line queries the new values of A, and the fourth line is the resulting output.
Note that no explicit looping was required, as scalar operators such as '|' (remainder) automatically extend to arrays as required. APL also has built-in primitives for searching and sorting, which will probably be faster than writing your own loops for these operations.
Wikipedia has a good article on APL, which also provides links to suppliers such as IBM and Dyalog.
Any modern compiled or JITted language is going to render down to pretty much the same machine code, giving a loop overhead of 10 nanoseconds or less per iteration on modern processors.
Quoting @Rotsor:
If you have a problem of ~16G iterations running 55 hours, this means that they are very far from being primitive (80k per second? this is ridiculous), so maybe we should know more.
80k operations per second is around 12.5 microseconds each - a factor of 1000 greater than the loop overhead you'd expect.
Assuming your 55 hour runtime is single threaded, and if your per item operations are as simple as suggested, you should be able to (conservatively) achieve a speedup of 100x and cut it down to under an hour very easily.
If you want to run faster still, you'll want to look at writing a multi-threaded solution, in which case a language that provides good support would be essential. I'd lean towards PLINQ and C# 4.0, but that's because I already know C# - YMMV.
What about a lazily evaluated language like Clojure? It is a Lisp, so like most Lisp dialects it lacks a for loop, but it has many other forms that operate more idiomatically for list processing. It might help with your scaling issues as well, because operations are thread-safe, and because the language is functional there are fewer side effects. If you wanted to flag all the items in the list that were "i" values to operate on, you might do something like this:
(def mylist ["i" "j" "i" "i" "j" "i"])
(map #(= "i" %) mylist)
result
(true false true true false true)
I was reading about the subset-sums problem when I came up with what appears to be a general-purpose algorithm for solving it:
(defun subset-contains-sum (set sum)
  (let ((subsets) (new-subset) (new-sum))
    (dolist (element set)
      (dolist (subset-sum subsets)
        (setf new-subset (cons element (car subset-sum)))
        (setf new-sum (+ element (cdr subset-sum)))
        (if (= new-sum sum)
            (return-from subset-contains-sum new-subset))
        (setf subsets (cons (cons new-subset new-sum) subsets)))
      (setf subsets (cons (cons element element) subsets)))))
"set" is a list not containing duplicates and "sum" is the sum to search subsets for. "subsets" is a list of cons cells where the "car" is a subset list and the "cdr" is the sum of that subset. New subsets are created from old ones in O(1) time by just cons'ing the element to the front.
I am not sure what its runtime complexity is, but it appears that with each element "set" grows by, the size of "subsets" doubles, plus one, so it appears to me to be at least quadratic.
I am posting this because my impression before was that NP-complete problems tend to be intractable and that the best one can usually hope for is a heuristic, but this appears to be a general-purpose solution that will, assuming you have the CPU cycles, always give you the correct answer. How many other NP-complete problems can be solved like this one?
NP-complete problems are solvable, just not in polynomial time (as far as we know). That is, an NP-complete problem may have an O(n*2^n) algorithm that could solve it, but it won't have, for example, an O(n^3) algorithm to solve it.
Interestingly, if a quick (polynomial) algorithm was found for any NP-complete problem, then every problem in NP could be solved in polynomial time. This is what P=NP is about.
If I understand your algorithm correctly (and this is based more on your comments than on the code), then it is equivalent to the O(n*2^n) algorithm here. There are 2^n subsets, and since you also need to sum each subset, the algorithm is O(n*2^n).
One more thing about complexity - the O(whatever) only indicates how well a particular algorithm scales. You cannot compare two algorithms and say that one is faster than the other based on this. Big-O notation doesn't care about implementation details and optimisations - it is possible to write two implementations of the same algorithm with one being much faster than the other, even though they might both be O(n^2). One woman making babies is an O(n) operation, but the chances are that this is going to take a lot longer than most O(n*log(n)) sorts you perform. All you can say based on this is that sorting will be slower for very large values of n.
All of the NP-complete problems have solutions. As long as you're willing to spend the time to compute the answer, that is. Just because there's no known efficient algorithm doesn't mean there isn't one at all. For example, you could just iterate over every potential solution, and you'll eventually get one. These problems are used all over the place in real-world computing. You just need to be careful about how big a problem you set for yourself if you're going to need exponential time (or worse!) to solve it.
I am not sure what its runtime complexity is, but it appears that with each element "set" grows by, the size of "subsets" doubles, plus one, so it appears to me to be at least quadratic.
If the run-time doubles for each increase in N, you're looking at an O(2^N) algorithm. That's also what I'd expect from visiting all subsets of a set (or all members of the powerset of a set), as that's exactly 2^N members (if you include the empty set).
The fact that adding or not adding an element to all hitherto-seen sets is fast doesn't mean that the total processing is fast.
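To make the counting argument concrete, here is a minimal Common Lisp sketch (powerset is an illustrative helper, not code from the question):
(defun powerset (set)
  "Return a list of all subsets of SET."
  (if (null set)
      (list '())
      (let ((rest (powerset (cdr set))))
        (append rest
                (mapcar (lambda (s) (cons (car set) s)) rest)))))
(length (powerset '(1 2 3 4 5))) ; => 32 = 2^5, including the empty set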
What is going on here could be expressed much more simply using recursion:
(defun subset-sum (set sum &optional subset)
  (when set
    (destructuring-bind (head . tail) set
      (or (and (= head sum) (cons head subset))
          (subset-sum tail sum subset)
          (subset-sum tail (- sum head) (cons head subset))))))
The two recursive calls at the end clearly show we are traversing a binary tree of depth n, the size of the given set. The number of nodes in the binary tree is O(2^n), as expected.
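For example, assuming the definition above:
(subset-sum '(1 5 2 9) 11) ; => (9 2), since 9 + 2 = 11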
It's Karp-reducible to a decision problem that runs in pseudo-polynomial time O(nM). Using a heap or binary search, the upper bound is log(M * 2^M) = log M + log(2^M) = log M + M log 2. Ergo, time: O(nM).
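For reference, here is a minimal Common Lisp sketch of that O(nM) dynamic program for the decision version, assuming positive integers and target M (subset-sum-p is an illustrative name, not code from the thread):
(defun subset-sum-p (set m)
  "Decide whether some subset of SET (positive integers) sums to M."
  (let ((reachable (make-array (1+ m) :element-type 'bit :initial-element 0)))
    (setf (sbit reachable 0) 1)      ; the empty subset sums to 0
    (dolist (x set (= 1 (sbit reachable m)))
      (loop for s from m downto x    ; go downward so each X is used at most once
            when (= 1 (sbit reachable (- s x)))
              do (setf (sbit reachable s) 1)))))
For example, (subset-sum-p '(1 5 2 9) 11) returns T, while (subset-sum-p '(1 5 2 9) 4) returns NIL.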