Perl has long been my scripting language of choice, but I've run into a horrible problem. By default there is no support for long (64-bit) integers. Most of the time an integer is just a string, and that works for things like seeking in huge files, but there are plenty of places where it doesn't, such as bitwise &, printf, pack, unpack, <<, and >>.
These do work in newer versions of Perl, but only if Perl was built with 64-bit integer support, which doesn't help if I want portable code that runs on Perls built without that option. And you don't always get control over the Perl on the systems your code runs on.
My question is do Python, PHP, and Ruby suffer from such a problem, or do they also depend on version and build options?
The size of fast, hardware-backed integers (assuming the language has them) will always depend on whatever integer sizes are available to the compiler that built the language interpreter (usually a C compiler).
If you need cross-platform / cross-version big integer support, the Perl pragma use bigint; will do the trick. If you need more control, bigint is a wrapper around the module Math::BigInt.
Within the scope where use bigint; is in effect, all integers are transparently upgraded to Math::BigInt numbers. Lastly, when using any sort of big-number library, be sure not to use tricks like 9**9**9 to get infinity, because you might be waiting a while :)
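Here is a minimal sketch of both approaches; the values in the comments are what a bigint-capable Perl should print (bigint has been a core pragma since 5.8, and Math::BigInt has been in core far longer):

use strict;
use warnings;

{
    use bigint;                   # lexically scoped, as described above
    my $big = 2**70;              # a Math::BigInt under the hood
    print $big + 1, "\n";         # 1180591620717411303425, exact
}

use Math::BigInt;                 # the underlying core module, for finer control
my $n = Math::BigInt->new('9223372036854775807');   # 2**63 - 1
print $n->badd(1)->bstr, "\n";                       # 9223372036854775808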
In Python, you never get overflows. Instead, Python switches the implementation of numbers it is using automatically: the basic implementation uses the platform's native ints, but long integers use an arbitrary-precision implementation. As a result, you never have to worry about your numbers becoming too large; Python just handles it naturally.
Tcl 8.5's long integer support is pretty good from a user perspective. Internally, it represents integers as whatever type is necessary to hold them (up to and including bignums), and things that consume integers will accept any of them (though they might impose their own limits; you don't really want to use a number that only fits in a bignum as a Unix file mode...).
The only time you really need to think about it at all is when you're going to/from some fixed-width binary format. That's reasonably obvious though (it's fixed width after all).
Excuse me sir, bigint and Math::BigInt are core modules. Just friggin' use one of them; it will work on any platform.
Related
Does the choice of programming language decide performance when everything is ultimately compiled down to 1's and 0's?
E.g.: printf (in C) vs. cout (in C++) vs. print (in Python)
Do all of the above compile to the same binary code?
I'd appreciate any help in understanding this relationship between programming languages and the hardware. Thanks in advance.
The choice of programming language can have many impacts: on the performance of your code, how portable it is, its compatibility, and, among other things, how easily the objective can be expressed in code. To answer your question directly, C and C++ would likely produce the 'same binary' when printing an output, if both were compiled for the same target environment. Python is different because it is an interpreted language: the code is read by a program written in code native to the architecture and acted upon accordingly. Python is something of an edge case in this regard because it is technically compiled at execution time (and can be compiled before distribution), but into an intermediate code, similar in principle to Java bytecode, that is only understood by the Python interpreter.
The difference you bring up between lower-level languages like C and higher-level ones like Java, Python, and even JavaScript is whether their execution is done by the native hardware or by an interpreter. Languages running on bare metal are generally understood to be faster than those running on interpreters, since the interpreter takes time to understand the code and uses its own system resources. Java tends to break this rule because its interpreter is a full virtual machine that understands very simple bytecode (and compiles the hot parts of it to machine code at runtime), making it competitive in speed with languages like C.
What kind of binary code they are compiled to depends on the compiler. For C and C++ there are dozens of different compilers, which may generate different binary code. Besides that, most compilers have optimization flags that heavily influence the generated binary code.
Python isn't even compiled directly into machine code; it's compiled into bytecode for the Python interpreter. The Python interpreter itself is a program that runs on the machine, reads the Python bytecode, and executes it, typically by internally calling predefined functions that already exist as machine code.
I wish that John McCarthy was still alive, but...
From the LISP 1.5 Programmer's Manual:
LISP can interpret and execute programs written in the form of S-expressions. Thus, like machine language, and unlike most other higher level languages, it can be used to generate programs for further execution.
I need more clarification: how can machine language be used to generate programs, and how does Lisp do it?
All that is saying is that machine code can directly write machine instructions to memory and jump to those instructions to execute them; this is the basis of many attack vectors to break into software, in fact.
The point is, when you're writing machine code, it's easy to generate machine code. But when you're writing in a compiled language like C, you can't just generate C code at run time and then execute it - unless your program includes a C compiler.
Lisp - and, these days, many other languages, especially "scripting languages" like Perl, Python, Ruby, Tcl, Javascript, and command shells - have the ability to execute code that is generated at runtime. In Lisp, since code and data have the same structure, this is usually less work than it is in the other languages, where the code to be evaluated is generally a string that has to be parsed. (Though Perl has the ability to eval a block instead of a string, which lets the compiler do the parsing ahead of time for literal code.)
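To illustrate the Perl side of that remark, here's a small sketch of both forms of eval: the string form parses code that was assembled at runtime, while the block form is compiled along with the rest of the program and only traps errors.

use strict;
use warnings;

my $op   = '+';
my $code = "2 $op 3";             # code built as a string at runtime
my $sum  = eval $code;            # parsed and executed here
print "$code = $sum\n";           # prints "2 + 3 = 5"

my $zero   = 0;
my $result = eval { 10 / $zero }; # block was already parsed at compile time
print "caught: $@" if $@;         # prints "caught: Illegal division by zero ..."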
A machine language program can alter itself while running. The last assembly programming I did was for MS-DOS: a resident program that I would load before testing other programs. When my program misbehaved, a keystroke switched to the resident program, which could peek into the running program and alter it directly before resuming. It was quite handy, since I didn't have a debugger.
LISP had this from the very beginning, since it was originally interpreted. You could change the definition of a function while the program was running, and the whole language was always available at runtime, even eval and define. When LISP started being compiled, it wasn't compiled the way Algol was, but incrementally, allowing interpreted and compiled code to intermix at the same time. The fact that its code structure is list structure and that symbols are a data type contributed to this.
In the last interview I saw with McCarthy, he was asked what he thought of modern programming languages (not the LISP family, but Ruby, an Algol-family language said to be influenced by LISP), and before answering he asked whether it could represent code as data (like list structure). Since it couldn't, Ruby was, in his opinion, still behind what LISP was in the 60s.
Many new programming languages are emerging in the Algol family and some of the most promising ones, like Perl6 and Nemerle, are getting closer to the features LISP had in the 60s.
Machine language programs can fill memory regions with arbitrary bytes. Then they can just jump to the start of such region which will thus get executed right away.
Lisp language programs can easily create arbitrary S-expressions in memory, using cons. Then they can just call eval on these S-expressions to evaluate (interpret) them.
High-level language programs can easily fill memory regions with characters representing new code in the language's syntax. But they cannot run such code.
I'm looking for recommended virtual machines that can run on a 8-bit microprocessor AND support dynamic languages. I'd like a VM solution because I perceive benefits in terms of code density, portability, and ability to have a smaller interpreter, leaving more room for larger programs.
My goal is to run a complete LOGO interpreter, following "LOGO for the Apple II" syntax, on something like a 6502 microprocessor.
I've seen references to PyMite, Java "micro edition", and of course now the UCSD p-System sources from the 1970s are available.
Suggestions are welcome.
(Note: I've already +1'ed the FORTH answer.)
Since you mention the 6502: Steve Wozniak (!) wrote an article for Byte magazine in the late 1970s describing SWEET16, a partial VM for the 6502 that provided 16-bit integer arithmetic and could be easily interspersed with 6502 assembly language. It was the basis for the original Integer BASIC, which (as I recall) was later replaced by the floating-point Applesoft BASIC.
FORTH implementation for 6502.
You might want to check out the PICOBIT system, which is a Scheme implementation that works on very very small systems, such as the PIC18. It has since been ported to ARM, and could almost certainly be ported to the 6502 or other processors.
Other than the purely obvious ("It translates Perl to C"), are there any real-world uses (a.k.a. hacks) for the Perl compiler's optimized C translation backend, B::CC?
Not really. It means you can convert a (small) Perl script into a (big) C program, which will be much harder for the recipient to reverse engineer. In some paranoid circles, this might be accounted an advantage (for example, if your Perl code is embarrassingly bad and you'd rather conceal that fact from your paying customers). But mostly it is of limited to negative value.
Compiling a Perl program to an optree, which can then be executed, can sometimes take a while. You can save some of that time by using perlcc with any of its backends. That will, in one way or another, serialise the compiled optree and make loading it later, when executing your compiled binary, somewhat faster. I can see that being useful in, for example, CGI environments, although much better alternatives for avoiding startup costs are available there.
Contrary to popular belief, perlcc doesn't make it very hard to reverse-engineer the resulting binary, as discussed in How can I reverse-engineer a Perl program that has been compiled with perlcc?
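For reference, here's a hedged sketch of invoking the optimized C backend directly through the O module (the module name and -o option are as I remember them from the B::CC documentation; whether this works at all depends on your Perl, since the compiler suite was removed from the core distribution in 5.10 and now lives on CPAN):

perl -MO=CC,-ohello.c hello.pl    # emit C source for hello.pl via B::CC

The generated C still embeds the Perl runtime, which is why the output is large and why, as noted above, it isn't meaningfully harder to reverse engineer.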
When it comes to saying what version of Perl we need for our scripts, we've got options, oh, brother, we've got options:
use 5.010;
use 5.010_001;
use 5.10.0;
use v5.10;
use v5.10.0;
All seem to work. perlcritic complains about all but the first two. (It's unfortunate that the v strings seem to have such flaws, since Perl 6 expects you to do use v6; for your Perl 6 scripts...)
So, what should we be doing to indicate that we want to use a particular version of perl?
There are really only two options: decimal numbers and v-strings. Which form to use depends in part on which versions of Perl you want to "support" with a meaningful error message instead of a syntax error. (The v-string syntax was added in Perl 5.6.) The accepted best practice -- which is what perlcritic enforces -- is to use decimal notation.

You should specify the minimum version of Perl that's required for your script to behave properly. Normally that means declaring a dependency on language features added in a major release, such as the say function added in 5.10. Include the patch level only if it matters: for example, some of my code specifies use 5.008001 because it depends on a fix for a bug in 5.8.0 that was corrected in 5.8.1.
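As a small worked example of the decimal form (the printf line is only there to show which interpreter actually ran; nothing else is assumed):

# Each component after the major version is zero-padded to three digits,
# so 5.8.1 is written 5.008001 and 5.10.0 is written 5.010000 (or 5.010).
use 5.008001;                     # dies at compile time on anything older than 5.8.1
printf "running on perl %vd\n", $^V;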
I just use something like 5.010_001. I've grown weary of dealing with version-string problems for something that should be mind-numbingly simple.
Since I mostly deal with build systems, I have the constant struggle with Module::Build's internal version.pm being out of sync with the version.pm on CPAN. I think that's mostly better now, but I have better things to think about.
The best practice should always be to do the thing that commands the least of your attention, and certainly not take more attention than the value it gives back. In my opinion, v-strings and dotted decimals were a huge distraction with no additional benefit, wasting a lot of valuable programmer time just to get back to the starting point.
I should also note that Perl::Critic has often pushed questionable practices for the higher purpose of reducing the number of ways that people do things. However, those practices often cause problems of their own, which makes them un-best. This is one of those cases. A more realistic best practice is to not make Perl::Critic compliance your goal. Use it where it is useful, but in cases like this, don't waste mental time on it.
The "modern" way is to use the forms starting with v. However, that may not necessarily be what you really want to do.
Critic complains because older versions of Perl won't understand and play nicely with the forms that start with v. However, if your version of Perl supports it, v is nicer to read because you can say:
use v5.10.1;
... rather than ...
use 5.010_001;
So, in the documentation for use, the following workaround is offered:
use 5.006; use v5.6.1;
NB: I think the documentation is in error here, as the v is omitted from the example at perldoc use.
Since the versions of Perl that don't support the v syntax will fail at the first use, they won't get to the second more specific and readable one.
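Putting the workaround into a complete header, as a sketch (the particular versions are only examples):

use 5.006;       # pre-5.6 perls die here with a readable "version required" message rather than a syntax error
use v5.6.1;      # anything that got past the line above understands v-strings
use strict;
use warnings;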