Does Common Lisp have the fastest PCRE implementation? - perl

A friend claimed that Common Lisp has the fastest Perl-compatible regular expression library of any language, including Perl itself, because with an optimizing JIT compiler like SBCL, CL-PPCRE can compile each particular regex down to native assembly whereas other implementations, include Perl's, must generate bytecode and interpret it. In practice, especially for the common case where we try to match the same regex against many inputs or long input, the compilation overhead is more than justified.
Unfortunately, I can't find any benchmarks on this, and I don't know enough to run my own, so I turn to the hive mind. Can anyone evaluate this claim?

I have no benchmarks of my own to share, but perhaps your friend was referring to results concerning the portable regular expression library CL-PPCRE. The current web page no longer speaks of benchmarks, but courtesy of the Wayback Machine we can see that it used to show benchmark results where CL-PPCRE outperformed Perl 2-to-1. Benchmarking is a tricky business (especially for moving targets), which might explain why the current page is silent on the matter.

PCRE library has a JIT compiler module now, and it has a good performance compared to other regexes: http://sljit.sourceforge.net/regex_perf.html or http://blog.rburchell.com/2011/12/why-i-avoid-qregexp-in-qt-4-and-so.html

Related

Perl does not have a simple #include, OK, but WHY?

Perl does not have a C style preprocessor level "include" function. That is how it is, and there are numerous sites that explain how to more or less emulate the same sort of behavior.
The one thing I couldn't find on any of these sites is any explanation for WHY perl does not have this functionality. Given that Perl often provides many different ways to accomplish the same thing, it is a curious omission.
Can somebody please explain why the decision was made to exclude this sort of functionality?
Perl already has require, do, eval and here documents among other things. It doesn't need a builtin preprocessor, if you need one that badly, there are filters. http://perldoc.perl.org/perlfilter.html
In general, nobody wants #include, even C and C++ programmers would mostly be happy to give it up in exchange for:
Faster compiles
Clean module system
#include is legacy, period. If a mainstream language designer announced tomorrow that they were adding #include to (your favorite language here) you'd probably see mass hysteria, laughing, and loss of confidence in that designer.
Language designers don't implement #include in any new language, there are simply better ways to do it. In general the trend is to attempt to achieve single pass lexing. Preprocessing requires you to incrementally expand #includes and potentially revisit the same characters repeatedly. It has been wrought with problems, and is one of the reasons that C++ is such dog to compile. It was ok in the 60s and 70s when memory and CPU were tiny and languages and problems were simpler, as were codebases. Nowadays, you want to be able to compile a "library" once, store its type metadata with it so the compiler can access it efficiently without rescanning it. That is what Microsoft does anyway with precompiled headers.
So what would #include be good for?
Modules ? No. See above. Modules are compiled once, export their metadata efficiently, they don't pollute the namespace of the clients, they don't recursively inject other includes, they can be distributed in binary form, among umpteen other advantages that I'm not even smart enough to think of.
Including macros ? No. Replace with constants, inlining and generic programming. All of which can be precompiled and expored from a module.
Splicing in generated code ? Better ways to do it anyway. See modules.
The only useful functionality for the preprocessor, IMO, is conditional compilation.
#ifdef _WIN32_
// do windowsy stuff
#else
#endif
Again, Perl can do this with do, eval or require as well.
Perl doesn't have or lack it any more than C does.
The C preprocessor was designed such that it and C need to know as little as possible about each other. There is no reason why you can't use it with Perl.
So why don't Perl programmers do it?
As codenhein explains, it's generally a bad idea to use an include mechanism with a compiler that don't know anything about each other, as it leaves you open to some crazy errors that neither can diagnose; the fact that C programmers are used to it doesn't change that.

Communicate from Lisp to other runtimes

Short version:
Is there a way to allow other programs to call Lisp functions of a Lisp program?
Long version:
I'm contemplating a graph database project :) Not to be started immediately, I'm only probing the ground so far. I've tried couple of graph databases, and my biggest gripe about them is they all are written in Java, (some are in C++, which isn't going to cut it either...). Java has no good way of communicating outwards. It may only be embedded inside another Java program. C++ is just hard to embed / I'm dubious that embedding was even planned.
So, I would obviously want to write it in CL, but I'm considering other options too. So, if you believe that CL simply won't do it, but you have reasons to believe that some other language will, then that's an interesting answer! My requirements to the "other language" would be that it must support parallel computing in some way. Obviously, high performance. And, as mentioned, extensibility.
I see multiple ways to call Lisp from other languages:
The simplest way that should work with all implementations would be to just maintain a bidirectional stream to the REPL. So you could send commands to the REPL and receive the REPL's response. One drawback of this would of course be that everything would be converted to strings.
You could mirror the way SLIME communicates with SWANK. In that case, you either use SWANK directly on the Lisp side and communicate through the same protocol SLIME uses, or write your own version of such a library.
Finally, there are Lisp implementations designed with embeddability in mind. I'm thinking particularly of Embeddabble Common Lisp (ECL) here which has a C API. For example, this section in the manual explains how to call functions, by getting hold of the function's symbol with ecl_make_symbol and then calling it with cl_funcall or cl_apply.
As alternatives to Common Lisp, other Lisp languages might be worthwhile to consider. Various Scheme implementations are designed to be embeddable, this is for example the documentation of Racket's C API. It seems you prefer the native code side of the runtime world over the JVM, but otherwise, Clojure is also interesting for being embeddable within Java.
For the host language there are few limits because most languages should support "pipes" (i.e. streams to other processes) or have a C FFI to call some Lisp's C API.

What are the uses of Perl's C translation backend?

Other than the purely obvious: "It translates Perl to C."; are there any real world uses (a.k.a. hacks) for the Perl compiler's optimized C translation backend, B::CC?
Not really. It means you can convert a (small) Perl script into a (big) C program, which will be much harder for the recipient to reverse engineer. In some paranoid circles, this might be accounted an advantage (for example, if your Perl code is embarrassingly bad and you'd rather conceal that fact from your paying customers). But mostly it is of limited to negative value.
Compiling a Perl program to an optree, which can then be executed, can take a while sometimes. You can safe some of that time by using perlcc with any of its backends. That'll, in one way or another, serialise the compiled optree and make loading it later, when executing your compiled binary, somewhat faster. I can see that being useful in, for example, CGI environments, for which, however, much better alternatives of avoiding startup costs are available.
Contrary to popular believe, perlcc doesn't make it very hard to reverse-engineer the resulting binary, as discussed in How can I reverse-engineer a Perl program that has been compiled with perlcc?

Macros: What's the benefit?

Paul Graham writes:
For example, types seem to be an
inexhaustible source of research
papers, despite the fact that static
typing seems to preclude true macros--
without which, in my opinion, no
language is worth using.
What's the big deal with macros? I haven't spent a whole lot of time with them, but from the legacy C/C++ I've worked with they appear to be mostly used as a hack before templates/generics existed.
It's hard to imagine that
DECLARELIST(StrList, string);
StrList slist;
is somehow preferable to
List<String> slist;
Am I missing something?
Then there's the usage as a pseudo-function, like MAKEPOINTS:
POINTS MAKEPOINTS(
DWORD dwValue
);
Why not define it as a function instead? Is this some optimization, where you avoid code duplication without having the added overhead of another stack frame?
Then there's also tricky control flow things involving GOTO, which seem to be of dubious value.
What's so great about macros? They're less type safe (in C and C++) (right?). Why won't Paul Graham program without them?
LISP macros are an entirely different beast. C/C++ macros can merely replace a piece of text with abother piece of text using an extremely basic language. Whereas a LISP program is (after "reading") is a LISP data structure and can therefore be manipulated using the whole language.
With such macros, you could (given you're a really clever hacker) vastly extend the language and everybody could use it relatively easily, since you did it with macros. Take for example the the Common Lisp Object System. At its core, the language has nothing even remotely like objects. It is entirely implemented in the language itself, including a relatively simple syntax for use - using macros.
Of course macros are less necessary when the language has most things you'd every want built-in. OTOH, the LISP fans are of the opinion that a sufficiently simple language (LISP) with sufficiently powerful metaprogramming capabilities (macros) is better since new concepts can be incorporated into the language without changing the spec or working implementations. But the most compelling example for macro usage is the DSL area. Ruby on Rails and others show every day how useful DSLs can be. Yes, Ruby doesn't have macros, it just exploits how much Ruby syntax can be bent. In other languages, or when even Ruby's syntax isn't flexible enough, you need macros or a fully-blown parser/interpreter to implement a complex DSL.
Macros are really only good for two things in C/C++, and should generally be the tool of last resort (if you can accomplish something without using macros, do so).
Creating new syntactic structures or abstractions that do not exist in the language.
Eliminating duplication, especially between things that must be in sync with each other.
It's almost never to use a macro as a function.
You also have to realize that LISP macros are not C/C++ macros.

Uses for both static strong typed languages like Haskell and dynamic (strong) languages like Common LIsp

I was working with a Lisp dialect but also learning some Haskell as well. They share some similarities but the main difference in Common Lisp seems to be that you don't have to define a type for each function, argument, etc. whereas in Haskell you do. Also, Haskell is mostly a compiled language. Run the compiler to generate the executable.
My question is this, are there different applications or uses where a language like Haskell may make more sense than a more dynamic language like Common Lisp. For example, it seems that Lisp could be used for more bottom programming, like in building websites or GUIs, where Haskell could be used where compile time checks are more needed like in building TCP/IP servers or code parsers.
Popular Lisp applications:
Emacs
Popular Haskell applications:
PUGS
Darcs
Do you agree, and are there any studies on this?
Programming languages are tools for thinking with. You can express any program in any language, if you're willing to work hard enough. The chief value provided by one programming language over another is how much support it gives you for thinking about problems in different ways.
For example, Haskell is a language that emphasizes thinking about your problem in terms of types. If there's a convenient way to express your problem in terms of Haskell's data types, you'll probably find that it's a convenient language to write your program in.
Common Lisp's strengths (which are numerous) lie in its dynamic nature and its homoiconicity (that is, Lisp programs are very easy to represent and manipulate as Lisp data) -- Lisp is a "programmable programming language". If your program is most easily expressed in a new domain-specific language, for example, Lisp makes it very easy to do that. Lisp (and other dynamic languages) are a good fit if your problem description deals with data whose type is poorly specified or might change as development progresses.
Language choice is often as much an aesthetic decision as anything. If your project requirements don't limit you to specific languages for compatibility, dependency, or performance reasons, you might as well pick the one you feel the best about.
You're opening multiple cans of very wriggly worms. First off, the whole strongly vs weakly typed languages can. Second, the functional vs imperative language can.
(Actually, I'm curious: by "lisp dialect" do you mean Clojure by any chance? Because it's largely functional and closer in some ways to Haskell.)
Okay, so. First off, you can write pretty much any program in pretty much any normal language, with more or less effort. The purported advantage to strong typing is that a large class of errors can be detected at compile time. On the other hand, less typeful languages can be easier to code in. Common Lisp is interesting because it's a dynamic language with the option of declaring and using stronger types, which gives the CL compiler hints on how to optimize. (Oh, and real Common Lisp is usually implemented with a compiler, giving you the option of compiling or sticking with interpreted code.)
There are a number of studies about comparing untyped, weakly typed, and strongly typed languages. These studies invariably either say one of them is better, or say there's no observable difference. There is, however, little agreement among the studies.
The biggest area in which there may be some clear advantage is in dealing with complicated specifications for mathematical problems. In those cases (cryptographic algorithms are one example) a functional language like Haskell has advantages because it is easier to verify the correspondence between the Haskell code and the underlying algorithm.
I come mostly from a Common Lisp perspective, and as far as I can see, Common Lisp is suited for any application.
Yes, the default is dynamic typing (i.e. type detection at runtime), but you can declare types anyway for optimization (as a side note for other readers: CL is strongly typed; don't confuse weak/strong with static/dynamic!).
I could imagine that Haskell could be a bit better suited as a replacement for Ada in the avionics sector, since it forces at least all type checks at compile time.
I do not see how CL should not be as useful as Haskell for TCP/IP servers or code parsers -- rather the opposite, but my contacts with Haskell have been brief so far.
Haskell is a pure functional language. While it does allow imperative constructs (using monads), it generally forces the programmer to think the problem in a rather different way, using a more mathematical-oriented approach. You can't reassign another value to a variable, for example.
It is claimed that this reduces the probability of making some types of mistakes. Moreover, programs written in Haskell tend to be shorter and more concise than those written in typical programming languages. Haskell also makes heavy use of non-strict, lazy evaluation, which could theoretically allow the compiler to make optimizations not otherwise possible (along with the no-side-effects paradigm).
Since you asked about it, I believe Haskell's typing system is quite nice and useful. Not only it catches common errors, but it can also make code more concise (!) and can effectively replace object-oriented constructs from common OO languages.
Some Haskell development kits, like GHC, also feature interactive environments.
The best use for dynamic typing that I've found is when you depend on things that you have no control over so it could as well be used dynamically. For example getting information from XML document we could do something like this:
var volume = parseXML("mydoc.xml").speaker.volume()
Not using duck typing would lead to something like this:
var volume = parseXML("mydoc.xml").getAttrib["speaker"].getAttrib["volume"].ToString()
The benefit of Haskell on the other hand is in safety. You can for example make sure, using types, that degrees in Fahrenheit and Celsius are never mixed unintentionally. Besides that I find that statically typed languages have better IDEs.