Church Numerals convert to int without language primitive - lisp

Is it possible to convert a church numeral to an integer representation without using a language primitive such as add1?
All the examples I've come across use a primitive to dechurch to an int.
Example:
plus1 = lambda x: x + 1
church2int = lambda n: n(plus1)(0)
Example 2:
(define (church-numeral->int cn)
((cn add1) 0))
I'm experimenting with a micro Lisp interpreter (using only John McCarthy's 10 rules) and would like to understand if that can be done without adding a primitive.

The integer numeric type is not part of McCarthy's list of Lisp elementary primitive procedures - at that level you only have functions; no other data types exist. That's why integers would need to be represented as functions (for instance, using Church numerals) if we were to adhere strictly to such a minimalistic definition of Lisp. So the answer is no: you can't convert to a data type that doesn't exist yet.
Now suppose that we add integers as atoms in the language (notice that adding a new data type goes beyond the 7-10 primitive procedures mentioned). To simplify even further, suppose that we add just a single number, the number zero - then we'd still need the add1 operation to build the rest of the integers, as per the Peano axioms, which construct the natural numbers from zero and a successor operation. Again: we can't convert from Church numerals to integers without at least having the number zero as an atom and the add1 function.

No. int, as you describe it, is a primitive type of value, not a function. You can't manipulate such ints at all without primitives (without add1, how are you ever going to get to 1 from 0?).
However, you certainly can convert between two different Church-encodings of natural numbers without using primitives, as long as your language is Turing-complete without those primitives.
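To make that concrete, here is a minimal sketch in Python (mirroring the question's first example). It converts Church numerals into a second pure-function encoding, Scott numerals, using nothing but lambdas; 0 and + 1 only appear at the very edge, to display the results:
# Church numerals: n is the function that applies f n times.
church_zero = lambda f: lambda x: x
church_succ = lambda n: lambda f: lambda x: f(n(f)(x))

# Scott numerals: n "pattern matches" - it takes a zero-case value and a
# successor-case function and picks one.
scott_zero = lambda z: lambda s: z
scott_succ = lambda n: lambda z: lambda s: s(n)

def church_to_scott(n):
    # Fold the Scott successor over the Scott zero: no integer primitives involved.
    return n(scott_succ)(scott_zero)

# Primitives (0 and + 1) are only needed at the boundary, for printing:
church_to_int = lambda n: n(lambda k: k + 1)(0)

def scott_to_int(n):
    return n(0)(lambda pred: 1 + scott_to_int(pred))

three = church_succ(church_succ(church_succ(church_zero)))
print(church_to_int(three))                  # 3
print(scott_to_int(church_to_scott(three)))  # 3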

Related

Can you create a programming language with just one symbol?

Can you create a programming language with just one symbol, like brainfuck?
Yes, it has been done before - see Unary.
Basically it's a strange encoding of brainfuck. Treat each BF command as a number. The whole program is then also a number, created by concatenating the commands together (with an extra 1 at the front, for unambiguous decoding). Convert that number to the unary numeral system (i.e., the number of digits is your number) and you're done.
Note, however, that programs in this language tend to be very large - a cat program implemented in Unary is (according to the information on its page) 56623 characters long.
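A rough sketch of that encoding in Python (the 3-bit command table below is an assumption based on the usual description of Unary; check the language's page for the authoritative mapping):
# Hypothetical 3-bit codes for the eight brainfuck commands.
BITS = {'>': '000', '<': '001', '+': '010', '-': '011',
        '.': '100', ',': '101', '[': '110', ']': '111'}

def unary_length(bf_program):
    # Concatenate the per-command codes, prepend a 1 so leading zeros survive,
    # and read the whole thing as one binary number. The Unary program is then
    # that many repetitions of the single symbol.
    binary = '1' + ''.join(BITS[c] for c in bf_program if c in BITS)
    return int(binary, 2)

print(unary_length('+'))   # 0b1010 = 10 symbols for a single command
print(unary_length('+.'))  # 0b1010100 = 84 - lengths grow exponentially with program size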
MGIFOS, Lenguage and Ellipsis follow the same principle. Note that, for example, a hello world in MGIFOS has more characters than there are particles in the observable universe.
Then Len(language,encoding) extends this principle to any language.
There are also languages with just one instruction; these are called OISCs (One-Instruction Set Computers).
The first one I know of is Melzak's Arithmetic Machine (1961), with the instruction:
z = x-y or jump if y>x
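As a rough illustration of the one-instruction idea (a toy reading of the quoted instruction, not Melzak's actual machine - the addressing and halting rules here are my own simplifications), an interpreter fits in a few lines of Python:
def run(program, mem, pc=0, max_steps=1000):
    # Each instruction is (x, y, z, jump): if mem[y] > mem[x], jump to
    # instruction index `jump`; otherwise store mem[x] - mem[y] in mem[z]
    # and fall through to the next instruction.
    for _ in range(max_steps):
        if not (0 <= pc < len(program)):
            return mem                    # ran off the program: halt
        x, y, z, jump = program[pc]
        if mem[y] > mem[x]:
            pc = jump
        else:
            mem[z] = mem[x] - mem[y]
            pc += 1
    return mem

# Copy mem[0] into mem[2] by subtracting the constant 0 kept in mem[1].
print(run([(0, 1, 2, 99)], [7, 0, 0]))    # [7, 0, 7]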
You also have Zero Instruction Set Computers, which are more like neural nets.
Not forgetting the amazing FRACTRAN of Conway & Guy (1996), which has no instructions at all but interprets a series of fractions (the program) in a Turing-complete way.
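FRACTRAN is simple enough that a minimal interpreter is only a few lines; here is a sketch in Python, run on the textbook one-fraction addition program [3/2], which turns 2^a * 3^b into 3^(a+b):
from fractions import Fraction

def fractran(program, n, max_steps=100000):
    # Repeatedly multiply n by the first fraction that gives an integer;
    # halt when no fraction does.
    for _ in range(max_steps):
        for f in program:
            if (n * f).denominator == 1:
                n = int(n * f)
                break
        else:
            return n
    return n

a, b = 3, 4
print(fractran([Fraction(3, 2)], 2**a * 3**b) == 3**(a + b))  # True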

multiple-value hash common lisp

How can I hash pairs or triples of 'eq-able objects like symbols or ints?
In python I can use tuples as dictionary keys, is there a way to do this in lisp without resorting to an 'equal test?
While some implementations might provide for custom hash table test functions, the standard only defines four:
18.1.1 Hash-Table Operations
There are four kinds of hash tables: those whose keys are compared with eq, those whose keys are compared with eql, those whose keys are compared with equal, and those whose keys are compared with equalp.
That means that if you want to use the standard hash tables, then you'll probably need to use an equal or equalp hash table. I do notice that you wrote:
How can I hash pairs or triples of 'eq-able objects like symbols or ints?
While symbols can be compared reliably with eq, you shouldn't compare numbers with eq. The documentation of eq says:
numbers with the same value need not be eq, … An implementation is permitted to make "copies" of characters and numbers at any time. The effect is that Common Lisp makes no guarantee that eq is true even when both its arguments are "the same thing" if that thing is a character or number.
and gives this example:
(eq 3 3)
; => true
; OR=> false
However, if you are working with (small) tuples of integers, you could easily hash on a function of them. E.g., the tuple (a, b, c) could be mapped to 2^a × 3^b × 5^c. Since a function like that would generate unique numbers which are comparable with eql, you could use an eql hash table.
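A quick illustration of why that mapping works as a key (sketched in Python; the Common Lisp version would compute the same product with EXPT and use the result in an EQL hash table):
def pack(a, b, c):
    # Unique for non-negative a, b, c by unique prime factorization.
    return (2 ** a) * (3 ** b) * (5 ** c)

assert pack(1, 2, 3) == 2 * 9 * 125 == 2250
assert pack(1, 2, 3) != pack(3, 2, 1)  # order matters, as it should for a tuple key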
Another option for such a mapping function (that would work with symbols, too) would be to use sxhash. It's a standard hashing function that should produce identical values for equal values. How it works, and what exactly it does is not really specified at all, but it has the advantage that it's stable across Lisp images of the same implementation (e.g., run one version of SBCL today and tomorrow, and sxhash will return the same result for an equal object). Of course, it's possible that an equal-hash-table is just doing this for you already, so your mileage might vary.

Why can't CASE be used on string values and only symbol values?

In the book 'Land of Lisp' I read:
Because the case command uses eq for comparisons, it is usually used only for branching on symbol values. It cannot be used to branch on string values, among other things.
Please explain why.
The other two excellent answers do answer the question asked. I will try to answer the natural next question - why does case use eql?
The reason is actually the same as in C (where the corresponding switch statement uses numeric comparison): the case forms in Lisp are usually compiled to something like goto, so (case x (1 ...) (2 ...) (3 ...)) is much more efficient than the corresponding cond. This is often accomplished by compiling case to a hash table lookup which maps the value being compared to the clause directly.
That said, the next question would be - why not have a case variant with equal hash table clause lookup instead of eql? Well, this is not in the ANSI standard, but implementations can provide such extensions, e.g., ext:fcase in CLISP.
See also why eql is the default comparison.
Two strings with the same content "foo" and "foo" are not EQL. CASE uses EQL as the comparison (not EQ, as quoted in your question). Usually one might want different tests: case-sensitive and case-insensitive string comparison, for example. But with CASE one cannot use another test; EQL is built in. EQL compares pointer equality, numbers and characters - but not string contents. You can test whether two strings are the identical data object, though.
So, two strings "FOO" and "FOO" are usually two different objects.
But two symbols FOO and FOO are usually really the same object. That's a basic feature of Lisp. Thus they are EQL and CASE can be used to compare them.
Because (eq "foo" "foo") is not necessarily true. Each time you write a string literal, it may create a fresh, unique string object. So when CASE compares the value with the literals in its clauses (using EQL, which for two distinct string objects fails just like EQ), they won't match.

If Ascii operators are definable, why not Unicode Symbols?

I'm sure I join many in being glad there's finally a powerful language tied tightly to a mainstream GUI/Database/Communication framework.
I haven't been sure where to post this, but here seems the best spot.
I need to use Unicode symbol characters either as operators or as function names. I'd like syntactic sugar, but I don't need it.
Guy Steele pointed out in Communications of the ACM that "*" was a forced choice when it was adopted from Ascii as multiply, but my software works in Unicode, so I'm not tethered to Ascii anymore.
!$%&*+-./<=>?,#^|~:
Part of localization is local programmers. Why limit the set of operators that can be defined in F#? It seems inconsistent with C#'s and F#'s acceptance of many Unicode IsLetter characters in identifiers.
Also, F# is likely to be used for symbolic manipulation of problems from logic, math, physics, etc. It makes work much easier if there's a direct mapping of the basic operators into the language. F# and C# already accept many Unicode IsLetter characters, as well as IsDigit. This is a request to allow Unicode IsSymbol characters as operators, with the precedence of, for example, *; or, since "+" is both a unary and a binary operator, I could put up with the precedence of + and make up the difference with parenthesized groupings.
Consider the domain-specific needs of logicians, mathematicians, physicists, etc. I’d rather write a symbolic differentiator or integrator using math symbols than Ascii permutations of already-taken operators.
Logic: ∀ ∃ ⇒
Math: ∑ ∫ ∂
Group theory: ≤ ≥ ∈ ∉
Set Theory: ⊆ ⊇ ⊃ ∪ ∩
Tensors: ⊗
I’ve written many languages in other languages, but because F# is tightly .Net-integrated, this issue poses special challenges without language support:
It's trivial to cobble up a translator that takes Unicode-operator F# source and maps it, line by line, to Ascii-operator F# source. But how do I ensure the translated source is what gets compiled, while the programmer still sees their own untranslated source when debugging? And if I map line-for-line correctly, how do I ensure they can still point at a variable and see its value?
There is a math (Unicode) symbol extension for F# available in the Visual Studio Gallery.
This allows you to define Unicode symbols, e.g.:
let inline (~∑) xs = xs |> Seq.sum
let total = ∑myList
You may be interested in Project Fortress which is a new functional programming language that embraces the Unicode character set (among many other features). In particular, see the Mathematical Syntax in Fortress page which contains some sample code.
For an interesting discussion on this check: http://cs.hubfs.net/forums/thread/9690.aspx
Other languages, such as Scala, do permit operators from outside the ASCII range - mathematical symbols (Sm) and other symbols (So).

What are the limitations of primitive character types in D?

I am currently exploring the specification of the Digital Mars D language, and am having a little trouble understanding the complete nature of the primitive character types. The book Learn to Tango With D is similarly vague on the capabilities and limitations of the language in this area.
The types are given on the website as:
char; // unsigned 8 bit UTF-8
wchar; // unsigned 16 bit UTF-16
dchar; // unsigned 32 bit UTF-32
Since we know that most of the Unicode Transformation (UTF) Format encodings represent characters with a variable bit-width, does this mean that a char in D can only contain the values that will fit in 8 bits, or does it expand in the machine's physical memory when you give it double byte characters? Perhaps there is some other possibility, like automatic casting into the next most appropriate type as you overload the variable?
Let's say, for example, I want to use the UTF-8 char in an editor and type in Chinese. Will it simply fall over, or is it able to deal with Unicode characters more 'correctly', like in C#? Would it still be necessary to provide glue code to allow working with any language supported by Unicode?
I'd appreciate any specific information you can offer on how these types work under the covers, and any general best practices advice on dealing with their limitations.
A single char or wchar represents a UTF code unit. This means that, on its own, a char can either represent an ASCII symbol (0-127) or be part of a UTF-8 sequence representing a Unicode character (code point). Only the dchar type can represent any entire Unicode character, because there are more than 65536 code points in Unicode.
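For a feel of the counts involved (illustrated with Python's encoders, since the code-unit arithmetic is the same in any language): a single CJK character occupies three UTF-8 code units but only one UTF-16 or UTF-32 code unit, so it needs three D chars, one wchar, or one dchar:
s = "\u4e2d"                             # one Unicode code point (a CJK character)
print(len(s.encode("utf-8")))            # 3 -> three `char` code units
print(len(s.encode("utf-16-le")) // 2)   # 1 -> one `wchar` code unit
print(len(s.encode("utf-32-le")) // 4)   # 1 -> one `dchar` code unit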
Casting one string type to another (string, wstring and dstring, which are simply dynamic arrays of the corresponding character types) will not automatically convert their contents to the respective UTF representation. To do this, you must use the functions toUTF8, toUTF16 and toUTF32 from std.utf (or toString / toString16 / toString32 from tango.text.convert.Utf if you use Tango).
Users have implemented string classes which will automatically use the most memory-efficient representation that can map each character to a single code unit. This allows quick slicing and indexing with a minimal memory overhead. One such implementation is mtext by Christopher E. Miller.
Further reading:
the String handling section in Wikipedia's entry on D
Text in D, by Daniel Keep