I found this description of how to hash a string down to a uint32_t at compile time:
http://lolengine.net/blog/2011/12/20/cpp-constant-string-hash
I want to use this macro to initialize a global variable. I don't want the string itself to end up in the compiled binary, only the hash.
But when I use this macro I get the error: error: initializer element is not constant
Is there a workaround for this in C with the GCC compiler?
Any other idea for placing the hash of a string via the preprocessor?
In C, a static initializer must be a constant expression, and C is very picky about what a constant expression is:
An arithmetic constant expression shall have arithmetic type and shall only have operands that are integer constants, floating constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and _Alignof expressions. (§6.6/8)
Note that string literals are not in the list of valid operands, so "A string"[2] does not qualify.
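For example, a minimal illustration of the constraint (GCC's exact wording may vary):

#include <stdint.h>

/* Fine: 'A' + 2 is an integer constant expression. */
static uint32_t ok = 'A' + 2;

/* Not fine in C: a string literal is not a permitted operand, so GCC
 * rejects this with "initializer element is not constant". */
static uint32_t bad = "A string"[2];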
C++ does not require static initializers to be constant expressions, and it is also a lot more generous about what is accepted as a constant expression. (For example, a static const int variable can be used in C++, but not in C.)
So the C preprocessor is not going to be able to help you construct a static declaration initialized to a hash computed from a character string. If you really want to do this, your best bet is probably to do your own preprocessing of the source file, with a utility which identifies the calls to HASH and replaces them with the computed hash as an integer constant.
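If you go the external-utility route, here is a minimal sketch of the idea (the hash function, the names list, and the generated macro names are all illustrative, not taken from the blog post): a small generator compiled and run as a pre-build step, with its output redirected to a header.

/* gen_hashes.c - hypothetical pre-build step:
 *   cc gen_hashes.c -o gen && ./gen > hashes.h
 * Only the generated integer constants reach the real binary. */
#include <inttypes.h>
#include <stdio.h>

/* Any hash works as long as it matches what the rest of the code
 * expects; this is the classic djb2-style multiply-by-33 hash. */
static uint32_t hash(const char *s)
{
    uint32_t h = 5381;
    while (*s)
        h = h * 33u + (uint8_t)*s++;
    return h;
}

int main(void)
{
    const char *names[] = { "config", "network", "render" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        printf("#define HASH_%s 0x%08" PRIx32 "u\n",
               names[i], hash(names[i]));
    return 0;
}

The real source then writes static uint32_t id = HASH_config;, which is an ordinary integer constant and therefore a valid static initializer.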
How can I hash pairs or triples of 'eq-able objects like symbols or ints?
In Python I can use tuples as dictionary keys; is there a way to do this in Lisp without resorting to an 'equal test?
While some implementations provide support for custom hash table test functions, the standard only defines four:
18.1.1 Hash-Table Operations
There are four kinds of hash tables: those whose keys are compared with eq, those whose keys are compared with eql, those whose keys are
compared with equal, and those whose keys are compared with equalp.
That means that if you want to use the standard hash tables, then you'll probably need to use an equal or equalp hash table. I do notice that you wrote:
How can I hash pairs or triples of 'eq-able objects like symbols or
ints?
While symbols can be compared reliably with eq, you shouldn't compare numbers with eq. The documentation of eq says:
numbers with the same value need not be eq, … An implementation is permitted to make "copies" of characters and numbers at any time. The effect is that Common Lisp makes no guarantee that eq is true even when both its arguments are "the same thing" if that thing is a character or number.
and gives this example:
(eq 3 3)
; => true
; OR=> false
However, if you are working with (small) tuples of non-negative integers, you could easily hash on a function of them. E.g., the tuple (a, b, c) could be mapped to 2^a × 3^b × 5^c. Since unique prime factorization means such a function generates a distinct number for each tuple, and numbers are reliably comparable with eql, you could use an eql hash table.
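A minimal sketch of that idea in Common Lisp (the function name tuple-key and the table variable are mine, for illustration):

;; Encode a triple of non-negative integers as a single integer.
;; Unique prime factorization guarantees that distinct triples get
;; distinct keys, and integers compare reliably under EQL.
(defun tuple-key (a b c)
  (* (expt 2 a) (expt 3 b) (expt 5 c)))

(defparameter *table* (make-hash-table :test #'eql))

(setf (gethash (tuple-key 1 2 3) *table*) :value)
(gethash (tuple-key 1 2 3) *table*)  ; => :VALUE, T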
Another option for such a mapping function (that would work with symbols, too) would be to use sxhash. It's a standard hashing function that should produce identical values for equal values. How it works, and what exactly it does is not really specified at all, but it has the advantage that it's stable across Lisp images of the same implementation (e.g., run one version of SBCL today and tomorrow, and sxhash will return the same result for an equal object). Of course, it's possible that an equal-hash-table is just doing this for you already, so your mileage might vary.
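For example (a sketch; the actual hash values are implementation-dependent, and note that unlike the prime encoding above, sxhash values can collide):

;; EQUAL objects get equal SXHASH codes, so a tuple can be reduced
;; to a number usable as an EQL key (at the cost of possible collisions).
(sxhash (list 'a 'b 42))  ; => some fixnum
(sxhash (list 'a 'b 42))  ; => the same fixnum, even for a fresh list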
Unicode code points range from U+000000 to U+10FFFF. While writing myself a lexer generator in F#, I ran into the following problem:
For the character set definitions, I intend to use a simple tuple of type char * char, expressing a range of characters. Omitting some peripheral details, I also need a range I call All, which is supposed to be the full Unicode range.
Now, it is possible to define a char literal like this: let c = '\u3000'. And for strings, it is also possible to refer to a full 32 bit code point like this: let s = "\U0010FFFF". But the latter does not work for chars, because a char in .NET is a 16 bit UTF-16 code unit, and that code point would yield two code units (a surrogate pair), not one.
So the question is: is there a way I can stick with my char * char tuple and get my All defined somehow, or do I need to change it to uint32 * uint32 and define all my character ranges as 32 bit values? And if I have to change, is there a type I should prefer over uint32 that I have not discovered yet?
Thanks in advance.
Is it possible to convert a church numeral to an integer representation without using a language primitive such as add1?
All the examples I've come across use a primitive to convert from the Church encoding to an int.
Example:
plus1 = lambda x: x + 1
church2int = lambda n: n(plus1)(0)
Example 2:
(define (church-numeral->int cn)
((cn add1) 0))
I'm experimenting with a micro Lisp interpreter (using only John McCarthy's 10 rules) and would like to understand if that can be done without adding a primitive.
The integer numeric type is not part of McCarthy's list of elementary primitive procedures for Lisp - at that level you only have functions; no numeric data type exists. That's why integers would need to be represented as functions (for instance, using Church numerals) if we were to adhere strictly to such a minimalistic definition of Lisp. So the answer is no: you can't convert to a data type that doesn't exist yet.
Now suppose that we add integers as atoms in the language (notice that adding a new data type goes beyond the 7-10 primitive procedures mentioned). To simplify even more, suppose that we just add a single number, zero - then we'd still need the add1 operation to build the rest of the integers, as per the Peano axioms, which construct the natural numbers from zero and a successor operation. Again, we can't convert from Church numerals to integers without at least having the number zero as an atom and the add1 function.
No. int, as you describe it, is a primitive type of value, not a function. You can't manipulate such ints at all without primitives (without add1, how are you ever going to get to 1 from 0?).
However, you certainly can convert between two different Church-encodings of natural numbers without using primitives, as long as your language is Turing-complete without those primitives.
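As an illustration of that last point, here is a sketch in Python (the choice of Scott encoding as the second encoding, and all the names, are mine): it converts Church numerals to Scott numerals using nothing but function application.

# Church encoding: the numeral n applies a function n times.
church_zero = lambda f: lambda x: x
church_succ = lambda n: lambda f: lambda x: f(n(f)(x))

# Scott encoding: a numeral is its own pattern-match on zero/successor.
scott_zero = lambda z: lambda s: z
scott_succ = lambda n: lambda z: lambda s: s(n)

# The conversion uses no integer primitives at all.
church_to_scott = lambda n: n(scott_succ)(scott_zero)

# Inspect a Scott numeral without touching ints: is it zero?
is_zero = lambda sn: sn(True)(lambda _pred: False)

two = church_succ(church_succ(church_zero))
print(is_zero(church_to_scott(church_zero)))  # True
print(is_zero(church_to_scott(two)))          # False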
First of all, as I understand it, variable identifiers are called symbols in Common Lisp.
I noted that while in languages like C variable identifiers can only contain alphanumerics and underscores, Common Lisp allows many more characters to be used, like "*" and (at least in Scheme) "?".
So, what I want to know is: what exactly is the full set of characters that Common Lisp allows in a symbol (or variable identifier, if I'm wrong)? Is it the same for Scheme?
Also, is the set of characters different for function names?
I've been googling, looking in the CLHS, and in Practical Common Lisp, and for the life of me, something must be wrong because I can't seem to find the answer.
A detailed answer is a bit tricky. There is the ANSI standard for Common Lisp. It defines the set of available characters. Basically you can use all those defined characters for symbols. See also Symbols as Tokens.
For example
|Polynom 2 * x ** 3 - 5 * x ** 2 + 10|
is a valid symbol. Note that the vertical bars mark the symbol and do not belong to the symbol name.
Then there are the existing implementations of Common Lisp and their support of various character sets and string types. So several support Unicode (or similar) and allow Unicode characters in symbol names.
LispWorks:
CL-USER 1 > (list 'δ 'ψ 'σ '\|)
(δ ψ σ \|)
[From a Schemer's perspective. Even though some concepts in Scheme and Common Lisp have the same name, it does not mean that they mean the same thing in the two languages.]
First note that symbols and identifiers are two different things.
Symbols can be thought of as strings which support fast equality comparison.
Two symbols s and t are equal (more or less) if they are spelled the same way. The operation string=? needs to loop over the characters in the strings and see if they are all alike; this takes time proportional to the length of the shorter string. Symbols, on the other hand, are automatically put (by the runtime system) into a table (typically a hash table). Therefore symbol=? boils down to a simple pointer comparison and is thus very fast. Symbols are often used in cases where in C one would use enumerations.
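A small illustration (Scheme; string=? and eq? as in the standard):

;; string=? must walk both strings, character by character:
(string=? "the-word-recursion-has-many-meanings"
          "the-word-recursion-has-many-meanings")  ; => #t, O(n)

;; symbols with the same spelling are interned to the same object,
;; so the comparison is a single pointer check:
(eq? 'the-word-recursion-has-many-meanings
     'the-word-recursion-has-many-meanings)        ; => #t, O(1)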
Symbols are values that can be present at runtime.
Identifiers are simply names of variables in a program.
Now if said program is to be represented as a Scheme value, one choice would be to use symbols to represent identifiers - but that does not mean symbols are identifiers (or vice versa). A better representation of identifiers (still in Scheme) is syntax objects, which besides the name of the identifier also record where the identifier was read (or constructed). Say you encounter an undefined variable and want to signal where in the program the undefined variable is; then it is very convenient that the source location is part of the representation of the identifier.
Last but not least: what are the legal characters of an identifier? Here it is best to quote chapter and verse from R6RS:
4.2.4 Identifiers
Most identifiers allowed by other programming languages are also
acceptable to Scheme. In general, a sequence of letters, digits, and
“extended alphabetic characters” is an identifier when it begins with
a character that cannot begin a representation of a number object. In
addition, +, -, and ... are identifiers, as is a sequence of letters,
digits, and extended alphabetic characters that begins with the
two-character sequence ->. Here are some examples of identifiers:
lambda q soup
list->vector + V17a
<= a34kTMNs ->-
the-word-recursion-has-many-meanings
Extended alphabetic characters may be used within identifiers as if
they were letters. The following are extended alphabetic characters:
! $ % & * + - . / : < = > ? # ^ _ ~
Moreover, all characters whose Unicode scalar values are greater than
127 and whose Unicode category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co can be used within
identifiers. In addition, any character can be used within an
identifier when specified via an <inline hex escape>. For
example, the identifier H\x65;llo is the same as the identifier
Hello, and the identifier \x3BB; is the same as the identifier
λ.
Any identifier may be used as a variable or as a syntactic keyword
(see sections 5.2 and 9.2) in a Scheme program. Any identifier may
also be used as a syntactic datum, in which case it represents a
symbol (see section 11.10).
From: http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_sec_4.2.4
See Chapter 2 of the CLHS, which describes the reader algorithm in detail. But the simple answer is that if a token isn't a reader macro invocation (section 2.4), and isn't a number or made up entirely of dots, it defaults to being interpreted as a symbol.
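For example, a quick sketch at the REPL (the printed details may differ between implementations):

(read-from-string "*foo?*")  ; => *FOO?* and 6
(read-from-string "1+")      ; => 1+ and 2   (not a number, so a symbol)
(read-from-string "123")     ; => 123 and 3  (a number)
(read-from-string "...")     ; signals an error: a token of only dots is reserved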
I am currently exploring the specification of the Digital Mars D language, and am having a little trouble understanding the complete nature of the primitive character types. The book Learn to Tango With D is similarly vague on the capabilities and limitations of the language in this area.
The types are given on the website as:
char; // unsigned 8 bit UTF-8
wchar; // unsigned 16 bit UTF-16
dchar; // unsigned 32 bit UTF-32
Since we know that most of the Unicode Transformation (UTF) Format encodings represent characters with a variable bit-width, does this mean that a char in D can only contain the values that will fit in 8 bits, or does it expand in the machine's physical memory when you give it double byte characters? Perhaps there is some other possibility, like automatic casting into the next most appropriate type as you overload the variable?
Let's say, for example, I want to use the UTF-8 char in an editor and type in Chinese. Will it simply fall over, or is it able to deal with Unicode characters more 'correctly', like in C#? Would it still be necessary to provide glue code to allow working with any language supported by Unicode?
I'd appreciate any specific information you can offer on how these types work under the covers, and any general best practices advice on dealing with their limitations.
A single char or wchar represents a UTF code unit. This means that, by itself, a char can either represent an ASCII symbol (0-127) or be part of a UTF-8 sequence representing a Unicode character (code point). Only the dchar type can represent an entire Unicode character, because there are more than 65536 code points in Unicode.
Casting between the string types (string, wstring, and dstring, which are simply dynamic arrays of the character types) will not automatically convert their contents to the respective UTF representation. In order to do this, you must use the functions toUTF8, toUTF16, and toUTF32 from std.utf (or toString / toString16 / toString32 from tango.text.convert.Utf if you use Tango).
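A minimal sketch of the code unit / code point distinction (D, using std.utf.toUTF32 as named above; the sample string is arbitrary):

import std.stdio;
import std.utf;

void main()
{
    string  s = "日本語";     // UTF-8: each of the 3 characters takes 3 code units
    dstring d = toUTF32(s);   // explicit conversion, not a free cast

    writeln(s.length);  // 9 -- length counts char (UTF-8) code units
    writeln(d.length);  // 3 -- one dchar per Unicode code point
}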
Users have implemented string classes which will automatically use the most memory-efficient representation that can map each character to a single code unit. This allows quick slicing and indexing with a minimal memory overhead. One such implementation is mtext by Christopher E. Miller.
Further reading:
the String handling section in Wikipedia's entry on D
Text in D, by Daniel Keep