Is using F#'s hash function inside GetHashCode() evil?

I encountered a couple of places online where code looked something like this:
[<CustomEquality;NoComparison>]
type Test =
    | Foo
    | Bar
    override x.Equals y =
        match y with
        | :? Test as y' ->
            match y' with
            | Foo -> false
            | Bar -> true // silly, I know, but not the question here
        | _ -> failwith "error" // don't do this at home
    override x.GetHashCode() = hash x
But when I run the above in FSI, the prompt does not return when I either call hash foo on an instance of Test or when I call foo.GetHashCode() directly.
let foo = Test.Foo;;
hash foo;; // no returning to the console until Ctrl-break
foo.GetHashCode();; // no return
I couldn't readily prove it, but it suggests that hash x calls GetHashCode() on the object, which would make the above code dangerous. Or is it just FSI playing up?
I thought code like the above just means "please implement custom equality, but leave the hash function as default".
I have meanwhile implemented this pattern differently, but am still wondering whether I am correct in assuming that hash just calls GetHashCode(), leading to an infinite loop.
As an aside, using equality inside FSI returns immediately, suggesting that it either does not call GetHashCode() prior to comparison, or it does something else. Update: this makes sense as in the example above x.Equals does not call GetHashCode(), and the equality operator calls into Equals, not into GetHashCode().

It's not quite as simple as the hash function being a thin wrapper around GetHashCode, but I can comfortably tell you that it's definitely not safe to use the implementation override x.GetHashCode() = hash x.
If you trace the hash function through, you end up here:
let rec GenericHashParamObj (iec : System.Collections.IEqualityComparer) (x: obj) : int =
    match x with
    | null -> 0
    | (:? System.Array as a) ->
        match a with
        | :? (obj[]) as oa -> GenericHashObjArray iec oa
        | :? (byte[]) as ba -> GenericHashByteArray ba
        | :? (int[]) as ba -> GenericHashInt32Array ba
        | :? (int64[]) as ba -> GenericHashInt64Array ba
        | _ -> GenericHashArbArray iec a
    | :? IStructuralEquatable as a ->
        a.GetHashCode(iec)
    | _ ->
        x.GetHashCode()
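The question's Test type is neither an array nor IStructuralEquatable (the compiler does not generate structural equality for a [<CustomEquality>] type), so hash x falls through to the wildcard case, which calls x.GetHashCode(). That is the override itself, which calls hash x again, so the two keep calling each other and never return: an infinite recursion.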
The only case I can see where you might want to use hash inside an implementation of GetHashCode() would be when you are manually hashing some of an object's members to produce a hash code.
There is a (very old) example of using hash inside GetHashCode() in this way in Don Syme's WebLog.
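A minimal sketch of that safe pattern, using a hypothetical record Tagged whose equality deliberately ignores one field. Here hash is applied to a tuple of the fields that take part in equality, never to the object itself, so it cannot re-enter the override; the type and field names are made up for illustration.

```fsharp
/// Hypothetical example type: equality deliberately ignores the Note field.
[<CustomEquality; NoComparison>]
type Tagged =
    { Id : int
      Name : string
      Note : string }

    override this.Equals(other) =
        match other with
        | :? Tagged as t -> this.Id = t.Id && this.Name = t.Name
        | _ -> false // wrong type or null: return false, never throw

    // Safe: `hash` is applied to a tuple of fields (int * string), not to
    // `this`, so it never calls back into this override. The same fields
    // feed Equals and GetHashCode, so equal values get equal hash codes.
    override this.GetHashCode() = hash (this.Id, this.Name)

let a = { Id = 1; Name = "x"; Note = "first" }
let b = { Id = 1; Name = "x"; Note = "second" }
printfn "%b %b" (a = b) (hash a = hash b) // prints "true true"
```

The key design point is that Equals and GetHashCode read the same set of fields, which keeps the "equal values, equal hashes" contract.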
By the way, that's not the only thing unsafe about the code you posted.
Overrides of Object.Equals absolutely must not throw exceptions: if the types do not match, they should return false. This is clearly documented for System.Object:
Implementations of Equals must not throw exceptions; they should always return a value. For example, if obj is null, the Equals method should return false instead of throwing an ArgumentNullException.
(Source)
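Putting both fixes together, here is a sketch of the question's type with an Equals that returns false instead of throwing and a GetHashCode computed from the data rather than from hash x. The "same case means equal" rule and the CaseIndex helper are assumptions for illustration, replacing the question's deliberately silly rule.

```fsharp
[<CustomEquality; NoComparison>]
type Test =
    | Foo
    | Bar

    // Hypothetical helper: a distinct int per case.
    member x.CaseIndex =
        match x with
        | Foo -> 0
        | Bar -> 1

    override x.Equals y =
        match y with
        | :? Test as y' -> x.CaseIndex = y'.CaseIndex
        | _ -> false // wrong type or null: return false instead of throwing

    // Derived from the data, not from `hash x`, so there is no self-recursion.
    override x.GetHashCode() = x.CaseIndex

let foo = Foo
printfn "%b %b %b" (foo = Foo) (foo = Bar) (foo.Equals(box "not a Test")) // prints "true false false"
```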

If the GetHashCode() method is overridden, then the hash operator will use that:
[The hash operator is a] generic hash function, designed to return equal hash values for items that are equal according to the = operator. By default it will use structural hashing for F# union, record and tuple types, hashing the complete contents of the type. The exact behavior of the function can be adjusted on a type-by-type basis by implementing System.Object.GetHashCode for each type.
So yes, this is a bad idea and it makes sense that it would lead to an infinite loop.
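To see the documented behavior directly, consider a tiny hypothetical class Fixed whose GetHashCode override returns a constant: hash returns the override's value. That is exactly why hash x inside the override can only end up calling the override again.

```fsharp
/// Hypothetical class: all instances are equal and hash to 42.
type Fixed() =
    override x.Equals(other) = other :? Fixed
    override x.GetHashCode() = 42

let v = Fixed()
printfn "%d %d" (hash v) (v.GetHashCode()) // prints "42 42"
```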

Related

Function generation with arbitrary signature - revisited

I am resubmitting a question asked almost a decade ago on this site (link), which is not as generic as I would like.
What I am hoping for is a way to construct a function from a list of types, where the final output type can have an arbitrary/default value (such as 0.0 for a float, or "" for a string). So, from
[float; int; float;]
I would get something that amounts to
fun (f: float) ->
    fun (i: int) ->
        0.0
I am hopeful of achieving this, but so far have been unable to. It would help me a lot to see a sample that does the above.
The answer in the above link goes some of the way, but the example seems to know its function signature at compile time, which I won't, and also generates a compiler warning.
The scenario I have, for those that find context helpful, is that I want to be able to open a dll and one way or another identify a method which will have a given signature with argument-types limited to a known set of types (i.e. float, int). For each input parameter in this function signature I will run code to generate a 'buffer' object, which will have
a buffer of data items of the given type, e.g. [1.2; 3.2; 4.5]
a supplier of that data type (supplies may be intermittent so the receiving buffer may be empty at any one time)
a generator function that transforms data items before being dispatched. This function can be updated at any time.
a dispatch function. The dispatch target of bufferA will be bufferB, and for bufferB it will be a pub-sub thing where subscribers can subscribe to the end result of the calculation, in this case a stream of floats. Data accumulates in applicative style down the chain of buffers, until the final result is published as a new stream.
a regulator that turns the stream of data heading out to the consumer on or off. This ensures orderly function application.
The function from the dll will eventually be given to bufferA to apply to a float and pass the result on to bufferB (to pick up an int). However, while setting up the buffer infrastructure I only need a function with the correct signature, so a dummy value, such as 0.0, is fine.
For a function of a known signature I can handcraft the code that creates the necessary infrastructure, but I would like to be able to automate this, and ideally register dlls and have new calculated streams available plugin-style without rebuilding the application.
If you're willing to throw type safety out the window, you could do this:
let rec makeFunction = function
    | ["int"] -> box 0
    | ["float"] -> box 0.0
    | ["string"] -> box ""
    | "int" :: types ->
        box (fun (_ : int) -> makeFunction types)
    | "float" :: types ->
        box (fun (_ : float) -> makeFunction types)
    | "string" :: types ->
        box (fun (_ : string) -> makeFunction types)
    | _ -> failwith "Unexpected"
Here's a helper function for invoking one of these monstrosities:
let rec invokeFunction types (values : List<obj>) (f : obj) =
    match types, values with
    | [_], [] -> f
    | ("int" :: types'), (value :: values') ->
        let f' = f :?> (int -> obj)
        let value' = value :?> int
        invokeFunction types' values' (f' value')
    | ("float" :: types'), (value :: values') ->
        let f' = f :?> (float -> obj)
        let value' = value :?> float
        invokeFunction types' values' (f' value')
    | ("string" :: types'), (value :: values') ->
        let f' = f :?> (string -> obj)
        let value' = value :?> string
        invokeFunction types' values' (f' value')
    | _ -> failwith "Unexpected"
And here it is in action:
let types = ["int"; "float"; "string"] // int -> float -> string
let f = makeFunction types
let values = [box 1; box 2.0]
let result = invokeFunction types values f
printfn "%A" result // output: ""
Caveat: This is not something I would ever recommend in a million years, but it works.
I got 90% of what I needed from a blog post by James Randall entitled "compiling and executing fsharp dynamically at runtime". I was unable to avoid concretely specifying the top-level function signature, but a workaround was to generate an fsx script file containing that signature (determined from the relevant MethodInfo in the inspected dll), then load and run that script. James's blog and GitHub repository also describe loading and running functions contained in script files. Having obtained the curried function from the dll, I then apply it to default arguments to get representative functions of arity n-1 using:
// Default value for the parameter type 'p1 (presumably inside a generic function where 'p1 is in scope).
// Activator.CreateInstance works for value types such as int and float, but throws for
// types without a parameterless constructor, e.g. string.
let p1: 'p1 = Activator.CreateInstance(typeof<'p1>) :?> 'p1
let fArity2 = fArity3 p1
Creating and running a script file is slow, of course, but I only need to do this once, when setting up the calculation stream.
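For comparison, F#'s own reflection API can build such a dummy function directly from a list of System.Type values, and the result can be cast to an ordinary typed function. This is a swapped-in technique (FSharpValue.MakeFunction and FSharpType.MakeFunctionType from Microsoft.FSharp.Reflection), not what either answer above does. The names defaultFor, funcType and makeDummy are hypothetical, and the placeholder choice ("" for string, zero for value types, null otherwise) is an assumption.

```fsharp
open System
open Microsoft.FSharp.Reflection

/// Hypothetical helper: placeholder value for the final return type.
let defaultFor (t : Type) : obj =
    if t = typeof<string> then box ""
    elif t.IsValueType then Activator.CreateInstance(t) // 0, 0.0, false, ...
    else null

/// The curried function type for [t1; t2; ...; tn], i.e. t1 -> t2 -> ... -> tn.
let rec funcType (types : Type list) : Type =
    match types with
    | [] -> invalidArg "types" "at least one type is required"
    | [t] -> t
    | t :: rest -> FSharpType.MakeFunctionType(t, funcType rest)

/// A curried function of that type which ignores its arguments and
/// returns defaultFor of the last type.
let rec makeDummy (types : Type list) : obj =
    match types with
    | [] -> invalidArg "types" "at least one type is required"
    | [t] -> defaultFor t
    | _ :: rest ->
        FSharpValue.MakeFunction(funcType types, fun (_ : obj) -> makeDummy rest)

let f = makeDummy [ typeof<float>; typeof<int>; typeof<float> ] :?> (float -> int -> float)
printfn "%b" (f 1.5 2 = 0.0) // prints "true"
```

In the plugin scenario, the Type list would presumably come from the inspected MethodInfo's parameter and return types; the cast to a concrete function type still has to be written somewhere, which is the same limitation described above.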

Using STArray and ignoring the return value of modify in PureScript

I think I'm close to what I want, though I suspect I'm not understanding how thaw / the ST region works.
Here is what I'm trying to implement (at least roughly)
modifyPerIndex :: forall t a. Foldable t => t (Tuple Int (a -> a)) -> Array a -> Array a
modifyPerIndex foldableActions array = run do
  mutableArray <- thaw array
  let actions = fromFoldable foldableActions
  foreach actions (\(Tuple index action) -> modify index action mutableArray)
  freeze mutableArray
This is sort of how I imagine updateAtIndices works. I suppose I could write modifyPerIndex to use updateAtIndices by reading in the values, applying the (a -> a) and mapping the result into a list of Tuples to be sent to updateAtIndices.
I'm curious how to do it this way though.
In the code above, modify returns ST h Boolean, which I'd like to change into ST h Unit. That's where I'm lost. I get that h here is a constraint put on mutable data to stop it from leaving run; what I don't understand is how to use that.
There are a few options. But it has nothing to do with h. You don't have to "use" it for anything, and you don't have to worry about it at all.
First, the dumbest and most straightforward approach: just bind the result to an ignored variable and then separately return unit:
foreach actions \(Tuple index action) -> do
  _ <- modify index action mutableArray
  pure unit
Alternatively, you can use void, which does more or less the same thing under the hood:
foreach actions \(Tuple index action) -> void $ modify index action mutableArray
But I would go straight for for_, which does the same job as foreach but works for any Applicative (not just ST) and any Foldable, and ignores individual iterations' return values:
for_ actions \(Tuple index action) -> modify index action mutableArray

Verifying programs with heterogeneous arrays in VST

I'm verifying a C program that uses arrays to store heterogeneous data: in particular, the program uses arrays to implement cons cells, where the first element of the array is an integer value and the second element is a pointer to the next cons cell.
For example, the free operation for this list would be:
void listfree(void * x) {
  if((x == 0)) {
    return;
  } else {
    void * n = *((void **)x + 1);
    listfree(n);
    free(x);
    return;
  }
}
Note: Not shown here, but other code sections will read the values of the array and treat it as an integer.
While I understand that the natural way to express this would be as some kind of struct, the program itself is written using an array, and I can't change this.
How should I specify the structure of the memory in VST?
I've defined an lseg predicate as follows:
Fixpoint lseg (x: val) (s: (list val)) (self_card: lseg_card) : mpred :=
  match self_card with
  | lseg_card_0 => !!(x = nullval) && !!(s = []) && emp
  | lseg_card_1 _alpha_513 =>
      EX v : Z,
      EX s1 : (list val),
      EX nxt : val,
      !!(~ (x = nullval)) &&
      !!(s = ([(Vint (Int.repr v))] ++ s1)) &&
      (data_at Tsh (tarray tint 2) [(Vint (Int.repr v)); nxt] x) *
      (lseg nxt s1 _alpha_513)
  end.
However, I run into troubles when trying to evaluate void *n = *(void **)x; presumably because the specification states that the memory contains an array of ints not pointers.
The issue is probably the following, and it can almost be solved as follows.
The C semantics permit casting an integer (of the right size) to a pointer, and vice versa, as long as you don't actually do any pointer operations to an integer value, or vice versa. Very likely your C program obeys those rules. But the type system of Verifiable C tries to enforce that local variables (and array elements, etc.) of integer type will never contain pointer values, and vice versa (except the special integer value 0, which is NULL).
However, Verifiable C does support a (proved-foundationally-sound) workaround to this stricter enforcement:
typedef void * int_or_ptr
#ifdef COMPCERT
__attribute((aligned(_Alignof(void*))))
#endif
;
That is: the int_or_ptr type is void*, but with the attribute "align this as void*". So it's semantically identical to void*, but the redundant attribute is a hint to the VST type system to be less restrictive about C type enforcement.
So, when I say "can almost be solved", I'm asking: Can you modify the C program to use an array of "void* aligned as void*" ?
If so, then you can proceed. Your VST verification should use int_or_ptr_type, which is a definition of type Ctypes.type provided by VST-Floyd, when referring to the C-language type of these array elements, or of local variables that these elements are loaded into.
Unfortunately, int_or_ptr_type is not documented in the reference manual (VC.pdf), which is an omission that should be corrected. You can look at progs/int_or_ptr.c and progs/verif_int_or_ptr.v, but these do much more than you want or need: they axiomatize operators that distinguish odd integers from aligned pointers, which is undefined in C11 (but consistent with C11, otherwise the OCaml garbage collector could never work). That is, those axiomatized external functions are consistent with CompCert, gcc, and clang; but you won't need any of them, because the only operations you're doing on int_or_ptr are the perfectly legal "comparison with NULL" and "cast to integer" or "cast to struct foo *".

How do I cache hash codes for an AST?

I am working on a language in F#, and in testing I find that the runtime spends over 90% of its time comparing for equality. Because of that, the language is so slow as to be unusable. During instrumentation, GetHashCode shows up fairly high on the list of sources of overhead. What is going on is that during method calls, I am using method bodies (Expr) along with the call arguments as keys in a dictionary, and that triggers repeated traversals over the AST segments.
To improve performance I'd like to add memoization nodes in the AST.
type Expr =
    | Add of Expr * Expr
    | Lit of int
    | HashNode of int * Expr
In the above simplified example, what I would like is that the HashNode represent the hash of its Expr, so that the GetHashCode does not have to travel any deeper in the AST in order to calculate it.
That said, I am not sure how I should override the GetHashCode method. Ideally, I'd like to reuse the built-in hash function and make it ignore only the HashNode somehow, but I am not sure how to do that.
More likely, I am going to have to make my own hash function, but unfortunately I know nothing about hash functions so I am a bit lost right now.
An alternative idea I have would be to replace nodes with unique IDs while keeping the hash function as it is, but that would introduce additional complexity into the code that I'd rather avoid unless I have to.
I needed a similar thing recently in TheGamma (GitHub) where I build a dependency graph (kind of like AST) that gets recreated very often (when you change code in editor and it gets re-parsed), but I have live previews that may take some time to calculate, so I wanted to reuse as much of the previous graph as possible.
The way I'm doing that is that I attach a "symbol" to each node. Two nodes with the same symbol are equal, which I think you could use for efficient equality testing:
type Expr =
    | Add of ExprNode * ExprNode
    | Lit of int

and ExprNode(expr:Expr, symbol:int) =
    member x.Expression = expr
    member x.Symbol = symbol
    override x.GetHashCode() = symbol
    override x.Equals(y) =
        match y with
        | :? ExprNode as y -> y.Symbol = x.Symbol
        | _ -> false
I do keep a cache of nodes: the key is a code for the node kind (0 for Lit, 1 for Add, etc.) followed by the symbols of all nested nodes. For literals, I also add the number itself, which means that creating the same literal twice gives you the same node. So creating a node looks like this:
let node expr ctx =
    // Get the key from the kind of the expression
    // and the symbols of all nested nodes in this expression
    let key =
        match expr with
        | Lit n -> [0; n]
        | Add(e1, e2) -> [1; e1.Symbol; e2.Symbol]
    // Return either a node from the cache or create a new one
    match ListDictionary.tryFind key ctx with
    | Some res -> res
    | None ->
        let res = ExprNode(expr, nextId())
        ListDictionary.set key res ctx
        res
The ListDictionary module is a mutable dictionary where the key is a list of integers and nextId is the usual function to generate next ID:
open System.Collections.Generic

type ListDictionaryNode<'K, 'T> =
    { mutable Result : 'T option
      Nested : Dictionary<'K, ListDictionaryNode<'K, 'T>> }

type ListDictionary<'K, 'V> = Dictionary<'K, ListDictionaryNode<'K, 'V>>

[<CompilationRepresentation(CompilationRepresentationFlags.ModuleSuffix)>]
module ListDictionary =
    // Walks the trie one key element at a time; returns the stored value, if any
    let tryFind ks dict =
        let rec loop ks node =
            match ks, node with
            | [], { Result = Some r } -> Some r
            | k::ks, { Nested = d } when d.ContainsKey k -> loop ks (d.[k])
            | _ -> None
        loop ks { Nested = dict; Result = None }

    // Creates intermediate nodes as needed and stores v under the full key
    let set ks v dict =
        let rec loop ks (dict:ListDictionary<_, _>) =
            match ks with
            | [] -> failwith "Empty key not supported"
            | k::ks ->
                if not (dict.ContainsKey k) then
                    dict.[k] <- { Nested = Dictionary<_, _>(); Result = None }
                if List.isEmpty ks then dict.[k].Result <- Some v
                else loop ks (dict.[k].Nested)
        loop ks dict

let nextId =
    let mutable id = 0
    fun () -> id <- id + 1; id
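A condensed, self-contained sketch of the same interning idea, handy for experimenting. It replaces the ListDictionary with an ordinary Dictionary keyed by int list (F# lists compare and hash structurally, so this works, at the cost of hashing the short key on each lookup) and wraps the cache and counter in a hypothetical NodeCache class instead of the ctx parameter and the global nextId.

```fsharp
open System.Collections.Generic

type Expr =
    | Add of ExprNode * ExprNode
    | Lit of int

and ExprNode(expr : Expr, symbol : int) =
    member x.Expression = expr
    member x.Symbol = symbol
    override x.GetHashCode() = symbol
    override x.Equals(y) =
        match y with
        | :? ExprNode as y -> y.Symbol = x.Symbol
        | _ -> false

/// Hypothetical interning context: a plain Dictionary keyed by int list
/// stands in for ListDictionary, and a counter field replaces nextId.
type NodeCache() =
    let cache = Dictionary<int list, ExprNode>()
    let mutable lastId = 0

    member x.Node(expr : Expr) =
        // Same key scheme as above: case code plus literal value or child symbols
        let key =
            match expr with
            | Lit n -> [ 0; n ]
            | Add (e1, e2) -> [ 1; e1.Symbol; e2.Symbol ]
        match cache.TryGetValue(key) with
        | true, res -> res
        | false, _ ->
            lastId <- lastId + 1
            let res = ExprNode(expr, lastId)
            cache.[key] <- res
            res

let ctx = NodeCache()
let one = ctx.Node(Lit 1)
let oneAgain = ctx.Node(Lit 1)
printfn "%b" (obj.ReferenceEquals(one, oneAgain)) // prints "true"
```

Because structurally identical subtrees share a node, comparing two nodes is a single integer comparison regardless of tree depth.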
So, I guess I'm saying that you'll need to implement your own caching mechanism, but this worked quite well for me and may hint at how to do this in your case!
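A lighter alternative, closer to the HashNode idea in the question, is to compute each node's hash once at construction from its children's already-cached hashes, so GetHashCode is O(1) and Equals can reject most mismatches by comparing hashes first. This is a different technique from the symbol interning above, sketched with a hypothetical smart constructor add; equality still recurses when two trees really are equal (or their hashes collide).

```fsharp
open System.Collections.Generic

/// Each Add carries its precomputed hash as the third field.
[<CustomEquality; NoComparison>]
type Expr =
    | Add of Expr * Expr * int
    | Lit of int

    member e.CachedHash =
        match e with
        | Add (_, _, h) -> h
        | Lit n -> hash n

    /// One level of structural comparison; children are compared with `=`,
    /// which calls back into Equals (and its cheap hash check).
    member e.SameShape(o : Expr) =
        match e, o with
        | Lit a, Lit b -> a = b
        | Add (a1, a2, _), Add (b1, b2, _) -> a1 = b1 && a2 = b2
        | _ -> false

    override e.GetHashCode() = e.CachedHash

    override e.Equals(other) =
        match other with
        | :? Expr as o ->
            obj.ReferenceEquals(e, o)
            || (e.CachedHash = o.CachedHash && e.SameShape(o))
        | _ -> false

/// Hypothetical smart constructor: always build Add through this so the
/// cached hash is right. The constants 17 and 31 are an arbitrary choice.
let add (a : Expr) (b : Expr) =
    Add (a, b, (17 * 31 + a.CachedHash) * 31 + b.CachedHash)

let cache = Dictionary<Expr, string>()
cache.[add (Lit 1) (Lit 2)] <- "cached"
printfn "%s" cache.[add (Lit 1) (Lit 2)] // prints "cached"
```

The trade-off versus interning is that no global cache is needed, but equal trees are still compared node by node on a hash match.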

Scala closures on wikipedia

Found the following snippet on the Closure page on Wikipedia:
//# Return a list of all books with at least 'threshold' copies sold.
def bestSellingBooks(threshold: Int) = bookList.filter(book => book.sales >= threshold)
//# or
def bestSellingBooks(threshold: Int) = bookList.filter(_.sales >= threshold)
Correct me if I'm wrong, but this isn't a closure? It is a function literal, an anonymous function, a lambda function, but not a closure?
Well... if you want to be technical, this is a function literal which is translated at runtime into a closure, closing the open terms (binding them to a val/var in the scope of the function literal). Also, in the context of this function literal (_.sales >= threshold), threshold is a free variable, as the function literal itself doesn't give it any meaning. By itself, _.sales >= threshold is an open term. At runtime, it is bound to the local variable of the function each time the function is called.
Take this function for example, generating closures:
def makeIncrementer(inc: Int): (Int => Int) = (x: Int) => x + inc
At runtime, the following code produces 3 closures. It's also interesting to note that b and c are not the same closure (b == c gives false).
val a = makeIncrementer(10)
val b = makeIncrementer(20)
val c = makeIncrementer(20)
I still think the example given on Wikipedia is a good one, albeit not quite covering the whole story. It's quite hard to give an example of actual closures by the strictest definition without an actual memory dump of a running program. It's the same with the class-object relation: you usually give an example of an object by defining a class Foo { ... and then instantiating it with val f = new Foo, saying that f is the object.
-- Flaviu Cipcigan
Notes:
Reference: Programming in Scala, Martin Odersky, Lex Spoon, Bill Venners
Code compiled with Scala version 2.7.5.final running on Java 1.6.0_14.
I'm not entirely sure, but I think you're right. Doesn't a closure require state (I guess free variables...)?
Or maybe the bookList is the free variable?
As far as I understand, this is a closure that captures the formal parameter threshold and the context variable bookList from the enclosing scope. So the return value (List[Any]) of the function may change while applying the filter predicate function: it varies with the elements of the bookList variable from the context.