Ocaml error with if statement - hash

I have a list of lists, e.g. [[1;2;3]; [2]; [3;4;5;6]; [7;8;9;10]].
I want to place these in a Hashtbl where the key is the length of the list and the value is list of lists, which contains all sublists of the given length.
So for the example above, the hash table will look as follows:
Key Value
1 [[2]]
3 [[1;2;3]]
4 [[3;4;5;6];[7;8;9;10]]
In addition, I am trying to keep track of the length of the longest list, and that number is what the function returns.
The code that does this is as follows:
let hashify lst =
  let hash = Hashtbl.create 123456 in
  let rec collector curmax lst =
    match lst with
      [] -> curmax
    | h::t -> let len = (List.length h) in
        (if ((Hashtbl.mem hash len)=true)
        then ( let v = (Hashtbl.find hash len) in Hashtbl.add hash len v@[h] ) (* Line 660 *)
        else ( Hashtbl.add hash len [h]));
        (collector (max len curmax) t)
  in
  collector 0 lst
;;
Now when I do this I get the following error for the code above
File "all_code.ml", line 600, characters 50-72:
Error: This expression has type unit but an expression was expected of type
'a list
Why does OCaml require a return type of 'a list, and how do I fix this?
Thanks in advance
Puneet

You should probably add parentheses, as in (v@[h]), to avoid having it parsed as (Hashtbl.add hash len v)@[h].
Also, you probably should not pass 123456 to Hashtbl.create, but a reasonable prime number like 307 or 2017.

You are almost there: @ has lower precedence than function application, and thus, as Basil said, Hashtbl.add hash len v@[h] is parsed as (Hashtbl.add hash len v)@[h]. Moreover, you are using far too many parentheses, and if ((Hashtbl.mem hash len)=true) is unnecessarily verbose. So a possible good way to write your function is:
let hashify lst =
  let hash = Hashtbl.create 307 in
  let rec collector curmax = function
    | [] -> curmax
    | h::t ->
      let len = List.length h in
      if Hashtbl.mem hash len then
        let v = Hashtbl.find hash len in
        Hashtbl.add hash len (v@[h])
      else
        Hashtbl.add hash len [h];
      collector (max len curmax) t in
  collector 0 lst

Most heavy work on hash tables in OCaml hugely benefits from an update primitive. There are actually two versions, which do different things depending on whether the value exists in the table. This is the one you wish to use:
(* Binds a value to the key if none is already present, and then updates
   it by applying the provided map function and returns the new value. *)
let update hashtbl key default func =
  let value = try Hashtbl.find hashtbl key with Not_found -> default in
  let value' = func value in
  Hashtbl.remove hashtbl key ; Hashtbl.add hashtbl key value' ; value'
With these primitives, it becomes simple to manage a hashtable-of-lists:
let prepend hashtbl key item =
  update hashtbl key [] (fun list -> item :: list)
From there, traversing a list and appending everything to a hashtable is quite simple:
let hashify lst =
  let hash = Hashtbl.create 607 in
  List.fold_left (fun acc list ->
    let l = List.length list in
    let _ = prepend hash l list in
    max acc l
  ) 0 lst


Function generation with arbitrary signature - revisited

I am resubmitting a question asked almost a decade ago on this site (link), because the original is not as generic as I would like.
What I am hoping for is a way to construct a function from a list of types, where the final output type can have an arbitrary/default value (such as 0.0 for a float, or "" for a string). So, from
[float; int; float;]
I would get something that amounts to
fun (f: float) ->
    fun (i: int) ->
        0.0
I am hopeful of achieving this, but am so far unable to. It would be helping me out a lot if I could see a sample that does the above.
The answer in the above link goes some of the way, but the example seems to know its function signature at compile time, which I won't, and also generates a compiler warning.
The scenario I have, for those that find context helpful, is that I want to be able to open a dll and one way or another identify a method which will have a given signature with argument-types limited to a known set of types (i.e. float, int). For each input parameter in this function signature I will run code to generate a 'buffer' object, which will have
a buffer of data items of the given type, i.e. [1.2; 3.2; 4.5]
a supplier of that data type (supplies may be intermittent so the receiving buffer may be empty at any one time)
a generator function that transforms data items before being dispatched. This function can be updated at any time.
a dispatch function. The dispatch target of bufferA will be bufferB, and for bufferB it will be a pub-sub thing where subscribers can subscribe to the end result of the calculation, in this case a stream of floats. Data accumulates in applicative style down the chain of buffers, until the final result is published as a new stream.
a regulator that turns the stream of data heading out to the consumer on or off. This ensures orderly function application.
The function from the dll will eventually be given to BufferA to apply to a float and pass the result on to buffer B (to pick up an int). However, while setting up the buffer infrastructure I only need a function with the correct signature, so a dummy value, such as 0.0, is fine.
For a function of a known signature I can handcraft the code that creates the necessary infrastructure, but I would like to be able to automate this, and ideally register dlls and have new calculated streams available plugin-style without rebuilding the application.
If you're willing to throw type safety out the window, you could do this:
let rec makeFunction = function
    | ["int"] -> box 0
    | ["float"] -> box 0.0
    | ["string"] -> box ""
    | "int" :: types ->
        box (fun (_ : int) -> makeFunction types)
    | "float" :: types ->
        box (fun (_ : float) -> makeFunction types)
    | "string" :: types ->
        box (fun (_ : string) -> makeFunction types)
    | _ -> failwith "Unexpected"
Here's a helper function for invoking one of these monstrosities:
let rec invokeFunction types (values : List<obj>) (f : obj) =
    match types, values with
    | [_], [] -> f
    | ("int" :: types'), (value :: values') ->
        let f' = f :?> (int -> obj)
        let value' = value :?> int
        invokeFunction types' values' (f' value')
    | ("float" :: types'), (value :: values') ->
        let f' = f :?> (float -> obj)
        let value' = value :?> float
        invokeFunction types' values' (f' value')
    | ("string" :: types'), (value :: values') ->
        let f' = f :?> (string -> obj)
        let value' = value :?> string
        invokeFunction types' values' (f' value')
    | _ -> failwith "Unexpected"
And here it is in action:
let types = ["int"; "float"; "string"] // int -> float -> string
let f = makeFunction types
let values = [box 1; box 2.0]
let result = invokeFunction types values f
printfn "%A" result // output: ""
Caveat: This is not something I would ever recommend in a million years, but it works.
I got 90% of what I needed from a blog post by James Randall entitled "compiling and executing fsharp dynamically at runtime". I was unable to avoid concretely specifying the top-level function signature, but a workaround was to generate an fsx script file containing that signature (determined from the relevant MethodInfo in the inspected dll), then load and run that script. James's blog and GitHub repository also describe loading and running functions contained in script files. Having obtained the curried function from the dll, I then apply it to default arguments to get representative functions of arity n-1 using
let p1: 'p1 = Activator.CreateInstance(typeof<'p1>) :?> 'p1
let fArity2 = fArity3 p1
Creating and running a script file is slow, of course, but I only need to do this once, when setting up the calculation stream.
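To make the default-argument step concrete, here is a minimal sketch. It assumes the parameter types are value types such as float and int, since Activator.CreateInstance throws for types without a parameterless constructor (string, for example). The fArity3 function and the defaultOf helper are hypothetical stand-ins, not the function loaded from the dll.

```fsharp
open System

// Hypothetical stand-in for the curried function obtained from the dll.
let fArity3 (x: float) (n: int) (y: float) : float = x * float n + y

// Builds a default instance of a value type via Activator.CreateInstance.
// The struct constraint is an assumption made for this sketch.
let defaultOf<'p when 'p : struct> () : 'p =
    Activator.CreateInstance(typeof<'p>) :?> 'p

let p1 : float = defaultOf ()
let fArity2 = fArity3 p1   // int -> float -> float
let p2 : int = defaultOf ()
let fArity1 = fArity2 p2   // float -> float

// 0.0 * float 0 + 1.5 = 1.5
printfn "%b" (fArity1 1.5 = 1.5)
```

Each partial application peels off one parameter, so the buffer for the next argument can be set up against a function of the right shape.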

How can I read system input in Swift easily

I'm a beginner in Swift and am having a hard time dealing with Swift's String.
I think it differs from other languages in many ways.
So, can somebody tell me why this code is incorrect?
I want to read a line and store each integer on it in the variables n and l,
as in C: scanf("%d %d", &n, &l);
var n, l : Int?
var read : String = readLine()!
n = Int(read[read.startIndex])
l = read[read.index(read.startIndex, offsetBy : 2)]
The best way to handle input for a cli tool in Swift is probably by using the official ArgumentParser library.
But a super naive implementation would involve something like:
Read the input
Split it using spaces
Try to parse into Ints
The following example is of course not something that could be used for anything other than learning...:
print("Please input 2 numbers separated by space:")
let read = readLine()
if let inputs = read?.split(separator: " ") // Split using space
    .map(String.init) // Convert substring to string
    .compactMap(Int.init), // Try to convert to Ints (get rid of nils)
   inputs.count > 1 { // Ensure that we got at least 2 elements
    let (n, l) = (inputs[0], inputs[1])
    print(n, l)
} else {
    // Handle the case
}

How do I cache hash codes for an AST?

I am working on a language in F# and upon testing, I find that the runtime spends over 90% of its time comparing for equality. Because of that the language is so slow as to be unusable. During instrumentation, the GetHashCode function shows fairly high up on the list as a source of overhead. What is going on is that during method calls, I am using method bodies (Expr) along with the call arguments as keys in a dictionary and that triggers repeated traversals over the AST segments.
To improve performance I'd like to add memoization nodes in the AST.
type Expr =
  | Add of Expr * Expr
  | Lit of int
  | HashNode of int * Expr
In the simplified example above, I would like the HashNode to hold the hash of its Expr, so that GetHashCode does not have to travel any deeper into the AST to calculate it.
That said, I am not sure how I should override the GetHashCode method. Ideally, I'd like to reuse the built-in hash method and somehow make it ignore only the HashNode, but I am not sure how to do that.
More likely, I am going to have to make my own hash function, but unfortunately I know nothing about hash functions so I am a bit lost right now.
An alternative idea that I have would be to replace nodes with unique IDs while keeping that hash function as it is, but that would introduce additional complexities into the code that I'd rather avoid unless I have to.
I needed a similar thing recently in TheGamma (GitHub) where I build a dependency graph (kind of like AST) that gets recreated very often (when you change code in editor and it gets re-parsed), but I have live previews that may take some time to calculate, so I wanted to reuse as much of the previous graph as possible.
The way I'm doing that is that I attach a "symbol" to each node. Two nodes with the same symbol are equal, which I think you could use for efficient equality testing:
type Expr =
  | Add of ExprNode * ExprNode
  | Lit of int
and ExprNode(expr:Expr, symbol:int) =
  member x.Expression = expr
  member x.Symbol = symbol
  override x.GetHashCode() = symbol
  override x.Equals(y) =
    match y with
    | :? ExprNode as y -> y.Symbol = x.Symbol
    | _ -> false
I do keep a cache of nodes - the key is some code of the node kind (0 for Add, 1 for Lit, etc.) and symbols of all nested nodes. For literals, I also add the number itself, which will mean that creating the same literal twice will give you the same node. So creating a node looks like this:
let node expr ctx =
  // Get the key from the kind of the expression
  // and symbols of all nested node in this expression
  let key =
    match expr with
    | Lit n -> [0; n]
    | Add(e1, e2) -> [1; e1.Symbol; e2.Symbol]
  // Return either a node from cache or create a new one
  match ListDictionary.tryFind key ctx with
  | Some res -> res
  | None ->
    let res = ExprNode(expr, nextId())
    ListDictionary.set key res ctx
    res
The ListDictionary module is a mutable dictionary where the key is a list of integers and nextId is the usual function to generate next ID:
type ListDictionaryNode<'K, 'T> =
  { mutable Result : 'T option
    Nested : Dictionary<'K, ListDictionaryNode<'K, 'T>> }
type ListDictionary<'K, 'V> = Dictionary<'K, ListDictionaryNode<'K, 'V>>
[<CompilationRepresentation(CompilationRepresentationFlags.ModuleSuffix)>]
module ListDictionary =
  let tryFind ks dict =
    let rec loop ks node =
      match ks, node with
      | [], { Result = Some r } -> Some r
      | k::ks, { Nested = d } when d.ContainsKey k -> loop ks (d.[k])
      | _ -> None
    loop ks { Nested = dict; Result = None }
  let set ks v dict =
    let rec loop ks (dict:ListDictionary<_, _>) =
      match ks with
      | [] -> failwith "Empty key not supported"
      | k::ks ->
        if not (dict.ContainsKey k) then
          dict.[k] <- { Nested = Dictionary<_, _>(); Result = None }
        if List.isEmpty ks then dict.[k].Result <- Some v
        else loop ks (dict.[k].Nested)
    loop ks dict
let nextId =
  let mutable id = 0
  fun () -> id <- id + 1; id
So, I guess I'm saying that you'll need to implement your own caching mechanism, but this worked quite well for me and may hint at how to do this in your case!
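To see the effect end to end, here is a self-contained sketch of the same idea. As a simplifying assumption it swaps the nested ListDictionary for a flat Dictionary keyed by int list, and it uses a module-level cache and counter (cache and lastSymbol are names made up for this sketch) instead of passing ctx around.

```fsharp
open System.Collections.Generic

type Expr =
    | Add of ExprNode * ExprNode
    | Lit of int

and ExprNode(expr: Expr, symbol: int) =
    member x.Expression = expr
    member x.Symbol = symbol
    override x.GetHashCode() = symbol
    override x.Equals(y: obj) =
        match y with
        | :? ExprNode as other -> other.Symbol = x.Symbol
        | _ -> false

// Assumption: a flat Dictionary keyed by int list stands in for ListDictionary.
// F# lists compare structurally, so they work as dictionary keys.
let cache = Dictionary<int list, ExprNode>()
let mutable lastSymbol = 0

let node (expr: Expr) : ExprNode =
    // Key = kind of expression plus symbols (or literal value) of its parts
    let key =
        match expr with
        | Lit n -> [0; n]
        | Add(e1, e2) -> [1; e1.Symbol; e2.Symbol]
    match cache.TryGetValue(key) with
    | true, res -> res
    | _ ->
        lastSymbol <- lastSymbol + 1
        let res = ExprNode(expr, lastSymbol)
        cache.[key] <- res
        res

let a = node (Add(node (Lit 1), node (Lit 2)))
let b = node (Add(node (Lit 1), node (Lit 2)))
let c = node (Add(node (Lit 2), node (Lit 1)))

printfn "%b" (a = b)   // true: same structure, same cached node
printfn "%b" (a = c)   // false: operands are swapped
```

Building the same tree twice returns the cached node, so both equality and hashing cost a single integer comparison no matter how deep the tree is.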

How to prevent this Cyclic polynomial hash function from using a type constraint?

I am trying to implement the Cyclic polynomial hash function in F#. It uses the bit-wise operators ^^^ and <<<. Here is an example of a function that hashes an array:
let createBuzhash (pattern : array<'a>) =
    let n = pattern.Length
    let rec loop index pow acc =
        if index < n then
            loop (index+1) (pow-1) (acc ^^^ ((int pattern.[index]) <<< pow))
        else
            acc
    loop 0 (n-1) 0
My problem is that the type of 'a will be constrained to int, while I want this function to work with any of the types that work with bit-wise operators, for example a char. I tried using inline, but that creates some problems farther down in my library. Is there a way to fix this without using inline?
Edit for clarity: The function will be part of a library, and another hash function is provided for types that don't support the bit-wise operators. I want this function to work with arrays of numeric types and/or chars.
Edit 2 (problem solved): The problem with inline was the way I loaded the function from my library. Instead of
let hashedPattern = library.createBuzhash targetPattern
I used this binding:
let myFunction = library.createBuzhash
let hashedPattern = myFunction targetPattern
This constrains the input type of myFunction to int, even though createBuzhash is an inline function in the library. Changing the way I call the function fixed the type constraint problem, and inline works perfectly fine, as the answer below suggests.
In the implementation, you are converting the value in the array to an Integer using the int function as follows: int pattern.[index]
This creates a constraint on the type of array elements requiring them to be "something that can be converted to int". If you mark the function as inline, it will actually work for types like char and you'll be able to write:
createBuzhash [|'a'; 'b'|]
But there are still many other types that cannot be converted to integer using the int function.
To make this work for any type, you have to decide how you want to handle types that are not numeric. Do you want to:
Provide your own hashing function for all values?
Use the built-in .NET GetHashCode operation?
Only make your function work on numeric types and arrays of numeric types?
One option would be to add a parameter that specifies how to do the conversion:
let inline createBuzhash conv (pattern : array<'a>) =
    let n = pattern.Length
    let rec loop index pow acc =
        if index < pattern.Length then
            loop (index+1) (pow-1) (acc ^^^ ((conv pattern.[index]) <<< pow))
        else
            acc
    loop 0 (n-1) 0
When calling createBuzhash, you now need to give it a function for hashing the elements. This works on primitive types using the int function:
createBuzhash int [| 0 .. 10 |]
createBuzhash int [|'a'; 'b'|]
But you can also use built-in F# hashing mechanism:
createBuzhash hash [| (1,"foo"); (2,"bar") |]
And you can even handle nested arrays by passing the function to itself:
createBuzhash (createBuzhash int) [| [| 1 |]; [| 2 |] |]
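Since the question asks about avoiding inline: once the conversion is passed in as a parameter, the function no longer seems to need inline, because the bit-wise operators are then only ever applied to int. The following is a minimal sketch under that assumption (conv always returns int). The hashChars wrapper is a hypothetical name, added to show how to bind a specialised version without the element type defaulting to int.

```fsharp
// No inline needed: conv fixes the element type at each call site,
// and ^^^ and <<< are only ever applied to int values.
let createBuzhash (conv: 'a -> int) (pattern: 'a[]) : int =
    let n = pattern.Length
    let rec loop index pow acc =
        if index < n then
            loop (index + 1) (pow - 1) (acc ^^^ ((conv pattern.[index]) <<< pow))
        else
            acc
    loop 0 (n - 1) 0

// Writing the argument explicitly keeps the wrapper's element type
// from being defaulted to int (the problem described in Edit 2).
let hashChars (pattern: char[]) = createBuzhash int pattern

printfn "%d" (hashChars [| 'a'; 'b' |])        // (97 <<< 1) ^^^ 98 = 160
printfn "%d" (createBuzhash int [| 1; 2; 3 |]) // (1 <<< 2) ^^^ (2 <<< 1) ^^^ 3 = 3
```

The trade-off is that every caller has to supply conv, whereas the inline version of the original function can pick the conversion from the element type.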

MiniZinc: type error: expected `array[int] of int', actual `array[int] of var opt int'

I am trying to write a predicate that performs the same operation as circuit, but ignores zeros in the array, and I keep getting the following error:
MiniZinc: type error: initialisation value for 'x_without_0' has invalid type-inst: expected 'array[int] of int', actual 'array[int] of var opt int'
in the code:
% [0,5,2,0,7,0,3,0] -> true
% [0,5,2,0,4,0,3,0] -> false (no circuit)
% [0,5,2,0,3,0,8,7] -> false (two circuits)
predicate circuit_ignoring_0(array[int] of var int: x) =
  let {
    array[int] of int: x_without_0 = [x[i] | i in 1..length(x) where x[i] != 0],
    int: lbx = min(x_without_0),
    int: ubx = max(x_without_0),
    int: len = length(x_without_0),
    array[1..len] of var lbx..ubx: order
  } in
  alldifferent(x_without_0) /\
  alldifferent(order) /\
  order[1] = x_without_0[1] /\
  forall(i in 2..len) (
    order[i] = x_without_0[order[i-1]]
  )
  /\ % last value is the minimum (symmetry breaking)
  order[ubx] = lbx
;
I am using MiniZinc v2.0.11
Edit
Per Kobbe's suggestion that it was an issue with having a variable length array, I used "the usual workaround" of keeping the order array the same size as the original array x, and using a parameter, nnonzeros, to keep track of the part of the array I care about:
set of int: S = index_set(x),
int: u = max(S),
var int: nnonzeros = among(x, S),
array[S] of var 0..u: order
This kind of answers your question:
The problem you are experiencing is that your array size depends on a var. This means that MiniZinc cannot know the size of the array it should create, so the opt type is used. I would suggest that you stay away from the opt type if you do not know how to handle it.
Generally, the solution is to find a workaround in which your arrays do not depend on the size of a var. My solution is most often to pad the array, i.e. [2,0,5,0,8] -> [2,2,5,5,8], if the application allows it, or
var int : a;
[i * bool2int(i == a) | i in 1..5]
if you are okay with zeroes in your answer (I guess not in this case).
Furthermore, alldifferent_except_0 could be of interest to you, or at least you can look at how alldifferent_except_0 handles zeroes in the answer.
predicate alldifferent_except_0(array [int] of var int: vs) =
  forall ( i, j in index_set(vs) where i < j ) (
    vs[i]!=0 /\ vs[j]!=0 -> vs[i]!=vs[j]
  )
from MiniZinc documentation