Ways to make operator `<` work on custom types

I have this type:
type 'a edge = {from: 'a; destination: 'a; weight: int}
and I want the following to print true:
Printf.printf "%b\n"
  ( {from= 0; destination= 8; weight= 7}
  < {from= 100; destination= 33; weight= -1} )
So I tried this:
let ( < ) {weight= wa} {weight= wb} = wa < wb
But after this definition the < operator only works on 'a edge, which means that 1 < 2 no longer type-checks.
The reason I want this is the leftist tree I wrote:
type 'a leftist = Leaf | Node of 'a leftist * 'a * 'a leftist * int
let rank t = match t with Leaf -> 0 | Node (_, _, _, r) -> r
let is_empty t = rank t = 0
let rec merge t1 t2 =
  match (t1, t2) with
  | Leaf, _ -> t2
  | _, Leaf -> t1
  | Node (t1l, v1, t1r, r1), Node (t2l, v2, t2r, r2) ->
      if v1 > v2 then merge t2 t1
      else
        let next = merge t1r t2 in
        let rt1l = rank t1l and rn = rank next in
        if rt1l < rn then Node (next, v1, t1l, rn + 1)
        else Node (t1l, v1, next, rt1l + 1)
let insert v t = merge t (Node (Leaf, v, Leaf, 1))
let peek t = match t with Leaf -> None | Node (_, v, _, _) -> Some v
let pop t = match t with Leaf -> Leaf | Node (l, _, r, _) -> merge l r
If I cannot make < work the way I expect, I have to pass a compare function into the tree and substitute it wherever < is used, which I find unattractive.

OCaml does not support ad hoc polymorphism, but you can put the custom operator in a module and open that module locally, only where you need it:
module Infix =
struct
  let ( > ) = ...
end
...
if Infix.(v1 > v2) then ...
That way, the custom > applies only inside Infix.( ... ), and > still refers to Pervasives.(>) (Stdlib.(>) in current OCaml) outside it.
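For comparison, the fallback the question mentions (passing the comparison into the heap) can be sketched in Python. This is only an illustration under assumptions, not the OCaml answer: the `key` parameter and the `Node` tuple are names chosen here, and the rank stored is the textbook one (right child's rank plus one).

```python
from typing import Any, Callable, NamedTuple, Optional

class Node(NamedTuple):
    left: Optional["Node"]
    value: Any
    right: Optional["Node"]
    rank: int

def rank(t: Optional[Node]) -> int:
    # An empty tree (None) has rank 0.
    return 0 if t is None else t.rank

def merge(t1: Optional[Node], t2: Optional[Node], key: Callable[[Any], Any]) -> Optional[Node]:
    if t1 is None:
        return t2
    if t2 is None:
        return t1
    # Keep the smaller root (according to key) on top.
    if key(t1.value) > key(t2.value):
        t1, t2 = t2, t1
    merged = merge(t1.right, t2, key)
    # The child with the larger rank goes on the left.
    if rank(t1.left) < rank(merged):
        return Node(merged, t1.value, t1.left, rank(t1.left) + 1)
    return Node(t1.left, t1.value, merged, rank(merged) + 1)

def insert(v: Any, t: Optional[Node], key: Callable[[Any], Any]) -> Optional[Node]:
    return merge(t, Node(None, v, None, 1), key)

def pop(t: Optional[Node], key: Callable[[Any], Any]) -> Optional[Node]:
    return None if t is None else merge(t.left, t.right, key)
```

The comparison is supplied once per call as `key`, for example `lambda e: e["weight"]` for edge records, so the heap code itself never needs a redefined `<`.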


Min-cost flow modification for arcs with fixed cost?

I have a min-cost flow network in which some arcs have a fixed charge, that is, if arc k has non-zero flow x_k, then the cost is c_k, independent of the amount of flow. A flow of 0 incurs 0 cost. These arcs do not have capacity constraints.
I know how to model this as a mixed integer program (MIP): Add a 0/1 variable y_k with cost c_k. Set the capacity on arc k to M * y_k, where M is larger than the sum of all supplies. So the fixed cost is incurred if and only if the arc has flow.
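Written out, the MIP described above might look like the following sketch. The symbols $F$ (set of fixed-charge arcs), $b_v$ (net supply at node $v$), $u_a$ (capacity of an ordinary arc), and $\delta^{+}(v)$, $\delta^{-}(v)$ (arcs leaving and entering $v$) are notation assumed here, not taken from the question.

```latex
\begin{aligned}
\min\ & \sum_{k \in F} c_k\, y_k + \sum_{a \notin F} c_a\, x_a \\
\text{s.t.}\ & \sum_{a \in \delta^{+}(v)} x_a - \sum_{a \in \delta^{-}(v)} x_a = b_v && \forall v \\
& 0 \le x_k \le M\, y_k, \quad y_k \in \{0, 1\} && \forall k \in F \\
& 0 \le x_a \le u_a && \forall a \notin F
\end{aligned}
```

With $M$ larger than the total supply, $x_k > 0$ forces $y_k = 1$, and minimisation drives $y_k$ to 0 whenever $x_k = 0$ and $c_k > 0$.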
Can this be solved using a min-cost flow formulation, which would be more efficient than a general MIP implementation? Does OR-Tools (or any other package) have an extension to min-cost flow that accommodates this?
Cross-posted to the Google OR-Tools list.
Thanks,
Hershel
I'm not sure that I understand you (most likely due to my ignorance), and you may well get a better response from the OR-Tools list than here.
However, I think there may be a way of doing what you ask as a circuit, via AddCircuit().
Essentially, I believe one can minimise (or maximise) the number of selected arcs that are marked as having a cost.
Here is an example using the AddCircuit constraint, where one outgoing arc from each node has a fixed cost.
from ortools.sat.python import cp_model


class DiGraphSolver:
    def __init__(self, desc):
        self.model = cp_model.CpModel()
        self.status = cp_model.UNKNOWN
        self.timing = None
        # AddCircuit needs a numeric index for each node.
        # Here are two lazy key->index / index->key lookups.
        self.keys = {k: i for i, k in enumerate(desc.nodes.keys())}
        self.revs = {i: k for k, i in self.keys.items()}
        # Determine the start and stop nodes.
        self.start = self.keys[desc.start]
        self.stop = self.keys[desc.stop]
        # Store the nodes dict in its indexed form.
        self.nodes = {self.keys[head]: [self.keys[t] for t in tails] for head, tails in desc.nodes.items()}
        self.heavies = [(self.keys[head], self.keys[tail]) for head, tail in desc.heavies.items()]
        self.arcs = []
        self.vars = []
        self.result = []
        self.heavy_arcs = []
        self.weight = 0
        self.step_count = 0  # filled in by store() once a solution is found

    def setup(self):
        self.arcs = [
            (head, tail, self.model.NewBoolVar(f'{head}:{tail}'))
            for head, tails in self.nodes.items() for tail in tails
        ]
        self.heavy_arcs = [arc[2] for arc in self.arcs if arc[:-1] in self.heavies]
        # vars is a list of all the arcs defined in the problem.
        self.vars = [arc[2] for arc in self.arcs]
        # Add self loops for all *optional* nodes (because AddCircuit requires a Hamiltonian circuit).
        # For this example, that's everywhere except for 'start' and 'stop'.
        # We just use the keys of self.revs (the index values).
        loops = [(n, n, self.model.NewBoolVar(f'{n}:{n}')) for n in self.revs if n not in [self.start, self.stop]]
        self.arcs += loops
        # Connect the stop node to the start node with a dummy arc to complete the Hamiltonian circuit.
        # Because start and stop are not self-closing (non-optional), we don't need to set truth values.
        loop = (self.stop, self.start, self.model.NewBoolVar('loop'))
        self.arcs.append(loop)
        # Now add the circuit as a constraint.
        self.model.AddCircuit(self.arcs)
        # Minimise the number of heavy (fixed-cost) arcs used.
        self.model.Minimize(sum(self.heavy_arcs))

    def solve(self) -> bool:
        cp_solver = cp_model.CpSolver()
        cp_solver.parameters.max_time_in_seconds = 1
        cp_solver.parameters.num_search_workers = 12
        self.status = cp_solver.Solve(self.model)
        return self.summarise(cp_solver)

    def summarise(self, cp_solver) -> bool:
        if self.status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
            self.store(cp_solver)
            return True
        if self.status == cp_model.INFEASIBLE:
            print(f"Problem is infeasible after {cp_solver.WallTime()}s.")
        else:
            print("Solver ran out of time.")
        return False

    def store(self, cp_solver):
        self.timing = cp_solver.WallTime()
        used = [arc for arc in self.arcs if cp_solver.Value(arc[2])]
        # Walk the selected arcs from start until the dummy arc brings us back to start.
        arc = None, self.start
        while True:
            arc = next((link for link in used if link[0] == arc[1]), None)
            self.result.append(self.revs[arc[0]])
            if arc[1] == self.start:
                break
        self.weight = cp_solver.ObjectiveValue()
        self.step_count = len(self.result) - 1

    def show(self):
        print(f"{'-'.join(self.result)}")
        print(f'Cost: {self.weight}')


class RandomDigraph:
    """
    Define a problem.
    26 nodes, labelled 'a' ... 'z'
    start at 'a', stop at 'z'
    Each node other than 'z' has 4 outgoing arcs (random but not going to 'a')
    """
    def __init__(self):
        from random import sample, randint
        names = 'abcdefghijklmnopqrstuvwxyz'
        arcs = 4
        self.steps = 1
        self.start = 'a'
        self.stop = 'z'
        but_first = set(names) ^ set(self.start)
        # sample() needs a sequence; passing a set is rejected from Python 3.11 on.
        self.nodes = {v: sample(sorted(but_first - set(v)), arcs) for v in names}
        self.heavies = {v: self.nodes[v][randint(0, arcs - 1)] for v in names if v != self.stop}
        self.nodes[self.stop] = []

    def print_nodes(self):
        for key, value in self.nodes.items():
            vs = [f" {v} " if v != self.heavies[key] else f"*{v}*" for v in value]
            print(f'{key}: {"".join(vs)}')


def solve_with_steps(problem) -> int:
    solver = DiGraphSolver(problem)
    solver.setup()
    if solver.solve():
        solver.show()
        return solver.step_count
    return 0


def solve_az_paths_of_a_random_digraph():
    problem = RandomDigraph()
    problem.print_nodes()
    print()
    solve_with_steps(problem)


if __name__ == '__main__':
    solve_az_paths_of_a_random_digraph()
Example run (solving a..z) gives
# network: (heavy arcs are marked by the tail in **.)
# eg. a->p is a heavy arc.
a: *p* d i l
b: *t* u e y
c: r v *m* q
d: q t *f* l
e: k *o* y i
f: i p z *u*
g: s h i *x*
h: *g* l j d
i: x f e *k*
j: *g* r e p
k: d *c* g q
l: r f j *h*
m: *i* b d r
n: t v y *b*
o: s x q *w*
p: w g *h* n
q: o r *f* p
r: f *c* i m
s: y c w *p*
t: *y* d v i
u: *h* z w n
v: *d* x f t
w: l c *s* r
x: *j* r g m
y: b j *u* c
z:
Solution:
a-i-e-k-g-h-j-p-w-c-q-o-s-y-b-u-n-t-v-x-r-m-d-l-f-z
Cost: 0.0

How to use string.split() without foreach()?

Write a program in Scala that reads a String from the keyboard and counts the occurrences of each character, ignoring whether it is upper or lower case.
ex: Avocado
R: A = 2; v = 1; o = 2; c = 1; d = 1;
So I tried two nested for loops over the string, with a conditional that converts the character at position x to upper case and compares it with the upper-cased character at position y, incrementing a counter on each match, e.g. Ava -> A = 2; v = 1;
But with this logic the printed result is:
ex: Avocado
R: A = 2; v = 1; o = 2; c = 1; a = 2; d = 1; o = 2;
It repeats characters that were already counted, whether upper or lower case.
My teacher asked us to solve this using the split method and yield in Scala, but I don't know how to use split without foreach(), which he does not allow us to use.
Sorry for the bad English.
object ex8 {
  def main(args: Array[String]): Unit = {
    println("Write a string")
    // Predef.readLine() is deprecated since Scala 2.11; use scala.io.StdIn.readLine() instead.
    val string = scala.io.StdIn.readLine()
    var cont = 0
    for (x <- 0 to string.length - 1) {
      for (y <- 0 to string.length - 1) {
        if (string.charAt(x).toUpper == string.charAt(y).toUpper)
          cont += 1
      }
      print(string.charAt(x) + " = " + cont + "; ")
      cont = 0
    }
  }
}
Scala 2.13 has added a very handy method to cover this sort of thing.
inputStr.groupMapReduce(_.toUpper)(_ => 1)(_+_)
.foreach{case (k,v) => println(s"$k = $v")}
//A = 2
//V = 1
//C = 1
//O = 2
//D = 1
It might be easier to group the individual elements of the String (i.e. a collection of Chars, made case-insensitive with toLower) to aggregate their corresponding size using groupBy/mapValues:
"Avocado".groupBy(_.toLower).mapValues(_.size)
// res1: scala.collection.immutable.Map[Char,Int] =
// Map(a -> 2, v -> 1, c -> 1, o -> 2, d -> 1)
Scala 2.11
Tried with classic word count approach of map => group => reduce
val exampleStr = "Avocado R"
exampleStr.
toLowerCase.
trim.
replaceAll(" +","").
toCharArray.map(x => (x,1)).groupBy(_._1).
map(x => (x._1,x._2.length))
Answer :
exampleStr: String = Avocado R
res3: scala.collection.immutable.Map[Char,Int] =
Map(a -> 2, v -> 1, c -> 1, r -> 1, o -> 2, d -> 1)
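The group-then-count idea used in these answers can also be sketched in Python for comparison. This is not one of the Scala answers; the function name `char_counts` is made up here, and skipping whitespace is an assumption borrowed from the last answer's replaceAll step.

```python
from collections import Counter

def char_counts(text: str) -> dict:
    """Count characters case-insensitively, skipping whitespace.

    Keys are lower-cased and appear in first-seen order.
    """
    return dict(Counter(ch for ch in text.lower() if not ch.isspace()))
```

Because every character is folded to lower case before counting, 'A' and 'a' land in the same bucket, which is what the nested-loop version in the question fails to do when printing.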

Kendall tau distance in Scala

Is this a correct implementation of Kendall tau distance in Scala
def distance[A : Ordering](s: Seq[A], t: Seq[A]): Int = {
assert(s.size == t.size, "Both sequences should be of the same length")
s.combinations(2).zip(t.combinations(2)).count {
case (Seq(s1, s2), Seq(t1, t2)) =>
(s1 > s2 && t1 < t2) || (s1 < s2 && t1 > t2)
}
}
The problem is that I do not have enough data to test the algorithm on, only a few examples from Wikipedia, and I do not understand the algorithm well enough to generate my own test data. Most sources cover the Kendall tau rank correlation coefficient, which is a related but different animal. Maybe I could somehow derive one from the other?
For now, let's say that performance is not important.
UPDATE
So now I have three implementations of the Kendall tau distance. Two of them (distance1 and distance3) give identical results (see below). So which one is correct?
import scala.math.Ordering.Implicits._
import scala.util.Random
val permutations = Random.shuffle((0 until 5).permutations).take(100)
println("s\tt\tDist1\tDist2\tDist3")
permutations.sliding(2).foreach { case Seq(s, t) =>
println(s.mkString(",")+"\t"+t.mkString(",")+"\t"+distance1(s, t)+"\t"+distance2(s, t)+
"\t"+distance3(s, t))
}
def distance1[A : Ordering](s: Seq[A], t: Seq[A]): Int = {
assert(s.size == t.size, "Both sequences should be of the same length")
s.combinations(2).zip(t.combinations(2)).count { case (Seq(s1, s2), Seq(t1, t2)) =>
(s1 > s2 && t1 < t2) || (s1 < s2 && t1 > t2)
}
}
def distance2[A](a: Seq[A], b: Seq[A]): Int = {
val aMap = a.zipWithIndex.toMap // map of a items to their ranks
val bMap = b.zipWithIndex.toMap // map of b items to their ranks
a.combinations(2).count{case Seq(i, j) =>
val a1 = aMap.get(i).get // rank of i in A
val a2 = aMap.get(j).get // rank of j in A
val b1 = bMap.get(i).get // rank of i in B
val b2 = bMap.get(j).get // rank of j in B
a1.compare(a2) != b1.compare(b2)
}
}
def distance3(τ_1: Seq[Int], τ_2: Seq[Int]) =
(0 until τ_1.size).map { i =>
(i+1 until τ_2.size).count { j =>
(τ_1(i) < τ_1(j) && τ_2(i) > τ_2(j)) || (τ_1(i) > τ_1(j) && τ_2(i) < τ_2(j))
}
}.sum
And here are some results:
s t Dist1 Dist2 Dist3
3,0,4,2,1 1,4,3,0,2 6 6 6
1,4,3,0,2 0,4,1,2,3 3 5 3
0,4,1,2,3 4,0,1,3,2 8 2 8
4,0,1,3,2 1,2,0,4,3 4 6 4
1,2,0,4,3 2,3,1,4,0 3 5 3
2,3,1,4,0 1,0,3,2,4 8 6 8
1,0,3,2,4 1,3,2,4,0 7 3 7
1,3,2,4,0 4,3,0,1,2 6 6 6
4,3,0,1,2 1,0,2,4,3 7 7 7
1,0,2,4,3 3,4,1,2,0 8 8 8
3,4,1,2,0 1,4,2,0,3 5 5 5
1,4,2,0,3 1,0,3,4,2 8 4 8
I don't think this is quite right. Here's some quickly written code that emphasizes that what you are comparing is the rank of the items in the sequences (you don't really want to keep those get(n).get calls in your code though). I used compare, too, which I think makes sense:
def tauDistance[A](a: Seq[A], b: Seq[A]) = {
val aMap = a.zipWithIndex.toMap // map of a items to their ranks
val bMap = b.zipWithIndex.toMap // map of b items to their ranks
a.combinations(2).count{case Seq(i, j) =>
val a1 = aMap.get(i).get // rank of i in A
val a2 = aMap.get(j).get // rank of j in A
val b1 = bMap.get(i).get // rank of i in B
val b2 = bMap.get(j).get // rank of j in B
a1.compare(a2) != b1.compare(b2)
}
}
So, the Wikipedia defines K on the ranks of the elements like this:
K(τ_1,τ_2) = |{(i,j): i < j, (τ_1(i) < τ_1(j) && τ_2(i) > τ_2(j)) || (τ_1(i) > τ_1(j) && τ_2(i) < τ_2(j))}|
We can implement this pretty directly in Scala, remembering that the inputs are sequences of ranks, not the items themselves:
def K(τ_1: Seq[Int], τ_2: Seq[Int]) =
(0 until τ_1.size).map{i =>
(i+1 until τ_2.size).count{j =>
(τ_1(i) < τ_1(j) && τ_2(i) > τ_2(j)) || (τ_1(i) > τ_1(j) && τ_2(i) < τ_2(j))
}
}.sum
This is actually a bit preferable to the tauDistance function above, since that function assumes all the items are unique (and so will fail if the sequences have duplicates) while this one works on the ranks directly.
Working with combinatoric functions is hard sometimes, and it's often not enough just to have unit tests that pass.
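The two readings of the input discussed in this thread can be put side by side in a small Python sketch. The function names are chosen here for illustration. The first function takes rank vectors (position i is the item, the value is its rank), as K and distance3 do; the second takes the items listed in ranked order, as tauDistance and distance2 do.

```python
from itertools import combinations

def kendall_tau_ranks(r1, r2):
    """Count discordant pairs; r1[i] and r2[i] are the two ranks of item i."""
    return sum(
        1
        for i, j in combinations(range(len(r1)), 2)
        if (r1[i] - r1[j]) * (r2[i] - r2[j]) < 0
    )

def kendall_tau_orderings(a, b):
    """Same count, but a and b list the (unique) items in ranked order."""
    pos_a = {item: idx for idx, item in enumerate(a)}
    pos_b = {item: idx for idx, item in enumerate(b)}
    # Convert each ordering to a rank vector indexed by the items of a.
    return kendall_tau_ranks([pos_a[x] for x in a], [pos_b[x] for x in a])
```

On the Wikipedia example (height ranks 1,2,3,4,5 against weight ranks 3,4,1,2,5) the rank-vector version gives 4. On the row 1,4,3,0,2 / 0,4,1,2,3 from the table above, the two functions give 3 and 5, matching the Dist1/Dist3 and Dist2 columns, so the columns differ because they interpret the same sequences differently.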

println in scala for-comprehension

In a for-comprehension, I can't just put a print statement:
def prod (m: Int) = {
for (a <- 2 to m/(2*3);
print (a + " ");
b <- (a+1) to m/a;
c = (a*b)
if (c < m)) yield c
}
but I can circumvent it easily with a dummy assignment:
def prod (m: Int) = {
for (a <- 2 to m/(2*3);
dummy = print (a + " ");
b <- (a+1) to m/a;
c = (a*b)
if (c < m)) yield c
}
Being a side effect, and only used (so far) in code under development, is there a better ad hoc solution?
Is there a serious problem why I shouldn't use it, beside being a side effect?
Update showing the real code, where adapting one of the solutions is harder than expected:
The discussion with Rex Kerr made it necessary to show the original code. It is a bit more complicated (two .filter calls and a method call at the end), which did not seem relevant to the question, but when I tried to apply Rex's pattern to it I failed, so I am posting it here:
def prod (p: Array[Boolean], max: Int) = {
for (a <- (2 to max/(2*3)).
filter (p);
dummy = print (a + " ");
b <- (((a+1) to max/a).
filter (p));
if (a*b <= max))
yield (em (a, b, max)) }
Here is my attempt -- (b * a).filter is wrong, because the result is an int, not a filterable collection of ints:
// wrong:
def prod (p: Array[Boolean], max: Int) = {
(2 to max/(2*3)).filter (p).flatMap { a =>
print (a + " ")
((a+1) to max/a).filter (p). map { b =>
(b * a).filter (_ <= max).map (em (a, b, max))
}
}
}
Part II really belongs in the comments, but it would be unreadable there; maybe I will delete it in the end. Please excuse.
OK, here is Rex's last answer in code layout:
def prod (p: Array[Boolean], max: Int) = {
  (2 to max/(2*3)).filter (p).flatMap { a =>
    print (a + " ")
    ((a+1) to max/a).filter (b => p (b)
      && b * a < max).map { b => (em (a, b, max))
    }
  }
}
This is how you need to write it:
scala> def prod(m: Int) = {
| for {
| a <- 2 to m / (2 * 3)
| _ = print(a + " ")
| b <- (a + 1) to (m / a)
| c = a * b
| if c < m
| } yield c
| }
prod: (m: Int)scala.collection.immutable.IndexedSeq[Int]
scala> prod(20)
2 3 res159: scala.collection.immutable.IndexedSeq[Int] = Vector(6, 8, 10, 12, 14
, 16, 18, 12, 15, 18)
Starting with Scala 2.13, the chaining operation tap is included in the standard library and can be used, with minimal intrusiveness, wherever we need to print some intermediate state of a pipeline:
import util.chaining._
def prod(m: Int) =
for {
a <- 2 to m / (2 * 3)
b <- (a + 1) to (m / a.tap(println)) // <- a.tap(println)
c = a * b
if c < m
} yield c
prod(20)
// 2
// 3
// res0: IndexedSeq[Int] = Vector(6, 8, 10, 12, 14, 16, 18, 12, 15, 18)
The tap chaining operation applies a side effect (in this case println) on a value (in this case a) while returning the value (a) untouched:
def tap[U](f: (A) => U): A
It's very convenient when debugging as you can use a bunch of taps without having to modify the code:
def prod(m: Int) =
for {
a <- (2 to m.tap(println) / (2 * 3)).tap(println)
b <- (a + 1) to (m / a.tap(println))
c = (a * b).tap(println)
if c < m
} yield c
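The same pass-through idea can be sketched outside Scala. Below is a minimal Python version, where `tap` and the `log` parameter are hypothetical helpers written for this illustration (Python has no built-in equivalent of Scala's tap):

```python
def tap(value, effect):
    """Run a side effect on value, then return value unchanged."""
    effect(value)
    return value

def prod(m, log=print):
    # The inner range is re-evaluated once per a, so log fires once per a.
    return [
        a * b
        for a in range(2, m // (2 * 3) + 1)
        for b in range(a + 1, m // tap(a, log) + 1)
        if a * b < m
    ]
```

Calling `prod(20)` logs 2 and 3 and returns the same ten products as the Scala version; passing a different `log` (for example a list's `append`) collects the intermediate values instead of printing them.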
I generally find that style of coding rather difficult to follow, since loops and intermediate results and such get all mixed in with each other. I would, instead of a for loop, write something like
def prod(m: Int) = {
(2 to m/(2*3)).flatMap { a =>
print(a + " ")
((a+1) to m/a).map(_ * a).filter(_ < m)
}
}
This also makes adding print statements and such easier.
It doesn't seem like good style to put a side-effecting statement within a for-comprehension (or indeed in the middle of any function), except for debugging, in which case it doesn't really matter what you call it ("debug" seems like a good name).
If you really need to, I think you'd be better separating your concerns somewhat by assigning an intermediate val, e.g. (your original laid out more nicely):
def prod (p: Array[Boolean], max: Int) = {
for {
a <- (2 to max / (2 * 3)) filter p
debug = print (a + " ")
b <- ((a + 1) to max / a) filter p
if a * b <= max
} yield em(a, b, max)
}
becomes
def prod2 (p: Array[Boolean], max: Int) = {
val as = (2 to max / (2 * 3)) filter p
for(a <- as) print(a + " ")
as flatMap {a =>
for {
b <- ((a + 1) to max / a) filter p
if a * b <= max
} yield em(a, b, max)
}
}

Implementing sequences of sequences in F#

I am trying to expose a 2-dimensional array as a sequence of sequences on an object (specifically, to be able to do Seq.fold (fun x -> Seq.fold (fun ->..) [] x) [] mytype).
Below is a toy program that exposes the same functionality.
From what I understand, there is a lot going on here. First off, IEnumerable has an ambiguous overload and requires a type annotation to isolate which IEnumerable you are talking about.
But then there can be issues with unit as well, requiring additional help:
type blah =
    class
        interface int seq seq with
            member self.GetEnumerator () : System.Collections.Generic.IEnumerable<System.Collections.Generic.IEnumerable<(int*int)>> =
                seq{ for i = 0 to 10 do
                        yield seq { for j=0 to 10 do
                                        yield (i,j)} }
    end
Is there some way of getting the above code to work as intended(return a seq<seq<int>>) or am I missing something fundamental?
Well for one thing, GetEnumerator() is supposed to return IEnumerator<T> not IEnumerable<T>...
This will get your sample code to compile.
type blah =
    interface seq<seq<(int * int)>> with
        member self.GetEnumerator () =
            (seq { for i = 0 to 10 do
                    yield seq { for j=0 to 10 do
                                    yield (i,j)} }).GetEnumerator()
    interface System.Collections.IEnumerable with
        member self.GetEnumerator () =
            (self :> seq<seq<(int * int)>>).GetEnumerator() :> System.Collections.IEnumerator
How about:
let toSeqOfSeq (array:array<array<_>>) = array |> Seq.map (fun x -> x :> seq<_>)
But this works with an array of arrays, not a two-dimensional array. Which do you want?
What are you really out to do? A seq of seqs is rarely useful. All collections are seqs, so you can just use an array of arrays, a la
let myArrayOfArrays = [|
    for i = 0 to 9 do
        yield [|
            for j = 0 to 9 do
                yield (i,j)
        |]
    |]
let sumAllProds = myArrayOfArrays |> Seq.fold (fun st a ->
    st + (a |> Seq.fold (fun st (x,y) -> st + x*y) 0) ) 0
printfn "%d" sumAllProds
if that helps...
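The fold-over-a-fold shape from the question can be sketched in Python as well, with generators standing in for seq. The helper names `row`, `rows` and `sum_all_prods` are invented for this illustration.

```python
from functools import reduce

def row(i, n):
    """One lazy row of (i, j) pairs."""
    return ((i, j) for j in range(n))

def rows(n=10):
    """A lazy sequence of lazy rows, analogous to a seq of seqs."""
    return (row(i, n) for i in range(n))

def sum_all_prods(grid):
    # Inner fold sums x*y over one row; outer fold sums the row totals.
    inner = lambda acc, pair: acc + pair[0] * pair[1]
    return reduce(lambda acc, r: acc + reduce(inner, r, 0), grid, 0)
```

Here `sum_all_prods(rows(10))` plays the role of sumAllProds above: the outer reduce walks the rows, and the inner reduce walks each row's pairs.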
module Array2D =
    // Converts 2D array 'T[,] into seq<seq<'T>>
    let toSeq (arr : 'T [,]) =
        // First and last valid index in each dimension (the base need not be 0).
        let f1,f2 = Array2D.base1 arr , Array2D.base2 arr
        let t1,t2 = f1 + Array2D.length1 arr - 1 , f2 + Array2D.length2 arr - 1
        seq {
            for i in f1 .. t1 do
                yield seq {
                    for j in f2 .. t2 do
                        yield Array2D.get arr i j }}
let myArray2D : string[,] = array2D [["a1"; "b1"; "c1"]; ["a2"; "b2"; "c2"]]
printf "%A" (Array2D.toSeq myArray2D)