Scala for/yield runs but doesn't complete - scala

I'm trying to walk through two arrays of potentially different sizes and compose a new array of randomly selected elements from them (for crossover in a genetic algorithm) (childGeneCount is just the length of the longer array).
In the following code snippet, each gene.toString logs, but my code doesn't seem to execute the last log. What dumb thing am I doing?
val genes = for (i <- 0 to childGeneCount) yield {
val gene = if (Random.nextBoolean()) {
if (i < p1genes.length) {
p1genes(i)
} else {
p2genes(i)
}
} else {
if (i < p2genes.length) {
p2genes(i)
} else {
p1genes(i)
}
}
Logger.debug(gene.toString)
gene
}
Logger.debug("crossover finishing - never gets here??")
New to scala, and would be happy for a slap on the wrist accompanied by a "do it this completely different way instead" if appropriate.

You are right: the problem was "to", which should have been "until". I have changed your code a bit to make it more Scala-like.
val p1genes = "AGTCTC"
val p2genes = "ATG"
val genePair = p1genes.zipAll(p2genes, None, None)
val matchedGene = for (pair <- genePair) yield {
pair match {
case (p1Gene, None) => p1Gene
case (None, p2Gene) => p2Gene
case (p1Gene, p2Gene) => if (Random.nextBoolean()) p1Gene else p2Gene
}
}
println(matchedGene)
The process is:
First zip two dna sequences into one.
Fill the shorter sequence with None.
Now loop over the zipped sequences and populate the new sequence.

Reworked Tawkir's answer, with cleaner None handling:
val p1genes = "AGTCTC"
val p2genes = "ATG"
val genePair = p1genes.map(Some.apply).zipAll(p2genes.map(Some.apply), None, None)
val matchedGene = genePair.map {
case (Some(p1Gene), None) => p1Gene
case (None, Some(p2Gene)) => p2Gene
case (Some(p1Gene), Some(p2Gene)) => if (Random.nextBoolean()) p1Gene else p2Gene
}
println(matchedGene)
If you want to avoid wrapping the sequence with Some, another solution is to use a character known not to appear in the sequence as a "none" marker:
val p1genes = "AGTCTC"
val p2genes = "ATG"
val none = '-'
val genePair = p1genes.zipAll(p2genes, none, none)
val matchedGene = genePair.map {
case (p1Gene, `none`) => p1Gene
case (`none`, p2Gene) => p2Gene
case (p1Gene, p2Gene) => if (Random.nextBoolean()) p1Gene else p2Gene
}
println(matchedGene)

Pretty sure harry0000's answer is correct: I was using "to" like "until", and am so used to exceptions being thrown loudly that I didn't think to look there!
I ended up switching from for/yield to List.tabulate(childGeneCount){ i => {, which fixed the error, probably for the same reason: tabulate only generates the indices 0 until childGeneCount.
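For anyone hitting the same off-by-one, here is a minimal sketch of the difference discussed above. The names (ToVsUntil, genes) are made up for illustration, and a Vector stands in for the gene arrays from the question.

```scala
import scala.util.Try

object ToVsUntil {
  val genes = Vector("A", "G", "T")

  // `to` includes the upper bound, `until` stops one short of it.
  val inclusive = (0 to genes.length).toList    // List(0, 1, 2, 3)
  val exclusive = (0 until genes.length).toList // List(0, 1, 2)

  // Index 3 does not exist, so the `to` version throws inside the loop
  // and nothing after the loop gets a chance to run.
  val withTo = Try(for (i <- 0 to genes.length) yield genes(i))
  val withUntil = Try(for (i <- 0 until genes.length) yield genes(i))

  // List.tabulate(n)(f) applies f to 0, 1, ..., n - 1 only.
  val tabulated = List.tabulate(genes.length)(i => genes(i))
}
```

Wrapping the loop in Try is only there to make the failure visible; in the original code the exception was presumably raised at the same point, which would explain why the last log line never appeared.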

Since you asked for possible style improvements, here are two suggested implementations. The first one is less idiomatic, but more performant. The second one is prettier but does some more work.
def crossover[E : ClassTag](a: Array[E], b: Array[E]): Array[E] = {
val (larger, smaller) = if(a.length > b.length) (a, b) else (b, a)
val result = Array.ofDim[E](larger.length)
for(i <- smaller.indices)
result(i) = if(Random.nextBoolean()) larger(i) else smaller(i)
for(i <- smaller.length until larger.length)
result(i) = larger(i)
result
}
def crossoverIdiomatic[E : ClassTag](a: Array[E], b: Array[E]): Array[E] = {
val randomPart = (a zip b).map { case (x,y) => if(Random.nextBoolean()) x else y }
val (larger, smaller) = if(a.length > b.length) (a, b) else (b, a)
randomPart ++ larger.drop(smaller.length)
}
val a = Array("1", "2", "3", "4", "5", "6")
val b = Array("one", "two", "three", "four")
// e.g. output: [one,2,three,4,5,6]
println(crossover(a, b).mkString("[", ",", "]"))
println(crossoverIdiomatic(a, b).mkString("[", ",", "]"))
Note that the E : ClassTag context bounds are only there to make the compiler happy about creating an Array[E]. If you only need Int for your work, you can drop all the fancy generics.
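As a rough sketch of what dropping the generics might look like, here is an Int-only variant of the zip-based version. The object name IntCrossover is just for illustration.

```scala
import scala.util.Random

object IntCrossover {
  // Int-only variant: no ClassTag needed because the element type is known.
  def crossover(a: Array[Int], b: Array[Int]): Array[Int] = {
    val (larger, smaller) = if (a.length > b.length) (a, b) else (b, a)
    // zip stops at the shorter array, so this covers the shared prefix.
    val mixed = a.zip(b).map { case (x, y) => if (Random.nextBoolean()) x else y }
    // Append whatever is left of the longer array.
    mixed ++ larger.drop(smaller.length)
  }
}
```

The result always has the length of the longer input, and every position past the shorter input comes straight from the longer one.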

Related

Scala alternative to series of if statements that append to a list?

I have a Seq[String] in Scala, and if the Seq contains certain Strings, I append a relevant message to another list.
Is there a more 'scalaesque' way to do this, rather than a series of if statements appending to a list like I have below?
val result = new ListBuffer[Err]()
val malformedParamNames = // A Seq[String]
if (malformedParamNames.contains("$top")) result += IntegerMustBePositive("$top")
if (malformedParamNames.contains("$skip")) result += IntegerMustBePositive("$skip")
if (malformedParamNames.contains("modifiedDate")) result += FormatInvalid("modifiedDate", "yyyy-MM-dd")
...
result.toList
If you want to use some Scala iterables sugar, I would use:
sealed trait Err
case class IntegerMustBePositive(msg: String) extends Err
case class FormatInvalid(msg: String, format: String) extends Err
val malformedParamNames = Seq[String]("$top", "aa", "$skip", "ccc", "ddd", "modifiedDate")
val result = malformedParamNames.map { v =>
v match {
case "$top" => Some(IntegerMustBePositive("$top"))
case "$skip" => Some(IntegerMustBePositive("$skip"))
case "modifiedDate" => Some(FormatInvalid("modifiedDate", "yyyy-MM-dd"))
case _ => None
}
}.flatten
result.toList
Be warned: if you ask for a scala-esque way of doing things, there are many possibilities.
The map function combined with flatten can be simplified by using flatMap:
sealed trait Err
case class IntegerMustBePositive(msg: String) extends Err
case class FormatInvalid(msg: String, format: String) extends Err
val malformedParamNames = Seq[String]("$top", "aa", "$skip", "ccc", "ddd", "modifiedDate")
val result = malformedParamNames.flatMap {
case "$top" => Some(IntegerMustBePositive("$top"))
case "$skip" => Some(IntegerMustBePositive("$skip"))
case "modifiedDate" => Some(FormatInvalid("modifiedDate", "yyyy-MM-dd"))
case _ => None
}
result
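A further variation, not from the answers above: collect takes a partial function, so the Some/None wrapping disappears entirely. This is only a sketch; the wrapper object CollectErrors and the method errorsFor are names invented here.

```scala
object CollectErrors {
  sealed trait Err
  case class IntegerMustBePositive(msg: String) extends Err
  case class FormatInvalid(msg: String, format: String) extends Err

  // collect keeps only the inputs the partial function is defined for.
  def errorsFor(names: Seq[String]): Seq[Err] = names.collect {
    case "$top"         => IntegerMustBePositive("$top")
    case "$skip"        => IntegerMustBePositive("$skip")
    case "modifiedDate" => FormatInvalid("modifiedDate", "yyyy-MM-dd")
  }
}
```

One thing to keep in mind with all of the map/flatMap/collect versions: the errors come out in the order of the input names (and repeat if a name repeats), whereas the original chain of ifs produced them in the order of the checks.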
The most 'scalaesque' version I can think of while keeping it readable would be:
val map = scala.collection.immutable.ListMap(
"$top" -> IntegerMustBePositive("$top"),
"$skip" -> IntegerMustBePositive("$skip"),
"modifiedDate" -> FormatInvalid("modifiedDate", "yyyy-MM-dd"))
val result = for {
(k,v) <- map
if malformedParamNames contains k
} yield v
//or
val result2 = map.filterKeys(malformedParamNames.contains).values.toList
Benoit's is probably the most scala-esque way of doing it, but depending on who's going to be reading the code later, you might want a different approach.
// Some type definitions omitted
val malformations = Seq[(String, Err)](
("$top", IntegerMustBePositive("$top")),
("$skip", IntegerMustBePositive("$skip")),
("modifiedDate", FormatInvalid("modifiedDate", "yyyy-MM-dd"))
)
If you need a list and the order is significant:
val result = (malformations.foldLeft(List.empty[Err]) { (acc, pair) =>
if (malformedParamNames.contains(pair._1)) {
pair._2 :: acc // prepend a single element to the list for faster performance
} else acc
}).reverse // and reverse since we were prepending
If the order isn't significant (although if the order's not significant, you might consider wanting a Set instead of a List):
val result = (malformations.foldLeft(Set.empty[Err]) { (acc, pair) =>
if (malformedParamNames.contains(pair._1)) {
acc + pair._2 // add a single element to the set
} else acc
}).toList // omit the .toList if you're OK with just a Set
If the predicates in the repeated ifs are more complex/less uniform, then the type for malformations might need to change, as they would if the responses changed, but the basic pattern is very flexible.
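To illustrate that last point, here is one way the table could look if each entry carried an arbitrary predicate instead of a plain name. This is a sketch under assumptions: the Rule case class, the check method and the case-insensitive date rule are all hypothetical and not part of the original answer.

```scala
object PredicateRules {
  sealed trait Err
  case class IntegerMustBePositive(msg: String) extends Err
  case class FormatInvalid(msg: String, format: String) extends Err

  // Hypothetical rule type: any predicate over the input, plus the error to report.
  case class Rule(matches: Seq[String] => Boolean, err: Err)

  val rules: Seq[Rule] = Seq(
    Rule(_.contains("$top"), IntegerMustBePositive("$top")),
    // Made-up example of a less uniform check: ignore case for this one.
    Rule(_.exists(_.equalsIgnoreCase("modifieddate")), FormatInvalid("modifiedDate", "yyyy-MM-dd"))
  )

  // Keep the errors whose predicate holds, in rule order.
  def check(names: Seq[String]): List[Err] =
    rules.collect { case Rule(matches, err) if matches(names) => err }.toList
}
```

The iteration logic stays the same as in the fold versions; only the shape of the table entries changes.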
In this solution we define a list of mappings that pair each IF condition with its THEN action. We then iterate over the input list and apply the action wherever a condition matches.
// IF THEN
case class Operation(matcher :String, action :String)
def processInput(input :List[String]) :List[String] = {
val operations = List(
Operation("$top", "integer must be positive"),
Operation("$skip", "skip value"),
Operation("$modify", "modify the date")
)
input.flatMap { in =>
operations.find(_.matcher == in).map { _.action }
}
}
println(processInput(List("$skip","$modify", "$skip")));
A breakdown
operations.find(_.matcher == in) // find an operation in our
// list matching the input we are
// checking. Returns Some or None
.map { _.action } // if some, replace input with action
// if none, do nothing
input.flatMap { in => // inputs are processed, converted
// to some(action) or none and the
// flatten removes the some/none
// returning just the strings.

Parallel Aggregate is not working on lists .length > 8

I'm writing a small exercise app that calculates the number of unique letters (incl. Unicode) in a seq of strings, and I'm using aggregate for it, as I'm trying to run it in parallel.
Here's my code:
class Frequency(seq: Seq[String]) {
type FreqMap = Map[Char, Int]
def calculate() = {
val freqMap: FreqMap = Map[Char, Int]()
val pattern = "(\\p{L}+)".r
val seqop: (FreqMap, String) => FreqMap = (fm, s) => {
s.toLowerCase().foldLeft(freqMap){(fm, c) =>
c match {
case pattern(char) => fm.get(char) match {
case None => fm+((char, 1))
case Some(i) => fm.updated(char, i+1)
}
case _ => fm
}
}
}
val reduce: (FreqMap, FreqMap) => FreqMap =
(m1, m2) => {
m1 ++ m2.map { case (k, v) => k -> (v + m1.getOrElse(k, 0)) }
}
seq.par.aggregate(freqMap)(seqop, reduce)
}
}
and then the code that makes use of that
object Frequency extends App {
val text = List("abc", "abc", "abc", "abc", "abc", "abc", "abc", "abc", "abc");
def frequency(seq: Seq[String]):Map[Char, Int] = {
new Frequency(seq).calculate()
}
Console println frequency(seq=text)
}
Though I supplied "abc" 9 times, the result is Map(a -> 8, b -> 8, c -> 8), as it is for any number of "abc"s greater than 8.
I've looked at this, and it seems like I'm using aggregate correctly.
Any suggestions to make it work?
You're discarding already collected results (the first fm) in your seqop. You need to add these to the new results you're computing, e.g. like this:
def calculate() = {
val freqMap: FreqMap = Map[Char, Int]()
val pattern = "(\\p{L}+)".r
val reduce: (FreqMap, FreqMap) => FreqMap =
(m1, m2) => {
m1 ++ m2.map { case (k, v) => k -> (v + m1.getOrElse(k, 0)) }
}
val seqop: (FreqMap, String) => FreqMap = (fm, s) => {
val res = s.toLowerCase().foldLeft(freqMap){(fm, c) =>
c match {
case pattern(char) => fm.get(char) match {
case None => fm+((char, 1))
case Some(i) => fm.updated(char, i+1)
}
case _ => fm
}
}
// I'm reusing your existing combinator function here:
reduce(res,fm)
}
seq.par.aggregate(freqMap)(seqop, reduce)
}
Depending on how the parallel collections divide the work you discard some of it. In your case (9x "abc") it divides the thing in 8 parallel seqop operations which means you discard exactly one result set. This varies depending on numbers, if you run in with say 17x "abc" it runs in 13 parallel operations, discarding 4 result sets (on my machine anyway - I'm not familiar with the underlying code and how it divides the work, this probably depends on the used ExecutionContext/Threadpool and subsequently number of CPUs/cores and so on).
Generally, parallel collections are a drop-in replacement for sequential collections, meaning that if you drop .par you should still get the same result, albeit usually more slowly. If you do this with your original code you get a count of 1 for each letter, which tells you that it's not a parallelization problem. This is a good way to test whether you're doing the right thing when using these.
And last but not least: This was harder to spot than usual for me because you use the same variable name twice and subsequently shadow fm. Not doing that would make the code more readable and mistakes such as this easier to spot.
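To make the two points above concrete (fold into the accumulator you are handed, and avoid shadowing), here is a small sketch without parallel collections. It mimics the partitioning by folding chunks separately and then combining them. Everything here is illustrative: the names seqop, combop and chunkedAggregate are invented, and Char.isLetter is used as an assumed stand-in for the \p{L} regex from the question.

```scala
object FrequencySketch {
  type FreqMap = Map[Char, Int]

  // Fold one string into the accumulator that is passed in; nothing is discarded.
  def seqop(acc: FreqMap, s: String): FreqMap =
    s.toLowerCase.filter(_.isLetter).foldLeft(acc) { (counts, c) =>
      counts.updated(c, counts.getOrElse(c, 0) + 1)
    }

  // Merge two partial results by adding the counts per key.
  def combop(m1: FreqMap, m2: FreqMap): FreqMap =
    m2.foldLeft(m1) { case (counts, (k, v)) => counts.updated(k, counts.getOrElse(k, 0) + v) }

  // Roughly what a parallel aggregate does: fold each chunk on its own, then combine.
  def chunkedAggregate(seq: Seq[String], chunkSize: Int): FreqMap =
    seq.grouped(chunkSize)
      .map(_.foldLeft(Map.empty[Char, Int])(seqop))
      .foldLeft(Map.empty[Char, Int])(combop)
}
```

With a seqop like this, the result should not depend on how the input is split, which is the property aggregate relies on.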

Tuple seen as Product, compiler rejects reference to element

Constructing phoneVector:
val phoneVector = (
for (i <- 1 until 20) yield {
val p = killNS(r.get("Phone %d - Value" format(i)))
val t = killNS(r.get("Phone %d - Type" format(i)))
if (p == None) None
else
if (t == None) (p,"Main") else (p,t)
}
).filter(_ != None)
Consider this very simple snippet:
for (pTuple <- phoneVector) {
println(pTuple.getClass.getName)
println(pTuple)
//val pKey = pTuple._1.replaceAll("[^\\d]","")
associate() // stub prints "associate"
}
When I run it, I see output like this:
scala.Tuple2
((609) 954-3815,Mobile)
associate
When I uncomment the line with replaceAll(), compile fails:
....scala:57: value _1 is not a member of Product with Serializable
[error] val pKey = pTuple._1.replaceAll("[^\\d]","")
[error] ^
Why does it not recognize pTuple as a Tuple2 and treat it only as Product?
OK, this compiles and produces the desired result. But it's too verbose. Can someone please demonstrate a more concise solution for dealing with this typesafe stuff?
for (pTuple <- phoneVector) {
println(pTuple.getClass.getName)
println(pTuple)
val pPhone = pTuple match {
case t:Tuple2[_,_] => t._1
case _ => None
}
val pKey = pPhone match {
case s:String => s.replaceAll("[^\\d]","")
case _ => None
}
println(pKey)
associate()
}
You can do:
for (pTuple <- phoneVector) {
val pPhone = pTuple match {
case (key, value) => key
case _ => None
}
val pKey = pPhone match {
case s:String => s.replaceAll("[^\\d]","")
case _ => None
}
println(pKey)
associate()
}
Or, once phoneVector is typed as a sequence of (String, String) tuples, simply phoneVector.map(_._1.replaceAll("[^\\d]",""))
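As for why the compiler only sees a Product: a likely explanation is that the for/yield produces either None or a tuple, and the most specific type covering both is Product with Serializable. The sketch below reproduces that with a hand-written sequence (the sample values are made up) and uses collect as a shorter alternative to the nested matches.

```scala
object ProductSketch {
  // Mixing None and tuples: the common supertype is all the compiler can promise.
  val mixed: Seq[Product with Serializable] =
    Seq(None, ("(609) 954-3815", "Mobile"), ("555-0100", "Main"))

  // collect keeps only the elements matching the pattern, with a precise type.
  val keys: Seq[String] = mixed.collect {
    case (phone: String, _) => phone.replaceAll("[^\\d]", "")
  }
}
```

Here the None entries are skipped by collect rather than filtered out beforehand, and the type test on phone means replaceAll is only ever called on a String.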
By changing the construction of phoneVector, as wrick's question implied, I've been able to eliminate the match/case stuff because Tuple is assured. Not thrilled by it, but Change is Hard, and Scala seems cool.
Now, it's still possible to slip a None value into either of the Tuple values. My match/case does not check for that, and I suspect that could lead to a runtime error in the replaceAll call. How is that allowed?
def killNS (s:Option[_]) = {
(s match {
case _:Some[_] => s.get
case _ => None
}) match {
case None => None
case "" => None
case s => s
}
}
val phoneVector = (
for (i <- 1 until 20) yield {
val p = killNS(r.get("Phone %d - Value" format(i)))
val t = killNS(r.get("Phone %d - Type" format(i)))
if (t == None) (p,"Main") else (p,t)
}
).filter(_._1 != None)
println(phoneVector)
println(name)
println
// Create the Neo4j nodes:
for (pTuple <- phoneVector) {
val pPhone = pTuple._1 match { case p:String => p }
val pType = pTuple._2
val pKey = pPhone.replaceAll(",.*","").replaceAll("[^\\d]","")
associate(Map("target"->Map("label"->"Phone","key"->pKey,
"dial"->pPhone),
"relation"->Map("label"->"IS_AT","key"->pType),
"source"->Map("label"->"Person","name"->name)
)
)
}
}
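On the follow-up question of how a None can still slip in: killNS returns Any, so the tuples end up as (Any, Any) and the compiler cannot rule anything out. One possible way around that, sketched here under assumptions, is to keep the values as Option[String] until the tuple is built. The record map and the field helper are hypothetical stand-ins for r and killNS.

```scala
object PhoneOptionSketch {
  // Hypothetical stand-in for the record `r` in the question.
  val record: Map[String, String] = Map(
    "Phone 1 - Value" -> "(609) 954-3815",
    "Phone 1 - Type"  -> "Mobile",
    "Phone 2 - Value" -> "555-0100",
    "Phone 3 - Value" -> ""
  )

  // Treat missing and empty values alike, but stay inside Option[String].
  def field(key: String): Option[String] = record.get(key).filter(_.nonEmpty)

  // Entries without a phone value are skipped; a missing type defaults to "Main".
  val phones: Seq[(String, String)] =
    for {
      i <- 1 until 20
      p <- field(s"Phone $i - Value")
    } yield (p, field(s"Phone $i - Type").getOrElse("Main"))

  val keys: Seq[String] = phones.map(_._1.replaceAll("[^\\d]", ""))
}
```

Because every element is a (String, String), there is no match/case left at the use site and no way for a None to reach replaceAll.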

Combining multiple Lists of arbitrary length

I am looking for an approach to join multiple Lists in the following manner:
ListA a b c
ListB 1 2 3 4
ListC + # * § %
..
..
..
Resulting List: a 1 + b 2 # c 3 * 4 § %
In Words: The elements in sequential order, starting at first list combined into the resulting list. An arbitrary amount of input lists could be there varying in length.
I used multiple approaches with variants of zip, sliding iterators but none worked and especially took care of varying list lengths. There has to be an elegant way in scala ;)
val lists = List(ListA, ListB, ListC)
lists.flatMap(_.zipWithIndex).sortBy(_._2).map(_._1)
It's pretty self-explanatory. It just zips each value with its position on its respective list, sorts by index, then pulls the values back out.
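Spelled out with the lists from the question (as strings, so that everything has one element type), the steps look roughly like this. The names are illustrative. Note that the approach depends on sortBy being a stable sort, so that elements with the same index keep their list order.

```scala
object InterleaveBySort {
  val listA = List("a", "b", "c")
  val listB = List("1", "2", "3", "4")
  val listC = List("+", "#", "*", "§", "%")
  val lists: List[List[String]] = List(listA, listB, listC)

  // Pair every element with its position inside its own list.
  val indexed: List[(String, Int)] = lists.flatMap(_.zipWithIndex)

  // A stable sort by position groups all first elements, then all second ones, and so on.
  val result: List[String] = indexed.sortBy(_._2).map(_._1)
}
```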
Here's how I would do it:
class ListTests extends FunSuite {
test("The three lists from his example") {
val l1 = List("a", "b", "c")
val l2 = List(1, 2, 3, 4)
val l3 = List("+", "#", "*", "§", "%")
// All lists together
val l = List(l1, l2, l3)
// Max length of a list (to pad the shorter ones)
val maxLen = l.map(_.size).max
// Wrap the elements in Option and pad with None
val padded = l.map { list => list.map(Some(_)) ++ Stream.continually(None).take(maxLen - list.size) }
// Transpose
val trans = padded.transpose
// Flatten the lists then flatten the options
val result = trans.flatten.flatten
// Voilà
assert(List("a", 1, "+", "b", 2, "#", "c", 3, "*", 4, "§", "%") === result)
}
}
Here's an imperative solution if efficiency is paramount:
def combine[T](xss: List[List[T]]): List[T] = {
val b = List.newBuilder[T]
var its = xss.map(_.iterator)
while (!its.isEmpty) {
its = its.filter(_.hasNext)
its.foreach(b += _.next)
}
b.result
}
You can use padTo, transpose, and flatten to good effect here:
lists.map(_.map(Some(_)).padTo(lists.map(_.length).max, None)).transpose.flatten.flatten
Here's a small recursive solution.
def flatList(lists: List[List[Any]]) = {
def loop(output: List[Any], xss: List[List[Any]]): List[Any] = (xss collect { case x :: xs => x }) match {
case Nil => output
case heads => loop(output ::: heads, xss.collect({ case x :: xs => xs }))
}
loop(List[Any](), lists)
}
And here is a simple streams approach which can cope with an arbitrary sequence of sequences, each of potentially infinite length.
def flatSeqs[A](ssa: Seq[Seq[A]]): Stream[A] = {
def seqs(xss: Seq[Seq[A]]): Stream[Seq[A]] = xss collect { case xs if !xs.isEmpty => xs } match {
case Nil => Stream.empty
case heads => heads #:: seqs(xss collect { case xs if !xs.isEmpty => xs.tail })
}
seqs(ssa).flatten
}
Here's something short but not exceedingly efficient:
def heads[A](xss: List[List[A]]) = xss.map(_.splitAt(1)).unzip
def interleave[A](xss: List[List[A]]) = Iterator.
iterate(heads(xss)){ case (_, tails) => heads(tails) }.
map(_._1.flatten).
takeWhile(! _.isEmpty).
flatten.toList
Here's a recursive solution that's O(n). The accepted solution (using sort) is O(n log n). Some testing I've done suggests the second solution using transpose is also O(n log n) due to the implementation of transpose. The use of reverse below looks suspicious (since it's an O(n) operation itself), but you can convince yourself that it is neither called too often nor on too-large lists.
def intercalate[T](lists: List[List[T]]) : List[T] = {
def intercalateHelper(newLists: List[List[T]], oldLists: List[List[T]], merged: List[T]): List[T] = {
(newLists, oldLists) match {
case (Nil, Nil) => merged
case (Nil, zss) => intercalateHelper(zss.reverse, Nil, merged)
case (Nil::xss, zss) => intercalateHelper(xss, zss, merged)
case ( (y::ys)::xss, zss) => intercalateHelper(xss, ys::zss, y::merged)
}
}
intercalateHelper(lists, List.empty, List.empty).reverse
}

What is Scala way of finding whether all the elements of an Array has same length?

I am new to Scala but very old to Java, and have some understanding of working with FP languages like Haskell.
Here I am wondering how to implement this in Scala. I have an array whose elements are all strings, and I just want to know whether there is a way to do this in Scala in an FP way. Here is my current version, which works...
def checkLength(vals: Array[String]): Boolean = {
var len = -1
for(x <- vals){
if(len < 0)
len = x.length()
else{
if (x.length() != len)
return false
else
len = x.length()
}
}
return true;
}
And I am pretty sure there is a better way of doing this in Scala/FP...
list.forall( str => str.size == list(0).size )
Edit: Here's a definition that's as general as possible and also allows checking whether a property other than length is the same for all elements:
def allElementsTheSame[T,U](f: T => U)(list: Seq[T]) = {
val first: Option[U] = list.headOption.map( f(_) )
list.forall( f(_) == first.get ) //safe to use get here!
}
type HasSize = { val size: Int }
val checkLength = allElementsTheSame((x: HasSize) => x.size)_
checkLength(Array( "123", "456") )
checkLength(List( List(1,2), List(3,4) ))
Since everyone seems to be so creative, I'll be creative too. :-)
def checkLength(vals: Array[String]): Boolean = vals.map(_.length).removeDuplicates.size <= 1
Mind you, removeDuplicates will likely be named distinct on Scala 2.8.
Tip: Use forall to determine whether all elements in the collection do satisfy a certain predicate (e.g. equality of length).
If you know that your lists are always non-empty, a straight forall works well. If you don't, it's easy to add that in:
list match {
case x :: rest => rest forall (_.size == x.size)
case _ => true
}
Now lists of length zero return true instead of throwing exceptions.
list.groupBy{_.length}.size == 1
You convert the list into a map of groups of equal length strings. If all the strings have the same length, then the map will hold only one such group.
The nice thing with this solution is that you don't need to know anything about the length of the strings, and don't need to compare them to, say, the first string. It also works on an empty list, in which case it returns false (if that's what you want...)
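A quick sketch of the empty-list behaviour mentioned above, next to a distinct-based variant that treats the empty case as true instead. Both function names are made up for illustration.

```scala
object SameLengthSketch {
  // Exactly one group means all strings share a length; an empty input has zero groups.
  def viaGroupBy(xs: Seq[String]): Boolean = xs.groupBy(_.length).size == 1

  // At most one distinct length: also true when there are no strings at all.
  def viaDistinct(xs: Seq[String]): Boolean = xs.map(_.length).distinct.size <= 1
}
```

Which answer is right for an empty input depends on the caller, so it may be worth picking one deliberately rather than getting it as a side effect.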
Here's another approach:
def check(list:List[String]) = list.foldLeft(true)(_ && list.head.length == _.length)
Just my €0.02
def allElementsEval[T, U](xs: Iterable[T])(f: T => U) =
if (xs.isEmpty) true
else {
val first = f(xs.head)
xs forall { f(_) == first }
}
This works with any Iterable, evaluates f the minimum number of times possible, and while the block can't be curried, the type inferencer can infer the block parameter type.
"allElementsEval" should "return true for an empty Iterable" in {
allElementsEval(List[String]()){ x => x.size } should be (true)
}
it should "eval the function at each item" in {
allElementsEval(List("aa", "bb", "cc")) { x => x.size } should be (true)
allElementsEval(List("aa", "bb", "ccc")) { x => x.size } should be (false)
}
it should "work on Vector and Array as well" in {
allElementsEval(Vector("aa", "bb", "cc")) { x => x.size } should be (true)
allElementsEval(Vector("aa", "bb", "ccc")) { x => x.size } should be (false)
allElementsEval(Array("aa", "bb", "cc")) { x => x.size } should be (true)
allElementsEval(Array("aa", "bb", "ccc")) { x => x.size } should be (false)
}
It's just a shame that head :: tail pattern matching fails so insidiously for Iterables.
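To show what that last remark is about, here is a small sketch. The :: pattern only matches an actual List, so on a non-empty Vector it quietly falls through to the default case. Two possible alternatives are headOption for any Iterable and the +: extractor for Seq. The function names are invented for this example.

```scala
object HeadTailSketch {
  // Compiles for any Iterable, but only a List can ever match the first case.
  def viaCons(xs: Iterable[String]): Boolean = xs match {
    case x :: rest => rest.forall(_.length == x.length)
    case _         => true // a non-empty Vector ends up here too
  }

  // Works for any Iterable: an empty input has no head, so forall is trivially true.
  def viaHeadOption(xs: Iterable[String]): Boolean =
    xs.headOption.forall(first => xs.forall(_.length == first.length))

  // For Seq, the +: extractor plays the role that :: plays for List.
  def viaSeqPattern(xs: Seq[String]): Boolean = xs match {
    case first +: rest => rest.forall(_.length == first.length)
    case _             => true
  }
}
```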