Iterate through scala Vector and pair a certain object with the next one - scala

Assume I have a collection (Vector[Int]) containing 1,2,5,4,3,5,5,5,6,7,7 and want to get another collection (Vector[Vector[Int]]) that pairs every number 5 with the number that follows it: (1),(2),(5,4),(3),(5,5),(5,6),(7),(7). What are my options apart from this:
val input = Vector(1, 2, 5, 4, 3, 5, 5, 5, 6, 7, 7)
var output = Vector.empty[Vector[Int]]
var skip = false
for (i <- input.indices) {
  if (skip) {
    // this element was already paired with the preceding 5
    skip = false
  } else if (input(i) == 5 && i + 1 < input.length) {
    output = output :+ Vector(input(i), input(i + 1))
    skip = true
  } else {
    output = output :+ Vector(input(i))
  }
}
This works, but it is not very Scala-like.
Would it be possible to achieve the same result with a for comprehension? for(x <- c; if cond) yield {...}

You can use foldLeft:
val output = input.foldLeft(Vector.empty[Vector[Int]]) { (result, next) =>
  if (result.nonEmpty && result.last == Vector(5)) {
    // the previous group is a lone 5: replace it with the pair (5, next)
    result.dropRight(1) :+ Vector(5, next)
  } else {
    result :+ Vector(next)
  }
}

You could use pattern matching as well:
import scala.collection.mutable.ArrayBuffer

def prepareVector(lv: Vector[Int]): Vector[Vector[Int]] = {
  val mv = new ArrayBuffer[Vector[Int]]()
  def go(ll: List[Int]): Unit = ll match {
    case y :: Nil => mv += Vector(y)
    case 5 :: ys =>
      // ys is non-empty here: the single-element case is matched above
      mv += Vector(5, ys.head)
      go(ys.tail)
    case y :: ys =>
      mv += Vector(y)
      go(ys)
    case Nil => ()
  }
  go(lv.toList)
  mv.toVector
}
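The same pattern match can also be written without the mutable buffer. This is only a minimal sketch: `pairFives` is a hypothetical name, and the accumulator is threaded through a tail-recursive helper instead of being appended to an ArrayBuffer.

```scala
import scala.annotation.tailrec

// Hypothetical helper: pairs every 5 with the element that follows it.
def pairFives(input: Vector[Int]): Vector[Vector[Int]] = {
  @tailrec
  def go(rest: List[Int], acc: Vector[Vector[Int]]): Vector[Vector[Int]] = rest match {
    case 5 :: next :: tail => go(tail, acc :+ Vector(5, next)) // a 5 followed by something
    case x :: tail         => go(tail, acc :+ Vector(x))       // any other element, or a trailing 5
    case Nil               => acc
  }
  go(input.toList, Vector.empty)
}

val paired = pairFives(Vector(1, 2, 5, 4, 3, 5, 5, 5, 6, 7, 7))
// paired == Vector(Vector(1), Vector(2), Vector(5, 4), Vector(3),
//                  Vector(5, 5), Vector(5, 6), Vector(7), Vector(7))
```

A trailing 5 with nothing after it ends up in a group of its own, which matches the behaviour of the foldLeft version.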


Shorten MQTT topic filtering function

I wrote the following logic function, but I am sure it is possible to write it (way) shorter.
In case you are unfamiliar with MQTT wildcards, you can read up on them here.
self is the topic we are "subscribed" to, containing zero or more wildcards. incoming is the topic we received something on, which must match the self topic either fully, or conforming to the wildcard rules.
All my tests on this function succeed, but I just don't like the lengthiness and "iffyness" of this Scala function.
def filterTopic(incoming: String, self: String): Boolean = {
  if (incoming == self || self == "#") {
    true
  } else if (self.startsWith("#") || (self.contains("#") && !self.endsWith("#")) || self.endsWith("+")) {
    false
  } else {
    var valid = true
    val selfSplit = self.split('/')
    var j = 0
    for (i <- selfSplit.indices) {
      if (selfSplit(i) != "+" && selfSplit(i) != "#" && selfSplit(i) != incoming.split('/')(i)) {
        valid = false
      }
      j += 1
    }
    if (j < selfSplit.length && selfSplit(j) == "#") {
      j += 1
    }
    j == selfSplit.length && valid
  }
}
Here's a shot at it, assuming that '+' can be at the end and that the topics are otherwise well-structured:
def filterTopic(incoming: String, self: String): Boolean = {
  // helper function that works on lists of parts of the topics
  def go(incParts: List[String], sParts: List[String]): Boolean = (incParts, sParts) match {
    // if they're equivalent lists, the topics match
    case (is, ss) if is == ss => true
    // if sParts is just a single "#", the topics match
    case (_, "#" :: Nil) => true
    // if sParts starts with '+', just check if the rest match
    case (_ :: is, s :: ss) if s == "+" =>
      go(is, ss)
    // otherwise the first parts have to match, and we check the rest
    case (i :: is, s :: ss) if i == s =>
      go(is, ss)
    // otherwise they don't match
    case _ => false
  }
  // split the topic strings into parts
  go(incoming.split('/').toList, self.split('/').toList)
}
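To make the wildcard rules concrete, here is a compact, self-contained restatement of the same list-matching idea together with a few example calls. The name `topicMatches` and the sample topics are made up for illustration, and the sketch makes the same assumption that topics are well-structured.

```scala
// Hypothetical compact variant of the matcher above, for illustration only.
def topicMatches(incoming: String, filter: String): Boolean = {
  def go(inc: List[String], flt: List[String]): Boolean = (inc, flt) match {
    case (_, "#" :: Nil)              => true       // multi-level wildcard swallows the rest
    case (_ :: is, "+" :: fs)         => go(is, fs) // single-level wildcard matches any one level
    case (i :: is, f :: fs) if i == f => go(is, fs) // literal levels must be equal
    case (Nil, Nil)                   => true       // both topics consumed completely
    case _                            => false
  }
  go(incoming.split('/').toList, filter.split('/').toList)
}

val exact      = topicMatches("home/kitchen/temp", "home/kitchen/temp") // true
val singleWild = topicMatches("home/kitchen/temp", "home/+/temp")       // true
val multiWild  = topicMatches("home/kitchen/temp", "home/#")            // true
val tooShort   = topicMatches("home/kitchen", "home/+/temp")            // false
val tooLong    = topicMatches("home/kitchen/temp", "home/+")            // false
```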

Parallel FP Growth in Spark

I am trying to understand the "add" and "extract" methods of the FPTree class:
(https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala).
What is the purpose of the 'summaries' variable?
Where is the group list?
I assume it is the following; am I correct?
val numParts = if (numPartitions > 0) numPartitions else data.partitions.length
val partitioner = new HashPartitioner(numParts)
What will 'summaries' contain for the 3 transactions {a,b,c}, {a,b}, {b,c}, where all items are frequent?
def add(t: Iterable[T], count: Long = 1L): FPTree[T] = {
  require(count > 0)
  var curr = root
  curr.count += count
  t.foreach { item =>
    val summary = summaries.getOrElseUpdate(item, new Summary)
    summary.count += count
    val child = curr.children.getOrElseUpdate(item, {
      val newNode = new Node(curr)
      newNode.item = item
      summary.nodes += newNode
      newNode
    })
    child.count += count
    curr = child
  }
  this
}
def extract(
    minCount: Long,
    validateSuffix: T => Boolean = _ => true): Iterator[(List[T], Long)] = {
  summaries.iterator.flatMap { case (item, summary) =>
    if (validateSuffix(item) && summary.count >= minCount) {
      Iterator.single((item :: Nil, summary.count)) ++
        project(item).extract(minCount).map { case (t, c) =>
          (item :: t, c)
        }
    } else {
      Iterator.empty
    }
  }
}
After a bit of experimenting, it turns out to be pretty straightforward:
1+2) The partition is indeed the group representative.
It is also how the conditional transactions are calculated:
private def genCondTransactions[Item: ClassTag](
    transaction: Array[Item],
    itemToRank: Map[Item, Int],
    partitioner: Partitioner): mutable.Map[Int, Array[Int]] = {
  val output = mutable.Map.empty[Int, Array[Int]]
  // Filter the basket by frequent items pattern and sort their ranks.
  val filtered = transaction.flatMap(itemToRank.get)
  ju.Arrays.sort(filtered)
  val n = filtered.length
  var i = n - 1
  while (i >= 0) {
    val item = filtered(i)
    val part = partitioner.getPartition(item)
    if (!output.contains(part)) {
      output(part) = filtered.slice(0, i + 1)
    }
    i -= 1
  }
  output
}
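To see what this loop produces, here is a Spark-free sketch of the same logic. `condTransactions` and `partitionOf` are hypothetical names; `partitionOf` stands in for `partitioner.getPartition`, and the modulo function used in the example is only an assumption about how ranks map to partitions.

```scala
import scala.collection.mutable

// Walk the sorted ranks from the back; the first time a partition is seen,
// it receives the prefix of the transaction that ends at that rank.
def condTransactions(ranks: Array[Int], partitionOf: Int => Int): Map[Int, List[Int]] = {
  val sorted = ranks.sorted
  val output = mutable.Map.empty[Int, List[Int]]
  var i = sorted.length - 1
  while (i >= 0) {
    val part = partitionOf(sorted(i))
    if (!output.contains(part)) {
      output(part) = sorted.slice(0, i + 1).toList
    }
    i -= 1
  }
  output.toMap
}

// One transaction with ranks 0, 1, 2 and two partitions (rank % 2, assumed).
val groups = condTransactions(Array(2, 0, 1), rank => rank % 2)
// groups == Map(0 -> List(0, 1, 2), 1 -> List(0, 1))
```

Each partition (group) therefore gets the longest prefix of the transaction that ends in one of its own items, which is what it needs to build its conditional FP-tree.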
summaries is just a helper that stores the count of each item across the transactions.
extract/project generate the frequent itemsets (FIS) by recursing up and down through dependent FP-trees (built by project), checking summaries to decide whether a path needs to be traversed at all.
The summaries of node 'a' will have {b:2,c:1}, and the children of node 'a' are 'b' and 'c'.
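The bookkeeping done by add can be checked with a stripped-down stand-in for the tree. `MiniFPTree` below is hypothetical and keeps only add and summaries. It inserts the items of each transaction in the order written, whereas Spark first reorders items by frequency rank as far as I can tell, so the tree shape here is only illustrative.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for FPTree: only add() and summaries.
final class MiniFPTree[T] {
  final class Node(val item: Option[T]) {
    var count: Long = 0L
    val children = mutable.LinkedHashMap.empty[T, Node]
  }
  final class Summary {
    var count: Long = 0L                       // total count of the item in this tree
    val nodes = mutable.ListBuffer.empty[Node] // every tree node that carries the item
  }

  val root = new Node(None)
  val summaries = mutable.LinkedHashMap.empty[T, Summary]

  def add(t: Iterable[T], count: Long = 1L): this.type = {
    require(count > 0)
    var curr = root
    curr.count += count
    t.foreach { item =>
      val summary = summaries.getOrElseUpdate(item, new Summary)
      summary.count += count
      val child = curr.children.getOrElseUpdate(item, {
        val newNode = new Node(Some(item))
        summary.nodes += newNode
        newNode
      })
      child.count += count
      curr = child
    }
    this
  }
}

val tree = new MiniFPTree[String]
tree.add(List("a", "b", "c")).add(List("a", "b")).add(List("b", "c"))

// item -> (total count, number of nodes carrying the item)
val counts = tree.summaries.map { case (item, s) => (item, (s.count, s.nodes.size)) }.toMap
// counts == Map("a" -> (2, 1), "b" -> (3, 2), "c" -> (2, 2))
```

For the whole tree built from the three transactions, summaries therefore holds one entry per item with its total count and the list of nodes where the item occurs.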

Combine conditions inside flatMap and return a result

import java.time.LocalDate

val date2 = Option(LocalDate.parse("2017-02-01"))
//date1.compareTo(date2)>=0
case class dummy(prop: Seq[Test])
case class Test(s: String)
case class Result(s: String)
val s = "11,22,33"
val t = Test(s)
val dt = Test("2017-02-06")
val list = dummy(Seq(t))
val list2 = dummy(Seq(dt))
val code = Option("22")
val f = date2.flatMap(c => list2
    .prop
    .find(d => LocalDate.parse(d.s).compareTo(c) >= 0))
  .map(_ => Result("Found"))
  .getOrElse(Result("Not Found"))
code.flatMap(c => list
    .prop
    .find(_.s.split(",").contains(c)))
  .map(_ => Result("Found"))
  .getOrElse(Result("Not Found"))
I want to && the two conditions below and return Result("Found") or Result("Not Found"):
d => LocalDate.parse(d.s).compareTo(c) >= 0
_.s.split(",").contains(c)
Is there any way to achieve this? In the actual scenario, list and list2 are Futures.
I tried to make a more realistic example based on Futures. Here is how I would do it:
import java.time.LocalDate
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

val date2 = Option(LocalDate.parse("2017-02-01"))
case class Test(s: String)
case class Result(s: String)
val t = Test("11,22,33")
val dt = Test("2017-02-06")
val code = Option("22")
val f1 = Future(Seq(t))
val f2 = Future(Seq(dt))
// Wait for both futures to finish
val futureResult = Future.sequence(Seq(f1, f2)).map {
  case Seq(s1, s2) =>
    // Check the first part, this will be a Boolean
    val firstPart = code.nonEmpty && s1.exists(_.s.split(",").contains(code.get))
    // Check the second part, also a boolean
    val secondPart = date2.nonEmpty && s2.exists(d => LocalDate.parse(d.s).compareTo(date2.get) >= 0)
    // Do the AND logic you wanted
    if (firstPart && secondPart) {
      Result("Found")
    } else {
      Result("Not Found")
    }
}
// This is just for testing to see we got the correct result
val result = Await.result(futureResult, Duration.Inf)
println(result)
As an aside, your code and date2 values in your example are Options... If this is true in your production code, then we should do a check first to see if they are both defined. If they are not then there would be no need to continue with the rest of the code:
val futureResult = if (date2.isEmpty || code.isEmpty) {
  Future.successful(Result("Not Found"))
} else {
  Future.sequence(Seq(f1, f2)).map {
    case Seq(s1, s2) =>
      val firstPart = s1.exists(_.s.split(",").contains(code.get))
      val secondPart = s2.exists(d => LocalDate.parse(d.s).compareTo(date2.get) >= 0)
      if (firstPart && secondPart) {
        Result("Found")
      } else {
        Result("Not Found")
      }
  }
}
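The same AND logic can also be expressed with for comprehensions, which avoids calling .get on the Options. This is only a sketch under the same assumptions as above; `check` is a hypothetical helper name, and `!isBefore` is used as an equivalent of `compareTo(...) >= 0`.

```scala
import java.time.LocalDate
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

case class Test(s: String)
case class Result(s: String)

// Hypothetical helper: "Found" only if both Options are defined and both conditions hold.
def check(code: Option[String],
          date: Option[LocalDate],
          f1: Future[Seq[Test]],
          f2: Future[Seq[Test]]): Future[Result] =
  for {
    s1 <- f1
    s2 <- f2
  } yield {
    val found = for {
      c <- code
      d <- date
      if s1.exists(_.s.split(",").contains(c))
      if s2.exists(t => !LocalDate.parse(t.s).isBefore(d))
    } yield Result("Found")
    found.getOrElse(Result("Not Found"))
  }

val codes = Future.successful(Seq(Test("11,22,33")))
val dates = Future.successful(Seq(Test("2017-02-06")))
val since = Option(LocalDate.parse("2017-02-01"))

val hit  = Await.result(check(Option("22"), since, codes, dates), 5.seconds) // Result("Found")
val miss = Await.result(check(Option("44"), since, codes, dates), 5.seconds) // Result("Not Found")
```

If either Option is empty, the inner comprehension yields None and the result falls through to Result("Not Found").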
Use pattern matching on the Option instead of flatMap, e.g.
val x = Some("20")
x match {
  case Some(i) => println(i) // do whatever you want with the value, then return the result
  case None => Result("Not Found")
}
Looking at what you are trying to do, you would have to use pattern matching twice, and nested at that.

Scala - Recursive method returns different values

I have implemented a calculation to obtain the score of each node.
The formula to obtain the value is:
The children list can not be empty or a flag must be true;
The iterative way works pretty well:
class TreeManager {
  def scoreNo(nodes: List[Node]): List[(String, Double)] = {
    nodes.headOption.map(node => {
      val ranking = node.key.toString -> scoreNode(Some(node)) :: scoreNo(nodes.tail)
      ranking ::: scoreNo(node.children)
    }).getOrElse(Nil)
  }
  def scoreNode(node: Option[Node], score: Double = 0, depth: Int = 0): Double = {
    node.map(n => {
      var nodeScore = score
      for (child <- n.children) {
        if (!child.children.isEmpty || child.hasInvitedSomeone == Some(true)) {
          nodeScore = scoreNode(Some(child), (nodeScore + scala.math.pow(0.5, depth)), depth + 1)
        }
      }
      nodeScore
    }).getOrElse(score)
  }
}
But after I refactored this piece of code to use recursion, the results are totally wrong:
class TreeManager {
  def scoreRecursive(nodes: List[Node]): List[(Int, Double)] = {
    def scoreRec(nodes: List[Node], score: Double = 0, depth: Int = 0): Double = nodes match {
      case Nil => score
      case n =>
        if (!n.head.children.isEmpty || n.head.hasInvitedSomeone == Some(true)) {
          score + scoreRec(n.tail, score + scala.math.pow(0.5, depth), depth + 1)
        } else {
          score
        }
    }
    nodes.headOption.map(node => {
      val ranking = node.key -> scoreRec(node.children) :: scoreRecursive(nodes.tail)
      ranking ::: scoreRecursive(node.children)
    }).getOrElse(Nil).sortWith(_._2 > _._2)
  }
}
The Node is an object of a tree and it's represented by the following class:
case class Node(key:Int,
children:List[Node] = Nil,
hasInvitedSomeone:Option[Boolean] = Some(false))
And here is the part that I'm running to check the results:
object Main {
  def main(bla: Array[String]) = {
    val xx = new TreeManager
    val values = List(
      Node(10, List(
        Node(11, List(Node(13))),
        Node(12, List(
          Node(14, List(
            Node(15, List(Node(18))),
            Node(17, hasInvitedSomeone = Some(true)),
            Node(16, List(Node(19, List(Node(20)))), hasInvitedSomeone = Some(true))),
            hasInvitedSomeone = Some(true))),
          hasInvitedSomeone = Some(true))),
        hasInvitedSomeone = Some(true)))
    val resIterative = xx.scoreNo(values)
    //val resRecursive = xx.scoreRec(values)
    println("a")
  }
}
The iterative way works (I've checked it), but I don't understand why the recursive one returns wrong values.
Any ideas?
Thanks in advance.
The recursive version never recurses into the children of the nodes, only along the tail, whereas the iterative version correctly does both: it recurses into the children and iterates over the tail.
You'll notice your "iterative" version is also recursive, by the way.
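A recursive scoring function that does both could look roughly like the sketch below. `scoreChildren` and `qualifies` are hypothetical names; the function is meant to mirror scoreNode from the question (called on a node's children), not to be a drop-in replacement for the whole class.

```scala
case class Node(key: Int,
                children: List[Node] = Nil,
                hasInvitedSomeone: Option[Boolean] = Some(false))

// A child counts only if it has children of its own or has invited someone.
def qualifies(n: Node): Boolean =
  n.children.nonEmpty || n.hasInvitedSomeone.contains(true)

def scoreChildren(children: List[Node], score: Double = 0, depth: Int = 0): Double =
  children match {
    case Nil => score
    case child :: rest =>
      // Recurse into the child's own children (one level deeper) ...
      val afterChild =
        if (qualifies(child)) scoreChildren(child.children, score + math.pow(0.5, depth), depth + 1)
        else score
      // ... then carry the running score along the siblings (same depth).
      scoreChildren(rest, afterChild, depth)
  }

val tree = Node(1, List(
  Node(2, List(Node(3))),
  Node(4, hasInvitedSomeone = Some(true)),
  Node(5)))
val score = scoreChildren(tree.children) // 2.0: nodes 2 and 4 each add 0.5^0
```

The key difference from scoreRec in the question is that the depth only increases when descending into a child's children, not when moving on to the next sibling.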

How can I speed up flatten?

I have this method:
val reportsWithCalculatedUsage = time("Calculate USAGE") {
  reportsHavingCalculatedCounter.flatten.flatten.toList.groupBy(_._2.product).mapValues(_.map(_._2)) mapValues { list =>
    list.foldLeft(List[ReportDataHelper]()) {
      case (Nil, head) =>
        List(head)
      case (tail, head) =>
        val previous = tail.head
        val current = head copy (
          usage = if (head.machine == previous.machine) head.counter - previous.counter else head.usage)
        current :: tail
    } reverse
  }
}
Where reportsHavingCalculatedCounter is of type:
scala.collection.immutable.Iterable[scala.collection.immutable.IndexedSeq[scala.collection.immutable.Map[String,com.agilexs.machinexs.logic.ReportDataHelper]]]
This code works perfectly. The problem is that the maps inside reportsHavingCalculatedCounter hold about 50 000 ReportDataHelper objects (map values) in total, and flatten.flatten takes about 15s to process them.
I've also tried two flatMaps, but that takes almost the same time. Is there any way to improve this? (Please ignore foldLeft and reverse; the issue is still present if I remove them. The most time-consuming part is those two flatten calls.)
UPDATE: I've tried with a different scenario:
val reportsHavingCalculatedCounter2: Seq[ReportDataHelper] = time("Counter2") {
  val builder = new ArrayBuffer[ReportDataHelper](50000)
  var c = 0
  reportsHavingCalculatedCounter.foreach { v =>
    v.foreach { v =>
      v.values.foreach { v =>
        c += 1
        builder += v
      }
    }
  }
  println("Count:" + c)
  builder.result
}
It takes: Counter2 (15.075s).
I can't imagine that Scala is this slow. The slowest part is v.values.foreach.