I'm writing a simple breadth-first search algorithm in Scala, and I feel like it should be pretty efficient. However, when I run it on some relatively small problems, I manage to run out of memory.
def search(start: State): Option[State] = {
val queue: mutable.Queue[State] = mutable.Queue[State]()
queue.enqueue( start )
while( queue.nonEmpty ){
val node = queue.dequeue()
if( self.isGoal(node) )
return Some(node)
self.successors(node).foreach( queue.enqueue )
}
None
}
I believe the enqueue and dequeue methods on a mutable queue are constant time and that foreach is implemented efficiently. I know the methods isGoal and successors are as efficient as they can be. I don't understand how I can be running out of memory so quickly. Are there any inefficiencies in this code that I'm missing?
I think c0der's comment nailed it: you may be getting caught in an infinite loop re-checking nodes that you've already visited. Consider the following changes:
def search(start: State): Option[State] = {
var visited: Set[State] = Set() // change #1
val queue: mutable.Queue[State] = mutable.Queue[State]()
queue.enqueue( start )
while( queue.nonEmpty ){
val node = queue.dequeue()
if (!visited.contains(node)) { // change #2
visited += node // change #3
if( self.isGoal(node) )
return Some(node)
self.successors(node).foreach( queue.enqueue )
}
}
None
}
Change #1: Initialize a new Set, visited, to keep track of which nodes you've been to.
Change #2: Immediately after dequeueing a node, check whether you've visited it before. If not, continue processing it. Otherwise, ignore it.
Change #3: Add the node to the visited Set so it's not checked again in the future.
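A variation on the same idea, as a minimal sketch rather than a drop-in replacement: mark a state as visited when it is enqueued instead of when it is dequeued, so the queue never holds the same state twice. The object name BfsSketch and the function parameters successors and isGoal are assumptions standing in for self.successors and self.isGoal. Either approach only works if State has a sensible equals/hashCode (a case class gets one automatically).

```scala
import scala.collection.mutable

object BfsSketch {
  // Generic BFS; `successors` and `isGoal` are hypothetical stand-ins
  // for self.successors and self.isGoal in the question.
  def search[S](start: S)(successors: S => Iterable[S], isGoal: S => Boolean): Option[S] = {
    val visited = mutable.HashSet[S](start) // every state that has ever been enqueued
    val queue = mutable.Queue[S](start)
    while (queue.nonEmpty) {
      val node = queue.dequeue()
      if (isGoal(node)) return Some(node)
      for (next <- successors(node)) {
        // HashSet.add returns true only if `next` was not already in the set
        if (visited.add(next)) queue.enqueue(next)
      }
    }
    None
  }
}
```

For example, BfsSketch.search(0)(n => List((n + 1) % 10, (n * 2) % 10), _ == 7) explores a small cyclic graph over 0 to 9 and terminates even though the graph loops back on itself.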
Hope that helps :D
That is Java-style code rather than idiomatic Scala. In Scala, vars and while loops are things you should generally avoid. Here is my suggestion for how you could solve this.
class State(val neighbours: List[State]) // I am not sure how your State class looks like, but it could look something like this
val goal = new State(List())
def breathFirst(start: State): Option[State] = {
@scala.annotation.tailrec
def recursiveFunction(visited: List[State], toVisit: List[State]): Option[State] = { // So we will create recursive function with visited nodes and nodes that we should visit
if (toVisit.isEmpty) return None // If toVisit is empty that means that there is no path from start to goal, return none
else {
val visiting = toVisit.head // Else we should take first node from toVisit
val visitingNeighbours = visiting.neighbours // Take all neighbours from node that we are visiting
val visitingNeighboursNotYetVisited = visitingNeighbours.filter(x => !visited.contains(x)) //Filter all neighbours that are not visited
if (visitingNeighboursNotYetVisited.contains(goal)) { //if we found goal, return it
return Some(goal)
} else {
return recursiveFunction(visited :+ visiting, toVisit.tail ++ visitingNeighboursNotYetVisited) // Otherwise add node that we visited in this iteration to list of visited nodes that does not have visited node - it was head so we take toVisit.tail
// and also we will take all neighbours that are not visited and add them to toVisit list for next iteration
}
}
}
if (start == goal) { // If goal is start, return start
Some(start)
} else { // else call our recursive function with empty visited list and with toVisit list that has start node
recursiveFunction(List(), List(start))
}
}
NOTE: You could change:
val visitingNeighboursNotYetVisited = visitingNeighbours.filter(x => !visited.contains(x)) //Filter all neighbours that are not visited
with
val visitingNeighboursNotYetVisited = visitingNeighbours
and check whether you run out of memory. You probably won't, and that will show you why you should use tailrec.
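The same recursive idea can also be written without return, with a Set for the visited check and an immutable Queue for the frontier, so membership tests and appends stay cheap. This is only a sketch under assumptions: the object name TailRecBfs, the loop helper, and the neighbours function (standing in for State.neighbours) are made up for illustration.

```scala
import scala.annotation.tailrec
import scala.collection.immutable.Queue

object TailRecBfs {
  // `neighbours` is a hypothetical adjacency function standing in for State.neighbours.
  def breadthFirst[S](start: S, goal: S)(neighbours: S => List[S]): Option[S] = {
    @tailrec
    def loop(visited: Set[S], toVisit: Queue[S]): Option[S] =
      toVisit.dequeueOption match {
        case None => None                                // frontier exhausted: goal unreachable
        case Some((node, _)) if node == goal => Some(node)
        case Some((node, rest)) =>
          // keep only neighbours that have never been enqueued before
          val fresh = neighbours(node).distinct.filterNot(visited)
          loop(visited ++ fresh, rest ++ fresh)
      }
    loop(Set(start), Queue(start))
  }
}
```

Because nodes are added to visited as soon as they are enqueued, a cyclic graph cannot make the frontier grow without bound.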
I learned early on that there is no reason to use the return keyword in Scala (as far as I'm aware). That said, I found an example where simply adding the return keyword made my function work where it previously didn't.
The code in question comes from my solution to the Advent of Code day 7 challenge.
def containsShinyGoldBag(bagContents: Map[String, List[String]], currentBag: String): Boolean = {
val contents = bagContents(currentBag)
if (bagContents(currentBag).contains("shiny gold") ) {
// Base Case: Bag Found in list of bags
true
} else if (contents == List.empty){
// Base Case: Dead End
false
} else {
// Continue searching down list
// Ideal solution ( gives same result as the working solution without return keyword )
// for (b <- contents) containsShinyGoldBag(bagContents, b)
// Working solution
for (b <- contents) {
if (containsShinyGoldBag(bagContents, b)) {
println(s"Found one! $b inside a $currentBag")
return true // <--- culprit
}
else false
}
false
}
}
// In the main function
var count = 0
for (bag <- bagContents.keys) {
if (containsShinyGoldBag(bagContents, bag)) {
count = count + 1
}
}
println(s"There are $count way to bring a shiny gold bag!")
When I run the code without return I end up with count = 7, which is the number of bags directly containing a shiny gold bag, rather than the correct number which counts bags that contain a shiny gold bag somewhere inside of one of their other bags down the line.
A function returns the value of the last expression it evaluates; in your case that will be one of:
true after if (bagContents(currentBag).contains("shiny gold") );
false after else if (contents == List.empty);
the last false.
The true inside the for loop is not in such a position, so you need return to, well, make the function return it. Otherwise it's evaluated and ignored because you don't do anything with it. The same goes for the else false in the same for loop: it can be removed without changing the meaning.
The alternative to avoid return here is
contents.exists(b => containsShinyGoldBag(bagContents, b))
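Put together, the whole function might look like the sketch below. The wrapper object BagSearch and the sample data in the usage note are assumptions for illustration; getOrElse is used so that a bag missing from the map is treated as empty, which the original code does not do.

```scala
object BagSearch {
  // True if `currentBag` holds a shiny gold bag directly or at any depth.
  def containsShinyGoldBag(bagContents: Map[String, List[String]], currentBag: String): Boolean = {
    val contents = bagContents.getOrElse(currentBag, Nil)
    // `exists` stops at the first inner bag for which the recursive call is true
    contents.contains("shiny gold") ||
      contents.exists(b => containsShinyGoldBag(bagContents, b))
  }
}
```

The counting loop in the main function can then be replaced by bagContents.keys.count(bag => BagSearch.containsShinyGoldBag(bagContents, bag)), which also removes the need for var count.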
I want to look for an entire list of items to be found before I complete and if that entire list isn't found, then an exception (a Timeout or custom one) is to be thrown. Like the built in Observable.timer() but instead of the test passing once the first item is emitted, I want it to require all of the items in a list to be found.
Here is an example. Let's say I have some test function that emits Observable<FoundNumber>. It looks like this:
var emittedList: List<String?> = listOf(null, "202", "302", "400")
data class FoundNumber(val numberId: String?)
fun scanNumbers(): Observable<FoundNumber> = Observable
.intervalRange(0,
emittedList.size.toLong(),
0,
1,
TimeUnit.SECONDS).map { index ->
FoundNumber(emittedList[index.toInt()]) }
That function will then be called to get numbers that will be compared to a list of expected numbers. It doesn't matter if there are additional numbers coming from scanForNumbers that aren't in the "target" list. They will just be ignored. Something like this:
val expectedNumbers = listOf("202", "302","999")
scanForNumbers(expectedNumbers)
.observeOn(AndroidSchedulers.mainThread())
.subscribeOn(Schedulers.io())
.subscribe { value -> Log.d(TAG, "Was returned a $value") }
So, the expected numbers (202, 302, and 999) don't exactly match with the numbers that will be emitted (202, 302, and 400). So, a timeout SHOULD occur, but with the built in version of Observable.timer(), it will not time out since at least one item was observed.
Here is kind of what I'd like to have. Anyone know how to code this up in RxJava/RxKotlin?
fun scanForNumbers(targets: List<String>): Observable<FoundNumber> {
val accumulator: Pair<Set<Any>, FoundNumber?> = targets.toSet() to null
return scanNumbers()
.SPECIAL_TIMEOUT_FOR_LIST(5, TimeUnit.SECONDS, List)
.scan(accumulator) { acc, next ->
val (set, previous) = acc
val stringSet:MutableSet<String> = hashSetOf()
set.forEach { stringSet.add(it.toString()) }
val item = if (next.numberId in stringSet) {
next
} else null
(set - next) to item // return set and nullable item
}
.filter { Log.d(TAG, "Filtering on ${it.second}")
it.second != null } // item not null
.take(targets.size.toLong()) // limit to the number of items
.map { it.second } // unwrap the item from the pair
.map { FoundController(it.numberId) } // wrap in your class
}
How do you code, hopefully using RxJava/Kotlin, a means to timeout on a list as mentioned?
I think I get it now, you want the timeout to begin counting from the moment you subscribe, not after you observe items.
If this is what you need, then the takeUntil operator could help you:
return scanNumbers()
.takeUntil(Observable.timer(5, TimeUnit.SECONDS))
.scan(accumulator) { acc, next -> ...
In this case, the timer begins counting as soon as you subscribe. If the main observable completes before the timer fires, great; if not, the timer will complete the main observable anyway.
But takeUntil by itself will not throw an error, it will just complete. If you need it to end with an error, then you could use the following combination:
return scanNumbers()
.takeUntil(
Observable
.error<Void>(TimeoutException("timeout!")) // java.util.concurrent.TimeoutException; Kotlin has no `new` keyword
.delay(5, TimeUnit.SECONDS, true))
.scan(accumulator) { acc, next -> ...
Given the following code
case class Score(value: BigInt, random: Long = randomLong) extends Comparable[Score] {
override def compareTo(that: Score): Int = {
if (this.value < that.value) -1
else if (this.value > that.value) 1
else if (this.random < that.random) -1
else if (this.random > that.random) 1
else 0
}
override def equals(obj: _root_.scala.Any): Boolean = {
val that = obj.asInstanceOf[Score]
this.value == that.value && this.random == that.random
}
}
@tailrec
private def update(mode: UpdateMode, member: String, newScore: Score, spinCount: Int, spinStart: Long): Unit = {
// Caution: there is some subtle logic below, so don't modify it unless you grok it
try {
Metrics.checkSpinCount(member, spinCount)
} catch {
case cause: ConcurrentModificationException =>
throw new ConcurrentModificationException(Leaderboard.maximumSpinCountExceeded.format("update", member), cause)
}
// Set the spin-lock
put(member, None) match {
case None =>
// BEGIN CRITICAL SECTION
// Member's first time on the board
if (scoreToMember.put(newScore, member) != null) {
val message = s"$member: added new member in memberToScore, but found old member in scoreToMember"
logger.error(message)
throw new ConcurrentModificationException(message)
}
memberToScore.put(member, Some(newScore)) // remove the spin-lock
// END CRITICAL SECTION
case Some(option) => option match {
case None => // Update in progress, so spin until complete
//logger.debug(s"update: $member locked, spinCount = $spinCount")
for (i <- -1 to spinCount * 2) {Thread.`yield`()} // dampen contention
update(mode, member, newScore, spinCount + 1, spinStart)
case Some(oldScore) =>
// BEGIN CRITICAL SECTION
// Member already on the leaderboard
if (scoreToMember.remove(oldScore) == null) {
val message = s"$member: oldScore not found in scoreToMember, concurrency defect"
logger.error(message)
throw new ConcurrentModificationException(message)
} else {
val score =
mode match {
case Replace =>
//logger.debug(s"$member: newScore = $newScore")
newScore
case Increment =>
//logger.debug(s"$member: newScore = $newScore, oldScore = $oldScore")
Score(newScore.value + oldScore.value)
}
//logger.debug(s"$member: updated score = $score")
scoreToMember.put(score, member)
memberToScore.put(member, Some(score)) // remove the spin-lock
//logger.debug(s"update: $member unlocked")
}
// END CRITICAL SECTION
// Do this outside the critical section to reduce time under lock
if (spinCount > 0) Metrics.checkSpinTime(System.nanoTime() - spinStart)
}
}
}
There are two important data structures: memberToScore and scoreToMember. I have experimented using both TrieMap[String,Option[Score]] and ConcurrentHashMap[String,Option[Score]] for memberToScore and both have the same behavior.
So far my testing indicates the code is correct and thread safe, but the mystery is the performance of the spin-lock. On a system with 12 hardware threads, running 1000 iterations on 12 Futures, hitting the same member all the time results in spin cycles of 50 or more, but hitting a random distribution of members can result in spin cycles of 100 or more. The behavior gets worse if I don't dampen the spin by iterating over yield() calls.
This seems counterintuitive: I was expecting a random distribution of keys to result in less spinning than a single key, but testing shows otherwise.
Can anyone offer some insight into this counter-intuitive behavior?
Granted there may be better solutions to my design, and I am open to them, but for now I cannot seem to find a satisfactory explanation for what my tests are showing, and my curiosity leaves me hungry.
As an aside, while the single member test has a lower ceiling for the spin count, the random member test has a lower ceiling for time spinning, which is what I would expect. I just cannot explain why the random member test generally produces a higher ceiling for spin count.
I was reading a tutorial about Akka and came across this: http://manuel.bernhardt.io/2014/04/23/a-handful-akka-techniques/
I think he explained it pretty well, but I only picked up Scala recently and am having difficulties with the tutorial.
I wonder what the difference is between RoundRobinRouter and the current RoundRobinRoutingLogic. Obviously the implementation is quite different.
Previously, the implementation with RoundRobinRouter was
val workers = context.actorOf(Props[ItemProcessingWorker].withRouter(RoundRobinRouter(100)))
with processBatch
def processBatch(batch: List[BatchItem]) = {
if (batch.isEmpty) {
log.info(s"Done migrating all items for data set $dataSetId. $totalItems processed items, we had ${allProcessingErrors.size} errors in total")
} else {
// reset processing state for the current batch
currentBatchSize = batch.size
allProcessedItemsCount = currentProcessedItemsCount + allProcessedItemsCount
currentProcessedItemsCount = 0
allProcessingErrors = currentProcessingErrors ::: allProcessingErrors
currentProcessingErrors = List.empty
// distribute the work
batch foreach { item =>
workers ! item
}
}
}
Here's my implementation with RoundRobinRoutingLogic
var mappings : Option[ActorRef] = None
var router = {
val routees = Vector.fill(100) {
mappings = Some(context.actorOf(Props[Application3]))
context watch mappings.get
ActorRefRoutee(mappings.get)
}
Router(RoundRobinRoutingLogic(), routees)
}
and treated the processBatch as such
def processBatch(batch: List[BatchItem]) = {
if (batch.isEmpty) {
println(s"Done migrating all items for data set $dataSetId. $totalItems processed items, we had ${allProcessingErrors.size} errors in total")
} else {
// reset processing state for the current batch
currentBatchSize = batch.size
allProcessedItemsCount = currentProcessedItemsCount + allProcessedItemsCount
currentProcessedItemsCount = 0
allProcessingErrors = currentProcessingErrors ::: allProcessingErrors
currentProcessingErrors = List.empty
// distribute the work
batch foreach { item =>
// println(item.id)
mappings.get ! item
}
}
}
I somehow cannot get this to run: it gets stuck at the point where it iterates over the batch list. I wonder what I did wrong.
Thanks
First of all, you have to understand the difference between them.
RoundRobinRouter is a Router that uses round-robin to select a connection.
While
RoundRobinRoutingLogic uses round-robin to select a routee
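To make "round-robin" concrete, here is a plain-Scala illustration of the selection strategy. It is not Akka's source and does not use any Akka API; the class name RoundRobinSelector and its select method are hypothetical.

```scala
import java.util.concurrent.atomic.AtomicLong

// Illustration only, not Akka's implementation: hands out a fixed
// sequence of targets one after another, wrapping around at the end.
final class RoundRobinSelector[A](targets: IndexedSeq[A]) {
  require(targets.nonEmpty, "need at least one target")
  private val next = new AtomicLong(0)

  // Each call picks the target after the one picked by the previous call.
  def select(): A = {
    val i = (next.getAndIncrement() % targets.size).toInt
    targets(i)
  }
}
```

With three targets a, b and c, successive calls to select() yield a, b, c, a, b and so on, which is the distribution a round-robin router gives to incoming messages.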
You can provide your own RoutingLogic (doing so helped me understand how Akka works under the hood):
class RedundancyRoutingLogic(nbrCopies: Int) extends RoutingLogic {
val roundRobin = RoundRobinRoutingLogic()
def select(message: Any, routees: immutable.IndexedSeq[Routee]): Routee = {
val targets = (1 to nbrCopies).map(_ => roundRobin.select(message, routees))
SeveralRoutees(targets)
}
}
Link to the docs: http://doc.akka.io/docs/akka/2.3.3/scala/routing.html
P.S. This doc is very clear, and it helped me the most.
Actually I misunderstood the method, and found out the solution was to use RoundRobinPool as stated in http://doc.akka.io/docs/akka/2.3-M2/project/migration-guide-2.2.x-2.3.x.html
For example RoundRobinRouter has been renamed to RoundRobinPool or
RoundRobinGroup depending on which type you are actually using.
from
val workers = context.actorOf(Props[ItemProcessingWorker].withRouter(RoundRobinRouter(100)))
to
val workers = context.actorOf(RoundRobinPool(100).props(Props[ItemProcessingWorker]), "router2")
I wrote Java code for an insertion sort on a doubly linked list, but there is an infinite loop when I run it. I'm trying to find the bug but have not found it so far. Any suggestions?
It calls a helper function:
public IntList insertionSort ( ) {
DListNode soFar = null;
for (DListNode p=myHead; p!=null; p=p.myNext) {
soFar = insert (p, soFar);
}
return new IntList (soFar);
}
// values will be in decreasing order.
private DListNode insert (DListNode p, DListNode head) {
DListNode q=new DListNode(p.myItem);
if(head==null){
head=q;
return head;
}
if(q.myItem>=head.myItem){
DListNode te=head;
q.myNext=te;
te.myPrev=q;
q=head;
return head;
}
DListNode a;
boolean found=false;
for(a=head; a!=null;){
if(a.myItem<q.myItem){
found=true;
break;
}
else{
a=a.myNext;
}
}
if(found==false){
DListNode temp=myTail;
temp.myNext=q;
q.myPrev=temp;
myTail=q;
return head;
}
if(found==true){
DListNode t;
t=a.myPrev;
a.myPrev=q;
t.myNext=q;
q.myPrev=t;
q.myNext=a;
}
return head;
}
Your code is a bit hard to read through, but I noticed a few problems.
First:
handling the case where you are inserting a number at the head of the list:
if(q.myItem>=head.myItem){
DListNode te=head;
q.myNext=te;
te.myPrev=q;
q=head;
return head;
}
specifically the line q=head; and the return. q=head can be removed, and it should return q not head because q is the new head. I think what you meant to do was head=q; return head;. The current code will essentially add the new node on the front but never return the updated head so they will "fall off the edge" in a way.
Second:
I am assuming myTail is some node reference you are keeping like myHead to the original list. I don't think you want to be using it like you are for the sorted list you are constructing. When you loop through looking for the place to insert in the new list, use that to determine the tail reference and use that instead.
DListNode lastCompared = null;
for(a=head; a!=null; a=a.myNext) {
lastCompared = a;
if(a.myItem<q.myItem) {
break;
}
}
if (a != null)
{
// insert node before a
...
}
else
{
// smallest value yet, throw on the end
lastCompared.myNext = q;
q.myPrev = lastCompared;
return head;
}
Finally make sure myPrev and myNext are being properly initialized to null in the constructor for DListNode.
Disclaimer: I didn't get a chance to test the code I added here, but hopefully it at least gets you thinking about the solution.
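For reference, both fixes combined could look roughly like the sketch below. It is written in Scala, but the pointer manipulation carries over to Java unchanged. The names SortedInsert, Node, item, prev, next and toList are assumptions standing in for DListNode and its fields; the original class is not shown in the question.

```scala
object SortedInsert {
  // Hypothetical stand-in for DListNode; both links start out as null.
  final class Node(val item: Int) {
    var prev: Node = null
    var next: Node = null
  }

  // Inserts `value` into the descending list that starts at `head`
  // and returns the (possibly new) head of that list.
  def insert(value: Int, head: Node): Node = {
    val q = new Node(value)
    if (head == null) return q
    if (q.item >= head.item) { // new largest value: q becomes the head
      q.next = head
      head.prev = q
      return q
    }
    var last = head
    var a = head.next
    // advance until `a` is the first node smaller than q (or the end of the list)
    while (a != null && a.item >= q.item) { last = a; a = a.next }
    // q goes between `last` and `a`; a == null means q is the new tail
    last.next = q
    q.prev = last
    q.next = a
    if (a != null) a.prev = q
    head
  }

  // Collects the items from head to tail, for inspection.
  def toList(head: Node): List[Int] = {
    val buf = List.newBuilder[Int]
    var n = head
    while (n != null) { buf += n.item; n = n.next }
    buf.result()
  }
}
```

Tracking last during the scan is what removes the dependency on myTail: the new list's tail is found by the same loop that looks for the insertion point.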
A couple of stylistic notes (just a side note):
The repeated if->return format is not the cleanest, in my opinion. I generally try to limit the exit points in functions.
There are a lot of intermediate variables being used, and the names are super ambiguous. At the very least, try to use more descriptive variable names.
Comments are always a good idea. Just make sure they don't merely explain what the code is doing; instead, try to convey the thought process and what you are trying to accomplish.