Iterate over middle third of map - scala

I have a scala Map[String, String] and I am comparing a part of the key to its value only for the mid-third of the map. Since it is not easy to iterate using indices in a map, I came up with the following but it does not work.
var i = 0
var j = 0
val mapSize = sortedMap.size/3
for((key,value) <- sortedMap) {
j+=1
if((i < 3) && (key.split(' ').take(1).mkString==value)&&(j>mapSize)){
Accuracy += 1
i += 1
}
}

You could use the method slice(from: Int, until: Int) and then only iterate over the middle third of the sorted map. Something like
val mapSize = sortedMap.size
for ((key, value) <- sortedMap.slice(mapSize/3, 2*mapSize/3)) {
...
}
Note that this is only reliable if the underlying map is sorted (as seems to be the case in your example). You also might have to adapt the index calculation a little bit, depending on what exactly you consider the middle third for maps whose size is not divisible by 3.
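As a minimal sketch of the slice approach, assuming a TreeMap with made-up sample entries (the object name MiddleThird and the data are hypothetical, not from the question), the middle third can be taken and checked like this:

```scala
import scala.collection.immutable.TreeMap

object MiddleThird {
  // Hypothetical sample data; a TreeMap keeps its keys in sorted order.
  val sortedMap: TreeMap[String, String] = TreeMap(
    "a 1" -> "a", "b 2" -> "x", "c 3" -> "c",
    "d 4" -> "d", "e 5" -> "y", "f 6" -> "f"
  )

  val n: Int = sortedMap.size

  // Entries with positions n/3 (inclusive) up to 2*n/3 (exclusive).
  val middle: TreeMap[String, String] = sortedMap.slice(n / 3, 2 * n / 3)

  // Count entries whose first space-separated key token equals the value.
  val accuracy: Int = middle.count { case (key, value) =>
    key.split(' ').head == value
  }
}
```

With six entries, the slice covers positions 2 and 3, so only the "c 3" and "d 4" entries are compared.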

You could convert the map to a stream, modify the stream to remove the first third and the last third and then iterate through the remaining middle third.
val middle = sortedMap.toStream.drop(sortedMap.size / 3).dropRight(sortedMap.size / 3)
middle.foreach(println _) // replace println with your key test
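For the key test, you could use a pattern match on the (key, value) pair inside foreach, with a fallback case so that non-matching entries do not throw a MatchError:
case (key, value) if key.split(' ')(0) == value => ...do something...
case _ => // ignore entries that do not match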

How to access previous element when using yield in for loop chisel3

This is a mixed Chisel / Scala question.
Background: I need to sum up a lot of numbers (the number of input signals is configurable). Due to timing constraints I had to split them into groups of 4 and pipeline (register) each group's sum, which is then fed into the next stage (which is 4 times smaller), until I reach one.
This is my code:
// log4 Aux function //
def log4(n : Int): Int = math.ceil(math.log10(n.toDouble) / math.log10(4.0)).toInt
// stage //
def Adder4PipeStage(len: Int,in: Vec[SInt]) : Vec[SInt] = {
require(in.length % 4 == 0) // will not work if the length is not a multiple of 4
val pipe = RegInit(VecInit(Seq.fill(len/4)(0.S(in(0).getWidth.W))))
pipe.zipWithIndex.foreach {case(p,j) => p := in.slice(j*4,(j+1)*4).reduce(_ +& _)}
pipe
}
// the pipeline
val adderPiped = for(j <- 1 to log4(len)) yield Adder4PipeStage(len/j,if(j==1) io.in else <what here ?>)
How do I access the previous stage? I am also open to hearing about other ways to implement the above.
There are several things you could do here:
You could just use a var for the "previous" value:
var prev: Vec[SInt] = io.in
val adderPiped = for(j <- 1 to log4(len)) yield {
prev = Adder4PipeStage(len/j, prev)
prev
}
It is a little weird using a var with a for yield (since the former is fundamentally mutable while the latter tends to be used with immutable-style code).
You could alternatively use a fold building up a List
// Build up backwards and reverse (typical in functional programming)
val adderPiped = (1 to log4(len)).foldLeft(io.in :: Nil) {
case (pipes, j) => Adder4PipeStage(len/j, pipes.head) :: pipes
}.reverse
.tail // Tail drops "io.in" which was 1st element in the result List
If you don't like the backwards construction of the previous fold, you could use a fold with a Vector (better for appending than a List):
val adderPiped = (1 to log4(len)).foldLeft(Vector(io.in)) {
case (pipes, j) => pipes :+ Adder4PipeStage(len/j, pipes.last)
}.tail // Tail drops "io.in" which was 1st element in the result Vector
Finally, if you don't like these immutable ways of doing it, you could always just embrace mutability and write something similar to what one would in Java or Python:
For loop and mutable collection
import scala.collection.mutable
val pipes = new mutable.ArrayBuffer[Vec[SInt]]
for (j <- 1 to log4(len)) {
pipes += Adder4PipeStage(len/j, if (j == 1) io.in else pipes.last)
}
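Another option that keeps every intermediate stage is scanLeft. The following is only a plain-Scala model of the idea, not Chisel code: Vector[Int] stands in for Vec[SInt], and the stage function is a hypothetical stand-in for Adder4PipeStage without the registers. Each stage derives its output size from its input, so no per-stage length has to be computed.

```scala
object PipelineModel {
  // Hypothetical stand-in for Adder4PipeStage: sums each group of 4 inputs.
  def stage(in: Vector[Int]): Vector[Int] = {
    require(in.length % 4 == 0, "input length must be a multiple of 4")
    in.grouped(4).map(_.sum).toVector
  }

  // scanLeft feeds each stage's output into the next one and keeps all
  // intermediate results; tail drops the initial input from the result.
  def stages(input: Vector[Int], numStages: Int): Vector[Vector[Int]] =
    (1 to numStages).scanLeft(input)((prev, _) => stage(prev)).tail.toVector
}
```

In the Chisel version the same shape would presumably apply with io.in as the start value and Adder4PipeStage as the step function.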

How to sum number of Ints and Number of Floats within a List - Scala

I need to calculate the number of integers and floats I have in a Map of type Map[String, List[(Int, String, Float)]].
The data comes from reading a file. The data looks something like this (there are a few more routes):
Cycle Route (City),1:City Centre :0.75f,2:Main Park :3.8f,3:Central Station:2.7f,4:Modern Art Museum,5:Garden Centre:2.4f,6:Music Centre:3.4f
The map is split so that the String is the name of the route and the List is the rest of the data.
I want it to calculate the number of 'checkpoints' per route and total distance of each route (which is the float) then print out e.g. Oor Wullie Route has 6 checkpoints and total distance of 18.45km
I am guessing I need to use a foldLeft, but I am unsure how to do so.
Here is an example of a simple fold I have done before, but I am not sure how to apply one to the above scenario:
val list1 = List.range(1,20)
def sum(ls:List[Int]):Int = {
ls.foldLeft(0) { _ + _}
}
You could do this with a fold, but IMO it is unnecessary.
You know the number of checkpoints by simply taking the size of the list (assuming each entry in the list is a checkpoint).
To compute the total distance, you could do:
def getDistance(list: List[(Int, String, Float)]): Float =
list
.iterator // mapping the iterator to avoid building an intermediate List instance
.map(_._3) // get the distance Float from the tuple
.sum // built-in method for collections of Numeric elements (e.g. Float)
And then get your printout like:
def summarize(routes: Map[String, List[(Int, String, Float)]]): Unit =
for { (name, stops) <- routes } {
val numStops = stops.size
val distance = getDistance(stops)
println(s"$name has $numStops stops and total distance of $distance km")
}
If you really wanted to compute both numStops and distance via foldLeft, Luis's comment on your question is the way to do it.
edit - per Luis's request, putting his comment in here and cleaning it up a bit:
stops.foldLeft(0 -> 0.0f) {
// note: "acc" is short for "accumulated"
case ((accCount, accDistance), (_, _, distance)) =>
(accCount + 1) -> (accDistance + distance)
}
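Putting the fold together with the printout, a self-contained sketch could look like the following. The route name and stops are made-up sample data, and the names RouteSummary, summarize and report are only illustrative.

```scala
object RouteSummary {
  type Stop = (Int, String, Float)

  // One pass over the stops, accumulating (count, total distance).
  def summarize(stops: List[Stop]): (Int, Float) =
    stops.foldLeft((0, 0.0f)) {
      case ((count, total), (_, _, distance)) => (count + 1, total + distance)
    }

  // Hypothetical sample data in the shape described in the question.
  val routes: Map[String, List[Stop]] = Map(
    "Sample Route" -> List(
      (1, "City Centre", 0.75f),
      (2, "Main Park", 3.5f),
      (3, "Central Station", 2.25f)
    )
  )

  // One summary line per route.
  def report(): List[String] =
    routes.toList.map { case (name, stops) =>
      val (numStops, distance) = summarize(stops)
      s"$name has $numStops checkpoints and total distance of ${distance}km"
    }
}
```

Keep in mind that summing Floats accumulates rounding error, so for real data the printed total may need formatting.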

How to count the number of iterations in a for comprehension in Scala?

I am using a for comprehension on a stream and I would like to know how many iterations it took to get to the final result.
In code:
var count = 0
for {
xs <- xs_generator
x <- xs
count = count + 1 //doesn't work!!
if (x prop)
yield x
}
Is there a way to achieve this?
Edit: If you don't want to return only the first item, but the entire stream of solutions, take a look at the second part.
Edit-2: Shorter version with zipWithIndex appended.
It's not entirely clear what you are attempting to do. To me it seems as if you are trying to find something in a stream of lists, and additionally save the number of checked elements.
If this is what you want, consider doing something like this:
/** Returns the first `x` that satisfies predicate `prop`
* as well as the total number of tested `x`s
*/
def findTheX(): (Int, Int) = {
val xs_generator = Stream.from(1).map(a => (1 to a).toList).take(1000)
var count = 0
def prop(x: Int): Boolean = x % 317 == 0
for (xs <- xs_generator; x <- xs) {
count += 1
if (prop(x)) {
return (x, count)
}
}
throw new Exception("No solution exists")
}
println(findTheX())
// prints:
// (317,50403)
Several important points:
Scala's for-comprehensions have nothing to do with Python's "yield". Just in case you thought they did: re-read the documentation on for-comprehensions.
There is no built-in syntax for breaking out of for-comprehensions. It's better to wrap the loop in a function and then call return. There is also breakable, but it is implemented with exceptions.
The function returns the found item and the total count of checked items, therefore the return type is (Int, Int).
The exception thrown after the for-comprehension ensures that the last expression has type Nothing <: (Int, Int) instead of Unit, which is not a subtype of (Int, Int).
Think twice when you want to use Stream for such purposes in this way: after generating the first few elements, the Stream holds them in memory. This might lead to "GC-overhead limit exceeded"-errors if the Stream isn't used properly.
Just to emphasize it again: the yield in Scala for-comprehensions is unrelated to Python's yield. Scala has no built-in support for coroutines and generators. You don't need them as often as you might think, but it requires some readjustment.
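For comparison, here is a small sketch of the breakable alternative mentioned above, using scala.util.control.Breaks from the standard library. The predicate, the bound of 100 lists and the names BreakableSearch and search are assumptions chosen to keep the example short.

```scala
import scala.util.control.Breaks.{break, breakable}

object BreakableSearch {
  def prop(x: Int): Boolean = x % 7 == 0

  // Returns the first matching x together with the number of tested values,
  // or None if nothing matches within the first 100 lists.
  def search(): Option[(Int, Int)] = {
    var count = 0
    var found: Option[(Int, Int)] = None
    val generator = Iterator.from(1).map(a => (1 to a).toList).take(100)
    breakable {
      for (xs <- generator; x <- xs) {
        count += 1
        if (prop(x)) {
          found = Some((x, count))
          break() // throws a control exception that breakable catches
        }
      }
    }
    found
  }
}
```

This avoids the non-local return and the trailing throw, at the cost of two mutable variables.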
EDIT
I've re-read your question. In case you want an entire stream of solutions together with a counter of how many different xs have been checked, you might use something like this instead:
val xs_generator = Stream.from(1).map(a => (1 to a).toList)
var count = 0
def prop(x: Int): Boolean = x % 317 == 0
val xsWithCounter = for {
xs <- xs_generator;
x <- xs
_ = { count = count + 1 }
if (prop(x))
} yield (x, count)
println(xsWithCounter.take(10).toList)
// prints:
// List(
// (317,50403), (317,50721), (317,51040), (317,51360), (317,51681),
// (317,52003), (317,52326), (317,52650), (317,52975), (317,53301)
// )
Note the _ = { ... } part. There is a limited number of things that can occur in a for-comprehension:
generators (the x <- things)
filters/guards (if-s)
value definitions
Here, we sort-of abuse the value-definition syntax to update the counter. We use the block { count = count + 1 } as the right hand side of the assignment. It returns Unit. Since we don't need the result of the block, we use _ as the left hand side of the assignment. In this way, this block is executed once for every x.
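One caveat with this trick: the value of the counter seen in the yield depends on whether the generator it follows is strict or lazy. As far as I understand the Scala 2 desugaring, a value definition followed by a guard becomes a map followed by withFilter, so on a strict List all increments run before the first result is produced. This is also why the counts printed above grow by 318, 319, ...: the inner generator is a List, so the counter reflects the end of each inner list. The sketch below (names are illustrative) shows the difference on a small input.

```scala
object CounterInFor {
  // Returns the even numbers of List(1, 2, 3, 4), each paired with the
  // counter value observed at the moment the pair is built.
  def evens(strict: Boolean): List[(Int, Int)] = {
    var count = 0
    val source = List(1, 2, 3, 4)
    if (strict) {
      // List is strict: every increment runs before any pair is built.
      for { x <- source; _ = { count += 1 }; if x % 2 == 0 } yield (x, count)
    } else {
      // Iterator is lazy: increments interleave with building the pairs.
      (for { x <- source.iterator; _ = { count += 1 }; if x % 2 == 0 }
        yield (x, count)).toList
    }
  }
}
```

If the exact position of each hit matters, zipWithIndex is the safer choice.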
EDIT-2
If mutating the counter is not your main goal, you can of course use the zipWithIndex directly:
val xsWithCounter =
xs_generator.flatten.zipWithIndex.filter{x => prop(x._1)}
It gives almost the same result as the previous version, but the indices are shifted by -1 (it's the indices, not the number of tried x-s).

Function to return List of Map while iterating over String, kmer count

I am working on creating a k-mer frequency counter (similar to word count in Hadoop) written in Scala. I'm fairly new to Scala, but I have some programming experience.
The input is a text file containing a gene sequence and my task is to get the frequency of each k-mer where k is some specified length of the sequence.
Therefore, the sequence AGCTTTC has three 5-mers (AGCTT, GCTTT, CTTTC)
I've parsed the input and created one huge string containing the entire sequence, because the newlines throw off the k-mer counting: the end of one line's sequence should still form a k-mer with the beginning of the next line's sequence.
Now I am trying to write a function that will generate a list of maps, List[Map[String, Int]], with which it should be easy to use Scala's groupBy function to get the count of the common k-mers:
import scala.io.Source
object Main {
def main(args: Array[String]) {
// Get all of the lines from the input file
val input = Source.fromFile("input.txt").getLines.toArray
// Create one huge string which contains all the lines but the first
val lines = input.tail.mkString.replace("\n","")
val mappedKmers: List[Map[String,Int]] = getMappedKmers(5, lines)
}
def getMappedKmers(k: Int, seq: String): List[Map[String, Int]] = {
for (i <- 0 until seq.length - k) {
Map(seq.substring(i, i+k), 1) // Map the k-mer to a count of 1
}
}
}
Couple of questions:
How to create/generate List[Map[String,Int]]?
How would you do it?
Any help and/or advice is definitely appreciated!
You're pretty close—there are three fairly minor problems with your code.
The first is that for (i <- whatever) foo(i) is syntactic sugar for whatever.foreach(i => foo(i)), which means you're not actually doing anything with the contents of whatever. What you want is for (i <- whatever) yield foo(i), which is sugar for whatever.map(i => foo(i)) and returns the transformed collection.
The second issue is that 0 until seq.length - k is a Range, not a List, so even once you've added the yield, the result still won't line up with the declared return type.
The third issue is that Map(k, v) tries to create a map with two key-value pairs, k and v. You want Map(k -> v) or Map((k, v)), either of which is explicit about the fact that you have a single argument pair.
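So the following should work. Note that the range is also changed to the inclusive 0 to seq.length - k: with until, the last k-mer is skipped (for AGCTTTC and k = 5 you would get only two of the three 5-mers).
def getMappedKmers(k: Int, seq: String): IndexedSeq[Map[String, Int]] = {
  for (i <- 0 to seq.length - k) yield {
    Map(seq.substring(i, i + k) -> 1) // Map the k-mer to a count of 1
  }
}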
You could also convert either the range or the entire result to a list with .toList if you'd prefer a list at the end.
It's worth noting, by the way, that the sliding method on Seq does exactly what you want:
scala> "AGCTTTC".sliding(5).foreach(println)
AGCTT
GCTTT
CTTTC
I'd definitely suggest something like "AGCTTTC".sliding(5).toList.groupBy(identity) for real code.
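Note that groupBy(identity) yields a Map[String, List[String]], so one more step is needed to get the counts. A minimal sketch, with kmerCounts as a hypothetical helper name:

```scala
object KmerCount {
  // Counts how often each k-mer (substring of length k) occurs in seq.
  def kmerCounts(seq: String, k: Int): Map[String, Int] =
    seq.sliding(k)
      .filter(_.length == k) // sliding yields one short window if seq is shorter than k
      .toList
      .groupBy(identity)
      .map { case (kmer, occurrences) => kmer -> occurrences.size }
}
```

For example, KmerCount.kmerCounts("AGCTTTC", 5) maps each of the three 5-mers to 1.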

Sort a list by an ordered index

Let us assume that I have the following two sequences:
val index = Seq(2,5,1,4,7,6,3)
val unsorted = Seq(7,6,5,4,3,2,1)
The first is the index by which the second should be sorted. My current solution is to traverse over the index and construct a new sequence with the found elements from the unsorted sequence.
val sorted = index.foldLeft(Seq[Int]()) { (s, num) =>
s ++ Seq(unsorted.find(_ == num).get)
}
But this solution seems very inefficient and error-prone to me. On every iteration it searches the complete unsorted sequence. And if the index and the unsorted list aren't in sync, then either an error will be thrown or an element will be omitted. In both cases, the not in sync elements should be appended to the ordered sequence.
Is there a more efficient and solid solution for this problem? Or is there a sort algorithm which fits into this paradigm?
Note: This is a constructed example. In reality I would like to sort a list of mongodb documents by an ordered list of document Id's.
Update 1
I've selected the answer from Marius Danila because it seems to be the fastest and most Scala-ish solution for my problem. It doesn't handle items that are not in sync, but this can easily be implemented.
So here is the updated solution:
def sort[T: ClassTag, Key](index: Seq[Key], unsorted: Seq[T], key: T => Key): Seq[T] = {
val positionMapping = HashMap(index.zipWithIndex: _*)
val inSync = new Array[T](unsorted.size)
val notInSync = new ArrayBuffer[T]()
for (item <- unsorted) {
if (positionMapping.contains(key(item))) {
inSync(positionMapping(key(item))) = item
} else {
notInSync.append(item)
}
}
inSync.filterNot(_ == null) ++ notInSync
}
Update 2
The approach suggested by Bask.cc seems to be the correct answer. It also doesn't consider the not in sync issue, but this can also be easily implemented.
val index: Seq[String]
val entities: Seq[Foo]
val idToEntityMap = entities.map(e => e.id -> e).toMap
val sorted = index.map(idToEntityMap)
val result = sorted ++ entities.filterNot(sorted.toSet)
Why do you want to sort the collection when you already have a sorted index collection? You can just use map.
Concerning "In reality I would like to sort a list of mongodb documents by an ordered list of document Id's":
val ids: Seq[String]
val entities: Seq[Foo]
val idToEntityMap = entities.map(e => e.id -> e).toMap
ids.map(idToEntityMap) // a Map is itself a function from key to value
This may not exactly map to your use case, but Googlers may find this useful:
scala> val ids = List(3, 1, 0, 2)
ids: List[Int] = List(3, 1, 0, 2)
scala> val unsorted = List("third", "second", "fourth", "first")
unsorted: List[String] = List(third, second, fourth, first)
scala> val sorted = ids map unsorted
sorted: List[String] = List(first, second, third, fourth)
I do not know the language that you are using, but irrespective of the language, this is how I would solve the problem.
From the first list (here 'index'), create a hash table that takes the document id as the key and the position of the document in the sorted order as the value.
Then, when traversing the list of documents, I would look up each document id in the hash table to get the position it should occupy in the sorted order, and use that position to place the document in pre-allocated memory.
Note: if the number of documents is small, then instead of a hash table you could use a pre-allocated table and index it directly using the document id.
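In Scala, a compact variant of this position-table idea is to build the id-to-position map and hand it to sortBy. This is only a sketch under the assumption that items missing from the index should go to the end; sortByIndex is a hypothetical name, and the approach is O(n log n) rather than the linear placement described above.

```scala
object SortByIndex {
  def sortByIndex[T, K](index: Seq[K], unsorted: Seq[T])(key: T => K): Seq[T] = {
    // Position of every id in the desired order.
    val position: Map[K, Int] = index.zipWithIndex.toMap
    // sortBy is stable, so items missing from the index keep their
    // relative order and end up after all indexed items.
    unsorted.sortBy(item => position.getOrElse(key(item), Int.MaxValue))
  }
}
```

Index entries with no matching item are simply ignored.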
Flat Mapping the index over the unsorted list seems to be a safer version (if the index isn't found it's just dropped since find returns a None):
index.flatMap(i => unsorted.find(_ == i))
It still has to traverse the unsorted list every time (worst case this is O(n^2)). With your example I'm not sure that there's a more efficient solution.
In this case you can use zip-sort-unzip:
(unsorted zip index).sortWith(_._2 < _._2).unzip._1
Btw, if you can, a better solution would be to sort the list on the DB side using $orderBy.
Ok.
Let's start from the beginning.
Besides the fact that you're rescanning the unsorted list each time, Seq(...) creates a List by default. So in the foldLeft you're appending an element to the end of a List on every iteration; each append is O(N), which makes the whole fold O(N^2).
An improvement would be
val sorted_rev = index.foldLeft(Seq[Int]()) { (s, num) =>
unsorted.find(_ == num).get +: s
}
val sorted = sorted_rev.reverse
But that is still an O(N^2) algorithm. We can do better.
The following sort function should work:
import scala.collection.immutable.HashMap
import scala.reflect.ClassTag
def sort[T: ClassTag, Key](index: Seq[Key], unsorted: Seq[T], key: T => Key): Seq[T] = {
val positionMapping = HashMap(index.zipWithIndex: _*) //1
val arr = new Array[T](unsorted.size) //2
for (item <- unsorted) { //3
val position = positionMapping(key(item))
arr(position) = item
}
arr //6
}
The function sorts a list of items, unsorted, by a sequence of indexes, index, where the key function is used to extract the id from the objects you're trying to sort.
Line 1 creates a reverse index - mapping each object id to its final position.
Line 2 allocates the array which will hold the sorted sequence. We're using an array since we need constant-time random-position set performance.
The loop that starts at line 3 will traverse the sequence of unsorted items and place each item in its intended position using the positionMapping reverse index.
Line 6 will return the array converted implicitly to a Seq using the WrappedArray wrapper.
Since our reverse-index is an immutable HashMap, lookup should take constant-time for regular cases. Building the actual reverse-index takes O(N_Index) time where N_Index is the size of the index sequence. Traversing the unsorted sequence takes O(N_Unsorted) time where N_Unsorted is the size of the unsorted sequence.
So the complexity is O(max(N_Index, N_Unsorted)), which I guess is the best you can do in the circumstances.
For your particular example, you would call the function like so:
val sorted = sort(index, unsorted, identity[Int])
For the real case, it would probably be like this:
val sorted = sort(idList, unsorted, obj => obj.id)
The best I can do is to create a Map from the unsorted data, and use map lookups (basically the hashtable suggested by a previous poster). The code looks like:
val unsortedAsMap = unsorted.map(x => x -> x).toMap
index.map(unsortedAsMap)
Or, if there's a possibility of hash misses:
val unsortedAsMap = unsorted.map(x => x -> x).toMap
index.flatMap(unsortedAsMap.get)
It's O(n) in time*, but you're swapping time for space, as it uses O(n) space.
For a slightly more sophisticated version, that handles missing values, try:
import scala.collection.mutable
// Scala's mutable.LinkedHashMap keeps insertion order; remove returns an Option
val remaining = mutable.LinkedHashMap(unsorted.map(x => x -> x): _*)
val newBuffer = mutable.ListBuffer.empty[Int]
for (i <- index) {
  // Index entries with no matching element are skipped
  remaining.remove(i).foreach(newBuffer += _)
}
// Append the elements that never appeared in the index, in their original order
newBuffer ++= remaining.values
newBuffer.result()
If it's a MongoDB database in the first place, you might be better retrieving documents directly from the database by index, so something like:
index.map(lookupInDB)
*technically it's O(n log n), as Scala's standard immutable map is O(log n), but you could always use a mutable map, which is O(1)