I have a list that has, for example, 101 records, and I want to send the data in batches of 10.
list.grouped(10) will return 10 lists of 10 records each and 1 list with a single record.
I want to send each of the ten full lists to the futureSend method and get the leftover list with one record back from this for comprehension. (In this scenario the leftover list exists, but if the list has 100 records the result should be an empty list.) In other words, if a group's size is not equal to ten, I do not want to send it to futureSend; I want to get that group back from the for comprehension instead.
val listRes = list.grouped(10)
for {
  list <- listRes
  if list.size == 10
  _ = futureSend(list) map { result =>
    println("sent")
  }
} yield {}
I want to yield the list whose size is less than 10 (without calling futureSend in that case), and if all the lists have size 10, yield an empty list.
A for comprehension is the wrong tool for the job.
// Assumes list is non-empty: init and last throw on an empty listRes.
val listRes = list.grouped(10).toList
listRes.init.foreach{group => futureSend(group); println("sent")}
val last = listRes.last
val remainder = if (last.length < 10) last
                else {
                  futureSend(last)
                  println("sent")
                  List.empty[ElementType]
                }
If you're on Scala 2.13.x then there's a slightly more efficient option.
val remainder = List.unfold(list.grouped(10)){ itr =>
  Option.when(itr.hasNext) {
    val thisBatch = itr.next()
    if (itr.hasNext || thisBatch.size == 10) {
      futureSend(thisBatch)
      println("sent")
      (List(), itr)
    } else (thisBatch, itr)
  }
}.flatten
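If the order in which the batches are sent does not matter, another option is to split the groups by size up front. The following is only a sketch: futureSend here is a hypothetical stand-in that returns an already-successful Future[Unit], and sendFullBatches, BatchSend and the Int element type are names and choices made up for illustration.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object BatchSend {
  // Hypothetical stand-in for the asker's futureSend; the real one would do I/O.
  def futureSend(batch: List[Int]): Future[Unit] = Future(())

  // Sends every full batch and returns (the futures of those sends, the leftover records).
  // The leftover list is empty when the input size is a multiple of batchSize.
  def sendFullBatches(records: List[Int], batchSize: Int = 10): (List[Future[Unit]], List[Int]) = {
    val (full, partial) = records.grouped(batchSize).toList.partition(_.size == batchSize)
    val sends = full.map(futureSend)
    (sends, partial.flatten)
  }
}
```

With 101 records this produces ten sends and a one-element remainder; with 100 records the remainder is empty.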
I am attempting to sort a List of names using Scala, and I am trying to learn how to do this recursively. The List is a List of Lists, with the "element" list containing two items (lastName, firstName). My goal is to understand how to use recursion to sort the names. For the purpose of this post my goal is just to sort by the length of lastName.
If I call my function several times on a small sample list, it will successfully sort lastName by length from shortest to longest, but I have not been able to construct a satisfactory exit condition using recursion. I have tried variations of foreach and other loops, but I have been unsuccessful. Without a satisfactory exit condition, the recursion just continues forever.
import scala.collection.mutable.ListBuffer
import scala.annotation.tailrec
import scala.io.Source
val nameListBuffer = new ListBuffer[List[String]]
val source = Source.fromFile("shortnames.txt")
val lines = source.getLines()
for (line <- lines) {
nameListBuffer += line.split(" ").reverse.toList
}
@tailrec
def sorting(x: ListBuffer[List[String]]): Unit = {
for (i <- 0 until ((x.length)-1)) {
var temp = x(i)
if (x(i)(0).length > x(i+1)(0).length) {
x(i) = x(i+1)
x(i+1) = temp
}
}
var continue = false
while (continue == false) {
for (i <- 0 until ((x.length)-1)) {
if (x(i)(0).length <= x(i+1)(0).length) {
continue == false//want to check ALL i's before evaluating
}
else continue == true
}
}
sorting(x)
}
sorting(nameListBuffer)
Sorry about the runtime complexity; this is basically an inefficient bubble sort, well above O(n log n). Focus on the exit criteria instead. For tail recursion, the key is that each recursive call works on a smaller input than the preceding call. Also, keep two arguments: one is the list still to be processed, and the other is what you are accumulating (or whatever you want to return; it doesn't have to be a list). The input keeps getting smaller until eventually you can return what you have accumulated.
Use pattern matching to catch when the recursion has ended, and then return what you were accumulating. This is why Lists are so popular in Scala: the Nil and :: (cons) cases can be handled nicely with pattern matching. One more thing: to be tail recursive, the recursive call has to be the last operation in its case, or the @tailrec annotation will reject the method at compile time.
import scala.collection.mutable.ListBuffer
import scala.annotation.tailrec
import scala.io.Source
// I did not ingest from file I just created the test list from some literals
val dummyNameList = List(
List("Johanson", "John"), List("Nim", "Bryan"), List("Mack", "Craig")
, List("Youngs", "Daniel"), List("Williamson", "Zion"), List("Rodgersdorf", "Aaron"))
// You can use this code to populate nameList though I didn't run this code
val source = Source.fromFile("shortnames.txt")
val lines = source.getLines()
val nameList = {
for (line <- lines) yield line.split(" ").reverse.toList
}.toList
// This takes one element and returns whichever is smallest (by last-name
// length): that element or an element of the list in the other argument.
@tailrec
private def swapElem(elem: List[String], listOfLists: List[List[String]]): List[String] = listOfLists match {
  case Nil => elem
  case h::t if (elem(0).length > h(0).length) => swapElem(h, t)
  case h::t => swapElem(elem, t)
}
// If the head is not the smallest element, then swap it out for the
// smallest element of the list. I probably could have returned a tuple;
// it might have looked nicer. It just keeps iterating until there are
// no elements left.
@tailrec
private def iterate(listOfLists: List[List[String]], acc: List[List[String]]): List[List[String]] = listOfLists match {
  case Nil => acc // empty input: nothing left to sort
  case h::Nil => acc :+ h
  case h::t if (swapElem(h, t) != h) => iterate(h :: t.filterNot(_ == swapElem(h, t)), acc :+ swapElem(h, t))
  case h::t => iterate(t, acc :+ swapElem(h, t))
}
val sortedNameList = iterate(nameList, List.empty[List[String]])
println("\nsorted:")
sortedNameList.foreach(println(_))
sorted:
List(Nim, Bryan)
List(Mack, Craig)
List(Youngs, Daniel)
List(Johanson, John)
List(Williamson, Zion)
List(Rodgersdorf, Aaron)
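The accumulator pattern described in this answer can also be sketched more compactly. This is a minimal illustration, not the answer's own code: SortByLastName and sortByLastNameLength are hypothetical names, and it assumes every inner list is non-empty with the last name first. Unlike the filterNot version, it removes exactly one element per step, so duplicate entries survive.

```scala
import scala.annotation.tailrec

object SortByLastName {
  // Tail-recursive selection sort: each call moves the entry with the shortest
  // last name from `remaining` to the end of `acc`, so `remaining` shrinks by
  // one element per call and the recursion ends when it is empty.
  @tailrec
  def sortByLastNameLength(remaining: List[List[String]],
                           acc: Vector[List[String]] = Vector.empty): List[List[String]] =
    remaining match {
      case Nil => acc.toList // exit condition: nothing left to place
      case _ =>
        val smallest = remaining.minBy(_.head.length)
        val idx = remaining.indexOf(smallest)
        // patch(idx, Nil, 1) drops exactly one element at position idx
        sortByLastNameLength(remaining.patch(idx, Nil, 1), acc :+ smallest)
    }
}
```

The exit condition is the Nil case: because every recursive call passes a list one element shorter, that case is always reached.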
The code below computes a distance metric between two Users, as specified by this case class:
case class User(name: String, features: Vector[Double])
val ul = for (a <- 1 to 100) yield (User("a", Vector(1, 2, 4)))
var count = 0;
def distance(userA: User, userB: User) = {
val subElements = (userA.features zip userB.features) map {
m => (m._1 - m._2) * (m._1 - m._2)
}
val summed = subElements.sum
val sqRoot = Math.sqrt(summed)
count += 1;
println("count is " + count)
((userA.name, userB.name), sqRoot)
}
val distances = ul.par.map(m => ul.map(m2 => {
(distance(m, m2))
})).toList.flatten
val sortedDistances = distances.groupBy(_._1._1).map(m => (m._1, m._2.sortBy(s => s._2)))
println(sortedDistances.get("a").get.size);
This performs a Cartesian product of comparisons over 100 users: 10000 comparisons. I'm counting each comparison with var count.
Often the count value will be less than 10000, but the number of items iterated over is always 10000. Is the reason for this that, as par spawns multiple threads, some of these will finish before the println statement is executed? However, all of them will finish within the par code block, before distances.groupBy(_._1._1).map(m => (m._1, m._2.sortBy(s => s._2))) is evaluated.
In your example you have a single un-synchronized variable that you're mutating from multiple threads like you said. This means that each thread, at any time, may have a stale copy of count, so when they increment it they will squash any other writes that have occurred, resulting in a count less than it should be.
You can solve this using the synchronized function,
...
val subElements = (userA.features zip userB.features) map {
m => (m._1 - m._2) * (m._1 - m._2)
}
val summed = subElements.sum
val sqRoot = Math.sqrt(summed)
this.synchronized {
count += 1;
}
println("count is " + count)
((userA.name, userB.name), sqRoot)
...
Using 'this.synchronized' will use the containing object as the lock object. For more information on Scala synchronization I suggest reading Twitter's Scala School.
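If only the counter needs protecting, an alternative to a lock is java.util.concurrent.atomic.AtomicInteger, whose incrementAndGet is an atomic read-modify-write. The sketch below is an illustration under assumptions: it uses plain Futures rather than .par so that it is self-contained, and AtomicCounterDemo and countInParallel are hypothetical names.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object AtomicCounterDemo {
  // Runs `tasks` increments concurrently and returns the final count.
  def countInParallel(tasks: Int): Int = {
    val count = new AtomicInteger(0)
    // Each Future increments the shared counter atomically; no update is lost.
    val work = Future.traverse((1 to tasks).toList) { _ =>
      Future { count.incrementAndGet(); () }
    }
    Await.result(work, 30.seconds) // wait for every increment to finish
    count.get()
  }
}
```

Calling AtomicCounterDemo.countInParallel(10000) should return exactly 10000, whereas the same loop with an unsynchronized var may come up short.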
Hi, I am new to Scala and I don't know how to change the following code:
def makeProfil(listProfils: List[Profil]): List[Profil] ={
// var newList = List[Profil]
var ll = List[Map[Int,Profil]]()
listProfils.foreach( item => {
var count = item.czasOgladania.get
if(item.czyKupil.get) count = count *4 ;
if(item.czyPrzeczytal.get) count = count *3 ;
if(item.czyWKarcie.get) count = count *2 ;
ll ::= Map (count -> item)
} )
}
I want to sort the ll list by the count element and return a sorted List[Profil] as the result. I tried various things but none of them work well.
List has a method, sortWith, which sorts the list according to a criterion you provide. The criterion is a function that accepts two arguments (two profiles) and returns a Boolean indicating whether the first should come before the second.
So, you can do the following:
listProfils.sortWith((p1, p2) =>
  getCount(p1) > getCount(p2) // descending by count; use < for ascending
)
where
def getCount(profil: Profil) = {
var count = profil.czasOgladania.get
if(profil.czyKupil.get) count = count *4 ;
if(profil.czyPrzeczytal.get) count = count *3 ;
if(profil.czyWKarcie.get) count = count *2 ;
count
}
Update
BTW, it seems that profil.czasOgladania, profil.czyKupil, etc. are Options. In that case you should first check whether they are defined, and then perform the computations. You may define default values, e.g.
// if profil.czasOgladania is defined, get the value. Else, return 10.
val count = profil.czasOgladania.getOrElse(10)
or:
if(profil.czyWKarcie.getOrElse(false)) count = count *2
Here's the whole thing without any mutable state (vars are bad practice). First you map the list of profiles to a list of (count, profile) tuples; the Map doesn't seem necessary. Then you sort the list by the first item in the tuple and map it back to a list of profiles (the second item in the tuple).
def makeProfil(listProfils: List[Profil]): List[Profil] = {
  val profileCounts = listProfils.map( item => {
    val count = item.czasOgladania.getOrElse(0)
    // each flag is an Option[Boolean]; a missing flag counts as false
    val kupil = if (item.czyKupil.getOrElse(false)) 4 else 1
    val przeczytal = if (item.czyPrzeczytal.getOrElse(false)) 3 else 1
    val wKarcie = if (item.czyWKarcie.getOrElse(false)) 2 else 1
    val finalCount = count * kupil * przeczytal * wKarcie
    (finalCount, item)
  } )
  profileCounts.sortBy(_._1).map(_._2)
}
You can also use sortBy:
def makeProfil(listProfils: List[Profil]): List[Profil] = {
def getCount(profil: Profil) = {
var count = profil.czasOgladania.get
if (profil.czyKupil.get) count *= 4
if (profil.czyPrzeczytal.get) count *= 3
if (profil.czyWKarcie.get) count *= 2
count
}
listProfils.sortBy(p => getCount(p))
}
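To try the sortBy approach end to end, here is a self-contained sketch. The Profil case class is an assumption reconstructed from how the question uses its fields (an Option[Int] viewing time and three Option[Boolean] flags); ProfilSort and score are hypothetical names, and missing values default to 0 or false.

```scala
object ProfilSort {
  // Assumed shape of the asker's class; the real Profil may have more fields.
  final case class Profil(czasOgladania: Option[Int],
                          czyKupil: Option[Boolean],
                          czyPrzeczytal: Option[Boolean],
                          czyWKarcie: Option[Boolean])

  // Weighted score without vars or Option.get: absent values fall back to defaults.
  def score(p: Profil): Int = {
    val base = p.czasOgladania.getOrElse(0)
    val kupil = if (p.czyKupil.contains(true)) 4 else 1
    val przeczytal = if (p.czyPrzeczytal.contains(true)) 3 else 1
    val wKarcie = if (p.czyWKarcie.contains(true)) 2 else 1
    base * kupil * przeczytal * wKarcie
  }

  // Ascending by score; use sortBy(p => -score(p)) for descending.
  def makeProfil(listProfils: List[Profil]): List[Profil] =
    listProfils.sortBy(score)
}
```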
I have this method:
val reportsWithCalculatedUsage = time("Calculate USAGE") {
reportsHavingCalculatedCounter.flatten.flatten.toList.groupBy(_._2.product).mapValues(_.map(_._2)) mapValues { list =>
list.foldLeft(List[ReportDataHelper]()) {
case (Nil, head) =>
List(head)
case (tail, head) =>
val previous = tail.head
val current = head copy (
usage = if (head.machine == previous.machine) head.counter - previous.counter else head.usage)
current :: tail
} reverse
}
}
Where reportsHavingCalculatedCounter is of type:
scala.collection.immutable.Iterable[scala.collection.immutable.IndexedSeq[scala.collection.immutable.Map[String,com.agilexs.machinexs.logic.ReportDataHelper]]].
This code works perfectly. The problem is that the maps inside reportsHavingCalculatedCounter hold about 50 000 ReportDataHelper objects (map values) in total, and the flatten.flatten takes about 15s to process.
I've also tried with two flatMaps, but that takes almost as long. Is there any way to improve this? (Please ignore the foldLeft and reverse; if I remove them the issue is still present. The most time-consuming part is those 2 flatten calls.)
UPDATE: I've tried with a different scenario:
val reportsHavingCalculatedCounter2: Seq[ReportDataHelper] = time("Counter2") {
val builder = new ArrayBuffer[ReportDataHelper](50000)
var c = 0
reportsHavingCalculatedCounter.foreach { v =>
v.foreach { v =>
v.values.foreach { v =>
c += 1
builder += v
}
}
}
println("Count:" + c)
builder.result
}
And it takes: Counter2 (15.075s).
I can't imagine that Scala is this slow. The slowest part is v.values.foreach.
I was wondering if I can tune the following Scala code:
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] = {
  var listNoDuplicates: List[(Class1, Class2)] = Nil
  for (outerIndex <- 0 until listOfTuple.size) {
    if (outerIndex != listOfTuple.size - 1)
      for (innerIndex <- outerIndex + 1 until listOfTuple.size) {
        if (listOfTuple(outerIndex)._1.flag.equals(listOfTuple(innerIndex)._1.flag))
          listNoDuplicates = listOfTuple(outerIndex) :: listNoDuplicates
      }
  }
  listNoDuplicates
}
Usually, if you have something that looks like:
var accumulator: A = new A
for( b <- collection ) {
accumulator = update(accumulator, b)
}
val result = accumulator
it can be converted into something like:
val result = collection.foldLeft( new A ){ (acc,b) => update( acc, b ) }
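As a concrete instance of this rewrite, here is a minimal sketch that sums word lengths both ways. FoldExample, totalLengthLoop and totalLengthFold are hypothetical names chosen only for this illustration.

```scala
object FoldExample {
  // Imperative version: a var accumulator updated inside a loop.
  def totalLengthLoop(words: List[String]): Int = {
    var total = 0
    for (w <- words) total = total + w.length
    total
  }

  // Same computation as a fold: the accumulator is threaded through the lambda.
  def totalLengthFold(words: List[String]): Int =
    words.foldLeft(0) { (acc, w) => acc + w.length }
}
```

Both functions return the same value for any input; the fold simply makes the accumulator explicit instead of mutating a variable.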
So here we can first use a map to enforce the uniqueness of flags. Supposing the flag has type F:
val result = listOfTuples.foldLeft( Map[F,(ClassA,ClassB)]() ){
  ( map, tuple ) => map + ( tuple._1.flag -> tuple )
}
Then the remaining tuples can be extracted from the map and converted to a list:
val uniqList = result.values.toList
It will keep the last tuple encountered. If you want to keep the first one, replace foldLeft with foldRight and swap the arguments of the lambda.
Example:
case class ClassA( flag: Int )
case class ClassB( value: Int )
val listOfTuples =
List( (ClassA(1),ClassB(2)), (ClassA(3),ClassB(4)), (ClassA(1),ClassB(-1)) )
val result = listOfTuples.foldRight( Map[Int,(ClassA,ClassB)]() ) {
( tuple, map ) => map + ( tuple._1.flag -> tuple )
}
val uniqList = result.values.toList
//uniqList: List((ClassA(1),ClassB(2)), (ClassA(3),ClassB(4)))
Edit: If you need to retain the order of the initial list, use instead:
val uniqList = listOfTuples.filter( result.values.toSet )
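On Scala 2.13 or later, the standard library's distinctBy covers this case directly: it keeps the first element seen for each key and preserves the original order. The sketch below assumes Scala 2.13+; Class1 and Class2 are hypothetical stand-ins for the question's classes, with an Int flag chosen for illustration.

```scala
object DistinctByFlag {
  // Hypothetical stand-ins for the question's Class1 and Class2.
  final case class Class1(flag: Int)
  final case class Class2(value: Int)

  // Keeps the first tuple for each flag, in original order (requires Scala 2.13+).
  def removeDuplicates(listOfTuple: List[(Class1, Class2)]): List[(Class1, Class2)] =
    listOfTuple.distinctBy(_._1.flag)
}
```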
This compiles, but as I can't test it it's hard to say if it does "The Right Thing" (tm):
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] =
  (for {outerIndex <- 0 until listOfTuple.size
        if outerIndex != listOfTuple.size - 1
        innerIndex <- outerIndex + 1 until listOfTuple.size
        if listOfTuple(outerIndex)._1.flag == listOfTuple(innerIndex)._1.flag
  } yield listOfTuple(outerIndex)).reverse.toList
Note that you can use == instead of equals (use eq if you need reference equality).
BTW: https://codereview.stackexchange.com/ is better suited for this type of question.
Do not index into lists (like listOfTuple(i)). Indexing a List takes time proportional to the index, so doing it inside a loop performs very poorly. So, here are some alternatives...
The easiest:
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] =
SortedSet(listOfTuple: _*)(Ordering by (_._1.flag)).toList
This will preserve the last element of the list. If you want it to preserve the first element, pass listOfTuple.reverse instead. Because of the sorting, performance is, at best, O(nlogn). So, here's a faster way, using a mutable HashSet:
def removeDuplicates(listOfTuple: List[(Class1,Class2)]): List[(Class1,Class2)] = {
  // Use a hash set to track the flags already seen
  import scala.collection.mutable.HashSet
  val seen = HashSet[Flag]()
  // now fold, keeping only the first tuple for each flag
  listOfTuple.foldLeft(Nil: List[(Class1,Class2)]) {
    case (acc, el) =>
      val result = if (seen(el._1.flag)) acc else el :: acc
      seen += el._1.flag
      result
  }.reverse
}
One can avoid using a mutable HashSet in two ways:
Make seen a var, so that it can be updated.
Pass the set along with the list being created in the fold. The case then becomes:
case ((seen, acc), el) =>