Get First nonrecurring element in a list using scala - scala

I am getting a compilation error, "forward reference extends over definition of value lst", with this code:
val lt = List(1,2,3,3,2,4,5,1,5,7,8,7)
var cond = false
do
{
var cond = if (lt.tail contains lt.head) true else false
if (cond == true) {
val lst : List[Int]= lt.filter(_!=lt.head)
val lt = lst
}
else {
println(lt.head)
}
}
while(cond == false)

You can implement "Get first" using find and you can implement "non-recurring" using count == 1 so the code is
lt.find(x => lt.count(_ == x) == 1)
This will return an Option[Int] that can be unpicked in the usual way.
This algorithm is clear but not efficient, so for a very long list you might want to pre-compute the count, or use a recursive function to implement your original algorithm. This would be less clear but more efficient, so avoid it unless you can prove that the inefficiency is causing a problem.
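As a rough sketch of the recursive variant mentioned above: the compile error in the question comes from `val lt = lst` declaring a new `lt` inside the block, so `lt.filter(...)` on the previous line refers to a value that is not defined yet. Passing the shrinking list as a parameter avoids that. The object and function names here are made up for illustration.

```scala
import scala.annotation.tailrec

object FirstNonRecurring {
  // While the head recurs in the tail, drop every occurrence of it and retry;
  // the first head with no later duplicate is the answer.
  @tailrec
  def firstNonRecurring[A](xs: List[A]): Option[A] = xs match {
    case Nil => None
    case head :: tail =>
      if (tail.contains(head)) firstNonRecurring(tail.filter(_ != head))
      else Some(head)
  }
}
```

For the list in the question this should give the same answer as the `find`/`count` one-liner, while returning None for an empty list instead of throwing.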
Update
Here is an example of pre-computing the count for each value. This is potentially faster for long lists because Map operations are typically O(log n) so the function is O(n log n) rather than O(n^2) for the previous version.
import scala.collection.mutable
def firstUniq[A](in: Seq[A]): Option[A] = {
val m = mutable.Map.empty[A, Int]
for (elem <- in) {
m.update(elem, m.getOrElseUpdate(elem, 0) + 1)
}
val singles = m.filter(_._2 == 1)
in.find(singles.contains)
}

first non recurring element in whole list
You can use filter and count as follows:
val firstNonRecurringValue = lt.filter(x => lt.count(_ == x) == 1)(0)
so firstNonRecurringValue is 4
first non recurring element in the list after the element
But looking at your do-while code, it seems that you are trying to print the first element that does not recur after its own position. For that, the following code should work:
val firstNonRecurringValue = lt.zipWithIndex.filter(x => lt.drop(x._2).count(_ == x._1) == 1)(0)._1
Now firstNonRecurringValue should be 3

Related

Recursive sorting in Scala including an exit condition

I am attempting to sort a List of names using Scala, and I am trying to learn how to do this recursively. The List is a List of Lists, with the "element" list containing two items (lastName, firstName). My goal is to understand how to use recursion to sort the names. For the purpose of this post my goal is just to sort by the length of lastName.
If I call my function several times on a small sample list, it will successfully sort lastName by length from shortest to longest, but I have not been able to construct a satisfactory exit condition using recursion. I have tried variations of foreach and other loops, but I have been unsuccessful. Without a satisfactory exit condition, the recursion just continues forever.
import scala.collection.mutable.ListBuffer
import scala.annotation.tailrec
val nameListBuffer = new ListBuffer[List[String]]
val source = Source.fromFile("shortnames.txt")
val lines = source.getLines()
for (line <- lines) {
nameListBuffer += line.split(" ").reverse.toList
}
@tailrec
def sorting(x: ListBuffer[List[String]]): Unit = {
for (i <- 0 until ((x.length)-1)) {
var temp = x(i)
if (x(i)(0).length > x(i+1)(0).length) {
x(i) = x(i+1)
x(i+1) = temp
}
}
var continue = false
while (continue == false) {
for (i <- 0 until ((x.length)-1)) {
if (x(i)(0).length <= x(i+1)(0).length) {
continue == false//want to check ALL i's before evaluating
}
else continue == true
}
}
sorting(x)
}
sorting(nameListBuffer)
Sorry about the runtime complexity; this is basically an inefficient bubble sort at O(n^4), but focus on the exit criteria. For tail recursion, the key is that each recursive call works on a smaller input than the preceding call. Also, keep two arguments: one is the original list, and one is the list that you are accumulating (or whatever you want to return; it doesn't have to be a list). The input keeps getting smaller until eventually you can return what you have accumulated. Use pattern matching to catch when the recursion has ended, and then return what you were accumulating. This is why Lists are so popular in Scala: the Nil and Cons (::) subtypes can be handled nicely with pattern matching. One more thing: to be tail recursive, the recursive call has to be the last thing each recursive case does, or the @tailrec annotation will make compilation fail.
import scala.collection.mutable.ListBuffer
import scala.annotation.tailrec
import scala.io.Source
// I did not ingest from file I just created the test list from some literals
val dummyNameList = List(
List("Johanson", "John"), List("Nim", "Bryan"), List("Mack", "Craig")
, List("Youngs", "Daniel"), List("Williamson", "Zion"), List("Rodgersdorf", "Aaron"))
// You can use this code to populate nameList though I didn't run this code
val source = Source.fromFile("shortnames.txt")
val lines = source.getLines()
val nameList = {
for (line <- lines) yield line.split(" ").reverse.toList
}.toList
// This takes one element and returns the element with the shortest last name
// among that element and the list in the other argument.
@tailrec
private def swapElem(elem: List[String], listOfLists: List[List[String]]): List[String] = listOfLists match {
case Nil => elem
case h::t if (elem(0).length > h(0).length) => swapElem(h, t)
case h::t => swapElem(elem, t)
}
// If the head is not the smallest element, swap it out for the smallest
// element of the list. I probably could have returned a tuple; it might have
// looked nicer. It keeps iterating until there are no elements left.
@tailrec
private def iterate(listOfLists: List[List[String]], acc: List[List[String]]): List[List[String]] = listOfLists match {
case h::Nil => acc :+ h
case h::t if (swapElem(h, t) != h) => iterate(h :: t.filterNot(_ == swapElem(h, t)), acc :+ swapElem(h, t))
case h::t => iterate(t, acc :+ swapElem(h, t))
}
val sortedNameList = iterate(nameList, List.empty[List[String]])
println("\nsorted:")
sortedNameList.foreach(println(_))
sorted:
List(Nim, Bryan)
List(Mack, Craig)
List(Youngs, Daniel)
List(Johanson, John)
List(Williamson, Zion)
List(Rodgersdorf, Aaron)
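The accumulator pattern described above can also be sketched more compactly. This is only an illustration under assumptions, not the answer's code: the names `NameSort` and `sortByLastNameLength` are invented, and every inner list is assumed to be non-empty with the last name first.

```scala
import scala.annotation.tailrec

object NameSort {
  // Repeatedly move the entry with the shortest last name to the accumulator.
  @tailrec
  def sortByLastNameLength(
      remaining: List[List[String]],
      acc: List[List[String]] = Nil
  ): List[List[String]] = remaining match {
    case Nil => acc.reverse // exit condition: nothing left to sort
    case _ =>
      val shortest = remaining.minBy(_.head.length)
      // diff removes only one occurrence, so duplicate names are kept
      sortByLastNameLength(remaining.diff(List(shortest)), shortest :: acc)
  }
}
```

The exit condition is the Nil case: each call removes exactly one element from `remaining`, so the recursion is guaranteed to reach it.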

Substitute while loop with functional code

I am refactoring some scala code and I am having problems with a while loop. The old code was:
for (s <- sentences){
// ...
while (/*Some condition*/){
// ...
function(trees, ...)
}
}
I have translated that code into this one, using foldLeft to traverse sentences:
sentences./:(initialSeed){
(seed, s) =>
// ...
// Here I've replaced the while with another foldLeft
trees./:(seed){
(v, n) =>
// ....
val updatedVariable = function(...., v)
}
}
Now, it may be the case that I need to stop traversing trees (the inner foldLeft) before it is traversed entirely; for that I've found this question:
Abort early in a fold
But I also have the following problem:
As I traverse trees, I need to accumulate values into the variable v; function takes v and returns an updated v, called here updatedVariable. The problem is that I have the feeling that this is not a proper way of coding this functionality.
Could you recommend a functional/immutable way of doing this?
NOTE: I've simplified the code to show the actual problem, the complete code is this:
val trainVocabulart = sentences./:(Vocabulary()){
(vocab, s) =>
var trees = s.tree
var i = 0
var noConstruction = false
trees./:(vocab){
(v, n) =>
if (i == trees.size - 1) {
if (noConstruction) return v
noConstruction = true
i = 0
} else {
// Build vocabulary
val updatedVocab = buildVocabulary(trees, v, i, Config.LeftCtx, Config.RightCtx)
val y = estimateTrainAction(trees, i)
val (newI, newTrees) = takeAction(trees, i, y)
i = newI
trees = newTrees
// Execute the action and modify the trees
if (y != Shift)
noConstruction = false
Vocabulary(v.positionVocab ++ updatedVocab.positionVocab,
v.positionTag ++ updatedVocab.positionTag,
v.chLVocab ++ updatedVocab.chLVocab,
v.chLTag ++ updatedVocab.chLTag,
v.chRVocab ++ updatedVocab.chRVocab,
v.chRTag ++ updatedVocab.chRTag)
}
v
}
}
And the old one:
for (s <- sentences) {
var trees = s.tree
var i = 0
var noConstruction = false
var exit = false
while (trees.nonEmpty && !exit) {
if (i == trees.size - 1) {
if (noConstruction) exit = true
noConstruction = true
i = 0
} else {
// Build vocabulary
buildVocabulary(trees, i, LeftCtx, RightCtx)
val y = estimateTrainAction(trees, i)
val (newI, newTrees) = takeAction(trees, i, y)
i = newI
trees = newTrees
// Execute the action and modify the trees
if (y != Shift)
noConstruction = false
}
}
}
1st - You don't make this easy. Neither your simplified nor your complete example is complete enough to compile.
2nd - You include a link to some reasonable solutions to the break-out-early problem. Is there a reason why none of them look workable for your situation?
3rd - Does that complete example actually work? You're folding over a var ...
trees./:(vocab){
... and inside that operation you modify/update that var ...
trees = newTrees
According to my tests that's a meaningless statement. The original iteration is unchanged by updating the collection.
4th - I'm not convinced that fold is what you want here. fold iterates over a collection and reduces it to a single value, but your aim here doesn't appear to be finding that single value. The result of your /: is thrown away. There is no val result = trees./:(vocab){...
One solution you might look at is: trees.forall{ ... At the end of each iteration you just return true if the next iteration should proceed.
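A minimal sketch of the forall idea, using plain Ints instead of the question's trees (the function name and the stop condition are hypothetical stand-ins): the block returns true to continue and false to stop, and forall does not visit any element after the first false.

```scala
object ForallEarlyExit {
  // Records which elements were visited before the stop condition was hit.
  def visitedUntilStop(items: List[Int], stopAt: Int): List[Int] = {
    val visited = List.newBuilder[Int]
    items.forall { n =>
      visited += n
      n != stopAt // true = proceed to the next element, false = stop here
    }
    visited.result()
  }
}
```

Calling `ForallEarlyExit.visitedUntilStop(List(1, 2, 3, 4, 5), 3)` visits only 1, 2 and 3. Note that this relies on a side effect inside the block, so it replaces the loop's control flow rather than making the code purely functional.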

Efficient way to fold list in scala, while avoiding allocations and vars

I have a bunch of items in a list, and I need to analyze the content to find out how many of them are "complete". I started out with partition, but then realized that I didn't need two lists back, so I switched to a fold:
val counts = groupRows.foldLeft( (0,0) )( (pair, row) =>
if(row.time == 0) (pair._1+1,pair._2)
else (pair._1, pair._2+1)
)
But I have a lot of rows to go through for a lot of parallel users, and it is causing a lot of GC activity (an assumption on my part: the GC could be from other things, but I suspect this since I understand it will allocate a new tuple on every item folded).
For the time being, I've rewritten this as
var complete = 0
var incomplete = 0
list.foreach(row => if(row.time != 0) complete += 1 else incomplete += 1)
which fixes the GC, but introduces vars.
I was wondering if there was a way of doing this without using vars while also not abusing the GC?
EDIT:
Hard call on the answers I've received. A var implementation seems to be considerably faster on large lists (like by 40%) than even a tail-recursive optimized version that is more functional but should be equivalent.
The first answer from dhg seems to be on-par with the performance of the tail-recursive one, implying that the size pass is super-efficient...in fact, when optimized it runs very slightly faster than the tail-recursive one on my hardware.
The cleanest two-pass solution is probably to just use the built-in count method:
val complete = groupRows.count(_.time == 0)
val counts = (complete, groupRows.size - complete)
But you can do it in one pass if you use partition on an iterator:
val (complete, incomplete) = groupRows.iterator.partition(_.time == 0)
val counts = (complete.size, incomplete.size)
This works because the new returned iterators are linked behind the scenes and calling next on one will cause it to move the original iterator forward until it finds a matching element, but it remembers the non-matching elements for the other iterator so that they don't need to be recomputed.
Example of the one-pass solution:
scala> val groupRows = List(Row(0), Row(1), Row(1), Row(0), Row(0)).view.map{x => println(x); x}
scala> val (complete, incomplete) = groupRows.iterator.partition(_.time == 0)
Row(0)
Row(1)
complete: Iterator[Row] = non-empty iterator
incomplete: Iterator[Row] = non-empty iterator
scala> val counts = (complete.size, incomplete.size)
Row(1)
Row(0)
Row(0)
counts: (Int, Int) = (3,2)
I see you've already accepted an answer, but you rightly mention that that solution will traverse the list twice. The way to do it efficiently is with recursion.
def counts(xs: List[...], complete: Int = 0, incomplete: Int = 0): (Int,Int) =
xs match {
case Nil => (complete, incomplete)
case row :: tail =>
if (row.time == 0) counts(tail, complete + 1, incomplete)
else counts(tail, complete, incomplete + 1)
}
This is effectively just a customized fold, except we use 2 accumulators which are just Ints (primitives) instead of tuples (reference types). It should also be just as efficient as a while-loop with vars - in fact, the bytecode should be identical.
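For reference, here is a self-contained version of the recursive counter that should compile as-is. The `Row` case class is a hypothetical stand-in for the question's row type (only a `time` field is assumed), and `@tailrec` is added so the compiler verifies the call is in tail position.

```scala
import scala.annotation.tailrec

object CountRows {
  // Hypothetical row type; only the `time` field matters here.
  final case class Row(time: Int)

  @tailrec
  def counts(xs: List[Row], complete: Int = 0, incomplete: Int = 0): (Int, Int) =
    xs match {
      case Nil => (complete, incomplete) // a single tuple is allocated at the end
      case row :: tail =>
        if (row.time == 0) counts(tail, complete + 1, incomplete)
        else counts(tail, complete, incomplete + 1)
    }
}
```

Only one tuple is created for the whole traversal, which is the point of carrying the two Int accumulators as parameters.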
Maybe it's just me, but I prefer using the various specialized folds (.size, .exists, .sum, .product) if they are available. I find it clearer and less error-prone than the heavy-duty power of general folds.
val complete = groupRows.view.filter(_.time==0).size
(complete, groupRows.length - complete)
How about this one? No import tax.
import scala.collection.generic.CanBuildFrom
import scala.collection.Traversable
import scala.collection.mutable.Builder
case class Count(n: Int, total: Int) {
def not = total - n
}
object Count {
implicit def cbf[A]: CanBuildFrom[Traversable[A], Boolean, Count] = new CanBuildFrom[Traversable[A], Boolean, Count] {
def apply(): Builder[Boolean, Count] = new Counter
def apply(from: Traversable[A]): Builder[Boolean, Count] = apply()
}
}
class Counter extends Builder[Boolean, Count] {
var n = 0
var ttl = 0
override def +=(b: Boolean) = { if (b) n += 1; ttl += 1; this }
override def clear() { n = 0 ; ttl = 0 }
override def result = Count(n, ttl)
}
object Counting extends App {
val vs = List(4, 17, 12, 21, 9, 24, 11)
val res: Count = vs map (_ % 2 == 0)
Console println s"${vs} have ${res.n} evens out of ${res.total}; ${res.not} were odd."
val res2: Count = vs collect { case i if i % 2 == 0 => i > 10 }
Console println s"${vs} have ${res2.n} evens over 10 out of ${res2.total}; ${res2.not} were smaller."
}
OK, inspired by the answers above, but really wanting to only pass over the list once and avoid GC, I decided that, in the face of a lack of direct API support, I would add this to my central library code:
class RichList[T](private val theList: List[T]) {
def partitionCount(f: T => Boolean): (Int, Int) = {
var matched = 0
var unmatched = 0
theList.foreach(r => { if (f(r)) matched += 1 else unmatched += 1 })
(matched, unmatched)
}
}
object RichList {
implicit def apply[T](list: List[T]): RichList[T] = new RichList(list)
}
Then in my application code (if I've imported the implicit), I can write var-free expressions:
val (complete, incomplete) = groupRows.partitionCount(_.time != 0)
and get what I want: an optimized GC-friendly routine that prevents me from polluting the rest of the program with vars.
However, I then saw Luigi's benchmark, and updated it to:
Use a longer list so that multiple passes on the list were more obvious in the numbers
Use a boolean function in all cases, so that we are comparing things fairly
http://pastebin.com/2XmrnrrB
The var implementation is definitely considerably faster, even though Luigi's routine should be identical (as one would expect with optimized tail recursion). Surprisingly, dhg's dual-pass original is just as fast (slightly faster if compiler optimization is on) as the tail-recursive one. I do not understand why.
It is slightly tidier to use a mutable accumulator pattern, like so, especially if you can re-use your accumulator:
case class Accum(var complete: Int = 0, var incomplete: Int = 0) {
def inc(compl: Boolean): this.type = {
if (compl) complete += 1 else incomplete += 1
this
}
}
val counts = groupRows.foldLeft( Accum() ){ (a, row) => a.inc( row.time == 0 ) }
If you really want to, you can hide your vars as private; if not, you still are a lot more self-contained than the pattern with vars.
You could just calculate it using the difference like so:
def counts(groupRows: List[Row]) = {
val complete = groupRows.foldLeft(0){ (count, row) =>
if(row.time == 0) count + 1 else count
}
(complete, groupRows.length - complete)
}

How to yield a single element from for loop in scala?

Much like this question:
Functional code for looping with early exit
Say the code is
def findFirst[T](objects: List[T]):T = {
for (obj <- objects) {
if (expensiveFunc(obj) != null) return /*???*/ Some(obj)
}
None
}
How to yield a single element from a for loop like this in scala?
I do not want to use find, as proposed in the original question; I am curious whether and how it could be implemented using the for loop.
* UPDATE *
First, thanks for all the comments, but I guess I was not clear in the question. I am shooting for something like this:
val seven = for {
x <- 1 to 10
if x == 7
} return x
And that does not compile. The two errors are:
- return outside method definition
- method main has return statement; needs result type
I know find() would be better in this case; I am just learning and exploring the language. And in a more complex case with several iterators, I think finding with for can actually be useful.
Thanks commenters, I'll start a bounty to make up for the bad posing of the question :)
If you want to use a for loop, which uses a nicer syntax than chained invocations of .find, .filter, etc., there is a neat trick. Instead of iterating over strict collections like list, iterate over lazy ones like iterators or streams. If you're starting with a strict collection, make it lazy with, e.g. .toIterator.
Let's see an example.
First, let's define a "noisy" int that will show us when it is invoked:
def noisyInt(i : Int) = () => { println("Getting %d!".format(i)); i }
Now let's fill a list with some of these:
val l = List(1, 2, 3, 4).map(noisyInt)
We want to look for the first element which is even.
val r1 = for(e <- l; v = e(); if v % 2 == 0) yield v
The above line results in:
Getting 1!
Getting 2!
Getting 3!
Getting 4!
r1: List[Int] = List(2, 4)
...meaning that all elements were accessed. That makes sense, given that the resulting list contains all even numbers. Let's iterate over an iterator this time:
val r2 = (for(e <- l.toIterator; v = e(); if v % 2 == 0) yield v)
This results in:
Getting 1!
Getting 2!
r2: Iterator[Int] = non-empty iterator
Notice that the loop was executed only up to the point where it could figure out whether the result was an empty or non-empty iterator.
To get the first result, you can now simply call r2.next.
If you want a result of an Option type, use:
if(r2.hasNext) Some(r2.next) else None
Edit: Your second example in this encoding is just:
val seven = (for {
x <- (1 to 10).toIterator
if x == 7
} yield x).next
...of course, you should be sure that there is always at least one solution if you're going to use .next. Alternatively, use headOption, defined for all Traversables (for example after converting the iterator with .toStream), to get an Option[Int].
You can turn your list into a stream, so that any filters that the for-loop contains are only evaluated on demand. However, yielding from the stream will always return a stream, and what you want is, I suppose, an option, so as a final step you can check whether the resulting stream has at least one element, and return its head as an option. The headOption function does exactly that.
def findFirst[T](objects: List[T], expensiveFunc: T => Boolean): Option[T] =
(for (obj <- objects.toStream if expensiveFunc(obj)) yield obj).headOption
Why not do exactly what you sketched above, that is, return from the loop early? If you are interested in what Scala actually does under the hood, run your code with -print. Scala desugars the loop into a foreach and then uses an exception to leave the foreach prematurely.
So what you are trying to do is to break out of a loop after your condition is satisfied. The answers here might be what you are looking for: How do I break out of a loop in Scala?
Overall, a for comprehension in Scala is translated into map, flatMap and filter operations. So it will not be possible to break out of these functions unless you throw an exception.
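To make the translation concrete, here is a small sketch (the object and value names are invented for illustration) showing a for comprehension with a guard next to the method-call form the compiler rewrites it into. The guard becomes withFilter, which is the non-strict counterpart of filter.

```scala
object ForDesugar {
  val xs = List(1, 2, 3, 4, 5, 6)

  // A for comprehension with a guard...
  val viaFor = for (x <- xs if x % 2 == 0) yield x * 10

  // ...and the equivalent chain of method calls.
  val viaMethods = xs.withFilter(x => x % 2 == 0).map(x => x * 10)
}
```

Both values hold the same list, which is why there is no place in the comprehension itself to stop early when the underlying collection is strict.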
If you are wondering, this is how find is implemented in LinearSeqOptimized.scala, which List inherits:
override /*IterableLike*/
def find(p: A => Boolean): Option[A] = {
var these = this
while (!these.isEmpty) {
if (p(these.head)) return Some(these.head)
these = these.tail
}
None
}
This is a horrible hack. But it would get you the result you wished for.
Idiomatically you'd use a Stream or View and just compute the parts you need.
def findFirst[T](objects: List[T]): T = {
def expensiveFunc(o : T) = // unclear what should be returned here
case class MissusedException(val data: T) extends Exception
try {
(for (obj <- objects) {
if (expensiveFunc(obj) != null) throw new MissusedException(obj)
})
objects.head // T must be returned from loop, dummy
} catch {
case MissusedException(obj) => obj
}
}
Why not something like
object Main {
def main(args: Array[String]): Unit = {
val seven = (for (
x <- 1 to 10
if x == 7
) yield x).headOption
}
}
The variable seven will be an Option holding Some(value) if a value satisfies the condition, and None otherwise.
I hope this helps.
I think these work, with no 'return' in the implementation:
object TakeWhileLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T]): T = if (seq.isEmpty) null.asInstanceOf[T] else
seq(seq.takeWhile(_ == null).size)
}
object OptionLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T], index: Int = 0): T = if (seq.isEmpty) null.asInstanceOf[T] else
Option(seq(index)) getOrElse func(seq, index + 1)
}
object WhileLoop extends App {
println("first non-null: " + func(Seq(null, null, "x", "y", "z")))
def func[T](seq: Seq[T]): T = if (seq.isEmpty) null.asInstanceOf[T] else {
var i = 0
def obj = seq(i)
while (obj == null)
i += 1
obj
}
}
objects.iterator.filter { obj => expensiveFunc(obj) != null }.next
The trick is to get some lazily evaluated view on the collection, either an iterator or a Stream, or objects.view. The filter will only execute as far as needed.
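A sketch of that idea written with a for comprehension, assuming a predicate instead of the question's null check (the names `LazyFind` and `firstMatch` are hypothetical). It also counts how many elements the guard inspected, to show that evaluation stops at the first match.

```scala
object LazyFind {
  // Returns the first match plus how many elements the guard inspected.
  def firstMatch[T](objects: List[T])(p: T => Boolean): (Option[T], Int) = {
    var inspected = 0
    val matches = for {
      obj <- objects.iterator // lazy source: nothing runs until hasNext/next
      if { inspected += 1; p(obj) }
    } yield obj
    val result = if (matches.hasNext) Some(matches.next()) else None
    (result, inspected)
  }
}
```

With `LazyFind.firstMatch(List(1, 2, 3, 4))(_ % 2 == 0)` only the first two elements are inspected; with a strict List in place of the iterator, all four would be.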

Initializing a val to be used in a different scope

How can I initialize a val that is to be used in another scope? In the example below, I am forced to make myOptimizedList a var, since it is initialized in the if (iteration == 5){} scope and used in the if (iteration > 5){} scope.
val myList:A = List(...)
var myOptimizedList:A = null
for (iteration <- 1 to 100) {
if (iteration < 5) {
process(myList)
} else if (iteration == 5) {
myOptimizedList = optimize(myList)
}
if (iteration > 5) {
process(myOptimizedList)
}
}
This may have been asked before, but I wonder if there is an elegant solution that uses Option[A].
It seems that you have taken this code example out of context, so this solution may not be very suitable for your real context, but you can use foldLeft to simplify it:
val myOptimizedList = (1 to 100).foldLeft (myList) {
case (list, 5) => optimize(list)
case (list, _) => process(list); list
}
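Here is a runnable sketch of the same fold under stated assumptions: `optimize` is a made-up stand-in (distinct and sorted), and instead of calling a side-effecting `process`, the fold counts how often each version of the list would have been processed, so the behaviour can be checked.

```scala
object FoldWithSwitch {
  // Hypothetical stand-in for the question's optimize function.
  def optimize(xs: List[Int]): List[Int] = xs.distinct.sorted

  // Returns the final list plus how many iterations saw the original
  // list (before) and the optimized list (after).
  def run(myList: List[Int]): (List[Int], Int, Int) =
    (1 to 100).foldLeft((myList, 0, 0)) {
      case ((list, before, after), 5)          => (optimize(list), before, after)
      case ((list, before, after), i) if i < 5 => (list, before + 1, after)
      case ((list, before, after), _)          => (list, before, after + 1)
    }
}
```

The state that the loop version kept in a var is carried through the accumulator, so no variable outside the fold needs to be reassigned.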
You can almost always rewrite some sort of looping construct as a (tail) recursive function:
@annotation.tailrec def processLists(xs: List[A], start: Int, stop: Int): Unit = {
val next = start + 1
if (start < 5) { process(xs); processLists(xs, next, stop) }
else if (start == 5) { processLists(optimize(xs), next, stop) }
else if (start <= stop) { process(xs); processLists(xs, next, stop) }
}
processLists(myList, 1, 100)
Here, you pass forward that data which you would otherwise have mutated. If you need to mutate a huge number of things it becomes unwieldy, but for one or two it is often as clear or clearer than doing the mutation.
It's often the case that you can rework your code to avoid the problem. Consider the simple, and common, example here:
var x = 0
if(something)
x = 5
else
x = 6
println(x)
This would be a pretty common pattern in most languages, but Scala has a better way of doing it. Specifically, if-statements can return values, so the better way is:
val x =
if(something)
5
else
6
println(x)
So we can make x a val after all.
Now, clearly your code can be rewritten to use all vals:
val myList:A = List(...)
for (iteration <- 1 until 5)
process(myList)
val myOptimizedList = optimize(myList)
for (iteration <- 6 to 100)
process(myOptimizedList)
But I suspect this is simply an example, not your real case. But if you're unsure how you might rearrange your real code to accomplish something similar, please show us what it looks like.
There's another technique (perhaps a trick in this case) to delay initialization of myOptimizedList, which is to use a lazy val. Your example is very specific but the principle is still clear: delay assignment of a val until it is first referenced.
val myList = List(A(), A(), A())
lazy val myOptimizedList = optimize(myList)
for (iteration <- 1 to 100) {
if (iteration < 5)
process(myList)
else if (iteration > 5)
process(myOptimizedList)
}
Note that the case iteration == 5 is ignored.