Scala Performance Issue with mutable List (LinkedList)

I have the following code snippets. The code reads the system (Linux) English dictionary file and keeps it in memory in a List.
Code 1 (with mutable List):
import scala.io.Source

val word = scala.collection.mutable.LinkedList[String]("init")
for (line <- Source.fromFile("/usr/share/dict/words").getLines()) {
  val s: String = line.trim()
  if (/* some checks */) {
    word append scala.collection.mutable.LinkedList[String](s)
  }
}
Code 2 (with immutable List):
import scala.io.Source

var word = List[String]()
for (line <- Source.fromFile("/usr/share/dict/words").getLines()) {
  val s: String = line.trim()
  if (/* some checks */) {
    word ::= s
  }
}
Code 2 returns almost immediately, but Code 1 takes forever.
Can anyone help me understand why the mutable List takes so much time? Should I use mutable collections at all, or am I doing something wrong?
Scala version used: 2.10.3
Thanks in advance for your help.

word append scala.collection.mutable.LinkedList[String](s)
This traverses the whole word list and then appends the items of the other list at its end.
word ::= s
This prepends s to the front of the word list and assigns the new list to the word variable.
Appending to the end of a linked list is always expensive compared to adding an item to the front.
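A minimal sketch of the usual fix, assuming the words should end up in file order: prepend each word (constant time) and reverse the list once at the end (one linear pass). The object name `DictLoad`, the `keep` predicate (a hypothetical stand-in for the question's elided checks) and the `Iterator[String]` input are illustrative choices, not part of the original code.

```scala
object DictLoad {
  // keep: hypothetical stand-in for the question's elided "some checks".
  def loadByPrepend(lines: Iterator[String], keep: String => Boolean): List[String] = {
    var words = List.empty[String]
    for (line <- lines) {
      val s = line.trim
      if (keep(s)) words = s :: words // O(1): a new cell in front of the old list
    }
    words.reverse // a single O(n) pass restores the original order
  }
}
```

With the real dictionary this could be called as `DictLoad.loadByPrepend(Source.fromFile("/usr/share/dict/words").getLines(), _.nonEmpty)` after `import scala.io.Source`, where `_.nonEmpty` is only an example check. The total cost is linear in the number of lines instead of quadratic.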

In the first example, you are adding to the end of a list repeatedly (append). This takes time on the order of the length of the list. In the second example, you are adding to the beginning of a list (::). This takes constant time. So the first example has an execution time that increases with the square of the number of lines in the file, and the second has an execution time that increases linearly with the length of the file.
This is due to the nature of linked lists, which are the data structure underlying both the immutable List and the mutable LinkedList. Linked lists are fast to access at the front and slow to access at the back.
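On whether to use mutable collections at all: a mutable structure is fine as long as it supports the operation you need cheaply. For example, scala.collection.mutable.ListBuffer is documented as providing constant-time append, so it avoids the repeated traversal that makes LinkedList.append slow. A sketch, where the object name and the `keep` predicate (standing in for the elided checks) are hypothetical:

```scala
import scala.collection.mutable.ListBuffer

object DictLoadBuffer {
  // keep: hypothetical stand-in for the question's elided "some checks".
  def load(lines: Iterator[String], keep: String => Boolean): List[String] = {
    val buf = ListBuffer.empty[String]
    for (line <- lines) {
      val s = line.trim
      if (keep(s)) buf += s // constant-time append at the tail
    }
    buf.toList // hand back an immutable List in insertion order
  }
}
```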

Related

Scala, user input till only newline is given

I am trying to read multiple user inputs and print them in Scala IDE.
I have tried this piece of code:
println(scala.io.StdIn.readLine())
It works, in that the IDE takes my input and prints it, but only for a single input.
I want the code to keep taking input until an empty line (just a newline) is entered, for example:
1
2
3
So I decided I needed an iterator over the input, which led me to try the following two lines of code separately:
var in = Iterator.continually{ scala.io.StdIn.readLine() }.takeWhile { x => x != null}
and
var in = io.Source.stdin.getLines().takeWhile { x => x != null}
Unfortunately, neither of them worked: the IDE does not take my input at all.
You're really close.
val in = Iterator.continually(io.StdIn.readLine).takeWhile(_.nonEmpty).toList
This reads input until an empty line is entered and saves the lines in a List[String]. The toList call is needed because an Iterator is lazy: an element doesn't exist until next is called on it, so readLine won't be called until the next element is required. Converting to a List forces all the elements of the Iterator to be created.
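The laziness described above can be observed directly. This sketch (object and method names are hypothetical) counts how often the producer passed to Iterator.continually runs: nothing is produced until toList forces the iterator, and takeWhile evaluates one extra element, the first one that fails the predicate.

```scala
object LazinessDemo {
  var calls = 0 // how many times the producer has run
  def produce(): Int = { calls += 1; calls }

  def demo(): (Int, List[Int], Int) = {
    calls = 0
    val it = Iterator.continually(produce()).takeWhile(_ <= 3)
    val before = calls     // nothing has been produced yet
    val result = it.toList // forces the iterator
    (before, result, calls)
  }
}
```

Calling `LazinessDemo.demo()` gives `(0, List(1, 2, 3), 4)`: zero calls before forcing, and four afterwards because the value 4 had to be produced to discover that the predicate fails.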
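Update:
As @vossad01 has pointed out, this can be made safer against unexpected input: readLine returns null when the end of input is reached (for example, when stdin is closed), and calling nonEmpty on null throws a NullPointerException.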
val in = Iterator.continually(io.StdIn.readLine)
.takeWhile(Option(_).fold(false)(_.nonEmpty))
.toList
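To try the same pattern without an interactive console, the input source can be any java.io.BufferedReader, whose readLine also returns null at end of input. A sketch with a hypothetical `readUntilBlank` helper; checking for null first has the same effect as the `Option(_).fold(false)(_.nonEmpty)` guard:

```scala
import java.io.{BufferedReader, StringReader}

object ReadUntilBlank {
  // Reads lines until end of input (null) or the first empty line.
  def readUntilBlank(reader: BufferedReader): List[String] =
    Iterator.continually(reader.readLine())
      .takeWhile(line => line != null && line.nonEmpty)
      .toList
}
```

Passing `new BufferedReader(new java.io.InputStreamReader(System.in))` reads from the console in the same way, while a `StringReader` makes the behavior easy to check.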

Map word ngrams to counts in scala

I'm trying to create a map which goes through all the ngrams in a document and counts how often they appear. Ngrams are sequences of n consecutive words in a sentence (so in the last sentence, (Ngrams, are) is a 2-gram, (are, sequences) is the next 2-gram, and so on). I already have code that creates a document from a file and parses it into sentences. I also have a function, ngramsInSentence, that extracts the ngrams from a sentence and returns Seq[NGram].
I'm getting stuck syntactically on how to create my counts map. I am iterating through all the ngrams in the document in the for loop, but don't know how to map the ngrams to the count of how often they occur. I'm fairly new to Scala and the syntax is evading me, although I'm clear conceptually on what I need!
def getNGramCounts(document: Document, n: Int): Counts = {
for (sentence <- document.sentences; ngram <- nGramsInSentence(sentence,n))
//I need code here to map ngram -> count how many times ngram appears in document
}
The types Counts and NGram used above are defined as:
type Counts = Map[NGram, Double]
type NGram = Seq[String]
Does anyone know the syntax to map the ngrams from the for loop to a count of how often they occur? Please let me know if you'd like more details on the problem.
If I'm correctly interpreting your code, this is a fairly common task.
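def getNGramCounts(document: Document, n: Int): Counts = {
  // Collect every n-gram from every sentence into one sequence
  val allNGrams: Seq[NGram] = for {
    sentence <- document.sentences
    ngram <- nGramsInSentence(sentence, n)
  } yield ngram
  // Group equal n-grams together, then replace each group by its size
  allNGrams.groupBy(identity).map { case (ngram, group) => ngram -> group.size.toDouble }
}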
The allNGrams variable collects a list of all the NGrams appearing in the document.
You should eventually turn to Streams if the document is big and you can't hold the whole sequence in memory.
The following groupBy creates a Map[NGram, Seq[NGram]], which groups your values by their identity (the argument to the method defines the key that decides which values belong together) and collects the matching values in a sequence.
You then only need to map each value (a Seq[NGram]) to its size to get how many times each NGram occurs.
I took for granted that:
NGram has correct implementations of equals and hashCode (which holds for Seq[String])
document.sentences returns a Seq[...]. If not, expect allNGrams to be of the corresponding collection type.
Updated based on the comments:
I wrongly assumed that groupBy(_) would pass each value through unchanged; in fact the underscore turns the surrounding expression into an anonymous function. Use the identity function instead.
I converted the count to a Double
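A self-contained version that runs without the question's Document class. Here a document is assumed to be just a Seq of tokenized sentences, and `nGramsInSentence` is a hypothetical stand-in built on `sliding`; the counting step is the `groupBy(identity)` approach from the answer.

```scala
object NGramCountDemo {
  type NGram = Seq[String]
  type Counts = Map[NGram, Double]

  // Hypothetical stand-in for the question's function: every run of n
  // consecutive words (assumes n >= 1; shorter sentences yield nothing).
  def nGramsInSentence(sentence: Seq[String], n: Int): Seq[NGram] =
    sentence.sliding(n).filter(_.size == n).toList

  // Hypothetical Document stand-in: a sequence of tokenized sentences.
  def getNGramCounts(sentences: Seq[Seq[String]], n: Int): Counts = {
    val allNGrams: Seq[NGram] = for {
      sentence <- sentences
      ngram <- nGramsInSentence(sentence, n)
    } yield ngram
    allNGrams.groupBy(identity).map { case (ngram, group) => ngram -> group.size.toDouble }
  }
}
```

For the sentences `a b a b` and `a b` with n = 2, the result maps `(a, b)` to 3.0 and `(b, a)` to 1.0.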
Appreciate the help - I have the correct code now using the suggestions above. The following returns the desired result:
def getNGramCounts(document: Document, n: Int): Counts = {
val allNGrams: Seq[NGram] = (for(sentence <- document.sentences;
ngram <- ngramsInSentence(sentence,n))
yield ngram)
allNGrams.groupBy(l => l).map(t => (t._1, t._2.length.toDouble))
}
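Regarding the memory concern raised in the first answer: groupBy needs the whole sequence of n-grams at once. One alternative, sketched here with hypothetical names and with sentences assumed to be Seq[String] values delivered by an Iterator, is to fold over the n-grams once and keep only a running count per distinct n-gram:

```scala
object StreamingNGramCount {
  type NGram = Seq[String]

  // Consumes the iterator once; the full list of n-grams is never
  // materialized, only the map from each distinct n-gram to its count.
  def countNGrams(sentences: Iterator[Seq[String]], n: Int): Map[NGram, Double] =
    sentences
      .flatMap(sentence => sentence.sliding(n).filter(_.size == n))
      .foldLeft(Map.empty[NGram, Double]) { (counts, ngram) =>
        counts.updated(ngram, counts.getOrElse(ngram, 0.0) + 1.0)
      }
}
```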

.pop() equivalent in scala

I have worked with Python.
In Python there is a method, .pop(), which deletes the last value in a list and returns that deleted value.
For example, with x = [1, 2, 3, 4], x.pop() returns 4.
Is there a Scala equivalent for this function?
If you just wish to retrieve the last value, you can call x.last. This won't remove the last element from the list, however, which is immutable. Instead, you can call x.init to obtain a list consisting of all elements in x except the last one - again, without actually changing x. So:
val lastEl = x.last
val rest = x.init
will give you the last element (lastEl), the list of all bar the last element (rest), and you still also have the original list (x).
There are a lot of different collection types in Scala, each with its own set of supported and/or well performing operations.
In Scala, a List is an immutable cons-cell sequence like in Lisp. Getting the last element takes time proportional to the length of the list, whereas getting the head element is fast. Similarly, Queue and Stack are optimised for retrieving an element, and the rest of the structure, from one particular end. You could use either of them if you can reverse the order of your elements.
Otherwise, Vector is a well-performing general-purpose structure that is fast for both head and last calls:
val v = Vector(1, 2, 3, 4)
val init :+ last = v // uses pattern matching extractor `:+` to get both init and last
Where last would be the equivalent of your pop operation, and init is the sequence with the last element removed (you can also use dropRight(1) as suggested in the other answers). To just retrieve the last element, use v.last.
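The `init :+ last` pattern above throws a MatchError on an empty Vector. A safer sketch wraps the same extractor in a match that returns an Option; `pop` here is a hypothetical helper, not a library method.

```scala
object PopDemo {
  // Returns the last element together with the remaining elements,
  // or None for an empty Vector (instead of throwing a MatchError).
  def pop[A](v: Vector[A]): Option[(A, Vector[A])] = v match {
    case init :+ last => Some((last, init))
    case _            => None
  }
}
```

`PopDemo.pop(Vector(1, 2, 3, 4))` gives `Some((4, Vector(1, 2, 3)))`, and the original Vector is left unchanged.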
I tend to use
val popped :: newList = list
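which assigns the first element of the list to popped and the remaining list to newList. Note that this takes the head rather than the last element (the opposite end from Python's pop()), and it throws a MatchError if the list is empty.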
The first answer is correct, but you can achieve the same with:
val last = x.last
val rest = x.dropRight(1)
If you're willing to relax your need for immutable structures, there's always Stack and Queue:
val poppable = scala.collection.mutable.Stack[String]("hi", "ho")
val popped = poppable.pop
To remove several elements at once, Queue offers dequeueAll, which removes and returns every element that matches a predicate:
val multiPoppable = scala.collection.mutable.Queue[String]("hi", "ho")
val allPopped = multiPoppable.dequeueAll(_ => true)
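Python's list is a resizable array, and arguably the closest mutable Scala counterpart is scala.collection.mutable.ArrayBuffer, whose remove(index) method deletes the element at that index and returns it. A sketch with a hypothetical `popLast` helper:

```scala
import scala.collection.mutable.ArrayBuffer

object ArrayBufferPop {
  // Removes and returns the last element, like Python's list.pop();
  // throws on an empty buffer, much as Python raises IndexError.
  def popLast[A](buf: ArrayBuffer[A]): A = {
    if (buf.isEmpty) throw new NoSuchElementException("pop from empty buffer")
    buf.remove(buf.length - 1) // removing the last slot shifts no other elements
  }
}
```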
If it is a mutable.Queue, use the dequeue method. Its implementation in the standard library:
/** Returns the first element in the queue, and removes this element
 *  from the queue.
 *
 *  @throws java.util.NoSuchElementException
 *  @return the first element of the queue.
 */
def dequeue(): A =
  if (isEmpty)
    throw new NoSuchElementException("queue empty")
  else {
    val res = first0.elem
    first0 = first0.next
    decrementLength()
    res
  }

Why does the length function seem to delete the list?

I have this code snippet:
for (f <- file_list){
val file_name = path + "\\" + f + ".txt"
val line_list = Source.fromFile(file_name).getLines()
println (file_name + ": " + line_list.length)
println (file_name + ": " + line_list.length)
total_number_lines += line_list.size
}
I have a list of files. For each of them I open it, load it as a list of its lines, and then count the number of lines in the list.
The first call to line_list.length gives the right number of lines, but the second one always returns zero. In fact, after length is executed, line_list seems to be empty.
I really cannot understand why.
What am I missing?
Source.getLines() returns an Iterator[String], not a collection, so calling .length on it will completely consume it.
You can use Source.fromFile(file_name).getLines().toList if you want to go through it several times.
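The difference can be reproduced without files by using Source.fromString in place of Source.fromFile. In this sketch (object and method names are hypothetical), `naive` mirrors the question's double length call and `fixed` applies the toList suggestion. Strictly, the Iterator documentation leaves the result of reusing a consumed iterator unspecified; with getLines the second count comes out as 0 in practice.

```scala
import scala.io.Source

object CountLinesTwice {
  // Pitfall: getLines() returns a one-shot Iterator, so the first
  // length call consumes it and the second finds nothing left.
  def naive(text: String): (Int, Int) = {
    val lines = Source.fromString(text).getLines()
    val first = lines.length
    val second = lines.length
    (first, second)
  }

  // Fix: materialize the lines once into a List, which can be traversed repeatedly.
  def fixed(text: String): (Int, Int) = {
    val lines = Source.fromString(text).getLines().toList
    (lines.length, lines.length)
  }
}
```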
getLines() returns an Iterator[String], and you can only traverse an iterator once. Calling length exhausts the iterator, so the subsequent calls to length and size run after the end has already been reached, which is why it appears empty. From the Scala Iterator documentation:
It is of particular importance to note that, unless stated otherwise,
one should never use an iterator after calling a method on it. The two
most important exceptions are also the sole abstract methods: next and
hasNext.

How to test if an element is in a list?

Is there a built-in function in Rexx to test whether a given element is in a list?
I could not find one. The alternative would be to loop over the list and check each element manually.
No (unless things have changed); just loop through the list.
An alternative is to keep a lookup variable instead of, or as well as, the list, i.e.:
lookup. = 0 /* not all versions of Rexx support
default initialisation like this */
....
addToList:
parse arg item
numberInList = numberInList + 1
list.numberInList = item
lookup.item = 1
return
You can then check if item is in the list by
if lookup.item = 1 then do
......
It depends what you mean by a list.
At work, I use classic REXX. I frequently store lists of words in a single variable, space-delimited, so WORDPOS() is the built-in function I use.
If you are using a List class in ooRexx, then why not use the hasItem method from the Collection class?