How to split a text into strings? - scala

I want to split a text file into strings, can you please tell me how to split it. For example, the following text file is given:
this course in, a style i
will have to a modern, language that encourages
writing clean; and elegant code in a good
Is there any possibility to split the text file into strings like following, for example by 2 words:
this course
in a
style i
will have
to a
modern language
that encourages
writing clean
and elegant
code in
a good
Can you please give me some hints? Thank you in advance.

Some ideas:
1) Use java.util.Scanner to read in tokens direct from the file using the next(pattern: String) method
or
2) Read in all lines (see scala.io.Source), concatenate them into a single string, split the string into an array, then use the grouped method to split that into sub-arrays of 2 elements

In addition to LuigiĀ“s answer.
3) You should think about filtering out the punctuation.
4) Another hint:
scala> val list = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
list: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> val listOfTwoElements = list.sliding(2).toList
listOfTwoElements: List[List[Int]] = List(List(1, 2), List(2, 3), List(3, 4), List(4, 5), List(5, 6), List(6, 7), List(7, 8), List(8, 9), List(9, 10))

Related

Printing specific output in Scala

I have the following array of arrays that represents a cycle in a graph that I want to print in the below format.
scala> result.collect
Array[Array[Long]] = Array(Array(0, 1, 4, 0), Array(1, 5, 2, 1), Array(1, 4, 0, 1), Array(2, 3, 5, 2), Array(2, 1, 5, 2), Array(3, 5, 2, 3), Array(4, 0, 1, 4), Array(5, 2, 3, 5), Array(5, 2, 1, 5))
0:0->1->4;
1:1->5->2;1->4->0;
2:2->3->5;2->1->5;
3:3->5->2;
4:4->0->1;
5:5->2->3;5->2->1;
How can I do this? I have tried to do a for loop with if statements like other coding languages but scala's ifs in for loops are for filtering and cannot make use if/else to account for two different criteria.
example python code
for (array,i) in enumerate(range(0,result.length)):
if array[i] == array[i+1]:
//print thing needed
else:
// print other thing
I also tried to do result.groupBy to make it easier to print but doing that ruins the arrays.
Array[(Long, Iterable[Array[Long]])] = Array((4,CompactBuffer([J#3677a08a)), (0,CompactBuffer([J#695fd7e)), (1,CompactBuffer([J#50b0f441, [J#142efc4d)), (3,CompactBuffer([J#1fd66db2)), (5,CompactBuffer([J#36811d3b, [J#61c4f556)), (2,CompactBuffer([J#2eba1b7, [J#2efcf7a5)))
Is there a way to nicely print the output needed in Scala?
This should do it:
result
.groupBy(_.head)
.toArray
.sortBy(_._1)
.map {
case (node, cycles) =>
val paths = cycles.map { cycle =>
cycle
.init // drop last node
.mkString("->")
}
s"$node:${paths.mkString(";")}"
}
.mkString(";\n")
This is the output for the sample input you provided:
0:0->1->4;
1:1->5->2;1->4->0;
2:2->3->5;2->1->5;
3:3->5->2;
4:4->0->1;
5:5->2->3;5->2->1

Difference between flatMap and flatten in Scala [duplicate]

scala> List(List(1), List(2), List(3), List(4))
res18: List[List[Int]] = List(List(1), List(2), List(3), List(4))
scala> res18.flatten
res19: List[Int] = List(1, 2, 3, 4)
scala> res18.flatMap(identity)
res20: List[Int] = List(1, 2, 3, 4)
Is there any difference between these two functions? When is it appropriate to use one over the other? Are there any tradeoffs?
You can view flatMap(identity) as map(identity).flatten. (Of course it is not implemented that way, since it would take two iterations).
map(identity) gives you the same collection, so in the end it is the same as only flatten.
I would personally stick to flatten, since it is shorter/easier to understand and designed to exactly do this.
Conceptually there is no difference in the result... flatMap is taking
bit more time to produce same result...
I will show it more practically with an example of flatMap, map & then flatten and flatten
object Test extends App {
// flatmap
println(timeElapsed(List(List(1, 2, 3, 4), List(5, 6, 7, 8)).flatMap(identity)))
// map and then flatten
println(timeElapsed(List(List(1, 2, 3, 4), List(5, 6, 7, 8)).map(identity).flatten))
// flatten
println(timeElapsed(List(List(1, 2, 3, 4), List(5, 6, 7, 8)).flatten))
/**
* timeElapsed
*/
def timeElapsed[T](block: => T): T = {
val start = System.nanoTime()
val res = block
val totalTime = System.nanoTime - start
println("Elapsed time: %1d nano seconds".format(totalTime))
res
}
}
Both flatMap and flatten executed with same result after repeating several times
Conclusion : flatten is efficient
Elapsed time: 2915949 nano seconds
List(1, 2, 3, 4, 5, 6, 7, 8)
Elapsed time: 1060826 nano seconds
List(1, 2, 3, 4, 5, 6, 7, 8)
Elapsed time: 81172 nano seconds
List(1, 2, 3, 4, 5, 6, 7, 8)
Conceptually, there is no difference. Practically, flatten is more efficient, and conveys a clearer intent.
Generally, you don't use identity directly. It's more there for situations like it getting passed in as a parameter, or being set as a default. It's possible for the compiler to optimize it out, but you're risking a superfluous function call for every element.
You would use flatMap when you need to do a map (with a function other than identity) immediately followed by a flatten.

Scala: sliding(N,N) vs grouped(N)

I found myself lately using sliding(n,n) when I need to iterate collections in groups of n elements without re-processing any of them. I was wondering if it would be more correct to iterate those collections by using grouped(n). My question is if there is an special reason to use one or another for this specific case in terms of performance.
val listToGroup = List(1,2,3,4,5,6,7,8)
listToGroup: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8)
listToGroup.sliding(3,3).toList
res0: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8))
listToGroup.grouped(3).toList
res1: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8))
The reason to use sliding instead of grouped is really only applicable when you want to have the 'windows' be of a length different than what you 'slide' by (that is to say, using sliding(m, n) where m != n):
listToGroup.sliding(2,3).toList
//returns List(List(1, 2), List(4, 5), List(7, 8))
listToGroup.sliding(4,3).toList
//returns List(List(1, 2, 3, 4), List(4, 5, 6, 7), List(7, 8))
As som-snytt points out in a comment, there's not going to be any performance difference, as both of them are implemented within Iterator as returning a new GroupedIterator. However, it's simpler to write grouped(n) than sliding(n, n), and your code will be cleaner and more obvious in its intended behavior, so I would recommend grouped(n).
As an example for where to use sliding, consider this problem where grouped simply doesn't suffice:
Given a list of numbers, find the sublist of length 4 with the greatest sum.
Now, putting aside the fact that a dynamic programming approach can produce a more efficient result, this can be solved as:
def maxLengthFourSublist(list: List[Int]): List[Int] = {
list.sliding(4,1).maxBy(_.sum)
}
If you were to use grouped here, you wouldn't get all the sublists, so sliding is more appropriate.

Flattening a list of lists to a set with exceptions in scala

This feels like a peculiar problem and I am very new to Scala, so I don't know how to ask the right questions in order to get progress on this problem.
As a demonstration, say I have a list of lists like this:
val data = List(List(1, 2, 3, 4), List(1, 2, 2, 3, 4), List(1, 2, 3, 3, 3, 4), List(1, 2, 3, 4), List(2, 3, 4))
and I want to be able to reduce it down to a List of integers that looks mostly like a distinct set of the multiple lists, with one exception: where each list has more than one of each integer, I want to represent that in the final list. So as a general rule, the list with the most representations of that integer will have "their" repetitions of that integer in the final list. So that would ideally give:
List(1, 2, 2, 3, 3, 3, 4)
I know I can do data.flatten.distinct and get:
List(1, 2, 3, 4)
but that's not what I want and I know there's probably a bit more work to get to the desired result.
I am wondering if there is a good way to achieve the desired result in a functional way in scala.
Try this
val data = List(List(1, 2, 3, 4), List(1, 2, 2, 3, 4), List(1, 2, 3, 3, 3, 4), List(1, 2, 3, 4), List(2, 3, 4))
val map = data.map(_.groupBy(identity)).foldLeft(Map[Int, List[Int]]()) {
case (r, c) => r ++ c.map {
case (k, v) => k -> (if (v.size > r.getOrElse(k, List()).size) v else r(k))
}
}.values.flatten
//> map : Iterable[Int] = List(2, 2, 4, 1, 3, 3, 3)
It does not maintain the ordering. After this you can call to sort this.
Maybe this is cleaner
data.flatMap(_.groupBy(identity)).groupBy(_._1).mapValues(_.sortBy(_._2.size).reverse(0)._2).values.flatten
//> res0: Iterable[Int] = List(2, 2, 4, 1, 3, 3, 3)
I don't quite get it, but you can just order elements
data.flatten.sorted
Which would give you
List(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4)
if you want them ordered by number of encounters, you can do it like this:
data.flatten.groupBy(k => k).mapValues(_.size).toList.sortBy(_._2).map(_._1)
which would give you
List(1, 4, 2, 3)

Remove unique items from sequence [duplicate]

This question already has answers here:
How to get a set of all elements that occur multiple times in a list in Scala?
(2 answers)
Closed 8 years ago.
I find a lot about how to remove duplicates, but what is the most elegant way to remove unique items first and then the remaining duplicates.
E.g. a sequence (1, 2, 5, 2, 3, 4, 4, 0, 2) should be converted into (2, 4).
I can think of using a for-loop to add a count to each distinct item, but I could imagine that Scala has a more elegant way to achieve this.
distinct and diff will works for you:
val a = List(1, 2, 5, 2, 3, 4, 4, 0, 2)
> a: List[Int] = List(1, 2, 5, 2, 3, 4, 4, 0, 2)
val b = a diff a.distinct
> b: List[Int] = List(2, 4, 2)
val c = (a diff a.distinct).distinct
> c: List[Int] = List(2, 4)
In place distinct you can use toSet as well.
Also keep in mind that i => i can be replaced by identity and map(_._1) by keys, like this:
Seq(1, 2, 5, 2, 3, 4, 4, 0, 2).groupBy(identity).filter(_._2.size > 1).keys.toSeq
This is where a countByKey method, such as the one that can be found in Spark's API, would be useful.
Pretty straight forward:
Seq(1, 2, 5, 2, 3, 4, 4, 0, 2).groupBy(i => i).filter(_._2.size > 1).map(_._1).toSeq
Using the link from Ende Neu I think your code would become this:
Seq(1, 2, 5, 2, 3, 4, 4, 0, 2).groupBy(identity).collect { case (v, l) if l.length > 1 => v } toSeq