Best way to read lines in groups in a flat file - scala

I have a file which consists of groups of lines. Each group represents an event, and the end of a group is denoted by "END". I can think of using a for loop to go through the lines, store the intermediate lines, and emit the group when "END" is encountered.
But since I would like to do this in Scala, can anyone suggest a more functional way to accomplish the same thing?
----------
A
B
C
END
----------
D
E
F
END
----------

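Just define an iterator that returns the groups:
def groupIterator(xs: Iterator[String]): Iterator[List[String]] =
  new Iterator[List[String]] {
    def hasNext: Boolean = xs.hasNext
    // takeWhile reads lines up to and including the next "END" and drops the "END" itself.
    // This relies on how Iterator.takeWhile behaves in practice: the Scala docs say the
    // original iterator should be discarded after calling takeWhile on it.
    def next(): List[String] = xs.takeWhile(_ != "END").toList
  }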
Testing it with an Iterator[String] (Source.getLines will give you an Iterator over the lines of your file):
val str = """
A
B
C
END
D
E
F
END
""".trim
for (g <- groupIterator(str.split('\n').iterator)) println(g)
//> List(A, B, C)
//| List(D, E, F)
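If the whole file fits in memory, a strict alternative never reuses an iterator after takeWhile: load the lines into a List (for example with Source.fromFile(...).getLines().toList) and peel off one group at a time with span. A minimal sketch; GroupLines and groups are hypothetical names, and lines after the last "END" are assumed to form a final group:

```scala
object GroupLines {
  // Splits lines into groups, each terminated by a marker line (default "END").
  // The marker itself is dropped; trailing lines without a final marker form a last group.
  @annotation.tailrec
  def groups(lines: List[String], acc: List[List[String]] = Nil, marker: String = "END"): List[List[String]] =
    lines match {
      case Nil => acc.reverse
      case _ =>
        // span splits at the first marker: (lines before it, marker and everything after)
        val (group, rest) = lines.span(_ != marker)
        groups(rest.drop(1), group :: acc, marker)
    }
}
```

Because span on a List is strict and well defined, nothing depends on the internal state of a shared iterator; the trade-off is that the whole file is held in memory.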

Scala File lines to Map

I'm currently opening my files and using .getLines to retrieve each line from the file. Each line holds a word and its phonetic pronunciation, separated by two spaces. I'm confused as to how I would go about mapping each word to its pronunciation in Scala, as I'm fairly new to the language.
I previously thought of using split to separate the words and their sounds into different lines, but I'm lost.
Currently I have started with:
import scala.io.Source

def words(filename: String, word: String): Unit = {
  // skip the first 56 lines of the file
  val file = Source.fromFile(filename).getLines().drop(56)
  for (x <- file) {
  }
}
Example input:
ARTI  AA1 R T IY2
AASE  AA1 S
ABAIR  AH0 B EH1 R
AB  AE1 B
Expected result:
Map("ARTI" -> "AA1 R T IY2", "AASE" -> "AA1 S", "ABAIR" -> "AH0 B EH1 R")
1. Iterate over each line.
2. Split it on the two spaces "  ".
3. Create a tuple (a -> b).
4. Convert the Array[(String, String)] into a Map[String, String].
For example:
val data =
"""
ARTI  AA1 R T IY2
AASE  AA1 S
ABAIR  AH0 B EH1 R
AB  AE1 B
""".stripMargin
val lines: Array[String] = data.split("\n").filter(_.trim.nonEmpty)
// if you are reading from a file (requires import scala.io.Source):
// val lines = Source.fromFile("src/test/resources/my_filename.txt").getLines()
val res: Array[(String, String)] = lines.map { line =>
  // split on the two-space separator; the pronunciation itself contains single spaces
  line.split("  ") match { case Array(a, b) => a -> b }
}
println(res.toMap)
output:
Map(ARTI -> AA1 R T IY2, AASE -> AA1 S, ABAIR -> AH0 B EH1 R, AB -> AE1 B)
Running example - https://scastie.scala-lang.org/prayagupd/jBCnEhUPQJCMPKP9TXlgWA
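When reading from a real file, the Source should also be closed. A minimal sketch for Scala 2.13 (scala.util.Using does not exist in 2.12), assuming the two-space separator from the question; DictReader and load are hypothetical names, and lines without the separator are skipped instead of causing a MatchError:

```scala
import scala.io.Source
import scala.util.{Try, Using}

object DictReader {
  // Reads "WORD  PRONUNCIATION" lines (two-space separator) into a Map.
  // Using closes the file afterwards; lines without the separator are skipped.
  def load(filename: String): Try[Map[String, String]] =
    Using(Source.fromFile(filename)) { source =>
      source.getLines().flatMap { line =>
        // limit 2: split only at the first two-space separator
        line.split("  ", 2) match {
          case Array(word, pron) => Some(word -> pron.trim)
          case _                 => None
        }
      }.toMap
    }
}
```

If the file starts with header lines, as the drop(56) in the question suggests, add .drop(56) after getLines(). The Map is built inside Using, so the iterator is fully consumed before the file is closed.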
If your lines are in the iterator file, then this will create a Map from the first word to the rest of the line:
val res: Map[String, String] = file.map(_.span(_.isLetter))(collection.breakOut)
The values in the Map will contain leading space characters so you may want to call trim on them before using them.
The map call processes each line in turn.
The span method splits the line into a tuple where the first value is your word and the second is the rest of the line.
Using collection.breakOut tells map to put the results directly into a Map rather than going through an intermediate array or list.
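Note that collection.breakOut exists only up to Scala 2.12; it was removed in Scala 2.13. A sketch of the same span-based idea that compiles on both versions, calling .toMap at the end and trimming the values; PronunciationMap and parse are hypothetical names, and it assumes every word consists of letters only (a word containing an apostrophe or a digit would be cut short):

```scala
object PronunciationMap {
  // span(_.isLetter) splits each line into (leading letters, rest of line);
  // the rest starts with the separator spaces, so it is trimmed.
  def parse(lines: Iterator[String]): Map[String, String] =
    lines.map { line =>
      val (word, rest) = line.span(_.isLetter)
      word -> rest.trim
    }.toMap
}
```

The intermediate tuples are consumed lazily by .toMap, so there is no intermediate array or list in this version either.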

Scala - List of Strings to Square Cypher String

I am following an exercise in Scala to build a square cypher. Here's an overview of the problem:
List("hello", "world", "fille", "r") is encoded by taking the first letter from each String in the List and appending it to the final string, then the second letter of each, and so on. Essentially, if you write the words in square cypher form, you get:
hwfr
eoi
lrl
lll
ode
Read from top to bottom, left to right, this is the message. My expected output is a List[String], i.e. List("hwfr", "eoi", ...). I don't know which methods to use or where to start in order to transform the original List into the form I need. I can't use zip, since zip only combines two collections and I have an indeterminate number of Strings. I'm not sure how to iterate over this List to get the result I need, and would appreciate any suggestions or tips.
scala> val list = List("hello", "world", "fille", "rtext")
list: List[String] = List(hello, world, fille, rtext)
scala> list.transpose
res6: List[List[Char]] = List(List(h, w, f, r), List(e, o, i, t), List(l, r, l, e), List(l, l, l, x), List(o, d, e, t))
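does the trick (see transpose in the List API docs). Note that transpose requires all strings to have the same length, hence "rtext" instead of "r" here; with the original list it throws an IllegalArgumentException. To get a List[String], add .map(_.mkString).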
Here is a version which does not care about equal word length. There should be more efficient versions, but I wanted to keep it relatively short.
Basic idea: find out how long the longest word is (max). Then, starting with index i = 0, take the character at position i from each string and form a string from them, continuing until i = max - 1 (the position of the last character of the longest word). When the words are not of equal length, you have to make sure that you don't access a character which is not there.
Example: for i = 1 you get e from hello, o from world and i from fille, but accessing character 1 of r would throw an exception. That is why we check the size of the string beforehand and append the empty string in that case: if(i < elem.size) elem(i) else ""
val list = List("hello", "world", "fille", "r")
val max = list.maxBy(_.size).size // the size of the longest word
val result: List[String] = (0 until max).toList.map { i =>
  // append character i of each word, or nothing if the word is too short
  list.foldLeft("")((s, elem) => s + (if(i < elem.size) elem(i) else ""))
}
println(result) // List(hwfr, eoi, lrl, lll, ode)
Edit:
If you still want the result to be readable left-to-right/top-to-bottom (when the words are not ordered by length and you don't want to sort them), you can pad with spaces instead: change if(i < elem.size) elem(i) else "" to if(i < elem.size) elem(i) else " ".
List("hello", "world", "fille", "r") would become List(hwfr, eoi , lrl , lll , ode ) and List("hello", "world", "r", "fille") would become List(hwrf, eo i, lr l, ll l, od e)
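The padding idea can also be combined with transpose from the first answer: make every row the same length so that transpose accepts it, then drop the padding again. A minimal sketch, padding with None in a grid of Option[Char] rather than with spaces; SquareCipher and columns are hypothetical names:

```scala
object SquareCipher {
  // Reads the words column by column; shorter words simply contribute
  // nothing to the columns they do not reach.
  def columns(words: List[String]): List[String] = {
    val max = if (words.isEmpty) 0 else words.map(_.length).max
    // One row per word, all of length max: Some(char) or None past the word's end.
    val grid: List[List[Option[Char]]] =
      words.map(w => List.tabulate(max)(i => if (i < w.length) Some(w.charAt(i)) else None))
    // transpose turns rows into columns; flatten drops the None padding.
    grid.transpose.map(_.flatten.mkString)
  }
}
```

With equal-length words this gives the same result as list.transpose.map(_.mkString), and with List("hello", "world", "fille", "r") it gives List(hwfr, eoi, lrl, lll, ode).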

Scala list of tuples of different size zip issues?

Hi, I have two lists as follows:
val a = List((1430299869,"A",4200), (1430299869,"A",0))
val b = List((1430302366,"B",4100), (1430302366,"B",4200), (1430302366,"B",5000), (1430302366,"B",27017), (1430302366,"B",80), (1430302366,"B",9300), (1430302366,"B",9200), (1430302366,"A",5000), (1430302366,"A",4200), (1430302366,"A",80), (1430302366,"A",443), (1430302366,"C",4100), (1430302366,"C",4200), (1430302366,"C",27017), (1430302366,"C",5000), (1430302366,"C",80))
When I zip the two lists as below:
val c = a zip b
it returns:
List(((1430299869,A,4200),(1430302366,B,4100)), ((1430299869,A,0),(1430302366,B,4200)))
This does not contain all of the tuples. How can I combine all of the data above?
EDIT
The expected result is the combination of the two lists:
List((1430299869,"A",4200), (1430299869,"A",0),(1430302366,"B",4100), (1430302366,"B",4200), (1430302366,"B",5000), (1430302366,"B",27017), (1430302366,"B",80), (1430302366,"B",9300), (1430302366,"B",9200), (1430302366,"A",5000), (1430302366,"A",4200), (1430302366,"A",80), (1430302366,"A",443), (1430302366,"C",4100), (1430302366,"C",4200), (1430302366,"C",27017), (1430302366,"C",5000), (1430302366,"C",80))
Second Edit
I tried this:
val d = for (((a, b, c), (d, e, f)) <- (a zip b) if (b.equals(e) && c.equals(f))) yield (d, e, f)
but it gives an empty result because of (a zip b). When I replace a zip b with a ++ b, I get the following error:
constructor cannot be instantiated to expected type;
So how can I get the matching tuples?
Just add one list to another:
a ++ b
According to your 2nd edit, what you need is:
for {
  (a1, b1, c) <- a // rename the extracted values to a1 and b1 to avoid confusion
  (d, e, f) <- b
  if b1.equals(e) && c.equals(f)
} yield (d, e, f)
Or:
for {
  (a1, b1, c) <- a
  (d, `b1`, `c`) <- b // enclosing them in backticks avoids capture and matches against the already defined values
} yield (d, b1, c)
Zipping won't help, since it seems you need to compare every tuple in a with every tuple in b.
a zip b creates a list of pairs of elements from a and b.
What you're most likely looking for is list concatenation, which is a ++ b
On pairing all the data in the lists, first consider a shorter input to illustrate the case:
val a = (1 to 2).toList
val b = (10 to 12).toList
Then, for instance, a for comprehension expresses what is needed:
for (i <- a; j <- b) yield (i, j)
which yields
List((1,10), (1,11), (1,12),
(2,10), (2,11), (2,12))
Update
From the OP's latest update, consider a dedicated filtering function:
type triplet = (Int, String, Int)
def filtering(key: triplet, xs: List[triplet]) =
  xs.filter(v => key._2 == v._2 && key._3 == v._3)
and so apply it with flatMap,
a.flatMap(filtering(_, b))
List((1430302366,A,4200))
One additional step is to encapsulate this in an implicit class:
implicit class OpsFilter(val keys: List[triplet]) extends AnyVal {
  def filtering(xs: List[triplet]) = {
    keys.flatMap(key => xs.filter(v => key._2 == v._2 && key._3 == v._3))
  }
}
and likewise:
a.filtering(b)
List((1430302366,A,4200))
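The flatMap version rescans all of b for every tuple in a. As an alternative (not what the answers above use), the (second, third) fields of a can be collected into a Set once, so that each tuple of b needs only one lookup. Unlike the flatMap version, this returns each matching tuple of b at most once and in b's order, even when a contains duplicate keys. TupleMatch and matching are hypothetical names:

```scala
object TupleMatch {
  type Triplet = (Int, String, Int)

  // Keeps the tuples of xs whose (second, third) fields also occur in keys.
  // The Set is built once, so each tuple of xs costs a single lookup.
  def matching(keys: List[Triplet], xs: List[Triplet]): List[Triplet] = {
    val wanted: Set[(String, Int)] = keys.map { case (_, s, n) => (s, n) }.toSet
    xs.filter { case (_, s, n) => wanted((s, n)) }
  }
}
```

For the lists in the question, TupleMatch.matching(a, b) gives List((1430302366,A,4200)), the same as a.flatMap(filtering(_, b)).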

How to dynamically generate parallel futures with for-yield

I have the following code:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val f1 = Future(genA1)
val f2 = Future(genA2)
val f3 = Future(genA3)
val f4 = Future(genA4)
val results: Future[Seq[A]] = for {
  a1 <- f1
  a2 <- f2
  a3 <- f3
  a4 <- f4
} yield Seq(a1, a2, a3, a4)
Now I have a requirement to optionally exclude a2. How should I modify the code? (A solution with map or flatMap is also acceptable.)
Furthermore, say I have M possible futures that need to be aggregated like above, and N of the M could be optionally excluded based on some flag (business logic). How should I handle that?
Thanks in advance!
Leon
In question 1, I understand that you want to exclude one entry (e.g. a2) from the sequence given some logic, and in question 2 you want to suppress N entries out of a total of M and have the future computed on the remaining results. We could generalize both cases to something like this:
// Using a map as a simple example. Each value is a thunk (a function that creates the required computation),
// so nothing runs until the corresponding Future is created
val generators = Map('a' -> (() => genA1), 'b' -> (() => genA2), 'c' -> (() => genA3), 'd' -> (() => genA4))
...
// shouldAccept(k) => business logic deciding which computations should be executed
val selectedGenerators = generators.filter { case (k, _) => shouldAccept(k) }
// Start a Future for each selected computation; they run in parallel
val futures = selectedGenerators.map { case (_, gen) => Future(gen()) }
// Turn the collection of Futures into a single Future holding all the results
val result = Future.sequence(futures)
In general, what I think you are looking for is Future.sequence, which takes a Seq[Future[_]] and produces a Future[Seq[_]], which is basically what you are doing "by hand" with the for-comprehension.
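A self-contained sketch of the Future.sequence approach with an exclusion flag. The generators here are hypothetical thunks returning fixed Ints, standing in for genA1 to genA4; a Seq of pairs keeps the results in the original order, and a Future is only started for keys that are not excluded:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object OptionalFutures {
  // Hypothetical stand-ins for genA1 ... genA4, wrapped in thunks so that
  // nothing is computed until a Future is created for them.
  val generators: Seq[(String, () => Int)] = Seq(
    "a1" -> (() => 1),
    "a2" -> (() => 2),
    "a3" -> (() => 3),
    "a4" -> (() => 4)
  )

  // Starts one Future per non-excluded generator (they run in parallel)
  // and combines them into a single Future, keeping the original order.
  def run(excluded: Set[String]): Future[Seq[Int]] = {
    val futures: Seq[Future[Int]] = generators.collect {
      case (key, gen) if !excluded(key) => Future(gen())
    }
    Future.sequence(futures)
  }
}
```

For example, Await.result(OptionalFutures.run(Set("a2")), 5.seconds) gives Seq(1, 3, 4). Excluding N of M computations is just a larger excluded set; blocking with Await is only for demonstration.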

generating permutations with scalacheck

I have some generators like this:
val fooRepr = oneOf(a, b, c, d, e)
val foo = for (s <- choose(1, 5); c <- listOfN(s, fooRepr)) yield c.mkString("$")
This leads to duplicates: I might get two a's, etc. What I really want is to generate a random permutation with exactly 0 or 1 of each of a, b, c, d, and e (with at least one element), in any order.
I was thinking there must be an easy way, but I'm struggling to even find a hard way. :)
Edited: Ok, this seems to work:
val foo = for (s <- choose(1, 5);
               c <- permute(s, a, b, c, d, e)) yield c.mkString("$")
def permute[T](n: Int, gs: Gen[T]*): Gen[Seq[T]] = {
  val perm = Random.shuffle(gs.toList)
  for {
    is <- pick(n, 1 until gs.size)
    xs <- sequence[List, T](is.toList.map(perm(_)))
  } yield xs
}
...borrowing heavily from Gen.pick.
Thanks for your help, -Eric
Rex, thanks for clarifying exactly what I'm trying to do, and that's useful code, but perhaps not so nice with ScalaCheck, particularly if the generators in question are quite complex. In my particular case the generators a, b, c, etc. generate huge strings.
Anyhow, there was a bug in my solution above: pick drew indices from 1 until gs.size, so index 0 of the shuffled list could never be picked. What worked for me is below (it uses 0 until gs.size). I put a tiny project demonstrating how to do this on GitHub.
The guts of it is below. If there's a better way, I'd love to know it...
package powerset

import org.scalacheck._
import org.scalacheck.Gen._
import org.scalacheck.Gen
import scala.util.Random

object PowersetPermutations extends Properties("PowersetPermutations") {
  // Gen.value is called Gen.const in newer ScalaCheck versions
  def a: Gen[String] = value("a")
  def b: Gen[String] = value("b")
  def c: Gen[String] = value("c")
  def d: Gen[String] = value("d")
  def e: Gen[String] = value("e")

  val foo = for (s <- choose(1, 5);
                 c <- permute(s, a, b, c, d, e)) yield c.mkString

  def permute[T](n: Int, gs: Gen[T]*): Gen[Seq[T]] = {
    val perm = Random.shuffle(gs.toList)
    for {
      is <- pick(n, 0 until gs.size)
      xs <- sequence[List, T](is.toList.map(perm(_)))
    } yield xs
  }

  implicit def arbString: Arbitrary[String] = Arbitrary(foo)

  property("powerset") = Prop.forAll {
    a: String => println(a); true
  }
}
Thanks,
Eric
You're not describing a permutation, but the power set (minus the empty set).
Edit: you're describing a combination of a power set and a permutation. The subsets of an indexed set of N elements correspond one-to-one to the N-bit numbers (2^N of them), so we can simply write (in plain Scala; you may want to adapt this for use with ScalaCheck):
def powerSet[X](xs: List[X]) = {
  val xis = xs.zipWithIndex
  (for (j <- 1 until (1 << xs.length)) yield {
    // keep element i whenever bit i of j is set
    for ((x, i) <- xis if ((j & (1 << i)) != 0)) yield x
  }).toList
}
to generate all possible non-empty subsets of a set. Of course, explicitly generating the power set is unwise if the original set contains more than a handful of elements. If you don't want to generate all of them, just pass in a random number from 1 until (1 << xs.length) and run the inner loop. (Int masks work for up to 30 elements; switch to Long for up to 62 elements, and to BitSet beyond that.) You can then permute the result to change the order if you wish.
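The random-number variant described above can be sketched like this (RandomSubset and randomNonEmptySubset are hypothetical names). A single mask drawn uniformly from 1 until (1 << n) selects a uniformly random non-empty subset:

```scala
import scala.util.Random

object RandomSubset {
  // Picks a mask uniformly from 1 until (1 << n) and keeps element i
  // whenever bit i of the mask is set. Int masks limit this to n <= 30.
  def randomNonEmptySubset[X](xs: List[X], rnd: Random): List[X] = {
    require(xs.nonEmpty && xs.length <= 30, "expects 1 to 30 elements")
    val mask = 1 + rnd.nextInt((1 << xs.length) - 1)
    for ((x, i) <- xs.zipWithIndex if (mask & (1 << i)) != 0) yield x
  }
}
```

The subset keeps the original order; calling rnd.shuffle on it afterwards gives the random order asked for in the question.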
Edit: there's another way to do this if you can generate permutations easily and can add a dummy element: make your list one element longer by adding a Stop token. Then permute it and apply .takeWhile(_ != Stop). Ta-da! Permutations of arbitrary length. (Filter out the zero-length answer if need be.)
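The Stop-token trick can be sketched in plain Scala (without ScalaCheck) by using None as the Stop token and wrapping the real elements in Some; StopTokenPermutations and its methods are hypothetical names. Each result is a random ordering of a random subset, and the non-empty variant simply retries when the Stop token lands first:

```scala
import scala.util.Random

object StopTokenPermutations {
  // Shuffle the elements together with one Stop token (None) and keep
  // everything that lands before it: a random order of a random subset.
  def randomOrderedSubset[X](xs: List[X], rnd: Random): List[X] =
    rnd.shuffle(None :: xs.map(Some(_))).takeWhile(_.isDefined).flatten

  // Filters out the zero-length answer by retrying until the result is non-empty.
  @annotation.tailrec
  def nonEmptyOrderedSubset[X](xs: List[X], rnd: Random): List[X] = {
    require(xs.nonEmpty, "needs at least one element")
    val r = randomOrderedSubset(xs, rnd)
    if (r.nonEmpty) r else nonEmptyOrderedSubset(xs, rnd)
  }
}
```

Note that the result length is uniform over 0 to n here, which differs from drawing a uniformly random subset with the bitmask approach.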