I'm following Alvin Alexander's tutorial on the Loan Pattern.
Here is the code I started with:
val year = 2016
val nationalData = {
  val source = io.Source.fromFile(s"resources/Babynames/names/yob$year.txt")
  // getLines() is an Iterator[String]; split() returns an Array
  // .toArray and .toSeq are slower than .toSet here; .toSeq gives a "Stream Closed" error
  val names = source.getLines().filter(_.nonEmpty).map(_.split(",")(0)).toSet
  source.close()
  // println(names.mkString(","))
  names
}
println("Names " + nationalData)
val info = for (stateFile <- new java.io.File("resources/Babynames/namesbystate").list(); if stateFile.endsWith(".TXT")) yield {
  val source = io.Source.fromFile("resources/Babynames/namesbystate/" + stateFile)
  val names = source.getLines().filter(_.nonEmpty).map(_.split(",")).
    filter(a => a(2).toInt == year).map(a => a(3)).toArray // .toSet
  source.close()
  (stateFile.take(2), names)
}
println(info(0)._2.size + " names from state " + info(0)._1)
println(info(1)._2.size + " names from state " + info(1)._1)
for ((state, sname) <- info) {
  println("State: " + state + " Coverage of name in " + year + " " + sname.count(n => nationalData.contains(n)).toDouble / nationalData.size) // Set doesn't have a length method
}
This is how I applied readTextFile and readTextFileWithTry to the code above to experiment with the Loan Pattern:
import scala.io.Source.fromFile
import scala.util.{Failure, Success, Try}
def using[A <: { def close(): Unit }, B](resource: A)(f: A => B): B =
  try {
    f(resource)
  } finally {
    resource.close()
  }
def readTextFile(filename: String): Option[List[String]] = {
  try {
    val lines = using(fromFile(filename)) { source =>
      (for (line <- source.getLines) yield line).toList
    }
    Some(lines)
  } catch {
    case e: Exception => None
  }
}
def readTextFileWithTry(filename: String): Try[List[String]] = {
  Try {
    val lines = using(fromFile(filename)) { source =>
      (for (line <- source.getLines) yield line).toList
    }
    lines
  }
}
val year = 2016
val data = readTextFile(s"resources/Babynames/names/yob$year.txt") match {
  case Some(lines) =>
    val n = lines.filter(_.nonEmpty).map(_.split(",")(0)).toSet
  case None => println("couldn't read file")
}
val data1 = readTextFileWithTry("resources/Babynames/namesbystate")
data1 match {
  case Success(lines) => {
    val info = for (stateFile <- data1; if stateFile.endsWith(".TXT")) yield {
      val source = fromFile("resources/Babynames/namesbystate/" + stateFile)
      val names = source.getLines().filter(_.nonEmpty).map(_.split(",")).
        filter(a => a(2).toInt == year).map(a => a(3)).toArray // .toSet
      (stateFile.take(2), names)
    }
  }
  case Failure(e) => println("Failed, message is: " + e)
}
But in the second case, readTextFileWithTry, I get the following error:
Failed, message is: java.io.FileNotFoundException: resources\Babynames\namesbystate (Access is denied)
From what I have read on SO, I guess the failure happens because I am trying to open the same file on each iteration of the for loop.
Apart from that, I have a few concerns about how I use this:
Is this a good way to use it? Can someone show me how to use Try on multiple occasions?
I tried changing the return type of readTextFileWithTry to Option[A], Set/Map, or another Scala collection so that I could apply higher-order functions to it later, but I did not succeed. I am not sure whether that is good practice.
How can I use higher-order functions in the Success case? There are multiple operations, so the code block in the Success case keeps growing, and I can't use any value defined there outside of the Success case.
Can someone help me understand?
I think that your problem has nothing to do with "I am trying to open the same file on each iteration of the for loop"; it is actually the same as in the accepted answer.
Unfortunately you didn't provide a stack trace, so it is not clear on which line this happens. I would guess that the failing call is
val data1 = readTextFileWithTry("resources/Babynames/namesbystate")
And looking at your first code sample:
val info = for (stateFile <- new java.io.File("resources/Babynames/namesbystate").list(); if stateFile.endsWith(".TXT")) yield {
it looks like the path "resources/Babynames/namesbystate" points to a directory. In your second example you are trying to read it as a file, and this is the reason for the error: your readTextFileWithTry is not a valid substitute for the java.io.File.list call. File.list doesn't need a wrapper because it doesn't use any intermediate closeable/disposable entity.
P.S. It might make more sense to use File.list(FilenameFilter filter) instead of if stateFile.endsWith(".TXT").
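To make the distinction concrete, here is a minimal sketch that lists a directory with a FilenameFilter and applies the loan pattern only to the individual files. The names StateFiles, readLines, listStateFiles and namesByState are hypothetical, and the column positions (year at index 2, name at index 3) are taken from the question's code rather than from any documented file format.

```scala
import java.io.{File, FilenameFilter}
import scala.io.Source
import scala.util.Try

object StateFiles {
  // Loan pattern for a single file: the Source is always closed,
  // and any exception ends up in a Failure.
  def readLines(file: File): Try[List[String]] = Try {
    val source = Source.fromFile(file)
    try source.getLines().toList finally source.close()
  }

  // File.list returns null when the path is not a directory,
  // so the result is wrapped in Option before use.
  def listStateFiles(dir: File): List[String] = {
    val onlyTxt = new FilenameFilter {
      def accept(d: File, name: String): Boolean = name.endsWith(".TXT")
    }
    Option(dir.list(onlyTxt)).map(_.toList).getOrElse(Nil)
  }

  // Higher-order functions are applied inside the Try with map,
  // so no large Success block is needed at this point.
  def namesByState(dir: File, year: Int): List[(String, Try[List[String]])] =
    listStateFiles(dir).map { fileName =>
      val names = readLines(new File(dir, fileName)).map { lines =>
        lines.filter(_.nonEmpty).map(_.split(","))
          .filter(a => a(2).toInt == year).map(a => a(3))
      }
      (fileName.take(2), names)
    }
}
```

Each state keeps its own Try, so one unreadable file does not hide the results for the others; a caller can pattern match on Success and Failure only at the point where the values are finally printed or reported.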
I am trying to write a function that returns a map in which every word is a key and the values are the pages on which the word appears. Currently, I am stuck at the point where I have data of the following type: List(List(words),page).
Is there a sensible way to reformat this data? If so, please explain, as I have no idea how to even begin.
object G {
def main(args: Array[String]): Unit = {
def stwórzIndeks(): Unit= {
val linie = io.Source
val zippedLinie: List[(String,Int)]=linie.zipWithIndex
val splitt=zippedLinie.foldLeft(List.empty[(List[String],Int)])((acc,curr)=>{
curr match {
case (arr,int) => {
val toAdd=(arr.split("\\s+").toList,zippedLinie.length-int)
You can replace that foldLeft with a flatMap with an inner map to get one big List of (word, page) pairs.
val wordsAndPage = zippedLinie.flatMap {
  case (line, idx) =>
    line.split("\\s+").toList.map(word => word -> (idx + 1))
}
After that, check the grouping methods in the Scaladoc (for example groupBy).
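As a rough sketch of the grouping step, assuming each element of the input list is one page and pages are numbered from 1 (WordIndex and buildIndex are hypothetical names, not part of the question):

```scala
object WordIndex {
  // Builds word -> sorted list of page numbers on which the word appears.
  def buildIndex(pages: List[String]): Map[String, List[Int]] = {
    val wordsAndPage: List[(String, Int)] = pages.zipWithIndex.flatMap {
      case (line, idx) =>
        line.split("\\s+").toList.filter(_.nonEmpty).map(word => word -> (idx + 1))
    }
    wordsAndPage
      .groupBy { case (word, _) => word }
      .map { case (word, pairs) => word -> pairs.map(_._2).distinct.sorted }
  }
}
```

groupBy keeps the whole (word, page) pair in each group, which is why the second step strips the word and keeps only the distinct page numbers.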
I am attempting to transform some data that is encapsulated in cats.effect.IO with a Map that also is in an IO monad. I'm using http4s with blaze server and when I use the following code the request times out:
def getScoresByUserId(userId: Int): IO[Response[IO]] = {
  implicit val formats = DefaultFormats + ShiftJsonSerializer() + RawShiftSerializer()
  implicit val shiftJsonReader = new Reader[ShiftJson] {
    def read(value: JValue): ShiftJson = value.extract[ShiftJson]
  }
  implicit val shiftJsonDec = jsonOf[IO, ShiftJson]
  // get the shifts
  var getDbShifts: IO[List[Shift]] = shiftModel.findByUserId(userId)
  // use the userRoleId to get the RoleId then get the tasks for this role
  val taskMap : IO[Map[String, Double]] = taskModel.findByUserId(userId).flatMap {
    case tskLst: List[Task] => IO(tskLst.map((task: Task) => (task.name -> task.standard)).toMap)
  }
  val traversed: IO[List[Shift]] = for {
    shifts <- getDbShifts
    traversed <- shifts.traverse((shift: Shift) => {
      val lstShiftJson: IO[List[ShiftJson]] = read[List[ShiftJson]](shift.roleTasks)
        .map((sj: ShiftJson) =>
          taskMap.flatMap((tm: Map[String, Double]) =>
            IO(ShiftJson(sj.name, sj.taskType, sj.label, sj.value.toString.toDouble / tm.get(sj.name).get)))
        ).sequence
      //TODO: this flatMap is bricking my request
      lstShiftJson.flatMap((sjLst: List[ShiftJson]) => {
        IO(Shift(shift.id, shift.shiftDate, shift.shiftStart, shift.shiftEnd,
          shift.lunchDuration, shift.shiftDuration, shift.breakOffProd, shift.systemDownOffProd,
          shift.meetingOffProd, shift.trainingOffProd, shift.projectOffProd, shift.miscOffProd,
          write[List[ShiftJson]](sjLst), shift.userRoleId, shift.isApproved, shift.score, shift.comments
        ))
      })
    })
  } yield traversed
  traversed.flatMap((sLst: List[Shift]) => Ok(write[List[Shift]](sLst)))
}
As the TODO comment shows, I've narrowed the problem down to the flatMap directly below it. If I remove that flatMap and merely return "IO(shift)" from the function passed to traverse, the request does not time out. However, that doesn't help me much, because I need lstShiftJson, which holds my transformed JSON.
My intuition tells me I'm abusing the IO monad somehow, but I'm not quite sure how.
Thank you for your time in reading this!
So with the guidance of Luis's comment I refactored my code to the following. I don't think it is optimal (the flatMap at the end seems unnecessary, but I couldn't figure out how to remove it), but it's the best I've got.
def getScoresByUserId(userId: Int): IO[Response[IO]] = {
  implicit val formats = DefaultFormats + ShiftJsonSerializer() + RawShiftSerializer()
  implicit val shiftJsonReader = new Reader[ShiftJson] {
    def read(value: JValue): ShiftJson = value.extract[ShiftJson]
  }
  implicit val shiftJsonDec = jsonOf[IO, ShiftJson]
  // - read the shift.roleTasks into a ShiftJson object
  // - divide each task value by the task.standard where task.name = shiftJson.name
  // - write the list of shiftJson back to a string
  val traversed = for {
    taskMap <- taskModel.findByUserId(userId).map((tList: List[Task]) => tList.map((task: Task) => (task.name -> task.standard)).toMap)
    shifts <- shiftModel.findByUserId(userId)
    traversed <- shifts.traverse((shift: Shift) => {
      val lstShiftJson: List[ShiftJson] = read[List[ShiftJson]](shift.roleTasks)
        .map((sj: ShiftJson) => ShiftJson(sj.name, sj.taskType, sj.label, sj.value.toString.toDouble / taskMap.get(sj.name).get ))
      shift.roleTasks = write[List[ShiftJson]](lstShiftJson)
      IO(shift)
    })
  } yield traversed
  traversed.flatMap((t: List[Shift]) => Ok(write[List[Shift]](t)))
}
Luis mentioned that mapping my List[Task] to a Map[String, Double] is a pure operation, so we want to use map instead of flatMap.
He also mentioned that I was wrapping every operation that comes from the database in IO, which caused a great deal of recomputation (including DB transactions).
To solve this, I moved all of the database operations inside my for comprehension. Using the "<-" operator to flatMap each of the results binds the values once inside the IO monad, which prevents the recomputation experienced before.
I do think there must be a better way of returning my result. flatMapping the "traversed" variable to get back inside the IO monad seems unnecessary, so please correct me if anyone knows better.
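The recomputation point can be illustrated without cats-effect at all. The sketch below uses a deliberately tiny, hand-rolled IO (an assumption for illustration, not the cats.effect.IO API) plus a counter standing in for database round trips; findStandards, perElement and boundOnce are hypothetical names.

```scala
object IoRecomputation {
  // Minimal stand-in for an IO type: a description of a computation
  // that is executed again every time it is run.
  final class IO[A](val unsafeRun: () => A) {
    def map[B](f: A => B): IO[B] = new IO[B](() => f(unsafeRun()))
    def flatMap[B](f: A => IO[B]): IO[B] = new IO[B](() => f(unsafeRun()).unsafeRun())
  }
  object IO { def apply[A](a: => A): IO[A] = new IO[A](() => a) }

  // Runs f for each element in order, like traverse on a List.
  def traverse[A, B](as: List[A])(f: A => IO[B]): IO[List[B]] =
    as.foldRight(IO(List.empty[B])) { (a, acc) =>
      f(a).flatMap(b => acc.map(b :: _))
    }

  var queries = 0 // counts simulated database round trips
  def findStandards: IO[Map[String, Double]] =
    IO { queries += 1; Map("pick" -> 2.0) }

  // The IO is referenced inside traverse: the query runs once per element.
  def perElement(values: List[Double]): IO[List[Double]] =
    traverse(values)(v => findStandards.map(m => v / m("pick")))

  // The IO is bound once with flatMap: the query runs once in total.
  def boundOnce(values: List[Double]): IO[List[Double]] =
    findStandards.flatMap(m => traverse(values)(v => IO(v / m("pick"))))
}
```

Both functions return the same list, but perElement bumps the counter once per input value while boundOnce bumps it once, which mirrors the difference between the two versions of getScoresByUserId.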
I'm going through a log file that is too big to fit into memory and collecting two types of expressions. What is a better functional alternative to my iterative snippet below?
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)]={
  val lines : Iterator[String] = io.Source.fromFile(file).getLines()
  val logins: mutable.Map[String, String] = new mutable.HashMap[String, String]()
  val errors: mutable.ListBuffer[(String, String)] = mutable.ListBuffer.empty
  for (line <- lines){
    line match {
      case errorPat(date,ip)=> errors.append((ip,date))
      case loginPat(date,user,ip,id) =>logins.put(ip, id)
      case _ => ""
    }
  }
  errors.toList.map(line => (logins.getOrElse(line._1,"none") + " " + line._1,line._2))
}
Here is a possible solution:
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String,String)] = {
  val lines = Source.fromFile(file).getLines
  val (err, log) = lines.collect {
    case errorPat(inf, ip) => (Some((ip, inf)), None)
    case loginPat(_, _, ip, id) => (None, Some((ip, id)))
  }.toList.unzip
  val ip2id = log.flatten.toMap
  err.collect{ case Some((ip,inf)) => (ip2id.getOrElse(ip,"none") + " " + ip, inf) }
}
1) removed unnecessary type declarations
2) tuple deconstruction instead of ugly ._1
3) left fold instead of mutable accumulators
4) used more convenient operator-like methods :+ and +
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)] = {
  val lines = io.Source.fromFile(file).getLines()
  val (logins, errors) =
    lines.foldLeft((Map.empty[String, String], Seq.empty[(String, String)])) {
      case ((loginsAcc, errorsAcc), next) =>
        next match {
          case errorPat(date, ip) => (loginsAcc, errorsAcc :+ (ip -> date))
          case loginPat(date, user, ip, id) => (loginsAcc + (ip -> id) , errorsAcc)
          case _ => (loginsAcc, errorsAcc)
        }
    }
  // more concise equivalent for
  // errors.toList.map { case (ip, date) => (logins.getOrElse(ip, "none") + " " + ip) -> date }
  for ((ip, date) <- errors.toList)
    yield (logins.getOrElse(ip, "none") + " " + ip) -> date
}
I have a few suggestions:
Instead of a pair/tuple, it's often better to use your own class. It gives meaningful names to both the type and its fields, which makes the code much more readable.
Split the code into small parts. In particular, try to decouple pieces of code that don't need to be tied together. This makes your code easier to understand, more robust, less prone to errors and easier to test. In your case it'd be good to separate producing your input (lines of a log file) and consuming it to produce a result. For example, you'd be able to make automatic tests for your function without having to store sample data in a file.
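A minimal sketch of both suggestions, using only the standard library: named case classes instead of bare tuples, and a function that consumes any Iterator[String] so it can be tested without a file. LogSummary, ErrorReport, State and summarize are hypothetical names, and the regular expressions in the usage note are only examples.

```scala
import scala.util.matching.Regex

object LogSummary {
  // Named result type instead of a bare (String, String) tuple.
  final case class ErrorReport(userAndIp: String, date: String)

  // Accumulated state with named fields.
  final case class State(logins: Map[String, String], errors: Vector[(String, String)])

  // Consumes any iterator of lines; the caller decides where they come from.
  def summarize(lines: Iterator[String], errorPat: Regex, loginPat: Regex): List[ErrorReport] = {
    val end = lines.foldLeft(State(Map.empty, Vector.empty)) { (state, line) =>
      line match {
        case errorPat(date, ip)     => state.copy(errors = state.errors :+ (ip -> date))
        case loginPat(_, _, ip, id) => state.copy(logins = state.logins + (ip -> id))
        case _                      => state
      }
    }
    end.errors.toList.map { case (ip, date) =>
      ErrorReport(end.logins.getOrElse(ip, "none") + " " + ip, date)
    }
  }
}
```

For production use the iterator would come from io.Source.fromFile(file).getLines(); in a test it can simply be Iterator("Error: 2012/03 10.0.0.1", ...) together with patterns such as "Error: (\\S+) (\\S+)".r.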
As an example and exercise, I tried to make a solution based on Scalaz iteratees. It's a bit longer (it includes some auxiliary code for IteratorEnumerator) and probably overkill for the task, but perhaps someone will find it helpful.
import java.io._;
import scala.util.matching.Regex
import scalaz._
import scalaz.IterV._
object MyApp extends App {
// A type for the result. Having names keeps things
// clearer and shorter.
type LogResult = List[(String,String)]
// Represents a state of our computation. Not only it
// gives a name to the data, we can also put here
// functions that modify the state. This nicely
// separates what we're computing and how.
sealed case class State(
  logins: Map[String,String],
  errors: Seq[(String,String)]
) {
  def this() = {
    this(Map.empty[String,String], Seq.empty[(String,String)])
  }
  def addError(date: String, ip: String): State =
    State(logins, errors :+ (ip -> date));
  def addLogin(ip: String, id: String): State =
    State(logins + (ip -> id), errors);
  // Produce the final result from accumulated data.
  def result: LogResult =
    for ((ip, date) <- errors.toList)
      yield (logins.getOrElse(ip, "none") + " " + ip) -> date
}
// An iteratee that consumes lines of our input. Based
// on the given regular expressions, it produces an
// iteratee that parses the input and uses State to
// compute the result.
def logIteratee(errorPat: Regex, loginPat: Regex):
    IterV[String,List[(String,String)]] = {
  // Consumes a single line.
  def consume(line: String, state: State): State =
    line match {
      case errorPat(date, ip) => state.addError(date, ip);
      case loginPat(date, user, ip, id) => state.addLogin(ip, id);
      case _ => state
    }
  // The core of the iteratee. Every time we consume a
  // line, we update our state. When done, compute the
  // final result.
  def step(state: State)(s: Input[String]): IterV[String, LogResult] =
    s(el = line => Cont(step(consume(line, state))),
      empty = Cont(step(state)),
      eof = Done(state.result, EOF[String]))
  // Return the iteratee waiting for its first input.
  Cont(step(new State()));
}
// Converts an iterator into an enumerator. This
// should probably be moved into Scalaz.
// Adapted from scalaz.ExampleIteratee
implicit val IteratorEnumerator = new Enumerator[Iterator] {
  @annotation.tailrec def apply[E, A](e: Iterator[E], i: IterV[E, A]): IterV[E, A] = {
    val next: Option[(Iterator[E], IterV[E, A])] =
      if (e.hasNext) {
        val x = e.next();
        i.fold(done = (_, _) => None, cont = k => Some((e, k(El(x)))))
      } else
        None;
    next match {
      case None => i
      case Some((es, is)) => apply(es, is)
    }
  }
}
// main ---------------------------------------------------
// Read a file as an iterator of lines:
// val lines: Iterator[String] =
// io.Source.fromFile("test.log").getLines();
// Create our testing iterator:
val lines: Iterator[String] = Seq(
  "Error: 2012/03",
  "Login: 2012/03 user Joe",
  "Error: 2012/03",
  "Error: 2012/04"
).iterator;
// Create an iteratee.
val iter = logIteratee("Error: (\\S+) (\\S+)".r,
                       "Login: (\\S+) (\\S+) (\\S+) (\\S+)".r);
// Run the iteratee against the input
// (the enumerator is implicit)
println(iter(lines).run);
}
Suppose I would like to code the following logic in Scala
val xdir = System.getProperty("XDir")
if (xdir == null)
  error("No XDir") // log the error and exit
val ydir = System.getProperty("YDir")
if (ydir == null)
  error("No YDir")
if (!new File(xdir).isDirectory)
  error("XDir is not a directory")
if (!new File(ydir).isDirectory)
  error("YDir is not a directory")
if (!new File(xdir).exists)
  error("XDir does not exist")
if (!new File(ydir).exists)
  error("YDir does not exist")
(and so on)
What is the best way to code this chain of validations in Scala?
Here are some useful things:
def sysValue(prop: String) = Option(System.getProperty(prop)) //returns Option[String]
def trySysValue(prop: String) = //returns Either[String, String]
sysValue(prop) map Right getOrElse Left("Absent property: " + prop)
Then you can use monadic composition of Either through its right-projection
val batch = //batch is Either[String, (File, File)]
  for {
    x <- trySysValue("XDir").right
    xf <- dir(x).right
    y <- trySysValue("YDir").right
    yf <- dir(y).right
  } yield (xf, yf)
def dir(s: String) = { //returns Either[String, File]
  val f = new File(s)
  if (!f.exists()) Left("Does not exist: " + f)
  else if (!f.isDirectory()) Left("Is not a directory: " + f)
  else Right(f)
}
The left-hand-side of the Either will be an error message. This monadic composition is fail fast. You can achieve composition which will accumulate all failures (for example, if neither XDir nor YDir exist, you would see both messages) using scalaz Validation. In that case, the code would look like this:
def trySysValue(prop: String) = //returns Validation[String, String]
  sysValue(prop) map Success getOrElse ("Absent property: " + prop).fail
def dir(s: String) = {
  val f = new File(s)
  if (!f.exists()) ("Does not exist: " + f).fail
  else if (!f.isDirectory()) ("Is not a directory: " + f).fail
  else f.success
}
val batch = //batch is ValidationNEL[String, (File, File)]
  (trySysValue("XDir") flatMap dir).liftFailNel <|*|> (trySysValue("YDir") flatMap dir).liftFailNel
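For comparison, here is a standard-library-only sketch of both behaviours, assuming Scala 2.12 or later, where Either is right-biased and needs no .right projection. DirValidation, tryValue, checked, failFast and accumulate are hypothetical names, and the properties are passed in as a Map (instead of read from System.getProperty) purely to keep the example testable.

```scala
import java.io.File

object DirValidation {
  // Looks a key up in the given properties; real code could wrap
  // System.getProperty in Option instead.
  def tryValue(props: Map[String, String], key: String): Either[String, String] =
    props.get(key).toRight("Absent property: " + key)

  def dir(path: String): Either[String, File] = {
    val f = new File(path)
    if (!f.exists()) Left("Does not exist: " + f)
    else if (!f.isDirectory) Left("Is not a directory: " + f)
    else Right(f)
  }

  def checked(props: Map[String, String], key: String): Either[String, File] =
    tryValue(props, key).flatMap(dir)

  // Fail fast: the for comprehension stops at the first Left.
  def failFast(props: Map[String, String]): Either[String, (File, File)] =
    for {
      x <- checked(props, "XDir")
      y <- checked(props, "YDir")
    } yield (x, y)

  // Accumulating: both checks always run and every message is kept.
  def accumulate(props: Map[String, String]): Either[List[String], (File, File)] =
    (checked(props, "XDir"), checked(props, "YDir")) match {
      case (Right(x), Right(y)) => Right((x, y))
      case (x, y)               => Left(List(x, y).collect { case Left(msg) => msg })
    }
}
```

The accumulating version is hand-written for two checks only; a Validation type generalises the same idea to any number of checks.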
Something like:
val batch = for {
  a <- safe(doA) either "A failed"
  b <- safe(doB) either "B failed"
  c <- safe(doC) either "C failed"
} yield (a, b, c)
batch fold( error(_), doSuccess(_) )
Where safe performs a, you guessed it, safe (try/catch) operation; either takes a failure (Left outcome) message and returns an Either RightProjection, which lets you run the batch operation above while threading through the point-of-failure error message.
class Catching[T](f: => T) {
  def either(msg: String) = {
    try { Right(f).right } catch { case e: Exception => Left(msg).right }
  }
}
def safe[T](f: => T) = new Catching(f)
You can add an option method to the Catching class as well, along with logging if you want to log particular error types.
See Jason Zaugg's solution for right-biasing Either and this thread from scala-debate on the subject as well. No consensus as yet, but most Scala "heavies" seem to be in favor.
One limitation of this approach is that if you attempt to add conditionals (if a = b) to the for {} block, it won't compile (since the default Either filter method returns Option). The workaround is to implement filter and withFilter returning Either, something I have yet to figure out (if someone has already done so, please post).
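On Scala 2.12 and later, one way to express such a condition without a custom filter is Either.filterOrElse, which takes the Left value to use when the predicate fails. The sketch below is only an illustration with hypothetical names (EitherConditions, parsePort, portRange), not the workaround described above.

```scala
object EitherConditions {
  def parsePort(s: String): Either[String, Int] =
    try Right(s.toInt)
    catch { case _: NumberFormatException => Left("Not a number: " + s) }

  // The condition lives on the generator's right-hand side, so no
  // "if" guard (and therefore no withFilter) is needed.
  def portRange(lo: String, hi: String): Either[String, (Int, Int)] =
    for {
      a <- parsePort(lo)
      b <- parsePort(hi).filterOrElse(_ > a, "Upper bound must exceed " + a)
    } yield (a, b)
}
```

Unlike a guard, filterOrElse forces the caller to say which error message a failed condition produces, which is exactly the information a plain filter on Either would be missing.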
Yes, you can use validation without Scalaz; see here for a self-contained implementation:
I am using the asynchronous I/O library of the Play framework, which uses Iteratees and Enumerators. I now have an Iteratee[T] as a data sink (for simplicity, say it's an Iteratee[Byte] which stores its content in a file). This Iteratee[Byte] is passed to the function which handles the writing.
But before writing I want to add some statistical information at the beginning of the file (for simplicity, say it's one Byte), so I transform the iteratee in the following way before passing it to the write function:
def write(value: Byte, output: Iteratee[Byte]): Iteratee[Byte] =
When I now read the stored file from the disk, I get an Enumerator[Byte] for it.
At first I want to read and remove the additional data and then I want to pass the rest of the Enumerator[Byte] to a function which handles the reading.
So I also need to transform the enumerator:
def read(input: Enumerator[Byte]): (Byte, Enumerator[Byte]) = {
  val firstEnumeratorEntry = ...
  val remainingEnumerator = ...
  (firstEnumeratorEntry, remainingEnumerator)
}
But I have no idea how to do this. How can I read some bytes from an Enumerator and get the remaining Enumerator?
Replacing Iteratee[Byte] with OutputStream and Enumerator[Byte] with InputStream, this would be very easy:
def write(value: Byte, output: OutputStream) = {
  output.write(value)
  output
}
def read(input: InputStream) = (input.read,input)
But I need the asynchronous I/O of the Play framework.
I wonder if you can tackle your goal from another angle.
That function that would use the remaining enumerator, let's call it remaining, presumably it applies to an iteratee to do the processing of the remainder: remaining |>> iteratee yielding another iteratee. Let's call that resulting iteratee iteratee2... Can you check whether you can get a reference to iteratee2? If that's the case, then you can get and process the first byte using a first iteratee head, then combine head and iteratee2 through flatMap:
val head = Enumeratee.take[Byte](1) &>> Iteratee.foreach[Byte](println)
val processing = for { h <- head; i <- iteratee2 } yield (h, i)
If you cannot get a hold of iteratee2 - which would be the case if your enumerator combines with an enumeratee that you did not implement - then this approach won't work.
Here is one way to achieve this: fold within the Iteratee using an appropriate (kind of) State accumulator (a tuple here).
I read the routes file; the first byte is read as a Char and the rest is appended to a String, decoding the bytes as UTF-8.
def index = Action {
  /*let's do everything asyncly*/
  Async {
    /*for comprehension for read-friendly*/
    for (
      i <- read; /*read the file */
      (r:(Option[Char], String)) <- i.run /*"create" the related Promise and run it*/
    ) yield Ok("first : " + r._1.get + "\n" + "rest" + r._2) /* map the Promised result in a correct Request's Result*/
  }
}
def read = {
  //get the routes file in an Enumerator
  val file: Enumerator[Array[Byte]] = Enumerator.fromFile(Play.getFile("/conf/routes"))
  //apply the enumerator with an Iteratee that folds the data as wished
  file(Iteratee.fold((None, ""):(Option[Char], String)) { (acc, b) =>
    acc._1 match {
      /*on the first chunk*/ case None => (Some(b(0).toChar), acc._2 + new String(b.tail, Charset.forName("utf-8")))
      /*on other chunks*/ case x => (x, acc._2 + new String(b, Charset.forName("utf-8")))
    }
  })
}
I found yet another way using Enumeratee, but it needs to create two Enumerators (one short-lived). However, it is a bit more elegant. We use a "kind of" Enumeratee, the Traversable one, which works at a finer level than Enumeratee (which works at chunk level).
We use take(1), which takes only 1 byte and then closes the stream. On the other one, we use drop, which simply drops the first byte (because we're using an Enumerator[Array[Byte]]).
Furthermore, read2 now has a signature much closer to what you wished for, because it returns two enumerators (not so far from Promise, Enumerator).
def index = Action {
  Async {
    val (first, rest) = read2
    val enee = Enumeratee.map[Array[Byte]] {bs => new String(bs, Charset.forName("utf-8"))}
    def useEnee(enumor:Enumerator[Array[Byte]]) = Iteratee.flatten(enumor &> enee |>> Iteratee.consume[String]()).run.asInstanceOf[Promise[String]]
    for {
      f <- useEnee(first);
      r <- useEnee(rest)
    } yield Ok("first : " + f + "\n" + "rest" + r)
  }
}
def read2 = {
  def create = Enumerator.fromFile(Play.getFile("/conf/routes"))
  val file: Enumerator[Array[Byte]] = create
  val file2: Enumerator[Array[Byte]] = create
  (file &> Traversable.take[Array[Byte]](1), file2 &> Traversable.drop[Array[Byte]](1))
}
Actually we like Iteratees because they compose. So instead of creating multiple Enumerators from your original one, you can compose the two Iteratees sequentially (read-first and read-rest) and feed the result with your single Enumerator.
For this you need a sequential composition method, which I call andThen here. Here is a rough implementation. Note that returning the unconsumed input is a bit harsh; the behavior could perhaps be customized with a typeclass based on the Input type. It also doesn't handle passing the leftover input from the first iteratee to the second one (an exercise :).
object Iteratees {
  def andThen[E, A, B](a: Iteratee[E, A], b: Iteratee[E, B]): Iteratee[E, (A,B)] = new Iteratee[E, (A,B)] {
    def fold[C](
        done: ((A, B), Input[E]) => Promise[C],
        cont: ((Input[E]) => Iteratee[E, (A, B)]) => Promise[C],
        error: (String, Input[E]) => Promise[C]): Promise[C] = {
      a.fold(
        (ra, aleft) => b.fold(
          (rb, bleft) => done((ra, rb), aleft /* could be magicop(aleft, bleft)*/),
          (bcont) => cont(e => bcont(e) map (rb => (ra, rb))),
          (s, err) => error(s, err)
        ),
        (acont) => cont(e => andThen[E, A, B](acont(e), b)),
        (s, err) => error(s, err)
      )
    }
  }
}
Now you can just use the following:
object Application extends Controller {
  def index = Action { Async {
    val strings: Enumerator[String] = Enumerator("1","2","3","4")
    val takeOne = Cont[String, String](e => e match {
      case Input.El(e) => Done(e, Input.Empty)
      case x => Error("not enough", x)
    })
    val takeRest = Iteratee.consume[String]()
    val firstAndRest = Iteratees.andThen(takeOne, takeRest)
    val futureRes = strings(firstAndRest) flatMap (_.run)
    futureRes.map(x => Ok(x.toString)) // prints (1,234)
  } }
}
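The idea of sequential composition can also be tried out without Play. The following is a tiny synchronous model of an iteratee, written from scratch for illustration; it is not Play's API, it has no Promise, and because it is fed one element at a time it never has leftover input to pass on. All names (MiniIteratee, Iter, run, takeOne, takeRest) are hypothetical.

```scala
object MiniIteratee {
  // An iteratee is either finished with a result or waiting for input.
  sealed trait Iter[E, A]
  final case class Done[E, A](result: A) extends Iter[E, A]
  // Some(e) carries the next element, None signals end of input.
  final case class Cont[E, A](next: Option[E] => Iter[E, A]) extends Iter[E, A]

  def map[E, A, B](it: Iter[E, A])(f: A => B): Iter[E, B] = it match {
    case Done(a) => Done[E, B](f(a))
    case Cont(k) => Cont[E, B](in => map(k(in))(f))
  }

  // Sequential composition: feed a until it is done, then feed b.
  def andThen[E, A, B](a: Iter[E, A], b: Iter[E, B]): Iter[E, (A, B)] = a match {
    case Done(ra) => map(b)(rb => (ra, rb))
    case Cont(k)  => Cont[E, (A, B)](in => andThen(k(in), b))
  }

  // Plays the role of the enumerator: feeds every element, then end of input.
  def run[E, A](input: List[E], it: Iter[E, A]): A = it match {
    case Done(a) => a
    case Cont(k) => input match {
      case e :: rest => run(rest, k(Some(e)))
      case Nil => k(None) match {
        case Done(a) => a
        case Cont(_) => sys.error("iteratee did not finish at end of input")
      }
    }
  }

  // Takes exactly one element.
  val takeOne: Iter[String, String] = Cont[String, String] {
    case Some(e) => Done[String, String](e)
    case None    => sys.error("not enough input")
  }

  // Concatenates everything up to end of input.
  def takeRest(acc: String = ""): Iter[String, String] = Cont[String, String] {
    case Some(e) => takeRest(acc + e)
    case None    => Done[String, String](acc)
  }
}
```

Running MiniIteratee.run(List("1", "2", "3", "4"), MiniIteratee.andThen(MiniIteratee.takeOne, MiniIteratee.takeRest())) yields the pair ("1", "234"), the same shape of result as the Play example above.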