Best way to parse command-line parameters? [closed] - scala

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
What's the best way to parse command-line parameters in Scala?
I personally prefer something lightweight that does not require external jar.
Related:
How do I parse command line arguments in Java?
What parameter parser libraries are there for C++?
Best way to parse command line arguments in C#

For most cases you do not need an external parser. Scala's pattern matching allows consuming args in a functional style. For example:
object MmlAlnApp {
val usage = """
Usage: mmlaln [--min-size num] [--max-size num] filename
"""
def main(args: Array[String]) {
if (args.length == 0) println(usage)
val arglist = args.toList
type OptionMap = Map[Symbol, Any]
def nextOption(map : OptionMap, list: List[String]) : OptionMap = {
def isSwitch(s : String) = (s(0) == '-')
list match {
case Nil => map
case "--max-size" :: value :: tail =>
nextOption(map ++ Map('maxsize -> value.toInt), tail)
case "--min-size" :: value :: tail =>
nextOption(map ++ Map('minsize -> value.toInt), tail)
case string :: opt2 :: tail if isSwitch(opt2) =>
nextOption(map ++ Map('infile -> string), list.tail)
case string :: Nil => nextOption(map ++ Map('infile -> string), list.tail)
case option :: tail => println("Unknown option "+option)
exit(1)
}
}
val options = nextOption(Map(),arglist)
println(options)
}
}
will print, for example:
Map('infile -> test/data/paml-aln1.phy, 'maxsize -> 4, 'minsize -> 2)
This version only takes one infile. Easy to improve on (by using a List).
Note also that this approach allows for concatenation of multiple command line arguments - even more than two!

scopt/scopt
val parser = new scopt.OptionParser[Config]("scopt") {
head("scopt", "3.x")
opt[Int]('f', "foo") action { (x, c) =>
c.copy(foo = x) } text("foo is an integer property")
opt[File]('o', "out") required() valueName("<file>") action { (x, c) =>
c.copy(out = x) } text("out is a required file property")
opt[(String, Int)]("max") action { case ((k, v), c) =>
c.copy(libName = k, maxCount = v) } validate { x =>
if (x._2 > 0) success
else failure("Value <max> must be >0")
} keyValueName("<libname>", "<max>") text("maximum count for <libname>")
opt[Unit]("verbose") action { (_, c) =>
c.copy(verbose = true) } text("verbose is a flag")
note("some notes.\n")
help("help") text("prints this usage text")
arg[File]("<file>...") unbounded() optional() action { (x, c) =>
c.copy(files = c.files :+ x) } text("optional unbounded args")
cmd("update") action { (_, c) =>
c.copy(mode = "update") } text("update is a command.") children(
opt[Unit]("not-keepalive") abbr("nk") action { (_, c) =>
c.copy(keepalive = false) } text("disable keepalive"),
opt[Boolean]("xyz") action { (x, c) =>
c.copy(xyz = x) } text("xyz is a boolean property")
)
}
// parser.parse returns Option[C]
parser.parse(args, Config()) map { config =>
// do stuff
} getOrElse {
// arguments are bad, usage message will have been displayed
}
The above generates the following usage text:
scopt 3.x
Usage: scopt [update] [options] [<file>...]
-f <value> | --foo <value>
foo is an integer property
-o <file> | --out <file>
out is a required file property
--max:<libname>=<max>
maximum count for <libname>
--verbose
verbose is a flag
some notes.
--help
prints this usage text
<file>...
optional unbounded args
Command: update
update is a command.
-nk | --not-keepalive
disable keepalive
--xyz <value>
xyz is a boolean property
This is what I currently use. Clean usage without too much baggage.
(Disclaimer: I now maintain this project)

I realize that the question was asked some time ago, but I thought it might help some people, who are googling around (like me), and hit this page.
Scallop looks quite promising as well.
Features (quote from the linked github page):
flag, single-value and multiple value options
POSIX-style short option names (-a) with grouping (-abc)
GNU-style long option names (--opt)
Property arguments (-Dkey=value, -D key1=value key2=value)
Non-string types of options and properties values (with extendable converters)
Powerful matching on trailing args
Subcommands
And some example code (also from that Github page):
import org.rogach.scallop._;
object Conf extends ScallopConf(List("-c","3","-E","fruit=apple","7.2")) {
// all options that are applicable to builder (like description, default, etc)
// are applicable here as well
val count:ScallopOption[Int] = opt[Int]("count", descr = "count the trees", required = true)
.map(1+) // also here work all standard Option methods -
// evaluation is deferred to after option construction
val properties = props[String]('E')
// types (:ScallopOption[Double]) can be omitted, here just for clarity
val size:ScallopOption[Double] = trailArg[Double](required = false)
}
// that's it. Completely type-safe and convenient.
Conf.count() should equal (4)
Conf.properties("fruit") should equal (Some("apple"))
Conf.size.get should equal (Some(7.2))
// passing into other functions
def someInternalFunc(conf:Conf.type) {
conf.count() should equal (4)
}
someInternalFunc(Conf)

I like sliding over arguments for relatively simple configurations.
var name = ""
var port = 0
var ip = ""
args.sliding(2, 2).toList.collect {
case Array("--ip", argIP: String) => ip = argIP
case Array("--port", argPort: String) => port = argPort.toInt
case Array("--name", argName: String) => name = argName
}

Command Line Interface Scala Toolkit (CLIST)
here is mine too! (a bit late in the game though)
https://github.com/backuity/clist
As opposed to scopt it is entirely mutable... but wait! That gives us a pretty nice syntax:
class Cat extends Command(description = "concatenate files and print on the standard output") {
// type-safety: members are typed! so showAll is a Boolean
var showAll = opt[Boolean](abbrev = "A", description = "equivalent to -vET")
var numberNonblank = opt[Boolean](abbrev = "b", description = "number nonempty output lines, overrides -n")
// files is a Seq[File]
var files = args[Seq[File]](description = "files to concat")
}
And a simple way to run it:
Cli.parse(args).withCommand(new Cat) { case cat =>
println(cat.files)
}
You can do a lot more of course (multi-commands, many configuration options, ...) and has no dependency.
I'll finish with a kind of distinctive feature, the default usage (quite often neglected for multi commands):

How to parse parameters without an external dependency. Great question! You may be interested in picocli.
Picocli is specifically designed to solve the problem asked in the question: it is a command line parsing framework in a single file, so you can include it in source form. This lets users run picocli-based applications without requiring picocli as an external dependency.
It works by annotating fields so you write very little code. Quick summary:
Strongly typed everything - command line options as well as positional parameters
Support for POSIX clustered short options (so it handles <command> -xvfInputFile as well as <command> -x -v -f InputFile)
An arity model that allows a minimum, maximum and variable number of parameters, e.g, "1..*", "3..5"
Fluent and compact API to minimize boilerplate client code
Subcommands
Usage help with ANSI colors
The usage help message is easy to customize with annotations (without programming). For example:
(source)
I couldn't resist adding one more screenshot to show what kind of usage help messages are possible. Usage help is the face of your application, so be creative and have fun!
Disclaimer: I created picocli. Feedback or questions very welcome. It is written in java, but let me know if there is any issue using it in scala and I'll try to address it.

This is largely a shameless clone of my answer to the Java question of the same topic. It turns out that JewelCLI is Scala-friendly in that it doesn't require JavaBean style methods to get automatic argument naming.
JewelCLI is a Scala-friendly Java library for command-line parsing that yields clean code. It uses Proxied Interfaces Configured with Annotations to dynamically build a type-safe API for your command-line parameters.
An example parameter interface Person.scala:
import uk.co.flamingpenguin.jewel.cli.Option
trait Person {
#Option def name: String
#Option def times: Int
}
An example usage of the parameter interface Hello.scala:
import uk.co.flamingpenguin.jewel.cli.CliFactory.parseArguments
import uk.co.flamingpenguin.jewel.cli.ArgumentValidationException
object Hello {
def main(args: Array[String]) {
try {
val person = parseArguments(classOf[Person], args:_*)
for (i <- 1 to (person times))
println("Hello " + (person name))
} catch {
case e: ArgumentValidationException => println(e getMessage)
}
}
}
Save copies of the files above to a single directory and download the JewelCLI 0.6 JAR to that directory as well.
Compile and run the example in Bash on Linux/Mac OS X/etc.:
scalac -cp jewelcli-0.6.jar:. Person.scala Hello.scala
scala -cp jewelcli-0.6.jar:. Hello --name="John Doe" --times=3
Compile and run the example in the Windows Command Prompt:
scalac -cp jewelcli-0.6.jar;. Person.scala Hello.scala
scala -cp jewelcli-0.6.jar;. Hello --name="John Doe" --times=3
Running the example should yield the following output:
Hello John Doe
Hello John Doe
Hello John Doe

I am from Java world, I like args4j because its simple, specification is more readable( thanks to annotations) and produces nicely formatted output.
Here is my example snippet:
Specification
import org.kohsuke.args4j.{CmdLineException, CmdLineParser, Option}
object CliArgs {
#Option(name = "-list", required = true,
usage = "List of Nutch Segment(s) Part(s)")
var pathsList: String = null
#Option(name = "-workdir", required = true,
usage = "Work directory.")
var workDir: String = null
#Option(name = "-master",
usage = "Spark master url")
var masterUrl: String = "local[2]"
}
Parse
//var args = "-listt in.txt -workdir out-2".split(" ")
val parser = new CmdLineParser(CliArgs)
try {
parser.parseArgument(args.toList.asJava)
} catch {
case e: CmdLineException =>
print(s"Error:${e.getMessage}\n Usage:\n")
parser.printUsage(System.out)
System.exit(1)
}
println("workDir :" + CliArgs.workDir)
println("listFile :" + CliArgs.pathsList)
println("master :" + CliArgs.masterUrl)
On invalid arguments
Error:Option "-list" is required
Usage:
-list VAL : List of Nutch Segment(s) Part(s)
-master VAL : Spark master url (default: local[2])
-workdir VAL : Work directory.

scala-optparse-applicative
I think scala-optparse-applicative is the most functional command line parser library in Scala.
https://github.com/bmjames/scala-optparse-applicative

I liked the slide() approach of joslinm just not the mutable vars ;) So here's an immutable way to that approach:
case class AppArgs(
seed1: String,
seed2: String,
ip: String,
port: Int
)
object AppArgs {
def empty = new AppArgs("", "", "", 0)
}
val args = Array[String](
"--seed1", "akka.tcp://seed1",
"--seed2", "akka.tcp://seed2",
"--nodeip", "192.167.1.1",
"--nodeport", "2551"
)
val argsInstance = args.sliding(2, 1).toList.foldLeft(AppArgs.empty) { case (accumArgs, currArgs) => currArgs match {
case Array("--seed1", seed1) => accumArgs.copy(seed1 = seed1)
case Array("--seed2", seed2) => accumArgs.copy(seed2 = seed2)
case Array("--nodeip", ip) => accumArgs.copy(ip = ip)
case Array("--nodeport", port) => accumArgs.copy(port = port.toInt)
case unknownArg => accumArgs // Do whatever you want for this case
}
}

There's also JCommander (disclaimer: I created it):
object Main {
object Args {
#Parameter(
names = Array("-f", "--file"),
description = "File to load. Can be specified multiple times.")
var file: java.util.List[String] = null
}
def main(args: Array[String]): Unit = {
new JCommander(Args, args.toArray: _*)
for (filename <- Args.file) {
val f = new File(filename)
printf("file: %s\n", f.getName)
}
}
}

I've just found an extensive command line parsing library in scalac's scala.tools.cmd package.
See http://www.assembla.com/code/scala-eclipse-toolchain/git/nodes/src/compiler/scala/tools/cmd?rev=f59940622e32384b1e08939effd24e924a8ba8db

I've attempted generalize #pjotrp's solution by taking in a list of required positional key symbols, a map of flag -> key symbol and default options:
def parseOptions(args: List[String], required: List[Symbol], optional: Map[String, Symbol], options: Map[Symbol, String]): Map[Symbol, String] = {
args match {
// Empty list
case Nil => options
// Keyword arguments
case key :: value :: tail if optional.get(key) != None =>
parseOptions(tail, required, optional, options ++ Map(optional(key) -> value))
// Positional arguments
case value :: tail if required != Nil =>
parseOptions(tail, required.tail, optional, options ++ Map(required.head -> value))
// Exit if an unknown argument is received
case _ =>
printf("unknown argument(s): %s\n", args.mkString(", "))
sys.exit(1)
}
}
def main(sysargs Array[String]) {
// Required positional arguments by key in options
val required = List('arg1, 'arg2)
// Optional arguments by flag which map to a key in options
val optional = Map("--flag1" -> 'flag1, "--flag2" -> 'flag2)
// Default options that are passed in
var defaultOptions = Map()
// Parse options based on the command line args
val options = parseOptions(sysargs.toList, required, optional, defaultOptions)
}

I have never liked ruby like option parsers. Most developers that used them never write a proper man page for their scripts and end up with pages long options not organized in a proper way because of their parser.
I have always preferred Perl's way of doing things with Perl's Getopt::Long.
I am working on a scala implementation of it. The early API looks something like this:
def print_version() = () => println("version is 0.2")
def main(args: Array[String]) {
val (options, remaining) = OptionParser.getOptions(args,
Map(
"-f|--flag" -> 'flag,
"-s|--string=s" -> 'string,
"-i|--int=i" -> 'int,
"-f|--float=f" -> 'double,
"-p|-procedure=p" -> { () => println("higher order function" }
"-h=p" -> { () => print_synopsis() }
"--help|--man=p" -> { () => launch_manpage() },
"--version=p" -> print_version,
))
So calling script like this:
$ script hello -f --string=mystring -i 7 --float 3.14 --p --version world -- --nothing
Would print:
higher order function
version is 0.2
And return:
remaining = Array("hello", "world", "--nothing")
options = Map('flag -> true,
'string -> "mystring",
'int -> 7,
'double -> 3.14)
The project is hosted in github scala-getoptions.

I'd suggest to use http://docopt.org/. There's a scala-port but the Java implementation https://github.com/docopt/docopt.java works just fine and seems to be better maintained. Here's an example:
import org.docopt.Docopt
import scala.collection.JavaConversions._
import scala.collection.JavaConverters._
val doc =
"""
Usage: my_program [options] <input>
Options:
--sorted fancy sorting
""".stripMargin.trim
//def args = "--sorted test.dat".split(" ").toList
var results = new Docopt(doc).
parse(args()).
map {case(key, value)=>key ->value.toString}
val inputFile = new File(results("<input>"))
val sorted = results("--sorted").toBoolean

This is what I cooked. It returns a tuple of a map and a list. List is for input, like input file names. Map is for switches/options.
val args = "--sw1 1 input_1 --sw2 --sw3 2 input_2 --sw4".split(" ")
val (options, inputs) = OptParser.parse(args)
will return
options: Map[Symbol,Any] = Map('sw1 -> 1, 'sw2 -> true, 'sw3 -> 2, 'sw4 -> true)
inputs: List[Symbol] = List('input_1, 'input_2)
Switches can be "--t" which x will be set to true, or "--x 10" which x will be set to "10". Everything else will end up in list.
object OptParser {
val map: Map[Symbol, Any] = Map()
val list: List[Symbol] = List()
def parse(args: Array[String]): (Map[Symbol, Any], List[Symbol]) = _parse(map, list, args.toList)
private [this] def _parse(map: Map[Symbol, Any], list: List[Symbol], args: List[String]): (Map[Symbol, Any], List[Symbol]) = {
args match {
case Nil => (map, list)
case arg :: value :: tail if (arg.startsWith("--") && !value.startsWith("--")) => _parse(map ++ Map(Symbol(arg.substring(2)) -> value), list, tail)
case arg :: tail if (arg.startsWith("--")) => _parse(map ++ Map(Symbol(arg.substring(2)) -> true), list, tail)
case opt :: tail => _parse(map, list :+ Symbol(opt), tail)
}
}
}

I based my approach on the top answer (from dave4420), and tried to improve it by making it more general-purpose.
It returns a Map[String,String] of all command line parameters
You can query this for the specific parameters you want (eg using .contains) or convert the values into the types you want (eg using toInt).
def argsToOptionMap(args:Array[String]):Map[String,String]= {
def nextOption(
argList:List[String],
map:Map[String, String]
) : Map[String, String] = {
val pattern = "--(\\w+)".r // Selects Arg from --Arg
val patternSwitch = "-(\\w+)".r // Selects Arg from -Arg
argList match {
case Nil => map
case pattern(opt) :: value :: tail => nextOption( tail, map ++ Map(opt->value) )
case patternSwitch(opt) :: tail => nextOption( tail, map ++ Map(opt->null) )
case string :: Nil => map ++ Map(string->null)
case option :: tail => {
println("Unknown option:"+option)
sys.exit(1)
}
}
}
nextOption(args.toList,Map())
}
Example:
val args=Array("--testing1","testing1","-a","-b","--c","d","test2")
argsToOptionMap( args )
Gives:
res0: Map[String,String] = Map(testing1 -> testing1, a -> null, b -> null, c -> d, test2 -> null)

another library: scarg

Here's a scala command line parser that is easy to use. It automatically formats help text, and it converts switch arguments to your desired type. Both short POSIX, and long GNU style switches are supported. Supports switches with required arguments, optional arguments, and multiple value arguments. You can even specify the finite list of acceptable values for a particular switch. Long switch names can be abbreviated on the command line for convenience. Similar to the option parser in the Ruby standard library.

I like the clean look of this code... gleaned from a discussion here:
http://www.scala-lang.org/old/node/4380
object ArgParser {
val usage = """
Usage: parser [-v] [-f file] [-s sopt] ...
Where: -v Run verbosely
-f F Set input file to F
-s S Set Show option to S
"""
var filename: String = ""
var showme: String = ""
var debug: Boolean = false
val unknown = "(^-[^\\s])".r
val pf: PartialFunction[List[String], List[String]] = {
case "-v" :: tail => debug = true; tail
case "-f" :: (arg: String) :: tail => filename = arg; tail
case "-s" :: (arg: String) :: tail => showme = arg; tail
case unknown(bad) :: tail => die("unknown argument " + bad + "\n" + usage)
}
def main(args: Array[String]) {
// if there are required args:
if (args.length == 0) die()
val arglist = args.toList
val remainingopts = parseArgs(arglist,pf)
println("debug=" + debug)
println("showme=" + showme)
println("filename=" + filename)
println("remainingopts=" + remainingopts)
}
def parseArgs(args: List[String], pf: PartialFunction[List[String], List[String]]): List[String] = args match {
case Nil => Nil
case _ => if (pf isDefinedAt args) parseArgs(pf(args),pf) else args.head :: parseArgs(args.tail,pf)
}
def die(msg: String = usage) = {
println(msg)
sys.exit(1)
}
}

I just created my simple enumeration
val args: Array[String] = "-silent -samples 100 -silent".split(" +").toArray
//> args : Array[String] = Array(-silent, -samples, 100, -silent)
object Opts extends Enumeration {
class OptVal extends Val {
override def toString = "-" + super.toString
}
val nopar, silent = new OptVal() { // boolean options
def apply(): Boolean = args.contains(toString)
}
val samples, maxgen = new OptVal() { // integer options
def apply(default: Int) = { val i = args.indexOf(toString) ; if (i == -1) default else args(i+1).toInt}
def apply(): Int = apply(-1)
}
}
Opts.nopar() //> res0: Boolean = false
Opts.silent() //> res1: Boolean = true
Opts.samples() //> res2: Int = 100
Opts.maxgen() //> res3: Int = -1
I understand that solution has two major flaws that may distract you: It eliminates the freedom (i.e. the dependence on other libraries, that you value so much) and redundancy (the DRY principle, you do type the option name only once, as Scala program variable and eliminate it second time typed as command line text).

As everyone posted it's own solution here is mine, cause I wanted something easier to write for the user : https://gist.github.com/gwenzek/78355526e476e08bb34d
The gist contains a code file, plus a test file and a short example copied here:
import ***.ArgsOps._
object Example {
val parser = ArgsOpsParser("--someInt|-i" -> 4, "--someFlag|-f", "--someWord" -> "hello")
def main(args: Array[String]){
val argsOps = parser <<| args
val someInt : Int = argsOps("--someInt")
val someFlag : Boolean = argsOps("--someFlag")
val someWord : String = argsOps("--someWord")
val otherArgs = argsOps.args
foo(someWord, someInt, someFlag)
}
}
There is not fancy options to force a variable to be in some bounds, cause I don't feel that the parser is the best place to do so.
Note : you can have as much alias as you want for a given variable.

I'm going to pile on. I solved this with a simple line of code. My command line arguments look like this:
input--hdfs:/path/to/myData/part-00199.avro output--hdfs:/path/toWrite/Data fileFormat--avro option1--5
This creates an array via Scala's native command line functionality (from either App or a main method):
Array("input--hdfs:/path/to/myData/part-00199.avro", "output--hdfs:/path/toWrite/Data","fileFormat--avro","option1--5")
I can then use this line to parse out the default args array:
val nArgs = args.map(x=>x.split("--")).map(y=>(y(0),y(1))).toMap
Which creates a map with names associated with the command line values:
Map(input -> hdfs:/path/to/myData/part-00199.avro, output -> hdfs:/path/toWrite/Data, fileFormat -> avro, option1 -> 5)
I can then access the values of named parameters in my code and the order they appear on the command line is no longer relevant. I realize this is fairly simple and doesn't have all the advanced functionality mentioned above but seems to be sufficient in most cases, only needs one line of code, and doesn't involve external dependencies.

Here is mine 1-liner
def optArg(prefix: String) = args.drop(3).find { _.startsWith(prefix) }.map{_.replaceFirst(prefix, "")}
def optSpecified(prefix: String) = optArg(prefix) != None
def optInt(prefix: String, default: Int) = optArg(prefix).map(_.toInt).getOrElse(default)
It drops 3 mandatory arguments and gives out the options. Integers are specified like notorious -Xmx<size> java option, jointly with the prefix. You can parse binaries and integers as simple as
val cacheEnabled = optSpecified("cacheOff")
val memSize = optInt("-Xmx", 1000)
No need to import anything.

Poor man's quick-and-dirty one-liner for parsing key=value pairs:
def main(args: Array[String]) {
val cli = args.map(_.split("=") match { case Array(k, v) => k->v } ).toMap
val saveAs = cli("saveAs")
println(saveAs)
}

freecli
package freecli
package examples
package command
import java.io.File
import freecli.core.all._
import freecli.config.all._
import freecli.command.all._
object Git extends App {
case class CommitConfig(all: Boolean, message: String)
val commitCommand =
cmd("commit") {
takesG[CommitConfig] {
O.help --"help" ::
flag --"all" -'a' -~ des("Add changes from all known files") ::
O.string -'m' -~ req -~ des("Commit message")
} ::
runs[CommitConfig] { config =>
if (config.all) {
println(s"Commited all ${config.message}!")
} else {
println(s"Commited ${config.message}!")
}
}
}
val rmCommand =
cmd("rm") {
takesG[File] {
O.help --"help" ::
file -~ des("File to remove from git")
} ::
runs[File] { f =>
println(s"Removed file ${f.getAbsolutePath} from git")
}
}
val remoteCommand =
cmd("remote") {
takes(O.help --"help") ::
cmd("add") {
takesT {
O.help --"help" ::
string -~ des("Remote name") ::
string -~ des("Remote url")
} ::
runs[(String, String)] {
case (s, u) => println(s"Remote $s $u added")
}
} ::
cmd("rm") {
takesG[String] {
O.help --"help" ::
string -~ des("Remote name")
} ::
runs[String] { s =>
println(s"Remote $s removed")
}
}
}
val git =
cmd("git", des("Version control system")) {
takes(help --"help" :: version --"version" -~ value("v1.0")) ::
commitCommand ::
rmCommand ::
remoteCommand
}
val res = runCommandOrFail(git)(args).run
}
This will generate the following usage:
Usage

Related

Simple functionnal way for grouping successive elements? [duplicate]

I'm trying to 'group' a string into segments, I guess this example would explain it more succintly
scala> val str: String = "aaaabbcddeeeeeeffg"
... (do something)
res0: List("aaaa","bb","c","dd","eeeee","ff","g")
I can thnk of a few ways to do this in an imperative style (with vars and stepping through the string to find groups) but I was wondering if any better functional solution could
be attained? I've been looking through the Scala API but there doesn't seem to be something that fits my needs.
Any help would be appreciated
You can split the string recursively with span:
def s(x : String) : List[String] = if(x.size == 0) Nil else {
val (l,r) = x.span(_ == x(0))
l :: s(r)
}
Tail recursive:
#annotation.tailrec def s(x : String, y : List[String] = Nil) : List[String] = {
if(x.size == 0) y.reverse
else {
val (l,r) = x.span(_ == x(0))
s(r, l :: y)
}
}
Seems that all other answers are very concentrated on collection operations. But pure string + regex solution is much simpler:
str split """(?<=(\w))(?!\1)""" toList
In this regex I use positive lookbehind and negative lookahead for the captured char
def group(s: String): List[String] = s match {
case "" => Nil
case s => s.takeWhile(_==s.head) :: group(s.dropWhile(_==s.head))
}
Edit: Tail recursive version:
def group(s: String, result: List[String] = Nil): List[String] = s match {
case "" => result reverse
case s => group(s.dropWhile(_==s.head), s.takeWhile(_==s.head) :: result)
}
can be used just like the other because the second parameter has a default value and thus doesnt have to be supplied.
Make it one-liner:
scala> val str = "aaaabbcddddeeeeefff"
str: java.lang.String = aaaabbcddddeeeeefff
scala> str.groupBy(identity).map(_._2)
res: scala.collection.immutable.Iterable[String] = List(eeeee, fff, aaaa, bb, c, dddd)
UPDATE:
As #Paul mentioned about the order here is updated version:
scala> str.groupBy(identity).toList.sortBy(_._1).map(_._2)
res: List[String] = List(aaaa, bb, c, dddd, eeeee, fff)
You could use some helper functions like this:
val str = "aaaabbcddddeeeeefff"
def zame(chars:List[Char]) = chars.partition(_==chars.head)
def q(chars:List[Char]):List[List[Char]] = chars match {
case Nil => Nil
case rest =>
val (thesame,others) = zame(rest)
thesame :: q(others)
}
q(str.toList) map (_.mkString)
This should do the trick, right? No doubt it can be cleaned up into one-liners even further
A functional* solution using fold:
def group(s : String) : Seq[String] = {
s.tail.foldLeft(Seq(s.head.toString)) { case (carry, elem) =>
if ( carry.last(0) == elem ) {
carry.init :+ (carry.last + elem)
}
else {
carry :+ elem.toString
}
}
}
There is a lot of cost hidden in all those sequence operations performed on strings (via implicit conversion). I guess the real complexity heavily depends on the kind of Seq strings are converted to.
(*) Afaik all/most operations in the collection library depend in iterators, an imho inherently unfunctional concept. But the code looks functional, at least.
Starting Scala 2.13, List is now provided with the unfold builder which can be combined with String::span:
List.unfold("aaaabbaaacdeeffg") {
case "" => None
case rest => Some(rest.span(_ == rest.head))
}
// List[String] = List("aaaa", "bb", "aaa", "c", "d", "ee", "ff", "g")
or alternatively, coupled with Scala 2.13's Option#unless builder:
List.unfold("aaaabbaaacdeeffg") {
rest => Option.unless(rest.isEmpty)(rest.span(_ == rest.head))
}
// List[String] = List("aaaa", "bb", "aaa", "c", "d", "ee", "ff", "g")
Details:
Unfold (def unfold[A, S](init: S)(f: (S) => Option[(A, S)]): List[A]) is based on an internal state (init) which is initialized in our case with "aaaabbaaacdeeffg".
For each iteration, we span (def span(p: (Char) => Boolean): (String, String)) this internal state in order to find the prefix containing the same symbol and produce a (String, String) tuple which contains the prefix and the rest of the string. span is very fortunate in this context as it produces exactly what unfold expects: a tuple containing the next element of the list and the new internal state.
The unfolding stops when the internal state is "" in which case we produce None as expected by unfold to exit.
Edit: Have to read more carefully. Below is no functional code.
Sometimes, a little mutable state helps:
def group(s : String) = {
var tmp = ""
val b = Seq.newBuilder[String]
s.foreach { c =>
if ( tmp != "" && tmp.head != c ) {
b += tmp
tmp = ""
}
tmp += c
}
b += tmp
b.result
}
Runtime O(n) (if segments have at most constant length) and tmp.+= probably creates the most overhead. Use a string builder instead for strict runtime in O(n).
group("aaaabbcddeeeeeeffg")
> Seq[String] = List(aaaa, bb, c, dd, eeeeee, ff, g)
If you want to use scala API you can use the built in function for that:
str.groupBy(c => c).values
Or if you mind it being sorted and in a list:
str.groupBy(c => c).values.toList.sorted

Filtering inside `for` with pattern matching

I am reading a TSV file and using using something like this:
case class Entry(entryType: Int, value: Int)
def filterEntries(): Iterator[Entry] = {
for {
line <- scala.io.Source.fromFile("filename").getLines()
} yield new Entry(line.split("\t").map(x => x.toInt))
}
Now I am both interested in filtering out entries whose entryType are set to 0 and ignoring lines with column count greater or lesser than 2 (that does not match the constructor). I was wondering if there's an idiomatic way to achieve this may be using pattern matching and unapply method in a companion object. The only thing I can think of is using .filter on the resulting iterator.
I will also accept solution not involving for loop but that returns Iterator[Entry]. They solutions must be tolerant to malformed inputs.
This is more state-of-arty:
package object liner {
implicit class R(val sc: StringContext) {
object r {
def unapplySeq(s: String): Option[Seq[String]] = sc.parts.mkString.r unapplySeq s
}
}
}
package liner {
case class Entry(entryType: Int, value: Int)
object I {
def unapply(s: String): Option[Int] = util.Try(s.toInt).toOption
}
object Test extends App {
def lines = List("1 2", "3", "", " 4 5 ", "junk", "0, 100000", "6 7 8")
def entries = lines flatMap {
case r"""\s*${I(i)}(\d+)\s+${I(j)}(\d+)\s*""" if i != 0 => Some(Entry(i, j))
case __________________________________________________ => None
}
Console println entries
}
}
Hopefully, the regex interpolator will make it into the standard distro soon, but this shows how easy it is to rig up. Also hopefully, a scanf-style interpolator will allow easy extraction with case f"$i%d".
I just started using the "elongated wildcard" in patterns to align the arrows.
There is a pupal or maybe larval regex macro:
https://github.com/som-snytt/regextractor
You can create variables in the head of the for-comprehension and then use a guard:
edit: ensure length of array
for {
line <- scala.io.Source.fromFile("filename").getLines()
arr = line.split("\t").map(x => x.toInt)
if arr.size == 2 && arr(0) != 0
} yield new Entry(arr(0), arr(1))
I have solved it using the following code:
import scala.util.{Try, Success}
val lines = List(
"1\t2",
"1\t",
"2",
"hello",
"1\t3"
)
case class Entry(val entryType: Int, val value: Int)
object Entry {
def unapply(line: String) = {
line.split("\t").map(x => Try(x.toInt)) match {
case Array(Success(entryType: Int), Success(value: Int)) => Some(Entry(entryType, value))
case _ =>
println("Malformed line: " + line)
None
}
}
}
for {
line <- lines
entryOption = Entry.unapply(line)
if entryOption.isDefined
} yield entryOption.get
The left hand side of a <- or = in a for-loop may be a fully-fledged pattern. So you may write this:
def filterEntries(): Iterator[Int] = for {
line <- scala.io.Source.fromFile("filename").getLines()
arr = line.split("\t").map(x => x.toInt)
if arr.size == 2
// now you may use pattern matching to extract the array
Array(entryType, value) = arr
if entryType == 0
} yield Entry(entryType, value)
Note that this solution will throw a NumberFormatException if a field is not convertible to an Int. If you do not want that, you'll have to encapsulate x.toInt with a Try and pattern match again.

Is there any way to use immutable collections here + make the code look better?

I have to validate some variables manually for some reason and return a map with the sequance of the error messages for each variable. I've decided to use mutable collections for this because I think there is no other choise left:
val errors = collection.mutable.Map[String, ListBuffer[String]]()
//field1
val fieldToValidate1 = getData1()
if (fieldToValidate1 = "")
errors("fieldToValidate1") += "it must not be empty!"
if (validate2(fieldToValidate1))
errors("fieldToValidate1") += "validation2!"
if (validate3(fieldToValidate1))
errors("fieldToValidate1") += "validation3!"
//field2
val fieldToValidate2 = getData1()
//approximately the same steps
if (fieldToValidate2 = "")
errors("fieldToValidate2") += "it must not be empty!"
//.....
To my mind, it look kind of clumsy and there should other elegant solution. I'd also like to not use mutable collections if possible. Your ideas?
Instead of using mutable collections, you can define errors with var and update it in this way.
var errors = Map[String, List[String]]().withDefaultValue(Nil)
errors = errors updated ("fieldToValidate1", errors("fieldToValidate1") ++ List("it must not be empty!"))
errors = errors updated ("fieldToValidate1", errors("fieldToValidate1") ++ List("validation2"))
The code looks more tedious, but it gets out of mutable collections.
So what is a good type for your check? I was thinking about A => Option[String] if A is the type of your object under test. If your error messages do not depend on the value of the object under test, (A => Boolean, String) might be more convenient.
//for constructing checks from boolean test and an error message
def checkMsg[A](check: A => Boolean, msg: => String): A => Option[String] =
x => if(check(x)) Some(msg) else None
val checks = Seq[String => Option[String]](
checkMsg((_ == ""), "it must not be empty"),
//example of using the object under test in the error message
x => Some(x).filterNot(_ startsWith "ab").map(x => x + " does not begin with ab")
)
val objectUnderTest = "acvw"
val errors = checks.flatMap(c => c(objectUnderTest))
Error labels
As I just noted, you were asking for a map with a label for each check. In this case, you need to provide the check label, of course. Then the type of your check would be (String, A => Option[String]).
Although a [relatively] widespread way of doing-the-thing-right would be using scalaz's Validation (as #senia has shown), I think it is a little bit overwhelming approach (if you're bringing scalaz to your project you have to be a seasoned scala developer, otherwise it may bring you more harm than good).
Nice alternative could be using ScalaUtils which has Or and Every specifically made for this purpose, in fact if you're using ScalaTest you already have seen an example of them in use (it uses scalautils underneath). I shamefully copy-pasted example from their doc:
import org.scalautils._
def parseName(input: String): String Or One[ErrorMessage] = {
val trimmed = input.trim
if (!trimmed.isEmpty) Good(trimmed) else Bad(One(s""""${input}" is not a valid name"""))
}
def parseAge(input: String): Int Or One[ErrorMessage] = {
try {
val age = input.trim.toInt
if (age >= 0) Good(age) else Bad(One(s""""${age}" is not a valid age"""))
}
catch {
case _: NumberFormatException => Bad(One(s""""${input}" is not a valid integer"""))
}
}
import Accumulation._
def parsePerson(inputName: String, inputAge: String): Person Or Every[ErrorMessage] = {
val name = parseName(inputName)
val age = parseAge(inputAge)
withGood(name, age) { Person(_, _) }
}
parsePerson("Bridget Jones", "29")
// Result: Good(Person(Bridget Jones,29))
parsePerson("Bridget Jones", "")
// Result: Bad(One("" is not a valid integer))
parsePerson("Bridget Jones", "-29")
// Result: Bad(One("-29" is not a valid age))
parsePerson("", "")
// Result: Bad(Many("" is not a valid name, "" is not a valid integer))
Having said this, I don't think you can do any better than your current approach if you want to stick with core scala without any external dependencies.
In case you can use scalaz the best solution for aggregation errors is Validation:
def validate1(value: String) =
if (value == "") "it must not be empty!".failNel else value.success
def validate2(value: String) =
if (value.length > 10) "it must not be longer than 10!".failNel else value.success
def validate3(value: String) =
if (value == "error") "it must not be equal to 'error'!".failNel else value.success
def validateField(name: String, value: String): ValidationNel[(String, String), String] =
(
validate1(value) |#|
validate2(value) |#|
validate3(value)
).tupled >| value leftMap { _.map{ name -> _ } }
val result = (
validateField("fieldToValidate1", getData1()) |#|
validateField("fieldToValidate2", getData2())
).tupled
Then you could get optional errors Map like this:
val errors =
result.swap.toOption.map{
_.toList.groupBy(_._1).map{ case (k, v) => k -> v.map(_._2) }
}
// Some(Map(fieldToValidate2 -> List(it must not be equal to 'error'!), fieldToValidate1 -> List(it must not be empty!)))

Scala - how to print case classes like (pretty printed) tree

I'm making a parser with Scala Combinators. It is awesome. What I end up with is a long list of entagled case classes, like: ClassDecl(Complex,List(VarDecl(Real,float), VarDecl(Imag,float))), just 100x longer. I was wondering if there is a good way to print case classes like these in a tree-like fashion so that it's easier to read..? (or some other form of Pretty Print)
ClassDecl
name = Complex
fields =
- VarDecl
name = Real
type = float
- VarDecl
name = Imag
type = float
^ I want to end up with something like this
edit: Bonus question
Is there also a way to show the name of the parameter..? Like: ClassDecl(name=Complex, fields=List( ... ) ?
Check out a small extensions library named sext. It exports these two functions exactly for purposes like that.
Here's how it can be used for your example:
object Demo extends App {
import sext._
case class ClassDecl( kind : Kind, list : List[ VarDecl ] )
sealed trait Kind
case object Complex extends Kind
case class VarDecl( a : Int, b : String )
val data = ClassDecl(Complex,List(VarDecl(1, "abcd"), VarDecl(2, "efgh")))
println("treeString output:\n")
println(data.treeString)
println()
println("valueTreeString output:\n")
println(data.valueTreeString)
}
Following is the output of this program:
treeString output:
ClassDecl:
- Complex
- List:
| - VarDecl:
| | - 1
| | - abcd
| - VarDecl:
| | - 2
| | - efgh
valueTreeString output:
- kind:
- list:
| - - a:
| | | 1
| | - b:
| | | abcd
| - - a:
| | | 2
| | - b:
| | | efgh
Starting Scala 2.13, case classes (which are an implementation of Product) are now provided with a productElementNames method which returns an iterator over their field's names.
Combined with Product::productIterator which provides the values of a case class, we have a simple way to pretty print case classes without requiring reflection:
def pprint(obj: Any, depth: Int = 0, paramName: Option[String] = None): Unit = {
val indent = " " * depth
val prettyName = paramName.fold("")(x => s"$x: ")
val ptype = obj match { case _: Iterable[Any] => "" case obj: Product => obj.productPrefix case _ => obj.toString }
println(s"$indent$prettyName$ptype")
obj match {
case seq: Iterable[Any] =>
seq.foreach(pprint(_, depth + 1))
case obj: Product =>
(obj.productIterator zip obj.productElementNames)
.foreach { case (subObj, paramName) => pprint(subObj, depth + 1, Some(paramName)) }
case _ =>
}
}
which for your specific scenario:
// sealed trait Kind
// case object Complex extends Kind
// case class VarDecl(a: Int, b: String)
// case class ClassDecl(kind: Kind, decls: List[VarDecl])
val data = ClassDecl(Complex, List(VarDecl(1, "abcd"), VarDecl(2, "efgh")))
pprint(data)
produces:
ClassDecl
kind: Complex
decls:
VarDecl
a: 1
b: abcd
VarDecl
a: 2
b: efgh
Use the com.lihaoyi.pprint library.
libraryDependencies += "com.lihaoyi" %% "pprint" % "0.4.1"
val data = ...
val str = pprint.tokenize(data).mkString
println(str)
you can also configure width, height, indent and colors:
pprint.tokenize(data, width = 80).mkString
Docs: https://github.com/com-lihaoyi/PPrint
Here's my solution which greatly improves how http://www.lihaoyi.com/PPrint/ handles the case-classes (see https://github.com/lihaoyi/PPrint/issues/4 ).
e.g. it prints this:
for such a usage:
pprint2 = pprint.copy(additionalHandlers = pprintAdditionalHandlers)
case class Author(firstName: String, lastName: String)
case class Book(isbn: String, author: Author)
val b = Book("978-0486282114", Author("first", "last"))
pprint2.pprintln(b)
code:
import pprint.{PPrinter, Tree, Util}
object PPrintUtils {
// in scala 2.13 this would be even simpler/cleaner due to added product.productElementNames
protected def caseClassToMap(cc: Product): Map[String, Any] = {
val fieldValues = cc.productIterator.toSet
val fields = cc.getClass.getDeclaredFields.toSeq
.filterNot(f => f.isSynthetic || java.lang.reflect.Modifier.isStatic(f.getModifiers))
fields.map { f =>
f.setAccessible(true)
f.getName -> f.get(cc)
}.filter { case (k, v) => fieldValues.contains(v) }
.toMap
}
var pprint2: PPrinter = _
protected def pprintAdditionalHandlers: PartialFunction[Any, Tree] = {
case x: Product =>
val className = x.getClass.getName
// see source code for pprint.treeify()
val shouldNotPrettifyCaseClass = x.productArity == 0 || (x.productArity == 2 && Util.isOperator(x.productPrefix)) || className.startsWith(pprint.tuplePrefix) || className == "scala.Some"
if (shouldNotPrettifyCaseClass)
pprint.treeify(x)
else {
val fieldMap = caseClassToMap(x)
pprint.Tree.Apply(
x.productPrefix,
fieldMap.iterator.flatMap { case (k, v) =>
val prettyValue: Tree = pprintAdditionalHandlers.lift(v).getOrElse(pprint2.treeify(v))
Seq(pprint.Tree.Infix(Tree.Literal(k), "=", prettyValue))
}
)
}
}
pprint2 = pprint.copy(additionalHandlers = pprintAdditionalHandlers)
}
// usage
pprint2.println(SomeFancyObjectWithNestedCaseClasses(...))
import java.lang.reflect.Field
...
/**
* Pretty prints case classes with field names.
* Handles sequences and arrays of such values.
* Ideally, one could take the output and paste it into source code and have it compile.
*/
def prettyPrint(a: Any): String = {
// Recursively get all the fields; this will grab vals declared in parents of case classes.
def getFields(cls: Class[_]): List[Field] =
Option(cls.getSuperclass).map(getFields).getOrElse(Nil) ++
cls.getDeclaredFields.toList.filterNot(f =>
f.isSynthetic || java.lang.reflect.Modifier.isStatic(f.getModifiers))
a match {
// Make Strings look similar to their literal form.
case s: String =>
'"' + Seq("\n" -> "\\n", "\r" -> "\\r", "\t" -> "\\t", "\"" -> "\\\"", "\\" -> "\\\\").foldLeft(s) {
case (acc, (c, r)) => acc.replace(c, r) } + '"'
case xs: Seq[_] =>
xs.map(prettyPrint).toString
case xs: Array[_] =>
s"Array(${xs.map(prettyPrint) mkString ", "})"
// This covers case classes.
case p: Product =>
s"${p.productPrefix}(${
(getFields(p.getClass) map { f =>
f setAccessible true
s"${f.getName} = ${prettyPrint(f.get(p))}"
}) mkString ", "
})"
// General objects and primitives end up here.
case q =>
Option(q).map(_.toString).getOrElse("¡null!")
}
}
Just like parser combinators, Scala already contains pretty printer combinators in the standard library. (note: this library is deprecated as of Scala 2.11. A similar pretty printing library is a part of kiama open source project).
You are not saying it plainly in your question if you need the solution that does "reflection" or you'd like to build the printer explicitly. (though your "bonus question" hints you probably want "reflective" solution)
Anyway, in the case you'd like to develop simple pretty printer using plain Scala library, here it is. The following code is REPLable.
case class VarDecl(name: String, `type`: String)
case class ClassDecl(name: String, fields: List[VarDecl])
import scala.text._
import Document._
def varDoc(x: VarDecl) =
nest(4, text("- VarDecl") :/:
group("name = " :: text(x.name)) :/:
group("type = " :: text(x.`type`))
)
def classDoc(x: ClassDecl) = {
val docs = ((empty:Document) /: x.fields) { (d, f) => varDoc(f) :/: d }
nest(2, text("ClassDecl") :/:
group("name = " :: text(x.name)) :/:
group("fields =" :/: docs))
}
def prettyPrint(d: Document) = {
val writer = new java.io.StringWriter
d.format(1, writer)
writer.toString
}
prettyPrint(classDoc(
ClassDecl("Complex", VarDecl("Real","float") :: VarDecl("Imag","float") :: Nil)
))
Bonus question: wrap the printers into type classes for even greater composability.
The nicest, most concise "out-of-the" box experience I've found is with the Kiama pretty printing library. It doesn't print member names without using additional combinators, but with only import org.kiama.output.PrettyPrinter._; pretty(any(data)) you have a great start:
case class ClassDecl( kind : Kind, list : List[ VarDecl ] )
sealed trait Kind
case object Complex extends Kind
case class VarDecl( a : Int, b : String )
val data = ClassDecl(Complex,List(VarDecl(1, "abcd"), VarDecl(2, "efgh")))
import org.kiama.output.PrettyPrinter._
// `w` is the wrapping width. `1` forces wrapping all components.
pretty(any(data), w=1)
Produces:
ClassDecl (
Complex (),
List (
VarDecl (
1,
"abcd"),
VarDecl (
2,
"efgh")))
Note that this is just the most basic example. Kiama PrettyPrinter is an extremely powerful library with a rich set of combinators specifically designed for intelligent spacing, line wrapping, nesting, and grouping. It's very easy to tweak to suit your needs. As of this posting, it's available in SBT with:
libraryDependencies += "com.googlecode.kiama" %% "kiama" % "1.8.0"
Using reflection
import scala.reflect.ClassTag
import scala.reflect.runtime.universe._
object CaseClassBeautifier {
def getCaseAccessors[T: TypeTag] = typeOf[T].members.collect {
case m: MethodSymbol if m.isCaseAccessor => m
}.toList
def nice[T:TypeTag](x: T)(implicit classTag: ClassTag[T]) : String = {
val instance = x.asInstanceOf[T]
val mirror = runtimeMirror(instance.getClass.getClassLoader)
val accessors = getCaseAccessors[T]
var res = List.empty[String]
accessors.foreach { z ⇒
val instanceMirror = mirror.reflect(instance)
val fieldMirror = instanceMirror.reflectField(z.asTerm)
val s = s"${z.name} = ${fieldMirror.get}"
res = s :: res
}
val beautified = x.getClass.getSimpleName + "(" + res.mkString(", ") + ")"
beautified
}
}
This is a shamless copy paste of #F. P Freely, but
I've added an indentation feature
slight modifications so that the output will be of correct Scala style (and will compile for all primative types)
Fixed string literal bug
Added support for java.sql.Timestamp (as I use this with Spark a lot)
Tada!
// Recursively get all the fields; this will grab vals declared in parents of case classes.
def getFields(cls: Class[_]): List[Field] =
Option(cls.getSuperclass).map(getFields).getOrElse(Nil) ++
cls.getDeclaredFields.toList.filterNot(f =>
f.isSynthetic || java.lang.reflect.Modifier.isStatic(f.getModifiers))
// FIXME fix bug where indent seems to increase too much
def prettyfy(a: Any, indentSize: Int = 0): String = {
val indent = List.fill(indentSize)(" ").mkString
val newIndentSize = indentSize + 2
(a match {
// Make Strings look similar to their literal form.
case string: String =>
val conversionMap = Map('\n' -> "\\n", '\r' -> "\\r", '\t' -> "\\t", '\"' -> "\\\"", '\\' -> "\\\\")
string.map(c => conversionMap.getOrElse(c, c)).mkString("\"", "", "\"")
case xs: Seq[_] =>
xs.map(prettyfy(_, newIndentSize)).toString
case xs: Array[_] =>
s"Array(${xs.map(prettyfy(_, newIndentSize)).mkString(", ")})"
case map: Map[_, _] =>
s"Map(\n" + map.map {
case (key, value) => " " + prettyfy(key, newIndentSize) + " -> " + prettyfy(value, newIndentSize)
}.mkString(",\n") + "\n)"
case None => "None"
case Some(x) => "Some(" + prettyfy(x, newIndentSize) + ")"
case timestamp: Timestamp => "new Timestamp(" + timestamp.getTime + "L)"
case p: Product =>
s"${p.productPrefix}(\n${
getFields(p.getClass)
.map { f =>
f.setAccessible(true)
s" ${f.getName} = ${prettyfy(f.get(p), newIndentSize)}"
}
.mkString(",\n")
}\n)"
// General objects and primitives end up here.
case q =>
Option(q).map(_.toString).getOrElse("null")
})
.split("\n", -1).mkString("\n" + indent)
}
E.g.
case class Foo(bar: String, bob: Int)
case class Alice(foo: Foo, opt: Option[String], opt2: Option[String])
scala> prettyPrint(Alice(Foo("hello world", 10), Some("asdf"), None))
res6: String =
Alice(
foo = Foo(
bar = "hello world",
bob = 10
),
opt = Some("asdf"),
opt2 = None
)
If you use Apache Spark, you can use the following method to print your case classes :
def prettyPrint[T <: Product : scala.reflect.runtime.universe.TypeTag](c:T) = {
import play.api.libs.json.Json
println(Json.prettyPrint(Json.parse(Seq(c).toDS().toJSON.head)))
}
This gives a nicely formatted JSON representation of your case class instance. Make sure sparkSession.implicits._ is imported
example:
case class Adress(country:String,city:String,zip:Int,street:String)
case class Person(name:String,age:Int,adress:Adress)
val person = Person("Peter",36,Adress("Switzerland","Zürich",9876,"Bahnhofstrasse 69"))
prettyPrint(person)
gives :
{
"name" : "Peter",
"age" : 36,
"adress" : {
"country" : "Switzerland",
"city" : "Zürich",
"zip" : 9876,
"street" : "Bahnhofstrasse 69"
}
}
I would suggest using the same print that is used in the AssertEquals of your test framework of choice. I was using Scalameta and munit.Assertions.munitPrint(clue: => Any): String does the trick. I can pass nested classes to it and see the whole tree with the proper indentation.

Scala: how to traverse stream/iterator collecting results into several different collections

I'm going through log file that is too big to fit into memory and collecting 2 type of expressions, what is better functional alternative to my iterative snippet below?
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)]={
val lines : Iterator[String] = io.Source.fromFile(file).getLines()
val logins: mutable.Map[String, String] = new mutable.HashMap[String, String]()
val errors: mutable.ListBuffer[(String, String)] = mutable.ListBuffer.empty
for (line <- lines){
line match {
case errorPat(date,ip)=> errors.append((ip,date))
case loginPat(date,user,ip,id) =>logins.put(ip, id)
case _ => ""
}
}
errors.toList.map(line => (logins.getOrElse(line._1,"none") + " " + line._1,line._2))
}
Here is a possible solution:
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String,String)] = {
val lines = Source.fromFile(file).getLines
val (err, log) = lines.collect {
case errorPat(inf, ip) => (Some((ip, inf)), None)
case loginPat(_, _, ip, id) => (None, Some((ip, id)))
}.toList.unzip
val ip2id = log.flatten.toMap
err.collect{ case Some((ip,inf)) => (ip2id.getOrElse(ip,"none") + "" + ip, inf) }
}
Corrections:
1) removed unnecessary types declarations
2) tuple deconstruction instead of ulgy ._1
3) left fold instead of mutable accumulators
4) used more convenient operator-like methods :+ and +
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)] = {
val lines = io.Source.fromFile(file).getLines()
val (logins, errors) =
((Map.empty[String, String], Seq.empty[(String, String)]) /: lines) {
case ((loginsAcc, errorsAcc), next) =>
next match {
case errorPat(date, ip) => (loginsAcc, errorsAcc :+ (ip -> date))
case loginPat(date, user, ip, id) => (loginsAcc + (ip -> id) , errorsAcc)
case _ => (loginsAcc, errorsAcc)
}
}
// more concise equivalent for
// errors.toList.map { case (ip, date) => (logins.getOrElse(ip, "none") + " " + ip) -> date }
for ((ip, date) <- errors.toList)
yield (logins.getOrElse(ip, "none") + " " + ip) -> date
}
I have a few suggestions:
Instead of a pair/tuple, it's often better to use your own class. It gives meaningful names to both the type and its fields, which makes the code much more readable.
Split the code into small parts. In particular, try to decouple pieces of code that don't need to be tied together. This makes your code easier to understand, more robust, less prone to errors and easier to test. In your case it'd be good to separate producing your input (lines of a log file) and consuming it to produce a result. For example, you'd be able to make automatic tests for your function without having to store sample data in a file.
As an example and exercise, I tried to make a solution based on Scalaz iteratees. It's a bit longer (includes some auxiliary code for IteratorEnumerator) and perhaps it's a bit overkill for the task, but perhaps someone will find it helpful.
import java.io._;
import scala.util.matching.Regex
import scalaz._
import scalaz.IterV._
object MyApp extends App {
// A type for the result. Having names keeps things
// clearer and shorter.
type LogResult = List[(String,String)]
// Represents a state of our computation. Not only it
// gives a name to the data, we can also put here
// functions that modify the state. This nicely
// separates what we're computing and how.
sealed case class State(
logins: Map[String,String],
errors: Seq[(String,String)]
) {
def this() = {
this(Map.empty[String,String], Seq.empty[(String,String)])
}
def addError(date: String, ip: String): State =
State(logins, errors :+ (ip -> date));
def addLogin(ip: String, id: String): State =
State(logins + (ip -> id), errors);
// Produce the final result from accumulated data.
def result: LogResult =
for ((ip, date) <- errors.toList)
yield (logins.getOrElse(ip, "none") + " " + ip) -> date
}
// An iteratee that consumes lines of our input. Based
// on the given regular expressions, it produces an
// iteratee that parses the input and uses State to
// compute the result.
def logIteratee(errorPat: Regex, loginPat: Regex):
IterV[String,List[(String,String)]] = {
// Consumes a signle line.
def consume(line: String, state: State): State =
line match {
case errorPat(date, ip) => state.addError(date, ip);
case loginPat(date, user, ip, id) => state.addLogin(ip, id);
case _ => state
}
// The core of the iteratee. Every time we consume a
// line, we update our state. When done, compute the
// final result.
def step(state: State)(s: Input[String]): IterV[String, LogResult] =
s(el = line => Cont(step(consume(line, state))),
empty = Cont(step(state)),
eof = Done(state.result, EOF[String]))
// Return the iterate waiting for its first input.
Cont(step(new State()));
}
// Converts an iterator into an enumerator. This
// should be more likely moved to Scalaz.
// Adapted from scalaz.ExampleIteratee
implicit val IteratorEnumerator = new Enumerator[Iterator] {
#annotation.tailrec def apply[E, A](e: Iterator[E], i: IterV[E, A]): IterV[E, A] = {
val next: Option[(Iterator[E], IterV[E, A])] =
if (e.hasNext) {
val x = e.next();
i.fold(done = (_, _) => None, cont = k => Some((e, k(El(x)))))
} else
None;
next match {
case None => i
case Some((es, is)) => apply(es, is)
}
}
}
// main ---------------------------------------------------
{
// Read a file as an iterator of lines:
// val lines: Iterator[String] =
// io.Source.fromFile("test.log").getLines();
// Create our testing iterator:
val lines: Iterator[String] = Seq(
"Error: 2012/03 1.2.3.4",
"Login: 2012/03 user 1.2.3.4 Joe",
"Error: 2012/03 1.2.3.5",
"Error: 2012/04 1.2.3.4"
).iterator;
// Create an iteratee.
val iter = logIteratee("Error: (\\S+) (\\S+)".r,
"Login: (\\S+) (\\S+) (\\S+) (\\S+)".r);
// Run the the iteratee against the input
// (the enumerator is implicit)
println(iter(lines).run);
}
}