Good practice for error messages when checking multiple parameters - scala

My problem:
I have a function with two file paths that are both required to start with '/'.
I struggled with finding a simple solution to tell the caller of the function precisely what went wrong if the requirement failed.
For example, if only one of the paths is wrong, should I say the first file parameter was wrong or the parameter with name aFile? It was more complicated than I thought.
My solution:
After failing to report only the incorrect paths in a simple and clear manner, I settled on the following solution, which reports all the paths. It is simple, gives clear information, is easy to adapt to more paths, and is shorter than an if-else statement:
/** ...
* @param i Some number...
* @param aFile Path of AData file to use (e.g. "/dir/a.csv")
* @param anyPath Random text...
* @param bFile Path of BData file to use (e.g. "/bData.csv")
*/
def f(i: Int, aFile: String, anyPath: String, bFile: String): Unit = {
val s = List(aFile, bFile)
require(!s.exists(!_.startsWith("/")),
"Paths aFile and bFile must start with '/', but List(aFile, bFile) was: " + s)
}
val a = "/adf"
val b = "asdf"
f(1, a, "eee", b)
// IllegalArgumentException: requirement failed:
// Paths aFile and bFile must start with '/', but List(aFile, bFile) was: List(/adf, asdf)
Are there suggestions or a good practice to handle such a case better?
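One possible variation, as a rough sketch rather than an established practice: pair each parameter name with its value and report only the offending pairs. The message wording and the tuple-based structure are assumptions made for illustration; the signature is the one from the question.

```scala
// Sketch: report only the parameters that violate the rule, by name.
def f(i: Int, aFile: String, anyPath: String, bFile: String): Unit = {
  // Pair each checked parameter name with its value.
  val paths = List("aFile" -> aFile, "bFile" -> bFile)
  // Keep only the pairs whose value does not start with '/'.
  val invalid = paths.filterNot { case (_, path) => path.startsWith("/") }
  require(invalid.isEmpty,
    "Paths must start with '/', but got: " +
      invalid.map { case (name, path) => s"$name = '$path'" }.mkString(", "))
}
```

Adding another path to check then only means adding one more pair to the list.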

Related

%3d instead of = in file path when I try to open a file from resources

I am writing some tests, and to get an absolute path from a relative path I use this function:
private def getAbsolutePath(filePath: String): String = {
getClass.getResource(filePath).getFile
}
and then I do:
println(getAbsolutePath("/parquetIncrementalProcessor/withPartitioning/"))
println(getAbsolutePath("/parquetIncrementalProcessor/withPartitioning/own_loading_id=1/partition_column=test/"))
I get:
/Users/19658296/csp-fp-snaphot/library/target/scala-2.11/test-classes/parquetIncrementalProcessor/withPartitioning/
/Users/19658296/csp-fp-snaphot/library/target/scala-2.11/test-classes/parquetIncrementalProcessor/withPartitioning/own_loading_id%3d1/partition_column%3dtest/
As you can see, instead of =, I get %3d. When I try to read these files with Spark, it can read the path without %3d, but with %3d it fails with the error "Path does not exist".
How can I fix this?
It looks like the path is URL-encoded, probably because getResource returns a java.net.URL (Uniform Resource Locator) and getFile returns its file part without decoding it. You can URL-decode it like so:
import java.net.URLDecoder
def getAbsolutePath(filePath: String): String = {
val path = getClass.getResource(filePath).getFile
URLDecoder.decode(path, "UTF-8")
}
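As a possible alternative to URLDecoder, here is a minimal sketch that goes through java.net.URI, whose getPath returns the decoded path. URLDecoder implements HTML form decoding, so it also turns a literal + into a space, which URI.getPath does not. The helper name decodedPath and the sample URLs are assumptions for illustration only.

```scala
import java.net.{URI, URL, URLDecoder}

// Hypothetical helper: decode the path of a resource URL via URI.getPath.
def decodedPath(url: URL): String = url.toURI.getPath

// Sample URL with a percent-encoded '=' (made-up path).
val partitionUrl: URL = new URI("file:/tmp/withPartitioning/own_loading_id%3d1/").toURL

val viaUri = decodedPath(partitionUrl)                             // decoded by URI
val viaDecoder = URLDecoder.decode(partitionUrl.getFile, "UTF-8")  // decoded by URLDecoder

// A literal '+' is where the two approaches differ.
val plusUrl: URL = new URI("file:/tmp/a+b.txt").toURL
val plusViaUri = decodedPath(plusUrl)
val plusViaDecoder = URLDecoder.decode(plusUrl.getFile, "UTF-8")
```

If a platform-native path is needed rather than a string, java.nio.file.Paths.get(url.toURI) is another option.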

Scala - Get files based on name pattern

I would like to filter files based on patterns such as:
- Team_*.txt (e.g.: Team_Orlando.txt);
- Name.*.City.txt (e.g.: Name.Robert.California.txt);
Or any name (the pattern *.*, shown here as * . * with spaces only because the formatting broke my text).
All the filters come from a database table and they are dynamic.
I'm trying to avoid using OS commands like cp or mv. Is it possible to filter files using patterns like the ones above?
Here is what I've tried, but I got a regex error:
def getFiles(dir:File, filter:String) = {
(dir.isDirectory, dir.exists) match {
case (true, true) =>
dir.listFiles.filter(f => f.getName.matches(filter))
case _ =>
Array[File]()
}
}
You can use java.nio's Files.newDirectoryStream() for that; it accepts a pattern in the desired (glob) format:
val stream = Files.newDirectoryStream(dir, pattern)
See http://docs.oracle.com/javase/tutorial/essential/io/dirs.html#glob for a detailed description of the glob syntax.
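To make the suggestion concrete, here is a minimal sketch. Note that String.matches expects a regular expression, not a glob, which is why a pattern such as *.* fails there. The overload used below takes a java.nio.file.Path and a glob string, so a java.io.File would have to be converted with toPath first. The helper name listByGlob is an assumption for illustration.

```scala
import java.nio.file.{Files, Path}
import scala.collection.mutable.ListBuffer

// Hypothetical helper: list the entries of `dir` whose file name matches `glob`.
def listByGlob(dir: Path, glob: String): List[Path] = {
  if (!Files.isDirectory(dir)) Nil
  else {
    val stream = Files.newDirectoryStream(dir, glob)
    try {
      // Copy the entries out before the stream is closed.
      val found = ListBuffer.empty[Path]
      val it = stream.iterator()
      while (it.hasNext) found += it.next()
      found.toList
    } finally stream.close()
  }
}
```

A call would then look like listByGlob(dir.toPath, "Team_*.txt"), with the glob string taken from the database table.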

Compute file content hash with Scala

In our app, we need to compute a file hash so that we can later check whether the file was updated.
The way I am doing it right now is with this little method:
protected[services] def computeMigrationHash(toVersion: Int): String = {
val migrationClassName = MigrationClassNameFormat.format(toVersion, toVersion)
val migrationClass = Class.forName(migrationClassName)
val fileName = migrationClass.getName.replace('.', '/') + ".class"
val resource = getClass.getClassLoader.getResource(fileName)
logger.debug("Migration file - " + resource.getFile)
val file = new File(resource.getFile)
val hc = Files.hash(file, Hashing.md5())
logger.debug("Calculated migration file hash - " + hc.toString)
hc.toString
}
It all works perfectly until the code gets deployed into a different environment and the file is located at a different absolute path. I guess the hashing takes the path into account as well.
What is the best way to calculate some sort of reliable hash of a file's content that will produce the same result for as long as the content of the file stays the same?
Thanks,
Having perused the source code (https://github.com/google/guava/blob/master/guava/src/com/google/common/io/Files.java), I can see that only the file contents are hashed; the path does not come into play.
public static HashCode hash(File file, HashFunction hashFunction) throws IOException {
return asByteSource(file).hash(hashFunction);
}
Therefore you need not worry about the location of the file. As for why you end up with a different hash on a different file system, you should compare the size and contents of the two files to make sure that, for example, no line endings were converted (e.g. LF to CRLF).
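The same point can be checked without Guava. The following is a minimal sketch that swaps in the JDK's java.security.MessageDigest instead of Files.hash; the helper name md5Hex is an assumption for illustration. Because only the bytes are fed to the digest, two files with identical content give the same result wherever they live, while a single changed line ending gives a different one.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest

// Hypothetical helper: MD5 of the file's bytes only; the path is never hashed.
def md5Hex(file: Path): String = {
  val digest = MessageDigest.getInstance("MD5").digest(Files.readAllBytes(file))
  // Render each byte as two lowercase hex digits.
  digest.map(b => "%02x".format(b & 0xff)).mkString
}
```

Comparing Files.size on both environments is a cheap first check before comparing hashes.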

Counting lines of a file in Scala

I am learning Scala, and this is my code snippet to count the number of lines in a text file.
//returns line number of a file
def getLineNumber(fileName: String): Integer = {
val src = io.Source.fromFile(fileName)
try {
src.getLines.size
} catch {
case error: FileNotFoundException => -1
case error: Exception => -1
}
finally {
src.close()
}
}
I am using Source.fromFile method as explained in Programming in Scala book. Here is the problem: If my text file is like this:
baris
ayse
deneme
I get the correct result 6. If I press Enter after the word deneme I still get 6, but I expect 7 in this case. If I press Space after pressing Enter I get 7, which is correct again. Is this a bug in the Scala standard library or, more likely, am I missing something?
Finally, here is my basic main method, in case it helps:
def main(args: Array[String]): Unit = {
println(getLineNumber("C:\\Users\\baris\\Desktop\\bar.txt"))
}
It uses java.io.BufferedReader's readLine. Here is the source of that method:
/**
* Reads a line of text. A line is considered to be terminated by any one
* of a line feed ('\n'), a carriage return ('\r'), or a carriage return
* followed immediately by a linefeed.
*
* @return A String containing the contents of the line, not including
* any line-termination characters, or null if the end of the
* stream has been reached
*
* @exception IOException If an I/O error occurs
*
* @see java.nio.file.Files#readAllLines
*/
public String readLine() throws IOException {
return readLine(false);
}
Which calls this:
...
* @param ignoreLF If true, the next '\n' will be skipped
...
String readLine(boolean ignoreLF) ...
...
/* Skip a leftover '\n', if necessary */
if (omitLF && (cb[nextChar] == '\n'))
nextChar++;
skipLF = false;
omitLF = false;
So basically that's how it's implemented. I guess it depends on what a line means to you. Are you counting lines that contain something, or newline characters? Those are obviously different things.
If you press Enter after the word deneme, you simply add an end-of-line sequence (CR+LF, in your case) to the 6th line. The cursor moves to a new line, but you have not created a new line; you have only marked the end of the sixth line. To create a new line you have to put a character after the end-of-line sequence, as you do when you press Space.
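The behaviour described above can be reproduced directly with BufferedReader.readLine on in-memory strings. This is a minimal sketch; the helper name countLines and the sample strings are assumptions for illustration.

```scala
import java.io.{BufferedReader, StringReader}

// Hypothetical helper: count lines the way BufferedReader.readLine sees them.
def countLines(text: String): Int = {
  val reader = new BufferedReader(new StringReader(text))
  // readLine returns null once the end of the input has been reached.
  try Iterator.continually(reader.readLine()).takeWhile(_ != null).size
  finally reader.close()
}

val noTrailingEol   = countLines("baris\r\nayse\r\ndeneme")      // last line has no terminator
val withTrailingEol = countLines("baris\r\nayse\r\ndeneme\r\n")  // terminator ends the last line
val withSpace       = countLines("baris\r\nayse\r\ndeneme\r\n ") // a character starts a new line
```

The first two calls give the same count, and only the third gives one more. Counting line terminators instead, for example with text.count(_ == '\n'), would separate the first two cases.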

How to further improve error messages in Scala parser-combinator based parsers?

I've coded a parser based on Scala parser combinators:
class SxmlParser extends RegexParsers with ImplicitConversions with PackratParsers {
[...]
lazy val document: PackratParser[AstNodeDocument] =
((procinst | element | comment | cdata | whitespace | text)*) ^^ {
AstNodeDocument(_)
}
[...]
}
object SxmlParser {
def parse(text: String): AstNodeDocument = {
var ast = AstNodeDocument()
val parser = new SxmlParser()
val result = parser.parseAll(parser.document, new CharArrayReader(text.toArray))
result match {
case parser.Success(x, _) => ast = x
case parser.NoSuccess(err, next) => {
tool.die("failed to parse SXML input " +
"(line " + next.pos.line + ", column " + next.pos.column + "):\n" +
err + "\n" +
next.pos.longString)
}
}
ast
}
}
Usually the resulting parsing error messages are rather nice. But sometimes it becomes just
sxml: ERROR: failed to parse SXML input (line 32, column 1):
`"' expected but `' found
^
This happens if a quote character is not closed and the parser reaches the EOT. What I would like to see here is (1) which production the parser was in when it expected the '"' (I have multiple ones) and (2) where in the input this production started parsing (which indicates where the opening quote is in the input). Does anybody know how I can improve the error messages and include more information about the actual internal parsing state when the error happens (perhaps something like a production-rule stack trace, or whatever can reasonably be given here to better identify the error location)? BTW, the above "line 32, column 1" is actually the EOT position and hence of no use here, of course.
I don't know yet how to deal with (1), but I was also looking for (2) when I found this webpage:
https://wiki.scala-lang.org/plugins/viewsource/viewpagesrc.action?pageId=917624
I'm just copying the information:
A useful enhancement is to record the input position (line number and column number) of the significant tokens. To do this, you must do three things:
Make each output type extend scala.util.parsing.input.Positional
Invoke the Parsers.positioned() combinator
Use a text source that records line and column positions
and
Finally, ensure that the source tracks positions. For streams, you can simply use scala.util.parsing.input.StreamReader; for Strings, use scala.util.parsing.input.CharArrayReader.
I'm currently playing with it so I'll try to add a simple example later
In such cases you may use err, failure and ~! with production rules designed specifically to match the error.