How can I unit test console input in Scala?

How can I unit test console input in Scala with ScalaTest?
Code under Test:
import java.io.InputStream
import scala.io.{BufferedSource, Codec}

object ConsoleAction {
  def readInput(in: InputStream): List[String] = {
    val bs = new BufferedSource(in)(Codec.default)
    val lines = bs.getLines()
    lines.takeWhile(_ != "").toList
  }

  def main(args: Array[String]): Unit = {
    val l = ConsoleAction.readInput(System.in)
    println("--> " + l)
  }
}
I'd like to test the readInput method.
A one line input can be tested like that:
"Result list" should "has 1 element" in {
  val input = "Hello\\n"
  val is = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8))
  assert(ConsoleAction.readInput(is).size === 1)
}
... but what is the way for multiline input?
input line 1
input line 2
Thanks.

Your problem lies with how you're escaping your newline. You're writing "\\n" (a literal backslash followed by n) rather than "\n" (an actual newline character), so getLines() never sees a line break. This test should pass:
"Result list" should "has 2 elements" in {
  val input = "Hello\nWorld\n"
  val is = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8))
  assert(ConsoleAction.readInput(is).size === 2)
}
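For multiline input specifically, a triple-quoted string with stripMargin is often easier to read than a chain of \n escapes. A small self-contained sketch of the same idea (it uses Source.fromInputStream rather than constructing a BufferedSource directly, and the object name is made up for the example):

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets

object MultilineInputSketch {
  // same idea as ConsoleAction.readInput: read lines until the first empty one
  def readInput(is: java.io.InputStream): List[String] =
    scala.io.Source.fromInputStream(is).getLines().takeWhile(_ != "").toList

  def main(args: Array[String]): Unit = {
    // triple quotes plus stripMargin keep multiline test input readable
    val input =
      """input line 1
        |input line 2
        |""".stripMargin
    val is = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8))
    println(readInput(is)) // List(input line 1, input line 2)
  }
}
```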

Related

Nested for loop in Scala not iterating throughout the file?

I have a first file with this data:
A,B,C
B,E,F
C,N,P
And a second file with the data below:
A,B,C,YES
B,C,D,NO
C,D,E,TRUE
D,E,F,FALSE
E,F,G,NO
I need every record in the first file to be paired with every record in the second file, but it only happens for the first record.
Below is the code:
import scala.io.Source.fromFile
object TestComparision {
  def main(args: Array[String]): Unit = {
    val lines = fromFile("C:\\Users\\nreddy26\\Desktop\\Spark\\PRI.txt").getLines
    val lines2 = fromFile("C:\\Users\\nreddy26\\Desktop\\Spark\\LKP.txt").getLines
    var l = 0
    var cnt = 0
    for (line <- lines) {
      for (line2 <- lines2) {
        val cols = line.split(",").map(_.trim)
        println(s"${cols(0)}|${cols(1)}|${cols(2)}")
        val cols2 = line2.split(",").map(_.trim)
        println(s"${cols2(0)}|${cols2(1)}|${cols2(2)}|${cols2(3)}")
      }
    }
  }
}
As rightly suggested by @Luis: getLines returns a one-shot Iterator, so lines2 is exhausted after the first pass of the outer loop. Get the lines in List form by using toList:
val lines = fromFile("C:\\Users\\nreddy26\\Desktop\\Spark\\PRI.txt").getLines.toList
val lines2 = fromFile("C:\\Users\\nreddy26\\Desktop\\Spark\\LKP.txt").getLines.toList
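A runnable sketch of why toList fixes it, with the file contents inlined so no paths are needed (object and method names are made up for the example):

```scala
object NestedLoopSketch {
  // Converting each one-shot iterator (what getLines returns) to a List lets
  // the inner collection be traversed again on every pass of the outer loop.
  def crossCount(file1: Iterator[String], file2: Iterator[String]): Int = {
    val lines = file1.toList
    val lines2 = file2.toList
    var cnt = 0
    for (line <- lines; line2 <- lines2) {
      println(s"$line | $line2")
      cnt += 1
    }
    cnt
  }

  def main(args: Array[String]): Unit = {
    // in-memory stand-ins for PRI.txt and LKP.txt
    val n = crossCount(
      Iterator("A,B,C", "B,E,F", "C,N,P"),
      Iterator("A,B,C,YES", "B,C,D,NO", "C,D,E,TRUE", "D,E,F,FALSE", "E,F,G,NO"))
    println(n) // 3 records x 5 records = 15 pairs
  }
}
```

Without the toList calls, the inner loop body would run only for the first outer record, which is exactly the symptom in the question.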

Foreach not able to print the list elements

I am working on a Spark project using Scala. I need to print each element of a list named 'c' along with a variable. I am using JDoodle right now to run this small piece of code, and I am getting a "value foreach is not a member of Any" error; the error message points to the foreach in the print statement.
object Graph {
  def main(args: Array[String]) {
    val line = "1,2,3,4,5,6"
    val a = line.split(",")
    val b = Seq(a(0), a(0), a.drop(1).toList)
    val c = b(2)
    print(Seq((b(0), b(1)), (c.foreach { x => print(s"($x,$b(1))") })))
  }
}
I want the result to be a sequence like this [(1,1)(2,1)(3,1)(4,1)(5,1)(6,1)]
val data = "1,2,3,4,5,6".split(",")
// safe even if data is an empty Array()
val res = data.foldRight(Seq.empty[(String, String)]) { case (n, arr) =>
  (n, data.head) +: arr
}
res.foreach(print) // (1,1)(2,1)(3,1)(4,1)(5,1)(6,1)
(Suspiciously similar to this mangled question.)
val line = "1,2,3,4,5,6"
// note: line.head is the first Char ('1'), which only works here because
// the first token happens to be a single character
val lineOut = line.split(",").toList.map(f => (f, line.head))

What is Spark execution order with function calls in scala?

I have a spark program as follows:
object A {
  var id_set: Set[String] = _

  def init(argv: Array[String]) = {
    val args = new AArgs(argv)
    id_set = args.ids.split(",").toSet
  }

  def main(argv: Array[String]) {
    init(argv)
    val conf = new SparkConf().setAppName("some.name")
    val rdd1 = getRDD(paras)
    val rdd2 = getRDD(paras)
    //......
  }

  def getRDD(paras) = {
    //function details
    getRDDDtails(paras)
  }

  def getRDDDtails(paras) = {
    //val id_given = id_set
    id_set.foreach(println) // works fine here, not empty
    someRDD.filter { x =>
      val someSet = x.getOrElse(...)
      //id_set.foreach(println) ------wrong, id_set is just an empty set here
      (someSet & id_set).size > 0
    }
  }
}

class AArgs(args: Array[String]) extends Serializable {
  //parse args
}
I have a global variable id_set. At first it is just an empty set. In the main function I call init, which sets id_set to a non-empty set from args. After that I call the getRDD function, which calls getRDDDtails. In getRDDDtails I filter an RDD based on the contents of id_set. However, the result seems to be empty. I tried to print id_set in the executor, and it is just an empty line. So the problem seems to be that id_set is not properly initialized (in the init function). However, when I print id_set in the driver (in the first lines of getRDDDtails), it works fine and is not empty.
So I tried adding val id_given = id_set in getRDDDtails and using id_given later. This seems to fix the problem. But I'm totally confused about why this happens. What is the execution order of Spark programs? Why does my solution work?
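A plausible explanation: a Spark closure serializes the values it captures, but a reference to a field of a top-level object is not a captured value. On the executor it resolves against that JVM's own copy of the singleton, where init was never called, so the field is still empty. Copying the field into a local val first makes the closure capture the value itself. Observing the difference needs a real cluster, so the sketch below (with made-up names, running locally where both variants behave the same) only shows the shape of the fix:

```scala
object ClosureCaptureSketch {
  var idSet: Set[String] = Set.empty // like id_set, filled in by init/main

  def filterIds(records: Seq[String]): Seq[String] = {
    val idGiven = idSet // capture the value itself, not the object field
    records.filter(x => idGiven.contains(x))
  }

  def main(args: Array[String]): Unit = {
    idSet = Set("a", "b")
    println(filterIds(Seq("a", "c", "b"))) // List(a, b)
  }
}
```

In a Spark filter, `idGiven` would be serialized into the task and shipped with its value, while a bare `idSet` reference would be re-resolved on the executor.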

Can spark-submit pass named arguments?

I know I can pass arguments to the main function with
spark-submit com.xxx.test 1 2
and get argument by:
def main(args: Array[String]): Unit = {
  // read the arguments
  var city = args(0)
  var num = args(1)
but I want to know whether there is a way to pass named arguments, like:
spark-submit com.xxx.test --citys=1 --num=2
and how do I read such named arguments in main.scala?
You can write your own custom class which parses the input arguments based on the key, something like below:
object CommandLineUtil {
  // `log` is assumed to be a logger already available in scope
  def getOpts(args: Array[String], usage: String): collection.mutable.Map[String, String] = {
    if (args.length == 0) {
      log.warn(usage)
      System.exit(1)
    }
    val (opts, vals) = args.partition(_.startsWith("-"))
    val optsMap = collection.mutable.Map[String, String]()
    opts.foreach { x =>
      val pair = x.split("=")
      if (pair.length == 2) {
        // strip the leading "-" or "--" from the key
        optsMap += (pair(0).split("-{1,2}")(1) -> pair(1))
      } else {
        log.warn(usage)
        System.exit(1)
      }
    }
    optsMap
  }
}
Then you can use the method within your Spark application:
val usage = "Usage: [--citys] [--num]"
val optsMap = CommandLineUtil.getOpts(args, usage)
val citysValue = optsMap("citys")
val numValue = optsMap("num")
You can adapt CommandLineUtil as per your requirements.
No.
As you can read in the documentation, spark-submit just passes your application's arguments through, and you handle them yourself.
So, if you want "named arguments", you have to implement that in your own code (i.e. it will be custom).

Searching for Terms and Printing lines in a Text File

The file name is searcher.scala and I want to be able to type:
scala searcher.scala "term I want to find" "file I want to search through" "new file with new lines"
I tried this code, but it keeps saying I have an empty iterator:
import java.io.PrintWriter

val searchTerm = args(0)
val input = args(1)
val output = args(2)
val out = new PrintWriter(output)
val listLines = scala.io.Source.fromFile(input).getLines
for (line <- listLines) {
  out.println("Line: " + line)
  def term(x: String): Boolean = { x == searchTerm }
  val newList = listLines.filter(term)
  println(listLines.filter(term))
}
out.close
You have the iterator listLines and you read it several times, but an iterator is a one-time object:
for (line <- listLines)
val newList = listLines.filter(term)
println(listLines.filter(term))
You need to revise your code to avoid reusing the iterator, for example by materializing it with toList first.
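One possible rework (a sketch, keeping the original argument order; the object name is made up): read the lines once into a List, then reuse that List for both the printing and the filtering.

```scala
import java.io.PrintWriter

object Searcher {
  def run(searchTerm: String, input: String, output: String): List[String] = {
    val out = new PrintWriter(output)
    // toList materializes the one-shot iterator so it can be traversed twice
    val listLines = scala.io.Source.fromFile(input).getLines().toList
    listLines.foreach(line => out.println("Line: " + line))
    out.close()
    listLines.filter(_ == searchTerm) // filter once, outside the loop
  }

  def main(args: Array[String]): Unit = {
    val matches = run(args(0), args(1), args(2))
    matches.foreach(println)
  }
}
```

With this shape the script writes every line to the output file and the filter no longer competes with the for loop for the same iterator.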