I have written some code in Scala and created a method inside an object. Now I want to call this method from another program, but I'm not getting any result after calling it.
First Program
object helper_class {
  def driver {
    def main(args: Array[String]) {
      val sparksession = SparkSession.builder().appName("app").enableHiveSupport().getOrCreate();
      val filepath: String = args(0)
      val d1 = spark.sql(s"load data inpath '${args(0)}' into table databasename.tablename")
      // some more reusable code
    }
  }
}
Second Program
import Packagename.helper_class.driver

object child_program {
  def main(args: Array[String]) {
    driver // I want to call this method from helper_class
  }
}
If I remove def main(args: Array[String]) from the first program, it gives an error near args(0) saying args(0) is not found. I am planning to pass args(0) via spark-submit. Can someone please help me with how I should implement this?
The main method is the entry point for running a program, and it can't be wrapped inside another method.
I think what you are trying to do is load some Spark setup code and then use that "driver" to do something else. This is my best guess at how that could work:
object ChildProgram {
  def main(args: Array[String]): Unit = {
    val driver = Helper.driver(args)
    // do something with the driver
  }
}
object Helper {
  def driver(args: Array[String]): SparkSession = {
    val sparksession = SparkSession.builder().appName("app").enableHiveSupport().getOrCreate()
    val filepath: String = args(0)
    val d1 = sparksession.sql(s"load data inpath '$filepath' into table databasename.tablename")
    // some more reusable code
    sparksession // I am assuming you are going to return the driver (the session) here
  }
}
That said, I would highly recommend reading a bit about Scala before attempting to go down that route, because you are likely to face even more obstacles. If I were to recommend one resource, try the excellent book "Scala for the Impatient", which should get you up and running very quickly.
I am trying to parse JSON and extract elements into a case class. I am just curious why the code runs one way and not the other.
This code works
import org.json4s._
import org.json4s.DefaultFormats
import org.json4s.jackson.JsonMethods.parse

object JsonCase {
  def main(args: Array[String]): Unit = {
    implicit val formats = DefaultFormats
    val input = """{"InputDB: "XYZ"}"""
    case class config(stagingDB: String)
    val spec = parse(input).extract[config]
    println(spec.stagingDB)
  }
}
Why does the code below not work?
import org.json4s._
import org.json4s.DefaultFormats
import org.json4s.jackson.JsonMethods.parse
implicit val formats = DefaultFormats
val input = """{"stagingDB": "XYZ"}"""
case class config(stagingDB: String)
val spec = parse(input).extract[config]
println(spec.stagingDB)
I find the opposite to be true. The second block of code works, but the first fails because the quoting in input is wrong: there is no closing " for InputDB, so it is not valid JSON.
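Note that even with the quote restored, the first example would still fail: the JSON key InputDB does not match the case class field stagingDB, so extract would throw a MappingException. A minimal corrected sketch (JsonCaseFixed is a hypothetical name, and it is an assumption that the key was meant to match the field):

import org.json4s._
import org.json4s.jackson.JsonMethods.parse

object JsonCaseFixed {
  implicit val formats: Formats = DefaultFormats
  case class Config(stagingDB: String)

  def main(args: Array[String]): Unit = {
    // Well-formed JSON whose key matches the case class field
    val input = """{"stagingDB": "XYZ"}"""
    println(parse(input).extract[Config].stagingDB) // prints XYZ
  }
}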
More generally, when comparing two blocks of code you should remove as much of the shared code as possible. Here, config, input, and formats should live outside the object and be shared between both examples, so that you focus on the differences between the snippets rather than their similarities.
Everything needs to be defined at least at object/class level in a compiled program; top-level statements like those in the second block only run in a REPL or worksheet.
Below is, I think, what you want:
object JsonCase {
  // Declared at object level, so it applies to the whole object
  implicit val formats = DefaultFormats

  def main(args: Array[String]): Unit = {
    // do stuff
  }
}
I am trying to execute sample code on Databricks in Scala. It is an object.
object Main {
  def main(args: Array[String]) {
    val res = for (a <- args) yield a.toUpperCase
    println("Arguments: " + res.toString)
  }
}
When I run it on Databricks, it says 'defined object Main'. I am not sure how to execute it now, or what code will execute it. Please help.
What you are working with is essentially a Scala REPL, so the main function has no special significance there. That said, you can run your function as follows:
object Main {
  def main(args: Array[String]) {
    val res = for (a <- args) yield a.toUpperCase
    println(res)
    println("Arguments: " + res.toString)
  }
}

Main.main(Array("123", "23123"))
As is, you can call the Main object's main method.
You can call the main method in the Main object as follows:
val args: Array[String] = Array("test1", "test2", "test3")
Main.main(args)
What you have in your main method won't print what you expect, which I assume is the values contained in the res array. To accomplish that you would need to change it to something like the following:
object Main {
  def main(args: Array[String]): Unit = {
    val res = for (a <- args) yield a.toUpperCase
    println("Arguments: " + res.mkString(" "))
  }
}
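Called the same way as before, this prints the upper-cased arguments:

Main.main(Array("test1", "test2", "test3"))
// prints: Arguments: TEST1 TEST2 TEST3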
I have a method in my Spark application that loads data from a MySQL database. The method looks something like this:
trait DataManager {
  val session: SparkSession

  def loadFromDatabase(input: Input): DataFrame = {
    session.read.jdbc(input.jdbcUrl, s"(${input.selectQuery}) T0",
      input.columnName, 0L, input.maxId, input.parallelism, input.connectionProperties)
  }
}
The method does nothing other than execute the jdbc call and load data from the database. How can I test it? The standard approach would be to create a mock of session, which is an instance of SparkSession, but since SparkSession has a private constructor I was not able to mock it using ScalaMock.

The main question here is how to unit test this function, whose only job is a side effect (pulling data from a relational database), given the trouble I have mocking SparkSession.

So is there any way to mock SparkSession, or a better way than mocking to test this method?
In your case I would recommend not mocking SparkSession. That would more or less mock the entire function (which you could do anyway). If you want to test this function, my suggestion is to run an embedded database (like H2) and use a real SparkSession. To do this you need to provide the SparkSession to your DataManager.
Untested sketch:
Your code:
class DataManager(session: SparkSession) {
  def loadFromDatabase(input: Input): DataFrame = {
    session.read.jdbc(input.jdbcUrl, s"(${input.selectQuery}) T0",
      input.columnName, 0L, input.maxId, input.parallelism, input.connectionProperties)
  }
}
Your test-case:
class DataManagerTest extends FunSuite with BeforeAndAfterAll {

  override def beforeAll(): Unit = {
    val conn = DriverManager.getConnection("jdbc:h2:~/test", "sa", "")
    // your insert statements go here
    conn.close()
  }

  test("should load data from database") {
    val dm = new DataManager(SparkSession.builder().master("local").getOrCreate())
    val input = Input(jdbcUrl = "jdbc:h2:~/test", selectQuery = "SELECT whateveryouneed FROM whereeveryouputit")
    val actualData = dm.loadFromDatabase(input)
    assert(actualData.count() > 0) // compare against the rows you inserted above
  }
}
You can use mockito-scala to mock SparkSession, as shown in this article.
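For reference, a minimal untested sketch of that approach, assuming the mockito-scala dependency is on the test classpath (the class and variable names here are made up; the idea is to stub session.read so the jdbc call can be intercepted):

import org.mockito.{ArgumentMatchersSugar, MockitoSugar}
import org.apache.spark.sql.{DataFrame, DataFrameReader, SparkSession}
import org.scalatest.FunSuite

class DataManagerMockTest extends FunSuite with MockitoSugar with ArgumentMatchersSugar {
  test("loadFromDatabase delegates to session.read.jdbc") {
    val df = mock[DataFrame]
    val reader = mock[DataFrameReader]
    val session = mock[SparkSession]
    when(session.read).thenReturn(reader)
    when(reader.jdbc(any[String], any[String], any[String], any[Long], any[Long], any[Int], any[java.util.Properties]))
      .thenReturn(df)

    // wire the mocked session into your DataManager and assert that
    // loadFromDatabase returns the stubbed DataFrame
  }
}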
I have a Spark program as follows:
object A {
  var id_set: Set[String] = _

  def init(argv: Array[String]) = {
    val args = new AArgs(argv)
    id_set = args.ids.split(",").toSet
  }

  def main(argv: Array[String]) {
    init(argv)
    val conf = new SparkConf().setAppName("some.name")
    val rdd1 = getRDD(paras)
    val rdd2 = getRDD(paras)
    // ......
  }

  def getRDD(paras) = {
    // function details
    getRDDDtails(paras)
  }

  def getRDDDtails(paras) = {
    // val id_given = id_set
    id_set.foreach(println) // works fine here: not empty
    someRDD.filter { x =>
      val someSet = x.getOrElse(...)
      // id_set.foreach(println) ------ wrong: here id_set is just an empty set
      (someSet & id_set).size > 0
    }
  }
}

class AArgs(args: Array[String]) extends Serializable {
  // parse args
}
I have a global variable id_set. At first it is just an empty set. In main, I call init, which sets id_set to a non-empty set from args. After that, I call getRDD, which calls getRDDDtails. In getRDDDtails, I filter an RDD based on the contents of id_set. However, the result seems to be empty. I tried to print id_set in the executor, and it is just an empty line. So the problem seems to be that id_set is not properly initialized (in the init function). However, when I print id_set in the driver (in the first lines of getRDDDtails), it works fine and is not empty.

So I tried adding val id_given = id_set in getRDDDtails and using id_given later. This seems to fix the problem. But I'm totally confused why this should happen. What is the execution order of Spark programs? Why does my solution work?
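What you are seeing is most likely closure serialization at work. A Scala object compiles to a static singleton, and static state is not serialized into the task closure: each executor JVM initializes object A from scratch, where init has never run, so id_set still holds its default value there. Copying the field into a local val puts the value itself inside the closure that Spark serializes and ships to the executors, which is why your workaround helps. A minimal self-contained sketch of the pattern (CaptureDemo is hypothetical, not your code):

import org.apache.spark.rdd.RDD

object CaptureDemo {
  var idSet: Set[String] = Set.empty

  def filterWithCapture(rdd: RDD[Set[String]]): RDD[Set[String]] = {
    // Read the field once on the driver: the local val is captured by value
    // in the closure and shipped to the executors along with the task.
    val captured = idSet
    rdd.filter(x => (x & captured).nonEmpty)
  }
}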
I am trying to write a test that redirects the stdout of a main method, but it seems that once I call main, it starts on another thread and I cannot capture the output. Here is the code:
This works:
val baos = new ByteArrayOutputStream
val ps = new PrintStream(baos)
System.setOut(ps)
print("123")
Assert.assertEquals("123", baos.toString)
This does not:
val baos = new ByteArrayOutputStream
val ps = new PrintStream(baos)
System.setOut(ps)
GameRunner.main(_)
Assert.assertEquals("123", baos.toString)
....
object GameRunner {
  def main(args: Array[String]) {
    print("123")
  }
}
How can I catch the call to print in my test?
*I have also tried scala.Console.setOut
EDIT
I do notice that running GameRunner.main(_) does not even print anything to the console when I am not redirecting. What is causing this?
print is really Predef.print, which calls Console.print. Even though you call System.setOut, I don't know whether that affects Console.print. Try calling Console.setOut, or try:
Console.withOut(ps)(GameRunner.main(null))
The other possibility is that by calling GameRunner.main(_) you are not executing anything, as it may just be returning the function (args: Array[String]) => GameRunner.main(args). That should be quick to rule out.
Edit: yep.
scala> object A { def main(args: Array[String]) { println("1") } }
defined module A
scala> A.main(null)
1
scala> A.main(_)
res1: Array[String] => Unit = <function1>
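Putting both fixes together, the test would look something like this (a sketch; Array.empty[String] stands in for whatever arguments GameRunner actually expects):

val baos = new ByteArrayOutputStream
val ps = new PrintStream(baos)
// Call main with an argument list, not (_), so it actually runs,
// and redirect Console's out, which is what Predef.print writes to.
Console.withOut(ps) {
  GameRunner.main(Array.empty[String])
}
Assert.assertEquals("123", baos.toString)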