How to create an Apache Beam pipeline in the Scala programming language?

I am trying to use Apache Beam from Scala. To start, I took a simple task: loading a file from my local (Windows) machine and computing a word count.
This is what I wrote:
object ReadFromFile {
def main(args: Array[String]): Unit = {
PipelineOptionsFactory.register(classOf[MyOptions])
val options = PipelineOptionsFactory.fromArgs(args: _*).withValidation().as(classOf[MyOptions])
val pipeline = Pipeline.create(options)
pipeline.apply("ReadFiles", TextIO.read().from(options.getInputFile))
.apply(ParDo.of(new ExtractWords))
.apply(Count.perElement())
.apply(MapElements.via(new FormatResult))
.apply("WriteWords", TextIO.write().to(options.getOutput))
pipeline.run().waitUntilFinish()
}
}
This is the rest of my setup. Following the Beam documentation, I created an interface with getter and setter methods for the input file and the output file.
trait MyOptions extends PipelineOptions{
@Description("Path of the file to read from")
@Default.String("path_of_input_file.txt")
def getInputFile: String
def setInputFile(path: String)
@Description("Path of the file to write to")
@Required
def getOutput: String
def setOutput(path: String)
}
These are the other user-defined functions that load, split, and count the records:
class ExtractWords extends DoFn[String, String] {
@ProcessElement
def processElement(c: ProcessContext): Unit = {
for (word <- c.element().split(",")) yield {
if (word.nonEmpty) c.output(word)
}
}
}
class FormatResult extends SimpleFunction[KV[String, java.lang.Long], String] {
override def apply(input: KV[String, java.lang.Long]): String = {
input.getKey + ": " + input.getValue
}
}
class CountWords extends PTransform[PCollection[String], PCollection[KV[String, java.lang.Long]]] {
override def expand(input: PCollection[String]): PCollection[KV[String, lang.Long]] = {
input.apply(ParDo.of(new ExtractWords)) //Ignore IntelliJ error: "Cannot resolve apply". The code will compile.
.apply(Count.perElement())
}
}
However, I get a compilation error in my main class on the apply call that computes the count: .apply(Count.perElement())
which says:
Cannot resolve overloaded method 'apply'
This is my project structure and the same error could be seen in the image:
This is my first time learning Apache Beam. Could anyone tell me what mistake I made here and how I can fix the error?

Related

Scala/Play: Test class: reading a custom configuration file from the /conf folder

Using Play 2.7. In a test class, I want to test a service that reads the configuration (properties) file conf/fun.conf.
I could not find any example of how to do this. I have looked at the Play documentation and at questions here, but found nothing.
The service I wrote gets Configuration through injection:
class MyFunService @Inject() (config: Configuration) extends FunService {
override def getValue(key: String) = config.get[String](key)
}
I want to call this service from a test:
class FunSpec(implicit ee: ExecutionEnv) extends Specification {
sequential
"The FunService" should {
"retrieve a value" in new WithApplication() {
//Oops! Does not actually read the conf file...
val env = play.api.Environment.simple()
val config = play.api.Configuration.load(env)
//...so getting an Exception here.
val f = Future(new MyFunService(config).getValue("fun_stuff"))
f must beEqualTo("having_fun").await(retries = 0, timeout = 5.seconds)
}
}
}
I'm using this code to read test resources
import com.google.common.base.Charsets.UTF_8
import com.google.common.io.Resources
object ResourceFileSupport {
def read(file: String) =
Resources.toString(
Option( this.getClass.getClassLoader.getResource(file) ).getOrElse {
throw new IllegalArgumentException(s"resource $file not found in classpath.")
}, UTF_8)
}

DSL Like Syntax in Scala

I'm trying to come up with a CSV Parser that can be called like this:
parser parse "/path/to/csv/file" using parserConfiguration
Here the parser is a class parameterized by the target case class into which the CSV file will be parsed:
class CSVParser[A] {
def parse(path: String) = Source.fromFile(path).getLines().mkString("\n")
def using(cfg: ParserConfig) = ??? // How do I chain this optionally?
}
val parser = new CSVParser[SomeCaseClass]
I managed to get up to the point where I can call:
parser parse "/the/path/to/the/csv/file/"
But I do not want to run the parse method yet, because I want to apply the configuration first through the DSL-like using call shown above. So there are two rules here. If the caller does not supply a parserConfig, parsing should run with the default; if the caller supplies one, I want to apply it and then run the parse method. I tried a combination of implicits, but could not get them to work properly.
Any suggestions?
EDIT: So the solution looks like this as per comments from "Cyrille Corpet":
class CSVReader[A] {
def parse(path: String) = ReaderWithFile[A](path)
case class ReaderWithFile[A](path: String) {
def using(cfg: CSVParserConfig): Seq[A] = {
val lines = Source.fromFile(path).getLines().mkString("\n")
println(lines)
println(cfg)
null
}
}
object ReaderWithFile {
implicit def parser2parsed[A](parser: ReaderWithFile[A]): Seq[A] = parser.using(defaultParserCfg)
}
}
object CSVReader extends App {
def parser[A] = new CSVReader[A]
val sss: Seq[A] = parser parse "/csv-parser/test.csv" // assign this to a val so that the implicit conversion gets applied!! Very important to note!
}
I guess I need to get the implicit in scope at the location where I call the parser parse, but at the same time I do not want to mess up the structure that I have above!
If you replace using with an operator with a higher precedence than parse you can get it to work without needing extra type annotations. Take for instance <<:
object parsedsl {
class ParserConfig
object ParserConfig {
val default = new ParserConfig
}
case class ParseUnit(path: String, config: ParserConfig)
object ParseUnit {
implicit def path2PU(path: String) = ParseUnit(path, ParserConfig.default)
}
implicit class ConfigSyntax(path: String) {
def <<(config: ParserConfig) = ParseUnit(path, config)
}
class CSVParser {
def parse(pu: ParseUnit) = "parsing"
}
}
import parsedsl._
val parser = new CSVParser
parser parse "path" << ParserConfig.default
parser parse "path"
Your parse method should just return a partial result, without doing any work. To handle the default case, you can use an implicit conversion to the output type:
class CSVParser[A] {
def parse(path: String) = ParserWithFile[A](path)
}
case class ParserWithFile[A](path: String) {
def using(cfg: ParserConfig): A = ???
}
object ParserWithFile {
implicit def parser2parsed[A](parser: ParserWithFile[A]): A = parser.using(ParserConfig.default)
}
val parser = new CSVParser[SomeCaseClass]
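The partial-result approach can be sketched end to end as follows. This is a minimal sketch under assumptions, not a real CSV parser: the `separator` field on `ParserConfig`, the name `PendingParse`, and parsing an in-memory string instead of a file are all hypothetical choices made to keep the example self-contained.

```scala
import scala.language.implicitConversions

object CsvDslSketch {
  // Hypothetical configuration; the single `separator` field is an assumption.
  final case class ParserConfig(separator: Char)
  object ParserConfig {
    val default: ParserConfig = ParserConfig(',')
  }

  // Partial result returned by `parse`: it only records the input.
  final case class PendingParse(content: String) {
    // The actual splitting happens here, once the configuration is known.
    def using(cfg: ParserConfig): Seq[Seq[String]] =
      content.split("\n").toSeq.filter(_.nonEmpty).map(_.split(cfg.separator).toSeq)
  }
  object PendingParse {
    // Falls back to the default configuration when a Seq is expected
    // and `using` was never called.
    implicit def runWithDefault(p: PendingParse): Seq[Seq[String]] =
      p.using(ParserConfig.default)
  }

  class CSVParser {
    def parse(content: String): PendingParse = PendingParse(content)
  }

  val parser = new CSVParser

  // Explicit configuration.
  val explicitRows: Seq[Seq[String]] = parser parse "a;b\nc;d" using ParserConfig(';')
  // No configuration: the type annotation triggers the implicit conversion.
  val defaultRows: Seq[Seq[String]] = parser parse "a,b\nc,d"
}
```

The trade-off of this design is that the default only kicks in where the expected type is known, which is why `defaultRows` carries an explicit type annotation.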

Specs2 with Scaldi - wrong implicit injector being invoked

I'm trying to run a test with scaldi and specs2. In the test I need to override a StringManipulator function that uses an injected ProxyManipulator. The ProxyManipulator takes a string and returns its upper case in a Future. The replacement manipulator in the test returns a Future("Test Message").
Here is the StringManipulator class where the injection occurs:
class StringManipulator {
def manip (str : String) (implicit inj: Injector) : String = {
val prox = inject[ProxyManipulator]
Await.result(prox.manipulate(str), 1 second)
}
}
I'm using a package.object that contains the implicit injector:
import modules.MyModule
package object controllers {
implicit val appModule = new MyModule
}
And here is the specs2 test with the new binding:
@RunWith(classOf[JUnitRunner])
class StringManipScaldiSpec extends Specification {
class TestModule extends Module {
bind [ProxyManipulator] to new ProxyManipulator {
override def manipulate(name: String) = Future("Test Message")
}
}
"Application" should {
"do something" in {
val myTestModule = new TestModule
val str = "my string"
val stringMan = new StringManipulator() //(myTestModule)
stringMan.manip(str)(myTestModule) === "Test Message"
}
}
}
The problem is that when the test runs, the StringManipulator class still uses the original ProxyManipulator instead of the one bound in the TestModule. Any ideas?

Test a nested method call on a mocked class using ScalaMock

I am new to both ScalaMock and mocking in general. I am trying to test a method which calls a method in another (mocked) class and then calls a method on the returned object.
Detailed information:
So I am using ScalaTest and there are five classes involved in this test...
SubInstruction which I am testing
class SubInstruction(label: String, val result: Int, val op1: Int, val op2: Int) extends Instruction(label, "sub") {
override def execute(m: Machine) {
val value1 = m.regs(op1)
val value2 = m.regs(op2)
m.regs(result) = value1 - value2
}
}
object SubInstruction {
def apply(label: String, result: Int, op1: Int, op2: Int) =
new SubInstruction(label, result, op1, op2)
}
Machine which must be mocked for the test
case class Machine(labels: Labels, prog: Vector[Instruction]) {
private final val NUMBEROFREGISTERS = 32
val regs: Registers = new Registers(NUMBEROFREGISTERS)
override def toString(): String = {
prog.foldLeft("")(_ + _)
}
def execute(start: Int) =
start.until(prog.length).foreach(x => prog(x) execute this)
}
object Machine extends App {
if (args.length == 0) {
println("Machine: args should be sml code file to execute")
} else {
println("SML interpreter - Scala version")
val m = Translator(args(0)).readAndTranslate(new Machine(Labels(), Vector()))
println("Here is the program; it has " + m.prog.size + " instructions.")
println(m)
println("Beginning program execution.")
m.execute(0)
println("Ending program execution.")
println("Values of registers at program termination:")
println(m.regs + ".")
}
}
Registers which is required to construct a Machine object
case class Registers(size: Int) {
val registers: Array[Int] = new Array(size)
override def toString(): String =
registers.mkString(" ")
def update(k: Int, v: Int) = registers(k) = v
def apply(k: Int) = registers(k)
}
MockableMachine which I have created as the original Machine class does not have an empty constructor and therefore (as I understand) can not be mocked
class MockableMachine extends Machine(Labels(), Vector()){
}
and finally my test class SubInstructionTest which compiles but throws the exception below.
class SubInstructionTest extends FlatSpec with MockFactory with Matchers {
val label1 = "f0"
val result1 = 25
val op1_1 = 24
val op2_1 = 20
val sub1 = SubInstruction(label1, result1, op1_1, op2_1)
"A SubInstruction" should "retrieve the operands from the correct registers in the given machine " +
"when execute(m: Machine) is called, and perform the operation saving the " +
"result in the correct register." in {
val mockMachine = mock[MockableMachine]
inSequence {
(mockMachine.regs.apply _).expects(op1_1).returning(50)
(mockMachine.regs.apply _).expects(op2_1).returning(16)
(mockMachine.regs.update _).expects(result1, 34)
}
sub1.execute(mockMachine)
}
}
Throws:
java.lang.NoSuchMethodException: Registers.mock$apply$0()
I have been searching for a straightforward way to mock this class for hours, but have found nothing. For the time being I have settled on the workaround detailed below, but I was under the impression that mocking would offer a less convoluted solution to the problem of testing my SubInstruction class.
The workaround:
Delete the MockableMachine class and create a CustomMachine class which extends Machine and replaces the registers value with mockedRegisters provided at construction time.
class CustomMachine (mockedRegister: Registers) extends Machine(Labels(), Vector()) {
override
val regs: Registers = mockedRegister
}
a MockableRegisters class which I have created as the original does not have an empty constructor and therefore (as I understand) can not be mocked
class MockableRegisters extends Registers(32) {
}
and the SubInstructionTest class written in a slightly different way
class SubInstructionTest extends FlatSpec with MockFactory with Matchers {
val label1 = "f0"
val result1 = 25
val op1_1 = 24
val op2_1 = 20
val sub1 = SubInstruction(label1, result1, op1_1, op2_1)
"A SubInstruction" should "retrieve the operands from the correct registers in the given machine " +
"when execute(m: Machine) is called, and perform the operation saving the " +
"result in the correct register." in {
val mockRegisters = mock[MockableRegisters]
val machine = new CustomMachine(mockRegisters)
inSequence {
(mockRegisters.apply _).expects(op1_1).returning(50)
(mockRegisters.apply _).expects(op2_1).returning(16)
(mockRegisters.update _).expects(result1, 34)
}
sub1.execute(machine)
}
}
As indicated, this feels like a workaround to me. Is there not a simpler way to do this (perhaps similar to my original attempt)?
I have just included the essential code to ask the question, but you can find the full code on my GitHub account.
I don't think ScalaMock supports mocking nested objects implicitly. You'll have to mock the object returned by the first call, which is what your working example does.
FWIW, Mockito supports this. Search for RETURNS_DEEP_STUBS.
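For comparison, the idea of substituting the object returned by the first call can also be tried without a mocking library, using a hand-written recording fake. This is a minimal sketch under assumptions: `Registers`, `Machine`, and `SubInstruction` are cut down to what the test needs (no `Labels`, no program vector), and `RecordingRegisters` is a hypothetical helper, not part of ScalaMock.

```scala
import scala.collection.mutable.ListBuffer

object RecordingFakeSketch {
  // Simplified stand-ins for the classes in the question (assumptions).
  class Registers(size: Int) {
    private val registers = new Array[Int](size)
    def update(k: Int, v: Int): Unit = registers(k) = v
    def apply(k: Int): Int = registers(k)
  }

  class Machine(val regs: Registers)

  class SubInstruction(result: Int, op1: Int, op2: Int) {
    def execute(m: Machine): Unit = {
      val value1 = m.regs(op1)
      val value2 = m.regs(op2)
      m.regs(result) = value1 - value2
    }
  }

  // Hypothetical fake: returns canned values and records every call in order.
  class RecordingRegisters(canned: Map[Int, Int]) extends Registers(0) {
    val calls = ListBuffer.empty[String]
    override def apply(k: Int): Int = {
      calls += s"apply($k)"
      canned(k)
    }
    override def update(k: Int, v: Int): Unit = {
      calls += s"update($k, $v)"
      ()
    }
  }

  val fake = new RecordingRegisters(Map(24 -> 50, 20 -> 16))
  new SubInstruction(result = 25, op1 = 24, op2 = 20).execute(new Machine(fake))
}
```

After the call, `fake.calls` holds the two reads followed by the write, so a test can compare it against the expected sequence in one assertion.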

Class A cannot be cast to Class A after dynamic loading

Let's say I have:
object GLOBAL_OBJECT{
var str = ""
}
class A(_str: String){
GLOBAL_OBJECT.str = _str
}
and I would like to create 2 copies of GLOBAL_OBJECT (for tests), so I am using different classloader to create obj2:
val obj1 = new A("1")
val class_loader = new CustomClassLoader()
val clazz = class_loader.loadClass("my.packagename.A")
val obj2 = clazz.getDeclaredConstructor(classOf[String]).newInstance("2")
println("obj1.getSecret() == " + obj1.getSecret()) // Expected: 1
println("obj2.getSecret() == " + obj2.asInstanceOf[A].getSecret()) // Expected: 2
which results in the following error:
my.packagename.A cannot be cast to my.packagename.A.
IntelliJ Idea seems to do it correctly, I can run obj2.asInstanceOf[A].getSecret() in "expression" window during debug process without errors.
PS. I have seen similar questions, but I could not find any that do not involve loading a class from a .jar file.
You're not going to be able to get around Java's class casting, which requires strict typing, within the same ClassLoader. Same with traits/interfaces.
However, Scala comes to the rescue with structural typing (a.k.a. Duck Typing, as in "it quacks like a duck.") Instead of casting it to type A, cast it such that it has the method you want.
Here's an example of a function which uses structural typing:
def printSecret(name : String, secretive : { def getSecret : String } ) {
println(name+".getSecret = "+secretive.getSecret)
}
And here's sample usage:
printSecret("obj1", obj1) // Expected: 1
printSecret("obj2", obj2.asInstanceOf[ {def getSecret : String} ]) // Expected: 2
You could, of course, just call
println("secret: "+ obj2.asInstanceOf[ {def getSecret : String} ].getSecret)
Here's full sample code that I wrote and tested.
Main code:
object TestBootstrap {
def createClassLoader() = new URLClassLoader(Array(new URL("file:///tmp/theTestCode.jar")))
}
trait TestRunner {
def runTest()
}
object RunTest extends App {
val testRunner = TestBootstrap.createClassLoader()
.loadClass("my.sample.TestCodeNotInMainClassLoader")
.newInstance()
.asInstanceOf[TestRunner]
testRunner.runTest()
}
In the separate JAR file:
object GLOBAL_OBJECT {
var str = ""
}
class A(_str: String) {
println("A classloader: "+getClass.getClassLoader)
println("GLOBAL classloader: "+GLOBAL_OBJECT.getClass.getClassLoader)
GLOBAL_OBJECT.str = _str
def getSecret : String = GLOBAL_OBJECT.str
}
class TestCodeNotInMainClassLoader extends TestRunner {
def runTest() {
println("Classloader for runTest: " + this.getClass.getClassLoader)
val obj1 = new A("1")
val classLoader1 = TestBootstrap.createClassLoader()
val clazz = classLoader1.loadClass("com.vocalabs.A")
val obj2 = clazz.getDeclaredConstructor(classOf[String]).newInstance("2")
def printSecret(name : String, secretive : { def getSecret : String } ) {
println(name+".getSecret = "+secretive.getSecret)
}
printSecret("obj1", obj1) // Expected: 1
printSecret("obj2", obj2.asInstanceOf[ {def getSecret : String} ]) // Expected: 2
}
}
Structural typing can be used for more than one method; the methods are separated with semicolons. So essentially you create an interface for A with all the methods you intend to test. For example:
type UnderTest = { def getSecret : String ; def myOtherMethod() : Unit }
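A structural type with several methods can be exercised in isolation as follows. This is a minimal sketch under assumptions: `Secretive` and its `describe()` method are hypothetical stand-ins for class A, no second class loader is involved, and the `scala.language.reflectiveCalls` import is used to enable reflective member access.

```scala
import scala.language.reflectiveCalls

object StructuralTypingSketch {
  // Structural type listing the methods the caller needs.
  type UnderTest = { def getSecret: String; def describe(): String }

  // Hypothetical class; it does not extend any shared trait.
  class Secretive(secret: String) {
    def getSecret: String = secret
    def describe(): String = s"secret of length ${secret.length}"
  }

  // Treat the instance as an untyped object, as it would be after newInstance().
  val untyped: AnyRef = new Secretive("2")

  // Cast to the structural type instead of the nominal class.
  val viaStructuralType: UnderTest = untyped.asInstanceOf[UnderTest]
  val secret: String = viaStructuralType.getSecret
  val description: String = viaStructuralType.describe()
}
```

Because the calls are resolved by method name at runtime, a missing or misspelled method only fails when it is invoked, not at compile time.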
One workaround that avoids the cast altogether is to use reflection: look up the particular method on the new object's class, then invoke it on the new object instance:
val m2: Method = obj2.getClass.getMethod("getSecret")
m2.invoke(obj2)
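The reflection lookup can be tried on its own with a small stand-in class. This is a minimal sketch under assumptions: `Secretive` is a hypothetical replacement for the dynamically loaded A, and it is created directly rather than through a custom class loader.

```scala
import java.lang.reflect.Method

object ReflectionSketch {
  // Hypothetical class standing in for A loaded by another class loader.
  class Secretive(secret: String) {
    def getSecret: String = secret
  }

  // Only the runtime object is known; no cast to the nominal class is made.
  val obj2: AnyRef = new Secretive("2")

  // Look the method up by name on the object's own runtime class.
  val m2: Method = obj2.getClass.getMethod("getSecret")
  // invoke returns Object, so the result is converted explicitly.
  val secret: String = m2.invoke(obj2).toString
}
```

Since the method is found on `obj2.getClass`, the lookup uses whichever class loader produced the object, which is why no cast between the two versions of the class is needed.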
The class file that contains obj2.asInstanceOf[A].getSecret() should be reloaded by CustomClassLoader, too.
You must also not use any class that refers to A unless that class is reloaded by the same class loader that reloads A.