I have the following code:
import java.io._
import com.twitter.chill.{Input, Output, ScalaKryoInstantiator}
import scala.reflect.ClassTag
object serializer {
val instantiator = new ScalaKryoInstantiator
instantiator.setRegistrationRequired(false)
val kryo = instantiator.newKryo()
def load[T](file: T, name: String, cls: Class[T]): T = {
if (java.nio.file.Files.notExists(new File(name).toPath())) {
val temp = file
val output = new Output(new FileOutputStream(name), 4096)
// close() flushes Output's buffer; without it the file may be left empty
try kryo.writeObject(output, temp) finally output.close()
temp
}
else {
println("loading from " + name)
val input = new Input(new FileInputStream(name))
try kryo.readObject(input, cls) finally input.close()
}
}
}
I want to use it in this way:
val mylist = serializer.load((1 to 100000).toList,"allAdj.bin",classOf[List[Int]])
I don't want to run (1 to 100000).toList every time, so I want to pass it to the serializer and let it decide: compute it the first time and serialize it for future runs, or load it from the file.
The problem is that the code block is evaluated before it is passed in. How can I pass the code block without executing it?
P.S. Is there any Scala tool that does exactly this for me?
To keep a parameter from being evaluated before it is passed, use a by-name parameter, like this:
def method(param: =>ParamType)
Whatever you pass won't be evaluated at the time you pass it, but it will be evaluated each time you use param, which might not be what you want either. To have it evaluated only the first time you use it, do this:
def method(param: =>ParamType) = {
lazy val p: ParamType = param
// the rest of the body uses p
}
Then use only p in the body. The first time p is used, param will be evaluated and the value will be stored. All other uses of p will use the stored value.
Note that this happens every time you invoke method. That is, if you call method twice, it won't use the "stored" value of p -- it will evaluate it again on first use. If you want to "pre-compute" something, then perhaps you'd be better off with a class instead?
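A small, dependency-free sketch of the evaluation rules described above. `expensive` is a hypothetical stand-in for `(1 to 100000).toList` that counts how often it runs; all names here are illustrative. It shows a plain by-name parameter re-evaluating on every use, the `lazy val` pattern evaluating at most once per call, and, for reuse across calls, a class that holds the `lazy val`.

```scala
object ByNameDemo {
  // Counts how often the "expensive" argument expression is evaluated
  var evaluations = 0

  // Hypothetical stand-in for an expensive expression such as (1 to 100000).toList
  def expensive(): List[Int] = {
    evaluations += 1
    (1 to 5).toList
  }

  // By-name: `param` is evaluated every time it is referenced
  def useTwiceByName(param: => List[Int]): Int = param.size + param.size

  // By-name plus lazy val: evaluated at most once per call, on the first use of `p`
  def useTwiceLazily(param: => List[Int]): Int = {
    lazy val p: List[Int] = param
    p.size + p.size
  }

  // Never touches `param`, so the argument expression is never evaluated
  def ignore(param: => List[Int]): Int = 0
}

// Keeps the computed value across accesses: evaluated once per instance, on first access
final class Precomputed(param: => List[Int]) {
  lazy val value: List[Int] = param
}
```

Passing `ByNameDemo.expensive()` to `useTwiceByName` runs it twice, to `useTwiceLazily` once, and to `ignore` not at all; `new Precomputed(ByNameDemo.expensive())` does not run it until `.value` is first read.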
In this test case, I want to check that a function is called a specific number of times with specific values:
"add tag information in supported tags" in {
val servicesTestEnv = new ServicesTestEnv(components = components)
val questionTransactionDBService = new QuestionsTransactionDatabaseService(
servicesTestEnv.mockAnswersTransactionRepository,
servicesTestEnv.mockPartitionsOfATagTransactionRepository,
servicesTestEnv.mockPracticeQuestionsTagsTransactionRepository,
servicesTestEnv.mockPracticeQuestionsTransactionRepository,
servicesTestEnv.mockSupportedTagsTransactionRepository,
servicesTestEnv.mockUserProfileAndPortfolioTransactionRepository,
servicesTestEnv.mockQuestionsCreatedByUserRepo,
servicesTestEnv.mockTransactionService,
servicesTestEnv.mockPartitionsOfATagRepository,
servicesTestEnv.mockHelperMethods
)
when(servicesTestEnv.mockTransactionService.start)
.thenReturn(servicesTestEnv.mockDistributedTransaction)
doNothing().when(servicesTestEnv.mockPracticeQuestionsTransactionRepository).add(ArgumentMatchers.any[DistributedTransaction],ArgumentMatchers.any[PracticeQuestion],ArgumentMatchers.any[MutationCondition])
doNothing().when(servicesTestEnv.mockPracticeQuestionsTagsTransactionRepository).add(ArgumentMatchers.any[DistributedTransaction],ArgumentMatchers.any[PracticeQuestionTag],ArgumentMatchers.any[MutationCondition])
when(servicesTestEnv.mockPartitionsOfATagTransactionRepository.get(ArgumentMatchers.any[DistributedTransaction],ArgumentMatchers.any[TagPartitionKeys]))
.thenReturn(Right(servicesTestEnv.questionTestEnv.tagPartitions))
doNothing().when(servicesTestEnv.mockPartitionsOfATagTransactionRepository).add(ArgumentMatchers.any[DistributedTransaction],ArgumentMatchers.any[TagPartitions],ArgumentMatchers.any[MutationCondition])
doNothing().when(servicesTestEnv.mockQuestionsCreatedByUserRepo).add(ArgumentMatchers.any[DistributedTransaction],ArgumentMatchers.any[QuestionsCreatedByAUserForATag],ArgumentMatchers.any[MutationCondition])
when(servicesTestEnv.mockUserProfileAndPortfolioTransactionRepository.get(ArgumentMatchers.any[DistributedTransaction],ArgumentMatchers.any[ExternalUserProfileKeys]))
.thenReturn(Right(servicesTestEnv.externalUserProfileWithTags))
doNothing().when(servicesTestEnv.mockUserProfileAndPortfolioTransactionRepository).update(ArgumentMatchers.any[DistributedTransaction],ArgumentMatchers.any[ExternalUserProfile])
doNothing().when(servicesTestEnv.mockDistributedTransaction).commit()
when(servicesTestEnv.mockSupportedTagsTransactionRepository.get(ArgumentMatchers.any[DistributedTransaction],ArgumentMatchers.any[SupportedTagsKeys]))
.thenReturn(Left(SupportedTagNotFoundException()))
val supportedTagInfo = SupportedTag("coding","javascript1","empty")
logger.trace(s"will compare with ${supportedTagInfo}")
val result = questionTransactionDBService.newQuestion(servicesTestEnv.questionTestEnv.practiceQuestion,servicesTestEnv.questionTestEnv.practiceQuestionTag,servicesTestEnv.user)
verify(servicesTestEnv.mockSupportedTagsTransactionRepository,times(0))
.add(servicesTestEnv.mockDistributedTransaction,supportedTagInfo)
}
If I change the value of supportedTagInfo to val supportedTagInfo = SupportedTag("coding","javascript-something else","empty"), the test case still passes.
In the traces, I can see that both times the tag coding-javascript-empty was used. This value comes from servicesTestEnv.questionTestEnv.practiceQuestionTag, which is common to both test cases and is supplied at
val result = questionTransactionDBService.newQuestion(servicesTestEnv.questionTestEnv.practiceQuestion,servicesTestEnv.questionTestEnv.practiceQuestionTag,servicesTestEnv.user)
TRACE - saving coding-javascript-empty in supported tag information
Am I doing something wrong or does Mockito not check the argument values?
UPDATE
I tried using ArgumentCaptor in Scala but am struggling.
I have created a mock of the class as
val mockSupportedTagsTransactionRepository = mock(classOf[SupportedTagsTransactionRepository])
I am calling the add method of the mock. Its signature is
def add(transaction:DistributedTransaction,supportedTag:SupportedTag,mutationCondition:MutationCondition = new PutIfNotExists()) = {...}
I call the get and add methods of the above mock and have defined their behaviour as
when(servicesTestEnv.mockSupportedTagsTransactionRepository.get(ArgumentMatchers.any[DistributedTransaction],ArgumentMatchers.any[SupportedTagsKeys]))
.thenReturn(Left(SupportedTagNotFoundException()))
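Then I create the required ArgumentCaptors
val argumentCaptor1 = ArgumentCaptor.forClass(classOf[DistributedTransaction])
val argumentCaptor2 = ArgumentCaptor.forClass(classOf[SupportedTag])
val argumentCaptor3 = ArgumentCaptor.forClass(classOf[MutationCondition])
and then, after invoking the function under test, I verify the call and log the captured values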
verify(servicesTestEnv.mockSupportedTagsTransactionRepository ,times(1))
.add(argumentCaptor1.capture(),argumentCaptor2.capture(),argumentCaptor3.capture())
logger.trace(s"capture 1 ${argumentCaptor1.getAllValues}")
logger.trace(s"capture 2 ${argumentCaptor2.getAllValues}")
logger.trace(s"capture 3 ${argumentCaptor3.getAllValues}")
Then I check the result
val argumentsInvoked = argumentCaptor2.getAllValues
argumentsInvoked.contains(supportedTagInfo) mustBe true
But the type of argumentsInvoked is List[Nothing] instead of List[SupportedTag].
The fix is to also specify the type arguments of ArgumentCaptor.forClass explicitly:
val argumentCaptor1 = ArgumentCaptor.forClass(classOf[DistributedTransaction])
val argumentCaptor2 = ArgumentCaptor.forClass[SupportedTag,SupportedTag](classOf[SupportedTag]) //Note two types. First is type of argument, second is type of class. They are the same in my case.
val argumentCaptor3 = ArgumentCaptor.forClass(classOf[MutationCondition])
verify(servicesTestEnv.mockSupportedTagsTransactionRepository ,times(1))
.add(argumentCaptor1.capture(),argumentCaptor2.capture(),argumentCaptor3.capture())
val argumentsInvoked = argumentCaptor2.getAllValues //this now returns List[SupportedTag]
argumentsInvoked.size mustBe 1
val argument = argumentsInvoked.get(0)
argument.course mustBe supportedTagInfo.course
argument.subject mustBe supportedTagInfo.subject
argument.topic mustBe supportedTagInfo.topic
I have a pipeline with a set of PTransforms and my method is getting very long.
I'd like to write my DoFns and composite transforms in a separate package and use them in my main method. With Python it's pretty straightforward; how can I achieve that with Scio? I can't find any example of doing that. :(
withFixedWindows(
FIXED_WINDOW_DURATION,
options = WindowOptions(
trigger = groupedWithinTrigger,
timestampCombiner = TimestampCombiner.END_OF_WINDOW,
accumulationMode = AccumulationMode.ACCUMULATING_FIRED_PANES,
allowedLateness = Duration.ZERO
)
)
.sumByKey
// How to write this in an another file and use it here?
.transform("Format Output") {
_
.withWindow[IntervalWindow]
.withTimestamp
}
If I understand your question correctly, you want to bundle your map, groupBy, ... transformations in a separate package, and use them in your main pipeline.
One way would be to use applyTransform, but then you would end up using PTransforms, which are not Scala-friendly.
You can simply write a function that receives an SCollection and returns the transformed one, like:
def myTransform(input: SCollection[InputType]): SCollection[OutputType] = ???
But if you intend to write your own Source/Sink, take a look at the ScioIO class.
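A minimal sketch of the "plain function in a separate object" approach. Scio can't be pulled into a dependency-free example, so `Seq` stands in for `SCollection` here; with Scio the same functions would take and return `SCollection`s and could be called directly or passed to `.transform("name")(...)`. `WordTransforms`, `MainPipeline` and the word-count logic are hypothetical.

```scala
// Would live in its own file/package, e.g. transforms/WordTransforms.scala (hypothetical)
object WordTransforms {
  // Per-element function: can be passed to map/flatMap, e.g. .map(WordTransforms.normalize)
  def normalize(line: String): String = line.trim.toLowerCase

  // "Composite transform": a whole collection-to-collection step as a plain function.
  // With Scio the signature would be SCollection[String] => SCollection[(String, Int)].
  def countWords(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
      .toSeq
      .sortBy(_._1)
}

// The main pipeline stays short: it only wires the reusable steps together
object MainPipeline {
  def run(input: Seq[String]): Seq[(String, Int)] =
    WordTransforms.countWords(input.map(WordTransforms.normalize))
}
```

For example, `MainPipeline.run(Seq("a b", " A c "))` yields `Seq(("a", 2), ("b", 1), ("c", 1))`. Keeping each step a plain function also makes it testable on its own, without running the whole pipeline.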
You can use the map function to map your elements. Instead of passing a lambda, you can pass a method defined in another class or object, for example:
.map(MyClass.MyFunction)
I think one way to solve this could be to define an object in another package and then create a method in that object that would have the logic required for your transformation. For example:
def main(cmdlineArgs: Array[String]): Unit = {
val (sc, args) = ContextAndArgs(cmdlineArgs)
val defaultTopic = "tweets"
val input = args.getOrElse("inputTopic", defaultTopic)
val output = args("outputTopic")
val inputStream: SCollection[Tweet] = sc.withName("read from pub sub").pubsubTopic(input)
.withName("map to tweet class").map(x => {parse(x).extract[Tweet]})
inputStream
.flatMap(sentiment.predict) // object sentiment with method predict
}
object sentiment {
def predict(tweet: Tweet): Option[List[TweetSentiment]] = {
val data = tweet.text
// entitySentimentFile is another method of mine, not shown here
if (data == "") None
else Some(entitySentimentFile(data))
}
}
Please also see this link for an example from the Scio examples.
I have a generator that creates a very complex object. I cannot create this object through something like
val myGen = for{
a <- Gen.choose(-10,10)
...
} yield new MyClass(a,b,c,...)
I tried creating a custom generator like this:
val myComplexGen :Gen[ComplexObject] = {
...
val myTempVariable = Gen.choose(-10,10)
val otherTempVal = Gen.choose(100,2000)
new MyComplexObject(myTempVariable,otherTempVal,...)
}
and then
test("myTest") {
forAll(myComplexGen){ complexObj =>
... // Here, complexObj.myTempVariable is always the same through all the iterations
}
}
While this works, the generated values are always the same: the inner Gen.choose calls always yield the same value.
Is there any way to write a custom Gen with its own logic that uses Gen.choose internally and still produces random values?
I've been able to work around the problem. The solution is definitely not elegant, but it's the only way I could make it work.
I turned myComplexGen into a def and called it inside another Gen with a dummy variable:
def myComplexGen(): ComplexObject = {
...
val myTempVariable = Gen.choose(-10,10)
val otherTempVal = Gen.choose(100,2000)
new MyComplexObject(myTempVariable,otherTempVal,...)
}
val realComplexGen :Gen[ComplexObject] = for {
i <- Gen.choose(0,10) // Not actually used, but a for-comprehension cannot be empty
} yield myComplexGen()
Now I can use realComplexGen in a forAll and the object is really random.
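A less hacky alternative, sketched here as an assumption rather than the workaround above: put every random draw and all custom logic inside the for-comprehension, so the whole construction re-runs for each sample. ScalaCheck's `Gen` supports `map`/`flatMap`, so in real code the `for` would be written against `Gen.choose` directly. To keep the sketch runnable without ScalaCheck, `MiniGen` below is a hypothetical, minimal stand-in for `Gen`, and `ComplexObject(a, b, c)` is a hypothetical shape.

```scala
import scala.util.Random

// Hypothetical, minimal stand-in for ScalaCheck's Gen: a function from a random source to a value
final case class MiniGen[A](run: Random => A) {
  def map[B](f: A => B): MiniGen[B] = MiniGen(r => f(run(r)))
  def flatMap[B](f: A => MiniGen[B]): MiniGen[B] = MiniGen(r => f(run(r)).run(r))
}

object MiniGen {
  // Inclusive range, like Gen.choose(lo, hi)
  def choose(lo: Int, hi: Int): MiniGen[Int] = MiniGen(r => lo + r.nextInt(hi - lo + 1))
}

// Hypothetical complex object
final case class ComplexObject(a: Int, b: Int, c: Int)

object Generators {
  // All draws and custom logic sit inside the for-comprehension, so they re-run for every sample
  val complexGen: MiniGen[ComplexObject] = for {
    a <- MiniGen.choose(-10, 10)
    b <- MiniGen.choose(100, 2000)
    // Custom logic: later draws may depend on earlier ones
    c <- if (a < 0) MiniGen.choose(0, 5) else MiniGen.choose(50, 55)
  } yield ComplexObject(a, b, c)
}
```

Because `complexGen` is a description of how to build a value rather than a value, sampling it repeatedly gives different objects, which is the property the `forAll` loop needs.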
Based on: source code
I don't understand why the parameter of Source.fromIterator is Function0[Iterator[T]] instead of Iterator[T].
Is there a practical reason for this? Could we change the signature to def fromIterator(iterator: => Iterator[T]) instead? (to avoid writing Source.fromIterator(() => myIterator))
As per the docs:
The iterator will be created anew for each materialization, which is the reason the method takes a function rather than an iterator directly.
Stream stages are supposed to be re-usable, so you can materialize them more than once. A given iterator, however, can often be consumed only once. If fromIterator created a Source that referred to an existing iterator (whether passed by name or by reference), a second attempt to materialize it could fail because the underlying iterator would already be exhausted.
To get around this, the source needs to be able to instantiate a new iterator, so fromIterator allows you to supply the necessary logic to do this as a supplier function.
Here's an example of something we don't want to happen:
import akka.stream.scaladsl.Source
import scala.concurrent.Await
import scala.concurrent.duration._
implicit val system = akka.actor.ActorSystem.create("test")
implicit val mat = akka.stream.ActorMaterializer()
val iter = Iterator.range(0, 2)
// pretend we pass the iterator directly...
val src = Source.fromIterator(() => iter)
Await.result(src.runForEach(println), 2.seconds)
// 0
// 1
// res0: akka.Done = Done
Await.result(src.runForEach(println), 2.seconds)
// res1: akka.Done = Done
// No results???
That's bad: src is not re-usable, because it doesn't produce the same output on subsequent runs. If we instead create the iterator lazily, it works:
val iterFunc = () => Iterator.range(0, 2)
val src = Source.fromIterator(iterFunc)
Await.result(src.runForEach(println), 2.seconds)
// 0
// 1
// res0: akka.Done = Done
Await.result(src.runForEach(println), 2.seconds)
// 0
// 1
// res1: akka.Done = Done
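The same pitfall can be reproduced without Akka. In this sketch, `ReplayableSource` is a hypothetical stand-in for a re-usable stream stage where each `runToList()` plays the role of a new materialization. It also includes a by-name variant like the one proposed in the question, to show that by-name only helps when the argument expression itself builds a new iterator; one plausible reason to prefer an explicit `() => ...` is that it makes this requirement visible at the call site.

```scala
// Hypothetical stand-in for a re-usable stream stage: every run is a new "materialization"
final class ReplayableSource[T](makeIterator: () => Iterator[T]) {
  // Each run asks the factory for an iterator and drains it
  def runToList(): List[T] = makeIterator().toList
}

object ReplayableSource {
  // Like the Function0 signature: the caller supplies how to build an iterator
  def fromIterator[T](f: () => Iterator[T]): ReplayableSource[T] = new ReplayableSource(f)

  // Hypothetical by-name variant, as proposed in the question
  def fromIteratorByName[T](iterator: => Iterator[T]): ReplayableSource[T] =
    new ReplayableSource(() => iterator)
}
```

With an existing iterator (`val it = Iterator.range(0, 2)`), both `fromIterator(() => it)` and `fromIteratorByName(it)` return `List(0, 1)` on the first run and an empty list on the second. Only an argument that builds a fresh iterator, such as `fromIterator(() => Iterator.range(0, 2))`, replays correctly.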
I'm trying to implement a class that evaluates a given Scala script with an input value and prints the result of the evaluation, using scala.tools.nsc.interpreter.IMain. Here's my test code:
package some.test.package
import javax.script.{Compilable, CompiledScript}
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.{IMain, JPrintWriter}
class Evaluator {
// Create IMain instance when created
val settings = new Settings()
settings.usejavacp.value = true
val writer = new JPrintWriter(Console.out, true)
val engine = new IMain(settings, writer)
val compiler = engine.asInstanceOf[Compilable]
var inputVal : Int = _
var compiledObj : CompiledScript = _
// compile given script
def compileScript(givenCode : String): Unit = {
compiledObj = compiler.compile(givenCode)
}
// evaluate
def evalCompiled(): Unit = {
compiledObj.eval()
}
// set input value
def setInput(givenInput:Int): Unit = {
inputVal = givenInput
}
// bind input variable
def bindInput(): Unit = {
engine.bind("inputVal", "Int", inputVal)
}
}
object IMainTest {
def main(args:Array[String]): Unit = {
// create an instance
val evaluator = new Evaluator()
// first set input value to 3 and do evaluation
evaluator.setInput(3)
evaluator.bindInput()
evaluator.compileScript("def f(x:Int):Int=x+1; println(f(inputVal))")
evaluator.evalCompiled()
// let's change input value and re-evaluate
evaluator.setInput(5)
evaluator.evalCompiled()
}
}
I expected the program to print '4' and '6', but it printed '4' and '4'.
So, my questions are:
1. Does IMain.bind just copy the value, rather than a reference, to some place the engine can refer to?
2. Do I have to recompile every time I want to reassign the variable's value?
3. If the answer to 1 and 2 is yes, is there any other way to achieve this without rebinding and recompiling? I expected that reassigning the input value would not require an additional bind or compile operation.
It's really hard to find IMain-related examples or documentation, so any reference links would also be very helpful. Thank you.
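A stdlib-only sketch of the distinction the first question asks about, with no IMain involved. It assumes, as the '4' and '4' output suggests but no documentation here confirms, that binding an Int behaves like capturing a copy of its value at bind time. Under that assumption, a script reading a copied Int keeps seeing the old value, while a script reading a mutable holder (here `java.util.concurrent.atomic.AtomicInteger`) sees every update. `BindingDemo` and its methods are hypothetical stand-ins for the interpreter environment.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for an interpreter environment: a "binding" is a thunk
// that the compiled script reads each time it runs
object BindingDemo {
  // The "script": def f(x: Int): Int = x + 1; f(inputVal)
  def script(inputVal: () => Int): Int = inputVal() + 1

  // Copy semantics: the Int is captured once, when the binding is created
  def bindByValue(v: Int): () => Int = () => v

  // Reference semantics: a mutable holder is captured and read on every run
  def bindByReference(holder: AtomicInteger): () => Int = () => holder.get()
}
```

Running the script after changing the input shows the difference: the copied binding still yields 4, while the holder-based binding yields 6 after `holder.set(5)`.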