For what use cases are AppSync pipeline resolvers better than Express Step Functions? - aws-appsync

I am currently creating an AppSync API, and after figuring out that I can use Step Functions as a resolver I was wondering why I would ever use a pipeline resolver.
Pipeline resolvers are significantly more complex and unwieldy to construct and maintain, and they are a pain to debug, whereas with Step Functions I get a graphical representation of my architecture and a very easy debugging interface.
But given that Step Functions is not a native resolver integration and that the documentation on this topic is sparse, I am wondering what the disadvantages of this solution are, and in what cases the pipeline resolver is superior and why.

Related

How to implement Featuretools into my ML Process?

I am exploring the possibility of adding Featuretools to my pipeline so that I can create new features from my DataFrame.
Currently I am using GridSearchCV with a Pipeline embedded inside it. Since Featuretools creates new features by aggregating over columns, like STD(column) etc., I feel it is susceptible to data leakage. Their FAQ gives an example approach to tackle this, but it is not suitable for the Pipeline structure I am using.
Idea 0: I would love to integrate it directly into my Pipeline, but it does not seem to be compatible with Pipelines. It would use the fold's train data to construct features and then transform the fold's test data, K times. At the end, it would use the whole data to construct features during the refit=True stage of GridSearchCV. If you know of an example that shows this is possible, you are very welcome to share it.
Idea 1: I can switch to a manual CV structure that is not embedded in a Pipeline. Inside it, I can use the train data to construct new features and transform the test data with them. It will run K times. At the end, all the data can be used to construct the final model.
It is the safest option, with time and complexity disadvantages.
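A minimal sketch of Idea 1, using only the Python standard library. It does not call Featuretools; `fit_group_means`, `add_group_mean`, `k_fold_indices`, and the toy `rows` are hypothetical stand-ins for an aggregation feature such as MEAN(column). The point it illustrates is only that the aggregate is learned from each fold's train rows and then applied unchanged to that fold's test rows.

```python
from statistics import mean


def fit_group_means(rows, key, value):
    """Learn per-group means of `value` from the given (training) rows only."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[value])
    overall = mean(row[value] for row in rows)  # fallback for unseen groups
    return {k: mean(v) for k, v in groups.items()}, overall


def add_group_mean(rows, key, learned, name):
    """Attach the previously learned aggregate to each row as a new feature."""
    means, fallback = learned
    return [{**row, name: means.get(row[key], fallback)} for row in rows]


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


# Toy data (hypothetical): one row per transaction.
rows = [
    {"customer": "a", "amount": 10.0},
    {"customer": "a", "amount": 30.0},
    {"customer": "b", "amount": 5.0},
    {"customer": "b", "amount": 15.0},
    {"customer": "c", "amount": 100.0},
    {"customer": "a", "amount": 20.0},
]

folds = []
for train_idx, test_idx in k_fold_indices(len(rows), k=3):
    train = [rows[i] for i in train_idx]
    test = [rows[i] for i in test_idx]
    learned = fit_group_means(train, "customer", "amount")  # train rows only
    train_feat = add_group_mean(train, "customer", learned, "MEAN(amount)")
    test_feat = add_group_mean(test, "customer", learned, "MEAN(amount)")
    folds.append((train_feat, test_feat))
    # A model would be fitted on train_feat and scored on test_feat here.
```

Test rows never contribute to the aggregate they receive, which is the property that the whole-data approach gives up.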
Idea 2: Use it on the whole data and ignore the possibility of leakage. I am not in favor of this, of course. But when I look at the project's GitHub page, all the examples combine the train and test data, create the features from the whole data, and only then do the train-test split for modeling.
https://github.com/Featuretools/predict-taxi-trip-duration/blob/master/NYC%20Taxi%203%20-%20Simple%20Featuretools.ipynb
If the developers of the project work that way, I could give the whole-data approach a chance.
What do you think? I would love to hear about your experiences with Featuretools.

In multi-stage compilation, should we use a standard serialisation method to ship objects through stages?

This question is formulated in terms of Scala 3/Dotty but should generalise to any language NOT in the MetaML family.
The Scala 3 macro tutorial:
https://docs.scala-lang.org/scala3/reference/metaprogramming/macros.html
starts with the Phase Consistency Principle, which explicitly states that free variables defined in one compilation stage CANNOT be used by the next stage, because the objects they are bound to cannot be persisted to a different compiler process:
... Hence, the result of the program will need to persist the program state itself as one of its parts. We don’t want to do this, hence this situation should be made illegal
This should be considered a solved problem, given that many distributed computing frameworks demand a similar capability to persist objects across multiple computers. The most common kind of solution (as observed in Apache Spark) uses standard serialisation/pickling (Java standard serialization, Twitter Kryo/Chill) to create snapshots of the bound objects, which can be saved on disk or in off-heap memory, or sent over the network.
The tutorial itself also hints at this possibility twice:
One difference is that MetaML does not have an equivalent of the PCP - quoted code in MetaML can access variables in its immediately enclosing environment, with some restrictions and caveats since such accesses involve serialization. However, this does not constitute a fundamental gain in expressiveness.
In the end, ToExpr resembles very much a serialization framework
Instead, both Scala 2 and Scala 3 (and their respective ecosystems) largely ignore these out-of-the-box solutions and only provide default methods for primitive types (Liftable in Scala 2, ToExpr in Scala 3). In addition, existing libraries that use macros rely heavily on manually defined quasiquotes/quotes for this trivial task, making the source much longer and harder to maintain while not making anything faster (as JVM object serialisation is a highly optimised language component).
What's the cause of this status quo? How do we improve it?

DRY or DAMP - Which one is efficient in API automation test scripts?

I am currently writing REST API automation test scripts. Most of the articles I have read suggest that we should write DAMP (Descriptive And Meaningful Phrases) tests, which promote readability. However, I feel that there is a lot of duplicated code in my tests, and when I try to remove the duplicates I end up with 'DRY' (Don't Repeat Yourself) code, which tends to create dependencies between tests. So I am a bit confused about which approach to use. I would really appreciate it if anyone could give me some suggestions on this.
A general rule is to keep the code related to the Test Objective DAMP and everything else DRY.
To simplify the rule, code related to the Test Objective may refer to:
Actions that DIRECTLY impact the expected result
Data parameters that DIRECTLY affect the expected result
Code NOT related to the Test Objective may refer to:
Actions that do not impact the expected result DIRECTLY
(Example: authorization for the tests not related to login)
Configuration data and data parameters that do not affect the expected results DIRECTLY
(Example: Base URL, login and password for the tests not related to authorization)
My recommendations are:
re-use payloads (JSON or XML) from files where possible
sign-in flows that set an Authorization header should be re-usable
do not combine API requests to different end-points into a re-usable Scenario
even for the same end-point, for very different payloads (e.g. boundary / error conditions) use a separate Scenario for each
use Scenario Outlines for data-driven tests
Also, please refer to this answer for a good example of what NOT to do: https://stackoverflow.com/a/54126724/143475
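As a rough illustration of the rule above, here is a small Python sketch. Everything in it is hypothetical: `BASE_URL`, the `fake_api` stand-in (used so the sketch runs without a network), the `/login` and `/orders` end-points, and the token value. Sign-in and the base URL are kept DRY because they are not the objective of the order tests, while the payload and expected status, which directly determine the result, are written out in each test.

```python
BASE_URL = "https://api.example.test"  # hypothetical; shared configuration (DRY)


def fake_api(method, url, headers=None, payload=None):
    """Stand-in for a real HTTP client so the sketch runs offline."""
    headers = headers or {}
    payload = payload or {}
    if url == f"{BASE_URL}/login":
        return {"status": 200, "body": {"token": "t-123"}}
    if headers.get("Authorization") != "Bearer t-123":
        return {"status": 401, "body": {}}
    if url == f"{BASE_URL}/orders" and method == "POST":
        if payload.get("quantity", 0) <= 0:
            return {"status": 400, "body": {"error": "quantity must be positive"}}
        return {"status": 201, "body": {"id": 1, **payload}}
    return {"status": 404, "body": {}}


def auth_headers():
    """DRY: signing in does not directly affect the order tests' results."""
    token = fake_api("POST", f"{BASE_URL}/login")["body"]["token"]
    return {"Authorization": f"Bearer {token}"}


def test_create_order_with_valid_quantity():
    # DAMP: the payload and the expectation are visible in the test itself.
    response = fake_api(
        "POST", f"{BASE_URL}/orders",
        headers=auth_headers(),
        payload={"sku": "ABC", "quantity": 2},
    )
    assert response["status"] == 201
    assert response["body"]["quantity"] == 2


def test_create_order_rejects_zero_quantity():
    # DAMP: a separate test for the boundary case, not a shared parametrised helper.
    response = fake_api(
        "POST", f"{BASE_URL}/orders",
        headers=auth_headers(),
        payload={"sku": "ABC", "quantity": 0},
    )
    assert response["status"] == 400
```

Each test can be read on its own, and neither depends on the other having run first.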

Where are the tests for Scala collections?

I am looking for tests which can be used for custom collections. Ideally these are behaviour tests.
For example, when implementing a new Map I would like to test whether it follows all the required Map rules and supports methods like map, filter, view, etc.
What is Scala using to test its own collections?
This is a good question that has been asked before on SO.
There are some collections tests under test/files/scalacheck and others under test/files/run/*coll* in the source repository.
There is no conformance test or TCK per se for custom collections. Integration with collections usually involves a specific implementation requirement.
For example, the ScalaDoc for immutable.MapLike tells you to implement get, iterator and + and -. In theory, if you test the template methods, you can rely on everything you get for free from the library.
But the doc adds:
It is also good idea to override methods foreach and size for efficiency.
So if you care about that, you'll be adding performance tests too. The standard library doesn't include automated performance testing.

DSL to implement business rules for REST service routing and processing

I am hoping that combinator parsers (http://debasishg.blogspot.com/2008/04/external-dsls-made-easy-with-scala.html) will work as the basis of a design for processing the routing rules of a REST service that is implemented with Scalatra (http://tutorialbin.com/tutorials/80408/infoq-scalatra-a-sinatra-like-web-framework-for-scala).
This REST service is to serve as a proxy so external applications can get access to services within the firewall, as it will have additional layers of security that can be customized for the business requirements of each REST service.
So, if a person wants to access their own class schedule, there will be less security than if they want to look at someone else's transcript.
I would like the rules for where to go to actually get the information, and how to return it, as well as what security is needed, in a DSL.
But the first problem is how to dynamically change the routing rules for the REST service based on a DSL. I am trying to create a framework where adding new rules does not require a great deal of recompiling, only writing the appropriate scripts and letting them be processed.
So, can a DSL be implemented using the Combinator Parser, in Scala, that will allow JAX-RS (http://download.oracle.com/javaee/6/tutorial/doc/giepu.html) to have dynamically changed routing?
UPDATE:
I haven't designed the language yet, but this is what I am trying to do:
route /transcript using action GET to
    http://inside.com/transcript/{firstparam}/2011/{secondparam}
    return json encrypt with public key from /mnt/publickey.txt
for /education_cost using action GET combine http://combine.com/SOAP/costeducate with
    http://combine.com/education_benefit/2010 with
    http://combine.com/education_benefit/2011 return html
These are two possible ideas. In the first, a request for a transcript is sent to a different site, such as one within a firewall, and the data is encrypted and returned.
The second is more complicated in that the results of one SOAP request and two REST requests are combined, and there would need to be additional commands describing how they are combined. The idea is to put all of this in files that can be parsed on the fly.
If I used Groovy then some new classes could be generated for the routing, which would remove some performance hits, but I think using Scala would be the best bet, even if I took a performance hit.
My hope is to make a framework that is more maintainable so new routing rules can be written by people that don't know any OOP or functional languages, but the specifications could be written using Specs (http://code.google.com/p/specs/) so that the functional side could be certain that their requirements are tested on a regular basis.
UPDATE 2:
When I start working on a design I may intuitively understand some options, but not know why. Today I realized that the reason Groovy may be a better fit for this is that I could generate the classes for routing using its metaprogramming (http://www.justinspradlin.com/programming/groovy-metaprogramming-adding-behavior-dynamically/), and then I would be able to use Scala or Groovy to dynamically use the routing that was generated. I am not certain how to get Scala to generate the classes if they don't already exist.
In Groovy, as well as some other languages, as shown here (http://langexplr.blogspot.com/2008/02/handling-call-to-missing-method-in.html) if a method is missing you can dynamically generate the method and it will henceforth exist, so it will be missing one time.
It almost seems that I should be mixing Groovy with Java to make this work, but then the result may be that some of the code is in Scala and some in Java, for the routing of REST services.
Splitting the question in two parts:
can a DSL be implemented using the Combinator Parser
Yes. There are things that cannot be implemented using a combinator parser, or even other kinds of parser. For instance, Perl itself cannot be parsed (it must be evaluated). And combinator parsers are also not particularly good for complex languages (such as Scala -- its compiler is not based on combinator parsers), or if you demand top performance (such as the compilers used to compile hundreds of thousands of lines of code).
If, however, you plan to go to such extremes, choosing the parser is not going to be your main problem. For DSLs of average complexity, they'll do just fine.
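To make this concrete, here is a minimal, language-agnostic sketch of the technique, written in Python rather than Scala. The combinators `token`, `seq`, `alt`, and `fmap` and the `parse_rule` helper are hypothetical names defined here; in Scala's scala.util.parsing.combinator library the corresponding building blocks are `~` (sequence), `|` (alternative), and `^^` (map). The sketch handles only the simple "route ... to ... return ..." form from the question and leaves out the encryption and "combine" clauses.

```python
import re


def token(pattern):
    """Parser matching a regex at the current position, skipping leading whitespace."""
    regex = re.compile(r"\s*(" + pattern + ")")

    def parse(text, pos):
        m = regex.match(text, pos)
        return (m.group(1), m.end()) if m else None

    return parse


def seq(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos

    return parse


def alt(*parsers):
    """Return the result of the first parser that succeeds."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None

    return parse


def fmap(parser, fn):
    """Transform a successful parse result with fn."""
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, pos = result
        return fn(value), pos

    return parse


path = token(r"/\w+")
method = alt(token("GET"), token("POST"))
url = token(r"https?://\S+")
fmt = alt(token("json"), token("html"))

route_rule = fmap(
    seq(token("route"), path, token("using"), token("action"), method,
        token("to"), url, token("return"), fmt),
    lambda v: {"path": v[1], "method": v[4], "target": v[6], "format": v[8]},
)


def parse_rule(text):
    """Parse one routing rule into a dict, or raise ValueError."""
    result = route_rule(text, 0)
    if result is None or text[result[1]:].strip():
        raise ValueError(f"not a valid route rule: {text!r}")
    return result[0]


rule = parse_rule(
    "route /transcript using action GET to "
    "http://inside.com/transcript/{firstparam}/2011/{secondparam} return json"
)
```

The resulting dict is plain data, so it can be re-read from a rules file at runtime and handed to whatever routing mechanism the web framework exposes, without recompiling.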
that will allow JAX-RS to have dynamically changed routing
Well, I don't know JAX-RS, but if dynamically changed routing can be done with it, then combinator parsers will be able to provide whatever input is needed.
EDIT
Seeing your example, I think parser combinators are certainly enough. From their results, I expect you could dynamically create BlueEyes binders -- I haven't used BlueEyes, so I'm not sure how dynamic they are.
Another alternative would be to go with Lift. Lift's binders are partial functions, and they can be combined in all the usual ways -- f1 orElse f2, f1 andThen f2, etc. I didn't suggest it at first because it is most often used with sessions, but it has a RESTful model which, I think, is stateless.
I don't know Scalatra, so I don't know if it would be adaptable to this or not.