Design of a pipeline that invokes a maximum number of requests per second - apache-beam

My goal is to create a pipeline that invokes a back-end (Cloud hosted) service a maximum number of times per second ... how can I achieve that?
Back story: Imagine a back-end service that is invoked with a single input and returns a single output. This service has quotas associated with it that permit a maximum number of requests per second (let's say 10 requests per second). Now imagine an unbounded source PCollection where I wish to transform the elements in the input by passing them through my back-end service. I can envisage a ParDo invoking the back-end service once for each element in the input PCollection. However, this doesn't perform any kind of flow control against the back-end.
I could imagine my DoFn logic testing the response from the back-end and retrying until it succeeds, but this doesn't feel right. If I have 100 workers, then I seem to be burning a lot of resources and putting a heavy load on the back-end. What I think I want to do is throttle the calls to the back-end from the pipeline.

Good day, kolban. In addition to Bruno Volpato's helpful RampupThrottlingFn example, I've seen a combination of the following techniques used. Please don't hesitate to let me know how I can make the example clearer.
PeriodicImpulse - emits an Instant at a fixed, specified interval.
Fix the number of workers with the maxNumWorkers and numWorkers pipeline options (see Dataflow Pipeline Options), if using the Dataflow runner; a sketch of setting these in code follows this list.
Beam Metrics API - monitor the actual resource request count over time and set alerts. When using Dataflow, the Beam Metrics API automatically connects to Cloud Monitoring as custom metrics.
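If you prefer to pin those worker counts in code rather than with command-line arguments, here is a minimal sketch (assuming the Dataflow Java runner dependency is on the classpath; setNumWorkers and setMaxNumWorkers come from DataflowPipelineOptions):
public class WorkerCountOptions {

    public static Pipeline createPipeline(String[] args) {
        // Equivalent to passing --numWorkers=10 --maxNumWorkers=10 on the command line.
        DataflowPipelineOptions options = PipelineOptionsFactory
                .fromArgs(args)
                .withValidation()
                .as(DataflowPipelineOptions.class);
        options.setNumWorkers(10);    // initial worker count
        options.setMaxNumWorkers(10); // cap autoscaling at 10 workers
        return Pipeline.create(options);
    }
}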
The following shows abbreviated code, starting from the whole pipeline and followed by details as needed for clarity. It assumes a target of 10 workers, using Dataflow with the arguments --maxNumWorkers=10 and --numWorkers=10, and a goal to limit the resource requests among all workers to 10 requests per second. This translates to 1 request per second per worker.
PeriodicImpulse limits the Request creation to 1 per second
public class MyPipeline {

    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(/* Usually with options */);

        PCollection<Response> responses = pipeline
                .apply("PeriodicImpulse",
                        PeriodicImpulse
                                .create()
                                .withInterval(Duration.standardSeconds(1L)))
                .apply("Build Requests", ParDo.of(new RequestFn()))
                .apply(ResourceTransform.create());

        pipeline.run();
    }
}
The RequestFn DoFn emits a Request for each Instant emitted by PeriodicImpulse
class RequestFn extends DoFn<Instant, Request> {

    @ProcessElement
    public void process(@Element Instant instant, OutputReceiver<Request> receiver) {
        // Emit one Request per impulse; the instant serves as an example id.
        receiver.output(Request.builder().setId(instant.toString()).build());
    }
}
ResourceTransform transforms Requests to Responses, incrementing a Counter
class ResourceTransform extends PTransform<PCollection<Request>, PCollection<Response>> {

    static ResourceTransform create() {
        return new ResourceTransform();
    }

    @Override
    public PCollection<Response> expand(PCollection<Request> input) {
        return input.apply("Consume Resource", ParDo.of(new ResourceFn()));
    }
}
class ResourceFn extends DoFn<Request, Response> {

    private final Counter counter = Metrics.counter(ResourceFn.class, "some:resource");
    private transient ResourceClient client = null;

    @Setup
    public void setup() {
        client = new ResourceClient();
    }

    @ProcessElement
    public void process(@Element Request request, OutputReceiver<Response> receiver) {
        counter.inc(); // Increment the counter.
        // Not showing error handling.
        Response response = client.execute(request);
        receiver.output(response);
    }
}
Request and Response classes
(Aside: consider creating a Schema for the Request input and Response output classes. The example below uses AutoValue and AutoValueSchema.)
@DefaultSchema(AutoValueSchema.class)
@AutoValue
abstract class Request {

    /* Abstract getters. */
    abstract String getId();

    static Builder builder() {
        return new AutoValue_Request.Builder(); // Generated by AutoValue.
    }

    @AutoValue.Builder
    static abstract class Builder {
        /* Abstract setters. */
        abstract Builder setId(String value);
        abstract Request build();
    }
}
@DefaultSchema(AutoValueSchema.class)
@AutoValue
abstract class Response {

    /* Abstract getters. */
    abstract String getId();

    @AutoValue.Builder
    static abstract class Builder {
        /* Abstract setters. */
        abstract Builder setId(String value);
        abstract Response build();
    }
}

Related

Spring Retry not retrying in Spring Batch

I have a config file that has:
#Configuration
#ComponentScan(basePackages = { "xxxxxxxx", "xxxxxxxx" })
#EnableBatchProcessing
#Import(DataSourceConfig.class)
#EnableRetry
public class BatchConfiguration extends DefaultBatchConfigurer {
and another method in a separate class that has:
@Override
@Retryable(ValidationException.class)
public FlowExecutionStatus decide(JobExecution jobExec, StepExecution stepExec) {
    boolean passed = validationStatus.getValidationStatus();
    if (!passed) {
        LOG.info("******Batch job validation FAILED******");
        throw new ValidationException("Batch job validation FAILED");
    }
    return FlowExecutionStatus.COMPLETED;
}
At the very least it should retry and print this 3 times instead of once, since it does not pass. All the annotations are imported successfully. It's just not doing what I would expect:
******Batch job validation FAILED******
All I get is the stack trace for the single ValidationException:
Caused by: org.springframework.batch.item.validator.ValidationException: Batch job validation FAILED
at com.xxxxx.xxxxx.decision.ValidationFlowDecider.decide(ValidationFlowDecider.java:31)
The decide method is part of the JobExecutionDecider contract and is driven by the flow you defined for your job. So if you want to "retry" some logic in your job flow, you should define it in the flow definition itself, not with an annotated method (whether with Spring Retry or any other library).
A typical usage of a decider is shown with a code example in the reference documentation here: Programmatic Flow Decisions.
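For illustration, here is a minimal sketch of handling the decider's outcome in the flow itself; validationStep, notificationStep, and decider are hypothetical beans standing in for your own steps and JobExecutionDecider:
// A sketch: route the decider's FAILED status inside the flow instead of
// annotating decide() with @Retryable. A loop back to validationStep is
// also possible if the step is configured to allow re-starts.
Flow flow = new FlowBuilder<SimpleFlow>("validationFlow")
        .start(validationStep)
        .next(decider)
        .on("FAILED").to(notificationStep)
        .from(decider).on("COMPLETED").end()
        .build();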

Axon Framework - Configuring Multiple EventStores in Axon Configuration

We have a use case wherein each aggregate root should have a different event store. We have used the following configuration, where currently we have only one event store configured, as below:
@Configuration
@EnableDiscoveryClient
public class AxonConfig {

    private static final String DOMAIN_EVENTS_COLLECTION_NAME = "coll-capture.domainEvents";
    // private static final String DOMAIN_EVENTS_COLLECTION_NAME_TEST =
    //     "coll-capture.domainEvents-test";

    @Value("${mongodb.database}")
    private String databaseName;

    @Value("${spring.application.name}")
    private String appName;

    @Bean
    public RestTemplate restTemplate() {
        CloseableHttpClient httpClient = HttpClientBuilder.create().build();
        HttpComponentsClientHttpRequestFactory clientHttpRequestFactory =
                new HttpComponentsClientHttpRequestFactory(httpClient);
        return new RestTemplate(clientHttpRequestFactory);
    }

    @Bean
    @Profile({"uat", "prod"})
    public CommandRouter springCloudHttpBackupCommandRouter(
            DiscoveryClient discoveryClient,
            Registration localInstance,
            RestTemplate restTemplate,
            @Value("${axon.distributed.spring-cloud.fallback-url}") String messageRoutingInformationEndpoint) {
        return new SpringCloudHttpBackupCommandRouter(discoveryClient,
                localInstance,
                new AnnotationRoutingStrategy(),
                serviceInstance -> appName.equalsIgnoreCase(serviceInstance.getServiceId()),
                restTemplate,
                messageRoutingInformationEndpoint);
    }

    @Bean
    public Repository<TestEnquiry> testEnquiryRepository(EventStore eventStore) {
        return new EventSourcingRepository<>(TestEnquiry.class, eventStore);
    }

    @Bean
    public Repository<Test2Enquiry> test2enquiryRepository(EventStore eventStore) {
        return new EventSourcingRepository<>(Test2Enquiry.class, eventStore);
    }

    @Bean
    public EventStorageEngine eventStorageEngine(MongoClient client) {
        MongoTemplate mongoTemplate = new DefaultMongoTemplate(client, databaseName)
                .withDomainEventsCollection(DOMAIN_EVENTS_COLLECTION_NAME);
        return new MongoEventStorageEngine(mongoTemplate);
    }
}
Now we want to configure "DOMAIN_EVENTS_COLLECTION_NAME_TEST" (just for example) in the EventStorageEngine as well. How can we support multiple event stores and select, for each tracking processor, which collection it should be part of?
Segregating the event streams into distinct storage solutions is reasonable, especially when you have several bounded contexts. If you are going that route, then combining them from an event handling perspective could indeed become a necessity.
If you want to define which [message source / event store] is used by a TrackingEventProcessor, you will have to deal with the EventProcessingConfigurer. More specifically, you should invoke the EventProcessingConfigurer#registerTrackingEventProcessor(String, Function<Configuration, StreamableMessageSource<TrackedEventMessage<?>>>) method. The first String parameter is the name of the processor you want to configure as being "tracking". The second parameter defines a Function which gives you the message source to be used by this TrackingEventProcessor (TEP). It is here where you should provide the event store you want this TEP to ingest events from.
Pairing them up at a later stage is also possible, of course, and is supported by Axon Framework. This boils down to a specific form of StreamableMessageSource implementation.
More specifically, you can use the MultiStreamableMessageSource, where you can connect any number of StreamableMessageSources together.
Note that Axon's EmbeddedEventStore is in essence an implementation of a StreamableMessageSource. Once you have built the MultiStreamableMessageSource, you will of course have to specify it as the message source for your TrackingEventProcessors.
One last note: this solution can only be used with TrackingEventProcessors, as those are the only Event Processors provided by Axon that ingest a StreamableMessageSource as the source of their events.
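As a minimal sketch (assuming Axon 4.2+; the processor name and the mainEventStore/testEventStore parameters are hypothetical stand-ins for your two configured event stores):
// A sketch: give one tracking processor a combined view of two event stores.
public void configureProcessor(Configurer configurer,
                               EventStore mainEventStore,
                               EventStore testEventStore) {
    configurer.eventProcessing(processingConfigurer ->
            processingConfigurer.registerTrackingEventProcessor(
                    "my-processor", // hypothetical processor name
                    config -> MultiStreamableMessageSource.builder()
                            .addMessageSource("main", mainEventStore)
                            .addMessageSource("test", testEventStore)
                            .build()));
}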

Dumping bad requests

I have a service implemented with Dropwizard and I need to dump incorrect requests somewhere.
I saw that there is a possibility to customise the error message by registering ExceptionMapper<JerseyViolationException>. But I need to have the complete request (headers, body) and not only ConstraintViolations.
You can inject ContainerRequest into the ExceptionMapper. You need to inject it as a javax.inject.Provider though, so that you can lazily retrieve it. Otherwise you will run into scoping problems.
@Provider
public class Mapper implements ExceptionMapper<ConstraintViolationException> {

    @Inject
    private javax.inject.Provider<ContainerRequest> requestProvider;

    @Override
    public Response toResponse(ConstraintViolationException ex) {
        ContainerRequest request = requestProvider.get();
        // ... dump the request (headers, body) here, then build a response ...
        return Response.status(Response.Status.BAD_REQUEST).build();
    }
}
(This also works with constructor argument injection instead of field injection.)
In the ContainerRequest, you can get headers with getHeaderString() or getHeaders(). If you want to get the body, you need to do a little hack because the entity stream is already read by Jersey by the time the mapper is reached. So we need to implement a ContainerRequestFilter to buffer the entity.
public class EntityBufferingFilter implements ContainerRequestFilter {

    @Override
    public void filter(ContainerRequestContext containerRequestContext) throws IOException {
        ContainerRequest request = (ContainerRequest) containerRequestContext;
        request.bufferEntity();
    }
}
You might not want this filter to be called for all requests (for performance reasons), so you might want to use a DynamicFeature to register the filter just on methods that use bean validation (or use Name Binding).
Once you have this filter registered, you can read the body using ContainerRequest#readEntity(Class). You use this method just like you would on the client side with Response#readEntity(). So for the class, if you want to keep it generic, you can use String.class or InputStream.class and convert the InputStream to a String.
ContainerRequest request = requestProvider.get();
String body = request.readEntity(String.class);
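For completeness, here is a minimal sketch of such a DynamicFeature; the class name and the @Valid-based check are assumptions, so adapt the condition to however your resource methods declare bean validation:
// A sketch: register EntityBufferingFilter only on resource methods that
// appear to use bean validation (approximated by @Valid on a parameter).
public class EntityBufferingDynamicFeature implements DynamicFeature {

    @Override
    public void configure(ResourceInfo resourceInfo, FeatureContext context) {
        boolean usesBeanValidation = Arrays
                .stream(resourceInfo.getResourceMethod().getParameterAnnotations())
                .flatMap(Arrays::stream)
                .anyMatch(annotation -> annotation.annotationType() == Valid.class);
        if (usesBeanValidation) {
            context.register(EntityBufferingFilter.class);
        }
    }
}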

Error using "condition paramter header" #StreamListener of new release Chelsea.RC1

I am trying to use event filtering to reduce the number of topics the application uses, via the new feature available in the new Spring Cloud Stream release (Chelsea.RC1). The message is being created with the correct header; however, inspecting the contents of the message in the queue, it does not contain the header, only the body with the payload.
public void sendEnroll(EnrollCommand data) {
    // MessageChannel
    outputEnroll.send(MessageBuilder
            .withPayload(data)
            .setHeader("brand", "MASTERCARD")
            .setHeader("operation", Operation.ENROLL)
            .build());
}
Consumer
@Service
@EnableBinding(Channel.class)
public class EnrollConsumer {

    @Autowired
    private EnrollService service;

    @StreamListener(target = Channel.INPUT_ENROLL, condition = "headers['brand']=='MASTERCARD'")
    public void enrollConsumer(@Payload String command) {
        System.out.println(command);
        // service.enrollment(command);
    }
}
The consumer service gives the following warning:
WARN -kafka-listener-1 o.s.c.s.b.DispatchingStreamListenerMessageHandler:62 - Cannot find a #StreamListener matching for message with id: 7baae934-7484-a7fd-91b0-ba906558bb13
You have to map your custom headers:
spring.cloud.stream.kafka.binder.headers = brand,operation
That information is present in the documentation.

What is the difference between factory and pipeline design patterns?

What is the difference between factory and pipeline design patterns?
I am asking because I need to make classes, each of which has a method that will transform textual data in a certain way.
I have other classes whose data needs to be transformed. However, the order and selection of the transformations depends on (and only on) the base class from which these classes inherit.
Is this somehow related to the pipeline and/or factory pattern?
Factory creates objects without exposing the instantiation logic to the client, and refers to the newly created object through a common interface. So the goal is to make the client completely unaware of what concrete type of product it uses and how that instance is created.
public interface IFactory // used by clients
{
    IProduct CreateProduct();
}

public class FooFactory : IFactory
{
    public IProduct CreateProduct()
    {
        // Create a new instance of FooProduct,
        // set up something, set up something else,
        // and return it.
        return new FooProduct();
    }
}
All creation details are encapsulated. You can create the instance via a new() call, or you can clone some existing sample FooProduct. You can skip setup, or you can read some data from a database first. Anything.
Here we go to Pipeline. The purpose of a pipeline is to divide a larger processing task into a sequence of smaller, independent processing steps (Filters). If creation of your objects is a large task AND the setup steps are independent, you can use a pipeline for setup inside the factory. But the instantiation step is definitely not independent in this case; it must occur prior to the other steps.
So, you can provide Filters (i.e. Pipeline) to setup your product:
public class BarFilter : IFilter
{
    private IFilter _next;

    public IProduct Setup(IProduct product)
    {
        // do Bar setup
        if (_next == null)
            return product;
        return _next.Setup(product);
    }
}

public abstract class ProductFactory : IProductFactory
{
    protected IFilter _filter;

    public IProduct CreateProduct()
    {
        IProduct product = InstantiateProduct();
        if (_filter == null)
            return product;
        return _filter.Setup(product);
    }

    protected abstract IProduct InstantiateProduct();
}
And in concrete factories you can set up a custom set of filters for your setup pipeline.
Factory is responsible for creating objects:
ICar volvo = CarFactory.BuildVolvo();
ICar bmw = CarFactory.BuildBMW();
IBook pdfBook = BookFactory.CreatePDFBook();
IBook htmlBook = BookFactory.CreateHTMLBook();
Pipeline will help you to separate processing into smaller tasks:
var searchQuery = new SearchQuery();
searchQuery.FilterByCategories(categoryCriteria);
searchQuery.FilterByDate(dateCriteria);
searchQuery.FilterByAuthor(authorCriteria);
There are also linear and non-linear pipelines. A linear pipeline would require us to filter by category, then by date, and then by author. A non-linear pipeline would allow us to run these simultaneously or in any order.
This article explains it quite well:
http://www.cise.ufl.edu/research/ParallelPatterns/PatternLanguage/AlgorithmStructure/Pipeline.htm