ServiceProxy for current Service - azure-service-fabric

I am trying to figure out how to generate a ServiceProxy that points to the service that I'm currently executing within. Because the service is stateful, I need to convey information about it so that another service can call back into this specific instance.
ServiceProxy seems to do resolution by partition keys. However, I don't see how I can obtain a partition key for the currently executing service. I can obtain the partition Guid. But, ServiceProxy cannot be used with that.
Example use case: I have a StatefulService which invokes an external HTTP API. It posts a message to this API, which results in the API calling back into my infrastructure after some period. The HTTP endpoint that I have built needs to resolve the original StatefulService in order to route the response back to it.

You can get the key range for the partition and send the low key value to the external HTTP API. When that external API needs to resolve the partition for the call back, it can use the low key value as the partition key, which guarantees it will fall in the right partition range:
Int64RangePartitionInformation partitionInfo = this.ServicePartition.PartitionInfo as Int64RangePartitionInformation;
long lowKey = partitionInfo.LowKey;
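To see why the low key is a safe callback token, here is a minimal sketch in plain Java (not Service Fabric code). `RangePartition` and `PartitionResolver` are hypothetical stand-ins for an Int64 range partition and for key-based resolution; the example ranges are assumptions.

```java
import java.util.List;

// Hypothetical stand-in for an Int64 range partition covering [lowKey, highKey].
final class RangePartition {
    final String id;
    final long lowKey;
    final long highKey;

    RangePartition(String id, long lowKey, long highKey) {
        this.id = id;
        this.lowKey = lowKey;
        this.highKey = highKey;
    }

    boolean contains(long key) {
        return key >= lowKey && key <= highKey;
    }
}

// Hypothetical resolver: maps a partition key to the partition whose range contains it.
final class PartitionResolver {
    static RangePartition resolve(List<RangePartition> partitions, long key) {
        for (RangePartition p : partitions) {
            if (p.contains(key)) {
                return p;
            }
        }
        throw new IllegalArgumentException("No partition contains key " + key);
    }
}
```

Because a partition's low key is by definition inside its own range, resolving by that key always lands back on the same partition.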

Handling multiple requests with same body - REST API

Let's say I have a microservice that just registers a user in the database, and we expose it to our client. I want to understand the best way to handle the following scenario.
What if the user sends multiple requests in parallel (say 10 requests within one second) with the same request body? Should I keep the requests in a queue, register the user for the very first request, and deny the other 9? Or should I classify the requests by body, process one request for each distinct body, and reject the rest? Or what is the best way to handle this scenario?
One more thing I would like to understand: is it recommended to apply rate limiting (say n requests per minute) at the global API level or at the microservice level?
Thanks in advance!
The best way is to use an idempotent call. Instead of exposing an endpoint like this:
POST /users + payload
Expose an endpoint like this:
PUT /user/ID + payload
Let the caller generate the ID, and require it to be a UUID. With a UUID, it does not matter who generates it. This way, if the caller invokes your endpoint multiple times, the first call creates the user and the following calls just update the user with the same payload, which changes nothing. At least you won't generate duplicates.
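The idempotent PUT can be sketched roughly as follows. This is an in-memory illustration only, assuming a hypothetical `UserRegistry` class in place of a real database and HTTP layer.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the user table behind PUT /user/ID.
final class UserRegistry {
    private final Map<UUID, String> usersById = new ConcurrentHashMap<>();

    // Upsert keyed by the caller-supplied UUID.
    // Returns true if this call created the user, false if the ID already existed.
    boolean put(UUID id, String payload) {
        String previous = usersById.put(id, payload);
        return previous == null;
    }

    int size() {
        return usersById.size();
    }
}
```

Ten parallel requests carrying the same UUID end up as one stored user: one call reports a creation and the others are harmless overwrites with identical data.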
It's always good practice to protect your services with rate limiting. Set it at the API level. If you define it at the microservice level, you effectively allow N times the rate when you have N instances, because the requests are distributed across them.

Kogito - wait until data from multiple endpoints is received

I am using Kogito with Quarkus. I have set up one DRL rule and am using a BPMN configuration. As can be seen below, currently one endpoint is exposed, which starts the process. All the needed data is received in the initial request; it is then evaluated and the process goes on.
I would like to extend the workflow to have two separate endpoints. One to provide the age of the person and another to provide the name. The process must wait until all needed data is gathered before it proceeds with evaluation.
Has anybody come across a similar solution?
Technically you could use a signal or message to add more data into a process instance before you execute the rules over the entire data, see https://docs.kogito.kie.org/latest/html_single/#ref-bpmn-intermediate-events_kogito-developing-process-services.
To do that, you need some sort of correlation between these events; otherwise, how do you know that event "name 1" should be matched with event "age 1"? If you can keep the process instance ID, then the second event can either call a REST endpoint for that specific process instance or send it a message via a message broker.
You could also have your own custom logic to aggregate the events and only start a new process instance once your criteria for complete data are met. There are also plans in Kogito to extend how correlation is done, for instance by allowing process variables to be used as the identifier. For example, with person.id as the correlation key, a name event and an age event with the same ID would signal the same process instance. Hope this info helps.
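The "custom aggregation" option could look roughly like this. It is a plain Java sketch, not Kogito API: `PersonAggregator`, `onName`, and `onAge` are hypothetical names, and the correlation ID is assumed to arrive with each event.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical aggregator: buffers partial data per correlation ID and
// releases a complete Person only when both name and age have arrived.
final class PersonAggregator {
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    private static final class Partial {
        String name;
        Integer age;
    }

    private final Map<String, Partial> pending = new HashMap<>();

    Optional<Person> onName(String correlationId, String name) {
        Partial p = pending.computeIfAbsent(correlationId, k -> new Partial());
        p.name = name;
        return completeIfReady(correlationId, p);
    }

    Optional<Person> onAge(String correlationId, int age) {
        Partial p = pending.computeIfAbsent(correlationId, k -> new Partial());
        p.age = age;
        return completeIfReady(correlationId, p);
    }

    // Emits the Person and forgets the partial state once all data is present.
    private Optional<Person> completeIfReady(String correlationId, Partial p) {
        if (p.name == null || p.age == null) {
            return Optional.empty();
        }
        pending.remove(correlationId);
        return Optional.of(new Person(p.name, p.age));
    }
}
```

The caller would start the process instance only when one of the two methods returns a non-empty result.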

External processing using Kafka Streams

There are several questions regarding message enrichment using external data, and the recommendation is almost always the same: ingest external data using Kafka Connect and then join the records using state stores. Although it fits in most cases, there are several other use cases in which it does not, such as IP to location and user agent detection, to name a few.
Enriching a message with an IP-based location usually requires a lookup by a range of IPs, but currently, there is no built-in state store that provides such capability. For user agent analysis, if you rely on a third-party service, you have no choices other than performing external calls.
We spent some time thinking about it, and we came up with the idea of implementing a custom state store on top of a database that supports range queries, like Postgres. We could also abstract an external HTTP or gRPC service behind a state store, but we're not sure if that is the right way.
In that sense, what is the recommended approach when you cannot avoid querying an external service during stream processing, but you still must guarantee fault tolerance? What happens when an error occurs while the state store is retrieving data (a request fails, for instance)? Does Kafka Streams retry processing the message?
Generally, KeyValueStore#range(fromKey, toKey) is supported by the built-in stores, so it would be good to understand how the range queries you are trying to run work. Also note that internally everything is stored as byte[] arrays, and RocksDB (the default storage engine) sorts data accordingly. Hence, you can implement quite sophisticated range queries if you reason about the byte layout and pass the corresponding "prefix keys" into #range().
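As one illustration of reasoning about the byte layout, the sketch below does an IP-to-location lookup with a single forward range scan. It uses a `TreeMap` with unsigned byte ordering as a stand-in for a byte-sorted store, and `subMap` as a stand-in for `#range()`; it is not Kafka Streams code. The layout (key = big-endian last address of each range, value = first address plus location) and the assumption that ranges do not overlap are choices made for this example.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

final class IpRangeLookup {
    static final class Range {
        final long start;
        final String location;

        Range(long start, String location) {
            this.start = start;
            this.location = location;
        }
    }

    // Unsigned lexicographic byte order, so big-endian keys sort numerically.
    private static final Comparator<byte[]> BYTE_ORDER = Arrays::compareUnsigned;
    private final NavigableMap<byte[], Range> store = new TreeMap<>(BYTE_ORDER);

    // Big-endian 4-byte key for an IPv4 address held in the low 32 bits.
    static byte[] key(long ipv4) {
        return ByteBuffer.allocate(4).putInt((int) ipv4).array();
    }

    static long ip(String dotted) {
        long value = 0;
        for (String part : dotted.split("\\.")) {
            value = (value << 8) | Integer.parseInt(part);
        }
        return value;
    }

    // Each range is keyed by its LAST address.
    void addRange(String first, String last, String location) {
        store.put(key(ip(last)), new Range(ip(first), location));
    }

    // Scan from the target address upward; the first key found is the only
    // candidate range, and it matches if its start is not above the target.
    Optional<String> locate(String address) {
        long target = ip(address);
        Map.Entry<byte[], Range> hit =
            store.subMap(key(target), true, key(0xFFFFFFFFL), true).firstEntry();
        if (hit == null || hit.getValue().start > target) {
            return Optional.empty();
        }
        return Optional.of(hit.getValue().location);
    }
}
```

Keying by the end of each range means the lookup only needs the first element of a forward scan, rather than a floor or reverse search.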
If you really need to call an external service, you have "two" options to avoid losing data. First, if an external call fails, throw an exception and let Kafka Streams die. This is obviously not a real option. However, if you swallow the error from the external lookup, you "skip" the input message and it goes unprocessed. Kafka Streams cannot know that processing "failed" (it does not know what your code does) and will not "retry"; it considers the message completed (just as if you had filtered it out).
Hence, to make it work, you need to put all the data you use to trigger the lookup into a state store when the external call fails, and retry later (i.e., look in the store for unprocessed data and retry). This retry can either be a "side task" when you process the next input message, or you can schedule a punctuation to implement it. Note that this mechanism changes the order in which records are processed, which might or might not be OK for your use case.
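The park-and-retry idea can be sketched in plain Java as below. This is not the Kafka Streams Processor API: `RetryingEnricher` is a hypothetical class, the `LinkedHashMap` stands in for a state store, and `retryPending()` stands in for what a scheduled punctuation would call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class RetryingEnricher {
    // External lookup; assumed to throw a RuntimeException when the call fails.
    private final Function<String, String> lookup;
    // Stand-in for a state store holding records whose lookup failed.
    private final Map<String, String> pending = new LinkedHashMap<>();
    // Stand-in for forwarding enriched records downstream.
    private final List<String> output = new ArrayList<>();

    RetryingEnricher(Function<String, String> lookup) {
        this.lookup = lookup;
    }

    // Called per input record: enrich it, or park it if the lookup fails.
    void process(String key, String value) {
        try {
            output.add(value + "|" + lookup.apply(value));
        } catch (RuntimeException e) {
            pending.put(key, value);
        }
    }

    // Called periodically: retry every parked record, keeping those that fail again.
    void retryPending() {
        Iterator<Map.Entry<String, String>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> entry = it.next();
            try {
                output.add(entry.getValue() + "|" + lookup.apply(entry.getValue()));
                it.remove();
            } catch (RuntimeException e) {
                // Still failing: leave the record parked for the next round.
            }
        }
    }

    List<String> output() {
        return output;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

A record that failed first can be emitted after a later record that succeeded, which is the reordering the answer warns about.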

Pessimistic locking mechanism with IReliableQueue in Azure Service Fabric

I understand that locking is scoped per transaction for IReliableQueue in Service Fabric. I have a requirement where, once the data is read from the ReliableQueue within a transaction, I need to pass the data back to my client and preserve the lock on that data for a certain duration. If the processing fails in the client, the data should be written back to the queue (preferably at the head, so that it is picked first in the next iteration).
Service Fabric doesn't support this. I recommend you look into using an external queuing mechanism for this. For example, Azure Service Bus Queues provides the functionality you describe.
You can use this package to receive SB messages within your services.
preserve the lock on that data for a certain duration
We have done that once or twice in other contexts, with success, using modifiable lists and a document field LockedUntillUtc (initialized to the minimum value or null), or using a different reliable collection of locked keys (sorted on LockedUntillUtc?). Which best suits your needs?
If you can't trust your clients to adhere to such a lock-request and write/unlock-request contract, consider an ETag pattern, where the ETag is only returned on a successful lock request...
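A rough sketch of combining a lock-expiry field with an ETag is shown below. It is plain in-memory Java, not Service Fabric reliable collections; `LeasedQueue`, `tryLock`, and `complete` are hypothetical names, and the caller passes the current time explicitly to keep the example deterministic.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

final class LeasedQueue {
    static final class Lease {
        final String item;
        final String etag;

        Lease(String item, String etag) {
            this.item = item;
            this.etag = etag;
        }
    }

    private static final class Entry {
        final String item;
        Instant lockedUntilUtc = Instant.MIN; // minimum value means "not locked"
        String etag;

        Entry(String item) {
            this.item = item;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void enqueue(String item) {
        entries.add(new Entry(item));
    }

    // Lock the first entry that is unlocked or whose lease has expired.
    // Entries keep their position, so an expired item is picked up first again.
    Optional<Lease> tryLock(Instant now, Duration ttl) {
        for (Entry e : entries) {
            if (!e.lockedUntilUtc.isAfter(now)) {
                e.lockedUntilUtc = now.plus(ttl);
                e.etag = UUID.randomUUID().toString();
                return Optional.of(new Lease(e.item, e.etag));
            }
        }
        return Optional.empty();
    }

    // Remove the item only if the caller holds the current, unexpired lease.
    boolean complete(Lease lease, Instant now) {
        return entries.removeIf(e -> lease.etag.equals(e.etag) && e.lockedUntilUtc.isAfter(now));
    }
}
```

A client that crashes simply lets its lease lapse; a client that comes back late with a stale ETag cannot remove an item that has since been handed to someone else.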

Kafka Streams - accessing data from the metrics registry

I'm having a difficult time finding documentation on how to access the data within the Kafka Streams metric registry, and I think I may be trying to fit a square peg in a round hole. I was hoping to get some advice on the following:
Goal
Collect metrics being recorded in the Kafka Streams metrics registry and send these values to an arbitrary end point
Workflow
This is what I think needs to be done, and I've completed all of the steps except the last (I'm having trouble with that one because the metrics registry is private). But I may be going about this the wrong way:
Define a class that implements the MetricsReporter interface. Build a list of the metrics that Kafka creates in the metricChange method (e.g. whenever this method is called, update a hashmap with the currently registered metrics).
Specify this class in the metric.reporters configuration property
Set up a process that polls the Kafka Streams metric registry for the current data, and ship the values to an arbitrary end point
Anyway, the last step doesn't appear to be possible in Kafka 0.10.0.1 since the metrics registry isn't exposed. Could someone please let me know if this is the correct workflow (it sounds like it's not), or if I am misunderstanding the process for extracting the Kafka Streams metrics?
Although the metrics registry is not exposed, you can still get the value of a given KafkaMetric through its KafkaMetric.value() / KafkaMetric.value(timestamp) methods. For example, as you can observe in JmxReporter, it keeps the list of KafkaMetrics it receives in its init() and metricChange/metricRemoval methods, and then in its MBean implementation, when getAttribute is called, it calls the corresponding KafkaMetric.value() function. So for your customized reporter, you can apply a similar pattern: for example, periodically poll KafkaMetric.value() for all the metrics you have kept and then pipe the results to your end point.
The MetricsReporter interface in org.apache.kafka.common.metrics already lets you manage all Kafka Streams metrics in the reporter, so Kafka's internal registry is not needed.
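The keep-then-poll pattern from the answers can be sketched without the Kafka dependency as follows. `PollingReporter` is a hypothetical class that only mirrors the shape of the reporter callbacks; a `DoubleSupplier` stands in for a KafkaMetric whose value is read on demand, and the sink stands in for the arbitrary end point.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.DoubleSupplier;

final class PollingReporter {
    // Metrics handed to the reporter, keyed by name; no access to a registry needed.
    private final Map<String, DoubleSupplier> metrics = new ConcurrentHashMap<>();

    // Mirrors metricChange: remember (or replace) the metric handle.
    void metricChange(String name, DoubleSupplier metric) {
        metrics.put(name, metric);
    }

    // Mirrors metricRemoval: forget the metric handle.
    void metricRemoval(String name) {
        metrics.remove(name);
    }

    // Read the current value of every kept metric and hand it to the sink.
    void pollOnce(BiConsumer<String, Double> sink) {
        metrics.forEach((name, metric) -> sink.accept(name, metric.getAsDouble()));
    }
}
```

In a real reporter, something like a `ScheduledExecutorService` could call `pollOnce` at a fixed rate and ship each value to the end point.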