I am pretty new to MongoDB.
I am in a scenario where it is possible for a system to invoke functions simultaneously many time.
I have gone through mongoDB Atlas function documentation and didn't find anything which speaks about scalability or concurrency issues.
Can a single function be invoked multiple times in parallel?
for example: Three different request trying to invoke same function will all three request be handled one by one or in parallel.
You can call the functions concurrently, provided the workload adheres to the App Services' limitation of 5000 concurrent requests. So, to address your point: if 3 different services try to invoke the same function at a time, they will be handled in parallel.
Additionally, you can use HTTPS Endpoints to expose a Function and trigger it through an endpoint call.
Related
I have a machine learning model inside a docker image. I pushed the docker image to google container registry and then deploy it inside a Kubernetes pod. There is a fastapi application that runs on Port 8000 and this Fastapi endpoint is public
(call it mymodel:8000).
The structure of fastapi is :
app.get("/homepage")
asynd def get_homepage()
app.get("/model):
aysnc def get_modelpage()
app.post("/model"):
async def get_results(query: Form(...))
User can put query and submit them and get results from the machine learning model running inside the docker. I want to limit the number of times a query can be made by all the users combined. So if the query limit is 100, all the users combined can make only 100 queries in total.
I thought of a way to do this:
Store a database that stores the number of times GET and POST method has been called. As soon as the total number of times POST has been called crosses the limit, stop accepting any more queries.
Is there an alternative way of doing this using Kubernetes limits? Such as I can define a limit_api_calls such that the total number of times mymodel:8000 is accessed is at max equal to limit_api_calls.
I looked at the documentation and I could only find setting limits for CPUs, Memory and rateLimits.
There are several approaches that could satisfy your needs.
Custom implementation: As you mentioned, keep in a persistence layer the number of API calls received and deny requests after it has been reached.
Use a service mesh: Istio (for instance) will let you limit the number of requests received and act as a circuit breaker.
Use an external Api Manager: Apigee will also let you limit and even charge your users, however if it is only for internal use (not pay per use) I definitely won't recommend it.
The tricky part is what you want to happen after the limit has been reached, if it is just a pod you may exit the application to finish and clear it.
Otherwise, if you have a deployment with its replica set and several resources associated with it (like configmaps), you probably want to use some kind of asynchronous alert or polling check to clean up everything related to your deployment. You may want to have a deep look at orchestrators like Airflow (Composer) and use several tools such as Helm for keeping deployments easy.
I am wondering how to process one message at a time using Googles pub/sub functionality in Go. I am using the official library for this, https://pkg.go.dev/cloud.google.com/go/pubsub#section-readme. The event is being consumed by a service that runs with multiple instances, so any in memory locking mechanism will not work.
I realise that it's an anti-pattern to do this, so let me explain my use-case. Using mongoDB I store an array of objects as an embedded document for each entity. The event being published is modifying parts of this array and saves it. If I receive more than one event at a time and they start processing exactly at the same time, one of the saves will override the other. So I was thinking a solution for this is to make sure that only one message will be processed at a time, and it would be nice to use any built-in functionality in cloud pub/sub to do so. Otherwise I was thinking of implementing some locking mechanism in the DB but i'd like to avoid that.
Any help would be appreciated.
You can imagine 2 things:
You can use ordering key in PubSub. Like that, all the message in relation with the same object will be delivered in order and one by one.
You can use a PUSH subscription to PubSub, to push to Cloud Run or Cloud Functions. With Cloud Run, set the concurrency to 1 (it's by default with Cloud Functions gen1), and set the max instance to 1 also. Like that you can process only one message at a time, all the other message will be rejected (429 HTTP error code) and will be requeued to PubSub. The problem is that you can parallelize the processing as before with ordering key
A similar thing, and simpler to implement, is to use Cloud Tasks instead of PubSub. With Cloud Tasks you can set a rate limit on a queue, and set the maxConcurrentDispatches to 1 (and you haven't to do the same with Cloud Functions max instances or Cloud Run max instances and concurrency)
Assuming there is a Flask web server that has two routes, deployed as a CloudRun service over GKE.
#app.route('/cpu_intensive', methods=['POST'], endpoint='cpu_intensive')
def cpu_intensive():
#TODO: some actions, cpu intensive
#app.route('/batch_request', methods=['POST'], endpoint='batch_request')
def batch_request():
#TODO: invoke cpu_intensive
A "batch_request" is a batch of many same structured requests - each one is highly CPU intensive and handled by the function "cpu_intensive". No reasonable machine can handle a large batch and thus it needs to be paralleled across multiple replicas.
The deployment is configured that every instance can handle only 1 request at a time, so when multiple requests arrive CloudRun will replicate the instance.
I would like to have a service with these two endpoints, one to accept "batch_requests" and only break them down to smaller requests and another endpoint to actually handle a single "cpu_intensive" request. What is the best way for "batch_request" break down the batch to smaller requests and invoke "cpu_intensive" so that CloudRun will scale the number of instances?
make http request to localhost - doesn't work since the load balancer is not aware of these calls.
keep the deployment URL in a conf file and make a network call to it?
Other suggestions?
With more detail, it's now clearer!!
You have 2 responsibilities
One to split -> Many request can be handle in parallel, no compute intensive
One to process -> Each request must be processed on a dedicated instance because of compute intensive process.
If your split performs internal calls (with localhost for example) you will be only on the same instance, and you will parallelize nothing (just multi thread the same request on the same instance)
So, for this, you need 2 services:
one to split, and it can accept several concurrent request
The second to process, and this time you need to set the concurrency param to 1 to be sure to accept only one request in the same time.
To improve your design, and if the batch processing can be asynchronous (I mean, the split process don't need to know when the batch process is over), you can add PubSub or Cloud Task in the middle to decouple the 2 parts.
And if the processing requires more than 4 CPUs 4Gb of memory, or takes more than 1 hour, use Cloud Run on GKE and not Cloud Run managed.
Last word: Now, if you don't use PubSub, the best way is to set the Batch Process URL in Env Var of your Split Service to know it.
I believe for this use case it's much better to use GKE rather than Cloud Run. You can create two kubernetes deployements one for the batch_request app and one for the cpu_intensive app. the second one will be used as worker for the batch_request app and will scale on demand when there are more requests to the batch_request app. I believe this is called master-worker architecture in which you separate your app front from intensive work or batch jobs.
I have a question regarding **writing test for HTTP DELETE method in JMeter using Concurrency Thread Group**. I want to measure **how many DELETEs** can it perform in certain amount of time for certain amount of Users (i.e. Threads) who are sending Concurrent HTTP (DELETE) Requests.
Concurrency Thread Group parameters are:
Target Concurrency: 50 (Threads)
RampUp Time: 10 secs
RampUp Steps Count: 5 secs
Hold Target Rate Time (sec): 5 secs
Threads Iterations Limit: infinite
The thing is that HTTP DELETE is idempotent operation i.e. if inovked on same resource (i.e. Record in database) it kind of doesn't make much sense. How can I achieve deletion of multiple EXISTING records in database by passing Entity's ID in URL? E.g.:
http://localhost:8080/api/authors/{id}
...where ID is being incremented for each User (i.e. Thread)?
My question is how can I automate deletion of multiple EXISTING rows in database (Postgres 11.8)...should I write some sort of script or is there other easier way to achieve that?
But again I guess it will probably perform multiple times same thing on same resources ID (e.g. HTTP DELETE will be invoked more than once on http://localhost:8080/api/authors/5).
Any help/advice is greatly appreciated.
P.S. I'm doing this to performance test my SpringBoot, Vert.X and Dropwizard RESTful Web service apps.
UPDATE1:
Sorry, I've didn't fully specify reason for writing these Test Use Case for my Web Service apps which communicate with Postgres DB. MAIN reason why I'm actually doing this testing is to test PERFORMANCES of blocking and NON-blocking WEB Server implementations for mentioned frameworks (SpringBoot, Dropwizard and Vert.X). Web servers are:
Blocking impelementations:
1.1. Apache Tomcat (SpringBoot)
1.2. Jetty (Dropwizard)
Non-blocking: Vert.X (uses own implementation based on Netty)
If I am using JMeter's JDBC Request in my Test Plan won't that actually slow down Test execution?
The easiest way is using either Counter config element or __counter() function in order to generate an incrementing number on each API hit:
More information: How to Use a Counter in a JMeter Test
Also the list of IDs can be obtained from the Postgres database via JDBC Request sampler and iterated using ForEach Controller
We have the first version of an application based on a microservice architecture. We used REST for external and internal communication.
Now we want to switch to AP from CP (CAP theorem)* and use a message bus for communication between microservices.
There is a lot of information about how to create an event bus based on Kafka, RabbitMQ, etc.
But I can't find any best practices for a combination of REST and messaging.
For example, you create a car service and you need to add different car components. It would make more sense, for this purpose, to use REST with POST requests. On the other hand, a service for booking a car would be a good task for an event-based approach.
Do you have a similar approach when you have a different dictionary and business logic capabilities? How do you combine them? Just support both approaches separately? Or unify them in one approach?
* for the first version, we agreed to choose consistency and partition tolerance. But now availability becomes more important for us.
Bottom line up front: You're looking for Command Query Responsibility Segregation; which defines an architectural pattern for breaking up responsibilities from querying for data to asking for a process to be run. The short answer is you do not want to mix the two in either a query or a process in a blocking fashion. The rest of this answer will go into detail as to why, and the three different ways you can do what you're trying to do.
This answer is a short form of the experience I have with Microservices. My bona fides: I've created Microservices topologies from scratch (and nearly zero knowledge) and as they say hit every branch on the way down.
One of the benefits of starting from zero-knowledge is that the first topology I created used a mixture of intra-service synchronous and blocking (HTTP) communication (to retrieve data needed for an operation from the service that held it), and message queues + asynchronous events to run operations (for Commands).
I'll define both terms:
Commands: Telling a service to do something. For instance, "Run ETL Batch job". You expect there to be an output from this; but it is necessarily a process that you're not going to be able to reliably wait on. A command has side-effects. Something will change because of this action (If nothing happens and nothing changes, then you haven't done anything).
Query: Asking a service for data that it holds. This data may have been there because of a Command given, but asking for data should not have side effects. No Command operations should need to be run because of a Query received.
Anyway, back to the topology.
Level 1: Mixed HTTP and Events
For this first topology, we mixed Synchronous Queries with Asynchronous Events being emitted. This was... problematic.
Message Buses are by their nature observable. One setting in RabbitMQ, or an Event Source, and you can observe all events in the system. This has some good side-effects, in that when something happens in the process you can typically figure out what events led to that state (if you follow an event-driven paradigm + state machines).
HTTP Calls are not observable without inspecting network traffic or logging those requests (which itself has problems, so we're going to start with "not feasible" in normal operations). Therefore if you mix a message based process and HTTP calls, you're going to have holes where you can't tell what's going on. You'll have spots where due to a network error your HTTP call didn't return data, and your services didn't continue the process because of that. You'll also need to hook up Retry/Circuit Breaker patterns for your HTTP calls to ensure they at least try a few times, but then you have to differentiate between "Not up because it's down", and "Not up because it's momentarily busy".
In short, mixing the two methods for a Command Driven process is not very resilient.
Level 2: Events define RPC/Internal Request/Response for data; Queries are External
In step two of this maturity model, you separate out Commands and Queries. Commands should use an event driven system, and queries should happen through HTTP. If you need the results of a query for a Command, then you issue a message and use a Request/Response pattern over your message bus.
This has benefits and problems too.
Benefits-wise your entire Command is now observable, even as it hops through multiple services. You can also replay processes in the system by rerunning events, which can be useful in tracking down problems.
Problems-wise now some of your events look a lot like queries; and you're now recreating the beautiful HTTP and REST semantics available in HTTP for messages; and that's not terribly fun or useful. As an example, a 404 tells you there's no data in REST. For a message based event, you have to recreate those semantics (There's a good Youtube conference talk on the subject I can't find but a team tried to do just that with great pain).
However, your events are now asynchronous and non-blocking, and every service can be refactored to a state-machine that will respond to a given event. Some caveats are those events should contain all the data needed for the operation (which leads to messages growing over the course of a process).
Your queries can still use HTTP for external communication; but for internal command/processes, you'd use the message bus.
I don't recommend this approach either (though it's a step up from the first approach). I don't recommend it because of the impurity your events start to take on, and in a microservices system having contracts be the same throughout the system is important.
Level 3: Producers of Data emit data as events. Consumers Record data for their use.
The third step in the maturity model (and we were on our way to that paradigm when I departed from the project) is for services that produce data to issue events when that data is produced. That data is then jotted down by services listening for those events, and those services will use that (could be?) stale data to conduct their operations. External customers still use HTTP; but internally you emit events when new data is produced, and each service that cares about that data will store it to use when it needs to. This is the crux of Michael Bryzek's talk Designing Microservices Architecture the Right way. Michael Bryzek is the CTO of Flow.io, a white-label e-commerce company.
If you want a deeper answer along with other issues at play, I'll point you to my blog post on the subject.