Why do I get ReadTimeout errors even under low load when using Locust?

I'm running load tests with Locust. I have two user types that inherit from the HttpUser class, and they both call the same endpoints with the same parameters using the Python Requests library.
Type A users make calls more often; Type B users make calls less often but pass longer query strings.
No matter how many users I spawn (very few or very many), once the test has been running for about 10 to 15 minutes the same error usually starts occurring, for just one user type (Type B):
ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host='server.name.com', port=443): Read timed out. (read timeout=None)",),)
User Type A continues making all its requests without failures, under both high and low load.
Again, both user types make the same calls, and all the individual calls have been QA'ed (they all work). It's only under load tests that they seem to fail intermittently.
Here is a view into some of the code. Below is a task, an intermediary function, and an implementation of one of my api calls. My code repeats this structure for all calls, for both users.
Task
@task(5)
def get_one_audio_file(self):
    self.do_task(self.get_one_audio_file_request)

def do_task(self, a_task):
    a_task()

def get_one_audio_file_request(self):
    path = "/files/audio/"  # Locust appends this to the host, so it needs a leading slash
    # `headers` is defined elsewhere in my code
    self.client.get(path, name="files/audio", headers=headers, verify=False)
I'm at a loss to explain what's going on here.
If I run a load test with Type A users only, the test runs fine.
Any ideas?

Related

Background task in reactive pipeline (Fire-and-forget)

I have a reactive pipeline to process incoming requests. For each request I need to call a business-relevant function (doSomeRelevantProcessing).
After that is done, I need to notify some external service about what happened. That part of the pipeline should not increase the overall response time.
Also, notifying this external system is not business critical: giving a quick response after the main part of the pipeline is finished is more important than making sure the notification is successful.
As far as I've learned, the only way to run something in the background without slowing down the overall process is to subscribe to it directly in the pipeline, thus achieving fire-and-forget behavior.
Is there a good alternative to subscribing inside the flatMap?
I'm a little worried about what might happen if notifying the external service takes longer than the original processing and a lot of requests come in at once. Could this lead to memory exhaustion or cause the overall process to block?
fun runPipeline(incoming: Mono<Request>) = incoming
    .flatMap { doSomeRelevantProcessing(it) } // this should not be delayed
    .flatMap { doBackgroundJob(it) } // this can take a moment, but is not super critical

fun doSomeRelevantProcessing(request: Request) = Mono.just(request) // do some processing

fun doBackgroundJob(request: Request) = Mono.deferContextual { ctx: ContextView ->
    val notification = "notification" // build an object from context
    // this uses non-blocking HTTP (i.e. WebClient), so it can take a second or so
    notifyExternalService(notification).subscribeOn(Schedulers.boundedElastic()).subscribe()
    Mono.just(Unit)
}

fun notifyExternalService(notification: String) = Mono.just(Unit) // might take a while
I'm answering this assuming that you notify the external service using purely reactive mechanisms, i.e. you're not wrapping a blocking service. If you are, then the answer would be different, as you're bound by the size of your bounded elastic thread pool, which could quickly become overwhelmed if you have hundreds of incoming requests per second.
(Assuming you're using reactive mechanisms, there's no need for the .subscribeOn(Schedulers.boundedElastic()) in your example: it doesn't buy you anything, since it's designed for wrapping legacy blocking services.)
Could this lead to a memory exhaustion
It's only a possibility in really extreme cases; the memory used by each individual request will be tiny. It's almost certainly not worth worrying about: if you start seeing memory issues here, you'll almost certainly be hit by other issues elsewhere.
That being said, I'd probably recommend adding .timeout(Duration.ofSeconds(5)) or similar before your inner subscribe call, to make sure the requests are killed off after a while if they haven't completed for any reason. This will prevent them from building up.
...or [can this cause] the overall process to block?
This one is easier: no, it can't.

What is ZIO error channel and how to get a feeling about what to put in it?

ZIO (https://zio.dev/) is a Scala framework that has the ZIO[R, E, A] data structure at its core, and its site gives the following information about the three type parameters:
ZIO
The ZIO[R, E, A] data type has three type parameters:
R - Environment Type. The effect requires an environment of type R. If this type parameter is Any, it means the effect has no requirements, because you can run the effect with any value (for example, the unit value ()).
E - Failure Type. The effect may fail with a value of type E. Some applications will use Throwable. If this type parameter is Nothing, it means the effect cannot fail, because there are no values of type Nothing.
A - Success Type. The effect may succeed with a value of type A. If this type parameter is Unit, it means the effect produces no useful information, while if it is Nothing, it means the effect runs forever (or until failure).
It's easy to understand what A is: it's the value returned by the function in the nominal case, i.e. what we wrote the function for.
R is a kind of dependency injection: an interesting topic, but we can just ignore it and use ZIO by always setting it to Any (there is actually an IO[E, A] = ZIO[Any, E, A] alias in the library).
That leaves the E type, which is for errors (the famous error channel). I roughly get that IO[E, A] is a kind of Either[E, A], but one that deals with effects (which is great).
My question is: why should I use an error channel EVERYWHERE in my application, and how can I decide what should go in the error channel?
1/ Why effect management with an error channel?
As a developer, one of your hardest tasks is to decide what is an error and what is not in your application, or more precisely, to discover failure modes: what is the nominal path (i.e. the goal of that code), what is an expected error that the application can deal with in some way later on, and what are unexpected errors that the application can't deal with. There is no definitive answer to that question; it depends on the application and context, so it's you, the developer, who needs to decide.
But the hardest task is to build an application that keeps its promises (your promises, since you chose what is an error and what is the nominal path) and that is not surprising, so that users, administrators, and devs (including the future you in two weeks) know what the code does in most cases without having to guess, and have the agency to adapt to that behavior, including responding to errors.
This is hard, and you need a systematic process to deal with all the possible cases without going mad.
The error channel in the IO bi-monad (and thus ZIO) helps you with that task: the IO monad helps you keep track of effects, which are the source of most errors, and the error channel makes the possible error cases explicit, so other parts of the application have the agency to deal with them if they can. You will be able to manage your effects in a pure, consistent, composable way with explicit failure modes.
Moreover, in the case of ZIO, you can import non-pure code such as legacy Java extremely easily:
val pure = ZIO.effect(someJavaCodeThrowingException)
2/ How do I choose what is an error?
So, the error channel provides a way to encode answers to "what if?" questions for future devs working on that code. "What if the database is down?" "There's a DatabaseConnectionError."
But not all "what ifs" are alike for YOUR use case, at the CURRENT application level. "What if the user is not found?" Ah, it may be a totally expected answer at the low "repository" level (like a "find" that didn't find anything), or it can be an error at another level (for example, when you are in the process of authenticating a user, it really should be there). In the first case, you will likely not use the error channel: it's the nominal path, and sometimes you don't find things. In the second case, you will likely use the error channel (UserNotFoundError).
So, as we said, errors in the error channel are typically for "what if" questions that you may want to deal with in the application, just not at that function's level. The first example, DatabaseConnectionError, may be caught higher up in the app and lead to a user message like "please try again" and a notification email to the sysadmin ("quick, take a look, something is wrong here"). The UserNotFoundError will likely be handled as an error message for the user in the login form, something like "bad login or password, try again or recover your credentials with that process".
So these cases (nominal path and expected errors) are the easy parts. But there are some "what if" questions that your application, whatever the level, has no clue how to answer. "What if I get a memory exception when I try to allocate that object?" I don't have any clue, and actually, even if I had one, it's out of the scope of the things I want to deal with in that application. So these errors DON'T go in the error channel. We call them failures, and we crash the application when they happen, because it's likely that the application is now in an unknown, dangerous, zombie state.
Again, that choice (nominal path/error channel/failure) is your choice: two applications can make different choices. For example, a one-time data-processing app that you run and then discard will likely treat all non-nominal paths as failures. There is a dev to catch the case in real time and decide whether it's important (see: Shell, Python, and any scripting where that strategy is heavily used; OK, sometimes even when there is no dev to catch errors :). At the other end of the spectrum, NASA devs put EVERYTHING in the error channel(+), even memory CORRUPTION, because for them it is an expected error, so the application needs to know how to deal with it and continue.
(+)NOTE: AFAIK they don't use ZIO (for now), but the decision process about what is an error is the same, even in C.
To go further, I (#fanf42) gave a talk at the Scala.io conference. The talk, "Systematic error management in application", is available in French here. Yes, French, I know, but the slides are available in English here! And you can ping me (see the contact info near the end of the slide deck).
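Python has no ZIO, but the three-way split described above (nominal path, expected error in the error channel, failure) can be sketched with plain return values. In this illustration `Ok`/`Err`, `find_user`, `authenticate`, `login_form_message`, and the in-memory `USERS` table are all hypothetical: an `Err` value plays the role of the error channel, while an unexpected exception such as `MemoryError` is left uncaught, i.e. treated as a failure that crashes the program.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class User:
    login: str
    password: str

# Expected errors: plain values that travel in the "error channel".
@dataclass(frozen=True)
class UserNotFoundError:
    login: str

@dataclass(frozen=True)
class WrongPasswordError:
    login: str

@dataclass(frozen=True)
class Ok:
    value: User

@dataclass(frozen=True)
class Err:
    error: Union[UserNotFoundError, WrongPasswordError]

USERS = {"alice": User("alice", "s3cret")}  # hypothetical in-memory repository

def find_user(login: str) -> Optional[User]:
    # Repository level: "not found" is a nominal outcome, so it is None, not an error.
    return USERS.get(login)

def authenticate(login: str, password: str) -> Union[Ok, Err]:
    # Authentication level: the user *should* exist, so absence is an expected error.
    # Failures (e.g. MemoryError) are deliberately not caught: they propagate and crash.
    user = find_user(login)
    if user is None:
        return Err(UserNotFoundError(login))
    if user.password != password:
        return Err(WrongPasswordError(login))
    return Ok(user)

def login_form_message(login: str, password: str) -> str:
    # A higher level decides how to present expected errors to the user.
    result = authenticate(login, password)
    if isinstance(result, Err):
        return "Bad login or password, try again or recover your credentials."
    return f"Welcome, {result.value.login}!"
```

The same "user not found" situation is a `None` in `find_user` but an `Err` in `authenticate`, mirroring the repository-level versus authentication-level distinction in the answer.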

Safety of running assertions in a separate execution context

How isolated are different execution contexts from each other? Say we have two execution contexts ec1 and ec2 both used on the same code path implementing some user journey. If, say, starvation and crashing starts happening in ec2, would ec1 remain unaffected?
For example, consider the following scenario, where we want to make sure the user was charged only once by running an assertion inside a Future:
chargeUserF andThen { case _ =>
getNumberOfChargesF map { num => assert(num == 0) }
.andThen { case Failure(e) => logger.error("User charged more than once! Fix ASAP!", e) }
}
Here getNumberOfChargesF is not necessary to fulfil the user's request; it is just a side concern where we assert on the expected state of the database after it was mutated by chargeUserF. Because it is not necessary, I feel uneasy adding it to the main business logic for fear it could break the main logic in some way. If I run getNumberOfChargesF on a different execution context from the one chargeUserF uses, can I assume that issues such as starvation, blocking, etc. caused by getNumberOfChargesF will not affect the main business logic?
Each execution context has its own thread pool, so, yeah ... kinda.
They are "independent" in the sense that if one runs out of threads, the other one might still keep going; however, they do share the same resource (CPU), so if one maxes it out, the other will obviously be affected.
They are also affected by each other's side effects. For example, the way your code is written, chargeUser and getNumberOfCharges happen in parallel, and there is no telling which one will finish first, so, if I am guessing the semantics right, the number of charges may end up being either 0 or 1 fairly randomly, depending on whether the previous future has completed or not.
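In Python, separate `concurrent.futures.ThreadPoolExecutor` instances play a role similar to separate execution contexts, so both points of this answer can be sketched there: a separate pool keeps a slow check from occupying the main pool's threads (though both still share the CPU), and starting the check only after the charge completes removes the race described above. This is an analog, not Scala code; the function names and the in-memory `charges` list are hypothetical, and the check expects exactly one charge.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

# Two separate pools stand in for the two execution contexts.
main_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="main")
audit_pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="audit")

_lock = threading.Lock()
charges: list = []        # hypothetical stand-in for the database
audit_results: list = []  # records whether each check passed

def charge_user(user: str) -> None:
    with _lock:
        charges.append(user)

def number_of_charges(user: str) -> int:
    with _lock:
        return charges.count(user)

def audit_charges(user: str) -> None:
    n = number_of_charges(user)
    audit_results.append(n == 1)
    if n != 1:
        print(f"User {user} charged {n} times! Fix ASAP!")

def charge_with_audit(user: str) -> Future:
    charge = main_pool.submit(charge_user, user)
    # The audit is only *started* once the charge has completed, so it cannot race it.
    # The callback merely submits work, so the audit itself runs on audit_pool threads.
    charge.add_done_callback(lambda _f: audit_pool.submit(audit_charges, user))
    return charge
```

The caller can return as soon as `charge_with_audit(...)` completes; a stuck audit would only tie up `audit_pool`, not the threads serving the main request.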

QUnit and Sinon, testing XHR requests

I'm relatively new to unit testing and I'm trying to figure out a way to test an XHR request in a meaningful way.
1) The request pulls various scripts and other resources onto the page. I want to make sure the correct number of resources are being loaded, and that the request is successful.
2) Should I use an actual request to the service that is providing the resource? I looked at fakeServer and the fake XHR on sinonjs.org, but I don't really see how those can provide a meaningful test.
3) I'm testing existing code, which I realize is pretty pointless, but it's what I'm required to do. That being said, there is a lot of code in certain methods that could potentially be broken down into various tests. Should I break the existing code down and create tests for my interpreted expectations? Or just write tests for what is actually there? ... if that makes any sense.
Thanks,
-John
I find it useful to use the Sinon fakeServer to return various test responses that will exercise my client-side functions. You can set up a series of tests in which a fakeServer response returns data that you can use to subsequently check the behaviour of your code. For example, if you expect ten resource objects to be returned, you can create pre-canned XML or JSON to represent those resources and then check that your code has handled them properly. In another test: what does your code do when you only receive nine objects?
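Sinon's fakeServer is JavaScript-specific, but the idea behind this answer (feed the client code canned responses, then check how it handles them) can be sketched in Python by injecting the transport and replacing it with `unittest.mock.Mock`. `load_resources`, the `/resources` path, and the JSON shape are hypothetical.

```python
import json
from typing import Callable, List
from unittest.mock import Mock

def load_resources(fetch: Callable[[str], str]) -> List[str]:
    """Hypothetical client code: fetch a JSON list of resources, return their URLs."""
    payload = json.loads(fetch("/resources"))
    return [item["url"] for item in payload["resources"]]

def canned_response(count: int) -> str:
    # Pre-canned JSON body describing `count` resources, like a fakeServer response.
    return json.dumps({"resources": [{"url": f"/static/{i}.js"} for i in range(count)]})

# Test 1: the expected ten resources are all handled.
fake_server = Mock(return_value=canned_response(10))
urls = load_resources(fake_server)
fake_server.assert_called_once_with("/resources")

# Test 2: what does the code do when only nine arrive?
nine_urls = load_resources(Mock(return_value=canned_response(9)))
```

Each test controls exactly what "the server" returns, so it checks the client logic without depending on a live service.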
Begin writing your tests to cover your existing code. When those tests pass, begin breaking up your code into easier-to-understand and meaningful units. If the tests still pass, then great, you've just refactored your code and not inadvertently broken anything. Also, now you've got smaller chunks of code that can more readily be tested and understood. From this point on you'll never look back :-)

Hosting simple Python scripts in a container to handle concurrency, configuration, caching, etc.

My first real-world Python project is to write a simple framework (or re-use/adapt an existing one) which can wrap small python scripts (which are used to gather custom data for a monitoring tool) with a "container" to handle boilerplate tasks like:
fetching a script's configuration from a file (keeping that info up to date if the file changes, and handling decryption of sensitive config data)
running multiple instances of the same script in different threads instead of spinning up a new process for each one
exposing an API for caching expensive data and storing persistent state from one script invocation to the next
Today, script authors must handle the issues above, which usually means that most script authors don't handle them correctly, causing bugs and performance problems. In addition to avoiding bugs, we want a solution which lowers the bar to create and maintain scripts, especially given that many script authors may not be trained programmers.
Below are examples of the API I've been thinking of, on which I'd like your feedback.
A scripter would need to build a single method which takes (as input) the configuration that the script needs to do its job, and either returns a python object or calls a method to stream back data in chunks. Optionally, a scripter could supply methods to handle startup and/or shutdown tasks.
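One way a container could drive a script with the shape just described (a required `run`, optional `init`/`shutdown`, and either a returned object or streamed chunks) is sketched below. `run_script` is hypothetical; it uses `getattr` to find the optional hooks and `inspect.isgenerator` to tell streaming from a single return value. For brevity it calls `init` and `shutdown` around every run, whereas a real container would likely call them once per process.

```python
import inspect
from types import SimpleNamespace

def run_script(script, config, context, cache) -> list:
    """Hypothetical container step: optional init, then run, then optional shutdown.

    `script` is any object (e.g. an imported module) exposing `run` and, optionally,
    `init` and `shutdown`, all taking (config, context, cache).
    """
    init = getattr(script, "init", None)
    shutdown = getattr(script, "shutdown", None)
    if init is not None:
        init(config, context, cache)
    try:
        result = script.run(config, context, cache)
        # A generator means the script streams its data in chunks;
        # otherwise it returned a single Python object.
        if inspect.isgenerator(result):
            return list(result)
        return [result]
    finally:
        if shutdown is not None:
            shutdown(config, context, cache)

# Usage with a stand-in script object (a real container would import a module).
def _stream(config, context, cache):
    yield {"chunk": 1}
    yield {"chunk": 2}

streaming_script = SimpleNamespace(run=_stream)
```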
HTTP-fetching script example (in pseudocode, omitting the actual data-fetching details to focus on the container's API):
def run(config, context, cache):
    results = http_library_call(config.url, config.http_method, config.username, config.password, ...)
    return {"html": results.html, "status_code": results.status, "headers": results.response_headers}

def init(config, context, cache):
    config.max_threads = 20                 # up to 20 URLs at one time (per process)
    config.max_processes = 3                # launch up to 3 concurrent processes
    config.keepalive = 1200                 # keep process alive for 20 mins (1200 s) without another call
    config.process_recycle.requests = 1000  # restart the process every 1000 requests (to avoid leaks)
    config.kill_timeout = 600               # kill the process if any call lasts longer than 10 minutes
Database-data fetching script example might look like this (in pseudocode):
def run(config, context, cache):
    expensive = cache["something_expensive"]
    last_date = context.checkpoint  # keep the old checkpoint if no new records arrive
    for record in db_library_call(expensive, context.checkpoint, config.connection_string):
        context.log(record, "logDate")  # log all properties, optionally specify name of timestamp property
        last_date = record["logDate"]
    context.checkpoint = last_date  # persistent checkpoint, used next time through

def init(config, context, cache):
    cache["something_expensive"] = get_expensive_thing()

def shutdown(config, context, cache):
    expensive = cache["something_expensive"]
    expensive.release_me()
Is this API appropriately "pythonic", or are there things I should do to make this more natural to the Python scripter? (I'm more familiar with building C++/C#/Java APIs so I suspect I'm missing useful Python idioms.)
Specific questions:
is it natural to pass a "config" object into a method and ask the callee to set various configuration options? Or is there another preferred way to do this?
when a callee needs to stream data back to its caller, is a method like context.log() (see above) appropriate, or should I be using yield instead? (yield seems natural, but I worry it'd be over the heads of most scripters)
My approach requires scripts to define functions with predefined names (e.g. "run", "init", "shutdown"). Is this a good way to do it? If not, what other mechanism would be more natural?
I'm passing the same config, context, cache parameters into every method. Would it be better to use a single "context" parameter instead? Would it be better to use global variables instead?
Finally, are there existing libraries you'd recommend to make this kind of simple "script-running container" easier to write?
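For the `yield` option raised above, one possible shape (a sketch, not a recommendation from this thread) is for the script to simply yield records while the container handles logging and the checkpoint. `fetch_rows_since` and `ROWS` are hypothetical stand-ins for `db_library_call()`; the scripter-facing part stays a plain `for` loop with `yield`.

```python
from types import SimpleNamespace
from typing import Iterator, List, Optional

# Hypothetical rows standing in for whatever db_library_call() returns.
ROWS = [
    {"id": 1, "logDate": "2024-01-01T10:00"},
    {"id": 2, "logDate": "2024-01-02T10:00"},
]

def fetch_rows_since(checkpoint: Optional[str]) -> List[dict]:
    # ISO-8601 timestamps compare correctly as plain strings.
    return [r for r in ROWS if checkpoint is None or r["logDate"] > checkpoint]

# What the scripter writes: just yield records, no logging or checkpoint API.
def run(config, context, cache) -> Iterator[dict]:
    for record in fetch_rows_since(context.checkpoint):
        yield record

# What the container does: consume the stream and own the checkpoint.
def consume(script_run, config, context, cache, sink: list) -> None:
    for record in script_run(config, context, cache):
        sink.append(record)                     # stands in for context.log(record, "logDate")
        context.checkpoint = record["logDate"]  # persisted for the next invocation

context = SimpleNamespace(checkpoint=None)
```

With this split, a scripter never touches `context.checkpoint` directly; a second invocation only sees records newer than the last one the container consumed.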
Have a look at SQLAlchemy for dealing with database stuff in Python. Also, to make script writing easier when dealing with concurrency, look into Stackless Python.