The target server failed to respond for multiple iterations in JMeter

In my JMeter script, I am getting an error on the 2nd iteration.
With multiple users and a single iteration, no errors were observed, but with multiple iterations I am getting an error with the message below:
Response code: Non HTTP response code: org.apache.http.NoHttpResponseException
Response message: Non HTTP response message: The target server failed to respond
Response data is The target server failed to respond
Could you please suggest what could be the reason behind this error?
Thanks in advance

Most likely your server becomes overloaded. As for the possible reason, my expectation is that a single iteration does not deliver the full concurrency, because JMeter acts like this:
JMeter starts all the virtual users within the specified ramp-up period
Each virtual user starts executing samplers
When there are no more samplers to execute and no loops to iterate, the thread is shut down
So with 1 iteration you may run into a situation where some threads have already finished their job and others have not been started yet. When you add more iterations, the "old" threads start over while "new" ones are still arriving. The situation is explained in the article "JMeter Test Results: Why the Actual Users Number is Lower than Expected", and you can monitor the actual delivered load using the Active Threads Over Time chart of the HTML Reporting Dashboard or the Active Threads Over Time listener available via JMeter Plugins.
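The effect can be illustrated with a rough model. The sketch below is not how JMeter schedules threads internally. It assumes that threads start evenly across the ramp-up period and that every iteration takes the same fixed time (both are simplifying assumptions, and all numbers are made up), then computes the peak number of simultaneously active threads.

```python
def peak_active_threads(threads, ramp_up_s, iteration_s, loops):
    """Peak number of concurrently active threads in a simplified model.

    Assumptions (not JMeter internals): thread i starts at
    i * ramp_up_s / threads, runs for loops * iteration_s seconds,
    and is then shut down.
    """
    starts = [i * ramp_up_s / threads for i in range(threads)]
    ends = [s + loops * iteration_s for s in starts]
    # Concurrency can only reach a new maximum at the moment a thread starts,
    # so it is enough to count active threads at each start time.
    return max(
        sum(1 for s, e in zip(starts, ends) if s <= t < e)
        for t in starts
    )


if __name__ == "__main__":
    # Hypothetical test plan: 100 users, 100 s ramp-up, 5 s per iteration.
    print(peak_active_threads(100, 100, 5, loops=1))
    print(peak_active_threads(100, 100, 5, loops=20))
```

In this model, 100 users with a single 5-second iteration never exceed 5 active threads at once, while 20 iterations keep all 100 threads active together, which is the point where the server would see the full load.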
To get to the bottom of the failure I would recommend checking the following:
component logs on the application under test side (application logs, application/web server logs, database logs)
baseline health metrics of the application under test (CPU, RAM, disk, etc.). You can use the JMeter PerfMon Plugin; this way you will be able to correlate the increasing load with resource consumption

Related

ADF Dataflow stuck in progress and fails with the errors below

The ADF pipeline DF task is stuck in progress. It was working seamlessly for the last couple of months, but suddenly the Dataflow gets stuck in progress and times out after a certain time. We are using an IR with a managed Virtual Network. I am using a ForEach loop to run the data flow for multiple entities in parallel, and it always randomly gets stuck on the last entity.
What can I try to resolve this?
Error in Dev Environment
Error Code 4508
Spark cluster not found
Error in Prod Environment:
Error code: 5000
Failure type: User configuration issue
Details:
[plugins.*** ADF.adf-ir-001 WorkspaceType:<ADF> CCID:<f289c067-7c6c-4b49-b0db-783e842a5675>] [Monitoring] Livy Endpoint=[https://hubservice1.eastus.azuresynapse.net:8001/api/v1.0/publish/815b62a1-7b45-4fe1-86f4-ae4b56014311]. Livy Id=[0] Job failed during run time with state=[dead].
I tried the steps below:
Changing the IR configuration
Setting the DF retry count and retry interval
Running the ForEach loop one batch at a time instead of 4 batches in parallel
None of the above troubleshooting steps worked. These pipelines had been running for the last 3-4 months without a single failure, and they suddenly started failing consistently over the last 3 days. The data flow always gets stuck in progress at random, for a different entity each time, and eventually times out with the errors above.
Error Code 4508 Spark cluster not found.
This error can occur for two reasons.
The first is that the debug session is closed before the dataflow finishes its transformation. In this case the recommendation is to restart the debug session.
The second is a resource problem or an outage in that particular region.
Error code 5000, Failure type: User configuration issue, Details: [plugins.*** ADF.adf-ir-001 WorkspaceType:<ADF> CCID:<f289c067-7c6c-4b49-b0db-783e842a5675>] [Monitoring] Livy Endpoint=[https://hubservice1.eastus.azuresynapse.net:8001/api/v1.0/publish/815b62a1-7b45-4fe1-86f4-ae4b56014311]. Livy Id=[0] Job failed during run time with state=[dead].
An error that says "Livy job state dead caused by unknown error" is a temporary one. A Spark cluster is used at the backend of the dataflow, and this error is generated by that Spark cluster. To get more information about the error, check the StdOut of the Spark pool execution.
The backend cluster may be experiencing a network problem, a resource problem, or an outage.
If the error persists, my suggestion is to raise a Microsoft support ticket.

Possible Stuckness: Google Cloud PubSub to Cloud Storage

I have a Dataflow streaming job that writes PubSub messages to a file that gets stored in Cloud Storage in 3-minute windows. After a few hours I notice that the "Data Freshness by stages" graph displays "Possible Stuckness" and "Possible slowness".
I have checked the logs, and the info logs display the following: "Setting socket default timeout to 60 seconds."; "socket default timeout is 60.0 seconds."; "Attempting refresh to obtain initial access_token."; "Refreshing due to a 401 (attempt 1/2)". That last log kept repeating every few minutes for four hours before the job displayed that there was possible slowness/stuckness.
I am not entirely sure what is happening here. Are these logs related to why the job slowed down and got stuck?
"Potential stuckness" and "potential slowness" are basically the same thing; both are described in the documentation.
The logs might be red herrings.
You can view all available logs by category: job-message, worker, worker-startup, etc. Try the following:
check whether there are any worker logs, to determine whether the workers started successfully with their dependencies installed;
search for "Operation ongoing" to see whether any work item is taking too much time;
check whether any error in the workers is blocking the streaming job from making progress.
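If the logs have been exported, the last two checks can be scripted. The sketch below is only an illustration: it assumes the entries are available as a list of dictionaries with hypothetical keys "severity" and "message", and the sample entries are made up, so adapt both to however your logs are actually exported.

```python
def find_suspect_entries(entries):
    """Split log entries into slow-work-item hints and errors.

    `entries` is assumed to be a list of dicts with the hypothetical keys
    "severity" and "message"; this is not a fixed Dataflow log schema.
    """
    ongoing = [e for e in entries
               if "Operation ongoing" in e.get("message", "")]
    errors = [e for e in entries
              if e.get("severity", "").upper() in ("ERROR", "CRITICAL")]
    return ongoing, errors


if __name__ == "__main__":
    # Made-up sample entries, for illustration only.
    sample = [
        {"severity": "INFO", "message": "Refreshing due to a 401 (attempt 1/2)"},
        {"severity": "WARNING", "message": "Operation ongoing (made-up example)"},
        {"severity": "ERROR", "message": "hypothetical worker failure"},
    ]
    ongoing, errors = find_suspect_entries(sample)
    print(len(ongoing), "ongoing;", len(errors), "errors")
```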

Batch account node restarted unexpectedly

I am using an Azure Batch account to run sqlpackage.exe in order to move databases from one server to another. A task that started 6 days ago was suddenly restarted and began again from the beginning after 4 days of running (extremely large databases). The task ran uninterrupted up until then and should have continued to run for about 1-2 more days.
The PowerShell script that contains all the logic handles all the exceptions that could occur during the execution. Also, the retry count for the task was set to 0 in case it fails.
Unfortunately, I did not have diagnostic settings configured, so I could only look at the metrics, and there was a short period when there wasn't any node.
What could cause this behavior, a restart while the node is still running?
Thanks
Unfortunately, there is no way to give a definitive answer to this question. You will need to dig into the compute node (interactively log in) and check system logs to give you details on why the node restarted. There is no guarantee that a compute node will have 100% uptime as there may be hardware faults or other service interruptions.
In general, it's best practice to have long running tasks checkpoint progress combined with a retry policy. Programs that can reload state can pick up at the time of the checkpoint when the Batch service automatically reschedules the task execution. Please see the Batch best practices guide for more information.
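As a rough illustration of the checkpoint idea (a minimal sketch, not Batch-specific code), the example below records each finished unit of work in a JSON file and skips those units when the task is started again. The function names, the checkpoint path, and the per-database granularity are all assumptions made for illustration, and the checkpoint file is assumed to live on storage that survives a node restart.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the list of already completed item names, or [] if none."""
    if not os.path.exists(path):
        return []
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_checkpoint(path, done):
    """Write the checkpoint via a temp file so a crash cannot leave it half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(done, f)
    os.replace(tmp, path)  # atomic rename over the old checkpoint


def run_with_checkpoint(items, process, path):
    """Process each item once, skipping those recorded in the checkpoint."""
    done = load_checkpoint(path)
    for item in items:
        if item in done:
            continue  # finished in a previous run of the task
        process(item)
        done.append(item)
        save_checkpoint(path, done)
    return done
```

Here `process` would be whatever moves one database (for example, a wrapper that invokes sqlpackage.exe). If the task is rescheduled after a node restart, only the databases not yet listed in the checkpoint are processed again.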

24-hour performance test running in a JMeter pod in AKS stopped abruptly

I am running a 24-hour load test using JMeter in Azure Kubernetes Service. I am using the Throughput Shaping Timer in my .jmx file. No listener is added to the .jmx file.
My test stopped abruptly after 6 or 7 hours.
The jmeter-server.log file in the JMeter slave pod shows this warning: WARN k.a.j.t.VariableThroughputTimer: No free threads left in worker pool.
I am using JMeter version 5.2.1 and Kubernetes version 1.19.6.
I checked, and the JMeter pods for master and slaves are continuously running (no restart happened) in AKS.
I provided 2GB of memory to the JMeter slave pod, but the load test still stops abruptly.
I am using a Log Analytics workspace for logging. I checked the ContainerLog table and found no errors.
I am using the following elements: Thread Group, Throughput Controller, HTTP Request sampler and Throughput Shaping Timer.
Please suggest a fix.
It looks like the last parameter of your Schedule Feedback Function configuration is wrong.
The warning means that the Throughput Shaping Timer attempts to increase the number of threads to reach/maintain the desired concurrency, but it doesn't have enough threads to do this.
So either increase the spare threads ratio to be closer to 1 if you're using a float value for the percentage, or increase the absolute value to match the number of threads.
Quote from documentation:
Example function call: ${__tstFeedback(tst-name,1,100,10)} , where "tst-name" is name of Throughput Shaping Timer to integrate with, 1 and 100 are starting threads and max allowed threads, 10 is how many spare threads to keep in thread pool. If spare threads parameter is a float value <1, then it is interpreted as a ratio relative to the current estimate of threads needed. If above 1, spare threads is interpreted as an absolute count.
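The rule in the quote can be sketched as simple arithmetic. This is only an illustration of the documented interpretation, not the plugin's actual code: the function name is hypothetical, and rounding up as well as treating exactly 1 as an absolute count are assumptions of this sketch.

```python
import math


def spare_threads(spare_param, threads_needed):
    """Interpret the last __tstFeedback argument as the quoted docs describe.

    A value below 1 is a ratio of the current estimate of threads needed;
    otherwise it is an absolute count. Rounding up, and treating exactly 1
    as an absolute count, are assumptions of this sketch.
    """
    if spare_param < 1:
        return math.ceil(spare_param * threads_needed)
    return int(spare_param)


if __name__ == "__main__":
    # With an estimate of 50 threads needed:
    print(spare_threads(10, 50))   # absolute count
    print(spare_threads(0.5, 50))  # ratio of the current estimate
```

With an estimate of 50 threads needed, a value of 10 keeps 10 spare threads, while 0.5 keeps 25, so the ratio form grows together with the load.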
More information: Using JMeter’s Throughput Shaping Timer Plugin
However, it doesn't explain the premature termination of the test, so make sure there are no errors in the JMeter/k8s logs. One possible reason is that the JMeter process is being terminated by the OOM killer.

Node-red app in Bluemix crashes when performance testing

I have a node-red app in Bluemix that contains 2 flows.
The first flow has 3 nodes: an Http In node, a function that reformats from 1 json object to another and an Mqlight node.
The 2nd flow has an Mqlight input node, a batcher node to batch a number of messages together, a couple of nodes to reformat, and then an HTTP request node to put the message into a Cloudant database.
I have been trying to performance test this. I feed it 1000-5000 messages over a few minutes and it crashes before all the messages are put to the database. The error just says exit status 255: CRASHED. I do not see any additional data in the logs.
Any help would be appreciated.
Memory usage: 353MB/1.625GB
Disk usage: 333MB/1GB
CPU: .3%
CRASH ERROR UPDATED 4/4/2016 with the error from the crash