How to troubleshoot and reduce communication overhead on Rockwell ControlLogix - Ethernet

Need help. We have a PLC whose CPU keeps getting maxed out. We've already upgraded it once; now we need to work on optimizing it.
We have over 50 outgoing MSG instructions, 60 incoming, and 103 Ethernet devices (flow meters, drives, etc.). I've gone through and tried to make sure that everything that can be cached is cached, that only the instructions currently needed are running, and that communications to the same PLC happen in the same scan, but I haven't made a dent.
I'm having trouble identifying which instructions are significant. It seems the connections get consolidated, so lots of MSGs shouldn't be too big of a problem. I'm considering Produced & Consumed tags, but our team isn't very familiar with them and I believe you have to do a download to modify them, which is a problem. Our I/O module RPIs are all set to around 200 ms (up from 5 ms), but that didn't seem to make a difference.
We have a shutdown this weekend and I plan on disabling everything and turning it back on one part at a time to see where the load is really coming from.
Does anyone have any suggestions? The Task Monitor doesn't have a lot of detail that I can understand, i.e. it's either too summarized or too instantaneous for me to make heads or tails of it. Here are a couple of screens from the Task Monitor to shed some light on what I'm seeing.

First question that comes to mind: are you using the continuous task, or is everything in periodic tasks?

I had a similar issue many years ago with a CLX. Rockwell suggested increasing the System Overhead Time Slice to around 40 to 50%. The default is 20%.
Some details:
Look at the System Overhead Time Slice (on the Advanced tab under Controller Properties). The default is 20%. This determines how much time the controller spends running its background tasks (communications, messaging, ASCII) relative to running your continuous task.
From Rockwell:
For example, at 25%, your continuous task accrues 3 ms of run time. Then the background tasks can accrue up to 1 ms of run time, then the cycle repeats. Note that the allotted time is interrupted, but not reduced, by higher priority tasks (motion, user periodic or event tasks).
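To make the ratio concrete, here is a back-of-the-envelope sketch (plain Python, purely illustrative, not any Rockwell tool): with a slice of s percent, the background/communication tasks get roughly 1 ms for every (100 - s)/s ms the continuous task accrues, which reproduces the 3 ms : 1 ms example quoted above at 25%.

# Illustrative only: rough continuous-vs-background split for a given
# System Overhead Time Slice percentage (assumes ~1 ms background slots).
def overhead_split(slice_pct, background_ms=1.0):
    continuous_ms = background_ms * (100 - slice_pct) / slice_pct
    return continuous_ms, background_ms

for pct in (20, 25, 40, 50):
    cont, back = overhead_split(pct)
    print(f"{pct:>2}% slice -> ~{cont:.1f} ms continuous per {back:.0f} ms background")

So raising the slice from the default 20% to 40-50% roughly doubles the share of time available to communications, at the cost of continuous-task scan time.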
Here is a detailed Word Doc from Rockwell:
https://rockwellautomation.custhelp.com/ci/fattach/get/162759
And here is a detailed KB from Rockwell:
https://rockwellautomation.custhelp.com/app/answers/detail/a_id/42964

Related

Locust eats CPU after 2-3 hours running

I have a simple HTTP server that I was testing. This server interacts with other HTTP servers and Cassandra DB.
I was using 100 users with 1 request/s each, so 100 tps total on the server. What I noticed from the Docker stats was that the CPU usage got higher and higher, and roughly 2-3 hours later it reached the 90% mark and beyond. After that I got a notice from Locust stating that the measurements may be inconsistent. But the latencies did not increase, so I don't know why this has been happening.
Can you please suggest possible cause(s) of the problem? I think 100 tps should be handled by one vCPU.
Thanks,
AM
There's no way for us to know exactly what's wrong without at the very least seeing some code, and even then other factors we wouldn't know about, like the environment, the data, or the server you're running it on or against, could be at play.
It's possible there is a problem with the code for your Locust users, such as a memory leak, or they're simply doing too much for a single worker to handle that many users. For users only doing simple HTTP calls, a single CPU can typically handle upwards of thousands of requests per second; do anything more than that and you can expect a worker to handle fewer users. It's also possible you simply need a more powerful CPU (or more RAM or bandwidth) to do what you want at the scale you want.
Do some profiling to see if you can find any inefficiencies in your code. Run smaller tests to see if the same behavior is evident with smaller loads. Run the same load but with additional Locust workers on other CPUs.
It's also just as possible your DB can't handle the load. The increasing CPU usage could be due to how your code handles waiting on the connection to the DB. Perhaps the DB could sustain, say, 80 users at an acceptable rate, but any additional users make it fall further and further behind, and your Locust users then wait longer and longer for the requested data.
For more suggestions, check out the Locust FAQ https://github.com/locustio/locust/wiki/FAQ#increase-my-request-raterps
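For reference, a minimal baseline locustfile for a scenario like yours (about 1 request/s per user) could look like the sketch below; the host and endpoint are made up, and constant(1) only approximates 1 tps per user because it waits 1 s between tasks rather than pacing them exactly:

from locust import HttpUser, task, constant

class SimpleUser(HttpUser):
    host = "http://localhost:8080"  # made-up host; point at your service
    wait_time = constant(1)         # roughly 1 request per second per user

    @task
    def get_status(self):
        self.client.get("/status")  # made-up endpoint

Running this small baseline (and profiling it) against your server helps separate load-generator overhead from server/DB behavior.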

Determine ideal number of workers and EC2 sizing for master

I have a requirement to use Locust to simulate 20,000 (and more) users in a 10-minute test window.
The locustfile is a task sequence of 9 API calls. I am trying to determine the ideal number of workers, and how many workers should be attached to each EC2 instance on AWS. My testing shows that with 20 workers on two EC2 instances, the CPU load is minimal. The master, however, suffers big time: a 4-CPU, 16 GB RAM system as the master ends up thrashing to the point that the workers start printing messages like this:
[2020-06-12 19:10:37,312] ip-172-31-10-171.us-east-2.compute.internal/INFO/locust.util.exception_handler: Retry failed after 3 times.
[2020-06-12 19:10:37,312] ip-172-31-10-171.us-east-2.compute.internal/ERROR/locust.runners: RPCError found when sending heartbeat: ZMQ sent failure
[2020-06-12 19:10:37,312] ip-172-31-10-171.us-east-2.compute.internal/INFO/locust.runners: Reset connection to master
The master seems memory-exhausted, as each Locust master process has grown to 12 GB of virtual RAM. OK, so the EC2 instance has a problem. But if I need to test 20,000 users, is there a machine big enough on the planet to handle this? Or do I need to take a different approach, and if so, what is the recommended direction?
In my specific case, one of the steps is to download a file from CloudFront, which is randomly selected in one of the tasks. This means the more open connections to CloudFront trying to download a file, the more congested the available network becomes.
Because the app client is actually a native app on a mobile device and there are a lot of factors affecting the download speed for each device, I decided to switch from a GET request to a HEAD request. This lets me test the response time from CloudFront, where the distribution is protected by a Lambda@Edge function which authenticates the user using data from earlier in the test.
Doing this dramatically improved the load test results and doesn't artificially skew the other testing, since with bandwidth or system resource exhaustion every other test would be negatively impacted.
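A hedged sketch of what that GET-to-HEAD switch could look like in a Locust task; the CloudFront hostname and file paths are placeholders, since in the real test the URL comes from earlier steps:

import random
from locust import HttpUser, task, between

FILE_URLS = ["/files/a.bin", "/files/b.bin", "/files/c.bin"]  # placeholders

class DownloadCheckUser(HttpUser):
    host = "https://dxxxxxxxxxxxx.cloudfront.net"  # made-up distribution
    wait_time = between(1, 3)

    @task
    def check_download(self):
        url = random.choice(FILE_URLS)
        # HEAD instead of GET: measures the CloudFront/Lambda@Edge response
        # time without pulling the file body and saturating load-gen bandwidth.
        self.client.head(url, name="HEAD file")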
Using this approach I successfully executed a 10,000-user test in a ten-minute run time. I used 4 EC2 t2.xlarge instances with 4 workers per instance. The 9 tasks in the test plan resulted in almost 750,000 URL calls.
The answer for the question in the title is: "It depends"
Your post is a little confusing. You say you have 10 master processes? Why?
This problem is most likely not related to the master at all, as it does not care about the size of the downloads (which seems to be the only difference between your test case and most other Locust tests).
There are some general tips that might help:
Switch to FastHttpUser (https://docs.locust.io/en/stable/increase-performance.html); see the sketch below
Monitor your network usage (if your load generators are already maxing out their bandwidth or CPU then your test is very unrealistic anyway, and adding more users just adds to the noise; in general, start low and work your way up)
Increase the number of loadgens
In general, the number of users is not an issue for Locust, but the number of requests per second or bandwidth might be.
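For the FastHttpUser suggestion, the change is usually just the base class; a minimal sketch (endpoint made up), per the docs linked above:

from locust import task, between
from locust.contrib.fasthttp import FastHttpUser

class QuickUser(FastHttpUser):
    # FastHttpUser uses geventhttpclient and can typically drive far more
    # requests per second per CPU core than the default HttpUser.
    wait_time = between(1, 2)

    @task
    def index(self):
        self.client.get("/")  # made-up endpoint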

How many parallel processes?

I am running some code in parallel using a forking module in Perl called Parallel::ForkManager. I have currently set the maximum number of processes to 30:
my $pm = Parallel::ForkManager->new(30);
What would be an advisable maximum number of processes to create? I am doing this on a commercial-grade Solaris server, but I still don't want to overload the system.
In downloading files, this really depends on
how many different hosts you're downloading from, and
how fast they will give you the requested files compared to your maximum bandwidth.
If you're downloading files from a single machine to a single machine on a local network, 2-3 is about max. If you're downloading files from 30 different servers on the internet, all of which are slow, but you have a fat pipe, then 30 might be reasonable.
There is no one universal right answer here. Unless you count "it depends."
The purpose of "downloading files" was mentioned, but only in comments a while ago, so I take the question as stated to also be more general.
The only relevant measure is the point at which you start reaching saturation in performance gains, with your particular software on that system. The formal limits are huge and meaningless, while rules of thumb are very general.
Let's imagine we run 10 processes and the time to complete the job drops 10 times. We increase to 20 processes and the time drops 20 times -- but with 30 processes the gain is only a factor of 10. At this point we have loaded the system. Push further and performance will degrade rapidly, and for everyone. At that point the server is overloaded, even though it may allow, say, 1024 processes per user (and really ten or more times that for a server).
With a few processes per core the machine is engaged, and I'd say that is a good rule of thumb. However, it is too general. I doubt that you'd gain much in performance by going to that many processes, given the many other factors that affect it.
Accessing one web server: The server's capability is the gospel. They may have posted how many requests per second they are happy with, or they may have a limit on the number of processes per user, say 10 or 20. If that means that many simultaneous downloads, then that's your limit. But I'd be careful -- if the site is close and fast, a request may complete in as little as 0.1 or 0.2 seconds. Then, with 10 processes, you may be hitting the server 100 times a second. I do not recommend that. If there is no information, I'd keep it to a few requests per second. The performance and server load also depend on the content -- big downloads are different from pulling many skinny web pages. The I/O on your side may matter, but I'd expect the server to set the limit. If you are going to use their service a lot, why not send an email and ask what they are OK with.
I/O, network (many servers) or disk: With a network, the performance depends on every piece of hardware in the path as well as on the software. Nobody can tell without trying it out. Disk I/O is very complex. To add to the trouble, it is unclear whether it'd be your disks or the network that is the bottleneck. I'd expect clear performance gains up to a few tens of processes, and probably fewer.
CPU or memory bound: This may be the easiest -- processing that can be broken up in parallel across 30 cores can enjoy close to a factor-of-30 speedup (given no other bottlenecks). Going beyond the number of cores clearly leads to reduced performance gains. Concurrent (but not parallel) processing is far more complicated. If your code is memory intensive, that is yet another matter entirely.
Useful basic tools for assessing the above components are iostat -xzn, netstat -I, and vmstat. But there is a bit of a learning curve to interpreting their output, and hopefully it doesn't come to that.
The conclusion is that you have to time it. Take your real application and time it running in one process. Do this 3 to 5 times and take the average (throwing away obvious outliers). Then repeat with 5 processes, then with 10, etc. I'd expect the trend to start slowing down far sooner than the 30 processes you mention. Once it gets to that point, the system is loaded and whoever works on it will notice. Very soon after that, performance will likely degrade rapidly. Proper benchmarking tools, like the Benchmark module, are far more sophisticated, but this may well settle the issue. If you see strange or inconsistent behavior, you may have to dig into details, starting with the tools mentioned above.
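The question is about Perl, but the timing methodology is language-agnostic; here is a rough Python sketch of the sweep described above, where do_unit_of_work is a stand-in for one unit of your real job:

import time
from concurrent.futures import ProcessPoolExecutor

def do_unit_of_work(i):
    # Stand-in for one unit of the real job (download, parse, compute, ...).
    return sum(k * k for k in range(200_000))

def timed_run(n_procs, n_jobs=120, repeats=3):
    # Best of a few repeats (a simple alternative to averaging).
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        with ProcessPoolExecutor(max_workers=n_procs) as pool:
            list(pool.map(do_unit_of_work, range(n_jobs)))
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    for n in (1, 5, 10, 20, 30):
        print(f"{n:>2} processes: {timed_run(n):6.2f} s")

When the times stop improving (or get worse) as the process count grows, you have found the saturation point for that job on that machine.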
What "overloaded" means is a bit unclear. I like to cap my use of resources well before other people are affected. But it may be possible to push it, in particular if you can run when it's quiet. I doubt that you'll keep having a worthy gain all the way to the number of available processors.
So there is no concern about "overloading" the server if you first time things. The performance limit will tell you when to stop. I'd say that your limit of 30 is very reasonable. Unless this is really about downloading files, in which case the web server is likely all that matters.
You should set the maximum number of processes to 60.

Distribution of CPU cycles when multiple process are running in parallel?

My question is: are CPU cycles given to different processes in round-robin fashion?
Context of the question: I have a Windows system and, let's say, I have 10 different processes open, for example playing music in a media player, typing in WordPad, typing in Notepad, surfing in a browser, etc.
Since the music plays in the background without interruption while I am typing in WordPad at the same time, I am wondering how the music player is given continuous CPU cycles. My understanding is that the OS rotates CPU cycles among the different processes in round-robin fashion, but this switching is so fast that the end user cannot spot the interruption in the music (though it actually is interrupted).
The simple round-robin is a way to share the computing resource among a set of processes (threads), but it is not the one used in Windows. Each thread has its static and dynamic priority. The scheduler picks the thread with the highest priority to run and gives it a time slot to execute. If the thread consumes the time slot completely, it is preemptively swapped out by the scheduler; alternatively, the thread may voluntarily give the rest of the time slot back to the system if it has nothing more to do (for example, it is waiting for an I/O operation to finish).
In your particular case there is another thing that creates the continuous play of sound: buffering. The media player reads data from the media in advance and then queues it to the hardware for playback, so the hardware always has buffered data in advance; otherwise the sound would have interruptions. Nowadays our computers are powerful enough to supply the necessary stream to the hardware even under notable load.
In the old days, if you ran many apps at once and the system started swapping processes in and out from disk (which, from the OS's point of view, is more important than giving the media player a chance to run), you could hear your music played with gaps of silence.
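To illustrate the buffering idea only (nothing here talks to real audio hardware), a toy Python producer/consumer sketch: the "player" fills a queue ahead of time and the "hardware" drains it at a steady rate, so brief stalls on the producer side do not cause gaps.

import queue
import threading
import time

buf = queue.Queue(maxsize=50)  # the read-ahead buffer

def player():
    # "Media player": decodes chunks ahead of time and queues them.
    for chunk in range(200):
        buf.put(chunk)     # blocks if the buffer is already full
        time.sleep(0.001)  # decoding is fast; brief stalls are absorbed

def hardware():
    # "Sound card": consumes one chunk every 5 ms at a steady rate.
    for _ in range(200):
        buf.get()          # would underrun (a gap in the sound) only if empty
        time.sleep(0.005)

threading.Thread(target=player, daemon=True).start()
hardware()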
a) The CPU fills a buffer in the sound card, which plays from the buffer while the CPU is doing other stuff. So the CPU doesn't have to care about the sound card all the time.
b) Switching between processes takes place in the milli- or even microsecond time range, so you just won't notice that as a human.
c) Processes that have nothing to do (like wordpad waiting for a key) tell the OS they're idle, so the OS doesn't give them any time until something happens (a key is pressed, or windows have been moved so they have to be redrawn).
d) CPUs are fast. Even if you could type 10 keys per second, the CPU won't take more than 100 microseconds per key (actually much less, but this value makes the math easier), so typing at 10 keys/s gives the CPU 1 ms of work per second. That is 0.1% of your CPU time spent on typing. You won't even see CPU usage go up in the Task Manager.
CPU cycles are not handed to processes round-robin, because round-robin would mean equal shares, and that is not the case.
You can change the share of cycles a process gets using priority (Windows) or nice (Linux). CPU time is assigned dynamically, between cycles.
In your context, giving more priority to the sound would mean lag in the keystroke repeat speed (just like on a 133 MHz computer).
A lot of extra info:
It is a common mistake to think the CPU is the only thing that makes the PC powerful.
It really isn't. It is the core, but not the whole engine.
Your sound is NOT processed by the main processor as we might think; audio chips are there for that, and they carry the actual sound-processing load. So when you press play, the CPU handles the play action, draws the player, puts the data (the MP3 file) into RAM, then sends it to the sound chip, and the sound chip does the hard work.
In any case, the computer's workflow is not CPU -> keyboard -> CPU or anything like that.
You also have to consider the RAM, where the sound is stored before being passed to the sound chip:
its latency, current state, current load...
The motherboard is the road between components (CPU, sound, VGA, others), so the total bus speed (MHz/cycles) can matter more than your raw CPU power.
As a final note: if you have problems using sound alongside other apps/processes, simply give the sound process more priority.
The "background state" of the sound player may be one reason it gets less priority by itself, but I'm not sure.
I know this is not a great answer; maybe I'll re-edit it later.
I'm just putting the ideas on the table.
I suggest you try at least 3 sound players with the same file.
If nothing changes, try the same with 3 text editors.
If nothing changes, we are in a space-time hole.
I think it is the RAM, and if not, the motherboard.
Because you don't post any data at all about your system, I cannot say anything more.
If you have 4 GB of RAM, it is not the RAM.
If you have a 133 MHz CPU, we need to talk about it :)

Optimize JBPM5 engine's performance

We are launching processes in bulk (say 100 instances at a time) in jBPM5, and each task of the process is started and completed by external programs asynchronously. In this scenario, the jBPM engine is taking a long time to generate the next task, and the overall performance suffers (e.g. it takes an average of 45 minutes to complete 100 process instances). Kindly suggest a way to optimize the performance of the jBPM5 engine.
Something must be wrong or misconfigured, as 45 minutes to complete 100 process instances seems way too much; each request should in general take significantly less than a second in normal cases. But it's difficult to figure out what might be wrong. Do you have more info on your setup and on what is actually taking up a lot of time? What types of external services are you invoking? Do you have a prototype available that we might be able to look at?
Kris
Yes, that sounds like a problem in your domain and not in the engine. Some time ago we ran performance tests for in-memory processes and for DB-persisted processes, and the latency introduced by the engine was less than 2 ms per activity (in memory) and 5 ms per activity (persisted in the database).
How exactly are you calling the engine, and how are you hosting it? What kinds of calls are you making? Do you have a way to measure how long your external services take to answer?
Cheers
Now it's faster. After completing the task with
client.complete()
I'm signalling the server using the command
ksession.getWorkItemManager().completeWorkItem(id, data);
With this, the engine generates the subsequent tasks faster and I am able to retrieve them for my processing.
But is this the ideal way of completing tasks?