how to stop the AWS Redshift resize activity? - amazon-redshift

Resizing operation seems very slow
We have a 3-node ds2.xlarge cluster that we decided to scale down to 2 nodes. The resize has been running for the last 28 hours, but completion is only at 48% (screenshot attached). So,
Do we need to wait 30+ more hours for it to finish, and will the cluster be in read-only mode until then?
Based on this, should we conclude that a resize will usually take 60+ hours?
What if I want to terminate the process?
Please advise.

60+ hours is anomalous; per the documentation, it should take less than 48 hours:
(resizing) ... can take anywhere from a couple of hours to a couple of days.
You can't stop it from the console, but you can contact AWS support to stop it for you.
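While you wait (or before contacting support), you can track the resize from outside the console. Below is a minimal monitoring sketch using boto3; this assumes Python/boto3 with credentials and region already configured, and "my-cluster" is a placeholder for your cluster identifier:

    # Minimal sketch: poll Redshift resize progress with boto3.
    # Assumes AWS credentials/region are configured; "my-cluster" is a placeholder.
    import time
    import boto3

    redshift = boto3.client("redshift")

    while True:
        resize = redshift.describe_resize(ClusterIdentifier="my-cluster")
        done_mb = resize.get("ProgressInMegaBytes", 0)
        total_mb = resize.get("TotalResizeDataInMegaBytes", 0)
        eta_s = resize.get("EstimatedTimeToCompletionInSeconds")
        pct = 100.0 * done_mb / total_mb if total_mb else 0.0
        print(f"status={resize.get('Status')} progress={pct:.1f}% eta_seconds={eta_s}")
        if resize.get("Status") in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(300)  # check every 5 minutes

The same figures are available from the CLI via aws redshift describe-resize --cluster-identifier <cluster>.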

Related

MySQL stops responding

In a production environment, our master MySQL 5.6.26 server stops responding.
Our business handles about 1,500 transactions per minute, but there are times when nothing gets processed for many seconds (18 seconds this time).
We log SHOW FULL PROCESSLIST output every few seconds, and we see the usual queries that normally take a fraction of a second hanging for many seconds, but with no indication of why.
In the past, we had an issue with our storage provider where latency approached a second and everything fell apart, but that is not the case now; latency is a normal 5 to 20 milliseconds.
What should I look at?
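For context, here is a minimal sketch of the kind of processlist logging described above, assuming a Python client such as PyMySQL; the host and credentials are placeholders:

    # Minimal sketch: log long-running statements every few seconds.
    # Assumes PyMySQL; host/user/password are placeholders.
    import time
    import pymysql

    conn = pymysql.connect(host="db-host", user="monitor", password="secret",
                           cursorclass=pymysql.cursors.DictCursor, autocommit=True)

    QUERY = """
        SELECT id, user, host, db, command, time, state, LEFT(info, 120) AS query
        FROM information_schema.processlist
        WHERE command <> 'Sleep' AND time >= 5
        ORDER BY time DESC
    """

    while True:
        with conn.cursor() as cur:
            cur.execute(QUERY)
            for row in cur.fetchall():
                print(row)  # in practice, write to a timestamped log file
        time.sleep(5)

Filtering on time >= 5 keeps the log focused on the stalled statements rather than the normal sub-second queries.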

Dataflow TextIO.write issues with scaling

I created a simple Dataflow pipeline that reads byte arrays from Pub/Sub, windows them, and writes to a text file in GCS. I found that with lower-traffic topics this worked perfectly; however, when I ran it on a topic that does about 2.4 GB per minute, some problems started to arise.
When kicking off the pipeline I hadn't set the number of workers (I'd imagined it would autoscale as necessary). When ingesting this volume of data the number of workers stayed at 1, but TextIO.write() was taking 15+ minutes to write a 2-minute window. This would continue to get backed up until it ran out of memory. Is there a good reason why Dataflow doesn't autoscale when this step gets so backed up?
When I increased the number of workers to 6, the time to write the files started at around 4 minutes for a 5-minute window, then came down to as little as 20 seconds.
Also, when using 6 workers, it seems like there might be an issue with how wall time is calculated? Mine never seems to go down even when the pipeline has caught up, and after running for 4 hours my summary for the write step looked like this:
Step summary
Step name: Write to output
System lag: 3 min 30 sec
Data watermark: Max watermark
Wall time: 1 day 6 hr 26 min 22 sec
Input collections: PT5M Windows/Window.Assign.out0
Elements added: 860,893
Estimated size: 582.11 GB
Job ID: 2019-03-13_19_22_25-14107024023503564121
So for each of your questions:
Is there a good reason why Dataflow doesn't auto scale when this step gets so backed up?
Streaming autoscaling is a beta feature and must be explicitly enabled for it to work, per the documentation here.
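Enabling it amounts to passing the autoscaling options when launching the job. Your pipeline appears to use the Java SDK, where the flags are --autoscalingAlgorithm=THROUGHPUT_BASED and --maxNumWorkers=N; as an illustrative sketch, the equivalent with the Beam Python SDK looks like this (project and bucket names are placeholders):

    # Sketch: opting in to streaming autoscaling on Dataflow (Beam Python SDK).
    # Project and bucket names are placeholders.
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        "--runner=DataflowRunner",
        "--project=my-project",
        "--temp_location=gs://my-bucket/tmp",
        "--streaming",
        "--autoscaling_algorithm=THROUGHPUT_BASED",  # enable streaming autoscaling
        "--max_num_workers=10",                      # upper bound for scaling
    ])
    # Pass `options` when constructing the pipeline, e.g. beam.Pipeline(options=options).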
When using 6 workers, it seems like there might be an issue for calculating wall time?
My guess would be that you ran your 6-worker pipeline for about 5 hours and 4 minutes, so the "Wall time" presented is workers × hours: 6 workers × ~5 hr 4 min ≈ 30 hr 26 min, which matches the reported 1 day 6 hr 26 min.

How to troubleshoot and reduce communication overhead on Rockwell ControlLogix

Need help. We have a PLC whose CPU keeps getting maxed out. We've already upgraded it once; now we need to work on optimizing it.
We have over 50 outgoing MSG instructions, 60 incoming, and 103 Ethernet devices (flow meters, drives, etc.). I've gone through and tried to make sure that everything that can be cached is cached, that only instructions that are currently needed are running, and that communications to the same PLC happen in the same scan, but I haven't made a dent.
I'm having trouble identifying which instructions are significant. It seems the connections get consolidated, so lots of MSGs shouldn't be too big a problem. I'm considering Produced/Consumed tags, but our team isn't very familiar with them and I believe you have to do a download to modify them, which is a problem. Our I/O module RPIs are all set to around 200 ms (up from 5 ms), but that didn't seem to make a difference.
We have a shutdown this weekend and I plan on disabling everything and turning it back on one part at a time to see where the load is really coming from.
Does anyone have any suggestions? The Task Monitor doesn't have a lot of detail that I can understand, i.e., it's either too summarized or too instantaneous for me to make heads or tails of it. Here are a couple of screens from the Task Monitor to shed some light on what I'm seeing.
The first question that comes to mind: are you using the Continuous Task, or is everything in Periodic tasks?
I had a similar issue many years ago with a CLX. Rockwell suggested increasing the System Overhead Time Slice to around 40 to 50%. The default is 20%.
Some details:
Look at the System Overhead Time Slice (Advanced tab under Controller Properties). The default is 20%. This determines how much time the controller spends running its background tasks (communications, messaging, ASCII) relative to running your continuous task.
From Rockwell:
For example, at 25%, your continuous task accrues 3 ms of run time. Then the background tasks can accrue up to 1 ms of run time, then the cycle repeats. Note that the allotted time is interrupted, but not reduced, by higher priority tasks (motion, user periodic or event tasks).
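To put numbers on that, here is a rough sketch of the split per cycle at a few settings, assuming the fixed 1 ms overhead slice implied by the quoted 25% example:

    # Rough sketch: continuous task vs. system overhead time per cycle, assuming the
    # overhead slice is a fixed 1 ms as in the quoted 25% example (3 ms + 1 ms).
    OVERHEAD_MS = 1.0

    def split(time_slice_pct):
        continuous_ms = OVERHEAD_MS * (100.0 - time_slice_pct) / time_slice_pct
        return continuous_ms, OVERHEAD_MS

    for pct in (20, 25, 40, 50):
        cont, ovh = split(pct)
        print(f"{pct}% time slice: {cont:.1f} ms continuous, {ovh:.1f} ms overhead per cycle")

    # 20% -> 4.0 ms + 1 ms (default), 25% -> 3.0 ms + 1 ms (as quoted),
    # 40% -> 1.5 ms + 1 ms, 50% -> 1.0 ms + 1 ms.

In other words, going from 20% to 50% takes the controller from spending roughly a fifth of its time on communications to roughly half, which is why it can make such a difference on a comms-heavy CLX.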
Here is a detailed Word Doc from Rockwell:
https://rockwellautomation.custhelp.com/ci/fattach/get/162759
And here is a detailed KB from Rockwell:
https://rockwellautomation.custhelp.com/app/answers/detail/a_id/42964

Apache Spark is taking longer than what the UI shows

I am running a series of steps in PySpark. It takes about 56 minutes to complete, but when I go to the UI, I can see a breakdown of only 9-12 minutes in one stage, with all other stages in milliseconds. Is there any way I can reduce the wait time and bring the run time down to those 9-12 minutes?
I highly appreciate your time. Thanks.

Does "concurrency" limit of 10 guarantee 10 parallel slice runs?

In ADF we can define a concurrency limit of up to 10. So, assuming we set it to 10 and slices are waiting to run (not waiting on a dataset dependency, etc.), is there always a guarantee that 10 slices will be running in parallel at any given time? I have noticed that even after setting it to 10, sometimes only a couple of them are in progress, or maybe the UI just doesn't show it properly. Is it subject to available resources? But it's the cloud, after all, with virtually infinite resources. Has anyone noticed anything like this?
If there are 10 slices to be run in parallel, and all of their dependencies have been met, then 10 slices will run in parallel. Do raise an Azure support ticket if you do not see this happening and we will look into it. There may be a small delay in kicking all 10 off, but 10 should run in parallel.
Thanks, Harish