array with aftercorr dependency never starts in slurm

I am trying to submit two job arrays of identical size in SLURM and have each task in array #2 start after the corresponding task in array #1 has completed. As far as I understand, this is the exact use case of --dependency=aftercorr:&lt;jobid&gt; in SLURM, as described in "one-to-one dependency between two job arrays in SLURM".
However, when I do the below, the second array remains PENDING even though many tasks in array #1 have completed:
sbatch --mem=30g -c 2 --time 10:0:0 -e %A.%a.out -o %A.%a.out --array=1-977%200 ./step1.sh
# Store the array ID as $array1
# wait 1 hour, some of them show as COMPLETED
sbatch --mem=30g -c 2 --time 10:0:0 -e %A.%a.out -o %A.%a.out --array=1-977%200 --dependency=aftercorr:$array1 ./step2.sh
# This just remains PENDING forever
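One thing worth ruling out is how the array ID is captured: sbatch's --parsable flag prints just the job ID, so nothing needs to be scraped from the "Submitted batch job ..." message. A minimal sketch of the submission pattern, using the commands from above (the fallback dummy ID is only there so the snippet can be dry-run on a machine without Slurm):

```shell
# Submit array #1; --parsable makes sbatch print just "jobid" (or
# "jobid;cluster" on a multi-cluster setup), so strip anything after
# the first semicolon. Falls back to a dummy ID without Slurm.
raw=$(sbatch --parsable --mem=30g -c 2 --time 10:0:0 \
        -e %A.%a.out -o %A.%a.out --array=1-977%200 ./step1.sh 2>/dev/null \
      || echo "12345;cluster")
array1=${raw%%;*}    # strip an optional ";cluster" suffix
echo "$array1"

# Submit array #2 so that task N starts only once task N of array #1
# has completed successfully (commented out for the dry run):
# sbatch --mem=30g -c 2 --time 10:0:0 -e %A.%a.out -o %A.%a.out \
#     --array=1-977%200 --dependency=aftercorr:$array1 ./step2.sh
```

With the ID in hand, `squeue -j "$array1"` (or `scontrol show job` on a task of array #2) shows how the dependency was actually recorded, which helps distinguish a mis-captured ID from a dependency that is genuinely never satisfied.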

Related

How to get nodename of running celery worker?

I want to shut down specific celery workers. I was using app.control.broadcast('shutdown'); however, this shuts down all the workers, so I would like to pass the destination parameter.
When I run ps -ef | grep celery, I can see the --hostname on the process.
I know that the format is {CELERYD_NODES}{NODENAME_SEP}{hostname}, from the utility function nodename:
import socket

destination = ''.join(['celery',            # CELERYD_NODES, defined at /etc/default/newfies-celeryd
                       '@',                 # NODENAME_SEP, from celery.utils
                       socket.gethostname()])
Is there a helper function which returns the nodename? I don't want to create it myself since I don't want to hardcode the value.
I am not sure if that's what you're looking for, but with control.inspect you can get info about the workers, for example:
app = Celery('app_name', broker=...)
app.control.inspect().stats() # statistics per worker
app.control.inspect().registered() # registered tasks per each worker
app.control.inspect().active() # active workers/tasks
so basically you can get the list of workers from each one of them:
app.control.inspect().stats().keys()
app.control.inspect().registered().keys()
app.control.inspect().active().keys()
for example:
>>> app.control.inspect().registered().keys()
dict_keys(['worker1@my-host-name', 'worker2@my-host-name', ..])
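Since node names follow the {name}@{hostname} pattern, the targeted shutdown can also be issued from the shell. A minimal sketch; the app name proj and worker name worker1 are placeholders, and the celery call itself is commented out because it needs a running broker:

```shell
# Build the destination node name for a worker on this machine;
# celery joins the worker name and hostname with "@".
dest="worker1@$(hostname)"
echo "$dest"

# Ask only that worker to shut down:
# celery -A proj control shutdown --destination "$dest"
```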

How to stop locust when a specific number of users are spawned with the -i command line option?

from locust import SequentialTaskSet, HttpUser, constant, task
import locust_plugins  # registers the extra command line options, e.g. -i

class MySeqTask(SequentialTaskSet):
    @task
    def get_status(self):
        self.client.get("/200")
        print("Status of 200")

    @task
    def get_100_status(self):
        self.client.get("/100")
        print("Status of 100")

class MyLoadTest(HttpUser):
    host = "https://http.cat"
    tasks = [MySeqTask]
    wait_time = constant(1)
Examples for locust-plugins command line options can be found here:
https://github.com/SvenskaSpel/locust-plugins/blob/master/examples/cmd_line_examples.sh
locust -u 5 -t 60 --headless -i 10
# Stop locust after 10 task iterations (this is an upper bound, so you can be sure no more than 10 iterations will be done)
# Note that in a distributed run the parameter needs to be set on the workers, it is (currently) not distributed from master to worker.
You will run your locust file the same way as normal but add -i to each worker you run. It sounds to me like since it's per worker, you'll need to pre-calculate how many you want each worker to run. So if you have 10 workers and you want to stop after a total of 10000 task iterations, you'd probably do -i 1000 on each worker.
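The per-worker split described above is plain integer division; a quick sketch using the numbers from the example (10 workers, 10000 iterations overall — both assumed values):

```shell
total_iterations=10000
workers=10
per_worker=$(( total_iterations / workers ))
echo "$per_worker"    # 1000

# Each worker would then be started with this limit, master/worker
# flags as in a normal distributed run (commented out here):
# locust --worker -i "$per_worker"
```

If the total does not divide evenly, the remainder has to be handed to one of the workers explicitly, since each worker only knows its own -i budget.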

rundeck make job with multiple steps on different nodes

How can a job with multiple steps run some steps on Node 1 and others on Node 2?
For example:
On Node 1, I have to copy a file to a folder cp file.txt /var/www/htm/
On Node 2, I have to download this file wget https://www.mywebsite.com/file.txt
I have tried creating three jobs:
JOB 1: the workflow has Execute Command on remote nodes cp file.txt /var/www/htm/, with the NODES filter set to NODE 1.
JOB 2: the workflow has Execute Command on remote nodes wget https://www.mywebsite.com/file.txt, with the NODES filter set to NODE 2.
JOB 3: workflow step 1 is a Job Reference with the UUID of JOB 1, step 2 is a Job Reference with the UUID of JOB 2, and for the node filter I wrote .* to get all nodes.
For now I only ran the command ls (in JOB 1 and JOB 2), but when I run JOB 3 the output repeats each job's command three times, for example:
// Run Job 3
// Output from Job 1
test-folder
test.text
test-folder
test.text
test-folder
test.text
And same for JOB 2
How can I implement my job?
Using the job reference step is the right way to solve that, but instead of defining .* to match all nodes, use the NODE 1 name in the first job reference step call and the NODE 2 name in the second, in the "Override node filters?" section. Alternatively, you can define the node filter in each referenced job and just call them from JOB 3 using job reference steps.

Batch Filtering with Multi-Filter throws a 'Class attribute not set' exception

We have a data set of 15k classified tweets with which we need to perform sentiment analysis. I would like to test against a test set of 5k classified tweets. Because Weka needs the test set's header to contain the same attributes as the training set's header, I will have to use batch filtering if I want to be able to run my classifier against this 5k test set.
However, there are several filters that I need to run my training set through, so I figured that running a MultiFilter against the training set would be a good idea. The MultiFilter works fine when not running with the batch argument, but when I try to batch filter I get an error from the CLI as it tries to execute the first filter within the MultiFilter:
CLI multiFilter command w/batch argument:
java weka.filters.MultiFilter -F "weka.filters.supervised.instance.Resample -B 1.0 -S 1 -Z 15.0 -no-replacement" \
-F "weka.filters.unsupervised.attribute.StringToWordVector -R first-last -W 100000 -prune-rate -1.0 -N 0 -S -stemmer weka.core.stemmers.NullStemmer -M 2 -tokenizer weka.core.tokenizers.AlphabeticTokenizer" \
-F "weka.filters.unsupervised.attribute.Reorder -R 2-last,first" \
-F "weka.filters.supervised.attribute.AttributeSelection -E \"weka.attributeSelection.InfoGainAttributeEval \" -S \"weka.attributeSelection.Ranker -T 0.0 -N -1\"" \
-F weka.filters.AllFilter \
-b -i input\Train.arff -o output\Train_b_out.arff -r input\Test.arff -s output\Test_b_out.arff
Here is the resultant error from the CLI:
weka.core.UnassignedClassException: weka.filters.supervised.instance.Resample: Class attribute not set!
at weka.core.Capabilities.test(Capabilities.java:1091)
at weka.core.Capabilities.test(Capabilities.java:1023)
at weka.core.Capabilities.testWithFail(Capabilities.java:1302)
at weka.filters.Filter.testInputFormat(Filter.java:434)
at weka.filters.Filter.setInputFormat(Filter.java:452)
at weka.filters.SimpleFilter.setInputFormat(SimpleFilter.java:195)
at weka.filters.Filter.batchFilterFile(Filter.java:1243)
at weka.filters.Filter.runFilter(Filter.java:1319)
at weka.filters.MultiFilter.main(MultiFilter.java:425)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at weka.gui.SimpleCLIPanel$ClassRunner.run(SimpleCLIPanel.java:265)
And here are the headers with a portion of data for both the training and test input arffs:
Training:
@RELATION classifiedTweets
@ATTRIBUTE @@sentence@@ string
@ATTRIBUTE @@class@@ {1,-1,0}
@DATA
"Conditioning be very important for curly dry hair",0
"Combine with Sunday paper coupon and",0
"Price may vary by store",0
"Oil be not really moisturizers",-1
Testing:
@RELATION classifiedTweets
@ATTRIBUTE @@sentence@@ string
@ATTRIBUTE @@class@@ {1,-1,0}
@DATA
"5",0
"I give the curl a good form and discipline",1
"I have be cowashing every day",0
"LOL",0
"TITLETITLE Walgreens Weekly and Midweek Deal",0
"And then they walk away",0
Am I doing something wrong here? I know that supervised resampling requires the class attribute to be at the bottom of the attribute list within the header, and it is... within both the test and training input files.
EDIT:
Further testing reveals that this error is not related to the batch filtering; it occurs whenever I run the supervised Resample filter from the CLI. The data I use works with every other filter I've tried in the CLI, so I don't understand why this filter is any different. Resampling the data in the GUI works fine as well.
Update:
This also happens with the SMOTE filter instead of the Resample filter.
I could not get the batch filter to work with any resampling filter. However, our workaround was to simply resample (and then randomize) the training data as step 1. From this reduced set we ran batch filters for everything else we wanted on the test set. This seemed to work fine.
You could have used the MultiFilter along with the ClassAssigner filter to make it work:
java -classpath $jcp weka.filters.MultiFilter \
  -F "weka.filters.unsupervised.attribute.ClassAssigner -C last" \
  -F "weka.filters.supervised.instance.Resample -B 1.0 -S 1 -Z 66.0"

VOD server performance test

I have two VOD servers (RTSP), each on a different machine in a local network at home (VLC and Darwin Streaming Server).
What I am trying to do is a performance test that goes as follows:
* send in 10 requests, then 50, then 100.
* redo the same, but request multiple files instead of emulating multiple accesses to a single file.
* output statistics (speed, quality, etc.).
What I have right now is openRTSP, which uses "-Q" to output QoS info, but it is nowhere near what I need.
What I need is a free tool that can help me with this; all the ones I found (diversifEye and IxLoad) are not free.
Could anyone please suggest something useful?
I found a method that should do. It is based on openRTSP with "-Q" for QoS statistics.
The trick is how to redirect the data to a file, as the QoS info only shows up after the feed is cut off. I wrote the following script to manage N readings of a video feed/playlist. It will create a file that contains the QoS info.
#!/bin/bash

f_rtsp(){
    clear
    echo -e "ENTER THE NUMBER OF STREAM USERS:"
    echo -n "USER:"
    read usr
    # Send all further output to a results file; the QoS report is only
    # printed when each feed ends, so it lands there too.
    exec &> "$HOME/Desktop/results"
    for (( i=1; i<=usr; i++ ))
    do
        echo -e "******************************* $i *****************************"
        openRTSP -Q rtsp://<url>/<playlist-name>.sdp &
    done
}

while : # Loop forever
do
    cat <<!
Benchmark.RTSP
1.RTSP consumers
2.EXIT
!
    echo -n "YOUR CHOICE? :"
    read choice
    case $choice in
        1|[rR]) f_rtsp ;;
        2|[eE]) exit ;;
        *) echo "\"$choice\" is not valid"; sleep 2 ;;
    esac
done