Helm and Kubernetes: Is there a barrier equivalent for jobs?

Given 3 jobs (A,B,C) in 3 Helm charts, is it possible to run A and B jobs in parallel, then start job C as soon as both of them are finished? Think of a barrier, in which a bunch of stuff needs to be finished before moving on.
Even if I put the A and B charts as sub-charts of the C chart, all 3 are still started in parallel.
I already have a workaround for this: an external check that waits for the A and B jobs to finish, then starts C. Still, I would prefer a Helm-based solution, if one exists.

Kubernetes isn't a batch job framework/scheduler and does not fit your advanced batch framework requirements.
My recommendation would be to use a real batch framework like Luigi, which also supports scheduling Luigi jobs on Kubernetes.
Look here for an example of how to do this.

Indeed, Kubernetes seems to be quite basic when it comes to scheduling jobs.
We'll move to Luigi at some point in the future for advanced job scenarios.
For now, I wrote this small awk/bash-based workaround. Perhaps it can help others in a similar situation.
while true; do
  sleep 60
  # done means the third column of 'kubectl get jobs' is 'SUCCESSFUL' (header) followed by '1' only
  is_done=$(kubectl get jobs 2>&1 | awk '{print $3}' | uniq |
    awk 'BEGIN{no_lines=0; no_all_lines=0}
         {no_all_lines++;
          if(NR==1 && $1=="SUCCESSFUL") {no_lines++};
          if(NR==2 && $1=="1") {no_lines++}}
         END{if(no_lines==no_all_lines) {print "1"} else {print "0"}}')
  if [ "${is_done}" = "1" ]; then
    echo "Finished all jobs"
    break
  else
    echo "One or more jobs are still running, checking again in 1 minute"
  fi
done
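If your kubectl version supports it, kubectl wait can express the same barrier without a hand-rolled polling loop. A minimal sketch, assuming the two jobs are named job-a and job-b and that C is installed from a local chart-c directory (those names are assumptions, not from the question):
# Block until both jobs report the Complete condition (or the timeout expires), then install C
kubectl wait --for=condition=complete job/job-a --timeout=600s &&
kubectl wait --for=condition=complete job/job-b --timeout=600s &&
helm install job-c ./chart-c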

Related

skip Helm uninstall interactive request

I want to automate helm install/uninstall a bit, but during the helm uninstall command it becomes interactive, asking:
Do you want to continue to delete suite-helm? (yY|nN):
Is there any flag or way to skip this part?
Thanks in advance.
Finally, I found a way using expect; here it is:
expect -c ' spawn ./helm_remove.sh; expect (yY|nN); send "y\n"; interact'
Inside the .sh file I have helm uninstall suite-helm -n suite-helm and some other commands to remove PVs, deployments...
You would have to wrap it in a shell script or function.
Something like this (just spitballing here):
helm-delete() {
  helm status "$1"
  echo "Do you want to continue to delete suite-helm? (yY|nN):"
  read -rs -n 1 ans
  case "${ans}" in
    y|Y|""|$'\n') # y, Y or Enter counts as yes
      printf "Yes\n"
      helm delete "$1"
      ;;
    *) # This is the default
      printf "No\n"
      return 0
      ;;
  esac
}
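A hypothetical usage, assuming the release is called suite-helm as in the question:
# Prompts once; y, Y or Enter deletes the release, anything else aborts
helm-delete suite-helm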

determine required permissions for AWS CDK

I'm working with AWS CDK and every time I go to create a new resource (CodePipeline, VPC, etc) I end up in the same loop of...
try to deploy
"you are not authorized to foo:CreateBar"
update IAM permissions
try to deploy
"you are not authorized to baz:CreateZzz"
update IAM permissions
...over and over again. Then the same when I cdk destroy, but for "foo:DeleteFoo"
Is there a more efficient way to determine what permissions a policy needs to perform a certain CDK action? Maybe something in the documentation I can reference?
Thanks
Here is a script that will execute whatever you pass to it, capture the start and end timestamps around that command, and then print all the AWS API events recorded by CloudTrail for the configured default AWS user in that time range. It can take around 20 minutes for the actions to show up in CloudTrail, so the script checks every minute until it gets results for the range. If no AWS API calls were made during the range, no results will ever be returned; it's a simple script, with no maximum timeout or anything.
#!/bin/bash -x
# Extract the IAM user name from the caller identity ARN
user_name=$(aws sts get-caller-identity | jq -r '.Arn' | sed -e 's/user\// /g' | awk '{print $2}')
sleep 5 # Sleep to avoid getting the sts call in our time range
start_time=$(date)
sleep 1 # Sleep to avoid millisecond rounding issues
eval "$@"
sleep 1 # Sleep to avoid millisecond rounding issues
end_time=$(date)
actions=""
while [ -z "$actions" ]; do
  sleep 60
  echo "Checking for events from $start_time to $end_time..."
  actions=$(aws cloudtrail lookup-events \
      --lookup-attributes AttributeKey=Username,AttributeValue=${user_name} \
      --start-time "${start_time}" --end-time "${end_time}" \
    | jq -r '.Events[].CloudTrailEvent' | jq -s '.' \
    | jq -r '.[] | "\(.eventSource) \(.eventName)"' \
    | sed -e 's/.amazonaws.com /:/g' | sed -e 's/[0-9]//g' | sort | uniq)
done
echo "AWS Actions Used:"
echo "$actions"
I call it get-aws-actions.sh, and it requires the AWS CLI and jq to be installed. For CDK I would use it like this:
./get-aws-actions.sh "cdk deploy && cdk destroy"
I'd have my admin-level credentials configured as the default profile so I know the deployment will not fail because of permission issues, then I use the results returned by this script to grant permissions to a more specific deployment user/role for long-term use. The problem you can run into is that the first time you may only see a bunch of :Create* or :Add* actions, but really you'll need to add all the lifecycle actions for the ones you see. So if you see dynamodb:CreateTable you'll want to make sure you also add UpdateTable and DeleteTable. If you see s3:PutBucketPolicy you'll also want s3:DeleteBucketPolicy.
To be honest, for any service whose API calls don't also give access to data, I will just grant <service>:*. An example might be ECS: there is nothing I can do to a container through the ECS API that CloudFormation won't also need to do to manage the service. So for that service, if I knew I was deploying containers, I'd just grant ecs:* on * to my deployer role. For a service like S3, Lambda, SQS or SNS, where there is data access as well as resource creation access through the API, I need to be more deliberate with the permissions granted. My deployer role shouldn't have access to read all the data off all buckets or execute functions, but it does need to create buckets and functions.
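As a rough illustration of the lifecycle advice above, a deployer policy fragment might look like the following. The action lists, the file name and the role/policy names are assumptions for the example, not output of the script:
# Hypothetical deployer policy: wildcard a data-free service (ECS), but enumerate
# lifecycle actions for a service that also exposes data (DynamoDB shown here)
cat > deployer-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "ecs:*", "Resource": "*" },
    { "Effect": "Allow",
      "Action": ["dynamodb:CreateTable", "dynamodb:UpdateTable", "dynamodb:DeleteTable"],
      "Resource": "*" }
  ]
}
EOF
aws iam put-role-policy --role-name my-deployer-role \
  --policy-name cdk-deploy --policy-document file://deployer-policy.json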

192 days to build Europe tiles

Hi and thanks for all the good work on OpenMapTiles.
I'm trying to build tiles for Europe, North-America, maybe world.
I'm using the ./quickstart script, and it says it will take 30 days to build the tiles for America and 192 days for Europe.
This is running on a c5d.18xlarge EC2 instance (70 CPU, 180G RAM, SSD disks).
Am I missing something?
I'm currently trying to use a database outside of Docker (on localhost) to see if I can speed things up... but how are you guys doing it?
I'm using this
https://github.com/mapbox/mbutil/blob/5e1ac74fdf7b0f85cfbbc245481e1d6b4d0f440d/patch
This is one of my scripts: it merges everything into /tmp/final.mbtiles, checking first that no tilelive-copy is still in progress on each source file.
for i in *.mbtiles; do
  [ -f "$i" ] || break
  if [[ $i != *"final.mbtiles"* ]]; then
    # skip files that a tilelive-copy process still has open (-a ANDs the two conditions)
    if ! [[ $(lsof -a -c /tilelive-copy/ "$i") ]]; then
      out=$(/usr/local/bin/merge_mbtiles.sh "$i" /tmp/final.mbtiles)
      rc=$?
      echo "$out"
      (( rc )) && echo "merge failed $i" && exit 1
      echo "merge successful"
    fi
  fi
done
I'm using openmaptiles too, and the speed slowed down extremely after the last updates (I still have to figure out which changes caused that).
The quickstart script is nice for trying things out, but in the end I started writing scripts to split and parallelise the work. Right now we are processing the whole world with zoom 0-14 (fast) and most of Europe with zoom 14-18 (that takes weeks).
Try the following:
* tune postgres (the defaults are bad for large databases; see the sketch below)
* try to split the areas and parallelise the work
You can see that the rendering process with tilelive-copy is not really using all cores. The whole process isn't that effective in using resources. After a couple of tries I figured out that starting multiple workers in parallel is faster (in the end) than supersizing your server with more CPU core speed.
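For reference, a minimal Postgres tuning sketch for the import database; the exact values are assumptions and need to be sized to your machine (the c5d.18xlarge in the question has RAM to spare):
# Hypothetical starting values for a large import; adjust to your hardware
# and restart PostgreSQL afterwards so the settings take effect.
psql -U postgres -c "ALTER SYSTEM SET shared_buffers = '32GB';"
psql -U postgres -c "ALTER SYSTEM SET work_mem = '256MB';"
psql -U postgres -c "ALTER SYSTEM SET maintenance_work_mem = '8GB';"
psql -U postgres -c "ALTER SYSTEM SET max_wal_size = '16GB';"
psql -U postgres -c "ALTER SYSTEM SET checkpoint_completion_target = 0.9;"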
see also:
https://github.com/openmaptiles/openmaptiles/issues/462
https://github.com/mapbox/tilelive/issues/181

Is it possible to list all tags across all behat tests?

I have several hundred behat tests created by many people who used different tags. I want to clean this up, and to start with I want to list out all the tags which have been used so far.
I wanted to answer my own question as it was something I could not find an answer to elsewhere.
I tried initially to use a custom formatter but that did not work.
https://gist.github.com/paulmozo/fb23d8fb436700381a06
Eventually I crafted a Bash command to suit my purposes:
bin/behat --dry-run 2>&1 | tr ' ' '\n' | grep -w '@.*' | sort -u
This runs the behat command with --dry-run, which does not execute the tests but merely outputs the steps, so I can pipe them to another tool. The 2>&1 redirects standard error into standard output so it is included in the pipe (this is shell dependent). The tr tool breaks every word in the stream onto a separate line. The grep searches for words starting with the @ symbol, i.e. the tags. Finally, sort -u sorts the list and returns the unique entries.
This command takes about 15 seconds to run and did the job perfectly for me.
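If you'd rather not boot Behat at all, the same list can usually be pulled straight from the feature files with grep. A sketch, assuming the suite keeps its *.feature files under a features/ directory:
# Extract anything that looks like a Gherkin tag (@word) from the feature files
grep -rhoE --include='*.feature' '@[A-Za-z0-9_-]+' features/ | sort -u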

Script response if md5sum returns FAILED

Say I had a script that checked honeypot locations using md5sum.
#!/bin/bash
#cryptocheck.sh
#Designed to check md5 CRC's of honeypot files located throughout the filesystem.
#Must develop file with specific hashes and create crypto.chk using following command:
#/opt/bin/md5sum * > crypto.chk
#After creating file, copy honeypot folder out to specific folders
locations=("/share/ConfData" "/share/ConfData/Archive" "/share/ConfData/Application"
"/share/ConfData/Graphics")
for i in "${locations[#]}"
do
cd "$i/aaaCryptoAudit"
/opt/bin/md5sum -c /share/homes/admin/crypto.chk
done
And the output looked like this:
http://pastebin.com/b4AU4s6k
Where would you start to try and recognize the output and perhaps trigger some sort of response by the system if there is a 'FAILED'?
I've worked a bit with Perl trying to parse log files before, but my attempts typically failed miserably for one reason or another.
This may not be the proper way to go about it, but I'd want to put this script into a cron job that runs every minute. Some people told me that an inotify job or script (I'm not familiar with this) would be better than doing it this way.
Any suggestions?
--- edit
I made another script to call the script above and send the output to a file. The new script then runs a grep -q on 'FAILED' and if it picks anything up, it sounds the alarm (tbd what the alarm will be).
#!/bin/bash
#cryptocheckinit.sh
#
#rm /share/homes/admin/cryptoalert.warn
/share/homes/admin/cryptocheck.sh > /share/homes/admin/cryptoalert.warn
grep -q "FAILED" /share/homes/admin/cryptoalert.warn && echo "LIGHT THE SIGNAL FIRES"
Use:
if ! /opt/bin/md5sum -c /share/homes/admin/crypto.chk
then
# Do something
fi
Or pipe the output of the loop:
for i in "${locations[#]}"
do
cd "$i/aaaCryptoAudit"
/opt/bin/md5sum -c /share/homes/admin/crypto.chk
done | grep -q FAILED && echo "LIGHT THE SIGNAL FIRES"
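If you do run this from cron every minute as described in the question, a crontab entry could look like the following; the alarm script path is a placeholder for whatever "light the signal fires" means on your system:
# Runs every minute; /share/homes/admin/alarm.sh is hypothetical
* * * * * /share/homes/admin/cryptocheck.sh 2>&1 | grep -q FAILED && /share/homes/admin/alarm.sh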