How can I programmatically check whether a Spark job succeeded or failed when running spark-submit? Usually the Unix exit code is used.
phase: Failed
container status:
container name: spark-kubernetes-driver
container image: <regstry>/spark-py:spark3.2.1
container state: terminated
container started at: 2022-03-25T19:10:51Z
container finished at: 2022-03-25T19:10:57Z
exit code: 1
termination reason: Error
2022-03-25 15:10:58,457 INFO submit.LoggingPodStatusWatcherImpl: Application Postgres-Minio-Kubernetes.py with submission ID spark:postgres-minio-kubernetes-py-b70d3f7fc27829ec-driver finished
2022-03-25 15:10:58,465 INFO util.ShutdownHookManager: Shutdown hook called
2022-03-25 15:10:58,466 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-3321e67c-73d5-422d-a26d-642a0235cf23
The process failed, but when I check the exit code in Unix with echo $?, it returns zero!
$ echo $?
0
The pod name is also generated randomly. Apart from using the spark-on-k8s operator, how can the result of spark-submit be handled?
If you are using bash, one way is to grep on the output. You might have to grep on stderr or stdout depending on where the log output is being sent.
Something like this:
OUTPUT=`spark-submit ...`
if echo "$OUTPUT" | grep -q "exit code: 1"; then
  exit 1
fi
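Note that spark-submit usually writes its status logging to stderr rather than stdout, so you may need a variant that captures both streams before grepping; a minimal sketch:
OUTPUT=$(spark-submit ... 2>&1)   # capture stdout and stderr together
if echo "$OUTPUT" | grep -q "exit code: 1"; then
  exit 1
fi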
In addition to what @Rico mentioned, I have also handled the cluster and client deployment modes by changing the spark-submit shell script in the $SPARK_HOME/bin directory as follows.
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
if [ -z "${SPARK_HOME}" ]; then
source "$(dirname "$0")"/find-spark-home
fi
# disable randomized hash for string in Python 3.3+
export PYTHONHASHSEED=0
# check deployment mode.
if echo "$@" | grep -q "\-\-deploy-mode cluster";
then
    echo "cluster mode..";
    # temp log file for spark job.
    export TMP_LOG="/tmp/spark-job-log-$(date '+%Y-%m-%d-%H-%M-%S').log";
    exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@" |& tee ${TMP_LOG};
    # when exit code 1 is contained in spark log, then return exit 1.
    if cat ${TMP_LOG} | grep -q "exit code: 1";
    then
        echo "exit code: 1";
        rm -rf ${TMP_LOG};
        exit 1;
    else
        echo "job succeeded.";
        rm -rf ${TMP_LOG};
        exit 0;
    fi
elif echo "$@" | grep -q "\-\-conf spark.submit.deployMode=cluster";
then
    echo "cluster mode..";
    # temp log file for spark job.
    export TMP_LOG="/tmp/spark-job-log-$(date '+%Y-%m-%d-%H-%M-%S').log";
    exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@" |& tee ${TMP_LOG};
    # when exit code 1 is contained in spark log, then return exit 1.
    if cat ${TMP_LOG} | grep -q "exit code: 1";
    then
        echo "exit code: 1";
        rm -rf ${TMP_LOG};
        exit 1;
    else
        echo "job succeeded.";
        rm -rf ${TMP_LOG};
        exit 0;
    fi
else
    echo "client mode..";
    exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"
fi
Then I built and pushed my Spark Docker image.
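For reference, Spark's bundled docker-image-tool.sh can do that build and push; a rough sketch, where <registry> is a placeholder and the PySpark Dockerfile path may vary by Spark version:
cd "${SPARK_HOME}"
# build and push the PySpark image containing the modified bin/spark-submit
./bin/docker-image-tool.sh -r <registry> -t spark3.2.1 \
  -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
./bin/docker-image-tool.sh -r <registry> -t spark3.2.1 push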
See the following link for more details:
https://itnext.io/things-to-consider-to-submit-spark-jobs-on-kubernetes-766402c21716
Is it possible to make the build stage parallel?
Today the build stage builds and deploys all the images in sequence, which takes quite a lot of time. It would save a lot of time if each image were built in parallel with the others (same as the deploy stage).
The deploy stage does run in parallel, unless you opt to deploy them in order with the stages.deployments field in your pipeline manifest.
As for the build stage, you can make changes to your own pipeline's buildspec, specifically in this block:
for env in $pl_envs; do
  tag=$(sed 's/:/-/g' <<<"${CODEBUILD_BUILD_ID##*:}-${env}" | rev | cut -c 1-128 | rev)
  for svc in $svcs; do
    ./copilot-linux svc package -n $svc -e $env --output-dir './infrastructure' --tag $tag --upload-assets;
    if [ $? -ne 0 ]; then
      echo "Cloudformation stack and config files were not generated. Please check build logs to see if there was a manifest validation error." 1>&2;
      exit 1;
    fi
  done;
  for job in $jobs; do
    ./copilot-linux job package -n $job -e $env --output-dir './infrastructure' --tag $tag --upload-assets;
    if [ $? -ne 0 ]; then
      echo "Cloudformation stack and config files were not generated. Please check build logs to see if there was a manifest validation error." 1>&2;
      exit 1;
    fi
  done;
done;
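If you also want the package calls themselves to run concurrently, one option is to background them and wait for all of them, at the cost of interleaved logs; a rough sketch of such a change (not something copilot generates for you):
# hypothetical variant: package all services of one environment in parallel
fail=0
for svc in $svcs; do
  ./copilot-linux svc package -n $svc -e $env --output-dir './infrastructure' --tag $tag --upload-assets &
done
for pid in $(jobs -p); do
  wait "$pid" || fail=1
done
if [ $fail -ne 0 ]; then
  echo "Cloudformation stack and config files were not generated. Please check build logs." 1>&2
  exit 1
fi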
When deploying an application into dedicated Bluemix, it uses the DEA architecture by default. How can I force it to use the Diego architecture instead?
You have to use more steps: deploy without starting, switch to Diego, then start.
cf push APPLICATION_NAME --no-start
cf enable-diego APPLICATION_NAME
cf start APPLICATION_NAME
Ref Deploying Apps
I built a bash exec to do this, which will use your existing manifest.yml file and pack all of this into a single request. The contents of the bash exec follow:
#!/bin/bash
filename="manifest.yml"
if [ -e "$filename" ]; then
  echo "using manifest.yml file in this directory"
else
  echo "no manifest.yml file found. exiting"
  exit -2
fi

shopt -s nocasematch
string='name:'
targetName=""

echo "Retrieving name from manifest file"
while read -r line
do
  name="$line"
  variable=${name%%:*}
  if [[ $variable == *"name"* ]]
  then
    inBound=${name#*:}
    targetName="$(echo -e "${inBound}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
  fi
done < "$filename"

if [ "$targetName" == "" ]; then
  echo "Could not find name of application in manifest.yml file. Cancelling build."
  echo "application name is identified by the 'name: ' term in the manifest.yml file"
  exit -1
else
  echo "starting cf push for $targetName"
  cf push --no-start
  echo "cf enable-diego $targetName"
  cf enable-diego $targetName
  echo "cf start $targetName"
  cf start $targetName
  exit 0
fi
Just put this code into your editor as a new file and then make the file executable. I keep a copy of this exec in each of my repos in the root directory. After doing a copy-paste and executing this exec, you may get the following error:
/bin/bash^M: bad interpreter: No such file or directory
If you do, just run the dos2unix command and it will 'fix up' the line endings to match your OS.
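For example, assuming you saved the script as deploy-diego.sh (the name is just a placeholder):
dos2unix deploy-diego.sh
chmod +x deploy-diego.sh
./deploy-diego.sh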
Hi great people of Stack Overflow,
We're hosting a Docker container on EB with Node.js-based code running on it.
When redeploying our Docker container, we'd like the old one to do a graceful shutdown.
I've found help & guides on how our code could receive a SIGTERM signal produced by the 'docker stop' command.
However, further investigation into the EB machine running Docker, at:
/opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh
shows that when "flipping" from the current to the new staged container, the old one is killed with 'docker kill'.
Is there any way to change this behaviour to 'docker stop'?
Or in general a recommended approach to handling graceful shutdown of the old container?
Thanks!
Self-answering, as I've found a solution that works for us:
tl;dr: use .ebextensions scripts to run your script before 01flip; your script will make sure a graceful shutdown of whatever is inside the Docker container takes place.
First,
your app (or whatever you're running in Docker) has to be able to catch a signal, SIGINT for example, and shut down gracefully upon it.
This is totally unrelated to Docker; you can test it anywhere (locally, for example).
There is a lot of info on the net about getting this kind of behaviour for different kinds of apps (be it Ruby, Node.js, etc.).
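As a shell-level illustration of the idea (not tied to any particular framework; the echo stands in for your app's real work):
#!/bin/sh
# minimal sketch: trap SIGINT, clean up, then exit 0
graceful_exit() {
  echo "SIGINT received, finishing current work..."
  # flush queues / close connections here
  exit 0
}
trap 'graceful_exit' INT
while true; do
  echo "working..."   # stand-in for the app's actual work loop
  sleep 1
done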
Second,
your EB/Docker-based project can have a .ebextensions folder that holds all kinds of scripts to execute while deploying.
We put two custom scripts into it, gracefulshutdown_01.config and gracefulshutdown_02.config, which look something like this:
# gracefulshutdown_01.config
commands:
  backup-original-flip-hook:
    command: cp -f /opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh /opt/elasticbeanstalk/hooks/appdeploy/01flip.sh.bak
    test: '[ ! -f /opt/elasticbeanstalk/hooks/appdeploy/01flip.sh.bak ]'
  cleanup-custom-hooks:
    command: rm -f 05gracefulshutdown.sh
    cwd: /opt/elasticbeanstalk/hooks/appdeploy/enact
    ignoreErrors: true
and:
# gracefulshutdown_02.config
commands:
  reorder-original-flip-hook:
    command: mv /opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh /opt/elasticbeanstalk/hooks/appdeploy/enact/10flip.sh
    test: '[ -f /opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh ]'

files:
  "/opt/elasticbeanstalk/hooks/appdeploy/enact/05gracefulshutdown.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/bin/sh

      # find currently running docker
      EB_CONFIG_DOCKER_CURRENT_APP_FILE=$(/opt/elasticbeanstalk/bin/get-config container -k app_deploy_file)
      EB_CONFIG_DOCKER_CURRENT_APP=""
      if [ -f $EB_CONFIG_DOCKER_CURRENT_APP_FILE ]; then
        EB_CONFIG_DOCKER_CURRENT_APP=`cat $EB_CONFIG_DOCKER_CURRENT_APP_FILE | cut -c 1-12`
        echo "Graceful shutdown on app container: $EB_CONFIG_DOCKER_CURRENT_APP"
      else
        echo "NO CURRENT APP TO GRACEFUL SHUTDOWN FOUND"
        exit 0
      fi

      # give graceful kill command to all running .js files (not stats!!)
      docker exec $EB_CONFIG_DOCKER_CURRENT_APP sh -c "ps x -o pid,command | grep -E 'workers' | grep -v -E 'forever|grep' " | awk '{print $1}' | xargs docker exec $EB_CONFIG_DOCKER_CURRENT_APP kill -s SIGINT
      echo "sent kill signals"

      # wait (max 5 mins) until processes are done and terminate themselves
      TRIES=100
      until [ $TRIES -eq 0 ]; do
        PIDS=`docker exec $EB_CONFIG_DOCKER_CURRENT_APP sh -c "ps x -o pid,command | grep -E 'workers' | grep -v -E 'forever|grep' " | awk '{print $1}' | cat`
        echo TRIES $TRIES PIDS $PIDS
        if [ -z "$PIDS" ]; then
          echo "finished graceful shutdown of docker $EB_CONFIG_DOCKER_CURRENT_APP"
          exit 0
        else
          let TRIES-=1
          sleep 3
        fi
      done

      echo "failed to graceful shutdown, please investigate manually"
      exit 1
gracefulshutdown_01.config is a small util that backs up the original 01flip and deletes our custom script (if it exists).
gracefulshutdown_02.config is where the magic happens.
It creates a 05gracefulshutdown enact script and makes sure the flip happens afterwards by renaming 01flip.sh to 10flip.sh.
05gracefulshutdown, the custom script, basically does this:
find the currently running Docker container
find all processes that need to be sent a SIGINT (for us, it's the processes with 'workers' in their name)
send a SIGINT to the above processes
loop:
check whether the processes from before were killed
continue looping for a limited number of tries
if the tries run out, exit with status "1" and don't continue to 10flip; manual intervention is needed.
This assumes you only have one Docker container running on the machine, and that you are able to manually hop on to check what's wrong in case it fails (which for us hasn't happened yet).
I imagine it can also be improved in many ways, so have fun.
I need to use MongoDB with the --rest option. But MongoDB is started automatically on boot, so I guess I need to modify a file or something.
Where can I add this --rest option?
I have this file at /etc/init/mongodb.conf, but I'm not sure what to edit:
# Ubuntu upstart file at /etc/init/mongodb.conf

limit nofile 20000 20000

kill timeout 300 # wait 300s between SIGTERM and SIGKILL.

pre-start script
    mkdir -p /var/lib/mongodb/
    mkdir -p /var/log/mongodb/
end script

start on runlevel [2345]
stop on runlevel [06]

script
    ENABLE_MONGODB="yes"
    if [ -f /etc/default/mongodb ]; then . /etc/default/mongodb; fi
    if [ "x$ENABLE_MONGODB" = "xyes" ]; then exec start-stop-daemon --start --quiet --chuid mongodb --exec /usr/bin/mongod -- --config /etc/mongodb.conf; fi
end script
And this file at /etc/init.d/mongodb:
#!/bin/sh -e
# upstart-job
#
# Symlink target for initscripts that have been converted to Upstart.
set -e
INITSCRIPT="$(basename "$0")"
JOB="${INITSCRIPT%.sh}"
if [ "$JOB" = "upstart-job" ]; then
if [ -z "$1" ]; then
echo "Usage: upstart-job JOB COMMAND" 1>&2
exit 1
fi
JOB="$1"
INITSCRIPT="$1"
shift
else
if [ -z "$1" ]; then
echo "Usage: $0 COMMAND" 1>&2
exit 1
fi
fi
COMMAND="$1"
shift
if [ -z "$DPKG_MAINTSCRIPT_PACKAGE" ]; then
ECHO=echo
else
ECHO=:
fi
$ECHO "Rather than invoking init scripts through /etc/init.d, use the service(8)"
$ECHO "utility, e.g. service $INITSCRIPT $COMMAND"
# Only check if jobs are disabled if the currently _running_ version of
# Upstart (which may be older than the latest _installed_ version)
# supports such a query.
#
# This check is necessary to handle the scenario when upgrading from a
# release without the 'show-config' command (introduced in
# Upstart for Ubuntu version 0.9.7) since without this check, all
# installed packages with associated Upstart jobs would be considered
# disabled.
#
# Once Upstart can maintain state on re-exec, this change can be
# dropped (since the currently running version of Upstart will always
# match the latest installed version).
UPSTART_VERSION_RUNNING=$(initctl version|awk '{print $3}'|tr -d ')')
if dpkg --compare-versions "$UPSTART_VERSION_RUNNING" ge 0.9.7
then
initctl show-config -e "$JOB"|grep -q '^ start on' || DISABLED=1
fi
case $COMMAND in
status)
$ECHO
$ECHO "Since the script you are attempting to invoke has been converted to an"
$ECHO "Upstart job, you may also use the $COMMAND(8) utility, e.g. $COMMAND $JOB"
$COMMAND "$JOB"
;;
start|stop)
$ECHO
$ECHO "Since the script you are attempting to invoke has been converted to an"
$ECHO "Upstart job, you may also use the $COMMAND(8) utility, e.g. $COMMAND $JOB"
if status "$JOB" 2>/dev/null | grep -q ' start/'; then
RUNNING=1
fi
if [ -z "$RUNNING" ] && [ "$COMMAND" = "stop" ]; then
exit 0
elif [ -n "$RUNNING" ] && [ "$COMMAND" = "start" ]; then
exit 0
elif [ -n "$DISABLED" ] && [ "$COMMAND" = "start" ]; then
exit 0
fi
$COMMAND "$JOB"
;;
restart)
$ECHO
$ECHO "Since the script you are attempting to invoke has been converted to an"
$ECHO "Upstart job, you may also use the stop(8) and then start(8) utilities,"
$ECHO "e.g. stop $JOB ; start $JOB. The restart(8) utility is also available."
if status "$JOB" 2>/dev/null | grep -q ' start/'; then
RUNNING=1
fi
if [ -n "$RUNNING" ] ; then
stop "$JOB"
fi
# If the job is disabled and is not currently running, the job is
# not restarted. However, if the job is disabled but has been forced into the
# running state, we *do* stop and restart it since this is expected behaviour
# for the admin who forced the start.
if [ -n "$DISABLED" ] && [ -z "$RUNNING" ]; then
exit 0
fi
start "$JOB"
;;
reload|force-reload)
$ECHO
$ECHO "Since the script you are attempting to invoke has been converted to an"
$ECHO "Upstart job, you may also use the reload(8) utility, e.g. reload $JOB"
reload "$JOB"
;;
*)
$ECHO
$ECHO "The script you are attempting to invoke has been converted to an Upstart" 1>&2
$ECHO "job, but $COMMAND is not supported for Upstart jobs." 1>&2
exit 1
esac
It's probably cleaner to enable the REST interface via /etc/mongodb.conf by adding a line of:
rest = true
That setting is documented here.
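For example, appending the setting and restarting the Upstart job (assuming the job name is mongodb, as in the files above):
echo "rest = true" | sudo tee -a /etc/mongodb.conf
sudo service mongodb restart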
MongoDB version 2.6 has switched to a YAML config file. The following two entries are required to prevent the following startup warning:
mongodb WARNING: --rest is specified without --httpinterface
net:
  http:
    enabled: true
    RESTInterfaceEnabled: true
When you start the server using the mongod command, add the --rest option, like this: mongod --rest.
Refer to mongod - MongoDB Manual 2.6.
After the command completes, you can use the following simple RESTful API:
http://127.0.0.1:28017/databaseName/collectionName/
Here is the simple RESTful API doc.
Just start the server using mongod --rest
Note: By default, the REST APIs are inaccessible for security reasons. The web interface is accessible at localhost:<port>, where the port number is 1000 more than the mongod port. For example, if your MongoDB server is running at 27017 (the default), then you can access it at
http://127.0.0.1:28017/<db-name>/<collection-name>/
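For example, with a hypothetical test database and users collection, the collection contents can be fetched over plain HTTP:
# 28017 = 27017 + 1000 (the HTTP/REST port)
curl http://127.0.0.1:28017/test/users/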
I'm writing a Django app that uses celery. So far I've been running on Ubuntu, but I'm trying to deploy to CentOS.
Celery comes with a nice init.d script for Debian-based distributions, but it doesn't work on RedHat-based distributions like CentOS because it uses start-stop-daemon. Does anybody have an equivalent one for RedHat that uses the same variable conventions so I can reuse my /etc/default/celeryd file?
This is better solved here:
Celery CentOS init script
You should be good using that one.
Since I didn't get an answer, I tried to roll my own:
#!/bin/sh
#
# chkconfig: 345 99 15
# description: celery init.d script
# Defines the following variables
# CELERYD_CHDIR
# DJANGO_SETTINGS_MODULE
# CELERYD
# CELERYD_USER
# CELERYD_GROUP
# CELERYD_LOG_FILE
CELERYD_PIDFILE=/var/run/celery.pid
if test -f /etc/default/celeryd; then
    . /etc/default/celeryd
fi

# Source function library.
. /etc/init.d/functions

# Celery options
CELERYD_OPTS="$CELERYD_OPTS -f $CELERYD_LOG_FILE -l $CELERYD_LOG_LEVEL"

if [ -n "$2" ]; then
    CELERYD_OPTS="$CELERYD_OPTS $2"
fi

start () {
    cd $CELERYD_CHDIR
    daemon --user $CELERYD_USER --pidfile $CELERYD_PIDFILE $CELERYD $CELERYD_OPTS &
}

stop () {
    if [[ -s $CELERYD_PIDFILE ]] ; then
        echo "Stopping Celery"
        killproc -p $CELERYD_PIDFILE python
        echo "done!"
        rm -f $CELERYD_PIDFILE
    else
        echo "Celery not running."
    fi
}

check_status() {
    status -p $CELERYD_PIDFILE python
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        start
        ;;
    status)
        check_status
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|status}"
        exit 1
        ;;
esac
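To use it on CentOS, the installation looks roughly like this (the paths and the /etc/default/celeryd values are examples, not requirements):
# install the script and register it with chkconfig
# (runlevels and priorities come from the "# chkconfig: 345 99 15" header)
sudo cp celeryd /etc/init.d/celeryd
sudo chmod 755 /etc/init.d/celeryd
sudo chkconfig --add celeryd

# /etc/default/celeryd supplies the variables listed at the top of the script,
# e.g. CELERYD_CHDIR, CELERYD, CELERYD_USER, CELERYD_GROUP, CELERYD_LOG_FILE
sudo service celeryd start
sudo service celeryd status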