Logs Only Write to EMR stderr/stdout with non-zero exit code - scala

I have a piece of code in Scala which uses log4j to do logging.
I have a few conditions upon which I want to log some information to the stderr/stdout of the EMR.
If I have a non-zero exit code it will log it:
if (args.length != 2) {
log.setLevel(Level.ERROR)
log.error("Expecting 2 args. Usage: ...")
System.exit(4)
}
However if I just have regular logging it does not show:
log.setLevel(Level.INFO)
log.info("This is my log")
I thought that perhaps this was because it only logs the errors in the EMR and tried the following:
log.setLevel(Level.ERROR)
log.error("This is my log")
With no luck.
It seems the logs are only written with a non-zero exit code.
I believe they are probably writing somewhere outside the log file. Is there any way to make it so that they do write where I am expecting them too?

Related

Unhelpful output from pytest

TLDR: How can I get better output from pytest?
I'm using Django with regular python3 unittests.
I've just switched to pytest-django for running tests.
pytest throws an error for almost all my tests (149 in total).
Pages and pages with this error.
self = <RegexURLResolver 'project.urls' (None:None) ^/>
#property
def reverse_dict(self):
language_code = get_language()
if language_code not in self._reverse_dict:
self._populate()
> return self._reverse_dict[language_code]
E KeyError: 'en-us'
Which wasn't the problem. It led me down to a wrong path.
I had a syntax error in one of my views.py files.
./manage.py test resulted in:
snip
File "/home/roland/project/views.py", line 20
code = zip(list1, list2])
SyntaxError: invalid syntax
Notice the last: ] which was the problem.
So: How can I get more useful output on problems when using pytest?
Btw:
After finding this and scrolling back into the pytest output there was mention of the syntax error. It was just buried in the output.
You can use the --maxfail=1 option so it will stop immediately on first failure.
Also, make sure your pytest.ini is setup properly so that pytest knows it should be using django-pyest.
[pytest]
DJANGO_SETTINGS_MODULE='myapp.settings'
For my workflow, I usually do the following:
run pytest --maxfail=1 myfile.py &> pytest-output.txt
tail, grep, or search he text file for errors.
Fix and iterate
There are a lot of other configuration options that will help you to get more meaningful input from pytest.

How do I write messages to the output log on AWS Glue?

AWS Glue jobs log output and errors to two different CloudWatch logs, /aws-glue/jobs/error and /aws-glue/jobs/output by default. When I include print() statements in my scripts for debugging, they get written to the error log (/aws-glue/jobs/error).
I have tried using:
log4jLogger = sparkContext._jvm.org.apache.log4j
log = log4jLogger.LogManager.getLogger(__name__)
log.warn("Hello World!")
but "Hello World!" doesn't show up in either of the logs for the test job I ran.
Does anyone know how to go about writing debug log statements to the output log (/aws-glue/jobs/output)?
TIA!
EDIT:
It turns out the above actually does work. What was happening was that I was running the job in the AWS Glue Script editor window which captures Command-F key combinations and only searches in the current script. So when I tried to search within the page for the logging output it seemed as if it hadn't been logged.
NOTE: I did discover through testing the first responder's suggestion that AWS Glue scripts don't seem to output any log message with a level less than WARN!
Try to use built-in python logger from logging module, by default it writes messages to standard output stream.
import logging
MSG_FORMAT = '%(asctime)s %(levelname)s %(name)s: %(message)s'
DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S'
logging.basicConfig(format=MSG_FORMAT, datefmt=DATETIME_FORMAT)
logger = logging.getLogger(<logger-name-here>)
logger.setLevel(logging.INFO)
...
logger.info("Test log message")
I know the article is not new but maybe it could be helpful for someone:
For me logging in glue works with the following lines of code:
# create glue context
glueContext = GlueContext(sc)
# set custom logging on
logger = glueContext.get_logger()
...
#write into the log file with:
logger.info("s3_key:" + your_value)
I noticed the above answers are written in python. For Scala you could do the following
import com.amazonaws.services.glue.log.GlueLogger
object GlueApp {
def main(sysArgs: Array[String]) {
val logger = new GlueLogger
logger.info("info message")
logger.warn("warn message")
logger.error("error message")
}
}
You can find both Python and Scala solution from official doc here
Just in case this helps. This works to change the log level.
sc = SparkContext()
sc.setLogLevel('DEBUG')
glueContext = GlueContext(sc)
logger = glueContext.get_logger()
logger.info('Hello Glue')
This worked for INFO level in a Glue Python job:
import sys
root = logging.getLogger()
root.setLevel(logging.DEBUG)
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
root.addHandler(handler)
root.info("check")
source
I faced the same problem. I resolved it by added
logging.getLogger().addHandler(logging.StreamHandler(sys.stdout))
Before there was no prints at all, even ERROR level
The idea was taken from here
https://medium.com/tieto-developers/how-to-do-application-logging-in-aws-745114ac6eb7
Another option would be to log to stdout and glue AWS logging to stdout (using stdout is actually one of the best practices in cloud logging).
Update: it works only for setLevel("WARNING") and when prints ERROR or WARING. I didn't find how to manage it for the INFO level :(
If you're just debugging, print() (Python) or println() (Scala) works just fine.

How to manipulate the status of current job-execution from inside of an inline script?

The following code returns an error to rundeck.
#!/bin/bash
exit -1
And rundeck decides how to deal with it by running the next step or changing the execution "status" to "failed".
I would like to modify the status directly by inline script to support more than 2 states. I need "succeeded", "failed" and "nodata" to express that the data are missing.
Is there a way to express this?
There is none. Just like bash can return zero or non-zero
One possible alternative is raise an exception with message nodata and exit with non-zero code. Rundeck will mark this job as fail with NonZeroResultCode error. You should be able to get your error message nodata with ${result.message}

How can I generate a Parsing Error in BitBake on intent?

I use Anonymous Python Functions in BitBake recipes to set variables during parsing.
Now I wonder if I can check if a specific variable is set or not. If not, then I want to generate a BitBake Error, which stops the build process.
Pseudo code, that I want to create:
python __anonymous () {
if d.getVar('MY_VARIABLE', True) == "":
<BITBAKE ERROR with custom message "MY_VARIABLE not found">
}
You can call bb.fatal("MY_VARIABLE not set") which will print that error and abort the build by throwing an exception.
Beware that d.getVar() returns None when the variable is unset. You only get the empty string if that's your default value.
Outputs are possible on different loglevels and with python as well as shell script code
For usage in python there are:
bb.fatal
bb.error
bb.warn
bb.note
bb.plain
bb.debug
For usage in shell script there are:
bbfatal
bberror
bbwarn
bbnote
bbplain
bbdebug
for example if you want to throw an error in your recipe's do_install_append function:
bbfatal "something went terribly wrong!"

Stopping SpringBatch jobs started from the command line

Spring Batch jobs can be started from the commandline by telling the JVM to run CommandLineJobRunner. According to the JavaDoc, running the same command with the added parameter of -stop will stop the Job:
The arguments to this class can be provided on the command line
(separated by spaces), or through stdin (separated by new line). They
are as follows:
jobPath jobIdentifier (jobParameters)* The command line options are as
follows
jobPath: the xml application context containing a Job
-restart: (optional) to restart the last failed execution
-stop: (optional) to stop a running execution
-abandon: (optional) to abandon a stopped execution
-next: (optional) to start the next in a sequence according to the JobParametersIncrementer in the Job jobIdentifier: the name of the job or the id of a job execution (for -stop, -abandon or -restart).
jobParameters: 0 to many parameters that will be used to launch a job specified in the form of key=value pairs.
However, on the JavaDoc for the main() method the -stop parameter is not specified. Looking through the code on docjar.com I can't see any use of the -stop parameter where I would expect it to be.
I suspect that it is possible to stop a batch that has been started from the command line but only if the batches being run are backed by a non-transient jobRepository? If running a batch on the command line that only stores its data in HSQL (ie in memory) there is no way to stop the job other than CTRL-C etc?
stop command is implemented, see source for CommandLineJobRunner, line 300+
if (opts.contains("-stop")) {
List<JobExecution> jobExecutions = getRunningJobExecutions(jobIdentifier);
if (jobExecutions == null) {
throw new JobExecutionNotRunningException("No running execution found for job=" + jobIdentifier);
}
for (JobExecution jobExecution : jobExecutions) {
jobExecution.setStatus(BatchStatus.STOPPING);
jobRepository.update(jobExecution);
}
return exitCodeMapper.intValue(ExitStatus.COMPLETED.getExitCode());
}
The stop switch will work, but it will only stop the job after the currently executing step completes. It won't kill the job immediately.