Custom logger using log4j for Spark Scala application when executed using Oozie - scala

I have developed a Spark Scala application that uses log4j for logging, and it works fine when I execute it using spark-submit as below:
spark-submit --name "Test" --class com.comp.test --conf spark.driver.extraJavaOptions='-Dlog4j.configuration=file:/home/myid/log4j.properties' --queue=root.user /home/myid/dev/data.jar
This works fine, and the log file is created in the directory specified in log4j.properties.
Now, when I run the same application using the Oozie Spark action, the log file is not created in the directory specified in log4j.properties.
log4j.properties:
log4j.appender.myConsoleAppender=org.apache.log4j.ConsoleAppender
log4j.appender.myConsoleAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.myConsoleAppender.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppender.File=/home/myid/dev/log/dev.log
log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n
# By default, everything goes to console and file
log4j.rootLogger=INFO, myConsoleAppender, RollingAppender
# The noisier spark logs go to file only
log4j.logger.spark.storage=INFO, RollingAppender
log4j.additivity.spark.storage=false
log4j.logger.spark.scheduler=INFO, RollingAppender
log4j.additivity.spark.scheduler=false
log4j.logger.spark.CacheTracker=INFO, RollingAppender
log4j.additivity.spark.CacheTracker=false
log4j.logger.spark.CacheTrackerActor=INFO, RollingAppender
log4j.additivity.spark.CacheTrackerActor=false
log4j.logger.spark.MapOutputTrackerActor=INFO, RollingAppender
log4j.additivity.spark.MapOutputTrackerActor=false
log4j.logger.spark.MapOutputTracker=INFO, RollingAppender
log4j.additivity.spark.MapOutputTracker=false
Oozie workflow:
<workflow-app name="OozieApp" xmlns="uri:oozie:workflow:0.5">
    <start to="LoadTable"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="LoadTable">
        <spark xmlns="uri:oozie:spark-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapreduce.job.queuename</name>
                    <value>root.user</value>
                </property>
            </configuration>
            <master>yarn</master>
            <mode>client</mode>
            <name>OozieApp</name>
            <class>com.comp.test</class>
            <jar>data.jar</jar>
            <spark-opts>--queue=root.user --conf spark.driver.extraJavaOptions='-Dlog4j.configuration=file:/home/myid/log4j.properties'</spark-opts>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
Could you help me get the custom log file created in the log directory when the job is executed using the Oozie Spark action?
I could use a shell action and run spark-submit there, but I would prefer the Spark action itself.
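A likely cause: under Oozie the driver JVM is launched in a YARN container on an arbitrary cluster node, so file:/home/myid/log4j.properties only resolves if that path exists on that node. A common workaround (a sketch, not verified against this exact workflow; it assumes log4j.properties has been uploaded to HDFS at hdfs:///user/myid/log4j.properties, a hypothetical location) is to ship the file with the job via --files and reference the localized copy by name:
<spark-opts>--queue=root.user --files hdfs:///user/myid/log4j.properties --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties --conf spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties</spark-opts>
The RollingAppender.File path must also be writable on whichever node ends up hosting the driver, so either point it at a directory that exists on every worker or rely on YARN log aggregation to collect the output.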

Related

How to find what arguments are passed to a Scala main class from an Oozie Spark action

I'm trying to figure out if I can access the name parameter from <name>[SPARK JOB NAME]</name> inside the Scala class when running a Spark application.
Spark Action: https://oozie.apache.org/docs/4.2.0/DG_SparkActionExtension.html
Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.3">
...
<action name="[NODE-NAME]">
<spark xmlns="uri:oozie:spark-action:0.1">
<job-tracker>[JOB-TRACKER]</job-tracker>
<name-node>[NAME-NODE]</name-node>
<prepare>
<delete path="[PATH]"/>
...
<mkdir path="[PATH]"/>
...
</prepare>
<job-xml>[SPARK SETTINGS FILE]</job-xml>
<configuration>
<property>
<name>[PROPERTY-NAME]</name>
<value>[PROPERTY-VALUE]</value>
</property>
...
</configuration>
<master>[SPARK MASTER URL]</master>
<mode>[SPARK MODE]</mode>
<name>[SPARK JOB NAME]</name>
<class>[SPARK MAIN CLASS]</class>
<jar>[SPARK DEPENDENCIES JAR / PYTHON FILE]</jar>
<spark-opts>[SPARK-OPTIONS]</spark-opts>
<arg>[ARG-VALUE]</arg>
...
<arg>[ARG-VALUE]</arg>
...
</spark>
<ok to="[NODE-NAME]"/>
<error to="[NODE-NAME]"/>
</action>
...
</workflow-app>
I tried a few test scripts, and I could only access the arguments enclosed in the <arg>...</arg> tags.
Thank you.
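For what it's worth, Oozie hands the <name> element to spark-submit as the --name option, which should end up in the spark.app.name configuration entry. A minimal sketch for reading it from the Scala class (assuming a plain SparkContext application; NameProbe is a hypothetical example):
import org.apache.spark.{SparkConf, SparkContext}

object NameProbe {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf())
    // spark.app.name is set from the --name option that Oozie derives from <name>
    println("Job name: " + sc.getConf.get("spark.app.name"))
    sc.stop()
  }
}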

Execute Scala code using oozie spark action gives exit code [-1] error

I am trying to execute Scala code using the Oozie Spark action, but I end up with the error shown below:
oozi-W#spark-node] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [-1]
My workflow.xml is given below.
<action name='spark-node'>
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>${nameNode}/user/${wf:user()}/${examplesRoot}/apps/spark/hive-conf.xml</job-xml>
        <master>${master}</master>
        <name>ToyPredictor</name>
        <class>org.scala.model.ToyPredictor</class>
        <jar>${nameNode}/user/${wf:user()}/${examplesRoot}/apps/spark/lib/ToyPredictor-1.4.jar,${nameNode}/user/${wf:user()}/${examplesRoot}/apps/spark/lib/datanucleus-api-jdo-3.2.6.jar,${nameNode}/user/${wf:user()}/${examplesRoot}/apps/spark/lib/datanucleus-rdbms-3.2.9.jar,${nameNode}/user/${wf:user()}/${examplesRoot}/apps/spark/lib/datanucleus-core-3.2.10.jar</jar>
        <arg>34</arg>
        <arg>0</arg>
        <arg>34</arg>
        <arg>${nameNode}/tmp/ScalaOut-data/</arg>
    </spark>
    <ok to="end" />
    <error to="fail" />
</action>
I am able to execute the example Spark action provided in the Oozie documentation, but when trying to execute this Scala code using the Spark action I get the error above.
Any advice or pointers to sample Scala code executions using Oozie would be much appreciated.
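Exit code [-1] from SparkMain is generic; the real exception usually sits in the Oozie launcher's YARN container logs, so pulling those is the first step. One configuration change worth trying (an assumption, not confirmed for this job) is to keep only the application jar in <jar> and move the datanucleus dependencies into --jars in <spark-opts>:
<jar>${nameNode}/user/${wf:user()}/${examplesRoot}/apps/spark/lib/ToyPredictor-1.4.jar</jar>
<spark-opts>--jars ${nameNode}/user/${wf:user()}/${examplesRoot}/apps/spark/lib/datanucleus-api-jdo-3.2.6.jar,${nameNode}/user/${wf:user()}/${examplesRoot}/apps/spark/lib/datanucleus-rdbms-3.2.9.jar,${nameNode}/user/${wf:user()}/${examplesRoot}/apps/spark/lib/datanucleus-core-3.2.10.jar</spark-opts>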

Spark custom log4j integration for Scala application

I am new to Spark and Scala as well.
The question is that I am not able to debug my application.
I have developed a Spark application in Scala using Maven, but I am not able to log the details: I cannot tell where the log file is being generated, because as per the log4j properties the log file is not available at the given path.
Are there any specific changes I need to make to get that log file?
I am testing my application on Hortonworks.
Command for submitting the app:
bin/spark-submit --master yarn-cluster --class com.examples.MainExample lib/Test.jar
The log4j.properties file is kept in the src/resources folder.
Please find the log4j.properties below:
log4j.appender.myConsoleAppender=org.apache.log4j.ConsoleAppender
log4j.appender.myConsoleAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.myConsoleAppender.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppender.File=/var/log/spark.log
log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n
# By default, everything goes to console and file
log4j.rootLogger=INFO, myConsoleAppender, RollingAppender
# The noisier spark logs go to file only
log4j.logger.spark.storage=INFO, RollingAppender
log4j.additivity.spark.storage=false
log4j.logger.spark.scheduler=INFO, RollingAppender
log4j.additivity.spark.scheduler=false
log4j.logger.spark.CacheTracker=INFO, RollingAppender
log4j.additivity.spark.CacheTracker=false
log4j.logger.spark.CacheTrackerActor=INFO, RollingAppender
log4j.additivity.spark.CacheTrackerActor=false
log4j.logger.spark.MapOutputTrackerActor=INFO, RollingAppender
log4j.additivity.spark.MapOutputTrackerActor=false
log4j.logger.spark.MapOutputTracker=INFO, RollingAppender
log4j.additivity.spark.MapOutputTracker=false
I was not able to solve this issue from within the application, but if you change log4j.properties in the Spark conf folder as below, it will write logs to the given file.
Make sure the path has write access.
log4j.rootLogger=INFO, FILE
# Set everything to be logged to the console
log4j.rootCategory=INFO, FILE
log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File=/tmp/sparkLog/SparkOut.log
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
Try placing the log4j.properties inside 'src/main/scala/resources'.
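Editing log4j.properties in the conf folder affects every job submitted from that client. A per-job alternative (a sketch, assuming a local log4j.properties sits next to where spark-submit runs) is to ship the file with the application so both driver and executors pick it up in yarn-cluster mode:
bin/spark-submit --master yarn-cluster \
  --files log4j.properties \
  --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties \
  --conf spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties \
  --class com.examples.MainExample lib/Test.jar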

Turn off logging of Solr in "sbt test"

I am using Solr in Scala. I have a test case that adds some documents into Solr core.
When running sbt test, the following information is shown repeatedly:
15/12/03 01:17:50 INFO LogUpdateProcessor: [test] webapp=null path=/update params={} {add=[(null)]} 0 2
In an attempt to suppress it, I put a log4j.properties with content:
.level=WARNING
org.apache.solr.core.level=WARNING
org.apache.solr.update.processor.level=WARNING
under both ${project_dir}/src/main/resources and ${project_dir}/src/test/resources
However, the log message is still there.
I am using:
Scala 2.11.5
solr-solrj 5.3.1
solr-core 5.3.1
sbt 0.1.0
The log4j.properties file is malformatted: that .level= syntax belongs to java.util.logging, not log4j.
The following content works:
log4j.rootLogger=WARN, stdout
# Direct log messages to stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
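If the root logger is later raised back to INFO while Solr should stay quiet, per-logger overrides can be appended to the same file (a sketch; logger names taken from the log message above):
log4j.logger.org.apache.solr=WARN
log4j.logger.org.apache.solr.update.processor=WARN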

Gremlin from Titan 1.0.0 is not running out of the box on Windows

I'm following the http://s3.thinkaurelius.com/docs/titan/1.0.0/getting-started.html guide on my Windows machine, but I'm getting stuck at the very first step, getting Gremlin to run:
>bin\gremlin.bat
Error opening zip file or JAR manifest missing : ..\lib\jamm-0.3.0.jar
Error occurred during initialization of VM
agent library failed to init: instrument
I found a solution in this Google group for this issue and more.
To run Gremlin, edit the gremlin.bat file:
Change:
set LIBDIR=..\lib
To:
set LIBDIR=lib
Change:
if "%CP%" == "" (
set CP=%LIBDIR%\%1
)else (
set CP=%CP%;%LIBDIR%\%1
)
To:
if "%CP%" == "" (
set CP=%1
)else (
set CP=%CP%;%1
)
Also, to add command-history support to the Gremlin command line, edit the set JAVA_OPTIONS line in gremlin.bat (solution from the same source).
Change:
set JAVA_OPTIONS=-Xms32m -Xmx512m -javaagent:%LIBDIR%\jamm-0.3.0.jar
To:
set JAVA_OPTIONS=-Xms32m -Xmx512m -javaagent:%LIBDIR%\jamm-0.3.0.jar -Djline.terminal=none
And lastly, to change the log level, add a file named logback.xml in the titan-1.0.0-hadoop1 folder (solution from the same source) containing:
<configuration>
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <!-- encoders are assigned the type
             ch.qos.logback.classic.encoder.PatternLayoutEncoder by default -->
        <encoder>
            <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>
    <root level="WARN"> <!-- set log level here -->
        <appender-ref ref="STDOUT" />
    </root>
</configuration>