How to append the Spark applicationId to the filename of a log4j log file (Scala)

I am trying to append the Spark applicationId to the filename of the log4j log file. Below is my log4j.properties file:
log4j.rootLogger=info,file
# Redirect log messages to console
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L -%m%n
# Redirect log messages to log file, support file rolling
log4j.appender.file=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.file.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.file.rollingPolicy.FileNamePattern=log4j//Data_Quality.%d{yyyy-MM-dd}.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L -%m%n
# set the immediate flush to true
log4j.appender.file.ImmediateFlush=true
# set the threshold to INFO
log4j.appender.file.Threshold=INFO
# set append to true (do not overwrite)
log4j.appender.file.Append=true
Spark-submit command:
spark2-submit --conf "spark.driver.extraJavaOptions=-Dconfig.file=./input.conf -Dlog4j.configuration=log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" --files "input.conf,log4j.properties" --master yarn --class "DataCheckImplementation" Data_quality.jar
Log files are created with the name Data_Quality.2020-07-21.log, which works correctly.
I want to add the Spark applicationId to the filename.
Expected filename: Data_Quality.(ApplicationID).2020-07-21.log
Example: Data_Quality.(application_1595144411765_20000).2020-07-21.log
Is this possible? Any help is appreciated!

I don't think this can be done purely at the configuration level (e.g. log4j.properties), but there are ways to achieve it. Here is one approach:
You will need a logger class/trait where you handle all your logger management, something like:
import java.text.SimpleDateFormat
import java.util.Date

import org.apache.log4j.{PatternLayout, RollingFileAppender, Logger => Log4jLogger}
import org.apache.spark.sql.SparkSession

trait SparkContextProvider {
  def spark: SparkSession
}

trait Logger extends SparkContextProvider {
  // Standard log4j logger for the concrete class that mixes this trait in
  lazy val log = Log4jLogger.getLogger(getClass)
  lazy val applicationId = spark.sparkContext.applicationId

  private val dateFormat = new SimpleDateFormat("yyyy-MM-dd")

  // Build the file appender programmatically so the applicationId becomes part of the file name
  val appender = new RollingFileAppender()
  appender.setAppend(true)
  appender.setMaxFileSize("1MB")
  appender.setMaxBackupIndex(1)
  appender.setFile("Data_Quality." + applicationId + "." + dateFormat.format(new Date()) + ".log")

  val layOut = new PatternLayout()
  layOut.setConversionPattern("%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n")
  appender.setLayout(layOut)
  appender.activateOptions()

  log.addAppender(appender)
}
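A minimal usage sketch (assuming the main object from the spark-submit command above is where the SparkSession lives; the builder settings here are illustrative, not from the original answer):
import org.apache.spark.sql.SparkSession

object DataCheckImplementation extends Logger {
  // Provides the SparkSession required by SparkContextProvider
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("DataCheckImplementation")
    .getOrCreate()

  def main(args: Array[String]): Unit = {
    log.info("Writing to a log file whose name contains the applicationId")
    // ... run the data quality checks here ...
    spark.stop()
  }
}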

Related

Problem configuring log4j2.properties in Spring Boot (using Gradle)

I added a log4j2.properties file in src/main/resources but it is not taking effect. Shouldn't log4j2.properties be detected on its own? How can I check whether it is being detected?
log4j2.properties file:
status = error
name = PropertiesConfig
filters = threshold
filter.threshold.type = ThresholdFilter
filter.threshold.level = debug
appenders = console
appender.console.type = Console
appender.console.name = STDOUT
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
rootLogger.level = debug
rootLogger.appenderRefs = stdout
rootLogger.appenderRef.stdout.ref = STDOUT
Spring Boot uses Logback as its default logging framework.
If you want to use Log4j2 you have to do some configuration.
Exclude the default logging starter and add the Log4j2 starter dependency:
dependencies {
  compile 'org.springframework.boot:spring-boot-starter-web'
  compile 'org.springframework.boot:spring-boot-starter-log4j2'
}
configurations {
  all {
    exclude group: 'org.springframework.boot', module: 'spring-boot-starter-logging'
  }
}
As far as I know, Log4j2 is most commonly configured using an XML file (log4j2.xml), although recent versions also accept a properties file.
Please find all the information in the official Spring Boot Reference Documentation:
https://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#howto-configure-log4j-for-logging
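For reference, a minimal log4j2.xml equivalent of the console part of the configuration above could look like this (a sketch, not taken from the question):
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="error">
  <Appenders>
    <!-- Console appender matching the pattern used in the properties file -->
    <Console name="STDOUT" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="debug">
      <AppenderRef ref="STDOUT"/>
    </Root>
  </Loggers>
</Configuration>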

Problems with the configuration between Hadoop and Spark

I have a problem in a program and I do not have this problem with spark-shell.
When I call:
FileSystem.get(spark.sparkContext.hadoopConfiguration)
In the spark-shell everything works perfectly, but when I try to use it in my code I can't read core-site.xml. I can still get it to work when I use:
val conf = new Configuration()
conf.addResource(new Path("path to conf/core-site.xml"))
FileSystem.get(conf)
This solution is not acceptable, since I need to use the Hadoop configuration without passing it explicitly.
In both cases (spark-shell and the program) the master is set to spark://x.x.x.x:7077.
How can I configure Spark to use the Hadoop configuration?
Code:
val HdfsPrefix: String = "hdfs://"
val path: String = "/tmp/"
def getHdfs(spark: SparkSession): FileSystem = {
  //val conf = new Configuration()
  //conf.addResource(new Path("/path to/core-site.xml"))
  //FileSystem.get(conf)
  FileSystem.get(spark.sparkContext.hadoopConfiguration)
}

val dfs = getHdfs(session)
data.select("name", "value").collect().foreach { x =>
  val os = dfs.create(new Path(HdfsPrefix + path + x.getString(0)))
  val content: String = x.getString(1)
  os.write(content.getBytes)
  os.hsync()
}
Error log:
Wrong FS: hdfs:/tmp, expected: file:///
java.lang.IllegalArgumentException: Wrong FS: hdfs:/tmp, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:428)
at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:690)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:446)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:433)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:775)
at com.bbva.ebdm.ocelot.io.hdfs.HdfsIO$HdfsOutputFile$$anonfun$write$1.apply(HdfsIO.scala:116)
at com.bbva.ebdm.ocelot.io.hdfs.HdfsIO$HdfsOutputFile$$anonfun$write$1.apply(HdfsIO.scala:115)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at com.bbva.ebdm.ocelot.io.hdfs.HdfsIO$HdfsOutputFile.write(HdfsIO.scala:115)
at com.bbva.ebdm.ocelot.templates.spark_sql.SparkSqlBaseApp$$anonfun$exec$1.apply(SparkSqlBaseApp.scala:33)
at com.bbva.ebdm.ocelot.templates.spark_sql.SparkSqlBaseApp$$anonfun$exec$1.apply(SparkSqlBaseApp.scala:31)
at scala.collection.immutable.Map$Map3.foreach(Map.scala:161)
at com.bbva.ebdm.ocelot.templates.spark_sql.SparkSqlBaseApp$class.exec(SparkSqlBaseApp.scala:31)
at com.bbva.ebdm.ocelot.templates.spark_sql.SparkSqlBaseAppTest$$anonfun$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$1$$anonfun$2$$anonfun$apply$2$$anon$1.exec(SparkSqlBaseAppTest.scala:47)
at com.bbva.ebdm.ocelot.templates.spark_sql.SparkSqlBaseAppTest$$anonfun$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$3.apply(SparkSqlBaseAppTest.scala:49)
at com.bbva.ebdm.ocelot.templates.spark_sql.SparkSqlBaseAppTest$$anonfun$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$3.apply(SparkSqlBaseAppTest.scala:47)
at com.bbva.ebdm.ocelot.templates.spark_sql.SparkSqlBaseAppTest$$anonfun$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$1.apply(SparkSqlBaseAppTest.scala:47)
at com.bbva.ebdm.ocelot.templates.spark_sql.SparkSqlBaseAppTest$$anonfun$1$$anonfun$apply$mcV$sp$1$$anonfun$apply$1.apply(SparkSqlBaseAppTest.scala:47)
at wvlet.airframe.Design.runWithSession(Design.scala:169)
at wvlet.airframe.Design.withSession(Design.scala:182)
at com.bbva.ebdm.ocelot.templates.spark_sql.SparkSqlBaseAppTest$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(SparkSqlBaseAppTest.scala:47)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FunSpecLike$$anon$1.apply(FunSpecLike.scala:454)
at org.scalatest.TestSuite$class.withFixture(TestSuite.scala:196)
at org.scalatest.FunSpec.withFixture(FunSpec.scala:1630)
at org.scalatest.FunSpecLike$class.invokeWithFixture$1(FunSpecLike.scala:451)
at org.scalatest.FunSpecLike$$anonfun$runTest$1.apply(FunSpecLike.scala:464)
at org.scalatest.FunSpecLike$$anonfun$runTest$1.apply(FunSpecLike.scala:464)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
at org.scalatest.FunSpecLike$class.runTest(FunSpecLike.scala:464)
at org.scalatest.FunSpec.runTest(FunSpec.scala:1630)
at org.scalatest.FunSpecLike$$anonfun$runTests$1.apply(FunSpecLike.scala:497)
at org.scalatest.FunSpecLike$$anonfun$runTests$1.apply(FunSpecLike.scala:497)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:373)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:410)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
at org.scalatest.FunSpecLike$class.runTests(FunSpecLike.scala:497)
at org.scalatest.FunSpec.runTests(FunSpec.scala:1630)
at org.scalatest.Suite$class.run(Suite.scala:1147)
at org.scalatest.FunSpec.org$scalatest$FunSpecLike$$super$run(FunSpec.scala:1630)
at org.scalatest.FunSpecLike$$anonfun$run$1.apply(FunSpecLike.scala:501)
at org.scalatest.FunSpecLike$$anonfun$run$1.apply(FunSpecLike.scala:501)
at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
at org.scalatest.FunSpecLike$class.run(FunSpecLike.scala:501)
at com.bbva.ebdm.ocelot.templates.spark_sql.SparkSqlBaseAppTest.org$scalatest$BeforeAndAfterAll$$super$run(SparkSqlBaseAppTest.scala:31)
at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
at com.bbva.ebdm.ocelot.templates.spark_sql.SparkSqlBaseAppTest.run(SparkSqlBaseAppTest.scala:31)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1346)
at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1340)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1340)
at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1011)
at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1010)
at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1506)
at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1010)
at org.scalatest.tools.Runner$.run(Runner.scala:850)
at org.scalatest.tools.Runner.run(Runner.scala)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:131)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
You need to put hdfs-site.xml and core-site.xml on the Spark classpath, i.e. the classpath of your program when you run it:
https://spark.apache.org/docs/latest/configuration.html#custom-hadoophive-configuration
According to the docs:
If you plan to read and write from HDFS using Spark, there are two Hadoop configuration files that should be included on Spark’s classpath:
hdfs-site.xml, which provides default behaviors for the HDFS client.
core-site.xml, which sets the default filesystem name.
The location of these configuration files varies across Hadoop versions, but a common location is inside of /etc/hadoop/conf. Some tools create configurations on-the-fly, but offer a mechanism to download copies of them.
To make these files visible to Spark, set HADOOP_CONF_DIR in $SPARK_HOME/conf/spark-env.sh to a location containing the configuration files.
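For example, assuming the common /etc/hadoop/conf location mentioned above, add this line to $SPARK_HOME/conf/spark-env.sh:
export HADOOP_CONF_DIR=/etc/hadoop/conf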
The problem was ScalaTest: it does not pick up core-site.xml when Maven builds and tests the project, but spark-submit reads it correctly once the project is compiled.
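If you hit the same issue in a ScalaTest run, a possible workaround (a sketch only; the path is a placeholder for your cluster's client configuration) is to add the Hadoop config file to the test session's Hadoop configuration explicitly:
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("hdfs-config-test")
  .getOrCreate()

// Placeholder path: point this at the client configuration that defines fs.defaultFS
spark.sparkContext.hadoopConfiguration.addResource(new Path("/etc/hadoop/conf/core-site.xml"))

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)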

Scala JUnit test picking up the wrong log4j.properties file

I have a test written in Scala, using JUnit. The test lives in one module of a multi-module Maven build with many other modules.
Here is the code of the test:
import org.apache.log4j.Logger
import org.apache.logging.log4j.scala.Logging
import org.junit._
class MyTest extends Logging {
  @Test
  def mainTest() = {
    //val logger = Logger.getLogger("MyTest")
    logger.fatal("fatal")
    logger.error("error")
    logger.warn("warn")
    logger.info("info")
    logger.debug("debug")
    logger.trace("trace")
  }
}
And here is the log4j.properties file, which is in the resources folder:
log4j.rootCategory=ALL, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
The maven dependencies are:
<dependency>
  <groupId>org.apache.logging.log4j</groupId>
  <artifactId>log4j-api-scala_2.10</artifactId>
  <version>2.8.2</version>
</dependency>
<dependency>
  <groupId>org.apache.logging.log4j</groupId>
  <artifactId>log4j-core</artifactId>
  <version>2.8.2</version>
</dependency>
When I run the test, the debug and trace levels are not printed.
It seems to me that the logger might be picking up a file from one of the other projects. Why?
If I uncomment the first line of the test, all the levels get printed.
I tried adding -Dlog4j.debug to the run command, but log4j seems to be ignoring it.
Any idea what I'm missing?
You are using log4j2.
Your file name should be log4j2.properties.
Also, the syntax of the .properties file has changed. The following example, taken from here, will get you started:
name=PropertiesConfig
property.filename = logs
appenders = console, file
appender.console.type = Console
appender.console.name = STDOUT
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = [%-5level] %d{yyyy-MM-dd HH:mm:ss.SSS} [%t] %c{1} - %msg%n
appender.file.type = File
appender.file.name = LOGFILE
appender.file.fileName=${filename}/propertieslogs.log
appender.file.layout.type=PatternLayout
appender.file.layout.pattern=[%-5level] %d{yyyy-MM-dd HH:mm:ss.SSS} [%t] %c{1} - %msg%n
loggers=file
logger.file.name=guru.springframework.blog.log4j2properties
logger.file.level = debug
logger.file.appenderRefs = file
logger.file.appenderRef.file.ref = LOGFILE
rootLogger.level = debug
rootLogger.appenderRefs = stdout
rootLogger.appenderRef.stdout.ref = STDOUT
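To check which configuration file Log4j2 actually loads during the test run, you can enable its internal status logging on the test JVM (a hedged suggestion; argLine is the usual Surefire property for passing JVM flags to tests):
mvn test -DargLine=-Dlog4j2.debug=true
The -Dlog4j.debug flag you tried belongs to Log4j 1.x, which is why Log4j2 ignores it.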

Spark program behaves differently depending on whether --master is set to local[4] or yarn-client

I am using the MovieLens dataset to load movie information into a Spark program and print it using the following code snippet:
import org.apache.spark.{SparkConf, SparkContext}

object MovieApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("movie-recommender")
    val sc = new SparkContext(conf)
    val movieFile = "/mnt/DATASETS/ml-1m/movies.dat"
    val movieData = sc.textFile(movieFile)
    val movies = movieData.map(_.split("::") match { case Array(movieid, title, genres) =>
      // "|" must be escaped because String.split takes a regex
      val genreList = genres.split("\\|")
      (movieid, title, genreList)
    })
    println("Num movies:" + movies.count())
    movies.foreach { movielist =>
      println("ID:" + movielist._1 + " Title:" + movielist._2)
    }
  }
}
When I run the code using the command
spark-submit --master local[4] --class "MovieApp" movie-recommender.jar
I get the expected output:
[root@philli ml]# /usr/lib/spark/bin/spark-submit --master local[4] --class "MovieApp" movie-recommender_2.10-1.0.jar
14/12/05 00:17:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Num movies:3883
ID:2020 Title:Dangerous Liaisons (1988)
ID:2021 Title:Dune (1984)
ID:2022 Title:Last Temptation of Christ, The (1988)
ID:2023 Title:Godfather: Part III, The (1990)
ID:2024 Title:Rapture, The (1991)
ID:2025 Title:Lolita (1997)
ID:2026 Title:Disturbing Behavior (1998)
ID:2027 Title:Mafia! (1998)
ID:2028 Title:Saving Private Ryan (1998)
ID:2029 Title:Billy's Hollywood Screen Kiss (1997)
...
but when I run the same program on a Hadoop cluster using the command
spark-submit --master yarn-client --class "MovieApp" movie-recommender.jar
the output is different, as shown below (no movie details?):
[root@philli ml]# /usr/lib/spark/bin/spark-submit --master yarn-client --class "MovieApp" movie-recommender_2.10-1.0.jar
14/12/05 00:21:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/05 00:21:07 WARN BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
--args is deprecated. Use --arg instead.
Num movies:3883
[root@philli ml]#
Why does the behavior of the program change between running locally and running on the cluster? I have built Spark 1.1.1 for Hadoop using the command
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
The cluster I am using is HDP 2.1.
A sample of the movies.dat file is as follows:
1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
4::Waiting to Exhale (1995)::Comedy|Drama
5::Father of the Bride Part II (1995)::Comedy
6::Heat (1995)::Action|Crime|Thriller
7::Sabrina (1995)::Comedy|Romance
8::Tom and Huck (1995)::Adventure|Children's
9::Sudden Death (1995)::Action
10::GoldenEye (1995)::Action|Adventure|Thriller
When you run the program on a cluster, the foreach closure is executed on the workers, so the println does happen, but it goes to each worker's stdout rather than to the driver.
Look into the YARN logs and you will find the expected output.
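If you want to see the movie details on the driver's console under YARN as well, one option (a sketch; only sensible for a bounded number of rows) is to bring a sample back to the driver before printing:
// Print on the driver: pull a bounded sample back first
movies.take(20).foreach { m =>
  println("ID:" + m._1 + " Title:" + m._2)
}
Alternatively, inspect the worker stdout with yarn logs -applicationId <application id>.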

akka.actor.ActorLogging does not log the stack trace of an exception via Logback

I am using Logback + SLF4J to do logging for actors with the akka.actor.ActorLogging trait. However, when I call log.error("Error occur!", e), the stack trace of the exception e is not logged; only the line Error occur! WARNING arguments left: 1 is printed. I wonder why, and how I can get the stack trace into the log file. Thank you. The following is my logback.groovy configuration:
appender("FILE", RollingFileAppender) {
file = "./logs/logd.txt"
append = true
rollingPolicy(TimeBasedRollingPolicy) {
fileNamePattern = "./logs/logd.%d{yyyy-MM-dd}.log"
maxHistory = 30
}
encoder(PatternLayoutEncoder) {
pattern = "%date{ISO8601} [%thread] %-5level %logger{36} %X{sourceThread} - %msg%n"
}
}
root(DEBUG, ["FILE"])
Akka has its own logging, which is configured in Akka's application.conf. If you want to bridge it to SLF4J/Logback, use these settings:
akka {
  loggers = ["akka.event.slf4j.Slf4jLogger"]
  loglevel = "DEBUG"
}
See: http://doc.akka.io/docs/akka/2.0/scala/logging.html
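Note that Slf4jLogger lives in the separate akka-slf4j module, so you also need that dependency on the classpath (an sbt example; pick the version matching your Akka release):
libraryDependencies += "com.typesafe.akka" %% "akka-slf4j" % "2.4.16"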
As far as I can see here, the reason (Throwable) should be the first argument of log.error:
def error(cause: Throwable, message: String)
That's why you see "WARNING arguments left": your Throwable argument was simply ignored.
The 'cause' exception should be the first argument to error, not the second (as correctly mentioned by JasonG in a comment on another answer).
Using the Akka log system instead of 'bare' scala-logging has some advantages around automatically added metadata and easier testing/filtering.
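A minimal sketch of the corrected call inside an actor (the actor class and message handling here are illustrative):
import akka.actor.{Actor, ActorLogging}

class Worker extends Actor with ActorLogging {
  def receive: Receive = {
    case _ =>
      try {
        // ... do the actual work ...
      } catch {
        case e: Exception =>
          // Throwable first, message second: the stack trace is then logged
          log.error(e, "Error occur!")
      }
  }
}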
See also:
http://doc.akka.io/docs/akka/2.4.16/scala/logging.html
http://doc.akka.io/api/akka/2.4/akka/event/LoggingAdapter.html#error(cause:Throwable,message:String):Unit