How to disable logging in a Spark/Scala project built with Gradle?

I am using IntelliJ with Gradle to build a Spark project. I have tried multiple ways to disable logging, but to no avail.
Here are some of the things that I have tried.
1. Add log4j.properties under src/main/resources with the following configuration:
# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN
# Change this to set Spark log level
log4j.logger.org.apache.spark=WARN
# Silence akka remoting
log4j.logger.Remoting=WARN
# Ignore messages below warning level from Jetty, because it's a bit verbose
log4j.logger.org.eclipse.jetty=WARN
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=ERROR
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=ERROR
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
2. Add these before and after the SparkSession creation:
import org.apache.log4j.{Level, LogManager, Logger}

Logger.getLogger("org").setLevel(Level.OFF)
Logger.getLogger("akka").setLevel(Level.OFF)
Logger.getLogger("spark").setLevel(Level.OFF)
LogManager.getRootLogger.setLevel(Level.OFF)
3. Add this when creating the SparkSession:
val spark: SparkSession = SparkSession.builder()
  .appName("test").master("local")
  .enableHiveSupport()
  .getOrCreate()
spark.sparkContext.setLogLevel("OFF")
I am running IntelliJ on macOS, with
scalaVersion = 2.11.8
sparkVersion = 2.1.1
Please help me; I have looked up and down the Internet and still can't find anything that works.
Thanks a million!
May.
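One more thing worth trying (a minimal sketch, not from the thread above, assuming the log4j 1.x API bundled with Spark 2.1): load log4j.properties from the classpath explicitly before the SparkSession is created. If the lookup returns null, Gradle never put src/main/resources on the runtime classpath, which would explain why the file is being ignored.
import org.apache.log4j.PropertyConfigurator

object LoggingSetup {
  def init(): Unit = {
    // If this returns null, log4j.properties is not on the runtime classpath,
    // which would explain why the file-based configuration is ignored.
    val url = getClass.getResource("/log4j.properties")
    if (url != null) PropertyConfigurator.configure(url)
    else Console.err.println("log4j.properties not found on the classpath")
  }
}
LoggingSetup and init are hypothetical names; the idea is to call LoggingSetup.init() as the first statement in main, before the SparkSession is built.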

Related

Spark Application Not reading log4j.properties present in Jar

I am using MapR 5.2 with Spark version 2.1.0,
and I am running my Spark app jar in YARN cluster mode.
I have tried all the available options that I found, but was unable to succeed.
This is our production environment, but I need this particular Spark job to pick up my log4j-Driver.properties file, which is present in my src/main/resources folder (I also confirmed, by opening the jar, that my properties file is present).
1) Contents of my log file -> log4j-Driver.properties
log4j.rootCategory=DEBUG, FILE
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File=/users/myuser/logs/Myapp.log
log4j.appender.FILE.ImmediateFlush=true
log4j.appender.FILE.Threshold=debug
log4j.appender.FILE.Append=true
log4j.appender.FILE.MaxFileSize=100MB
log4j.appender.FILE.MaxBackupIndex=10
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
2) My Script for Spark-Submit Command
propertyFile=application.properties
spark-submit --class MyAppDriver \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/users/myuser/log4j-Driver.properties" \
  --master yarn --deploy-mode cluster \
  --files /users/myuser/log4j-Driver.properties,/opt/mapr/spark/spark-2.1.0/conf/hive-site.xml,/users/myuser/application.properties \
  /users/myuser/myapp_2.11-1.0.jar $propertyFile
All I need as of now is to write my driver logs to the directory mentioned in my properties file (shown above). If I am successful in this, I will try for the executor logs as well. But first I need to make this driver log write to my local filesystem (and it's an edge node of our cluster).
/users/myuser/log4j-Driver.properties seems to be the path to the file on your local computer so you were right to use it for the --files argument.
The problem is that there's no such file on the driver and/or executor, so when you use file:/users/myuser/log4j-Driver.properties as the argument to -Dlog4j.configuration, Log4j will fail to find it.
Since you run on YARN, files listed as arguments to --files will be submitted to HDFS. Each application submission will have its own base directory in HDFS where all the files will be put by spark-submit.
In order to refer to these files, use relative paths. In your case, --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-Driver.properties" should work.
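To confirm that the file actually lands in the container's working directory, a small hedged check (not part of the original answer) can be added to the driver code:
// Verify from inside the driver that the localized file is in the working directory.
val localCopy = new java.io.File("log4j-Driver.properties")
println(s"log4j-Driver.properties present in working dir: ${localCopy.exists()}")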

log4j.properties file not found on classpath or ignored

I want to log a Spark job to MapR-DB with log4j. I have written a custom appender, and here is my log4j.properties:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
log4j.appender.MapRDB=com.datalake.maprdblogger.Appender
log4j.logger.testest=WARN, MapRDB
It is placed in the src/main/resources directory.
This is my main method:
import org.apache.log4j.Logger

object App {
  val log: Logger = org.apache.log4j.LogManager.getLogger(getClass.getName)

  def main(args: Array[String]): Unit = {
    // custom appender
    LogHelper.fillLoggerContext("dev", "test", "test", "testest", "")
    log.error("bad record.")
  }
}
When I run my spark-submit without any configuration, nothing happens. It is as if my log4j.properties weren't there.
If I deploy my log4j.properties file manually and add the options:
--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/PATH_TO_FILE/log4j.properties
--conf spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/PATH_TO_FILE/log4j.properties
it works well. Why doesn't it work without these options?
The "spark.driver.extraJavaOptions" :
Default value is: (none)
A string of extra JVM options to pass to the driver. For instance, GC settings or other logging. Note that it is illegal to set maximum heap size (-Xmx) settings with this option. Maximum heap size settings can be set with spark.driver.memory in the cluster mode and through the --driver-memory command-line option in the client mode.
Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-java-options command line option or in your default properties file.
[Refer to this link for more details: https://spark.apache.org/docs/latest/configuration.html]
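A minimal sketch illustrating that note (not code from the question; the property keys are the standard Spark ones quoted above):
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Fine: executor JVMs have not started yet when this conf is read.
  .set("spark.executor.extraJavaOptions", "-Dlog4j.configuration=file:/PATH_TO_FILE/log4j.properties")
  // In client mode this line is too late, because the driver JVM is already running;
  // pass --driver-java-options on the spark-submit command line instead.
  .set("spark.driver.extraJavaOptions", "-Dlog4j.configuration=file:/PATH_TO_FILE/log4j.properties")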

Unable to connect to remote Spark Master - ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message

I posted this question some time ago, but it turned out that I was using my local resources instead of the remote ones.
I have a remote machine configured with Spark 2.1.1, Cassandra 3.0.9 and Scala 2.11.6.
Cassandra is configured at localhost:9032 and the Spark master at localhost:7077.
The Spark master is set to 127.0.0.1 and its port to 7077.
I'm able to connect to Cassandra remotely, but unable to do the same thing with Spark.
When connecting to the remote spark master, I get the following error:
ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
Here are my settings via code
val configuration = new SparkConf(true)
  .setAppName("myApp")
  .setMaster("spark://xx.xxx.xxx.xxx:7077")
  .set("spark.cassandra.connection.host", "xx.xxx.xxx.xxx")
  .set("spark.cassandra.connection.port", "9042") // SparkConf.set expects a String value
  .set("spark.cassandra.input.consistency.level", "ONE")
  .set("spark.driver.allowMultipleContexts", "true")

val sparkSession = SparkSession
  .builder()
  .appName("myAppEx")
  .config(configuration)
  .enableHiveSupport()
  .getOrCreate()
I don't understand why Cassandra works just fine and Spark does not.
What's causing this? How can I solve it?
I am answering this question in order to help other people who are struggling with this problem.
It turned out that it was caused by a mismatch between IntelliJ IDEA's Scala version and the server's one.
The server had Scala ~2.11.6 while the IDE was using Scala ~2.11.8.
In order to make sure of using the very same version, it was necessary to change the IDE's Scala version with the following steps:
File > Project Structure > + > Scala SDK > find and select the server's Scala version > download it if it isn't already installed > OK > Apply > OK
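A quick way to double-check which versions the driver actually runs with (a hedged sketch, not part of the original answer; sparkSession is the session built above):
// Print the Scala and Spark versions seen by the running application,
// which makes a client/server version mismatch easy to spot.
println(s"Scala version: ${scala.util.Properties.versionString}")
println(s"Spark version: ${sparkSession.version}")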
Could this be a typo? The error message reports 8990, but in your connection config you have port 8090 for Spark.

Spark 2.2.0 - unable to read recursively into directory structure

Problem summary:
I am unable to read from nested subdirectories with my Spark program, despite setting the required Hadoop configuration (see the attempts below).
I get the error pasted below.
Any help is appreciated.
Version:
Spark 2.2.0
Input directory layout:
/user/akhanolk/data/myq/parsed/myq-app-logs/to-be-compacted/flat-view-format/batch_id=1502939225073/part-00000-3a44cd00-e895-4a01-9ab9-946064b739d4-c000.parquet
/user/akhanolk/data/myq/parsed/myq-app-logs/to-be-compacted/flat-view-format/batch_id=1502939234036/part-00000-cbd47353-0590-4cc1-b10d-c18886df1c25-c000.parquet
...
Input directory parameter passed:
/user/akhanolk/data/myq/parsed/myq-app-logs/to-be-compacted/flat-view-format/*/*
Attempted (1):
Set parameter in code...
val sparkSession: SparkSession = SparkSession.builder().master("yarn").getOrCreate()
// Recursive glob support & log level
import sparkSession.implicits._
sparkSession.sparkContext.hadoopConfiguration.setBoolean("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", true)
Did not see the configuration in place in Spark UI.
Attempted (2):
Passed the config from the CLI - spark-submit, and set it in code (see below).
spark-submit --conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true \...
I do see the configuration in the Spark UI, but I get the same error – it cannot traverse into the directory structure.
Code:
// Spark Session
val sparkSession: SparkSession = SparkSession.builder().master("yarn").getOrCreate()

// Recursive glob support
val conf = new SparkConf()
val cliRecursiveGlobConf = conf.get("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive")

import sparkSession.implicits._
sparkSession.sparkContext.hadoopConfiguration.set("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", cliRecursiveGlobConf)
Error & overall output:
Full error is at - https://gist.github.com/airawat/77fbdb821410a5a87dfd29ffaf60fdf9
17/08/18 15:59:29 INFO state.StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
Exception in thread "main" java.io.FileNotFoundException: File /user/akhanolk/data/myq/parsed/myq-app-logs/to-be-compacted/flat-view-format/batch_id=*/* does not exist.
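One hedged note, not an answer from the thread: when Spark copies spark.hadoop.* entries onto the Hadoop configuration, it drops the spark.hadoop. prefix, so the key to look for on hadoopConfiguration is the unprefixed one. A quick sketch to see which variant is actually set:
// Check which variant of the key actually reached the Hadoop configuration.
val hc = sparkSession.sparkContext.hadoopConfiguration
println(s"recursive (unprefixed): ${hc.get("mapreduce.input.fileinputformat.input.dir.recursive", "not set")}")
println(s"recursive (prefixed): ${hc.get("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", "not set")}")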

java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream with Spark on local mode

I have used Spark in yarn-cluster mode before, and it's been good so far.
However, I wanted to run it in "local" mode, so I created a simple Scala app, added Spark as a dependency via Maven, and then tried to run the app like a normal application.
However, I get the above exception on the very first line, where I try to create a SparkConf object.
I don't understand why I need Hadoop to run a standalone Spark app. Could someone point out what's going on here?
My two line app:
val sparkConf = new SparkConf().setMaster("local").setAppName("MLPipeline.AutomatedBinner")
  //.set("spark.default.parallelism", "300").set("spark.serializer", "org.apache.spark.serializer.KryoSerializer").set("spark.kryoserializer.buffer.mb", "256").set("spark.akka.frameSize", "256").set("spark.akka.timeout", "1000")
  //.set("spark.akka.threads", "300")//.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  //.set("spark.akka.timeout", "1000")
val sc = new SparkContext(sparkConf)