Scala | Databricks - Monitoring logging - error in Log Analytics - Scala

I am trying to create custom logging for my Databricks notebook. For that I followed the monitoring/logging approach, which seems to be the only option for Scala code to get custom logging in Databricks. I followed the steps: created the Maven project, built the jars, and copied them to DBFS. Now the Log Analytics workspace I created is showing the error below. Why? And what do I have to do to send only my custom logs from the cluster to Log Analytics for Scala code?
When I initialized the logger like this: val logger = Logger.getLogger("SDMmodule")
I ran into this issue: Spark NotSerializableException when overriding log4j logs
If I initialize it again now, won't I hit the same issue? What snippet of code do I have to add to my Scala notebooks so that I don't end up back at square one?
Hope you can help! Thank you.
I am getting the error with the monitoring-logging repo:
https://github.com/mspnp/spark-monitoring
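A common serialization-safe pattern is to keep the Log4j logger out of any Spark closure by holding it in an object with a @transient lazy val, so the logger is recreated on each JVM instead of being serialized with a task. This is only a minimal sketch: it assumes the spark-monitoring init script has already attached its Log Analytics appender on the cluster, and SDMLogging is just a made-up name:
import org.apache.log4j.Logger
object SDMLogging extends Serializable {
  // Recreated lazily on each JVM; never serialized with a closure.
  @transient lazy val logger: Logger = Logger.getLogger("SDMmodule")
}
// Driver-side usage in the notebook:
SDMLogging.logger.info("SDM module started")
// Inside transformations, reference the object instead of capturing a local Logger val:
// df.foreachPartition { _ => SDMLogging.logger.info("processing a partition") }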

Related

How to resolve: ERROR Error while starting connector "java.lang.NoSuchFieldError: SYSTEM"

I am not able to get kafka-connect-jdbc working with Kafka 0.10.1 on an HDInsight cluster. Here are the steps I have gone through so far:
1. I cloned the repo and ran mvn install; after struggling with the dependencies (not SNAPSHOTs), I got the jar.
2. I moved it to ./libs and was able to see io.confluent.connect.jdbc.JdbcSourceConnector and io.confluent.connect.jdbc.JdbcSinkConnector when hitting GET /connector-plugins.
3. I created a source connector to Azure SQL Server and a topic.
I am getting the following error as soon as the source connector is created:
[2018-02-14 15:46:16,960] ERROR Error while starting connector azure-source-connector-test (org.apache.kafka.connect.runtime.WorkerConnector:108)
java.lang.NoSuchFieldError: SYSTEM
at io.confluent.connect.jdbc.source.JdbcSourceConnectorConfig.<init>(JdbcSourceConnectorConfig.java:184)
at io.confluent.connect.jdbc.JdbcSourceConnector.start(JdbcSourceConnector.java:69)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:100)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:125)
at org.apache.kafka.connect.runtime.WorkerConnector.transitionTo(WorkerConnector.java:182)
at org.apache.kafka.connect.runtime.Worker.startConnector(Worker.java:165)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startConnector(DistributedHerder.java:773)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startWork(DistributedHerder.java:747)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.handleRebalanceCompleted(DistributedHerder.java:708)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:204)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:174)
at java.lang.Thread.run(Thread.java:748)
I tried different versions/branches of kafka-connect-jdbc, but every attempt ends with the same error.
This question has the same issue, so I tried to force the build to use connect-api-0.10.1.2.6.2.3-1.jar via systemPath in pom.xml, but even when the build goes through, it still fails with the same error.
Any ideas?
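For reference, a NoSuchFieldError like this usually means the connect-api classes resolved at runtime are older than the ones kafka-connect-jdbc was compiled against. One way to see which connect-api jar the worker JVM actually picked up is to print the code source of a connect-api class; a minimal diagnostic sketch, written in Scala for consistency with the rest of this thread (the equivalent two lines work in Java):
// Prints the jar (or directory) the connect-api Connector class was loaded from.
val connectApiLocation = classOf[org.apache.kafka.connect.connector.Connector]
  .getProtectionDomain.getCodeSource.getLocation
println(s"connect-api loaded from: $connectApiLocation")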

Why does running Spark job fail to find classes inside uberjar on EMR while it works locally fine?

I have a Spark job that uses some external libraries. When I run the job locally through the main method from IntelliJ, it runs without any issues. However, when I assemble the job into a jar file (I create an uber-JAR using sbt) and try to run it on EMR, it throws a ClassNotFoundException.
I have checked that the class is indeed inside the jar file, so it should be available for the job. I have also tried the spark-submit options spark.driver.extraClassPath, spark.driver.extraLibraryPath, spark.executor.extraClassPath and spark.executor.extraLibraryPath, as well as spark.driver.userClassPathFirst and spark.executor.userClassPathFirst. I also tried calling sparkContext.addJar("/mnt/jars/myJar") in the code. None of them worked for me.
Also, when running on EMR I can see a log line saying that the JAR was added (not sure if it is loaded onto the classpath, but it should be, because other classes are being loaded properly):
15/11/02 04:10:26 INFO SparkContext: Added JAR file:///mnt/my-app-1.0-SNAPSHOT.jar at http://172.31.42.244:44471/jars/my-app-1.0-SNAPSHOT.jar with timestamp 1446437426661
I am running out of ideas about what else to try. I have been researching and I see a few tickets on the Spark JIRA board, but nothing similar to my issue.
I am running on EMR release-label 4.1.0 (Spark 1.5.0), Java 7, sbt 0.13.7 and Scala 2.10.5.
I think when launching your job on EMR you need to provide the S3 location of your jar dependencies, as described in the manual, e.g. -u s3://sparksupport/libs. These jars will then be added to the classpath when Spark runs.
It turned out to be a problem with SerializationUtils from Apache Commons Lang. There is an open issue where the class throws a ClassNotFoundException even when the class is on the classpath, in a multiple-classloader environment: https://issues.apache.org/jira/browse/LANG-1049
We moved away from the library and our Spark job is working fine now. In the end, the issue was not related to Spark at all.
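For anyone who wants to keep the behaviour without commons-lang, here is a minimal sketch of a classloader-aware replacement for SerializationUtils.deserialize; the crux of LANG-1049 is that commons-lang's stream does not consult the classloader that actually loaded the uber-jar's classes. ClassLoaderAwareObjectInputStream and deserialize are made-up names:
import java.io.{ByteArrayInputStream, InputStream, ObjectInputStream, ObjectStreamClass}
// ObjectInputStream variant that resolves classes against an explicit classloader.
class ClassLoaderAwareObjectInputStream(in: InputStream, loader: ClassLoader)
    extends ObjectInputStream(in) {
  override def resolveClass(desc: ObjectStreamClass): Class[_] =
    Class.forName(desc.getName, false, loader)
}
def deserialize[T](bytes: Array[Byte], loader: ClassLoader): T = {
  val ois = new ClassLoaderAwareObjectInputStream(new ByteArrayInputStream(bytes), loader)
  try ois.readObject().asInstanceOf[T]
  finally ois.close()
}
// Typical call site inside a Spark task:
// val value = deserialize[MyClass](bytes, Thread.currentThread().getContextClassLoader)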

Can't use akka in IDEA plugin development

I want to use Akka while developing an IDEA plugin, but I have run into some problems.
I created a demo project here: https://github.com/freewind/idea-plugin-akka-demo
You can just clone it and reproduce the problems on your computer. (Note the Setup section.)
I have copied the problems here:
Problems
1. Can't use the default Akka configuration
If I remove:
src/main/resources/application.conf
src/main/scala/freewind/MyAkkaConfig
and run the plugin, it reports this error on startup:
com.intellij.ide.plugins.PluginManager$StartupAbortedException:
com.intellij.diagnostic.PluginException: No configuration setting found for key 'akka'
[Plugin: com.yourcompany.unique.plugin.id]
2. Can't load the configuration from a file
Then I copied reference.conf from the Akka jar to src/main/resources/application.conf, but it still reports the same error. It seems Akka inside an IDEA plugin can't find this file automatically.
3. ClassNotFoundException: akka.actor.LightArrayRevolverScheduler
So I had to hardcode the configuration in Scala code in MyAkkaConfig.scala, but this time it reports another error:
com.intellij.ide.plugins.PluginManager$StartupAbortedException:
com.intellij.diagnostic.PluginException: ClassNotFoundException: akka.actor.LightArrayRevolverScheduler
[Plugin: com.yourcompany.unique.plugin.id]
akka.actor.LightArrayRevolverScheduler is used in MyAkkaConfig.scala and is included in akka-actor_2.11:2.3.12. So why can't IDEA load it?
The third problem can be fixed by passing the classloader:
val system = ActorSystem("my-actor", MyAkkaConfig.config, this.getClass.getClassLoader)
We can also drop MyAkkaConfig.config entirely and load the application.conf file under resources instead; a sketch of the combined approach follows:
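This is only a sketch, and it assumes the code runs inside a plugin component, so that this.getClass resolves to a class loaded by the plugin classloader:
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory
// Resolve both the config (application.conf / reference.conf) and Akka's own
// classes through the plugin classloader instead of IDEA's platform classloader.
val pluginClassLoader = this.getClass.getClassLoader
val config = ConfigFactory.load(pluginClassLoader)
val system = ActorSystem("my-actor", config, pluginClassLoader)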

Change logging level for sbt project using Play's Logger

I have an sbt project that uses Play 2.2 as a dependency. When a program is run in the console, only ERROR-level messages show up. How do I change the level so that INFO and DEBUG messages are printed as well?
Example ERROR-level output:
18:36:22.388 [run-main-1] ERROR application - Message
18:36:22.389 [run-main-1] ERROR application - Message
Try this:
First, check that the Play application's logging level is set properly. You can override log levels in application.conf as follows:
# Root logger:
logger.root=DEBUG
For more detailed information, see this article: Play 2.2 logging configuration
Also make sure that sbt itself runs at the proper logging level:
$ sbt --debug
For more detailed information, see this article: sbt 0.13.2 logging configuration
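To confirm which levels actually reach the console after the change, a quick check is to emit one message per level through Play's Logger (the "application" logger from the output above) and see which ones appear; LogLevelCheck is just a made-up name for this sketch:
import play.api.Logger
object LogLevelCheck {
  def main(args: Array[String]): Unit = {
    val log = Logger("application")
    log.debug("debug message")  // appears only if the effective level is DEBUG
    log.info("info message")    // appears at INFO or DEBUG
    log.error("error message")  // appears even at the default ERROR level
  }
}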

Launch a mapreduce job from eclipse

I've written a MapReduce program in Java which I can submit to a remote cluster running in distributed mode. Currently, I submit the job using the following steps:
1. Export the MapReduce job as a jar (e.g. myMRjob.jar)
2. Submit the job to the remote cluster using the following shell command: hadoop jar myMRjob.jar
I would like to submit the job directly from Eclipse when I run the program. How can I do this?
I am currently using CDH3, and an abridged version of my conf is:
conf.set("hbase.zookeeper.quorum", getZookeeperServers());
conf.set("fs.default.name","hdfs://namenode/");
conf.set("mapred.job.tracker", "jobtracker:jtPort");
Job job = new Job(conf, "COUNT ROWS");
job.setJarByClass(CountRows.class);
// Set up Mapper
TableMapReduceUtil.initTableMapperJob(inputTable, scan,
CountRows.MyMapper.class, ImmutableBytesWritable.class,
ImmutableBytesWritable.class, job);
// Set up Reducer
job.setReducerClass(CountRows.MyReducer.class);
job.setNumReduceTasks(16);
// Setup Overall Output
job.setOutputFormatClass(MultiTableOutputFormat.class);
job.submit();
When I run this directly from Eclipse, the job is launched but Hadoop cannot find the mappers/reducers. I get the following errors:
12/06/27 23:23:29 INFO mapred.JobClient: map 0% reduce 0%
12/06/27 23:23:37 INFO mapred.JobClient: Task Id : attempt_201206152147_0645_m_000000_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: com.mypkg.mapreduce.CountRows$MyMapper
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:996)
at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:212)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:602)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
...
Does anyone know how to get past these errors? If I can fix this, I can integrate more MR jobs into my scripts, which would be awesome!
If you're submitting the Hadoop job from within the Eclipse project that defines the job's classes, then you most probably have a classpath problem.
The job.setJarByClass(CountRows.class) call is finding the class file on the build classpath, not in CountRows.jar (which may or may not have been built yet, or be on the classpath at all).
You should be able to confirm this by printing the result of job.getJar() after you call job.setJarByClass(..); if it doesn't show a jar file path, then the build class was found rather than the jar'd class, as in the sketch below:
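A minimal sketch of that check, written in Scala for consistency with the rest of this thread (the Hadoop API calls are identical in Java); the jar path below is hypothetical:
job.setJarByClass(classOf[CountRows])
// null or a directory here means the build classpath was found, not a jar,
// so the mapper/reducer classes will not be shipped to the cluster.
println(s"job jar = ${job.getJar}")
// Explicit alternative: point directly at the exported jar.
job.setJar("/path/to/myMRjob.jar")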
What worked for me was exporting a runnable JAR (the difference from a plain JAR is that a runnable JAR defines the class that has the main method) and selecting the "packaging required libraries into JAR" option (choosing the "extracting..." option leads to duplicate errors, and it also has to extract the class files from the jars, which, ultimately, in my case, did not resolve the ClassNotFoundException).
After that, you can just set the jar, as was suggested by Chris White. On Windows it would look like this: job.setJar("C:\\MyJar.jar");
If it helps anybody, I made a presentation on what I learned from creating a MapReduce project and running it on Hadoop 2.2.0 on Windows 7 (in Eclipse Luna).
I used the method from the following website to configure a Map/Reduce project of mine to run from Eclipse (without exporting the project as a JAR):
Configuring Eclipse to run Hadoop Map/Reduce project
Note: if you decide to debug your program, your Mapper and Reducer classes won't be debuggable.
Hope it helps. :)