ReactiveMongo 0.12 application.conf issue and logging issue - scala

I've read everything I could find on SO and the ReactiveMongo community list, and I am stumped. I am using ReactiveMongo 0.12 and am just trying to test it out, since I have run into some other problems.
The code in my scala worksheet is:
import reactivemongo.api.{DefaultDB, MongoConnection, MongoDriver}
import reactivemongo.bson.{
  BSONDocumentWriter, BSONDocumentReader, Macros, document
}
import com.typesafe.config.{Config, ConfigFactory}
lazy val conf = ConfigFactory.load()
val driver1 = new reactivemongo.api.MongoDriver
val connection3 = driver1.connection(List("localhost"))
and the error I get is
[NGSession 3: 127.0.0.1: compile-server] INFO reactivemongo.api.MongoDriver - No mongo-async-driver configuration found
com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka'
at com.typesafe.config.impl.SimpleConfig.findKey(testMongo.sc:120)
at com.typesafe.config.impl.SimpleConfig.find(testMongo.sc:143)
at com.typesafe.config.impl.SimpleConfig.find(testMongo.sc:155)
at com.typesafe.config.impl.SimpleConfig.find(testMongo.sc:160)
at com.typesafe.config.impl.SimpleConfig.getString(testMongo.sc:202)
at akka.actor.ActorSystem$Settings.<init>(testMongo.sc:165)
at akka.actor.ActorSystemImpl.<init>(testMongo.sc:501)
at akka.actor.ActorSystem$.apply(testMongo.sc:138)
at reactivemongo.api.MongoDriver.<init>(testMongo.sc:879)
at #worksheet#.driver1$lzycompute(testMongo.sc:9)
at #worksheet#.driver1(testMongo.sc:9)
at #worksheet#.get$$instance$$driver1(testMongo.sc:9)
at #worksheet#.#worksheet#(testMongo.sc:30)
My application.conf is in src/main/resources of the sub-project in which this worksheet is found, and it contains this:
mongo-async-driver {
  akka {
    loglevel = WARNING
  }
}
I added the ConfigFactory call precisely because I got this error and thought it might help: looking at the ReactiveMongo code, that is what it does at this point, so I hoped an explicit call here would force the configuration to load. I have moved the application.conf file to every conceivable place, including a conf directory (thinking it might require Play conventions) and src/main/resources of the top-level project. Nothing works. So my first question is: what am I doing wrong? Where should the application.conf file go?
After this INFO message my program crashes and the driver doesn't get created, so I can't move on from here.
Also, I added an akka key to reference.conf just in case; that didn't help either.
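For reference, the 0.12 driver can also be handed a Typesafe Config explicitly rather than relying on its own classpath lookup. A minimal sketch of that approach, assuming the Option[Config] constructor documented for 0.12.x:
import com.typesafe.config.ConfigFactory
import reactivemongo.api.MongoDriver

// Load application.conf from the classpath and pass it to the driver explicitly,
// instead of letting the driver call ConfigFactory.load() itself
val config = ConfigFactory.load()
val driver = new MongoDriver(Some(config))
val connection = driver.connection(List("localhost"))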

Related

Spark ElasticSearch Configuration - Reading Elastic Search from Spark

I am trying to read data from Elasticsearch via Spark Scala. I see a lot of posts addressing this question and have tried all the options they mention, but nothing seems to work for me.
JAR used - elasticsearch-hadoop-5.6.8.jar (also tried elasticsearch-spark-5.6.8.jar, without success)
Elastic Search Version - 5.6.8
Spark - 2.3.0
Scala - 2.11
Code:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
val spark = SparkSession.builder.appName("elasticSpark").master("local[*]").getOrCreate()
val reader = spark.read.format("org.elasticsearch.spark.sql")
  .option("es.index.auto.create", "true")
  .option("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .option("es.port", "9200")
  .option("es.nodes", "xxxxxxxxx")
  .option("es.nodes.wan.only", "true")
  .option("es.net.http.auth.user", "xxxxxx")
  .option("es.net.http.auth.pass", "xxxxxxxx")
val read = reader.load("index/type")
Error:
ERROR rest.NetworkClient: Node [xxxxxxxxx:9200] failed (The server xxxxxxxxxxxxx failed to respond); no other nodes left - aborting...
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:294)
at org.elasticsearch.spark.sql.SchemaUtils$.discoverMappingAndGeoFields(SchemaUtils.scala:98)
at org.elasticsearch.spark.sql.SchemaUtils$.discoverMapping(SchemaUtils.scala:91)
at org.elasticsearch.spark.sql.ElasticsearchRelation.lazySchema$lzycompute(DefaultSource.scala:129)
at org.elasticsearch.spark.sql.ElasticsearchRelation.lazySchema(DefaultSource.scala:129)
at org.elasticsearch.spark.sql.ElasticsearchRelation$$anonfun$schema$1.apply(DefaultSource.scala:133)
at org.elasticsearch.spark.sql.ElasticsearchRelation$$anonfun$schema$1.apply(DefaultSource.scala:133)
at scala.Option.getOrElse(Option.scala:121)
at org.elasticsearch.spark.sql.ElasticsearchRelation.schema(DefaultSource.scala:133)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:432)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
... 53 elided
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[xxxxxxxxxxx:9200]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:149)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:461)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:425)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:429)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:155)
at org.elasticsearch.hadoop.rest.RestClient.remoteEsVersion(RestClient.java:655)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:287)
... 65 more
Apart from this, I have also tried the properties below, without any success:
option("es.net.ssl.cert.allow.self.signed", "true")
option("es.net.ssl.truststore.location", "<path for elasticsearch cert file>")
option("es.net.ssl.truststore.pass", "xxxxxx")
Please note the Elasticsearch node is on a Unix edge node and is reachable at http://xxxxxx:9200 (mentioning it just in case that makes any difference).
What am I missing here? Any other properties? Please help.
Use the JAR below, which supports Spark 2+, instead of the elasticsearch-hadoop or elasticsearch-spark JAR.
https://mvnrepository.com/artifact/org.elasticsearch/elasticsearch-spark-20_2.11/5.6.8
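If the project builds with sbt rather than Maven, the equivalent dependency line would look like this (artifact name and version taken from the Maven Repository link above):
// build.sbt: Spark 2.x flavour of the Elasticsearch connector, built for Scala 2.11
libraryDependencies += "org.elasticsearch" % "elasticsearch-spark-20_2.11" % "5.6.8"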

Error running spark in a Scala REPL - access denied org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )

I have been using IntelliJ to get up to speed with developing Spark applications in Scala using sbt. I understand the basics, although IntelliJ hides a lot of the scaffolding, so I'd like to try getting something up and running from the command line (i.e. using a REPL). I am using macOS.
Here's what I've done:
mkdir -p ~/tmp/scalasparkrepl
cd !$
echo 'scalaVersion := "2.11.12"' > build.sbt
echo 'libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"' >> build.sbt
echo 'libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"' >> build.sbt
echo 'libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.3.0"' >> build.sbt
sbt console
That opens a scala REPL (including downloading all the dependencies) in which I run:
import org.apache.spark.SparkConf
import org.apache.spark.sql.{SparkSession, DataFrame}
val conf = new SparkConf().setMaster("local[*]")
val spark = SparkSession.builder().appName("spark repl").config(conf).config("spark.sql.warehouse.dir", "~/tmp/scalasparkreplhive").enableHiveSupport().getOrCreate()
spark.range(0, 1000).toDF()
which fails with error access denied org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" ):
scala> spark.range(0, 1000).toDF()
18/05/08 11:51:11 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('~/tmp/scalasparkreplhive').
18/05/08 11:51:11 INFO SharedState: Warehouse path is '/tmp/scalasparkreplhive'.
18/05/08 11:51:12 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
18/05/08 11:51:12 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
18/05/08 11:51:12 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
18/05/08 11:51:12 INFO ObjectStore: ObjectStore, initialize called
18/05/08 11:51:13 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
18/05/08 11:51:13 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
I've googled around and there is some information on this error, but nothing I've been able to use to solve it. I find it strange that a Scala/sbt project on the command line has this problem while an sbt project in IntelliJ works fine (I pretty much copied/pasted the code from an IntelliJ project). I guess IntelliJ is doing something on my behalf, but I don't know what; that's why I'm undertaking this exercise.
Can anyone advise how to solve this problem?
I'm not going to take full credit for this, but it looks similar to SBT test does not work for spark test.
The solution is to issue this line before running the Scala code:
System.setSecurityManager(null)
So in full:
System.setSecurityManager(null)
import org.apache.spark.SparkConf
import org.apache.spark.sql.{SparkSession, DataFrame}
val conf = new SparkConf().setMaster("local[*]")
val spark = SparkSession.builder().appName("spark repl").config(conf).config("spark.sql.warehouse.dir", "~/tmp/scalasparkreplhive").enableHiveSupport().getOrCreate()
spark.range(0, 1000).toDF()
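A slightly more defensive variant of the same workaround clears the security manager only when one is actually installed; this uses plain JDK calls (System.getSecurityManager / System.setSecurityManager) and is just a sketch of the same idea:
// Clear the security manager only if one is installed
if (System.getSecurityManager != null) {
  System.setSecurityManager(null)
}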
Alternatively, you can grant the permission explicitly; add this to your pre-init script:
export SBT_OPTS="-Djava.security.policy=runtime.policy"
Create a runtime.policy file:
grant codeBase "file:/home/user/.ivy2/cache/org.apache.derby/derby/jars/*" {
  permission org.apache.derby.security.SystemPermission "engine", "usederbyinternals";
};
This assumes that your runtime.policy file resides in the current working directory and you're pulling Derby from your locally cached Ivy repository. Change the path to reflect the actual parent folder of the Derby Jar if necessary. The placement of the asterisk is significant, and this is not a traditional shell glob.
See also: https://docs.oracle.com/javase/7/docs/technotes/guides/security/PolicyFiles.html

Spark Scala error while loading BytesWritable, invalid LOC header (bad signature)

Using sbt package, I get the following error:
Spark Scala error while loading BytesWritable, invalid LOC header (bad signature)
My code is
....
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
......
object Test {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Test")
    val sc = new SparkContext(conf) // the error is caused by this line
    ......
  }
}
Please re-load your JARs and/or library dependencies, as they might have been corrupted while building the jar through sbt - it could be an issue with one of their updates. A second possibility is that you have too many temp files open: check ports 4040-4049 on the master for any hanging jobs and kill them if so. You can also raise the open-file limit on Linux in /etc/security/limits.conf (hard nofile ***** and soft nofile *****), then reboot and run ulimit -n ****.
I was using spark-mllib_2.11 and it gave me the same error. I had to switch to the Scala 2.10 build of Spark MLlib to get rid of it.
Using Maven:
<artifactId>spark-mllib_2.10</artifactId>
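If you build with sbt instead of Maven, the same point applies: the MLlib artifact's Scala suffix must match your project's scalaVersion. A sketch (the version number is a hypothetical placeholder, not taken from this answer):
// build.sbt: %% appends the Scala binary suffix (_2.10 or _2.11) that matches scalaVersion,
// which avoids pulling an MLlib artifact built for a different Scala version
val sparkVersion = "1.6.3" // hypothetical placeholder; use the Spark version you actually run
libraryDependencies += "org.apache.spark" %% "spark-mllib" % sparkVersion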

Console scala app doesn't stop when using reactive mongo driver

I'm playing with a Mongo database through the ReactiveMongo driver.
import org.slf4j.LoggerFactory
import reactivemongo.api.MongoDriver
import reactivemongo.api.collections.default.BSONCollection
import reactivemongo.bson.BSONDocument
import scala.concurrent.Future
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
object Main {
  val log = LoggerFactory.getLogger("Main")

  def main(args: Array[String]): Unit = {
    log.info("Start")
    val conn = new MongoDriver().connection(List("localhost"))
    val db = conn("test")
    log.info("Done")
  }
}
My build.sbt file:
lazy val root = (project in file(".")).
  settings(
    name := "simpleapp",
    version := "1.0.0",
    scalaVersion := "2.11.4",
    libraryDependencies ++= Seq(
      "org.reactivemongo" %% "reactivemongo" % "0.10.5.0.akka23",
      "ch.qos.logback" % "logback-classic" % "1.1.2"
    )
  )
When I run: sbt compile run
I get this output:
$ sbt compile run
[success] Total time: 0 s, completed Apr 25, 2015 5:36:51 PM
[info] Running Main
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
17:36:52.328 [run-main-0] INFO Main - Start
17:36:52.333 [run-main-0] INFO Main - Done
And the application doesn't stop...
I have to press Ctrl + C to kill it.
I've read that MongoDriver() creates an ActorSystem, so I tried to close the connection manually with conn.close(), but I get this:
[info] Running Main
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
17:42:23.252 [run-main-0] INFO Main - Start
17:42:23.258 [run-main-0] INFO Main - Done
17:42:23.403 [reactivemongo-akka.actor.default-dispatcher-2] ERROR reactivemongo.core.actors.MongoDBSystem - (State: Closing) UNHANDLED MESSAGE: ChannelConnected(-973180998)
[INFO] [04/25/2015 17:42:23.413] [reactivemongo-akka.actor.default-dispatcher-3] [akka://reactivemongo/deadLetters] Message [reactivemongo.core.actors.Closed$] from Actor[akka://reactivemongo/user/$b#-1700211063] to Actor[akka://reactivemongo/deadLetters] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[INFO] [04/25/2015 17:42:23.414] [reactivemongo-akka.actor.default-dispatcher-3] [akka://reactivemongo/user/$a] Message [reactivemongo.core.actors.Close$] from Actor[akka://reactivemongo/user/$b#-1700211063] to Actor[akka://reactivemongo/user/$a#-1418324178] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
And the app doesn't exit either.
So, what am I doing wrong? I can't find an answer.
It also seems to me that the official docs don't explain whether I should care about graceful shutdown at all.
I don't have much experience with console apps; I use the Play framework in my projects, but I want to create a sub-project that works with MongoDB.
I see many templates (in Activator) such as Play + ReactiveMongo and Play + Akka + Mongo, but there's no Scala + ReactiveMongo template that explains how to work with it properly.
I was having the same problem. The solution I found was invoking close on both objects, the driver and the connection:
val driver = new MongoDriver
val connection = driver.connection(List("localhost"))
...
connection.close()
driver.close()
If you close only the connection, the Akka actor system remains alive.
Tested with ReactiveMongo 0.12.
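Applied to the Main object from the question, this would look roughly as follows; a sketch that simply adds the two close() calls from above at the end of main (the answer reports it was tested with 0.12):
import org.slf4j.LoggerFactory
import reactivemongo.api.MongoDriver

object Main {
  val log = LoggerFactory.getLogger("Main")

  def main(args: Array[String]): Unit = {
    log.info("Start")
    val driver = new MongoDriver()                  // keep a reference so it can be closed later
    val conn = driver.connection(List("localhost"))
    val db = conn("test")
    log.info("Done")
    conn.close()                                    // close the connection first
    driver.close()                                  // then shut down the driver's ActorSystem so the JVM can exit
  }
}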
This looks like a known issue with ReactiveMongo; see the relevant thread on GitHub.
A fix was introduced in pull request #241 by reid-spencer, merged on 3 February 2015.
You should be able to fix it by using a newer version. If no release has been made since February, you could try checking out a version that includes this fix and building the code yourself.
As far as I can see, there's no mention of this bugfix in the release notes for version 0.10.5:
Bugfixes:
BSON library: fix BSONDateTimeNumberLike typeclass
Cursor: fix exception propagation
Commands: fix ok deserialization for some cases
Commands: fix CollStatsResult
Commands: fix AddToSet in aggregation
Core: fix connection leak in some cases
GenericCollection: do not ignore WriteConcern in save()
GenericCollection: do not ignore WriteConcern in bulk inserts
GridFS: fix uploadDate deserialization field
Indexes: fix parsing for Ascending and Descending
Macros: fix type aliases
Macros: allow custom annotations
The name of the committer does not appear either:
Here is the list of the commits included in this release (since 0.9, the top commit is the most recent one):
$ git shortlog -s -n refs/tags/v0.10.0..0.10.5.x.akka23
39 Stephane Godbillon
5 Andrey Neverov
4 lucasrpb
3 Faissal Boutaounte
2 杨博 (Yang Bo)
2 Nikolay Sokolov
1 David Liman
1 Maksim Gurtovenko
1 Age Mooij
1 Paulo "JCranky" Siqueira
1 Daniel Armak
1 Viktor Taranenko
1 Vincent Debergue
1 Andrea Lattuada
1 pavel.glushchenko
1 Jacek Laskowski
Looking at the commit history for 0.10.5.0.akka23 (the one you reference in build.sbt), it seems the fix was not merged into it.

XmlRpc client using Scala

I need to consume an xmlrpc service from Scala, and so far it looks like my only option is the Apache XML-RPC library.
I added this dependency to my Build.scala:
"org.apache.xmlrpc" % "xmlrpc" % "3.1.3"
and sbt reported no problem in downloading the library. However, I don't know how to go about actually accessing the library.
val xml = org.apache.xmlrpc.XmlRpcClient("http://foo") wouldn't compile
and
import org.apache.xmlrpc._
reported that object xmlrpc was not a member of package org.apache.
What would be the correct package to import?
(Or, is there a better library for XmlRpc from Scala?)
Try
"org.apache.xmlrpc" % "xmlrpc-client" % "3.1.3"
and then:
class XmlRpc(val serverURL: String) {
  import org.apache.xmlrpc.client.XmlRpcClient
  import org.apache.xmlrpc.client.XmlRpcClientConfigImpl
  import org.apache.xmlrpc.client.XmlRpcSunHttpTransportFactory
  import java.net.URL

  val config = new XmlRpcClientConfigImpl()
  config.setServerURL(new URL(serverURL))
  config.setEncoding("ISO-8859-1")

  val client = new XmlRpcClient()
  client.setTransportFactory(new XmlRpcSunHttpTransportFactory(client))
  client.setConfig(config)

  client.execute(...)
}
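For the actual call, XmlRpcClient.execute takes the remote method name and an array of parameters. A hypothetical usage sketch - the method name "demo.sayHello" and its argument are made up for illustration; only the server URL placeholder comes from the question:
import java.net.URL
import org.apache.xmlrpc.client.{XmlRpcClient, XmlRpcClientConfigImpl}

val config = new XmlRpcClientConfigImpl()
config.setServerURL(new URL("http://foo"))

val client = new XmlRpcClient()
client.setConfig(config)

// "demo.sayHello" and "world" are hypothetical; substitute your service's method and parameters
val result: AnyRef = client.execute("demo.sayHello", Array[AnyRef]("world"))
println(result)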
There is a good library for this kind of task:
https://github.com/jvican/xmlrpc