How to add MongoOptions to MongoClient in the Casbah MongoDB Scala driver - mongodb

I want to add MongoClientOptions to MongoClient. Specifically, I want to set the connectionsPerHost value; its default is 10 and I want to increase it to 20.
But I am getting errors in the code. I have tried two different ways.
val SERVER: ServerAddress = {
  val hostName = config.getString("db.hostname")
  val port = config.getString("db.port").toInt
  new ServerAddress(hostName, port)
}
val DATABASE: String = config.getString("db.dbname")
Method 1:
val options = MongoClientOptions.apply(connectionsPerHost = 20)
val connectionMongo = MongoConnection(SERVER).addOption(options.getConnectionsPerHost) // returns Unit instead of MongoClient
val collectionMongo = connectionMongo(DATABASE)("testdb")
I am getting an error on the last line: Unit does not take parameters.
Method 2:
val mongoOption = MongoClientOptions.builder()
  .connectionsPerHost(20)
  .build();
I am getting an error on the MongoClientOptions.builder() line:
value builder is not a member of object com.mongodb.casbah.MongoClientOptions
I want to set the connectionsPerHost value to 20. What is the right way to do this?

This seems to be working.
val config = ConfigFactory.load();
val hostName = config.getString("db.hostname")
val port = config.getInt("db.port")
val server = new ServerAddress(hostName, port)
val database = config.getString("db.dbname")
val options = MongoClientOptions(connectionsPerHost = 20)
val connectionMongo = MongoClient(server, options)
val collectionMongo = connectionMongo(database)("testdb")
Note that MongoConnection is deprecated.
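If the pool size should itself come from configuration, here is a minimal sketch (assuming a "db.connectionsPerHost" key in your config; that key name is an assumption, not part of the original post):
// Hedged sketch: read the pool size from config, falling back to 20 when the key is absent.
val poolSize =
  if (config.hasPath("db.connectionsPerHost")) config.getInt("db.connectionsPerHost") else 20
val pooledOptions = MongoClientOptions(connectionsPerHost = poolSize)
val pooledClient = MongoClient(server, pooledOptions)
val pooledCollection = pooledClient(database)("testdb")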

Related

WordCount example - Option.builder() error (IntelliJ, Solr, Spark)

I started to learn Solr and I am trying to write a WordCount example with Solr and Spark. But I have a problem, perhaps with the imports or with the dependencies. You can look at my code below.
My dependencies:
<dependency>
  <groupId>com.lucidworks.spark</groupId>
  <artifactId>spark-solr</artifactId>
  <version>2.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <version>7.6.0</version>
</dependency>
My Code:
object Solr extends SparkApp.RDDProcessor {
  def getName: String = "query-solr-benchmark"

  def getOptions: Array[Option] = {
    Array(
      Option.builder()
        .argName("QUERY")
        .longOpt("query")
        .hasArg
        .required(false)
        .desc("URL encoded Solr query to send to Solr")
        .build()
    )
  }

  def run(conf: SparkConf, cli: CommandLine): Int = {
    val zkHost = cli.getOptionValue("zkHost", "localhost:9983")
    val collection = cli.getOptionValue("collection", "collection1")
    val queryStr = cli.getOptionValue("query", "*:*")
    val rows = cli.getOptionValue("rows", "1000").toInt
    val splitsPerShard = cli.getOptionValue("splitsPerShard", "3").toInt
    val splitField = cli.getOptionValue("splitField", "_version_")

    val sc = new SparkContext(conf)
    val solrQuery: SolrQuery = new SolrQuery(queryStr)
    val fields = cli.getOptionValue("fields", "")
    if (!fields.isEmpty)
      fields.split(",").foreach(solrQuery.addField)
    solrQuery.addSort(new SolrQuery.SortClause("id", "asc"))
    solrQuery.setRows(rows)

    val solrRDD: SolrRDD = new SolrRDD(zkHost, collection, sc)

    var startMs: Long = System.currentTimeMillis
    var count = solrRDD.query(solrQuery).splitField(splitField).splitsPerShard(splitsPerShard).count()
    var tookMs: Long = System.currentTimeMillis - startMs
    println(s"\nTook $tookMs ms read $count docs using queryShards with $splitsPerShard splits")

    // IMPORTANT: reload the collection to flush caches
    println(s"\nReloading collection $collection to flush caches!\n")
    val cloudSolrClient = SolrSupport.getCachedCloudClient(zkHost)
    val req = CollectionAdminRequest.reloadCollection(collection)
    cloudSolrClient.request(req)

    startMs = System.currentTimeMillis
    count = solrRDD.query(solrQuery).count()
    tookMs = System.currentTimeMillis - startMs
    println(s"\nTook $tookMs ms read $count docs using queryShards")

    sc.stop()
    0 // return an exit status; run is declared to return Int
  }
}
My problem is that builder() is red (unresolved) in IntelliJ and I can't run my code. Does anyone know what I missed?
error: value builder is not a member of object org.apache.commons.cli.Option
<dependency>
  <groupId>commons-cli</groupId>
  <artifactId>commons-cli</artifactId>
  <version>1.4</version>
</dependency>
I had missed this dependency.
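For reference, Option.builder() only exists in commons-cli 1.3 and later, so pinning 1.4 explicitly resolves the "value builder is not a member" error. If the project were built with sbt instead of Maven (an assumption; the question only shows Maven coordinates), the equivalent line would be:
// build.sbt equivalent of the Maven <dependency> above
libraryDependencies += "commons-cli" % "commons-cli" % "1.4"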

Spark Task not serializable (Array[Vector])

I am new to Spark, and I'm studying the "Advanced Analytics with Spark" book. The code is from the examples in the book. When I try to run the following code, I get a Spark "Task not serializable" exception.
val kMeansModel = pipelineModel.stages.last.asInstanceOf[KMeansModel]
val centroids: Array[Vector] = kMeansModel.clusterCenters

val clustered = pipelineModel.transform(data)
val threshold = clustered.
  select("cluster", "scaledFeatureVector").as[(Int, Vector)].
  map { case (cluster, vec) => Vectors.sqdist(centroids(cluster), vec) }.
  orderBy($"value".desc).take(100).last
Also, this is how I build the model:
def oneHotPipeline(inputCol: String): (Pipeline, String) = {
  val indexer = new StringIndexer()
    .setInputCol(inputCol)
    .setOutputCol(inputCol + "_indexed")
  val encoder = new OneHotEncoder()
    .setInputCol(inputCol + "_indexed")
    .setOutputCol(inputCol + "_vec")
  val pipeline = new Pipeline()
    .setStages(Array(indexer, encoder))
  (pipeline, inputCol + "_vec")
}

val k = 180

val (protoTypeEncoder, protoTypeVecCol) = oneHotPipeline("protocol_type")
val (serviceEncoder, serviceVecCol) = oneHotPipeline("service")
val (flagEncoder, flagVecCol) = oneHotPipeline("flag")

// Original columns, without label / string columns, but with new vector encoded cols
val assembleCols = Set(data.columns: _*) --
  Seq("label", "protocol_type", "service", "flag") ++
  Seq(protoTypeVecCol, serviceVecCol, flagVecCol)

val assembler = new VectorAssembler().
  setInputCols(assembleCols.toArray).
  setOutputCol("featureVector")

val scaler = new StandardScaler()
  .setInputCol("featureVector")
  .setOutputCol("scaledFeatureVector")
  .setWithStd(true)
  .setWithMean(false)

val kmeans = new KMeans().
  setSeed(Random.nextLong()).
  setK(k).
  setPredictionCol("cluster").
  setFeaturesCol("scaledFeatureVector").
  setMaxIter(40).
  setTol(1.0e-5)

val pipeline = new Pipeline().setStages(
  Array(protoTypeEncoder, serviceEncoder, flagEncoder, assembler, scaler, kmeans))
val pipelineModel = pipeline.fit(data)
I am assuming the problem is with the line Vectors.sqdist(centroids(cluster), vec). For some reason, I cannot use centroids in my Spark calculations. I have done some Googling, and I know this error happens when "I initialize a variable on the master, but then try to use it on the workers", which in my case is centroids. However, I do not know how to address this problem.
In case you are interested, here is the entire code for this tutorial in the book, and here is the link to the dataset that the tutorial uses.
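No answer is recorded here, but as a hedged sketch of the usual workaround (an assumption, not from the original thread): "Task not serializable" generally means the closure passed to map is dragging in a non-serializable enclosing object, and copying the needed data into a local val or a broadcast variable keeps only serializable values in the closure. Assuming a SparkSession named spark:
// Hedged sketch: broadcast the centroid array so the map closure captures only
// the broadcast handle, not the enclosing (possibly non-serializable) object.
val localCentroids: Array[Vector] = kMeansModel.clusterCenters
val bcCentroids = spark.sparkContext.broadcast(localCentroids)
val threshold = clustered.
  select("cluster", "scaledFeatureVector").as[(Int, Vector)].
  map { case (cluster, vec) => Vectors.sqdist(bcCentroids.value(cluster), vec) }.
  orderBy($"value".desc).take(100).last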

SCALA: Read YAML file using Scala with class constructor

I was trying to read a YAML file using Scala with a constructor call, as below:
import org.yaml.snakeyaml.constructor.Constructor
import java.io.{File, FileInputStream}
import scala.collection.mutable.ArrayBuffer
import org.yaml.snakeyaml.Yaml

object aggregation {
  def main(args: Array[String]): Unit = {
    //val conf = new SparkConf().setAppName("yaml test").setMaster("local[*]")
    //val sc = new SparkContext(conf)
    val yamlfile = "C:\\Users\\***\\Desktop\\mongoDB\\sparkTest\\project\\properties.yaml"
    val input1 = new FileInputStream(new File(yamlfile))
    val yaml = new Yaml(new Constructor(classOf[ReadProperties]))
    val e = yaml.load(input1).asInstanceOf[ReadProperties]
    println(e.file1)
  }
}
And I have a separate class so that I can have the YAML items as beans, as below:
import scala.beans.BeanProperty

class ReadProperties(@BeanProperty var file1: String, @BeanProperty var file2: String) {
  // constructor
}
And the content of my YAML file (properties.yaml) is as below:
file1: C:\\data\\names.txt
file2: C:\\data\\names2.txt
But I get this error:
Can't construct a java object for tag:yaml.org,2002:ReadProperties; exception=java.lang.NoSuchMethodException: ReadProperties.<init>()
in 'reader', line 1, column 1:
file1: C:\\data\\names.txt
^
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:350)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182)
But if I use the code below (without the constructor class), it works:
val yaml = new Yaml
val obj = yaml.load(input1)
val e = obj.asInstanceOf[java.util.HashMap[String, String]]
println(e)
Result:
{file1=C:\\data\\names.txt, file2=C:\\data\\names2.txt}
16/10/02 01:24:28 INFO SparkContext: Invoking stop() from shutdown hook
I want my constructor to work, and I want to refer directly to the values of the parameters in the YAML properties file (for example, there are two parameters in the YAML file, "file1" and "file2", and I want to refer to them directly).
Any help would be appreciated. Thanks in advance!
This post suggests using:
val yaml = new Yaml
val obj = yaml.load(input1)
val e = obj.asInstanceOf[ReadProperties]
This is because SnakeYAML requires a no-argument constructor when deserializing directly into a custom class object.
It is also possible to provide a custom constructor for the ReadProperties class, but since I do not really know Scala, it is beyond my abilities to show code for that. The official documentation may help.
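As a hedged sketch of that option (not from the answer itself): in Scala the class can keep its two-argument primary constructor and add a no-argument auxiliary constructor, which is all SnakeYAML needs, since @BeanProperty already generates the setters it will call:
// Hedged sketch: no-arg auxiliary constructor so SnakeYAML can instantiate the class.
class ReadProperties(@BeanProperty var file1: String, @BeanProperty var file2: String) {
  def this() = this(null, null) // required by SnakeYAML's Constructor(classOf[ReadProperties])
}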
It worked; the actual issue seems to have been something wrong with the environment. Anyway, here is how I did it (yes, I know it's too late to comment here):
def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setAppName("yaml test").setMaster("local[*]")
  val sc = new SparkContext(conf)
  val yamlfile = "C:\\Users\\Chaitu-Padi\\Desktop\\mongoDB\\sparkTest\\project\\properties.yaml"
  val input1 = new FileInputStream(new File(yamlfile))
  val yaml = new Yaml(new Constructor(classOf[yamlProperties]))
  val e = yaml.load(input1)
  val d = e.asInstanceOf[yamlProperties]
  println(d)
}

class yamlProperties {
  @BeanProperty var file1: String = null
  @BeanProperty var file2: String = null
  @BeanProperty var file3: String = null

  override def toString: String = {
    //return "%s,%s".format(file1, file2)
    "%s,%s,%s".format(file1, file2, file3)
  }
}
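For completeness, the properties.yaml this three-field class maps to would presumably look like the following (the file3 entry is an assumption extrapolated from the original two-entry file; only file1 and file2 appear in the question, and a missing key simply leaves that field null):
file1: C:\\data\\names.txt
file2: C:\\data\\names2.txt
file3: C:\\data\\names3.txt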

Connect scala-hbase to dockered hbase

I am trying the Play Framework template "play-hbase".
It's a template, so I expect it to work in most cases.
But in my case HBase is running with boot2docker on Windows 7 x64.
So I added some config details to the template:
object Application extends Controller {

  val barsTableName = "bars"
  val family = Bytes.toBytes("all")
  val qualifier = Bytes.toBytes("json")

  lazy val hbaseConfig = {
    val conf = HBaseConfiguration.create()

    // ADDED: specify the boot2docker VM
    conf.set("hbase.zookeeper.quorum", "192.168.59.103")
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    conf.set("hbase.master", "192.168.59.103:60000")

    val hbaseAdmin = new HBaseAdmin(conf)

    // create a table in HBase if it doesn't exist
    if (!hbaseAdmin.tableExists(barsTableName)) {
      val desc = new HTableDescriptor(barsTableName)
      desc.addFamily(new HColumnDescriptor(family))
      hbaseAdmin.createTable(desc)
      Logger.info("bars table created")
    }

    // return the HBase config
    conf
  }
It compiles and runs, but shows a "bad request" error during this code:
def addBar() = Action(parse.json) { request =>
  // create a new row in the table that contains the JSON sent from the client
  val table = new HTable(hbaseConfig, barsTableName)
  val put = new Put(Bytes.toBytes(UUID.randomUUID().toString))
  put.add(family, qualifier, Bytes.toBytes(request.body.toString()))
  table.put(put)
  table.close()
  Ok
}
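No answer is recorded here. As a hedged debugging sketch (an assumption, not from the original post), wrapping the HBase calls in a Try surfaces the underlying connection exception in the HTTP response instead of a generic error:
import scala.util.{Failure, Success, Try}

// Hedged sketch: report the real HBase/ZooKeeper exception in the response body.
def addBarDebug() = Action(parse.json) { request =>
  Try {
    val table = new HTable(hbaseConfig, barsTableName)
    val put = new Put(Bytes.toBytes(UUID.randomUUID().toString))
    put.add(family, qualifier, Bytes.toBytes(request.body.toString()))
    table.put(put)
    table.close()
  } match {
    case Success(_) => Ok
    case Failure(e) => InternalServerError(e.toString)
  }
}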

Calling JDBC to Impala/Hive from within a Spark job and creating a table

I am trying to write a Spark job in Scala that would open a JDBC connection with Impala and let me create a table and perform other operations.
How do I do this? Any example would be of great help.
Thank you!
val JDBCDriver = "com.cloudera.impala.jdbc41.Driver"
val ConnectionURL = "jdbc:impala://url.server.net:21050/default;auth=noSasl"

Class.forName(JDBCDriver).newInstance
val con = DriverManager.getConnection(ConnectionURL)
val stmt = con.createStatement()
val rs = stmt.executeQuery(query)

val resultSetList = Iterator.continually((rs.next(), rs)).takeWhile(_._1).map(r => {
  getRowFromResultSet(r._2) // (ResultSet) => (spark.sql.Row)
}).toList

sc.parallelize(resultSetList)
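Since the question also asks about creating a table, here is a minimal sketch of that part (assumptions: the same con and stmt from above, and a hypothetical table name my_test_table):
// Hedged sketch: DDL goes through the same JDBC Statement; use execute rather than
// executeQuery because CREATE TABLE returns no result set.
val ddl = "CREATE TABLE IF NOT EXISTS my_test_table (id INT, name STRING) STORED AS PARQUET"
stmt.execute(ddl)
stmt.close()
con.close()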