Scala - get BSON document size with new ReactiveMongo (2.13)

I want to find out the BSON document size before updating the document, using Scala and the ReactiveMongo library.
With the old library (2.12), the equivalent code was:
val bsonDoc = EntityBSONHandler.write(entity)
val size = BSONDocumentBufferHandler
  .write(bsonDoc, ChannelBufferWritableBuffer())
  .toReadableBuffer()
  .size
I could not figure out the new implementation with version 2.13 or later.
Thank you.
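Since the buffer classes moved around in the newer releases, one workaround (not the ReactiveMongo API, just a sketch) is to measure the encoded size through the official org.bson module from the MongoDB Java driver, assuming that module is on the classpath and that the document can be expressed as JSON; the bsonSize helper below is illustrative only:
// Library-agnostic sketch: encode a document with org.bson and read back its byte size.
import org.bson.{BsonBinaryWriter, BsonDocument}
import org.bson.codecs.{BsonDocumentCodec, EncoderContext}
import org.bson.io.BasicOutputBuffer

def bsonSize(json: String): Int = {
  val doc = BsonDocument.parse(json) // build the document from its JSON form
  val out = new BasicOutputBuffer()
  val writer = new BsonBinaryWriter(out)
  new BsonDocumentCodec().encode(writer, doc, EncoderContext.builder().build())
  writer.flush()
  out.getSize // number of bytes in the encoded BSON document
}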

Related

Grails 3.1 - Can't find codec for domain class

I am not able to convert a domain class into a BasicDBObject.
Below is my code:
def update_val
class_object.class.withNewSession { MongoCodecSession m ->
    update_val = m.pendingUpdates.find {
        it.key.name == d.class.getName()
    }.value[0].nativeEntry.regions[0]."${instance.getDbKey()}"[0]
}
On the findOneAndUpdate call below, I am getting the error "Can't find a codec for class class.domain". update_val is returned as a domain class object.
ClassName.class.findOneAndUpdate(new BasicDBObject(findVal), new BasicDBObject(update_val))
I am migrating from Grails 3.0 to Grails 3.1; here nativeEntry is returned as a domain class, while in the previous version nativeEntry was returned as a BasicDBObject.
Any solution?
I am using Grails 3.1 with GORM 5.0 and MongoDB 3.4.
I have resolved it. Add the following to application.yml:
grails:
  mongodb:
    engine: mapping
This switches the MongoCodecSession back to the previous MongoSession.
With codecs, objects are no longer converted first to MongoDB Document objects and then to Groovy objects; instead, the driver reads Groovy objects directly from the JSON stream at the driver level, which is far more efficient than the previous MongoSession.

How to get ROUTE_PATTERN from request in play 2.6 scala

I've extracted ROUTE_PATTERN in Play 2.5 with:
request.tags.get("ROUTE_PATTERN")
Now I have updated to Play 2.6 and this code doesn't work anymore. I read the Play documentation here:
What’s new in Play 2.6
I tried:
object Attrs {
  val RoutePattern: TypedKey[String] = TypedKey("ROUTE_PATTERN")
}
request.attrs.get(Attrs.RoutePattern)
It always returns None. How can I get the route pattern of a request in Play 2.6?
From the 2.6 migration guide:
If you used any of the Router.Tags.* tags, you should change your code to use the new Router.Attrs.HandlerDef (Scala)....
This new attribute contains a HandlerDef object with all the information that is currently in the tags. The current tags all correspond to a field in the HandlerDef object....
The field in HandlerDef that corresponds to the old ROUTE_PATTERN tag is path:
import play.api.routing.{ HandlerDef, Router }
import play.api.mvc.RequestHeader
val handler = request.attrs(Router.Attrs.HandlerDef)
val routePattern = handler.path
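For completeness, a minimal sketch of reading it safely inside an action (the controller name is illustrative; attrs.get returns an Option, so a request that never went through the router simply yields None):
import javax.inject.Inject
import play.api.mvc._
import play.api.routing.Router

class RouteInfoController @Inject()(cc: ControllerComponents) extends AbstractController(cc) {
  // Router.Attrs.HandlerDef is only present for requests matched by the router
  def show: Action[AnyContent] = Action { request =>
    val routePattern: Option[String] =
      request.attrs.get(Router.Attrs.HandlerDef).map(_.path)
    Ok(routePattern.getOrElse("no route pattern"))
  }
}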

ingesting data in solr using spark scala

I am trying to ingest data into Solr using Scala and Spark; however, my code is missing something. I got the code below from a Hortonworks tutorial.
I am using Spark 1.6.2, Solr 5.2.1, and Scala 2.10.5.
Can anybody provide me with a working snippet to successfully insert data into Solr?
val input_file = "hdfs:///tmp/your_text_file"
case class Person(id: Int, name: String)
val people_df1 = sc.textFile(input_file).map(_.split(",")).map(p => Person(p(0).trim.toInt, p(1))).toDF()
val docs = people_df1.map { doc =>
  val docx = SolrSupport.autoMapToSolrInputDoc(doc.getAs[Int]("id").toString, doc, null)
  docx.setField("scala_s", "supercool")
  docx.setField("name_s", doc.getAs[String]("name"))
  docx // return the SolrInputDocument from the map
}
// the code below has a compilation issue somehow, although the jar file does contain these functions
SolrSupport.indexDocs("sandbox.hortonworks.com:2181", "testsparksolr", 10, docs)
val solrServer = com.lucidworks.spark.SolrSupport.getSolrServer("http://ambari.asiacell.com:2181")
solrServer.setDefaultCollection("testsparksolr")
solrServer.commit(false, false)
Thanks in advance.
Have you tried spark-solr?
The library's main focus is to provide a clean API for indexing documents to a Solr server, as in your case.
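As a rough sketch of that approach (the ZooKeeper host and collection name are placeholders, and it assumes a spark-solr version matching your Spark release that supports the DataFrame writer is on the classpath), the DataFrame can be written through the "solr" data source:
import org.apache.spark.sql.{SQLContext, SaveMode}

case class Person(id: Int, name: String)

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// build the same DataFrame as in the question
val people = sc.textFile("hdfs:///tmp/your_text_file")
  .map(_.split(","))
  .map(p => Person(p(0).trim.toInt, p(1).trim))
  .toDF()

// write the rows as Solr documents via the spark-solr data source
people.write
  .format("solr")
  .option("zkhost", "sandbox.hortonworks.com:2181") // ZooKeeper ensemble of the SolrCloud cluster
  .option("collection", "testsparksolr")
  .mode(SaveMode.Overwrite)
  .save()
Depending on the collection's commit settings, the documents may not be visible to searches until a (soft) commit happens.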

read .doc file using scala

I want to read a .doc file in Scala. I tried using the Apache POI library for this, but the constructor HWPFDocument(java.io.InputStream istream) accepts a Java IO stream.
If anyone can shed some light on this, that would be great!
So, here is a teaser to get you started:
import java.io.FileInputStream
import org.apache.poi.hwpf.HWPFDocument
import org.apache.poi.hwpf.extractor.WordExtractor

val fis = new FileInputStream("/path/to/file/doc.doc")
val doc = new HWPFDocument(fis) // HWPFDocument accepts any java.io.InputStream
val we = new WordExtractor(doc)
val paras = we.getParagraphText() // one String per paragraph
You can use InputStream in Scala, just as any other Java class/interface.
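As a purely illustrative follow-up, the extracted paragraphs can then be printed and the stream closed:
paras.foreach(println) // getParagraphText() returns one String per paragraph
fis.close()            // release the file handle once extraction is done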

Self-consistent and updated example of using Spark over ElasticSearch

This guy had a very small example that showed how to integrate ElasticSearch and Spark, back when the whole ES ecosystem was around version 0.9. Nowadays it doesn't work anymore (and googling for a replacement doesn't seem an easy feat). Can someone give a small, self-contained Scala example of:
Opening a file in spark (in the example above, it was /var/log/syslog);
Doing something with it;
Sending the result into ES;
Opening that result back in Spark.
... that works with ElasticSearch 1.3.4 and Spark 1.1.0.
I gave a talk a while back on Spark and ElasticSearch (around the 0.9 days), and I recently updated some of the examples for the present day (read: 1.1). I've posted the slides and the example code. Hope that helps!
I've also copied the relevant sections (from my own github repo) here:
import org.elasticsearch.spark.sql._
...
val tweetsAsCS =
  createSchemaRDD(tweetRDD.map(SharedIndex.prepareTweetsCaseClass))
tweetsAsCS.saveToEs(esResource)
Note that we didn't specify any ES nodes. This will default to trying to save to a cluster on localhost. If we want to use a different cluster we can add:
// if we want to have a different es cluster we can add
import org.elasticsearch.hadoop.cfg.ConfigurationOptions
val config = new SparkConf()
config.set(ConfigurationOptions.ES_NODES, node) // set the node for discovery
// other config settings
val sc = new SparkContext(config)
So that will do the first part (indexing some data).
Querying ES from Spark has also gotten a lot simpler, although only if your data types are supported by the connector's mappings (the primary one I ran into that wasn't supported was geolocation, but it's easy enough to extend the mapper if you run into this).
val query = "{\"query\": {\"filtered\" : {\"query\" : {\"match_all\" : {}},\"filter\" : { \"geo_distance\" : { \"distance\" : \""+ dist + "km\", \"location\" : { \"lat\" : "+ lat +", \"lon\" : "+ lon +" }}}}}}"
val tweets = sqlCtx.esRDD(esResource, query)
The esRDD function isn't normally on the SQLContext, but the implicit conversions we imported up above make it available to us. tweets is now a SchemaRDD and we can update it as desired and save the results back as we did in the first part of this example.
Hope this helps!
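Putting the pieces together, here is a small self-contained sketch under stated assumptions: the elasticsearch-spark (elasticsearch-hadoop) artifact is on the classpath, an ES node is reachable on localhost, and the index/type name and the word-count logic are illustrative rather than taken from the talk's code:
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds saveToEs on RDDs and esRDD on SparkContext

val conf = new SparkConf()
  .setAppName("es-spark-example")
  .set("es.nodes", "localhost") // where to find the ES cluster

val sc = new SparkContext(conf)

// 1. open a file in Spark and do something with it: count words in /var/log/syslog
val counts = sc.textFile("/var/log/syslog")
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .map { case (word, count) => Map("word" -> word, "count" -> count) }

// 2. send the result into ES (index/type "syslog/words")
counts.saveToEs("syslog/words")

// 3. open that result back in Spark
val fromEs = sc.esRDD("syslog/words")
fromEs.take(5).foreach(println)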