How to get started with Elasticsearch using a Scala client

Hi, I am new to Elasticsearch and I want to use it from Scala. I found some code examples on GitHub, but they are very complex for a beginner. I spent a whole day trying to understand them, and in the end I am still confused about how to start. These are the Scala clients I looked at:
https://github.com/scalastuff/esclient
https://github.com/bsadeh/scalastic
https://github.com/gphat/wabisabi
I also tried wabisabi, but it gave an error, which I posted here as well: https://stackoverflow.com/questions/27145015/scalagetstatuscode-getresponsebody-is-not-a-member-of-dispatch-future
All of these examples are too complex for a new learner like me. I have gone through the first chapter of the Elasticsearch guide, and now I want to do the same things programmatically with Scala. Please suggest a starting point from which I can begin learning. Also, please do not mark this question as non-constructive; I tried by myself first before posting it. I need help: I want to learn Elasticsearch using Scala.

The Elastic4s project contains, near the top of the readme, a simple example on how to use the driver. This example is a complete Scala program that you can execute.
import com.sksamuel.elastic4s.ElasticClient
import com.sksamuel.elastic4s.ElasticDsl._

object Test extends App {
  val client = ElasticClient.local
  // await is a helper method to make this operation sync instead of async
  // You would normally avoid doing this in a real program as it will block
  client.execute { index into "bands/artists" fields "name"->"coldplay" }.await
  val resp = client.execute { search in "bands/artists" query "coldplay" }.await
  println(resp)
}
If this is too complicated, then that is not because the Scala client is too complicated, but because you don't yet understand enough about Elasticsearch or Scala. The Scala client you are looking at is a typical DSL, so it uses some Scala tricks that make it nice to use as a client, but not necessarily easy to understand under the covers.
Here are some good links to understanding Elasticsearch:
http://spinscale.github.io/elasticsearch/2012-03-jugm.html#/20
http://exploringelasticsearch.com/
http://joelabrahamsson.com/elasticsearch-101/
http://www.slideshare.net/karmi/your-data-your-search-elasticsearch-euruko-2011
Before you use any of the Scala drivers, you should at least understand the basic concepts of an index/type, the query DSL, and what a node is in Elasticsearch. It might also be helpful to look at the JSON that you can send with the HTTP interface, as that makes it a bit easier to see what is going on; the Elasticsearch docs can be heavy going at first.
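To make the HTTP point concrete, here is a rough sketch of what the two elastic4s calls in the example correspond to over the HTTP interface. It assumes a node on localhost:9200 and the index/type layout of the older Elasticsearch versions that example targets. The object name EsOverHttp and its helpers are made up for illustration, and the query body is only an approximation of what the DSL sends.

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets.UTF_8
import scala.io.Source

object EsOverHttp {
  // Assumption: a local node listening on the default HTTP port
  val base = "http://localhost:9200"

  // POST /{index}/{type} with a JSON document indexes it under a generated id
  def indexUrl(index: String, docType: String): String = s"$base/$index/$docType"

  // POST /{index}/{type}/_search with a JSON query runs a search
  def searchUrl(index: String, docType: String): String = s"$base/$index/$docType/_search"

  val document = """{"name": "coldplay"}"""
  val query = """{"query": {"query_string": {"query": "coldplay"}}}"""

  // Sends a JSON body with POST and returns the response body as a string
  def post(url: String, json: String): String = {
    val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setDoOutput(true)
    conn.setRequestProperty("Content-Type", "application/json")
    val out = conn.getOutputStream
    try out.write(json.getBytes(UTF_8)) finally out.close()
    val in = conn.getInputStream
    try Source.fromInputStream(in, "UTF-8").mkString finally in.close()
  }
}
```

With a node running, EsOverHttp.post(EsOverHttp.indexUrl("bands", "artists"), EsOverHttp.document) followed by EsOverHttp.post(EsOverHttp.searchUrl("bands", "artists"), EsOverHttp.query) returns the same kind of JSON you would see with curl.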

A simple Elasticsearch client. Maven dependencies:
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch</artifactId>
  <version>7.5.0</version>
</dependency>
<dependency>
  <groupId>org.elasticsearch.client</groupId>
  <artifactId>elasticsearch-rest-high-level-client</artifactId>
  <version>7.5.0</version>
</dependency>
Scala code for connecting to ES with basic auth:
import org.apache.http.HttpHost
import org.apache.http.auth.{AuthScope, UsernamePasswordCredentials}
import org.apache.http.client.CredentialsProvider
import org.apache.http.impl.client.BasicCredentialsProvider
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder
import org.apache.http.util.EntityUtils
import org.elasticsearch.client.{Request, RestClient, RestClientBuilder}

// Basic-auth credentials used for every request sent by the client
val credentials = new UsernamePasswordCredentials("<username>", "<password>")
val credentialsProvider: CredentialsProvider = new BasicCredentialsProvider
credentialsProvider.setCredentials(AuthScope.ANY, credentials)

// Low-level REST client talking HTTPS to <host>:9200
val client = RestClient.builder(new HttpHost("<host>", 9200, "https"))
  .setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback {
    override def customizeHttpClient(httpClientBuilder: HttpAsyncClientBuilder): HttpAsyncClientBuilder =
      httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider)
  })
  .build()

// List all aliases as JSON
val request = new Request("GET", "/_cat/aliases?format=json")
val response = client.performRequest(request)
// Read the body as a string; printing getContent directly would only show the stream object
println("Response: " + EntityUtils.toString(response.getEntity))
client.close()


Throttle or debounce method calls

Let's say I have a method that updates a date in the DB:
def updateLastConsultationDate(userId: String): Unit = ???
How can I easily throttle/debounce that method so that it won't be run more than once an hour per user?
I'd like the simplest possible solution, not based on any event-bus, actor lib or persistence layer. I'd like an in-memory solution (and I am aware of the risks).
I've seen solutions for throttling in Scala based on Akka Throttler, but starting to use actors just for throttling method calls really looks like overkill to me. Isn't there a very simple way to do that?
Edit: as it seems this was not clear enough, here's a visual representation of what I want, implemented in JS. As you can see, throttling may not only be about filtering subsequent calls, but also about postponing calls (also called trailing events in js/lodash/underscore). The solution I'm looking for can't be based on purely synchronous code only.
This sounds like a great job for a ReactiveX-based solution. In Scala, Monix is my favorite one. Here's an Ammonite REPL session illustrating it:
import $ivy.`io.monix::monix:2.1.0` // I'm using Ammonite's magic imports, it's equivalent to adding "io.monix" %% "monix" % "2.1.0" to your libraryDependencies in SBT
import scala.concurrent.duration.DurationInt
import monix.reactive.subjects.ConcurrentSubject
import monix.reactive.Consumer
import monix.execution.Scheduler.Implicits.global
import monix.eval.Task
class DbUpdater {
  val publish = ConcurrentSubject.publish[String]
  // Note: throttleFirst is applied to the whole stream here,
  // so it limits updates across all users, not per user
  val throttled = publish.throttleFirst(1.hour)
  val cancelHandle = throttled
    .consumeWith(Consumer.foreach(userId =>
      println(s"update your database with $userId here")))
    .runAsync

  def updateLastConsultationDate(userId: String): Unit = {
    publish.onNext(userId)
  }

  def stop(): Unit = cancelHandle.cancel()
}
Yes, and with Scala.js this code will work in the browser, too, if it's important for you.
Since you ask for the simplest possible solution, you can store a mutable map, val lastUpdateByUser = scala.collection.mutable.Map.empty[String, Long], which you would consult before allowing an update
if (lastUpdateByUser.getOrElse(userName, 0L) + 60*60*1000 < System.currentTimeMillis) updateLastConsultationDate(...)
and update when a user actually performs an update
lastUpdateByUser(userName) = System.currentTimeMillis
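A minimal in-memory sketch of this map-based idea, wrapped in a small class so that the check and the update happen together. The name PerKeyThrottle and its injectable clock are made up for illustration. It only drops calls that come too early (no postponed/trailing calls), and the state is lost on restart.

```scala
import scala.collection.mutable

// Allows at most one run per key within `intervalMillis`.
// `clock` is injectable so the behaviour can be checked without waiting.
final class PerKeyThrottle(intervalMillis: Long,
                           clock: () => Long = () => System.currentTimeMillis()) {
  private val lastRun = mutable.Map.empty[String, Long]

  /** Runs `action` and returns true, unless `key` already ran within the interval. */
  def attempt(key: String)(action: => Unit): Boolean = {
    // Check and update under one lock so concurrent callers cannot both pass
    val allowed = synchronized {
      val now = clock()
      val ok = lastRun.get(key).forall(last => now - last >= intervalMillis)
      if (ok) lastRun(key) = now
      ok
    }
    if (allowed) action
    allowed
  }
}
```

Usage would look like throttle.attempt(userId) { updateLastConsultationDate(userId) } with val throttle = new PerKeyThrottle(60 * 60 * 1000L). The action runs outside the lock, so a slow database call does not block checks for other users.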
One way to throttle would be to maintain a count in a Redis instance. Doing so would ensure that the DB wouldn't be updated, no matter how many Scala processes you were running, because the state is stored outside of the process.

What replaced RoutedHttpService in Spray

I've been following this blog post about Spray and Akka, as it seems to be a reasonable way to separate the implementation from the routing in an async service. However, since the post is more than 6 months old, the Spray API seems to have changed, and the RoutedHttpService it uses about halfway down is nowhere to be found.
I'm fairly new to Scala and very new to Spray, and the Spray docs are at best obtuse, so I've been struggling to work out what to replace that bit of the code with.
A couple of questions then:
1) Is the approach outlined in this post actually sensible?
2) If the answer to (1) is yes, then what should RoutedHttpService be replaced with?
3) If the answer to (1) is no, then are there some other docs on 'the right way' to do Spray?
For easy reference, the bit of code in question is this:
trait Api extends RouteConcatenation {
  this: CoreActors with Core =>

  private implicit val _ = system.dispatcher

  val routes =
    new RegistrationService(registration).route ~
    new MessengerService(messenger).route

  val rootService = system.actorOf(Props(new RoutedHttpService(routes))) // :-(
}
1) The approach described in this post is really nice, but it is already advanced Scala programming. My advice: don't use it if you do not understand it.
2) RoutedHttpService is actually from Eigengo's Activator template, not from the Spray API; you can find the source code here.
3) You can also have a look at this project; it gives a nice skeleton with less cake pattern composition.

Configure play-slick and samples

I'm currently trying to use Play! Framework 2.2 and play-slick (master branch).
In the play-slick code I would like to override the driver definition in order to add the Oracle driver (I'm using slick-extensions). In the Config.scala of play-slick I just saw /** Extend this to add driver or change driver mapping */ ...
I'm coming from far, far away (currently reading Programming in Scala), so there's a lot to learn. So my questions are:
Can someone explain to me how to extend this Config object? This object is used in other classes... Is the cake pattern useful here?
Talking about the cake pattern, I read the computer-database example provided by play-slick. This sample uses the cake pattern and imports play.api.db.slick.Config.driver.simple._. If I'm using the Oracle driver I cannot use this import, am I wrong? How can I use the cake pattern to define an implicit session?
Thanks a lot.
I'm waiting for your advice, and I'm still studying the play-slick code at home :)
To extend the Config trait I do not think the cake pattern is required. You should be able to create your Config object like this:
import scala.slick.driver.ExtendedDriver

object MyExtendedConfig extends play.api.db.slick.Config {
  override def driverByName: String => Option[ExtendedDriver] = { name: String =>
    super.driverByName(name) orElse Map("oracledriverstring" -> OracleDriver).get(name)
  }
  lazy val app = play.api.Play.current
  lazy val driver: ExtendedDriver = driver()(app)
}
To use it, you only need to write import MyExtendedConfig.driver._ instead of import play.api.db.slick.Config.driver._. BTW, I see that the type of driverByName could have been a Map instead of a Function, which would make it easier to extend. This shouldn't break anything, though; it would just be easier to do.
I think Jonas Bonér's old blog is a great place to read about what the cake pattern is (http://jonasboner.com/2008/10/06/real-world-scala-dependency-injection-di/). My naive understanding of it is that you have a cake pattern when you have layers that use self types:
trait FooComponent { driver: ExtendedDriver =>
  import driver.simple._

  class Foo extends Table[Int]("") {
    //...
  }
}
There are 2 use cases for the cake pattern in slick/play-slick: 1) if you have tables that reference other tables (as in the computer database sample), and 2) if you want control over exactly which database is used at which time, or if you use many different database types. By using the Config you do not really need the cake pattern as long as you only have 2 different DBs (one for prod and one for test), which is the point of the Config.
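A library-free sketch of the self-type layering described above, so that it can be run without Slick. All names (Driver, DriverComponent, UserRepositoryComponent, ProdApp, TestApp) are hypothetical stand-ins, not play-slick or Slick types.

```scala
// Stand-in for a database driver; not a Slick type
trait Driver { def name: String }

// Layer that provides a driver
trait DriverComponent { def driver: Driver }

// Layer that requires a driver: the self type says this trait can only be
// mixed into something that is also a DriverComponent
trait UserRepositoryComponent { self: DriverComponent =>
  class UserRepository {
    def describe: String = s"users stored via ${driver.name}"
  }
  lazy val users = new UserRepository
}

// The "cake" is assembled by mixing the layers together, once per environment
object ProdApp extends UserRepositoryComponent with DriverComponent {
  val driver: Driver = new Driver { val name = "oracle" }
}

object TestApp extends UserRepositoryComponent with DriverComponent {
  val driver: Driver = new Driver { val name = "h2" }
}
```

The repository code never names a concrete driver; which one it gets is decided only where the layers are combined, which is the "control over which database is used" case mentioned above.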
Hope this answers your questions and good luck on reading Programming in Scala (loved that book :)

Storing an object to a file

I want to save an object (an instance of a class) to a file. I didn't find any valuable example of it. Do I need to use serialization for it?
How do I do that?
UPDATE:
Here is how I tried to do that
import scala.util.Marshal
import scala.io.Source
import scala.collection.immutable
import java.io._
object Example {
  class Foo(val message: String) extends scala.Serializable

  val foo = new Foo("qweqwe")
  val out = new FileOutputStream("out123.txt")
  out.write(Marshal.dump(foo))
  out.close
}
First of all, out123.txt contains a lot of extra data and seems to be in the wrong encoding. My gut tells me there should be another, proper way.
At the last Scala Days, Heather introduced a new library which provides a cool new mechanism for serialization: pickling. I think it would be an idiomatic way to do serialization in Scala, and just what you want.
Check out a paper on this topic, slides and talk on ScalaDays'13
It is also possible to serialize to and deserialize from JSON using Jackson.
A nice wrapper that makes it Scala friendly is Jacks
JSON has the following advantages:
it is simple, human-readable text
it is a rather efficient format byte-wise
it can be used directly by JavaScript
and it can even be natively stored and queried using a DB like MongoDB
(Edit) Example Usage
Serializing to JSON:
val json = JacksMapper.writeValueAsString[MyClass](instance)
... and deserializing
val obj = JacksMapper.readValue[MyClass](json)
Take a look at Twitter Chill to handle your serialization: https://github.com/twitter/chill. It's a Scala helper for the Kryo serialization library. The documentation/example on the GitHub page looks to be sufficient for your needs.
I'm just adding my answer here for the convenience of people in the same situation as me.
The pickling library mentioned by @4lex1v only supports Scala 2.10/2.11, but I'm using Scala 2.12, so I'm not able to use it in my project.
Then I found BooPickle. It supports Scala 2.11 as well as 2.12!
Here's the example:
import boopickle.Default._
val data = Seq("Hello", "World!")
val buf = Pickle.intoBytes(data)
val helloWorld = Unpickle[Seq[String]].fromBytes(buf)
For more details, please check here.
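If no extra library is wanted, the standard java.io object streams are enough for a class that extends Serializable. A minimal sketch, with SaveObject, save and load as made-up names. Note that the resulting file is binary, not text, so it will look garbled in a text editor.

```scala
import java.io.{File, FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical example class; case classes are Serializable by default
case class Foo(message: String)

object SaveObject {
  // Writes any serializable object to the file in Java's binary format
  def save(obj: AnyRef, file: File): Unit = {
    val out = new ObjectOutputStream(new FileOutputStream(file))
    try out.writeObject(obj) finally out.close()
  }

  // Reads the object back; the caller states the expected type
  def load[A](file: File): A = {
    val in = new ObjectInputStream(new FileInputStream(file))
    try in.readObject().asInstanceOf[A] finally in.close()
  }
}
```

In a compiled program, SaveObject.save(Foo("qweqwe"), new File("out123.bin")) followed by SaveObject.load[Foo](new File("out123.bin")) gives back an equal Foo. The format is tied to the JVM and to the class definition, which is one reason the answers above suggest JSON or a pickling library instead.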

Close Connection for Mongodb using Casbah API

I am not getting any useful information about "how to close connection for mongodb using casbah API". Actually, I have defined multiple methods and in each method I need to establish a connection with mongodb. After working I need to close that too. I am using Scala.
One of the methods looks like this (code example in Scala):
import com.mongodb.casbah.Imports._
import com.mongodb.casbah.MongoConnection
def index = {
  val mongoConn = MongoConnection(configuration("hostname"))
  val log = mongoConn("ab")("log")
  val cursor = log.find()
  val data = for {x <- cursor} yield x.getAs[BasicDBObject]("message").get
  html.index(data.toList)
  // mongoConn.close() <-- here I want to close the connection, but this .close() is not working
}
It is unclear from your question why exactly close is not working. Does it throw an exception, does it fail to compile, or does it have no effect?
But since MongoConnection is a thin wrapper over com.mongodb.Mongo, you could work with underlying Mongo directly, just like in plain old Java driver:
val mongoConn = MongoConnection(configuration("hostname"))
mongoConn.underlying.close()
Actually, that's exactly how close is implemented in Casbah.
Try using .close instead. If a function doesn't take arguments in Scala, you can sometimes omit the parentheses after it.
EDIT: I had wrong information, edited to include correct information + link.
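Since every method in the question opens its own connection, one way to make sure it is always closed is a small loan-pattern helper. This is a generic sketch, not part of Casbah: Loan, withResource and FakeConnection are made-up names, and the close function is passed in explicitly (for Casbah that would be something like _.close() or _.underlying.close(), as in the answers above).

```scala
object Loan {
  // Opens a resource, passes it to `use`, and always closes it afterwards,
  // even if `use` throws
  def withResource[R, A](open: => R)(close: R => Unit)(use: R => A): A = {
    val resource = open
    try use(resource)
    finally close(resource)
  }
}

// Hypothetical stand-in for a database connection
final class FakeConnection {
  var closed = false
  def find(): Iterator[String] = Iterator("a", "b")
  def close(): Unit = { closed = true }
}
```

Anything lazy, such as the cursor in the question, has to be materialized (for example with .toList) inside the block, before the connection is closed.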