I am trying to use the Scala elastic4s client to index new documents into my Elasticsearch cluster, but I am having a compilation problem with the types. Following the documentation and the examples found on the web, the syntax looks like:
Client instantiation:
val settings = ImmutableSettings.settingsBuilder().put("cluster.name", Configuration.elasticsearchClusterName).build()
val uri = ElasticsearchClientUri("elasticsearch://" + Configuration.elasticsearchUri)
val client = ElasticClient.remote(settings, uri)
I am trying to write it like:
def writeToElasticsearch(bulkList: List[EventMessage]) {
val ops = for (message <- bulkList) yield index into indexDcp ttl 7.days.toMillis doc StringDocumentSource(message.toJSon())
client.execute(bulk(ops: _*)).await
}
I am getting a compilation error in the bulk operation saying:
Multiple markers at this line
- type mismatch; found : List[com.sksamuel.elastic4s.IndexDefinition] required:
Seq[Int]
Can anyone tell me how I can convert the types to make this work? Thank you!
I'm not sure why you are getting errors; it may be down to something in the parts omitted from your code samples. In any case, this version of your code compiles fine for me.
object Test extends App {

  // Imports added for completeness; exact packages may vary slightly by elastic4s version
  import com.sksamuel.elastic4s.ElasticClient
  import com.sksamuel.elastic4s.ElasticDsl._
  import com.sksamuel.elastic4s.source.StringDocumentSource
  import scala.concurrent.duration._

  val client = ElasticClient.local
  val indexDcp = "myindex/mytype"

  def writeToElasticsearch(bulkList: List[EventMessage]) {
    val ops = for (message <- bulkList) yield {
      index into indexDcp ttl 7.days.toMillis doc StringDocumentSource(message.toJSon())
    }
    client.execute(bulk(ops: _*)).await
  }

  trait EventMessage {
    def toJSon(): String
  }
}
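For completeness, here is a hypothetical way to exercise the method above; the PageView class and the JSON it emits are purely illustrative and not part of the original question. Placed inside the Test object, it would look like:

  // Purely illustrative EventMessage implementation and call (inside the Test object)
  case class PageView(url: String) extends EventMessage {
    def toJSon(): String = s"""{"url":"$url"}"""
  }

  writeToElasticsearch(List(PageView("/home"), PageView("/about")))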
I am trying to get some basic file IO (write/read) in a purely functional way using cats-effect. After following this tutorial, here is what I ended up with for reading a file:
private def readFile(): IO[String] = for {
  lines <- bufferedReader(new File(filePath)).use(readAllLines)
} yield lines.mkString

def bufferedReader(f: File): Resource[IO, BufferedReader] =
  Resource.make {
    IO(new BufferedReader(new FileReader(f)))
  } { fileReader =>
    IO(fileReader.close()).handleErrorWith(_ => IO.unit)
  }
Now in the handleErrorWith function I could log any error occurring, but how can I add proper error handling to this (e.g. return a Resource[IO, Either[CouldNotReadFileError, BufferedReader]])?
Proper error handling can be added via the use of .attempt on the returned IO value:
import scala.collection.JavaConverters._
val resourceOrError: IO[Either[Throwable, String]] = bufferedReader(new File(""))
.use(resource => IO(resource.lines().iterator().asScala.mkString))
.attempt
If you want to lift that into your own ADT, you can use leftMap:
import cats.syntax.either._
final case class CouldNotReadError(e: Throwable)
val resourceOrError: IO[Either[CouldNotReadError, String]] =
bufferedReader(new File(""))
.use(resource => IO(resource.lines().iterator().asScala.mkString))
.attempt
.map(_.leftMap(CouldNotReadError))
Additionally, you might be interested in the ZIO datatype, which provides cats-effect instances and has a slightly different shape, IO[E, A], where E captures the error type.
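For illustration, a minimal sketch of what that typed error channel looks like. This uses current ZIO syntax, which is an assumption on my part (API names differ across ZIO versions), and the file-reading logic is simplified rather than taken from the question:

import java.io.File
import scala.io.Source
import zio.{ IO, ZIO }

final case class CouldNotReadFileError(cause: Throwable)

// The error type appears directly in the signature: IO[CouldNotReadFileError, String]
def readFileZio(f: File): IO[CouldNotReadFileError, String] =
  ZIO.attempt {
    val src = Source.fromFile(f)
    try src.mkString finally src.close()
  }.mapError(CouldNotReadFileError)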
I wrote the following code to fetch data from MongoDB:
import com.typesafe.config.ConfigFactory
import org.mongodb.scala.{ Document, MongoClient, MongoCollection, MongoDatabase }
import scala.concurrent.ExecutionContext
object MongoService extends Service {
val conf = ConfigFactory.load()
implicit val mongoService: MongoClient = MongoClient(conf.getString("mongo.url"))
implicit val mongoDB: MongoDatabase = mongoService.getDatabase(conf.getString("mongo.db"))
implicit val ec: ExecutionContext = ExecutionContext.global
def getAllDocumentsFromCollection(collection: String) = {
mongoDB.getCollection(collection).find()
}
}
But when I try to get data via getAllDocumentsFromCollection, I don't get the individual documents for further manipulation. Instead I get
FindObservable(com.mongodb.async.client.FindIterableImpl@23555cf5)
UPDATED:
object MongoService {
  // My settings (see available connection options)
  val mongoUri = "mongodb://localhost:27017/smsto?authMode=scram-sha1"

  import ExecutionContext.Implicits.global // use any appropriate context

  // Connect to the database: Must be done only once per application
  val driver = MongoDriver()
  val parsedUri = MongoConnection.parseURI(mongoUri)
  val connection = parsedUri.map(driver.connection(_))

  // Database and collections: Get references
  val futureConnection = Future.fromTry(connection)
  def db1: Future[DefaultDB] = futureConnection.flatMap(_.database("smsto"))
  def personCollection = db1.map(_.collection("person"))

  // Write Documents: insert or update
  implicit def personWriter: BSONDocumentWriter[Person] = Macros.writer[Person]
  // or provide a custom one

  def createPerson(person: Person): Future[Unit] =
    personCollection.flatMap(_.insert(person).map(_ => {})) // use personWriter

  def getAll(collection: String) =
    db1.map(_.collection(collection))

  // Custom persistent types
  case class Person(firstName: String, lastName: String, age: Int)
}
I tried ReactiveMongo as well with the above code, but I couldn't make it work for getAll, and I'm getting the following error in createPerson.
Please suggest how I can get all data from a collection.
This is likely too late for the OP, but hopefully the following methods of retrieving and iterating over collections using the mongo-scala driver can prove useful to others.
The Asynchronous Way - Iterating over documents asynchronously means you won't have to store an entire collection in-memory, which can become unreasonable for large collections. However, you won't have access to all your documents outside the subscribe code block for reuse. I'd recommend doing things asynchronously if you can, since this is how the mongo-scala driver was intended to be used.
db.getCollection(collectionName).find().subscribe(
  (doc: org.mongodb.scala.bson.Document) => {
    // operate on an individual document here
  },
  (e: Throwable) => {
    // do something with errors here, if desired
  },
  () => {
    // this signifies that you've reached the end of your collection
  }
)
The "Synchronous" Way - This is a pattern I use when my use-case calls for a synchronous solution, and I'm working with smaller collections or result-sets. It still uses the asynchronous mongo-scala driver, but it returns a list of documents and blocks downstream code execution until all documents are returned. Handling errors and timeouts may depend on your use case.
import org.mongodb.scala._
import org.mongodb.scala.bson.Document
import org.mongodb.scala.model.Filters
import scala.collection.mutable.ListBuffer
/* This function optionally takes filters if you do not wish to return the entire collection.
* You could extend it to take other optional query params, such as org.mongodb.scala.model.{Sorts, Projections, Aggregates}
*/
def getDocsSync(db: MongoDatabase, collectionName: String, filters: Option[conversions.Bson]): ListBuffer[Document] = {
  val docs = scala.collection.mutable.ListBuffer[Document]()
  var processing = true

  val query = if (filters.isDefined) {
    db.getCollection(collectionName).find(filters.get)
  } else {
    db.getCollection(collectionName).find()
  }

  query.subscribe(
    (doc: Document) => docs.append(doc), // add doc to mutable list
    (e: Throwable) => throw e,
    () => processing = false
  )

  while (processing) {
    Thread.sleep(100) // wait here until all docs have been returned
  }

  docs
}
// sample usage of 'synchronous' method
val client: MongoClient = MongoClient(uriString)
val db: MongoDatabase = client.getDatabase(dbName)
val allDocs = getDocsSync(db, "myCollection", Option.empty)
val someDocs = getDocsSync(db, "myCollection", Option(Filters.eq("fieldName", "foo")))
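As a side note, for smaller result sets the driver's observables can also be collected into a Future directly with toFuture(), which avoids the polling loop above. A brief sketch, reusing the db from the sample usage (the collection name and timeout are illustrative):

import org.mongodb.scala._
import scala.concurrent.Await
import scala.concurrent.duration._

// Collects the whole result set into memory; fine for small collections
val allDocsViaFuture: Seq[Document] = Await.result(
  db.getCollection("myCollection").find().toFuture(),
  30.seconds
)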
I want to write some data from a Spark DataFrame to Postgres. After searching on Stack Overflow I found that the easiest way is to open a connection each time I use my prepared statement, and this works fine. But I want to share a variable holding the connection across all nodes.
I took some code from here: https://www.nicolaferraro.me/2016/02/22/using-non-serializable-objects-in-apache-spark/ and wrote this:
class SharedVariable[T](constructor: => T) extends AnyRef with Serializable {
  @transient private lazy val instance: T = constructor
  def get = instance
}

object SharedVariable {
  def apply[T](constructor: => T): SharedVariable[T] = new SharedVariable[T](constructor)
}

val dsv = SharedVariable {
  val ds = new BasicDataSource()
  ds.setDriverClassName("org.postgresql.Driver")
  ds.setUrl("jdbc:postgresql://...")
  ds.setUsername("user")
  ds.setPassword("pass")
  ds
}
But I get an error:
error: reference to SharedVariable is ambiguous;
it is imported twice in the same scope by
import $line1031857785174$read.SharedVariable
and import INSTANCE.SharedVariable val dsv = SharedVariable {
Can someone help me?
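For reference, the pattern that the lazy SharedVariable is effectively emulating can also be written directly with foreachPartition, opening one connection per partition rather than per row. A rough sketch; the URL, table, and column names below are illustrative assumptions, not taken from the question:

import java.sql.DriverManager
import org.apache.spark.sql.DataFrame

// Illustrative sketch: one JDBC connection per partition (assumed table events(id, payload))
def writeToPostgres(df: DataFrame): Unit =
  df.rdd.foreachPartition { rows =>
    val conn = DriverManager.getConnection("jdbc:postgresql://localhost:5432/mydb", "user", "pass")
    val stmt = conn.prepareStatement("insert into events (id, payload) values (?, ?)")
    try {
      rows.foreach { row =>
        stmt.setLong(1, row.getAs[Long]("id"))
        stmt.setString(2, row.getAs[String]("payload"))
        stmt.executeUpdate()
      }
    } finally {
      stmt.close()
      conn.close()
    }
  }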
The following working code from Slick 2.1 returns a single integer (which in this example, happens to be the result of running a function called "foobar"):
def getFoobar(): Int = DB.withSession {
val query = Q.queryNA[Int]("select foobar()")
query.first
}
How would one port this to Slick 3.0? According to the Slick 3.0 docs, the query would have to be converted to a DBIOAction. So this is what I've tried:
import driver.api._
...
def getFoobar(): Future[Int] = {
val query = sql"select foobar()".as[Int]
db.run(query)
}
but this results in the following compilation error:
[error] found : slick.profile.SqlStreamingAction[Vector[Int],Int,slick.dbio.Effect]#ResultAction [Int,slick.dbio.NoStream,slick.dbio.Effect]
[error] required: MyDAO.this.driver.api.DBIO[Seq[Int]]
It appears that the sql interpolator is yielding a SqlStreamingAction rather than a DBIO, as db.run is expecting.
What would be the correct way to write this in the new Slick 3 API?
I used something similar and it worked for me:
import slick.driver.MySQLDriver.api._

def get(id: String): Future[Option[Channel]] = {
  implicit val getChannelResult = GetResult(r => Channel(r.<<, r.<<, r.<<, r.<<, r.<<))
  val query = sql"select * from Channel where id = $id".as[Channel]
  db.run(query.headOption)
}
db.run(DBIOAction[T, NoStream, Nothing]) accepts all kinds of actions, such as SqlStreamingAction, StreamingDriverAction, DriverAction, etc.
I guess the problem lies with the driver or db configuration, which would explain the error
[error] required: MyDAO.this.driver.api.DBIO[Seq[Int]]
Could you paste the driver and db configuration steps, so that we can take a closer look at the code and identify the actual cause?
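For comparison, here is a minimal version of the original getFoobar that compiles against Slick 3, under the assumption that a db: Database has been configured for the Postgres driver; .head turns the Vector[Int] produced by .as[Int] into a single Int:

import slick.driver.PostgresDriver.api._
import scala.concurrent.Future

// sql"..." yields a SqlStreamingAction, which db.run accepts as a DBIOAction
def getFoobar()(implicit db: Database): Future[Int] =
  db.run(sql"select foobar()".as[Int].head)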
I am using Scala and Slick and I am trying to execute a simple query with two conditions:
import JiraData._
import org.scala_tools.time.Imports._
import scala.slick.driver.PostgresDriver.simple._
val today = new DateTime()
val yesterday = today.plusDays(-1)
implicit val session = Database.forURL(
  "jdbc:postgresql://localhost/jira-performance-manager",
  driver = "org.postgresql.Driver",
  user = "jira-performance-manager",
  password = "jira-performance-manager").withSession { implicit session =>
    val activeUsers = users.filter(_.active === true)
    for (activeUser <- activeUsers) {
      val activeUserWorkogs = worklogs.filter(x => x.username === activeUser.name && x.workDate === yesterday)
    }
  }
But I receive error:
Error:(20, 95) value === is not a member of scala.slick.lifted.Column[org.scala_tools.time.Imports.DateTime]
Note: implicit value session is not applicable here because it comes after the application point and it lacks an explicit result type
val activeUserWorkogs = worklogs.filter(x => x.username === activeUser.name && x.workDate === yesterday)
^
What's wrong here? How can I get list of results filtered by two conditions?
scala-tools.time uses Joda Time's DateTime; see https://github.com/jorgeortiz85/scala-time/blob/master/src/main/scala/org/scala_tools/time/Imports.scala . Slick does not have built-in support for Joda Time. There is slick-joda-mapper: https://github.com/tototoshi/slick-joda-mapper . Or it is easy to add yourself: http://slick.typesafe.com/doc/2.1.0/userdefined.html#using-custom-scalar-types-in-queries
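If you prefer not to add the dependency, a rough sketch of such a mapping for Slick 2.x might look like the following, storing the DateTime as a SQL timestamp; treat this as an outline rather than a tested implementation:

import java.sql.Timestamp
import org.joda.time.DateTime
import scala.slick.driver.PostgresDriver.simple._

// Lets Joda DateTime columns be compared with === in queries by mapping them to Timestamp
implicit val jodaDateTimeColumnType =
  MappedColumnType.base[DateTime, Timestamp](
    dt => new Timestamp(dt.getMillis),
    ts => new DateTime(ts.getTime)
  )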
As a side-note: Something like
for (activeUser <- activeUsers) {
val activeUserWorkogs = worklogs.filter(...)
looks like a step in the wrong direction: it will run another query for each active user. It is better to use a join or a single accumulated query for the work logs of all active users.
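For example, assuming the users and worklogs tables from the question and a Joda mapping like the one sketched above, a single accumulated query could look roughly like this:

// One joined query instead of one query per active user (Slick 2.x, implicit session in scope)
val activeUserWorklogs = for {
  u <- users if u.active === true
  w <- worklogs if w.username === u.name && w.workDate === yesterday
} yield (u.name, w)

val results = activeUserWorklogs.list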