There is an example of Scala mongodb transaction:
https://github.com/mongodb/mongo-scala-driver/blob/r2.4.0/driver/src/it/scala/org/mongodb/scala/DocumentationTransactionsExampleSpec.scala
But it's not clear how to rollback the transaction in case of failure.
Here is the code I copied from official example but modified a bit to make the transaction fail in the second insertion (inserting 2 documents with same ids), but problem is that the first document is persisted, and I need the WHOLE transaction to be rolled back.
import org.mongodb.scala._
import scala.concurrent.Await
import scala.concurrent.duration.Duration
object Application extends App {
val mongoClient: MongoClient = MongoClient("mongodb://localhost:27018")
val database = mongoClient.getDatabase("hr")
val employeesCollection = database.getCollection("employees")
// Implicit functions that execute the Observable and return the results
val waitDuration = Duration(5, "seconds")
implicit class ObservableExecutor[T](observable: Observable[T]) {
def execute(): Seq[T] = Await.result(observable.toFuture(), waitDuration)
}
implicit class SingleObservableExecutor[T](observable: SingleObservable[T]) {
def execute(): T = Await.result(observable.toFuture(), waitDuration)
}
updateEmployeeInfoWithRetry(mongoClient).execute()
Thread.sleep(3000)
/// -------------------------
def updateEmployeeInfo(database: MongoDatabase, observable: SingleObservable[ClientSession]): SingleObservable[ClientSession] = {
observable.map(clientSession => {
val eventsCollection = database.getCollection("events")
val transactionOptions = TransactionOptions.builder().readConcern(ReadConcern.SNAPSHOT).writeConcern(WriteConcern.MAJORITY).build()
clientSession.startTransaction(transactionOptions)
eventsCollection.insertOne(clientSession, Document("_id" -> "123", "employee" -> 3, "status" -> Document("new" -> "Inactive", "old" -> "Active")))
.subscribe((res: Completed) => println(res))
// THIS SHOULD FAIL, SINCE THERE IS ALREADY DOCUMENT WITH ID = 123, but PREVIOUS OPERATION SHOULD BE ALSO ROLLED BACK.
// I COULD NOT FIND THE WAY HOW TO ROLLBACK WHOLE TRANSACTION IF ONE OF OPERATIONS FAILED
eventsCollection.insertOne(clientSession, Document("_id" -> "123", "employee" -> 3, "status" -> Document("new" -> "Inactive", "old" -> "Active")))
.subscribe((res: Completed) => println(res))
// I'VE TRIED VARIOUS THINGS (INCLUDING CODE BELOW)
// .subscribe(new Observer[Completed] {
// override def onNext(result: Completed): Unit = println("onNext")
//
// override def onError(e: Throwable): Unit = clientSession.abortTransaction()
//
// override def onComplete(): Unit = println("complete")
// })
clientSession
})
}
def commitAndRetry(observable: SingleObservable[Completed]): SingleObservable[Completed] = {
observable.recoverWith({
case e: MongoException if e.hasErrorLabel(MongoException.UNKNOWN_TRANSACTION_COMMIT_RESULT_LABEL) => {
println("UnknownTransactionCommitResult, retrying commit operation ...")
commitAndRetry(observable)
}
case e: Exception => {
println(s"Exception during commit ...: $e")
throw e
}
})
}
def runTransactionAndRetry(observable: SingleObservable[Completed]): SingleObservable[Completed] = {
observable.recoverWith({
case e: MongoException if e.hasErrorLabel(MongoException.TRANSIENT_TRANSACTION_ERROR_LABEL) => {
println("TransientTransactionError, aborting transaction and retrying ...")
runTransactionAndRetry(observable)
}
})
}
def updateEmployeeInfoWithRetry(client: MongoClient): SingleObservable[Completed] = {
val database = client.getDatabase("hr")
val updateEmployeeInfoObservable: Observable[ClientSession] = updateEmployeeInfo(database, client.startSession())
val commitTransactionObservable: SingleObservable[Completed] =
updateEmployeeInfoObservable.flatMap(clientSession => clientSession.commitTransaction())
val commitAndRetryObservable: SingleObservable[Completed] = commitAndRetry(commitTransactionObservable)
runTransactionAndRetry(commitAndRetryObservable)
}
}
How to rollback the whole transaction if any operation failed?
From the source code of the Scala driver at https://github.com/mongodb/mongo-scala-driver/blob/r2.6.0/driver/src/main/scala/org/mongodb/scala/ClientSessionImplicits.scala
It appears that there is an abortTransaction() method defined along with commitTransaction().
In another note, currently a single replica set transaction in MongoDB 4.0 will be automatically aborted if it's not committed within 60 seconds (configurable). In the MongoDB Multi-Document ACID Transactions blog post:
By default, MongoDB will automatically abort any multi-document transaction that runs for more than 60 seconds. Note that if write volumes to the server are low, you have the flexibility to tune your transactions for a longer execution time.
Related
I'm attempting to improve the below code that creates a MongoDB connection and inserts a document using the insertDocument method:
import com.typesafe.scalalogging.LazyLogging
import org.mongodb.scala.result.InsertOneResult
import org.mongodb.scala.{Document, MongoClient, MongoCollection, MongoDatabase, Observer, SingleObservable}
import play.api.libs.json.JsResult.Exception
object MongoFactory extends LazyLogging {
val uri: String = "mongodb+srv://*********"
val client: MongoClient = MongoClient(uri)
val db: MongoDatabase = client.getDatabase("db")
val collection: MongoCollection[Document] = db.getCollection("col")
def insertDocument(document: Document) = {
val singleObservable: SingleObservable[InsertOneResult] = collection.insertOne(document)
singleObservable.subscribe(new Observer[InsertOneResult] {
override def onNext(result: InsertOneResult): Unit = println(s"onNext: $result")
override def onError(e: Throwable): Unit = println(s"onError: $e")
override def onComplete(): Unit = println("onComplete")
})
}
}
The primary issue I see with the above code is that if the connection becomes stale due to MongoDB server going offline or some other condition then
the connection is not restarted.
An improvement to cater for this scenario is :
object MongoFactory extends LazyLogging {
val uri: String = "mongodb+srv://*********"
var client: MongoClient = MongoClient(uri)
var db: MongoDatabase = client.getDatabase("db")
var collection: MongoCollection[Document] = db.getCollection("col")
def isDbDown() : Boolean = {
try {
client.getDatabase("db")
false
}
catch {
case e: Exception =>
true
}
}
def insertDocument(document: Document) = {
if(isDbDown()) {
client = MongoClient(uri)
db = client.getDatabase("db")
collection = db.getCollection("col")
}
val singleObservable: SingleObservable[InsertOneResult] = collection.insertOne(document)
singleObservable.subscribe(new Observer[InsertOneResult] {
override def onNext(result: InsertOneResult): Unit = println(s"onNext: $result")
override def onError(e: Throwable): Unit = println(s"onError: $e")
override def onComplete(): Unit = println("onComplete")
})
}
}
I expect this to handle the scenario if the DB connection becomes unavailable but is there a more idiomatic Scala method of
determining
Your code does not create connections. It creates MongoClient instances.
As such you cannot "create a new connection". MongoDB drivers do not provide an API for applications to manage connections.
Connections are managed internally by the driver and are created and destroyed automatically as needed in response to application requests/commands. You can configure connection pool size and when stale connections are removed from the pool.
Furthermore, execution of a single application command may involve multiple connections (up to 3 easily, possibly over 5 if encryption is involved), and the connection(s) used depend on the command/query. Checking the health of any one connection, even if it was possible, wouldn't be very useful.
I have a spark connector notebook, "Export Tables To Database", that write spark table data to an Azure SQL database. I have a master notebook that calls that spark connector notebook to write many tables in parallel. If a copy fails, I have a retry portion in the master notebook that retry the export. However, it is causing duplicates in my database because the original failed one doesn't cancel the connection immediately. I want to add a wait period before each retry. How do I do that?
////these next four class and functions are for exporting data directly to the Azure SQL database via the spark connectors.
// the next two functions are for retry purpose. if exporting a table faile, it will retry
def tryNotebookRun (path: String, timeout: Int, parameters: Map[String, String] = Map.empty[String, String]): Try[Any] = {
Try(
if (parameters.nonEmpty){
dbutils.notebook.run(path, timeout, parameters)
}
else{
dbutils.notebook.run(path, timeout)
}
)
}
def runWithRetry(path: String, timeout: Int, parameters: Map[String, String] = Map.empty[String, String], maxRetries: Int = 3) = {
var numRetries = 0
// I want to add a wait period here
while (numRetries < maxRetries){
tryNotebookRun(path, timeout, parameters) match {
case Success(_) => numRetries = maxRetries
case Failure(_) => numRetries = numRetries + 1
}
}
}
case class NotebookData(path: String, timeout: Int, parameters: Map[String, String] = Map.empty[String, String])
def parallelNotebooks(notebooks: Seq[NotebookData]): Future[Seq[Any]] = {
val numNotebooksInParallel = 5
// This code limits the number of parallel notebooks.
implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(numNotebooksInParallel))
val ctx = dbutils.notebook.getContext()
Future.sequence(
notebooks.map { notebook =>
Future {
dbutils.notebook.setContext(ctx)
runWithRetry(notebook.path, notebook.timeout, notebook.parameters)
}
.recover {
case NonFatal(e) => s"ERROR: ${e.getMessage}"
}
}
)
}
////create a sequence of tables to be writed out in parallel
val notebooks = Seq(
NotebookData("Export Tables To Database", 0, Map("client"->client, "scope"->scope, "secret"->secret, "schema"->"test", "dbTable"->"table1")),
NotebookData("Export Tables To Database", 0, Map("client"->client, "scope"->scope, "secret"->secret, "schema"->"test", "dbTable"->"table2"))
)
val res = parallelNotebooks(notebooks)
Await.result(res, 3000000 seconds) // this is a blocking call.
res.value
adding Thread.sleep was the solution
def runWithRetry(path: String, timeout: Int, parameters: Map[String, String] = Map.empty[String, String], maxRetries: Int = 2) = {
var numRetries = 0
while (numRetries < maxRetries){
tryNotebookRun(path, timeout, parameters) match {
case Success(_) => numRetries = maxRetries
case Failure(_) => {
Thread.sleep(30000)
numRetries = numRetries + 1
}
}
}
}
I wrote following code to fetch data from MongoDB
import com.typesafe.config.ConfigFactory
import org.mongodb.scala.{ Document, MongoClient, MongoCollection, MongoDatabase }
import scala.concurrent.ExecutionContext
object MongoService extends Service {
val conf = ConfigFactory.load()
implicit val mongoService: MongoClient = MongoClient(conf.getString("mongo.url"))
implicit val mongoDB: MongoDatabase = mongoService.getDatabase(conf.getString("mongo.db"))
implicit val ec: ExecutionContext = ExecutionContext.global
def getAllDocumentsFromCollection(collection: String) = {
mongoDB.getCollection(collection).find()
}
}
But when I tried to get data from getAllDocumentsFromCollection I'm not getting each data for further manipulation. Instead I'm getting
FindObservable(com.mongodb.async.client.FindIterableImpl#23555cf5)
UPDATED:
object MongoService {
// My settings (see available connection options)
val mongoUri = "mongodb://localhost:27017/smsto?authMode=scram-sha1"
import ExecutionContext.Implicits.global // use any appropriate context
// Connect to the database: Must be done only once per application
val driver = MongoDriver()
val parsedUri = MongoConnection.parseURI(mongoUri)
val connection = parsedUri.map(driver.connection(_))
// Database and collections: Get references
val futureConnection = Future.fromTry(connection)
def db1: Future[DefaultDB] = futureConnection.flatMap(_.database("smsto"))
def personCollection = db1.map(_.collection("person"))
// Write Documents: insert or update
implicit def personWriter: BSONDocumentWriter[Person] = Macros.writer[Person]
// or provide a custom one
def createPerson(person: Person): Future[Unit] =
personCollection.flatMap(_.insert(person).map(_ => {})) // use personWriter
def getAll(collection: String) =
db1.map(_.collection(collection))
// Custom persistent types
case class Person(firstName: String, lastName: String, age: Int)
}
I tried to use reactivemongo as well with above code but I couldn't make it work for getAll and getting following error in createPerson
Please suggest how can I get all data from a collection.
This is likely too late for the OP, but hopefully the following methods of retrieving & iterating over collections using mongo-spark can prove useful to others.
The Asynchronous Way - Iterating over documents asynchronously means you won't have to store an entire collection in-memory, which can become unreasonable for large collections. However, you won't have access to all your documents outside the subscribe code block for reuse. I'd recommend doing things asynchronously if you can, since this is how the mongo-scala driver was intended to be used.
db.getCollection(collectionName).find().subscribe(
(doc: org.mongodb.scala.bson.Document) => {
// operate on an individual document here
},
(e: Throwable) => {
// do something with errors here, if desired
},
() => {
// this signifies that you've reached the end of your collection
}
)
The "Synchronous" Way - This is a pattern I use when my use-case calls for a synchronous solution, and I'm working with smaller collections or result-sets. It still uses the asynchronous mongo-scala driver, but it returns a list of documents and blocks downstream code execution until all documents are returned. Handling errors and timeouts may depend on your use case.
import org.mongodb.scala._
import org.mongodb.scala.bson.Document
import org.mongodb.scala.model.Filters
import scala.collection.mutable.ListBuffer
/* This function optionally takes filters if you do not wish to return the entire collection.
* You could extend it to take other optional query params, such as org.mongodb.scala.model.{Sorts, Projections, Aggregates}
*/
def getDocsSync(db: MongoDatabase, collectionName: String, filters: Option[conversions.Bson]): ListBuffer[Document] = {
val docs = scala.collection.mutable.ListBuffer[Document]()
var processing = true
val query = if (filters.isDefined) {
db.getCollection(collectionName).find(filters.get)
} else {
db.getCollection(collectionName).find()
}
query.subscribe(
(doc: Document) => docs.append(doc), // add doc to mutable list
(e: Throwable) => throw e,
() => processing = false
)
while (processing) {
Thread.sleep(100) // wait here until all docs have been returned
}
docs
}
// sample usage of 'synchronous' method
val client: MongoClient = MongoClient(uriString)
val db: MongoDatabase = client.getDatabase(dbName)
val allDocs = getDocsSync(db, "myCollection", Option.empty)
val someDocs = getDocsSync(db, "myCollection", Option(Filters.eq("fieldName", "foo")))
I'm practicing on a project that needs a database connection, I'm using the Play Framework combine to Scala and MongoDB.
I'm also using Mongo-scala-driver and following the documentation.
I wrote the exact same code:
println("start")
val mongoClient: MongoClient = MongoClient("mongodb://localhost:27017/Sandbox")
val database: MongoDatabase = mongoClient.getDatabase("test")
val collection: MongoCollection[Document] = database.getCollection("test")
val doc: Document = Document("_id" -> 0, "name" -> "MongoDB", "type" -> "database", "count" -> 1, "info" -> Document("x" -> 203, "y" -> 102))
collection.insertOne(doc).subscribe(new Observer[Completed] {
override def onSubscribe(subscription: Subscription): Unit = println("Subscribed")
override def onNext(result: Completed): Unit = println("Inserted")
override def onError(e: Throwable): Unit = println("Failed")
override def onComplete(): Unit = println("Completed")
})
mongoClient.close()
println("end")
Nothing is inserted into the database and the only result i get from the log is this:
start
Subscribed
end
I've been looking on stackoverflow for similar subject but everything I found didn't work for me.
You try insert document in asyncronous mode.
Therefore you must define three call back function onNext onError and onComplete
But you don't give time for execute insertion.
Try append any timeout before close connection. For example simple add
Thread.sleep(1000)
before
mongoClient.close()
And you no need redefine onSubscribe()
if you not want manually control demand when you move in documents list from you requests then you no need override onSubscribe(). The default definition for onSubscrime() very usable for trivial requests. In you case you no need override him.
The next code is worked
println("start")
val mongoClient: MongoClient = MongoClient("mongodb://DB01-MongoDB:27017/Sandbox")
val database: MongoDatabase = mongoClient.getDatabase("test")
val collection: MongoCollection[Document] = database.getCollection("test")
val doc: Document = Document("_id" -> 0,
"name" -> "MongoDB",
"type" -> "database",
"count" -> 1,
"info" -> Document("x" -> 203, "y" -> 102))
collection
.insertOne(doc)
.subscribe(new Observer[Completed] {
override def onNext(result: Completed): Unit = println("Inserted")
override def onError(e: Throwable): Unit = println("Failed")
override def onComplete(): Unit = println("Completed")
})
Thread.sleep(1000)
mongoClient.close()
println("end")
}
The problem was the Observer, I imported it from org.mongodb.async.client but the good one was org.mongodb.scala.
Hope this helps someone else.
The above solution may work but you might have to trade 1 second every time you insert (or any call). Another solution is to do make use of the call back :
val insertObservable = collection.insertOne(doc)
insertObservable.subscribe(new Observer[Completed] {
override def onComplete(): Unit = mongoClient.close()
})
Once the transaction completed, the connection gets closed automatically without wasting 1 second.
I want to select some rows, update them and return updated values, but I don't understand how can I do it with Slick. Here is an example. I want to select all tasks which are awaiting execution, lock them, change statuses to in progress and return updated tasks:
object Test {
case class Task(id: Int, status: String)
class TaskTable(tag: Tag) extends Table[Task](tag, "tasks") {
def id = column[Int]("id")
def status = column[String]("status")
def * = (id, status) <>(Task.tupled, Task.unapply)
}
val tasks = TableQuery[TaskTable]
def selectWaitingTasksAndChangeStatus(): Seq[Task] = {
tasks.filter(_.status === "awaitingExecution").forUpdate
// Here I want to change status to "inProgress" and
// return tasks to client code with "inProgress" status
}
}
Is this the thing you're looking for?
import scala.concurrent.Await
import scala.concurrent.duration.Duration
def selectWaitingTasksAndChangeStatus(): Seq[Task] = {
val selectAction = tasks.filter(_.status === "inProgress").result
val updateAction = tasks.filter(_.status === "awaitingExecution").map(_.status).update("inProgress")
val combinedAction = for {
tasksBeforeUpdate <- selectAction
_ <- updateAction
tasksAfterUpdate <- selectAction
} yield tasksAfterUpdate.diff(tasksBeforeUpdate)
Await.result(db.run(combinedAction.transactionally), Duration.Inf)
}
Since you'd like to get Seq[Task] from that method, you have to synchronously wait for the result from the database. Asynchronous solution would require Future[Seq[Task]] as the returned type.