So, I've been working on a smaller project with Play, ReactiveMongo and MongoDB. My question is about the application.conf part for ReactiveMongo; I have the standard googlable one:
mongodb = {
  db = "db1"
  servers = [ "localhost:27017" ]
  credentials = {
    username = "auser"
    password = "apassword"
  }
}
And to access a collection, in Scala:
def sessionCollection: JSONCollection = db.collection[JSONCollection]("session")
So, since MongoDB locks at database level for writes, I'm looking for a solution to use several databases.
Question is: how do I configure multiple databases so I can define collections like above, from those databases?
MongoDB 2.6.x, Play 2.3.x, Reactivemongo 0.10.5.0.akka23
Edit: I should say that I already know how to do this manually in code, but I wanted to know whether there is any known Play-specific solution I've failed to reach via Google.
In your Play application, you can use ReactiveMongo with multiple connection pools (possibly with different replica sets and/or different options), using the @NamedDatabase annotation.
Consider the following configuration, with several connection URIs.
# The default URI
mongodb.uri = "mongodb://someuser:somepasswd@localhost:27017/foo"
# Another one, named with 'bar'
mongodb.bar.uri = "mongodb://someuser:somepasswd@localhost:27017/lorem"
Then the dependency injection can select the API instances using the names.
import javax.inject.Inject
import play.modules.reactivemongo._
class MyComponent @Inject() (
  val defaultApi: ReactiveMongoApi, // corresponds to 'mongodb.uri'
  @NamedDatabase("bar") val barApi: ReactiveMongoApi // corresponds to 'mongodb.bar'
) {
}
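To get a collection from each database, in the style of the sessionCollection above, you can resolve it from the corresponding injected API. A minimal sketch, assuming a recent Play-ReactiveMongo module (newer than the 0.10.x mentioned in the question) where ReactiveMongoApi exposes the database as a Future; the class and collection names are illustrative:

import javax.inject.Inject
import scala.concurrent.{ ExecutionContext, Future }
import play.modules.reactivemongo._
import reactivemongo.play.json._
import reactivemongo.play.json.collection._

class SessionRepositories @Inject() (
  val defaultApi: ReactiveMongoApi,                  // backed by 'mongodb.uri'
  @NamedDatabase("bar") val barApi: ReactiveMongoApi // backed by 'mongodb.bar.uri'
)(implicit ec: ExecutionContext) {

  // "session" collection in the default database
  def defaultSessions: Future[JSONCollection] =
    defaultApi.database.map(_.collection[JSONCollection]("session"))

  // "session" collection in the 'bar' database
  def barSessions: Future[JSONCollection] =
    barApi.database.map(_.collection[JSONCollection]("session"))
}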
In Meteor Mongo, how do I specify the readPref (primary|secondary) in a Meteor Mongo query?
I hope the following provides a better understanding of the relationship between Meteor and Mongo.
Meteor collections for more comfort
Meteor provides you with the full Mongo functionality. However, for comfort it provides a wrapped API of a Mongo collection that integrates best with the Meteor environment. So if you import Mongo via
import { Mongo } from 'meteor/mongo'
you primarily import the wrapped Mongo collection, where operations are executed in a Meteor fiber. The cursors returned by queries on these wrapped collections are also not the "natural" cursors but wrapped, Meteor-optimized cursors.
If you try to access a native feature on these instances that is not implemented you will receive an error. In your case:
import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';
import { Random } from 'meteor/random';

const ExampleCollection = new Mongo.Collection('examples')

Meteor.startup(() => {
  // code to run on server at startup
  ExampleCollection.insert({ value: Random.id() })
  const docsCursor = ExampleCollection.find();
  docsCursor.readPref('primary')
});
Leads to
TypeError: docsCursor.readPref is not a function
Accessing the node mongo driver collections
The good news is that you can access the layer underneath via Collection.rawCollection(), where you have full access to the node Mongo driver. This is because, under the hood, Meteor's Mongo.Collection and its Cursor make use of this native driver.
Now you will find two other issues:
readPref is called cursor.setReadPreference on a node-mongo cursor (3.1 API).
Cursor.fetch does not exist; its counterpart is cursor.toArray, which (as many native operations do) returns a Promise.
So to finally answer your question
you can do the following:
import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';
import { Random } from 'meteor/random';

const ExampleCollection = new Mongo.Collection('examples')

Meteor.startup(() => {
  // code to run on server at startup
  ExampleCollection.insert({ value: Random.id() })
  const docsCursor = ExampleCollection.rawCollection().find();
  docsCursor.setReadPreference('primary')
  docsCursor.toArray().then((docs) => {
    console.log(docs)
  }).catch((err) => console.error(err))
});
Summary
By using collection.rawCollection() you gain access to the full spectrum of the node mongo driver API.
You are on your own to integrate the operations, cursors and results (Promises) into your environment. Good helpers are Meteor.bindEnvironment and Meteor.wrapAsync.
Beware of API changes in the node-mongo driver: on the one hand the mongo version supported by the driver, on the other hand the driver version supported by Meteor.
Note that it is easier to "mess up" things with the native API but it also gives you a lot of new options. Use with care.
Symptom
After running just fine for some time, our backend will stop giving responses for most of its endpoints. It will just start behaving like a black hole for those. Once in this state, it will stay there if we don't take any action.
Update
We can reproduce this behaviour with a db dump we made when the backend was in the non-responding state.
Infrastructure Setup
We are running Play 2.5 in AWS on an EC2 instance behind a loadbalancer with a PostgreSQL database on RDS. We are using slick-pg as our database connector.
What we know
Here a few things we figured out so far.
About the HTTP requests
Our logs and debugging show us that the requests are passing the filters. Also, we see that for authentication (we are using Silhouette for that) the application is able to perform database queries to receive the identity for that request. The controller's action will just never be called, though.
The backend is responding to HEAD requests. Further logging showed us that controllers using injected services (we are using Google's Guice for that) seem to be the ones whose methods are not being called anymore. Controllers without injected services seem to work fine.
About the EC2 instance
Unfortunately, we are not able to get much information from that one. We are using boxfuse, which provides us with an immutable and therefore not SSH-able infrastructure. We are about to change this to a Docker-based deployment and might be able to provide more information soon. Nevertheless, we have New Relic set up to monitor our servers. We cannot find anything suspicious there; the memory and CPU usage look fine.
Still, this setup gives us a new EC2 instance on every deployment anyway. And even after a redeployment the issue persists most of the time, although eventually it is possible to resolve it with a redeployment.
Even weirder is the fact that we can run the backend locally, connected to the database on AWS, and everything will just work fine there.
So it is hard for us to say where the problem lies. It seems the db does not work with any EC2 instance (until it eventually works with a new one), yet it does work with our local machines.
About the Database
The db is the only stateful entity in this setup, so we think the issue somehow should be related to it.
As we have a production and a staging environment, we are able to dump the production db into staging when the latter is not working anymore. We found that this indeed resolves the issue immediately. Initially we were not able to take a snapshot of a somehow corrupted database to dump into the staging environment and see whether it would break immediately. By now we have a snapshot of the db from when the backend was not responding anymore; when we dump this into our staging environment, the backend stops responding immediately.
The number of connections to the DB is around 20 according to the AWS console which is normal.
TL;DR
Our backend eventually starts behaving like a black hole for some of its endpoints
The requests are not reaching the controller actions
A new instance in EC2 might resolve this, but not necessarily
Locally, with the very same db, everything works fine
Dumping a working db into it resolves the issue
CPU and memory usage of the EC2 instances, as well as the number of connections to the DB, look totally fine
We can reproduce the behaviour with a db dump we made when the backend was not responding anymore (see Update 2)
With the new Slick thread pool settings, we got ThreadPoolExecutor exceptions from Slick after a reboot of the db followed by a reboot of our EC2 instance (see Update 3)
Update 1
Responding to marcospereira:
Take for instance this ApplicationController.scala:
package controllers
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
import akka.actor.ActorRef
import com.google.inject.Inject
import com.google.inject.name.Named
import com.mohiva.play.silhouette.api.Silhouette
import play.api.i18n.{ I18nSupport, MessagesApi }
import play.api.mvc.Action
import play.api.mvc.Controller
import jobs.jobproviders.BatchJobChecker.UpdateBasedOnResourceAvailability
import utils.auth.JobProviderEnv
/**
 * The basic application controller.
 *
 * @param messagesApi The Play messages API.
 * @param webJarAssets The webjar assets implementation.
 */
class ApplicationController @Inject() (
  val messagesApi: MessagesApi,
  silhouette: Silhouette[JobProviderEnv],
  implicit val webJarAssets: WebJarAssets,
  @Named("batch-job-checker") batchJobChecker: ActorRef
) extends Controller with I18nSupport {

  def index = Action.async { implicit request =>
    Future.successful(Ok)
  }

  def admin = Action.async { implicit request =>
    Future.successful(Ok(views.html.admin.index.render))
  }

  def taskChecker = silhouette.SecuredAction.async {
    batchJobChecker ! UpdateBasedOnResourceAvailability
    Future.successful(Ok)
  }
}
The index and admin actions work just fine. The taskChecker action shows the weird behaviour, though.
Update 2
We are able to reproduce this issue now! We found that we made a db dump the last time our backend was not responding anymore. When we dump this into our staging database, the backend will stop responding immediately.
We started logging the number of threads now in one of our filters using Thread.getAllStackTraces.keySet.size and found that there are between 50 and 60 threads running.
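Roughly, such a filter could look like the following simplified sketch (illustrative only, not our exact code; it assumes Play 2.5's Filter API with an injected Materializer):

import javax.inject.Inject
import scala.concurrent.Future
import akka.stream.Materializer
import play.api.Logger
import play.api.mvc.{ Filter, RequestHeader, Result }

// Logs the current number of live JVM threads on every request.
class ThreadCountFilter @Inject() (implicit val mat: Materializer) extends Filter {
  def apply(next: RequestHeader => Future[Result])(rh: RequestHeader): Future[Result] = {
    Logger.info(s"Live threads: ${Thread.getAllStackTraces.keySet.size}")
    next(rh)
  }
}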
Update 3
As @AxelFontaine suggested, we enabled Multi-AZ deployment failover for the database. We rebooted the database with failover. Before, during and after the reboot the backend was not responding.
After the reboot we noticed that the number of connections to the db stayed at 0. Also, we did not get any logs for authentication anymore (before, we did; the authentication step could even make db requests and would get responses).
After rebooting the EC2 instance, we are now getting
play.api.UnexpectedException: Unexpected exception[RejectedExecutionException: Task slick.backend.DatabaseComponent$DatabaseDef$$anon$2@76d6ac53 rejected from java.util.concurrent.ThreadPoolExecutor@6ea1d0ce[Running, pool size = 4, active threads = 4, queued tasks = 5, completed tasks = 157]]
(we did not get those before)
for our requests as well as for our background jobs that need to access the db. Our slick settings now include
numThreads = 4
queueSize = 5
maxConnections = 10
connectionTimeout = 5000
validationTimeout = 5000
as suggested here
Update 4
After we got the exceptions as described in Update 3, the backend is now running fine again. We didn't do anything for that. This was the first time the backend would recover from this state without us being involved.
It sounds like a thread management issue at first glance. Slick will provide its own execution context for database operations if you are using Slick 3.1, but you do want to manage the queue size so that it roughly matches what the database can handle:
myapp = {
  database = {
    driver = org.h2.Driver
    url = "jdbc:h2:./test"
    user = "sa"
    password = ""

    // The number of threads determines how many things you can *run* in parallel;
    // the number of connections determines how many things you can *keep in memory* at the same time
    // on the database server.
    // numThreads = (core_count (hyperthreading included))
    numThreads = 4

    // queueSize = ((core_count * 2) + effective_spindle_count)
    // on a MBP 13, this is 2 cores * 2 (hyperthreading not included) + 1 hard disk
    queueSize = 5

    // https://groups.google.com/forum/#!topic/scalaquery/Ob0R28o45eM
    // make larger than numThreads + queueSize
    maxConnections = 10

    connectionTimeout = 5000
    validationTimeout = 5000
  }
}
Also, you may want to use a custom ActionBuilder: inject a Futures component and add
import play.api.libs.concurrent.Futures._
Once you do that, you can call future.withTimeout(500 milliseconds) to time out the future so that an error response will come back. There's an example of a custom ActionBuilder in the Play example:
https://github.com/playframework/play-scala-rest-api-example/blob/2.5.x/app/v1/post/PostAction.scala
class PostAction @Inject()(messagesApi: MessagesApi)(
    implicit ec: ExecutionContext)
  extends ActionBuilder[PostRequest]
  with HttpVerbs {

  type PostRequestBlock[A] = PostRequest[A] => Future[Result]

  private val logger = org.slf4j.LoggerFactory.getLogger(this.getClass)

  override def invokeBlock[A](request: Request[A],
                              block: PostRequestBlock[A]): Future[Result] = {
    if (logger.isTraceEnabled()) {
      logger.trace(s"invokeBlock: request = $request")
    }

    val messages = messagesApi.preferred(request)
    val future = block(new PostRequest(request, messages))

    future.map { result =>
      request.method match {
        case GET | HEAD =>
          result.withHeaders("Cache-Control" -> s"max-age: 100")
        case other =>
          result
      }
    }
  }
}
so you'd add the timeout, metrics (or circuit breaker if the database is down) here.
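For illustration, here is a minimal sketch of the timeout part. It assumes a newer Play version (2.6+) where the play.api.libs.concurrent.Futures component and ActionBuilderImpl exist; the 500 ms value and the error response are placeholders:

import javax.inject.Inject
import scala.concurrent.duration._
import scala.concurrent.{ ExecutionContext, Future, TimeoutException }
import play.api.libs.concurrent.Futures
import play.api.libs.concurrent.Futures._
import play.api.mvc._

// Wraps the action's future so that a hanging database call fails fast
// instead of turning the endpoint into a black hole.
class TimeoutAction @Inject() (parser: BodyParsers.Default)(
    implicit ec: ExecutionContext, futures: Futures)
  extends ActionBuilderImpl(parser) {

  override def invokeBlock[A](request: Request[A],
                              block: Request[A] => Future[Result]): Future[Result] =
    block(request)
      .withTimeout(500.milliseconds)
      .recover { case _: TimeoutException =>
        Results.ServiceUnavailable("Request timed out")
      }
}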
After some more investigation we found that one of our jobs was generating deadlocks in our database. The issue we were running into is a known bug in the Slick version we were using and is reported on GitHub.
So the problem was that we were running DB transactions with .transactionally within the .map of a DBIOAction on too many threads at the same time.
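For illustration only, here is a rough, self-contained sketch of that shape (an invented H2 table, not our actual code; it shows the pattern, not a guaranteed reproduction of the deadlock): each .map callback blocks while waiting on a nested transactional run, so with a small pool (numThreads = 4, queueSize = 5) enough concurrent requests doing this can starve the pool.

import scala.concurrent.Await
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import slick.driver.H2Driver.api._

object DeadlockShapeSketch extends App {
  // Hypothetical table, for illustration only.
  class Items(tag: Tag) extends Table[(Int, String)](tag, "items") {
    def id = column[Int]("id", O.PrimaryKey)
    def name = column[String]("name")
    def * = (id, name)
  }
  val items = TableQuery[Items]

  val db = Database.forURL("jdbc:h2:mem:sketch;DB_CLOSE_DELAY=-1", driver = "org.h2.Driver")
  Await.result(db.run(items.schema.create), 10.seconds)
  Await.result(db.run(items += ((1, "one"))), 10.seconds)

  // The problematic shape: a nested, blocking transactional run inside the
  // .map of another DBIOAction.
  val outer = items.result.map { rows =>
    rows.foreach { case (id, name) =>
      val update = items.filter(_.id === id).map(_.name).update(name.toUpperCase)
      Await.result(db.run(update.transactionally), 10.seconds)
    }
  }

  Await.result(db.run(outer), 30.seconds)
  db.close()
}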
I have streaming data coming from Spark Streaming which I need to process and finally store in Cassandra. Earlier I was trying to use the Spark Cassandra connector, but it doesn't give access to the Spark Streaming context object on the workers, so I have to use a separate Cassandra Scala driver. Hence, I ended up with phantom. Now, my question is: I have already defined the column family in Cassandra, so how do I do the select and update queries from Scala?
I have followed this documentation (link1) but I don't understand why we need to give the table definition on the client (Scala code) side. Why can't we just give the keyspace, contact points and column family and be done with it?
object CustomConnector {
  val hosts = Seq("IP1", "IP2")
  val Connector = ContactPoints(hosts).keySpace("KEYSPACE_NAME")
}

realTimeAgg.foreachRDD { x =>
  if (x.toLocalIterator.nonEmpty) {
    x.foreachPartition { partition =>
      // How do I achieve select/insert into the Cassandra table here using phantom?
    }
  }
}
This is not yet possible using phantom. We are actively working on phantom-spark to allow you to do this, but at this point in time it is still a few months away.
In the interim, you will have to rely on the Spark Cassandra connector and use the non type-safe API to achieve this. It's an unfortunate setup, but in the very near future this will be resolved.
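As a rough sketch of that interim approach (the case class, keyspace, table and column names are illustrative; it assumes spark-cassandra-connector is on the classpath and spark.cassandra.connection.host is set on the SparkConf):

import com.datastax.spark.connector._
import org.apache.spark.streaming.dstream.DStream

// Hypothetical case class mirroring the existing column family.
case class RealTimeAgg(key: String, value: Long)

// Write each micro-batch with the connector's RDD API instead of phantom.
def saveStream(stream: DStream[RealTimeAgg]): Unit =
  stream.foreachRDD { rdd =>
    if (!rdd.isEmpty()) {
      rdd.saveToCassandra("KEYSPACE_NAME", "real_time_agg", SomeColumns("key", "value"))
    }
  }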
Note: I realise there is a similar question on SO but it talks about an old version of Casbah, plus, the behaviour explained in the answer is not what I see!
I was under the impression that Casbah's MongoClient handled connection pooling. However, doing lsof on my process I see a big and growing number of mongodb connections, which makes me doubt this pooling actually exists.
Basically, this is what I'm doing:
class MongodbDataStore {
  val mongoClient = MongoClient("host", 27017)("database")

  def getObject1(): Object1 = {
    val collection = mongoClient("object1Collection")
    ...
  }

  def getObject2(): Object2 = {
    val collection = mongoClient("object2Collection")
    ...
  }
}
So, I never close MongoClient.
Should I be closing it after every query? Implement my own pooling? What then?
Thank you
Casbah is a wrapper around the MongoDB Java client, so the connection is actually managed by it.
According to the Java driver documentation (http://docs.mongodb.org/ecosystem/drivers/java-concurrency/) :
If you are using in a web serving environment, for example, you should
create a single MongoClient instance, and you can use it in every
request. The MongoClient object maintains an internal pool of
connections to the database (default maximum pool size of 100). For
every request to the DB (find, insert, etc) the Java thread will
obtain a connection from the pool, execute the operation, and release
the connection. This means the connection (socket) used may be
different each time.
By the way, that's what I've experienced in production. I did not see any problem with this.
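In Casbah terms that boils down to keeping a single client for the whole application. A minimal sketch (host, database and collection names are the ones from the question):

import com.mongodb.casbah.Imports._

// One shared client (and its internal connection pool) for the whole application.
object Mongo {
  private val client = MongoClient("host", 27017)
  val db: MongoDB = client("database")
}

class MongodbDataStore {
  // Collections are cheap handles; the underlying sockets come from the pool.
  def object1Collection: MongoCollection = Mongo.db("object1Collection")
  def object2Collection: MongoCollection = Mongo.db("object2Collection")
}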
How can (or should) I configure Grails integration tests to roll back transactions automatically when using MongoDB as the datasource?
(I'm using Grails 2.2.1 + mongodb plugin 1.2.0)
For Spock integration tests I defined a MongoIntegrationSpec that gives some control over cleaning up test data.
dropDbOnCleanup = true // will drop the entire DB after every feature method is executed.
dropDbOnCleanupSpec = true // will drop the entire DB after the spec is complete.
dropCollectionsOnCleanup = ["collectionA", "collectionB", ...] // drops collections after every feature method is executed.
dropCollectionsOnCleanupSpec = ["collectionA", "collectionB", ...] // drops collections after the spec is complete.
dropNewCollectionsOnCleanup = true // after every feature method is executed, all new collections are dropped
dropNewCollectionsOnCleanupSpec = true // after the spec is complete, all new collections are dropped
Here's the source
https://github.com/onetribeyoyo/mtm/tree/dev/src/test/integration/com/onetribeyoyo/util/MongoIntegrationSpec.groovy
And the project has a couple of usage examples too.
I don't think it's even possible, because MongoDB doesn't support transactions. You could use the suggested static transactional = 'mongo', but it only helps if you didn't flush your data (a rare situation, I think).
Instead you could clean up the database in setUp() manually. You can drop the collection for the domain that you're going to test, like:
MyDomain.collection.drop()
and (optionally) fill it with all the data required for your test.
You can use static transactional = 'mongo' in an integration test and/or service class.
Refer to the MongoDB plugin documentation for more details.
MongoDB does not support transactions, and hence you cannot use them. The options you have are:
1. Go around and drop the collections for the domain classes you used.
MyDomain.collection.drop() // if you use the MongoDB plugin alone, without Hibernate
MyDomain.mongo.collection.drop() // if you use the MongoDB plugin with Hibernate
The drawback is that you have to do it for each domain you used.
2. Drop the whole database (You don't need to create it explicitly, but you can)
String host = grailsApplication.config.grails.mongo.host
Integer port = grailsApplication.config.grails.mongo.port
String databaseName = grailsApplication.config.grails.mongo.databaseName
def mongo = new GMongo(host, port)
mongo.getDB(databaseName).dropDatabase() // this takes 0.3-0.5 seconds on my machine
The second option is easier and faster. To make this work for all your tests, extend IntegrationSpec and add the code to drop the database in the cleanup block (I am assuming you are using the Spock test framework), or do a similar thing for JUnit-like tests.
Hope this helps!