How to properly handle mongoose schema migrations?

I'm completely new to MongoDB & Mongoose and can't seem to find an answer as to how to handle migrations when a schema changes.
I'm used to running migration SQL scripts that alter table structure and any underlying data that needs to be changed. This typically involves DB downtime.
How is this typically handled within MongoDB/Mongoose? Any gotchas that I need to be aware of?

Having come across this question with a reasonable understanding of how migrations work on a relational database, I found that MongoDB makes this a little simpler. I've come to two ways to break this down. The things to consider when dealing with data migrations in MongoDB (not all that different from RDBs) are:
Ensuring local test environments do not break when a developer merges the latest from the project repository
Ensuring any data is correctly updated on the live version, regardless of whether a user is logged in or out, if authentication is used. (Of course, if everyone is automatically logged out when an upgrade is made, then you only need to worry about when a user logs in.)
1) If your change will log everyone out or application downtime is expected, then the simple way to do this is to have a migration script that connects to the local or live MongoDB and upgrades the correct data. Example where a user's name is changed from a single string to an object with given and family names (very basic, of course, and it would need to be put into a script that all developers can run):
Using the CLI:
mongo
use myDatabase
db.myUsers.find().forEach(function (user) {
    var curName = user.name.split(' '); // needs more validation in practice
    user.name = { given: curName[0], family: curName[1] };
    db.myUsers.save(user);
});
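To run this for every developer, the same migration can be wrapped in a standalone Node.js script. A minimal sketch, assuming a local connection string and the same myUsers collection (all names are placeholders to adapt):

// migrate-user-names.js -- run with: node migrate-user-names.js
const mongoose = require('mongoose');

async function run() {
    // Placeholder connection string; point it at your own database.
    await mongoose.connect('mongodb://localhost:27017/myDatabase');
    const users = mongoose.connection.collection('myUsers');

    // Only touch documents whose name is still a plain string.
    const cursor = users.find({ name: { $type: 'string' } });
    for await (const user of cursor) {
        const parts = user.name.trim().split(' '); // needs more validation in practice
        await users.updateOne(
            { _id: user._id },
            { $set: { name: { given: parts[0], family: parts[1] || '' } } }
        );
    }

    await mongoose.disconnect();
}

run().catch((err) => { console.error(err); process.exit(1); });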
2) You want the application to migrate the schemas up and down based on the application version it is running. This will obviously be less of a burden on a live server and not require downtime, since users are only upgraded when they use the upgraded / downgraded version for the first time.
If you're using middleware in Express.js for Node.js:
Set an app variable in your root app script via app.set('schemaVersion', 1), which will be used later to compare against the user's schema version.
Now ensure all the user schemas have a schemaVersion property as well so we can detect a change between the application schema version and the current MongoDB schemas for THAT PARTICULAR USER only.
Next, we need to create simple middleware to detect the config and user versions:
app.use(function (req, res, next) {
    // If we're not on an authenticated route
    if (!req.user) {
        next();
        return;
    }
    // Retrieving the user info will be server dependent
    if (req.user.schemaVersion === app.get('schemaVersion')) {
        next();
        return;
    }
    // Handle upgrade if user version is less than app version
    // Handle downgrade if user version is greater than app version
    // Save the user version to your session / auth token / MongoDB where necessary
});
For the upgrade / downgrade I would make simple JS files under a migrations directory, each exporting upgrade / downgrade functions that accept the user model and run the migration changes on that particular user in MongoDB. Lastly, ensure the user's version is updated in your MongoDB so they don't run the changes again unless they move to a different version.
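A sketch of what one of those files might look like (the file name, version field, and function signatures here are assumptions, not a fixed convention):

// migrations/1-split-user-name.js
// Each migration exports an upgrade and a downgrade function that
// receives the Mongoose user document and mutates it in place.
module.exports = {
    async upgrade(user) {
        var parts = (user.name || '').split(' ');
        user.name = { given: parts[0], family: parts[1] || '' };
        user.schemaVersion = 1;
        await user.save();
    },
    async downgrade(user) {
        user.name = [user.name.given, user.name.family].join(' ').trim();
        user.schemaVersion = 0;
        await user.save();
    }
};

The middleware above would then load the right file for each version step and call upgrade or downgrade until the user's schemaVersion matches app.get('schemaVersion').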

If you're used to SQL-style migrations or Rails-like migrations, then you'll find my CLI tool migrate-mongoose the right fit for you.
It allows you to write migrations with an up and a down function, and it manages the state for you based on the success or failure of your migrations.
It also supports ES6 (ES2015) syntax.
You get access to your mongoose models via the this object, making it easy to make the changes you need to your models and schemas.
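Based on that description, a migration file has roughly this shape (a sketch only: the model import and field are placeholders, and the tool's own this-based model access is not shown, so check the migrate-mongoose README for the exact interface):

// migrations/add-migrated-flag.js (file name is illustrative)
import User from '../models/user';

export async function up() {
    // `migrated` is a made-up field; replace with your real schema change.
    await User.updateMany({}, { $set: { migrated: true } });
}

export async function down() {
    await User.updateMany({}, { $unset: { migrated: '' } });
}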

There are two types of migrations:
Offline: Requires you to take your service down for maintenance, then iterate over the entire collection and make the changes you need.
Online: Does not require you to take your service down for maintenance. When you read a document, you check its version and run a version-specific migration routine for each version between the old one and the new one. Then you load the resulting document.
Not all services can afford an offline migration, so I recommend the online approach.
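A minimal sketch of the online approach (the collection access, field names, and migration table are illustrative assumptions):

const CURRENT_VERSION = 2;

// One routine per version step; each takes a document at version N
// and returns it at version N + 1. Bodies here are placeholders.
const migrations = {
    0: (doc) => ({ ...doc, name: splitName(doc.name), schemaVersion: 1 }),
    1: (doc) => ({ ...doc, schemaVersion: 2 /* next schema change */ }),
};

function splitName(name) {
    const parts = (name || '').split(' ');
    return { given: parts[0], family: parts[1] || '' };
}

// On every read: run each migration between the stored version and the
// current one, then persist the migrated document.
async function loadUser(users, id) {
    let doc = await users.findOne({ _id: id });
    const storedVersion = doc.schemaVersion || 0;
    for (let v = storedVersion; v < CURRENT_VERSION; v++) {
        doc = migrations[v](doc);
    }
    if (storedVersion < CURRENT_VERSION) {
        await users.replaceOne({ _id: id }, doc);
    }
    return doc;
}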

Related

Meteor - using synchronised non-persistent / in-memory MongoDB on the server

In a Meteor app, having real-time reactive updates between all connected clients is achieved by writing to collections and publishing and subscribing to the right data. In the normal case this also means database writes.
But what if I would like to sync particular data which does not need to be persistent, and I would like to save the overhead of writing to the database? Is it possible to use mini-mongo or other in-memory caching on the server while still preserving DDP synchronisation to all clients?
Example
In my app I have multiple collapsed threads and I want to show which users have currently expanded a particular thread:
Viewed by: Mike, Johny, Steven ...
I can store the information in the threads collection or make a separate viewers collection and publish the information to the clients. But there is actually no point in making this information persistent and having the overhead of database writes.
I am confused by the collections documentation, which states:
OPTIONS
connection Object
The server connection that will manage this collection. Uses the default connection if not specified. Pass the return value of calling DDP.connect to specify a different server. Pass null to specify no connection.
and
... when you pass a name, here’s what happens:
...
On the client (and on the server if you specify a connection), a Minimongo instance is created.
But if I create a new collection and pass the options object with connection: null
// Creates a new Mongo collection and exports it
export const Presentations = new Mongo.Collection('presentations', { connection: null });

/**
 * Publications
 */
if (Meteor.isServer) {
    // This code only runs on the server
    Meteor.publish(PRESENTATION_BY_MAP_ID, (mapId) => {
        check(mapId, nonEmptyString);
        return Presentations.find({ matchingMapId: mapId });
    });
}
no data is being published to the clients.
TLDR: it's not possible.
There is no magic in Meteor that allows data to be synced between clients while the data doesn't transit through the MongoDB database. The whole sync process through publications and subscriptions is triggered by MongoDB writes. Hence, if you don't write to the database, you cannot sync data between clients (using the native pub/sub system available in Meteor).
After countless hours of trying everything possible, I found a way to do what I wanted:
export const Presentations = new Mongo.Collection('presentations', Meteor.isServer ? {connection: null} : {});
I checked MongoDB and no presentations collection is being created. Also, on every server restart the collection is empty. There is a small downside on the client: even though collectionHandle.ready() is truthy, findOne() first returns undefined and the data is synced afterwards.
I don't know if this is the right/preferable way, but it was the only one working for me so far. I tried to leave {connection: null} in the client code, but wasn't able to achieve any sync even though I implemented the added/changed/removed methods.
Sadly, I wasn't able to get any further help, even in the Meteor forum here and here.
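For reference, the low-level added/changed/removed publish API mentioned above looks roughly like this (the 'viewers' collection name and its fields are made up for the thread-viewers example):

// Server: manage the published record set by hand, with no MongoDB writes.
Meteor.publish('threadViewers', function (threadId) {
    check(threadId, String);
    // Tell subscribers a document exists in the client-side 'viewers' collection.
    this.added('viewers', threadId, { names: ['Mike', 'Johny', 'Steven'] });
    this.ready();
    // Push later updates with this.changed('viewers', threadId, fields)
    // and this.removed('viewers', threadId).
});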

Change MongoDB Collection from local to server-side on running Meteor App

According to the Meteor Docs there are 'server-side', 'client-side' and 'local' collections. Is there a way to change the 'status' (i.e. whether it's server-side, client-side or local) in a running app?
Use Case: A web application where users can register and log in. They can store sensitive data. Depending on the user's personal preference, they should be able to choose whether that data is stored locally or on the server (a general decision, not from case to case).
Current Approach: It's working fine if I either instantiate the collection locally, CollectionName = new Mongo.Collection(null);, or server-side, CollectionName = new Mongo.Collection('collectionName');.
But I can't think of an approach that would make it possible for the user to change the collection's status.
Is there a way to do this?
Or is a workaround needed (e.g. create both a local and a server-side collection, and just decide which one to use for insert/update/find), which would mean a lot of duplicate code?!
Edit: To make things clear: I want the user to be able to choose whether his data is stored in a collection which is synced with the server or in a collection without any syncing.
No, you can't change the type of a collection on a running app.
I think you are confused about what these terms mean. "Client-side" collections aren't permanently stored in localstorage. It just means it's a collection that's in the browser's memory. Just as "server-side" collections are those that reside in the server's memory. The difference is not how it's defined, but where the code runs. Most collections have a client-side and a server-side counterpart, and they are kept synchronized via pub/sub. Server-side collections are also synchronized with MongoDB (using the oplog).
Local collections can live in both places, but "local" means they aren't synchronized with anything.
I probably don't fully understand what you are trying to do, but local collections do not persist data.
If you pass null as the name, then you’re creating a local collection. It’s not synchronized anywhere; it’s just a local scratchpad that supports Mongo-style find, insert, update, and remove operations. (On both the client and the server, this scratchpad is implemented using Minimongo.)
This means any data added to them on the client will be blown away when the user closes their browser (unless you are also using one of the local collection persist meteor packages) and any data added to them on the server will be blown away when the meteor app is restarted. So I don't think you really want to use local collections.
Instead, I would use a regular collection (where a name is passed to the constructor) and either the standard allow or deny options (not really recommended anymore...but still a valid approach) or Meteor methods (the preferred approach) to control who can change data and what data is allowed to change.
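A minimal sketch of the methods approach (the method name and field whitelist are made up):

// Server (and optionally client, for latency compensation).
Meteor.methods({
    'userData.update'(fields) {
        check(fields, Object);
        if (!this.userId) {
            throw new Meteor.Error('not-authorized');
        }
        // Whitelist exactly what the user may change.
        UserData.update(
            { user_id: this.userId },
            { $set: { firstname: fields.firstname, lastname: fields.lastname } }
        );
    },
});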
Or, another option could be to pass your publication function a list of fields that the user wishes to see on the client for that given session. To do this, you define a new publication that receives a displayFields argument that you then use as the field specifier option in your collection's .find().
Meteor.publish("userData", function (userId, displayFields) {
// validate the structure and contents of displayFields
// retrieve the data but only use the fields that the user requested
return UserData.find({user_id: userId}, {fields: displayFields});
});
Then on the client side you would subscribe to this and pass in the fields the user wishes to make visible on the client.
var displayFields = {
    firstname: 1,
    // list each field the user wants to see with 1; MongoDB projections
    // cannot mix inclusions and exclusions (except for _id)
    //...
};
this.subscribe("userData", Meteor.userId(), displayFields);

Is meteor using the Mongo Oplog?

How can I check if meteor is using the oplog of my mongo?
I have a Mongo cluster and I set two env variables for my Meteor app.
MONGO_URL=mongodb://mongo/app?replicaSet=rs0
MONGO_OPLOG_URL=mongodb://mongo/local?authSource=app
How can I check if the oplog is actually in use? Meteor can fall back to query polling, which is very inefficient, and I would like to see if it's working properly with the oplog.
Any ideas?
Quoting the relevant bits from Meteor's OplogObserveDriver docs:
How to tell if your queries are using OplogObserveDriver
For now, we only have a crude way to tell how many observeChanges calls are using OplogObserveDriver, and not which calls they are.
This uses the facts package, an internal Meteor package that exposes real-time metrics for the current Meteor server. In your app, run meteor add facts, and add the {{> serverFacts}} template to your app. If you are using the autopublish package, Meteor will automatically publish all metrics to all users. If you are not using autopublish, you will have to tell Meteor which users can see your metrics by calling Facts.setUserIdFilter in server code; for example:
Facts.setUserIdFilter(function (userId) {
    var user = Meteor.users.findOne(userId);
    return user && user.admin;
});
(When running your app locally, Facts.setUserIdFilter(function () { return true; }); may be good enough!)
Now look at your app. The facts template will render a variety of metrics; the ones we're looking for are observe-drivers-oplog and observe-drivers-polling in the mongo-livedata section. If observe-drivers-polling is zero or not rendered at all, then all of your observeChanges calls are using OplogObserveDriver!
To set up oplog tailing, you need to set up a user on my_database, and an oplog_user on local. Then, specify the following URIs to connect to your replica set named test-shard (e.g. if there are 3 hosts named test-shard-[0-2]):
MONGO_URL="mongodb://user:PASS#test-shard-0.mongodb.net:27017,test-shard-1.mongodb.net:27017,test-shard-2.mongodb.net:27017/my_database?ssl=true&replicaSet=test-shard&authSource=admin"
MONGO_OPLOG_URL="mongodb://oplog_user:PASS#test-shard-0.mongodb.net:27017,test-shard-1.mongodb.net:27017,test-shard-2.mongodb.net:27017/local?ssl=true&replicaSet=test-shard&authSource=admin"
On MongoDB Atlas they require ssl=true, and also all users authenticate through the admin database. On another deployment you might just authenticate through my_database, in which case you'd remove authSource=admin from MONGO_URL and write authSource=my_database for MONGO_OPLOG_URL. See this post for another example.
With MongoDB 3.6 and the Mongo node driver 3.0+, you may be able to use a succinct notation for DNS seedlist connections, e.g. on MongoDB Atlas, to specify the environment variables:
MONGO_URL="mongodb+srv://user:PASS#foo.mongodb.net/my_database"
MONGO_OPLOG_URL="mongodb+srv://oplog_user:PASS#foo.mongodb.net/local"
The link above explains how this notation fills in the ssl, replicaSet, and authSource arguments. This is a lot nicer than the long strings above, and also means you can scale your replica set up and down without needing to reconfigure anything.
As hwillson mentioned, use the facts-ui and facts-base packages (formerly facts) to see if there are any oplogObserveDrivers running in your app. If they are all pollingObserveDriver, then oplog is not set up correctly.
If you are using Kadira APM to monitor your app's performance, you can see if oplogs are working by navigating to the "Live Queries" section and having a look at the "Oplog notifications" chart.
You can see in my screenshot that oplogs are working, as values appear in the chart (bottom right). If oplogs weren't working then this chart would be empty.
This may be very late, but this is the only way that worked for me:
someCollection._driver.mongo._oplogHandle
If this is set to null then the oplog is not enabled; otherwise, you can use this handle to check for more details.
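A minimal server-side sketch of that check (note that _driver and _oplogHandle are undocumented internals and may change between Meteor releases):

import { Meteor } from 'meteor/meteor';
import { Mongo } from 'meteor/mongo';

const Items = new Mongo.Collection('items'); // any regular collection works

Meteor.startup(() => {
    const handle = Items._driver.mongo._oplogHandle;
    console.log('Oplog tailing enabled:', !!handle);
});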

meteor: use different database for each user

I currently assign a MongoDB to my Meteor app using the env variable
"MONGO_URL": "mongodb://localhost:27017/dbName" when I start the Meteor instance.
So all data gets written to the Mongo database with the name "dbName".
I am looking for a way to individually set the dbName for each customer upon login, in order to separate their data into different databases.
This is generally unsupported, as the database is defined at startup. However, this thread offers a possible solution:
https://forums.meteor.com/t/switch-database-while-meteor-is-running/4361/6
var database = new MongoInternals.RemoteCollectionDriver("<mongo url>");
MyCollection = new Mongo.Collection("collection_name", { _driver: database });
This would allow you to define the database name in the mongo url but would require a fair bit of extra work to redefine your collections on a customer by customer basis.
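A sketch of that idea with one cached driver per customer database (the URL template and helper function are hypothetical):

import { Mongo } from 'meteor/mongo';
import { MongoInternals } from 'meteor/mongo';

// Hypothetical helper: lazily create and cache one driver per customer DB,
// and one collection object per (database, collection) pair.
const drivers = {};
const collections = {};

function collectionForCustomer(dbName, collectionName) {
    const key = dbName + '/' + collectionName;
    if (!collections[key]) {
        if (!drivers[dbName]) {
            drivers[dbName] = new MongoInternals.RemoteCollectionDriver(
                'mongodb://localhost:27017/' + dbName // placeholder URL template
            );
        }
        collections[key] = new Mongo.Collection(collectionName, { _driver: drivers[dbName] });
    }
    return collections[key];
}

// e.g. const Orders = collectionForCustomer('customerA', 'orders');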
Here's another approach that will make your life eternally easier:
Create a generic site with no accounts at mysite.com
When they login at mysite.com, figure out what site they actually belong to and redirect them to customerName.mysite.com and log them in there
Run a separate instance of Meteor configured for a different mongo at each site
nginx might help you with the above.
It is generally good practice to run separate DBs when offering a B2B solution.
That's a matter of opinion that depends heavily on the platform. Many SaaS providers would argue that point.

How DSpace process a query in jspui?

How is any query processed in DSpace, and how is data managed between the front end and PostgreSQL?
As in every other webapp running in a servlet container like Tomcat, the file WEB-INF/web.xml controls how a query is processed. In the case of DSpace's JSPUI you'll find this file in [dspace-install]/webapps/jspui/WEB-INF/web.xml. The JSPUI defines several filters, listeners and servlets to process a request.
The filters are used to report that the JSPUI is running, to ensure that restricted areas can be seen by authenticated users (or even by authenticated administrators only), and to handle Content Negotiation.
The listeners ensure that DSpace has started correctly. During its start DSpace loads the configuration, opens the database connections that it uses in a connection pool, lets Spring do its IoC magic and so on.
To begin with, the most important parts for seeing how a query is processed are the servlets and the servlet-mappings. A servlet-mapping defines which servlet is used to process a request with a specific request path: e.g. all requests to example.com/dspace-jspui/handle/* will be processed by org.dspace.app.webui.servlet.HandleServlet, while all requests to example.com/dspace-jspui/submit will be processed by org.dspace.app.webui.servlet.SubmissionController.
The servlets use their Java code ;-) and the DSpace Java API to process the request. You'll find most of it in the dspace-api module (see [dspace-source]/dspace-api/src/main/java/...) and some smaller parts in the dspace-services module ([dspace-source]/dspace-services/src/main/java/...). Within the DSpace Java API there are two important classes if you're interested in the communication with the database:
One is org.dspace.core.Context. The context contains information about whether and which user is logged in, an initialized and connected database connection (if all went well) and a cache. The methods Context.abort(), Context.commit() and Context.complete() are used to manage the database transaction. That is the reason why almost all methods manipulating the database request a Context as a method parameter: it controls the database connection and the database transaction.
The other one is org.dspace.storage.rdbms.DatabaseManager. The DatabaseManager is used to handle database queries, updates, deletes and so on. All DSpaceObjects contain a TableRow object which holds the information about the object stored in the database. Inside the DSpaceObject classes (e.g. org.dspace.content.Item, org.dspace.content.Collection, ...) the TableRow may be manipulated and the changes stored back to the database by using DatabaseManager.update(Context, DSpaceObject). The DatabaseManager provides several methods to send SQL queries to the database and to update, delete, insert or even create data in the database. Just take a look at its API or search for "SELECT" in the DSpace source to get an example.
In the JSPUI it is important to use Context.commit() if you want to commit the database state. If a request is processed and Context.commit() was not called, then the transaction will be aborted and the changes get lost. If you call Context.complete(), the transaction will be committed, the database connection will be freed and the context is marked as finished. After you have called Context.complete(), the context cannot be used for a database connection any more.
DSpace is quite a huge project and a lot more could be written about its ORM, the initialization of the database and so on. But this should already help you to start developing for DSpace. I would recommend that you read the "Architecture" part of the DSpace manual: https://wiki.duraspace.org/display/DSDOC5x/Architecture
If you have more specific questions you are always invited to ask them here on Stack Overflow or on our mailing lists (http://sourceforge.net/p/dspace/mailman/): dspace-tech (for any question about DSpace) and dspace-devel (for questions regarding the development of DSpace).
It depends on the version of DSpace you are running, along with your configuration.
In DSpace 4.0 or above, by default, the DSpace JSPUI uses Apache Solr for all searching and browsing. DSpace performs all indexing and querying of Solr via its Discovery module. The Discovery (Solr) based search/indexing classes are available under the "org.dspace.discovery" package.
In earlier versions of DSpace (3.x or below), by default, the DSpace JSPUI uses Apache Lucene directly. In these older versions, DSpace called Lucene directly for all indexing and searching. The Lucene based search/indexing classes are available under the "org.dspace.search" package.
In both situations, queries are passed directly to either Solr or Lucene (again depending on the version of DSpace). The results are parsed and displayed within the DSpace UI.