Import "normal" MongoDB collections into DerbyJS 0.6 - mongodb

Same situation as in this question, but with the current DerbyJS (version 0.6):
Using imported docs from MongoDB in DerbyJS
I have a MongoDB collection with data that was not saved through my
Derby app. I want to query against that and pull it into my Derby app.
Is this still possible?
The accepted answer there points to a dead link. The newest working link appears to be this: https://github.com/derbyjs/racer/blob/0.3/lib/descriptor/query/README.md
That link refers to the 0.3 branch of Racer (the current master version is 0.6).
What I tried
Searching the internets
The naïve way:
var query = model.query('projects-legacy', { public: true });
model.fetch(query, function() {
  query.ref('_page.projects');
});
(doesn't work)

A utility was written for this purpose: https://github.com/share/igor
You may need to modify it to only run against a single collection instead of the whole database, but it essentially goes through every document in the database and modifies it with the necessary livedb metadata and creates a default operation for it as well.
In livedb, every collection has a corresponding operations collection; for example, profiles will have a profiles_ops collection which holds all the operations for the profiles collection.
You will have to convert the collection to use it with Racer/livedb because of the metadata livedb expects on each document.
An alternative, if you don't want to convert, is to use traditional AJAX/REST to get the data from your MongoDB database and then just put it in your local model (see the sketch below). This will not be real-time or synced to the server, but it will allow you to drive your templates from data that you don't want to convert for some reason.
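A minimal sketch of that AJAX route (names are made up; expressApp is the Express app the Derby server already uses, and db is an open node-mongodb-native handle):

// Server side: a plain Express route next to the Derby app.
expressApp.get('/api/legacy-projects', function(req, res) {
  db.collection('projects-legacy').find({ public: true }).toArray(function(err, docs) {
    if (err) { res.statusCode = 500; return res.end(); }
    res.json(docs);
  });
});

// Client side, e.g. in a Derby controller: fetch the JSON and put it in the local model.
var xhr = new XMLHttpRequest();
xhr.open('GET', '/api/legacy-projects');
xhr.onload = function() {
  // Local-only data: not real-time and not synced back to the server.
  model.set('_page.projects', JSON.parse(xhr.responseText));
};
xhr.send();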

Related

Meteor - using snychronised non-persistent / in-memory MongoDB on the server

In a Meteor app, real-time reactive updates between all connected clients are achieved by writing to collections and publishing/subscribing to the right data. Normally this also means database writes.
But what if I want to sync particular data that does not need to be persistent, and I would like to avoid the overhead of writing to the database? Is it possible to use Minimongo or another in-memory cache on the server while still preserving DDP synchronisation to all clients?
Example
In my app I have multiple collapsed threads and I want to show which users have currently expanded a particular thread:
Viewed by: Mike, Johny, Steven ...
I can store the information in the threads collection or make a separate viewers collection and publish the information to the clients. But there is actually no point in making this information persistent and incurring the overhead of database writes.
I am confused by the collections documentation, which states:
OPTIONS
connection Object
The server connection that will manage this collection. Uses the default connection if not specified. Pass the return value of calling DDP.connect to specify a different server. Pass null to specify no connection.
and
... when you pass a name, here’s what happens:
...
On the client (and on the server if you specify a connection), a Minimongo instance is created.
But if I create a new collection and pass the options object with connection: null,
// Creates a new Mongo collection and exports it
export const Presentations = new Mongo.Collection('presentations', { connection: null });

/**
 * Publications
 */
if (Meteor.isServer) {
  // This code only runs on the server
  Meteor.publish(PRESENTATION_BY_MAP_ID, (mapId) => {
    check(mapId, nonEmptyString);
    return Presentations.find({ matchingMapId: mapId });
  });
}
no data is being published to the clients.
TLDR: it's not possible.
There is no magic in Meteor that allows data to be synced between clients without the data passing through the MongoDB database. The whole sync process through publications and subscriptions is triggered by MongoDB writes. Hence, if you don't write to the database, you cannot sync data between clients (using the native pub/sub system available in Meteor).
After countless hours of trying everything possible I found a way to what I wanted:
export const Presentations = new Mongo.Collection('presentations', Meteor.isServer ? {connection: null} : {});
I checked MongoDB and no presentations collection is being created. Also, on every server restart the collection is empty. There is a small downside on the client: even when collectionHandle.ready() is truthy, findOne() first returns undefined and the data is synced afterwards.
I don't know if this is the right/preferable way, but it was the only one that worked for me so far. I tried to leave {connection: null} in the client code, but wasn't able to achieve any sync even though I implemented the added/changed/removed methods.
Sadly, I wasn't able to get any further help, even in the Meteor forums here and here.
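For reference, the added/changed/removed approach mentioned above would look roughly like this (an untested sketch that reuses the Presentations collection and publication name from earlier and relies on Meteor's low-level publish API):

if (Meteor.isServer) {
  Meteor.publish(PRESENTATION_BY_MAP_ID, function (mapId) {
    check(mapId, String);
    var self = this;
    // Replay the current in-memory documents, then push further changes by hand.
    var handle = Presentations.find({ matchingMapId: mapId }).observeChanges({
      added: function (id, fields) { self.added('presentations', id, fields); },
      changed: function (id, fields) { self.changed('presentations', id, fields); },
      removed: function (id) { self.removed('presentations', id); }
    });
    self.ready();
    self.onStop(function () { handle.stop(); });
  });
}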

Atomically query for all collection documents + watching for further changes

Our Java app saves its configurations in a MongoDB collection. When the app starts, it reads all the configurations from MongoDB and caches them in Maps. We would like to use the change stream API to also be able to watch for updates to the configurations collection.
So, upon app startup, first we would like to get all configurations, and from now on - watch for any further change.
Is there an easy way to execute the following atomically:
1. A find() that retrieves all configurations (documents)
2. Start a watch() that will send all further updates
By atomically I mean: without potentially missing any update (between 1 and 2 someone could update the collection with a new configuration).
To make sure I lose no update notifications, I found that I can use watch().startAtOperationTime(serverTime) (available in MongoDB 4.0 and later), as follows:
1. Query the MongoDB server for its current time, using a command such as: Document hostInfoDoc = mongoTemplate.executeCommand(new Document("hostInfo", 1))
2. Query for all interesting documents: List<C> configList = mongoTemplate.findAll(clazz);
3. Extract the server time from hostInfoDoc: BsonTimestamp serverTime = (BsonTimestamp) hostInfoDoc.get("operationTime");
4. Start the change stream configured with the saved server time: ChangeStreamIterable<Document> changes = eventCollection.watch().startAtOperationTime(serverTime);
Since 1 ends before 2 starts, we know that the documents returned by 2 are at least as fresh as the data at that server time. And any updates that happened at or after this server time will be sent to us by the change stream. (I don't mind processing redundant updates, because I use a map as a cache, so an extra add/remove won't make a difference as long as the last action arrives.)
I think I could also use watch().resumeAfter(_idOfLastAddedDoc) (I didn't try it). I did not use this approach because of the following scenario: the collection is empty, and the first document is added after getting all (none) documents and before starting the watch(). In that scenario I don't have a previous document's _id to use as a resume token.
Update
Instead of using "hostInfo" for getting the server time, which couldn't be used in our production, I ended using "dbStats" like that:
Document dbStats= mongoOperations.executeCommand(new Document("dbStats", 1));
BsonTimestamp serverTime = (BsonTimestamp) dbStats.get("operationTime");
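For comparison, the same ordering trick can be sketched with the Node.js driver (connection string, database, and collection names below are placeholders; a replica set and driver 3.1+ are assumed, since operationTime and startAtOperationTime require them):

const { MongoClient } = require('mongodb');

async function loadConfigsAndWatch() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const db = client.db('mydb');                     // placeholder database name
  const configs = db.collection('configurations');  // placeholder collection name

  // 1. Any command response on a replica set carries the server's operationTime.
  const stats = await db.command({ dbStats: 1 });
  const serverTime = stats.operationTime;           // BSON Timestamp

  // 2. Load the current documents into the in-memory cache.
  const all = await configs.find({}).toArray();

  // 3. Replay everything that happened at or after the saved server time.
  const changeStream = configs.watch([], { startAtOperationTime: serverTime });
  changeStream.on('change', change => {
    // update the cache; duplicates are harmless because the cache is a map
  });

  return all;
}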

Accessing files in Mongodb

I am using the sacred package in Python, which lets me keep track of the computational experiments I'm running. sacred allows adding an observer (MongoDB) which stores all sorts of information about the experiment (configuration, source files, etc.).
sacred also allows adding artifacts to the DB by using sacred.Experiment.add_artifact(PATH_TO_FILE).
This command essentially adds the file to the DB.
I'm using MongoDB Compass; I can access the experiment information and see that an artifact has been added. It contains two fields:
'name' and 'file_id', which contains an ObjectId.
I am attempting to access the stored file itself. I have noticed that under my DB there is an additional collection called fs.files; in it I can filter to find my ObjectId, but it does not seem to let me access the content of the file itself.
Code example for GridFS (import gridfs, pymongo)
If you already have the ObjectId:
artifact = gridfs.GridFS(pymongo.MongoClient().sacred).get(objectid)
The returned object is file-like, so artifact.read() gives you the file's contents.
To find the ObjectId for an artifact named filename, with result as one entry of db.runs.find():
objectid = next(a['file_id'] for a in result['artifacts'] if a['name'] == filename)
MongoDB file storage is handled by "GridFS", which basically splits files into chunks and stores them in dedicated collections (the metadata in fs.files, the chunk data in fs.chunks).
Tutorial to access: http://api.mongodb.com/python/current/examples/gridfs.html
I wrote a small library called incense to access data stored in MongoDB via sacred. It is available on GitHub at https://github.com/JarnoRFB/incense and via pip. With it you can load experiments as Python objects. The artifacts will be available as objects that you can save to disk or display in a Jupyter notebook:
from incense import ExperimentLoader
loader = ExperimentLoader(db_name="my_db")
exp = loader.find_by_id(1)
print(exp.artifacts)
exp.artifacts["my_artifact"].save() # Save artifact on disk.
exp.artifacts["my_artifact"].render() # Display artifact in notebook.

How to check if a collection has changed?

I've created a JSON API with Express.js, Mongoose and MongoDB. Currently, there's no way for the clients of the API to check if the data in a collection has changed - they would need to download the whole collection periodically.
How could I allow the clients of the API to check for changes to a collection (inserts, updates, deletions) without downloading the collection itself?
Is there a way of getting the version number of the collection, the last change timestamp or a hash of the collection with Mongoose? What is the best practice solution to this problem?
In MongoDB versions prior to 3.6, you have to do this on the application side, for example by polling; a rough sketch follows.
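One simple polling scheme (just a sketch, assuming timestamps: true on the Mongoose schema; the Item model and route path are made-up names) is to expose the newest updatedAt so clients can compare it with what they saw last:

const express = require('express');
const mongoose = require('mongoose');

// Hypothetical model; `timestamps: true` maintains createdAt/updatedAt automatically.
const Item = mongoose.model('Item', new mongoose.Schema({ name: String }, { timestamps: true }));

const app = express();
app.get('/items/last-modified', async (req, res) => {
  // Newest updatedAt across the collection; clients poll this cheap endpoint.
  const latest = await Item.findOne().sort({ updatedAt: -1 }).select('updatedAt').lean();
  res.json({ lastModified: latest ? latest.updatedAt : null });
});

Note that this detects inserts and updates but not deletions; tracking deletions would need soft deletes or a separate log collection.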
MongoDB 3.6 has a brand new feature called change streams that allows you to listen to changes happening on your collections in real time.
The sample code to listen selected changes happening on your collection is below:
var MongoClient = require('mongodb').MongoClient,
    assert = require('assert');

MongoClient.connect("mongodb://172.16.0.110:27017/myproject?readConcern=majority").then(function(client) {
  var db = client.db('myproject');
  var changeStreams = db.collection('documents').watch();
  changeStreams.on('change', function(change) {
    console.log(change);
  });
});
If you are using Node.js, you need the following module version to get it working:
"dependencies": {
  "mongodb": "mongodb/node-mongodb-native#3.0.0"
}

Comparing geospatial paths with MongoDB

I'm working on a mobile app that tracks a user's location at regular intervals to allow him to plot the path of a journey on a map. We'd like to add an optional feature that will tell him which other users of the app have made similar journeys in the timeframe he's looking at, be it today's commute or the last month of travel. We're referring to this as "path-matching".
The data is currently logged into files within the app's private storage directories on iOS and Android in a binary format that is easily and quickly scanned through to read locations. Each file contains the locations for one day, and generally runs to about 80KB.
To be able to implement the path-matching feature, we'll obviously need to start uploading these location logs to our server (with the user's permission, of course), on which we're running PHP. Someone suggested MongoDB for its geospatial prowess, but I've a few questions that maybe folks could help me with:
It seems like we could change our location-logging to use BSON instead. The first field would be a device or user ID, followed by a list of locations for a particular day. The file could then be uploaded to our server and pushed into the MongoDB store. The online documentation, however, only seems to refer to importing BSON files created by mongodump. Is the format stable enough that any app could write BSON files readable directly by MongoDB?
Is MongoDB able to run geospatial queries on documents containing multiple locations, or on locations forming a path across multiple documents? Or does this strike you as something that would require excessive logic outside the database, on the PHP side?
The format is totally stable, but there isn't much tooling to do what you describe. Generally, you'd upload it to the backend and it would end up in, say, $_POST['locations'] as an array of associative arrays. Sanitize it and just save it to the database, something like:
$locs = sanitize($_POST['locations']);
$doc = array('path' => array('type' => 'LineString', 'coordinates' => $locs), 'user' => $userId);
$collection->insert($doc);
In the above example, I'm using some of the latest geo stuff (http://docs.mongodb.org/manual/release-notes/2.4/#new-geospatial-indexes-with-geojson-and-improved-spherical-geometry), you'll need a nightly build to get this but it should be in the stable build in about a month. If you need it before then, you can use the older geo API: http://docs.mongodb.org/manual/core/geospatial-indexes/.
MongoDB doesn't read BSON files, but you could use mongorestore to manually load them. I would highly recommend letting the driver do the low-level stuff for you, though!
You can have a document containing a line (in the new geo stuff) and an array of points (in the old geo stuff). I'm not sure what you mean by "a path across multiple documents."
Edited to add: based on your comment, you might want to try {path : {$near : {$geometry : userPath}}} to find "nearby" paths. You could also try making a polygon around the user's path and querying for docs $geoWithin the polygon (the operator was called $within before 2.4); a rough shell sketch follows.
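For illustration only (the collection name and coordinates are placeholders; this uses the 2.4+ GeoJSON operators), the polygon approach could look like this in the mongo shell:

// A 2dsphere index is required for GeoJSON queries on the stored path.
db.journeys.ensureIndex({ path: "2dsphere" });

// Find journeys whose LineString lies entirely within a polygon drawn
// around the user's own path (coordinates here are made up).
db.journeys.find({
  path: {
    $geoWithin: {
      $geometry: {
        type: "Polygon",
        coordinates: [[
          [-0.15, 51.49], [-0.05, 51.49], [-0.05, 51.53], [-0.15, 51.53], [-0.15, 51.49]
        ]]
      }
    }
  }
});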