Queries in MongoDB GridFS - Metadata? Further Possibilities?

I started using MongoDB with GridFS some time ago because I need to store documents that are bigger than 16 MB. Saving and loading documents has worked fine so far.
Next, I wondered how to query the files stored in GridFS. Let's assume I have instances of a class that looks as follows:
public class Test {
    private String id;
    private int test;
}
If I want to query for a file which has a certain value for "test", how can I do that in GridFS? I know that I can save my file and store additional metadata.
Hence, I could store the test value in the metadata of the file.
To retrieve the value test of a file, I would do something like this:
String id = "...";
GridFS fs = new GridFS(mongoDB, "TESTFS");
BasicDBObject dbobj = new BasicDBObject();
dbobj.put("filename", id);
GridFSDBFile fsFile = fs.findOne(dbobj);
BasicDBObject metadata = (BasicDBObject) fsFile.get("metadata");
System.out.println("Test: " + metadata.get("test"));
Using metadata, is there an easier way to extract the "test" value of a certain file (without loading the complete file, deserializing the JSON string, etc.)?
The disadvantage of this approach is that I have to store the metadata explicitly. If I want to query for other information, I need to introduce this into the metadata for all my data. Is this right?
Is there an alternative to storing such information in the metadata? Or how can I query for specific information in GridFS?
This is obviously a very simple example. The same questions arise when trying to perform more complex queries.

As @wdberkeley noted, you can query db['fs'].files directly instead of fetching the whole file each time (which is indeed not the best idea). Use find on that collection, with the query written as you would in the MongoDB console (translate it to your driver's language):
db['fs'].files.find({'metadata.test': '#### - the value you want to query for - $$$'})
The cursor that is returned in the MongoDB console and via most drivers is going to allow you to read metadata without ever retrieving the contents of the file.
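To make the 'metadata.test' dot notation concrete, here is a small pure-Python sketch that runs without a database. The sample documents and the get_path helper are made up for illustration; they only mimic the shape of entries in fs.files and are not part of any driver.

```python
def get_path(doc, dotted):
    """Resolve a dotted path such as 'metadata.test'; return None if any part is missing."""
    current = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical documents shaped like entries in fs.files.
# The file contents are not here at all: they live in fs.chunks.
files = [
    {"filename": "a", "length": 20_000_000, "metadata": {"test": 1}},
    {"filename": "b", "length": 18_500_000, "metadata": {"test": 2}},
    {"filename": "c", "length": 30_000_000},  # stored without metadata
]

# Same idea as find({'metadata.test': 2}), evaluated in memory.
matches = [f["filename"] for f in files if get_path(f, "metadata.test") == 2]
print(matches)
```

Only the small descriptor documents are inspected, which is why a query of this kind never has to touch the file contents.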

Related

How to filter files in MongoDB

I am running MongoDB version 4.10.
How do I get the count of files, filtered by type?
I have .doc, .pdf and .csv files in the file system in MongoDB.
How do I check the count of files of each format (.csv, .pdf, .doc)?
Please let me know.
I am assuming you don't mean references, but files that are in GridFS. In this case, the way it works is that Mongo simply puts them in 2 collections, one for files meta data, and another collection for chunks of files, that is to say the data of your files that is sliced in pieces. They are regular collections that you can query like any other collection. Their names are "fs.files" and "fs.chunks".
The one you want is "fs.files". For retrieving the files by type, you can use the field contentType. Here is how you get the number of PDFs:
db.fs.files.find({contentType: "application/pdf"}).count()
// or if your question is only for counting
db.fs.files.count({contentType: "application/pdf"})
Like I said, just like any other collection.
EDIT:
var pdfCount = db.fs.files.find({contentType: "application/pdf"}).count();
var csvCount = db.fs.files.find({contentType: "text/csv"}).count();
var docCount = db.fs.files.find({contentType: "application/msword"}).count();
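If you want all the counts in one pass, the logic is a simple group-and-count. Below is a pure-Python sketch of that idea over made-up documents shaped like fs.files entries (no driver involved). It also shows a fallback that groups by file extension, assuming contentType may not have been set when a file was uploaded.

```python
import os
from collections import Counter

# Hypothetical fs.files-style documents; names and types are invented.
files = [
    {"filename": "report.pdf", "contentType": "application/pdf"},
    {"filename": "data.csv", "contentType": "text/csv"},
    {"filename": "notes.doc", "contentType": "application/msword"},
    {"filename": "summary.PDF"},  # contentType was never set on upload
]

# Group by contentType, bucketing files that have none under "unknown".
by_type = Counter(f.get("contentType", "unknown") for f in files)

# Fallback: group by lower-cased file extension taken from the filename.
by_ext = Counter(os.path.splitext(f["filename"])[1].lower() for f in files)

print(dict(by_type))
print(dict(by_ext))
```

Counting by extension only works if the stored filenames actually carry an extension, so treat it as a fallback rather than a replacement for contentType.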

Flow Router doesn't work with ObjectID. Any fix?

I'm trying to build routes in my Meteor app. Routing works perfectly fine but getting information from db with route path just doesn't work. I create my page specific routes with this:
FlowRouter.route('/level/:id'...
This route takes me to related template without a problem. Then I want to get some data from database that belong to that page. In my template helpers I get my page's id with this:
var id = FlowRouter.getParam('id');
This gets the ObjectID() but in string format. So I try to find that ObjectID() document in the collection with this:
Levels.findOne({_id: id});
But of course the documents don't have ObjectIDs in string format (otherwise we wouldn't call it an "object" id). Hence, the lookup returns undefined. I don't want to deal with creating my own _ids, so is there anything I can do about this?
PS: Mongo used to create _ids as plain text, something like what I would get with _id._str now, but all of a sudden it generates ObjectID(). I don't know why. Any ideas?
MongoDB uses ObjectIds as _ids by default, while Meteor explicitly sets string ids by default.
Perhaps you inserted using a meteor shell session in the past and now used a mongo shell/GUI or a meteor mongo prompt to do so, which resulted in ObjectIds being created.
If this happens in a development environment, you could generate the data again.
Otherwise, you could try to generate new _ids for your data using Meteor.uuid().
If you want to use ObjectId as the default for a certain collection, you can specify the idGeneration option to its constructor as 'MONGO'.
If you have the string content of an ObjectId and want to convert it, you can issue
let _id = new Mongo.ObjectID(my24HexCharString);
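Since a route parameter always arrives as a plain string, it can help to check that it really looks like an ObjectId before converting it. The check itself is language-independent, so here is a short Python sketch of it. The function names are made up for illustration; the only facts relied on are that an ObjectId is 12 bytes (24 hex characters) and that its first 4 bytes are a big-endian Unix timestamp in seconds.

```python
import re
from datetime import datetime, timezone

_HEX24 = re.compile(r"[0-9a-fA-F]{24}")


def parse_object_id_hex(text):
    """Return the 12 raw bytes for a 24-character hex string, or None if malformed."""
    if not isinstance(text, str) or not _HEX24.fullmatch(text):
        return None
    return bytes.fromhex(text)


def creation_time(raw):
    """Decode the leading 4-byte timestamp of an ObjectId as a UTC datetime."""
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


raw = parse_object_id_hex("507f1f77bcf86cd799439011")
print(len(raw), creation_time(raw).year)
print(parse_object_id_hex("not-an-object-id"))
```

A string that fails this check cannot be a valid ObjectId, so the lookup can be skipped instead of passing the value on to the collection.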

Configure pymongo to use string _id instead of ObjectId

I'm using pymongo to seed a database with old information from a different system, and I have a lot of queries like this:
studentId = studentsRemote.insert({'price': price})
In the actual python script, that studentId prints as a string, but in the javascript Meteor application I'm using this data in, it shows up everywhere as ObjectId(...).
I want to configure pymongo to generate the _id as a string and not bother with ObjectId's
Any objects I create with the Meteor specification will use the string format, and not the ObjectId format. I don't want to have mixing of id types in my application, because it's causing me interoperability headaches.
I'm aware I can create ObjectId's from Meteor but frankly I'd much rather use the string format. It's the Meteor default, it's much simpler, and I can't find any good reason to use ObjectId's in my particular app.
The valueOf() mongo function or something similar could parse the _id and be used to update the document once it's in the database, but it would be nice to have something more direct.
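In your .py files, generate the _id yourself as a string before inserting:
from bson.objectid import ObjectId
......
kvdict['_id'] = str(ObjectId())
......
mongoCollection.insert(kvdict)
That works, because the driver only generates an ObjectId when the document has no _id.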
It ended up being fairly simple.
The son_manipulator module can be used to change incoming documents to a different form. Most of the time this is used to encode custom objects, but it worked for this as well.
With the manipulator in place, it was just a matter of calling the str() function on the ObjectId to make the transformation.
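# Note: the son_manipulator module was deprecated in PyMongo 3 and removed in PyMongo 4.
from pymongo.son_manipulator import SONManipulator

class ObjectIdManipulator(SONManipulator):
    def transform_incoming(self, son, collection):
        # Replace the _id with its string form before the document is stored.
        son[u'_id'] = str(son[u'_id'])
        return son

db.add_son_manipulator(ObjectIdManipulator())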
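Both answers boil down to the same step: make sure the document already carries a string _id before it reaches the driver. Here is a driver-free sketch of that step. The with_string_id name is made up, and secrets.token_hex(12) is only a stand-in that yields 24 random hex characters; it is not a real ObjectId, so use str(ObjectId()) from bson if you want the genuine format.

```python
import secrets


def with_string_id(doc):
    """Return a copy of doc whose _id is a string, generating one if it is absent."""
    out = dict(doc)
    if "_id" in out:
        out["_id"] = str(out["_id"])
    else:
        # Assumption: random 24-character hex stand-in, not a real ObjectId.
        out["_id"] = secrets.token_hex(12)
    return out


original = {"price": 120}
student = with_string_id(original)
print(student["_id"], type(student["_id"]).__name__)
```

Working on a copy keeps the caller's dictionary untouched, so the helper can be applied right at the insert call without side effects.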

File versioning with GridFS

I'm trying to store versioned content in MongoDB with GridFS. To do so, I add a version field to the metadata of the file I'm storing. This all works well. Now I want to get the latest version without knowing the version number. In "Find the latest version of a document stored in MongoDB - GridFs", someone mentions that findOne always returns the youngest (latest) file matching the query, which is what I want. But when I try this, I always get the first (oldest) file from findOne(). I'm using spring-data-mongodb version 1.5.0.RELEASE.
Here my current code:
public void storeFileToGridFs(ContentReference contentReference, InputStream content) {
    Integer nextVersion = findLatestVersion(contentReference) + 1;
    DBObject metadata = new BasicDBObject();
    metadata.put("version", nextVersion);
    metadata.put("definitionId", contentReference.getContentDefinitionId());
    gridOperations.store(content, contentReference.getContentId().getValue(), metadata);
}
and to find the latest version:
private Integer findLatestVersion(ContentReference contentReference) {
    Query query = new Query(GridFsCriteria.whereFilename().is(contentReference.getContentId().getValue()));
    GridFSDBFile latestVersionRecord = gridOperations.findOne(query);
    if (latestVersionRecord != null) {
        Integer version = (Integer) latestVersionRecord.getMetaData().get("version");
        return version;
    } else return 0;
}
But, as already mentioned, findLatestVersion() always returns 1 (except the first time, when it returns 0).
Once I have this running, is there a way to retrieve only the metadata of the document? In findLatestVersion() it's not necessary to load the file itself.
findOne returns exactly one result, more specifically the first one in the collection matching the query.
I am not too sure whether the latest version is returned when using findOne. Please try find instead.
A more manual approach would be filtering a result set from querying for the file name for the highest value of version.
In general, the version field only shows how often a document was changed. It is used for something which is called optimistic locking, which works by checking the current version of a document against the one the changed document has. If the version in the database is higher than the one in the document to be saved, another process has made changes to the document and an exception is raised.
For storing versioned documents, git (via egit for example) might be a solution.
EDIT: After a quick research, here is how it works. File versioning should be done using the automatically set upload date from the metadata. Query for it, sort descending and use the first result. You do not need to set the version manually any more.
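The "sort descending and use the first result" step can be sketched without a database. The records below are invented and only mimic fs.files entries, and the latest and next_version helpers are hypothetical names, not part of Spring Data or any driver.

```python
from datetime import datetime

# Hypothetical fs.files-style records: three uploads under the same filename.
records = [
    {"filename": "doc-1", "uploadDate": datetime(2014, 5, 1, 9, 0), "metadata": {"version": 1}},
    {"filename": "doc-1", "uploadDate": datetime(2014, 5, 3, 14, 30), "metadata": {"version": 3}},
    {"filename": "doc-1", "uploadDate": datetime(2014, 5, 2, 11, 15), "metadata": {"version": 2}},
    {"filename": "doc-2", "uploadDate": datetime(2014, 5, 4, 8, 0), "metadata": {"version": 1}},
]


def latest(records, filename):
    """Return the most recently uploaded record for filename, or None if there is none."""
    candidates = [r for r in records if r["filename"] == filename]
    return max(candidates, key=lambda r: r["uploadDate"], default=None)


def next_version(records, filename):
    """Version number for the next upload: 1 for a new file, else latest + 1."""
    newest = latest(records, filename)
    return 1 if newest is None else newest["metadata"]["version"] + 1


print(latest(records, "doc-1")["metadata"]["version"])
print(next_version(records, "doc-1"), next_version(records, "missing"))
```

Picking the maximum by uploadDate does not depend on the order in which the records come back, which is exactly the property an unsorted findOne lacks.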
I know it's been a while since this question has been asked and I don't know whether the code has been the same back then, but I think this information may help future readers:
Looking at the source code shows that findOne completely ignores the sorting part defined in the query, while find actually makes use of it.
So you need to make a normal query with find and then select the first object found (refer to Markus W Mahlberg's answer for more information).
Try adding sorting to the query, like this:
GridFSDBFile latestVersionRecord = template.findOne(
new Query(GridFsCriteria.whereFilename().is(filename))
.with(new Sort(Sort.Direction.DESC, "version")));
Once you have the GridFSDBFile, you can read its metadata without loading the whole file:
DBObject metadata = latestVersionRecord.getMetaData();
Hope it helps!

How do I save a file to MongoDB?

I want to save a user selected file to MongoDB. How do I correctly add the file to the BSON object in order to add it to MongoDB? If my approach is incorrect please point in the right direction.
Below is the client code. This jQuery function gathers the text from every input field (I need help with the file part) and sends it to the server as a BSON object.
$('#add').click(function()
{
    console.log('Creating JSON object...');
    var classCode = $('#classCode').val();
    var professor = $('#professor').val();
    var description = $('#description').val();
    var file = $('#file').val();
    var document =
    {
        'classCode':classCode,
        'professor':professor,
        'description':description,
        'file':file,
        'dateUploaded':new Date(),
        'rating':0
    };
    console.log('Adding document.');
    socket.emit('addDocument', document);
});
The HTML of the form:
<form>
<input type = 'text' placeholder = 'Class code' id = 'classCode'/>
<input type = 'text' placeholder = 'Document description' id = 'description'/>
<input type = 'text' placeholder = 'Professor' id = 'professor'/>
<input type = 'file' id = 'file'/>
<input type = 'submit' id = 'add'/>
</form>
The server side code in CoffeeScript:
#Uploads a document to the server. documentData is sent via javascript from submit.html
socket.on 'addDocument', (documentData) ->
  console.log 'Adding document: ' + documentData
  db.collection 'documents', (err, collection) ->
    collection.insert documentData, safe:false
    return
If your files are small enough (under 16 megabytes), instead of adding the complexity of GridFS, you can just embed the files into BSON documents.
BSON has a binary data type, to which any of the drivers should provide access.
If your file is a text file you can just store it as a UTF8 string.
To store files in MongoDB you should try to use GridFS.
You can find some tutorials about working with GridFS.
Check your MongoDB driver's API and try to implement it in your project.
When to Use GridFS
From official doc: https://docs.mongodb.com/manual/core/gridfs/#when-to-use-gridfs
In MongoDB, use GridFS for storing files larger than 16 MB.
In some situations, storing large files may be more efficient in a MongoDB database than on a system-level filesystem.
If your filesystem limits the number of files in a directory, you can use GridFS to store as many files as needed.
When you want to access information from portions of large files without having to load whole files into memory, you can use GridFS to recall sections of files without reading the entire file into memory.
When you want to keep your files and metadata automatically synced and deployed across a number of systems and facilities, you can use GridFS. When using geographically distributed replica sets, MongoDB can distribute files and their metadata automatically to a number of mongod instances and facilities.
Do not use GridFS if you need to update the content of the entire file atomically. As an alternative you can store multiple versions of each file and specify the current version of the file in the metadata. You can update the metadata field that indicates “latest” status in an atomic update after uploading the new version of the file, and later remove previous versions if needed.
Furthermore, if your files are all smaller than the 16 MB BSON document size limit, consider storing each file manually within a single document instead of using GridFS. You may use the BinData data type to store the binary data. See your driver's documentation for details on using BinData.
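As a rough illustration of the size rule quoted above, here is a small Python sketch. The function names are made up; the constants are the 16 MB document limit and the GridFS default chunk size of 255 kB. The size check is deliberately simplistic, since the other fields of the document also count toward the limit.

```python
MAX_BSON_BYTES = 16 * 1024 * 1024   # BSON document size limit (16 MB)
DEFAULT_CHUNK_BYTES = 255 * 1024    # default GridFS chunk size (255 kB)


def choose_storage(size_bytes):
    """Suggest 'inline' (BinData in one document) or 'gridfs' based on size alone."""
    # Rough check: a real document needs headroom for its other fields too.
    return "inline" if size_bytes < MAX_BSON_BYTES else "gridfs"


def chunk_count(size_bytes, chunk_bytes=DEFAULT_CHUNK_BYTES):
    """Number of fs.chunks documents a file of this size is split into."""
    return -(-size_bytes // chunk_bytes)  # integer ceiling division


print(choose_storage(5 * 1024 * 1024), choose_storage(20 * 1024 * 1024))
print(chunk_count(20 * 1024 * 1024))
```

In practice you would leave some margin below the limit before choosing the inline option, because the filename, metadata and other fields share the same 16 MB budget.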