How do I save a file to MongoDB? - mongodb

I want to save a user-selected file to MongoDB. How do I correctly add the file to the BSON object in order to add it to MongoDB? If my approach is incorrect, please point me in the right direction.
Below is the client code. This jQuery function gathers the text from every input field (I need help with the file part) and sends it to the server as a BSON object.
$('#add').click(function()
{
    console.log('Creating JSON object...');
    var classCode = $('#classCode').val();
    var professor = $('#professor').val();
    var description = $('#description').val();
    var file = $('#file').val();
    var document =
    {
        'classCode':classCode,
        'professor':professor,
        'description':description,
        'file':file,
        'dateUploaded':new Date(),
        'rating':0
    };
    console.log('Adding document.');
    socket.emit('addDocument', document);
});
The HTML of the form:
<form>
<input type = 'text' placeholder = 'Class code' id = 'classCode'/>
<input type = 'text' placeholder = 'Document description' id = 'description'/>
<input type = 'text' placeholder = 'Professor' id = 'professor'/>
<input type = 'file' id = 'file'/>
<input type = 'submit' id = 'add'/>
</form>
The server side code in CoffeeScript:
#Uploads a document to the server. documentData is sent via javascript from submit.html
socket.on 'addDocument', (documentData) ->
  console.log 'Adding document: ' + documentData
  db.collection 'documents', (err, collection) ->
    collection.insert documentData, safe:false
    return

If your files are small enough (under 16 megabytes), instead of adding the complexity of GridFS, you can just embed the files into BSON documents.
BSON has a binary data type, to which any of the drivers should provide access.
If your file is a text file, you can just store it as a UTF-8 string.
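A minimal sketch of the embedding approach, assuming a Node.js server that already has the file contents as raw bytes. The helper name `buildFileDocument` and the fields `fileName` and `fileData` are hypothetical, not part of the original code. The Node.js MongoDB driver serializes a `Buffer` as BSON binary data, so an object shaped like this could be handed to an insert call as-is.

```javascript
// Hypothetical helper: builds a plain document that embeds a file's contents.
// Text files are stored as a UTF-8 string, everything else as raw bytes.
function buildFileDocument(fields, fileName, bytes, isText) {
  const data = Buffer.from(bytes);
  return {
    ...fields,
    fileName: fileName,
    // A Buffer maps to BSON binary data; a string maps to a BSON UTF-8 string.
    fileData: isText ? data.toString('utf8') : data,
    dateUploaded: new Date(),
    rating: 0
  };
}

const textDoc = buildFileDocument({ classCode: 'CS101' }, 'notes.txt', Buffer.from('hello'), true);
const binaryDoc = buildFileDocument({ classCode: 'CS101' }, 'scan.pdf', [0x25, 0x50, 0x44, 0x46], false);
console.log(typeof textDoc.fileData, Buffer.isBuffer(binaryDoc.fileData)); // string true
```

Note that in a browser, `$('#file').val()` returns only the name of the selected file, not its contents, so the bytes have to be read separately (for example with the File API) before they are sent to the server.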

To store files in MongoDB, you should try GridFS.
You can find tutorials about working with GridFS online.
Check your MongoDB driver's API and try to implement it in your project.

When to Use GridFS
From official doc: https://docs.mongodb.com/manual/core/gridfs/#when-to-use-gridfs
In MongoDB, use GridFS for storing files larger than 16 MB.
In some situations, storing large files may be more efficient in a MongoDB database than on a system-level filesystem.
If your filesystem limits the number of files in a directory, you can use GridFS to store as many files as needed.
When you want to access information from portions of large files without having to load whole files into memory, you can use GridFS to recall sections of files without reading the entire file into memory.
When you want to keep your files and metadata automatically synced and deployed across a number of systems and facilities, you can use GridFS. When using geographically distributed replica sets, MongoDB can distribute files and their metadata automatically to a number of mongod instances and facilities.
Do not use GridFS if you need to update the content of the entire file atomically. As an alternative you can store multiple versions of each file and specify the current version of the file in the metadata. You can update the metadata field that indicates “latest” status in an atomic update after uploading the new version of the file, and later remove previous versions if needed.
Furthermore, if your files are all smaller than the 16 MB BSON Document Size limit, consider storing the file manually within a single document instead of using GridFS. You may use the BinData data type to store the binary data. See your driver's documentation for details on using BinData.
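The size rule above can be sketched as a small decision helper. This is only an illustration: `chooseStorage` and the `overheadBytes` allowance for the document's other fields are assumptions, and the other criteria in the list (partial reads, atomic updates) still need a human decision.

```javascript
// 16 MB BSON document size limit, in bytes.
const MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Hypothetical helper: picks a storage strategy from the file size alone.
// `overheadBytes` is an assumed allowance for the other fields in the document.
function chooseStorage(fileSizeBytes, overheadBytes = 1024) {
  if (!Number.isInteger(fileSizeBytes) || fileSizeBytes < 0) {
    throw new RangeError('fileSizeBytes must be a non-negative integer');
  }
  return fileSizeBytes + overheadBytes <= MAX_BSON_DOCUMENT_BYTES ? 'embedded' : 'gridfs';
}

console.log(chooseStorage(2 * 1024 * 1024));  // embedded
console.log(chooseStorage(20 * 1024 * 1024)); // gridfs
```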

Related

Spring Mongodb search string inside binary data

I am storing documents (text, pdf, csv, doc, docx, etc.) in MongoDB using Spring REST. The documents are stored as binary data.
Now I want to search documents based on their contents.
For example, if a user searches for the string "office", they should see a list of documents that contain the string "office".
How can I query MongoDB for text contained in binary data?
You could try to define a text index over your binary files. I don't know if it would work, but even if it does, such an index would match any words that are part of the file format rather than user content, which is generally undesirable.
If I were implementing your requirements, I would run all of the binary documents through a converter to plain text (e.g. pandoc), thus obtaining the user content of each document, then insert that content into a field that has a text index over it, and query on that field.
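A rough sketch of that extract-then-index idea, written in JavaScript for illustration even though the question uses Spring. `extractText` is a hypothetical stand-in for a real converter such as pandoc and only handles plain text here. The field names `data` and `content` are assumptions. The index specification and the `$text` query follow the shape MongoDB text search expects.

```javascript
// Hypothetical stand-in for a real converter such as pandoc.
// It only handles plain-text input; other formats need a real extraction tool.
function extractText(bytes, contentType) {
  if (contentType === 'text/plain') {
    return Buffer.from(bytes).toString('utf8');
  }
  throw new Error('No converter available for ' + contentType);
}

// Builds the document to store: the original bytes plus a searchable text field.
function buildSearchableDocument(filename, bytes, contentType) {
  return {
    filename: filename,
    data: Buffer.from(bytes),
    content: extractText(bytes, contentType)
  };
}

// Text index on the extracted content, and a query against that index.
const textIndexSpec = { content: 'text' };
function buildTextQuery(term) {
  return { $text: { $search: term } };
}

const memo = buildSearchableDocument('memo.txt', Buffer.from('office move on Friday'), 'text/plain');
console.log(memo.content);
console.log(JSON.stringify(buildTextQuery('office')));
```

Keeping the extracted text in its own field means the index never sees file-format bytes, which is the problem described for indexing the binary data directly.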

how to filter files in mongo Db

I am running MongoDB version 4.10.
How do I get a count of files filtered by type?
I have .doc, .pdf, and .csv files in the file system in MongoDB.
How do I check the count of files of each format (.csv, .pdf, .doc)?
Please let me know.
I am assuming you don't mean references, but files that are in GridFS. In this case, the way it works is that Mongo simply puts them in two collections: one for file metadata, and another for chunks of files, that is to say, the data of your files sliced into pieces. They are regular collections that you can query like any other collection. Their names are "fs.files" and "fs.chunks".
The one you want is "fs.files". For retrieving the files by type, you can use the field contentType. Here is how you get the number of PDFs:
db.fs.files.find({contentType: "application/pdf"}).count()
// or if your question is only for counting
db.fs.files.count({contentType: "application/pdf"})
Like I said, just like any other collection.
EDIT:
var pdfCount = db.fs.files.find({contentType: "application/pdf"}).count();
var csvCount = db.fs.files.find({contentType: "text/csv"}).count();
var docCount = db.fs.files.find({contentType: "application/msword"}).count();
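The three separate counts above could also be written as one aggregation that groups by `contentType`. If `contentType` was not set when the files were uploaded, matching on the filename extension is a possible fallback. The sketch below only builds the pipeline and the filter as plain objects. `extensionFilter` is a hypothetical helper, not a MongoDB function.

```javascript
// Aggregation pipeline that counts files per contentType in one pass.
const countByContentType = [
  { $group: { _id: '$contentType', count: { $sum: 1 } } }
];

// Hypothetical fallback: match on the filename extension instead.
function extensionFilter(extension) {
  // Escape regex metacharacters so the extension is matched literally.
  const escaped = extension.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return { filename: new RegExp('\\.' + escaped + '$', 'i') };
}

const pdfFilter = extensionFilter('pdf');
console.log(pdfFilter.filename.test('report.PDF'));     // true
console.log(pdfFilter.filename.test('report.pdf.bak')); // false
```

In the shell these would be passed along the lines of `db.fs.files.aggregate(countByContentType)` and `db.fs.files.count(extensionFilter('pdf'))`.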

Queries in Mongodb Gridfs - Metadata? Further Possibilities?

I started using MongoDB with GridFS some time ago as I need to store documents that are bigger than 16 MB. Saving and loading documents has worked fine so far.
Next, I wondered how to specify queries for my files stored in GridFS. Let's assume I have instances of my class, which looks as follows:
public class Test {
private String id;
private int test;
}
If I want to query for a file which has a certain value for "test", how can I do that in GridFS? I know that I can save my file and store additional metadata.
Hence, I could store the test value in the metadata of the file.
To retrieve the value test of a file, I would do something like this:
String id = "...";
GridFS fs = new GridFS(mongoDB, "TESTFS");
BasicDBObject dbobj = new BasicDBObject();
dbobj.put("filename", id);
GridFSDBFile fsFile = fs.findOne(dbobj);
BasicDBObject metadata = (BasicDBObject) fsFile.get("metadata");
System.out.println("Test: " + metadata.get("test"));
Using metadata, is there an easier way to extract the "test" value of a certain file (without loading the complete file, deserializing the JSON string, etc.)?
The disadvantage of this approach is that I have to store the metadata explicitly. If I want to query for other information, I need to introduce this into the metadata for all my data. Is this right?
Is there an alternative to storing such information in the metadata? Or how can I query for specific information in GridFS?
This is obviously a very simple example. The same questions arise when trying to perform more complex queries.
As @wdberkeley noted, you can query db['fs'].files explicitly without fetching the whole file each time (which is indeed not the best idea). You can do it using find on that collection (the query below is written as you would write it in the MongoDB console; please translate it to your language):
db['fs'].files.find({'metadata.test': '#### - the value you want to query for - $$$'})
The cursor that is returned in the MongoDB console and via most drivers is going to allow you to read metadata without ever retrieving the contents of the file.
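For the "more complex queries" the question asks about, the same dotted-path idea extends to query operators. A small sketch that only builds the filter objects. `metadataEquals` and `metadataRange` are hypothetical helper names, and the numeric values are made up for illustration.

```javascript
// Hypothetical helper: exact match on a field nested under `metadata` in fs.files.
function metadataEquals(field, value) {
  return { ['metadata.' + field]: value };
}

// Hypothetical helper: half-open range [min, max) on a metadata field.
function metadataRange(field, min, max) {
  return { ['metadata.' + field]: { $gte: min, $lt: max } };
}

console.log(JSON.stringify(metadataEquals('test', 42)));    // {"metadata.test":42}
console.log(JSON.stringify(metadataRange('test', 10, 20))); // {"metadata.test":{"$gte":10,"$lt":20}}
```

Because the file contents live in the chunks collection, a find on the files collection with either filter reads only the small metadata documents.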

Store images within a document in MongoDB without using GridFS

Say I have a collection named Items. I'm trying to accomplish a document structure like this:
{
    "itemName": "Google Glass",
    "description": "Awesome Gadget",
    "*some_picture*": "*some_picture_object*"
}
The images that I want to store won't exceed the 16 MB cap on BSON documents, so I don't want to use GridFS. How can I accomplish the structure above? I'm new to MongoDB and am pretty lost.
I'd use the BinData format for the field in your document that contains the image data. Exact usage varies depending upon language used.
For PHP, a sample code for your use case (store image file in collection) taken from the PHP manual http://www.php.net/manual/en/class.mongobindata.php :
<?php
$profile = array(
"username" => "foobity",
"pic" => new MongoBinData(file_get_contents("gravatar.jpg"), MongoBinData::GENERIC),
);
$users->save($profile);
?>
Perl http://api.mongodb.org/perl/current/MongoDB/DataTypes.html#Binary%20Data :
# non-utf8 string
my $string = "\xFF\xFE\xFF";
$collection->insert({"photo" => \$string});
This previous answer has sample code to save an image using Python in MongoDB: saving picture to mongodb
There shouldn't be anything stopping you (other than the cap on the BSON document size) from storing the bytes (string) of the image in "some_picture". When you select the document and grab the bytes, you treat them exactly as you would if you were reading the bytes from disk.
Consider, though, that when iterating over your collection, the image will be sent over the wire with every document unless you exclude it, either with a projection on find() or with $project in an aggregation pipeline.
This is as well as I can answer without knowing what language you are using in order to provide an example.

mongodump by date / find() in dumped data

How can I dump all collections by date if my records have no timestamp field?
Fields: _id, name, email, carnumber... etc.
And how can I look up data with find() in an archived/dumped database?
I need to create a search mechanism for searching in the archive.
You can pass a query to mongodump that will make it dump only a portion of your data. If you can't write a query that finds the required portion of data, then you're out of luck.
The result of mongodump is a collection of BSON files. They are not directly queryable, but you can load them into another database and query that. Or you can use the mongoexport utility, which creates JSON documents. JSON is a little bit easier to work with.
Although what Sergio says is broadly true, let me expand a bit:
First, you mention using _id. If that is an ObjectID (the default), then it contains a timestamp: the first 4 bytes are a Unix-style timestamp:
http://www.mongodb.org/display/DOCS/Object+IDs#ObjectIDs-BSONObjectIDSpecification
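To make the ObjectID point concrete, here is a sketch that reads the timestamp out of an ObjectID's hex string and goes the other way to build boundary ObjectIDs for a date-range query on _id. The helper names are hypothetical. The `$oid` wrapper is the Extended JSON form for ObjectIDs, and whether a given mongodump version accepts it in a query is an assumption to verify.

```javascript
// Reads the creation time encoded in the first 4 bytes (8 hex characters) of an ObjectID.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new TypeError('expected a 24-character hex string');
  }
  return new Date(parseInt(hex.slice(0, 8), 16) * 1000);
}

// Builds the smallest ObjectID hex string for a given date, usable as a range boundary.
function objectIdFloor(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, '0') + '0'.repeat(16);
}

// Hypothetical helper: query selecting documents created in [from, to).
function dumpQuery(from, to) {
  return JSON.stringify({
    _id: { $gte: { $oid: objectIdFloor(from) }, $lt: { $oid: objectIdFloor(to) } }
  });
}

console.log(objectIdToDate('507f1f77bcf86cd799439011').toISOString()); // 2012-10-17T21:13:27.000Z
console.log(dumpQuery(new Date('2012-10-01T00:00:00Z'), new Date('2012-11-01T00:00:00Z')));
```

Since a mongodump query applies to a single collection, dumping "all collections by date" this way would mean running it once per collection.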
Next, the problem with using mongoexport is that JSON does not preserve all BSON types (http://bsonspec.org/#/specification). BSON has more types than JSON does, so storing data as JSON can be problematic unless you have rules for re-importing it.
If you keep the data in BSON format, there is the bsondump tool to inspect things as-is in the files:
http://www.mongodb.org/display/DOCS/Import+Export+Tools#ImportExportTools-bsondump
Or, if you had an "archive" MongoDB instance, you could just use mongodump/mongorestore, which works directly with the BSON files and does not have the JSON issues seen with mongoexport etc.:
http://www.mongodb.org/display/DOCS/Import+Export+Tools#ImportExportTools-mongodumpandmongorestore