Store images within a document in MongoDB without using GridFS

Say I have a collection named Items. I'm trying to accomplish a document structure like this:
{
  "itemName": "Google Glass",
  "description": "Awesome Gadget",
  "some_picture": "some_picture_object"
}
The images that I want to store won't exceed the 16 MB cap on BSON documents, so I don't want to use GridFS. How can I accomplish the structure above? I'm new to MongoDB and am pretty lost.

I'd use the BinData format for the field in your document that contains the image data. Exact usage varies depending on the language (driver) you use.
For PHP, here is sample code for your use case (storing an image file in a collection), taken from the PHP manual http://www.php.net/manual/en/class.mongobindata.php :
<?php
// $users is a MongoCollection from the legacy "mongo" extension
$profile = array(
    "username" => "foobity",
    "pic" => new MongoBinData(file_get_contents("gravatar.jpg"), MongoBinData::GENERIC),
);
$users->save($profile);
?>
For Perl (http://api.mongodb.org/perl/current/MongoDB/DataTypes.html#Binary%20Data):
# non-utf8 string
my $string = "\xFF\xFE\xFF";
$collection->insert({"photo" => \$string});
A previous answer, "saving picture to mongodb", has sample code for saving an image to MongoDB using Python.
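In practice the driver does this encoding for you (PyMongo, for example, stores a Python 3 bytes value as BinData subtype 0). Purely to show where the image bytes end up, here is a minimal stdlib-only Python sketch that encodes the question's document as BSON, handling only string and binary fields, and checks the result against the 16 MB cap. It is an illustration of the BSON layout under those assumptions, not a replacement for a driver's BSON library.

```python
import struct

MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024  # 16 MB document cap, in bytes


def _cstring(name):
    # BSON field names are NUL-terminated UTF-8 strings
    return name.encode("utf-8") + b"\x00"


def _element(name, value):
    if isinstance(value, str):
        # Type 0x02 (string): int32 byte count (including trailing NUL), then the bytes
        data = value.encode("utf-8") + b"\x00"
        return b"\x02" + _cstring(name) + struct.pack("<i", len(data)) + data
    if isinstance(value, (bytes, bytearray)):
        # Type 0x05 (binary/BinData): int32 byte count, subtype 0x00 (generic), then the bytes
        payload = bytes(value)
        return b"\x05" + _cstring(name) + struct.pack("<i", len(payload)) + b"\x00" + payload
    raise TypeError("this sketch only handles str and bytes values")


def encode_document(doc):
    body = b"".join(_element(k, v) for k, v in doc.items()) + b"\x00"
    # The leading int32 is the size of the whole document, including itself
    return struct.pack("<i", len(body) + 4) + body


def fits_in_one_document(doc):
    return len(encode_document(doc)) <= MAX_BSON_DOCUMENT_SIZE


item = {
    "itemName": "Google Glass",
    "description": "Awesome Gadget",
    "some_picture": b"\x89PNG\r\n\x1a\n",  # stand-in for real image bytes
}
encoded = encode_document(item)
print(len(encoded), fits_in_one_document(item))  # 91 True
```

The image is just another field: its bytes sit inline after a small length prefix, which is why the only real constraint is the total document size.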

There shouldn't be anything stopping you (other than the 16 MB cap on BSON document size) from storing the bytes of the image in "some_picture". When you select the document and grab the bytes, you treat them exactly as you would if you had read the bytes from disk.
Consider, though, that unless you exclude the image field (with a projection passed as the second argument to find(), a $project stage in an aggregation pipeline, or server-side processing such as map/reduce), the image will be sent over the wire with every document as you iterate over your collection.
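To make the projection point concrete, here is a small pure-Python sketch that mimics what the server returns for a top-level exclusion projection such as {"some_picture": 0}. It simulates the effect only (it is not how MongoDB implements projections), and items in the comment is a hypothetical collection handle.

```python
def apply_exclusion_projection(doc, projection):
    # Sketch of a top-level exclusion projection such as {"some_picture": 0}:
    # every field is returned except those mapped to 0/False
    excluded = {field for field, flag in projection.items() if not flag}
    return {key: value for key, value in doc.items() if key not in excluded}


item = {
    "_id": 1,
    "itemName": "Google Glass",
    "description": "Awesome Gadget",
    "some_picture": b"\x00" * 1_000_000,  # pretend 1 MB image
}

# With PyMongo the real call would be roughly: items.find({}, {"some_picture": 0})
listing = apply_exclusion_projection(item, {"some_picture": 0})
print(sorted(listing))  # ['_id', 'description', 'itemName']
```

A listing page can then fetch only names and descriptions, and load the image bytes separately when a single item is actually displayed.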
That's as much as I can answer without knowing which language you are using; with that, I could provide an example.

Related

Spring Mongodb search string inside binary data

I am storing documents (text, PDF, CSV, DOC, DOCX, etc.) in MongoDB using Spring REST. The documents are stored as binary data.
Now I want to search documents based on their contents.
For example, if a user searches for the string "office", they should see a list of documents that contain "office".
How can I query MongoDB for text contained in binary data?
You could try to define a text index over your binary files, but text indexes only cover fields whose values are strings (or arrays of strings), so binary data won't be indexed. Even if the raw file content were stored as a string, such an index would match words that are part of the file format rather than user content, which is generally undesirable.
If I were implementing your requirements, I would use a converter from each binary document to plain text (e.g. pandoc) to obtain the user content of each document, insert that content into a field that has a text index over it, and then query on that field.
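The approach above can be sketched in Python as follows. The extractor is hypothetical (it only decodes .txt/.csv files; PDF and Word files would need a real converter such as pandoc), the field names data and content are assumptions, and contains_word is only a rough stand-in for $text, which additionally applies stemming and stop words.

```python
import re


def extract_text(filename, data):
    # Hypothetical extractor: plain-text formats are simply decoded.
    # PDF/DOC/DOCX would need a real converter (the answer suggests pandoc).
    if filename.lower().endswith((".txt", ".csv")):
        return data.decode("utf-8", errors="replace")
    raise NotImplementedError("no converter wired up for " + filename)


def build_document(filename, data):
    # Keep the original bytes, plus a plain-text copy for the text index
    return {"filename": filename, "data": data, "content": extract_text(filename, data)}


def contains_word(doc, word):
    # Rough stand-in for a $text search: case-insensitive whole-word match
    return word.lower() in re.findall(r"\w+", doc["content"].lower())


# With PyMongo, the index and query would look roughly like:
#   documents.create_index([("content", "text")])
#   documents.find({"$text": {"$search": "office"}})
doc = build_document("notes.txt", b"Meeting moved to the Office on Monday")
print(contains_word(doc, "office"))  # True
```

Keeping the extracted text next to the original bytes means the binary field stays untouched for downloads, while searches only ever touch the indexed string field.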

MongoDB search via index of documents containing JSON

Say I have objects in a MongoDB collection:
{
...
"json" : "{\"things\":[2494090781803658355,5114030115038563045,3035856943768375362,8931213615561493991,7574631742057150605,480863244020297489]}"
}
It's an Azure "MongoDB", so it doesn't support all the features, but suppose it does.
This search will find that document:
db.coll.find({"json" : {$regex : "5114030115038563045|8931213615561493991"}})
Of course, it's scanning the whole collection to pull these records out. What's an efficient (faster) way to find documents where the list of "things" contains any of a list of "things" in a query? It seems like throwing a search engine like Solr or Elasticsearch at this would solve it, and perhaps using Azure Data Lake storage would make this more searchable, so I'm considering those options. They're outside the scope of this question, though; I'd like to know if there's a Mongo-ish way to search this collection by index.
The only option you have if you're storing only a JSON string is to use a text index with the $text operator.
If this document structure isn't set in stone, however, you might consider also storing the parsed JSON as a nested subdocument in a separate field, say jsonDoc (with the appropriate sanitization, of course), alongside the original JSON string. This would allow you to build an index on jsonDoc.things and perform a query such as "jsonDoc.things": {$in: [ "5114030115038563045", "8931213615561493991" ]}. Note that this query compares strings, so the values must be stored as strings; they are also larger than 2^53, beyond which a JavaScript number cannot represent every integer exactly.
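A minimal Python sketch of that idea, assuming a hypothetical field name jsonDoc for the parsed copy: json.loads with parse_int=str keeps every number exactly as written (as a string), so nothing is rounded and the string $in query above can match. matches_in only mimics the array semantics of $in in memory; coll in the comment is a hypothetical collection handle.

```python
import json

raw = {
    "json": '{"things":[2494090781803658355,5114030115038563045,3035856943768375362,'
            '8931213615561493991,7574631742057150605,480863244020297489]}'
}


def with_parsed_copy(doc):
    # parse_int=str keeps every integer exactly as written, as a string.
    # These values exceed 2**53, so a JavaScript/shell double would round them.
    return {**doc, "jsonDoc": json.loads(doc["json"], parse_int=str)}


def matches_in(doc, values):
    # Mimics {"jsonDoc.things": {"$in": values}}: true if any array element matches
    wanted = set(values)
    return any(thing in wanted for thing in doc["jsonDoc"]["things"])


stored = with_parsed_copy(raw)
# With PyMongo, the index would be created roughly as: coll.create_index("jsonDoc.things")
print(matches_in(stored, ["5114030115038563045", "8931213615561493991"]))  # True
```

Because jsonDoc.things is an array, an index on it is a multikey index, so the $in lookup can use the index instead of scanning every JSON string with a regex.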

When to use array or not to use them in mongodb

I am working on my very first application in Symfony2/MongoDB. I have to store articles, and these articles have tags, keywords and related images. At the moment I am storing this information like this:
"category" : [
"category1",
" category2",
" category3"
],
but I have also seen a few examples that do this:
"category" : "category1, category2, category3",
so I was wondering which one is the better way to do it.
It's a very bad idea to use a string when you actually need an array. If you want to search documents by tag, you definitely need an array. Strings are useful when you need full-text search (for example, finding a word and its inflected forms in sentences).
If you use array, then you will have the following advantages:
You can access each item directly by index.
You can perform queries directly on the array using operators like $in, $nin and $elemMatch
If you use a string, then you will have to:
Split on "," in order to do any looping
Use text-based searching in your queries, which is slow
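As a small illustration of the array advantage, here is a pure-Python sketch that turns the comma-separated string into an array and mimics {"category": {"$in": [...]}} against it. The function names are hypothetical. The strip() step matters: the array in the question stores " category2" with a leading space, which would not match "category2" in an exact $in query.

```python
def tags_from_string(value):
    # "category1, category2, category3" -> ["category1", "category2", "category3"]
    # strip() also removes leading spaces that would otherwise break exact matches
    return [part.strip() for part in value.split(",") if part.strip()]


def has_any_tag(doc, wanted):
    # Mimics {"category": {"$in": wanted}} against an array field
    wanted = set(wanted)
    return any(tag in wanted for tag in doc["category"])


article = {"category": tags_from_string("category1, category2, category3")}
print(article["category"])                  # ['category1', 'category2', 'category3']
print(has_any_tag(article, ["category2"]))  # True
```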
One thing to keep in mind regarding arrays inside a MongoDB document is that they should not grow too large. Arrays can get very large, and if one pushes the size of the document beyond 16 MB, the maximum allowed size for a single document, the write will fail.
In that case, you can split the contents of your array off into a separate collection and create references.
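That split can be sketched in Python as below: the parent keeps everything except the large array, and each element becomes its own small document pointing back at the parent's _id. The reference field name article_id, the images field and the collection handles in the comments are hypothetical.

```python
def split_out_array(article, field, ref_key="article_id"):
    # Return (parent, children): the parent without the large array, and one
    # child document per element that references the parent's _id
    parent = {key: value for key, value in article.items() if key != field}
    children = [{ref_key: article["_id"], field: element} for element in article.get(field, [])]
    return parent, children


article = {"_id": 1, "title": "Hello", "images": ["a.jpg", "b.jpg"]}
parent, children = split_out_array(article, "images")
print(parent)    # {'_id': 1, 'title': 'Hello'}
print(children)  # [{'article_id': 1, 'images': 'a.jpg'}, {'article_id': 1, 'images': 'b.jpg'}]

# With PyMongo, roughly:
#   articles.insert_one(parent)
#   images.insert_many(children)
#   images.create_index("article_id")
#   images.find({"article_id": 1})
```

An index on the reference field keeps "all images for this article" lookups cheap, and the article document no longer grows with every new element.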

Queries in Mongodb Gridfs - Metadata? Further Possibilities?

I started using MongoDB with GridFS some time ago, as I need to store documents that are bigger than 16 MB. Saving and loading documents has worked fine so far.
Next, I wondered how to specify queries for my files stored in GridFS. Let's assume I have instances of my class, which looks as follows:
public class Test {
    private String id;
    private int test;
}
If I want to query for a file which has a certain value for "test", how can I do that in GridFS? I know that I can save my file and store additional metadata.
Hence, I could store the test value in the metadata of the file.
To retrieve the value test of a file, I would do something like this:
String id = "...";
GridFS fs = new GridFS(mongoDB, "TESTFS");
BasicDBObject dbobj = new BasicDBObject();
dbobj.put("filename", id);
GridFSDBFile fsFile = fs.findOne(dbobj);
BasicDBObject metadata = (BasicDBObject) fsFile.get("metadata");
System.out.println("Test: " + metadata.get("test"));
Using metadata, is there an easier way to extract the "test" value of a certain file (without loading the complete file, deserializing the JSON string, etc.)?
The disadvantage of this approach is that I have to store the metadata explicitly. If I want to query for other information, I need to introduce this into the metadata for all my data. Is this right?
Is there an alternative to storing such information in the metadata? Or how can I query for specific information in GridFS?
This is obviously a very simple example. The same questions arise when trying to perform more complex queries.
As @wdberkeley noted, you can query the files collection (db['fs'].files) directly, without fetching the whole file each time (which is indeed not the best idea). For the bucket in your Java code, that collection is TESTFS.files. Use find on it (the query below is written as you would form it in the MongoDB console; please translate it to your language):
db['fs'].files.find({'metadata.test': '#### - the value you want to query for - $$$'})
The cursor that is returned in the MongoDB console and via most drivers is going to allow you to read metadata without ever retrieving the contents of the file.

How do I save a file to MongoDB?

I want to save a user-selected file to MongoDB. How do I correctly add the file to the BSON object in order to add it to MongoDB? If my approach is incorrect, please point me in the right direction.
Below is the client code. This jQuery function gathers the text from every input field (I need help with the file part) and sends it to the server as a BSON object.
$('#add').click(function()
{
    console.log('Creating JSON object...');
    var classCode = $('#classCode').val();
    var professor = $('#professor').val();
    var description = $('#description').val();
    var file = $('#file').val();
    var document =
    {
        'classCode':classCode,
        'professor':professor,
        'description':description,
        'file':file,
        'dateUploaded':new Date(),
        'rating':0
    };
    console.log('Adding document.');
    socket.emit('addDocument', document);
});
The HTML of the form:
<form>
<input type = 'text' placeholder = 'Class code' id = 'classCode'/>
<input type = 'text' placeholder = 'Document description' id = 'description'/>
<input type = 'text' placeholder = 'Professor' id = 'professor'/>
<input type = 'file' id = 'file'/>
<input type = 'submit' id = 'add'/>
</form>
The server side code in CoffeeScript:
# Uploads a document to the server. documentData is sent via JavaScript from submit.html
socket.on 'addDocument', (documentData) ->
  console.log 'Adding document: ' + documentData
  db.collection 'documents', (err, collection) ->
    collection.insert documentData, safe: false
    return
If your files are small enough (under 16 megabytes), instead of adding the complexity of GridFS, you can just embed the files into BSON documents.
BSON has a binary data type, to which any of the drivers should provide access.
If your file is a text file, you can just store it as a UTF-8 string.
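A rough Python sketch of that decision on the server side: text files become a UTF-8 string, other files stay as raw bytes (which drivers such as PyMongo store as BinData), and anything that would not fit in one document is rejected in favor of GridFS. The headroom constant and the function name are hypothetical, and the sample values are made up.

```python
MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024
OTHER_FIELDS_ALLOWANCE = 64 * 1024  # hypothetical headroom for classCode, professor, etc.


def file_field_value(data, is_text):
    # Decide how an uploaded file would go into the document
    if len(data) + OTHER_FIELDS_ALLOWANCE > MAX_BSON_DOCUMENT_SIZE:
        raise ValueError("file too large to embed; store it with GridFS instead")
    if is_text:
        return data.decode("utf-8")  # stored as a BSON UTF-8 string
    return data  # raw bytes; drivers such as PyMongo store these as BinData


document = {
    "classCode": "CS101",
    "file": file_field_value(b"%PDF-1.4 ...", is_text=False),
}
```

Whatever language the server uses, the point is the same: the file's contents (not just its name) have to reach the server, and then they go into the document as either a string or binary value.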
To store files in MongoDB you should try to use GridFS.
You can find some tutorials about working with GridFS (example).
Check your MongoDB driver's API and try to implement it in your project.

When to Use GridFS
From the official documentation (https://docs.mongodb.com/manual/core/gridfs/#when-to-use-gridfs):
In MongoDB, use GridFS for storing files larger than 16 MB.
In some situations, storing large files may be more efficient in a MongoDB database than on a system-level filesystem.
If your filesystem limits the number of files in a directory, you can use GridFS to store as many files as needed.
When you want to access information from portions of large files without having to load whole files into memory, you can use GridFS to recall sections of files without reading the entire file into memory.
When you want to keep your files and metadata automatically synced and deployed across a number of systems and facilities, you can use GridFS. When using geographically distributed replica sets, MongoDB can distribute files and their metadata automatically to a number of mongod instances and facilities.
Do not use GridFS if you need to update the content of the entire file atomically. As an alternative you can store multiple versions of each file and specify the current version of the file in the metadata. You can update the metadata field that indicates “latest” status in an atomic update after uploading the new version of the file, and later remove previous versions if needed.
Furthermore, if your files are all smaller than the 16 MB BSON document size limit, consider storing each file manually within a single document instead of using GridFS. You may use the BinData data type to store the binary data. See your driver's documentation for details on using BinData.
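The point about reading portions of large files follows from how GridFS stores data: the file is cut into chunk documents (fields files_id, n and data in the chunks collection, 255 KiB each by default), so a byte range only needs the chunks that overlap it. Here is a minimal in-memory Python sketch of that idea; the helper names are hypothetical, and real drivers handle this for you.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size: 255 KiB


def to_chunks(files_id, data, chunk_size=DEFAULT_CHUNK_SIZE):
    # One chunks-collection-style document per slice: {files_id, n, data}
    return [
        {"files_id": files_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def chunk_numbers_for_range(start, end, chunk_size=DEFAULT_CHUNK_SIZE):
    # Chunk numbers needed to read bytes [start, end)
    if end <= start:
        return range(0)
    return range(start // chunk_size, (end - 1) // chunk_size + 1)


def read_range(chunks, start, end, chunk_size=DEFAULT_CHUNK_SIZE):
    # Reassemble only the chunks that overlap [start, end)
    by_n = {c["n"]: c["data"] for c in chunks}
    end = min(end, sum(len(d) for d in by_n.values()))
    needed = chunk_numbers_for_range(start, end, chunk_size)
    if len(needed) == 0:
        return b""
    joined = b"".join(by_n[n] for n in needed)
    base = needed.start * chunk_size
    return joined[start - base:end - base]


data = bytes(range(256)) * 4000  # 1,024,000 bytes -> 4 chunks
chunks = to_chunks("file-1", data)
print(len(chunks), list(chunk_numbers_for_range(300_000, 300_010)))  # 4 [1]
```

Reading ten bytes in the middle of the file touches a single chunk rather than the whole file, which is exactly what a single embedded BinData field cannot offer.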