I have been looking at GridFS to store images and other files.
It has some really nice features, but to render HTML we currently have to retrieve each file and store it in a temporary location first, which costs us CPU time.
What would be a good strategy to use GridFS and minimize this cost?
MongoDB supports REST, so you don't have to download the file to temporary storage: you just append the file's REST link to your HTML.
See the MongoDB simple REST interface documentation.
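If the built-in REST interface doesn't fit, the same idea works with a thin streaming endpoint of your own. Here is a minimal sketch in Python, assuming Flask and PyMongo (the route, database name, and contentType metadata field are illustrative, not part of the answer above):

```python
from bson import ObjectId
from flask import Flask, Response
from pymongo import MongoClient
import gridfs

app = Flask(__name__)
fs = gridfs.GridFS(MongoClient().mydb)  # "mydb" is a placeholder database name

@app.route("/files/<file_id>")
def serve_file(file_id):
    # Read straight from GridFS and hand the bytes to the client --
    # nothing is ever written to a temporary location on disk.
    grid_out = fs.get(ObjectId(file_id))
    meta = grid_out.metadata or {}
    return Response(grid_out.read(),
                    mimetype=meta.get("contentType", "application/octet-stream"))
```

Your HTML then references the endpoint directly, e.g. `<img src="/files/507f1f77bcf86cd799439011">`.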
Can we use binary PNGs inserted into MongoDB from MeteorJS, or do we have to stick with base64?
I inserted tons of binary PNGs into MongoDB to save space.
I can use them perfectly well from my C++ code.
But now I need some web frontend.
There are packages like ostrio:files that will do a lot of the work for you. Inserting files into the database works, but puts a load on the database and app to do basic file serving activity, which is better done by something like AWS S3.
Alternatively there is a great service called Filestack https://www.filestack.com/ which is very easy to integrate with, and has a good upload control complete with cropping and resizing. You can just store the image URLs in your database. Quick to implement, and it offloads work from your server.
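The "store only the URL" pattern described above is language-agnostic. A minimal sketch in Python, assuming boto3 and PyMongo (the bucket and collection names are placeholders):

```python
import boto3
from pymongo import MongoClient

s3 = boto3.client("s3")
images = MongoClient().mydb.images  # "mydb"/"images" are placeholder names

def save_image(local_path, key):
    # Upload the binary to S3 instead of the database...
    s3.upload_file(local_path, "my-image-bucket", key)
    url = f"https://my-image-bucket.s3.amazonaws.com/{key}"
    # ...and keep only the URL in MongoDB.
    images.insert_one({"key": key, "url": url})
    return url
```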
I'm developing an ASP.NET MVC project that will be hosted on Amazon AWS, but I have some questions about storage of the clients' files. The documentation from Amazon is not clear to me, so I'm looking for some directions and experiences here.
1 - each client has a few files with low disk-space requirements and low update frequency but very high access frequency (like brand images, and even sensitive files like certificates). Is it appropriate to store these files in the app_data folder on the web server?
2 - the most critical to me are the sensitive documents (from hundreds to dozens of thousands per client, mostly signed XML files). These files have a medium read-access frequency but a very high creation rate. One solution I found is MongoDB, which gives me some freedom to manage the storage policy and allows easy external backups, but I'm not sure about that. Other options are to use Amazon storage and handle all these files and GBs there across a lot of folders, or maybe to use a regular database and save the files as XML or binary.
My concerns are the amount of data, security, and reliability in case of disaster, as most of these documents have legal value.
You could, but storing them locally violates the shared-nothing architecture and would limit your scaling options. Amazon S3 is a good option here. You can make some files public and serve them directly from S3 (or through CloudFront), and keep others private, providing access via signed URLs.
Again, you can put the files on S3 and make them private. You will still probably store references to the files in your database. Generally it's not a great idea to store large blob files in a database, since they are often not well optimized for accessing them.
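A quick sketch of the signed-URL part in Python with boto3 (the bucket and key are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Generate a time-limited URL for a private object; anyone holding
# the URL can fetch the file until it expires.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-private-bucket", "Key": "clients/42/certificate.xml"},
    ExpiresIn=3600,  # valid for one hour
)
```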
What is the recommended way of handling file uploads to the database using Play!2 with Scala?
Depending on your specific requirements, MongoDB's GridFS will store files for you in a much more efficient and scalable manner than a relational database or a bare filesystem.
http://www.mongodb.org/display/DOCS/GridFS
There are plenty of MongoDB plugins for Play that support GridFS.
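Whatever plugin you pick, the underlying GridFS operations reduce to a put/get pair. For illustration only, a minimal sketch with PyMongo (the database name is a placeholder):

```python
from pymongo import MongoClient
import gridfs

fs = gridfs.GridFS(MongoClient().mydb)  # "mydb" is a placeholder

# Store an uploaded file; GridFS splits it into fixed-size chunks internally.
with open("upload.pdf", "rb") as f:
    file_id = fs.put(f, filename="upload.pdf")

# Retrieve it later by its id.
data = fs.get(file_id).read()
```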
Say, if I want to store PDFs or ePub files using MongoDB's GridFS, is it possible to perform full-text searching on the data files?
You can't currently do real full text search within mongo: http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo
Feel free to vote for it here:
https://jira.mongodb.org/browse/SERVER-380
Mongo is more of a general purpose scalable data store, and as of yet it doesn't have any full text search support. Depending on your use case, you could use the standard b-tree indexes with an array of all of the words in the text, but it won't do stemming or fuzzy matches, etc.
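The word-array workaround might look like this in PyMongo (the collection and field names are made up):

```python
import re
from pymongo import MongoClient

docs = MongoClient().mydb.docs  # "mydb"/"docs" are placeholder names

# Index the array field; MongoDB builds a multikey b-tree index over it.
docs.create_index("words")

def index_text(doc_id, text):
    # Store the distinct lowercased words alongside the document.
    words = sorted(set(re.findall(r"\w+", text.lower())))
    docs.update_one({"_id": doc_id}, {"$set": {"words": words}}, upsert=True)

# Exact-word lookups only -- no stemming, no fuzzy matching.
matches = docs.find({"words": "gridfs"})
```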
However, I would recommend combining MongoDB with a Lucene-based application (Elasticsearch is popular). You can store all of your data in MongoDB (binary data, metadata, etc.), and then index the plain text of your documents in Lucene. Or, if your use case is pure full-text search, you might consider just using Elasticsearch instead of MongoDB.
Update (April 2013):
MongoDB 2.4 now supports a basic full-text index! Some useful resources below.
http://docs.mongodb.org/manual/applications/text-search/
http://docs.mongodb.org/manual/reference/command/text/#dbcmd.text
http://blog.mongohq.com/blog/2013/01/22/first-week-with-mongodb-2-dot-4-development-release/
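For completeness, a minimal sketch of the 2.4-era text index with PyMongo (collection and field names assumed):

```python
from pymongo import MongoClient

books = MongoClient().mydb.books  # "mydb"/"books" are placeholder names

# Create the text index introduced in MongoDB 2.4.
books.create_index([("content", "text")])

# On current servers you query it with $text; on 2.4 itself the
# equivalent was the database-level "text" command.
results = books.find({"$text": {"$search": "gridfs"}})
```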
Not using MongoDB APIs, not that I know of. GridFS seems to be designed more like a simplified file system, with APIs that provide a straightforward key-value semantic. On their project-ideas page they list two things that would help you if they existed in a production-ready state:
GridFS FUSE, which would allow you to mount GridFS as a local file system and then index it like you would index files on your disk
Real-Time Full Text search integration with tools like Lucene and Solr. There are some projects on github and bitbucket that you might want to check out.
Also look at ElasticSearch. I have seen some integration with Mongo, but I am not sure how much has been done to tap into GridFS (GridFS attachment support is mentioned, but I haven't worked with it to know for sure). Maybe you will be the one to build it and then open-source it? Should be a fun adventure.
The Java web app I'm developing allows users to upload files (pictures and documents) to their profiles and define access rules for those files (i.e. define which of the other users are able to view / download each file). The access control / permission system is custom made, and the rules are stored in MongoDB alongside the user's profile and the actual file entry.
Knowing that I need the application and storage to be distributed and fault-tolerant, I need to figure out the best strategy for file storage.
Should I store the files inside MongoDB, in the files collection where the file document containing the description and access rules is located?
Or should I store the files in the server's file system and keep the path in the MongoDB document? With the filesystem approach, will I still be able to enforce the user-defined access permissions, and how?
Finally, with the filesystem approach, how do I distribute files across servers? Should I use dedicated servers for this, or can I store the files on the web-app servers or the MongoDB servers?
Thanks a lot for all your insights! Any help or feedback appreciated.
Alex
There are several alternatives:
put files in a storage service (e.g. S3): easy and lots of space, but slower access
put files in a local filesystem: fast, but doesn't scale
put files in MongoDB documents: easy, powerful and scalable, but limited to 16MB per document
use MongoDB's GridFS layer. Functionality is limited, but it is made for scalability (thanks to sharding) and is fairly fast too. Note you can put info about the file (permissions etc.) right into the file's metadata object; see the sketch below.
In your case it sounds like the last option may be best; quite a few users have switched from the filesystem to GridFS and it has worked very well for them.
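A sketch of the metadata idea with PyMongo (the field names for the access rules are made up; adapt them to your own permission model):

```python
from pymongo import MongoClient
import gridfs

fs = gridfs.GridFS(MongoClient().mydb)  # "mydb" is a placeholder

# Store the access rules right in the file's metadata object.
with open("photo.png", "rb") as f:
    file_id = fs.put(f, filename="photo.png",
                     metadata={"owner": "alex", "allowed": ["bob", "carol"]})

def fetch_for(user, fid):
    # Enforce the custom permission check before returning any bytes.
    grid_out = fs.get(fid)
    meta = grid_out.metadata or {}
    if user != meta.get("owner") and user not in meta.get("allowed", []):
        raise PermissionError("access denied")
    return grid_out.read()
```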
Things to keep in mind:
gridfs sharding works but is not perfect: usually only the file chunks are sharded, not the file metadata. Not a big deal, but the shard holding the metadata must be very safe.
it can be beneficial to run GridFS in a separate MongoDB cluster from your core data, since the requirements (storage, backup, etc.) are usually different.