The Java web app I'm developing allows users to upload files (pictures and documents) to their profiles and to define access rules for those files (i.e., which of the other users may view or download each file). The access-control/permission system is custom-made, and the rules are stored in MongoDB alongside the user's profile and the file's document.
Since the application and storage need to be distributed and fault-tolerant, I need to figure out the best strategy for file storage.
Should I store the files inside MongoDB, in the files collection where the document holding the description and access rules lives?
Or should I store the files on the server's file system and keep the path in the MongoDB document? With the filesystem approach, will I still be able to enforce the user-defined access permissions, and how?
Finally, with the filesystem approach, how do I distribute files across servers? Should I use dedicated servers for this, or can I store the files on the web-app servers or the MongoDB servers?
Thanks a lot for all your insights! Any help or feedback appreciated.
Alex
There are several alternatives:
put files in a storage service (e.g. S3): easy and plenty of space, but weaker performance
put files in a local filesystem: fast, but doesn't scale
put files in MongoDB documents: easy, powerful, and scalable, but limited to 16 MB per document
use MongoDB's GridFS layer: its functionality is limited, but it is built for scalability (thanks to sharding) and is fairly fast too. Note that you can put information about the file (permissions, etc.) right into the file's metadata object.
In your case the last option sounds best; quite a few users have switched from the filesystem to GridFS and it has worked very well for them. A minimal sketch of the metadata idea follows.
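For illustration, here is a sketch using the Node.js driver's GridFSBucket (the Java driver exposes an equivalent GridFSBuckets API); the bucket name 'uploads' and the permission field names are assumptions for this example, not something your schema dictates:

const { GridFSBucket } = require('mongodb');
const fs = require('fs');

// upload a file, keeping its access rules in the GridFS metadata object
function uploadWithPermissions(db, localPath, name, ownerId, allowedUserIds) {
  const bucket = new GridFSBucket(db, { bucketName: 'uploads' });
  return new Promise((resolve, reject) => {
    fs.createReadStream(localPath)
      .pipe(bucket.openUploadStream(name, {
        metadata: { owner: ownerId, allowedUsers: allowedUserIds }
      }))
      .on('error', reject)
      .on('finish', resolve);
  });
}

// stream a file back only if the requesting user is allowed to see it
async function downloadIfAllowed(db, name, userId, out) {
  const file = await db.collection('uploads.files').findOne({
    filename: name,
    $or: [{ 'metadata.owner': userId }, { 'metadata.allowedUsers': userId }]
  });
  if (!file) throw new Error('not found or access denied');
  new GridFSBucket(db, { bucketName: 'uploads' }).openDownloadStream(file._id).pipe(out);
}

Because the rules live in uploads.files, the permission check is a single query before the bytes ever leave the database.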
Things to keep in mind:
GridFS sharding works but is not perfect: usually only the chunk data is sharded, not the metadata. Not a big deal, but the shard holding the metadata must be very safe.
it can be beneficial to run GridFS in a separate MongoDB cluster from your core data, since the requirements (storage, backup, etc.) are usually different.
We are working on an application that will be offered both as a web-based solution and as a cross-platform desktop solution built with Electron.
Due to customer requirements, the desktop client cannot use "the cloud" to store data; all data should be stored on the local machine, or, even better, the user should have the option to keep the database/data file on an external HDD so that another user on the same local network can use the same data file.
We've been looking at NeDB, PouchDB, etc., but all of these use either Web SQL or IndexedDB in the browser itself to store the data.
NeDB can theoretically use the file system, but that seems to be possible only for Node-Webkit apps.
Another option is of course MongoDB, but it requires setting up and running a server. Since our users would set that up on their own machines, it would work for a single user but would make it very hard for them to share the data (note: assume users with little technical know-how).
Is there a way to force NeDB to persist data in a file instead of the in-browser database?
Alternatively, does anyone know of a file-based, compact database that plays well with Electron/Node?
We'd prefer a NoSQL database, but file-based SQL databases will be considered as well.
I have some experience with NeDB in an Electron app and I can say it will definitely work on the filesystem.
How are you initializing NeDB (or whatever your database choice is)? Also, are you initializing it in the main process or a renderer process? If you can share that, I think we can trace this to a configuration issue.
This is how you start NeDB with a persistent datastore that saves to disk:
var Datastore = require('nedb');
// persistent datastore: saves to disk at this path; autoload opens it on creation
var db = new Datastore({ filename: 'path/to/datafile', autoload: true });
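Once it's initialized that way, the usual NeDB calls read and write that on-disk datafile, so documents survive app restarts:

// insert a document, then query it back -- both operations hit the datafile
db.insert({ user: 'alice', file: 'photo.png' }, function (err, newDoc) {
  db.find({ user: 'alice' }, function (err, docs) {
    console.log(docs); // the inserted doc, with an auto-generated _id
  });
});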
I think MongoDB is going to be overkill for an Electron app (it's really meant to be a high-performance, distributed database running in the cloud).
Another option you could consider is LevelDB (a key/value store that can persist to the filesystem), which is popular in the Node community. (EDIT 4/17/17: IndexedDB uses LevelDB under the hood, so if you go that route, you may as well just use that.)
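For a sense of what that looks like, here's a minimal sketch assuming the level npm package (callback-style API of the versions current in 2017):

var level = require('level');
// opens (or creates) a LevelDB store in the ./userdata directory
var db = level('./userdata', { valueEncoding: 'json' });

db.put('user:1', { name: 'alice' }, function (err) {
  if (err) return console.error(err);
  db.get('user:1', function (err, value) {
    console.log(value); // { name: 'alice' }
  });
});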
One aspect I would definitely evaluate carefully is: how difficult is this database going to be to package and distribute? How do I integrate it into my build system? Level and NeDB can be included simply via npm install, and any native-code compilation is handled seamlessly by node-gyp, which is as simple as it gets. Bundling Mongo, however, will require some work to get a working build for each platform.
I am writing a server that allows users to upload images. It appears that most people store those files directly on the filesystem.
My question is whether that is really the right way to do it. I'm not familiar with the capacity of a server, but I'm curious, for example, how to make sure the server does not run out of disk space.
I would also like to know how to organize those files for many different users. Is it enough to store them under a path like war/images/<user-database-id>/<uuid-for-image>.(jpeg|png), using the user's database ID, or is there a lot more to consider when it comes to storing images?
I think your best bet would be to use a cloud storage system such as Amazon S3, Google Cloud Storage, Rackspace, or MS Azure.
Using a path like the one you suggested ought to work, but you could also omit the user-database-id segment if your database already gives you the list of objects owned by each user.
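If you do go the cloud route, the layout from your question maps directly onto object keys. A sketch with the AWS SDK for Node (v2-style API; the bucket name is a placeholder):

const AWS = require('aws-sdk');
const { randomUUID } = require('crypto');
const s3 = new AWS.S3();

// store an image under images/<user-id>/<uuid>, mirroring the layout from the question
async function storeImage(userId, buffer, contentType) {
  const key = 'images/' + userId + '/' + randomUUID();
  await s3.putObject({
    Bucket: 'my-app-uploads', // placeholder bucket name
    Key: key,
    Body: buffer,
    ContentType: contentType
  }).promise();
  return key; // persist this key in the user's database record
}

This also sidesteps the disk-space worry from the question: the bucket grows as needed and you pay per gigabyte stored.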
I'm developing an ASP.NET MVC project that will be hosted on Amazon AWS, but I have some questions about storing the clients' files. Amazon's documentation is not clear to me, and I'm looking for directions and first-hand experience here.
1 - Each client has a few files with low disk-space requirements and a low update frequency, but a very high access frequency (brand images, and even sensitive files such as certificates). Is it appropriate to store these files in the App_Data folder on the web server?
2 - Most critical to me are the sensitive documents (from hundreds to tens of thousands per client, mostly signed XML files). These files have a medium read frequency but a very high creation rate. One solution I found is MongoDB, which gives me some freedom to manage the storage policy and makes external backups easy, but I'm not sure about it. Other options are to use Amazon's storage and manage all these files and gigabytes there across many folders, or to use a regular database and save the files as XML or binary blobs.
My concerns are the amount of data, security, and reliability in case of disaster, as most of these documents have legal value.
You could, but storing them locally violates the shared-nothing architecture and would limit your scaling options. Amazon S3 is a good option here. You can make some files public and serve them directly from S3 (or through CloudFront), and keep others private, providing access via signed URLs (see the sketch below).
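A sketch of the private-file case with the Node SDK (the .NET SDK's AmazonS3Client.GetPreSignedURL works the same way); bucket and key are placeholders:

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// short-lived URL granting read access to a single private object
const url = s3.getSignedUrl('getObject', {
  Bucket: 'client-documents',      // placeholder
  Key: 'certs/client-42/cert.xml', // placeholder
  Expires: 300                     // seconds until the link stops working
});
// hand this URL to the client; S3 enforces the expiry itself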
Again, you can put the files on S3 and make them private. You will probably still store references to the files in your database. Generally it's not a great idea to store large blobs in a database, since databases are usually not well optimized for accessing them.
I created a MongoDB database following this tutorial:
http://docs.mongodb.org/manual/tutorial/enable-authentication-without-bypass/
created the database
created an admin user
ran mongod with the --auth parameter
That works fine.
But how can I really protect the database files from unauthorized access?
If someone took my database files and ran mongod without the --auth parameter, they would have access to the whole database.
Is there a way to protect the database files themselves, so that one can't simply run mongod without --auth?
Best regards
Tobias
Encrypting data files is only part of an overall security strategy: if someone can copy files from your computer or a backup, they may also be able to snag your encryption keys from the same source. The MongoDB manual has a Security section that covers general best practices, including access control, network exposure, auditing, and a high-level checklist.
If you want to encrypt your MongoDB data files, you will need to look into a solution for "encryption at rest".
As of MongoDB 2.6 there is no built-in support for data encryption, but a number of open-source and commercial solutions are available.
The broad categories of encryption at rest are application-level and storage-level encryption (which can be used independently or together, depending on your requirements). Encryption adds some performance overhead for disk I/O, so factor this into your testing and evaluation of candidate solutions.
A few examples of encryption at rest solutions are:
LUKS (Linux Unified Key Setup)
Windows Bitlocker Drive Encryption
For more information on supported options, have a read of the Encryption at Rest section of the MongoDB security documentation.
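To give a flavour of the application-level category, here is a minimal sketch using Node's built-in crypto module (AES-256-GCM) to encrypt a field before it is written to MongoDB; key management is deliberately out of scope, and the key source shown is an assumption:

const crypto = require('crypto');

// a 32-byte key loaded from the environment -- never hardcode it
const key = Buffer.from(process.env.FIELD_KEY, 'hex');

function encryptField(plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  // store iv and auth tag alongside the ciphertext so the field can be decrypted later
  return {
    iv: iv.toString('hex'),
    tag: cipher.getAuthTag().toString('hex'),
    data: ciphertext.toString('hex')
  };
}

function decryptField(field) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, Buffer.from(field.iv, 'hex'));
  decipher.setAuthTag(Buffer.from(field.tag, 'hex'));
  return Buffer.concat([
    decipher.update(Buffer.from(field.data, 'hex')),
    decipher.final()
  ]).toString('utf8');
}

Note this protects only the fields you encrypt; anyone who copies the data files still sees the rest of each document, which is why the storage-level options above are usually the first line of defence.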
I have been looking at GridFS to store images and other files.
It has some really nice features, but we currently have to retrieve files and store them in a temporary location in order to render the HTML, which costs us CPU time.
What would be a good strategy for using GridFS while minimizing this cost?
MongoDB offers a simple REST interface, so you don't have to download the file to temporary storage first: you can put the file's REST link directly into your HTML.
See the MongoDB simple REST interface documentation.
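If you'd rather not depend on the REST interface, the drivers can also stream a GridFS file straight into the HTTP response, so no temporary file is needed either way. A sketch with the Node driver and Express (database and route names are assumptions):

const express = require('express');
const { MongoClient, GridFSBucket } = require('mongodb');

async function main() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const bucket = new GridFSBucket(client.db('app'));
  const app = express();

  // pipe the file from GridFS straight into the response -- no temp file on disk
  app.get('/files/:name', (req, res) => {
    bucket.openDownloadStreamByName(req.params.name)
      .on('error', () => { if (!res.headersSent) res.sendStatus(404); })
      .pipe(res);
  });

  app.listen(3000);
}

main();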