Store files on disk or MongoDB

I am creating a mongodb/nodejs blogging system (similar to wordpress).
I currently have images being saved on disk with a pointer stored in Mongo. Since I already keep all sessions in MongoDB to enable easy load balancing across servers, I was wondering whether storing the actual files in Mongo would also be a smart idea, for easy multi-server setups and/or performance gains.
If everything is stored in the DB, you can simply spawn more web servers and/or Mongo replicas to scale horizontally.
Opinions?

MongoDB is a good option for storing your files (I'm talking about GridFS), especially for the use case you described above.
When you store files in MongoDB (GridFS, not regular documents), you get all of its replication and sharding capability for free, which is awesome.
If you have to spawn a new server and the files are already in MongoDB, all you have to do is enable replication (and thus scale horizontally). I'm sure this can save you a lot of headaches.
Resources:
Is GridFS fast and reliable enough for production?
http://www.mongodb.org/display/DOCS/GridFS
http://www.coffeepowered.net/2010/02/17/serving-files-out-of-gridfs/
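To give a rough sense of how GridFS lays files out: it splits each file into fixed-size chunks stored in an fs.chunks collection, with metadata in fs.files. A small sketch of the chunk math, assuming the 255 KiB default chunk size used by current drivers (the number is configurable per bucket, so treat it as an assumption):

```javascript
// Sketch: how many chunks GridFS creates for a file of a given size.
// 255 KiB is the default chunk size in current MongoDB drivers; it is
// configurable, so this constant is an assumption for illustration.
const DEFAULT_CHUNK_SIZE = 255 * 1024;

function gridfsChunkCount(fileSizeBytes, chunkSize = DEFAULT_CHUNK_SIZE) {
  if (fileSizeBytes === 0) return 0; // empty files store no chunks
  return Math.ceil(fileSizeBytes / chunkSize);
}

// A 1 MiB image -> 5 chunks (4 full chunks plus 1 partial chunk)
console.log(gridfsChunkCount(1024 * 1024)); // 5
```

Because each chunk is an ordinary document, replication and sharding apply to file data the same way they apply to everything else in the database.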

Aside from GridFS, you might be considering a cloud-based deployment. In that case, you could store files in cloud-specific storage (Windows Azure has Blob Storage, for example). Sticking with Windows Azure for this example (since that's what I work with), you'd reference a file by its storage account URI. For example:
https://mystorageacct.blob.core.windows.net/mycontainer/myvideo.wmv
Since you'd be storing the MongoDB database itself in its own blob (mounted as a disk volume on your Linux or Windows VM), you could then choose to store your files in either the same storage account or a completely different one (with each storage account providing up to 200 TB of storage).
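The URI above is just the storage account, container, and blob name composed into a fixed pattern. A tiny sketch (the names are the same illustrative ones used above):

```javascript
// Sketch: composing a Windows Azure Blob Storage URI.
// The account, container, and blob names are illustrative only.
function blobUri(account, container, blob) {
  return `https://${account}.blob.core.windows.net/${container}/${encodeURIComponent(blob)}`;
}

console.log(blobUri('mystorageacct', 'mycontainer', 'myvideo.wmv'));
// -> https://mystorageacct.blob.core.windows.net/mycontainer/myvideo.wmv
```

Storing that URI (or just the container/blob names) in MongoDB gives you the same pointer-in-the-DB pattern as the disk approach, with the blob service handling the bytes.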

Storing images as regular documents in MongoDB would be a bad idea: resources that could be used to serve a large amount of informational data would instead be spent serving files.
Have a look at MongoDB's file storage, GridFS; that should solve your problem
of storing images while providing horizontal scalability as well.
http://www.mongodb.org/display/DOCS/GridFS
http://www.mongodb.org/display/DOCS/GridFS+Specification

Related

Moodle asynchronous replication

I'm looking to deploy Moodle in the cloud; however, I have some 50-odd sites which require access to this Moodle, possibly even while temporarily offline. So I'm looking into replicating Moodle down onto each site. From what I understand, there are two data stores that require replication: moodledata and the database (PostgreSQL in our case). moodledata, if I'm not mistaken, contains the multimedia data, and the database contains, among other things, all the user records. Luckily the multimedia data will be centralized and thus synched only one way, down to the nodes; that seems doable. Where I'm stuck is how to handle the Postgres database, where the sync will need to be bidirectional.

PostgreSQL data_directory on Google Cloud Storage, possible?

I am new to Google Cloud and was wondering if it is possible to run a PostgreSQL container on Cloud Run with PostgreSQL's data_directory pointed at Cloud Storage?
If so, could you point me to some tutorials/guides on this topic? And what are the downsides of this approach?
Edit-0: Just to clarify what I am trying to achieve:
I am learning Google Cloud and want to write a simple application to work with it. I have decided that the backend code will run as a container under Cloud Run and the persistent data (i.e. the database file) will reside on Cloud Storage. Because this is a small app for learning purposes, I am trying to use as few moving parts as possible on the backend (and only ones that are always free). Both PostgreSQL and the backend code will reside in the same container, except for the actual data file, which will reside in Cloud Storage. Is this approach correct? Are there better approaches to achieve the same minimalism?
Edit-1: Okay, I got the answer! The Google documentation here mentions the following:
"Don't run a database over Cloud Storage FUSE!"
The buckets are not meant to store database information, some of the limits are the following:
There is no limit to writes across multiple objects, which includes uploading, updating, and deleting objects. Buckets initially support roughly 1000 writes per second and then scale as needed.
There is no limit to reads of objects in a bucket, which includes reading object data, reading object metadata, and listing objects. Buckets initially support roughly 5000 object reads per second and then scale as needed.
An alternative is to use a separate persistent disk for your PostgreSQL database on Google Compute Engine. You can follow the “How to Set Up a New Persistent Disk for PostgreSQL Data” Community Tutorial.

Google storage operations extremely slow when using Customer Managed Encryption Key

We're planning on switching from Google-managed keys to our own keys (we work with sensitive medical data) but are struggling with the performance degradation when we turn on CMEK. We move many big files around storage in our application (5-200GB files), both with the Java Storage API and gsutil. The former stops working on files even 2GB in size (it times out, and when timeouts are raised it silently does not copy the files), and the latter just takes about 100x longer.
Any insights into this behaviour?
When using CMEK, you are actually adding an additional layer of encryption on top of Google-managed encryption keys, not replacing them. As for gsutil, if your moving process involves including the objects' hashes, gsutil will perform an additional operation per object, which might explain why moving the big files takes so much longer than usual.
As a workaround, you may instead use resumable uploads. This type of upload works best with large files, since it uploads a file in multiple chunks and allows you to resume the operation even if the flow of data is interrupted.
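The resumable-upload idea boils down to splitting the object into aligned byte ranges and uploading them one at a time, retrying only the range that failed. A minimal sketch of the range math (Cloud Storage requires each chunk except the last to be a multiple of 256 KiB; the 8 MiB chunk size below is an assumed value, not an API default):

```javascript
// Sketch: byte ranges for a chunked resumable upload.
// Each chunk except the last must be a multiple of 256 KiB;
// the 8 MiB default below is an assumption for illustration.
const CHUNK_ALIGN = 256 * 1024;

function resumableChunks(totalBytes, chunkSize = 8 * 1024 * 1024) {
  if (chunkSize % CHUNK_ALIGN !== 0) {
    throw new Error('chunk size must be a multiple of 256 KiB');
  }
  const ranges = [];
  for (let start = 0; start < totalBytes; start += chunkSize) {
    const end = Math.min(start + chunkSize, totalBytes) - 1; // inclusive
    ranges.push({ start, end }); // maps to "Content-Range: bytes start-end/total"
  }
  return ranges;
}
```

If chunk N fails, only bytes in that one range are re-sent; the earlier chunks stay committed on the server side.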

How does encrypted AWS PostgreSQL RDS work under the hood?

I read that when you create an encrypted AWS Postgres RDS instance, it encrypts the underlying EBS volume created for it, along with all read replicas, backups, and snapshots.
Also, when I queried and inserted data into the DB, it worked the same as an unencrypted DB would and returned results in plain text.
I have some questions regarding how exactly it is working under the hood.
Here they are:
How does search work?
A simple value-based search could be performed by encrypting the search term and then searching for it in the encrypted RDS. But my searches in PostgreSQL also worked on nested JSONB objects. How is that achieved?
How does a partial search work?
I was able to do a partial search (a LIKE query) on a name and address inside a JSONB object. How can a partial search be done on an encrypted DB?
How does insertion work?
I have some JSONB columns in my PostgreSQL database, and I was able to do partial inserts on my JSONB objects. It's a cool feature of PostgreSQL, but how is it achieved when the whole JSONB value is encrypted?
PS: I have some knowledge of how a DB stores and retrieves data under the hood, but I can't get my head around how that works if everything is encrypted. Pardon me if I am completely wrong on some concepts.
I would really appreciate it if someone could shed light on this, as I was not able to find it on the internet.
Thanks
You are overthinking this. The DB files on the EBS volume do not appear encrypted from the perspective of the PostgreSQL process running on the RDS server. The encryption/decryption happens at the hypervisor layer and is transparent to any software running on the VM. All of the features you're asking about work exactly the same as they would on an unencrypted EBS volume, because when the DB service requests data from disk, it receives unencrypted data.
Think of it like going into the BIOS on your laptop and enabling some sort of full-disk encryption. Would you expect that to break every database engine you tried to run on your computer? Would you expect all software to somehow know how to deal with your laptop's BIOS disk encryption? No: the software you run (like PostgreSQL) doesn't need to be made aware of how to decrypt data on disk; the encryption is transparent, and the disk appears to running software as if it were not encrypted.

Storing and managing video files

What approach is considered best for storing and managing video files? Databases are used for small textual data, but are they good enough to handle huge amounts of video/audio data? Are databases really the right solution here?
Apart from the hard disk space required to centrally manage video/audio/image content, what are the requirements for hosting such a server?
I would not store big files, like videos, in the database; instead, I would:
store the files on disk, in the filesystem,
and only store the name of the file (plus some metadata like content-type and size) in the database.
You have to consider, at least, these few points:
a database is generally harder to scale than disks:
having a DB that is several dozens/hundreds of GB (or more) in size because of videos will make many things (backups, for example) really hard.
do you want to put more read load on your DB servers just to serve... files?
same thing when writing "files" to your DB, by the way
serving files (like, say, videos) from the filesystem is something that webservers do pretty well -- you can even use something lighter (like lighttpd, nginx, ...) than the webserver used to run your application (Apache, IIS, ...), if needed
it allows your application and/or some background scripts to do tasks (like generating previews/thumbnails, for example) without involving the DB server
Using plain old files will also probably make things much easier the day you want to use some kind of CDN to distribute your videos to users.
And here are a couple of other questions/answers that might interest you :
Storing Images in DB - Yea or Nay?
Storing Images : DB or File System -
Store images(jpg,gif,png) in filesystem or DB?
store image in database or in a system file ?
Those questions/answers are about images, but the idea is exactly the same for videos -- except that videos will be much bigger than images!