What are fs.chunks and fs.files in mLab cloud - MongoDB

I am saving data to mLab storage but I ran out of space, so I went into my account and realized that I had 490 MB worth of data in fs.chunks. Can I delete the fs.chunks and fs.files collections, or will something very bad happen? I'm very confused by these two collections, so more clarification would be much appreciated. Do I need fs.chunks and fs.files?

These collections are created when an application stores data in a database using GridFS.
In general, it's much more efficient to store files in a file storage service such as AWS S3 and store references to them in the database, as opposed to storing the files directly in the database.
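For context, here is a minimal sketch (assuming the pymongo driver and placeholder connection/database names) of how storing a file through GridFS is what creates those two collections:

```python
import gridfs
from pymongo import MongoClient

# Hypothetical connection string and database name - adjust to your own deployment.
db = MongoClient("mongodb://localhost:27017")["mydb"]
fs = gridfs.GridFS(db)

# Each file stored through GridFS produces one fs.files document (metadata)
# and one or more fs.chunks documents (the binary content split into ~255 KB chunks).
file_id = fs.put(b"example payload", filename="report.pdf")

print(db["fs.files"].count_documents({}))   # file metadata documents
print(db["fs.chunks"].count_documents({}))  # chunk documents holding the actual bytes
```

Deleting fs.files on its own orphans the matching fs.chunks documents, so to reclaim space either delete files through the GridFS API (e.g. fs.delete(file_id)) or drop both collections together.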

Related

Is there a way to upload a deep learning model to a PostgreSQL database?

I have deep learning models (TensorFlow, in HDF5 format) which I want to upload to a PostgreSQL database.
A single model may be up to 500 MB, and the models need to be updated and uploaded/downloaded from the database frequently (e.g., once every 30 minutes).
I'm a beginner with PostgreSQL, so any information/recipes to get started would be appreciated.
A relational DB is the wrong tool for this. Store a link to an S3 bucket, for example, or to your own private cloud if the model is private.
Storing large binary data in a relational DB is possible but unwise. Also, you would eventually have to copy your model from the DB to disk for TensorFlow to load it anyway, so why not store the model directly on disk?
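To make the "store a reference, not the blob" idea concrete, here is a hedged sketch assuming boto3, psycopg2, an existing S3 bucket, and a hypothetical models table (all names below are placeholders):

```python
import boto3
import psycopg2

# Upload the model artifact to S3 (bucket and key are placeholders).
s3 = boto3.client("s3")
s3.upload_file("model.h5", "my-model-bucket", "models/model-v42.h5")

# Store only the reference (the S3 key) in PostgreSQL.
conn = psycopg2.connect("dbname=ml user=postgres")  # hypothetical connection string
with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO models (name, s3_key, updated_at) VALUES (%s, %s, now())",
        ("model-v42", "models/model-v42.h5"),
    )
```

When a model needs to be loaded, look up its key, download the object to disk, and hand that path to TensorFlow.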

Add folder in mongoose

Can you create something similar to a folder in mongoose (or perhaps MongoDB)?
I've tried creating separate databases for each new so-called "folder", but it gets a bit tedious after a while.
MongoDB does not store data in folders. It stores data as documents, since it is a document-oriented datastore.
If you want storage that actually resembles a folder hierarchy, you might have to look into an object store such as AWS S3 (cloud) or MinIO (local).
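That said, a common workaround is to emulate folders with a path field on each document. A sketch in pymongo (the same idea carries over to Mongoose); database and collection names are hypothetical:

```python
from pymongo import MongoClient

col = MongoClient()["mydb"]["files"]  # placeholder names

col.insert_one({"name": "invoice.pdf", "folder": "/2024/january"})
col.insert_one({"name": "photo.png", "folder": "/2024/january/images"})

# "Listing" everything under /2024/january behaves like descending into a folder.
for doc in col.find({"folder": {"$regex": "^/2024/january"}}):
    print(doc["folder"], doc["name"])

col.create_index("folder")  # an index on the path field keeps prefix queries fast
```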

Will MongoDB GridFS MyFiles.files get deleted automatically?

I have some files uploaded to GridFS, which creates Files.files and Files.chunks, but now I see that the Files.files data got deleted while the Files.chunks data is still there. I'm not able to access the files without the Files.files ObjectId. Will these files get deleted after some time or once some storage limit is reached?

Why mongodump does not backup indexes?

While reading the mongodump documentation, I came across this information.
"mongodump only captures the documents in the database in its backup data and does not include index data. mongorestore or mongod must then rebuild the indexes after restoring data."
Considering that indexes are also a critical piece of the database puzzle and they are required to be rebuilt after a restore, why doesn't mongodump have an option to take backups with indexes?
I get that there are two advantages of not backing up indexes as a default option:
1. We save time which would otherwise be required for backup and restore of indexes.
2. We save space required for storing the backups.
But why not have it as an option at all?
mongodump creates a binary export of the data in a MongoDB database (in BSON format). The index definitions are backed up in the accompanying <collection>.metadata.json files, so mongorestore can recreate the original data & indexes.
There are two main reasons that the actual indexes cannot be backed up with mongodump:
1. Indexes point to locations in the data files, and those data files are not part of a dump that only exports the documents (as opposed to a full file copy of the data files).
2. The on-disk format of indexes is storage-engine specific, whereas mongodump is intended to be storage-engine independent.
If you want a full backup of data & indexes, you need to back up by copying the underlying data files (typically using filesystem or EBS snapshots). This is a more common option for larger deployments, as mongodump requires reading all data into the mongod process (which will evict some of your working set if your database is larger than memory).
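As a rough illustration of where the index definitions live in a dump (the paths and names below are assumptions; adjust them to your own dump directory):

```python
import json

# mongodump writes a <collection>.metadata.json next to each <collection>.bson file.
with open("dump/mydb/mycollection.metadata.json") as f:
    meta = json.load(f)

# mongorestore uses these definitions to rebuild the indexes after loading the documents.
for idx in meta.get("indexes", []):
    print(idx["name"], idx["key"])
```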

CarrierWave: save image to GridFS and upload to S3 in the background

Is there any way to save an image to Mongo's GridFS and then asynchronously upload it to S3 in the background?
Maybe it is possible to chain uploaders?
The problem is this: multiple servers are used, so the server that saved the image to its hard drive and the server running the background process can be different machines.
Also:
1. it should be removed from GridFS once it has been uploaded to S3;
2. it should be automatically removed from S3 when the corresponding entity is destroyed.
Thanks.
What does your deployment architecture look like? I'm a little confused when you say "multiple servers" - do you mean multiple mongod instances? Also, your requirements are a bit confusing. According to requirement 1, once a file is uploaded to S3 the GridFS copy should be removed, so a file cannot exist in both S3 and GridFS; requirement 2 therefore seems to contradict the first, i.e., the file shouldn't exist in GridFS in the first place. Are you preserving some files on both GridFS and S3?
If you are running a replica set or sharded cluster, you could create a tailable cursor on the oplog, filtered to your GridFS collection (you can also do this on a single node, although it's not recommended). When you see an insert operation (it will look like 'op':'i') you could execute a script, or do something in your application, to grab the file from GridFS and push it to S3. Similarly, when you see a delete operation ('op':'d') you could summarily delete the file from S3.
The beauty of a tailable cursor is that it allows for asynchronous operations - you can have another process monitoring the oplog on a different server and performing the appropriate actions.
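For example, a hedged sketch of that approach using pymongo and boto3 (database, collection, and bucket names are placeholders, and a replica set is assumed so the oplog exists):

```python
import boto3
import gridfs
from pymongo import MongoClient, CursorType

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client["mydb"]
fs = gridfs.GridFS(db)
s3 = boto3.client("s3")

# Tail the oplog, watching only operations on the GridFS metadata collection.
oplog = client["local"]["oplog.rs"]
cursor = oplog.find(
    {"ns": "mydb.fs.files"},
    cursor_type=CursorType.TAILABLE_AWAIT,  # keep the cursor open and wait for new entries
)

for entry in cursor:
    if entry["op"] == "i":        # insert: push the newly stored file to S3
        grid_file = fs.get(entry["o"]["_id"])
        s3.upload_fileobj(grid_file, "my-bucket", str(grid_file._id))
    elif entry["op"] == "d":      # delete: remove the mirrored object from S3
        s3.delete_object(Bucket="my-bucket", Key=str(entry["o"]["_id"]))
```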
I used a temporary variable to store the file to GridFS and made a Worker (see this) to perform the async upload from GridFS to S3.
Hope this helps somebody, thanks.