Can you create something similar to a folder in mongoose (or perhaps MongoDB)?
I've tried creating separate databases for each new so-called "folder", but it gets a bit tedious after a while.
MongoDB does not store data in a folder format. It is a document-oriented datastore, so it stores data as documents (grouped into collections).
If you want a database or storage option resembling folders, you might have to look into object storage such as AWS S3 (cloud) or MinIO (self-hosted).
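For example, here is a small sketch of the "folders as key prefixes" idea with boto3 against S3 (MinIO speaks the same API via its endpoint); the bucket and key names are made up:

    import boto3

    # For MinIO, pass endpoint_url="http://localhost:9000" plus its credentials.
    s3 = boto3.client("s3")

    # "Folders" are just key prefixes: uploading with a prefix creates the hierarchy.
    s3.put_object(Bucket="my-bucket", Key="reports/2020/summary.json", Body=b"{}")

    # Listing by prefix then behaves like listing a folder.
    resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="reports/2020/")
    for obj in resp.get("Contents", []):
        print(obj["Key"])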
I have deep learning models (TensorFlow, in HDF5 format) that I want to upload to a PostgreSQL database.
A single model may be up to 500 MB, and the models need to be updated and uploaded/downloaded from the database frequently (e.g., once every 30 minutes).
I'm a beginner with PostgreSQL, so any information/recipes to get started would be appreciated.
A relational DB is the wrong tool for this. Store a link to an S3 bucket, for example, or to your own private cloud if the model is private.
Storing large binary data in a relational DB is possible but unwise. Also, you eventually have to copy the model from the DB to disk for TensorFlow to load it anyway. Why not store the model directly on disk?
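For example, a minimal sketch of the "bytes in S3, reference in PostgreSQL" approach, assuming boto3 and psycopg2 are available and using made-up bucket/table names:

    import boto3
    import psycopg2

    def upload_model(path, model_name, version):
        # Put the HDF5 file itself in object storage.
        s3 = boto3.client("s3")
        key = f"models/{model_name}/v{version}/model.h5"
        s3.upload_file(path, "my-model-bucket", key)  # hypothetical bucket name

        # Keep only a small reference row in PostgreSQL.
        conn = psycopg2.connect(dbname="ml_meta", user="postgres")  # adjust credentials
        with conn, conn.cursor() as cur:
            cur.execute("""
                CREATE TABLE IF NOT EXISTS models (
                    name TEXT, version INT, s3_key TEXT,
                    updated_at TIMESTAMPTZ DEFAULT now(),
                    PRIMARY KEY (name, version)
                )""")
            cur.execute(
                "INSERT INTO models (name, version, s3_key) VALUES (%s, %s, %s) "
                "ON CONFLICT (name, version) DO UPDATE "
                "SET s3_key = EXCLUDED.s3_key, updated_at = now()",
                (model_name, version, key))
        conn.close()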
I am saving data to mLab storage, but I ran out of space, so I went into my account and realized that I had 490 MB worth of data in fs.chunks. Can I delete the fs.chunks and fs.files collections, or will something very bad happen? I'm very confused by these two collections, so more clarification would be much appreciated. Do I need fs.chunks and fs.files?
These collections are created when an application stores data in the database using GridFS.
In general, it's much more efficient to store files in a file storage service such as AWS S3 and keep references to them in the database, as opposed to storing the files directly in the database.
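To illustrate where those collections come from, here is a small pymongo/GridFS sketch (the connection string and file name are placeholders):

    import gridfs
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # replace with your mLab URI
    db = client["mydb"]
    fs = gridfs.GridFS(db)  # uses the default "fs" bucket -> fs.files / fs.chunks

    with open("photo.jpg", "rb") as f:
        file_id = fs.put(f, filename="photo.jpg")

    # fs.files holds one metadata document per file; fs.chunks holds the binary
    # content split into ~255 KB chunks.
    print(db["fs.files"].find_one({"_id": file_id}))
    print(db["fs.chunks"].count_documents({"files_id": file_id}))

    # Deleting through GridFS removes the matching documents from both collections.
    fs.delete(file_id)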
We have a MongoDB database sitting at 600 GB. We've deleted a lot of documents and, in the hope of shrinking it, we repaired it onto a 2 TB drive.
It ran for hours and eventually ran out of the 2 TB of space. When I looked at the repair directory, it had created far more files than the original database.
Anyway, I'm trying to look for alternative options. My first thought was to create a new MongoDB and copy each document from the old one to the new one. Is it possible to do this, and what's the fastest way?
I have had a lot of success copying databases with the db.copyDatabase command (see the MongoDB documentation for db.copyDatabase).
I have also used MongoVUE, a tool that makes it easy to copy databases from one location to another (MongoVUE is just a graphical interface on top of mongo).
If you have no luck with copyDatabase, I suggest you try dumping and restoring the database to external files with something like mongodump/mongorestore, or taking a filesystem snapshot (e.g., with lvcreate).
Here is a full read on backup and restore which should allow you to copy the database easily: http://docs.mongodb.org/manual/core/backups/
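If you do go with the "copy each document into a fresh database" idea from your question, a rough pymongo sketch (hosts and database name are placeholders; this is separate from the tools above) could look like:

    from pymongo import MongoClient

    src = MongoClient("mongodb://old-host:27017")["mydb"]
    dst = MongoClient("mongodb://new-host:27017")["mydb"]

    for name in src.list_collection_names():
        batch = []
        for doc in src[name].find():
            batch.append(doc)
            if len(batch) == 1000:          # insert in batches to cut round-trips
                dst[name].insert_many(batch)
                batch = []
        if batch:
            dst[name].insert_many(batch)

        # Recreate secondary indexes after the data copy so inserts stay fast.
        for idx in src[name].list_indexes():
            if idx["name"] != "_id_":
                dst[name].create_index(list(idx["key"].items()), name=idx["name"])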
I have some data sets in Google Cloud Storage. I found out how to append more data to a data set, but if I want to merge data sets (insert else update, i.e., an upsert), how do I do it?
One option I have is Hive's INSERT OVERWRITE. Is there a better option?
Is there any option within the Google Cloud Storage API itself?
Maybe this could be helpful: https://cloud.google.com/storage/docs/json_api/v1/objects/compose
Objects: compose
Concatenates a list of existing objects into a new object in the same bucket.
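In case it helps, a minimal sketch of that compose call via the Python client library (google-cloud-storage); the bucket and object names are made up, and note that compose only concatenates objects, it does not deduplicate or upsert rows:

    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-dataset-bucket")

    # Concatenate existing objects into a new object in the same bucket.
    sources = [bucket.blob("data/part-1.csv"), bucket.blob("data/part-2.csv")]
    merged = bucket.blob("data/merged.csv")
    merged.compose(sources)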
GCS treats your objects (files) as blobs; there are no built-in GCS operations on the contents of your objects. There is an easier way to do what you are doing, though.
App Engine-hosted MapReduce provides built-in adapters for working with GCS. You can find example code in this repo.
Is there any way to save an image to Mongo's GridFS and then upload it to S3 asynchronously in the background?
Maybe it is possible to chain uploaders?
The problem is this: multiple servers are used, so the server that saved the image to its hard drive and the server running the background process can be different machines.
Also:
1. the image should be removed from GridFS once it has been uploaded to S3
2. the image should be automatically removed from S3 when the corresponding entity is destroyed
Thanks.
What does your deployment architecture look like? I'm a little confused when you say "multiple servers": do you mean multiple mongod instances? Your requirements are also a bit confusing. According to requirement 1, once you upload to S3, the GridFS file should be removed, so a file should never exist in both S3 and GridFS; requirement 2 therefore seems to contradict the first, i.e., the file shouldn't exist in GridFS in the first place. Are you preserving some files on both GridFS and S3?
If you are running a replica set or sharded cluster, you could open a tailable cursor on the oplog, filtered to your GridFS collections (you can also do this on a single node, although it's not recommended). When you see an insert operation (it will look like 'op': 'i'), you could execute a script or do something in your application to grab the file from GridFS and push it to S3. Similarly, when you see a delete operation ('op': 'd'), you could summarily delete the file from S3.
The beauty of a tailable cursor is that it allows for asynchronous operations: you can have another process monitoring the oplog on a different server and performing the appropriate actions.
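A rough sketch of that idea with pymongo and boto3 (not the answerer's exact code; it assumes a replica set, configured AWS credentials, and made-up bucket/database names):

    import gridfs
    import boto3
    from pymongo import MongoClient, CursorType

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    oplog = client["local"]["oplog.rs"]
    fs = gridfs.GridFS(client["mydb"])
    s3 = boto3.client("s3")

    # Tail only the GridFS metadata namespace.
    cursor = oplog.find({"ns": "mydb.fs.files"},
                        cursor_type=CursorType.TAILABLE_AWAIT)

    own_deletes = set()  # GridFS deletes we trigger ourselves, to be ignored below

    for entry in cursor:
        file_id = entry["o"]["_id"]
        if entry["op"] == "i":                  # insert: a new file landed in GridFS
            data = fs.get(file_id).read()
            s3.put_object(Bucket="my-images", Key=str(file_id), Body=data)
            own_deletes.add(file_id)
            fs.delete(file_id)                  # requirement 1: drop it from GridFS
        elif entry["op"] == "d":                # delete: mirror the removal to S3
            if file_id in own_deletes:
                own_deletes.discard(file_id)
                continue
            s3.delete_object(Bucket="my-images", Key=str(file_id))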
I used a temporary variable to store the file to GridFS and made a worker (see this) to perform the async upload from GridFS to S3.
Hope this helps somebody, thanks.