Couchbase with binary document as value - nosql

We have heard that if we save binary data as a value in Couchbase, it is encoded with base64, which expands the size by about 30% and costs time to encode and decode. That may not be acceptable. Are there any other solutions, e.g. modifying Couchbase to support binary values? Thanks.
BTW, is there any documentation describing the code?

You can store any kind of value you want in Couchbase. This includes binary documents, JSON documents, etc. The only thing to keep in mind is that we only index JSON documents so you can't query into binary data. Documents are only shown in base64 in the Web UI and may be stored that way on disk, but this should be completely transparent to you. You should not have to worry about anything being in base64 in your application unless of course you do this yourself.
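For reference, the size cost mentioned in the question is easy to check: base64 turns every 3 bytes into 4 ASCII characters, an overhead of about 33%. The minimal Python sketch below uses only the standard library and does not touch Couchbase; it just measures that ratio and confirms the round trip is lossless. Per the answer above, this encoding should not be something your application has to perform.

```python
import base64


def base64_overhead(raw: bytes) -> float:
    """Return how much larger the base64 encoding is than the raw bytes."""
    encoded = base64.b64encode(raw)
    # Every 3 input bytes become 4 output characters ('=' pads the last group).
    return len(encoded) / len(raw)


payload = bytes(range(256)) * 12  # 3072 bytes of arbitrary binary data
encoded = base64.b64encode(payload)
print(len(payload), len(encoded))            # 3072 4096
print(f"{base64_overhead(payload):.3f}")     # 1.333
assert base64.b64decode(encoded) == payload  # lossless round trip
```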

Related

Upload PDF with Express to MongoDB

I'm trying to implement PDF upload with Express and save the file to MongoDB, but it only gets saved locally on my PC; I can't send it to MongoDB. Can anyone help me with how to upload or access PDF files using multer or any other library?
As far as I know, you have two (maybe three) ways of using "files" with MongoDB; the approach will depend on your use case (and the size of the document):
Store the files directly in the document
As you know, you can store anything you want in a JSON/BSON document: you just need to store the bytes, and your PDF will be part of the document.
You just need to be careful about the 16 MB document size limit.
You can add metadata for the file to the same document, so the file and its metadata are stored in the same place.
For example, in Java you just store a byte[] in a field; look at this test:
https://github.com/mongodb/mongo-java-driver/blob/master/src/test/com/mongodb/ByteTest.java#L188
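A minimal Python sketch of approach 1, assuming a hypothetical helper `build_embedded_pdf_doc` (it is not part of any driver). It only builds the document and enforces the size limit with an assumed safety margin; inserting the result would be an ordinary insert with whatever driver you use, and the document shape is the same either way.

```python
from datetime import datetime, timezone

# BSON documents are limited to 16 MiB (16 * 1024 * 1024 bytes) in total.
MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024


def build_embedded_pdf_doc(filename: str, data: bytes,
                           headroom: int = 64 * 1024) -> dict:
    """Build one document holding both the PDF bytes and their metadata.

    `headroom` is an assumed safety margin: the 16 MiB limit applies to the
    whole encoded document (field names, metadata, BSON overhead), not just
    to the file bytes.
    """
    if len(data) > MAX_BSON_DOCUMENT_SIZE - headroom:
        raise ValueError(f"{filename}: {len(data)} bytes is too large to embed")
    return {
        "filename": filename,
        "contentType": "application/pdf",
        "length": len(data),
        "uploadDate": datetime.now(timezone.utc),
        # PyMongo, for example, encodes Python bytes as BSON binary data.
        "data": data,
    }


doc = build_embedded_pdf_doc("report.pdf", b"%PDF-1.4 example bytes")
print(doc["filename"], doc["length"])
```

When the check fails, that is the signal to switch to GridFS rather than to split the file yourself.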
Use GridFS
GridFS allows you to store files of "any size" in MongoDB. The driver divides the file into chunks and stores them as smaller documents in MongoDB; when you read the file, the chunks are put back together into a single file. With this approach, the 16 MB document limit does not constrain the file size.
In this case, if you want to add metadata, you create a JSON document that you store with all the attributes and a reference to the GridFS file.
You can find information about this here: http://docs.mongodb.org/manual/core/gridfs/
and in this Java test:
https://github.com/mongodb/mongo-java-driver/blob/master/src/test/com/mongodb/gridfs/GridFSTest.java
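To make the chunking concrete, here is an in-memory Python simulation (no database involved) of what a driver does when it writes and reads a GridFS file. The field names `filename`, `length`, `chunkSize`, `uploadDate`, `files_id`, `n` and `data`, and the 255 KiB default chunk size, follow the GridFS specification; the function names and the use of a UUID string instead of an ObjectId are my own simplifications.

```python
import math
import uuid
from datetime import datetime, timezone

# The GridFS spec's default chunk size is 255 KiB.
DEFAULT_CHUNK_SIZE = 255 * 1024


def split_into_gridfs_docs(filename: str, data: bytes,
                           chunk_size: int = DEFAULT_CHUNK_SIZE):
    """Return (files_doc, chunk_docs) shaped like fs.files / fs.chunks documents."""
    file_id = uuid.uuid4().hex  # simplification: drivers default to an ObjectId
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
        "uploadDate": datetime.now(timezone.utc),
    }
    chunk_docs = [
        {"files_id": file_id, "n": n,
         "data": data[n * chunk_size:(n + 1) * chunk_size]}
        for n in range(math.ceil(len(data) / chunk_size))
    ]
    return files_doc, chunk_docs


def reassemble(files_doc: dict, chunk_docs: list) -> bytes:
    """Concatenate the chunks in order of n, as a driver does when downloading."""
    ordered = sorted(chunk_docs, key=lambda c: c["n"])
    data = b"".join(c["data"] for c in ordered)
    if len(data) != files_doc["length"]:
        raise ValueError("missing or truncated chunks")
    return data


original = bytes(600 * 1024)  # a 600 KiB file
files_doc, chunks = split_into_gridfs_docs("scan.pdf", original)
print(len(chunks))  # 3 chunks: 255 KiB + 255 KiB + 90 KiB
assert reassemble(files_doc, list(reversed(chunks))) == original
```

Because every chunk carries `files_id` and `n`, the chunks can be fetched in any order and still be reassembled correctly.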
Create a reference to external storage
This one is not directly a MongoDB use case, but I think it is important to mention it. You can store the files in some dedicated storage and use MongoDB just for the metadata and a reference to each file. To take a simple example, suppose you want to create a video application: you can store the videos on YouTube and reference them from documents that hold all your application metadata.
Coming back to your use case: you can use approaches 1 and 2, and the choice will depend on the size of the files and how you access them. If you can give us more information about your application, people may have a stronger opinion on the best approach.
If you are looking for examples of people doing this, look at this presentation from MongoDB World: http://www.mongodb.com/presentations/translational-medicine-platform-sanofi-0

Explain Like I'm Five: Form w/ Text and Image Field > Routes > Controller > Write to MongoDB Document - GridFS goes where?

I have been trying to read the documentation for GridFS and MongoDB for a while. It just keeps repeating the same thing, and I can't make sense of it.
Desired output: the user submits a form that contains many fields, one of which is an image. The request needs to store the data in a collection as a new document, which can be retrieved later. My main question is how to use GridFS in this situation to store an image in that document.
It says GridFS creates two collections in my database, files and chunks. So how do those collections relate to my other collection, which holds the rest of the form data?
I assume there is a reference to these files and chunks collections; however, I can't make any sense of it. It's been a few days, and I feel like it's time to reach out to my StackOverflow community.
Can someone please explain to me the program flow and key points for how I can achieve my goal of storing an image in a document using gridfs?
GridFS is mentioned everywhere and seems to be popular but I can't make sense of it. These moments of utter confusion usually result in a breakthrough, so I'm eager to learn from veterans and experts.
The GridFS collections are an internal implementation detail of GridFS. There is no link between them and your other data.
To write to and read from GridFS, you would use GridFS APIs provided by your driver. Generally this means that, if you are saving for example some fields and a binary blob like an image, you would perform the save in two steps (one insert/update operation for the fields and a separate GridFS operation for the binary blob).
You wouldn't store the image in your document. You would store the image in GridFS, and in your document you could include a reference to the GridFS file (those have their own ids).
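The two-step flow described above can be sketched in Python with an in-memory stand-in for the GridFS bucket. `FakeGridFSBucket` and the helper functions are hypothetical; in PyMongo the real counterparts would roughly be `gridfs.GridFSBucket.upload_from_stream` (which returns the new file's id), `open_download_stream`, and `insert_one` on your own collection.

```python
import uuid


class FakeGridFSBucket:
    """Hypothetical in-memory stand-in for a driver's GridFS bucket."""

    def __init__(self):
        self._files = {}

    def upload(self, filename: str, data: bytes) -> str:
        file_id = uuid.uuid4().hex  # a real bucket returns an ObjectId
        self._files[file_id] = (filename, data)
        return file_id

    def download(self, file_id: str) -> bytes:
        return self._files[file_id][1]


def save_form_submission(bucket, submissions: list, fields: dict,
                         image_name: str, image: bytes) -> dict:
    # Step 1: the image goes through GridFS (fs.files + fs.chunks), not into our document.
    image_id = bucket.upload(image_name, image)
    # Step 2: the other form fields become an ordinary document with a reference.
    doc = {**fields, "image_id": image_id}
    submissions.append(doc)  # stands in for insert_one()
    return doc


def load_image(bucket, doc: dict) -> bytes:
    # Reading reverses the steps: fetch the document, then the file by its id.
    return bucket.download(doc["image_id"])


bucket, submissions = FakeGridFSBucket(), []
doc = save_form_submission(bucket, submissions,
                           {"name": "Ada", "email": "ada@example.com"},
                           "avatar.png", b"\x89PNG fake bytes")
assert load_image(bucket, doc) == b"\x89PNG fake bytes"
```

Uploading the file first means that, if the document insert fails, you are left with an orphaned file rather than a document pointing at nothing.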

How to store lookup values in MongoDB?

I have a collection in the database that represents media files.
Among other information, I should store the format name. I wonder if there are best practices for storing information like that. Is it better to create a new collection for file formats and link to it, or to store the format name directly in the file documents as plain text? What about performance and compression? There are supposed to be more than a billion documents in the database. What would MongoDB experts suggest in this situation?
Embedded documents are the preferred approach.
In your case, that means it is better to store the file format in the same collection, inside each media file document.
Putting the file format into a separate collection means creating a new file on disk.
It is the slower option and should be used if your documents (any of them) would exceed 16 MB in size.
See these links for more information:
6 Rules of Thumb for MongoDB Schema Design
and
How to Program with MongoDB Using the .NET Driver
I've done some benchmarks and found that, in my case, storing "lookup values" as plain text is more efficient in terms of disk space than either an embedded document or a reference to a separate collection. Sorry for the poor terminology.
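A minimal Python sketch of the two schema shapes the question compares, with plain dicts standing in for documents; all field names are hypothetical. It does not reproduce the disk-space benchmark mentioned above. It only shows that with a separate "formats" collection every read of a format name needs a second lookup (an extra query, or a `$lookup` stage in an aggregation), while the inline value comes with the media document itself.

```python
# Option A: the format name is stored inline, as plain text, in each media document.
media_inline = [
    {"_id": 1, "title": "intro", "format": "mp4"},
    {"_id": 2, "title": "theme", "format": "flac"},
]

# Option B: a separate "formats" collection, referenced by id.
formats_by_id = {10: {"_id": 10, "name": "mp4"}, 11: {"_id": 11, "name": "flac"}}
media_ref = [
    {"_id": 1, "title": "intro", "format_id": 10},
    {"_id": 2, "title": "theme", "format_id": 11},
]


def format_name_inline(doc: dict) -> str:
    return doc["format"]  # available from the single read of the media document


def format_name_ref(doc: dict, formats: dict) -> str:
    # Needs a second lookup: an extra query or a $lookup stage in MongoDB.
    return formats[doc["format_id"]]["name"]


def titles_with_format_inline(docs, name):
    # Filtering by format is a plain equality match on one field.
    return [d["title"] for d in docs if d["format"] == name]
```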

C/C++ Example for GridFS implementation in MongoDB

I just started building an application on MongoDB for saving and retrieving files, and found that it has a standard specification for this purpose called GridFS. Unfortunately, I am unable to find any getting-started example for it in C/C++. If anyone knows anything about it, please point me in the right direction.
Edit:
I read that GridFS is used for storing files larger than 16 MB, so what about files smaller than 16 MB? I can't find any information about it. For smaller files, do I need to use some other process, or the same GridFS?
Thanks
GridFS can be accessed through the class mongo::GridFS. The API is fairly self-explanatory.
Alternatively, you can embed the binary data of your files in normal documents as the BSON BinData type. mongo::BSONObjBuilder has the method appendBinData to add a field with binary content to a document.
The reason GridFS exists is that there is an upper limit of 16MB per document. When you want to store data larger than 16MB, you need to split it into multiple documents. GridFS is an abstraction to handle this automatically, but it can also be used for smaller files.
In general, you shouldn't mix both techniques for the same content, as it just makes things more complicated with little benefit. When you can guarantee that your data doesn't get close to 16MB, use embedding. When you occasionally have content > 16MB, you should use GridFS even for files smaller than that.
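The rule of thumb above can be written as a small decision function. This is a Python sketch for illustration, not C++ driver code; `choose_storage` is hypothetical, and "doesn't get close to 16MB" is turned into an assumed `safety_margin` parameter. Since the answer advises against mixing the two techniques, the decision is made once for the whole set of files.

```python
MAX_DOC_BYTES = 16 * 1024 * 1024  # BSON document size limit (16 MiB)


def choose_storage(expected_sizes, safety_margin: float = 0.5) -> str:
    """Return "embed" or "gridfs" for a whole set of files.

    The answer advises against mixing both techniques for the same content,
    so the choice is made once for every file, not per file. `safety_margin`
    is an assumption: 0.5 means "embed only if every file is under 8 MiB".
    """
    limit = MAX_DOC_BYTES * safety_margin
    if all(size < limit for size in expected_sizes):
        return "embed"   # e.g. BinData fields via BSONObjBuilder::appendBinData
    return "gridfs"      # e.g. mongo::GridFS in the legacy C++ driver


print(choose_storage([40_000, 2_500_000]))     # embed
print(choose_storage([40_000, 20_000_000]))    # gridfs
```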

Should I use GridFS or binary data to store & retrieve images from MongoDB?

I was wondering which is better/faster:
Having a separate collection of documents that just contain the image saved as binary data, and possibly some metadata.
Or using GridFS to store the images.
If your images are small you can store them as binary data in the documents in your collection. Just consider that you will be retrieving them every time you query your document (unless you exclude the 'image' field from your queries).
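Excluding the image field is done with a projection; with PyMongo, for example, that is `collection.find({}, {"image": 0})`. The plain-Python sketch below only mimics what such an exclusion projection returns, assuming the binary field is named `image`, so the effect on a listing query is easy to see.

```python
def apply_exclusion_projection(doc: dict, excluded: set) -> dict:
    """Mimic a MongoDB exclusion projection such as {"image": 0} on a plain dict."""
    return {key: value for key, value in doc.items() if key not in excluded}


photo = {"_id": 1, "owner": "alice", "caption": "beach", "image": b"\x89PNG..."}
listing = apply_exclusion_projection(photo, {"image"})
print(listing)  # {'_id': 1, 'owner': 'alice', 'caption': 'beach'}
```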
However, if your images are larger I would use GridFS. GridFS has some features that make it very good at handling images that you should consider:
When larger images are stored in GridFS, they are split into chunks, so you can store very large files. If you store images in your document, you are constrained by the 16 MB maximum document size, and the image consumes space that your actual document data needs.
You can add metadata to the image itself and run queries against these attributes, as if you were doing it from a regular document in a collection. So GridFS is as good as a document for metadata about the image.
I really like that I get an MD5 hash calculated for the images (it is very useful in some of my cases).
By storing images in GridFS, you save yourself the preprocessing of the image into binary format (not a big deal, but a convenience of GridFS).
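If you rely on such a hash, note that the GridFS specification has since deprecated the stored `md5` field and recent drivers may no longer compute it, so you may need to compute and store it yourself (for example in the file's metadata). A minimal Python sketch, assuming you have the file's chunks in order, of computing the digest incrementally and using it as an integrity check after download:

```python
import hashlib


def md5_of_chunks(chunks) -> str:
    """Compute an MD5 hex digest incrementally over a file's chunks, in order."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


image = b"fake image bytes " * 1000
chunks = [image[i:i + 4096] for i in range(0, len(image), 4096)]
stored_md5 = hashlib.md5(image).hexdigest()  # e.g. saved in the file's metadata at upload
assert md5_of_chunks(chunks) == stored_md5    # integrity check after download
```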
In terms of performance, reading/writing against a regular document should be no different than doing it against GridFS. I would not consider performance to be a differentiator in choosing either one.
My personal recommendation is to go with GridFS, but you need to analyze for your particular use case.
Hope this helps.
I use GridFS to store photos and documents. It's very easy, and retrieving files from the collection to display them or save them locally is easy too. You can store metadata along with the binary data in the same collection, so you don't need to create an additional collection for it.
For example, in one of my projects I store user profile photos along with usernames, file type, and date of upload.
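A minimal in-memory Python sketch of that idea, assuming the extra fields live in the optional `metadata` sub-document of each `fs.files`-style document (GridFS already records `uploadDate` itself). The sample documents and the `find_by_metadata` helper are hypothetical; with PyMongo the equivalent query would be something like `db.fs.files.find({"metadata.username": "alice"})`.

```python
from datetime import datetime, timezone

# Hypothetical fs.files-style documents; GridFS files may carry a "metadata" sub-document.
files = [
    {"_id": 1, "filename": "alice.jpg", "length": 48_213,
     "uploadDate": datetime(2020, 1, 5, tzinfo=timezone.utc),
     "metadata": {"username": "alice", "fileType": "image/jpeg"}},
    {"_id": 2, "filename": "bob.png", "length": 91_004,
     "uploadDate": datetime(2020, 2, 9, tzinfo=timezone.utc),
     "metadata": {"username": "bob", "fileType": "image/png"}},
]


def find_by_metadata(files_docs, **criteria):
    """Mimic a filter such as {"metadata.username": "alice"} on the files collection."""
    return [
        d for d in files_docs
        if all(d.get("metadata", {}).get(key) == value for key, value in criteria.items())
    ]


print([d["filename"] for d in find_by_metadata(files, username="alice")])  # ['alice.jpg']
```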
GridFS is designed to handle files efficiently.
http://www.mongodb.org/display/DOCS/When+to+use+GridFS
Do not forget that you may have to convert the data to a file and back.
To be sure, though, run a performance test that reflects your usage pattern.