Replace an existing GridFS file with new content using Spring Data MongoDB - mongodb

I have some data in MongoDB GridFS. I am using the Spring Data GridFsOperations class to do my GridFS read/writes.
I have a requirement to replace the content of an existing GridFS file i.e. the _id and filename should stay the same, but the file content should be updated.
Spring Data GridFsOperations primarily allows find, which returns a Mongo GridFSDBFile, and store. GridFSDBFile does not allow updating content. The store method could in theory be used if the file were deleted first and then stored with the same _id as the previous file. However, store does not allow specifying the _id field.
The only solution I have found so far is to use the Mongo API directly to delete the existing file, and store a new one with the same _id. Answers to this effect are not useful: the question is specific to Spring Data MongoDB.

The reason there's no API exposed yet is that MongoDB GridFS itself has no support for this. You essentially work around the issue by implementing a pattern like the one described here. But as this boils down to a non-atomic operation, we decided not to expose it as an operation in the first place.
In case you think there's a reliable implementation of this pattern (plus the appropriate handling of error cases), feel free to open a ticket in our JIRA to discuss options.
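The delete-then-store workaround mentioned above can be sketched roughly as follows. This is a language-neutral illustration in Python, not Spring Data code: InMemoryFS is a hypothetical stand-in so the example runs without a database, and replace_file is an illustrative helper name. The put(data, _id=..., filename=...) and delete(file_id) call shapes are modeled on PyMongo's gridfs.GridFS, which is an assumption about which driver you would use underneath.

```python
class InMemoryFS:
    """Hypothetical stand-in exposing only the two calls the pattern needs."""

    def __init__(self):
        self.files = {}

    def put(self, data, **kwargs):
        # Like a real file store, refuse to create a second file with the same _id.
        file_id = kwargs["_id"]
        if file_id in self.files:
            raise KeyError(f"file {file_id!r} already exists")
        self.files[file_id] = {"data": bytes(data), "filename": kwargs.get("filename")}
        return file_id

    def delete(self, file_id):
        self.files.pop(file_id, None)


def replace_file(fs, file_id, filename, data):
    """Delete the old file, then store new content under the same _id.

    Not atomic: a reader arriving between the two calls sees no file,
    and a failure in put() leaves the file deleted.
    """
    fs.delete(file_id)
    return fs.put(data, _id=file_id, filename=filename)


fs = InMemoryFS()
fs.put(b"old content", _id=1, filename="report.pdf")
replace_file(fs, 1, "report.pdf", b"new content")
print(fs.files[1]["data"])
```

The gap between delete and put is exactly the non-atomic window the answer refers to, so any real implementation would also need to decide what to do when the second step fails.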

Related

Upload pdf with express to mongo db

I'm trying to implement PDF upload with Express and save the file to MongoDB. But it is only being saved locally on my PC; I can't send it to MongoDB. Can anyone help me with how I can upload or access PDF files using multer or any other library?
As far as I know, you have two (maybe three) ways of using "files" with MongoDB; the approach will depend on your use case (and the size of the document):
Store the files directly in the document
As you know you can store anything you want in a JSON/BSON document, you just need to store the bytes and your PDF will be part of the document.
You just need to be careful about the document size limit of 16 MB.
You can add metadata for the file in the JSON document, and it is stored in the same place.
For example, in Java you will just store a byte[] in an attribute; look at this test:
https://github.com/mongodb/mongo-java-driver/blob/master/src/test/com/mongodb/ByteTest.java#L188
Use GridFS
GridFS allows you to store files of "any size" in MongoDB. The file you are storing is divided into chunks by the driver, and the chunks are stored as smaller documents in MongoDB; when you read the file, the chunks are put back together into a single file. With this approach, you do not have any size limit.
In this case, if you want to add metadata, you create a JSON document that you store with all the attributes and a reference to the GridFS file.
You can find information about this here: http://docs.mongodb.org/manual/core/gridfs/
and in this Java test:
https://github.com/mongodb/mongo-java-driver/blob/master/src/test/com/mongodb/gridfs/GridFSTest.java
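To make the chunking idea concrete, here is a minimal in-memory sketch of what a driver does conceptually when it writes and reads a GridFS file. It does not talk to MongoDB; split_into_chunks and reassemble are illustrative names, and the chunk documents only mimic the files_id, n and data fields of the GridFS chunks collection. The 255 KiB chunk size is the usual driver default, stated here as an assumption about your driver version.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # assumed driver default, in bytes


def split_into_chunks(file_id, data, chunk_size=DEFAULT_CHUNK_SIZE):
    """Cut the payload into numbered chunk documents, as GridFS does on write."""
    return [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Concatenate chunks in sequence order, as GridFS does on read."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


payload = bytes(600 * 1024)  # 600 KiB of zero bytes as a dummy file
chunks = split_into_chunks("example-id", payload)
print(len(chunks), len(chunks[-1]["data"]))
```

Because each chunk is its own small document, no single document comes near the 16 MB limit regardless of the total file size.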
Create Reference to an external storage
This one is not directly a "MongoDB" use case, but I think it is important to mention it. You can obviously store the files in some dedicated storage and use MongoDB just for the metadata and a reference to the file. To take a simple example, suppose you want to create a video application: you can store the videos on YouTube and reference them in the document along with all your application metadata.
So, to stay with your use case: you can use the first two approaches (storing in the document, or GridFS), and the choice will depend on the size of the files and how you access them. If you can give us more information about your application, people may have a stronger opinion on the best approach.
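As a rough illustration of the size-based choice between the first two approaches, the sketch below picks "inline" or "gridfs" from the file size. The choose_storage name and the 64 KiB headroom reserved for the other fields in the document are assumptions made for this example; only the 16 MB document limit comes from the answer.

```python
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's per-document size limit
METADATA_HEADROOM_BYTES = 64 * 1024    # assumed allowance for the other fields


def choose_storage(file_size_bytes):
    """Return "inline" if the file plausibly fits in one document, else "gridfs"."""
    if file_size_bytes < 0:
        raise ValueError("file size cannot be negative")
    if file_size_bytes + METADATA_HEADROOM_BYTES <= MAX_DOCUMENT_BYTES:
        return "inline"
    return "gridfs"


print(choose_storage(200 * 1024))        # a small PDF
print(choose_storage(40 * 1024 * 1024))  # a large scan
```

Size is only one input; if files are usually streamed or read partially, GridFS may still be the better fit even for small files.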
If you are looking for people doing this, you can look at this presentation from MongoDB World: http://www.mongodb.com/presentations/translational-medicine-platform-sanofi-0

Are there side effects to adding new content to a Mongo DB

I want to add a new object to an existing MongoDB document which I don't control, and I don't want to break the vendor's application. I've tried this in test code and it works fine, but I wanted some confirmation.
I'm using a REST API to drive a commercial product, and under the hood the application is using MongoDB for persistence. I can add new and arbitrary fields/objects to the JSON messages and they're persisted into Mongo as expected. Am I right that as long as my naming is different from existing/new vendor fields, the vendor's application should just keep working, ignoring my new data?
Bonus points if there's an article covering this that I can reference.
MongoDB does not have a fixed schema and it treats each document in a collection independently. With the newer WiredTiger storage engine, there is even document-level concurrency control. So adding new data to the existing collection should mostly not matter. However, if you are going to query on that new data and it's not indexed, then read times will be high.

Explain Like I'm Five: Form w/ Text and Image Field > Routes > Controller > Write to MongoDB Document - GridFS goes where?

I have been trying to read the documentation for GridFS and MongoDB for a while. It just keeps repeating the same thing and I can't make sense of it.
Desired Output: The user submits a form that contains many fields, one of which is an image. The request needs to store the data in a collection as a new document, which can be retrieved later. My main question is how do I use GridFS in this situation to store an image in that document.
It says GridFS makes two collections in my database: files and chunks. So how do those collections relate to my other collection which has the other form data?
I assume there is a reference to these files and chunks collections; however, I can't make any sense of this. It's been a few days and I feel like it's time to reach out to my StackOverflow community.
Can someone please explain to me the program flow and key points for how I can achieve my goal of storing an image in a document using gridfs?
GridFS is mentioned everywhere and seems to be popular but I can't make sense of it. These moments of utter confusion usually result in a breakthrough, so I'm eager to learn from veterans and experts.
GridFS collections are an internal implementation detail of GridFS. There is no linking between them and your other data.
To write to and read from GridFS, you would use GridFS APIs provided by your driver. Generally this means that, if you are saving for example some fields and a binary blob like an image, you would perform the save in two steps (one insert/update operation for the fields and a separate GridFS operation for the binary blob).
"Can someone please explain to me the program flow and key points for how I can achieve my goal of storing an image in a document using gridfs?"
You wouldn't store the image in your document. You would store the image in GridFS, and in your document you could include a reference to the GridFS file (those have their own ids).
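The two-step save described above might look like the following in-memory sketch. Nothing here is a driver API: gridfs_files and form_submissions are plain Python stand-ins for GridFS and for the application collection, and store_image, save_submission, load_image and the image_id field are hypothetical names chosen for this example.

```python
import uuid

gridfs_files = {}      # stand-in for GridFS (its files and chunks collections)
form_submissions = []  # stand-in for the application's own collection


def store_image(image_bytes, filename):
    """Step 1: write the binary blob to file storage and return its id."""
    file_id = uuid.uuid4().hex
    gridfs_files[file_id] = {"filename": filename, "data": image_bytes}
    return file_id


def save_submission(fields, image_bytes, filename):
    """Step 2: save the form fields plus a reference to the stored file."""
    image_id = store_image(image_bytes, filename)
    document = dict(fields, image_id=image_id)
    form_submissions.append(document)
    return document


def load_image(document):
    """Follow the reference in the document back to the stored bytes."""
    return gridfs_files[document["image_id"]]["data"]


doc = save_submission({"name": "Ada", "caption": "Profile photo"}, b"fake-image-bytes", "avatar.png")
print(sorted(doc))
```

The document itself only ever holds the id, so the files and chunks collections never need to be queried directly by application code.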

Create schema.xml automatically for Solr from mongodb

Is there an option to automatically generate a schema.xml for Solr from MongoDB? E.g. each field of a document and its subdocuments in a collection should be indexed and searchable by default.
As written in this SO answer, Solr's Schemaless Mode could help you:
Solr supports a Schemaless Mode. When starting Solr this way, you are initially not bound to a schema. When you give Solr a first document it will guess the appropriate field types and generate a schema that includes those field types for you. These fields are then fixed. You may still add new fields on the fly that way.
What you still need to do is to create an Import Route of some kind from your mongodb into Solr.
After googling a bit, you may stumble over the SO question - solr Data Import Handlers for MongoDB - which may help you on that part too.
Probably simpler would be to create a mongo query whose result contains all relevant information you require, save the result to json and send that to Solr's direct update handler, which can parse json.
So in short:
1. Create a new, empty core in Schemaless Mode
2. Create an import of some kind that covers all entities and attributes you want
3. Run the import
4. Check if the result is as you want it to be
As long as (4) is not satisfied you may delete the core and repeat these steps.
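The "query, save as JSON, send to the update handler" route could be sketched like this. The flatten and build_update_request helpers, the underscore naming for nested fields, the renaming of _id to id, and the core URL are all assumptions made for illustration; the request is only built, not sent, so the example runs without a Solr instance.

```python
import json
import urllib.request


def flatten(document, prefix=""):
    """Turn nested subdocuments into flat field names, e.g. author_name."""
    flat = {}
    for key, value in document.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def build_update_request(solr_core_url, documents):
    """Build (but do not send) a JSON POST for Solr's update handler."""
    payload = []
    for document in documents:
        flat = flatten(document)
        if "_id" in flat:
            # Assumption: the Solr core uses "id" as its unique key.
            flat["id"] = str(flat.pop("_id"))
        payload.append(flat)
    return urllib.request.Request(
        f"{solr_core_url}/update?commit=true",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


docs = [{"_id": "42", "title": "Example", "author": {"name": "Ada", "city": "London"}}]
request = build_update_request("http://localhost:8983/solr/mycore", docs)
print(request.full_url)
# urllib.request.urlopen(request) would send it to a running Solr instance.
```

In Schemaless Mode the flattened field names are what Solr would guess types for, so it is worth checking the generated schema after the first import, as in step 4.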
No, MongoDB does not provide this option. You will have to create a script that maps documents to XML.

Question about doing in-place updates in Basho Riak

Currently I use MongoDB for recording statistics and ad serving. I log raw impressions to a log collection, and worker processes use findAndModify to pull entries off the log and aggregate them into a precomputed collection using upsert (similar to how Rainbird works at Twitter).
http://techcrunch.com/2011/02/04/twitter-rainbird/
I aggregate on the parent, child, child's child etc., which makes querying for statistics fast and painless.
I use (in Mongo) a key consisting of {Item_id, Hour} and upsert to that (a lot).
I was wondering if Riak had a strong way to solve the same problem, and how I would implement it.
Short answer: I don't think Riak supports upsert-like operations.
Long answer: Riak is a Key-Value store which treats stored values as opaque data. But in the future Riak could consider adding support for HTTP PATCH which might allow one to support operations similar to upsert. There is another category of operations (compare-and-set) which would also be interesting, but supporting these is definitely much more complicated.
The way this works with Riak depends on the storage backend that you're using for Riak.
Bitcask, the current default storage backend, uses a log-structured hash table for the internal storage mechanism. When you write a new record to Riak, an entirely new copy of your data is stored on disk. Eventually, compaction of the Bitcask files will occur and the old copies of your data will be removed.
Any put into Riak is effectively an upsert - if the data doesn't exist, a new record will be inserted. Otherwise, the existing value will be updated by expiring the old value and making the newest value the current value.
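Since a put replaces the whole opaque value, a counter keyed by {Item_id, Hour} in a key-value store tends to become a fetch, modify, put cycle in the client. The sketch below shows that cycle against a plain dict standing in for the store; it is not Riak client code, the increment helper and key format are made up for the example, and it ignores concurrent writers, which a real deployment would have to handle.

```python
import json

store = {}  # stand-in for a key-value store holding opaque string values


def increment(store, item_id, hour, impressions=1):
    """Fetch the current value, update it in the client, and put it back."""
    key = f"{item_id}:{hour}"
    raw = store.get(key)
    record = json.loads(raw) if raw is not None else {"impressions": 0}
    record["impressions"] += impressions
    store[key] = json.dumps(record)  # the put overwrites the previous value
    return record["impressions"]


increment(store, "item-7", "2011-02-04T13")
total = increment(store, "item-7", "2011-02-04T13", impressions=4)
print(total)
```

Two clients running this at the same time can each read the same old value and lose one of the increments, which is the gap that a server-side upsert in MongoDB closes.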