I am storing documents (text, PDF, CSV, DOC, DOCX, etc.) in MongoDB using Spring REST. The documents are stored as binary data.
Now I want to search documents based on their contents.
For example, if a user searches for the string "office", they should see a list of documents that contain the string "office".
How can I query MongoDB for content that is inside binary data?
You could try to define a text index over your binary files. I don't know if it would work, but even if it does, such an index would also match words that are part of the file format rather than user content, which is generally undesirable.
If I were implementing your requirements, I would convert each binary document to plain text with a converter (e.g. pandoc) to obtain its user content, insert that content into a field that has a text index over it, and then query on that field.
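A minimal sketch of the shape this could take, assuming the plain text has already been extracted by an external converter. The field names `filename`, `data` and `content` and both helper functions are illustrative assumptions, not part of any existing schema.

```javascript
// Hypothetical helpers; all field and function names are assumptions.

// Key specification for a text index on the extracted-text field.
// In the mongo shell it would be passed to db.<collection>.createIndex(...).
const contentIndexSpec = { content: "text" };

// Build the document to store: the original bytes plus the extracted text.
function buildSearchableDoc(filename, binaryData, extractedText) {
  return { filename: filename, data: binaryData, content: extractedText };
}

// Build a filter that uses the $text operator, which searches the text index.
function textSearchFilter(term) {
  return { $text: { $search: term } };
}

// Example document whose extracted text mentions "office".
const doc = buildSearchableDoc(
  "memo.docx",
  new Uint8Array([0x50, 0x4b, 0x03, 0x04]), // placeholder bytes, not a real file
  "Please return the keys to the office by Friday."
);
const filter = textSearchFilter("office");
```

With the index created, `filter` is what would be passed to `find` on the collection, so only the extracted text is searched and the file-format bytes are never matched.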
I have a MongoDB collection in which ObjectId('11111111') may occur as a value at any depth, including inside array elements.
I need to replace it everywhere with ObjectId('2222222').
Is there an easy way of doing this in mongo?
Dump the collection to extended json, open the dump in a text editor, replace the values, then load the dump back into the database.
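If you would rather script the replacement than use a text editor, here is a rough sketch that does the same thing to one dumped document. It assumes ObjectIds appear in the dump as `{"$oid": "..."}` and that each document is on its own line (as `mongoexport` writes them). The function name and the sample IDs are placeholders. Note that passing a document through `JSON.parse` can alter very large numeric literals, which a plain find-and-replace avoids.

```javascript
// Sketch: replace one ObjectId value everywhere in a single Extended JSON document.
// Assumes ObjectIds are serialized as {"$oid": "<hex string>"}.
function replaceObjectId(extendedJsonLine, oldHex, newHex) {
  // The reviver visits every nested value, so arrays and deep objects are covered.
  const parsed = JSON.parse(extendedJsonLine, (key, value) => {
    const isTarget =
      value !== null &&
      typeof value === "object" &&
      !Array.isArray(value) &&
      value.$oid === oldHex;
    return isTarget ? { $oid: newHex } : value;
  });
  return JSON.stringify(parsed);
}

// Placeholder IDs for illustration only.
const OLD_ID = "1".repeat(24);
const NEW_ID = "2".repeat(24);

const line =
  '{"_id":{"$oid":"aaaaaaaaaaaaaaaaaaaaaaaa"},' +
  '"owner":{"$oid":"' + OLD_ID + '"},' +
  '"items":[{"ref":{"$oid":"' + OLD_ID + '"}}]}';

const updated = replaceObjectId(line, OLD_ID, NEW_ID);
```

The rewritten lines can then be loaded back into the database as described above.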
I am using an npm package called mongoose-encryption, and all is great except for one important thing. I have an array called reports that contains a lot of objects. Each object has a unique field called report_id, and I need to perform a delete operation based on that ID, but if I encrypt it, Mongoose apparently can't find it. I excluded some fields as the docs describe, but apparently I can't exclude a nested field. I tried this:
usersSchema.plugin(encrypt,{secret:sigKey,excludeFromEncryption: ['username','reports.report_id']});
So the username is excluded from encryption, but reports.report_id is not.
Any ideas?
The [documentation](https://www.npmjs.com/package/mongoose-encryption) says:
To encrypt, the relevant fields are removed from the document, converted to JSON, enciphered in Buffer format with the IV and plugin version prepended, and inserted into the _ct field of the document. Mongoose converts the _ct field to Binary when sending to mongo.
If the reports field is encrypted by mongoose-encryption, it will not be present at all in the document stored in MongoDB.
If you need the values of the reports.report_id field to be accessible in a query while the reports field is encrypted, you'll need to copy those values out to an array that is not being encrypted.
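A small sketch of that copy step on a plain object. The `report_ids` field name and the `withReportIds` helper are hypothetical and not part of mongoose-encryption. The idea is that `report_ids` would be listed in `excludeFromEncryption` next to `username`, since top-level fields are excluded as expected.

```javascript
// Sketch: mirror the report IDs into a separate top-level array.
// `report_ids` is an assumed field name chosen for illustration.
function withReportIds(user) {
  const reportIds = (user.reports || []).map((report) => report.report_id);
  // Return a copy so the original object is left untouched.
  return Object.assign({}, user, { report_ids: reportIds });
}

const user = withReportIds({
  username: "alice",
  reports: [
    { report_id: "r-1", body: "first report" },
    { report_id: "r-2", body: "second report" },
  ],
});
```

The copied array has to be kept in sync whenever a report is added or deleted, otherwise queries on it will not reflect the encrypted contents.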
Is there a difference between a wildcard text index like $** and text indexes that I create for each of the fields in the collection?
I do see a small difference in response time when I create text indexes on individual fields: the individual indexes respond faster. I am not able to post an example now, but will try to.
A wildcard text index indexes every field that contains string data for each document in the collection (https://docs.mongodb.com/manual/core/index-text/#wildcard-text-indexes).
Because a wildcard text index increases the number of fields that are indexed, queries against it would take longer to run than queries against a text index that targets specific fields.
Since you can only have one text index per collection (https://docs.mongodb.com/manual/core/index-text/#create-text-index), it's worth considering beforehand which fields you plan on querying.
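To make the two options concrete, here is a sketch of the key specifications involved. Because only one text index is allowed, indexing several named fields means putting them all into a single index specification. The helper and the field names `title` and `summary` are hypothetical.

```javascript
// Sketch: build the key specification for a collection's single text index.
// With no fields given, fall back to the wildcard specifier.
function textIndexSpec(fields) {
  if (!fields || fields.length === 0) {
    return { "$**": "text" }; // wildcard: every field that contains string data
  }
  const spec = {};
  for (const field of fields) {
    spec[field] = "text"; // targeted: only the named fields
  }
  return spec;
}

const wildcardSpec = textIndexSpec([]);
const targetedSpec = textIndexSpec(["title", "summary"]); // assumed field names
```

Either object would be passed to `createIndex` on the collection. The targeted form covers fewer fields, which matches the faster responses described in the question.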
I am new to MongoDB and am using MongoDB 3.6.3. One field in my documents holds a large amount of JSON data, and I don't know all of the key names inside it.
Now I want to search documents by a value that may be present under any key inside that JSON data.
I have tried this:
db.getCollection('booking').find({'result.important_information': /.*small dogs*/})
But this requires a key name for the search. I don't have the key name; I have to search using only the value.
I would like to encode some meaning in the first N characters of every document ID, i.e. make the first three characters determine a document type that is meaningful to the system the document is used in.
You can supply a custom _id when you insert the document. If the document to be inserted doesn't contain an _id, MongoDB will generate an ObjectId for you.
The _id can be of any type, but keeping it uniform across all documents makes sense if you are accessing them from the application layer.
You can refer to one of the older questions on SO: How to generate unique object id in mongodb
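As a rough illustration of a string _id with a three-character type prefix, here is a sketch. The helper names and the "INV" code are hypothetical. It assumes the caller supplies a part that is already unique per document, for example the hex string of a freshly generated ObjectId.

```javascript
// Sketch: compose a string _id as <3-character type code><unique part>.
// The caller is responsible for making uniquePart unique (an assumption here).
function makeTypedId(typeCode, uniquePart) {
  if (typeof typeCode !== "string" || typeCode.length !== 3) {
    throw new Error("typeCode must be exactly three characters");
  }
  if (typeof uniquePart !== "string" || uniquePart.length === 0) {
    throw new Error("uniquePart must be a non-empty string");
  }
  return typeCode + uniquePart;
}

// Recover the document type from the first three characters of an _id.
function typeOfId(id) {
  return id.slice(0, 3);
}

// Placeholder unique part, shaped like an ObjectId hex string.
const invoiceId = makeTypedId("INV", "5f1d7f3e2c8b4a0012345678");
const invoiceType = typeOfId(invoiceId); // "INV"
```

The resulting string would be set as `_id` on the document before inserting it. Using the same string format for every document keeps the type uniform, as suggested above.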