How to find data where filename is greater than a value in Node.js GridFS (MongoDB)

I was trying to query data after splitting my variable filename. I then tried to query it with gfs.find({filename > 1}) to find the files whose filename is greater than 1, but that is not valid query syntax.
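For reference, here is a minimal sketch of how such a query could look with the official mongodb Node.js driver. The $gt operator replaces the invalid { filename > 1 } form; the connection string, database name, and the use of GridFSBucket are assumptions, since the question does not show how gfs was created:

const { MongoClient, GridFSBucket } = require('mongodb');

async function findFilenamesGreaterThan(value) {
  // Assumed connection string and database name.
  const client = await MongoClient.connect('mongodb://localhost:27017');
  try {
    const bucket = new GridFSBucket(client.db('myDatabase'));
    // $gt expresses "greater than" in MongoDB's query language.
    // Note: filename is a string, so the comparison is lexicographic
    // ("10" sorts before "2"); store numbers if you need numeric order.
    return await bucket.find({ filename: { $gt: value } }).toArray();
  } finally {
    await client.close();
  }
}

findFilenamesGreaterThan('1').then(files => console.log(files.map(f => f.filename)));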

Related

Trouble querying a specific text field in MongoDB using pymongo

I have stored around 120 text files in a MongoDB database by connecting my local instance to MongoDB Cloud. I used pymongo to automate inserting the contents of each text file into MongoDB Cloud. The collection of 120 documents looks like this:
'''
{
  _id: ObjectId(....),
  "nameTextdoc.txt": "text_document",
  content: ['Each sentence stored in an array.', '...']
}
'''
I am trying to retrieve the nameTextdoc.txt field and content field by using:
'''
collections.find_one({'nameTextdoc.txt': 'text_doc'})
'''
in a Python script using pymongo. For some reason I receive None when I run this query. However, when I run:
'''
collections.find_one({})
'''
I get the entire document.
I would like assistance writing a query that retrieves the entire text file by querying on the name of the text file. I have periods in my key names, which may be the specific reason why I cannot retrieve them. Any help would be much appreciated.
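The period is indeed the likely culprit: in a filter, MongoDB interprets 'nameTextdoc.txt' as a path to a field txt nested inside nameTextdoc, so it never matches the top-level key. (Note also that the sample document stores "text_document" while the query searches for 'text_doc'.) On MongoDB 5.0+ such a key can be addressed with $getField; a minimal shell sketch, with the collection name textdocs as an assumption — the same filter document can be passed to find_one from pymongo:

db.textdocs.find({
  $expr: {
    $eq: [
      // $getField treats the dot as a literal character in the field name.
      { $getField: { field: "nameTextdoc.txt", input: "$$CURRENT" } },
      "text_document"
    ]
  }
})

On older servers, the usual workaround is to rename the keys at insertion time so they contain no dots.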

Spring MongoDB: search for a string inside binary data

I am storing documents (text, pdf, csv, doc, docx, etc.) in MongoDB using Spring REST. The documents are stored as binary data.
Now I want to search documents based on the content inside them.
For example, if a user searches for the string "office", they should see the list of documents that contain the string "office".
How can I query MongoDB for data contained in binary data?
You could try to define a text index over your binary files. I don't know whether it would work, but even if it does, such an index would match words that are part of the file format rather than user content, which is generally undesirable.
If I were implementing your requirements, I would use a transformer from each of the binary documents to plain text (e.g. pandoc), thus obtaining the user content of each document, then insert that content into a field which has a text index over it, and query on that field.
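A minimal sketch of that approach in the mongo shell, assuming the extracted plain text is stored in a content field of a docs collection (both names are assumptions):

// Create a text index over the extracted content once.
db.docs.createIndex({ content: "text" })

// A search for "office" then becomes a $text query.
db.docs.find({ $text: { $search: "office" } })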

How to filter files in MongoDB

I am running MongoDB version 4.10.
How do I get the count of files filtered by format? I have .doc, .pdf, and .csv files in the file system in Mongo.
How do I check the count of files of each format (.csv, .pdf, .doc)?
Please let me know.
I am assuming you don't mean references, but files that are in GridFS. In this case, Mongo simply puts them in two collections: one for file metadata, and another for the chunks of the files, that is to say, the data of your files sliced into pieces. They are regular collections that you can query like any other collection. Their names are "fs.files" and "fs.chunks".
The one you want is "fs.files". To retrieve files by type, you can use the contentType field. Here is how you get the number of PDFs:
db.fs.files.find({contentType: "application/pdf"}).count()
// or if your question is only for counting
db.fs.files.count({contentType: "application/pdf"})
Like I said, just like any other collection.
EDIT:
var pdfCount = db.fs.files.find({contentType: "application/pdf"}).count();
var csvCount = db.fs.files.find({contentType: "text/csv"}).count();
var docCount = db.fs.files.find({contentType: "application/msword"}).count();
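If you want the counts for all formats in a single round trip, a $group aggregation over contentType works as well (the content types must match whatever was set when the files were uploaded):

db.fs.files.aggregate([
  { $group: { _id: "$contentType", count: { $sum: 1 } } }
])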

How to index array data types from MongoDB to Elasticsearch using Logstash

We were trying to index data from MongoDB using Logstash, but we were unable to index the array-type fields specifically; there were also no errors in the log file.
It was an issue with the MongoDB plugin we used in Logstash.
We added Ruby code to the Logstash config file to extract the arrays from the log_entry field.
Note: log_entry contains the complete field list and data.
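The answer does not show the actual configuration, but a Logstash filter along these lines illustrates the idea; the use of the ruby filter and every field name other than log_entry are assumptions:

filter {
  ruby {
    code => '
      entry = event.get("log_entry")
      # Copy an array field out of log_entry into a top-level field,
      # so Elasticsearch receives it as a proper array.
      if entry.is_a?(Hash) && entry["tags"].is_a?(Array)
        event.set("tags", entry["tags"])
      end
    '
  }
}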

Sphinx search with XML pipe

I have a JSON array of objects, and I am creating a Sphinx-compatible XML document from this JSON array. For each object I create one document and assign it an id value. For example, if the JSON array contains 20 objects, then I have to create 20 documents with ids 1 to 20. Now, my JSON array updates over time and new objects arrive, so I need to assign them ids starting from 21 and so on. Is there any way to maintain these id values internally on the Sphinx side?
Just store the latest used id somewhere, e.g. in a text file.
Or just get the highest id from the Sphinx index, so the script building the XML file can first run a SphinxQL query:
select id from gi_stemmed order by id desc limit 1;
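A minimal sketch of that approach in Node.js, assuming searchd exposes its SphinxQL listener on the default port 9306 and reusing the index name gi_stemmed from the query above; the mysql2 package is used because SphinxQL speaks the MySQL protocol:

const mysql = require('mysql2/promise');

async function nextSphinxId() {
  // Assumed SphinxQL listener address; 9306 is the default searchd port.
  const conn = await mysql.createConnection({ host: '127.0.0.1', port: 9306 });
  try {
    const [rows] = await conn.query(
      'SELECT id FROM gi_stemmed ORDER BY id DESC LIMIT 1'
    );
    // Empty index: start numbering at 1; otherwise continue after the highest id.
    return rows.length ? Number(rows[0].id) + 1 : 1;
  } finally {
    await conn.end();
  }
}

nextSphinxId().then(id => console.log('next document id:', id));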