Pymongo: iterate over all documents in the collection - mongodb

I am using PyMongo and trying to iterate over (10 millions) documents in my MongoDB collection and just extract a couple of keys: "name" and "address", then output them to .csv file.
I cannot figure out the right syntax to do it with find().forEach()
I was trying workarounds like
cursor = db.myCollection.find({"name": {$regex: REGEX}})
where REGEX would match everything - and it resulted in "Killed".
I also tried
cursor = db.myCollection.find({"name": {"$exist": True}})
but that did not work either.
Any suggestions?

I cannot figure out the right syntax to do it with find().forEach()
cursor.forEach() is not available for Python, it's a JavaScript function. You would have to get a cursor and iterate over it. See PyMongo Tutorial: querying for more than one document, where you can do :
for document in myCollection.find():
print(document) # iterate the cursor
where REGEX would match everything - and it resulted in "Killed".
Unfortunately there's lack of information here to debug on why and what 'Killed' is. Although if you would like to match everything, you can just state:
cursor = db.myCollection.find({"name": {$regex: /.*/}})
Given that field name contains string values. Although using $exists to check whether field name exists would be preferable than using regex.
While the use of $exists operator in your example above is incorrect. You're missing an s in $exists. Again, unfortunately we don't know much information on what 'didn't work' meant to help debug further.
If you're writing this script for Python exercise, I would recommend to review:
PyMongo Tutorial
MongoDB Tutorial: query documents
You could also enrol in a free online course at MongoDB University for M220P: MongoDB for Python Developers.
However, if you are just trying to accomplish your task of exporting CSV from a collection. As an alternative you could just use MongoDB's mongoexport. Which has the support for :
Exporting specific fields via --fields "name,address"
Exporting in CSV via --type "csv"
Exporting specific values with query via --query "..."
See mongoexport usage for more information.

I had no luck with .find().forEach() either, but this should find what you are searching for and then print it.
First find all documents that match what you are searching for
cursors = db.myCollection.find({"name": {$regex: REGEX}})
then iterate it over the matches
for cursor in cursors
print(cursor.get("name"))

The find() methods returns a PyMongo cursor, which is a reference to the result set of a query.
You have to de-reference, somehow, the reference(address).
After that, you will get a better understanding how to manipulate/manage the cursor.
Try the following for a start:
result = db.*collection_name*.find()
print(list(result))

I think I get the question but there's no accurate answer yet I believe. I had the same challenge and that's how I came about this, although, I don't know how to output to a .csv file. For my situation I needed the result in JSON. Here's my solution to your question using mongodb Projections;
your_collection = db.myCollection
cursor = list(your_collection.find( { }, {"name": 1, "address": 1}))
This second line returns the result as a list using the python list() function.
And then you can use jsonify(cursor) or just print(cursor) as a list.
I believe with the list it should be easier to figure how to output to a .csv.

Related

MongoDB $regex with $in clause

I need a mongodb query something like
db.getCollection("xyz").find({"_id" : {$regex : {$in : [xxxx/*]}}})
My Use case is -- I have a list of Strings such as
[xyz/12/poi, abc/98/mnb, ytn/65/tdx, ...]
The ids that are there in the collection(test) are something like
xyz/12/poi/2019061304.
I will get the values like xyz/12/poi from the input list, the other part of the id being yyyymmddhh format.
So, I need to go to the collection and find all the documents matching the input list with the ID of the documents in the test collection.
I can retrieve the documents individually but that does not seem to be a feasible option as the size of the input list is more than 10000.
Can you guys suggest a more feasible solution. Thanks in advance.
I tried using $in with $regex. But it seems mongodb does not support that. I have also tried pattern matching but even that is not feasible for me. Can you please suggest an alternative to using $in with $regex in mongodb.
Expected result could be an aggragate query/a normal query so that we hit the database only once and get the desired output rather than hitting the db for 10000 odd times.

Wildcard index search result problem. I can't fetch what I want

I've an instance of MongoDB with a lot of data in it and somebody trying to make search easier create a Wildcard Index but when I use .find() function I only got results when I use a complete sentence.
For example:
If I'm trying to find "Netherlands" but I only write "Neth" nothing happens. I want to make this kind of querys like when we use a Wildcard in SQL.
This is an example of what I'm doing:
db.getCollection('transaction').find({$text: {$search: "Neth"}}, {score: {$meta: "textScore"}}).sort({score:{$meta:"textScore"}})

How do I make a mongo query for something that is not in a subdocument array of heterodox size?

I have a mongodb collection full of 65k+ documents, each one with a properties named site_histories. The value of it is an array that might be empty, or might not be. If it is not empty, it will have one or more objects similar to this:
"site_histories" : "[{\"site_id\":\"129373\",\"accepted\":\"1\",\"rejected\":\"0\",\"pending\":\"0\",\"user_id\":\"12743\"}]"
I need to make a query that will look for every instance in the collection of a document that does not have a given user_id.
I'm pretty new to Mongo, so I was trying to make a query that would find every instance that does have the given user_id, which I was then planning on adding a "$ne" to, but even that didn't work. This is the query I was using that didn't work:
db.test.find({site_histories: { $elemMatch: {user_id: '12743\' }}})
So can anyone tell me why this query didn't work? And can anyone help me format a query that will do what I need the final query to do?
If your site_histories really is an array, it should be as simple as doing:
db.test.find({"site_histories.user_id": "12743"})
That looks in all the elements of the array.
However, I'm a bit scared of all those backslashes. If site_histories is a string, that won't work. It would mean that the schema is poorly designed, you'd maybe try with $regex

update mongo db documents with regex

I need to find all the documents in mongodb that have keywords that start with number 1-9, then add a '+' in front of the keyword, I can easily find the documents but cannot figure out how to update them.
I tried this one, but it doesn't work
db.placements.update({program_id:{$in:[113,107]},
keyword:{$regex:'^[0-9]', $options:'i'}},
{keyword:"+"+$keyword})
It cannot recognize $keyword, I also tried '.keyword', 'keyword', none of them works. Is there any way to reference the document itself like Java does, using 'this', so I can do something like
this.keyword: "+" + this.keyword
You'll have to use the $set operator in the update query to update a specific field. Also, you cannot concatenate string within an update query. One way to do this would be using cursor forEach() in the shell:
db.placements.find({program_id:{$in:[113,107]}, keyword:{$regex:'^[0-9]', $options:'i'}})
.forEach(function(doc){
db.placements.updateOne({_id:doc._id}, {$set:{"keyword":"+" + doc.keyword}})
})
No, you cannot reference a value on the document itself when querying like you can with SQL.
I would suggest querying the document, updating it on your web/app server and then updating the value back to mongodb.
You will also find that your update command above will wipe your entire document leaving only the keyword field. You should use the $set modifier to update a field or set of fields.
db.placements.update(
{
program_id:{$in:[113,107]},
keyword:{$regex:'^[0-9]', $options:'i'}
},
{ $set: {keyword: new_value}})

Haskell mongodb text search

What is the status of text search with haskell mongodb driver?
There is now 'LIKE' operator in mongo similar to SQL variants, so what is the best way to search a collection or the whole db for a particular text string?
I've read some people referencing external tools but I can also see that some text search was implemented in 2.4 mongo version which is done through command interface.
There should not be any problems doing it from console but how would I do it from haskell driver? I found 'runCommand' function in the driver APIs and it looks like it should be possible to send 'text' command to the server but the signature shows that it returns only one document - not a list of documents. So how is it done correctly?
How would I efficiently search for a word or a sentence in a collection or db so that it returns a list of documents containing the word? Is it possible to do without external tools using mongo 'text search' feature? SHould it be done in the application level?
Thanks.
The result type already contains the list of documents (that contain the searched text). Unfortunately, I could not test the query on my running database, but I have used runCommand to run an aggregation (before it was implemented for the haskell driver). The result document you get for such an query looks something like this:
{ results: [
{ score : ...,
obj : { ... }
},
...
],
... ,
ok : 1
}
The result document has a field results and its value is a document with fields score and obj. So in the end, you can find each of the matched document behind the obj-field in the list of results.
For more details, you should take a look here.