MongoDB $regex with $in clause - mongodb

I need a mongodb query something like
db.getCollection("xyz").find({"_id" : {$regex : {$in : [xxxx/*]}}})
My Use case is -- I have a list of Strings such as
[xyz/12/poi, abc/98/mnb, ytn/65/tdx, ...]
The ids that are there in the collection(test) are something like
xyz/12/poi/2019061304.
I will get the values like xyz/12/poi from the input list, the other part of the id being yyyymmddhh format.
So, I need to go to the collection and find all the documents matching the input list with the ID of the documents in the test collection.
I can retrieve the documents individually but that does not seem to be a feasible option as the size of the input list is more than 10000.
Can you suggest a more feasible solution? Thanks in advance.
I tried using $in with $regex, but it seems MongoDB does not support that. I have also tried pattern matching, but even that is not feasible for me. Can you please suggest an alternative to using $in with $regex in MongoDB?
The expected result could be an aggregate query or a normal query, so that we hit the database only once and get the desired output rather than hitting the db some 10000 times.
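One possible direction, sketched here with PyMongo-style query documents: although $regex operator expressions cannot appear inside $in, MongoDB does accept regular-expression values there, and PyMongo sends compiled Python patterns as such values. The helper name build_prefix_query is hypothetical, and the "/" separator after each prefix is assumed from the sample ids in the question. This is a sketch under those assumptions, not a tested solution for 10000+ patterns.

```python
import re

def build_prefix_query(prefixes):
    """Build one filter matching any _id that starts with one of the prefixes.

    Assumption: ids look like "<prefix>/<yyyymmddhh>", as in the question.
    """
    # $in takes compiled patterns (regex values), not {"$regex": ...} documents.
    patterns = [re.compile("^" + re.escape(p) + "/") for p in prefixes]
    return {"_id": {"$in": patterns}}

prefixes = ["xyz/12/poi", "abc/98/mnb", "ytn/65/tdx"]
query = build_prefix_query(prefixes)
# With a live connection (not shown here), usage might look like:
#   docs = db.test.find(query)
```

Anchoring each pattern with "^" keeps it a prefix match, and re.escape guards against input strings that contain regex metacharacters.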

Related

Pymongo: iterate over all documents in the collection

I am using PyMongo and trying to iterate over all the documents (about 10 million) in my MongoDB collection, extract just a couple of keys, "name" and "address", and output them to a .csv file.
I cannot figure out the right syntax to do it with find().forEach()
I was trying workarounds like
cursor = db.myCollection.find({"name": {"$regex": REGEX}})
where REGEX would match everything - and it resulted in "Killed".
I also tried
cursor = db.myCollection.find({"name": {"$exist": True}})
but that did not work either.
Any suggestions?
I cannot figure out the right syntax to do it with find().forEach()
cursor.forEach() is not available in Python; it's a JavaScript function. You have to get a cursor and iterate over it. See PyMongo Tutorial: querying for more than one document, where you can do:
for document in myCollection.find():
    print(document)  # iterate the cursor
where REGEX would match everything - and it resulted in "Killed".
Unfortunately there isn't enough information here to work out what 'Killed' is or why it happened. If you would like to match everything, though, you can just state:
cursor = db.myCollection.find({"name": {"$regex": ".*"}})
This assumes that the name field contains string values. That said, using $exists to check whether the name field exists would be preferable to using a regex.
Note, however, that the use of the $exists operator in your example above is incorrect: you're missing an s in $exists. Again, unfortunately we don't know enough about what 'didn't work' meant to help debug further.
If you're writing this script as a Python exercise, I would recommend reviewing:
PyMongo Tutorial
MongoDB Tutorial: query documents
You could also enrol in a free online course at MongoDB University for M220P: MongoDB for Python Developers.
However, if you are just trying to export a CSV from a collection, you could use MongoDB's mongoexport as an alternative. It supports:
Exporting specific fields via --fields "name,address"
Exporting in CSV via --type "csv"
Exporting specific values with query via --query "..."
See mongoexport usage for more information.
I had no luck with .find().forEach() either, but this should find what you are searching for and then print it.
First, find all documents that match what you are searching for:
cursor = db.myCollection.find({"name": {"$regex": REGEX}})
Then iterate over the matches:
for doc in cursor:
    print(doc.get("name"))
The find() method returns a PyMongo cursor, which is a reference to the result set of a query.
You have to de-reference that reference somehow, for example by iterating over it or converting it to a list.
After that, you will get a better understanding of how to manipulate and manage the cursor.
Try the following for a start:
result = db.collection_name.find()  # collection_name is a placeholder
print(list(result))
I think I understand the question, but I don't believe there is an accurate answer yet. I had the same challenge, which is how I came across this, although I don't know how to output to a .csv file. In my situation I needed the result as JSON. Here's my solution to your question using MongoDB projections:
your_collection = db.myCollection
cursor = list(your_collection.find({}, {"name": 1, "address": 1}))
The second line returns the result as a list using the Python list() function.
You can then use jsonify(cursor) or just print(cursor).
I believe that with the list it should be easier to figure out how to output to a .csv file.
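For the .csv part, one possible sketch uses Python's standard csv module. The helper name write_csv is hypothetical, and the sample documents below are stand-ins for what a projected find() would return. Because the function only iterates, a PyMongo cursor could be passed in directly, without first building a list of all 10 million documents.

```python
import csv
import io

def write_csv(documents, out, fields=("name", "address")):
    """Write the chosen fields of each document to a CSV stream, one row each."""
    writer = csv.DictWriter(out, fieldnames=list(fields))
    writer.writeheader()
    count = 0
    for doc in documents:  # a PyMongo cursor can be iterated the same way
        # Missing keys become empty cells instead of raising KeyError.
        writer.writerow({f: doc.get(f, "") for f in fields})
        count += 1
    return count

# Stand-in data; with a live connection (not shown) one might pass
#   db.myCollection.find({}, {"_id": 0, "name": 1, "address": 1})
sample = [{"name": "Ann", "address": "1 Main St"}, {"name": "Bob"}]
buffer = io.StringIO()
rows_written = write_csv(sample, buffer)
```

To write a real file, open it with open("out.csv", "w", newline="") and pass the file object in place of the StringIO buffer.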

Mongo DB search based on multiple conditions

I am trying to search based on multiple conditions. The search works, but it does not behave the way I expect.
Assuming I have a search query like
Orders.find({$or: {"status":{"$in":["open", "closed"]},"paymentStatus":{"$in":["unpaid"]}}})
and I add another filter parameter such as approvalStatus, it does not keep the previously found items. Instead, it treats the query like an AND, which returns an empty collection if one of the conditions does not match.
How can I write a query that, regardless of what is passed into it, retains previously found items even if one of the conditions has no matching record?
Something like a simple OR query in SQL.
I hope I explained this well enough.
Using $or here is the right approach, but its value needs to be an array of query expressions, not an object.
So your query should look something like this instead:
Orders.find({$or: [
    {"status": {"$in": ["open", "closed"]}},
    {"paymentStatus": {"$in": ["unpaid"]}},
    {"approvalStatus": {"$in": ["approved"]}}
]})
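If the filter parameters arrive dynamically, the $or array can be assembled in code. The following is a minimal Python sketch; the helper name build_or_query is hypothetical, and it assumes that a filter with no selected values should simply be left out, and that having no filters at all should match everything.

```python
def build_or_query(filters):
    """Turn {field: [allowed values]} into a single $or filter.

    Fields with no values are skipped, so an empty filter does not
    change the result.
    """
    clauses = [
        {field: {"$in": list(values)}}
        for field, values in filters.items()
        if values
    ]
    # $or needs a non-empty array; fall back to {} (assumed: match everything).
    return {"$or": clauses} if clauses else {}

query = build_or_query({
    "status": ["open", "closed"],
    "paymentStatus": ["unpaid"],
    "approvalStatus": [],  # nothing selected, so this clause is omitted
})
```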

How do I make a mongo query for something that is not in a subdocument array of heterodox size?

I have a MongoDB collection of 65k+ documents, each with a property named site_histories. Its value is an array that may or may not be empty. If it is not empty, it holds one or more objects similar to this:
"site_histories" : "[{\"site_id\":\"129373\",\"accepted\":\"1\",\"rejected\":\"0\",\"pending\":\"0\",\"user_id\":\"12743\"}]"
I need to make a query that will look for every instance in the collection of a document that does not have a given user_id.
I'm pretty new to Mongo, so I was trying to make a query that would find every instance that does have the given user_id, which I was then planning on adding a "$ne" to, but even that didn't work. This is the query I was using that didn't work:
db.test.find({site_histories: { $elemMatch: {user_id: '12743\' }}})
So can anyone tell me why this query didn't work? And can anyone help me format a query that will do what I need the final query to do?
If your site_histories really is an array, it should be as simple as doing:
db.test.find({"site_histories.user_id": "12743"})
That looks in all the elements of the array.
However, I'm a bit wary of all those backslashes. If site_histories is a string, that won't work. It would mean the schema is poorly designed; in that case you could try matching with $regex.
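For the "does not have a given user_id" part of the question, here is a hedged Python sketch covering both cases. The helper name missing_user_query is hypothetical, and the string branch assumes the field holds JSON text shaped like the sample in the question, with no spaces around the colon.

```python
import re

def missing_user_query(user_id, stored_as_string=False):
    """Filter for documents whose site_histories does not mention user_id."""
    if not stored_as_string:
        # On a real array, $ne on the dotted path matches only documents
        # where no element has this user_id (empty arrays included).
        return {"site_histories.user_id": {"$ne": user_id}}
    # Assumption: the field is JSON text such as [{"user_id":"12743", ...}].
    pattern = re.compile(re.escape('"user_id":"%s"' % user_id))
    # $not accepts a regex value and negates the match.
    return {"site_histories": {"$not": pattern}}

array_query = missing_user_query("12743")
string_query = missing_user_query("12743", stored_as_string=True)
```

The regex branch cannot use an index efficiently and depends on the exact text layout, so storing site_histories as a real array is likely the better long-term fix.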

to compare two fields of the same collection

I want to compare two fields of the same collection in MongoDB with CakePHP (MySQL query example: "SELECT * FROM table AS t WHERE t.field1 > t.field2;"). I cannot use MongoDB's $where or aggregate, because I am also using other MongoDB operators such as $or and $and, and I am using MongoDB's find.
Example: the collection has two integer fields, per_day_budget and today_spent, and I want to get the list of records where today_spent is less than or equal to per_day_budget. I hope this helps you better understand my question.
Kindly suggest a solution for the same.
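You can try $expr (available in MongoDB 3.6 and later), which compares two fields of the same document inside find():
db.collection.find({ $expr: { $lte: ["$today_spent", "$per_day_budget"] } });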
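Since the question also needs $or and $and in the same find(), here is a small Python sketch of how a field-to-field comparison might be combined with other clauses. It assumes MongoDB 3.6 or later, where $expr is allowed in find(). The helper name within_budget_query and the status values are hypothetical.

```python
def within_budget_query(extra_conditions=None):
    """Filter for documents where today_spent <= per_day_budget.

    Assumes MongoDB 3.6+, where $expr may be used inside find().
    """
    compare = {"$expr": {"$lte": ["$today_spent", "$per_day_budget"]}}
    if not extra_conditions:
        return compare
    # $expr is an ordinary query clause, so it can sit next to $or, $and, etc.
    return {"$and": [compare] + list(extra_conditions)}

query = within_budget_query([
    {"$or": [{"status": "active"}, {"status": "paused"}]},  # hypothetical filter
])
```

The same nested structure should carry over to a PHP array when the query is built from CakePHP.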

Mongodb: return matched filters when using $or in find()

Suppose I am doing a query in Mongodb like this
db.user.find({$or : [{"field1" : "abc"}, {"field2" : "def"}, {"field3" : "ghi"}]})
And a number of documents are returned. What is the easiest way to know which one (or multiple) of the three filters is matched for each document returned? By "easiest", I do not wish to add more executions of find()'s.
Thanks.
There is no option to solve this at the MongoDB query layer. You will likely need to perform individual queries instead of one big $or query to solve your problem.
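If extra find() calls are unwanted, another possibility is to re-check each returned document on the client. The sketch below is in Python; the helper name matched_filters is hypothetical, and it only handles flat equality filters like those in the question. It does not reproduce MongoDB's matching rules for arrays, nested paths, or operators.

```python
def matched_filters(document, filters):
    """Return the indexes of the equality filters that the document satisfies."""
    return [
        i
        for i, flt in enumerate(filters)
        if all(
            field in document and document[field] == value
            for field, value in flt.items()
        )
    ]

filters = [{"field1": "abc"}, {"field2": "def"}, {"field3": "ghi"}]
doc = {"_id": 1, "field1": "abc", "field2": "zzz", "field3": "ghi"}
print(matched_filters(doc, filters))  # prints [0, 2]
```

This keeps the single $or query and adds only an in-memory pass over the results.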