How do I exclude a term in a MongoDB query - mongodb

I have URLs in this format stored in MongoDB:
Source:
index.php?name=xxxxxxxxxxxxxabcxxxxxxxx&id=15&success=1
index.php?name=xxxxxxxdefxxxxxxxxxxxx&id=18&success=0
where xxxxxxxxxxxxxxx is an arbitrary string.
I want to write a query to find all sources where name does not contain "abc" as a substring,
so I wrote the query:
db.coll.find({source:/(?!name=abc)/})
but this query is not working. Please tell me what the correct query would be.

db.coll.find({source: {$not: /[?&]name=[^&]*abc/}})
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-Metaoperator%3A%24not
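You can test the patterns locally before sending them to the server. This also shows why the lookahead in the question matches everything. The sketch below is minimal and uses Python's re module as a stand-in for MongoDB's regex engine. That is an assumption, but it holds for simple patterns like these. SOURCES holds the question's two sample values. [^&]* confines the "abc" check to the value of the name parameter. Be aware that $not also returns documents that have no source field at all.

```python
import re

# The question's two sample values of the "source" field.
SOURCES = [
    "index.php?name=xxxxxxxxxxxxxabcxxxxxxxx&id=15&success=1",
    "index.php?name=xxxxxxxdefxxxxxxxxxxxx&id=18&success=0",
]

# The question's pattern: an unanchored negative lookahead. A regex search
# tries every position and always finds one where "name=abc" does not start,
# so this produces an (empty) match on every string.
LOOKAHEAD = re.compile(r"(?!name=abc)")

# "The name parameter's value contains abc": [^&]* cannot run past the "&"
# that ends the name=... parameter, so "abc" in other parameters is ignored.
NAME_HAS_ABC = re.compile(r"[?&]name=[^&]*abc")

# PyMongo filter: $not inverts the regex. PyMongo sends compiled Python
# patterns to the server as BSON regular expressions.
QUERY = {"source": {"$not": NAME_HAS_ABC}}


def local_filter(sources):
    """Mimic the $not query locally: keep values the regex does NOT match."""
    return [s for s in sources if not NAME_HAS_ABC.search(s)]


if __name__ == "__main__":
    print([bool(LOOKAHEAD.search(s)) for s in SOURCES])  # [True, True]
    print(local_filter(SOURCES))  # only the "def" URL remains
```

With PyMongo, QUERY can be passed directly as db.coll.find(QUERY).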

You could use a regex with $nin (not in), but I don't think that is supported as a single query yet...
Try looking at:
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24nin
http://jira.mongodb.org/browse/SERVER-322
https://github.com/mongodb/mongo/commit/6c7dc2b0f8831fac6621f125889d873241588b02

Related

mongodb index on regex fields not working

I'm new to MongoDB and I'm facing a performance issue that I need your help with. I have a collection with 400k records. With no index on any field, each query takes 20-30s, so I created indexes on the fields that I usually use in search queries. The problem is that when I use $regex to search a string field that has an index, MongoDB does not use the index on that field and still scans all records in the collection. I searched the internet for "index on regex fields mongodb" and found some answers saying that "MongoDB uses the prefix of a RegEx to look up indexes". That means you have to use the "^" prefix for the index to work, like "db.users.find({name: /^key word/})", but that is not working for me. Does "index on $regex field" need MongoDB Atlas to work? I'm using the community version of MongoDB. Thanks!
There's a lot to unpack here. We'll split the answer into two parts, the first to try and answer some of the direct questions about index usage and the second to explore solutions to satisfy the application requirements.
Index Usage with $regex
As is true with an index in any database that captures the full string value as the key, MongoDB can use the index for a $regex operation but its efficiency in doing so greatly depends on the regex being applied. That is what the Index Use documentation from the comments and the other answers you reference are describing.
In the comments you mention that an example query might be db.users.find({name: {$regex: '.*keyword.*', $options: 'i'}}). That means that the regex is both unanchored and case-insensitive. The aforementioned documentation states directly:
Case insensitive regular expression queries generally cannot use indexes effectively.
Why is this? Because the substring that you are searching for can be found in any string value captured by the index. So the document with matching value {name: 'a keyword'} would be located at one end of the index, {name: 'keyWord'} may be somewhere in the middle, and {name: 'Z keyword'} may be at the end. The only way to ensure correct results is for the database to scan the index for all string values. So while it is still using the index, it may not be efficient, as most of the scanned values will not match and will be discarded.
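A sorted list can stand in for the index to model the two cases. The first is an anchored, case-sensitive prefix such as /^keyword/, which can be answered from one contiguous key range. The second is an unanchored, case-insensitive pattern, where every key must be tested. This is a simplified sketch and not MongoDB's actual B-tree code. The index contents are made-up examples. With the default binary string order, uppercase letters sort before lowercase ones, so "Z keyword" comes first here.

```python
import bisect
import re

# A sorted list stands in for an index on "name" (simplified model, not
# MongoDB's B-tree). Default binary order puts "Z..." before "a...".
INDEX = sorted([
    "a keyword", "apple", "keyWord", "keyword stuffing",
    "keywords", "Z keyword", "zebra",
])


def prefix_scan(index, prefix):
    """Model of /^prefix/ (anchored, case-sensitive): seek to the first key
    >= prefix, then walk forward only while keys still start with it."""
    start = bisect.bisect_left(index, prefix)
    matches, examined = [], 0
    for key in index[start:]:
        examined += 1  # counts the key that ends the scan as well
        if not key.startswith(prefix):
            break  # sorted order: no later key can start with prefix
        matches.append(key)
    return matches, examined


def full_scan(index, pattern):
    """Model of /pattern/i (unanchored, case-insensitive): every key is tested."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [key for key in index if regex.search(key)], len(index)


if __name__ == "__main__":
    print(prefix_scan(INDEX, "keyword"))  # 2 matches, 3 keys examined
    print(full_scan(INDEX, "keyword"))    # 5 matches, all 7 keys examined
```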
You may always use .explain() to better understand how the database is answering the query, such as if and how it is using an index.
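When you read .explain() output programmatically, the key question is whether the winning plan contains an IXSCAN stage or a COLLSCAN. The sketch below walks the queryPlanner.winningPlan section of an explain document, for example the dict returned by PyMongo's Cursor.explain(). The exact nesting varies between server versions, so the code simply searches recursively for "stage" keys. SAMPLE_EXPLAIN is a hand-written, abbreviated example and not real server output. To see how much of the index was actually read, request explain at executionStats verbosity, which reports totalKeysExamined.

```python
def plan_stages(node):
    """Recursively collect every "stage" value in (part of) an explain document."""
    stages = []
    if isinstance(node, dict):
        if isinstance(node.get("stage"), str):
            stages.append(node["stage"])
        for value in node.values():
            stages.extend(plan_stages(value))
    elif isinstance(node, list):
        for item in node:
            stages.extend(plan_stages(item))
    return stages


def winning_stages(explain):
    """Stages of the winning plan only; rejected plans are ignored."""
    return plan_stages(explain.get("queryPlanner", {}).get("winningPlan", {}))


def uses_index(explain):
    return "IXSCAN" in winning_stages(explain)


# Hand-written, abbreviated example of explain output (not real server output).
SAMPLE_EXPLAIN = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "name_1"},
        },
        "rejectedPlans": [],
    }
}
```

With PyMongo, you could call uses_index(db.users.find({"name": {"$regex": "^keyword"}}).explain()).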
Solutions
So what do we do about this?
Well, as @rickhg12hs suggests in the comments, it depends on exactly what you are trying to achieve. You reiterate that you are looking for 'full regex search capability', but that is really an approach/solution rather than a goal. If what you really need, for example, is just to match an exact string in a case-insensitive manner, then something as simple as a case-insensitive index would likely do the trick.
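In MongoDB, a case-insensitive index is an index created with a collation whose strength ignores case. The sketch below shows the raw database command documents, which you could send with PyMongo's db.command(...). The collection name users, the index name name_ci, and locale "en" with strength 2 are assumptions for illustration. A query uses such an index only if it specifies the same collation. The small helper checks just that one condition.

```python
# Hypothetical names: collection "users", index "name_ci".
# strength 2 compares base letters and accents but ignores case.
CI_COLLATION = {"locale": "en", "strength": 2}

CREATE_INDEX = {
    "createIndexes": "users",
    "indexes": [{"key": {"name": 1}, "name": "name_ci", "collation": CI_COLLATION}],
}

# Exact, case-insensitive match: no regex needed, and the query carries the
# same collation so the planner can choose name_ci.
FIND = {"find": "users", "filter": {"name": "keyword"}, "collation": CI_COLLATION}


def index_usable(create_cmd, find_cmd):
    """Simplified check: does some index have exactly the query's collation?"""
    return any(ix.get("collation") == find_cmd.get("collation")
               for ix in create_cmd["indexes"])
```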
However, if you truly do wish to perform arbitrary substring searching, then you are really looking at search engine capabilities. In that situation your best bets would probably be to emulate their indexes directly in MongoDB (e.g. have the application manually tokenize the strings to be indexed), stand up something like Solr/Elasticsearch next to MongoDB, or use MongoDB's Atlas Search offering. The $text operator mentioned in the comments has limitations when it comes to substring searching (such as matching just part of a word), which may or may not be relevant for your needs.
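One way to "manually tokenize the strings to be indexed" is to store the n-grams of each value in an array field with an ordinary (multikey) index. You then look up the grams of the search term with $all. This is a hypothetical schema sketch using trigrams, and the field name name_grams is made up. Matching every gram does not guarantee that the grams are contiguous in the original value. For that reason, the filter keeps a regex on the original field as a final check.

```python
import re


def trigrams(text):
    """Sorted, de-duplicated, lower-cased 3-character substrings of text."""
    text = text.lower()
    return sorted({text[i:i + 3] for i in range(len(text) - 2)})


def document_for(name):
    # Hypothetical schema: keep the original value plus its grams, and create
    # an ordinary index on "name_grams" (MongoDB makes it multikey).
    return {"name": name, "name_grams": trigrams(name)}


def substring_filter(term):
    """PyMongo-style filter for names containing term (case-insensitive)."""
    grams = trigrams(term)
    if not grams:
        raise ValueError("terms shorter than 3 characters need another strategy")
    return {
        "name_grams": {"$all": grams},                          # index-assisted narrowing
        "name": {"$regex": re.escape(term), "$options": "i"},  # exact final check
    }


def local_match(doc, term):
    """Local equivalent of substring_filter, showing why both parts are needed."""
    has_grams = set(trigrams(term)) <= set(doc["name_grams"])
    return has_grams and term.lower() in doc["name"].lower()
```

For example, "abcxbcd" contains both grams of "abcd" but not "abcd" itself. Only the second condition rejects it.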

MongoDB $regex with $in clause

I need a mongodb query something like
db.getCollection("xyz").find({"_id" : {$regex : {$in : [xxxx/*]}}})
My use case: I have a list of strings such as
[xyz/12/poi, abc/98/mnb, ytn/65/tdx, ...]
The ids in the collection (test) look like
xyz/12/poi/2019061304.
I get values like xyz/12/poi from the input list; the rest of the id is in yyyymmddhh format.
So, I need to go to the collection and find all the documents matching the input list with the ID of the documents in the test collection.
I can retrieve the documents individually but that does not seem to be a feasible option as the size of the input list is more than 10000.
Can you guys suggest a more feasible solution. Thanks in advance.
I tried using $in with $regex. But it seems mongodb does not support that. I have also tried pattern matching but even that is not feasible for me. Can you please suggest an alternative to using $in with $regex in mongodb.
The expected result could be an aggregate query or a normal query, as long as we hit the database only once and get the desired output rather than hitting it 10,000-odd times.
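One possible approach is sketched here in Python with PyMongo-style filters. $in accepts regular expressions as array elements, so many prefixes can go into one filter. The $regex operator cannot wrap $in, which is why the attempted query fails. Anchoring each pattern with ^ and keeping it case-sensitive lets the server derive bounds on the _id index. The trailing slash-plus-10-digits part of each pattern encodes the yyyymmddhh suffix from the question. The batch size of 1000 is an arbitrary assumption. Measure with explain() whether one very large $in performs acceptably.

```python
import re


def prefix_filter(prefixes):
    """One PyMongo-style filter for many id prefixes.

    Each pattern is anchored with ^ and case-sensitive so it can be turned
    into _id index bounds; the slash-plus-10-digits tail matches yyyymmddhh.
    """
    return {"_id": {"$in": [re.compile("^" + re.escape(p) + r"/\d{10}$") for p in prefixes]}}


def batched_filters(prefixes, batch_size=1000):
    """Yield one filter per batch (batch_size is an arbitrary assumption)."""
    for start in range(0, len(prefixes), batch_size):
        yield prefix_filter(prefixes[start:start + batch_size])


def local_match(doc_id, flt):
    """What the server would decide for one _id: does any pattern match?"""
    return any(rx.search(doc_id) for rx in flt["_id"]["$in"])


# Usage with PyMongo (hypothetical db handle; collection "test" as in the question):
#     for flt in batched_filters(input_list):
#         for doc in db.test.find(flt):
#             ...
```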

How to use query commands in MongoDB?

MongoDb query
I am new to MongoDB; I just started learning recently. When I use a query command, for instance db.tests.find({"by":"Srihari"}), it does not give any output. Is there anything wrong with my query? Please help!
From the screenshot you've shared, the following documents exist in your tests collection:
{"username": "srihari"}
{"username": "srih"}
{"username": "srh"}
{"username": "sh"}
The query you're sending to MongoDB is:
db.tests.find({"by":"Srihari"})
There isn't any document in tests collection that matches your query.
However, you can query like this:
db.tests.find({"username": "sh"})
will definitely return a result.
In MongoDB you specify equality conditions, using <field>:<value> expressions in the query filter. So db.tests.find({"by":"Srihari"}) is looking for all documents where the field "by" has the value "Srihari".
Since your document has the format
{
username: "srihari"
}
your query should be:
db.tests.find({username: "srihari"})
You can see more examples here: https://docs.mongodb.com/manual/tutorial/query-documents/
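A tiny local model can check the equality rule: a filter matches a document only if every filtered field exists in it with exactly the same value. This sketch handles only plain string values. Real MongoDB equality also matches array elements and treats null specially. The sample documents are the ones from the screenshot. The model also shows that the comparison is case-sensitive, so "Srihari" does not match "srihari".

```python
def matches(doc, flt):
    """Equality-only model: every filter field must exist with the same value."""
    return all(field in doc and doc[field] == value for field, value in flt.items())


def find(docs, flt):
    return [doc for doc in docs if matches(doc, flt)]


# The documents visible in the screenshot.
TESTS = [{"username": "srihari"}, {"username": "srih"},
         {"username": "srh"}, {"username": "sh"}]
```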

Pymongo: iterate over all documents in the collection

I am using PyMongo and trying to iterate over the (10 million) documents in my MongoDB collection, extracting just a couple of keys, "name" and "address", and then output them to a .csv file.
I cannot figure out the right syntax to do it with find().forEach()
I was trying workarounds like
cursor = db.myCollection.find({"name": {"$regex": REGEX}})
where REGEX would match everything - and it resulted in "Killed".
I also tried
cursor = db.myCollection.find({"name": {"$exist": True}})
but that did not work either.
Any suggestions?
I cannot figure out the right syntax to do it with find().forEach()
cursor.forEach() is not available in Python; it's a JavaScript (mongo shell) method. You have to get a cursor and iterate over it. See PyMongo Tutorial: querying for more than one document, where you can do:
for document in db.myCollection.find():
    print(document)  # iterate the cursor
where REGEX would match everything - and it resulted in "Killed".
Unfortunately there isn't enough information here to debug what 'Killed' means or why it happened. If you would like to match everything, though, you can just state:
cursor = db.myCollection.find({"name": {"$regex": ".*"}})
This assumes the field name contains string values. Still, using $exists to check whether the field name exists would be preferable to using a regex.
However, the use of the $exists operator in your example above is incorrect: you're missing an s in $exists. Again, unfortunately we don't know what 'didn't work' meant, so we can't help debug further.
If you're writing this script as a Python exercise, I would recommend reviewing:
PyMongo Tutorial
MongoDB Tutorial: query documents
You could also enrol in a free online course at MongoDB University for M220P: MongoDB for Python Developers.
However, if you are just trying to accomplish the task of exporting a CSV from a collection, you could use MongoDB's mongoexport as an alternative. It supports:
Exporting specific fields via --fields "name,address"
Exporting in CSV via --type "csv"
Exporting specific values with query via --query "..."
See mongoexport usage for more information.
I had no luck with .find().forEach() either, but this should find what you are searching for and then print it.
First, find all documents that match what you are searching for:
cursors = db.myCollection.find({"name": {"$regex": REGEX}})
Then iterate over the matches:
for document in cursors:
    print(document.get("name"))
The find() method returns a PyMongo cursor, which is a reference to the result set of a query.
You have to dereference that reference somehow.
After that, you will get a better understanding of how to manipulate/manage the cursor.
Try the following for a start:
result = db.collection_name.find()  # replace collection_name with your collection's name
print(list(result))
I think I get the question, but I don't believe there's an accurate answer yet. I had the same challenge and that's how I came across this, although I don't know how to output to a .csv file. In my situation I needed the result in JSON. Here's my solution to your question using MongoDB projections:
your_collection = db.myCollection
cursor = list(your_collection.find({}, {"name": 1, "address": 1}))
This second line returns the result as a list using Python's list() function.
And then you can use jsonify(cursor), or just print(cursor) as a list.
I believe that with the list it should be easier to figure out how to output to a .csv.
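To write name and address to a .csv, you can stream the cursor into Python's csv module. Turning it into a list first would hold every document in memory at once. Here is a minimal sketch, assuming collection is a PyMongo Collection such as db.myCollection. The function name export_name_address is hypothetical.

```python
import csv


def export_name_address(collection, out):
    """Stream name/address from a PyMongo-style collection into CSV.

    The cursor is iterated lazily, one document at a time, instead of being
    converted with list(). Returns the number of rows written.
    """
    cursor = collection.find(
        {"name": {"$exists": True}},           # note the s in $exists
        {"_id": 0, "name": 1, "address": 1},   # projection: only these two fields
    )
    writer = csv.DictWriter(out, fieldnames=["name", "address"], extrasaction="ignore")
    writer.writeheader()
    count = 0
    for doc in cursor:
        writer.writerow(doc)  # a missing "address" is written as an empty field
        count += 1
    return count
```

To use it, open the output file with open("export.csv", "w", newline="") and pass the file object together with db.myCollection.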

mongodb wildcard query from grails/groovy

I’m having some problems with issuing a wildcard query in MongoDB from my Grails application.
Basically, the way I am doing it now is by issuing a find query with a map of query parameters:
db.log.find(criteria) -> where criteria is a map [testId:"test"]
This works fine as long as I’m strictly querying on actual values. However, for fun, I tried it with a wildcard search instead:
db.log.find(criteria) -> this time criteria = [testId:/.*te.*/]
However, in the Mongo query log this appears as:
query: { query: { testId: "/.*te.*/" }
hence making the query not a wildcard search, but a query for this as a string, instead.
Is there a way to work around this in some sense still using this concept of querying?
Thanks in advance!
Use the Groovy Pattern shortcut ~ to specify that your query is a regular expression.
db.log.find(['testId': ~/.*te.*/])
See this blog post for more info
To use a regex query, define the query condition with the $regex operator, passing the pattern without surrounding slashes:
def regexCondition = ['$regex': '.*te.*']
def criteria = ['testId': regexCondition]
db.log.find(criteria)
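The slashes matter with $regex. The single-quoted string '/.*te.*/' is plain text that includes the slash characters, so as a $regex pattern it matches only values that literally contain "/". The sketch below uses Python's re module as a stand-in for MongoDB's regex engine. That is an assumption, but it holds for this simple pattern. It compares an equality test on the slashy text (what the query log shows), a pattern with slashes, and the bare pattern. VALUES is made-up sample data.

```python
import re

VALUES = ["test", "latest", "other"]

# The text '/.*te.*/' used as a pattern: the slashes are literal characters.
WITH_SLASHES = re.compile("/.*te.*/")
# The bare pattern body is what $regex expects.
PATTERN = re.compile(".*te.*")


def equality_matches(values, text):
    """A plain string value in a filter is an exact-equality test."""
    return [v for v in values if v == text]


def regex_matches(values, rx):
    """A regex condition keeps values where the pattern is found."""
    return [v for v in values if rx.search(v)]
```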
This worked for me:
In your groovy file:
db.collectionName.find([fieldName:[$regex:'pattern']])
More or less, use a regular mongodb query, but replace the {} with [].