Search for MongoDB records containing a certain string - mongodb

I have a MongoDB database full of tweets that I gathered with the tweepy API. I want users of my web application to be able to search for a hashtag and see the tweets containing it.
Currently, I load the DB records into a list and iterate through it to display them all, but now I want to refine the search so the user can choose what they see. I have the user's search saved in a variable and have tried the following ideas, but none of them work.
My first idea was to just pass in the variable and hope for the best:
def display():
    input = request.form['input']  # setting variable for user input
    # Set up Mongo Client
    client = MongoClient("mongo.user/pass")
    # Accessing database
    db = client.tweets
    # accessing a collection
    posts = db.posts
    data = list(posts.find(input))
    return render_template('results.html', posts_info=data)
With this, I get a TypeError, which I somewhat expected, as I didn't expect it to be this easy.
After some reading online, I tried using a regex:
def display():
    input = request.form['input']  # setting variable for user input
    # Set up Mongo Client
    client = MongoClient("mongo.user/pass")
    # Accessing database
    db = client.tweets
    # accessing a collection
    posts = db.posts
    data = list(posts.find({Tweet: {$regex: input}}))
    return render_template('results.html', posts_info=data)
This also didn't work, so I tried hard-coding the regex to see whether the user input variable was causing the issue:
def display():
    input = request.form['input']  # setting variable for user input
    # Set up Mongo Client
    client = MongoClient("mongo.user/pass")
    # Accessing database
    db = client.tweets
    # accessing a collection
    posts = db.posts
    data = list(posts.find({Tweet: {$regex: "GoT"}}))
    return render_template('results.html', posts_info=data)
With both of these methods, I get a syntax error at the start of the regex expression; the interpreter flags the $ before regex.
The error message reads:
File "pathToWebApp/webApp.py", line 71
data = list(posts.find({Tweet: {$regex: input}}))
^
SyntaxError: invalid syntax
I've never worked with MongoDB or used regexes, so I'm at a complete loss here. I've scoured the Mongo docs, but nothing I've tried works, so any help would be greatly appreciated.

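By the looks of it, you need to wrap the $regex key in quotes, and the Tweet field name as well, since a PyMongo filter is an ordinary Python dictionary whose keys are strings:
data = list(posts.find({"Tweet": {"$regex": input}}))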
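Building on the quoted-key fix above, here is a minimal sketch of a helper that turns the raw form input into a filter dictionary. The helper name `build_hashtag_filter` is hypothetical, and the sketch assumes the tweet text lives in the `Tweet` field and includes the literal `#` character. Escaping the input with `re.escape` keeps characters such as `+` or `.` from being interpreted as regex syntax, and `$options: "i"` makes the match case-insensitive.

```python
import re


def build_hashtag_filter(user_input, field="Tweet"):
    """Return a filter dict matching documents whose `field` contains the hashtag.

    Assumption: the stored tweet text includes the leading '#'.
    """
    # Accept both "GoT" and "#GoT" from the search box.
    tag = user_input.strip().lstrip("#")
    # Escape regex metacharacters so the input is matched literally.
    pattern = "#" + re.escape(tag)
    return {field: {"$regex": pattern, "$options": "i"}}


# Intended use inside the view (not executed here):
#     data = list(posts.find(build_hashtag_filter(request.form['input'])))
print(build_hashtag_filter("GoT"))
```

Keeping the filter construction in its own function also makes it possible to check the generated query without a running database.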

Related

MongoDB Atlas - Trigger returning empty object

I've been searching through the documentation and various other Stack Overflow posts, but I can't find out how to do this. I'm trying to set up a Trigger for MongoDB Atlas with the following function:
exports = function() {
  const mongodb = context.services.get("test_cluster");
  const db = mongodb.db("test_database");
  const test_collection = db.collection("test_collection");
  var tasks = test_collection.find();
  console.log(JSON.stringify(tasks));
};
but the logs show "{}" every time I run it. Is there something basic that I'm missing here? The collection contains some dummy data, each document with a basic _id and some values.
I figured out my problem: Atlas does not output the information from console.log to its console in the browser. I had set up the variables correctly, but to get the information I wanted I needed to use a return statement instead.

unable to set db using string while coding in python

I would like to know if there is a way to set db using a variable.
For example: I am coding in Python, and I connect using client = MongoClient(uri). All goes fine.
There are 4 dbs: test1, test2, test3, test4.
I am able to list them all.
dblist = client.list_database_names()
print(dblist)
All goes fine.
Now, instead of connecting with
db = client.test1
is there a way to use a string rather than the actual name of the db,
such as str = 'test1' and then db = client.str?
(This doesn't work.)
In my program, I display the list of dbs first and then take user input to select a db before proceeding with the rest of the flow, but I am unable to do so.
Please help.
You cannot use a string variable as the name with attribute access: client.str looks up a database literally named "str". However, there is a method that takes the name of a database as a string and returns that database:
db = client.get_database('test')
Here is the documentation: https://api.mongodb.com/python/current/api/pymongo/mongo_client.html
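For the flow described in the question (list the databases, then let the user pick one), a minimal sketch might look like the following. The helper name `pick_database` is hypothetical; `list_database_names()` and `get_database()` are the calls already shown above, and `client[name]` is the dictionary-style equivalent of `get_database(name)`.

```python
def pick_database(client, requested_name):
    """Return the database the user asked for, or raise if it does not exist."""
    available = client.list_database_names()
    if requested_name not in available:
        # Refuse unknown names; otherwise a typo would silently refer to
        # a database that does not exist yet.
        raise ValueError(
            f"unknown database {requested_name!r}; choose from {available}"
        )
    # Item access takes the name as a string, unlike client.test1.
    return client[requested_name]


# Intended use (not executed here):
#     client = MongoClient(uri)
#     db = pick_database(client, input("Database: "))
```

Validating against the listed names is a design choice for this sketch, not something PyMongo requires.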

find_one() finds duplicates that are not there

I am trying to copy a remote MongoDB Atlas server to a local one. I do this with a Python script that also checks whether each record is already there. Even though the local database is empty, my script finds duplicates that are not in the remote MongoDB Atlas database (at least I cannot find them). I am not very experienced with MongoDB and PyMongo, and I cannot see what I am doing wrong. Sometimes find_one() returns exactly the same record as before (all the fields are the same, even the _id).
I removed the collection completely from my local server and tried again, but still the same result.
UserscollectionRemote = dbRemote['users']
UserscollectionNew = dbNew['users']
LogcollectionRemote = dbRemote['events']
LogcollectionNew = dbNew['events']
UsersOrg = UserscollectionRemote.find()
for document in UsersOrg:  # loop over all users
    print(document)
    if UserscollectionNew.find_one({'owner_id': document["owner_id"]}) is None:  # check if already there
        UserscollectionNew.insert_one(document)
    UserlogsOrg = LogcollectionRemote.find({'owner_id': document["owner_id"]})  # get all logs from this user
    for doc in UserlogsOrg:
        try:
            if LogcollectionNew.find_one({'date': doc["date"]}) is None:  # there was no entry yet with this date
                LogcollectionNew.insert_one(doc)
            else:
                print("duplicate")
                print(doc)
        except:
            print("an error occurred finding the document")
            print(doc)
You have the second for loop inside the first; that could be trouble.
On a separate note, you should investigate mongodump and mongorestore for copying collections; unless you need to be doing it in code, these tools are better suited for your use case.
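If the copy does have to happen in code, one thing worth checking is which fields the duplicate test compares. The thread does not confirm the cause, so treat this as an assumption: a check on `date` alone reports a duplicate whenever two different events share a date, even in a freshly emptied collection. The sketch below uses a hypothetical helper, `copy_missing`, that makes the comparison key explicit and defaults to `_id`; it relies only on the `find_one` and `insert_one` calls already used in the question.

```python
def copy_missing(source_docs, target, key_fields=("_id",)):
    """Insert each source document into `target` unless a document with the
    same values for `key_fields` is already there.

    Returns a tuple (inserted, skipped).
    """
    inserted = 0
    skipped = 0
    for doc in source_docs:
        # Build the lookup filter from the chosen key fields only.
        key = {field: doc[field] for field in key_fields}
        if target.find_one(key) is None:
            target.insert_one(doc)
            inserted += 1
        else:
            skipped += 1
    return inserted, skipped


# Intended use (not executed here):
#     copy_missing(LogcollectionRemote.find(), LogcollectionNew)
```

Running it once with `key_fields=("date",)` and once with the default shows how many of the reported duplicates come from the choice of key.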

python 3.7 and ldap3 reading group membership

I am using Python 3.7 and ldap3. I can make a connection and retrieve a list of the groups in which I am interested. I am having trouble getting group members though.
server = Server('ldaps.ad.company.com', use_ssl=True, get_info=ALL)
with Connection(server, 'mydomain\\ldapUser', '******', auto_bind=True) as conn:
    base = "OU=AccountGroups,OU=UsersAndGroups,OU=WidgetDepartment," \
        + "OU=LocalLocation,DC=ad,DC=company,DC=com"
    criteria = """(
        &(objectClass=group)
        (
            |(sAMAccountName=grp-*widgets*)
            (sAMAccountName=grp-oldWidgets)
        )
    )"""
    attributes = ['sAMAccountName', 'distinguishedName']
    conn.search(base, criteria, attributes=attributes)
    groups = conn.entries
At this point groups contains all the groups I want. I want to iterate over the groups to collect the members.
for group in groups:
    # print(cn)
    criteria = f"""
        (&
            (objectClass=person)
            (memberof:1.2.840.113556.1.4.1941:={group.distinguishedName})
        )
    """
    # criteria = f"""
    #     (&
    #         (objectClass=person)
    #         (memberof={group.distinguishedName})
    #     )
    # """
    attributes = ['displayName', 'sAMAccountName', 'mail']
    conn.search(base, criteria, attributes=attributes)
    people = conn.entries
I know there are people in the groups, but people is always an empty list. It doesn't matter whether I do a recursive search or not.
What am I missing?
Edit
There is a longer backstory to this question that is too long to go into. I have a theory about this particular issue though. I was running out of time and switched to a different Python LDAP library -- which is working. I think the issue with this question might be that I "formatted" the query over multiple lines. The new LDAP lib (python-ldap) complained, so I stripped out the newlines and it just worked. I have not had time to go back and test that theory with ldap3.
people is overwritten in each iteration of your loop over groups.
Maybe the search result for the last group entry in groups is just empty.
You should initialise an empty list outside of your loop and extend it with your results:
people = []
for group in groups:
    ...
    conn.search(...)
    people.extend(conn.entries)
Another note about your code snippet above: when combining objectClass definitions with attribute definitions in your search filter, you may consider using the Reader class, which combines them internally.
Furthermore I would like to point out that I've created an object relational mapper where you can simply define your queries using declarative python syntax, e.g.:
from ldap3_orm import ObjectDef, Reader
from ldap3_orm.config import config
from ldap3_orm.connection import conn
PersonDef = ObjectDef("person", conn)
r = Reader(conn, PersonDef, config.base_dn, PersonDef.memberof == group.distinguishedName)
r.search()
ldap3-orm documentation can be found at http://code.bsm-felder.de/doc/ldap3-orm
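The asker's edit suspects that writing the filter over several lines was the problem. That theory is untested for ldap3, but it is cheap to rule out. Here is a minimal sketch that keeps the readable multi-line layout in the source and collapses it to a single line before searching. The helper name `compact_filter` and the example DN are hypothetical, and the sketch does not escape special characters inside the DN.

```python
def compact_filter(multiline_filter):
    """Join a filter written over several lines into one line.

    Only whitespace at the start and end of each line is removed, so
    spaces inside a value (for example "CN=Widget Team") are preserved.
    """
    return "".join(line.strip() for line in multiline_filter.splitlines())


# Hypothetical group DN, for illustration only.
group_dn = "CN=grp-oldWidgets,OU=AccountGroups,DC=ad,DC=company,DC=com"

criteria = compact_filter(f"""
    (&
        (objectClass=person)
        (memberof:1.2.840.113556.1.4.1941:={group_dn})
    )
""")
print(criteria)
```

The resulting string can be passed to conn.search() in place of the multi-line version to see whether the empty result changes.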

MongoDB findOneAndReplace log if added as new document or replaced

I'm using Mongo's findOneAndReplace() with upsert = true and returnNewDocument = true,
basically as a way to avoid inserting duplicates. But I want to get the _id of the newly inserted document (or of the existing document) so I can pass it to a background processing task.
BUT I also want to log whether the document was added as new or whether a replacement took place.
I can't see any way to use findOneAndReplace() with these parameters and answer that question.
The only thing I can think of is to find and insert in two separate requests, which seems a bit counterproductive.
P.S. I'm actually using PyMongo's find_one_and_replace(), but it seems identical to the JS Mongo function.
EDIT: edited for clarification.
Is it not possible to use the replace_one function? In Java I am able to use replaceOne, which returns an UpdateResult. That has a method for finding out whether a document was updated or not. I see replace_one in PyMongo, and it should behave the same. Here is the doc: PyMongo Doc. Look for replace_one.
The way I'm going to implement it for now (in python):
import pymongo


def find_one_and_replace_log(collection, find_query,
                             document_data,
                             log=None):
    ''' behaves like find_one_and_replace(upsert=True,
    return_document=pymongo.ReturnDocument.AFTER)
    and records in log['new_document'] whether the document was new
    '''
    if log is None:
        # avoid a mutable default argument shared between calls
        log = {}
    is_new = False
    document = collection.find_one(find_query)
    if not document:
        # document didn't exist
        # log as NEW
        is_new = True
    new_or_replaced_document = collection.find_one_and_replace(
        find_query,
        document_data,
        upsert=True,
        return_document=pymongo.ReturnDocument.AFTER
    )
    log['new_document'] = is_new
    return new_or_replaced_document
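The replace_one suggestion from the earlier answer can be sketched as follows. PyMongo's `replace_one(..., upsert=True)` returns an `UpdateResult` whose `upserted_id` is set only when a new document was inserted, so the new-or-replaced question is answered by the write itself. The helper name `replace_and_report` is hypothetical, and the follow-up `find_one` in the replaced case assumes the replacement document still matches `find_query`.

```python
def replace_and_report(collection, find_query, document_data):
    """Upsert `document_data` and report whether it was newly inserted.

    Returns a tuple (document_id, is_new).
    """
    result = collection.replace_one(find_query, document_data, upsert=True)
    if result.upserted_id is not None:
        # Nothing matched, so the document was inserted as new.
        return result.upserted_id, True
    # An existing document was replaced; its _id is unchanged, so look it up.
    # Assumption: the replacement still matches find_query.
    existing = collection.find_one(find_query, {"_id": 1})
    return (existing["_id"] if existing else None), False


# Intended use (not executed here):
#     doc_id, is_new = replace_and_report(posts, {"url": url}, new_doc)
```

Compared with the find-then-replace version above, the insert case needs one request; the replace case still needs a second request to read the _id.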