Changing a value in a listfield of embedded documents on MongoEngine - mongodb

I am learning how to use MongoEngine and MongoDB, and I know how to query over ListField(EmbeddedDocumentField) from this question:
can't query over ListField(EmbeddedDocumentField)
Sort of nasty. Kind of wish there was something easier than that.
I know how to change the name of an agent using the same example from the link:
Agent.objects(name="Brenna Li").update_one(set__name="Brenna Smith")
But how can I change a value inside an embedded document in a ListField? For example, what code do I need to change Brenna Li's skill level in C++ from 6 to 8 and her skill level in Java from 4 to 5?

You can use the positional operator, which is $ in MongoDB and is written as S in MongoEngine so that it can be used in a keyword argument. However, a positional update only changes a single matching element at a time, so it is impossible to update both the Java and C++ levels in a single query without replacing the whole skills list (which wouldn't be very safe).
To do it in two queries you could do something like:
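from mongoengine import (Document, EmbeddedDocument, EmbeddedDocumentField,
                         EmailField, IntField, ListField, StringField)
# Assumes a connection has already been opened with mongoengine.connect().
class Skill(EmbeddedDocument):
    name = StringField(required=True)
    level = IntField(required=True)
class Agent(Document):
    name = StringField(required=True)
    email = EmailField(required=True, unique=True)
    skills = ListField(EmbeddedDocumentField(Skill))
Agent.drop_collection()
Agent(name="Brenna Li", email="br@example.com",
      skills=[Skill(name="Java", level=2),
              Skill(name="Surfing", level=6),
              Skill(name="c++", level=4)]).save()
# First query: rename the agent and bump the Java level (S is the positional operator).
Agent.objects.filter(name="Brenna Li", skills__name="Java").update(set__name="Brenna Smith", inc__skills__S__level=1)
# Second query: the agent has been renamed, so match on the new name.
Agent.objects.filter(name="Brenna Smith", skills__name="c++").update(inc__skills__S__level=1)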
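To see why one positional update touches only one list element, here is a small pure-Python model of the behaviour described above. It is not MongoEngine or MongoDB code: inc_first_match is a hypothetical helper that only mimics what a positional increment does to the first element matching the query, and the skills are plain dicts.

```python
def inc_first_match(skills, name, amount=1):
    """Increment `level` on the first skill whose name matches.

    Returns True if a matching skill was found, False otherwise.
    """
    for skill in skills:
        if skill["name"] == name:
            skill["level"] += amount
            return True  # stop after the first match, like a positional update
    return False


skills = [
    {"name": "Java", "level": 2},
    {"name": "Surfing", "level": 6},
    {"name": "c++", "level": 4},
]

# Two separate calls, mirroring the two update queries.
inc_first_match(skills, "Java")
inc_first_match(skills, "c++")
print(skills)
```

Each call changes exactly one element, which is why two skills need two calls (or, in the real case, two queries).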

Related

Getting ElasticSearch document fields inside of loaded records in searchkick

Is it possible to get ElasticSearch document fields inside of loaded AR records?
Here is a gist that illustrates what I mean: https://gist.github.com/allomov/39c30905e94c646fb11637b45f43445d
In this case I want to avoid the additional computation of total_price after getting the response from ES. The solution that I currently see is to include the relationship and run the total_price computation for each record, which does not seem to be an optimal way to perform this operation.
result = Product.search("test", includes: :product_components).response
products_with_total_prices = result.map do |product|
  {
    product: product,
    total_price: product.product_components.map(&:price).compact.sum
  }
end
Could you please tell me whether it is possible to mix ES document fields into the loaded AR records?
As far as I'm aware it isn't possible to get a response that merges the document fields into the loaded record.
Usually I prefer to completely rely on the data in the indexed document where possible (using load: false as a search option), and only load the AR record(s) as a second step if necessary. For example:
result = Product.search("test", load: false).response
# If you also need AR records, you could do something like:
product_ids = result.map(&:id)
products_by_id = {}
Product.where(id: product_ids).find_each do |ar_product|
  products_by_id[ar_product.id] = ar_product
end
merged_result = result.map do |es_product|
  es_product[:ar_product] = products_by_id[es_product.id]
  es_product # return the hit itself, not the value of the assignment
end
Additionally, it may be helpful to retrieve the document stored in the ES index for a specific record, which I would normally do by defining the following method in your Product class:
def es_document
  return nil unless doc = Product.search_index.retrieve(self).presence
  Hashie::Mash.new doc
end
You can use select: true and the with_hit method to get the record and the search document together. For your example:
result = Product.search("test", select: true)
products_with_total_prices =
  result.with_hit.map do |product, hit|
    {
      product: product,
      total_price: hit["_source"]["total_price"]
    }
  end

CS193P Smashtag Popularity Extra Task #3 - Get only new tweets using "IN" keyword

I'm working on Stanford CS193p's SmashTag Popularity Mentions assignment (asst. #5) and I've got everything working well. I'm working on Extra Task #3:
Loading up a lot of data by querying for an existing instance in the database, then inserting if not found over and over again, one by one (like we did in lecture), can be pretty poor performing. Enhance your application to make this more efficient by checking for the existence of a pile of things you want to be in the database in one query (then only creating the ones that don’t exist). The predicate operator IN might be of value here.
I've managed to do something like this, but not using the IN predicate operator! This is inside the perform block of my updateDatabase(with newTweets:) method (I've called my Core Data entity CDTweet):
if let databaseTweets = try? globalContext!.fetch(NSFetchRequest<CDTweet>(entityName: "CDTweet")) {
    let databaseIDs = databaseTweets.map { $0.id! }
    for twitterInfo in newTweets where !databaseIDs.contains(twitterInfo.id) {
        _ = CDTweet.createCDTweet(with: twitterInfo, into: globalContext!)
    }
}
As you can see, I get all the tweets in the database, get their IDs, and then only create a new tweet for internet-fetched-Twitter.Tweets whose IDs are not in the array of database tweets.
This appears to function properly (i.e., create only the new tweets), but I am very curious how the instructor imagined it would work with the IN predicate operator. Has anyone done this and can lend some insight?
Note: A number of solutions I've seen have a core data entity for the Search Term (usually called Query). I don't have this, only entities for Tweet and Mention, and I have everything working fine.
You need something like this (I assume searchIDs is an array of the values you are looking for):
let searchIDs = newTweets.map { $0.id } // the IDs you are searching for
let fetchRequest = NSFetchRequest<CDTweet>(entityName: "CDTweet")
let predicate = NSPredicate(format: "id IN %@", searchIDs)
fetchRequest.predicate = predicate
if let databaseTweets = try? globalContext!.fetch(fetchRequest) {
    // here you should get only the entries whose IDs are in the newTweets array
}
Details about predicates can be found here, about predicate syntax especially here
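The overall flow the exercise seems to be asking for (one lookup for the whole batch, then create only what is missing) can be sketched independently of Core Data. The following is a language-neutral model in Python, not Swift: stored_ids, fetch_existing and ids_to_create are hypothetical names, and fetch_existing merely stands in for a fetch request whose predicate is "id IN ...".

```python
def ids_to_create(incoming_ids, fetch_existing):
    """Return the incoming IDs that are not stored yet, using a single lookup."""
    existing = set(fetch_existing(incoming_ids))  # one "id IN (...)" style query
    return [tweet_id for tweet_id in incoming_ids if tweet_id not in existing]


# Hypothetical in-memory "database" standing in for the Core Data store.
stored_ids = {"1", "2", "5"}


def fetch_existing(candidates):
    # Plays the role of a fetch request with the predicate "id IN candidates".
    return [tweet_id for tweet_id in candidates if tweet_id in stored_ids]


new_ids = ids_to_create(["2", "3", "5", "8"], fetch_existing)
print(new_ids)
```

Compared with fetching every stored tweet, the lookup only returns rows that can actually collide with the incoming batch, so the work scales with the batch rather than with the size of the database.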

How does Mongoengine decide if 2 EmbeddedDocuments are equal or not?

I have the following Mongoengine document:
class MyEmbed(EmbeddedDocument):
    embedField = StringField(primary_key=True)
    varField = StringField()
class TestDoc(Document):
    myField = StringField()
    embed_list = ListField(EmbeddedDocumentField(MyEmbed))
So I keep a list of embedded documents, to which I wish to add new documents if they don't exist already. The problem is that when I use the atomic update operator add_to_set things don't turn out the way I want them to.
This is what I am trying to do:
embed1 = models.MyEmbed(embedField="F1")
parent = models.TestDoc(myField="ParentField")
embed_list = []
embed_list.append(embed1)
parent.embed_list = embed_list
parent.save()
embed2 = models.MyEmbed(embedField="F1", varField="varField")
TestDoc.objects(id=parent.id).update_one(add_to_set__embed_list=embed2)
The problem is that after doing this, I have in the DB a list of embedded documents with 2 elements. And what I want is to decide upon one field (embedField in this case) whether 2 EmbeddedDocuments are equal or not, and not by taking into account all the properties. My questions are:
What are the default criteria according to which Mongoengine decides whether 2 EmbeddedDocuments are equal or not?
How can I redefine the function that makes Mongoengine decide when 2 EmbeddedDocuments are equal or not?
Thanks!
The actual checking is done inside MongoDB and not mongoengine.
The object sent to MongoDB should be the same, but this is where it gets tricky: in BSON, key order is significant, while in Python dictionaries it is not. When converting the document to send to MongoDB, MongoEngine just passes a dictionary. This is a bug, so I've added #296 and will fix it for 0.8.
See https://github.com/MongoEngine/mongoengine/blob/master/mongoengine/document.py#L51 and https://github.com/MongoEngine/mongoengine/blob/master/mongoengine/base/document.py#L52:
def __eq__(self, other):
    if isinstance(other, self.__class__):
        return self._data == other._data
    return False
It compares the data dicts of the embedded documents, so you can override this method.
If you look at the Document update, which calls the QuerySet update (search for add_to_set and addToSet), you will find that MongoEngine doesn't check whether the document already exists in the list and just calls MongoDB's $addToSet operation: https://github.com/MongoEngine/mongoengine/blob/master/mongoengine/queryset/transform.py#L156.
In your code you have the document MyEmbed(embedField="F1") and try to add another document, MyEmbed(embedField="F1", varField="varField"), so the logic is right: it adds a new document. If you try the following code:
embed1 = models.MyEmbed(embedField="F1")
parent = models.TestDoc(myField="ParentField")
embed_list = []
embed_list.append(embed1)
parent.embed_list = embed_list
parent.save()
embed2 = models.MyEmbed(embedField="F1", varField="varField")
TestDoc.objects(id=parent.id).update_one(add_to_set__embed_list=embed2)
embed3 = models.MyEmbed(embedField="F1")
TestDoc.objects(id=parent.id).update_one(add_to_set__embed_list=embed3)
embed4 = models.MyEmbed(embedField="F1", varField="varField")
TestDoc.objects(id=parent.id).update_one(add_to_set__embed_list=embed4)
you will find that parent contains only embed1 and embed2.
So, to resolve your problem you can override the __eq__ method and check whether the document is in the list yourself, but you must find another way to update the document list, because add_to_set calls the MongoDB operation directly.
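As a rough illustration of the difference between whole-document equality and equality on a single field, here is a pure-Python sketch that uses plain dicts in place of embedded documents. add_if_key_missing is a hypothetical helper, not part of MongoEngine, and because it checks on the client side it is not atomic the way $addToSet is.

```python
def add_if_key_missing(items, new_item, key="embedField"):
    """Append new_item only if no existing item has the same value for `key`.

    Returns True if the item was appended, False if it was treated as a duplicate.
    """
    if any(item.get(key) == new_item.get(key) for item in items):
        return False
    items.append(new_item)
    return True


embed_list = [{"embedField": "F1"}]

# Whole-document comparison treats these two as different entries ...
whole_doc_equal = {"embedField": "F1"} == {"embedField": "F1", "varField": "varField"}

# ... while comparing on one field treats them as the same entry.
added = add_if_key_missing(embed_list, {"embedField": "F1", "varField": "varField"})
print(whole_doc_equal, added, len(embed_list))
```

The first comparison is what makes the add_to_set call in the question append a second element; the helper shows the "decide on one field" behaviour the question asks for.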

Does Mongodb have a special value that's ignored in queries?

My web application runs on MongoDB, using python and pyMongo. I get this scenario a lot - code that reads something like:
from pymongo import Connection
users = Connection().db.users
def findUsers(firstName=None, lastName=None, age=None):
    criteria = {}
    if firstName:
        criteria['firstName'] = firstName
    if lastName:
        criteria['lastName'] = lastName
    if age:
        criteria['age'] = age
    query = users.find(criteria)
    return query
I find it kind of messy that I need an if statement for every optional value to figure out whether it needs to go into the search criteria. If only there were a special query value that Mongo ignored in queries. Then my code could look like this:
def findUsers(firstName=<ignored by mongo>, lastName=<ignored by mongo>, age=<ignored by mongo>):
    query = users.find({'firstName':firstName, 'lastName':lastName, 'age':age})
    return query
Now isn't that so much cleaner than before, especially if you have many more optional parameters? Any parameters that aren't specified default to something Mongo just ignores. Is there any way to do this? Or at least something more concise than what I currently have?
You're probably better off filtering your empty values in Python. You don't need a separate if-statement for each of your values. The local variables can be accessed by locals(), so you can create a dictionary by filtering out all keys with None value.
def findUsers(firstName=None, lastName=None, age=None):
    loc = locals()  # must be the first statement, so only the parameters are captured
    criteria = {k: loc[k] for k in loc if loc[k] is not None}
    query = users.find(criteria)
    return query
Note that this syntax uses dictionary comprehensions, introduced in Python 2.7. If you're running an earlier version of Python, you need to replace that one line with
criteria = dict((k, loc[k]) for k in loc if loc[k] is not None)
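If relying on locals() feels fragile, the same filtering can be written with explicit keyword arguments. This is a minimal sketch with no database call: build_criteria and find_users_criteria are hypothetical names, and the wrapper returns the criteria dict rather than calling users.find(), so it can be run on its own.

```python
def build_criteria(**fields):
    """Return a query dict containing only the fields that were actually supplied."""
    return {key: value for key, value in fields.items() if value is not None}


def find_users_criteria(firstName=None, lastName=None, age=None):
    # Hypothetical wrapper: returns the criteria instead of querying a collection.
    return build_criteria(firstName=firstName, lastName=lastName, age=age)


print(find_users_criteria(firstName="Ada"))
print(find_users_criteria(lastName="Li", age=0))
```

Testing with "is not None" instead of truthiness means a legitimate falsy value such as age=0 stays in the criteria, whereas the "if age:" check in the question would silently drop it.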

Filter array using NSPredicate and obtains new object composed by some elements in the query

I've got an array like this:
Word array (
    {
        translation = (
            {
                name = Roma;
                lang = it;
            },
            {
                name = Rome;
                lang = en;
            }
        );
        type = provenance;
        value = RMU;
    },
    {
        translation = (
            {
                name = "Milano";
                lang = it;
            },
            {
                name = "Milan";
                lang = en;
            }
        );
        type = destination;
        value = MIL;
    },)
The idea is to filter it using an NSPredicate and receive an array of dictionaries based on the lang key. I'd like to get something like this by filtering for lang == it:
Word array (
    {
        name = Roma;
        lang = it;
        type = provenance;
        value = RMU;
    },
    {
        name = "Milano";
        lang = it;
        type = destination;
        value = MIL;
    })
I can't simplify the data because it comes from a "JSON" service.
I've tried different predicates using SUBQUERY, but none of them work. The documentation on SUBQUERY is pretty poor and I'm missing something; the problem is probably that I'd like to receive an object that is really different from the source.
Of course I'm able to obtain that structure by enumerating; I'm wondering if there is a shorter solution.
This answer from Dave DeLong (link to SUBQUERY explanation) gave me a lot of hints about SUBQUERY, but I'm not able to find a solution to my problem.
Can someone give me a hint?
You can't do this with a predicate. (Well, you could, but it would be stupidly complex, difficult to understand and maintain, and in the end it would be easier to write the code yourself.)
NSPredicate is for extracting a subset of data from an existing set. It only does filtering, because a predicate is simply a statement that evaluates to true or false. If you have a collection and filter it with a predicate, then what happens is the collection starts iterating over its elements and asks the predicate: "does this pass your test?" "does this pass your test?" "does this pass your test?"... Every time that the predicate answers "yes this passes my test", the collection adds that object to a new collection. It is that new collection that is returned from the filter method.
THUS:
NSPredicate does not (easily) allow for merging two sets of data (which is what you're asking for). It is possible (because you can do pretty much anything with a FUNCTION() expression), but it makes for inherently unreadable predicates.
SO:
Don't use NSPredicate to merge your dataset. Do it yourself.
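For what "do it yourself" might look like, here is a language-neutral sketch in Python rather than Objective-C. It assumes the JSON has already been parsed into lists and dicts shaped like the dump in the question; flatten_by_lang is a hypothetical helper name.

```python
def flatten_by_lang(words, lang):
    """Build one flat dict per word, using the translation that matches `lang`."""
    flattened = []
    for word in words:
        for translation in word["translation"]:
            if translation["lang"] == lang:
                flattened.append({
                    "name": translation["name"],
                    "lang": translation["lang"],
                    "type": word["type"],
                    "value": word["value"],
                })
                break  # take at most one translation per word
    return flattened


words = [
    {"translation": [{"name": "Roma", "lang": "it"}, {"name": "Rome", "lang": "en"}],
     "type": "provenance", "value": "RMU"},
    {"translation": [{"name": "Milano", "lang": "it"}, {"name": "Milan", "lang": "en"}],
     "type": "destination", "value": "MIL"},
]

italian = flatten_by_lang(words, "it")
print(italian)
```

The loop both filters (on lang) and merges (outer and inner keys) in one pass, which is the part a plain predicate cannot express, since a predicate only answers yes or no for each element.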