Getting ElasticSearch document fields inside of loaded records in searchkick - searchkick

Is it possible to get ElasticSearch document fields inside of loaded AR records?
Here is a gist that illustrates what I mean: https://gist.github.com/allomov/39c30905e94c646fb11637b45f43445d
In this case I want to avoid the additional computation of total_price after getting the response from ES. The solution I currently see is to include the relationship and run the total_price computation for each record, which, as I see it, is not an optimal way to perform this operation.
result = Product.search("test", includes: :product_components).response
products_with_total_prices = result.map do |product|
{
product: product
total_price: product.product_components.map(&:price).compact.sum
}
end
Could you please tell me whether it is possible to mix ES document fields into a loaded AR record?

As far as I'm aware it isn't possible to get a response that merges the document fields into the loaded record.
Usually I prefer to completely rely on the data in the indexed document where possible (using load: false as a search option), and only load the AR record(s) as a second step if necessary. For example:
result = Product.search("test", load: false).response
# If you also need AR records, could do something like:
product_ids = result.map(&:id)
products_by_id = {}
Product.where(id: product_ids).find_each do |ar_product|
products_by_id[ar_product.id] = ar_product
end
merged_result = result.map do |es_product|
es_product[:ar_product] = products_by_id[es_product.id]}
end
Additionally, it may be helpful to retrieve the document stored in the ES index for a specific record, which I would normally do by defining the following method in your Product class:
def es_document
  return nil unless doc = Product.search_index.retrieve(self).presence
  Hashie::Mash.new doc
end

You can use select: true and the with_hit method to get the record and the search document together. For your example:
result = Product.search("test", select: true)
products_with_total_prices =
result.with_hit.map do |product, hit|
{
product: product,
total_price: hit["_source"]["total_price"]
}
end

How can I upsert a record and array element at the same time?

That is meant to be read as a dual upsert operation: upsert the document, then the array element.
So MongoDB is a denormalized store for me (we're event sourced), and one of the things I'm trying to deal with is the concurrent nature of that. The problem is this:
Events can come in out of order, so each update to the database needs to be an upsert.
I need to be able to not only upsert the parent document but an element in an array property of that document.
For example:
If the document doesn't exist, create it. All events in this stream have the document's ID but only part of the information depending on the event.
If the document does exist, then update it. This is the easy part. The update command is just written as UpdateOneAsync and as an upsert.
If the event is actually to update a list, then that list element needs to be upserted. So if the document doesn't exist, it needs to be created and the list item will be upserted (resulting in an insert); if the document does exist, then we need to find the element and update it as an upsert, so if the element exists then it is updated otherwise it is inserted.
If at all possible, having it execute as a single atomic operation would be ideal, but if it can only be done in multiple steps, then so be it. I'm getting a number of mixed examples on the net due to the large change in the 2.x driver. I'm not sure what I'm looking for beyond UpdateOneAsync. I'm currently using 2.4.x. Explained examples would be appreciated. TIA
Note:
Reiterating that this is a question regarding the MongoDB C# driver 2.4.x
Took some tinkering, but I got it.
var notificationData = new NotificationData
{
    ReferenceId = e.ReferenceId,
    NotificationId = e.NotificationId,
    DeliveredDateUtc = e.SentDate.DateTime
};

var matchDocument = Builders<SurveyData>.Filter.Eq(s => s.SurveyId, e.EntityId);

// first upsert the document to make sure that you have a document to write to
var surveyUpsert = new UpdateOneModel<SurveyData>(
    matchDocument,
    Builders<SurveyData>.Update
        .SetOnInsert(f => f.SurveyId, e.EntityId)
        .SetOnInsert(f => f.Notifications, new List<NotificationData>()))
    { IsUpsert = true };

// then push a new element if none of the existing elements match
var noMatchReferenceId = Builders<SurveyData>.Filter
    .Not(Builders<SurveyData>.Filter.ElemMatch(s => s.Notifications, n => n.ReferenceId.Equals(e.ReferenceId)));
var insertNewNotification = new UpdateOneModel<SurveyData>(
    matchDocument & noMatchReferenceId,
    Builders<SurveyData>.Update
        .Push(s => s.Notifications, notificationData));

// then update the element that does match the reference ID (if any)
var matchReferenceId = Builders<SurveyData>.Filter
    .ElemMatch(s => s.Notifications, Builders<NotificationData>.Filter.Eq(n => n.ReferenceId, notificationData.ReferenceId));
var updateExistingNotification = new UpdateOneModel<SurveyData>(
    matchDocument & matchReferenceId,
    Builders<SurveyData>.Update
        // apparently the mongo C# driver will convert any negative index into an index symbol ('$')
        .Set(s => s.Notifications[-1].NotificationId, e.NotificationId)
        .Set(s => s.Notifications[-1].DeliveredDateUtc, notificationData.DeliveredDateUtc));

// execute these as a batch and in order
var result = await _surveyRepository.DatabaseCollection
    .BulkWriteAsync(
        new[] { surveyUpsert, insertNewNotification, updateExistingNotification },
        new BulkWriteOptions { IsOrdered = true })
    .ConfigureAwait(false);
The post linked as being a dupe was absolutely helpful, but it was not the answer. There were a few things that needed to be discovered.
1. The 'second statement' in the linked example didn't work correctly, at least when translated literally. To get it to work, I had to match on the element and then invert the logic by wrapping it in the Not() filter.
2. In order to use 'this index' on the match, you have to use a negative index on the array. As it turns out, the C# driver will convert any negative index to the '$' character when the query is rendered.
3. In order to ensure the operations are run in order, you must include bulk write options with IsOrdered set to true.
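If you want to sanity-check what the ordered batch actually did, the BulkWriteResult returned by BulkWriteAsync can be inspected. A rough sketch against the result variable above (property names are from the 2.x driver; the exact counts depend on which model matched):
// inspect the outcome of the ordered batch above
Console.WriteLine($"Matched:  {result.MatchedCount}");
Console.WriteLine($"Upserts:  {result.Upserts.Count}"); // non-zero when the first model created the document
if (result.IsModifiedCountAvailable)
{
    Console.WriteLine($"Modified: {result.ModifiedCount}");
}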

CS193P Smashtag Popularity Extra Task #3 - Get only new tweets using "IN" keyword

I'm working on Stanford CS193p's SmashTag Popularity Mentions assignment (asst. #5) and I've got everything working well. I'm working on Extra Task #3:
Loading up a lot of data by querying for an existing instance in the database, then inserting if not found over and over again, one by one (like we did in lecture), can be pretty poor performing. Enhance your application to make this more efficient by checking for the existence of a pile of things you want to be in the database in one query (then only creating the ones that don’t exist). The predicate operator IN might be of value here.
I've managed to do something like this, but without using the IN predicate operator! This is inside the perform block of my updateDatabase(with newTweets:) method (I've called my Core Data entity CDTweet):
if let databaseTweets = try? globalContext!.fetch(NSFetchRequest<CDTweet>(entityName: "CDTweet")) {
    let databaseIDs = databaseTweets.map { $0.id! }
    for twitterInfo in newTweets where !databaseIDs.contains(twitterInfo.id) {
        _ = CDTweet.createCDTweet(with: twitterInfo, into: globalContext!)
    }
}
As you can see, I fetch all the tweets in the database, get their IDs, and then only create a new tweet for the internet-fetched Twitter.Tweets whose IDs are not in the array of database IDs.
This appears to function properly (i.e., create only the new tweets), but I am very curious how the instructor imagined it would work with the IN predicate operator. Has anyone done this and can lend some insight?
Note: A number of solutions I've seen have a core data entity for the Search Term (usually called Query). I don't have this, only entities for Tweet and Mention, and I have everything working fine.
You need something like this (I assume searchIDs is an array of the IDs you are looking for):
let searchIDs = newTweets.map { $0.id }   // an array of the IDs you are searching for
let fetchRequest = NSFetchRequest<CDTweet>(entityName: "CDTweet")
let predicate = NSPredicate(format: "id IN %@", searchIDs)
fetchRequest.predicate = predicate
if let databaseTweets = try? globalContext!.fetch(fetchRequest) {
    // here you only get the entries whose IDs are in searchIDs
}
Details about predicates can be found in Apple's Predicate Programming Guide, especially the section on predicate format string syntax.
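To tie this back to the assignment, here is one way the IN fetch could be combined with the insert step, sketched against the question's own createCDTweet helper (the force-unwraps simply mirror the question's code, not a recommendation):
let newIDs = newTweets.map { $0.id }
let request = NSFetchRequest<CDTweet>(entityName: "CDTweet")
request.predicate = NSPredicate(format: "id IN %@", newIDs)
// one query for all the tweets that already exist in the database
let existingIDs = Set((try? globalContext!.fetch(request))?.map { $0.id! } ?? [])
// only create the ones that aren't there yet
for twitterInfo in newTweets where !existingIDs.contains(twitterInfo.id) {
    _ = CDTweet.createCDTweet(with: twitterInfo, into: globalContext!)
}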

Meteor: return subset of attributes from Mongo

I'm querying Mongo to get the user item, but I only want to pass a subset of the info through to the template. My current solution is this:
var returnUsers = [];
var users = Meteor.users.find().fetch();
for (var i = 0; i < users.length; i++) {
  returnUsers.push(users[i].profile);
}
console.log(returnUsers);
return returnUsers;
But I'm losing the iterator. Ideally I want to just return the profile object of each user. How do you do that?
There is little point in doing this on the client. Returning a cursor from minimongo with fields you don't end up using is normally just as fast as, or faster than, filtering fields out in JavaScript.
Especially for the Users collection, you want to filter out the extra fields in your publication on the server. For example:
Meteor.publish('allUsers', function () {
  return Meteor.users.find({}, { fields: { profile: 1 } });
});
This will publish the profile data and the _id for each user. Then when you do
Meteor.users.find({});
on the client you will only get the profile data and _id without any need to do extra filtering.
Note that the fields option only allows you to define a set of fields to include or exclude together. You cannot mix include and exclude:
{ fields: { key1: 0, key2: 1 }}
will fail.
There is no security benefit to filtering fields on the client either. The user has full access to the published collection from the console.
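On the client the only thing left is to subscribe and query the collection as usual. A minimal sketch (the 'allUsers' name comes from the publication above; the userList template and userProfiles helper are just illustrative):
// client-side: subscribe to the restricted publication
Meteor.subscribe('allUsers');

Template.userList.helpers({
  userProfiles: function () {
    // the documents arriving via 'allUsers' only contain _id and profile
    return Meteor.users.find({});
  }
});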
Seeing as you want to keep the cursor (as per your comment on the previous answer), remove the fetch, as it turns the cursor into an array, and add a fields option like below:
return Meteor.users.find({}, { fields: { profile: 1 } });
This won't give you only profile; it will also return the _id, as this is always sent regardless of the fields specified.
Use `map`:
var profiles = Meteor.users.find().map(function (a) { return a.profile; });

How does Mongoengine decide if 2 EmbeddedDocuments are equal or not?

I have the following Mongoengine document:
class MyEmbed(EmbeddedDocument):
    embedField = StringField(primary_key=True)
    varField = StringField()

class TestDoc(Document):
    myField = StringField()
    embed_list = ListField(EmbeddedDocumentField(MyEmbed))
So I keep a list of embedded documents, to which I wish to add new documents if they don't exist already. The problem is that when I use the atomic update operator add_to_set things don't turn out the way I want them to.
This is what I am trying to do:
embed1 = models.MyEmbed(embedField="F1")
parent = models.TestDoc(myField="ParentField")
embed_list = []
embed_list.append(embed1)
parent.embed_list = embed_list
parent.save()
embed2 = models.MyEmbed(embedField="F1", varField="varField")
TestDoc.objects(id=parent.id).update_one(add_to_set__embed_list=embed2)
The problem is that after doing this, I have a list of embedded documents with 2 elements in the DB. What I want is to decide whether 2 EmbeddedDocuments are equal based on one field (embedField in this case), rather than by taking all the properties into account. My questions are:
What are the default criteria according to which Mongoengine decides whether 2 EmbeddedDocuments are equal or not?
How can I redefine the function that makes Mongoengine decide when 2 EmbeddedDocuments are equal or not?
Thanks!
The actual checking is done inside MongoDB, not MongoEngine.
The object sent to MongoDB should be the same, but this is where it gets tricky: with BSON, key order is important, while in Python dictionaries it is not. When converting documents to send to MongoDB, MongoEngine just passes a dictionary. This is a bug - so I've added #296 and will fix it for 0.8.
See https://github.com/MongoEngine/mongoengine/blob/master/mongoengine/document.py#L51 and https://github.com/MongoEngine/mongoengine/blob/master/mongoengine/base/document.py#L52:
def __eq__(self, other):
    if isinstance(other, self.__class__):
        return self._data == other._data
    return False
It compares the _data dicts of the embedded documents, so you can override this method.
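For example, a minimal sketch of overriding equality on MyEmbed so that only embedField is compared (also overriding __hash__, which Python expects to stay consistent with __eq__). Note that this only affects comparisons done in Python; as explained below, add_to_set still maps straight to Mongo's $addToSet, which compares the whole document:
from mongoengine import EmbeddedDocument, StringField

class MyEmbed(EmbeddedDocument):
    embedField = StringField(primary_key=True)
    varField = StringField()

    def __eq__(self, other):
        # consider two embedded documents equal when their embedField matches
        if isinstance(other, self.__class__):
            return self.embedField == other.embedField
        return False

    def __hash__(self):
        return hash(self.embedField)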
If you look at Document.update, which calls QuerySet.update (search for add_to_set and addToSet), you can see that MongoEngine doesn't check whether the document already exists in the list and just issues the Mongo $addToSet operation: https://github.com/MongoEngine/mongoengine/blob/master/mongoengine/queryset/transform.py#L156.
In your code you have the document MyEmbed(embedField="F1") and try to add another document MyEmbed(embedField="F1", varField="varField"), so the logic is right: it adds a new document. If you try the following code:
embed1 = models.MyEmbed(embedField="F1")
parent = models.TestDoc(myField="ParentField")
embed_list = []
embed_list.append(embed1)
parent.embed_list = embed_list
parent.save()
embed2 = models.MyEmbed(embedField="F1", varField="varField")
TestDoc.objects(id=parent.id).update_one(add_to_set__embed_list=embed2)
embed3 = models.MyEmbed(embedField="F1")
TestDoc.objects(id=parent.id).update_one(add_to_set__embed_list=embed3)
embed4 = models.MyEmbed(embedField="F1", varField="varField")
TestDoc.objects(id=parent.id).update_one(add_to_set__embed_list=embed4)
you will find that parent contains only embed1 and embed2.
So, to resolve your problem you can override the __eq__ method and check for the document in the list yourself, but you will need to find another way to update the document list, because add_to_set maps directly to the Mongo operation.
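One way to do that update step, sketched with MongoEngine's positional operator (the S placeholder maps to Mongo's $). Note this is two separate operations rather than one atomic one, so whether it fits depends on your concurrency requirements:
# try to update an existing element whose embedField matches
updated = TestDoc.objects(
    id=parent.id,
    embed_list__embedField="F1",
).update_one(set__embed_list__S__varField="varField")

if not updated:
    # no matching element yet: push the new embedded document instead
    TestDoc.objects(id=parent.id).update_one(
        push__embed_list=models.MyEmbed(embedField="F1", varField="varField")
    )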

MongoDB C# official driver: List<BsonObject> query issue and always old values?

I have an unclear issue when querying with two criteria, like Id and another field. I use a repository storing data like id, iso, value. I have created an index("_id","Iso") to speed up queries, but queries only return my cursor when I use a single criterion like _id; they return nothing if I use two (_id, Iso) (see the commented code).
Is the index affecting the response, or is the query method failing?
I use MongoDB v1.6.5 and the official C# driver.
Sample:
// Getting Data
public List<BsonObject> Get_object(string ID, string Iso)
{
    using (var helper = BsonHelper.Create())
    {
        //helper.Db.Repository.EnsureIndex("_Id","Iso");
        var query = Query.EQ("_Id", ID);
        //if (!String.IsNullOrEmpty(Iso))
        //    query = Query.And(query, Query.EQ("Iso", Iso));
        var cursor = helper.Db.Repository.FindAs<BsonObject>(query);
        return cursor.ToList();
    }
}
Data:
{
"_id": "2345019",
"Iso": "UK",
"Data": "Some data"
}
After that I updated my data using Update.Set() methods. I can see the changed data using MongoView. The new data are correct, but the query always returns the same old values. To view these values I use a page that could potentially be cached, but adding a timestamp at the end doesn't change anything; the page always returns the same old data. Your comments are welcome, thanks.
I do not recall offhand how the C# driver creates indexes, but the shell command for creating an index is like this:
db.things.ensureIndex({j:1});
Notice the '1' which is like saying 'true'.
In your code, you have:
helper.Db.Repository.EnsureIndex("_Id","Iso");
Perhaps it should be:
helper.Db.Repository.EnsureIndex("_Id", 1);
helper.Db.Repository.EnsureIndex("Iso", 1);
It could also be related to the fact that you are creating indexes on "_Id" while the actual id field is called "_id" ... field names in MongoDB are case sensitive.
Have a quick look through the index documentation: http://www.mongodb.org/display/DOCS/Indexes
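If the casing is indeed the culprit, the body of Get_object could use the lowercase field name when building the two-criteria query. A minimal sketch reusing the question's own helper and Query builders (untested against your schema):
// hypothetical fix: use the field names exactly as stored ("_id", not "_Id")
var query = Query.EQ("_id", ID);
if (!String.IsNullOrEmpty(Iso))
{
    query = Query.And(query, Query.EQ("Iso", Iso));
}
var cursor = helper.Db.Repository.FindAs<BsonObject>(query);
return cursor.ToList();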