Algolia Facet sorting numerically - algolia

I'm using the algoliasearch and algoliasearchHelper libraries to build an instant search interface using the hogan template example on the algolia site.
I'm having an issue with sorting a numerical facet. I'm populating the index using the algoliasearch-client-php installed via composer. I'm passing an integer into the index object like so:
"cost_to_build" => (int) $project->data['approximate_cost'],
But in the index, I'm getting something like:
cost_to_build: "15.00"
which then results in a facet order like:
15, 25, 3, 5, 6.
This happens even though I'm passing {sortBy: ['name:asc']}. If I manually change all of my index values from strings to integers (there are too many to really do by hand, and we update the index regularly), the sorting works as desired.
Anyone have any tips?
Thanks!

The fact that the value gets transformed from an integer to a string is really surprising in itself, and I don't have a clue as to why this would happen.
However, there is still an easy solution without fixing the root cause. The sortBy parameter is also able to accept a comparison function, so you can cast those values to integers in your front-end instead of your back-end.
Something along these lines should work:
helper.on('results', function(content) {
  // get the facet values ordered by their numeric value, ascending, using a comparison function
  var costValues = content.getFacetValues('cost_to_build', {
    sortBy: function(a, b) {
      return parseInt(a.name, 10) - parseInt(b.name, 10);
    }
  });
  // costValues now holds the sorted facet values to render
});
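The order reported in the question (15, 25, 3, 5, 6) is what you get when the facet names are compared as strings rather than as numbers. A minimal Python sketch of the difference, using made-up facet names in the same "15.00" shape as the index value shown above:

```python
# Facet names as strings, in the same shape as the "15.00" value from the question.
facet_names = ["3.00", "15.00", "6.00", "25.00", "5.00"]

# A plain string sort compares character by character,
# so "15.00" comes before "3.00" because "1" < "3".
lexicographic = sorted(facet_names)

# A numeric sort converts each name to a number before comparing.
# float() is used because Python's int("15.00") raises ValueError,
# unlike JavaScript's parseInt, which stops at the decimal point.
numeric = sorted(facet_names, key=float)

print(lexicographic)
print(numeric)
```

The comparison function passed to sortBy plays the same role as key=float here: it decides the order from the parsed number instead of the raw string.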

Related

Ag-grid setFilter in server side filtering

I just want to check whether it's possible to give setFilter values in the callback in the form of complex objects instead of an array of strings. The reason we need setFilter to accept complex objects is that we are using server-side filtering: we would like to show labels in the filter, but send keys back to the server to perform the filtering.
If we have for example objects like {name: 'some name', id: 1} we would like to show 'some name' in filter UI but when that filter is selected we need associated id (in this case 1).
By looking into source code of setFilter and corresponding model, it seems like this is not possible.
Is there a way to make this work that I am missing?
ag-Grid version 23.2.0
I have exactly the same problem. From the interface it does indeed seem impossible, because the success callback expects string[] values:
interface SetFilterValuesFuncParams {
  // The function to call with the values to load into the filter once they are ready
  success: (values: string[]) => void;
  // The column definition object from which the set filter is invoked
  colDef: ColDef;
}
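One possible workaround, which is not from this thread and is only a sketch: since the filter accepts strings only, hand it the labels and keep a separate label-to-id lookup, then translate the selected labels into ids just before building the server request. The logic is shown in Python for brevity; the option objects and the labels_to_ids helper are hypothetical, and the approach assumes labels are unique.

```python
# Hypothetical option objects, shaped like the {name, id} example in the question.
options = [
    {"name": "some name", "id": 1},
    {"name": "another name", "id": 2},
]

# The set filter only takes strings, so it is given the labels.
filter_values = [opt["name"] for opt in options]

# A lookup table maps each label back to its key.
# Assumes labels are unique; duplicate labels would overwrite each other.
id_by_label = {opt["name"]: opt["id"] for opt in options}


def labels_to_ids(selected_labels):
    """Translate the labels the user selected into the ids the server expects."""
    return [id_by_label[label] for label in selected_labels]


print(labels_to_ids(["another name"]))
```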

How does MongoDB sort/compare objects

I couldn't find any clear documentation on how MongoDB compares/sorts complex objects. I've tried some examples and found that property order matters and property names also matter.
Examples:
Order matters
{"name": {"first": "A", "last": "B"}} != {"name": {"last": "B", "first": "A"}}
Values matter
{"name": {"first": "A"}} < {"name": {"first": "B"}}
Property names matter
{"name": {"first": "A"}} < {"name": {"girst": "A"}}
So I'm wondering how exactly that works; I'm sure things like missing properties would also affect it.
If you sort on an embedded object field like name, the sort comparisons are done at the binary representation (BSON object) level, which isn't very useful.
What you typically want to do instead is identify the specific fields within those objects using dot notation, putting them in the order you want:
// Sort on last name, and then first name
db.test.find().sort({'name.last': 1, 'name.first': 1})
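The three observations in the question are consistent with embedded documents being compared one field at a time, in stored order, looking at the field name first and the value second. The following is only a rough Python model of that behaviour, not MongoDB's implementation: it assumes every value is a string (so BSON type ordering never comes into play) and does not handle nested documents. The pairs helper and the sample documents are made up for illustration.

```python
def pairs(doc):
    """Return the (field name, value) pairs of a document in stored order."""
    return list(doc.items())


# Order matters: same fields in a different order compare as different.
order_matters = pairs({"first": "A", "last": "B"}) != pairs({"last": "B", "first": "A"})

# Values matter: same field name, and "A" sorts before "B".
values_matter = pairs({"first": "A"}) < pairs({"first": "B"})

# Field names matter: "first" sorts before "girst".
names_matter = pairs({"first": "A"}) < pairs({"girst": "A"})

# Sorting on named fields instead, mirroring sort({'name.last': 1, 'name.first': 1}).
docs = [
    {"name": {"first": "B", "last": "Z"}},
    {"name": {"last": "Y", "first": "C"}},
    {"name": {"first": "A", "last": "Z"}},
]
by_last_then_first = sorted(
    docs, key=lambda d: (d["name"]["last"], d["name"]["first"])
)

print(order_matters, values_matter, names_matter)
print(by_last_then_first)
```

Sorting on the named fields gives the same result regardless of the order in which first and last were stored, which is why the dot-notation form is usually the one you want.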

Updating multiple complex array elements in MongoDB

I know this has been asked before, but I have yet to find a solution that works efficiently. I am working with the MongoDB C# driver, though this is more of a general question about MongoDB operations.
I have a document structure that looks something like this:
field1: value1
field2: value2
...
users: [ {...user 1 subdocument...}, {...user 2 subdocument...}, ... ]
Some facts:
Each user subdocument includes further sub-arrays & subdocuments (so they're fairly complex).
The average users array only contains about 5 elements, but in the worst case can surpass 100.
Several thousand update operations on multiple users may be conducted per day in this system, each on one document at a time. Larger arrays will receive more frequent updates due to their data size.
I am trying to figure out how to do this efficiently. From what I've heard, you cannot directly set several array elements to new values all at once, so I had to try something else.
I tried using the $pullAll / $addToSet + $each operations to remove the old array and replace it with a modified one. I am aware that $pullAll can remove only the elements that I need as well, but I would like to preserve the order of elements.
The C# code:
try
{
    WriteConcernResult wcr = collection.Update(query,
        Update.Combine(Update.PullAll("users"),
            Update.AddToSetEach("users", newUsers.ToArray())));
}
catch (WriteConcernException wce)
{
    return wce.Message;
}
In this case newUsers is a List<BsonValue> converted to an array. However, I am getting the following exception message:
Cannot update 'users' and 'users' at the same time
By the looks of it, I can't have two update statements in use on the same field in the same write operation.
I also tried Update.Set("users", newUsers.ToArray()), but apparently the Set statement doesn't work with arrays, just basic values:
Argument 2: cannot convert from 'MongoDB.Bson.BsonValue[]' to 'MongoDB.Bson.BsonValue'
So then I tried converting that array to a BsonDocument:
Update.Set("users", newUsers.ToArray().ToBsonDocument());
And got this:
An Array value cannot be written to the root level of a BSON document.
I could try replacing the whole document, but that seems like overkill and definitely not very efficient.
So the only thing I can think of now is to run two separate write operations: one to remove the unwanted old users and another to replace them with their newer versions:
WriteConcernResult wcr = collection.Update(query, Update.PullAll("users"));
WriteConcernResult wcr = collection.Update(query, Update.AddToSetEach("users", newUsers.ToArray()));
Is this my best option? Or is there another, better way of doing this?
Your code should work with a minor change:
Update.Set("users", new BsonArray(newUsers));
BsonArray is a BsonValue, whereas an array of documents is not, and we don't implicitly convert arrays like we do other primitive values.
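To see why a single $set avoids the "Cannot update 'users' and 'users' at the same time" error, it can help to look at the shape of the update document itself: only one operator touches the users field, and the whole array is replaced in the order given. Below is a small Python sketch of that shape using plain dictionaries. The apply_set function is a hypothetical stand-in for what the server does with top-level fields, and the sample data is made up.

```python
import copy


def apply_set(document, update):
    """Stand-in for the server applying {"$set": {...}} to top-level fields only."""
    result = copy.deepcopy(document)
    for field, value in update["$set"].items():
        result[field] = value
    return result


doc = {
    "field1": "value1",
    "users": [{"id": 1, "name": "old-1"}, {"id": 2, "name": "old-2"}],
}
new_users = [{"id": 1, "name": "new-1"}, {"id": 2, "name": "new-2"}]

# One operator, one field: the whole array is swapped and element order is kept.
update = {"$set": {"users": new_users}}
updated = apply_set(doc, update)

print(updated)
```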
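This extension method solved my problem:
using System.Collections;
using MongoDB.Bson;

public static class MongoExtension
{
    // Copies the items of any enumerable into a BsonArray.
    // Each item must already be a BsonValue, otherwise the cast throws.
    public static BsonArray ToBsonArray(this IEnumerable list)
    {
        var array = new BsonArray();
        foreach (var item in list)
            array.Add((BsonValue) item);
        return array;
    }
}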

dataFrame keying using pandas groupby method

I'm new to pandas and trying to learn how to work with it. I'm having a problem when trying to apply, to my own data, an example I saw in one of Wes's videos and notebooks. I have a csv file that looks like this:
filePath,vp,score
E:\Audio\7168965711_5601_4.wav,Cust_9709495726,-2
E:\Audio\7168965711_5601_4.wav,Cust_9708568031,-80
E:\Audio\7168965711_5601_4.wav,Cust_9702445777,-2
E:\Audio\7168965711_5601_4.wav,Cust_7023544759,-35
E:\Audio\7168965711_5601_4.wav,Cust_9702229339,-77
E:\Audio\7168965711_5601_4.wav,Cust_9513243289,25
E:\Audio\7168965711_5601_4.wav,Cust_2102513187,18
E:\Audio\7168965711_5601_4.wav,Cust_6625625104,-56
E:\Audio\7168965711_5601_4.wav,Cust_6073165338,-40
E:\Audio\7168965711_5601_4.wav,Cust_5105831247,-30
E:\Audio\7168965711_5601_4.wav,Cust_9513082770,-55
E:\Audio\7168965711_5601_4.wav,Cust_5753907026,-79
E:\Audio\7168965711_5601_4.wav,Cust_7403410322,11
E:\Audio\7168965711_5601_4.wav,Cust_4062144116,-70
I'm loading it into a data frame and then grouping it by "filePath" and "vp". The code is:
res = df.groupby(['filePath','vp']).size()
res.index
and the output is:
[E:\Audio\7168965711_5601_4.wav Cust_2102513187,
Cust_4062144116, Cust_5105831247,
Cust_5753907026, Cust_6073165338,
Cust_6625625104, Cust_7023544759,
Cust_7403410322, Cust_9513082770,
Cust_9513243289, Cust_9702229339,
Cust_9702445777, Cust_9708568031,
Cust_9709495726]
Now I'm trying to access the index like a dict, as I saw in examples, but when I do
res['Cust_4062144116']
I get an error:
KeyError: 'Cust_4062144116'
I do get a result when I use the file path, but as I understand it, and as I saw in previous examples, I should be able to use the vp keys as well. Isn't that so?
Sorry if it's a trivial one, I just can't understand why it works in one example but not in the other.
Rutger, you are not correct. It is possible to "partially" index a MultiIndex series. I simply did it the wrong way.
The first level of the index is the file name (e.g. E:\Audio\7168965711_5601_4.wav above) and the second level is vp. Meaning, for each file name I have multiple vps.
Now, this is correct (written as a raw string so the backslashes are kept literally):
res[r'E:\Audio\7168965711_5601_4.wav']
and will return:
Cust_2102513187 2
Cust_4062144116 8
....
but trying to index by the inner index (the Cust_ indexes) will fail.
You group by two columns and therefore get a MultiIndex in return. This means you also have to slice using those two columns, not with a single index value.
Your .size() on the groupby object converts it into a Series. If you force it in a DataFrame you can use the .xs method to slice a single level:
res = pd.DataFrame(df.groupby(['filePath','vp']).size())
res.xs('Cust_4062144116', level=1)
That works. If you want to keep it as a series, boolean indexing can help, something like:
res[res.index.get_level_values(1) == 'Cust_4062144116']
The last option is a bit less readable, but sometimes also more flexible. You could test for multiple values at once, for example:
res[res.index.get_level_values(1).isin(['Cust_4062144116', 'Cust_6073165338'])]
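The options discussed in this thread can be tried side by side on a tiny frame. This is a sketch with made-up file names and customer ids rather than the data from the question, and it assumes a reasonably recent pandas in which Series also has an xs method, so the conversion to a DataFrame is not needed here.

```python
import pandas as pd

# Made-up data with the same three columns as the csv in the question.
df = pd.DataFrame({
    "filePath": ["a.wav", "a.wav", "a.wav", "b.wav"],
    "vp": ["Cust_1", "Cust_2", "Cust_2", "Cust_1"],
    "score": [-2, -80, 25, 18],
})

# Grouping by two columns gives a Series with a two-level MultiIndex.
res = df.groupby(["filePath", "vp"]).size()

# Indexing by the outer level (filePath) works directly.
outer = res["a.wav"]

# Indexing by the inner level (vp) needs xs with the level named.
inner = res.xs("Cust_1", level="vp")

# A boolean mask on the inner level keeps the full MultiIndex.
masked = res[res.index.get_level_values("vp") == "Cust_2"]

print(outer)
print(inner)
print(masked)
```

Note that xs drops the level it selected on, while the boolean mask keeps both levels in the result.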

Make Lucene index a value and store another

I want Lucene.NET to store a value while indexing a modified, stripped-down version of the stored value. e.g. Consider the value:
this_example-has some/weird (chars) 100%
I want it stored right like that (so that I can retrieve exactly that for showing in the results list), but I want lucene to index it as:
this example has some weird chars 100
(you see, like a "sanitized" version of the original value) for a simplified search.
I figure this would be the job of an analyzer, but I don't want to mess with rolling my own. Ideally, the solution should remove everything that is not a letter, a number or quotes, replacing the removed chars by a white-space before indexing.
Any suggestions on how to implement that?
This is because I am indexing products for an e-commerce search, and some have really creepy names. I think this would improve search assertiveness.
Thanks in advance.
If you don't want a custom analyzer, try storing the value as a separate non-indexed field, and use a simple regex to generate the sanitized version.
var input = "this_example-has some/weird (chars) 100%";
var output = Regex.Replace(input, @"[\W_]+", " ");
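The same pattern can be checked quickly outside .NET. Here is a Python sketch of the sanitizing step applied to the sample value from the question. The sanitize function name is made up, and the strip() call is an addition not present in the C# line above: without it, the trailing "%" leaves a space at the end of the result.

```python
import re


def sanitize(value):
    """Replace each run of non-word characters and underscores with one space."""
    return re.sub(r"[\W_]+", " ", value).strip()


original = "this_example-has some/weird (chars) 100%"
indexed = sanitize(original)

# The original string would go into the stored field,
# and the sanitized string into the indexed field.
print(indexed)
```

Be aware that this pattern also removes quotes, so it would need adjusting if quotes have to survive in the indexed text as the question asks.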
You mention that you need another Analyzer for some searching functionality. Don't forget the PerFieldAnalyzerWrapper, which will allow you to use different analyzers within the same document.
public static void Main() {
    var wrapper = new PerFieldAnalyzerWrapper(defaultAnalyzer: new StandardAnalyzer(Version.LUCENE_29));
    wrapper.AddAnalyzer(fieldName: "id", analyzer: new KeywordAnalyzer());
    IndexWriter writer = null; // TODO: Retrieve these.
    Document document = null;
    writer.AddDocument(document, analyzer: wrapper);
}
You are correct that this is the work of the analyzer. I'd start by using a tool like Luke to see what the standard analyzer does with your term before deciding what to use; it tends to do a good job of stripping noise characters and words.