How can I aggregate search results in Algolia by three different criteria and sort them in a specific way? - algolia

Apologies in advance, as I'm not a native English speaker! I'll try to be as clear as possible with what I'm trying to do:
I'm using Algolia InstantSearch for Angular on a marketplace website to provide my users with a search widget. I've been tasked with having results displayed following this logic:
Top result: Best reviewed product
Second result: Most purchased product
Third result: Most recently published product
This "block" should repeat as long as there's results, so the fourth result would need to be the second best reviewed product, fifth would be the second most purchased, third the second most recently published product, and so on. This has the intention of allowing new sellers in the marketplace to get exposure, while rewarding those that have sold the most and had better reviews for their products simultaneously.
Is this possible in some way using Algolia? I've read the documentation on custom ranking (https://www.algolia.com/doc/guides/managing-results/must-do/custom-ranking/) and exhaustive sorting (https://www.algolia.com/doc/guides/managing-results/refine-results/sorting/in-depth/exhaustive-sort/), and I've only found how to set different ranking criteria that are applied one after the other as tie-breakers, but no information at all about how I might achieve this interleaving.

As you've discovered with Algolia, sorting happens within the index itself. To sort by different criteria, you create replicas of your index, each with its own ranking.
So you'd have a primary index plus three replicas sorted by reviews, purchases, and date:
my_index
my_index-reviews-descending
my_index-purchase-descending
my_index-added-descending
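If you manage index settings from code, the replica setup could look roughly like the sketch below. This is just an illustration: it assumes the algoliasearch v4 JavaScript client and made-up attribute names (reviewScore, purchaseCount, and publishedAt stored as a Unix timestamp).

```typescript
import algoliasearch from 'algoliasearch';

const client = algoliasearch('YOUR_APP_ID', 'YOUR_ADMIN_API_KEY');

// Algolia's default ranking criteria, kept as tie-breakers after the sort attribute.
const defaultRanking = ['typo', 'geo', 'words', 'filters', 'proximity', 'attribute', 'exact', 'custom'];

async function configureReplicas() {
  // Declare the replicas on the primary index.
  await client.initIndex('my_index').setSettings({
    replicas: [
      'my_index-reviews-descending',
      'my_index-purchase-descending',
      'my_index-added-descending',
    ],
  });

  // Each replica puts its sort attribute first in the ranking (attribute names are assumptions).
  await client.initIndex('my_index-reviews-descending')
    .setSettings({ ranking: ['desc(reviewScore)', ...defaultRanking] });
  await client.initIndex('my_index-purchase-descending')
    .setSettings({ ranking: ['desc(purchaseCount)', ...defaultRanking] });
  await client.initIndex('my_index-added-descending')
    .setSettings({ ranking: ['desc(publishedAt)', ...defaultRanking] });
}
```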
You'll then use multi-index search to query all three replicas simultaneously and interleave the hits client-side in the order you described:
https://www.algolia.com/doc/guides/building-search-ui/ui-and-ux-patterns/multi-index-search/angular/
Also, don't forget that to sort by date you'll want to store your dates as Unix timestamps. More info here:
https://www.algolia.com/doc/guides/managing-results/refine-results/sorting/how-to/sort-an-index-by-date/
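Putting it together, a rough sketch of the client-side round-robin could look like this. It uses the plain algoliasearch v4 client rather than the InstantSearch widgets, plus the index names above and a hypothetical hitsPerPage of 20; treat it as a starting point rather than a finished implementation.

```typescript
import algoliasearch from 'algoliasearch';

const client = algoliasearch('YOUR_APP_ID', 'YOUR_SEARCH_API_KEY');

const replicas = [
  'my_index-reviews-descending',  // slot 1: best reviewed
  'my_index-purchase-descending', // slot 2: most purchased
  'my_index-added-descending',    // slot 3: most recently published
];

async function searchInterleaved(query: string) {
  // One query per replica, sent in a single round trip.
  const { results } = await client.multipleQueries(
    replicas.map((indexName) => ({ indexName, query, params: { hitsPerPage: 20 } }))
  );
  const hitLists = results.map((r: any) => r.hits as Array<{ objectID: string }>);

  // Round-robin over the three lists, skipping products that already
  // appeared in an earlier slot so each "block" shows three distinct items.
  const seen = new Set<string>();
  const interleaved: Array<{ objectID: string }> = [];
  const longest = Math.max(...hitLists.map((hits) => hits.length));

  for (let rank = 0; rank < longest; rank++) {
    for (const hits of hitLists) {
      const hit = hits[rank];
      if (hit && !seen.has(hit.objectID)) {
        seen.add(hit.objectID);
        interleaved.push(hit);
      }
    }
  }
  return interleaved;
}
```

Deduplicating by objectID keeps a product that tops several replicas from occupying more than one slot in a block.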

Related

Creating Dynamic Filters in Tableau

I'm working in Tableau to help my school district visualize discipline data. I want to be able to disaggregate and filter by quite a few different measures (at least 13).
In the past, if I wanted to be able to disaggregate by a number of measures, I would make a parameter with a list of possible outputs, display each output as the name of a measure, then create a calculated field that returned the value from a given measure based on that parameter. This works fine for disaggregating.
However, filtering based on these values presents a challenge. The problem is that I'm not filtering based on any given measure, I'm filtering on a calculated field that returns the value in that measure. If my parameter is set to "Day" for instance, and I filter to Tuesday, but then switch to "Race", everything vanishes, because now my calculated field is returning race. What I want to create is a dropdown menu that lets you select from a number of different measures to filter by.
Below is a link to a packaged workbook that can help illustrate the problem that I'm dealing with.
I feel like something like this should be possible in Tableau, but there's some little trick that I'm missing. When I contacted their support team, their solutions were both only viable due to the limited number of measures I was using in the dummy data. The support team felt that this was possible as well, but they didn't know how.
https://public.tableau.com/profile/publish/DynamicFiltersUsingParameters/Sheet1#!/publish-confirm
You could create a Filter Action on the Tableau dashboard which carries over the 'Day' filter to give a smaller subset of data to work with for the next filter.

MongoDB schema on a big project

We recently started work on a big project and decided to use MongoDB as our database solution.
We wrote a lot of code, but as the project started to grow we found that we were trying to use joins instead of doing things the NoSQL way, which points to a bad database design.
What I'm asking for here is a good design for our project, which at this point consists of the following:
More than 12,000 products
More than 2,000 sellers
Every seller should have their own private area that allows them to create a product catalog based on the 12,000+ product template list.
The seller should be able to set the price, stock and offers, which will then be reflected only in their public product listing. The template list of products will remain unchanged.
Currently we have two collections: one for the products (which holds the general product information, like name, description, photos, etc.) and one in which we store documents that contain the ID of a product from the first collection, the ID of the seller, and the stock, price and offer values.
We are using aggregate with $lookup to "emulate" SQL's left join to merge the two collections, but the process is not scaling as we'd like it to and we're hitting serious performance issues.
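Roughly, the merge looks like the following sketch (collection and field names simplified for illustration, using the MongoDB Node.js driver):

```typescript
import { Db } from 'mongodb';

// db is obtained from a connected MongoClient, e.g. client.db('marketplace').
// For one seller, join their listings (price/stock/offers) with the shared
// product templates (name, description, photos, ...).
async function getSellerCatalog(db: Db, sellerId: string) {
  return db.collection('sellerListings').aggregate([
    { $match: { sellerId } },
    {
      $lookup: {
        from: 'products',         // the template collection
        localField: 'productId',  // reference stored in each listing
        foreignField: '_id',
        as: 'product',
      },
    },
    { $unwind: '$product' },
  ]).toArray();
}
```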
We're aware that using joins is not the way to go in NoSQL. What should we do? How should we refactor our database design? Should we embed the prices, offers and stock for each seller in each document?
The decision of using "Embedded documents" or "Joins among two or more different collections" should depend on how you are going to retrieve the data.If every time,while fetching product, you are going to fetch sellers,then it makes sense to make it an embedded document instead of different collections.But if you will be planning to fetch these two entities separately, then only option you are left with is to use Join.

Which way of storing this data in MongoDB is more performant? Caching max/min values in Item collection or on-the-fly calculation based on all bids?

I'm working with a startup building an exchange platform where commodities from an Item collection with around 50,000 documents can be bought and sold by users, who create buy and sell bids for these items.
For our "buy it now"/"sell it now" features, it's required to calculate the best buy and sell bids for an item. Currently we are calculating these on the fly with an index in the UserBids collection on the buy and sell bids field (for a given Item document, let's say with ID 1234, we'll find all UserBids for item 1234 and get the maximum buy bid and minimum sell bid). This is used to present every user with the best price they can buy/sell an item instantly at, and requires a lot of queries on the UserBids collection, but prevents having to update a canonical 'best' price for each item.
I'm wondering if it would be more performant for the Item schema to have a MaxBuy and MinSell field. This would require the MaxBuy and MinSell fields on an Item document to be updated every time a user enters a new bid, using something like Items.update({id: itemId, $or: [{maxBuy: {$lt: currentBuyBid}}, {maxBuy: null}]}, {$set: {maxBuy: currentBuyBid}}). We would still have to perform the same number of queries to show a user the best price, but the queries wouldn't require an aggregation, and as the exchange grows, we expect the UserBids collection to grow much more than the Items collection (which should stay roughly the same size).
Bids may be added/modified regularly, but we expect the volume of users checking best buy/sell prices to be about 10-100 times greater. Is there a good way to evaluate which one of these approaches would be best?
This mostly depends on which use case is more frequent and performance-critical:
a user placing a bid, which would trigger a recalculation of those fields
someone checking the price
If the latter use case is the more frequent one, that is the one you should optimize for.
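For what it's worth, here is a rough sketch of both approaches, assuming a UserBids collection shaped like { itemId, type: 'buy' | 'sell', amount } and hypothetical field and collection names:

```typescript
import { Db } from 'mongodb';

// db is obtained from a connected MongoClient, e.g. client.db('exchange').

// Approach 1: compute best prices on the fly. An index such as
// { itemId: 1, type: 1, amount: -1 } on UserBids keeps this cheap.
async function bestPricesOnTheFly(db: Db, itemId: string) {
  const [best] = await db.collection('userBids').aggregate([
    { $match: { itemId } },
    {
      $group: {
        _id: '$itemId',
        // $max / $min ignore nulls, so each accumulator only sees its own side.
        maxBuy: { $max: { $cond: [{ $eq: ['$type', 'buy'] }, '$amount', null] } },
        minSell: { $min: { $cond: [{ $eq: ['$type', 'sell'] }, '$amount', null] } },
      },
    },
  ]).toArray();
  return best; // undefined if the item has no bids yet
}

// Approach 2: cache maxBuy on the Item document and only touch it when a
// new buy bid actually beats the cached value (minSell would be symmetric).
async function onNewBuyBid(db: Db, itemId: string, amount: number) {
  await db.collection('items').updateOne(
    { id: itemId, $or: [{ maxBuy: { $lt: amount } }, { maxBuy: null }] },
    { $set: { maxBuy: amount } }
  );
}
```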

How to avoid redundant results from the Yahoo Answers API

I have a question about the Yahoo Answers API. I plan to use questionSearch, getByCategory, getQuestion and getByUser. For example, I used getByCategory to query. Each time I call the function, I can query at most 50 questions. However, many of the same questions come back that were already returned by previous calls. So how can I remove this redundancy?
The API doesn't track what it has returned to you previously, as it's stateless.
This leaves you with two options that I can think of.
1) After you get your data back, filter out what you already have. This requires checking what is already displayed and not displaying duplicated items (see the sketch at the end of this answer).
2) Store all the IDs you are already showing in a list, then adjust your YQL query so that it excludes that list of IDs from the results. Like:
select * from answers.getbycategory where category_id=2115500137 and type="resolved" and id not in ('20140216060544AA0tCLE', '20140215125452AAcNRTq', '20140215124804AAC1cQl');
The downside of this is that it could affect performance, since your YQL queries will start to take longer and longer to return.
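Here's a minimal sketch of option 1 in TypeScript; the Id field name is an assumption about the shape of the parsed API response:

```typescript
// Remember which question IDs have already been rendered and drop repeats.
const seen = new Set<string>();

function keepNewQuestions<T extends { Id: string }>(questions: T[]): T[] {
  const fresh = questions.filter((q) => !seen.has(q.Id));
  for (const q of fresh) {
    seen.add(q.Id);
  }
  return fresh;
}

// Usage: after each getByCategory call, run the parsed question array
// through keepNewQuestions() before displaying it; only new questions remain.
```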

Partition Lucene Index by ID across multiple indexes

I am trying to put together my Lucene search solution, and I'm having trouble figuring out how to start.
On my site, I want one search to span 5 different types of objects in my model.
I want my results to come back as one list, ordered by best match first, with a way to differentiate the type so I can show the data appropriately
Our system is split out into what we call sites. I want to index the 5 different model objects by site. Searching will always be done by site.
I'm not sure where to begin indexing this system for optimal performance. I'm also not sure how best to implement the search for this setup. Any advice, articles, and examples are greatly appreciated.
EDIT:
Since it has been said this is too broad,
Let's say I have 3 sites: Site 1, Site 2, and Site 3.
Let's say I am indexing Dogs, Cats, and Hamsters. A record in each of these types is linked to a site.
So, for instance, my data might be (Type, Name, SiteId)
Dog, "Fido" 1
Cat, "Sprinkles", 2
Hamster, "Sprinkles", 2
Cat, "Mr. Pretty", 3
Cat, "Mr. Pretty 2", 3
So, when I do a search for "Mr. Pretty", I want to target a specific site ID. If I search against site ID 1, I'll get 0 results. If I search against site ID 3, I'll get
Mr. Pretty
Mr. Pretty 2
And if I search for "Sprinkles" on Site 2, I will know that one result is a cat and the other result is a hamster.
What is the best way I can go about achieving this sort of search index?
As goalie7960 suggested, you can add a "SiteID" field to each document and add a query term like siteid:3 to your query in order to retrieve documents only from that site. You can also improve performance by creating and caching a Filter for each site, so you can apply it to the corresponding queries.
Regarding different types in the same index, you can use the same strategy. Create a "type" field for each document with the corresponding type (maybe just an ID). Elasticsearch uses the same strategy to keep different, distinguishable types in the same index. Again, you can use Filters on the types to speed up queries (Elasticsearch does the same).