How to manually clear the cache of my Algolia index?

I'm using the Algolia client directly in my Node.js backend, so I don't use InstantSearch.js.
I can easily query/index/update my Algolia index, but I can't find a way to clear the cache, and my app always needs to display an up-to-date hits list in real time.
I've tried:
client.initIndex('my index');
client.clearCache()
But without success. I always have to manually force an unmount/remount of my app to see the updated hits list.
Any solution?

It's an old question, but since there are no responses...
Docs
Given the following instantiation:
const client = algoliasearch('H58KBL9VKQ', '••••••••••••••••••••');
const index = client.initIndex('your_index_name');
There are two methods.
client.clearCache() - when using multiple indices
index.clearCache() - when querying one index specifically
Note that BOTH return a Promise, and both are only relevant in the BROWSER: the Node.js API doesn't cache results, so there is no cache to clear.
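For example, in a browser app you can clear the index cache right before re-querying, so the next search hits the API instead of the cache. A minimal sketch using the methods above (the query string is just a placeholder):

// Clear this index's cached results, then re-run the query to get fresh hits.
await index.clearCache();
const { hits } = await index.search('my query');
console.log(hits);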

Related

Issue in MongoDB document search

I am new to MongoDB, and I have the following issue in a web application I am currently developing.
We use MongoDB to store the application's data.
And we have an API where we search for documents via text search.
For example, if the user types "New York", the request should return all the data in the collection matching the keyword "New York" (we call the API for each letter typed). We have nearly 200,000 documents in the DB, and a search returns nearly 4,000 documents for some keywords. We tried limiting the results to 5, but then it only returns the top 5 matches and not the other available data. Without a limit it returns hundreds or thousands of documents, as mentioned, and that slows the request down.
On the frontend we bind the search results to a dropdown (Next.js).
My question:
Is there a way to optimize the document search?
Are there any suggestions for a suitable way to implement this requirement using MongoDB and .NET 5?
Or any other implementation approaches for this requirement?
The following code segment shows the query that retrieves the data for the incoming keyword.
var hotels = await _hotelsCollection
    .Find(Builders<HotelDocument>.Filter.Text(keyword))
    .Project<HotelDocument>(hotelFields)
    .ToListAsync();

var terminals = await _terminalsCollection
    .Find(Builders<TerminalDocument>.Filter.Text(keyword))
    .Project<TerminalDocument>(terminalFeilds)
    .ToListAsync();

var destinations = await _destinationsCollection
    .Find(Builders<DestinationDocument>.Filter.Text(keyword))
    .Project<DestinationDocument>(destinationFields)
    .ToListAsync();
So this is a classic "autocomplete" feature. There are some known best practices you should follow:
On the client side you should use a debounce; this is a must. There is no reason to execute a request for each letter, and it is the most critical point for an autocomplete feature (see the sketch after this list).
On the backend things can get a bit more complicated. Naturally you want to use a database suited to this task; specifically, MongoDB has a service called Atlas Search, which is a Lucene-based text search engine.
That gets you autocomplete support out of the box. However, if you don't want to make big changes to your infrastructure, here are some suggestions:
Make sure the field you're searching on is indexed.
I see you're executing 3 separate requests; consider using something like Task.WhenAll to execute all of them at once instead of one by one. I am not sure how the client side is built, but if all 3 entities are shown in the same list, then ideally you merge the results into one collection so you can paginate the search properly.
As mentioned in #2, you must add server-side pagination; no search feature can do without it. I can't give specifics on how you should implement it since you have 3 separate entities, which could make pagination harder to implement; I'd consider whether or not you need all 3 of these in the same API route.
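Here is a minimal client-side debounce sketch, assuming a React/Next.js frontend and a hypothetical /api/search endpoint; the names are illustrative, not from the original post:

import { useEffect, useState } from 'react';

// Returns `value`, but only after it has stopped changing for `delayMs`.
function useDebouncedValue(value, delayMs = 300) {
  const [debounced, setDebounced] = useState(value);
  useEffect(() => {
    const timer = setTimeout(() => setDebounced(value), delayMs);
    return () => clearTimeout(timer); // cancel the pending update if the user keeps typing
  }, [value, delayMs]);
  return debounced;
}

// Usage in the search component:
// const keyword = useDebouncedValue(inputValue, 300);
// useEffect(() => {
//   if (keyword.length > 2) {
//     fetch('/api/search?q=' + encodeURIComponent(keyword));
//   }
// }, [keyword]);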

Efficiently retrieving all unique records for a facet value along with record counts in Algolia

I'm using Algolia to power search in my app. I have an index called prod_COACHES in which I have some records with an object key called speciality1.
The data structure for speciality1 looks like this:
I have enabled speciality1.itemName as an Algolia 'facet' so that I can filter on it. All good so far and working nicely. Now, in my Algolia dashboard I can see a nice bit of UI that shows me every unique facet (in this case my specialisations) along with the number of records for each facet.
As it happens, I want to show exactly this information in my own UI in my app, but I'm not sure how to get this data from Algolia in the most efficient way. I'm using the client-side AlgoliaSearch JavaScript SDK. How do I run a search to retrieve every unique speciality1.itemName and the number of records for each unique speciality1.itemName, so I can build my own UI just like the above?
I have gone through the docs and followed the examples but my question is really about finding the most efficient way to do this from someone who really knows Algolia well, rather than hack my own solution together. Thanks!
It looks like you've enabled attributesForFaceting on the attribute speciality1.itemName. You can retrieve the facet values for the given attribute with the search parameter facets. The Algolia response will then contain a map of value: count. Here is an example with the JavaScript client:
import algoliasearch from 'algoliasearch';

const client = algoliasearch('XXX', 'XXX');
const index = client.initIndex('XXX');

index.search('', {
  facets: ['speciality1.itemName']
}).then(result => {
  console.log(result.facets);
});
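The facets object maps each facet value to its record count, which is exactly what you need for your list. A small follow-up sketch (the facet values shown are hypothetical):

// result.facets might look like:
// { 'speciality1.itemName': { 'Life Coaching': 42, 'Nutrition': 17 } }
const counts = Object.entries((result.facets || {})['speciality1.itemName'] || {})
  .map(([itemName, count]) => ({ itemName, count }));
// counts -> [{ itemName: 'Life Coaching', count: 42 }, { itemName: 'Nutrition', count: 17 }]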
If you want to easily build a search UI, you should take a look at the InstantSearch libraries. They're built on top of Algolia to ease the state/UI management for such a UI. Many flavours are available, e.g. Vanilla and React.

How to optimize collection subscription in Meteor?

I'm working on a filtered live search module with Meteor.js.
Use case & problem:
A user wants to search through all the users to find friends, but I cannot afford to send each user the complete users collection. The user filters the search using checkboxes, and I'd like to subscribe only to the matched users. What is the best way to do it?
I guess it would be better to create the query client-side, then send it to the method to get back the desired set of users. But I wonder: when the filtering criteria change, does the new subscription erase all of the old one? Because if I do a first search which returns [usr1, usr3, usr5], and after that a search which returns [usr2, usr4], the best would be to keep the first set and simply add the new one to the client-side subscribed collection.
And if I then do a third search which should return [usr1, usr3, usr2, usr4], the autorun subscription would ideally not send me anything, as I already have the whole result set in my collection.
The goal is to spare processing and data transfer from the server.
I have some ideas, but I haven't coded enough of it yet to share it in an easily comprehensible way.
How would you advise me to proceed to save the most time and processing?
Thank you all.
David
It depends on your application, but you'll probably send a non-empty string to a publisher which uses that string to search the users collection for matching names. For example:
Meteor.publish('usersByName', function(search) {
  check(search, String);

  // make sure the user is logged in and that search is sufficiently long
  if (!(this.userId && search.length > 2))
    return [];

  // search by case insensitive regular expression
  var selector = {username: new RegExp(search, 'i')};

  // only publish the necessary fields
  var options = {fields: {username: 1}};

  return Meteor.users.find(selector, options);
});
Also see common mistakes for why we limit the fields.
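On the client, you would then re-subscribe whenever the (ideally debounced) search string changes. A minimal sketch; the template and session variable names here are hypothetical:

// Client: rerun the subscription whenever the search string changes.
Template.userSearch.onCreated(function () {
  this.autorun(function () {
    Meteor.subscribe('usersByName', Session.get('searchString') || '');
  });
});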
performance
Meteor is clever enough to keep track of the current document set that each client has for each publisher. When the publisher reruns, it knows to only send the difference between the sets. So the situation you described above is already taken care of for you.
If you were subscribed for users: 1,2,3
Then you restarted the subscription for users 2,3,4
The server would send a removed message for 1 and an added message for 4.
Note this will not happen if you stopped the subscription prior to rerunning it.
To my knowledge, there isn't a way to avoid removed messages when modifying the parameters for a single subscription. I can think of two possible (but tricky) alternatives:
Accumulate the intersection of all prior search queries and use that when subscribing. For example, if a user searched for {height: 5} and then searched for {eyes: 'blue'} you could subscribe with {height: 5, eyes: 'blue'}. This may be hard to implement on the client, but it should accomplish what you want with the minimum network traffic.
Accumulate active subscriptions. Rather than modifying the existing subscription each time the user modifies the search, start a new subscription for the new set of documents, and push the subscription handle to an array. When the template is destroyed, you'll need to iterate through all of the handles and call stop() on them (see the sketch below). This should work, but it will consume more resources (both network and server memory + CPU).
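Here is a rough sketch of that second alternative; the template, form, and field names are hypothetical:

// Keep every search's subscription alive, and stop them all when the template goes away.
Template.userSearch.onCreated(function () {
  this.searchSubs = [];
});

Template.userSearch.events({
  'submit .search-form': function (event, template) {
    event.preventDefault();
    var search = event.target.search.value;
    // Subscriptions started outside an autorun stay alive until stop() is called.
    template.searchSubs.push(Meteor.subscribe('usersByName', search));
  }
});

Template.userSearch.onDestroyed(function () {
  this.searchSubs.forEach(function (handle) { handle.stop(); });
});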
Before attempting either of these solutions, I'd recommend benchmarking the worst case scenario without using them. My main concern is that without fairly tight controls, you could end up publishing the entire users collection after successive searches.
If you want to go easy on your server, you'll want to send as little data to the client as possible. That means every document you send to the client that is NOT a friend is waste. So let's eliminate all that waste.
Collect your filters (e.g. filters = {sex: 'Male', state: 'Oregon'}). Then call a method to search based on your filters (e.g. Users.find(filters)). Additionally, you can run your own proprietary ranking algorithm to determine the % chance that a person is a friend. Maybe base it off distance from IP address (or from phone GPS history), mutual friends, etc. This will pay dividends in efficiency in a bit. Index things like GPS coords or other highly unique attributes, and maybe try out composite indexes. But remember: more indexes means slower writes.
Now you've got a cursor with all possible friends, ranked from most likely to least likely.
Next, change your subscription to match those friends, but put a limit:20 on there. Also, only send over the fields you need. That way, if a user wants to skip this step, you only wasted sending 20 partial docs over the wire. Then, have an infinite scroll or 'load more' button the user can click. When they load more, it's an additive subscription, so it's not resending duplicate info. Discover Meteor describes this pattern in great detail, so I won't.
After a few clicks/scrolls, the user won't find any more friends (because you were smart & sorted them) so they will stop trying & move on to the next step. If you returned 200 possible friends & they stop trying after 60, you just saved 140 docs from going through the pipeline. There's your efficiency.
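A rough sketch of the limited publication described above; the publication name, filter handling, and fields are hypothetical and would need real validation in production:

// Publish at most `limit` candidate friends matching the client's filters.
Meteor.publish('possibleFriends', function (filters, limit) {
  check(filters, Object);
  check(limit, Number);
  if (!this.userId)
    return [];

  // In a real app, whitelist the allowed filter keys instead of passing them through.
  return Meteor.users.find(filters, {
    fields: {username: 1, 'profile.state': 1},
    limit: Math.min(limit, 100)  // hard cap so a client can't request everything
  });
});

// Client: each 'load more' click increases the limit, reusing the same subscription.
// Meteor.subscribe('possibleFriends', filters, Session.get('friendLimit'));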

Marklogic REST API search for latest document version

We need to restrict a MarkLogic search to the latest version of managed documents, using MarkLogic's REST API. We're using MarkLogic 6.
Using straight xquery, you can use dls:documents-query() as an additional-query option (see
Is there any way to restrict marklogic search on specific version of the document).
But the REST API requires XML, not arbitrary XQuery. You can turn ordinary cts queries into XML easily enough (execute <some-element>{cts:word-query("hello world")}</some-element> in QConsole).
If I try that with dls:documents-query() I get this:
<cts:properties-query xmlns:cts="http://marklogic.com/cts">
  <cts:registered-query>
    <cts:id>17524193535823153377</cts:id>
  </cts:registered-query>
</cts:properties-query>
Apart from being less than totally transparent... how safe is that number? We'll need to put it in our query options, so it's not something we can regenerate every time we need it. I've looked on two different installations here and the number's the same, but is it guaranteed to be the same, and will it ever change? On, for example, a MarkLogic upgrade?
Also, assuming the number is safe, will the registered-query always be there? The documentation says that registered queries may be cleared by the system at various times, but it's talking about user-defined registered queries, and I'm not sure how much of that applies to internal queries.
Is this even the right approach? If we can't do this we can always set up collections and restrict the search that way, but we'd rather use dls:documents-query if possible.
The number is a registered query id, and is deterministic. That is, it will be the same every time the query is registered. That behavior has been invariant across a couple of major releases, but is not guaranteed. And as you already know, the server can unregister a query at any time. If that happens, any query using that id will throw an XDMP-UNREGISTERED error. So it's best to regenerate the query when you need it, perhaps by calling dls:documents-query again. It's safest to do this in the same request as the subsequent search.
So I'd suggest extending the REST API with your own version of the search endpoint. Your new endpoint could add dls:documents-query to the input query. That way the registered query would be generated in the same request with the subsequent search. For ML6, http://docs.marklogic.com/6.0/guide/rest-dev/extensions explains how to do this.
The call to dls:documents-query() makes sure the query is actually registered (on the fly if necessary), but that won't work from the REST API. You could extend the REST API with a custom extension as suggested by Mike, but you could also use the following:
cts:properties-query(
  cts:and-not-query(
    cts:element-value-query(
      xs:QName("dls:latest"),
      "true",
      (),
      0
    ),
    cts:element-query(
      xs:QName("dls:version-id"),
      cts:and-query(())
    )
  )
)
That is the query that is registered by dls:documents-query(). It might not be future-proof though, so check it at each upgrade. You can find the definition of the function in /Modules/MarkLogic/dls.xqy.
HTH!

Anemone with Rails and MongoDB

I am preparing to write my first web crawler, and it looks like Anemone makes the most sense. There is built-in support for MongoDB storage, and I am already using MongoDB via Mongoid in my Rails application. My goal is to store the crawled results and then access them later via Rails. I have a couple of concerns:
1) At the end of this page, it says that "Note: Every storage engine will clear out existing Anemone data before beginning a new crawl." I would expect this to happen at the end of the crawl if I were using the default memory storage, but shouldn't the records be persisted to MongoDB indefinitely so that duplicate pages are not crawled next time the task is run? If they are wiped "before beginning a new crawl", then should I just run my Rails logic before the next crawl? If so, then I would end up having to check for duplicate records from the previous crawl.
2) This is the first time I have really thought about using MongoDB outside the context of Rails models. It looks like the records are created using the Page class, so can I later just query these as I normally would using Mongoid? I guess it is just considered a "model" once it has an ORM providing the fancy methods?
Great questions.
1) It depends on what your goal is.
In most cases this default makes sense: one does a crawl with Anemone and examines the data.
When you do a new crawl, the old data should be erased so that the data from the new crawl can replace it.
You could point the storage engine at a new collection before starting the new crawl if you don't want that to happen.
2) Mongoid won't create the model classes for you.
You need to define the models so that Mongoid knows to create a class for the collection, and optionally define the fields that each of the documents has so that you can use the dot accessor methods out of the box.
Something like:
class Page
  include Mongoid::Document

  field :url, type: String # i'm guessing, check what kind of docs anemone produces
  field :aliases, type: Array
  # field ...
end
It will probably need to include the following fields:
url - The URL of the page
aliases - Other URLs that redirected to this page, or the Page that this one redirects to
headers - The full HTTP response headers
code - The HTTP response code (e.g. 200, 301, 404)
body - The raw HTTP response body
doc - A Nokogiri::HTML::Document of the page body (if applicable)
links - An Array of all the URLs found on the page that point to the same domain
But please just take a look at what type (string, array, whatever) the storage engine is storing them as and don't make assumptions.
Good luck!