Issue in MongoDB document search

Issue in MongoDB document search - mongodb

I am new to MongoDB. And I have the following issue on currently developing web application.
We have an application where we use mongoDB to store data.
And we have an API where we search for the document via text search.
As an example: if the user type “New York” then the request should send the all the available data in the collection to the keyword “New York". (Here we call the API for each letter typed.) We have nearly 200000 data in the DB. Once the user searches for a document then it returns nearly 4000 data for some keywords. We tried with limiting the data to 5 – so it returns the top 5 data, and not the other available data. And we tried without limiting data now it returns hundreds and thousands of data as I mentioned. And it causes the request to slow down.
At Frontend we Bind search results to a dropdown. (NextJs)
My question:
Is there an optimizing way to search a document?
Are there any suggestions of a suitable way that I can implement this requirement using mongoDB and net5.0?
Or any other Implementation methods regarding this requirement?
Following code segment shows the query to retrieve the data to the incomming keyword.
var hotels = await _hotelsCollection
.Find(Builders<HotelDocument>.Filter.Text(keyword))
.Project<HotelDocument>(hotelFields)
.ToListAsync();
var terminals = await _terminalsCollection
.Find(Builders<TerminalDocument>.Filter.Text(keyword))
.Project<TerminalDocument>(terminalFeilds)
.ToListAsync();
var destinations = await _destinationsCollection
.Find(Builders<DestinationDocument>.Filter.Text(keyword))
.Project<DestinationDocument>(destinationFields)
.ToListAsync();

So this is a classic "autocomplete" feature, there are some known best practices you should follow:
On the client side you should use a debounce feature, this is a most. there is no reason to execute a request for each letter. This is most critical for an autocomplete feature.
On the backend things can get a bit more complicated, naturally you want to be using a db that is suited for this task, specifically MongoDB have a service called Atlas search that is a lucene based text search engine.
This will get you autocomplete support out of the box, however if you don't want to make big changes to your infra here are some suggestions:
Make sure the field your searching on is indexed.
I see your executing 3 separate requests, consider using something like Task.WhenAll to execute all of them at once instead of 1 by 1, I am not sure how the client side is built but if all 3 entities are shown in the same list then ideally you merge the labels into 1 collection so you could paginate the search properly.
As mentioned in #2 you must add server side pagination, no search engine can exist without one. I can't give specifics on how you should implement it as you have 3 separate entities and this could potentially make pagination implementation harder, i'd consider wether or not you need all 3 of these in the same API route.

Related

How can I specify multiple languages when sending a GET request to GitHub search API

I wonder how can I send a GET request to GitHub search API, specifically https://api.github.com/search/repositories and make the query to include several languages instead of one.
Here's my current query.
https://api.github.com/search/repositories?q=stars:%3E=1000+language:scala&sort=stars&order=desc&per_page=10
I have tried doing something like this but it didn't work as well
https://api.github.com/search/repositories?q=stars:%3E=1000+language:[scala, java]&sort=stars&order=desc&per_page=10
Thanks for your help

You need to pass in multiple language: element for being able to pass multiple languages to the query as per the doc.
For your specific case, the query would be :
https://api.github.com/search/repositories?q=stars:%3E=1000+language:scala+language:java&sort=stars&order=desc
with pagination applied it would be :
https://api.github.com/search/repositories?q=stars:%3E=1000+language:scala+language:java&sort=stars&order=desc&per_page=10
However, with pagination applied your search results will be limited in the browser.

Efficiently retrieving all unique records for a facet value along with record counts in Algolia

I'm using Algolia to power search in my app. I have an index called prod_COACHES in which I have some records with an object key called speciality1.
The data structure for speciality1 looks like this:
I have enabled speciality1.itemName as an Algolia 'facet' so that I can filter on it. All good so far and working nicely. Now, in my Algolia dashboard I can see a nice bit of UI that shows me every unique facet (in this case my specialisations) along with the number of records for each facet:
As it happens, I want to show exactly this information on my own UI in my app but I'm not sure how to get this data from Algolia in the most efficient way. I'm using the client side AlgoliaSearch Javascript SDK. How do I run a search to retrieve every unique speciality1.itemName and the number of records for each unique speciality1.itemName so I can build my own UI just like the above?
I have gone through the docs and followed the examples but my question is really about finding the most efficient way to do this from someone who really knows Algolia well, rather than hack my own solution together. Thanks!

It looks like you've enabled attributesForFaceting on the attribute speciality1.itemName. You can retrieve the facet values for the given attribue with the search parameter facets. The Algolia response will now contain a map with value:count. Here is an example with the JavaScript client:
import algoliasearch from 'algoliasearch';
const client = algoliasearch('XXX', 'XXX');
const index = client.initIndex('XXX');
index.search('', {
facets: ['speciality1.itemName']
}).then(result => {
console.log(result.facets)
});
If you want to easily build a search UI, you should take a look at the InstantSearch libraries. It's built on top of Algolia to ease the state/ui management for such UI. Many flavours are available e.g. Vanilla, React.

Best practice for filtering data mongodb

I am looking into best -practices for returning search results. I have a search page that subscribes to a publication that returns a find based on the searched regex query in multiple fields. This gets put into the minimongo collection, on the client.
At this time, the way it is being handled is that facets are being set up from the subscription. My question is if the filtering for the pre-loaded results from the backend should be done client side, or if the query should be sent back.
Example :
Given a collection of fruits, i want to find all that have the color red. The server returns this, but I have facets based on the fruits. So, i have a checkbox for strawberries, apples, cherries, etc. If I click on the checkbox for cherries, should I just be filtering the current minimongo collection, or should I re-query?
Logically, I already have all the needed items in my collection that I could be filtering on, so I am not sure why I would need to hit the back-end. The only time I should hit the backend is if in the search, I type in a new query (such as blue), and the facets get re-done appropriately

If your original search is returning all matching documents then adding criteria on the client can just be done in your minimongo query if the fields on which the additional criteria were returned with the original search.
OTOH if the original search is returning a paginated list or just the top N results or if the required keys weren't included then you want to continue the search on the server.
In a traditional request-response system, you might also want to query the server each time if the underlying data was rapidly changing (ex: a reservations system). With Meteor the reactive nature of pub-sub means that the data on the client is being constantly refreshed with adds/changes/deletions via DDP over WebSocket.

REST: How best to handle large lists

It seems to me that REST has clean and clear semantics for basic CRUD and for listing resources, but I've seen no discussion of how to handle large lists of resources. It doesn't scale to dump an entire database table over the network in a resource-oriented architecture (imagine a customer table with a million customers!), especially if you only need a few items. So it seems that some semantics should exist to filter, map and reduce a list of resources on the server-side.
So, do you know any tried and true ways to do the following kinds of requests in REST:
1) Retrieve just the count of the resources?
I could imagine doing something like GET /api/customer?result=count
Is that how it's usually done?
I could also imagine modifying the URL (/api/count/customer or /api/customer/count, for example), but that seems to either break the continuity of the resource paths or inflict an ugly hack on the expected ID field.
2) Filter the results on the server-side?
I could imagine using query parameters for this, in a context-specific way (such as GET /api/customer?country=US&state=TX).
It seems tricky to do it in a flexible way, especially if you need to join other tables (for example, get customers who purchased in the last 6 months).
I could imagine using the HTTP OPTIONS method to return a JSON string specifying possible filters and their possible values.
Has anyone tried this sort of thing?
If so, how complex did you get (for example, retrieving the items purchased year-to-date by female customers between 18 and 45 years old in Massachussetts, etc.)?
3) Mapping to just get a limited set of fields or to add fields from joined tables?
4) More complicated reductions than count (such as average, sum, etc.)?
EDIT: To clarify, I'm interested in how the request is formulated rather than how to implement it on the server-side.

I think the answer to your question is OData! OData is a generic protocol for querying and interacting with information. OData is based on REST but extends the semantics to include programatic elemements similar to SQL.
OData is not always URL-based only as it use JSON payloads for some scenarios. But it is a standard (OASIS) so it well structured and supported by many APIs.
A few general links:
https://en.wikipedia.org/wiki/Open_Data_Protocol
http://www.odata.org/

The most common ways of handling large data sets in GET requests are (afaict) :
1) Pagination. The request would be something like GET /api/customer?country=US&state=TX&firstResult=0&maxResults=50. This way the client has the freedom to choose the size of the data chunk he needs (this is often useful for UI-based clients).
2) Exposing a size service, so that the client gets to know how large the data set is before actually requesting it. The service would be something like
GET /api/customer/size?country=US&state=TX
Obviously the two can (and imho should) be used together, so that when/if a client (be it mobile or web or whatever) waints to fill its UI with content, he can choose what's the best data chunk size based also on the size of whole data set (e.g. to avoid creating 100 pages for the user to navigate).

simple model when requesting collection and extended model when requesting resource - how

I have the following URI: /articles/:id, where article is a resource on web-service and have associated model/class. Now I need to return only partial data for each resource (to save bandwidth and make for speed) when collection is requested, but when a single item is requested from collection I need to send full data. My question is should I use two models/classes for the same resource on the server and initiate different one depending on collection or single resource is requested? Or maybe there is should be only one model/class but not all fields should be filled with data when a collection is requested? Or maybe there is another approach?

I suggest using the approach suggested here with a fields query parameter.
If the API is going to be open to everyone to use and client usage is going to be unpredictable, then by default you probably need to limit the fields that you return. Just make sure you document in some way all the possible fields that could be used, in case a client actually needs them.
If the API is going to be consumed only by an app or apps you made, then by default you could return all of the fields and then your app can pass that fields parameter to speed things up.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Issue in MongoDB document search - mongodb

Related

How can I specify multiple languages when sending a GET request to GitHub search API

Efficiently retrieving all unique records for a facet value along with record counts in Algolia

Best practice for filtering data mongodb

REST: How best to handle large lists

simple model when requesting collection and extended model when requesting resource - how

Categories

Resources