What is the best way to structure the database for tag-based fetching in a cloud database (MongoDB)?

I am confused about how I should structure my documents to effectively search/fetch items by tag. The structure of each document currently looks like this:
{
  name: "Delicious blackforest cake",
  tags: ["blackforest", "birthday", "designer"]
  ...
}
{
  name: "Red velvet cake",
  tags: ["party", "anniversary", "designer"]
  ...
}
...
There are 32 tags in total, and I want to fetch the cakes based on tags. This is my present structure, which I feel would be inefficient when fetching.
I also want to search based on both the tags and the name of the cake. For example, if I search for "de", the search suggestions should be:
designer cake /* This is based on a tag */
Delicious blackforest cake /* This is based on the actual name */
As far as I know, this is difficult to achieve in Firebase. Should I opt for MongoDB, or should I change the structure of my documents?
I would like suggestions on how to search and fetch effectively for the needs stated above.

Firestore can be used for this use case. The array-contains operator can be used to query documents where the tags array contains a specific value.
await colRef.where("tags", "array-contains", "tag")
If your use case requires finding documents that match multiple tags, then you might have to use a map instead of an array. Check out Firestore search array contains for multiple values.
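For illustration, a hedged sketch of that map-based structure with the Firebase JS SDK (the document shape mirrors the question; colRef is assumed to be the cakes CollectionReference):

{
  name: "Red velvet cake",
  tags: { party: true, anniversary: true, designer: true }
}

// equality filters on map keys can be chained, unlike multiple
// array-contains clauses against the same array
const snapshot = await colRef
  .where("tags.party", "==", true)
  .where("tags.designer", "==", true)
  .get()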
MongoDB has a $all operator that can be used for this as shown below:
await collection.find({ tags: { $all: ["tag"] } })
For full-text search, you'll have to use a search service, as also mentioned in the documentation, for best results. Although MongoDB has a $search operator (which uses Apache Lucene, as far as I know), it can be used only when you host your database on Atlas; otherwise you'll have to rely on the $text operator.
The Firestore Algolia Extension should do most of the work for you and lets you use all of Algolia's full-text search capabilities.
Additionally, if you use Algolia with Firestore, you get even better support for filtering by tags, so you won't have to use a map instead of an array as mentioned earlier.
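To make the MongoDB side concrete, here is a minimal sketch with the Node.js driver (the collection variable and field names are assumptions based on the question, and the /^de/i prefix regex is only an illustration; note that only a case-sensitive anchored regex fully uses the name index):

// multikey index over the tags array, plus an index on name
await collection.createIndex({ tags: 1 })
await collection.createIndex({ name: 1 })

// fetch cakes carrying all of the requested tags
const cakes = await collection
  .find({ tags: { $all: ["designer", "birthday"] } })
  .toArray()

// naive suggestions: prefix-match the typed text against both name and tags
const prefix = /^de/i
const suggestions = await collection
  .find({ $or: [{ name: prefix }, { tags: prefix }] })
  .limit(5)
  .toArray()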

Related

Implementing Mongo Atlas Search on a Document that has dynamic properties

I am having trouble implementing Atlas Search on a document collection where the documents have no fixed field names. All the field names are highly dynamic.
For Example, A user may create a document with the following fields
{
  name: string,
  description: string
}
Another user may create a document with the following fields
{
  company: string,
  username: string
}
This is happening because we provide a feature that lets users create their own records, so the fields are dynamic depending on their needs. Now we need to provide full-text search support on these documents, but we are struggling to create a search index because the paths are dynamic.
Is there any way in Mongo Atlas Search to accomplish this?
Yes, when you define the collection's field mapping, you want to define it as dynamic:
You can configure Atlas Search to automatically index all the supported field types in the collection using dynamic mappings.
There are some limitations to doing this, but it does not sound like they will affect you.
You can then execute a wildcard field search, as sketched below.
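As a minimal sketch (the collection name records is an assumption, and the index is assumed to be the default one), the index definition with dynamic mappings is just:

{
  "mappings": { "dynamic": true }
}

and a text search over a wildcard path then matches against whatever fields each document happens to have:

db.records.aggregate([
  {
    $search: {
      text: {
        query: "acme",              // the user's search term (placeholder)
        path: { wildcard: "*" }     // match any dynamically indexed field
      }
    }
  }
])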

MongoDB search via index of documents containing JSON

Say I have objects in a MongoDB collection:
{
  ...
  "json" : "{\"things\":[2494090781803658355,5114030115038563045,3035856943768375362,8931213615561493991,7574631742057150605,480863244020297489]}"
}
It's an Azure "MongoDB", so it doesn't support all the features, but suppose it does.
This search will find that document:
db.coll.find({"json" : {$regex : "5114030115038563045|8931213615561493991"}})
Of course, it's scanning the whole collection to pull these records out. What's an efficient/faster way to find documents where the list of "things" contains any of a list of "things" in a query? It seems like throwing a search engine like Solr or ElasticSearch at this would solve it, and perhaps using Azure's Data Lake storage would make this more searchable, so I'm considering those options. They're outside the scope of this question, though; I'd like to know if there's a Mongo-ish way to search this collection by index.
The only option you have available, if you're storing a JSON string, is a text index with the $text operator.
If this document structure isn't set in stone, however, you might consider also storing the parsed JSON as a nested subdocument (with the appropriate sanitation, of course). This would allow you to construct an index on json.things while still keeping the JSON string, and to perform a query on e.g. "json.things": {$in: ["5114030115038563045", "8931213615561493991"]}
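A minimal sketch of that restructuring in the mongo shell (the field names jsonRaw and json are assumptions; the IDs are stored as strings because they exceed what a double can represent exactly):

// keep the raw string if you still need it, and store the parsed
// form as a nested subdocument so its fields can be indexed
db.coll.insertOne({
  jsonRaw: "{\"things\":[\"2494090781803658355\",\"5114030115038563045\"]}",
  json: { things: ["2494090781803658355", "5114030115038563045"] }
})

// multikey index over the array inside the subdocument
db.coll.createIndex({ "json.things": 1 })

// indexed membership query instead of a collection-scanning $regex
db.coll.find({ "json.things": { $in: ["5114030115038563045", "8931213615561493991"] } })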

MongoDB fulltext search + workaround for partial word match

Since it is not possible to find "blueberry" by searching for the word "blue" with a MongoDB full text search, I want to help my users complete the word "blue" to "blueberry". To do so, is it possible to query all the words in a MongoDB full text index, so that I can use them as suggestions, e.g. for typeahead.js?
Language stemming in text search uses an algorithm to try to relate words derived from a common base (e.g. "running" should match "run"). This is different from the prefix matching (e.g. "blue" matching "blueberry") that you want to implement for an autocomplete feature.
To most effectively use typeahead.js with MongoDB text search I would suggest focusing on the prefetch support in typeahead:
Create a keywords collection which holds the common words (perhaps with a usage frequency count) used in your collection. You could build this collection by running a Map/Reduce across the collection that has the text search index, and keep the word list up to date using a periodic incremental Map/Reduce as new documents are added.
Have your application generate a JSON document from the keywords collection with the unique keywords (perhaps limited to "popular" keywords based on word frequency to keep the list manageable/relevant).
You can then use the generated keywords JSON for client-side autocomplete with typeahead's prefetch feature:
$('.mysearch .typeahead').typeahead({
  name: 'mysearch',
  prefetch: '/data/keywords.json'
});
typeahead.js will cache the prefetch JSON data in localStorage for client-side searches. When the search form is submitted, your application can use the server-side MongoDB text search to return the full results in relevance order.
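As a sketch of the keyword-building step, here is an aggregation-pipeline variant of the Map/Reduce idea described above (the collection name products and the name field are assumptions; $merge needs MongoDB 4.2+):

// tokenise the indexed text field, count word frequency,
// and write the result into a keywords collection
db.products.aggregate([
  { $project: { words: { $split: [{ $toLower: "$name" }, " "] } } },
  { $unwind: "$words" },
  { $group: { _id: "$words", count: { $sum: 1 } } },
  { $merge: { into: "keywords" } }
])

Your application can then dump the most frequent _id values from keywords into /data/keywords.json for the prefetch above.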
A simple workaround I am using right now is to break the text into individual characters and store them as a text-indexed array.
Then, when you run the $search query, you simply break the query up into characters again.
Please note that this only works for short strings (say, shorter than 32 characters); otherwise the index building process will take very long, and performance will drop significantly when inserting new records.
You cannot query for all the words in the index, but you can of course query the original document's fields. The words in the search index are also not always full words; they are stemmed. So you probably wouldn't find "blueberry" in the index, but rather "blueberri".
Don't know if this might be useful to some new people facing this problem.
Depending on the size of your collection and how much RAM you have available, you can search with $regex by creating the proper index. E.g.:
db.collection.find({ query: { $regex: /^querywords/ } }).sort({ criteria: -1 }).limit(limit)
You would need an index as follows:
db.collection.createIndex({ query: 1, criteria: -1 })
This could be really fast if you have enough memory. Note that only a case-sensitive, prefix-anchored regex like the one above can use the index efficiently; an unanchored pattern still has to scan all of the index keys.
Hope this helps.
For those who have not yet started implementing any database architecture and are here for a solution, go for Elasticsearch. It's a JSON-document-driven database, structurally similar to MongoDB. It has an "edge n-gram" analyzer which is really efficient and quick at giving you a "did you mean" for misspelled searches, and you can also search on partial words.
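For reference, a hedged sketch of such an edge n-gram setup using the official Elasticsearch JavaScript client (the index name cakes and the name field are assumptions; syntax per Elasticsearch 7+):

const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

// index words as their prefixes ("bl", "blu", "blue", ...) at index
// time, but analyze queries with the standard analyzer so a short
// search term matches those stored prefixes
await client.indices.create({
  index: 'cakes',
  body: {
    settings: {
      analysis: {
        tokenizer: {
          autocomplete_tokenizer: {
            type: 'edge_ngram',
            min_gram: 2,
            max_gram: 10,
            token_chars: ['letter', 'digit']
          }
        },
        analyzer: {
          autocomplete: {
            tokenizer: 'autocomplete_tokenizer',
            filter: ['lowercase']
          }
        }
      }
    },
    mappings: {
      properties: {
        name: {
          type: 'text',
          analyzer: 'autocomplete',
          search_analyzer: 'standard'
        }
      }
    }
  }
});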

Confusion regarding MongoDB schema. How to make it better?

I am using mongoose with node.js for this.
My current Schema is this:
var mongoose = require('mongoose');
var Schema = mongoose.Schema;

var linkSchema = new Schema({
  text: String,
  tags: [String], // an array of strings
  body: String,
  user: String
});
My use case is this: there is a list of users, and each user has a list of links associated with it. Users and links are different schemas, of course. How does one model that sort of one-to-one relationship in MongoDB?
Should I make a User Schema and embed linkSchema in it? Or the other way around?
Another doubt regarding that: tags would always be an array of strings, which I can use to browse through links later. Should it be an array data type, or is there a better way to represent it?
If it's 1:1 then nest one document inside the other. Which way around depends on the queries, but you could easily do both if you need to.
For tags, you can index an array field and use that for searching/filtering documents, and from the information you've given, that sounds reasonable IMHO.
If you had a fixed set of tags, it might make sense to represent those as a nested object with named fields, depending on your queries. Don't forget that you can not only create nested documents in Mongo, but also search on sub-fields and even use entire nested documents as searchable/indexable fields. For instance, you could store an email like this:
email: "joe@somewhere.com"
as a string, or you could do:
email: {
  user: "joe",
  domain: "somewhere.com"
}
You could index email in both cases and use either for matching. In the latter case, though, you could also search on domain or user only, without resorting to RegEx-style queries. You could also store both variants, so there are lots of flexible options in Mongo.
Going back to tags, I think your array of strings is a fine model given what you've described. If you were doing more complex bulk aggregation, though, it wouldn't be crazy to store a document for every tag with the same document contents, since that's essentially what you'd have to do for every query during aggregation.
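A minimal sketch of the embedded variant with mongoose (the model and field names are assumptions based on the question):

var mongoose = require('mongoose');
var Schema = mongoose.Schema;

var linkSchema = new Schema({
  text: String,
  tags: [String],
  body: String
});

var userSchema = new Schema({
  name: String,
  links: [linkSchema] // each user embeds its list of links
});

// multikey index over the nested tag arrays
userSchema.index({ 'links.tags': 1 });

var User = mongoose.model('User', userSchema);

// find users that have at least one link tagged "designer"
User.find({ 'links.tags': 'designer' }, function (err, users) {
  console.log(err || users);
});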

mongodb- indexes on list fields for $all queries

I am making an application using the pymongo wrapper, for which my schema looks like:
{
  _id: <some_id>,
  name: <some_name>,
  my_tags: [<list_of_tags>]
}
Now I want to return those entries which fall under the user-specified tags. For example,
I want the entries where my_tags contains at least ["college", "USA", "engineering"]. For that, I read that the $all construct can be used. Now what I want to know is: would it be of any use to make an index on my_tags? For my app, this type of query is used extensively.
would it be of any use to make an index on my_tags? For my app, this type of query is used extensively.
Yes, $all will use an index, so it is still good to make one there. However, there are still optimisations that can be done for it: https://jira.mongodb.org/browse/SERVER-5331 and https://jira.mongodb.org/browse/SERVER-1000
Normally the docs will only warn you when something cannot use an index.
The syntax for the $all query is:
db.collection.find({'my_tags': {'$all': ['college', 'USA', 'engineering']}})
The documentation can be found at:
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24all
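To confirm the index is in place and actually used, a quick sketch in the mongo shell (the collection name entries is an assumption):

// multikey index over the tag array
db.entries.createIndex({ my_tags: 1 })

// check that the $all query walks the index rather than scanning the collection
db.entries.find({ my_tags: { $all: ['college', 'USA', 'engineering'] } }).explain('executionStats')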