I am new to MongoDB.
I have created a collection in MongoDB and stored documents with the following structure:
Q1UsefulStatementsList: [{
  Q1UsefulStatement: "Useful Sentence",
  Q1ActionsList: [{
    Q1Verb: "Verb in the sentence",
    Q1NP: "The Noun phrase",
    Q1PP: "The Preposition phrase"
  }]
}],
Q2UsefulStatementsList: [{
  Q2UsefulStatement: "Useful Sentence",
  Q2ActionsList: [{
    Q2Verb: "Verb in the Sentence",
    Q2NP: "The Noun phrase",
    Q2PP: "The preposition Phrase"
  }]
}]
I need to loop through this collection and get all the verbs from Q1UsefulStatementsList and Q2UsefulStatementsList.
Example:
Q1UsefulStatementsList: [{
  Q1UsefulStatement: "My dog also likes eating sausage",
  Q1ActionsList: [{
    Q1Verb: "likes",
    Q1NP: "My dog",
    Q1PP: "n / a"
  }]
}, {
  Q1UsefulStatement: "The disabling of log helps",
  Q1ActionsList: [{
    Q1Verb: "disabling",
    Q1NP: "disabling of logs",
    Q1PP: "of"
  }]
}],
Q2UsefulStatementsList: [{
  Q2UsefulStatement: "Log analysis failed",
  Q2ActionsList: [{
    Q2Verb: "failed",
    Q2NP: "Log analysis",
    Q2PP: "n / a"
  }]
}]
I would like to get 'likes' and 'disabling' as output when I run through Q1UsefulStatementsList.
I have tried it using the code below, but is there an easier way to do this sort of thing in MongoDB?
I tried using the dot operator (e.g. Q1UsefulStatementsList.Q1UsefulStatement), but it gives me an entire BSON (JSON) object. What I need is the individual values directly.
Please suggest an easier way if there is one.
The Java code that I have written to extract the values:
// Assumes `object` is a DBObject read from the collection;
// BasicDBList and DBObject are types from the legacy com.mongodb driver package.
if (object.get("Q1UsefulStatementsList") != null) {
    BasicDBList qUseStatementList = (BasicDBList) object.get("Q1UsefulStatementsList");
    // Each element is one { Q1UsefulStatement, Q1ActionsList } sub-document
    for (Object qUsefulStatement : qUseStatementList) {
        DBObject tmp = (DBObject) qUsefulStatement;
        if (tmp.get("Q1ActionsList") != null) {
            BasicDBList qActionsList = (BasicDBList) tmp.get("Q1ActionsList");
            // Each action holds the verb, noun phrase and preposition phrase
            for (Object qVerbs : qActionsList) {
                DBObject tmpQVerbs = (DBObject) qVerbs;
                String verb = tmpQVerbs.get("Q1Verb").toString();
                String nP = tmpQVerbs.get("Q1NP").toString();
                String pP = tmpQVerbs.get("Q1PP").toString();
            }
        }
    }
}
You can use the aggregation framework to get the verbs you want, with the following code:
db.myObject.aggregate([
  { $unwind: "$Q1UsefulStatementsList" },
  { $unwind: "$Q1UsefulStatementsList.Q1ActionsList" },
  { $group: { _id: "$_id", verbs: { $addToSet: "$Q1UsefulStatementsList.Q1ActionsList.Q1Verb" } } }
]);
This returns results like the following:
"result" : [
{
"_id" : ObjectId("5253e46ae3a2c44e082642c9"),
"verbs" : ["disabling", "likes"]
}
]
By looping over the result array, you can extract each verbs array and add its elements to a Set to get the unique verbs across all documents.
You can easily convert this to Java code.
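To make the pipeline's logic concrete, here is a minimal in-memory Python sketch in which plain dicts stand in for BSON documents (no MongoDB connection is involved). `collect_verbs` mimics the two $unwind stages plus $addToSet for one document, and `unique_verbs` performs the cross-document Set step. Both helper names are hypothetical, and the `prefix` parameter is an assumption that lets the same code also cover Q2UsefulStatementsList.

```python
from typing import Any, Iterable


def collect_verbs(doc: dict[str, Any], prefix: str) -> list[str]:
    """Mimic $unwind (statements) -> $unwind (actions) -> $addToSet (verb) for one document."""
    verbs: list[str] = []
    for statement in doc.get(f"{prefix}UsefulStatementsList", []):
        for action in statement.get(f"{prefix}ActionsList", []):
            verb = action.get(f"{prefix}Verb")
            # Like $addToSet, keep each verb at most once per document
            if verb is not None and verb not in verbs:
                verbs.append(verb)
    return verbs


def unique_verbs(docs: Iterable[dict[str, Any]], prefixes: tuple[str, ...] = ("Q1", "Q2")) -> set[str]:
    """Merge the per-document verb lists into a single set of unique verbs."""
    result: set[str] = set()
    for doc in docs:
        for prefix in prefixes:
            result.update(collect_verbs(doc, prefix))
    return result


# Sample document mirroring the example in the question
example = {
    "Q1UsefulStatementsList": [
        {"Q1UsefulStatement": "My dog also likes eating sausage",
         "Q1ActionsList": [{"Q1Verb": "likes", "Q1NP": "My dog", "Q1PP": "n / a"}]},
        {"Q1UsefulStatement": "The disabling of log helps",
         "Q1ActionsList": [{"Q1Verb": "disabling", "Q1NP": "disabling of logs", "Q1PP": "of"}]},
    ],
    "Q2UsefulStatementsList": [
        {"Q2UsefulStatement": "Log analysis failed",
         "Q2ActionsList": [{"Q2Verb": "failed", "Q2NP": "Log analysis", "Q2PP": "n / a"}]},
    ],
}

print(collect_verbs(example, "Q1"))  # ['likes', 'disabling']
```

Unlike the server, which makes no ordering guarantee for $addToSet (the sample result lists "disabling" before "likes"), this sketch keeps the first-seen order.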
The MongoDB Atlas Search documentation says the following about the autocomplete operator:
query: String or strings to search for. If there are multiple terms in
a string, Atlas Search also looks for a match for each term in the
string separately.
For the text operator, the same thing applies:
query: The string or strings to search for. If there are multiple
terms in a string, Atlas Search also looks for a match for each term
in the string separately.
Matching each term separately seems like odd behaviour to me. We need several searches in our app, and for each one we expect fewer results the more words you type, not more.
Example: When searching for "John Doe", I expect only results with both "John" and "Doe". Currently, I get results that match either "John" or "Doe".
Is this not possible using MongoDB Atlas Search, or am I doing something wrong?
Update
Currently, I have solved it by splitting the search term on spaces (' ') and adding each keyword to a separate must sub-clause (with the compound operator). However, the query then returns no results at all if one of the keywords is a single character. To account for that, I separate single-character keywords from multi-character ones.
The snippet below works, but it requires two generated fields on each document:
searchString: a string with all the searchable fields concatenated, e.g. "John Doe Man Streetstreet Citycity"
searchArray: the above string upper-cased and split on spaces (' ') into an array
const must = [];
const searchTerms = 'John D'.split(' ');
for (let i = 0; i < searchTerms.length; i += 1) {
if (searchTerms[i].length === 1) {
must.push({
regex: {
path: 'searchArray',
query: `${searchTerms[i].toUpperCase()}.*`,
},
});
} else if (searchTerms[i].length > 1) {
must.push({
autocomplete: {
query: searchTerms[i],
path: 'searchString',
fuzzy: {
maxEdits: 1,
prefixLength: 4,
maxExpansions: 20,
},
},
});
}
}
db.getCollection('someCollection').aggregate([
{
$search: {
compound: { must },
},
},
]).toArray();
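The same clause-building logic, transcribed to Python as a minimal sketch: `build_must_clauses` is a hypothetical helper, and the resulting `pipeline` list is what you would pass to PyMongo's `Collection.aggregate` (not executed here). Like the JS version, it does not escape regex metacharacters in single-character keywords.

```python
from typing import Any


def build_must_clauses(search: str) -> list[dict[str, Any]]:
    """Turn a search string into one compound.must clause per keyword."""
    must: list[dict[str, Any]] = []
    for term in search.split(" "):
        if len(term) == 1:
            # Single character: prefix regex against the upper-cased keyword array
            must.append({"regex": {"path": "searchArray", "query": f"{term.upper()}.*"}})
        elif len(term) > 1:
            # Longer keyword: fuzzy autocomplete against the concatenated string
            must.append({
                "autocomplete": {
                    "query": term,
                    "path": "searchString",
                    "fuzzy": {"maxEdits": 1, "prefixLength": 4, "maxExpansions": 20},
                }
            })
        # Empty strings (from repeated spaces) are skipped, as in the JS loop
    return must


pipeline = [{"$search": {"compound": {"must": build_must_clauses("John D")}}}]
```

Because every keyword becomes its own clause inside must, a document has to satisfy all of them, which is what gives the AND behaviour.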
Update 2 - Full example of unexpected behaviour
Create a collection with the following documents:
db.getCollection('testing').insertMany([{
"searchString": "John Doe ExtraTextHere"
}, {
"searchString": "Jane Doe OtherName"
}, {
"searchString": "Doem Sarah Thisistestdata"
}])
Create a search index named 'default' on this collection:
{
"mappings": {
"dynamic": false,
"fields": {
"searchString": {
"type": "autocomplete"
}
}
}
}
Run the following query:
db.getCollection('testing').aggregate([
{
$search: {
autocomplete: {
query: "John Doe",
path: 'searchString',
fuzzy: {
maxEdits: 1,
prefixLength: 4,
maxExpansions: 20,
},
},
},
},
]).toArray();
When a user searches for "John Doe", this query returns all documents that contain either "John" OR "Doe" in the searchString path. In this example, that means all 3 documents. The more words the user types, the more results are returned. This is not the behaviour I expect: more words should match fewer results, because the search term gets more precise.
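The gap between the observed and the expected behaviour can be reproduced with a rough in-memory Python model. It is an assumption-based approximation: a term "matches" when some whitespace-separated word starts with it (ignoring case), and neither fuzzy matching nor the real analyzer is modelled. `search_any` mirrors the observed OR behaviour and `search_all` the expected AND behaviour; both names are hypothetical.

```python
docs = [
    "John Doe ExtraTextHere",
    "Jane Doe OtherName",
    "Doem Sarah Thisistestdata",
]


def matches_term(text: str, term: str) -> bool:
    """Rough stand-in for autocomplete: some word in text starts with term (case-insensitive)."""
    return any(word.lower().startswith(term.lower()) for word in text.split())


def search_any(texts: list[str], query: str) -> list[str]:
    """Observed behaviour: a document matches if ANY query term matches."""
    terms = query.split()
    return [t for t in texts if any(matches_term(t, term) for term in terms)]


def search_all(texts: list[str], query: str) -> list[str]:
    """Expected behaviour: a document matches only if ALL query terms match."""
    terms = query.split()
    return [t for t in texts if all(matches_term(t, term) for term in terms)]
```

With this model, "John" matches one document, while "John Doe" matches all three under `search_any` ("Doem" also starts with "doe") but only the first under `search_all`.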
An edgeGram tokenization strategy might be better for your use case because it works left-to-right.
Try this index definition, taken from the docs:
{
"mappings": {
"dynamic": false,
"fields": {
"searchString": [
{
"type": "autocomplete",
"tokenization": "edgeGram",
"minGrams": 3,
"maxGrams": 10,
"foldDiacritics": true
}
]
}
}
}
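To illustrate roughly what left-to-right edgeGram tokens look like with minGrams 3 and maxGrams 10, here is a simplified Python sketch. It assumes lower-casing plus whitespace splitting in place of the index's real analyzer and does not model foldDiacritics; `edge_grams` is a hypothetical name.

```python
def edge_grams(text: str, min_grams: int = 3, max_grams: int = 10) -> list[str]:
    """Prefixes of each word, from min_grams up to max_grams characters long."""
    grams: list[str] = []
    for word in text.lower().split():
        # Words shorter than min_grams contribute nothing
        for n in range(min_grams, min(max_grams, len(word)) + 1):
            grams.append(word[:n])
    return grams


print(edge_grams("John Doe"))  # ['joh', 'john', 'doe']
```

One consequence visible in the sketch: a word shorter than minGrams produces no grams at all, which is worth keeping in mind alongside the single-character keyword problem described in the question.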
Also, change your query clause from must to filter. That will exclude the documents that do not contain all the tokens.
If the data in my MongoDB collection looks like this:
{
  "category": "Pop Culture",
  "subcategory": "Film Directors",
  "name": "Quentin Tarantino"
}
is there any way I can find this object using only part of the name?
For example:
db.data.find_one({'name': 'quentin'})
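For string data, you can use the $regex operator to match part of a field's value. Because 'quentin' is lower-case and the stored name is "Quentin Tarantino", add the "i" option to make the match case-insensitive.
Example:
db.collection.findOne({
  name: {
    $regex: "quentin",
    $options: "i"
  }
})
See the $regex documentation for more information.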
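Since the question uses the Python-style `find_one`, here is a minimal sketch of building the same case-insensitive filter as a Python dict, which could be passed to PyMongo's `db.data.find_one(filt)`. `partial_name_filter` is a hypothetical helper; using `re.escape` so that user-typed text is matched literally is an assumption added here, and Python's escaping is only expected to be compatible with MongoDB's regex engine for ordinary punctuation.

```python
import re
from typing import Any


def partial_name_filter(fragment: str) -> dict[str, Any]:
    """Case-insensitive 'contains' filter on name; re.escape makes the text match literally."""
    return {"name": {"$regex": re.escape(fragment), "$options": "i"}}


doc = {"category": "Pop Culture", "subcategory": "Film Directors", "name": "Quentin Tarantino"}
filt = partial_name_filter("quentin")

# Local check of what the regex does against this document's name
matched = re.search(filt["name"]["$regex"], doc["name"], re.IGNORECASE) is not None
```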
After searching Google and SO for a week, I've ended up asking the question here. Suppose there are two collections:
UsersCollection:
[
  {...
    name: "James",
    userregex: "a|regex|str|here"
  },
  {...
    name: "James",
    userregex: "another|regex|string|there"
  },
  ...
]
PostCollection:
[
{...
title:"a string here ..."
},
{...
title: "another string here ..."
},
...
]
I need to get all users whose userregex matches any post.title (I need user_id/post_id pairs or something similar).
What I've tried so far:
1. Get all users in the collection and run each regex against all posts. This works but is too dirty: it has to execute a query for each user.
2. Same as above, but using forEach in a Mongo query. It is the same approach, only in the database layer instead of the application layer.
I searched a lot for available methods such as aggregation, $unwind, etc., with no luck.
So is it possible to do this in Mongo? Should I change my database type? If yes, what type would be good? Performance is my first priority. Thanks.
It is not possible to reference a regex stored in a document field from the $regex operator inside a $match expression, so it can't be done on the MongoDB side with the current structure.
$lookup works well with equality conditions. So one alternative (similar to what Nic suggested) is to update your posts collection to include an extra field called keywords (an array of keyword values the title can be searched on) for each title.
db.users.aggregate([
{$lookup: {
from: "posts",
localField: "userregex",
foreignField: "keywords",
as: "posts"
}
}
])
The query above effectively does the following for each user (works from 3.4):
keywords: { $in: [ userregex.elem1, userregex.elem2, ... ] }
From the docs:
If the field holds an array, then the $in operator selects the
documents whose field holds an array that contains at least one
element that matches a value in the specified array (e.g. <value1>,
<value2>, etc.)
Earlier versions (tested on 3.2) seem to match only when the arrays have the same values, in the same order, and the same length.
Sample Input:
Users
db.users.insertMany([
{
"name": "James",
"userregex": [
"another",
"here"
]
},
{
"name": "John",
"userregex": [
"another",
"string"
]
}
])
Posts
db.posts.insertMany([
{
"title": "a string here",
"keywords": [
"here"
]
},
{
"title": "another string here",
"keywords": [
"another",
"here"
]
},
{
"title": "one string here",
"keywords": [
"string"
]
}
])
Sample Output:
[
{
"name": "James",
"userregex": [
"another",
"here"
],
"posts": [
{
"title": "another string here",
"keywords": [
"another",
"here"
]
},
{
"title": "a string here",
"keywords": [
"here"
]
}
]
},
{
"name": "John",
"userregex": [
"another",
"string"
],
"posts": [
{
"title": "another string here",
"keywords": [
"another",
"here"
]
},
{
"title": "one string here",
"keywords": [
"string"
]
}
]
}
]
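The matching rule behind the sample output can be checked with a small in-memory Python model of this $lookup (the 3.4+ behaviour with an array localField described above). It is a sketch only: `lookup_by_keywords` is a hypothetical helper, every sample post is given a `keywords` array, and the order of posts inside each result is not meant to reproduce the server's order.

```python
from typing import Any

users = [
    {"name": "James", "userregex": ["another", "here"]},
    {"name": "John", "userregex": ["another", "string"]},
]
posts = [
    {"title": "a string here", "keywords": ["here"]},
    {"title": "another string here", "keywords": ["another", "here"]},
    {"title": "one string here", "keywords": ["string"]},
]


def lookup_by_keywords(users: list[dict[str, Any]], posts: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Attach to each user every post whose keywords share at least one value with userregex."""
    joined = []
    for user in users:
        wanted = set(user["userregex"])
        # Same test as keywords: {$in: userregex}: one common element is enough
        matched = [post for post in posts if wanted & set(post.get("keywords", []))]
        joined.append({**user, "posts": matched})
    return joined


result = lookup_by_keywords(users, posts)
```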
MongoDB is fine for your use case, but you need an approach different from your current one. Since you only care whether a user's regex matches any post title, you can store the result of the last such match. Below is example code:
db.users.find({last_post_id: {$exists: 0}}).forEach(
function(row) {
var regex = new RegExp(row['userregex']);
var found = db.post_collection.findOne({title: regex});
if (found) {
var post_id = found["post_id"];
db.users.updateOne({
user_id: row["user_id"]
}, {
$set :{ last_post_id: post_id}
});
}
}
)
It processes only users that don't have last_post_id set, searches the posts for a title matching each user's regex, and sets last_post_id when one is found. After running this, you can return the results like this:
db.users.find({last_post_id: {$exists: 1}}, {user_id:1, last_post_id:1, _id:0})
The only thing you need to worry about is an edit or delete of an existing post. After every edit or delete, run the following so that all matches for that post id are recomputed:
var post_id_changed = 1; // id of the edited or deleted post
db.users.updateMany({last_post_id: post_id_changed}, {$unset: {last_post_id: 1}})
This ensures that the next time you run the update, these users are processed again. The approach has one drawback: for every user without a matching title, the query runs again on every pass. You can work around that with timestamps or a post-count check.
Also, make sure to put an index on post_collection.title.
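The cache-and-invalidate cycle can be modelled in memory to see which users get reprocessed. This Python sketch is a stand-in for the shell code under several assumptions: users and posts are plain lists, `assign_matches` and `invalidate` are hypothetical names, Python's `re` stands in for JavaScript's `RegExp` (they agree on simple alternations like these), and "first match" follows list order rather than the server's natural order. User 3 illustrates the drawback: it has no match, so it is scanned again on every pass.

```python
import re
from typing import Any

users: list[dict[str, Any]] = [
    {"user_id": 1, "userregex": "a|regex|str|here"},
    {"user_id": 2, "userregex": "another|regex|string|there"},
    {"user_id": 3, "userregex": "nothing|matches"},
]
posts = [
    {"post_id": 10, "title": "a string here ..."},
    {"post_id": 11, "title": "another string here ..."},
]


def assign_matches(users: list[dict[str, Any]], posts: list[dict[str, Any]]) -> None:
    """Cache the first matching post id on every user that has no cached match yet."""
    for user in users:
        if "last_post_id" in user:  # like {last_post_id: {$exists: 0}}
            continue
        pattern = re.compile(user["userregex"])
        # Like findOne({title: regex}): first post whose title contains a match
        found = next((p for p in posts if pattern.search(p["title"])), None)
        if found is not None:
            user["last_post_id"] = found["post_id"]


def invalidate(users: list[dict[str, Any]], changed_post_id: int) -> None:
    """Drop cached matches for an edited or deleted post, like the updateMany/$unset step."""
    for user in users:
        if user.get("last_post_id") == changed_post_id:
            del user["last_post_id"]


assign_matches(users, posts)
```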
I was thinking that if you pre-tokenized your post titles like this:
{
"_id": ...
"title": "Another string there",
"keywords": [
"another",
"string",
"there"
]
}
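A keywords array like the one above could be derived from each title with something as simple as this Python sketch. `title_keywords` is a hypothetical helper; lower-casing, splitting on `\w+` runs and dropping duplicates (keeping first-seen order) are assumptions rather than anything the answer prescribes.

```python
import re


def title_keywords(title: str) -> list[str]:
    """Lower-cased word tokens of a title, without duplicates, in first-seen order."""
    keywords: list[str] = []
    for word in re.findall(r"\w+", title.lower()):
        if word not in keywords:
            keywords.append(word)
    return keywords


print(title_keywords("Another string there"))  # ['another', 'string', 'there']
```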
But unfortunately, $lookup requires foreignField to be a single element, so an idea like the following will not work :( Maybe it will give you another idea, though.
db.Post.aggregate([
{$lookup: {
from: "Users",
localField: "keywords",
foreignField: "keywords",
as: "users"
}
},
])
I've been using Bloodhound with the prefetch option defined.
This works fine, except that when I add content to the JSON file being prefetched, it is not available as a search result unless I restart the browser.
So I am trying to make the search results reflect the updated file content in 'real time'.
I tried simply replacing prefetch with remote but this causes the search functionality not to work as intended (it shows non-matched results).
Below is the code I am using with prefetch.
Version info: typeahead.bundle.min.js at v0.10.5.
function searchFunction() {
var template =
"<p class=\"class_one\">{{area}}</p><p class=\"class_two\">{{title}}</p><p class=\"class_three\">{{description}}</p>";
var compiled_template = Hogan.compile(template);
var dataSource = new Bloodhound({
datumTokenizer: function(d) {
return Bloodhound.tokenizers.whitespace(d.tokens.join(
' '));
},
queryTokenizer: Bloodhound.tokenizers.whitespace,
prefetch: '/static/my_file.json'
// remote: '/search'
});
dataSource.initialize();
$('.my_lookup .typeahead').typeahead({}, {
source: dataSource.ttAdapter(),
name: 'courses',
displayKey: 'title',
templates: {
suggestion: compiled_template.render.bind(
compiled_template)
}
}).focus().on('typeahead:selected', function(event, selection) {
var title = selection.title;
// do things with the title variable
});
}
Edit:
I started thinking that perhaps I need some server-side logic to search a database that holds the content previously stored in the local JSON file.
Using the code posted below, the following works:
Searches database in real time.
All matches are returned.
The following does not work:
It does not offer suggestions; you have to type the full token name.
When searching for apple, it searches after typing a, then p, and so on. If it doesn't get any results, Firebug shows this error: TypeError: data is null. After a few of these errors, it stops triggering searches and no error is displayed.
Also, the results from the database are in the following format, and I don't know how to apply the Hogan template for the suggestions to each result:
{
"matches": [{
"tokens": ["apple", "orange"],
"area": "Nautical",
"_id": {
"$oid": "4793765242f9d1337be3d538"
},
"title": "Boats",
"description": "Here is a description"
}, {
"tokens": ["apple", "pineapple"],
"area": "Aviation",
"_id": {
"$oid": "4793765242f9d1337be3d539"
},
"title": "Planes",
"description": "Here is a description."
}]
}
JS
function searchFunction() {
var engine = new Bloodhound({
remote: {
url: '/search?q=%QUERY%',
wildcard: '%QUERY%'
},
datumTokenizer: Bloodhound.tokenizers.whitespace('q'),
queryTokenizer: Bloodhound.tokenizers.whitespace,
});
engine.initialize();
$('.my_lookup .typeahead').typeahead({
}, {
source: engine.ttAdapter(),
name: 'courses',
displayKey: 'title',
templates: {
suggestion: function (data) {
return "// not sure how to apply markup to each match"
}
}
}).focus().on('typeahead:selected', function(event, selection) {
console.log(selection);
var title = "// again not sure how to access individual match data"
// do things with the title variable
});
}
MongoDB Schema
Database: courses
Collection: courses
Documents:
{
"_id" : ObjectId("4793765242f9d1337be3d538"),
"tokens" : [
"apple",
"orange"
],
"area" : "Nautical",
"title" : "Boats",
"description" : "Here is a description."
}
and:
{
"_id" : ObjectId("4793765242f9d1337be3d539"),
"tokens" : [
"apple",
"pineapple"
],
"area" : "Aviation",
"title" : "Planes",
"description" : "Here is a description."
}
etc
Python (using Bottle routes)
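from bottle import route, request, response
from bson.json_util import dumps
from pymongo import MongoClient

# Assumption: a local mongod on the default host and port
connection = MongoClient()


@route('/search')
def search():
    """
    Query courses database for matches in tokens field.
    """
    # get the query
    query = request.GET.q
    # define the database
    dbname = 'courses'
    db = connection[dbname]
    # define the collection
    collection = db.courses
    # make the query (exact match against any element of the tokens array)
    matches = collection.find({"tokens": query})
    # send back results as JSON
    results = {}
    results['matches'] = matches
    response.content_type = 'application/json'
    return dumps(results)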
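The route above only finds documents whose tokens array contains the exact query string, which fits the "you have to type the full token name" symptom. A minimal sketch of a prefix-matching alternative, relying on MongoDB applying a regex on an array field to each element: `token_prefix_filter` builds a filter that could be passed to `collection.find(...)` in place of `{"tokens": query}`, and `match_locally` applies the same rule to plain dicts. Both helper names are hypothetical.

```python
import re
from typing import Any


def token_prefix_filter(q: str) -> dict[str, Any]:
    """Filter for documents with at least one token starting with q (anchored, escaped regex)."""
    return {"tokens": {"$regex": "^" + re.escape(q)}}


def match_locally(docs: list[dict[str, Any]], q: str) -> list[dict[str, Any]]:
    """In-memory check of the same rule; always returns a list, possibly empty."""
    pattern = re.compile("^" + re.escape(q))
    return [d for d in docs if any(pattern.search(t) for t in d.get("tokens", []))]


courses = [
    {"tokens": ["apple", "orange"], "area": "Nautical", "title": "Boats"},
    {"tokens": ["apple", "pineapple"], "area": "Aviation", "title": "Planes"},
]
```

An anchored (`^`) regex is chosen because it only matches at the start of a token, which is closer to typeahead behaviour than a substring match; a case-sensitive prefix regex like this can also use an index on tokens.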
I have the following structure.
books collection:
{
_id: "book_1",
title: "How to build a house",
authorId: "author_1"
}
{
_id: "book_2",
title: "How to plant a tree",
authorId: "author_2"
}
authors collection:
{
_id: "author_1",
name: "Adam Adamson"
}
{
_id: "author_2",
name: "Brent Brentson"
}
I want to make a case-insensitive free-text search for the string "b" over the books collection and find all books that either have "b" in the title or have an author with "b" in the name.
I could embed the author in the book object just to be able to make the query. But if the author's name changes in the authors collection, the embedded author object will have the wrong name.
{
_id: "book_2",
title: "How to plant a tree",
authorId: "author_2",
author:
{
name: "Brent Brentson"
}
}
What would be a good way to solve this problem?
You could use the following two queries. The first gets the array of author ids whose name matches the given regex on the authors collection (using the map() method of the find() cursor); the second applies that array with the $in operator in the books query, together with the same regex on title, to find books that have "b" in the title or an author with "b" in the name:
var authorIds = db.authors.find({"name": /b/i}).map(function (doc) {return doc._id});
db.books.find({$or: [{"title": /b/i}, {"authorId": {"$in": authorIds} }]})
Result:
/* 0 */
{
"_id" : "book_1",
"title" : "How to build a house",
"authorId" : "author_1"
}
/* 1 */
{
"_id" : "book_2",
"title" : "How to plant a tree",
"authorId" : "author_2"
}
-- UPDATE --
Thanks to @yogesh for suggesting another approach, which uses the distinct() method to get the list of author ids:
var authorIds = db.authors.distinct("_id", {"name": /b/i})
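Both variants follow the same two-step logic, which this in-memory Python sketch reproduces on the sample data. `search_books` is a hypothetical helper, plain lists stand in for the collections, and `re.escape` is used so the search text is matched literally (an assumption, since the answers hard-code /b/i).

```python
import re
from typing import Any

authors = [
    {"_id": "author_1", "name": "Adam Adamson"},
    {"_id": "author_2", "name": "Brent Brentson"},
]
books = [
    {"_id": "book_1", "title": "How to build a house", "authorId": "author_1"},
    {"_id": "book_2", "title": "How to plant a tree", "authorId": "author_2"},
]


def search_books(text: str) -> list[dict[str, Any]]:
    """Books whose title, or whose author's name, contains text (case-insensitive)."""
    pattern = re.compile(re.escape(text), re.IGNORECASE)
    # Step 1: like db.authors.distinct("_id", {"name": /b/i})
    author_ids = {a["_id"] for a in authors if pattern.search(a["name"])}
    # Step 2: like db.books.find({$or: [{title: /b/i}, {authorId: {$in: authorIds}}]})
    return [b for b in books if pattern.search(b["title"]) or b["authorId"] in author_ids]


print([b["_id"] for b in search_books("b")])  # ['book_1', 'book_2']
```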