Querying array of arrays in MongoDB - mongodb

I have a MongoDB collection whose documents contain an array of arrays (bigrams from an NLP process) that I'd like to be able to search, for example:
{
"sentence" : "will most likely be",
"biGrams" : [["will","most"], ["most","likely"], ["likely", "be"]]
},
{
"sentence" : "likely most people use stackoverflow",
"biGrams" : [["likely","most"], ["most","people"], ["people", "use"], ["use", "stackoverflow"]]
}
What I'd like to be able to do is search through the biGrams array for a specific bigram, e.g. search for all sentences that contain the bigram ["most","likely"].
I've tried this;
find({'biGrams':{$elemMatch: {$elemMatch:{$in:['most','likely']}} }})
But this obviously finds all documents in which the word 'most' or 'likely' is present. The order is also important, i.e. I don't want to find docs with ['likely','most'] in this example.
Thanks in advance, I'm stumped....

How about
find({"biGrams":["most","likely"]})
Querying an array field compares the query value against each top-level element of the array, so one level of arrays is effectively "unwound". Passing an array as the query value then requires an exact, order-sensitive match on one of those elements.
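A minimal sketch in plain JavaScript (no database needed) of the matching rule described above, as I understand it: the document matches if the field itself equals the queried array, or if any element of the field equals it exactly, order included. The helpers arraysEqual and matchesArrayValue are made up for illustration and are not MongoDB APIs.

```javascript
// Plain-JavaScript model of the query { biGrams: ["most", "likely"] }.
// arraysEqual and matchesArrayValue are hypothetical helpers, not MongoDB APIs.
const docs = [
  { sentence: "will most likely be",
    biGrams: [["will", "most"], ["most", "likely"], ["likely", "be"]] },
  { sentence: "likely most people use stackoverflow",
    biGrams: [["likely", "most"], ["most", "people"], ["people", "use"], ["use", "stackoverflow"]] },
];

// Order-sensitive, element-by-element comparison of two flat arrays.
function arraysEqual(a, b) {
  return Array.isArray(a) && Array.isArray(b) &&
    a.length === b.length && a.every((value, i) => value === b[i]);
}

// True if the field equals the target array or any of its elements does.
function matchesArrayValue(field, target) {
  return arraysEqual(field, target) ||
    field.some((element) => arraysEqual(element, target));
}

const hits = docs.filter((doc) => matchesArrayValue(doc.biGrams, ["most", "likely"]));
console.log(hits.map((doc) => doc.sentence));
```

Only the first sentence is selected; the second document contains ["likely","most"], which differs in order and therefore does not match.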

Related

Mongodb query variable number of search terms

I am trying to design a query based on documents containing an array of metadata.
book = {
title : String,
metaData : [String]
}
To find the desired books I have another search string array containing multiple metadata terms. The length of the array can be variable in the number of metadata search terms present. How can I query to find only books that contain all the specified metadata search terms?
Example:
book1 - nature, trees, insects, fog, music
book2 - music, art, sports
Search using metadata of [music, sports] would yield book2.
How can I most efficiently design this query? Can I do this and avoid a nested query? Any help would be greatly appreciated.
You can do this by using the $all operator.
From the docs:
If, instead, you wish to find an array that contains both the elements "red" and "blank", without regard to order or other elements in the array, use the $all operator:
db.inventory.find( { tags: { $all: ["red", "blank"] } } )
MongoDB CRUD Operations: Query an array
You can use $all to get all the documents containing the given metadata strings.
Try:
let given_metadata_array = ["music", "sports"];
db.book.find({
metaData : {$all : given_metadata_array} // field names are case-sensitive; the schema uses metaData
})
See the official documentation for $all for more details.
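For intuition, here is a small plain-JavaScript model of what $all checks, using the two books from the question. It is a sketch rather than MongoDB's implementation: matchesAll is a hypothetical helper, and edge cases such as an empty list of search terms are not modelled.

```javascript
// Plain-JavaScript model of { metaData: { $all: terms } }.
// matchesAll is a hypothetical helper, not a MongoDB API.
const books = [
  { title: "book1", metaData: ["nature", "trees", "insects", "fog", "music"] },
  { title: "book2", metaData: ["music", "art", "sports"] },
];

// Every search term must appear somewhere in the array; order is ignored.
function matchesAll(values, terms) {
  return terms.every((term) => values.includes(term));
}

const searchTerms = ["music", "sports"];
const found = books.filter((book) => matchesAll(book.metaData, searchTerms));
console.log(found.map((book) => book.title));
```

Because the term list is just an array, it can be any length, which is what makes $all a good fit for a variable number of search terms.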

Get text words from query

I've read the MongoDB documentation on getting the indexes within a collection, and have also searched SO and Google for my question. I want to get the actual indexed values.
Or maybe my understanding of how MongoDB indexes work is incorrect. If I've been indexing a field called text that contains paragraphs, am I right in thinking that what gets indexed is each word in the paragraph?
In either case, I want to retrieve the values that were indexed, which db.collection.getIndexes() doesn't seem to return.
Well yes and no, in summary.
Indexes work on the values of the fields they are asked to index, and are much like a "card index": there is a point of reference to look at in order to find the location of something that matches a given term.
What you seem to be asking about here is "text indexes". This is a special index type, in MongoDB and in other databases as well, that looks at the text content of a field and breaks every word in that content down into a value in the index.
Typically we do:
db.collection.createIndex({ "text": "text" })
Where the "field name" here is "text" as you asked, but more importantly the type of index here is "text".
This allows you to then insert data like this:
db.collection.insert({ "text": "The quick brown fox jumped over the lazy dog" })
And then search like this, using the $text operator:
db.collection.find({ "$text": { "$search": "brown fox" } })
This returns the documents whose indexed text matches any of the terms in your query. To rank them by how well they matched, project and sort on the text score ({ "$meta": "textScore" }).
Note that a "text" index and its query do not target a specific field: the $text operator searches whatever the text index covers, and the index itself can span multiple fields. A collection can have only one text index; trying to create a second one produces an error.
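To make the "card index" idea concrete, here is a toy inverted index in plain JavaScript. It is only an illustration under simplifying assumptions: it lowercases and splits on non-letters, whereas a real text index also drops stop words and stems terms, and the stored index entries cannot be read back through getIndexes(). The names tokenize, buildIndex and search are hypothetical, and the second sample document is invented.

```javascript
// Toy inverted index: word -> set of document ids containing that word.
// This is NOT how MongoDB stores text indexes internally; it only shows the idea.
const documents = [
  { _id: 1, text: "The quick brown fox jumped over the lazy dog" },
  { _id: 2, text: "A brown bear sleeps" }, // invented sample document
];

// Lowercase the text and split it into words (no stemming, no stop words).
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z]+/).filter(Boolean);
}

function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const word of tokenize(doc.text)) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(doc._id);
    }
  }
  return index;
}

// Returns the ids of documents containing at least one of the query words.
function search(index, query) {
  const ids = new Set();
  for (const word of tokenize(query)) {
    for (const id of index.get(word) || []) ids.add(id);
  }
  return [...ids].sort((a, b) => a - b);
}

const index = buildIndex(documents);
console.log(search(index, "brown fox"));
```

Both documents come back for "brown fox" because each contains at least one of the words, which mirrors how a $search string of several terms is treated as an OR by default.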
As per mongodb's docs:
"db.collection.getIndexes() returns an array of documents that hold index information for the collection. Index information includes the keys and options used to create the index. For information on the keys and index options, see db.collection.createIndex()."
You first have to create the index on the collection, using the createIndex() method:
db.records.createIndex( { userid: 1 } )
Queries on the userid field are supported by the index:
Example:
db.records.find( { userid: 2 } )
db.records.find( { userid: { $gt: 10 } } )
Indexes help you avoid scanning the whole collection. They are basically references or pointers to specific documents in your collection.
The docs explain it better:
http://docs.mongodb.org/manual/tutorial/create-an-index/

mongoDB text index on subdocuments

I have a collection that looks something like this
{ "text1" : "text",
"url" : "http:....",
"title" : "the title",
......,
"search_metadata" : { "tags" : [ "tag1", "tag2", "tag3" ],
"title" : "the title",
"topcis": [ "topic1", "topic2"]
}
}
I want to be able to add a text index to search_metadata and all its subdocuments.
ensureIndex({search_metadata:"text"}) gives me no results
and:
ensureIndex({"$**":"text"}) will give me irrelevant data
How can I make it happen?
From the text indexes page:
text indexes can include any field whose value is a string or an array
of string elements. To perform queries that access the text index, use
the $text query operator
Your search_metadata field is an embedded document, not a string or an array of strings, so as currently structured it is not in the right format for a text index in MongoDB.
However, search_metadata contains both strings and arrays of strings, and those can be text-indexed. An index on {"search_metadata.tags" : "text"}, for example, fits the criteria and should work just fine. Note that the dotted field name has to be quoted.
Hence, it's a choice between restructuring the field to meet the text index criteria, or a matter of indexing the relevant sub-fields. If you take the latter approach you may find that you don't need text indexes on each of the fields and a simpler (and far smaller) index may serve you just as well (using a normal index on tags and then $elemMatch for example).
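If you go with indexing the relevant sub-fields, a single text index can cover several dotted paths at once. The sketch below only builds the key specification object, since creating the index needs a live database. The helper textIndexSpec is hypothetical, and the two paths are taken from the sample document in the question.

```javascript
// Builds a key specification such as
// { "search_metadata.tags": "text", "search_metadata.title": "text" }.
// textIndexSpec is a hypothetical helper, not part of MongoDB.
function textIndexSpec(paths) {
  const spec = {};
  for (const path of paths) {
    spec[path] = "text";
  }
  return spec;
}

const spec = textIndexSpec(["search_metadata.tags", "search_metadata.title"]);
console.log(JSON.stringify(spec));

// In the mongo shell the object would then be passed along the lines of:
//   db.collection.createIndex(spec)
```

Since a collection can hold only one text index, listing every sub-field you want searchable in one specification is the usual way to cover them all.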

Mongodb projection returning a given position

I am trying to do something as simple as returning a subarray of a document. So my query is:
db.mydocs.findOne({_id:ObjectId("af7a85f758e338d762000012")},{'house.0.room.4.windows':1});
I want to return only that value. But I get an empty structure. I know I could do it with $slice. But as far as I know, I can't do it for subarrays. I mean: {'house':{'$slice':0}} would work. But I don't know how to get house 0 and room 4.
You'll have to use chained $slice operators for this to work:
db.mydocs.findOne(
{_id:ObjectId("af7a85f758e338d762000012")},
{"house" : {$slice : [0,1]}, // house.0
"house.room" :{$slice : [4,1]}, // house.0.room.4 ($slice takes [skip, limit])
"house.room.windows" : 1}); // house.0.room.4.windows
The resulting document will look like this, i.e. the arrays are still arrays (which is helpful when mapping to strongly typed languages):
house : [ { room : [ { windows : foo } ] } ]
from the documentation:
Tip: MongoDB does not support projections of portions of arrays except when using the $elemMatch and $slice projection operators.
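The [skip, limit] form of $slice can be modelled in plain JavaScript to see why the result keeps its array nesting. This is a sketch only: sliceProjection is a hypothetical helper that covers non-negative skip values, and the sample document is invented since the question does not show its data.

```javascript
// Model of the projection form { $slice: [skip, limit] } for skip >= 0:
// keep `limit` elements starting at index `skip`.
function sliceProjection(array, skip, limit) {
  return array.slice(skip, skip + limit);
}

// Invented sample data with two houses; the first has five rooms.
const doc = {
  house: [
    { room: [{ windows: 1 }, { windows: 2 }, { windows: 3 }, { windows: 4 }, { windows: 5 }] },
    { room: [{ windows: 9 }] },
  ],
};

// Keep house.0, then room.4 inside it, then only the windows field.
const house = sliceProjection(doc.house, 0, 1).map((h) => ({
  room: sliceProjection(h.room, 4, 1).map((r) => ({ windows: r.windows })),
}));

console.log(JSON.stringify({ house }));
```

The output still has house and room as one-element arrays, which matches the shape of the projected document.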
P.S: I find it irritating that house and room are arrays, but have singular names, esp. because windows is plural.

Querying sub array with $where

I have a collection with following document:
{
"_id" : ObjectId("51f1fd2b8188d3117c6da352"),
"cust_id" : "abc1234",
"ord_date" : ISODate("2012-10-03T18:30:00Z"),
"status" : "A",
"price" : 27,
"items" : [{
"sku" : "mmm",
"qty" : 5,
"price" : 2.5
}, {
"sku" : "nnn",
"qty" : 5,
"price" : 2.5
}]
}
I want to use "$where" in the fields of "items", so something like this:
{$where:"this.items.sku==mmm"}
How can I do it? It works when the field is not of array type.
You don't need a $where operator to do this; just use a query object of:
{ "items.sku": "mmm" }
As for why your $where isn't working, the value of that operator is executed as JavaScript, so that's not going to check each element of the items array, it's just going to treat items as a normal object and compare its sku property (which is undefined) to mmm.
You are comparing this.items.sku to a variable mmm, which isn't initialized and thus has the value undefined. What you want to do is iterate over the array and compare each entry to the string 'mmm'. This example does so with the array method some, which returns true when the passed function returns true for at least one of the entries:
{$where:"return this.items.some(function(entry){return entry.sku =='mmm'})"}
But really, don't do this. In a comment to the answer by JohnnyHK you said "my service is just a interface between user and mongodb, totally unaware what the field client want's to store". You aren't really explaining your use-case, but I am sure you can solve this better.
The $where operator invokes the JavaScript engine even though this trivial expression could be done with a normal query. This means unnecessary performance overhead.
Every single document in the collection is passed to the function, so even when you have an index, it cannot be used.
When the JavaScript function is generated from something provided by the client, you must be careful to sanitize and escape it properly, or your application becomes vulnerable to code injection.
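To see why the original $where expression never matched, the same property access can be tried in plain JavaScript outside the database. The document below is a trimmed copy of the one in the question.

```javascript
// Trimmed copy of the document from the question.
const doc = {
  cust_id: "abc1234",
  items: [
    { sku: "mmm", qty: 5, price: 2.5 },
    { sku: "nnn", qty: 5, price: 2.5 },
  ],
};

// An array has no `sku` property of its own, so this evaluates to undefined.
const direct = doc.items.sku;

// Array.prototype.some tests each element until one satisfies the callback.
const anyMatch = doc.items.some((entry) => entry.sku === "mmm");

console.log(direct, anyMatch);
```

The dotted query form "items.sku" does the per-element check for you inside the query engine, which is why it works without any JavaScript.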
I've been reading through your comments in addition to the question. It sounds like your users can generically add some attributes, which you are storing in an array within a document. Your client needs to be able to query an arbitrary pair from the document in a generic manner. The pattern to achieve this is typically as follows:
{
.
.
attributes:[
{k:"some user defined key",
v:"the value"},
{k: ,v:}
.
.
]
}
Note that in your case, items plays the role of attributes. To get the document, your query will then be something like:
db.collection.find({attributes:{$elemMatch:{k:"sku",v:"mmm"}}});
(with a compound index on attributes.k and attributes.v)
This allows your service to provide a way to query the data while letting the client specify what the k,v pairs are. The one caveat with this design is that documents have a 16MB limit (unless you have a use case that makes GridFS appropriate). Operators like $slice may help with controlling this.
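The reason for $elemMatch in this pattern is that both conditions must hold for the same array element. A minimal plain-JavaScript model, with invented sample documents and a hypothetical hasAttribute helper, shows the difference:

```javascript
// Invented sample documents following the k/v attribute pattern.
const docs = [
  { _id: 1, attributes: [{ k: "sku", v: "mmm" }, { k: "color", v: "red" }] },
  { _id: 2, attributes: [{ k: "sku", v: "nnn" }, { k: "code", v: "mmm" }] },
];

// Model of { attributes: { $elemMatch: { k: key, v: value } } }:
// a single element has to satisfy both the key and the value condition.
// hasAttribute is a hypothetical helper, not a MongoDB API.
function hasAttribute(doc, key, value) {
  return doc.attributes.some((attr) => attr.k === key && attr.v === value);
}

const matched = docs
  .filter((doc) => hasAttribute(doc, "sku", "mmm"))
  .map((doc) => doc._id);

console.log(matched);
```

Document 2 contains the key "sku" and the value "mmm", but in different elements, so it is correctly left out.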