MongoDB query with a variable number of search terms

I am trying to design a query based on documents containing an array of metadata.
book = {
    title : String,
    metaData : [String]
}
To find the desired books I have a second array of search strings containing multiple metadata terms. The number of search terms in that array varies. How can I query to find only the books that contain all of the specified metadata search terms?
Example:
book1 - nature, trees, insects, fog, music
book2 - music, art, sports
Search using metadata of [music, sports] would yield book2.
How can I most efficiently design this query? Can I do this and avoid a nested query? Any help would be greatly appreciated.

You can do this by using the $all operator.
From the docs:
If, instead, you wish to find an array that contains both the elements "red" and "blank", without regard to order or other elements in the array, use the $all operator:
db.inventory.find( { tags: { $all: ["red", "blank"] } } )
MongoDB CRUD Operations: Query an array

You can use $all to get all the documents containing the given metadata strings.
Try:
let given_metadata_array = ["music", "sports"];
// The field name must match the schema in the question: metaData
db.book.find({
    metaData : { $all : given_metadata_array }
})
See the official $all documentation for details.
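To make the semantics concrete, here is a small plain-Python sketch of what $all checks, using the two example books from the question. The helpers matches_all and find_books are made up for illustration and no database is involved; with a driver you would simply pass the filter document shown in the code to find. An empty search list is an edge case this sketch does not try to model.

```python
# Plain-Python emulation of the $all semantics (no MongoDB needed).
# matches_all and find_books are hypothetical helpers, not driver APIs.
books = [
    {"title": "book1", "metaData": ["nature", "trees", "insects", "fog", "music"]},
    {"title": "book2", "metaData": ["music", "art", "sports"]},
]


def matches_all(values, required):
    """True when every required term occurs in values, in any order."""
    return set(required).issubset(values)


def find_books(search_terms):
    """Return titles of books whose metaData contains all search terms."""
    return [b["title"] for b in books if matches_all(b["metaData"], search_terms)]


# The equivalent filter document a driver would send to the server:
search_terms = ["music", "sports"]
query = {"metaData": {"$all": search_terms}}

print(find_books(search_terms))  # ['book2']
```

Because the search terms are passed as an ordinary list, the query works unchanged for one term or twenty, and no nested query is needed.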

Related

DataStax Stargate Document API

What is the "JSON blob with search filters, allowed operators: $eq, $ne, $in, $nin, $gt, $lt, $gte, $lte, $exists" that the DataStax Document API Swagger UI mentions? It isn't well documented, so I want to ask whether the query string is based on MongoDB.
The Document API exposed on top of Cassandra is provided by the open source project Stargate, which is developed by DataStax and embedded in their SaaS solution, Astra.
The JSON query string that you create is parsed and converted into a proper CQL query under the hood.
The source code doesn't lie: you can find the full code here, and in particular the parsing of the where clause here:
public List<FilterCondition> convertToFilterOps(
    List<PathSegment> prependedPath,
    JsonNode filterJson) {
  List<FilterCondition> conditions = new ArrayList<>();
  if (!filterJson.isObject()) {
    throw new DocumentAPIRequestException("Search was expecting a JSON object as input.");
  }
  ObjectNode input = (ObjectNode) filterJson;
  Iterator<String> fields = input.fieldNames();
  while (fields.hasNext()) {
    String fieldName = fields.next();
    if (fieldName.isEmpty()) {
      throw new DocumentAPIRequestException(
          "The field(s) you are searching for can't be the empty string!");
    }
    ...
The query string is pretty similar in spirit to what you'd find with Mongo.
Here are some sample where clauses to give an idea:
{"name": {"$eq": "Eric"}} - simple enough, matches documents that have a field name with value Eric
{"a.age": {"$gt": 0}} - You can also reference nested fields in a document
{"friends.[0].name": {"$in": ["Cassandra"]}} - Array elements are referenced using [], this would match if the document's first friend is named Cassandra.
{"friends.*.age": {"$gte": 24}} - Wildcard * can be used to match any element in an array, or any field at a particular level of nesting. This matches any friend whose age is >= 24.
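As a rough sketch of how such a where clause could be prepared on the client side, the snippet below serializes one of the sample filters and URL-encodes it. The query parameter name where is an assumption based on the "where clause" wording above, and no endpoint is called; check the Swagger UI for the exact request shape.

```python
import json
from urllib.parse import urlencode, parse_qs

# Filter taken from the samples above: friends aged 24 or older.
where = {"friends.*.age": {"$gte": 24}}

# Serialize compactly, then URL-encode it as a query parameter.
# The parameter name "where" is an assumption, not confirmed by the source.
query_string = urlencode({"where": json.dumps(where, separators=(",", ":"))})
print(query_string)

# Round trip: decoding the query string gives back the same filter.
decoded = json.loads(parse_qs(query_string)["where"][0])
print(decoded == where)
```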

Get text words from query

I've read the MongoDB documentation on getting the indexes within a collection, and have also searched SO and Google for my question. I want to get the actual indexed values.
Or maybe my understanding of how MongoDB indexes work is incorrect. If I've been indexing a field called text that contains paragraphs, am I right in thinking that what gets indexed is each word in the paragraph?
In either case, I want to retrieve the values that were indexed, which db.collection.getIndexes() doesn't seem to return.
Well yes and no, in summary.
Indexes work on the "values" of the fields they are supplied to index, and are much like a "card index" in that there is a point of reference to look at to find the location of something that matches that term.
What "you" seem to be asking about here is "text indexes". This is a special index format in MongoDB and other databases as well that looks at the "text" content of a field and breaks down every "word" in that content into a value in that "index".
Typically we do:
db.collection.createIndex({ "text": "text" })
Where the "field name" here is "text" as you asked, but more importantly the type of index here is "text".
This allows you to then insert data like this:
db.collection.insert({ "text": "The quick brown fox jumped over the lazy dog" })
And then search like this, using the $text operator:
db.collection.find({ "$text": { "$search": "brown fox" } })
Which returns the documents whose indexed "text" matches the terms you gave in your query. The results are not sorted by relevance unless you ask for it: to "rank" them, project and sort on the text score, { "score": { "$meta": "textScore" } }.
Note that a "text" index and its query do not target a specific field, although the index itself can be built over multiple fields. The constraint is that there can "only be one" text index on any given collection; trying to create a second one results in an error.
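To picture what "breaking every word into a value in the index" means, here is a deliberately simplified inverted index in plain Python. It is only a sketch under stated assumptions: tokenize, build_text_index and search are made-up names, and a real MongoDB text index additionally stems words and drops stop words, which this toy version does not.

```python
import re
from collections import defaultdict

docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "A lazy afternoon",
}


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_text_index(documents):
    """Map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing any of the query words."""
    hits = set()
    for word in tokenize(query):
        hits |= index.get(word, set())
    return sorted(hits)


index = build_text_index(docs)
print(sorted(index))               # the "indexed values": one entry per word
print(search(index, "brown fox"))  # [1]
```

The keys of index play the role of the indexed values the question asks about: one entry per distinct word, each pointing at the documents that contain it.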
As per mongodb's docs:
"db.collection.getIndexes() returns an array of documents that hold index information for the collection. Index information includes the keys and options used to create the index. For information on the keys and index options, see db.collection.createIndex()."
You first have to create the index on the collection, using the createIndex() method:
db.records.createIndex( { userid: 1 } )
Queries on the userid field are supported by the index:
Example:
db.records.find( { userid: 2 } )
db.records.find( { userid: { $gt: 10 } } )
Indexes help you avoid scanning the whole collection. They are essentially ordered references, or pointers, to the documents in your collection.
The docs explain it better:
http://docs.mongodb.org/manual/tutorial/create-an-index/
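As an analogy only (MongoDB's actual index structure is not modelled here), a sorted list of keys with binary search shows why an index answers both the equality and the range query above without visiting every record. The names find_eq and find_gt are made up for this sketch.

```python
from bisect import bisect_left, bisect_right

records = [
    {"userid": 12, "name": "c"},
    {"userid": 2, "name": "a"},
    {"userid": 30, "name": "d"},
    {"userid": 7, "name": "b"},
]

# The "index": (key, position) pairs kept in key order.
index = sorted((r["userid"], pos) for pos, r in enumerate(records))
keys = [k for k, _ in index]


def find_eq(value):
    """Like find({userid: value}): binary search for the matching key range."""
    lo, hi = bisect_left(keys, value), bisect_right(keys, value)
    return [records[pos] for _, pos in index[lo:hi]]


def find_gt(value):
    """Like find({userid: {$gt: value}}): everything after the key range."""
    start = bisect_right(keys, value)
    return [records[pos] for _, pos in index[start:]]


print(find_eq(2))
print(find_gt(10))
```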

Exact match when searching in arrays of array in MongoDB

I have two questions. I found similar things but I couldn't adapt to my problem.
query = {'$and': [{'cpc.class': u'24'},
{'cpc.section': u'A'},
{'cpc.subclass': u'C'}]}
collection:
{"_id":1,
"cpc":
[{u'class': u'24',
u'section': u'A',
u'subclass': u'B'},
{u'class': u'07',
u'section': u'C',
u'subclass': u'C'},]}
{"_id":2,
"cpc":
[{u'class': u'24',
u'section': u'A',
u'subclass': u'C'},
{u'class': u'07',
u'section': u'K',
u'subclass': u'L'},]}
With this query, both documents are fetched.
1) But I want to fetch only the second document ("_id": 2) because it matches the query exactly. That is, the second document contains a cpc element whose class equals 24, whose section equals A, and whose subclass equals C.
2) And can I fetch only the matching element of cpc, if possible? Otherwise I have to traverse all elements of each retrieved document; if I have to traverse them to find out which element matches exactly, then my first question would be meaningless.
Thanks!
1) You're looking for the $elemMatch operator, which requires a single array element to satisfy all of the conditions and is more concise than separate subelement queries (you don't need the $and in your query, by the way):
query = { 'cpc' : {
'$elemMatch': { 'class': u'24',
'section': u'A',
'subclass': u'C' } } };
2) That can be done using a projection:
db.collection.find(query, { "cpc.$" : 1 })
The $ projection operator documentation contains pretty much this use case as an example.
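The difference between the dotted-path query and $elemMatch can be emulated in plain Python on the two documents from the question. This is a sketch of the matching rules only; dotted_match and elem_match are hypothetical helpers and no database is involved.

```python
docs = [
    {"_id": 1, "cpc": [
        {"class": "24", "section": "A", "subclass": "B"},
        {"class": "07", "section": "C", "subclass": "C"},
    ]},
    {"_id": 2, "cpc": [
        {"class": "24", "section": "A", "subclass": "C"},
        {"class": "07", "section": "K", "subclass": "L"},
    ]},
]
wanted = {"class": "24", "section": "A", "subclass": "C"}


def dotted_match(doc):
    """Dotted paths: each condition may be met by a different element."""
    return all(any(el.get(k) == v for el in doc["cpc"]) for k, v in wanted.items())


def elem_match(doc):
    """$elemMatch: return the elements meeting every condition at once."""
    return [el for el in doc["cpc"] if all(el.get(k) == v for k, v in wanted.items())]


print([d["_id"] for d in docs if dotted_match(d)])  # [1, 2]
print([d["_id"] for d in docs if elem_match(d)])    # [2]
# Emulates the "cpc.$" projection: keep only the first matching element.
print(elem_match(docs[1])[:1])
```

Document 1 passes the dotted-path version because its first element supplies class and section while its second supplies subclass, which is exactly the false positive described in the question.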

MongoDB: Perform a text-search in a document field (using high-level API)

It may be related to this question
Basic GROUP BY statement using OPA MongoDB high level API.
I want to be able to retrieve a list of documents whose "name" field value contains a given string.
Here's my documents list :
{name: "Charles-Hugo"},
{name: "Jean Pierre"},
{name: "Pierre Dupont"},
I want to retrieve only the documents whose name contains the "Pierre" string: Jean Pierre and Pierre Dupont.
I know this isn't possible with the MongoDB high-level API.
I've looked at the low-level API functions, but I don't know the easiest way to retrieve my documents as a safe Opa type.
Also, I'd like to add skip/limit options to my query.
Any ideas?
The DbGen automation mechanism in Opa has support for this:
DbSet.iterator(/path/data[name =~ pattern])
As @Henri pointed out, Opa has supported regular expression searching since the commit "[enhance] DbGen: add case insensitive regex operator =~", which is very nice.
Mind that it uses the $regex operator, not the full-text index, so it may cost some performance :( As the MongoDB documentation says, the $regex operator uses indexes only in a limited way, namely for prefix searches such as the pattern ^Jean. Searching for Jean anywhere in the text requires a full scan.
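For illustration, the matching behaviour of an unanchored versus a prefix-anchored pattern, plus skip/limit paging, can be sketched in plain Python on the names from the question. This only emulates the semantics with Python's re module; find_names is a made-up helper, not an Opa or MongoDB function.

```python
import re

names = ["Charles-Hugo", "Jean Pierre", "Pierre Dupont"]


def find_names(pattern, skip=0, limit=None):
    """Case-insensitive regex filter followed by skip/limit paging."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = [n for n in names if rx.search(n)]
    end = None if limit is None else skip + limit
    return hits[skip:end]


print(find_names("Pierre"))                   # ['Jean Pierre', 'Pierre Dupont']
print(find_names("^Pierre"))                  # ['Pierre Dupont']
print(find_names("Pierre", skip=1, limit=1))  # ['Pierre Dupont']
```

The first call is the "contains" search asked for in the question; the second is the anchored prefix form that the paragraph above singles out as index-friendly.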
Personally, I am using the full-text index feature of Mongo with Opa's "low-level" API for the text command, like this:
function list({float score, Article.id id}) textSearch(string query) {
function onfailure(failure) {
cat.error("textSearch({{~query}}): {failure}");
[];
}
function onsuccess(success) {
function aux(~{name,value}) {
name == "results";
}
match (List.filter(aux, success)) {
| [] :
// `results` field not found - error
onfailure(success);
| results:
cat.debug("textSearch({~{query}}): {results}");
function ({~score, obj: ~{id}}) {
~{score, id}
}
|> List.map(_, Bson.doc2opa(results) ? []);
}
}
opts = [H.str("search", query), H.doc("project", [H.i32("_id",0), H.i32("id",1)])];
// { search: query, project: {_id:0, id:1}, }
// |> Bson.opa2doc
outcome = MongoCommands.simple_str_command_opts(ll_db, db_name, "text", coll_name, opts);
MongoCommon.outcome_map(outcome, onsuccess, onfailure)
}
The feature has been available in Mongo since 2.4 as experimental (you have to turn it on with a special configuration option) and since 2.6 as stable (turned on by default).

Querying array of arrays in MongoDB

I have a mongo collection which has an array of arrays (bigrams from a NLP process) that I'd like to be able to search, so for example;
{
    "sentence" : "will most likely be",
    "biGrams" : [["will","most"], ["most","likely"], ["likely", "be"]]
},
{
    "sentence" : "likely most people use stackoverflow",
    "biGrams" : [["likely","most"], ["most","people"], ["people", "use"], ["use", "stackoverflow"]]
}
What I'd like to be able to do is search through the biGrams sub-document for a certain instance of one of these bigrams, e.g. search for all sentences that contain the bigram ["most","likely"].
I've tried this;
find({'biGrams':{$elemMatch: {$elemMatch:{$in:['most','likely']}} }})
But this obviously finds all cases where the word 'most' or 'likely' is present. And the order is important, i.e. I don't want to find docs with ['likely','most'] in this example.
Thanks in advance, I'm stumped....
How about
find({"biGrams":["most","likely"]})
Searching on an array field "unwinds" one level of arrays in that field, so the query value is compared against each element of biGrams, and searching for a particular array should be an exact, order-sensitive match on that element.
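The matching rule described above can be emulated in plain Python with the two example documents. This is a sketch of the semantics only; field_equals and find are made-up helpers, not MongoDB calls.

```python
docs = [
    {"sentence": "will most likely be",
     "biGrams": [["will", "most"], ["most", "likely"], ["likely", "be"]]},
    {"sentence": "likely most people use stackoverflow",
     "biGrams": [["likely", "most"], ["most", "people"],
                 ["people", "use"], ["use", "stackoverflow"]]},
]


def field_equals(value, target):
    """Match if the field equals target or is an array containing target."""
    return value == target or (isinstance(value, list) and target in value)


def find(target):
    """Return the sentences whose biGrams field matches the target bigram."""
    return [d["sentence"] for d in docs if field_equals(d["biGrams"], target)]


print(find(["most", "likely"]))  # ['will most likely be']
print(find(["likely", "most"]))  # ['likely most people use stackoverflow']
```

Since Python list equality is order-sensitive, just like the array comparison described above, the reversed bigram only matches the second sentence.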