MongoDB: search multiple fields in multiple subdocuments

My MongoDB data structure looks like this:
{
"id": "7118592",
"passages": [
{
"infons": {
"article-id_pmid": "32292259",
"title": "Keywords",
"type": "front",
"section_type": "TITLE"
},
"text": "Understanding of guidance for acupuncture and moxibustion interventions on COVID-19 (Second edition) issued by CAAM"
},
{
"infons": {
"section_type": "ABSTRACT",
"type": "abstract",
"section": "Abstract"
},
"offset": 116,
"text": "At present, the situation of global fight against COVID-19 is serious. WHO (World Health Organization)-China Joint Mission fully confirms the success of \"China's model\" against COVID-19 in the report. In fact, one particular power in \"China's model\" is acupuncture and moxibustion of traditional Chinese medicine. To better apply \"non-pharmaceutic measures\":the external technique of traditional Chinese medicine, in the article, the main content of Guidance for acupuncture and moxibustion interventions on COVID-19 (Second edition) issued by China Association of Acupuncture-Moxibution is introduced and the discussion is stressed on the selection of moxibustion device and the duration of its exertion."
}
]
}
I want to retrieve the article-id_pmid together with the text from the same subdocument, and also the text of the subdocument whose infons contains section_type: ABSTRACT and type: abstract.
I tried this query, but the result was not what I was looking for:
db.mydata.find({$and:[{"passages.infons.article-id_pmid$":{$exists: true}},{"passages.infons.section_type":"ABSTRACT"},{"passages.infons.type":"abstract"}]},{"passages.infons.article-id_pmid.$":1,_id:0, "passages.text":1})

Each of the top-level conditions is evaluated independently, so they can be satisfied by different elements of the passages array.
Use $elemMatch (https://docs.mongodb.com/manual/reference/operator/query/elemMatch/) to require that multiple conditions hold on the same array element.
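As a minimal sketch of that suggestion, the filter below could be passed to find() on the mydata collection from the question. The two helper functions are hypothetical, plain-JavaScript stand-ins (not MongoDB APIs) that only illustrate the difference between "same element" and "independent" matching.

```javascript
// Filter that requires ONE passage to satisfy both conditions at once.
// In the shell it would be used as: db.mydata.find(abstractFilter)
const abstractFilter = {
  passages: {
    $elemMatch: {
      "infons.section_type": "ABSTRACT",
      "infons.type": "abstract",
    },
  },
};

// Hypothetical in-memory helpers, for illustration only.
// "Same element": a single passage must carry both values.
function matchesSameElement(doc) {
  return doc.passages.some(
    (p) => p.infons.section_type === "ABSTRACT" && p.infons.type === "abstract"
  );
}

// "Independent": each condition may be satisfied by a different passage.
function matchesIndependently(doc) {
  return (
    doc.passages.some((p) => p.infons.section_type === "ABSTRACT") &&
    doc.passages.some((p) => p.infons.type === "abstract")
  );
}

// A made-up document where the two values live in different passages.
const splitDoc = {
  passages: [
    { infons: { section_type: "ABSTRACT", type: "front" } },
    { infons: { section_type: "TITLE", type: "abstract" } },
  ],
};
```

Note that $elemMatch in the filter only selects documents; returning just some of the passages would still need a projection or an aggregation stage.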

Related

REST related/nested objects URL standard

If /wallet returns a list of wallets and each wallet has a list of transactions, what is the standard OpenAPI/REST way to expose the transactions?
For example,
http://localhost:8000/api/wallets/ gives me
{
"count": 1,
"next": null,
"previous": null,
"results": [
{
"user": 1,
"address": "3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd",
"balance": "2627199.00000000"
}
]
}
http://localhost:8000/api/wallets/3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd/ gives me
{
"user": 1,
"address": "3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd",
"balance": "2627199.00000000"
}
If I wanted to add a transactions list, what is the standard way to form this?
http://localhost:8000/api/wallets/3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd/transactions ?
http://localhost:8000/api/wallets/3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd/transactions?offset=100 for pagination
REST doesn't care what spelling conventions you use for your resource identifiers. What it expects instead is that you have representations of links between resources, along with metadata describing the nature of each link.
So this schema is fine.
/api/wallets/3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd
/api/wallets/3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd/transactions
And this schema is also fine.
/api/wallets/3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd
/api/transactions/3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd
As far as I can tell, OpenAPI also gives you the freedom to design your resource model the way that works best for you (it just tells you one possible way to document the resource model you have selected).
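One way to picture "links between resources" is a wallet representation that advertises where its transactions live. This is only a sketch: the links key and the rel names are illustrative assumptions, not something REST or OpenAPI mandates.

```javascript
// Build a wallet representation that carries links to related resources.
// The "links" shape and the rel names are assumptions made for illustration.
function withLinks(wallet, baseUrl) {
  const self = `${baseUrl}/wallets/${wallet.address}/`;
  return {
    ...wallet,
    links: [
      { rel: "self", href: self },
      { rel: "transactions", href: `${self}transactions` },
    ],
  };
}

const wallet = withLinks(
  {
    user: 1,
    address: "3E8ociqZa9mZUSwGdSmAEMAoAxBK3FNDcd",
    balance: "2627199.00000000",
  },
  "http://localhost:8000/api"
);
```

A client that follows the transactions link does not need to know which of the two URL schemes the server chose.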

Partial Representations in REST API's for collections vs items

I'm putting together a REST based API but I'm not sure on how I should deliver the response for collections vs individual resources.
Does it make sense to have a slimmed down representation for a collection over a single item in the world of REST?
Say I have something along the lines of this for a collection of albums:
{
items: [
{
"id": 1,
"title": "Thriller"
},
...
]
}
But then for the actual individual item I had
{
"id": 1,
"title": "Thriller",
"artist": "Michael Jackson",
"released": "1982",
"imageLinks": {
"smallThumbnail": "...",
"largeThumbnail": "..."
}
...
}
A resource's representation should be the same whether it is returned as part of a collection or as a single item. However, you can introduce a query parameter such as fields that lets clients request only the fields they need, which saves bandwidth.
/albums - This should return the list of objects, each with the same structure as the individual item API returns.
/albums?fields=id,title - This can return the list of objects with just the id and title.
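A minimal sketch of how a server might apply such a fields parameter, assuming it arrives as a comma-separated string. The function name selectFields is hypothetical.

```javascript
// Reduce each item to the fields named in a comma-separated "fields" parameter.
// With no parameter, the full representation is returned unchanged.
function selectFields(items, fieldsParam) {
  if (!fieldsParam) return items;
  const wanted = fieldsParam
    .split(",")
    .map((f) => f.trim())
    .filter(Boolean);
  return items.map((item) =>
    Object.fromEntries(wanted.filter((f) => f in item).map((f) => [f, item[f]]))
  );
}

const albums = [
  { id: 1, title: "Thriller", artist: "Michael Jackson", released: "1982" },
];

// Equivalent of GET /albums?fields=id,title
const slim = selectFields(albums, "id,title");
```

Unknown field names are silently ignored here; a real API might prefer to reject them.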

Is it possible to run a query using Orion where the searching criteria is given by attributes value?

When I create entities in an Orion server, I can search by ID, either as a literal ID or using a regular expression:
http://<localhost>:1026/v1/queryContext
Content:
{
"entities": [
{
"type": "Sensor",
"isPattern": "true",
"id": "sensor_1.*"
}
],
"attributes": ["temperature","humidity"]
}
In the above example I'd get all objects of type "Sensor" whose ID starts with "sensor_1", along with their attributes "temperature" and "humidity". I wonder if there is any way to search by a specific attribute value, for example to get those sensors whose humidity is over "60.2", or whether this selection must be done on the data retrieved by ID.
Not in the current Orion version (0.19.0), but it will be possible in the future. Have a look at the attribute::<name> filter with the = operator in this document.

Can I add or sort by a MongoDB match object?

So I have two collections, Trait and Question. For a given user, I iterate over the user's traits, and I want to query all of the questions that correspond to missing traits:
var match = { $or: [] }; // one { "Trait": ... } clause is added per missing trait
linq.From(missingTraits)
    .ForEach(function(trait)
    {
        match.$or.push({ "Trait": trait });
    });
database.collection("Questions", function(err, collection)
{
    collection.find(match).limit(2).toArray(function(err, questions)
    {
        next(err, questions);
    });
});
This works, but I'd like the objects to come back sorted by a field on the Trait document (which is NOT on the Question document):
Traits
[
{ "Name": "Height", "Value": "73", "Importance": 15 },
{ "Name": "Weight", "Value": "230", "Importance": 10 },
{ "Name": "Age", "Value": "29", "Importance": 20 }
]
Questions
[
{ "Trait": "Height", "Text": "How tall are you?" },
{ "Trait": "Weight", "Text": "How much do you weigh?" },
{ "Trait": "Age", "Text": "How old are you?" }
]
So in the above example, if all three traits were missing, I'd want to bring back only the Age and Height questions (in that order). Is it possible to modify the query or the match object in some way to facilitate this?
Thank you.
If I am understanding your question right, you would like to find the two most important traits? If your missingTraits array is just the names of the traits ("Height", "Weight", "Age"), then you would want to query the Traits collection to find the two most important, before sending the query to the Questions collection.
Unfortunately, modifying just the query on your Questions collection will not work without a separate call to the Traits collection, since that is where the information on importance lives.
A query like
database.collection("Traits", function(err, collection)
{
collection.find(match).sort({ Importance : -1 }).limit(2).toArray(function(err, traits)
{
// cache the traits to query the questions collection
});
});
will get you the two most important Traits that match the $or query (all matches, sorted by Importance in descending order, limited to 2), provided the $or clauses test the Name field, since that is the field the Traits documents use. You can then use these results to query the Questions collection.
Telling the Questions collection to return the docs in "Age", "Height" order is difficult in this case, so I would recommend caching the importance order of the traits you receive back from the Traits collection and then sorting the results of the query on Questions client-side.
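The client-side sort described above can be sketched as follows. The sample arrays stand in for what the two queries might return, and sortByImportance is a hypothetical helper name.

```javascript
// Traits as they might come back from the Traits query (sorted, limited to 2).
const topTraits = [
  { Name: "Age", Value: "29", Importance: 20 },
  { Name: "Height", Value: "73", Importance: 15 },
];

// Questions as they might come back from the Questions query, in arbitrary order.
const questions = [
  { Trait: "Height", Text: "How tall are you?" },
  { Trait: "Age", Text: "How old are you?" },
];

// Cache each trait's importance by name, then sort a copy of the questions by it.
// Questions whose trait is not in the cache are treated as importance 0.
function sortByImportance(questions, traits) {
  const importance = new Map(traits.map((t) => [t.Name, t.Importance]));
  return [...questions].sort(
    (a, b) => (importance.get(b.Trait) || 0) - (importance.get(a.Trait) || 0)
  );
}

const ordered = sortByImportance(questions, topTraits);
```

Sorting a copy leaves the original query result untouched, which helps if it is reused elsewhere.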
Here are the docs for sort():
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%7B%7Bsort%28%29%7D%7D
One last thing: if you are populating missingTraits via a database call to the Traits collection, then it should be possible to place the sort() logic into that call, rather than adding a second call to the collection.

MongoDB Schema Design for language database

I need some advice on MongoDB schema design for a natural language database.
I need to store for each language texts and words like:
lang: {
_id: "English",
texts : [
{ text : "This is a first text",
date : Date("2011-09-19T04:00:10.112Z"),
tag : "test1"
},
{ text : "Second One",
date : Date("2011-09-19T04:00:10.112Z"),
tag : "test2"
}
],
words : [
{
word : "This",
},
{
word : "is",
},
{
word : "a",
},
{
word : "first",
},
{
word : "text",
},
{
word : "second",
},
{
word : "one",
}
]
}
I then need to know which words and texts each user is associated with. The number of words and texts tends to be huge, and I need to list all words in a language as well as all words a user is associated with for that language.
My thinking is that storing the user_ids associated with a given word in an array on that word may be a good approach, like this:
lang: {
_id: "English",
texts : [
...
],
words : [
{
word : "This",
users: [user1,user2,user3]
},
{
word : "is",
users: [user1,user2]
},
...
]
}
Bearing in mind that a word can be associated with hundreds of thousands of users, that the document size limit (as I read) is 4MB, and that I need to:
List all words for a given user and language
Is this a good approach? Or can you think of a better one?
Hope this question is clear enough and that someone can give me a help on this ;)
Thank you all!
I don't think this is a good approach, for just the reason you mention: the document size limit. It looks like with your approach, you are definitely going to run up against the limit. I would go for a flatter approach (which should also make your collection easier to query). Something like this:
[
{
user: "user1",
word: "This",
lang: "en"
},
{
user: "user1",
word: "is",
lang: "en"
},
// et cetera...
]
In other words, grow vertically by adding documents rather than horizontally by adding more data to one document. You can query words for a given user with db.collection.find( { user: "user1", lang: "en" } ), where collection stands for whatever the collection is named.
This approach isn't "normalized", of course, so if you're concerned about space then you might want to create a separate collection for users, words, and languages and reference them in the main collection by an ID. But since there are no join queries in MongoDB, you have to weigh query performance against space efficiency.
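To make the flatter shape concrete, here is a small sketch that turns the nested document from the question into one document per (user, word) pair. The function name flattenWords and the "en" language code are assumptions for illustration.

```javascript
// Turn the nested language document into flat { user, word, lang } documents,
// one per (user, word) pair. Words without a users array produce no documents.
function flattenWords(langDoc, langCode) {
  const flat = [];
  for (const entry of langDoc.words) {
    for (const user of entry.users || []) {
      flat.push({ user, word: entry.word, lang: langCode });
    }
  }
  return flat;
}

const english = {
  _id: "English",
  words: [
    { word: "This", users: ["user1", "user2", "user3"] },
    { word: "is", users: ["user1", "user2"] },
  ],
};

const flatDocs = flattenWords(english, "en");

// In-memory equivalent of filtering by user and language.
const user1Words = flatDocs
  .filter((d) => d.user === "user1" && d.lang === "en")
  .map((d) => d.word);
```

Each flat document stays tiny no matter how many users share a word, so no single document approaches the size limit.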
dbaseman is correct (and upvoted), but a couple of other points:
First, the document limit is now 16MB (Max Document Size) as of this writing, assuming you are running a recent version of MongoDB.
Second, unbounded growth is generally a bad idea in MongoDB. This type of document size expansion can force MongoDB to move the document if it exceeds the space currently allocated to it. You can read more about this in the Padding Factor section of the documentation.
Those moves are relatively expensive, especially if they happen frequently. Therefore, if you do go with this type of design, limiting the size of the embedded array in your main collection (most recent X, most popular X, etc.), which essentially bounds that growth, and perhaps even pre-populating that document field beyond the average size (essentially manual padding) will reduce the moves caused by additions and changes.
This is the reason why tip #6 in the MongoDB Developers tips and tricks book from O'Reilly is:
Tip #6: Do not embed fields that have unbound growth
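The "most recent X" bounding mentioned above can be sketched with the $slice modifier of $push, which keeps only the last N array elements when used together with $each. The field name recentUsers and the cap of 3 are hypothetical, and applyBoundedPush is only an in-memory illustration of the effect.

```javascript
// Build an update document that appends one value and keeps only the newest `cap` entries.
// The field name is supplied by the caller.
function boundedPush(field, value, cap) {
  return { $push: { [field]: { $each: [value], $slice: -cap } } };
}

// Hypothetical in-memory equivalent, to show the effect on an array.
function applyBoundedPush(arr, value, cap) {
  return [...arr, value].slice(-cap);
}

const update = boundedPush("recentUsers", "user4", 3);
const after = applyBoundedPush(["user1", "user2", "user3"], "user4", 3);
```

The update document would be passed as the second argument of an update call on the collection; the oldest entry falls off once the cap is reached.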