MongoDB - Find how many documents have the same characteristics

Sorry for the title and for not providing an example; I'm very new to MongoDB. After trying to get this result with MySQL, I moved to MongoDB because I think it may be simpler to achieve there :(
I need to find how many documents have the same "characteristics".
I'll try to explain this with a restaurant example:
I need to find the most popular dishes that a family ordered.
This is the dataset, where persons and withChildren are the group-by criteria:
{"persons": 4, "dish1": 3, "dish2": 4},
{"persons": 4, "dish1": 3, "dish2": 4},
{"persons": 4, "dish1": 3, "dish2": 4},
{"persons": 4, "withChildren": true, "dish1": 3, "dish2": 4},
{"persons": 4, "dish1": 3, "dish2": 2},
{"persons": 4, "dish1": 3, "dish2": 2},
{"persons": 4, "dish1": 3, "dish2": 2, "dish3": 6},
I have separated the rows to show the differences more clearly:
(4 persons) ordered (dish1=3 / dish2=4) three times
(4 persons withChildren) ordered (dish1=3 / dish2=4) one time
(4 persons) ordered (dish1=3 / dish2=2) two times
(4 persons) ordered (dish1=3 / dish2=2 / dish3=6) one time
The goal is to produce documents that express the previous rows, like this:
{
"type": {"persons": 4},
"dish1": 3,
"dish2": 4,
"tot": 3
}
For the type with children, it will be:
{
"type": {"persons": 4, "withChildren": true},
"dish1": 3,
"dish2": 4,
"tot": 1
}
I have already tried reading these solutions, which seem somewhat similar to what I need to accomplish, but because I'm very new to MongoDB I don't know whether this result is possible with a single query, whether I need to write a script, and so on.
The nested object in the result is not essential, so the result could be a plain object too, like this:
{
"persons": 4,
"dish1": 3,
"dish2": 4,
"tot": 3
}
Thanks a lot for your help and understanding
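A minimal application-side sketch of the counting described in the question, assuming the orders have already been loaded into a plain JavaScript array. The helper name countIdenticalOrders is hypothetical, and the data uses the withChildren spelling from the expected output. It treats two documents as the same "type" only when they have exactly the same fields and values; it is not a MongoDB query.

```javascript
// Hypothetical helper: counts documents that share exactly the same fields and values.
function countIdenticalOrders(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Sort the keys so that field order does not influence the grouping key.
    const key = JSON.stringify(Object.keys(doc).sort().map((k) => [k, doc[k]]));
    const group = groups.get(key);
    if (group) {
      group.tot += 1;
    } else {
      groups.set(key, { ...doc, tot: 1 });
    }
  }
  return [...groups.values()];
}

// Sample orders from the question.
const orders = [
  { persons: 4, dish1: 3, dish2: 4 },
  { persons: 4, dish1: 3, dish2: 4 },
  { persons: 4, dish1: 3, dish2: 4 },
  { persons: 4, withChildren: true, dish1: 3, dish2: 4 },
  { persons: 4, dish1: 3, dish2: 2 },
  { persons: 4, dish1: 3, dish2: 2 },
  { persons: 4, dish1: 3, dish2: 2, dish3: 6 },
];

const result = countIdenticalOrders(orders);
console.log(result);
```

With the sample data this yields four groups with tot values 3, 1, 2 and 1, in the flat shape of the plain-object variant of the desired output.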

Related

How to cluster a list of multiple sets using clustering data-mining algorithms?

I have a list of multiple sets as an input, for example:
[
{{0, 1, 3}, {2, 4, 5, 8}, {6, 7, 9, 10}},
{{0, 1, 2, 3}, {4, 5, 8}, {6, 7, 9}, {10}},
{{0, 1, 2, 3}, {4, 5, 8}, {6, 7, 9}, {10}},
{{0}, {1, 2, 3}, {4, 5, 6, 7, 8, 9, 10}},
...
]
Every row in the list is a set containing multiple sets that together group the numbers 0 to 10.
I want to cluster these rows so that rows which group the numbers 0 to 10 in a similar pattern end up in the same cluster.
I have been thinking about this for a long time, but I still can't come up with a way to make these rows clusterable by a clustering algorithm like k-means.
Please give me some hints. Thank you very much.
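One possible starting point, offered as an assumption rather than an established answer: make the rows comparable by measuring how often two rows disagree about whether a pair of numbers belongs together (a Rand-index style distance). The function names below are hypothetical. The resulting distance matrix suits algorithms that accept pairwise distances, such as hierarchical clustering or k-medoids, more directly than k-means.

```javascript
// Maps each element to the index of the block that contains it.
function blockOf(partition) {
  const label = new Map();
  partition.forEach((block, i) => block.forEach((x) => label.set(x, i)));
  return label;
}

// Fraction of element pairs that one row puts together and the other keeps apart.
function randDistance(p, q, elements) {
  const a = blockOf(p);
  const b = blockOf(q);
  let disagree = 0;
  let pairs = 0;
  for (let i = 0; i < elements.length; i++) {
    for (let j = i + 1; j < elements.length; j++) {
      const sameInP = a.get(elements[i]) === a.get(elements[j]);
      const sameInQ = b.get(elements[i]) === b.get(elements[j]);
      if (sameInP !== sameInQ) disagree++;
      pairs++;
    }
  }
  return disagree / pairs;
}

// Rows from the question, written as arrays of arrays.
const rows = [
  [[0, 1, 3], [2, 4, 5, 8], [6, 7, 9, 10]],
  [[0, 1, 2, 3], [4, 5, 8], [6, 7, 9], [10]],
  [[0, 1, 2, 3], [4, 5, 8], [6, 7, 9], [10]],
  [[0], [1, 2, 3], [4, 5, 6, 7, 8, 9, 10]],
];
const elements = Array.from({ length: 11 }, (_, i) => i);

// Pairwise distances between all rows; identical rows have distance 0.
const distanceMatrix = rows.map((p) => rows.map((q) => randDistance(p, q, elements)));
console.log(distanceMatrix);
```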

MongoDB: Can I store stock data in this way?

{
"symbol": "MSFT",
"close": [0, 1, 2, 3, 4, 5],
"open": [0, 1, 2, 3, 4, 5],
"high": [0, 1, 2, 3, 4, 5],
"low": [0, 1, 2, 3, 4, 5],
"volume": [0, 1, 2, 3, 4, 5],
"dates": ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05", "2022-01-06"],
"date_to_index": {
"2022-01-01": 0,
"2022-01-02": 1,
"2022-01-03": 2,
"2022-01-04": 3,
"2022-01-05": 4,
"2022-01-06": 5
}
}
When I need the Microsoft data from 2022-01-03 to 2022-01-05, I get the start and end indices from date_to_index and then retrieve the slice from index 2 to index 4 of the data arrays I want.
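The lookup described above can be sketched as follows, assuming the whole document has already been fetched into memory. The helper name sliceByDate and its fields parameter are hypothetical; the document literal is abbreviated to the fields the example uses.

```javascript
// In-memory copy of the document from the question (only some series shown).
const doc = {
  symbol: "MSFT",
  close: [0, 1, 2, 3, 4, 5],
  volume: [0, 1, 2, 3, 4, 5],
  dates: ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05", "2022-01-06"],
  date_to_index: {
    "2022-01-01": 0, "2022-01-02": 1, "2022-01-03": 2,
    "2022-01-04": 3, "2022-01-05": 4, "2022-01-06": 5,
  },
};

// Returns the inclusive slice of each requested series between two dates.
function sliceByDate(doc, startDate, endDate, fields) {
  const start = doc.date_to_index[startDate];
  const end = doc.date_to_index[endDate];
  if (start === undefined || end === undefined) {
    throw new Error("date not present in date_to_index");
  }
  const out = { symbol: doc.symbol, dates: doc.dates.slice(start, end + 1) };
  for (const field of fields) {
    // Array.prototype.slice excludes the end index, hence end + 1.
    out[field] = doc[field].slice(start, end + 1);
  }
  return out;
}

const part = sliceByDate(doc, "2022-01-03", "2022-01-05", ["close", "volume"]);
console.log(part);
```

Note that this sketch only works for dates that are present as keys in date_to_index; a range starting on a non-trading day would need an extra lookup step.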
You can certainly store data this way, but:
It looks like you'll need to either fetch the entire object each time you want to extract only part of the data, or do two queries. Either way, that is not ideal.
Gut feeling says there's a risk of exceeding the document size limit (16 MB per BSON document) with real-world data; MSFT, for example, has decades of stock price history. Sub-day resolution increases this risk even further.
Overall, I'd explore alternative strategies.

Can I sort the values in a field in mongodb?

If I have a document like this
{'id': 123, 'keywords': {'keyword1': 3, 'keyword2': 8, 'keyword3': 2}}
Can I get a query result with keywords sorted by values, like
{'id': 123, 'keywords': {'keyword3': 2, 'keyword1': 3, 'keyword2': 8}}
Is there a built-in MongoDB feature for doing this, or do I have to implement it at the application level?
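For the application-level option, a minimal sketch in plain JavaScript, assuming the document has already been fetched. The helper name sortKeywordsByValue is hypothetical. It relies on JavaScript objects keeping insertion order for non-numeric string keys, so it would not preserve the order for keys that look like integers.

```javascript
// Returns a copy of the document whose keywords are ordered by ascending value.
function sortKeywordsByValue(doc) {
  const sortedEntries = Object.entries(doc.keywords).sort(([, a], [, b]) => a - b);
  return { ...doc, keywords: Object.fromEntries(sortedEntries) };
}

const doc = { id: 123, keywords: { keyword1: 3, keyword2: 8, keyword3: 2 } };
const sortedDoc = sortKeywordsByValue(doc);
console.log(JSON.stringify(sortedDoc));
```

If the consumer of the result does not guarantee key order, returning the sorted entries as an array of [keyword, value] pairs may be the safer shape.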

indexing and searching of non-text data

Is there any efficient way to search for non-text data? For example, let's say there are millions of documents of this form:
{
data: [
{attr1: 5, attr2: 4},
{attr1: 3, attr2: 3},
{attr1: 1, attr2: 2},
... // several hundred more things
]
}
and I want to find all documents whose data contains a given subarray (of variable length, let's say 2 to 10), such as
[{attr1: 3, attr2: 3}, {attr1: 1, attr2: 2}],
where the order is important. Further, if I want other customizations, such as allowing attr2 to be near the specified values rather than exactly equal, does that complicate things a lot?
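To pin down the matching rule being asked about, here is a naive sketch in plain JavaScript. It assumes "subarray" means a contiguous run and that attr1 must match exactly while attr2 may differ by at most a tolerance. The function name containsRun and the tolerance parameter are hypothetical. This is a linear scan per document, not an index, so it illustrates the semantics rather than an efficient search.

```javascript
// Returns true if `pattern` occurs as a contiguous, ordered run inside `data`.
// attr1 must be equal; attr2 may differ by at most `tolerance`.
function containsRun(data, pattern, tolerance = 0) {
  for (let start = 0; start + pattern.length <= data.length; start++) {
    let matches = true;
    for (let j = 0; j < pattern.length; j++) {
      const item = data[start + j];
      const wanted = pattern[j];
      if (item.attr1 !== wanted.attr1 || Math.abs(item.attr2 - wanted.attr2) > tolerance) {
        matches = false;
        break;
      }
    }
    if (matches) return true;
  }
  return false;
}

const data = [
  { attr1: 5, attr2: 4 },
  { attr1: 3, attr2: 3 },
  { attr1: 1, attr2: 2 },
];

const exact = containsRun(data, [{ attr1: 3, attr2: 3 }, { attr1: 1, attr2: 2 }]);
const reversed = containsRun(data, [{ attr1: 1, attr2: 2 }, { attr1: 3, attr2: 3 }]);
const near = containsRun(data, [{ attr1: 3, attr2: 4 }, { attr1: 1, attr2: 2 }], 1);
console.log(exact, reversed, near);
```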

Comparing document fields

Let's say I have this document structure:
{
"user": "John Doe",
"data": [1, 3, 2, 4, 1, 3],
"data_version": 1
}
Can I query on the data field so that I match all documents that share at least N values with a given array, at the same positions?
So, for example, given these data fields:
1, 3, 4, 2, 5, 1, 5
2, 5, 1, 4, 2, 3, 5
1, 3, 2, 5, 5, 4, 2
5, 2, 4, 1, 2, 2, 3
Searching for
1, 3, 3, 1, 5, 4, 3
with the minimum N set to 3, I'd get the 1st and 3rd documents, but raising N to 4, I'd get only the 3rd document.
You will need to iterate over your collection. Something similar to the following should work:
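var N = 3; // minimum number of positions that must match
var query = [1, 3, 3, 1, 5, 4, 3];
db.users.find().forEach(function (entry) {
    var similarity = 0;
    // Compare only the positions that exist in both arrays.
    var len = Math.min(entry.data.length, query.length);
    for (var i = 0; i < len; i++) {
        if (entry.data[i] === query[i]) { similarity++; }
    }
    // "At least N" means >=, not >.
    if (similarity >= N) { printjson(entry); }
});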
Does this help?
I don't think you can do this in the query language itself; there is no construct for "match N out of M". I can't think of a way to do this with a different data model, either. I actually doubt there's something like that in any query language, or is there?
I think you'd be left doing the matching inside your application.