MongoDB: Can I store stock data in this way?

{
"symbol": "MSFT",
"close": [0, 1, 2, 3, 4, 5],
"open": [0, 1, 2, 3, 4, 5],
"high": [0, 1, 2, 3, 4, 5],
"low": [0, 1, 2, 3, 4, 5],
"volume": [0, 1, 2, 3, 4, 5],
"dates": ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05", "2022-01-06"],
"date_to_index": {
"2022-01-01": 0,
"2022-01-02": 1,
"2022-01-03": 2,
"2022-01-04": 3,
"2022-01-05": 4,
"2022-01-06": 5
}
}
When I need Microsoft's data from 2022-01-03 to 2022-01-05, I get the start and end indices from date_to_index and then retrieve the slice from index 2 to index 4 of the data arrays I want.
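A minimal sketch of that lookup in plain JavaScript (the MongoDB shell's language), assuming both dates exist in `date_to_index`. The helper `sliceByDates` is hypothetical. On the server side, a find projection such as `{ close: { $slice: [2, 3] } }` (skip 2, return 3) can return only indices 2 to 4 of an array, but you still need the indices first.

```javascript
// Hypothetical helper: return the requested fields for an inclusive date range,
// using the date_to_index map stored in the document.
function sliceByDates(doc, startDate, endDate, fields) {
  const start = doc.date_to_index[startDate];
  const end = doc.date_to_index[endDate];
  if (start === undefined || end === undefined) {
    throw new Error("date not found in date_to_index");
  }
  const result = { symbol: doc.symbol, dates: doc.dates.slice(start, end + 1) };
  for (const field of fields) {
    // Array.prototype.slice excludes its end index, hence end + 1
    result[field] = doc[field].slice(start, end + 1);
  }
  return result;
}

const msft = {
  symbol: "MSFT",
  close: [0, 1, 2, 3, 4, 5],
  open: [0, 1, 2, 3, 4, 5],
  dates: ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05", "2022-01-06"],
  date_to_index: {
    "2022-01-01": 0, "2022-01-02": 1, "2022-01-03": 2,
    "2022-01-04": 3, "2022-01-05": 4, "2022-01-06": 5,
  },
};

const range = sliceByDates(msft, "2022-01-03", "2022-01-05", ["close", "open"]);
console.log(range.close, range.open); // close and open are both [2, 3, 4]
```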

You can certainly store data this way, but it looks like you'll need to fetch the entire document each time you want to extract only part of the data, or else do two queries. Either way, it doesn't look ideal.
My gut feeling says there's a risk of exceeding the document size limit (16 MB per BSON document) when using real-world data (MSFT, for example, has decades of stock price history). Having sub-day resolution increases this risk even further.
Overall, I'd explore alternative strategies.
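To put rough numbers on that gut feeling, here is a back-of-envelope BSON size estimate for the array-per-field layout. It is a sketch under stated assumptions, not a measurement: prices and volume as 8-byte doubles, dates as 10-character strings, `date_to_index` values as int32, ~252 trading days per year, 390 one-minute bars per session, and top-level field names ignored. The helper names are hypothetical.

```javascript
// Rough BSON size estimate for the layout above with n entries.
// ASSUMPTIONS: doubles for prices/volume, 10-char date strings, int32 map values,
// top-level field-name overhead ignored.
const BSON_MAX_BYTES = 16 * 1024 * 1024; // 16 MiB document size limit

// BSON encodes an array as an embedded document with keys "0", "1", ...
// Each element costs: 1 type byte + key digits + 1 NUL + the value itself.
function arrayBytes(n, valueBytes) {
  let bytes = 4 + 1; // int32 length prefix + trailing NUL
  for (let i = 0; i < n; i++) {
    bytes += 1 + String(i).length + 1 + valueBytes;
  }
  return bytes;
}

function estimateLayoutBytes(n) {
  const doubleValue = 8;              // BSON double
  const dateStringValue = 4 + 10 + 1; // int32 length + 10 chars + NUL
  const numericArrays = 5 * arrayBytes(n, doubleValue); // close, open, high, low, volume
  const dates = arrayBytes(n, dateStringValue);
  // date_to_index: type byte + 10-char date key + NUL + int32 value per entry
  const dateToIndex = 4 + 1 + n * (1 + 11 + 4);
  return numericArrays + dates + dateToIndex;
}

const daily40Years = 40 * 252;
const minuteBarsPerYear = 252 * 390;
console.log(estimateLayoutBytes(daily40Years) < BSON_MAX_BYTES);          // true
console.log(estimateLayoutBytes(2 * minuteBarsPerYear) < BSON_MAX_BYTES); // false
```

Under these assumptions 40 years of daily bars come to roughly 1 MB, one year of one-minute bars to about 11 MB, and two years already exceed the limit. Real intraday timestamps are longer than 10 characters, so the intraday figures are optimistic.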

Related

MongoDB - Find how many documents have the same characteristics

Sorry for the title and for not providing an example; I'm very new to MongoDB. After trying to get this result with MySQL, I moved to MongoDB because I think it may be simpler to achieve there.
I need to find how many documents have the same "characteristics".
I'll try to explain this with a restaurant example:
I need to find the most popular dishes that a family ordered.
This is the dataset, where persons and withChildren are the group-by criteria:
{"persons": 4, "dish1": 3, "dish2": 4},
{"persons": 4, "dish1": 3, "dish2": 4},
{"persons": 4, "dish1": 3, "dish2": 4},
{"persons": 4, "withChildren": true, "dish1": 3, "dish2": 4},
{"persons": 4, "dish1": 3, "dish2": 2},
{"persons": 4, "dish1": 3, "dish2": 2},
{"persons": 4, "dish1": 3, "dish2": 2, "dish3": 6},
I've separated the rows to better show the difference:
(4 persons) ordered (dish1=3 / dish2=4) three times
(4 persons with children) ordered (dish1=3 / dish2=4) one time
(4 persons) ordered (dish1=3 / dish2=2) two times
(4 persons) ordered (dish1=3 / dish2=2 / dish3=6) one time
The goal is to produce documents that represent the rows above, like this:
{
"type": {"persons": 4},
"dish1": 3,
"dish2": 4,
"tot": 3
}
For the group with children, it would be:
{
"type": {"persons": 4, "withChildren": true},
"dish1": 3,
"dish2": 4,
"tot": 1
}
I've already tried reading these solutions, which seem somewhat similar to what I need to accomplish, but because I'm very new to MongoDB I don't know whether this result is possible with a single query or whether I need to write a script.
The nested object in the result isn't essential, so the result could also be a plain object, like this:
{
"persons": 4,
"dish1": 3,
"dish2": 4,
"tot": 3
}
Thanks a lot for your help and understanding
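A sketch of one possible approach, assuming a single `$group` stage whose `_id` combines the criteria and dish fields is acceptable. Fields missing from a document should simply be left out of its group key, so the "with children" rows form their own group. The pipeline is untested here; the in-memory `countCombinations` helper is hypothetical and only checks the expected counts against the sample data.

```javascript
// Aggregation pipeline for the MongoDB shell (sketch): one group per distinct
// combination of criteria and dishes, counting the documents in each.
const groupPipeline = [
  {
    $group: {
      _id: {
        persons: "$persons",
        withChildren: "$withChildren",
        dish1: "$dish1",
        dish2: "$dish2",
        dish3: "$dish3",
      },
      tot: { $sum: 1 },
    },
  },
  { $sort: { tot: -1 } }, // most popular combinations first
];

// Hypothetical in-memory equivalent, producing the plain-object form.
function countCombinations(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // Build a key from the fields that define a group, skipping missing ones
    const group = {};
    for (const f of ["persons", "withChildren", "dish1", "dish2", "dish3"]) {
      if (doc[f] !== undefined) group[f] = doc[f];
    }
    const key = JSON.stringify(group);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts]
    .map(([key, tot]) => ({ ...JSON.parse(key), tot }))
    .sort((a, b) => b.tot - a.tot);
}

const orders = [
  { persons: 4, dish1: 3, dish2: 4 },
  { persons: 4, dish1: 3, dish2: 4 },
  { persons: 4, dish1: 3, dish2: 4 },
  { persons: 4, withChildren: true, dish1: 3, dish2: 4 },
  { persons: 4, dish1: 3, dish2: 2 },
  { persons: 4, dish1: 3, dish2: 2 },
  { persons: 4, dish1: 3, dish2: 2, dish3: 6 },
];
console.log(countCombinations(orders));
```

In the shell, the pipeline would be passed as `db.orders.aggregate(groupPipeline)`, where the collection name `orders` is an assumption; a trailing `$project` could reshape `_id` into the nested "type" form if needed.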

RxDart convert Stream based on previous value

Let's say I have a Stream that emits a List followed by single elements, like:
Stream.fromIterable([
[1, 2, 3],
4,
5,
]);
How can I convert it into a Stream that appends each new value to the previously emitted list and emits:
[1, 2, 3],
[1, 2, 3, 4],
[1, 2, 3, 4, 5],
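This is a running accumulation: a list starts a new accumulator and a single value is appended to it. RxDart's `scan` operator is meant for this kind of running state. The sketch below only illustrates the accumulator logic, written in JavaScript with a generator rather than Dart; `accumulateLists` is a hypothetical name.

```javascript
// Hypothetical sketch of the accumulator logic: a list starts a new
// accumulation, a single value is appended to the current one.
function* accumulateLists(source) {
  let acc = [];
  for (const value of source) {
    // Copy instead of mutating so earlier emissions stay unchanged
    acc = Array.isArray(value) ? [...value] : [...acc, value];
    yield acc;
  }
}

for (const emitted of accumulateLists([[1, 2, 3], 4, 5])) {
  console.log(emitted); // [1, 2, 3], then [1, 2, 3, 4], then [1, 2, 3, 4, 5]
}
```

In RxDart the same branching would go inside the accumulator callback passed to `scan`, seeded with an empty list; check the rxdart documentation for the exact callback signature in your version.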

MongoDB $setIntersection on an array of arrays

I'm pretty new to MongoDB, and I'm having a bit of trouble getting the intersection working in an aggregation.
Let's say I only have the following document in a collection:
{"ids" : [ [ 1, 4, 7, 10, 13 ], [ 1, 3, 5, 7, 9, 11, 13, 15 ], [1, 3, 5, 7] ] }
and I would like to return
{"intersection" : [1, 7]}
I'm doing:
db.collection.aggregate([ {$project: {intersection:{$setIntersection:"$ids"}}} ])
but that is returning
{"intersection" : [ [ 1, 4, 7, 10, 13 ], [ 1, 3, 5, 7, 9, 11, 13, 15 ], [1, 3, 5, 7] ] }
I'm assuming this is because "$ids" is interpreted as a single array of arrays of ints, rather than as variadic arguments where each argument is an array of ints.
Any idea how to get this working?
It looks like you want to find all elements that occur in every array inside of ids.
This can't be handled with $setIntersection directly, because the arrays are elements of a single field rather than separate fields in the document, and there isn't a way to refer to individual array elements in projections.
Here is a workaround for you; it may or may not work, depending on the rest of your aggregation needs:
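db.inter.aggregate([
  // remember how many sub-arrays each document has
  {$project: {ids: 1, sz: {$size: "$ids"}}},
  // flatten the array of arrays: one document per number
  {$unwind: "$ids"},
  {$unwind: "$ids"},
  // count how often each number occurs within each original document
  {$group: {_id: {_id: "$_id", ids: "$ids"}, count: {$sum: 1}, need: {$first: "$sz"}}},
  // keep only numbers that occurred once per sub-array
  {$project: {keep: {$eq: ["$need", "$count"]}}},
  {$match: {keep: true}},
  {$sort: {_id: 1}},
  // collect the surviving numbers back into one array per document
  {$group: {_id: "$_id._id", intersection: {$push: "$_id.ids"}}},
  {$project: {intersection: 1}}
])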
This figures out how many sub-arrays the array of arrays has and then counts how many times each number appears in the unwound set. If the count equals the size, the number must have been in every sub-array. This assumes, however, that no sub-array contains the same number twice.
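On MongoDB 3.4 or later, which added `$reduce`, another option is to fold `$setIntersection` over the sub-arrays, starting from the first one; because it uses set semantics, it does not rely on sub-arrays being free of duplicates. The pipeline below is an untested sketch, and `intersectAll` is a hypothetical in-memory helper that mirrors the same fold.

```javascript
// MongoDB 3.4+ pipeline (sketch): fold $setIntersection across the sub-arrays.
const intersectionPipeline = [
  {
    $project: {
      intersection: {
        $reduce: {
          input: "$ids",
          initialValue: { $arrayElemAt: ["$ids", 0] },
          in: { $setIntersection: ["$$value", "$$this"] },
        },
      },
    },
  },
];

// Hypothetical in-memory equivalent of the same fold.
function intersectAll(arrays) {
  if (arrays.length === 0) return [];
  return arrays.slice(1).reduce(
    (acc, arr) => acc.filter((x) => arr.includes(x)),
    [...new Set(arrays[0])] // de-duplicate, as set operators do
  );
}

const ids = [[1, 4, 7, 10, 13], [1, 3, 5, 7, 9, 11, 13, 15], [1, 3, 5, 7]];
console.log(intersectAll(ids)); // [ 1, 7 ]
```

Note that `$setIntersection` does not guarantee the order of its output, so the server may return `[7, 1]` instead.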

Mongodb Aggregation with very complex documents

I have a fairly complex document model that is structurally like this:
{
_id: 1,
"title": "I'm number one",
... (many other meta data text fields not desired in the summary)
"foo": {
"tom": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
"dick": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
"harry": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
... (Total of 14 fields in foo)
},
"bar": {
"joe": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
"fred": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
"bob": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
... (Total of 14 fields in bar)
},
"dodads": [
{
"contraption": 0,
"doohickey": 0,
"gewgaw": 0,
"gizmo": 0,
... (total of 15 elements in each doodad object)
},
{
"contraption": 0,
"doohickey": 0,
"gewgaw": 0,
"gizmo": 0,
...
},
... (total of 6 objects in dodads object array)
]
},
... (a couple hundred documents in total)
What I'm looking for is a summary of all the objects/arrays that have numeric data. I would like the result to be a document, in the original format, that contains the numeric fields summed across all documents. For now, let's say the documents all have the same structure.
The aggregation result would look like the following:
{
"foo": {
"tom": [35, 65, 13, 22, 36, 58, 93, 43, 56, 44, 23, 72],
"dick": [56, 87, 28, 49, 34, 22, 48, 86, 29, 23, 88, 29],
... (All 14 fields in foo)
},
"bar": {
"joe": [87, 28, 49, 34, 22, 48, 86, 29, 23, 88, 29, 47],
"fred": [13, 22, 36, 58, 93, 43, 56, 44, 23, 72, 35, 65],
... (All 14 fields in bar)
},
"dodads": [
{
"contraption": 45,
"doohickey": 88,
"gewgaw": 23,
"gizmo": 64,
... (All 15 elements in each doodad object)
},
{
"contraption": 12,
"doohickey": 73,
"gewgaw": 57,
"gizmo": 86,
...
},
... (All 6 objects in dodads object array)
]
}
I believe I can unwind the arrays, specify sums and projections and get exactly what I want with an extensive and verbose aggregation pipeline. I could also do multiple queries grabbing the component pieces (one that's just foo, a second that's just bar...).
What I'm wondering is, is there a shorthand way of specifying summarizations? For example, can I say I want the summary of foo or foo.tom and get back their contents summarized?
There are some things in your document structure that really won't help you here, primarily the use of sub-documents like these:
"foo": {
"tom": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
"dick": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
"harry": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
},
"bar": {
"joe": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
"fred": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
"bob": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
}
That makes things pretty difficult, as you can usually only get at the contained fields with notation such as "foo.tom", "bar.fred", etc. For reasons I have commented on before (best explained by following through the links), where possible you will make life easier by changing the structure of the documents:
"foo": [
{ "name": "tom", "values": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] },
{ "name": "dick", "values": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] },
{ "name": "harry", "values": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] }
],
This gives you better access for querying the elements than the explicit references you would otherwise need. The answers I have given before go through this in more depth.
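To show why the array form helps, here is a minimal in-memory sketch that sums the `values` arrays element-wise across documents, grouped by `name`, without knowing the names in advance. The helper `sumByName` and the shortened sample data are hypothetical.

```javascript
// Hypothetical sketch: element-wise sums of "values" per "name" across documents
// that use the array form foo: [{ name, values }, ...].
function sumByName(docs, field) {
  const totals = new Map();
  for (const doc of docs) {
    for (const { name, values } of doc[field]) {
      const acc = totals.get(name) || new Array(values.length).fill(0);
      values.forEach((v, i) => { acc[i] = (acc[i] || 0) + v; });
      totals.set(name, acc);
    }
  }
  // Back to the same array-of-objects shape
  return [...totals].map(([name, values]) => ({ name, values }));
}

const docs = [
  { foo: [{ name: "tom", values: [1, 2, 3] }, { name: "dick", values: [1, 1, 1] }] },
  { foo: [{ name: "tom", values: [4, 5, 6] }, { name: "dick", values: [0, 2, 4] }] },
];
console.log(sumByName(docs, "foo")); // tom → [5, 7, 9], dick → [1, 3, 5]
```

The same generic access is what lets an aggregation `$unwind` the `foo` array and group on `"$foo.name"`, which is not possible when the names are sub-document keys.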
As for finding the fields that are numeric, I asked this question here, which is basically a rewording of what you require. The response there gives an approach using mapReduce:
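map = function() {
  // true for numbers and numeric strings
  function isNumber(n) {
    return !isNaN(parseFloat(n)) && isFinite(n);
  }
  var numerics = [];
  for (var fn in this) {
    if (Array.isArray(this[fn])) {
      // example only: checks just the first element, more complex logic needed
      if (isNumber(this[fn][0])) {
        numerics.push({f: fn, v: this[fn]});
      }
    } else if (isNumber(this[fn])) {
      numerics.push({f: fn, v: this[fn]});
    }
  }
  emit(this._id, { n: numerics });
};
reduce = function(key, values) {
  // reduce must return the same shape as the emitted value
  var numerics = [];
  values.forEach(function(value) {
    numerics = numerics.concat(value.n);
  });
  return { n: numerics };
};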
This may be what you need, but beware that, starting from this skeleton, you may need to do some complex unwinding of the fields in your document in order to test them, as there is really no simple way to do it. You would basically have to add a lot of traversal logic to come up with what you want from the structure you have.
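As a rough illustration of that traversal logic, here is a client-side sketch, not the mapReduce approach above, that folds documents into one result with the same shape: numbers are added, arrays are summed position by position, nested objects are merged, and non-numeric fields such as `title` are dropped. It assumes every document shares the same structure, as the question allows; `sumNumeric` and `summarize` are hypothetical names. Map-reduce is deprecated as of MongoDB 5.0, which is one reason to consider doing this on the client with something like `db.collection.find().toArray()`.

```javascript
// Hypothetical client-side summary (not mapReduce): fold each document into an
// accumulator, summing numbers and keeping the document's shape.
function sumNumeric(acc, value) {
  if (value === undefined || value === null) return acc;
  if (typeof value === "number") {
    return (typeof acc === "number" ? acc : 0) + value;
  }
  if (Array.isArray(value)) {
    // Sum arrays position by position (assumes the same length in every document)
    const base = Array.isArray(acc) ? acc : [];
    const out = value.map((v, i) => sumNumeric(base[i], v));
    return out.some((v) => v !== undefined) ? out : undefined;
  }
  if (typeof value === "object") {
    const base = acc && typeof acc === "object" && !Array.isArray(acc) ? acc : {};
    const out = { ...base };
    for (const [key, v] of Object.entries(value)) {
      if (key === "_id") continue; // identifiers are not data to summarize
      const summed = sumNumeric(base[key], v);
      if (summed !== undefined) out[key] = summed;
    }
    return Object.keys(out).length > 0 ? out : undefined;
  }
  return acc; // strings, booleans, etc. are left out of the summary
}

function summarize(docs) {
  return docs.reduce((acc, doc) => sumNumeric(acc, doc), undefined);
}

const sample = [
  { _id: 1, title: "I'm number one", foo: { tom: [1, 2], dick: [3, 4] }, dodads: [{ gizmo: 1 }, { gizmo: 2 }] },
  { _id: 2, title: "I'm number two", foo: { tom: [10, 20], dick: [30, 40] }, dodads: [{ gizmo: 5 }, { gizmo: 7 }] },
];
console.log(JSON.stringify(summarize(sample)));
// {"foo":{"tom":[11,22],"dick":[33,44]},"dodads":[{"gizmo":6},{"gizmo":9}]}
```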
As you seem to be after "finding out information on the structure of the documents", you might want to look at the answers to this question:
MongoDB Get names of all keys in collection

Comparing document fields

Let's say I have this document structure:
{
"user": "John Doe",
"data": [1, 3, 2, 4, 1, 3],
"data_version": 1
}
Can I query on the data field so that I match all documents that share at least N values with a given array, at the same positions?
For example, given these data fields:
1, 3, 4, 2, 5, 1, 5
2, 5, 1, 4, 2, 3, 5
1, 3, 2, 5, 5, 4, 2
5, 2, 4, 1, 2, 2, 3
Searching for
1, 3, 3, 1, 5, 4, 3
with N minimum limit being 3, I'd get the 1st and 3rd document, but raising N to 4, I'd get only the 3rd document.
You will need to iterate over your collection. Something similar to the following should work:
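var N = 3;
var query = [1, 3, 3, 1, 5, 4, 3];
db.users.find().forEach(function(entry) {
  var similarity = 0;
  // compare values at the same position, up to the shorter of the two arrays
  var len = Math.min(entry.data.length, query.length);
  for (var i = 0; i < len; i++) {
    if (entry.data[i] === query[i]) { similarity++; }
  }
  // "at least N" matches, so >= rather than >
  if (similarity >= N) { printjson(entry); }
});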
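The per-document check in that loop can be pulled into a small function and tested against the example from the question; `countPositionalMatches` and `atLeast` are hypothetical helpers. With the query 1, 3, 3, 1, 5, 4, 3 the four sample arrays match in 3, 0, 4 and 2 positions, so N = 3 selects the 1st and 3rd documents and N = 4 only the 3rd, as described.

```javascript
// Hypothetical helper: number of positions where data and query hold the same value.
function countPositionalMatches(data, query) {
  let matches = 0;
  const len = Math.min(data.length, query.length);
  for (let i = 0; i < len; i++) {
    if (data[i] === query[i]) matches++;
  }
  return matches;
}

const rows = [
  [1, 3, 4, 2, 5, 1, 5],
  [2, 5, 1, 4, 2, 3, 5],
  [1, 3, 2, 5, 5, 4, 2],
  [5, 2, 4, 1, 2, 2, 3],
];
const query = [1, 3, 3, 1, 5, 4, 3];

console.log(rows.map((r) => countPositionalMatches(r, query))); // [ 3, 0, 4, 2 ]
// Indices of rows with at least n matches
const atLeast = (n) => rows.flatMap((r, i) => (countPositionalMatches(r, query) >= n ? [i] : []));
console.log(atLeast(3), atLeast(4)); // [ 0, 2 ] [ 2 ]
```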
Does this help?
I don't think you can do this in the query language itself; there is no construct for "match N out of M". I can't think of a way to do this with a different data model, either. I actually doubt there's anything like that in any query language, or is there?
I think you'd be left doing the matching inside your application.