Historical data structure on MongoDB - mongodb

I need to store pre-aggregated statistical data, recalculated at a fixed time interval, based on the following model:
I have a Product entity and a ProductGroup entity that acts as a container for Products. I can have 0..N Products and 0..N ProductGroups, with a many-to-many relationship between Products and ProductGroups.
Using my own business logic, I can calculate the order of every Product in every ProductGroup.
I will run this calculation periodically, for example via a cron job.
I would also like to store the history of every calculation (as versions) so that I can analyze how Product positions shift.
I have created a simple picture with this structure:
I currently use MongoDB and would really like to implement this structure in MongoDB without introducing new technologies.
My functional requirement: I need to be able to quickly get the position (and position offset) of a certain Product in a certain ProductGroup. For example, the position and offset of P2 in ProductGroup1. The output should be:
position: 1
offset : +2
I would also like to plot a chart showing the historical position changes of a certain Product within a certain ProductGroup. For example, for Product P2 in ProductGroup1 the output should be:
1(+2), 3(-3), 0
Is it possible to implement this with MongoDB and, if so, could you please describe the MongoDB collection structure needed to support it?

Since the only limitation is to "quickly query the data as I described at my question", the simplest way is to have a collection of snapshots with an array of products:
db.snapshots.insert({
    group: "group 1",
    products: [
        {id: "P2", position: 0, offset: 0},
        {id: "P4", position: 1, offset: 0},
        {id: "P5", position: 2, offset: 0},
        {id: "P6", position: 3, offset: 0}
    ],
    ver: 0
});
db.snapshots.insert({
    group: "group 1",
    products: [
        {id: "P3", position: 0, offset: 0},
        {id: "P5", position: 1, offset: 1},
        {id: "P1", position: 2, offset: 0},
        {id: "P2", position: 3, offset: -3},
        {id: "P4", position: 4, offset: 0}
    ],
    ver: 1
});
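The periodic job has to produce each new snapshot from the previous one. A minimal sketch of that step follows, assuming the offset is the previous position minus the new position (a sign convention inferred from the P2 and P5 entries above) and that a product missing from the previous snapshot gets offset 0. The helper name `buildSnapshot` and the sample ids are hypothetical.

```javascript
// Hypothetical helper: builds the next snapshot document for one group.
// Assumption: offset = previous position - new position (positive = moved up),
// and products absent from the previous snapshot get offset 0.
function buildSnapshot(group, orderedIds, previous) {
  const prevPositions = new Map();
  if (previous) {
    for (const p of previous.products) {
      prevPositions.set(p.id, p.position);
    }
  }
  const products = orderedIds.map((id, position) => ({
    id,
    position,
    offset: prevPositions.has(id) ? prevPositions.get(id) - position : 0,
  }));
  // The version counter simply increments from the previous snapshot.
  return { group, products, ver: previous ? previous.ver + 1 : 0 };
}

// Illustrative data only, not taken from the question.
const v0 = buildSnapshot("group 1", ["A", "B", "C"], null);
const v1 = buildSnapshot("group 1", ["C", "A", "D"], v0);
console.log(JSON.stringify(v1));
```

The returned object has the same shape as the documents inserted above, so it could be passed to the insert call as is.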
The index would be
db.snapshots.createIndex(
{ group: 1, ver: -1, "products.id": 1 },
{ unique: true, partialFilterExpression: { "products.id": { $exists: true } } }
);
And the query to fetch current position of a product in the group ("P4" in "group 1" in the example):
db.snapshots.find(
{ group: "group 1" },
{ _id: 0, products: { $elemMatch: { id: "P4" } } }
).sort( { ver:-1 } ).limit(1)
A query to fetch historical data is almost the same:
db.snapshots.find(
{ group: "group 1" },
{ _id: 0, products: { $elemMatch: {id: "P4" } }, ver: 1 }
).sort({ver:-1})
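The documents returned by the history query can then be turned into the compact form asked for in the question. This is a sketch only: `formatHistory` is a hypothetical helper, and the `ver: 2` entry in the sample data is made up to reproduce the question's expected output. Versions where the product is absent come back without a `products` field and are skipped here.

```javascript
// Hypothetical formatter for the documents returned by the history query.
// Each document is expected to look like { ver, products: [{ id, position, offset }] }.
function formatHistory(docs) {
  return docs
    // $elemMatch omits `products` when the product is not in that version.
    .filter((doc) => Array.isArray(doc.products) && doc.products.length > 0)
    .map((doc) => {
      const { position, offset } = doc.products[0];
      if (offset === 0) return String(position);
      const sign = offset > 0 ? "+" : ""; // negative numbers already carry "-"
      return `${position}(${sign}${offset})`;
    })
    .join(", ");
}

// Sample result, newest version first (the ver 2 entry is illustrative).
const history = [
  { ver: 2, products: [{ id: "P2", position: 1, offset: 2 }] },
  { ver: 1, products: [{ id: "P2", position: 3, offset: -3 }] },
  { ver: 0, products: [{ id: "P2", position: 0, offset: 0 }] },
];
console.log(formatHistory(history)); // 1(+2), 3(-3), 0
```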

Related

Mongo DB Time Series Performance on just secondary Index

I am building a Mongoose model to store survey response data. There are, however, different types of surveys with different response rates. One type of survey gets frequent answers (perhaps every few seconds), and its data is normally queried in chunks of time, e.g. from a startDate to an endDate. However, some surveys only get maybe 20 responses a month, and sometimes I would want to get all the data for that survey based only on the survey_id, without any time field constraints.
So my question is, do secondary indexes on time series collections work as well as they would on a non-time series collection?
My model looks like this:
const responseSchema = mongoose.Schema(
{
metaData: {
type: new mongoose.Schema({
survey_id: { type: mongoose.Schema.Types.ObjectId, ref: "survey", required: true },
}),
required: true,
},
createdAt: Date,
answers: { type: Map, of: mongoose.Mixed },
},
{
timeseries: {
timeField: "createdAt",
metaField: "metaData",
granularity: "seconds",
},
}
);
responseSchema.plugin(ts);
responseSchema.index({ "metaData.survey_id": 1, "createdAt": 1 });
I would expect normal queries that filter on the createdAt field to work well, but what if I query only by survey_id and don't use the time field? Will that still perform well, or do I get performance degradation by not using the time field with a time series collection?
Queries on this collection will always be based on the survey_id.

Filter out an object with arrays having specific ids on the basis of an existing collection - (Aggregate Framework)?

I have two objects:
const originTimeStamp = {
chats: '2021-06-25T12:21:21.835+00:00',
users: '2021-06-21T12:21:21.835+00:00',
history: '2021-06-18T12:21:21.835+00:00'
}
const controlIds = {
chats: ['1bfe','2bfs','3bhr'],
users: ['6jkj'],
history: ['8her'],
}
and a collection that typically has some logs related to user activities:
{
    controlId: '2bfs',
    createdAt: '2021-07-19T12:21:21.835+00:00'
},
{
    controlId: '6jkj',
    createdAt: '2021-06-18T12:21:21.835+00:00'
},
{
    controlId: '8her',
    createdAt: '2021-06-25T12:21:21.835+00:00'
},
What I want to do is filter the controlIds object so that an id is removed when it exists in the collection and the origin timestamp of its section is earlier than that document's createdAt. For example, for chats, '2021-06-25T12:21:21.835+00:00' < '2021-07-19T12:21:21.835+00:00' (the createdAt of id '2bfs' in the collection), so '2bfs' is removed.
Expected Result:
const controlIds = {
chats: ['1bfe','3bhr'],
users: ['6jkj'],
history: [],
}
Is there any way to achieve this with an aggregation pipeline? I tried to build a flow but could not get it to work. Here is the suggested flow I have tried so far:
$match the documents with a particular range of timestamps
$project only the controlId
$group them using $push to get all documents in an array
Assign this output to a variable and filter your object.
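The last step of that flow, filtering the object once the matching log documents are available, could look roughly like the sketch below. It runs against a plain array standing in for the pipeline output; the `logs` variable and the `filterControlIds` helper are hypothetical names. If an id has several log documents, this version compares against the most recent one, which is an assumption.

```javascript
const originTimeStamp = {
  chats: "2021-06-25T12:21:21.835+00:00",
  users: "2021-06-21T12:21:21.835+00:00",
  history: "2021-06-18T12:21:21.835+00:00",
};
const controlIds = {
  chats: ["1bfe", "2bfs", "3bhr"],
  users: ["6jkj"],
  history: ["8her"],
};
// Stand-in for the documents returned by the aggregation pipeline.
const logs = [
  { controlId: "2bfs", createdAt: "2021-07-19T12:21:21.835+00:00" },
  { controlId: "6jkj", createdAt: "2021-06-18T12:21:21.835+00:00" },
  { controlId: "8her", createdAt: "2021-06-25T12:21:21.835+00:00" },
];

// Hypothetical helper: drops an id when a log exists for it and the
// section's origin timestamp is earlier than that log's createdAt.
function filterControlIds(ids, origins, logDocs) {
  const latest = new Map(); // controlId -> most recent createdAt (ms)
  for (const { controlId, createdAt } of logDocs) {
    const t = Date.parse(createdAt);
    if (!latest.has(controlId) || t > latest.get(controlId)) {
      latest.set(controlId, t);
    }
  }
  const result = {};
  for (const [section, sectionIds] of Object.entries(ids)) {
    const origin = Date.parse(origins[section]);
    result[section] = sectionIds.filter(
      (id) => !(latest.has(id) && origin < latest.get(id))
    );
  }
  return result;
}

console.log(filterControlIds(controlIds, originTimeStamp, logs));
// { chats: [ '1bfe', '3bhr' ], users: [ '6jkj' ], history: [] }
```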

how to deal with many-to-many relationship in mongodb?

It is easy to deal with 1-1 (via refs) or 1-N (via populate virtuals) relations in MongoDB,
but how do I deal with N-M relations?
Suppose I have two entities, teacher and classroom:
Many teachers can access many classrooms.
Many classrooms can be accessed by many teachers.
teacher.schema
{
name:String;
//classrooms:Array;
}
classrooms.schema
{
name:String;
//teachers:Array
}
Is there a direct way (similar to populate virtuals) to maintain this N-M relation so that when a teacher is removed, the teachers in each classroom are updated automatically?
Should I use a third 'bridge' schema like TeacherToClassroom to record the relations?
I am thinking of something like this, a kind of computed value:
teacher.schema
{
name:String;
classrooms:(row)=>{
return db.classrooms.find({_id:{$elemMatch:row._id }})
}
}
classrooms.schema
{
name:String;
teachers:{Type:ObjectId[]}
}
That way I only manage the teacher ids in classrooms, and the classrooms property in the teacher schema is computed automatically.
The literature describes a few methods for implementing an m-n relationship in MongoDB.
The first method is two-way embedding. Here is an example using authors and books:
{
_id: 1,
name: "Peter Griffin",
books: [1, 2]
}
{
_id: 2,
name: "Luke Skywalker",
books: [2]
}
{
_id: 1,
title: "War of the oceans",
categories: ["drama"],
authors: [1, 2]
}
{
_id: 2,
title: "Into the loop",
categories: ["scifi"],
authors: [1]
}
The second option is to use one-way embedding. This means you only embed one of the documents into the other. Like so (a book with a category):
{
_id: 1,
name: "drama"
}
{
_id: 1,
title: "War of the oceans",
categories: [1],
authors: [1, 2]
}
When the data you are embedding becomes larger you could use something like the bucketing pattern to split it up: https://www.mongodb.com/blog/post/building-with-patterns-the-bucket-pattern
As you can see in the above example by embedding the documents you still only need to modify the data in one location. You do not need any intermediate tables to do that.
In some cases you might even be able to omit an entire document when it has no meaning as a stand-alone object: Absorbing N in a M:N relationship
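Applied to the teacher and classroom example from the question, one-way embedding could be sketched as follows. The code uses in-memory arrays as stand-ins for the two collections so it runs on its own; the sample data and the helper names `classroomsOf` and `removeTeacher` are hypothetical. The comments name the MongoDB operations that would play the same role.

```javascript
// In-memory stand-ins for the two collections (hypothetical sample data).
const teachers = [
  { _id: "t1", name: "Ann" },
  { _id: "t2", name: "Bob" },
];
const classrooms = [
  { _id: "c1", name: "Math", teachers: ["t1", "t2"] },
  { _id: "c2", name: "Art", teachers: ["t2"] },
];

// Computed side of the relation: only classrooms store teacher ids.
// MongoDB equivalent: db.classrooms.find({ teachers: teacherId })
function classroomsOf(teacherId) {
  return classrooms.filter((c) => c.teachers.includes(teacherId));
}

// Removing a teacher deletes its document and pulls its id from every classroom.
// MongoDB equivalent for the second part:
//   db.classrooms.updateMany({ teachers: teacherId },
//                            { $pull: { teachers: teacherId } })
function removeTeacher(teacherId) {
  const index = teachers.findIndex((t) => t._id === teacherId);
  if (index !== -1) teachers.splice(index, 1);
  for (const c of classrooms) {
    c.teachers = c.teachers.filter((id) => id !== teacherId);
  }
}

console.log(classroomsOf("t2").map((c) => c.name)); // [ 'Math', 'Art' ]
```

Because the ids live on one side only, there is a single place to update when a teacher is removed, and no bridge collection is needed.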

Handling a one-to-many relationship MongoDB schema design

I am currently designing the MongoDB schema for an event management system. The ER diagram is as follows:
The concept is fairly simple:
A company can create 1 or more events (estimating x500s of companies)
A client can attend 1 or more events from a multitude of companies (estimating x200 per client..also estimate x1000s of clients)
This is the classic many-to-many relationship, right?
Now I come from an RDBMS background, so my instincts on structuring a MongoDB schema might be incorrect. However I like MongoDB's flexible document nature and so I tried to come up with the following model structure:
Company model
{
_id: <CompanyID1>,
name: "Foo Bar",
events: [<EventID1>, <EventID2>, ...]
}
Event model
{ _id: <EventID1>,
name: "Rockestra",
location: LocationSchema, // (model below)
eventDate: "01/01/2019",
clients: [<ClientID1>, <ClientID2>, ...]
}
Client model
{ _id: <ClientID1>,
name: "Joe Borg"
}
Location model
{ _id: <LocationID1>,
name: "London, UK"
}
My typical query scenarios would probably be:
List all events organised by a specific company (including location details)
List all registered clients for a particular event
Would this design and approach be a sensible one to use given the cardinality I stated above? I guess one of the pitfalls of this design is that I could not get the company details if I just query the events model.
I would do
Company model
{
_id: <CompanyID1>,
name: "Foo Bar"
}
Event model
{ _id: <EventID1>,
name: "Rockestra",
location: LocationSchema, // embedded, not a reference
eventDate: "01/01/2019",
company: <CompanyID1> // indexed reference.
}
Client model
{ _id: <ClientID1>,
name: "Joe Borg",
events: [<EventID1>, <EventID2>, ...] // with index on events
}
List all events organised by a specific company (including location details):
db.events.find({company:<CompanyID1>})
List all registered clients for a particular event:
db.clients.find({events:<EventID1>})
It's not many-to-many unless a single event can be created by many companies. It looks like you are describing one-to-many.
This is the way I'd approach it.
Company model
{
_id:
name:
}
Client model
{
_id:
name:
}
ClientEvents model
{
_id
clientId
eventId
}
Event model
{
_id:
companyId:
name:
locationId:
eventDate:
}
Location model
{
_id:
name: "London, UK"
}
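With a separate ClientEvents model, listing the clients registered for an event takes two steps: find the link documents for the event, then load the matching clients. A rough in-memory sketch follows; the sample data and the helper name `clientsForEvent` are hypothetical. On the server, the same join could be expressed with the aggregation `$lookup` stage.

```javascript
// In-memory stand-ins for the Client and ClientEvents collections
// (hypothetical sample data).
const clients = [
  { _id: "cl1", name: "Joe Borg" },
  { _id: "cl2", name: "Jane Doe" },
  { _id: "cl3", name: "Sam Roe" },
];
const clientEvents = [
  { _id: 1, clientId: "cl1", eventId: "e1" },
  { _id: 2, clientId: "cl2", eventId: "e1" },
  { _id: 3, clientId: "cl2", eventId: "e2" },
];

// Step 1: collect the client ids linked to the event.
// Step 2: return the client documents with those ids.
function clientsForEvent(eventId) {
  const ids = new Set(
    clientEvents
      .filter((link) => link.eventId === eventId)
      .map((link) => link.clientId)
  );
  return clients.filter((client) => ids.has(client._id));
}

console.log(clientsForEvent("e1").map((c) => c.name)); // [ 'Joe Borg', 'Jane Doe' ]
```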

Mongoose - find documents with maximum no of counts

I am using Mongoose to fetch data from MongoDB. Here is my model.
var EmployeeSchema = new Schema({
name: String,
viewCount: { type: Number, default: 0 },
description: {
type: String,
default: 'No description'
},
departments: []
});
I need to find the top 5 employees with the highest viewCount, ordered by name.
I am thinking of fetching all employees with find(), then reading the viewCount property to produce the result. Is there a better way to get the desired result?
All you need here is .sort() and .limit():
Employee.find().sort({ "viewCount": -1, "name": 1 }).limit(5)
.exec(function(err,results) {
});
That returns the top 5 employees by views, with name used to break ties in viewCount.
If you want them ordered by "name" in your final five, then just sort that result:
Employee.find().sort({ "viewCount": -1, "name": 1 }).limit(5)
.exec(function(err,results) {
// sort it by name
results.sort(function(a,b) {
return a.name.localeCompare(b.name);
});
// do something with results
});
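To make the two sort steps concrete, here is the same logic applied to a plain array, so it can be run without a database. This is only an illustration of the ordering; `topFiveByViews` and the sample employees are hypothetical.

```javascript
// Hypothetical helper mirroring the query above on a plain array.
function topFiveByViews(employees) {
  return [...employees] // copy so the caller's array is not reordered
    // Same ordering as .sort({ viewCount: -1, name: 1 })
    .sort((a, b) => b.viewCount - a.viewCount || a.name.localeCompare(b.name))
    // Same as .limit(5)
    .slice(0, 5)
    // Final ordering of the five results by name
    .sort((a, b) => a.name.localeCompare(b.name));
}

const employees = [
  { name: "Zed", viewCount: 10 },
  { name: "Amy", viewCount: 9 },
  { name: "Bob", viewCount: 9 },
  { name: "Cat", viewCount: 8 },
  { name: "Dan", viewCount: 7 },
  { name: "Eve", viewCount: 1 },
];
console.log(topFiveByViews(employees).map((e) => e.name));
// [ 'Amy', 'Bob', 'Cat', 'Dan', 'Zed' ]
```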
You can sort by the view count and limit the search results to 5.
In code it might look like this:
Employee
.find()
.sort([['viewCount',-1], ['name',-1]])
.limit(5)
.exec(function(err, results){
//do something with the results here
});