MongoDB data modeling for a monitoring application

I am trying to understand the best data model for a monitoring application.
I have a monitoring application that runs every 30 minutes, gets stats from the target system, and stores the details in MongoDB.
Use case:
Products, Companies
There will be around 2000 products. Products will be added/removed, but growth will not be more than 10% every month, so I don't expect more than 3000 in the next year.
Companies are the consumers of each product. There will be 1 to 10 companies using each product. The consumer count will also go up and down.
So, on each run, we will get the list of products along with the corresponding companies. Product details will look like:
Product:
Product name
Total number (the current number available; this will change on every poll)
Product weight
Durability days (might change once in a while)
Companies list - the companies using this product
Sample data for product:
{
    "productName" : "Small Box",
    "total" : NumberLong(1000),
    "weight" : "1.5",
    "durability" : "20",
    "companies" : [
        {
            "name" : "Nike",
            "taken" : NumberLong(10)
        },
        {
            "name" : "Reebok",
            "taken" : NumberLong(20)
        }
    ]
}
Here, the taken count will keep changing on each poll.
Web application:
There will be 3 screens to show the details.
Dashboard - will show high-level stats (number of products, number of companies, total size, ...)
Products - list view (to view the complete list) - will show the details of a product on selecting any product.
Here, I will have to show the product details and list the companies that are consuming it.
Companies - list view (to view the complete list) - will show the details of a company on selecting any company.
Here, I will have to show the company details and all the products it is consuming.
This is how I am currently storing the data.
Dashboard collection - to show the stats like total products, total companies, ...
{
    "time" : ISODate("2018-10-01T09:30:00.000Z"),
    "totalProducts" : NumberLong(1000),
    "totalCompanies" : NumberLong(150)
}
Products collection - will have the following details:
{
    "productName" : "Small Box",
    "total" : NumberLong(1000),
    "weight" : "1.5",
    "durability" : "20",
    "companies" : [
        {
            "name" : "Nike",
            "taken" : NumberLong(10)
        },
        {
            "name" : "Reebok",
            "taken" : NumberLong(20)
        }
    ]
}
Companies collection - will have the following details:
{
    "companyName" : "Nike",
    "products" : [
        {
            "name" : "Small Box",
            "taken" : NumberLong(10)
        },
        {
            "name" : "Medium Box",
            "taken" : NumberLong(20)
        }
    ]
}
So, on each run, I generate a unique ID and add it to all the data being stored. I keep only the last 2 weeks of data in these collections; data older than 2 weeks is cleaned up every day.
When the user comes to the Dashboard, I sort to get the latest record and show its details. There is only one record per run in the Dashboard collection, covering the last 2 weeks of runs.
When the user comes to the Products screen, I still have to get the latest record from the Dashboard collection to get the unique ID, then go to the Products collection to get all the records for that unique ID, as there are around 2000 records per run. The same applies to the Companies collection.
I always have to show the latest data, and I am hitting 2 different collections whenever the user goes to the Products or Companies screen. A sketch of this read path is below.
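For clarity, this is roughly what the current read path looks like in the mongo shell (collection and field names like dashboard, runId, and time are assumptions based on the description above, not confirmed names):
// the latest dashboard record identifies the newest run
var latest = db.dashboard.find().sort({ time : -1 }).limit(1).next();
// then fetch that run's snapshot from the other collections
db.products.find({ runId : latest.runId });   // ~2000 docs per run
db.companies.find({ runId : latest.runId });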
Is there any better approach?

Please check the following; it may help you prepare the schema.
Note: Mongo version: 3.6.5
Products collection
/* 1 */
{
    "_id" : ObjectId("5bb1e270269004e06093e178"),
    "productName" : "Small Box",
    "total" : NumberLong(1000),
    "weight" : "1.5",
    "durability" : "20",
    "companies" : [
        ObjectId("5bb1e2d2269004e06093e17b"),
        ObjectId("5bb1e2d8269004e06093e17c")
    ],
    "date" : ISODate("2018-10-01T09:28:40.502Z")
}
/* 2 */
{
    "_id" : ObjectId("5bb1e293269004e06093e179"),
    "productName" : "Large Box",
    "total" : 1000.0,
    "weight" : "1.2",
    "durability" : "20",
    "companies" : [
        ObjectId("5bb1e2d8269004e06093e17c"),
        ObjectId("5bb1e2de269004e06093e17d")
    ],
    "date" : ISODate("2018-10-01T09:28:40.502Z")
}
/* 3 */
{
    "_id" : ObjectId("5bb1e29d269004e06093e17a"),
    "productName" : "Medium Box",
    "total" : 1000.0,
    "weight" : "1.2",
    "durability" : "20",
    "companies" : [
        ObjectId("5bb1e2d2269004e06093e17b"),
        ObjectId("5bb1e2d8269004e06093e17c"),
        ObjectId("5bb1e2de269004e06093e17d")
    ],
    "date" : ISODate("2018-07-01T09:28:40.502Z")
}
Company collection
/* 1 */
{
    "_id" : ObjectId("5bb1e2d2269004e06093e17b"),
    "companyName" : "Nike"
}
/* 2 */
{
    "_id" : ObjectId("5bb1e2d8269004e06093e17c"),
    "companyName" : "Reebok"
}
/* 3 */
{
    "_id" : ObjectId("5bb1e2de269004e06093e17d"),
    "companyName" : "PUMA"
}
Get a single product with its companies
db.getCollection('products').aggregate([
    { $match : { "_id" : ObjectId("5bb1e270269004e06093e178") } },
    { $lookup : {
        from : 'company',
        foreignField : '_id',
        localField : 'companies',
        as : 'companies'
    } }
])
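Given the sample data above, the result should look roughly like this (the $lookup replaces the array of ObjectIds with the matching company documents):
{
    "_id" : ObjectId("5bb1e270269004e06093e178"),
    "productName" : "Small Box",
    "total" : NumberLong(1000),
    "weight" : "1.5",
    "durability" : "20",
    "companies" : [
        { "_id" : ObjectId("5bb1e2d2269004e06093e17b"), "companyName" : "Nike" },
        { "_id" : ObjectId("5bb1e2d8269004e06093e17c"), "companyName" : "Reebok" }
    ],
    "date" : ISODate("2018-10-01T09:28:40.502Z")
}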
All products with company list
db.getCollection('products').aggregate([
    { $lookup : {
        from : 'company',
        foreignField : '_id',
        localField : 'companies',
        as : 'companies'
    } }
])
Company by ID and the products it uses
db.getCollection('company').aggregate([
    { $match : { "_id" : ObjectId("5bb1e2d2269004e06093e17b") } },
    { $lookup : {
        from : 'products',
        foreignField : 'companies',
        localField : '_id',
        as : 'products'
    } }
])
Also, by adding a date field to each product you can get the last week's data:
db.getCollection('products').aggregate([
    { $match : {
        date : {
            $gte : new Date(new Date() - 7 * 60 * 60 * 24 * 1000)
        }
    } },
    { $lookup : {
        from : 'company',
        foreignField : '_id',
        localField : 'companies',
        as : 'companies'
    } }
])
Get latest product with company
db.getCollection('products').aggregate([
    { $sort : { date : -1 } },
    { $limit : 1 },
    { $lookup : {
        from : 'company',
        foreignField : '_id',
        localField : 'companies',
        as : 'companies'
    } }
])
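If query speed matters at this scale, it may help to index the fields these pipelines filter and sort on. A sketch (the index choices are my assumption, not part of the original answer):
// supports the date filter and the "latest" sort
db.products.createIndex({ date : -1 })
// multikey index supporting the reverse lookup from a company _id to products
db.products.createIndex({ companies : 1 })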

So, on each run, I am generating a unique ID and adding it to all the data being stored. I am keeping only the last 2 weeks of data in these collections. Data older than 2 weeks is cleaned every day.
I wouldn't do it like this. I would use the _id that MongoDB automatically gives you, and make a run object that collects all those objects' IDs, because it's better to have one key per object than 672 (48 runs a day, at 30-minute intervals, times 14 days in 2 weeks).
I would make a run object that contains an array of company IDs, an array of product IDs, and a timestamp. And I would make it possible, if nothing has changed in 30 minutes, to store only the run-object _id of the first identical run plus a timestamp, so you save a lot of space. A sketch of both document shapes follows.
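A hedged sketch of what such run objects could look like (all field names here are my assumptions):
// full run object: one per poll, pointing at the product/company docs
{
    "timestamp" : ISODate("2018-10-01T09:30:00Z"),
    "products" : [
        ObjectId("5bb1e270269004e06093e178"),
        ObjectId("5bb1e293269004e06093e179")
    ],
    "companies" : [
        ObjectId("5bb1e2d2269004e06093e17b"),
        ObjectId("5bb1e2d8269004e06093e17c")
    ]
}
// compact run object: nothing changed, so point back at the first identical run
{
    "timestamp" : ISODate("2018-10-01T10:00:00Z"),
    "sameAsRun" : ObjectId("5bb1e270269004e06093e178")   // illustrative _id only
}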
Hopefully I understood you correctly, because it was a tough read for me.

Related

Problem with the speed of MongoDB queries

I have a CSV file (name: members) that contains 20,000 IDs, and a MongoDB collection (name: Customers) that contains 40 million documents with ID, phone_Number, etc. fields.
What I want to do is search for those 20,000 IDs (field name: "user id") in that 40-million-document collection (field name: id) and, where found, return the ID and the phone number. Here is the MongoDB query I am currently using:
db.getCollection("Customers").aggregate(
[
{
"$project" : {
"_id" : NumberInt(0),
"Customers" : "$$ROOT"
}
},
{
"$lookup" : {
"localField" : "Customers.id",
"from" : "members",
"foreignField" : "user id",
"as" : "members"
}
},
{
"$unwind" : {
"path" : "$members",
"preserveNullAndEmptyArrays" : false
}
},
{
"$project" : {
"Customers.id" : "$Customers.id",
"Customers.phone" : "$Customers.phone",
"_id" : NumberInt(0)
}
}
],
{
"allowDiskUse" : true
}
);
I am not so familiar with MongoDB, and the problem is that this query has been running for 5 hours now and still isn't giving any output.
Are there any suggestions for getting the result as fast as possible?
Are there any ways to speed up the query's performance?
Do you suggest any other query for this job?
Thank you!
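One common fix for this kind of join, sketched here as an assumption rather than a tested answer: drive the $lookup from the small collection and index the join key on the large one, so each of the 20,000 members does one indexed lookup instead of the whole 40M-document scan (field names follow the question; "phone_Number" is assumed to be the actual field name):
// index the large collection's join key first
db.Customers.createIndex({ "id" : 1 });
db.members.aggregate([
    { "$lookup" : {
        "from" : "Customers",
        "localField" : "user id",
        "foreignField" : "id",
        "as" : "matches"
    } },
    { "$unwind" : "$matches" },
    { "$project" : {
        "_id" : 0,
        "id" : "$matches.id",
        "phone" : "$matches.phone_Number"
    } }
])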

How to filter unmatched data from two collections

In the DB I have some sample data as follows:
items (collection name)
// Object 1
{
    "_id" : 1234,
    "itemCode" : 3001, // (Number)
    "category" : "Biscuts"
}
// Object 2
{
    "_id" : 1235,
    "itemCode" : 3002, // (Number)
    "category" : "Health products"
}
The above is the sample data in the items collection. There are many objects like this, each with a unique item code.
orders (collection name)
{
    "_id" : 1456,
    "customer" : "ram",
    "address" : "india",
    "type" : "order",
    "date" : "2018/08/20",
    "orderId" : "999",
    "itemcode" : "3001" // ('string')
}
The above is the orders sample data. This collection also has many objects, with repeating item codes and order IDs.
In the application, we have a tab called "items not billed". In this tab, we can see the items which were never used in an order. So from the above data, how can I show the items which were not used?
For example: from the above data the resulting itemCode should be 3002, because that item was not used even once. How can I get this output with one DB query?
You can use the below aggregation in MongoDB version 4.0.
db.items.aggregate([
    { $addFields : {
        itemCodeStr : { $toString : "$itemCode" }
    } },
    {
        $lookup : {
            from : "orders",
            localField : "itemCodeStr",
            foreignField : "itemcode",
            as : "matched-orders"
        }
    },
    {
        $match : {
            "matched-orders" : []   // field names with hyphens must be quoted in the shell
        }
    }
])
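Given the sample data above, this should return only the unused item (itemCodeStr is added by the pipeline):
{
    "_id" : 1235,
    "itemCode" : 3002,
    "category" : "Health products",
    "itemCodeStr" : "3002",
    "matched-orders" : [ ]
}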

Is a MongoDB query with 1 indexed field faster than one with multiple indexed fields?

In the following model, a product is owned by a customer and cannot be ordered by other customers. So I know that an order by customer 1 can only contain products owned by customer 1.
To give you an idea, here is a simple version of the data model:
Orders:
{
    'customer' : 1,
    'products' : [
        { 'productId' : 'a' },
        { 'productId' : 'b' }
    ]
}
Products:
{
    'id' : 'a',
    'name' : 'somename',
    'customer' : 1
}
I need to find orders that contain certain products. I know the product ID and customer ID, and I'm free to add/change indexes on my database.
Now my question is: is it faster to just add a single-field index on the product IDs and query using only that ID, or should I go for a compound index on customer and product ID?
I'm not sure if this matters, but in my real model the list of products is actually a list of objects which have an amount and a DBRef to the product. The customer is also a DBRef.
Here is a full order object:
{
    "_id" : 0,
    "_class" : "nl.pfa.myprintforce.models.Order",
    "orderNumber" : "e35f1fa8-b4c4-4d53-89c9-66abe94a3553",
    "status" : "ERROR",
    "created" : ISODate("2017-03-30T11:50:50.292Z"),
    "finished" : false,
    "orderTime" : ISODate("2017-01-12T12:50:50.292Z"),
    "expectedDelivery" : ISODate("2017-03-30T11:50:50.292Z"),
    "totalItems" : 19,
    "orderItems" : [
        {
            "amount" : 4,
            "product" : {
                "$ref" : "product",
                "$id" : NumberLong(16)
            }
        },
        {
            "amount" : 7,
            "product" : {
                "$ref" : "product",
                "$id" : NumberLong(26)
            }
        },
        {
            "amount" : 8,
            "product" : {
                "$ref" : "product",
                "$id" : NumberLong(7)
            }
        }
    ],
    "stateList" : [
        {
            "timestamp" : ISODate("2017-03-28T11:50:50.074Z"),
            "status" : "NEW",
            "message" : ""
        },
        {
            "timestamp" : ISODate("2017-03-29T11:50:50.075Z"),
            "status" : "IN_PRODUCTION",
            "message" : ""
        },
        {
            "timestamp" : ISODate("2017-03-30T11:50:50.075Z"),
            "status" : "ERROR",
            "message" : "Something went wrong"
        }
    ],
    "customer" : {
        "$ref" : "customer",
        "$id" : ObjectId("58dcf11a71571a24c475c044")
    }
}
When I have the following indexes:
1: {"customer" : 1, "orderItems.product" : 1}
2: {"orderItems.product" : 1}
both count queries (I use count to force finding all documents without the network transfer):
a: db.getCollection('order').find({
       'orderItems.product' : DBRef('product', 113)
   }).count()
b: db.getCollection('order').find({
       'customer' : DBRef('customer', ObjectId("58de009671571a07540a51d5")),
       'orderItems.product' : DBRef('product', 113)
   }).count()
Both run in the same time, ~0.007 seconds, on a set of 200k documents.
When I add 1000k records for a different customer (with different products), it does not affect the time at all.
An extended explain shows that:
query a just uses index 2.
query b uses index 2 but also considered index 1. Perhaps index intersection is used here?
Because if I drop index 1, the results are:
Query a: 0.007 seconds
Query b: 0.035 seconds (5x as long!)
So my conclusion is that with the right indexing, both methods work about as fast. However, if you do not need the compound index for anything else, it's just a waste of space and write speed.
So: a single-field index is better in my case.
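For reference, a sketch of the winning setup (collection name follows the queries above):
// the single-field multikey index that turned out to be sufficient
db.order.createIndex({ 'orderItems.product' : 1 })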

Working with data in mongo

I need help finding the best way to do a specific query in MongoDB.
I have a jobs collection being fed hourly, and I want to extract some useful information:
the last 10 records per jobname, taking the average of timedur after removing the highest and lowest values.
I can get the 10 last records by the timeend key.
Thanks for helping me solve this problem.
{
    "_id" : ObjectId("52446679e4b0961fd47b63a9"),
    "jobname" : "ftp_s_jobx",
    "descript" : "Get some file",
    "applic" : "PRD.TEAM",
    "applgroup" : "bil.jobx.set",
    "schedtab" : "bil.jobx.set",
    "owner" : "cdfiles",
    "runcount" : "1",
    "cyclic" : "Y",
    "times" : [
        { "timeelep" : "3674" },
        { "timestmp" : "20130926132537" },
        { "timedur" : "00:00:36" },
        { "timeend" : "26/09/2013 13:25:37" }
    ]
}
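Since timedur is stored as an "HH:MM:SS" string inside an array of single-key objects, one hedged approach is to do the trimming client-side in the shell; a minimal sketch, assuming the collection is named jobs:
// last 10 runs of one job, newest first (assumes _id order tracks insertion order)
var docs = db.jobs.find({ jobname : "ftp_s_jobx" }).sort({ _id : -1 }).limit(10).toArray();
// convert each timedur to seconds and sort ascending
var secs = docs.map(function (d) {
    var t = d.times.filter(function (o) { return o.timedur; })[0].timedur.split(":");
    return (+t[0]) * 3600 + (+t[1]) * 60 + (+t[2]);
}).sort(function (a, b) { return a - b; });
// drop the lowest and highest values, then average the rest
var trimmed = secs.slice(1, secs.length - 1);
var avg = trimmed.reduce(function (a, b) { return a + b; }, 0) / trimmed.length;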

How can I select a number of records per specific field using MongoDB?

I have a collection of documents in MongoDB, each of which has a "group" field that refers to the group that owns the document. The documents look like this:
{
    group: <objectID>,
    name: <string>,
    contents: <string>,
    date: <Date>
}
I'd like to construct a query which returns the most recent N documents for each group. For example, suppose there are 5 groups, each of which has 20 documents. I want to write a query which returns the top 3 for each group, i.e. 15 documents in total, 3 from each group. Each group gets 3, even if another group has a 4th that's more recent.
In the SQL world, I believe this type of query is done with "partition by" and a counter. Is there such a thing in MongoDB, short of doing N+1 separate queries for N groups?
You cannot do this using the aggregation framework yet. You can get the $max or top date value for each group, but the aggregation framework does not yet have a way to accumulate the top N, and there is no way to push the entire document into the result set (only individual fields).
So you have to fall back on MapReduce. Here is something that would work, but I'm sure there are many variants (all require somehow sorting an array of objects based on a specific attribute; I borrowed my solution from one of the answers to this question).
Map function - outputs group name as a key and the entire rest of the document as the value - but it outputs it as a document containing an array because we will try to accumulate an array of results per group:
map = function () {
emit(this.name, {a:[this]});
}
The reduce function accumulates all the documents belonging to the same group into one array (via concat). Note that if you optimize reduce to keep only the top five array elements by checking the date, then you won't need the finalize function, and you will use less memory while running MapReduce (it will also be faster).
reduce = function (key, values) {
    result = { a : [] };
    values.forEach(function (v) {
        result.a = v.a.concat(result.a);
    });
    return result;
}
Since I'm keeping all values for each key, I need a finalize function to pull out only the latest five elements per key.
final = function (key, value) {
    Array.prototype.sortByProp = function (p) {
        return this.sort(function (a, b) {
            return (a[p] < b[p]) ? 1 : (a[p] > b[p]) ? -1 : 0;
        });
    }
    value.a.sortByProp('date');
    return value.a.slice(0, 5);
}
Using a template document similar to the one you provided, you run this by calling the mapReduce command:
> db.top5.mapReduce(map, reduce, {finalize:final, out:{inline:1}})
{
    "results" : [
        {
            "_id" : "group1",
            "value" : [
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfe13"),
                    "name" : "group1",
                    "date" : ISODate("2013-04-17T20:07:59.498Z"),
                    "contents" : 0.23778377776034176
                },
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfe0e"),
                    "name" : "group1",
                    "date" : ISODate("2013-04-17T20:07:59.467Z"),
                    "contents" : 0.4434165076818317
                },
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfe09"),
                    "name" : "group1",
                    "date" : ISODate("2013-04-17T20:07:59.436Z"),
                    "contents" : 0.5935856597498059
                },
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfe04"),
                    "name" : "group1",
                    "date" : ISODate("2013-04-17T20:07:59.405Z"),
                    "contents" : 0.3912118375301361
                },
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfdff"),
                    "name" : "group1",
                    "date" : ISODate("2013-04-17T20:07:59.372Z"),
                    "contents" : 0.221651989268139
                }
            ]
        },
        {
            "_id" : "group2",
            "value" : [
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfe14"),
                    "name" : "group2",
                    "date" : ISODate("2013-04-17T20:07:59.504Z"),
                    "contents" : 0.019611883210018277
                },
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfe0f"),
                    "name" : "group2",
                    "date" : ISODate("2013-04-17T20:07:59.473Z"),
                    "contents" : 0.5670706110540777
                },
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfe0a"),
                    "name" : "group2",
                    "date" : ISODate("2013-04-17T20:07:59.442Z"),
                    "contents" : 0.893193120136857
                },
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfe05"),
                    "name" : "group2",
                    "date" : ISODate("2013-04-17T20:07:59.411Z"),
                    "contents" : 0.9496864483226091
                },
                {
                    "_id" : ObjectId("516f011fbfd3e39f184cfe00"),
                    "name" : "group2",
                    "date" : ISODate("2013-04-17T20:07:59.378Z"),
                    "contents" : 0.013748752186074853
                }
            ]
        },
        {
            "_id" : "group3",
            ...
        }
    ],
    "timeMillis" : 15,
    "counts" : {
        "input" : 80,
        "emit" : 80,
        "reduce" : 5,
        "output" : 5
    },
    "ok" : 1
}
Each result has _id as the group name and value as an array of the most recent five documents from the collection for that group name.
You need the aggregation framework: a $group stage piped into a $limit stage.
You also want to $sort the records in some way, or else the limit will have undefined behaviour; the returned documents will be pseudo-random (the order used internally by Mongo).
Something like this:
db.collection.aggregate([{ $group : ... }, { $sort : ... }, { $limit : ... }])
The documentation has more details if you want to know more.
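For reference, on MongoDB 3.2+ the aggregation framework can do this directly with $push and $slice; a sketch, not from the original answers, using the question's field names:
db.collection.aggregate([
    // newest first within each group
    { $sort : { group : 1, date : -1 } },
    // collect each group's documents into an array
    { $group : { _id : "$group", docs : { $push : "$$ROOT" } } },
    // keep only the top 3 per group
    { $project : { docs : { $slice : [ "$docs", 3 ] } } }
])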