Working with data in mongo - mongodb

I need help finding the best way to write a specific query in MongoDB.
I have a jobs collection that is fed hourly, and I want to extract some useful information from it:
For each jobname, take the last 10 records and compute the average of timedur, excluding the highest and lowest values.
I can already get the last 10 records by the timeend key.
Thanks for helping me solve this problem.
{
"_id" : ObjectId("52446679e4b0961fd47b63a9"),
"jobname" : "ftp_s_jobx",
"descript" : "Get some file",
"applic" : "PRD.TEAM",
"applgroup" : "bil.jobx.set",
"schedtab" : "bil.jobx.set",
"owner" : "cdfiles",
"runcount" : "1",
"cyclic" : "Y",
"times" : [
{
"timeelep" : "3674"
},
{
"timestmp" : "20130926132537"
},
{
"timedur" : "00:00:36"
},
{
"timeend" : "26/09/2013 13:25:37"
}
]
}
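One possible approach, sketched here under the assumption that the last 10 documents for a jobname have already been fetched (as described above) and that the averaging is done client-side in the shell or in Node. The helper names durationToSeconds, flattenTimes and trimmedAverageDuration are made up for this illustration, and the result is in seconds:

```javascript
// Convert an "HH:MM:SS" string into a number of seconds.
function durationToSeconds(hms) {
  const [h, m, s] = hms.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

// Merge the single-key objects in "times" into one flat object,
// e.g. [{timeelep: "3674"}, {timedur: "00:00:36"}] -> {timeelep: ..., timedur: ...}.
function flattenTimes(job) {
  return Object.assign({}, ...job.times);
}

// Average timedur in seconds, dropping one highest and one lowest value.
function trimmedAverageDuration(jobs) {
  const secs = jobs
    .map((job) => durationToSeconds(flattenTimes(job).timedur))
    .sort((a, b) => a - b);
  if (secs.length < 3) return null; // nothing would be left after trimming
  const kept = secs.slice(1, -1);
  return kept.reduce((sum, v) => sum + v, 0) / kept.length;
}
```

Because "times" is an array of one-key objects, flattening it first keeps the rest of the code simple. Storing timedur as a number of seconds in the first place would make a server-side average much easier.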

Related

mongodb lookup giving empty array

BED_MAST: this is one of my collections. It contains WARD_ID, and I want to join it to my other collection, WARD_MAST, which is shown further below.
{
"_id" : ObjectId("5e53c95a26b0e5ad0fb46376"),
"Bed_id" : "bd-10",
"WARD_ID" : "4",
"OCCUPIED" : "0",
"BED_TYPE" : "single AC"
}
{
"_id" : ObjectId("5e53c95a26b0e5ad0fb46377"),
"Bed_id" : "bd-11",
"WARD_ID" : "1",
"OCCUPIED" : "0",
"BED_TYPE" : "single Non AC"
}
WARD_MAST: this collection has ward_id, but when I run the lookup I do not get any data.
{
"_id" : ObjectId("5e53c95b26b0e5ad0fb46544"),
"patient_id" : null,
"ward_id" : 1,
"total_beds" : 55,
"ward_name" : "Ward 1"
}
{
"_id" : ObjectId("5e53c95d26b0e5ad0fb46545"),
"patient_id" : null,
"ward_id" : 2,
"total_beds" : 63,
"ward_name" : "Ward 2"
}
My query is:
db.BED_MAST.aggregate([{$lookup:{'from':"WARD_MAST",'localField':"WARD_ID",'foreignField':"ward_id",'as':"lookup_value"}}]).pretty()
Output (I have confirmed the data by running the equivalent query in MySQL, where it works fine):
{
"_id" : ObjectId("5e53c95b26b0e5ad0fb46388"),
"Bed_id" : "bd-28",
"WARD_ID" : "6",
"OCCUPIED" : "0",
"BED_TYPE" : "NICU",
"lookup_value" : [ ]
}
Only sample values are given; it is not possible to post all the data. I know this has been asked many times, but I have not been able to resolve it. I tried to solve it with $lookup, but lookup_value comes back empty. Is there anything I am missing?
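The problem is that WARD_ID in the BED_MAST collection holds string values, while ward_id in the WARD_MAST collection holds numbers. $lookup compares both value and type, so the string "1" never matches the number 1.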
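A minimal sketch of one way around the type mismatch, assuming MongoDB 4.0 or later (where the $toInt operator is available) and assuming every WARD_ID is a numeric string; $toInt raises an error on values it cannot convert. The temporary field name wardIdNum is made up for this example:

```javascript
// Convert the string WARD_ID to a number first, then join on the converted value.
const pipeline = [
  // Assumption: every WARD_ID is a numeric string such as "4".
  { $addFields: { wardIdNum: { $toInt: "$WARD_ID" } } },
  {
    $lookup: {
      from: "WARD_MAST",
      localField: "wardIdNum",
      foreignField: "ward_id",
      as: "lookup_value"
    }
  },
  // Drop the helper field so the output keeps its original shape.
  { $project: { wardIdNum: 0 } }
];

// In the mongo shell: db.BED_MAST.aggregate(pipeline).pretty()
```

The longer-term fix is probably to store both fields with the same type, so that no conversion is needed at query time.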

how to calculate time difference based on 2 fields in mongodb

I am familiar with simple MongoDB queries, but this one is a bit complex for me. What I am trying to achieve is to calculate the time difference of jsonObject.timestamp on the basis of the jsonObject.callID and jsonObject.mobile fields. For example, in the sample documents below, jsonObject.callID and jsonObject.mobile are the same for the two jsonObject.action values, start and end. So for each combination of jsonObject.callID and jsonObject.mobile, I have to subtract the start jsonObject.timestamp from the end one.
{
"_id" : ObjectId("5df9bc5ee5e7251030535df5"),
"_class" : "com.abc.mongo.docs.IvrMongoLog",
"jsonObject" : {
"mode" : "ivr",
"callID" : "33333",
"callee" : "128",
"action" : "end",
"mobile" : "218924535466",
"timestamp" : "2019-12-18 16:18:12"
}
}
{
"_id" : ObjectId("5df9bc3de5e7251030535df4"),
"_class" : "com.abc.mongo.docs.IvrMongoLog",
"jsonObject" : {
"mode" : "ivr",
"callID" : "33333",
"callee" : "128",
"action" : "start",
"mobile" : "218924535466",
"timestamp" : "2019-12-18 16:12:11"
}
}
So I am trying to achieve an output like the one below:
{
"callee" : "128",
"mobile" : "218924535466",
"callID" : "33333",
"minutes_of_call" : "6" // difference of "2019-12-18 16:18:12" - "2019-12-18 16:12:11"
}
I then need the same result for each subsequent pair of documents.
Kindly assist.
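A rough client-side sketch of the pairing logic, not a definitive answer. It assumes the documents have already been fetched into an array, that the timestamps can be treated as UTC, and that partial minutes are rounded down (which gives the 6 in the desired output). The function names parseTimestamp and callDurations are made up for this example:

```javascript
// Parse "YYYY-MM-DD HH:MM:SS" as UTC and return epoch milliseconds.
function parseTimestamp(ts) {
  return Date.parse(ts.replace(" ", "T") + "Z");
}

// Pair the start and end events by callID + mobile and compute whole minutes.
function callDurations(docs) {
  const calls = new Map();
  for (const { jsonObject: j } of docs) {
    const key = j.callID + "|" + j.mobile;
    if (!calls.has(key)) {
      calls.set(key, { callee: j.callee, mobile: j.mobile, callID: j.callID });
    }
    // Stores the time under "start" or "end", depending on the action.
    calls.get(key)[j.action] = parseTimestamp(j.timestamp);
  }
  const result = [];
  for (const c of calls.values()) {
    if (c.start === undefined || c.end === undefined) continue; // unpaired event
    result.push({
      callee: c.callee,
      mobile: c.mobile,
      callID: c.callID,
      minutes_of_call: String(Math.floor((c.end - c.start) / 60000))
    });
  }
  return result;
}
```

Storing timestamp as a real date rather than a string would make the same calculation possible inside an aggregation pipeline.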

Mongolite Aggregate Query with Added Fields

Problem
I have a collection hotelreviews_collection containing 1 million rows (documents) of reviews with various metadata. I would like to group by the Hotel_Name field and count how many times each hotel appears, but also return the fields "lat", "lng" and "Average_Score" with my query. These three extra fields have the same value in every document for a given Hotel_Name.
I am doing the queries in R using the mongolite library connected to a local MongoDB.
My Attempt
I have managed to retrieve the Hotel_Names and count their appearances using the code below, but cannot for the life of me get the other fields to work.
Current Code
overviewData <- M_CONNECTION$aggregate('[{"$group":{"_id":"$Hotel_Name", "count": {"$sum":1}, "average":{"$avg":"$distance"}}}]',
options = '{"allowDiskUse":true}')
I am completely lost on this, any and all help would be greatly appreciated.
I have solved my issue using the following code.
db.getCollection("hotelreviews_collection").aggregate(
[
{
"$group" : {
"_id" : {
"Hotel_Name" : "$Hotel_Name",
"lat" : "$lat",
"lng" : "$lng",
"Average_Score" : "$Average_Score"
},
"COUNT(Hotel_Name)" : {
"$sum" : NumberInt(1)
}
}
},
{
"$project" : {
"Hotel_Name" : "$_id.Hotel_Name",
"lat" : "$_id.lat",
"lng" : "$_id.lng",
"Average_Score" : "$_id.Average_Score",
"COUNT(Hotel_Name)" : "$COUNT(Hotel_Name)",
"_id" : NumberInt(0)
}
}
]
)
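An alternative sketch, relying on the statement above that lat, lng and Average_Score are identical for every document of a hotel: group on Hotel_Name alone and pick the extra fields up with the $first accumulator. The output field name count is chosen for this example. The last line serializes the pipeline to JSON, which could then be pasted into the mongolite aggregate call:

```javascript
// Group on the hotel name only; $first copies each constant field from one document per group.
const pipeline = [
  {
    $group: {
      _id: "$Hotel_Name",
      count: { $sum: 1 },
      lat: { $first: "$lat" },
      lng: { $first: "$lng" },
      Average_Score: { $first: "$Average_Score" }
    }
  }
];

// JSON form of the pipeline, usable as the string argument in R.
const pipelineJson = JSON.stringify(pipeline);
```

This keeps _id as a plain string instead of a compound document, so no $project stage is needed to flatten the result.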

Mongo data modeling for monitoring application

I am trying to understand the best data model for a monitoring application.
I have a monitoring application that runs every 30 minutes to get stats from the target system and stores the details in MongoDB.
Use case:
Products, Companies
There will be around 2000 products. Products will be added and removed, but growth will not exceed 10% per month, so I do not expect more than 3000 in the next year.
Companies are the consumers of the products. Each product is used by 1 to 10 companies. The number of consumers will also go up and down.
So, on each run, we get a list of products along with the corresponding companies. The product details look like this:
Product:
Product name
Total number (this will give the current number available and will change on every poll)
Product weight
Durability days (might change once in a while)
Companies List - Who are using this product
Sample data for product:
{
"productName" : "Small Box",
"total" : NumberLong(1000),
"weight" : "1.5",
"durability" : "20",
"companies" : [
{
"name" : "Nike",
"taken" : NumberLong(10)
},
{
"name" : "Reebok",
"taken" : NumberLong(20)
}
]
}
Here, the taken count will keep changing on each poll.
Web application:
There will be 3 screens to show the details.
Dashboard - shows high-level stats (number of products, number of companies, total size, ...)
Products - list view (to view the complete list); selecting a product shows its details
Here, I have to show the product details and list all the companies that consume the product.
Companies - list view (to view the complete list); selecting a company shows its details
Here, I have to show the company details and all the products it consumes.
This is how I am currently storing the data.
Dashboard collection - holds the stats details, such as total products, total companies, ...
{
"time" :
"totalProducts" : NumberLong(1000),
"totalCompanies" : "1.5",
}
Products collection - Will have the following details.
{
"productName" : "Small Box",
"total" : NumberLong(1000),
"weight" : "1.5",
"durability" : "20",
"companies" : [
{
"name" : "Nike",
"taken" : NumberLong(10)
},
{
"name" : "Reebok",
"taken" : NumberLong(20)
}
]
}
Companies collection - will have the following details
{
"companyName" : "Nike",
"products" : [
{
"name" : "Small Box",
"taken" : NumberLong(10)
},
{
"name" : "Medium Box",
"taken" : NumberLong(20)
}
]
}
So, on each run, I generate a unique ID and add it to all the data being stored. I keep only the last 2 weeks of data in these collections; data older than 2 weeks is cleaned up every day.
When the user opens the Dashboard, I sort to get the latest record and show its details. The Dashboard collection has only one record per run, covering the last 2 weeks.
When the user opens the Products screen, I still have to get the latest record from the Dashboard collection to obtain the unique ID, and then go to the Products collection to get all the records for that unique ID (around 2000 records per run). The same applies to the Companies collection.
Here, I always have to show the latest data, so I query 2 different collections whenever the user opens the Products or Companies screen.
Is there any better approach?
Please check the following; it should help you design the schema.
Note: Mongo version 3.6.5
Products collection
/* 1 */
{
"_id" : ObjectId("5bb1e270269004e06093e178"),
"productName" : "Small Box",
"total" : NumberLong(1000),
"weight" : "1.5",
"durability" : "20",
"companies" : [
ObjectId("5bb1e2d2269004e06093e17b"),
ObjectId("5bb1e2d8269004e06093e17c")
],
"date" : ISODate("2018-10-01T09:28:40.502Z")
}
/* 2 */
{
"_id" : ObjectId("5bb1e293269004e06093e179"),
"productName" : "Large Box",
"total" : 1000.0,
"weight" : "1.2",
"durability" : "20",
"companies" : [
ObjectId("5bb1e2d8269004e06093e17c"),
ObjectId("5bb1e2de269004e06093e17d")
],
"date" : ISODate("2018-10-01T09:28:40.502Z")
}
/* 3 */
{
"_id" : ObjectId("5bb1e29d269004e06093e17a"),
"productName" : "Medium Box",
"total" : 1000.0,
"weight" : "1.2",
"durability" : "20",
"companies" : [
ObjectId("5bb1e2d2269004e06093e17b"),
ObjectId("5bb1e2d8269004e06093e17c"),
ObjectId("5bb1e2de269004e06093e17d")
],
"date" : ISODate("2018-07-01T09:28:40.502Z")
}
Company collection
/* 1 */
{
"_id" : ObjectId("5bb1e2d2269004e06093e17b"),
"companyName" : "Nike"
}
/* 2 */
{
"_id" : ObjectId("5bb1e2d8269004e06093e17c"),
"companyName" : "Reebok"
}
/* 3 */
{
"_id" : ObjectId("5bb1e2de269004e06093e17d"),
"companyName" : "PUMA"
}
Get a single product with its companies
db.getCollection('products').aggregate([{
$match : { "_id" : ObjectId("5bb1e270269004e06093e178") } },
{ $lookup : {
from : 'company',
foreignField : '_id',
localField : 'companies',
as : 'companies'
}
}
])
All products with their company list
db.getCollection('products').aggregate([
{ $lookup : {
from : 'company',
foreignField : '_id',
localField : 'companies',
as : 'companies'
}
}
])
Company by id and the products it uses
db.getCollection('company').aggregate([{
$match : { "_id" : ObjectId("5bb1e2d2269004e06093e17b") } },
{ $lookup : {
from : 'products',
foreignField : 'companies',
localField : '_id',
as : 'products'
}
}
])
Also, by adding a date field to each product, you can get the last week's data with:
db.getCollection('products').aggregate([{
$match : {
date: {
$gte: new Date(new Date() - 7 * 60 * 60 * 24 * 1000)
} } },
{ $lookup : {
from : 'company',
foreignField : '_id',
localField : 'companies',
as : 'companies'
}
}
])
Get the latest product with its companies
db.getCollection('products').aggregate([
{ $sort : { date : -1} },
{ $limit : 1},
{ $lookup : {
from : 'company',
foreignField : '_id',
localField : 'companies',
as : 'companies'
}
}
])
So, On each run, I am generating unique Id and adding this id to all the data being stored. I am keeping only last 2 weeks of data in these collections. Data older than 2 weeks will be cleaned every day.
I wouldn't do it like this. I would use the _id that MongoDB automatically gives you, and make a run object that collects all of those objects' ids, because it is better to have 1 key per object than 672 keys (48 runs per day at 30-minute intervals x 14 days in 2 weeks).
I would make a run object that contains an array of company_ids, an array of product_ids, and a timestamp. I would also make it possible that, if nothing has changed in 30 minutes, you store only the _id of an earlier run object (the first one that was the same) and a timestamp. That way you save a lot of space.
Hopefully I understood you correctly, because it was a tough read for me.
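To make the run object idea more concrete, here is a small sketch. Everything in it is hypothetical: the function name buildRun, the field names productIds, companyIds and sameAs, and the use of plain strings for the ids (real ObjectId values would need to be compared with their equals method or as strings):

```javascript
// True when both arrays hold the same ids in the same order.
function sameIds(a, b) {
  return a.length === b.length && a.every((id, i) => id === b[i]);
}

// Build the run document for the current poll.
// lastFullRun is the most recent run that stored full id arrays, or null on the first poll.
function buildRun(lastFullRun, productIds, companyIds, timestamp) {
  const unchanged =
    lastFullRun !== null &&
    sameIds(lastFullRun.productIds, productIds) &&
    sameIds(lastFullRun.companyIds, companyIds);
  if (unchanged) {
    // Nothing changed: store only a pointer to the earlier run plus the timestamp.
    return { sameAs: lastFullRun._id, timestamp: timestamp };
  }
  return { productIds: productIds, companyIds: companyIds, timestamp: timestamp };
}
```

With this layout, showing the latest data means reading the newest run document and, if it has a sameAs field, following it once to the run that holds the id arrays.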

How can I select a number of records per a specific field using mongodb?

I have a collection of documents in mongodb, each of which has a "group" field that refers to a group that owns the document. The documents look like this:
{
group: <objectID>
name: <string>
contents: <string>
date: <Date>
}
I'd like to construct a query which returns the most recent N documents for each group. For example, suppose there are 5 groups, each of which has 20 documents. I want to write a query which will return the top 3 for each group, which would return 15 documents, 3 from each group. Each group gets 3, even if another group has a 4th that's more recent.
In the SQL world, I believe this type of query is done with "partition by" and a counter. Is there such a thing in mongodb, short of doing N+1 separate queries for N groups?
You cannot do this using the aggregation framework yet. You can get the $max or top date value for each group, but the aggregation framework does not yet have a way to accumulate the top N, and there is no way to push the entire document into the result set (only individual fields).
So you have to fall back on MapReduce. Here is something that would work, but I'm sure there are many variants (all require somehow sorting an array of objects based on a specific attribute; I borrowed my solution from one of the answers in this question).
Map function - outputs the group name as the key and the entire rest of the document as the value, but it outputs it as a document containing an array because we will try to accumulate an array of results per group:
map = function () {
emit(this.name, {a:[this]});
}
The reduce function will accumulate all the documents belonging to the same group into one array (via concat). Note that if you optimize reduce to keep only the top five array elements by checking date, then you won't need the finalize function, and you will use less memory while running mapReduce (it will also be faster).
reduce = function (key, values) {
var result = {a:[]};
values.forEach( function(v) {
result.a = v.a.concat(result.a);
} );
return result;
}
Since I'm keeping all values for each key, I need a finalize function to pull out only the latest five elements per key.
final = function (key, value) {
Array.prototype.sortByProp = function(p){
return this.sort(function(a,b){
return (a[p] < b[p]) ? 1 : (a[p] > b[p]) ? -1 : 0;
});
}
value.a.sortByProp('date');
return value.a.slice(0,5);
}
Using a template document similar to the one you provided, you run this by calling the mapReduce command:
> db.top5.mapReduce(map, reduce, {finalize:final, out:{inline:1}})
{
"results" : [
{
"_id" : "group1",
"value" : [
{
"_id" : ObjectId("516f011fbfd3e39f184cfe13"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.498Z"),
"contents" : 0.23778377776034176
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0e"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.467Z"),
"contents" : 0.4434165076818317
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe09"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.436Z"),
"contents" : 0.5935856597498059
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe04"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.405Z"),
"contents" : 0.3912118375301361
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfdff"),
"name" : "group1",
"date" : ISODate("2013-04-17T20:07:59.372Z"),
"contents" : 0.221651989268139
}
]
},
{
"_id" : "group2",
"value" : [
{
"_id" : ObjectId("516f011fbfd3e39f184cfe14"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.504Z"),
"contents" : 0.019611883210018277
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0f"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.473Z"),
"contents" : 0.5670706110540777
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe0a"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.442Z"),
"contents" : 0.893193120136857
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe05"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.411Z"),
"contents" : 0.9496864483226091
},
{
"_id" : ObjectId("516f011fbfd3e39f184cfe00"),
"name" : "group2",
"date" : ISODate("2013-04-17T20:07:59.378Z"),
"contents" : 0.013748752186074853
}
]
},
{
"_id" : "group3",
...
}
]
}
],
"timeMillis" : 15,
"counts" : {
"input" : 80,
"emit" : 80,
"reduce" : 5,
"output" : 5
},
"ok" : 1,
}
Each result has _id as the group name and value as an array of the five most recent documents from the collection for that group name.
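The optimization mentioned earlier (trimming inside reduce so that less data is carried around) could look roughly like the sketch below. It is only an illustration: the name reduceTopN and the local variable topN are made up, and it assumes date holds Date objects so that subtracting two of them gives a number.

```javascript
// Variant of reduce that keeps only the newest topN documents per key.
function reduceTopN(key, values) {
  var topN = 5;
  var merged = [];
  // Each value has the shape {a: [...]}, both for freshly emitted and already reduced values.
  values.forEach(function (v) {
    merged = merged.concat(v.a);
  });
  // Sort newest first, then cut the array down to topN elements.
  merged.sort(function (x, y) {
    return y.date - x.date;
  });
  return { a: merged.slice(0, topN) };
}
```

The return value has the same shape as the emitted values, so the function can safely be re-applied to its own output, which MapReduce requires. Everything is kept inside the function body because functions passed to mapReduce cannot see shell variables unless they are supplied through the scope option.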
You need the aggregation framework: a $group stage piped into a $limit stage...
You also want to $sort the records in some way, or else the limit will have undefined behaviour and the returned documents will be pseudo-random (the order used internally by Mongo).
Something like this:
db.collection.aggregate([{$group:...},{$sort:...},{$limit:...}])
See the documentation if you want to know more.
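For completeness, a sketch of how a per-group top N can be written as an aggregation on newer servers. Note that it swaps $limit for the $slice array operator, which to my knowledge needs MongoDB 3.2 or later, and that it relies on $push keeping the order produced by the preceding $sort. The field names group and date come from the question; docs and N are names chosen for this example:

```javascript
const N = 3; // number of documents to keep per group

const pipeline = [
  // Newest documents first within each group.
  { $sort: { group: 1, date: -1 } },
  // Collect the whole documents of each group into an array.
  { $group: { _id: "$group", docs: { $push: "$$ROOT" } } },
  // Keep only the first N elements of each array.
  { $project: { docs: { $slice: ["$docs", N] } } }
];

// In the mongo shell: db.collection.aggregate(pipeline)
```

Every document of a group is pushed into the array before it is sliced, so this may be heavy for very large groups.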