Trying to aggregate on multiple fields Mongoose - mongodb

I have created a time entry system in which users can enter the percentage of time spent on a task within a given time period. Each record looks like the following. I replaced the user _id values with explicit names to make them easier to visualize.
"project_name": "first_project",
"linked_project": "5bd057f5d4b8173d88b7fe47",
"percentage": 25,
"user": {
"$oid": "Steve"
},
"project_name": "first_project",
"linked_project": "5bd057f5d4b8173d88b7fe47",
"percentage": 50,
"user": {
"$oid": "Steve"
},
"project_name": "second_project",
"linked_project": "5bd057f5d4b8173d88b7fe48",
"percentage": 25,
"user": {
"$oid": "Steve"
},
"project_name": "second_project",
"linked_project": "5bd057f5d4b8173d88b7fe48",
"percentage": 75,
"user": {
"$oid": "Mary"
},
I'm trying to first group by Person and then by project. Basically I want a total of how much each user has spent on a particular task. Not sure if what I am trying to achieve is even possible. I have included what I am trying to achieve below:
Example output:
[
{
user: Steve,
projects: [
first_project: 75,
second_project: 25
]
},
{
user: Mary,
projects: [
second_project: 75
]
}
]
I've tried a variety of ways to achieve this and I haven't come close. Hopefully someone has some insight on how to achieve this.

You can use two $group stages: the first sums the percentages for each user and project_name combination, and the second pushes all of a user's project totals into one document per user.
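db.colname.aggregate([
  // Stage 1: total percentage per (user, project_name) pair
  {"$group": {
    "_id": {
      "user": "$user",
      "project_name": "$project_name"
    },
    "time": {"$sum": "$percentage"}
  }},
  // Stage 2: collect each user's project totals into one array
  {"$group": {
    "_id": "$_id.user",
    "projects": {"$push": {"project_name": "$_id.project_name", "time": "$time"}}
  }}
])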
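To make it easier to see what the two $group stages compute, here is a rough in-memory sketch in plain JavaScript over the sample records from the question. The helper name totalsByUser is made up for illustration, and the user ids are plain strings as in the question; a real $group does not guarantee output order, while this sketch keeps insertion order.

```javascript
// Sample time entries from the question (user ids replaced by names).
const entries = [
  { project_name: "first_project", percentage: 25, user: "Steve" },
  { project_name: "first_project", percentage: 50, user: "Steve" },
  { project_name: "second_project", percentage: 25, user: "Steve" },
  { project_name: "second_project", percentage: 75, user: "Mary" },
];

// Hypothetical helper: mirrors the two $group stages in memory.
function totalsByUser(records) {
  // Stage 1: sum percentage per (user, project_name) pair.
  const perPair = new Map();
  for (const { user, project_name, percentage } of records) {
    const key = JSON.stringify([user, project_name]);
    const entry = perPair.get(key) || { user, project_name, time: 0 };
    entry.time += percentage;
    perPair.set(key, entry);
  }
  // Stage 2: push each pair's total into its user's projects array.
  const perUser = new Map();
  for (const { user, project_name, time } of perPair.values()) {
    if (!perUser.has(user)) perUser.set(user, { _id: user, projects: [] });
    perUser.get(user).projects.push({ project_name, time });
  }
  return [...perUser.values()];
}

const totals = totalsByUser(entries);
console.log(JSON.stringify(totals, null, 2));
```

For the sample data this gives Steve 75 on first_project and 25 on second_project, and Mary 75 on second_project.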
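To get each user's projects as a single document keyed by project name, use this accumulator in the last $group stage instead of $push ($mergeObjects requires MongoDB 3.6 or later):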
"projects":{"$mergeObjects":{"$arrayToObject":[[["$_id.project_name","$time"]]]}}

Related

MongoDB - Project specific element from array (big data)

I got a big array with data in the following format:
{
"application": "myapp",
"buildSystem": {
"counter": 2361.1,
"hostname": "host.com",
"jobName": "job_name",
"label": "2361",
"systemType": "sys"
},
"creationTime": 1517420374748,
"id": "123",
"stack": "OTHER",
"testStatus": "PASSED",
"testSuites": [
{
"errors": 0,
"failures": 0,
"hostname": "some_host",
"properties": [
{
"name": "some_name",
"value": "UnicodeLittle"
},
<MANY MORE PROPERTIES>,
{
"name": "sun",
"value": ""
}
],
"skipped": 0,
"systemError": "",
"systemOut": "",
"testCases": [
{
"classname": "IdTest",
"name": "has correct representation",
"status": "PASSED",
"time": "0.001"
},
<MANY MORE TEST CASES>,
{
"classname": "IdTest",
"name": "normalized values",
"status": "PASSED",
"time": "0.001"
}
],
"tests": 8,
"time": 0.005,
"timestamp": "2018-01-31T17:35:15",
"title": "IdTest"
}
<MANY MORE TEST SUITES >,
]}
I can distinguish three main structures that hold a lot of data: TestSuites, Properties, and TestCases. My task is to sum the times of all TestSuites so that I can get the total duration of the test. Since the Properties and TestCases are huge, the query cannot complete. I would like to select only the "time" value from TestSuites, but it seems to conflict with the "time" of TestCases in my query:
db.my_tests.find(
{
application: application,
creationTime:{
$gte: start_date.valueOf(),
$lte: end_date.valueOf()
}
},
{
application: 1,
creationTime: 1,
buildSystem: 1,
"testSuites.time": 1,
_id:1
}
)
Is it possible to project only the "time" properties from TestSuites without loading the whole schema? I already tried testSuites: 1 and testSuites.$.time: 1 without success. Please note that TestSuites is an array of one element containing a dictionary.
I already checked this similar post without success:
Mongodb update the specific element from subarray
The following code prints the summed TestSuite duration for each document:
query = db.my_collection.aggregate(
  [
    { $match: {
        application: application,
        creationTime: {
          $gte: start_date.valueOf(),
          $lte: end_date.valueOf()
        }
      }
    },
    { $project:
      { duration: { $sum: "$testSuites.time" } }
    }
  ]
).forEach(function(doc) {
  print(doc._id)
  print(doc.duration)
})
Is it possible to project only the "time" properties from TestSuites
without loading the whole schema? I already tried testSuites: 1,
testSuites.$.time
To answer your question about projecting only the time property of the testSuites documents: you can simply project it with "testSuites.time": 1 (the quotes are required for dot-notation field references).
My task is to sum all times from each TestSuite so that I can get the
total duration of the test. Since the properties and TestCases are
huge, the query cannot complete
As for your task, I suggest trying MongoDB's aggregation framework for your calculations and document transformations. The aggregation option {allowDiskUse: true} will also help if you are processing "large" documents, because it lets pipeline stages that exceed the memory limit write temporary data to disk.
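On the worry that "testSuites.time" clashes with the "time" inside testCases: as far as I know, the path "$testSuites.time" only resolves to the time field of each element of testSuites, and $sum in a $project adds up that array. A rough plain-JavaScript sketch of that evaluation follows; the helper name suiteDuration and the sample values are invented for illustration.

```javascript
// Trimmed-down document with the same shape as in the question.
// The second suite and all time values are made up for illustration.
const doc = {
  id: "123",
  testSuites: [
    { title: "IdTest", time: 0.5, testCases: [{ time: "0.001" }, { time: "0.001" }] },
    { title: "OtherTest", time: 0.25, testCases: [{ time: "0.002" }] },
  ],
};

// Hypothetical helper: what { $sum: "$testSuites.time" } evaluates to.
// "$testSuites.time" resolves to the array of top-level suite times,
// so the nested testCases[].time values are never visited.
function suiteDuration(document) {
  return (document.testSuites || [])
    .map((suite) => suite.time)
    .filter((t) => typeof t === "number") // $sum ignores non-numeric values
    .reduce((sum, t) => sum + t, 0);
}

console.log(suiteDuration(doc));
```

Only the two suite-level times are added here; the string times under testCases play no part.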

Mongo DB Join Collections

I am pretty new to MongoDB and, coming from a T-SQL background, I am finding it a little hard to understand how joins work in Mongo.
I have a very simple case where I have a "User Table.. err.. Collection" and a "User Audit Collection".
My User Collection looks something like this.
{
"_id": LUUID("d991e92a-766c-054e-9ad8-1c902acc6efc"),
"System": {
"VisitCount": 1
},
"UserData": {
"Uid": "46831",
"UserName": "abc.",
"FirstName": "abv",
"LastName": "test",
"EmailId": "abc#gmail.com",
"Region": "Georgia",
"Postal": "10000",
"Country": "United States",
"Phone": "800-000-1734",
}
}
and a User Audit Table :
{
"_id": LUUID("9561a583-0afe-e844-a090-43ffdab46ed2"),
"UserId": LUUID("914ed252-3fc7-d84c-9731-f382e7cf400b"),
"StartDateTime": ISODate("2016-05-12T04:07:37.299Z"),
"EndDateTime": ISODate("2016-05-12T04:07:42.715Z"),
"SaveDateTime": ISODate("2016-05-12T04:28:23.186Z"),
"Browser": {
"BrowserVersion": "50.0",
"BrowserMajorName": "Chrome",
"BrowserMinorName": "50.0"
},
"Pages": [
{
"DateTime": ISODate("2016-05-12T04:07:37.365Z"),
"Duration": 5416,
"Item": {
"_id": LUUID("f293157a-f22d-fe49-a7b0-f66f412408fe"),
"Language": "en",
"Version": 1
},
"Url": {
"Path": "/"
},
"VisitPageIndex": 1
},
{
"DateTime": ISODate("2016-05-12T04:07:42.781Z"),
"Duration": 0,
"Item": {
"Version": 0
},
"SitecoreDevice": {
"_id": LUUID("df7f5dfe-c089-994d-9aa3-b5fbd009c9f3"),
"Name": "Default"
},
"MvTest": {
"ValueAtExposure": 0
},
"Url": {
"Path": "/Sample Page1"
},
"VisitPageIndex": 2
}
]
}
I need a flat view where each row holds all the user information and the pages the user visited.
The audit information can be grouped by user or repeated per user. My main idea is to combine the user details with the page visit history.
I am looking for the equivalent of a left outer join, something like:
Select * from usertable, useraudittable
on usertable.id = userAuditTable.UserId
group by userID.
MongoDB is a document store and does not offer many relational operations such as joins. Normally you have to do the join programmatically, running multiple queries and combining the data in your application code.
MongoDB 3.2 introduced the $lookup stage for the aggregation pipeline, which performs a left outer join and does roughly what you are looking for. You can use something like this (mongo shell JavaScript syntax as an example):
db.user.aggregate([{
  $lookup: {
    from: "audit",
    localField: "_id",
    foreignField: "UserId",
    as: "VisitedPages"
  }
}]);
If you are on MongoDB 3.2 or later you can use this approach; otherwise you will need to run multiple queries from your application.
Take a look at the documentation
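To illustrate the left-outer-join behaviour of $lookup, here is a small in-memory sketch in plain JavaScript. The helper name lookup and the simplified documents (string ids instead of LUUIDs, most fields dropped) are assumptions for illustration only.

```javascript
// Made-up, simplified documents; field names follow the question.
const users = [
  { _id: "u1", UserData: { UserName: "abc." } },
  { _id: "u2", UserData: { UserName: "xyz" } },
];
const audits = [
  { _id: "a1", UserId: "u1", Pages: [{ Url: { Path: "/" } }] },
  { _id: "a2", UserId: "u1", Pages: [{ Url: { Path: "/Sample Page1" } }] },
];

// Hypothetical helper mirroring $lookup's left-outer-join behaviour:
// every local document is kept, and the matching documents from the
// other collection are attached as an array (empty when none match).
function lookup(local, foreign, localField, foreignField, as) {
  return local.map((doc) => ({
    ...doc,
    [as]: foreign.filter((f) => f[foreignField] === doc[localField]),
  }));
}

const joined = lookup(users, audits, "_id", "UserId", "VisitedPages");
console.log(joined.map((u) => [u._id, u.VisitedPages.length]));
```

Each user comes back once, with an array of audit documents. If you need one flat row per audit record, a following $unwind on "$VisitedPages" should do it; note that $unwind drops users with no audits unless preserveNullAndEmptyArrays is set to true.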

Algolia AND search through an array

I am looking for a way to search in Algolia a record where at least one element of an array meets several conditions.
As an example, imagine this kind of record:
{
"name": "Shoes",
"price": 100,
"prices": [
{
"start": 20160101,
"end": 20160131,
"price": 50,
},
{
"start": 20160201,
"end": 20160229,
"price": 80,
}
]
}
I am looking for a way to do a query like the following:
prices.price<60 AND prices.start<=20160210 AND prices.end>=20160210
(A product where the price is less than 60 for the given date)
That query should not return anything because the price condition is not met for that date but the record is returned anyway. Probably because the condition is met "globally" among all prices.
I am a beginner with Algolia and trying to learn. Is there a way I can do the desired request or will I have to go for a separate index for prices and use multiple queries?
Thanks.
When a facetFilter or tagFilter is applied on an array, Algolia's engine checks if any element of the array matches and then goes to the next condition.
The reason it behaves that way and not the way you expected is simple: let's assume you have an array of strings (like tags):
{ tags: ['public', 'cheap', 'funny', 'useless'] }
When a user wants to find a record that is "useless" and "funny", this user is not looking for a tag that is both "useless" and "funny" at the same time, but for a record containing both tags in the array.
The solution for this is to denormalize your object in some way: transforming a record with an array of objects to multiple records with one object each.
So you would transform
{
"name": "Shoes",
"price": 100,
"prices": [
{ "start": 20160101, "end": 20160131, "price": 50 },
{ "start": 20160201, "end": 20160229, "price": 80 }
]
}
into
[
{
"name": "Shoes",
"default_price": 100,
"price": { "start": 20160101, "end": 20160131, "price": 50 }
},
{
"name": "Shoes",
"default_price": 100,
"price": { "start": 20160201, "end": 20160229, "price": 80 }
}
]
You could either do this in the same index (and use distinct to de-duplicate), or have one index per month or day. It really depends on your use-case, which makes it a bit hard to give you the right solution.
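The transformation above could be done before indexing with something like the following sketch. The helper name denormalizePrices is made up, and the filter at the end only imitates the numeric conditions from the question locally; it is not Algolia's API.

```javascript
// Hypothetical helper: split one product with a "prices" array into
// one record per price period, renaming the top-level price to
// default_price as in the example above.
function denormalizePrices(product) {
  const { prices = [], price: default_price, ...rest } = product;
  return prices.map((period) => ({ ...rest, default_price, price: period }));
}

const shoes = {
  name: "Shoes",
  price: 100,
  prices: [
    { start: 20160101, end: 20160131, price: 50 },
    { start: 20160201, end: 20160229, price: 80 },
  ],
};

const records = denormalizePrices(shoes);

// Local imitation of:
// price.price<60 AND price.start<=20160210 AND price.end>=20160210
const matches = records.filter(
  (r) => r.price.price < 60 && r.price.start <= 20160210 && r.price.end >= 20160210
);
console.log(records.length, matches.length);
```

With one period per record, all three conditions have to hold for the same period, so the sample product no longer matches for that date.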

compare two collections in mongodb using java or an simple query

I have the following document (JSON) for a gallery:
{
"_id": "53698b6092x3875407fefe7c",
"status": "active",
"colors": [
"red",
"green"
],
"paintings": [
{
"name": "MonaLisa",
"by": "LeonardodaVinci"
},
{
"name": "JungleArc",
"by": "RayBurggraf"
}
]
}
I also have a collection of colors, say:
COLORS-COLLECTION: ["black","yellow","red","green","blue","pink"]
I want to fetch paintings whose name matches a given search text, say "MonaLisa". I also want to compare the document's colors with COLORS-COLLECTION: if colors contains any color that is in COLORS-COLLECTION, the painting should be returned.
I want something like below:
{
"paintings": [
{
"name": "MonaLisa",
"by": "LeonardodaVinci"
}
]
}
Please help me!!. Thanks in advance.
If I understand you correctly, the aggregation framework can do this:
db.gallery.aggregate([
{"$unwind": "$paintings"},
{"$match": {"paintings.name": 'MonaLisa', "colors": {"$in": ["black","yellow","red","green","blue","pink"]}}},
{"$project": {"paintings": 1, "_id": 0}}
]);
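To show what the three stages do to the sample document, here is a rough plain-JavaScript equivalent. The helper name findPaintings is invented; the data is copied from the question.

```javascript
// Gallery document and color list copied from the question.
const gallery = {
  _id: "53698b6092x3875407fefe7c",
  status: "active",
  colors: ["red", "green"],
  paintings: [
    { name: "MonaLisa", by: "LeonardodaVinci" },
    { name: "JungleArc", by: "RayBurggraf" },
  ],
};
const COLORS = ["black", "yellow", "red", "green", "blue", "pink"];

// Hypothetical helper mirroring the pipeline's three stages in memory.
function findPaintings(docs, name, allowedColors) {
  return docs
    // $unwind: one row per painting, keeping the parent's other fields
    .flatMap((doc) => doc.paintings.map((p) => ({ ...doc, paintings: p })))
    // $match: the painting name equals the search text AND at least one
    // of the document's colors is in the allowed list ($in on an array)
    .filter(
      (row) =>
        row.paintings.name === name &&
        row.colors.some((c) => allowedColors.includes(c))
    )
    // $project: keep only the (now single) painting
    .map((row) => ({ paintings: row.paintings }));
}

const found = findPaintings([gallery], "MonaLisa", COLORS);
console.log(JSON.stringify(found));
```

Note that after $unwind the paintings field holds a single object rather than an array, so the result is close to, but not exactly, the shape asked for in the question.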

How to remove array elements when array is nested in multiple levels of embedded docs?

Given the following MongoDB example collection ("schools"), how do you remove student "111" from all clubs?
[
{
"name": "P.S. 321",
"structure": {
"principal": "Fibber McGee",
"vicePrincipal": "Molly McGee",
"clubs": [
{
"name": "Chess",
"students": [
ObjectId("111"),
ObjectId("222"),
ObjectId("333")
]
},
{
"name": "Cricket",
"students": [
ObjectId("111"),
ObjectId("444")
]
}
]
}
},
...
]
I'm hoping there's some way other than using cursors to loop over every school, then every club, then every student ID in the club...
MongoDB doesn't have great support for arrays within arrays (within arrays ...). The simplest solution I see is to read the whole document into your app, modify it there, and then save it. Of course, the operation is then not atomic, but for your app that might be OK.
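The "modify it in your app" step might look roughly like this in JavaScript. The helper name removeStudent is made up, and plain strings stand in for the ObjectIds; ids are compared as strings on the assumption that real ObjectId instances would not compare equal with ===.

```javascript
// Hypothetical helper: returns a copy of a school document with the
// given student id removed from every club's "students" array.
function removeStudent(school, studentId) {
  return {
    ...school,
    structure: {
      ...school.structure,
      clubs: school.structure.clubs.map((club) => ({
        ...club,
        students: club.students.filter((id) => String(id) !== String(studentId)),
      })),
    },
  };
}

// Sample document from the question, with strings standing in for ObjectIds.
const school = {
  name: "P.S. 321",
  structure: {
    principal: "Fibber McGee",
    vicePrincipal: "Molly McGee",
    clubs: [
      { name: "Chess", students: ["111", "222", "333"] },
      { name: "Cricket", students: ["111", "444"] },
    ],
  },
};

const updated = removeStudent(school, "111");
console.log(JSON.stringify(updated.structure.clubs.map((c) => c.students)));
```

You would still need to load each school, apply the function, and write the result back, so the non-atomic caveat above applies.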