How to take the newest part of all reports and merge them into one main report? - mongodb

EDIT: I found out that MongoDB does not allow special characters such as dots and the dollar sign in keys, so I had to change the structure of the JSON a bit. My question remains the same (I removed the old material to make it more readable, but you can still see it in the edit history). The new structure looks as follows:
{
"name": "test1",
"main": [
{
"subs": [
{
"data": [
{
"group": "ABC",
"values": [
"tcsh"
]
},
{
"group": "AA",
"values": [
"6.13.00"
]
}
]
},
{
"data": [
{
"group": "xyz",
"values": [
"tcsh"
]
},
{
"group": "SADA",
"values": [
"6.13.00"
]
}
]
}
],
"main_name": "MAIN",
"main_path": "play_ground/MAIN"
},
{
"subs": [
{
"data": [
{
"group": "BAB",
"values": [
"tcsh"
]
},
{
"group": "GO",
"values": [
"6.13.00"
]
}
]
}
],
"main_name": "MAIN2",
"main_path": "play_ground/MAIN2"
}
],
"user": "easdasa",
"timestamp": "1564437533"
}
I want to get all reports that have the name test1 and the user easdasa. Then I would like to take the latest block of data for each block of subs, as determined by the timestamp.
For example, the following array has two reports:
[{
"name": "test1",
"main": [
{
"subs": [
{
"data": [
{
"group": "xyz",
"values": [
"tcsh"
]
},
{
"group": "SADA",
"values": [
"6.13.00"
]
}
]
}
],
"main_name": "MAIN",
"main_path": "play_ground/MAIN"
}
],
"timestamp": "1564437533"
},
{
"name": "test1",
"main": [
{
"subs": [
{
"data": [
{
"group": "ABC",
"values": [
"tcsh"
]
},
{
"group": "AA",
"values": [
"6.13.00"
]
}
]
},
{
"data": [
{
"group": "xyz",
"values": [
"tcsh"
]
},
{
"group": "SADA",
"values": [
"5.0.1",
"12312"
]
}
]
}
],
"main_name": "MAIN",
"main_path": "play_ground/MAIN"
}
],
"timestamp": "1564437522"
}]
The first report was created after the second report (according to the timestamp). There is a block that is in the second report but not in the first:
{
"data": [
{
"group": "ABC",
"values": [
"tcsh"
]
},
{
"group": "AA",
"values": [
"6.13.00"
]
}
]
},
So I want the final report to have it (in addition to all the blocks from the first report). Also, the values of the SADA group are different, so we want to take the first report's block. The final report should be:
{
"name": "test1",
"main": [
{
"subs": [
{
"data": [
{
"group": "ABC",
"values": [
"tcsh"
]
},
{
"group": "AA",
"values": [
"6.13.00"
]
}
]
},
{
"data": [
{
"group": "xyz",
"values": [
"tcsh"
]
},
{
"group": "SADA",
"values": [
"6.13.00"
]
}
]
}
],
"main_name": "MAIN",
"main_path": "play_ground/MAIN"
}
],
"timestamp": "1564437533"
}
In other words, at the data level I want the groups and values from the latest report, and at the subs level I want all existing subs.
If I could specify the steps:
Get all reports by user and name.
Theoretically merge all reports into one main report (the implementation could be different). The merge is done by main_name.
Remove all old subs values, by timestamp, that already exist in the latest report, so that at the subs level the final report has only the newest objects plus the objects from old reports that are not in the newer reports.
Which query should I use to get the desired report?

Please use the query below and check its stats. Performance can be improved with proper indexing for your query patterns; use $explain to check query performance. I've assumed your array is stored in a field with the key values. Please let me know if this works; if it doesn't, provide sample data and we can check on that:
db.getCollection('yourcollection').aggregate([
{$unwind: '$values'},
{$match: {'values.name': 'test1', 'values.user': 'easdasa'}},
{$sort: {'values.timestamp': -1}},
{$limit: 1}
])
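The query above returns only the single latest report. The merge steps listed in the question could also be sketched on the application side, after fetching the matching reports. This is only a sketch under assumptions: mergeReports and subKey are hypothetical names, and a sub block is identified by the set of group names in its data array, which the question does not state explicitly but which matches its example.

```javascript
// Hypothetical helper: identify a sub block by the group names it contains.
// Assumption: two subs with the same groups are "the same" block.
function subKey(sub) {
  return sub.data.map(d => d.group).sort().join("|");
}

// Merge reports (already filtered by name and user) into one report.
function mergeReports(reports) {
  if (reports.length === 0) return null;
  // Oldest first, so newer reports overwrite older sub blocks.
  const ordered = [...reports].sort(
    (a, b) => Number(a.timestamp) - Number(b.timestamp)
  );
  const mains = new Map(); // main_name -> { main_name, main_path, subs: Map }
  for (const report of ordered) {
    for (const main of report.main) {
      if (!mains.has(main.main_name)) {
        mains.set(main.main_name, {
          main_name: main.main_name,
          main_path: main.main_path,
          subs: new Map(),
        });
      }
      const entry = mains.get(main.main_name);
      entry.main_path = main.main_path;
      for (const sub of main.subs) {
        // Map.set keeps the original position of an existing key,
        // so a newer block replaces the older one in place.
        entry.subs.set(subKey(sub), sub);
      }
    }
  }
  const latest = ordered[ordered.length - 1];
  return {
    name: latest.name,
    main: [...mains.values()].map(m => ({
      subs: [...m.subs.values()],
      main_name: m.main_name,
      main_path: m.main_path,
    })),
    timestamp: latest.timestamp,
  };
}
```

Called with the two example reports from the question, this should keep the ABC/AA block from the older report and take the xyz/SADA block from the newer one. If subs are identified differently in the real data, only subKey would need to change.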

Related

MongoDB lookup with multiple nested levels

In my application, I have a section of comments and replies under some documents.
Here's what my database schema looks like:
db.updates.insertOne({
"_id": "62347813d28412ffd82b551d",
"documentID": "17987e64-f848-40f3-817e-98adfd9f4ecd",
"stream": [
{
"id": "623478134c449b218b68f636",
"type": "comment",
"text": "Hey #john, we got a problem",
"authorID": "843df3dbbdfc62ba2d902326",
"taggedUsers": [
"623209d2ab26cfdbbd3fd348"
],
"replies": [
{
"id": "623478284c449b218b68f637",
"type": "reply",
"text": "Not sure, let's involve #jim here",
"authorID": "623209d2ab26cfdbbd3fd348",
"taggedUsers": [
"26cfdbbd3fd349623209d2ab"
]
}
]
}
]
})
db.users.insertMany([
{
"_id": "843df3dbbdfc62ba2d902326",
"name": "Manager"
},
{
"_id": "623209d2ab26cfdbbd3fd348",
"name": "John"
},
{
"_id": "26cfdbbd3fd349623209d2ab",
"name": "Jim"
},
])
I want to join those two collections, and replace user ids with complete user information on all levels. So the final JSON should look like this
{
"_id": "62347813d28412ffd82b551d",
"documentID": "17987e64-f848-40f3-817e-98adfd9f4ecd",
"stream": [
{
"id": "623478134c449b218b68f636",
"type": "comment",
"text": "Hey #john, we got a problem",
"author": {
"_id": "843df3dbbdfc62ba2d902326",
"name": "Manager"
},
"taggedUsers": [
{
"_id": "623209d2ab26cfdbbd3fd348",
"name": "John"
}
],
"replies": [
{
"id": "623478284c449b218b68f637",
"type": "reply",
"text": "Not sure, let's involve #jim here",
"author": {
"_id": "623209d2ab26cfdbbd3fd348",
"name": "John"
},
"taggedUsers": [
{
"_id": "26cfdbbd3fd349623209d2ab",
"name": "Jim"
}
]
}
]
}
]
}
I know how to do a $lookup on top-level fields, including with pipelines, but how can I do it for the nested ones?
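To make the desired transformation concrete, here is a minimal application-side sketch of the join. It is not a $lookup pipeline: it assumes both collections have already been fetched into memory, and populateStream is a hypothetical name. IDs that have no matching user are replaced with null, which is also an assumption.

```javascript
// Replace authorID / taggedUsers IDs with full user documents,
// on comments and, recursively, on their replies.
function populateStream(update, users) {
  const byId = new Map(users.map(u => [u._id, u]));
  const expand = ({ authorID, taggedUsers = [], replies, ...rest }) => {
    const out = {
      ...rest,
      author: byId.get(authorID) || null,
      taggedUsers: taggedUsers.map(id => byId.get(id) || null),
    };
    if (replies) out.replies = replies.map(expand);
    return out;
  };
  return { ...update, stream: update.stream.map(expand) };
}
```

Because expand calls itself on replies, the same replacement applies at every nesting level, which is the behavior the expected output above shows.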

OSRM Table API not returning distance

I am trying to get the distance matrix for my desired locations. Following the OSRM Table service docs, I modified the example request to http://router.project-osrm.org/table/v1/driving/13.388860,52.517037;13.397634,52.529407;13.428555,52.523219&annotations=distance.
The response shows this error:
{
"message": "Coordinate is invalid: 13.397634,52.529407&annotations=distance,duration",
"code": "InvalidInput"
}
However, when I run it without annotations, I get a proper response:
{
"durations": [
[
0,
723.9,
711
],
[
419.8,
0,
541.6
],
[
565,
416,
0
]
],
"destinations": [
{
"hint": "g5HFiBCSxYgiAAAA6gIAAAAAAAAAAAAASjFaQU1xpUEAAAAAAAAAACIAAADqAgAAAAAAAAAAAADppQAA_kvMAKlYIQM8TMwArVghAwAA7wqVP7a9",
"distance": 4.231665624816857,
"name": "Friedrichstraße",
"location": [
13.388798,
52.517033
]
},
{
"hint": "0BgegNQVzIgMAAAADAAAAAAAAAACAQAAW7-PQOKcyEAAAAAApq6DQgwAAAAMAAAAAAAAAIoAAADppQAAf27MABiJIQOCbswA_4ghAwAAXwWVP7a9",
"distance": 2.7893928415656375,
"name": "Torstraße",
"location": [
13.397631,
52.529432
]
},
{
"hint": "xRcegP___38kAAAAyAAAAC0AAABKAAAAsowKQkpQX0Lx6yZCvsQGQiQAAABkAAAALQAAACUAAADppQAASufMAOdwIQNL58wA03AhAwMAvxCVP7a9",
"distance": 2.2265954222656257,
"name": "Platz der Vereinten Nationen",
"location": [
13.428554,
52.523239
]
}
],
"sources": [
{
"hint": "g5HFiBCSxYgiAAAA6gIAAAAAAAAAAAAASjFaQU1xpUEAAAAAAAAAACIAAADqAgAAAAAAAAAAAADppQAA_kvMAKlYIQM8TMwArVghAwAA7wqVP7a9",
"distance": 4.231665624816857,
"name": "Friedrichstraße",
"location": [
13.388798,
52.517033
]
},
{
"hint": "0BgegNQVzIgMAAAADAAAAAAAAAACAQAAW7-PQOKcyEAAAAAApq6DQgwAAAAMAAAAAAAAAIoAAADppQAAf27MABiJIQOCbswA_4ghAwAAXwWVP7a9",
"distance": 2.7893928415656375,
"name": "Torstraße",
"location": [
13.397631,
52.529432
]
},
{
"hint": "xRcegP___38kAAAAyAAAAC0AAABKAAAAsowKQkpQX0Lx6yZCvsQGQiQAAABkAAAALQAAACUAAADppQAASufMAOdwIQNL58wA03AhAwMAvxCVP7a9",
"distance": 2.2265954222656257,
"name": "Platz der Vereinten Nationen",
"location": [
13.428554,
52.523239
]
}
],
"code": "Ok"
}
The problem is that the response does not contain a distance matrix. Can anyone suggest the cause or how to solve it?
It seems there is currently a problem with the OSRM demo service. See the issues below:
OSRM Github issue #5541
OSRM Github issue #5517
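Separately from the demo-server issues, the request URL in the question attaches the option with & directly after the coordinates, which is why the server parses it as part of the last coordinate. In the OSRM HTTP API, options go in a query string that starts with ?. A minimal sketch of building such a URL follows; buildTableUrl is a hypothetical helper, and whether a given server actually returns distances still depends on that server.

```javascript
// Build an OSRM Table request URL.
// coords: array of [longitude, latitude] pairs.
// annotations: e.g. ["distance"] or ["duration", "distance"].
function buildTableUrl(coords, annotations) {
  const base = "http://router.project-osrm.org/table/v1/driving/";
  const path = coords.map(([lon, lat]) => `${lon},${lat}`).join(";");
  // Options start after "?", not "&".
  return `${base}${path}?annotations=${annotations.join(",")}`;
}

const url = buildTableUrl(
  [
    [13.38886, 52.517037],
    [13.397634, 52.529407],
    [13.428555, 52.523219],
  ],
  ["distance"]
);
```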

MongoDB nested array aggregation

I want to aggregate data for the following sample array.
[
{
"_id": "5b7c0540342100091a375793",
"pages": [
{
"name": "ABCD",
"sections": [
{
"name": "sectionThird",
"id": 2,
"value": [
10,
50,
20
]
}
]
}
]
},
{
"_id": "5b3cd546342100514b4683a2",
"pages": [
{
"name": "ABCD",
"sections": [
{
"name": "sectionFourth",
"id": 2,
"value": [
19,
5,
8
]
},
{
"name": "sectionThird",
"id": 2,
"value": [
60
]
}
]
},
{
"name": "EFGH",
"sections": [
{
"name": "sectionFourth",
"id": 2,
"value": [
5
]
},
{
"name": "sectionThsads",
"id": 2,
"value": [
8
]
}
]
}
]
}
]
I want the following output:
[
{
"page": "ABCD",
"sections": [
{
"name": "sectionThird",
"totalValue": 140
},
{
"name": "sectionFourth",
"totalValue": 32
}
]
},
{
"page": "EFGH",
"sections": [
{
"name": "sectionFourth",
"totalValue": 5
},
{
"name": "sectionThsads",
"totalValue": 8
}
]
}
]
In the sample array above, there are multiple documents, each with a "pages" key that holds an array of objects. Each page object has a "name" key, which is unique within that "pages" array. Each page object also has a "sections" key, and each section has a "name" key, which is unique within its page.
So the output array is grouped by pages.name and, within each page, by sections.name across all documents, with the sum of all the value arrays of the sections that share the same section name under the same page name.
You can use the aggregation below.
$unwind each page and section, then $group with $sum to total the values for each section, and $push to push the section totals back into a per-page array.
db.col.aggregate([
{"$unwind":"$pages"},
{"$unwind":"$pages.sections"},
{"$group":{
"_id":{"pagename":"$pages.name","sectionname":"$pages.sections.name"},
"totalValue":{"$sum":{"$sum":"$pages.sections.value"}}
}},
{"$group":{
"_id":"$_id.pagename",
"sections":{"$push":{"name":"$_id.sectionname","totalValue":"$totalValue"}}
}},
{"$project":{"_id":0,"page":"$_id","sections":1}}
])
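For checking the pipeline's result against the sample data, the same two-level grouping can be sketched in plain JavaScript. This is only an in-memory illustration, not a replacement for the aggregation, and sumSectionsByPage is a hypothetical name.

```javascript
// Group by page name, then by section name, summing every value array.
function sumSectionsByPage(docs) {
  const pages = new Map(); // page name -> Map(section name -> total)
  for (const doc of docs) {
    for (const page of doc.pages) {
      if (!pages.has(page.name)) pages.set(page.name, new Map());
      const sections = pages.get(page.name);
      for (const section of page.sections) {
        const sum = section.value.reduce((a, b) => a + b, 0);
        sections.set(section.name, (sections.get(section.name) || 0) + sum);
      }
    }
  }
  return [...pages].map(([page, sections]) => ({
    page,
    sections: [...sections].map(([name, totalValue]) => ({ name, totalValue })),
  }));
}
```

The inner reduce plays the role of the inner $sum (summing one value array), and the running total per section name plays the role of the outer $sum accumulator.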

Mongodb: Apply group in a single collection with multiple relations

Example: a collection named Account has categories, upvotes, and downvotes. Upvotes and downvotes are feeds that belong to categories. The structure of a document in the collection is as follows:
{
"id": "585a2f2735cc577c178bda2b",
"category_ids": [
"5857e65c950cfd241818abc3",
"5857e92f950cfd241818abd0",
"5857e957950cfd241818abd2",
"5857f03f950cfd241818abd5"
],
"upVotes": [
{
"id": "585a6ccc055f93cbb10bd179",
"name": "career feed",
"category_ids": [
"5857e65c950cfd241818abc3"
]
},
{
"id": "5860bc714b7b3a400ef96d2a",
"name": "Treasurers and Controllers",
"category_ids": [
"5857e957950cfd241818abd2"
]
}
],
"downVotes": [
{
"id": "585a8fbb416ecf300c6ea969",
"name": "testinggggg",
"category_ids": [
"5857f03f950cfd241818abd5",
"5857f07c950cfd241818abd6"
]
},
{
"id": "585a8406354db7d811f9a405",
"name": "feeds5",
"category_ids": [
"5857e957950cfd241818abd2",
"5857f07c950cfd241818abd6"
]
}
]
}
I want to show the downvotes and upvotes grouped by the user's categories.
[Working on Mongo DB]
OUTPUT:
"Category": [{
"id": "585a6ccc055f93cbb10bd179",
"upvotes": [{
"id": "",
"name": ""
}, {
"id": "",
"name": ""
}],
"downvotes": [{
"id": "",
"name": ""
}, {
"id": "",
"name": ""
}]
}]
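One possible reading of the desired output, sketched in plain JavaScript on a single Account document: for each entry in the account's category_ids, list the upvote and downvote feeds whose own category_ids contain it. This interpretation is an assumption, as is the helper name votesByCategory; it is an in-memory illustration, not a MongoDB query.

```javascript
// For each category of the account, collect the matching feeds.
function votesByCategory(account) {
  const pick = (feeds, categoryId) =>
    feeds
      .filter(f => f.category_ids.includes(categoryId))
      .map(({ id, name }) => ({ id, name }));
  return account.category_ids.map(categoryId => ({
    id: categoryId,
    upvotes: pick(account.upVotes, categoryId),
    downvotes: pick(account.downVotes, categoryId),
  }));
}
```

With this reading, a feed that belongs to several categories appears under each of them, and feed categories that are not in the account's own category_ids are ignored.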

Getting Lifetime Values from Google Analytics API

The Google Analytics API documentation states that date ranges should not be specified when fetching lifetime values. But when I make such a request (without a date range), it returns empty dimension and metric results. When I use a date range, it returns dimension and metric values for that range.
The following is an excerpt from the API documentation:
Date ranges should not be specified for cohorts or Lifetime value
requests.
For example, if I make the request without a date range, as follows:
{
"reportRequests": [
{
"viewId": "XXXXXXXXX",
"dimensions": [
{
"name": "ga:date"
},
{
"name": "ga:eventLabel"
}
],
"metrics": [
{
"expression": "ga:totalEvents"
}
]
}
]
}
I get the following response:
{
"reports": [
{
"columnHeader": {
"dimensions": [
"ga:date",
"ga:eventLabel"
],
"metricHeader": {
"metricHeaderEntries": [
{
"name": "ga:totalEvents",
"type": "INTEGER"
}
]
}
},
"data": {
"totals": [
{
"values": [
"0"
]
}
]
}
}
]
}
However, if I include the date range,
{
"reportRequests": [
{
"viewId": "XXXXXXXX",
"dimensions": [
{
"name": "ga:date"
},
{
"name": "ga:eventLabel"
}
],
"metrics": [
{
"expression": "ga:totalEvents"
}
],
"dateRanges": [
{
"startDate": "2016-01-01",
"endDate": "2016-04-30"
}
]
}
]
}
I get the following response:
{
"reports": [
{
"columnHeader": {
"dimensions": [
"ga:date",
"ga:eventLabel"
],
"metricHeader": {
"metricHeaderEntries": [
{
"name": "ga:totalEvents",
"type": "INTEGER"
}
]
}
},
"data": {
"rows": [
{
"dimensions": [
"20160412",
"http://mytestblog.com/"
],
"metrics": [
{
"values": [
"1"
]
}
]
},
{
"dimensions": [
"20160412",
"http://mytestblog.com/2016/04/first-post.html"
],
"metrics": [
{
"values": [
"3"
]
}
]
},
{
"dimensions": [
"20160419",
"http://mytestblog.com/"
],
"metrics": [
{
"values": [
"4"
]
}
]
},
{
"dimensions": [
"20160419",
"http://mytestblog.com/2016/04/fourth.html"
],
"metrics": [
{
"values": [
"13"
]
}
]
}
],
"totals": [
{
"values": [
"21"
]
}
],
"rowCount": 4,
"minimums": [
{
"values": [
"1"
]
}
],
"maximums": [
{
"values": [
"13"
]
}
]
}
}
]
}
Why is it that, even though the documentation says otherwise, I have to specify a date range in the ReportRequest to get values? Am I misunderstanding the meaning of lifetime values here?
The reportRequest object should have either a value for dateRanges or a definition for cohortGroup. When you omit both, the request assumes the default values: a startDate of 7daysAgo and an endDate of yesterday.
The correct interpretation of the docs is that the reportRequest should not have a dateRange defined for cohort and LTV requests. But in order to make a cohort or lifetime value request you must add a cohort definition. For Lifetime value requests the cohort definition should have a specific dateRange in addition to the lifetimeValue field set to true:
POST https://analyticsreporting.googleapis.com/v4/reports:batchGet
{
"reportRequests": [
{
"viewId": "XXXX",
"dimensions": [
{"name": "ga:cohort" },
{"name": "ga:cohortNthWeek" }],
"metrics": [
{"expression": "ga:cohortTotalUsersWithLifetimeCriteria"},
{"expression": "ga:cohortRevenuePerUser"}
],
"cohortGroup": {
"cohorts": [{
"name": "cohort 1",
"type": "FIRST_VISIT_DATE",
"dateRange": {
"startDate": "2015-08-01",
"endDate": "2015-09-01"
}
},
{
"name": "cohort 2",
"type": "FIRST_VISIT_DATE",
"dateRange": {
"startDate": "2015-07-01",
"endDate": "2015-08-01"
}
}],
"lifetimeValue": true
}
}]
}