How to perform MongoDB aggregations on multilevel/nested documents?

I have a MongoDB collection called tasks. Here T1.1 is a subtask of T1, T1.1.1 is a subtask of T1.1, and so on; the subtask levels can grow deeper. I am using MongoDB version 4.0. Below is the collection data:
----------------------------------------
task     | parent_task_id | progress(%)
----------------------------------------
T1       | null           | 20
T2       | null           | 30
T1.1     | T1             | 10
T1.2     | T1             | 10
T1.1.1   | T1.1           | 10
T1.1.2   | T1.1           | 10
T1.1.1.1 | T1.1.1         | 10
----------------------------------------
How do I calculate the average progress of task T1, including all of its subtasks (T1.1, T1.2, T1.1.1, T1.1.2, T1.1.1.1), using MongoDB aggregations?
Thanks in advance.

db.collection.aggregate([
  {
    $match: {
      // Find T1 and all of its subtasks by name pattern: "T1" followed
      // by a dot or the end of the string (a plain ^T1.* would also
      // match unrelated tasks such as T10)
      "task": { $regex: "^T1(\\.|$)" }
    }
  },
  {
    $group: {
      "_id": null,
      // Average of the matched tasks' progress
      "avg": { "$avg": "$progress" }
    }
  }
])
Sample playground
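Note that the $regex approach works only because the task names themselves encode the hierarchy. To walk parent_task_id instead, so the naming does not matter, a sketch using $graphLookup (available in MongoDB 4.0; field and collection names as in the question) could look like this:
db.tasks.aggregate([
  { $match: { task: "T1" } },
  {
    // Recursively collect every document whose parent_task_id chain
    // leads back to T1, at any depth
    $graphLookup: {
      from: "tasks",
      startWith: "$task",
      connectFromField: "task",
      connectToField: "parent_task_id",
      as: "subtasks"
    }
  },
  {
    $project: {
      task: 1,
      // Average over T1's own progress plus every collected subtask's
      avgProgress: {
        $avg: { $concatArrays: [["$progress"], "$subtasks.progress"] }
      }
    }
  }
])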

Related

Replicating MongoDB $bucket with conditional sum in Postgres

I have a database with hundreds of thousands of rows with this schema:
+----+----------+---------+
| id | duration | type |
+----+----------+---------+
| 1 | 41 | cycling |
+----+----------+---------+
| 2 | 15 | walking |
+----+----------+---------+
| 3 | 6 | walking |
+----+----------+---------+
| 4 | 26 | running |
+----+----------+---------+
| 5 | 30 | cycling |
+----+----------+---------+
| 6 | 13 | running |
+----+----------+---------+
| 7 | 10 | running |
+----+----------+---------+
I was previously using a MongoDB aggregation to do this and get a distribution of activities by type and total count:
{
  $bucket: {
    groupBy: '$duration',
    boundaries: [0, 16, 31, 61, 91, 121],
    default: 121,
    output: {
      total: { $sum: 1 },
      walking: {
        $sum: { $cond: [{ $eq: ['$type', 'walking'] }, 1, 0] },
      },
      running: {
        $sum: { $cond: [{ $eq: ['$type', 'running'] }, 1, 0] },
      },
      cycling: {
        $sum: { $cond: [{ $eq: ['$type', 'cycling'] }, 1, 0] },
      },
    },
  },
}
I have just transitioned to using Postgres and can't figure out how to do the conditional sums there. What would the query be to get a result table like this?
+---------------+---------+---------+---------+-------+
| duration_band | walking | running | cycling | total |
+---------------+---------+---------+---------+-------+
| 0-15 | 41 | 21 | 12 | 74 |
+---------------+---------+---------+---------+-------+
| 15-30 | 15 | 1 | 44 | 60 |
+---------------+---------+---------+---------+-------+
| 30-60 | 6 | 56 | 7 | 69 |
+---------------+---------+---------+---------+-------+
| 60-90 | 26 | 89 | 32 | 150 |
+---------------+---------+---------+---------+-------+
| 90-120 | 30 | 0 | 6 | 36 |
+---------------+---------+---------+---------+-------+
| 120+ | 13 | 90 | 0 | 103 |
+---------------+---------+---------+---------+-------+
| Total | 131 | 257 | 101 | 492 |
+---------------+---------+---------+---------+-------+
SQL is very good at retrieving data, making calculations on it, and delivering the values you want, so that part is an easy task. It is not so good at formatting results, which is why that task is typically left to the presentation layer. That said, however, it can be done, and in a single query. The difficulty is the pivot process: transforming rows into columns. But first, some setup. You should put the duration bands into their own table (if they are not there already), with the addition of an identifier that then allows multiple criteria sets (more on that later). I will proceed that way.
create table bands( name text, period int4range, title text );
insert into bands(name, period, title)
values ('Standard', '[ 0, 15)'::int4range , '0 - 15')
, ('Standard', '[ 15, 30)'::int4range , '15 - 30')
, ('Standard', '[ 30, 60)'::int4range , '30 - 60')
, ('Standard', '[ 60, 90)'::int4range , '60 - 90')
, ('Standard', '[ 90,120)'::int4range , '90 - 120')
, ('Standard', '[120,)'::int4range , '120+');
This sets up your current criteria. The name column is the previously mentioned identifier, and the title column becomes the duration band in the output. The interesting column is period, defined as an integer range; in this case a [closed, open) range that includes the first number but not the second (yes, the brackets have meaning). That definition becomes the heart of the resulting query.
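For example, the range containment operator <@ used later in the query follows exactly those bracket semantics; a quick illustrative check:
select 14 <@ '[ 0, 15)'::int4range;  -- true:  14 falls inside the half-open range
select 15 <@ '[ 0, 15)'::int4range;  -- false: 15 starts the next band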
The query builds as follows:
1) Retrieve the desired interval set ( [0,15) ... ) and append a "totals" entry to it.
2) Define the list of activities (cycling, ...).
3) Combine these sets to create a list pairing each interval with each activity. This gives the activity intervals, which become the matrix generated when pivoted.
4) Combine the "test" table values with the above list, calculating the total time for each activity within each interval. This is the workhorse of the query; it does ALL of the calculations. The result now contains the intervals plus total activity for each cell of the matrix, but it still exists in row orientation.
5) With the results calculated, pivot them from row orientation to column orientation.
6) Finally, compress the pivoted results into a single row for each interval and set the final interval ordering.
And the result is:
with buckets ( period , title, ord) as
( select period , title, row_number() over (order by lower(b.period)) ord ---- 1
from bands b
where name = 'Standard'
union all
select '[0,)','Total',count(*) + 1
from bands b
where name = 'Standard'
)
, activities (activity) as ( values ('running'),('walking'),('cycling'), ('Total')) ---- 2
, activity_buckets (period, title, ord, activity) as
(select * from buckets cross join activities) ---- 3
select s2.title "Duration Band" ---- 6
, max(cycling) "Cycling"
, max(running) "Running"
, max(walking) "Walking"
, max(Total) "Total "
from ( select s1.title, s1.ord
, case when s1.activity = 'cycling' then duration else null end cycling ---- 5
, case when s1.activity = 'running' then duration else null end running
, case when s1.activity = 'walking' then duration else null end walking
, case when s1.activity = 'Total' then duration else null end total
from ( select ab.ord, ab.title, ab.activity
, sum(coalesce(t.duration,0)) duration ---- 4
from activity_buckets ab
left join test t
on ( (t.type = ab.activity or ab.activity = 'Total')
and t.duration <@ ab.period --** determines which time interval(s) the value belongs to
)
group by ab.ord, ab.title, ab.activity
) s1
) s2
group by s2.ord,s2.title
order by s2.ord;
See the demo. It contains each of the major steps along the way. Additionally, it shows how creating a table for the intervals can be put to use. Since I dislike long queries, I generally hide them behind a SQL function and then just use the function; the demo contains this as well.
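Since the original $bucket stage counts rows per type, the same result (minus the reusable bands table and the Total row) can be sketched much more compactly with aggregate FILTER clauses and width_bucket, assuming the same test table as above:
select case width_bucket(duration, array[0, 16, 31, 61, 91, 121])
         when 1 then '0-15'    when 2 then '15-30'
         when 3 then '30-60'   when 4 then '60-90'
         when 5 then '90-120'  else '120+'
       end                                       as duration_band
     , count(*) filter (where type = 'walking')  as walking
     , count(*) filter (where type = 'running')  as running
     , count(*) filter (where type = 'cycling')  as cycling
     , count(*)                                  as total
from test
group by width_bucket(duration, array[0, 16, 31, 61, 91, 121])
order by width_bucket(duration, array[0, 16, 31, 61, 91, 121]);
A grand-total row could then be appended with GROUP BY ROLLUP or a trailing UNION ALL, at the cost of the flexibility the bands table provides.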

How to iterate over a JSON object in a transaction for multiple inserts in database using pg-promise

Apologies if my title might not be clear; I'll explain the question further here.
What I would like to do is have multiple inserts based on a JSON array that I (the backend) will be receiving from the frontend. The JSON object has the following data:
//Sample JSON
{
  // Some other data here to insert
  ...
  "quests": [
    {
      "player_id": [1, 2, 3],
      "task_id": [11, 12]
    },
    {
      "player_id": [4, 5, 6],
      "task_id": [13, 14, 15]
    }
  ]
}
Based on this JSON, this is my expected output upon being inserted in Table quests and processed by the backend:
//quests table (Output)
----------------------------
id | player_id | task_id |
----------------------------
 1 |     1     |   11    |
 2 |     1     |   12    |
 3 |     2     |   11    |
 4 |     2     |   12    |
 5 |     3     |   11    |
 6 |     3     |   12    |
 7 |     4     |   13    |
 8 |     4     |   14    |
 9 |     4     |   15    |
10 |     5     |   13    |
11 |     5     |   14    |
12 |     5     |   15    |
13 |     6     |   13    |
14 |     6     |   14    |
15 |     6     |   15    |
// Not sure if useful info, but I will be using the player_id as a join later on.
-- My current progress --
What I currently have (and have tried) is doing multiple inserts by iterating over each JSON object.
//The previous JSON response I accept:
{
  "quests": [
    {
      "player_id": 1,
      "task_id": 11
    },
    {
      "player_id": 1,
      "task_id": 12
    },
    {
      "player_id": 6,
      "task_id": 15
    }
  ]
}
// My current backend code
db.tx(async t => {
  const q1 = // some queries
  ....
  const q3 = await t.none(
    `INSERT INTO quests (player_id, task_id)
     SELECT player_id, task_id
     FROM json_to_recordset($1::json)
     AS x(player_id int, task_id int)`,
    [JSON.stringify(quests)]
  );
  return t.batch([q1, q2, q3]);
}).then(data => {
  // Success
}).catch(error => {
  // Fail
});
It works, but I don't think it's good to have such a long request body, which is why I'm wondering if it's possible to iterate over the arrays inside the object instead.
If more information is needed, I'll edit this post again.
Thank you in advance!
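One way to keep the request body short is to send the nested form (the first JSON above) and let Postgres expand the player_id × task_id cross product itself. A minimal sketch of just that INSERT, assuming the payload arrives as the $1 parameter:
INSERT INTO quests (player_id, task_id)
SELECT p.value::int, t.value::int
FROM jsonb_array_elements($1::jsonb -> 'quests') AS q(quest)
CROSS JOIN LATERAL jsonb_array_elements_text(q.quest -> 'player_id') AS p(value)
CROSS JOIN LATERAL jsonb_array_elements_text(q.quest -> 'task_id') AS t(value);
In the pg-promise transaction this would take the place of the json_to_recordset query above, with the whole JSON payload passed as the single query parameter.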

Merge two lines into one using ID in Power BI

I'm using Power BI to visualize my data saved in a Mongo database.
My records look like this:
{
  '_id': 0,
  'code_zone': "ABCD",
  'type_zone': "Beautiful",
  'all_coordinates': [{
    "type": "Feature",
    "geometry": {
      "type": "Point",
      "one_coordinates": [10.11, 40.44]
    },
    "properties": {
      "limite_vertical_min": "L0",
      "limite_vertical_max": "L100"
    }
  }]
}
When I import the data into Power BI, it divides my records into 3 "tables":
my_collection
my_collection.all_coordinates
my_collection.all_coordinates.one_coordinates
Because I didn't know how to fix this issue, I selected these 3 tables and linked them using the id.
Currently, I can visualize this:
_id | code_zone | index_all_coordinates | index_one_coordinate | value
----------------------------------------------------------------------
id0 | ABCD | 1 | 0 | 10.11
----------------------------------------------------------------------
id0 | ABCD | 1 | 1 | 40.44
I'm expecting to have this:
_id | code_zone | index_all_coordinates | value_x | value_y
------------------------------------------------------------
id0 | ABCD | 1 | 10.11 | 40.44
------------------------------------------------------------
Is this the right solution, or do I have to refactor my data before importing it into Power BI?
How can I merge these two lines into one with Power BI?
To get from the first table to the second, you can pivot on the index_one_coordinate column and then relabel those new columns 0 and 1 to value_x and value_y.

MongoDB Lookup with Match condition on join field

I am new to MongoDB and I want to do something like the following.
I have two collections:
Collection_1
-----------------------
Name | MobileNo | CountryCode
S1 | 9199123456 | 91
S2 | 9199567892 | 91
S3 | 9712345678 | 971
S4 | 9716598984 | 971
S5 | 9188687789 | 91
Collection_2
----------------------
MobileNo | CountryCode
9199 | 91
9716 | 971
I have two queries:
1) I want to select all the documents of Collection_1 whose MobileNo starts with 9199% or 9716% and whose CountryCode is the same.
I want to apply the LIKE condition on the Collection_2 result.
2) Can we use a LIKE condition and select Collection_1's documents which start with 9199% and 9716% without the CountryCode join (lookup)?
I have tried the first query and done something like this:
db.Collection_1.aggregate([
  {
    $lookup: {
      from: "Collection_2",
      localField: "CountryCode",
      foreignField: "CountryCode",
      as: "result"
    }
  },
  {
    $unwind: "$CountryCode"
  },
  {
    $match: { MobileNo: /$result.MobileNo/ }
  }
]);
But I was unable to find any records.
Can anyone help me get the output below?
Output
------------------
Name | MobileNo | CountryCode
S1 | 9199123456 | 91
S2 | 9199567892 | 91
S4 | 9716598984 | 971
Thanks in advance.
Hemik Gajjar
I found the solution: we can take a substring of the actual value and compare it with the lookup value.
Sorry, I did not test this query on the data:
db.Collection_1.find({$or:[{"MobileNo":/^9199/}, {"MobileNo":/^9716/}]});
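For the first query, that substring comparison can be spelled out inside the aggregation with $expr after the $lookup. A sketch, also untested against the real data; it assumes MongoDB 3.6+ for $expr in $match:
db.Collection_1.aggregate([
  {
    $lookup: {
      from: "Collection_2",
      localField: "CountryCode",
      foreignField: "CountryCode",
      as: "result"
    }
  },
  { $unwind: "$result" },
  {
    // Keep a document when its MobileNo starts with the looked-up
    // prefix: compare the first strlen(prefix) characters of MobileNo
    // against the prefix itself
    $match: {
      $expr: {
        $eq: [
          { $substrCP: ["$MobileNo", 0, { $strLenCP: "$result.MobileNo" }] },
          "$result.MobileNo"
        ]
      }
    }
  },
  { $project: { Name: 1, MobileNo: 1, CountryCode: 1 } }
]);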

Optimizing MongoDB indexing multiple field with multiple query

I am new to database indexing. My application has the following "find" and "update" queries, searched by single and multiple fields:
query                   | reference | timestamp | phone | username | key | address
------------------------+-----------+-----------+-------+----------+-----+--------
update                  |     x     |           |       |          |     |
findOne                 |           |     x     |   x   |          |     |
find / limit:16         |           |     x     |   x   |    x     |     |
find / limit:11         |           |     x     |       |          |  x  |   x
find / limit:1, sort:-1 |           |     x     |   x   |          |  x  |   x
find                    |           |     x     |       |          |     |
1) update({"reference": "f0d3dba-278de4a-79a6cb-1284a5a85cde"}, ……….
2) findOne({"timestamp": "1466595571", "phone": "9112345678900"})
3) find({"timestamp": "1466595571", "phone": "9112345678900", "username": "a0001a"}).limit(16)
4) find({"timestamp": "1466595571", "key": "443447644g5fff", "address": "abc road, mumbai, india"}).limit(11)
5) find({"timestamp": "1466595571", "phone": "9112345678900", "key": "443447644g5fff", "address": "abc road, mumbai, india"}).sort({"_id": -1}).limit(1)
6) find({"timestamp": "1466595571"})
I am creating these indexes:
db.coll.createIndex( { "reference": 1 } ) // for the 1st query
db.coll.createIndex( { "timestamp": 1, "phone": 1, "username": 1 } ) // for the 2nd, 3rd, and 6th queries (timestamp prefix)
db.coll.createIndex( { "timestamp": 1, "key": 1, "address": 1, "phone": 1 } ) // for the 4th and 5th queries
Is this the correct way?
Please help me
Thank you
I think what you have done looks fine. One way to check if your query is using an index, which index is being used, and whether the index is effective is to use the explain() function alongside your find().
For example:
db.coll.find({"timestamp":"1466595571"}).explain()
will return a JSON document which details which index (if any) was used. In addition to this, you can specify that explain return "executionStats",
e.g.:
db.coll.find({"timestamp":"1466595571"}).explain("executionStats")
This will tell you how many index keys were examined to find the result set as well as the execution time and other useful metrics.
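For illustration, an abridged sketch of the parts of the output worth checking; the values here are made up and the real document contains many more fields:
{
  "queryPlanner": {
    "winningPlan": {
      "stage": "FETCH",
      "inputStage": {
        "stage": "IXSCAN",
        "indexName": "timestamp_1_phone_1_username_1"  // which index was chosen
      }
    }
  },
  "executionStats": {
    "nReturned": 12,           // documents returned
    "executionTimeMillis": 0,  // total execution time
    "totalKeysExamined": 12,   // index keys scanned
    "totalDocsExamined": 12    // documents scanned; close to nReturned is good
  }
}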