Say the collection stores data in the format below. A new entry is added to the collection every day. Dates are in ISO format.
|id|dt|data|
---
|1|2021-03-17|{key:"A", value:"B"}
...
|1|2021-03-14|{key:"A", value:"B"}
...
|1|2021-02-28|{key:"A", value:"B"}
|1|2021-02-27|{key:"A", value:"B"}
...
|1|2021-02-01|{key:"A", value:"B"}
|1|2021-01-31|{key:"A", value:"B"}
|1|2021-01-30|{key:"A", value:"B"}
...
|1|2021-01-01|{key:"A", value:"B"}
|1|2020-12-31|{key:"A", value:"B"}
...
|1|2020-11-30|{key:"A", value:"B"}
...
I need help with a query that returns the last available day of each month in a given period. The query below is what I have so far; it does not return the last day of the current month, because I am sorting by day, month and year.
db.getCollection('data').aggregate([
    {
        $match: {dt: {$gt: ISODate("2020-01-01")}}
    },
    {
        $project: {
            dt: "$dt",
            month: {
                $month: "$dt"
            },
            day: {
                $dayOfMonth: "$dt"
            },
            year: {
                $year: "$dt"
            },
            data: "$data"
        }
    },
    {
        $sort: {day: -1, month: -1, year: -1}
    },
    { $limit: 24},
    {
        $sort: {dt: -1}
    },
])
The result I am after is:
|1|2021-03-17|{key:"A", value:"B"}
|1|2021-02-28|{key:"A", value:"B"}
|1|2021-01-31|{key:"A", value:"B"}
|1|2020-12-31|{key:"A", value:"B"}
|1|2020-11-30|{key:"A", value:"B"}
...
|1|2020-01-31|{key:"A", value:"B"}
Group the records by year and month, then take the record with the max date in each month. Sort by date descending before grouping, so that $first picks the latest document of each month.
db.getCollection('data').aggregate([
    { $match: { dt: { $gt: ISODate("2020-01-01") } } },
    { $sort: { dt: -1 } },                      // newest first, so $first below is the latest document
    { $group: {                                 // group by
        _id: { $substr: ['$dt', 0, 7] },        // get year and month, e.g. 2020-01
        dt: { $max: "$dt" },                    // find the max date
        doc: { "$first": "$$ROOT" }             // to get the document
    } },
    { "$replaceRoot": { "newRoot": "$doc" } },  // project the document
    { $sort: { dt: -1 } }
]);
Operators used: $substr, $group, $replaceRoot, $max, $first
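The same idea (group by year and month, keep the latest record in each group) can be sketched in plain Python to check the expected result without a database. This is only an illustration on a few rows copied from the question; the function name last_entry_per_month is made up for the example.

```python
import datetime as dt

def last_entry_per_month(docs):
    """Keep, for each (year, month), the document with the greatest dt."""
    latest = {}
    for doc in docs:
        key = (doc["dt"].year, doc["dt"].month)
        if key not in latest or doc["dt"] > latest[key]["dt"]:
            latest[key] = doc
    # Newest month first, like the final $sort stage.
    return sorted(latest.values(), key=lambda d: d["dt"], reverse=True)

# A few rows from the question's sample data.
docs = [
    {"id": 1, "dt": dt.datetime(2021, 3, 17)},
    {"id": 1, "dt": dt.datetime(2021, 3, 14)},
    {"id": 1, "dt": dt.datetime(2021, 2, 28)},
    {"id": 1, "dt": dt.datetime(2021, 2, 27)},
    {"id": 1, "dt": dt.datetime(2021, 1, 31)},
]
result = last_entry_per_month(docs)
print([d["dt"].date().isoformat() for d in result])
# ['2021-03-17', '2021-02-28', '2021-01-31']
```

Note that the current month yields 2021-03-17, the latest entry present, rather than the calendar month-end.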
I put together a possible solution for you in Python, but without your DB, I can't be positive that this works.
First there's a function that takes in an integer representing a month and returns the last day of that month.
import datetime as dt
def last_day_of_month(month):
    # Last day of `month` in 2021 (the year is hard-coded); month=0 yields 2020-12-31.
    return dt.datetime(2021, month+1, 1) - dt.timedelta(days=1)
Next, I built the query with a separate function.
def build_query(last_month):
    return [
        {
            "$and": [
                {"date": {"$gte": last_day_of_month(i)}},
                {"date": {"$lt": last_day_of_month(i) + dt.timedelta(days=1)}}
            ]
        }
        for i in range(0, last_month)
    ]
Here's the output. It would be inside an $or operator in the $match stage.
{'$match': {'$or': [{'$and': [{'date': {'$gte': datetime.datetime(2020, 12, 31, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 1, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 1, 31, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 2, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 2, 28, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 3, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 3, 31, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 4, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 4, 30, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 5, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 5, 31, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 6, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 6, 30, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 7, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 7, 31, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 8, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 8, 31, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 9, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 9, 30, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 10, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 10, 31, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 11, 1, 0, 0)}}]},
{'$and': [{'date': {'$gte': datetime.datetime(2021, 11, 30, 0, 0)}},
{'date': {'$lt': datetime.datetime(2021, 12, 1, 0, 0)}}]}]}}
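A variant of the same approach that takes the year as a parameter and handles December and leap years, sketched with the standard library's calendar.monthrange. The helper build_or_clauses and the field name "dt" (taken from the question's sample data) are assumptions for illustration. Like the approach above, it only matches calendar month-ends, so a partial current month (2021-03-17 in the question) is not returned.

```python
import calendar
import datetime as dt

def last_day_of_month(year, month):
    """Return midnight on the last calendar day of the given month (1-12)."""
    # monthrange returns (weekday of the 1st, number of days in the month).
    days_in_month = calendar.monthrange(year, month)[1]
    return dt.datetime(year, month, days_in_month)

def build_or_clauses(year, months, field="dt"):
    """One range clause per month, covering the whole last day of that month."""
    clauses = []
    for month in months:
        start = last_day_of_month(year, month)
        clauses.append({field: {"$gte": start, "$lt": start + dt.timedelta(days=1)}})
    return clauses

# January to March 2021, ready to be used as the first pipeline stage.
match_stage = {"$match": {"$or": build_or_clauses(2021, range(1, 4))}}
print(match_stage)
```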
I have a jsonb array like this:
[{"year": 2020, "month": 8, "visitor": 1}, {"year": 2020, "month": 12, "visitor": 1}, {"year": 2021, "month": 9, "visitor": 1}, {"year": 2021, "month": 11, "visitor": 1}, {"year": 2022, "month": 1, "visitor": 2}]
That's my query:
SELECT to_json(t."MonthlyVisitors")->>-1 as visitor FROM "Table" t Where "Url"='asd'
Query result:
{"year": 2022, "month": 1, "visitor": 2}
But I want just the visitor value of the last item: 2
You can use the function jsonb_array_length:
WITH jsontable(j) AS (
SELECT JSONB '[{"year": 2020, "month": 8, "visitor": 1},
{"year": 2020, "month": 12, "visitor": 1},
{"year": 2021, "month": 9, "visitor": 1},
{"year": 2021, "month": 11, "visitor": 1},
{"year": 2022, "month": 1, "visitor": 2}]'
)
SELECT j -> (jsonb_array_length(j) - 1)
FROM jsontable;
?column?
══════════════════════════════════════════
{"year": 2022, "month": 1, "visitor": 2}
(1 row)
Even more efficient is to use the array index -1 (hats off to a_horse_with_no_name):
WITH jsontable(j) AS (...)
SELECT j -> -1
FROM jsontable;
I'm not sure that my question is correct, but it seems so:
I have a set of documents in my MongoDB, like:
[{'_id': '5b4c9aa7ddc752c1f5844315',
'ccode': 'RU',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5b4cad0dddc752c1f5844322',
'ccode': 'US',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 2,
'regs_age1': 2,
'regs_male': 2}},
{'_id': '5bd88204af4c814883a414b2',
'ccode': 'US',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5bd88204af4c814883a414b3',
'ccode': 'RU',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}}]
I want to sort them by date and combine them, because the same date has multiple rows from different countries.
So the result should look something like ...
[{'2018-07-16T00:00:00.000Z': [{'_id': '5b4c9aa7ddc752c1f5844315',
'ccode': 'RU',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5b4cad0dddc752c1f5844322',
'ccode': 'US',
'date': '2018-07-16T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 4,
'registered': 2,
'regs_age1': 2,
'regs_male': 2}}]},
{'2018-10-30T00:00:00.000Z': [{'_id': '5bd88204af4c814883a414b2',
'ccode': 'US',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}},
{'_id': '5bd88204af4c814883a414b3',
'ccode': 'RU',
'date': '2018-10-30T00:00:00.000Z',
'rates': {'reg_emails_confirmed': 2,
'registered': 1,
'regs_age1': 1,
'regs_male': 1}}]}]
I tried:
db.getCollection('daily_stats').aggregate([
{'$match': some_condition},
{'$group': {'ccode': 1}}, # ccode or date?
{'$sort': {"date": 1}},
])
But I got an error:
The field * must be an accumulator object
I googled the error and it is pretty clear, but it does not seem related to my case: I don't need any sum, avg, etc. functions.
Query
sort by date (ascending here; use -1 if you need descending)
group by date and collect the $$ROOT documents
replace the root so that the date becomes the key
*This assumes your dates are stored as strings, which is a bad idea. If you convert them to date objects, you can still use the query, but build the key with
"k":{"$dateToString" : {"date" :"$_id"}}
aggregate(
[{"$sort":{"date":1}},
{"$group":{"_id":"$date", "docs":{"$push":"$$ROOT"}}},
{"$replaceRoot":
{"newRoot":{"$arrayToObject":[[{"k":"$_id", "v":"$docs"}]]}}}])
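To see the shape this pipeline is aiming for, the three steps (sort, group by date, turn each date into a key) can be sketched in plain Python. The documents are trimmed copies of the question's sample data, and group_by_date is a made-up name. It relies on the dates being ISO strings, which sort chronologically as text.

```python
from collections import defaultdict

def group_by_date(docs):
    """Return a list of {date: [documents]} objects, ordered by date."""
    groups = defaultdict(list)
    # ISO-8601 strings sort chronologically; sorted() is stable, so
    # documents sharing a date keep their original relative order.
    for doc in sorted(docs, key=lambda d: d["date"]):
        groups[doc["date"]].append(doc)
    return [{date: items} for date, items in groups.items()]

docs = [
    {"_id": "5bd88204af4c814883a414b2", "ccode": "US", "date": "2018-10-30T00:00:00.000Z"},
    {"_id": "5b4c9aa7ddc752c1f5844315", "ccode": "RU", "date": "2018-07-16T00:00:00.000Z"},
    {"_id": "5b4cad0dddc752c1f5844322", "ccode": "US", "date": "2018-07-16T00:00:00.000Z"},
    {"_id": "5bd88204af4c814883a414b3", "ccode": "RU", "date": "2018-10-30T00:00:00.000Z"},
]
result = group_by_date(docs)
for entry in result:
    for date, items in entry.items():
        print(date, [d["ccode"] for d in items])
```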
When using $group, you need an _id
From the docs
{
$group:
{
_id: <expression>, // Group By Expression
<field1>: { <accumulator1> : <expression1> },
...
}
}
In your case...
db.getCollection('daily_stats').aggregate([
{'$match': some_condition},
{'$group': {
'_id': "$ccode",
'rates': { $addToSet: '$rates' },
'date': { $first: '$date' }
}},
{'$sort': {"date": 1}},
{'$project': { "_id": 0, "country": "$_id", "rates": 1, "date": 1 }}
])
Playground: https://mongoplayground.net/p/B31XLS9p-6W
I have two collections named batting and candidates. First I run an aggregation with the necessary pipeline stages on batting to get my output.
batting_data = batting.aggregate(
[
{
"$group": {
"_id": "$playerID",
"AB": {"$sum": "$AB"},
"R": {"$sum": "$R"},
"H": {"$sum": "$H"},
"2B": {"$sum": "$2B"},
},
},
{"$match": {"AB": {"$gt": 0}}},
{
"$addFields": {
"avg": {"$divide": ["$H", "$AB"]},
"slug": {
"$divide": [
{
"$add": [
"$H",
"$2B",
{"$multiply": ["$HR", 3]},
]
},
"$AB",
]
},
}
},
]
)
The second collection is named candidates.
candidates_data = candidates.aggregate(
[
{
"$group": {
"_id": "$playerID",
"AB": {"$sum": "$AB"},
"R": {"$sum": "$R"},
"H": {"$sum": "$H"},
"2B": {"$sum": "$2B"},
},
},
{"$match": {"AB": {"$gt": 0}}},
{
"$addFields": {
"avg": {"$divide": ["$H", "$AB"]},
"slug": {
"$divide": [
{
"$add": [
"$H",
"$2B",
{"$multiply": ["$HR", 3]},
]
},
"$AB",
]
},
}
},
]
)
After that I got two outputs like this:
for i in batting_data:
    print(i)
Output:
{'_id': 'kellyto01', 'AB': 127, 'R': 11, 'H': 23, '2B': 5, 'avg': 0.18110236220472442}
{'_id': 'cedenan01', 'AB': 2051, 'R': 221, 'H': 485, '2B': 98, 'avg': 0.23647001462701123}
{'_id': 'fanokha01', 'AB': 6, 'R': 0, 'H': 2, '2B': 1, 'avg': 0.333333333333333}
{'_id': 'baueral01', 'AB': 23, 'R': 3, 'H': 5, '2B': 0, 'avg': 0.21739130434782608}
{'_id': 'coleal01', 'AB': 1760, 'R': 286, 'H': 493, '2B': 58, 'avg': 0.28011363636363634}
{'_id': 'cuylemi01', 'AB': 1386, 'R': 227, 'H': 329, '2B': 47, 'avg': 0.23737373737373738}
{'_id': 'dicksla01', 'AB': 3, 'R': 0, 'H': 0, '2B': 0, 'avg': 0.0}
{'_id': 'willijo01', 'AB': 48, 'R': 2, 'H': 6, '2B': 1, 'avg': 0.12}
{'_id': 'deshide01', 'AB': 5779, 'R': 872, 'H': 1548, '2B': 244, 'avg': 0.2678664128741997}
{'_id': 'mooretr01', 'AB': 26, 'R': 1, 'H': 6, '2B': 2, 'avg': 0.23076923076923078}
{'_id': 'smythha01', 'AB': 54, 'R': 7, 'H': 13, '2B': 2, 'avg': 0.24074074074074073}
{'_id': 'garceri01', 'AB': 3, 'R': 0, 'H': 0, '2B': 0, 'avg': 0.0}
{'_id': 'gardimi01', 'AB': 4, 'R': 0, 'H': 0, '2B': 0, 'avg': 0.0}
{'_id': 'calhowi01', 'AB': 542, 'R': 65, 'H': 133, '2B': 21, 'avg': 0.24538745387453875}
And for candidates_data:
for i in candidates_data:
    print(i)
{'_id': 'bondsba01', 'AB': 9847, 'R': 2227, 'H': 2935, '2B': 601, 'avg': 0.29806032294099727}
{'_id': 'brookhu01', 'AB': 5974, 'R': 656, 'H': 1608, '2B': 290, 'avg': 0.26916638767994644}
{'_id': 'jamesbo01', 'AB': 21, 'R': 0, 'H': 4, '2B': 0, 'avg': 0.19047619047619047}
{'_id': 'hassero01', 'AB': 3440, 'R': 348, 'H': 914, '2B': 172, 'avg': 0.26569767441860465}
{'_id': 'addybo01', 'AB': 1231, 'R': 227, 'H': 341, '2B': 36, 'avg': 0.2770105605199025}
{'_id': 'cedence01', 'AB': 7310, 'R': 1084, 'H': 2087, '2B': 436, 'avg': 0.28549931600547196}
Now I am trying to run code like this:
score = 0.0
for can in candidates_data:
    for bat in batting_data:
        score = 1000 - (((can['AB'] - bat['AB'])/20) + ((can['H'] - bat['H'])/25))
It should print a line for every entry in the candidates output, which is why I loop over candidates first and over the batting data in the inner loop.
My expected output looks like:
bondsba01 - 568.56
brookhu01 - 450.67
and so on for as long as the loop runs.
If anyone can respond, it would be a great help. Thanks.
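One possible cause, assuming batting_data and candidates_data are pymongo cursors as returned by aggregate(): a cursor can only be iterated once, so the inner loop has nothing left after the first candidate (or nothing at all if the cursor was already printed). Below is a minimal sketch of the nested loop with the inner data materialised into a list first. The sample rows are copied from the outputs above, iter() stands in for a one-pass cursor, and the helper names score and all_scores are made up for illustration. The question does not say how the per-batter scores should be combined into one number per candidate, so the sketch simply returns every pair.

```python
# Sample rows copied from the outputs above (only the fields the formula uses).
candidates_rows = [
    {"_id": "bondsba01", "AB": 9847, "H": 2935},
    {"_id": "brookhu01", "AB": 5974, "H": 1608},
]
batting_rows = [
    {"_id": "kellyto01", "AB": 127, "H": 23},
    {"_id": "cedenan01", "AB": 2051, "H": 485},
]

def score(can, bat):
    # Same formula as in the question.
    return 1000 - (((can["AB"] - bat["AB"]) / 20) + ((can["H"] - bat["H"]) / 25))

def all_scores(candidates, batting):
    """Return (candidate id, batter id, score) for every pair."""
    batting = list(batting)  # materialise once so the inner loop can restart
    return [(c["_id"], b["_id"], score(c, b)) for c in candidates for b in batting]

# iter(...) behaves like a cursor here: it can only be consumed once.
pairs = all_scores(iter(candidates_rows), iter(batting_rows))
for can_id, bat_id, value in pairs:
    print(f"{can_id} vs {bat_id} - {value:.2f}")
```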
Here's how my collection looks:
{"_id": 1, "price_history": [{date: 10-01-19, price: 10}, {date: 10-05-19, price: 15}...]...}
{"_id": 2, "price_history": [{date: 10-01-19, price: 12}, {date: 10-05-19, price: 14}...]...}
{"_id": 3, "price_history": [{date: 10-01-19, price: 17}, {date: 10-05-19, price: 25}...]...}
{"_id": 4, "price_history": [{date: 10-01-19, price: 10}, {date: 10-05-19, price: 16}...]...}
(The dates are all date objects, just wrote them this way to read easier)
I'm able to get the max price from the "price_history" array, but I also want the date object that goes with that max price.
Here's what I have so far; I've removed a lot of stuff that is irrelevant to the question.
{
$group: {
'_id': 'stats',
'price_history_stats': {
$push: {
'_id': '$_id',
'highest': {
$max: '$price_history.price'
}
}
}
}
}
The output I am getting is:
{
'_id': 'stats',
'price_history_stats': [
{'_id': 1, 'highest': 15},
{'_id': 2, 'highest': 14},
{'_id': 3, 'highest': 25},
{'_id': 4, 'highest': 16}
]
}
But I'm looking for a way to achieve this with the dates:
{
'_id': 'stats',
'price_history_stats': [
{'_id': 1, 'highest': 15, date: 10-05-19},
{'_id': 2, 'highest': 14, date: 10-05-19},
{'_id': 3, 'highest': 25, date: 10-05-19},
{'_id': 4, 'highest': 16, date: 10-05-19}
]
}
(Excuse any typos, I reformatted a lot of stuff for the question)
Any help would be appreciated. Thanks
If the intent is to find the max document for the group based on price, a combination of $sort first on price then $group with $last will produce a similar output.
Query:
db.collection.aggregate([
{
$unwind: "$price_history"
},
{
$sort: {
"price_history.price": 1
}
},
{
$group: {
_id: "$_id",
max_price_doc: {
$last: "$price_history"
}
}
}
]);
Output:
[
{
"_id": 1,
"max_price_doc": {
"date": "10 - 05 - 19",
"price": 15
}
},
{
"_id": 4,
"max_price_doc": {
"date": "10 - 05 - 19",
"price": 16
}
},
{
"_id": 3,
"max_price_doc": {
"date": "10 - 05 - 19",
"price": 25
}
},
{
"_id": 2,
"max_price_doc": {
"date": "10 - 05 - 19",
"price": 14
}
}
]
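For comparison, the same "entry with the highest price, together with its date" selection can be sketched in plain Python, shaped like the output the question asks for. The dates are kept as the placeholder strings from the question, and highest_price_entries is a made-up name. The sketch assumes every price_history array is non-empty, since max() raises ValueError on an empty list.

```python
def highest_price_entries(docs):
    """For each document, pick the price_history entry with the highest price."""
    stats = []
    for doc in docs:
        best = max(doc["price_history"], key=lambda entry: entry["price"])
        stats.append({"_id": doc["_id"], "highest": best["price"], "date": best["date"]})
    return {"_id": "stats", "price_history_stats": stats}

docs = [
    {"_id": 1, "price_history": [{"date": "10-01-19", "price": 10}, {"date": "10-05-19", "price": 15}]},
    {"_id": 2, "price_history": [{"date": "10-01-19", "price": 12}, {"date": "10-05-19", "price": 14}]},
    {"_id": 3, "price_history": [{"date": "10-01-19", "price": 17}, {"date": "10-05-19", "price": 25}]},
    {"_id": 4, "price_history": [{"date": "10-01-19", "price": 10}, {"date": "10-05-19", "price": 16}]},
]
result = highest_price_entries(docs)
for row in result["price_history_stats"]:
    print(row)
```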
I have a collection with documents like this:
{
"User": { _id: 1, UserName: "username", DisplayName: "DisplayName" },
"Interests": [1, 4, 7, 25, 30, 34, 46],
"MinAge": 11,
"Title": "ad title",
...
}
I want to select the 10 documents that match the largest number of interests from a given array, like:
array = [1,7, 30, 33, 38, 46, 55];
How could I do that?
You can achieve it by unwinding and grouping. You will have to group on a unique identifier.
db.colName.aggregate([
    {$unwind: '$Interests'},                                    // one document per interest
    {$match: {Interests: {$in: [1, 7, 30, 33, 38, 46, 55]}}},   // keep only the wanted interests
    {$group: {_id: '$User.UserName', count: {$sum: 1}}},        // count matches per identifier
    {$sort: {count: -1}},
    {$limit: 10}
])
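The counting logic (number of shared interests per document, then the top 10) can be checked in plain Python with a set intersection. Only the first document below comes from the question; the other two, and the name top_matches, are hypothetical and added for illustration.

```python
import heapq

def top_matches(docs, wanted, limit=10):
    """Return up to `limit` (match_count, document) pairs, best match first."""
    wanted = set(wanted)
    scored = [(len(wanted.intersection(doc["Interests"])), doc) for doc in docs]
    # Like the $match stage, drop documents that share no interest at all.
    scored = [(count, doc) for count, doc in scored if count > 0]
    return heapq.nlargest(limit, scored, key=lambda pair: pair[0])

docs = [
    {"Title": "ad title", "Interests": [1, 4, 7, 25, 30, 34, 46]},  # from the question
    {"Title": "second ad", "Interests": [33, 55]},                  # hypothetical
    {"Title": "third ad", "Interests": [2, 3]},                     # hypothetical
]
result = top_matches(docs, [1, 7, 30, 33, 38, 46, 55])
for count, doc in result:
    print(count, doc["Title"])
```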