mongodb group and count, error in sharded environment - mongodb

I have been looking around and searching the internet, but I couldn't find an answer. Maybe my aggregation is wrong; please correct me if it is.
Suppose I have data like this:
mongos> db.data_analytics.find().sort({_id:-1}).limit(1).pretty()
{
    "title" : "data weather currency",
    "time" : "Sun Dec 08 2013 04:01:51 GMT+0900 (JST)",
    "_id" : ObjectId("52a3709f52d744c201000ba6"),
    "dataAnalytics" : [
        {
            "keyItem" : "currency-yen-php",
            "currency" : 0.427044,
            "from" : "JPY",
            "to" : "PHP",
            "time" : "Sun Dec 08 2013 04:01:49 GMT+0900 (JST)",
            "key" : "currency"
        },
        {
            "keyItem" : "weather-akiruno",
            "main-temp" : "5.78",
            "main-temp_min" : "2.78",
            "main-temp_max" : "7.78",
            "weather-main" : "Clouds",
            "weather-description" : "scattered clouds",
            "time" : "Sun Dec 08 2013 04:01:50 GMT+0900 (JST)",
            "key" : "weather"
        }
    ],
    "__v" : 0
}
I want to group and count by "$dataAnalytics.currency", but in a sharded environment I get an error like this:
mongos> db.data_analytics.aggregate([{$unwind: "$dataAnalytics"}, {$group:{_id:"$dataAnalytics.currency", c: {$sum:1}}}])
{
    "code" : 16390,
    "ok" : 0,
    "errmsg" : "exception: sharded pipeline failed on shard jeanepaul_set_pi_debian_01: { errmsg: \"exception: invalid operator \"$const\"\", code: 15999, ok: 0.0 }"
}
I have tried this successfully on a replica set, but no luck in the sharded environment. What could be the problem?
I appreciate any help, and thank you in advance!

You're currently grouping by the entire dataAnalytics element; you need to add .currency (or some other individual field) to your _id, like this:
db.data_analytics.aggregate([
{$unwind: "$dataAnalytics"},
{$group:{_id:"$dataAnalytics.currency", c: {$sum:1}}}])

Sir Ian, if you can see this, I have solved the problem. I added a $match to my aggregate, and here's my result:
mongos> db.data_analytics.aggregate([{$unwind:"$dataAnalytics"},{$match:{"dataAnalytics.key":"currency"}},{$group:{_id:"$dataAnalytics.currency", total: {$sum:1}}}, {$limit:100}])
{
    "result" : [
        {
            "_id" : 0.428345,
            "total" : 3
        },
        {
            "_id" : 0.428517,
            "total" : 7
        },
        {
            "_id" : 0.428511,
            "total" : 63
        },
        {
            "_id" : 0.42802,
            "total" : 7
        }
    ],
    "ok" : 1
}
The error seems to be gone. Thank you again for the suggestion!
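As a side note, unrelated to the original error: a common refinement of this pipeline is to add an initial $match on the array field before the $unwind, so documents with no currency entry at all are discarded early, while the second $match still filters the unwound elements. A minimal sketch against the same data_analytics collection (not part of the accepted fix):
// Sketch only: same collection and fields as above; the first $match is an optimization.
db.data_analytics.aggregate([
    // Drop whole documents that contain no currency entry in dataAnalytics.
    {$match: {"dataAnalytics.key": "currency"}},
    {$unwind: "$dataAnalytics"},
    // Still needed: keep only the unwound currency elements.
    {$match: {"dataAnalytics.key": "currency"}},
    {$group: {_id: "$dataAnalytics.currency", total: {$sum: 1}}},
    {$limit: 100}
])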

Related

mongoDB aggregation on large dataset has run for HOURS with no end in sight. Normal? Or can I speed this up?

I have a collection of 6-7 million event records. I have another collection of ~100,000 hourly weather records spanning the same timeframe as the event records. I am using an aggregation pipeline with $lookup to merge the relevant weather data into each event in the event collection.
THE PROBLEM: I have been running this on the full EVENT dataset for more than 8 HOURS, with no result. I have a deadline and I'm wondering if I will get a result...ever.
PLEASE HELP
Here is a sample event record:
{
    "_id" : ObjectId("5dedae8111cd89b173b00910"),
    "EventType" : "P",
    "Jurisdiction" : "ABCD",
    "Year" : 2006,
    "JulianDay" : 91,
    "CallReceipt" : ISODate("2006-04-01T00:00:37Z"),
    "EventClosed" : ISODate("2006-04-01T00:05:25Z"),
    "FinalType" : "EFGHI",
    "EventWindowStart" : ISODate("2006-04-01T00:00:00Z"),
    "EventWindowEnd" : ISODate("2006-04-01T01:00:00Z")
}
Here is a weather record:
{
    "_id" : ObjectId("5dc3cd909fc78c0c78a336da"),
    "DATE" : ISODate("2012-01-01T00:02:00Z"),
    "REPORT_TYPE" : "FM-16",
    "SOURCE" : 7,
    "HourlyAltimeterSetting" : "30.06",
    "HourlyDewPointTemperature" : "36",
    "HourlyDryBulbTemperature" : "37",
    "HourlyPresentWeatherType" : "BR:1 ||",
    "HourlyRelativeHumidity" : 93,
    "HourlySkyConditions" : "SCT:04 7 BKN:07 15 OVC:08 33",
    "HourlyStationPressure" : "29.46",
    "HourlyVisibility" : "5.00",
    "HourlyWetBulbTemperature" : 37,
    "HourlyWindDirection" : "260",
    "HourlyWindSpeed" : 5,
    "REM" : "MET10101/01/12 00:02:02 SPECI KROC 010502Z 26004KT 5SM BR SCT007 BKN015 OVC033 03/02 A3006 RMK AO2 RTX (MP)",
    "REPORT_MODE" : "hourly"
}
Here is my code, typed directly into the mongo shell:
db.EVENTS.aggregate([
    {
        $lookup: {
            from: "WEATHER",
            let: { start: "$EventWindowStart", end: "$EventWindowEnd" },
            pipeline: [
                { $match:
                    { $expr:
                        { $and: [
                            { $gte: ["$DATE", "$$start"] },
                            { $lte: ["$DATE", "$$end"] }
                        ] }
                    }
                },
                { $project: {
                    _id: 0,
                    HourlyDryBulbTemperature: 1,
                    HourlyPrecipitation: 1,
                    HourlyVisibility: 1,
                    WindSpeed: 1
                } }
            ],
            as: "HourlyWeatherData"
        }
    },
    { $out: "MERGED" }
])
On a small test subset I get the desired output. So the code works, as far as I can tell...
Sample output:
{
    "_id" : ObjectId("5dedae8111cd89b173b00910"),
    "EventType" : "P",
    "Jurisdiction" : "ABCD",
    "Year" : 2006,
    "JulianDay" : 91,
    "CallReceipt" : ISODate("2006-04-01T00:00:37Z"),
    "EventClosed" : ISODate("2006-04-01T00:05:25Z"),
    "FinalType" : "EFGHI",
    "EventWindowStart" : ISODate("2006-04-01T00:00:00Z"),
    "EventWindowEnd" : ISODate("2006-04-01T01:00:00Z"),
    "HourlyWeatherData" : [
        {
            "HourlyDryBulbTemperature" : "59",
            "HourlyPrecipitation" : "0.00",
            "HourlyVisibility" : "10.00"
        },
        {
            "HourlyDryBulbTemperature" : "59",
            "HourlyVisibility" : "9.94"
        }
    ]
}
PS: I do have ascending indexes on the event window fields in EVENTS, and an ascending and descending index on the DATE in WEATHER.
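For reference, the index setup described in the PS could be created roughly like this (a sketch only; the collection and field names are taken from the sample documents above, and the exact key choices are assumptions). Note that a single-field index can be traversed in either direction, so separate ascending and descending indexes on DATE are normally redundant.
// Sketch of the indexes described above (field choices assumed from the sample records).
db.EVENTS.createIndex({ EventWindowStart: 1 })
db.EVENTS.createIndex({ EventWindowEnd: 1 })
// One ascending index on DATE also serves descending scans.
db.WEATHER.createIndex({ DATE: 1 })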

Explanation of mongo query with OR condition

I have this data in my db
{ "_id" : ObjectId("5c89da093180684aba34c5b7"), "name" : "Allen", "age" : 24, "nicknames" : [ "kanky" ], "siblings" : [ ] }
{ "_id" : ObjectId("5c89da4b3180684aba34c5b8"), "name" : "Sonata", "age" : "30" }
{ "_id" : ObjectId("5c89da8f3180684aba34c5b9"), "name" : "Kaushik", "age" : "20" }
{ "_id" : ObjectId("5c89da8f3180684aba34c5ba"), "name" : "Stuart", "age" : "24" }
{ "_id" : "5c89da093180684aba34c5b7", "name" : "Allen", "age" : 24 }
When I run this query db.people.find({$or: [{name:"Allen"}, {age:24}]}), it doesn't give the entry with name : Stuart which has age : 24.
But if I run this query db.people.find({$or: [{name:"Stuart"}, {age:24}]}), it works as intended.
Can anyone explain how this works? I am starting with MongoDB, so this might be a very basic question.
Thanks
You have different schema types for age in different documents. For Allen you are using a number data type for age, and for Stuart you are using a string to store age. I think that's the problem.
Try running this:
db.people.find({$or: [{name:"Allen"}, {age: "24" }]})
You will get Stuart with this query. I don't see anything else wrong here except for different data types.
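If the goal is to match the age regardless of how it was stored, one option (a sketch, not from the original answer) is to include both the numeric and the string form in the $or, or to use $type to find the documents that store age as a string:
// Sketch: match age stored either as a number or as a string.
db.people.find({ $or: [ { name: "Allen" }, { age: 24 }, { age: "24" } ] })

// Sketch: list the documents where age was stored as a string.
db.people.find({ age: { $type: "string" } })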

MongoDB - Update a field only if it doesn't exist

I found a question similar to mine about updating a document only if a certain field does not exist, but in that question the field is not inside an array of documents like what I have:
{
    "_id" : ObjectId("5a5f814487c320156094c144"),
    "sender_id" : "123",
    "iso_number" : "ABC-DEF-123",
    "subject" : "Sample Memo",
    "content" : "This is a sample memorandum sent through postman.",
    "recipients" : [
        {
            "faculty_number" : 222,
            "_id" : ObjectId("5a5f814487c320156094c146"),
            "status" : "Sent"
        },
        {
            "faculty_number" : 111,
            "_id" : ObjectId("5a5f814487c320156094c145"),
            "status" : "Sent"
        }
    ],
    "memo_created" : ISODate("2018-01-17T17:00:52.104Z"),
    "__v" : 0
}
I'm trying to attach a memo_seen to the recipient with faculty_number 111, using this query:
db.getCollection('memos').update({"_id": id, "recipients.faculty_number": faculty_number, "recipients.$.memo_seen": {$exists: false}}, {$set: {"recipients.$.memo_seen": timestamp}})
The timestamp should not update when I repeat the query. Can someone help me out? Been stuck for quite some time now.
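One way to express the "only if the element has no memo_seen yet" condition (a sketch, not an answer from the original thread) is to put both conditions on the same array element with $elemMatch, so the positional $ update only fires while that element still lacks the field. The id, faculty_number, and timestamp variables are the same placeholders used in the question.
// Sketch: $elemMatch ties both conditions to the same recipients element,
// so repeating the update matches nothing once memo_seen is already set.
db.getCollection('memos').update(
    {
        _id: id,
        recipients: {
            $elemMatch: {
                faculty_number: faculty_number,
                memo_seen: { $exists: false }
            }
        }
    },
    { $set: { "recipients.$.memo_seen": timestamp } }
)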

How to query an array of objects in mongodb

I have an object structure as shown below
{
    "_id" : ObjectId("55d164f1c8f2c53a82535b9a"),
    "plant_name" : "TOTAL",
    "installed_capacity" : 3473,
    "wind_data" : [
        {
            "date" : "16-08-15",
            "timestamp" : " 16:27:15",
            "generated_capacity" : 617.24,
            "frequency" : 50.01
        },
        {
            "date" : "16-08-15",
            "timestamp" : " 21:21:15",
            "generated_capacity" : 670.25,
            "frequency" : 49.94
        }, ....]
}
I need to sum up (or at least retrieve) the "generated_capacity" of all the objects under "wind_data" having "date" equal to "16-08-15" in the "TOTAL" document. I have tried this query:
db.collectionName.aggregate(
{"$unwind":"$wind_data"},
{"$match":{"plant_name":"TOTAL","wind_data.date":"16-08-15"}}
)
But this query is not working. Please suggest some way to figure this out.
The following query would do the job:
db.collectionName.aggregate([
{"$unwind":"$wind_data"},
{"$match":{"plant_name":"TOTAL","wind_data.date":"16-08-15"}},
{"$group":{"_id":"$wind_data.date","generated_capacity_sum":{"$sum":"$wind_data.generated_capacity"}}}
])
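If aggregation per date is not required and simply retrieving the matching array entries would do, a $project with $filter is another option (a sketch, assuming MongoDB 3.2 or newer; the output field name matching_wind_data is made up for illustration):
// Sketch: return only the wind_data entries for the given date, without unwinding.
db.collectionName.aggregate([
    { $match: { plant_name: "TOTAL" } },
    { $project: {
        matching_wind_data: {
            $filter: {
                input: "$wind_data",
                as: "wd",
                cond: { $eq: ["$$wd.date", "16-08-15"] }
            }
        }
    } }
])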

Mongodb save/upsert using C# drivers, continuous array adds and field updates to same doc

I need some ideas/tips for this. Here is a sample document I am storing:
{
    "_id" : new BinData(0, "C3hBhRCZ5ZFizqbO1hxwrA=="),
    "gId" : 237,
    "name" : "WEATHER STATION",
    "mId" : 341457,
    "MAC" : "00:00:00:00:00:01",
    "dt" : new Date("Fri, 24 Feb 2012 13:59:02 GMT -05:00"),
    "hw" : [{
        "tag" : "Weather Sensors",
        "snrs" : [{
            "_id" : NumberLong(7),
            "sdn" : "Wind Speed"
        }, {
            "_id" : NumberLong(24),
            "sdn" : "Wind Gust"
        }, {
            "_id" : NumberLong(28),
            "sdn" : "Wind Direction"
        }, {
            "_id" : NumberLong(31),
            "sdn" : "Rainfall Amount"
        }, {
            "_id" : NumberLong(33),
            "sdn" : "Rainfall Peak Amount"
        }, {
            "_id" : NumberLong(38),
            "sdn" : "Barometric Pressure"
        }],
        "_id" : 1
    }]
}
What I am currently doing is using the C# driver and performing a .Save() to my collection to get upsert behavior; however, what I want is kind of a hybrid approach, I guess. Here are the distinct operations I need to be able to perform:
Upsert entire document if it does not exist
Update the dt field with a new timestamp if the document does exist
For the hw field, I need several things here. If hw._id exists, update its tag field as well as handle the snrs field by either updating existing entries (so the sdn value is updated) or adding entirely new entries when the _id does not exist
Nothing should ever be removed from the hw array and nothing should ever be removed from the snrs array.
A standard upsert does not appear to get me what I am after, so I am looking for the best way to do what I need with as few roundtrips to the server as possible. I am thinking some of the $ operators may be what I need here, but I just need some thoughts on how best to approach this.
The gist of what I am doing here is keeping an accumulating, historical document of snrs entries with the immediate current value as well as retaining any historical entries in the array even though they are no longer "alive", being reported, etc. This allows future reporting on things that no longer exist in current time, but were at some point in the past. _id values are application-generated, globally unique across all documents, and never change after initial creation. For example, last week "Wind Speed" was being reported, but this week it is not. Its _id value, however, will not change if "Wind Speed" starts reporting again. Follow?
Clarifications or more detail can be provided if needed.
Thanks.
By changing the structure of your document from embedded arrays to subdocuments keyed by the _ids, you can do this.
e.g.
{
    "MAC" : "00:00:00:00:00:01",
    "_id" : 1,
    "dt" : ISODate("2012-02-24T18:59:02Z"),
    "gId" : 237,
    "hw" : {
        "1" : {
            "snrs" : {
                "1" : "Wind Speed",
                "2" : "Wind Gust"
            },
            "tag" : "Weather Sensors"
        }
    },
    "mId" : 341457,
    "name" : "WEATHER STATION 1"
}
I created the above document with the following upsert:
db.foo.update(
    { _id: 1 },
    {
        $set: {
            "gId" : 237,
            "name" : "WEATHER STATION 1",
            "mId" : 341457,
            "MAC" : "00:00:00:00:00:01",
            "dt" : new Date("Fri, 24 Feb 2012 13:59:02 GMT -05:00"),
            "hw.1.tag" : "Weather Sensors",
            "hw.1.snrs.1" : "Wind Speed",
            "hw.1.snrs.2" : "Wind Gust"
        }
    },
    true
)
Now when I run
db.foo.update(
    { _id: 1 },
    {
        $set: {
            "dt" : new Date(),
            "hw.2.snrs.1" : "Rainfall Amount"
        }
    },
    true
)
I get
{
    "MAC" : "00:00:00:00:00:01",
    "_id" : 1,
    "dt" : ISODate("2012-03-07T05:14:31.881Z"),
    "gId" : 237,
    "hw" : {
        "1" : {
            "snrs" : {
                "1" : "Wind Speed",
                "2" : "Wind Gust"
            },
            "tag" : "Weather Sensors"
        },
        "2" : {
            "snrs" : {
                "1" : "Rainfall Amount"
            }
        }
    },
    "mId" : 341457,
    "name" : "WEATHER STATION 1"
}
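The same keyed paths also cover the remaining operation from the question, updating the sdn of a sensor that already exists. A sketch using the structure above (the new label is made up for illustration):
// Sketch: refresh dt and overwrite an existing sensor's sdn in one upsert;
// nothing is removed from hw or snrs.
db.foo.update(
    { _id: 1 },
    {
        $set: {
            "dt" : new Date(),
            "hw.1.snrs.1" : "Wind Speed (avg)"   // hypothetical new sdn value
        }
    },
    true
)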