Is it possible to aggregate on data that is stored via DBRef?
Mongo 2.6
Let's say I have transaction data like:
{
_id : ObjectId(...),
user : DBRef("user", ObjectId(...)),
product : DBRef("product", ObjectId(...)),
source : DBRef("website", ObjectId(...)),
quantity : 3,
price : 40.95,
total_price : 122.85,
sold_at : ISODate("2015-07-08T09:09:40.262-0700")
}
The trick is "source" is polymorphic in nature - it could be different $ref values such as "webpage", "call_center", etc that also have different ObjectIds. For example DBRef("webpage", ObjectId("1")) and DBRef("webpage",ObjectId("2")) would be two different webpages where a transaction originated.
I would like to ultimately aggregate by source over a period of time (like a month):
db.coll.aggregate(
    { $match : { sold_at : { $gte : start, $lt : end } } },
    { $project : { source : 1, total_price : 1 } },
    { $group : {
        _id : { "source.$ref" : "$source.$ref" },
        count : { $sum : "$total_price" }
    } } );
The problem is that you get a path error when referencing a field whose name starts with $, whether you try to group by it or transform it with expressions via $project.
Any way to do this? I'm actually trying to push this data via aggregation into a sub-collection so I can operate on it there, and I'm trying to avoid a large cursor operation over millions of records just to transform the data so I can aggregate it.
MongoDB 4: I solved this issue in the following way.
Having this structure:
{
"_id" : LUUID("144e690f-9613-897c-9eab-913933bed9a7"),
"owner" : {
"$ref" : "person",
"$id" : NumberLong(10)
},
...
...
}
I needed to use the "owner.$id" field, but because of the "$" in the field name I was unable to use it in aggregation.
I transformed "owner.$id" -> "owner" using the following snippet:
db.activities.aggregate([
{
$addFields: {
"owner": {
$arrayElemAt: [{ $objectToArray: "$owner" }, 1]
}
}
},
{
$addFields: {
"owner": "$owner.v"
}
},
{"$group" : {_id:"$owner", count:{$sum:1}}},
{$sort:{"count":-1}}
])
Detailed explanations here - https://dev.to/saurabh73/mongodb-using-aggregation-pipeline-to-extract-dbref-using-lookup-operator-4ekl
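The post linked above goes a step further and uses $lookup to pull in the referenced document. A minimal sketch of that step, assuming the referenced collection is called person (as the $ref value above suggests) and that its _id values match the stored $id:
db.activities.aggregate([
    // extract the DBRef's $id into a plain "owner" field, as in the snippet above
    { $addFields: { owner: { $arrayElemAt: [{ $objectToArray: "$owner" }, 1] } } },
    { $addFields: { owner: "$owner.v" } },
    // join against the referenced collection
    { $lookup: {
        from: "person",          // assumed collection name, taken from the $ref value
        localField: "owner",
        foreignField: "_id",
        as: "owner_doc"
    } }
])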
You cannot use DBRef values with the aggregation framework. Instead you need to use the JavaScript processing of mapReduce in order to access the property names they use:
db.coll.mapReduce(
function() {
emit( this.source.$ref, this["total_price"] )
},
function(key,values) {
return Array.sum( values );
},
{
"query": { "sold_at": { "$gte": start, "$lt": end } },
"out": { "inline": 1 }
}
)
You really should not be using DBRef at all. Its usage is basically deprecated now, and if you feel you need some external referencing then you should be "manually referencing" this with your own code or via some other library, which lets you do so in a much more supported way.
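For illustration, a minimal sketch of what such "manual referencing" could look like for the transaction data above (the field and collection names here are my own, not from the question):
// Store plain ids plus a type discriminator instead of DBRefs;
// the aggregation framework can then group on them directly.
db.coll_manual.insertOne({
    user_id: ObjectId(),          // instead of DBRef("user", ...)
    product_id: ObjectId(),       // instead of DBRef("product", ...)
    source_type: "webpage",       // what the DBRef's $ref used to carry
    source_id: ObjectId(),        // what the DBRef's $id used to carry
    quantity: 3,
    price: 40.95,
    total_price: 122.85,
    sold_at: new Date()
});

// start and end as in the question
db.coll_manual.aggregate([
    { $match: { sold_at: { $gte: start, $lt: end } } },
    { $group: { _id: "$source_type", count: { $sum: "$total_price" } } }
]);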
I have a Mongo document in the below format:
{
    "id": "eafa3720-28e2-11ed-bf07",
    "type": "test",
    "serviceType_details": [
        {
            "is_custom_service_type": false,
            "bill_amount": 100
        }
    ]
}
"serviceType_details" Key doesn't have any definite schema.
Now I want to export it using MongoDB aggregate to Parquet so that I could use Presto to query it.
My Pipeline Code:
db.test_collection.aggregate([
{
$match: {
"id": "something"
}
},
{
$addFields: {
...
},
},
{
"$out" : {
"format" : {
"name" : "parquet",
"maxFileSize" : "10GB",
"maxRowGroupSize" : "100MB"
}
}
}
])
Now I want to export the value of "serviceType_details" as a JSON string, not as an array (with the current code, Parquet recognises it as an array).
I have tried $convert and $project, and it's not working.
Currently the generated Parquet schema treats "serviceType_details" as an array. I want the generated Parquet schema to have "serviceType_details" as a string whose value is the stringified version of the array present in the Mongo document.
The reason I need it as a string is that in each document "serviceType_details" has a completely different schema, so it is very difficult to maintain an Athena table on top of it.
You can use the $function operator to define custom functions to implement behaviour not supported by the MongoDB Query Language.
It could be done using "$function" like this:
db.test_collection.aggregate([
{
$match: {
"id": "something"
}
},
{
$addFields: {
newFieldName: {
$function: {
body: function(field) {
return (field != undefined && field != null) ? JSON.stringify(field) : "[]"
},
args: ["$field"],
lang: "js"
}
},
},
},
{
"$out" : {
"format" : {
"name" : "parquet",
"maxFileSize" : "10GB",
"maxRowGroupSize" : "100MB"
}
}
}
])
Executing JavaScript inside an aggregation expression may decrease performance. Only use the $function operator if the provided pipeline operators cannot fulfill your application's needs.
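Applied to the field from the question, the $addFields stage would look roughly like this, writing the stringified value back onto serviceType_details (a sketch, untested; field name taken from the question):
{
    $addFields: {
        serviceType_details: {
            $function: {
                body: function(field) {
                    // stringify the array, or emit "[]" when the field is missing
                    return (field != undefined && field != null) ? JSON.stringify(field) : "[]"
                },
                args: ["$serviceType_details"],
                lang: "js"
            }
        }
    }
}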
I'm running an aggregation query, and the $group stage is as follows
$group:
{
_id:
{
year_month: { $dateToString: { "date": "$updated_at", "format": "%Y-%m" } }
,client_name: "$clients_docs.client_name"
,client_label: "$clients_docs.client_label"
,client_code: "$clients_docs.client_code"
,client_country: "$clients_docs.client_country"
,base_curr: "$clients_docs.client_base_currency"
,inv_curr: "$clients_docs.client_invoice_currency"
,dest_curr: "$store.destination_currency"
}
,total_vol: { $sum: "$USD_Value" }
,total_tran: { $sum: 1 }
}
It returns the correct results, and returns all the grouped results in the _id:{} array.
I now want to extract all those fields from the array and return them not within the array so I can more easily export the output to a spreadsheet.
I tried using this stage:
{
$project:
{
year_month: 1
,client_name: 1
,client_label: 1
,client_code: 1
,client_country: 1
,base_curr: 1
,inv_curr: 1
,dest_curr: 1
,total_vol: 1
,total_tran : 1
}
},
But that returned the same results as the $group stage:
{
"_id" : {
"year_month" : "2022-01",
"client_name" : "client A",
"client_label" : "client A",
"client_code" : NumberInt(0000),
"client_country" : "TH",
"base_curr" : "USD",
"inv_curr" : "USD",
"dest_curr" : "HKD"
},
"total_vol" : 100000,
"total_tran" : 100.0
}
I want the "year_month" through "dest_curr" fields at the same level as the "total_vol" and "total_tran", so that when the data is exported they all appear as separate columns (now it's all captured as one "_id" column, and a "total_vol" and "total_tran" column). What's the best way to do this?
From a terminology perspective, you currently have an embedded document (or nested fields) rather than an array.
The straightforward way to do this is to simply enumerate each field, eg:
"year_month": "$_id.year_month",
There are fancier ways to do this, but as you only have a handful of fields this should suffice. Working playground example here.
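For completeness, a sketch of the enumerated stage, using the field names from the question:
{
    $project: {
        _id: 0,
        year_month: "$_id.year_month",
        client_name: "$_id.client_name",
        client_label: "$_id.client_label",
        client_code: "$_id.client_code",
        client_country: "$_id.client_country",
        base_curr: "$_id.base_curr",
        inv_curr: "$_id.inv_curr",
        dest_curr: "$_id.dest_curr",
        total_vol: 1,
        total_tran: 1
    }
}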
Edit
An alternative ("fancier") approach is to leverage the $replaceWith stage using the $mergeObjects operator inside of it. Then you can $unset the previous _id field afterwards. It would look like this:
db.collection.aggregate([
{
"$replaceWith": {
"$mergeObjects": [
"$$ROOT",
"$_id"
]
}
},
{
$unset: "_id"
}
])
Playground link here
I also fixed the earlier playground link that had a typo for the client_label field.
New to Mongo. I'm trying to group across different sub-fields of a document based on a condition; the condition is a regex on a field value. It looks like this:
db.collection.aggregate([
{
"$group": {
"$cond": [{
"upper.leaf": {
$not: {
$regex: /flower/
}
}
},
{
"_id": {
"leaf": "$upper.leaf",
"stem": "$upper.stem"
}
},
{
"_id": {
"stem": "$upper.stem",
"petal": "$upper.petal"
}
}
]
}
}])
Using API v4.0. $cond in the docs shows: { $cond: [ <boolean-expression>, <true-case>, <false-case> ] }
The error I get with the above code is - "Syntax error: dotted field name 'upper.leaf' can not used in a sub object."
Reading up on that I tried $let to re-assign the dotted field name. But started to hit various syntax errors with no obvious issue in the query.
Also tried using $project to rename the fields, but got - Field names may not start with '$'
Thoughts on the best approach here? I can always address this at the application level and split my query into two, but it would be attractive to solve it natively in Mongo.
Your $group syntax is wrong. The correct form is:
{
$group:
{
_id: <expression>, // Group By Expression
<field1>: { <accumulator1> : <expression1> },
...
}
}
You tried to do
{
$group:
<expression>
}
And even if your expression resulted in the same code, it's invalid syntax for $group (check in the documentation where you are allowed to use expressions).
The other problem is that you used the query operator for regex, not the aggregation regex operators. You can't do that: in an aggregation pipeline you can only use aggregation operators ($match is the only exception, where you can use both, provided you add $expr).
You need something like this, I think:
[{
    "$group" : {
        "_id" : {
            "$cond" : [
                { "$not" : [ { "$regexMatch" : { "input" : "$upper.leaf", "regex" : "flower" } } ] },
                { "leaf" : "$upper.leaf", "stem" : "$upper.stem" },
                { "stem" : "$upper.stem", "petal" : "$upper.petal" }
            ]
        }
    }
}]
It's similar code, but the expression goes in as the value of "_id", and $regexMatch, which is an aggregation operator, is used.
I haven't tested the code.
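As an aside, the $match exception mentioned above looks like this (an illustrative sketch only):
// query-style regex works directly in $match:
db.collection.aggregate([
    { $match: { "upper.leaf": { $not: /flower/ } } }
])

// aggregation operators such as $regexMatch need $expr inside $match:
db.collection.aggregate([
    { $match: { $expr: { $not: [ { $regexMatch: { input: "$upper.leaf", regex: "flower" } } ] } } }
])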
db.students.aggregate([
{ $unwind: "$details" },
{
$group: {
_id: {
sid: "$details.student._id",
statuscode: "$details.studentStatus.statusCode"
},
total: { $sum: 1 }
}
}
]);
The query is working fine, and I need to convert it into a MongoTemplate aggregation.
Sample document:
{
"_id" : 59,
"details" : [
{
"student" : {
"_id" : "5d3145a8523a2e602e5e0200"
},
"studentStatus" : {
"statusCode" : 1
}
}
]
}
The Spring Data MongoTemplate code for the given aggregation is as follows.
Note that I have added a project stage before the group. This project is required; if the nested fields ("details.student._id" and "details.studentStatus.statusCode") are used directly within the group stage, there are errors like "FieldPath field names may not contain '.'." and the fields cannot be resolved (and this only happens when you use more than one field in the grouping).
The result is the same as that of the aggregation you have provided. I have used the latest Spring and MongoDB drivers with Java 8.
// static import for newAggregation, unwind, project, group:
import static org.springframework.data.mongodb.core.aggregation.Aggregation.*;

MongoOperations mongoOps = new MongoTemplate(MongoClients.create(), "spr_test");
Aggregation agg = newAggregation(
    unwind("details"),
    // flatten the nested fields so the group stage can reference them
    project("_id")
        .and("details.student._id").as("sid")
        .and("details.studentStatus.statusCode").as("statuscode"),
    group("sid", "statuscode")
        .count().as("total")
);
AggregationResults<Document> aggResults = mongoOps.aggregate(agg, "students", Document.class);
aggResults.forEach(System.out::println);
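For reference, the pipeline that this MongoTemplate code builds is roughly the following (reconstructed by hand, not captured from the driver):
db.students.aggregate([
    { $unwind: "$details" },
    { $project: {
        sid: "$details.student._id",
        statuscode: "$details.studentStatus.statusCode"
    } },
    { $group: {
        _id: { sid: "$sid", statuscode: "$statuscode" },
        total: { $sum: 1 }
    } }
])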
In MongoDB, using $type, it is possible to filter a search based on if the field matches a BSON data type (see DOCS).
For example:
db.posts.find({date2: {$type: 9}}, {date2: 1})
which returns:
{
"_id" : ObjectId("4c0ec11e8fd2e65c0b010000"),
"date2" : "Fri Jul 09 2010 08:25:26 GMT"
}
I need a query that will tell me what the actual type of the field is, for every field in a collection. Is this possible with MongoDB?
Starting from MongoDB 3.4, you can use the $type aggregation operator to return a field's type.
db.posts.aggregate(
[
{ "$project": { "fieldType": { "$type": "$date2" } } }
]
)
which yields:
{
"_id" : ObjectId("4c0ec11e8fd2e65c0b010000"),
"fieldType" : "string"
}
Type the below query in the mongo shell:
typeof db.employee.findOne().first_name
Syntax
typeof db.collection_name.findOne().field_name
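For example (a hedged illustration; note that typeof reports the JavaScript shell type, e.g. "object" for dates and ObjectIds, rather than the BSON type):
typeof db.employee.findOne().first_name   // "string"
typeof db.employee.findOne()._id          // "object" (an ObjectId)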
OK, here are some related questions that may help:
Get all field names in a collection using map-reduce.
Here's a recursive version that lists all possible fields.
Hopefully that can get you started. However, I suspect that you're going to run into some issues with this request. There are two problems here:
I can't find a "gettype" function for JSON. You can query by $type, but it doesn't look like you can actually run a gettype function on a field and have that maps back to the BSON type.
A field can contain data of multiple types, so you'll need a plan to handle this. Even if it's not apparent, Mongo could store some numbers as ints and others as floats without you really knowing. In fact, with the PHP driver, this is quite possible.
So if you assume that you can solve problem #1, then you should be able to solve problem #2 using a slight variation on "Get all field Names".
It would probably look something like this:
"map" : function() { for (var key in this) { emit(key, [ typeof value[key] ]); } }
"reduce" : function(key, stuff) { return (key, add_to_set(stuff) ); }
So basically you would emit the key and the type of key value (as an array) in the map function. Then from the reduce function you would add unique entries for each type.
At the end of the run you would have data like this
{"_id":[255], "name" : [1,5,8], ... }
Of course, this is all a lot of work, depending on your actual problem, you may just want to ensure (from your code) that you're always putting in the right type of data. Finding the type of data after the data is in the DB is definitely a pain.
Building on Styvane's query above, I added a $group stage to make the output easier to read when we have different data types.
db.posts.aggregate(
[
{ "$project": { _id:0, "fieldType": { "$type": "$date2" } } },
{"$group": { _id: {"fieldType": "$fieldType"},count: {$sum: 1}}}
])
And have this result:
{ "_id" : { "fieldType" : "missing" }, "count" : 50 }
{ "_id" : { "fieldType" : "date" }, "count" : 70 }
{ "_id" : { "fieldType" : "string" }, "count" : 10 }
Noting that a=5;a.constructor.toString() prints function Number() { [native code] }, one can do something similar to:
db.collection.mapReduce(
function() {
emit(this._id.constructor.toString()
.replace(/^function (\S+).+$/, "$1"), 1);
},
function(k, v) {
return Array.sum(v);
},
{
out: { inline: 1 }
});