Convert Array to Json in Mongodb Aggregate - mongodb

I have a MongoDB document in the format below:
{
"id":"eafa3720-28e2-11ed-bf07",
"type":"test",
"serviceType_details": [
{
"is_custom_service_type": false,
"bill_amount": 100
}
]
}
"serviceType_details" Key doesn't have any definite schema.
Now I want to export it to Parquet using a MongoDB aggregation so that I can query it with Presto.
My pipeline code:
db.test_collection.aggregate([
{
$match: {
"id": "something"
}
},
{
$addFields: {
...
}
},
{
"$out" : {
"format" : {
"name" : "parquet",
"maxFileSize" : "10GB",
"maxRowGroupSize" : "100MB"
}
}
}
])
I want to export the value of "serviceType_details" as a JSON string rather than as an array (with the current code, Parquet recognises it as an array).
I have tried $convert and $project, but neither works.
I want the generated Parquet schema to type "serviceType_details" as a string whose value is the stringified version of the array in the Mongo document.
I need it as a string because "serviceType_details" has a completely different schema in each document, which makes it very difficult to maintain an Athena table on top of it.

You can use the $function operator to define custom functions that implement behaviour not supported by the MongoDB Query Language.
It could be done using "$function" like this:
db.test_collection.aggregate([
{
$match: {
"id": "something"
}
},
{
$addFields: {
newFieldName: {
$function: {
body: function(field) {
return (field != undefined && field != null) ? JSON.stringify(field) : "[]"
},
args: ["$field"],
lang: "js"
}
},
}
},
{
"$out" : {
"format" : {
"name" : "parquet",
"maxFileSize" : "10GB",
"maxRowGroupSize" : "100MB"
}
}
}
])
Executing JavaScript inside an aggregation expression may decrease performance. Only use the $function operator if the provided pipeline operators cannot fulfill your application's needs.
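As a rough illustration of what the function body above returns, the same logic can be run as plain JavaScript outside MongoDB. The helper name stringifyOrEmptyArray and the missing_field key are made up for this sketch; inside the pipeline, the args entry would be the real field path, presumably "$serviceType_details" for the document in the question.

```javascript
// Plain-JavaScript sketch of the $function body (runs in Node, no MongoDB needed).
// stringifyOrEmptyArray is a hypothetical name used only for this illustration.
function stringifyOrEmptyArray(field) {
  // `field != null` is false only for null and undefined,
  // so it covers both checks from the original body
  return field != null ? JSON.stringify(field) : "[]";
}

// Sample document copied from the question
const sampleDoc = {
  id: "eafa3720-28e2-11ed-bf07",
  type: "test",
  serviceType_details: [{ is_custom_service_type: false, bill_amount: 100 }],
};

const asString = stringifyOrEmptyArray(sampleDoc.serviceType_details);
console.log(asString);
// prints [{"is_custom_service_type":false,"bill_amount":100}]

// A missing field falls back to the string "[]"
console.log(stringifyOrEmptyArray(sampleDoc.missing_field));
```

Because the result is always a string, the exported column should get a single string type no matter how the array elements differ between documents.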

Related

MongoDB - How to find() a field in a collection that has a reference to another collection?

So I have this field contacts.envCon.name which is inside the Projects collection but when I see them in mongo they are like this:
"envCon" : {
"$ref" : "contacts",
"$id" : ObjectId("5807966090c01f4174cb1714")
}
After doing a simple find based on past ObjectId:
db.getCollection('contacts').find({_id:ObjectId("5807966090c01f4174cb1714")})
I get the following result:
{
"_id" : ObjectId("5807966090c01f4174cb1714"),
"name" : "Terracon"
}
By the way, I'm using Meteor, in case there is any way to do this directly with publish/subscribe methods.
Yes, you can do this join inside a publication using the very popular reywood:publish-composite package.
With your model:
Meteor.publishComposite('projectsWithContacts', {
find: function() {
return Projects.find(); // all projects
},
children: [
{
find: function(p) { // p is one project document
return Contacts.find(
{ _id: p.envCon.$id }, // this is the relationship
{ fields: { name: 1 } }); // only return the name (_id is automatic)
}
},
]
});

MongoDB Aggregation with DBRef

Is it possible to aggregate on data that is stored via DBRef?
Mongo 2.6
Let's say I have transaction data like:
{
_id : ObjectId(...),
user : DBRef("user", ObjectId(...)),
product : DBRef("product", ObjectId(...)),
source : DBRef("website", ObjectId(...)),
quantity : 3,
price : 40.95,
total_price : 122.85,
sold_at : ISODate("2015-07-08T09:09:40.262-0700")
}
The trick is "source" is polymorphic in nature - it could be different $ref values such as "webpage", "call_center", etc that also have different ObjectIds. For example DBRef("webpage", ObjectId("1")) and DBRef("webpage",ObjectId("2")) would be two different webpages where a transaction originated.
I would like to ultimately aggregate by source over a period of time (like a month):
db.coll.aggregate( { $match : { sold_at : { $gte : start, $lt : end } } },
{ $project : { source : 1, total_price : 1 } },
{ $group : {
_id : { "source.$ref" : "$source.$ref" },
count : { $sum : "$total_price" }
} } );
The trick is you get a path error trying to use a variable starting with $ either by trying to group by it or by trying to transform using expressions via project.
Any way to do this? Actually trying to push this data via aggregation to a subcollection to operate on it there. Trying to avoid a large cursor operation over millions of records to transform the data so I can aggregate it.
Mongo 4. Solved this issue in the following way:
Having this structure:
{
"_id" : LUUID("144e690f-9613-897c-9eab-913933bed9a7"),
"owner" : {
"$ref" : "person",
"$id" : NumberLong(10)
},
...
...
}
I needed to use "owner.$id" field. But because of "$" in the name of field, I was unable to use aggregation.
I transformed "owner.$id" -> "owner" using following snippet:
db.activities.aggregate([
{
$addFields: {
"owner": {
$arrayElemAt: [{ $objectToArray: "$owner" }, 1]
}
}
},
{
$addFields: {
"owner": "$owner.v"
}
},
{"$group" : {_id:"$owner", count:{$sum:1}}},
{$sort:{"count":-1}}
])
Detailed explanations here - https://dev.to/saurabh73/mongodb-using-aggregation-pipeline-to-extract-dbref-using-lookup-operator-4ekl
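To make the two $addFields stages easier to follow, here is a plain-JavaScript emulation of what they do to a single document. It is only a sketch: objectToArray is a hand-written stand-in for the $objectToArray operator, and it assumes the DBRef stores "$ref" before "$id", which is why index 1 picks the id.

```javascript
// Plain-JavaScript emulation of the two $addFields stages (illustrative only).
// objectToArray is a hand-written stand-in, not a MongoDB API.
function objectToArray(obj) {
  // Mirrors the shape $objectToArray returns: [{ k: key, v: value }, ...]
  return Object.entries(obj).map(([k, v]) => ({ k, v }));
}

// Simplified version of the document from the answer
const activity = { owner: { $ref: "person", $id: 10 } };

// Stage 1: take the second key/value pair (index 1), which is "$id" here
const pair = objectToArray(activity.owner)[1];

// Stage 2: keep only the value, so "owner" becomes the bare id
const owner = pair.v;

console.log(pair.k, owner);
```

If the fields of the DBRef were stored in a different order, index 1 would pick the wrong pair, so it is worth checking a few documents before relying on this.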
You cannot use DBRef values with the aggregation framework. Instead you need to use the JavaScript processing of mapReduce in order to access the property naming that they use:
db.coll.mapReduce(
function() {
emit( this.source.$ref, this["total_price"] )
},
function(key,values) {
return Array.sum( values );
},
{
"query": { "sold_at": { "$gte": start, "$lt": end } },
"out": { "inline": 1 }
}
)
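For readers unfamiliar with mapReduce, the map and reduce functions above amount to a group-and-sum keyed on source.$ref. A minimal plain-JavaScript sketch of that computation follows; the sample transactions and their prices are invented for the illustration and are not from the question.

```javascript
// Plain-JavaScript sketch of what the map/reduce pair computes.
// The sample data below is made up for illustration.
const transactions = [
  { source: { $ref: "webpage", $id: 1 }, total_price: 100 },
  { source: { $ref: "webpage", $id: 2 }, total_price: 50 },
  { source: { $ref: "call_center", $id: 7 }, total_price: 25 },
];

const totals = {};
for (const t of transactions) {
  // "map" step: the key is the DBRef collection name, the value is the price
  const key = t.source.$ref;
  // "reduce" step: sum every value emitted for the same key
  totals[key] = (totals[key] || 0) + t.total_price;
}

console.log(totals); // { webpage: 150, call_center: 25 }
```

In plain JavaScript a property named "$ref" is read with ordinary dot notation, which is why the map function can reach it even though the aggregation expression in the question cannot.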
You really should not be using DBRef at all. The usage is basically deprecated now and if you feel you need some external referencing then you should be "manually referencing" this with your own code or implemented by some other library, with which you can do so in a much more supported way.

Mongo: select only one field from the nested object

In Mongo I store objects that have a field "titleComposite". This field contains an array of title objects, like this:
"titleComposite": [
"0": {
"titleType": "01",
"titleText": "Test cover uploading"
}
]
I'm performing a query and would like to receive only the "titleText" value in the results. Here is an example of my query:
db.onix_feed.find({"addedBy":201, "mediaFileComposite":{$exists:false}}, {"isbn13":1,"titleComposite.titleText":1})
In the results I see values like
{
"_id" : ObjectId("559ab286fa4634f309826385"),
"titleComposite" : [ { "titleText" : "The Nonprofit World" } ],
"isbn13" : "9781565495296"
}
Is there any way to get rid of "titleComposite" wrapper object and receive only titleText? For example, take titleText of the first element only?
Would appreciate any help
You can use MongoDB aggregation to achieve your expected result. Rearrange your query as follows:
db.onix_feed.aggregate([
{
$match: {
$and: [
{"addedBy":201},
{"mediaFileComposite":{$exists:false}}
]
}
},
{
$project : { titleText: "$titleComposite.titleText",
"isbn13" : 1 }
}
])
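Note that "$titleComposite.titleText" resolves to an array containing every titleText, so the projection above still returns an array. If only the first title is wanted as a plain string, one option is to wrap the path in $arrayElemAt. The following is a sketch of such a pipeline, not a tested drop-in; it only builds the pipeline object and prints it.

```javascript
// Sketch of a pipeline that returns titleText as a single value instead of an array.
const pipeline = [
  // Same conditions as the original query; top-level conditions are ANDed implicitly
  { $match: { addedBy: 201, mediaFileComposite: { $exists: false } } },
  {
    $project: {
      isbn13: 1,
      // The path yields an array of all titleText values;
      // $arrayElemAt with index 0 keeps only the first one
      titleText: { $arrayElemAt: ["$titleComposite.titleText", 0] },
    },
  },
];

// In the mongo shell this would be passed as: db.onix_feed.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```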

PyMongo query field of documents

In my MongoDB collection every document has several fields; one of them is a document called "engines" that holds several documents (all the engines), each of which holds several fields, as the picture shows below:
I would like to get all the (engine, definitions) pairs whose definition date is later than a specific date.
I tried:
cursor=collection.find(
{'engines': { "$elemMatch" :
{ "definitions" :
{'$gt': startdate} } } }
,{'engines':{'$elemMatch':1}},{'engines':{'$elemMatch':{'definitions':1}}} )
but I get:
TypeError: skip must be an instance of int
Can someone help with the query?
You've mixed up the closing } and ended up passing {'engines':{'$elemMatch':{'definitions':1}}} as a skip argument value.
I think you meant:
cursor = collection.find(
{
'engines': {
"$elemMatch": {
"definitions": {
'$gt': startdate
}
}
}
},
{
'engines': {
'$elemMatch': {
'definitions': 1
}
}
}
)

MongoDB update. Trying to set one field from a property of another

What I'm trying to do is pretty straightforward, but I can't find out how to give one field the value of another.
I simply want to update one field with the character count of another.
db.collection.update({$exists:true},{$set : {field1 : field2.length}})
I've tried giving it dot notation:
db.collection.update({$exists:true},{$set : {field1: "this.field2.length"}})
as well as using JavaScript syntax:
db.collection.update({$exists:true},
{$set : {field1: {$where : "this.field2.length"}}})
But these just copied the string and gave a "notOkforStorage" error, respectively. Any help?
Update:
I only get the "notOkforStorage" when I query by ID:
db.collection.update({_id:ObjectID("38289842bbb")},
{$set : {field1: {$where :"this.field2.length"}}})
Try the following code:
db.collection.find(your_query).forEach(function(doc) {
doc.field1 = doc.field2.length;
db.collection.save(doc);
});
You can use your_query to select only part of the original collection to update. If you want to process the entire collection, use your_query = {}.
If you want all operations to be atomic, use update instead of save:
db.collection.find( your_query, { field2: 1 } ).forEach(function(doc) {
db.collection.update({ _id: doc._id },{ $set: { field1: doc.field2.length } } );
});
Starting Mongo 4.2, db.collection.update() can accept an aggregation pipeline, finally allowing the update/creation of a field based on another field:
// { "_id" : ObjectId("5e84c..."), "field1" : 12, "field2" : "world" }
db.collection.update(
{ "_id" : ObjectId("5e84c...") },
[{ $set: { field1: { $strLenCP: "$field2" } } }]
)
// { "_id" : ObjectId("5e84c..."), "field1" : 5, "field2" : "world" }
The first part, { "_id" : ObjectId("5e84c...") }, is the match query, filtering which documents to update.
The second part, [{ $set: { field1: { $strLenCP: "$field2" } } }], is the update aggregation pipeline (note the square brackets signifying the use of an aggregation pipeline). $set is a new aggregation operator and an alias for $addFields. Any aggregation operator can be used within the $set stage; in our case $strLenCP, which provides the length of field2.
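One detail to keep in mind when comparing this with the forEach approach: $strLenCP counts code points, while JavaScript's .length counts UTF-16 code units, so the two can disagree for strings containing characters outside the Basic Multilingual Plane. A small sketch of the difference in plain JavaScript follows; codePointLength is a helper written for this example, not part of any MongoDB API.

```javascript
// $strLenCP counts code points; String.prototype.length counts UTF-16 code units.
// codePointLength is a hypothetical helper for this illustration.
function codePointLength(s) {
  // Spreading a string iterates it one code point at a time
  return [...s].length;
}

const plain = "world";
const withEmoji = "hi😀";

console.log(plain.length, codePointLength(plain));         // 5 5
console.log(withEmoji.length, codePointLength(withEmoji)); // 4 3
```

For plain ASCII text such as "world" both approaches give the same number, which is why the example above ends up with field1 set to 5 either way.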
As far as I know, the easiest way is the read-and-write approach:
// First, fetch the document and compute the new value:
var d = db.yourColl.findOne({....});
d.field1 = d.field2.length;
// Then save the document with the new value
db.yourColl.save(d);
You are using $exists in the wrong way.
Syntax: { field: { $exists: <boolean> } }
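Applied to the question, a filter in that shape might look like the sketch below. The matchesExists function is a hypothetical plain-JavaScript approximation of how $exists: true treats a top-level field, included only so the behaviour can be tried without a database; the three sample documents are made up.

```javascript
// Filter in the documented shape: { field: { $exists: <boolean> } }
const filter = { field2: { $exists: true } };

// Plain-JavaScript approximation of the match for a top-level field.
// matchesExists is a hypothetical helper, not a MongoDB API.
function matchesExists(doc, fieldName) {
  return Object.prototype.hasOwnProperty.call(doc, fieldName);
}

const docs = [
  { _id: 1, field2: "world" },
  { _id: 2, field2: null }, // still matches: the field is present
  { _id: 3 },               // no field2, so it does not match
];

const matchedIds = docs
  .filter((d) => matchesExists(d, "field2"))
  .map((d) => d._id);

console.log(matchedIds); // [ 1, 2 ]
// In the shell the filter would be used as, for example: db.collection.find(filter)
```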
Your use of $where is also incorrect.
Use the $where operator to pass either a string containing a JavaScript expression or a full JavaScript function to the query system:
db.myCollection.find( { $where: "this.credits == this.debits" } );
db.myCollection.find( { $where: "obj.credits == obj.debits" } );
db.myCollection.find( { $where: function() { return (this.credits == this.debits) } } );
db.myCollection.find( { $where: function() { return obj.credits == obj.debits; } } );
I think you should use Map-Reduce for what you are trying to do.