Adding a new property to the latest array item in a MongoDB embedded document

I have a MongoDB document like the one below:
{
"_id" : ObjectId("59ccb655071d4c2ceebe190c"),
"session_deatils" : [
{
"session_start" : 1,
"session_complete" : 1,
"started_at" : ISODate("2017-09-28T08:42:19.770Z"),
"event_count" : 2
},
{
"session_start" : 1,
"session_complete" : 1,
"started_at" : ISODate("2017-09-28T08:53:08.618Z"),
"event_count" : 1
},
{
"session_start" : 1,
"session_complete" : 1,
"started_at" : ISODate("2017-09-28T09:19:42.726Z")
}
],
"session_id" : "12312312313123",
}
I want to add a new field and value, such as "event_count", to the latest item in session_deatils, which is:
{
"session_start" : 1,
"session_complete" : 1,
"started_at" : ISODate("2017-09-28T09:19:42.726Z")
}
After the update, that array element should look like this:
{
"session_start" : 1,
"session_complete" : 1,
"started_at" : ISODate("2017-09-28T09:19:42.726Z"),
"event_count" : 3
}
I tried the following:
collection.update({"session_deatils.started_at": datetime.datetime(2017, 9, 28, 9, 22, 3, 459000)}, {"$set":{"session_deatils.event_count":3}})
This adds the new property to the parent document instead of the array element.
Is there a way to achieve this?
Thanks in advance.
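One way to approach this, sketched under assumptions: match the array element in the filter and use the positional `$` operator in the `$set` path, so that the field lands on the matched element rather than on the parent document. The sketch assumes pymongo and that `started_at` values are unique within the array; the helper name `build_last_session_update` is hypothetical, and the `_id` is a plain-string stand-in for the ObjectId.

```python
import datetime


def build_last_session_update(doc, event_count):
    """Build (filter, update) dicts that set event_count on the last array item.

    Hypothetical helper: it only builds the two dicts and does not
    talk to MongoDB itself.
    """
    last = doc["session_deatils"][-1]  # field name spelled as in the document
    query = {
        "_id": doc["_id"],
        # Matching on the element's own started_at lets "$" resolve to it.
        "session_deatils.started_at": last["started_at"],
    }
    # "$" stands for the index of the first array element matched by the filter.
    update = {"$set": {"session_deatils.$.event_count": event_count}}
    return query, update


doc = {
    "_id": "59ccb655071d4c2ceebe190c",  # stand-in for the ObjectId
    "session_deatils": [
        {"started_at": datetime.datetime(2017, 9, 28, 8, 42, 19, 770000), "event_count": 2},
        {"started_at": datetime.datetime(2017, 9, 28, 8, 53, 8, 618000), "event_count": 1},
        {"started_at": datetime.datetime(2017, 9, 28, 9, 19, 42, 726000)},
    ],
}
query, update = build_last_session_update(doc, 3)
# With pymongo this would be applied as: collection.update_one(query, update)
```

The filter has to match a `started_at` value that actually exists in the array. The timestamp used in the attempt above (09:22:03.459) does not appear in the sample document, so that filter would not match any element of it.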

Related

Groovy: Retrieve a value from a JSON based on an object

So I have a JSON that looks like this
{
"_embedded" : {
"userTaskDtoList" : [
{
"userTaskId" : 8,
"userTaskDefinitionId" : "JJG",
"userRoleId" : 8,
"workflowId" : 9,
"criticality" : "MEDIUM",
"dueDate" : "2021-09-29T09:04:37Z",
"dueDateFormatted" : "Tomorrow 09:04",
"acknowledge" : false,
"key" : 8,
},
{
"userTaskId" : 10,
"userTaskDefinitionId" : "JJP",
"userRoleId" : 8,
"workflowId" : 11,
"criticality" : "MEDIUM",
"dueDate" : "2021-09-29T09:06:44Z",
"dueDateFormatted" : "Tomorrow 09:06",
"acknowledge" : false,
"key" : 10,
},
{
"userTaskId" : 12,
"userTaskDefinitionId" : "JJD",
"userRoleId" : 8,
"workflowId" : 13,
"criticality" : "MEDIUM",
"dueDate" : "2021-09-29T09:59:07Z",
"dueDateFormatted" : "Tomorrow 09:59",
"acknowledge" : false,
"key" : 12,
}
]
}
}
It's the response to a REST request. I need to extract the value of the "dueDate" key from ONE specific object only and run some validations on it. I'm trying to do this in Groovy.
The only thing I've managed to do is this:
import groovy.json.*
def response = context.expand( '${user tasks#Response}' )
def data = new JsonSlurper().parseText(response)
idValue = data._embedded.userTaskDtoList.dueDate
This returns all three "dueDate" values in the response.
I was thinking that I could select a particular object based on another key. For instance, I would retrieve only the "dueDate" value of the object with "userTaskId" : 12.
How could I do this?
Any help would be greatly appreciated.
You can find the record of interest with find, then read dueDate from it:
data._embedded.userTaskDtoList.find { it.userTaskId == 12 }.dueDate

MongoDB: put a sub-document's fields on the same level as the other fields

I have this array of documents, and I would like to put the fields of "table" on the same level as mastil_antena and the other fields. How can I do that with aggregate?
I'm trying the aggregate $project stage, but I can't get the result.
Example of the data:
[ {
"mastil_antena" : "1",
"nro_platf" : "1",
"antmarcmast" : "ANDREW",
"antmodelmast" : "HWXXX6516DSA3M",
"retmarcmast" : "Ericsson",
"retmodelmast" : "ATM200-A20",
"distmast" : "1.50",
"altncramast" : "41.30",
"ORIENTMAG" : "73.00",
"incelecmast" : "RET",
"incmecmast" : "1.00",
"Feedertypemast" : "Fibra Optica",
"longjumpmast" : "5.00",
"longfo" : "100",
"calibrecablefuerza" : "10 mm",
"longcablefuerza" : "65.00",
"modelorruantena" : "32B66A",
"tiltmecfoto" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017114934746000.jpg",
"tiltmecfoto_fh" : "2017-10-18T05:51:22Z",
"az0foto" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017115012727000.jpg",
"az0foto_fh" : "2017-10-18T05:55:21Z",
"azneg60foto" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017115016199000.jpg",
"azneg60foto_fh" : "2017-10-18T05:55:36Z",
"azpos60foto" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017115020147000.jpg",
"azpos60foto_fh" : "2017-10-18T05:55:49Z",
"etiqantenafoto" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017114920853000.jpg",
"etiqantenafoto_fh" : "2017-10-18T05:56:01Z",
"tiltelectfoto" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017114914236000.jpg",
"tiltelectfoto_fh" : "2017-10-18T05:56:13Z",
"idcablefoto" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017114900279000.jpg",
"idcablefoto_fh" : "2017-10-18T05:56:38Z",
"rrutmafoto" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017114947279000.jpg",
"rrutmafoto_fh" : "2017-10-18T05:56:49Z",
"etiquetarrufoto" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017114954648000.jpg",
"etiquetarrufoto_fh" : "2017-10-18T05:57:02Z",
"rrutmafoto1" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017114959738000.jpg",
"rrutmafoto1_fh" : "2017-10-18T05:57:12Z",
"etiquetarrufoto1" : "https://secure.appenate.com/Files/FormEntry/47929-92cdf219-3128-4903-8324-a81000602b9d171017115005545000.jpg",
"etiquetarrufoto1_fh" : "2017-10-18T05:57:27Z",
"botontorre4" : "sstelcel3",
"table" : { /* put all varibles one level up*/
"tecmast" : "LTE",
"frecmast" : "2100",
"secmast" : "1",
"untitled440" : "Salir"
},
"comentmast" : "",
"longfeedmast" : "",
"numtmasmast" : "",
"otra_marca_antena" : "",
"otro_modelo_antena" : ""
}]
Starting from MongoDB version 3.4, you can use $addFields to do this.
// Replace products with your own collection name.
// Assumes each element of the array above is stored as its own document in the collection.
db.getCollection('products').aggregate(
[
{ // 1. Copy the properties of the table sub-document to the top level.
// The "$" prefix makes each value a field path; without it the literal string would be stored.
$addFields: {
"tecmast" : "$table.tecmast",
"frecmast" : "$table.frecmast",
"secmast" : "$table.secmast",
"untitled440" : "$table.untitled440"
}
},
{
// 2. (Optional) Remove the table property.
$project: {"table" : 0}
}
]
)
Step 1: use $addFields to copy the properties inside table to the top level of each document.
Step 2: (optional) remove the "table" property.
I hope this helps!!!
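When the keys inside the sub-document are not known in advance, an alternative is to merge the whole sub-document into the root instead of listing each field. The following is a minimal sketch, assuming MongoDB 3.6 or later (for `$mergeObjects`), pymongo, and that each element shown in the question is stored as its own document. The helper name `lift_subdocument_pipeline` is hypothetical, and `products` is the placeholder collection name from the answer above.

```python
def lift_subdocument_pipeline(field):
    """Return an aggregation pipeline that merges `field`'s keys into the root."""
    return [
        # Merge the root document with the sub-document. On a key clash the
        # sub-document's value wins because it is listed last.
        {"$replaceRoot": {"newRoot": {"$mergeObjects": ["$$ROOT", "$" + field]}}},
        # Drop the now-redundant sub-document.
        {"$project": {field: 0}},
    ]


pipeline = lift_subdocument_pipeline("table")
# With pymongo this would be run as: list(db.products.aggregate(pipeline))
```

The pipeline only changes what the aggregation returns; the stored documents stay as they are unless the result is written back.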

Apache Spark: How to get Parquet output file size and record count

I am a newbie to Apache Spark and I want to get the size of a Parquet output file.
My scenario is:
Read the file from CSV and save it as a text file:
myRDD.saveAsTextFile("person.txt")
After the file is saved, the Spark UI (localhost:4040) shows inputBytes 15607801 and outputBytes 13551724.
But when I save it as a Parquet file:
myDF.saveAsParquetFile("person.perquet")
the Stages tab of the UI (localhost:4040) shows only inputBytes 15607801, and nothing for outputBytes.
Can anybody help me? Thanks in advance.
Edit
When I call the REST API, it gives me the following response:
[ {
"status" : "COMPLETE",
"stageId" : 4,
"attemptId" : 0,
"numActiveTasks" : 0,
"numCompleteTasks" : 1,
"numFailedTasks" : 0,
"executorRunTime" : 10955,
"inputBytes" : 15607801,
"inputRecords" : 1440721,
"outputBytes" : 0,
"outputRecords" : 0,
"shuffleReadBytes" : 0,
"shuffleReadRecords" : 0,
"shuffleWriteBytes" : 0,
"shuffleWriteRecords" : 0,
"memoryBytesSpilled" : 0,
"diskBytesSpilled" : 0,
"name" : "saveAsParquetFile at ParquetExample.scala:82",
"details" : "org.apache.spark.sql.DataFrame.saveAsParquetFile(DataFrame.scala:1494)\ncom.spark.sql.ParquetExample$.main(ParquetExample.scala:82)\ncom.spark.sql.ParquetExample.main(ParquetExample.scala)",
"schedulingPool" : "default",
"accumulatorUpdates" : [ ]
}, {
"status" : "COMPLETE",
"stageId" : 3,
"attemptId" : 0,
"numActiveTasks" : 0,
"numCompleteTasks" : 1,
"numFailedTasks" : 0,
"executorRunTime" : 2091,
"inputBytes" : 15607801,
"inputRecords" : 1440721,
"outputBytes" : 13551724,
"outputRecords" : 1200540,
"shuffleReadBytes" : 0,
"shuffleReadRecords" : 0,
"shuffleWriteBytes" : 0,
"shuffleWriteRecords" : 0,
"memoryBytesSpilled" : 0,
"diskBytesSpilled" : 0,
"name" : "saveAsTextFile at ParquetExample.scala:77",
"details" : "org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1379)\ncom.spark.sql.ParquetExample$.main(ParquetExample.scala:77)\ncom.spark.sql.ParquetExample.main(ParquetExample.scala)",
"schedulingPool" : "default",
"accumulatorUpdates" : [ ]
} ]
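Independently of what the UI reports, the size on disk can be measured from the output directory itself. The following is a rough sketch, assuming the output was written to a local filesystem and that the data files end in `.parquet`. The function name `output_size_bytes` and the file names in the demo are hypothetical.

```python
import os
import tempfile


def output_size_bytes(path, suffix=".parquet"):
    """Sum the sizes of the data files under a Spark output directory."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            # Skip marker and metadata files such as _SUCCESS.
            if name.endswith(suffix):
                total += os.path.getsize(os.path.join(root, name))
    return total


# Demo on a throwaway directory that mimics an output folder.
with tempfile.TemporaryDirectory() as out_dir:
    with open(os.path.join(out_dir, "part-r-00000.parquet"), "wb") as f:
        f.write(b"x" * 10)
    with open(os.path.join(out_dir, "part-r-00001.parquet"), "wb") as f:
        f.write(b"x" * 5)
    open(os.path.join(out_dir, "_SUCCESS"), "wb").close()
    demo_total = output_size_bytes(out_dir)
```

This gives only the byte size. A record count would require reading the files back, for example by loading the directory into a DataFrame and calling count() on it.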

Errors while creating a collection in MongoDB

I am new to MongoDB and am not able to create a collection. The mongo shell responds with the prompt "Display all 169 possibilities? (y or n)". The code is:
db.Lead.insert(
{ LeadID: 1,
MasterAccountID: 100,
LeadName: 'Sarah',
LeadEmailID : 'sarah#hmail.com',
LeadPhoneNumber : '2132155445',
Details : [{ StateID: 1,
TaskID : 1,
Assigned By : 1001,
TimeStamp : '10:00:00',
StatusID : 1 }
]
}
)
I'm not sure what the issue is. Please help me out.
Regards.
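Apart from the space in the key Assigned By, everything looks good. An unquoted key cannot contain a space, so either remove the space (AssignedBy) or quote the key ("Assigned By").
With the key written as AssignedBy, I am able to insert the document properly: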
> db.Lead.find().pretty()
{
"_id" : ObjectId("517ebe75278e0557fd167eb7"),
"LeadID" : 1,
"MasterAccountID" : 100,
"LeadName" : "Sarah",
"LeadEmailID" : "sarah#hmail.com",
"LeadPhoneNumber" : "2132155445",
"Details" : [
{
"StateID" : 1,
"TaskID" : 1,
"AssignedBy" : 1001,
"TimeStamp" : "10:00:00",
"StatusID" : 1
}
]
}

Merging two collections in MongoDB

I've been trying to use MapReduce in MongoDB to do what I think is a simple procedure. I don't know if this is the right approach, or if I should even be using MapReduce. I googled the keywords I could think of and tried the docs where I thought I would have the most success, but found nothing. Maybe I'm thinking too hard about this?
I have two collections: details and gpas
details is made up of a whole bunch of documents (3+ million). The studentid element can be repeated two times, one for each year, like the following:
{ "_id" : ObjectId("4d49b7yah5b6d8372v640100"), "classes" : [1,17,19,21], "studentid" : "12345a", "year" : 1}
{ "_id" : ObjectId("4d76b7oij7s2d8372v640100"), "classes" : [2,12,19,22], "studentid" : "98765a", "year" : 1}
{ "_id" : ObjectId("4d49b7oij7s2d8372v640100"), "classes" : [32,91,101,217], "studentid" : "12345a", "year" : 2}
{ "_id" : ObjectId("4d76b7rty7s2d8372v640100"), "classes" : [1,11,18,22], "studentid" : "24680a", "year" : 1}
{ "_id" : ObjectId("4d49b7oij7s2d8856v640100"), "classes" : [32,99,110,215], "studentid" : "98765a", "year" : 2}
...
gpas has elements with the same studentid's from details. Only one entry per studentid, like this:
{ "_id" : ObjectId("4d49b7yah5b6d8372v640111"), "studentid" : "12345a", "overall" : 97, "subscore": 1}
{ "_id" : ObjectId("4f76b7oij7s2d8372v640213"), "studentid" : "98765a", "overall" : 85, "subscore": 5}
{ "_id" : ObjectId("4j49b7oij7s2d8372v640871"), "studentid" : "24680a", "overall" : 76, "subscore": 2}
...
In the end I want to have a collection with one row for each student in this format:
{ "_id" : ObjectId("4d49b7yah5b6d8372v640111"), "studentid" : "12345a", "classes_1": [1,17,19,21], "classes_2": [32,91,101,217], "overall" : 97, "subscore": 1}
{ "_id" : ObjectId("4f76b7oij7s2d8372v640213"), "studentid" : "98765a", "classes_1": [2,12,19,22], "classes_2": [32,99,110,215], "overall" : 85, "subscore": 5}
{ "_id" : ObjectId("4j49b7oij7s2d8372v640871"), "studentid" : "24680a", "classes_1": [1,11,18,22], "classes_2": [], "overall" : 76, "subscore": 2}
...
The way I was going to do this was by running MapReduce like this:
var mapDetails = function() {
emit(this.studentid, {studentid: this.studentid, classes: this.classes, year: this.year, overall: 0, subscore: 0});
};
var mapGpas = function() {
emit(this.studentid, {studentid: this.studentid, classes: [], year: 0, overall: this.overall, subscore: this.subscore});
};
var reduce = function(key, values) {
var outs = { studentid: "0", classes_1: [], classes_2: [], overall: 0, subscore: 0};
values.forEach(function(value) {
if (value.year == 0) {
outs.overall = value.overall;
outs.subscore = value.subscore;
}
else {
if (value.year == 1) {
outs.classes_1 = value.classes;
}
if (value.year == 2) {
outs.classes_2 = value.classes;
}
outs.studentid = value.studentid;
}
});
return outs;
};
res = db.details.mapReduce(mapDetails, reduce, {out: {reduce: 'joined'}})
res = db.gpas.mapReduce(mapGpas, reduce, {out: {reduce: 'joined'}})
But when I run it, this is my resulting collection:
{ "_id" : "12345a", "value" : { "studentid" : "12345a", "classes_1" : [ ], "classes_2" : [ ], "overall" : 97, "subscore" : 1 } }
{ "_id" : "98765a", "value" : { "studentid" : "98765a", "classes_1" : [ ], "classes_2" : [ ], "overall" : 85, "subscore" : 5 } }
{ "_id" : "24680a", "value" : { "studentid" : "24680a", "classes_1" : [ ], "classes_2" : [ ], "overall" : 76, "subscore" : 2 } }
I'm missing the classes arrays.
Also, as an aside, how do I access the elements in resulting MapReduce value element? Does MapReduce always output to value or whatever else you name it?
This is similar to a question that was asked on the MongoDB-users Google Groups.
https://groups.google.com/group/mongodb-user/browse_thread/thread/60a8b683e2626ada?pli=1
The answer references an on-line tutorial which looks similar to your example:
http://tebros.com/2011/07/using-mongodb-mapreduce-to-join-2-collections/
For more information on MapReduce in MongoDB, please see the documentation:
http://www.mongodb.org/display/DOCS/MapReduce
Additionally, there is a useful step-by-step walkthrough of how a MapReduce operation works in the "Extras" Section of the MongoDB Cookbook article titled, "Finding Max And Min Values with Versioned Documents":
http://cookbook.mongodb.org/patterns/finding_max_and_min/
Forgive me if you have already read some of the referenced documents. I have included them for the benefit of other users who may be reading this post and are new to using MapReduce in MongoDB.
It is important that the values output by the 'emit' statements in the Map functions have the same shape as the output of the Reduce function. If the Map function emits only one value for a key, the Reduce function is not called for that key, and the emitted value is written to the output as-is. Your output collection would then contain documents of mismatched shapes.
I have slightly modified your map statements to emit documents in the format of your desired output, with two separate "classes" arrays.
I have also reworked your reduce statement to add new classes to the classes_1 and classes_2 arrays, only if they do not already exist.
var mapDetails = function(){
var output = {studentid: this.studentid, classes_1: [], classes_2: [], year: this.year, overall: 0, subscore: 0}
if (this.year == 1) {
output.classes_1 = this.classes;
}
if (this.year == 2) {
output.classes_2 = this.classes;
}
emit(this.studentid, output);
};
var mapGpas = function() {
emit(this.studentid, {studentid: this.studentid, classes_1: [], classes_2: [], year: 0, overall: this.overall, subscore: this.subscore});
};
var r = function(key, values) {
var outs = { studentid: "0", classes_1: [], classes_2: [], overall: 0, subscore: 0};
values.forEach(function(v){
outs.studentid = v.studentid;
// "class" is a reserved word in JavaScript, so the parameter is named cls.
v.classes_1.forEach(function(cls){if(outs.classes_1.indexOf(cls)==-1){outs.classes_1.push(cls)}})
v.classes_2.forEach(function(cls){if(outs.classes_2.indexOf(cls)==-1){outs.classes_2.push(cls)}})
if (v.year == 0) {
outs.overall = v.overall;
outs.subscore = v.subscore;
}
});
return outs;
};
res = db.details.mapReduce(mapDetails, r, {out: {reduce: 'joined'}})
res = db.gpas.mapReduce(mapGpas, r, {out: {reduce: 'joined'}})
Running the two MapReduce operations results in the following collection, which matches your desired format:
> db.joined.find()
{ "_id" : "12345a", "value" : { "studentid" : "12345a", "classes_1" : [ 1, 17, 19, 21 ], "classes_2" : [ 32, 91, 101, 217 ], "overall" : 97, "subscore" : 1 } }
{ "_id" : "24680a", "value" : { "studentid" : "24680a", "classes_1" : [ 1, 11, 18, 22 ], "classes_2" : [ ], "overall" : 76, "subscore" : 2 } }
{ "_id" : "98765a", "value" : { "studentid" : "98765a", "classes_1" : [ 2, 12, 19, 22 ], "classes_2" : [ 32, 99, 110, 215 ], "overall" : 85, "subscore" : 5 } }
>
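The requirement that map output and reduce output share one shape can be checked outside MongoDB. The sketch below is a plain-Python model of the idea, not the shell code above. It differs from that code in two ways that are assumptions of this sketch: it uses None to mark "no GPA seen yet", and it does not rely on a year field in the reduced value, so a value that has already been reduced can safely be reduced again. All function names are hypothetical.

```python
def map_detail(doc):
    """Model of the details map: emit (key, value) in the final output shape."""
    value = {"studentid": doc["studentid"], "classes_1": [], "classes_2": [],
             "overall": None, "subscore": None}
    if doc["year"] in (1, 2):
        value["classes_%d" % doc["year"]] = list(doc["classes"])
    return doc["studentid"], value


def map_gpa(doc):
    """Model of the gpas map: same shape, with the scores filled in."""
    return doc["studentid"], {"studentid": doc["studentid"], "classes_1": [],
                              "classes_2": [], "overall": doc["overall"],
                              "subscore": doc["subscore"]}


def reduce_student(key, values):
    """Merge values of one student; the result has the same shape as the inputs."""
    out = {"studentid": key, "classes_1": [], "classes_2": [],
           "overall": None, "subscore": None}
    for v in values:
        for field in ("classes_1", "classes_2"):
            for c in v[field]:
                if c not in out[field]:  # keep each class once, in first-seen order
                    out[field].append(c)
        if v["overall"] is not None:  # only a GPA value carries the scores
            out["overall"] = v["overall"]
            out["subscore"] = v["subscore"]
    return out


details = [
    {"studentid": "12345a", "classes": [1, 17, 19, 21], "year": 1},
    {"studentid": "12345a", "classes": [32, 91, 101, 217], "year": 2},
]
gpa = {"studentid": "12345a", "overall": 97, "subscore": 1}

values = [map_detail(d)[1] for d in details] + [map_gpa(gpa)[1]]
merged = reduce_student("12345a", values)
# Reducing in two passes, or reducing an already reduced value, gives the same result.
two_pass = reduce_student("12345a", [reduce_student("12345a", values[:2]), values[2]])
again = reduce_student("12345a", [merged])
```

Because every emitted value already has the final shape, a key with a single emitted value still ends up as a well-formed document even though the reduce step is skipped for it.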
MapReduce always outputs documents in the form {_id: <key>, value: <reduced value>}.
There is more information available on working with sub-documents in the document titled, "Dot Notation (Reaching into Objects)":
http://www.mongodb.org/display/DOCS/Dot+Notation+%28Reaching+into+Objects%29
If you would like the output of MapReduce to appear in a different format, you will have to do that programmatically in your application.
Hopefully this will improve your understanding of MapReduce, and get you one step closer to producing your desired output collection. Good Luck!
You cannot use MapReduce for this, since it is designed to operate on only one collection. Reading from more than one collection would break sharding compatibility and is therefore not allowed. You can do what you want either with the new aggregation framework (2.1+) or inside your application.
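As an illustration of the aggregation route, here is a sketch that joins the two collections with `$lookup`. It assumes MongoDB 3.2 or later, which is newer than the version mentioned above, and pymongo. The collection and field names come from the question; the helper names are hypothetical. With millions of documents in details, an index on `studentid` there is likely needed for this to perform acceptably.

```python
def classes_for_year(year):
    """Expression for the classes array of the joined details document for
    `year`, or [] when the student has no document for that year."""
    matching = {
        "$filter": {
            "input": "$details",
            "as": "d",
            "cond": {"$eq": ["$$d.year", year]},
        }
    }
    classes = {"$map": {"input": matching, "as": "d", "in": "$$d.classes"}}
    # $arrayElemAt yields nothing for an empty array, so fall back to [].
    return {"$ifNull": [{"$arrayElemAt": [classes, 0]}, []]}


def build_join_pipeline(target="joined"):
    """Pipeline to run on gpas: one output document per student."""
    return [
        {"$lookup": {"from": "details", "localField": "studentid",
                     "foreignField": "studentid", "as": "details"}},
        {"$project": {"studentid": 1, "overall": 1, "subscore": 1,
                      "classes_1": classes_for_year(1),
                      "classes_2": classes_for_year(2)}},
        # Write the result to a collection, replacing it if it exists.
        {"$out": target},
    ]


pipeline = build_join_pipeline()
# With pymongo this would be run as: db.gpas.aggregate(pipeline)
```

Running the pipeline on gpas keeps the `_id` of each gpas document, which matches the desired output format in the question.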