I have a JSON file with a horrific data structure
{ "#timestamp" : "20160226T065604,39Z",
"#toplevelentries" : "941",
"viewentry" : [ { "#noteid" : "161E",
"#position" : "1",
"#siblings" : "941",
"entrydata" : [
and entrydata is a list of 941 entries, each of which looks like this:
{ "#columnnumber" : "0",
"#name" : "$Created",
"datetime" : { "0" : "20081027T114133,55+01" }
},
{ "#columnnumber" : "1",
"#name" : "WriteLog",
"textlist" : { "text" : [ { "0" : "2008.OCT.28 12:54:39 CET # EMI" },
{ "0" : "2008.OCT.28 12:56:13 CET # EMI" },
There are many more columns. The structure is always this:
{
"#columnnumber": "17",
"#name": "PublicDocument",
"text": {
"0": "TMI-1-2005.pdf"
}
}
There's a column number, which we can throw away; a #name, which is the important part; then one of the text, datetime, or textlist fields, where the value is always this weird subdocument with a 0 key holding the actual value.
All 941 entries have the same number of these column entries, and a given column entry always has the same structure. I.e., if "#columnnumber": "13" has #name: foo then it'll always be foo, and if it has a datetime key then it will always have a datetime key, never a text or textlist. This monster was borne out of a SQL or similar database somewhere at the very far end of the pipeline, but I have no access to the source beyond this. The goal is to revert the transformation and make it into something a SELECT statement would produce (except textlist, although I guess array_agg and similar could produce that too).
Is there a way to get 941 separate JSON entries out of MongoDB looking like:
{
$Created: "20081027T114133,55+01",
WriteLog: ["2008.OCT.28 12:54:39 CET # EMI", "2008.OCT.28 12:56:13 CET # EMI"],
PublicDocument: "TMI-1-2005.pdf"
}
Is viewentry also a list?
If you do an aggregate on the collection and $unwind on viewentry.entrydata, you will get one document for every entrydata. It should be possible to then do a $project to reformat these documents to produce the output you need.
This is a nice challenge. To get output like that:
{
"_id" : "161E",
"field" : [
{
"name" : "$Created",
"datetime" : {
"0" : "20081027T114133,55+01"
}
},
{
"name" : "WriteLog",
"textlist" : {
"text" : [
{
"0" : "2008.OCT.28 12:54:39 CET # EMI"
},
{
"0" : "2008.OCT.28 12:56:13 CET # EMI"
}
]
}
}
]
}
Use this aggregation pipeline:
db.chx.aggregate([
    { $unwind: "$viewentry" },
    { $unwind: "$viewentry.entrydata" },
    { $group: {
        "_id": "$viewentry.#noteid",
        field: { $push: {
            "name": "$viewentry.entrydata.#name",
            datetime: "$viewentry.entrydata.datetime",
            textlist: "$viewentry.entrydata.textlist"
        } }
    } }
]).pretty()
The next step should be extracting the log entries, but I have no idea how, as my brain is already fried tonight, so I'll probably return later...
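Picking up that next step: one way to finish the reshaping (an untested sketch, assuming MongoDB 3.4.4+ for $arrayToObject and $replaceRoot, and reusing the chx collection from above) is to normalize every column into a k/v pair and then fold the pairs back into a flat document:

db.chx.aggregate([
    { $unwind: "$viewentry" },
    { $unwind: "$viewentry.entrydata" },
    // Normalize each column: whichever of text/datetime/textlist is present,
    // pull the actual value out of the weird { "0": ... } wrapper.
    { $project: {
        noteid: "$viewentry.#noteid",
        kv: {
            k: "$viewentry.entrydata.#name",
            v: { $ifNull: [
                "$viewentry.entrydata.text.0",
                { $ifNull: [
                    "$viewentry.entrydata.datetime.0",
                    "$viewentry.entrydata.textlist.text.0"   // yields an array of the "0" values
                ] }
            ] }
        }
    } },
    { $group: { _id: "$noteid", fields: { $push: "$kv" } } },
    // Fold the k/v pairs back into one flat document per row.
    { $replaceRoot: { newRoot: { $arrayToObject: "$fields" } } }
])

One caveat: servers older than 5.0 may reject output field names that start with $ (like $Created), so those might need renaming first.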
I am working on MongoDB, making a database structure like this:
{
"_id" : ObjectId("5e9fef05c0228a50befba0d7"),
"name" : "sanghm",
"id" : "3456",
"dep" : {
"dep1" : "ops",
"dep2" : "analytics"
},
"data" : [
{
"date" : "25-apr-2020",
"log" : [
{
"machine" : "windows-user1",
"task" : "excel",
"time": "10:00am"
},
{
"machine" : "windows-user1",
"task" : "email",
"time": "11:00am"
}
]
}
]
}
To push every new task to the database, I'm using a query like:
date = '25-apr-2020'
db.inventory.update(
    { id: '3456' },
    { $push: { 'data.$[t].log': { machine: 'windows-user1', task: 'new task', time: '3:00pm' } } },
    { arrayFilters: [ { 't.date': date } ] }
)
This query pushes a new task onto the matching date's log array.
If the date changes to '26-apr-2020', can we use the same query to first add the new date field and then push the data for it?
I have read about {upsert: true} but don't know how to use it in my case.
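A note on {upsert: true}: an upsert that matches nothing creates a brand-new top-level document, not a new element inside data, so on its own it won't add the '26-apr-2020' element. A minimal two-step sketch (field names taken from the question) that appends to an existing date's log and otherwise pushes a fresh element:

var date = '26-apr-2020';
var task = { machine: 'windows-user1', task: 'new task', time: '3:00pm' };

// Try to append to the element for this date, if one exists.
var res = db.inventory.updateOne(
    { id: '3456', 'data.date': date },
    { $push: { 'data.$.log': task } }
);

// No element for this date yet: push a new one.
if (res.matchedCount === 0) {
    db.inventory.updateOne(
        { id: '3456' },
        { $push: { data: { date: date, log: [task] } } }
    );
}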
The double nesting is weird. Usually all of the events would be stored flat in the top level data array and one would produce the grouping by date (if desired) at query time.
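As a sketch of that flatter shape (hypothetical layout, same field names), each task becomes its own element and the per-date grouping is rebuilt in an aggregation:

// Store every task flat, tagged with its date.
db.inventory.updateOne(
    { id: '3456' },
    { $push: { data: { date: '25-apr-2020', machine: 'windows-user1',
                       task: 'new task', time: '3:00pm' } } }
)

// Rebuild the grouping by date at query time.
db.inventory.aggregate([
    { $match: { id: '3456' } },
    { $unwind: "$data" },
    { $group: {
        _id: "$data.date",
        log: { $push: { machine: "$data.machine",
                        task: "$data.task",
                        time: "$data.time" } }
    } }
])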
Followup Question
Thanks @4J41 for your spot-on resolution. Along the same lines, I'd also like to validate one other thing.
I have a mongo document that contains an array of strings, and I need to convert this particular array of strings into an array of objects containing a key-value pair. Below is my current approach to it.
Mongo Record:
Same mongo record as in my initial question below.
Current Query:
db.templateAttributes.find({ platform: "V1" }).map(function(c){
    var values = c['available']['Community']['attributes']['type']['values'];
    // Build a {label, value} object from each string.
    var optionsArray = [];
    for (var i = 0; i < values.length; i++) {
        optionsArray[i] = { label: values[i], value: values[i] };
    }
    return optionsArray;
})[0];
Result:
[{label:"well-known", value:"well-known"},
{label:"simple", value:"simple"},
{label:"complex", value:"complex"}]
Is my approach efficient enough, or is there a way to optimize the above query to get the same desired result?
Initial Question
I have a mongo document like below:
{
"_id" : ObjectId("57e3720836e36f63695a2ef2"),
"platform" : "A1",
"available" : {
"Community" : {
"attributes" : {
"type" : {
"values" : [
"well-known",
"simple",
"complex"
],
"defaultValue" : "well-known"
},
[......]
}
I'm trying to query the DB and retrieve only the value of the defaultValue field.
I tried:
db.templateAttributes.find(
{ platform: "A1" },
{ "available.Community.attributes.type.defaultValue": 1 }
)
as well as
db.templateAttributes.findOne(
{ platform: "A1" },
{ "available.Community.attributes.type.defaultValue": 1 }
)
But they both seem to retrieve the entire object hierarchy, like below:
{
"_id" : ObjectId("57e3720836e36f63695a2ef2"),
"available" : {
"Community" : {
"attributes" : {
"type" : {
"defaultValue" : "well-known"
}
}
}
}
}
The only way I could get it to work was with find and a map function, but it seems a bit convoluted.
Does anyone have a simpler way to get this result?
db.templateAttributes.find(
{ platform: "A1" },
{ "available.Community.attributes.type.defaultValue": 1 }
).map(function(c){
return c['available']['Community']['attributes']['type']['defaultValue']
})[0]
Output
well-known
You could try the following.
Using find:
db.templateAttributes.find({ platform: "A1" }, { "available.Community.attributes.type.defaultValue": 1 }).toArray()[0]['available']['Community']['attributes']['type']['defaultValue']
Using findOne:
db.templateAttributes.findOne({ platform: "A1" }, { "available.Community.attributes.type.defaultValue": 1 })['available']['Community']['attributes']['type']['defaultValue']
Using aggregation:
db.templateAttributes.aggregate([
{"$match":{platform:"A1"}},
{"$project": {_id:0, default:"$available.Community.attributes.type.defaultValue"}}
]).toArray()[0].default
Output:
well-known
Edit: Answering the updated question: Please use aggregation here.
db.templateAttributes.aggregate([
{"$match":{platform:"A1"}}, {"$unwind": "$available.Community.attributes.type.values"},
{$group: {"_id": null, "val":{"$push":{label:"$available.Community.attributes.type.values",
value:"$available.Community.attributes.type.values"}}}}
]).toArray()[0].val
Output:
[
{
"label" : "well-known",
"value" : "well-known"
},
{
"label" : "simple",
"value" : "simple"
},
{
"label" : "complex",
"value" : "complex"
}
]
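On the efficiency question: if your server is 2.6+, $map can build the label/value pairs in a single $project, with no $unwind/$group round trip (a sketch, untested against your data):

db.templateAttributes.aggregate([
    { "$match": { platform: "A1" } },
    { "$project": { _id: 0, val: {
        "$map": {
            input: "$available.Community.attributes.type.values",
            as: "v",
            in: { label: "$$v", value: "$$v" }
        }
    } } }
]).toArray()[0].val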
I have a collection:
gStats : {
"_id" : "id1",
"criteria" : ["key1":"value1", "key2":"value2"],
"groups" : [
{"id":"XXXX", "visited":100, "liked":200},
{"id":"YYYY", "visited":30, "liked":400}
]
}
I want to be able to update a document in the stats array for a given array of criteria (exact match).
I try to do this in 2 steps:
Pull the stat document from the array for a given "id":
db.gStats.update({
"criteria" : {$size : 2},
"criteria" : {$all : [{"key1" : "2096955"},{"value1" : "2015610"}]}
},
{
$pull : {groups : {"id" : "XXXX"}}
}
)
Push the new document
db.gStats.findAndModify({
query : {
"criteria" : {$size : 2},
"criteria" : {$all : [{"key1" : "2015610"}, {"key2" : "2096955"}]}
},
update : {
$push : {groups : {"id" : "XXXX", "visited" : 29, "liked" : 144}}
},
upsert : true
})
The Pull query works perfectly.
The Push query gives an error:
2014-12-13T15:12:58.571+0100 findAndModifyFailed failed: {
    "value" : null,
    "errmsg" : "exception: Cannot create base during insert of update. Caused by :ConflictingUpdateOperators Cannot update 'criteria' and 'criteria' at the same time",
    "code" : 12,
    "ok" : 0
} at src/mongo/shell/collection.js:614
Neither query is working in reality. You cannot use a key name like "criteria" more than once unless it is under an operator such as $and. You are also specifying different fields (i.e. groups) and querying elements that do not exist in your sample document.
So it's hard to tell what you really want to do here. But the error is essentially caused by the first issue I mentioned, with a little something extra. So really your { "$size": 2 } condition is being ignored and only the second condition is applied.
A valid query form should look like this:
query: {
"$and": [
{ "criteria" : { "$size" : 2 } },
{ "criteria" : { "$all": [{ "key1": "2015610" }, { "key2": "2096955" }] } }
]
}
As each set of conditions is specified within the array provided by $and, the document structure of the query is valid and one hash key does not overwrite the other. That's the proper way to write your two conditions, but there is a trick to making this work, since the "upsert" fails when those conditions do not match a document. We need to override what happens when it tries to apply the $all arguments on creation:
update: {
"$setOnInsert": {
"criteria" : [{ "key1": "2015610" }, { "key2": "2096955" }]
},
"$push": { "stats": { "id": "XXXX", "visited": 29, "liked": 144 } }
}
That uses $setOnInsert so that when the "upsert" is applied and a new document is created, the values specified here are used instead of the field values from the query portion of the statement.
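Put together, the whole statement would look something like this (a sketch; swap stats for groups if that is the real array name in your documents):

db.gStats.findAndModify({
    query: {
        "$and": [
            { "criteria": { "$size": 2 } },
            { "criteria": { "$all": [{ "key1": "2015610" }, { "key2": "2096955" }] } }
        ]
    },
    update: {
        "$setOnInsert": {
            "criteria": [{ "key1": "2015610" }, { "key2": "2096955" }]
        },
        "$push": { "stats": { "id": "XXXX", "visited": 29, "liked": 144 } }
    },
    upsert: true
})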
Of course, if what you are really looking for is truly an exact match of the content in the array, then just use that for the query instead:
query: {
"criteria" : [{ "key1": "2015610" }, { "key2": "2096955" }]
}
Then MongoDB will be happy to apply those values when a new document is created and does not get confused on how to interpret the $all expression.
I would like to retrieve a list of values that comes from the oldest document currently signed. But I failed to select a document based on the date. Thanks.
Here is the JSON:
"ad" : "noc3",
"createdDate" : ISODate(),
"list" : [
{
"id" : "p45",
"value" : 21,
},
{
"id" : "p6",
"value" : 20,
},
{
"id" : "4578",
"value" : 319
}
]
}
And here is my aggregate request:
db.friends.aggregate({$match:{advertiser:"noc3", {$sort:{timestamps:-1},{$limit:1} }},{$unwind:"$list"},{$project:{_id: "$list.id", value:{$add:[0]}}});
Your aggregate query is incorrect. You add the sort and limit to the match, but that's not how you do that. You use different pipeline operators:
db.friends.aggregate( [
{ $match: { advertiser: "noc3" } },
{ $sort: { createdDate: -1 } },
{ $limit: 1 },
Your other pipeline operators are a bit strange too, and your code vs. query mismatches on timestamps vs. createdDate. If you add the expected output, I can update the answer to include the last bits of the query too.
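In the meantime, a guess at the full pipeline (a sketch; it assumes the match field is ad as in your sample, and note that "oldest" means sorting createdDate ascending):

db.friends.aggregate([
    { $match: { ad: "noc3" } },        // your query used "advertiser"; the sample document says "ad"
    { $sort: { createdDate: 1 } },     // ascending, so the oldest document comes first
    { $limit: 1 },
    { $unwind: "$list" },
    { $project: { _id: "$list.id", value: "$list.value" } }
])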
I have a few collections of metrics that are stored pre-aggregated into hour and minute collections like this:
"_id" : "12345CHA-2RU020130104",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"processor_id" : NumberLong(0),
"date" : ISODate("2013-01-04T00:00:00Z"),
"processor_type" : "CHP",
"array_serial" : NumberLong(12345)
},
"hour" : {
"11" : 4.6665907,
"21" : 5.9431519999999995,
"7" : 0.6405864,
"17" : 4.712744,
---etc---
},
"minute" : {
"11" : {
"33" : 4.689972,
"32" : 4.7190895,
---etc---
},
"3" : {
"45" : 5.6883,
"59" : 4.792,
---etc---
}
}
}
The minute collection has a sub-document for each hour which has an entry for each minute with the value of the metric at that minute.
My question is about the aggregation framework: how should I process this collection if I wanted to find all minutes where the metric was above a certain high-water mark? Investigating the aggregation framework shows an $unwind function, but that seems to only work on arrays...
Would the map/reduce functionality be better suited for this? With that I could simply emit any entry above the high-water mark and count them.
You could build an array of "keys" using a reduce function that iterates through the object's attributes.
reduce: function(obj,prev)
{
for(var key in obj.minute) {
prev.results.push( { hour:key, minutes: obj.minute[key]});
}
}
This will give you something like:
{
"results" : [
{
"hour" : "11",
"minutes" : {
"33" : 4.689972,
"32" : 4.7190895
}
},
{
"hour" : "3",
"minutes" : {
"45" : 5.6883,
"59" : 4.792
}
}
]
}
I've just done a quick test using group(). You'll need something more complex to iterate through the sub-sub-documents (minutes), but hopefully this points you in the right direction.
db.yourcoll.group(
{
initial: { results: [] },
reduce: function(obj,prev)
{
for(var key in obj.minute) {
prev.results.push( { hour:key, minutes: obj.minute[key]});
}
}
}
);
In the finalizer you could reshape the data again. It's not going to be pretty; it might be easier to hold the minute and hour data as arrays rather than as elements of the document.
Hope it helps a bit.
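A later note: on MongoDB 3.4.4+ the aggregation framework can handle this directly, because $objectToArray turns the hour/minute sub-documents into arrays that $unwind accepts. A sketch against the document above, with 5.0 standing in for the high-water mark:

db.yourcoll.aggregate([
    // Turn { "11": {...}, "3": {...} } into [ { k: "11", v: {...} }, ... ]
    { $project: { hours: { $objectToArray: "$minute" } } },
    { $unwind: "$hours" },
    { $project: { hour: "$hours.k", perMinute: { $objectToArray: "$hours.v" } } },
    { $unwind: "$perMinute" },
    // Keep only the minutes above the high-water mark.
    { $match: { "perMinute.v": { $gt: 5.0 } } },
    { $project: { hour: 1, minute: "$perMinute.k", value: "$perMinute.v" } }
])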