updating mongo documents based on a map value and removing that value - mongodb

I am currently working in Go and have a MongoDB database (connected via gopkg.in/mgo.v2). Right now I have a data structure similar to:
{
    "_id" : "some_id_bson",
    "field1" : "value1",
    "field2" : {
        "key1" : "v1",
        "key2" : "v2",
        "key3" : "v3",
        "key4" : "v4"
    }
}
So, basically, what I need to do (as an example) is update all the records in the database that contain key1 and remove that key from the document, so the result would be something like:
{
    "_id" : "some_id_bson",
    "field1" : "value1",
    "field2" : {
        "key2" : "v2",
        "key3" : "v3",
        "key4" : "v4"
    }
}
What can I use to achieve this? I have been searching and cannot find anything oriented towards maps (field2 is a map). Thanks in advance.

It seems like you're asking how to remove a property from a nested object in a particular document, which appears to be answered here: How to remove property of nested object from MongoDB document?.
From the main answer there:
Use $unset as below:
db.collectionName.update({}, {"$unset": {"values.727920": ""}})
EDIT: For updating multiple documents, use update options like:
db.collectionName.update({}, {"$unset": {"values.727920": ""}}, {"multi": true})

Try using $exists and $unset:
query := bson.M{"field2.key1": bson.M{"$exists": true}}
update := bson.M{"$unset": bson.M{"field2.key1": ""}}
collection.UpdateAll(query, update) // returns (*mgo.ChangeInfo, error); check the error in real code
This should find all documents containing field2.key1 and remove that key from each of them.
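Not part of the original answer, but for readers working from PyMongo rather than mgo, the same multi-document update might look like the sketch below; the connection URI, database, and collection names are placeholders.
from pymongo import MongoClient

client = MongoClient()  # placeholder: adjust the connection URI as needed
coll = client["mydb"]["mycollection"]  # placeholder database/collection names

# remove field2.key1 from every document that currently has it
result = coll.update_many(
    {"field2.key1": {"$exists": True}},
    {"$unset": {"field2.key1": ""}},
)
print(result.modified_count)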

Related

Adding to a double-nested array in MongoDB

I have a double-nested array in my MongoDB schema and I'm trying to add an entirely new array element to a second-level nested array using $push. I'm getting the error "cannot use the part (...) to traverse the element".
The documents have the following structure:
{
    "_id" : ObjectId("5d8e37eb46c064790a28a467"),
    "org-name" : "Manchester University NHS Foundation Trust",
    "domain" : "mft.nhs.uk",
    "subdomains" : [
        {
            "name" : "careers.mft.nhs.uk",
            "firstSeen" : "2017-10-06 11:32:00",
            "history" : [
                {
                    "a_rr" : "80.244.185.184",
                    "timestamp" : ISODate("2019-09-27T17:24:57.148Z"),
                    "asn" : 61323,
                    "asn_org" : "Ukfast.net Limited",
                    "city" : null,
                    "country" : "United Kingdom",
                    "shodan" : {
                        "ports" : [
                            {
                                "port" : 443,
                                "versions" : [
                                    "TLSv1",
                                    "-SSLv2",
                                    "-SSLv3",
                                    "TLSv1.1",
                                    "TLSv1.2",
                                    "-TLSv1.3"
                                ],
                                "cpe" : "cpe:/a:apache:http_server:2.4.18",
                                "product" : "Apache httpd"
                            }
                        ],
                        "timestamp" : ISODate("2019-09-27T17:24:58.538Z")
                    }
                }
            ]
        }
    ]
}
What I'm attempting to do is refresh the details held in the history array and add another entire entry to represent the most recently collected data for that subdomain name.
The net result is that I will have multiple entries in the history array, each one timestamped with the date that the data was refreshed. That way I have a historical record of changes to any of the data held.
I've read that I can't use $push on a double-nested array, but the other advice about using arrayFilters all appears to relate to updating an existing entry in an array rather than simply appending an entirely new document - unless I'm missing something!
I'm using PyMongo and would simply like to build a new dictionary containing all of the data elements and append it to the history array.
Thanks!
Straightforward in pymongo:
# fetch the document, append the new history entry in memory, then write the whole document back
record = db.mycollection.find_one()
record['subdomains'][0]['history'].append({'another': 'record'})
db.mycollection.replace_one({'_id': record['_id']}, record)
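An alternative sketch, if you would rather not read and rewrite the whole document: a single update using $push with the positional $ operator, which appends to the history array of the first subdomains element matched by the filter. The filter values and the contents of new_entry below are placeholders taken from the sample document; db is the same handle used above.
from datetime import datetime, timezone

# placeholder entry; fill in the freshly collected data for this refresh
new_entry = {
    "a_rr": "80.244.185.184",
    "timestamp": datetime.now(timezone.utc),
}

db.mycollection.update_one(
    {"domain": "mft.nhs.uk", "subdomains.name": "careers.mft.nhs.uk"},
    {"$push": {"subdomains.$.history": new_entry}},
)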

Internal reference in protobuf?

I'm new to Scala and I'm thinking of using protobuf to pass around some data. However, in the data there are some common sets of values shared across different items. The data in JSON might look like this:
[
    {
        "id" : "1",
        "value" : {
            "field1" : "f1value.1",
            "field2" : "f2value.1",
            "field3" : commonobject
        }
    },
    {
        "id" : "2",
        "value" : {
            "field1" : "f1value.2",
            "field2" : "f2value.2",
            "field3" : commonobject
        }
    }
]
I am hoping to find a way to avoid duplicating commonobject. I'm wondering if there is an internal reference mechanism in protobuf, like $ref in JSON Schema.
Thanks for the help!
protobuf messages cannot store references. You can store an object-id to reference common objects.

Using $last on Mongo Aggregation Pipeline

I searched for similar questions but couldn't find any. Feel free to point me in their direction.
Say I have this data:
{ "_id" : ObjectId("5694c9eed4c65e923780f28e"), "name" : "foo1", "attr" : "foo" }
{ "_id" : ObjectId("5694ca3ad4c65e923780f290"), "name" : "foo2", "attr" : "foo" }
{ "_id" : ObjectId("5694ca47d4c65e923780f294"), "name" : "bar1", "attr" : "bar" }
{ "_id" : ObjectId("5694ca53d4c65e923780f296"), "name" : "bar2", "attr" : "bar" }
If I want to get the latest record for each attribute group, I can do this:
> db.content.aggregate({$group: {_id: '$attr', name: {$last: '$name'}}})
{ "_id" : "bar", "name" : "bar2" }
{ "_id" : "foo", "name" : "foo2" }
I would like to have my data grouped by attr and then sorted by _id so that only the latest record remains in each group, and that's how I can achieve this. BUT I need a way to avoid naming all the fields that I want in the result (in this example, "name"), because in my real use case they are not known ahead of time.
So, is there a way to achieve this without having to explicitly name each field using $last, and just take all fields instead? Of course, I would sort my data prior to grouping; I just need to somehow tell Mongo "take all the values from the latest one".
See some possible options here:
Do multiple find().sort() queries for each of the attr values you want to search.
Grab the original _id of the $last doc, then do a findOne() for each of those values (this is the more extensible option).
Use the $$ROOT system variable as shown here.
This wouldn't be the quickest operation, but I assume you're using this more for analytics, not in response to user behavior.
Edited to add slouc's example posted in comments:
db.content.aggregate({$group: {_id: '$attr', lastItem: {$last: "$$ROOT"}}})
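For completeness, a PyMongo version of that $$ROOT approach might look like the sketch below, with an explicit $sort so that "last" is well defined; the collection and field names are taken from the question, and the optional $replaceRoot stage (MongoDB 3.4+, can be dropped) flattens each group back into a plain document.
pipeline = [
    {"$sort": {"_id": 1}},  # oldest to newest, so $last picks the latest document
    {"$group": {"_id": "$attr", "lastItem": {"$last": "$$ROOT"}}},
    {"$replaceRoot": {"newRoot": "$lastItem"}},  # optional: unwrap the grouped document
]
for doc in db.content.aggregate(pipeline):
    print(doc)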

MongoDB GROUP BY and COUNT unknown keys

I am trying to GROUP BY and COUNT each key in each Mongo document, but the keys may differ from document to document. I know how to group and count by explicitly calling out each key, like this:
db.test.aggregate([{"$group" : {_id:"$vcenter", count:{$sum:1}}}])
but how do I iterate through each key of each document without having to call out the keys explicitly? I'm thinking of a mapReduce function?
Here's a sample document:
"key1" : "vmx",
"key2" : "type",
"key3" : "cpu-idle",
and I'm looking for how many records per key like:
"Key1" : 1564
"Key2" : 1565
"Key3" : 458
Yes, I can only think of mapReduce, since in an aggregation $group the _id is mandatory. So I'd write:
map
function map() { for (var prop in this) { emit(prop, 1); } }
reduce (summing the values, rather than counting them, keeps the result correct if MongoDB re-reduces partial results)
function reduce(key, values) { return Array.sum(values); }
run command
db.inputCollectionName.mapReduce(map, reduce, { out: "outputCollectionName" })
You should then find in your output collection something like
{ "_id" : "key1", "value" : 1564 }
{ "_id" : "Key2", "value" : 1565 }
{ "_id" : "Key3", "value" : 458 }
Is that good for you?
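As an aside, on newer MongoDB servers (3.4.4+) the aggregation framework can produce the same counts without mapReduce by turning each document into an array of key/value pairs with $objectToArray; a rough PyMongo sketch:
pipeline = [
    {"$project": {"kv": {"$objectToArray": "$$ROOT"}}},  # document -> [{k: ..., v: ...}, ...]
    {"$unwind": "$kv"},
    {"$match": {"kv.k": {"$ne": "_id"}}},  # skip the _id field itself
    {"$group": {"_id": "$kv.k", "count": {"$sum": 1}}},
]
for doc in db.test.aggregate(pipeline):
    print(doc)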

MongoDB: How to merge two collections/databases together into one?

I have two databases named: DB_A and DB_B.
Each database has one collection with the same name, called store.
Both collections have lots and lots of documents that have exactly the same structure { key: "key1", value: "value1" }, etc.
Actually, I was supposed to create only DB_A and insert all documents into DB_A. But later, when I did my second round of inserting, I made a mistake and typed the wrong database name.
So now each database has a size of 32GB, and I wish to merge the two databases.
One problem/constraint is that the free space available now is only 15GB, so I can't just copy everything from DB_B to DB_A.
I am wondering if I can perform some kind of "move" to merge the two databases? I would prefer the most efficient way, as simply reinserting 32GB into DB_A will take quite a long time.
I think the easiest (and maybe the only) way is to write a script that merges the two databases document after document.
Get first document from DB_B.
Insert it into DB_A if needed.
Delete it from DB_B.
Repeat until done.
Instead of deleting documents from the source db (DB_B), you may want to just read documents in batches. This should be more performant, but slightly more difficult to code (especially if you have never done such a thing); a rough sketch of the batched variant is shown below.
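A minimal PyMongo sketch of that batched variant, assuming both databases live on the same mongod, duplicates share the same _id, and a batch of 1000 documents fits comfortably in memory:
from pymongo import MongoClient
from pymongo.errors import BulkWriteError

client = MongoClient()  # placeholder: adjust the connection URI as needed
src = client["DB_B"]["store"]
dst = client["DB_A"]["store"]

batch = []
for doc in src.find():  # the cursor streams documents, so memory use stays bounded
    batch.append(doc)
    if len(batch) == 1000:
        try:
            dst.insert_many(batch, ordered=False)
        except BulkWriteError:
            pass  # duplicate _id: the document is already present in DB_A
        batch = []

if batch:  # flush the final partial batch
    try:
        dst.insert_many(batch, ordered=False)
    except BulkWriteError:
        pass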
Starting in Mongo 4.2, the new aggregation stage $merge can be used to merge the contents of a collection into a collection in another database:
// > use db1
// > db.collection.find()
// { "_id" : 1, "key" : "a", "value" : "b" }
// { "_id" : 2, "key" : "c", "value" : "d" }
// { "_id" : 3, "key" : "a", "value" : "b" }
// > use db2
// > db.collection.find()
// { "_id" : 1, "key" : "e", "value" : "f" }
// { "_id" : 4, "key" : "a", "value" : "b" }
// > use db1
db.collection.aggregate([
{ $merge: { into: { db: "db2", coll: "collection" } } }
])
// > use db2
// > db.collection.find()
// { "_id" : 1, "key" : "a", "value" : "b" }
// { "_id" : 2, "key" : "c", "value" : "d" }
// { "_id" : 3, "key" : "a", "value" : "b" }
// { "_id" : 4, "key" : "a", "value" : "b" }
By default, when the target and the source collections contain a document with the same _id, $merge will replace the document from the target collection with the document from the source collection. In order to customise this behaviour, check $merge's whenMatched parameter.
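For example, a small PyMongo sketch of the same merge that keeps the target's existing document whenever the _id values collide (database and collection names as above, MongoDB 4.2+):
from pymongo import MongoClient

client = MongoClient()  # placeholder: adjust the connection URI as needed
client["db1"]["collection"].aggregate([
    {"$merge": {
        "into": {"db": "db2", "coll": "collection"},
        "whenMatched": "keepExisting",  # keep the target's version on an _id collision
    }}
])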