Update existing mongodb data into an embedded document - mongodb

I am new to MongoDB so this is probably a basic question (hopefully). I currently have 10 million records with 410 fields loaded in a mongodb collection like so:
{
"_id" : ObjectId("........"),
"AddressID" : 123455,
"IndividualId" : 1,
"personfirstname" : "FirstName",
"personmiddleinitial" : "M",
"personlastname" : "LastName",
"etc": "....."
}
I need to wrap all of this data into an embedded document like so:
{
"_id" : ObjectId("........"),
"data" : {
"AddressID" : 123455,
"IndividualId" : 1,
"personfirstname" : "FirstName",
"personmiddleinitial" : "M",
"personlastname" : "LastName",
"etc": "....."
}
}
I don't necessarily need to update this data in-place but that would be nice. If I need to export this data somehow specifying the new format and then re-import the new, updated data that is fine. Performing this via the MongoDB shell would be ideal.

As suggested by chridam in the comments, you can run the following aggregation pipeline:
db.collectionName.aggregate([
{ $project: { _id: "$_id", data: "$$ROOT" } },
{ $out: "newCollectionName" }
]);
This leaves the _id field both at the root level and inside the data object, so run a bulk update to unset the nested copy:
db.newCollectionName.updateMany(
{},
{ $unset: { "data._id": "" } }
);
Finally, you can drop the first collection and rename the second to restore the original name on the updated collection:
db.collectionName.drop();
db.newCollectionName.rename("collectionName");
This approach runs entirely inside the database, so none of your 10 million documents has to be fetched by the client.

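You can do this in the shell with the following:
db.test.find().forEach(function(doc){
  // build the wrapped document, keeping _id at the root
  doc = { _id: doc._id, data: doc };
  // drop the nested copy of _id
  delete doc.data._id;
  // replaceOne also works in shells where save() is unavailable
  db.test.replaceOne({ _id: doc._id }, doc);
});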
For example, if we insert the following documents:
> db.test.insertMany([
... {
... _id: ObjectId("5a91af8908e17c5997e03b7e"),
... field1: false,
... field2: 0,
... field3: "No"
... },
... {
... _id: ObjectId("5a91afbc08e17c5997e03b7f"),
... field1: true,
... field2: 1,
... field3: "Yes"
... }])
{
"acknowledged" : true,
"insertedIds" : [
ObjectId("5a91af8908e17c5997e03b7e"),
ObjectId("5a91afbc08e17c5997e03b7f")
]
}
Then run:
db.test.find().forEach(function(doc){
  doc = { _id: doc._id, data: doc };
  delete doc.data._id;
  db.test.replaceOne({ _id: doc._id }, doc);
});
Our documents now look like this:
> db.test.find().pretty()
{
"_id" : ObjectId("5a91af8908e17c5997e03b7e"),
"data" : {
"field1" : false,
"field2" : 0,
"field3" : "No"
}
}
{
"_id" : ObjectId("5a91afbc08e17c5997e03b7f"),
"data" : {
"field1" : true,
"field2" : 1,
"field3" : "Yes"
}
}
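The per-document transformation used in the loop above can also be sketched in plain Python on a dict, which makes it easy to check before touching real data. This is only an illustration: `wrap_document` is a hypothetical helper name, the sample values are made up, and no database connection is involved.

```python
def wrap_document(doc, field="data"):
    """Return a new document with every field except _id moved under `field`."""
    # Copy all fields except _id into the embedded document.
    wrapped = {key: value for key, value in doc.items() if key != "_id"}
    # Keep _id at the root so the document identity is unchanged.
    return {"_id": doc["_id"], field: wrapped}


original = {"_id": 1, "field1": False, "field2": 0, "field3": "No"}
result = wrap_document(original)
print(result)
```

The helper builds a new dict instead of mutating its argument, so the original document stays available if the wrapped version needs to be compared against it.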

Related

MongoDB: How to get the object names in collection?

Thank you in advance for the help. I have recently started using MongoDB for a personal project and I'm interested in finding a better way to query my data.
My question is: I have the following collection:
{
"_id" : ObjectId("5dbd77f7a204d21119cfc758"),
"Toyota" : {
"Founder" : "Kiichiro Toyoda",
"Founded" : "28 August 1937",
"Subsidiaries" : [
"Lexus",
"Daihatsu",
"Subaru",
"Hino"
]
}
}
{
"_id" : ObjectId("5dbd78d3a204d21119cfc759"),
"Volkswagen" : {
"Founder" : "German Labour Front",
"Founded" : "28 May 1937",
"Subsidiaries" : [
"Audi",
"Volkswagen",
"Skoda",
"SEAT"
]
}
}
I want to get the object names. For example, here I want to return
[Toyota, Volkswagen]
I have used this method:
var names = {}
db.cars.find().forEach(function(doc){Object.keys(doc).forEach(function(key){names[key]=1})});
names;
which gave me the following result:
{ "_id" : 1, "Toyota" : 1, "Volkswagen" : 1 }
However, is there a better way to get the same result, and also to return just the names of the objects? Thank you.
I would suggest changing the schema design to something like:
{
_id: ...,
company: {
name: 'Volkswagen',
founder: ...,
subsidiaries: ...,
...<other fields>...
}
}
You can then use the aggregation framework to achieve a similar result:
> db.test.find()
{ "_id" : 0, "company" : { "name" : "Volkswagen", "founder" : "German Labour Front" } }
{ "_id" : 1, "company" : { "name" : "Toyota", "founder" : "Kiichiro Toyoda" } }
> db.test.aggregate([ {$group: {_id: null, companies: {$push: '$company.name'}}} ])
{ "_id" : null, "companies" : [ "Volkswagen", "Toyota" ] }
For more details, see:
Aggregation framework
$group
Accumulator operators
As a bonus, you can create an index on the company.name field, whereas you cannot create an index on varying field names like in your example.
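If the schema cannot be changed, the loop from the question can at least be tightened so that it skips _id and returns a plain list. The sketch below shows the idea in Python over plain dicts, as a driver such as pymongo would hand them back; `top_level_names` is a hypothetical helper and the two sample documents are abbreviated from the question.

```python
def top_level_names(docs, ignore=("_id",)):
    """Collect distinct top-level keys across documents, in first-seen order."""
    names = []
    for doc in docs:
        for key in doc:
            # Skip bookkeeping fields and keys that were already recorded.
            if key not in ignore and key not in names:
                names.append(key)
    return names


cars = [
    {"_id": "5dbd77f7a204d21119cfc758", "Toyota": {"Founder": "Kiichiro Toyoda"}},
    {"_id": "5dbd78d3a204d21119cfc759", "Volkswagen": {"Founder": "German Labour Front"}},
]
print(top_level_names(cars))
```

This still reads every document on the client, which is the cost the schema change above avoids.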

MongoDB Conditional validation on arrays and embedded documents

I have a number of documents in my database where I am applying document validation. All of these documents may have embedded documents. I can apply simple validation along the lines of SQL NOT NULL checks (these are essentially enforcing the primary key constraints), but what I would like to do is apply some sort of conditional validation to the optional arrays and embedded documents. For example, let's say I have a document that looks like this:
{
"date": <<insertion date>>,
"name" : <<the portfolio name>>,
"assets" : << amount of money we have to trade with>>
}
Clearly I can put validation on this document to ensure that date, name, and assets all exist at insertion time. Let's say, however, that I'm managing a stock portfolio and the document can have future updates to show an array of stocks like this:
{
"date" : <<insertion date>>,
"name" : <<the portfolio name>>,
"assets" : << amount of money we have to trade with>>,
"portfolio" : [
{ "stockName" : "IBM",
"pricePaid" : 155.39,
"sharesHeld" : 100
},
{ "stockName" : "Microsoft",
"pricePaid" : 57.22,
"sharesHeld" : 250
}
]
}
Is it possible to apply a conditional validation to this array of subdocuments? It's valid for the portfolio not to be there, but if it is, each document in the array must contain the three fields "stockName", "pricePaid" and "sharesHeld".
MongoShell
db.createCollection("collectionname",
{
validator: {
$or: [
{
"portfolio": {
$exists: false
}
},
{
$and: [
{
"portfolio": {
$exists: true
}
},
{
"portfolio.stockName": {
$type: "string",
$exists: true
}
},
{
"portfolio.pricePaid": {
$type: "double",
$exists: true
}
},
{
"portfolio.sharesHeld": {
$type: "double",
$exists: true
}
}
]
}
]
}
})
With the above validation in place, you can insert documents with or without a portfolio.
After creating the collection with this validator in the shell, you can insert documents such as the following:
db.collectionname.insert({
"_id" : ObjectId("58061aac8812662c9ae1b479"),
"date" : ISODate("2016-10-18T12:50:52.372Z"),
"name" : "B",
"assets" : 200
})
db.collectionname.insert({
"_id" : ObjectId("58061ab48812662c9ae1b47a"),
"date" : ISODate("2016-10-18T12:51:00.747Z"),
"name" : "A",
"assets" : 100,
"portfolio" : [
{
"stockName" : "Microsoft",
"pricePaid" : 57.22,
"sharesHeld" : 250
}
]
})
If we try to insert a document whose portfolio entry is missing pricePaid, like this:
db.collectionname.insert({
"date" : new Date(),
"name" : "A",
"assets" : 100,
"portfolio" : [
{ "stockName" : "IBM",
"sharesHeld" : 100
}
]
})
then we get the following error message:
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 121,
"errmsg" : "Document failed validation"
}
})
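The rule the question asks for (portfolio may be absent, but if present every entry needs all three fields) can be written down as a small plain-Python predicate, which is handy for an application-side check or for testing sample documents. This is only a sketch: `portfolio_is_valid` and `REQUIRED_STOCK_FIELDS` are hypothetical names, and it is not a MongoDB API. As far as I understand, a dotted path such as "portfolio.stockName" in a query-style validator is satisfied when at least one array element matches, so a per-element check like this one is stricter.

```python
REQUIRED_STOCK_FIELDS = ("stockName", "pricePaid", "sharesHeld")


def portfolio_is_valid(doc):
    """Return True when portfolio is absent, or every entry has all required fields."""
    if "portfolio" not in doc:
        return True  # the array is optional
    portfolio = doc["portfolio"]
    if not isinstance(portfolio, list):
        return False
    # Every element must be a sub-document carrying all three fields.
    return all(
        isinstance(entry, dict)
        and all(name in entry for name in REQUIRED_STOCK_FIELDS)
        for entry in portfolio
    )


without_portfolio = {"name": "B", "assets": 200}
complete = {
    "name": "A",
    "assets": 100,
    "portfolio": [{"stockName": "Microsoft", "pricePaid": 57.22, "sharesHeld": 250}],
}
incomplete = {
    "name": "A",
    "assets": 100,
    "portfolio": [{"stockName": "IBM", "sharesHeld": 100}],
}
print(portfolio_is_valid(without_portfolio), portfolio_is_valid(complete), portfolio_is_valid(incomplete))
```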
Using Mongoose
Yes, it can be done. Depending on your scenario, you may need to define both the parent and the child schema.
Shown below is a sample child (portfolio) schema in Mongoose.
var mongoose = require('mongoose');
var Schema = mongoose.Schema;
// every field is required, so an incomplete portfolio entry fails validation
var portfolioSchema = new Schema({
  "stockName" : { type : String, required : true },
  "pricePaid" : { type : Number, required : true },
  "sharesHeld" : { type : Number, required : true }
});
References:
http://mongoosejs.com/docs/guide.html
http://mongoosejs.com/docs/subdocs.html
Can I require an attribute to be set in a mongodb collection? (not null)
Hope it helps!

Mongo remove from nested object by value

I have a Mongo collection that consists of a document and a nested object describing which collections the document is in and when it was added. I would like to remove key-value pairs from the nested object based on a condition, e.g. the value (a date) is before 1-1-2016.
Example:
{
"_id" : ObjectId("581214940911ad3de98002db"),
"collections" : {
"c01" : ISODate("2016-10-27T15:52:04.512Z"),
"c02" : ISODate("2015-11-21T16:06:06.546Z")
}
}
needs to become
{
"_id" : ObjectId("581214940911ad3de98002db"),
"collections" : {
"c01" : ISODate("2016-10-27T15:52:04.512Z"),
}
}
One alternative would be to change the schema to something like this:
{
"_id" : ObjectId("581214940911ad3de98002db"),
"collections" : [
{
"id": "c01",
"date": ISODate("2016-10-27T15:52:04.512Z")
},
{
"id": "c02",
"date" : ISODate("2015-11-21T16:06:06.546Z")
}
]
}
in which case removing an entry from the array would be easy. I am a bit reluctant to do that because it would complicate some of the other queries I would like to support. Thanks!
I prefer the second structure for your schema:
{
"_id" : ObjectId("581214940911ad3de98002db"),
"collections" : [
{
"id": "c01",
"date": ISODate("2016-10-27T15:52:04.512Z")
},
{
"id": "c02",
"date" : ISODate("2015-11-21T16:06:06.546Z")
}
]
}
You can then remove entries from collections like this:
db.collectionName.update(
  { }, // optionally restrict to one document, e.g. { "_id" : requestId }
  { $pull: { collections: { date: { $lt: yourDate } } } }, // yourDate must be a Date, e.g. new Date("2016-01-01"); a string will not match date values
  { multi: true }
)
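If the original nested-object schema has to stay, one possible alternative is to read the document, work out on the client which keys are older than the cutoff, and send a single $unset for them. The sketch below only builds that update document in plain Python. `build_unset_before` is a hypothetical helper, the dates are naive datetimes for simplicity, and actually applying the result (for example with pymongo's `update_one`) is left out.

```python
from datetime import datetime


def build_unset_before(doc, cutoff, field="collections"):
    """Build a $unset update for every key under `field` dated before `cutoff`."""
    stale = {
        f"{field}.{key}": ""  # $unset ignores the value; "" is conventional
        for key, added in doc.get(field, {}).items()
        if added < cutoff
    }
    # Return None when nothing is stale, since an empty $unset is not a valid update.
    return {"$unset": stale} if stale else None


doc = {
    "_id": "581214940911ad3de98002db",
    "collections": {
        "c01": datetime(2016, 10, 27, 15, 52, 4),
        "c02": datetime(2015, 11, 21, 16, 6, 6),
    },
}
update = build_unset_before(doc, datetime(2016, 1, 1))
print(update)
```

This costs one read and one write per document, so for large collections the array schema with $pull shown above is likely the simpler route.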

Increment nested value

I create players the following way.
Players.insert({
name: name,
score: 0,
items: [{'name': 0}, {'name2': 0}...]
});
How do I increment the score in a specific player and specific item name (upserting if necessary)?
Sorry for the terrible wording :p
Well, the answer is - as in life - to simplify the problem by breaking it up.
And to avoid arrays in MongoDB. After all, objects can have as many keys as you like. So, my structure became:
{
"_id": <id>,
"name": <name>,
"score": <score>,
"items": {}
}
And to increment a dynamic key in items:
// create your update skeleton first
var ud = { $inc: {} };
// fill it in (the field is called "items" in the structure above)
ud.$inc['items.' + key] = value;
// call it, upserting if no document matches
db.Players.update(player, ud, { upsert: true });
Works a charm :)
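The same dynamic-key update can be sketched in Python, where the update document is just a dict whose key is assembled at run time. `build_inc_update` is a hypothetical helper, the field name items comes from the structure above, and the optional score bump is an assumption about what the question wants; nothing here talks to a database.

```python
def build_inc_update(item_key, amount, score_delta=0, field="items"):
    """Build a $inc update that bumps one dynamic key under `field`."""
    # Dot notation addresses a key inside the embedded object.
    increments = {f"{field}.{item_key}": amount}
    if score_delta:
        # Optionally bump the top-level score in the same update.
        increments["score"] = score_delta
    return {"$inc": increments}


print(build_inc_update("food", 5))
print(build_inc_update("food", 5, score_delta=1))
```

Because $inc creates a missing field with the given amount, the same update document covers both the first increment of a new item and later ones.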
Let's say you have:
{
"_id" : ObjectId("5465332e6c3e2eeb66ef3683"),
"name" : "Alex",
"score" : 0,
"items" : [
{
"food" : 0
}
]
}
To update you can do:
db.Players.update({name: "Alex", "items.food": {$exists : true}},
{$inc: {score: 1, "items.$.food": 5}})
Result:
{
"_id" : ObjectId("5465332e6c3e2eeb66ef3683"),
"name" : "Alex",
"score" : 1,
"items" : [
{
"food" : 5
}
]
}
I am not sure you can upsert if the document doesn't exist because of the positional operator needed to update the array.

Upsert with pymongo and a custom _id field

I'm attempting to store pre-aggregated performance metrics in a sharded mongodb according to this document.
I'm trying to update the minute sub-documents in a record that may or may not exist with an upsert like so (self.collection is a pymongo collection instance):
self.collection.update(query, data, upsert=True)
query:
{ '_id': u'12345CHA-2RU020130304',
'metadata': { 'adaptor_id': 'CHA-2RU',
'array_serial': 12345,
'date': datetime.datetime(2013, 3, 4, 0, 0, tzinfo=<UTC>),
'processor_id': 0}
}
data:
{ 'minute': { '16': { '45': 1.6693091}}}
The problem is that the 'minute' subdocument only ever holds the last {hour: {minute: metric}} entry. It does not gain new entries for other hours; the single entry is overwritten each time.
I've also tried this with a $set style data entry:
{ '$set': { 'minute': { '16': { '45': 1.6693091}}}}
but it ends up being the same.
What am I doing wrong?
In both of the examples listed you are simply setting a field ('minute') to a particular value. The only reason the first update looks like an addition is that the field does not exist yet and so must be created.
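To make the overwrite concrete, here is a small plain-Python imitation of how $set treats a whole-field value versus a dotted path. It is only a simulation on dicts under simplified assumptions (`apply_set` is a hypothetical helper, and intermediate values are assumed to be dicts); it does not call MongoDB.

```python
import copy


def apply_set(doc, fields):
    """Imitate $set on a plain dict: walk each dotted path, creating dicts as needed."""
    result = copy.deepcopy(doc)
    for path, value in fields.items():
        *parents, leaf = path.split(".")
        node = result
        for part in parents:
            # Descend into the embedded document, creating it if missing.
            node = node.setdefault(part, {})
        # Assigning at the leaf replaces whatever was stored there.
        node[leaf] = value
    return result


stored = {"minute": {"16": {"45": 1.6693091}}}

# Setting the whole 'minute' field replaces the existing subdocument.
whole = apply_set(stored, {"minute": {"17": {"48": 1.6693391}}})
# Setting a dotted path only touches that one nested key.
dotted = apply_set(stored, {"minute.17.48": 1.6693391})

print(whole)
print(dotted)
```

The first result holds only hour 17, while the second keeps hour 16 and adds hour 17.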
It's hard to determine exactly what you are shooting for here, but I think what you could do is alter your schema a little so that 'minute' is an array. Then you could use $push to add values regardless of whether they are already present or $addToSet if you don't want duplicates.
I had to alter your document a little to make it valid in the shell, so my _id (and some other fields) are slightly different to yours, but it should still be close enough to be illustrative:
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
}
}
Now let's add a minute field with an array of documents instead of a single document:
db.foo.update({'_id': 'u12345CHA-2RU020130304'}, { $addToSet : {'minute': { '16': {'45': 1.6693091}}}})
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
},
"minute" : [
{
"16" : {
"45" : 1.6693091
}
}
]
}
Then, to illustrate the addition, add a slightly different entry (since I am using $addToSet, the entry must differ for a new element to be added):
db.foo.update({'_id': 'u12345CHA-2RU020130304'}, { $addToSet : {'minute': { '17': {'48': 1.6693391}}}})
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
},
"minute" : [
{
"16" : {
"45" : 1.6693091
}
},
{
"17" : {
"48" : 1.6693391
}
}
]
}
I ended up setting the fields like this:
query:
{ '_id': u'12345CHA-2RU020130304',
'metadata': { 'adaptor_id': 'CHA-2RU',
'array_serial': 12345,
'date': datetime.datetime(2013, 3, 4, 0, 0, tzinfo=<UTC>),
'processor_id': 0}
}
I'm setting the metrics like this:
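import pytz

data = {"$set": {}}
for metric in csv:  # csv: an iterable of parsed metric rows
    date_utc = metric['date'].astimezone(pytz.utc)
    # dotted key targets one hour/minute slot without replacing 'minute'
    data["$set"]["minute.%d.%d" % (date_utc.hour,
                                   date_utc.minute)] = float(metric['metric'])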
which creates data like this:
{"$set": {'minute.16.45': 1.6693091,
'minute.16.46': 1.566343,
'minute.16.47': 1.22322}}
So when self.collection.update(query, data, upsert=True) is run, it updates only those fields.
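A self-contained version of that loop, runnable without a database, might look like the sketch below. It assumes each metric is a dict with a timezone-aware 'date' and a string 'metric', uses the standard library's `timezone.utc` in place of pytz, and wraps the loop in a hypothetical `build_minute_update` helper; the sample rows are made up from the values shown above.

```python
from datetime import datetime, timezone


def build_minute_update(metrics):
    """Build a $set update with one dotted 'minute.<hour>.<minute>' key per metric."""
    update = {"$set": {}}
    for metric in metrics:
        # Normalise to UTC so hour and minute are comparable across rows.
        date_utc = metric["date"].astimezone(timezone.utc)
        key = "minute.%d.%d" % (date_utc.hour, date_utc.minute)
        update["$set"][key] = float(metric["metric"])
    return update


rows = [
    {"date": datetime(2013, 3, 4, 16, 45, tzinfo=timezone.utc), "metric": "1.6693091"},
    {"date": datetime(2013, 3, 4, 16, 46, tzinfo=timezone.utc), "metric": "1.566343"},
    {"date": datetime(2013, 3, 4, 16, 47, tzinfo=timezone.utc), "metric": "1.22322"},
]
data = build_minute_update(rows)
print(data)
```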