How to view document fields in mongo shell? - mongodb

Is there a way to figure out the fields/keys in a document while in mongo's shell? As an example, let's say we have a document like (pseudocode):
{
"message": "Hello, world",
"from": "hal",
"field": 123
}
I'd like to run a command in the shell that returns the list of fields/keys in that document. For instance, something like this:
> var message = db.messages.findOne()
> message.keys()
... prints out "message, from, field"

Even easier:
Object.keys(db.messages.findOne())
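The same call can be tried on a plain JavaScript object outside the shell. A minimal sketch, assuming `message` stands in for the document returned by findOne() (a real document would also carry an `_id` key). Object.keys returns only the object's own top-level keys, in insertion order.

```javascript
// Stand-in for the document returned by db.messages.findOne() (hypothetical data, no _id)
const message = { message: "Hello, world", from: "hal", field: 123 };

// Object.keys returns the object's own enumerable top-level keys
const keys = Object.keys(message);
console.log(keys.join(", ")); // prints "message, from, field"
```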

A for ... in loop should do the trick:
> var message = db.messages.findOne();
> for (var key in message) {
... print(key);
... }

Other answers are correct.
However, as I am completely new, I didn't understand where and how the above commands need to be executed.
The following, from my GitHub, helped.
On Windows: run the first command (mongo) in a command prompt (cmd).
On Mac or Linux: run it in a terminal window.
The remaining commands are then typed into the mongo shell that it opens.
// ------------
// start mongo client
mongo
// ------------
// list all databases
show dbs
// NOTE: assume one of the databases is myNewDatabase
// use the 'myNewDatabase' database
use myNewDatabase
// ------------
// show all collections of 'myNewDatabase' database
show collections
// NOTE: assume one of the collections is 'myCollection'
// show all documents of 'myCollection' collection
db.myCollection.find()
// ------------
// field keys
Object.keys(db.myCollection.findOne());
// values
db.myCollection.find().forEach(function(doc) {
for (var field in doc) {
print(doc[field]);
}
});
// ------------

To get a list of all fields used in a collection in MongoDB, this is the way I found most straightforward (your mileage may vary :) ):
Create a .js file with the content:
use yourdbname
mr = db.runCommand({
"mapreduce" : "collectionName",
"map" : function() {
for (var key in this) { emit(key, null); }
},
"reduce" : function(key, stuff) { return null; },
"out": "collectionName" + "_keys"
})
db[mr.result].distinct("_id")
I found out how to do this here (GeoffTech blog)
I ran it from the shell to print the output in the console
mongo < nameOfYourFile.js
or dump the output in a text file:
mongo < nameOfYourFile.js > outputDir\nameOfYourOutputFile.txt
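I'm totally new to MongoDB, but since the map function emits every top-level key of every document, it should get all fields regardless of how they are used across the documents (nested fields are not listed separately).
(I'm using MongoDB on Windows 10, so my console may differ from yours.)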
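What the map/reduce above computes can be sketched in plain JavaScript: the map step emits every top-level key of every document, grouping by emitted key collapses duplicates, and the no-op reduce leaves one entry per distinct key. The documents and the mapPhase/reducePhase helpers are hypothetical; the sketch returns keys in first-seen order, which is not necessarily the order the server returns them in.

```javascript
// Hypothetical documents; not every document has every field
const docs = [
  { _id: 1, name: "a", color: "red" },
  { _id: 2, name: "b", size: 3 },
  { _id: 3, weight: 7 },
];

// Simulates the map function: emit(key, null) for every top-level key of every document
function mapPhase(documents) {
  const emitted = [];
  for (const doc of documents) {
    for (const key in doc) {
      emitted.push([key, null]);
    }
  }
  return emitted;
}

// Simulates grouping by emitted key; the reduce returns null, so only the keys matter
function reducePhase(emitted) {
  const grouped = new Map();
  for (const [key, value] of emitted) {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }
  // Equivalent of db[mr.result].distinct("_id")
  return [...grouped.keys()];
}

const allFields = reducePhase(mapPhase(docs));
console.log(allFields); // [ '_id', 'name', 'color', 'size', 'weight' ]
```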

You can do this in a way that gets all the fields even if not every document in the collection has some of them, and without creating a collection:
db.collectionName.aggregate( [
{ $project : { x : { $objectToArray : "$$ROOT" } } },
{ $unwind : "$x" },
{ $group : { _id : null, keys : { $addToSet : "$x.k" } } },
] ).toArray()[0].keys.sort();
This is also a handy thing to add to the Mongo shell, which you can do by including it in your .mongorc.js file in your home directory:
Object.assign( DBCollection.prototype, {
getAllFieldNames() {
return db[ this._shortName ].aggregate( [
{ $project : { x : { $objectToArray : "$$ROOT" } } },
{ $unwind : "$x" },
{ $group : { _id : null, keys : { $addToSet : "$x.k" } } },
] ).toArray()[0].keys.sort();
},
} );
Then you can just run db.myCollection.getAllFieldNames() when using the shell.
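Each pipeline stage can be mirrored in plain JavaScript to see what it produces. A minimal sketch over hypothetical documents (not every document has every field); the stage names in the comments map to the pipeline above, but the code only imitates them. $addToSet does not guarantee any order, which is why the result is sorted at the end.

```javascript
// Hypothetical documents; not every document has every field
const docs = [
  { _id: 1, name: "a", color: "red" },
  { _id: 2, name: "b", size: 3 },
  { _id: 3, weight: 7 },
];

// $project: { x: { $objectToArray: "$$ROOT" } } -> an array of { k, v } pairs per document
const projected = docs.map(doc => ({ x: Object.entries(doc).map(([k, v]) => ({ k, v })) }));

// $unwind: "$x" -> one record per { k, v } pair
const unwound = projected.flatMap(d => d.x.map(pair => ({ x: pair })));

// $group: { _id: null, keys: { $addToSet: "$x.k" } } -> a single set of key names
const keySet = new Set(unwound.map(d => d.x.k));

// .toArray()[0].keys.sort()
const allFieldNames = [...keySet].sort();
console.log(allFieldNames); // [ '_id', 'color', 'name', 'size', 'weight' ]
```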

var task = db.task.find().next()
for (let key in task){print(key)}

Related

Convert Array to Json in Mongodb Aggregate

I have a Mongo document in the format below:
{
"id":"eafa3720-28e2-11ed-bf07",
"type":"test",
"serviceType_details": [
{
"is_custom_service_type": false,
"bill_amount": 100
}
]
}
The "serviceType_details" key doesn't have a fixed schema.
Now I want to export it to Parquet using a MongoDB aggregation so that I can query it with Presto.
My pipeline code:
db.test_collection.aggregate([
{
$match: {
"id": "something"
}
},
{
$addFields: {
...
},
},
{
"$out" : {
"format" : {
"name" : "parquet",
"maxFileSize" : "10GB",
"maxRowGroupSize" : "100MB"
}
}
}
])
Now I want to export the value of "serviceType_details" as a JSON string, not as an array (with the current code, Parquet recognises it as an array).
I have tried $convert and $project, but neither works.
Currently the generated Parquet schema looks something like this:
I want "serviceType_details" to be a string in the generated Parquet schema, with its value being the stringified version of the array stored in the Mongo document.
The reason I need it as a string is that "serviceType_details" has a completely different schema in each document, so it's very difficult to maintain an Athena table on top of it.
You can use the $function operator (MongoDB 4.4+) to define custom functions that implement behaviour not supported by the MongoDB Query Language.
It could be done using "$function" like this:
db.test_collection.aggregate([
{
$match: {
"id": "something"
}
},
{
$addFields: {
// overwrite serviceType_details with its JSON string form
serviceType_details: {
$function: {
body: function(field) {
return (field != undefined && field != null) ? JSON.stringify(field) : "[]"
},
args: ["$serviceType_details"],
lang: "js"
}
},
},
},
{
"$out" : {
"format" : {
"name" : "parquet",
"maxFileSize" : "10GB",
"maxRowGroupSize" : "100MB"
}
}
}
])
Executing JavaScript inside an aggregation expression may decrease performance. Only use the $function operator if the provided pipeline operators cannot fulfill your application's needs.
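The function body can be checked locally before it goes into the pipeline. A minimal sketch: stringifyOrEmpty is a hypothetical name for a function equivalent to the $function body above, run here in plain JavaScript on the sample array from the question. Values such as ObjectId or Date may serialize differently inside the server's JavaScript engine, so it is worth checking the output on real data too.

```javascript
// Equivalent of the $function body: stringify the value, or fall back to "[]"
function stringifyOrEmpty(field) {
  return (field !== undefined && field !== null) ? JSON.stringify(field) : "[]";
}

// Sample value of serviceType_details from the question
const details = [{ is_custom_service_type: false, bill_amount: 100 }];

console.log(stringifyOrEmpty(details));   // [{"is_custom_service_type":false,"bill_amount":100}]
console.log(stringifyOrEmpty(undefined)); // []
```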

Complex mongodb document search

I'm attempting to write a find query where one of the keys is unknown at the time the query is run. For example, on the following document, I'm interested in returning the document if "setup" is true:
{
"a": {
"randomstringhere": {
"setup": true
}
}
}
However, I can't work out how to wildcard the "randomstringhere" field, as it changes for each document in the collection.
Can somebody help?
There is not much you can do with the current schema, but you can modify your collection schema to something like:
{
"a": [
{
"keyName": "randomstringhere",
"setup": true
},
//...
]
}
You can then write a query like:
{
'a': { $elemMatch: { setup: true } }
}
You can't do this with a single query: with the current design, you need a mechanism to get all the random keys first, and then assemble a query document that uses the $or operator over that list of variable key names.
The first part of your operation is possible using Map-Reduce. The following mapreduce operation will populate a separate collection called collectionKeys with all the random keys as the _id values:
mr = db.runCommand({
"mapreduce": "collection",
"map" : function() {
for (var key in this.a) { emit(key, null); }
},
"reduce" : function(key, values) { return null; },
"out": "collectionKeys"
})
To get a list of all the random keys, run distinct on the resulting collection:
db[mr.result].distinct("_id")
Example output
["randomstring_1", "randomstring_2", "randomstring_3", "randomstring_4", ...]
Now, given the list above, you can assemble your query by creating an object whose properties are set within a loop. Your final query document should have this structure:
var query = {
"$or": [
{ "a.randomstring_1.setup": true },
{ "a.randomstring_2.setup": true },
{ "a.randomstring_3.setup": true }
]
};
which you can then use in your query:
db.collection.find(query)
So using the above list of subdocument keys, you can dynamically construct the above using JavaScript's map() method:
mr = db.runCommand({
"mapreduce": "collection", // your collection name
"map" : function() { // map function
for (var key in this.a) { emit(key, null); }
},
"reduce" : function(key, values) { return null; }, // reducer that simply returns null
"out": "collectionKeys" // output collection with results
})
var randomstringKeysList = db[mr.result].distinct("_id"),
orOperator = randomstringKeysList.map(function (key){
var o = {};
o["a."+ key +".setup"] = true;
return o;
}),
query = { "$or": orOperator };
db.collection.find(query);
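The query-building step does not need a server, so it can be sketched and checked in plain JavaScript. A minimal sketch, assuming a hypothetical key list like the one distinct("_id") returns; buildSetupQuery is a hypothetical helper that uses a computed property name instead of the temporary object `o`, producing the same clauses. MongoDB rejects an empty $or array, so the helper refuses an empty key list.

```javascript
// Hypothetical key list, as returned by db[mr.result].distinct("_id")
const randomstringKeysList = ["randomstring_1", "randomstring_2", "randomstring_3"];

// Build one { "a.<key>.setup": true } clause per key and combine them with $or
function buildSetupQuery(keys) {
  if (keys.length === 0) {
    throw new Error("no keys found: an empty $or array is not accepted by MongoDB");
  }
  const orOperator = keys.map(key => ({ ["a." + key + ".setup"]: true }));
  return { $or: orOperator };
}

const query = buildSetupQuery(randomstringKeysList);
console.log(JSON.stringify(query));
// {"$or":[{"a.randomstring_1.setup":true},{"a.randomstring_2.setup":true},{"a.randomstring_3.setup":true}]}
```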

Rename a sub-document field within an Array

Given the document below, how can I rename 'techId1' to 'techId'? I've tried different ways and can't get it to work.
{
"_id" : ObjectId("55840f49e0b"),
"__v" : 0,
"accessCard" : "123456789",
"checkouts" : [
{
"user" : ObjectId("5571e7619f"),
"_id" : ObjectId("55840f49e0bf"),
"date" : ISODate("2015-06-19T12:45:52.339Z"),
"techId1" : ObjectId("553d9cbcaf")
},
{
"user" : ObjectId("5571e7619f15"),
"_id" : ObjectId("55880e8ee0bf"),
"date" : ISODate("2015-06-22T13:01:51.672Z"),
"techId1" : ObjectId("55b7db39989")
}
],
"created" : ISODate("2015-06-19T12:47:05.422Z"),
"date" : ISODate("2015-06-19T12:45:52.339Z"),
"location" : ObjectId("55743c8ddbda"),
"model" : "model1",
"order" : ObjectId("55840f49e0bf"),
"rid" : "987654321",
"serialNumber" : "AHSJSHSKSK",
"user" : ObjectId("5571e7619f1"),
"techId" : ObjectId("55b7db399")
}
In the mongo console I tried the following, which returns OK but nothing is actually updated:
collection.update({"checkouts._id":ObjectId("55840f49e0b")},{ $rename: { "techId1": "techId" } });
I also tried this, which gives me the error "cannot use the part (checkouts of checkouts.techId1) to traverse the element":
collection.update({"checkouts._id":ObjectId("55856609e0b")},{ $rename: { "checkouts.techId1": "checkouts.techId" } })
In mongoose I have tried the following.
collection.findByIdAndUpdate(id, { $rename: { "checkouts.techId1": "checkouts.techId" } }, function (err, data) {});
and
collection.update({'checkouts._id': n1._id}, { $rename: { "checkouts.$.techId1": "checkouts.$.techId" } }, function (err, data) {});
Thanks in advance.
You were close at the end, but there are a few things missing. You cannot $rename when using the positional operator; instead, you need to $set the new name and $unset the old one. But there is another restriction here: since both fields belong to "checkouts" as a parent path, you cannot do both in the same update.
The other key phrase in your question is "traverse the element", and that is the one thing you cannot do: update "all" of the array elements at once. Well, not safely, and not without possibly overwriting new data coming in anyway.
What you need to do is iterate each document, and likewise iterate each array member, in order to update "safely". You should not just iterate the documents and "save" the whole array back with alterations, certainly not when anything else is actively using the data.
I personally would run this sort of operation in the MongoDB shell if you can, as it is a "one off" (hopefully) thing, and this saves the overhead of writing other API code. Also, we're using the Bulk Operations API here to make this as efficient as possible. With Mongoose it takes a bit more digging to implement, but it can still be done. Here is the shell listing:
var bulk = db.collection.initializeOrderedBulkOp(),
count = 0;
db.collection.find({ "checkouts.techId1": { "$exists": true } }).forEach(function(doc) {
doc.checkouts.forEach(function(checkout) {
if ( checkout.hasOwnProperty("techId1") ) {
bulk.find({ "_id": doc._id, "checkouts._id": checkout._id }).updateOne({
"$set": { "checkouts.$.techId": checkout.techId1 }
});
bulk.find({ "_id": doc._id, "checkouts._id": checkout._id }).updateOne({
"$unset": { "checkouts.$.techId1": 1 }
});
count += 2;
if ( count % 500 == 0 ) {
bulk.execute();
bulk = db.collection.initializeOrderedBulkOp();
}
}
});
});
if ( count % 500 !== 0 )
bulk.execute();
Since the $set and $unset operations are queued in pairs and count increases by 2 for each array element, each execution sends 500 operations (250 pairs), just to keep memory usage on the client down.
The loop simply looks for documents where the field to be renamed "exists" and then iterates each array element of each document and commits the two changes. As Bulk Operations, these are not sent to the server until the .execute() is called, where also a single response is returned for each call. This saves a lot of traffic.
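The batching arithmetic can be checked without a server. A minimal sketch in which a hypothetical runInBatches helper stands in for the Bulk API and only records how many operations each execute() would send: every array element queues two operations, a full batch is flushed whenever count reaches a multiple of the batch size, and the trailing check flushes whatever is left.

```javascript
// Hypothetical stand-in for the Bulk API loop: records the size of each execute() call
function runInBatches(elementCount, batchSize) {
  const executedBatchSizes = [];
  let pending = 0; // operations queued in the current bulk
  let count = 0;   // total operations queued so far

  for (let i = 0; i < elementCount; i++) {
    pending += 2; // one $set and one $unset per array element
    count += 2;
    if (count % batchSize === 0) {
      executedBatchSizes.push(pending); // bulk.execute()
      pending = 0;                      // bulk = db.collection.initializeOrderedBulkOp()
    }
  }
  // Final partial batch, like the trailing if in the shell listing
  if (count % batchSize !== 0) executedBatchSizes.push(pending);
  return executedBatchSizes;
}

console.log(runInBatches(300, 500)); // [ 500, 100 ]
```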
If you insist on coding with Mongoose, be aware that the .collection accessor is required to get to the Bulk API methods from the core driver, like this:
var bulk = Model.collection.initializeOrderedBulkOp();
And the only thing that sends to the server is the .execute() method, so this is your only execution callback:
bulk.execute(function(err,response) {
// code body and async iterator callback here
});
And use async flow control, such as async.each, instead of .forEach().
Also, if you do that, be aware that as a raw driver method not governed by Mongoose, you do not get the same database connection awareness as you do with Mongoose methods. Unless you know for sure that the database connection is already established, it is safer to put this code within an event callback for the server connection:
mongoose.connection.on("open",function() {
// body of code
});
Otherwise, apart from call syntax, those are the only real alterations you need.
This worked for me. I created this query to perform the rename and am sharing it (although I know it is not the most optimized way):
First, run an aggregation that (1) $matches the documents whose checkouts array has sub-documents containing the techId1 key, (2) $unwinds the checkouts field (deconstructing the array to output one document per element), (3) adds the techId field (with $addFields), (4) removes the old techId1 field (with $project), (5) $groups the documents by _id to collect the checkout sub-documents back into an array, and (6) writes the result of this aggregation to a temporary collection named temporal (with $out).
const collection = 'yourCollection'
db[collection].aggregate([
{
$match: {
'checkouts.techId1': { '$exists': true }
}
},
{
$unwind: {
path: '$checkouts'
}
},
{
$addFields: {
'checkouts.techId': '$checkouts.techId1'
}
},
{
$project: {
'checkouts.techId1': 0
}
},
{
$group: {
'_id': '$_id',
'checkouts': { $push: '$checkouts' } // keep every field of each checkout, not just techId
}
},
{
$out: 'temporal'
}
])
Then, you can run another aggregation on this temporary collection to $merge the documents with the modified checkouts field back into your original collection.
db.temporal.aggregate([
{
$merge: {
into: collection,
on: "_id",
whenMatched:"merge",
whenNotMatched: "insert"
}
}
])
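The per-element transformation that both approaches aim for can be sketched in plain JavaScript: techId1 disappears, techId takes its value, and the other fields of each checkout (user, _id, and so on) are kept. A minimal sketch using a hypothetical renameInCheckouts helper and a trimmed copy of the question's document with ObjectIds replaced by strings; it returns new objects rather than modifying the input.

```javascript
// Rename techId1 -> techId in every element of doc.checkouts, keeping the other fields
function renameInCheckouts(doc) {
  return {
    ...doc,
    checkouts: doc.checkouts.map(checkout => {
      if (!Object.prototype.hasOwnProperty.call(checkout, "techId1")) return checkout;
      const { techId1, ...rest } = checkout; // drop the old field
      return { ...rest, techId: techId1 };   // add its value back under the new name
    }),
  };
}

// Hypothetical document shaped like the one in the question (ObjectIds as strings)
const doc = {
  _id: "55840f49e0b",
  checkouts: [
    { user: "5571e7619f", _id: "55840f49e0bf", techId1: "553d9cbcaf" },
    { user: "5571e7619f15", _id: "55880e8ee0bf", techId1: "55b7db39989" },
  ],
};

const renamed = renameInCheckouts(doc);
console.log(JSON.stringify(renamed.checkouts[0]));
// {"user":"5571e7619f","_id":"55840f49e0bf","techId":"553d9cbcaf"}
```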

How to replace substring in mongodb document

I have a lot of MongoDB documents in a collection of the form:
{
....
"URL":"www.abc.com/helloWorldt/..."
.....
}
I want to replace helloWorldt with helloWorld to get:
{
....
"URL":"www.abc.com/helloWorld/..."
.....
}
How can I achieve this for all documents in my collection?
db.media.find({mediaContainer:"ContainerS3"}).forEach(function(e,i) {
e.url=e.url.replace("//a.n.com","//b.n.com");
db.media.save(e);
});
Nowadays:
starting with Mongo 4.2, db.collection.updateMany (like db.collection.update) can accept an aggregation pipeline, finally allowing the update of a field based on its own value;
starting with Mongo 4.4, the new aggregation operator $replaceOne makes it very easy to replace part of a string.
// { URL: "www.abc.com/helloWorldt/..." }
// { URL: "www.abc.com/HelloWo/..." }
db.collection.updateMany(
{ URL: { $regex: /helloWorldt/ } },
[{
$set: { URL: {
$replaceOne: { input: "$URL", find: "helloWorldt", replacement: "helloWorld" }
}}
}]
)
// { URL: "www.abc.com/helloWorld/..." }
// { URL: "www.abc.com/HelloWo/..." }
The first part ({ URL: { $regex: /helloWorldt/ } }) is the match query, filtering which documents to update (the ones containing "helloWorldt") and is just there to make the query faster.
The second part ($set: { URL: {...) is the update aggregation pipeline (note the square brackets signifying the use of an aggregation pipeline):
$set is a new aggregation stage (Mongo 4.2) which in this case replaces the value of a field.
The new value is computed with the new $replaceOne operator. Note how URL is modified directly based on its own value ($URL).
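$replaceOne replaces only the first match, much like JavaScript's String.prototype.replace with a string pattern; MongoDB 4.4 also provides $replaceAll for every match. A minimal plain-JavaScript sketch of the difference, using a hypothetical URL that contains the typo twice (split/join is used for the replace-all case so it runs on older Node versions too).

```javascript
const url = "www.abc.com/helloWorldt/helloWorldt";

// Like $replaceOne: a string pattern replaces only the first occurrence
const first = url.replace("helloWorldt", "helloWorld");

// Like $replaceAll: every occurrence is replaced
const all = url.split("helloWorldt").join("helloWorld");

console.log(first); // www.abc.com/helloWorld/helloWorldt
console.log(all);   // www.abc.com/helloWorld/helloWorld
```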
Before Mongo 4.4 (and starting with Mongo 4.2), due to the lack of a proper string $replace operator, we have to use a clumsy mix of $concat and $split:
db.collection.updateMany(
{ URL: { $regex: "/helloWorldt/" } },
[{
$set: { URL: {
$concat: [
{ $arrayElemAt: [ { $split: [ "$URL", "/helloWorldt/" ] }, 0 ] },
"/helloWorld/",
{ $arrayElemAt: [ { $split: [ "$URL", "/helloWorldt/" ] }, 1 ] }
]
}}
}]
)
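The $split/$concat trick amounts to splitting on the marker and gluing the first two pieces back together with the replacement. A minimal plain-JavaScript sketch with a hypothetical splitReplace helper; note that any pieces after index 1 are dropped, so it assumes the marker appears exactly once. The match filter matters too: for a document without the marker, $arrayElemAt has no element 1 and $concat returns null when an argument is missing.

```javascript
// Mirrors the pipeline: split on the marker, then rebuild from pieces 0 and 1 only
function splitReplace(url, find, replacement) {
  const parts = url.split(find);            // like { $split: [ "$URL", find ] }
  return parts[0] + replacement + parts[1]; // like $concat of elements 0 and 1
}

const fixed = splitReplace("www.abc.com/helloWorldt/page", "/helloWorldt/", "/helloWorld/");
console.log(fixed); // www.abc.com/helloWorld/page
```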
Before MongoDB 4.2, you couldn't use the value of a field to update it, so you had to iterate through the documents and update each document using a function. There's an example of how you might do that here: MongoDB: Updating documents using data from the same document
Using mongodump, bsondump and mongoimport:
Sometimes MongoDB collections can get a little complex, with nested arrays/objects etc., where it would be relatively difficult to build loops around them. My workaround is kind of raw, but it works in most scenarios regardless of the complexity of the collection.
1. Export The collection using mongodump into .bson
mongodump --db=<db_name> --collection=<products> --out=data/
2. Convert .bson into .json format using bsondump
bsondump --outFile products.json data/<db_name>/products.bson
3. Replace the strings in the .json file with sed(for linux terminal) or with any other tools
sed -i 's/oldstring/newstring/g' products.json
4. Import the .json collection back with mongoimport, using the --drop flag, which removes the collection before importing
mongoimport --db=<db_name> --drop --collection products <products.json
Alternatively, you can use --uri for connections in both mongoimport and mongodump.
Example:
mongodump --uri "mongodb://mongoadmin:mystrongpassword@10.148.0.7:27017,10.148.0.8:27017,10.148.0.9:27017/my-dbs?replicaSet=rs0&authSource=admin" --collection=products --out=data/
To replace ALL occurrences of the substring in your document use:
db.media.find({mediaContainer:"ContainerS3"}).forEach(function(e,i) {
var find = "//a\\.n\\.com"; // dots escaped so they only match a literal "."
var re = new RegExp(find, 'g'); // g = replace every occurrence
e.url=e.url.replace(re,"//b.n.com");
db.media.save(e);
});
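When the search string comes from a plain literal, its regex metacharacters need escaping: RegExp treats "." as "any character", so an unescaped "//a.n.com" would also match "//aXnXcom". A minimal sketch with a hypothetical escapeRegExp helper (a common idiom, not part of the MongoDB shell) and a made-up URL.

```javascript
// Hypothetical helper: escape regex metacharacters so the string is matched literally
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const find = "//a.n.com";
const re = new RegExp(escapeRegExp(find), "g"); // g = replace every occurrence

const url = "http://a.n.com/x?next=http://a.n.com/y&other=http://aXnXcom/z";
const replaced = url.replace(re, "//b.n.com");
console.log(replaced);
// http://b.n.com/x?next=http://b.n.com/y&other=http://aXnXcom/z
```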
Node.js, using the mongodb package from npm:
db.collection('ABC').find({URL: /helloWorldt/}).toArray((err, docs) => {
if (err) throw err;
docs.forEach(doc => {
let URL = doc.URL.replace('helloWorldt', 'helloWorld');
// updateOne needs an update operator such as $set
db.collection('ABC').updateOne({_id: doc._id}, {$set: {URL}});
});
});
The formatting of my comment on the selected answer (@Naveed's answer) got scrambled, so I'm adding this as an answer. All credit goes to Naveed.
----------------------------------------------------------------------
Just awesome.
My case was - I have a field which is an array - so I had to add an extra loop.
My query is:
db.getCollection("profile").find({"photos": {$ne: "" }}).forEach(function(e,i) {
e.photos.forEach(function(url, j) {
url = url.replace("http://a.com", "https://dev.a.com");
e.photos[j] = url;
});
db.getCollection("profile").save(e);
printjson(e);
})
This can be done by passing a regex as the first argument of the replace method: it replaces the first occurrence of the match (or all occurrences, if the regex has the g flag) with the second argument. This is the same regex as in JavaScript, e.g.:
const string = "www.abc.com/helloWorldt/...";
console.log(string);
const pattern = new RegExp(/helloWorldt/);
const replacedString = string.replace(pattern, "helloWorld");
console.log(replacedString);
Since the regex replaces the string, we can now do this easily in the MongoDB shell by finding the documents, iterating over each one with forEach, and saving them one by one inside the loop, as below:
> db.media.find()
{ "_id" : ObjectId("5e016628a16075c5bd26fbe3"), "URL" : "www.abc.com/helloWorld/" }
{ "_id" : ObjectId("5e016701a16075c5bd26fbe4"), "URL" : "www.abc.com/helloWorldt/" }
>
> db.media.find().forEach(function(o) {o.URL = o.URL.replace(/helloWorldt/, "helloWorld"); printjson(o);db.media.save(o)})
{
"_id" : ObjectId("5e016628a16075c5bd26fbe3"),
"URL" : "www.abc.com/helloWorld/"
}
{
"_id" : ObjectId("5e016701a16075c5bd26fbe4"),
"URL" : "www.abc.com/helloWorld/"
}
> db.media.find()
{ "_id" : ObjectId("5e016628a16075c5bd26fbe3"), "URL" : "www.abc.com/helloWorld/" }
{ "_id" : ObjectId("5e016701a16075c5bd26fbe4"), "URL" : "www.abc.com/helloWorld/" }
>
If you want to search for a substring and replace it with another, you can try the following:
db.collection.find({ "fieldName": /.*stringToBeReplaced.*/ }).forEach(function(e, i){
if (e.fieldName.indexOf('stringToBeReplaced') > -1) {
e.fieldName = e.fieldName.replace('stringToBeReplaced', 'newString');
db.collection.update({ "_id": e._id }, { '$set': { 'fieldName': e.fieldName} }, false, true);
}
})
Now you can do it!
We can use a Mongo script to manipulate data on the fly. It works for me!
I use this script to correct my address data.
Example of current address: "No.12, FIFTH AVENUE,".
I want to remove the last, redundant comma; the expected new address is "No.12, FIFTH AVENUE".
var cursor = db.myCollection.find().limit(100);
while (cursor.hasNext()) {
var currentDocument = cursor.next();
var address = currentDocument['address'];
if (typeof address !== "string") continue; // skip documents without a string address
var lastPosition = address.length - 1;
var lastChar = address.charAt(lastPosition);
if (lastChar == ",") {
var newAddress = address.slice(0, lastPosition);
currentDocument['address'] = newAddress;
db.myCollection.update({_id: currentDocument._id}, currentDocument);
}
}
Hope this helps!
db.filetranscoding.updateMany({ profiles: { $regex: /N_/ } },[{$set: { profiles: {$replaceAll: { input: "$profiles", find:"N_",replacement: "" }},"status":"100"}}])
filetranscoding -- the collection name
profiles -- the field you want to update
/N_/ -- the string you are searching for (the where condition)
find: "N_", replacement: "" -- N_ is the substring to remove, and replacement is what takes its place, here an empty string
"status": "100" -- an extra field set by the same update

How to count document elements inside a mongo collection with php?

I have the following structure of a mongo document:
{
"_id": ObjectId("4fba2558a0787e53320027eb"),
"replies": {
"0": {
"email": ObjectId("4fb89a181b3129fe2d000000"),
"sentDate": "2012-05-21T11:22:01.418Z"
},
"1": {
"email": ObjectId("4fb89a181b3129fe2d000000"),
"sentDate": "2012-05-21T11:22:01.418Z"
},
"2" ....
}
}
How do I count all the replies from all the documents in the collection?
Thank you!
In the following answer, I'm working with a simple data set with five replies across the collection:
> db.foo.find()
{ "_id" : ObjectId("4fba6b0c7c32e336fc6fd7d2"), "replies" : [ 1, 2, 3 ] }
{ "_id" : ObjectId("4fba6b157c32e336fc6fd7d3"), "replies" : [ 1, 2 ] }
Since we're not simply counting documents, db.collection.count() won't help us here. We'll need to resort to MapReduce to scan each document and aggregate the reply array lengths. Consider the following:
db.foo.mapReduce(
function() { emit('totalReplies', { count: this.replies.length }); },
function(key, values) {
var result = { count: 0 };
values.forEach(function(value) {
result.count += value.count;
});
return result;
},
{ out: { inline: 1 }}
);
The map function (first argument) runs across the entire collection and emits the number of replies in each document under a constant key. Mongo will then consider all emitted values and run the reduce function (second argument) a number of times to consolidate (literally reduce) the result. Hopefully the code here is straightforward. If you're new to map/reduce, one caveat is that the reduce method must be capable of processing its own output. This is explained in detail in the MapReduce docs linked above.
Note: if your collection is quite large, you may have to use another output mode (e.g. collection output); however, inline works well for small data sets.
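The "reduce must be able to process its own output" rule can be checked in plain JavaScript. A minimal sketch: the reduce function is copied from above, and the emitted values are the ones the map function would produce for the two sample documents. Reducing everything at once must give the same result as reducing a partial result again, which MongoDB is free to do.

```javascript
// The reduce function from the answer, unchanged
function reduce(key, values) {
  var result = { count: 0 };
  values.forEach(function (value) {
    result.count += value.count;
  });
  return result;
}

// Values emitted by the map function for the two sample documents
const emitted = [{ count: 3 }, { count: 2 }];

// Reducing everything at once...
const direct = reduce("totalReplies", emitted);
// ...must equal reducing an earlier partial result again (a "re-reduce")
const rereduced = reduce("totalReplies", [reduce("totalReplies", [emitted[0]]), emitted[1]]);

console.log(direct.count, rereduced.count); // 5 5
```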
Lastly, if you're using MongoDB 2.1+, we can take advantage of the Aggregation Framework to avoid writing JS functions and make this even easier:
db.foo.aggregate([
{ $project: { replies: 1 }},
{ $unwind: "$replies" },
{ $group: {
_id: "result",
totalReplies: { $sum: 1 }
}}
]);
Three things are happening here. First, we tell Mongo that we're interested in the replies field. Secondly, we unwind the array so that we can iterate over all elements across the fields in our projection. Lastly, we tally up results under a "result" bucket (any constant will do), adding 1 to totalReplies for each iteration. Executing this query yields the following result (as printed by the MongoDB 2.2-era shell; newer shells return a cursor and print only the inner document):
{
"result" : [{
"_id" : "result",
"totalReplies" : 5
}],
"ok" : 1
}
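The same count can be reproduced in plain JavaScript over the sample data, which shows what $unwind contributes: each array element becomes its own record, so counting records counts replies. A minimal sketch; unlike the real stage, which simply drops documents whose replies field is missing or an empty array, it assumes every document has a replies array.

```javascript
// Sample data from the answer
const docs = [
  { _id: 1, replies: [1, 2, 3] },
  { _id: 2, replies: [1, 2] },
];

// $unwind: "$replies" -> one record per array element
const unwound = docs.flatMap(doc => doc.replies.map(reply => ({ _id: doc._id, replies: reply })));

// $group: { _id: "result", totalReplies: { $sum: 1 } } -> add 1 per record
const totalReplies = unwound.reduce(total => total + 1, 0);

console.log({ _id: "result", totalReplies }); // { _id: 'result', totalReplies: 5 }
```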
Although I wrote the above answers with respect to the Mongo client, you should have no trouble translating them to PHP. You'll need to use MongoDB::command() to run either MapReduce or aggregation queries, as the PHP driver currently has no helper methods for either. There's currently a MapReduce example in the PHP docs, and you can reference this Google group post for executing an aggregation query through the same method.
I haven't checked your code, but it might work as well. I did the following and it just works:
$replies = $db->command(
array(
"distinct" => "foo",
"key" => "replies"
)
);
$all = count($replies['values']);
I did it again using the group command of the PHP Mongo driver. It's similar to a MapReduce command.
$keys = array("replies.type" => 1); //keys for group by
$initial = array("count" => 0); //initial value of the counter
$reduce = "function (obj, prev) { prev.count += obj.replies.length; }";
$condition = array('replies' => array('$exists' => true), 'replies.type' => 'follow');
$g = $db->foo->group($keys, $initial, $reduce, $condition);
echo $g['count'];
Thanks jmikola for giving links to Mongo.
The JSON should be structured with replies as an array:
{
"_id": ObjectId("4fba2558a0787e53320027eb"),
"replies": [
{
"email": ObjectId("4fb89a181b3129fe2d000000"),
"sentDate": "2012-05-21T11:22:01.418Z"
},
{
"email": ObjectId("4fb89a181b3129fe2d000000"),
"sentDate": "2012-05-21T11:22:01.418Z"
},
{....}
]
}