MongoDB: best way to store key-value data

I have to migrate from MySQL to MongoDB, and I am a beginner in MongoDB. What is the best way to store the data below in MongoDB?
Should I create a document for each row?
Should I save all rows in one document?
Which one is the valid way in MongoDB?
{
    "_id" : ObjectId("5659d56fef6c702fbc45cc1b"),
    "key" : "setting_update_id",
    "value" : "1",
    "extra" : [
        // some data
    ]
}
OR
{
    "_id" : ObjectId("5659d56fef6c702fbc45cc1b"),
    "setting_update_id" : "1",
    "extra" : [
        // some data
    ]
}

Ali,
As a beginner, you will want to read the docs first. Each collection can be thought of roughly as a table in a relational database, and each document as a row in that table. The columns of your table therefore become the keys of your document.
I would design it closer to the first one:
{
    "_id" : ObjectId("5659d56fef6c702fbc45cc1b"),
    "key" : "setting_update_id",
    "value" : "1",
    "params" : {
        "extra" : "hello",
        "foo" : "bar"
    }
}
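As a rough illustration of the one-document-per-row mapping described above, the following Python sketch turns rows from a settings table into documents shaped like the first design. The row tuples, the column order, and the helper name `rows_to_documents` are assumptions made for the example; they do not come from the original schema.

```python
def rows_to_documents(rows):
    """Map (key, value, params) rows to one MongoDB-style document each."""
    documents = []
    for key, value, params in rows:
        documents.append({
            "key": key,                    # former "key" column
            "value": value,                # former "value" column
            "params": dict(params or {}),  # extra data kept as a sub-document
        })
    return documents


# Hypothetical rows as they might come back from a MySQL cursor.
rows = [
    ("setting_update_id", "1", {"extra": "hello", "foo": "bar"}),
    ("site_name", "example", None),
]
docs = rows_to_documents(rows)
print(docs[0])
```

With PyMongo, a list like `docs` could then be passed to `insert_many` on the target collection; an `_id` is generated for each document that lacks one.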

Related

String timestamp performance in Mongo

I have a collection in MongoDB [4.0.9] that stores documents like the one below:
{ "_id" : 187489726, "mykey" : { "data" : [ { "id" : 2, "value" : "No" } ], "timestamp" : "2020-06-03 10:40:52.718" } }
If I run queries like the following, will there be a performance impact because the timestamp is stored as a string in the collection?
db.mycollection.find({"mykey.timestamp" : {$gte : "2020-06-05 10:00:10.269"}},{"mykey": 1}).count()
I will have close to 20 million records in the DB.
I could not find any benchmark or documentation for such a scenario. I can still convert everything to ISO format.
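Whether string timestamps cost performance is a separate matter from whether they compare correctly. A quick check, sketched here in Python, suggests that strings in the fixed-width `YYYY-MM-DD HH:MM:SS.mmm` layout used above sort in the same order as the dates they represent, so a range filter on them should select the same documents as one on real dates. The sample values other than the two taken from the question are made up.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S.%f"

stamps = [
    "2020-06-03 10:40:52.718",  # from the sample document
    "2020-06-05 10:00:10.269",  # from the query
    "2020-06-05 09:59:59.999",  # made-up value
    "2019-12-31 23:59:59.000",  # made-up value
]

# Sort once as plain strings and once as parsed datetimes.
as_strings = sorted(stamps)
as_dates = sorted(stamps, key=lambda s: datetime.strptime(s, FORMAT))

print(as_strings == as_dates)

# The string equivalent of a "greater than or equal" range filter.
threshold = "2020-06-05 10:00:10.269"
matches = [s for s in stamps if s >= threshold]
print(matches)
```

This only holds while every value is zero-padded and uses the same layout; a value such as `2020-6-5 ...` would break the ordering.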

Adding to a double-nested array in MongoDB

I have a double-nested array in my MongoDB schema and I'm trying to add an entirely new array element to a second-level nested array using $push. I'm getting the error "cannot use the part (...) to traverse the element".
The documents have the following structure:
{
    "_id" : ObjectId("5d8e37eb46c064790a28a467"),
    "org-name" : "Manchester University NHS Foundation Trust",
    "domain" : "mft.nhs.uk",
    "subdomains" : [
        {
            "name" : "careers.mft.nhs.uk",
            "firstSeen" : "2017-10-06 11:32:00",
            "history" : [
                {
                    "a_rr" : "80.244.185.184",
                    "timestamp" : ISODate("2019-09-27T17:24:57.148Z"),
                    "asn" : 61323,
                    "asn_org" : "Ukfast.net Limited",
                    "city" : null,
                    "country" : "United Kingdom",
                    "shodan" : {
                        "ports" : [
                            {
                                "port" : 443,
                                "versions" : [
                                    "TLSv1",
                                    "-SSLv2",
                                    "-SSLv3",
                                    "TLSv1.1",
                                    "TLSv1.2",
                                    "-TLSv1.3"
                                ],
                                "cpe" : "cpe:/a:apache:http_server:2.4.18",
                                "product" : "Apache httpd"
                            }
                        ],
                        "timestamp" : ISODate("2019-09-27T17:24:58.538Z")
                    }
                }
            ]
        }
    ]
}
What I'm attempting to do is refresh the details held in the history array and add another entire array entry to represent the most recently collected data for the subdomain.name.
The net result is that I will have multiple entries in the history array, each one timestamped with the date that the data was refreshed. That way I have a historical record of changes to any of the data held.
I've read that I can't use $push on a double-nested array, but the other advice about using arrayFilters appears to relate to updating an entry in an array rather than simply appending an entirely new document, unless I'm missing something!
I'm using PyMongo and would simply like to build a new dictionary containing all of the data elements and simply append it to the history.
Thanks!
This is straightforward in PyMongo:
# Read the document, append to the nested list in Python, then write it back.
record = db.mycollection.find_one()
record['subdomains'][0]['history'].append({'another': 'record'})
db.mycollection.replace_one({'_id': record['_id']}, record)
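The read-modify-write approach above replaces the whole document. As an alternative sketch, the same append can usually be expressed as a single `$push` that uses the positional `$` operator to address the matched `subdomains` element. The helper name `build_history_push` and the sample entry are hypothetical, and the snippet only builds the filter and update documents so that it runs without a server.

```python
def build_history_push(doc_id, subdomain_name, entry):
    """Return (filter, update) that appends `entry` to one subdomain's history."""
    # The filter must match the array element so that "$" knows which one to use.
    query = {"_id": doc_id, "subdomains.name": subdomain_name}
    # "subdomains.$.history" targets the history array of the matched subdomain.
    update = {"$push": {"subdomains.$.history": entry}}
    return query, update


# Made-up refresh data; in real use the id would be an ObjectId, not a string.
new_entry = {"a_rr": "80.244.185.184", "asn": 61323}
query, update = build_history_push(
    "5d8e37eb46c064790a28a467", "careers.mft.nhs.uk", new_entry
)
print(query)
print(update)

# With PyMongo this would be applied as (collection name assumed):
# db.mycollection.update_one(query, update)
```

Because only one array level is matched positionally here, this stays within what the positional operator supports; the nested `history` array is the target of the push rather than something to traverse.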

Zipping two collections in mongoDB

This is not a question about joins in MongoDB.
I have two collections in mongoDB, which do not have a common field and which I would like to apply a zip function to (like in Python, Haskell). Both collections have the same number of documents.
For example:
Let's say one collection (Users) is for users, and the other (Codes) is of unique randomly generated codes.
Collection Users:
{ "_id" : ObjectId(""), "userId" : "123"}
{ "_id" : ObjectId(""), "userId" : "456"}
Collection Codes:
{ "_id" : ObjectId(""), "code" : "randomCode1"}
{ "_id" : ObjectId(""), "code" : "randomCode2"}
The desired output would be to assign each user a unique code, as follows:
Output
{ "_id" : ObjectId(""), "code" : "randomCode1", "userId" : "123"}
{ "_id" : ObjectId(""), "code" : "randomCode2", "userId" : "456"}
Is there any way of doing this with the aggregation pipeline?
Or perhaps with map reduce? Don't think so because it only works on one collection.
I've considered inserting another random id into both collections for each document pair, and then using $lookup with this new id, but this seems like overkill. The alternative would be to export the data and use Python, since there aren't that many documents, but again I feel like there should be a better way.
I would do something like this to get the records from collections 1 and 2 and merge the required fields into a single object.
You have already confirmed that the number of records in collections 1 and 2 is the same.
The code below loops through both cursors and maps the required fields into one object. Finally, you can print the object to the console or insert it into a new collection (the insert is commented out).
var usersCursor = db.users.find( { } );
var codesCursor = db.codes.find( { } );
while (usersCursor.hasNext() && codesCursor.hasNext()) {
    var user = usersCursor.next();
    var code = codesCursor.next();
    var outputObj = {};
    outputObj["_id"] = new ObjectId();
    outputObj["userId"] = user["userId"];
    outputObj["code"] = code["code"];
    printjson(outputObj);
    //db.collectionName.insertOne(outputObj);
}
Output:
{
    "_id" : ObjectId("58348512ba41f1f22e600c74"),
    "userId" : "123",
    "code" : "randomCode1"
}
{
    "_id" : ObjectId("58348512ba41f1f22e600c75"),
    "userId" : "456",
    "code" : "randomCode2"
}
Unlike in a relational database, in MongoDB you do JOIN-style work at the application level (which makes it easier to scale the database horizontally).
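Since the question mentions Python's `zip`, here is a minimal sketch of the same pairing done client-side. The two lists stand in for the results of `find()` on each collection; sorting both by a stable key before pairing is an added assumption, because the order of an unsorted cursor is not guaranteed.

```python
# Stand-ins for the documents returned from each collection.
users = [{"_id": 2, "userId": "456"}, {"_id": 1, "userId": "123"}]
codes = [{"_id": 10, "code": "randomCode1"}, {"_id": 11, "code": "randomCode2"}]

# Fix the order on both sides so the pairing is repeatable.
users_sorted = sorted(users, key=lambda d: d["_id"])
codes_sorted = sorted(codes, key=lambda d: d["_id"])

# zip stops at the shorter input, mirroring the hasNext() && hasNext() loop.
paired = [
    {"userId": user["userId"], "code": code["code"]}
    for user, code in zip(users_sorted, codes_sorted)
]

for doc in paired:
    print(doc)
```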

Resolving MongoDB DBRef array using Mongo Native Query and working on the resolved documents

My MongoDB database is made up of 2 main collections:
1) Maps
{
    "_id" : ObjectId("542489232436657966204394"),
    "fileName" : "importFile1.json",
    "territories" : [
        {
            "$ref" : "territories",
            "$id" : ObjectId("5424892224366579662042e9")
        },
        {
            "$ref" : "territories",
            "$id" : ObjectId("5424892224366579662042ea")
        }
    ]
},
{
    "_id" : ObjectId("542489262436657966204398"),
    "fileName" : "importFile2.json",
    "territories" : [
        {
            "$ref" : "territories",
            "$id" : ObjectId("542489232436657966204395")
        }
    ],
    "uploadDate" : ISODate("2012-08-22T09:06:40.000Z")
}
2) Territories, which are referenced in "Map" objects :
{
    "_id" : ObjectId("5424892224366579662042e9"),
    "name" : "Afghanistan",
    "area" : 653958
},
{
    "_id" : ObjectId("5424892224366579662042ea"),
    "name" : "Angola",
    "area" : 1252651
},
{
    "_id" : ObjectId("542489232436657966204395"),
    "name" : "Unknown",
    "area" : 0
}
My objective is to list every map with its cumulative area and number of territories. I am trying the following query:
db.maps.aggregate(
    {'$unwind':'$territories'},
    {'$group':{
        '_id':'$fileName',
        'numberOf': {'$sum': '$territories.name'},
        'locatedArea':{'$sum':'$territories.area'}
    }
})
However, the results show 0 for each of these values:
{
    "result" : [
        {
            "_id" : "importFile2.json",
            "numberOf" : 0,
            "locatedArea" : 0
        },
        {
            "_id" : "importFile1.json",
            "numberOf" : 0,
            "locatedArea" : 0
        }
    ],
    "ok" : 1
}
I probably did something wrong when trying to access the member variables of Territory (name and area), but I couldn't find an example of such a case in the Mongo docs. area is stored as an integer, and name as a string.
Yes indeed, the field "territories" holds an array of database references, not the actual documents. DBRefs are objects that contain the information needed to locate the actual documents.
You can see this clearly in the example above by running the following mongo query:
db.maps.find({"_id":ObjectId("542489232436657966204394")}).forEach(function(doc){print(doc.territories[0]);})
It will print the DBRef object rather than the document itself:
o/p: DBRef("territories", ObjectId("5424892224366579662042e9"))
So '$sum': '$territories.name' and '$sum': '$territories.area' both yield 0, since the DBRef objects have no fields named name or area.
You therefore need to resolve each reference to a document before using something like $territories.name.
To achieve what you want, you can use the cursor's map() function, since neither aggregation nor map-reduce supports subqueries, and each map document is already self-contained, with references to its territories.
Steps to achieve:
a) get each map
b) resolve the `DBRef`.
c) calculate the total area, and the number of territories.
d) make and return the desired structure.
Mongo shell script:
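db.maps.find().map(function(doc) {
    // Collect the referenced ids and remember the target collection name.
    var refName;
    var territory_refs = doc.territories.map(function(terr_ref) {
        refName = terr_ref.$ref;
        return terr_ref.$id;
    });
    var areaSum = 0;
    // Look the collection up by name; db.refName would query a collection literally called "refName".
    db.getCollection(refName).find({
        "_id" : {
            $in : territory_refs
        }
    }).forEach(function(i) {
        areaSum += i.area;
    });
    return {
        "id" : doc.fileName,
        "noOfTerritories" : territory_refs.length,
        "areaSum" : areaSum
    };
})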
o/p:
[
    {
        "id" : "importFile1.json",
        "noOfTerritories" : 2,
        "areaSum" : 1906609
    },
    {
        "id" : "importFile2.json",
        "noOfTerritories" : 1,
        "areaSum" : 0
    }
]
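The four steps listed earlier can also be followed client-side. The sketch below uses plain Python dictionaries in place of the two collections (ids shortened to small strings for readability, which is an assumption of the example) to show the resolve-then-sum logic without a database.

```python
# Stand-ins for the "territories" and "maps" collections; ids are shortened.
territories = {
    "e9": {"name": "Afghanistan", "area": 653958},
    "ea": {"name": "Angola", "area": 1252651},
    "95": {"name": "Unknown", "area": 0},
}
maps = [
    {"fileName": "importFile1.json", "territories": ["e9", "ea"]},
    {"fileName": "importFile2.json", "territories": ["95"]},
]


def summarize(map_doc, territory_index):
    """Resolve each reference, then count territories and sum their areas."""
    resolved = [territory_index[ref] for ref in map_doc["territories"]]
    return {
        "id": map_doc["fileName"],
        "noOfTerritories": len(resolved),
        "areaSum": sum(t["area"] for t in resolved),
    }


summary = [summarize(m, territories) for m in maps]
for row in summary:
    print(row)
```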
Map-reduce functions should not be, and cannot be, used to resolve DBRefs on the server side.
See what the documentation has to say:
The map function should not access the database for any reason.
The map function should be pure, or have no impact outside of the
function (i.e. side effects.)
The reduce function should not access the database, even to perform
read operations. The reduce function should not affect the outside
system.
Moreover, even if a reduce function were used (which could never work anyway), it would never be called for your problem, since grouping by "fileName" or "ObjectId" would always yield only one document per key in your dataset.
MongoDB will not call the reduce function for a key that has only a
single value

Retrieving only the relevant part of a stored document

I'm a newbie with MongoDB, and am trying to store user activity performed on a site. My data is currently structured as:
{
    "_id" : ObjectId("4decfb0fc7c6ff7ff77d615e"),
    "activity" : [
        {
            "action" : "added",
            "item_name" : "iPhone",
            "item_id" : 6140
        },
        {
            "action" : "added",
            "item_name" : "iPad",
            "item_id" : 7220
        }
    ],
    "name" : "Smith",
    "user_id" : 2
}
If I want to retrieve, for example, all the activity concerning item_id 7220, I would use a query like:
db.find( { "activity.item_id" : 7220 } );
However, this seems to return the entire document, including the record for item 6140.
Can anyone suggest how this might be done correctly? I'm not sure if it's a problem with my query, or with the structure of the data itself.
Many thanks.
You have to wait for the following feature to be developed: https://jira.mongodb.org/browse/SERVER-828
You can use $slice only if you know the insertion order and the position of your element.
Standard queries in MongoDB always return the whole document.
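If a server-side way of returning only the matching array elements is not available to you, one workaround is to filter on the client. The following Python sketch takes a document shaped like the one in the question and keeps only the matching `activity` entries; the helper name `filter_activity` is made up for the example.

```python
def filter_activity(document, item_id):
    """Return a copy of `document` whose activity list holds only `item_id`."""
    trimmed = dict(document)  # shallow copy so the original stays untouched
    trimmed["activity"] = [
        entry for entry in document.get("activity", [])
        if entry.get("item_id") == item_id
    ]
    return trimmed


user_doc = {
    "activity": [
        {"action": "added", "item_name": "iPhone", "item_id": 6140},
        {"action": "added", "item_name": "iPad", "item_id": 7220},
    ],
    "name": "Smith",
    "user_id": 2,
}

result = filter_activity(user_doc, 7220)
print(result["activity"])
```

The full document still travels over the network in this approach, so it trades bandwidth for simplicity.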
(question also available here: MongoDB query to return only embedded document)