Put properties with different name in one field in MongoDB - mongodb

I am getting requests from different devices as Json. Some of them show temperature as "T", some other as "temp" and it can be different in other devices. is that possible to define in MongoDB to put all of these values in single field "temperature"?
Doesn't matter if it is "temp" or "T" or "tempC", just put all of them in "temperature" field.
Here is an example of my data:
[
{ "ip": "12:3B:6A:1A:E6:8B", "type": 0, "t": 37},
{ "ip": "22:33:66:1A:E6:8B", "type": 1, "temperature": 40},
{ "ip": "1A:3C:6A:1A:E6:8B", "type": 1, "temp": 30}
]
I want to put temp, t and temperature in Temperature field in my collection.

You can use $ifNull operator to control which value should be transferred into your output, like below:
db.col.aggregate([
{
$addFields: { Temperature: { $ifNull: [ { $ifNull: [ "$t", "$temperature"] }, "$temp" ] } }
},
{
$project: {
t: 0,
temperature: 0,
temp: 0
}
}
])
This will merge that three fields into one Temperature taking first not empty value. Additionally if you want to update your collection, you can add $out as a last aggregation stage like { $out: col } but keep in mind that it will entirely replace your source collection.

I think mongodb supports regular expression but they are meant to search datas, not to insert them based on fieldname matches.
I am quite sure you shall use some kind of facade in front of your database to achieve that.

Related

Trying to fetch data from Nested MongoDB Database?

I am beginner in MongoDB and struck at a place I am trying to fetch data from nested array but is it taking so long time as data is around 50K data, also it is not much accurate data, below is schema structure please see once -
{
"_id": {
"$oid": "6001df3312ac8b33c9d26b86"
},
"City": "Los Angeles",
"State":"California",
"Details": [
{
"Name": "Shawn",
"age": "55",
"Gender": "Male",
"profession": " A science teacher with STEM",
"inDate": "2021-01-15 23:12:17",
"Cars": [
"BMW","Ford","Opel"
],
"language": "English"
},
{
"Name": "Nicole",
"age": "21",
"Gender": "Female",
"profession": "Law student",
"inDate": "2021-01-16 13:45:00",
"Cars": [
"Opel"
],
"language": "English"
}
],
"date": "2021-01-16"
}
Here I am trying to filter date with date and Details.Cars like
db.getCollection('news').find({"Details.Cars":"BMW","date":"2021-01-16"}
it is returning details of other persons too which do not have cars- BMW , Only trying to display details of person like - Shawn which have BMW or special array value and date too not - Nicole, rest should not appear but is it not happening.
Any help is appreciated. :)
A combination of $match on the top-level fields and $filter on the array elements will do what you seek.
db.foo.aggregate([
{$match: {"date":"2021-01-16"}}
,{$addFields: {"Details": {$filter: {
input: "$Details",
as: "zz",
cond: { $in: ['BMW','$$zz.Cars'] }
}}
}}
,{$match: {$expr: { $gt:[{$size:"$Details"},0] } }}
]);
Notes:
$unwind is overly expensive for what is needed here and it likely means "reassembling" the data shape later.
We use $addFields where the new field to add (Details) already exists. This effectively means "overwrite in place" and is a common idiom when filtering an array.
The second $match will eliminate docs where the date matches but not a single entry in Details.Cars is a BMW i.e. the array has been filtered down to zero length. Sometimes you want to know this info so if this is the case, do not add the final $match.
I recommend you look into using real dates i.e. ISODate instead of strings so that you can easily take advantage of MongoDB date math and date formatting functions.
Is a common mistake think that find({nested.array:value}) will return only the nested object but actually, this query return the whole object which has a nested object with desired value.
The query is returning the whole document where value BMW exists in the array Details.Cars. So, Nicole is returned too.
To solve this problem:
To get multiple elements that match the criteria you can do an aggregation stage using $unwind to separate the different objects into array and match by the criteria you want.
db.collection.aggregate([
{
"$match": { "Details.Cars": "BMW", "date": "2021-01-26" }
},
{
"$unwind": "$Details"
},
{
"$match": { "Details.Cars": "BMW" }
}
])
This query first match by the criteria to avoid $unwind over all collection.
Then $unwind to get every document and $match again to get only the documents you want.
Example here
To get only one element (for example, if you match by _id and its unique) you can use $elemMatch in this way:
db.collection.find({
"Details.Cars": "BMW",
"date": "2021-01-16"
},
{
"Details": {
"$elemMatch": {
"Cars": "BMW"
}
}
})
Example here
You can use $elemenMatch into query or projection stage. Docs here and here
Using $elemMatch into query the way is this:
db.collection.find({
"Details": {
"$elemMatch": {
"Cars": "BMW"
}
},
"date": "2021-01-16"
},
{
"Details.$": 1
})
Example here
The result is the same. In the second case you are using positional operator to return, as docs says:
The first element that matches the query condition on the array.
That is, the first element where "Cars": "BMW".
You can choose the way you want.

Search values using Index in mongodb

I am new to Mongodb and wish to implement search on field in mongo collection.
I have the following structure for my test collection:-
{
'key': <unique key>,
'val_arr': [
['laptop', 'macbook pro', '16gb', 'i9', 'spacegrey'],
['cellphone', 'iPhone', '4gb', 't2', 'rose gold'],
['laptop', 'macbook air', '8gb', 'i5', 'black'],
['router', 'huawei', '10x10', 'white'],
['laptop', 'macbook', '8gb', 'i5', 'silve'],
}
And I wish to find them based on index number and value, i.e.
Find the entry where first element in any of the val_arr is laptop and 3rd element's value is 8gb.
I tried looking at composite indexes in mongodb, but they have a limit of 32 keys to be indexed. Any help in this direction is appreciated.
There is a limit on indexes here but it really should not matter. In your case you actually say 'key': <unique key>. So if that really is "unique" then it's the only thing in the collection that need be indexed, as long as you actually include that "key" as part of every query you make since this will determine you to select a document.
Indexes on arrays "within" a document really don't matter that much unless you actually intend to search directly for those elements within a document. That might be the case, but this actually has no bearing on matching your values by numbered index positions:
db.collection.find(
{
"val_arr": {
"$elemMatch": { "0": "laptop", "2": "8gb" }
}
},
{ "val_arr.$": 1 }
)
Which would return:
{
"val_arr" : [
[
"laptop",
"macbook air",
"8gb",
"i5",
"black"
]
]
}
The $elemMatch allows you to express "multiple conditions" on the same array element. This is needed over standard dot notation forms because otherwise the condition is simply looking for "any" array member which matches the value at the index. For instance:
db.collection.find({ "val_arr.0": "laptop", "val_arr.2": "4gb" })
Actually matches the given document even though that "combination" does not exist on a single "row", but both values are actually present in the array as a whole. But just in different members. Using those same values with $elemMatch makes sure the pair is matched on the same element.
Note the { "val_arr.$": 1 } in the above example, which is the projection for the "single" matched element. That is optional, but this is just to talk about identifying the matches.
Using .find() this is as much as you can do and is a limitation of the positional operator in that it can only identify one matching element. The way to do this for "multiple matches" is to use aggregate() with $filter:
db.collection.aggregate([
{ "$match": {
"val_arr": {
"$elemMatch": { "0": "laptop", "2": "8gb" }
}
}},
{ "$addFields": {
"val_arr": {
"$filter": {
"input": "$val_arr",
"cond": {
"$and": [
{ "$eq": [ { "$arrayElemAt": [ "$$this", 0 ] }, "laptop" ] },
{ "$eq": [ { "$arrayElemAt": [ "$$this", 2 ] }, "8gb" ] }
]
}
}
}
}}
])
Which returns:
{
"key" : "k",
"val_arr" : [
[
"laptop",
"macbook air",
"8gb",
"i5",
"black"
],
[
"laptop",
"macbook",
"8gb",
"i5",
"silve"
]
]
}
The initial query conditions which actually select the matching document go into the $match and are exactly the same as the query conditions shown earlier. The $filter is applied to just get the elements which actually match it's conditions. Those conditions do a similar usage of $arrayElemAt inside the logical expression as to how the index values of "0" and "2" are applies in the query conditions itself.
Using any aggregation expression incurs an additional cost over the standard query engine capabilities. So it is always best to consider if you really need it before you dive and and use the statement. Regular query expressions are always better as long as they do the job.
Changing Structure
Of course whilst it's possible to match on index positions of an array, none of this actually helps in being able to actually create an "index" which can be used to speed up queries.
The best course here is to actually use meaningful property names instead of plain arrays:
{
'key': "k",
'val_arr': [
{
'type': 'laptop',
'name': 'macbook pro',
'memory': '16gb',
'processor': 'i9',
'color': 'spacegrey'
},
{
'type': 'cellphone',
'name': 'iPhone',
'memory': '4gb',
'processor': 't2',
'color': 'rose gold'
},
{
'type': 'laptop',
'name': 'macbook air',
'memory': '8gb',
'processor': 'i5',
'color': 'black'
},
{
'type':'router',
'name': 'huawei',
'size': '10x10',
'color': 'white'
},
{
'type': 'laptop',
'name': 'macbook',
'memory': '8gb',
'processor': 'i5',
'color': 'silve'
}
]
}
This does allow you "within reason" to include the paths to property names within the array as part of a compound index. For example:
db.collection.createIndex({ "val_arr.type": 1, "val_arr.memory": 1 })
And then actually issuing queries looks far more descriptive in the code than cryptic values of 0 and 2:
db.collection.aggregate([
{ "$match": {
"val_arr": {
"$elemMatch": { "type": "laptop", "memory": "8gb" }
}
}},
{ "$addFields": {
"val_arr": {
"$filter": {
"input": "$val_arr",
"cond": {
"$and": [
{ "$eq": [ "$$this.type", "laptop" ] },
{ "$eq": [ "$$this.memory", "8gb" ] }
]
}
}
}
}}
])
Expected results, and more meaningful:
{
"key" : "k",
"val_arr" : [
{
"type" : "laptop",
"name" : "macbook air",
"memory" : "8gb",
"processor" : "i5",
"color" : "black"
},
{
"type" : "laptop",
"name" : "macbook",
"memory" : "8gb",
"processor" : "i5",
"color" : "silve"
}
]
}
The common reason most people arrive at a structure like you have in the question is typically because they think they are saving space. This is not simply not true, and with most modern optimizations to the storage engines MongoDB uses it's basically irrelevant over any small gains that might have been anticipated.
Therefore, for the sake of "clarity" and also in order to actually support indexing on the data within your "arrays" you really should be changing the structure and use named properties here instead.
And again, if your entire usage pattern of this data is not using the key property of the document in queries, then it probably would be better to store those entries as separate documents to begin with instead of being in an array at all. That also makes getting results more efficient.
So to break that all down your options here really are:
You actually always include key as part of your query, so indexes anywhere else but on that property do not matter.
You change to using named properties for the values on the array members allowing you to index on those properties without hitting "Multikey limitations"
You decide you never access this data using the key anyway, so you just write all the array data as separate documents in the collection with proper named properties.
Going with one of those that actually suits your needs best is essentially the solution allowing you to efficiently deal with the sort of data you have.
N.B Nothing to do with the topic at hand really ( except maybe a note on storage size ), but it would generally be recommended that things with an inherent numeric value such as the memory or "8gb" types of data actually be expressed as numeric rather than "strings".
The simple reasoning is that whilst you can query for "8gb" as an equality, this does not help you with ranges such as "between 4 and 12 gigabytes.
Therefore it usually makes much more sense to use numeric values like 8 or even 8000. Note that numeric values will actually have an impact on storage in that they will typically take less space than strings. Which given that the omission of property names may have been attempting to reduce storage but does nothing, does show an actual area where storage size can be reduced as well.

Update Array Children Sorted Order

I have a collection containing objects with the following structure
{
"dep_id": "some_id",
"departament": "dep name",
"employees": [{
"name": "emp1",
"age": 31
},{
"name": "emp2",
"age": 35
}]
}
I would like to sort and save the array of employees for the object with id "some_id", by employees.age, descending. The best outcome would be to do this atomically using mongodb's query language. Is this possible?
If not, how can I rearrange the subdocuments without affecting the parent's other data or the data of the subdocuments? In case I have to download the data from the database and save back the sorted array of children, what would happen if something else performs an update to one of the children or children are added or removed in the meantime?
In the end, the data should be persisted to the database like this:
{
"dep_id": "some_id",
"departament": "dep name",
"employees": [{
"name": "emp2",
"age": 35
},{
"name": "emp1",
"age": 31
}]
}
The best way to do this is to actually apply the $sort modifier as you add items to the array. As you say in your comment "My actual objects have a "rank" and 'created_at'", which means that you really should have asked that in your question instead of writing a "contrived" case ( don't know why people do that ).
So for "sorting" by multiple properties, the following reference would adjust like this:
db.collection.update(
{ },
{ "$push": { "employees": { "$each": [], "$sort": { "rank": -1, "created_at": -1 } } } },
{ "multi": true }
)
But to update all the data you presently have "as is shown in the question", then you would sort on "age" with:
db.collection.update(
{ },
{ "$push": { "employees": { "$each": [], "$sort": { "age": -1 } } } },
{ "multi": true }
)
Which oddly uses $push to actually "modify" an array? Yes it's true, since the $each modifier says we are not actually adding anything new yet the $sort modifier is actually going to apply to the array in place and "re-order" it.
Of course this would then explain how "new" updates to the array should be written in order to apply that $sort and ensure that the "largest age" is always "first" in the array:
db.collection.update(
{ "dep_id": "some_id" },
{ "$push": {
"employees": {
"$each": [{ "name": "emp": 3, "age": 32 }],
"$sort": { "age": -1 }
}
}}
)
So what happens here is as you add the new entry to the array on update, the $sort modifier is applied and re-positions the new element between the two existing ones since that is where it would sort to.
This is a common pattern with MongoDB and is typically used in combination with the $slice modifier in order to keep arrays at a "maximum" length as new items are added, yet retain "ordered" results. And quite often "ranking" is the exact usage.
So overall, you can actually "update" your existing data and re-order it with "one simple atomic statement". No looping or collection renaming required. Furthermore, you now have a simple atomic method to "update" the data and maintain that order as you add new array items, or remove them.
In order to get what you want you can use the following query:
db.collection.aggregate({
$unwind: "$employees" // flatten employees array
}, {
$sort: {
"employees.name": -1 // sort all documents by employee name (descending)
}
}, {
$group: { // restore the previous structure
_id: "$_id",
"dep_id": {
$first: "$dep_id"
},
"departament": {
$first: "$departament"
},
"employees": {
$push: "$employees"
},
}
}, {
$out: "output" // write everything out to a separate collection
})
After this step you would want to drop your source table and rename the "output" collection to match your source table name.
This solution will, however, not deal with the concurrency issue. So you should remove write access from the collection first so nobody modifies it during the process and then restore it once you're done with the migration.
You could alternatively query all data first, then sort the employees array on the client side and then use either single update queries or - faster but more complicated - a bulk write operation with all the individual update calls in order to update the existing documents. Here, you could use the entire document that you've initially read as a filter for the update operation. So if an individual update does not modify any document you'd know straight away, that some other change must have modified the document you read before. Those cases you'd need to retry later (or straight away until the update does actually modify a document).

Group result mongoDB

I have a collection with array countries values like this. I want to sum the values of the countries.
{
"_id": ObjectId("54cd5e7804f3b06c3c247428"),
"country_json": {
"AE": NumberLong("13"),
"RU": NumberLong("16"),
"BA": NumberLong("10"),
...
}
},
{
"_id": ObjectId("54cd5e7804f3b06c3c247429"),
"country_json": {
"RU": NumberLong("12"),
"ES": NumberLong("28"),
"DE": NumberLong("16"),
"AU": NumberLong("44"),
...
}
}
How to sum the values of countries to get a result like this?
{
"AE": 13,
"RU": 28,
..
}
This can simply be done using aggregation
> db.test.aggregate([
{$project: {
RU: "$country_json.RU",
AE: "$country_json.AE",
BA: "$country_json.BA"
}},
{$group: {
_id: null,
RU: {$sum: "$RU"},
AE: {$sum: "$AE"},
BA: {$sum: "$BA"}
}
])
Output:
{
"_id" : null,
"RU" : NumberLong(28),
"AE" : NumberLong(13),
"BA" : NumberLong(10)
}
This isn't a very good document structure if you intend to aggregate statistics across the "keys" like this. Not really a fan of "data as key names" anyway, but the main point is it does not "play well" with many MongoDB query forms due to the key names being different everywhere.
Particularly with the aggregation framework, a better form to store the data is within an actual array, like so:
{
"_id": ObjectId("54cd5e7804f3b06c3c247428"),
"countries": [
{ "key": "AE", "value": NumberLong("13"),
{ "key": "RU", "value": NumberLong("16"),
{ "key": "BA", "value": NumberLong("10")
]
}
With that you can simply use the aggregation operations:
db.collection.aggregate([
{ "$unwind": "$countries" },
{ "$group": {
"_id": "$countries.key",
"value": { "$sum": "$countries.value" }
}}
])
Which would give you results like:
{ "_id": "AE", "value": NumberLong(13) },
{ "_id": "RU", "value": NumberLong(28) }
That kind of structure does "play well" with the aggregation framework and other MongoDB query patterns because it really is how it's "expected" to be done when you want to use the data in this way.
Without changing the structure of the document you are forced to use JavaScript evaluation methods in order to traverse the keys of your documents because that is the only way to do it with MongoDB:
db.collection.mapReduce(
function() {
var country = this.country_json;
Object.keys(country).forEach(function(key) {
emit( key, country[key] );
});
},
function(key,values) {
return values.reduce(function(p,v) { return NumberLong(p+v) });
},
{ "out": { "inline": 1 } }
)
And that would produce exactly the same result as shown from the aggregation example output, but working with the current document structure. Of course, the use of JavaScript evaluation is not as efficient as the native methods used by the aggregation framework so it's not going to perform as well.
Also note the possible problems here with "large values" in your cast NumberLong fields, since the main reason they are represented that way to JavaScipt is that JavaScipt itself has limitations on the size of that value than can be represented. Likely your values are just trivial but simply "cast" that way, but for large enough numbers as per the intent, then the math will simply fail.
So it's generally a good idea to consider changing how you structure this data to make things easier. As a final note, the sort of output you were expecting with all the keys in a single document is similarly counter intuitive, as again it requires traversing keys of a "hash/map" rather than using the natural iterators of arrays or cursors.

How to return index of array item in Mongodb?

The document is like below.
{
"title": "Book1",
"dailyactiviescores":[
{
"date": 2013-06-05,
"score": 10,
},
{
"date": 2013-06-06,
"score": 21,
},
]
}
The daily active score is intended to increase once the book is opened by a reader. The first solution comes to mind is use "$" to find whether target date has a score or not, and deal with it.
err = bookCollection.Update(
{"title":"Book1", "dailyactivescore.date": 2013-06-06},
{"$inc":{"dailyactivescore.$.score": 1}})
if err == ErrNotFound {
bookCollection.Update({"title":"Book1"}, {"$push":...})
}
But I cannot help to think is there any way to return the index of an item inside array? If so, I could use one query to do the job rather than two. Like this.
index = bookCollection.Find(
{"title":"Book1", "dailyactivescore.date": 2013-06-06}).Select({"$index"})
if index != -1 {
incTarget = FormatString("dailyactivescore.%d.score", index)
bookCollection.Update(..., {"$inc": {incTarget: 1}})
} else {
//push here
}
Incrementing a field that's not present isn't the issue as doing $inc:1 on it will just create it and set it to 1 post-increment. The issue is when you don't have an array item corresponding to the date you want to increment.
There are several possible solutions here (that don't involve multiple steps to increment).
One is to pre-create all the dates in the array elements with scores:0 like so:
{
"title": "Book1",
"dailyactiviescores":[
{
"date": 2013-06-01,
"score": 0,
},
{
"date": 2013-06-02,
"score": 0,
},
{
"date": 2013-06-03,
"score": 0,
},
{
"date": 2013-06-04,
"score": 0,
},
{
"date": 2013-06-05,
"score": 0,
},
{
"date": 2013-06-06,
"score": 0
}, { etc ... }
]
}
But how far into the future to go? So one option here is to "bucket" - for example, have an activities document "per month" and before the start of a month have a job that creates the new documents for next month. Slightly yucky. But it'll work.
Other options involve slight changes in schema.
You can use a collection with book, date, activity_scores. Then you can use a simple upsert to increment a score:
db.books.update({title:"Book1", date:"2013-06-02", {$inc:{score:1}}, {upsert:true})
This will increment the score or insert new record with score:1 for this book and date and your collection will look like this:
{
"title": "Book1",
"date": 2013-06-01,
"score": 10,
},
{
"title": "Book1",
"date": 2013-06-02,
"score": 1,
}, ...
Depending on how much you simplified your example from your real use case, this might work well.
Another option is to stick with the array but switch to using the date string as a key that you increment:
Schema:
{
"title": "Book1",
"dailyactiviescores":{
{ "2013-06-01":10},
{ "2013-06-02":8}
}
}
Note it's now a subdocument and not an array and you can do:
db.books.update({title:"Book1"}, {"dailyactivityscores.2013-06-03":{$inc:1}})
and it will add a new date into the subdocument and increment it resulting in:
{
"title": "Book1",
"dailyactiviescores":{
{ "2013-06-01":10},
{ "2013-06-02":8},
{ "2013-06-03":1}
}
}
Note it's now harder to "add-up" the scores for the book so you can atomically also update a "subtotal" in the same update statement whether it's for all time or just for the month.
But here it's once again problematic to keep adding days to this subdocument - what happens when you're still around in a few years and these book documents grow hugely?
I suspect that unless you will only be keeping activity scores for the last N days (which you can do with capped array feature in 2.4) it will be simpler to have a separate collection for book-activity-score tracking where each book-day is a separate document than to embed the scores for each day into the book in a collection of books.
According to the docs:
The $inc operator increments a value of a field by a specified amount.
If the field does not exist, $inc sets the field to the specified
amount.
So, if there won't be a score field in the array item, $inc will set it to 1 in your case, like this:
{
"title": "Book1",
"dailyactiviescores":[
{
"date": 2013-06-05,
"score": 10,
},
{
"date": 2013-06-06,
},
]
}
bookCollection.Update(
{"title":"Book1", "dailyactivescore.date": 2013-06-06},
{"$inc":{"dailyactivescore.$.score": 1}})
will result into:
{
"title": "Book1",
"dailyactiviescores":[
{
"date": 2013-06-05,
"score": 10,
},
{
"date": 2013-06-06,
"score": 1
},
]
}
Hope that helps.