Cast string to number in Mongodb [duplicate] - mongodb

I have a collection of documents that have a value that is known to be a number, but is stored as a string. It is out of my control to change the type of the field, but I want to use that field in an aggregation (say, to average it).
It seems that I should be using a projection prior to grouping, and in that projection convert the field as needed. I can't seem to get the syntax just right - everything I try either gives me NaN, or the new field is simply missing from the next step in the aggregation.
$project: {
value: '$value',
valueasnumber: ????
}
Given the very simple example above, where the contents of $value in all documents are string type, but will parse to a number, what do I do to make valueasnumber a new (non-existing) field that is of type double with the parsed version of $value in it?
I've tried things like the examples below (and about a dozen similar things):
{ $add: new Number('$value').valueOf() }
new Number('$value').valueOf()
Am I barking up the wrong tree entirely? Any help would be greatly appreciated!
(To be 100% clear, below is how I would like to use the new field).
$group: {
score: {
$avg: '$valueasnumber'
}
}

One approach is to use a mongo shell script that adds a new numeric field, valueasnumber (the numeric conversion of the existing string field 'value'), either to the existing documents or to new ones, and then to use that numeric field in further calculations.
db.numbertest.find().forEach(function(doc) {
    // parseFloat keeps any fractional part; NumberInt would store a 32-bit integer instead
    db.numbertest.updateOne({ _id: doc._id }, { $set: { valueasnumber: parseFloat(doc.value) } });
});
Then use the valueasnumber field for the numeric calculation:
db.numbertest.aggregate([{$group :
{_id : null,
"score" : {$avg : "$valueasnumber"}
}
}]);

The core operation here is converting the value from a string to a number, which the aggregation pipeline could not do at the time of writing.
mapReduce is an alternative, as shown below.
db.c.mapReduce(function() {
emit( this.groupId, {score: Number(this.value), count: 1} );
}, function(key, values) {
var score = 0, count = 0;
for (var i = 0; i < values.length; i++) {
score += values[i].score;
count += values[i].count;
}
return {score: score, count: count};
}, {finalize: function(key, value) {
return {score: value.score / value.count};
}, out: {inline: 1}});
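To see why the reduce step carries both a running score and a count, here is a minimal sketch that runs the same three steps over an in-memory array in plain JavaScript. The helper names (mapDoc, reduceValues, finalizeValue, averageByGroup) and the sample documents are made up for illustration; only groupId and value come from the answer above.

```javascript
// Stand-ins for the map, reduce and finalize functions, applied to an array
// instead of a collection. All helper names here are hypothetical.
function mapDoc(doc) {
  return { key: doc.groupId, value: { score: Number(doc.value), count: 1 } };
}

// Sums scores and counts, so reducing already-reduced values stays correct.
function reduceValues(values) {
  return values.reduce(
    (acc, v) => ({ score: acc.score + v.score, count: acc.count + v.count }),
    { score: 0, count: 0 }
  );
}

function finalizeValue(value) {
  return { score: value.score / value.count };
}

function averageByGroup(docs) {
  const grouped = new Map();
  for (const { key, value } of docs.map(mapDoc)) {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }
  const result = {};
  for (const [key, values] of grouped) {
    result[key] = finalizeValue(reduceValues(values)).score;
  }
  return result;
}

const docs = [
  { groupId: "a", value: "10" },
  { groupId: "a", value: "20" },
  { groupId: "b", value: "7" }
];
console.log(averageByGroup(docs)); // { a: 15, b: 7 }
```

Dividing only in the finalize step matters because the server may call reduce several times on partial results; averaging inside reduce would produce an average of averages.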

The aggregation framework now has conversion operators such as $toInt (added in MongoDB 4.0); see:
https://jira.mongodb.org/browse/SERVER-11400
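As a rough sketch of how the conversion operators fit the original question, the pipeline below uses $toDouble (a sibling of $toInt, also from MongoDB 4.0) before averaging. The collection name numbertest and the field names come from the thread; the helper averageOfStringField and the sample data are assumptions added only so the idea can be checked without a server.

```javascript
// Pipeline sketch, assuming MongoDB 4.0 or later.
const pipeline = [
  // $toDouble parses the string; it raises an error on non-numeric input.
  { $addFields: { valueasnumber: { $toDouble: "$value" } } },
  { $group: { _id: null, score: { $avg: "$valueasnumber" } } }
];

// In the mongo shell this would be run as: db.numbertest.aggregate(pipeline);

// Plain-JavaScript stand-in for the same computation (hypothetical helper,
// not a MongoDB API).
function averageOfStringField(docs, field) {
  const numbers = docs.map(doc => Number(doc[field]));
  return numbers.reduce((sum, n) => sum + n, 0) / numbers.length;
}

const sampleDocs = [{ value: "1.5" }, { value: "2.5" }, { value: "5" }];
console.log(averageOfStringField(sampleDocs, "value")); // 3
```

If some documents may hold strings that do not parse, $convert with an onError value is probably the safer choice than $toDouble.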

Related

Is it possible to count distinct Documents in MongoTemplate?

Is it possible to somehow chain distinct(...) and countDocuments(...) in mongoTemplate?
Something like this
mongoTemplate.getCollection("foo").distinct("bar", Foo.class).countDocuments();
Keep in mind that I will have a few million results, so I don't want to create a bottleneck in the JVM by loading all distinct entities into an array and then taking its size. I would rather get a number from MongoDB and not burden the JVM.
Yes, it is possible to get the count of distinct documents using mongoTemplate.
Mongo shell query
db.foo.aggregate([{
$group: {
_id: "$bar"
}
}, {
$count: "total"
}]);
Output of this query will be
{
"total" : 8
}
To get this result using MongoTemplate:
GroupOperation groupOperation = Aggregation.group("bar");
CountOperation countOperation = Aggregation.count().as("total");
Aggregation aggregation = Aggregation.newAggregation(groupOperation, countOperation);
Document result = mongoTemplate.aggregate(aggregation, "foo", Document.class)
.getUniqueMappedResult();
Integer total = Objects.nonNull(result) ? result.getInteger("total") : 0;
The last time I did this, I used the aggregation pipeline operators: I grouped the collection (which gives you the distinct values) and then used count() on top of it.
For Example:
Aggregation pipeline = newAggregation(
group(fields("foo","bar")),
group("_id.bar").count().as("distinctCount")
);
Else use the following one liner:
return mongoTemplate.aggregate(aggregation,Class.COLLECTION_NAME,BasicDBObject.class).getMappedResult();
NOTE: in this case, make sure the function's return type is Integer or Long, not int or long. int and long are primitive types and cannot hold null, whereas the aggregation might return null when there is no data.
You can use the Mongo aggregate with $group.
db.foo.aggregate([{
'$group': {
'_id': '_id',
'count': {
'$sum': 1
}
}
}]);
You will get:
{ "_id":"_id", "count":12}
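The two shell pipelines in this thread behave differently, and a small sketch may make the difference clearer. Grouping on the field path "$bar" gives one group per distinct value, while grouping on the plain string '_id' (no "$") uses a constant key and so counts every document. The helper functions and the sample array below are illustrative assumptions, not part of any MongoDB API.

```javascript
// One group per distinct value of bar, then count the groups.
const distinctCountPipeline = [
  { $group: { _id: "$bar" } },
  { $count: "total" }
];

// The string "_id" without a "$" is a constant, so all documents fall into a
// single group and the sum equals the number of documents.
const totalCountPipeline = [
  { $group: { _id: "_id", count: { $sum: 1 } } }
];

// In-memory stand-ins for the two results (hypothetical helpers).
function countDistinct(docs, field) {
  return new Set(docs.map(doc => doc[field])).size;
}

function countAll(docs) {
  return docs.length;
}

const foo = [{ bar: "x" }, { bar: "y" }, { bar: "x" }, { bar: "z" }];
console.log(countDistinct(foo, "bar")); // 3
console.log(countAll(foo)); // 4
```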

how to calculate sum of string in mongodb?

In MongoDB I want to calculate the sum of the partialAmount field, which is of string type; its values are stored as "20,00", "15,00".
How do I calculate the sum of all values? Both of the queries I have tried return 0.
collection.aggregate([
{
$group: {
_id: null,
sum: { $sum: "$$partialAmount" }
}
}
]);
And:
collection.aggregate([
{
$group: {
_id: null,
totalAmount: {
$sum: {
$toDouble: "$partialAmount"
}
}
}
}
]);
Your first query is not going to work because you're trying to sum strings, and you also have an extra "$" in "$$partialAmount".
Your second query would work if your partialAmount-s were stored in the format "15.00" and "20.00", see here.
If they are saved as "15,00" and "20,00" in the db, your second query should throw an error, not return 0. (If you are actually getting a zero result, then maybe your "partialAmount" field is misspelled in the db, or the field gets lost in a previous stage of the pipeline)
In this case you either need to change the values in your db to the "20.00" format or, if this is not feasible, use $split and $concat to convert them to the proper format before converting to double and summing up the values.
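One way the $split and $concat conversion could look is sketched below. It assumes every partialAmount contains exactly one comma; a value without a comma would make $concat return null and be ignored by $sum. The helper commaDecimalToNumber and the sample amounts are added here only to mirror the logic in plain JavaScript.

```javascript
// Pipeline sketch: split on the comma, rejoin with a dot, then convert.
const pipeline = [
  {
    $group: {
      _id: null,
      totalAmount: {
        $sum: {
          $toDouble: {
            $let: {
              vars: { parts: { $split: ["$partialAmount", ","] } },
              in: {
                $concat: [
                  { $arrayElemAt: ["$$parts", 0] },
                  ".",
                  { $arrayElemAt: ["$$parts", 1] }
                ]
              }
            }
          }
        }
      }
    }
  }
];

// Plain-JavaScript mirror of the same split-and-rejoin step (hypothetical helper).
function commaDecimalToNumber(text) {
  const parts = text.split(",");
  return Number(parts[0] + "." + parts[1]);
}

const amounts = ["20,00", "15,00"];
const total = amounts.map(commaDecimalToNumber).reduce((a, b) => a + b, 0);
console.log(total); // 35
```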

Split a string during MongoDB aggregate

Currently, I have just fullname stored in the User collection in MongoDB. I'd like to run a report that splits the first and last name so for now I'm trying to run an aggregate and split the string when a whitespace is found.
Here is what I have now, but I'd like to replace the hard coded end position with a variable based on where whitespace is found. Is this possible in an aggregate pipeline?
db.users.aggregate([{
$project : {
fullname:{ $toUpper:"$fullname" },
first: { $substr: [ "$fullname", 0, 2 ]}, _id:0 }
}, { $sort : { fullname : 1 }
}]);
The aggregation framework does not have any operator to perform a "split" based on a matched character or any such thing. There is only $substr, which of course requires an index, and there is no operator to return an "index" of a matched character either.
You could use mapReduce, which can use JavaScript .split(), but of course there is no "sort stage" in mapReduce other than the results in the main key which are always pre-sorted before attempting to apply a reduce ( which would not be applied here with all unique keys ):
db.users.mapReduce(
function() {
var lastName = this.fullname.split(/\s/).reverse()[0].toUpperCase();
emit({ "lastName": lastName, "orig": this._id },this);
},
function(){}, // Never called on all unique
{ "out": { "inline": 1 } }
);
And that will basically extract the last name after a whitespace, convert it to uppercase and use it as a composite value in the primary key so results will be sorted by that key ( note you cannot use _id as any part of the key name or it will be sorted by that field instead ).
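The key derivation used in the map function can be tried in plain JavaScript. This is only a sketch: lastNameKey and the sample users are made-up names, and the explicit sort stands in for the key ordering that mapReduce applies on its own.

```javascript
// Same expression as in the map function: take the last whitespace-separated
// token and uppercase it.
function lastNameKey(fullname) {
  return fullname.split(/\s/).reverse()[0].toUpperCase();
}

const users = [
  { _id: 1, fullname: "John Quincy Smith" },
  { _id: 2, fullname: "Jane Doe" },
  { _id: 3, fullname: "Ann Brown" }
];

// Build the composite key and order by it, lastName first, then orig.
const keys = users
  .map(u => ({ lastName: lastNameKey(u.fullname), orig: u._id }))
  .sort((a, b) =>
    a.lastName < b.lastName ? -1 : a.lastName > b.lastName ? 1 : a.orig - b.orig
  );

console.log(keys.map(k => k.lastName)); // [ 'BROWN', 'DOE', 'SMITH' ]
```

Note that a name with trailing whitespace would produce an empty key with this expression, so trimming the input first may be worth considering.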
But if your real case here is "sorting", then you are better off storing the data that way, thus giving you a direct value to sort on without calculation:
var bulk = db.users.initializeOrderedBulkOp(),
    count = 0;
db.users.find().forEach(function(user) {
    bulk.find({ "_id": user._id }).updateOne({
        "$set": { "lastName": user.fullname.split(/\s/).reverse()[0].toUpperCase() }
    });
    count++;
    // Send the queued updates in batches of 1000
    if ( count % 1000 == 0 ) {
        bulk.execute();
        bulk = db.users.initializeOrderedBulkOp();
    }
});
// Flush whatever is left in the final partial batch
if ( count % 1000 != 0 )
    bulk.execute();
Then with a solid field in place you just run your sort:
db.users.find().sort({ "lastName": 1 });
Which is going to be a lot faster than trying to calculate a value from which to perform a sort.
Of course, if sorting is not the purpose and it's just for presentation, then just perform the split in client code, where it makes the most sense to do so. The aggregation framework cannot restructure the data like that, and while mapReduce "could", its output is very opinionated and not really intended for such an operation.

Publish all fields in document but just part of an array in the document

I have a mongo collection in which the documents have a field that is an array. I want to be able to publish everything in the documents except for the elements in the array that were created more than a day ago. I suspect the answer will be somewhat similar to this question.
Meteor publication: Hiding certain fields in an array document field?
Instead of limiting fields in the array, I just want to limit the elements in the array being published.
Thanks in advance for any responses!
EDIT
Here is an example document:
{
_id: 123456,
name: "Unit 1",
createdAt: (datetime object),
settings: *some stuff*,
packets: [
{
_id: 32412312,
temperature: 70,
createdAt: *datetime object from today*
},
{
_id: 32412312,
temperature: 70,
createdAt: *datetime from yesterday*
}
]
}
I want to get everything in this document except for the part of the array that was created more than 24 hours ago. I know I can accomplish this by moving the packets into their own collection and tying them together with keys as in a relational database but if what I am asking were possible, this would be simpler with less code.
You could do something like this in your publish method:
Meteor.publish("pubName", function() {
    var collection = Collection.find().fetch(); //change this to return your data
    var deadline = Date.now() - 86400000; //should equal 24 hrs ago
    _.each(collection, function(collectionItem) {
        // filter rather than splice: removing items while iterating skips elements
        collectionItem.packets = _.filter(collectionItem.packets, function(packet) {
            return packet.createdAt >= deadline;
        });
    });
    return collection;
});
Though you might be better off storing the last 24 hours worth of packets as a separate array in your document. Would probably be less taxing on the server, not sure.
Also, code above is untested. Good luck.
you can use the $elemMatch projection
http://docs.mongodb.org/manual/reference/operator/projection/elemMatch/
So in your case, it would be
var today = new Date();
var yesterday = new Date(today);
yesterday.setDate(today.getDate() - 1);
collection.find({}, //find anything or specific
{
fields: {
'packets': {
$elemMatch: {'createdAt' : {$gt : yesterday /* or some new Date() */}}
}
}
});
However, $elemMatch only returns the FIRST element matching your condition. To return more than 1 element, you need to use the aggregation framework, which will be more efficient than _.each or forEach, particularly if you have a large array to loop through.
collection.rawCollection().aggregate([
{
$match: {}
},
{
$redact: {
$cond: {
if : {$or: [{$gt: ["$createdAt",yesterday]},"$packets"]},
then: "$$DESCEND",
else: "$$PRUNE"
}
}
}], function (error, result ){
});
You specify the $match in a way similar to find({}). Then all the documents that match your conditions get piped into the $redact stage, whose behavior is specified by the $cond.
$redact scans the document from top level to bottom. At the top level, you have _id, name, createdAt, settings, packets; hence {$or: [***,"$packets"]}
The presence of $packets in the $or allows the $redact to scan the second level which contain the _id, temperature and createdAt; hence {$gt: ["$createdAt",yesterday]}
This is async; you can use Meteor.wrapAsync to wrap the function.
Hope this helps.
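A different technique from the $redact approach, sketched here under the assumption that the server is new enough to have $addFields and $filter, is to rewrite the packets array in place and keep only the recent elements. The function names, the fixed "now" timestamp and the sample document are hypothetical and exist only to make the sketch checkable in plain JavaScript.

```javascript
// Pipeline sketch: keep only packets created after the cutoff date.
function buildRecentPacketsPipeline(cutoff) {
  return [
    {
      $addFields: {
        packets: {
          $filter: {
            input: "$packets",
            as: "packet",
            cond: { $gt: ["$$packet.createdAt", cutoff] }
          }
        }
      }
    }
  ];
}

// In-memory stand-in for what the stage does to one document.
function recentPackets(doc, cutoff) {
  return Object.assign({}, doc, {
    packets: doc.packets.filter(packet => packet.createdAt > cutoff)
  });
}

const now = new Date("2020-01-02T12:00:00Z"); // fixed so the example is repeatable
const cutoff = new Date(now.getTime() - 24 * 60 * 60 * 1000);
const unit = {
  _id: 123456,
  name: "Unit 1",
  packets: [
    { _id: 1, temperature: 70, createdAt: new Date("2020-01-02T08:00:00Z") },
    { _id: 2, temperature: 70, createdAt: new Date("2019-12-31T08:00:00Z") }
  ]
};

console.log(recentPackets(unit, cutoff).packets.length); // 1
```

All other top-level fields pass through unchanged, which matches the goal of publishing everything except the old array elements.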

$unwind an object in aggregation framework

In the MongoDB aggregation framework, I was hoping to use the $unwind operator on an object (ie. a JSON collection). Doesn't look like this is possible, is there a workaround? Are there plans to implement this?
For example, take the article collection from the aggregation documentation . Suppose there is an additional field "ratings" that is a map from user -> rating. Could you calculate the average rating for each user?
Other than this, I'm quite pleased with the aggregation framework.
Update: here's a simplified version of my JSON collection per request. I'm storing genomic data. I can't really make genotypes an array, because the most common lookup is to get the genotype for a random person.
variants: [
{
name: 'variant1',
genotypes: {
person1: 2,
person2: 5,
person3: 7,
}
},
{
name: 'variant2',
genotypes: {
person1: 3,
person2: 3,
person3: 2,
}
}
]
It is not possible to do the type of computation you are describing with the aggregation framework - and it's not because there is no $unwind method for non-arrays. Even if the person:value objects were documents in an array, $unwind would not help.
The "group by" functionality (whether in MongoDB or in any relational database) is done on the value of a field or column. We group by value of field and sum/average/etc based on the value of another field.
Simple example is a variant of what you suggest, ratings field added to the example article collection, but not as a map from user to rating but as an array like this:
{ title : "title of article", ...
ratings: [
{ voter: "user1", score: 5 },
{ voter: "user2", score: 8 },
{ voter: "user3", score: 7 }
]
}
Now you can aggregate this with:
[ {$unwind: "$ratings"},
{$group : {_id : "$ratings.voter", averageScore: {$avg:"$ratings.score"} } }
]
But this example structured as you describe it would look like this:
{ title : "title of article", ...
ratings: {
user1: 5,
user2: 8,
user3: 7
}
}
or even this:
{ title : "title of article", ...
ratings: [
{ user1: 5 },
{ user2: 8 },
{ user3: 7 }
]
}
Even if you could $unwind this, there is nothing to aggregate on here. Unless you know the complete list of all possible keys (users) you cannot do much with this. [*]
An analogous relational DB schema to what you have would be:
CREATE TABLE T (
user1 integer,
user2 integer,
user3 integer
...
);
That's not what would be done; instead we would do this:
CREATE TABLE T (
username varchar(32),
score integer
);
and now we aggregate using SQL:
select username, avg(score) from T group by username;
There is an enhancement request for MongoDB that may allow you to do this in the aggregation framework in the future: the ability to project values to keys and vice versa. Meanwhile, there is always map/reduce.
[*] There is a complicated way to do this if you know all unique keys (you can find all unique keys with a method similar to this) but if you know all the keys you may as well just run a sequence of queries of the form db.articles.find({"ratings.user1":{$exists:true}},{_id:0,"ratings.user1":1}) for each userX which will return all their ratings and you can sum and average them simply enough rather than do a very complex projection the aggregation framework would require.
Since 3.4.4, you can transform object to array using $objectToArray
See:
https://docs.mongodb.com/manual/reference/operator/aggregation/objectToArray/
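Applied to the article example with ratings stored as a map from user to score, a pipeline along these lines should give the per-user average. It is a sketch rather than a tested query; averageScoreByUser and the sample articles are assumptions added so the expected grouping can be checked in plain JavaScript.

```javascript
// Pipeline sketch, assuming MongoDB 3.4.4 or later.
const pipeline = [
  // { user1: 5, user2: 8 } becomes [ { k: "user1", v: 5 }, { k: "user2", v: 8 } ]
  { $project: { ratings: { $objectToArray: "$ratings" } } },
  { $unwind: "$ratings" },
  { $group: { _id: "$ratings.k", averageScore: { $avg: "$ratings.v" } } }
];

// In-memory stand-in for the same grouping (hypothetical helper).
function averageScoreByUser(articles) {
  const totals = {};
  for (const article of articles) {
    for (const [user, score] of Object.entries(article.ratings)) {
      if (!totals[user]) totals[user] = { sum: 0, count: 0 };
      totals[user].sum += score;
      totals[user].count += 1;
    }
  }
  const averages = {};
  for (const [user, t] of Object.entries(totals)) {
    averages[user] = t.sum / t.count;
  }
  return averages;
}

const articles = [
  { title: "a", ratings: { user1: 5, user2: 8, user3: 7 } },
  { title: "b", ratings: { user1: 3, user2: 4 } }
];
console.log(averageScoreByUser(articles)); // { user1: 4, user2: 6, user3: 7 }
```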
This is an old question, but I've run across a tidbit of information through trial and error that people may find useful.
It's actually possible to unwind on a dummy value by fooling the parser this way:
db.Opportunity.aggregate(
{ $project: {
Field1: 1, Field2: 1, Field3: 1,
DummyUnwindField: { $ifNull: [null, [1.0]] }
}
},
{ $unwind: "$DummyUnwindField" }
);
This will produce 1 row per document, regardless of whether or not the value exists. You may be able to tinker with this to generate the results you want. I had hoped to combine this with multiple $unwinds (sort of like emit() in map/reduce), but alas, the last $unwind wins or they combine as an intersection rather than a union, which makes it impossible to achieve the results I was looking for. I am sadly disappointed with the aggregation framework's functionality, as it doesn't fit the one use case I was hoping to use it for (one that a lot of the questions on StackOverflow in this area seem to be asking about): ordering results based on match rate. Improving the poor map/reduce performance would have made this entire feature unnecessary.
This is what I found & extended.
Let's create an experimental database in mongo:
db.copyDatabase('livedb' , 'experimentdb')
Now use experimentdb and convert the object to an array in your experimentcollection:
db.getCollection('experimentcollection').find({}).forEach(function(e){
if(e.store){
e.ratings = [e.ratings]; //Objects name to be converted to array eg:ratings
db.experimentcollection.save(e);
}
})
Some nerdy js code to convert json to flat object
var flatArray = [];
var data = db.experimentcollection.find().toArray();
for (var index = 0; index < data.length; index++) {
var flatObject = {};
for (var prop in data[index]) {
var value = data[index][prop];
if (Array.isArray(value) && prop === 'ratings') {
for (var i = 0; i < value.length; i++) {
for (var inProp in value[i]) {
flatObject[inProp] = value[i][inProp];
}
}
}else{
flatObject[prop] = value;
}
}
flatArray.push(flatObject);
}
printjson(flatArray);