How to run fast range queries on a String field in MongoDB when the values are really doubles

I have a field contractValue (among other fields) in a collection contract, and it is stored as a String. It basically holds a double value like 1200 or 1500, but in some documents it contains a value like $1200 or $1500.
Sample data from collection:
{ ..
..
contractValue: "1200", //This is the one stored as String. I need
// to perform range query over it
..
..
}
{ ..
..
contractValue: "$1500",
..
..
}
I have a requirement to fetch contracts based on their contract value. A query can look like this:
{$and: [ {'contractValue': {$gt: 100}}, {'contractValue': {$lt: 1000 }}]}
This query gives me the wrong result: it also returns documents whose contractValue is something like 1238999.
I also need to create an index on contractValue.
Is it possible to create an index on contractValue so that range queries are efficient, i.e. every < or > comparison runs against the index and fetches exactly the matching documents, without changing the schema?
How can I handle values like $1200 in the index, so that the indexed value is just the integer 1200 rather than $1200?

Try this:
https://mongoplayground.net/p/TG3Y5tdh9aK
It assumes each string is either a quoted number or a quoted number with a "$" at the front. Note that $convert requires MongoDB 4.0 or newer.
db.collection.aggregate([
{
$project: {
"newContractValue": {
"$convert": {
"input": "$contractValue",
"to": "double",
"onError": {
$toDouble: {
"$substr": [
"$contractValue",
1,
{
"$strLenCP": "$contractValue"
}
]
}
}
}
}
}
},
{
$match: {
$and: [
{
"newContractValue": {
$gt: 100
}
},
{
"newContractValue": {
$lt: 1000
}
}
]
}
}
])
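The conversion logic of the pipeline above can be checked outside the database. The following is a minimal sketch in plain JavaScript, assuming the same two input shapes ("1200" and "$1200"). The helper names toContractNumber and inRange are made up for illustration and are not MongoDB APIs.

```javascript
// Hypothetical helper: mirrors the $convert / onError fallback above.
function toContractNumber(raw) {
  const text = String(raw).trim();
  // Drop one leading "$" if present, then parse the remainder as a double.
  const digits = text.startsWith("$") ? text.slice(1) : text;
  if (digits === "") return null;
  const value = Number(digits);
  return Number.isNaN(value) ? null : value;
}

// Hypothetical helper: exclusive range check, like { $gt: low, $lt: high }.
function inRange(raw, low, high) {
  const value = toContractNumber(raw);
  return value !== null && value > low && value < high;
}

// Made-up sample values for the sketch.
const samples = ["1200", "$1500", "$250.5", "999", "1238999"];
const matches = samples.filter((s) => inRange(s, 100, 1000));
console.log(matches);
```

Only the values that are numerically between 100 and 1000 survive the filter, regardless of whether they carry a "$" prefix.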
This can be used to add a new numeric contractValueNew field derived from the existing contractValue:
db.getCollection('yourCollection').find({}).forEach(function (record) {
    // Strip an optional leading "$" and parse the rest as a number
    var raw = String(record.contractValue);
    var value = parseFloat(raw.charAt(0) === '$' ? raw.substring(1) : raw);
    db.getCollection('yourCollection').updateOne(
        { _id: record._id },
        { $set: { contractValueNew: value } }
    );
})

Try:
db.collection.find({'contractValue': {$gt: 100, $lt: 1000 }})
Create an index on contractValue, but first convert all the values to numbers.
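The reason for converting first is that, under a simple binary comparison, strings are ordered character by character rather than by numeric value. A small plain-JavaScript illustration (not MongoDB code; the sample values are made up) of how the two orderings disagree:

```javascript
const asStrings = ["1200", "200", "1000", "35"];

// Default string ordering compares character by character.
const stringOrder = [...asStrings].sort();

// Numeric ordering after conversion.
const numericOrder = asStrings.map(Number).sort((a, b) => a - b);

console.log(stringOrder);
console.log(numericOrder);
console.log("200" > "1000"); // compared as strings, not as numbers
```

An index built over the string form follows the first ordering, so a numeric range does not correspond to one contiguous slice of it.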

Related

Using $sum on an existing field returns a value of 0 [duplicate]

I have a collection students with documents in the following format:-
{
_id:"53fe74a866455060e003c2db",
name:"sam",
subject:"maths",
marks:"77"
}
{
_id:"53fe79cbef038fee879263d2",
name:"ryan",
subject:"bio",
marks:"82"
}
{
_id:"53fe74a866456060e003c2de",
name:"tony",
subject:"maths",
marks:"86"
}
I want to get the total marks of all the students with subject = "maths". So I should get 163 as the sum.
db.students.aggregate([{ $match : { subject : "maths" } },
{ "$group" : { _id : "$subject", totalMarks : { $sum : "$marks" } } }])
Now I should get the following result-
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":163}], "ok":1}
But I get-
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":0}], "ok":1}
Can someone point out what I might be doing wrong here?
Your current schema stores the marks field as a string, but $sum in the aggregation framework only adds up numeric values, so you need a numeric data type for the sum to work. Alternatively, you can use MapReduce to calculate the sum, since it allows native JavaScript functions like parseInt() on your document properties in its map function. So overall you have two choices.
Option 1: Update Schema (Change Data Type)
The first would be to change the schema, or add another field to your documents that holds the actual numerical value rather than its string representation. If your collection is relatively small, you could use a combination of MongoDB's cursor find(), forEach() and update() methods to change the type of marks:
db.students.find({ "marks": { "$type": 2 } }).forEach(function(doc) {
    // The $type guard ensures only documents whose marks is still a string are updated
    db.students.update(
        { "_id": doc._id, "marks": { "$type": 2 } },
        { "$set": { "marks": parseInt(doc.marks, 10) } }
    );
});
For relatively large collections, updating one document per round trip will be slow, so it's recommended to use MongoDB bulk updates instead:
MongoDB versions >= 2.6 and < 3.2:
var bulk = db.students.initializeUnorderedBulkOp(),
    counter = 0;
db.students.find({ "marks": { "$exists": true, "$type": 2 } }).forEach(function (doc) {
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "marks": parseInt(doc.marks, 10) }
    });
    counter++;
    if (counter % 1000 === 0) {
        // Execute per 1000 operations
        bulk.execute();
        // Re-initialize after every 1000 update statements
        bulk = db.students.initializeUnorderedBulkOp();
    }
});
// Clean up remaining operations in queue
if (counter % 1000 !== 0) bulk.execute();
MongoDB version 3.2 and newer:
var ops = [],
    cursor = db.students.find({ "marks": { "$exists": true, "$type": 2 } });
cursor.forEach(function (doc) {
    ops.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "marks": parseInt(doc.marks, 10) } }
        }
    });
    // Send the batch once it holds 1000 operations
    if (ops.length === 1000) {
        db.students.bulkWrite(ops);
        ops = [];
    }
});
// Send whatever is left in the final partial batch
if (ops.length > 0) db.students.bulkWrite(ops);
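The batching pattern used in both variants does not depend on MongoDB itself. As a rough sketch in plain JavaScript, an in-memory array stands in for the cursor and a hypothetical flush callback stands in for bulkWrite (both are assumptions of this example, as is the name convertInBatches):

```javascript
// Build updateOne operations in batches; `flush` is a stand-in for
// db.collection.bulkWrite and is an assumption of this sketch.
function convertInBatches(docs, batchSize, flush) {
  let ops = [];
  for (const doc of docs) {
    // Only string marks need converting.
    if (typeof doc.marks !== "string") continue;
    ops.push({
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: { marks: parseInt(doc.marks, 10) } },
      },
    });
    if (ops.length === batchSize) {
      flush(ops);
      ops = [];
    }
  }
  // Send the final partial batch, if any.
  if (ops.length > 0) flush(ops);
}

const batches = [];
convertInBatches(
  [
    { _id: 1, marks: "77" },
    { _id: 2, marks: "82" },
    { _id: 3, marks: 86 },
    { _id: 4, marks: "90" },
  ],
  2,
  (ops) => batches.push(ops)
);
console.log(batches.map((b) => b.length));
```

With a batch size of 2, the three string-valued documents are sent as one full batch and one partial batch; the already numeric document is skipped.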
Option 2: Run MapReduce
The second approach would be to rewrite your query with MapReduce where you can use the JavaScript function parseInt().
In your MapReduce operation, define the map function that processes each input document. This function maps the converted marks string value to the subject for each document, and emits the subject and converted marks pair. This is where the native JavaScript function parseInt() can be applied. Note: inside the function, this refers to the document that the map-reduce operation is processing:
var mapper = function () {
var x = parseInt(this.marks);
emit(this.subject, x);
};
Next, define the corresponding reduce function with two arguments keySubject and valuesMarks. valuesMarks is an array whose elements are the integer marks values emitted by the map function and grouped by keySubject.
The function reduces the valuesMarks array to the sum of its elements.
var reducer = function(keySubject, valuesMarks) {
return Array.sum(valuesMarks);
};
db.students.mapReduce(
    mapper,
    reducer,
    {
        out : "example_results",
        query: { subject : "maths" }
    }
);
With your collection, the above will put your MapReduce aggregation result in a new collection db.example_results. Thus, db.example_results.find() will output:
/* 0 */
{
"_id" : "maths",
"value" : 163
}
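To see what the two phases compute without running a server, the map and reduce steps can be imitated in plain JavaScript. This is only a sketch over the three sample documents from the question, not the mapReduce API itself:

```javascript
const students = [
  { name: "sam", subject: "maths", marks: "77" },
  { name: "ryan", subject: "bio", marks: "82" },
  { name: "tony", subject: "maths", marks: "86" },
];

// "map" phase: emit (subject, numeric marks) pairs for the queried subject.
const emitted = students
  .filter((doc) => doc.subject === "maths")
  .map((doc) => [doc.subject, parseInt(doc.marks, 10)]);

// "reduce" phase: sum the emitted values per key.
const totals = {};
for (const [subject, marks] of emitted) {
  totals[subject] = (totals[subject] || 0) + marks;
}

console.log(totals);
```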
Possible reasons why your sum is returned as 0:
The field you are summing is not a number but a string.
Make sure the field contains numeric values.
You are using the wrong syntax for $sum.
db.c1.aggregate([{
$group: {
_id: "$item",
price: {
$sum: "$price"
},
count: {
$sum: 1
}
}
}])
Make sure you use "$price" and not "price".
One of the silliest mistakes that causes this error is:
A space or tab inside the quotes when specifying the field name.
Example: "$price " won't work, but "$price" will.
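The first cause listed above can be pictured with a small plain-JavaScript stand-in for $sum, which skips non-numeric values. The name sumLikeMongo is made up, and this is only an approximation of the operator's behaviour for illustration:

```javascript
// Hypothetical stand-in: add up numbers and ignore everything else.
function sumLikeMongo(values) {
  return values
    .filter((v) => typeof v === "number")
    .reduce((total, v) => total + v, 0);
}

const marksAsStrings = ["77", "86"];
const marksAsNumbers = [77, 86];

console.log(sumLikeMongo(marksAsStrings));
console.log(sumLikeMongo(marksAsNumbers));
```

When every value is a string, nothing is added and the total stays at 0, which matches the symptom in the question.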

Select data where the range between two different fields contains a given number

I want to run a find query for documents in which an input value falls between (or equals) two fields, LOC_CEP_INI and LOC_CEP_FIM.
Example: the user enters a number, say 69923994, and I use this input to search my database for all documents in which this value lies within the range defined by LOC_CEP_INI and LOC_CEP_FIM.
One of my documents (this one should be selected by the query because the input is inside its range):
{
"_id" : ObjectId("570d57de457405a61b183ac6"),
"LOC_CEP_FIM" : 69923999, //this field is number
"LOC_CEP_INI" : 69900001, // this field is number
"LOC_NO" : "RIO BRANCO",
"LOC_NU" : "00000016",
"MUN_NU" : "1200401",
"UFE_SG" : "AC",
"create_date" : ISODate("2016-04-12T20:17:34.397Z"),
"__v" : 0
}
db.collection.find( { field: { $gt: value1, $lt: value2 } } );
https://docs.mongodb.com/v3.2/reference/method/db.collection.find/
See the reference above: MongoDB supports range queries with $gt and $lt.
Your field names and the query value are the wrong way round: put each field on the left and compare it against the input value.
db.zipcodes.find({
    LOC_CEP_INI: {$lte: 69923997},
    LOC_CEP_FIM: {$gte: 69923997}
});
For your query example to work, your documents would need an array property whose items each hold a property named 69923997. Mongo would then check that this 69923997 property has a value between "LOC_CEP_INI" and "LOC_CEP_FIM" for each item in the array.
This assumes you want LOC_CEP_INI <= 69923997 <= LOC_CEP_FIM; if you want the contrary, switch the $lte and $gte conditions.
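The containment condition can be sanity-checked in plain JavaScript before running it against the database. In this sketch, rangeFilter and rangeContains are hypothetical helper names; the first builds the filter document, the second applies the same logic to an in-memory copy of the sample document from the question:

```javascript
// Hypothetical helper: the filter document for "value lies inside [INI, FIM]".
function rangeFilter(value) {
  return { LOC_CEP_INI: { $lte: value }, LOC_CEP_FIM: { $gte: value } };
}

// Hypothetical helper: the same condition evaluated on a plain object.
function rangeContains(doc, value) {
  return doc.LOC_CEP_INI <= value && value <= doc.LOC_CEP_FIM;
}

const sampleDoc = {
  LOC_CEP_INI: 69900001,
  LOC_CEP_FIM: 69923999,
  LOC_NO: "RIO BRANCO",
};

console.log(JSON.stringify(rangeFilter(69923994)));
console.log(rangeContains(sampleDoc, 69923994)); // value inside the range
console.log(rangeContains(sampleDoc, 69924000)); // value just past LOC_CEP_FIM
```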
db.zipcodes.find( {
"LOC_CEP_INI": { "$lte": 69900002 },
"LOC_CEP_FIM": { "$gte": 69900002 } })
Here is the logic; adapt it as needed:
Userdb.aggregate([
{ "$match": { _id: ObjectId(session._id)}},
{ $project: {
checkout_list: {
$filter: {
input: "$checkout_list",
as: "checkout_list",
cond: {
$and: [
{ $gte: [ "$$checkout_list.createdAt", new Date(date1) ] },
{ $lt: [ "$$checkout_list.createdAt", new Date(date2) ] }
]
}
}
}
}
}
])
I use $filter here because, for some reason, a direct query on the nested data did not succeed for me in MongoDB.

MongoDB :: Order Search result depend on search condition

I have the following data:
[{ "name":"BS",
"keyword":"key1",
"city":"xyz"
},
{ "name":"AGS",
"keyword":"Key2",
"city":"xyz1"
},
{ "name":"QQQ",
"keyword":"key3",
"city":"xyz"
},
{ "name":"BS",
"keyword":"Keyword",
"city":"city"
}]
and I need to find the records that have name = "BS" OR keyword = "Key2", using the query
db.collection.find({"$or" : [{"name":"BS"}, {"keyword":"Key2"}]});
I need these records in this sequence:
[{ "name":"BS",
"keyword":"key1",
"city":"xyz"
},
{ "name":"BS",
"keyword":"Keyword",
"city":"city"
},
{ "name":"AGS",
"keyword":"Key2",
"city":"xyz1"
}]
but I am getting them in the following sequence:
[{ "name":"BS",
"keyword":"key1",
"city":"xyz"
},
{ "name":"AGS",
"keyword":"Key2",
"city":"xyz1"
},
{ "name":"BS",
"keyword":"Keyword",
"city":"city"
}]
Please suggest something; I have been stuck on this problem for 2 days.
Thanks
The order of results returned by MongoDB is not guaranteed unless you explicitly sort your data using the sort function. For smaller datasets you may be "lucky" in the sense that the results are always returned in the same order; for bigger datasets, and in particular on sharded MongoDB clusters, this is very unlikely. As proposed by Yathish, you need to explicitly order your results using the sort function. Based on the suggested output, it seems you want to sort by name in descending order, so I have set the sorting flag to -1 for the field name.
db.collection.find({"$or" : [{"name":"BS"}, {"keyword":"Key2"}]}).sort({"name" : -1});
If you need a more complex ordering, as specified in your comment, you can convert your results to a JavaScript array and apply a custom sort function. This sort function lists documents with a name equal to "BS" first, followed by documents containing the keyword "Key2":
db.data.find({
    "$or": [{
        "name": "BS"
    }, {
        "keyword": "Key2"
    }]
}).toArray().sort(function(doc1, doc2) {
    if (doc1.name == "BS" && doc2.keyword == "Key2") {
        return -1;
    } else if (doc2.name == "BS" && doc1.keyword == "Key2") {
        return 1;
    } else {
        // A comparator must return a number, not a boolean (descending by name)
        return doc1.name < doc2.name ? 1 : (doc1.name > doc2.name ? -1 : 0);
    }
});
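An alternative way to express "name matches first, keyword matches second" is to give each document a rank and sort by it. The following plain-JavaScript sketch runs on the sample data from the question; the rank function is a hypothetical helper, and the approach relies on Array.prototype.sort being stable, which is the case in ES2019 and later engines:

```javascript
const docs = [
  { name: "BS", keyword: "key1", city: "xyz" },
  { name: "AGS", keyword: "Key2", city: "xyz1" },
  { name: "QQQ", keyword: "key3", city: "xyz" },
  { name: "BS", keyword: "Keyword", city: "city" },
];

// Same condition as the $or query.
const matched = docs.filter((d) => d.name === "BS" || d.keyword === "Key2");

// Hypothetical helper: documents matched by name rank ahead of the rest.
function rank(doc) {
  return doc.name === "BS" ? 0 : 1;
}

// Copy before sorting so the matched array keeps its original order.
const ordered = matched.slice().sort((a, b) => rank(a) - rank(b));
console.log(ordered.map((d) => d.name + "/" + d.keyword));
```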

How can I create an index on an array field in MongoDB?

I have a MongoDB collection with data in the format of:
[
{
"data1":1,
"data2":2,
"data3":3,
"data4":4,
"horses":[
{
"opponent":{
"jockey":"MyFirstName MyLastName",
"name":"MyHorseName",
"age":4,
"sex":"g",
"scratched":"false",
"id":"1"
},
"id":"1"
},
{
"opponent":{
"jockey":"YourFirstName YourLastName",
"name":"YourHorseName",
"age":4,
"sex":"m",
"scratched":"false",
"id":"2"
},
"id":"2"
}
]
},
...
]
Executing the following query returns exactly what I need:
db.race_results.find({ "$and": [ { "horses":
{ "$elemMatch": { "$and": [
{ "opponent.name": "MyHorseName" },
{ "opponent.jockey": "MyFirstName MyLastName"}
] } }
}
]})
However, this query takes 0.5 seconds to execute with my collection (there are a lot of records).
I am trying to find out how to create an index on the horses.opponent.name field of the data. I have read the docs about multikey indexes, but I'm not sure whether this is exactly what I need. What I need (I think) is an index on the elements of the horses array, but only on the name and jockey fields. Is this possible?
Is there a way to create an index to make my specific query (the one above) any faster?
Any pointers would be greatly appreciated. I am fairly new to MongoDB, but learning fast!
The index to create is:
db.race_results.createIndex({"horses.opponent.name":1, "horses.opponent.jockey":1})
After creating this index, explain() for your query should report a number of scanned objects equal to the number of matched objects:
db.race_results.find( { horses: { $elemMatch: { "opponent.name": "MyHorseName", "opponent.jockey": "MyFirstName MyLastName" } } }
).explain()
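The query uses $elemMatch because a single array element has to satisfy both conditions; two separate dotted-path conditions could be satisfied by different elements of horses. A plain-JavaScript sketch of that difference, using made-up helper names (elemMatch, dottedMatch) and a trimmed copy of the sample document:

```javascript
const race = {
  horses: [
    { opponent: { name: "MyHorseName", jockey: "MyFirstName MyLastName" } },
    { opponent: { name: "YourHorseName", jockey: "YourFirstName YourLastName" } },
  ],
};

// Like $elemMatch: one element must match both name and jockey.
function elemMatch(doc, name, jockey) {
  return doc.horses.some(
    (h) => h.opponent.name === name && h.opponent.jockey === jockey
  );
}

// Like two independent dotted-path conditions: any element may match each one.
function dottedMatch(doc, name, jockey) {
  return (
    doc.horses.some((h) => h.opponent.name === name) &&
    doc.horses.some((h) => h.opponent.jockey === jockey)
  );
}

// A horse paired with the other horse's jockey.
console.log(elemMatch(race, "MyHorseName", "YourFirstName YourLastName"));
console.log(dottedMatch(race, "MyHorseName", "YourFirstName YourLastName"));
```

The mismatched pair is rejected by the element-wise check but accepted by the independent conditions, which is why the $elemMatch form is the one to keep.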