Updating MongoDB document for geospatial searching - mongodb

Currently, I have my lat/long in separate fields in my MongoDB database, but if I want to do geospatial searching I need to have them in this format:
{ location : [ 50 , 30 ] }
By what means can I transpose the values of my lat/long keys into a new key per document as per above?
TIA!

You will have to iterate through all your documents that don't have a location field and add it (presumably deleting the lat/long fields unless this will break your application).
db.mycollection.find( { location : { $exists : false } } ).forEach(
    function (doc) {
        // Store the pair as (lon, lat); the order is important.
        // Use [ doc.lon, doc.lat ] instead if you want the array form.
        doc.location = { lon: doc.lon, lat: doc.lat };
        // Remove old properties
        delete doc.lon;
        delete doc.lat;
        // Save the updated document
        db.mycollection.save(doc);
    }
)
Note that the order for MongoDB geospatial indexing should be consistent in your document as (longitude, latitude).
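To check the transformation itself before running it against a collection, here is a minimal sketch in plain JavaScript. The helper name toLocation and the sample document are hypothetical; the sketch assumes the existing fields are named lon and lat, as in the snippet above, and produces the array form shown in the question.

```javascript
// Hypothetical helper: returns a copy of the document with a [lon, lat]
// location array and without the separate lon/lat fields.
function toLocation(doc) {
  const { lon, lat, ...rest } = doc;
  if (typeof lon !== "number" || typeof lat !== "number") {
    throw new TypeError("expected numeric lon and lat fields");
  }
  // Longitude first, then latitude.
  return { ...rest, location: [lon, lat] };
}

// Made-up sample document, for illustration only.
const sample = { _id: 1, name: "somewhere", lon: 50, lat: 30 };
const converted = toLocation(sample);
console.log(JSON.stringify(converted));
```

Inside the forEach callback you could assign the result of such a helper back to the document before saving it; throwing on missing coordinates makes incomplete documents visible instead of silently writing undefined values.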

Related

How to get the particular object fields using ReactiveMongo without a case class [duplicate]

In my MongoDB, I have a student collection with 10 records having fields name and roll. One record of this collection is:
{
"_id" : ObjectId("53d9feff55d6b4dd1171dd9e"),
"name" : "Swati",
"roll" : "80",
}
I want to retrieve the field roll only for all 10 records in the collection as we would do in traditional database by using:
SELECT roll FROM student
I went through many blogs but all are resulting in a query which must have WHERE clause in it, for example:
db.students.find({ "roll": { $gt: 70 } })
The query is equivalent to:
SELECT * FROM student WHERE roll > 70
My requirement is to find a single key only, without any condition. So, what is the query operation for that?
From the MongoDB docs:
A projection can explicitly include several fields. In the following operation, find() method returns all documents that match the query. In the result set, only the item and qty fields and, by default, the _id field return in the matching documents.
db.inventory.find( { type: 'food' }, { item: 1, qty: 1 } )
In this example from the folks at Mongo, the returned documents will contain only the fields of item, qty, and _id.
Thus, you should be able to issue a statement such as:
db.students.find({}, {roll:1, _id:0})
The above statement will select all documents in the students collection, and the returned documents will contain only the roll field (and exclude the _id).
If we don't mention _id:0 the fields returned will be roll and _id. The '_id' field is always displayed by default. So we need to explicitly mention _id:0 along with roll.
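The include/exclude rules described above can be sketched in plain JavaScript. This is a simplified, hypothetical emulation (the function applyProjection is not part of MongoDB); it only handles top-level fields and ignores nested paths and projection operators.

```javascript
// Simplified emulation of top-level find() projections (illustration only).
function applyProjection(doc, projection) {
  // _id is returned by default unless it is explicitly set to 0/false.
  const { _id: idFlag = 1, ...fields } = projection;
  // If any other field is set to 1/true, this is an inclusion projection.
  const including = Object.keys(fields).some((name) => fields[name]);
  const result = {};
  for (const key of Object.keys(doc)) {
    if (key === "_id") {
      if (idFlag) result._id = doc._id;
    } else if (including ? fields[key] : !(key in fields)) {
      result[key] = doc[key];
    }
  }
  return result;
}

// Sample record from the question, with the ObjectId written as a string.
const student = { _id: "53d9feff55d6b4dd1171dd9e", name: "Swati", roll: "80" };

console.log(applyProjection(student, { roll: 1 }));         // _id and roll
console.log(applyProjection(student, { roll: 1, _id: 0 })); // roll only
console.log(applyProjection(student, { _id: 0 }));          // everything except _id
```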
get all data from table
db.student.find({})
SELECT * FROM student
get all data from table without _id
db.student.find({}, {_id:0})
SELECT name, roll FROM student
get all data from one field with _id
db.student.find({}, {roll:1})
SELECT id, roll FROM student
get all data from one field without _id
db.student.find({}, {roll:1, _id:0})
SELECT roll FROM student
find specific data using a where clause
db.student.find({roll: 80})
SELECT * FROM student WHERE roll = '80'
find data using a where clause with a greater than condition
db.student.find({ "roll": { $gt: 70 }}) // $gt is greater than
SELECT * FROM student WHERE roll > '70'
find data using a where clause with a greater than or equal to condition
db.student.find({ "roll": { $gte: 70 }}) // $gte is greater than or equal
SELECT * FROM student WHERE roll >= '70'
find data using a where clause with a less than or equal to condition
db.student.find({ "roll": { $lte: 70 }}) // $lte is less than or equal
SELECT * FROM student WHERE roll <= '70'
find data using a where clause with a less than condition
db.student.find({ "roll": { $lt: 70 }}) // $lt is less than
SELECT * FROM student WHERE roll < '70'
I think mattingly890 has the correct answer. Here is another example along with the pattern/command:
db.collection.find( {}, {your_key:1, _id:0})
> db.mycollection.find().pretty();
{
"_id": ObjectId("54ffca63cea5644e7cda8e1a"),
"host": "google",
"ip": "1.1.192.1"
}
db.mycollection.find({},{ "_id": 0, "host": 1 }).pretty();
Here you go, 3 ways of doing it, from shortest to most verbose (this is Mongoose-style syntax):
db.student.find({}, 'roll _id'); // <--- Just multiple field names, space separated
// OR
db.student.find({}).select('roll _id'); // <--- Just multiple field names, space separated
// OR
db.student.find({}, {'roll' : 1 , '_id' : 1 }); // <---- Old, lengthy way
To remove a specific field, use the - operator:
db.student.find({}).select('roll -_id') // <--- Will remove _id from the result
While gowtham's answer is complete, it is worth noting that those commands may differ from one API to another (for those not using mongo's shell).
Please refer to the documentation link for detailed info.
Node.js, for instance, has a method called project that you append to your find call in order to project.
Following the same example set, commands like the following can be used with Node:
db.student.find({}).project({roll:1})
SELECT _id, roll FROM student
Or
db.student.find({}).project({roll:1, _id: 0})
SELECT roll FROM student
and so on.
Again for Node.js users: do not forget to call toArray (which you should already be familiar with if you have used this API before) so that you can chain your .then callback.
Try the following query:
db.student.find({}, {roll: 1, _id: 0});
And if you are using console you can add pretty() for making it easy to read.
db.student.find({}, {roll: 1, _id: 0}).pretty();
Hope this helps!!
Just for educational purposes you could also do it with any of the following ways:
1.
var query = {"roll": {$gt: 70}};
var cursor = db.student.find(query);
cursor.project({"roll":1, "_id":0});
2.
var query = {"roll": {$gt: 70}};
var projection = {"roll":1, "_id":0};
var cursor = db.student.find(query,projection);
db.<collection>.find({}, {field1: <value>, field2: <value> ...})
In your example, you can do something like:
db.students.find({}, {"roll":true, "_id":false})
Projection
The projection parameter determines which fields are returned in the
matching documents. The projection parameter takes a document of the
following form:
{ field1: <value>, field2: <value> ... }
The <value> can be any of the following:
1 or true to include the field in the return documents.
0 or false to exclude the field.
NOTE
For the _id field, you do not have to explicitly specify _id: 1 to
return the _id field. The find() method always returns the _id field
unless you specify _id: 0 to suppress the field.
For better understanding, I have written the equivalent MySQL queries.
Selecting specific fields
MongoDB : db.collection_name.find({},{name:true,email:true,phone:true});
MySQL : SELECT name,email,phone FROM table_name;
Selecting specific fields with a where clause
MongoDB : db.collection_name.find({email:'you@email.com'},{name:true,email:true,phone:true});
MySQL : SELECT name,email,phone FROM table_name WHERE email = 'you@email.com';
This works for me,
db.student.find({},{"roll":1})
There is no condition in the where clause, i.e., inside the first pair of curly braces.
Inside the next pair of curly braces is the list of projection field names needed in the result; 1 indicates that the field is part of the query result.
Getting the name of the student (PyMongo syntax):
student_details = db.students.find({"roll": {"$gt": 70}}, {"name": 1, "_id": False})
Getting the name and roll of the student:
student_details = db.students.find({"roll": {"$gt": 70}}, {"name": 1, "roll": 1, "_id": False})
I just want to add to the answers that if you want to display a field that is nested in another object, you can use the following syntax:
db.collection.find({}, {'object.key': true})
Here, key is present inside the object named object:
{ "_id" : ObjectId("5d2ef0702385"), "object" : { "key" : "value" } }
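As a rough illustration of what a dotted path selects, the following plain JavaScript sketch resolves a path such as 'object.key' against a document and keeps the nested shape. The helper projectPath is hypothetical and deliberately simplified: it ignores arrays, and for a missing path it simply returns the _id alone, which may differ from what the server returns.

```javascript
// Hypothetical helper: keeps _id plus the single nested field named by a
// dotted path, preserving the nesting of the original document.
function projectPath(doc, path) {
  const parts = path.split(".");
  const result = "_id" in doc ? { _id: doc._id } : {};
  let source = doc;
  let target = result;
  for (let i = 0; i < parts.length; i++) {
    const part = parts[i];
    if (source == null || typeof source !== "object" || !(part in source)) {
      // Path not present: fall back to _id only (simplification).
      return "_id" in doc ? { _id: doc._id } : {};
    }
    if (i === parts.length - 1) {
      target[part] = source[part];
    } else {
      target[part] = {};
      target = target[part];
      source = source[part];
    }
  }
  return result;
}

// Sample document based on the one above, with an extra made-up field.
const nested = { _id: "5d2ef0702385", object: { key: "value", other: 1 } };
console.log(JSON.stringify(projectPath(nested, "object.key")));
```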
var collection = db.collection('appuser');
collection.aggregate(
    [ { $project : { firstName : 1, lastName : 1 } } ], function(err, res) {
        res.toArray(function(err, realRes) {
            console.log("response roo==>", realRes);
        });
    });
This works for me.
Use a query like this in the shell:
1. Switch to the database:
use database_name
2. Run a query that returns only a particular field of the matching documents; _id: 0 specifies not to display the _id in the result:
db.collection_name.find( { "Search_Field": "value" },
{ "Field_to_display": 1, _id: 0 } )
If you want to retrieve only the field "roll" for all 10 records in the collection, try this.
In MongoDB:
db.students.find({}, { "roll": 1, "_id": 0 })
In SQL:
select roll from students
In the MongoDB query below, fees is the collection and description is a field.
db.getCollection('fees').find({},{description:1,_id:0})
Apart from what people have already mentioned, I am introducing indexes to the mix.
Imagine a large collection with, say, over 1 million documents, and you have to run a query like this.
To serve this query, the WiredTiger internal cache has to hold all of that data. If it is not already there, the data is first loaded into the WT internal cache from either the filesystem cache or disk before it is returned (in batches, when the query comes from a driver, since 1 million documents are not returned in one go and a cursor comes into play).
A covered query can be an alternative. Copying the text from the docs directly:
When an index covers a query, MongoDB can both match the query conditions and return the results using only the index keys; i.e. MongoDB does not need to examine documents from the collection to return the results.
When an index covers a query, the explain result has an IXSCAN stage that is not a descendant of a FETCH stage, and in the executionStats, the totalDocsExamined is 0.
Query : db.getCollection('qaa').find({roll_no : {$gte : 0}},{_id : 0, roll_no : 1})
Index : db.getCollection('qaa').createIndex({roll_no : 1})
If the index is in the WT internal cache, getting the values is a straightforward process. An index has an impact on the write performance of the system, so this makes more sense when reads are plentiful compared to writes.
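To make the explain check from the docs quote concrete, here is a small sketch that inspects an explain result that has already been fetched. The helper names and the two sample objects are hypothetical, hand-written and heavily trimmed; real explain output contains many more fields. The sketch also simplifies the rule to "no FETCH stage anywhere in the winning plan".

```javascript
// Collects the stage names of a plan tree (stage / inputStage / inputStages).
function collectStages(plan, stages = []) {
  if (!plan) return stages;
  stages.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, stages);
  for (const child of plan.inputStages || []) collectStages(child, stages);
  return stages;
}

// Heuristic: an index scan, no FETCH stage, and no documents examined.
function looksCovered(explain) {
  const stages = collectStages(explain.queryPlanner.winningPlan);
  return (
    stages.includes("IXSCAN") &&
    !stages.includes("FETCH") &&
    explain.executionStats.totalDocsExamined === 0
  );
}

// Hand-written, trimmed examples of the shape of an explain result.
const coveredExplain = {
  queryPlanner: {
    winningPlan: { stage: "PROJECTION_COVERED", inputStage: { stage: "IXSCAN" } },
  },
  executionStats: { totalDocsExamined: 0 },
};
const fetchedExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "PROJECTION_SIMPLE",
      inputStage: { stage: "FETCH", inputStage: { stage: "IXSCAN" } },
    },
  },
  executionStats: { totalDocsExamined: 1000 },
};

console.log(looksCovered(coveredExplain), looksCovered(fetchedExplain));
```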
If you are using the MongoDB driver in NodeJs then the above-mentioned answers might not work for you. You will have to do something like this to get only selected properties as a response.
import { MongoClient } from "mongodb";
// Replace the uri string with your MongoDB deployment's connection string.
const uri = "<connection string uri>";
const client = new MongoClient(uri);
async function run() {
  try {
    await client.connect();
    const database = client.db("sample_mflix");
    const movies = database.collection("movies");
    // Query for a movie that has the title 'The Room'
    const query = { title: "The Room" };
    const options = {
      // sort matched documents in descending order by rating
      sort: { "imdb.rating": -1 },
      // Include only the `title` and `imdb` fields in the returned document
      projection: { _id: 0, title: 1, imdb: 1 },
    };
    const movie = await movies.findOne(query, options);
    /** since this method returns the matched document, not a cursor,
     * print it directly
     */
    console.log(movie);
  } finally {
    await client.close();
  }
}
run().catch(console.dir);
This code is copied from the official MongoDB docs, which you can check here:
https://docs.mongodb.com/drivers/node/current/usage-examples/findOne/
db.student.find({}, {"roll":1, "_id":0})
This is equivalent to -
Select roll from student
db.student.find({}, {"roll":1, "name":1, "_id":0})
This is equivalent to -
Select roll, name from student
In MongoDB 3.4 we can use the logic below; I am not sure about previous versions:
select roll from student ==> db.student.find({}, {roll:1})
This is convenient when you only need to select a few columns.
Using Studio 3T for MongoDB, if I use .find({}, { _id: 0, roll: true }) it still returns an array of objects with an empty _id property.
Using JavaScript map helped me to retrieve only the desired roll property as an array of strings:
var rolls = db.student
    .find({ roll: { $gt: 70 } }) // query where roll > 70
    .map(x => x.roll); // return an array of roll values
Not sure this answers the question but I believe it's worth mentioning here.
There is one more way to select a single field (but not multiple fields): using db.collection_name.distinct(). Note that it returns only the unique values.
e.g., db.student.distinct('roll', {});
Or, a second way: using db.collection_name.find().forEach(); (multiple fields can be selected here by concatenation)
e.g., db.collection_name.find().forEach(function(c1){print(c1.roll);});
_id = "123321";
_user = await likes.find({liker_id: _id}, {liked_id: "$liked_id"});
Let's suppose you have liker_id and liked_id fields in the document; by putting "$liked_id" it will return _id and liked_id only.
For Single Update :
db.collection_name.update({ field_name_1: ("value")}, { $set: { field_name_2 : "new_value" }});
For MultiUpdate :
db.collection_name.updateMany({ field_name_1: ("value")}, { $set: {field_name_2 : "new_value" }});
Make sure the appropriate indexes are in place.

How to store an ordered set of documents in MongoDB without using a capped collection

What's a good way to store a set of documents in MongoDB where order is important? I need to easily insert documents at an arbitrary position and possibly reorder them later.
I could assign each item an increasing number and sort by that, or I could sort by _id, but I don't know how I could then insert another document in between other documents. Say I want to insert something between an element with a sequence of 5 and an element with a sequence of 6?
My first guess would be to increment the sequence of all of the following elements so that there would be space for the new element using a query something like db.items.update({"sequence":{$gte:6}}, {$inc:{"sequence":1}}). My limited understanding of Database Administration tells me that a query like that would be slow and generally a bad idea, but I'm happy to be corrected.
I guess I could set the new element's sequence to 5.5, but I think that would get messy rather quickly. (Again, correct me if I'm wrong.)
I could use a capped collection, which has a guaranteed order, but then I'd run into issues if I needed to grow the collection. (Yet again, I might be wrong about that one too.)
I could have each document contain a reference to the next document, but that would require a query for each item in the list. (You'd get an item, push it onto the results array, and get another item based on the next field of the current item.) Aside from the obvious performance issues, I would also not be able to pass a sorted mongo cursor to my {#each} spacebars block expression and let it live update as the database changed. (I'm using the Meteor full-stack javascript framework.)
I know that everything has its advantages and disadvantages, and I might just have to use one of the options listed above, but I'd like to know if there is a better way to do things.
Based on your requirements, one approach could be to design your schema so that each document can hold more than one document and itself acts as a capped container.
{
"_id":Number,
"doc":Array
}
Each document in the collection will act as a capped container, and the documents will be stored as an array in the doc field. The doc field, being an array, will maintain the order of insertion.
You can limit the number of documents per container to n. The _id field of each container document then increases in steps of n, where n is the number of documents a container document can hold.
By doing this you avoid adding extra fields to the document, extra indices, and unnecessary sorts.
Inserting the very first record
i.e when the collection is empty.
var record = {"name" : "first"};
db.col.insert({"_id":0,"doc":[record]});
Inserting subsequent records
Identify the last container document's _id and the number of documents it holds.
If the number of documents it holds is less than n, then update the container document with the new document; otherwise create a new container document.
Say that each container document can hold 5 documents at most, and we want to insert a new document.
var record = {"name" : "newlyAdded"};
// using aggregation, get the _id of the last inserted container, and the
// number of records it currently holds.
db.col.aggregate( [ {
    // sort by _id so that $last refers to the newest container
    $sort : { "_id" : 1 }
}, {
    $group : {
        "_id" : null,
        "max" : { $max : "$_id" },
        "lastDocSize" : { $last : "$doc" }
    }
}, {
    $project : {
        "currentMaxId" : "$max",
        "capSize" : { $size : "$lastDocSize" },
        "_id" : 0
    }
} ]).forEach( function(check) {
    // once obtained, check if you need to update the last container or
    // create a new container and insert the document in it.
    if (check.capSize < 5) {
        print("updating");
        // UPDATE
        db.col.update(
            { "_id" : check.currentMaxId },
            { $push : { "doc" : record } }
        );
    } else {
        print("inserting");
        // insert
        db.col.insert( {
            "_id" : check.currentMaxId + 5,
            "doc" : [ record ]
        });
    }
})
Note that the aggregation runs on the server side and is very efficient. Also note that in versions prior to 2.6 the aggregation returns a document rather than a cursor, so you would need to modify the above code to select from a single document rather than iterating a cursor.
Inserting a new document in between documents
Now, if you would like to insert a new document between documents 1 and 2, we know that the document should fall inside the container with _id=0 and should be placed in the second position in the doc array of that container.
So we make use of the $each and $position operators to insert at a specific position.
var record = {"name" : "insertInMiddle"};
db.col.update(
{
"_id" : 0
}, {
$push : {
"doc" : {
$each : [record],
$position : 1
}
}
}
);
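Finding the right container and array position for an arbitrary insert is simple arithmetic. The sketch below is only an illustration: the helper locate is hypothetical, positions are zero-based, and it assumes every container before the target one is completely full.

```javascript
// Maps a zero-based position in the overall ordering to the container _id
// and the index inside that container's doc array.
function locate(position, capacity) {
  const containerId = Math.floor(position / capacity) * capacity;
  return { containerId, index: position % capacity };
}

// With 5 documents per container, as in the example above:
console.log(locate(1, 5)); // container 0, index 1
console.log(locate(7, 5)); // container 5, index 2
```

The containerId would go into the update filter and the index into $position.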
Handling Overflow
Now we need to take care of documents overflowing in each container. Say we insert a new document in between, in the container with _id=0. If the container already has 5 documents, we need to move the last document to the next container, and repeat this until all the containers hold documents within their capacity. If required, we finally need to create a container to hold the overflowing document.
This complex operation should be done on the server side. To handle this, we can create a script such as the one below and register it with MongoDB.
db.system.js.save( {
    "_id" : "handleOverFlow",
    "value" : function handleOverFlow(id) {
        var currDocArr = db.col.find( { "_id" : id } )[0].doc;
        print(currDocArr);
        var count = currDocArr.length;
        var nextColId = id + 5;
        // check if the container size has been exceeded
        if (count <= 5)
            return;
        else {
            // need to take the last doc and push it to the next capped
            // container's array
            print("updating collection: " + id);
            var record = currDocArr.splice(currDocArr.length - 1, 1);
            // update the next container, creating it if it does not exist yet
            db.col.update(
                { "_id" : nextColId },
                { $push : { "doc" : { $each : record, $position : 0 } } },
                { upsert : true }
            );
            // remove from original container
            db.col.update( { "_id" : id }, { "doc" : currDocArr } );
            // check overflow for the subsequent containers, recursively.
            handleOverFlow(nextColId);
        }
    }
});
After every insertion in between, we can then invoke this function by passing the container id: handleOverFlow(containerId).
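The cascade can be tried out without a database. The following in-memory sketch uses a Map from container _id to its array in place of the collection; the function handleOverflow here is a hypothetical stand-in for the stored function, and a capacity of 3 is used only to keep the example short.

```javascript
// In-memory sketch of the overflow cascade (no database involved).
function handleOverflow(containers, id, capacity) {
  const docs = containers.get(id);
  if (!docs || docs.length <= capacity) return;
  // The last document leaves this container...
  const moved = docs.pop();
  const nextId = id + capacity;
  // ...and becomes the first one in the next container, which is
  // created on demand if it does not exist yet.
  if (!containers.has(nextId)) containers.set(nextId, []);
  containers.get(nextId).unshift(moved);
  // Check the next container recursively.
  handleOverflow(containers, nextId, capacity);
}

const containers = new Map([
  [0, ["a", "b", "c"]],
  [3, ["d", "e", "f"]],
]);
containers.get(0).splice(1, 0, "x"); // insert "x" at position 1 of container 0
handleOverflow(containers, 0, 3);
console.log(JSON.stringify([...containers.entries()]));
```

Like the stored function, this moves one document per container, which is enough when it is called after every single insertion.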
Fetching all the records in order
Just use the $unwind operator in the aggregate pipeline.
db.col.aggregate([{$unwind:"$doc"},{$project:{"_id":0,"doc":1}}]);
Re-Ordering Documents
You can store each document in a capped container with an "_id" field:
.."doc":[{"_id":0,"name":"xyz",...}..]..
Get hold of the "doc" array of the capped container whose items you want to reorder.
var docArray = db.col.find({"_id":0})[0].doc;
Update their ids so that after sorting the order of the items will change.
Sort the array based on their _ids.
docArray.sort( function(a, b) {
    return a._id - b._id;
});
Update the capped container with the new doc array.
But then again, everything boils down to which approach is feasible and suits your requirement best.
Coming to your questions:
What's a good way to store a set of documents in MongoDB where order is important?I need to easily insert documents at an arbitrary
position and possibly reorder them later.
Documents as Arrays.
Say I want to insert something between an element with a sequence of 5 and an element with a sequence of 6?
use the $each and $position operators in the db.collection.update() function as depicted in my answer.
My limited understanding of Database Administration tells me that a
query like that would be slow and generally a bad idea, but I'm happy
to be corrected.
Yes. It would hurt performance unless the collection holds very little data.
I could use a capped collection, which has a guaranteed order, but then I'd run into issues if I needed to grow the collection. (Yet
again, I might be wrong about that one too.)
Yes. With Capped Collections, you may lose data.
An _id field in MongoDB is a unique, indexed key similar to a primary key in relational databases. If there is an inherent order in your documents, ideally you should be able to associate a unique key with each document, with the key value reflecting the order. So while preparing your document for insertion, explicitly add an _id field as this key (if you do not, MongoDB creates it automatically with a BSON ObjectId).
As far as retrieving the results is concerned, MongoDB does not guarantee the order of returned documents unless you explicitly use .sort(). If you do not use .sort(), the results are usually returned in natural order (order of insertion). Again, there is no guarantee of this behavior.
I'd advise you to override _id with your order while inserting, and use a sort while retrieving. Since _id is a necessary and auto-indexed entity, you will not be wasting any space defining a sort key, and storing the index for it.
For arbitrary sorting of any collection, you'll need a field to sort it on. I call mine "sequence".
schema:
{
_id: ObjectID,
sequence: Number,
...
}
db.items.createIndex({sequence:1});
db.items.find().sort({sequence:1})
Here is a link to some general sorting database answers that may be relevant:
https://softwareengineering.stackexchange.com/questions/195308/storing-a-re-orderable-list-in-a-database/369754
I suggest going with the floating-point solution, adding a position column:
Use a floating-point number for the position column.
You can then reorder the list by changing only the position column in the "moved" row.
If your user wants to position "red" after "blue" but before "yellow", you just need to calculate
red.position = ((yellow.position - blue.position) / 2) + blue.position
After a few re-positions in the same place (cutting the gap in half every time) you might hit a wall, so once the gap drops below a certain threshold it is better to re-sort the list and reassign evenly spaced positions.
When retrieving it you can simply say col.sort() to get it sorted, with no need for any client-side code (as in the case of a linked list solution).
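As a small worked example of the midpoint formula, here is a sketch in plain JavaScript. The function names and the MIN_GAP threshold are assumptions made for illustration; pick a threshold that suits your data.

```javascript
// Midpoint between two neighbours, exactly as in the formula above.
function positionBetween(before, after) {
  return (after - before) / 2 + before;
}

// Hypothetical threshold: below this gap, halving is no longer safe.
const MIN_GAP = 1e-9;

function needsRenumbering(before, after) {
  return Math.abs(after - before) < MIN_GAP;
}

// Reassign evenly spaced positions; items must already be sorted by position.
function renumber(items) {
  return items.map((item, i) => Object.assign({}, item, { position: i + 1 }));
}

const blue = { name: "blue", position: 1 };
const yellow = { name: "yellow", position: 2 };
const red = { name: "red", position: positionBetween(blue.position, yellow.position) };
console.log(red.position);
```

Only the moved row is written on a normal reorder; the renumbering pass touches every row, which is why it is kept for the rare case where the gap has collapsed.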

How to deal with a complicated query in MongoDB?

I use mongodb to save the temporal and spatial data, and the document item is structured as follows:
doc = { time:t,
geo:[x,y]
}
If the difference between two docs is defined as:
dist(doc1, doc2) = |t1-t2| + |x1-x2| + |y1 - y2|
How can I query the documents by mongodb and sort the results by their distance to a given document doc0 ={ time:t0, geo:[x0,y0] }?
thanks
Instead of calculating the distance manually, you could trust MongoDB with that task. MongoDB has built-in geospatial query support.
This would look like this:
db.docs.find( {
"time": t0,
"geo" : { $near : [x0,y0] }
} ).limit(20)
The result would be all documents near the given location [x0,y0], automatically ordered by distance to that point. Note that $near needs a geospatial index on the geo field.
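The query above orders by geographic distance only and matches time exactly, so it does not compute the combined distance defined in the question. If the candidate set is small enough to fetch, one option is to sort on the client. The following is a minimal sketch, assuming time is stored as a number; dist and sortByDistance are names chosen for the example.

```javascript
// dist(doc1, doc2) = |t1-t2| + |x1-x2| + |y1-y2|, as defined in the question.
function dist(a, b) {
  return Math.abs(a.time - b.time) +
         Math.abs(a.geo[0] - b.geo[0]) +
         Math.abs(a.geo[1] - b.geo[1]);
}

// Return a new array ordered by distance to doc0 (closest first).
function sortByDistance(docs, doc0) {
  return docs.slice().sort((a, b) => dist(a, doc0) - dist(b, doc0));
}

const doc0 = { time: 0, geo: [0, 0] };
const docs = [
  { time: 5, geo: [0, 0] },
  { time: 1, geo: [1, 1] },
  { time: 0, geo: [0, 1] }
];
const sorted = sortByDistance(docs, doc0);
console.log(sorted.map(d => dist(d, doc0)).join(","));
```

In practice the array passed in could be the result of a coarse server-side filter, so that only a bounded number of documents is sorted in the client.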

How to remove duplicates based on a key in Mongodb?

I have a collection in MongoDB with around 3 million records. A sample record looks like this:
{ "_id" : ObjectId("50731xxxxxxxxxxxxxxxxxxxx"),
"source_references" : [
{
"_id" : ObjectId("5045xxxxxxxxxxxxxx"),
"name" : "xxx",
"key" : 123
}
]
}
I have a lot of duplicate records in the collection with the same source_references.key. (By duplicate I mean the same source_references.key, not the same _id.)
I want to remove duplicate records based on source_references.key. I'm thinking of writing some PHP code to traverse each record and remove it if a duplicate exists.
Is there a way to remove the duplicates from the mongo command line?
This answer is obsolete: the dropDups option was removed in MongoDB 3.0, so a different approach will be required in most cases. For example, you could use aggregation as suggested on: MongoDB duplicate documents even after adding unique key.
If you are certain that the source_references.key identifies duplicate records, you can ensure a unique index with the dropDups:true index creation option in MongoDB 2.6 or older:
db.things.ensureIndex({'source_references.key' : 1}, {unique : true, dropDups : true})
This will keep the first unique document for each source_references.key value, and drop any subsequent documents that would otherwise cause a duplicate key violation.
Important Note: Any documents missing the source_references.key field will be considered as having a null value, so subsequent documents missing the key field will be deleted. You can add the sparse:true index creation option so the index only applies to documents with a source_references.key field.
Obvious caution: Take a backup of your database, and try this in a staging environment first if you are concerned about unintended data loss.
This is the easiest query I used on my MongoDB 3.2:
db.myCollection.find({}, {myCustomKey:1}).sort({_id:1}).forEach(function(doc){
db.myCollection.remove({_id:{$gt:doc._id}, myCustomKey:doc.myCustomKey});
})
Index myCustomKey before running this to increase speed.
While @Stennie's is a valid answer, it is not the only way. In fact, the MongoDB manual asks you to be very cautious while doing that. There are two other options:
Let the MongoDB do that for you using Map Reduce
Another way
Do it programmatically, which is less efficient.
Here is a slightly more 'manual' way of doing it:
Essentially, first get a list of all the unique keys you are interested in.
Then perform a search using each of those keys and delete if that search returns more than one document.
db.collection.distinct("key").forEach((num)=>{
var i = 0;
db.collection.find({key: num}).forEach((doc)=>{
if (i) db.collection.remove({key: num}, { justOne: true })
i++
})
});
I had a similar requirement but I wanted to retain the latest entry. The following query worked with my collection which had millions of records and duplicates.
/** Create an array to store the ids of all duplicate records */
var duplicates = [];
/** Start Aggregation pipeline*/
db.collection.aggregate([
{
$match: { /** Add any filter here. Add index for filter keys*/
filterKey: {
$exists: false
}
}
},
{
$sort: { /** Sort it in such a way that you want to retain first element*/
createdAt: -1
}
},
{
$group: {
_id: {
key1: "$key1", key2:"$key2" /** These are the keys which define the duplicate. Here document with same value for key1 and key2 will be considered duplicate*/
},
dups: {
$push: {
_id: "$_id"
}
},
count: {
$sum: 1
}
}
},
{
$match: {
count: {
"$gt": 1
}
}
}
],
{
allowDiskUse: true
}).forEach(function(doc){
doc.dups.shift();
doc.dups.forEach(function(dupId){
duplicates.push(dupId._id);
})
})
/** Delete the duplicates*/
var i,j,temparray,chunk = 100000;
for (i=0,j=duplicates.length; i<j; i+=chunk) {
temparray = duplicates.slice(i,i+chunk);
db.collection.bulkWrite([{deleteMany:{"filter":{"_id":{"$in":temparray}}}}])
}
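To see which documents the pipeline above ends up deleting, the same keep-the-newest logic can be run on a plain array. This is a sketch only: findDuplicateIds and chunkIds are hypothetical helpers, and createdAt is assumed to be a sortable number.

```javascript
// Return the _ids of every document except the newest one in each group.
// keyFields lists the fields that define a duplicate (key1, key2 above).
function findDuplicateIds(docs, keyFields) {
  // Newest first, so the first document seen in each group is the one kept.
  const sorted = docs.slice().sort((a, b) => b.createdAt - a.createdAt);
  const seen = new Set();
  const duplicates = [];
  for (const doc of sorted) {
    const groupKey = JSON.stringify(keyFields.map(f => doc[f]));
    if (seen.has(groupKey)) {
      duplicates.push(doc._id);
    } else {
      seen.add(groupKey);
    }
  }
  return duplicates;
}

// Split the ids into batches, as the delete loop above does.
function chunkIds(ids, size) {
  const chunks = [];
  for (let i = 0; i < ids.length; i += size) {
    chunks.push(ids.slice(i, i + size));
  }
  return chunks;
}

const sample = [
  { _id: 1, key1: "a", key2: 1, createdAt: 1 },
  { _id: 2, key1: "a", key2: 1, createdAt: 3 },
  { _id: 3, key1: "b", key2: 1, createdAt: 5 },
  { _id: 4, key1: "a", key2: 1, createdAt: 2 }
];
const dupIds = findDuplicateIds(sample, ["key1", "key2"]);
console.log(JSON.stringify(dupIds));
```

Running a check like this against a small export is a cheap way to confirm the sort direction keeps the entry you expect before deleting anything.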
Expanding on Fernando's answer, I found that it was taking too long, so I modified it.
var x = 0;
db.collection.distinct("field").forEach(fieldValue => {
var i = 0;
db.collection.find({ "field": fieldValue }).forEach(doc => {
if (i) {
db.collection.remove({ _id: doc._id });
}
i++;
x += 1;
if (x % 100 === 0) {
print(x); // Every time we process 100 docs.
}
});
});
The improvement is basically to remove by document id, which should be faster, and to print the progress of the operation. You can change the iteration value to your desired amount.
Indexing the field before the operation also helps.
pip install mongo_remove_duplicate_indexes
Create a script in any language.
Iterate over your collection.
Create a new collection and create a new index in it with unique set to true. Remember that this index has to be the same as, and have the same name as, the index you wish to remove duplicates from in your original collection.
For example, say you have a collection gaming, and in this collection you have a field genre which contains duplicates that you wish to remove. Just create a new collection:
db.createCollection("cname")
Create the new index:
db.cname.createIndex({'genre':1},{unique:true})
Now when you insert documents with the same genre, only the first will be accepted; the others will be rejected with a duplicate key error.
Now just insert the JSON values you received into the new collection and handle the error using exception handling, for example pymongo.errors.DuplicateKeyError.
Check out the source code of the mongo_remove_duplicate_indexes package for a better understanding.
If you have enough memory, you can do something like this in Scala:
cole.find().groupBy(_.customField).filter(_._2.size>1).map(_._2.tail).flatten.map(_.id)
.foreach(x=>cole.remove({id $eq x}))