When I try to create a geospatial index with db.polygons.createIndex({"geometry": "2dsphere"}), it stops at a certain polygon with error code 16755: Can't extract geo keys ... Duplicate vertices: 18 and 20.
Upon further inspection, it seems this happens when two vertices in a polygon are very close together, or even exact duplicates.
I then manually removed the offending vertex in QGIS and retried the process, only to find another polygon with the same issue.
How can I fix this without repeating the whole cycle of fixing a polygon > uploading to MongoDB > creating the index? Is there a way to find out how many polygons have this issue?
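One way to count the affected polygons up front is to scan the GeoJSON locally before importing. Below is a minimal sketch (not part of any answer above): it assumes plain 2D polygon rings and uses an illustrative tolerance `tol` to catch near-duplicate consecutive vertices as well as exact ones.

```python
def duplicate_vertex_pairs(polygon_coords, tol=1e-9):
    """Return (i, i+1) index pairs of consecutive vertices in any ring
    of a polygon that are identical (or closer than tol)."""
    dupes = []
    for ring in polygon_coords:
        for i in range(len(ring) - 1):
            (x1, y1), (x2, y2) = ring[i][:2], ring[i + 1][:2]
            if abs(x1 - x2) <= tol and abs(y1 - y2) <= tol:
                dupes.append((i, i + 1))
    return dupes

# A ring whose vertices 1 and 2 are duplicates is flagged;
# the normal GeoJSON closing vertex (first == last) is not.
bad_ring = [[0, 0], [1, 0], [1, 0], [1, 1], [0, 0]]
print(duplicate_vertex_pairs([bad_ring]))  # [(1, 2)]
```

Running this over every feature's geometry["coordinates"] tells you which documents would fail the 2dsphere index before you upload anything.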
I hit a similar problem. I just needed to find the valid records in my dataset (I discarded the records with Duplicate Vertices).
I renamed the collection -
db.myCollection.renameCollection('myCollectionBk')
Then I added a single record from the original collection into a new collection and added a geospatial index to the collection
db.myCollection.insert(db.myCollectionBk.findOne()) // recreate the collection
db.myCollection.createIndex({geometry:"2dsphere"}) // create the index (assumes the geometry on the record is valid)
db.myCollection.remove({}) // remove the record
Then I added the valid records into the new collection.
db.myCollectionBk.find().forEach(function(x){
db.myCollection.insert(x);
})
Invalid records are simply ignored.
In your case you probably want to get the WriteResult from your insert, and look to see if it was successful. Something like
var errors = [];
db.myCollectionBk.find().forEach(function(x) {
    var result = db.myCollection.insert(x);
    if (result.hasWriteError()) {
        errors.push({id: x._id, error: result.getWriteError().errmsg});
    }
});
As another alternative, check out this question (I couldn't get this to work)
What I did was create the collection with the index first, and then insert with mongoimport, which reports how many documents were inserted successfully.
> db.someNewCollection.createIndex({"geometry":"2dsphere"})
To insert GeoJSON into MongoDB I did the following:
$ jq --compact-output ".features" yourGeoJSON > output.geojson
$ mongoimport --db someDB -c someNewCollection --file "output.geojson" --jsonArray
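If you don't have jq available, the same transformation (extracting just the .features array from a FeatureCollection) can be done with a few lines of Python; the function name and file paths here are illustrative:

```python
import json

def features_to_array(geojson_path, out_path):
    """Write only the .features array of a GeoJSON FeatureCollection,
    compactly, like `jq --compact-output ".features"` does."""
    with open(geojson_path) as f:
        fc = json.load(f)
    with open(out_path, "w") as f:
        json.dump(fc["features"], f, separators=(",", ":"))
```

The resulting file is a plain JSON array, which is exactly what mongoimport's --jsonArray flag expects.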
As I was importing a new json file into my mongodb collection, I accidentally used just one '-' instead of two. E.g.:
mongoimport --host=127.0.0.1 --db=dataBaseName -collection=people --file=importFile.json
I believe that, due to the missing second '-', I'm now stuck with the following results when I type show collections:
people
ollection=people
I can't access, drop or interact with the second one. Apart from dropping the database and starting over, is there a way around this issue?
You can rename the collection like:
> use YourDatabase
// Might wanna drop people collection first
> db.getCollection("ollection=people").renameCollection("people")
Hope this helps!
I use Meteor with MongoDB 3.2.12 (WiredTiger).
I made a dump using mongodump.
When I try to restore it, I receive this on some documents:
error: write to oplog failed: BadValue: object to insert exceeds
cappedMaxSize
This collection is not 'Capped' (I tested it with db.my_collection_name.stats()["capped"])
How is it possible to import such documents?
Thanks in advance
It could be that the collection was previously created as capped and still has that property attached to it.
Check the collection properties, or (if you don't need the data) drop it and re-create it.
I got the same error when trying to insert an object with recursive properties (one of objects was self referencing).
In the mongo shell, I can find records from the collection stat using commands like:
use gm;
db.stat.find({});
but stat is not listed in the show collections results.
Any collection virtually exists all the time (i.e. you will not get an error saying you did not create it). As soon as you insert the first document, the collection also exists physically (it is created on disk). So if you really want to check whether a collection exists, use:
db.getCollectionNames()
This will show you only the collections that had at least one document inserted into them, even if they are currently empty.
Once physically created, a collection can be deleted using the drop command:
db.myColl.drop()
This will delete it physically but the "virtual" one will still be there.
As for your example, running:
db.stat.insert({}); print("Collection stat exists: " + (db.getCollectionNames().indexOf("stat") !== -1));
will tell you:
Collection stat exists: true
In my web-scraping project I need to move the previous day's scraped data from mongo_collection to mongo_his_collection
I am using this query to move data
for record in collection.find():
    his_collection.insert(record)
collection.remove()
It works fine, but it sometimes breaks when the MongoDB collection contains more than 10k documents.
Can you suggest an optimized query that uses fewer resources to do the same task?
You could use a MapReduce job for this.
MapReduce allows you to specify a out-collection to store the results in.
When you have a map function that emits each document with its own _id as key, and a reduce function that returns the first entry of the values array (which, since _id values are unique, is the only entry), the MapReduce is essentially a copy operation from the source collection to the out-collection.
Untested code:
db.runCommand(
{
mapReduce: "mongo_collection",
map: function(document) {
emit(document._id, document);
},
reduce: function(key, values) {
return values[0];
},
out: {
merge:"mongo_his_collection"
}
}
)
If both your collections are in the same database, I believe you're looking for renameCollection.
If not, you unfortunately have to do it manually, using a targeted mongodump / mongorestore command:
mongodump -d your_database -c mongo_collection
mongorestore -d your_database -c mongo_his_collection dump/your_database/mongo_collection.bson
Note that I just typed these two commands from the top of my head without actually testing them, so do make sure you check them before running them in production.
[EDIT]: sorry, I just realised that this was something you needed to do on a regular basis. In that case, mongodump / mongorestore probably isn't the best solution.
I don't see anything wrong with your solution - it would help if you edited your question to explain what you mean by "it breaks".
The query breaks because you are not limiting the find(). When you create a cursor, the server (mongod) will try to load the entire result set in memory. This will cause problems and/or fail if your collection is too large.
To avoid this use a skip/limit loop. Here is an example in Java:
MongoClient client = new MongoClient();
long count = 0;
while (true) {
    DBCursor cursor = client.getDB("your_DB_name").getCollection("mongo_collection")
            .find().sort(new BasicDBObject("$natural", 1)).skip((int) count).limit(100);
    if (!cursor.hasNext()) {
        break; // no more documents to copy
    }
    while (cursor.hasNext()) {
        client.getDB("your_DB_name").getCollection("mongo_his_collection").insert(cursor.next());
        count++;
    }
}
This will work, but you would get better performance by batching the writes as well. To do that build an array of DBObjects from the cursor and write them all at once with one insert.
Also if the Collection is being altered while you are copying there is no guarantee that you will traverse all documents as some may end up getting moved if they increase in size.
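The batching idea suggested above is independent of the driver; here is a minimal sketch in Python of chunking any cursor (or other iterable) into fixed-size batches so each batch can be written with a single bulk insert. The helper name and batch size are illustrative:

```python
from itertools import islice

def batched(cursor, size=1000):
    """Yield lists of up to `size` items from any iterable cursor,
    so each batch can be written with a single bulk insert."""
    it = iter(cursor)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# With a real MongoDB driver this would look something like
# (collection names are hypothetical):
#   for batch in batched(collection.find(), 1000):
#       his_collection.insert_many(batch)
print(list(batched(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```

Writing one batch at a time keeps memory bounded while avoiding a round trip per document.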
You can try mongodump & mongorestore.
You can use renameCollection to do it directly. Or if on different mongods, use cloneCollection.
References:
MongoDB docs renameCollection: http://docs.mongodb.org/manual/reference/command/renameCollection/#dbcmd.renameCollection
MongoDB docs cloneCollection: http://docs.mongodb.org/manual/reference/command/cloneCollection/
Relevant blog post: http://blog.shlomoid.com/2011/08/how-to-move-mongodb-collection-between.html
What I want:
I have a master collection of products, I then want to filter them and put them in a separate collection.
db.masterproducts.find({category:"scuba gear"}).copyTo(db.newcollection)
Of course, I realise the 'copyTo' does not exist.
I thought I could do it with MapReduce as results are created in a new collection using the new 'out' parameter in v1.8; however this new collection is not a subset of my original collection. Or can it be if I use MapReduce correctly?
To get around it I am currently doing this:
Step 1:
/usr/local/mongodb/bin/mongodump --db database --collection masterproducts -q '{category:"scuba gear"}'
Step 2:
/usr/local/mongodb/bin/mongorestore -d database -c newcollection --drop packages.bson
My 2 step method just seems rather inefficient!
Any help greatly appreciated.
Thanks
Bob
You can iterate through your query result and save each item like this:
db.oldCollection.find(query).forEach(function(x){db.newCollection.save(x);})
You can create small server side javascript (like this one, just add filtering you want) and execute it using eval
You can use dump/restore in the way you described above
A copy-collection command should be in MongoDB soon (features are done in vote order)! See the JIRA feature request.
You should be able to create a subset with MapReduce (using 'out'). The problem is that MapReduce has a special output format, so your documents are going to be transformed (there is a JIRA ticket to add support for another format, but I cannot find it at the moment). It is also going to be very inefficient :/
Copying a cursor to a collection makes a lot of sense, I suggest creating a ticket for this.
there is also the toArray() method, which can be used:
// create the new collection
db.createCollection("resultCollection")
// now query for type = "foo" and insert the results into the new collection
db.resultCollection.insert(db.originalCollection.find({type: 'foo'}).toArray())