I would like to ask for help. When I import JSON into MongoDB via Compass, it throws a duplicate _id error. So I switched to the terminal and used mongoimport instead. It runs successfully and reports that every document was imported without error, but some documents are missing from the collection. Can you give me some advice on how to solve this problem?
This is the command I run in the Windows cmd terminal:
mongoimport D:\DimplomaThesis_data\transfer_json\180000-190000.json -d diplomovka -c transfer --jsonArray --stopOnError --maintainInsertionOrder --upsertFields _id
This is the structure of a record in the JSON array:
{
"_id":"5d6566d086dc8b72382bc376",
"name":"Peter",
"surname":"Zubrík",
"titles":{
"before":"",
"after":""
},
"sex":"M",
"citizenship":"SVK",
"birthyear":1991,
"age":31,
"transfer":{
"source_ppo":"tj-polana-siba.futbalnet.sk",
"org_profile_id":"sportovnik-klub-fc-mukarov.futbalnet.sk",
"org_id":"5d5d3974eccb8850917918cd",
"sector":{
"_id":"sport:futbal:futbal",
"category":"sport",
"itemId":"futbal",
"sectorId":"futbal"
},
"competence_type":"player",
"transfer_type":"transfer",
"issfMoveType":"PWP",
"date_from":"2014-05-09T00:00:00.000Z",
"date_to":null,
"_id":"62e6d12c0ae29819010f611f",
"org_profile_name":"Sportovník klub FC Mukařov",
"org_name":"Sportovník klub FC Mukařov",
"source_ppo_name":"TJ Poľana Šiba"
},
"issfId":"1208658"
}
"_id":"5d6566d086dc8b72382bc376" this could have multiple records in array same. I download data from APIs, around 30 JSON each contain 10.000 records. Ideally import all document to mongodb and next create pipeline in compass.
I found a solution to my problem.
I used Python to create a compound key (a new primary key, i.e. a unique identifier for each record in the JSON array).
This code works for me:
import json

# Load the JSON data from the file
with open("250000-260000", "r", encoding="utf-8") as f:
    data = json.load(f)

# Modify the data to include the compound_key and player_id fields
for doc in data:
    doc["player_id"] = doc["_id"]
    doc["compound_key"] = doc["player_id"] + "_" + doc["transfer"]["date_from"]
    doc["_id"] = doc["compound_key"]

# Save the modified data to a new JSON file
with open("26.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)
Basically, I created a new, modified JSON file and imported it through MongoDB Compass; the import finished with 0 errors (no more duplicate _id errors).
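The key player_id + "_" + date_from is only unique as long as a player never has two transfer records with the same date_from. A sketch of the same transformation that fails loudly if a key still repeats, so nothing is silently overwritten at import time. The function name is hypothetical, and converting a null date_from with str() is an assumption. The sample record also carries a transfer._id, which may be a per-transfer identifier and an alternative key, but that is an assumption based on one record.

```python
def add_compound_ids(docs):
    """Set _id to "<original _id>_<transfer.date_from>" and raise if any key repeats."""
    seen = set()
    for doc in docs:
        doc["player_id"] = doc["_id"]
        # str() keeps the key buildable if date_from is null (assumption: null may occur)
        key = doc["player_id"] + "_" + str(doc["transfer"]["date_from"])
        if key in seen:
            raise ValueError(f"compound key is still not unique: {key}")
        seen.add(key)
        doc["compound_key"] = key
        doc["_id"] = key
    return docs
```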
I need to get rid of the newline character that separates objects in a JSON file so that I can import it properly into MongoDB without having the objects in an array. What do I use in JavaScript to do this? I need my data in this format so that I can import it:
{ name: "Widget 1", desc: "This is Widget 1" }
{ name: "Widget 2", desc: "This is Widget 2" }
The answer is that you don't have to convert the file to an array: by default, mongoimport expects exactly the newline-delimited ("JSON lines") format you already have.
This format is very good for performance, because the file doesn't have to be loaded all at once; mongoimport reads it line by line. Imagine a billion lines: converting them into an array would cost a lot of memory.
This way it is a linear-time operation, and the lines are streamed into the database.
See:
http://zaiste.net/2012/08/importing_json_into_mongodb/
However, if you think you need to do the conversion, just do it like this:
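const fs = require('fs');

fs.readFile('my.json', 'utf8', function (err, text) {
  if (err) throw err;
  // One JSON document per line; skip blank lines (e.g. a trailing newline)
  var lines = text.split('\n').filter(function (line) { return line.trim() !== ''; });
  var array = JSON.parse('[' + lines.join(',') + ']');
});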
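For comparison, the same conversion in both directions in Python, reading one document per line so a file object can be streamed without loading it all. This is a sketch; the function names are hypothetical, and it assumes strictly valid JSON (quoted keys), whereas the sample lines above use unquoted keys, which json.loads rejects.

```python
import json


def iter_json_lines(lines):
    """Yield one document per non-blank line; accepts an open file, so it streams."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def json_lines_to_array(text):
    """Collect newline-delimited JSON into a list (holds everything in memory)."""
    return list(iter_json_lines(text.splitlines()))


def array_to_json_lines(docs):
    """Write a list of documents back out as newline-delimited JSON for mongoimport."""
    return "".join(json.dumps(doc, ensure_ascii=False) + "\n" for doc in docs)
```

With an open file, `for doc in iter_json_lines(f): ...` processes one line at a time, which is the memory advantage described above.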
To import an array of objects use this command:
mongoimport --db <db-name> --collection <coll-name> --type json --jsonArray --file seed.json
Note the --jsonArray option.
Finally:
Take a look at this npm package; it looks very promising:
https://www.npmjs.com/package/jsonlines
I'm using Play (2.4) with ReactiveMongo. I'm trying to save the following document using ReactiveMongo:
{
"networkStart" : 42540528726795050063891204319802818560,
"networkEnd" : 42540528726795654526801011634390171648,
"lat" : 36.0833,
"lon" : 140.116
}
using the following code:
val record = GeoIP(... networkStart, networkEnd, lat, lng ...)
val collection: JSONCollection = reactiveMongoApi.db.collection[JSONCollection]("mycolleciton")
collection.save(Json.toJson(record)).map{ r =>
Logger.error(s"Has err: ${r.hasErrors}")
}
but nothing happens. There is no document in MongoDB and there is no error in the logs. When I try to save a record with lower numbers, e.g. 16777216, in place of the network* attributes, everything works fine.
The same happens when searching. When I search using a query such as {networkStart: {$lte: #someNum#}} where #someNum# is a very big integer, I get the exception [NoSuchElementException: JsError.get]. When I search for a lower number, I get correct results.
Am I handling big numbers incorrectly? How can I store and retrieve them using ReactiveMongo? When I insert a document with a big number manually, directly into the DB, it works.
Edit
I managed to get validation error by debugging. It says:
(,List(ValidationError(List(List((,List(ValidationError(List(List((,List(ValidationError(List(List((,List(ValidationError(List(List((,List(ValidationError(List(unhandled json value: 85060714218195519117058029889198843855),WrappedArray()))))),WrappedArray()))))),WrappedArray()))))),WrappedArray()))))),WrappedArray())))
where the most interesting part is: List(unhandled json value: 85060714218195519117058029889198843855). But why?
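A likely explanation: BSON's largest integer type is a signed 64-bit integer (maximum 9223372036854775807), so a value such as 42540528726795050063891204319802818560 has no BSON integer representation, and the JSON-to-BSON conversion rejects it. A manual insert from the mongo shell probably "works" because the shell stores such a literal as a double, silently losing precision (an assumption worth checking on your data). One common workaround, sketched here in Python rather than Scala, is to store each value as a fixed-width, zero-padded hex string: with the default (simple) collation, strings compare byte by byte, so $lte/$gte range queries on these strings follow numeric order. The 128-bit width and function names are assumptions chosen for these values.

```python
INT64_MAX = 2**63 - 1  # largest integer BSON can store natively (int64)


def fits_in_int64(n):
    """True if n can be stored as a BSON int64."""
    return -(2**63) <= n <= INT64_MAX


def to_sortable_hex(n, bits=128):
    """Encode a non-negative integer as zero-padded lowercase hex of fixed width."""
    if not 0 <= n < 2**bits:
        raise ValueError(f"{n} does not fit in {bits} unsigned bits")
    return format(n, f"0{bits // 4}x")


def from_sortable_hex(s):
    """Decode a value produced by to_sortable_hex back to an int."""
    return int(s, 16)
```

The same encoding must be applied to the query bound, e.g. {networkStart: {$lte: to_sortable_hex(someNum)}}.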
I have imported a CSV file into a collection, and my documents were saved like this:
"_id" : "ObjectID(53874d952f92e2af1a5f0afb)"
I am unable to query on this _id.
Can anyone help me remove the quotes so that _id becomes a real ObjectId?
Let's assume your collection name is myCollection.
Do this:
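// Only visit documents whose _id is still the "ObjectID(...)" string
db.myCollection.find({ _id: /^ObjectID\(/ }).forEach(function (doc) {
  var oldId = doc._id;
  // Extract the hex string from "ObjectID(<hex>)"
  var newIdStr = oldId.replace(/ObjectID\((\w+)\)/, "$1");
  doc._id = ObjectId(newIdStr);
  db.myCollection.insertOne(doc);             // insert the copy with a real ObjectId
  db.myCollection.deleteOne({ _id: oldId });  // remove the old string-keyed document
});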
CSV does not preserve field types: it can only represent strings and numbers, not ObjectIds. If I were you, I would write a parser in whatever language suits you best and convert your _id values to ObjectIds there.
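One way to write such a parser without any MongoDB driver is to turn the CSV into JSON lines and express the ObjectId in MongoDB Extended JSON ({"$oid": "<24 hex chars>"}), which mongoimport reads as a real ObjectId. A minimal sketch; it assumes the _id column holds exactly the "ObjectID(<hex>)" text shown in the question, and the function name is hypothetical. All other values stay strings.

```python
import csv
import io
import json
import re

# Matches the text the CSV import stored, e.g. "ObjectID(53874d952f92e2af1a5f0afb)"
OBJECTID_TEXT = re.compile(r"^ObjectID\(([0-9a-fA-F]{24})\)$")


def csv_to_extended_json_lines(csv_text):
    """Convert CSV text to JSON lines, rewriting ObjectID(...) strings as {"$oid": ...}."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = OBJECTID_TEXT.match(row.get("_id") or "")
        if match:
            # Extended JSON form that mongoimport converts into an ObjectId
            row["_id"] = {"$oid": match.group(1).lower()}
        lines.append(json.dumps(row) + "\n")
    return "".join(lines)
```

Write the result to a file and import it with `mongoimport --db <db-name> --collection <coll-name> --file <file>` (no --type csv).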
Another approach, if your _id values are not important: rename the _id column in the CSV header to any other name. During the import MongoDB will generate new ObjectIds, and afterwards you can remove the renamed field.
Next time you can use mongodump and mongorestore to preserve the types.
I have a MongoDB database called test, and in it two collections: collection1 and collection1_backup.
How can I replace the contents of collection1 with the data from collection1_backup?
The best way to have done this (considering the name of the collection ends with _backup) is possibly to have used mongorestore: http://docs.mongodb.org/manual/reference/mongorestore/
However, in this case it depends. If the collection is unsharded, you can use renameCollection ( http://docs.mongodb.org/manual/reference/command/renameCollection/ ), or you can use a more manual method (in JavaScript):
db.collection1.drop(); // drop the target collection entirely
db.collection1_backup.find().forEach(function (doc) {
  db.collection1.insertOne(doc); // copy the documents one by one
});
Those are the most common methods of doing this.
This can be done using simple command:
db.collection1_backup.aggregate([ { $match: {} }, { $out: "collection1" } ])
This command will remove all the documents of collection1 and then make a clone of collection1_backup in collection1.
The generic command is:
db.<SOURCE_COLLECTION>.aggregate([ { $match: {} }, { $out: "<TARGET_COLLECTION>" } ])
If TARGET_COLLECTION does not exist, the above command will create it.
Also useful:
To export a collection to a JSON file:
mongoexport --collection collection1_backup --out collection1.json
To import a collection from a JSON file:
mongoimport --db test --collection collection1 --file collection1.json
To import a single collection from a backup/dump file, you can convert the *.bson file to *.json
using:
bsondump collection1_backup.bson > collection1_backup.json
Simply do this:
// drop collection1
db.collection1.drop();
// copy data from collection1_backup to collection1 ({_id: 0} drops the old _ids, so new ones are generated)
db.collection1.insertMany(db.collection1_backup.find({}, {_id: 0}).toArray());
Using the Java driver
Try the following:
import com.mongodb.*;
import java.net.UnknownHostException;
import java.util.List;

public void copyTo(String db, String sourceCollection, String destinationCollection, int limit)
        throws UnknownHostException {
    MongoClient mongo = new MongoClient("localhost", 27017);
    try {
        DB database = mongo.getDB(db);
        DBCollection collection = database.getCollection(sourceCollection);
        // Read up to `limit` documents from the source collection
        List<DBObject> list = collection.find().limit(limit).toArray();
        DBCollection destination = database.getCollection(destinationCollection);
        // Choose the write concern based on your requirements
        destination.insert(list, WriteConcern.ACKNOWLEDGED);
    } finally {
        mongo.close();
    }
}
A shorter way is to use .toArray():
db.collection1.drop(); // drop the target collection
// load every document of the backup into an array (held in memory)
var data = db.collection1_backup.find().toArray();
// insert the whole array in one call
db.collection1.insertMany(data);
You can use a simple command to back up a MongoDB collection. It only works on MongoDB 4.0 or earlier.
db.sourceCollectionName.copyTo('targetCollectionName')
The target collection name must be a quoted string, in single (') or double (") quotes.
Note:
The db.collection.copyTo() method uses the eval command internally. As a result, the db.collection.copyTo() operation takes a global lock that blocks all other read and write operations until the db.collection.copyTo() completes.
Drop collection1, then run this query:
db.collection1.drop();
var cursor = db.collection1_backup.find();
var data = [];
while (cursor.hasNext()) {
  data.push(cursor.next());
}
db.collection1.insertMany(data);
I'm trying to use mongoimport to upsert data with string values in _id.
Since the ids look like integers (even though they're in quotes), mongoimport treats them as integers and creates new records instead of upserting the existing records.
Command I'm running:
mongoimport --host localhost --db database --collection my_collection --type csv --file mydata.csv --headerline --upsert
Example data in mydata.csv:
{ "_id" : "0364", someField: "value" }
The result would be for mongo to insert a record like this: { "_id" : 364, someField: "value" } instead of updating the record with _id "0364".
Does anyone know how to make it treat the _id as strings?
Things that don't work:
Surrounding the data with double double quotes ""0364"", double and single quotes "'0364'" or '"0364"'
Appending empty string to value: { "_id" : "0364" + "", someField: "value" }
Unfortunately, there is currently no way to force number-like strings to be interpreted as strings:
https://jira.mongodb.org/browse/SERVER-3731
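Later versions of mongoimport (I believe 3.4 and newer; check the documentation for your version) added --columnsHaveTypes, which lets the CSV header declare each column's type as <name>.<type>(), for example _id.string(). A sketch that rewrites the header this way; marking the remaining columns as auto() is an assumption, and the function name is hypothetical.

```python
import csv
import io


def add_string_types_to_header(csv_text, string_columns):
    """Rewrite the CSV header as "<name>.string()" for string_columns, "<name>.auto()" otherwise."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    rows[0] = [
        f"{name}.string()" if name in string_columns else f"{name}.auto()"
        for name in rows[0]
    ]
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

The rewritten file (say mydata_typed.csv, a hypothetical name) would then be imported with --type csv --headerline --columnsHaveTypes --mode upsert, so "0364" is matched as a string _id.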
You could write a script in Python or some other language with which you're comfortable, along the lines of:
import csv
import pymongo

client = pymongo.MongoClient()
collection = client.mydatabase.mycollection

with open('myfile.csv', newline='') as f:
    reader = csv.DictReader(f)
    for line in reader:
        # csv.DictReader yields every value as a string, so "0364" stays "0364"
        print('_id', line['_id'])
        upsert_fields = {
            '_id': line['_id'],
            'my_other_upsert_field': line['my_other_upsert_field']}
        # Replace the matching document, or insert it if none matches
        collection.replace_one(upsert_fields, line, upsert=True)
Just encountered this same issue and discovered an alternative. You can force Mongo to use string types for non-string values by converting your CSV to JSON and quoting the field. For example, if your CSV looks like this:
key value
123 foo
abc bar
Then you'll get an integer field for key 123 and a string field for key abc. If you convert that to JSON, making sure that all the keys are quoted, and then use --type json when you import, you'll end up with the desired behavior:
{
"123":"foo",
"abc":"bar"
}
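The conversion itself can be done with Python's standard library instead of an online tool: csv.DictReader never converts types, so every value comes out as a string, and json.dumps then quotes it. A minimal sketch assuming a comma-separated file with a header row; the function name is hypothetical.

```python
import csv
import io
import json


def csv_to_string_json_lines(csv_text):
    """Convert CSV text to newline-delimited JSON with every value kept as a JSON string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "".join(json.dumps(row) + "\n" for row in reader)
```

The output is in the newline-delimited format mongoimport reads with --type json (no --jsonArray needed), and "0364" stays the string "0364".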
I was able to prefix the numeric string and that worked for me. Example:
00012345 was imported as 12345 (Type Int)
string00012345 was imported as string00012345 (Type String)
My source was a SQL database, so I just did:
select 'string'+column as name
Of course, you also need to do a bit of post-processing to parse the string, but that is far less effort than converting a rather large TSV file to JSON.
I also added a +1 to the JIRA issue linked above for the enhancement.
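As an alternative to Jesse's answer, you can do something similar in the mongo console, e.g.
// Only numeric _ids; leading zeros already lost at import (0364 -> 364) are not restored
db.my_collection.find({ _id: { $type: "number" } }).forEach(function (obj) {
  db.my_collection.deleteOne({ _id: obj._id }); // remove the old one
  obj._id = '' + obj._id; // change to string
  db.my_collection.insertOne(obj); // re-insert with the string _id
});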
For non _id fields you can simply do:
db.my_collection.find().forEach(function (obj) {
  db.my_collection.updateOne(
    { _id: obj._id },
    { $set: { someField: '' + obj.someField } } // change to string
  );
});
I encountered the same issue.
I feel the simplest way is to convert the CSV file to a JSON file using an online tool and then import it.
This is the tool I used:
http://www.convertcsv.com/csv-to-json.htm
It lets you wrap the integer values of your CSV file in double quotes for your JSON file.
If importing this JSON file fails with an error, add --jsonArray to your import command, since the tool produces a JSON array:
mongoimport --host localhost --db mydb -c mycollection --type json --jsonArray --file <file_path>