How to change delimiter from comma to # in a CSV using MongoDB

Can the delimiter be changed from a comma to # while exporting records to a CSV file, as in the example below?
mongoexport -d mydb -c coll --csv --fields "ProductId,ModerationStatus,Rating,TotalCommentCount" --out results.csv
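That is, the goal is for the exported rows to look like
ProductId#ModerationStatus#Rating#TotalCommentCount
rather than being comma-separated.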

Currently, mongoexport does not have this feature.
However, you can write a simple JavaScript file to do this, which also gives you control over the CSV format and the field data types.
export.js
conn = new Mongo();
db = conn.getDB("myDB");
var cur = db.myCollection.find();
var obj;
while (cur.hasNext()) {
    obj = cur.next();
    // print each field quoted, with # as the delimiter
    print("\"" + obj._id + "\"#\"" + obj.field_1 + "\"#\"" + obj.field_2 + "\"");
}
Call this script from your OS shell:
mongo --quiet export.js > file_name.csv
--quiet: disables the mongo shell's default output ("version", "connecting to", etc.), so the output of the script will be only what is printed explicitly via print().
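If field values can themselves contain quotes or the # delimiter, the raw string concatenation above will produce broken rows. A minimal escaping helper (my own sketch, not part of the original answer) that doubles embedded quotes, RFC 4180 style:
function csvEscape(value) {
    // wrap in quotes and double any embedded quotes: a"b -> "a""b"
    return "\"" + String(value).replace(/"/g, "\"\"") + "\"";
}
// usage inside the loop:
// print([obj._id, obj.field_1, obj.field_2].map(csvEscape).join("#"));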

Not a very clean solution, but you can build your own version of mongoexport by setting the writer's Comma field inside the code:
https://github.com/mongodb/mongo-tools/blob/master/mongoexport/csv.go
https://golang.org/pkg/encoding/csv/#Writer
// WriteHeader writes a delimited list of fields as the output header row.
func (csvExporter *CSVExportOutput) WriteHeader() error {
    if !csvExporter.NoHeaderLine {
        // here the trick: set the delimiter to the rune you want
        csvExporter.csvWriter.Comma = '#'
        csvExporter.csvWriter.Write(csvExporter.Fields)
        return csvExporter.csvWriter.Error()
    }
    return nil
}

Related

MongoDB - mongoimport duplicate _id in JSON array

I would like to ask you for help. I encountered a problem: when I import JSON into MongoDB via Compass, it throws a duplicate _id error. Therefore, I tried mongoimport from the terminal instead, which runs successfully and reports that each document was imported without error, yet I can see that documents are missing. Can you give me some advice on how to solve this problem?
This is the terminal command in Windows cmd:
mongoimport D:\DimplomaThesis_data\transfer_json\180000-190000.json -d diplomovka -c transfer --jsonArray --stopOnError --maintainInsertionOrder --upsertFields _id
This is the structure of a record in the JSON array:
{
    "_id": "5d6566d086dc8b72382bc376",
    "name": "Peter",
    "surname": "Zubrík",
    "titles": {
        "before": "",
        "after": ""
    },
    "sex": "M",
    "citizenship": "SVK",
    "birthyear": 1991,
    "age": 31,
    "transfer": {
        "source_ppo": "tj-polana-siba.futbalnet.sk",
        "org_profile_id": "sportovnik-klub-fc-mukarov.futbalnet.sk",
        "org_id": "5d5d3974eccb8850917918cd",
        "sector": {
            "_id": "sport:futbal:futbal",
            "category": "sport",
            "itemId": "futbal",
            "sectorId": "futbal"
        },
        "competence_type": "player",
        "transfer_type": "transfer",
        "issfMoveType": "PWP",
        "date_from": "2014-05-09T00:00:00.000Z",
        "date_to": null,
        "_id": "62e6d12c0ae29819010f611f",
        "org_profile_name": "Sportovník klub FC Mukařov",
        "org_name": "Sportovník klub FC Mukařov",
        "source_ppo_name": "TJ Poľana Šiba"
    },
    "issfId": "1208658"
}
"_id":"5d6566d086dc8b72382bc376" this could have multiple records in array same. I download data from APIs, around 30 JSON each contain 10.000 records. Ideally import all document to mongodb and next create pipeline in compass.
I found a solution for my problem.
I used Python to create a compound_key (a new primary key, i.e. a unique identifier for each record in the JSON array).
This code works for me:
import json

# Load the JSON data from the file
with open("250000-260000", "r", encoding="utf-8") as f:
    data = json.load(f)

# Modify the data to include the compound_key and player_id fields
for doc in data:
    doc["player_id"] = doc["_id"]
    doc["compound_key"] = doc["player_id"] + "_" + doc["transfer"]["date_from"]
    doc["_id"] = doc["compound_key"]

# Save the modified data to a new JSON file
with open("26.json", "w") as f:
    json.dump(data, f, indent=2)
Basically, I created a new modified JSON file and imported that file through MongoDB Compass, where the import finished with 0 errors (no more duplicate _id errors).
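For what it's worth, the same transformation can be done in Node.js instead of Python; a minimal sketch, assuming the whole file fits in memory (filenames mirror the Python version above):
const fs = require('fs');

// read the array of documents
const docs = JSON.parse(fs.readFileSync('250000-260000', 'utf8'));

// build the compound key from the original _id and the transfer date
for (const doc of docs) {
    doc.player_id = doc._id;
    doc.compound_key = doc.player_id + '_' + doc.transfer.date_from;
    doc._id = doc.compound_key;
}

// write the modified array to a new file
fs.writeFileSync('26.json', JSON.stringify(docs, null, 2));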

How do I import JSON generated by JavaScript into MongoDB

I need to get rid of the newline character that separates objects in a JSON file so that I can import it properly into MongoDB without having the objects in an array. What do I use in JavaScript to do this? I need my data in this format so that I can import:
{ name: "Widget 1", desc: "This is Widget 1" }
{ name: "Widget 2", desc: "This is Widget 2" }
The answer is: you don't have to convert the file to an array; mongoimport expects a "json-line" format like the one you already have.
This format is very good for performance, because you don't have to load the file all at once; instead, mongo takes it line by line. Imagine a billion lines: converting them to an array would cost a lot of memory.
This way it's just a linear-time operation, with the lines streamed into the db.
look here:
http://zaiste.net/2012/08/importing_json_into_mongodb/
However, if you think you need to do the conversion, just do it like this:
var fs = require('fs');

// read as UTF-8 so 'text' is a string, not a Buffer
fs.readFile('my.json', 'utf8', function (e, text) {
    if (e) throw e;
    // drop empty lines (e.g. a trailing newline) before joining
    var lines = text.split('\n').filter(function (line) { return line.trim(); });
    var array = JSON.parse("[" + lines.join(',') + "]");
});
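If you ever need the opposite direction (an array-style JSON file turned into the line-delimited format that mongoimport reads without --jsonArray), a quick sketch with hypothetical filenames:
const fs = require('fs');

// one JSON document per line, no surrounding array
const docs = JSON.parse(fs.readFileSync('array.json', 'utf8'));
fs.writeFileSync('lines.json', docs.map(d => JSON.stringify(d)).join('\n'));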
To import an array of objects use this command:
mongoimport --db <db-name> --collection <coll-name> --type json --jsonArray --file seed.json
note the option: --jsonArray
Finally:
Take a look at this npm package; it looks very promising:
https://www.npmjs.com/package/jsonlines

How to print nested JSON array values into CSV using MongoDB

I want to output the nested JSON array into CSV.
sample.json
{
    "DocId": "ABC",
    "User": [
        {
            "Id": 1234,
            "Username": "sam1234",
            "Name": "Sam",
            "ShippingAddress": {
                "Address1": "123 Main St.",
                "Address2": null,
                "City": "Durham",
                "State": "NC"
            }
        },
        {
            "Id": 5678,
            "Username": "sam5678",
            "Name": "Sam",
            "ShippingAddress": {
                "Address1": "5678 Main St.",
                "Address2": null,
                "City": "Durham",
                "State": "NC"
            }
        }
    ]
}
Above is the sample file. DocId must not be printed; the CSV output must contain only the array contents:
Id    Username  Name  ShippingAddress
1234  sam1234   Sam   123 Main St. Durham NC
5678  sam5678   Sam   5678 Main St. Durham NC
How do I print it to CSV with headers, and without headers?
One way to do this is in two steps:
Perform an aggregation on this collection that changes the structure of the docs and outputs them into another collection.
Use mongoexport to export the collection created in step 1 as CSV [this step can be used directly ^-^].
For step 1, let's say I have db -> test and collection -> stack, so the aggregation query is:
db.stack.aggregate([
    { $unwind: "$User" },
    { $project: { Id: "$User.Id", Username: "$User.Username", Name: "$User.Name", ShippingAddress: "$User.ShippingAddress", _id: 0 } },
    { $out: "result" }
])
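Note that this projects ShippingAddress as a subdocument. If you want it flattened into a single string, as in the sample output above, one option is to concatenate the address fields in the projection; a sketch (the $ifNull wrappers are my addition, since $concat returns null if any argument is null, e.g. Address2):
db.stack.aggregate([
    { $unwind: "$User" },
    { $project: {
        _id: 0,
        Id: "$User.Id",
        Username: "$User.Username",
        Name: "$User.Name",
        // join Address1, City, and State with spaces, treating nulls as ""
        ShippingAddress: { $concat: [
            { $ifNull: ["$User.ShippingAddress.Address1", ""] }, " ",
            { $ifNull: ["$User.ShippingAddress.City", ""] }, " ",
            { $ifNull: ["$User.ShippingAddress.State", ""] }
        ] }
    } },
    { $out: "result" }
])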
For step 2, use mongoexport terminal utility:
mongoexport --db test --collection result --csv --fields "Id,Username,Name,ShippingAddress" --out file.csv
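To answer the "without headers" part: recent versions of mongoexport support a --noHeaderLine flag that suppresses the header row:
mongoexport --db test --collection result --csv --fields "Id,Username,Name,ShippingAddress" --noHeaderLine --out file.csv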

mongo copy from one collection to another (on the same db)

I have got a mongo db called test and in this db two collections, collection1 and collection1_backup.
How do I replace the content of collection1 with the data from collection1_backup?
The best way to do this (considering the name of the collection ends with _backup) is probably to use mongorestore: http://docs.mongodb.org/manual/reference/mongorestore/
However, in this case it depends. If the collection is unsharded you can use renameCollection (http://docs.mongodb.org/manual/reference/command/renameCollection/), or you can use a more manual method (in JavaScript code):
db.collection1.drop(); // Drop entire other collection
db.collection1_backup.find().forEach(function (doc) {
    db.collection1.insert(doc); // start to replace
});
Those are the most common methods of doing this.
This can be done using a simple command:
db.collection1_backup.aggregate([ { $match: {} }, { $out: "collection1" } ])
This command will remove all the documents of collection1 and then make a clone of collection1_backup in collection1.
Generic Command would be
db.<SOURCE_COLLECTION>.aggregate([ { $match: {} }, { $out: "<TARGET_COLLECTION>" } ])
If TARGET_COLLECTION does not exist, the above command will create it.
Also useful:
To export a collection to a JSON file:
mongoexport --db test --collection collection1_backup --out collection1.json
To import a collection from a JSON file:
mongoimport --db test --collection collection1 --file collection1.json
To import a single collection from a backup/dump file, you first need to convert the *.bson file to *.json by using:
bsondump collection1_backup.bson > collection1_backup.json
Simply do this:
// drop collection1
db.collection1.drop();
// copy data from collection1_backup to collection1
// (the {_id: 0} projection strips the old ids, so the copies get new ObjectIds)
db.collection1.insert(db.collection1_backup.find({}, {_id: 0}).toArray());
Using Java Driver
Try the one below:
public void copyTo(String db, String sourceCollection, String destinationCollection, int limit)
        throws UnknownHostException {
    MongoClient mongo = new MongoClient("localhost", 27017);
    DB database = mongo.getDB(db);
    DBCollection collection = database.getCollection(sourceCollection);
    DBCursor dbCursor = collection.find().limit(limit);
    List<DBObject> list = dbCursor.toArray();
    DBCollection destination = database.getCollection(destinationCollection);
    destination.insert(list, WriteConcern.NORMAL); // write concern is based on your requirement
    mongo.close(); // close the client when done
}
A better way would be to use .toArray():
db.collection1.drop(); // Drop entire other collection

// toArray() materializes the cursor into an array ("data"),
// which insert() can then take in a single call
var data = db.collection1_backup.find().toArray();
db.collection1.insert(data);
You can use a simple command to back up a MongoDB collection. It works only on MongoDB 4.0 or earlier versions:
db.sourceCollectionName.copyTo('targetCollectionName')
Your targetCollectionName must be in single (') or double (") quotes.
Note:
The db.collection.copyTo() method uses the eval command internally. As a result, the db.collection.copyTo() operation takes a global lock that blocks all other read and write operations until db.collection.copyTo() completes.
Drop collection1, then use this query:
var cursor = db.collection1_backup.find();
var data = [];
while (cursor.hasNext()) {
    data.push(cursor.next());
}
db.collection1.insertMany(data);
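For large collections, accumulating everything into one array can exhaust the shell's memory; a batched variant (my own sketch) flushes every 1000 documents instead:
var cursor = db.collection1_backup.find();
var batch = [];
while (cursor.hasNext()) {
    batch.push(cursor.next());
    if (batch.length === 1000) { // flush a full batch
        db.collection1.insertMany(batch);
        batch = [];
    }
}
if (batch.length > 0) { // flush the remainder
    db.collection1.insertMany(batch);
}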

Mongoimport csv files with string _id and upsert

I'm trying to use mongoimport to upsert data with string values in _id.
Since the ids look like integers (even though they're in quotes), mongoimport treats them as integers and creates new records instead of upserting the existing records.
Command I'm running:
mongoimport --host localhost --db database --collection my_collection --type csv --file mydata.csv --headerline --upsert
Example data in mydata.csv:
{ "_id" : "0364", someField: "value" }
The result would be for mongo to insert a record like this: { "_id" : 364, someField: "value" } instead of updating the record with _id "0364".
Does anyone know how to make it treat the _id as strings?
Things that don't work:
Surrounding the data with double double quotes ""0364"", double and single quotes "'0364'" or '"0364"'
Appending empty string to value: { "_id" : "0364" + "", someField: "value" }
Unfortunately, there is currently no way to force number-like strings to be interpreted as strings:
https://jira.mongodb.org/browse/SERVER-3731
You could write a script in Python or some other language you're comfortable with, along the lines of:
import csv
import pymongo

# MongoClient is the current pymongo entry point (Connection in older versions)
connection = pymongo.MongoClient()
collection = connection.mydatabase.mycollection

reader = csv.DictReader(open('myfile.csv'))
for line in reader:
    print('_id', line['_id'])
    upsert_fields = {
        '_id': line['_id'],
        'my_other_upsert_field': line['my_other_upsert_field']}
    # replace_one(..., upsert=True) supersedes the old update(..., upsert=True, safe=True)
    collection.replace_one(upsert_fields, line, upsert=True)
Just encountered this same issue and discovered an alternative. You can force Mongo to use string types for non-string values by converting your CSV to JSON and quoting the field. For example, if your CSV looks like this:
key  value
123  foo
abc  bar
Then you'll get an integer field for key 123 and a string field for key abc. If you convert that to JSON, making sure that all the keys are quoted, and then use --type json when you import, you'll end up with the desired behavior:
{
    "123": "foo",
    "abc": "bar"
}
I was able to prefix the numeric string and that worked for me. Example:
00012345 was imported as 12345 (Type Int)
string00012345 was imported as string00012345 (Type String)
My source was a SQL database, so I just did:
SELECT 'string' + column AS name
Of course, you also need to do a bit of post-processing to parse the string, but far less effort than converting a rather large tsv file to json.
I also added +1 to the jira link above for the enhancement.
As an alternative to @Jesse's answer, you can do something similar in the mongo console, e.g.:
db.my_collection.find().forEach(function (obj) {
    db.my_collection.remove({_id: obj._id}); // remove the old one
    obj._id = '' + obj._id; // change to string
    db.my_collection.save(obj); // resave
});
For non _id fields you can simply do:
db.my_collection.find().forEach(function (obj) {
    obj.someField = '' + obj.someField; // change to string
    db.my_collection.save(obj); // resave
});
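On MongoDB 4.2 or newer, the non-_id case can also be done server-side with a pipeline update and $toString, avoiding a round trip per document (a sketch, with someField standing in for your field):
db.my_collection.updateMany(
    {},
    // the pipeline form of update lets $toString reference the existing value
    [ { $set: { someField: { $toString: "$someField" } } } ]
)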
I encountered the same issue.
I feel the simplest way is to convert the CSV file to a JSON file using an online tool and then import it.
This is the tool I used:
http://www.convertcsv.com/csv-to-json.htm
It lets you wrap the integer values of your CSV file in double quotes for your JSON file.
If you have trouble importing this JSON file and run into an error, just add --jsonArray to your import command. It will work for sure:
mongoimport --host localhost --db mydb -c mycollection --type json --jsonArray --file <file_path>