How to remove delimiter from data in mongoexport - mongodb

I am executing this command to export data to a comma-separated file:
mongoexport --host <IP> --db <database> --collection <collection> --type csv --fields _id,contactNo,longitude,latitude,city,state,locality,pinCode -q '{"updatedAt": {$gt: ISODate("2017-09-27T00:00:00.000Z")}}' --limit 5 --out data_sample.json
However, the data itself contains commas. How do I remove the commas from the data before writing it to the file? If I use some other delimiter, there is still a risk that it will appear in the data. So I want to replace every comma with a blank and then load the data into the file.

The CSV output of mongoexport encloses in double quotes any field whose value contains the delimiter, so there is no need to strip commas from the data. The resulting CSV can be loaded into Excel or a MySQL table by specifying the appropriate "enclosed by" option ('"').
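For illustration, a row whose city value contains a comma would be exported like this (all values below are made up):
_id,contactNo,longitude,latitude,city,state,locality,pinCode
ObjectId(59cb32a7e3b0f12a44f8d9c1),9800000000,72.8777,19.0760,"Mumbai, West",Maharashtra,Andheri,400053
The embedded comma is harmless because the whole field is quoted.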

Related

Import csv file into MongoDB with arrays from a column

I have an Excel file that I converted to a CSV and imported into my running MongoDB instance, but there was trouble with one column of the data from the CSV file. One column, called Room, occasionally, but not always, contains values separated by a comma (e.g. "101, 103").
Running:
$ mongoimport -d mydb -c things --type csv --file locations.csv --headerline
gave no errors, but for documents that are supposed to have 2 values for Room, there was just one. For example "101, 102" became "101," in the db.
Is there an option for mongoimport that allows me to specify an array for a certain column?
First, import the data from the CSV:
$ mongoimport -d mydb -c things --type csv --file locations.csv --headerline
After that, run the following in the mongo shell to split each Room value into an array:
db.things.find().snapshot().forEach(function (el) {
    el.Room = el.Room.split(',').map(function (s) { return s.trim(); }); // split into an array, trimming stray whitespace
    db.things.save(el);
});
That will store Room as an array and solve your problem.
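To verify, a quick check in the mongo shell (using the example values above):
db.things.findOne({ Room: { $size: 2 } })   // matches documents whose Room is now a two-element array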

mongoimport choosing field type

When importing data from a file (CSV in my case), mongoimport automatically chooses the data type for each field.
Is it possible to choose the data type manually for a specific field?
I encountered a situation where my file contains phone numbers, which I want to (and should) treat as strings, but mongoimport (quite properly) treats those phone numbers as numbers (NumberLong).
When importing CSV/TSV into MongoDB, the option --columnsHaveTypes can help to define the column types, but the documentation seems very unclear. I tried several times before finally succeeding.
You should add the option --columnsHaveTypes and annotate every column listed after --fields, remembering to put "\" before "(" and ")".
For example, change:
mongoimport -h foohost -d bardb -c fooc --type tsv --fields col1,col2,col3 --file path/to/file.txt
into
mongoimport -h foohost -d bardb -c fooc --type tsv --fields col1.int32\(\),col2.double\(\),col3.string\(\) --columnsHaveTypes --file path/to/file.txt
Alternatively, you can import the data as-is from CSV and then run update statements on the existing data in MongoDB to convert it into the format you want.
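For example, here is a minimal mongo shell sketch of such a conversion, assuming MongoDB 4.2+ (for pipeline-style updates) and illustrative collection/field names (things, phone):
db.things.updateMany(
    { phone: { $type: "number" } },                    // only touch numeric values
    [ { $set: { phone: { $toString: "$phone" } } } ]   // rewrite each value as a string in place
);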
From version 3.4 onward, mongoimport supports specifying field types explicitly while importing data. See:
https://docs.mongodb.com/manual/reference/program/mongoimport/#cmdoption--columnsHaveTypes
See the Type Fidelity section in the documentation:
mongoimport and mongoexport do not reliably preserve all rich BSON data types because JSON can only represent a subset of the types supported by BSON. As a result, data exported or imported with these tools may lose some measure of fidelity. See MongoDB Extended JSON for more information.
Use mongodump and mongorestore to preserve types.
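A minimal sketch of that round trip (database name and dump path are illustrative):
mongodump --db mydb --out /backup        # writes BSON plus metadata, preserving all types
mongorestore --db mydb /backup/mydb      # restores the dump with full type fidelity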
When I tried to import a CSV into MongoDB Atlas, I ran into a similar issue. Here's how I dealt with it.
To avoid shell errors, you can enclose the field list in double quotes instead of escaping the parentheses.
In the example below I used two columns, "Name" and "Barcode". Use whatever columns you need, and don't forget to replace <connectionString>, <collectionName>, and <CSVpath> with your own values.
For more mongo types, refer to the mongoimport documentation.
mongoimport --uri <connectionString> --collection <collectionName> --type csv --file <CSVpath> -f "Name.string(),Barcode.string()" --columnsHaveTypes
You can also choose to put the column types in a field file to make it easier. Just make sure you have specified all columns in your field file.
In my case, I named it "field.txt".
In the field file, you write the columns with their types this way: <column>.<type>. Note that date_go takes a Go-style layout string, which must be expressed in terms of Go's reference time (2006-01-02 15:04:05). To get the list of all types used in the mongoimport syntax, please visit https://www.mongodb.com/docs/database-tools/mongoimport/
field.txt
name.string()
usercode.int64()
city.string()
town.string()
address.string()
price.decimal()
date_created.date_go(2006-01-02 15:04:05)
You can name the file anything you want as long as you point --fieldFile at it, e.g. --fieldFile=myfieldname.txt.
mongoimport --uri <connectionString> --collection <collectionName> --type csv --file <csv path> --columnsHaveTypes --fieldFile=field.txt --mode=insert
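To confirm the types landed as intended, a quick mongo shell check (placeholder and field names from the example above):
db.<collectionName>.findOne()                                        // inspect one imported document
db.<collectionName>.find({ usercode: { $type: "long" } }).count()    // count docs where usercode is int64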

MongoDB import error assertion 9998

I seem to keep getting this error when I try to import anything.
In terminal I input:
name:~ computer$ mongoimport --db users --collection contacts --type csv --file /Users/computer/Desktop/ftse100.csv
connected to: 127.0.0.1
assertion: 9998 you need to specify fields
I'm not sure what else to try. I tried adding --field to this command line, but I just get help output.
As per the MongoDB docs:
--fields <field1[,field2]>, -f
Specify a comma separated list of field names when importing csv or tsv files that do not have field names in the first (i.e. header) line of the file.
mongoimport --db users --collection contacts --type csv --file /Users/computer/Desktop/ftse100.csv --fields field1,field2,field3
Also note the typo in your attempt: the option is --fields, not --field.
In 2.4.6, mongoimport does not find the header in CSV files that I make, with or without double-quote boundaries.
If I chop off the header line and supply that same text to the -f or --fields option, my files import fine.
If you want to add all columns from the header, use the --headerline option instead of --fields.
In your case it would be:
mongoimport --db users --collection contacts --type csv --headerline --file /Users/computer/Desktop/ftse100.csv

How to define a delimiter to import into MongoDB

I have a data collection whose fields are separated by the | character. I want to add this data to MongoDB, so I need to split the data on the | character. What should my mongoimport command look like?
Previously, I successfully imported a CSV file with the following command:
$ mongoimport -d mydb -c things --type csv --file locations.csv --headerline
mongoimport supports JSON, CSV (comma-separated values), and TSV (tab-separated values). The | character is not a valid delimiter for either CSV or TSV, so you will need to change your input files' | characters to commas or tabs and specify --type accordingly.
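For example, a minimal shell sketch of that conversion (the file names are illustrative); converting to tabs rather than commas avoids colliding with commas embedded in the data:
tr '|' '\t' < locations.unl > locations.tsv
mongoimport -d mydb -c things --type tsv --file locations.tsv --headerline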
MongoDB can actually accept a |-separated record in a .unl, .txt, or .csv file.
Just make sure you follow the format below; for the extensions mentioned, use --type csv:
mongoimport -c <table_name> -d <database_name> --mode upsert --file <filename> --type csv --headerline

Mongoimport to merge/upsert fields

I'm trying to import and merge multiple CSVs into Mongo; however, documents are getting replaced rather than merged.
For example, if I have one.csv:
key1, first column, second column
and two.csv:
key1, third column
I would like to end up with:
key1, first column, second column, third column
But instead I'm getting:
key1,third column
Currently I'm using:
mongoimport.exe --type csv --file first.csv --fields key,firstColumn,secondColumn
mongoimport.exe --type csv --file second.csv --fields key,thirdColumn --upsert --upsertFields key
That's the way mongoimport works. There's an existing new feature request for merge imports, but for now, you'll have to write your own import to provide merge behavior.
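(Note: since this was written, mongoimport 3.4+ gained a merge mode that does field-level merging; a hedged sketch, reusing the flags from the question:
mongoimport --collection mycoll --type csv --file second.csv --fields key,thirdColumn --mode merge --upsertFields key
)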
Cross-collection workaround: the forEach method can be run on a dummy collection, and the resulting documents can be used to search and update your desired collection:
mongoimport.exe --collection mycoll --type csv --file first.csv --fields key,firstColumn,secondColumn
mongoimport.exe --collection dummy --type csv --file second.csv --fields key,third
db.dummy.find().forEach(function (doc) {
    db.mycoll.update({ key: doc.key }, { $set: { thirdcol: doc.third } });
});
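Afterwards you can drop the scratch collection if you no longer need it:
db.dummy.drop()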
That's correct; mongoimport --upsert replaces the whole document rather than merging its fields.
You may achieve your goal by importing into a temporary collection and then using the following Gist.
Load the script into the mongo shell and run:
mergeCollections("srcCollectionName", "destCollectionName", {}, ["thirdColl"]);
I just had a very similar problem. There is a Node module for MongoDB, and jline is my command-line Node tool for stream-processing JSON lines. So:
echo '{"page":"index.html","hour":"2015-09-18T21:00:00Z","visitors":1001}' |\
jline-foreach \
'beg::dp=require("bluebird").promisifyAll(require("mongodb").MongoClient).connectAsync("mongodb://localhost:27017/nginx")' \
'dp.then(function(db){
    var updates = {};
    updates["visitors.hour."+record.hour] = record.visitors;
    db.collection("pagestats").update({_id:record.page},{$set:updates},{upsert:true});});' \
'end::dp.then(function(db){db.close()})'
In your case you'd have to convert from CSV to JSON lines first by piping it through jline-csv2jl, which converts each CSV line into a dictionary with field names taken from the header.
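Something like this, as a rough sketch (the file name is illustrative, and the jline-foreach invocation from above is abbreviated to "..."):
cat second.csv | jline-csv2jl | jline-foreach ...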
I have added this example to the manual: https://github.com/bitdivine/jline/blob/master/bin/foreach.md
I haven't used jline with promises much, but so far it's OK.
Disclaimer: I am the author of jline.