mongoimport imports csv file in the wrong order - mongodb

I have a 23k-row CSV file. When I use mongoimport from the shell, or import from MongoChef, it somehow imports the rows in the wrong order.
For example, I have this CSV file:
a;b;c;(header)
1;1;1;
2;2;2;
3;3;3;
When I import it from the shell or MongoChef and then run .find(), the result is:
a;b;c
1;1;1;
3;3;3;
2;2;2;
Any help would be great. Here is my shell command for the import:
mongoimport -d local -c test --type csv --file "C:\Program Files\MongoDB\Example Datasets\abc.csv" --headerline --ignoreBlanks

You could try using the --maintainInsertionOrder option.
As the docs say:
If specified, mongoimport inserts the documents in the order of their
appearance in the input source, otherwise mongoimport may perform the
insertions in an arbitrary order
Also note that find()'s default ordering is natural order, which does not guarantee that results come back in insertion order. So what I usually do is sort by the _id field.
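For example, adapting the command from the question and then sorting in the mongo shell (a sketch; the default _id is an ObjectId whose leading bytes are a timestamp, so sorting by it approximates insertion order):
mongoimport -d local -c test --type csv --file "C:\Program Files\MongoDB\Example Datasets\abc.csv" --headerline --ignoreBlanks --maintainInsertionOrder
db.test.find().sort({ _id: 1 })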

Related

Import csv file into MongoDB with arrays from a column

I have an Excel file that I converted to a CSV and imported into my running MongoDB instance, but one column of the data from the CSV file gave me trouble. That column, called Room, sometimes (but not always) contains values separated by a comma (e.g. "101, 103").
Running:
$ mongoimport -d mydb -c things --type csv --file locations.csv --headerline
gave no errors, but documents that are supposed to have two values for Room ended up with just one. For example, "101, 103" became "101," in the db.
Is there an option for mongoimport that allows me to specify an array for a certain column?
First you need to import the data from the CSV:
$ mongoimport -d mydb -c things --type csv --file locations.csv --headerline
After that, you just have to run:
db.things.find().snapshot().forEach(function (el) {
    el.Room = el.Room.split(',');   // turn "101, 103" into ["101", " 103"]
    db.things.save(el);
});
That will solve your problem.
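Note that splitting on "," alone leaves leading whitespace on the later elements ("101, 103" becomes ["101", " 103"]). A variant of the same loop that also trims each value (a sketch, for MongoDB versions that still support snapshot() and save() as used above):
db.things.find().snapshot().forEach(function (el) {
    // split on the comma, then strip surrounding whitespace from each part
    el.Room = el.Room.split(',').map(function (s) { return s.trim(); });
    db.things.save(el);
});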

mongoimport choosing field type

When importing data from a file (CSV in my case), mongoimport automatically chooses a data type for each field.
Is it possible to choose the data type manually for a specific field?
I ran into a situation where my file contains phone numbers, which I want (and should) treat as strings, but mongoimport (quite reasonably) treats those phone numbers as numbers (NumberLong).
When importing CSV/TSV into MongoDB, the --columnsHaveTypes option lets you define the column types, but the documentation is rather unclear. It took me several attempts to get it working.
Add the --columnsHaveTypes option and append a type to every column after --fields, remembering to put "\" before "(" and ")".
For example, change:
mongoimport -h foohost -d bardb -c fooc --type tsv --fields col1,col2,col3 --file path/to/file.txt
into
mongoimport -h foohost -d bardb -c fooc --type tsv --fields col1.int32\(\),col2.double\(\),col3.string\(\) --columnsHaveTypes --file path/to/file.txt
Alternatively, you can import the data as CSV and then run an update on the imported documents in MongoDB to convert them into the format you want.
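For example, on MongoDB 4.2 or newer, a pipeline update can cast an imported numeric field to a string in place (a sketch; the contacts collection and phone field are assumptions):
db.contacts.updateMany(
    { phone: { $type: "long" } },                    // only touch values imported as NumberLong
    [ { $set: { phone: { $toString: "$phone" } } } ] // cast each value to a string
)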
From version 3.4 onward, mongoimport supports specifying field types explicitly while importing the data. See:
https://docs.mongodb.com/manual/reference/program/mongoimport/#cmdoption--columnsHaveTypes
See the Type Fidelity section in the documentation:
mongoimport and mongoexport do not reliably preserve all rich BSON
data types because JSON can only represent a subset of the types
supported by BSON. As a result, data exported or imported with these
tools may lose some measure of fidelity. See MongoDB Extended JSON for
more information.
Use mongodump and mongorestore to preserve types.
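For example (a sketch with hypothetical database and collection names):
mongodump --db=mydb --collection=things     # writes BSON, preserving types, to dump/mydb/things.bson
mongorestore dump/                          # restores it with the original BSON types intact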
When I tried to import a CSV into Mongo Atlas, I ran into a similar issue. Here's how I dealt with it.
To avoid shell errors, you can enclose the fields in double quotes instead of escaping the parentheses.
In the example below I used two columns, "Name" and "Barcode". Use whatever columns you need, and don't forget to replace <connectionString>, <collectionName>, and <CSVpath> with your own values.
For more Mongo types, refer to the mongoimport documentation.
mongoimport --uri <connectionString> --collection <collectionName> --type csv --file <CSVpath> -f "Name.string(),Barcode.string()" --columnsHaveTypes
You can also put the column types in a field file to make this easier. Just make sure you specify all the columns in your field file.
In my case, I named it "field.txt".
In the field file, you write each column with its type as <column>.<type>. For the list of all types the mongoimport syntax supports, see https://www.mongodb.com/docs/database-tools/mongoimport/. Note that date_go() takes a Go-style layout, which must be written in terms of Go's reference time (2006-01-02 15:04:05).
field.txt
name.string()
usercode.int64()
city.string()
town.string()
address.string()
price.decimal()
date_created.date_go(2006-01-02 15:04:05)
You can name the file anything you want, as long as you point --fieldFile at it, e.g. --fieldFile=myfieldname.txt.
mongoimport --uri <connectionString> --collection <collectionName> --type csv --file <csv path> --columnsHaveTypes --fieldFile=field.txt --mode=insert

Mongoimport to merge/upsert fields

I'm trying to import and merge multiple CSVs into Mongo; however, documents are getting replaced rather than merged.
For example, if I have one.csv:
key1, first column, second column
and two.csv:
key1, third column
I would like to end up with:
key1, first column, second column, third column
But instead I'm getting:
key1,third column
Currently I'm using:
mongoimport.exe --type csv --file first.csv --fields key,firstColumn,secondColumn
mongoimport.exe --type csv --file second.csv --fields key,thirdColumn --upsert --upsertFields key
That's the way mongoimport works. There's an open feature request for merge imports, but for now you'll have to write your own import to get merge behavior.
A cross-collection workaround: the forEach method can be run on a dummy collection, and the resulting doc objects used to look up and update the documents in your target collection:
mongoimport.exe --collection mycoll --type csv --file first.csv --fields key,firstColumn,secondColumn
mongoimport.exe --collection dummy --type csv --file second.csv --fields key,third
db.dummy.find().forEach(function (doc) {
    db.mycoll.update({ key: doc.key }, { $set: { thirdcol: doc.third } });
});
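On MongoDB 4.2 or newer, you could instead let the server do the merge with an aggregation $merge stage (a sketch; note that $merge requires a unique index on the "on" field, here key):
db.dummy.aggregate([
    { $project: { _id: 0, key: 1, thirdcol: "$third" } },  // rename third -> thirdcol
    { $merge: { into: "mycoll", on: "key", whenMatched: "merge", whenNotMatched: "discard" } }
])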
That's correct: mongoimport --upsert replaces full documents rather than merging them.
You can achieve your goal by importing into a temporary collection and using the following Gist.
Load the script to Mongo Shell and run:
mergeCollections("srcCollectionName", "destCollectionName", {}, ["thirdColl"]);
I just had a very similar problem. There is a Node module for Mongo, and jline is my command-line Node tool for stream-processing JSON lines. So:
echo '{"page":"index.html","hour":"2015-09-18T21:00:00Z","visitors":1001}' |\
jline-foreach \
'beg::dp=require("bluebird").promisifyAll(require("mongodb").MongoClient).connectAsync("mongodb://localhost:27017/nginx")' \
'dp.then(function(db){
updates = {}
updates["visitors.hour."+record.hour] = record.visitors;
db.collection("pagestats").update({_id:record.page},{$set:updates},{upsert:true});});' \
'end::dp.then(function(db){db.close()})'
In your case you'd have to convert from CSV to JSON lines first by piping it through jline-csv2jl, which converts each CSV line into a dictionary with names taken from the header.
I have added this example to the manual: https://github.com/bitdivine/jline/blob/master/bin/foreach.md
I haven't used jline with promises much but so far it's OK.
Disclaimer: I am the author of jline.

how to import csv file in mongodb

I have a CSV file containing the following data and want to import it into MongoDB:
ID;"AdmissionID";"SeatNo";"RegistrationNo";"ResultDate";"ResultStatusId"
1;12;"2323";"23";07-05-2013;1
2;23;"35";"32";10-05-2013;5
This data is to be imported into MongoDB 2.2. I'm using the following command:
mongoimport -d test -c exam --type csv --headerline <f:\exam.csv
When I run it, I get the following error:
SyntaxError: missing ; before statement (shell):1
Please help me find the error.
This should do the trick:
mongoimport -d mydb -c collectionName --type csv --file myfile.csv --headerline
Your problem is the <f:\exam.csv bit, which by the looks of it is not properly escaped.
> --headerline
> If using --type csv or --type tsv, use the first line as field names. Otherwise, mongoimport will import the first line as a distinct document.
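In other words, run mongoimport from the operating-system prompt and pass the file via --file rather than shell redirection (a sketch using the asker's path):
mongoimport -d test -c exam --type csv --headerline --file "f:\exam.csv"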
Please try this command:
C:\Program Files\MongoDB\Server\3.2\bin>mongoimport -d pravin -c FOC --type csv --file D:\Script\ImportData\FOC.csv --headerline
2016-08-10T15:42:38.685+0530 connected to: localhost
2016-08-10T15:42:38.758+0530 imported 13 documents
I solved it by opening the .csv file in Excel (File -> Options -> Advanced), unchecking the "Use system separators" box, removing the comma from the box below, and then saving it as .csv again.
That way there are no stray commas in the .csv file, and the documents end up formatted correctly in MongoDB.
Run mongoimport from the system command line, not the mongo shell.
Ref - https://docs.mongodb.com/manual/reference/program/mongoimport/

mongoDB mongoimport upsert

I'm trying to do a bulk update with the following:
mongoimport -d my_db -c db_collection -upsertFields email ~/Desktop/update_list.csv
The CSV that I'm trying to import looks like this:
email, full_name
stack#overflow.com,stackoverflow
mongo#db.com,mongodb
It should use the email column as the query argument and update the full name accordingly. However, nothing was imported; it encountered errors:
exception:Failure parsing JSON string near: abc#sa
abc#sasa.com,abc
imported 0 objects
encountered 99398 errors
Where is the problem? How should I be doing this?
Your mongoimport command is missing the --upsert option, which is needed in combination with --upsertFields. Try:
mongoimport -d my_db -c db_collection --upsert --upsertFields email ~/Desktop/update_list.csv
Add --type csv; otherwise mongoimport assumes your input is JSON.
Also, it looks like you should pass --headerline so that the first line of the file is used as the field names.
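Putting the two fixes together, the full command would look something like this (a sketch):
mongoimport -d my_db -c db_collection --type csv --headerline --upsert --upsertFields email --file ~/Desktop/update_list.csv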
I assume that the data inside your CSV file must be double-quoted.