mongoimport choosing field type - mongodb

When importing data from a file (CSV in my case), mongoimport automatically chooses the data type for each field.
Is it possible to choose the data type manually for a specific field?
I ran into a situation where my file contains phone numbers, which I want to (and should) treat as strings, but mongoimport (quite properly) treats those phone numbers as numbers (NumberLong).

When importing CSV/TSV into MongoDB, the option --columnsHaveTypes can help define the column types, but the documentation is rather unclear; it took me several tries before I finally succeeded.
You should add the option --columnsHaveTypes and append a type to every column after --fields, remembering to put "\" before "(" and ")".
For example, change:
mongoimport -h foohost -d bardb -c fooc --type tsv --fields col1,col2,col3 --file path/to/file.txt
into
mongoimport -h foohost -d bardb -c fooc --type tsv --fields col1.int32\(\),col2.double\(\),col3.string\(\) --columnsHaveTypes --file path/to/file.txt

What you can do is import the data from CSV as-is, then run an update statement on the existing data in MongoDB to convert it into the format you want.
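A minimal sketch of such a conversion, assuming MongoDB 4.2+ (for pipeline updates) and hypothetical collection/field names "contacts"/"phone":
db.contacts.updateMany(
    { phone: { $type: "long" } },                     // only touch values imported as NumberLong
    [ { $set: { phone: { $toString: "$phone" } } } ]  // convert each to a string in place
);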

From version 3.4 onward, mongoimport supports specifying field types explicitly while importing data. See the link below:
https://docs.mongodb.com/manual/reference/program/mongoimport/#cmdoption--columnsHaveTypes

See the Type Fidelity section in the documentation:
mongoimport and mongoexport do not reliably preserve all rich BSON
data types because JSON can only represent a subset of the types
supported by BSON. As a result, data exported or imported with these
tools may lose some measure of fidelity. See MongoDB Extended JSON for
more information.
Use mongodump and mongorestore to preserve types.
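For example (database name and dump path assumed):
mongodump --db bardb --out /tmp/dump
mongorestore /tmp/dump
Unlike the CSV/JSON tools, this round-trips the data as BSON, so types come back exactly as they were stored.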

When I tried to import a CSV into MongoDB Atlas, I ran into a similar issue. Here's how I dealt with it.
To avoid shell errors, you can enclose the fields in double quotes.
In the example below I used two columns, "Name, Barcode". You can use whatever columns you need; also, don't forget to replace <connectionString>, <collectionName>, and <CSVpath> with your own values.
For more mongo types, refer to the mongoimport documentation.
mongoimport --uri <connectionString> --collection <collectionName> --type csv --file <CSVpath> -f "Name.string(),Barcode.string()" --columnsHaveTypes

You can also choose to put the column types in a field file to make things easier. Just make sure you specify all the columns in your field file.
In my case, I named it "field.txt".
In the field file, you write the columns with their types this way: <column>.<type>. For the list of all types used in the mongoimport syntax, see https://www.mongodb.com/docs/database-tools/mongoimport/
field.txt
name.string()
usercode.int64()
city.string()
town.string()
address.string()
price.decimal()
date_created.date_go(2006-01-02 15:04:05)
Note that date_go takes a Go-style layout (based on the reference time 2006-01-02 15:04:05), not an example date from your data.
You can name the field file anything you want, as long as you point --fieldFile at it, e.g. --fieldFile=myfieldname.txt.
mongoimport --uri <connectionString> --collection <collectionName> --type csv --file <csv path> --columnsHaveTypes --fieldFile=field.txt --mode=insert
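As a quick sanity check after the import (assuming MongoDB 4.0+ for countDocuments), you can inspect one document and count how many got the expected type:
db.getCollection("<collectionName>").findOne()
db.getCollection("<collectionName>").countDocuments({ usercode: { $type: "long" } })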

Related

Import csv file into MongoDB with arrays from a column

I have an Excel file that I converted to a CSV and imported into my running MongoDB instance, but there was trouble with one column of the data from the CSV file. One column, called Room, occasionally (but not always) contains values separated by a comma (e.g. "101, 103").
Running:
$ mongoimport -d mydb -c things --type csv --file locations.csv --headerline
gave no errors, but for documents that are supposed to have two values for Room, there was just one. For example, "101, 102" became "101," in the db.
Is there an option for mongoimport that allows me to specify an array for a certain column?
First you need to import the data from the CSV:
$ mongoimport -d mydb -c things --type csv --file locations.csv --headerline
After that, you just have to run:
db.things.find().snapshot().forEach(function (el) { el.Room = el.Room.split(',').map(function (s) { return s.trim(); }); db.things.save(el); });
This splits each comma-separated Room value into an array (trimming the stray spaces), which solves your problem.

mongoexport CSV without header fields

I have the below in a shell script to export certain fields from a mongo collection to a CSV file.
mongoexport --host localhost --db mydb --collection ratings --csv > data.csv --fields userId,filmId,score
My problem is that the generated result comes with the header values.
ex:
userId,filmId,score
517,533,5
518,534,5
Is there a way that I can generate the CSV file without the header fields?
The mongoexport utility is very spartan and does not support a load of features. The intention is instead that you augment it with other available OS commands, or, if you really must, create your own code for explicit needs.
But this sample using tail makes it quite simple to skip the first emitted header line, considering that all output goes to STDOUT by default anyway:
mongoexport --host localhost --db mydb --collection ratings \
--csv --fields userId,filmId,score \
| tail -n+2 > data.csv
So it just "pipes through" | to the tail command with the -n+2 option, which basically says "skip the first line", and then you redirect > the output to the file you want.
Just like most command-line utilities, there is no need to build in options that can be performed with other common utilities in such a chained pattern as above. That is why no such option is built in.
Since version 3.4 you can add --noHeaderLine as option within the command.
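For example, the same export as above (using --type=csv, which replaced the deprecated --csv flag):
mongoexport --host localhost --db mydb --collection ratings \
--type=csv --fields userId,filmId,score \
--noHeaderLine --out data.csv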

Mongoimport to merge/upsert fields

I'm trying to import and merge multiple CSVs into mongo; however, documents are getting replaced rather than merged.
For example, if I have one.csv:
key1, first column, second column
and two.csv:
key1, third column
I would like to end up with:
key1, first column, second column, third column
But instead I'm getting:
key1,third column
Currently I'm using:
mongoimport.exe --type csv --file first.csv --fields key,firstColumn,secondColumn
mongoimport.exe --type csv --file second.csv --fields key,thirdColumn --upsert --upsertFields key
That's the way mongoimport works. There's an existing feature request for merge imports, but for now you'll have to write your own import to get merge behavior.
Cross-collection workaround: the forEach method can be run on a dummy collection, and the resulting doc objects used to search/update your desired collection:
mongoimport.exe --collection mycoll --type csv --file first.csv --fields key,firstColumn,secondColumn
mongoimport.exe --collection dummy --type csv --file second.csv --fields key,third
db.dummy.find().forEach(function(doc) {db.mycoll.update({key:doc.key},{$set:{thirdcol:doc.third}})})
That's correct: mongoimport --upsert replaces full documents rather than merging them.
You can achieve your goal by importing into a temporary collection and using the following Gist.
Load the script into the Mongo Shell and run:
mergeCollections("srcCollectionName", "destCollectionName", {}, ["thirdColl"]);
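If you don't have the Gist handy, here is a minimal sketch of what such a helper could look like (a hypothetical implementation, not the actual Gist; it matches documents on _id, so adjust the match if your merge key is another field):
function mergeCollections(srcName, destName, query, fields) {
    // copy only the listed fields from each matching source doc into the destination
    db.getCollection(srcName).find(query).forEach(function (doc) {
        var updates = {};
        fields.forEach(function (f) {
            if (doc[f] !== undefined) updates[f] = doc[f];
        });
        db.getCollection(destName).update({ _id: doc._id }, { $set: updates }, { upsert: true });
    });
}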
I just had a very similar problem. There is a node module for mongo, and jline is my command-line node tool for stream-processing JSON lines. So:
echo '{"page":"index.html","hour":"2015-09-18T21:00:00Z","visitors":1001}' |\
jline-foreach \
'beg::dp=require("bluebird").promisifyAll(require("mongodb").MongoClient).connectAsync("mongodb://localhost:27017/nginx")' \
'dp.then(function(db){
updates = {}
updates["visitors.hour."+record.hour] = record.visitors;
db.collection("pagestats").update({_id:record.page},{$set:updates},{upsert:true});});' \
'end::dp.then(function(db){db.close()})'
In your case you'd have to convert from CSV to JSON lines first by piping the file through jline-csv2jl. That converts each CSV line into a dictionary, with names taken from the header.
I have added this example to the manual: https://github.com/bitdivine/jline/blob/master/bin/foreach.md
I haven't used jline with promises much but so far it's OK.
Disclaimer: I am the author of jline.

MongoDB: mongoimport seems to ignore blank field value in first position of tsv file

mongodb version v1.8.0 on Mac 10.6.8.
I'm trying to run a mongoimport on a TSV (tab separated value) file.
For some reason, it's ignoring blank fields even though I'm not using the --ignoreBlanks switch. At least, I think that's what's happening.
You can download my test file here: http://pastebin.com/9XzbDfgP
Here's my mongoimport command:
mongoimport --drop --headerline --type tsv -d movies -c performances --file ~/Desktop/100performance.tsv
So what happens is that it ends up importing values into the wrong field names (headers), and it leaves off some of the fields. It's having trouble with the blank fields: I populated some of them and it seemed to do better. That's obviously not a real fix, though.
Ideas?
This is working for me now using mongodb/mongoimport v2.0.1.

mongoDB mongoimport upsert

I'm trying to do a bulk update with the following:
mongoimport -d my_db -c db_collection -upsertFields email ~/Desktop/update_list.csv
The CSV that I'm trying to import looks like this:
email, full_name
stack#overflow.com,stackoverflow
mongo#db.com,mongodb
It should use the email column as the query argument and update the full name accordingly. However, none were imported; it encountered errors:
exception:Failure parsing JSON string near: abc#sa
abc#sasa.com,abc
imported 0 objects
encountered 99398 errors
Where is the problem? How should I be doing it?
Your mongoimport command is missing the --upsert option, which is needed in combination with --upsertFields. Try:
mongoimport -d my_db -c db_collection --upsert --upsertFields email ~/Desktop/update_list.csv
Add --type csv.
Otherwise it assumes your input is JSON.
Also, it looks like you should pass --headerline to make it use the first line of the file as the header.
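Putting the two answers together, the full command would look something like this (passing the file via --file for clarity):
mongoimport -d my_db -c db_collection --type csv --headerline --upsert --upsertFields email --file ~/Desktop/update_list.csv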
I assume that the data inside your CSV file must be double-quoted.