mongoimport: append source field - mongodb

With mongoimport I import data from several external instances.
Does mongoimport allow me to add a field like source:"where-the-data-comes-from" to each imported document?
I.e., if I import the data of server A and B, I would like to store source:"A" or source:"B" in each document.

No, mongoimport cannot do this by itself. However, you can do it from the command line. Create a file 'header.txt' (you can generate it from your existing CSV) by running
cat <(head -1 test.csv | tr "," "\n") <(echo source) > header.txt
header.txt should look like this:
field_a
field_b
.......
source
*Note: I have appended a 'source' field at the end of this header file.
Now you can run the command (assuming you have sed installed)
sed 's/$/,source-a/' test.csv | mongoimport -d test-db -c test-cl --type csv --fieldFile header.txt
If your CSV already has a header line, run
sed '1d' test.csv | sed 's/$/,source-a/' | mongoimport -d test-db -c test-cl --type csv --fieldFile header.txt
instead, where 'source-a' is the label you want attached to this import.
You can easily script this in bash so that you only supply the source label and the CSV file for each import job, as in the sketch below.
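For example, a minimal wrapper script (the file, database and collection names are the hypothetical ones used above; adjust as needed):
#!/usr/bin/env bash
# usage: ./import.sh <source-label> <csv-file>, e.g. ./import.sh source-a test.csv
src="$1"
csv="$2"
# build the field file: the existing header fields plus the extra 'source' field
cat <(head -1 "$csv" | tr "," "\n") <(echo source) > header.txt
# drop the header line, append the source label to every row, and import
sed '1d' "$csv" | sed "s/\$/,$src/" | mongoimport -d test-db -c test-cl --type csv --fieldFile header.txt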

Related

Skipping first column data in CSV while using copy command in PostgreSQL

I have a pipe-delimited data file with no headers. I need to import data into a PostgreSQL table starting from the second column in the file, i.e. skip the data before the first '|' on each line. How do I achieve this using the COPY command?
Use the cut command to remove the first column and then import. (Since the file has no header line, the HEADER option is omitted from \copy.)
cut -d "|" -f 2- file1.csv > file2.csv
psql -d test -h localhost -c "\copy table(f1,f2,f3) from 'file2.csv' delimiter '|' csv;"
This is not so much a PostgreSQL-specific answer as one about command-line tools.
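For illustration, here is what the cut step does to a hypothetical input line:
$ echo 'skip_me|value1|value2' | cut -d "|" -f 2-
value1|value2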

How to use 'sed' to find and replace values within a tsv file?

I am currently working with a large .tsv.gz file that contains two columns and looks something like this:
xxxyyy 408261
yzlsdf 408260null408261
zlkajd 408258null408259null408260
asfzns 408260
What I'd like to do is find all the rows that contain "null" and replace it with a comma ",". So that the result would look like this:
xxxyyy 408261
yzlsdf 408260,408261
zlkajd 408258,408259,408260
asfzns 408260
I have tried using the following command, but it did not work:
sed -i 's/null/,/g' 46536657_1748327588_combined_copy.tsv.gz
Unzipping the file and trying the same thing on the plain .tsv also did not work.
I've also tried opening the unzipped file in a text editor to manually find and replace, but the file is so huge that the editor crashes.
Try:
zcat comb.tsv.gz | sed 's/null/,/g' | gzip >new_comb.tsv.gz && mv new_comb.tsv.gz comb.tsv.gz
Because this avoids unzipping your file all at once, it should save on memory.
Example
Let's start with this sample file:
$ zcat comb.tsv.gz
xxxyyy 408261
yzlsdf 408260null408261
zlkajd 408258null408259null408260
asfzns 408260
Next, we run our command:
$ zcat comb.tsv.gz | sed 's/null/,/g' | gzip >new_comb.tsv.gz && mv new_comb.tsv.gz comb.tsv.gz
By looking at the output file, we can see that the substitutions were made:
$ zcat comb.tsv.gz
xxxyyy 408261
yzlsdf 408260,408261
zlkajd 408258,408259,408260
asfzns 408260

replace capture match with capture group in bash GNU sed

I've looked around for a solution to my problem in the other posts listed below, but it looks like my regex is quite different and needs special care:
How to output only captured groups with sed
Replace one capture group with another with GNU sed (macOS) 4.4
sed replace line with capture groups
I'm trying to replace a regex match group in a big JSON file.
My file contains mongoDB-exported objects, and I'm trying to replace each ObjectId wrapper with just the ID string:
{"_id":{"$oid":"56cad2ce0481320c111d2313"},"recordId":{"$oid":"56cad2ce0481320c111d2313"}}
So the output in the original file should look like this:
{"_id":"56cad2ce0481320c111d2313","recordId":"56cad2ce0481320c111d2313"}
That's the command I run in the shell:
sed -i 's/(?:{"\$oid":)("\w+")}/\$1/g' data.json
I get no error, but the file remains the same.
What exactly am I doing wrong?
Finally I've managed to make it work; the regex syntax sed uses is different from the one in the regexr.com tester tool.
echo '{"$oid":"56cad2ce0481320c111d2313"}' | sed 's/{"$oid":\("\w*"\)}/\1/g'
gives the correct output:
"56cad2ce0481320c111d2313"
I found it even better to read from stdin and write straight to the output file, instead of first writing to a JSON file and then reading, replacing and writing again.
Since I use mongoexport to export the collection, replace the ObjectIds and write the output to a JSON file, my final solution looks like this:
mongoexport --host localhost --db myDB --collection my_collection | sed 's/{"$oid":\("\w*"\)}/\1/g' >> data.json
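To sanity-check, running that substitution over the sample document from the question (GNU sed, whose \w matches word characters) gives the desired output:
$ echo '{"_id":{"$oid":"56cad2ce0481320c111d2313"},"recordId":{"$oid":"56cad2ce0481320c111d2313"}}' | sed 's/{"$oid":\("\w*"\)}/\1/g'
{"_id":"56cad2ce0481320c111d2313","recordId":"56cad2ce0481320c111d2313"}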

How to copy a csv file from a url to Postgresql

Is there any way to use the COPY command for batch data import and read the data from a URL? For example, COPY has syntax like:
COPY sample_table
FROM 'C:\tmp\sample_data.csv' DELIMITER ',' CSV HEADER;
What I want is to give a URL rather than a local path. Is there any way?
It's pretty straightforward, provided you have an appropriate command-line tool available:
COPY sample_table FROM PROGRAM 'curl "http://www.example.com/file.csv"'
Since you appear to be on Windows, I think you'll need to install curl or wget yourself; there are examples online of setting up wget on Windows that may be useful.
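For completeness, a sketch of the same idea with wget and the CSV options from the question (this assumes wget is available on the database server's PATH; note that COPY ... FROM PROGRAM runs on the server, not the client, and requires superuser privileges, PostgreSQL 9.3+):
COPY sample_table FROM PROGRAM 'wget -qO- "http://www.example.com/file.csv"' DELIMITER ',' CSV HEADER;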
My solution is
cat $file |
tail -$numberLine |
sed 's/ / ,/g' |
psql -q -d $dataBaseName -c "COPY tableName FROM STDIN DELIMITER ','"
You can insert an awk command between sed and psql to add a missing column; this is useful if you already know what value to put in it. Passing the value in as an awk variable keeps the quoting simple:
awk -v extra="$info_about_missing_column" '{print $0" , "extra}'
I have done this; it works, and it is faster than using INSERT.
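Putting it together, a sketch of the complete pipeline with the awk step included (all names are placeholders from the answer above):
cat "$file" |
tail -n "$numberLine" |
sed 's/ / ,/g' |
awk -v extra="$info_about_missing_column" '{print $0" , "extra}' |
psql -q -d "$dataBaseName" -c "COPY tableName FROM STDIN DELIMITER ','"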

Convert pipe delimited csv to tab delimited using batch script

I am trying to write a batch script that will query a Postgres database and output the results to a csv. Currently, it queries the database and saves the output as a pipe delimited csv.
I want the output to be tab delimited rather than pipe delimited, since I will eventually be importing the csv into Access. Does anyone know how this can be achieved?
Current code:
cd C:\Program Files\PostgreSQL\9.1\bin
psql -c "SELECT * from jivedw_day;" -U postgres -A -o sample.csv cscanalytics
postgres = username
cscanalytics = database
You should be using COPY to dump CSV:
psql -c "copy jivedw_day to stdout csv delimiter E'\t'" -o sample.csv -U postgres -d csvanalytics
The delimiter E'\t' part will get you your output with tabs instead of commas as the delimiter. There are other other options as well, please see the documentation for further details.
Using -A like you are just dumps the usual interactive output to sample.csv without the normal padding to making the columns line up, that's why you're seeing the pipes:
-A
--no-align
Switches to unaligned output mode. (The default output mode is otherwise aligned.)
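As an aside, a sketch of the same COPY using the parenthesized option syntax (PostgreSQL 9.0+):
psql -c "copy jivedw_day to stdout (format csv, delimiter E'\t')" -o sample.csv -U postgres -d cscanalytics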