Iterating over output of jq with a for loop in bash [duplicate] - mongodb

This question already has answers here:
How can I loop over the output of a shell command?
(4 answers)
Closed 2 years ago.
Hello fellow developers !
I'm quite having a hard time on this one, so your help would be good :)
I'm creating a script shell that will periodically (thanks to cron) do a synchronisation between an offline mongodb to firebase.
I use jq to treat the export of my mongodb, but this one is acting strangely on strings values containing whitespaces.
For example, I've got this exported document from a mongodb collection named users (using --jsonArray as mongoexport option), file named mycollectionexported.json:
[{"_id":{"$oid":"5fc253c493b9f7363c8d1011"},"emailVerified":true,"disabled":false,"isKickstarter":false,"isPremiun":false,"gender":"Male","nationality":"None","type":"admin","optMarketing":false,"optNewsletter":false,"cguValidated":true,"providers":[],"email":"test#reflect.com","displayName":"John Doe","phoneNumber":"0644080050","homeAddress":{"street":"rue Pasteur","number":24,"zipcode":94270,"city":"Le Kremlin-Bicêtre","country":"France"},"familyId":{"$oid":"5fc253c493b9f7363c8d1010"},"createdAt":{"$date":"2020-11-28T13:42:28.51Z"},"updatedAt":{"$date":"2020-11-28T13:42:28.51Z"},"__v":0}]
You can see that for keys like "displayName", "homeAddress.street" or "homeAddress.city", I've got whitespaces in the values.
When I try to store my newly exported collection into a classic variable using jq, and display it on the stdout, jq is acting quite strangely.
Script executed:
#!/bin/bash
documents=$(jq -c ".[]" ./mycollectionexported.json)
for document in $documents; do
echo ""
echo $document
done
Output:
{"_id":{"$oid":"5fc253c493b9f7363c8d1011"},"emailVerified":true,"disabled":false,"isKickstarter":false,"isPremiun":false,"gender":"Male","nationality":"None","type":"admin","optMarketing":false,"optNewsletter":false,"cguValidated":true,"providers":[],"email":"test#reflect.com","displayName":"John
Doe","phoneNumber":"0644080050","homeAddress":{"street":"rue
Pasteur","number":24,"zipcode":94270,"city":"Le
Kremlin-Bicêtre","country":"France"},"familyId":{"$oid":"5fc253c493b9f7363c8d1010"},"createdAt":{"$date":"2020-11-28T13:42:28.51Z"},"updatedAt":{"$date":"2020-11-28T13:42:28.51Z"},"__v":0}
Expected (no newlines):
{"_id":{"$oid":"5fc253c493b9f7363c8d1011"},"emailVerified":true,"disabled":false,"isKickstarter":false,"isPremiun":false,"gender":"Male","nationality":"None","type":"admin","optMarketing":false,"optNewsletter":false,"cguValidated":true,"providers":[],"email":"test#reflect.com","displayName":"John Doe","phoneNumber":"0644080050","homeAddress":{"street":"rue Pasteur","number":24,"zipcode":94270,"city":"Le Kremlin-Bicêtre","country":"France"},"familyId":{"$oid":"5fc253c493b9f7363c8d1010"},"createdAt":{"$date":"2020-11-28T13:42:28.51Z"},"updatedAt":{"$date":"2020-11-28T13:42:28.51Z"},"__v":0}
How is that possible ?
There is no such informations about that in the jq documentation.
When I do it with a json file containing no whitespaces in string values, it works as expected...
Thank you for your help and futures answers :)

If the output from jq is a single line, why not just:
echo "$documents"
Or, if the result of the call to jq is a stream that might have more than one item, and if you want each item in the stream to be available as a bash variable, then if your bash is sufficiently up-to-date, you could use mapfile (aka readarray); otherwise, you could consider using a bash while loop, e.g. along the lines of:
declare -a document
while read -r item ; do
document+=("$item")
done < <(jq -c ....)
The idea here is that invoking jq with the -c option ensures that the only raw newlines that jq emits will be the ones demarking the items in the stream.

Related

Variable not being recognized after "read"

-- Edit : Resolved. See answer.
Background:
I'm writing a shell that will perform some extra actions required on our system when someone resizes a database.
The shell is written in ksh (requirement), the OS is Solaris 5.10 .
The problem is with one of the checks, which verifies there's enough free space on the underlying OS.
Problem:
The check reads the df -k line for root, which is what I check in this step, and prints it to a file. I then "read" the contents into variables which I use in calculations.
Unfortunately, when I try to run an arithmetic operation on one of the variables, I get an error indicating it is null. And a debug output line I've placed after that line verifies that it is null... It lost it's value...
I've tried every method of doing this I could find online, they work when I run it manually, but not inside the shell file.
(* The file does have #!/usr/bin/ksh)
Code:
df -k | grep "rpool/ROOT" > dftest.out
RPOOL_NAME=""; declare -i TOTAL_SIZE=0; USED_SPACE=0; AVAILABLE_SPACE=0; AVAILABLE_PERCENT=0; RSIGN=""
read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN < dftest.out
\rm dftest.out
echo $RPOOL_NAME $TOTAL_SIZE $USED_SPACE $AVAILABLE_SPACE $AVAILABLE_PERCENT $RSIGN
((TOTAL_SIZE=$TOTAL_SIZE/1024))
This is the result:
DBResize.sh[11]: TOTAL_SIZE=/1024: syntax error
I'm pulling hairs at this point, any help would be appreciated.
The code you posted cannot produce the output you posted. Most obviously, the error is signalled at line 11 but you posted fewer than 11 lines of code. The previous lines may matter. Always post complete code when you ask for help.
More concretely, the declare command doesn't exist in ksh, it's a bash thing. You can achieve the same result with typeset (declare is a bash equivalent to typeset, but not all options are the same). Either you're executing this script with bash, or there's another error message about declare, or you've defined some additional commands including declare which may change the behavior of this code.
None of this should have an impact on the particular problem that you're posting about, however. The variables created by read remain assigned until the end of the subshell, i.e. until the code hits a ), the end of a pipe (left-hand side of the pipe only in ksh), etc.
About the use of declare or typeset, note that you're only declaring TOTAL_SIZE as an integer. For the other variables, you're just assigning a value which happens to consist exclusively of digits. It doesn't matter for the code you posted, but it's probably not what you meant.
One thing that may be happening is that grep matches nothing, and therefore read reads an empty line. You should check for errors. Use set -e in scripts to exit at the first error. (There are cases where set -e doesn't catch errors, but it's a good start.)
Another thing that may be happening is that df is splitting its output onto multiple lines because the first column containing the filesystem name is too large. To prevent this splitting, pass the option -P.
Using a temporary file is fragile: the code may be executed in a read-only directory, another process may want to access the same file at the same time... Here a temporary file is useless. Just pipe directly into read. In ksh (unlike most other sh variants including bash), the right-hand side of a pipe runs in the main shell, so assignments to variables in the right-hand side of a pipe remain available in the following commands.
It doesn't matter in this particular script, but you can use a variable without $ in an arithmetic expression. Using $ substitutes a string which can have confusing results, e.g. a='1+2'; $((a*3)) expands to 7. Not using $ uses the numerical value (in ksh, a='1+2'; $((a*3)) expands to 9; in some sh implementations you get an error because a's value is not numeric).
#!/usr/bin/ksh
set -e
typeset -i TOTAL_SIZE=0 USED_SPACE=0 AVAILABLE_SPACE=0 AVAILABLE_PERCENT=0
df -Pk | grep "rpool/ROOT" | read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN
echo $RPOOL_NAME $TOTAL_SIZE $USED_SPACE $AVAILABLE_SPACE $AVAILABLE_PERCENT $RSIGN
((TOTAL_SIZE=TOTAL_SIZE/1024))
Strange...when I get rid of your "declare" line, your original code seems to work perfectly well (at least with ksh on Linux)
The code :
#!/bin/ksh
df -k | grep "/home" > dftest.out
read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN < dftest.out
\rm dftest.out
echo $RPOOL_NAME $TOTAL_SIZE $USED_SPACE $AVAILABLE_SPACE $AVAILABLE_PERCENT $RSIGN
((TOTAL_SIZE=$TOTAL_SIZE/1024))
print $TOTAL_SIZE
The result :
32962416 5732492 25552588 19% /home
5598
Which are the value a simple df -k is returning. The variables seem to last.
For those interested, I have figured out that it is not possible to use "read" the way I was using it.
The variable values assigned by "read" simply "do not last".
To remedy this, I have applied the less than ideal solution of using the standard "while read" format, and inside the loop, echo selected variables into a variable file.
Once said file was created, I just "loaded" it.
(pseudo code:)
LOOP START
echo "VAR_A="$VAR_A"; VAR_B="$VAR_B";" > somefile.out
LOOP END
. somefile.out

mongoimport: set type for all fields when importing CSV

I have multiple problems with importing a CSV with mongoimport that has a headerline.
Following is the case:
I have a big CSV file with the names of the fields in the first line.
I know you can set this line to use as field names with: --headerline.
I want all field types to be strings, but mongoimport sets the types automatically to what it looks like.
IDs such as 0001 will be turn into 1, which can have bad side effects.
Unfortunately, there is (as far as i know) no way of setting them as string with a single command, but by naming each field and setting it type with
--columnsHaveTypes --fields "name.string(), ... "
When I did that, the next problem appeared.
The headerline (with all field names) got imported as values in a separate document.
So basically, my questions are:
Is there a way of setting all field types as string using the --headerline command ?
Alternative, is there a way to ignore the first line ?
I had this problem when uploading 41 million record CSV file into mongodb.
./mongoimport -d testdb -c testcollection --type csv --columnsHaveTypes -f
"RECEIVEDDATE.date(2006-01-02 15:04:05)" --file location/test.csv
As above we have a command to upload file with data types called '-f' or '--fields' but when we use this command to the file that contain header line, mondodb upload first row as well i.e header lines row then its leads error 'cannot convert to datatype' or upload column names also as data set.
Unfortunately we cannot use '--headerline' command instead of '--fields'.
Here the solutions that I found for this problem.
1)Remove header column and upload using '--fields' command as above command. if you re use linux environment you can use below command to remove first row of the huge file i.e header line.it took 2-3 mints for me.(depending on the machine performance)
sed -i -e "1d" location/test.csv
2)upload the file using '--headerline' command then mongodb uploads the file with its default identified data types.Then open mongodb shell command use testdb then run javascript command that get each record and change it into specific data types.But if you have huge file this will takes time.
found this solution from stackoverflow
db.testcollection.find().forEach( function (x) {
x.RECEIVEDDATE = new Date(x.RECEIVEDDATE ); db.testcollection .save(x);});
If you wanna remove the unnecessary rows that not fit to data type use below command.
mongodb document
'--parseGrace skipRow'
I found a solution, that I am comfortable with
Basically, I wanted to use mongoimport within my Clojure Code to import a CSV file in the DB and do a lot of stuff with it automatically. Due to the above mentioned problems I had to find a workaround, to delete this wrong document.
I did following to "solve" this problem:
To set the types as I wanted, I wrote a function to read the first line, put it in a vector and then used String concatenation to set these as fields.
Turning this: id,name,age,hometown,street
into this: id.string(),name.string(),age.string() etc
Then I used the values from the vector to identify the document with
{ name : "name"
age : "age"
etc : "etc" }
and then deleted it with a simple remving.find() command.
Hope this helps any dealing with the same kind of problem.
https://docs.mongodb.com/manual/reference/program/mongoimport/#example-csv-import-types reads:
MongoDB 3.4 added support for specifying field types. Specify field names and types in the form .() using --fields, --fieldFile, or --headerline.
so your first line within the csv file should have names with types. e.g.:
name.string(), ...
and the mongoimport parameters
--columnsHaveTypes --headerline --file <filename.csv>
As to the question of how to remove the first line, you can employ pipes. mongoimport reads from STDIN if no --file option passed. E.g.:
tail -n+2 <filename.csv> | mongoimport --columnsHaveTypes --fields "name.string(), ... "

cURL filling up an HTML form with tcl

I need to make a Tcl program that logs into a web page and i need to fill up some information and get some information.
The page has lots of forms with diferent types of input, radio/check buttons, entry strings etc the usual.
i can log into the page no problem and fill up the forms without a problem but i have to fill EVERY input for that particular form or else it will be save as empty (the things i didnt specify)
Heres an example:
this is the form:
--- FORM report. Uses POST to URL "/goform/FormUpdateBridgeConfiguration"
Input: NAME="management_ipaddr" (TEXT)
Input: NAME="management_mask" (TEXT)
Input: NAME="upstr_addr_type" VALUE="DHCP" (RADIO)
Input: NAME="upstr_addr_type" VALUE="STATIC" (RADIO)
--- end of FORM
and this is the command i use to fill it up
eval exec curl $params -d upstr_addr_type=STATIC https://$MIP/goform/FormUpdateBridgeConfiguration -o /dev/null 2> /dev/null
where params is:
"\--noproxy $MIP \--connect-timeout 5 \-m 5 \-k \-S \-s \-d \-L \-b Data/curl_cookie_file "
yes i know is horrible but it is what it is .
In this case i want to change the value of upstr_addr_type to STATIC but when i sumit it i lose the info from management_ipaddr and management_mask.
This is a small example, i have to do this for every form and a gizillion more variables so its a real problem for me.
i figure its concept problem or something like that, i look and look and look some more, try -F -X GET -GET -almost every thing on cURL manual, can someone guide me here
If you know what the values of management_ipaddr and management_mask should be, you can just supply them as extra -d arguments. It probably makes sense to wrap this in a procedure
proc UpdateBridgeConfiguration {management_ipaddr management_mask upstr_addr_type} {
global MIP params
eval exec curl $params \
-d management_ipaddr=$management_ipaddr \
-d management_mask=$management_mask \
-d upstr_addr_type=$upstr_addr_type \
"https://$MIP/goform/FormUpdateBridgeConfiguration" \
-o /dev/null 2> /dev/null
# You ought to replace the first line of the above call with:
# exec curl {*}$params \
# Provided you're not on Tcl 8.4 or before...
}
Like that, you'll find it much easier to get the call correct. (You shouldn't need to specify -X POST for this; it's default behaviour when -d is provided.)
To get the existing values, you'll need to GET them from the right URL (which I can't guess for you) and extract them from the resulting HTML. Which might involve using a regular expression against the retrieved document. This is pretty awful, but it's what you're stuck with sometimes. (You can use tDOM to parse HTML properly — provided it isn't too ill-formed — and then use its XPath support to query for the values correctly, but that's rather more complex and introduces a dependency on an external package.) Knowing what the right RE to use is can be tricky, but it is likely to involve grabbing a copy of the form and doing something vaguely like this:
regexp -nocase {<input type="text" name="management_ipaddr" value="([^<"">]*)"} $formSource -> management_ipaddr
regexp -nocase {<input type="text" name="management_mask" value="([^<"">]*)"} $formSource -> management_mask
While in general it could be encoded all sorts of ways, that's very unlikely for IP addresses or masks! On the other hand, the order of the attributes can vary; you have to customize your RE to what you're really dealing with, not merely what it might be…
The curl invokation will be something like
set formSource [exec curl "http://$MIP/the/right/url/here" {*}$params 2>/dev/null]
It's much simpler when you're not having to send data up and you want to consume the result.

updating table rows based on txt file

I have been searching but so far I only found how to insert date into tables based on a csv files.
I have the following scenario:
Directory name = ticketID
Inside this directory I have a couple of files, like:
Description.txt
Summary.txt - Contains ticket header and has been imported succefully.
Progress_#.txt - this is everytime a ticket gets udpdated. I get a new file.
Solution.txt
Importing the Issue.txt was easy since this was actually a CSV.
Now my problem is with Description and Progress files.
I need to update the existing rows with the data from this files. Something on the line of
update table_ticket set table_ticket.description = Description.txt where ticket_number = directoryname
I'm using PostgreSQL and the COPY command is valid for new data and it would still fail due to the ',;/ special chars.
I wanted to do this using bash script, but it seem that it is it won't be possible:
for i in `find . -type d`
do
update table_ticket
set table_ticket.description = $i/Description.txt
where ticket_number = $i
done
Of course the above code would take into consideration connection to the database.
Anyone has a idea on how I could achieve this using shell script. Or would it be better to just make something in Java and read and update the record, although I would like to avoid this approach.
Thanks
Alex
Thanks for the answer, but I came across this:
psql -U dbuser -h dbhost db
\set content = `cat PATH/Description.txt`
update table_ticket set description = :'content' where ticketnr = TICKETNR;
Putting this into a simple script I created the following:
#!/bin/bash
for i in `find . -type d|grep ^./CS`
do
p=`echo $i|cut -b3-12 -`
echo $p
sed s/PATH/${p}/g cmd.sql > cmd.tmp.sql
ticketnr=`echo $p|cut -b5-10 -`
sed -i s/TICKETNR/${ticketnr}/g cmd.tmp.sql
cat cmd.tmp.sql
psql -U supportAdmin -h localhost supportdb -f cmd.tmp.sql
done
The downside is that it will create always a new connection, later I'll change to create a single file
But it does exactly what I was looking for, putting the contents inside a single column.
psql can't read the file in for you directly unless you intend to store it as a large object in which case you can use lo_import. See the psql command \lo_import.
Update: #AlexandreAlves points out that you can actually slurp file content in using
\set myvar = `cat somefile`
then reference it as a psql variable with :'myvar'. Handy.
While it's possible to read the file in using the shell and feed it to psql it's going to be awkward at best as the shell offers neither a native PostgreSQL database driver with parameterised query support nor any text escaping functions. You'd have to roll your own string escaping.
Even then, you need to know that the text encoding of the input file is valid for your client_encoding otherwise you'll insert garbage and/or get errors. It quickly lands up being easier to do it in a langage with proper integration with PostgreSQL like Python, Perl, Ruby or Java.
There is a way to do what you want in bash if you really must, though: use Pg's delimited dollar quoting with a randomized delimiter to help prevent SQL injection attacks. It's not perfect but it's pretty darn close. I'm writing an example now.
Given problematic file:
$ cat > difficult.txt <__END__
Shell metacharacters like: $!(){}*?"'
SQL-significant characters like "'()
__END__
and sample table:
psql -c 'CREATE TABLE testfile(filecontent text not null);'
You can:
#!/bin/bash
filetoread=$1
sep=$(printf '%04x%04x\n' $RANDOM $RANDOM)
psql <<__END__
INSERT INTO testfile(filecontent) VALUES (
\$x${sep}\$$(cat ${filetoread})\$x${sep}\$
);
__END__
This could be a little hard to read and the random string generation is bash specific, though I'm sure there are probably portable approaches.
A random tag string consisting of alphanumeric characters (I used hex for convenience) is generated and stored in seq.
psql is then invoked with a here-document tag that isn't quoted. The lack of quoting is important, as <<'__END__' would tell bash not to interpret shell metacharacters within the string, wheras plain <<__END__ allows the shell to interpret them. We need the shell to interpret metacharacters as we need to substitute sep into the here document and also need to use $(...) (equivalent to backticks) to insert the file text. The x before each substitution of seq is there because here-document tags must be valid PostgreSQL identifiers so they must start with a letter not a number. There's an escaped dollar sign at the start and end of each tag because PostgreSQL dollar quotes are of the form $taghere$quoted text$taghere$.
So when the script is invoked as bash testscript.sh difficult.txt the here document lands up expanding into something like:
INSERT INTO testfile(filecontent) VALUES (
$x0a305c82$Shell metacharacters like: $!(){}*?"'
SQL-significant characters like "'()$x0a305c82$
);
where the tags vary each time, making SQL injection exploits that rely on prematurely ending the quoting difficult.
I still advise you to use a real scripting language, but this shows that it is indeed possible.
The best thing to do is to create a temporary table, COPY those from the files in question, and then run your updates.
Your secondary option would be to create a function in a language like pl/perlu and do this in the stored procedure, but you will lose a lot of performance optimizations that you can do when you update from a temp table.

Can kdb read from a named pipe?

I hope I'm doing something wrong, but it seems like kdb can't read data from named pipes (at least on Solaris). It blocks until they're written to but then returns none of the data that was written.
I can create a text file:
$ echo Mary had a little lamb > lamb.txt
and kdb will happily read it:
q) read0 `:/tmp/lamb.txt
enlist "Mary had a little lamb"
I can create a named pipe:
$ mkfifo lamb.pipe
and trying to read from it:
q) read0 `:/tmp/lamb.pipe
will cause kdb to block. Writing to the pipe:
$ cat lamb.txt > lamb.pipe
will cause kdb to return the empty list:
()
Can kdb read from named pipes? Should I just give up? I don't think it's a permissions thing (I tried setting -m 777 on my mkfifo command but that made no difference).
With release kdb+ v3.4 Q has support for named pipes: Depending on whether you want to implement a streaming algorithm or just read from the pipe use either .Q.fps or read1 on a fifo pipe:
To implement streaming you can do something like:
q).Q.fps[0N!]`:lamb.pipe
Then $ cat lamb.txt > lamb.pipe
will print
,"Mary had a little lamb"
in your q session. More meaningful algorithms can be implemented by replacing 0N! with an appropriate function.
To read the context of your file into a variable do:
q)h:hopen`:fifo://lamb.pipe
q)myText: `char$read1(h)
q)myText
"Mary had a little lamb\n"
See more about named pipes here.
when read0 fails, you can frequently fake it with system"cat ...". (i found this originally when trying to read stuff from /proc that also doesn't cooperate with read0.)
q)system"cat /tmp/lamb.pipe"
<blocks until you cat into the pipe in the other window>
"Mary had a little lamb"
q)
just be aware there's a reasonably high overhead (as such things go in q) for invoking system—it spawns a whole shell process just to run whatever your command is
you might also be able to do it directly with a custom C extension, probably calling read(2) directly…
The algorithm for read0 is not available to see what it is doing under the hood but, as far as I can tell, it expects a finite stream and not a continuous one; so it will will block until it receives an EOF signal.
Streaming from pipe is supported from v3.4
Details steps:
Check duplicated pipe file
rm -f /path/dataPipeFileName
Create named pipe
mkfifo /path/dataPipeFileName
Feed data
q).util.system[$1]; $1=command to fetch data > /path/dataPipeFileName &
Connect pipe using kdb .Q.fps
q).Q.fps[0N!]`$":/path/",dataPipeFileName;
Reference:
.Q.fps (streaming algorithm)
Syntax: .Q.fps[x;y] Where x is a unary function and y is a filepath
.Q.fs for pipes. (Since V3.4) Reads conveniently sized lumps of complete "\n" delimited records from a pipe and applies a function to each record. This enables you to implement a streaming algorithm to convert a large CSV file into an on-disk kdb+ database without holding the data in memory all at once.