Trouble with backslash in Tcl exec - postgresql

I am writing a data import script in Tcl (from SQL Server to Postgres) and have to call the command-line Unix tr to scrub null characters out of a data file. I write the data to a temp file and then use exec to process the file through tr.
The tr call I would like Tcl to generate looks like this on the command line:
tr -d '\000' < blah >blah.notnull
The Tcl code I use to make the above is this, with $STATE(TMP) holding the temp file:
set ret [catch {exec tr -d '\\000' < $STATE(TMP) > $STATE(TMP).clean}]
However, sometimes this doesn't work and the PostgreSQL COPY fails because of 0x00 characters. If I run the command-line version on the file, then COPY succeeds.
Could someone help me out understanding the exec call and quoting and backslashes? I am a bit stumped.
The error message, a reformatted version of the PG error:
Problem with COPY on blahblah: PGRES_FATAL_ERROR, ERROR: invalid byte sequence for encoding "UTF8": 0x00
Annoyingly, the Tcl exec code often works, but not always.
(We are hand rolling an import system using Tcl, Linux, BCP, SQL Server, etc. because all the off-the-shelf tools fail with the size of our data.)
Thanks to all who read or answer!

The thing is that Tcl doesn't ascribe any special meaning at all to single quotes. The equivalent in Tcl is braces, so use {\000} instead of '\000'. With what you wrote, you were sending three characters (a ', a NUL, and another ') in as that argument, and that causes all sorts of trouble, since literal NUL characters don't work well in C strings.
Thus, you should be doing:
exec tr -d {\000} < blah >blah.notnull
or:
set ret [catch {
    exec tr -d {\000} < $STATE(TMP) > $STATE(TMP).clean
}]
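To see the difference between the quoting styles, check the length of the argument each one actually produces (a quick illustration of my own that you can paste into tclsh):
puts [string length {\000}]   ;# 4: braces keep the backslash and digits literal
puts [string length "\000"]   ;# 1: double quotes substitute a real NUL
puts [string length '\000']   ;# 3: the single quotes are just ordinary characters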
Tcl can also do that operation directly.
# Read binary data
set f [open $STATE(TMP) "rb"]
set data [read $f]
close $f
# Write transformed binary data
set f [open $STATE(TMP).clean "wb"]
puts -nonewline $f [string map [list \u0000 ""] $data]
close $f
[EDIT]: When the amount of data being transformed is large, it's better to do a bit at a time.
set fIn [open $STATE(TMP) "rb"]
set fOut [open $STATE(TMP).clean "wb"]
while true {
    # 128kB chunk size; a bit arbitrary, but big enough to be OK
    set data [read $fIn 131072]
    # If we didn't read anything and instead got EOF, stop the loop.
    # (Checking eof alone before writing would drop a short final chunk,
    # since read returns the remaining bytes and sets EOF at the same time.)
    if {$data eq "" && [eof $fIn]} break
    puts -nonewline $fOut [string map [list \u0000 ""] $data]
}
close $fIn
close $fOut
You could also use a Tcl 8.6 channel transform to do the work and then fcopy to move things over, but there wouldn't be much difference in performance.
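For the curious, here is a rough sketch of that channel-transform approach; the handler name stripNulls is mine, and this assumes Tcl 8.6 or later:
# A read-side transform that deletes NUL bytes as data flows through
proc stripNulls {op chan args} {
    switch -- $op {
        initialize {return {initialize finalize read}}
        finalize   {return}
        read       {return [string map [list \u0000 ""] [lindex $args 0]]}
    }
}
set fIn  [open $STATE(TMP) "rb"]
set fOut [open $STATE(TMP).clean "wb"]
chan push $fIn stripNulls    ;# the filtering happens as the channel is read
fcopy $fIn $fOut             ;# stream the whole file across
chan pop $fIn
close $fIn
close $fOut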

Related

Tcl Script OR Perl?

I wish to replace the following Verilog code by using scripting.
assign x0 = in0 + in7;
I wish to search for the "+" sign above and replace the whole line with the line below:
KSA_32 U1(.A(in0), .B(in7), .Sum(x0));
Any suggestions or a sample script for this?
If your Verilog file fits comfortably in memory, you can simply do:
# Read in the file
set f [open $verilogfile r]
set contents [read $f]
close $f
# Perform the transform across the whole contents
regsub -all {assign\s+(\w+)\s*=\s*(\w+)\s*\+\s*(\w+);} $contents \
{KSA_32 U1(.A(\2), .B(\3), .Sum(\1));} contents
# Write the results out to a new file (different filename so you can check the results by hand)
set f [open $verilogfile.new w]
puts -nonewline $f $contents
close $f
The first and third blocks are standard Tcl patterns for file manipulation. The second is a standard regular-expression substitution, which I made by taking what you asked for and guessing which parts are templates. Note that the literal + needs to be escaped, and spaces are best matched as \s+ or \s*.
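As a quick check, here is that regular expression applied to the sample line from the question:
set line {assign x0 = in0 + in7;}
regsub -all {assign\s+(\w+)\s*=\s*(\w+)\s*\+\s*(\w+);} $line \
    {KSA_32 U1(.A(\2), .B(\3), .Sum(\1));} line
puts $line    ;# -> KSA_32 U1(.A(in0), .B(in7), .Sum(x0));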

Variable not being recognized after "read"

Edit: Resolved. See answer.
Background:
I'm writing a shell script that will perform some extra actions required on our system when someone resizes a database.
The script is written in ksh (a requirement); the OS is Solaris 5.10.
The problem is with one of the checks, which verifies there's enough free space on the underlying OS.
Problem:
The check reads the df -k line for root, which is what I check in this step, and prints it to a file. I then "read" the contents into variables which I use in calculations.
Unfortunately, when I try to run an arithmetic operation on one of the variables, I get an error indicating it is null. And a debug output line I've placed after that line verifies that it is null... it lost its value...
I've tried every method of doing this I could find online, they work when I run it manually, but not inside the shell file.
(* The file does have #!/usr/bin/ksh)
Code:
df -k | grep "rpool/ROOT" > dftest.out
RPOOL_NAME=""; declare -i TOTAL_SIZE=0; USED_SPACE=0; AVAILABLE_SPACE=0; AVAILABLE_PERCENT=0; RSIGN=""
read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN < dftest.out
\rm dftest.out
echo $RPOOL_NAME $TOTAL_SIZE $USED_SPACE $AVAILABLE_SPACE $AVAILABLE_PERCENT $RSIGN
((TOTAL_SIZE=$TOTAL_SIZE/1024))
This is the result:
DBResize.sh[11]: TOTAL_SIZE=/1024: syntax error
I'm pulling hairs at this point, any help would be appreciated.
The code you posted cannot produce the output you posted. Most obviously, the error is signalled at line 11 but you posted fewer than 11 lines of code. The previous lines may matter. Always post complete code when you ask for help.
More concretely, the declare command doesn't exist in ksh; it's a bash thing. You can achieve the same result with typeset (declare is a bash equivalent to typeset, but not all options are the same). Either you're executing this script with bash, or there's another error message about declare, or you've defined some additional commands (including declare) which may change the behavior of this code.
None of this should have an impact on the particular problem that you're posting about, however. The variables created by read remain assigned until the end of the subshell, i.e. until the code hits a ), the end of a pipe (left-hand side of the pipe only in ksh), etc.
About the use of declare or typeset, note that you're only declaring TOTAL_SIZE as an integer. For the other variables, you're just assigning a value which happens to consist exclusively of digits. It doesn't matter for the code you posted, but it's probably not what you meant.
One thing that may be happening is that grep matches nothing, and therefore read reads an empty line. You should check for errors. Use set -e in scripts to exit at the first error. (There are cases where set -e doesn't catch errors, but it's a good start.)
Another thing that may be happening is that df is splitting its output onto multiple lines because the first column containing the filesystem name is too large. To prevent this splitting, pass the option -P.
Using a temporary file is fragile: the code may be executed in a read-only directory, another process may want to access the same file at the same time... Here a temporary file is useless. Just pipe directly into read. In ksh (unlike most other sh variants including bash), the right-hand side of a pipe runs in the main shell, so assignments to variables in the right-hand side of a pipe remain available in the following commands.
It doesn't matter in this particular script, but you can use a variable without $ in an arithmetic expression. Using $ substitutes a string, which can have confusing results, e.g. a='1+2'; $(($a*3)) expands to 7. Not using $ uses the numerical value (in ksh, a='1+2'; $((a*3)) expands to 9; in some sh implementations you get an error because a's value is not numeric).
#!/usr/bin/ksh
set -e
typeset -i TOTAL_SIZE=0 USED_SPACE=0 AVAILABLE_SPACE=0 AVAILABLE_PERCENT=0
df -Pk | grep "rpool/ROOT" | read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN
echo $RPOOL_NAME $TOTAL_SIZE $USED_SPACE $AVAILABLE_SPACE $AVAILABLE_PERCENT $RSIGN
((TOTAL_SIZE=TOTAL_SIZE/1024))
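A tiny self-contained check of the ksh pipe behaviour described above (in bash the read would run in a subshell and word would come out empty):
echo "hello" | read word
echo "$word"    # prints "hello" in ksh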
Strange... when I get rid of your "declare" line, your original code seems to work perfectly well (at least with ksh on Linux).
The code :
#!/bin/ksh
df -k | grep "/home" > dftest.out
read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN < dftest.out
\rm dftest.out
echo $RPOOL_NAME $TOTAL_SIZE $USED_SPACE $AVAILABLE_SPACE $AVAILABLE_PERCENT $RSIGN
((TOTAL_SIZE=$TOTAL_SIZE/1024))
print $TOTAL_SIZE
The result :
32962416 5732492 25552588 19% /home
5598
Which are the values a simple df -k returns. The variables seem to last.
For those interested, I have figured out that it is not possible to use "read" the way I was using it.
The variable values assigned by "read" simply "do not last".
To remedy this, I have applied the less than ideal solution of using the standard "while read" format, and inside the loop, echo selected variables into a variable file.
Once said file was created, I just "loaded" it.
(pseudo code:)
LOOP START
echo "VAR_A="$VAR_A"; VAR_B="$VAR_B";" > somefile.out
LOOP END
. somefile.out

Sed command inside TCL script

Help me understand the sed syntax. I removed the single quotes, but the code still does not work.
set id [open file.txt]
# send the request, get a lot of data
set tok [::http::geturl "http://example.com" -channel $id]
# cut out the necessary data between two words
exec sed s/{"data1":\(.*\)/data2\1/ $id
close $id
set ir [open file.txt]
set phone [read $ir]
close $ir
puts $phone
The problem is that I get data from a query of the following kind
{"id":3876,"form":"index","time":21,"data":"2529423","service":"Atere","response":"WAIT"}
The brace is an element of the syntax of the language, and I need to cut out exactly the value between the word and the brace. How can I implement this in a script?
Your code is rather confused: (a) you are passing a file handle to the sed command, which is not going to work; and (b) you are passing an input channel to http rather than an output channel (try opening the file for writing).
About the underlying problem.
If you are receiving basic JSON data back as shown:
a) You can use a JSON parser: tcllib's json module (see the sketch after the code below)
b) Convert it to a form that Tcl can parse as a dictionary:
# Assuming the JSON data is in the $data variable, and there's no
# other data present. This also assumes the data is very basic:
# there are no embedded commas. Many assumptions mean this
# code is likely to break in the future. A JSON parser would
# be a better choice.
set data "\{"
append data {"id":3876,"form":"index","time":21,"data":"2529423","service":"Atere","response":"WAIT"}
append data "\}"
regsub -all {[{}:",]} $data { } data
set mydatadict $data
puts [dict get $mydatadict id]
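A minimal sketch of option (a), assuming tcllib is installed (the variable names here are mine, not from the question):
package require json
# json2dict turns a JSON object into a Tcl dict keyed by field name
set response {{"id":3876,"form":"index","time":21,"data":"2529423","service":"Atere","response":"WAIT"}}
set parsed [::json::json2dict $response]
puts [dict get $parsed data]      ;# -> 2529423
puts [dict get $parsed response]  ;# -> WAIT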
Edit:
For http processing:
set tok [::http::geturl "http://example.com"]
set data [::http::data $tok]
::http::cleanup $tok

Matlab fprintf function with GrADS scripting

I am using Matlab to print a small text file (temp_script.exec) that will be used to run GrADS commands. The script looks like the following:
'reinit'
'open temp_ctl.ctl'
'set lon -100 -80'
'set lat 20 30'
'define prc = var'
'set sdfwrite data_out.nc'
'sdfwrite prc'
The script is called via cshell:
#!/bin/csh -f
grads -lbc << EOF
temp_script.exec
EOF
exit
The script seems to execute properly, but the output (data_out.nc) is not generated. Strangely, if I edit it using VI and replace the first character -- the single quotation before the command "reinit" -- by typing another single quotation, then re-run the script, data is generated properly.
My question is, what could be different? The scripts look identical in several different text editors, but the "modified" script (by typing) is 1 byte larger. I am using the "fprintf" function to generate the single quotes in Matlab. Could it be some problem with that function?
Thanks for reading.
To see if the files are really the same (the generated one and the one edited with vi):
od -c -t x1 temp_script.exec > temp_script.lis
od -c -t x1 vi_script.exec > vi_script.lis
diff temp_script.lis vi_script.lis
There could be a Unicode BOM at the beginning of the file, or a missing newline at the end of the file, that is causing your issue.
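For instance, if a UTF-8 BOM were present at the start of the generated script, the first block of od output would look something like this (illustrative, not taken from the actual files):
0000000 357 273 277   '   r   e   i   n   i   t   '  \n
         ef  bb  bf  27  72  65  69  6e  69  74  27  0a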

updating table rows based on txt file

I have been searching, but so far I have only found how to insert data into tables based on CSV files.
I have the following scenario:
Directory name = ticketID
Inside this directory I have a couple of files, like:
Description.txt
Summary.txt - Contains the ticket header and has been imported successfully.
Progress_#.txt - created every time a ticket gets updated; I get a new file each time.
Solution.txt
Importing the Issue.txt was easy since this was actually a CSV.
Now my problem is with Description and Progress files.
I need to update the existing rows with the data from these files. Something along the lines of:
update table_ticket set table_ticket.description = Description.txt where ticket_number = directoryname
I'm using PostgreSQL, and the COPY command only inserts new data; it would also still fail due to the ',;/ special chars.
I wanted to do this using a bash script, but it seems it won't be possible:
for i in `find . -type d`
do
update table_ticket
set table_ticket.description = $i/Description.txt
where ticket_number = $i
done
Of course the above code would take into consideration connection to the database.
Does anyone have an idea how I could achieve this using a shell script? Or would it be better to just make something in Java that reads and updates the records? I would like to avoid that approach, though.
Thanks
Alex
Thanks for the answer, but I came across this:
psql -U dbuser -h dbhost db
\set content `cat PATH/Description.txt`
update table_ticket set description = :'content' where ticketnr = TICKETNR;
Putting this into a simple script I created the following:
#!/bin/bash
for i in `find . -type d|grep ^./CS`
do
p=`echo $i|cut -b3-12 -`
echo $p
sed s/PATH/${p}/g cmd.sql > cmd.tmp.sql
ticketnr=`echo $p|cut -b5-10 -`
sed -i s/TICKETNR/${ticketnr}/g cmd.tmp.sql
cat cmd.tmp.sql
psql -U supportAdmin -h localhost supportdb -f cmd.tmp.sql
done
The downside is that it will always create a new connection; later I'll change it to create a single file.
But it does exactly what I was looking for: putting the contents inside a single column.
psql can't read the file in for you directly, unless you intend to store it as a large object, in which case you can use lo_import. See the psql command \lo_import.
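For reference, the large-object route is a single psql meta-command; this sketch reuses the PATH placeholder from above, and it prints the OID of the new large object rather than filling a text column:
\lo_import 'PATH/Description.txt'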
Update: #AlexandreAlves points out that you can actually slurp file content in using
\set myvar `cat somefile`
then reference it as a psql variable with :'myvar'. Handy.
While it's possible to read the file in using the shell and feed it to psql it's going to be awkward at best as the shell offers neither a native PostgreSQL database driver with parameterised query support nor any text escaping functions. You'd have to roll your own string escaping.
Even then, you need to know that the text encoding of the input file is valid for your client_encoding, otherwise you'll insert garbage and/or get errors. It quickly lands up being easier to do it in a language with proper integration with PostgreSQL, like Python, Perl, Ruby or Java.
There is a way to do what you want in bash if you really must, though: use Pg's delimited dollar quoting with a randomized delimiter to help prevent SQL injection attacks. It's not perfect but it's pretty darn close. I'm writing an example now.
Given problematic file:
$ cat > difficult.txt <<__END__
Shell metacharacters like: $!(){}*?"'
SQL-significant characters like "'()
__END__
and sample table:
psql -c 'CREATE TABLE testfile(filecontent text not null);'
You can:
#!/bin/bash
filetoread=$1
sep=$(printf '%04x%04x\n' $RANDOM $RANDOM)
psql <<__END__
INSERT INTO testfile(filecontent) VALUES (
\$x${sep}\$$(cat ${filetoread})\$x${sep}\$
);
__END__
This could be a little hard to read and the random string generation is bash specific, though I'm sure there are probably portable approaches.
A random tag string consisting of alphanumeric characters (I used hex for convenience) is generated and stored in sep.
psql is then invoked with a here-document tag that isn't quoted. The lack of quoting is important, as <<'__END__' would tell bash not to interpret shell metacharacters within the string, whereas plain <<__END__ allows the shell to interpret them. We need the shell to interpret metacharacters, as we need to substitute sep into the here document and also need to use $(...) (equivalent to backticks) to insert the file text. The x before each substitution of sep is there because the dollar-quote tags must be valid PostgreSQL identifiers, so they must start with a letter, not a number. There's an escaped dollar sign at the start and end of each tag because PostgreSQL dollar quotes are of the form $taghere$quoted text$taghere$.
So when the script is invoked as bash testscript.sh difficult.txt the here document lands up expanding into something like:
INSERT INTO testfile(filecontent) VALUES (
$x0a305c82$Shell metacharacters like: $!(){}*?"'
SQL-significant characters like "'()$x0a305c82$
);
where the tags vary each time, making SQL injection exploits that rely on prematurely ending the quoting difficult.
I still advise you to use a real scripting language, but this shows that it is indeed possible.
The best thing to do is to create a temporary table, COPY the data from the files in question into it, and then run your updates.
Your secondary option would be to create a function in a language like PL/PerlU and do this in a stored procedure, but you will lose a lot of the performance optimizations that you can do when you update from a temp table.
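A rough sketch of the temp-table approach, with a hypothetical staging table, file path, and the column names used earlier in the thread:
-- load a prepared file into a staging table, then update in one pass
CREATE TEMP TABLE ticket_staging(ticket_number text, description text);
COPY ticket_staging FROM '/path/to/staging.csv' (FORMAT csv);
UPDATE table_ticket t
   SET description = s.description
  FROM ticket_staging s
 WHERE t.ticket_number = s.ticket_number;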